[Metafacture] Metafacture / Metamorph hurdles

Günter Hipler guenter.hipler at unibas.ch
Mon Oct 13 18:16:27 CEST 2014


Dear Metafactures,

Together with my colleague Nicolas Prongué from HEG Geneve, I have been
experimenting with Metafacture and Metamorph.
Our first aim was to define a basic transformation from MarcXML to RDF/XML.

After getting some hints and explanations from HBZ (especially Fabian,
thanks a lot!) on how they use the entity mechanism in Metamorph
(https://github.com/hbz/lobid-organisations/blob/master/src/main/resources/morph-enriched.xml),
Nicolas was able to define a first transformation from MarcXML into RDF/XML:
https://github.com/linked-swissbib/metafacture-runner/blob/master/examples/nicolas/morph-marc21_NP.xml
Personally I think this is really nice, because Nicolas has no experience
in scripting or programming so far. From my point of view it shows one
aspect of the potential of a DSL such as Metamorph.

Our questions and experiences:
a) Instead of RDF/XML we want to serialize to Turtle. Is there a
specialized stream type for this?
For RDF/XML we used nested entities and the tilde mechanism to create
XML attributes:

             <data source="001" name="~rdf:about">
                 <!-- The "~" before rdf:about is essential: it makes
                      rdf:about an attribute of the rdf:Description
                      tag. -->
                 <compose prefix="http://data.swissbib.ch/resource/"/>
             </data>


If I am not mistaken, nested entity elements are not appropriate for
Turtle triples. My example:
https://github.com/linked-swissbib/metafacture-runner/tree/master/examples/gh/turtletest
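
To make the difference to the nested RDF/XML structure concrete, here is
a minimal sketch in plain Python (not Metafacture code) of the flat shape
Turtle expects: one subject followed by a list of predicate/object
statements. The function names, the record id and the example predicates
are assumptions chosen for illustration only.

```python
def escape_literal(value):
    """Escape backslashes, quotes and line breaks for a Turtle string literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def to_turtle(subject, statements):
    """Serialize one subject with flat (predicate, object, is_uri) statements."""
    if not statements:
        raise ValueError("at least one statement is required")
    lines = []
    for predicate, obj, is_uri in statements:
        # URIs go in angle brackets, everything else becomes a plain literal.
        rendered = "<%s>" % obj if is_uri else '"%s"' % escape_literal(obj)
        lines.append("    <%s> %s" % (predicate, rendered))
    return "<%s>\n" % subject + " ;\n".join(lines) + " .\n"


record_id = "123456"  # hypothetical value of MARC field 001
doc = to_turtle(
    "http://data.swissbib.ch/resource/" + record_id,
    [
        ("http://purl.org/dc/terms/title", "An example title", False),
        ("http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
         "http://purl.org/ontology/bibo/Document", True),
    ],
)
print(doc)
```

There is no nesting here: every statement hangs directly off the subject,
which is why the entity structure used for RDF/XML does not carry over.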


- Is it possible to define a generic Morph transformation that does not
depend on the output format? The rdf-macros command might be helpful, but
I'm not sure how to use it.

I looked around in the HBZ lobid repository and found types like

triples-to-rdfmodel org.lobid.lodmill.Triples2RdfModel
write-rdfmodel org.lobid.lodmill.RdfModelFileWriter
encode-ntriples org.lobid.lodmill.PipeEncodeTriples
which use libraries from the org.apache.jena.* packages. Based on
these I built some examples in our own sandbox repository, and they
worked as expected:
https://github.com/linked-swissbib/linked.swissbib.mf/tree/evaluation/examples/gh/lobid_hbz_map_triple

Is this the only way to create triple output? What were the reasons for
creating these additional commands?
I have the impression that the metafacture-core 'template' command might
help to serialize output in Turtle format, but I'm not sure.

- Is it possible to create RDF serializations other than XML with
Metafacture core commands (comparable to stream-to-xml)?


b) We would like to document the experience we have gained so far with
the Metafacture framework. Our idea is to express 'our understanding' of
the various Metafacture/Metamorph pieces as we use them in our real use
cases and processes. This could be done on our project wiki and
referenced e.g. from the culturegraph wiki as the central platform.
Better or further ideas are welcome.

I think this would make it a lot easier for other people to join the
community. At the moment the barrier to using the software for the first
time is really high, which could be one reason why there is so little
feedback or activity on the user list.

Another idea to lower the barrier:
At the moment one can find short (snippet) examples in the
metafacture-runner repository. This is fine for getting a first idea of
what might be possible.
But it would be really helpful to provide more comprehensive ('real
world') examples used in production workflows (which HBZ already does in
various repositories). Together with some documentation briefly
explaining the ideas behind them, this would be a great thing and really
helpful, not only for newbies.

Perhaps these ideas and proposals could be discussed during the upcoming 
Metafacture workshop at SWIB?

c) Besides transformations to RDF, I'm thinking about using
Metafacture/Metamorph for our search engine document processing. At
the moment we use chained XSLT templates combined with various Java
plugins for specialized tasks
(https://github.com/swissbib/content2SearchDocs/tree/master/xslt).

Does anybody use Metafacture/Metamorph with Solr as the target? It would
be really nice to see how the transformations are done. I stumbled over
a few, probably very simple, questions:

-- How can we create the XML structure for a field (at the moment we
don't use JSON)?

<field name="id">
    <xsl:value-of select="$fragment/myDocID" />
</field>

Attributes could be created with nested structures (quite complicated
for really large documents), but I wasn't able to create a field value
without an additional tag. Metafacture wants to create an additional
tag for the value, because the data element needs both a name and a
source attribute.
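
For reference, this is the target structure we are after, sketched in
plain Python with the standard library (not Metafacture code): each
field carries its name as an attribute and its value directly as text,
with no extra tag in between. The <add>/<doc> wrapper is Solr's XML
update format; the helper name and the sample values are assumptions
for illustration.

```python
import xml.etree.ElementTree as ET


def solr_doc(fields):
    """Build a Solr <doc> element from (name, value) pairs."""
    doc = ET.Element("doc")
    for name, value in fields:
        # The name becomes an attribute, the value becomes the element text.
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = value
    return doc


add = ET.Element("add")
add.append(solr_doc([("id", "123456"), ("title", "An example title")]))
xml_text = ET.tostring(add, encoding="unicode")
print(xml_text)
```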


-- How can we create fixed, simple structures like this one?

<field name="recordtype">marc</field>


My understanding of the Metamorph module: it listens for metadata
events. But I don't have any event for such constant structures. Any
suggestions on how to create them?
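
One way to think about the problem, sketched in plain Python and not
based on any actual Metafacture class: the start of a record is itself
an event that occurs exactly once per record, so a constant value can be
emitted whenever that event arrives. The class names and the event
method names below are hypothetical and only loosely modeled on the
idea of a stream of record and literal events.

```python
class Collector:
    """Hypothetical receiver that simply stores the events it gets."""

    def __init__(self):
        self.events = []

    def start_record(self, record_id):
        self.events.append(("start_record", record_id))

    def literal(self, name, value):
        self.events.append(("literal", name, value))

    def end_record(self):
        self.events.append(("end_record",))


class ConstantInjector:
    """Forwards all events and adds fixed literals once per record."""

    def __init__(self, receiver, constants):
        self.receiver = receiver
        self.constants = dict(constants)

    def start_record(self, record_id):
        self.receiver.start_record(record_id)
        # The record start is the trigger for the constant values.
        for name, value in self.constants.items():
            self.receiver.literal(name, value)

    def literal(self, name, value):
        self.receiver.literal(name, value)

    def end_record(self):
        self.receiver.end_record()


collector = Collector()
pipe = ConstantInjector(collector, {"recordtype": "marc"})
pipe.start_record("123456")  # hypothetical record id
pipe.literal("id", "123456")
pipe.end_record()
print(collector.events)
```

Whether Metamorph offers an equivalent hook for this is exactly the
question above; the sketch only shows the behaviour we are looking for.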


Sorry for such a long post. We would be really happy to get some feedback!


Günter






-- 
Universität Basel
Universitätsbibliothek
Günter Hipler
Projekt SwissBib
Schoenbeinstrasse 18-20
4056 Basel, Schweiz
Tel.: + 41 (0)61 267 31 12 Fax: ++41 61 267 3103
E-Mail guenter.hipler at unibas.ch
URL: www.swissbib.org  / http://www.ub.unibas.ch/



