[Metafacture] Metafacture / Metamorph hurdles
Günter Hipler
guenter.hipler at unibas.ch
Mon Oct 13 18:16:27 CEST 2014
Dear Metafactures,
Together with my colleague Nicolas Prongué from HEG Geneve, I have been
experimenting with Metafacture and the principles behind Metamorph.
Our first aim was to define a basic transformation from MarcXML to RDF/XML.
After getting some hints and explanations from HBZ (especially Fabian,
thanks a lot!) on how they use the entity mechanism in Metamorph
(https://github.com/hbz/lobid-organisations/blob/master/src/main/resources/morph-enriched.xml),
Nicolas was able to define a first transformation from MarcXML into RDF/XML:
https://github.com/linked-swissbib/metafacture-runner/blob/master/examples/nicolas/morph-marc21_NP.xml
Personally, I think this is really nice because Nicolas has no experience
in scripting or programming so far. From my point of view, it shows one
aspect of the potential of using a DSL such as Metamorph.
Our questions and experiences:
a) Instead of RDF/XML we want to serialize to Turtle. Is there a
specialized stream type for this?
For RDF/XML we used nested entities and the tilde mechanism to create
XML attributes:
<data source="001" name="~rdf:about">
<!--The symbol "~" before rdf:about is very important:
it turns rdf:about into an attribute of the
rdf:Description tag-->
<compose prefix="http://data.swissbib.ch/resource/"/>
</data>
If I'm not mistaken, nesting entity elements isn't appropriate for
Turtle triples. My example:
https://github.com/linked-swissbib/metafacture-runner/tree/master/examples/gh/turtletest
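To illustrate what I mean by flat triples, here is a minimal sketch in Python. It is not Metafacture code, only the kind of output we are after. The function name to_turtle, the record ID 123 and the chosen dct properties are assumptions made up for this illustration, and only literal objects with minimal escaping are handled.

```python
# Hypothetical illustration only: none of these names come from Metafacture.
PREFIXES = {"dct": "http://purl.org/dc/terms/"}


def to_turtle(triples, prefixes):
    """Serialize flat (subject IRI, prefixed predicate, literal) triples as Turtle."""
    lines = [f"@prefix {p}: <{iri}> ." for p, iri in sorted(prefixes.items())]
    lines.append("")  # blank line between the prefix block and the statements
    for subject, predicate, obj in triples:
        # Minimal escaping for a short Turtle string literal.
        escaped = (
            obj.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        lines.append(f'<{subject}> {predicate} "{escaped}" .')
    return "\n".join(lines)


# Each statement stands on its own; there is no nesting as in RDF/XML.
triples = [
    ("http://data.swissbib.ch/resource/123", "dct:title", "An example title"),
    ("http://data.swissbib.ch/resource/123", "dct:language", "ger"),
]
print(to_turtle(triples, PREFIXES))
```

Every statement repeats its subject, so the nested entity structure we used for RDF/XML has no direct counterpart here.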
- Is it possible to define a generic Morph transformation which doesn't
depend on the output? The rdf-macros command might be helpful but I'm
not sure how to use it.
I looked around in the HBZ lobid repository and found types like
triples-to-rdfmodel org.lobid.lodmill.Triples2RdfModel
write-rdfmodel org.lobid.lodmill.RdfModelFileWriter
encode-ntriples org.lobid.lodmill.PipeEncodeTriples
which use libraries from the org.apache.jena.* package. Based on
this I made some examples in our own sandbox repository, which worked as
expected:
https://github.com/linked-swissbib/linked.swissbib.mf/tree/evaluation/examples/gh/lobid_hbz_map_triple
Is this the only way to create triple output? What were the reasons to
create these additional commands?
I have the impression that the metafacture-core 'template' command might
help to serialize output in Turtle format, but I'm not sure.
- Is it possible to create RDF serializations other than XML with
Metafacture core commands (comparable to stream-to-xml)?
b) We would like to document the experience we have gained so far with the
Metafacture framework. Our idea is to describe 'our understanding' of the
various Metafacture/Metamorph pieces as we use them in our real use
cases/processes. This could be done on our project wiki and
referenced e.g. on the culturegraph wiki as the central platform. Better
or further ideas are welcome.
I think this would make it a lot easier for other people to join the
community. At the moment the barrier to using the software for the first
time is really high, which could be one reason why there is so little
feedback and activity on the user list.
Another idea to reduce the barrier:
At the moment one can find short (snippet) examples in the
metafacture-runner repository. This is fine for getting at least a first
idea of what might be possible.
But it would be really helpful to provide more comprehensive ('real
world') examples used in production workflows (which HBZ already does
in various repositories). Combined with documentation that briefly
explains the ideas behind them, this would be a great thing and really
helpful, not only for newbies.
Perhaps these ideas and proposals could be discussed during the upcoming
Metafacture workshop at SWIB?
c) Besides transformations to RDF, I'm thinking about the possibility of
using Metafacture/Metamorph for our search engine document processing. At
the moment we use a chain of XSLT templates combined with
various Java plugins for specialized tasks.
(https://github.com/swissbib/content2SearchDocs/tree/master/xslt)
Does anybody use Metafacture/Metamorph with Solr as the target? It would be
really nice to see how the transformations are done. I stumbled upon
some probably very simple questions:
-- how to create the XML structure for a field (at the moment we don't
use JSON):
<field name="id">
<xsl:value-of select="$fragment/myDocID" />
</field>
Attributes could be created with nested structures (quite complicated
for really large documents) but I wasn't able to create a field value
without any additional tag. Metafacture wants to create an additional
tag for the value because the data element needs a name and source
attribute as well.
-- how to create fixed, simple structures like this one:
<field name="recordtype">marc</field>
My understanding of the Metamorph module: it listens for
metadata events. But I don't have any event for such constant
structures. Any suggestions on how to create them?
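To make the two Solr questions above concrete, here is a minimal sketch in Python of the target structure we would like to produce. It is not Metafacture code. The function name solr_doc and the ID value 123 are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET


def solr_doc(literals, constants):
    """Build one Solr <doc> with the values placed directly inside <field>.

    literals:  (name, value) pairs taken from the record (hypothetical input)
    constants: name -> value pairs that do not depend on any record event
    """
    doc = ET.Element("doc")
    # Constant fields come first, followed by the fields from the record.
    for name, value in list(constants.items()) + list(literals):
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = value  # text node only, no extra wrapper tag
    return ET.tostring(doc, encoding="unicode")


print(solr_doc([("id", "123")], {"recordtype": "marc"}))
```

The point is that the field name ends up in an attribute and the value is the element's text, with the constant field added once per record regardless of the record's content.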
Sorry for such a long post. We would be really happy to get some feedback!
Günter
--
Universität Basel
Universitätsbibliothek
Günter Hipler
Projekt SwissBib
Schoenbeinstrasse 18-20
4056 Basel, Schweiz
Tel.: + 41 (0)61 267 31 12 Fax: ++41 61 267 3103
E-Mail guenter.hipler at unibas.ch
URL: www.swissbib.org / http://www.ub.unibas.ch/