Re: [Metafacture] copy / use of complex xml structures with Metamorph

Böhme, Christoph C.Boehme at dnb.de
Tue Feb 3 14:58:17 CET 2015


Dear Günter,

sorry for not replying sooner. I hope my reply is still relevant.

Guenter Hipler wrote on Monday, 12 January 2015 at 22:22:
> Is there a way to copy complex xml structures as part of a metamorph transformation?

No, sadly that is not possible. Metamorph only works on literal data and has no notion of structured data.
In the current implementation, all data received by Metamorph is "flattened". Your XML file, for instance, is seen by Metamorph as a sequence of simple literal-like events:

structuredtag.structuredtag1.singletag="value of single tag"
structuredtag.structuredtag1=""
structuredtag=""

Literals are prefixed with the names of the entities that contain them, and end-entity events are turned into virtual literal events (this was introduced to make it possible to flush on end-of-entity events). For this reason, Metamorph is not aware of entities, and structures cannot be passed through it.
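
To make the flattening concrete, here is a small Python sketch that reproduces the three events listed above from the example input. It is only an illustration of the idea, not Metamorph's actual code: the function name flatten and the way element text is attached directly to the tag name are assumptions chosen to match the listing.

```python
import xml.etree.ElementTree as ET

def flatten(element, prefix=""):
    """Yield (name, value) pairs for an XML element, mimicking the flattening.

    Illustrative only: this is NOT how Metamorph is implemented.
    """
    path = prefix + element.tag
    children = list(element)
    if children:
        # Literals inside the entity are prefixed with the entity name.
        for child in children:
            yield from flatten(child, path + ".")
        # The end of the entity becomes a virtual literal with an empty value.
        yield (path, "")
    else:
        yield (path, (element.text or "").strip())

xml_input = """<structuredtag>
  <structuredtag1>
    <singletag>value of single tag</singletag>
  </structuredtag1>
</structuredtag>"""

events = list(flatten(ET.fromstring(xml_input)))
for name, value in events:
    print(f'{name}="{value}"')
```

Note that the nesting survives only as a naming convention in the dotted prefixes; there is no event left that says "an entity starts here".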

> You can find my complete example in https://github.com/linked-swissbib/metafacture-
> runner/blob/master/examples/gh/xmltest
> input.xml contains structured xml-tags. The aim is to select a key / value pair where the 
> value is not only a single value as with
> <data source="structuredtag.structuredtag1.singletag.value" name="newNameOfSingleValue"/>
> in the outputfile  but the complete structure
> <data source="structuredtag" name="newNameOfStructure"/>
> would create the key "newNameOfStructure" with the "structured value" (an entity in 
> Metamorph datamodel)
> <structuredtag1>
> <singletag>value of single tag</singletag>
> </structuredtag1>

> Actually I get only the empty element 
> <newNameOfStructure />

This happens because the data statement treats the event generated at the end of the structuredtag entity as a literal with an empty value and outputs this value under the new name. It would be handy to be able to forward data structures using the data statement. However, such an implementation would have some conceptual problems, because the data statement allows you to define a sequence of functions which process the literal value. These functions are not able to process complex structures.
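
The effect can be mimicked in a few lines of Python. This is a toy stand-in for the data statement, assuming it simply selects events whose name equals the source and re-emits them under the new name; the function data and the event tuples are made up for the illustration.

```python
# Flattened events for the example input, as (name, value) pairs.
events = [
    ("structuredtag.structuredtag1.singletag", "value of single tag"),
    ("structuredtag.structuredtag1", ""),
    ("structuredtag", ""),
]

def data(events, source, name):
    """Toy model of a data statement: rename literals whose name equals source."""
    return [(name, value) for key, value in events if key == source]

# Only the virtual end-of-entity literal is named exactly "structuredtag",
# so the result is a single literal with an empty value.
result = data(events, "structuredtag", "newNameOfStructure")
print(result)
```

The only event that matches is the empty end-of-entity literal, which is why the output contains just the empty element.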

> Background for my question:
> As we (Christoph, Pascal) talked at swib about it, I would like to use Metafacture 
> for the document processing of our Search-Engine (SOLR). Currently we are using 
> XSLT transformations with Java-Plugins for special tasks (Enrichment, 
> de-duplication etc.)
> Beside single document fields we store most of the complete marc-xml structure 
> and the complete holdings-structure as part of the search-document so we are 
> able to use this information in our presentation logic. In xslt this can be easily done 
> with the <xsl:copy-of> task 
> Compare [1] [2] [3] as examples for such a transformation.

It would indeed be nice to have a function similar to xsl:copy-of in Metamorph. I imagine that something like <entity-data source="structuredtag" name="newNameOfStructure" />, which works like a data statement but on complex structures, could be used for this.

The main feature required to add such a function would be the ability to handle start and end entity events in Metamorph. With this in place, it would be easy to add a statement which passes these events through, so that complex structures can be handled in Metamorph. The architecture of Metamorph should not pose any fundamental problems for adding start and end entity events.
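
As a rough sketch of what such a pass-through could look like once start and end entity events are available, consider the following Python model. Everything in it is hypothetical: the function entity_data, the (kind, name, value) event tuples, and the fact that end events carry a name are simplifications for the illustration and do not describe an existing Metamorph feature.

```python
def entity_data(events, source, name):
    """Hypothetical pass-through: forward the entity called source, renamed to name.

    Events are (kind, name, value) tuples with kind in {"start", "literal", "end"}.
    """
    out = []
    depth = 0  # nesting depth inside the selected entity; 0 means outside it
    for kind, key, value in events:
        if depth == 0:
            # Outside the selected entity: wait for its start event.
            if kind == "start" and key == source:
                out.append(("start", name, None))
                depth = 1
        elif kind == "start":
            depth += 1
            out.append((kind, key, value))
        elif kind == "end":
            depth -= 1
            # The outermost end event closes the renamed entity.
            out.append(("end", name if depth == 0 else key, None))
        else:
            out.append((kind, key, value))
    return out

events = [
    ("start", "structuredtag", None),
    ("start", "structuredtag1", None),
    ("literal", "singletag", "value of single tag"),
    ("end", "structuredtag1", None),
    ("end", "structuredtag", None),
]

result = entity_data(events, "structuredtag", "newNameOfStructure")
for event in result:
    print(event)
```

In this model the inner structure is forwarded untouched and only the outer entity is renamed, which is roughly what xsl:copy-of combined with a new element name would give.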

I think it would be nice to be able to handle complex structures as it would make it much easier to write scripts which modify data. 

Best,
Christoph

-- 

***Lesen. Hören. Wissen. Deutsche Nationalbibliothek***

Christoph Böhme
Deutsche Nationalbibliothek
Fachbereich Informationsinfrastruktur und Bestandserhaltung
Adickesallee 1
D-60322 Frankfurt am Main
Telefon: +49-69-1525-1721
Telefax: +49-69-1525-1799
mailto:c.boehme at dnb.de
http://www.dnb.de

