[Metafacture] Proposal for Metafacture module package structure

Christoph Böhme christoph at b3e.net
Mon Oct 31 18:10:03 CET 2016


Hi all,

Attached you will find a proposal for separating the Metafacture
modules that are currently in metafacture-core into different packages.

I am not really happy with the current version of the proposal, as the
metafacture-core-modules package contains way too many modules for my
taste. Perhaps you have some ideas on how we could split up these
modules further. I think it is desirable to keep the number of modules
in the core-modules package small, as a number of other packages use
modules from this package. Hence, it would be good if it were rather
stable, which is much easier if it does not contain many modules.
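
To make the stability argument a bit more concrete, here is a rough
sketch that counts how many packages depend on each Metafacture package.
The class and method names are made up for illustration. The dependency
data is taken from the DEPENDS ON annotations in the attached list and
only covers dependencies between Metafacture packages, not third-party
libraries.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Metafacture.
final class DependentCount {

    // Package-level dependencies as annotated in the attached proposal.
    static Map<String, List<String>> dependencies() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("metafacture-test-tools",
                Arrays.asList("metafacture-core-modules"));
        deps.put("metafacture-core-modules",
                Arrays.asList("metafacture-commons"));
        deps.put("metafacture-extra-modules",
                Arrays.asList("metafacture-commons"));
        deps.put("metafacture-morph-modules",
                Arrays.asList("metafacture-commons", "metafacture-core-modules"));
        deps.put("metafacture-biblio-modules",
                Arrays.asList("metafacture-commons", "metafacture-core-modules"));
        deps.put("metafacture-jdom-modules",
                Arrays.asList("metafacture-commons"));
        return deps;
    }

    // Counts, for every package that is depended upon, how many packages use it.
    static Map<String, Integer> countDependents(Map<String, List<String>> deps) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> targets : deps.values()) {
            for (String target : targets) {
                counts.merge(target, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        countDependents(dependencies()).forEach(
                (pkg, n) -> System.out.println(pkg + " is used by " + n + " package(s)"));
    }
}
```

With the annotations as they stand, core-modules is used by three other
packages and commons by five, so changes to either of them ripple widely.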

Another open question is the naming of the Metafacture module packages.
Should they use generic names or implementation-specific ones? For
instance, should the package that contains the modules for processing
JSON with the Jackson library be named metafacture-json-modules or
metafacture-jackson-modules? The former would hide the implementation
detail of which library is used to do the actual reading and writing.
However, it would then no longer be possible to offer two packages for
JSON processing that are based on different implementations. The latter
naming scheme would allow this.

Regarding the general progress of the reorganisation: all pull requests
for metafacture-core are now merged. At the moment I am updating all
dependencies of the core to their latest versions. Afterwards, I want to
determine what needs to go into the Metamorph API package. That is
hopefully not too difficult, as Metamorph is already quite independent
from the rest of Metafacture. Only the XML-based test cases may cause
some trouble. Once this is done, I am thinking of reorganising the
package structure to reflect the new multi-module structure, so that the
actual split-up in the next release becomes easier. This last all-in-one
version of Metafacture will be released as metafacture-core 4.0.0.

Cheers,
Christoph
-------------- next part --------------

metafacture-framework
	Contains the interface definitions and base classes for Metafacture modules.

	Stability: high
	
	Packages:
		framework

--------------------------------------------------------------------------------

metafacture-morph-api
	Contains the interface definitions and base classes for implementing 
	Metamorph functions and collectors.

	Stability: high

	Contents:
		[TODO] (Collector and Function interfaces, AbstractFlushingCollector ...)
 
--------------------------------------------------------------------------------

metafacture-commons
	Contains utility classes and commonly used data structures.

	Stability: medium

	Packages:
		exceptions
		types
		util

--------------------------------------------------------------------------------

metafacture-test-tools
	A JUnit extension that allows testing Metamorph scripts.

	Stability: medium

	Packages:
		test

	Modules:
		WellformednessChecker
		StreamValidator [DEPENDS ON: metafacture-core-modules (EventList.Event)]

--------------------------------------------------------------------------------

metafacture-core-modules
	
	Stability: medium

	Packages:
		formeta [DEPENDS ON: metafacture-commons (StringUtil.copyToBuffer)]

	Modules:
		FormetaEncoder [DEPENDS ON: formeta]
		FormetaDecoder [DEPENDS ON: formeta]
		FormetaRecordsReader [DEPENDS ON: formeta]
		ObjectToLiteral
		LiteralToObject
		StreamToTriples [DEPENDS ON: formeta]
		TriplesToStream [DEPENDS ON: formeta]
		StringListMapToStream [DEPENDS ON: metafacture-commons (ListMap)]
		MapToStream
		CloseSuppressor
		IdChangePipe
		IdentityStreamPipe
		LineReader
		RecordReader
		RegexDecoder
		ObjectTemplate [DEPENDS ON: metafacture-commons (StringUtil.format)]
		PojoEncoder
		PojoDecoder
		PreambleEpilogueAdder
		StreamLiteralFormatter
		SortedTripleFileFacade
		AbstractTripleSort
		TripleSort
		TripleCount
		TripleCollect [DEPENDS ON: formeta]
		AbstractStreamBatcher
		StreamBatchLogger [DEPENDS ON: metafacture-commons (StringUtil.format)]
		StreamBatchResetter
		DuplicateObjectFilter
		LineSplitter
		ObjectBatchLogger [DEPENDS ON: metafacture-commons (StringUtil.format)]
		ObjectBuffer
		ObjectExceptionCatcher
		ObjectLogger
		ObjectTee
		ObjectTimer [DEPENDS ON: metafacture-commons (TimeUtil.formatDuration)]
		RecordToEntity
		StreamBatchMerger
		StreamBuffer
		StreamDeferrer
		StreamEventDiscarder
		StreamExceptionCatcher
		StreamFlattener
		StreamLogger
		StreamMerger
		StreamTee
		StreamTimer [DEPENDS ON: metafacture-commons (TimeUtil.formatDuration)]
		StringDecoder
		StringFilter
		StringMatcher
		TripleFilter
		TripleReorder
		EntityPathTracker
		EventList
		StringConcatenator
		NamedValueList [DEPENDS ON: metafacture-commons (Collector, NamedValue)]
		NamedValueSet [DEPENDS ON: metafacture-commons (Collector, NamedValue)]
		SingleValue [DEPENDS ON: metafacture-commons (Collector)]
		StringListMap [DEPENDS ON: metafacture-commons (Collector, ListMap)]
		StringMap [DEPENDS ON: metafacture-commons (Collector)]
		ValueSet [DEPENDS ON: metafacture-commons (Collector)]
		HttpOpener
		ResourceOpener [DEPENDS ON: metafacture-commons (ResourceUtil.getReader)]
		DirReader
		StdInOpener
		StringReader
		StringSender
		XmlDecoder
		GenericXmlHandler
		CGXmlHandler
		SimpleXmlEncoder [DEPENDS ON: metafacture-commons (MultiMap, ResourceUtil.loadProperties)]
		XmlTee
		XmlElementSplitter [DEPENDS ON: org.apache.commons.lang]

--------------------------------------------------------------------------------
	
metafacture-extra-modules
	Stability: low

	Modules:
		ObjectPipeDecoupler
		StreamUnicodeNormalizer
		UnicodeNormalizer
		JScriptObjectPipe [DEPENDS ON: metafacture-commons (ResourceUtil.getReader)]

--------------------------------------------------------------------------------

metafacture-morph-modules
	Contains Metamorph and modules directly building on it.

	Stability: medium

	Depends on: metafacture-commons

	Modules:
		Metamorph [DEPENDS ON: metafacture-core-modules (StreamBuffer, StreamFlattener), org.apache.commons.lang]
		Filter [DEPENDS ON: metafacture-core-modules (Metamorph, StreamBuffer, SingleValue)]
		Splitter

--------------------------------------------------------------------------------

metafacture-stats-modules
	Stability: low

	Modules:
		AbstractCountProcessor
		CooccurrenceMetricCalculator
		Counter
		UniformSampler
		Histogram

--------------------------------------------------------------------------------

metafacture-file-modules
	Modules for reading and writing files. 

	Stability: low
	
	Depends on: org.apache.commons.io
	            org.apache.commons.compress

	Modules:
		FileOpener
		TarReader
		TripleObjectRetriever
		TripleObjectWriter
		TripleReader
		TripleWriter
		FileDigestCalculator
		ConfigurableObjectWriter
		AbstractObjectWriter
		ObjectStdoutWriter
		ObjectFileWriter
		ObjectWriter
		XmlFilenameWriter

--------------------------------------------------------------------------------

metafacture-biblio-modules 
	Modules for working with library related data formats.	

	Stability: low

	Packages:
		iso2709 [DEPENDS ON: metafacture-commons (Require, StringUtil.repeatChars)]

	Modules:
		AseqDecoder
		MabDecoder
		Marc21Encoder [DEPENDS ON: iso2709]
		Marc21Decoder [DEPENDS ON: iso2709]
		PicaEncoder
		PicaDecoder [DEPENDS ON: metafacture-commons (StringUtil.copyToBuffer)]
		MarcXmlHandler
		PicaXmlHandler
		AlephMabXmlHandler
		OreAggregationAdder [DEPENDS ON: metafacture-commons (ListMap, ResourceUtil.loadProperties)]
		PicaMultiscriptRemodeler [DEPENDS ON: metafacture-core-modules (StreamBuffer)]

--------------------------------------------------------------------------------

metafacture-rdf-modules
	Stability: low

	Depends on: org.apache.commons.lang

	Modules:
		RdfMacroPipe

--------------------------------------------------------------------------------

metafacture-csv-modules
	Stability: low

	Depends on: net.sf.opencsv

	Modules:
		CsvDecoder

--------------------------------------------------------------------------------

metafacture-jdom-modules
	Stability: low

	Depends on: org.jdom
	            metafacture-commons

	Modules:
		StreamToJDomDocument
		JDomDocumentToStream

--------------------------------------------------------------------------------

metafacture-json-modules
	Stability: low

	Depends on: org.apache.commons.lang
	            com.fasterxml.jackson

	Modules:
		JsonEncoder
		JsonToElasticsearchBulk

