[Lds] DNB RDF dumps loaded into a triplestore

Thomas Berger ThB at Gymel.com
Tue Feb 2 12:15:23 CET 2016



On 02.02.2016 at 09:52, Thomas Gängler wrote:

> is there anyone out there who did load successfully the DNB RDF dumps
> into a triplestore? - If yes, how was your experience? For example,
> which triplestore did you utilise? Which version of the triplestore?
> Which operating system? Which version of the DNB RDF dump? Which
> serialisation of the DNB RDF dump?

I can only speak from my experience with the GND dumps, which I
regularly import into a 4store instance (a cluster of 4 workers
with one frontend machine).

Based on the Turtle serialization I perform some preprocessing,
namely

* transform triples into quads (so that subsequent OAI harvesting can
update "records" by exchanging subgraphs). Technically, I wrap the
block of statements pertaining to one GND id in
<some_gnd-id_derived_uri> = { original_triples_of_that_record } .
(this became a bit messy since the GND is delivered with
skolemized nodes instead of the originally "inlined" blank nodes)

* split the file into portions of 20k records each to facilitate loading
(during import the frontend requires RAM proportional to the
size of the input packet being processed. In my setup the frontend
and the workers compete for the same physical RAM on the host, so I
prefer relatively small portions of input)
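
The two preprocessing steps above could be sketched roughly as follows.
This is only an illustration, not the script actually used: it assumes
the dump has already been grouped into (graph URI, Turtle statements)
pairs, one per GND record, and that the prefix declarations are
available as a separate string. The names wrap_record, portions and
render_portion are hypothetical, as is the way the graph URI is
obtained.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Assumption: one record = (graph URI derived from the GND id,
#                           Turtle statements of that record)
Record = Tuple[str, str]


def wrap_record(graph_uri: str, statements: str) -> str:
    """Wrap one record's statements in a named graph, using the
    '<uri> = { ... } .' notation quoted in the list above."""
    lines = statements.strip().splitlines()
    body = "\n".join("    " + line for line in lines)
    return "<" + graph_uri + "> = {\n" + body + "\n} ."


def portions(records: Iterable[Record], size: int = 20000) -> Iterator[List[Record]]:
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def render_portion(prefixes: str, chunk: List[Record]) -> str:
    """Build the text of one input file: prefixes, then one graph per record."""
    graphs = [wrap_record(uri, stmts) for uri, stmts in chunk]
    return prefixes.strip() + "\n\n" + "\n\n".join(graphs) + "\n"
```

Each rendered portion can then be written to its own file and imported
separately, which keeps the input packet seen by the frontend small.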

4store really prefers to hold everything in RAM / memory-mapped files
(and is actually designed and optimized for that situation), so expect
smooth and fast loading only when at least 20 GB of RAM is available to
the worker(s).

HTH
Thomas Berger






