[Lds] DNB RDF dumps loaded into a triplestore

Tue Feb 2 11:06:06 CET 2016

On 02.02.2016 at 09:52, Thomas Gängler wrote:
> Hi all,
>
> is there anyone out there who did load successfully the DNB RDF dumps
> into a triplestore? - If yes, how was your experience? For example,
> which triplestore did you utilise? Which version of the triplestore?
> Which operating system? Which version of the DNB RDF dump? Which
> serialisation of the DNB RDF dump?
>
> Thanks a lot in advance for all your help.
>

Hi Thomas,

Joachim Neubert deployed the GND dumps into a Fuseki endpoint in 2014: 
[1]. It is available online [2], has a lot of examples [3] and works 
very well.

The dumps are quite big, so the more memory is available to the 
triplestore the better. Speed, memory and disk usage vary with the depth 
of indexing, full indexing of all S,P,O,G permutations being the worst 
case. 32+ GB RAM and a large SSD are a good start, but I'd still 
recommend ingesting the data in chunks. What I did for the GND was:

* Download the Turtle version from [4] (~1GB)
* Convert it to N-Triples with rapper [5] (~12GB)
* Split it into files of 1M statements each
* Load them one by one into an Apache Jena TDB triplestore with 
tdbloader. This can probably be sped up with tdbloader2data and 
tdbloader2index [6]
* Run Fuseki on the TDB triplestore
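
The split-and-load steps above could be scripted roughly as follows. This 
is only a sketch, not the script I actually used: the function names, the 
chunk file naming and the paths are made up for illustration, and it 
assumes that tdbloader is on the PATH. Splitting by line is safe because 
N-Triples puts exactly one statement on each line.

```python
import subprocess
from pathlib import Path


def split_ntriples(source, out_dir, chunk_size=1_000_000):
    """Split an N-Triples file into chunk files of at most chunk_size statements.

    Returns the list of chunk paths in the order they were written.
    The chunk file names (chunk-00000.nt, ...) are an arbitrary choice.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    out = None
    count = 0
    with open(source, encoding="utf-8") as src:
        for line in src:
            # Skip blank lines and comment lines; they carry no statement.
            if not line.strip() or line.lstrip().startswith("#"):
                continue
            # Start a new chunk file at the beginning and whenever one is full.
            if out is None or count == chunk_size:
                if out is not None:
                    out.close()
                path = out_dir / f"chunk-{len(chunks):05d}.nt"
                chunks.append(path)
                out = open(path, "w", encoding="utf-8")
                count = 0
            out.write(line)
            count += 1
    if out is not None:
        out.close()
    return chunks


def load_chunks(chunks, tdb_dir):
    """Load the chunk files one by one into a TDB directory with tdbloader.

    Assumes the Jena command line tools are installed and on the PATH.
    """
    for chunk in chunks:
        subprocess.run(["tdbloader", "--loc", str(tdb_dir), str(chunk)], check=True)
```

Usage would be something like 
load_chunks(split_ntriples("gnd.nt", "chunks"), "tdb"), where gnd.nt, 
chunks and tdb are placeholder names.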

Once the data is loaded and indexed, SPARQL queries are fast.

For trivial, static queries, it can be quicker to just search the 
N-Triples data with command-line tools. The search itself is not as 
elegant or as fast as a SPARQL query, but there is no infrastructure to 
set up and the memory requirements are lower. I did that to extract the 
DDC concordances from the GND because of memory and disk limitations on 
that particular machine.
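
To illustrate the idea, here is a minimal sketch of such a search in 
Python rather than with shell tools. It streams the N-Triples file line 
by line and keeps only the statements with a given predicate, so memory 
use stays constant. The function name and the example predicate are 
hypothetical; this is not the command I used, and the actual predicate 
for the DDC concordances has to be looked up in the dump.

```python
def statements_with_predicate(source, predicate_iri):
    """Yield every N-Triples line in source whose predicate is predicate_iri.

    Subjects and predicates in N-Triples contain no whitespace, so splitting
    on the first two runs of whitespace isolates them reliably.
    """
    needle = f"<{predicate_iri}>"
    with open(source, encoding="utf-8") as src:
        for line in src:
            parts = line.split(None, 2)
            if len(parts) == 3 and parts[1] == needle:
                yield line.rstrip("\n")


if __name__ == "__main__":
    # Placeholder file name and predicate, for illustration only.
    for stmt in statements_with_predicate("gnd.nt", "http://example.org/somePredicate"):
        print(stmt)
```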

Best of luck,

Konstantin

[1] 
http://lists.dnb.de/pipermail/dini-ag-kim-normdaten/2014-March/000037.html

[2] 
http://zbw.eu/beta/sparql-lab/?endpoint=http://zbw.eu/beta/sparql/gnd/query

[3] https://github.com/jneubert/sparql-queries/tree/master/gnd#readme

[4] http://www.dnb.de/EN/lds.html

[5] http://librdf.org/raptor/rapper.html

[6] https://jena.apache.org/documentation/tdb/commands.html#tdbloader

-- 
Konstantin Baierer
Universitätsbibliothek Mannheim
Abteilung Digitale Bibliotheksdienste / Projekt InFoLiS II
68131 Mannheim
Tel. 0621/181-2962
Email: konstantin.baierer at bib.uni-mannheim.de