[Lds] DNB RDF dumps loaded into a triplestore
Konstantin Baierer
konstantin.baierer at bib.uni-mannheim.de
Tue Feb 2 11:06:06 CET 2016
On 02.02.2016 at 09:52, Thomas Gängler wrote:
> Hi all,
>
> is there anyone out there who did load successfully the DNB RDF dumps
> into a triplestore? - If yes, how was your experience? For example,
> which triplestore did you utilise? Which version of the triplestore?
> Which operating system? Which version of the DNB RDF dump? Which
> serialisation of the DNB RDF dump?
>
> Thanks a lot in advance for all your help.
>
Hi Thomas,
Joachim Neubert deployed the GND dumps into a Fuseki endpoint in 2014:
[1]. It is available online [2], has a lot of examples [3] and works
very well.
The dumps are quite big, so the more memory the triplestore has
available, the better. Speed, memory and disk usage vary with the depth
of indexing; full indexing of all S,P,O,G permutations is the worst
case. 32+ GB RAM and a large SSD are a good start, but I'd still
recommend ingesting the data in chunks. What I did for the GND was:
* Download the Turtle version from [4] (~1GB)
* Convert it to N-Triples with rapper [5] (~12GB)
* Split it into files of 1M statements each
* Load them one by one into an Apache Jena TDB triplestore with
tdbloader. This can probably be sped up with tdbloader2data and
tdbloader2index [6]
* Run Fuseki on the TDB triplestore
Once the data is loaded and indexed, SPARQL queries are fast.
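The splitting step from the list above could be sketched roughly as
follows. This is only an illustration, not necessarily how it was done
for the GND: it assumes the N-Triples file has exactly one statement per
line, and the function name split_ntriples and the chunk-NNNNN.nt file
naming are made up for the example.

```python
import itertools
from pathlib import Path


def split_ntriples(src, out_dir, chunk_size=1_000_000):
    """Split a line-based N-Triples file into numbered chunk files.

    Assumes one statement per line, so cutting at line boundaries
    never breaks a triple. Returns the chunk paths in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with open(src, "r", encoding="utf-8") as lines:
        for index in itertools.count():
            # Read at most chunk_size lines; only one chunk is in memory.
            batch = list(itertools.islice(lines, chunk_size))
            if not batch:
                break
            # Hypothetical naming scheme, zero-padded so files sort in order.
            chunk = out_dir / f"chunk-{index:05d}.nt"
            chunk.write_text("".join(batch), encoding="utf-8")
            chunks.append(chunk)
    return chunks
```

Each resulting file can then be passed to tdbloader in turn, pointing
--loc at the same TDB directory every time.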
For trivial, static queries, it can be quicker overall to just search
the N-Triples data with some command-line tools. That is not as elegant
as SPARQL, and each individual query is slower, but there is no
infrastructure to set up and the memory requirements are lower. I did
that for extracting the DDC concordances from the GND due to memory and
disk limitations on that particular machine.
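As an illustration of this kind of scan, here is a minimal sketch that
streams N-Triples lines and keeps only statements with a given
predicate. It is a stand-in for the command-line approach, not the
actual commands used; the function name match_predicate is hypothetical,
and the predicate IRI has to be supplied by the caller.

```python
def match_predicate(lines, predicate_iri):
    """Yield (subject, object) pairs for statements using predicate_iri.

    Relies on N-Triples being line-based: subject and predicate contain
    no whitespace, so the first two whitespace-separated tokens are
    always the subject and the predicate.
    """
    wanted = f"<{predicate_iri}>"
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) < 3 or parts[1] != wanted:
            continue
        obj = parts[2].rstrip()
        if obj.endswith("."):
            # Drop the statement terminator, keep the object term itself.
            obj = obj[:-1].rstrip()
        yield parts[0], obj
```

Because the function is a generator over any iterable of lines, it can
be fed an open file object directly and never holds more than one line
in memory.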
Best of luck,
Konstantin
[1] http://lists.dnb.de/pipermail/dini-ag-kim-normdaten/2014-March/000037.html
[2] http://zbw.eu/beta/sparql-lab/?endpoint=http://zbw.eu/beta/sparql/gnd/query
[3] https://github.com/jneubert/sparql-queries/tree/master/gnd#readme
[4] http://www.dnb.de/EN/lds.html
[5] http://librdf.org/raptor/rapper.html
[6] https://jena.apache.org/documentation/tdb/commands.html#tdbloader
--
Konstantin Baierer
Universitätsbibliothek Mannheim
Abteilung Digitale Bibliotheksdienste / Projekt InFoLiS II
68131 Mannheim
Tel. 0621/181-2962
Email: konstantin.baierer at bib.uni-mannheim.de