[Dini-ag-kim-bestandsdaten] about blank nodes as I understand it

Fri Apr 19 14:25:08 CEST 2013

Hi everyone!

When I was thinking about how our library holdings data can be expressed in RDF, I became aware that at several points we cannot avoid blank nodes.

What I remembered before I started to investigate the issue was: blank nodes are bad! Now I think that this is a myth.

Let me just briefly explain what the problem was and what changed my mind. 

The problem lies in our data structure, I think. Like MARC21, we have fields and subfields. The actual data lies in the subfields and the fields are just categories; that by itself is not a problem for modelling the data. But fields can be repeatable, and that can be a problem.

Here is a simple example, where the field '7135' stands for 'Access information on a library holding', the subfield =u for the 'URL' and =x for 'a comment on the access information':

7135 =u http://www.ub.uni-freiburg.de/hylib/suche-EP.cgi?nd=32018=x Readme-File
7135 =u http://www.bibliothek.uni-regensburg.de/ezeit/?123456=x Access via EZB 

If I try to express this data in RDF without using blank nodes, the distinction between the two occurrences of field 7135 gets lost:

:Holding 
    :url "http://www.ub.uni-freiburg.de/hylib/suche-EP.cgi?nd=32018" ;
    :comment "Readme-File" ;
    :url "http://www.bibliothek.uni-regensburg.de/ezeit/?123456" ;
    :comment "Access via EZB" .

You can no longer tell which comment belongs to which URL. Minting URIs for every repeatable field, which would also avoid blank nodes, is from my point of view not practical. So we cannot avoid blank nodes. From there on I started to investigate blank nodes, knowing that my knowledge of them is poor (like my English).
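The loss of pairing can be sketched in a few lines of Python. This is only an illustration under my own assumptions: triples are plain tuples of strings, the labels _:b0 and _:b1 are made up, and :hasAccessInformation is the property name I use further down in this mail.

```python
# Each occurrence of field 7135 is one dict of subfields (u = URL, x = comment).
fields_7135 = [
    {"u": "http://www.ub.uni-freiburg.de/hylib/suche-EP.cgi?nd=32018",
     "x": "Readme-File"},
    {"u": "http://www.bibliothek.uni-regensburg.de/ezeit/?123456",
     "x": "Access via EZB"},
]


def flat_triples(fields):
    """Attach every subfield directly to the holding (no blank nodes)."""
    triples = set()
    for field in fields:
        triples.add((":Holding", ":url", field["u"]))
        triples.add((":Holding", ":comment", field["x"]))
    return triples


def nested_triples(fields):
    """Give each field occurrence its own blank node (hypothetical labels)."""
    triples = set()
    for i, field in enumerate(fields):
        node = f"_:b{i}"
        triples.add((":Holding", ":hasAccessInformation", node))
        triples.add((node, ":url", field["u"]))
        triples.add((node, ":comment", field["x"]))
    return triples


def comments_for(triples, url):
    """Return all comments that share a subject with the given URL."""
    subjects = {s for s, p, o in triples if p == ":url" and o == url}
    return sorted(o for s, p, o in triples if p == ":comment" and s in subjects)
```

Asking for the comments on the Freiburg URL returns both comments in the flat model, because everything hangs on :Holding, and only "Readme-File" in the nested model.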

Then I read a lot of the mails on the subject "Well Behaved RDF - Taming Blank Nodes, etc." [1] from the W3C linked data list, as well as David Booth's paper "Well Behaved RDF" [2] and the paper "On Blank Nodes" [3].

After that, my opinion on blank nodes changed. For me it all comes down to two points:

1. You can use blank nodes as long as they are implicit and not cyclic but tree-like. Then they do no harm in any system.
- As far as I understand this, our fields and subfields are tree-like and implicit (because we can serialize them in Turtle with brackets []).

2. You should use URIs instead of blank nodes "[...] if another party might reasonably want to create a hypertext link to it, make or refute assertions about it, retrieve or cache a representation of it, include all or part of it by reference into another representation, annotate it, or perform other operations on it. [...]" [4]
- And that does not apply to the tiny bits of information in the fields and subfields, which mean little without the context of the whole holding information.
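Point 1 can be turned into a small check. The following Python sketch is my own simplified reading of "implicit and tree-like", not a definition taken from the papers: a blank node may be the object of at most one triple, and walking from a blank node up to its parent must never come back to the same node. Terms are plain strings, and anything starting with "_:" is treated as a blank node.

```python
def is_bnode(term):
    """Treat any term written with the '_:' prefix as a blank node."""
    return term.startswith("_:")


def bnodes_are_tree_like(triples):
    """Check that blank nodes have at most one parent and form no cycles."""
    parents = {}
    for s, p, o in set(triples):  # set() ignores exact duplicate triples
        if is_bnode(o):
            if o in parents:
                # Second incoming edge: the node is shared, so it cannot be
                # written with nested [] brackets alone.
                return False
            parents[o] = s

    for node in parents:
        seen = set()
        current = node
        while is_bnode(current):
            if current in seen:
                return False  # walked in a circle
            seen.add(current)
            if current not in parents:
                break  # a blank node without a parent is a root
            current = parents[current]
    return True


# Hypothetical example data: one access-information node under the holding.
holding = [
    (":Holding", ":hasAccessInformation", "_:b0"),
    ("_:b0", ":url", "http://www.bibliothek.uni-regensburg.de/ezeit/?123456"),
    ("_:b0", ":comment", "Access via EZB"),
]
cyclic = [("_:a", ":p", "_:b"), ("_:b", ":p", "_:a")]
```

With this check, the holding example counts as tree-like, while the two blank nodes that point at each other do not.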

So I hope I'm not a bad guy when I say:

:Holding 
    :hasAccessInformation [
        :url "http://www.ub.uni-freiburg.de/hylib/suche-EP.cgi?nd=32018" ;
        :comment "Readme-File"
    ] ;
    :hasAccessInformation [
        :url "http://www.bibliothek.uni-regensburg.de/ezeit/?123456" ;
        :comment "Access via EZB"
    ] .

Cheers!

Carsten

[1] <http://w3-org.9356.n7.nabble.com/Well-Behaved-RDF-Taming-Blank-Nodes-etc-td241694.html>
[2] <http://dbooth.org/2013/well-behaved-rdf/Booth-well-behaved-rdf.pdf>
[3] <http://sw.deri.org/~aidanh/docs/bnodes.pdf>
[4] <http://www.w3.org/TR/webarch/#uri-benefits>
_______________________________________________
Carsten Klee
Abt. Überregionale Bibliographische Dienste IIE
Staatsbibliothek zu Berlin - Preußischer Kulturbesitz
Potsdamer Straße 33
10785 Berlin

Fon:  +49 30 266-43 44 02
Fax:   +49 30 266-33 40 01
carsten.klee at sbb.spk-berlin.de
www.zeitschriftendatenbank.de