If you’re a linked data nerd, you may be following the ongoing “There’s No Money in Linked Data” email thread that’s been haunting public-lod
and a few other lists. (See also [1].) A particularly thought-provoking post from Andreas Blumauer emerged as a result of this discussion:
…To publish LOD which is interesting for the usage beyond research projects, datasets should be specific and trustworthy (another example is the German labor law thesaurus by Wolters Kluwer). I am not saying that datasets like DBpedia are waivable. They serve as important hubs in the LOD cloud, but for non-academic projects based on LOD we need an additional layer of linked open datasets, the Trusted LOD cloud…
Due mostly (I think) to language and/or cultural barriers, the core message (or what I believe is the core message) is not coming across very well. I believe the core point is this: data published without explicit expressions of (a) provenance and (b) rights is of limited use, especially to commercial (and presumably responsible) consumers. In a “private” cloud it might be easier to make explicit assertions, but to be honest, if one follows linked data best practices, it really shouldn’t be that hard to do today.
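To make that last point a little more concrete, here is a minimal sketch of the kind of check a cautious consumer might run before using a dataset: does its description state rights and provenance at all? The predicate URIs are real Dublin Core Terms and W3C PROV properties, but the `audit` function, the choice of which predicates count, and every example.org URI are my own assumptions for illustration, not an established tool or a rule from the thread.

```python
# Hypothetical sketch: does a dataset description state rights and provenance?
# Triples are plain (subject, predicate, object) tuples of strings.
DCTERMS = "http://purl.org/dc/terms/"
PROV = "http://www.w3.org/ns/prov#"

# Assumption: any one of these predicates counts as a rights statement.
RIGHTS_PREDICATES = {DCTERMS + "license", DCTERMS + "rights"}

# Assumption: any one of these predicates counts as a provenance statement.
PROVENANCE_PREDICATES = {
    DCTERMS + "source",
    DCTERMS + "publisher",
    PROV + "wasDerivedFrom",
    PROV + "wasAttributedTo",
}


def audit(triples, dataset):
    """Report which of the two kinds of metadata are stated for `dataset`."""
    predicates = {p for s, p, o in triples if s == dataset}
    return {
        "rights": bool(predicates & RIGHTS_PREDICATES),
        "provenance": bool(predicates & PROVENANCE_PREDICATES),
    }


# Made-up example description; the example.org URIs do not exist.
DATASET = "http://example.org/dataset/labor-law"
TRIPLES = [
    (DATASET, DCTERMS + "license", "http://creativecommons.org/licenses/by/3.0/"),
    (DATASET, DCTERMS + "publisher", "http://example.org/org/publisher"),
]

report = audit(TRIPLES, DATASET)
print(report)
```

Two triples are enough to pass this (deliberately shallow) check. It only confirms that the statements are present; it says nothing about whether they are accurate or trustworthy.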
We’ve been here before: The problem of missing or ambiguous rights and provenance metadata was an issue back when content (images, audio, etc.) first went online, with a notable difference: content usually has inherent utility without metadata, but data usually doesn’t. Back in the day, some of us used to talk about “copyright as an enabler,” evangelizing the idea that decorating content with useful rights metadata would be a great thing because it would facilitate communications with the people “behind” that content (Esther Dyson’s notion of Intellectual Value). Such an argument really only resonates with responsible derivative users who want to “do the right thing” w.r.t. copyrights, which is likely a tiny percentage of users and producers, and indeed is increasingly moot with the popular use of Creative Commons licenses. With published data, this becomes more critical; many types of data are simply not valid without at least an understanding of their provenance, and usually also whatever rights have been asserted by the creator.
Early on (i.e. 2009…) Leigh Dodds and several others created versions of the LOD cloud illustrating the known rights domains. I think the key argument in the ongoing “No Money in Linked Data” thread is the uncertainty imposed by unknown licensing state, which is clearly a big problem when one studies these diagrams…
My thanks to Michael Pendleton of the EPA for provoking me to write this…
References:
1. Prateek Jain, et al., There’s No Money in Linked Data. Self-published. (2013)