A standard which enables a user to say anything… is not a collaboration solution.
I've recently heard more than once that the solution to cooperation between organizations that work on overlapping datasets is to use RDF, or more generally that the "Semantic Web" and "linked data" are the solution to all data-standardization problems. Don't get me wrong. I don't hate RDF or the ideas of the Semantic Web and linked data. Having said that, they don't solve the problems of inter-institutional collaboration.
RDF essentially prescribes a Subject, Predicate, Direct Object syntax. Basically, that's the grammar of a two-year-old. I don't say that to minimize RDF; I say that to maximize it. RDF has done nothing to standardize anything, any more than XML has standardized data representation. Consider this: if I have to learn all the RDF Predicates a data repository might use (note: RDF Predicates are not really English-language predicates, because an RDF Predicate excludes the direct object and often implies a simple "is-a" verb) and figure out what exactly an author means by those RDF Predicates, then how is that any different from needing to read and understand a set of XML Schema definitions for objects I might find in an XML repository? Let me use a different adjective: how is that any better?
<http://example.org/#spiderman> <http://www.perceive.net/schemas/relationship/enemyOf> <http://example.org/#green-goblin> .
<http://example.org/#uncle-ben> <http://www.perceive.net/schemas/relationship/guardianOf> <http://example.org/#spiderman> .
<character id="spiderman">
  <enemyOf>green-goblin</enemyOf>
  <guardian>uncle-ben</guardian>
</character>
Sure, you've atomized data into bite-sized chunks and can now build very simple data-browsing tools based on these granular chunks of data. I can look up Spiderman and see all known predicates assigned to Spiderman, and if those predicates mostly make human sense to me without forcing me to go read their definitions, then I can click on one of their direct objects and learn all the predicates related to that object, ad infinitum. Cool, right? Sure. Is it more useful for programming a valuable user experience than an XML Schema? Probably not.
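That browsing pattern is simple enough to sketch in a few lines. This is plain Python over hypothetical tuples loosely modeled on the example above, not a real RDF library; it just shows the "look up a subject, then click through a direct object" loop the paragraph describes.

```python
# Hypothetical triples (subject, predicate, direct object), echoing the example.
triples = [
    ("spiderman", "enemyOf", "green-goblin"),
    ("uncle-ben", "guardianOf", "spiderman"),
    ("green-goblin", "enemyOf", "spiderman"),
]

def describe(subject):
    """Return every (predicate, direct object) pair asserted about a subject."""
    return [(p, o) for s, p, o in triples if s == subject]

# Look up Spiderman, then follow a direct object to its own predicates.
for predicate, obj in describe("spiderman"):
    print(predicate, obj)
    for p2, o2 in describe(obj):   # "click" the direct object, ad infinitum
        print("  ", p2, o2)
```

It works, and it generalizes to any dataset of triples, which is exactly the point: the tool is trivial, and it tells you nothing about what "enemyOf" is supposed to mean.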
Don't miss the problem. The hard part of data interoperability is political agreement. It's not a technical problem that hasn't already been solved in many ways. Telling people, "here's a standard which allows you to create noun/predicate/direct-object statements, and here's a browser and search tool for that syntax," and thinking you've solved any real problem without getting people to agree on exactly which RDF Predicates to use and what they mean, without forcing them to adopt a unique object-identification system, and without getting them to agree on exactly what each of those object identifiers specifies, but then calling it an interoperability standard, is worse than TEI. I mean, I love TEI, but no one uses TEI in any standards-compliant way that allows real, useful interoperability, because TEI gives a user the freedom to build their own schema from many different modules of tags. I can guess why, because I've been there before: the reason the tags are not more strictly defined is that participating members of the standards team had heartfelt and different usages in mind for how they would apply those tags. The end result is that there is basically no interoperability. There is familiarity, but that's not the same thing. The hard work EpiDoc has done to wrangle political agreement on a subset of the TEI, with strict usage definitions shared between organizations, that is work toward inter-institutional collaboration.

Back to RDF and the Semantic Web: sure, work has been done to define very large "vocabularies" and "ontologies" for specialized domains (if you find an authoritative source with a clear definition of the difference between "ontology" and "vocabulary", send me an email)...
This is parallel to the work of specializing XML into the TEI, and it is a step toward interoperability. But are we in the same place as TEI, with different organizations opting to use different subsets of these ontologies? Are the common elements which institutions might share used in the same way? Do the same noun instances common to these organizations use the same unique key? In short, do we have a large body of data available in one of these ontologies (besides the ontologies themselves)? Do we have TWO large bodies of data from TWO different institutions available in the same ontology, both using all the terms in exactly the same way (parallel: TEI -> EpiDoc) and identifying proper nouns with exactly the same keys? Do we have any specialized (= useful) software systems developed on top of this ontology which work with BOTH datasets? These are the hard parts, the time-consuming parts, of inter-institutional collaboration, and they are not strictly technical in nature.
Yeah, so what exactly am I saying? I am saying that once you adopt a unique naming scheme for objects and have multiple institutions agree on that naming scheme and on what exactly those objects mean; once you specifically define the predicates which can be used with those object types (e.g., adjectives and relationships) and get more than one institution to agree to actually adopt and implement that schema internally; and once you finally convince them to make those resources available to other institutions, then you've developed a useful standard for interoperability. And then that standard can be described in RDF or XML Schema or a number of other ways. Saying that you've adopted RDF is like saying you've adopted XML or JSON. Are RDF, XML, and JSON all standards? Sure. Does simply adopting RDF, XML, or JSON mean that you are interoperable? It doesn't even mean that you've begun.
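To make that last point concrete, here is a minimal sketch, with entirely hypothetical predicate names and identifiers, of why shared syntax alone buys nothing. Two institutions both emit well-formed triples, but they use different predicate names and different keys for the same entities, so a naive merge finds no common assertions until someone does the agreement work and encodes it as a mapping.

```python
# Both institutions "use RDF" (triples), but agree on nothing else.
# All predicate names and identifiers below are invented for illustration.
inst_a = [("persons/123", "authorOf", "works/iliad")]
inst_b = [("people/homer", "wrote", "texts/iliad-1")]

merged = set(inst_a) | set(inst_b)
# A query for one institution's predicate silently misses the other's data.
authored = [t for t in merged if t[1] == "authorOf"]

# The political work: agreeing on shared predicates and shared object keys.
predicate_map = {"wrote": "authorOf"}
id_map = {"people/homer": "persons/123", "texts/iliad-1": "works/iliad"}

def normalize(triple):
    """Rewrite a triple into the agreed vocabulary and identifier scheme."""
    s, p, o = triple
    return (id_map.get(s, s), predicate_map.get(p, p), id_map.get(o, o))

aligned = {normalize(t) for t in merged}
# Only after normalization do the two institutions' assertions coincide.
```

The `normalize` function is trivial; building and enforcing `predicate_map` and `id_map` across institutions is the actual standard.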