Industry embraces the semantic web! Just like us, only different…

Google, Bing, and Yahoo! announced an initiative on June 2, 2011 to create Schema.org, a web site that will promote standard ways of adding machine-readable (semantic) data to all of our web pages. As a validation of the semantic web, this is great news.

However, they are supporting a different format for exposing semantic data than the one we use in biomedical informatics, so what does that mean for our way of doing things? More specifically, we have become proponents of RDFa as the “serialization format” for exposing our semantic data. RDFa is how VIVO exposes semantic data. It is how future versions (and the current Harvard version) of Profiles will expose semantic data, and it is intrinsically tied to our support of machine-readable ontologies such as VIVO and FOAF.

Schema.org will support a different serialization format known as “microdata”. Some see this as the possible death of RDFa: http://graveshow.com/blog/tutorials/web-design/death-rdfa. Others are not sure whether it is a threat or an opportunity: http://bnode.org/blog/2011/06/06/schema-org-threat-or-opportunity. At least one person thinks this is actually good for RDFa: http://planet.linkeddata.org/. The Schema.org creators are aware of the controversy they have created by supporting microdata rather than RDFa, and they do a good job of explaining their decisions here: http://schema.org/docs/faq.html.

The general consensus seems to be that RDFa is in many ways a more complete solution for semantic expression than microdata, but that RDFa is difficult and intimidating for developers to grasp and therefore suffers from poor adoption outside of certain niche fields (such as BioMed2.0). We don’t mind handling the difficulty of RDFa because our field has already forced us to deal with the challenges of sharing large, complex data sets and to wrap our heads around ontologies and other semantic concepts.

One way to interpret this would be to say that what we are doing with the semantic web in BioMed is great, and that we should continue down our path while industry takes a baby step into the semantic web with the more pragmatic, if less complete, microdata approach. On the surface, this would seem like a fine solution. The problem is that industry and BioMed are now on different paths. For those people (like us at UCSF) who want to combine the best technical solutions from industry with the best technical solutions from academia and research, this can be a problem.

In particular, at UCSF we want to combine our “academic” BioMed semantic web solutions with the “industry” OpenSocial specification to create a better way to publish and share data-rich applications than either of those technologies can support on its own today. In pursuing this we’re already seeing issues with bridging RDFa into the JSON-centric world of OpenSocial. Mapping RDFa to JSON is a tough problem to solve, and a number of solutions have been proposed (search RDFa and JSON) without any clear winner. With microdata, however, going from semantic web to JSON/OpenSocial might not be as hard. Given industry’s preference for pragmatism over elegance, and the recognition that JSON is THE dominant data exchange method on the web today, this would hardly be surprising. So, for some of us this “support of the semantic web & simultaneous challenge to RDFa” may be good news after all!
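
To give a feel for why microdata to JSON might be the easier road, here is a minimal sketch in Python. It is not how VIVO, Profiles, or OpenSocial do anything; it is our own illustration. The SAMPLE markup, the class name MicrodataToJson, and the function microdata_to_json are all hypothetical, and the output shape (a list of items, each with a "type" and "properties") is only loosely modeled on the JSON conversion described in the microdata specification. It assumes well-formed markup and ignores features such as itemid and itemref.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample page fragment, made up for this sketch.
SAMPLE = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <a itemprop="url" href="http://example.org/jane">profile</a>
  <div itemprop="affiliation" itemscope
       itemtype="http://schema.org/Organization">
    <span itemprop="name">UCSF</span>
  </div>
</div>
"""

# Elements whose property value lives in an attribute, not in their text.
ATTR_VALUE = {"a": "href", "link": "href", "img": "src", "meta": "content"}
# Elements that never get a closing tag.
VOID = {"meta", "link", "img", "br", "hr", "input"}


class MicrodataToJson(HTMLParser):
    """Simplified microdata reader (assumes well-formed, properly nested HTML)."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items found so far
        self.scopes = []  # stack of items whose itemscope is still open
        self.open = []    # stack of (opened_scope, prop_names, text_buffer)

    def _add(self, names, value):
        # Attach a value to every named property of the innermost open item.
        if self.scopes:
            for name in names:
                self.scopes[-1]["properties"].setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        names = (attrs.get("itemprop") or "").split()
        opened, text = False, None
        if "itemscope" in attrs:
            item = {"type": (attrs.get("itemtype") or "").split(),
                    "properties": {}}
            if names and self.scopes:
                self._add(names, item)   # nested item becomes a property value
            else:
                self.items.append(item)  # top-level item
            self.scopes.append(item)
            opened = True
        elif names:
            if tag in ATTR_VALUE:
                self._add(names, attrs.get(ATTR_VALUE[tag]) or "")
            else:
                text = []                # collect text until the tag closes
        if tag not in VOID:
            self.open.append((opened, names, text))

    def handle_data(self, data):
        # Feed text to every open text-valued property.
        for _, _, text in self.open:
            if text is not None:
                text.append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self.open:
            return
        opened, names, text = self.open.pop()
        if opened:
            self.scopes.pop()
        elif text is not None:
            self._add(names, "".join(text).strip())


def microdata_to_json(html):
    """Return a JSON-ready dict describing the microdata items in html."""
    parser = MicrodataToJson()
    parser.feed(html)
    parser.close()
    return {"items": parser.items}


if __name__ == "__main__":
    print(json.dumps(microdata_to_json(SAMPLE), indent=2))
```

The point of the sketch is that a microdata item is already a typed bag of named values, so it drops into a JSON object with almost no interpretation. RDFa describes a graph of triples, and turning a graph into a JSON tree forces choices (where to root it, how to handle shared nodes and prefixes) that the proposed RDFa-to-JSON mappings each answer differently.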

Now we just need to deal with the very real problem of getting VIVO, Profiles and the rest of our BioMed2.0 systems to produce microdata as well as RDFa.  And why not?  Supporting one format, even by mandate, does not mean you shouldn’t support another.  If you want to share data and ideas, which we say we want to do, then the more the merrier.