Keys to a Successful Data Repository

Recently, Cameron Neylon posted an interesting article on his blog, reflecting on some of the challenges in building a data repository:

One of the problems with many efforts in this space is how they are conceived and sold as the user. “Making it easy to put your data on the web” and “helping others to find your data” solve problems that most researchers don’t think they have. Most researchers don’t want to share at all, preferring to retain as much of an advantage through secrecy as possible. Those who do see a value in sharing are for the most part highly skeptical that the vast majority of research data can be used outside the lab in which it was generated. The small remainder who see a value in wider research data sharing are painfully aware of how much work it is to make that data useful.

A successful data repository system will start by solving a different problem, a problem that all researchers recognize they have, and will then nudge the users into doing the additional work of recording or allowing the capture of the metadata that could make that data useful to other researchers. Finally it will quietly encourage them to make the data accessible to other researchers. Both the nudge and the encouragement will arise by offering back to the user immediate benefits in the form of automated processing, derived data products, or other more incentives.

He goes on to argue that such a system needs to be as simple and as automated as possible, and he mentions a few tools that could help in this process. All in all, it is required reading for anyone interested in this space.