Keys to a Successful Data Repository

Recently, Cameron Neylon posted an interesting article on his blog, reflecting on some of the challenges in building a data repository:

One of the problems with many efforts in this space is how they are conceived and sold to the user. “Making it easy to put your data on the web” and “helping others to find your data” solve problems that most researchers don’t think they have. Most researchers don’t want to share at all, preferring to retain as much of an advantage through secrecy as possible. Those who do see a value in sharing are for the most part highly skeptical that the vast majority of research data can be used outside the lab in which it was generated. The small remainder who see a value in wider research data sharing are painfully aware of how much work it is to make that data useful.

A successful data repository system will start by solving a different problem, a problem that all researchers recognize they have, and will then nudge the users into doing the additional work of recording or allowing the capture of the metadata that could make that data useful to other researchers. Finally it will quietly encourage them to make the data accessible to other researchers. Both the nudge and the encouragement will arise by offering back to the user immediate benefits in the form of automated processing, derived data products, or other more incentives.

He goes on to argue that the system needs to be as simple and as automated as possible, and he mentions a few tools that could help in this process. All in all, it is required reading for those of us interested in this domain.

ScienceSeeker – A Better Way to Keep Informed?

For those of us interested in staying on top of the latest news in science, reading blogs can be a daunting task. As Bora Zivkovic recently wrote on the Scientific American blog:

Over the years, the science blogosphere exploded in size. There are now thousands of science blogs (in many languages) and nobody can keep up with all of them. Thus, by this time last year, was containing only a miniscule proportion of the science blogging community, and it is quite possible that it was not as representative as it used to be. Yet it was still a one-stop-shopping destination for many, including for the media.

Earlier this year, an interesting new science blog aggregator site started.  Called ScienceSeeker, it attempts to collect “science reporting, analysis, and discussion” in one place.  As they write on their “About” page:

ScienceSeeker is our effort to fill that void. We have collected hundreds of blogs in one place, and invite you to submit even more. Our goal is to be the world’s most comprehensive aggregator of science discussions, all organized by topic.

This site is a work in progress. Consider it to be the first step in our effort. Blogs are categorized according to a fixed list of topics. You can see lists of posts from those blogs, but since many bloggers have wide-ranging interests, some of the topics might not quite fit. Ultimately we plan on categorizing not by blog, but by individual post. We hope to have other ways of arranging posts as well: just the best posts, chosen by experts; the most popular posts; posts about particular events.

Do sites like this help get information out faster, or does it all get lost in the noise?  How do you keep track of the myriad of blog postings?

Upcoming talk on Open Science by Michael Nielsen

For those of us interested in open science, Dr. Michael Nielsen will be speaking in San Francisco later this month. Dr. Nielsen is a leading advocate in this field, and his book, “Reinventing Discovery,” will be published later this year. Here’s some information about his upcoming talk:

The net is transforming many aspects of our society, from finance to friendship.  And yet scientists, who helped create the net, are extremely conservative in how they use it.  Although the net has great potential to transform science, most scientists remain stuck in a centuries-old system for the construction of knowledge. Michael will describe some leading-edge projects that show how online tools can radically change and improve science using projects in Mathematics and Citizen Science as examples, and he will then go on to discuss why these tools haven’t spread to all corners of science, and how we can change that. [via]

The wine, beer, and cheese event will be held at the Public Library of Science on June 29th at 6pm. The event is free and open to the public, but people who plan to attend are asked to RSVP.

Open Notebook Science

Given our recent posting regarding project and document management, along with a number of postings on open source data, readers might be interested in learning more about a movement that takes open source to a basic level. As described in Wikipedia:

Open Notebook Science is the practice of making the entire primary record of a research project publicly available online as it is recorded. This involves placing the personal, or laboratory, notebook of the researcher online along with all raw and processed data, and any associated material, as this material is generated. The approach may be summed up by the slogan ‘no insider information’.

While not everyone thinks this is a great idea, a number of labs in a variety of disciplines have begun to embrace the concept. As with the Creative Commons movement, there are a number of ways to implement open science in your lab (with associated logos, of course!).

So, does open notebook science have a place in biomedical research, and does it have a role in translational science?

Pharma and Social Media

Pharmaceutical companies continue to struggle with patient interactions in today’s social media environment.  While a number of pharma and biotech firms have a presence on social platforms, the conversation has traditionally been one-sided.  The companies speak, and the consumer can only listen.  However, that’s now starting to shift.

Pharma brand marketers that disable comments on their Facebook pages are in for a change. As predicted, Facebook will no longer allow pharma brands – which are typically highly risk averse when it comes to discussions about their drugs and products in social media environments – to turn off commenting on their pages.[via]

Part of the challenge is a regulatory one. Industry continues to wait for guidance from the FDA on how social media should and should not be used. Although the FDA held a hearing on this topic back in 2009, it continues to delay issuing any guidance (most recently expected in Q1 2011, a target that came and went).

For now, it seems that pharma and the social media providers must continue to work this out themselves.

Google for Data?

When we think of searching the web for information, our thoughts (or at least mine) usually turn to Google.  However, if you’re looking for numeric data rather than text, a new search engine called “Zanran” might be a better place to start.

Zanran helps you to find ‘semi-structured’ data on the web. This is the numerical data that people have presented as graphs and tables and charts. For example, the data could be a graph in a PDF report, or a table in an Excel spreadsheet, or a barchart shown as an image in an HTML page. This huge amount of information can be difficult to find using conventional search engines, which are focused primarily on finding text rather than graphs, tables and bar charts. [via]

One nice trick: Hover your mouse over the icon on the left-hand side of the search results, and you’ll see a preview image containing your search term.

Open Source Genetics

We’re familiar with open source software and open source data.  Now it looks like we need to add open source molecular biology to the list.

The same concepts that have lead to open source rockin the software world have spawned the beginning of a revolution in biotech. An organization called Biofab, funded by the NSF and run through teams at Stanford and Berkeley, is applying open development approaches to creating building blocks (BioBricks™ from BioBricks Foundation) for the bio products of the future. Now, the first of those building blocks based on E. coli are just rolling off the production line. This, according to the organizers, represents “a new paradigm for biological research.” (via)

