Having recently left an academic post, I’ve been thinking about what will happen to the unpublished data I collected in my previous role. Will it, like so much data, end up stuck in the limbo of the proverbial ‘file drawer’?
The ‘file drawer problem’ is generally understood to mean “the bias introduced into the scientific literature by selective publication – chiefly by a tendency to publish positive results but not to publish negative or nonconfirmatory results.”
However, while selective publication based on results is a big problem, even positive data can end up unpublished. If no-one gets round to analyzing some data at all, no-one will know whether it’s positive or negative, and it will vanish into obscurity, not because of the ‘file drawer problem’ but just because researchers only have so much time.
I would estimate that over the course of my research career, at least 25% of the data I’ve collected was never published, and in most cases it was never seriously analyzed. What happened? Well, life happened. For instance, I conducted a study as part of my PhD work that I’ll call the Orphan project.
Orphan was intended to form the final part of my thesis, but in the end my other studies provided sufficient results and it was left out. So analyzing the Orphan data was never a priority during my PhD: I had to focus on the thesis studies.
I’d only managed to make a start on Orphan’s analysis by the time I’d got my PhD and moved away to start out as a postdoc in a different lab. Now that I had a new job to worry about, I had no time to finish Orphan, although I made a couple of abortive attempts. My old lab had their own data to deal with. So the data remained on my portable hard drive, literally gathering dust, and now I can’t even remember where that hard drive is.
This is a sad story. The volunteers who gave their time to participate did so in vain, and all my work was in vain too. Orphan wasn’t a great study, but it wasn’t terrible. It was no less deserving of publication than most of my published work.
I know that my lost project is far from unique. It’s hard to know just how much scientific data gets lost in limbo, since by definition it leaves no public traces, but it’s fair to say it’s not uncommon. Even when a study is published, some parts of the work often never make it into the papers. Researchers love adding extra measures to studies but don’t always have time to analyze all the data this generates.
So what can we do about this? Well, if the problem is that studies are being orphaned, why not create an adoption service?
There are lots of researchers who would love to get their hands on a particular kind of data, but who lack the resources to collect it. What if we could connect the people with too much data, and the people who need more?
I’m picturing a site where researchers could (perhaps anonymously at this stage) post a brief description of datasets that they have no time to analyze. Interested researchers could then contact those with data, and hopefully a collaboration would result, in which the data was shared, analyzed and published, saving it from oblivion.
Now, it could be said that we don’t need such a system; we just need open data. Why don’t researchers with surplus data just post it online so that anyone can access it and work with it? I agree that this is a great solution. But some researchers, for whatever reason, don’t want to make their data fully public. I think many of these would be open to handing their data to a named researcher in the framework of an agreed collaboration.
I’m not aware of any existing ‘adoption service for data’ of this kind (or ‘Tinder for data’ if you prefer). Perhaps the closest thing I know of is PsychFileDrawer, but it is mainly focused on letting people share unpublished replication datasets.