99% Perspiration

Recently I’ve been having a lot of conversations about how to create incentives for open science.  As we continue to make progress on the technical challenges around open data and collaborative analytics, it’s becoming ever more apparent that changing the motivations and culture of science is the real barrier.  Even at Sage Bionetworks, where openness and transparency in research are a foundation of the culture, it is still often the case that “papers are the currency of science”.  Internal and external projects often require complex politics to break work into “publishable units” and define first and last authorship.

With Synapse, one of the areas we are being driven to develop more fully is the notion of attribution tracking.  The basic argument I get is “Papers are too coarse-grained and slow to be published.  We need finer-grained attribution tracking in real time so we can see everyone’s contribution.  That will allow people to get credit for smaller pieces of work, driving faster science.”  When I’ve used the “GitHub for Biology” analogy to explain Synapse to scientists, the thing many latch on to is the notion that a digital record of their work could be used to drive career advancement, such as tenure decisions.  There’s a lot of interest in features allowing users to pull together the full history of their work and quantify its impact.  For example, what if every Synapse project page gave you the list of all the people who contributed to generating and processing the data before it entered the project, a sort of automatically generated reference list?  The knowledge that the software would drive complete attribution for downstream work might draw a lot more interest in contributing well-curated data to the Synapse Commons.
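
To make the “automatic references” idea a bit more concrete, here is a minimal Python sketch of how such a list might be assembled. It assumes each dataset carries a record of who worked on it and which datasets it was derived from, and simply walks that chain upstream. The names `ProvenanceRecord` and `upstream_contributors`, and all of the example data, are hypothetical and made up for illustration; this is not the Synapse API or how Synapse stores provenance.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ProvenanceRecord:
    """One node in a hypothetical provenance graph (not a Synapse type)."""
    contributors: list[str]
    derived_from: list[str] = field(default_factory=list)


def upstream_contributors(graph: dict[str, ProvenanceRecord], start: str) -> list[str]:
    """Walk upstream from `start` and return every contributor, in first-seen order."""
    seen_ids = {start}
    queue = deque([start])
    credited: list[str] = []
    while queue:
        record = graph[queue.popleft()]
        for person in record.contributors:
            if person not in credited:
                credited.append(person)
        for parent in record.derived_from:
            # Skip inputs already visited so shared ancestors are counted once.
            if parent not in seen_ids:
                seen_ids.add(parent)
                queue.append(parent)
    return credited


# Made-up example: two raw inputs, one curation step, one analysis step.
graph = {
    "raw_expression": ProvenanceRecord(["Lab A"]),
    "clinical_table": ProvenanceRecord(["Lab B"]),
    "normalized": ProvenanceRecord(["Curator C"], ["raw_expression"]),
    "model_input": ProvenanceRecord(["Analyst D", "Curator C"], ["normalized", "clinical_table"]),
}

print(upstream_contributors(graph, "model_input"))
```

Note that a sketch like this only lists who touched the data; it says nothing about how much each person contributed, which is the harder question.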

Of course, this idea can be carried to extremes.  One of the more worrying comments I got recently was that we should try to extend attribution tracking upstream to the point where we could capture the “generation of the key ideas” that could drive future work.  Basically, a scientist wanted to record pure ideas so that any time anyone else actually did the work in that area, he could take the credit for coming up with the idea.  I had a flashback to an old Onion story about Microsoft patenting “Ones and Zeros”.  Pure ideas are cheap; making them work is hard.  If genius is “1% inspiration and 99% perspiration”, I think attribution clearly needs to be focused on the 99%.

I’m also a little worried that we’re over-emphasizing the notion that work needs to be clearly attributed to a single author.  GitHub, like other software development tools, tracks work not primarily to create attribution but because fine-grained tracking of code changes is often useful in resolving issues and getting work done more efficiently.  People were using version control systems long before GitHub took the lead in connecting your check-in history to your online profile.  Even today, building a profile is a side effect of doing the actual work of writing software.  Many of the most effective engineers I know contribute far more to a project indirectly, through their interactions with their teammates, than directly through lines of code added to the code base.

Much of a software engineer’s reputation and career advancement comes from the projects he’s worked on and the companies he’s worked for.  The best ones put the overall success of the project before their own need to attach their name to the work, because they get excited by the potential of the project to change the world.  This is not unique to software engineering: I’m sure the engineers on the Apollo or Manhattan projects were also driven by the enormous vision and scope of the projects.  Trying to cut these projects up into “publishable units” where everyone got a few pieces would only have prevented their success.  I feel this kind of shared, large-scale project is something we desperately need in the area of human health research.  Hopefully, our recently launched Sage / DREAM challenge on predictive modeling of breast cancer will help us start to figure out how to create these sorts of driver projects around human health research.


About Michael Kellen

I've spent my career working at the intersection of science and technology. Currently I lead the technology team at Sage Bionetworks, but this blog contains my own thoughts.