Publishers often argue that the industry has undergone a massive transformation in the last 20 years, moving smoothly, swiftly and effectively from print to digital. Yet our most vocal critics within academia frequently accuse the industry of being antiquated and failing to meet researchers’ needs, while many of our more recent attempts at innovation have, as Sarah Andrus noted in a post on The Scholarly Kitchen [1], failed to find an enthusiastic audience. Having spent the better part of two decades in traditional publishing, I think about these issues often. The simplest explanation I can offer is that in publishing’s move from print to digital, little has genuinely changed beyond the method of delivery.
PDFs – Replicating the Strength of Print at the Expense of Innovation
When publishers began their switch to digital in the 1990s, they started by recreating existing print artifacts in digital media: the printed journal article became the PDF, with the fixed layout of this new digital format exactly mirroring the printed original. Instead of this hybrid format serving merely as an early, transitory stage in the development of more highly functional digital formats, however, the PDF has remained the most popular format among researchers despite the existence of newer, more fully featured interactive equivalents. Its lasting popularity derives from those qualities it shares with its print predecessors: it is both fixed and available offline through saving and printing.
By focusing so intently on replicating the strengths of print formats, however, publishers and researchers have restricted themselves to reproducing in their digital formats only those elements of the research process that print can convey: primarily, the results obtained and the conclusions drawn from them. And though communicating results has long been the accepted role of publishing within the scientific process, in light of opportunities opened up by digital delivery and the internet, we no longer need to restrict publication to this limited stage of the process.
Today, with advances in technology that enable us to distribute far more than just words on pages, we can push the boundaries further to capture, connect, and circulate different research artifacts that substantiate the scientific process to facilitate greater insight. Much of what occurs before publication – the experimentation, the analysis, and the possibility that results will disprove the hypothesis – can now be shared. So, too, can the conversation that takes place around that research. Our current publishing process produces:
“Not the scholarship itself [but] merely advertising of the scholarship” – Jon Claerbout, Cecil Green Professor Emeritus of Geophysics at Stanford University [2]
However, we now have the capability to expand what we mean by publication so that it includes what Claerbout terms “the actual scholarship” – which, in the case of his own field of research, comprises “the complete software development environment and the complete set of instructions which generated the figures.”
Open Data Movement – Publishing the Data Behind the Research
Increasing demands for transparency and reproducibility in research have resulted in a widening of what we consider publishable research outputs. The open data movement has seen the data behind research being shared in repositories and increasingly recognized as a publishable output. By curating and distributing this wider range of research outputs, we can expose more of the research process to fellow researchers, enabling deeper and more dynamic engagement. By assembling more of the elements that bring researchers to conclusions in their published articles – data, code, lab notebooks, protocols, reagents, annotations, and referenced work – we can create a web of interconnected research objects that better facilitates the process of science.
It is not enough for us merely to facilitate the publication of a wider range of scholarly outputs, however. We must also offer tools that enable researchers to interact with them. Publishers first established their place within the scholarly ecosystem by doing certain things more efficiently, specifically circulating research more widely than scholars could do on their own. Now that anyone with an internet connection can post their research online, the value of our contribution to that ecosystem is increasingly being called into question, and we must focus on what we can do within this new environment to benefit researchers, smooth their scientific processes, and accelerate the pace of research.
Turning a Scientific Paper into a Research Tool
To justify our existence, then, we must enable researchers to do more with these outputs than simply read or write about them. By turning the research paper itself into a functional research tool, we can offer greater value to the community.
Code offers one such opportunity: alongside data, it, too, is rapidly becoming a central component of the scientific record, often turning data into something functional and constructive. As data increase in complexity and become more integral to the work, researchers need to deploy code for analysis, and curate that code in an executable fashion. Successfully running another researcher’s code on one’s own computer is no small task, however. Code is not universal, and researchers develop algorithms, software simulations, and analyses in different programming languages, which can have multiple versions, further complicating the task. Analyses also depend on different files, packages, scripts, installers, and more, making the process of getting code running successfully both time-consuming and complex.
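To make the dependency problem concrete, here is a minimal sketch of one common first step toward reproducibility: recording the exact interpreter and package versions an analysis ran under, so that another researcher can attempt to reconstruct the same environment. The function name and the package list passed to it are illustrative, not part of any particular platform's API:

```python
import sys
import importlib.metadata


def environment_manifest(packages):
    """Return a dict pinning the Python version and the installed
    version of each named package (or noting that it is absent)."""
    manifest = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            manifest[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            manifest[name] = "not installed"
    return manifest


# Example: record the environment an analysis depends on.
print(environment_manifest(["pip"]))
```

Even a simple manifest like this only documents the environment; fully reproducing it still requires reinstalling those exact versions on a compatible system, which is precisely the burden that packaged, executable environments aim to remove.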
This presents considerable challenges for those who need to run that code to establish its validity, as well as that of the underlying data. It can also create limitations for those researchers looking to reuse that code for furthering their own experiments. The same problems can occur for complex datasets and different types of analyses.
Peer review of these sorts of research outputs is another stumbling block. Increasingly, journals are interested in reviewing data and code as part of the article acceptance process. Nature [3] has described reviewing code as “cumbersome” since it “requires authors to compile the code in a format that is accessible for others to check, and reviewers to download the code and data, set up the computational environment in their own computer and install the many dependencies that are often required to make it all work.” Again, this is a point where publishers can provide the tools needed to ease author and reviewer burden.
Code Ocean – Making Code, Data, Environment and Results Accessible
For code, our approach at Code Ocean is based around providing self-contained, executable Compute Capsules™ that include the code, data, results, and run environment within an article of record. This approach both saves researchers time and provides them with the interactivity needed to facilitate reuse and collaboration. It’s just one innovation among many that aim to expand what we mean by “publication” to incorporate more of the scientific process, to ease the burden on scholars, and to broaden the concept of what the research paper can offer the community.
Similar initiatives around data, methodologies and annotation are already saving researchers time and effort while adding value to the publication process and the services offered by publishers. In doing so, they remind us that the biggest benefit publishers have always offered researchers is not so much circulating their research as performing the tasks that need doing – whatever they may be – more effectively than the researcher can on their own.
If you have any questions about Code Ocean and how we work with academics and publishers, please contact info@codeocean.com.
References
1. Sarah Andrus, “Is the Research Article Immune to Innovation?”, The Scholarly Kitchen, July 3, 2018, https://scholarlykitchen.sspnet.org/2018/07/03/guest-post-research-article-immune-innovation/
2. Quoted in: Jonathan B. Buckheit and David L. Donoho, “WaveLab and Reproducible Research”, 1995, http://statweb.stanford.edu/~wavelab/Wavelab_850/wavelab.pdf
3. Mark Staniland, “Nature Research journals trial new tools to enhance code peer review and publication”, August 1, 2018, http://blogs.nature.com/ofschemesandmemes/2018/08/01/nature-research-journals-trial-new-tools-to-enhance-code-peer-review-and-publication