Duplicating Compute Capsules in Code Ocean
In bioinformatics and computational biology, solid workflows are difficult to set up. Advanced analyses often require several computational environments, different languages, and multiple libraries. To get to a working computational workflow faster, researchers often take a shortcut: past code adoption. They search the scientific literature for work with an objective similar to their own, then try to find the code used in that paper and tailor it to fit their own workflow.
The idea behind it is simple: let’s not reinvent the wheel. If the code is already out there, there is no need to recreate it.
This approach is very common in computational biology and can boost collaboration and keep computational biologists up-to-date on the latest developments in their field.
The Challenges with Past Code Adoption
Reusing existing code that is available in open source repositories such as GitHub seems simple enough, but adopting somebody else’s work is not without challenges. There are many small obstacles, each of which may not cause problems on its own, but together they can break your workflow.
Frequent sources of problems include:
Obstacles like these can make adopting an otherwise ideal workflow impossible, or at least incredibly time-consuming. As a result, more computational researchers end up reinventing the wheel rather than struggling with existing solutions.
What if the code was guaranteed to run?
Addressing the challenge created by non-reproducible code was a key motivator behind the creation of the Code Ocean platform, and specifically of Compute Capsules©. Ideally, example code should run anytime, anywhere, regardless of who uses it and how much time has passed since it was last run. Running it should take a few clicks at most, and should require no DevOps experience, no IT support team, and no late-night troubleshooting sessions.
Compute Capsules were purpose-designed to enable sharing of computational research, including sharing and reuse of existing workflows and pipelines. Compute Capsules are self-contained research assets that contain all the code, environment, and data required to execute a workflow, along with all the associated results. Each Compute Capsule is, in effect, also a time capsule in which the code is preserved with all dependencies intact and the environment unchanged. A new user can reproduce past results simply by rerunning the analysis with one click.
This exact preservation of code, environment, data and results is a critically important characteristic in the context of past code adoption: the intact, working, runnable code can be easily modified and edited by cloning or duplicating a Capsule. That allows researchers to build on existing work without time-consuming fixes.
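To make the idea of a preserved environment concrete, here is a minimal Python sketch of recording which package versions were installed when an analysis ran, and later checking whether the current machine still matches. This is only an illustration of the general principle and not how Code Ocean implements Compute Capsules; the function names `snapshot_environment` and `diff_environments` are hypothetical.

```python
import platform
from importlib import metadata


def snapshot_environment():
    """Record the Python version and every installed package as name==version."""
    packages = sorted({
        f"{dist.metadata.get('Name')}=={dist.version}"
        for dist in metadata.distributions()
        if dist.metadata.get("Name")  # skip installs that lack a package name
    })
    return {"python": platform.python_version(), "packages": packages}


def diff_environments(recorded, current):
    """Compare two snapshots and list the package pins that differ."""
    recorded_set = set(recorded["packages"])
    current_set = set(current["packages"])
    return {
        "missing": sorted(recorded_set - current_set),      # pinned then, absent now
        "unexpected": sorted(current_set - recorded_set),   # present now, not pinned then
    }
```

Comparing a snapshot stored alongside the original code with a fresh snapshot on a new machine shows the kind of version drift that a preserved environment avoids.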
The screenshots below show the difference between rerunning code with and without Code Ocean.
Comparison 1: Adopting code in a Code Ocean Compute Capsule is very similar to doing so on GitHub, so duplicating workflows involves no learning curve for the user. Both platforms have actions to clone the code. However, duplicating a Code Ocean Capsule also ensures the original environment and parameters are saved for immediate reuse.
Comparison 2: The first two pictures are examples of the countless errors, e.g. dependency issues or machine limitations, that can occur when trying to run example code. Code Ocean Compute Capsules make this easy: by clicking “Reproducible Run”, users can run code from top to bottom with a single click.
Key Points
For more information about the Code Ocean platform, please go to codeocean.com/product, or go to /explore to view public research Compute Capsules.