Duplicating Compute Capsules in Code Ocean
In bioinformatics and computational biology, robust workflows are difficult to set up. Advanced analyses often require several computational environments, different languages, and multiple libraries. To get to a working computational workflow faster, researchers often take a shortcut: past code adoption. They search the scientific literature for work with an objective similar to their own, then try to find the code used in that paper and tailor it to fit their own workflow.
The idea behind it is simple: let’s not reinvent the wheel. If the code is already out there, there is no need to recreate it.
This approach is very common in computational biology. It can boost collaboration and keep computational biologists up to date on the latest developments in their field.
The Challenges with Past Code Adoption
Reusing existing code that is available in open-source repositories such as GitHub seems simple enough, but adopting somebody else’s work is not without challenges. There are many small obstacles, each of which may be harmless on its own, but together they can break your workflow.
Frequent sources of problems include:
- Lack of the required computational resources, e.g., if the code requires a specific NVIDIA GPU or CPU to handle single-cell genomics workflows.
- Dependency issues, e.g., if the code relies on out-of-date or deprecated libraries that can no longer be accessed.
- Undetected bugs in a particular code version, or performance issues that depend on the machine.
Obstacles like these can make adopting an otherwise ideal workflow impossible, or at least incredibly time-consuming. As a result, more computational researchers end up reinventing the wheel rather than struggling with existing solutions.
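The dependency and resource problems listed above can often be detected before any adopted code is run. The following is a minimal sketch of such a pre-flight check using only the Python standard library. The helper names (`check_requirements`, `has_nvidia_tooling`) and the pinned package versions are hypothetical and chosen for illustration, and looking for `nvidia-smi` on the PATH is only a rough heuristic for GPU availability.

```python
import shutil
from importlib import metadata


def check_requirements(required):
    """Return a list of problems for a {package_name: version_or_None} map.

    A version of None means any installed version is acceptable.
    """
    problems = []
    for name, wanted in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if wanted is not None and installed != wanted:
            problems.append(f"{name}: found {installed}, expected {wanted}")
    return problems


def has_nvidia_tooling():
    """Heuristic only: report whether the nvidia-smi executable is on PATH."""
    return shutil.which("nvidia-smi") is not None


if __name__ == "__main__":
    # Hypothetical pins for illustration; a real project would read its own
    # requirements file instead.
    required = {"numpy": None, "scanpy": "1.9.3"}
    for problem in check_requirements(required):
        print("dependency problem:", problem)
    if not has_nvidia_tooling():
        print("resource problem: nvidia-smi not found; GPU code may not run")
```

A check like this only reports mismatches. It does not repair them, which is where most of the time goes when adopting someone else's code.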
What if the code was guaranteed to run?
Addressing the challenge of non-reproducible code was a key motivator behind the creation of the Code Ocean platform, and specifically of Compute Capsules©. Ideally, example code should run anytime, anywhere, regardless of who uses it and how much time has passed since it was last run. Running it should take a few clicks at most, with no DevOps experience, no IT support team, and no late-night troubleshooting sessions.
Compute Capsules were purpose-designed to enable sharing of computational research, including sharing and reuse of existing workflows and pipelines. Compute Capsules are self-contained research assets that contain all the code, environment, and data required to execute a workflow, along with all the associated results. Each Compute Capsule is, in effect, also a time capsule in which the code is preserved with all dependencies intact and the environment unchanged. A new user can reproduce past results simply by rerunning the analysis with one click.
This exact preservation of code, environment, data and results is a critically important characteristic in the context of past code adoption: the intact, working, runnable code can be easily modified and edited by cloning or duplicating a Capsule. That allows researchers to build on existing work without time-consuming fixes.
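To make the idea of an unchanged environment concrete, here is a minimal sketch of recording installed package versions and later checking whether they have drifted. It is a generic standard-library illustration, not a description of how Compute Capsules are implemented. The function names `snapshot_environment` and `diff_snapshots` are hypothetical.

```python
from importlib import metadata


def snapshot_environment():
    """Map each installed distribution name to its installed version."""
    snapshot = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that report no name
            snapshot[name] = dist.version
    return snapshot


def diff_snapshots(recorded, current):
    """Return {name: (recorded_version, current_version)} for every mismatch.

    A package missing from one side is reported with None for that side.
    """
    changes = {}
    for name in sorted(set(recorded) | set(current)):
        before, after = recorded.get(name), current.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


if __name__ == "__main__":
    recorded = snapshot_environment()  # in practice, saved alongside the code
    drift = diff_snapshots(recorded, snapshot_environment())
    print("environment drift:", drift)
```

A snapshot like this covers only Python packages. System libraries, drivers, data, and hardware are outside its scope, which is why preserving the whole environment together with the code matters.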
The screenshots below show the difference between rerunning code with and without Code Ocean.
Comparison 1: Adopting code in a Code Ocean Compute Capsule is very similar to adopting it from GitHub, so duplicating workflows involves no additional learning curve for the user. Both platforms have actions to clone the code; however, duplicating a Code Ocean Capsule also ensures the original environment and parameters are saved for immediate reuse.
Comparison 2: The first two pictures are examples of the countless errors, e.g., dependency issues or machine limitations, that can occur when trying to run example code. Code Ocean Compute Capsules make this easy: by clicking “Reproducible Run”, users can run code from top to bottom.
- Adopting code from other researchers and adapting it is highly non-trivial.
- Code Ocean makes it easy to search for existing code and adopt it with a single click.
- Compute Capsules feature pre-configured environments and ready-to-use machine resources.
- Creators can configure their Capsules to produce a “Reproducible Run” where results are available with one click. This is beneficial when working with new frameworks.
- The ease of adoption encourages researchers to explore new methods they would otherwise not consider (as an example, watch our webinar “Accelerating Single-Cell Genomics Discovery with AI, NVIDIA, and Code Ocean”).