June 27, 2022

The Top 3 Challenges in Scientific Data Analysis: A Conversation with Benjamin Haibe-Kains

At Code Ocean, we’re thrilled and honored to work with brilliant scientists like Benjamin Haibe-Kains, Senior Scientist at Princess Margaret Cancer Center, Associate Professor at the University of Toronto, and Scientific Advisor to the Break Through Cancer Foundation.

We sat down with him recently to talk through the state of genomic research and what he sees as the major challenges and opportunities facing the field today. This three-part series covers his responses across three main areas.

“Today’s rapid advancements in computational biology, combined with plummeting costs of data and cloud computing, should lead to an acceleration in discovery of new therapies,” said Haibe-Kains.

“Yet they often present new challenges that slow down progress. If not addressed, they can lead to higher costs – including the opportunity cost of not providing patients with effective therapies in a timely manner.”

Haibe-Kains summarized three major areas of focus that should be addressed: onboarding, reproducibility and collaboration.

Challenge 1: Hiring and Onboarding

The crux of this issue lies in the tremendous opportunity created by broader changes in technology.

“The costs of sequencing and cloud computing have come down dramatically in the past few decades,” says Haibe-Kains. “But that means massive amounts of data are being generated, creating bottlenecks in analysis.”

He recounts that 30 years ago, sequencing DNA was extremely tedious and expensive, making it the rate-limiting step rather than the computational analysis of the resulting data. Since then, sequencing and cloud computing have become so inexpensive that sequencing data are now a commodity.

The bottleneck is now the computational analysis. You still need an expert who can analyze the data and talk to the people who generated it to work out the best approach. This need for expertise has become a major constraint across all industries, including life sciences, social sciences, and engineering.

Shrinking average tenure and a scarcity of qualified candidates mean that you can’t afford to waste time asking someone to recreate the code of staff who recently left.

As a result, the first challenge is closely tied to the second, reproducibility, which is often listed as one of the main challenges in science overall. How can current knowledge be ready not only for the next scientist, but also for the next experiment? Haibe-Kains will cover that topic in the next post of the series.
