How to speed up cancer genomics multi-omics analysis
What does it take to treat cancers more effectively? The answers to this important question are still emerging, but one thing is clear: we need large amounts of multi-omics data (genomic, epigenetic, transcriptomic, proteomic, metabolomic and pharmacological), along with advanced analysis tools that help us understand how each of these layers relates to the biology of this highly heterogeneous group of diseases. With that knowledge, oncologists can characterize a patient’s cancer at the biomolecular level and treat it with the drug or drug combination most likely to work in that specific case.
This is the core concept and promise of precision oncology: the ability to treat molecularly heterogeneous cancers with drugs designed to address the specific underlying cause, rather than using a “one drug fits all cancers” approach.
In a recent webinar, Code Ocean’s CEO Simon Adar and Benjamin Haibe-Kains (Senior Scientist at the Princess Margaret Cancer Centre, Associate Professor at the University of Toronto, and affiliate of OICR and the Vector Institute) discussed the promises and challenges of multi-omics analysis in cancer research.
Here are the five key takeaways:
- Data are abundant and analysis tools are advanced, but critical challenges remain:
  - Data cleaning and preprocessing are extremely time-consuming, taking up 50–80% of analysis time.[1]
  - A lack of standards makes it difficult to combine multiple datasets, which is a critical requirement for comprehensive analysis.
  - Without easy ways to collaborate, researchers frequently reinvent the wheel, e.g. recleaning the same data and rewriting code for the same or similar analyses over and over.
- A three-pronged framework is required to make large datasets easier to find, access, use and combine:
  - Better organization: transparent and reproducible curation of data using established standards.
  - Seamless sharing: sharing of curated datasets according to the FAIR principles, ideally packaged with code for quality control, mining and visualization, e.g. in Code Ocean Compute Capsules.
  - Advanced analysis: a computational framework that enables collaborative analysis of multiple datasets.
- While promising, precision oncology has a long way to go before it reaches maturity. Currently, for the majority of cancer patients there is either no biomarker that allows stratification or no clinical trial they can be matched with. To address this, more multi-omics data and more sophisticated predictive models are needed to bring multi-omics tests into clinical practice, rather than relying on the univariate tests used today.
- More pharmacogenomic studies are critical for identifying biomarkers that predict whether a drug will have an effect on cancer cells. Substantial data already exist but suffer from the same issues as other multi-omics data: the lack of standards and metadata makes pharmacological data hard to clean, align, combine and analyze, especially with machine learning algorithms.
- To address the need for shared tools and easier collaboration, Haibe-Kains and his team developed PharmacoGx, a pharmacogenomic toolbox that helps integrate and curate data and makes them easily accessible, so users can focus on data analysis rather than data wrangling.
More multi-omics data, together with standards and tools that eliminate repetitive, time-consuming and inefficient data preparation steps, are required to move precision oncology from a specialized treatment option to clinical reality for (almost) all cancer patients. Tools such as the PharmacoGx package and Code Ocean’s Compute Capsules, which allow easy, fast and secure sharing of code, data, results and computing environments and enable frictionless collaboration, pave the way for wider use of multi-omics data in clinical practice.
For more details about the promises and challenges of a multi-omics approach to cancer, and to see a demonstration of the PharmacoGx package in a Compute Capsule, please watch our webinar.
To explore the publicly available Cancer Pharmacogenomic Toolbox – CCLE Capsule, visit https://codeocean.com/capsule/0762056/tree/v1.
References
[1] Gil Press, Forbes, 23 March 2016: https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/