Swetabh Pathak

The document discusses the transition from AI models to deployed AI in life sciences, highlighting the importance of relevant use cases and high-quality data for successful implementation. It identifies common barriers to AI adoption, such as lack of expertise and training data, and presents case studies demonstrating how Elucidata's solutions can accelerate AI deployment and improve R&D productivity. The emphasis is on the value of data over tools, advocating for data-centric approaches to enhance AI capabilities in drug discovery and development.

Uploaded by

pal.spandan.99
Copyright
© © All Rights Reserved

Navigating the Transition:

From Models to Deployed AI

Swetabh Pathak
CTO & Co-Founder, Elucidata
Emerging Leaders for AI in Life Sciences R&D
..with Differentiated Technology & Expertise

#1 30+ customer proof points across discovery, development and trials

#2 Cloud-first data platform and ML that can seamlessly complement your ecosystem.

#3 With a team that can quickly assimilate domain expertise into high impact services.
AI Shows Promise, But Opinions Vary..

‘AI is the future - it will help us explore areas that have never been explored before. One day AI will help us understand biology so deeply that we can form new scientific laws and drug design principles.’
Head of Data and Platform Strategy, ‘AI-first’ biotech

‘AI is somewhat valuable. In our work, AI has helped make a lot of molecules synthesizable faster & cheaper.’
Translational Scientist, Academia

‘It's too early to recognize the true impact of AI - we will only be able to see the true impact once we can see the productivity over time.’
Computational Biologist, Research Institute

‘AI is currently only used for solving simple problems. The InSilico screen would only have had a 4% failure rate, even without AI.’
Chief Executive, Data Consortium

‘AI is a new hype - investors buy into the desire to be hip, cash is raised, Pharma cos do deals to be in the news. There's lots of noise in this field but it has not been proven yet.’
Deputy Director, Global Health, Non-profit Organisation

Innovators (2.5%), Early Adopters (13.5%), Early Majority (34%), Late Majority (34%), Laggards (16.5%)

Source: BCG Report on AI, 2023


Biggest Barriers to Broader AI Adoption

Limiting Belief 1
‘We don’t have a relevant use case for AI’
Translational Leader, Mid-Stage Pharmaceutical Company

Limiting Belief 2
‘We just couldn’t get enough training data to solve the problem we’re working on’
Computational Biologist, Research Institute

Limiting Belief 3
‘We don’t have the expertise or the millions of dollars needed to build a team’
CSO, Early Stage Therapeutics
Limiting Belief 1

“We don’t have a relevant use case for AI”
Good AI use cases are not rare events

What do good use-cases look like?

● Access to ‘high quality and relevant’ data
● Human supervision is available & possible
● Biological rules can be framed
● Hypothesis generation rather than testing
● Explainability is not necessary

Right use case is the biggest predictor of success

‘An Early Stage Therapeutics company wanted to develop and train Classifier Models
to segment patients in AML’

10k Public Multi-omics samples → Patient Stratification Model → Segmented Cohorts → Prioritized list of gene targets → 2+ Targets Identified and Validated

AI was used to assist domain experts in solving a well-defined problem with clear outcomes: ‘Segment the patient cohorts in AML as per their prognosis’
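The slide does not say which classifier was used. As a minimal sketch of the idea of segmenting patients by prognosis, the example below assumes a nearest-centroid classifier over a two-gene expression signature. The gene columns, values, and prognosis labels are made-up placeholders, not data from the case study.

```python
from collections import defaultdict
from math import dist


def fit_centroids(samples, labels):
    """Average the expression vectors of each prognosis group."""
    grouped = defaultdict(list)
    for vector, label in zip(samples, labels):
        grouped[label].append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def segment(sample, centroids):
    """Assign a patient to the prognosis group with the nearest centroid."""
    return min(centroids, key=lambda label: dist(sample, centroids[label]))


# Hypothetical expression values for two placeholder genes (gene_a, gene_b).
train = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.7]]
prognosis = ["favorable", "favorable", "adverse", "adverse"]

centroids = fit_centroids(train, prognosis)
print(segment([0.15, 0.8], centroids))
```

In a real setting, a domain expert would review the resulting segments before any targets are prioritized, which is the supervision the slide calls for.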
LLMs were Deployed to Advance Target Identification

Limiting Belief 2

“We just couldn’t get enough training data to solve the problem we’re working on”
Make the Shift to Data-Centric AI

Fine-tune existing models with high-quality, relevant data. This is especially useful for long-tail prediction problems with limited data points (<10,000).

Pre-training Data: Available Single Cell Gene Expression profiles (33 million) → Fed into Foundational Model → Fine-tuned with task specific, high quality data

Use Cases

● Cell Type Annotation
● Dataset Integration
● Gene Perturbation Response Prediction
● Gene Regulatory Network Inference
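The fine-tuning step above can be illustrated with a minimal sketch: a frozen "pretrained" feature extractor is reused as-is, and only a small logistic-regression head is trained on a handful of task-specific examples. The function `pretrained_embed` is a hypothetical stand-in for a foundation model (it is not scGPT or any real model API), and the data is a toy set.

```python
import math


def pretrained_embed(x):
    """Stand-in for a frozen foundation model: maps raw input to features."""
    return [x[0] + x[1], x[0] - x[1]]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune_head(data, labels, lr=0.5, epochs=200):
    """Train only a logistic-regression head on top of the frozen features."""
    features = [pretrained_embed(x) for x in data]
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            error = sigmoid(w[0] * f[0] + w[1] * f[1] + b) - y
            w = [w[0] - lr * error * f[0], w[1] - lr * error * f[1]]
            b -= lr * error
    return w, b


def predict(x, w, b):
    f = pretrained_embed(x)
    return int(sigmoid(w[0] * f[0] + w[1] * f[1] + b) >= 0.5)


# Four task-specific examples: a deliberately tiny, "long-tail" sized dataset.
task_data = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.3, 0.9]]
task_labels = [1, 1, 0, 0]

w, b = fine_tune_head(task_data, task_labels)
print(predict([0.9, 0.1], w, b), predict([0.15, 0.95], w, b))
```

Because the feature extractor is never updated, the few labeled examples only have to fit the small head, which is why this kind of setup can work with limited data.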
How do we define High Quality Data?

Highest Quality Data
● Up-to-date with patient information, domain specific
● Metadata annotations, processing & ontologies are custom to the use case / task
● QC-ed by experts

Context Specific, Relevant (scGPT, GenePT)
● Annotated with critical metadata
● Relevant to the biological domain

Machine Curated from Public Sources (BERT, GPT)
● Ingested & transformed into machine-readable formats
● Structured into tabular files (CSV, JSON)
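As a rough illustration of what an automated metadata QC pass might check before expert review, the sketch below flags records with missing critical fields or tissue terms outside a controlled vocabulary. The field names and the vocabulary are hypothetical; a real pipeline would validate against an actual ontology.

```python
REQUIRED_FIELDS = ("sample_id", "tissue", "disease", "cell_type")  # hypothetical schema
TISSUE_VOCABULARY = {"blood", "bone marrow", "lung"}  # stand-in for an ontology lookup


def qc_record(record):
    """Return a list of QC problems for one sample-level metadata record."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if not record.get(field)]
    tissue = record.get("tissue")
    if tissue and tissue.lower() not in TISSUE_VOCABULARY:
        problems.append(f"tissue '{tissue}' is not in the controlled vocabulary")
    return problems


def split_by_qc(records):
    """Separate records that pass QC from those that need curation."""
    passed, flagged = [], {}
    for record in records:
        problems = qc_record(record)
        if problems:
            flagged[record.get("sample_id", "<no id>")] = problems
        else:
            passed.append(record)
    return passed, flagged


records = [
    {"sample_id": "S1", "tissue": "Blood", "disease": "AML", "cell_type": "monocyte"},
    {"sample_id": "S2", "tissue": "PBMC", "disease": "AML", "cell_type": "T cell"},
    {"sample_id": "S3", "tissue": "lung", "disease": "", "cell_type": "macrophage"},
]

passed, flagged = split_by_qc(records)
for sample_id, problems in flagged.items():
    print(sample_id, problems)
```

Flagged records would go to a curator rather than being dropped, in line with the "QC-ed by experts" tier.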

How well does scGPT perform after Fine-Tuning with High Quality Data?

scGPT can perform reference-based cell type annotation in a zero-shot setting. However, fine-tuning with high-quality data improves model performance by 20% on average.

Experimental Design

● Training Dataset: 25k immune cells from HCA to fine-tune
● Testing Dataset: 13k immune cells from Tabula Sapiens
● All datasets were cleaned and linked with Elucidata’s Harmonization Engine.
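The slide does not state which metric the 20% refers to, or whether the gain is absolute or relative. As a hedged sketch of how such a comparison could be scored on a held-out test set, the example below computes annotation accuracy for zero-shot and fine-tuned predictions and reports both kinds of gain. The labels are toy placeholders, not the HCA or Tabula Sapiens results.

```python
def accuracy(predicted, truth):
    """Fraction of cells whose predicted type matches the reference label."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def improvement(baseline, tuned):
    """Absolute and relative gain of the fine-tuned score over the baseline."""
    return {
        "absolute": tuned - baseline,
        "relative": (tuned - baseline) / baseline,
    }


# Toy held-out test labels (placeholders only).
truth = ["T cell", "B cell", "NK cell", "T cell", "Monocyte"]
zero_shot = ["T cell", "B cell", "T cell", "T cell", "B cell"]
fine_tuned = ["T cell", "B cell", "NK cell", "T cell", "B cell"]

gain = improvement(accuracy(zero_shot, truth), accuracy(fine_tuned, truth))
print({name: round(value, 3) for name, value in gain.items()})
```

Keeping the test cells from a different source than the training cells, as in the design above, is what makes the score a measure of generalization rather than memorization.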
Elucidata’s Harmonization Engine Cleans the Data you Need

Technology, Process, People

● 10X FASTER curation with LLM powered annotation tools
● 50 Million Samples harmonized to support use cases in drug discovery, development & trials
● 25+ Data Types Supported including RWE, Omics and Clinical
● 100+ Experts in curation, NLP, data engineering, & bioinformatics
● 30+ Data Pipelines built and maintained on Elucidata infrastructure to process data
● 99.99% Accurate Data delivered with robust QA/QC
Limiting Belief 3

“We don’t have the expertise or millions needed to build a team”
Case Study: Building Production-ready ADMET Pipelines

SCENARIO

‘This mid-cap pharmaceutical company wanted to develop an end-to-end ADMET prediction pipeline that would support 5 lead development programs across neurology and oncology.’

THEIR NEEDS
● Collect and prepare all the data generated across assays in a meaningful and scalable way.
● Productionize existing models on the cloud, so that they can run at scale.
● Develop an ML-ops infrastructure to manage the data & models across stakeholders, multiple sites and different types of users.

Doing this in-house needs a team of 7 FTEs and cloud resources.
Costs run up to ~$2 million, and projects could take 1+ years to kick off.
Scaling up AI in production, Set up in 1.5 Months

Compound Screening & Evaluation Accelerated by 2X with Production-Ready ADMET pipeline

Relevant Public and In-house Datasets → Harmonization Engine (Ingest, process, annotate) → Polly → Validated Model to Predict endpoints → Dashboard (Predicted Endpoints)

Polly
● Load & Split Data, Feature Selection
● Train & Fine-Tune Models
● Track Experiments, Select Models, Version

Add inputs & Run Workflow

ADMET Model Deployment Workflow
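The workflow steps named above (load and split, feature selection, training, experiment tracking, model selection, versioning) can be sketched in a few lines. This is an illustration only: it substitutes a one-feature least-squares fit for real ADMET models, keeps tracked runs in an in-memory list, and uses synthetic data. None of the names below are Polly APIs.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class Run:
    """One tracked experiment: which feature was used and how it scored."""
    version: int
    feature: int
    slope: float
    intercept: float
    val_mse: float


def split(rows, train_fraction=0.75, seed=0):
    """Shuffle reproducibly, then split into training and validation rows."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def fit_line(xs, ys):
    """Ordinary least squares for a single feature."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def run_workflow(rows, n_features):
    """Train one model per candidate feature, track each run, pick the best."""
    train, val = split(rows)
    runs = []
    for j in range(n_features):
        slope, intercept = fit_line([r[0][j] for r in train], [r[1] for r in train])
        mse = statistics.fmean((slope * r[0][j] + intercept - r[1]) ** 2 for r in val)
        runs.append(Run(len(runs) + 1, j, slope, intercept, mse))
    return min(runs, key=lambda run: run.val_mse), runs


# Synthetic rows: (features, endpoint). Feature 0 drives the endpoint; feature 1 is noise.
rows = [([i / 10, (i * 7) % 5], 2 * (i / 10) + 1) for i in range(20)]

best, runs = run_workflow(rows, n_features=2)
print(f"selected model v{best.version} using feature {best.feature}")
```

Selecting on validation error rather than training error is the design choice that matters here; the tracked list of runs is what lets a later reviewer see why a given model version was promoted.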


Significant R&D Productivity Unlocked

Projects could be kickstarted within a month, at 4X Lower Costs.

Productivity Areas | Improvement with Elucidata | Rationale
Data Acquisition | 4X Faster | Dedicated team to perform searches in Public Databases
Data Preparation, Annotation, QC | 4X Faster | LLM-powered Harmonization reduced manual effort in data preparation.
Model Development / Deployment Cycle | Reduced by 30% | Key bottleneck steps in the process (ingestion, cleaning, ML Model Deployment, Versioning) automated.
Any Questions?

Reach out at elucidata.io to learn more!

"The value is in the data, it is not in the tools. That is the one thing, it’s a bit of a hobby horse for me. One thing always point to in these discussions around data, don’t underestimate the amount of time and value in doing what is really often difficult and not so rewarding directly work, like cleaning data sets isn’t always fun, but it is often the most valuable thing you can do."
Dr. Jeffrey Reid, Regeneron's Chief Data Officer

Polly Platform & Services
ML-ready Data Access
ML Initiatives
