Swetabh Pathak
CTO & Co-Founder, Elucidata
Emerging Leaders for AI in Life Sciences R&D
...with Differentiated Technology & Expertise
● It's too early to recognize the true impact of AI - we will only be able to see the true impact once we can see the productivity over time.
● AI is the future - it will help us explore areas that have never been explored before. One day AI will help us understand biology so deeply that we can form new scientific laws and drug design principles.
● AI is currently only used for solving simple problems. The InSilico screen would only have had a 4% failure rate, even without AI.
● AI is somewhat valuable. In our work, AI has helped make a lot of molecules synthesizable faster & cheaper.
● AI is a new hype - investors buy into the desire to be hip, cash is raised, Pharma cos do deals to be in the news. There's lots of noise in this field but it has not been proven yet.
● Head of Data and Platform Strategy, ‘AI-first’ biotech
● Computational Biologist, Research Institute
● Translational Scientist, Academia
● Chief Executive, Data Consortium
● Deputy Director, Global Health, Non-profit Organisation
What do good use-cases look like?
Biological rules can be framed
Hypothesis generation rather than testing
‘An Early Stage Therapeutics company wanted to develop and train Classifier Models
to segment patients in AML’
2+ Targets Identified and Validated
Limiting Belief 2
Fine-tune existing models with high-quality, relevant data. This is especially useful for long-tail
prediction problems with limited data points (<10,000).
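One lightweight way to adapt an existing model with few labelled points is to keep its embeddings frozen and train only a small classification head. The sketch below is a linear probe (plain logistic regression), not full fine-tuning and not any specific model's API. The toy `frozen_embeddings`, the labels, and the function names `train_head` and `predict` are all hypothetical, chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(embeddings, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression head on fixed embeddings with batch gradient descent."""
    dim = len(embeddings[0])
    n = len(embeddings)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, labels):
            # Prediction error for this sample under the current head.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        # Average the gradients and take one step.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return class 1 if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

# Hypothetical toy data: 2-D embeddings assumed to come from a frozen pretrained model.
frozen_embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
labels = [0, 0, 1, 1]

w, b = train_head(frozen_embeddings, labels)
preds = [predict(w, b, x) for x in frozen_embeddings]
print(preds)
```

Because only the head's weights are learned, the number of trainable parameters stays small, which is one reason this kind of setup is often tried first when labelled data is scarce.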
Use Cases
Cell Type Annotation
Pre-training Data
Gene Regulatory Network Inference
How do we define High Quality Data?
How well does scGPT perform after Fine-Tuning with High Quality Data?
scGPT can perform reference-based cell type annotation in a zero-shot setting.
However, fine-tuning with high-quality data improves model performance by 20% on average.
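Reference-based annotation in a zero-shot setting can be pictured as a similarity search: embed the cells, then give each query cell the label of the closest labelled reference. The sketch below uses a nearest-centroid rule with cosine similarity on made-up 2-D vectors. It does not call scGPT. The embeddings, labels, and the helper names `build_centroids` and `annotate` are hypothetical stand-ins for whatever a pretrained model would produce.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def build_centroids(ref_embeddings, ref_labels):
    """Average the reference embeddings of each cell type into one centroid per label."""
    sums, counts = {}, defaultdict(int)
    for emb, label in zip(ref_embeddings, ref_labels):
        if label not in sums:
            sums[label] = [0.0] * len(emb)
        sums[label] = [s + x for s, x in zip(sums[label], emb)]
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}

def annotate(query_embeddings, centroids):
    """Assign each query cell the label of its most similar reference centroid."""
    return [max(centroids, key=lambda lab: cosine(q, centroids[lab]))
            for q in query_embeddings]

# Hypothetical toy reference: embeddings assumed to come from a pretrained model, no training here.
ref_embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
ref_labels = ["T cell", "T cell", "B cell", "B cell"]

centroids = build_centroids(ref_embeddings, ref_labels)
predicted = annotate([[0.8, 0.1], [0.0, 0.7]], centroids)
print(predicted)  # ['T cell', 'B cell']
```

No model weights change in this setup. Fine-tuning, by contrast, updates the model so that the embeddings themselves separate the cell types of interest more cleanly.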
Experimental Design
50 Million Samples harmonized to support use cases in drug discovery, development & trials
10X FASTER curation with LLM-powered annotation tools
Technology
100+ Experts in curation, NLP, data engineering, & bioinformatics
30+ Data Pipelines built and maintained on Elucidata infrastructure to process data
99.99% Accurate Data delivered with robust QA/QC
Limiting Belief 3
SCENARIO
‘This mid-cap pharmaceutical company wanted to develop an end-to-end ADMET prediction pipeline
that would support 5 lead development programs across neurology and oncology’.
THEIR NEEDS
● Collect and prepare all the data generated across assays in a meaningful and scalable way.
● Productionize existing models on the cloud, so that they can run at scale.
● Develop an ML-ops infrastructure to manage the data & models across stakeholders, multiple
sites and different types of users.
Polly
ML Initiatives
Dr. Jeffrey Reid, Regeneron's Chief Data Officer