Donovan N. Chin & R. Aldrin Denny

• Traditional Drug Discovery (insert graph)
• In Silico Prediction of ADME (insert graph)
◦ Potency
◦ Absorption
◦ Lead
◦ Drug
◦ Toxicity
◦ Excretion
◦ Metabolism
◦ Distribution
 Target IVY(Brute force virtual screening of
very large compound libraries) Lead
Discovery IVY(Utilize predictive models
from Biogen data for more efficient virtual
screening) Lead Optimization candidate

• (insert graph)
◦ Potency
◦ Lead
◦ Drug
◦ Toxicity
◦ Excretion
◦ Metabolism
◦ Distribution
◦ Absorption

• Goal: identify the crystallographic binding mode and rank-order ligands by their binding to the protein
• (insert graph)
• Receptor Docking
• Ligand Shape
• Generate plausible trial binding modes with a docking function, then re-rank the modes with a scoring function
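The two-stage protocol above, where a docking function proposes trial poses and a separate scoring function re-ranks them, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the `Pose` record, the `dock_then_rescore` helper, the `keep_top` cutoff and the convention that lower scores are better are all hypothetical, and the two callables stand in for real docking and scoring programs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Pose:
    """Hypothetical trial binding mode produced by a docking run."""
    ligand: str
    pose_id: int
    scores: Dict[str, float] = field(default_factory=dict)


def dock_then_rescore(
    poses: List[Pose],
    dock_score: Callable[[Pose], float],
    rescore: Callable[[Pose], float],
    keep_top: int = 10,
) -> List[Pose]:
    """Stage 1: keep the `keep_top` best poses by docking score.
    Stage 2: re-rank only those survivors with the second scoring function.
    Lower is assumed to be better for both scores (hypothetical convention)."""
    plausible = sorted(poses, key=dock_score)[:keep_top]
    return sorted(plausible, key=rescore)


if __name__ == "__main__":
    # Toy scores chosen only to show that a pose dropped in stage 1 is never
    # rescued, even if the rescoring function would have ranked it first.
    toy = [
        Pose("lig", 0, {"dock": -5.0, "re": -1.0}),
        Pose("lig", 1, {"dock": -7.0, "re": -3.0}),
        Pose("lig", 2, {"dock": -6.0, "re": -9.0}),
        Pose("lig", 3, {"dock": -1.0, "re": -20.0}),
    ]
    ranked = dock_then_rescore(
        toy, lambda p: p.scores["dock"], lambda p: p.scores["re"], keep_top=3
    )
    print([p.pose_id for p in ranked])  # [2, 1, 0]
```

Keeping the two scoring callables separate mirrors the slide's split between the function used to generate poses and the function used to rank them.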

• (insert graph)
• 341 actives
• 47 non-actives

• (insert graph)
• After filtering by pharmacophore feature
 (insert graph)
 (insert functions for)
◦ F_Score*
◦ D_Score
◦ G_Score
◦ PMF_Score
◦ Chem_Score
◦ ICM_Score*

• Cell Adhesion Assay (50% Serum)
◦ (insert graph)
• Biochemical Adhesion Assay
◦ (insert graph)
• Scoring functions are poor more often than not

• Receptor site view → library design → FlexX score → consensus score ≥ 3 (e.g. contact map, cLogP, MW, H-bonds, rotatable bonds)
◦ Consensus = 5? If yes →
◦ Substructure exists? If yes →
◦ Pharmacophore < 4.2 Å? If yes →
◦ Publish hit report
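The decision cascade above can be written as a short filter. This sketch assumes, since the slide does not define it, that the consensus score counts how many of the five listed criteria (contact map, cLogP, MW, H-bonds, rotatable bonds) a compound passes; the `Candidate` record, the criterion keys and the return strings are hypothetical, and the per-criterion cutoffs are left to the caller. Only the thresholds 3, 5 and 4.2 Å come from the slide.

```python
from dataclasses import dataclass
from typing import Dict

CONSENSUS_MIN = 3          # "Consensus Score >= 3" on the slide
CONSENSUS_FULL = 5         # "Consensus = 5?" on the slide
PHARMACOPHORE_MAX = 4.2    # "Pharmacophore < 4.2 Å" on the slide (ångströms)


@dataclass
class Candidate:
    """Hypothetical record for one docked compound."""
    name: str
    criteria: Dict[str, bool]   # ASSUMED encoding: pass/fail per consensus criterion
    has_substructure: bool      # outcome of a substructure search
    pharmacophore_dist: float   # pharmacophore distance in ångströms


def triage(c: Candidate) -> str:
    """Walk the slide's decision cascade and report where the compound stops."""
    consensus = sum(c.criteria.values())  # each True counts as 1
    if consensus < CONSENSUS_MIN:
        return "rejected: consensus < 3"
    if consensus != CONSENSUS_FULL:
        # The slide only shows the "yes" path; this branch is left open.
        return "stopped: consensus below 5 (branch not specified on the slide)"
    if not c.has_substructure:
        return "stopped: substructure not found"
    if c.pharmacophore_dist >= PHARMACOPHORE_MAX:
        return "stopped: pharmacophore >= 4.2 A"
    return "publish hit report"
```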

• (insert graph)

• Goal: predict the hit/miss class of a compound from the presence of features (fingerprints)
• Method
◦ Given a set of $N$ samples, of which a subset $A$ are good (“active”), estimate for a new compound: $P(\text{good}) \approx A/N$
◦ Given a set of binary features, for a given feature $F$:
  • it appears in $N_F$ samples
  • it appears in $A_F$ good samples
  • can we estimate $P(\text{good} \mid F) \approx A_F/N_F$?
  • (Problem: the error grows as $N_F$ becomes small)
◦ Corrected estimate: $P'(\text{good} \mid F) = \dfrac{A_F + P(\text{good})\,k}{N_F + k}$
  • $P'(\text{good} \mid F) \to P(\text{good})$ as $N_F \to 0$
  • $P'(\text{good} \mid F) \to A_F/N_F$ as $N_F$ becomes large
◦ (If $k = 1/P(\text{good})$, this is the Laplacian correction)
• Descriptors (insert)
• Advantages
◦ Can describe a huge number of features (up to 4 billion; MDL: 1024; Leadscope: 27,000)
◦ Contains tertiary and stereochemistry information
◦ Fast
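The estimator above can be checked numerically. The sketch below counts feature occurrences over a labelled training set, computes $P(\text{good}) = A/N$ and the corrected $P'(\text{good} \mid F)$ with $k = 1/P(\text{good})$ by default, and then scores a new compound. The function names and string feature IDs are hypothetical stand-ins for fingerprint bits, and the final step (summing $\log(P'(\text{good} \mid F)/P(\text{good}))$ over the features a compound contains) is an assumption borrowed from common Laplacian-modified naive Bayes practice; the slide does not say how per-feature estimates were combined.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Optional, Set, Tuple


def laplacian_estimate(a_f: int, n_f: int, p_good: float, k: Optional[float] = None) -> float:
    """P'(good|F) = (A_F + P(good) * k) / (N_F + k).

    Defaults to k = 1 / P(good), the Laplacian correction, so the numerator
    becomes A_F + 1. Tends to P(good) as N_F -> 0 and to A_F / N_F as N_F grows.
    """
    if k is None:
        if p_good <= 0:
            raise ValueError("P(good) must be positive when k = 1 / P(good)")
        k = 1.0 / p_good
    return (a_f + p_good * k) / (n_f + k)


def train(samples: List[Tuple[Set[str], bool]]) -> Tuple[float, Dict[str, float]]:
    """Return P(good) = A / N and P'(good|F) for every feature seen in training.

    Each sample is (set of feature IDs present, is_active)."""
    n = len(samples)
    a = sum(1 for _, active in samples if active)
    p_good = a / n
    n_f: Counter = Counter()  # N_F: samples containing feature F
    a_f: Counter = Counter()  # A_F: active samples containing feature F
    for features, active in samples:
        for f in features:
            n_f[f] += 1
            if active:
                a_f[f] += 1
    return p_good, {f: laplacian_estimate(a_f[f], n_f[f], p_good) for f in n_f}


def score(features: Iterable[str], p_good: float, estimates: Dict[str, float]) -> float:
    """ASSUMED combination rule (not on the slide): sum of log(P'(good|F) / P(good))
    over the compound's features; features never seen in training contribute 0."""
    return sum(math.log(estimates[f] / p_good) for f in features if f in estimates)
```

A feature seen once in an inactive compound gets $P' = 1/3$ rather than the raw $0/1 = 0$ when $P(\text{good}) = 0.5$, which is exactly the small-$N_F$ damping the slide describes.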
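
• Classification analysis
◦ Developing non-linear scoring functions to classify actives and non-actives
◦ (insert graphs)
◦ Cost function to minimize: Gini impurity $i(N) = 1 - \sum_j P^2(\omega_j)$, where $P(\omega_j)$ is the fraction of samples at node $N$ that belong to class $\omega_j$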
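The impurity criterion above can be illustrated for a single descriptor. The sketch computes $i(N)$ for a node and scans candidate thresholds for the split that minimizes the size-weighted impurity of the two child nodes, which is the core step CART repeats at every node. It is not the CART software used in the study (no surrogate splits, multi-descriptor search or pruning), and the names `gini` and `best_threshold` are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """i(N) = 1 - sum_j P(w_j)^2, with P(w_j) the class fractions at node N."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(x: Sequence[float], y: Sequence[str]) -> Tuple[float, float]:
    """Return (threshold, weighted impurity) of the split x <= t vs x > t that
    minimizes the size-weighted Gini impurity of the two children.
    The threshold is NaN if no split beats the unsplit node."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best = (float("nan"), gini(y))  # start from "no split"
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a threshold between equal descriptor values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, impurity)
    return best
```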

• Training set prediction success
• (insert table)
• 10-fold cross-validation
• Randomly split training and test sets
• Significant improvement in separating actives from non-actives
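The 10-fold cross-validation mentioned above can be sketched as follows: the samples are shuffled, dealt into ten folds, and each fold is held out once as the test set while the model is trained on the other nine. The helper name and the fixed `seed` are hypothetical; the slide does not give the exact splitting procedure.

```python
import random
from typing import List, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1, deal them into k folds, and return one
    (train_indices, test_indices) pair per fold."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # every index lands in exactly one fold
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, test))
    return splits
```

With 341 actives and 47 non-actives in the data set shown earlier, a purely random split can leave some folds with very few non-actives; a stratified variant that deals each class into the folds separately keeps the class ratio roughly constant. Which variant was used is not stated on the slide.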

• (insert graph)
• Significant improvement in finding hits using the new scoring function (SF)

• Optimal tree identified (insert graph)
• No random effects (insert graph)

• (insert cluster)
• Able to identify different molecular-property criteria that lead to hits

• (insert graph)

• (insert graph)
• Size = magnitude of OBA
• OBA values cover the range of descriptor space

• (insert graph)
• Chose 1D and 2D descriptors for ease of interpretation and lower “noise”

• Build model (insert graphs)
• Apply model

• Features found in high-OBA compounds
• Features found in low-OBA compounds
• It would be nice if CART offered a similar view

• Developed improved scoring functions with CART and Bayesian models for separating hits from non-hits in structure-based drug design
• Identified key differences in molecular physical properties that led to hits
• Built a reasonably predictive OBA model (however, given the complexity of OBA, the method cannot be expected to extend to other systems)

• Biogen IDEC
• Modeling
◦ Rajiah Denny
◦ Claudio Chuaqui
◦ Juswinder Singh
◦ Herman van Vlijmen
◦ Norman Wang
◦ Anuj Patel
◦ Zhan Deng
• Chemistry
◦ Kevin Guckian
◦ Dan Scott
◦ Thomas Durand-Reville
◦ Pat Conlon
◦ Charlie Hammond
◦ Chuck Jewell
• Pharmacology
◦ Tonika Bonhert

Improved Predictions in Structure Based Drug Design Using Cart and Bayesian Models

  • 1. Donovan N. Chin & R. Aldrin Denny
  • 2.  Traditional Drug Discovery (insert graph)  In Silico Prediction of ADME (insert graph) ◦ Potency ◦ Absorption ◦ Lead ◦ Drug ◦ Toxicity ◦ Excretion ◦ Metabolism ◦ distribution
  • 3.  Target IVY(Brute force virtual screening of very large compound libraries) Lead Discovery IVY(Utilize predictive models from Biogen data for more efficient virtual screening) Lead Optimization candidate
  • 4.  (insert graph) ◦ Potency ◦ Lead ◦ Drug ◦ Toxicity ◦ Excretion ◦ Metabolism ◦ Distribution ◦ absorption
  • 5.  Goal: Identify crystallographic binding mode, Rank order ligands wrt binding with protein  (insert graph)  Receptor Docking  Ligand Shape  Generate plausible trial binding modes using docking function then Re-rank modes with scoring function
  • 6.  (insert graph)  341 Active  47 Non-Active
  • 7.  (insert graph)  After filtering by Pharmacophore Feature
  • 9.  (insert functions for) ◦ F_Score* ◦ D_Score ◦ G_Score ◦ PMF_Score ◦ Chem_Score ◦ ICM_Score*
  • 10.  Cell Adhesion Assay (50% Serum) ◦ (insert graph)  Biochemical Adhesion Assay ◦ (insert graph)  Scoring Functions Are Poor More Often Than Not
  • 11.  Receptor Site View Library Design FlexX Score Consensus Score>=3 e.g. Contact Map, CLogP MW, HBOND Rotatable bonds Consensus=5? if yes, substructure exists? if yes, Pharmacophore<4.2Å? if yes, Publish Hit Report
  • 13.  Goal: Predict hit/miss class based on presence of features (fingerprints)  Method ◦ Given a set of N samples ◦ Given that some subset A of them are good (‘active’)  Then we estimate for a new compound: P(good)~ A/N ◦ Given a set of binary features F  For a given feature F:  It appears in N samples  It appears in A good samples  Can we estimate: P(good l F)~A/N  (Problem: Error gets worse as Nsmall) ◦ P’(good l F)= (A+P(good)k)/(n+k)  P’(good l F)p(good)as N0  P’(good l F) A/N as N large ◦ (If K=1/P(good) this is the Laplacian correction)  Descriptors (insert)  Advantages ◦ Can describe huge number of features (up to 4 billion; MDL 1024; Lead scope 27,000) ◦ Contains tertiary and stereochemistry information ◦ Fast
  • 14.  Classification Analysis ◦ Developing Non-Linear Scoring Functions to classify actives and non-actives ◦ (insert graphs) ◦ Cost Function to Minimize: Gini Impurity N= 1- ΣP^2(ω)
  • 15.  Training Set Prediction Success  (insert table)  10-fold cross validation  Randomly split training and test sets  Significant Improvement in Separating Actives from Non-Actives
  • 16.  (insert graph)  Significant Improvement in Finding Hits Using New SF
  • 17.  Optimal tree identified (insert graph)  No random effects (insert graph)
  • 18.  (insert cluster)  Able to identify different molecular property criteria that lead to hits
  • 20.  (insert graph)  Size= magnitude of OBA  OBA values cover range of descriptor space
  • 21.  (insert graph)  Choose 1 & 2D Descriptors for ease of interpretation and lower “noise”
  • 22.  Build Model (insert graphs) Apply Model
  • 23.  Features found in high OBA  Features found in low OBA  Would be nice if CART did similar view
  • 24.  Improved scoring functions for separating hits from non-hits in structure-based drug design developed with CART and Bayesian models  Identified key differences in molecular physical properties that led to hits  Built reasonably predictive OBA model (cannot expect method to extend to other systems given complexity of OBA, however)
  • 25.  Biogen IDEC  Modeling ◦ Rajiah Denny ◦ Claudio Chuaqui ◦ Juswinder Singh ◦ Herman van Vlijmen ◦ Norman Wang ◦ Anuj Patel ◦ Zhan Deng  Chemistry ◦ Kevin Guckian ◦ Dan Scott ◦ Thomas Durand-Reville ◦ Pat Conlon ◦ Charlie Hammond ◦ Chuck Jewell  Pharmacology ◦ Tonika Bonhert