Improved Predictions in Structure-Based Drug Design Using CART and Bayesian Models

The document discusses using in silico methods like virtual screening and predictive modeling to improve drug discovery. It presents results from applying techniques like receptor docking, machine learning algorithms, and Bayesian modeling to develop improved scoring functions that better distinguish active from inactive compounds. These scoring functions helped identify key molecular properties that correlated with active hits. The methods showed improved ability to find active hits compared to previous scoring functions.

• Traditional Drug Discovery (insert graph)
• In Silico Prediction of ADME (insert graph)
◦ Potency
◦ Absorption
◦ Lead
◦ Drug
◦ Toxicity
◦ Excretion
◦ Metabolism
◦ Distribution

 Target IVY(Brute force virtual screening of
very large compound libraries) Lead
Discovery IVY(Utilize predictive models
from Biogen data for more efficient virtual
screening) Lead Optimization candidate

• (insert graph)
◦ Potency
◦ Lead
◦ Drug
◦ Toxicity
◦ Excretion
◦ Metabolism
◦ Distribution
◦ Absorption

• Goal: identify the crystallographic binding mode and rank-order ligands with respect to binding with the protein
• (insert graph)
• Receptor Docking
• Ligand Shape
• Generate plausible trial binding modes using a docking function, then re-rank the modes with a scoring function
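The two-stage idea on this slide (a docking function proposes trial binding modes, a scoring function re-ranks them) can be sketched roughly as follows. This is only an illustration: the pose representation, the two toy functions, the `keep` parameter, and the convention that lower values are better are all assumptions, not the deck's actual docking or scoring functions.

```python
from typing import Callable, List, Sequence, TypeVar

Pose = TypeVar("Pose")  # hypothetical: any object describing one binding mode


def dock_and_rescore(
    candidate_poses: Sequence[Pose],
    docking_fn: Callable[[Pose], float],
    scoring_fn: Callable[[Pose], float],
    keep: int = 3,
) -> List[Pose]:
    """Return the best `keep` poses by docking_fn, re-ranked by scoring_fn.

    Assumption: for both functions a lower value means a better pose.
    """
    # Stage 1: the docking function selects plausible trial binding modes.
    trial_modes = sorted(candidate_poses, key=docking_fn)[:keep]
    # Stage 2: the scoring function re-ranks only those trial modes.
    return sorted(trial_modes, key=scoring_fn)


# Toy stand-ins: a "pose" is a single coordinate and each function measures
# distance to a different optimum, so the two rankings disagree.
def toy_docking(x: float) -> float:
    return abs(x - 2.0)


def toy_scoring(x: float) -> float:
    return (x - 2.6) ** 2


poses = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ranked = dock_and_rescore(poses, toy_docking, toy_scoring, keep=3)
print(ranked)
```

In this toy run the docking function's favourite pose is not the scoring function's favourite, which is the situation re-ranking is meant to handle.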

• (insert graph)
• 341 Active
• 47 Non-Active

• (insert graph)
• After filtering by Pharmacophore Feature

• (insert functions for)
◦ F_Score*
◦ D_Score
◦ G_Score
◦ PMF_Score
◦ Chem_Score
◦ ICM_Score*

• Cell Adhesion Assay (50% Serum)
◦ (insert graph)
• Biochemical Adhesion Assay
◦ (insert graph)
• Scoring Functions Are Poor More Often Than Not

• Receptor Site View → Library Design → FlexX Score → Consensus Score ≥ 3 (e.g. Contact Map, CLogP, MW, HBOND, Rotatable bonds)
• Consensus = 5? If yes, substructure exists? If yes, Pharmacophore < 4.2 Å? If yes, Publish Hit Report
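The decision cascade on this slide can be read as a chain of filters. The sketch below is one possible reading, not the authors' workflow: the `DockedCompound` fields, the outcome labels, and the treatment of compounds that pass the first filter but fail a later one ("retained") are assumptions made for illustration. Only the thresholds (consensus ≥ 3, consensus = 5, pharmacophore < 4.2 Å) come from the slide.

```python
from dataclasses import dataclass


@dataclass
class DockedCompound:
    """Hypothetical record for one docked library member."""
    name: str
    consensus_score: int           # assumed: integer consensus score, 0 to 5
    has_substructure: bool         # assumed: required substructure is present
    pharmacophore_distance: float  # assumed: pharmacophore distance in angstroms


def triage(compound: DockedCompound) -> str:
    """Apply the slide's thresholds in order and return an outcome label."""
    # First filter: consensus score of at least 3.
    if compound.consensus_score < 3:
        return "rejected"
    # Final gate: consensus of 5, substructure present, pharmacophore < 4.2 A.
    if (
        compound.consensus_score == 5
        and compound.has_substructure
        and compound.pharmacophore_distance < 4.2
    ):
        return "hit report"
    # Assumption: everything else passed the first filter but is not reported.
    return "retained"


library = [
    DockedCompound("cpd-1", 5, True, 3.9),
    DockedCompound("cpd-2", 5, True, 4.5),
    DockedCompound("cpd-3", 4, True, 3.0),
    DockedCompound("cpd-4", 2, True, 3.0),
]
outcomes = {c.name: triage(c) for c in library}
print(outcomes)
```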

• Goal: Predict hit/miss class based on the presence of features (fingerprints)
• Method
◦ Given a set of N samples
◦ Given that some subset A of them are good ('active')
  • Then we estimate for a new compound: $P(\text{good}) \approx A/N$
◦ Given a set of binary features F
  • For a given feature F (N and A now count only the samples that contain F):
  • It appears in N samples
  • It appears in A good samples
  • Can we estimate $P(\text{good} \mid F) \approx A/N$?
  • (Problem: the error gets worse as N becomes small)
◦ $P'(\text{good} \mid F) = (A + P(\text{good})\,k) / (N + k)$
  • $P'(\text{good} \mid F) \to P(\text{good})$ as $N \to 0$
  • $P'(\text{good} \mid F) \to A/N$ as N becomes large
◦ (If $k = 1/P(\text{good})$, this is the Laplacian correction)
• Descriptors (insert)
• Advantages
◦ Can describe a huge number of features (up to 4 billion; MDL 1024; Leadscope 27,000)
◦ Contains tertiary and stereochemistry information
◦ Fast
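The smoothed estimate from the Method bullets above can be checked numerically with a few lines of code. This is a minimal sketch: the function name, argument names, and the example counts are illustrative assumptions, while the formula and the choice k = 1/P(good) are taken from the slide.

```python
from typing import Optional


def laplacian_corrected(
    a_with_f: float, n_with_f: float, p_good: float, k: Optional[float] = None
) -> float:
    """Smoothed estimate P'(good | F) = (A + P(good) * k) / (N + k).

    a_with_f: number of good samples containing feature F (A on the slide)
    n_with_f: number of samples containing feature F (N on the slide)
    p_good:   baseline probability that a sample is good
    k:        smoothing weight; defaults to 1 / P(good), the Laplacian correction
    """
    if not 0.0 < p_good <= 1.0:
        raise ValueError("p_good must be in (0, 1]")
    if k is None:
        k = 1.0 / p_good
    return (a_with_f + p_good * k) / (n_with_f + k)


# Illustrative numbers only (not from the deck): a 10% baseline hit rate.
p_good = 0.1
for a, n in [(0, 0), (2, 2), (500, 1000)]:
    raw = a / n if n else float("nan")
    print(f"A={a:4d} N={n:5d} raw={raw:.3f} corrected={laplacian_corrected(a, n, p_good):.3f}")
```

With no observations the estimate falls back to the baseline, a feature seen only twice is pulled strongly toward the baseline, and a feature seen many times stays close to the raw ratio A/N, which matches the two limits listed on the slide.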

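• Classification Analysis
◦ Developing non-linear scoring functions to classify actives and non-actives
◦ (insert graphs)
◦ Cost function to minimize: Gini impurity $i(N) = 1 - \sum_j P^2(\omega_j)$, where $P(\omega_j)$ is the fraction of samples at node N that belong to class $\omega_j$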
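To make the Gini cost function concrete, the sketch below computes the impurity of a node and picks the threshold on a single descriptor that minimizes the size-weighted impurity of the two child nodes. It is a simplified illustration of how a CART-style split is chosen, not the CART software's implementation; the function names and the toy score data are assumptions.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity: 1 minus the sum of squared class fractions at a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(values: Sequence[float], labels: Sequence[Hashable], threshold: float) -> float:
    """Size-weighted Gini impurity of the children produced by `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values: Sequence[float], labels: Sequence[Hashable]) -> float:
    """Return the observed value that gives the lowest weighted child impurity.

    Assumes `values` contains at least two distinct numbers.
    """
    candidates = sorted(set(values))[:-1]  # the largest value leaves the right child empty
    return min(candidates, key=lambda t: split_impurity(values, labels, t))


# Toy data (illustrative): one score per compound and its activity class.
scores = [1.0, 2.0, 3.0, 4.0]
classes = ["non-active", "non-active", "active", "active"]
print(gini(classes), best_threshold(scores, classes))
```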

• Training Set Prediction Success
• (insert table)
• 10-fold cross validation
• Randomly split training and test sets
• Significant Improvement in Separating Actives from Non-Actives
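The validation scheme named on this slide (random splits, 10-fold cross validation) can be sketched with the standard library alone. The helper names, the fixed seed, and the `fit`/`predict` callables are assumptions for illustration; the deck does not say how its folds were built, and the sketch assumes there are at least as many samples as folds.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k randomly assigned folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # random split, reproducible via seed
    folds = [indices[i::k] for i in range(k)]   # k nearly equal-sized folds
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]


def cross_validate(
    samples: Sequence,
    labels: Sequence,
    fit: Callable,
    predict: Callable,
    k: int = 10,
    seed: int = 0,
) -> float:
    """Mean held-out accuracy over k folds (assumes len(samples) >= k)."""
    accuracies = []
    for train, test in k_fold_indices(len(samples), k, seed):
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, samples[i]) == labels[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k


splits = list(k_fold_indices(25, k=10))
print(len(splits), [len(test) for _, test in splits])
```

Each sample lands in exactly one test fold, so every prediction that contributes to the reported accuracy is made by a model that never saw that sample during fitting.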

• (insert graph)
• Significant Improvement in Finding Hits Using the New Scoring Function (SF)

• Optimal tree identified (insert graph)
• No random effects (insert graph)

• (insert cluster)
• Able to identify different molecular property criteria that lead to hits

• (insert graph)
• Size = magnitude of OBA
• OBA values cover the range of descriptor space

• (insert graph)
• Choose 1D and 2D descriptors for ease of interpretation and lower "noise"

• Build Model → (insert graphs) → Apply Model

• Features found in high OBA
• Features found in low OBA
• It would be nice if CART offered a similar view

• Improved scoring functions for separating hits from non-hits in structure-based drug design were developed with CART and Bayesian models
• Identified key differences in molecular physical properties that led to hits
• Built a reasonably predictive OBA model (however, the method cannot be expected to extend to other systems, given the complexity of OBA)

• Biogen IDEC
• Modeling
◦ Rajiah Denny
◦ Claudio Chuaqui
◦ Juswinder Singh
◦ Herman van Vlijmen
◦ Norman Wang
◦ Anuj Patel
◦ Zhan Deng
• Chemistry
◦ Kevin Guckian
◦ Dan Scott
◦ Thomas Durand-Reville
◦ Pat Conlon
◦ Charlie Hammond
◦ Chuck Jewell
• Pharmacology
◦ Tonika Bonhert

Authors: Donovan N. Chin & R. Aldrin Denny
