Discovery Studio 2016 Help - Theory - ADMET Descriptors
ADMET - Human Intestinal Absorption
Development of model
The model was developed using a training set of 182 compounds, with descriptors that include AlogP98 and 2D polar
surface area (PSA_2D) [Egan et al., 2000] and [Egan and Lauri, 2002]. Well-absorbed compounds (at least 90%
absorbed into the human bloodstream) are expected by this model to lie within the 95% and 99% confidence
ellipses.
Figure 1: Development of Human Intestinal Absorption Model
The human intestinal absorption model was validated against several datasets. Results showed that, for drug-like
compounds, the following percentages were correctly predicted to lie within the 99% absorption ellipse:
Physicians' Desk Reference (PDR): 87.4% of 438 orally delivered compounds
World Drug Index: 82.9% of 8,504 compounds with a USAN or INN
Comprehensive Medicinal Chemistry: 83.5% of 5,836 compounds filtered by class
Pharmacopeia's Caco-2 data (446 compounds with low, moderate, and high apparent permeability, Papp; used in many
PCOP Labs collaborative projects):
Moderate/High Papp: 91.5% of compounds lie within the 99% ellipse
Low Papp: 20.6% of compounds lie within the 95% ellipse
Computed properties
Property | Description
ADMET_PSA_2D | Fast polar surface area
ADMET_AlogP98 | Atom-based LogP (AlogP98)
ADMET_Unknown_AlogP98 | Number of atoms in the molecule with unknown AlogP98 types
ADMET_Absorption_Level | Categorical absorption level
Note: In calculating ADMET_AlogP98, atoms with unknown AlogP98 types do not contribute to the AlogP98
calculation.
Note: ADMET_Absorption_T2_2D is the Mahalanobis distance of the compound in the (ADMET_PSA_2D, ADMET_AlogP98)
plane, measured from the center of the region of chemical space defined by well-absorbed compounds.
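To make the ellipse criterion concrete, the sketch below computes a squared Mahalanobis distance in the (ADMET_PSA_2D, ADMET_AlogP98) plane and compares it with chi-square quantiles for 2 degrees of freedom, which is how the 95% and 99% confidence ellipses of a bivariate normal distribution are usually defined. This is a minimal illustration under stated assumptions, not the Discovery Studio implementation: the center and covariance (CENTER, COV) are hypothetical placeholders, because this page does not give the parameters of the well-absorbed region, and the function names are illustrative only.

```python
import math

# HYPOTHETICAL parameters of the well-absorbed region. The real centre and
# covariance used by Discovery Studio are not given in this help page.
CENTER = (60.0, 2.0)          # (PSA_2D, AlogP98)
COV = ((900.0, 10.0),         # covariance matrix of (PSA_2D, AlogP98)
       (10.0, 4.0))


def squared_mahalanobis_2d(point, center=CENTER, cov=COV):
    """Squared Mahalanobis distance of a 2-D point from `center`."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Inverse of [[a, b], [c, d]] is (1/det) * [[d, -b], [-c, a]]
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def chi2_quantile_2dof(p):
    """Quantile of the chi-square distribution with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - p)


def ellipse_region(psa_2d, alogp98):
    """Locate a compound relative to the 95% and 99% confidence ellipses."""
    d2 = squared_mahalanobis_2d((psa_2d, alogp98))
    if d2 <= chi2_quantile_2dof(0.95):
        return "inside 95% ellipse"
    if d2 <= chi2_quantile_2dof(0.99):
        return "between 95% and 99% ellipses"
    return "outside 99% ellipse"


print(ellipse_region(60.0, 2.0))    # inside 95% ellipse
print(ellipse_region(60.0, 7.0))    # between 95% and 99% ellipses
print(ellipse_region(200.0, 8.0))   # outside 99% ellipse
```

Comparing the squared distance with a chi-square quantile avoids drawing the ellipse explicitly; how these regions map onto ADMET_Absorption_Level categories is not stated here and is not modeled.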
ADMET - Aqueous Solubility
Development of model
Cheng and Merz reported a predictive model for aqueous solubility that was generated from a dataset of 775
compounds (molecular weight between 50 and 800) [Cheng and Merz, 2003]. The training set compounds cover
numerous classes, including alkanes, alkenes, alkynes, halogenated compounds, amines, alcohols, N-containing
compounds, ketones, aldehydes, and organic acids. A linear regression of predicted versus experimental log Sw
(25 °C, pH 7.0) for the training set gives R² = 0.84 and a standard deviation of 0.87. The test set consisted of 34
compounds, with regression statistics R² = 0.88 and standard deviation = 0.79 (see Figure 2). A validation test on
1,615 compounds from the PDR, the Comprehensive Medicinal Chemistry (CMC) database, and other sources was
also performed, yielding an overall RMSE (SD) of 1.0.
Figure 2 shows the regression plots for the training and test sets used to derive and test this model.
Figure 2: Development of Aqueous Solubility Model
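The R² and standard deviation figures above summarize a regression of predicted against experimental log Sw. As an illustration of how such statistics are typically computed, the sketch below takes R² as the squared Pearson correlation between the two series and reports the root-mean-square error of the residuals. These are assumed definitions, since the help text does not say exactly how the published statistics were computed; the function name and sample values are hypothetical.

```python
import math


def regression_stats(experimental, predicted):
    """Return (R^2, RMSE) for predicted vs experimental values.

    R^2 is taken as the squared Pearson correlation between the series;
    RMSE is the root-mean-square of (predicted - experimental).
    """
    n = len(experimental)
    mx = sum(experimental) / n
    my = sum(predicted) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(experimental, predicted))
    sxx = sum((x - mx) ** 2 for x in experimental)
    syy = sum((y - my) ** 2 for y in predicted)
    r2 = sxy * sxy / (sxx * syy)
    rmse = math.sqrt(
        sum((y - x) ** 2 for x, y in zip(experimental, predicted)) / n)
    return r2, rmse


exp_logsw = [-1.0, -2.5, -3.2, -4.8, -6.1]    # hypothetical experimental log Sw
pred_logsw = [-1.3, -2.2, -3.6, -4.5, -6.4]   # hypothetical predictions
r2, rmse = regression_stats(exp_logsw, pred_logsw)
print(f"R^2 = {r2:.2f}, RMSE = {rmse:.2f}")
```

Note that a constant offset between predictions and experiment leaves this R² unchanged but raises the RMSE, which is why both numbers are worth reporting.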
Computed properties
Property | Description
ADMET_Solubility | Base-10 logarithm of the molar solubility (mol/L), as predicted by the regression model
ADMET_Solubility_Level | Categorical solubility level
Note: In calculating aqueous solubility, atoms with unknown AlogP98 types do not contribute to the AlogP98
calculation.
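Because ADMET_Solubility is a base-10 logarithm of molar solubility (mol/L), converting it to a mass concentration requires the molecular weight. A small pair of helpers, with hypothetical names, shows the arithmetic in both directions:

```python
import math


def log_molar_to_mg_per_l(log_sw, mol_weight):
    """Convert log10 molar solubility (log mol/L) to mg/L."""
    mol_per_l = 10.0 ** log_sw
    return mol_per_l * mol_weight * 1000.0   # mol/L * g/mol = g/L; x1000 -> mg/L


def mg_per_l_to_log_molar(mg_per_l, mol_weight):
    """Inverse conversion: mg/L back to log10(mol/L)."""
    return math.log10(mg_per_l / 1000.0 / mol_weight)


# Hypothetical compound: MW 180 g/mol, predicted log Sw of -3.0
print(log_molar_to_mg_per_l(-3.0, 180.0))   # about 180 mg/L
```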
ADMET - Blood-Brain Barrier Penetration
Development of model
Scientists at BIOVIA developed two robust models for predicting blood-brain barrier (BBB) penetration. A regression
model for logBB values was derived from a training set of 102 compounds (R² = 0.7329, RMSE = 0.3638) and
evaluated on a test set of 86 compounds (R² = 0.8892, RMSE = 0.3064). The model was validated against 881
compounds designated as CNS compounds in the CMC database (Figure 3). Further testing against a collection of
124 compounds with known logBB values yielded R² = 0.889 and SD = 0.306.
Figure 3: Development of Blood-Brain Barrier Model
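logBB, the quantity the regression model predicts, is the base-10 logarithm of the ratio of brain concentration to blood concentration; negative values mean the compound is less concentrated in brain than in blood. A minimal helper (hypothetical name) makes the definition explicit:

```python
import math


def log_bb(c_brain, c_blood):
    """logBB = log10(C_brain / C_blood); both concentrations in the same units."""
    if c_brain <= 0 or c_blood <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(c_brain / c_blood)


print(log_bb(0.5, 5.0))   # brain level one tenth of blood level: about -1
```

Since only the ratio matters, any concentration unit works as long as brain and blood values use the same one.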
Computed properties
Property | Description
ADMET_PSA_2D | Fast polar surface area
ADMET_AlogP98 | Atom-based LogP (AlogP98)
ADMET_Unknown_AlogP98 | Number of atoms in the molecule with unknown AlogP98 types
ADMET_BBB | Base-10 logarithm of (brain concentration)/(blood concentration), as predicted by a robust (least-median-of-squares) regression derived from literature in vivo brain penetration data
ADMET_BBB_Level | Categorical BBB penetration level
Note: Atoms with unknown AlogP98 types do not contribute to the AlogP98 calculation.
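ADMET_BBB is described as coming from a robust least-median-of-squares (LMS) regression. LMS picks the fit that minimizes the median, rather than the sum, of squared residuals, so a minority of outlying compounds cannot drag the fit. The sketch below is a simplified single-predictor version, not the BIOVIA model: it searches the lines through every pair of points (the elemental-subset approach commonly used to approximate LMS) and uses the ordinary median. The descriptors and fitting details of the real BBB model are not given here; the function name and data are hypothetical.

```python
import itertools
import statistics


def lms_line(xs, ys):
    """Approximate least-median-of-squares fit of y = slope * x + intercept.

    Every pair of points defines a candidate line (elemental subsets); the
    candidate with the smallest median squared residual is returned.
    """
    best = None
    for i, j in itertools.combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical pair: no finite slope
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        med = statistics.median(
            (y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
        if best is None or med < best[0]:
            best = (med, slope, intercept)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


# Hypothetical data on y = 2x + 1 with two gross outliers appended
xs = list(range(10)) + [2.5, 7.5]
ys = [2 * x + 1 for x in range(10)] + [40.0, -30.0]
print(lms_line(xs, ys))   # (2.0, 1.0): the outliers are ignored
```

An ordinary least-squares fit on the same data would be pulled toward the two outliers; the exhaustive pair search costs O(n³) here, so it only suits small illustrative datasets.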
ADMET - CYP2D6 Inhibition
Development of model
A computational model for compounds that inhibit the CYP2D6 enzyme was developed from a training set of 151
structurally diverse compounds with known CYP2D6 inhibition constants [Susnow and Dixon, 2003]. However, the
modeling method described in that paper was not used; instead, modified Bayesian learning [Xia et al., 2004] was
applied. Leave-one-out cross-validation was performed, and the following table lists the performance of the model:
Model Name | ROC Score | ROC Rating | True Positive | False Negative | False Positive | True Negative | Sensitivity | Specificity | Concordance Rate
ADMET_EXT_CYP2D6 | 0.877 | Good | 56 | 13 | 12 | 70 | 0.812 | 0.854 | 0.834
Note: For definitions of these terms, see Receiver Operating Characteristic (ROC) Plot Analysis.
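The sensitivity, specificity, and concordance columns follow from the four counts using the standard confusion-matrix definitions, assumed here to be the ones the ROC topic uses. The sketch below recomputes them from the ADMET_EXT_CYP2D6 row of the table above and reproduces the listed rates; the function name is hypothetical.

```python
def classification_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity and concordance from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                   # true positive rate
    specificity = tn / (tn + fp)                   # true negative rate
    concordance = (tp + tn) / (tp + fn + fp + tn)  # overall fraction correct
    return sensitivity, specificity, concordance


# Counts from the ADMET_EXT_CYP2D6 row of the performance table
sens, spec, conc = classification_metrics(tp=56, fn=13, fp=12, tn=70)
print(f"{sens:.3f} {spec:.3f} {conc:.3f}")   # 0.812 0.854 0.834
```

The same function applied to the hepatotoxicity and plasma protein binding rows also reproduces their listed rates, which confirms the column order True Positive, False Negative, False Positive, True Negative.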
Computed properties
Property | Description
ADMET_EXT_CYP2D6 | Bayesian score from the model
ADMET_EXT_CYP2D6#Prediction | Classification of whether the compound is a CYP2D6 inhibitor, using a Bayesian score cutoff of 0.161 (obtained by minimizing the total number of false positives and false negatives)
ADMET_EXT_CYP2D6_Applicability | Applicability of the model to the predicted compound
ADMET_EXT_CYP2D6#MD | Mahalanobis distance (MD) to the center of the training data. MD is a generalization of the Euclidean distance that accounts for correlations among the properties. The larger the MD, the less reliable the prediction.
ADMET_EXT_CYP2D6#MDpvalue | p-value associated with the MD. The smaller the p-value, the less reliable the prediction.
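The #Prediction cutoff is described as the Bayesian score that minimizes the total number of false positives and false negatives. A minimal sketch of that selection, assuming a compound is called positive when its score is at or above the cutoff (the comparison direction is not stated in this page) and using hypothetical scores and labels:

```python
import math


def best_cutoff(scores, labels):
    """Return (cutoff, errors) minimising false positives + false negatives.

    Assumed rule: a compound is predicted positive when score >= cutoff.
    Candidate cutoffs are the observed scores plus +inf (all negative).
    """
    best_cut, best_err = None, None
    for cut in sorted(set(scores)) + [math.inf]:
        fp = sum(1 for s, y in zip(scores, labels) if s >= cut and not y)
        fn = sum(1 for s, y in zip(scores, labels) if s < cut and y)
        if best_err is None or fp + fn < best_err:
            best_cut, best_err = cut, fp + fn
    return best_cut, best_err


# Hypothetical Bayesian scores with known inhibitor labels
scores = [-2.0, -1.0, 0.1, 0.5, 1.2, 2.0]
labels = [False, False, True, False, True, True]
print(best_cutoff(scores, labels))   # (0.1, 1)
```

When several cutoffs tie on total errors, this sketch keeps the lowest one; other tie-breaking rules are equally defensible.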
ADMET - Hepatotoxicity
The hepatotoxicity model predicts potential organ toxicity for a wide range of structurally diverse compounds.
Development of model
The model was developed from literature data on 436 compounds, including compounds known to exhibit liver toxicity
(e.g., dose-dependent hepatocellular, cholestatic, or neoplastic toxicity) or to trigger dose-related elevation of
aminotransferase levels in more than 10% of the human population [Cheng and Dixon, 2003]. A classification
structure-activity relationship (SAR) model was then built using modified Bayesian learning [Xia et al., 2004] and
validated using leave-one-out cross-validation. The following table lists the performance of the model:
Model Name | ROC Score | ROC Rating | True Positive | False Negative | False Positive | True Negative | Sensitivity | Specificity | Concordance Rate
ADMET_EXT_Hepatotoxic | 0.888 | Good | 157 | 15 | 66 | 198 | 0.913 | 0.750 | 0.814
Note: For definitions of these terms, see Receiver Operating Characteristic (ROC) Plot Analysis.
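This page does not define "modified Bayesian learning". One common reading, assumed here, is the Laplacian-corrected naive Bayes estimator associated with Xia et al., 2004: each structural feature F gets the weight log((A_F + 1) / (T_F × P_base + 1)), where A_F is the number of actives containing F, T_F the number of compounds containing F, and P_base the fraction of actives; a compound's score is the sum of the weights of its features. The sketch below implements that formulation with hypothetical string features; the features Discovery Studio actually uses and any further modifications are not described here.

```python
import math
from collections import Counter


def train_laplacian_bayes(feature_sets, labels):
    """Per-feature weights log((A_F + 1) / (T_F * P_base + 1)).

    feature_sets: one set of (hypothetical) structural features per compound.
    labels: True for active (e.g. hepatotoxic) compounds.
    """
    p_base = sum(labels) / len(labels)   # fraction of actives
    total = Counter()                    # T_F: compounds containing F
    active = Counter()                   # A_F: actives containing F
    for feats, is_active in zip(feature_sets, labels):
        for f in feats:
            total[f] += 1
            if is_active:
                active[f] += 1
    return {f: math.log((active[f] + 1) / (total[f] * p_base + 1))
            for f in total}


def bayes_score(features, weights):
    """Sum the weights of features present; unseen features contribute 0."""
    return sum(weights.get(f, 0.0) for f in features)


train = [{"a", "b"}, {"a"}, {"c"}, {"c", "b"}]
actives = [True, True, False, False]
w = train_laplacian_bayes(train, actives)
print(bayes_score({"a", "z"}, w), bayes_score({"c"}, w))
```

The +1 terms shrink the weights of rarely seen features toward zero, so a feature observed in only one or two compounds cannot dominate the score.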
Computed properties
Property | Description
ADMET_EXT_Hepatotoxic | Bayesian score from the model
ADMET_EXT_Hepatotoxic#Prediction | Classification of whether the compound is hepatotoxic, using a Bayesian score cutoff of -4.154 (obtained by minimizing the total number of false positives and false negatives)
ADMET_EXT_Hepatotoxic_Applicability | Applicability of the model to the predicted compound
ADMET_EXT_Hepatotoxic#MD | Mahalanobis distance (MD) to the center of the training data. MD is a generalization of the Euclidean distance that accounts for correlations among the properties. The larger the MD, the less reliable the prediction.
ADMET_EXT_Hepatotoxic#MDpvalue | p-value associated with the MD. The smaller the p-value, the less reliable the prediction.
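The #MD and #MDpvalue properties can be illustrated with a small sketch: estimate the center and covariance from training descriptors, compute the Mahalanobis distance of a new compound, and convert it to a p-value. Two descriptors are used only for brevity, and the p-value assumes the squared distance follows a chi-square distribution with 2 degrees of freedom (exact for multivariate normal data), whose upper tail is exp(-D²/2). The descriptors Discovery Studio uses for #MD and its exact p-value calculation are not stated here; function names and data are hypothetical.

```python
import math


def fit_center_cov(points):
    """Mean and sample covariance (sxx, sxy, syy) of 2-D training descriptors."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def md_and_pvalue(point, center, cov):
    """Mahalanobis distance and chi-square (2 dof) upper-tail p-value."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - center[0], point[1] - center[1]
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    p_value = math.exp(-d2 / 2.0)   # survival function of chi-square, 2 dof
    return math.sqrt(d2), p_value


training = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]   # hypothetical
center, cov = fit_center_cov(training)
print(md_and_pvalue((3.0, 1.0), center, cov))    # MD about 1.73, p about 0.22
print(md_and_pvalue((11.0, 1.0), center, cov))   # far away: tiny p-value
```

As the property descriptions state, a larger MD (equivalently a smaller p-value) signals a compound further from the training data and a less reliable prediction.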
fraction is temporarily shielded from metabolism. On the other hand, only the unbound fraction exhibits
pharmacological effects.
Development of model
Two datasets of plasma protein binding levels [Dixon and Merz, 2001] and [Votano et al., 2006] were combined, and
duplicates were removed, to give a training set of 854 compounds. Using a binding-level cutoff of 90%, the training
set was divided into 329 binders and 525 non-binders. Modified Bayesian learning [Xia et al., 2004] was used to create
a binary classification model, which was validated using leave-one-out cross-validation. The following table lists the
performance of the model:
Model Name | ROC Score | ROC Rating | True Positive | False Negative | False Positive | True Negative | Sensitivity | Specificity | Concordance Rate
ADMET_EXT_PPB | 0.873 | Good | 285 | 44 | 121 | 404 | 0.866 | 0.770 | 0.807
Note: For definitions of these terms, see Receiver Operating Characteristic (ROC) Plot Analysis.
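The training-set preparation described above (merge two datasets, drop duplicates, label compounds bound at or above 90% as binders) can be sketched as follows. Compounds are keyed by a hypothetical identifier, and duplicates keep the first value seen; both choices are assumptions, since the help text does not say how duplicates were identified or resolved.

```python
def build_ppb_training_set(datasets, cutoff=90.0):
    """Merge {compound_id: percent_bound} dicts and label binders.

    Duplicates (same compound_id) keep the first value seen; this rule is an
    assumption, since the help text does not say how duplicates were resolved.
    """
    merged = {}
    for data in datasets:
        for compound_id, pct_bound in data.items():
            merged.setdefault(compound_id, pct_bound)
    # Binder = at least `cutoff` percent bound (>= 90% by default)
    labels = {cid: pct >= cutoff for cid, pct in merged.items()}
    return merged, labels


# Hypothetical compound IDs and binding levels
set_a = {"cpd1": 95.0, "cpd2": 40.0}
set_b = {"cpd2": 55.0, "cpd3": 90.0}
merged, labels = build_ppb_training_set([set_a, set_b])
n_binders = sum(labels.values())
print(n_binders, "binders,", len(labels) - n_binders, "non-binders")   # 2 binders, 1 non-binders
```

In practice duplicates would more likely be detected by structure (for example, a canonical structure representation) than by a name or ID.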
Computed properties
Property | Description
ADMET_EXT_PPB | Bayesian score from the model
ADMET_EXT_PPB#Prediction | Classification of whether the compound is highly bound (≥ 90% bound) to plasma proteins, using a Bayesian score cutoff of -2.209 (obtained by minimizing the total number of false positives and false negatives)
ADMET_EXT_PPB_Applicability | Applicability of the model to the predicted compound
ADMET_EXT_PPB#MD | Mahalanobis distance (MD) to the center of the training data. MD is a generalization of the Euclidean distance that accounts for correlations among the properties. The larger the MD, the less reliable the prediction.
ADMET_EXT_PPB#MDpvalue | p-value associated with the MD. The smaller the p-value, the less reliable the prediction.
Further information
Receiver Operating Characteristic (ROC) Plot Analysis
Toxicity Prediction (Extensible) - Theory
Toxicity Prediction (TOPKAT) - Theory
Legal Notices BIOVIA Discovery Studio 2016 Help: Tuesday, December 01, 2015