Chemoinformatics and Applications in Agrochemical Discovery

C. Devakumar and Rajesh Kumar


Division of Agricultural Chemicals
IARI, New Delhi-110012
Chemical Space

            Stars       Small Molecules
Existing    10^22       10^7
Virtual     0           10^60 (?)
Mode        Real        Virtual
Access      Difficult   "Easy"
Chemical Space: Small Molecules in Organic Chemistry

Understanding chemical space

Small molecules:
• chemical synthesis
• drug design
• chemical genomics
• systems biology
• nanotechnology
• and others
Cost to develop and time to market of various products
Registration of safer chemicals

The proportion of pesticide active ingredients that are considered to be safer (biological chemicals and reduced-risk conventional chemicals) has steadily increased over the last several years.
Source: EPA, 1999.
Plant biotechnology opens new markets/solutions
The development of agrochemical in vivo screening
Overall Outline
1. Introduction
2. Molecular Representations
3. Chemical Data and Databases
4. Molecular Similarity
5. Chemical Reactions
6. Machine Learning and Other Predictive Methods
7. Molecular Docking and Drug Discovery
What is Chemoinformatics?
• It encompasses the design, creation,
organisation, management, retrieval, analysis,
dissemination, visualization and use of chemical
information
• It is the mixing of information resources to
transform data into information and information
into knowledge, for the intended purpose of
making better decisions faster in the arena of
drug lead identification and optimization
What is Chemoinformatics?
• “the set of computer algorithms and tools
to store and analyse chemical data in the
context of drug discovery and design
projects”
• Chemoinformatics is the application of
informatics methods to solve chemical
problems
Resources
Books:
J. Gasteiger and T. Engel (Editors) (2003).
Chemoinformatics: A Textbook. Wiley.
A.R. Leach and V. J. Gillet (2005). An Introduction to
Chemoinformatics. Springer.
Journal:
Journal of Chemical Information and Modeling
Web:
http://cdb.ics.uci.edu
and many more.
History of Chemoinformatics
The first, and still the core, journal for the subject, the Journal of Chemical Documentation, started in 1961 (its name changed to the Journal of Chemical Information and Computer Sciences in 1975).

The first book appeared in 1971 (Lynch, Harrison, Town and Ash, Computer Handling of Chemical Structure Information).

The first international conference on the subject was held at Noordwijkerhout in 1973, and it has been held there every three years since 1987.
Chemoinformatics

'Cheminformatics combines the scientific working fields of chemistry and computer science, for example in the area of chemical graph theory and mining the chemical space. It is to be expected that the chemical space contains at least 10^62 molecules.'

• Chemoinformatics encompasses the design, creation, organisation, management, retrieval, analysis, dissemination, visualization and use of chemical information.
Is it Cheminformatics or Chemoinformatics?
• Cheminformatics, molecular informatics, chemical informatics, or even chemo-bioinformatics

Year   Cheminformatics   Chemoinformatics   Ratio (Cheminformatics/Chemoinformatics)
2000   39                684                0.05
2001   8,010             2,910              2.75
2002   34,000            16,000             2.12
2003   58,143            32,872             1.77
2004   85,435            60,439             1.41
2005   658,298           272,096            2.41
2006   317,000+          163,000+           1.94
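The ratio column is the first count divided by the second. As a quick consistency check, this short Python sketch (the counts are transcribed from the table; the names are our own) recomputes it; the recomputed ratios agree with the listed ones to within 0.01, the listed values being rounded or truncated to two decimals.

```python
# Counts transcribed from the table: (cheminformatics, chemoinformatics).
counts = {
    2000: (39, 684),
    2001: (8_010, 2_910),
    2002: (34_000, 16_000),
    2003: (58_143, 32_872),
    2004: (85_435, 60_439),
    2005: (658_298, 272_096),
    2006: (317_000, 163_000),  # listed as lower bounds ("+")
}

# Ratios as printed in the table, for comparison.
listed_ratio = {2000: 0.05, 2001: 2.75, 2002: 2.12, 2003: 1.77,
                2004: 1.41, 2005: 2.41, 2006: 1.94}


def usage_ratio(chem: int, chemo: int) -> float:
    """Ratio of 'cheminformatics' to 'chemoinformatics' counts."""
    return chem / chemo


for year, (chem, chemo) in sorted(counts.items()):
    ratio = usage_ratio(chem, chemo)
    print(f"{year}: computed {ratio:.3f}, listed {listed_ratio[year]}")
```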


Why Do We Need Chemoinformatics?

1) The enormous amount of chemical data, and the need to maintain it.

2) Can we gain enough knowledge from the known data to make predictions for those cases where the required information is not available?

3) To establish relationships between the structure of a compound and its biological activity, or between reaction conditions and chemical reactivity.
Advances in theoretical and computational chemistry now allow
chemists to model chemical compounds “in silico” with ever-
increasing accuracy.
Molecular properties now becoming accessible through
computation include molecular shape, electronic structure,
physical properties, chemical reactivity, protein folding, structures
of materials and surfaces, catalytic activity, and biochemical
activities.
Chemoinformatics
integrates a comprehensive knowledge of chemistry with
an extensive understanding of information technology.
The intersection of chemistry and information
technology embraces an expanding territory:
computational modeling of individual molecules,
thermodynamic methods of estimating chemical
properties, methods of predicting biological activity of
hypothetical compounds, and organization and
classification of chemical information.
Schematic representation of a crowded cell. An
array of different molecules can function
independently under extremely crowded
conditions, partly because of judicious
distributions of oppositely charged polar groups
on the molecular surfaces. However, such
systems are in some ways extremely fragile.
For example, a mutation that alters just one
amino acid in the haemoglobin molecule can
stimulate massive aggregation and give rise to
a fatal genetic disease, sickle-cell anaemia.
More generally, many disorders of old age, most
famously Alzheimer's disease, result from the
increasingly facile conversion of normally
soluble proteins into intractable deposits that
occur particularly as we get older. Many of these
aggregation processes involve the reversion of
the unique biologically active forms of
polypeptide chains into a generic and non-functional
'chemical' form.
Additional computational challenges lie in indexing and
classifying the vast population of chemical compounds that
could be synthesized or are already known.
Specific indexing and search problems include:
• how to find a compound that might block a specific biological
target;
• how to predict the most efficient synthetic strategy for a desired
compound from available precursors;
• how to employ results of bioactivity tests on a family of
molecules to design improved versions.
Currently combinatorial chemists are developing new methods of
synthesizing libraries of related compounds on an unprecedented
scale.
Such libraries can be used to produce huge arrays of materials for
investigation of biochemical, catalytic, or material properties.
Systems are required to design, catalog, and search these libraries,
assess test results in a meaningful way, and integrate new
information with existing chemical databases.
Investigations into information storage at the molecular level are
underway, bringing to full circle the link between chemistry and
information technology.
The Scope of Chemoinformatics

Representations and Structure Searching

Substructure Searching

Similarity Searching, Clustering, and Diversity Analysis

Searching Databases

Computer-aided Structure Elucidation

3D Substructure Searching

QSAR and Docking

Structure and applications of chemoinformatics
• Database design and programming
• Representation and searching of chemical structures
• Structure, substructure & similarity searching in 2D & 3D
• Markush and reaction searching
• Representation and searching of biological databases
• Chemoinformatics software
• Data analysis techniques
– Clustering
– Evolutionary algorithms
– Graph theory
– Neural networks
• Chemical information sources
• Cheminformatics applications
• Techniques used to design bioactive compounds
• Molecular simulation and design
• Drug discovery process; QSAR; Combi-chem; SBDD
• Spectroscopy and crystallography in cheminformatics
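Clustering of compounds is commonly done on binary structural fingerprints compared with the Tanimoto coefficient. The Python sketch below shows a simple single-pass leader clustering under that assumption; the fingerprints are hypothetical sets of "on" bit positions rather than the output of any particular fingerprinting software, and the 0.5 threshold is arbitrary.

```python
def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def leader_cluster(fps: dict[str, frozenset[int]], threshold: float) -> list[list[str]]:
    """Single-pass leader clustering.

    Each compound joins the first cluster whose leader (first member) is at
    least `threshold` similar to it; otherwise it becomes the leader of a new
    cluster. The outcome depends on the input order.
    """
    clusters: list[list[str]] = []
    for name, fp in fps.items():
        for cluster in clusters:
            if tanimoto(fps[cluster[0]], fp) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Hypothetical fingerprints (sets of on-bit positions), not real compounds.
fingerprints = {
    "cpd_A": frozenset({1, 2, 3, 4}),
    "cpd_B": frozenset({1, 2, 3, 5}),
    "cpd_C": frozenset({10, 11, 12}),
    "cpd_D": frozenset({10, 11, 13}),
}
print(leader_cluster(fingerprints, threshold=0.5))
```

Because the result depends on input order, practical methods such as Butina (sphere-exclusion) clustering choose cluster centres more carefully, for example by neighbour count.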
Kinds of chemistry databases
• Small-molecule databases
– Databases of commercially available compounds (e.g. ACD,
http://www.mdl.com/products/experiment/available_chem_dir/index.jsp)
– Proprietary chemical structure databases
– Literature databases
– Patent databases
– Small project-specific databases
• Protein databases
– Public, online databases (e.g. PDB, http://www.pdb.org)
– Proprietary and project-specific databases
Software Companies
Accelrys - large chemoinformatics company
ACD/Labs - analytical informatics & predictions
BCI - 2D fingerprinting, clustering toolkits & software
Bioreason - HTS data analysis software
CambridgeSoft - 2D drawing tools & e-notebooks
CAS - produces SciFinder Scholar searching software
ChemAxon - Java-based toolkits and software
Daylight - 2D representation & searching software
Leadscope - 2D structure and property tools
Lion Bioscience - produces LeadNavigator
MDL - large chemoinformatics company
OpenEye - fast 3D docking, structure generation, toolkits
Quantum Pharmaceuticals - prediction, docking, screening
Sage Informatics - ChemTK 2D analysis software
Tripos - large chemoinformatics company
Journals & Magazines

Journal of Chemical Information and Computer Sciences


Journal of Computer-Aided Molecular Design
Journal of Molecular Graphics and Modelling
Journal of Medicinal Chemistry
NetSci (online journal)
Scientific Computing World
Bio-IT World
Drug Discovery Today

Newsletters, Mailing Lists & Other Hubs

Chemical Informatics Letters - monthly newsletter
CHMINF-L (Indiana) - email discussion list
Chemoinf Yahoo Group - email discussion list
Chemistry Software Yahoo Group
Cheminformatics.org - lots of links and QSAR datasets
Reactive Reports - chemistry web magazine
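SMILES (Simplified Molecular Input Line Entry System)

• Aliphatic atoms: capital letters
• Aromatic atoms: small (lower-case) letters
• Rings: indicated by giving the same ring-closure number to the two atoms that close the ring
• Double bonds: "=" sign
• Parentheses: branching in the molecule

[Figure: structure of acetaminophen with its 11 heavy atoms numbered]
Acetaminophen

SMILES Representation
c1c(O)ccc(NC(=O)C)c1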
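To make the rules above concrete, here is a minimal Python sketch (our own illustration, not a full SMILES parser) that tokenizes the acetaminophen string and counts its heavy atoms. It assumes the organic subset only: bracket atoms such as [NH4+], stereochemistry and implicit hydrogens are not handled. The 11 atoms it finds match the eleven numbered atoms of the drawing; a real toolkit such as RDKit or Open Babel would be used in practice.

```python
import re
from collections import Counter

# Organic-subset SMILES tokens: two-letter halogens first, then aliphatic
# (upper-case) and aromatic (lower-case) atoms, ring-closure digits,
# branch parentheses and explicit bond symbols.
# Bracket atoms such as [NH4+] are deliberately not supported.
TOKEN = re.compile(r"Br|Cl|[BCNOPSFI]|[bcnops]|\d|[()=#\-]")


def tokenize(smiles: str) -> list[str]:
    tokens = TOKEN.findall(smiles)
    # If anything was skipped, the string uses syntax this sketch cannot read.
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported SMILES syntax in {smiles!r}")
    return tokens


def heavy_atom_counts(smiles: str) -> Counter:
    """Count heavy atoms by element; aromatic 'c' is reported as 'C'."""
    counts = Counter()
    for tok in tokenize(smiles):
        if tok[0].isalpha():
            counts[tok.capitalize()] += 1
    return counts


def ring_closures(smiles: str) -> int:
    """Each ring-closure digit appears twice (opening and closing the ring)."""
    return sum(tok.isdigit() for tok in tokenize(smiles)) // 2


acetaminophen = "c1c(O)ccc(NC(=O)C)c1"
print(heavy_atom_counts(acetaminophen))
print(ring_closures(acetaminophen))
```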

Sources of 3D structure information

• X-ray crystallography
• NMR spectroscopy
DRAWING AND DEPICTING 2D STRUCTURES
Web-based drawing tools
JME (http://www.molinspiration.com/cgi-bin/properties) is a clean, simple Java drawing tool.
Draw your structure and click on the smiley face to show the SMILES.
Marvin Sketch is a Java applet that allows you to draw structures and export them as
SMILES, MDL MOL files or other formats.

Web-based depiction tools

Daylight Depiction Tool (http://www.daylight.com/daycgi/depict) is a very simple tool:
you enter a SMILES string and it produces a 2D structure diagram from it.
CACTVS GIF generator has a more complex interface, but allows many more options for
producing GIF picture files of SMILES or other format structures. The quality of its images
is superior to that of the Daylight tool.
MDL Chime (http://www.mdlchime.com) is a browser-based plugin that can display both 2D
and interactive 3D structures in web pages.
2D searching with Oracle chemistry cartridges

• Daylight DayCart – http://www.daylight.com/products/daycart.html
• Tripos Auspyx – http://www.tripos.com/sciTech/inSilicoDisc/chemInfo/auspyx.html
• Accelrys Accord for Oracle – http://www.accelrys.com/accord/oracle.html
• MDL Relational Chemistry Server – http://www.mdl.com/products/isisdirect.html
• IDBS ActivityBase – http://www.id-bs.com/products/abase/
• ChemAxon JChem Cartridge – http://www.jchem.com
3D Structure generation and minimization

Concord from Tripos, Inc. is one of the first 3D structure generation programs and is still being refined and developed. It generates single, minimal-energy structures from input 2D structures, and it can read and write a variety of file formats.
http://www.tripos.com/sciTech/inSilicoDisc/chemInfo/concord.html
Corina from the Gasteiger group is similar to Concord.
http://www2.chemie.uni-erlangen.de/software/corina/free_struct.html
Omega from OpenEye is a more recent program that offers very fast generation of multiple low-energy conformers.
http://www.eyesopen.com/products/applications/omega.html
Depiction Tools for 3D structures
MDL Chime is a web browser plug-in that allows 2D and 3D structures to be
viewed in web pages. It can be used to visualize both proteins and small
molecules, and includes some limited ability to create molecular surfaces. It
is excellent for communicating structures via the web and for use in writing
web-based chemoinformatics software. http://www.mdlchime.com

ArgusLab is a free molecular modeling program that has a fairly extensive
set of options for 3D visualization, calculation of surfaces and properties,
minimization, and molecular docking. http://www.arguslab.com
Methods for Calculating Physical and Chemical Data

• Quantum mechanical calculations

• Additive schemes
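An additive scheme estimates a molecular property as the sum of contributions from its atoms, bonds or fragments. The simplest exactly additive property is molecular weight, so this Python sketch (our own illustration) computes it for acetaminophen, C8H9NO2, from standard atomic weights. Fragment-based schemes for properties such as logP follow the same pattern with fragment counts and fitted contribution values, which are not reproduced here.

```python
# Standard atomic weights (g/mol), rounded to three decimals.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def additive_property(composition: dict[str, int], contribution: dict[str, float]) -> float:
    """Additive scheme: property = sum over parts of (count x contribution)."""
    return sum(count * contribution[part] for part, count in composition.items())


acetaminophen = {"C": 8, "H": 9, "N": 1, "O": 2}  # C8H9NO2
print(f"{additive_property(acetaminophen, ATOMIC_WEIGHT):.2f} g/mol")
```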

Data Analysis Methods

• Unsupervised - artificial neural networks, genetic algorithms

• Supervised - inductive learning methods, statistics, pattern recognition methods
Chemistry-Based Data Mining and Exploration

[Figure: flow chart linking chemical(s) of concern, structure-searchable databases, specific, structural, property and biological or mechanistic analogue data, data mining, and structure-activity relationships]
Chemometrics

• Quantitative analysis of chemical data relied exclusively on multilinear regression analysis.

• Artificial neural networks

An artificial neural network (ANN), commonly just called a neural network (NN), is an interconnected group of artificial neurons that uses a mathematical or computational model for information processing based on a connectionist approach to computation.

[Figure: neural network with input, hidden and output layers]
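The layered structure described above (input, hidden and output neurons) can be sketched as a forward pass in plain Python. This minimal illustration assumes one hidden layer with sigmoid activation and a single linear output, e.g. a predicted activity from a few molecular descriptors. The weights would normally be learned from training data (for instance by backpropagation), which is not shown, and all numbers below are hypothetical.

```python
import math
from typing import Callable


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs: list[float], weights: list[list[float]], biases: list[float],
          activation: Callable[[float], float]) -> list[float]:
    """One fully connected layer: activation(w . x + b) for each neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(descriptors, w_hidden, b_hidden, w_out, b_out) -> float:
    """Input layer -> sigmoid hidden layer -> single linear output neuron."""
    hidden = dense(descriptors, w_hidden, b_hidden, sigmoid)
    return dense(hidden, w_out, b_out, lambda z: z)[0]


# Hypothetical network: 2 descriptors, 3 hidden neurons, 1 output.
w_hidden = [[0.4, -0.2], [0.1, 0.7], [-0.5, 0.3]]
b_hidden = [0.0, 0.1, -0.1]
w_out = [[1.2, -0.8, 0.5]]
b_out = [0.05]
print(forward([0.5, -1.0], w_hidden, b_hidden, w_out, b_out))
```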
Computer-Assisted Structure Elucidation (CASE)

• A classic field of application for artificial intelligence techniques.

• The DENDRAL project, initiated in 1964 at Stanford University

Computer-Assisted Synthesis Design (CASD)

In 1969, Corey and Wipke worked on the development of a synthesis design system.

1. Substructure searching
2. Similarity searching
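The two search modes listed above can be illustrated with fingerprints stored as integers, one bit per structural feature. In this hedged Python sketch, the substructure screen keeps only molecules whose fingerprint contains every bit of the query fingerprint; this is necessary but not sufficient for a match, so real systems follow it with atom-by-atom graph matching, which is omitted here. Similarity searching ranks molecules by the Tanimoto coefficient. The fingerprints and molecule names are made up for illustration.

```python
def popcount(x: int) -> int:
    """Number of set bits in a fingerprint stored as an int."""
    return bin(x).count("1")


def passes_screen(query_fp: int, target_fp: int) -> bool:
    """Substructure screen: every bit set in the query must also be set in the target."""
    return (query_fp & target_fp) == query_fp


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient on bit-string fingerprints."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 1.0


def similarity_search(query_fp: int, library: dict[str, int],
                      top_n: int = 3) -> list[tuple[str, float]]:
    """Rank library molecules by Tanimoto similarity to the query, highest first."""
    scores = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical 8-bit fingerprints for illustration only.
library = {"mol_a": 0b1011_0110, "mol_b": 0b1000_0110, "mol_c": 0b0100_0001}
query = 0b0000_0110

print([name for name, fp in library.items() if passes_screen(query, fp)])
print(similarity_search(query, library))
```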

Applications of Chemoinformatics

1. Chemical Information

• Storage and retrieval of chemical structures and associated data to manage the flood of data

• Dissemination of data on the internet

• Cross-linking of data to information

2. All fields of chemistry

• Prediction of the physical, chemical, or biological properties of compounds
3. Bioactive molecules

• identification of new lead structures

• optimization of lead structures

• establishment of quantitative structure-activity relationships

• comparison of chemical libraries

• definition and analysis of structural diversity

• planning of chemical libraries
• analysis of high-throughput data

• docking of a ligand into a receptor

• prediction of the metabolism of xenobiotics

• analysis of biochemical pathways

4. Organic Chemistry

• Prediction of the course and products of organic reactions

• Design of organic synthesis
5. Analytical Chemistry

• Analysis of data from analytical chemistry to make predictions on the quality, origin, and age of the investigated objects

• Elucidation of the structure of a compound based on spectroscopic data

Teaching Chemoinformatics

• Chemists have to become more efficient in planning their experiments and have to extract more knowledge from their data.
Toxicity Prediction for Chemical Q

[Figure: workflow for chemical Q linking chemical class assignment, a class-based SAR model, a global toxicity model, analogue search, supporting information and data collection to hypothesis generation, toxicity prediction and a weight-of-evidence presentation of toxicity]
Institutes Offering Courses in Chemoinformatics

University of Barcelona, Spain
University of Erlangen-Nürnberg, Germany
Bioinformatics Institute of India, Chandigarh
Georgia Institute of Technology
University of Sheffield (Willett) - MSc/PhD programs
University of Erlangen (Gasteiger)
UCSF (Kuntz)
University of Texas (Pearlman)
Yale (Jorgensen)
University of Michigan (Crippen)
Indiana University (Wiggins) - MSc program
Cambridge Unilever (Glen, Goodman, Murray-Rust)
Scripps - Molecular Graphics lab
SAR Applications

DRUG DESIGN                      ENVIRONMENTAL PROTECTION
Maximum activity                 Prediction of toxicity
Minimize toxicity
• Single therapeutic target      • Multiple unknown targets
• Drug-like chemicals            • Diverse structures
• Some toxicity anticipated      • Humans and ecosystems
QSAR STUDIES
DESIGN OF INSECTICIDE SYNERGISTS

FURAPIOLE ANALOGUES

[Structure: furapiole analogue scaffold with variable substituent R]

$$\log SF = 0.319\,R_M + 0.445\,\sigma_R + 0.248\,B_1 + 0.034\,B_4 - 0.966$$

n = 14, s = 0.057, r = 0.950, F = 21.04
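Once fitted, a QSAR equation such as the furapiole one is applied by substituting a new analogue's descriptor values. The Python sketch below does this for that equation. The descriptor keys are our reading of the slide's symbols (presumably R_M a chromatographic lipophilicity parameter, σR an electronic resonance constant, B1 and B4 Verloop steric parameters), and the example descriptor values are hypothetical; SF is recovered from log SF as a power of ten.

```python
# Coefficients of the furapiole-analogue equation for log SF.
FURAPIOLE_LOG_SF = {
    "intercept": -0.966,
    "R_M": 0.319,
    "sigma_R": 0.445,
    "B1": 0.248,
    "B4": 0.034,
}


def predict(equation: dict[str, float], descriptors: dict[str, float]) -> float:
    """Evaluate a linear QSAR equation: intercept + sum(coefficient x descriptor)."""
    terms = {name: coef for name, coef in equation.items() if name != "intercept"}
    missing = terms.keys() - descriptors.keys()
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return equation["intercept"] + sum(coef * descriptors[name] for name, coef in terms.items())


# Hypothetical descriptor values for a new analogue (not from the study).
new_analogue = {"R_M": 1.0, "sigma_R": -0.1, "B1": 2.0, "B4": 4.0}
log_sf = predict(FURAPIOLE_LOG_SF, new_analogue)
print(f"log SF = {log_sf:.3f}, SF = {10 ** log_sf:.2f}")
```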


DESIGN OF INSECTICIDE SYNERGISTS

SESAMOL ETHERS

[Structure: sesamol ether scaffolds with variable substituent R]

$$\log SF = 0.153\,D_2 + 0.240\,D_1 - 1.711\,\sigma_I - 0.429\,R_M + 0.070\,L - 0.384$$

n = 29, s = 0.087, r = 0.938, F = 33.72


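DILLAPIOLE SIDE CHAIN ANALOGUES

[Structure: dillapiole side-chain analogue scaffolds with variable substituents R and R1]

$$\log SF = 0.467 - 0.105\,D - 1.537\,R_M^2 - 0.980\,\sigma_R$$
n = 17, s = 0.046, r = 0.948, F = 38.84

$$\log SF = 0.305 - 0.110\,D - 1.114\,R_M^2 - 1.626\,\sigma_R + 0.012\,B_4$$
n = 17, s = 0.045, r = 0.955, F = 31.37

$$\log SF = 0.071 - 0.120\,D - 0.619\,R_M^2 - 2.066\,\sigma_R + 0.080\,B_4 - 0.003\,L^2$$
n = 17, s = 0.045, r = 0.958, F = 24.86

$$\log SF = 0.053 - 0.134\,D - 0.216\,R_M^2 - 1.290\,\sigma_R + 0.135\,B_4 + 0.006\,L^2 - 0.67\,\sigma$$
n = 17, s = 0.046, r = 0.961, F = 20.30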
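The statistics reported with each equation are linked: for a multiple linear regression with k descriptors fitted to n compounds, F = (r²/k) / ((1 - r²)/(n - k - 1)). As a consistency check, this Python sketch recomputes F from the reported r, n and k (k counted from the descriptor terms of each equation). The results agree with the reported F values to within about 0.4, a gap consistent with r having been rounded to three decimals.

```python
def f_from_r(r: float, n: int, k: int) -> float:
    """Overall F statistic of a multiple linear regression with k descriptors
    fitted to n compounds, computed from the multiple correlation coefficient r."""
    r2 = r * r
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# (series, r, n, k, reported F); k is the number of descriptor terms.
reported = [
    ("furapiole analogues", 0.950, 14, 4, 21.04),
    ("sesamol ethers", 0.938, 29, 5, 33.72),
    ("dillapiole analogues, first equation", 0.948, 17, 3, 38.84),
]

for series, r, n, k, f_reported in reported:
    print(f"{series}: F = {f_from_r(r, n, k):.2f} (reported {f_reported})")
```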
DESIGN OF CHEMICAL HYBRIDISING AGENTS
Hybrid Technology

Pollination control systems (male sterility)
• Three-line system: cytoplasmic male sterility
• Two-line system: genetic male sterility; chemical hybridising agents
QSAR Equations for Ethyl Oxanilates (n = 27)

[Structure: ethyl oxanilate scaffold with substituent X]

No. | Equation (Ms =) | r | s | F (p, %)*
1 | 49.99 Fp - 2.39 ΣMR + 64.73 | 0.70 | 14.10 | 11.75 (0.00)
2 | 39.73 Fp - 3.24 ΣMR + 0.32 MW - 0.91 | 0.76 | 13.20 | 10.43 (0.02)
3 | 43.74 Fp - 3.04 ΣMR + 0.36 MW - 5.63 D - 0.71 | 0.81 | 12.10 | 10.41 (0.01)
4 | 44.61 Fp - 2.93 ΣMR + 0.65 MW - 5.78 D + 8.02 ΣEs - 56.94 | 0.86 | 10.80 | 12.05 (0.00)
5 | 35.56 Fp - 2.96 ΣMR + 0.85 MW - 4.94 D + 10.36 ΣEs - 10.00 Σπ - 96.48 | 0.90 | 9.37 | 14.49 (0.00)
6 | 39.59 Fp - 2.86 ΣMR + 0.67 MW - 5.11 D - 2.57 ΣR - 16.91 Σπ - 64.28 | 0.91 | 3.73 | 16.99

* p values (%)
J. Agric. Food Chem. 2003, 51, 992-998
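Equations like these are obtained by multiple linear regression of activity on physicochemical descriptors. As an illustration only, the Python sketch below fits y = b0 + b1·x1 + ... + bk·xk by ordinary least squares (solving the normal equations by Gaussian elimination) and reports n, r and s in the sense used in the tables. It is not the software used in the cited studies, and the descriptor matrix in the usage example is hypothetical; for real work a numerical library routine such as NumPy's `numpy.linalg.lstsq` would be the safer choice.

```python
def fit_mlr(X: list[list[float]], y: list[float]) -> list[float]:
    """Ordinary least squares for y = b0 + b1*x1 + ... + bk*xk.

    Builds the normal equations (A^T A) b = A^T y, where A is X with a
    leading column of ones, and solves them by Gaussian elimination with
    partial pivoting. Returns [b0, b1, ..., bk].
    """
    A = [[1.0, *row] for row in X]
    p, m = len(A[0]), len(A)
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(A[i][r] * A[i][c] for i in range(m)) for c in range(p)]
         + [sum(A[i][r] * y[i] for i in range(m))]
         for r in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    b = [0.0] * p
    for r in reversed(range(p)):
        b[r] = (M[r][p] - sum(M[r][c] * b[c] for c in range(r + 1, p))) / M[r][r]
    return b


def regression_stats(X: list[list[float]], y: list[float],
                     b: list[float]) -> tuple[int, float, float]:
    """Return (n, r, s): number of compounds, multiple correlation coefficient
    and standard error of estimate for the fitted coefficients b."""
    n, k = len(y), len(b) - 1
    pred = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r = max(0.0, 1.0 - ss_res / ss_tot) ** 0.5
    s = (ss_res / (n - k - 1)) ** 0.5
    return n, r, s


# Hypothetical data: two descriptors (x1, x2) for six compounds and their activities.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]]
y = [4.1, 5.4, 9.05, 10.45, 14.0, 15.5]
b = fit_mlr(X, y)
print("b0, b1, b2 =", [round(v, 3) for v in b])
print("n, r, s =", regression_stats(X, y, b))
```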
Agrophore Group

[Structure: agrophore group bearing an F, Br, CF3 or CN substituent]

QSAR Equations for 2-Pyridone Analogues

[Structure: 2-pyridone scaffold with substituent X]

Equation (Ms =) | n | s | r | F (p, %)
-2.34 ΣMR + 64.45 Fp - 3.70 Rp - 4.69 D + 71.49 | 26 | 9.34 | 0.85 | 14.13 (0.00)
-3.21 ΣMR + 57.18 Fp - 3.77 Rp - 5.38 D + 93.98 lnMw - 459.06 | 26 | 7.77 | 0.91 | 18.38 (0.00)
-3.32 ΣMR + 47.47 Fp - 1.74 Rp - 5.10 D + 173.39 lnMw + 7.05 ΣEs - 904.54 | 26 | 7.01 | 0.93 | 19.74 (0.00)
-3.43 ΣMR + 38.60 Fp - 4.79 D + 210.64 lnMw + 10.42 ΣEs - 1113.06 | 26 | 7.24 | 0.92 | 21.82 (0.00)
-3.00 ΣMR + 49.50 Fp - 7.87 D + 211.67 lnMw + 12.19 ΣEs - 6.87 ΣEs(m) - 1117.35 |  | 6.37 | 0.94 | 22.94

J. Agric. Food Chem. 2005, 53, 3468-3475


QSAR Equations for N-Acylanilines

Equation (Ms =) | n | r | r² | s | F (probability)
62.76 FP - 1.66 ΣR - 6.39 D + 43.38 | 29 | 0.81 | 0.65 | 10.79 | 15.65 (0.0000)
67.54 FP - 1.67 ΣR - 6.59 D + 0.13 P + 15.37 | 29 | 0.86 | 0.74 | 9.55 | 16.97 (0.0000)

J. Agric. Food Chem. 2005, 53, 5959-5968


Strategy of identifying new targets

Model organisms for target identification

Test systems for targets in UHTBS

UHTVS - Automated evaluation of activity of compounds

The virtual discovery cycle

Unique research platform - Network of complementary technologies to meet the challenges in compound discovery

Discovery of the target proteins of novel fungicides

De novo target discovery by functional genomics and the steps aiming to develop and perform high-throughput biochemical tests
Gene expression profiling, a revolutionary tool in
herbicide discovery

Gene Expression Profiling (GEP) with DNA microarrays (chips) is a new
technology used to measure changes in the entire transcriptome, i.e. the full
complement of active genes, of an organism in a single experiment.
A catalogue of genetic fingerprints of the plant Arabidopsis thaliana is created,
each fingerprint being characteristic of a single herbicidal mode of action (MoA);
the catalogue is then used to rapidly classify herbicidal compounds from UHTVS
according to their MoA.
GEP helps to identify the affected metabolic pathway and the MoA of pro-drugs,
which cannot be elucidated by conventional biochemical methods.
GEP provides insight into the interactions of any herbicidal compound with the
entire plant metabolism with unprecedented accuracy and completeness.
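The classification step described above can be viewed as nearest-neighbour matching of an expression profile against the reference catalogue. This Python sketch assumes that each profile is a vector of expression changes (for example log ratios) over the same ordered set of genes and uses Pearson correlation as the similarity measure; both that choice and the five-gene profiles are illustrative assumptions, not the method or data of the original work.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def classify_moa(profile: list[float],
                 catalogue: dict[str, list[float]]) -> tuple[str, float]:
    """Return the reference MoA whose fingerprint correlates best with the profile."""
    best = max(catalogue, key=lambda moa: pearson(profile, catalogue[moa]))
    return best, pearson(profile, catalogue[best])


# Hypothetical five-gene reference fingerprints (e.g. log expression ratios).
catalogue = {
    "MoA 1": [2.0, 1.5, -1.0, 0.0, -0.5],
    "MoA 2": [-1.0, 0.0, 2.0, 1.5, 0.5],
}
unknown_compound = [1.8, 1.2, -0.8, 0.1, -0.6]
print(classify_moa(unknown_compound, catalogue))
```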

The principle of Gene Expression Profiling