CHAP 12 - Bioinformatics - Research Applications
S. Gupta
INTRODUCTION
Bioinformatics is a rapidly developing branch of biology that derives
knowledge from the computer analysis of biological data. It is highly
interdisciplinary, drawing on techniques and concepts from informatics,
statistics, mathematics, physics, chemistry, biochemistry, and linguistics,
and it has many practical applications in different areas of biology and
medicine. Broadly, it describes the use of computers to handle biological
information. It is often treated as synonymous with “computational molecular
biology”, the use of computers to characterize the molecular components of
living things and to analyze the information stored in the genetic code,
together with experimental results from various sources, patient statistics,
and scientific literature. Research in bioinformatics includes the storage,
retrieval, and analysis of such data. Richard Durbin, Head of Informatics at
the Wellcome Trust Sanger Institute, holds that not all biological computing
is bioinformatics; mathematical modeling, for example, is not bioinformatics
even when it addresses biology-related problems. In this view, bioinformatics
is mainly the management and subsequent use of biological information,
particularly genetic information.
Fredj Tekaia from the Institut Pasteur defines “Classical” bioinformatics as
the mathematical, statistical and computing methods that aim to solve
biological problems using DNA and amino acid sequences and related
information. Medical imaging and image analysis, as well as biologically
inspired computation such as genetic algorithms and neural networks, are also
considered part of bioinformatics. These areas interact in unexpected ways.
Neural networks, for example, are inspired by crude models of the functioning
of nerve cells in the brain, and they are used in a program called PHD to
predict the secondary structures of proteins from their primary sequences.
In this broad sense, bioinformatics is the processing of large amounts of
biologically derived information, whether DNA sequences or X-rays.
The NIH Biomedical Information Science and Technology Initiative
Consortium agreed on the following definitions of bioinformatics and
APPLICATIONS OF BIOINFORMATICS
Bioinformatics has many applications in the research areas of medicine,
biotechnology, agriculture, etc.
1. Genomics: Genomics is the attempt to analyze or compare the entire genetic
complement of a species. Genomes can also be compared by comparing
representative subsets of the genes within them.
2. Proteomics: Proteomics is the study of proteins: their location, structure
and function. It involves the identification, characterization and
quantification of all proteins involved in a particular pathway, organelle,
cell, tissue, organ or organism, which can provide accurate and comprehensive
data about that system. Its subject is the proteome, viz. all the proteins in
any given cell, together with the set of all protein isoforms and
modifications, the interactions between them, the structural description of
proteins and their higher-order complexes, and everything else
‘post-genomic’.
3. Pharmacogenomics: Pharmacogenomics is the application of genomic
approaches and technologies to the identification of drug targets. It uses
genetic information to predict whether a drug will make a patient well or
sick, and it includes the study of how genes influence the response of humans
to drugs, i.e. pharmacogenetics.
4. Pharmacogenetics: Pharmacogenetics is the study of how the actions
of and reactions to drugs vary with the patient’s genes. Individuals
respond differently to drug treatments: some positively, others with
little obvious change in their condition, and yet others with side effects
or allergic reactions. Much of this variation is known to have a genetic
basis. Pharmacogenetics is a subset of pharmacogenomics that uses
genomic/bioinformatic methods to identify genomic correlates, for
example SNPs (Single Nucleotide Polymorphisms), that are characteristic of
particular patient response profiles, and uses those markers to inform the
administration, development and improvement of therapies.
5. Cheminformatics: Cheminformatics combines information resources with
appropriate analysis to transform data into information and
information into knowledge for the specific purpose of drug lead
identification and optimization. Terms related to cheminformatics are
chemometrics, computational chemistry, chemical informatics and chemical
information management/science.
Chemical informatics: Computer-assisted storage, retrieval and analysis
of chemical information, from data to chemical knowledge. (Chem. Inf.
Lett., 2003, 6: 14)
Chemometrics: The application of statistics to the analysis of chemical
data (from organic, analytical or medicinal chemistry) and design of
chemical experiments and simulations. [IUPAC Computational]
6. Structural genomics: Structural genomics, or structural bioinformatics,
refers to the analysis of macromolecular structure, particularly that of
proteins, using computational tools and theoretical frameworks. One of its
goals is to extend the idea of genomics by obtaining accurate
three-dimensional structural models for all known protein families, protein
domains or protein folds. Structural alignment is one of the tools of
structural genomics.
7. Comparative genomics: The study of human genetics by comparisons
with model organisms such as mice, the fruit fly and the bacterium E.
coli.
8. Biophysics: The British Biophysical Society defines biophysics as “an
interdisciplinary field which applies techniques from the physical sciences
to understanding biological structure and function”.
9. Biomedical informatics/Medical informatics: Biomedical informatics
is an emerging discipline that has been defined as the study, invention,
and implementation of structures and algorithms to improve
communication, understanding and management of medical information.
10. Mathematical Biology: Mathematical biology also tackles biological
problems. The methods it uses to tackle them need not be numerical
(a) The end user (the biologist) may not be a frequent user of computer
technology.
(b) These software tools must be made available over the internet given the
global distribution of the scientific research community.
Structural Analysis
This set of tools allows a structure to be compared against databases of
known structures. The function of a protein is more directly a consequence of
its structure than of its sequence, and structural homologs tend to share
functions. Determining a protein’s 2D/3D structure is therefore crucial to the
study of its function.
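As a minimal sketch of what such a comparison involves, the Python fragment
below scores a query structure against a small set of reference structures by
the root-mean-square deviation (RMSD) of atom positions. It assumes that the
structures have already been superposed and contain the same number of atoms
in the same order; real structure-comparison tools perform that alignment
themselves. The function names and the toy coordinates are hypothetical and
serve only as illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("structures must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def rank_by_rmsd(query, database):
    """Return (name, rmsd) pairs for every database entry, most similar first."""
    scored = [(name, rmsd(query, coords)) for name, coords in database.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Toy data: three-atom "structures" with made-up coordinates.
query = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
database = {
    "near": [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0), (2.0, 0.1, 0.0)],
    "far": [(0.0, 3.0, 0.0), (1.0, 3.0, 0.0), (2.0, 3.0, 0.0)],
}
ranking = rank_by_rmsd(query, database)
```

A low RMSD marks a candidate structural homolog; whether it also shares the
query's function still has to be checked by other means.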
Sequence Analysis
This set of tools allows you to carry out more detailed analysis on your
query sequence, including evolutionary analysis and the identification of
mutations, hydropathy regions, CpG islands and compositional biases. These
and other biological properties are all clues that aid the search to
elucidate the specific function of your sequence.
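Three of the properties named above can be sketched in a few lines of Python.
The fragment is illustrative only: the observed/expected CpG ratio is one
common formula, the Kyte-Doolittle hydropathy scale and the window size are
assumptions (the text does not name a scale), and all function names are
hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gc_content(dna):
    """Compositional bias: fraction of bases that are G or C."""
    dna = dna.upper()
    if not dna:
        return 0.0
    return (dna.count("G") + dna.count("C")) / len(dna)


def cpg_obs_exp(dna):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    dna = dna.upper()
    c, g = dna.count("C"), dna.count("G")
    if c == 0 or g == 0:
        return 0.0
    return dna.count("CG") * len(dna) / (c * g)


def hydropathy_profile(protein, window=5):
    """Mean hydropathy of each window of residues along the protein."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


dna = "ATGCGCGCGTTA"
gc = gc_content(dna)
cpg = cpg_obs_exp(dna)
profile = hydropathy_profile("IIIIIKKKKK", window=5)
```

Stretches with high GC content and a high observed/expected ratio are
candidate CpG islands, and runs of positive window averages in the profile
point to hydrophobic regions.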
Java in Bioinformatics
Research centres are scattered around the globe, range from private to
academic settings, and use a variety of hardware and operating systems.
Because Java programs are platform-independent, Java is emerging as a key
player in bioinformatics. Physiome Sciences’ computer-based biological
simulation technologies and Bioinformatics Solutions’ PatternHunter are two
examples of the growing adoption of Java in bioinformatics.
Perl in Bioinformatics
String manipulation, regular expression matching, file parsing and data
format interconversion are the common text-processing tasks performed in
bioinformatics. Perl excels at such tasks and is used by many developers.
Perl itself has no standard modules designed specifically for bioinformatics.
However, developers have written many modules of their own for the purpose,
which have become quite popular and are coordinated by the BioPerl project.
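The text-processing tasks listed above can be sketched as follows. The
example is written in Python, the language of the Biopython project, rather
than Perl, and it uses only the standard library; it is not the BioPerl or
Biopython API. The function names and the sample records are hypothetical.

```python
import re


def parse_fasta(text):
    """File parsing: return (header, sequence) tuples from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def find_motif(seq, pattern):
    """Regular expression matching: 0-based starts of non-overlapping hits."""
    return [m.start() for m in re.finditer(pattern, seq)]


def fasta_to_tab(text):
    """Format interconversion: FASTA to one tab-separated line per record."""
    return "\n".join(f"{header}\t{seq}" for header, seq in parse_fasta(text))


fasta = ">seq1 demo\nATGGAATTC\nGGAATTCA\n>seq2\nTTTT\n"
records = parse_fasta(fasta)
sites = find_motif(records[0][1], "GAATTC")  # EcoRI recognition site
table = fasta_to_tab(fasta)
```

In practice a maintained library such as BioPerl or Biopython is preferable
to hand-written parsers, since real sequence files contain many more edge
cases than this sketch handles.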
Bioinformatic Projects
A list of various bioinformatic projects is given here; a
detailed description of these projects is available on the internet. This list
includes BioPerl, BioXML, BioCORBA, Ensembl, Bioperl-db, Biopython
and BioJava.
Biopython and BioJava are open source projects with goals very similar
to those of BioPerl. However, their code is implemented in Python and Java,
respectively. With the development of interface objects and BioCORBA, it is