CHAP 12 - Bioinformatics - Research Application

12 Bioinformatics - Research Applications

S. Gupta

INTRODUCTION
Bioinformatics is a rapidly developing branch of biology which derives
knowledge from computer analysis of biological data and is highly
interdisciplinary, using techniques and concepts from informatics, statistics,
mathematics, physics, chemistry, biochemistry, and linguistics. It describes
the use of computers to handle biological information and has many practical
applications in different areas of biology and medicine. It is
synonymous with “computational molecular biology”, which means the use of
computers to characterize the molecular components of living things and to
analyze the information stored in the genetic code as well as experimental
results from various sources, patient statistics, and scientific literature.
Research in bioinformatics includes storage, retrieval, and analysis of the
data. Richard Durbin, Head of Informatics at the Wellcome Trust Sanger
Institute, holds that not all biological computing is bioinformatics; e.g.
mathematical modeling is not bioinformatics, even when connected with
biology-related problems. In his view, bioinformatics is mainly the management
and subsequent use of biological information, particularly genetic information.
Fredj Tekaia from the Institut Pasteur defines “Classical” bioinformatics as
the mathematical, statistical and computing methods that aim to solve
biological problems using DNA and amino acid sequences and related
information. Medical imaging and image analysis, together with
biologically-inspired computation such as genetic algorithms and neural
networks, are also considered part of bioinformatics. These areas feed into
one another in unexpected ways: neural networks, which are inspired by crude
models of the functioning of nerve cells in the brain, are used in a program
called PHD to predict the secondary structures of proteins from their primary
sequences. In this broad sense, bioinformatics is the processing of large
amounts of biologically-derived information, whether that information is DNA
sequences or X-ray images.
The NIH Biomedical Information Science and Technology Initiative
Consortium agreed on the following definitions of bioinformatics and
computational biology, recognizing that no definition could completely
eliminate overlap with other activities or preclude variations in interpretation
by different individuals and organizations.
Bioinformatics: Research, development or application of computational tools
and approaches for expanding the use of biological, medical, behavioural or
health data, including those to acquire, store, organize, archive, analyze or
visualize such data.
Computational Biology: The development and application of data-analytical
and theoretical methods, mathematical modeling and computational simulation
techniques to the study of biological, behavioural, and social systems.
The National Center for Biotechnology Information defines bioinformatics
as “the field of science in which biology, computer science, and information
technology merge into a single discipline.” There are three important sub-
disciplines within bioinformatics:
(i) the development of new algorithms and statistics with which to assess
relationships among members of large data sets;
(ii) the analysis and interpretation of various types of data including
nucleotide and amino acid sequences, protein domains and protein
structures; and
(iii) the development and implementation of tools that enable efficient
access and management of different types of information.

APPLICATIONS OF BIOINFORMATICS
Bioinformatics has many applications in the research areas of medicine,
biotechnology, agriculture, etc.
1. Genomics: It is an attempt to analyze or compare the entire genetic
complement of a species. It is possible to compare genomes by comparing
representative subsets of genes within genomes.
2. Proteomics: Proteomics is the study of proteins: their location, structure
and function. It is the identification, characterization and quantification
of all proteins involved in a particular pathway, organelle, cell, tissue,
organ or organism, with the aim of providing accurate and comprehensive data
about that system. Its subject is the proteome, viz. all the proteins in any
given cell, together with the set of all protein isoforms and modifications,
the interactions between them, the structural description of proteins and
their higher-order complexes, and more broadly everything ‘post-genomic’.
3. Pharmacogenomics: Pharmacogenomics is the application of genomic
approaches and technologies to the identification of drug targets. It uses
genetic information to predict whether a drug will help a patient or cause
harm, and it includes the study of how genes influence the response of
humans to drugs, i.e. pharmacogenetics.
4. Pharmacogenetics: Pharmacogenetics is the study of how the actions
of and reactions to drugs vary with the patient’s genes. Individuals
respond differently to drug treatments: some positively, others with
little obvious change in their condition, and yet others with side effects
or allergic reactions. Much of this variation is known to have a genetic
basis. Pharmacogenetics is a subset of pharmacogenomics that uses
genomic/bioinformatic methods to identify genomic correlates, for
example SNPs (Single Nucleotide Polymorphisms), characteristic of
particular patient response profiles, and uses those markers to inform the
administration, development and improvement of therapies.
5. Cheminformatics: It deals with the mixing of information resources
and appropriate analysis to transform data into information and
information into knowledge for the specific purpose of drug lead
identification and optimization. Related terms of cheminformatics are
chemometrics, computational chemistry, chemical informatics and chemical
information management/science.
Chemical informatics: Computer-assisted storage, retrieval and analysis
of chemical information, from data to chemical knowledge. (Chem. Inf.
Lett., 2003, 6: 14)
Chemometrics: The application of statistics to the analysis of chemical
data (from organic, analytical or medicinal chemistry) and design of
chemical experiments and simulations. [IUPAC Computational]
6. Structural genomics or structural bioinformatics refers to the analysis
of macromolecular structure, particularly that of proteins, using
computational tools and theoretical frameworks. One of the goals of
structural genomics is the extension of the idea of genomics to obtain
accurate three-dimensional structural models for all known protein families,
protein domains or protein folds. Structural alignment is a tool of
structural genomics.
7. Comparative genomics: The study of human genetics by comparisons
with model organisms such as mice, the fruit fly and the bacterium E.
coli.
8. Biophysics: The British Biophysical Society defines biophysics as “an
interdisciplinary field which applies techniques from the physical sciences
to understanding biological structure and function”.
9. Biomedical informatics/Medical informatics: Biomedical informatics
is an emerging discipline that has been defined as the study, invention,
and implementation of structures and algorithms to improve
communication, understanding and management of medical information.
10. Mathematical biology: Mathematical biology also tackles biological
problems, but the methods it uses to tackle them need not be numerical
and need not be implemented in software or hardware. It includes topics
of theoretical interest which are not necessarily algorithmic or molecular
in nature, nor necessarily useful in analyzing collected data.
11. Computational chemistry: Computational chemistry is the branch of
theoretical chemistry whose major goals are to create efficient computer
programs that calculate the properties of molecules (such as total energy,
dipole moment, vibrational frequencies) and to apply these programs to
concrete chemical objects. It is also sometimes used to cover the areas
of overlap between computer science and chemistry. It is a discipline
using mathematical methods for the calculation of molecular properties
or for the simulation of molecular behaviour. It also includes synthesis
planning, database searching, combinatorial library manipulation
(Hopfinger, 1981; Ugi et al., 1990). [IUPAC Computational]
12. Functional genomics: Functional genomics is a field of molecular
biology using the vast wealth of data produced by genome sequencing
projects to describe genome function. It uses high-throughput techniques
like DNA microarrays, proteomics, metabolomics and mutation analysis
to describe the function and interactions of genes.
13. Pharmacoinformatics: Pharmacoinformatics concentrates on the aspects
of bioinformatics dealing with drug discovery.
14. In silico ADME-Tox Prediction: Drug discovery is a complex and
risky treasure hunt for the most efficacious molecule, one that has no
toxic effects but at the same time has the desired pharmacokinetic
profile. A huge amount of research is required to arrive at a molecule
with a reliable binding profile. The molecule that shows better binding
is then evaluated for its toxicity and pharmacokinetic profile, i.e. its
ADME-Tox (absorption, distribution, metabolism, excretion and toxicity)
properties, which in silico methods aim to predict computationally before
the molecule can become a successful drug.
15. Agroinformatics/Agricultural informatics: Agroinformatics
concentrates on the aspects of bioinformatics dealing with plant genomes.
16. Systems biology: Systems biology is the coordinated study of biological
systems by investigating the components of cellular networks and their
interactions, by applying experimental high-throughput and whole-
genome techniques and integrating computational methods with
experimental efforts.

BIOINFORMATICS TOOLS, SOFTWARE AND PROGRAMS
Bioinformatic tools are software programs designed to extract meaningful
information from the mass of molecular biology and other biological
databases and to carry out sequence or structural analysis.
The following factors must be taken into consideration when designing
bioinformatics tools, software and programs:
(a) The end user (the biologist) may not be a frequent user of computer
technology.
(b) These software tools must be made available over the internet given the
global distribution of the scientific research community.

Major Categories of Bioinformatic Tools


There are both standard and customized products to meet the requirements
of particular projects. Data-mining software retrieves data from genomic
sequence databases, and visualization tools are used to analyze and
retrieve information from proteomic databases. These tools can be classified
as homology and similarity tools, protein functional analysis tools,
structural analysis tools, sequence analysis tools and miscellaneous tools.
Everyday bioinformatics is done with sequence search programs like BLAST,
sequence analysis programs like the EMBOSS and Staden packages, structure
prediction programs like THREADER or PHD, and molecular imaging/modeling
programs like RasMol and WHATIF.

Homology and Similarity Tools


Homologous sequences are sequences related by divergence from a common
ancestor. Homology is therefore either true or false, whereas the degree of
similarity between two sequences is a quantity that can be measured. This
set of tools can be used to identify similarities between novel query
sequences of unknown structure and function and database sequences whose
structure and function have been elucidated.
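The distinction between a measurable similarity and a yes-or-no homology can
be illustrated with a minimal Python sketch. It assumes the two sequences have
already been aligned (gaps written as "-"); the function name and the example
sequences are hypothetical and do not come from any particular tool.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned (gap-free) columns in which both sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Only columns where neither sequence has a gap are compared.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical pre-aligned fragments.
query = "ACGTTGCA-A"
subject = "ACGATGCATA"
print(f"{percent_identity(query, subject):.1f}% identity")
```

A high identity value is evidence for homology but does not by itself prove
common ancestry; that inference still has to be made by the investigator.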

Protein Function Analysis


This group of programs allows you to compare your protein sequence with the
secondary (or derived) protein databases, which contain information on motifs,
signatures and protein domains. Highly significant hits against these
pattern databases allow you to approximate the biochemical function of your
query protein.
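As a rough illustration of what a motif search does, the following Python
sketch scans a protein sequence for one PROSITE-style pattern, the
N-glycosylation site N-{P}-[ST]-{P}, rewritten as a regular expression. The
function name and the test sequence are hypothetical, and real pattern
databases also apply significance measures that are not modelled here.

```python
import re

# N-{P}-[ST]-{P}: Asn, any residue except Pro, Ser or Thr, any residue except Pro.
# The lookahead lets overlapping occurrences be reported as separate hits.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(protein: str, pattern: re.Pattern = N_GLYCOSYLATION) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for every hit."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein.upper())]


hypothetical_protein = "MKNGSAQLNPTWNNTSK"
for position, hit in find_motif(hypothetical_protein):
    print(position, hit)
```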

Structural Analysis
This set of tools allows the comparison of structures with the databases of
known structures. The function of a protein is more directly a consequence of
its structure than of its sequence, and structural homologs tend to share
functions. Determining a protein’s 2D/3D structure is therefore crucial in
the study of its function.

Sequence Analysis
This set of tools allows you to carry out more detailed analysis of your
query sequence, including evolutionary analysis and the identification of
mutations, hydropathy regions, CpG islands and compositional biases. These
and other biological properties are all clues that aid the search to
elucidate the specific function of your sequence.
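Two of the properties mentioned above, compositional bias and CpG islands, are
commonly screened with simple statistics. The Python sketch below computes the
GC content of a window and its observed/expected CpG ratio; the function names
and the example window are hypothetical. The thresholds used to call an island
are not given in this chapter; a commonly cited rule of thumb (an assumption
here) is GC content above 0.5 and a ratio above 0.6 over a window of at least
200 bases.

```python
def gc_content(dna: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna) if dna else 0.0


def cpg_observed_expected(dna: str) -> float:
    """Observed/expected CpG ratio: (#CG dinucleotides * length) / (#C * #G)."""
    dna = dna.upper()
    c_count, g_count = dna.count("C"), dna.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return dna.count("CG") * len(dna) / (c_count * g_count)


window = "CGCGTACGCGATCGCGGC"  # hypothetical sequence window
print(round(gc_content(window), 2), round(cpg_observed_expected(window), 2))
```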

Some Examples of Bioinformatic Tools


BLAST: BLAST (Basic Local Alignment Search Tool) comes under the
category of homology and similarity tools. It is a set of search programs,
available through the NCBI web server and as standalone software for several
operating systems, that performs fast similarity searches regardless of
whether the query is protein or DNA. A nucleotide query can be compared
against a nucleotide sequence database, and a protein database can be
searched for matches to a protein query sequence. NCBI has also introduced
a queuing system for BLAST (QBLAST) that allows users to retrieve results
at their convenience and to format their results multiple times with
different formatting options. Depending on the type of sequences to
compare, there are different programs:
• blastp compares an amino acid query sequence against a protein sequence
database.
• blastn compares a nucleotide query sequence against a nucleotide sequence
database.
• blastx compares a nucleotide query sequence translated in all reading
frames against a protein sequence database.
• tblastn compares a protein query sequence against a nucleotide sequence
database dynamically translated in all reading frames.
• tblastx compares the six-frame translations of a nucleotide query sequence
against the six-frame translations of a nucleotide sequence database.
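The choice among the five programs listed above depends only on the type of
the query, the type of the database and, for nucleotide against nucleotide,
whether the comparison is made on translated sequences. A small Python sketch
of that decision follows; the function name and its parameters are
hypothetical and are not part of BLAST itself.

```python
def choose_blast_program(query: str, database: str, translated: bool = False) -> str:
    """Pick a BLAST program name from the query and database sequence types.

    `translated` only matters for a nucleotide query against a nucleotide
    database, where it selects tblastx instead of blastn.
    """
    if query == "protein" and database == "protein":
        return "blastp"
    if query == "nucleotide" and database == "protein":
        return "blastx"   # query is translated in all reading frames
    if query == "protein" and database == "nucleotide":
        return "tblastn"  # database is translated in all reading frames
    if query == "nucleotide" and database == "nucleotide":
        return "tblastx" if translated else "blastn"
    raise ValueError(f"unknown sequence types: {query!r}, {database!r}")


print(choose_blast_program("nucleotide", "protein"))
```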
FASTA: It is an alignment program for protein and DNA sequences created by
Pearson and Lipman (1988), and is one of the many heuristic algorithms
proposed to speed up sequence comparison. The basic idea is to add a fast
prescreen step that locates the highly matching segments between two
sequences, and then to extend these matching segments to local alignments
using more rigorous algorithms such as Smith-Waterman.
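The prescreen idea can be sketched in Python as follows. This is a simplified
illustration and not the FASTA implementation: it indexes the short words
(k-tuples) of one sequence, looks up the words of the other, and counts how
many shared words fall on each diagonal of the comparison matrix. The function
name, the word length k and the test sequences are assumptions. The diagonal
with the most hits marks the region that a rigorous local alignment would
then examine.

```python
from collections import Counter, defaultdict


def best_diagonal(seq_a: str, seq_b: str, k: int = 2) -> tuple[int, int]:
    """Return (diagonal offset, shared k-letter words) for the richest diagonal.

    The offset is i - j for a word starting at index i in seq_a and j in seq_b.
    """
    # Index every k-letter word of seq_b by its start positions.
    word_positions = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        word_positions[seq_b[j:j + k]].append(j)

    # Each word shared by both sequences votes for the diagonal it lies on.
    diagonal_hits = Counter()
    for i in range(len(seq_a) - k + 1):
        for j in word_positions.get(seq_a[i:i + k], []):
            diagonal_hits[i - j] += 1

    if not diagonal_hits:
        return (0, 0)
    return max(diagonal_hits.items(), key=lambda item: item[1])


# Hypothetical sequences: seq_b occurs inside seq_a shifted by two positions.
print(best_diagonal("XXHEAGAWGHEE", "HEAGAWGHEE"))
```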
EMBOSS: EMBOSS (European Molecular Biology Open Software Suite)
is a sequence-analysis software package. It provides a set of
sequence-analysis programs, can work with data in a range of formats, and
can retrieve sequence data transparently from the Web. Extensive libraries
are also provided with the package, allowing other scientists to release
their software as open source. It runs on UNIX platforms.
ClustalW: It is a fully automated multiple sequence alignment tool for DNA
and protein sequences. It returns the best match over the total length of
the input sequences (a global alignment), be they protein or nucleic acid.
RasMol: It is a powerful research tool to display the structure of DNA,
proteins, and smaller molecules. Protein Explorer, a derivative of RasMol,
is an easier to use program.
PROSPECT: PROSPECT (PROtein Structure Prediction and Evaluation
Computer ToolKit) is a protein-structure prediction system that employs a
computational technique called protein threading to construct a protein’s
3-D model.
PatternHunter: PatternHunter, written in Java, can identify all approximate
repeats in a complete genome in a short time using little memory on a
desktop computer. Its distinguishing features are its patented algorithm and
data structures and its implementation in Java. The Java version of
PatternHunter is just 40 kB, only 1% the size of BLAST, while offering a
large portion of its functionality.
COPIA: COPIA (COnsensus Pattern Identification and Analysis) is a protein
structure analysis tool for discovering motifs (conserved regions) in a family
of protein sequences. Such motifs can then be used to determine whether new
protein sequences belong to the family, to predict the secondary and tertiary
structure and function of proteins, and to study the evolutionary history of
the sequences.
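The notion of a consensus pattern can be illustrated with a minimal Python
sketch. It is not COPIA's algorithm: it simply assumes the family members are
already aligned to equal length and reports the residue of each column that
is conserved in at least a chosen fraction of the sequences, writing "." for
variable columns. The function name, the threshold parameter and the example
fragments are hypothetical.

```python
def conserved_columns(aligned_family: list[str], min_fraction: float = 1.0) -> str:
    """Consensus string: the dominant residue where a column is conserved, '.' elsewhere."""
    if not aligned_family:
        return ""
    length = len(aligned_family[0])
    if any(len(seq) != length for seq in aligned_family):
        raise ValueError("all sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_family):
        # Most frequent character in this alignment column.
        residue = max(set(column), key=column.count)
        conserved = residue != "-" and column.count(residue) / len(column) >= min_fraction
        consensus.append(residue if conserved else ".")
    return "".join(consensus)


family = ["GKSGSGKT", "GASGVGKS", "GPTGAGKT"]  # hypothetical aligned fragments
print(conserved_columns(family))
print(conserved_columns(family, min_fraction=0.6))
```

A new sequence could then be tested for family membership by checking whether
it carries the conserved residues at the corresponding positions.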

Application of Programs in Bioinformatics

JAVA in Bioinformatics
Research centres are scattered around the globe, range from private to
academic settings, and use a variety of hardware and operating systems.
Because compiled Java programs run on any platform that provides a Java
virtual machine, Java is emerging as a key player in bioinformatics.
Physiome Sciences’ computer-based biological simulation technologies and
Bioinformatics Solutions’ PatternHunter are two examples of the growing
adoption of Java in bioinformatics.

Perl in Bioinformatics
String manipulation, regular expression matching, file parsing, data format
interconversion etc. are the common text-processing tasks performed in
bioinformatics. Perl excels at such tasks and is used by many developers.
Perl itself ships with no standard modules designed specifically for
bioinformatics; instead, developers have written their own modules for the
purpose, many of which have become quite popular and are coordinated by the
BioPerl project.
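The kind of text processing described above can be sketched briefly. For
consistency with the other examples in this chapter the sketch is written in
Python rather than Perl, and it does not use BioPerl or any other library: it
parses FASTA-formatted text with a regular expression, manipulates the
sequence strings, and converts the records to a tab-delimited format. The
record names and sequences are hypothetical.

```python
import re

FASTA_TEXT = """>seq1 hypothetical example record
ACGTTGCA
GGCC
>seq2
ttgaca
"""


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into {identifier: upper-case sequence}."""
    records: dict[str, str] = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        header = re.match(r">(\S+)", line)  # identifier = first word after '>'
        if header:
            current_id = header.group(1)
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line.upper()
    return records


def reverse_complement(dna: str) -> str:
    """Complement each base with a translation table, then reverse the string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Format interconversion: FASTA in, tab-delimited id / sequence / reverse complement out.
for seq_id, sequence in parse_fasta(FASTA_TEXT).items():
    print(f"{seq_id}\t{sequence}\t{reverse_complement(sequence)}")
```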

Bioinformatic Projects
A list of bioinformatic projects is given here; detailed descriptions of
these projects are available on the internet. The list includes BioPerl,
BioXML, BioCORBA, Ensembl, bioperl-db, Biopython and BioJava.
Biopython and BioJava are open source projects with goals very similar
to those of BioPerl, but their code is implemented in Python and Java,
respectively. With the development of interface objects and BioCORBA, it is
possible to write Java or Python objects which can be accessed by a BioPerl
script, or to call BioPerl objects from Java or Python code. Since Biopython
and BioJava are more recent projects than BioPerl, most effort to date has
gone into porting BioPerl functionality to Biopython and BioJava rather than
the other way around. In the future, however, some bioinformatic tasks may
prove to be more effectively implemented in Java or Python, in which case
being able to call them from within BioPerl will become more important.
Major bioinformatic activities are also carried out using three molecular
modeling programs, viz. AMBER, CHARMM and GROMACS, as well as three types of
genetic algorithm application: structure optimization, sequence alignment
and PSEGA.
