Book Chapter
net/publication/236031265
All content following this page was uploaded by Jyotika Bhati on 24 March 2014.
Biology is in the middle of a major paradigm shift driven by computing technology. Due to
the impact of information technology, biological sciences have been rapidly becoming much
more computational and analytical. Rapid progress in genetics and related fields, combined
with the tools provided by modern biotechnology, has generated massive volumes of genetic
and protein sequence data over the last two decades. Compiling and storing these data, and
analysing them to extract information and knowledge, is a challenging task because the usual
analytical procedures are not directly applicable to such data sets.
Bioinformatics has been defined as a means for analysing, comparing, graphically displaying,
modeling, storing, systemising, searching, and ultimately distributing biological information,
which includes sequences, structures, function, and phylogeny. Thus, bioinformatics may be
defined as a discipline that generates computational tools, databases, and methods to support
genomic and post-genomic research. It comprises the study of DNA structure and function,
gene and protein expression, protein production, structure and function, genetic regulatory
systems, and clinical applications. Bioinformatics applications need knowledge from
computer science, mathematics, statistics, medicine, chemistry and biology.
Biology represents its information in a digital language built on a four-letter alphabet
(A, C, G, T). All the chromosomes in an organism's cells can be represented and identified
as sequences of these letters. The demanding challenge is to determine how this digital
language of the chromosomes is converted into the three-dimensional and sometimes
four-dimensional languages of living and breathing organisms.
Performing all the above-mentioned tasks manually is nearly impossible because of the
massive volumes of biological data and the precision the tasks demand, so the use of
computers has become mandatory. Thus, bioinformatics deals with designing and deploying
efficient software tools and computational algorithms that accomplish these tasks quickly
and precisely. Bridging the gap between the real world of biology and the precise logical
nature of computers therefore requires an interdisciplinary perspective.
The tools of computer science, statistics, and mathematics are critical for studying
biology from the perspective of bioinformatics. Recent advances in biotechnology, including
improved DNA sequencing methods, new approaches to identifying protein structure, and
revolutionary methods for monitoring the expression of many genes in parallel, have posed a
number of challenges to computational scientists. Designing tools and techniques to deal
with different sources of incomplete and noisy data has become another crucial goal for the
bioinformatics community. In addition, there is a need to implement computational solutions
based on theoretical frameworks that allow scientists to perform complex inferences about
the phenomena under study.
Genomics in the recent past has triggered the development of high-throughput
instrumentation for DNA sequencing, DNA arrays, genotyping, proteomics, etc. These
instruments have catalyzed a new type of science for biology termed discovery science.
Discovery science defines all of the elements in a biological system, for example the
sequence of the genome, or the identity and quantity of all of the mRNAs or proteins in a
particular cell type (the genome, transcriptome, and proteome). Discovery science creates databases of
information, in contrast to the more classical hypothesis-driven science that formulates
hypotheses and attempts to test them. The high-throughput tools both provide the means for
discovery science and can assay how global information sets, for example, transcriptomes or
proteomes change as systems are perturbed.
The genomes of the model organisms yeast, worm, fly etc., have demonstrated the
fundamental conservation among all living organisms of the basic informational pathways.
Hence, systems can be perturbed in model organisms to gain insight into their functioning,
and these data will provide fundamental insights into biological systems. From the genome,
the information pathways and networks can be extracted to begin understanding the logic of
life. Further, different genomes can be compared to identify similarities and differences in the
strategies for the logic of life and these provide fundamental insights into development,
physiology and evolution. The first eukaryotic genome to be fully sequenced and
annotated was that of Saccharomyces cerevisiae. This opened the path to developing
biological and computational tools for genomic and post-genomic research.
In the era of automated DNA sequencing and revolutionary advances in DNA sequence
analysis, the attention of many researchers is now shifting away from the study of single
genes or small gene clusters to whole genome analyses. Knowing the complete sequence of a
genome is only the first step in understanding how the myriad of information contained
within the genes is transcribed and ultimately translated into functional proteins. In the
post-genomic era, functional genomic and proteomic studies help to build a picture of the
dynamic cell.
Biological information is of two types:
The information of genes or proteins, which are the molecular machines of life.
The information of the regulatory networks that coordinate and specify the expression
patterns of the genes and proteins.
All biological information is hierarchical in nature. DNA is transcribed into mRNA, which in
turn is translated into protein. Proteins interact with one another to form informational
pathways. These pathways form informational networks, which in turn make up cells. Cells
form networks of cells, and an individual is a collection of cells. A host of individuals
forms a population, and a variety of populations forms an ecology. This hierarchy poses a
primary challenge for researchers: to create tools and mechanisms that capture and
integrate these different levels of biological information and so gain insight into how
they function.
In April 2003, the sequencing of the human genome was declared complete. In other words,
scientists succeeded in reading the chain of more than 3 billion base pairs that constitute
the DNA of humans, a process called sequencing. That daunting task required new
analytical methods created by bioinformatics. The broad challenge was to identify all the
genes and associate them with specific functions (field of genomics), predict the structure of
the proteins for which they code (field of proteomics), and compare the roles of certain genes
with those of other species in the living world (using biochips, for example).
The National Center for Biotechnology Information (NCBI 2001) defines bioinformatics as:
"Bioinformatics is the field of science in which biology, computer science, and information
technology merge into a single discipline. There are three important sub-disciplines within
bioinformatics (i) the development of new algorithms and statistics with which to assess
relationships among members of large data sets (ii) the analysis and interpretation of various
types of data including nucleotide and amino acid sequences, protein domains, and protein
structures (iii) and the development and implementation of tools that enable efficient access
and management of different types of information."
1. Biological databases
The biological databases are libraries of life sciences information, collected from scientific
experiments, published literature, high-throughput experiment technology, and computational
analyses. They contain information from research areas including genomics, proteomics,
metabolomics, microarray gene expression, and phylogenetics. Information contained in
biological databases includes gene function, structure, localization (both cellular and
chromosomal), clinical effects of mutations as well as similarities of biological sequences and
structures.
Biological databases are an important tool in assisting scientists to understand and explain a
host of biological phenomena from the structure of biomolecules and their interaction, to the
whole metabolism of organisms and to understanding the evolution of species. This
knowledge helps to facilitate the fight against diseases, assists in the development of
medications and in discovering basic relationships amongst species in the history of life.
Biological knowledge is distributed amongst many different general and specialized
databases. This sometimes makes it difficult to ensure the consistency of information.
Biological databases cross-reference other databases with accession numbers as one way of
linking their related knowledge together.
Biological databases in bioinformatics fall into two broad categories: nucleotide
databases and protein databases. The major nucleotide databases are maintained by the
National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov), the
European Molecular Biology Laboratory (EMBL) (http://www.ebi.ac.uk) and the DNA Data Bank
of Japan (DDBJ) (http://www.ddbj.nig.ac.jp).
2. Bioinformatics Tools
Bioinformatics tools are software programs for storing, retrieving and analysing
biological data and extracting information from them.
Factors that must be taken into consideration while designing these tools are:
The end user (the biologist) may not be a frequent user of computer technology, so
the tools should be very user friendly.
The tools must be made available over the internet, given the global
distribution of the scientific research community.
Both standard and customized products are available to meet the requirements of a particular
project. There is data-mining software that retrieves data from genomic sequence databases,
as well as visualization tools for analysing and retrieving information from proteomic databases.
The Bioinformatics tools may be categorized into:
Sequence Analysis
Homology and Similarity Tools
Protein Function Analysis
Structural Analysis
2.1 Sequence Analysis
This set of tools allows you to carry out further, more detailed analysis on your query
sequence, including evolutionary analysis and the identification of mutations, hydropathy
regions, CpG islands and compositional biases. The identification of these and other
biological properties provides clues that aid the search to elucidate the specific function
of the sequence.
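As a rough illustration of two of the properties named above, compositional bias and CpG content, the following Python sketch computes the GC fraction of a sequence and the ratio of observed to expected CpG dinucleotides. The function names are hypothetical, and the observed/expected formula (CpG count times length, divided by the product of the C and G counts) is one commonly used convention rather than the method of any particular tool described in this chapter.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (a simple compositional-bias measure)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0  # ratio is undefined without both bases; report 0 by convention
    return seq.count("CG") * len(seq) / (c_count * g_count)


if __name__ == "__main__":
    example = "CGCGCGATATAT"  # made-up sequence for illustration
    print(gc_content(example), cpg_obs_exp(example))
```

Real CpG island finders apply such measures in a sliding window and combine them with length thresholds; this sketch scores a single sequence only.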
Align compares two sequences. To align them over the whole length of both sequences, it
uses the Needleman-Wunsch algorithm; to find the best region of similarity between the two
sequences, it uses the Smith-Waterman algorithm.
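The difference between the two algorithms can be sketched in a few lines of Python. The function below fills both dynamic-programming tables at once and returns the global (Needleman-Wunsch) score and the best local (Smith-Waterman) score. It is a minimal sketch, not the Align implementation: the scoring values (match +1, mismatch -1, linear gap penalty -2) and the function name are assumptions chosen for illustration, and no traceback is performed.

```python
def alignment_scores(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return (global_score, local_score) for sequences a and b.

    Global: every residue of both sequences must be aligned or gapped.
    Local: cell scores are floored at 0, so the best-scoring sub-region wins.
    """
    rows, cols = len(a) + 1, len(b) + 1
    glob = [[0] * cols for _ in range(rows)]
    loc = [[0] * cols for _ in range(rows)]

    # Global alignment pays for leading gaps; local alignment does not.
    for i in range(1, rows):
        glob[i][0] = i * gap
    for j in range(1, cols):
        glob[0][j] = j * gap

    best_local = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            glob[i][j] = max(glob[i - 1][j - 1] + s,
                             glob[i - 1][j] + gap,
                             glob[i][j - 1] + gap)
            loc[i][j] = max(0,
                            loc[i - 1][j - 1] + s,
                            loc[i - 1][j] + gap,
                            loc[i][j - 1] + gap)
            best_local = max(best_local, loc[i][j])
    return glob[-1][-1], best_local


if __name__ == "__main__":
    # The short sequence matches the end of the long one: the local score is
    # high, while the global score is dragged down by the unmatched prefix.
    print(alignment_scores("AAAACGT", "CGT"))
```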
CENSOR is a software tool which screens query sequences against a reference collection of
repeats and "censors" (masks) homologous portions with masking symbols, as well as
generating a report classifying all found repeats.
ClustalW2 is a general purpose multiple sequence alignment program for DNA or proteins.
It produces biologically meaningful multiple sequence alignments of divergent sequences. It
calculates the best match for the selected sequences, and lines them up so that the identities,
similarities and differences can be seen. Evolutionary relationships can also be seen by
viewing cladograms or phylograms.
Dna Block Aligner (DBA) aligns two sequences under the assumption that the sequences
share a number of colinear blocks of conservation separated by potentially large and varied
lengths of DNA. This assumption suits syntenic regions of non-coding DNA, for example the
upstream regions of a gene from mouse and human, or the conserved intron of a human and
chicken gene. The conserved blocks may be regions that are important for regulation of the
gene, and each block may contain one or two gaps. The final model is a probabilistic
finite state machine (or pair-HMM) which aligns the two sequences. Each block can choose
one of 4 different parameter sets, corresponding roughly to conservation at 65, 75, 85 or
95 percent identity. Linear gaps (gaps where the gap-open penalty is the same as the
extension penalty) are modeled in the blocks at a fixed probability of 0.05, and each block
is expected to account for around 1% of the DNA sequence. To use the DBA form, enter the two
query sequences (the file upload feature can be used if needed) and submit. An ASCII
rendering of the alignment is returned to the user.
Wise2 form compares a protein sequence to a genomic DNA sequence, allowing for introns
and frameshifting errors. The model parameters which have been chosen are human gene
parameters, local start/end in the protein and 6:23 algorithm.
MAFFT (Multiple Alignment using Fast Fourier Transform) is a high speed multiple
sequence alignment program.
Mauve is a system for efficiently constructing multiple genome alignments in the presence of
large-scale evolutionary events such as rearrangement and inversion. Multiple genome
alignment provides a basis for research into comparative genomics and the study of
evolutionary dynamics. Aligning whole genomes is a fundamentally different problem than
aligning short sequences. Mauve has been developed with the idea that a multiple genome
aligner should require only modest computational resources. It employs algorithmic
techniques that scale well in the amount of sequence being aligned. For example, a pair of Y.
pestis genomes can be aligned in under a minute, while a group of 9 divergent Enterobacterial
genomes can be aligned in a few hours.
HMMER is used for searching sequence databases for homologs of protein sequences, and
for making protein sequence alignments. It implements methods using probabilistic models
called “profile hidden Markov models” (profile HMMs). Compared to BLAST, FASTA, and
other sequence alignment and database search tools based on older scoring methodology,
HMMER aims to be significantly more accurate and more suitable to detect remote homologs
because of the strength of its underlying mathematical models. In the past, this strength came
at significant computational expense, but in the new HMMER3 project, HMMER is now
essentially as fast as BLAST.
SeWeR stands for SEquence analysis using WEb Resources. It offers a single entry point to
all the common web-based services for sequence analysis and sews these services together;
in other words, SeWeR is an integrated portal to common web-based services in
bioinformatics. SeWeR is cross-browser DHTML, written entirely in JavaScript 1.2. Hence it
runs only in Netscape 4.0 or higher and Internet Explorer 4.0 or higher.
VecScreen is a system for quickly identifying segments of a nucleic acid sequence that may
be of vector origin. NCBI developed VecScreen to combat the problem of vector
contamination in public sequence databases. This Web page is designed to help researchers
identify and remove any segments of vector origin before sequence analysis or submission.
ORF Finder (Open Reading Frame Finder) is a graphical analysis tool which finds all open
reading frames of a selectable minimum size in a user's sequence or in a sequence already in
the database. This tool identifies all open reading frames using the standard or alternative
genetic codes. The deduced amino acid sequence can be saved in various formats and
searched against the sequence database using the WWW BLAST server. The ORF Finder
should be helpful in preparing complete and accurate sequence submissions. It is also
packaged with the Sequin sequence submission software.
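The idea behind an open reading frame search can be sketched as follows. This is a simplified illustration, not the ORF Finder implementation: it assumes the standard genetic code only, treats ATG as the sole start codon, and reports coordinates on whichever strand was scanned. The function names and the `min_codons` parameter are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 3):
    """Scan all six frames and return (strand, frame, start, end, orf) tuples.

    An ORF runs from the first ATG in a frame to the next in-frame stop codon
    (stop included in `orf`). `min_codons` counts codons before the stop.
    `start` and `end` are 0-based positions on the scanned strand.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None  # resume the search after this stop codon
    return orfs


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATTTTAGCC"):  # made-up sequence
        print(orf)
```

Raising `min_codons` plays the role of the selectable minimum size mentioned above.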
Genome Workbench can display sequence data in many ways, including graphical sequence
views, various alignment views, phylogenetic tree views, and tabular views of data. It can also
align your private data to data in public databases, display your data in the context of public
data, and retrieve BLAST results. Genome Workbench is built on the NCBI C++ ToolKit and
uses cross-platform APIs for graphics. It runs on your local machine, and is available for
Windows 2000/XP, Linux, MacOS X, and various flavors of Unix.
SAPS evaluates by statistical criteria a wide variety of protein sequence properties. Properties
considered include compositional biases, clusters and runs of charge and other amino acid
types, different kinds and extents of repetitive structures, locally periodic motifs, and
anomalous spacings between identical residue types. The statistics are computed for any
single (or appropriately concatenated) protein sequence input.
Transeq translates nucleic acid sequences to the corresponding peptide sequence. It can
translate in any of the three forward or three reverse sense frames, or in all three forward or
reverse frames, or in all six frames.
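A six-frame translation of this kind can be sketched in Python as below. This is an illustration under stated assumptions rather than the Transeq implementation: it uses the standard genetic code only, writes stop codons as `*` and unrecognised codons as `X`, and the function names and frame labels (`+1` to `-3`) are choices made for this example.

```python
from itertools import product

# Standard genetic code, built with bases in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str, frame: int = 0) -> str:
    """Translate one forward frame (0, 1 or 2); incomplete trailing codons are dropped."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(frame, len(seq) - 2, 3))


def six_frame(seq: str) -> dict:
    """Return translations for the three forward and three reverse frames."""
    rev = seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = translate(seq, f)
        frames[f"-{f + 1}"] = translate(rev, f)
    return frames


if __name__ == "__main__":
    for name, peptide in six_frame("ATGGCCTAA").items():
        print(name, peptide)
```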
2.2 Homology and Similarity Tools
The term homology implies a common evolutionary relationship between two traits, whether
they are DNA sequences or bristle patterns on a fly's nose. Homologous sequences are
sequences that are related by divergence from a common ancestor. Thus the degree of
similarity between two sequences can be measured, whereas their homology is either true or
false. This set of tools can be used to identify similarities between novel query
sequences of unknown structure and function and database sequences whose structure and
function have been elucidated. The following software tools are widely used for homology
and similarity searches.
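To make the point that similarity is a measurable quantity, the short Python sketch below computes percent identity from two already aligned sequences. The function name is hypothetical, and the denominator used here (columns where neither sequence has a gap) is only one of several conventions in use. Whether a given level of identity indicates homology remains a separate inference.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns that contain no gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    print(percent_identity("ACGT-A", "ACCTTA"))
```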
BLAST (Basic Local Alignment Search Tool) is a set of search programs used to perform
fast similarity searches regardless of whether the query is protein or DNA. A nucleotide
query can be compared against the nucleotide sequences in a database, and a protein database
can be searched for matches to a protein query. NCBI has also introduced a queuing system
for BLAST (QBLAST) that allows users to retrieve results at their convenience and format
their results multiple times with different formatting options. The main BLAST programs are:
blastp compares an amino acid query sequence against a protein sequence database
blastn compares a nucleotide query sequence against a nucleotide sequence database
blastx compares a nucleotide query sequence translated in all reading frames against a
protein sequence database
tblastn compares a protein query sequence against a nucleotide sequence database
dynamically translated in all reading frames
tblastx compares the six-frame translations of a nucleotide query sequence against the
six-frame translations of a nucleotide sequence database.
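The choice among these five programs follows directly from the type of the query and the type of the database, which the Python sketch below encodes as a lookup table. The helper names, the `translated` flag and the 90% threshold used to guess the molecule type are assumptions for illustration; they are not part of BLAST itself.

```python
BLAST_PROGRAMS = {
    # (query type, database type): program
    ("protein", "protein"): "blastp",
    ("nucleotide", "nucleotide"): "blastn",
    ("nucleotide", "protein"): "blastx",
    ("protein", "nucleotide"): "tblastn",
}


def guess_molecule_type(seq: str) -> str:
    """Crude heuristic: call it nucleotide if at least 90% of letters are A, C, G, T, U or N."""
    seq = seq.upper()
    nucleotide_like = sum(seq.count(base) for base in "ACGTUN")
    return "nucleotide" if seq and nucleotide_like / len(seq) >= 0.9 else "protein"


def choose_blast_program(query_type: str, db_type: str, translated: bool = False) -> str:
    """Pick a program; `translated` selects tblastx for nucleotide vs nucleotide."""
    if translated and query_type == db_type == "nucleotide":
        return "tblastx"
    return BLAST_PROGRAMS[(query_type, db_type)]


if __name__ == "__main__":
    query = "ATGGCCTAAGGCTTA"  # made-up query
    print(choose_blast_program(guess_molecule_type(query), "protein"))
```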
FASTA stands for FAST-All. This sequence alignment program was created by Pearson and
Lipman in 1988 and is one of the many heuristic algorithms proposed to speed up sequence
comparison. The basic idea is to add a
fast prescreen step to locate the highly matching segments between two sequences, and then
extend these matching segments to local alignments using more rigorous algorithms such as
Smith-Waterman. FASTA can be very specific when identifying long regions of low
similarity especially for highly diverged sequences. You can also conduct sequence similarity
searching against nucleotide databases or complete proteome/genome databases using the
FASTA programs.
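The prescreen step can be illustrated with a short Python sketch: index the words (k-mers) of the query, look up each word of the database sequence, and count hits per diagonal (the offset between the two positions). A diagonal with many hits marks a candidate region worth passing to a rigorous aligner. This is a simplified sketch of the general idea, not the FASTA program's actual scoring; the function name and the default word length are assumptions.

```python
from collections import Counter, defaultdict


def best_diagonal(query: str, subject: str, k: int = 2):
    """Return (offset, hits) for the diagonal with the most shared k-mers, or None.

    offset = position in subject - position in query.
    """
    # Index every k-mer of the query by its start positions.
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)

    # Count word hits per diagonal while scanning the subject once.
    diagonals = Counter()
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], ()):
            diagonals[j - i] += 1

    if not diagonals:
        return None
    return diagonals.most_common(1)[0]


if __name__ == "__main__":
    # Made-up sequences: the query occurs in the subject starting at position 2.
    print(best_diagonal("MKVLA", "GGMKVLAGG"))
```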
ENA Sequence Search allows you to search against all nucleotide sequences in the European
Nucleotide Archive (ENA).
SSEARCH/GGSEARCH/GLSEARCH: These programs provide sequence similarity searching against
protein databases using programs from the FASTA package. SSEARCH does a rigorous
Smith-Waterman search for similarity between a query sequence and a database.
GGSEARCH compares a protein or DNA sequence to a sequence database, producing
global-global alignments (Needleman-Wunsch). GLSEARCH compares a protein or DNA
sequence to a sequence database, producing global-local alignments (global in the query,
local in the database sequence).
2.3 Protein Function Analysis
Function analysis is the identification and mapping of all functional elements (both coding
and non-coding) in a genome. This group of programs allows you to compare your protein
sequence to the secondary (or derived) protein databases that contain information on motifs,
signatures and protein domains. Highly significant hits against these different pattern
databases allow you to approximate the biochemical function of your query protein. Some of
these tools and databases are described below:
CluSTr: There are two ways to search the CluSTr database: the advanced search allows
protein accession queries, and the simple search allows cluster identifier searching of the
database. For advanced search, use UniProt Knowledgebase accessions (e.g. Q9Y9L0),
UniProt Knowledgebase IDs (e.g. TDXH_AERPE), IPI accessions (e.g. IPI00745335),
UniProt isoform ids (e.g. P45983-1) or a protein name (or a fragment of one). For simple
search, pick out individual clusters by specifying all three of: a run name, a numeric id
(e.g. 16457) and a z score (e.g. 232.7). Alternatively, specify a cluster identifier
(e.g. HUMAN:16457:232.7, i.e. a colon-separated list of the above three values).
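The colon-separated identifier format described above is easy to split into its three parts. The Python sketch below is only an illustration of that format; the `ClusterId` type and `parse_cluster_id` function are hypothetical names and do not correspond to any CluSTr software.

```python
from typing import NamedTuple


class ClusterId(NamedTuple):
    run: str        # run name, e.g. "HUMAN"
    number: int     # numeric cluster id, e.g. 16457
    z_score: float  # z score, e.g. 232.7


def parse_cluster_id(text: str) -> ClusterId:
    """Split 'RUN:ID:ZSCORE' into typed fields; raise ValueError if malformed."""
    parts = text.strip().split(":")
    if len(parts) != 3:
        raise ValueError(f"expected run:id:zscore, got {text!r}")
    run, number, z_score = parts
    return ClusterId(run, int(number), float(z_score))


if __name__ == "__main__":
    print(parse_cluster_id("HUMAN:16457:232.7"))
```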
Inquisitor will examine your protein sequence and identify whether or not it corresponds to a
sequence in Integr8 (complete proteomes only) and the UniProt Knowledgebase. If the
sequence is not identified, the Inquisitor will return details of the closest matches to your
sequence, and will also return an analysis of the exact sequence submitted. The Inquisitor
uses FASTA to find inexact matches, and InterProScan to analyse sequences. A status report
will keep you informed of the analysis process.
The InterProScan form allows you to query your sequence against InterPro, an integrated
database and diagnostic tool that uses different methodologies and varying degrees of
biological information on well-characterised proteins to derive protein signatures.
InterProScan combines a number of databases (referred to as member databases) like
ProDom, PROSITE patterns, PROSITE and HAMAP profiles, PRINTS, PANTHER, PIRSF,
Pfam, SMART, TIGRFAMs, Gene3D and SUPERFAMILY.
Phobius server is for prediction of transmembrane topology and signal peptides from the
amino acid sequence of a protein.
PPSearch: Search your query sequence for protein motifs by rapidly comparing it
against all patterns stored in the PROSITE pattern database, which helps determine
what function an uncharacterised protein may perform. This tool requires a protein
sequence as input, but DNA/RNA may be translated into a protein sequence using transeq
and then queried.
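Pattern matching of this kind can be sketched by converting a PROSITE-style pattern into a regular expression. The Python sketch below handles the core syntax only (elements separated by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repetition, and `<` or `>` anchors at the ends of the pattern). It is not the PPSearch implementation, the function names are hypothetical, and rarer syntax such as an anchor inside square brackets is not supported.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern such as 'N-{P}-[ST]-{P}.' to a regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        anchor_start = element.startswith("<")
        anchor_end = element.endswith(">")
        element = element.strip("<>")
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"cannot parse pattern element {element!r}")
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # forbidden residues
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if anchor_start else "") + core + ("$" if anchor_end else ""))
    return "".join(parts)


def find_motif(pattern: str, protein: str):
    """Return (position, matched_text) for every match, including overlapping ones."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(protein)]


if __name__ == "__main__":
    # N-{P}-[ST]-{P} is the PROSITE N-glycosylation site pattern;
    # the protein string is made up for illustration.
    print(find_motif("N-{P}-[ST]-{P}.", "AANGSAA"))
```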
Pratt: An important problem in sequence analysis is to find patterns matching sets or subsets
of sequences. This tool allows the user to search for patterns conserved in sets of unaligned
protein sequences. The user can specify what kind of patterns should be searched for, and
how many sequences should match a pattern to be reported.
RADAR stands for Rapid Automatic Detection and Alignment of Repeats in protein
sequences. Many large proteins have evolved by internal duplication and many internal
sequence repeats correspond to functional and structural units. RADAR uses an automatic
algorithm to segment your query sequence into repeats. It identifies short
composition-biased repeats as well as gapped approximate repeats and complex repeat
architectures involving many different types of repeats in your query sequence.
GLIMMER is a system for finding genes in microbial DNA, especially the genomes of
bacteria, archaea, and viruses. GLIMMER (Gene Locator and Interpolated Markov Modeler)
uses interpolated Markov models to identify coding regions.
The Proteax software suite makes it easy to handle modified proteins and protein derivatives.
Use Proteax with Microsoft Excel or Oracle databases to register and analyse protein
structures. The chemically-aware Proteax engine enables researchers to work with
post-translationally and chemically modified protein sequences. Protein data can therefore
be transferred directly to bioinformatics tools, chemistry tools and to and from mass
spectrometry instruments. The Proteax Cartridge directly supports Oracle databases without
imposing restrictions on the way you model data. Using Proteax inside PostgreSQL databases
is also an option. Offline and local data can be processed within a familiar spreadsheet
environment: the Proteax spreadsheet add-ins work in both Microsoft Excel and
OpenOffice.org Calc and give the full functionality of the Proteax toolkit inside your
spreadsheet of choice. Bulk processing of flat file datasets is a natural fit for Proteax
Desktop, which can be scripted from common programming languages.
2.4 Structural Analysis
This set of tools allows you to compare structures with the known structure databases. The
function of a protein is more directly a consequence of its structure than of its sequence,
with structural homologs tending to share functions. Determining a protein's 2D/3D
structure is therefore crucial in the study of its function.
RasMol is a computer program written for molecular graphics visualization, intended and
used primarily for the depiction and exploration of biological macromolecule structures, such
as those found in the Protein Data Bank. It was originally developed by Roger Sayle in the
early 1990s. Historically, it was an important tool for molecular biologists because the
highly optimized program could run on the modestly powerful personal computers of the time.
Before RasMol, visualization software ran on graphics workstations that, due to
their expense, were less accessible to scholars. RasMol has become an important educational
tool and continues to be an important tool for research in structural biology. RasMol
includes a language for selecting certain protein chains, changing colors, and so on.
QuteMol is an open source, interactive, molecular visualization system. QuteMol utilizes the
current capabilities of modern Graphics Processing Units through OpenGL shaders to offer
an array of innovative visual effects. QuteMol's visualization techniques aim to
improve clarity and ease understanding of the 3D shape and structure of large
molecules or complex proteins.
Ascalaph Designer is a general purpose molecular modeling package for molecular design
and simulations. It provides a graphical environment for the common programs of quantum
and classical molecular modeling like Firefly, CP2K and MDynaMix. The molecular
mechanics calculations cover model building, energy optimizations and molecular dynamics.
The Firefly/PC GAMESS covers a wide range of quantum chemistry methods.
NAMD (Not (just) Another Molecular Dynamics program) is a free molecular dynamics
simulation package written using the Charm++ parallel programming model, noted for its
parallel efficiency and often used to simulate large systems (millions of atoms). It has been
developed by the joint collaboration of the Theoretical and Computational Biophysics Group
(TCB) and the Parallel Programming Laboratory (PPL) at the University of Illinois at
Urbana-Champaign. It was introduced in 1995 by Nelson et al. as a parallel molecular
dynamics code enabling interactive simulation by linking to the visualization code VMD.
NAMD has since matured, adding many features and scaling to thousands of processors.
Jmol is an open-source Java viewer for chemical structures in 3D. Jmol returns a 3D
representation of a molecule that may be used as a teaching tool, or for research in chemistry
and biochemistry. It is free and open source software, written in Java and so it runs on
Windows, Mac OS X, Linux and Unix systems. There is a standalone application and a
development tool kit that can be integrated into other Java applications. The most notable
feature is an applet that can be integrated into web pages to display molecules in a variety of
ways. For example, molecules can be displayed as "ball and stick" models, "space filling"
models, "ribbon" models, etc. Jmol supports a wide range of molecular file formats, including
Protein Data Bank (pdb), Crystallographic Information File (cif), MDL Molfile (mol), and
Chemical Markup Language (CML).
DaliLite is a program for pairwise structure comparison. Compare your structure (first
structure) to a reference structure (second structure).
MaxSprout is a fast database algorithm for generating protein backbone and side chain
coordinates from a C-alpha trace. The backbone is assembled from fragments taken from
known structures. Side chain conformations are optimised in rotamer space using a rough
potential energy function to avoid clashes.
PDBeFold (also known as SSM) is an interactive service for comparing protein structures in
3D. This service provides:
linking the results to other services - PDBeMotif, SCOP, GeneCensus, FSSP, CATH,
PDBSum, UniProt.
PDBeMotif is an extremely fast and powerful search tool that facilitates exploration of the
Protein Data Bank (PDB) by combining protein sequence, chemical structure and 3D data in
a single search. Currently, it is the only tool that offers this kind of integration at this speed.
PDBeMotif can be used to examine the characteristics of the binding sites of single proteins
or classes of proteins such as Kinases and the conserved structural features of their immediate
environments either within the same species or across different species. For example, it can
highlight a conserved activation loop common to protein kinases, which is important in
regulating activity and is marked by conserved DFG and APE motifs at the start and end of
the loop, respectively. The prediction of the effect of modifications to small molecules that
bind to the active and/or regulatory sites of proteins on their efficacy can be based on the
outcome of analytic work done using PDBeMotif. It runs on all major operating
system platforms, such as MS Windows, Linux, Apple Mac and Solaris, as it is written in
Java and uses Oracle and the free, open-source PostgreSQL database server. PDBeMotif can be
used online or downloaded and installed locally where public and private PDB files
(including libraries of theoretically derived 3D structures) can be loaded and analyzed. There
is also the capability to load protein site annotations, families and domains from Distributed
Annotation System (DAS) servers.
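The activation loop example above can be approximated at the sequence level with a regular expression that looks for a DFG motif followed, some residues later, by an APE motif. The Python sketch below works on sequence alone and does not reproduce PDBeMotif, which also uses 3D and chemical data. The function name and the allowed spacing between the motifs (10 to 40 residues) are assumptions chosen for illustration.

```python
import re


def activation_loops(protein: str, min_gap: int = 10, max_gap: int = 40):
    """Return (start, end, segment) for each DFG...APE stretch in `protein`.

    `min_gap` and `max_gap` bound the number of residues between the motifs.
    The shortest qualifying stretch is preferred (non-greedy match).
    """
    pattern = re.compile(r"DFG[A-Z]{%d,%d}?APE" % (min_gap, max_gap))
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(protein)]


if __name__ == "__main__":
    toy = "AAAADFG" + "S" * 12 + "APEKK"  # made-up sequence, not a real kinase
    print(activation_loops(toy))
```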
Tempura is a server designed to allow a user to specify the amino acids to be selected for
searching using the "reverse template" approach. In addition to the selection of important
residues, the user can also specify whether to search against a non-redundant representative
sample of the PDB, an uploaded list of PDB codes (useful when examining specific families
of proteins) or a single PDB code (for pairwise searches). The Tempura server allows for two
types of searches: either by uploading a 3D coordinate file or by entering the corresponding
4-character PDB code. Once a valid selection has been made or structure uploaded, a list of
residues is presented for selection. Click on as many residues as required (use shift+click or
ctrl+click for multiple residue selections) and then click "submit" to go to the database
selection screen where the number of templates generated will be displayed.
Bioinformatics is being used in various fields of science. A few important areas are given
below:
Molecular medicine
Personalised medicine
Preventative medicine
Gene therapy
Drug development
Microbial genome applications
Waste cleanup
Climatic change studies
Alternative energy sources
Biotechnology
Antibiotic resistance
Forensic analysis of microbes
Bio-weapon creation
Evolutionary studies
Crop improvement
Insect resistance
Improve nutritional quality
Development of drought-resistant varieties
Veterinary science
With the confluence of statistics, biology and computer science, computer applications in
molecular biology are drawing greater attention among life science researchers and
scientists these days. As it becomes imperative for biologists to seek the help of information
technology professionals to meet the ever-growing computational requirements of a
host of exciting and pressing biological problems, the synergy between modern biology and
computational science is set to blossom in the days to come. Thus, the research scope for
mathematical techniques and algorithms, coupled with programming languages and
software development and deployment tools, is set to get a real boost. In addition,
information technologies such as databases, middleware, graphical user interface (GUI)
design, distributed object computing, storage area networks (SAN), data compression,
networking and communication, and remote management are all set to play a critical role in
advancing the goals for which the bioinformatics field came into existence.