bioinformatics

Bioinformatics is an interdisciplinary field that applies computational tools to analyze biological data, particularly focusing on nucleic acids and protein sequences. The document discusses the importance of sequence analysis in understanding genetic information and evolutionary relationships, as well as various alignment methods and tools used in bioinformatics. Additionally, it covers the significance of phylogenetic analysis in studying the evolutionary history of organisms.

Uploaded by

itsmesimmu
Copyright
© All Rights Reserved

Bioinformatics is defined as the application of tools of computation and analysis to the capture
and interpretation of biological data. It is an interdisciplinary field, which harnesses computer
science, mathematics, physics, and biology.

A database is an organized collection of structured information, or data, typically stored
electronically in a computer system so that it can easily be accessed, used, and added to. A
database is usually controlled by a database management system (DBMS).

Nucleic acids are biopolymers (macromolecules) essential to all known forms of life.[1] They are
composed of nucleotides. The two main classes of nucleic acids are deoxyribonucleic acid
(DNA) and ribonucleic acid (RNA). If the sugar is ribose, the polymer is RNA; if the sugar is
deoxyribose, a variant of ribose, the polymer is DNA. Nucleic acids are naturally occurring
chemical compounds. They carry information in cells and make up genetic material. One DNA or
RNA molecule differs from another primarily in the sequence of nucleotides. Nucleotide
sequences are of great importance in biology since they carry the ultimate instructions that
encode all biological molecules, molecular assemblies, subcellular and cellular structures,
organs, and organisms, and directly enable cognition, memory, and behavior. Enormous efforts
have gone into the development of experimental methods to determine the nucleotide sequence of
biological DNA and RNA molecules,[26][27] and today hundreds of millions of nucleotides are
sequenced daily at genome centers and smaller laboratories worldwide. In addition to maintaining
the GenBank nucleic acid sequence database, the National Center for Biotechnology Information
(NCBI, https://www.ncbi.nlm.nih.gov) provides analysis and retrieval resources for the data in
GenBank and other biological data made available through the NCBI web site.

Genomes

The genome is the entire set of DNA instructions found in a cell. In humans, the genome consists
of 23 pairs of chromosomes located in the cell's nucleus, as well as a small chromosome in the
cell's mitochondria. A genome contains all the information needed for an individual to develop
and function.

DNA is the information molecule for all living organisms. All of the DNA of an organism is
called its genome. For example, the human genome contains about 3 billion nucleotides.

Protein sequences and structures

Protein structures are made by condensation of amino acids forming peptide bonds. The
sequence of amino acids in a protein is called its primary structure. The secondary structure is
determined by the dihedral angles of the peptide bonds, the tertiary structure by the folding of
protein chains in space.


Sequence analysis

In bioinformatics, sequence analysis is the process of subjecting a DNA, RNA or peptide
sequence to any of a wide range of analytical methods to understand its features, function,
structure, or evolution. Methodologies used include sequence alignment, searches against
biological databases, and others.
Sequence analysis can be used to assign function to genes and proteins by studying the
similarities between the compared sequences. Many tools and techniques are now available
that perform sequence comparison (sequence alignment) and analyze the resulting alignment
to understand its biology.

Sequence analysis in molecular biology includes a very wide range of relevant topics:

● The comparison of sequences in order to find similarity, often to infer whether they are
related (homologous)
● Identification of intrinsic features of the sequence such as active sites, post-translational
modification sites, gene structures, reading frames, distributions of introns and exons, and
regulatory elements
● Identification of sequence differences and variations such as point mutations and single
nucleotide polymorphisms (SNPs), in order to obtain genetic markers
● Revealing the evolution and genetic diversity of sequences and organisms
● Identification of molecular structure from sequence alone

In chemistry, sequence analysis comprises techniques used to determine the sequence of a
polymer formed of several monomers (see Sequence analysis of synthetic polymers). In
molecular biology and genetics, the same process is called simply "sequencing".

In marketing, sequence analysis is often used in analytical customer relationship management
applications, such as NPTB models (Next Product to Buy).

In social sciences and in sociology in particular, sequence methods are increasingly used to study
life-course and career trajectories, time use, patterns of organizational and national development,
conversation and interaction structure, and the problem of work/family synchrony. This body of
research is described under sequence analysis in social sciences.
Since the very first sequences of the insulin protein were characterized by Fred Sanger in 1951,
biologists have been trying to use this knowledge to understand the function of molecules. Sanger
later developed a DNA sequencing technique, known as the "Sanger method" or Sanger sequencing,
which was a milestone in sequencing long-strand molecules such as DNA. This method was eventually
used in the Human Genome Project. Sequencing technology continued to improve, leading to the
publication of the first complete genome of a bacteriophage in 1977. Robert Holley and his team at
Cornell University are believed to have been the first to sequence an RNA molecule.

There are millions of protein and nucleotide sequences known. Relationships between these
sequences are usually discovered by aligning them together and assigning this alignment a score.
There are two main types of sequence alignment. Pairwise sequence alignment compares only two
sequences at a time, while multiple sequence alignment compares many sequences. Two important
algorithms for aligning pairs of sequences are the Needleman-Wunsch algorithm and the
Smith-Waterman algorithm. Popular tools for sequence alignment include:

● Pair-wise alignment - BLAST, Dot plots


● Multiple alignment - ClustalW, PROBCONS, MUSCLE, MAFFT, and T-Coffee.
A common use for pairwise sequence alignment is to take a sequence of interest and compare it to
all known sequences in a database to identify homologous sequences.
A sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify
regions of similarity that may be a consequence of functional, structural, or evolutionary
relationships between the sequences.
Aligned sequences of nucleotide or amino acid residues are typically represented as rows within a
matrix. Gaps are inserted between the residues so that identical or similar characters are aligned in
successive columns. Sequence alignments are also used for non-biological sequences, such as
calculating the distance cost between strings in a natural language or in financial data.
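
The row-and-column representation can be illustrated with a short Python sketch. It assumes two already-aligned sequences stored as equal-length strings, with "-" marking a gap. The example sequences and the function name percent_identity are invented for illustration, and identity is counted only over columns where neither row has a gap, which is just one of several conventions in use.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percentage of gap-free columns in which both rows carry the same residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(row_a, row_b):   # walk the alignment column by column
        if a == gap or b == gap:     # skip columns that contain a gap
            continue
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Two rows of a toy alignment (made-up sequences).
row_1 = "GATTACA-GT"
row_2 = "GAT-ACATGA"
print(percent_identity(row_1, row_2))  # 7 of 8 gap-free columns match -> 87.5
```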

BLAST: Basic Local Alignment Search Tool


The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity
between sequences. BLAST is an algorithm and program for comparing primary
biological sequence information, such as the amino-acid sequences of proteins or the
nucleotides of DNA and/or RNA sequences. A BLAST search enables a researcher to
compare a subject protein or nucleotide sequence (called a query) with a library or
database of sequences, and identify database sequences that resemble the query
sequence above a certain threshold. For example, following the discovery of a
previously unknown gene in the mouse, a scientist will typically perform a BLAST search
of the human genome to see if humans carry a similar gene; BLAST will identify
sequences in the human genome that resemble the mouse gene based on similarity of
sequence.
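
The idea of comparing a query against a database can be sketched in Python as a simple word lookup. This is not the BLAST program: real BLAST also scores similar (not only identical) words, extends hits into local alignments, and reports statistical significance. The sketch only shows a first step, assuming exact matching of short words of length k. The function names and the toy database are hypothetical.

```python
from collections import defaultdict


def build_word_index(database: dict[str, str], k: int) -> dict[str, list[tuple[str, int]]]:
    """Map every length-k word to the (sequence name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index


def find_seed_hits(query: str, index: dict, k: int) -> list[tuple[str, int, int]]:
    """Return (sequence name, query position, database position) for each shared word."""
    hits = []
    for q_pos in range(len(query) - k + 1):
        word = query[q_pos:q_pos + k]
        for name, d_pos in index.get(word, []):
            hits.append((name, q_pos, d_pos))
    return hits


# Toy database with made-up sequences.
database = {"seq_a": "ACGTACGTGA", "seq_b": "TTTTGGGGCC"}
index = build_word_index(database, k=4)
print(find_seed_hits("CGTAC", index, k=4))
# -> [('seq_a', 0, 1), ('seq_a', 1, 2)]
```

Both hits lie on the same diagonal (database position minus query position equals 1), which is the kind of signal a heuristic search would then try to extend into a longer local alignment.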

CLUSTALW
Clustal W is a general-purpose multiple sequence alignment program for DNA or
proteins. It produces biologically meaningful multiple sequence alignments.

Alignment methods
Very short or very similar sequences can be aligned by hand. However, most interesting problems
require the alignment of lengthy, highly variable or extremely numerous sequences that cannot be
aligned solely by human effort. Instead, human knowledge is applied in constructing algorithms to
produce high-quality sequence alignments, and occasionally in adjusting the final results to reflect
patterns that are difficult to represent algorithmically (especially in the case of nucleotide
sequences). Computational approaches to sequence alignment generally fall into two categories:
global alignments and local alignments. Calculating a global alignment is a form of global
optimization that "forces" the alignment to span the entire length of all query sequences. By contrast,
local alignments identify regions of similarity within long sequences that are often widely divergent
overall. Local alignments are often preferable, but can be more difficult to calculate because of the
additional challenge of identifying the regions of similarity. A variety of computational algorithms have
been applied to the sequence alignment problem. These include slow but formally correct methods
like dynamic programming. They also include efficient heuristic algorithms and probabilistic
methods designed for large-scale database search, which are not guaranteed to find the best matches.

Global and local alignments


Global alignments, which attempt to align every residue in every sequence, are most useful when
the sequences in the query set are similar and of roughly equal size. (This does not mean global
alignments cannot start and/or end in gaps.) A general global alignment technique is the
Needleman–Wunsch algorithm, which is based on dynamic programming. Local alignments are
more useful for dissimilar sequences that are suspected to contain regions of similarity or similar
sequence motifs within their larger sequence context. The Smith–Waterman algorithm is a general
local alignment method based on the same dynamic programming scheme but with additional
choices to start and end at any place.[5]
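
The relationship between the two algorithms can be sketched in Python. The sketch computes scores only (no traceback) and assumes a simple scheme of +1 for a match, -1 for a mismatch, and -1 per gap position. These values and the function names are illustrative choices, not prescribed by either algorithm.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best global alignment score: both sequences are aligned end to end."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):          # leading gaps are penalized
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]              # the alignment must end in the last cell


def smith_waterman_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score: the alignment may start and end anywhere."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]   # first row and column stay 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # the extra 0 lets an alignment restart at any cell
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            best = max(best, score[i][j])        # the alignment may end at any cell
    return best


print(needleman_wunsch_score("GCATGCG", "GATTACA"))            # 0
print(smith_waterman_score("TTTGATTACATTT", "CCCGATTACACCC"))  # 7 (the shared GATTACA)
```

The only differences between the two functions are the initialization of the first row and column, the extra option of 0 in each cell, and where the final score is read from.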

Hybrid methods, known as semi-global or "glocal" (short for global-local) methods, search for the
best possible partial alignment of the two sequences (in other words, a combination of one or both
starts and one or both ends is stated to be aligned). This can be especially useful when the
downstream part of one sequence overlaps with the upstream part of the other sequence. In this
case, neither global nor local alignment is entirely appropriate: a global alignment would attempt to
force the alignment to extend beyond the region of overlap, while a local alignment might not fully
cover the region of overlap.[8] Another case where semi-global alignment is useful is when one
sequence is short (for example a gene sequence) and the other is very long (for example a
chromosome sequence). In that case, the short sequence should be globally (fully) aligned but only
a local (partial) alignment is desired for the long sequence.
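
The short-versus-long case can be sketched in Python as a small change to the global dynamic programming scheme: skipping a prefix or suffix of the long sequence costs nothing, while the short sequence must be used in full. The scoring values (+1 match, -1 mismatch, -1 gap) and the function name are assumptions made for illustration.

```python
def semi_global_score(short_seq: str, long_seq: str,
                      match: int = 1, mismatch: int = -1, gap: int = -1) -> tuple[int, int]:
    """Align all of short_seq against any stretch of long_seq.

    Returns (best score, end position in long_seq of the best alignment).
    """
    rows, cols = len(short_seq) + 1, len(long_seq) + 1
    score = [[0] * cols for _ in range(rows)]   # row 0 stays 0: free start in long_seq
    for i in range(1, rows):
        score[i][0] = i * gap                   # short_seq may not be skipped for free
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if short_seq[i - 1] == long_seq[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Free end in long_seq: take the best value anywhere in the last row.
    best_end = max(range(cols), key=lambda j: score[-1][j])
    return score[-1][best_end], best_end


# Made-up example: a short "gene" embedded in a longer sequence.
print(semi_global_score("GATTACA", "CCCCGATTACACCCC"))  # (7, 11)
```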

The rapid growth of genetic data challenges the speed of current DNA sequence alignment algorithms.
The need for efficient and accurate methods of DNA variant discovery demands innovative
approaches to parallel processing in real time. Optical computing approaches have been suggested
as promising alternatives to the current electrical implementations, yet their applicability remains to
be tested.

Pairwise alignment
Pairwise sequence alignment methods are used to find the best-matching piecewise (local or global)
alignments of two query sequences. Pairwise alignments can only be used between two sequences
at a time, but they are efficient to calculate and are often used for methods that do not require
extreme precision (such as searching a database for sequences with high similarity to a query). The
three primary methods of producing pairwise alignments are dot-matrix methods, dynamic
programming, and word methods;[1] however, multiple sequence alignment techniques can also
align pairs of sequences. Although each method has its individual strengths and weaknesses, all
three pairwise methods have difficulty with highly repetitive sequences of low information content,
especially where the number of repetitions differs in the two sequences to be aligned.
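
Of the three approaches, the dot-matrix method is the simplest to sketch in Python. One sequence is laid along the columns and the other along the rows, and a mark is placed wherever the words starting at the two positions match. The window parameter, the "*" and "." symbols, and the function name are illustrative assumptions.

```python
def dot_matrix(seq_x: str, seq_y: str, window: int = 1) -> list[str]:
    """Build a text dot plot: '*' where length-`window` words of seq_y (rows)
    and seq_x (columns) match exactly, '.' elsewhere."""
    rows = []
    for i in range(len(seq_y) - window + 1):
        row = ""
        for j in range(len(seq_x) - window + 1):
            same = seq_y[i:i + window] == seq_x[j:j + window]
            row += "*" if same else "."
        rows.append(row)
    return rows


# A sequence compared with itself gives an unbroken main diagonal.
for line in dot_matrix("ACGT", "ACGT"):
    print(line)

# A repetitive sequence produces many parallel diagonals,
# which is why repeats are hard to align unambiguously.
for line in dot_matrix("ATATAT", "ATATAT", window=2):
    print(line)
```

Regions of similarity appear as diagonal runs of marks. A larger window suppresses isolated chance matches at the cost of missing short similarities.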

Multiple sequence alignment


Multiple sequence alignment is an extension of pairwise alignment to incorporate more than two
sequences at a time. Multiple alignment methods try to align all of the sequences in a given query
set. Multiple alignments are often used in identifying conserved sequence regions across a group of
sequences hypothesized to be evolutionarily related. Such conserved sequence motifs can be used
in conjunction with structural and mechanistic information to locate the catalytic active sites of
enzymes. Alignments are also used to aid in establishing evolutionary relationships by constructing
phylogenetic trees. Multiple sequence alignments are computationally difficult to produce and most
formulations of the problem lead to NP-complete combinatorial optimization problems.[11][12]
Nevertheless, the utility of these alignments in bioinformatics has led to the development of a variety
of methods suitable for aligning three or more sequences.
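
How conserved regions are read off a multiple alignment can be sketched in Python. The sketch does not compute the alignment itself; it assumes the rows have already been aligned by a tool such as those named earlier, and it uses made-up sequences. The function names are hypothetical.

```python
from collections import Counter


def conserved_columns(alignment: list[str], gap: str = "-") -> list[int]:
    """Indices of columns where every row has the same residue and no gap."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    conserved = []
    for col in range(length):
        residues = {row[col] for row in alignment}
        if len(residues) == 1 and gap not in residues:
            conserved.append(col)
    return conserved


def consensus(alignment: list[str]) -> str:
    """Most frequent character in each column (gaps count as characters)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*alignment))


# A toy, pre-aligned set of three sequences.
alignment = [
    "ACG-TA",
    "ACGCTA",
    "TCG-TT",
]
print(conserved_columns(alignment))  # [1, 2, 4]
print(consensus(alignment))          # ACG-TA
```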

Structural alignment
Structural alignments, which are usually specific to protein and sometimes RNA sequences, use
information about the secondary and tertiary structure of the protein or RNA molecule to aid in
aligning the sequences. These methods can be used for two or more sequences and typically
produce local alignments; however, because they depend on the availability of structural information,
they can only be used for sequences whose corresponding structures are known (usually through
X-ray crystallography or NMR spectroscopy). Because both protein and RNA structures are more
evolutionarily conserved than sequence,[20] structural alignments can be more reliable between
sequences that are very distantly related and that have diverged so extensively that sequence
comparison cannot reliably detect their similarity.

Structural alignments are used as the "gold standard" in evaluating alignments for homology-based
protein structure prediction[21] because they explicitly align regions of the protein sequence that are
structurally similar rather than relying exclusively on sequence information. However, clearly
structural alignments cannot be used in structure prediction because at least one sequence in the
query set is the target to be modeled, for which the structure is not known. It has been shown that,
given the structural alignment between a target and a template sequence, highly accurate models of
the target protein sequence can be produced; a major stumbling block in homology-based structure
prediction is the production of structurally accurate alignments given only sequence information.

Phylogenetic analysis
Phylogeny is the history of the evolution of a species or group,
especially in reference to lines of descent and relationships among
broad groups of organisms.
In biology, phylogenetics is the study of the evolutionary history and relationships
among or within groups of organisms. These relationships are determined by
phylogenetic inference methods that focus on observed heritable traits, such as DNA
sequences, protein amino acid sequences, or morphology.
A phylogenetic tree is a diagram that represents evolutionary relationships among
organisms. Phylogenetic trees are hypotheses, not definitive facts. The pattern of
branching in a phylogenetic tree reflects how species or other groups evolved from a
series of common ancestors.
Phylogenetic analysis provides an in-depth understanding of how species evolve
through genetic changes. Using phylogenetics, scientists can evaluate the path that
connects a present-day organism with its ancestral origin, and can also predict the
genetic divergence that may occur in the future.
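
One simple way to turn aligned sequences into a tree can be sketched in Python. The sketch uses the proportion of differing sites (p-distance) and UPGMA clustering, a basic distance-based method chosen here only for brevity. UPGMA assumes a roughly constant rate of change across lineages, and the text above does not single out any particular inference method. The taxon labels and sequences are invented, and the output is a tree topology in Newick notation without branch lengths.

```python
from itertools import combinations


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Fraction of gap-free aligned columns at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(seqs: dict[str, str]) -> str:
    """Cluster taxa by average distance; return the topology as a Newick string."""
    sizes = {name: 1 for name in seqs}                      # cluster label -> number of taxa
    dist = {frozenset((m, n)): p_distance(seqs[m], seqs[n])
            for m, n in combinations(seqs, 2)}
    while len(sizes) > 1:
        a, b = sorted(min(dist, key=dist.get))              # closest pair of clusters
        merged = f"({a},{b})"
        new_size = sizes[a] + sizes[b]
        for other in sizes:
            if other in (a, b):
                continue
            # size-weighted average of the distances to the two merged clusters
            d = (dist[frozenset((a, other))] * sizes[a]
                 + dist[frozenset((b, other))] * sizes[b]) / new_size
            dist[frozenset((merged, other))] = d
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        del sizes[a], sizes[b]
        sizes[merged] = new_size
    return next(iter(sizes)) + ";"


# Made-up aligned sequences for four hypothetical taxa.
toy = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGTTCGAAT",
    "D": "TCGATCCAGT",
}
print(upgma(toy))  # (((A,B),C),D);
```

The resulting tree is a hypothesis about relationships, consistent with the point that phylogenetic trees are not definitive facts. Different distance measures or inference methods can give different trees for the same data.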
