BIOINFORMATICS Chapter 1 3rd Sem

This document provides an introduction to bioinformatics. It defines bioinformatics as the use of computers to study and manage biological data, especially using algorithms and software. It explains that bioinformatics integrates biology, computer science, and information technology. It also gives examples of key applications of bioinformatics like sequence analysis, structure prediction, gene function prediction, and analyzing genetic mutations and diseases.

Introduction to Bioinformatics

Dr. Shayaq Ul Abeer Rasool


Lecturer
Government Degree College Boys
Baramulla
Mobile: 9622599672
Email: [email protected]

What is Bioinformatics?
• Bioinformatics is the use of computers to study biology
• Bioinformatics is the science of using information to understand biology
• Bioinformatics is the integration of information technology (IT) and biology
• Bioinformatics is the development of computational methods for studying the structure, function and evolution of genes, proteins and whole genomes
Biological Data + Computer Calculations = Bioinformatics

What is Bioinformatics?
• Bioinformatics is “the study of the information content and information flow in biological systems and processes”
• Bioinformatics is a relatively new interdisciplinary field that integrates computer science, mathematics, biology, and information technology to manage, analyze, and understand biological, biochemical and biophysical information
• Bioinformatics deals with the development of methods and algorithms to organize, integrate, analyze and interpret biological data
• Between 1958 and 1962, Margaret Dayhoff worked alongside Robert Ledley at the National Biomedical Research Foundation to develop the computer program known as COMPROTEIN
• In 1970, Needleman and Wunsch published a dynamic programming algorithm for the pairwise alignment of protein sequences
• Paulien Hogeweg and Ben Hesper coined the term “Bioinformatics” in 1970 to refer to the study of information processes in biotic systems
• In 1977, Sanger’s chain-termination method was published and became the first widely adopted DNA sequencing method
• In 1979, Roger Staden created the first available software to analyze DNA sequences produced by Sanger sequencing
• In 1990 the Human Genome Project was launched. In 2003 an accurate and essentially complete human genome sequence was published
Biological Information in Genome
• The human genome contains roughly 20,000 protein-coding genes (early estimates of 35,000-50,000 were revised downward once the genome was sequenced)
• Includes non-coding sequences, located between and within genes, which make up the vast majority of the DNA in the genome (roughly 98%)
• The particular order of nucleotide bases (As, Gs, Cs, and Ts) determines the amino acid sequence of proteins
• Information about DNA variations (polymorphisms) among individuals can lend insight into new technologies for diagnosing, treating, and preventing diseases
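
How the order of bases determines the amino acids of a protein can be illustrated with a short sketch. It assumes the standard genetic code, a coding sequence given 5'→3' on the coding strand, and translation that starts at the first base and stops at the first stop codon. The function name `translate` is only illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA sequence into one-letter amino acids.

    Reads codons from position 0 and stops at the first stop codon.
    A trailing partial codon is ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCATTGTAATGGGCCGCTGA"))  # prints MAIVMGR
```

Changing a single base can change the encoded amino acid (for example GCC codes for alanine, GTC for valine), which is why sequence variation matters for protein function.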

Genome size of various organisms
DNA length in base pairs (bp)

• Epstein-Barr virus: 0.172 × 10^6
• Bacterium (E. coli): 4.6 × 10^6
• Yeast (S. cerevisiae): 12.1 × 10^6
• Nematode worm (C. elegans): 95.5 × 10^6
• Thale cress (A. thaliana): 117 × 10^6
• Fruit fly (D. melanogaster): 180 × 10^6
• Human (H. sapiens): 3200 × 10^6

Aims of bioinformatics

➢ Store biological data in an organized form, as databases
• Provide easy access to existing information and allow new entries to be submitted
• Annotate data and assign its functional characteristics

➢ Develop tools and resources that aid in the analysis of data
• Find similar nucleotide/amino-acid sequences: BLAST
• Align multiple nucleotide/amino-acid sequences: ClustalW
• Design primers and probes for PCR techniques: Primer3

➢ Analyze the biological data and interpret the results in a biologically meaningful manner

Applications of Bioinformatics
Bioinformatics applications can be categorized under the following groups
➢ Sequence Analysis
Applications that analyze various types of sequence information, or compare similar types of sequence information, are grouped under Sequence Analysis
➢ Function Analysis
These applications analyze the function encoded within sequences and help predict the functional interactions between various proteins or genes
➢ Structure Analysis
Applications that predict the structure of biomolecules

Sequence Analysis
• Sequence Database Searching
• Sequence Alignment
• Genome Comparison
• Gene Promoter Prediction
• Mutation Discovery
• Phylogeny
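
Sequence alignment, listed above, is classically done with the dynamic programming approach of Needleman and Wunsch. The sketch below is a minimal global alignment under assumed scores (match +1, mismatch -1, linear gap penalty -1). Real tools use substitution matrices and affine gap penalties, so the scoring values and the function name here are illustrative only.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def substitution(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell takes the best of diagonal, up, or left.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + substitution(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                      # gap in b
                score[i][j - 1] + gap,                      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


total, row_a, row_b = needleman_wunsch("ACGT", "AGT")
print(total)
print(row_a)
print(row_b)
```

When several alignments share the best score, the traceback returns only one of them (it prefers the diagonal move, then a gap in the second sequence).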

Function Analysis
• Gene Expression Profiling
• Metabolic Pathway Modeling
• Protein Interaction Prediction
• Protein Subcellular Localization

Structure Analysis
• Nucleic Acid Structure Prediction
• Effects of Variants on structure
• Protein Structure Prediction
• Protein Structure Classification
• Protein Structure Comparison

Genome annotation
• Annotation is the process of marking the genes and other biological features in a DNA sequence
• Annotation is made possible by the fact that genes have recognizable start and stop regions
• Gene finding is a chief aspect of nucleotide-level annotation
• Gene annotation includes promoter searching, signal peptide searching, and marking exons, introns and coding regions in a gene
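
The idea that genes have recognizable start and stop signals can be sketched as a very simple open reading frame (ORF) scan. This is a deliberately simplified illustration, not a real gene finder: it assumes ATG as the only start codon, TAA/TAG/TGA as stop codons, the forward strand only, and no introns. The function name and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) for each ATG...stop ORF on the forward strand.

    start is the 0-based index of the A in ATG, end is one past the stop codon,
    so dna[start:end] is the ORF including its stop codon.
    min_codons counts codons before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


sequence = "GGATGGCCTTTTAAGG"
for start, end, frame in find_orfs(sequence):
    print(start, end, frame, sequence[start:end])
```

A fuller annotation pipeline would also scan the reverse complement and, for eukaryotic genomes, model splice sites, since coding regions there are interrupted by introns.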

Gene Function Prediction
• The properties of sequences can be used to predict the function of genes
• Most gene function prediction methods focus on protein sequences
• The distribution of hydrophobic amino acids predicts transmembrane segments in proteins
• Bioinformatics tools can also use external information such as gene/protein expression data, protein structure, or protein-protein interactions to predict protein function
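
The link between hydrophobic residues and transmembrane segments can be illustrated with a sliding-window hydropathy profile. The sketch below uses the Kyte-Doolittle hydropathy scale. The window length of 19 residues and the cutoff of 1.6 are commonly quoted values but are treated here as assumptions, and the function names are illustrative. Dedicated predictors use more elaborate models.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Average hydropathy of every window of the given length."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def candidate_tm_windows(protein, window=19, threshold=1.6):
    """Start positions (0-based) of windows whose average exceeds the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i for i, average in enumerate(profile) if average > threshold]


# Toy sequence: a 19-residue leucine stretch flanked by charged residues.
toy_protein = "DDDDD" + "L" * 19 + "KKKKK"
print(candidate_tm_windows(toy_protein))
```

Windows flagged this way are only candidates: a hydrophobic stretch can also be a buried core segment of a soluble protein.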

Evolutionary biology
• Evolutionary biology is the study of the origin and descent of species, as well as their change over time
• Bioinformatics can help to trace the evolution of a large number of organisms by measuring changes in their DNA
• Entire genomes can be compared, which permits the study of more complex evolutionary events
• Bioinformatics makes it possible to track and share information on an increasingly large number of species and organisms
• Complex computational population genetics models can be built to predict the outcome of the system over time
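
Measuring changes in DNA between organisms can be as simple as counting differing positions in two aligned sequences. The sketch below computes the proportion of differing sites (the p-distance) and, as one common correction for multiple substitutions at the same site, the Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3). It assumes the sequences are already aligned and of equal length, with '-' marking gaps. The function names are illustrative.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns containing a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (x, y)
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x != "-" and y != "-"
    ]
    if not columns:
        raise ValueError("no comparable (ungapped) columns")
    differences = sum(1 for x, y in columns if x != y)
    return differences / len(columns)


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance for an observed proportion p < 0.75."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in the range [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGT", "ACGTACGA")
print(p)                         # prints 0.125
print(jukes_cantor_distance(p))
```

Pairwise distances of this kind are the input to distance-based tree-building methods used in phylogeny.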
Genetics of disease
• Bioinformatics can help to map the genes of complex polygenic diseases like infertility, breast cancer or Alzheimer's disease
• Genome-wide association studies are a useful approach to pinpoint the DNA variants associated with such complex diseases
• Thousands of DNA variants have been identified that are associated with such diseases and traits
• Developing better tools for diagnosis or treatment is one of the most essential applications of bioinformatics for complex disorders

Analysis of mutations
• The genomes of disease-affected cells are rearranged in complex or even unpredictable ways (mutations)
• Massive sequencing efforts are used to identify previously unknown point mutations in a variety of disease-associated genes
• Detection methods using bioinformatics tools simultaneously measure several hundred thousand sites throughout the genome to find new mutations
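
At its simplest, finding point mutations means comparing a sample sequence with a reference, position by position. The sketch below assumes the two sequences are already aligned, have equal length, and differ only by substitutions (no insertions or deletions). The function name and the output format are hypothetical and do not follow any variant-file standard.

```python
def find_point_mutations(reference, sample):
    """List substitutions as (position, reference_base, sample_base).

    Positions are 1-based, following the usual convention for sequence
    coordinates.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    return [
        (position, ref_base, alt_base)
        for position, (ref_base, alt_base) in enumerate(
            zip(reference.upper(), sample.upper()), start=1
        )
        if ref_base != alt_base
    ]


for position, ref_base, alt_base in find_point_mutations("ACGTAC", "ACCTAA"):
    print(f"{ref_base}{position}{alt_base}")  # prints G3C, then C6A
```

Real variant calling works from many overlapping sequencing reads and must separate true mutations from sequencing errors, which this toy comparison does not attempt.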
Gene & Protein Expression Analysis
• The expression of genes can be determined by measuring mRNA levels
• Measuring gene expression across the whole genome (transcriptomics) produces large amounts of data that can be analysed by bioinformatics
• The techniques used to measure expression are noise-prone and/or subject to bias in the biological measurement, so bioinformatics helps to separate signal from noise
• Protein microarrays and high-throughput (HT) mass spectrometry (MS) can provide a snapshot of the proteins present in a biological sample

NCBI National Center for Biotechnology Information
ncbi.nlm.nih.gov
• Created in 1988 as part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health (NIH)
• NCBI houses a series of databases relevant to biotechnology and biomedicine
• Major databases include GenBank for DNA sequences and PubMed, a bibliographic database
• Other resources include Gene, Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (3D protein structures), dbSNP (a database of single-nucleotide polymorphisms), the Reference Sequence Collection, and a map of the human genome
EBI European Bioinformatics Institute
www.ebi.ac.uk
• Created in 1992 as part of the European Molecular Biology Laboratory (EMBL)
• EMBL-EBI indexes and maintains biological data in a set of databases, including Ensembl (housing whole genome sequence data), UniProt (protein sequence and annotation database) and Protein Data Bank (protein and nucleic acid tertiary structure database)

ExPASy (Expert Protein Analysis System)
• Created in 1993, ExPASy is an online bioinformatics resource operated by the SIB Swiss Institute of Bioinformatics
• It provides access to over 160 databases and software tools and supports a range of life science and clinical research areas
• It originally acted as a proteomics server to analyze protein sequences and structures and two-dimensional gel electrophoresis (2-D PAGE) data

Entrez
• Entrez is a molecular biology database search system: a search engine that allows users to search many health sciences databases at the NCBI
• Entrez Global Query is a search and retrieval system that provides access to all databases simultaneously with a single query and user interface
• Entrez covers over 20 databases, including protein sequence data from PIR-International, Swiss-Prot, and PDB, and nucleotide sequence data from GenBank, which includes information from EMBL and DDBJ
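
Entrez can also be queried from programs through NCBI's E-utilities web interface. The sketch below only builds an ESearch request URL and does not send it. The helper name `esearch_url` and the example query are hypothetical, and the parameters shown (`db`, `term`, `retmax`, `retmode`) are a small subset of what ESearch accepts, so the NCBI documentation should be checked before relying on them.

```python
from urllib.parse import urlencode

# Base address of the ESearch utility in NCBI's E-utilities.
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(database, term, max_results=20):
    """Build (but do not send) an ESearch URL for one Entrez database."""
    query = urlencode({
        "db": database,         # Entrez database name, e.g. "protein" or "pubmed"
        "term": term,           # search expression in Entrez query syntax
        "retmax": max_results,  # maximum number of record IDs to return
        "retmode": "json",      # ask for a JSON response
    })
    return f"{ESEARCH_BASE}?{query}"


print(esearch_url("protein", "insulin AND human[Organism]", max_results=5))
```

The response to such a request is a list of record identifiers, which can then be passed to other E-utilities to fetch the records themselves.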
PDB Protein Data Bank
www.wwpdb.org
• A database for the three-dimensional structural data of large biological molecules, such as proteins and nucleic acids
• The Protein Data Bank was announced in October 1971 as a joint venture between the Cambridge Crystallographic Data Centre, UK and Brookhaven National Laboratory, US
• PDB is a key resource in structural biology, and many journals and funding agencies require scientists to submit their structure data to the PDB

UniProt
• UniProt provides the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information
• UniProt comprises four components, each optimized for different uses:

UniProt Knowledgebase (UniProtKB) is the central access point for extensive curated protein information, including function, classification, and cross-references
UniProtKB comprises two sections:
UniProtKB/Swiss-Prot (1986)
• Swiss-Prot, a curated protein sequence data bank, contains not only sequence data but also annotation relevant to a particular sequence
• The annotation added to each entry is done by a team of biologists and comes primarily from journal articles reporting the actual sequencing and sometimes the characterisation of the protein
• Review articles and collaboration with external experts also play a role, along with the use of secondary databases like PROSITE and Pfam and a variety of feature prediction methods
UniProtKB/TrEMBL
(Translation of the EMBL database)
• TrEMBL consists of entries derived from the translation of all coding sequences in the EMBL/GenBank/DDBJ nucleotide sequence databases, and also protein sequences extracted from the literature or submitted to UniProtKB/Swiss-Prot
• TrEMBL supplements Swiss-Prot
• Unlike Swiss-Prot entries, those in TrEMBL are awaiting manual annotation. However, rather than just representing basic sequence and source information, steps have been taken to add features and annotation automatically