
Lec (1) - Introduction

Bioinformatics is a computational field focused on the design and use of software tools for analyzing molecular biology data, aiming to predict biological processes in health and disease. It encompasses various analyses including sequence, structural, and functional analysis, utilizing data from DNA sequencing and gene expression. The Human Genome Project and advancements in sequencing technologies like Next-Generation Sequencing have significantly contributed to the field, allowing for extensive genomic and proteomic studies.

Uploaded by

Alkadafe

Introduction to Bioinformatics

Rashid A A Abbas
What is Bioinformatics?
• Bioinformatics is a computational branch of molecular biology.
• It covers the design, construction and use of software tools to generate, store, annotate, access and analyze data and information relating to molecular biology.
DATA
Goal
• The ultimate goal of bioinformatics is to be able to predict biological processes in health and disease, that is, to better understand a living cell and how it functions at the molecular level.
Tools

• Molecular Sequence Analysis
• Structural Analysis
• Functional Analysis
Sequence Analysis

a. Sequence Alignment
b. Sequence Database Searching
c. Motif and Pattern Discovery
d. Gene and Promoter Finding
e. Reconstruction of Evolutionary Relationships
f. ...
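
Item (a), sequence alignment, can be illustrated with a minimal sketch of global pairwise alignment scoring by dynamic programming (the Needleman-Wunsch approach). The function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration; real tools use substitution matrices and more elaborate gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best global alignment score of sequences a and b.

    Scoring values are illustrative assumptions, not a standard.
    """
    # prev[j] = best score for aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1] (match or mismatch)
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned with a gap
            left = curr[j - 1] + gap  # b[j-1] aligned with a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "ACT"))
```

Only the score is returned here; recovering the alignment itself would require keeping the full table and tracing back through it.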
Structural Analysis

Protein and nucleic acid structure:
• Analysis
• Comparison
• Classification
• Prediction
Functional Analysis

• Gene Expression Profiling
• Protein-Protein Interaction Prediction
• Protein Subcellular Localization Prediction
• Metabolic Pathway Reconstruction
• ...
Bioinformatics

Biology + Informatics
Biology
• Data Generation
Experimental data types include:
1. Sequences: Sanger sequencing, Next-Generation DNA Sequencing (NGS).
2. 3D Protein Structures: X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy.
3. Gene Expression Data: microarrays.
Biology

• Data Analysis
Alignment, homology, phylogenetic analysis.
• Data/Information Storage/Access
Data + Annotation = Information
Information can now be stored in databases that allow users easy and unrestricted access.
Informatics:

• Operating Systems: Windows and Linux.
• Programming: Python is currently the most popular programming language for bioinformatics.

Minimal programming skills would allow:
• The construction of small programs.
• The understanding of slightly larger programs.
• The ability to convey program specifications to a specialist.
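
As an idea of what a "small program" might look like, here is a minimal Python sketch with two common sequence utilities. The function names are hypothetical, and the code assumes unambiguous DNA made up only of A, C, G and T.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence.

    Assumes only A, C, G, T; any other character raises KeyError.
    """
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(complement[base] for base in reversed(seq.upper()))


if __name__ == "__main__":
    print(gc_content("ATGC"))
    print(reverse_complement("ATGC"))
```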
Informatics:

• Statistics: A basic understanding of statistics is just as vital when designing an experiment.
• Bioinformatics software commonly employs statistics to select the most probable answer from a set of many possible answers to a given question.
From DNA Sequence to Protein 3D Structure
Human Genome Project:

• The Human Genome Project (HGP) was a 13-year international effort (1990 to April 14, 2003) to sequence the 3 billion "letters" of human DNA.
• The HGP required that all human genome sequence information be freely and publicly available.
• The existing DNA sequences have been stored in databases available to anyone willing to exploit and analyze them.
• Dedicated databases house various data for model organisms, such as sequences of known and hypothetical genes and proteins (GenBank, NCBI).
• Other databases (Ensembl, http://www.ensembl.org) present additional data and annotation as well as powerful tools for visualizing and searching it.
The human genome contains only about 20,000 protein-coding genes.
The proportion of non-protein-coding DNA increases with the complexity of the organism, reaching > 98% in humans.
DNA Sequencing methods: from Sanger to NGS
DNA sequencing is the process of reading the nucleotides present in DNA, that is, determining the precise order of nucleotides within a DNA molecule.
• There are 2 main types of DNA sequencing technologies in use today:
• Sanger sequencing.
• Next-Generation Sequencing (NGS).
Sanger sequencing:

• A sequencing method developed by Fred Sanger in 1977. It involves copying single-stranded DNA in the presence of chemically modified nucleotides called dideoxynucleotides (ddNTPs).
• When incorporated at the 3' end of the growing chain, ddNTPs terminate the chain selectively at A, C, G, or T. The terminated chains are then resolved by capillary electrophoresis.
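
The chain-termination idea can be sketched as a toy simulation: every terminated chain is a prefix of the newly synthesized strand ending in a labelled ddNTP, so ordering the chains by length and reading the last base of each recovers the sequence. This is an idealized model under stated assumptions (exactly one fragment per length, no noise); the function names are hypothetical.

```python
def terminated_fragments(new_strand):
    """Idealized set of chain-terminated products.

    Assumes termination happens once at every position, so each prefix of
    the synthesized strand appears exactly once.
    """
    return [new_strand[:i] for i in range(1, len(new_strand) + 1)]


def read_sequence(fragments):
    """Recover the sequence from terminated fragments.

    Electrophoresis separates fragments by length; the labelled terminal
    base of each fragment, read from shortest to longest, spells the strand.
    """
    ordered = sorted(fragments, key=len)
    return "".join(fragment[-1] for fragment in ordered)


if __name__ == "__main__":
    fragments = terminated_fragments("GATTACA")
    print(read_sequence(list(reversed(fragments))))
```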
Next-generation sequencing (NGS)

• High-throughput DNA sequencing in which hundreds of thousands of DNA fragments are sequenced in parallel.
The four main advantages of NGS over classical Sanger sequencing:

• Speed
NGS is quicker than Sanger sequencing in two ways:
- The chemical reaction may be combined with signal detection, whereas in Sanger sequencing these are two separate processes.
- Only 1 read can be taken at a time in Sanger sequencing, whereas NGS is massively parallel.
• Cost
The human genome sequence cost $300M. Sequencing a human genome with Illumina technology approaches the expected $1,000.
• Sample size
NGS needs a significantly smaller starting amount of DNA/RNA.
• Accuracy
Each base is read more times than with Sanger sequencing, giving greater coverage, higher accuracy and sequence reliability (although individual reads are less accurate for NGS).
N.B.

Sequencing quality depends upon the average number of times each base in the genome is 'read' during the sequencing process.
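
This average is usually called coverage (or depth) and is commonly estimated as number of reads × read length / genome size. A minimal sketch follows; the function name and the example numbers are illustrative assumptions, not figures from the lecture.

```python
def average_coverage(num_reads, read_length, genome_size):
    """Estimate average coverage as (number of reads * read length) / genome size.

    All three quantities are in reads and bases; genome_size must be positive.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


if __name__ == "__main__":
    # Illustrative numbers only: 600 million reads of 150 bases
    # over a 3-billion-base genome.
    print(average_coverage(600_000_000, 150, 3_000_000_000))
```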
GENOMICS
• Genomics is the study of an organism's genome and the use of its genes.
It deals with the systematic use of genome information, associated with other data, to provide answers in biology, medicine, and industry.
PROTEOMICS

• Proteomics is the large-scale study of proteins, particularly their structures and functions.
• Proteomics is much more complicated than genomics. Most importantly, while the genome is a rather constant entity, the proteome differs from cell to cell and is constantly changing through its biochemical interactions with the genome and the environment. One organism will have radically different protein expression in different parts of its body, at different stages of its life cycle and under different environmental conditions.
IMMUNOINFORMATICS

• Immunoinformatics is a field of science that encompasses high-throughput genomic and bioinformatics approaches to immunology.
Others :

NEUROINFORMATICS
CHEMOINFORMATICS
Glossary
Term            Definition

Genome          The entirety of an individual's genetic material, including coding and non-coding regions.

Exome           The part of the genome that codes for proteins. The exome constitutes 1-2% of the human genome.

Gene            The entire nucleic acid sequence that is necessary for the synthesis of a functional polypeptide or RNA.

Genetic code    A dictionary that identifies the correspondence between a sequence of nucleotide bases and a sequence of specific amino acids.
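
The "dictionary" description of the genetic code maps directly onto a Python dict. The sketch below builds the standard codon table (DNA alphabet, `*` for stop codons) and translates a coding sequence read in frame from its first base. The names `CODON_TABLE` and `translate` are hypothetical, and the code assumes unambiguous A, C, G, T input.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA in frame from position 0, stopping at the first stop codon.

    Trailing bases that do not form a complete codon are ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
```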
Term                                    Definition

Reference sequence/genome               An assembled version of a genome that can be used to make comparisons to the genomes of other individuals.

Nucleotide                              A nucleotide is composed of a DNA base, a phosphate and a pentose sugar. It forms the basic unit of DNA.

Single nucleotide polymorphism (SNP)    A single base substitution occurring at high frequency (more than 1%) in the general population. SNPs are the most common type of variation in the human genome.
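
The last two glossary entries can be tied together in a small sketch: compare a sample against a reference sequence to list single-base substitutions, then apply the 1% frequency threshold from the SNP definition. The function names are hypothetical, and the code assumes the two sequences are already aligned and of equal length; a substitution seen in one sample only counts as a SNP if its population frequency exceeds the threshold.

```python
def substitutions(reference, sample):
    """List single-base differences as (1-based position, reference base, sample base).

    Assumes reference and sample are already aligned and have equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have equal length")
    return [
        (position, ref_base, alt_base)
        for position, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


def is_snp(alt_count, total_alleles, threshold=0.01):
    """True if the variant allele frequency exceeds the threshold (1% by default)."""
    return alt_count / total_alleles > threshold


if __name__ == "__main__":
    print(substitutions("GATTACA", "GACTACA"))
    print(is_snp(5, 200))
```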
