Lecture - 02 - Comparative Sequence Analysis

Comparative Sequence Analysis

Department of Life Sciences, SBASSE, LUMS

Genome to Gene

Heredity Unit

Latest on Genome Sequencing
• Human Genome Project (1990 – 2003)

Now!

Our Genome and Need for Comparative Genomics
• Number of bases: 3.2 billion bases

• Number of chromosomes: 23 pairs

• Percentage of genes: Only about 1-2% of the genome codes for proteins

• Protein-coding Gene Number: 20,000 - 25,000

• Average gene size: ~ 3000 bases & huge variation

• Largest known human gene consists of 2.4 million bases (dystrophin)

• Repetition: Almost 45-50% of the DNA is repetitive

• Similarity between individuals: Almost all (99.9%) nucleotide bases are exactly the same in all people
Proteome to Protein
Genes: ~30,000 (the estimate used in this calculation)

Alternative splicing: 2 - 3 variants per gene

3 x 30,000 = 90,000 proteins

Post-translational modifications: ~10 forms per protein

10 x 90,000 = 900,000 proteins

Peng and Gygi, JMS 2001

Asa Wheelock
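The multiplication on this slide can be written out as a short calculation. The sketch below only restates the slide's rough figures (30,000 genes, up to 3 splice variants per gene, about 10 modified forms per protein); the variable names are illustrative and the numbers are order-of-magnitude estimates, not measurements.

```python
# Back-of-the-envelope proteome size, using the rough figures from the slide.
genes = 30_000                 # gene count used on the slide
splice_variants_per_gene = 3   # upper end of the "2 - 3 per gene" range
ptm_forms_per_protein = 10     # assumed average number of modified forms

proteins_after_splicing = genes * splice_variants_per_gene
proteins_after_ptm = proteins_after_splicing * ptm_forms_per_protein

print(proteins_after_splicing)  # 90000
print(proteins_after_ptm)       # 900000
```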
Need for Comparative Proteomics
• Number of reported proteins: 150 million and counting

Benefits of Comparative Genomics
• Comparison of whole genome sequences provides a highly detailed view of how organisms are related to each other at the genetic level

• Comparative genomics also provides a powerful tool for studying evolutionary changes among organisms

• Helps to identify genes that are conserved or common among species, as well as genes that give each organism its unique characteristics
Fly vs. Humans
Comparison of the fruit fly genome with the human genome:

• about 75% of genes are conserved

• the two organisms appear to share a core set of genes

• two-thirds of human genes known to be involved in cancer have counterparts in the fruit fly
Evolutionary Relationship

COV2

http://bacterialphylogeny.info/overview.html

What have we done and what’s next?
DONE: Gene and Protein Sequences
• GenBank (DNA Sequences)
• UniProt (Protein Sequences)
• GeneMark (Gene Prediction)

NEXT: Sequence & Structure Analysis

• BLAST (nucleotide, protein)
• PDB
• I-TASSER
From Sequences to Comparisons
• Problem: If we sequence a new gene or protein, can we compare it with the existing information in GenBank or UniProt?

• Idea: Compare NOVEL sequences with KNOWN (previously characterized) genes or proteins.

• Benefit: STRUCTURAL, FUNCTIONAL and EVOLUTIONARY information can be inferred from WELL DESIGNED comparisons.

• The most common tool used is called BLAST.
BLAST?
• Basic Local Alignment Search Tool

• A method for rapid searching of sequence databases, for both nucleotides and proteins.

• The BLAST algorithm reports local alignments, which can cover entire sequences when they are similar end to end, so it also detects regions of similarity embedded in otherwise unrelated proteins.

• Uses statistical theory to determine if a match might have occurred by chance.
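The statistical theory referred to here is usually summarized by the Karlin-Altschul expect value: the number of alignments with raw score at least S expected by chance is E = K * m * n * exp(-lambda * S), where m is the query length and n is the database length. The sketch below is a minimal illustration, not BLAST's own code. The function names are hypothetical, and the lambda and K values are commonly quoted ungapped BLOSUM62 parameters used here only as assumptions.

```python
import math

def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to a normalized bit score."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def expect_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)

# Assumed, illustrative parameters (not read from any BLAST installation).
LAMBDA, K = 0.3176, 0.134

e_low = expect_value(30, 250, 1_000_000, LAMBDA, K)    # weak hit
e_high = expect_value(80, 250, 1_000_000, LAMBDA, K)   # strong hit
print(e_low, e_high)
```

A higher raw score gives a smaller E-value, which is why low E-values are read as matches unlikely to have occurred by chance.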
https://blast.ncbi.nlm.nih.gov/Blast.cgi
BLAST - Workflow
1. BLAST searches the database sequences using “Dynamic Programming” only on “promising” sequences.

2. Promising sequences are found by indexing short words so that perfectly matching sub-strings can be looked up very quickly. A suffix tree is one fast structure for this kind of search; NCBI BLAST itself uses a lookup (hash) table of words, as in step 4.

3. BLAST creates a list of all “words” (short subsequences) that reach a certain “threshold” score when compared with the query sequence. Words are short runs of consecutive residues: typically 3 amino acids for proteins and 11 or more nucleotides for DNA (the web interface offers nucleotide word sizes up to 256).

4. A lookup hash table is made of all such words and “neighboring” words derived from the query sequence (rather than just random words).

5. When a BLAST search is run, candidate sequences from the database are picked based on perfect matches to small sub-sequences in the query sequence.
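The seeding idea in this workflow can be sketched as a word lookup table followed by a scan of a database sequence. This is a simplified illustration under stated assumptions: it indexes exact query words only (no neighboring words), uses a plain Python dictionary as the hash table, and the function names are hypothetical rather than part of any BLAST API.

```python
from collections import defaultdict

def build_word_table(query, k=3):
    """Map every k-letter word in the query to the positions where it starts."""
    table = defaultdict(list)
    for i in range(len(query) - k + 1):
        table[query[i:i + k]].append(i)
    return table

def find_seed_hits(table, subject, k=3):
    """Scan a database sequence and return (query_pos, subject_pos) word hits."""
    hits = []
    for j in range(len(subject) - k + 1):
        for i in table.get(subject[j:j + k], []):
            hits.append((i, j))
    return hits

table = build_word_table("METPQGIAV")
print(find_seed_hits(table, "PQGELV"))  # [(3, 0)]
```

Each hit is only a candidate starting point; the expensive alignment step is then run around these positions instead of across the whole database.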
BLOSUM62 Match/Mismatch Matrix

• Here the word is PQG, and neighboring words are everything with a score of at least 13 (for three letters) as calculated by the given scoring system (e.g., BLOSUM62).
• Scores come from BLOSUM; T is the user-provided threshold!
• PSG is a neighboring word (score 13), PQA is not (score 12).
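The neighboring-word rule can be checked with a few lines of code. The sketch below is an illustration under a loud assumption: to stay short it carries only a six-residue subset of BLOSUM62 (A, E, G, P, Q, S), so it enumerates neighbors within that reduced alphabet; with all 20 amino acids more neighbors of PQG would appear. The function names are hypothetical.

```python
from itertools import product

# Subset of BLOSUM62 for six residues only (symmetric; each pair listed once).
_PAIRS = {
    "AA": 4, "AE": -1, "AG": 0, "AP": -1, "AQ": -1, "AS": 1,
    "EE": 5, "EG": -2, "EP": -1, "EQ": 2, "ES": 0,
    "GG": 6, "GP": -2, "GQ": -2, "GS": 0,
    "PP": 7, "PQ": -1, "PS": -1,
    "QQ": 5, "QS": 0,
    "SS": 4,
}
ALPHABET = "AEGPQS"  # reduced alphabet, an assumption made for brevity

def pair_score(a, b):
    """Look up a substitution score regardless of residue order."""
    return _PAIRS["".join(sorted((a, b)))]

def word_score(w1, w2):
    """Sum the position-by-position substitution scores of two words."""
    return sum(pair_score(a, b) for a, b in zip(w1, w2))

def neighborhood(word, threshold):
    """Return every word over ALPHABET scoring at least threshold against word."""
    found = {}
    for letters in product(ALPHABET, repeat=len(word)):
        candidate = "".join(letters)
        score = word_score(word, candidate)
        if score >= threshold:
            found[candidate] = score
    return found

print(neighborhood("PQG", 13))    # {'PEG': 15, 'PQG': 18, 'PSG': 13}
print(word_score("PQG", "PQA"))   # 12, below the threshold
```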

Example BLAST search method
Query sequence: PQGELV

• Make a list of all possible k-mer words (length 3 for proteins) and score each word against itself with BLOSUM62:

PQG (score 18)
QGE (score 16)
GEL (score 15)
ELV (score 13)

• Keep the words with score >= 13

• In total we get: PQG, QGE, GEL & ELV
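The word list and scores above can be reproduced with a short script. This is a minimal sketch: it hard-codes only the BLOSUM62 diagonal (self-match) values for the six residues in the example query, and the helper names are hypothetical.

```python
# BLOSUM62 diagonal (self-match) scores for the residues in the example query.
SELF_SCORE = {"P": 7, "Q": 5, "G": 6, "E": 5, "L": 4, "V": 4}

def kmers(sequence, k=3):
    """Return all overlapping words of length k, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def self_score(word):
    """Score a word against itself by summing the diagonal matrix entries."""
    return sum(SELF_SCORE[residue] for residue in word)

scored = [(word, self_score(word)) for word in kmers("PQGELV")]
kept = [word for word, score in scored if score >= 13]

print(scored)  # [('PQG', 18), ('QGE', 16), ('GEL', 15), ('ELV', 13)]
print(kept)    # ['PQG', 'QGE', 'GEL', 'ELV']
```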

Example BLAST search method (continued)
• Make k-mers (word size 3) of all sequences in the database
• Store them in a fast index for identical matches (e.g., a suffix tree or a hash-based lookup table)

• Find all database sequences that have at least 2 matches among our words
• e.g., PQG, GEL & the neighboring word PEG

• Find a database hit and extend the alignment into a High-scoring Segment Pair (HSP):

Query:    M E T P Q G I A V
Database: - - - P Q G E L V
Score:          8 5 5 2 0 8

• HSP: PQGI (score 8+5+5+2 = 20)

• If 2 HSPs in the query sequence are < 40 positions apart, run a full alignment on the query and hit sequences
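Extending a seed hit into an HSP can be sketched as an ungapped extension that stops once the running score falls too far below the best score seen. The sketch below makes several assumptions that are not on the slide: it uses a toy scoring scheme (match +5, mismatch -4) instead of the per-position scores shown above, it extends to the right only, and the function name and `x_drop` parameter are illustrative.

```python
def extend_right(query, subject, q_start, s_start, seed_len, x_drop,
                 match=5, mismatch=-4):
    """Extend an exact seed to the right without gaps.

    Returns (best_score, best_length) for the highest-scoring extension.
    """
    score = best = seed_len * match   # the seed itself is an exact match
    length = best_len = seed_len
    while q_start + length < len(query) and s_start + length < len(subject):
        same = query[q_start + length] == subject[s_start + length]
        score += match if same else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break  # score has dropped too far below the best seen so far
    return best, best_len

best, length = extend_right("METPQGIAV", "PQGELV", 3, 0, seed_len=3, x_drop=5)
print(best, "METPQGIAV"[3:3 + length])  # 15 PQG
```

Under this toy scheme the two mismatches after PQG end the extension, so the HSP stays at the seed; a different scoring system, such as the one on the slide, can give a longer HSP.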
Advantages of BLAST
• The BLAST algorithm was written balancing speed and
increased sensitivity for finding distant sequence relationships.
• Speed is achieved by:
1. Pre-indexing the database before the search
2. Parallel processing
3. Hash table that contains neighborhood words rather than just random words.

• BLAST emphasizes regions of local alignment to detect relationships among sequences having isolated regions of similarity between them.
BLAST for Nucleotides and Proteins
• Nucleotides
• blastn
• Compares a nucleotide query sequence against a nucleotide sequence
database.

• Proteins
• blastp
• Compares an amino acid query sequence against a protein sequence
database.

Comparing an unknown nucleotide sequence with possible “protein” sequences!!
• blastx
> but what about the 6 possible reading frames?

• Compares a nucleotide query sequence translated in all six reading frames against a protein sequence database.

• This option may be used to find potential translation products of an unknown nucleotide sequence.
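The six reading frames are three offsets on the forward strand plus three on the reverse complement. The sketch below shows one way to produce them with the standard genetic code; it illustrates the idea rather than blastx's own implementation, and the function names and the frame labels (`+1` to `-3`) are assumptions made for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna):
    """Translate complete codons; '*' marks a stop, 'X' an unrecognized codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frame_translations(dna):
    """Translate a DNA sequence in all three frames on both strands."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames

print(six_frame_translations("ATGGCC"))
# {'+1': 'MA', '+2': 'W', '+3': 'G', '-1': 'GH', '-2': 'A', '-3': 'P'}
```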
How about the reverse of blastx?
• tblastn

• Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
Comparing all translated reading frames of a nucleotide sequence with all translated reading frames of a nucleotide DB
• tblastx

• Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
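The five programs differ only in the type of query, the type of database, and whether translation is applied. The lookup below is a small summary sketch of the slides; `choose_program` and its `translate_both` flag are hypothetical helpers for illustration, not part of any BLAST command-line tool.

```python
# (query type, database type) -> BLAST program, as described on the slides.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",    # query translated in six frames
    ("protein", "nucleotide"): "tblastn",   # database translated in six frames
}

def choose_program(query_type, db_type, translate_both=False):
    """Pick a BLAST program name from the query and database types."""
    if translate_both:
        if (query_type, db_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and a nucleotide database")
        return "tblastx"
    return BLAST_PROGRAMS[(query_type, db_type)]

print(choose_program("nucleotide", "protein"))                          # blastx
print(choose_program("nucleotide", "nucleotide", translate_both=True))  # tblastx
```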
Getting started with BLAST
http://www.ncbi.nlm.nih.gov/
http://www.ncbi.nlm.nih.gov/BLAST/
and
http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html
So what if we find the Alien Gene in GenBank?
• Homologs
• Features (including DNA and protein sequences) in species being compared that are similar because they are ancestrally related

• Homologs can be either Orthologs or Paralogs

• Orthologs
• Homologous genes (or any DNA sequences) that separated because of a speciation event
• Derived from the same gene in the last common ancestor

• Paralogs
• Homologous genes that separated because of gene duplication events within the same species
