Pairwise sequence Alignment

                    Dr Avril Coghlan
                   alc@sanger.ac.uk

Note: this talk contains animations which can only be seen by
downloading and using ‘View Slide show’ in Powerpoint
Sequence comparison
• How can we compare the human & Drosophila
  melanogaster Eyeless protein sequences?
  One method is a dotplot
• A dotplot is a graphical (visual) approach
  Regions of local similarity between the 2 sequences appear as diagonal
       lines of coloured cells (‘dots’)
  [Figure: dotplot comparing the fruitfly Eyeless and human Eyeless proteins; window size = 10, threshold = 5]
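A minimal sketch of how a dotplot with a window size and threshold could be computed. It assumes a common definition that the slide does not spell out: cell (i, j) is coloured when the windows starting at position i of one sequence and position j of the other have at least `threshold` identical letters. The function names `dotplot` and `render` are hypothetical, and the short example sequences and the smaller window (4) and threshold (3) are chosen only to keep the output readable.

```python
def dotplot(seq_a, seq_b, window=10, threshold=5):
    """Return the set of (i, j) cells to colour.

    Assumed rule: colour (i, j) when the windows starting at seq_a[i]
    and seq_b[j] have at least `threshold` identical positions.
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            window_a = seq_a[i:i + window]
            window_b = seq_b[j:j + window]
            matches = sum(a == b for a, b in zip(window_a, window_b))
            if matches >= threshold:
                dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots, window):
    """Draw the dotplot as text: '*' for a coloured cell, '.' otherwise."""
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = "".join(
            "*" if (i, j) in dots else "."
            for j in range(len(seq_b) - window + 1)
        )
        rows.append(row)
    return "\n".join(rows)


# Illustrative run on two short, similar sequences.
dots = dotplot("QKGSYPVRSTC", "QKGSGPVRSTC", window=4, threshold=3)
print(render("QKGSYPVRSTC", "QKGSGPVRSTC", dots, window=4))
```

Because the two example sequences differ at a single position, the coloured cells fall on the main diagonal, which is the diagonal line of 'dots' described above.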
Sequence alignment
• A second method for comparing sequences is a
  sequence alignment
• An alignment is an arrangement in columns of 2
  sequences, highlighting their similarity
  The sequences are padded with gaps (dashes) so that wherever
  possible, alignment columns contain identical letters from the   two
  sequences involved
  An insertion or deletion is represented by ‘–’ (a gap)
  The symbol “|” is used to represent matches
  eg. here is an alignment for amino acid sequences
  “QKGSYPVRSTC” & “QKGSGPVRSTC”:

            Q K G S Y P V R S T C
            | | | |   | | | | | |
            Q K G S G P V R S T C
            1 2 3 4 5 6 7 8 9 10 11

  This alignment has 11 columns
  There are 10 matches
  There is 1 mismatch
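The counts for this example can be checked with a short sketch. It assumes an alignment is stored as two equal-length strings in which '-' marks a gap; the helper names `match_line` and `summarise` are hypothetical.

```python
def match_line(row1, row2):
    """Build the '|' line: a bar where both rows hold the same letter."""
    return "".join(
        "|" if a == b and a != "-" else " "
        for a, b in zip(row1, row2)
    )


def summarise(row1, row2):
    """Count columns, matches, mismatches and gap columns in an alignment."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same number of columns")
    counts = {"columns": len(row1), "matches": 0, "mismatches": 0, "gaps": 0}
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            counts["gaps"] += 1
        elif a == b:
            counts["matches"] += 1
        else:
            counts["mismatches"] += 1
    return counts


row1 = "QKGSYPVRSTC"
row2 = "QKGSGPVRSTC"
summary = summarise(row1, row2)
print(row1)
print(match_line(row1, row2))
print(row2)
print(summary)
```

For the two sequences above this reports 11 columns, 10 matches, 1 mismatch and no gap columns.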
Sequence alignment
• An alignment of the human and fruitfly
  (Drosophila melanogaster) Eyeless proteins:
What does an alignment mean?
• An alignment tells you what mutations
  occurred in the sequences since the sequences
  shared a common ancestor
  eg. an alignment of the human & fruitfly Eyeless suggests:
  (i) there were probably deletion(s) at the start of the human
  Eyeless, or insertion(s) at the start of fruitfly Eyeless




  (ii) there was probably a G→N substitution in human Eyeless, or an N→G
         substitution in fruitfly Eyeless (see arrow)
How do we make an alignment?
• Given two or more sequences, what is the best way
  to align them to each other?
  We want the alignment columns to contain identical letters
• Comparison of similar sequences of similar length is
  straightforward
  eg. for amino acid sequences “QKGSYPVRSTC” & “QKGSGPVRSTC”, we
       line up the identical letters in columns:

               Q K G S Y P V R S T C            sequence 1
               | | | |   | | | | | |
               Q K G S G P V R S T C            sequence 2

  The alignment implies that one mutation occurred since the two
  sequences shared a common ancestor
  That is, the alignment implies there was a G→Y substitution in
  sequence 1 or a Y→G substitution in sequence 2
Problem
• Are there other possible plausible alignments for
  sequences “QKGSYPVRSTC” & “QKGSGPVRSTC”?
Answer
• Are there other possible plausible alignments for
  sequences “QKGSYPVRSTC” & “QKGSGPVRSTC”?
  There are many other possible alignments, eg. :

  Q K G S Y - P V R S T C
  | | |       | | | | | |
  Q K G - S G P V R S T C

  Q K G S - Y P V R S T C
  | | | |       | | | | |
  Q K G S G P - V R S T C

  Q K G - - - - - S Y P V R S T C
  | | |           |           | |
  Q K G S G P V R S - - - - - T C

  Q K - G S Y P V R S T C
  | |                   |
  Q K G S G P V R S T - C

  etc. etc. etc. . . .
Number of possible pairwise alignments
• There are lots of different possible alignments for
  two sequences that are both of length n
  The number of possible alignments of 2 seqs of length n letters (amino
  acids/nucleotides) is $\binom{2n}{n}$ (“2n choose n”)
  $\binom{2n}{n}$ can be calculated as
      $\binom{2n}{n} = \frac{(2n)!}{n! \, n!}$
  where n! (‘n factorial’) = n * (n - 1) * (n - 2) * (n - 3) * ... * 3 * 2 * 1
• For example, for “QKGSYPVRSTC” &
  “QKGSGPVRSTC”, n (length) = 11 letters
  The number of possible alignments of these two sequences is
  $\binom{2 \times 11}{11} = \binom{22}{11} = \frac{(2 \times 11)!}{11! \times 11!} = \frac{22!}{39916800 \times 39916800}$

  = 1.124001e+21 / 1.593351e+15 = 705,432 possible alignments
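The arithmetic above can be reproduced with exact integers. This is a small sketch using Python's standard `math` module; the function name `count_alignments` is hypothetical, and it simply evaluates the "2n choose n" formula given on this slide.

```python
import math


def count_alignments(n):
    """Evaluate '2n choose n', the slide's count for two sequences of length n."""
    return math.comb(2 * n, n)


n = len("QKGSYPVRSTC")  # 11 letters
via_comb = count_alignments(n)

# The same value from the factorial formula (2n)! / (n! * n!),
# using integer division so no floating-point rounding is involved.
via_factorials = math.factorial(2 * n) // (math.factorial(n) * math.factorial(n))

print(math.factorial(n))   # 39916800
print(via_comb)            # 705432
print(via_factorials)      # 705432
```

Working with integers avoids the rounding visible in the scientific-notation step (1.124001e+21 / 1.593351e+15) while giving the same answer.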
Number of possible pairwise alignments
• Even for relatively short sequences, $\binom{2n}{n}$ is large, so
  there are lots of possible alignments
  eg. for two sequences that are both 11 letters long, there are
  705,432 possible alignments
• In fact, the number of possible alignments, $\binom{2n}{n}$,
  increases exponentially with the sequence length (n)
  ie. $\binom{2n}{n}$ grows roughly like $2^{2n}$ (more precisely, it is approximately $2^{2n} / \sqrt{\pi n}$)

  [Figure: number of possible alignments (vertical axis) plotted against length of sequences, n (horizontal axis)]
  For two sequences that are both 17 letters long (n=17), there are about 2.3 billion
  possible alignments
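To get a feel for the growth, the sketch below tabulates "2n choose n" for a few lengths next to the standard Stirling-based estimate 2^(2n) / sqrt(pi * n). The estimate is a well-known approximation rather than something stated on the slide, and the function name `growth_table` is hypothetical.

```python
import math


def growth_table(lengths):
    """Return (n, exact, estimate) rows for '2n choose n'."""
    rows = []
    for n in lengths:
        exact = math.comb(2 * n, n)
        # Stirling-based estimate: 2^(2n) / sqrt(pi * n)
        estimate = 4 ** n / math.sqrt(math.pi * n)
        rows.append((n, exact, estimate))
    return rows


table = growth_table([5, 11, 17])
for n, exact, estimate in table:
    print(f"n={n:2d}  exact={exact:>13,d}  estimate={estimate:>16,.0f}")
```

For n = 17 the exact value is 2,333,606,220, which is the "2.3 billion" quoted above. The estimate slightly overshoots the exact count, and the relative gap shrinks as n grows.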
• Many of the possible alignments for 2 seqs are
  implausible as they imply many mutations occurred
  (but it’s known mutations are rare)
  eg. for amino acid sequences “QKGSYPVRSTC” & “QKGSGPVRSTC”, the
        alignment made by lining the identical letters into columns only
        implies one mutation:
  Q K G S Y P V R S T C
  | | | |   | | | | | |
  Q K G S G P V R S T C

  This alignment implies that 1 G→Y or Y→G substitution occurred

  Many of the alternative alignments for these two sequences imply
  that many more mutations occurred, eg. :

  Q K G S Y - P V R S T C
  | | |       | | | | | |
  Q K G - S G P V R S T C

  This alignment implies that 1 S→Y or Y→S substitution occurred;
  that 1 insertion of S or deletion of S occurred;
  and that 1 deletion of G or insertion of G occurred
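The mutations implied by an alignment can be listed column by column. The sketch below assumes an alignment is stored as two equal-length strings with '-' for gaps, and it treats every gap column as a separate insertion/deletion event, which is a simplification. The function name `implied_mutations` is hypothetical.

```python
def implied_mutations(row1, row2):
    """List the events implied by each non-matching alignment column.

    Returns (column, kind, letters) tuples, with columns numbered from 1.
    Each gap column is counted as its own insertion/deletion (a simplification).
    """
    events = []
    for column, (a, b) in enumerate(zip(row1, row2), start=1):
        if a == b:
            continue  # identical letters: no mutation implied
        if a == "-":
            events.append((column, "insertion or deletion", b))
        elif b == "-":
            events.append((column, "insertion or deletion", a))
        else:
            events.append((column, "substitution", f"{a}/{b}"))
    return events


ungapped = implied_mutations("QKGSYPVRSTC", "QKGSGPVRSTC")
gapped = implied_mutations("QKGSY-PVRSTC", "QKG-SGPVRSTC")
print(len(ungapped), ungapped)
print(len(gapped), gapped)
```

The ungapped alignment implies a single substitution, while the gapped alternative implies three events, so the ungapped one is the more plausible of the two if mutations are rare.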
Further Reading
•   Chapter 3 in Introduction to Computational Genomics by Cristianini & Hahn
•   Practical on pairwise alignment in R in the Little Book of R for
    Bioinformatics:
    https://a-little-book-of-r-for-bioinformatics.readthedocs.org/en/latest/src/chapter4.html

Editor's Notes

  • #4: Made
  • #5: Made alignment of human.fa and fly.fa using Needleman-Wunsch with default parameters at:
    http://emboss.bioinformatics.nl/cgi-bin/emboss/needle (EMBOSS needle)
    Human Eyeless (PAX6) from: http://www.treefam.org/cgi-bin/TFseq.pl?id=ENST00000379111.1
    D. melanogaster Eyeless from: http://www.treefam.org/cgi-bin/TFseq.pl?id=FBtr0100396.5
    Viewed in Jalview, and saved as humanfly_needlemanwunsch.png
  • #6: Made
  • #7: Made
  • #10: In R: factorial(22) / (factorial(11) * factorial(11))
  • #11: N.B. (2n choose n) = the binomial coefficient = the number of ways that n things can be 'chosen' from a set of 2n things = (2n)! / (n! * n!). This can be shown to grow roughly like 2^(2*n) (Deonier, Tavaré & Waterman book, pages 158-9). Graph made using Wolfram Alpha at http://www.wolframalpha.com/ and typing “plot 2n choose n from 1 to 20”.
  • #12: Made