Bs982 l08 Basic Blast

The document provides an introduction to the Basic Local Alignment Search Tool (BLAST) by outlining why BLAST is used, how BLAST works, and how to run BLAST searches. BLAST is a tool that helps find similar sequences to a query sequence in a database. It works by breaking the query into pieces and searching for matches in the database, then extending matches. The document guides users on choosing a query sequence, BLAST program, database, and other parameters to run their own BLAST search.

Uploaded by

Narges Miri

BS982

Welcome to BLAST
Basic Local Alignment Search Tool
Ben Skinner

School of Life Sciences • University of Essex

Outline

• Why BLAST?

• How BLAST works

• Running BLAST for yourself

Why BLAST?

• Falling cost of sequencing

• Growing size of databases - 1979 Los Alamos Sequence Database (became GenBank)

https://www.ncbi.nlm.nih.gov/genbank/statistics/
Why BLAST?

• Imagine – you have sequenced something from an environmental sample

• Has anyone seen this before? Is it new?

Extract DNA

Your Sequence

DNA:
ATGTGATCCGACTATGACA….

PROTEIN:
MRPILMKGHERPLTFLRYNRDG….

https://www.universe-review.ca/I11-50-DNAsequencing1.jpg
Why BLAST?

• Or you may have a protein/DNA sequence from a database: NCBI/EMBL/SwissProt/UniProt
• What else is it similar to?

Your Sequence

DNA:
ATGTGATCCGACTATGACA….

PROTEIN:
MRPILMKGHERPLTFLRYNRDG….
Why BLAST?

BLAST is a tool that helps us find similar sequences to a query sequence in a database

BLAST can search for matches to DNA or protein queries
Why BLAST?

• Usually the first thing you do when you obtain a new, unknown sequence
• Provides biological context and helps establish
hypotheses to be tested (by experimentation or
further bioinformatic analysis)
• Part of our lexicon: “To Blast a sequence” as in
“To Google a question”
• Emphasizes speed over sensitivity
• Quick and simple to use, but very often used
sub-optimally
BLAST can be used for different purposes

• Looking for species. If you are sequencing DNA from an unknown species, BLAST can identify the species or its closest sequenced relatives.
• Looking for domains. If you BLAST a protein sequence (or a translated nucleotide sequence), you can identify known domains.
• Looking at phylogeny. You can use the BLAST web pages to generate a phylogenetic tree from the BLAST results.
• Mapping DNA to a known chromosome. If you have sequenced a gene from a known species but have no idea of its chromosome location, a BLAST search against that species' genome can place it.
• Annotations. BLAST can also be used to map annotations from one organism to another, or to look for common genes in two related species.
Important concepts

• Similarity: the degree of likeness between two sequences, usually expressed as the percentage of similar (or identical) residues over a given length of the alignment. It can usually be calculated easily.
• Homology: a statement that two sequences share common evolutionary ancestry. It is a hypothesis, not a measurement.
• A high degree of similarity implies a high probability of homology
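
Similarity, unlike homology, is something you can compute directly. The following is a minimal sketch of percent identity over an existing alignment; the function name and the toy alignment are made up for illustration, and real tools may define the denominator differently (for example, excluding gap columns).

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns with the same residue in both rows.

    Both strings must already be aligned: equal length, '-' marks a gap.
    Columns where both rows hold a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    # A gap never counts as a match, even against another gap.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Toy alignment (invented): 8 identical columns out of 10.
print(percent_identity("ATGTGA-CCG", "ATGTGATCAG"))  # 80.0
```

A high value here supports, but does not prove, the hypothesis that the two sequences are homologous.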
Homologues: genes with a common ancestor

Remember the Hox genes from the Eukaryotic genomes lecture?
Conservation of sequence implies conservation of function

Key to finding important regions: functional protein domains

https://ars.els-cdn.com/content/image/1-s2.0-S2001037015000070-gr3.jpg
Searching for similarity and homology
Outline

• Why BLAST?

• How BLAST works

• Running BLAST for yourself

Searching for similarity

• An important goal of genomics is to determine whether a particular sequence is “like” another sequence
• Compare new sequences with sequences already stored in a database
• Two alignment types: global and local
Global alignment

• Compares one whole sequence with another whole sequence (end-to-end alignment)
• Suitable for aligning closely related sequences, for example two genes with the same function in human and mouse

See lecture on sequence alignment for details

Needleman-Wunsch algorithm
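
As a rough illustration of global alignment, here is a minimal sketch of the Needleman-Wunsch scoring recurrence. It returns only the optimal end-to-end score, not the alignment itself, and the match, mismatch, and gap values are arbitrary assumptions rather than values from the lecture.

```python
def needleman_wunsch_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Optimal global (end-to-end) alignment score with a linear gap penalty."""
    # Row 0: aligning an empty prefix of `a` against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against an empty prefix of `b`.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diagonal, up, left))
        prev = curr
    # Bottom-right cell: both sequences fully consumed.
    return prev[-1]


print(needleman_wunsch_score("GATTACA", "GATTACA"))  # 7
print(needleman_wunsch_score("GATTACA", "GATACA"))   # 4 (six matches, one gap)
```

Because every residue of both sequences must be placed, unrelated flanking sequence drags the score down; that is the motivation for local alignment.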
Local alignment

• Aligns a subsequence of one sequence with a subsequence of another
• Reveals regions that are highly similar, but does not necessarily provide a comparison across the whole of both sequences
• Finds conserved patterns in DNA sequences or conserved domains in two proteins
• May uncover regions of homology (related by descent) between otherwise diverse sequences

Smith-Waterman algorithm
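
A minimal sketch of the Smith-Waterman recurrence follows. It differs from the global case in two ways: cell scores are never allowed to fall below zero, and the answer is the best cell anywhere in the table. The scoring values are arbitrary assumptions, and only the score is returned.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -2) -> int:
    """Best local alignment score between any subsequences of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)  # first row is all zeros: no end-gap penalty
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment restart here instead of carrying a negative score.
            score = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# The shared core GATTACA is found despite unrelated flanking sequence.
print(smith_waterman_score("CCCCGATTACACCCC", "TTGATTACATT"))  # 14
```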
How does BLAST work? Basic overview

The original BLAST program (Altschul et al. 1990, J Mol Biol 215:403)

• The query sequence is broken into words of length W
• Each word is compared with the sequences in the database
• Each word match is scored using a substitution matrix
• Word matches scoring below the neighbourhood score threshold T are discarded
• Each remaining match is extended in both directions until the running score drops a set amount below the best score seen so far
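
The seed-and-extend idea can be sketched in a few lines. This is a deliberately simplified illustration, not the BLAST implementation: it uses exact word matches only (no substitution matrix or threshold T), ungapped extension, and a single subject sequence. The function names, scoring values, and the `x_drop` parameter are assumptions made for this example.

```python
from collections import defaultdict


def index_words(seq: str, w: int) -> dict:
    """Map every word of length w in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - w + 1):
        index[seq[i:i + w]].append(i)
    return index


def extend(query, subject, q, s, w, match=1, mismatch=-3, x_drop=5):
    """Ungapped extension of an exact seed at query[q:q+w] / subject[s:s+w].

    Each direction stops once the running score is more than x_drop below
    the best score seen. Returns (score, query_start, query_end, subject_start).
    """
    score = best = w * match
    best_right = k = 0
    while q + w + k < len(query) and s + w + k < len(subject):
        score += match if query[q + w + k] == subject[s + w + k] else mismatch
        k += 1
        if score > best:
            best, best_right = score, k
        elif best - score > x_drop:
            break
    score, best_left, k = best, 0, 0
    while q - k - 1 >= 0 and s - k - 1 >= 0:
        score += match if query[q - k - 1] == subject[s - k - 1] else mismatch
        k += 1
        if score > best:
            best, best_left = score, k
        elif best - score > x_drop:
            break
    return best, q - best_left, q + w + best_right, s - best_left


def seed_and_extend(query: str, subject: str, w: int = 4) -> list:
    """Find word hits, extend each one, and return the distinct segments found."""
    index = index_words(subject, w)
    segments = set()
    for q in range(len(query) - w + 1):
        for s in index.get(query[q:q + w], []):
            segments.add(extend(query, subject, q, s, w))
    return sorted(segments, reverse=True)


# Invented sequences sharing the 10-base region GATTACAGGC.
print(seed_and_extend("CCGATTACAGGCAA", "TTTTTGATTACAGGCCTTTTT"))
# [(10, 2, 12, 5)]
```

Indexing words first is what makes this fast: only positions with a word hit are ever extended, which is also why a match containing no exact word of length W is missed entirely (speed over sensitivity).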
Using BLAST: the different tools and databases

[Figure: BLAST as a query tool to retrieve homologous genes from a database. A query sequence is searched against a sequence database (target) to suggest a putative function.]

Kerfeld and Scott, PLoS Biology 2011 9(2): e1001014.

Using BLAST: https://blast.ncbi.nlm.nih.gov/Blast.cgi
Outline

• Why BLAST?

• How BLAST works (basics)

• Running BLAST
Basic search strategy

• (1) Choose your sequence (query)

• (2) Choose the BLAST program
• (3) Choose the database to search
• (4) Choose optional parameters

• Then click “BLAST”
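
The same four choices apply if you run the command-line BLAST+ programs instead of the web form. The sketch below only assembles a command; it assumes BLAST+ is installed, and the helper name `build_blast_command` and the file name `query.fasta` are hypothetical. The flags used (`-query`, `-db`, `-evalue`, `-outfmt`, `-remote`) are standard BLAST+ options.

```python
BLAST_PROGRAMS = {"blastn", "blastp", "blastx", "tblastn", "tblastx"}


def build_blast_command(program: str, query_fasta: str, database: str,
                        evalue: float = 1e-5, remote: bool = True) -> list:
    """Assemble a BLAST+ command line from the four choices in the search strategy."""
    if program not in BLAST_PROGRAMS:
        raise ValueError(f"unknown BLAST program: {program}")
    command = [
        program,                 # (2) the BLAST program
        "-query", query_fasta,   # (1) the query sequence, as a FASTA file
        "-db", database,         # (3) the database to search
        "-evalue", str(evalue),  # (4) an optional parameter: E-value cutoff
        "-outfmt", "6",          # tabular output, one line per hit
    ]
    if remote:
        command.append("-remote")  # search NCBI's servers, not a local database
    return command


print(" ".join(build_blast_command("blastp", "query.fasta", "nr")))
```

The resulting list can be passed to `subprocess.run(command, capture_output=True, text=True, check=True)` to actually run the search.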

Getting a query sequence

• We start by finding our query sequence in FASTA format

• e.g. from GenBank, Ensembl, or a paper

Amino acid sequence of a protein, in FASTA format:

>ribosomal protein L7/L12 [Thiomicrospira crunogena XCL-2]
MAITKDDILEAVANMSVMEVVELVEAMEEKFGVSAAAVAVAGPAGDAGAAGEEQTEFDVVLTGAGDNKVAAIKAVRGATG
LGLKEAKSAVESAPFTLKEGVSKEEAETLANELKEAGIEVEVK

Nucleotide sequence of a gene, in FASTA format:

>gi|118139508:333094-333465 Thiomicrospira crunogena XCL-2
ATGGCAATTACAAAAGACGATATTTTAGAAGCAGTTGCTAACATGTCAGTAATGGAAGTTGTTGAACTTGTTGAAGCAAT
GGAAGAGAAGTTTGGTGTTTCTGCAGCAGCAGTTGCGGTTGCAGGTCCTGCAGGTGATGCTGGCGCTGCTGGTGAAGAAC
AAACAGAGTTTGACGTTGTCTTGACTGGTGCTGGTGACAACAAAGTTGCAGCAATCAAAGCCGTTCGTGGCGCAACTGGT
CTTGGGCTTAAAGAAGCGAAAAGTGCAGTTGAAAGTGCACCATTTACGCTTAAAGAGGGTGTTTCTAAAGAAGAAGCAGA
AACTCTTGCAAATGAGCTTAAAGAAGCAGGTATTGAAGTCGAAGTTAAATAA
What is the FASTA format?

• 1985: FASTP program developed for fast alignment of protein sequences, FASTN for fast alignment of nucleotide sequences

• 1988: FASTA program for fast alignment of all sequence types (nucleotide or protein)
https://fasta.bioch.virginia.edu/fasta_www2/fasta_list2.shtml

• Had a format for input query sequences

• FASTA the program is rarely used now, largely replaced by BLAST, but the FASTA format lives on

“Description line” (not read as sequence data)
• Begins with >
• Ends with a new line

Sequence data (amino acid in this case)

>ribosomal protein L7/L12
MAITKDDILEAVANMSVMEVVELVEA
MEEKFGVSAAAVAVAGPAGDAGAA
GEEQTEFDVVLTGAGDNKVAAIKAVR
GATGLGLKEAKSAVESAPFTLKEG
VSKEEAETLANELKEAGIEVEVK
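
These two rules (a description line starting with ">", then sequence lines until the next ">") are enough to write a reader. Below is a minimal sketch using only the Python standard library; the function name `parse_fasta` is an assumption for this example, and in practice an established library reader is usually the better choice.

```python
def parse_fasta(text: str) -> list:
    """Return a list of (description, sequence) pairs from FASTA-formatted text."""
    records = []
    description, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines carry no information
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if description is not None:
                records.append((description, "".join(chunks)))
            description, chunks = line[1:].strip(), []
        elif description is not None:
            # Sequence data may be wrapped over several lines; join them.
            chunks.append(line)
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


example = """>ribosomal protein L7/L12 [Thiomicrospira crunogena XCL-2]
MAITKDDILEAVANMSVMEVVELVEAMEEKFGVSAAAVAVAGPAGDAGAAGEEQTEFDVVLTGAGDNKVAAIKAVRGATG
LGLKEAKSAVESAPFTLKEGVSKEEAETLANELKEAGIEVEVK
"""
for description, sequence in parse_fasta(example):
    print(description, len(sequence))
```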
What is the FASTA format?

• FASTA format files are one of the most common and universally used input files in
bioinformatics and sequence analysis

• Most bioinformatics software expects this format

• For more advanced users there is freely available code to read and write this format in bioinformatic pipelines, e.g. Biopython and many R packages.


• What type of alignment do you want?

Basic

• blastn: nucleotide query against a nucleotide database

• blastp: protein query against a protein database

More complex

• blastx: translated nucleotide query against a protein database

• tblastn: protein query against a translated nucleotide database

• tblastx: translated nucleotide query against a translated nucleotide database

https://blast.ncbi.nlm.nih.gov/Blast.cgi
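
“Translated” in blastx, tblastn, and tblastx means the nucleotide sequence is translated in all six reading frames (three per strand) before being compared at the protein level. A minimal sketch of that step follows, assuming the standard genetic code; the function names are made up for this example and ambiguous codons are simply reported as X.

```python
# Standard genetic code, with codons ordered by base in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons from the start of dna; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frames(dna: str) -> dict:
    """Translate all three frames of both strands, keyed '+1'..'+3', '-1'..'-3'."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# First 36 bases of the example gene give the start of the example protein.
print(translate("ATGGCAATTACAAAAGACGATATTTTAGAAGCAGTT"))  # MAITKDDILEAV
```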
NCBI BLAST interface (blastp: proteins)

[Screenshot: paste the FASTA-format sequence into the query box]
Other input options

• When using NCBI’s BLAST, you can also use a GenBank accession number instead of the FASTA sequence
• e.g. the accession for that ribosomal sequence is WP_011369709.1
Choose the database to search

• nr/nt = non-redundant protein (nr) and nucleotide (nt) databases (the most general databases)
• Human G+T = genomic and transcript database for humans
• dbEST = database of expressed sequence tags
• gss = genomic survey sequences

Protein databases

Nucleotide databases
BLASTn page
• Default = human
• Next = mouse
• Then = others
• Default others = nr/nt (includes many of those below, hence nr = “non-redundant”)
• Many more specific databases that let you focus the search
BLASTn page
• Search can be focussed on a
particular organism
• Model sequences and those
from uncultured organisms can
be excluded
• Entrez allows key words to be
used to restrict the search, e.g.
“olfactory receptor”
Choose optional parameters
• To understand optimal parameters you need to look at the output first!

[Figure: the three parts of the BLAST output: graphic display, descriptions display, and alignment display]

Kerfeld and Scott, PLoS Biology 2011 9(2): e1001014.

BLAST results page: potential homologues identified

S′ (bit score): a measure of overall sequence similarity
E (expect value): a statistical measure
Kerfeld and Scott, PLoS Biology 2011 9(2): e1001014.
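
The two numbers are linked: for a bit score S′, the expected number of chance hits scoring at least that well is approximately E = m × n × 2^(−S′), where m is the query length and n is the total database length. The sketch below assumes this textbook form; real BLAST uses effective (edge-corrected) lengths, so reported E-values will not match it exactly. The function name is hypothetical.

```python
def expect_value(bit_score: float, query_length: int, database_length: int) -> float:
    """Approximate E-value: chance hits expected to score at least bit_score.

    Uses E = m * n * 2**(-S'), with raw lengths standing in for the
    effective lengths that BLAST actually uses.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


# The same bit score is less significant in a larger database.
small_db = expect_value(50.0, query_length=123, database_length=10**6)
large_db = expect_value(50.0, query_length=123, database_length=10**9)
print(small_db, large_db)
```

The design point is that the bit score describes the alignment alone, while E also depends on how much sequence was searched, so E-values from different databases are not directly comparable.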
How to do BLAST wrong: believing the E-value tells the
whole story

Does it cover the whole length of both the query and subject sequences?

Discovery of a Distant Homolog or Garbage?

Kerfeld and Scott, PLoS Biology 2011

Is your result biologically meaningful?
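
One way to avoid relying on the E-value alone is to check how much of the query each hit covers. The sketch below filters BLAST+ tabular output (`-outfmt 6`, default columns) on both criteria. The three hit lines, the cutoff values, and the function names are invented for illustration, and the query length must be supplied separately because the default tabular format does not include it.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_hit(line: str) -> dict:
    """Turn one tab-separated hit line into a dictionary with typed values."""
    hit = dict(zip(OUTFMT6_FIELDS, line.rstrip("\n").split("\t")))
    for key in ("pident", "evalue", "bitscore"):
        hit[key] = float(hit[key])
    for key in ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"):
        hit[key] = int(hit[key])
    return hit


def query_coverage(hit: dict, query_length: int) -> float:
    """Percentage of the query covered by this single aligned segment."""
    return 100.0 * (abs(hit["qend"] - hit["qstart"]) + 1) / query_length


def filter_hits(lines, query_length, max_evalue=1e-5, min_coverage=70.0):
    """Keep hits that pass both the E-value and the query-coverage cutoffs."""
    hits = [parse_hit(line) for line in lines if line.strip()]
    return [h for h in hits
            if h["evalue"] <= max_evalue
            and query_coverage(h, query_length) >= min_coverage]


# Invented hit lines for a 123-residue query.
lines = [
    "query1\tsubjA\t85.0\t120\t18\t0\t1\t120\t5\t124\t3e-60\t190",
    "query1\tsubjB\t100.0\t20\t0\t0\t50\t69\t300\t319\t2e-06\t40.1",
    "query1\tsubjC\t40.0\t110\t66\t2\t5\t114\t10\t119\t0.5\t25.0",
]
for hit in filter_hits(lines, query_length=123):
    print(hit["sseqid"], round(query_coverage(hit, 123), 1))
```

Here subjB passes the E-value cutoff but covers only a short stretch of the query, which is the kind of hit the slide warns about.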

Important considerations
• Understand the output
• Adjust the input
• Treat the analysis like an experiment (not like a Google search)
Optional parameters:
Click here to see the list of algorithm
parameters that can be changed
Summary

• BLAST is not one tool – it is a suite of tools

• Can quickly search for nucleotide or protein sequences, or run more complex queries

• It is not a black box: you should understand how BLAST works

• Tuning the input parameters may be needed to find sequences of interest

Next time: more complex BLAST and choosing those optional parameters to improve your search
