Bs982 l08 Basic Blast
Welcome to BLAST
Basic Local Alignment Search Tool
Ben Skinner
• Why BLAST?
https://www.ncbi.nlm.nih.gov/genbank/statistics/
Why BLAST?
Extract DNA
Your Sequence
DNA:
ATGTGATCCGACTATGACA….
PROTEIN:
MRPILMKGHERPLTFLRYNRDG….
https://www.universe-review.ca/I11-50-DNAsequencing1.jpg
Why BLAST?
https://ars.els-cdn.com/content/image/1-s2.0-S2001037015000070-gr3.jpg
Searching for similarity and homology
Outline
• Why BLAST?
Needleman-Wunsch algorithm
Local alignment
Smith-Waterman algorithm
How does BLAST work? Basic overview
Putative Function
• Why BLAST?
• Running BLAST
Basic search strategy
• 1985: FASTP program developed for fast alignment of protein sequences, FASTN for fast alignment of nucleotide sequences
• 1988: FASTA program for fast alignment of all sequence types - nucleotide or protein
https://fasta.bioch.virginia.edu/fasta_www2/fasta_list2.shtml
• FASTA the program is rarely used now – replaced by BLAST – but the FASTA format lives on
• FASTA format files are among the most common and universally used input files in bioinformatics and sequence analysis
• For more advanced users, freely available code can read and write this format in bioinformatics pipelines, e.g. Biopython and many R packages.
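To make the FASTA format concrete, here is a minimal sketch of reading and writing it in plain Python. It assumes the usual layout: a header line starting with ">" followed by one or more sequence lines. The record names are hypothetical, and the sequences are only the fragments shown on the earlier slide. In a real pipeline a library such as Biopython would normally do this job; the functions below are illustrative only.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)  # emit the final record


def write_fasta(records, width=60):
    """Return FASTA text for (header, sequence) pairs, wrapping sequences at `width`."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
    return "\n".join(out) + "\n"


# Hypothetical example records using the sequence fragments from the slide.
example = """>my_dna hypothetical example record
ATGTGATCCGAC
TATGACA
>my_protein hypothetical example record
MRPILMKGHERPLTFLRYNRDG
"""

records = list(read_fasta(example.splitlines()))
for header, seq in records:
    print(header, len(seq))
```

Wrapped sequence lines are joined back into one string per record, which is why the DNA record above is read as a single 19-base sequence.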
Basic
More complex
Protein databases
Nucleotide databases
BLASTn page
• Default = human
• Next = mouse
• Then = others
• Default others = nr/nt (includes many of those below, hence nr = “non-redundant”)
• Many more specific databases that allow the search to be focussed
BLASTn page
• Search can be focussed on a particular organism
• Model sequences and those from uncultured organisms can be excluded
• Entrez allows keywords to be used to restrict the search, e.g. “olfactory receptor”
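The same restrictions can be expressed outside the web form. The sketch below builds an Entrez query string from a keyword and an organism, then places it in a BLASTn submission URL. Several things here are assumptions rather than something taken from the slides: the endpoint address, the parameter names (CMD, PROGRAM, DATABASE, QUERY, ENTREZ_QUERY), the "nt" database name, and the example organism. The helper functions are hypothetical, and nothing is actually sent to NCBI.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: NCBI's public BLAST endpoint. This sketch only builds a URL.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def entrez_filter(keyword, organism=None):
    """Build an Entrez query string from a keyword and an optional organism."""
    terms = ['"{}"[All Fields]'.format(keyword)]
    if organism:
        terms.append('"{}"[Organism]'.format(organism))
    return " AND ".join(terms)


def build_blastn_url(query, database="nt", keyword=None, organism=None):
    """Return a BLASTn submission URL (hypothetical helper, parameter names assumed)."""
    params = {
        "CMD": "Put",
        "PROGRAM": "blastn",
        "DATABASE": database,
        "QUERY": query,
    }
    if keyword:
        params["ENTREZ_QUERY"] = entrez_filter(keyword, organism)
    return BLAST_URL + "?" + urlencode(params)


# Example: restrict a search to olfactory receptor records from mouse.
url = build_blastn_url(
    "ATGTGATCCGACTATGACA",
    keyword="olfactory receptor",
    organism="Mus musculus",
)
decoded = parse_qs(urlsplit(url).query)
print(decoded["ENTREZ_QUERY"][0])
```

Writing the restriction down as a query string also makes the search reproducible, which fits the idea of treating the analysis like an experiment.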
Choose optional parameters
• To understand optimal parameters you need to look at the output first!
Graphic display
Descriptions display
Alignment display
Does it cover the whole length of both the query and subject sequences?
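One way to answer this question numerically is to compute how much of each sequence a single alignment spans. The sketch below assumes 1-based, inclusive start and end coordinates and known sequence lengths, as reported in the alignment display. The function names, the 90% threshold, and the example numbers are all hypothetical. It handles one alignment only, so it is a simplification of what a search with several aligned regions would need.

```python
def coverage(start, end, length):
    """Percent of a sequence spanned by an alignment (1-based, inclusive coordinates)."""
    lo, hi = sorted((start, end))  # subject coordinates can be reversed on the minus strand
    return 100.0 * (hi - lo + 1) / length


def covers_both(q_start, q_end, q_len, s_start, s_end, s_len, threshold=90.0):
    """True if the alignment spans at least `threshold` percent of query AND subject."""
    return (
        coverage(q_start, q_end, q_len) >= threshold
        and coverage(s_start, s_end, s_len) >= threshold
    )


# Hypothetical hit: most of a 1000-base query aligns to a small part of a 5000-base subject.
query_cov = coverage(1, 950, 1000)
subject_cov = coverage(3049, 2100, 5000)  # reversed coordinates, minus-strand hit
print(round(query_cov, 1), round(subject_cov, 1))
print(covers_both(1, 950, 1000, 3049, 2100, 5000))
```

A hit like this one covers nearly all of the query but only a fraction of the subject, so the query may match just one region or domain of a longer sequence.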
Important considerations
• Understand the output
• Adjust the input
• Treat the analysis like an experiment (not like a Google search)
Optional parameters:
Expand the “Algorithm parameters” section to see the list of algorithm parameters that can be changed
Summary