
Article · December 2013



ISSN 2321 - 6077 GENERAL ARTICLE Keanean Journal of Science
Vol 2 2013 67-76

BLAST: An introductory tool for students to Bioinformatics Applications

Gareth Gordon Syngai1*, Pranjan Barman2, Rupjyoti Bharali2 & Sudip Dey3
1 Department of Biochemistry, Lady Keane College, Shillong – 793001
2 Department of Biotechnology, Gauhati University, Guwahati – 781014
3 Sophisticated Analytical Instrument Facility, North Eastern Hill University, Shillong – 793022
[email protected], [email protected], [email protected], [email protected]
*Corresponding author: Gareth Gordon Syngai; email:[email protected]

Abstract
BLAST, a sequence similarity search program, is an excellent starting point for teaching bioinformatics to students, and it has the potential to enhance a student’s grasp of biomedical, biochemical, and biogeochemical concepts. This article discusses the underlying concepts of the BLAST algorithm and the scores and statistics of the alignments, with illustrations using NCBI BLAST. The article also emphasizes the need for students to be familiar with the basic concepts and programs of bioinformatics, which have become a necessity in the biological sciences because of recent advances in high-throughput techniques for data generation and analysis.

Keywords
BLAST, algorithm, introductory tool, bioinformatics teaching, bioinformatics applications

Introduction
The Basic Local Alignment Search Tool (BLAST) is one of the most commonly used tools for comparing sequence information and retrieving sequences from databases and is thus an excellent starting point for teaching bioinformatics (Kerfeld & Scott, 2011). BLAST has been utilized in nearly every branch of biology, far beyond the scope of molecular genetics, molecular biology and protein biochemistry, and this tool has made great contributions to many scientific fields since its development (Altschul et al., 1997; Altschul, 1991). Currently, the work of most biologists, bioinformaticians, evolutionists and medical scientists cannot progress without the use of BLAST (Dong-Wook et al., 2012).

The major reasons for the ever-growing popularity of BLAST are the flexibility of the search algorithm, reliable statistical reports, continual software development and the speed attained by the heuristic search methods (Neumann et al., 2013).

In addition, by using BLAST, students can be introduced to concepts of molecular evolution (e.g., gene duplication and divergence; orthologs versus paralogs) which are often quite abstract in nature. This is possible because of the abundance of sequence data present in public databases, which raises the far more attractive possibility of using searches tailored to a particular course, or, better yet, allowing the students to choose their own examples.

Another benefit of teaching students how the BLAST algorithm works is that it provides an opportunity to illustrate how mathematics functions as a language of biology. Of higher significance is the fact that understanding the steps in the calculation of an E-value provides an opportunity to show how the algorithm rests on the fundamental principles of biochemistry and evolution (Kerfeld & Scott, 2011).

This paper presents a concise and conceptual approach with simplified interpretations of the BLAST algorithm to help students understand the underlying basics of the BLAST program, which in turn has the potential to enhance a student’s grasp of biomedical, biochemical, and biogeochemical concepts, thus helping widen the scope for multidisciplinary integration.

BLAST: The Tool
BLAST is a sequence similarity search program that can be used via a web interface or as a stand-alone tool to compare a user’s query to a database of sequences (Altschul et al., 1997; Altschul et al., 1990). There are several types of BLAST to compare all combinations of nucleotide or protein queries with nucleotide or protein databases (McGinnis & Madden, 2004). BLAST performs comparisons between pairs of sequences, searching for regions of local similarity (Pertsemlidis & Fondon III, 2001).

The rationale for local similarity searching is that functional sites (e.g., catalytic sites of enzymes) are localized to relatively short regions, which are conserved irrespective of deletions or mutations in intervening parts of the sequence. Thus, a search for local similarity may produce more biologically meaningful and sensitive results than a search attempting to optimize alignment over the entire sequence lengths (Attwood et al., 2007).

Sequence similarity searching, typically with BLAST, is the most widely used and most reliable strategy for characterizing newly determined sequences. Sequence similarity searches can identify “homologous” proteins or genes by detecting excess similarity between the newly determined sequence (the query sequence) and any similar sequence in the database, which in turn reflects common ancestry.

Homology implies that sequences may be related by divergence from a common ancestor or share common functional aspects. Homologous genes found in different species that evolved from the same gene in a common ancestor are called orthologs, whereas homologous genes in the same organism (arising by duplication of a single gene in the evolutionary past) are called paralogs. Homologous genes (both orthologs and paralogs) often have the same or related functions (Pierce, 2002).

Sequence homology searches are a key computational tool of molecular biology. They are important because their products, the high-scoring alignments, are used in a range of areas, from estimating evolutionary histories, to predicting functions of genes and proteins, to identifying possible drug targets (Pearson, 2013; Bayat, 2002; Bailey & Gribskov, 1998).

The BLAST algorithm was described by Altschul et al. in 1990. It became popular largely because implementations of it have been very efficient and it has been optimized to work with parallel UNIX architectures from an early stage (Attwood et al., 2007). The BLAST algorithm is a heuristic program, which means that it relies on some smart shortcuts to perform the search faster (Madden, 2002). However, in this trade-off for increased speed, the accuracy of the algorithm is slightly decreased (Zhimin & Zhongwen, 2013).

The Algorithm
The algorithm itself is straightforward, the important concept being that of the segment pair. Given two sequences, a segment pair is defined as a pair of sub-sequences of the same length that form an ungapped alignment. BLAST calculates all segment pairs between the query and the database sequences, above a scoring threshold. The algorithm searches for fixed-length hits, which are then extended until certain threshold parameters are achieved. The resulting high-scoring pairs (HSPs) form the basis of the ungapped alignments that characterize BLAST output.

Subsequently, a modification of the algorithm was introduced for generating gapped alignments (Altschul et al., 1997). The new algorithm seeks only one, rather than all, ungapped alignments that make up a significant match, and hence speeds the initial database search. Dynamic programming is used to extend a central pair of aligned residues in both directions to yield the final gapped alignment. Having dropped the requirement to find all ungapped alignments independently, the new algorithm is three times faster than its predecessor (Attwood et al., 2007).

The Steps
There are three major steps in the BLAST algorithm, the details of which are described below:

Step 1: BLAST filters the low-complexity regions (e.g., CA repeats) and removes them from the query sequence (Pertsemlidis & Fondon III, 2001). The reason is that low-complexity regions and interspersed repeats typically match many sequences; these matches are normally not of biological interest, may lead to spurious results, and confound the statistics used by BLAST.

BLAST offers two query masking modes to avoid such matches. One is known as “hard-masking” and replaces the masked portion of the query by X’s or N’s for all phases of the search. On the other hand, “soft-masking” makes the masked portion of the query unavailable for finding the initial word hits, but the masked portion is available for the gap-free and gapped extension once an initial word hit has been found (Camacho et al., 2009). Filtering is only applied to the query sequence (or its translation products), not to the database sequences. Default filtering is by the Nucleotide Dust Masker program (Morgulis et al., 2006) and the SEG program (Wootton & Federhen, 1996). The BLAST formatter can now represent these regions by lower-case letters, making them distinct from the (upper-case) non-filtered regions. In addition, the user may select from three colors (black, gray, red) to vary the emphasis on these regions. This new display option
is now the default, showing the masked regions in gray lower-case (Ye et al., 2006).

Next, the query sequence, which is a long string of either nucleotides or amino acids, is broken into small pieces called “words”. As a default setting, DNA sequences are broken into words of 11 consecutive letters (the word length) and amino acid sequences into words of 3 letters. However, users can change the word length as desired (Pertsemlidis & Fondon III, 2001). For example, a nucleotide sequence ATCGTCGAT with word length 7 produces three different words: ATCGTCG (first word), TCGTCGA (second word) and CGTCGAT (third word).

There can be at most L − w + 1 such words, where L is the query sequence length and w is the word length; by default w = 3 for amino acid sequences and w = 11 for nucleotide sequences.

BLAST then uses a scoring matrix, BLOSUM (blocks substitution matrix) or PAM (percent accepted mutation), to determine all high-scoring matching words from the database for each word in the query sequence (BLOSUM 62 is used as a default setting for amino acids). No gaps are allowed. The list of matches is reduced by taking only those that score above a given threshold, called the neighbourhood word-score threshold (T). After doing this, approximately 50 of these matches are usually kept for each of the words generated from the original query.

Step 2: BLAST searches through the target sequence database for exact matches to the word list generated. If a match is found, it is used to seed (hit) a possible alignment between the query and the database sequences.

Step 3: These initial neighbourhood word hits act as seeds for initiating searches to find longer high-scoring pairs (HSPs) containing them.

The word hits are then extended in both left and right directions along each sequence for as far as the cumulative alignment score can be increased. The critical parameter controlling extension is called X. Low values of X cause alignments to terminate after only a few mismatches have been found, while high values of X allow alignments to continue through dissimilar regions. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached (Deusdado & Carvalho, 2008; Korf, 2003).

Gapped BLAST (Altschul et al., 1997) uses a lower threshold for generating the list of high-scoring matching words; the algorithm uses short matched regions with no insertions or deletions between them and within a certain distance of each other as the starting points for longer ungapped alignments. These joined regions are then extended using the same method as in the original BLAST.

Next, BLAST identifies and lists the maximal scoring segment pairs (MSPs) from the entire database (Pertsemlidis & Fondon III, 2001). A maximal scoring segment pair (MSP) is defined to be the highest scoring pair of identical length segments chosen from 2 sequences. An MSP is reported if its score exceeds a cutoff value S (Altschul et al., 1990), which is calculated by using the parameters W (word length), T (the neighbourhood word score threshold), X (the maximum permissible drop off of the cumulative segment score), and a substitution matrix like the BLOSUM 62 for most of the BLAST programs (Deusdado & Carvalho, 2008; Rivera et al., 1998).

Finally, the scores and statistics calculated for the alignments are presented as results in the output window.

BLAST Scores and Statistics
BLAST provides three related pieces of information, in the form of raw scores, bit scores, and E-values, that allow interpretation of its results.

The raw score for a local sequence alignment is the sum of the scores of the maximal-scoring segment pairs (MSPs) that make up the alignment. Because of differences between scoring matrices, raw scores aren’t always directly comparable. Bit scores, on the other hand, are raw scores that have been converted from the log base of the scoring matrix that creates the alignment to log base 2. This rescaling allows bit scores to be compared between alignments even if different scoring matrices have been used (Madden, 2002; Gibas & Jambeck, 2001).

Thus, BLAST uses statistical theory to produce a bit score and expect value (E-value) for each alignment pair (query to hit).

The bit score gives an indication of how good the alignment is; the higher the score, the better the alignment. In general terms, this score is calculated from a formula that takes into account the alignment of similar or identical residues, as well as any gaps introduced to align the sequences. A key element in this calculation is the “substitution matrix”, which assigns a score for aligning any possible pair of residues. The BLOSUM62 matrix is the default for most BLAST programs, the exceptions being blastn and MegaBLAST (programs that perform nucleotide-nucleotide comparisons and hence do not use protein-specific matrices).

The E-value, on the other hand, gives an indication of the statistical significance of a given pair-wise alignment and reflects the size of the database and the scoring system used. The lower the E-value, the more significant is the hit. A sequence alignment that has an E-value of 0.05 means that this similarity has a 5 in 100 (1 in 20)
chance of occurring by chance alone. Thus, an E-value greater than 1 indicates that the alignment probably has occurred by chance, and that the query sequence has been aligned to a sequence in the database to which it is not related. E-values less than 0.1 or 0.05 are typically taken to represent biological significance (Madden, 2002; Pertsemlidis & Fondon III, 2001); the default E-value threshold is 10, that is, 10 hits are expected to occur by chance with scores equal to or greater than the alignment score.

The BLAST Family of Programs
Since 1990, many variants of BLAST have been developed, each with its own specialized features. Early on, the original BLAST was split into two adaptations: NCBI BLAST and Washington University BLAST (WU BLAST). Both have program variations. Examples of the programs include BLASTN, which can be used to compare a nucleotide sequence with a nucleotide database; BLASTP, which can be used to compare a protein sequence with a database of protein sequences; and BLASTX, which can take a nucleotide sequence, translate it, and query it versus a protein database in one step (Gish & States, 1993). TBLASTN can compare a protein query sequence to all six possible reading frames of a translated nucleotide database and is often used to identify proteins in new, un-described genomes. Finally, TBLASTX compares all six reading frames of a translated nucleotide query sequence to all six reading frames of a translated nucleotide database.

In addition, NCBI has some of its own specialized variants of BLAST. For example, MEGABLAST is a program that can rapidly complete searches for sequences with only minor variations and it can more efficiently manage queries with longer sequences (Altschul et al., 1994). PSI-BLAST and PHI-BLAST are powerful BLAST tools that allow more complex and evolutionarily divergent proteins to be aligned (Altschul et al., 1997). These and other programs, as well as genomic BLAST databases, are all available on the NCBI BLAST website (Lobo, 2008).

BLAST+, CS-BLAST and DELTA-BLAST are other user-friendly BLAST interfaces that take advantage of increasing computer processing power and new algorithms (Neumann et al., 2013).

Performing the BLAST Run
A BLAST search can be performed using the NCBI website at the web address http://www.ncbi.nlm.nih.gov/Blast. There are various BLAST options available on this home page (Figure 1).
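
The choice among the program variants described under "The BLAST Family of Programs" depends only on the type of the query and the type of the database. A minimal sketch of that choice as a lookup follows. The program names come from the text; the function name `choose_program`, the type labels and the `translate_both` flag are hypothetical and are not part of any NCBI software.

```python
# Lookup keyed by (query type, database type); program names follow the text.
PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query is translated, then compared to proteins
    ("protein", "nucleotide"): "tblastn",  # database is translated in all six reading frames
}


def choose_program(query_type: str, db_type: str, translate_both: bool = False) -> str:
    """Return the BLAST program name for a query/database pairing (illustrative helper)."""
    if translate_both:
        # tblastx translates both the nucleotide query and the nucleotide database.
        if (query_type, db_type) != ("nucleotide", "nucleotide"):
            raise ValueError("translating both sides needs a nucleotide query and database")
        return "tblastx"
    try:
        return PROGRAMS[(query_type, db_type)]
    except KeyError:
        raise ValueError(f"no program for {query_type!r} vs {db_type!r}") from None


print(choose_program("nucleotide", "protein"))
```

The nucleotide search used later in this article corresponds to `choose_program("nucleotide", "nucleotide")`.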

Fig 1. NCBI BLAST home page at www.ncbi.nlm.nih.gov/blast.


>gi|7558|emb|Z00030.1| D. melanogaster alcohol dehydrogenase structural gene and flanks (composite sequence)
CATCCTCGCCCGTTTCCACGCCGTCGTCCTCCTCATCATCGGCGAGAGCTGATTGCGTGGTGGTCAGAGG
CGAACCAGCGGTCTTCGTGGAGCTGGGACCCAGATCAAGGCTGCTCAACAGATTGCCTGCCGACTGGGAA
GACGTTAGGGTGTCCTTGTGATAGGAGCTGTGCCGATTGCCCAGCTTAGTGGATAGTGTTAGGTCGCCGT
TGCTCGTTGGGCGTAGACTGCCCACCACCTGACCACCGGGCAGGGTGGCGCTTCTCTTGTGGCGACCCTT
CGACTTGGGAAAGGCAGCCAGGATGTTGAGCCACCACTGGGATTCCTCTGAACTGGTGCCCTTCACAAAG
GTCACGCGCTCGGGAGCGGTTATGGCGATGGAGTTGGGGTGACCTGTCACCTCCACGGCGCTGGTAACCT
CCAGCACTTTGGTCATATCAACGCACGCCTGCGGTATGGTTTCGGGCTATAGAAAATATATGTAAATTAA
AGAGTAAACAAGTTGTATTTTAAGATTTTAATTAGGAGAATTAATTAATCGGTAATCAAATGAACTCGGC
CTATCGCGTAATAATATACATTTTTTAATTTAATGACTAATAAATAATATAAAATCTAATTAATAGTTCA
GTAAGTTAGTAAAAGTAAATCAATCTGGTGGTAATTTAAGAAGCCACTTTAATTCTTCCACTTCATAAAT
Fig 2. Nucleotide sequence of D. melanogaster alcohol dehydrogenase structural gene and flanks (composite
sequence) [GenBank: Z00030.1] in FASTA format.
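
The FASTA layout shown in Fig 2 is a header line starting with `>` followed by one or more lines of sequence. As a minimal sketch of how such text can be read, the function below collects each header with its joined sequence. The name `parse_fasta` is hypothetical, and the example uses only the first 20 bases of the sequence above, split over two lines for illustration.

```python
from typing import List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA-formatted text."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined without separators
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>gi|7558|emb|Z00030.1| D. melanogaster alcohol dehydrogenase structural gene and flanks (composite sequence)
CATCCTCGCC
CGTTTCCACG
"""
records = parse_fasta(example)
print(records[0][0].split("|")[3], len(records[0][1]))
```

The length of the parsed sequence is the L used when counting query words (L − w + 1).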

Fig 3. BLAST window with the query sequence pasted in it and the selected databases.
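
The choices made in the web form of Fig 3 (program, query, database) have counterparts in the stand-alone BLAST+ tools mentioned earlier. The sketch below only builds a `blastn` argument list and does not run it. The flags shown (`-query`, `-db`, `-out`, `-evalue`, `-word_size`) are BLAST+ command-line options, but the helper name, the file names and the database name `nt` are assumptions for illustration, and the installed version's `blastn -help` output should be checked before use.

```python
import shlex
from typing import List


def build_blastn_command(query_path: str, database: str, out_path: str,
                         evalue: float = 10.0, word_size: int = 11) -> List[str]:
    """Assemble a blastn command line; defaults mirror the values quoted in the article."""
    return [
        "blastn",
        "-query", query_path,         # FASTA file holding the query sequence
        "-db", database,              # name of a formatted nucleotide database
        "-out", out_path,             # file that receives the report
        "-evalue", str(evalue),       # expect-value threshold
        "-word_size", str(word_size)  # length of the initial words
    ]


# Hypothetical file and database names, used only to show the shape of the command.
cmd = build_blastn_command("adh_query.fasta", "nt", "adh_hits.txt")
print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string avoids quoting problems if it is later passed to `subprocess.run`.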

Fig 4. Graphical Summary Output of BLAST showing the homology coverage between query and the Hits.

Fig 5. Description section in the BLAST report showing one-line summaries of sequences producing significant alignments.
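
The scores and E-values listed in a report like Fig 5 are linked by the Karlin-Altschul relations: a raw score S is rescaled to a bit score S' = (λS − ln K) / ln 2, and the expected number of chance hits is E = m · n · 2^(−S'), where m and n are the query and database lengths. The sketch below applies these two formulas directly. The values of `lam` and `k` in the example are assumed, illustrative numbers, and BLAST itself works with corrected "effective" lengths that are not modelled here.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Rescale a raw alignment score to bits: S' = (lam * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring at least `bits`: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


# Illustrative numbers only: lam and k are assumptions, not values from the article.
bits = bit_score(raw_score=100, lam=0.318, k=0.134)
print(round(bits, 1), expect_value(bits, query_len=700, db_len=10**9))
```

Because the database length n enters the formula directly, the same alignment receives a larger E-value when searched against a larger database, which is why the text says the E-value reflects the size of the database.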

Fig 6. Alignment section from a BLAST report showing pair-wise sequence alignment between a query sequence
and a database sequence.
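
An alignment such as the one in Fig 6 grows out of a short word hit that is extended in both directions, as described under "The Steps". The sketch below is a toy, ungapped version of that idea for nucleotide strings: it splits the query into words, finds exact word matches in a subject sequence, and extends each hit until the score drops by `x_drop` from its maximum, the cumulative score reaches zero, or a sequence ends. The scoring (+1 match, −3 mismatch), the short word length and all function names are assumptions for illustration; this is not the NCBI implementation.

```python
def words(seq, w):
    """All overlapping words of length w; there are len(seq) - w + 1 of them."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


def extend(query, subject, q_pos, s_pos, w, x_drop, match=1, mismatch=-3):
    """Extend an exact word hit left and right without gaps (X drop-off rule)."""
    seed_score = w * match

    def walk(step, q, s):
        score = best = best_len = length = 0
        while 0 <= q < len(query) and 0 <= s < len(subject):
            score += match if query[q] == subject[s] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score >= x_drop or seed_score + score <= 0:
                break  # dropped X below the maximum, or cumulative score hit zero
            q += step
            s += step
        return best, best_len

    left_score, left_len = walk(-1, q_pos - 1, s_pos - 1)
    right_score, right_len = walk(+1, q_pos + w, s_pos + w)
    total = seed_score + left_score + right_score
    # (score, query start, query end (exclusive), subject start)
    return total, q_pos - left_len, q_pos + w + right_len, s_pos - left_len


def find_hsps(query, subject, w=4, x_drop=5):
    """Seed with exact word matches, extend each seed, and return the distinct results."""
    index = {}
    for j, word in enumerate(words(subject, w)):
        index.setdefault(word, []).append(j)
    hsps = set()
    for i, word in enumerate(words(query, w)):
        for j in index.get(word, []):
            hsps.add(extend(query, subject, i, j, w, x_drop))
    return sorted(hsps, reverse=True)


query = "GGATCGTCGATCC"
subject = "TTATCGTCGATAA"
print(words("ATCGTCGAT", 7))
print(find_hsps(query, subject))
```

A larger `x_drop` lets the extension continue through more mismatches before stopping, which mirrors the role of the parameter X in the text.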

Further, for performing a BLAST search, the query sequence should be in FASTA format as shown in Figure 2.

In order to perform the run, first we have to open the NCBI home page and then click on BLAST. Then the type of BLAST can be selected from this window. In this case, we have selected the nucleotide blast option, which is in Basic BLAST. Subsequently, the query sequence, which is in a FASTA file, is pasted in the Enter Query Sequence section of the window. Next, the database for performing the BLAST is selected from the drop-down menu; in our case we have selected the nr database, and from the program selection we have chosen to optimize for blastn (Figure 3). Subsequently we can click the BLAST option and wait patiently for a few seconds.

After a few seconds, the BLAST output window will appear showing the results of the BLAST search (Figs 4, 5 & 6).

Applications of BLAST in Biological Sciences
The BLAST tool finds its use in a wide range of biological applications, some of which include: identification of homologous gene candidates across diverse genomes (Lu et al., 2006); species comparison by identifying similar genes in different organisms (Holton, 2004); comparative gene prediction, which involves conducting a search between two genome sequences to provide both sensitive and specific gene predictions (Parra et al., 2003); functional annotation of genomes for the identification of functional properties and biological roles of the genes in the genomes (Moriya et al., 2007); contig mapping for efficient gap-closure of prokaryotic genome sequence assemblies (van Hijum et al., 2005); and pseudogene identification for understanding the evolutionary history of genes and genomes (Zhang et al., 2006).

This tool is also helpful in building datasets for phylogenetic analysis (Dereeper et al., 2010), and constructing phylogenetic dendrograms/trees from protein sequences (Kelly & Maini, 2013). Further, it is also used for designing target-specific primers for polymerase chain reaction (Ye et al., 2012).

Conclusion
The parallel development of large-scale sequencing projects and bioinformatics tools like BLAST has enabled scientists to study the genetic blueprint of life across many species and has helped bridge the gap between biology and computer science in the maturing field of bioinformatics (Lobo, 2008). It is noteworthy that as biological sequence data are generated at an ever increasing rate, the role of bioinformatics in biological research will also continue to grow (Newell et al., 2013).

Hence, bioinformatics tools that allow scientists to explore genome sequence data have become a cornerstone of current biological research and as such should be included in any modern biology curriculum (Klein & Gulsvig, 2012; Ditty et al., 2010; Ranganathan, 2005). No science curriculum can remain current without a bioinformatics component. Undergraduate students increasingly need training in methods related to finding and retrieving information stored in vast databases (Maloney et al., 2010).

It is in this context that BLAST finds its ideal place as an important introductory tool for students to bioinformatics applications. It is also one of the most used bioinformatics approaches, is accessible to any researcher over the internet, and is routinely used to assign sequences into functional and taxonomic categories, with its applications ranging from the analysis of raw sequence data and genome comparisons to sequence-based data mining (Neumann et al., 2013).

It is therefore important that a modern biology course familiarizes students with the basic concepts and programs of bioinformatics, which are a necessity in the biological sciences nowadays. Biologists will continue to use the so-called wet labs to give students a chance to experience experimental techniques involving organisms, tissues, and cellular components first hand, but in silico dry labs involving bioinformatics techniques and virtual lab exercises can be very effective, especially in genetics, cell biology, and molecular biology (Maloney et al., 2010).

Acknowledgements
The authors gratefully acknowledge the vital inputs given by Kitriphar Tongper, Gopi Ragupathi, and Alagu Lakshmanan during the writing of this manuscript.

References
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25: 3389–3402.
Altschul, S.F., Boguski, M.S., Gish, W., Wootton, J.C. 1994. Issues in searching molecular sequence databases. Nature Genetics 6: 119–129.
Altschul, S.F. 1991. Journal of Molecular Biology 219: 555–565.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J. 1990. Basic local alignment search tool. Journal
of Molecular Biology 215: 403–410.
Attwood, T.K., Parry-Smith, D.J., Phukan, S. (Eds.) 2007. Pairwise Alignment Techniques. In: Introduction to Bioinformatics, Dorling Kindersley (India) Pvt. Ltd., New Delhi pp. 114-138.
Bailey, T.L. & Gribskov, M. 1998. Combining evidence using p-values: application to sequence homology searches. Bioinformatics 14(1): 48–54.
Bayat, A. 2002. Science, medicine, and the future: Bioinformatics. In Clinical review. British Medical Journal 324: 1018–22.
Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., Madden, T.L. 2009. BLAST+: architecture and applications. BioMed Central Bioinformatics 10: 421.
Dereeper, A., Audic, S., Claverie, J-M., Blanc, G. 2010. BLAST-EXPLORER helps you building datasets for phylogenetic analysis. BioMed Central Evolutionary Biology, 10(8) pp. 1–6.
Deusdado, S.A.D., & Carvalho, P.M.M. 2008. SimSearch: A New Variant of Dynamic Programming Based on Distance Series for Optimal and Near-Optimal Similarity Discovery in Biological Sequences. In: J.M. Corchado et al. (Eds.) IWPACBB2008, Advances in Soft Computing 49: 206-216.
Ditty, J.L., Kvaal, C.A., Goodner, B., Freyermuth, S.K., Bailey, C., et al. 2010. Incorporating genomics and bioinformatics across the life sciences curriculum. PLoS Biology 8(8): e1000448.
Dong-Wook Kim, Ryong Nam Kim, Dae-Soo Kim, Sang-Haeng Choi, Sung-Hwa Chae, Hong-Seog Park. 2012. easySEARCH: A user-friendly bioinformatics program that enables BLAST searching with massive number of query sequences. Bioinformation 8(16): 792–794.
Gibas, C., & Jambeck, P. 2001. Sequence Analysis, Pairwise Alignment, and Database Searching. In: Developing Bioinformatics Computer Skills; O’Reilly Media Inc., Seventh Indian Reprint (2008) pp. 159-190.
Gish, W., & States, D.J. 1993. Identification of protein coding regions by database similarity search. Nature Genetics 3(3): 266-272.
Holton, W.C. 2004. The Path to Species Comparison. In: Environmental Health Perspectives 112(12): A 672.
Kelly, S., Maini, P.K. 2013. DendroBLAST: Approximate Phylogenetic Trees in the absence of Multiple Sequence Alignments. PLOS ONE 8(3): e58537 pp. 1-11.
Kerfeld, C.A., Scott, K.M. 2011. Using BLAST to teach “E-value-tionary” Concepts. PLoS Biology 9(2): e1001014.
Klein, J.R., & Gulsvig, T. 2012. Using bioinformatics to develop and test hypotheses: E. coli-specific virulence determinants. Journal of Microbiology & Biology Education 13(2): 161-169.
Korf, I. 2003. Serial BLAST searching. Bioinformatics 19(12): 1492-1496.
Lobo, I. 2008. Basic Local Alignment Search Tool (BLAST). Nature Education 1(1): 215.
Lu, G., Jiang, L., Helikar, R.M.K., Rowley, T.W., Zhang, L., Chen, X., Moriyama, E.N. 2006. GenomeBlast: a web tool for small genome comparison. BioMed Central Bioinformatics, 7(Suppl 4):S18: 1-9.
Madden, T. 2002. The BLAST sequence analysis tool. In: NCBI Handbook (Eds. McEntyre, J., Ostell, J.), National Library of Medicine, Bethesda, MD.
Maloney, M., Parker, J., LeBlanc, M., Woodard, C.T., Glackin, M., Hanrahan, M. 2010. Bioinformatics and the Undergraduate Curriculum Essay. CBE-Life Sciences Education 9: 172-174.
McGinnis, S., Madden, T.L. 2004. BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Research 32: W20–W25.
Morgulis, A., Gertz, E.M., Schaffer, A.A., Agarwala, R. 2006. A fast and symmetric DUST implementation to mask low-complexity DNA sequences. Journal of Computational Biology 13: 1028–1040.
Moriya, Y., Itoh, M., Okuda, S., Yoshizawa, A.C., Kanehisa, M. 2007. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Research, 35: W182–W185.
Neumann, R.S., Kumar, S., Shalchian-Tabrizi, K. 2013. BLAST output visualization in the new sequencing era. Briefings in Bioinformatics.
Newell, P.D., Fricker, A.D., Roco, C.A., Chandrangsu, P., Merkel, S.M. 2013. A Small-Group Activity Introducing the Use and Interpretation of BLAST. Journal of Microbiology & Biology Education 14(2): 238-243.
Parra, G., Agarwal, P., Abril, J.F., Wiehe, T., Fickett, J.W., & Guigo, R. 2003. Comparative Gene Prediction in Human and Mouse. Genome Research 13: 108–117.
Pearson, W.R. 2013. An Introduction to Sequence Similarity (“Homology”) Searching. Current Protocols in
Bioinformatics John Wiley & Sons, Inc. 42: 3.1.1-3.1.8.
Pertsemlidis, A., Fondon III, J.W. 2001. Having a BLAST with bioinformatics (and avoiding BLASTphemy). Genome Biology Reviews 2(10): 2002.1-2002.10.
Pierce, B.A. 2002. Genomics. In: Genetics: a conceptual approach; W.H. Freeman and Co. pp. 548-582.
Ranganathan, S. 2005. Bioinformatics education-Perspectives and challenges. PLoS Computational Biology 1(6): e52.
Rivera, M.C., Jain, R., Moore, J.E., Lake, J.A. 1998. Genomic evidence for two functionally distinct gene classes. Proceedings of the National Academy of Sciences of the United States of America 95: 6239-6244.
van Hijum, S.A.F.T., Zomer, A.L., Kuipers, O.P., Kok, J. 2005. Projector 2: contig mapping for efficient gap-closure of prokaryotic genome sequence assemblies. Nucleic Acids Research 33: W560–W566.
Wootton, J.C., & Federhen, S. 1996. Analysis of compositionally biased regions in sequence databases. Methods in Enzymology 266: 554–571.
Ye, J., Coulouris, G., Zaretskaya, I., Cutcutache, I., Rozen, S., Madden, T.L. 2012. Primer-BLAST: A tool to design target-specific primers for polymerase chain reaction. BioMed Central Bioinformatics 13(134): 1-11.
Ye, J., McGinnis, S., Madden, T.L. 2006. BLAST: improvements for better sequence analysis. Nucleic Acids Research 34: W6–W9.
Zhang, Z., Carriero, N., Zheng, D., Karro, J., Harrison, P.M., Gerstein, M. 2006. PseudoPipe: an automated pseudogene identification pipeline. Bioinformatics 22(12): 1437–1439.
Zhimin, Z., & Zhongwen, C. 2013. Dynamic Programming for Protein Sequence Alignment. International Journal of Bioscience and Biotechnology 5(2): 141–150.

Received 09 December 2013: Revised Accepted 20 December 2013
