F 56665

Uploaded by

Maria Tsirka

Figure 1. The sensitivity of the two-hit and one-hit heuristics as a function of HSP score. Using the BLOSUM62 amino acid substitution matrix (18), the target frequencies q_ij implied by equation 3, and the background amino acid frequencies P_i of Robinson and Robinson (21), 100 000 model HSPs were generated for each of the nominal scores 37-92, corresponding to normalized scores 19.9-45.1 bits. It was determined by inspection whether each HSP failed to contain two non-overlapping length-3 word pairs with nominal score at least 11 and within a distance of 40 of one another, or failed to contain a single length-3 word pair with nominal score at least 13. The corresponding probabilities of missing an HSP using the two-hit heuristic with T = 11 and the one-hit heuristic with T = 13 are plotted as a function of normalized HSP score. The two-hit method is more sensitive for HSPs with score at least 33 bits.

[...] efficiently. Specifically, we choose a window length A, and invoke an extension only when two non-overlapping hits are found within distance A of one another on the same diagonal. Any hit that overlaps the most recent one is ignored. Efficient execution requires an array to record, for each diagonal, the first coordinate of the most recent hit found. Since database sequences are scanned sequentially, this coordinate always increases for successive hits. The idea of seeking multiple hits on the same diagonal was first used in the context of biological database searches by Wilbur and Lipman ([...]).

Because we require two hits rather than one to invoke an extension, the threshold parameter T must be lowered to retain comparable sensitivity. The effect is that many more single hits are found, but only a small fraction have an associated second hit on the same diagonal that triggers an extension. The great majority of hits may be dismissed after the minor calculation of looking up, for the appropriate diagonal, the coordinate of the most recent hit, checking whether it is within distance A of the current hit's coordinate, and finally replacing the old with the new coordinate. Empirically, the computation saved by requiring fewer extensions more than [...] 38 bits.
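The bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not the BLAST source code. The function name `two_hit_triggers`, the `(query_pos, subject_pos)` hit format, the dictionary standing in for the per-diagonal array, and the reading of "within distance A" as `distance <= A` are all assumptions made for the example. The sketch also leaves out what happens to the stored coordinate after an extension has been performed.

```python
def two_hit_triggers(hits, window, word_len=3):
    """Return the hits that would invoke an extension under the two-hit rule.

    hits     -- (query_pos, subject_pos) pairs, ordered by increasing
                subject_pos (the database sequence is scanned sequentially)
    window   -- the window length A
    word_len -- length of each word hit (3 for proteins in the text)

    Assumption: "within distance A" is taken to mean distance <= A.
    """
    last_hit = {}   # diagonal -> subject coordinate of the most recent hit
    triggers = []
    for q, s in hits:
        diag = s - q
        prev = last_hit.get(diag)
        if prev is not None:
            distance = s - prev
            if distance < word_len:
                # Overlaps the most recent hit on this diagonal: ignored,
                # so the stored coordinate is left unchanged.
                continue
            if distance <= window:
                triggers.append((q, s))
        # Replace the old coordinate with the new one.
        last_hit[diag] = s
    return triggers


if __name__ == "__main__":
    example_hits = [(0, 0), (1, 1), (5, 5), (0, 10), (20, 30), (100, 100)]
    print(two_hit_triggers(example_hits, window=40))
```

Most hits cost one lookup, one comparison and one store. That is the "minor calculation" the text refers to.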
While this would appear sufficient for most purposes, the one-hit default T parameter has typically been set as low as 11, yielding an execution time nearly three times that for T = 13. Why pay this price for what appears at best marginal gains in sensitivity? The [...]

Nucleic Acids Research, 1997, Vol. 25, No. 17

The times required by various steps of the BLAST algorithm vary substantially from one query and one database to another. Table 1 shows typical relative times spent by the original and the gapped BLAST programs on various algorithmic stages. The 'original BLAST' program is represented, here and below, by a variant form of blastp version 1.4.9, modified so that it uses the same edge-effect correction (22) and background amino acid frequencies as the 'gapped BLAST'. The times represent the average for three different queries, with the time for the original BLAST program normalized in each instance to 100 units. More concretely, to search SWISS-PROT (26), release 34 (59 576 sequences; 21 219 450 residues), with the length-567 influenza A virus hemagglutinin precursor (27) as query, the original BLAST program requires 45.8 s, and the gapped BLAST program 15.8 s. This timing experiment, and others referred to below, was run on one 200 MHz R10000 CPU of a lightly loaded SGI Power Challenge XL computer with 2.5 Gbytes of RAM. This machine runs the operating system IRIX, version 6.2, which is an implementation of UNIX. We used the standard SGI C compiler, with the -O flag for optimization, to compile all versions of the programs. The times reported are the user times given by the time command, and are for the better of two identical runs.

A closely related type of gapped extension routine to that used here was developed by G. Myers during the evaluation of the original BLAST algorithm.
It was not included in the publicly distributed code primarily because the then-current strategy of extending every hit decreased the algorithm's speed unduly for the relatively small gain in sensitivity realized (1).

As discussed above, the statistical significance of gapped alignments may be evaluated using the two statistical parameters λg and Kg. The current version of the FASTA program (2) estimates these parameters on each run, by analyzing the distribution of alignment scores produced by all the sequences in the database. BLAST gains speed by producing alignments for only the few database sequences likely to be related to the query, and therefore does not have the option of estimating λg and Kg on the fly. Instead, it uses estimates of these parameters produced beforehand by random simulation (3). A drawback of this approach is that the program may not accept an arbitrary scoring system, for which no simulation has been performed, and still produce accurate estimates of statistical significance. The original BLAST programs, in contrast, because they dealt only with ungapped local alignments, could derive λu and Ku from theory for any scoring matrix (8,9).

ITERATED APPLICATION OF BLAST TO POSITION-SPECIFIC SCORE MATRICES

Database searches using position-specific score matrices, also called profiles or motifs, often are much better able to detect weak relationships than are database searches that use a simple sequence as query (28-38). Employing these methods, however, frequently has involved the use of several different programs and a fair degree of expertise. Accordingly, to render the power of motif searches more readily available, we have written a procedure to construct a position-specific score matrix automatically from the output of a BLAST run, and modified BLAST to operate using such a matrix in the place of a simple query.
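As background to the two statistical parameters λ and K mentioned above, the following sketch shows how they are conventionally used to turn a raw alignment score into a normalized (bit) score and an expected number of chance alignments. It applies the standard Karlin-Altschul formulas and ignores the edge-effect correction the paper refers to. The function names and the numeric values of `lam`, `K`, `m` and `n` are illustrative assumptions, not values taken from this text.

```python
import math


def bit_score(raw_score, lam, K):
    """Normalized score in bits: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(K)) / math.log(2)


def e_value(raw_score, lam, K, m, n):
    """Expected number of chance alignments scoring at least raw_score.

    m, n -- query length and database length (no edge-effect correction,
            which is a simplification made for this sketch).
    """
    return K * m * n * math.exp(-lam * raw_score)


if __name__ == "__main__":
    # Placeholder parameters, assumed purely for illustration.
    lam, K = 0.27, 0.04
    m, n = 567, 21_219_450
    for s in (80, 100, 120):
        print(s, round(bit_score(s, lam, K), 1), e_value(s, lam, K, m, n))
```

The two quantities are linked by E = m n 2^(-S'), so a bit score can be interpreted without knowing which scoring system produced it.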
The resulting PSI-BLAST program often is substantially more sensitive than the corresponding BLAST program, but for each iteration takes little more than the same time to run. In related work, Henikoff and Henikoff (39) have described how, short of modifying BLAST so that it may operate on a position-specific score matrix, a single artificial sequence that approximates such a matrix may be used as a query with the original BLAST programs.

The construction of a position-specific score matrix is a multi-stage process, and at each stage a choice must be made among a number of alternative routes. We have been guided by the goals of automatic operation, speed of execution, and general simplicity. The issues discussed below are: (i) general architecture of the score matrix; (ii) construction of the multiple alignment from which the matrix is derived; (iii) weights for sequences within the multiple alignment, and evaluation of the effective number of independent observations it constitutes; (iv) estimation of target frequencies, and the construction of matrix scores; (v) applying BLAST to a position-specific matrix, and the statistical evaluation of search results. We do not claim our current implementation is optimal, and it is likely that over time some of its details will change.

Score matrix architecture

The alignment of a simple sequence with a pattern embodied by a position-specific score matrix is almost completely analogous to the alignment of two simple sequences. The only real difference is that the score for aligning a letter with a pattern position is given by the matrix itself, rather than with reference to a substitution matrix. For proteins, a query of length L and a substitution matrix of dimension 20 x 20 are replaced by a position-specific matrix of dimension L x 20. Position-specific gap costs may be defined as well (34,40).
As with pairwise sequence comparison, one may choose among finding the best global alignment of the matrix and the simple sequence (23), finding the best alignment of the complete matrix with a segment of the sequence (41), and finding the best local alignment of the matrix and sequence ([...]).

Position-specific protein score matrices draw their power from two sources. The first is improved estimation of the probabilities with which amino acids occur at various pattern positions, leading to a more sensitive scoring system. The second is relatively precise definition of the boundaries of important motifs. By demanding the complete alignment of one or more motifs, rather than seeking an arbitrary local alignment, the size of the search space may be greatly reduced, thereby lowering the level of random noise.

Unfortunately, there are many obstacles to automating well the delineation of a set of motifs from the output of a database search. The query sequence may contain a variety of different domains, and share different subsets of them with different proteins in the database. Furthermore, defining the proper extent of even a single motif may be challenging (42). Accordingly, we have chosen to forgo the potential advantages of restricting the length of our derived matrices, and then demanding that they be completely aligned with segments of database sequences (41). Instead, each matrix we construct has length precisely equal to that of the original query sequence. When searching the database with such a matrix, we seek local alignments, in full analogy to those sought by BLAST when used for straightforward sequence-sequence comparison. Finally, we do not attempt to derive position-specific gap scores for use with our position-specific substitution scores. Instead, in each iteration [...]

Figure 6. The distribution of optimal local alignment scores from the comparison of a position-specific score matrix with 10 000 random protein sequences. The score matrix was constructed by PSI-BLAST from the 12[...] local alignments with E-value [...] found in a search of SWISS-PROT using as query the length-567 influenza A virus hemagglutinin precursor (27) (SWISS-PROT accession no. P03435). The random sequences, each of length 567, were generated using the amino acid frequencies of Robinson and Robinson (21). Optimal local alignment scores were calculated using the position-specific matrix in conjunction with 10 + k gap costs. The extreme value distribution that best fits the data ([...]) is plotted. A χ² goodness-of-fit test with 35 degrees of freedom has value 41.8, corresponding to a P-value of 0.20.

[...] lowest E-value found, as well as the number of shuffled sequences yielding E-values <1 and 10. For comparison, we performed the identical shuffled-database test on the gapped and original versions of BLAST. To reduce the probability that high-scoring alignments were missed due to the heuristic nature of the algorithms, we performed these tests with T = 9 rather than the default value of 11. The results are given in Table 2. For the 11 queries, the median of the low PSI-BLAST E-values was 0.87, which corresponds to a median P-value of 0.58 (8,9). The mean numbers of shuffled database sequences with E-values [...]
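The correspondence quoted above between an E-value of 0.87 and a P-value of 0.58 follows from the usual Poisson relation P = 1 - exp(-E), where P is the probability of seeing at least one chance alignment with that score or better. A short sketch, in which the function names are assumptions chosen for illustration:

```python
import math


def p_from_e(e_value):
    """P = 1 - exp(-E), computed via expm1 for accuracy at small E."""
    return -math.expm1(-e_value)


def e_from_p(p_value):
    """Inverse relation: E = -ln(1 - P)."""
    return -math.log1p(-p_value)


if __name__ == "__main__":
    print(round(p_from_e(0.87), 2))  # 0.58, matching the value in the text
```

For small values the two measures nearly coincide (P is approximately E), and they diverge noticeably only once E approaches 1.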