03 Backtrace For Computing Alignments 5-55

Uploaded by

idhitappu
Copyright
© All Rights Reserved

Knowing the edit distance between two
strings is important, but it turns out not
to be sufficient. We often need something
more, which is the alignment between the
two strings. We want to know which symbol
in string x corresponds to which symbol in
string y, and this is going to be important
for any application we have of edit
distance, from spell checking to machine
translation, even in computational biology.
The way we compute this alignment is we
keep a backtrace. A backtrace is simply
a pointer, stored when we enter each cell
in the matrix, that tells us where we came
from. When we reach the end, the upper
right corner of our matrix, we can use that
pointer and then trace back through all the
pointers to read off the alignment. Let's
see how this works in practice. Again,
I've given you the equation for each cell
in edit distance, and I'll start by putting
in some of the values that we saw earlier.
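
The per-cell equation mentioned here is not reproduced in the transcript. As a reminder, and assuming the usual costs of 1 for an insertion or a deletion and 2 for a substitution (the costs the arithmetic in this lecture relies on), it can be written as follows. The notation D(i, j) for the distance between the first i symbols of x and the first j symbols of y is an assumption made for this sketch.

```latex
\begin{aligned}
D(i, 0) &= i, \qquad D(0, j) = j, \\
D(i, j) &= \min
\begin{cases}
D(i-1, j) + 1 & \text{(deletion)} \\
D(i, j-1) + 1 & \text{(insertion)} \\
D(i-1, j-1) +
  \begin{cases}
  2 & \text{if } x_i \neq y_j \\
  0 & \text{if } x_i = y_j
  \end{cases}
  & \text{(substitution or match)}
\end{cases}
\end{aligned}
```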
Alright. So we can ask, how did we get to
this value two? Two is the minimum of three
values. This two here is the distance
between the string I and the string E. We
got it by saying it's either the alignment
between nothing and E, plus the insertion
of an extra I, so that's a distance of one
plus one is two; or zero plus two is two;
or one plus one is two. So we had three
candidates. If we ask which minimum path we
came from, really they're all the same. We
could have come from any of them. And
that's going to be true for this value of
three as well. We computed it as the
minimum of two plus one, one plus two, or
two plus one, so this could have come from
here, here, or here. And similarly, that's
going to be true for this cell too. I
didn't work out the arithmetic for you, but
you can work it out for yourself.

Here we have a difference. For the distance
between inte and e, we could take what it
costs us to convert I N T E to nothing, and
then add another insertion for E. That
would be silly, because four plus one is
five, and there's a cheaper way to get from
I N T E to E: it costs us nothing to match
this E to that E. So to our previous
alignment between I N T and nothing, we can
add zero to three to get a three. The
minimum path for this three came from that
three. So while in some cases a cell came
from many places, in this case it
[inaudible] came from this previous three.
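
The cell-by-cell arithmetic above can be checked with a short sketch. It assumes insertion and deletion cost 1 and substitution costs 2, as in the worked numbers; the function name `cell_candidates` and the dictionary keys are hypothetical, chosen only for this illustration.

```python
def cell_candidates(x, y, i, j, sub_cost=2):
    """Return the three candidate values for cell (i, j), with i, j >= 1.

    D[a][b] is the edit distance between x[:a] and y[:b].
    Assumed costs: insertion 1, deletion 1, substitution sub_cost.
    """
    n, m = len(x), len(y)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for a in range(1, n + 1):
        D[a][0] = a  # delete every symbol of x[:a]
    for b in range(1, m + 1):
        D[0][b] = b  # insert every symbol of y[:b]
    for a in range(1, n + 1):
        for b in range(1, m + 1):
            diag = 0 if x[a - 1] == y[b - 1] else sub_cost
            D[a][b] = min(D[a - 1][b] + 1, D[a][b - 1] + 1, D[a - 1][b - 1] + diag)

    diag = 0 if x[i - 1] == y[j - 1] else sub_cost
    return {
        "deletion": D[i - 1][j] + 1,
        "insertion": D[i][j - 1] + 1,
        "substitution": D[i - 1][j - 1] + diag,
    }


# "i" vs "e": all three candidates tie.
print(cell_candidates("intention", "execution", 1, 1))
# "inte" vs "e": only the diagonal (matching e with e) gives the minimum.
print(cell_candidates("intention", "execution", 4, 1))
```

When several candidates share the minimum, any of them is a valid place to have come from; when only one does, the backpointer is forced.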
So we're going to do this for every cell
in the array, and the result will look
something like this, where we have, for
every cell, every place it could have come
from. You'll see that in a lot of cases
any path could have worked, so this six
could have come from any place. But,
crucially, for this final eight, which
tells us the final edit distance between
intention and execution, our traceback
tells us it came from the best alignment
between intentio and executio, which came
from the best alignment between intenti
and executi, and so on. And so we can
trace back this alignment and get
ourselves an alignment that tells us that
this N matches this N, this O matches this
O, and so on, but maybe here we have an
insertion rather than a clean lining up.

Computing the backtrace is very simple. We
take the same minimum edit distance
algorithm that we have seen, and here I
have labeled the cases for you. When we
are looking at a cell, we're either
deleting, inserting, or substituting, and
we simply add pointers. In a case where we
are inserting we point left, in a case
where we are deleting we point down, and
in a case where we are substituting we
point diagonally. I showed you those
arrows on the previous slide.

So we can look at this distance matrix and
think about the paths from the origin here
to the end of the matrix. Any
non-decreasing path that goes from the
origin to the point (N, M) corresponds to
some alignment of the two sequences. An
optimal alignment, then, is composed of
optimal subsequences, and that's the idea
that makes it possible to use dynamic
programming for this task.
So the result of our backtrace is two
strings and the alignment between them.
We'll know which things line up exactly,
which things line up with substitutions,
and where we should have insertions or
deletions.
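
Reading the alignment off the pointers might look like the sketch below. It keeps one pointer per cell and breaks ties in favor of the diagonal, which is an arbitrary choice made here, not something stated in the lecture. The function name `align`, the `"*"` gap symbol, and the costs (1 for insertion and deletion, 2 for substitution) are likewise assumptions.

```python
def align(x, y, sub_cost=2, gap="*"):
    """Return (distance, aligned_x, aligned_y) for one optimal alignment."""
    n, m = len(x), len(y)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    ptr = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0], ptr[i][0] = i, "DOWN"
    for j in range(1, m + 1):
        D[0][j], ptr[0][j] = j, "LEFT"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = 0 if x[i - 1] == y[j - 1] else sub_cost
            # min() returns the first minimal item, so ties prefer
            # DIAG, then DOWN, then LEFT.
            D[i][j], ptr[i][j] = min(
                (D[i - 1][j - 1] + diag, "DIAG"),
                (D[i - 1][j] + 1, "DOWN"),
                (D[i][j - 1] + 1, "LEFT"),
                key=lambda c: c[0],
            )

    # Follow the pointers from the final cell back to the origin.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = ptr[i][j]
        if move == "DIAG":    # x[i-1] lines up with y[j-1]
            top.append(x[i - 1])
            bottom.append(y[j - 1])
            i, j = i - 1, j - 1
        elif move == "DOWN":  # deletion: x[i-1] lines up with a gap
            top.append(x[i - 1])
            bottom.append(gap)
            i -= 1
        else:                 # insertion: a gap lines up with y[j-1]
            top.append(gap)
            bottom.append(y[j - 1])
            j -= 1
    return D[n][m], "".join(reversed(top)), "".join(reversed(bottom))


distance, top, bottom = align("intention", "execution")
print(distance)
print(top)
print(bottom)
```

Each printed column is a match, a substitution, or a gap against a symbol. A different tie-breaking order can produce a different alignment with the same total cost.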
What's the performance of this algorithm?
In time it's order NM, because our
distance matrix is of size NM and we're
filling in each cell one time. The same is
true for space. And in the backtrace, in
the worst case, if we had N deletions and
M insertions, we'd have to touch N plus M
cells, but not more than that. So that's
our backtrace algorithm for computing
alignments.
