Longest Common Subsequence
Definition 1: Subsequence
Given a sequence
X = <x1, x2, ..., xm>
then another sequence
Z = <z1, z2, ..., zk>
is a subsequence of X if there exists a strictly increasing sequence <i1, i2, ..., ik> of indices of X such that for all j = 1, 2, ..., k we have x_{i_j} = z_j

Example
X = <A,B,D,F,M,Q>
Z = <B,F,M>
Z is a subsequence of X with index sequence <2,4,5>
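
Definition 1 can be checked mechanically. The following is a minimal Python sketch, not part of the original slides; the function name `subsequence_indices` is hypothetical. It scans X from left to right and, assuming a greedy leftmost match is acceptable, returns one 1-based index sequence <i1, ..., ik> if Z is a subsequence of X, or None otherwise.

```python
def subsequence_indices(z, x):
    """Return 1-based indices i1 < i2 < ... < ik with x[i_j] == z[j],
    or None if z is not a subsequence of x (hypothetical helper)."""
    indices = []
    pos = 0  # 0-based position of the next unused symbol of x
    for symbol in z:
        # Skip symbols of x until the next match for this symbol of z.
        while pos < len(x) and x[pos] != symbol:
            pos += 1
        if pos == len(x):
            return None  # ran out of x before matching all of z
        indices.append(pos + 1)  # convert to the slides' 1-based indexing
        pos += 1  # later matches must use strictly larger indices
    return indices
```

With the example sequences, `subsequence_indices("BFM", "ABDFMQ")` returns `[2, 4, 5]`, the index sequence given on the slide.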
More Definitions
Definition 2: Common subsequence
– Given 2 sequences X and Y, we say Z is a common subsequence of X and Y if Z is a subsequence of X and a subsequence of Y
Definition 3: Longest common subsequence problem
– Given X = <x1, x2, ..., xm> and Y = <y1, y2, ..., yn>, find a maximum-length common subsequence of X and Y

Example
X = <A,B,C,B,D,A,B>
Y = <B,D,C,A,B,A>
Then, what is/are LCS?
1. BCBA length 4
2. BDAB length 4
3. BCAB length 4
Brute Force Algorithm
The brute-force algorithm compares each subsequence of X with the symbols in Y:
1 For every subsequence of X
2 Is it also a subsequence of Y?
3 If yes, is it longer than the longest common subsequence found so far?
What about Complexity?
If |X| = m and |Y| = n, then there are 2^m subsequences of X; we must compare each with Y (n comparisons)
So the running time of the brute-force algorithm is O(n * 2^m)

Yet More Definitions
Definition 4: Prefix of a sequence
If X = <x1, x2, ..., xm>, the ith prefix of X for i = 0, 1, ..., m is Xi = <x1, x2, ..., xi>
Example
– if X = <A,B,C,D,E,F,H,I,J,L> then X4 = <A,B,C,D> and X0 = <>
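
The three brute-force steps from the Brute Force Algorithm slide can be sketched in Python as follows. This is an illustration under stated assumptions, not code from the slides: the names `is_subsequence` and `lcs_brute_force` are hypothetical, and the sequences are assumed to be Python strings. It enumerates all 2^m index subsets of X and tests each candidate against Y in O(n) time.

```python
from itertools import combinations


def is_subsequence(z, y):
    """True if z is a subsequence of y (single left-to-right scan of y)."""
    it = iter(y)
    # `symbol in it` consumes the iterator up to the first match, so
    # successive symbols of z are matched at increasing positions of y.
    return all(symbol in it for symbol in z)


def lcs_brute_force(x, y):
    """Return one longest common subsequence of strings x and y."""
    best = ""
    for k in range(len(x) + 1):
        # Step 1: every subsequence of x with exactly k symbols.
        for idx in combinations(range(len(x)), k):
            candidate = "".join(x[i] for i in idx)
            # Step 2: is it a subsequence of y?  Step 3: is it longer?
            if len(candidate) > len(best) and is_subsequence(candidate, y):
                best = candidate
    return best
```

For X = ABCBDAB and Y = BDCABA this returns a common subsequence of length 4. The exponential number of candidates makes the approach impractical for anything but short inputs.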
Cost of Optimal Solution
Cost is the length of the common subsequence
We want to pick the longest one
Let c[i,j] be the length of an LCS of the sequences Xi and Yj
Base case: if either prefix is empty (i = 0 or j = 0), then c[i,j] = 0, because the only common subsequence is the empty one

The Recurrence
$$
c[i,j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0 \\
c[i-1,j-1] + 1 & \text{if } i,j > 0 \text{ and } x_i = y_j \\
\max(c[i,j-1],\, c[i-1,j]) & \text{if } i,j > 0 \text{ and } x_i \ne y_j
\end{cases}
$$
LCS recursive solution
$$
c[i,j] =
\begin{cases}
c[i-1,j-1] + 1 & \text{if } x[i] = y[j] \\
\max(c[i,j-1],\, c[i-1,j]) & \text{otherwise}
\end{cases}
$$
When we calculate c[i,j], we consider two cases:
First case: x[i] = y[j]. One more symbol in strings X and Y matches, so the length of the LCS of Xi and Yj equals the length of the LCS of the smaller strings Xi-1 and Yj-1, plus 1
Second case: x[i] != y[j]. As the symbols don't match, our solution is not improved, and the length of LCS(Xi, Yj) is the same as before (i.e. the maximum of LCS(Xi, Yj-1) and LCS(Xi-1, Yj))
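
The two cases translate almost directly into a recursive function. The sketch below is an illustration rather than the slides' own code: the name `lcs_length_recursive` is hypothetical, strings are 0-indexed in Python (so x[i] on the slides becomes `x[i - 1]`), and memoization via `functools.lru_cache` is an addition that keeps the recursion from recomputing the same c[i,j] repeatedly.

```python
from functools import lru_cache


def lcs_length_recursive(x, y):
    """Length of an LCS of x and y, computed top-down from the recurrence."""

    @lru_cache(maxsize=None)  # cache each (i, j) so it is computed only once
    def c(i, j):
        if i == 0 or j == 0:
            return 0  # base case: an empty prefix has only the empty LCS
        if x[i - 1] == y[j - 1]:
            # First case: the last symbols match, extend the LCS by one.
            return c(i - 1, j - 1) + 1
        # Second case: no match, take the better of the two smaller problems.
        return max(c(i, j - 1), c(i - 1, j))

    return c(len(x), len(y))
```

For long inputs the recursion depth can exceed Python's default limit, which is one reason to prefer filling the table iteratively.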
LCS Example (0)
X = ABCB, Y = BDCAB
      j   0   1   2   3   4   5
  i      Yj   B   D   C   A   B
  0  Xi
  1   A
  2   B
  3   C
  4   B
X = ABCB; m = |X| = 4
Y = BDCAB; n = |Y| = 5
Allocate array c[0..4, 0..5] (m+1 = 5 rows, n+1 = 6 columns)

LCS Example (1)
X = ABCB, Y = BDCAB
      j   0   1   2   3   4   5
  i      Yj   B   D   C   A   B
  0  Xi   0   0   0   0   0   0
  1   A   0
  2   B   0
  3   C   0
  4   B   0
for i = 0 to m   c[i,0] = 0
for j = 0 to n   c[0,j] = 0
LCS Example (15)
X = ABCB, Y = BDCAB
      j   0   1   2   3   4   5
  i      Yj   B   D   C   A   B
  0  Xi   0   0   0   0   0   0
  1   A   0   0   0   0   1   1
  2   B   0   1   1   1   1   2
  3   C   0   1   1   2   2   2
  4   B   0   1   1   2   2   3
if ( Xi == Yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Algorithm Running Time
The LCS algorithm calculates the value of each entry of the array c[m,n]
So what is the running time?
O(m*n)
since each c[i,j] is calculated in constant time, and there are m*n elements in the array
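
The table-filling procedure from the example can be sketched in Python as below. This is a minimal illustration, assuming string inputs and 0-based Python indexing; the name `lcs_table` is hypothetical. Row 0 and column 0 are initialised to zero, and every other entry is filled in constant time, which gives the O(m*n) bound.

```python
def lcs_table(x, y):
    """Build the (m+1) x (n+1) table c, where c[i][j] is the LCS length
    of the prefixes x[:i] and y[:j]."""
    m, n = len(x), len(y)
    # Row 0 and column 0 stay 0: an empty prefix has an empty LCS.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1  # symbols match
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])  # no match
    return c


if __name__ == "__main__":
    for row in lcs_table("ABCB", "BDCAB"):
        print(row)
```

Run on X = ABCB and Y = BDCAB, the rows reproduce the completed table of the example, with the LCS length 3 in the bottom-right entry.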
How to find actual LCS
So far, we have just found the length of the LCS, but not the LCS itself.
We want to modify this algorithm to make it output the Longest Common Subsequence of X and Y
Each c[i,j] depends on c[i-1,j] and c[i,j-1], or on c[i-1,j-1]
For each c[i,j] we can say how it was acquired. For example, here
    c[i-1,j-1] = 2    c[i-1,j] = 2
    c[i,j-1]   = 2    c[i,j]   = 3
c[i,j] = c[i-1,j-1] + 1 = 2 + 1 = 3

How to find actual LCS - continued
Remember that
$$
c[i,j] =
\begin{cases}
c[i-1,j-1] + 1 & \text{if } x[i] = y[j] \\
\max(c[i,j-1],\, c[i-1,j]) & \text{otherwise}
\end{cases}
$$
So we can start from c[m,n] and go backwards
Whenever x[i] = y[j] (so c[i,j] = c[i-1,j-1] + 1), remember x[i] (because x[i] is a part of the LCS) and move to c[i-1,j-1]; otherwise move to whichever of c[i-1,j] and c[i,j-1] is larger
When i = 0 or j = 0 (i.e. we reached the beginning), output the remembered letters in reverse order
Finding LCS (2)
      j   0   1   2   3   4   5
  i      Yj   B   D   C   A   B
  0  Xi   0   0   0   0   0   0
  1   A   0   0   0   0   1   1
  2   B   0   1   1   1   1   2
  3   C   0   1   1   2   2   2
  4   B   0   1   1   2   2   3
LCS (reversed order): B C B
LCS (straight order): B C B
(this string turned out to be a palindrome)
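
The backward walk through the table can be sketched in Python as follows. This is an illustrative sketch, not the slides' own code: the names `lcs_table` and `lcs_string` are hypothetical, and the tie-breaking rule (move up when c[i-1][j] >= c[i][j-1]) is an assumption. A different rule may return a different LCS of the same length.

```python
def lcs_table(x, y):
    """c[i][j] = LCS length of the prefixes x[:i] and y[:j]."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c


def lcs_string(x, y):
    """Recover one LCS by walking back from c[m][n] to row or column 0."""
    c = lcs_table(x, y)
    i, j = len(x), len(y)
    letters = []  # collected in reverse order
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            letters.append(x[i - 1])  # this symbol is part of the LCS
            i -= 1
            j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1  # the value came from the entry above
        else:
            j -= 1  # the value came from the entry to the left
    return "".join(reversed(letters))
```

For X = ABCB and Y = BDCAB the walk collects B, C, B and `lcs_string("ABCB", "BDCAB")` returns `"BCB"`, matching the table trace above.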