
Longest Common Subsequence

Dynamic programming is an algorithmic technique used to solve problems recursively by storing solutions to subproblems, avoiding recomputing them. The longest common subsequence (LCS) problem finds the longest subsequence common to two strings and can be solved using dynamic programming. Specifically, it works by building a table where each entry c[i,j] represents the length of the LCS of the prefixes X[i] and Y[j]. The table is built recursively according to rules: if the last characters match, c[i,j] = c[i-1,j-1]+1; otherwise c[i,j] takes the maximum of c[i,j-1] and c[i-

Uploaded by

Raushan Kumar

Dynamic programming: Longest Common Subsequence

Dynamic programming

• It is used when the solution can be recursively described in terms of solutions to subproblems (optimal substructure)
• Algorithm finds solutions to subproblems and stores them in memory for later use
• More efficient than "brute-force methods", which solve the same subproblems over and over again

Longest Common Subsequence (LCS)

Application: comparison of two DNA strings
Ex: X = {A B C B D A B}, Y = {B D C A B A}
Longest Common Subsequence, for example: B C B A
X = A B C B D A B
Y = B D C A B A

LCS Algorithm

• If |X| = m and |Y| = n, then there are 2^m subsequences of X; we must compare each with Y (n comparisons)
• So the running time of the brute-force algorithm is O(n·2^m)
• Notice that the LCS problem has optimal substructure: solutions of subproblems are parts of the final solution
• Subproblems: "find LCS of pairs of prefixes of X and Y"
Longest Common Subsequence

• Definition 1: Subsequence
  Given a sequence
      X = <x_1, x_2, ..., x_m>
  then another sequence
      Z = <z_1, z_2, ..., z_k>
  is a subsequence of X if there exists a strictly increasing sequence <i_1, i_2, ..., i_k> of indices of X such that for all j = 1, 2, ..., k we have x_{i_j} = z_j

Example

X = <A,B,D,F,M,Q>
Z = <B,F,M>
Z is a subsequence of X with index sequence <2,4,5>
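Definition 1 can be checked mechanically by searching for a strictly increasing index sequence. The following is a minimal Python sketch, assuming the sequences are given as strings; the function name `subsequence_indices` is illustrative and does not come from the slides.

```python
def subsequence_indices(z, x):
    """Return 1-based indices i_1 < i_2 < ... < i_k with x[i_j] == z[j],
    or None if z is not a subsequence of x (greedy left-to-right scan)."""
    indices = []
    pos = 0  # next 0-based position in x that may still be used
    for symbol in z:
        # Skip symbols of x until the next occurrence of `symbol`.
        while pos < len(x) and x[pos] != symbol:
            pos += 1
        if pos == len(x):
            return None  # ran out of x: z is not a subsequence
        indices.append(pos + 1)  # the slides count positions from 1
        pos += 1
    return indices


print(subsequence_indices("BFM", "ABDFMQ"))  # [2, 4, 5]
```

The scan always takes the earliest possible match, which is enough to decide whether an index sequence exists at all.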

More Definitions

• Definition 2: Common subsequence
  – Given 2 sequences X and Y, we say Z is a common subsequence of X and Y if Z is a subsequence of X and a subsequence of Y
• Definition 3: Longest common subsequence problem
  – Given X = <x_1, x_2, ..., x_m> and Y = <y_1, y_2, ..., y_n>, find a maximum-length common subsequence of X and Y

Example

X = <A,B,C,B,D,A,B>
Y = <B,D,C,A,B,A>
Then, what is/are LCS?
1. BCBA, length 4
2. BDAB, length 4
3. BCAB, length 4
Brute Force Algorithm

Brute force algorithm would compare each subsequence of X with the symbols in Y:
1. for every subsequence of X
2.     is it also a subsequence of Y?
3.     if yes, is it longer than the longest subsequence found so far?

What about complexity?
If |X| = m and |Y| = n, then there are 2^m subsequences of X; we must compare each with Y (n comparisons).
So the running time of the brute-force algorithm is O(n·2^m).

Yet More Definitions

• Definition 4: Prefix of a sequence
  If X = <x_1, x_2, ..., x_m>, the i-th prefix of X for i = 0, 1, ..., m is X_i = <x_1, x_2, ..., x_i>
• Example
  – if X = <A,B,C,D,E,F,H,I,J,L> then X_4 = <A,B,C,D> and X_0 = <>
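The three brute-force steps can be written out directly. This is only a sketch for small inputs, assuming strings as sequences; `brute_force_lcs` and `is_subsequence` are hypothetical helper names, and `itertools.combinations` is used to enumerate the 2^m index subsets of X.

```python
from itertools import combinations


def is_subsequence(z, y):
    """True if z can be read left to right inside y (one pass over y)."""
    remaining = iter(y)
    return all(symbol in remaining for symbol in z)


def brute_force_lcs(x, y):
    """Try every subsequence of x and keep the longest one found in y."""
    best = ""
    for k in range(len(x) + 1):
        # Each k-element index subset of x defines one subsequence.
        for idx in combinations(range(len(x)), k):
            candidate = "".join(x[i] for i in idx)
            if len(candidate) > len(best) and is_subsequence(candidate, y):
                best = candidate
    return best


print(brute_force_lcs("ABCB", "BDCAB"))  # BCB
```

All 2^m index subsets are visited and each check scans Y once, which matches the O(n·2^m) estimate.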

Optimal Substructure

• Theorem: Optimal Substructure of LCS
  Let X = <x_1, x_2, ..., x_m> and Y = <y_1, y_2, ..., y_n> be sequences and let Z = <z_1, z_2, ..., z_k> be any LCS of X and Y.
  1. If x_m = y_n, then z_k = x_m = y_n and Z_{k-1} is an LCS of X_{m-1} and Y_{n-1}
  2. If x_m ≠ y_n and z_k ≠ x_m, then Z is an LCS of X_{m-1} and Y
  3. If x_m ≠ y_n and z_k ≠ y_n, then Z is an LCS of X and Y_{n-1}

Sub-problem structure

• Case 1
  – if x_m = y_n then there is one sub-problem to solve: find an LCS of X_{m-1} and Y_{n-1} and append x_m
• Case 2
  – if x_m ≠ y_n then there are two sub-problems
    • find an LCS of X_m and Y_{n-1}
    • find an LCS of X_{m-1} and Y_n
    • pick the longer of the two
Cost of Optimal Solution

• Cost is length of the common subsequence
• We want to pick the longest one
• Let c[i,j] be the length of an LCS of the sequences X_i and Y_j
• Base case: if X_i or Y_j is empty (i = 0 or j = 0), then c[i,j] = 0, because the only common subsequence is the empty one

The Recurrence

\[
c[i,j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0 \\
c[i-1,j-1] + 1 & \text{if } i,j > 0 \text{ and } x_i = y_j \\
\max(c[i,j-1],\ c[i-1,j]) & \text{if } i,j > 0 \text{ and } x_i \neq y_j
\end{cases}
\]

Tables used for LCS

• Table c[0..m, 0..n] stores the length c[i,j] of an LCS of X_i and Y_j
• Table b[1..m, 1..n] stores pointers to optimal sub-problem solutions

LCS recursive solution

\[
c[i,j] =
\begin{cases}
c[i-1,j-1] + 1 & \text{if } x[i] = y[j] \\
\max(c[i,j-1],\ c[i-1,j]) & \text{otherwise}
\end{cases}
\]

• We start with i = j = 0 (empty prefixes of X and Y)
• Since X_0 and Y_0 are empty strings, their LCS is always empty (i.e. c[0,0] = 0)
• LCS of empty string and any other string is empty, so for every i and j: c[0,j] = c[i,0] = 0
LCS recursive solution - continued

\[
c[i,j] =
\begin{cases}
c[i-1,j-1] + 1 & \text{if } x[i] = y[j] \\
\max(c[i,j-1],\ c[i-1,j]) & \text{otherwise}
\end{cases}
\]

• When we calculate c[i,j], we consider two cases:
• First case: x[i] = y[j]: one more symbol in strings X and Y matches, so the length of the LCS of X_i and Y_j equals the length of the LCS of the smaller strings X_{i-1} and Y_{j-1}, plus 1
• Second case: x[i] != y[j]: as the symbols don't match, our solution is not improved, and the length of LCS(X_i, Y_j) is the same as before (i.e. the maximum of LCS(X_i, Y_{j-1}) and LCS(X_{i-1}, Y_j))
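The recurrence, including the base case c[i,0] = c[0,j] = 0, translates almost line for line into a memoized recursive function. This is a minimal sketch rather than the slides' own algorithm: the name `lcs_length_recursive` is an assumption, and `functools.lru_cache` stands in for the "store solutions in memory" step.

```python
from functools import lru_cache


def lcs_length_recursive(x, y):
    """Length of an LCS of x and y, computed top-down from the recurrence."""

    @lru_cache(maxsize=None)
    def c(i, j):
        # Base case: an empty prefix has only the empty common subsequence.
        if i == 0 or j == 0:
            return 0
        # First case: last symbols match (x[i-1] is x_i in 1-based terms).
        if x[i - 1] == y[j - 1]:
            return c(i - 1, j - 1) + 1
        # Second case: drop the last symbol of one prefix or the other.
        return max(c(i, j - 1), c(i - 1, j))

    return c(len(x), len(y))


print(lcs_length_recursive("ABCBDAB", "BDCABA"))  # 4
```

Without the cache the same (i, j) pairs would be recomputed many times; with it, each pair is evaluated at most once.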

LCS Length Algorithm

LCS-LENGTH(X, Y)
    m ← length[X]
    n ← length[Y]
    for i ← 1 to m
        do c[i,0] ← 0
    for j ← 0 to n
        do c[0,j] ← 0
    for i ← 1 to m
        do for j ← 1 to n
            do if x_i = y_j
                then c[i,j] ← c[i-1,j-1] + 1
                     b[i,j] ← "↖"
                else if c[i-1,j] ≥ c[i,j-1]
                    then c[i,j] ← c[i-1,j]
                         b[i,j] ← "↑"
                    else c[i,j] ← c[i,j-1]
                         b[i,j] ← "←"
    return c and b

LCS Example

We'll see how LCS algorithm works on the following example:
• X = ABCB
• Y = BDCAB

What is the Longest Common Subsequence of X and Y?

LCS(X, Y) = BCB
X = A B C B
Y = B D C A B
LCS Example (0)
X = ABCB, Y = BDCAB

     j     0   1   2   3   4   5
 i        Yj   B   D   C   A   B
 0   Xi
 1   A
 2   B
 3   C
 4   B

X = ABCB; m = |X| = 4
Y = BDCAB; n = |Y| = 5
Allocate array c[0..4, 0..5] (5 rows, 6 columns)

LCS Example (1)
X = ABCB, Y = BDCAB

     j     0   1   2   3   4   5
 i        Yj   B   D   C   A   B
 0   Xi    0   0   0   0   0   0
 1   A     0
 2   B     0
 3   C     0
 4   B     0

for i = 1 to m: c[i,0] = 0
for j = 0 to n: c[0,j] = 0

LCS Example (15)
X = ABCB, Y = BDCAB

     j     0   1   2   3   4   5
 i        Yj   B   D   C   A   B
 0   Xi    0   0   0   0   0   0
 1   A     0   0   0   0   1   1
 2   B     0   1   1   1   1   2
 3   C     0   1   1   2   2   2
 4   B     0   1   1   2   2   3

if (X_i == Y_j)
    c[i,j] = c[i-1,j-1] + 1
else
    c[i,j] = max(c[i-1,j], c[i,j-1])

LCS Algorithm Running Time

• LCS algorithm calculates the values of each entry of the array c[m,n]
• So what is the running time? O(m*n), since each c[i,j] is calculated in constant time, and there are m*n elements in the array
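The table-filling procedure can be sketched in Python to reproduce the values for X = ABCB, Y = BDCAB. Assumptions in this sketch: the function name `lcs_length` is illustrative, the arrows in table b are stored as the strings "diag", "up" and "left", and row 0 and column 0 of b are left as None because the slides only define b[1..m, 1..n].

```python
def lcs_length(x, y):
    """Fill tables c (lengths) and b (directions) bottom-up.
    Python strings are 0-indexed, so x_i in the slides is x[i - 1] here."""
    m, n = len(x), len(y)
    # Row 0 and column 0 stay 0: the LCS with an empty prefix is empty.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "diag"
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = "up"
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = "left"
    return c, b


c, b = lcs_length("ABCB", "BDCAB")
for row in c:
    print(row)  # last row printed: [0, 1, 1, 2, 2, 3]
```

The two nested loops do a constant amount of work per cell, which is where the O(m*n) bound comes from.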
How to find actual LCS

• So far, we have just found the length of LCS, but not LCS itself.
• We want to modify this algorithm to make it output the Longest Common Subsequence of X and Y

Each c[i,j] depends on c[i-1,j] and c[i,j-1], or on c[i-1,j-1].
For each c[i,j] we can say how it was acquired. For example, in the table fragment

    2   2
    2   3

the lower-right entry was acquired as c[i,j] = c[i-1,j-1] + 1 = 2 + 1 = 3.

How to find actual LCS - continued

• Remember that

\[
c[i,j] =
\begin{cases}
c[i-1,j-1] + 1 & \text{if } x[i] = y[j] \\
\max(c[i,j-1],\ c[i-1,j]) & \text{otherwise}
\end{cases}
\]

• So we can start from c[m,n] and go backwards
• Whenever x[i] = y[j], so that c[i,j] = c[i-1,j-1] + 1, remember x[i] (because x[i] is a part of LCS)
• When i = 0 or j = 0 (i.e. we reached the beginning), output remembered letters in reverse order

Algorithm to print LCS

PRINT-LCS(b, X, i, j)
    if i = 0 or j = 0
        then return
    if b[i,j] = "↖"
        then PRINT-LCS(b, X, i-1, j-1)
             print x_i
        else if b[i,j] = "↑"
            then PRINT-LCS(b, X, i-1, j)
            else PRINT-LCS(b, X, i, j-1)

Finding LCS

     j     0   1   2   3   4   5
 i        Yj   B   D   C   A   B
 0   Xi    0   0   0   0   0   0
 1   A     0   0   0   0   1   1
 2   B     0   1   1   1   1   2
 3   C     0   1   1   2   2   2
 4   B     0   1   1   2   2   3
Finding LCS (2)

     j     0   1   2   3   4   5
 i        Yj   B   D   C   A   B
 0   Xi    0   0   0   0   0   0
 1   A     0   0   0   0   1   1
 2   B     0   1   1   1   1   2
 3   C     0   1   1   2   2   2
 4   B     0   1   1   2   2   3

LCS (reversed order): B C B
LCS (straight order): B C B
(this string turned out to be a palindrome)
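The backward walk from c[m,n] can be sketched without the b table by re-deriving each step from c. This is an illustrative variant, not the PRINT-LCS procedure itself: `lcs_string` is a hypothetical name, it works iteratively rather than recursively, and ties between c[i-1,j] and c[i,j-1] are broken toward c[i-1,j], which is an assumption that affects which LCS is returned when several exist.

```python
def lcs_string(x, y):
    """Return one LCS of x and y by filling c and then walking it backwards."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])

    letters = []  # collected from the end, so in reversed order
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            letters.append(x[i - 1])  # matching symbol belongs to the LCS
            i -= 1
            j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1  # value was copied from the cell above
        else:
            j -= 1  # value was copied from the cell to the left
    return "".join(reversed(letters))


print(lcs_string("ABCB", "BDCAB"))  # BCB
```

The walk visits at most m + n cells, so recovering the string adds little on top of filling the table.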
