Longest Common Subsequence
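Let Σ be an alphabet, and let A and B be two strings over Σ of lengths m and n respectively,
i.e. A=a1,a2,.....,am and B=b1,b2,.....,bn. We have to find a longest common subsequence
of A and B, that is, a longest sequence that can be obtained from both strings by deleting
zero or more symbols.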
Define L[i,j] as
L[i,j] = length of a longest common subsequence of the prefixes a1,a2,.....,ai and b1,b2,.....,bj.
L[i,j] can be defined in terms of the previously computed values
L[i-1,j-1], L[i-1,j] and L[i,j-1] as below.
L[i,j] = 0 if i = 0 or j = 0
L[i,j] = L[i-1,j-1]+1 if i,j > 0 and ai = bj
L[i,j] = Max(L[i,j-1], L[i-1,j]) otherwise.
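Let the matrix C hold these lengths, so that C[i,j] = L[i,j] and C[m,n] is the length of a
longest common subsequence of a1,a2,.....,am and b1,b2,.....,bn. C also has a row 0 and a
column 0 for the empty prefixes. Let the m×n matrix D record, for each entry of C, which
neighbouring entry it was computed from; the longest common subsequence can be read off
from D. If D[i,j] = "↖" then ai (which is equal to bj) is part of the longest common
subsequence.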
The algorithm is given below.
Method
Input: Two arrays A[m] and B[n].
Output: Two m×n matrices C and D.
LCS(A,B)
m = length[A];
n = length[B];
for i ← 0 upto m
    C[i,0] ← 0;
endfor
for j ← 1 upto n
    C[0,j] ← 0;
endfor
for i ← 1 upto m
    for j ← 1 upto n
        if(ai = bj)
            C[i,j] ← C[i-1,j-1]+1;
            D[i,j] ← "↖";
        else if(C[i-1,j] ≥ C[i,j-1])
            C[i,j] ← C[i-1,j];
            D[i,j] ← "↑";
        else
            C[i,j] ← C[i,j-1];
            D[i,j] ← "←";
        endif
    endfor
endfor
return C and D
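The table-filling method above can be sketched in Python as follows. This is only an illustrative sketch: the function name lcs_tables and the direction markers DIAG, UP and LEFT are assumptions standing in for the three arrows stored in D, and Python strings are 0-based, so ai corresponds to a[i - 1].

```python
DIAG, UP, LEFT = "diag", "up", "left"  # assumed stand-ins for the arrows in D


def lcs_tables(a, b):
    """Return (C, D) for sequences a and b, using 1-based table indices."""
    m, n = len(a), len(b)
    # Row 0 and column 0 stand for the empty prefix: C stays 0, D stays None.
    C = [[0] * (n + 1) for _ in range(m + 1)]
    D = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:            # ai = bj
                C[i][j] = C[i - 1][j - 1] + 1
                D[i][j] = DIAG
            elif C[i - 1][j] >= C[i][j - 1]:    # value comes from the row above
                C[i][j] = C[i - 1][j]
                D[i][j] = UP
            else:                               # value comes from the left
                C[i][j] = C[i][j - 1]
                D[i][j] = LEFT
    return C, D


C, D = lcs_tables("ABCBDAB", "BDCABA")
print(C[7][6])  # 4, the length of an LCS of the two strings
```

The entry C[m][n] holds the final length; D is kept only so that the subsequence itself can be recovered afterwards.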
4.1 Construction Of Longest Common Subsequence:
From the matrix D we can construct the LCS as below.
Method
Input: The m×n matrix D output by the LCS algorithm, and the array A.
Output: Longest common subsequence.
PrintLCS(D,A,i,j)
if(i = 0 or j = 0)
    then return;
endif
if(D[i,j] = "↖")
    then PrintLCS(D,A,i-1,j-1);
         print A[i];
    else if(D[i,j] = "↑")
        then PrintLCS(D,A,i-1,j);
        else PrintLCS(D,A,i,j-1);
    endif
endif
The initial call is PrintLCS(D,A,m,n).
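The traceback through D can also be written as a loop instead of a recursion. The following Python sketch does this; the function name lcs_from_directions and the markers DIAG, UP and LEFT are assumptions standing in for the three arrows, and the small table D is filled in by hand for A = "ACB" and B = "AB" purely for illustration.

```python
DIAG, UP, LEFT = "diag", "up", "left"  # assumed stand-ins for the arrows in D


def lcs_from_directions(D, a, i, j):
    """Walk D back from cell (i, j) and return the LCS as a list of symbols."""
    out = []
    while i > 0 and j > 0:
        if D[i][j] == DIAG:      # ai is part of the LCS
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif D[i][j] == UP:
            i -= 1
        else:                    # LEFT
            j -= 1
    out.reverse()                # symbols were collected back to front
    return out


# Hand-filled D for A = "ACB", B = "AB"; row 0 and column 0 are unused.
A = "ACB"
D = [
    [None, None, None],
    [None, DIAG, LEFT],
    [None, UP,   UP],
    [None, UP,   DIAG],
]
print("".join(lcs_from_directions(D, A, 3, 2)))  # AB
```

Each step decreases i or j, so the walk visits at most m+n cells.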
4.2 Complexity analysis
4.2.1 Time Complexity
From the above LCS algorithm it is clear that the time complexity is Θ(m*n), as
there are two nested for loops of lengths m and n and each iteration takes constant time.
4.2.2 Space Complexity
As we need to store the m×n matrices C and D, the space complexity is also Θ(m*n).
However, if only the length of the LCS is required, the space for the matrix C can be
reduced. Computing row i needs only row i-1 and row i itself (or the corresponding two
columns, whichever is shorter). This improves the space complexity to Θ(Min(m,n)).
Reconstructing the subsequence with PrintLCS still requires the full matrix D.
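A minimal Python sketch of the reduced-space idea, keeping only the previous and the current row of C and computing the length only. The function name lcs_length_two_rows is an assumption made for this example.

```python
def lcs_length_two_rows(a, b):
    """Length of an LCS of a and b, storing only two rows of the table."""
    if len(b) > len(a):
        a, b = b, a               # make b the shorter sequence
    prev = [0] * (len(b) + 1)     # row i-1 of C
    for x in a:
        curr = [0]                # row i of C; column 0 is always 0
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr               # row i becomes the previous row
    return prev[-1]


print(lcs_length_two_rows("ABCBDAB", "BDCABA"))  # 4
```

Swapping the arguments so that the rows run along the shorter string is what gives the Min(m,n) bound; the running time is unchanged.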
Why might we want to solve the longest common subsequence problem? There are
several motivating applications.
a) Molecular biology.
DNA sequences can be represented as strings over the four letters A, C, G and T,
corresponding to the four submolecules forming DNA. When biologists find a new sequence,
they typically want to know what other sequences it is most similar to. One way of
computing how similar two sequences are is to find the length of their longest common
subsequence.
b) File comparison.
The Unix program "diff" is used to compare two different versions of the same file, to
determine what changes have been made to the file. It works by finding a longest
common subsequence of the lines of the two files; any line in the subsequence has not
been changed, so what it displays is the remaining set of lines that have changed. In this
instance of the problem we should think of each line of a file as being a single
character in a string.
c) Screen redisplay.
Many text editors like "emacs" display part of a file on the screen, updating the screen
image as the file is changed. For slow dial-in terminals, these programs want to send the
terminal as few characters as possible to update its display. It is
possible to view the computation of the minimum length sequence of characters needed
to update the terminal as being a sort of common subsequence problem (the common
subsequence tells you the parts of the display that are already correct and don't need to be
changed).
Brute-force methods:-
We first solve the LCS problem by a brute-force method. Consider two strings, for example:
  subsequence
  |||||
opsubset
If we draw lines connecting the letters in the first string to the corresponding letters in the
second, no two lines cross (the top and bottom endpoints occur in the same order, the
order of the letters in the subsequence). Conversely, any set of lines drawn like this,
without crossings, represents a common subsequence.
On the other hand, suppose that, like the example above, the two first characters differ.
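Then it is not possible for both of them to be part of a common subsequence - one or the
other (or both) will have to be removed.
Finally, observe that once we've decided what to do with the first characters of the
strings, the remaining subproblem is again a longest common subsequence problem on two
shorter strings, so it can be solved recursively.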
Recursive LCS:
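A minimal Python sketch of this recursion on the first characters of the two strings. The function name lcs_length_recursive is an assumption made for this example, and string slicing is used only to keep the sketch short.

```python
def lcs_length_recursive(a, b):
    """Naive recursive LCS length, deciding on the first characters each time."""
    if not a or not b:
        return 0                  # an empty string has no common subsequence
    if a[0] == b[0]:
        # Matching first characters can both be kept.
        return 1 + lcs_length_recursive(a[1:], b[1:])
    # Otherwise at least one of the two first characters must be dropped.
    return max(lcs_length_recursive(a[1:], b), lcs_length_recursive(a, b[1:]))


print(lcs_length_recursive("ABCBDAB", "BDCABA"))  # 4
```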
This is a correct solution but it's very time consuming. For example, if the two strings
have no matching characters, so the last line always gets executed, the time bounds