exp 7
Aim: - Implementation of longest common subsequences for string matching using Dynamic
Programming.
Lab Outcomes : CSL 401.3: Ability to implement the algorithm using the Dynamic Programming
approach.
Theory:-
The longest common subsequence (LCS) is defined as the longest subsequence that is common to all
the given sequences, provided that the elements of the subsequence are not required to occupy
consecutive positions within the original sequences.
If S1 and S2 are the two given sequences, then Z is a common subsequence of S1 and S2 if Z is a
subsequence of both S1 and S2. Furthermore, the elements of Z must correspond to strictly increasing
sequences of indices in both S1 and S2.
That is, the elements chosen from each original sequence must appear in Z in the same order in which
they occur in that sequence (ascending index order).
If S1 = {B, C, D, A, A, C, D} and S2 = {A, C, D, B, A, C}
Then, {A, D, B} cannot be a subsequence of S1, as its elements do not appear in S1 in that order (i.e.
their indices are not strictly increasing).
The common subsequences include {B, C}, {C, D, A, C}, {D, A, C}, {A, A, C}, {A, C}, {C, D}, ...
Among these subsequences, {C, D, A, C} is the longest common subsequence. We are going to find
this longest common subsequence using dynamic programming.
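The ordering requirement on a subsequence can be checked mechanically. The following is a minimal Python sketch and not part of the original experiment: the function name is_subsequence is chosen here for illustration, and S1 is written as a string of characters.

```python
def is_subsequence(candidate, sequence):
    """Return True if candidate can be obtained from sequence by deleting elements."""
    it = iter(sequence)
    # `item in it` consumes the iterator up to and including the first match,
    # so every later match must lie at a strictly larger index.
    return all(item in it for item in candidate)


S1 = "BCDAACD"
print(is_subsequence("ADB", S1))   # False: no B occurs after the D
print(is_subsequence("CDAC", S1))  # True: indices 1, 2, 3, 5 are increasing
```

The elements need not be adjacent in S1; only their relative order matters.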
Let two sequences be defined as follows: X = (x1, x2, ..., xm) and Y = (y1, y2, ..., yn). The prefixes of X are
X1, X2, ..., Xm; the prefixes of Y are Y1, Y2, ..., Yn. Let LCS(Xi, Yj) represent the set of longest common
subsequences of the prefixes Xi and Yj. This set of sequences is given by the following.
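In its standard textbook form, the recurrence can be written as below. This is a sketch under stated assumptions: the symbol \frown (appending the element x_i to every sequence in the set), the operator name "longest" and the length table c are notation assumed here, and the cases environment needs the amsmath package.

```latex
\[
LCS(X_i, Y_j) =
\begin{cases}
\emptyset & \text{if } i = 0 \text{ or } j = 0, \\
LCS(X_{i-1}, Y_{j-1}) \frown x_i & \text{if } i, j > 0 \text{ and } x_i = y_j, \\
\mathrm{longest}\{ LCS(X_i, Y_{j-1}),\ LCS(X_{i-1}, Y_j) \} & \text{if } i, j > 0 \text{ and } x_i \neq y_j.
\end{cases}
\]

% Length-only form: c[i][j] is the length of an LCS of X_i and Y_j.
\[
c[i][j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0, \\
c[i-1][j-1] + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j, \\
\max\bigl(c[i][j-1],\ c[i-1][j]\bigr) & \text{if } i, j > 0 \text{ and } x_i \neq y_j.
\end{cases}
\]
```

The second form stores only lengths, which is what a dynamic programming table holds; the subsequence itself is recovered afterwards by tracing back through the table.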
Algorithm
Example:-
1. Create a table of dimension (m+1) x (n+1), where m and n are the lengths of X and Y respectively. The
first row and the first column are filled with zeros.
2. If the characters corresponding to the current row and the current column match, fill the
current cell by adding one to the diagonal element. Point an arrow to the diagonal cell.
3. Otherwise, fill the current cell with the maximum of the values in the previous row (the cell above)
and the previous column (the cell to the left). Point an arrow to the cell with the maximum value. If
they are equal, point to either of them.
4. To find the longest common subsequence, start from the last cell (bottom-right) and follow the direction
of the arrows. The elements at cells with a diagonal arrow, read in reverse order of the traversal, form
the longest common subsequence.
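The table-filling and arrow-following steps above can be sketched in Python as follows. This is an illustrative implementation under assumptions, not the lab's reference code: the function name lcs is hypothetical, the arrows are not stored but re-derived from the table values during the traceback, ties are resolved by moving up, and the second input string in the usage line is chosen for illustration.

```python
def lcs(x, y):
    """Return one longest common subsequence of strings x and y."""
    m, n = len(x), len(y)
    # table[i][j] holds the LCS length of x[:i] and y[:j];
    # row 0 and column 0 stay zero (empty prefix).
    table = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                # Match: extend the diagonal neighbour by one.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # No match: carry over the larger of "up" and "left".
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Traceback from the bottom-right cell, following the implied arrows.
    i, j = m, n
    collected = []
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            collected.append(x[i - 1])  # diagonal arrow: element is in the LCS
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1  # arrow points up (also used when the values are equal)
        else:
            j -= 1  # arrow points left

    # Elements were collected from last to first, so reverse them.
    return "".join(reversed(collected))


print(lcs("BCDAACD", "ACDBAC"))  # CDAC
```

Filling the table takes one pass over all (m+1)(n+1) cells, and the traceback takes at most m+n steps. A different tie-breaking rule may return a different subsequence of the same length.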
How is a dynamic programming algorithm more efficient than the recursive algorithm while
solving an LCS problem?
The method of dynamic programming reduces the number of function calls. It stores the result of each
function call so that it can be used in future calls without the need for redundant calls.
In the above dynamic programming algorithm, the results obtained from each comparison between elements of X and
the elements of Y are stored in a table so that they can be used in future computations.
So, the time taken by the dynamic programming approach is the time taken to fill the table (i.e. O(mn)), whereas the
naive recursive algorithm takes exponential time, O(2^(m+n)) in the worst case.
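The difference can be observed by counting function calls. The sketch below is illustrative only: the function names lcs_length_naive and lcs_length_memo and the two test strings are assumptions made here, and memoization through functools.lru_cache stands in for the explicit table.

```python
from functools import lru_cache


def lcs_length_naive(x, y):
    """Plain recursion; returns (LCS length, number of calls made)."""
    calls = 0

    def rec(i, j):
        nonlocal calls
        calls += 1
        if i == 0 or j == 0:
            return 0
        if x[i - 1] == y[j - 1]:
            return rec(i - 1, j - 1) + 1
        # The same (i, j) pairs are reached again and again from here.
        return max(rec(i - 1, j), rec(i, j - 1))

    return rec(len(x), len(y)), calls


def lcs_length_memo(x, y):
    """Same recursion with stored results; returns (length, subproblems solved)."""
    calls = 0

    @lru_cache(maxsize=None)
    def rec(i, j):
        nonlocal calls
        calls += 1  # the body runs only on a cache miss
        if i == 0 or j == 0:
            return 0
        if x[i - 1] == y[j - 1]:
            return rec(i - 1, j - 1) + 1
        return max(rec(i - 1, j), rec(i, j - 1))

    return rec(len(x), len(y)), calls


# Two strings with no common character force the recursion to branch every time.
a, b = "ABCDEFGH", "IJKLMNOP"
print(lcs_length_naive(a, b))
print(lcs_length_memo(a, b))
```

With stored results, the number of subproblems actually solved can never exceed (m+1)(n+1), which is 81 for these two 8-character strings, while the plain recursion repeats the same subproblems many times over.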
Conclusion: - Thus we have implemented longest common subsequences for string matching using
dynamic programming.