
ECS 224 Homework 2

Tal Levy October 18, 2011

Problem 1.
The use of the suffix tree in the algorithm discussed was to traverse the suffixes of the codewords in such a way that one can take note of a suffix that is a prefix of a codeword, or of a codeword that is a prefix of a suffix. A Generalized Suffix Tree (GST) was used in the discussion.

Claim: a generalized suffix array (GSA) and an LCP array are sufficient to construct the L1 and L2 lists, which the algorithm then processes to extract the type 1 and type 2 edges in the Unique Decipherability algorithm.

Given codewords C = {c1, c2, ..., ck}, we construct a generalized suffix array by simply constructing a suffix array, but placing $ delimiters between codewords to distinguish which suffix belongs to which codeword. Using the suffix array, we apply the LCP-building algorithm to construct the LCP array. Now we can mimic the effect of a DFS traversal through a suffix tree by traversing the LCP array and the GSA in a linear fashion, because the elements of the GSA are the leaves sorted in the order of a lexicographic DFS traversal through the GST. There are two traversals to do: one for the L1 rules and one for the L2 rules.

L1: The original algorithm looks for nodes with edges labeled $, signifying the end of a suffix; this is equivalent to finding a consecutive pair of suffixes in the GSA with LCP value greater than 0. The first suffix in this computation is the leaf (i, j) the algorithm discusses, and you simply push it onto a stack for processing. When you reach a whole codeword, you look at what is on the stack and add the codeword to the L1 list for that suffix element, just as the algorithm describes. You can do this without worrying about location, because the suffix array is already sorted as a specific DFS of the suffix tree. When you reach an LCP value of 0, you know you are starting a new branch and can pop the previous pointers from the stack, because they are no longer valid prefixes of the suffixes to come.

L2: Just as for L1 list building, you now stack suffixes that represent whole codewords, of the form (k, 1), where k is a codeword. When you reach a suffix in the array (by linear traversal) of the form (i, j), add k to the L2 list of (i, j). This indicates that the codeword k is a prefix of the suffix spelled out by the path to the leaf with the given label.

Note: you can tell where you are in the tree by observing the LCP values. A value of 0 means a new, separate branch; a value greater than 0 means the same branch, where the first suffix contributing to the value sits at a smaller depth in the tree (it has an edge labeled $ hanging off it). Now the lists of the algorithm are constructed, and you can complete the algorithm as specified by processing the L1 and L2 lists. The suffix tree and its node pointers were used only to construct the two lists; we have done the same using the suffix array and its properties.
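As a concrete illustration, here is a minimal Python sketch of the scan described above (not the original code: the function name, 0-based positions within codewords, and the naive GSA construction by direct sorting are assumptions made for readability; the linear-time bounds discussed above would instead use linear-time GSA/LCP construction and the precomputed LCP array).

def build_L1_L2(codewords):
    # Entries: (suffix text, codeword id, start position), 0-based positions;
    # position 0 marks a whole codeword, positions > 0 mark proper suffixes.
    entries = []
    for k, c in enumerate(codewords):
        for j in range(len(c)):
            entries.append((c[j:], k, j))
    entries.sort()                       # generalized suffix array (naive sort for clarity)

    def lcp(a, b):                       # LCP of adjacent entries, computed directly here
        l = 0
        while l < len(a) and l < len(b) and a[l] == b[l]:
            l += 1
        return l

    L1 = {}      # L1[(i, j)]: codewords that proper suffix j of codeword i is a prefix of
    L2 = {}      # L2[(i, j)]: codewords that are a prefix of proper suffix j of codeword i
    stack = []   # entries known to be prefixes of every suffix still to come on this branch
    prev = None
    for s, k, j in entries:
        if prev is not None:
            l = lcp(prev, s)
            # pop entries that are no longer prefixes of the suffixes to come
            while stack and len(stack[-1][0]) > l:
                stack.pop()
        for ps, pk, pj in stack:         # every stacked entry is a prefix of the current one
            if pj == 0 and j > 0:        # whole codeword pk is a prefix of suffix (k, j)
                L2.setdefault((k, j), []).append(pk)
            elif pj > 0 and j == 0:      # proper suffix (pk, pj) is a prefix of codeword k
                L1.setdefault((pk, pj), []).append(k)
        stack.append((s, k, j))
        prev = s
    return L1, L2

For example, build_L1_L2(["ab", "b", "ba"]) records that the suffix "b" of "ab" is a prefix of the codewords "b" and "ba", and that the suffix "a" of "ba" is a prefix of "ab".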

Problem 2.
Construct a suffix tree T for the message M. For each of the n codewords c in C, do a pattern match for c in T. Each codeword search through T can be done in O(|c| + k) time, where k is the number of occurrences of the codeword c in M. Running through all of the codewords therefore takes O(m + n|M|) time, where m is the total length of all the codewords being searched, because the number of occurrences of each codeword is bounded by the length of M. The pattern match itself is simply a walk down the suffix tree until the codeword is completely matched; if a mismatch occurs, then no occurrence of the pattern exists in M. Now that all occurrences and their locations in M are accounted for, one can apply the greedy algorithm for the interval scheduling problem. We essentially have one interval (the message M) and sub-intervals containing the matched codewords found in M. The reason we have this complexity is that if a codeword i is a substring of codeword j, a single greedy pass over M is complicated by the existence of multiple possible parses. The interval scheduling problem can be solved in O(|M| log |M|) time; the log factor appears because the greedy algorithm must sort the codeword matches by their index in M, where matches near the front of M come before matches with larger starting indices, and there are at most |M| codeword matches to fit uniquely in the message M. We can speed this up by using bucket sort to order the matches by index, reducing this step to O(|M|) and leaving a total time of O(m + (n + 1)|M|) = O(m + n|M|) to find the unique sequence of codewords in M.
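A small sketch of the bucket-sort speed-up mentioned above (the helper name and the assumption that matches have already been collected as (start index, codeword id) pairs from the suffix-tree searches are mine):

def order_matches_by_start(matches, message_len):
    # Counting/bucket sort of codeword occurrences by starting index in M:
    # O(|M| + number of matches) instead of a comparison sort's O(|M| log |M|).
    buckets = [[] for _ in range(message_len)]
    for start, codeword_id in matches:
        buckets[start].append((start, codeword_id))
    # Concatenate the buckets in order of increasing start index.
    return [m for bucket in buckets for m in bucket]

The greedy pass over M then consumes this ordered list from left to right.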

Problem 3.
3(a)
Just as the KS algorithm looks at the first 3 characters of each suffix and splits the suffixes into those with index 1, 2 mod 3 and those with index 0 mod 3, one can do an analogous split with quadruples: pad the string S to a length that is a multiple of four and split the suffixes into those with index 1, 2, 3 mod 4 and those with index 0 mod 4. We can do this by the following steps (similar to KS):

1. Recursively sort the (3/4)n suffixes suf_i with i mod 4 ≠ 0.
This is done by the same method described in the KS algorithm, just with quadruples instead of triples.

2. Sort the (1/4)n suffixes suf_i with i mod 4 = 0 using the result of step (1).
This can be done just as in KS, by performing a radix sort on the tuples (s[i], rank(suf_{i+1})), where rank is the rank of the suffix obtained in step 1.

3. Merge the two sorted arrays.
Just as with KS, we now have all suffixes with i mod 4 ≠ 0 compared, so we can merge in linear time by comparing the first character of suf_i (i mod 4 = 1 or 2) with that of suf_j (j mod 4 = 0), i.e. s[i] against s[j]. If they are unequal, their ordering is clear; otherwise we move to the suffixes starting at the next positions, suf_{i+1} and suf_{j+1}, whose relative order is already known from step 1 (both indices are nonzero mod 4). If instead we compare suf_i with i mod 4 = 3 against suf_j with j mod 4 = 0, we have to compare the first two characters and then rely on the comparison made in step 1 (between suf_{i+2} and suf_{j+2}). With this, just as in the KS algorithm, we can merge in linear time, where each comparison takes constant time; a sketch of this comparison appears below.
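A minimal Python sketch of the constant-time comparison used in step 3 (the function name, the rank array from step 1, and end-of-string padding are assumptions):

def merge_compare(i, j, s, rank):
    # Compare a sample suffix suf_i (i mod 4 != 0) with a non-sample suffix
    # suf_j (j mod 4 == 0). `rank` holds the step-1 ranks of all sample
    # suffixes; positions past the end of s are assumed padded so that
    # s[i + 1], s[i + 2] and the ranks below are always defined.
    # Returns True iff suffix i precedes suffix j.
    if i % 4 in (1, 2):
        # i + 1 and j + 1 are both sample suffixes: one character, then ranks.
        return (s[i], rank[i + 1]) < (s[j], rank[j + 1])
    else:  # i % 4 == 3
        # i + 2 and j + 2 are both sample suffixes: two characters, then ranks.
        return (s[i], s[i + 1], rank[i + 2]) < (s[j], s[j + 1], rank[j + 2])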

With the steps combined, we see that the running time of this new algorithm, like that of the KS algorithm, satisfies the recurrence T(n) = T(3n/4) + O(n), since each step other than the recursive call takes linear time.
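As a quick worked check, unrolling this recurrence gives T(n) <= cn + (3/4)cn + (3/4)^2 cn + ... = cn / (1 - 3/4) = 4cn = O(n), whereas the KS recurrence T(n) = T(2n/3) + O(n) sums to cn / (1 - 2/3) = 3cn. Both are linear, but the mod-4 split leaves a larger constant.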

3(b)
Given the analysis done in part (a), the KS algorithm is superior in running time because of the deterioration of the recursion in step 1: the mod-4 sample shrinks the subproblem to 3/4 of the input rather than 2/3, so the recursion decays more slowly. The splitting that the KS algorithm does is superior to the other variants of their algorithm with different splitting constants.

Problem 4.
4(a)
The original algorithm computes the h(w) values by counting how many times the lca() computation lands at each node w. We can replace this step by using the LCP array of the suffix tree/array. For each list Li, look up the LCP value for each pair of consecutive elements in the list. These lists are ordered by how they are collected, so the relevant LCP value can be read from the array. With this value you can descend the tree to the node whose string depth equals the LCP of the two elements, and increment h(w) by one, where w is the node reached this way, because the LCP value between two suffixes is essentially an equivalent representation of their LCA.
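A minimal sketch of the fact being used (the helper name and the convention lcp[t] = LCP(SA[t-1], SA[t]) are assumptions): the string depth of lca(leaf_i, leaf_j) equals the minimum LCP value over the range between the two suffixes' ranks in the suffix array, so each lca() call can be answered from the LCP array (in O(1) after range-minimum preprocessing; a plain scan is shown here).

def lca_string_depth(rank_i, rank_j, lcp):
    # String depth (path-label length) of the lowest common ancestor of the
    # leaves for two distinct suffixes, given their ranks in the suffix array
    # and the LCP array with lcp[t] = LCP(SA[t - 1], SA[t]).
    lo, hi = min(rank_i, rank_j), max(rank_i, rank_j)
    return min(lcp[lo + 1 : hi + 1])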

4(b)

Problem 5.
To solve the longest common substring problem, you can simply construct a generalized suffix tree (GST) for the pair of strings S and S'. The GST is constructed so that identical suffixes from both strings (those with the same path label) end at the same leaf. A solution can be found by traversing the tree in DFS order while keeping track of the string depth of each node: whenever an internal node has leaf descendants carrying suffixes from both strings, mark the node and record its depth; its path label is a prefix of suffixes of both strings, i.e. a substring shared by S and S'. The largest string depth among these marked nodes is the length of the longest common substring, and each marked node of this depth spells out a longest common substring of S and S'. The running time of this algorithm is linear in the traversal of the GST, O(|S| + |S'|).

Because all the children of such a node (one whose path label is a longest common substring) are leaves labeled by suffixes from both strings, some consecutive pair of suffixes in a generalized suffix array (GSA), one from each string, will share that substring as a common prefix. The length of this common prefix can be read from the LCP array of the GSA: when two consecutive suffixes come from different strings, the LCP value is exactly the length of their longest common prefix, which is a common substring of S and S'. To find all of the longest common substrings, one can simply do a linear scan through the LCP array looking for the largest value attained between two consecutive suffixes coming from different strings. The starting positions of these suffixes reveal where the common substring occurs in each string, and the LCP value gives its length. Just like the tree-traversal variant of this problem, this algorithm runs in O(|S| + |S'|) time.
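A minimal Python sketch of the array-based variant just described (the function name, the separator characters, and the naive suffix-array construction by direct sorting are simplifications for readability; the linear bound above assumes linear-time SA/LCP construction):

def longest_common_substring(s, t):
    text = s + "\x01" + t + "\x02"      # assumed: separators not occurring in s or t
    n = len(text)
    sa = sorted(range(n), key=lambda i: text[i:])   # generalized suffix array (naive)

    def owner(i):                        # which input string a suffix starts in
        return 0 if i < len(s) else 1

    best_len, best_pos = 0, 0
    for r in range(1, n):
        i, j = sa[r - 1], sa[r]
        if owner(i) == owner(j):
            continue                     # only pairs with one suffix from each string matter
        # LCP of the two consecutive suffixes (computed directly in this sketch)
        l = 0
        while i + l < n and j + l < n and text[i + l] == text[j + l]:
            l += 1
        if l > best_len:
            best_len, best_pos = l, min(i, j)
    return text[best_pos:best_pos + best_len]

For example, longest_common_substring("xabcy", "zabcw") returns "abc".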

Problem 6.
Each suffix in the array is lexicographically smaller than the next, so you can construct the tree by adding the suffixes one by one in the order given by the suffix array. Initially, add the first (lexicographically smallest) suffix, which for a $-terminated string is the suffix consisting of $ alone. For each subsequent suffix, walk back up along the rightmost path of the tree using the LCP value between it and the previously inserted suffix: the LCP value tells you the string depth at which the new suffix must branch off, possibly splitting an edge at that depth, and the new leaf is attached at that point. Each such move is simple arithmetic on string depths, and each node is walked over only a constant number of times before it leaves the rightmost path for good, because the suffixes are appended in sorted order. So the construction of the tree, with the LCP array guiding the positioning, can be done in linear time. A sketch of this construction appears below.
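A minimal Python sketch of this construction (the node representation, 0-based indexing, and the convention lcp[r] = LCP(SA[r-1], SA[r]) are assumptions made for readability):

class Node:
    def __init__(self, depth):
        self.depth = depth        # string depth: length of the path label
        self.children = []        # child nodes, left to right
        self.suffix = None        # starting index of the suffix, for leaves

def suffix_tree_from_sa(s, sa, lcp):
    # s is assumed to end in a unique terminal symbol such as '$'.
    # sa[r] is the start of the r-th smallest suffix; lcp[r] = LCP(s[sa[r-1]:], s[sa[r]:]).
    n = len(s)
    root = Node(0)
    leaf = Node(n - sa[0]); leaf.suffix = sa[0]
    root.children.append(leaf)
    stack = [root, leaf]          # the rightmost path, ordered by increasing depth
    for r in range(1, n):
        d = lcp[r]
        last = None
        # Climb the rightmost path until its deepest node has depth <= d.
        while stack[-1].depth > d:
            last = stack.pop()
        if stack[-1].depth < d:
            # Split: insert an internal node of depth d between stack[-1] and last.
            mid = Node(d)
            stack[-1].children.pop()          # detach last (always the rightmost child) ...
            stack[-1].children.append(mid)    # ... and hang the new node in its place
            mid.children.append(last)
            stack.append(mid)
        # Attach the new leaf for suffix sa[r] and extend the rightmost path.
        leaf = Node(n - sa[r]); leaf.suffix = sa[r]
        stack[-1].children.append(leaf)
        stack.append(leaf)
    return root

Edge labels stay implicit: the edge into a node of depth d2 from a parent of depth d1 spells s[x + d1 : x + d2] for the starting index x of any leaf below it.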
