Unit 9 Space and Time Tradeoffs
9.1 Introduction
By now you must be familiar with different methods of problem solving using
algorithms. This unit deals with space and time tradeoffs in algorithms.
Algorithm analysis determines the amount of resources necessary to
execute the algorithm. The vital resources are time and storage. Most
algorithms are constructed to work with inputs of arbitrary length. Usually
the efficiency of an algorithm is stated as a function relating the input length
to the number of steps (time complexity) or storage locations (space
complexity).
A balance has to be maintained between the space and time requirements of computing.
Research in computer science today is directed more towards achieving time
efficiency. With the continuous reduction in memory prices, space availability is
perhaps not a significant problem; in areas such as networking and robotics,
however, memory has to be used conservatively, and the necessity of a balance
becomes apparent.
Objectives:
After studying this unit you should be able to:
explain the importance of the space-time tradeoff in programming
describe the process of sorting by counting
analyze the process of input enhancement in string matching
define the role of hashing in the space and time tradeoff
explain the B-tree technique with respect to the space and time tradeoff
9.2 Sorting
Input enhancement is based on preprocessing the problem instance to obtain
additional information that can be used to solve the instance in less time.
Sorting by counting is an example of input enhancement that achieves time efficiency.
First, let us study distribution counting.
9.2.1 Distribution counting
Distribution counting is a sorting method that uses some associated information
about the elements to place each element directly at its final position in an array.
In this method the elements are distributed into positions 0 to (n-1) of the output
array, and the method ensures that no element gets overwritten. The accumulated sum
of frequencies, called the distribution in statistics, is used to place the elements
at their proper positions; therefore this method of sorting is called distribution
counting.
Figure 9.1: Array a[ ] = 6 4 2 6 4 4 2
Figure 9.1 depicts the values of array a[ ]. Observing the array we find that it
contains only the distinct values {2, 4, 6}. We count the frequency of each element
and compute its distribution value; the distribution value gives the position of the
last occurrence of the corresponding element in the sorted array (counting positions
from 1). These values are given in table 9.1.
Table 9.1: Distribution Values of Elements of Array a[ ]
Elements 2 4 6
Frequencies 2 3 2
Distribution values 2 5 7
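The frequencies and distribution values of table 9.1 can be reproduced with a small
Python sketch (the variable names below are illustrative, not from the text):

from collections import Counter

a = [6, 4, 2, 6, 4, 4, 2]
elements = sorted(set(a))                 # distinct elements: [2, 4, 6]
freq = Counter(a)                         # frequency of each element

dist, running = [], 0
for e in elements:
    running += freq[e]                    # accumulated sum of frequencies
    dist.append(running)

print(elements)                           # [2, 4, 6]
print([freq[e] for e in elements])        # [2, 3, 2]
print(dist)                               # [2, 5, 7]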
We use two arrays, as shown in figure 9.2: one, called Dval[0..2], stores the
distribution values, and the other, called b[0..6], receives the sorted list of
elements. We then scan the unsorted list from right to left. The element scanned
first is a[6] = 2 and its distribution value is Dval[0] = 2, so 2 is placed at
position 2 - 1 = 1, that is, at b[1], and Dval[0] is decremented to 1. Next a[5] = 4
is scanned; its distribution value is Dval[1] = 5, so 4 is placed at 5 - 1 = 4, that
is, at b[4]. These two steps are depicted in figures 9.2 and 9.3.
Figures 9.2 and 9.3: Arrays Dval[0..2] (2 5 7, then 1 5 7) and b[0..6] after placing
the first two scanned elements (b[1] = 2, b[4] = 4)
Now decrement the distribution value of 4: Dval[1] = 5 - 1 = 4. Next a[4], which is
4, is scanned; its distribution value is Dval[1] = 4, so 4 is placed at 4 - 1 = 3,
that is, at b[3]. This is depicted in figure 9.4.
Figure 9.4: Dval[ ] = 1 4 7 and array b[ ] with b[1] = 2, b[3] = 4, b[4] = 4
Scanning continues in the same way through all the elements of a[ ]. In the last
step, a[0], which is 6, is scanned; its distribution value is Dval[2] = 6, so 6 is
placed at 6 - 1 = 5, that is, at b[5]. This is shown in figure 9.5.
Figure 9.5: Dval[ ] = 0 2 6 and the completed sorted array b[ ] = 2 2 4 4 4 6 6
Thus the array is scanned from right to left. Figure 9.6 summarizes all the above
steps.
Dval[0] Dval[1] Dval[2]    Element scanned    Placement
2       5       7          a[6] = 2           b[1] = 2
1       5       7          a[5] = 4           b[4] = 4
1       4       7          a[4] = 4           b[3] = 4
1       3       7          a[3] = 6           b[6] = 6
1       3       6          a[2] = 2           b[0] = 2
0       3       6          a[1] = 4           b[2] = 4
0       2       6          a[0] = 6           b[5] = 6
Figure 9.6: Array Dval[ ] and Array b[ ] After Scanning All the Elements
The drawback of this method is that it depends on the nature of the input and
requires an additional auxiliary array for the sorted list as well as an array for
the distribution values.
If we analyze the algorithm for distribution counting, we can divide the time taken
into four parts, as given in table 9.2. Here A[0..n-1] is the input array, l its
lowest value, D[0..u-1] the array of distribution values (u being the number of
values in the range), and S[0..n-1] the auxiliary array that receives the sorted
list.
Table 9.2: Analysis of the Distribution Counting Algorithm
Loop                                                      Complexity
for j = 0 to u - 1 do D[j] = 0                            Θ(u)
for i = 0 to n - 1 do D[A[i] - l] = D[A[i] - l] + 1       Θ(n)
for j = 1 to u - 1 do D[j] = D[j - 1] + D[j]              Θ(u)
for i = n - 1 downto 0 do                                 Θ(n)
    j = A[i] - l
    S[D[j] - 1] = A[i]
    D[j] = D[j] - 1
Total                                                     Θ(u + n)
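A minimal Python sketch of the algorithm in table 9.2, assuming the lowest and
highest values of the input (called low and high here; these names are illustrative)
are known:

def distribution_counting_sort(a, low, high):
    """Sort list a, whose values lie in [low, high], by distribution counting."""
    d = [0] * (high - low + 1)       # D: frequencies, then distribution values
    for x in a:                      # count frequencies
        d[x - low] += 1
    for j in range(1, len(d)):       # accumulate into distribution values
        d[j] += d[j - 1]
    s = [None] * len(a)              # S: auxiliary array for the sorted list
    for x in reversed(a):            # scan the input from right to left
        d[x - low] -= 1
        s[d[x - low]] = x            # place x at its final position
    return s

print(distribution_counting_sort([6, 4, 2, 6, 4, 4, 2], 2, 6))
# [2, 2, 4, 4, 4, 6, 6]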
9.3 Input enhancement in string matching
String matching searches for a pattern of m characters within a text of n
characters. Input enhancement speeds up the search by precomputing, for the given
pattern, how far it can be shifted along the text after a mismatch. Figure 9.7 gives
an example of such a shift.
Figure 9.7: Example for Pattern Matching (the pattern A R is shifted 3 characters to
the right along the text S H A R P E R)
The number of shifts made while matching the pattern against the text largely
determines the running time. To speed up the algorithm we can precompute the shift
sizes and store them in a table, called the shift table. For a pattern of length m,
the shift size for a character c of the text is computed by the following rule:
If c is not among the first m - 1 characters of the pattern,
Table(c) = m, the pattern's length.
Otherwise,
Table(c) = the distance from the rightmost occurrence of c among the first m - 1
characters of the pattern to the pattern's last character.
The first step in this algorithm is to create the shift table for the pattern as
given in table 9.3.
Table 9.3: Shift Table for the Pattern KIDS
Character  K  I  D  all other characters (including S)
Shift      3  2  1  4
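The shift-table rule above can be sketched in Python; build_shift_table is an
assumed helper name, and the call below reproduces the entries of table 9.3 for the
pattern KIDS:

def build_shift_table(pattern):
    """Shift table: characters among the first m - 1 positions of the pattern
    shift by their distance to the last character; every other character
    shifts by the full pattern length m."""
    m = len(pattern)
    table = {}
    for i in range(m - 1):               # all but the last character
        table[pattern[i]] = m - 1 - i    # a later (rightmost) occurrence overwrites
    return lambda c: table.get(c, m)     # default shift is m

shift = build_shift_table("KIDS")
print([shift(c) for c in "KIDSX"])       # [3, 2, 1, 4, 4]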
Table 9.4 illustrates the way the pattern KIDS is matched against the text
DAVID LOVES KIDS. Scanning is always done from right to left.
Table 9.4: Matching Pattern Against Text

D A V I D   L O V E S   K I D S
K I D S
I ≠ S: the shift table entry for I is 2, hence the pattern is shifted 2 positions to
the right.

D A V I D   L O V E S   K I D S
    K I D S
The mismatched text character (a space) does not occur in the pattern, so the entire
pattern is shifted to the right by its length 4.

D A V I D   L O V E S   K I D S
            K I D S
The mismatched character E is not in the pattern either, so the pattern is again
shifted by 4.

D A V I D   L O V E S   K I D S
                    K I D S
I ≠ S: with reference to the shift table, the pattern is shifted 2 positions to the
right.

D A V I D   L O V E S   K I D S
                        K I D S
All the characters of the pattern match the corresponding characters of the text,
hence the search is declared successful.
Let us next discuss Horspool's algorithm for string matching.
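In outline, the algorithm repeats the right-to-left comparison and table-driven
shift traced in table 9.4. A minimal Python sketch is given below; it reuses the
assumed build_shift_table helper from the earlier sketch, and the function name is
illustrative:

def horspool_search(text, pattern):
    """Return the index of the leftmost occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    shift = build_shift_table(pattern)    # helper from the previous sketch
    i = m - 1                             # text index aligned with the last pattern character
    while i < n:
        k = 0
        while k < m and pattern[m - 1 - k] == text[i - k]:
            k += 1                        # compare from right to left
        if k == m:
            return i - m + 1              # all m characters matched
        i += shift(text[i])               # shift by the table entry of text[i]
    return -1

print(horspool_search("DAVID LOVES KIDS", "KIDS"))   # 12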
The Boyer-Moore algorithm refines this idea by using two precomputed tables: a
bad-character shift table t1 (the same table used by Horspool's algorithm) and a
good-suffix shift table Q. Consider searching for the pattern TORONTO (m = 7) in the
text BHUTTO KNOWS TORONTO. The bad-character table for TORONTO is given in table 9.5
and its good-suffix table in table 9.6.
Table 9.5: Bad-Character Shift Table for the Pattern TORONTO
Character  N  O  R  T  all other characters (including the space)
Shift      2  3  4  1  7
Table 9.6: Good-Suffix Shift Table for the Pattern TORONTO
k     1  2  3  4  5  6
Q(k)  3  5  5  5  5  5
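As a quick check, the bad-character entries of table 9.5 coincide with Horspool's
shift table, so the earlier build_shift_table sketch can reproduce them (this reuse
is an observation, not part of the text):

t1 = build_shift_table("TORONTO")
print([(c, t1(c)) for c in "NORT "])
# [('N', 2), ('O', 3), ('R', 4), ('T', 1), (' ', 7)]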
The pattern is first aligned with the beginning of the text and compared from right
to left:
B H U T T O   K N O W S   T O R O N T O
T O R O N T O
Since space is encountered while matching the pattern with the text, the
pattern is shifted by 7 positions as shown in table 9.7.
Table 9.7: After Shifting the Pattern by 7 Positions
B H U T T O   K N O W S   T O R O N T O
              T O R O N T O
The last character of the pattern, O, is now compared with T in the text. The T
entry in table 9.5 (the bad-character table) is 1, so the pattern is shifted to the
right by 1 position as shown in table 9.8.
Table 9.8: Pattern Shifted to the Right by 1 Position
B H U T T O   K N O W S   T O R O N T O
                T O R O N T O
Scanning from right to left at this alignment, the last two characters of the
pattern match the text, so k = 2, but the next comparison fails because of the blank
place in the text.
The bad-character shift is P = max{t1(c) - k, 1}, where t1(c) is the table 9.5 entry
for the mismatched text character c; here c is the space, so t1(c) = 7 and
P = max{(7 - 2), 1} = 5
Next compute the good-suffix shift Q. As 2 characters of the pattern are matching,
Q(2) = 5
Shift size R = max{P, Q} = max{5, 5} = 5
Therefore we shift the pattern 5 positions ahead as shown in table 9.9.
Table 9.9: Depicting the Pattern in the Text
B H U T T O   K N O W S   T O R O N T O
                          T O R O N T O
All the characters of the pattern now match the text, so the search is successful.
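The shift computation carried out above can be sketched as follows; bm_shift is an
illustrative helper name, it reuses the assumed build_shift_table sketch for the
bad-character table, and it takes the good-suffix values of table 9.6 as a
dictionary:

def bm_shift(t1, q, c, k):
    """Boyer-Moore shift after k matched characters, with mismatched text
    character c: bad-character shift P = max(t1(c) - k, 1); good-suffix
    shift Q = q[k]; overall shift R = P when k = 0, otherwise max(P, Q)."""
    p = max(t1(c) - k, 1)
    return p if k == 0 else max(p, q[k])

t1 = build_shift_table("TORONTO")                 # table 9.5
q = {1: 3, 2: 5, 3: 5, 4: 5, 5: 5, 6: 5}          # good-suffix table 9.6
print(bm_shift(t1, q, ' ', 0))    # first alignment: space, no matches -> 7
print(bm_shift(t1, q, 'T', 0))    # second alignment: T mismatch -> 1
print(bm_shift(t1, q, ' ', 2))    # third alignment: space after 2 matches -> 5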
Activity 2
Compare the collision resolution strategies and list the disadvantages of one
strategy over the other.
Binary search is typically used within a node to find the separation values and the
subtree of interest.
Insertion
To understand the process of insertion, let us consider an empty B-tree of order 5
and insert the following numbers into it: 3, 14, 7, 1, 8, 5, 11, 17, 13. A tree of
order 5 has a maximum of 5 children and 4 keys per node. All nodes other than the
root must have a minimum of 2 keys. Figure 9.9 shows how the first four numbers get
inserted.
To insert the next number, 8, there is no room in the node, so we split it into two
nodes, moving the median item 7 up into a new root node, as depicted in figure 9.10.
Insertion of the next three numbers 5, 11, and 17 proceeds without requiring
any splits. This is illustrated in figure 9.11.
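The insertion process described above can be sketched in Python. This is a minimal
illustration of the insert-into-a-leaf-then-split-around-the-median idea; the class
and method names are assumptions, not from the text. Running the full example,
including the last number 13 (whose insertion causes one more split in this sketch),
leaves a root holding the keys 7 and 13:

import bisect

class BTreeNode:
    def __init__(self, leaf=True):
        self.keys = []        # sorted keys stored in this node
        self.children = []    # child pointers (empty for a leaf)
        self.leaf = leaf

class BTree:
    """Order-5 B-tree: at most 5 children and 4 keys per node."""
    def __init__(self, order=5):
        self.order = order
        self.root = BTreeNode()

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:                  # the root itself overflowed
            median, right = split
            new_root = BTreeNode(leaf=False)   # the median moves up into a new root
            new_root.keys = [median]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, key):
        if node.leaf:
            bisect.insort(node.keys, key)      # place the key in sorted order
        else:
            i = bisect.bisect_left(node.keys, key)
            split = self._insert(node.children[i], key)
            if split is not None:              # a child was split: absorb its median
                median, right = split
                node.keys.insert(i, median)
                node.children.insert(i + 1, right)
        if len(node.keys) < self.order:        # at most order - 1 keys: no overflow
            return None
        return self._split(node)

    def _split(self, node):
        """Split an overfull node around its median; return (median, right half)."""
        mid = len(node.keys) // 2
        right = BTreeNode(leaf=node.leaf)
        median = node.keys[mid]
        right.keys = node.keys[mid + 1:]
        node.keys = node.keys[:mid]
        if not node.leaf:
            right.children = node.children[mid + 1:]
            node.children = node.children[:mid + 1]
        return median, right

tree = BTree(order=5)
for k in [3, 14, 7, 1, 8, 5, 11, 17, 13]:
    tree.insert(k)
print(tree.root.keys)                          # [7, 13]
print([c.keys for c in tree.root.children])    # [[1, 3, 5], [8, 11], [14, 17]]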
Next, consider deleting 18. Even though 18 is in a leaf, we can see from the figure
that this leaf has no key to spare: the deletion would leave a node with only one
key, which is not acceptable for a B-tree of order 5. If the sibling node to the
immediate left or right has an extra key, we can borrow: a key is moved down from
the parent into the deficient node, and a key is moved up from that sibling into the
parent. In this specific case, the sibling to the right has an extra key. So the
successor of 19, namely 23, is moved down from the parent, 24 is moved up, and 19 is
moved to the left so that 23 can be inserted in its place.
Now the parent node contains only one key, 7, which is not acceptable. If this node
had a sibling to its immediate left or right with a spare key, we could again
"borrow" a key. Suppose the right sibling (the node with 17 and 24) had one more
key: we would then move 13 down into the node with too few keys and move 17 up to
where 13 had been, and the nodes containing 14 and 16 would be attached via the
pointer field to the right of 13's new location.
Since in our example there is no way to borrow a key from a sibling, we must again
combine the node with its sibling and move 13 down from the parent. In this case the
tree's height reduces by one. The resulting B-tree is shown in figure 9.17.
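The two repair operations used in the deletion examples above can be sketched
against the BTreeNode structure of the insertion sketch (the helper names are
illustrative; the surrounding deletion logic that locates the key and chooses
between borrowing and merging is not shown):

def borrow_from_right(parent, i):
    """Repair parent's i-th child by borrowing through the parent: the separator
    key moves down into the deficient child and the right sibling's smallest
    key moves up to take its place."""
    child, sibling = parent.children[i], parent.children[i + 1]
    child.keys.append(parent.keys[i])                  # key comes down from the parent
    parent.keys[i] = sibling.keys.pop(0)               # sibling's first key moves up
    if not sibling.leaf:                               # carry the orphaned subtree across
        child.children.append(sibling.children.pop(0))

def merge_with_right(parent, i):
    """When no sibling has a spare key: combine the deficient child, the
    separator key from the parent, and the right sibling into one node."""
    child, sibling = parent.children[i], parent.children[i + 1]
    child.keys.append(parent.keys.pop(i))              # separator moves down
    child.keys.extend(sibling.keys)
    child.children.extend(sibling.children)
    parent.children.pop(i + 1)                         # the right sibling disappears
    # If the parent was the root and is now empty, the tree's height shrinks by one.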
9.6 Summary
Let us summarize the unit here.
We usually analyze the efficiency of an algorithm in terms of its time and
space requirements. Usually the efficiency of an algorithm is stated as a
function relating the input length to the number of steps (time complexity) or
storage locations (space complexity).
Distribution counting is an input enhancement method wherein separate arrays are
used to store information generated during the sorting process, and these arrays
speed up the sorting. Horspool's and Boyer-Moore algorithms are string matching
algorithms wherein the pattern is compared with the text and shifted by a
precomputed shift size, so the searching operation becomes faster.
Hashing is a technique that uses a hash key to find items. A collision occurs when
two items hash to the same value; this is handled by the two collision resolution
methods, separate chaining and open addressing. The large branching factor of the
B-tree technique speeds up access time.
Thus by using all the above techniques, we can reduce the execution time
of an algorithm.
9.7 Glossary
Term Description
Leaf node A leaf node is a node in a tree data structure that has no child
nodes.
Linked list A linked list is a data structure consisting of a sequence of data
records in which each record contains a field that references the next record in
the sequence.
9.9 Answers
Self Assessment Questions
1. Preprocessing
2. Distribution
3. Time efficiency
4. Preprocess
5. Right to left
6. Good suffix and bad character shift
7. Bucket
8. Same hash value
9. Double hashing
10. Branching factor
11. Increases
12. Leaf node
Terminal Questions
1. Refer to 9.2.1 – Distribution counting
2. Refer to 9.4.2 – Collision resolution
3. Refer to 9.3 – Input enhancement in string matching
4. Refer to 9.4.1 – Hash function
5. Refer to 9.5.1 – B-Tree technique