Sequential Data Structures

The document discusses sequential data structures like stacks, queues, arrays, and linked lists. It defines them as abstract data types (ADTs) that specify functionality without specifying implementation. Stacks follow LIFO order and support push(), pop(), and isEmpty(). Queues follow FIFO order and support enqueue(), dequeue(), and isEmpty(). Both stacks and queues can be implemented with arrays or linked lists in O(1) time. Arrays store elements contiguously in memory, while linked lists allocate two or more cells per element to store the element and a pointer to the next element.


Inf2B Algorithms and Data Structures, Note 3 (Informatics 2B, KK1.3)

Sequential Data Structures

In this lecture we introduce the basic data structures for storing sequences of objects. These data structures are based on arrays and linked lists, which you met in first year (you were also introduced to stacks and queues). In this course, we give abstract descriptions of these data structures, and analyse the asymptotic running time of algorithms for their operations.

3.1 Abstract Data Types

The first step in designing a data structure is to develop a mathematical model for the data to be stored. Then we decide which methods we need to access and modify the data. Such a model, together with the methods to access and modify it, is an abstract data type (ADT). An ADT completely determines the functionality of a data structure (what we want it to do), but it says nothing about the implementation of the data structure and its methods (how the data structure is organised in memory, or which algorithms we implement for the methods). Clearly we will be very interested in which algorithms are used, but not at the stage of defining the ADT. The particular algorithms and data structures that get used will influence the running time of the methods of the ADT. The definition of an ADT is something done at the beginning of a project, when we are concerned with the specification¹ of a system.

¹ You will learn about specification in Software engineering courses.

As an example, if we are implementing a dictionary ADT, we will need to perform operations such as look-up(w), where w is a word. We would know that this is an essential operation at the specification stage, before deciding on data structures or algorithms.

A data structure for realising (or implementing) an ADT is a structured set of variables for storing data. On the implementation level and in terms of JAVA, an ADT corresponds to a JAVA interface, and a data structure realising the ADT corresponds to a class implementing the interface. The ADT determines the functionality of a data structure, so an algorithm requiring a certain ADT works correctly with any data structure realising the ADT. Not all methods are equally efficient in the different possible implementations, and the choice of the right one can make a huge difference to the efficiency of an algorithm.

3.2 Stacks and Queues

A Stack is an ADT for storing a collection of elements, with the following methods:

• push(e): Insert element e (at the “top” of the stack).

• pop(): Remove the most recently inserted element (the element on “top”) and return it; an error occurs if the stack is empty.

• isEmpty(): Return TRUE if the stack is empty and FALSE otherwise.

A stack obeys the LIFO (Last-In, First-Out) principle. The Stack ADT is typically implemented by building either on arrays (in general these need to be dynamic arrays, discussed in 3.4) or on linked lists². Both types of implementation are straightforward and efficient, taking O(1) time for any of the three operations listed above (the (dynamic) array case is more involved, see 3.4).

² Do not confuse the two structures. Arrays are by definition contiguous memory cells, giving us efficiency both in terms of memory usage and speed of access. Linked lists do not have to consist of contiguous cells, and for each cell we have to pay not only the cost of storing an item but also the location of the next cell. The disadvantage of arrays is that we cannot be sure of being able to grow them in situ, whereas of course we can always grow a list (subject to memory limitations). Confusing the two things is inexcusable.

A Queue is an ADT for storing a collection of elements that retrieves elements in the opposite order to a stack. The rule for a queue is FIFO (First-In, First-Out). A queue supports the following methods:

• enqueue(e): Insert element e (at the “rear” of the queue).

• dequeue(): Remove the element inserted the longest time ago (the element at the “front”) and return it; an error occurs if the queue is empty.

• isEmpty(): Return TRUE if the queue is empty and FALSE otherwise.

Like stacks, queues can easily be realised using (dynamic) arrays or linked lists. Again, whether we use arrays or linked lists, we can implement a queue so that all operations can be performed in O(1) time.

Stacks and Queues are very simple ADTs, with very simple methods, and this is why we can implement these ADTs so that the methods all run in O(1) time.

3.3 ADTs for Sequential Data

In this section, our mathematical model of the data is a linear sequence of elements. A sequence has well-defined first and last elements. Every element of a sequence except the first has a unique predecessor, while every element except the last has a unique successor³. The rank of an element e in a sequence S is the number of elements before e in S.

³ A sequence can consist of a single element, in which case the first and last elements are identical and of course there are no successor or predecessor elements. In some applications it also makes sense to allow the empty sequence, in which case it does not of course have a first or last element.

The two most natural ways of storing sequences in computer memory are arrays and linked lists. We model the memory as a sequence of memory cells, each of which has a unique address (a 32-bit non-negative integer on a 32-bit machine). An array is simply a contiguous piece of memory, each cell of which stores one object of the sequence stored in the array (or rather a reference to the object). In a singly linked list, we allocate two successive memory cells for each object of the sequence. These two memory cells form a node of the sequence. The first stores the object and the second stores a reference to the next node of the list (i.e., the address of the first memory cell of the next node). In a doubly linked list we not only store a reference to the successor of each element, but


also to its predecessor. Thus each node needs three successive memory cells. Figure 3.1 illustrates how an array, a singly linked list, and a doubly linked list storing the sequence o1, o2, o3, o4, o5 may be located in memory.⁴ Figure 3.2 gives a more abstract view, which is how we usually picture the data.

⁴ This memory model is simplified, but it illustrates the main points.

[Figure 3.1 (diagram omitted). An array, a singly linked list, and a doubly linked list storing o1, o2, o3, o4, o5 in memory.]

[Figure 3.2 (diagram omitted). An array, a singly linked list, and a doubly linked list storing o1, o2, o3, o4, o5.]

The advantage of storing a sequence in an array is that elements of the sequence can be accessed quickly in terms of rank. The advantage of linked lists is that they are flexible, of unbounded size (unlike arrays), and easily allow the insertion of new elements.

We will discuss two ADTs for sequences. Both can be realised using linked lists or arrays, but arrays are (maybe) better for the first, and linked lists for the second.

Vectors

A Vector is an ADT for storing a sequence S of n elements that supports the following methods:

• elemAtRank(r): Return the element of rank r; an error occurs if r < 0 or r > n − 1.

• replaceAtRank(r, e): Replace the element of rank r with e; an error occurs if r < 0 or r > n − 1.

• insertAtRank(r, e): Insert a new element e at rank r (this increases the rank of all following elements by 1); an error occurs if r < 0 or r > n.

• removeAtRank(r): Remove the element of rank r (this reduces the rank of all following elements by 1); an error occurs if r < 0 or r > n − 1.

• size(): Return n, the number of elements in the sequence.

The most straightforward data structure for realising a vector stores the elements of S in an array A, with the element of rank r being stored at index r (assuming that the first element of an array has index 0). We store the length of the sequence in a variable n, which must always be smaller than or equal to A.length. Then the methods elemAtRank, replaceAtRank, and size have trivial algorithms⁵ (cf. Algorithms 3.3–3.5), which take O(1) time.

⁵ We don’t worry about implementation issues such as error handling.

Algorithm elemAtRank(r)
1. return A[r]

Algorithm 3.3

Algorithm replaceAtRank(r, e)
1. A[r] ← e

Algorithm 3.4

Algorithm size()
1. return n // n stores the current length of the sequence, which may be different from the length of A.

Algorithm 3.5

By our general assumption that each line of code only requires a constant number of computation steps, the running time of Algorithms 3.3–3.5 is Θ(1).
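In JAVA, Algorithms 3.3–3.5 translate almost line for line. The sketch below is ours, not part of the notes: the class name ArrayVector is invented, elements are stored as plain Object, and (although footnote 5 ignores error handling) we include the rank checks from the ADT's specification. For brevity, the initial contents are passed to the constructor.

```java
// A sketch (our own naming) of the array-based Vector realisation:
// the element of rank r is stored at index r of A, and n holds the
// current length of the sequence, with n <= A.length at all times.
class ArrayVector {
    private final Object[] A;   // fixed-capacity backing array
    private int n;              // current length of the sequence

    // Initialise from an existing array, leaving spare capacity.
    ArrayVector(Object[] initial) {
        A = new Object[2 * initial.length + 1];
        n = initial.length;
        System.arraycopy(initial, 0, A, 0, n);
    }

    // Algorithm 3.3 -- O(1) time.
    Object elemAtRank(int r) {
        if (r < 0 || r > n - 1) throw new IndexOutOfBoundsException("rank " + r);
        return A[r];
    }

    // Algorithm 3.4 -- O(1) time.
    void replaceAtRank(int r, Object e) {
        if (r < 0 || r > n - 1) throw new IndexOutOfBoundsException("rank " + r);
        A[r] = e;
    }

    // Algorithm 3.5 -- O(1) time; note n may differ from A.length.
    int size() { return n; }
}
```

All three methods touch a constant number of cells, which is exactly why they run in Θ(1) time regardless of n.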


The implementations of insertAtRank and removeAtRank are much less efficient (see Algorithms 3.6 and 3.7). Also, there is a problem with insertAtRank if n = A.length (we will consider this issue properly in § 3.4 on dynamic arrays), but for now we assume that the length of the array A is chosen to be large enough to never fill up. In the worst case the loop of insertAtRank is iterated n times and the loop of removeAtRank is iterated n − 1 times. Hence T_insertAtRank(n) and T_removeAtRank(n) are both Θ(n).

Algorithm insertAtRank(r, e)
1. for i ← n downto r + 1 do
2.   A[i] ← A[i − 1]
3. A[r] ← e
4. n ← n + 1

Algorithm 3.6

Algorithm removeAtRank(r)
1. for i ← r to n − 2 do
2.   A[i] ← A[i + 1]
3. n ← n − 1

Algorithm 3.7

The Vector ADT can also be realised by a data structure based on linked lists. Linked lists do not properly support the access of elements based on their rank. To find the element of rank r, we have to step through the list from the beginning for r steps. This makes all methods required by the Vector ADT quite inefficient, with running time Θ(n).

Lists

Suppose we had a sequence and wanted to remove every element satisfying some condition. This would be possible using the Vector ADT, but it would be inconvenient and inefficient (for the standard implementation of Vector). However, if we had our sequence stored as a linked list, it would be quite easy: we would just step through the list and remove the nodes holding elements satisfying the given condition. Hence we define a new ADT for sequences that abstractly reflects the properties of a linked list: a sequence of nodes that each store an element, have a successor, and (in the case of doubly linked lists) a predecessor. We call this ADT List. Our abstraction of a node is a Position, which is itself an ADT associated with List. The basic methods of List are:

• element(p): Return the element at position p.

• first(): Return the position of the first element; an error occurs if the list is empty.

• isEmpty(): Return TRUE if the list is empty and FALSE otherwise.

• next(p): Return the position of the element following the one at position p; an error occurs if p is the last position.

• isLast(p): Return TRUE if p is the last position of the list and FALSE otherwise.

• replace(p, e): Replace the element at position p with e.

• insertFirst(e): Insert e as the first element of the list.

• insertAfter(p, e): Insert element e after position p.

• remove(p): Remove the element at position p.

List also has methods last(), previous(p), isFirst(p), insertLast(e), and insertBefore(p, e). These methods correspond to first(), next(p), isLast(p), insertFirst(e), and insertAfter(p, e) if we reverse the order of the list; their functionality should be obvious.

The natural way of realising the List ADT is by a data structure based on a doubly linked list. Positions are realised by nodes of the list, where each node has fields previous, element, and next. The list itself stores a reference to the first and last nodes of the list. Algorithms 3.8–3.9 show implementations of insertAfter and remove.

Algorithm insertAfter(p, e)
1. create a new node q
2. q.element ← e
3. q.next ← p.next
4. q.previous ← p
5. p.next ← q
6. q.next.previous ← q

Algorithm 3.8

Algorithm remove(p)
1. p.previous.next ← p.next
2. p.next.previous ← p.previous
3. delete p (done automatically in JAVA by the garbage collector)

Algorithm 3.9
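The two algorithms above can be transcribed directly into JAVA. The sketch below is illustrative rather than definitive: the class name DNode is our own, and, like Algorithms 3.8–3.9 themselves, it assumes every position has both neighbours. In practice this is usually arranged by keeping sentinel header and trailer nodes at the two ends of the list.

```java
// A node of a doubly linked list (our sketch of Algorithms 3.8-3.9).
// Assumes sentinel nodes at both ends, so p.previous and p.next always
// exist for any position p holding a real element.
class DNode<E> {
    E element;
    DNode<E> previous, next;

    DNode(E element) { this.element = element; }

    // Algorithm 3.8: insert e after this position, in Theta(1) time.
    DNode<E> insertAfter(E e) {
        DNode<E> q = new DNode<>(e);   // lines 1-2
        q.next = this.next;            // line 3
        q.previous = this;             // line 4
        this.next = q;                 // line 5
        q.next.previous = q;           // line 6
        return q;
    }

    // Algorithm 3.9: unlink this position, in Theta(1) time. JAVA's
    // garbage collector reclaims the node once it is unreachable.
    void remove() {
        this.previous.next = this.next;        // line 1
        this.next.previous = this.previous;    // line 2
    }
}
```

With sentinels, insertFirst(e) is just header.insertAfter(e), and isEmpty() amounts to checking whether header.next is the trailer.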


The asymptotic running time of Algorithms 3.8–3.9 is Θ(1). It is easy to see that all other methods can also be implemented by algorithms of asymptotic running time Θ(1), but only because we assume that p is given as a direct “pointer” (to the relevant node). An operation such as insertAtRank, or asking to insert into the element’s “sorted” position, would take Ω(n) worst-case running time on a list.

Given the trade-off between using List and Vector for different operations, it makes sense to consider combining the two ADTs into one ADT Sequence, which will support all methods of both Vector and List. For Sequence, both arrays and linked lists are more efficient on some methods than on others. The data structure used in practice would depend on which methods are expected to be used most frequently in the application.

3.4 Dynamic Arrays

There is a real problem with the array-based data structures for sequences that we have not considered so far. What do we do when the array is full? We cannot simply extend it, because the part of the memory following the block where the array sits may be in use for other purposes at the moment. So what we have to do is allocate a sufficiently large block of memory (large enough to hold both the current array and the additional element we want to insert) somewhere else, and then copy the whole array there. This is not efficient, and we clearly want to avoid doing it too often. Therefore, we always choose the length of the array to be a bit larger than the number of elements it currently holds, keeping some extra space for future insertions. In this section, we will see a strategy for doing this surprisingly efficiently.

Concentrating on the essentials, we shall only implement the very basic ADT VeryBasicSequence. It stores a sequence of elements and supports the methods elemAtRank(r) and replaceAtRank(r, e) of Vector and the method insertLast(e) of List. So it is almost like a queue without the dequeue operation.

Our data structure stores the elements of the sequence in an array A. We store the current size of the sequence in a variable n and let N be the length of A. Thus we must always have N ≥ n. The load factor of our array is defined to be n/N. The load factor is a number between 0 and 1 indicating how much space we are wasting: if it is close to 1, most of the array is filled by elements of the sequence and we are not wasting much space. In our implementation, we will always maintain a load factor of at least 1/2.

The methods elemAtRank(r) and replaceAtRank(r, e) can be implemented as for Vector by algorithms of running time Θ(1). Consider Algorithm 3.10 for insertions. As long as there is room in the array, insertLast simply inserts the element at the end of the array. If the array is full, a new array of length 2(N + 1) (twice the length of the old array, plus room for the new element) is created. Then the old array is copied to the new one and the new element is inserted at the end.

Algorithm insertLast(e)
1. if n < N then
2.   A[n] ← e
3. else // n = N, i.e., the array is full
4.   N ← 2(N + 1)
5.   Create new array A′ of length N
6.   for i = 0 to n − 1 do
7.     A′[i] ← A[i]
8.   A′[n] ← e
9.   A ← A′
10. n ← n + 1

Algorithm 3.10

Amortised Analysis

By letting the length of the new array be at most twice the number of elements it currently holds, we guarantee that the load factor is always at least 1/2. Unfortunately, the worst-case running time of inserting one element into a sequence of size n is Θ(n), because we need Ω(n) steps to copy the old array into the new one in lines 6–7. However, we only have to do this copying phase occasionally, and in fact we do it less frequently as the sequence grows larger. This is reflected in the following theorem, which states that if we average the worst-case performance over a sequence of insertions (this averaging is called amortised analysis), we only need an average of O(1) time for each.

Theorem 3.11. Inserting m elements into an initially empty VeryBasicSequence using the method insertLast (Algorithm 3.10) takes Θ(m) time.

If we only knew the worst-case running time of Θ(n) for a single insertion into a sequence of size n, we might conjecture a worst-case time of Σ_{n=1}^{m} Θ(n) = Θ(m²) for m insertions into an initially empty VeryBasicSequence. Our Θ(m) bound is therefore a big improvement. Analysing the worst-case running time of a whole sequence of operations in this way is called amortised analysis.

PROOF. Let I(1), ..., I(m) denote the m insertions. For most insertions, only lines 1, 2, and 10 are executed, taking Θ(1) time. These are called cheap insertions. Occasionally on an insertion we have to create a new array and copy the old one into it (lines 4–9). When this happens for an insertion I(i), it requires time Θ(i), because all of the elements inserted by I(1), ..., I(i − 1) need to be copied into the new array (lines 6–7). These are called expensive insertions. Let I(i_1), ..., I(i_ℓ), where 1 ≤ i_1 < i_2 < ... < i_ℓ ≤ m, be all the expensive insertions.


Then the overall time we need for all our insertions is

    Σ_{j=1}^{ℓ} Θ(i_j)  +  Σ_{1 ≤ i ≤ m, i ≠ i_1,...,i_ℓ} Θ(1).    (3.1)

We now split this into two parts, one for O and one for Ω (recall that f = Θ(g) iff f = O(g) and f = Ω(g)). First consider O:

    Σ_{j=1}^{ℓ} O(i_j) + Σ_{1 ≤ i ≤ m, i ≠ i_1,...,i_ℓ} O(1)  ≤  Σ_{j=1}^{ℓ} O(i_j) + Σ_{i=1}^{m} O(1)  ≤  O(Σ_{j=1}^{ℓ} i_j) + O(m),    (3.2)

where at the last stage we have repeatedly applied rule (2) of Theorem 2.3 (Lecture Notes 2). To give an upper bound on the last term in (3.2), we have to determine the i_j. This is quite easy: we start with n = N = 0. Thus the first insertion is expensive, and after it we have n = 1, N = 2. Therefore, the second insertion is cheap. The third insertion is expensive again, and after it we have n = 3, N = 6. The next expensive insertion is the seventh, after which we have n = 7, N = 14. Thus i_1 = 1, i_2 = 3, i_3 = 7. The general pattern is

    i_{j+1} = 2 i_j + 1.

Now an easy induction shows that 2^{j−1} ≤ i_j < 2^j, and this gives us

    Σ_{j=1}^{ℓ} i_j  ≤  Σ_{j=1}^{ℓ} 2^j  =  2^{ℓ+1} − 2    (3.3)

(summing the geometric series). Since 2^{ℓ−1} ≤ i_ℓ ≤ m, we have ℓ ≤ lg(m) + 1. Thus

    2^{ℓ+1} − 2  ≤  2^{lg(m)+2} − 2  =  4 · 2^{lg(m)} − 2  =  4m − 2  =  O(m).    (3.4)

Then by (3.1)–(3.4), the amortised running time of I(1), ..., I(m) is O(m).

Next we consider Ω. By (3.1) and by properties of Ω, we know

    Σ_{j=1}^{ℓ} Ω(i_j) + Σ_{1 ≤ i ≤ m, i ≠ i_1,...,i_ℓ} Ω(1)  ≥  Σ_{1 ≤ i ≤ m, i ≠ i_1,...,i_ℓ} Ω(1).    (3.5)

Recall that we have shown that i_1 = 1 and, for j ≥ 1, i_{j+1} = 2 i_j + 1. We claim that for any k ∈ N with k ≥ 4, the set {1, 2, 3, ..., k} contains at most k/2 of the indices i_j (this can be proven by induction, and is left as an exercise). Hence there are at least m/2 elements in the set {1 ≤ i ≤ m | i ≠ i_1, ..., i_ℓ}. Hence

    Σ_{1 ≤ i ≤ m, i ≠ i_1,...,i_ℓ} Ω(1)  ≥  (m/2) · Ω(1)  =  Ω(m),    (3.6)

whenever m ≥ 4. Combining (3.5) and (3.6), the running time for I(1), ..., I(m) is Ω(m). Together with our O(m) result, this implies the theorem. □

Note that although the theorem makes a statement about the “average time” needed by an insertion, it is not a statement about average running time. The reason is that the statement of the theorem is completely independent of the input, i.e., of the elements we insert. In this sense, it is a statement about the worst-case running time, whereas average running time makes statements about “average”, or random, inputs. Since an analysis such as the one used for the theorem occurs quite frequently in the theory of algorithms, there is a name for it: amortised analysis. A way of rephrasing the theorem is to say that the amortised (worst-case) running time of the method insertLast is Θ(1).

Finally, suppose that we want to add a method removeLast() for removing the last element of our ADT VeryBasicSequence. We can use a similar trick to implement this: we create a new array of, say, 3/4 the size of the current one whenever the load factor falls below 1/2. With this strategy, we always have a load factor of at least 1/2, and it can be proved that the amortised running times of both insertLast and removeLast are Θ(1).

The JAVA Collections Framework contains the ArrayList implementation of List (in the class java.util.ArrayList), as well as other list implementations. ArrayList is a resizable-array implementation of List, so it provides the features of a dynamic array.

3.5 Further reading

If you have [GT]: the chapters “Stacks, Queues and Recursion” and “Vectors, Lists and Sequences”.

Exercises

1. Implement the method removeLast() for removing the last element of a dynamic array, as described in the last paragraph of Section 3.4.

2. Prove by induction that the following claim, made towards the end of the proof of Theorem 3.11, is true: for any k ∈ N with k ≥ 4, the set {1, 2, 3, ..., k} contains at most k/2 of the indices i_j.

3. What is the amortised running time of a sequence P = p_1 p_2 ... p_n of operations if the running time of p_i is Θ(i) when i is a multiple of 3 and Θ(1) otherwise? What if the running time of p_i is Θ(i) when i is a square and Θ(1) otherwise?
Hint: for the second question, use the fact that Σ_{i=1}^{m} i² = Θ(m³).

4. Implement in JAVA a Stack class based on dynamic arrays.
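To close the loop on Section 3.4, here is a JAVA transcription of Algorithm 3.10. It is a sketch under our own naming: the class name mirrors the ADT, and the copyCount field is an instrumentation of ours (not in the notes) that tallies the element copies performed in lines 6–7, so that the Θ(m) bound of Theorem 3.11 can be checked empirically.

```java
// Our sketch of the VeryBasicSequence of Section 3.4, following
// Algorithm 3.10. copyCount counts the copies of lines 6-7 only.
class VeryBasicSequence {
    private Object[] A = new Object[0];  // start with n = N = 0, as in the proof
    private int n;                       // current size; N is A.length
    int copyCount;                       // total element copies so far

    public void insertLast(Object e) {
        if (n < A.length) {              // lines 1-2: cheap insertion
            A[n] = e;
        } else {                         // lines 3-9: expensive insertion
            int N = 2 * (A.length + 1);  // line 4: N <- 2(N + 1)
            Object[] APrime = new Object[N];      // line 5
            for (int i = 0; i <= n - 1; i++) {    // lines 6-7: copy old array
                APrime[i] = A[i];
                copyCount++;
            }
            APrime[n] = e;               // line 8
            A = APrime;                  // line 9
        }
        n = n + 1;                       // line 10
    }

    public Object elemAtRank(int r) { return A[r]; }  // Theta(1), as for Vector
    public int size() { return n; }
}
```

Running m insertions and inspecting copyCount shows the total copying work staying below the 4m − 2 bound derived in the proof, even though individual expensive insertions cost Θ(i).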
