Unit 1 - 1
Any course on Data Structures and Algorithms will try to teach you about three
things:
1. It will present a collection of commonly used data structures and algorithms. These
form a programmer’s basic “toolkit”. For many problems, some data structure or
algorithm in the toolkit will provide a good solution. We focus on data structures and
algorithms that have proven over time to be most useful.
2. It will introduce the idea of tradeoffs, and reinforce the concept that there are costs
and benefits associated with every data structure or algorithm. This is done by
describing, for each data structure, the amount of space and time required for typical
operations. For each algorithm, we examine the time required for key input types.
3. It will teach you how to measure the effectiveness of a data structure or algorithm.
Only through such measurement can you determine which data structure in your
toolkit is most appropriate for a new problem. The techniques presented also allow
you to judge the merits of new data structures that you or others might invent.
There are often many approaches to solving a problem. How do we choose between
them? At the heart of computer program design are two (sometimes conflicting)
goals:
1. To design an algorithm that is easy to understand, code, and debug.
2. To design an algorithm that makes efficient use of the computer's resources.
You might think that with ever more powerful computers, program efficiency is
becoming less important. After all, processor speed and memory size still continue to
improve. Won’t today’s efficiency problem be solved by tomorrow’s hardware?
As we develop more powerful computers, our history so far has always been to use
that additional computing power to tackle more complex problems, be it in the form
of more sophisticated user interfaces, bigger problem sizes, or new problems
previously deemed computationally infeasible. More complex problems demand more
computation, making the need for efficient programs even greater. Unfortunately, as
tasks become more complex, they become less like our everyday experience. So
today’s computer scientists must be trained to have a thorough understanding of the
principles behind efficient program design, because their ordinary life experiences
often do not apply when designing computer programs.
In the most general sense, a data structure is any data representation and its associated
operations. Even an integer or floating point number stored on the computer can be
viewed as a simple data structure. More commonly, people use the term “data
structure” to mean an organization or structuring for a collection of data items. A
sorted list of integers stored in an array is an example of such a structuring. These
ideas are explored further in a discussion of Abstract Data Types.
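The sorted list of integers mentioned above can be made concrete with a short sketch (the class name and its operations are illustrative choices, not from the text): the data representation (an array kept in ascending order) together with its associated operations forms a simple data structure.

```python
import bisect

class SortedIntList:
    """A sorted list of integers stored in an array, plus its operations."""

    def __init__(self):
        self.items = []          # array kept in ascending order

    def insert(self, value):
        # bisect.insort keeps the array sorted on every insertion
        bisect.insort(self.items, value)

    def contains(self, value):
        # binary search, possible only because the array stays sorted
        i = bisect.bisect_left(self.items, value)
        return i < len(self.items) and self.items[i] == value

s = SortedIntList()
for v in [42, 7, 19]:
    s.insert(v)
print(s.items)          # [7, 19, 42]
print(s.contains(19))   # True
```

Note that the structuring (keeping the array sorted) is what makes the fast `contains` operation possible; an unsorted array would support neither operation as cheaply.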
A solution is said to be efficient if it solves the problem within the required resource
constraints. Examples of resource constraints include the total space available to store
the data—possibly divided into separate main memory and disk space constraints—
and the time allowed to perform each subtask. A solution is sometimes said to be
efficient if it requires fewer resources than known alternatives, regardless of whether
it meets any particular requirements. The cost of a solution is the amount of resources
that the solution consumes. Most often, cost is measured in terms of one key resource
such as time, with the implied assumption that the solution meets the other resource
constraints.
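The idea of measuring cost in terms of one key resource such as time can be sketched as follows. This is an illustrative comparison, not from the text: it counts comparisons (a machine-independent proxy for time) performed by two search strategies on the same sorted array.

```python
def linear_search(arr, target):
    """Scan the array front to back, counting comparisons made."""
    comparisons = 0
    for x in arr:
        comparisons += 1
        if x == target:
            break
    return comparisons

def binary_search(arr, target):
    """Halve the search range each step, counting comparisons made."""
    comparisons = 0
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if arr[mid] == target:
            break
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return comparisons

arr = list(range(1000))
print(linear_search(arr, 999))  # 1000 comparisons
print(binary_search(arr, 999))  # 10 comparisons
```

Both solutions meet the (implicit) space constraint equally well; the cost difference shows up entirely in the one key resource being measured.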
It should go without saying that people write programs to solve problems. However,
sometimes programmers forget this. So it is crucial to keep this truism in mind when
selecting a data structure to solve a particular problem. Only by first analyzing the
problem to determine the performance goals that must be achieved can there be any
hope of selecting the right data structure for the job. Poor program designers ignore
this analysis step and apply a data structure that they are familiar with but which is
inappropriate to the problem. The result is typically a slow program. Conversely,
there is no sense in adopting a complex representation to “improve” a program that
can meet its performance goals when implemented using a simpler design.
When selecting a data structure to solve a problem, you should ask the
following questions.
1. Are all data items inserted into the data structure at the beginning, or are
insertions interspersed with other operations? Static applications (where the
data are loaded at the beginning and never change) can typically achieve an
efficient implementation with a simpler data structure, while dynamic
applications often require something more complicated.
2. Can data items be deleted? If so, this will probably make the implementation
more complicated.
3. Are all data items processed in some well-defined order, or is search for
specific data items allowed? “Random access” search generally requires more
complex data structures.
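The questions above can be applied to a hypothetical task (the data values here are made up for illustration): items must be deleted and searched for in no particular order, so a hash-based set, with more complex machinery under the hood, fits better than a plain array-backed list.

```python
# Same values held two ways: a plain list and a hash-based set.
items_list = [3, 1, 4, 1, 5]
items_set = {3, 1, 4, 5}

# Deletion: list.remove scans for the value (linear time);
# set.discard hashes straight to it (constant time on average).
items_list.remove(4)
items_set.discard(4)

# "Random access" search: `in` on a list scans every element;
# `in` on a set hashes the key directly.
print(4 in items_list)  # False
print(4 in items_set)   # False
print(5 in items_set)   # True
```

If the answers had been different, say all items loaded up front and only processed in sorted order, the simpler list would have been the better fit.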
Each data structure has associated costs and benefits. In practice, it is hardly ever true
that one data structure is better than another for use in all situations. If one data
structure or algorithm is superior to another in all respects, the inferior one will
usually have long been forgotten. For nearly every data structure and algorithm
presented in this book, you will see examples of where it is the best choice. Some of
the examples might surprise you.
A data structure requires a certain amount of space for each data item it stores, a
certain amount of time to perform a single basic operation, and a certain amount of
programming effort. Each problem has constraints on available space and time. Each
solution to a problem makes use of the basic operations in some relative proportion,
and the data structure selection process must account for this. Only after a careful
analysis of your problem’s characteristics can you determine the best data structure
for the task.
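The point about the relative proportion of basic operations can be sketched with a small example (the workload and helper function are illustrative, not from the text): two structures produce identical results on the same workload, but at different cost when the operation mix is dominated by insertions at the front.

```python
from collections import deque

def build_front(container, push_front, n):
    """Insert 0..n-1 at the front of a container, one at a time."""
    for i in range(n):
        push_front(container, i)
    return container

# Array-backed list: insert(0, x) shifts every element on each call.
arr = build_front([], lambda c, x: c.insert(0, x), 5)
# Deque: appendleft is constant time per call.
dq = build_front(deque(), lambda c, x: c.appendleft(x), 5)

print(arr)        # [4, 3, 2, 1, 0]
print(list(dq))   # [4, 3, 2, 1, 0]
```

Both structures meet the functional requirement; which one is "best" depends entirely on how often the program inserts at the front versus, say, indexing into the middle, which is exactly the analysis this section calls for.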