Data Structures Related Algorithms Correctness Time Space Complexity

This document describes a CS 130A course on data structures and algorithms. The key points are: 1. The course focuses on data structures and related algorithms, as well as analyzing their correctness and time/space complexity. 2. It has prerequisites in basic data structures, functions, and C/C++ programming. 3. The course covers a variety of algorithms and data structures, how to analyze algorithms, and examples of widely-used algorithms.

Uploaded by

misganaw2003
Copyright
© Attribution Non-Commercial (BY-NC)

CS 130 A: Data Structures and Algorithms

 Focus of the course:


 Data structures and related algorithms

 Correctness and (time and space) complexity

 Prerequisites
 CS 20: stacks, queues, lists, binary search trees, …

 CS 40: functions, recurrence equations, induction, …

 CS 60: C, C++, and UNIX

1
Course Organization
 Grading:
 See course web page

 Policy:

 No late homeworks.
 Cheating and plagiarism: F grade and disciplinary action
 Online info:
Homepage: www.cs.ucsb.edu/~cs130a
 Email: [email protected]
 Teaching assistants: See course web page

2
Introduction

 A famous quote: Program = Algorithm + Data Structure.


 All of you have programmed, and thus have already been exposed
to algorithms and data structures.
 Perhaps you didn't see them as separate entities;
 Perhaps you saw data structures as simple programming
constructs (provided by STL--standard template library).
 However, data structures are quite distinct from algorithms,
and very important in their own right.

3
Objectives
 The main focus of this course is to introduce you to a
systematic study of algorithms and data structures.
 The two guiding principles of the course are: abstraction and
formal analysis.
 Abstraction: We focus on topics that are broadly applicable
to a variety of problems.
 Analysis: We want a formal way to compare two objects (data
structures or algorithms).
 In particular, we will worry about "always correct"-ness, and
worst-case bounds on time and memory (space).

4
Textbook
 Textbook for the course is:
 Data Structures and Algorithm Analysis in C++
 by Mark Allen Weiss

 But I will use material from other books and
research papers, so the ultimate source should be
my lectures.

5
Course Outline
 C++ Review (Ch. 1)
 Algorithm Analysis (Ch. 2)
 Sets with insert/delete/member: Hashing (Ch. 5)
 Sets in general: Balanced search trees (Ch. 4 and 12.2)
 Sets with priority: Heaps, priority queues (Ch. 6)
 Graphs: Shortest-path algorithms (Ch. 9.1 – 9.3.2)
 Sets with disjoint union: Union/find trees (Ch. 8.1–8.5)
 Graphs: Minimum spanning trees (Ch. 9.5)
 Sorting (Ch. 7)

6
130a: Algorithm Analysis
 Foundations of Algorithm Analysis and Data Structures.
 Analysis:
 How to predict an algorithm’s performance

 How well an algorithm scales up

 How to compare different algorithms for a problem

 Data Structures
 How to efficiently store, access, manage data

 Data structures affect an algorithm’s performance

7
Example Algorithms
 Two algorithms for computing the Factorial
 Which one is better?

 int factorial (int n) {
    if (n <= 1) return 1;
    else return n * factorial(n-1);
}

 int factorial1 (int n) {
    if (n <= 1) return 1;
    else {
        int fact = 1;
        for (int k = 2; k <= n; k++)
            fact *= k;
        return fact;
    }
}

8
Examples of famous algorithms

 Constructions of Euclid
 Newton's root finding
 Fast Fourier Transform
 Compression (Huffman, Lempel-Ziv, GIF, MPEG)
 DES, RSA encryption
 Simplex algorithm for linear programming
 Shortest Path Algorithms (Dijkstra, Bellman-Ford)
 Error correcting codes (CDs, DVDs)
 TCP congestion control, IP routing
 Pattern matching (Genomics)
 Search Engines

9
Role of Algorithms in Modern World
 Enormous amount of data
 E-commerce (Amazon, Ebay)
 Network traffic (telecom billing, monitoring)
 Database transactions (Sales, inventory)
 Scientific measurements (astrophysics, geology)
 Sensor networks. RFID tags
 Bioinformatics (genome, protein bank)

 Amazon hired its first Chief Algorithms Officer (Udi Manber)

10
A real-world Problem

 Communication in the Internet


 Message (email, ftp) broken down into IP packets.
 Sender/receiver identified by IP address.
 The packets are routed through the Internet by special
computers called Routers.
 Each packet is stamped with its destination address, but not
the route.
 Because the Internet topology and network load are constantly
changing, routers must discover routes dynamically.
 What should the Routing Table look like?

11
IP Prefixes and Routing
 Each router is really a switch: it receives packets at several
input ports, and appropriately sends them out to output
ports.
 Thus, for each packet, the router needs to transfer the
packet to that output port that gets it closer to its
destination.
 Should each router keep a table: IP address x Output Port?
 How big is this table?
 When a link or router fails, how much information would need
to be modified?
 A router typically forwards several million packets/sec!

12
Data Structures
 The IP packet forwarding is a Data Structure problem!
 Efficiency and scalability are very important.

 Similarly, how does Google find the documents matching your
query so fast?
 Uses sophisticated algorithms to create index structures,
which are just data structures.
 Algorithms and data structures are ubiquitous.
 With the data glut created by the new technologies, the need
to organize, search, and update MASSIVE amounts of
information FAST is more severe than ever before.

13
Algorithms to Process these Data

 Which are the top K sellers?


 Correlation between time spent at a web site and purchase
amount?
 Which flows at a router account for > 1% traffic?
 Did source S send a packet in last s seconds?
 Send an alarm if any international arrival matches a profile
in the database
 Similarity matches against genome databases
 Etc.

14
Max Subsequence Problem
 Given a sequence of integers A_1, A_2, …, A_n, find the maximum possible
sum of a contiguous subsequence A_i, …, A_j.
 Numbers can be negative.
 You want a contiguous chunk with largest sum.

 Example: -2, 11, -4, 13, -5, -2
 The answer is 20 (subseq. A_2 through A_4).

 We will discuss 4 different algorithms, with time complexities O(n^3),
O(n^2), O(n log n), and O(n).
 With n = 10^6, algorithm 1 may take > 10 years; algorithm 4 will take a
fraction of a second!

15
Algorithm 1 for Max Subsequence Sum
 Given A_1, …, A_n, find the maximum value of A_i + A_{i+1} + ··· + A_j
(0 if the max value is negative)

int maxSum = 0;                                // O(1)
for( int i = 0; i < a.size( ); i++ )
    for( int j = i; j < a.size( ); j++ )
    {
        int thisSum = 0;                       // O(1)
        for( int k = i; k <= j; k++ )          // O(j-i) iterations
            thisSum += a[ k ];                 // O(1)
        if( thisSum > maxSum )                 // O(1)
            maxSum = thisSum;
    }
return maxSum;

Inner loop: O(j-i)
Middle loop: O(Σ_{j=i}^{n-1} (j-i))
Outer loop: O(Σ_{i=0}^{n-1} Σ_{j=i}^{n-1} (j-i))

 Time complexity: O(n^3)

16
Algorithm 2
 Idea: Given the sum from i to j-1, we can compute the
sum from i to j in constant time.
 This eliminates one nested loop, and reduces the
running time to O(n^2).

int maxSum = 0;
for( int i = 0; i < a.size( ); i++ )
{
    int thisSum = 0;
    for( int j = i; j < a.size( ); j++ )
    {
        thisSum += a[ j ];
        if( thisSum > maxSum )
            maxSum = thisSum;
    }
}
return maxSum;
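
The two fragments above can be wrapped into complete functions so that they can be compiled and compared. This is only a sketch: the function names `maxSubSumCubic` and `maxSubSumQuadratic` and the `std::vector<int>` parameter are assumptions made for illustration, not names taken from the slides.

```cpp
#include <cstddef>
#include <vector>

// Algorithm 1: try every pair (i, j) and re-add the elements each time. O(n^3).
int maxSubSumCubic(const std::vector<int>& a) {
    int maxSum = 0;  // the empty subsequence counts as sum 0
    for (std::size_t i = 0; i < a.size(); i++)
        for (std::size_t j = i; j < a.size(); j++) {
            int thisSum = 0;
            for (std::size_t k = i; k <= j; k++)
                thisSum += a[k];
            if (thisSum > maxSum)
                maxSum = thisSum;
        }
    return maxSum;
}

// Algorithm 2: extend the running sum from j-1 to j in constant time. O(n^2).
int maxSubSumQuadratic(const std::vector<int>& a) {
    int maxSum = 0;
    for (std::size_t i = 0; i < a.size(); i++) {
        int thisSum = 0;
        for (std::size_t j = i; j < a.size(); j++) {
            thisSum += a[j];
            if (thisSum > maxSum)
                maxSum = thisSum;
        }
    }
    return maxSum;
}
```

On the example sequence -2, 11, -4, 13, -5, -2 both functions return 20, and on an all-negative input both return 0.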

17
Algorithm 3
 This algorithm uses divide-and-conquer paradigm.
 Suppose we split the input sequence at midpoint.

 The max subsequence is entirely in the left half,
entirely in the right half, or it straddles the
midpoint.
 Example:

left half | right half
4 -3 5 -2 | -1 2 6 -2
 Max in left is 6 (A_1 through A_3); max in right is 8 (A_6
through A_7). But straddling max is 11 (A_1 through A_7).

18
Algorithm 3 (cont.)
 Example:
left half | right half
4 -3 5 -2 | -1 2 6 -2
 Max subsequences in each half found by recursion.
 How do we find the straddling max subsequence?
 Key Observation:
 Left half of the straddling sequence is the max
subsequence ending with -2.
 Right half is the max subsequence beginning with -1.

 A linear scan lets us compute these in O(n) time.
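
The slides describe Algorithm 3 without code, so the following is a sketch of one way it could be written. The names `maxSubSumRec` and `maxSubSumDivideConquer` are hypothetical, and the empty subsequence is assumed to count as sum 0, as in the other algorithms.

```cpp
#include <algorithm>
#include <vector>

// Max contiguous sum inside a[left..right] (inclusive); the empty sum 0 is allowed.
int maxSubSumRec(const std::vector<int>& a, int left, int right) {
    if (left > right) return 0;                      // empty range
    if (left == right) return std::max(a[left], 0);  // single element

    int mid = left + (right - left) / 2;
    int maxLeft = maxSubSumRec(a, left, mid);        // entirely in the left half
    int maxRight = maxSubSumRec(a, mid + 1, right);  // entirely in the right half

    // Best sum of a run that ends exactly at mid (linear scan to the left).
    int bestLeftBorder = 0, sum = 0;
    for (int i = mid; i >= left; i--) {
        sum += a[i];
        bestLeftBorder = std::max(bestLeftBorder, sum);
    }

    // Best sum of a run that starts exactly at mid + 1 (linear scan to the right).
    int bestRightBorder = 0;
    sum = 0;
    for (int j = mid + 1; j <= right; j++) {
        sum += a[j];
        bestRightBorder = std::max(bestRightBorder, sum);
    }

    // The answer is the best of the three cases.
    return std::max({maxLeft, maxRight, bestLeftBorder + bestRightBorder});
}

int maxSubSumDivideConquer(const std::vector<int>& a) {
    return maxSubSumRec(a, 0, static_cast<int>(a.size()) - 1);
}
```

For the example 4 -3 5 -2 | -1 2 6 -2, the two border scans give 4 and 7, so the straddling sum 11 beats the left-half maximum 6 and the right-half maximum 8.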

19
Algorithm 3: Analysis

 The divide and conquer is best analyzed through a
recurrence:

T(1) = 1
T(n) = 2T(n/2) + O(n)

 This recurrence solves to T(n) = O(n log n).
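
One way to see the O(n log n) bound is to unroll the recurrence. The sketch below assumes that n is a power of 2 and writes the O(n) term as cn for some constant c; both are simplifying assumptions rather than part of the slide.

```latex
\begin{aligned}
T(n) &= 2\,T(n/2) + cn \\
     &= 4\,T(n/4) + 2cn \\
     &= 8\,T(n/8) + 3cn \\
     &\;\;\vdots \\
     &= 2^{k}\,T(n/2^{k}) + k\,cn \\
     &= n\,T(1) + cn\log_2 n \qquad (k = \log_2 n) \\
     &= O(n \log n)
\end{aligned}
```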

20
Algorithm 4
Example input: 2, 3, -2, 1, -5, 4, 1, -3, 4, -1, 2

int maxSum = 0, thisSum = 0;
for( int j = 0; j < a.size( ); j++ )
{
    thisSum += a[ j ];
    if ( thisSum > maxSum )
        maxSum = thisSum;
    else if ( thisSum < 0 )
        thisSum = 0;
}
return maxSum;
 Time complexity clearly O(n)
 But why does it work? I.e. proof of correctness.

21
Proof of Correctness

 Max subsequence cannot start or end at a negative A_i.
 More generally, the max subsequence cannot have a prefix
with a negative sum.
Ex: -2 11 -4 13 -5 -2
 Thus, if we ever find that A_i through A_j sums to < 0, then we
can advance i to j+1.
 Proof. Suppose j is the first index after i when the sum
becomes < 0.
 The max subsequence cannot start at any p between i and
j, because the sum of A_i through A_{p-1} is non-negative, so
starting at i would have been at least as good.

22
Algorithm 4
int maxSum = 0, thisSum = 0;
for( int j = 0; j < a.size( ); j++ )
{
    thisSum += a[ j ];
    if ( thisSum > maxSum )
        maxSum = thisSum;
    else if ( thisSum < 0 )
        thisSum = 0;
}
return maxSum;

• The algorithm resets whenever the prefix sum is < 0.
Otherwise, it forms new sums and updates
maxSum in one pass.
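
A proof is the real argument, but an exhaustive check on small inputs is a useful sanity test. The sketch below wraps Algorithm 4 in a function and compares it against a brute-force scan on every sequence of a given length with entries in [-2, 2]. The names `maxSubSumLinear`, `maxSubSumBrute`, and `agreeOnAllSmallInputs`, and the choice of value range, are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Algorithm 4: one pass, resetting the running sum when it drops below 0.
int maxSubSumLinear(const std::vector<int>& a) {
    int maxSum = 0, thisSum = 0;
    for (std::size_t j = 0; j < a.size(); j++) {
        thisSum += a[j];
        if (thisSum > maxSum)
            maxSum = thisSum;
        else if (thisSum < 0)
            thisSum = 0;
    }
    return maxSum;
}

// Reference answer: try every start index and extend the sum. O(n^2).
int maxSubSumBrute(const std::vector<int>& a) {
    int maxSum = 0;
    for (std::size_t i = 0; i < a.size(); i++) {
        int thisSum = 0;
        for (std::size_t j = i; j < a.size(); j++) {
            thisSum += a[j];
            if (thisSum > maxSum) maxSum = thisSum;
        }
    }
    return maxSum;
}

// Compare both functions on all 5^length sequences with entries in [-2, 2].
bool agreeOnAllSmallInputs(int length) {
    std::vector<int> a(length, -2);
    while (true) {
        if (maxSubSumLinear(a) != maxSubSumBrute(a)) return false;
        // Advance to the next sequence, odometer style.
        int pos = 0;
        while (pos < length && a[pos] == 2) {
            a[pos] = -2;
            pos++;
        }
        if (pos == length) return true;  // wrapped around: all sequences checked
        a[pos]++;
    }
}
```

For the sequence 2, 3, -2, 1, -5, 4, 1, -3, 4, -1, 2 the linear scan returns 7 (the run 4, 1, -3, 4, -1, 2).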

23
Why Efficient Algorithms Matter
 Suppose N = 10^6
 A PC can read/process N records in 1 sec.
 But if some algorithm does N*N computation, then it takes
1M seconds = 11 days!!!

 100 City Traveling Salesman Problem.
 A supercomputer checking 100 billion tours/sec still
requires 10^100 years!

 Fast factoring algorithms can break encryption schemes.
Algorithms research determines what is a safe key length
(> 100 digits).

24
How to Measure Algorithm Performance

 What metric should be used to judge algorithms?


 Length of the program (lines of code)

 Ease of programming (bugs, maintenance)

 Memory required

 Running time

 Running time is the dominant standard.


 Quantifiable and easy to compare

 Often the critical bottleneck

25
Abstraction
 An algorithm may run differently depending on:
 the hardware platform (PC, Cray, Sun)

 the programming language (C, Java, C++)

 the programmer (you, me, Bill Joy)

 While different in detail, all hardware and programming models
are equivalent in some sense: Turing machines.

 It suffices to count basic operations.

 Crude but valuable measure of an algorithm’s performance as a
function of input size.

26
Average, Best, and Worst-Case

 On which input instances should the algorithm’s performance
be judged?

 Average case:
 Real world distributions difficult to predict

 Best case:
 Seems unrealistic

 Worst case:
 Gives an absolute guarantee

 We will use the worst-case measure.

27
Examples
 Vector addition Z = A+B
for (int i=0; i<n; i++)
    Z[i] = A[i] + B[i];
T(n) = c n

 Vector (inner) multiplication z = A*B
z = 0;
for (int i=0; i<n; i++)
    z = z + A[i]*B[i];
T(n) = c’ + c_1 n

28
Examples
 Vector (outer) multiplication Z = A*B^T
for (int i=0; i<n; i++)
    for (int j=0; j<n; j++)
        Z[i][j] = A[i] * B[j];
T(n) = c_2 n^2

 A program does all the above
T(n) = c_0 + c_1 n + c_2 n^2

29
Simplifying the Bound

 T(n) = c_k n^k + c_{k-1} n^{k-1} + c_{k-2} n^{k-2} + … + c_1 n + c_0
 too complicated
 too many terms
 Difficult to compare two expressions, each with
10 or 20 terms
 Do we really need that many terms?

30
Simplifications
 Keep just one term!
 the fastest growing term (dominates the runtime)

 No constant coefficients are kept


 Constant coefficients are affected by machines, languages,
etc.

 Asymptotic behavior (as n gets large) is determined entirely
by the leading term.

 Example. T(n) = 10 n^3 + n^2 + 40n + 800
 If n = 1,000, then T(n) = 10,001,040,800
 error is 0.01% if we drop all but the n^3 term
 In an assembly line the slowest worker determines the
throughput rate
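
The 0.01% figure can be checked numerically. The sketch below evaluates the example polynomial and the relative error made by keeping only the leading term; the function names are hypothetical.

```cpp
// T(n) = 10 n^3 + n^2 + 40 n + 800, the example polynomial from the slide.
double fullCost(double n) {
    return 10.0 * n * n * n + n * n + 40.0 * n + 800.0;
}

// Only the fastest-growing term, 10 n^3.
double leadingTerm(double n) {
    return 10.0 * n * n * n;
}

// Fraction of T(n) that is lost when all lower-order terms are dropped.
double relativeError(double n) {
    return (fullCost(n) - leadingTerm(n)) / fullCost(n);
}
```

At n = 1,000 the dropped terms add up to 1,040,800 out of 10,001,040,800, which is about 0.0104%, and the fraction keeps shrinking as n grows.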

31
Simplification
 Drop the constant coefficient
 Does not affect the relative order

32
Simplification
 The faster growing term (such as 2^n) eventually will
outgrow the slower growing terms (e.g., 1000 n) no
matter what their coefficients!

 Put another way, given a certain increase in
allocated time, a higher order algorithm will not reap
the benefit by solving a much larger problem

33
Complexity and Tractability

T(n) (first column: input size n)
n | n | n log n | n^2 | n^3 | n^4 | n^10 | 2^n
10 | .01μs | .03μs | .1μs | 1μs | 10μs | 10s | 1μs
20 | .02μs | .09μs | .4μs | 8μs | 160μs | 2.84h | 1ms
30 | .03μs | .15μs | .9μs | 27μs | 810μs | 6.83d | 1s
40 | .04μs | .21μs | 1.6μs | 64μs | 2.56ms | 121d | 18m
50 | .05μs | .28μs | 2.5μs | 125μs | 6.25ms | 3.1y | 13d
100 | .1μs | .66μs | 10μs | 1ms | 100ms | 3171y | 4×10^13y
10^3 | 1μs | 9.96μs | 1ms | 1s | 16.67m | 3.17×10^13y | 32×10^283y
10^4 | 10μs | 130μs | 100ms | 16.67m | 115.7d | 3.17×10^23y |
10^5 | 100μs | 1.66ms | 10s | 11.57d | 3171y | 3.17×10^33y |
10^6 | 1ms | 19.92ms | 16.67m | 31.71y | 3.17×10^7y | 3.17×10^43y |

Assume the computer does 1 billion ops per sec.

34
log n | n | n log n | n^2 | n^3 | 2^n
0 | 1 | 0 | 1 | 1 | 2
1 | 2 | 2 | 4 | 8 | 4
2 | 4 | 8 | 16 | 64 | 16
3 | 8 | 24 | 64 | 512 | 256
4 | 16 | 64 | 256 | 4096 | 65536

[Figure: growth of log n, n, n log n, n^2, n^3, and 2^n as functions of n, on linear and logarithmic scales]

35
Another View
 More resources (time and/or processing power) translate into
larger problems solved if complexity is low

T(n) | Problem size solved in 10^3 sec | Problem size solved in 10^4 sec | Increase in problem size
100n | 10 | 100 | 10
1000n | 1 | 10 | 10
5n^2 | 14 | 45 | 3.2
n^3 | 10 | 22 | 2.2
2^n | 10 | 13 | 1.3
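
The table entries can be reproduced by searching for the largest n whose cost fits in the time budget. The following is a sketch under the assumption that the cost function is nondecreasing; `largestSolvable` and the `cost...` helpers are hypothetical names. It rounds down, whereas the table appears to round to the nearest integer, so it yields 44 rather than 45 for 5n^2 with a budget of 10^4.

```cpp
#include <cmath>
#include <functional>

// Cost functions from the table, in seconds.
double costLinear100(double n) { return 100.0 * n; }
double costQuadratic5(double n) { return 5.0 * n * n; }
double costCubic(double n) { return n * n * n; }
double costExponential(double n) { return std::pow(2.0, n); }

// Largest integer n >= 0 with cost(n) <= budget.
// Assumes cost is nondecreasing, so a simple upward scan is enough.
long long largestSolvable(const std::function<double(double)>& cost, double budget) {
    long long n = 0;
    while (cost(static_cast<double>(n + 1)) <= budget)
        n++;
    return n;
}
```

Multiplying the budget by 10 multiplies the solvable size by 10 for 100n, but only adds about 3 to it for 2^n (from 9 to 13 with rounding down).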

36
Asymptotics

T(n) | keep one term | drop coefficient
3n^2 + 4n + 1 | 3n^2 | n^2
101n^2 + 102 | 101n^2 | n^2
15n^2 + 6n | 15n^2 | n^2
a n^2 + bn + c | a n^2 | n^2

 They all have the same “growth” rate

37
Caveats

 Follow the spirit, not the letter


 a 100n algorithm is more expensive than an n^2
algorithm when n < 100


 Other considerations:

 a program used only a few times

 a program run on small data sets

 ease of coding, porting, maintenance

 memory requirements

38
Asymptotic Notations

 Big-O, “bounded above by”: T(n) = O(f(n))
 For some c and N, T(n) ≤ c·f(n) whenever n > N.

 Big-Omega, “bounded below by”: T(n) = Ω(f(n))
 For some c>0 and N, T(n) ≥ c·f(n) whenever n > N.
 Same as f(n) = O(T(n)).

 Big-Theta, “bounded above and below”: T(n) = Θ(f(n))
 T(n) = O(f(n)) and also T(n) = Ω(f(n))

 Little-o, “strictly bounded above”: T(n) = o(f(n))
 T(n)/f(n) → 0 as n → ∞

39
By Pictures
 Big-Oh (most commonly used)
 bounded above
 Big-Omega
 bounded below
 Big-Theta
 exactly
 Small-o
 not as expensive as ...

[Figures: each bound holds for n beyond a threshold N0]
40
Example

T(n) = n^3 + 2n^2

O(?) | Ω(?)
∞ | 0
n^10 | n
n^5 | n^2
n^3 | n^3
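
The tightest entries in both columns can be justified directly from the definitions. The constants below (c = 3 for the upper bound, c = 1 for the lower bound, threshold n ≥ 1) are one possible choice, shown as a worked sketch.

```latex
\begin{aligned}
n^3 + 2n^2 &\le n^3 + 2n^3 = 3n^3 \quad (n \ge 1)
  &&\Rightarrow\; T(n) = O(n^3) \\
n^3 + 2n^2 &\ge n^3 \quad (n \ge 1)
  &&\Rightarrow\; T(n) = \Omega(n^3) \\
&&&\Rightarrow\; T(n) = \Theta(n^3)
\end{aligned}
```

The looser entries follow because n^3 ≤ n^5 ≤ n^10 and n ≤ n^2 ≤ n^3 for n ≥ 1.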

41
Examples

f(n) | Asymptotic
c | Θ(1)
Σ_{i=1}^{k} c_i n^i | Θ(n^k)
Σ_{i=1}^{n} i | Θ(n^2)
Σ_{i=1}^{n} i^2 | Θ(n^3)
Σ_{i=1}^{n} i^k | Θ(n^{k+1})
Σ_{i=0}^{n} r^i | Θ(r^n) (for r > 1)
n! | Θ(√n (n/e)^n)
Σ_{i=1}^{n} 1/i | Θ(log n)

42
Summary (Why O(n)?)
 T(n) = c_k n^k + c_{k-1} n^{k-1} + c_{k-2} n^{k-2} + … + c_1 n + c_0
 Too complicated
 O(n^k)
 a single term with constant coefficient dropped
 Much simpler, extra terms and coefficients do not
matter asymptotically
 Other criteria hard to quantify

43
Runtime Analysis
 Useful rules
 simple statements (read, write, assign)
 O(1) (constant)
 simple operations (+ - * / == > >= < <=)
 O(1)
 sequence of simple statements/operations
 rule of sums
 for, do, while loops
 rule of products

44
Runtime Analysis (cont.)
 Two important rules
 Rule of sums

 if you do a number of operations in sequence, the
runtime is dominated by the most expensive operation
 Rule of products
 if you repeat an operation a number of times, the total
runtime is the runtime of the operation multiplied by
the iteration count
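
The two rules can be seen by counting steps in a small function with two phases. This is a sketch with a hypothetical name, `countBasicSteps`, in which every loop body is treated as a single basic step.

```cpp
#include <cstddef>

// Returns the number of basic steps executed for input size n.
long long countBasicSteps(std::size_t n) {
    long long steps = 0;

    // Phase 1: a single loop of n constant-time steps -> O(n).
    for (std::size_t i = 0; i < n; i++)
        steps++;

    // Phase 2: rule of products, n iterations of an O(n) inner loop -> O(n^2).
    for (std::size_t i = 0; i < n; i++)
        for (std::size_t j = 0; j < n; j++)
            steps++;

    // Rule of sums: the phases run in sequence, so the total n + n^2 is O(n^2).
    return steps;
}
```

For n = 100 the function counts 100 + 10,000 = 10,100 steps, of which the nested loop contributes all but 1%.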

45
Runtime Analysis (cont.)
if (cond) then      O(1)
    body1           T1(n)
else
    body2           T2(n)
endif

T(n) = O(max(T1(n), T2(n)))

46
Runtime Analysis (cont.)
 Method calls
 A calls B

 B calls C

 etc.

 A sequence of operations when call sequences are
flattened
T(n) = max(T_A(n), T_B(n), T_C(n))

47
Example

int maxVal = A[0], maxPos = 0;
for (int i = 1; i < n; i++)
    if (A[i] > maxVal) {
        maxVal = A[i];
        maxPos = i;
    }

Asymptotic Complexity: O(n)

48
Example

for (int i = 0; i < n - 1; i++)
    for (int j = n - 1; j >= i + 1; j--)
        if (A[j-1] > A[j]) {
            int temp = A[j-1];
            A[j-1] = A[j];
            A[j] = temp;
        }

 Asymptotic Complexity is O(n^2)
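
To make the O(n^2) claim concrete, the loop nest can be instrumented to count comparisons. The sketch below uses 0-based indexing and a hypothetical function name, `bubbleSortCountingComparisons`; the inner loop runs n-1, n-2, …, 1 times, so the count is always n(n-1)/2.

```cpp
#include <utility>
#include <vector>

// Sorts a in ascending order and returns the number of element comparisons.
long long bubbleSortCountingComparisons(std::vector<int>& a) {
    long long comparisons = 0;
    int n = static_cast<int>(a.size());
    for (int i = 0; i < n - 1; i++)
        for (int j = n - 1; j >= i + 1; j--) {
            comparisons++;
            if (a[j - 1] > a[j])
                std::swap(a[j - 1], a[j]);  // bubble the smaller element left
        }
    return comparisons;
}
```

For n = 5 this performs 4 + 3 + 2 + 1 = 10 comparisons regardless of the input order.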

49
Run Time for Recursive Programs
 T(n) is defined recursively in terms of T(k), k<n
 The recurrence relations allow T(n) to be “unwound”
recursively into some base cases (e.g., T(0) or T(1)).


 Examples:

 Factorial

 Hanoi towers

50
Example: Factorial
int factorial (int n) {
    if (n<=1) return 1;
    else return n * factorial(n-1);
}

factorial(n) = n * (n-1) * (n-2) * … * 1

Call chain and cost:
n * factorial(n-1)          T(n)
(n-1) * factorial(n-2)      T(n-1)
(n-2) * factorial(n-3)      T(n-2)
…
2 * factorial(1)
                            T(1)

T(n) = T(n-1) + d
     = T(n-2) + d + d
     = T(n-3) + d + d + d
     = ....
     = T(1) + (n-1) * d
     = c + (n-1) * d
     = O(n)

51
Example: Factorial (cont.)
int factorial1(int n) {
    if (n<=1) return 1;
    else {
        int fact = 1;                 // O(1)
        for (int k=2; k<=n; k++)
            fact *= k;                // O(n) in total
        return fact;                  // O(1)
    }
}
 Both algorithms are O(n).

52
Example: Hanoi Towers
 Hanoi(n,A,B,C) =
 Hanoi(n-1,A,C,B)+Hanoi(1,A,B,C)+Hanoi(n-1,C,B,A)

T(n) = 2T(n-1) + c
     = 2^2 T(n-2) + 2c + c
     = 2^3 T(n-3) + 2^2 c + 2c + c
     = ....
     = 2^{n-1} T(1) + (2^{n-2} + ... + 2 + 1)c
     = (2^{n-1} + 2^{n-2} + ... + 2 + 1)c
     = O(2^n)
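
The recurrence can be checked by recording the moves of a direct implementation. The sketch below follows the decomposition Hanoi(n-1,A,C,B) + Hanoi(1,A,B,C) + Hanoi(n-1,C,B,A), reading the arguments as (disks, from, to, via); that reading and the `moves` output parameter are assumptions made for illustration. The last line of the derivation takes T(1) = c, and counting one unit per move gives exactly 2^n - 1 moves.

```cpp
#include <utility>
#include <vector>

// Moves n disks from peg `from` to peg `to`, using peg `via` as scratch space.
// Each single-disk move is appended to `moves` as a (source, destination) pair.
void hanoi(int n, char from, char to, char via,
           std::vector<std::pair<char, char>>& moves) {
    if (n == 0) return;
    hanoi(n - 1, from, via, to, moves);  // park the top n-1 disks on the spare peg
    moves.push_back({from, to});         // move the largest disk
    hanoi(n - 1, via, to, from, moves);  // bring the n-1 disks on top of it
}
```

With 3 disks the function records 7 moves, and with 10 disks 1,023 moves, matching 2^n - 1.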

53
Worst Case, Best Case, and Average Case
template<class T>
void SelectionSort(T a[], int n)
{   // Early-terminating version of selection sort
    bool sorted = false;
    for (int size=n; !sorted && (size>1); size--) {
        int pos = 0;
        sorted = true;
        // find largest
        for (int i = 1; i < size; i++)
            if (a[pos] <= a[i]) pos = i;
            else sorted = false; // out of order
        Swap(a[pos], a[size - 1]);
    }
}
 Worst Case
 Best Case
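
The slide leaves the two cases open, so the following is only a sketch of how they could be explored: the same early-terminating loop, specialized to `std::vector<int>` and instrumented to count comparisons. The function name is hypothetical. On an already sorted input the first pass finds nothing out of order and the loop stops after n-1 comparisons. An input with the smallest element last (for instance 2, 3, 4, 5, 1) keeps every pass out of order and reaches n(n-1)/2 comparisons.

```cpp
#include <utility>
#include <vector>

// Early-terminating selection sort; returns the number of element comparisons.
long long selectionSortCountingComparisons(std::vector<int>& a) {
    long long comparisons = 0;
    bool sorted = false;
    for (int size = static_cast<int>(a.size()); !sorted && size > 1; size--) {
        int pos = 0;
        sorted = true;
        // Find the largest element in a[0..size-1].
        for (int i = 1; i < size; i++) {
            comparisons++;
            if (a[pos] <= a[i]) pos = i;
            else sorted = false;  // out of order
        }
        std::swap(a[pos], a[size - 1]);
    }
    return comparisons;
}
```

For n = 5 this gives 4 comparisons on 1, 2, 3, 4, 5 and 10 comparisons on 2, 3, 4, 5, 1, which suggests a linear best case and a quadratic worst case.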

54
[Figure: T(N) stays below c·f(N) for all N ≥ n0]

T(N)=O(f(N))

 T(N)=6N+4 : n0=4 and c=7, f(N)=N
 T(N)=6N+4 <= c f(N) = 7N for N>=4
 7N+4 = O(N)
 15N+20 = O(N)
 N^2 = O(N)?
 N log N = O(N)?
 N log N = O(N^2)?
 N^2 = O(N log N)?
 N^10 = O(2^N)?
 6N + 4 = Ω(N)? Ω(7N)? Ω(N+4)? Ω(N^2)? Ω(N log N)?
 N log N = Ω(N^2)?
 3 = O(1)
 1000000 = O(1)
 Sum i = O(N)?

55
An Analogy: Cooking Recipes
 Algorithms are detailed and precise instructions.
 Example: bake a chocolate mousse cake.
 Convert raw ingredients into processed output.

 Hardware (PC, supercomputer vs. oven, stove)

 Pots, pans, pantry are data structures.

 Interplay of hardware and algorithms
 Different recipes for oven, stove, microwave etc.

 New advances.
 New models: clusters, Internet, workstations

 Microwave cooking, 5-minute recipes, refrigeration

56
