Data Structures Related Algorithms Correctness Time Space Complexity
Prerequisites
CS 20: stacks, queues, lists, binary search trees, …
Course Organization
Grading:
See course web page
Policy:
No late homeworks.
Cheating and plagiarism: F grade and disciplinary action
Online info:
Homepage: www.cs.ucsb.edu/~cs130a
Email: [email protected]
Teaching assistants: See course web page
Introduction
Objectives
The main focus of this course is to introduce you to a
systematic study of algorithms and data structures.
The two guiding principles of the course are: abstraction and
formal analysis.
Abstraction: We focus on topics that are broadly applicable
to a variety of problems.
Analysis: We want a formal way to compare two objects (data
structures or algorithms).
In particular, we will worry about "always correct"-ness, and
worst-case bounds on time and memory (space).
Textbook
Textbook for the course is:
Data Structures and Algorithm Analysis in C++
by Mark Allen Weiss
Course Outline
C++ Review (Ch. 1)
Algorithm Analysis (Ch. 2)
Sets with insert/delete/member: Hashing (Ch. 5)
Sets in general: Balanced search trees (Ch. 4 and 12.2)
Sets with priority: Heaps, priority queues (Ch. 6)
Graphs: Shortest-path algorithms (Ch. 9.1 – 9.3.2)
Sets with disjoint union: Union/find trees (Ch. 8.1–8.5)
Graphs: Minimum spanning trees (Ch. 9.5)
Sorting (Ch. 7)
130a: Algorithm Analysis
Foundations of Algorithm Analysis and Data Structures.
Analysis:
How to predict an algorithm’s performance
Data Structures
How to efficiently store, access, manage data
Example Algorithms
Two algorithms for computing the Factorial
Which one is better?
Examples of famous algorithms
Constructions of Euclid
Newton's root finding
Fast Fourier Transform
Compression (Huffman, Lempel-Ziv, GIF, MPEG)
DES, RSA encryption
Simplex algorithm for linear programming
Shortest Path Algorithms (Dijkstra, Bellman-Ford)
Error correcting codes (CDs, DVDs)
TCP congestion control, IP routing
Pattern matching (Genomics)
Search Engines
Role of Algorithms in Modern World
Enormous amount of data
E-commerce (Amazon, Ebay)
Network traffic (telecom billing, monitoring)
Database transactions (Sales, inventory)
Scientific measurements (astrophysics, geology)
Sensor networks. RFID tags
Bioinformatics (genome, protein bank)
A real-world Problem
IP Prefixes and Routing
Each router is really a switch: it receives packets at several
input ports, and appropriately sends them out to output
ports.
Thus, for each packet, the router needs to transfer the
packet to that output port that gets it closer to its
destination.
Should each router keep a table: IP address x Output Port?
How big is this table?
When a link or router fails, how much information would need
to be modified?
A router typically forwards several million packets/sec!
Data Structures
The IP packet forwarding is a Data Structure problem!
Efficiency, scalability is very important.
Algorithms to Process these Data
Max Subsequence Problem
Given a sequence of integers A1, A2, …, An, find the maximum possible
value of a subsequence Ai, …, Aj.
Numbers can be negative.
You want a contiguous chunk with largest sum.
Algorithm 1 for Max Subsequence Sum
Given A_1, …, A_n, find the maximum value of A_i + A_{i+1} + ··· + A_j
0 if the max value is negative
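The brute-force approach can be sketched as follows. This is a minimal illustration, not necessarily the code from the original slide: the function name maxSubSumCubic and the use of std::vector are assumptions. It tries every pair (i, j) and recomputes the sum from scratch, which gives three nested loops and O(n^3) time.

```cpp
#include <vector>

// Sketch of Algorithm 1: examine every pair (i, j) and sum a[i..j] from scratch.
// Returns 0 when every element is negative (the empty subsequence).
// The name maxSubSumCubic is hypothetical.
int maxSubSumCubic(const std::vector<int>& a) {
    int maxSum = 0;
    const int n = static_cast<int>(a.size());
    for (int i = 0; i < n; ++i) {
        for (int j = i; j < n; ++j) {
            int thisSum = 0;
            for (int k = i; k <= j; ++k)  // recompute the sum of a[i..j]
                thisSum += a[k];
            if (thisSum > maxSum) maxSum = thisSum;
        }
    }
    return maxSum;
}
```

For the input 4 -3 5 -2 -1 2 6 -2 the best chunk is 4 -3 5 -2 -1 2 6, with sum 11.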
Algorithm 2
Idea: Given sum from i to j-1, we can compute the
sum from i to j in constant time.
This eliminates one nested loop, and reduces the
running time from O(n^3) to O(n^2).
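A possible rendering of this idea is shown here. It is a sketch under the assumption that the input is a std::vector<int>, and the name maxSubSumQuadratic is hypothetical. The running sum for a fixed start index i is extended by one element per step, so only two nested loops remain.

```cpp
#include <vector>

// Sketch of Algorithm 2: for each start i, extend the running sum one
// element at a time instead of recomputing it. The name is hypothetical.
int maxSubSumQuadratic(const std::vector<int>& a) {
    int maxSum = 0;
    const int n = static_cast<int>(a.size());
    for (int i = 0; i < n; ++i) {
        int thisSum = 0;  // sum of a[i..j], updated in constant time per j
        for (int j = i; j < n; ++j) {
            thisSum += a[j];
            if (thisSum > maxSum) maxSum = thisSum;
        }
    }
    return maxSum;
}
```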
Algorithm 3
This algorithm uses the divide-and-conquer paradigm.
Suppose we split the input sequence at midpoint.
Algorithm 3 (cont.)
Example:
left half | right half
4 -3 5 -2 | -1 2 6 -2
Max subsequences in each half found by recursion.
How do we find the straddling max subsequence?
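Key Observation:
Left half of the straddling sequence is the max
subsequence ending at the last element of the left half (-2 here).
Right half of the straddling sequence is the max
subsequence beginning at the first element of the right half (-1 here).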
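The divide-and-conquer scheme might look like the sketch below. The helper name maxSubSumRec, the wrapper maxSubSumDivide, and the inclusive index convention are assumptions made for illustration. Each call solves the two halves recursively, then finds the best straddling sum by scanning outward from the midpoint in both directions.

```cpp
#include <algorithm>
#include <vector>

// Sketch of Algorithm 3. Returns the max contiguous sum within a[lo..hi]
// (inclusive), or 0 if every sum is negative. Names are hypothetical.
int maxSubSumRec(const std::vector<int>& a, int lo, int hi) {
    if (lo > hi) return 0;                      // empty range
    if (lo == hi) return std::max(a[lo], 0);    // single element
    const int mid = lo + (hi - lo) / 2;
    const int bestLeft = maxSubSumRec(a, lo, mid);
    const int bestRight = maxSubSumRec(a, mid + 1, hi);

    // Best sum of a chunk that ends exactly at mid.
    int leftBorder = 0, sum = 0;
    for (int i = mid; i >= lo; --i) {
        sum += a[i];
        leftBorder = std::max(leftBorder, sum);
    }
    // Best sum of a chunk that starts exactly at mid + 1.
    int rightBorder = 0;
    sum = 0;
    for (int j = mid + 1; j <= hi; ++j) {
        sum += a[j];
        rightBorder = std::max(rightBorder, sum);
    }
    return std::max({bestLeft, bestRight, leftBorder + rightBorder});
}

int maxSubSumDivide(const std::vector<int>& a) {
    return maxSubSumRec(a, 0, static_cast<int>(a.size()) - 1);
}
```

On the example 4 -3 5 -2 | -1 2 6 -2, the halves give 6 and 8, while the straddling candidate is 4 + 7 = 11, which is the answer.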
Algorithm 3: Analysis
T(1) = 1
T(n) = 2T(n/2) + O(n)
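The recurrence can be unwound as in the worked derivation below. It assumes, for simplicity, that n is a power of two and writes the O(n) term as cn for some constant c; both are assumptions introduced here.

```latex
\begin{align*}
T(n) &= 2T(n/2) + cn \\
     &= 4T(n/4) + 2cn \\
     &= 8T(n/8) + 3cn \\
     &\;\;\vdots \\
     &= 2^k T(n/2^k) + kcn \\
     &= n\,T(1) + cn\log_2 n \qquad (n = 2^k) \\
     &= O(n \log n)
\end{align*}
```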
Algorithm 4
2, 3, -2, 1, -5, 4, 1, -3, 4, -1, 2
int maxSum = 0, thisSum = 0;
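One common way to complete this single-pass idea is sketched here. The function name maxSubSumLinear and the std::vector input are assumptions, and the body is a standard linear scan rather than a transcription of the slide. The running sum is reset to zero whenever it turns negative, since a negative prefix can never help a later chunk.

```cpp
#include <vector>

// Sketch of Algorithm 4: one pass, O(n) time, O(1) extra space.
// The name maxSubSumLinear is hypothetical.
int maxSubSumLinear(const std::vector<int>& a) {
    int maxSum = 0, thisSum = 0;
    for (int x : a) {
        thisSum += x;
        if (thisSum > maxSum)
            maxSum = thisSum;   // new best chunk ends here
        else if (thisSum < 0)
            thisSum = 0;        // drop a negative prefix and start over
    }
    return maxSum;
}
```

For the sequence 2, 3, -2, 1, -5, 4, 1, -3, 4, -1, 2 the scan resets after -5 and ends with the chunk 4, 1, -3, 4, -1, 2, whose sum is 7.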
Proof of Correctness
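Suppose j is the first index after i at which the sum A_i + … + A_j
becomes < 0.
The max subsequence cannot start at any p between i and j:
A_i + … + A_{p-1} is non-negative, so starting at i instead of p
would give a sum at least as large.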
Why Efficient Algorithms Matter
Suppose N = 10^6
A PC can read/process N records in 1 sec.
But if some algorithm does N*N computation, then it takes
10^6 seconds, or about 11.6 days!
How to Measure Algorithm Performance
Memory required
Running time
Abstraction
An algorithm may run differently depending on:
the hardware platform (PC, Cray, Sun)
Average, Best, and Worst-Case
Average case:
Real world distributions difficult to predict
Best case:
Seems unrealistic
Worst case:
Gives an absolute guarantee
Examples
Vector addition Z = A+B
for (int i=0; i<n; i++)
    Z[i] = A[i] + B[i];
T(n) = c n
Examples
Vector (outer) multiplication Z = A * B^T
for (int i=0; i<n; i++)
    for (int j=0; j<n; j++)
        Z[i][j] = A[i] * B[j];
T(n) = c_2 n^2
Simplifying the Bound
10 or 20 terms
Do we really need that many terms?
Simplifications
Keep just one term!
the fastest growing term (dominates the runtime)
etc.
Simplification
Drop the constant coefficient
Does not affect the relative order
Simplification
The faster growing term (such as 2^n) eventually will
outgrow the slower growing terms (e.g., 1000 n) no
matter what their coefficients!
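To make this concrete, the small sketch below searches for the first n at which 2^n exceeds coeff * n. The function name crossover and its parameter are hypothetical, and the search is limited to n < 63 so that 2^n fits in 64 bits; it also assumes coeff is small enough that coeff * n does not overflow.

```cpp
#include <cstdint>

// Returns the smallest n >= 1 with 2^n > coeff * n, or -1 if none is found
// below n = 63. Hypothetical helper for illustration only.
int crossover(std::uint64_t coeff) {
    for (int n = 1; n < 63; ++n) {
        const std::uint64_t power = std::uint64_t{1} << n;  // 2^n
        if (power > coeff * static_cast<std::uint64_t>(n)) return n;
    }
    return -1;
}
```

With coeff = 1000 the crossover is at n = 14: 2^13 = 8192 is still below 13000, but 2^14 = 16384 exceeds 14000.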
Complexity and Tractability
T(n)
n      n       n log n   n^2      n^3      n^4          n^10           2^n
10     .01μs   .03μs     .1μs     1μs      10μs         10s            1μs
20     .02μs   .09μs     .4μs     8μs      160μs        2.84h          1ms
30     .03μs   .15μs     .9μs     27μs     810μs        6.83d          1s
40     .04μs   .21μs     1.6μs    64μs     2.56ms       121d           18m
50     .05μs   .28μs     2.5μs    125μs    6.25ms       3.1y           13d
100    .1μs    .66μs     10μs     1ms      100ms        3171y          4×10^13y
10^3   1μs     9.96μs    1ms      1s       16.67m       3.17×10^13y    32×10^283y
10^4   10μs    130μs     100ms    16.67m   115.7d       3.17×10^23y
10^5   100μs   1.66ms    10s      11.57d   3171y        3.17×10^33y
10^6   1ms     19.92ms   16.67m   31.71y   3.17×10^7y   3.17×10^43y
log n   n     n log n   n^2    n^3     2^n
0       1     0         1      1       2
1       2     2         4      8       4
2       4     8         16     64      16
3       8     24        64     512     256
4       16    64        256    4096
[Figure: plots of log n, n, n log n, n^2, n^3, and 2^n against n]
Another View
More resources (time and/or processing power) translate into
larger problems solved if complexity is low
T(n)     Max n in 10^3 time units   Max n in 10^4 time units   Increase in n
1000n    1                          10                         10
5n^2     14                         45                         3.2
n^3      10                         22                         2.2
2^n      10                         13                         1.3
Asymptotics
T(n)              Keep one term   Drop coefficient
3n^2 + 4n + 1     3n^2            n^2
15n^2 + 6n        15n^2           n^2
an^2 + bn + c     an^2            n^2
Caveats
memory requirements
Asymptotic Notations
By Pictures
Big-Oh (most commonly used)
bounded above
Big-Omega
bounded below
Big-Theta
exactly
Small-o
Example
T(n) = n^3 + 2n^2
O(?)     Ω(?)
∞        0
n^10     n
n^5      n^2
n^3      n^3
Examples
f(n)                       Asymptotic
c                          Θ(1)
Σ_{i=1}^{k} c_i n^i        Θ(n^k)
Σ_{i=1}^{n} i              Θ(n^2)
Σ_{i=1}^{n} i^2            Θ(n^3)
Σ_{i=1}^{n} i^k            Θ(n^{k+1})
Σ_{i=0}^{n} r^i            Θ(r^n)   (for r > 1)
n!                         Θ(√n (n/e)^n)
Σ_{i=1}^{n} 1/i            Θ(log n)
Summary (Why O(n)?)
T(n) = c_k n^k + c_{k-1} n^{k-1} + c_{k-2} n^{k-2} + … + c_1 n + c_0
Too complicated
O(n^k)
Lower-order terms and constant coefficients do not
matter asymptotically
Other criteria hard to quantify
Runtime Analysis
Useful rules
simple statements (read, write, assign)
O(1) (constant)
simple operations (+ - * / == > >= < <=)
O(1)
sequence of simple statements/operations
rule of sums
for, do, while loops
rule of products
Runtime Analysis (cont.)
Two important rules
Rule of sums
Runtime Analysis (cont.)
if (cond) then O(1)
body1 T1(n)
else
body2 T2(n)
endif
Runtime Analysis (cont.)
Method calls
A calls B
B calls C
etc.
flattened
T(n) = max(T_A(n), T_B(n), T_C(n))
Example
Run Time for Recursive Programs
T(n) is defined recursively in terms of T(k), k<n
The recurrence relations allow T(n) to be “unwound”
Factorial
Hanoi towers
Example: Factorial
int factorial (int n) {
    if (n<=1) return 1;
    else return n * factorial(n-1);
}
factorial(n) = n * (n-1) * (n-2) * … * 1
T(n)
= T(n-1) + d
= T(n-2) + d + d
= T(n-3) + d + d + d
= ....
= T(1) + (n-1) * d
= c + (n-1) * d
= O(n)
Chain of calls and their costs:
n * factorial(n-1)         T(n)
(n-1) * factorial(n-2)     T(n-1)
(n-2) * factorial(n-3)     T(n-2)
…
2 * factorial(1)           T(2)
factorial(1)               T(1)
Example: Factorial (cont.)
int factorial1(int n) {
    if (n <= 1) return 1;
    else {
        int fact = 1;                   // O(1)
        for (int k = 2; k <= n; k++)    // body runs n-1 times: O(n)
            fact *= k;
        return fact;                    // O(1)
    }
}
Both algorithms are O(n).
Example: Hanoi Towers
Hanoi(n,A,B,C) =
Hanoi(n-1,A,C,B)+Hanoi(1,A,B,C)+Hanoi(n-1,C,B,A)
T(n)
= 2T(n-1) + c
= 2^2 T(n-2) + 2c + c
= 2^3 T(n-3) + 2^2 c + 2c + c
= ....
= 2^{n-1} T(1) + (2^{n-2} + ... + 2 + 1)c
= (2^{n-1} + 2^{n-2} + ... + 2 + 1)c      (taking T(1) = c)
= O(2^n)
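The recursive structure above can be checked with a short move counter. This is an illustrative sketch: the function name hanoiMoves and the argument order (n, from, to, via) are assumptions chosen to mirror Hanoi(n,A,B,C), and the base case n = 0 is added for convenience. Each call makes two recursive calls on n-1 disks plus one single-disk move, matching T(n) = 2T(n-1) + c.

```cpp
#include <cstdint>

// Counts the single-disk moves needed to transfer n disks from peg `from`
// to peg `to`, using peg `via` as scratch space. Hypothetical helper.
std::uint64_t hanoiMoves(int n, char from, char to, char via) {
    if (n == 0) return 0;
    return hanoiMoves(n - 1, from, via, to)   // park n-1 disks on `via`
         + 1                                  // move the largest disk
         + hanoiMoves(n - 1, via, to, from);  // bring the n-1 disks over
}
```

The count works out to 2^n - 1 moves, for example 7 moves for 3 disks and 1023 for 10 disks, consistent with the O(2^n) bound.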
Worst Case, Best Case, and Average Case
#include <utility>  // std::swap
template<class T>
void SelectionSort(T a[], int n)
{ // Early-terminating version of selection sort
    bool sorted = false;
    for (int size = n; !sorted && (size > 1); size--) {
        int pos = 0;
        sorted = true;
        // find largest
        for (int i = 1; i < size; i++)
            if (a[pos] <= a[i]) pos = i;
            else sorted = false; // out of order
        std::swap(a[pos], a[size - 1]);
    }
}
Worst Case
Best Case
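The gap between the two cases can be measured by counting element comparisons. The sketch below is an instrumented copy of the early-terminating selection sort, specialized to int for brevity; the function name countComparisons is hypothetical. On already sorted input the first pass finds nothing out of order and the loop stops after n-1 comparisons. An input such as 5 1 2 3 4 forces every pass to run, for n(n-1)/2 comparisons.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Runs early-terminating selection sort on a copy of the input and returns
// how many element comparisons were made. Hypothetical helper.
long countComparisons(std::vector<int> a) {
    long comparisons = 0;
    bool sorted = false;
    for (std::size_t size = a.size(); !sorted && size > 1; --size) {
        std::size_t pos = 0;
        sorted = true;
        for (std::size_t i = 1; i < size; ++i) {
            ++comparisons;
            if (a[pos] <= a[i]) pos = i;   // new largest so far
            else sorted = false;           // out of order
        }
        std::swap(a[pos], a[size - 1]);    // move largest to the end
    }
    return comparisons;
}
```

For n = 5 this gives 4 comparisons on 1 2 3 4 5 (best case, linear) and 10 on 5 1 2 3 4 (worst case, quadratic).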
[Figure: T(N) = O(f(N)). Curves for T(N), f(N), and c f(N); beyond n0, T(N) stays below c f(N).]
An Analogy: Cooking Recipes
Algorithms are detailed and precise instructions.
Example: bake a chocolate mousse cake.
Convert raw ingredients into processed output.
New advances.
New models: clusters, Internet, workstations