0% found this document useful (0 votes)
48 views

Lec 1

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
48 views

Lec 1

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 49

Data Structures and Algorithms

 Algorithm: Outline, the essence of a


computational procedure, step-by-step
instructions
 Program: an implementation of an
algorithm in some programming language
 Data structure: Organization of data
needed to solve the problem
Algorithmic problem
Specification
Specification ? of output as
of input a function of
input

 Infinitenumber of input instances satisfying the


specification. For eg: A sorted, non-decreasing
sequence of natural numbers of non-zero, finite
length:
 1, 20, 908, 909, 100000, 1000000000.
 3.
Algorithmic Solution

Input instance, Algorithm Output


adhering to related to
the the input as
specification required

 Algorithm describes actions on the input instance


 Infinitely many correct algorithms for the same
algorithmic problem
What is a Good Algorithm?
 Efficient:
 Running time
 Space used

 Efficiency as a function of input size:


 Thenumber of bits in an input number
 Number of data elements (numbers, points)
Measuring the Running Time
t (ms)
60

50
How should we measure the 40

running time of an algorithm? 30

20

10

Experimental Study 0 50 100


n

 Write a program that implements the algorithm


 Run the program with data sets of varying size
and composition.
 Use a method like System.currentTimeMillis() to
get an accurate measure of the actual running
time.
Limitations of Experimental Studies

 Itis necessary to implement and test the


algorithm in order to determine its running time.
 Experiments can be done only on a limited set of
inputs, and may not be indicative of the running
time on other inputs not included in the
experiment.
 In order to compare two algorithms, the same
hardware and software environments should be
used.
Beyond Experimental Studies
We will develop a general methodology for
analyzing running time of algorithms. This
approach
 Uses a high-level description of the algorithm
instead of testing one of its implementations.
 Takes into account all possible inputs.
 Allows one to evaluate the efficiency of any
algorithm in a way that is independent of the
hardware and software environment.
Pseudo-Code
 A mixture of natural language and high-level
programming concepts that describes the main
ideas behind a generic implementation of a data
structure or algorithm.
 Eg: Algorithm arrayMax(A, n):
Input: An array A storing n integers.
Output: The maximum element in A.
currentMax  A[0]
for i  1 to n-1 do
if currentMax < A[i] then currentMax  A[i]
return currentMax
Pseudo-Code
It is more structured than usual prose but
less formal than a programming language
 Expressions:
 use standard mathematical symbols to
describe numeric and boolean expressions
 use  for assignment (“=” in Java)
 use = for the equality relationship (“==” in
Java)
 Method Declarations:
 Algorithm name(param1, param2)
Pseudo Code
 Programming Constructs:
 decision structures: if ... then ... [else ... ]
 while-loops: while ... do
 repeat-loops: repeat ... until ...
 for-loop: for ... do
 array indexing: A[i], A[i,j]

 Methods:
 calls:object method(args)
 returns: return value
Analysis of Algorithms
 Primitive
Operation: Low-level operation
independent of programming language.
Can be identified in pseudo-code. For eg :
 Data movement (assign)
 Control (branch, subroutine call, return)
 arithmetic an logical operations (e.g. addition,
comparison)
 Byinspecting the pseudo-code, we can
count the number of primitive operations
executed by an algorithm.
Example: Sorting
INPUT OUTPUT
sequence of numbers a permutation of the
sequence of numbers

a1, a2, a3,….,an b1,b2,b3,….,bn


Sort
2 5 4 10 7 2 4 5 7 10

Correctness
Correctness(requirements
(requirementsfor forthe
the Running
Runningtime
time
output)
output) Depends
Dependson on
For
Forany
anygiven
giveninputinputthethealgorithm
algorithm ••number
numberof ofelements
elements(n)(n)
halts
haltswith
withthetheoutput:
output: ••how
how(partially)
(partially)sorted
sorted
••bb1 <<bb2 <<bb3 <<…. ….<< bbnn they
theyare
are
1 2 3
••bb1,,bb2,,bb3,,…., bn is a ••algorithm
algorithm
1 2 3 …., bn is a
permutation
permutationof ofaa11,,aa22,,aa33,….,a
,….,ann
Insertion Sort
A 3 4 6 8 9 7 2 5 1
1 j n
i

Strategy
Strategy INPUT:
INPUT:A[1..n]
A[1..n]––an
anarray
arrayofofintegers
integers
OUTPUT:
OUTPUT:aapermutation
permutationofofAAsuch
suchthat
that
••Start
Start“empty
“emptyhanded”
handed” A[1]A[2]…A[n]
A[1]A[2]…A[n]
••Insert
Insertaacard
cardininthe
theright
right
position
positionofofthe
thealready
alreadysorted
sorted for
for j2
j2 to to nn dodo
hand
hand key A[j]
key A[j]
••Continue
Continueuntil
untilall
allcards
cardsare
are Insert
InsertA[j]
A[j]into
intothe
thesorted
sortedsequence
sequence
inserted/sorted
inserted/sorted A[1..j-1]
A[1..j-1]
ij-1
ij-1
while
while i>0 i>0 andand A[i]>key
A[i]>key
do
do A[i+1]A[i]
A[i+1]A[i]
i--
i--
A[i+1]key
A[i+1]key
Analysis of Insertion Sort
cost times
for j2 to n do c1 n
keyA[j] c2 n-1
Insert A[j] into the sorted 0 n-1
sequence A[1..j-1]
ij-1 c3 n-1

n
t
while i>0 and A[i]>key c4
j =2 j


n
(t j - 1)
do A[i+1]A[i] c5
j =2


n
(t j - 1)
i-- j =2
c6 n-1
A[i+1] Ã key
c7
Total time = n(c1+c2+c3+c7) + nj=2 tj (c4+c5+c6)
– (c2+c3+c5+c6+c7)
Best/Worst/Average Case
Total time = n(c1+c2+c3+c7) + nj=2 tj (c4+c5+c6)
– (c2+c3+c5+c6+c7)
 Best case: elements already sorted; tj=1,
running time = f(n), i.e., linear time.
 Worst case: elements are sorted in
inverse order; tj=j, running time = f(n2), i.e.,
quadratic time
 Average case: tj=j/2, running time = f(n2),
i.e., quadratic time
Best/Worst/Average Case (2)
 Fora specific size of input n, investigate
running times for different input instances:
Best/Worst/Average Case (3)
For inputs of all sizes:

worst-case

6n average-case
Running time

5n
best-case
4n

3n

2n

1n

1 2 3 4 5 6 7 8 9 10 11 12 …..
Input instance size
Best/Worst/Average Case (4)
 Worst case is usually used: It is an upper-
bound and in certain application domains (e.g.,
air traffic control, surgery) knowing the worst-
case time complexity is of crucial importance
 For some algorithms worst case occurs fairly
often
 Average case is often as bad as the worst
case
 Finding average case can be very difficult
Asymptotic Analysis
 Goal: to simplify analysis of running time by
getting rid of ”details”, which may be affected by
specific implementation and hardware
 like“rounding”: 1,000,001  1,000,000
 3n2  n2

 Capturing the essence: how the running time of


an algorithm increases with the size of the input
in the limit.
 Asymptotically more efficient algorithms are best for
all but small inputs
Asymptotic Notation
 The “big-Oh” O-Notation
 asymptotic upper bound
 f(n) is O(g(n)), if there
exists constants c and
n0, s.t. f(n)  c g(n) for n c  g ( n)
 n0 f (n )

Running Time
 f(n) and g(n) are
functions over non-
negative integers
n0
 Usedfor worst-case Input Size

analysis
Example
For functions f(n) and g(n) there are positive 
constants c and n0 such that: f(n) ≤ c g(n) for n ≥ n0
f(n) = 2n + 6

conclusion: 
     2n+6 is O(n). c g(n) = 4n

g(n) = n

n
Another Example

On the other hand…
n2 is not O(n) because there is 
no c and n0 such that:  
n2 ≤ cn for n ≥ n0 

The graph to the right 
illustrates that no matter how 
large a c is chosen there is an n 
big enough that n2 > cn ) .
Asymptotic Notation
 SimpleRule: Drop lower order terms and
constant factors.
 50 n log n is O(n log n)
 7n - 3 is O(n)
 8n2 log n + 5n2 + n is O(n2 log n)

 Note:Even though (50 n log n) is O(n5), it


is expected that such an approximation be
of as small an order as possible
Asymptotic Analysis of Running Time
 Use O-notation to express number of primitive
operations executed as function of input size.
 Comparing asymptotic running times
 an algorithm that runs in O(n) time is better
than one that runs in O(n2) time
 similarly, O(log n) is better than O(n)
 hierarchy of functions: log n < n < n 2 < n3 < 2n
 Caution! Beware of very large constant factors.
An algorithm running in time 1,000,000 n is still
O(n) but might be less efficient than one running
in time 2n2, which is O(n2)
Example of Asymptotic Analysis
Algorithm prefixAverages1(X):
Input: An n-element array X of numbers.
Output: An n-element array A of numbers such that
A[i] is the average of elements X[0], ... , X[i].
for i  0 to n-1 do
a0
n iterations
for j  0 to i do i iterations 
a  a + X[j] 1 step with 
A[i]  a/(i+1) i=0,1,2...n­1
return array A
Analysis: running time is O(n2)
A Better Algorithm
Algorithm prefixAverages2(X):
Input: An n-element array X of numbers.
Output: An n-element array A of numbers such
that A[i] is the average of elements X[0], ... , X[i].
s0
for i  0 to n do
s  s + X[i]
A[i]  s/(i+1)
return array A
Analysis: Running time is O(n)
Asymptotic Notation (terminology)
 Special classes of algorithms:
 Logarithmic: O(log n)
 Linear: O(n)
 Quadratic: O(n2)
 Polynomial: O(nk), k ≥ 1
 Exponential: O(an), a > 1

 “Relatives” of the Big-Oh


 (f(n)): Big Omega -asymptotic lower bound
  (f(n)): Big Theta -asymptotic tight bound
Asymptotic Notation
 The “big-Omega”
-Notation
 asymptotic lower bound f (n )

Running Time
 f(n) is (g(n)) if there exists
c  g ( n)
constants c and n0, s.t.
c g(n)  f(n) for n  n0
 Used to describe best- n0
case running times or Input Size

lower bounds for


algorithmic problems
 E.g.,lower-bound for
searching in an unsorted
array is (n).
Asymptotic Notation
 The “big-Theta” -Notation
 asymptotically tight bound
 f(n) = (g(n)) if there exists
constants c1, c2, and n0, s.t.
c1 g(n)  f(n)  c2 g(n) for n 
n0 c 2  g (n )
 f(n) is (g(n)) if and only if f (n )

Running Time
f(n) is (g(n)) and f(n) is c 1  g (n )

(g(n))
 O(f(n)) is often misused n0 Input Size
instead of (f(n))
Asymptotic Notation
Two more asymptotic notations
 "Little-Oh" notation f(n) is o(g(n))
non-tight analogue of Big-Oh
 For every c, there should exist n0 , s.t. f(n)
 c g(n) for n  n0
 Used for comparisons of running times. If
f(n)=o(g(n)), it is said that g(n) dominates
f(n).
 "Little-omega" notation f(n) is (g(n))
non-tight analogue of Big-Omega
Asymptotic Notation
 Analogy with real numbers
 f(n) = O(g(n)) f g
 f(n) = (g(n)) f g
 f(n) = (g(n)) f =g
 f(n) = o(g(n)) f g
 f(n) = (g(n)) f g
 Abuse
of notation: f(n) = O(g(n)) actually
means f(n) O(g(n))
Comparison of Running Times

Running Maximum problem size (n)


Time 1 second 1 minute 1 hour

400n 2500 150000 9000000


20n log n 4096 166666 7826087
2n2 707 5477 42426
n4 31 88 244
2n 19 25 31
Correctness of Algorithms
 The algorithm is correct if for any legal
input it terminates and produces the
desired output.
 Automatic proof of correctness is not
possible
 But there are practical techniques and
rigorous formalisms that help to reason
about the correctness of algorithms
Partial and Total Correctness

 Partial correctness
IF this point is reached, THEN this is the desired output

Any legal input Algorithm Output

 Total correctness
INDEED this point is reached, AND this is the desired output

Any legal input Algorithm Output


Assertions
 To prove correctness we associate a number of
assertions (statements about the state of the
execution) with specific checkpoints in the
algorithm.
 E.g., A[1], …, A[k] form an increasing sequence
 Preconditions – assertions that must be valid
before the execution of an algorithm or a
subroutine
 Postconditions – assertions that must be valid
after the execution of an algorithm or a
subroutine
Loop Invariants
 Invariants – assertions that are valid any
time they are reached (many times during
the execution of an algorithm, e.g., in
loops)
 We must show three things about loop
invariants:
 Initialization – it is true prior to the first
iteration
 Maintenance – if it is true before an iteration,
it remains true before the next iteration
 Termination – when loop terminates the
invariant gives a useful property to show the
correctness of the algorithm
Example of Loop Invariants (1)
 Invariant: at the start of for jj ÃÃ 22 to
for to length(A)
length(A)
each for loop, A[1…j-1] do key ÃÃ A[j]
do key A[j]
ii ÃÃ j-1
j-1
consists of elements while
while i>0
i>0 and
and A[i]>key
A[i]>key
originally in A[1…j-1] but do A[i+1] ÃÃ A[i]
do A[i+1] A[i]
in sorted order i--
i--
A[i+1] ÃÃ key
A[i+1] key
Example of Loop Invariants (2)
 Invariant: at the start of for jj ÃÃ 22 to
for to length(A)
length(A)
each for loop, A[1…j-1] do key ÃÃ A[j]
do key A[j]
ii ÃÃ j-1
j-1
consists of elements while
while i>0
i>0 and
and A[i]>key
A[i]>key
originally in A[1…j-1] but do
do A[i+1]Ã
A[i+1]Ã A[i]
A[i]
in sorted order i--
i--
A[i+1] ÃÃ key
A[i+1] key

 Initialization: j = 2, the invariant trivially holds


because A[1] is a sorted array 
Example of Loop Invariants (3)
 Invariant: at the start of for jj ÃÃ 22 to
for to length(A)
length(A)
each for loop, A[1…j-1] do key ÃÃ A[j]
do key A[j]
ii ÃÃ j-1
j-1
consists of elements while
while i>0
i>0 and
and A[i]>key
A[i]>key
originally in A[1…j-1] but do A[i+1] ÃÃ A[i]
do A[i+1] A[i]
in sorted order i--
i--
A[i+1] ÃÃ key
A[i+1] key

 Maintenance: the inner while loop moves elements


A[j-1], A[j-2], …, A[j-k] one position right without
changing their order. Then the former A[j] element is
inserted into k-th position so that A[k-1]  A[k] 
A[k+1].
A[1…j-1] sorted + A[j] A[1…j] sorted
Example of Loop Invariants (4)
 Invariant: at the start of for jj ÃÃ 22 to
for to length(A)
length(A)
each for loop, A[1…j-1] do key ÃÃ A[j]
do key A[j]
ii ÃÃ j-1
j-1
consists of elements while
while i>0
i>0 and
and A[i]>key
A[i]>key
originally in A[1…j-1] but do A[i+1] ÃÃ A[i]
do A[i+1] A[i]
in sorted order i--
i--
A[i+1] ÃÃ key
A[i+1] key

 Termination: the loop terminates, when j=n+1. Then


the invariant states: “A[1…n] consists of elements
originally in A[1…n] but in sorted order” 
Math You Need to Review
 Properties of logarithms:
logb(xy) = logbx + logby
logb(x/y) = logbx - logby
logb xa = a logb x
logb a = logxa/logxb
 Properties of exponentials:
a(b+c) = abac ; abc = (ab)c
ab /ac = a(b-c) ; b = aloga b
 Floor: x = the largest integer ≤ x
 Ceiling: x = the smallest integer ≥ x
Math Review

 Geometric progression
 given an integer n0 and a real number 0< a  1
n 1
n
1 - a

i =0
a i = 1  a  a 2  ...  a n =
1- a

 geometric progressions exhibit exponential


growth
 Arithmetic progression
n
n2  n

i =0
i = 1  2  3  ...  n =
2
Summations
 Therunning time of insertion sort is
determined by a nested loop
for
for j2
j2 to
to length(A)
length(A)
keyA[j]
keyA[j]
ij-1
ij-1
while
while i>0
i>0 and
and A[i]>key
A[i]>key
A[i+1]A[i]
A[i+1]A[i]
ii-1
ii-1
A[i+1]key
A[i+1]key

 Nested loops correspond to summations


 j =2
n
( j - 1) = O ( n 2
)
Proof by Induction
 We want to show that property P is true for
all integers n  n0
 Basis: prove that P is true for n0
 Inductive step: prove that if P is true for all
k such that n0  k  n – 1 then P is also true
for n
n
n(n  1)
 Example S ( n ) =  i
i =0
=
2
for n  1

1
1(1  1)
 Basis S (1) =  i =
i =0 2
Proof by Induction (2)
 Inductive Step
k
k (k  1)
S (k ) =  i = for 1  k  n - 1
i =0 2
n n -1
S (n) =  i = i  n =S (n - 1)  n =
i =0 i=0

(n - 1  1) ( n 2 - n  2n)
= (n - 1) n= =
2 2
n(n  1)
=
2

You might also like