4 Randquicksort

Uploaded by

arastogi1997
Copyright
© © All Rights Reserved
Algorithms for Data Science

CSOR W4246

Eleni Drinea
Computer Science Department

Columbia University

Randomized quicksort
Outline

1 Randomized Quicksort
Pseudocode for randomized Quicksort

Randomized-Quicksort(A, left, right)
  if left > right then return    // the subarray A[left..right] is empty
  end if
  split = Randomized-Partition(A, left, right)
  Randomized-Quicksort(A, left, split − 1)
  Randomized-Quicksort(A, split + 1, right)

Randomized-Partition(A, left, right)
  b = random(left, right)
  swap(A[b], A[right])
  return Partition(A, left, right)

Subroutine random(i, j) returns an integer chosen uniformly at random between i and j, inclusive.
Expected running time of randomized Quicksort

• Let T(n) be the expected running time of Randomized-Quicksort.
• We want to bound T(n).
• Randomized-Quicksort differs from Quicksort only in how they select their pivot elements.

⇒ We will analyze Randomized-Quicksort based on Quicksort and Partition.
Pseudocode for Partition

Partition(A, left, right)
  pivot = A[right]                       line 1
  split = left − 1                       line 2
  for j = left to right − 1 do           line 3
    if A[j] ≤ pivot then                 line 4
      swap(A[j], A[split + 1])           line 5
      split = split + 1                  line 6
    end if
  end for
  swap(A[right], A[split + 1])           line 7
  return split + 1                       line 8
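
The pseudocode for Randomized-Quicksort, Randomized-Partition and Partition can be sketched in Python as follows. This is a minimal illustration, not the course's reference code: it assumes 0-based list indices, uses `random.randint` in the role of random(i, j), and the function names are chosen here for readability.

```python
import random

def partition(a, left, right):
    """Partition a[left..right] around the pivot a[right]; return its final index."""
    pivot = a[right]
    split = left - 1
    for j in range(left, right):
        if a[j] <= pivot:
            a[j], a[split + 1] = a[split + 1], a[j]
            split += 1
    # Move the pivot between the "<= pivot" block and the "> pivot" block.
    a[right], a[split + 1] = a[split + 1], a[right]
    return split + 1

def randomized_partition(a, left, right):
    b = random.randint(left, right)  # uniform, inclusive on both ends
    a[b], a[right] = a[right], a[b]
    return partition(a, left, right)

def randomized_quicksort(a, left=0, right=None):
    """Sort a[left..right] in place (the whole list by default)."""
    if right is None:
        right = len(a) - 1
    if left > right:  # empty subarray
        return
    split = randomized_partition(a, left, right)
    randomized_quicksort(a, left, split - 1)
    randomized_quicksort(a, split + 1, right)

data = [5, 2, 9, 1, 5, 6]
randomized_quicksort(data)
print(data)  # [1, 2, 5, 5, 6, 9]
```

The pivot choice is random, but the sorted result is not: only the running time depends on the random choices.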
A few observations

1. How many times is Partition called?

   At most n: every call places its pivot in its final position, and that pivot is excluded from both recursive calls.

2. Further, each Partition call spends some work

   (a) outside the for loop
       • every Partition call spends constant work outside the for loop
       • at most n calls to Partition
       ⇒ total work outside the for loop in all calls to Partition is O(n)
   (b) inside the for loop
       • let X be the total number of comparisons performed at line 4 in all calls to Partition
       • each comparison may require some further constant work (lines 5 and 6)
       ⇒ total work inside the for loop in all calls to Partition is O(X)
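
The two quantities in these observations can be measured directly. The sketch below is an instrumented variant written for this purpose (the name `quicksort_counts` and the `seed` parameter are assumptions of this example): it counts Partition calls and executions of the line 4 comparison.

```python
import random

def quicksort_counts(items, seed=0):
    """Sort a copy of items; return (sorted copy, Partition calls, comparisons X)."""
    rng = random.Random(seed)
    a = list(items)
    calls = 0
    comparisons = 0

    def sort(left, right):
        nonlocal calls, comparisons
        if left > right:
            return
        calls += 1                      # one Partition call
        b = rng.randint(left, right)
        a[b], a[right] = a[right], a[b]
        pivot = a[right]
        split = left - 1
        for j in range(left, right):
            comparisons += 1            # the comparison at line 4
            if a[j] <= pivot:
                a[j], a[split + 1] = a[split + 1], a[j]
                split += 1
        a[right], a[split + 1] = a[split + 1], a[right]
        p = split + 1                   # final position of the pivot
        sort(left, p - 1)
        sort(p + 1, right)

    sort(0, len(a) - 1)
    return a, calls, comparisons

n = 200
result, calls, x = quicksort_counts(random.Random(1).sample(range(10 * n), n))
print(calls <= n, x <= n * (n - 1) // 2)  # True True
```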
Towards a bound for T(n)

X = the total number of comparisons in all Partition calls.

The running time of Randomized-Quicksort is O(n + X).

Since X is a random variable, we need E[X] to bound T(n).

Fact 1.
Fix any two input items. During the execution of the algorithm, they are compared at most once.

Proof.
Comparisons are only performed against the pivot of each Partition call, so two items can be compared only in a call where one of them is the pivot. After Partition returns, the pivot is in its final location in the output and will not be part of the input to any future recursive call.
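
Fact 1 can be checked empirically. The sketch below is a simplified, out-of-place variant of the algorithm (an assumption of this example, chosen to keep the bookkeeping short): it assumes distinct input values and records every pair that is compared against a pivot.

```python
import random
from collections import Counter

def compared_pairs(values, seed=0):
    """Return a Counter mapping each compared pair (smaller, larger) to its count."""
    rng = random.Random(seed)
    counts = Counter()

    def sort(sub):
        if not sub:
            return
        pivot = rng.choice(sub)                  # uniformly random pivot
        rest = [v for v in sub if v != pivot]    # values are assumed distinct
        for v in rest:                           # every other item meets the pivot once
            counts[(min(v, pivot), max(v, pivot))] += 1
        sort([v for v in rest if v < pivot])
        sort([v for v in rest if v > pivot])

    sort(list(values))
    return counts

counts = compared_pairs(range(50))
print(max(counts.values()))  # 1
```

Since no pair is counted twice, the number of recorded pairs equals the total number of comparisons X for that run.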
Simplifying the analysis

• There are n numbers in the input, hence $\binom{n}{2} = \frac{n(n-1)}{2}$ distinct (unordered) pairs of input numbers.
• From Fact 1, the algorithm will perform at most $\binom{n}{2}$ comparisons.
• What is the expected number of comparisons?

To simplify the analysis

• relabel the input as $z_1, z_2, \ldots, z_n$, where $z_i$ is the i-th smallest number.
• assume that all input numbers are distinct; thus $z_i < z_j$ for $i < j$.
Writing X as the sum of indicator random variables

Let $X_{ij}$ be an indicator random variable such that

$$X_{ij} = \begin{cases} 1, & \text{if } z_i \text{ and } z_j \text{ are ever compared} \\ 0, & \text{otherwise} \end{cases}$$

The total number of comparisons is given by $X = \sum_{1 \le i < j \le n} X_{ij}$.

By linearity of expectation

$$E[X] = E\left[\sum_{1 \le i < j \le n} X_{ij}\right] = \sum_{1 \le i < j \le n} E[X_{ij}] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \Pr[X_{ij} = 1]$$

Goal: compute $\Pr[X_{ij} = 1]$, that is, the probability that two fixed items $z_i$ and $z_j$ are ever compared.
Fix two items $z_i$ and $z_j$. When are they compared?

Notation: let $Z_{ij} = \{z_i, z_{i+1}, \ldots, z_j\}$

Consider the initial call Partition(A, 1, n). Assume it picks $z_k$ outside $Z_{ij}$ as pivot (see figure below).

[Figure: the sorted order $z_1 < z_2 < \ldots < z_i < z_{i+1} < \ldots < z_j < \ldots < z_k < \ldots < z_n$, with the block $Z_{ij}$ (from $z_i$ to $z_j$) marked and the pivot $z_k$ lying outside it.]

1. $z_i$ and $z_j$ are not compared in this call (why?).

2. All items in $Z_{ij}$ will be greater (or smaller) than $z_k$, so they will all be input to the same subproblem after Partition(A, 1, n) returns.
In the first Partition with pivot ∈ $Z_{ij} = \{z_i, \ldots, z_j\}$

The first Partition call that picks its pivot from $Z_{ij}$ determines if $z_i$, $z_j$ are ever compared. Three possibilities:

1. pivot = $z_i$

   $z_i$ is compared with every element in $Z_{ij} - \{z_i\}$, thus with $z_j$ too. $z_i$ is placed in its final location in the output and will not appear in any future calls to Partition.

2. pivot = $z_j$

   $z_j$ is compared with every element in $Z_{ij} - \{z_j\}$, thus with $z_i$ too. $z_j$ is placed in its final location in the output and will not appear in any future recursive calls.

3. pivot = $z_\ell$, for some $i < \ell < j$

   $z_i$ and $z_j$ are never compared (why?)


So $z_i$ and $z_j$ are compared when . . .

. . . either of them is chosen as pivot in the first Partition call that chooses its pivot element from $Z_{ij}$.

Now we can compute $\Pr[X_{ij} = 1]$:

$$\begin{aligned}
\Pr[X_{ij} = 1] = \Pr[\, & z_i \text{ is chosen as pivot by the first Partition call that picks its pivot from } Z_{ij}, \text{ or} \\
& z_j \text{ is chosen as pivot by the first Partition call that picks its pivot from } Z_{ij} \,] \qquad (1)
\end{aligned}$$
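The union bound

Suppose we are given a set of events $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$, and we are interested in the probability that any of them happens.

Union bound: Given events $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$, we have

$$\Pr\left[\bigcup_{i=1}^{n} \varepsilon_i\right] \le \sum_{i=1}^{n} \Pr[\varepsilon_i].$$

Union bound for mutually exclusive events: Suppose that $\varepsilon_i \cap \varepsilon_j = \emptyset$ for each pair of events. Then

$$\Pr\left[\bigcup_{i=1}^{n} \varepsilon_i\right] = \sum_{i=1}^{n} \Pr[\varepsilon_i].$$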
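
A small exact example may help separate the two cases. The sketch below uses one roll of a fair six-sided die as a hypothetical sample space (not taken from the slides) and compares the probability of a union with the sum of the individual probabilities.

```python
from fractions import Fraction

OUTCOMES = set(range(1, 7))  # one roll of a fair six-sided die

def pr(event):
    """Exact probability of an event, given as a set of outcomes."""
    return Fraction(len(event & OUTCOMES), len(OUTCOMES))

overlapping = [{1, 2, 3}, {3, 4}]  # the events share outcome 3
disjoint = [{1, 2}, {5}]           # mutually exclusive events

for events in (overlapping, disjoint):
    union = set().union(*events)
    print(pr(union), "<=", sum(pr(e) for e in events))
# 2/3 <= 5/6
# 1/2 <= 1/2
```

For the overlapping events the inequality is strict because outcome 3 is counted twice in the sum; for the disjoint events it holds with equality.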
Computing the probability that $z_i$ and $z_j$ are compared

Since the two events in equation (1) are mutually exclusive, we obtain

$$\begin{aligned}
\Pr[X_{ij} = 1] &= \Pr[z_i \text{ is chosen as pivot by the first Partition call that picks its pivot from } Z_{ij}] \\
&\quad + \Pr[z_j \text{ is chosen as pivot by the first Partition call that picks its pivot from } Z_{ij}] \\
&= \frac{1}{j-i+1} + \frac{1}{j-i+1} = \frac{2}{j-i+1}, \qquad (2)
\end{aligned}$$

since the set $Z_{ij}$ contains $j - i + 1$ elements and each of them is equally likely to be the first one chosen as pivot.
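
Equation (2) can be sanity-checked by simulation. The sketch below is an illustration under stated assumptions: it works with ranks 1..n instead of array contents and follows only the subproblem that still contains all of $Z_{ij}$, which is all that matters for whether $z_i$ and $z_j$ are compared. The function names are hypothetical.

```python
import random

def ever_compared(n, i, j, rng):
    """Simulate pivot choices on ranks 1..n; report whether z_i and z_j are compared."""
    lo, hi = 1, n                  # ranks in the subproblem containing Z_ij
    while True:
        pivot = rng.randint(lo, hi)
        if pivot == i or pivot == j:
            return True            # cases 1 and 2: the pivot is z_i or z_j
        if i < pivot < j:
            return False           # case 3: z_i and z_j are separated
        if pivot < i:              # pivot outside Z_ij: recurse on the side holding Z_ij
            lo = pivot + 1
        else:
            hi = pivot - 1

def estimate(n, i, j, trials=20000, seed=0):
    rng = random.Random(seed)
    hits = sum(ever_compared(n, i, j, rng) for _ in range(trials))
    return hits / trials

n, i, j = 20, 3, 8
print(estimate(n, i, j), 2 / (j - i + 1))  # the two values should be close
```

Note that the estimate does not depend on n: only the size of $Z_{ij}$ matters, exactly as in (2).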
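From $\Pr[X_{ij} = 1]$ to E[X]

Substituting $\ell = j - i + 1$ in the inner sum,

$$E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \Pr[X_{ij} = 1] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j-i+1} = 2 \sum_{i=1}^{n-1} \sum_{\ell=2}^{n-i+1} \frac{1}{\ell} \qquad (3)$$

Note that $\sum_{\ell=1}^{k} \frac{1}{\ell} = H_k$ is the k-th harmonic number, such that

$$\ln k \le H_k \le \ln k + 1 \qquad (4)$$

Hence $\sum_{\ell=2}^{n-i+1} \frac{1}{\ell} = H_{n-i+1} - 1 \le \ln(n-i+1)$. Substituting in (3), we get

$$E[X] \le 2 \sum_{i=1}^{n-1} \ln(n-i+1) \le 2 \sum_{i=1}^{n-1} \ln n = O(n \ln n)$$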
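
For small n the double sum in (3) can be evaluated exactly and compared with the upper bound $2(n-1)\ln n$ from the last step. The sketch below does this with exact rational arithmetic; the function name is an assumption of this example.

```python
from fractions import Fraction
from math import log

def expected_comparisons(n):
    """Exact value of E[X] = sum over 1 <= i < j <= n of 2 / (j - i + 1)."""
    return sum(Fraction(2, j - i + 1)
               for i in range(1, n)
               for j in range(i + 1, n + 1))

print(expected_comparisons(3))  # 8/3

for n in (2, 10, 100):
    exact = float(expected_comparisons(n))
    bound = 2 * (n - 1) * log(n)
    print(n, round(exact, 2), round(bound, 2), exact <= bound)
```

For n = 3 the pairs $(z_1, z_2)$ and $(z_2, z_3)$ each contribute 1 and the pair $(z_1, z_3)$ contributes 2/3, which gives 8/3.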
From E[X] to T(n)

• Equations (3), (4) also yield a lower bound of Ω(n ln n) for E[X] (show this!).

• Hence E[X] = Θ(n ln n). Then the expected running time of Randomized-Quicksort is

  T(n) = Θ(n ln n)
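
One possible route to the lower bound, offered as a sketch rather than the intended solution: it uses the left inequality of (4) together with the standard estimate $n! \ge (n/e)^n$, which is brought in from outside these slides.

```latex
\begin{align*}
E[X] &= 2\sum_{i=1}^{n-1}\bigl(H_{n-i+1} - 1\bigr)
      \;\ge\; 2\sum_{i=1}^{n-1}\bigl(\ln(n-i+1) - 1\bigr) \\
     &= 2\sum_{m=2}^{n}\ln m \;-\; 2(n-1)
      \;=\; 2\ln(n!) - 2(n-1) \\
     &\ge 2\,(n\ln n - n) - 2(n-1)
      \;=\; 2n\ln n - 4n + 2 \;=\; \Omega(n\ln n).
\end{align*}
```

The first line rewrites the inner sum of (3) as $H_{n-i+1} - 1$; the second reindexes with $m = n - i + 1$; the third applies $\ln(n!) \ge n\ln n - n$.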
