
Concentration Inequalities and Union Bound

Nan Jiang

September 13, 2022

This note introduces the basics of concentration inequalities and examples of their applications (often in combination with the union bound), which will be useful for the rest of this course.

1 Hoeffding’s Inequality
Theorem 1. Let $X_1, \ldots, X_n$ be independent random variables on $\mathbb{R}$ such that $X_i$ is bounded in the interval $[a_i, b_i]$. Let $S_n = \sum_{i=1}^n X_i$. Then for all $t > 0$,
$$\Pr[S_n - \mathbb{E}[S_n] \ge t] \le e^{-2t^2 / \sum_{i=1}^n (b_i - a_i)^2}, \qquad (1)$$
$$\Pr[S_n - \mathbb{E}[S_n] \le -t] \le e^{-2t^2 / \sum_{i=1}^n (b_i - a_i)^2}. \qquad (2)$$

Remarks:

• By union bound, we have $\Pr[|S_n - \mathbb{E}[S_n]| \ge t] \le 2e^{-2t^2 / \sum_{i=1}^n (b_i - a_i)^2}$.

• We often care about the convergence of the empirical mean to the true average, so we can divide $S_n$ by $n$: $\Pr\left[\left|\frac{S_n}{n} - \mathbb{E}\left[\frac{S_n}{n}\right]\right| \ge t\right] \le 2e^{-2n^2 t^2 / \sum_{i=1}^n (b_i - a_i)^2}$.

• A useful rephrasing of the result when all variables share the same support $[a, b]$: with probability at least $1 - \delta$, $\left|\frac{S_n}{n} - \frac{\mathbb{E}[S_n]}{n}\right| \le (b - a)\sqrt{\frac{1}{2n} \ln \frac{2}{\delta}}$.

• $X_1, \ldots, X_n$ are not necessarily identically distributed; they just have to be independent.

• The number of variables, $n$, is a constant in the theorem statement. When $n$ is a random variable itself, for Hoeffding's inequality to apply, $n$ cannot depend on the realization of $X_1, \ldots, X_n$.
Example: Consider the following Markov chain:

[Figure: a Markov chain over states s1, s2, s3, and s4, with transition probabilities p and 1 − p out of s1.]
Say we start at s1 and sample a path of length T (T is a constant). Let n be the number of times
we visit s1 , and we can use the transitions from s1 to estimate p.

1. Can we directly apply Hoeffding’s inequality here with n as the number of coin tosses? If
you want to derive a concentration bound for this problem, look up Azuma’s inequality.
2. What if we sample a path until we visit s1 N times for some constant N ? Can we apply
Hoeffding’s inequality with N as the number of random variables?
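To see Theorem 1 and its two-sided form in action numerically, here is a minimal simulation sketch (my own illustration, not part of the note) that repeatedly averages $n$ i.i.d. Bernoulli($p$) variables and compares the observed deviation frequency with the bound $2e^{-2n\epsilon^2}$; the values of $p$, $n$, and $\epsilon$ are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, eps, trials = 0.3, 500, 0.05, 20000   # hypothetical values for illustration

# Each trial draws n i.i.d. Bernoulli(p) variables (support [0, 1]) and records the empirical mean S_n / n.
emp_means = (rng.random((trials, n)) < p).mean(axis=1)

# Fraction of trials with |S_n/n - p| >= eps, versus the two-sided bound 2 * exp(-2 * n * eps^2).
observed = float(np.mean(np.abs(emp_means - p) >= eps))
hoeffding = 2.0 * np.exp(-2.0 * n * eps ** 2)
print(f"observed deviation frequency {observed:.4f} <= Hoeffding bound {hoeffding:.4f}")
```

As expected, the observed frequency sits well below the bound, since Hoeffding's inequality holds for any bounded distribution and is therefore not tight for any particular one.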

2 Multi-Armed Bandits (MAB)


2.1 Formulation
An MAB problem is specified by $K$ distributions over $\mathbb{R}$, $\{R_i\}_{i=1}^K$. Each $R_i$ has bounded support $[0, 1]$ and mean $\mu_i$. Let $\mu^\star = \max_{i \in [K]} \mu_i$. For each round $t = 1, 2, \ldots, T$, the learner

1. Chooses arm $i_t \in [K]$.

2. Receives reward $r_t \sim R_{i_t}$.

A popular objective for MAB is the pseudo-regret, which poses the exploration-exploitation challenge:
$$\mathrm{Regret}_T = \sum_{t=1}^{T} (\mu^\star - \mu_{i_t}).$$

Another important objective is the simple regret:
$$\mu^\star - \mu_{\hat i},$$
where $\hat i$ is the arm that the learner picks after $T$ rounds of interaction. This poses the “pure exploration” challenge, since all that matters is to make a good final guess; the regret incurred within the $T$ rounds does not matter. A related objective is called Best-Arm Identification, which asks whether $\hat i \in \arg\max_{i \in [K]} \mu_i$; Best-Arm Identification results often require additional gap conditions.
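To make the formulation concrete, here is a minimal simulation sketch (my own illustration; the Bernoulli reward distributions, the class name `BernoulliMAB`, and the helper functions are hypothetical choices not specified in the note) that tracks the pseudo-regret of a sequence of pulls and the simple regret of a final guess.

```python
import numpy as np

class BernoulliMAB:
    """A K-armed bandit where each arm i has a Bernoulli reward distribution R_i with mean mu_i."""

    def __init__(self, means, seed=0):
        self.means = np.asarray(means, dtype=float)   # mu_1, ..., mu_K, each in [0, 1]
        self.rng = np.random.default_rng(seed)

    def pull(self, i):
        """Draw a reward r_t ~ R_i for the chosen arm i."""
        return float(self.rng.random() < self.means[i])

def pseudo_regret(means, chosen_arms):
    """Regret_T = sum_{t=1}^T (mu_star - mu_{i_t}) for a sequence of chosen arms."""
    means = np.asarray(means, dtype=float)
    return float(np.sum(means.max() - means[np.asarray(chosen_arms)]))

def simple_regret(means, i_hat):
    """mu_star - mu_{i_hat} for the final guess i_hat."""
    means = np.asarray(means, dtype=float)
    return float(means.max() - means[i_hat])

# Example: pull the three arms round-robin for T = 9 rounds, then guess arm 0.
mab = BernoulliMAB(means=[0.5, 0.6, 0.7])
chosen = [t % 3 for t in range(9)]
rewards = [mab.pull(i) for i in chosen]
print(pseudo_regret(mab.means, chosen), simple_regret(mab.means, i_hat=0))
```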

2.2 Uniform sampling


We consider the simplest algorithm, which chooses each arm the same number of times and, after $T$ rounds, selects the arm with the highest empirical mean. For simplicity let's assume that $T/K$ is an integer. We will prove a high-probability bound on the simple regret. The analysis gives an example of the application of Hoeffding's inequality to a learning problem; the algorithm itself is likely to be suboptimal.
After $T$ rounds, each arm has been chosen $T/K$ times; let $\hat\mu_i$ be the empirical average reward associated with arm $i$. By Hoeffding's inequality, we have:
$$\Pr[|\hat\mu_i - \mu_i| \ge \epsilon] \le 2e^{-2T\epsilon^2/K}.$$

Now we want accurate estimation for all arms simultaneously. That is, we want to bound the probability of the event that any $\hat\mu_i$ deviates from $\mu_i$ too much. This is where the union bound is useful:
$$\Pr\left[\bigcup_{i=1}^{K} \{|\hat\mu_i - \mu_i| \ge \epsilon\}\right] \qquad \text{(the event that estimation is $\epsilon$-inaccurate for at least 1 arm)}$$
$$\le \sum_{i=1}^{K} \Pr[|\hat\mu_i - \mu_i| \ge \epsilon] \le 2Ke^{-2T\epsilon^2/K}. \qquad \text{(union bound, then Hoeffding's inequality)}$$
To rephrase this result: with probability at least $1 - \delta$, $|\hat\mu_i - \mu_i| \le \sqrt{\frac{K}{2T} \ln \frac{2K}{\delta}}$ holds for all $i$ simultaneously.
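Concretely, the rephrasing follows by setting the failure probability equal to $\delta$ and solving for $\epsilon$:
$$2Ke^{-2T\epsilon^2/K} \le \delta \;\Longleftrightarrow\; \frac{2T\epsilon^2}{K} \ge \ln\frac{2K}{\delta} \;\Longleftrightarrow\; \epsilon \ge \sqrt{\frac{K}{2T}\ln\frac{2K}{\delta}}.$$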
Finally, we use the estimation error to bound the decision loss: recall that $\hat i = \arg\max_{i \in [K]} \hat\mu_i$, and let $i^\star = \arg\max_{i \in [K]} \mu_i$.
$$\mu^\star - \mu_{\hat i} = \mu_{i^\star} - \hat\mu_{i^\star} + \hat\mu_{i^\star} - \mu_{\hat i} \le \mu_{i^\star} - \hat\mu_{i^\star} + \hat\mu_{\hat i} - \mu_{\hat i} \le 2\sqrt{\frac{K}{2T} \ln \frac{2K}{\delta}}.$$
We can rephrase this result as a sample complexity statement: in order to guarantee that $\mu^\star - \mu_{\hat i} \le \epsilon$ with probability at least $1 - \delta$, we need $T = O\left(\frac{K}{\epsilon^2} \ln \frac{K}{\delta}\right)$.
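To connect the analysis back to the algorithm it describes, here is a minimal sketch of uniform sampling together with the corresponding sample-size calculation. The helper names (`uniform_sampling`, `rounds_needed`) and the Bernoulli arms are my own illustrative choices; the constant inside `rounds_needed` comes from solving $2\sqrt{\frac{K}{2T}\ln\frac{2K}{\delta}} \le \epsilon$ for $T$, so treat this as a sketch rather than a definitive implementation.

```python
import math
import numpy as np

def uniform_sampling(pull, K, T):
    """Pull each arm T/K times, then return the arm with the highest empirical mean."""
    assert T % K == 0, "for simplicity, assume T/K is an integer"
    per_arm = T // K
    mu_hat = np.array([np.mean([pull(i) for _ in range(per_arm)]) for i in range(K)])
    return int(np.argmax(mu_hat)), mu_hat

def rounds_needed(K, eps, delta):
    """Smallest multiple of K satisfying 2 * sqrt((K / (2T)) * ln(2K / delta)) <= eps."""
    T = 2.0 * K * math.log(2.0 * K / delta) / eps ** 2
    return int(math.ceil(T / K)) * K

# Example: K = 5 Bernoulli arms, target simple regret eps = 0.1 with probability at least 0.9.
rng = np.random.default_rng(0)
means = np.array([0.3, 0.4, 0.5, 0.6, 0.7])
pull = lambda i: float(rng.random() < means[i])   # reward r_t ~ R_{i_t}
K, eps, delta = 5, 0.1, 0.1
T = rounds_needed(K, eps, delta)
i_hat, mu_hat = uniform_sampling(pull, K, T)
print("T =", T, "picked arm", i_hat, "simple regret", float(means.max() - means[i_hat]))
```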

2.3 Lower bound


The linear dependence of the sample complexity on $K$ makes a lot of sense, as to choose an arm with high reward we have to try each arm at least once. Below we will see how to mathematically formalize this idea and prove a lower bound on the sample complexity of MAB.
Theorem 2. For any $K \ge 2$, $\epsilon \le \sqrt{1/8}$, and any MAB algorithm, there exists an MAB instance where $\mu^\star$ is $\epsilon$ better than the other arms, yet the algorithm identifies the best arm with no more than $2/3$ probability unless $T \ge \frac{K}{72\epsilon^2}$.

The theorem itself is stated as a best-arm identification lower bound, but it is also a lower bound for simple regret minimization. This is because all arms except the best one are $\epsilon$ worse than $\mu^\star$, so missing the optimal arm means a simple regret of at least $\epsilon$.
See the proof in [1] (Theorem 2); the technique is due to [2] and can also be used to prove the lower bound on the regret of MAB.

3 Generalization Bounds for Supervised Learning


Consider a simple supervised learning setting: let $\mathcal{X}$ be the feature space and $\mathcal{Y}$ be the label space; in this example we consider classification, so $\mathcal{Y} = \{0, 1\}$. Let $P_{X,Y}$ be a distribution over $\mathcal{X} \times \mathcal{Y}$, and we are given a dataset $\{(X_i, Y_i)\}_{i=1}^n$ with each $(X_i, Y_i)$ drawn i.i.d. from $P_{X,Y}$. Let $\mathcal{F}$ be a finite hypothesis class of functions $f: \mathcal{X} \to \mathcal{Y}$. The classifier in $\mathcal{F}$ that minimizes the classification error is:
$$f^\star := \arg\min_{f \in \mathcal{F}} \mathbb{E}[\mathbb{I}[f(X) \ne Y]],$$
where $\mathbb{E}[\cdot]$ is w.r.t. $P_{X,Y}$. Given only a finite sample, one natural thing to do is empirical risk minimization, i.e., find the classifier that has the lowest training error rate on the data:
$$\hat f = \arg\min_{f \in \mathcal{F}} \hat{\mathbb{E}}[\mathbb{I}[f(X) \ne Y]], \quad \text{where} \quad \hat{\mathbb{E}}[\mathbb{I}[f(X) \ne Y]] := \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}[f(X_i) \ne Y_i].$$

The question is, can we give any guarantee on how good the learned classifier $\hat f$ is compared to the optimal one $f^\star$, as a function of $n$? In other words, we want to bound
$$\mathbb{E}[\mathbb{I}[\hat f(X) \ne Y]] - \mathbb{E}[\mathbb{I}[f^\star(X) \ne Y]].$$

We provide the analysis below, which mainly uses Hoeffding's inequality and the union bound. First of all,
$$\mathbb{E}[\mathbb{I}[\hat f(X) \ne Y]] - \mathbb{E}[\mathbb{I}[f^\star(X) \ne Y]]$$
$$\le \mathbb{E}[\mathbb{I}[\hat f(X) \ne Y]] - \hat{\mathbb{E}}[\mathbb{I}[\hat f(X) \ne Y]] + \hat{\mathbb{E}}[\mathbb{I}[f^\star(X) \ne Y]] - \mathbb{E}[\mathbb{I}[f^\star(X) \ne Y]] \qquad (\hat f \text{ is optimal w.r.t. } \hat{\mathbb{E}})$$
$$\le 2 \cdot \max_{f \in \mathcal{F}} \left|\mathbb{E}[\mathbb{I}[f(X) \ne Y]] - \hat{\mathbb{E}}[\mathbb{I}[f(X) \ne Y]]\right|. \qquad (3)$$
f ∈F

It then suffices to bound $\max_{f \in \mathcal{F}} \left|\mathbb{E}[\mathbb{I}[f(X) \ne Y]] - \hat{\mathbb{E}}[\mathbb{I}[f(X) \ne Y]]\right|$, which is often called a uniform deviation bound. The key is to realize that, for any fixed $f \in \mathcal{F}$, $\hat{\mathbb{E}}[\mathbb{I}[f(X) \ne Y]]$ is the average of i.i.d. random variables $\mathbb{I}[f(X_i) \ne Y_i]$ bounded in $[0, 1]$, whose true expectation is precisely $\mathbb{E}[\mathbb{I}[f(X) \ne Y]]$. Applying Hoeffding's, for a fixed $f \in \mathcal{F}$, with probability at least $1 - \delta$, we have
$$\left|\hat{\mathbb{E}}[\mathbb{I}[f(X) \ne Y]] - \mathbb{E}[\mathbb{I}[f(X) \ne Y]]\right| \le \sqrt{\frac{1}{2n} \ln \frac{2}{\delta}}.$$
Union bounding over $\mathcal{F}$ and plugging into Eq. (3),
$$\mathbb{E}[\mathbb{I}[\hat f(X) \ne Y]] - \mathbb{E}[\mathbb{I}[f^\star(X) \ne Y]] \le \sqrt{\frac{2}{n} \ln \frac{2|\mathcal{F}|}{\delta}}. \qquad (4)$$
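As a numerical sanity check on this derivation, here is a minimal sketch (my own illustration, not part of the note) that runs ERM over a small finite hypothesis class of threshold classifiers on synthetic data and compares the excess error against the bound in Eq. (4); the data distribution, the hypothesis class, and all parameter values are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, delta = 2000, 0.05

# Synthetic P_{X,Y}: X ~ Uniform[0, 1], Y = I[X > 0.3] with 10% label noise (a hypothetical choice).
X = rng.random(n)
Y = ((X > 0.3).astype(int) ^ (rng.random(n) < 0.1)).astype(int)

# Finite hypothesis class F: threshold classifiers f_c(x) = I[x > c] over a grid of thresholds c.
thresholds = np.linspace(0.0, 1.0, 21)
F = [lambda x, c=c: (x > c).astype(int) for c in thresholds]

def empirical_error(f):
    """hat{E}[I[f(X) != Y]]: the training error rate on the sample."""
    return float(np.mean(f(X) != Y))

def true_error(c):
    """E[I[f_c(X) != Y]] in closed form for this synthetic distribution."""
    clean_disagree = abs(c - 0.3)               # mass where f_c and the noiseless label disagree
    return 0.9 * clean_disagree + 0.1 * (1.0 - clean_disagree)

# Empirical risk minimization over F, then compare the excess error with Eq. (4).
i_hat = int(np.argmin([empirical_error(f) for f in F]))
f_star_err = min(true_error(c) for c in thresholds)
excess = true_error(thresholds[i_hat]) - f_star_err
bound = np.sqrt(2.0 / n * np.log(2 * len(F) / delta))   # right-hand side of Eq. (4)
print(f"excess error {excess:.4f} <= bound {bound:.4f}")
```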

References
[1] Akshay Krishnamurthy, Alekh Agarwal, and John Langford. PAC reinforcement learning with
rich observations. In Advances in Neural Information Processing Systems, pages 1840–1848, 2016.

[2] Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit
problem. Machine learning, 47(2-3):235–256, 2002.
