note2
Nan Jiang
This note introduces the basics of concentration inequalities and examples of their applications (often with the union bound), which will be useful for the rest of this course.
1 Hoeffding’s Inequality
Theorem 1. Let $X_1, \ldots, X_n$ be independent random variables on $\mathbb{R}$ such that $X_i$ is bounded in the interval $[a_i, b_i]$. Let $S_n = \sum_{i=1}^n X_i$. Then for all $t > 0$,
$$\Pr[S_n - \mathbb{E}[S_n] \ge t] \le e^{-2t^2 / \sum_{i=1}^n (b_i - a_i)^2}, \tag{1}$$
$$\Pr[S_n - \mathbb{E}[S_n] \le -t] \le e^{-2t^2 / \sum_{i=1}^n (b_i - a_i)^2}. \tag{2}$$
Remarks:
• By the union bound, we have $\Pr[|S_n - \mathbb{E}[S_n]| \ge t] \le 2e^{-2t^2 / \sum_{i=1}^n (b_i - a_i)^2}$.
• We often care about the convergence of the empirical mean to the true average, so we can divide $S_n$ by $n$: $\Pr\left[\left|\frac{S_n}{n} - \frac{\mathbb{E}[S_n]}{n}\right| \ge t\right] \le 2e^{-2n^2 t^2 / \sum_{i=1}^n (b_i - a_i)^2}$.
• A useful rephrasing of the result when all variables share the same support $[a, b]$: with probability at least $1 - \delta$, $\left|\frac{S_n}{n} - \frac{\mathbb{E}[S_n]}{n}\right| \le (b - a)\sqrt{\frac{1}{2n} \ln \frac{2}{\delta}}$.
• The number of variables, n, is a constant in the theorem statement. When n is a random variable
itself, for Hoeffding’s inequality to apply, n cannot depend on the realization of X1 , . . . , Xn .
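As a sanity check on the averaged form above, the bound can be compared against a simulation; the Bernoulli(0.5) variables and the specific $n$ and $t$ below are illustrative choices, not part of the notes:

```python
import math
import random

def hoeffding_bound(n, t, a=0.0, b=1.0):
    """Two-sided bound on Pr[|S_n/n - E[S_n]/n| >= t] for n i.i.d.
    variables supported on [a, b]."""
    return 2.0 * math.exp(-2.0 * n * t ** 2 / (b - a) ** 2)

def deviation_freq(n, t, trials=2000, seed=0):
    """Empirical frequency of the deviation event for Bernoulli(0.5) draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() < 0.5 for _ in range(n)) / n
        hits += abs(mean - 0.5) >= t
    return hits / trials

# The observed deviation frequency should sit below the Hoeffding bound.
print(deviation_freq(200, 0.1), "<=", hoeffding_bound(200, 0.1))
```

The bound is typically loose: the empirical frequency is well below $2e^{-2nt^2}$.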
Example: Consider the following Markov chain:
[Figure: a Markov chain on states $s_1, s_2, s_3, s_4$; the two transitions out of $s_1$ have probabilities $p$ and $1 - p$.]
Say we start at $s_1$ and sample a path of length $T$ ($T$ is a constant). Let $n$ be the number of times we visit $s_1$; we can then use the transitions from $s_1$ to estimate $p$.
1. Can we directly apply Hoeffding’s inequality here with n as the number of coin tosses? If
you want to derive a concentration bound for this problem, look up Azuma’s inequality.
2. What if we sample a path until we visit s1 N times for some constant N ? Can we apply
Hoeffding’s inequality with N as the number of random variables?
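A small simulation of question 1; the figure only determines $s_1$'s outgoing probabilities, so the deterministic return paths $s_2 \to s_3 \to s_1$ and $s_4 \to s_1$ used below are hypothetical completions of the chain:

```python
import random

def estimate_p(p, T, seed=0):
    """Walk T steps from s1 and estimate p from the transitions taken at s1.

    Note that n, the number of visits to s1, is itself random given T,
    which is exactly why Hoeffding's inequality does not apply directly.
    """
    rng = random.Random(seed)
    state, n, to_s2 = "s1", 0, 0
    for _ in range(T):
        if state == "s1":
            n += 1
            if rng.random() < p:  # transition labeled p in the figure
                to_s2 += 1
                state = "s2"
            else:                 # transition labeled 1 - p
                state = "s4"
        elif state == "s2":       # hypothetical return path s2 -> s3 -> s1
            state = "s3"
        else:                     # s3 and s4 return to s1 (hypothetical)
            state = "s1"
    return to_s2 / n if n > 0 else float("nan")

print(estimate_p(0.3, T=10000))  # close to 0.3 for large T
```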
A popular objective for MAB is the pseudo-regret, which poses the exploration-exploitation challenge:
$$\mathrm{Regret}_T = \sum_{t=1}^{T} (\mu^\star - \mu_{i_t}).$$
Another objective is the simple regret,
$$\mu^\star - \mu_{\hat i},$$
where $\hat i$ is the arm that the learner picks after $T$ rounds of interaction. This poses the “pure exploration” challenge, since all that matters is making a good final guess; the regret incurred within the $T$ rounds does not matter. A related objective is called Best-Arm Identification, which asks whether $\hat i \in \arg\max_{i \in [K]} \mu_i$; Best-Arm Identification results often require additional gap conditions.
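The two notions can be contrasted in a small simulation; the round-robin exploration strategy and the Bernoulli arm means below are illustrative assumptions, not a method from the notes:

```python
import random

def explore_then_guess(mus, T, seed=0):
    """Pull arms round-robin for T rounds, then guess the empirically best arm.

    Returns (pseudo-regret over the T rounds, simple regret of the guess).
    Arms are Bernoulli with means `mus` (an illustrative assumption).
    """
    rng = random.Random(seed)
    K = len(mus)
    mu_star = max(mus)
    pulls, wins = [0] * K, [0] * K
    pseudo_regret = 0.0
    for t in range(T):
        i = t % K                       # round-robin exploration
        pulls[i] += 1
        wins[i] += rng.random() < mus[i]
        pseudo_regret += mu_star - mus[i]
    i_hat = max(range(K), key=lambda i: wins[i] / pulls[i])
    return pseudo_regret, mu_star - mus[i_hat]

pr, sr = explore_then_guess([0.2, 0.5, 0.8], T=3000)
print(pr, sr)
```

Pure exploration grows the pseudo-regret linearly in $T$, yet with enough samples the final guess is correct and the simple regret is zero, which is the tension between the two objectives.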
Now we want accurate estimation for all arms simultaneously. That is, we want to bound the probability of the event that any $\hat\mu_i$ deviates from $\mu_i$ too much. This is where the union bound is useful:
$$\Pr\left[\bigcup_{i=1}^{K} \left\{ |\hat\mu_i - \mu_i| \ge \epsilon \right\}\right] \quad \text{(the event that estimation is $\epsilon$-inaccurate for at least 1 arm)}$$
$$\le \sum_{i=1}^{K} \Pr\left[ |\hat\mu_i - \mu_i| \ge \epsilon \right] \le 2K e^{-2T\epsilon^2 / K}. \quad \text{(union bound, then Hoeffding's inequality)}$$
To rephrase this result: with probability at least $1 - \delta$, $|\hat\mu_i - \mu_i| \le \sqrt{\frac{K}{2T} \ln \frac{2K}{\delta}}$ holds for all $i$ simultaneously.
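A quick empirical check of this uniform bound, assuming (as the $T/K$ in the exponent suggests) that each of the $K$ arms is pulled $T/K$ times; the Bernoulli arms below are an illustrative choice:

```python
import math
import random

def max_deviation(mus, per_arm, seed=0):
    """Pull each Bernoulli arm `per_arm` times; return max_i |mu_hat_i - mu_i|."""
    rng = random.Random(seed)
    dev = 0.0
    for mu in mus:
        mu_hat = sum(rng.random() < mu for _ in range(per_arm)) / per_arm
        dev = max(dev, abs(mu_hat - mu))
    return dev

K, T, delta = 5, 5000, 0.05
mus = [0.1, 0.3, 0.5, 0.7, 0.9]
bound = math.sqrt(K / (2 * T) * math.log(2 * K / delta))
dev = max_deviation(mus, T // K)
# With probability at least 1 - delta, dev stays below bound.
print(dev, "<=", bound)
```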
Finally, we use the estimation error to bound the decision loss: recall that $\hat i = \arg\max_{i \in [K]} \hat\mu_i$, and let $i^\star = \arg\max_{i \in [K]} \mu_i$.
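The decomposition itself (a standard step, spelled out here) works on the event that every estimate is $\epsilon$-accurate:

```latex
\mu_{i^\star} - \mu_{\hat i}
  = \underbrace{(\mu_{i^\star} - \hat\mu_{i^\star})}_{\le \epsilon}
  + \underbrace{(\hat\mu_{i^\star} - \hat\mu_{\hat i})}_{\le 0}
  + \underbrace{(\hat\mu_{\hat i} - \mu_{\hat i})}_{\le \epsilon}
  \le 2\epsilon,
```

where the middle term is nonpositive because $\hat i$ maximizes the empirical means, and the outer terms are bounded by the uniform estimation guarantee.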
The theorem itself is stated as a best-arm identification lower bound, but it is also a lower bound for simple regret minimization. This is because all arms except the best one are $\epsilon$ worse than $\mu^\star$, so missing the optimal arm means a simple regret of at least $\epsilon$.
See the proof in [1] (Theorem 2); the technique is due to [2] and can also be used to prove the lower bound on the regret of MAB.
where $\mathbb{E}[\cdot]$ is w.r.t. $P_{X,Y}$. Given only a finite sample, one natural thing to do is empirical risk minimization, i.e., find the classifier that has the lowest training error rate on the data:
$$\hat f = \arg\min_{f \in \mathcal{F}} \widehat{\mathbb{E}}\left[\mathbb{I}[f(X) \ne Y]\right] := \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}[f(X_i) \ne Y_i].$$
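A minimal sketch of empirical risk minimization over a finite class; the threshold classifiers, the 10% label noise, and the data-generating process below are all illustrative assumptions:

```python
import random

def erm(hypotheses, data):
    """Return the hypothesis in a finite class with the lowest training
    error under 0-1 loss."""
    def train_error(f):
        return sum(f(x) != y for x, y in data) / len(data)
    return min(hypotheses, key=train_error)

# Hypothetical finite class F: threshold classifiers x >= c.
hypotheses = [lambda x, c=c: x >= c for c in [k / 10 for k in range(11)]]

# Synthetic data: true threshold 0.5, labels flipped with probability 0.1.
rng = random.Random(0)
xs = [rng.random() for _ in range(500)]
data = [(x, (x >= 0.5) != (rng.random() < 0.1)) for x in xs]

f_hat = erm(hypotheses, data)
```

With enough data, `f_hat` recovers a threshold near 0.5 despite the label noise, which is the kind of guarantee the analysis below quantifies.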
The question is, can we give any guarantee on how good the learned classifier $\hat f$ is compared to the optimal one $f^\star$, as a function of $n$? In other words, we want to bound
We provide the analysis below, which mainly uses Hoeffding's inequality and the union bound. First of all,
References
[1] Akshay Krishnamurthy, Alekh Agarwal, and John Langford. PAC reinforcement learning with
rich observations. In Advances in Neural Information Processing Systems, pages 1840–1848, 2016.
[2] Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit
problem. Machine learning, 47(2-3):235–256, 2002.