DAVIDE GIRAUDO
Abstract. In this note, we give sufficient conditions for the almost sure convergence and the convergence in $L^p$ of
a U-statistic of order m built on a strictly stationary but not necessarily ergodic sequence.
arXiv:2309.05988v1 [math.PR] 12 Sep 2023
where $[\![1,m]\!] = \{k \in \mathbb{N},\ 1 \le k \le m\}$ and $\operatorname{Inc}^m_n = \{(i_\ell)_{\ell \in [\![1,m]\!]},\ 1 \le i_1 < i_2 < \dots < i_m \le n\}$. If $(X_i)_{i \ge 1}$
is i.i.d. and $E[|h(X_1,\dots,X_m)|]$ is finite, then $U_{m,n,h} \to E[h(X_1,\dots,X_m)]$ a.s. and in $L^1$. A natural
question is whether for a strictly stationary sequence $(X_i)_{i \ge 1}$, the sequence $(U_{m,n,h})_{n \ge m}$ converges
almost surely or in $L^1$ to some random variable. Assume first that $(X_i)_{i \ge 1}$ is ergodic. It is shown
in Aaronson et al. (1996) that if $S = \mathbb{R}$, $(X_i)_{i \ge 1}$ has common distribution $P_{X_0}$, h is bounded and
$P_{X_0} \times \dots \times P_{X_0}$ almost everywhere continuous, then
$$U_{m,n,h} \to \int h(x_1,\dots,x_m)\, dP_{X_0}(x_1) \dots dP_{X_0}(x_m) \quad \text{a.s.} \tag{1.2}$$
Convergence in probability was also investigated in Borovkova et al. (2002). A proof of (1.2) in the
context of absolutely regular sequences has been given in Arcones (1998). Moreover, a Marcinkiewicz law of
large numbers for U-statistics of order two has been established in Dehling and Sharipov (2009) for
absolutely regular sequences and in Giraudo (2021) for sequences expressible as functions of an independent
sequence.
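To make the object under study concrete, the following Python sketch evaluates a U-statistic of order m by brute force. It assumes the usual definition $U_{m,n,h} = \binom{n}{m}^{-1} \sum_{(i_\ell) \in \operatorname{Inc}^m_n} h(X_{i_1},\dots,X_{i_m})$; the names `u_statistic` and `abs_diff` are illustrative and do not come from the text.

```python
from itertools import combinations
from math import comb


def u_statistic(xs, h, m):
    """Average of h over all strictly increasing index tuples of length m."""
    n = len(xs)
    if n < m:
        raise ValueError("need at least m observations")
    total = sum(h(*(xs[i] for i in idx)) for idx in combinations(range(n), m))
    return total / comb(n, m)


def abs_diff(x, y):
    """Illustrative symmetric kernel of order two."""
    return abs(x - y)


sample = [1.0, 2.0, 4.0]
print(u_statistic(sample, abs_diff, 2))  # (1 + 3 + 2) / 3 = 2.0
```

The cost grows like $\binom{n}{m}$, so this is only meant for small samples.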
It is worth pointing out that in general, the sequence $(U_{2,n,h})_{n \ge 2}$ may fail to converge. For instance,
Aaronson et al. (1996), Example 4.5, found an unbounded kernel h and a strictly stationary sequence
such that $\limsup_{n\to\infty} U_{2,n,h} = \infty$. Moreover, the example given in Proposition 3 of Dehling et al.
(2023) shows the existence of a bounded kernel h and a stationary ergodic sequence $(X_i)_{i \ge 1}$ such that
one subsequence of $(U_{2,n,h})_{n \ge 2}$ converges to 0 almost surely and another subsequence of $(U_{2,n,h})_{n \ge 2}$
converges to 1 almost surely. Also, as Proposition 4 shows, boundedness in $L^1$ of $(h(X_1,X_j))_{j \ge 2}$ plays
a key role: without it, we can find a kernel h and a strictly stationary sequence $(X_i)_{i \ge 1}$ for which the
sequence $(U_{2,n,h} - E[U_{2,n,h}])_{n \ge 2}$ converges to a non-degenerate normal distribution.
Some results have been established in Dehling et al. (2023), assuming that the strictly stationary
sequence (Xi )i>1 is ergodic.
where $\mathcal{I}$ denotes the σ-algebra of invariant sets, that is, the sets E such that $T^{-1}E = E$. The limit of
U-statistics will be expressed as an integral with respect to a product of copies of the measure $\mu_\omega$, which leads us to define
$$I_m(S,h,\omega) := \int_{S^m} h(x_1,\dots,x_m)\, d\mu_\omega(x_1) \dots d\mu_\omega(x_m). \tag{1.5}$$
Some assumptions will be made on the set of discontinuity points of h, which will be denoted by D(h).
SOME NOTES ON ERGODIC THEOREMS FOR U -STATISTICS OF ORDER m 3
1.1. Almost sure convergence. Our first result deals with the almost sure convergence of a U-statistic
under the assumption of boundedness of the kernel and negligibility of the set of discontinuity
with respect to the product of the marginal law.
Theorem 1.1. Let (S, d) be a separable metric space, let $(X_i)_{i \in \mathbb{Z}}$ be a strictly stationary sequence.
Suppose that $h : S^m \to \mathbb{R}$ satisfies the following assumptions:
(A.1.1) h is symmetric, that is, $h(x_{\sigma(1)},\dots,x_{\sigma(m)}) = h(x_1,\dots,x_m)$ for each $x_1,\dots,x_m \in S$ and each
bijective $\sigma : [\![1,m]\!] \to [\![1,m]\!]$,
(A.1.2) h is bounded and
(A.1.3) for almost every ω ∈ Ω, $I_m(S, \mathbf{1}_{D_h}, \omega) = 0$, where $D_h$ denotes the set of discontinuity points of
h.
Then for almost every ω ∈ Ω, the following convergence holds:
$$\lim_{n\to\infty} U_{m,n,h}(\omega) = I_m(S,h,\omega), \tag{1.7}$$
where $I_m(S,h,\omega)$ is defined as in (1.5).
This result extends Theorem 1 in Dehling et al. (2023) in two directions: first, U-statistics
of arbitrary order are considered. Second, we address here the not necessarily ergodic case.
When $(X_i)_{i \ge 1}$ is ergodic, the measure $\mu_\omega$ is simply the distribution of $X_0$, hence the right hand side of
(1.7) can be simply expressed as $E[h(X_1^{(1)},\dots,X_1^{(m)})]$, where $X_1^{(1)},\dots,X_1^{(m)}$ are independent copies
of $X_1$.
The symmetry assumption is needed in order to relate $U_{m,n,h}$ to a sum over a rectangle and then to view
the convergence in (1.7) as a convergence in distribution of products of random measures.
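A simple non-ergodic example may help to see why the limit in (1.7) is random in general. The following worked computation is only an illustration under assumptions that are not in the text: $\Theta$ is a random variable with values in $(0,1)$ and, conditionally on $\Theta$, the $X_i$ are i.i.d. Bernoulli($\Theta$), with $S = \{0,1\}$ and $m = 2$.

```latex
% Illustration (assumed setting): conditionally on \Theta, the X_i are i.i.d. Bernoulli(\Theta).
\begin{align*}
\mu_\omega &= (1-\Theta(\omega))\,\delta_0 + \Theta(\omega)\,\delta_1, \\
h(x,y) &= \mathbf{1}_{\{x \neq y\}}, \qquad D_h = \emptyset \ \text{ since } S \text{ is discrete}, \\
I_2(S,h,\omega) &= \int_{S^2} \mathbf{1}_{\{x \neq y\}}\, d\mu_\omega(x)\, d\mu_\omega(y)
               = 2\,\Theta(\omega)\,\bigl(1-\Theta(\omega)\bigr).
\end{align*}
```

In this setting $U_{2,n,h}$ converges almost surely to $2\Theta(1-\Theta)$, which is random unless this quantity is almost surely constant. When $\Theta$ is deterministic, one recovers the i.i.d. limit $E[h(X_1,X_2)]$.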
1.2. Convergence in Lp , p > 1. In this subsection, we present sufficient conditions for the convergence
in Lp of (Um,n,h )n>1 .
We start by mentioning the following consequence of Theorem 1.1.
Corollary 1.2. Let (S, d) be a separable metric space, let $(X_i)_{i \in \mathbb{Z}}$ be a strictly stationary sequence and
let p > 1. Suppose that $h : S^m \to \mathbb{R}$ satisfies the following assumptions:
(A.1.1) h is symmetric, that is, $h(x_{\sigma(1)},\dots,x_{\sigma(m)}) = h(x_1,\dots,x_m)$ for each $x_1,\dots,x_m \in S$ and each
bijective $\sigma : [\![1,m]\!] \to [\![1,m]\!]$,
(A.1.2) the family $\{|h(X_{i_1},\dots,X_{i_m})|^p,\ 1 \le i_1 < \dots < i_m\}$ is uniformly integrable.
(A.1.3) the following integral is finite:
$$\int_\Omega \int_{S^m} |h(x_1,\dots,x_m)|^p\, d\mu_\omega(x_1) \dots d\mu_\omega(x_m)\, dP(\omega). \tag{1.8}$$
(A.1.4) $P_{X_0} \times \dots \times P_{X_0}(D_h) = 0$, where $D_h$ denotes the set of discontinuity points of h and $P_{X_0}$ the
distribution of $X_0$.
Then the following convergence takes place:
$$\lim_{n\to\infty} \left\| U_{m,n,h} - I_m(S,h,\cdot) \right\|_p = 0, \tag{1.9}$$
where $I_m(S,h,\cdot)$ is defined as in (1.5).
This improves Theorem 2 in Dehling et al. (2023) under assumption (A.1) in the paper, since we do
not require symmetry of the kernel.
One may wonder why we do not present a similar result for U-statistics of order m. A first idea
would be an argument by induction on the dimension. In order to perform the induction step, say from
m = 2 to m = 3, we would need to show, after a use of the weighted ergodic theorem, the convergence in $L^p$ of
$\binom{n}{2}^{-1} \sum_{1 \le i < j \le n} h(X_{-j}, X_{-i}, X_0)$. Since we assume uniform integrability, it suffices to show the almost
sure convergence, which could be established by viewing it as the convergence of a product
of random measures. But without symmetry, we do not know whether the almost sure convergence of
the sequence of random measures $\binom{n}{2}^{-1} \sum_{1 \le i < j \le n} \delta_{(X_{-j}, X_{-i})}$ takes place.
Let us now state a result on the convergence in Lp without imposing any continuity of the kernel,
but making assumptions on the distribution of the vectors (Xi1 , . . . , Xim ) .
Theorem 1.4. Let $(X_0 \circ T^i)_{i \in \mathbb{Z}}$ be a strictly stationary sequence taking values in $\mathbb{R}^d$ and let p > 1.
Suppose that $h : (\mathbb{R}^d)^m \to \mathbb{R}$ and $(X_0 \circ T^i)_{i \in \mathbb{Z}}$ satisfy the following assumptions:
(A.3.1) the collection $\{|h(X_{i_1},\dots,X_{i_m})|^p,\ 1 \le i_1 < \dots < i_m\}$ is uniformly integrable.
(A.3.2) for each $(i_\ell)_{\ell \in [\![1,m]\!]}$ such that $1 \le i_1 < \dots < i_m$, the vector $(X_{i_1},\dots,X_{i_m})$ has a density $f_{i_1,\dots,i_m}$
and there exists a $q_0 > 1$ such that
$$M_1 := \sup_{(i_\ell)_{\ell \in [\![1,m]\!]} :\, 1 \le i_1 < \dots < i_m} \int_{(\mathbb{R}^d)^m} f_{i_1,\dots,i_m}(t_1,\dots,t_m)^{q_0}\, dt_1 \dots dt_m < \infty. \tag{1.12}$$
(A.3.3) For almost every ω, the measure $\mu_\omega$ defined as in (1.4) admits a density $f_\omega$ with respect to the
Lebesgue measure and there exists a set Ω′ having probability one and $q_1 > 1$ for which
$$M_2 := \sup_{\omega \in \Omega'} \int_{\mathbb{R}^d} f_\omega(t)^{q_1}\, dt < \infty. \tag{1.13}$$
Our Theorem 1.4 improves Theorem 2 in Dehling et al. (2023) under assumption (A.2) in the following
directions. First, we provide a result for U-statistics of arbitrary order. Second, the not necessarily
ergodic case is addressed. Third, even in the ergodic case, our assumption only requires a uniform control
on the $L^{q_1}$ norm of the densities instead of a uniform bound.
2. Proofs
2.1. Proof of Theorem 1.1. The symmetry assumption guarantees the following decomposition
$$U_{m,n,h} = \frac{1}{m! \binom{n}{m}} \sum_{i_1,\dots,i_m=1}^{n} h(X_{i_1},\dots,X_{i_m}) - \frac{1}{m! \binom{n}{m}} \sum_{(i_\ell)_{\ell \in [\![1,m]\!]} \in J_n} h(X_{i_1},\dots,X_{i_m}), \tag{2.1}$$
where $J_n$ denotes the set of elements $(i_\ell)_{\ell \in [\![1,m]\!]} \in [\![1,n]\!]^m$ for which there exist at least two distinct
indexes ℓ and ℓ′ for which $i_\ell = i_{\ell'}$. Since h is bounded, $\operatorname{Card}(J_n) / \binom{n}{m}$ goes to 0 and $n^m / (m! \binom{n}{m})$ goes to 1 as n goes to infinity,
it suffices to prove that for almost every ω ∈ Ω,
$$\lim_{n\to\infty} \frac{1}{n^m} \sum_{i_1,\dots,i_m=1}^{n} h(X_{i_1}(\omega),\dots,X_{i_m}(\omega)) = \int_{S^m} h(x_1,\dots,x_m)\, d\mu_\omega(x_1) \dots d\mu_\omega(x_m), \tag{2.2}$$
where
$$\mu_{n,\omega} = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i(\omega)}. \tag{2.4}$$
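The combinatorial identity behind (2.1) can be checked numerically on a small sample. The sketch below, with the hypothetical names `split_full_sum` and `sym_kernel`, verifies that for a symmetric kernel the sum over all index tuples in $[\![1,n]\!]^m$ equals $m!$ times the sum over increasing tuples plus the sum over tuples with a repeated index, the latter being the set denoted $J_n$ in the text. Exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb, factorial


def split_full_sum(xs, h, m):
    """Return (sum over increasing tuples, sum over tuples with a repeat, sum over all tuples)."""
    n = len(xs)
    inc = sum(h(*(xs[i] for i in idx)) for idx in combinations(range(n), m))
    full = 0
    repeated = 0
    for idx in product(range(n), repeat=m):
        value = h(*(xs[i] for i in idx))
        full += value
        if len(set(idx)) < m:  # at least two equal indices: idx belongs to J_n
            repeated += value
    return inc, repeated, full


def sym_kernel(x, y, z):
    """Illustrative symmetric kernel of order three with integer values."""
    return x * y + y * z + x * z


xs = [1, 2, 3, 5]
m = 3
inc, repeated, full = split_full_sum(xs, sym_kernel, m)
u_stat = Fraction(inc, comb(len(xs), m))
via_rectangle = Fraction(full - repeated, factorial(m) * comb(len(xs), m))
print(u_stat == via_rectangle)  # True
```

For a kernel that is not symmetric the first assertion of the identity fails in general, which is one way to see where assumption (A.1.1) enters.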
Separability of S guarantees the existence of a countable collection $(f_k)_{k \ge 1}$ of continuous and bounded
functions from S to $\mathbb{R}$ such that a sequence $(\mu_n)_{n \ge 1}$ of probability measures converges weakly to a
probability measure µ if and only if for each k ≥ 1, $\int f_k\, d\mu_n \to \int f_k\, d\mu$. By the ergodic theorem, we
know that for each k ≥ 1, there exists a set $\Omega_k$ having probability one for which the convergence
$$\lim_{n\to\infty} \int f_k(x)\, d\mu_{n,\omega}(x) = \lim_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n} f_k(X_j(\omega)) = E[f_k(X_0) \mid \mathcal{I}](\omega) = \int f_k(x)\, d\mu_\omega(x) \tag{2.5}$$
holds for each $\omega \in \Omega_k$. Therefore, for each ω belonging to the set of probability one $\Omega' := \bigcap_{k \ge 1} \Omega_k$, the
sequence $(\mu_{n,\omega})_{n \ge 1}$ converges weakly to $\mu_\omega$.
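Recall that Theorem 3.2 (page 21) of Billingsley (1968) shows that if $\mu_n \to \mu$ and $\mu'_n \to \mu'$ in
distribution on metric spaces $S_1$ and $S_2$ respectively, then $\mu_n \times \mu'_n \to \mu \times \mu'$ in distribution on $S_1 \times S_2$.
Applying this result inductively shows that for each ω ∈ Ω′, the m-fold product of $\mu_{n,\omega}$ converges weakly to the m-fold product of $\mu_\omega$.
Since h is bounded by (A.1.2) and its set of discontinuity points is negligible for this limit by (A.1.3), we get (2.2), hence (1.7), for almost every ω.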
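2.2. Proof of Corollary 1.2. Let $h_R$ be as in (2.13). Observe that assumption (A.1.3) guarantees that
$I_m(S,h,\cdot)$ defined as in (1.5) belongs to $L^p$. Moreover, the triangle inequality implies
$$\|U_{m,n,h} - I_m(S,h,\cdot)\|_p \le \|U_{m,n,h} - U_{m,n,h_R}\|_p + \|U_{m,n,h_R} - I_m(S,h_R,\cdot)\|_p + \|I_m(S,h,\cdot) - I_m(S,h_R,\cdot)\|_p.$$
Using assumption (A.1.2) combined with the triangle inequality, one gets
$$\sup_{n \ge m} \|U_{m,n,h} - U_{m,n,h_R}\|_p \le \sup_{1 \le i_1 < \dots < i_m} \left\| h(X_{i_1},\dots,X_{i_m}) \mathbf{1}_{\{|h(X_{i_1},\dots,X_{i_m})| > R\}} \right\|_p \xrightarrow[R \to \infty]{} 0,$$
and the third term goes to 0 as R goes to infinity by assumption (A.1.3) and dominated convergence,
hence it suffices to show that for each fixed R > 0, $\|U_{m,n,h_R} - I_m(S,h_R,\cdot)\|_p \to 0$ as n goes to infinity.
This follows from an application of Theorem 1.1 with h replaced by $h_R$ (note that continuity of $\varphi_R$
guarantees that $D(h_R) \subset D(h)$), which gives that $U_{m,n,h_R} \to I_m(S,h_R,\cdot)$ almost surely, and the dominated
convergence theorem allows us to conclude.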
2.3. Convergence of weighted averages. The proofs of Theorems 1.3 and 1.4 rest on weighted
versions of the ergodic theorem, which read as follows.
Lemma 2.1. Let T be a measure preserving map on the probability space (Ω, F, P) and let $\mathcal{I}$ be the
σ-algebra of T-invariant sets. Let p > 1 and let $(f_j)_{j \ge 1}$ be a sequence of functions such that $\|f_j - f\|_p \to 0$.
Then for each m > 0, the following convergence holds:
$$\lim_{n\to\infty} \left\| \frac{1}{\binom{n}{m+1}} \sum_{j=m+1}^{n} \binom{j-1}{m} f_j \circ T^j - E[f \mid \mathcal{I}] \right\|_p = 0. \tag{2.6}$$
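The normalization in (2.6) can be checked by hand. The short computation below, which is standard and not taken from the text, shows that the weights $\binom{j-1}{m} / \binom{n}{m+1}$, $m+1 \le j \le n$, sum to one, so that (2.6) is a statement about weighted averages.

```latex
% Hockey-stick identity, obtained by telescoping Pascal's rule
% \binom{j-1}{m} = \binom{j}{m+1} - \binom{j-1}{m+1}:
\[
\sum_{j=m+1}^{n} \binom{j-1}{m}
  = \sum_{j=m+1}^{n} \left[ \binom{j}{m+1} - \binom{j-1}{m+1} \right]
  = \binom{n}{m+1} - \binom{m}{m+1}
  = \binom{n}{m+1}.
\]
% For m = 1 this reads \sum_{j=2}^{n} (j-1) = n(n-1)/2 = \binom{n}{2}.
```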
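Proof. Since T is measure preserving, the triangle inequality gives
$$\left\| \frac{1}{\binom{n}{m+1}} \sum_{j=m+1}^{n} \binom{j-1}{m} f_j \circ T^j - E[f \mid \mathcal{I}] \right\|_p \le \frac{1}{\binom{n}{m+1}} \sum_{j=m+1}^{n} \binom{j-1}{m} \|f_j - f\|_p + \left\| \frac{1}{\binom{n}{m+1}} \sum_{j=m+1}^{n} \binom{j-1}{m} f \circ T^j - E[f \mid \mathcal{I}] \right\|_p. \tag{2.7}$$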
The first term goes to zero as n goes to infinity by the elementary fact that $\sum_{i=1}^{n} c_i x_i / \sum_{j=1}^{n} c_j \to 0$
if $x_i \to 0$, $c_j > 0$ and $\sum_{j=1}^{n} c_j \to \infty$. For the second term, we assume without loss of generality that
$E[f \mid \mathcal{I}] = 0$, otherwise, we replace f by $f - E[f \mid \mathcal{I}]$. Let $S_j := \sum_{i=1}^{j} f \circ T^i$. Then
$$\sum_{j=m+1}^{n} \binom{j-1}{m} f \circ T^j = \sum_{j=m+1}^{n} \binom{j-1}{m} S_j - \sum_{j=m}^{n-1} \binom{j}{m} S_j. \tag{2.8}$$
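The next display regroups the right-hand side of (2.8) by summation by parts. It is a reconstruction from (2.8) and (2.10), using the convention $\binom{m-1}{m} = 0$, and is presumably what is referred to as (2.9) below.

```latex
\begin{align*}
\sum_{j=m+1}^{n} \binom{j-1}{m} S_j - \sum_{j=m}^{n-1} \binom{j}{m} S_j
 &= \binom{n-1}{m} S_n + \sum_{j=m+1}^{n-1} \binom{j-1}{m} S_j - \sum_{j=m}^{n-1} \binom{j}{m} S_j \\
 &= \binom{n-1}{m} S_n - \sum_{j=m}^{n-1} \left[ \binom{j}{m} - \binom{j-1}{m} \right] S_j .
\end{align*}
```

After division by $\binom{n}{m+1}$, the first term has $L^p$ norm equal to $(m+1) \|S_n\|_p / n$, because $\binom{n}{m+1} = \frac{n}{m+1} \binom{n-1}{m}$.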
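Since $\|S_n\|_p / n \to 0$, the first term of the right hand side of (2.9) goes to 0 as n goes to infinity. For the
second term, one has for each $m \le R \le n$ that
$$\frac{1}{\binom{n}{m+1}} \sum_{j=m}^{n-1} \left[ \binom{j}{m} - \binom{j-1}{m} \right] \|S_j\|_p \le \frac{R}{\binom{n}{m+1}} \sum_{j=m}^{R} \left[ \binom{j}{m} - \binom{j-1}{m} \right] \|S_j\|_p + \frac{1}{\binom{n}{m+1}} \sum_{j=R}^{n-1} j \left[ \binom{j}{m} - \binom{j-1}{m} \right] \sup_{k \ge R} \frac{\|S_k\|_p}{k} \tag{2.10}$$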
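hence
$$\limsup_{n\to\infty} \frac{1}{\binom{n}{m+1}} \sum_{j=m}^{n-1} \left[ \binom{j}{m} - \binom{j-1}{m} \right] \|S_j\|_p \le \sup_{k \ge R} \frac{\|S_k\|_p}{k} \tag{2.11}$$
and we conclude using again that $\|S_k\|_p / k \to 0$ as k goes to infinity.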
2.4. Proof of Theorem 1.3. The proofs will lead us to consider truncated versions of the kernel h.
Define for each fixed R > 0 the map $\varphi_R : \mathbb{R} \to \mathbb{R}$ by
$$\varphi_R(t) := \begin{cases} -R & \text{if } t < -R, \\ t & \text{if } -R \le t < R, \\ R & \text{if } t \ge R \end{cases} \tag{2.12}$$
and $h_R : S^m \to \mathbb{R}$ by
$$h_R(x_1,\dots,x_m) := \varphi_R(h(x_1,\dots,x_m)), \quad x_1,\dots,x_m \in S. \tag{2.13}$$
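In computational terms, $\varphi_R$ is the usual clipping map and $h_R$ is h with its values clipped to $[-R, R]$. A minimal Python sketch follows; the names `phi`, `truncate_kernel` and the sample kernel `product_kernel` are illustrative assumptions, not notation from the text.

```python
def phi(t, R):
    """Truncation map of (2.12): clip t to the interval [-R, R]."""
    return max(-R, min(R, t))


def truncate_kernel(h, R):
    """Return h_R = phi_R composed with h, as in (2.13)."""
    def h_R(*xs):
        return phi(h(*xs), R)
    return h_R


def product_kernel(x, y):
    """Illustrative unbounded kernel of order two."""
    return x * y


h_2 = truncate_kernel(product_kernel, 2.0)
print(h_2(0.5, 3.0), h_2(3.0, 4.0), h_2(-3.0, 4.0))  # 1.5 2.0 -2.0
```

The pointwise bound $|h - h_R| \le |h| \mathbf{1}_{\{|h| > R\}}$ is what makes this truncation compatible with a uniform integrability assumption.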
Then |hR | is bounded by R and since DhR ⊂ Dh , the equality PX0 × PX0 (DhR ) = 0 holds. We claim
that it suffices to prove that (1.11) holds for each R with h replaced by hR . Indeed, by the triangle
inequality,
and
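$$\int_\Omega \left| \int_{S^2} h(x,y)\, d\mu_\omega(x)\, d\mu_\omega(y) - \int_{S^2} h_R(x,y)\, d\mu_\omega(x)\, d\mu_\omega(y) \right|^p dP(\omega) \le \int_\Omega \int_{S^2} |h(x,y)|^p \mathbf{1}_{|h(x,y)| > R}\, d\mu_\omega(x)\, d\mu_\omega(y)\, dP(\omega), \tag{2.15}$$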
hence assumptions (A.2.1) and (A.2.3) allow us to choose R making the previous quantities as small
as we wish. Defining
$$d_{j,R} := \frac{1}{j} \sum_{i=1}^{j-1} h_R(X_{-i}, X_0), \tag{2.16}$$
we get that
$$U_{2,n,h_R} = \frac{1}{\binom{n}{2}} \sum_{j=2}^{n} j\, d_{j,R} \circ T^j. \tag{2.17}$$
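Identity (2.17) is a regrouping of the pairs $(i, j)$, $i < j$, according to the larger index j, since $j\, d_{j,R} \circ T^j = \sum_{i=1}^{j-1} h_R(X_{j-i}, X_j)$. The following small Python check uses exact rational arithmetic; the function names and the integer-valued sample kernel are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def u2_direct(xs, h):
    """U-statistic of order two: average of h(X_i, X_j) over i < j."""
    n = len(xs)
    total = sum(h(xs[i], xs[j]) for i, j in combinations(range(n), 2))
    return Fraction(total, comb(n, 2))


def u2_grouped(xs, h):
    """Same quantity, grouping the pairs by the larger (1-based) index j."""
    n = len(xs)
    total = Fraction(0)
    for j in range(2, n + 1):
        # d_j composed with T^j equals (1/j) * sum_{i=1}^{j-1} h(X_{j-i}, X_j)
        inner = sum(h(xs[j - i - 1], xs[j - 1]) for i in range(1, j))
        d_j = Fraction(inner, j)
        total += j * d_j
    return total / comb(n, 2)


def kernel(x, y):
    """Illustrative kernel; symmetry is not needed for this regrouping."""
    return (x - 2 * y) ** 2


xs = [3, 1, 4, 1, 5]
print(u2_direct(xs, kernel) == u2_grouped(xs, kernel))  # True
```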
We first show that there exists a set of probability one Ω′ such that for each ω ∈ Ω′,
$$d_{j,R}(\omega) \to \int_{S^2} h_R(x,y)\, d\mu_\omega(x)\, d\delta_{X_0(\omega)}(y) =: Y_R(\omega). \tag{2.18}$$
First, separability of S guarantees the existence of a countable collection $(f_k)_{k \ge 1}$ of continuous and
bounded functions from S to $\mathbb{R}$ such that a sequence $(\mu_n)_{n \ge 1}$ of probability measures converges weakly
to a probability measure µ if and only if for each k ≥ 1, $\int f_k\, d\mu_n \to \int f_k\, d\mu$.
Taking $\mu_{n,\omega} := n^{-1} \sum_{i=1}^{n} \delta_{X_{-i}(\omega)}$, the ergodic theorem furnishes for each k ≥ 1 a set $\Omega_k$ having
probability one for which the convergence
$$\lim_{n\to\infty} \int f_k(x)\, d\mu_{n,\omega}(x) = \lim_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n} f_k(X_{-j}(\omega)) = E[f_k(X_0) \mid \mathcal{I}](\omega) = \int f_k(x)\, d\mu_\omega(x) \tag{2.19}$$
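holds for each $\omega \in \Omega_k$. Consequently, for each $\omega \in \Omega' := \bigcap_{k \ge 1} \Omega_k$, one has $\mu_{n,\omega} \to \mu_\omega$ weakly in S
and by Theorem 3.2 (page 21) of Billingsley (1968), we get that $\mu_{n,\omega} \times \delta_{X_0(\omega)} \to \mu_\omega \times \delta_{X_0(\omega)}$ weakly in
$S^2$. Since $h_R$ is bounded and for almost every ω ∈ Ω, $\mu_\omega \times \delta_{X_0(\omega)}(D_{h_R}) = 0$, we get (2.18). Moreover,
$|d_{j,R}| \le R$ hence by dominated convergence,
$$\lim_{j\to\infty} \left\| d_{j,R} - \int_{S^2} h_R(x,y)\, d\mu_\omega(x)\, d\delta_{X_0(\omega)}(y) \right\|_p = 0. \tag{2.20}$$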
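By (2.17) and the triangle inequality,
$$\|U_{2,n,h_R} - E[Y_R \mid \mathcal{I}]\|_p \le \frac{1}{\binom{n}{2}} \sum_{j=2}^{n} j\, \|d_{j,R} - Y_R\|_p + \left\| \frac{1}{\binom{n}{2}} \sum_{j=2}^{n} j\, Y_R \circ T^j - E[Y_R \mid \mathcal{I}] \right\|_p. \tag{2.21}$$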
An application of (2.18) combined with the dominated convergence theorem shows that the first term
of the right hand side of (2.21) goes to 0 as n goes to infinity. Then Lemma 2.1 with m = 1 shows that
$\|U_{2,n,h_R} - E[Y_R \mid \mathcal{I}]\|_p \to 0$. It remains to check that
$$E[Y_R \mid \mathcal{I}](\omega) = \int_{S^2} h_R(x,y)\, d\mu_\omega(x)\, d\mu_\omega(y). \tag{2.22}$$
Observe that by the ergodic theorem, $E[Y_R \mid \mathcal{I}](\omega) = \lim_{n\to\infty} n^{-1} \sum_{k=1}^{n} Y_R(T^k \omega)$. Since $\mu_{T^k \omega} = \mu_\omega$,
it follows that
$$E[Y_R \mid \mathcal{I}](\omega) = \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \int_{S^2} h_R(x,y)\, d\mu_\omega(x)\, d\delta_{X_k(\omega)}(y). \tag{2.23}$$
Using similar arguments as before gives that for almost every ω, $\mu_\omega \times n^{-1} \sum_{k=1}^{n} \delta_{X_k(\omega)}$ converges in
distribution to $\mu_\omega \times \mu_\omega$. This ends the proof of Theorem 1.3.
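2.5. Proof of Theorem 1.4. We start by proving Theorem 1.4 in the case where $h(x_1,\dots,x_m) = \prod_{\ell=1}^{m} \mathbf{1}_{x_\ell \in A_\ell}$.
We show by induction over m that if $A_\ell$, $\ell \in [\![1,m]\!]$, are Borel subsets of $\mathbb{R}^d$ and $(X_0 \circ T^i)_{i \in \mathbb{Z}}$ is
a stationary sequence with invariant σ-algebra $\mathcal{I}$, then
$$\lim_{n\to\infty} \left\| \frac{1}{\binom{n}{m}} \sum_{(i_\ell)_{\ell \in [\![1,m]\!]} \in \operatorname{Inc}^m_n} \prod_{\ell=1}^{m} \mathbf{1}_{X_{i_\ell} \in A_\ell} - \prod_{\ell=1}^{m} E[\mathbf{1}_{X_0 \in A_\ell} \mid \mathcal{I}] \right\|_p = 0. \tag{2.24}$$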
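The case m = 1 is a direct consequence of the ergodic theorem. Let us show the case m = 2. We start
from
$$\frac{1}{\binom{n}{2}} \sum_{1 \le i < j \le n} \mathbf{1}_{X_i \in A_1} \mathbf{1}_{X_j \in A_2} = \frac{1}{\binom{n}{2}} \sum_{j=2}^{n} j \left( \frac{1}{j} \sum_{i=1}^{j-1} \mathbf{1}_{X_{-i} \in A_1} \mathbf{1}_{X_0 \in A_2} \right) \circ T^j. \tag{2.25}$$
Let $f_j := \frac{1}{j} \sum_{i=1}^{j-1} \mathbf{1}_{X_{-i} \in A_1} \mathbf{1}_{X_0 \in A_2}$. By the ergodic theorem, $f_j \to E[\mathbf{1}_{X_0 \in A_1} \mid \mathcal{I}]\, \mathbf{1}_{X_0 \in A_2}$, which gives
(2.24) for m = 2.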
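The passage from the convergence of $f_j$ to (2.24) for m = 2 goes through the weighted ergodic theorem of Lemma 2.1 with limit $f = E[\mathbf{1}_{X_0 \in A_1} \mid \mathcal{I}]\, \mathbf{1}_{X_0 \in A_2}$. The following one-line computation, which only uses that $E[\mathbf{1}_{X_0 \in A_1} \mid \mathcal{I}]$ is bounded and $\mathcal{I}$-measurable, identifies the conditional expectation of that limit.

```latex
\[
E\bigl[ E[\mathbf{1}_{X_0 \in A_1} \mid \mathcal{I}]\, \mathbf{1}_{X_0 \in A_2} \bigm| \mathcal{I} \bigr]
  = E[\mathbf{1}_{X_0 \in A_1} \mid \mathcal{I}]\; E[\mathbf{1}_{X_0 \in A_2} \mid \mathcal{I}].
\]
```

This is the product appearing in (2.24) with m = 2.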
Suppose now that (2.24) holds for all Borel subsets $A_1,\dots,A_m$ of $\mathbb{R}^d$ and each strictly stationary
sequence $(X_0 \circ T^i)_{i \in \mathbb{Z}}$. Let $A_1,\dots,A_{m+1}$ be Borel subsets of $\mathbb{R}^d$ and let $(X_0 \circ T^i)_{i \in \mathbb{Z}}$ be a strictly
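Define
$$f_j := \frac{1}{\binom{j-1}{m}} \sum_{(i_\ell)_{\ell \in [\![1,m]\!]} \in \operatorname{Inc}^m_{j-1}} \mathbf{1}_{X_0 \in A_{m+1}} \prod_{\ell \in [\![1,m]\!]} \mathbf{1}_{X_{i_\ell - j} \in A_\ell}. \tag{2.27}$$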
where |·|d denotes the Euclidean norm on Rd . Observe that by the triangle inequality,
m
X
+ sup h (Xi1 , . . . , Xim ) 1|xℓ |d >K ,
16i1 <···<im p
ℓ=1
hence by assumption (A.3.1), we can find K ′ such that for each K > K ′ ,
Moreover, by assumption (A.3.4), we can choose K′′ such that for each K > K′′,
$$\int_\Omega \left| I_m\bigl(\mathbb{R}^d, h - h^{(K)}, \omega\bigr) \right|^p dP(\omega) \le \varepsilon^p. \tag{2.33}$$
Let K0 = max {K ′ , K ′′ }. Observe that in assumptions (A.3.2) and (A.3.3) , we can assume without loss
of generality that q0 = q1 . By standard results in measure theory, we know that we can find an integer
J, constants c1 , . . . , cJ and Borel subsets Aℓ,j , ℓ ∈ J1, mK, j ∈ J1, JK such that
Z q0
p q −1 q
p 0
(2.34) m
h (K0 )
(x 1 , . . . , x m ) − e
h (K0 )
(x 1 , . . . , x m ) 0
dx1 . . . dxm < (M1 + M2 )p ε q0 −1 ,
( Rd )
where
$$\tilde{h}^{(K_0)}(x_1,\dots,x_m) = \sum_{j=1}^{J} c_j \prod_{\ell=1}^{m} \mathbf{1}_{x_\ell \in A_{\ell,j}}. \tag{2.35}$$
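A kernel of the form (2.35) is straightforward to evaluate. The sketch below treats the case d = 1 and, as a simplifying assumption not made in the text, takes every $A_{\ell,j}$ to be a half-open interval $[a, b)$; the names `make_simple_kernel` and `terms` are illustrative.

```python
def make_simple_kernel(terms):
    """Build x -> sum_j c_j * prod_l 1[x_l in A_{l,j}].

    terms is a list of pairs (c_j, boxes_j), where boxes_j lists one interval
    (a, b) per argument, standing for the set A_{l,j} = [a, b).
    """
    def kernel(*xs):
        total = 0.0
        for c, boxes in terms:
            if len(boxes) != len(xs):
                raise ValueError("each term needs one interval per argument")
            if all(a <= x < b for x, (a, b) in zip(xs, boxes)):
                total += c
        return total
    return kernel


h_tilde = make_simple_kernel([
    (2.0, [(0.0, 1.0), (0.0, 1.0)]),
    (-1.0, [(0.5, 2.0), (0.0, 3.0)]),
])
print(h_tilde(0.75, 0.25), h_tilde(1.5, 2.5), h_tilde(5.0, 0.0))  # 1.0 -1.0 0.0
```

Each term is a product of indicators of the individual coordinates, which is the structure that allows (2.24) to be applied term by term.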
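As a consequence,
$$\begin{aligned} \left\| U_{m,n,h} - I_m(\mathbb{R}^d, h, \cdot) \right\|_p \le{} & \sup_{N \ge m} \left\| U_{m,N,h} - U_{m,N,h^{(K_0)}} \right\|_p + \sup_{N \ge m} \left\| U_{m,N,h^{(K_0)}} - U_{m,N,\tilde{h}^{(K_0)}} \right\|_p \\ & + \left\| U_{m,n,\tilde{h}^{(K_0)}} - I_m(\mathbb{R}^d, \tilde{h}^{(K_0)}, \cdot) \right\|_p + \left\| I_m(\mathbb{R}^d, \tilde{h}^{(K_0)}, \cdot) - I_m(\mathbb{R}^d, h^{(K_0)}, \cdot) \right\|_p \\ & + \left\| I_m(\mathbb{R}^d, h^{(K_0)}, \cdot) - I_m(\mathbb{R}^d, h, \cdot) \right\|_p. \end{aligned}$$
By (2.24) and (2.35), we can find $n_0$ such that for each $n \ge n_0$, $\| U_{m,n,\tilde{h}^{(K_0)}} - I_m(\mathbb{R}^d, \tilde{h}^{(K_0)}, \cdot) \|_p \le \varepsilon$,
hence we derive that for such n's, $\| U_{m,n,h} - I_m(\mathbb{R}^d, h, \cdot) \|_p \le 4\varepsilon$. This ends the proof of Theorem 1.4.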
References
J. Aaronson, R. Burton, H. Dehling, D. Gilat, T. Hill, and B. Weiss. Strong laws for L- and U-statistics. Trans. Amer. Math. Soc., 348(7):2845–2866, 1996. ISSN 0002-9947. URL https://doi.org/10.1090/S0002-9947-96-01681-9.
M. A. Arcones. The law of large numbers for U-statistics under absolute regularity. Electron. Comm. Probab., 3:13–19, 1998. MR 1624866.
P. Billingsley. Convergence of probability measures. John Wiley & Sons Inc., New York, 1968.
S. Borovkova, R. Burton, and H. Dehling. From dimension estimation to asymptotics of dependent U-statistics. In Limit theorems in probability and statistics, Vol. I (Balatonlelle, 1999), pages 201–234. János Bolyai Math. Soc., Budapest, 2002.
I. P. Cornfeld, S. V. Fomin, and Ya. G. Sinaı̆. Ergodic theory, volume 245 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, New York, 1982. ISBN 0-387-90580-4. URL http://dx.doi.org/10.1007/978-1-4615-6927-5. Translated from the Russian by A. B. Sosinskiı̆.
H. Dehling, D. Giraudo, and D. Volny. Some remarks on the ergodic theorem for U-statistics, 2023. URL https://arxiv.org/abs/2302.04539, to appear in Comptes Rendus Mathématique.
H. G. Dehling and O. Sh. Sharipov. Marcinkiewicz-Zygmund strong laws for U-statistics of weakly dependent observations. Statist. Probab. Lett., 79(19):2028–2036, 2009. MR 2571765.
M. Denker and M. Gordin. Limit theorems for von Mises statistics of a measure preserving transformation. Probab. Theory Related Fields, 160(1-2):1–45, 2014. ISSN 0178-8051. URL https://doi.org/10.1007/s00440-013-0522-z.
D. Giraudo. Limit theorems for U-statistics of Bernoulli data. ALEA Lat. Am. J. Probab. Math. Stat., 18(1):793–828, 2021. MR 4243516.
W. Hoeffding. A class of statistics with asymptotically normal distribution. Ann. Math. Statistics, 19:293–325, 1948. ISSN 0003-4851. URL https://doi.org/10.1214/aoms/1177730196.
R. Lachièze-Rey and M. Reitzner. U-statistics in stochastic geometry. In Stochastic analysis for Poisson point processes, volume 7 of Bocconi Springer Ser., pages 229–253. Bocconi Univ. Press, 2016.
R. Lyons. Distance covariance in metric spaces. Ann. Probab., 41(5):3284–3305, 2013. ISSN 0091-1798. URL https://doi.org/10.1214/12-AOP803.
(†) Institut de Recherche Mathématique Avancée UMR 7501, Université de Strasbourg and CNRS 7 rue
René Descartes 67000 Strasbourg, France