Time Averages and Ergodicity
In practice, when we study signals we are forced to analyze individual realizations of random
processes. We usually want information about the ensemble-average properties of measurable
quantities; we must derive this information from single realizations, and consequently we are
usually compelled to take time averages.
Consider the following stochastic integral of a WSS random process X(t):
$$
Y = \frac{1}{T}\int_0^T dt\, X(t).
$$
We can show that Y is an unbiased estimator:
$$
\langle Y \rangle = \left\langle \frac{1}{T}\int_0^T dt\, X(t) \right\rangle
= \frac{1}{T}\int_0^T dt\, \langle X(t)\rangle
= \langle X\rangle\, \frac{1}{T}\int_0^T dt = \langle X\rangle.
$$
Therefore Y has a mean equal to the mean of X. The question now is what the spread in Y is:
To investigate the variation of Y about hY i we must look at the second moment:
$$
\langle Y^2\rangle = \left\langle \left[\frac{1}{T}\int_0^T dt\, X(t)\right]^2 \right\rangle
= T^{-2} \iint_0^T dt_1\, dt_2\, \underbrace{\langle X(t_1)\, X(t_2)\rangle}_{R_X(t_1, t_2)\, =\, R_X(t_1 - t_2)}.
$$
The tedious part is identifying the limits of integration: under the change of variables
$\tau = t_1 - t_2$, $z = \tfrac{1}{2}(t_1 + t_2)$, the square region of integration in the
$(t_1, t_2)$ plane maps to a rotated square in the $(\tau, z)$ plane.
This result is useful because it shows how the averaging process behaves as T → ∞.
Convergence:
Suppose the autocovariance of X, $C_X(\tau)$, has a finite width $W_x$.
Then
$$
\lim_{T\to\infty} \sigma_Y^2 = \lim_{T\to\infty} T^{-1} \int_{-T}^{T} d\tau\, C_X(\tau)\,\bigl(1 - |\tau|/T\bigr).
$$
If $C_X(\tau) \to 0$ at lags $|\tau| \ll T$:
$$
\sigma_Y^2 = \lim_{T\to\infty} T^{-1} \int_{-\infty}^{\infty} d\tau\, C_X(\tau)
\simeq \lim_{T\to\infty} T^{-1} \underbrace{\sigma_X^2\, W_x}_{\text{constant}} = 0.
$$
Thus $\sigma_Y \to 0$ as $T \to \infty$, and therefore $\lim_{T\to\infty} Y = \langle X \rangle$.
An infinite time average is equivalent to an ensemble average.
This is an example of ergodicity, where time averages of realizations of a random process
converge to an ensemble average.
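As a numerical illustration (a sketch, not part of the notes; the AR(1) model and all parameter values are assumed for demonstration), the time average of a correlated discrete-time process converges to the ensemble mean, with a spread that shrinks as the record length grows:

```python
# Sketch: time averages of a correlated WSS process converge to the
# ensemble mean <X> = 0.  An AR(1) sequence x[n] = a*x[n-1] + e[n] serves
# as a discrete stand-in; its autocovariance has finite width W_x ~ 1/(1-a).
import numpy as np

rng = np.random.default_rng(0)
a = 0.9  # correlation parameter (arbitrary choice for illustration)

def time_average(T):
    """Return Y = (1/T) * sum over one realization of length T."""
    e = rng.standard_normal(T)
    x = np.empty(T)
    x[0] = e[0]
    for n in range(1, T):
        x[n] = a * x[n - 1] + e[n]
    return x.mean()

sigma_Y = {}
for T in (100, 1000, 10000):
    ys = np.array([time_average(T) for _ in range(200)])
    sigma_Y[T] = ys.std()
    print(f"T={T:6d}  <Y>={ys.mean():+.3f}  sigma_Y={sigma_Y[T]:.3f}")
# sigma_Y falls roughly as 1/sqrt(T), while <Y> stays near <X> = 0.
```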
Comments:
The meaning of the width $W_x$ of the autocorrelation function: if $W_x \equiv$ correlation
time for x, then independent samples of x(t) are separated by a time $\approx W_x$.
The rate at which $Y \to \langle x \rangle$ depends on the autocorrelation width $W_x$ of the
ACV of x. To get
$$
\sigma_Y^2 \sim \frac{\sigma_X^2\, W_x}{T} \longrightarrow 0
$$
for large T, we assumed that $C_X(\tau)$ drops to zero fast enough that $T^{-1}$ times the
area under $C_X(\tau)$ vanishes. If $C_X(\tau) \propto \tau^{-p}$ for large $\tau$, this
requires $p > 0$, since
$$
\frac{1}{T}\int_{\tau_1}^{T} d\tau\, \tau^{-p}
= \frac{1}{T}\,\frac{T^{1-p} - \tau_1^{1-p}}{1-p}
\longrightarrow \frac{T^{-p}}{1-p} \qquad (0 < p < 1),
$$
which vanishes as $T \to \infty$; for $p \geq 1$ the area is finite and $\sigma_Y^2$ falls
off as $1/T$.
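A quick numerical check of this scaling (illustrative only; the values of p and $\tau_1$ are arbitrary):

```python
# Check: T^-1 * integral_{tau1}^{T} tau^-p dtau behaves as T^-p/(1-p) for
# 0 < p < 1, and vanishes as T grows.
import numpy as np

def avg_tail(p, T, tau1=1.0, n=400001):
    """Trapezoid-rule value of (1/T) * integral of tau^-p from tau1 to T."""
    tau = np.linspace(tau1, T, n)
    y = tau ** (-p)
    return float(np.sum((tau[1:] - tau[:-1]) * (y[1:] + y[:-1]) / 2) / T)

for p in (0.25, 0.75):
    for T in (1e3, 1e6):
        print(f"p={p}  T={T:.0e}  numeric={avg_tail(p, T):.3e}  "
              f"T^-p/(1-p)={T ** (-p) / (1 - p):.3e}")
```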
Fourier Transform and Power Spectrum Estimate for a Stochastic Process
Another stochastic integral is the Fourier transform. As stated before, the FT of a WSS
random process X(t),
$$
\tilde X(f) = \int_{-\infty}^{\infty} dt\, x(t)\, e^{-i\omega t}
\qquad \text{(recall WSS} \Rightarrow \langle X^2(t)\rangle = \text{constant in } t\text{)},
$$
cannot exist because the integral diverges. Luckily, we need only consider windowed or finite
integrals in order to model experimental situations:
$$
\tilde X_T(f) = \int_{-T}^{T} dt\, x(t)\, e^{-i\omega t}.
$$
What should an estimator of the power spectrum $S_x(f)$ look like? Recall for deterministic
functions that
$$
\int_{-\infty}^{\infty} dt\, f^*(t)\, f(t+\tau) \Longleftrightarrow |\tilde F(f)|^2.
$$
So we expect that an estimator (denoted with a caret) for the power spectrum of a process x(t)
would have the form
$$
\hat S_x(f) = \text{const} \cdot |\tilde X_T(f)|^2,
$$
and an appropriate value of the constant is $1/2T$, so
$$
\hat S_x(f) = \frac{1}{2T}\, |\tilde X_T(f)|^2.
$$
This ensures that the corresponding ACF has the correct units:
It can be shown that the estimator satisfies the Wiener-Khinchin theorem (which applies to
ensemble-average quantities), where $\hat C_x(\tau) \Longleftrightarrow \hat S_x(f)$:
$$
\begin{aligned}
\hat C_x(\tau) &= \frac{1}{2T}\int_{-\infty}^{\infty} df\, e^{i\omega\tau}\, |\tilde X_T(f)|^2 \\
&= \frac{1}{2T}\int_{-\infty}^{\infty} df\, e^{i\omega\tau} \iint_{-T}^{T} dt\, dt'\, x(t)\, x^*(t')\, e^{-i\omega(t-t')} \\
&= \frac{1}{2T}\iint_{-T}^{T} dt\, dt'\, x(t)\, x^*(t')
\underbrace{\int_{-\infty}^{\infty} df\, e^{i\omega(t'+\tau-t)}}_{\equiv\,\delta(t'+\tau-t)} \\
\hat C_x(\tau) &= \frac{1}{2T}\int_{-T}^{T} dt\, x(t)\, x^*(t-\tau),
\end{aligned}
$$
so
$$
\hat C_x(0) = \frac{1}{2T}\int_{-T}^{T} dt\, |x(t)|^2
= \text{estimate for } \langle |x(t)|^2\rangle, \text{ as expected for } C_x(0).
$$
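In discrete time the same estimator pair can be checked directly with the FFT (a sketch; white Gaussian noise and the record length are arbitrary choices): the inverse DFT of the periodogram equals the circular lag-product estimate of the ACF, and its zero lag is the mean square.

```python
# Sketch of the estimator pair C_hat(tau) <-> S_hat(f) in discrete time:
# the inverse DFT of |X|^2/N equals the circular lag products (1/N) sum x[n] x[n-k].
import numpy as np

rng = np.random.default_rng(1)
N = 256
x = rng.standard_normal(N)

S_hat = np.abs(np.fft.fft(x)) ** 2 / N   # periodogram (discrete S_hat)
C_hat = np.fft.ifft(S_hat).real          # its inverse transform

# direct circular autocorrelation estimate at each lag
C_direct = np.array([np.dot(x, np.roll(x, -k)) / N for k in range(N)])

print(np.allclose(C_hat, C_direct))        # the two estimates agree
print(np.isclose(C_hat[0], np.mean(x**2))) # C_hat(0) = mean square, as above
```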
How good an estimator is $\hat S_x(f)$ for the power spectrum $S_x(f)$?
Answer: terrible! Recall that $\hat S_x(f)$ is itself a random process (since $\hat S_x$ for
fixed f is a random variable). Thus, we may fairly ask what the convergence properties of
$\hat S_x(f)$ are, just as we investigated $Y \equiv T^{-1}\int_0^T dt\, X(t)$ and found that
$\langle Y\rangle = \langle X\rangle$ and $\sigma_Y \to 0$ as $T\to\infty$ so long as the
correlation function of X(t) decayed sufficiently quickly. Here, too, the estimator has the
correct mean, but the variance
$\sigma^2_{\hat S}(f) \equiv \langle \hat S_x^2(f)\rangle - \langle \hat S_x(f)\rangle^2$
does not decay to zero as $T \to \infty$:
$$
\lim_{T\to\infty} \sigma^2_{\hat S}(f) \neq 0.
$$
Conclusion: The squared magnitude of a finite Fourier transform of a WSS process is a poor
estimate for the power spectrum Sx(f ) (an ensemble average quantity).
Why $\hat S_x(f)$ is a poor estimator: consider the cross term between the real part
$R(\omega)$ and imaginary part $I(\omega)$ of $\tilde X_T(\omega)$,
$$
\langle R(\omega)\, I(\omega)\rangle
= -\frac{1}{2}\iint dt_1\, dt_2\, R_X(t_1 - t_2)\, \sin\omega(t_1 + t_2)
+ \frac{1}{2}\iint dt_1\, dt_2\, \underbrace{R_X(t_1 - t_2)}_{\text{even}}\,
\underbrace{\sin\omega(t_2 - t_1)}_{\text{odd}}.
$$
The first term integrates to zero for $W_x \ll T$, while the second term is the integral of
the product of odd and even functions and vanishes; hence $R(\omega)$ and $I(\omega)$ are
uncorrelated.
Therefore
$$
\hat S_T(\omega) = \frac{1}{2T}\, |\tilde X_T(\omega)|^2
= \frac{1}{2T}\, \underbrace{[R^2(\omega) + I^2(\omega)]}_{\text{sum of squares of two
independent Gaussian r.v.'s } \equiv\, \chi^2_2},
$$
implying that $\hat S_T(\omega)$ is a $\chi^2_2$ r.v. with mean
$\langle \hat S_T(\omega)\rangle = S(\omega) =$ true power spectrum.
Let
$$
p(\omega) \equiv \hat S_T(\omega).
$$
Then the PDF of $p(\omega)$ is
$$
f_p(p) = \frac{1}{\langle p\rangle}\, e^{-p/\langle p\rangle}\, U(p).
$$
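This exponential ($\chi^2_2$) behavior is easy to see numerically (a sketch with assumed parameters; white Gaussian noise is the test process): in any single frequency bin the periodogram's standard deviation equals its mean, i.e. 100% fractional error.

```python
# Sketch: periodogram values in a fixed frequency bin are exponentially
# distributed (chi-squared with 2 d.o.f.), so std ~ mean.
import numpy as np

rng = np.random.default_rng(2)
n_real, N = 2000, 1024
x = rng.standard_normal((n_real, N))           # realizations of white noise
P = np.abs(np.fft.rfft(x, axis=1)) ** 2 / N    # periodogram of each realization

p = P[:, 100]                                  # one interior frequency bin
print(f"mean={p.mean():.3f}  std={p.std():.3f}")  # std is comparable to mean
```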
Method 2: Another way of understanding why ŜT (ω) does not converge as T → ∞ is to
consider the number of degrees of freedom in each independent frequency bin.
• Note that
$$
\tilde X_T(\omega) \equiv \int_{-T}^{T} dt\, x(t)\, e^{-i\omega t}
= \int_{-\infty}^{\infty} dt\, x(t)\, W_T(t)\, e^{-i\omega t},
$$
where $W_T$ is a window function, $W_T(t) = 1$ for $|t| \leq T$ and zero otherwise, or
$$
\tilde X_T(\omega) = \tilde X(\omega) * \frac{2\sin\omega T}{\omega}.
$$
As usual, multiplication by $W_T(t)$ in the time domain corresponds to convolution with
$2\sin\omega T/\omega$ in the frequency domain. A frequency cell or bin has a width
$\Delta\omega \approx \pi/T$ or $\Delta f \approx 1/2T$.
• Let $W_t$ = correlation time in the time domain. Then the number of independent fluctuations
in the time series is $N_t = 2T/W_t$.
• Let $W_\omega$ = width of the spectrum (bandlimited). Then the number of frequency cells
into which the variance is divided is
$$
N_\omega = \frac{W_\omega}{\Delta\omega} = \frac{W_\omega T}{\pi}.
$$
• The number of degrees of freedom (d.o.f.) per frequency cell is
$$
N_{\rm d.o.f.} = \frac{N_t}{N_\omega}
\equiv \frac{\#\text{ of independent data points}}{\#\text{ of frequency cells}}
= \frac{2T/W_t}{T W_\omega/\pi} = \frac{2\pi}{W_t W_\omega}.
$$
But the uncertainty principle $\Rightarrow W_t W_\omega \geq 2\pi$, so
$N_{\rm d.o.f.} \approx 1$ for each part (real and imaginary) of the F.T.
Interpretation:
As T → ∞ more and more independent fluctuations contribute to the integral, but these are
being spread into more and more frequency bins so that the # d.o.f. per bin remains the same
and small ⇒ large errors.
Nd.o.f. ≈ 1 ⇒ |X̃T (ω)|2 will have 2 d.o.f. per cell, as before.
Solution to the convergence problem: increase the number of degrees of freedom in a
frequency cell.
The simplest approach is to average spectral estimates: obtain spectra from L realizations
of length 2T and average; i.e., find $\hat S_T(\omega)$ for a block of data, repeat for L
blocks of data, and average:
$$
\hat S_{T,L}(\omega) = L^{-1} \sum_{j=1}^{L} \hat S_{T,j}(\omega),
$$
$$
\text{error:}\quad
\frac{\bigl[\mathrm{Var}\,\hat S_{T,L}(\omega)\bigr]^{1/2}}{\langle \hat S_{T,L}(\omega)\rangle}
= L^{-1/2}, \qquad 10\% \Rightarrow L = 100.
$$
(Best method if unlimited amount of data are available.) We will talk about this and other
methods later.
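A sketch of this block-averaging recipe (white noise with unit spectrum is an assumed test signal; block length and seeds are arbitrary):

```python
# Averaging L periodograms reduces the fractional error to ~ L^(-1/2).
import numpy as np

rng = np.random.default_rng(3)
N = 512

def averaged_estimate(L):
    """Mean of L periodograms of independent white-noise blocks (true S = 1)."""
    x = rng.standard_normal((L, N))
    return (np.abs(np.fft.rfft(x, axis=1)) ** 2 / N).mean(axis=0)

frac_err = {}
for L in (1, 16, 100):
    S = averaged_estimate(L)[1:-1]   # interior bins (skip DC and Nyquist)
    frac_err[L] = S.std()            # scatter about the true value S = 1
    print(f"L={L:3d}  fractional error={frac_err[L]:.3f}  1/sqrt(L)={L ** -0.5:.3f}")
```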
Direct Calculation of the Mean and Variance of the Spectral Estimate
We will use continuous notation for now.
The spectral estimator for a WSS process is
$$
\hat S_T(\omega) \equiv \frac{1}{2T}\, |\tilde X_T(\omega)|^2
\qquad \text{where} \qquad
\tilde X_T(\omega) = \int_{-T}^{T} dt\, x(t)\, e^{-i\omega t}.
$$
Properties of the estimator: As usual, we want to calculate the mean and variance of the
estimator.
Mean:
The ensemble average is
$$
\begin{aligned}
\langle \hat S_T(\omega)\rangle
&= \frac{1}{2T}\, \langle |\tilde X_T(\omega)|^2\rangle \\
&= \frac{1}{2T}\iint dt_1\, dt_2\, \langle x(t_1)\, x^*(t_2)\rangle\, e^{-i\omega(t_1 - t_2)} \\
&= \frac{1}{2T}\iint_{-T}^{T} dt_1\, dt_2\, R_X(t_1 - t_2)\, e^{-i\omega(t_1 - t_2)}.
\end{aligned}
$$
Use, as before, the coordinate transformation
$$
\tau = t_1 - t_2, \qquad z = \tfrac{1}{2}(t_1 + t_2),
$$
where $dz\, d\tau = dt_1\, dt_2$.
The integration limits are transformed as follows:
$$
\begin{aligned}
\langle \hat S_T(\omega)\rangle
&= \frac{1}{2T}\Biggl[\int_0^{2T} d\tau\, R_X(\tau)\, e^{-i\omega\tau}
\underbrace{\int_{-T+\tau/2}^{T-\tau/2} dz}_{2T(1-\tau/2T)}
+ \int_{-2T}^{0} d\tau\, R_X(\tau)\, e^{-i\omega\tau}
\underbrace{\int_{-T-\tau/2}^{T+\tau/2} dz}_{2T(1+\tau/2T)}\Biggr] \\
&= \frac{1}{2T}\Biggl[\int_0^{2T} d\tau\, R_X(\tau)\, e^{-i\omega\tau}\,
2T\Bigl(1-\frac{\tau}{2T}\Bigr)
+ \int_{-2T}^{0} d\tau\, R_X(\tau)\, e^{-i\omega\tau}\,
2T\Bigl(1+\frac{\tau}{2T}\Bigr)\Biggr] \\
&= \frac{1}{2T}\int_{-2T}^{2T} d\tau\, R_X(\tau)\, e^{-i\omega\tau}\,
2T\Bigl(1-\frac{|\tau|}{2T}\Bigr),
\end{aligned}
$$
or
$$
\langle \hat S_T(\omega)\rangle
= \int_{-2T}^{2T} d\tau\, R_X(\tau)\, e^{-i\omega\tau}\, \bigl[1 - |\tau|/2T\bigr],
$$
and
$$
\lim_{T\to\infty} \langle \hat S_T(\omega)\rangle
= \int_{-\infty}^{\infty} d\tau\, R_X(\tau)\, e^{-i\omega\tau} \equiv S(\omega).
$$
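The triangle factor $1 - |\tau|/2T$ is the source of the estimator's (vanishing) bias. A numeric sketch, using an assumed example autocorrelation $R_X(\tau) = e^{-|\tau|/t_c}$, whose true spectrum is the Lorentzian $S(\omega) = 2t_c/(1 + \omega^2 t_c^2)$:

```python
# Sketch: the triangle-weighted transform <S_hat_T(w)> approaches the true
# spectrum S(w) as T grows.  Example process (assumed): R_X(tau) = exp(-|tau|/t_c).
import numpy as np

t_c, omega = 1.0, 2.0
S_true = 2 * t_c / (1 + (omega * t_c) ** 2)

def mean_S_hat(T, n=200001):
    """Riemann-sum value of the triangle-weighted transform over [-2T, 2T]."""
    tau = np.linspace(-2 * T, 2 * T, n)
    # e^{-i w tau} -> cos(w tau), since R_X is even
    integrand = np.exp(-np.abs(tau) / t_c) * np.cos(omega * tau) * (1 - np.abs(tau) / (2 * T))
    return float(np.sum(integrand) * (tau[1] - tau[0]))

for T in (1, 5, 50):
    print(f"T={T:3d}  <S_hat>={mean_S_hat(T):.4f}   S_true={S_true:.4f}")
```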
Variance:
Now let’s look at the variance of the estimator:
$$
\mathrm{Var}[\hat S_T(\omega)] \equiv \langle \hat S_T^2(\omega)\rangle - \langle \hat S_T(\omega)\rangle^2.
$$
By definition,
$$
\langle \hat S_T^2(\omega)\rangle
= \frac{1}{4T^2}\, \langle |\tilde X_T(\omega)|^4\rangle
= \frac{1}{4T^2}\iiiint_{-T}^{T} dt_1\, dt_2\, dt_3\, dt_4\,
\langle x(t_1)\, x^*(t_2)\, x(t_3)\, x^*(t_4)\rangle\, e^{-i\omega[t_1 - t_2 + t_3 - t_4]}.
$$
For a Gaussian process the fourth moment factors into a sum of products of pairwise
correlations. Plugging this in, we get
$$
\begin{aligned}
\frac{1}{4T^2}&\iiiint dt_1\, dt_2\, dt_3\, dt_4\,
\langle x(t_1)\,x^*(t_2)\,x(t_3)\,x^*(t_4)\rangle\, e^{-i\omega[t_1-t_2+t_3-t_4]} \\
&= \frac{1}{4T^2}\Biggl[
\iint dt_1\, dt_2\, R_X(t_1-t_2)\, e^{-i\omega[t_1-t_2]}
\iint dt_3\, dt_4\, R_X(t_3-t_4)\, e^{-i\omega[t_3-t_4]} \\
&\qquad + \iint dt_1\, dt_3\, R_X(t_1-t_3)\, e^{-i\omega[t_1+t_3]}
\iint dt_2\, dt_4\, R_X(t_2-t_4)\, e^{i\omega[t_2+t_4]} \\
&\qquad + \iint dt_1\, dt_4\, R_X(t_1-t_4)\, e^{-i\omega[t_1-t_4]}
\iint dt_2\, dt_3\, R_X(t_2-t_3)\, e^{-i\omega[t_2-t_3]}\Biggr].
\end{aligned}
$$
Note the 1st and 3rd terms are of the form $\langle \hat S_T(\omega)\rangle^2$. Therefore
$$
\langle \hat S_T^2(\omega)\rangle
= 2\,\langle \hat S_T(\omega)\rangle^2
+ \frac{1}{4T^2}\,
\underbrace{\Biggl|\iint dt_1\, dt_3\, R_X(t_1-t_3)\, e^{-i\omega[t_1+t_3]}\Biggr|^2}_{\to\, 0
\text{ as } T/W_x \to \infty \text{ (except for } \omega=0\text{);}\; t_1, t_3 \to \tau, z
\text{ as before}}.
$$
Looking at the double integral in the second term, we have
$$
\begin{aligned}
\iint dt_1\, dt_3\, R_X(t_1 - t_3)\, e^{-i\omega[t_1+t_3]}
&= \int_0^{2T} d\tau\, R_X(\tau)
\underbrace{\int_{-T+\tau/2}^{T-\tau/2} dz\, e^{-2i\omega z}}_{=\frac{1}{\omega}\sin\left[2\omega T(1-\tau/2T)\right]} \\
&\quad + \int_{-2T}^{0} d\tau\, R_X(\tau)
\underbrace{\int_{-T-\tau/2}^{T+\tau/2} dz\, e^{-2i\omega z}}_{=\frac{1}{\omega}\sin\left[2\omega T(1+\tau/2T)\right]
\;=\;\frac{1}{\omega}\sin\left[2\omega T(1-|\tau|/2T)\right] \text{ as before}}.
\end{aligned}
$$
Therefore
$$
\frac{1}{2T}\iint dt_1\, dt_3\, R_X(t_1-t_3)\, e^{-i\omega[t_1+t_3]}
= \frac{1}{2T\omega}\int_{-2T}^{2T} d\tau\, R_X(\tau)\,
\sin\bigl[2\omega T\,(1 - |\tau|/2T)\bigr]
\longrightarrow 0 \quad \text{as } T \to \infty.
$$
In a slightly different approach for the second term, we consider
$$
I_2 = \iint_{-T}^{T} dt_1\, dt_3\, R_X(t_1 - t_3)\, e^{-i\omega[t_1+t_3]}.
$$
Again let $\tau = t_1 - t_3$ and $z = \tfrac{1}{2}(t_1 + t_3)$. By calculating the integrals
and taking the limit $T \to \infty$, it can be shown that $I_2 \to 0$.
Consequently we have the fairly general result
$$
\lim_{T\to\infty} \langle \hat S_T^2(\omega)\rangle = 2\,\langle \hat S_T(\omega)\rangle^2,
$$
so $\mathrm{Var}[\hat S_T(\omega)] \to \langle \hat S_T(\omega)\rangle^2 \neq 0$: the
estimator does not converge. The remedies are to:
1. increase the number of degrees of freedom through averaging of multiple spectral estimates;
2. increase the number of degrees of freedom by smoothing the spectrum: convolve the spectral estimate with a window function; this procedure sacrifices frequency resolution;
3. use another method that uses a priori information (e.g. Bayesian and maximum entropy approaches).
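This non-convergence is easy to verify numerically (a sketch; white Gaussian noise and the record lengths are assumed for demonstration): lengthening the record does not reduce the periodogram's relative variance in any fixed bin.

```python
# Sketch: Var[S_hat]/<S_hat>^2 stays near 1 in a fixed frequency bin no
# matter how long the record is -- the raw periodogram never converges.
import numpy as np

rng = np.random.default_rng(4)
n_real = 1000
ratios = {}
for N in (128, 1024, 4096):
    x = rng.standard_normal((n_real, N))
    P = np.abs(np.fft.rfft(x, axis=1)) ** 2 / N   # periodogram per realization
    p = P[:, N // 4]                              # fixed interior frequency bin
    ratios[N] = p.var() / p.mean() ** 2
    print(f"N={N:5d}  Var/mean^2 = {ratios[N]:.2f}")
```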