Time Averages and Ergodicity

In practice, when we study signals we are forced to analyze individual realizations of random
processes. We usually desire information about the ensemble-average properties of measurable
quantities; we must derive this information from single realizations, and consequently we are
usually compelled to take time averages.

Consider the following stochastic integral of a WSS random process X(t):

    Y = (1/T) ∫_0^T dt X(t).

This integral can be considered an estimator for the ensemble average ⟨X(t)⟩.


Note that Y is a random variable and has its own PDF.
Question: How good is Y as an estimator?
Related questions: What are its expected value and variance? Does it converge to ⟨X(t)⟩ and,
if so, how fast does it converge?

We can show that Y is an unbiased estimator:

    ⟨Y⟩ = ⟨ (1/T) ∫_0^T dt X(t) ⟩
        = (1/T) ∫_0^T dt ⟨X(t)⟩
        = ⟨X⟩ (1/T) ∫_0^T dt = ⟨X⟩.

Therefore Y has a mean equal to the mean of X. The question now is what the spread in Y is.

To investigate the variation of Y about ⟨Y⟩ we must look at the second moment:

    ⟨Y²⟩ = ⟨ [ (1/T) ∫_0^T dt X(t) ]² ⟩
         = T⁻² ∫∫_0^T dt₁ dt₂ ⟨X(t₁) X(t₂)⟩,

where ⟨X(t₁) X(t₂)⟩ = R_X(t₁, t₂) = R_X(t₁ − t₂) by wide-sense stationarity.

We transform the variables using

    z = (t₁ + t₂)/2,   τ = t₁ − t₂,

which satisfies dz dτ = dt₁ dt₂ (Jacobian = 1).

The tedious part is to identify the limits of integration. The area of integration in the
(t₁, t₂) plane maps to a diamond in the (z, τ) plane [figure omitted in this copy]. This
indicates that

    ⟨Y²⟩ = T⁻² [ ∫_0^T dτ R_X(τ) ∫_{τ/2}^{T−τ/2} dz + ∫_{−T}^0 dτ R_X(τ) ∫_{−τ/2}^{T+τ/2} dz ],

and carrying out the z integrals (each gives T − |τ|):

    ⟨Y²⟩ = (1/T) ∫_{−T}^{T} dτ R_X(τ) (1 − |τ|/T).

To get the variance of Y we use the autocovariance C_X(τ) ≡ R_X(τ) − ⟨X⟩² to write

    σ_Y² ≡ ⟨Y²⟩ − ⟨Y⟩² = (1/T) ∫_{−T}^{T} dτ C_X(τ) (1 − |τ|/T).

This result is useful because it shows how the averaging process behaves as T → ∞.
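As a concrete check of this variance formula, the sketch below simulates a discrete AR(1) process, a stand-in for a WSS process whose autocovariance is known in closed form, C(k) = σ² φ^|k|, and compares the empirical variance of the time average against the discrete analogue of σ_Y² = (1/T) ∫ dτ C_X(τ)(1 − |τ|/T). All names and parameter values here are illustrative, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
phi, sig_eps = 0.8, 1.0
sigma2 = sig_eps**2 / (1 - phi**2)       # process variance C(0)
N, M = 200, 20000                        # samples per realization, ensemble size

# Simulate M stationary AR(1) realizations (start in stationary distribution).
x = np.zeros((M, N))
s = rng.normal(0.0, np.sqrt(sigma2), M)
for n in range(N):
    s = phi * s + rng.normal(0.0, sig_eps, M)
    x[:, n] = s

Y = x.mean(axis=1)                       # time average of each realization
var_empirical = Y.var()

# Discrete analogue of (1/T) * integral of C(tau) * (1 - |tau|/T)
k = np.arange(-(N - 1), N)
C = sigma2 * phi ** np.abs(k)
var_theory = (C * (1 - np.abs(k) / N)).sum() / N
print(var_empirical, var_theory)
```

With these parameters the two numbers should agree to within a few percent, and the agreement tightens as the ensemble size M grows.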
Convergence:
Suppose the autocovariance of X, C_X(τ), has finite width W_x [figure omitted in this copy].
We now define the autocorrelation width of the process. Assume we can write

    C_X(τ) = σ_X² ρ_X(τ),   with normalization ρ_X(0) = 1,

and define the area of ρ_X(τ) as W_x:

    ∫ dτ ρ_X(τ) = W_x.
Then

    lim_{T→∞} σ_Y² = lim_{T→∞} T⁻¹ ∫_{−T}^{T} dτ C_X(τ) (1 − |τ|/T).

If C_X(τ) → 0 at lags ≪ T:

    σ_Y² = lim_{T→∞} T⁻¹ ∫_{−∞}^{∞} dτ C_X(τ)
         ≃ lim_{T→∞} T⁻¹ σ_X² W_x        (σ_X² W_x = constant)
         = 0.

Thus σ_Y → 0 as T → ∞, and therefore lim_{T→∞} Y = ⟨X⟩.
An infinite time average is equivalent to an ensemble average.
This is an example of ergodicity, where time averages of realizations of a random process
converge to an ensemble average.

Comments:

1) wide-sense stationarity was assumed for the example of Y → ⟨X⟩;

2) ergodicity of higher-order moments requires higher-order stationarity;

3) in the example above, for finite T we have

    σ_Y ≈ σ_X (W_x/T)^{1/2}   for T ≫ W_x.

The meaning of the width W_x of the autocorrelation function:
If W_x ≡ correlation time for x, then independent samples of x(t) are separated by a time ≈ W_x

⇒ T/W_x = number of independent samples of x(t) in the time interval [0, T]

⇒ σ_Y ≈ σ_X N^{−1/2}, where N = T/W_x = number of degrees of freedom in the estimate.

4) we will find that the F.T. of a WSS r.p. is not so well behaved.
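The independent-samples heuristic can be sketched numerically. For an AR(1) process the two-sided area of the correlation coefficient is W_x = (1+φ)/(1−φ) samples, so the spread of the time average should be close to σ_X (W_x/N)^{1/2}. The process and parameters below are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)
phi, sigma_x = 0.7, 1.0
Wx = (1 + phi) / (1 - phi)               # autocorrelation width (area of rho)
N, M = 400, 20000

# Simulate M stationary AR(1) realizations with unit variance.
x = np.zeros((M, N))
s = rng.normal(0.0, sigma_x, M)
sig_eps = sigma_x * np.sqrt(1 - phi**2)
for n in range(N):
    s = phi * s + rng.normal(0.0, sig_eps, M)
    x[:, n] = s

sigma_Y = x.mean(axis=1).std()           # measured spread of the time average
heuristic = sigma_x * np.sqrt(Wx / N)    # sigma_X * (N/Wx)^{-1/2}
print(sigma_Y, heuristic)
```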

The rate at which Y → ⟨X⟩ depends on the autocorrelation width W_x of the ACV of x. To get

    σ_Y² ∼ σ_X² W_x / T → 0

for large T, we assumed that C_X(τ) dropped to zero sufficiently fast that the area of C_X(τ)
is finite. If C_X(τ) ∝ τ^{−p} at large τ, we need p > 0, since

    (1/T) ∫_{τ₁}^{T} dτ τ^{−p} = (1/T) (T^{1−p} − τ₁^{1−p})/(1−p) → T^{−p}/(1−p),

which vanishes as T → ∞ only if p > 0.

Examples of processes that do not converge: random walks.
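A random walk illustrates the failure mode: its autocovariance does not die away, so the spread of the time average across realizations grows with record length instead of shrinking. A minimal sketch with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(2)
M = 2000                                  # realizations
steps = rng.standard_normal((M, 4000))
walk = steps.cumsum(axis=1)               # random walk: partial sums of steps

# Spread of the time average across realizations, short record vs long record.
spread_short = walk[:, :1000].mean(axis=1).std()
spread_long = walk.mean(axis=1).std()
print(spread_short, spread_long)          # spread grows with T: not ergodic
```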

Fourier Transform and Power Spectrum Estimate for a Stochastic Process

Another stochastic integral is the Fourier transform. As stated before, the FT of a WSS random
process X(t),

    X̃(f) = ∫_{−∞}^{∞} dt x(t) e^{−iωt}    (recall WSS ⇒ ⟨X²(t)⟩ = constant in t),

cannot exist because the integral diverges. Luckily we need consider only windowed or finite
integrals in order to model experimental situations:

    X̃_T(f) = ∫_{−T}^{T} dt x(t) e^{−iωt}.

What should an estimator of the power spectrum S_x(f) look like? Recall for deterministic
functions that

    ∫_{−∞}^{∞} dt f*(t) f(t+τ) ⇔ |F̃(f)|².

So we expect that an estimator (denoted with a caret) for the power spectrum of a process x(t)
would have the form

    Ŝ_x(f) = const × |X̃_T(f)|²,

and an appropriate value of the constant is 1/2T, so

    Ŝ_x(f) = (1/2T) |X̃_T(f)|².

This ensures that the corresponding ACF has the correct units.
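A discrete sketch of this estimator: with N samples at spacing dt, the finite-time transform is approximated by dt times an FFT and the record length 2T by N·dt. For unit-variance white noise the estimate should then hover around the flat level σ²·dt. The function name and parameters are illustrative, and this normalization is one common convention, not the only one.

```python
import numpy as np

def periodogram(x, dt=1.0):
    """Sketch of S_hat(f) = |X_T(f)|^2 / (2T) for a sampled record."""
    N = len(x)
    Xf = dt * np.fft.rfft(x)             # finite-time Fourier transform estimate
    S = np.abs(Xf) ** 2 / (N * dt)       # |X_T|^2 / (2T), with 2T -> N*dt
    f = np.fft.rfftfreq(N, dt)
    return f, S

rng = np.random.default_rng(3)
x = rng.standard_normal(4096)            # unit-variance white noise
f, S = periodogram(x)
flat_level = S.mean()                    # should be near sigma^2 * dt = 1
print(flat_level)
```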
It can be shown that the estimator satisfies the Wiener–Khinchin theorem (which applies to
ensemble-average quantities), where the pair

    Ĉ_x(τ) ⇔ Ŝ_x(f)

is

    Ĉ_x(τ) = (1/2T) ∫_{−∞}^{∞} df e^{iωτ} |X̃_T(f)|²
           = (1/2T) ∫_{−∞}^{∞} df e^{iωτ} ∫∫_{−T}^{T} dt dt′ x(t) x*(t′) e^{−iω(t−t′)}
           = (1/2T) ∫∫_{−T}^{T} dt dt′ x(t) x*(t′) ∫_{−∞}^{∞} df e^{iω(t′+τ−t)}    [inner integral ≡ δ(t′+τ−t)]

    Ĉ_x(τ) = (1/2T) ∫_{−T}^{T} dt x(t) x*(t−τ),

so

    Ĉ_x(0) = (1/2T) ∫_{−T}^{T} dt |x(t)|²
            = estimate for ⟨|x(t)|²⟩, as expected for C_x(0).
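The normalization claim can be checked numerically: the integral of Ŝ_x(f) over frequency should reproduce Ĉ_x(0), the average power (1/2T)∫|x|² dt. In discrete form this is just Parseval's identity; the sketch below uses illustrative sizes.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(1024)
N, dt = len(x), 1.0

Xf = dt * np.fft.fft(x)                  # full (two-sided) transform
S = np.abs(Xf) ** 2 / (N * dt)           # periodogram, |X_T|^2 / (2T)
df = 1.0 / (N * dt)

C0_freq = S.sum() * df                   # integral of S_hat over frequency
C0_time = np.mean(x**2)                  # (1/2T) * integral of |x|^2 dt
print(C0_freq, C0_time)                  # equal up to round-off (Parseval)
```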

How good an estimator is Ŝ_x(f) for the power spectrum S_x(f)?
Answer: terrible! Recall that Ŝ_x(f) is itself a random process (since Ŝ_x for fixed f is a
random variable). Thus we may fairly ask what the convergence properties of Ŝ_x(f) are, just as
we investigated Y ≡ (1/T) ∫_0^T dt X(t) and found that ⟨Y⟩ = ⟨X⟩ and σ_Y → 0 as T → ∞ so long
as the correlation function of X(t) decayed sufficiently quickly.

mean: ⟨Ŝ_x(f)⟩ → S_x(f) as T → ∞, so Ŝ_x(f) converges in the mean.

variance: but σ_Ŝ²(f) ≡ ⟨Ŝ_x²(f)⟩ − ⟨Ŝ_x(f)⟩² does not decay to zero as T → ∞:

    lim_{T→∞} σ_Ŝ²(f) ≠ 0.

Conclusion: The squared magnitude of a finite Fourier transform of a WSS process is a poor
estimate for the power spectrum S_x(f) (an ensemble-average quantity).

Why Ŝ_x(f) is a poor estimator:

1. Ŝ_x(f) is a χ²₂ r.v. in the limit where W_x ≪ T

    ⇒ σ_Ŝx / ⟨Ŝ_x⟩ ≡ 1, independent of T.

2. From the point of view of degrees of freedom: the number of degrees of freedom in the data,
N_dof ∼ T/W_x, may be large, but the number of degrees of freedom in the spectral estimate
(per independent frequency bin of width Δf ≈ 1/2T) is small.
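A quick numerical illustration of this 100% fractional error: the scatter of periodogram bins relative to their mean stays near 1 whether we use N samples or 16N. A minimal sketch for white noise, with illustrative record lengths:

```python
import numpy as np

rng = np.random.default_rng(5)

def frac_error(N):
    """Bin-to-bin scatter of a single periodogram, relative to its mean."""
    x = rng.standard_normal(N)
    S = np.abs(np.fft.rfft(x)) ** 2 / N  # periodogram bins
    S = S[1:-1]                          # drop DC and Nyquist (chi^2_1 bins)
    return S.std() / S.mean()

e_small = frac_error(1024)
e_big = frac_error(16384)
print(e_small, e_big)                    # both near 1: no improvement with T
```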
Intuitive Approaches
We can see the same result by bypassing the brute-force details and being clever.
Method 1: Consider

    X̃_T(ω) ≡ ∫_{−T}^{T} dt x(t) e^{−iωt}.

View this as the sum of many random variables. How many?

Let W_x = autocorrelation width of x(t), as in the discussion of the ergodicity of
Y = T⁻¹ ∫_0^T dt x(t). By definition, W_x is the time scale over which two samples X(t₁) and
X(t₂) become independent.

Therefore N = 2T/W_x ≈ number of independent samples of x(t).

If N ≫ 1 then we invoke the Central Limit Theorem and say that X̃_T(ω) becomes a Gaussian
random variable, with zero mean if x(t) is zero mean.

Break X̃_T(ω) into real and imaginary parts:

    X̃_T(ω) = R(ω) + i I(ω).

It can be shown that R(ω) and I(ω) are independent r.v.'s; therefore they are zero-mean,
independent Gaussian r.v.'s.
Proof:

    R(ω) = ∫_{−T}^{T} dt x(t) cos ωt
    I(ω) = − ∫_{−T}^{T} dt x(t) sin ωt

    ⇒ ⟨R(ω) I(ω)⟩ = − ∫∫ dt₁ dt₂ ⟨x(t₁) x(t₂)⟩ cos ωt₁ sin ωt₂,

and using cos ωt₁ sin ωt₂ = ½ [sin ω(t₁+t₂) + sin ω(t₂−t₁)]:

    ⟨R(ω) I(ω)⟩ = −½ [ ∫∫ dt₁ dt₂ R_x(t₁−t₂) sin ω(t₁+t₂)
                     + ∫∫ dt₁ dt₂ R_x(t₁−t₂) sin ω(t₂−t₁) ].

The first term integrates to zero for W_x ≪ T, while the second term is the integral of the
product of an even and an odd function, which vanishes.

Therefore

    Ŝ_T(ω) = (1/2T) |X̃_T(ω)|² = (1/2T) [R²(ω) + I²(ω)],

where the sum of squares of two independent Gaussian r.v.'s is a χ²₂ variable, implying that
Ŝ_T(ω) is a χ²₂ r.v. with mean ⟨Ŝ_T(ω)⟩ = S(ω) = true power spectrum.
Let

    p(ω) ≡ Ŝ_T(ω).

Then the PDF of p(ω) is

    f_p(p) = (1/⟨p⟩) e^{−p/⟨p⟩} U(p).

It can be shown that ⟨p²⟩ = 2⟨p⟩² ⇒ ε = σ_p/⟨p⟩ = 1, as before.
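The exponential PDF can be checked by Monte Carlo: periodogram values of zero-mean Gaussian noise at a fixed interior frequency should satisfy ⟨p²⟩ = 2⟨p⟩². A sketch with illustrative sizes and an arbitrary bin index:

```python
import numpy as np

rng = np.random.default_rng(6)
M, N = 20000, 256                        # realizations, samples per realization
x = rng.standard_normal((M, N))

# Periodogram value at one interior frequency bin, for each realization.
p = np.abs(np.fft.fft(x, axis=1)[:, 10]) ** 2 / N

ratio = np.mean(p**2) / np.mean(p) ** 2  # exponential PDF gives 2
print(ratio)
```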

Method 2: Another way of understanding why Ŝ_T(ω) does not converge as T → ∞ is to consider
the number of degrees of freedom in each independent frequency bin.

• Note that

    X̃_T(ω) ≡ ∫_{−T}^{T} dt x(t) e^{−iωt} = ∫_{−∞}^{∞} dt x(t) W_T(t) e^{−iωt},

  where W_T is a window function, W_T(t) = 1 for |t| ≤ T and zero otherwise. Equivalently,

    X̃_T(ω) = X̃(ω) ∗ (2 sin ωT / ω).

  As usual, multiplication by W_T(t) in the time domain corresponds to convolution by
  (sin ωT)/ω in the frequency domain. A frequency cell or bin has a width Δω ≈ π/T, or
  Δf ≈ 1/2T.

• Let W_t = correlation time in the time domain. Then the number of independent fluctuations
  in the time series is N_t = 2T/W_t.

• Let W_ω = width of the spectrum (bandlimited). Then the number of frequency cells into which
  the variance is divided is

    N_ω = W_ω/Δω = W_ω T/π.
• The number of degrees of freedom (d.o.f.) per frequency cell is

    N_d.o.f. = N_t/N_ω ≡ (# of independent data points)/(# of frequency cells)
             = (2T/W_t)/(T W_ω/π) = 2π/(W_t W_ω).

But the uncertainty principle ⇒ W_t W_ω ≥ 2π, so N_d.o.f. ≈ 1 for each part of the F.T.

Interpretation:
As T → ∞, more and more independent fluctuations contribute to the integral, but these are
spread into more and more frequency bins, so the number of d.o.f. per bin remains the same and
small ⇒ large errors.

N_d.o.f. ≈ 1 per part (real and imaginary) ⇒ |X̃_T(ω)|² will have 2 d.o.f. per cell, as before.

Solution to the convergence problem: increase the number of degrees of freedom in a
frequency cell.

The simplest approach is to average spectral estimates. Obtain spectra from L realizations of
length 2T and average; i.e. find Ŝ_T(ω) for a block of data, repeat for L blocks of data, and
average:

    Ŝ_{T,L}(ω) = L⁻¹ Σ_{j=1}^{L} Ŝ_{T,j}(ω)

    error: [Var Ŝ_{T,L}(ω)]^{1/2} / ⟨Ŝ_{T,L}(ω)⟩ = L^{−1/2};   10% error ⇒ L = 100.

(Best method if an unlimited amount of data is available.) We will talk about this and other
methods later.
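The block-averaging recipe (essentially Bartlett's method) can be sketched for white noise: the fractional error of the averaged estimate should fall roughly as L^{−1/2}. Block size and count below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
N, L = 512, 100                                  # block length, number of blocks
x = rng.standard_normal((L, N))                  # L independent blocks of data

S_blocks = np.abs(np.fft.rfft(x, axis=1)) ** 2 / N   # one periodogram per block
S_avg = S_blocks.mean(axis=0)                        # averaged spectral estimate

bins = slice(1, -1)                              # interior frequency bins
err_single = S_blocks[0, bins].std() / S_blocks[0, bins].mean()  # ~1
err_avg = S_avg[bins].std() / S_avg[bins].mean()                 # ~L^{-1/2}
print(err_single, err_avg)
```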

Direct Calculation of the Mean and Variance of the Spectral Estimate

We will use continuous notation for now. The spectral estimator for a WSS process is

    Ŝ_T(ω) ≡ (1/2T) |X̃_T(ω)|²,   where X̃_T(ω) = ∫_{−T}^{T} dt x(t) e^{−iωt}.

Properties of the estimator: As usual, we want to calculate the mean and variance of the
estimator.

Mean:
The ensemble average is

    ⟨Ŝ_T(ω)⟩ = (1/2T) ⟨|X̃_T(ω)|²⟩
             = (1/2T) ∫∫ dt₁ dt₂ ⟨x(t₁) x*(t₂)⟩ e^{−iω(t₁−t₂)}
             = (1/2T) ∫∫_{−T}^{T} dt₁ dt₂ R_X(t₁−t₂) e^{−iω(t₁−t₂)}.

Use, as before, the coordinate transformation

    τ = t₁ − t₂,   z = (t₁ + t₂)/2,       (1)

where dz dτ = dt₁ dt₂. The integration limits transform as before [figure omitted in this
copy].
This gives

    ⟨Ŝ_T(ω)⟩ = (1/2T) [ ∫_0^{2T} dτ R_X(τ) e^{−iωτ} ∫_{−T+τ/2}^{T−τ/2} dz
                      + ∫_{−2T}^{0} dτ R_X(τ) e^{−iωτ} ∫_{−T−τ/2}^{T+τ/2} dz ].

The inner z integrals give 2T(1 − τ/2T) and 2T(1 + τ/2T), i.e. 2T(1 − |τ|/2T), so

    ⟨Ŝ_T(ω)⟩ = (1/2T) ∫_{−2T}^{2T} dτ R_X(τ) e^{−iωτ} 2T (1 − |τ|/2T)

or

    ⟨Ŝ_T(ω)⟩ = ∫_{−2T}^{2T} dτ R_X(τ) e^{−iωτ} [1 − |τ|/2T],

and

    lim_{T→∞} ⟨Ŝ_T(ω)⟩ = ∫_{−∞}^{∞} dτ R_X(τ) e^{−iωτ} ≡ S(ω)

⇒ Ŝ_T(ω) is an (asymptotically) unbiased estimator of S(ω) if the width of R_X(τ) is finite.
Variance:
Now let's look at the variance of the estimator:

    Var[Ŝ_T(ω)] ≡ ⟨Ŝ_T²(ω)⟩ − ⟨Ŝ_T(ω)⟩².

By definition,

    ⟨Ŝ_T²(ω)⟩ = (1/4T²) ⟨|X̃_T(ω)|⁴⟩
              = (1/4T²) ∫∫∫∫_{−T}^{T} dt₁ dt₂ dt₃ dt₄ ⟨x(t₁) x*(t₂) x(t₃) x*(t₄)⟩ e^{−iω[t₁−t₂+t₃−t₄]}.

Assume the time series is real and the process is Gaussian. Then the fourth moment factors:

    ⟨x(t₁)x(t₂)x(t₃)x(t₄)⟩ ≡ ⟨x(t₁)x(t₂)⟩⟨x(t₃)x(t₄)⟩
                           + ⟨x(t₁)x(t₃)⟩⟨x(t₂)x(t₄)⟩
                           + ⟨x(t₁)x(t₄)⟩⟨x(t₂)x(t₃)⟩

                           ≡ R_X(t₁−t₂) R_X(t₃−t₄)
                           + R_X(t₁−t₃) R_X(t₂−t₄)
                           + R_X(t₁−t₄) R_X(t₂−t₃).
Plugging in, we get

    ⟨Ŝ_T²(ω)⟩ = (1/4T²) ∫∫∫∫ dt₁ dt₂ dt₃ dt₄ ⟨x(t₁)x(t₂)x(t₃)x(t₄)⟩ e^{−iω[t₁−t₂+t₃−t₄]}

    = (1/4T²) [ ∫∫ dt₁ dt₂ R_X(t₁−t₂) e^{−iω[t₁−t₂]} ∫∫ dt₃ dt₄ R_X(t₃−t₄) e^{−iω[t₃−t₄]}
              + ∫∫ dt₁ dt₃ R_X(t₁−t₃) e^{−iω[t₁+t₃]} ∫∫ dt₂ dt₄ R_X(t₂−t₄) e^{−iω[−t₂−t₄]}
              + ∫∫ dt₁ dt₄ R_X(t₁−t₄) e^{−iω[t₁−t₄]} ∫∫ dt₂ dt₃ R_X(t₂−t₃) e^{−iω[t₃−t₂]} ].

The 1st and 3rd terms are each of the form [2T ⟨Ŝ_T(ω)⟩]², giving ⟨Ŝ_T(ω)⟩² after division by
4T². Therefore

    ⟨Ŝ_T²(ω)⟩ = 2⟨Ŝ_T(ω)⟩² + (1/4T²) | ∫∫ dt₁ dt₃ R_X(t₁−t₃) e^{−iω[t₁+t₃]} |²,

where the second term → 0 as T/W_x → ∞ (except for ω = 0), as we show by transforming
t₁, t₃ → τ, z as before.
Looking at the double integral in the second term, with τ = t₁ − t₃ and z = (t₁ + t₃)/2:

    ∫∫ dt₁ dt₃ R_X(t₁−t₃) e^{−iω[t₁+t₃]}
      = ∫_0^{2T} dτ R_X(τ) ∫_{−T+τ/2}^{T−τ/2} dz e^{−2iωz}
      + ∫_{−2T}^{0} dτ R_X(τ) ∫_{−T−τ/2}^{T+τ/2} dz e^{−2iωz}.

The inner z integrals give (1/ω) sin[2ωT(1 − τ/2T)] and (1/ω) sin[2ωT(1 + τ/2T)], i.e.
(1/ω) sin[2ωT(1 − |τ|/2T)], as before. Therefore

    (1/2T) ∫∫ dt₁ dt₃ R_X(t₁−t₃) e^{−iω[t₁+t₃]}
      = (1/2Tω) ∫_{−2T}^{2T} dτ R_X(τ) sin[2ωT(1 − |τ|/2T)].

Now this term → 0 as T → ∞ because

    lim_{T→∞} | (1/2Tω) ∫_{−2T}^{2T} dτ R_X(τ) sin[2ωT(1 − |τ|/2T)] |²
      = lim_{T→∞} (1/4T²ω²) | ∫_{−∞}^{∞} dτ R_X(τ) sin[2ωT(1 − |τ|/2T)] |²
        (finite if R_X(τ) has finite width)
      = 0.

In a slightly different approach to the second term, we consider

    I₂ = ∫∫_{−T}^{T} dt₁ dt₃ R_X(t₁−t₃) e^{−iω[t₁+t₃]}.

Again let τ = t₁ − t₃ and z = (t₁ + t₃)/2. By calculating the integrals and taking the limit
T → ∞, it can be shown that I₂ → 0.

Consequently we have the fairly general result

    lim_{T→∞} ⟨Ŝ_T²(ω)⟩ = 2⟨Ŝ_T(ω)⟩².

The error in the spectral estimate is

    ε = (Var[Ŝ_T(ω)])^{1/2} / ⟨Ŝ_T(ω)⟩ = (2⟨Ŝ_T(ω)⟩² − ⟨Ŝ_T(ω)⟩²)^{1/2} / ⟨Ŝ_T(ω)⟩ = 1,

i.e. 100% error ⇒ Ŝ_T(ω) does not converge to S(ω) as T → ∞.
How to fix the properties of the spectral estimator:

1. increase the number of degrees of freedom through averaging of multiple spectral estimates;
2. increase the number of degrees of freedom by smoothing the spectrum: convolve the spectral
   estimate with a window function; this procedure sacrifices frequency resolution;
3. use another method that incorporates a priori information (e.g. Bayesian and
   maximum-entropy approaches).
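Fix 2 can be sketched by smoothing a single periodogram with a boxcar window in frequency (a Daniell-type smoother): the bin-to-bin scatter drops roughly as W^{−1/2} at the cost of frequency resolution. The smoother width and record length below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)
N, W = 8192, 33                                   # record length, smoother width
x = rng.standard_normal(N)

S = np.abs(np.fft.rfft(x)) ** 2 / N               # raw periodogram
S_smooth = np.convolve(S, np.ones(W) / W, mode="same")  # running mean over W bins

inner = slice(W, -W)                              # avoid edge effects
s_raw = S[inner].std() / S[inner].mean()          # ~1 (chi^2_2 scatter)
s_smooth = S_smooth[inner].std() / S_smooth[inner].mean()  # ~W^{-1/2}
print(s_raw, s_smooth)
```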
