
Lecture 11: Standard Error, Propagation of Error, Central Limit Theorem in the Real World

5 October 2005
1 Standard Error
Quick point of terminology: last time, when we talked about getting at the sampling distribution of summary statistics, we mostly looked at their means; the law of large numbers, in particular, is about the mean of the sampling distribution. There's also going to be a variance or standard deviation. It's a bit unfortunate, terminologically, but the standard deviation of a sample statistic is called its standard error. The main tool for getting at standard errors is the central limit theorem. Recall that the sample mean $\bar{X}$ has mean $\mu$ and variance $\sigma^2/n$, so it has standard deviation $\sigma/\sqrt{n}$.
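As a quick illustration (this sketch is an addition, not part of the original notes), a short simulation confirms that the standard deviation of the sample mean matches $\sigma/\sqrt{n}$; the Gaussian population, its parameters, and the sample size are arbitrary choices.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n, sigma, reps = 100, 2.0, 100_000   # sample size, population SD, replications (all arbitrary)

# Draw many samples of size n and record each sample mean
sample_means = rng.normal(loc=5.0, scale=sigma, size=(reps, n)).mean(axis=1)

print("empirical SD of the sample mean:", sample_means.std(ddof=1))
print("theoretical standard error sigma/sqrt(n):", sigma / np.sqrt(n))
\end{verbatim}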
2 Propagation of Error
In many experimental lab courses, you learn a rather mysterious-looking formula for the error bars of derived or calculated quantities. It says that if you have a quantity z which is a function of measured quantities x and y, i.e., z = h(x, y), then
$$\sigma_z = \sqrt{\left(\frac{\partial h}{\partial x}\right)^2 \sigma_x^2 + \left(\frac{\partial h}{\partial y}\right)^2 \sigma_y^2}$$
where $\sigma_z$ is the standard deviation of z, and similarly for the other variables. (This formula, and everything which follows, extends in the natural way to functions of more than two variables.)
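To make the formula concrete before deriving it, here is a small worked example (the numbers are invented for illustration, not taken from any experiment). Suppose $z = h(x, y) = xy$, with measured values $x = 10$ and $y = 4$, and standard deviations $\sigma_x = 0.2$ and $\sigma_y = 0.1$. Then $\partial h/\partial x = y$ and $\partial h/\partial y = x$, so
$$\sigma_z = \sqrt{y^2 \sigma_x^2 + x^2 \sigma_y^2} = \sqrt{(4)^2(0.2)^2 + (10)^2(0.1)^2} = \sqrt{0.64 + 1.00} \approx 1.28.$$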
We are now in a position to see exactly where this formula comes from, and when it's actually valid.
We assume that each of the input quantities x and y is really a random variable, X, Y, which has some average value ($\mu_x$, $\mu_y$), plus fluctuations around it which represent noise in our apparatus, errors of procedure, gremlins, etc. The value of z we calculate is therefore also a random quantity, Z, because if the fluctuations had come out differently, we'd be plugging different numbers into the function h, and getting a different answer. The question we want to answer is how different that result would, probably, be.
Let's start by Taylor-expanding h, making the expansion around the mean values of the input variables:
$$h(X, Y) \approx h(\mu_X, \mu_Y) + \left(\frac{\partial h}{\partial x}\right)(X - \mu_X) + \left(\frac{\partial h}{\partial y}\right)(Y - \mu_Y) + \text{higher order terms}$$
A Taylor expansion like this is only valid if the neglected higher-order terms, like $\frac{1}{2}\left(\frac{\partial^2 h}{\partial x^2}\right)(X - \mu_X)^2$, are small compared to the included terms, like $\left(\frac{\partial h}{\partial x}\right)(X - \mu_X)$.
So we want
\begin{align*}
\left(\frac{\partial h}{\partial x}\right)(X - \mu_X) &\gg \frac{1}{2}\left(\frac{\partial^2 h}{\partial x^2}\right)(X - \mu_X)^2 \\
\left(\frac{\partial h}{\partial x}\right) &\gg \frac{1}{2}\left(\frac{\partial^2 h}{\partial x^2}\right)(X - \mu_X) \\
\frac{2\left(\frac{\partial h}{\partial x}\right)}{\left(\frac{\partial^2 h}{\partial x^2}\right)} &\gg (X - \mu_X)
\end{align*}
And similarly for Y. We can have this happen either if $X - \mu_X$ is always very small, or if the ratio of the first to the second derivative is always very large, that is, if the function h is smooth.

Assumption 1: Measurement errors are small, where the scale for smallness is set by the ratio of first to second derivatives.
If Assumption 1 holds, and we can use our Taylor expansion, we've re-expressed h as a linear combination of random variables, and we know how to handle linear combinations. First, the mean:
\begin{align*}
E[Z] = E[h(X, Y)] &\approx h(\mu_X, \mu_Y) + E\left[ \left(\frac{\partial h}{\partial x}\right)(X - \mu_X) \right] + E\left[ \left(\frac{\partial h}{\partial y}\right)(Y - \mu_Y) \right] \\
&= h(\mu_X, \mu_Y) + \left(\frac{\partial h}{\partial x}\right) E[X - \mu_X] + \left(\frac{\partial h}{\partial y}\right) E[Y - \mu_Y] \\
&= h(\mu_X, \mu_Y) + \left(\frac{\partial h}{\partial x}\right) (E[X] - \mu_X) + \left(\frac{\partial h}{\partial y}\right) (E[Y] - \mu_Y) \\
&= h(\mu_X, \mu_Y) + \left(\frac{\partial h}{\partial x}\right) (\mu_X - \mu_X) + \left(\frac{\partial h}{\partial y}\right) (\mu_Y - \mu_Y) \\
&= h(\mu_X, \mu_Y)
\end{align*}
Now we compute the variance:
\begin{align*}
\mathrm{Var}(Z) = \mathrm{Var}(h(X, Y)) &\approx \mathrm{Var}\left( h(\mu_X, \mu_Y) + \left(\frac{\partial h}{\partial x}\right)(X - \mu_X) + \left(\frac{\partial h}{\partial y}\right)(Y - \mu_Y) \right) \\
&= \mathrm{Var}\left( \left(\frac{\partial h}{\partial x}\right)(X - \mu_X) + \left(\frac{\partial h}{\partial y}\right)(Y - \mu_Y) \right)
\end{align*}
We can drop $h(\mu_X, \mu_Y)$ because it's just a constant, but now we need to make an additional assumption.

Assumption 2: The measurement errors in the input variables are independent.
\begin{align*}
\mathrm{Var}(Z) &\approx \mathrm{Var}\left( \left(\frac{\partial h}{\partial x}\right)(X - \mu_X) \right) + \mathrm{Var}\left( \left(\frac{\partial h}{\partial y}\right)(Y - \mu_Y) \right) \\
&= \left(\frac{\partial h}{\partial x}\right)^2 \mathrm{Var}(X - \mu_X) + \left(\frac{\partial h}{\partial y}\right)^2 \mathrm{Var}(Y - \mu_Y) \\
&= \left(\frac{\partial h}{\partial x}\right)^2 \mathrm{Var}(X) + \left(\frac{\partial h}{\partial y}\right)^2 \mathrm{Var}(Y) \\
&= \left(\frac{\partial h}{\partial x}\right)^2 \sigma^2_X + \left(\frac{\partial h}{\partial y}\right)^2 \sigma^2_Y
\end{align*}
Taking the square root of $\mathrm{Var}(Z)$ to get the standard deviation gives us the usual formula for propagation of error.
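As a sanity check on this derivation, here is a minimal Monte Carlo sketch (an addition, not part of the lecture; the function h and the noise levels are the same made-up numbers as in the worked example above).

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary illustrative choices: h(x, y) = x * y, with small independent Gaussian errors
mu_x, mu_y = 10.0, 4.0
sigma_x, sigma_y = 0.2, 0.1

def h(x, y):
    return x * y

# Propagation-of-error formula: partials of x*y, evaluated at the means
dh_dx, dh_dy = mu_y, mu_x
sigma_z_formula = np.sqrt(dh_dx**2 * sigma_x**2 + dh_dy**2 * sigma_y**2)

# Monte Carlo: simulate noisy measurements and look at the spread of Z directly
X = rng.normal(mu_x, sigma_x, size=1_000_000)
Y = rng.normal(mu_y, sigma_y, size=1_000_000)
sigma_z_simulated = h(X, Y).std(ddof=1)

print("formula:  ", sigma_z_formula)     # about 1.28
print("simulated:", sigma_z_simulated)   # close, since the errors are small
\end{verbatim}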
The most important special case for this is when the values of x and y we plug in to the formula are themselves obtained by averaging many measurements, so that the X above is really the sample mean $\bar{X}$, and Y is really $\bar{Y}$. Let's make the following assumptions.

Assumption 3: Measurement errors are independent from one measurement to the next.

Assumption 4: There are many measurements of each variable.

In this case, we can use the central limit theorem to say more about $\bar{X}$ and $\bar{Y}$. The mean values of $\bar{X}$ and $\bar{Y}$ are still the population means, $\mu_X$ and $\mu_Y$. But now the standard deviations we plug in are standard errors, $s_x = \sigma_X/\sqrt{n}$ and $s_y = \sigma_Y/\sqrt{n}$. Also, $\bar{X}$ and $\bar{Y}$ are approximately Gaussian. Since a linear combination of independent Gaussians is Gaussian, Z is also approximately Gaussian.
So we have the following result.

Suppose $Z = h(\bar{X}, \bar{Y})$, where $\bar{X}$ is the sample mean of measured values of X, and likewise for $\bar{Y}$. Then, if Assumptions 1-4 hold, Z is approximately Gaussian, with mean $h(\mu_X, \mu_Y)$ and variance
$$\left(\frac{\partial h}{\partial x}\right)^2 \frac{\sigma^2_X}{n} + \left(\frac{\partial h}{\partial y}\right)^2 \frac{\sigma^2_Y}{n}$$
where n is the number of measurements of each input variable, and $\sigma^2_X$ is the true (population) variance of X.
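The boxed result is also easy to check by simulation (again a sketch with arbitrary, made-up numbers rather than anything from the lecture): average n measurements of each input, form Z, and compare its spread with the variance formula.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
n, reps = 50, 20_000                      # measurements per variable, repetitions (arbitrary)
mu_x, mu_y, sigma_x, sigma_y = 10.0, 4.0, 2.0, 1.5

# Each repetition: average n noisy measurements of x and of y, then form Z = h(xbar, ybar)
xbar = rng.normal(mu_x, sigma_x, size=(reps, n)).mean(axis=1)
ybar = rng.normal(mu_y, sigma_y, size=(reps, n)).mean(axis=1)
Z = xbar * ybar                           # h(x, y) = x * y again

var_formula = (mu_y**2 * sigma_x**2 + mu_x**2 * sigma_y**2) / n
print("simulated Var(Z):", Z.var(ddof=1))
print("formula   Var(Z):", var_formula)
print("mean of Z:", Z.mean(), "vs h(mu_x, mu_y) =", mu_x * mu_y)
\end{verbatim}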
3 The Law of Large Numbers and Central Limit Theorem in Real Data
The law of large numbers and the central limit theorem I've presented assume independent data-points. While we can create independent data, and a lot of experimental technique, survey design methods, etc., is about ensuring our data are independent, phenomena in the natural world are rarely so cooperative as to be completely independent. Fortunately, the asymptotic laws still generally hold when the data values are not too dependent. Making this precise involves some mathematics way beyond the scope of this course (though I strongly encourage you to take a course in stochastic processes, where you'll learn all about it), but we can convince ourselves of its validity experimentally in many cases.
Here's one particular case: the wind-tunnel data which we first saw back in the second lecture. We'll look at the acceleration measurements. These are weakly correlated (the correlation between successive values is about -0.017, which is small, but definitely not zero). The equivalent of looking at a sample of n independent draws from the distribution is to look at a time average of T successive values from the series: $\frac{1}{T}\sum_{i=1}^{T} a_{k+i-1}$. This time average is going to depend on the starting position, k, as well as on the length of the interval over which we average, T. If the system we're looking at is well-behaved, though, our initial starting point (k) makes less and less difference as we look at longer and longer intervals (T), just as, with independent samples, the sample mean always converges to the population mean. With a dependent system, what we hope is that the time average converges to the space average, which is the mean over the sample space:
$$\lim_{T \to \infty} \frac{1}{T}\sum_{i=1}^{T} a_{k+i-1} = \int a f(a)\, da$$
where f(a) is the system's density in the sample space, the fraction of the time it spends near the point a. If this happens, we say that the system is ergodic. Ergodicity is extremely important for statistics, because it means that any sufficiently long sequence of data is representative of the whole process, and we can use it to make reliable inferences about the system as a whole. It's also extremely important to making statistical mechanics and thermodynamics work. Unfortunately, the math needed to really handle ergodicity is fairly complicated[1], but we can see it demonstrated in our data. After all, if the equation above holds, then, starting from any position k, the time averages should get closer and closer as T gets larger and larger. If we histogram the time averages (Fig. 1), we see that this is indeed the case: they become more and more tightly peaked around a common central value.
[1] Though you might try reading Michael C. Mackey, Time's Arrow: The Origins of Thermodynamic Behavior (Dover Books, 2003).

Figure 1: Distribution of the values of time averages. Filled circles: individual measurements. Open circles: averages of pairs of successive measurements (T = 2). Squares: averages over thirty time-steps (T = 30); diamonds: T = 100; triangles: T = 1000. Note that as we average over longer and longer times, the distribution gets narrower and narrower, while the center does not move. This indicates that time-averages are converging to a common value, independent of when we start observing the acceleration, that is, that the system is ergodic.

If the values $a_t$ are ergodic, then so is any function of $a_t$. In particular, if we look at the indicator function which says whether or not $a_t \in B$ for some set B, this will also converge on a limiting value, which is the probability of B. We saw something like this in lecture 2, but let's look at it again for the acceleration. Here I've chosen the set $B = [0.05, 0.06] \cup [-0.03, -0.02]$, i.e., two intervals on either side of zero. (There's no particular interest to this region; I just chose it to show that this works on pretty much any event you like.) As you can see in Fig. 2, the time-average of the number of measurements falling in B converges to a stable value, no matter when we start making our measurements. (Remember that the sampling rate here is 30 kHz, so 30,000 time-steps is one second.)
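We don't have the wind-tunnel series in these notes, but the same convergence of relative frequencies from different starting points is easy to reproduce with a simulated stand-in. The sketch below (an addition, not from the lecture) uses a weakly autocorrelated AR(1) series in place of the accelerations; all numbers are made up for illustration.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)

# Simulated stand-in for the acceleration series: a weakly correlated AR(1) process
N, phi = 1_000_000, -0.017
noise = rng.normal(scale=0.03, size=N)
a = np.empty(N)
a[0] = noise[0]
for t in range(1, N):
    a[t] = phi * a[t - 1] + noise[t]

# Indicator of the event B = [-0.03, -0.02] union [0.05, 0.06], as in the text
in_B = ((a >= -0.03) & (a <= -0.02)) | ((a >= 0.05) & (a <= 0.06))

# Running relative frequency of B from different starting positions k (cf. Figure 2)
for k in [0, 100_000, 900_000]:
    running_freq = np.cumsum(in_B[k:]) / np.arange(1, N - k + 1)
    print(f"start k = {k:>7}: frequency after 50,000 steps = {running_freq[49_999]:.4f}")
\end{verbatim}

Started anywhere, the running frequencies settle down to nearly the same limiting value, which is the simulated analogue of the behavior shown in Fig. 2.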
At this point, we should be pretty much convinced that the law of large numbers holds in this data: reasonably long samples all look alike, and are all representative of the process as a whole. What about the central limit theorem? More specifically, do the time averages approach a Gaussian distribution?

One way to check this would be to compare the histograms of the time-averages, as in Fig. 1, to Gaussian density functions with the same mean and variance. But then we'd have to assess whether two more-or-less bell-shaped wiggly curves are good matches, and we'd rather do something easier. The something easier is provided by probability plots, which you read about last week.
Remember how probability plots work: along the horizontal axis, I've plotted all the different values seen in the data, in order. Each value falls at a certain quantile of the data: the i-th smallest value is bigger than or equal to a fraction i/n of the sample values. Now, for any distribution, the quantile function Q(p) is the inverse of the cumulative distribution function F(x). Just as F(x) answers "what is the probability that a random value is $\leq x$?", Q(p) answers "what value is at least as large as a fraction p of the samples?" The vertical axis gives Q(F(x)), where F(x) is the CDF of the data, and Q is the quantile function of a theoretical distribution, here the standard Gaussian. If the data really does come from the theoretical distribution, then $Q = F^{-1}$, and we should get a straight line, up to sampling error. If not, we'll get something curved. One wrinkle is that a sample from any Gaussian distribution, plotted against the standard Gaussian, should give a straight line, because all Gaussian distributions can be standardized by a linear transformation. So plotting the data against a standard Gaussian lets us check normality.
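To make the mechanics concrete, here is a small sketch (an addition, using simulated data rather than the accelerations) that builds a Gaussian probability plot directly from sorted sample values and standard-Gaussian quantiles.

\begin{verbatim}
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
x = rng.normal(loc=0.2, scale=1.5, size=2000)   # toy data; any sample could go here

x_sorted = np.sort(x)                           # horizontal axis: the data values, in order
n = len(x_sorted)
p = (np.arange(1, n + 1) - 0.5) / n             # quantile level of each sorted value
q_theory = norm.ppf(p)                          # standard Gaussian quantile function Q(p)

plt.plot(x_sorted, q_theory, ".")
plt.xlabel("sample values (sorted)")
plt.ylabel("standard Gaussian quantiles")
plt.title("Gaussian probability plot: a straight line indicates normality")
plt.show()
\end{verbatim}

Because the toy data here really are Gaussian, the points fall on a straight line (whose slope and intercept reflect the standardizing linear transformation); data from a non-Gaussian distribution would bend away from a line.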
The next few figures give Gaussian probability plots for the individual acceleration measurements (T = 1), averages over successive pairs of measurements (T = 2), and then over times of length 30, 100 and 1000. What you can see is that the probability plots come closer and closer to straight lines, over more and more of their range, until at T = 1000 we've got something which is really very Gaussian indeed. So it looks like the central limit theorem holds in this real-world, correlated data too.
However, there's an important caveat here. If the CLT worked just like it did in the case of independent data, then the variance of the time-averages should be approximately Var(A)/T. We know that the variance is getting smaller (we can see that in Fig. 1), but is it getting smaller like 1/T?
T      Variance of time averages    Var(A)/T
1      7.50 × 10^-4                 7.50 × 10^-4
2      3.69 × 10^-4                 3.75 × 10^-4
30     2.83 × 10^-5                 2.50 × 10^-5
100    6.21 × 10^-6                 7.50 × 10^-6
1000   2.55 × 10^-7                 7.50 × 10^-7
The variance in the time-averages is actually getting smaller faster than the CLT would predict in independent data.
Figure 2: Convergence of relative frequencies to long-run probabilities for real data. The horizontal axis shows time, in steps of 1/30,000 second. The vertical axis shows the fraction of measurements to date which fall into the region $B = [-0.03, -0.02] \cup [0.05, 0.06]$. Gray horizontal line: long-run average of this fraction (probability). Solid line: time-averages starting from the first measurement. Dashed line: time-averages starting from the 100,000th measurement. Dotted line: time-averages starting from the 900,000th measurement.
Figure 3: Gaussian probability plot of the acceleration values. Here and in the other probability plots, the diagonal line connects values at the first quartile to those at the third quartile; it serves as a rough guide to the eye.
Figure 4: Gaussian probability plot of the means of pairs of successive accelerations
Figure 5: Gaussian probability plot of the means of thirty successive accelerations. The horizontal line through zero is just a graphics bug.
Figure 6: Gaussian probability plot of the means of 100 successive accelerations
Figure 7: Gaussian probability plot of the means of 1,000 successive accelerations
This faster-than-1/T shrinkage is basically because the correlation between $a_t$ and $a_{t+1}$ is negative: if one of them fluctuates above the mean value, the other one is apt to move below it, so they're even more likely to cancel out fluctuations around the mean than independent measurements are. The moral of this story is that while time averages converge, and they tend to have a Gaussian distribution when you look at enough of them, you can't, necessarily, assume that they'll have the same Gaussian distribution as if the measurements were all independent of one another.
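To see the mechanism numerically, here is one last sketch (an addition with made-up parameters, not the wind-tunnel data): for a series whose successive values are negatively correlated, the variance of length-T block averages falls below the Var(a)/T line that independent measurements would give.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(5)

# AR(1) series with a negative lag-one correlation (illustrative only)
N, phi = 1_000_000, -0.3
noise = rng.normal(size=N)
a = np.empty(N)
a[0] = noise[0]
for t in range(1, N):
    a[t] = phi * a[t - 1] + noise[t]

var_a = a.var()
for T in [2, 30, 100, 1000]:
    # Variance of non-overlapping block averages of length T
    blocks = a[: (N // T) * T].reshape(-1, T).mean(axis=1)
    print(f"T = {T:>5}: Var(time average) = {blocks.var():.3e},  Var(a)/T = {var_a / T:.3e}")
\end{verbatim}

With a negative lag-one correlation the block-average variance comes out below Var(a)/T, just as in the table above; with positively correlated data it would come out above instead.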