Probability, Statistics and Estimation
(Short Edition)

Mathieu ROUAUD
Physics and Chemistry Lecturer in Preparatory Classes for the Engineering Schools. Associate Professor. MSc Degree in Theoretical Physics. PhD Student in Particle Physics.
To share knowledge and reach the widest possible audience, this book is published under a free license, the digital edition is free of charge, and, to minimize the cost of the paper version, it is printed in black and white on economical paper.
• The licensor cannot revoke these freedoms as long as you follow the license
terms.
• You do not have to comply with the license for elements of the material in the
public domain or where your use is permitted by an applicable exception or
limitation.
• No warranties are given. The license may not give you all of the permissions
necessary for your intended use. For example, other rights such as publicity,
privacy, or moral rights may limit how you use the material.
• For other uses you have to contact the author.
Thanks to life and to all those who have come before me.
Contents
I. RANDOM VARIABLE.........................................................1
A. How to measure a quantity ?..........................................1
B. The center of a distribution............................................1
C. The dispersion of a distribution......................................3
D. Examples of distributions..............................................4
E. Central limit theorem.....................................................7
1) Population and samples.....................................................7
2) The central limit theorem................................................10
3) Student's t-value and uncertainty.....................................12
4) Examples.........................................................................15
F. Gaussian distribution....................................................19
1) Definition of continuous distribution..............................19
2) Bell-shaped density curve................................................20
3) Standard normal distribution...........................................23
G. Hypothesis test............................................................24
H. Chi-squared test...........................................................30
I. The sources of the uncertainties....................................33
J. Exercises.......................................................................37
IV. ESTIMATORS................................................................126
A. Properties of an Estimator.........................................126
1) Bias...............................................................................126
2) Mean Square Error........................................................127
B. Construction of estimators.........................................129
1) Method of Moments.....................................................129
2) Method of Maximum Likelihood.................................133
C. Interval estimate.........................................................137
D. Exercises...................................................................146
center is the mean. The mean is the sum of the observed values divided by the number of observations²:

$\bar{x} = \frac{x_1 + x_2 + \dots + x_i + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$

For example, for five measurements of a specific heat capacity:

$c = \frac{5100 + 4230 + 3750 + 4560 + 3980}{5} = 4324\ \mathrm{J/K/kg}$

Other definitions of the center exist; for instance the geometric mean is $\bar{x} = \left(\prod_i x_i\right)^{1/n}$.
²MATH: To simplify the writing of a sum, the Greek letter sigma is used as a shorthand: $\sum_{i=1}^{n} x_i$ is read as "the sum of all $x_i$ with i ranging from 1 to n".
C. The dispersion of a distribution
In addition to locating the center of the observed values we
want to evaluate the extent of variation around the center.
Two data sets may have the same mean but may be differ-
ent with respect to variability. There are several ways to
measure the spread of data. Firstly the range is the differ-
ence between the maximum and minimum values in the
sample. The sample range of the variable is very easy to
compute, however it is sensitive to extreme values that can
be unrepresentative.
The most commonly used measure is the sample standard deviation:

$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$

For the specific heat measurements, $s_c \simeq 530\ \mathrm{J/K/kg}$.
The mean deviation may also be used (see Exercise 1).
Using the standard deviation formula but dividing by n rather than n−1, we obtain the root mean square deviation (the square root of the average squared deviation). The choice of the standard deviation is justified on page 128. Besides, the values of n are often large and the difference is small.
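As a quick check, these three quantities can be computed in a few lines of Python (a sketch using only the standard library; the data are the five specific-heat values of the example above):

```python
import statistics

c = [5100, 4230, 3750, 4560, 3980]   # measured specific heats (J/K/kg)

mean = statistics.mean(c)     # arithmetic mean: 4324
s = statistics.stdev(c)       # sample standard deviation (divides by n-1): ~530
rms = statistics.pstdev(c)    # root mean square deviation (divides by n): ~474

print(mean, round(s), round(rms))
```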
D. Examples of distributions
Case 1:

The 16 observed values of x₁ are: 11, 9, 10, 14, 11, 8, 9, 12, 7, 8, 8, 9, 11, 14, 10, 9.

[Histogram: frequencies of the values x₁, from 7 to 19.]

mean = 10          standard deviation = 2.07
mode = 9           range = 7
median = 9.5       root mean square deviation = 2.00
Case 2:

The 16 observed values of x₁ are: 15, 13, 12, 13, 14, 13, 16, 19, 13, 14, 10, 16, 14, 15, 13, 14.

[Histogram: frequencies of the values x₁, from 7 to 19.]

mean = 14          standard deviation = 2.00
mode = 13          range = 9
median = 14        root mean square deviation = 1.94
Case 3:

The 16 observed values of x₁ are: 10, 10, 12, 11, 9, 8, 10, 9, 9, 11, 9, 11, 10, 10, 11, 10.

[Histogram: frequencies of the values x₁, from 7 to 19.]

mean = 10          standard deviation = 1.03
mode = 10          range = 4
median = 10        root mean square deviation = 1.00
The mean is not always the most represented value (cases 1 and 2), and in some cases it does not appear at all. In case 3 the histogram is symmetrical, illustrating that the median and the mean are equal.
In the event that some values are represented several times,
we determine the frequency fi for each value xi.
We have $n = \sum_{i=1}^{c} f_i$, where c is the number of different values of $x_i$.

$\bar{x} = \frac{\sum_{i=1}^{c} f_i\, x_i}{n} = \sum_{i=1}^{c} \frac{f_i}{n}\, x_i$  and  $s = \sqrt{\frac{\sum_{i=1}^{c} f_i\,(x_i - \bar{x})^2}{n-1}}$
E. Central limit theorem
When the sample size becomes very large, the sample mean tends towards the mean of the population: $\mu = \lim_{n\to\infty} \bar{x}$.

Let us flip nine coins and collect a sample: {0; 1; 1; 0; 1; 1; 1; 0; 0}.
Then we find $\bar{x} \simeq 0.56$ and $s \simeq 0.53$.
2) The central limit theorem
Hypothetically, for a city with a population of one million inhabitants, p could represent the probability that an inhabitant has a given height x. If we could measure the height of all the inhabitants, we could exactly determine their average height μ and the standard deviation σ. However, it is pragmatically difficult, if not impossible, to measure a population of that size. Therefore a sample of only a thousand inhabitants is taken to reduce the burden of labour. For this sample to be as representative of the whole population as possible, the thousand people are picked at random.
We obtain a thousand measurements of height, from x₁ to x₁₀₀₀. From this sample of size n = 1000 we calculate a mean x̄ and a standard deviation s. We think that x̄ is close to μ, but there is no reason for it to be equal to μ. We put this value of x̄ on the right side of the figure on page 10.
We take a new random sample of a thousand people and obtain a new value for x̄.
We then repeat this operation a great number of times, and we see on the right the distribution of the sample means obtained:
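A short numerical sketch of this experiment (the population model, uniform heights between 150 and 190 cm, is an assumption made only for the illustration):

```python
import random
import statistics

# Hypothetical population of one million heights (cm); the exact distribution
# does not matter for the central limit theorem, uniform is just an assumption.
population = [random.uniform(150, 190) for _ in range(1_000_000)]

# Draw many random samples of size n = 1000 and record each sample mean.
means = [statistics.mean(random.sample(population, 1000)) for _ in range(500)]

# The means cluster tightly around mu, with a spread close to sigma/sqrt(n):
print(statistics.mean(means), statistics.stdev(means))
print(statistics.pstdev(population) / 1000**0.5)
```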
3) Student's t-value and uncertainty
Prediction interval :
If μ and σ are known, the sampling distribution is also Gaussian, and the expected statistical fluctuations lie, with a probability of p%, between $\mu - t_\infty \sigma/\sqrt{n}$ and $\mu + t_\infty \sigma/\sqrt{n}$.
Confidence interval :
In the case of the calculation of uncertainties, μ and σ are not known and we estimate them from the sample with $\bar{x}$ and s. Because the sample is small, the interval is widened, as given by the Student's t-distribution:

$\mu = \bar{x} \pm t\cdot\frac{s}{\sqrt{n}}$
The Student's t-value depends on the sample size n and on the confidence. If the confidence is 95%, we have a 95 in 100 chance that μ is between $\bar{x} - t\cdot s/\sqrt{n}$ and $\bar{x} + t\cdot s/\sqrt{n}$.
We recognize here the notion of measurement uncertainty Δx:

$x = \bar{x} \pm \Delta x$ with $\Delta x = t\cdot\frac{s}{\sqrt{n}}$

Δx is also called the absolute uncertainty, and $\Delta x/|\bar{x}|$ the relative uncertainty.
In the experimental approach, the various parameters which make it possible to describe an experimental situation are not perfectly known. We do not have a simple numerical value associated with each characteristic, but an interval for a given confidence. Strictly speaking, any experimental value must be associated with its uncertainty and its confidence.
Exceptions:
▸ Large number samples: the size of the sample n is large enough to directly apply the central limit theorem. The sampling distribution is normal regardless of the distribution of the population. We do not have to worry about Student's distribution, which in this case coincides with a Gaussian distribution anyway.
4) Examples
[Histograms: distributions of the number of tails for one toss (n = 1) and two tosses.]
We can compute the probability (the number of outcomes giving that number of tails divided by the number of possibilities 2ⁿ) as a function of the number of tails. For n = 1 we have the distribution of the population, followed by the sampling distributions for different values of n.
[Histograms: distributions of the number of tails for three, four, five and six tosses.]
[Histogram: for one die, the six values 1 to 6 are equally likely.]
For two dice, there is only one possibility for the sum to be
two: 1 for the first die and 1 for the second die. For the sum
to be three, there are two possibilities: 1 and 2, or, 2 and 1.
The most likely with two dice is to obtain a sum of 7: (1,6)
(6,1) (2,5) (5,2) (3,4) (4,3).
[Histogram: distribution of the sum of two dice, from 2 to 12, with a maximum at 7.]
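The counting argument above is easy to verify by enumerating all the possibilities (a sketch; raise `repeat` to simulate more dice):

```python
from itertools import product
from collections import Counter

# Enumerate all 36 outcomes of two dice and count the ways to obtain each sum.
counts = Counter(sum(dice) for dice in product(range(1, 7), repeat=2))
for total in sorted(counts):
    print(total, counts[total])   # 2 -> 1 way, 3 -> 2 ways, ..., 7 -> 6 ways
```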
[Histogram: distribution of the sum of three dice, from 3 to 18.]
For four dice, we already recognize the bell curve and the
profile is clearly of the Gaussian type:
[Histogram: distribution of the sum of four dice, from 4 to 24.]

For one die $\sigma_x \simeq 1.71$, and
also for four dice: $\sigma_{\bar{x}} = \sigma_x/\sqrt{n} = \sigma_x/2 \simeq 0.85$. Now, on the above curve, at 40% of the maximum (explanation page 21), we have a width close to 3.5 (between 3 and 4) for the sum, i.e. $3.5/4 \simeq 0.88$ for the mean. It matches.
F. Gaussian distribution
So, the probability that the event is realized over the whole set of possible values is 100%:

$\int_{-\infty}^{+\infty} p(x)\,dx = 1$

For the Gaussian distribution the probability density is:

$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$

In the mathematical tools section, page 165, some demonstrations are performed.
We have represented two cases on the following graph:

[Graph: two Gaussian density curves with different μ and σ.]

The probability concentrated within the interval [μ − σ, μ + σ] is:

$\int_{\mu-\sigma}^{\mu+\sigma} p(x)\,dx = 0.683\ldots \simeq 68\%$
The probability concentrated within the interval [μ − 2σ, μ + 2σ] is 0.95, and within [μ − 3σ, μ + 3σ] it is 0.997:

$\int_{\mu-2\sigma}^{\mu+2\sigma} p(x)\,dx \simeq 95\%$ and $\int_{\mu-3\sigma}^{\mu+3\sigma} p(x)\,dx \simeq 99.7\%$
3) Standard normal distribution
With the reduced variable $z = \frac{x-\mu}{\sigma}$, the density becomes:

$p(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$
G. Hypothesis test
We consider the coefficient t∞ of a normal distribution for a p% confidence (or Student's t-value when n → ∞).
We can also use other characteristic intervals of the sampling distribution. In general, the hypothesis H0 implies a property A of the sampling distribution. Here, the implication is not deterministic but statistical, and decision-making proceeds differently.
              | Accept H0                                   | Reject H0
H0 is true    | Right decision                              | Wrong decision (error of the first kind α)
H0 is false   | Wrong decision (error of the second kind β) | Right decision
mixed with the others and it does not bear any distinctive marks. We choose one die for a game night, and we want to check that the chosen die is not the biased one, i.e. that it is well balanced.
x_i     x₀ = 0      x₁ = 1
p_i     p₀ = 5/6    p₁ = 1/6

$\mu = p_0 x_0 + p_1 x_1$ and $\sigma^2 = p_0(x_0 - \mu)^2 + p_1(x_1 - \mu)^2$

⁷ Here σ is known and n = 92 is large enough to use the central limit theorem.
If H0 is false then H1 is true. If the alternative hypothesis H1
was "the die is biased" (with no details about the way the
die is loaded) we would have conducted a two-tailed test
and for α we would have to consider the two tails of the
distribution. This scenario only requires a one-tailed test: if
H0 were false, the probability of observing six would be
doubled and we would observe greater values.
less than a 1 out of 1000 chance that we considered the die
balanced while the die is rigged (we also talk about the
power of the test : η=1-β).
Note that we never calculate the probability that a
hypothesis is true but the probability to reject the
hypothesis while it is true (error of the first kind).
H. Chi-squared test
$\chi^2 = \sum_{j=1}^{c} \frac{(O_j - E_j)^2}{E_j}$
Next we have a table, on page 178, to estimate the probability of rejecting the null hypothesis H0 when it is true. According to the value of χ² and the number of degrees of freedom, we determine whether the assumption is acceptable. The number of degrees of freedom is:

ddl = c − 1 (number of categories minus one)
Let us illustrate with the experiments carried out by the
botanist Mendel. He makes crosses between plants. He
crosses peas with pink flowers. His theory implies that he
must obtain 25% of peas with red flowers, 25% of peas
with white flowers and 50% of peas with pink flowers. This
result is derived from the random encounter of gametes.
Imagine that he observes one thousand flowers with the following values: 27% white, 24% red and 49% pink. Should he continue to believe in his hypothesis?
then

$\chi^2 = \frac{(270-250)^2}{250} + \frac{(240-250)^2}{250} + \frac{(490-500)^2}{500} \simeq 2.2$

and ddl = 3 − 1 = 2.
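The same test can be sketched with scipy (the p-value is the probability of exceeding the observed χ² when H0 is true):

```python
from scipy.stats import chisquare

observed = [270, 240, 490]   # white, red, pink flowers
expected = [250, 250, 500]   # 25 %, 25 % and 50 % of one thousand

chi2, p = chisquare(observed, f_exp=expected)
print(chi2, p)   # chi2 ~ 2.2 with ddl = 2, p ~ 0.33: H0 is not rejected
```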
The test is easily generalized for a table of observed values $O_{ij}$ and expected values $E_{ij}$ with r rows and c columns:

$O = \begin{pmatrix} O_{11} & \dots & O_{1c} \\ \vdots & O_{ij} & \vdots \\ O_{r1} & \dots & O_{rc} \end{pmatrix}$ , $E = \begin{pmatrix} E_{11} & \dots & E_{1c} \\ \vdots & E_{ij} & \vdots \\ E_{r1} & \dots & E_{rc} \end{pmatrix}$

ddl = (c−1)(r−1)

We compute the χ² with a similar formula:

$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
I. The sources of the uncertainties
can also be performed with a high-precision apparatus that
is used as a reference.
The influence of these different sources of uncertainty can
be illustrated by a target and arrows. The center of the tar-
get corresponds to the quantity to be measured and the ar-
rows represent the different measurements. If the arrows as
a whole are not correctly centered, the accuracy is not as-
sured. The tightening of arrows represents precision. The
distance between the circles on the target indicates the res-
olution. The value noted is that of the circle whose arrow is
closest. The experimenter sees the arrows and the circles; however, he does not know where the center of the target is. He holds the bow, and his desire to get closer to the center of the target shows the quality and rigor of his work.
[Target illustration] Accurate measurement, with poor precision and low resolution.

[Target illustration] Biased measurement, with low precision and low resolution.
$\sigma^2 = \sigma_1^2 + \sigma_2^2 + \sigma_3^2 + \dots$
J. Exercises
Exercise 4 : Elevator Answers (Complete Edition)
10 5 13 7 6 9 5 5 10 15 5 3
15 12 11 1 3 13 11 10 2 7 2 8
2 15 4 11 11 5 8 12 10 18 6
3) We do a series of throws and we get the following
sums: 18, 15, 17, 22, 16, 12, 14, 22, 23, 14, 23, 14,
18, 21, 12, 15, 18, 13, 15, 18, 17, 15, 17, 21, 25, 16,
8, 15, 15, 13.
39.1 38.8 39.5 39.2 38.9 39.1 39.2 41.1 38.6 39.3
Male Female
Parliament 470 107
Senate 272 76
The Supreme Court 10 2
Exercise 11 : Births Answers (Complete Edition)
Theory
Hints:
1- What is the expression of p(x, y)? Show that p(x, y) satisfies the two necessary conditions for a probability distribution.

densities.
Hints:
Gaussian                   1D           2D             3D
Distance from the origin   |x|          ρ              r
Standard deviation         1            √2             √3
P(σ)                       68.3%        1−1/e = 63.2%  60.8%
P(2σ)                      95.4%        98.2%          99.3%
P(3σ)                      99.7%        99.988%        99.9994%
On OpenOffice :
• sum of a selected area * (ex.: B43:B53) :
=SOMME(*)
• value set to the cell B3 : $B$3
• =MOYENNE(*)
• squared value : *^2
• square root : *^(1/2)
• =ECARTYPE(*)
• Student's t-value 95% confidence and n=20 :
=LOI.STUDENT.INVERSE(0,05;19)
• =TEST.KHIDEUX(*;**); * : experimental frequencies
(ex.: B71:E72), ** : theoretical frequencies.
II. CORRELATION AND
INDEPENDENCE
A. Correlation coefficient
The sample correlation coefficient r is used to identify a
linear relationship between two variables Xi and Xj :
$r_{ij} = \frac{\sum_k \left[(x_{i,k} - \bar{x_i})(x_{j,k} - \bar{x_j})\right]}{\sqrt{\sum_k (x_{i,k} - \bar{x_i})^2}\,\sqrt{\sum_k (x_{j,k} - \bar{x_j})^2}}$
r varies between −1 and +1. If |r| = 1 the variables are perfectly correlated: r = 1 in the case of a perfect increasing linear relationship, and r = −1 in the case of a perfect decreasing linear relationship. If r = 0 the variables are uncorrelated (which, as the examples below show, does not by itself guarantee independence).
For four individuals we compute the squared deviations from the means and the products of deviations (X₁ is in cm, X₂ in kg):

(x₁−x̄₁)²  (x₂−x̄₂)²  (x₃−x̄₃)²  (x₁−x̄₁)(x₂−x̄₂)  (x₁−x̄₁)(x₃−x̄₃)  (x₂−x̄₂)(x₃−x̄₃)
225        121        4          165              30               22
25         81         4          45               −10              −18
25         81         9          45               15               27
225        121        9          165              −45              −33
Σ: 500     404        26         420              −10              −2
then: $r_{12} = \frac{420}{\sqrt{500}\sqrt{404}} \simeq 0.93$, $r_{13} \simeq -0.09$ and $r_{23} \simeq -0.02$.
r₁₂ is close to +1, so we have a strong positive correlation between X₁ and X₂. r₁₃ and r₂₃ are close to zero: X₃ is uncorrelated with X₁ and X₂.
Examples 7, 9 and 10 illustrate a strong correlation between two variables. Yet the correlation coefficient is not as close to −1 or +1 as we might imagine; it is even zero in example 9. This is because the correlations are not linear.
threshold effect (a product may be beneficial at low dose
and harmful at higher doses - example 9). To prevent these
phenomena from misrepresenting the data, it is important
to carefully consider the relevance of the variables chosen
before starting a statistical study.
Another example: if we study the volume V of different objects according to their size T, we find a positive correlation. But the correlation will be much stronger between V and X = T³ (graphs 7 and 8).
B. Propagation of uncertainties formula

For a function $f(x_1, x_2, \dots, x_j, \dots, x_p)$ of independent variables, the propagation of standard deviations formula gives:

$\sigma_f^2 = \sum_{j=1}^{p}\left(\frac{\partial f}{\partial x_j}\right)^2 \sigma_j^2$
2) Uncertainty calculations

$\Delta f^2 = \sum_{j=1}^{p}\left(\frac{\partial f}{\partial x_j}\right)^2 \Delta x_j^2$

For example, for a total mass $M = \sum_j m_j$, every partial derivative equals one, and then:

$\Delta M^2 = \sum_{j=1}^{p} \Delta m_j^2$
In practice, there are multiple common ways to calculate the propagation of uncertainties, depending on the situation; for example, the relative form:

$\frac{\Delta f}{f} = \sum_{j=1}^{p} \frac{\Delta x_j}{x_j}$
uncorrelated packets approximately Gaussian, it is possible
to do a numerical calculation.
The latter method is illustrated in Exercise 2 of this
chapter. A spreadsheet can do the calculations auto-
matically (for example see on sheet 4 of the file
IncertitudesLibres on www.incertitudes.fr).
gram and not to the milligram!
While these tricks serve to aid the calculations, they are not without their pitfalls and must be used with caution.
C. Linear regression
We have to minimize the following quantity:

$\sum_i d_i^2 = \sum_i (y_i - \hat{y}_i)^2$

This is equivalent to¹²

$a = \frac{\overline{x y} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}$ and $b = \bar{y} - a\,\bar{x}$.

$s_b = s_r \sqrt{\frac{\sum x_i^2}{n \sum (x_i - \bar{x})^2}}$

¹² Demonstrations p. 97.
Then $\Delta a = t_{n-2}\, s_a$ and $\Delta b = t_{n-2}\, s_b$.
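These formulas translate directly into code (a sketch with hypothetical data; $s_a$ uses the standard expression $s_r/\sqrt{\sum(x_i-\bar{x})^2}$, consistent with the comparison-of-methods section later in this chapter):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

a = (np.mean(x*y) - x.mean()*y.mean()) / (np.mean(x**2) - x.mean()**2)
b = y.mean() - a * x.mean()

res = y - (a*x + b)                        # residuals
sr = np.sqrt((res**2).sum() / (n - 2))
sxx = ((x - x.mean())**2).sum()
sa = sr / np.sqrt(sxx)
sb = sr * np.sqrt((x**2).sum() / (n * sxx))
print(a, b, sa, sb)   # multiply sa and sb by t(n-2) for Delta a, Delta b
```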
• The dotted lines represent the two extreme lines ($y = a_{min}x + b_{max}$ and $y = a_{max}x + b_{min}$).

• The inner curves delimit the confidence interval for the estimated value $\hat{y}_o$ at $x_o$:

$\Delta \hat{y}_o = t_{n-2}\, s_r \sqrt{\frac{1}{n} + \frac{(x_o - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$
• The outer curves represent a prediction for a new measurement. Prediction interval for an observation $y_o$:

$\Delta y_o = t_{n-2}\, s_r \sqrt{1 + \frac{1}{n} + \frac{(x_o - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$
For example, if the height equals 175 cm there is a
90% chance of their mass being between 58 and 92
kg (generally 90% of the data are inside the curves
and 10% outside).
time                10h15  10h30  10h40  10h55  not noted  not noted  12h
temperature θ (°C)  19.8   52.9   47.8   42.4   36.2       33.5       30.2
pressure P (hPa)    1013   1130   1112   1093   1072       1061       1049
We know that absolute zero is -273.15 °C, therefore this is
not consistent. We can therefore assume that there is a bias
and that we have not considered all the sources of
uncertainties.
Each experimental point $M_i(x_i \pm \Delta x_i,\; y_i \pm \Delta y_i)$ carries an uncertainty on both coordinates. We minimize

$\sum_i w_i\, e_i^2$ with $w_i = \frac{1}{\Delta y_i^2 + (a\,\Delta x_i)^2}$
We obtain $0\ \mathrm{K} = -266 \pm 35\ \mathrm{°C}$ with the same confidence on the uncertainties of the $x_i$ and $y_i$. The value is now correct. The main sources of uncertainties seem to be included.
We could also consider the modeling uncertainty induced by the ideal gas hypothesis, but under the conditions of this experiment the model provides a good approximation. This source of uncertainty is negligible compared to the others considered here. The use of a real gas model (like the Van der Waals equation of state) would demonstrate this.
Formulas [i]:

$S^2 = \sum_i w_i\,[y_i - (a\,x_i + b)]^2$

$\frac{\partial S^2}{\partial b} = 0$ and $\frac{\partial S^2}{\partial a} = 0$

lead to

$b = \frac{\sum w_i y_i \sum w_i x_i^2 - \sum w_i x_i \sum w_i x_i y_i}{\Delta}$

and

$a = \frac{\sum w_i \sum w_i x_i y_i - \sum w_i x_i \sum w_i y_i}{\Delta}$

with

$\Delta = \sum w_i \sum w_i x_i^2 - \left(\sum w_i x_i\right)^2$

then

$\Delta b = \sqrt{\frac{\sum w_i x_i^2}{\Delta}}$ and $\Delta a = \sqrt{\frac{\sum w_i}{\Delta}}$
4) Linearization
$y = \frac{e^x}{1+e^x}$ (logistic distribution): with $y' = \ln\frac{y}{1-y}$ we obtain the linear form $y' = x$.

$y = 1 - e^{-(x/\lambda)^b}$ (Weibull distribution): with $x' = \ln x$ and $y' = \ln\ln\frac{1}{1-y}$ the relationship becomes linear, and $\lambda = e^{-b/a}$.
5) Comparison of methods
a) Summary
Simple regression does not mean that the data have no uncertainties. The uncertainties are unknown and we estimate them from the data themselves. The uncertainties are considered constant regardless of $y_i$. The uncertainty corresponds to the standard deviation of the $y_i$ with respect to the estimated line:

$s_{y_i} = s_y = s_r = \sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2}{n-2}}$
$s_a = \frac{s_r}{\sqrt{\sum_i (x_i - \bar{x})^2}}$ and $s_b = s_r\sqrt{\frac{\overline{x^2}}{\sum_i (x_i - \bar{x})^2}}$

When the uncertainty $s_y$ on the $y_i$ is known:

$s_a = \frac{s_y}{\sqrt{\sum_i (x_i - \bar{x})^2}}$ and $s_b = s_y\sqrt{\frac{\overline{x^2}}{\sum_i (x_i - \bar{x})^2}}$
If the straight line does not pass through the error bars, it
can be assumed that all sources of uncertainty have not
been calculated. It is necessary either to integrate them into
sy, or to apply the previous method.
Using the propagation of standard deviations formula:

$s_a^2 = \sum_i \left(\frac{\partial a}{\partial y_i}\right)^2 s_{y_i}^2$ and $s_b^2 = \sum_i \left(\frac{\partial b}{\partial y_i}\right)^2 s_{y_i}^2$
The formulas' results are exact and help us to find the expressions of the previous cases. Also in this case we can use the following estimates:
$s_a = \sqrt{\frac{1}{\sum w_i}\,\frac{1}{\overline{x^2}-\bar{x}^2}}$ , $s_b = \sqrt{\frac{1}{\sum w_i}\,\frac{\overline{x^2}}{\overline{x^2}-\bar{x}^2}}$ with $w_i = \frac{1}{\Delta y_i^2}$
Hence the formulas:

$s_a^2 = \sum_i \left(\frac{\partial a}{\partial y_i}\right)^2 s_i^2$ and $s_b^2 = \sum_i \left(\frac{\partial b}{\partial y_i}\right)^2 s_i^2$

The derivatives are difficult to calculate (the weights depend on a), but we can easily evaluate them numerically. Also we commonly use the following estimates:

$\Delta a = \sqrt{\frac{\sum w_i}{\Delta}}$ , $\Delta b = \sqrt{\frac{\sum w_i x_i^2}{\Delta}}$ with $w_i = \frac{1}{\Delta y_i^2 + a^2 \Delta x_i^2}$
b) Discussion
fundamentally random phenomenon.
We would like the fourth case on page 71 to include all the sources of uncertainty of cases 1 and 5: $s_{a_1} \simeq 0.443$ and $s_{a_5} \simeq 0.696$, but $s_{a_4} \simeq 0.695$.
j          1        2        3        4        5       6       7
∂a/∂y_j   −0.093   −0.078   −0.049   −0.006   0.041   0.080   0.105
method.
D. Nonlinear regression
We generalize the weighted least squares method in the
nonlinear case. The function is nonlinear with respect to x
and also nonlinear with respect to the parameters. Although
multiple regression has similar developments, it is not dealt
with in this book [x].
1) Principle
$V(y_i - f(x_i)) = V(y_i) + V(f(x_i))$

By applying the variance propagation formula again:

$V(f(x_i)) = f'(x_i)^2\, V(x_i)$

into $S^2 = \sum_i w_i\,(y_i - f(x_i))^2$ with $w_i = \frac{1}{\Delta y_i^2 + f'(x_i)^2\, \Delta x_i^2}$

Then $\frac{\partial S^2}{\partial a_k} = 0$ allows us to determine the parameters $a_k$ of our function (by an analytical computation or a numerical resolution).
Each time we can return to a system of linear equations of
the form HA=B, with H a square matrix and A the vector
associated with the parameters. Then A=H-1B, where H-1 is
the inverse matrix of H.
$s_r = \sqrt{\frac{\sum_i (y_i - f(x_i))^2}{n - p}}$
When wi depends on the parameters we iterate the method
until we can consider the weights to be constant. If we
know the standard deviations of the data, the standard
deviations of the parameters can be computed with the
propagation formula, or they can be estimated using the
same procedure used for linear regression with error bars
on page 66.
2) Polynomial regression
In that case:
$f(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_m x^m = \sum_{i=0}^{m} a_i x^i$
For example, Cauchy's equation for the refractive index as a function of wavelength:

$n = a_0 + \frac{a_1}{\lambda^2} + \frac{a_2}{\lambda^4}$
The uncertainties on λ and n are initially neglected. What
are the values and the uncertainties of a0, a1 and a2?
$S^2 = \sum (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2$ and $\partial S^2/\partial a_k = 0$

we obtain:

$\begin{cases} \bar{y} - a_0 - a_1\bar{x} - a_2\overline{x^2} = 0 \\ \overline{xy} - a_0\bar{x} - a_1\overline{x^2} - a_2\overline{x^3} = 0 \\ \overline{x^2 y} - a_0\overline{x^2} - a_1\overline{x^3} - a_2\overline{x^4} = 0 \end{cases}$

$H = \begin{pmatrix} 1 & \bar{x} & \overline{x^2} \\ \bar{x} & \overline{x^2} & \overline{x^3} \\ \overline{x^2} & \overline{x^3} & \overline{x^4} \end{pmatrix}$ , $A = \begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix}$ and $B = \begin{pmatrix} \bar{y} \\ \overline{xy} \\ \overline{x^2 y} \end{pmatrix}$.
$H \simeq \begin{pmatrix} 1 & 3.3 & 11 \\ 3.3 & 11 & 38 \\ 11 & 38 & 135 \end{pmatrix}$ , $H^{-1} \simeq \begin{pmatrix} 4150 & -2530 & 376 \\ -2530 & 1546 & -230 \\ 376 & -230 & 34.3 \end{pmatrix}$ ,

$B \simeq \begin{pmatrix} 1.7 \\ 5.7 \\ 19 \end{pmatrix}$ then $A = H^{-1}B \simeq \begin{pmatrix} 1.68129 \\ 0.01135 \\ 0.00022 \end{pmatrix}$.

$\Delta a_k = \sqrt{(H^{-1})_{kk}}\; t_{n-3}\, s_r$ with $s_r = \sqrt{\frac{\sum (y_i - f(x_i))^2}{n-3}}$
$s_r = 1.87\times10^{-5}$ and with 95% confidence $t_{n-3} = 4.30$, then

$a_0 = 1.6813 \pm 0.0017$ , $a_1 = (1.13 \pm 0.10)\times10^{-2}\ \mu\mathrm{m}^2$ and $a_2 = (2.2 \pm 1.5)\times10^{-4}\ \mu\mathrm{m}^4$

In the weighted case: $\Delta a_k = \sqrt{\frac{(H^{-1})_{kk}}{\sum w_i}}$
We thus obtain a first expression of the parameters. To
iterate, we replace the previously used estimated
parameters with these new parameters. We iterate as much
as necessary to obtain a consistent value. Convergence is
often very rapid:
Iteration   a0               a1               a2
Estimated   1.5              0.005            0.0001
1           1.681282848208   0.011350379724   0.000219632215
...
3) Nonlinear regression
Then we have f i =f ( xi ; a k ) .
$S^2 = \sum_i w_i\,(y_i - f_i)^2$ and $\frac{\partial S^2}{\partial a_k} = -2\sum_{i=1}^{n} w_i \left(\frac{\partial f}{\partial a_k}\right)_i (y_i - f_i) = 0$

We linearize around an initial estimate of the parameters:

$f_i = f_{0,i} + \sum_l \left(\frac{\partial f}{\partial a_l}\right)_{0,i} \Delta a_{0l}$

then $\sum_i w_i \left(\frac{\partial f}{\partial a_k}\right)_i \left( y_i - f_{0,i} - \sum_l \left(\frac{\partial f}{\partial a_l}\right)_{0,i} \Delta a_{0l} \right) = 0$

and $\sum_i w_i \left(\frac{\partial f}{\partial a_k}\right)_i (y_i - f_{0,i}) = \sum_{i,l} w_i \left(\frac{\partial f}{\partial a_k}\right)_i \left(\frac{\partial f}{\partial a_l}\right)_i \Delta a_{0l}$.

We set $H_{k,l} = \sum_i w_i \left(\frac{\partial f}{\partial a_k}\right)_i \left(\frac{\partial f}{\partial a_l}\right)_i = H_{l,k}$, $B_k = \sum_i w_i \left(\frac{\partial f}{\partial a_k}\right)_i (y_i - f_{0,i})$ and $A_l = \Delta a_{0l}$.
variation on the parameters is negligible and the values
converge.
Here also, we use: $\Delta a_k^2 = (H^{-1})_{kk}\, s_r^2$.
k
i 1 2 3 4 5 6 7
[S] 0.038 0.194 0.425 0.626 1.253 2.500 3.740
v 0.050 0.127 0.094 0.2122 0.2729 0.2665 0.3317
For $f(x) = \frac{\alpha x}{\beta + x}$:

$\frac{\partial f}{\partial \alpha} = \frac{x}{\beta + x}$ , $\frac{\partial f}{\partial \beta} = -\frac{\alpha x}{(\beta + x)^2}$ , $H = \begin{pmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{pmatrix}$ , $A = \begin{pmatrix} \Delta\alpha \\ \Delta\beta \end{pmatrix}$ and $B = \begin{pmatrix} B_1 \\ B_2 \end{pmatrix}$.

$H_{11} = \sum_{i=1}^{7} \left(\frac{\partial f}{\partial \alpha}\right)_{0,i}^{2} = \sum_i \left(\frac{x_i}{\beta_0 + x_i}\right)^{2}$,
$H_{12} = H_{21} = \sum_i \frac{-\alpha_0\, x_i^2}{(\beta_0 + x_i)^3}$ and $H_{22} = \sum_i \frac{\alpha_0^2\, x_i^2}{(\beta_0 + x_i)^4}$.

$B_1 = \sum_i \frac{x_i}{\beta_0 + x_i}\,(y_i - f_{0,i})$ and $B_2 = -\sum_i \frac{\alpha_0\, x_i}{(\beta_0 + x_i)^2}\,(y_i - f_{0,i})$

with $f_{0,i} = \frac{\alpha_0\, x_i}{\beta_0 + x_i}$

$B_1 \simeq -2.33$ and $B_2 \simeq 1.86$.
and also

$H^{-1} \simeq \begin{pmatrix} 0.649 & 0.508 \\ 0.508 & 0.668 \end{pmatrix}$ then $A = H^{-1}B \simeq \begin{pmatrix} -0.567 \\ 0.0602 \end{pmatrix}$.
and β₂.
The results are shown in the following table:

Iterat°   α   β   δα   δβ   S²
[iteration values]

After enough iterations: $H^{-1} \simeq \begin{pmatrix} 1.52 & 6.34 \\ 6.34 & 36.2 \end{pmatrix}$

Calculate the uncertainties on the parameters:

$s_r = \sqrt{\frac{S^2}{n-2}}$ , $\Delta\alpha = \sqrt{(H^{-1})_{11}}\; t_{n-2}\, s_r \simeq \sqrt{1.52}\times 1.48 \times \sqrt{\frac{0.00784}{5}}$
The following graph shows the reaction rate as a function of substrate
concentration. The squares are the experimental points, the solid line is
the optimal curve and the dotted lines are the two extreme curves
$f_{\alpha_{max},\beta_{min}}$ and $f_{\alpha_{min},\beta_{max}}$.

[Graph: reaction rate v versus substrate concentration [S], with the experimental points, the optimal curve and the two extreme curves.]
E. Exercises
X1 -1 2 -2 0 -2 2 1
X2 -1 0 2 -2 0 2 -1
Exercise 2 : Volumes Answers p168

Exercise 3 : Trees Answers (Complete Edition)
Exercise 6 : Cauchy's equation
Answers (Complete Edition)
$n_i = \frac{\sin\left(\frac{A + D_{m,i}}{2}\right)}{\sin\left(\frac{A}{2}\right)}$
Dm is the minimal deviation angle. A=60° is the
internal angle of the prism. These two angles are
measured within 2' (1'=arc-minute and 1°=60').
2- Using Cauchy's equation find A, B and their
respective uncertainties.
What is the value of the regression coefficient r?
Exercise 8 : Insulation and inertia
Answers (Complete Edition)
t in hours 0 1 2 4 5 6 8 9 10
T in °C 18 16 14 12 11 10 9 9 8
b) What is the relationship between τ, R and C? Determine C and its uncertainty.

Exercise 10 : Study of a battery
Answers (Complete Edition)
Experimental Data (mm) :
OA ΔOA OA' ΔOA'
635 5 150 15
530 5 160 17
496 5 164 15
440 5 172 18
350 5 191 20
280 5 214 25
210 5 292 28
150 5 730 102
Theory
$\sum_{i=1}^{n} (x_i - \bar{x})^2 = n\,(\overline{x^2} - \bar{x}^2)$

$s_b = s_r \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2}}$.
Exercise 15 : Asymptotes Answers (Complete Edition)
Analogy

For simple regression:

Confidence: $\Delta y_o^2 = \frac{1}{n\sum w_i}\left(1 + \frac{(x_o-\bar{x})^2}{\overline{x^2} - \bar{x}^2}\right)$  Prediction: $\Delta y_o^2 = \frac{1}{n\sum w_i}\left(1 + n + \frac{(x_o-\bar{x})^2}{\overline{x^2} - \bar{x}^2}\right)$

Show that we then have the following formulas:

Confidence: $\Delta y_o^2 = \frac{1}{\sum w_i}\left(1 + \frac{(x_o-\bar{x})^2}{\overline{x^2} - \bar{x}^2}\right)$  Prediction: $\Delta y_o^2 = \frac{1}{\sum w_i}\left(2 + \frac{(x_o-\bar{x})^2}{\overline{x^2} - \bar{x}^2}\right)$
Exercise 18 : Least squares method
Answers (Complete Edition)
Method 1 :
1- Show that $a = \sum_i p_i\, y_i$ with $p_i = \frac{x_i - \bar{x}}{\sum_k (x_k - \bar{x})^2}$.

2- Deduce from this formula for a its variance V(a).
Method 2 :
Use the propagation of standard deviation formula.
For simple linear regression, is it possible to find a, b,
Δa and Δb using the generalized regression matrix
method?
Exercise 19 : Expectation of a
Answers (Complete Edition)
Exercise 20 : Standard deviations
proportional to y
Answers (Complete Edition)
1- Show that: $a = \frac{\sum \frac{1}{y_i^2} \sum \frac{x_i}{y_i} - \sum \frac{x_i}{y_i^2} \sum \frac{1}{y_i}}{\sum \frac{1}{y_i^2} \sum \left(\frac{x_i}{y_i}\right)^2 - \left(\sum \frac{x_i}{y_i^2}\right)^2}$

2- Express $\frac{\partial a}{\partial y_j}$ (long calculation).
xi 1 2 3 4 5 6 7
1 : yi 10 15 20 25 30 35 40
2 : yi 8.286 17.286 18.286 27.286 28.286 37.286 38.286
(We let k=0.1).
Exercise 21 : Interpretation of
the expression of wi
Answers (Complete Edition)
Non-linear regression
III. PROBABILITY DISTRIBUTIONS
Moments: $\mu_k = E(X^k)$

Standardized moments: $\beta_{k-2} = E\left[\left(\frac{X-\mu}{\sigma}\right)^k\right]$ or $\beta_{k-2} = \frac{\mu_k}{\sigma^k}$

β₁: Skewness (third standardized moment)
β₂: Kurtosis
A. Discrete Random Variables
1) Binomial distribution
We consider the number of successes in a sequence of
n identical and independent experiments. Each trial has two
outcomes named success and failure ( Bernoulli trial). Each
success occurs with the probability p=P(S) . The two
parameters are n and p and we write B(n,p). We want to
determine the probability to obtain exactly k successes out
of n trials. A path containing k successes has n-k failures
and its probability is pk qn−k , where q=1− p is the
probability of a failure.
Then we have to count the number of paths where k successes occur; there are different ways of distributing k successes in a sequence of n trials: n choices for the position of the first success, n−1 for the second, and n+1−k for the kth success:

$n(n-1)\dots(n+1-k) = \frac{n!}{(n-k)!}$ possibilities.
Then we divide by k! to remove multiple counts (for example S₁S₂F and S₂S₁F correspond to the same path). From where:

$P(X=k) = \binom{n}{k} p^k q^{n-k}$ with $\binom{n}{k} = \frac{n!}{(n-k)!\,k!}$ (often read aloud as « n choose k »)

and $\sum_{k=0}^{n} P(X=k) = 1$

We show that $E(X) = np$ and $V(X) = npq$.
2) Geometric distribution
We consider a random experiment with exactly two
possible outcomes : S="success" and F="failure" ;
p=P(S) and q=1− p=P(F) . We repeat independent
trials until the first success. Let X be a random variable of
the number of trials needed to get one success. We denote
the distribution by G(p).
$P(X=k) = q^{k-1}\,p$ , $\sum_{k=1}^{\infty} P(X=k) = 1$
Answer: P( X=3)=(2/ 3)2 1 /3=4 /27 .
$P(X \leq k) = \sum q^{k-1} p = \sum e^{\ln(1-p)\,(k-1)}\, p \simeq \sum p\, e^{-p k}$

We have used a Taylor series; indeed $t \gg \Delta t$ and $k \gg 1$.

Then $P(X \leq k) \to \int f(t)\,dt$ and $\sum p\, e^{-\frac{p}{\Delta t} t} \to \int \lambda\, e^{-\lambda t}\,dt$, so $\lambda = p/\Delta t$.
3) Poisson distribution
events occur with a known frequency independent of the
time elapsed since the previous events20. If λ is the number
of times the event occurs on average over a given interval
then the probability that the event occurs k times over this
time interval is:
$P(X=k) = \frac{\lambda^k}{k!}\, e^{-\lambda}$ , $\sum_{k=0}^{\infty} P(X=k) = 1$
Answer: $\lambda = \frac{12}{60}\times 50 = 10$ and $P(X=7) = \frac{10^7}{7!}\, e^{-10} \simeq 9\%$
20 or space traveled.
B. Continuous Random Variables
1) Uniform distribution
$f(x) = \begin{cases} 0 & x < a \\ \frac{1}{b-a} & a \leq x \leq b \\ 0 & x > b \end{cases}$

$E(X) = \frac{a+b}{2}$ and $V(X) = \frac{(b-a)^2}{12}$

For the sum Z of two independent random variables, the density is given by a convolution:

$f_Z(x) = \int_{-\infty}^{+\infty} f_X(y)\, f_Y(x-y)\,dy$
With $a < y < b$ and $a < x - y < b$, i.e. $x - b < y < x - a$:

If $2a < x < a+b$ then $f_Z(x) = \int_{a}^{x-a} \frac{1}{(b-a)^2}\,dy = \frac{x-2a}{(b-a)^2}$

$f_Z(x) = \begin{cases} 0 & x < 2a \\ \frac{x-2a}{(b-a)^2} & 2a \leq x < a+b \\ \frac{2b-x}{(b-a)^2} & a+b \leq x \leq 2b \\ 0 & x > 2b \end{cases}$

[Graph: triangular density of Z, rising from 2a to a+b and falling back to 2b.]
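A quick simulation makes the triangular shape visible (a sketch with a = 1 and b = 2, chosen arbitrarily):

```python
import random
from collections import Counter

a, b = 1.0, 2.0
z = [random.uniform(a, b) + random.uniform(a, b) for _ in range(100_000)]

# Coarse histogram of Z on [2a, 2b]: counts rise linearly up to a+b, then fall.
hist = Counter(round(v, 1) for v in z)
for v in sorted(hist):
    print(v, hist[v])
```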
2) Exponential distribution
[Graph: exponential densities for λ = 1 and λ = 1/2.]

$t < 0:\; f(t) = 0$ ; $t \geq 0:\; f(t) = \lambda\, e^{-\lambda t}$

$E(T) = \frac{1}{\lambda}$ and $V(T) = \frac{1}{\lambda^2}$
The distribution of the sum of two independent exponential
distributions is not an exponential distribution.
3) Normal distribution
The normal or Gaussian distribution has previously been
described page 20.
The sum of two independent Gaussian distributions
NX(μ1,σ12) and NY(μ2,σ22) is the Gaussian distribution
NZ(μ1+μ2, σ12+σ22) .
4) Student's t-distribution
The t-distribution is expressed with the number of degrees of freedom k and the gamma function (described in the mathematical tools):
For $k \geq 1$, $f_k(x) = \frac{1}{\sqrt{k\pi}}\, \frac{\Gamma\left(\frac{k+1}{2}\right)}{\Gamma\left(\frac{k}{2}\right)} \left(1+\frac{x^2}{k}\right)^{-\frac{k+1}{2}}$

Variance: $V_k = \frac{k}{k-2}$ if $k \geq 3$.
Kurtosis: $\beta_{2,k} = 3\,\frac{k-2}{k-4}$ if $k \geq 5$.

[Graph: Student's t densities for k = 1, 2, 3, 4, 8 and k → ∞ (standard normal).]
First Student's t-distributions (k degrees of freedom; sample size n = k + 1):

k     V            β₂           β₄      f_k(x)                       f_k(0)
1     −            −            −       1/[π(1+x²)]                  ≈ 0.318
2     −            −            −       1/[2√2·(1+x²/2)^(3/2)]       ≈ 0.354
3     3            −            −       2/[√3·π·(1+x²/3)²]           ≈ 0.366
4     2            −            −       (3/8)/(1+x²/4)^(5/2)         ≈ 0.375
5     5/3 ≈ 1.67   9            −       8/[3√5·π·(1+x²/5)³]          ≈ 0.380
6     3/2 = 1.5    6            −       15/[16√6·(1+x²/6)^(7/2)]     ≈ 0.383
7     7/5 = 1.4    5            125     16/[5√7·π·(1+x²/7)⁴]         ≈ 0.385
8     4/3 ≈ 1.33   9/2 = 4.5    67.5    35/[64√2·(1+x²/8)^(9/2)]     ≈ 0.387
9     9/7 ≈ 1.29   21/5 = 4.2   49      128/[105π·(1+x²/9)⁵]         ...
10    5/4 = 1.25   4            ...     ...                          ...
11    11/9 ≈ 1.22  27/7 ≈ 3.9   ...     ...                          ...
∞     1            3            15      1/√(2π)·e^(−x²/2)            ≈ 0.399
5) Chi-squared distribution
$E(X_k) = \sum_{i=1}^{k} E(T_i^2) = k\,(V(T) + E(T)^2) = k$

For $k \geq 1$ and $x \geq 0$, $f_k(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2-1}\, e^{-x/2}$

Expectation: $E_k = k$   Variance: $V_k = 2k$   Skewness: $\beta_{1,k} = \sqrt{8/k}$
[Graph: χ² probability densities for k = 1, 2, 3 and 10.]
C. Function of a continuous distribution
⦁ Example: X is a uniform distribution U(1;2). What probability distribution is Y?

We have $f_X(x) = \begin{cases} 0 & x \leq 1 \\ 1 & 1 < x < 2 \\ 0 & x \geq 2 \end{cases}$ then if $e^y \leq 1$, $y \leq 0$ and $f_Y(y) = 0$. We continue like this for the two other cases and we obtain the distribution of Y:

$f_Y(y) = \begin{cases} 0 & y \leq 0 \\ e^y & 0 < y < \ln 2 \\ 0 & y \geq \ln 2 \end{cases}$
and $f_Y(y) = -\frac{1}{a}\, f_X\left(\frac{y-b}{a}\right)$, then in all cases $f_Y(y) = \frac{1}{|a|}\, f_X\left(\frac{y-b}{a}\right)$
⦁ Example 1: X is a uniform distribution U(0,1) and Y = aX + b. What probability distribution is Y?

We have $f_X(x) = \begin{cases} 0 & x \leq 0 \\ 1 & 0 < x < 1 \\ 0 & x \geq 1 \end{cases}$, so if $\frac{y-b}{a} \leq 0$ and $a > 0$ then $y \leq b$ and $f_Y(y) = 0$. We continue like this for the two other cases and we obtain the distribution of Y:

$f_Y(y) = \begin{cases} 0 & y \leq b \\ \frac{1}{a} & b < y < a+b \\ 0 & y \geq a+b \end{cases}$   i.e. $U(0,1) \to U(b, a+b)$.

If φ(x) = (b−a)x + a and a < b: $U(0,1) \to U(a, b)$.
4) Case where φ(x) = eˣ: Y = e^X and y > 0
D. Numerical simulation
Case of an exponential distribution: $f_X(x) = \lambda\, e^{-\lambda x}$ if x > 0, else zero; then $F_X(x) = \int_{-\infty}^{x} f_X(u)\,du = 1 - e^{-\lambda x} = y$ if x > 0, else zero. So $x = -\frac{\ln(1-y)}{\lambda}$, and finally we simplify, knowing that 1−U and U have the same distribution:

Simulation of an exponential distribution: $X = -\frac{\ln U}{\lambda}$
vation.
⦁ There are many other methods that use the different prop-
erties of the probability distributions. For example, by sim-
ulating the Bernoulli distribution, we obtain, by sum, a bi-
nomial distribution which itself allows us to simulate a nor-
mal distribution.
E. Exercises
Exercise 1: Binomial distribution
Answers (Complete Edition)
Check that the binomial distribution defines a
probability distribution. Calculate the expectation and
the variance as a function of the parameters n and p.
Exercise 5: Poisson distribution
Answers (Complete Edition)
Exercise 10: Student's t-distribution
Answers (Complete Edition)
We can use the integral $I_k = \int_{-\infty}^{+\infty} \left(1 + \frac{x^2}{k}\right)^{-\frac{k+1}{2}} dx$ and carry out an integration by substitution with $u = \frac{x}{\sqrt{k}}$.
Exercise 11: Variance of Student's t-distribution
Answers (Complete Edition)
2. We now consider the case where X and Y are two
independent continuous uniform distributions U(1,2).
a. Determine the analytic expression of the Z
distribution.
b. Find the shape of probability density function of Z
with a numerical simulation of the product on a
spreadsheet for n=10,000.
IV. ESTIMATORS
A. Properties of an Estimator
1) Bias
The bias of an estimator $T_n$ of θ is the expectation $E(T_n - \theta)$. Then: $b_{T_n}(\theta) = E(T_n) - \theta$
2) Mean Square Error
We can compare different estimators with their mean
square errors. The mean square error of $T_n$ is defined as the expectation $E[(T_n - \theta)^2]$ and we show that:

$r_{T_n}(\theta) = V(T_n) + b_{T_n}(\theta)^2$
mean square error tends to zero when n tends to infinity.
$S_n^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X}_n)^2}{n}$ and $R_n^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X}_n)^2}{n-1}$
B. Construction of estimators
1) Method of Moments
We identify the moments of the population with those
of the sample. We consider as many moments as we have
parameters, starting with the first moment. As we will see in the examples, this method provides us with estimators, but it does not guarantee that they are the best in terms of bias and mean square error.
Theoretical moments: $m_k = E(X^k)$ , $k \in \mathbb{N}^*$

Sample moments: $\overline{X_n^k} = \frac{1}{n}\sum_{i=1}^{n} X_i^k$
⦁ Example 1: We have a checkerboard of 100 squares and
200 seeds. Every second
we randomly place a seed
in one of the squares. At
the end of the experiment
we count the number of
seeds in each square and
we count the number of
squares that contain no
seed, one seed, two seeds
and so on. Let X be the
random variable for the
numbers of seeds per
square. We obtain the following distribution:
k 0 1 2 3 4 5 6 7 8
n 12 28 34 10 8 7 0 1 0
the event per second for each square is 1/n. For a Poisson
distribution at any instant the event can occur, here the
time is discretized, nevertheless the approximation of a
continuous time is correct because one second is a small
duration compared to that of 200 seconds of the
experiment. So we can take λ=N /n . We tend to a
Poisson distribution when the number of squares and seeds
tend to infinity.
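A sketch of the fit: the first sample moment gives λ = 200/100 = 2, and the corresponding Poisson frequencies can be compared with the observed ones:

```python
from math import exp, factorial

counts = {0: 12, 1: 28, 2: 34, 3: 10, 4: 8, 5: 7, 6: 0, 7: 1, 8: 0}
n_squares = sum(counts.values())                    # 100 squares
n_seeds = sum(k * n for k, n in counts.items())     # 200 seeds
lam = n_seeds / n_squares                           # method of moments: 2.0

for k, n in counts.items():
    expected = n_squares * lam**k * exp(-lam) / factorial(k)
    print(k, n, round(expected, 1))                 # observed vs Poisson(2)
```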
lower risk (mean square error). A new estimate of the
parameter gives λ≃0.064 and the life expectancy is about
16 months.
We have $E(X) = \frac{a+b}{2} = \bar{X}_n$

and $E(X^2) = \frac{(b-a)^2}{12} + \frac{(a+b)^2}{4} = \frac{a^2 + ab + b^2}{3} = \overline{X_n^2}$

then $a + b = 2\bar{X}_n$ and $ab = 4\bar{X}_n^2 - 3\overline{X_n^2}$
2) Method of Maximum Likelihood
$f(x, \theta) = \begin{cases} P_\theta(X=x) & \text{for a discrete variable} \\ \text{the density at } x & \text{for a continuous variable} \end{cases}$

The maximum of the likelihood $L(x_i, \theta)$ corresponds to:

$\frac{\partial L(x_i, \theta)}{\partial \theta} = 0$ and $\frac{\partial^2 L(x_i, \theta)}{\partial \theta^2} < 0$
Usually one can find the maximum by differentiating the likelihood function L(θ). However, the calculation of the derivative may be tedious, which is why we prefer to consider the logarithm of L(θ). The logarithm is an increasing function, so we can consider the extreme values of ln(L(θ)) instead of L(θ).
For a Poisson distribution:

$L = \prod_{i=1}^{n} \frac{\lambda^{x_i}}{x_i!}\, e^{-\lambda}$ , $\ln L = -n\lambda + \sum_{i=1}^{n} \ln\frac{\lambda^{x_i}}{x_i!}$ and $\frac{\partial \ln L}{\partial \lambda} = -n + \sum_{i=1}^{n} \frac{x_i}{\lambda} = 0$

then $\lambda = \frac{1}{n}\sum_{i=1}^{n} x_i$
For an exponential distribution:

$\ln L = n\ln\lambda - \lambda\sum_{i=1}^{n} t_i$ and $\frac{\partial \ln L}{\partial \lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} t_i = 0$

then $\lambda = \frac{n}{\sum_{i=1}^{n} t_i}$
so $\ln L = -n\ln\sigma - \frac{n}{2}\ln(2\pi) - \frac{1}{2}\sum_{i=1}^{n}\left(\frac{x_i - \mu}{\sigma}\right)^2$

$\frac{\partial \ln L}{\partial \mu} = \sum_{i=1}^{n} \frac{x_i - \mu}{\sigma^2} = 0$ and $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$

$\frac{\partial \ln L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(x_i - \mu)^2 = 0$ and $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$
⦁ Example 4: Let us now consider the uniform distribution.

$L(x_i, a, b) = \prod_{i=1}^{n} f(x_i, a, b) = \begin{cases} \frac{1}{(b-a)^n} & \text{if } \{x_i\} \subset [a,b] \\ 0 & \text{else} \end{cases}$
C. Interval estimate
$\forall\, \epsilon > 0,\quad P(|X - E(X)| \geq \epsilon) \leq \frac{V(X)}{\epsilon^2}$
k          0   1   2   3   4   5   6   7   8
k − λ_m   −2  −1   0   1   2   3   4   5   6
(k − λ_m)² 4   1   0   1   4   9  16  25  36
n         12  28  34  10   8   7   0   1   0
$\sigma_\lambda^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ , $\sigma_\lambda \simeq 1.44$ and $\lambda = 2.0 \pm 0.3$ with 95% confidence.
Here the sample size is not sufficient for this method but
we have illustrated the general method for large samples.
We probably have underestimated the width of the interval.
a) Let us first use the method of maximum likelihood
which gives us the estimator Tn , the distribution of the
maximum of a number n of independent, identically
distributed variables: T n=max( {X i})
a is here estimated at 17 minutes.
So we obtain the density of $T_n$: $f_{T_n}(x) = \frac{dF_{T_n}(x)}{dx} = n\,\frac{x^{n-1}}{a^n}$

Expectation: $E(T_n) = \int_0^a x\, f_{T_n}(x)\,dx = \frac{n}{a^n}\int_0^a x^n\,dx = \frac{n}{n+1}\,a$

The estimator $T_n$ is biased: $b_{T_n} = \frac{n}{n+1}a - a = -\frac{a}{n+1}$
a is underestimated and the bias is asymptotically zero.
Variance: $E(T_n^2) = \int_0^a x^2\, f_{T_n}(x)\,dx = \frac{n}{a^n}\int_0^a x^{n+1}\,dx = \frac{n}{n+2}\,a^2$

$V(T_n) = E(T_n^2) - E(T_n)^2 = \frac{n}{n+2}a^2 - \frac{n^2}{(n+1)^2}a^2 = \frac{n\,a^2}{(n+2)(n+1)^2}$

we set $\epsilon = \sqrt{\frac{V(W_n)}{\alpha}}$ and we have: $1 - P(W_n - \epsilon \leq a \leq W_n + \epsilon) \leq \alpha$

Eventually: $P\left(W_n - \sqrt{\frac{V(W_n)}{\alpha}} \leq a \leq W_n + \sqrt{\frac{V(W_n)}{\alpha}}\right) \geq 1 - \alpha$
In the 90% confidence case α = 0.1, and for our sample:

$V(W_{12}) = \frac{18.4^2}{12\times 14} \simeq 2.0$ and $\Delta a = \sqrt{\frac{V(W_n)}{\alpha}} \simeq 4.5$.

Then $13.9 \leq a \leq 22.9$ and $a \simeq 18.4 \pm 4.5$ minutes.
$F_{W_n}(x) = F_{T_n}\left(\frac{n}{n+1}\,x\right) = \left(\frac{n\,x}{(n+1)\,a}\right)^{n}$ if $0 \leq x \leq \frac{n+1}{n}\,a$.

$f_{W_n}(x) = n\left(\frac{n}{(n+1)\,a}\right)^{n} x^{n-1}$ if $0 \leq x \leq \frac{n+1}{n}\,a$
so $a_{max} = \left(\frac{13}{12}\right)^{2}\times 17 \simeq 19.95$ and $a_{max} \simeq 20.0$ minutes.

Lower bound: $\int_{a_{min}}^{a_{max}} f_{W_n}(x)\,dx = 1 - \alpha$

$\int_{a_{min}}^{a_{max}} f_{W_{12}}(x)\,dx = 0.9$ with $a_{min} \simeq 16.4$ minutes.
We thus find the same results as by the preceding method
(file : www.incertitudes.fr/livre/Train.ods).
$\bar{x} = \frac{3 + 12 + \dots + 11}{12} \simeq 9.83$ and $s = \sqrt{\frac{(3-\bar{x})^2 + \dots}{11}} \simeq 4.88$
numerical simulation, the interval is thus underestimated. Indeed, carrying out a numerical simulation with this estimator gives $a = 19.7 \pm 5.5$ minutes with 90% confidence:
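A sketch of such a simulation for the method-of-moments estimator $T_n = 2\bar{X}$ (taking the point estimate 19.7 minutes as the true value of a for the purpose of the simulation):

```python
import random
from statistics import mean

a, n, trials = 19.7, 12, 100_000
estimates = sorted(2 * mean(random.uniform(0, a) for _ in range(n))
                   for _ in range(trials))

lo = estimates[int(0.05 * trials)]    # 5th percentile
hi = estimates[int(0.95 * trials)]    # 95th percentile
print(lo, hi)                         # central 90 % interval, roughly a -/+ 5.5
```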
In conclusion, the maximum likelihood estimator converges much faster than that of the method of moments, and we prefer the first method. Its variance converges as $1/n^2$ instead of $1/n$:

$[V(W_n)]_{ML} = \frac{a^2}{n(n+2)}$ and $[V(T_n)]_{MM} = \frac{a^2}{3n}$
[Graph: convergence of the variances of the two estimators as a function of n.]
D. Exercises
Consider (X1, X2, X3): three independent variables with the same distribution, of expectation m and variance σ².
Compare the following three proposed estimators of the mean m:
A3 = (X1+X2+X3)/3, B3 = (X1+2X2+3X3)/6 and C3 = (X1+2X2+X3)/3.
Exercise 3 : Two estimators
Answers (Complete Edition)
Values 0 1 2
Probabilities 3θ θ 1 - 4θ
Show that $U_n = \frac{Z_n}{n}$ is an unbiased estimator of θ. Determine V(Un).
5. We make estimates of θ with the following realiza-
tions:
Values 0 1 2
Frequencies 31 12 57
Exercise 4 : Ballot boxes Answers (Complete Edition)
We consider a random variable X with the following density:

$f(x) = \begin{cases} \frac{a}{x^{a+1}} & \text{if } x > 1 \\ 0 & \text{otherwise} \end{cases}$

where a is the parameter we want to estimate (a > 1).
1. Check that f defines a valid probability density.
2. Calculate E(X).
3. Determine estimators of a by the method of
maximum likelihood and by the method of moments.
4. Give point estimates of a for the following
observations:
1.16 / 1.80 / 1.04 / 3.40 / 1.22 / 1.06 / 1.35 / 1.10.
5. Can we calculate V(X)?
6. Perform a numerical simulation of the law of X.
What can we guess about the biases and convergences
of the estimators found?
Exercise 6 : Linear density
Answers (Complete Edition)
$f(x) = \begin{cases} a\,x + b & \text{if } 0 \leq x \leq 1 \\ 0 & \text{otherwise} \end{cases}$
1. Express b as a function of a such that f defines a
probability distribution.
2. Calculate E(X) and V(X).
3. Determine an estimator Tn of a by the method of
moments. Discuss the properties of this estimator.
4. We draw a sample:
0.81 0.67 0.72 0.41 0.93 0.55 0.28 0.09 0.89
Exercise 8 : Decays Answers (Complete Edition)
V. Measure with a ruler
ABSTRACT
The measurement of a physical quantity by an acquisition system induces, because of its resolution, a discretization error.
We are here concerned with measuring a length with a
graduated ruler. This type of measure leads us to consider a
uniform continuous probability distribution. We then use a
convolution to determine the uncertainty with its confidence
of a sum of lengths. Finally, we generalize to the general
case of the calculation of uncertainties for independent
random variables using the error propagation formula.
INTRODUCTION
We want to measure lengths and evaluate the uncertainties as exactly as possible: uncertainties on the measured values and on their sums. We have a ruler of 15 cm graduated to the millimeter and two decks of cards. The ruler is assumed to be perfect and the cards of each deck identical.
1. MEASURE OF THE LENGTH OF ONE CARD
We place the zero graduation on the left edge of the card. On the right edge we consider the graduation closest to the edge. The experimenter does not read between the graduations. The thickness of the lines which delimit a graduation is considered negligible compared with the width of this graduation. We thus get for deck 1:

$x_1 = 8.4 \pm 0.05\ \mathrm{cm}$.
Concerning the cards of the second deck: $x_2 = 11.2 \pm 0.05\ \mathrm{cm}$.
To characterize the spreading of a distribution we consider the range E and the standard deviation σ, whose definition for a continuous distribution is:

$V = \sigma^2 = \int (x - x_m)^2\, f(x)\,dx$

For a uniform distribution of width δ this gives $\sigma = \delta/\sqrt{12} \simeq 0.29\,\delta$.
The probability density f of X is computed from those of
X1 and X2. For a sum of independent random variables the
result is given by a convolution [iii] :
$f(x) = \int f_1(y)\, f_2(x-y)\,dy \;\Rightarrow\; \begin{cases} x < x_{min}: & f(x) = 0 \\ x_{min} < x < x_m: & f(x) = \frac{x - x_{min}}{\delta^2} \\ x_m < x < x_{max}: & f(x) = \frac{x_{max} - x}{\delta^2} \\ x > x_{max}: & f(x) = 0 \end{cases}$
We then have a triangular probability distribution.
We obtain $x = 19.6 \pm 0.1\ \mathrm{cm}$ with 100% confidence, and $x = 19.6 \pm 0.055\ \mathrm{cm}$ with 80% confidence.
For each die the six values are equally likely. Here the probability distribution is no longer continuous but discrete. We throw two dice simultaneously; the sum of the values obtained is between 2 and 12. In this case there is no equiprobability: one way to get two (a double one), two ways to get three (one and two, or two and one)... for seven we have the maximum number of possibilities. We again find a triangular distribution.
The cards of a deck are supposed identical, so if the length of one of them is overestimated, it will be the same for the second one. In this case the errors add and cannot compensate. For two different cards, the first measurement can be underestimated and the second overestimated, and a compensation can occur. Here it is no longer the case, and when $X = X_i + X_{i'}$ we obtain a uniform distribution of width 2δ. Our random variables are not independent.
We have $X = \sum_{i=1}^{N} X_i$. Each length $X_i$ follows a uniform distribution of width δ. For the sum of nine independent random variables, after iterating the convolution we obtain the following curve:
In this case we obtain $x = \bar{x} \pm 0.11\ \mathrm{cm}$ with 80% confidence. With 100% confidence, $x = \bar{x} \pm 0.45\ \mathrm{cm}$, which leads us to consider domains where the probability of presence of X is truly negligible. An uncertainty of 0.45 cm seems unnecessary when 99% of the cases are already covered with an uncertainty of 0.22 cm.
80% 95% 99%
N=1 0.40 0.48 0.50
2 0.55 0.78 0.90
3 0.66 0.97 1.19
4 0.75 1.12 1.41
5 0.84 1.25 1.60
6 0.92 1.38 1.76
7 0.99 1.49 1.91
8 1.06 1.59 2.05
9 1.12 1.69 2.18
10 1.2 1.8 2.3
20 1.7 2.5 3.3
50 2.6 4.0 5.2
100 3.7 5.7 7.4
But this approach does not take into account one thing: the
curve narrows around the mean when N increases. There is
another additive quantity: the variance. The standard
deviation, square root of the variance, is proportional to √N
and takes account of error compensations.
The results of the measurements are often given with
a confidence of 95%, which corresponds for a Gaussian to
an uncertainty of about 2σ .
6. OTHER APPLICATIONS
In the laboratory many measuring instruments have digital displays. The resolution is defined by the last digit, but the overall uncertainty is often much higher. It is necessary to consult the operating instructions of each device.
CONCLUSION
$\Delta f^2 = \sum_{i=1}^{n}\left(\frac{\partial f}{\partial x_i}\right)^2 \Delta x_i^2$,
VI. Mathematical Tools

A - Derivatives
f(x)       f′(x)             uncertainty
cos x      −sin x            Δ(cos x) = |sin x|·Δx
eˣ         eˣ                Δ(eˣ) = eˣ·Δx
ln x       1/x               Δ(ln x) = Δx/|x|
u + v      u′ + v′           (u and v as functions of x)
u·v        u′v + v′u         (product rule)
u/v        (u′v − v′u)/v²    (quotient rule)
f(g(x))    g′(x)·f′(g(x))    (chain rule)

• $\frac{1}{x} = x^{-1}$ so $\left(\frac{1}{x}\right)' = -1\cdot x^{-2} = -\frac{1}{x^2}$.

• $\sqrt{x} = x^{1/2}$ then $(\sqrt{x})' = \frac{1}{2}\,x^{-1/2} = \frac{1}{2\sqrt{x}}$.

• $(\sin x^2)' = (x^2)'\cos x^2 = 2x\cos x^2$
B - Partial derivatives
For example, with $f(x,y,z) = x^2 + xy - 2z$:

$\left(\frac{\partial f}{\partial x}\right)_{y,z} = 2x + y$ , $\left(\frac{\partial f}{\partial y}\right)_{x,z} = x$ and $\left(\frac{\partial f}{\partial z}\right)_{x,y} = -2$
C - Taylor series
With the notion of derivative we have studied the first-order
behavior of a function around a point, here we refine to the
higher orders.
For every infinitely differentiable function and for ϵ≪1 we
have the following development in the neighborhood of a point:
$f(x_0+\epsilon) = f(x_0) + \epsilon f'(x_0) + \frac{\epsilon^2}{2} f''(x_0) + \frac{\epsilon^3}{3!} f^{(3)}(x_0) + \dots + \frac{\epsilon^n}{n!} f^{(n)}(x_0) + \dots$

The more high-order terms we take, the better the approximation. For example, for $f(x) = \sin x$ and $x_0 = 0$ we find $\sin\epsilon \simeq \epsilon - \frac{\epsilon^3}{3!}$; in the same way $\cos\epsilon \simeq 1 - \frac{\epsilon^2}{2}$.
D - Integrals
We have used an integration by parts:

$\int_a^b u(x)\,v'(x)\,dx = \left[u(x)\,v(x)\right]_a^b - \int_a^b u'(x)\,v(x)\,dx$

Integration by substitution:
Let $u = g^{-1}(x)$ be a new variable, with g a continuous function strictly monotonic on [a,b] and $g^{-1}$ the inverse function, then:

$\int_a^b f(x)\,dx = \int_{g^{-1}(a)}^{g^{-1}(b)} f(g(u))\,g'(u)\,du$
E – Series

Binomial formula: $(a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k\, b^{n-k}$

Derivatives of geometric series: $\sum_{k=r}^{\infty} k(k-1)\dots(k-r+1)\, q^{k-r} = \frac{r!}{(1-q)^{r+1}}$

so $\frac{1}{(1-q)^{r+1}} = \sum_{k=r}^{\infty} \binom{k}{r}\, q^{k-r}$

A definition of the exponential function: $e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}$
F - Gamma function: $\Gamma(x) = \int_0^{+\infty} t^{x-1}\, e^{-t}\,dt$

This function is an extension of the factorial to real and complex numbers (except 0, −1, −2, ...). We will use it for half-integer numbers. We demonstrate with an integration by parts: $\Gamma(x+1) = x\,\Gamma(x)$. $\Gamma(1) = 1$, then for n integer $\Gamma(n+1) = n!$. Moreover, $\Gamma(1/2) = \sqrt{\pi}$ allows us to calculate the function for the half-integers.
VII. Answers to Exercises
Chapter II: Correlation and Independence
c) r12 = 0, r13 = 0, r23 = −1. X1 and X2 are not correlated. The same for X1 and X3. X2 and X3 are dependent and totally correlated.
2- a) x1=0 and x2=0. σ 1≃1.22 and σ 2≃1.58 .
c) r12=0.904 . Quantities are positively correlated.
3- a) x1=0 and x2=0. σ 1≃1.73 and σ 2≃1.53 .
c) r12=0 . Quantities completely uncorrelated, do not forget that the
correlations sought here are linear. There is a correlation in the form
of a V.
[Scatter plots for 2-b) and 3-c), with axes from −3 to 3.]
E2 : Volumes Exercise on page 87
1-
V1=(100.1+100.0+99.9+100.0)/4 so V1 = 100.0 mL.
$\sigma_1 = \sqrt{\frac{0.1^2 + 0^2 + (-0.1)^2 + 0^2}{4-1}} = \sqrt{\frac{2}{3}}\times 0.1\ \mathrm{mL}$, then $\sigma_1 \simeq 0.082\ \mathrm{mL}$

2-
$\hat{V}_i = V_i - \bar{V}$ and

$\sum_i [(V_{1i} - \bar{V_1})(V_{2i} - \bar{V_2})] = \sum_i [\hat{V}_{1i}\hat{V}_{2i}] = 0.1\times 0 + 0\times 0.1 + (-0.1)\times 0 + 0\times 0.1 = 0$

by definition $r_{12} = \frac{\sum_i [(V_{1i} - \bar{V_1})(V_{2i} - \bar{V_2})]}{\sqrt{\sum_i (V_{1i} - \bar{V_1})^2}\,\sqrt{\sum_i (V_{2i} - \bar{V_2})^2}}$

3-
V = {200.1, 200.1, 199.9, 199.9} mL so $\bar{V}$ = 200 mL.

$\sigma_V = \sqrt{\frac{0.1^2 + 0.1^2 + (-0.1)^2 + (-0.1)^2}{4-1}} = \frac{2}{\sqrt{3}}\times 0.1\ \mathrm{mL}$

then $\sigma_V \simeq 0.115\ \mathrm{mL}$ and $\Delta V \simeq 0.183\ \mathrm{mL}$, $\Delta V/V \simeq 0.09\%$

4-
V(V1, V2), then we have:

$\Delta V^2 = \left(\frac{\partial V}{\partial V_1}\right)_{V_2}^2 \Delta V_1^2 + \left(\frac{\partial V}{\partial V_2}\right)_{V_1}^2 \Delta V_2^2$
VIII. Bibliography / Sources /
Softwares / Illustrations
Books
[vi] WONNACOTT. Introductory Statistics for Business and
Economics, 1972. 919 p.
SHELDON M. ROSS. A first course in probability.1976. 585 p.
Books (in French)
[iv] PROTASSOV Konstantin. Probabilités et incertitudes dans
l'analyse des données expérimentales. Presses Universitaires de
Grenoble, 1999. 128 p.
Web
Place http:// in front of the site name. Most files are
copied to the folder <www.incertitudes.fr/livre/>.
[v] Nombres, mesures et incertitudes. mai 2010.
<www.educnet.education.fr/rnchimie/recom/mesures_incertitudes.pdf>
Articles
Softwares
All the software used are free and open source.
• Word processing and spreadsheet: LibreOffice
• Graphics: Gimp (points), Inkscape (vectors) and Blender (3D).
• Computation: Scilab (numerical), XCAS (symbolic) and PHP programs (on server).
• Plotters: KmPlot, TeXgraph.
• Operating System: Ubuntu.
Illustrations
At what distance from the starting point is the walker at the instant t?
(after n time intervals: t = n Δt)
The center of the table is its starting point. On the abscissa x (direction
East-West) the displacement is more or less one step (x=±p , p= Δd /2
and we have fixed Δd=1). Similarly on the y-axis: y=±p. In each
square is indicated the number of possibilities to meet at this place.
The second table indicates the probabilities (1/4, 4=2x2).
In the four possibilities he is at a distance $\sqrt{2}/2$ from the point of origin. And, in terms of standard deviation, the characteristic distance is $s = \sqrt{4\times(1/\sqrt{2})^2/(4-1)} \simeq 0.816$.
For n = 2 (16 paths), the number of possibilities and the probabilities for each position are:

y = 1:    1  2  1       6%  13%   6%
y = 0:    2  4  2      13%  25%  13%
y = −1:   1  2  1       6%  13%   6%
          x = −1  0  1
For example to be at (x=0 ; y=-1), there are two possible paths:
(PF,PP) and (FP,PP). Standard deviation s2 ≃1.033 .
The walker has a one in four chance to be back to the starting point.
For n = 3 (64 paths):

y = 1.5:   1  3  3  1      2%   5%   5%   2%
y = 0.5:   3  9  9  3      5%  14%  14%   5%
y = −0.5:  3  9  9  3      5%  14%  14%   5%
y = −1.5:  1  3  3  1      2%   5%   5%   2%
           x = −1.5  −0.5  0.5  1.5

s₃ ≃ 1.234
For n=6, we draw the following tables:
Hence the evolution of the quadratic mean distance from the starting
point as a function of time:
t (Δt) 0 1 2 3 4 5 6
√t 0.00 1.00 1.41 1.73 2.00 2.24 2.45
s (2p) 0.000 0.816 1.033 1.234 1.417 1.582 1.732
[Graphs: s as a function of t, and s as a function of √t.]
We find a much better correlation in √t. Indeed, we saw directly that the distance from the origin did not grow proportionally to time: for n = 2 we are about one unit from the origin, so we would have to be near 3 for n = 6. This variation in √t is characteristic of diffusion phenomena, and here finds its analogy with the compensation of errors in √n.
IX. TABLES / Index

A. Standard normal distribution

$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$
B. Student's t-values

Confidence (%):
ddl   50     80     90     95     98     99     99.5   99.8   99.9
1     1.00   3.08   6.31   12.7   31.8   63.7   127    318    637
2     0.82   1.89   2.92   4.30   6.96   9.92   14.1   22.3   31.6
3     0.76   1.64   2.35   3.18   4.54   5.84   7.45   10.2   12.9
4     0.74   1.53   2.13   2.78   3.75   4.60   5.60   7.17   8.61
5     0.73   1.48   2.02   2.57   3.36   4.03   4.77   5.89   6.87

Degrees of freedom (sample size minus the number of parameters)
C. Chi-square values
12 3.57 6.30 8.44 11.34 14.0 15.8 18.5 21.0 24.1 26.2 32.9
13 4.11 7.04 9.30 12.34 15.1 17.0 19.8 22.4 25.5 27.7 34.5
14 4.66 7.79 10.2 13.34 16.2 18.2 21.1 23.7 26.9 29.1 36.1
15 5.23 8.55 11.0 14.34 17.3 19.3 22.3 25.0 28.3 30.6 37.7
16 5.81 9.31 11.9 15.34 18.4 20.5 23.5 26.3 29.6 32.0 39.3
17 6.41 10.1 12.8 16.34 19.5 21.6 24.8 27.6 31.0 33.4 40.8
18 7.01 10.9 13.7 17.34 20.6 22.8 26.0 28.9 32.3 34.8 42.3
19 7.63 11.7 14.6 18.34 21.7 23.9 27.2 30.1 33.7 36.2 43.8
20 8.26 12.4 15.5 19.34 22.8 25.0 28.4 31.4 35.0 37.6 45.3
21 8.90 13.2 16.3 20.34 23.9 26.2 29.6 32.7 36.3 38.9 46.8
22 9.54 14.0 17.2 21.34 24.9 27.3 30.8 33.9 37.7 40.3 48.3
23 10.2 14.8 18.1 22.34 26.0 28.4 32.0 35.2 39.0 41.6 49.7
24 10.9 15.7 19.0 23.34 27.1 29.6 33.2 36.4 40.3 43.0 51.2
25 11.5 16.5 19.9 24.34 28.2 30.7 34.4 37.7 41.6 44.3 52.6
26 12.2 17.3 20.8 25.34 29.2 31.8 35.6 38.9 42.9 45.6 54.1
27 12.9 18.1 21.7 26.34 30.3 32.9 36.7 40.1 44.1 47.0 55.5
28 13.6 18.9 22.7 27.34 31.4 34.0 37.9 41.3 45.4 48.3 56.9
29 14.3 19.8 23.6 28.34 32.5 35.1 39.1 42.6 46.7 49.6 58.3
30 15.0 20.6 24.5 29.34 33.5 36.3 40.3 43.8 48.0 50.9 59.7
31 15.7 21.4 25.4 30.34 34.6 37.4 41.4 45.0 49.2 52.2 61.1
32 16.4 22.3 26.3 31.34 35.7 38.5 42.6 46.2 50.5 53.5 62.5
33 17.1 23.1 27.2 32.34 36.7 39.6 43.7 47.4 51.7 54.8 63.9
34 17.8 24.0 28.1 33.34 37.8 40.7 44.9 48.6 53.0 56.1 65.2
35 18.5 24.8 29.1 34.34 38.9 41.8 46.1 49.8 54.2 57.3 66.6
36 19.2 25.6 30.0 35.34 39.9 42.9 47.2 51.0 55.5 58.6 68.0
37 20.0 26.5 30.9 36.34 41.0 44.0 48.4 52.2 56.7 59.9 69.3
38 20.7 27.3 31.8 37.34 42.0 45.1 49.5 53.4 58.0 61.2 70.7
39 21.4 28.2 32.7 38.34 43.1 46.2 50.7 54.6 59.2 62.4 72.1
40 22.2 29.1 33.7 39.34 44.2 47.3 51.8 55.8 60.4 63.7 73.4
Index
Absolute zero.................................................................................62
Accuracy........................................................................................33
Aging............................................................................................104
Arithmetic mean..............................................................................2
Asymptotes.....................................................................................95
Ballot boxes..................................................................................148
Bernoulli distribution...........................................................102, 120
Bias...............................................................................................126
Binomial distribution....................................................102, 120, 121
Binomial formula.........................................................................167
Box-Muller transform..................................................................119
Cauchy's equation...........................................................................89
Central limit theorem.............................................................10, 138
Chebyshev's inequality.........................................................137, 140
Chi 2...............................................................................................76
Chi-squared distribution.......................................................113, 123
Chi-squared test.............................................................................30
Class interval..............................................................................6, 38
Coefficient of asymmetry (skewness)...........................................101
Confidence Interval............................................................12, 61, 95
Convolution..................................................................107, 151, 155
Correlation coefficient....................................................................47
Cumulative distribution function..........................................104, 115
Decay...........................................................................................150
Decomposition into Gaussians.......................................................99
Degrees of freedom........................................................................13
Derivatives of geometric series....................................................167
Diffusion phenomena....................................................................175
Discretization error......................................................................151
Error of the first kind.....................................................................26
Error of the second kind................................................................26
Estimator......................................................................................126
Expectation.....................................................................................20
Exponential distribution...............................................109, 122, 131
179
Frequency.........................................................................................6
Function of a continuous distribution...........................................115
Gamma function...........................................................................167
Gaussian distribution................................................................10, 19
Gaussian distribution 3D................................................................42
Geometric distribution.........................................................103, 121
Geometric mean...............................................................................2
Homokinetic Beam......................................................................146
Hypothesis test...............................................................................24
Integral.........................................................................................165
Integration by parts......................................................................166
Integration by substitution............................................................166
Interval estimate...........................................................................137
Inverse distribution.......................................................................124
Inverse transform sampling..........................................................118
Inverse transformation method.....................................................118
Kurtosis........................................................................101, 113, 166
Least squares method...............................................................58, 75
Likelihood....................................................................................133
Line spectrum.................................................................................89
Linear density...............................................................................149
Linear regression............................................................................58
Linearization..................................................................................67
Mean deviation...........................................................................4, 37
Mean Square Error.......................................................................127
Memoryless property....................................................................104
Method of Maximum Likelihood.................................................133
Method of Moments.....................................................................129
Moment................................................................................101, 166
Negative binomial distribution.....................................................121
Nonlinear regression.................................................................75, 80
Normal distribution......................................................................110
Numerical simulation...................................................................118
Parabolic regression.......................................................................78
Poisson distribution..............................................................105, 122
Polynomial regression.....................................................................77
180
Power of the test.............................................................................29
Precision.........................................................................................33
Prediction.......................................................................................62
Prediction Interval..............................................................12, 62, 95
Prism..............................................................................................89
Probability density function............................................................19
Product of distributions................................................................123
Propagation of standard deviations formula.................................161
Propagation of uncertainties formula.....................................52, 161
Random errors..............................................................................162
Range.......................................................................................3, 153
Refractive index..............................................................................88
Repeatability..................................................................................33
Reproducibility...............................................................................33
Residual....................................................................................59, 76
Resolution..............................................................................33, 151
Sample standard deviation................................................................3
Sampling distribution.....................................................................10
Skewness......................................................................101, 113, 166
Small variations method...........................................................73, 98
Student's t-distribution.........................................................110, 122
Student's t-value.............................................................................60
Sum of binomial distributions......................................................121
Sum of exponentials.....................................................................124
Sum of Gaussians.........................................................................122
Sum of independent random variables.........................................101
Sum of Student's t-distributions...................................................123
Taylor series.........................................................................105, 165
Thermal conductivity......................................................................90
Triangular distribution..................................................................156
Uncertainty.....................................................................................13
Uncertainty calculations.................................................................53
Uniform distribution....................................................107, 122, 153
Variance.................................................................20, 101, 154, 159
Waiting time.................................................................................138