Slides ProbTheoryStats
Stephan Schmidt
Please let me know if anything needs to be clarified in the slides (e.g.
something that is not apparent) or if there are mistakes/typos.
Outline
Probability theory
Introduction
Probability density functions
Cumulative distributions
Sampling
Probabilities
Percentiles and quantiles
Likelihood
Transformation of variables
Multivariate distributions
Expected values
Monte Carlo integration
Conclusion
Probability theory

Subsection 1
Introduction
Introduction

[Figure: two urns, panels (a) and (b), containing coloured shapes]
Introduction
Given you have selected Urn 1, what is the probability of selecting a red square,
a green triangle, or a blue circle?
Introduction
Given you have selected a blue circle, what is the probability that you have
sampled from Urn 1?

P(U1 | B) = 2/8 = 0.25    (10)

P(U2 | B) = 6/8 = 0.75    (11)
Introduction
Given you have selected a green triangle, what is the probability that you have
sampled from Urn 1?

P(U1 | G) = 0/1 = 0    (12)

P(U2 | G) = 1/1 = 1    (13)

Given you have selected a green triangle and a blue circle, what is the
probability that you selected Urn 1 at random?
Subsection 2
Probability density functions
Probability density functions
The function p(x) is a probability density function over the continuous random
variable x if and only if p(x) ≥ 0 for all x and p(x) integrates to one.

Example:

p(x) = c · (1 + x)^(−1)   if 0 ≤ x ≤ 1
p(x) = 0                  otherwise    (15)
Probability density functions
• Normal/Gaussian distribution:

  N(x; µ, σ²) = 1/√(2πσ²) · exp(−(x − µ)²/(2σ²))    (16)

• Laplace distribution:

  L(x; µ, b) = 1/(2b) · exp(−|x − µ|/b)    (17)

• Gamma distribution:

  Gamma(τ; a, b) = (1/Γ(a)) · b^a · τ^(a−1) · e^(−bτ)    (18)

• Beta distribution:

  Beta(µ; a, b) = (Γ(a + b)/(Γ(a)Γ(b))) · µ^(a−1) · (1 − µ)^(b−1)    (19)

• Bernoulli distribution:

  Bern(x; µ) = µ^x · (1 − µ)^(1−x),   x ∈ {0, 1}    (20)
Probability density functions
NB: Always check the form of the function (there are often different
parametrisations of the same distribution)!
Probability density functions
These are the functions that are available for the Normal distribution in
scipy.stats:
• rvs - sampling
• pdf - probability density function
• cdf - cumulative distribution function
• logpdf and logcdf - more stable than np.log(pdf) and np.log(cdf)
The same methods are available for the other distributions (e.g. Laplace).
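As a quick illustration, here is a minimal sketch of the four methods above for scipy.stats.norm (the evaluation points are my own choice; the standard normal is assumed):

```python
import numpy as np
from scipy import stats

# rvs: draw samples from N(0, 1)
samples = stats.norm.rvs(loc=0, scale=1, size=1000, random_state=0)
print(samples.shape)

# pdf and cdf evaluated at x = 0
print(stats.norm.pdf(0.0))  # 1/sqrt(2*pi) ≈ 0.3989
print(stats.norm.cdf(0.0))  # 0.5, by symmetry

# logpdf is more stable than np.log(pdf) deep in the tails:
print(stats.norm.logpdf(-40.0))        # finite, ≈ -800.9
print(np.log(stats.norm.pdf(-40.0)))   # -inf: the pdf underflows to zero
```

The last two lines show exactly why logpdf/logcdf exist: the density itself underflows long before its logarithm does.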
Probability density functions
Common probability density functions:

[Figure: (a) Normal/Gaussian and (b) Laplace PDFs for loc=0, scale=1; loc=1,
scale=1; loc=-2.5, scale=0.5. (c) Beta PDFs for a=1.5, b=1; a=1, b=1; a=2, b=1.
(d) Gamma PDFs for a=1, 2, 5 with loc=0, scale=1.]
Probability density functions
(a) What does the distribution of a random variable u3, which is calculated from
the sum of two uniformly distributed variables u3 = u1 + u2, look like?
(b) What does the distribution of a random variable u3, which is calculated from
the difference of two uniformly distributed variables u3 = u1 − u2, look like?

[Figure: (a) samples of u1 and (b) samples of u2, both uniform on [0, 1]]
Probability density functions
Example:
Which option is correct for p(u2 + u1)?

[Figure: four candidate histograms (a)-(d) for u1 + u2 over different ranges]
Probability density functions
Which option is correct for p(u1 − u2)?

[Figure: four candidate histograms (a)-(d) for u1 − u2 over different ranges]
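One way to check both questions empirically is to simulate the two uniform variables and histogram the sum and the difference. A sketch (the bin edges and sample size are my own choices):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
u1 = rng.uniform(0.0, 1.0, size=n)
u2 = rng.uniform(0.0, 1.0, size=n)

# The sum is supported on [0, 2] and the difference on [-1, 1]; in both cases
# the mass concentrates in the middle (a triangular shape), because many more
# (u1, u2) combinations produce a central value than an extreme one.
dens_sum, _ = np.histogram(u1 + u2, bins=[0.0, 0.5, 1.5, 2.0], density=True)
dens_diff, _ = np.histogram(u1 - u2, bins=[-1.0, -0.5, 0.5, 1.0], density=True)

print(dens_sum)    # middle-bin density clearly exceeds the outer bins
print(dens_diff)
```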
Subsection 3
Cumulative distributions

Cumulative distributions
• P(X1 ≤ x ≤ X2) = P(x ≤ X2) − P(x ≤ X1)
• p_x(x) = d F_x(x) / dx
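The second bullet can be verified numerically: a central finite difference of the CDF recovers the PDF. A sketch using the standard normal (the evaluation points are arbitrary):

```python
import numpy as np
from scipy import stats

x = np.array([-2.0, 0.0, 1.5])
h = 1e-6

# d/dx F(x) via central differences should match p(x)
numeric_pdf = (stats.norm.cdf(x + h) - stats.norm.cdf(x - h)) / (2 * h)

print(numeric_pdf)
print(stats.norm.pdf(x))   # agrees to roughly 1e-9
```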
Cumulative distributions
Example
A uniform distribution U(x; a, b) has the following probability density function:

U(x; a, b) = 1/(b − a)   if a ≤ x ≤ b
U(x; a, b) = 0           otherwise    (23)

[Figure: the PDF (left) and CDF (right) of U(x; 0, 1)]
Cumulative distributions
Common probability density functions (left) with their cumulative distributions (right):

[Figure: (a) Normal/Gaussian PDFs and (b) their CDFs; (c) Laplace PDFs and (d)
their CDFs, each for loc=0, scale=1; loc=1, scale=1; loc=-2.5, scale=0.5]
Cumulative distributions

[Figure: (a) Beta PDFs and (b) their CDFs for a=1.5, b=1; a=1, b=1; a=2, b=1;
(c) Gamma PDFs and (d) their CDFs for a=1, 2, 5 with loc=0, scale=1]
Subsection 4
Sampling

Sampling
• If we have a PDF, p(x), then a sample from the PDF is denoted x ∼ p(x),
  where ∼ means "sampled from" in this course.
Sampling
Samples (left); empirical and analytical probability distribution (right).

[Figure: samples x_i and the corresponding empirical vs analytical densities for
N = 10, 100, 10000, and 100000; the empirical histogram approaches the
analytical PDF as N grows]
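The convergence shown in the figure can be reproduced with a few lines; a sketch for the standard normal (the sample sizes follow the panels above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Empirical moments of N(0, 1) samples approach the analytical values
# (mean 0, standard deviation 1) as N grows
for n in (10, 100, 10_000, 100_000):
    x = stats.norm.rvs(size=n, random_state=rng)
    print(n, round(x.mean(), 3), round(x.std(), 3))
```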
Subsection 5
Probabilities
Probabilities

[Figure: three shaded regions under a Gaussian PDF: x < a (region A),
x < b (region B), and a < x < b (region C)]

P(x < a) = ∫_{−∞}^{a} p(x) dx = F_x(a)    (29)

P(x < b) = ∫_{−∞}^{b} p(x) dx = F_x(b)    (30)

P(a < x < b) = ∫_{a}^{b} p(x) dx    (31)
             = ∫_{−∞}^{b} p(x) dx − ∫_{−∞}^{a} p(x) dx    (32)
Probabilities

[Figure: standard normal PDF (left) and CDF (right) for four examples:
Example 1: a = 0, b = 1, cdf(a) = 0.5, cdf(b) = 0.841
Example 2: a = −1, b = 1, cdf(a) = 0.159, cdf(b) = 0.841
Example 3: a = −2, b = 4, cdf(a) = 0.023, cdf(b) = 1.0
Example 4: a = 4, b = 4.5, cdf(a) = 1.0, cdf(b) = 1.0]
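All four examples reduce to evaluating P(a < x < b) = F_x(b) − F_x(a) with the CDF; a sketch reproducing the first three with scipy.stats.norm:

```python
from scipy import stats

cdf = stats.norm.cdf   # standard normal CDF

# Example 1: a = 0, b = 1
print(round(cdf(0.0), 3), round(cdf(1.0), 3))    # 0.5 0.841
print(round(cdf(1.0) - cdf(0.0), 3))             # P(0 < x < 1) = 0.341

# Example 2: a = -1, b = 1
print(round(cdf(1.0) - cdf(-1.0), 3))            # 0.683

# Example 3: a = -2, b = 4
print(round(cdf(4.0) - cdf(-2.0), 3))            # 0.977
```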
Subsection 6
Percentiles and quantiles
Percentiles and quantiles
Example:
Calculate the 95th percentile of a uniform distribution.
Answer:
The 95th percentile of the uniform distribution U(x; 0, 1) is obtained by firstly
calculating the CDF:

F_x(X) = ∫_0^X 1 dx = X,   0 ≤ X ≤ 1    (36)

Setting F_x(X) = 0.95 and solving gives X = 0.95, i.e. the 95th percentile is 0.95.
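The same answer drops out of scipy's percent-point function (the inverse CDF):

```python
from scipy import stats

# ppf is the inverse CDF, so the 95th percentile of U(0, 1) is ppf(0.95)
print(stats.uniform.ppf(0.95))   # 0.95
```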
Percentiles and quantiles
Example:
Student-t distribution with 4 degrees-of-freedom, mean of 0 and a scaling
parameter of 1:

[Figure: PDF of the Student-t distribution with 4 degrees of freedom]
Percentiles and quantiles
Method 1: Use a table for standardised t-distributions (outdated and tedious)
Figure: https://www.dummies.com/wp-content/uploads/451675.image0.jpg
Remember,

F_x(X_q) = ∫_{−∞}^{X_q} p(x) dx    (37)

is equivalent to

1 − F_x(X_q) = 1 − ∫_{−∞}^{X_q} p(x) dx    (38)
Percentiles and quantiles

[Figure: CDF of the Student-t distribution with the 75th percentile marked;
the 75th percentile is 0.741]
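The marked percentile can be reproduced directly with the inverse CDF:

```python
from scipy import stats

# 75th percentile of a Student-t distribution with 4 degrees of freedom
q75 = stats.t.ppf(0.75, df=4)
print(round(q75, 3))   # 0.741
```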
Percentiles and quantiles

[Figure: PDF of the Student-t distribution]
Subsection 7
Likelihood
Likelihood

x ∼ p(x; θ)    (39)

The likelihood of the model with the parameters θ for the data point x_n is
given by

L(θ|x_n) = p(x_n; θ)    (40)

Example:

[Figure: a PDF with a data point and its likelihood (left); the corresponding
log-PDF with the log-likelihood (right)]

The likelihood measures how well the model agrees with the measured data.
Likelihood
The likelihood values are usually small and therefore the log-likelihood is
usually calculated and reported:

log L(θ|x_n) = log p(x_n; θ)    (41)

If we have multiple data points and the data are independently sampled, we
can calculate the likelihood as follows:

L(θ|x) = ∏_{n=1}^{N} p(x_n; θ)    (42)
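A minimal sketch of both quantities for a Gaussian model (the data points and parameter values are made up for illustration):

```python
import numpy as np
from scipy import stats

# Likelihood and log-likelihood of the model p(x; θ) = N(x; µ, σ²)
# for independently sampled data points
x = np.array([0.3, 1.1, 2.0, 0.7])
mu, sigma = 1.0, 1.0

likelihood = np.prod(stats.norm.pdf(x, loc=mu, scale=sigma))
log_likelihood = np.sum(stats.norm.logpdf(x, loc=mu, scale=sigma))

print(likelihood)        # a small number, shrinking as more data are added
print(log_likelihood)    # equals log(likelihood), but numerically stable
```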
Subsection 8
Transformation of variables
Let x be from a known distribution with a probability density function pX(x)
and a cumulative distribution function F(x). Let

y = ψ(x)    (45)

where ψ is a monotonic function and x = ψ^(−1)(y). Then,

pY(y) = pX(ψ^(−1)(y)) |dx/dy| = pX(ψ^(−1)(y)) |dψ^(−1)(y)/dy|    (46)

What is the probability distribution over

y = a · x + b    (47)

if x ∼ pX(x)?
What is the probability distribution over

y = a · x + b    (48)

if x ∼ pX(x)?

pY(y) = pX(ψ^(−1)(y)) |dx/dy|    (49)

pY(y) = pX((y − b)/a) · (1/|a|)    (50)

If x ∼ N(x | µX, σX²), then

pY(y) = N((y − b)/a | µX, σX²) · (1/|a|)    (51)

which can be written as follows:

pY(y) = N(y | a · µX + b, a²σX²)    (52)
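Equation (52) can be checked by transforming samples; a sketch with arbitrarily chosen values of a, b, µX, and σX:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 2.0, 3.0
mu_x, sigma_x = 1.0, 0.5

# Push samples of x ~ N(mu_x, sigma_x^2) through y = a*x + b and compare the
# empirical moments against N(a*mu_x + b, a^2 * sigma_x^2)
x = rng.normal(mu_x, sigma_x, size=200_000)
y = a * x + b

print(y.mean())   # ≈ a*mu_x + b = 5.0
print(y.std())    # ≈ |a|*sigma_x = 1.0
```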
Simplification of N((y − b)/a | µX, σX²) · (1/|a|):
The Gaussian probability density function over x is given by

N(x | µX, σX²) = 1/√(2πσX²) · exp(−(x − µX)²/(2σX²))    (53)
Example model 1
The equation

x_n = θ + ϵ_n    (59)

where ϵ_n ∼ N(0, σ²), can be written as follows:

x_n ∼ N(θ, σ²)    (60)

Example model 2
The equation

y_n = w0 + w1 · x_n + ϵ_n    (61)

where y_n and x_n are known and ϵ_n ∼ N(0, σ²), can be written as follows:

y_n ∼ N(w0 + w1 · x_n, σ²)    (62)

or

y_n | x_n ∼ N(w0 + w1 · x_n, σ²)    (63)

or

y_n | x_n, w0, w1, σ² ∼ N(w0 + w1 · x_n, σ²)    (64)
Subsection 9
Multivariate distributions
Multivariate distributions

[Figure: contour plots of two bivariate Gaussian densities over (a, b)]

• Left model: µᵀ = [1, 1]; Σ = [[5, 2], [2, 1]]
• Right model: µᵀ = [−1, 1]; Σ = [[1, −1], [−1, 4]]

Note:

∫_x 1/((2π)^(D/2) |Σ|^(1/2)) · exp(−½ (x − µ)ᵀ Σ^(−1) (x − µ)) dx = 1    (66)
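Equation (66) can be checked numerically for the left model; a sketch using a coarse Riemann sum over a grid that covers essentially all of the probability mass (the grid limits are my own choice):

```python
import numpy as np
from scipy import stats

# The left model from the contour plots
mu = np.array([1.0, 1.0])
cov = np.array([[5.0, 2.0], [2.0, 1.0]])
mv = stats.multivariate_normal(mean=mu, cov=cov)

# Approximate the 2D integral of the density with a grid sum
grid = np.linspace(-15.0, 15.0, 301)
aa, bb = np.meshgrid(grid, grid)
pts = np.dstack([aa, bb])
da = grid[1] - grid[0]

print((mv.pdf(pts) * da * da).sum())   # ≈ 1.0, as equation (66) requires
```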
Multivariate distributions
A joint density must integrate to one over all of its variables:

∫_c ∫_b ∫_a p(a, b, c) da db dc = 1    (70)
Multivariate distributions
If b is independent of a, then p(b|a) = p(b) and p(a, b) = p(a)p(b).
Bayes' rule:

p(a|b) = p(b|a) p(a) / p(b)    (73)
Multivariate distributions
Example:
If we go back to the urn example in the introduction . . .

[Figure: the two urns, panels (a) and (b)]
Multivariate distributions

P(U1|B) = P(B|U1)P(U1) / P(B)    (80)
        = (0.2 · 0.5) / 0.4    (81)
        = 0.25    (82)

P(U2|B) = P(B|U2)P(U2) / P(B)    (83)
        = (0.6 · 0.5) / 0.4    (84)
        = 0.75    (85)
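The arithmetic is easy to verify, including that P(B) = 0.4 follows from the law of total probability:

```python
# Bayes' rule with the values from the urn example:
# P(B|U1) = 0.2, P(B|U2) = 0.6, P(U1) = P(U2) = 0.5
p_b_given_u1, p_b_given_u2 = 0.2, 0.6
p_u1 = p_u2 = 0.5

# Law of total probability: P(B) = P(B|U1)P(U1) + P(B|U2)P(U2)
p_b = p_b_given_u1 * p_u1 + p_b_given_u2 * p_u2

print(p_b)                          # 0.4
print(p_b_given_u1 * p_u1 / p_b)    # P(U1|B) = 0.25
print(p_b_given_u2 * p_u2 / p_b)    # P(U2|B) = 0.75
```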
Multivariate distributions

[Figure: contour plot of the left model's bivariate Gaussian density over (a, b)]
Multivariate distributions

[Figure: analytical vs numerical marginal densities p(a) and p(b)]

Numerical:

p(a) = ∫ N(x|µ, Σ) db    (86)
Multivariate distributions
Consider b = 2 and b = −1:

[Figure: the joint density with slices at b = 2.0 and b = −1.0 (top), and the
conditional densities p(a|b = 2.0) and p(a|b = −1.0) (bottom)]
Multivariate distributions
Useful equations:
• Marginal and conditional distributions (I): Equations 2.94-2.98
• Marginal and conditional distributions (II): Equations 2.113-2.117
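As an illustration of those conditional-Gaussian equations, here is a sketch for the left model above, using the standard partitioned-Gaussian result p(a|b) = N(µ_a + Σ_ab/Σ_bb · (b − µ_b), Σ_aa − Σ_ab²/Σ_bb) (the slice values b = 2 and b = −1 match the earlier figure):

```python
import numpy as np

# Left model: mu = [1, 1], Sigma = [[5, 2], [2, 1]]
mu = np.array([1.0, 1.0])
S = np.array([[5.0, 2.0], [2.0, 1.0]])

def condition_on_b(b):
    """Mean and variance of the 1D conditional p(a | b)."""
    mean = mu[0] + S[0, 1] / S[1, 1] * (b - mu[1])
    var = S[0, 0] - S[0, 1] ** 2 / S[1, 1]
    return mean, var

print(condition_on_b(2.0))    # (3.0, 1.0)
print(condition_on_b(-1.0))   # (-3.0, 1.0)
```

Note that the conditional variance does not depend on the observed value of b; only the conditional mean shifts.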
Subsection 10
Expected values
Expected values
Consider a dartboard:

[Figure: a dartboard centred at the origin]

Four people are throwing darts at the board to see who can throw the most
accurately. The aim is to throw the darts as close as possible to the centre of the
board. Each person gets 10 throws.
Expected values

[Figure: scatter plots of the 10 throws for Attempts 1-4]

Summary:
• Attempt 1: Close to the centre, low variance
• Attempt 2: Far from the centre, low variance
• Attempt 3: Close to the centre, high variance
• Attempt 4: Far from the centre, high variance
Ideally, we want the darts close to the centre, with a low variance.
Expected values
If we repeat this exercise but draw 10^6 samples instead, what would be
different?

[Figure: histogram of the samples]
Expected values

[Figure: 10 samples x_i plotted against sample number]

Note: g(x) denotes an arbitrary function of the data, e.g.
g(x) = (1/N) Σ_i x_i,  g(x) = (1/N) Σ_i |x_i|,  or  g(x) = x_1.
Expected values

[Figure: the same 10 samples with g(x) = 1.5675 superimposed]

If we generate data again, what do we expect would happen with the data and
g(x)?
Expected values
The data (left) and the data with the corresponding g(x) (right) for 10 repeats:

[Figure: the 10 repeated datasets and the corresponding g(x) for each repeat]

If we repeat this process 100,000 times, then g(x) has the following behaviour:

[Figure: the g(x) values across the 100,000 repeats]
Expected values

[Figure: histogram of the data x (left) and histogram of g(x) (right)]
Expected values
The expected value quantifies what would happen if we could repeat and
average an infinite number of experiments.
Example: (Continued)
The (approximate) expected value of g(x) is superimposed on g(x) for each
run:

[Figure: g(x) for each run with E{g(x)} superimposed]

The expected value of g(x), denoted E{g(x)}, is the average of g(x) over an
infinite number of experiments.
Expected values
Definition:
The expected value of a function of a random variable x, denoted g(x), is given
by

E_{x∼p(x)}{g(x)} = ∫_{−∞}^{∞} g(x) p(x) dx    (88)

Notation:
• E_{x∼p(x)} denotes the expected value where x has a PDF of p(x).
• E_{x∼p(x)}{g(x)} is therefore the expected value of g(x) where x ∼ p(x).
• Sometimes x ∼ p(x) is omitted from the notation.
The expected value tells us on average how far we are from the centre of the
dart board.
Definition:
The variance of g(x) tells us the dispersion of g(x) under the distribution
x ∼ p(x):

var_{x∼p(x)}{g(x)} = ∫_{−∞}^{∞} (g(x) − E_{x∼p(x)}{g(x)})² p(x) dx    (92)

The variance tells us how far the different throws are from the average value of
the throws.
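Both definitions can be approximated from samples; a sketch with the arbitrary choice g(x) = x² for x ∼ N(0, 1), where analytically E{g(x)} = 1 and var{g(x)} = 2:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample-based approximation of E{g(x)} and var{g(x)} for g(x) = x^2
x = rng.normal(0.0, 1.0, size=500_000)
g = x ** 2

print(g.mean())   # ≈ E{x^2} = 1.0
print(g.var())    # ≈ var{x^2} = E{x^4} - E{x^2}^2 = 3 - 1 = 2.0
```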
Expected values
Example: (Continued)
The expected value and the variance of g(x) are quantified:

[Figure: g(x) for each run with E{g(x)} and E{g(x)} ± 3 · std{g(x)} superimposed]

Therefore, even though the data are random, we have a bound on g(x)!
What is the expected value and variance of

y = a · x + b    (93)

if x ∼ N(x | µX, σX²)?

The expected value is given by:

E{y} = E{a · x + b} = a · E{x} + b = a · µX + b

and, similarly, the variance is var{y} = a² · σX².
Subsection 11
Monte Carlo integration
Monte Carlo integration
Integrals of the form ∫ f(x) p(x) dx can be approximated with samples:

∫ f(x) p(x) dx ≈ (1/N) Σ_{n=1}^{N} f(x_n),   where x_n ∼ p(x)

Therefore, we need to sample from p(x) and then calculate the mean of f(x),
evaluated at the samples.
Monte Carlo integration
Exercise:
Calculate the integral

∫_{−∞}^{∞} N(x; 1, 2²) x dx    (101)
Monte Carlo integration
Exercise:
Calculate the integral

∫_{−∞}^{∞} N(x; 1, 2²) x dx    (102)

• scipy.integrate.quad²: 1.0
• Monte Carlo integration³: 1.0004166378896788 ± 0.020921105045549167

² The integral is not performed from −∞ to ∞. Instead, a large domain that covers the
majority of the probability mass of p(x) is used in the integration.
³ Only a single scalar estimate is obtained when performing Monte Carlo integration. The
Monte Carlo estimate is a random variable, as we are using randomly sampled values from p(x) to
estimate the integral. This means that each time we generate new samples, a new estimate is
obtained. The estimate is unbiased, and its variance depends on the number of samples used. In
this example, the mean and the standard deviation of the estimates are shown to highlight that
the Monte Carlo estimate is random.
Monte Carlo integration
Exercise:
Calculate the integral

∫_{−∞}^{∞} N(x; 1, 2²) x⁴ dx    (103)

• scipy.integrate.quad: 73.00
• Monte Carlo integration: ≈ 73
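Both exercises follow the same recipe: sample from p(x) = N(x; 1, 2²) and average f at the samples. A sketch (the sample size is my own choice; the x⁴ estimate has a noticeably larger variance than the x estimate):

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo integration: int f(x) p(x) dx ≈ (1/N) sum f(x_n), x_n ~ p(x)
# Here p(x) = N(x; 1, 2^2), so sample x and average f at the samples.
x = rng.normal(loc=1.0, scale=2.0, size=1_000_000)

print(np.mean(x))        # ≈ 1:  int N(x; 1, 4) x dx
print(np.mean(x ** 4))   # ≈ 73: int N(x; 1, 4) x^4 dx
```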
Subsection 12
Conclusion