Quant 2020 Lecture 3 FT

The document outlines a lecture plan on quantitative methods that includes discussing random variables, mean, standard deviation, correlation, binomial distribution, and central limit theorem. It provides examples of continuous and discrete random variables, such as rainfall amount and aircraft orders. It also discusses calculating the mean, standard deviation, and probability of random variables like the face of a die, battery lifespan, and the sum of two dice rolls.

Uploaded by

Barrio Bravo

Quantitative Methods

13th October (Fall 2020)


Lecture Plan for today

1. Random Variables

2. Mean & Standard Deviation

3. Correlation

4. Binomial Distribution

5. Central Limit Theorem

6. Prelude to Statistical Sampling


Random Variables

Will it rain or not tomorrow? Probability P(rain) is always between 0 and 1.

If we instead ask how much rain we will get tomorrow, the answer can vary from 0mm (no rain) to over 200mm (monsoon).

We refer to this uncertain quantity (of rain) as a random variable.

Random variables can be either continuous or discrete.


Example #1 - rain projection for Oct 2020 in KL, Malaysia

Continuous random variable, since rainfall can take any value ≥ 0mm
Source: https://www.data.gov.my/data/en_US/dataset/rainfall-2020-to-2040-ccsm3a1b
Example #2 - projected aircraft orders for next year

Discrete random variable, since aircraft orders can only take integer values
Source: DMD Figure 2.3
Orders for General Avionics Aircraft    Probability    Cumulative Probability
42                                      5%             5%
43                                      10%            15%
44                                      15%            30%
45                                      20%            50%
46                                      25%            75%
47                                      15%            90%
48                                      10%            100%

1. What is the mode? 46
2. What is the median? 45
3. What is the mean? 42×5% + 43×10% + … + 48×10% = 45.35

Link to Random variables sheet

P(Orders ≤ 46) = 75%
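
The three questions on the aircraft-orders slide can be checked with a short Python sketch. The variable names are illustrative only, and the median uses one common convention (the smallest value whose cumulative probability reaches 50%), which is an assumption rather than something the slides state.

```python
# Projected aircraft orders and their probabilities (from the table above).
orders = [42, 43, 44, 45, 46, 47, 48]
probs = [0.05, 0.10, 0.15, 0.20, 0.25, 0.15, 0.10]

# Mean: probability-weighted average of the outcomes.
mean = sum(p * x for x, p in zip(orders, probs))

# Mode: the outcome with the highest probability.
mode = max(zip(orders, probs), key=lambda pair: pair[1])[0]

# Cumulative probabilities P(Orders <= x), built as a running sum.
cumulative = []
running = 0.0
for p in probs:
    running += p
    cumulative.append(running)

# Median (assumed convention): smallest x with P(Orders <= x) >= 50%.
# The small tolerance guards against floating-point rounding in the running sum.
median = next(x for x, c in zip(orders, cumulative) if c >= 0.5 - 1e-12)

print(f"mean = {mean:.2f}, mode = {mode}, median = {median}")
print(f"P(Orders <= 46) = {cumulative[orders.index(46)]:.0%}")
```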


Example #3 - Face of Die

Face of Die    Probability    Cumulative Probability
1              16.7%          16.7%
2              16.7%          33.3%
3              16.7%          50.0%
4              16.7%          66.7%
5              16.7%          83.3%
6              16.7%          100.0%

Each face has probability 1/6 = 16.7%
Mean = 3.5
P(Die ≤ 3) = 50%
Example #4 - car battery lifespan

A standard car battery lasts about two years, but this “gel cell” battery lasts between four and six years.
Example #4 - car battery lifespan
Probability Density Function (PDF)

What is the probability of this battery lasting 4.5 years?

● P(battery = 4.5) = 0
● P(battery ≥ 4.5) = 1.5/2.0 = 3/4 or 75%

[Figure: flat (uniform) density between 4.0 and 6.0 years with total area = 1; the region from 4.5 to 6.0 has width 6.0 - 4.5 = 1.5 out of the full width 6.0 - 4.0 = 2.0]

Example #4 - car battery lifespan
Cumulative Distribution Function (CDF)

P(X ≤ 4.5) = 0.25 => P(X ≥ 4.5) = 0.75
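
As a rough check of the battery example, the sketch below assumes the lifespan is uniformly distributed between 4.0 and 6.0 years, as the flat density on the slide suggests. The function name `uniform_cdf` is a hypothetical helper, not something from the lecture.

```python
def uniform_cdf(x, low=4.0, high=6.0):
    """P(X <= x) for X uniformly distributed on [low, high]."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    # Area of the rectangle from low to x, relative to the total area of 1.
    return (x - low) / (high - low)


p_at_most = uniform_cdf(4.5)      # P(X <= 4.5)
p_at_least = 1.0 - p_at_most      # P(X >= 4.5); a single point has probability 0

print(f"P(X <= 4.5) = {p_at_most:.2f}, P(X >= 4.5) = {p_at_least:.2f}")
```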


Summary of Random Variables (that we have seen)

Continuous
● Normal: birth weight of newborn babies
● Uniform: car battery lifespan

Discrete
● Uniform: roll of one die
● Binomial: to be discussed after the break!
Mean & Standard Deviation
Face of Die
X := face value of die

Face of Die    Probability
1              16.7%
2              16.7%
3              16.7%
4              16.7%
5              16.7%
6              16.7%

Mean (or Expectation)
E[X] := p₁x₁ + … + pₙxₙ = 3.5

Variance
VAR[X] := p₁(x₁-E[X])² + … + pₙ(xₙ-E[X])² = 2.9

Standard Deviation
SD[X] := sqrt(VAR[X]) = 1.7

Link to Random variables sheet
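
The definitions of mean, variance and standard deviation can be applied directly to the die. This is a minimal Python sketch; the variable names are illustrative.

```python
from math import sqrt

faces = [1, 2, 3, 4, 5, 6]
p = 1 / 6  # each face of a fair die is equally likely

# E[X] = p1*x1 + ... + pn*xn
mean = sum(p * x for x in faces)

# VAR[X] = p1*(x1 - E[X])^2 + ... + pn*(xn - E[X])^2
variance = sum(p * (x - mean) ** 2 for x in faces)

# SD[X] = sqrt(VAR[X])
sd = sqrt(variance)

print(f"E[X] = {mean:.1f}, VAR[X] = {variance:.1f}, SD[X] = {sd:.1f}")
```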


Linear Functions of Random Variables

E[aX+b] = aE[X] + b

SD[aX+b] = |a|SD[X]

Questions:
1. If everyone gets an extra 10 points on the final exam, how will it affect the mean & standard deviation?
   New score = X + 10, so a = 1 and b = 10: the mean goes up by 10 and the SD is unchanged.
2. If everyone is given a 10% bonus to their final exam score, how will it affect the mean & standard deviation?
   New score = 1.1X, so a = 1.1 and b = 0: the mean goes up by 10% and the SD goes up by 10%.
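
Both exam questions can be tried out numerically. The scores below are made up purely for illustration, and `summarize` is a hypothetical helper. The sketch treats the list as the whole population, hence `pstdev`.

```python
from statistics import fmean, pstdev

# Hypothetical final exam scores (not from the lecture).
scores = [55.0, 62.0, 70.0, 78.0, 85.0]


def summarize(values):
    """Return (mean, population standard deviation) of a list of numbers."""
    return fmean(values), pstdev(values)


mean, sd = summarize(scores)

# Question 1: everyone gets +10 points (a = 1, b = 10).
mean_plus, sd_plus = summarize([x + 10 for x in scores])

# Question 2: everyone gets a 10% bonus (a = 1.1, b = 0).
mean_bonus, sd_bonus = summarize([1.1 * x for x in scores])

print(f"original : mean = {mean:.2f}, SD = {sd:.2f}")
print(f"+10 pts  : mean = {mean_plus:.2f}, SD = {sd_plus:.2f}")
print(f"+10%     : mean = {mean_bonus:.2f}, SD = {sd_bonus:.2f}")
```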
Expectations (or Mean) add up linearly

Let X and Y be two random variables, then:

E[X + Y] = E[X] + E[Y]

Or in general
W := aX + bY (where a and b are any given numbers):

E[W] = E[aX+bY] = aE[X] + bE[Y]


But Standard Deviations do not

Let X and Y be two independent random variables, then:

SD[X + Y]² = SD[X]² + SD[Y]²

Or in general for any X and Y, then

W := aX + bY (where a and b are any given numbers):

SD[W]² = a²SD[X]² + b²SD[Y]² + 2ab·SD[X]·SD[Y]·CORR[X,Y]
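
The general formula can be checked on a small example. The joint distribution and the weights a and b below are hypothetical, chosen only so that X and Y are correlated. The sketch computes SD[W]² directly and via the formula, and the two should agree.

```python
from math import sqrt

# Hypothetical joint distribution of (X, Y): {(x, y): probability}.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
a, b = 2.0, 3.0  # arbitrary weights in W = aX + bY


def expect(f):
    """Expectation of f(x, y) under the joint distribution."""
    return sum(p * f(x, y) for (x, y), p in joint.items())


ex = expect(lambda x, y: x)
ey = expect(lambda x, y: y)
sd_x = sqrt(expect(lambda x, y: (x - ex) ** 2))
sd_y = sqrt(expect(lambda x, y: (y - ey) ** 2))
corr = expect(lambda x, y: (x - ex) * (y - ey)) / (sd_x * sd_y)

# Left-hand side: variance of W = aX + bY computed from its definition.
ew = expect(lambda x, y: a * x + b * y)
var_w_direct = expect(lambda x, y: (a * x + b * y - ew) ** 2)

# Right-hand side: a^2 SD[X]^2 + b^2 SD[Y]^2 + 2ab SD[X] SD[Y] CORR[X,Y].
var_w_formula = a**2 * sd_x**2 + b**2 * sd_y**2 + 2 * a * b * sd_x * sd_y * corr

print(f"CORR[X,Y] = {corr:.2f}")
print(f"SD[W]^2 direct = {var_w_direct:.4f}, via formula = {var_w_formula:.4f}")
```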
Sum of Two Independent Dice (e.g. see the Settlers of Catan game!)

Y := sum of two independent dice rolls
Y = X1 + X2 where X1, X2 are the face values of die 1 and die 2, respectively.

Sum of Two Dice    Probability
2                  2.8%
3                  5.6%
4                  8.3%
5                  11.1%
6                  13.9%
7                  16.7%
8                  13.9%
9                  11.1%
10                 8.3%
11                 5.6%
12                 2.8%

Mean E[Y] = 7
Standard Deviation SD[Y] = 2.4
Link to Random variables sheet

Verify that E[Y] = E[X1] + E[X2] and that SD[Y]² = SD[X1]² + SD[X2]²:
E[Y] = E[X1] + E[X2] = 2 × 3.5 = 7 ✓
SD[Y]² = 2 × SD[X]², i.e. 2.4² ≈ 2 × 1.7² ✓
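
The two-dice table and the verification step can be reproduced by listing all 36 equally likely outcomes. This is a short Python sketch with illustrative variable names.

```python
from collections import Counter
from itertools import product
from math import sqrt

# Count how many of the 36 (die1, die2) outcomes give each sum.
counts = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))
pmf = {total: n / 36 for total, n in sorted(counts.items())}

mean_y = sum(p * y for y, p in pmf.items())
var_y = sum(p * (y - mean_y) ** 2 for y, p in pmf.items())

# Variance of a single die, for the check SD[Y]^2 = SD[X1]^2 + SD[X2]^2.
var_x = sum((x - 3.5) ** 2 for x in range(1, 7)) / 6

for total, p in pmf.items():
    print(f"{total:>2}  {p:.1%}")
print(f"E[Y] = {mean_y:.1f}, SD[Y] = {sqrt(var_y):.1f}")
print(f"SD[Y]^2 = {var_y:.3f}, 2 x SD[X]^2 = {2 * var_x:.3f}")
```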
Correlation
Correlation is always between +1 and -1

[Figure: scatter plots with correlations of +1.0, +0.9, +0.5, 0.0, -0.5, -0.9 and -1.0]


Example of Positive Correlation

Source: https://www.mathsisfun.com/data/correlation.html
Example of Negative Correlation

Source: DMD Figure 2.6


Correlation does not imply causation

Source: DMD Figure 2.6


Ice Cream Sales vs Temperature (10-45°C)

Source: https://www.mathsisfun.com/data/correlation.html
Menti Poll #1

What is the correlation between Ice Cream Sales & Temperature (10-45°C)?

● Negative
● Around Zero
● Positive
Correlation only detects linear relationships

CORR(Sales,Temp) = 0.0!

Source: https://www.mathsisfun.com/data/correlation.html
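
To see how a strong but non-linear relationship can give a correlation of about zero, here is a small sketch. The temperature and sales figures are invented for illustration (sales rise up to a peak and then fall as it gets too hot); they are not the data behind the slide. `pearson` is a hand-written helper for the usual correlation coefficient.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical data: temperatures 10..45 C, sales peaking at 27.5 C.
temps = list(range(10, 46, 5))
sales = [400 - (t - 27.5) ** 2 for t in temps]

corr_all = pearson(temps, sales)            # whole range: rise then fall
corr_cool = pearson(temps[:4], sales[:4])   # cooler half only: sales rising

print(f"CORR over 10-45 C : {corr_all:.2f}")
print(f"CORR over 10-25 C : {corr_cool:.2f}")
```

Over the full range the rising and falling halves cancel out, while on the cooler half alone the relationship is close to linear and the correlation is strongly positive.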
Binomial Distributions
2020 US Elections - Biden vs Trump

Source: https://projects.fivethirtyeight.com/polls/president-general/national/
2020 US Elections - Biden vs Trump

Latest (11th Oct) polls show:

Biden 52.4%, Trump 41.9% => Undecided 5.7%

Suppose US voters' preference on election day is as follows:

Biden 60%, Trump 40%

If we sample 10 US voters at random, what will we get?


Menti Poll #2

How many Trump voters do we expect in a random sample of 10 US voters? Assume US voters' preference is 60% Biden and 40% Trump.

● Less than 3
● 3
● 4
● 5
● More than 5
Define X := # Trump voters (in sample of 10 US voters)

P(X = 0) = P(BBB…B) = 0.6¹⁰ = 0.6%

P(X = 1) = P(TB…B) + … + P(B…BT) = 10 × 0.4 × 0.6⁹ = 4.0%

P(X = 9) = ?

P(X = 10) = P(TTT…T) = 0.4¹⁰ = 0.0%

Link to Binomial distributions sheet
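
The counting argument above (add up the probabilities of every sequence with the same number of T's) can be carried out by brute force over all 2¹⁰ = 1,024 sequences. This is an illustrative sketch only; the constant and variable names are not from the lecture.

```python
from itertools import product

P_TRUMP = 0.4  # assumed probability that a sampled voter prefers Trump
N = 10         # sample size

# prob_by_count[k] accumulates P(X = k) over all sequences with k T's.
prob_by_count = [0.0] * (N + 1)
for seq in product("BT", repeat=N):
    k = seq.count("T")
    # Voters are sampled independently, so sequence probabilities multiply.
    prob_by_count[k] += P_TRUMP**k * (1 - P_TRUMP) ** (N - k)

for k, prob in enumerate(prob_by_count):
    print(f"P(X = {k:>2}) = {prob:.1%}")
```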
X := # Trump voters (in sample of 10 US voters)

# of Trump voters    Probability    Cumulative Probability
0                    0.6%           0.6%
1                    4.0%           4.6%
2                    12.1%          16.7%
3                    21.5%          38.2%
4                    25.1%          63.3%
5                    20.1%          83.4%
6                    11.1%          94.5%
7                    4.2%           98.8%
8                    1.1%           99.8%
9                    0.2%           100.0%
10                   0.0%           100.0%

P(X = 4) = BINOM.DIST(4,10,0.4,FALSE) = 25.1%
P(X ≤ 4) = BINOM.DIST(4,10,0.4,TRUE) = 63.3%
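
For readers without a spreadsheet, the two BINOM.DIST calls can be imitated in Python. The function `binom_dist` below is a hypothetical stand-in that mirrors the argument order used on the slide; it is a sketch, not the spreadsheet's actual implementation.

```python
from math import comb


def binom_dist(k, n, p, cumulative):
    """Binomial probability: P(X = k), or P(X <= k) if cumulative is True."""

    def pmf(i):
        # Number of ways to place i successes among n trials, times the
        # probability of any one such arrangement.
        return comb(n, i) * p**i * (1 - p) ** (n - i)

    if cumulative:
        return sum(pmf(i) for i in range(k + 1))
    return pmf(k)


print(f"P(X = 4)  = {binom_dist(4, 10, 0.4, False):.1%}")
print(f"P(X <= 4) = {binom_dist(4, 10, 0.4, True):.1%}")
```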
X := # Trump voters (in sample of 10 US voters)
[Figures: distribution of X with P(X=4) highlighted, then with P(X≤4) highlighted]
X := # Trump voters (in sample of 10 US voters)

Mean E[X] = 4
Standard Deviation SD[X] = 1.5
Median = 4 (P(X ≤ 4) = 63.3% is the first cumulative probability to reach 50%)
Link to Binomial distribution sheet
Mean & Standard Deviation of Binomial Distributions

If X ~ BINOM(n,p), where n is the sample size and p is the probability of “success”, then:

E[X] = np = n × E[single voter]

SD[X] = sqrt(np(1-p)) = sqrt(n) × SD[single voter]
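
As a sanity check, the shortcut formulas can be compared with the mean and standard deviation computed from the full distribution for n = 10 and p = 0.4. This is a minimal sketch with illustrative variable names.

```python
from math import comb, sqrt

n, p = 10, 0.4

# Full binomial distribution P(X = k) for k = 0..n.
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Mean and SD from the general definitions.
mean = sum(k * q for k, q in enumerate(pmf))
sd = sqrt(sum(q * (k - mean) ** 2 for k, q in enumerate(pmf)))

print(f"from distribution: E[X] = {mean:.1f}, SD[X] = {sd:.1f}")
print(f"from formulas    : np = {n * p:.1f}, sqrt(np(1-p)) = {sqrt(n * p * (1 - p)):.1f}")
```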
Central Limit Theorem
Binomial Distribution ≅ Normal Distribution (for large n)
[Figure: binomial distributions for sample size n = 15 and sample size n = 50]

If X ~ BINOM(n,p) and both np ≥ 5 and n(1-p) ≥ 5, then X is approximately Normal: X ≅ Y ~ N(μY,σY) with μY = μX = np and σY = σX = sqrt(np(1-p))

E.g. n = 15, p = 0.4: np = 15×0.4 = 6 ≥ 5 and n(1-p) = 15×0.6 = 9 ≥ 5
Example of Normal approximation

X := # Trump voters in sample size n = 15

P(X≤6) = BINOM.DIST(6,15,0.4,TRUE) = 61.0%

Both np = 15*0.4 = 6 and n(1-p) = 15*0.6 = 9 are ≥ 5

μX = np = 6, σX = sqrt(15*0.4*0.6) = 1.9

Hence, X ≅ Y ~ N(6,1.9)

P(X≤6) ≅ P(Y<6) = P(Z<0.0) = 0.5 ???

This is well below the exact 61.0%.
Example of Normal approximation (with continuity correction)

X := # Trump voters in sample size n = 15

P(X≤6) = BINOM.DIST(6,15,0.4,TRUE) = 61.0%

Both np = 15*0.4 = 6 and n(1-p) = 15*0.6 = 9 are ≥ 5

μX = np = 6, σX = sqrt(15*0.4*0.6) = 1.9

Hence, X ≅ Y ~ N(6,1.9)

P(X≤6) ≅ P(Y<6.5) = P(Z<(6.5-6)/1.9) = P(Z<+0.26) = 60.4%, close to the exact 61.0%

Continuity Correction Factor Table

Source: https://www.statisticshowto.com/what-is-the-continuity-correction-factor/
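
The effect of the continuity correction in the n = 15 example can be reproduced with the standard library alone. In this sketch `normal_cdf` is a hand-written helper built on `math.erf`; the names are illustrative.

```python
from math import comb, erf, sqrt


def normal_cdf(x, mu, sigma):
    """P(Y <= x) for Y ~ Normal(mu, sigma), via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


n, p, k = 15, 0.4, 6
mu = n * p
sigma = sqrt(n * p * (1 - p))

# Exact binomial probability P(X <= 6).
exact = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Normal approximation without and with the +0.5 continuity correction.
naive = normal_cdf(k, mu, sigma)
corrected = normal_cdf(k + 0.5, mu, sigma)

print(f"exact binomial       : {exact:.1%}")
print(f"normal, no correction: {naive:.1%}")
print(f"normal, P(Y < 6.5)   : {corrected:.1%}")
```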
Prelude to Sampling Statistics
Menti Poll #3

Over 100M votes are projected to be cast in the upcoming US 2020 elections. Suppose the winner is determined by popular vote (which is not the actual case). How many voters does one need to sample at random in order to determine the winner with 99% confidence?

● About 1,000 or less
● About 10,000
● About 100,000
● About 1,000,000 or more
Can random sampling “inform” us of the election results?

With P(Trump) = 0.4, a sample points to the right winner when Trump voters make up less than half of it:

Sample size of 10:
P(X ≤ 4) = BINOM.DIST(4,10,0.4,TRUE) = 63.3%

Sample size of 100:
P(X ≤ 49) = BINOM.DIST(49,100,0.4,TRUE) = 97.3%

Sample size of 1,000:
P(X ≤ 499) = BINOM.DIST(499,1000,0.4,TRUE) = 100.0%

Even with P(Trump) = 0.45 & sample size of 1,000:
P(X ≤ 499) = BINOM.DIST(499,1000,0.45,TRUE) = 99.9%
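
The sample-size comparison can be recomputed with exact binomial sums. `binom_cdf` and `prob_minority` are hypothetical helper names, and the sketch assumes the same simplified setting as the slide (independent random draws, a fixed true share of Trump voters).

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ BINOM(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def prob_minority(n, p):
    """Probability that fewer than half of n sampled voters are 'successes'."""
    # For even n this is P(X <= n/2 - 1), e.g. P(X <= 4) when n = 10.
    return binom_cdf((n - 1) // 2, n, p)


for n in (10, 100, 1000):
    print(f"n = {n:>5}, P(Trump) = 0.40: {prob_minority(n, 0.4):.1%}")
print(f"n = {1000:>5}, P(Trump) = 0.45: {prob_minority(1000, 0.45):.1%}")
```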
E.g. Singapore General Election Sample Count

Source: https://www.eld.gov.sg/mediarelease/SampleCount_Generic.pdf
Accuracy of 2020 Singapore GE sample counts

Sample counts predicted the winner of all 31 constituencies correctly!

Approx. 100,000 sample counts (4%) vs 2,500,000 total votes cast
What we have learned so far
[Diagram: given the actual population proportion p (between 0 and 1), the sample outcome follows a Binomial distribution ≅ Normal distribution (for large n)]
What we will learn in Lecture 5 (the following Tue 27th Oct)
What can we infer about the actual population given sample statistics?
[Diagram: given the sample proportion p (between 0 and 1), what is the actual population proportion?]
Next lecture this Thu 15th Oct will be fully online (Zoom)

Problem-Solving Recitation (Part I) & Office Hours (Part II)

Attempt your HW problems & come prepared with questions!
