Stats 101 - Class 01
Bryon Aragam
Chicago Booth
[email protected]
https://canvas.uchicago.edu/courses/43775/
Suggested Reading:
Naked Statistics, Chapters 1, 2, 3, 5, 5.5 and 6
OpenIntro Statistics, Chapters 2, 3, and 4
1 / 110
Reminders
Quick reminder: This is not a math class, but some things are
expected:
I Manipulating equations, square roots, solving linear equations
($y = mx + b$), sums $\sum_i$
I e.g. if $y = a^2 u + b^2 v + 2\sqrt{z}$, solve for $u$, $v$, or $z$
I e.g. if $z = \frac{xy}{ab + cd}$, solve for $a$, $b$, $c$, or $d$
I Averages, proportions, percentages, etc.
1 / 110
Why Probability?
2 / 110
Kidney stones
Treatment A Treatment B
3 / 110
Kidney stones
Treatment A Treatment B
5 / 110
Ad targeting
It will cost you $0.80 (eighty cents) to run the promotion, and a
customer spends $40 if they respond to the promotion.
7 / 110
Probability Basics
8 / 110
Introduction
9 / 110
Randomness and outcomes
10 / 110
Events
11 / 110
Random Variables
12 / 110
Example: Baseball
13 / 110
The language of probability
Summary:
I Events are collections of outcomes
I Random variables make it easy to describe complicated
events
I Data collection entails randomness and possible outcomes
Of course...
I This is all very formal, and you probably won’t find yourself
using this language very often in board meetings...
I ...but we need to settle on a common language so we’re all on
the same page
14 / 110
Probability
16 / 110
The Four Rules of Probability
18 / 110
Conditional probability
Intuitively, P(A|B) means: In an alternative world where we know
that B has occurred (or will occur), how does P(A) change?
$$P(A \mid B) = \frac{P(A, B)}{P(B)}.$$
In 2019, United Airlines set a 2020 earnings goal of $11 to $13 per
share. Let’s say they felt there was a 95% probability of hitting
this target:
20 / 110
Understanding conditional probability
Let A = earnings of $11 to $13 per share in 2020. Many other
events can affect whether A occurs:
I B1 = United buys Southwest; P(B1) = ?
I B2 = It rains at 6:24PM sometime in May; P(B2) = ?
I B3 = A major network outage grounds all United planes for
48 hours; P(B3) = ?
I B4 = A global pandemic quarantines the world’s population
for months (or years); P(B4) = ?
21 / 110
Understanding conditional probability
23 / 110
Reality vs. imagination
Be careful: Only true for independent events, and only true with
“and” (not “or”)!
Exercise: Can you see why rule #4 and independence imply this
formula?
26 / 110
Independence and statistics
27 / 110
Pete Rose Hitting Streak
We now need to find $P(A_i)$: What are all the ways to get at least one hit
in 4 tries? This is hard. Instead, how many ways are there to get
NO hits in 4 tries? Exactly one: All misses (i.e. 0000).
Now what? It turns out that there are 177,100 different sequences
of 25 games where the Patriots win 19... it turns out each
potential sequence has probability $0.5^{25}$ (why?)
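Both counting arguments can be checked numerically. The sketch below is illustrative only: the per-try hit probability `p_hit = 0.3` is an assumed value (the slide text does not state one), and each game is assumed to be an independent coin flip with win probability 0.5.

```python
from math import comb

# Assumed per-try hit probability (hypothetical; not given in the slides).
p_hit = 0.3

# Complement rule: P(at least one hit in 4 tries) = 1 - P(no hits in 4 tries).
# There is exactly one all-miss sequence (0000), with probability (1 - p_hit)^4.
p_no_hits = (1 - p_hit) ** 4
p_at_least_one = 1 - p_no_hits

# Number of distinct win/loss sequences of 25 games with exactly 19 wins.
n_sequences = comb(25, 19)

# Assuming independent games with win probability 0.5, every specific
# sequence of 25 results has the same probability.
p_each_sequence = 0.5 ** 25
p_19_wins = n_sequences * p_each_sequence

print(f"P(at least one hit in 4 tries) = {p_at_least_one:.4f}")
print(f"Sequences with 19 wins in 25 games = {n_sequences:,}")
print(f"P(exactly 19 wins in 25 games) = {p_19_wins:.5f}")
```

Counting the complement (one all-miss sequence) is far easier than enumerating every sequence with at least one hit, which is why the slide takes that route.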
30 / 110
Probability distributions
31 / 110
Probability Distributions
Examples:
I Voting: 0 = Candidate A, 1 = Candidate B
I Product testing: 0 = safe, 1 = defective / unsafe
I Yes/no: 0 = no, 1 = yes
33 / 110
Conditional, Joint and Marginal Distributions
34 / 110
Conditional, Joint and Marginal Distributions
There are two different ways to discuss the probabilities of two events:
I Joint probability P(A and B): I don’t know if either event has
occurred.
I Conditional probability P(A|B): I know for sure that B has occurred
(or will occur).
35 / 110
Conditional, Joint and Marginal Distributions
36 / 110
Conditional probability tables
S    P(S|E = 1)    P(S|E = 0)
1    0.05          0.20
2    0.20          0.30
3    0.50          0.30
4    0.25          0.20
37 / 110
Computing joint probabilities
39 / 110
Probability tree diagrams
40 / 110
Joint probability tables
$$P(X = x) = \sum_{y} P(X = x, Y = y).$$
$$P(S = 4 \mid E = 1) = \frac{P(S = 4, E = 1)}{P(E = 1)} = \frac{0.175}{0.7} = 0.25$$
43 / 110
Conditional, Joint and Marginal Distributions
$$P(E = 1 \mid S = 4) = \frac{P(S = 4, E = 1)}{P(S = 4)} = \frac{0.175}{0.235} = 0.745$$
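The joint, marginal, and conditional calculations on these slides can be reproduced in a few lines. This is a minimal sketch that uses P(E = 1) = 0.7 and the two conditional tables for S from the slides; the variable names are illustrative.

```python
# Conditional distributions of S given E, taken from the slide tables.
p_s_given_e1 = {1: 0.05, 2: 0.20, 3: 0.50, 4: 0.25}
p_s_given_e0 = {1: 0.20, 2: 0.30, 3: 0.30, 4: 0.20}
p_e1 = 0.7
p_e0 = 1 - p_e1

# Joint probabilities: P(S = s, E = e) = P(E = e) * P(S = s | E = e).
joint = {}
for s in p_s_given_e1:
    joint[(s, 1)] = p_e1 * p_s_given_e1[s]
    joint[(s, 0)] = p_e0 * p_s_given_e0[s]

# Marginal of S: sum the joint over the values of E.
p_s = {s: joint[(s, 1)] + joint[(s, 0)] for s in p_s_given_e1}

# Reverse the conditioning: P(E = 1 | S = 4) = P(S = 4, E = 1) / P(S = 4).
p_e1_given_s4 = joint[(4, 1)] / p_s[4]

print(f"P(S = 4, E = 1) = {joint[(4, 1)]:.3f}")
print(f"P(S = 4)        = {p_s[4]:.3f}")
print(f"P(E = 1 | S = 4) = {p_e1_given_s4:.3f}")
```

Note that P(S = 4 | E = 1) = 0.25 and P(E = 1 | S = 4) = 0.745 are different numbers: swapping what is conditioned on changes the question.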
44 / 110
Independence of random variables
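$P(Y = y \mid X = x) = P(Y = y)$ for all $x$ and $y$.
In other words, knowing the value of X tells us nothing about Y, and
the joint probability factors: $P(X = x, Y = y) = P(X = x) P(Y = y)$.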
46 / 110
Disease Testing Example
If you take the test and the result is positive, you are really
interested in the question: Given that you tested positive, what is
the chance you have the disease?
47 / 110
Disease Testing Example
$$P(D = 1 \mid T = 1) = \frac{P(D = 1, T = 1)}{P(T = 1)} = \frac{0.019}{0.019 + 0.0098} = 0.66$$
48 / 110
Bayes Theorem
$$P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{P(X = x, Y = y)}{\sum_{x'} P(X = x', Y = y)} = \frac{P(X = x) P(Y = y \mid X = x)}{\sum_{x'} P(X = x') P(Y = y \mid X = x')}.$$
$$P(D = 1 \mid T = 1) = \frac{0.019}{0.019 + 0.0098} = 0.66$$
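Bayes' theorem for a binary disease D and test T fits in a small function. The inputs below (prevalence 0.02, sensitivity 0.95, false positive rate 0.01) are assumptions: they do not appear in the slide text, and were chosen because they reproduce the joint probabilities 0.019 and 0.0098 used in the calculation. The function name `posterior` is hypothetical.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(D = 1 | T = 1) for a binary disease D and test T."""
    # Numerator: P(D = 1) * P(T = 1 | D = 1) = P(D = 1, T = 1).
    joint_pos_sick = prior * sensitivity
    # Other term of the denominator: P(D = 0) * P(T = 1 | D = 0).
    joint_pos_healthy = (1 - prior) * false_positive_rate
    return joint_pos_sick / (joint_pos_sick + joint_pos_healthy)


# Assumed inputs (hypothetical), consistent with 0.019 and 0.0098.
p_disease_given_positive = posterior(
    prior=0.02, sensitivity=0.95, false_positive_rate=0.01
)
print(f"P(D = 1 | T = 1) = {p_disease_given_positive:.2f}")
```

Under these assumed inputs, a positive result leaves roughly a one-in-three chance of not having the disease, because healthy people vastly outnumber sick ones.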
50 / 110
Causality and experimentation
51 / 110
Interlude: Conditional probability vs causation
52 / 110
Interlude: Conditional probability vs causation
Ordinary relationships:
When we say that two things are related, our language is
deliberately vague: You are not allowed to conclude ANYTHING
about how or why they are related. (See Slide 52.)
Causal relationships:
Causality is a much stronger type of relationship: Two things share
a causal relationship if manipulating one of them changes the
other.
I Non-causal: Ice cream sales and shark attacks
I Causal: Ice cream melting and temperature
55 / 110
Randomized controlled trials
Why random?
57 / 110
Ice cream does not cause shark attacks
Common pitfall: “It’s causal when I want it to be, and it’s spurious
otherwise.” Don’t do this!
(See also: Lucas critique in economics)
59 / 110
Beyond the scope of this course
61 / 110
Probability and Decisions
Suppose you are presented with an investment opportunity in the
development of a drug... probabilities are a vehicle to help us build
scenarios and make decisions.
62 / 110
Probability and Decisions
Revenue        P(Revenue)
$250,000       0.7
$0             0.138
$25,000,000    0.162
So, should we invest or not? Everyone has their own opinions, but
let’s try to analyze this rigorously.
63 / 110
Mean and Variance of a Random Variable
$$E(X) = \sum_{i=1}^{n} P(X = x_i) \times x_i$$
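As a quick illustration, the expectation formula can be applied to the revenue distribution from the drug investment slide. This is a sketch only: it computes expected revenue from the three listed scenarios and does not model the cost of the investment, which the slide text does not give.

```python
# Revenue distribution from the investment slide: (revenue in dollars, probability).
revenue_dist = [(250_000, 0.7), (0, 0.138), (25_000_000, 0.162)]

# Sanity check: the probabilities of all scenarios must sum to 1.
total_prob = sum(p for _, p in revenue_dist)
assert abs(total_prob - 1.0) < 1e-9

# E(X) = sum over outcomes of P(X = x_i) * x_i.
expected_revenue = sum(p * x for x, p in revenue_dist)
print(f"E(Revenue) = ${expected_revenue:,.0f}")
```

The expected value is dominated by the unlikely $25,000,000 scenario, even though the most likely single outcome is $250,000.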
64 / 110
Mean and Variance of a Random Variable
Suppose $X \sim \mathrm{Ber}(p)$.
$$
\begin{aligned}
E(X) &= \sum_{i=1}^{n} P(X = x_i) \times x_i \\
     &= (1 - p) \times 0 + p \times 1 \\
     &= p
\end{aligned}
$$
65 / 110
Managing expectations (pun intended)
Huh?
Example:
I How to price an auto insurance policy?
I At the end of the day, either the policy holder gets in a wreck
or not
I But you don’t charge them the entire cost of a wrecked car!
What does it mean when your weather app says there is a 40%
chance of rain?
This is psychology, not math:
I If there is a 10% chance it rains, and it rains, you will get mad
for not bringing an umbrella
I If there is a 50% chance it rains then the app sounds
wishy-washy
I If there is a 90% chance it rains, and it doesn’t rain, then you
just changed all your plans for nothing
There’s no winning... unless you always report 40% or 60%!
(Let’s come back to this after Section 2. Also, nowadays with modern tools it’s a lot
more sophisticated.)
67 / 110
Mean and Variance of a Random Variable
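The variance is defined as (for a discrete X):
$$\mathrm{var}(X) = \sum_{i=1}^{n} P(X = x_i) \times [x_i - E(X)]^2$$
Suppose $X \sim \mathrm{Ber}(p)$.
$$
\begin{aligned}
\implies \mathrm{var}(X) &= \sum_{i=1}^{n} P(X = x_i) \times [x_i - E(X)]^2 \\
&= (1 - p) \times (0 - p)^2 + p \times (1 - p)^2 \\
&= p(1 - p) \times [(1 - p) + p] \\
&= p(1 - p)
\end{aligned}
$$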
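The Bernoulli result can be checked numerically against the general definition. In this minimal sketch the helper names `mean` and `variance` and the value `p = 0.3` are illustrative assumptions, not part of the slides.

```python
from math import sqrt


def mean(dist):
    """E(X) for a discrete distribution given as (value, probability) pairs."""
    return sum(p * x for x, p in dist)


def variance(dist):
    """var(X) = sum of P(X = x_i) * (x_i - E(X))^2 over all outcomes."""
    mu = mean(dist)
    return sum(p * (x - mu) ** 2 for x, p in dist)


p = 0.3  # assumed success probability, for illustration only
bernoulli = [(0, 1 - p), (1, p)]

var_bernoulli = variance(bernoulli)
sd_bernoulli = sqrt(var_bernoulli)

print(f"E(X)   = {mean(bernoulli):.2f}  (formula: p)")
print(f"var(X) = {var_bernoulli:.2f}  (formula: p(1 - p) = {p * (1 - p):.2f})")
print(f"sd(X)  = {sd_bernoulli:.4f}")
```

The same two helpers work for any discrete distribution written as (value, probability) pairs, not only the Bernoulli.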
69 / 110
The Standard Deviation
I What are the units of E(X)? What are the units of var(X)?
I A more intuitive way to understand the spread of a
distribution is to look at the standard deviation:
$$\mathrm{sd}(X) = \sqrt{\mathrm{var}(X)}$$
70 / 110
Probability and Decisions
71 / 110
Back to Target Marketing
72 / 110
Back to Target Marketing
Revenue    P(Revenue)
-$0.80     0.95
$39.20     0.05
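The table above supports a simple expected-value decision. The sketch below uses the $0.80 promotion cost, the $40 customer spend, and the 0.05 response probability from the slides; the break-even calculation is an added illustration, and the variable names are hypothetical.

```python
cost = 0.80        # cost of running the promotion, in dollars
spend = 40.00      # amount a responding customer spends, in dollars
p_respond = 0.05   # response probability from the table

# Net revenue per targeted customer: (value, probability) pairs.
outcomes = [(-cost, 1 - p_respond), (spend - cost, p_respond)]

# Expected net revenue per targeted customer.
expected_net = sum(p * x for x, p in outcomes)

# Break-even response probability: solve p * spend - cost = 0 for p.
break_even = cost / spend

print(f"Expected net revenue per customer = ${expected_net:.2f}")
print(f"Break-even response probability   = {break_even:.3f}")
```

Targeting a customer is worthwhile on average whenever their response probability exceeds the break-even value.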
73 / 110
Covariance
The covariance between X and Y is:
$$\mathrm{cov}(X, Y) = \sum_{i=1}^{n} \sum_{j=1}^{m} P(X = x_i, Y = y_j) \times [x_i - E(X)] \times [y_j - E(Y)]$$
[Figure: scatter plot of Y against X with the sample means X̄ and Ȳ marked. The means split the plot into regions where (Yi − Ȳ)(Xi − X̄) > 0 and regions where (Yi − Ȳ)(Xi − X̄) < 0.]
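The double sum in the covariance definition translates directly into a loop over a joint probability table. The joint distribution below is hypothetical and chosen only so that X and Y tend to move together; none of its numbers come from the slides.

```python
# Hypothetical joint distribution P(X = x, Y = y); values are illustrative only.
joint = {
    (-1, -1): 0.4,
    (-1, 1): 0.1,
    (1, -1): 0.1,
    (1, 1): 0.4,
}

# Means of X and Y, computed from the joint distribution.
e_x = sum(p * x for (x, _), p in joint.items())
e_y = sum(p * y for (_, y), p in joint.items())

# cov(X, Y) = sum over (x, y) of P(X = x, Y = y) * (x - E(X)) * (y - E(Y)).
cov_xy = sum(p * (x - e_x) * (y - e_y) for (x, y), p in joint.items())

print(f"E(X) = {e_x:.2f}, E(Y) = {e_y:.2f}, cov(X, Y) = {cov_xy:.2f}")
```

Outcomes where X and Y are on the same side of their means contribute positive terms, and outcomes on opposite sides contribute negative terms; here the former carry more probability, so the covariance is positive.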
75 / 110
Ford vs. Tesla
76 / 110
Ford vs. Tesla
77 / 110
Ford vs. Tesla
$$\mathrm{cor}(X, Y) = \frac{\mathrm{cov}(X, Y)}{\mathrm{sd}(X)\,\mathrm{sd}(Y)}$$
$$\mathrm{cor}(F, T) = \frac{3.063}{2.29 \times 3.12} = 0.428 \quad \text{(not too strong!)}$$
79 / 110
Correlation and dependence
81 / 110
Spurious correlations
https://tylervigen.com/spurious-correlations
82 / 110
Linear combinations
83 / 110
Linear Combination of Random Variables
84 / 110
Linear Combination of Random Variables
85 / 110
Linear Combination of Random Variables
More generally...
I $E(w_1 X_1 + w_2 X_2 + \dots + w_p X_p) = w_1 E(X_1) + w_2 E(X_2) + \dots + w_p E(X_p) = \sum_{i=1}^{p} w_i E(X_i)$
86 / 110
Example
On average, LeBron James scores 27.1 points per game, with a standard
deviation of 5.3 points.
I Over a randomly selected series of 3 games, how many points do
you expect LeBron to score?
I What is the standard deviation over these three games?
Assume that each game is independent.
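One way to work this example is sketched below. It relies on two standard facts: expectations always add, and variances add when the terms are independent (which the example assumes). The variable names are illustrative.

```python
from math import sqrt

mean_per_game = 27.1   # average points per game (from the example)
sd_per_game = 5.3      # standard deviation per game (from the example)
n_games = 3

# Expectation of a sum is the sum of the expectations.
expected_total = n_games * mean_per_game

# For independent games, the variance of the sum is the sum of the variances.
var_total = n_games * sd_per_game ** 2
sd_total = sqrt(var_total)

print(f"Expected points over {n_games} games = {expected_total:.1f}")
print(f"Standard deviation over {n_games} games = {sd_total:.2f}")
```

The standard deviation grows by a factor of sqrt(3), not 3: variances add across independent games, but standard deviations do not.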
88 / 110
Continuous Random Variables
89 / 110
Probability density functions
[Figure: a probability density function plotted over the range −4 to 4.]
91 / 110
The Standard Normal Distribution
[Figure: two panels showing the standard normal pdf against z, for z from −4 to 4.]
92 / 110
Example: Normal Probabilities
Questions:
93 / 110
Normal Distribution as a Family
94 / 110
The Family of Normal Distributions
95 / 110
Why normal?
96 / 110
Why normal?
97 / 110
Mean and Variance of Continuous RVs
But you DO need to know that the basic idea is the same as for
discrete RVs. The interpretation is also the same:
I The mean measures the central tendency (but NOT
necessarily the most likely value);
I The variance measures the average spread or variation around
the mean;
I The covariance measures how two RVs move together on
average.
98 / 110
Mean and Variance of a Random Variable
99 / 110
Normal pdfs
[Figure: normal pdfs plotted over the range −8 to 8.]
100 / 110
Example: Modeling returns with a normal
101 / 110
Example: Modeling returns with a normal
[Figure: two panels showing a normal pdf for sp500 over the range −40 to 60. Left panel: "prob less than 0". Right panel: "prob is 2%".]
I (i) Pr(SP500 < 0) = 0.35 and (ii) Pr(SP500 < −25) = 0.02
I (This is just a conceptual example. You are not expected to
know these precise numbers.)
102 / 110
Normal probabilities and standardization
103 / 110
Standardization
If $X \sim N(\mu, \sigma^2)$ then
$$Z = \frac{X - \mu}{\sigma} \sim N(0, 1).$$
$$r \sim N(0.012, 0.043^2) \implies z = \frac{r - 0.012}{0.043} \sim N(0, 1)$$
$$z = \frac{-0.2176 - 0.012}{0.043} \approx -5.34$$
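The standardization step can be reproduced with Python's standard library. This sketch uses the mean 0.012, standard deviation 0.043, and return −0.2176 from the slide; with these rounded inputs the arithmetic gives a z-score of about −5.34. The variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 0.012, 0.043   # mean and standard deviation of r (from the slide)
r = -0.2176                # the observed return being standardized

# Standardize: z = (r - mu) / sigma.
z = (r - mu) / sigma

# The probability of a return this low is the same whether it is computed
# on the original scale or on the standard normal scale.
p_below = NormalDist(mu, sigma).cdf(r)
p_below_std = NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"P(r < {r}) = {p_below:.2e}")
```

A return more than five standard deviations below the mean is extraordinarily unlikely under the normal model, which is the point of the example.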
105 / 110
Building normals
One more very useful property of normal distributions: a sum or linear
combination of independent (or, more generally, jointly normal) normal
random variables is itself a normal random variable!
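The standard library's `NormalDist` class supports this property directly: scaling by a constant and adding two `NormalDist` objects (treated as independent) returns a new `NormalDist`. The two return distributions and the 50/50 weights below are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical, independent asset returns (parameters are illustrative only).
x = NormalDist(mu=0.01, sigma=0.04)
y = NormalDist(mu=0.02, sigma=0.06)
w_x, w_y = 0.5, 0.5

# Linear combination of independent normals is again normal.
portfolio = x * w_x + y * w_y

# The same answer by hand: means combine linearly, variances with squared weights.
manual_mean = w_x * x.mean + w_y * y.mean
manual_sd = sqrt(w_x ** 2 * x.variance + w_y ** 2 * y.variance)

print(f"Portfolio mean = {portfolio.mean:.4f} (by hand: {manual_mean:.4f})")
print(f"Portfolio sd   = {portfolio.stdev:.4f} (by hand: {manual_sd:.4f})")
```

Because the result is itself normal, its mean and standard deviation are all that is needed to compute any probability for the combined variable.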
107 / 110
Portfolios once again...
108 / 110
What about outliers?
109 / 110
Median, Skewness
110 / 110