Calculus, Probability, and Statistics Primers
Dave Goldsman
12/30/18
Outline
1 Calculus Primer
2 Probability Primer
Basics
Simulating Random Variables
Great Expectations
Functions of a Random Variable
Jointly Distributed Random Variables
Covariance and Correlation
Some Probability Distributions
Limit Theorems
3 Statistics Primer
Intro to Estimation
Unbiased Estimation
Maximum Likelihood Estimation
Distributional Results and Confidence Intervals
Calculus Primer
First of all, let’s suppose that f (x) is a function that maps values of x
from a certain domain X to a certain range Y , which we can denote
by the shorthand f : X → Y .
We say that f(x) is differentiable at x if the derivative

f′(x) ≡ (d/dx) f(x) ≡ lim_{h→0} [f(x + h) − f(x)]/h

exists and is well-defined for any given x. Think of the derivative as the slope of the function.
Some well-known derivatives:

[x^k]′ = kx^{k−1},
[e^x]′ = e^x,
[sin(x)]′ = cos(x),
[cos(x)]′ = −sin(x),
[ln(x)]′ = 1/x,
[arctan(x)]′ = 1/(1 + x²). □
Other handy rules, for differentiable f and g:

[f(x)g(x)]′ = f′(x)g(x) + f(x)g′(x) (product rule),

[f(x)/g(x)]′ = [g(x)f′(x) − f(x)g′(x)]/g²(x) (quotient rule)¹,

[f(g(x))]′ = f′(g(x))g′(x) (chain rule)².

¹ Ho dee Hi minus Hi dee Ho over Ho Ho.
² www.youtube.com/watch?v=gGAiW5dOnKo
Example: Let f(x) = x² and g(x) = ln(x). Then

[f(x)g(x)]′ = (d/dx)[x² ln(x)] = 2x ln(x) + x,

[f(x)/g(x)]′ = (d/dx)[x²/ln(x)] = [2x ln(x) − x]/ln²(x),

[f(g(x))]′ = 2g(x)g′(x) = 2 ln(x)/x. □
The minimum or maximum of f (x) can only occur when the slope of
f (x) is zero, i.e., only when f 0 (x) = 0, say at x = x0 . Exception:
Check the endpoints of your interval of interest as well.
Then if f 00 (x0 ) < 0, you get a max; if f 00 (x0 ) > 0, you get a min; and
if f 00 (x0 ) = 0, you get a point of inflection.
Example: Find the value of x that minimizes f(x) = e^{2x} + e^{−x}. The minimum can only occur when f′(x) = 2e^{2x} − e^{−x} = 0. After a little algebra, we find that this occurs at x₀ = −(1/3)ln(2) ≈ −0.231. It's also easy to show that f″(x) > 0 for all x, and so x₀ yields a minimum. □
A common task is to find a zero of a nonlinear function g(x), i.e., a point x⋆ with g(x⋆) = 0. Two classic numerical methods: bisection and Newton's method.
Bisection: Suppose you can find x1 and x2 such that g(x1 ) < 0 and
g(x2 ) > 0. (We’ll follow similar logic if the inequalities are both
reversed.) By the Intermediate Value Theorem (which you may
remember), there must be a zero in [x1, x2], that is, an x⋆ ∈ [x1, x2] such that g(x⋆) = 0.
Thus, take x3 = (x1 + x2 )/2. If g(x3 ) < 0, then there must be a zero
in [x3 , x2 ]. Otherwise, if g(x3 ) > 0, then there must be a zero in
[x1 , x3 ]. In either case, you’ve reduced the length of the search
interval.
Continue in this same manner until the length of the search interval is
as small as desired.
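Here's a minimal Python sketch of bisection (the test function and tolerance are my illustrative choices, not from the slides):

import math

def bisect(g, x1, x2, tol=1e-8):
    # Assumes g(x1) < 0 < g(x2), so a zero lies somewhere in [x1, x2].
    while x2 - x1 > tol:
        x3 = (x1 + x2) / 2          # midpoint of the current interval
        if g(x3) < 0:
            x1 = x3                 # zero must be in [x3, x2]
        else:
            x2 = x3                 # zero must be in [x1, x3]
    return (x1 + x2) / 2

# Example: g(x) = x^2 - 2 is negative at 1 and positive at 2.
print(bisect(lambda x: x * x - 2, 1.0, 2.0))   # about 1.41421356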
Newton's method is another classic zero-finding approach. Iterate

x_{i+1} = x_i − g(x_i)/g′(x_i).

This makes sense since, for x_i and x_{i+1} close to each other and to the zero x⋆, we have

g′(x_i) ≈ [g(x⋆) − g(x_i)]/(x⋆ − x_i).
Example: Let's use Newton's method on g(x) = x² − 2, whose positive zero is √2. The iteration is

x_{i+1} = x_i − (x_i² − 2)/(2x_i) = x_i/2 + 1/x_i.

Let's start with a bad guess of x1 = 1. Then

x2 = x1/2 + 1/x1 = 1/2 + 1 = 1.5,
x3 = x2/2 + 1/x2 ≈ 0.75 + 1/1.5 ≈ 1.4167,
x4 = x3/2 + 1/x3 ≈ 1.4142. Wow! □
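The same iteration in a few lines of Python (the starting point matches the example above):

x = 1.0                      # bad initial guess x1
for i in range(3):
    x = x / 2 + 1 / x        # Newton step for g(x) = x^2 - 2
    print(i + 2, x)          # prints x2 = 1.5, x3 = 1.4166..., x4 = 1.41421...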
Integration

[Figure: a drawing of an integral sign, captioned "I'm really an integral!"]
Some well-known indefinite integrals:

∫ x^k dx = x^{k+1}/(k + 1) + C for k ≠ −1,
∫ dx/x = ln|x| + C,
∫ e^x dx = e^x + C,
∫ cos(x) dx = sin(x) + C,
∫ dx/(1 + x²) = arctan(x) + C,

where C is an arbitrary constant. □
Some useful properties of definite integrals:

∫_a^b f(x) dx = −∫_b^a f(x) dx,

∫_a^b f(x) dx = ∫_a^c f(x) dx + ∫_c^b f(x) dx.
∫ f(x)g′(x) dx = f(x)g(x) − ∫ g(x)f′(x) dx (integration by parts)⁴,

∫ f(g(x))g′(x) dx = ∫ f(u) du, where u = g(x) (substitution rule)⁵.

⁴ www.youtube.com/watch?v=OTzLVIc-O5E
⁵ www.youtube.com/watch?v=eswQl-hcvU0
Example: Using integration by parts with f(x) = x and g′(x) = e^{2x},

∫_0^1 x e^{2x} dx = [x e^{2x}/2]_0^1 − ∫_0^1 (e^{2x}/2) dx = e²/2 − [e^{2x}/4]_0^1 = e²/2 − (e² − 1)/4 = (e² + 1)/4. □
Example: And while we're at it, here are some miscellaneous sums that you should know:

Σ_{k=1}^n k = n(n + 1)/2,

Σ_{k=1}^n k² = n(n + 1)(2n + 1)/6,

Σ_{k=0}^∞ p^k = 1/(1 − p) (for −1 < p < 1).
L'Hôspital's Rule: If lim_{x→a} f(x) and lim_{x→a} g(x) both go to 0 or both go to ±∞, then⁶

lim_{x→a} f(x)/g(x) = lim_{x→a} f′(x)/g′(x).

Example: lim_{x→0} sin(x)/x = lim_{x→0} cos(x)/1 = 1. □

⁶ This rule makes me sick.
Riemann sums approximate an integral by rectangles: ∫_a^b f(x) dx ≈ Σ_{i=1}^n f(x_i)Δx, where Δx ≡ (b − a)/n and x_i ≡ a + iΔx. As a running example, take f(x) = sin(πx/2) on [0, 1].
Riemann (cont’d): Since I’m such a nice guy, I’ve made things
easy for you. In this problem, I’ve thoughtfully taken a = 0 and
b = 1, so that ∆x = 1/n and xi = i/n, which simplifies the notation
a bit. Then
∫_a^b f(x) dx = ∫_0^1 f(x) dx ≈ Σ_{i=1}^n f(x_i)Δx = (1/n) Σ_{i=1}^n sin(πi/(2n)).
Again, try it out on ∫_0^1 sin(πx/2) dx; a quick sketch follows.
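Here's a small Python check of the Riemann approximation against the exact answer, 2/π (the n values are arbitrary):

import math

def riemann_sum(n):
    # Right-endpoint Riemann sum for sin(pi*x/2) on [0, 1].
    return sum(math.sin(math.pi * i / (2 * n)) for i in range(1, n + 1)) / n

exact = 2 / math.pi          # antiderivative is -(2/pi)cos(pi*x/2)
for n in (10, 100, 1000):
    print(n, riemann_sum(n), exact)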
Probability Primer
Basics
Will assume that you know about sample spaces, events, and the
definition of probability.
Example: Let X be the sum of two dice rolls. Then X((4, 6)) = 10. In addition,

P(X = x) =
  1/36 if x = 2
  2/36 if x = 3
  ...
  1/36 if x = 12
  0 otherwise. □
Examples: Here are some well-known discrete RV’s that you may
know: Bernoulli(p), Binomial(n, p), Geometric(p), Negative
Binomial, Poisson(λ), etc.
We’ll make a brief aside here to show how to simulate some very
simple random variables.
Simulating Random Variables
Consider a discrete random variable X taking the values −2, 3, and 4.2 with the probabilities f(x) below. Can't use a die toss to simulate this random variable. Instead, use what's called the inverse transform method: draw U ∼ Unif(0,1) and set X to the value whose interval contains U.

x     f(x)   P(X ≤ x)   Unif(0,1)'s
−2    0.25   0.25       [0.00, 0.25]
3     0.10   0.35       (0.25, 0.35]
4.2   0.65   1.00       (0.35, 1.00)
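A quick Python sketch of this table lookup (the sample size is arbitrary):

import random

def draw_x(u):
    # Inverse transform: map a Unif(0,1) to X via the cdf column above.
    if u <= 0.25:
        return -2.0
    elif u <= 0.35:
        return 3.0
    else:
        return 4.2

sample = [draw_x(random.random()) for _ in range(100000)]
for v in (-2.0, 3.0, 4.2):
    print(v, sample.count(v) / len(sample))   # roughly 0.25, 0.10, 0.65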
If you don’t like programming, you can use Excel function RAND()
or something similar to generate Unif(0,1)’s.
Here's an old-school FORTRAN function that generates pseudo-random Unif(0,1)'s; it's a linear congruential ("Lehmer") generator with multiplier 16807 and modulus 2³¹ − 1:

FUNCTION UNIF(IX)
K1 = IX/127773 (this division truncates, e.g., 5/3 = 1.)
IX = 16807*(IX - K1*127773) - K1*2836 (update seed)
IF(IX.LT.0)IX = IX + 2147483647
UNIF = IX * 4.656612875E-10
RETURN
END
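And a direct Python port, in case FORTRAN isn't your thing (the port is mine; same constants, same stream of numbers):

def unif(ix):
    # One step of the 16807 generator, using Schrage's trick to avoid overflow.
    k1 = ix // 127773                    # truncating division, as in FORTRAN
    ix = 16807 * (ix - k1 * 127773) - k1 * 2836
    if ix < 0:
        ix += 2147483647                 # modulus 2^31 - 1
    return ix, ix * 4.656612875e-10      # (new seed, Unif(0,1) value)

seed = 12345                             # any seed in {1, ..., 2^31 - 2}
for _ in range(3):
    seed, u = unif(seed)
    print(u)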
Some Exercises: In the following, I'll assume that you can use Excel (or whatever) to simulate independent Unif(0,1) RV's. (We'll review independence in a little while.) A sketch for the first two exercises appears after the list.

1. Make a histogram of Xi = −ln(Ui), for i = 1, 2, ..., 10000, where the Ui's are independent Unif(0,1) RV's. What kind of distribution does it look like?

2. Suppose Xi and Yi are independent Unif(0,1) RV's, i = 1, 2, ..., 10000. Let Zi = √(−2 ln(Xi)) sin(2πYi), and make a histogram of the Zi's based on the 10000 replications.

3. Suppose Xi and Yi are independent Unif(0,1) RV's, i = 1, 2, ..., 10000. Let Zi = Xi/(Xi − Yi), and make a histogram of the Zi's based on the 10000 replications. This may be somewhat interesting. It's possible to derive the distribution analytically, but it takes a lot of work.
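A Python sketch for Exercises 1 and 2 (matplotlib is my choice for the histograms; the deck suggests Excel):

import math, random
import matplotlib.pyplot as plt

n = 10000
x = [-math.log(1.0 - random.random()) for _ in range(n)]    # Exercise 1
z = [math.sqrt(-2 * math.log(1.0 - random.random()))
     * math.sin(2 * math.pi * random.random())
     for _ in range(n)]                                     # Exercise 2

fig, axes = plt.subplots(1, 2)
axes[0].hist(x, bins=50)       # should look Exp(1)-ish
axes[1].hist(z, bins=50)       # should look Nor(0,1)-ish
plt.show()

(Using 1.0 - random.random() keeps the argument of the log in (0, 1].)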
Great Expectations
Example: If X ∼ Bernoulli(p), then E[X] = Σ_x x f(x) = p. □
Example: If X ∼ Unif(a, b), then E[X] = ∫_R x f(x) dx = (a + b)/2. □
Example: Suppose X has pmf

x      2    3    4
f(x)   0.3  0.6  0.1

Then, by the law of the unconscious statistician, E[X³] = Σ_x x³ f(x) = 8(0.3) + 27(0.6) + 64(0.1) = 25. □
The standard deviation of X is √Var(X).
Example: If X ∼ Exp(λ), then Var(X) = E[X²] − (E[X])² = 2/λ² − (1/λ)² = 1/λ². □
Example: X ∼ Exp(λ) has mgf M_X(t) ≡ E[e^{tX}] = λ/(λ − t) for t < λ. So

E[X] = (d/dt) M_X(t) |_{t=0} = λ/(λ − t)² |_{t=0} = 1/λ.

Further,

E[X²] = (d²/dt²) M_X(t) |_{t=0} = 2λ/(λ − t)³ |_{t=0} = 2/λ².

Thus,

Var(X) = E[X²] − (E[X])² = 2/λ² − (1/λ)² = 1/λ². □

Moment generating functions have many other important uses, some of which we'll talk about in this course.
Functions of a Random Variable

Problem: Given a random variable X with known pmf/pdf f(x), find the distribution of Y = h(X) for some function h.
Discrete Example: Let X denote the number of H's from two coin tosses. We want the pmf for Y = X³ − X.

x            0    1    2
f(x)         1/4  1/2  1/4
y = x³ − x   0    0    6

So Y = 0 with probability 3/4 and Y = 6 with probability 1/4.
Continuous Example: Suppose X has pdf f(x) = |x| for −1 ≤ x ≤ 1, and let Y = X². Then for 0 < y < 1, the cdf of Y is

G(y) = P(Y ≤ y) = P(X² ≤ y) = P(−√y ≤ X ≤ √y) = ∫_{−√y}^{√y} |x| dx = y,

i.e., Y ∼ Unif(0, 1).
Inverse Transform Theorem: If X is a continuous random variable with cdf F(x), then Y ≡ F(X) ∼ Unif(0, 1), since

P(Y ≤ y) = P(F(X) ≤ y) = P(X ≤ F⁻¹(y)) = F(F⁻¹(y)) = y, 0 < y < 1.
Exercise: Suppose X is Weibull with cdf F(x) = 1 − e^{−(λx)^β}, x > 0. If you set F(X) = U ∼ Unif(0,1) and solve for X, show that you get

X = (1/λ)[−ln(1 − U)]^{1/β}.

Now pick your favorite λ and β, and use this result to generate values of X, as in the sketch below. In fact, make a histogram of your X values. Are there any interesting values of λ and β you could've chosen?
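A Python sketch of that recipe (the λ and β values are arbitrary picks):

import math, random
import matplotlib.pyplot as plt

lam, beta = 1.0, 2.0
xs = [(1 / lam) * (-math.log(1.0 - random.random())) ** (1 / beta)
      for _ in range(10000)]
plt.hist(xs, bins=50)
plt.show()
# Hint: beta = 1 collapses the formula to -ln(1-U)/lam, i.e., Exp(lam).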
By the chain rule (and since a pdf must be ≥ 0), the pdf of Y is

f_Y(y) = (d/dy) F_Y(y) = f_X(h⁻¹(y)) |(d/dy) h⁻¹(y)|.
Jointly Distributed Random Variables
Remark: The marginal cdf of X is FX (x) = F (x, ∞). (We use the
X subscript to remind us that it’s just the cdf of X all by itself.)
Similarly, the marginal cdf of Y is FY (y) = F (∞, y).
Theorem: X and Y are indep if you can write their joint pdf as
f (x, y) = a(x)b(y) for some functions a(x) and b(y), and x and y
don’t have funny limits (their domains do not depend on each other).
Examples: If f (x, y) = cxy for 0 ≤ x ≤ 2, 0 ≤ y ≤ 3, then X and
Y are independent.
If f(x, y) = (21/4) x²y for x² ≤ y ≤ 1, then X and Y are not independent, since the limits are funny.

If f(x, y) = c/(x + y) for 1 ≤ x ≤ 2, 1 ≤ y ≤ 3, then X and Y are not independent, since the joint pdf doesn't factor as a(x)b(y). □
Definition: The conditional pdf (or pmf) of Y given X = x is f(y|x) ≡ f(x, y)/f_X(x). This is a legit pmf/pdf. For example, in the continuous case, ∫_R f(y|x) dy = 1 for any x.
Example: Suppose f(x, y) = (21/4) x²y for x² ≤ y ≤ 1. Then the marginal is f_X(x) = (21/8) x²(1 − x⁴), and

f(y|x) = f(x, y)/f_X(x) = [(21/4) x²y] / [(21/8) x²(1 − x⁴)] = 2y/(1 − x⁴), x² ≤ y ≤ 1. □
Example: If Y ∼ Geometric(p), then

E[Y] = Σ_y y f_Y(y) = Σ_{y=1}^∞ y q^{y−1} p = 1/p.
Thus, writing P(A) = E[Y] for the indicator Y of an event A,

P(A) = E[Y] = E[E(Y|X)] = ∫_R E[Y|X = x] dF_X(x) = ∫_R P(A|X = x) dF_X(x).

Proof: Follows from the above result if we let the event A = {Y < X}. □
Similarly,

Var[E(Y|X)] = E[{E(Y|X)}²] − {E[E(Y|X)]}² = E[{E(Y|X)}²] − {E(Y)}².

Thus,

E[Var(Y|X)] + Var[E(Y|X)] = E(Y²) − {E(Y)}² = Var(Y). □
Covariance and Correlation
Notation: X1, ..., Xn ∼ f(x) (iid). (The term "iid" reads "independent and identically distributed.")

Example: If X1, ..., Xn ∼ f(x) (iid) and the sample mean X̄n ≡ (1/n) Σ_{i=1}^n Xi, then E[X̄n] = E[Xi] and Var(X̄n) = Var(Xi)/n. Thus, the variance decreases as n increases. □
Definition: The correlation between X and Y is ρ ≡ Cov(X, Y)/√(Var(X)Var(Y)).

Theorem: −1 ≤ ρ ≤ 1.
and the correlation is

ρ = (E[XY] − E[X]E[Y]) / √(Var(X)Var(Y)) = −0.415. □
Setting (d/dw) Var(P) = 0, we obtain the critical point that (hopefully) minimizes the variance of the portfolio,

w = (σ2² − σ12) / (σ1² + σ2² − 2σ12). □
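Plugging in numbers (these variances and the covariance are hypothetical, just to exercise the formula):

def min_var_weight(var1, var2, cov12):
    # Weight on asset 1 that minimizes the portfolio variance.
    return (var2 - cov12) / (var1 + var2 - 2 * cov12)

print(min_var_weight(0.04, 0.09, 0.01))   # about 0.727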
Some Probability Distributions
X ∼ Bernoulli(p).

f(x) = p if x = 1, and f(x) = 1 − p (= q) if x = 0.

Y ∼ Binomial(n, p). If X1, X2, ..., Xn ∼ Bern(p) (iid Bernoulli(p) trials), then Y = Σ_{i=1}^n Xi ∼ Bin(n, p).

f(y) = (n choose y) p^y q^{n−y}, y = 0, 1, ..., n.

X ∼ Geometric(p), the number of Bern(p) trials until the first success.

f(x) = q^{x−1} p, x = 1, 2, ....
X ∼ Poisson(λ). f(x) = e^{−λ} λ^x / x!, x = 0, 1, 2, ..., with E[X] = Var(X) = λ.
X ∼ Gamma(α, λ).

f(x) = λ^α x^{α−1} e^{−λx} / Γ(α), x ≥ 0,

where the gamma function is Γ(α) ≡ ∫_0^∞ t^{α−1} e^{−t} dt.

E[X] = α/λ, Var(X) = α/λ², M_X(t) = [λ/(λ − t)]^α for t < λ.

If X1, X2, ..., Xn ∼ Exp(λ) (iid), then Y ≡ Σ_{i=1}^n Xi ∼ Gamma(n, λ). The Gamma(n, λ) is also called the Erlang_n(λ). It has cdf

F_Y(y) = 1 − e^{−λy} Σ_{j=0}^{n−1} (λy)^j / j!, y ≥ 0.
X ∼ Triangular(a, b, c). E[X] = (a + b + c)/3.

X ∼ Beta(a, b). f(x) = [Γ(a + b)/(Γ(a)Γ(b))] x^{a−1} (1 − x)^{b−1} for 0 ≤ x ≤ 1 and a, b > 0.

E[X] = a/(a + b) and Var(X) = ab/[(a + b)²(a + b + 1)].
Limit Theorems
This is a special case of the Law of Large Numbers, which says that
X̄n approximates µ well as n becomes large.
Idea: If Yn →d Y (convergence in distribution) and n is large, then you ought to be able to approximate the distribution of Yn by the limit distribution of Y.
Central Limit Theorem: If X1, X2, ..., Xn ∼ f(x) (iid) with mean μ and variance σ², then

Zn ≡ (Σ_{i=1}^n Xi − nμ)/(√n σ) = √n(X̄n − μ)/σ →d Nor(0, 1).

The CLT usually works well if the pmf/pdf is fairly symmetric and n ≥ 15.
Example: If X1, X2, ..., X100 ∼ Exp(1) (iid), so that μ = σ² = 1, then

P(90 ≤ Σ_{i=1}^{100} Xi ≤ 110) = P((90 − 100)/√100 ≤ Z100 ≤ (110 − 100)/√100) ≈ P(−1 ≤ Nor(0, 1) ≤ 1) = 0.6827.
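A quick Monte Carlo sanity check of that approximation (the replication count is arbitrary):

import math, random

def trial():
    # Sum of 100 iid Exp(1)'s via inverse transform.
    s = sum(-math.log(1.0 - random.random()) for _ in range(100))
    return 90 <= s <= 110

reps = 100000
print(sum(trial() for _ in range(reps)) / reps)   # roughly 0.68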
Statistics Primer
Intro to Estimation
A statistic is a function of the data that we use to estimate an unknown parameter. Examples of parameters: μ, σ².
One desirable property of an estimator: its expected value should equal the parameter it's trying to estimate.
Unbiased Estimation
Definition: T (X) is unbiased for θ if E[T (X)] = θ.
Big Example: Suppose that X1, ..., Xn ∼ Unif(0, θ) (iid), i.e., the pdf is f(x) = 1/θ, 0 < x < θ.

Consider two estimators: Y1 ≡ 2X̄ and Y2 ≡ ((n + 1)/n) max_{1≤i≤n} Xi.

Y1 is unbiased, since E[Y1] = 2E[X̄] = 2E[Xi] = 2(θ/2) = θ.

It's also the case that Y2 is unbiased, but it takes a little more work to show this. As a first step, let's get the cdf of M ≡ max_i Xi:

P(M ≤ y) = P(X1 ≤ y and X2 ≤ y and ⋯ and Xn ≤ y)
         = Π_{i=1}^n P(Xi ≤ y) = [P(X1 ≤ y)]^n (Xi's are iid)
         = [∫_0^y f_{X1}(x) dx]^n = [∫_0^y (1/θ) dx]^n = (y/θ)^n, 0 ≤ y ≤ θ.
Example: X1, ..., Xn ∼ Unif(0, θ) (iid), with the two estimators Y1 = 2X̄ and Y2 = ((n + 1)/n) max_i Xi from before. Recall that the mean squared error of an estimator T is MSE(T) ≡ E[(T − θ)²] = Var(T) + (Bias)², and that both Y1 and Y2 are unbiased.

Also, Var(Y1) = θ²/(3n) and Var(Y2) = θ²/(n(n + 2)).

Thus, MSE(Y1) = θ²/(3n) and MSE(Y2) = θ²/(n(n + 2)), so Y2 is better.
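A simulation sketch confirming those MSEs (θ, n, and the replication count are arbitrary choices):

import random

theta, n, reps = 10.0, 10, 100000
se1 = se2 = 0.0
for _ in range(reps):
    xs = [theta * random.random() for _ in range(n)]   # Unif(0, theta) sample
    se1 += (2 * sum(xs) / n - theta) ** 2              # Y1 = 2 * X-bar
    se2 += ((n + 1) / n * max(xs) - theta) ** 2        # Y2 = (n+1)/n * max
print(se1 / reps, theta**2 / (3 * n))                  # both about 3.33
print(se2 / reps, theta**2 / (n * (n + 2)))            # both about 0.83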
Maximum Likelihood Estimation
Example: Suppose X1, ..., Xn ∼ Exp(λ) (iid), so that the likelihood function is L(λ) = Π_{i=1}^n f(xi) = λ^n exp(−λ Σ_{i=1}^n xi), and we want the λ maximizing L(λ).

Could take the derivative and plow through all of the horrible algebra. Too tedious. Need a trick....

Useful Trick: Since the natural log function is one-to-one, it's easy to see that the λ that maximizes L(λ) also maximizes ln(L(λ))!

ln(L(λ)) = ln[λ^n exp(−λ Σ_{i=1}^n xi)] = n ln(λ) − λ Σ_{i=1}^n xi.

Setting (d/dλ) ln(L(λ)) = n/λ − Σ_{i=1}^n xi = 0 and solving, we get λ = n/Σ_{i=1}^n xi = 1/x̄.

At the end, we make all of the little xi's into big Xi's to indicate that this is a RV: the maximum likelihood estimator is λ̂ = 1/X̄.
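A numerical sanity check of λ̂ = 1/X̄ (the true λ and sample size are arbitrary):

import math, random

lam, n = 2.0, 10000
xs = [-math.log(1.0 - random.random()) / lam for _ in range(n)]   # Exp(lam) data

lam_hat = 1 / (sum(xs) / n)        # MLE: 1 / sample mean
print(lam_hat)                     # close to 2.0

def lnL(l):                        # log-likelihood: n*ln(l) - l*sum(x)
    return n * math.log(l) - l * sum(xs)
print(lnL(lam_hat) > max(lnL(0.9 * lam_hat), lnL(1.1 * lam_hat)))  # True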
Example: Suppose X1, ..., Xn ∼ Exp(λ) (iid). We define the survival function as F̄(t) ≡ P(X > t) = 1 − F(t) = e^{−λt}. By the invariance property of MLEs, the MLE of F̄(t) is e^{−λ̂t}, where λ̂ = 1/X̄. This kind of thing is used all of the time in the actuarial sciences. □
Distributional Results and Confidence Intervals
If Z ∼ Nor(0, 1), Y ∼ χ²(k), and Z and Y are independent, then T = Z/√(Y/k) has the Student t distribution with k df. Notation: T ∼ t(k). Note that the t(1) is the Cauchy distribution.
How (and why) would one use the above facts? Because they can be used to construct confidence intervals (CIs) for μ and σ² under a variety of assumptions.

Here are some examples / theorems, all of which assume that the Xi's are iid normal....
Example: A 100(1 − α)% CI for σ² is

(n − 1)S²/χ²_{α/2, n−1} ≤ σ² ≤ (n − 1)S²/χ²_{1−α/2, n−1},

where S² is the sample variance and χ²_{γ, ν} denotes the point with right-tail probability γ under the χ²(ν) distribution.
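A sketch computing that interval in Python via scipy (the data are hypothetical; note that scipy's ppf takes a left-tail probability, so the quantiles are flipped relative to the right-tail notation above):

import statistics
from scipy.stats import chi2

def var_ci(xs, alpha=0.05):
    # Two-sided 100(1-alpha)% CI for sigma^2, assuming iid normal data.
    n = len(xs)
    s2 = statistics.variance(xs)                  # sample variance S^2
    lo = (n - 1) * s2 / chi2.ppf(1 - alpha / 2, n - 1)
    hi = (n - 1) * s2 / chi2.ppf(alpha / 2, n - 1)
    return lo, hi

print(var_ci([8.9, 10.2, 9.6, 11.1, 10.4, 9.9, 10.7, 9.2]))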