
cs109 Final Cheat Sheet

TG Sido
April 4, 2017

1 Fundamentals
1.1 DeMorgan’s Laws

(⋃_{i=1}^n Ei)^c = ⋂_{i=1}^n Ei^c        (⋂_{i=1}^n Ei)^c = ⋃_{i=1}^n Ei^c

1.2 Axioms of Probability


Axiom 1 : 0 ≤ P (E) ≤ 1
Axiom 2 : P (S) = 1
Axiom 3 : For any sequence of mutually exclusive events E1 , E2 , . . .
P(⋃_{i=1}^∞ Ei) = ∑_{i=1}^∞ P(Ei)

1.3 Inclusion-Exclusion Identity


P (E ∪ F ) = P (E) + P (F ) − P (EF )
P(⋃_{i=1}^n Ei) = ∑_{r=1}^n (−1)^{r+1} ∑_{i1 < ⋯ < ir} P(Ei1, Ei2, . . . , Eir)

1.4 Number of Integer Solutions of Equations


There are (n−1 choose r−1) distinct positive integer-valued vectors (x1, x2, . . . , xr) satisfying the equation

x1 + x2 + ⋯ + xr = n    where xi > 0, i = 1, . . . , r

There are (n+r−1 choose r−1) distinct nonnegative integer-valued vectors (x1, x2, . . . , xr) satisfying the equation

x1 + x2 + ⋯ + xr = n

2 Conditional Probability
P(E∣F) = P(EF)/P(F)  ⇔  P(EF) = P(E∣F)P(F)

2.1 Generalized Chain Rule


P (E1 E2 . . . En ) = P (E1 )P (E2 ∣E1 )P (E3 ∣E1 E2 ) . . . P (En ∣E1 E2 . . . En−1 )

2.2 Bayes’ Theorem
The many shapes and forms of Bayes’ Theorem...
P (E) = P (E∣F )P (F ) + P (E∣F c )P (F c )
P(F∣E) = P(EF)/P(E) = P(E∣F)P(F)/P(E)

P(F∣E) = P(E∣F)P(F) / [P(E∣F)P(F) + P(E∣F^c)P(F^c)]
Fully General Form:
If F1 , F2 , . . . , Fn comprise a set of mutually exclusive and exhaustive events, then
P(Fj∣E) = P(E∣Fj)P(Fj) / ∑_{i=1}^n P(E∣Fi)P(Fi)
That’s odd.
The odds of H given observed evidence E:
P(H∣E)/P(H^c∣E) = P(H)P(E∣H) / [P(H^c)P(E∣H^c)]
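
A minimal numerical sketch of the fully general form in Python (the priors and likelihoods below are made-up illustration values, not from this sheet):

# Hypothetical illustration of Bayes' theorem with exhaustive hypotheses F1..Fn.
priors = [0.7, 0.2, 0.1]          # P(F_i), must sum to 1
likelihoods = [0.05, 0.30, 0.90]  # P(E | F_i)

# Law of total probability: P(E) = sum_i P(E | F_i) P(F_i)
p_e = sum(l * p for l, p in zip(likelihoods, priors))

# Bayes' theorem: P(F_j | E) = P(E | F_j) P(F_j) / P(E)
posteriors = [l * p / p_e for l, p in zip(likelihoods, priors)]
print(posteriors)  # posteriors also sum to 1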

3 Independence
3.1 Definition
Two events are independent if P (EF ) = P (E)P (F ). Otherwise they are dependent.
More generally, events E1, E2, . . . , En are independent if for every subset E1′, E2′, . . . , Er′ with r ≤ n it holds
that
P(E1′ E2′ ⋯ Er′) = P(E1′)P(E2′)⋯P(Er′)

3.2 Conditional Independence


Two events E and F are conditionally independent given G if
P (EF ∣G) = P (E∣G)P (F ∣G)
Dependent events can become independent, and vice-versa, by conditioning on additional information.

4 Random Distributions
4.1 Definitions and Properties
Probability Mass Function:
p(a) = P (X = a)
Probability Density Function:
P(a ≤ X ≤ b) = ∫_a^b f(x) dx        P(−∞ < X < ∞) = ∫_{−∞}^∞ f(x) dx = 1

Cumulative Distribution Function:


F(a) = P(X ≤ a)    where − ∞ < a < ∞

F(a) = ∑_{all x ≤ a} p(x)        F(a) = ∫_{−∞}^a f(x) dx

Density f is the derivative of the CDF F: f(a) = (d/da) F(a)

4.2 Joint distributions
Joint Probability Mass Function:
pX,Y (a, b) = P (X = a, Y = b)
Marginal distributions:

pX(a) = P(X = a) = ∑_y pX,Y(a, y)        pY(b) = P(Y = b) = ∑_x pX,Y(x, b)

Joint Cumulative Probability Distribution (CDF):

FX,Y (a, b) = F (a, b) = P (X ≤ a, Y ≤ b) where − ∞ < a, b < ∞

Marginal distributions:

FX (a) = P (X ≤ a) = P (X ≤ a, Y < ∞) = FX,Y (a, ∞)


FY (b) = P (Y ≤ b) = P (X < ∞, Y ≤ b) = FX,Y (∞, b)

Joint Probability Density Function:


P(a1 < X ≤ a2, b1 < Y ≤ b2) = ∫_{a1}^{a2} ∫_{b1}^{b2} fX,Y(x, y) dy dx

FX,Y(a, b) = ∫_{−∞}^a ∫_{−∞}^b fX,Y(x, y) dy dx        fX,Y(a, b) = ∂²/∂a∂b FX,Y(a, b)
Marginal density functions:
fX(a) = ∫_{−∞}^∞ fX,Y(a, y) dy        fY(b) = ∫_{−∞}^∞ fX,Y(x, b) dx

4.3 Independent Random Variables


n random variables X1 , X2 , . . . , Xn are called independent if
P(X1 = x1, X2 = x2, . . . , Xn = xn) = ∏_{i=1}^n P(Xi = xi)    for all x1, x2, . . . , xn

or analogously for continuous random variables if


P(X1 ≤ a1, X2 ≤ a2, . . . , Xn ≤ an) = ∏_{i=1}^n P(Xi ≤ ai)    for all a1, a2, . . . , an

4.4 Convolution
Let X and Y be independent random variables. The convolution of FX and FY is FX+Y :

FX+Y(a) = P(X + Y ≤ a) = ∫_{−∞}^∞ FX(a − y) fY(y) dy

fX+Y(a) = ∫_{−∞}^∞ fX(a − y) fY(y) dy

In the discrete case, replace ∫_{−∞}^∞ · dy with ∑_y, and f(y) with p(y).
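
A small sketch of the discrete case in Python (two fair dice are used as an illustrative example, not something from the sheet):

from collections import defaultdict

# Illustrative discrete convolution: PMF of the sum of two independent fair dice.
p_x = {k: 1 / 6 for k in range(1, 7)}
p_y = {k: 1 / 6 for k in range(1, 7)}

# p_{X+Y}(a) = sum_y p_X(a - y) * p_Y(y), written as a double loop over supports.
p_sum = defaultdict(float)
for x, px in p_x.items():
    for y, py in p_y.items():
        p_sum[x + y] += px * py

print(p_sum[7])  # 6/36 for two fair dice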

4.5 Conditional Distributions
Conditional PMF of X given Y :
pX∣Y(x∣y) = P(X = x ∣ Y = y) = pX,Y(x, y) / pY(y)
Conditional PDF of X given Y :
fX∣Y(x∣y) = fX,Y(x, y) / fY(y)
Conditional CDF of X given Y :
FX∣Y(a∣y) = P(X ≤ a ∣ Y = y) = ∑_{x ≤ a} pX∣Y(x∣y)    (discrete)
          = ∫_{−∞}^a fX∣Y(x∣y) dx    (continuous)

n random variables X1 , X2 , . . . , Xn are conditionally independent given Y if


P(X1 = x1, X2 = x2, . . . , Xn = xn ∣ Y = y) = ∏_{i=1}^n P(Xi = xi ∣ Y = y)    for all x1, x2, . . . , xn, y

or analogously for continuous random variables if


P(X1 ≤ a1, X2 ≤ a2, . . . , Xn ≤ an ∣ Y = y) = ∏_{i=1}^n P(Xi ≤ ai ∣ Y = y)    for all a1, a2, . . . , an, y

It is possible to mix continuous and discrete random variables in conditional distributions. For example let X
be a continuous random variable and N be a discrete random variable. Then the conditional PDF of X given
N and the conditional PMF of N given X are
fX∣N(x∣n) = pN∣X(n∣x) fX(x) / pN(n)

pN∣X(n∣x) = fX∣N(x∣n) pN(n) / fX(x)

5 Expectation
5.1 Definitions
The expected value for a discrete random variable X is defined as
E[X] = ∑_{x: p(x)>0} x p(x)

For a continuous random variable X, the expected value is



E[X] = ∫_{−∞}^∞ x f(x) dx

5.2 Properties
If I is an indicator variable for the event A, then
E[I] = P (A)
Let g(X) be a real-valued function of X.

E[g(X)] = ∑_i g(xi) p(xi)        E[g(X)] = ∫_{−∞}^∞ g(x) f(x) dx

Let g(X, Y ) be a real-valued function of two random variables.
E[g(X, Y)] = ∑_y ∑_x g(x, y) pX,Y(x, y)        E[g(X, Y)] = ∫_{−∞}^∞ ∫_{−∞}^∞ g(x, y) fX,Y(x, y) dx dy

Linearity:
E[aX + b] = aE[X] + b
n-th Moment of X:
E[X^n] = ∑_{x: p(x)>0} x^n p(x)

Expected Values of Sums:


E[∑_{i=1}^n Xi] = ∑_{i=1}^n E[Xi]

Bounding Expectation:
If a random variable satisfies X ≥ a, then E[X] ≥ a.
If P(a ≤ X < ∞) = 1, then E[X] ≥ a.
If X ≥ Y, then E[X] ≥ E[Y].

5.3 Conditional Expectation


Conditional Expectation of X given Y = y:
E[X∣Y = y] = ∑_x x pX∣Y(x∣y)        E[X∣Y = y] = ∫_{−∞}^{+∞} x fX∣Y(x∣y) dx

Expectation of conditional sum:


E[∑_{i=1}^n Xi ∣ Y = y] = ∑_{i=1}^n E[Xi ∣ Y = y]

Expectation of conditional expectations:


E[E[X∣Y ]] = E[X]
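
A quick simulation sketch of E[E[X∣Y]] = E[X] (the distributions chosen here are arbitrary illustrations, not part of the sheet):

import random

random.seed(0)

# Illustrative check of the tower property E[E[X|Y]] = E[X]:
# Y ~ Uniform{1,...,6}, and X | Y = y ~ Binomial(y, 0.5), so E[X | Y] = Y/2.
n_samples = 100_000
xs, cond_exps = [], []
for _ in range(n_samples):
    y = random.randint(1, 6)
    x = sum(random.random() < 0.5 for _ in range(y))
    xs.append(x)
    cond_exps.append(y / 2)          # E[X | Y = y]

print(sum(xs) / n_samples)           # ~ E[X] = E[Y]/2 = 1.75
print(sum(cond_exps) / n_samples)    # ~ E[E[X|Y]], also ~ 1.75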

6 Variance
6.1 Definition
If X is a random variable with mean µ then the variance of X, denoted Var(X), is:

Var(X) = E[(X − µ)2 ] = E[X 2 ] − (E[X])2

6.2 Properties
Var(aX + b) = a2 Var(X)
If X1 , X2 , . . . , Xn are independent random variables, then
Var(∑_{i=1}^n Xi) = ∑_{i=1}^n Var(Xi)

6.3 Covariance
Cov(X, Y ) = E[(X − E[X])(Y − E[Y ])] = E[XY ] − E[X]E[Y ]
If X and Y are independent, then Cov(X, Y) = 0.
Properties:

Cov(X, Y ) = Cov(Y, X)
Cov(X, X) = Var(X)
Cov(aX + b, Y ) = aCov(X, Y )

If X1 , X2 , . . . , Xn and Y1 , Y2 , . . . , Ym are random variables, then

Cov(∑_{i=1}^n Xi, ∑_{j=1}^m Yj) = ∑_{i=1}^n ∑_{j=1}^m Cov(Xi, Yj)
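
A sanity-check sketch of this summation property using sample covariances (simulated data, so the results are only approximate):

import random

random.seed(1)

# Illustrative check that Cov(X1 + X2, Y) = Cov(X1, Y) + Cov(X2, Y).
def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

n = 100_000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
y  = [a + random.gauss(0, 1) for a in x1]        # Y depends on X1 only

print(cov([a + b for a, b in zip(x1, x2)], y))   # ~ Cov(X1,Y) + Cov(X2,Y)
print(cov(x1, y) + cov(x2, y))                   # ~ 1 + 0 = 1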

6.4 Correlation
ρ(X, Y) = Cov(X, Y) / √(Var(X)Var(Y))
Note: −1 ≤ ρ(X, Y ) ≤ 1.
Correlation measures linearity between X and Y .
If ρ(X, Y ) = 0, X and Y are uncorrelated.

7 Moment Generating Functions


7.1 Definition
Moment Generating Function (MGF) of a random variable X, where −∞ < t < ∞, is

M (t) = E[etX ]

When X is discrete: M(t) = ∑_x e^{tx} p(x)

When X is continuous: M(t) = ∫_{−∞}^∞ e^{tx} f(x) dx

For any n random variables X1 , X2 , . . . , Xn

M(t1, t2, . . . , tn) = E[e^{t1 X1 + t2 X2 + ⋯ + tn Xn}]

The individual moment generating function is obtained as:

MXi(t) = E[e^{t Xi}] = M(0, . . . , 0, t, 0, . . . , 0)    with t in the i-th place

7.2 Properties
M^(n)(t) = (d^n/dt^n) M(t) = E[X^n e^{tX}]
M^(n)(0) = E[X^n]
MX(t) = MY(t) for all t if and only if X and Y have the same distribution.
X1, X2, . . . , Xn are independent if and only if:

M (t1 , t2 , . . . , tn ) = MX1 (t1 )MX2 (t2 ) . . . MXn (tn )

8 Inequalities
8.1 Boole’s Inequality
Let E1 , E2 , . . . , En be events with indicator random variables Xi .
∑_{i=1}^n P(Ei) ≥ P(⋃_{i=1}^n Ei)

8.2 Markov’s Inequality


X is a nonnegative random variable.

P(X ≥ a) ≤ E[X]/a    for all a > 0

8.3 Chebyshev’s Inequality


X is a random variable with E[X] = µ and Var(X) = σ 2 .

P(∣X − µ∣ ≥ k) ≤ σ²/k²    for all k > 0
One-sided inequality:
P(X ≥ E[X] + a) ≤ σ²/(σ² + a²)    for any a > 0
P(X ≤ E[X] − a) ≤ σ²/(σ² + a²)    for any a > 0

8.4 Chernoff Bound


X is a random variable with MGF M (t).

P (X ≥ a) ≤ e−ta M (t) for all t > 0

P (X ≤ a) ≤ e−ta M (t) for all t < 0


In practice, use the t that minimizes e^{−ta} M(t).
For a Poisson(λ) random variable and the bound on P(X ≥ i), the minimizing t is ln(i/λ).
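
A small numeric sketch of the Poisson case (λ = 20 and i = 30 are arbitrary illustration values):

import math

# Chernoff bound for X ~ Poi(lam) and P(X >= i),
# using M(t) = exp(lam * (e^t - 1)) and the minimizing t = ln(i / lam).
lam, i = 20, 30
t = math.log(i / lam)
mgf = math.exp(lam * (math.exp(t) - 1))
chernoff = math.exp(-t * i) * mgf
print(chernoff)

# Exact tail for comparison: P(X >= i) = 1 - sum_{k < i} e^-lam lam^k / k!
exact = 1 - sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(i))
print(exact)  # the bound is larger than the exact probability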

8.5 Jensen’s Inequality


If f (x) is a convex function (f ′′ (x) ≥ 0 for all x) then

E[f(X)] ≥ f(E[X])

9 Maximum Likelihood Estimator


9.1 Derivation Method
1) Write the density (or mass) function with the parameter λ as an argument: f(Xi ∣ λ)
2) L(λ) = ∏_{i=1}^n f(Xi ∣ λ)
3) LL(λ) = ∑_{i=1}^n log f(Xi ∣ λ)
4) Maximize by setting dLL(λ)/dλ = 0
5) Solve for λ̂ (see the sketch below)
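
A minimal sketch of these steps for Poisson data (the sample values below are made up; the closed form λ̂ = sample mean is checked against a crude grid search):

import math

# Illustrative MLE sketch for X_i ~ Poi(lambda).
data = [3, 1, 4, 2, 2, 5, 3, 0, 2, 4]

# Step 3: log-likelihood LL(lam) = sum_i log f(X_i | lam)
def log_likelihood(lam):
    return sum(-lam + x * math.log(lam) - math.log(math.factorial(x)) for x in data)

# Steps 4-5: setting dLL/dlam = 0 gives the closed form lambda_hat = sample mean.
lam_hat = sum(data) / len(data)

# Sanity check: the closed form should beat nearby values of lambda.
print(lam_hat, log_likelihood(lam_hat))
print(max((log_likelihood(l / 100), l / 100) for l in range(1, 1001)))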

9.2 Biased or Unbiased?
An estimator is unbiased when:
E[Θ̂] = Θ

9.3 Estimator Consistency


For any ε > 0,
lim_{n→∞} P(∣Θ̂ − Θ∣ ≤ ε) = 1
Meaning: as we get more data, the probability that the estimate deviates from the true value by more than a small amount goes to 0.

10 Central Limit Theorem


10.1 Discussion
Deals with I.I.D. (independent and identically distributed) random variables.
The CLT says that if the random variables have a finite mean µ and finite variance σ², then the distribution of the sum of the first n of them is, for large n, approximately that of a normal variable with mean nµ and variance nσ².

10.2 Confidence Interval


Consider IID random variables X1, . . . , Xn with sample mean X̄.
S : sample standard deviation, where S² = ∑_{i=1}^n (Xi − X̄)² / (n − 1)
Var(X̄) = σ²/n
For large n, the 100(1 − α)% CI is:

(X̄ − z_{α/2} S/√n , X̄ + z_{α/2} S/√n)
Meaning: 100(1 − α)% of the time that such a CI is computed from a sample, the true µ lies in the interval.
Φ(z_{α/2}) = 1 − α/2
Ex: α = 0.05, α/2 = 0.025, Φ(z_{α/2}) = 0.975, z_{α/2} = 1.96
Confidence Level:
90% → 1.645
95% → 1.96
99% → 2.58
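
A short sketch of computing such an interval (the data are simulated from an arbitrary distribution with known mean, purely for illustration):

import math
import random

random.seed(2)

# Illustrative 95% confidence interval for the mean.
data = [random.expovariate(0.5) for _ in range(400)]   # true mean = 2
n = len(data)
x_bar = sum(data) / n
s = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))

z = 1.96                                   # z_{alpha/2} for alpha = 0.05
ci = (x_bar - z * s / math.sqrt(n), x_bar + z * s / math.sqrt(n))
print(ci)  # should contain the true mean 2 about 95% of the time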

10.3 Calculating n to ensure confidence level with average


Consider IID random variables with given µ and σ². If you want a buffer of ±a around the mean and confidence level 100(1 − α)%, let

Zn = (∑_{i=1}^n Xi − nµ) / (σ√n)

and require

P(−a√n/σ ≤ Zn ≤ a√n/σ) = 2Φ(a√n/σ) − 1 = 1 − α

then solve for n.

10.4 Approximate Probability with CLT


Take X ∼ Poi(20). Then P(X ≥ 24) ≈ P(Z ≥ (23.5 − 20)/√20).
*Remember the continuity correction if X comes from a discrete distribution.
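
A quick sketch of this computation, with Φ evaluated via math.erf (illustrative only):

import math

# Normal approximation to P(X >= 24) for X ~ Poi(20), with continuity correction.
phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))   # standard normal CDF

mu, var = 20, 20                       # Poisson: mean = variance = lambda
approx = 1 - phi((23.5 - mu) / math.sqrt(var))
print(approx)

# Exact value for comparison.
exact = 1 - sum(math.exp(-20) * 20**k / math.factorial(k) for k in range(24))
print(exact)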

11 Laws of Large Numbers
Consider IID Random Variables

11.1 Weak Law


For any ε > 0,
P(∣X̄ − µ∣ ≥ ε) → 0 as n → ∞, where X̄ = (X1 + ⋯ + Xn)/n

11.2 Strong Law


P(lim_{n→∞} (X1 + X2 + ⋯ + Xn)/n = µ) = 1
Strong implies Weak but NOT vice-versa
The strong law implies that for any ε > 0, there are only a finite number of values of n for which ∣X̄ − µ∣ ≥ ε holds.

12 Discrete Random Variables


12.1 Bernoulli
An experiment that results in "success" or "failure."

X ∼ Ber(p)
P (X = 0) = 1 − p
P (X = 1) = p
E[X] = p
Var(X) = p(1 − p)
M (t) = et p + 1 − p

12.2 Binomial
The number of successes in an experiment with n trials and p probability of success on each trial.

X ∼ Bin(n, p)
P(X = i) = p(i) = (n choose i) p^i (1 − p)^{n−i}    where i = 0, 1, . . . , n
E[X] = np
Var(X) = np(1 − p)
M (t) = (pet + 1 − p)n

If Xi ∼ Bin(ni , p) for 1 ≤ i ≤ N , then


(∑_{i=1}^N Xi) ∼ Bin(∑_{i=1}^N ni, p)

Note that the binomial distribution is a generalization of the Bernoulli distribution, since Ber(p) ∼ Bin(1, p).

12.3 Poisson
Approximates the binomial random variable when n is large and p is small enough to make np "moderate" (generally when n > 20 and p < 0.05); the approximation becomes exact as n → ∞ and p → 0 with np held fixed.
X ∼ Poi(λ) where λ = np
P(X = i) = e^{−λ} λ^i / i!    where i = 0, 1, 2, . . .
E[X] = λ
Var(X) = λ
M(t) = e^{λ(e^t − 1)}

The approximation also works to a certain extent when the successes in the trials are not entirely independent, and when the probability of success in each trial varies slightly.

If Xi ∼ Poi(λi ) for 1 ≤ i ≤ N , then


(∑_{i=1}^N Xi) ∼ Poi(∑_{i=1}^N λi)
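
A short sketch comparing the binomial PMF with its Poisson approximation for illustrative values n = 100, p = 0.03:

import math

# Illustrative comparison of Bin(n, p) with its Poisson approximation Poi(np).
n, p = 100, 0.03
lam = n * p

def binom_pmf(i):
    return math.comb(n, i) * p**i * (1 - p) ** (n - i)

def poisson_pmf(i):
    return math.exp(-lam) * lam**i / math.factorial(i)

for i in range(6):
    print(i, round(binom_pmf(i), 4), round(poisson_pmf(i), 4))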

12.4 Geometric
The number of independent trials until the first success, where the probability of success on each trial is p.
X ∼ Geo(p)
P (X = n) = (1 − p)n−1 p where n = 1, 2, . . .
E[X] = 1/p
Var(X) = (1 − p)/p2

12.5 Negative Binomial


The number of independent trials until r successes, with probability p of success.
X ∼ NegBin(r, p)
P(X = n) = (n−1 choose r−1) p^r (1 − p)^{n−r}    where n = r, r + 1, . . .
E[X] = r/p
Var(X) = r(1 − p)/p²
Note that the negative binomial distribution generalizes the geometric distribution, with Geo(p) ∼ NegBin(1, p).

12.6 Hypergeometric
The number of white balls drawn after drawing n balls (without replacement) from an urn containing N balls,
with m white balls and N − m other ("black") balls.
X ∼ HypG(n, N, m)
P(X = i) = (m choose i)(N−m choose n−i) / (N choose n)    where i = 0, 1, . . . , n
E[X] = n(m/N )
Var(X) = nm(N − n)(N − m) / (N²(N − 1))
HypG(n, N, m) → Bin(n, m/N ) , as N → ∞ and m/N stays constant

12.7 Multinomial
The multinomial distribution further generalizes the binomial distribution: given an experiment with n independent trials, where each trial results in one of m outcomes, with respective probabilities p1, p2, . . . , pm such that ∑_{i=1}^m pi = 1, then if Xi denotes the number of trials with outcome i we have

P(X1 = c1, X2 = c2, . . . , Xm = cm) = (n choose c1, c2, . . . , cm) p1^{c1} p2^{c2} ⋯ pm^{cm}

where ∑_{i=1}^m ci = n and (n choose c1, c2, . . . , cm) = n! / (c1! c2! ⋯ cm!).

13 Continuous Random Variables


If Y is a non-negative continuous random variable

E[Y] = ∫_0^∞ P(Y > y) dy

13.1 Uniform

X ∼ Uni(α, β)
f(x) = 1/(β − α) for α ≤ x ≤ β, and 0 otherwise
E[X] = (α + β)/2
Var(X) = (β − α)²/12

13.2 Normal
For values in common natural phenomena, especially when resulting from the sum of multiple variables.

X ∼ N(µ, σ 2 )
f(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}    where − ∞ < x < ∞
E[X] = µ
Var(X) = σ 2
M(t) = e^{σ²t²/2 + µt}

Letting X ∼ N (µ, σ 2 ) and Y = aX + b, we have

Y ∼ N(aµ + b, a2 σ 2 )
FY(x) = FX((x − b)/a)
The Standard (Unit) Normal Random Variable Z ∼ N (0, 1) has a cumulative distribution function (CDF)
commonly labeled Φ(z) = P (Z ≤ z) that has some useful properties.
Φ(z) = ∫_{−∞}^z (1/√(2π)) e^{−x²/2} dx
Φ(−z) = 1 − Φ(z)
P(Z ≤ −z) = P(Z > z)

Given X ∼ N (µ, σ 2 ) where σ > 0, we can then compute the CDF of X using the CDF of the standard normal
variable.
FX(x) = Φ((x − µ)/σ)
By the de Moivre-Laplace Limit Theorem, the normal variable can approximate the binomial when Var(X) =
np(1 − p) ≥ 10. If we let Sn denote the number of successes (with probability p) in n independent trials, then

P(a ≤ (Sn − np)/√(np(1 − p)) ≤ b) → Φ(b) − Φ(a)    as n → ∞

If Xi ∼ N(µi , σi2 ) for i = 1, 2, . . . , n, then


(∑_{i=1}^n Xi) ∼ N(∑_{i=1}^n µi, ∑_{i=1}^n σi²)

13.3 Exponential
Represents time until some event, with rate λ > 0.
X ∼ Exp(λ)
f(x) = λe^{−λx} for x ≥ 0, and 0 for x < 0
E[X] = 1/λ
Var(X) = 1/λ²
F(x) = 1 − e^{−λx}    where x ≥ 0
Exponentially distributed random variables are memoryless.
P (X > s + t∣X > s) = P (X > t)

13.4 Beta

X ∼ Beta(a, b)
f(x) = (1/B(a, b)) x^{a−1} (1 − x)^{b−1} for 0 < x < 1, and 0 otherwise

B(a, b) = ∫_0^1 x^{a−1} (1 − x)^{b−1} dx
E[X] = a/(a + b)
Var(X) = ab / ((a + b)²(a + b + 1))
If X ∼ Uni(0, 1) is the unknown probability of getting heads and N denotes the number of heads observed in a series of coin flips, then
X ∣ (N = n heads in m + n flips) ∼ Beta(n + 1, m + 1)
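
A small sketch of this posterior (the head/tail counts are made up; beta_pdf is a hypothetical helper written from the density above):

from math import gamma

# Uniform-prior coin: after n heads and m tails, the posterior on the
# heads probability is Beta(n + 1, m + 1).
def beta_pdf(x, a, b):
    const = gamma(a + b) / (gamma(a) * gamma(b))   # 1 / B(a, b)
    return const * x ** (a - 1) * (1 - x) ** (b - 1)

n_heads, m_tails = 7, 3
a, b = n_heads + 1, m_tails + 1
print(a / (a + b))                      # posterior mean E[X | data] = 8/12
print(beta_pdf(0.7, a, b))              # posterior density at x = 0.7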

14 Useful Definitions
14.1 Taylor Series
e^x = ∑_{n=0}^∞ x^n / n!

14.2 Integration By Parts
∫_a^b u(x) (dv/dx) dx = u(x)v(x) ∣_a^b − ∫_a^b v(x) (du/dx) dx
