Probability and Statistics Cookbook

This cookbook integrates a variety of topics in probability theory and statistics. It is based on literature [1, 6, 3] and in-class material from courses of the statistics department at the University of California in Berkeley, but is also influenced by other sources [4, 5]. If you find errors or have suggestions for further topics, I would appreciate it if you send me an email. The most recent version of this document is available at http://matthias.vallentin.net/probability-and-statistics-cookbook/. To reproduce, please contact me.

Contents

1 Distribution Overview
  1.1 Discrete Distributions
  1.2 Continuous Distributions
2 Probability Theory
3 Random Variables
  3.1 Transformations
4 Expectation
5 Variance
6 Inequalities
7 Distribution Relationships
8 Probability and Moment Generating Functions
9 Multivariate Distributions
  9.1 Standard Bivariate Normal
  9.2 Bivariate Normal
  9.3 Multivariate Normal
10 Convergence
  10.1 Law of Large Numbers (LLN)
  10.2 Central Limit Theorem (CLT)
11 Statistical Inference
  11.1 Point Estimation
  11.2 Normal-Based Confidence Interval
  11.3 Empirical distribution
  11.4 Statistical Functionals
12 Parametric Inference
  12.1 Method of Moments
  12.2 Maximum Likelihood
    12.2.1 Delta Method
  12.3 Multiparameter Models
    12.3.1 Multiparameter delta method
  12.4 Parametric Bootstrap
13 Hypothesis Testing
14 Bayesian Inference
  14.1 Credible Intervals
  14.2 Function of parameters
  14.3 Priors
    14.3.1 Conjugate Priors
  14.4 Bayesian Testing
15 Exponential Family
16 Sampling Methods
  16.1 The Bootstrap
    16.1.1 Bootstrap Confidence Intervals
  16.2 Rejection Sampling
  16.3 Importance Sampling
17 Decision Theory
  17.1 Risk
  17.2 Admissibility
  17.3 Bayes Rule
  17.4 Minimax Rules
18 Linear Regression
  18.1 Simple Linear Regression
  18.2 Prediction
  18.3 Multiple Regression
  18.4 Model Selection
19 Non-parametric Function Estimation
  19.1 Density Estimation
    19.1.1 Histograms
    19.1.2 Kernel Density Estimator (KDE)
  19.2 Non-parametric Regression
  19.3 Smoothing Using Orthogonal Functions
20 Stochastic Processes
  20.1 Markov Chains
  20.2 Poisson Processes
21 Time Series
  21.1 Stationary Time Series
  21.2 Estimation of Correlation
  21.3 Non-Stationary Time Series
    21.3.1 Detrending
  21.4 ARIMA models
    21.4.1 Causality and Invertibility
  21.5 Spectral Analysis
22 Math
  22.1 Gamma Function
  22.2 Beta Function
  22.3 Series
  22.4 Combinatorics
1 Distribution Overview

1.1 Discrete Distributions

We use the notation Γ(s, x) and Γ(x) to refer to the Gamma functions (see 22.1), and use B(x, y) and Ix(a, b) to refer to the Beta functions (see 22.2).

Uniform Unif{a, ..., b}:
  F(x) = 0 for x < a; (⌊x⌋ − a + 1)/(b − a + 1) for a ≤ x ≤ b; 1 for x > b
  f(x) = I(a ≤ x ≤ b)/(b − a + 1)
  E[X] = (a + b)/2,  V[X] = ((b − a + 1)² − 1)/12,  M(s) = (e^{as} − e^{(b+1)s})/(s(b − a))

Bernoulli Bern(p):
  F(x) = (1 − p)^{1−x} for x ∈ {0, 1},  f(x) = p^x (1 − p)^{1−x}
  E[X] = p,  V[X] = p(1 − p),  M(s) = 1 − p + pe^s

Binomial Bin(n, p):
  F(x) = I_{1−p}(n − x, x + 1),  f(x) = C(n, x) p^x (1 − p)^{n−x}
  E[X] = np,  V[X] = np(1 − p),  M(s) = (1 − p + pe^s)^n

Multinomial Mult(n, p):
  f(x) = (n!/(x1! ··· xk!)) p1^{x1} ··· pk^{xk} with Σ xi = n
  E[Xi] = n pi,  V[Xi] = n pi (1 − pi),  M(s) = (Σ_i pi e^{si})^n

Hypergeometric Hyp(N, m, n):
  f(x) = C(m, x) C(N − m, n − x)/C(N, n)
  E[X] = nm/N,  V[X] = nm(N − n)(N − m)/(N²(N − 1))

Negative Binomial NBin(r, p):
  F(x) = Ip(r, x + 1),  f(x) = C(x + r − 1, r − 1) p^r (1 − p)^x
  E[X] = r(1 − p)/p,  V[X] = r(1 − p)/p²,  M(s) = (p/(1 − (1 − p)e^s))^r

Geometric Geo(p):
  F(x) = 1 − (1 − p)^x,  f(x) = p(1 − p)^{x−1} for x ∈ N+
  E[X] = 1/p,  V[X] = (1 − p)/p²,  M(s) = pe^s/(1 − (1 − p)e^s)

Poisson Po(λ):
  F(x) = e^{−λ} Σ_{i=0}^{⌊x⌋} λ^i/i!,  f(x) = λ^x e^{−λ}/x! for x ∈ {0, 1, 2, ...}
  E[X] = λ,  V[X] = λ,  M(s) = e^{λ(e^s − 1)}

[Figure: PMF plots of the discrete uniform, binomial (n = 40, p = 0.3; n = 30, p = 0.6; n = 25, p = 0.9), geometric (p = 0.2, 0.5, 0.8), and Poisson (λ = 1, 4, 10) distributions.]
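As a quick numerical check of the tabulated moments (a minimal Python/NumPy sketch, not part of the original cookbook), one can compare empirical means and variances of simulated draws with E[X] and V[X] from the table:

import numpy as np

rng = np.random.default_rng(0)

# Binomial: E[X] = np, V[X] = np(1 - p)
n, p = 40, 0.3
x = rng.binomial(n, p, size=100_000)
print(x.mean(), n * p)            # empirical vs. tabulated mean
print(x.var(), n * p * (1 - p))   # empirical vs. tabulated variance

# Poisson: E[X] = lambda, V[X] = lambda
lam = 4.0
y = rng.poisson(lam, size=100_000)
print(y.mean(), y.var(), lam)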
1.2 Continuous Distributions

Uniform Unif(a, b):
  F(x) = 0 for x < a; (x − a)/(b − a) for a < x < b; 1 for x > b
  f(x) = I(a < x < b)/(b − a)
  E[X] = (a + b)/2,  V[X] = (b − a)²/12,  M(s) = (e^{sb} − e^{sa})/(s(b − a))

Normal N(μ, σ²):
  F(x) = Φ((x − μ)/σ), where Φ(x) = ∫_{−∞}^{x} φ(t) dt
  f(x) = (1/(σ√(2π))) exp{−(x − μ)²/(2σ²)}
  E[X] = μ,  V[X] = σ²,  M(s) = exp{μs + σ²s²/2}

Log-Normal ln N(μ, σ²):
  F(x) = 1/2 + (1/2) erf[(ln x − μ)/(√2 σ)]
  f(x) = (1/(x σ√(2π))) exp{−(ln x − μ)²/(2σ²)}
  E[X] = e^{μ + σ²/2},  V[X] = (e^{σ²} − 1) e^{2μ + σ²}

Multivariate Normal MVN(μ, Σ):
  f(x) = (2π)^{−k/2} |Σ|^{−1/2} exp{−(1/2)(x − μ)ᵀ Σ^{−1} (x − μ)}
  E[X] = μ,  V[X] = Σ,  M(s) = exp{μᵀs + (1/2) sᵀΣs}

Student's t Student(ν):
  f(x) = (Γ((ν + 1)/2)/(√(νπ) Γ(ν/2))) (1 + x²/ν)^{−(ν+1)/2}
  E[X] = 0 (ν > 1),  V[X] = ν/(ν − 2) (ν > 2)

Chi-square χ²_k:
  F(x) = γ(k/2, x/2)/Γ(k/2),  f(x) = x^{k/2−1} e^{−x/2}/(2^{k/2} Γ(k/2))
  E[X] = k,  V[X] = 2k,  M(s) = (1 − 2s)^{−k/2} for s < 1/2

F F(d1, d2):
  F(x) = I_{d1x/(d1x+d2)}(d1/2, d2/2)
  f(x) = √((d1 x)^{d1} d2^{d2}/(d1 x + d2)^{d1+d2}) / (x B(d1/2, d2/2))
  E[X] = d2/(d2 − 2) (d2 > 2),  V[X] = 2 d2² (d1 + d2 − 2)/(d1 (d2 − 2)² (d2 − 4)) (d2 > 4)

Exponential Exp(β) (scale β):
  F(x) = 1 − e^{−x/β},  f(x) = (1/β) e^{−x/β}
  E[X] = β,  V[X] = β²,  M(s) = 1/(1 − βs) (s < 1/β)

Gamma Gamma(α, β) (shape α, scale β):
  F(x) = γ(α, x/β)/Γ(α),  f(x) = x^{α−1} e^{−x/β}/(Γ(α) β^α)
  E[X] = αβ,  V[X] = αβ²,  M(s) = (1/(1 − βs))^α (s < 1/β)

Inverse Gamma InvGamma(α, β):
  F(x) = Γ(α, β/x)/Γ(α),  f(x) = (β^α/Γ(α)) x^{−α−1} e^{−β/x}
  E[X] = β/(α − 1) (α > 1),  V[X] = β²/((α − 1)²(α − 2)) (α > 2)

Dirichlet Dir(α):
  f(x) = (Γ(Σ_{i=1}^k αi)/Π_{i=1}^k Γ(αi)) Π_{i=1}^k xi^{αi−1}
  E[Xi] = αi/Σ_j αj,  V[Xi] = E[Xi](1 − E[Xi])/(Σ_j αj + 1)

Beta Beta(α, β):
  F(x) = Ix(α, β),  f(x) = (Γ(α + β)/(Γ(α)Γ(β))) x^{α−1}(1 − x)^{β−1} = x^{α−1}(1 − x)^{β−1}/B(α, β)
  E[X] = α/(α + β),  V[X] = αβ/((α + β)²(α + β + 1))
  M(s) = 1 + Σ_{k=1}^∞ (Π_{r=0}^{k−1} (α + r)/(α + β + r)) s^k/k!

Weibull Weibull(λ, k):
  F(x) = 1 − e^{−(x/λ)^k},  f(x) = (k/λ)(x/λ)^{k−1} e^{−(x/λ)^k}
  E[X] = λ Γ(1 + 1/k),  V[X] = λ² [Γ(1 + 2/k) − Γ²(1 + 1/k)]
  M(s) = Σ_{n=0}^∞ (s^n λ^n/n!) Γ(1 + n/k)

Pareto Pareto(xm, α):
  F(x) = 1 − (xm/x)^α for x ≥ xm,  f(x) = α xm^α/x^{α+1} for x ≥ xm
  E[X] = α xm/(α − 1) (α > 1),  V[X] = xm² α/((α − 1)²(α − 2)) (α > 2)
  M(s) = α(−xm s)^α Γ(−α, −xm s) (s < 0)

[Figure: PDF plots of the uniform, normal (μ = 0, σ² = 0.2, 1, 5; μ = −2, σ² = 0.5), log-normal, Student's t (ν = 1, 2, 5, ∞), chi-square (k = 1, ..., 5), F, exponential, gamma, inverse gamma, beta, Weibull, and Pareto distributions for various parameter values.]
2 Probability Theory

Definitions
- Sample space Ω
- Outcome (point or element) ω ∈ Ω
- Event A ⊆ Ω
- σ-algebra A:
  1. ∅ ∈ A
  2. A1, A2, ... ∈ A ⟹ ∪_{i=1}^∞ Ai ∈ A
  3. A ∈ A ⟹ Aᶜ ∈ A
- Probability distribution P:
  1. P[A] ≥ 0 for every A
  2. P[Ω] = 1
  3. P[⊔_{i=1}^∞ Ai] = Σ_{i=1}^∞ P[Ai] for disjoint Ai
- Probability space (Ω, A, P)

Properties
- P[∅] = 0
- B = (A ∩ B) ∪ (Aᶜ ∩ B) ⟹ P[B] = P[A ∩ B] + P[Aᶜ ∩ B]
- P[Aᶜ] = 1 − P[A]
- P[A ∪ B] = P[A] + P[B] − P[A ∩ B]
- P[A ∪ B] = P[A ∩ Bᶜ] + P[Aᶜ ∩ B] + P[A ∩ B]
- P[A ∩ Bᶜ] = P[A] − P[A ∩ B]
- DeMorgan: (∪_n An)ᶜ = ∩_n Anᶜ and (∩_n An)ᶜ = ∪_n Anᶜ
- P[∪_n An] = 1 − P[∩_n Anᶜ]

Continuity of probabilities
- A1 ⊂ A2 ⊂ ... ⟹ lim_{n→∞} P[An] = P[A] where A = ∪_{i=1}^∞ Ai
- A1 ⊃ A2 ⊃ ... ⟹ lim_{n→∞} P[An] = P[A] where A = ∩_{i=1}^∞ Ai

Independence: A ⫫ B ⟺ P[A ∩ B] = P[A] P[B]

Conditional probability: P[A | B] = P[A ∩ B]/P[B], provided P[B] > 0

Law of total probability: P[B] = Σ_{i=1}^n P[B | Ai] P[Ai], where Ω = ⊔_{i=1}^n Ai

Bayes' theorem: P[Ai | B] = P[B | Ai] P[Ai] / Σ_{j=1}^n P[B | Aj] P[Aj], where Ω = ⊔_{i=1}^n Ai

Inclusion-exclusion principle: P[∪_{i=1}^n Ai] = Σ_{r=1}^n (−1)^{r−1} Σ_{i1 < ··· < ir} P[∩_{j=1}^r A_{ij}]

3 Random Variables

Random variable: X : Ω → R
Probability mass function (PMF): f_X(x) = P[X = x]
Probability density function (PDF): P[a ≤ X ≤ b] = ∫_a^b f(x) dx
Cumulative distribution function (CDF): F_X(x) = P[X ≤ x]
Conditional density: f_{Y|X}(y | x) = f(x, y)/f_X(x), and P[a ≤ Y ≤ b | X = x] = ∫_a^b f_{Y|X}(y | x) dy

Independence: X ⫫ Y ⟺
1. P[X ≤ x, Y ≤ y] = P[X ≤ x] P[Y ≤ y]
2. f_{X,Y}(x, y) = f_X(x) f_Y(y)
3.1 Transformations

Transformation function: Z = φ(X)

Discrete: f_Z(z) = P[φ(X) = z] = P[{x : φ(x) = z}] = P[X ∈ φ^{−1}(z)] = Σ_{x ∈ φ^{−1}(z)} f(x)

Continuous: F_Z(z) = P[φ(X) ≤ z] = ∫_{A_z} f(x) dx, with A_z = {x : φ(x) ≤ z}

Special case (convolution and related):
- Z := X + Y: f_Z(z) = ∫ f_{X,Y}(x, z − x) dx; if X, Y ≥ 0: f_Z(z) = ∫_0^z f_{X,Y}(x, z − x) dx
- Z := |X − Y|: f_Z(z) = 2 ∫_0^∞ f_{X,Y}(x, z + x) dx
- Z := Y/X: f_Z(z) = ∫ |x| f_{X,Y}(x, xz) dx = ∫ |x| f_X(x) f_Y(xz) dx (the latter if X ⫫ Y)

4 Expectation

E[X] = μ_X = ∫ x dF_X(x) = Σ_x x f_X(x) (X discrete) or ∫ x f_X(x) dx (X continuous)

Properties
- P[X = c] = 1 ⟹ E[c] = c
- E[cX] = c E[X]
- E[X + Y] = E[X] + E[Y]
- E[XY] = ∫∫ xy f_{X,Y}(x, y) dx dy
- In general E[φ(X)] ≠ φ(E[X]) (cf. Jensen's inequality)
- P[X ≥ Y] = 1 ⟹ E[X] ≥ E[Y], and P[X = Y] = 1 ⟹ E[X] = E[Y]
- For X taking values in {1, 2, ...}: E[X] = Σ_{x=1}^∞ P[X ≥ x]
- E[I_A(X)] = ∫ I_A(x) dF_X(x) = P[X ∈ A]
- Sample mean: X̄n = (1/n) Σ_{i=1}^n Xi

Conditional expectation
- E[Y | X = x] = ∫ y f(y | x) dy
- E[X] = E[E[X | Y]]  (law of iterated expectation)
- E[φ(X, Y) | X = x] = ∫ φ(x, y) f_{Y|X}(y | x) dy
- E[Y + Z | X] = E[Y | X] + E[Z | X]
- E[φ(X) Y | X] = φ(X) E[Y | X]
- E[Y | X] = c ⟹ Cov[X, Y] = 0

5 Variance

V[X] = σ²_X = E[(X − E[X])²] = E[X²] − E[X]²
V[Σ_{i=1}^n Xi] = Σ_{i=1}^n V[Xi] + 2 Σ_{i≠j} Cov[Xi, Xj]; if the Xi are independent, V[Σ Xi] = Σ V[Xi]
Standard deviation: sd[X] = √V[X] = σ_X

Covariance
- Cov[X, Y] = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X] E[Y]
- Cov[Σ_{i=1}^n Xi, Σ_{j=1}^m Yj] = Σ_{i=1}^n Σ_{j=1}^m Cov[Xi, Yj]

Correlation: ρ[X, Y] = Cov[X, Y]/√(V[X] V[Y])

Independence: X ⫫ Y ⟹ ρ[X, Y] = 0, Cov[X, Y] = 0, and E[XY] = E[X] E[Y]

Sample variance: S² = (1/(n − 1)) Σ_{i=1}^n (Xi − X̄n)²

Conditional variance
- V[Y | X] = E[(Y − E[Y | X])² | X] = E[Y² | X] − E[Y | X]²
- V[Y] = E[V[Y | X]] + V[E[Y | X]]
6 Inequalities

- Cauchy-Schwarz: E[XY]² ≤ E[X²] E[Y²]
- Markov: P[φ(X) ≥ t] ≤ E[φ(X)]/t
- Chebyshev: P[|X − E[X]| ≥ t] ≤ V[X]/t²
- Chernoff: P[X ≥ (1 + δ)μ] ≤ (e^δ/(1 + δ)^{1+δ})^μ for δ > −1
- Jensen: E[φ(X)] ≥ φ(E[X]) for convex φ

7 Distribution Relationships

Binomial
- Xi ~ Bern(p) ⟹ Σ_{i=1}^n Xi ~ Bin(n, p)

Poisson
- Xi ~ Po(λi), Xi ⫫ Xj ⟹ Σ_{i=1}^n Xi ~ Po(Σ_{i=1}^n λi)
- Xi ~ Po(λi), Xi ⫫ Xj ⟹ Xi | Σ_{j=1}^n Xj ~ Bin(Σ_{j=1}^n Xj, λi/Σ_{j=1}^n λj)

Exponential
- Xi ~ Exp(β) iid ⟹ Σ_{i=1}^n Xi ~ Gamma(n, β)

Normal
- X ~ N(μ, σ²) ⟹ Z = (X − μ)/σ ~ N(0, 1)
- X ~ N(μ, σ²), Z = aX + b ⟹ Z ~ N(aμ + b, a²σ²)
- X ~ N(μ1, σ1²) ⫫ Y ~ N(μ2, σ2²) ⟹ X + Y ~ N(μ1 + μ2, σ1² + σ2²)
- Xi ~ N(μi, σi²) independent ⟹ Σ_i Xi ~ N(Σ_i μi, Σ_i σi²)
- P[a < X ≤ b] = Φ((b − μ)/σ) − Φ((a − μ)/σ)
- Φ(−x) = 1 − Φ(x), φ'(x) = −xφ(x), φ''(x) = (x² − 1)φ(x)
- Upper quantile of N(0, 1): z_α = Φ^{−1}(1 − α)

Gamma
- X ~ Gamma(α, β) ⟺ X/β ~ Gamma(α, 1)
- Gamma(α, β) ≍ Σ_{i=1}^α Exp(β) for integer α
- Xi ~ Gamma(αi, β), Xi ⫫ Xj ⟹ Σ_i Xi ~ Gamma(Σ_i αi, β)
- Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx

Beta
- f(x) = (1/B(α, β)) x^{α−1}(1 − x)^{β−1} = (Γ(α + β)/(Γ(α)Γ(β))) x^{α−1}(1 − x)^{β−1}
- E[X^k] = B(α + k, β)/B(α, β) = ((α + k − 1)/(α + β + k − 1)) E[X^{k−1}]
- Beta(1, 1) ~ Unif(0, 1)
8 Probability and Moment Generating Functions

- Probability generating function: G_X(t) = E[t^X], |t| < 1
- Moment generating function: M_X(t) = G_X(e^t) = E[e^{Xt}] = E[Σ_{i=0}^∞ (Xt)^i/i!] = Σ_{i=0}^∞ E[X^i] t^i/i!
- P[X = 0] = G_X(0)
- P[X = 1] = G'_X(0)
- P[X = i] = G_X^{(i)}(0)/i!
- E[X] = G'_X(1⁻)
- E[X^k] = M_X^{(k)}(0)
- E[X!/(X − k)!] = G_X^{(k)}(1⁻)
- G_X(t) = G_Y(t) ⟹ X and Y have the same distribution

9 Multivariate Distributions

9.1 Standard Bivariate Normal

Let X, Z ~ N(0, 1) with X ⫫ Z, and set Y = ρX + √(1 − ρ²) Z.
Joint density: f(x, y) = (1/(2π√(1 − ρ²))) exp{−(x² + y² − 2ρxy)/(2(1 − ρ²))}
Conditionals: (Y | X = x) ~ N(ρx, 1 − ρ²) and (X | Y = y) ~ N(ρy, 1 − ρ²)
In general: E[X | Y] = E[X] + ρ(σ_X/σ_Y)(Y − E[Y]) and sd[X | Y] = σ_X √(1 − ρ²)
Independence: X ⫫ Y ⟺ ρ = 0

9.2 Bivariate Normal

Let X ~ N(μx, σx²) and Y ~ N(μy, σy²) with correlation ρ.
f(x, y) = (1/(2π σx σy √(1 − ρ²))) exp{−z/(2(1 − ρ²))}
where z = ((x − μx)/σx)² + ((y − μy)/σy)² − 2ρ((x − μx)/σx)((y − μy)/σy)

9.3 Multivariate Normal

Covariance matrix Σ (precision matrix Σ^{−1}):
Σ = [ V[X1] ··· Cov[X1, Xk] ; ⋮ ⋱ ⋮ ; Cov[Xk, X1] ··· V[Xk] ]
If X ~ N(μ, Σ): f_X(x) = (2π)^{−n/2} |Σ|^{−1/2} exp{−(1/2)(x − μ)ᵀ Σ^{−1} (x − μ)}

Properties
- Z ~ N(0, 1), X = μ + Σ^{1/2} Z ⟹ X ~ N(μ, Σ)
- X ~ N(μ, Σ) ⟹ Σ^{−1/2}(X − μ) ~ N(0, 1)
- X ~ N(μ, Σ) ⟹ AX ~ N(Aμ, AΣAᵀ)
- X ~ N(μ, Σ), a a vector with ‖a‖ = k ⟹ aᵀX ~ N(aᵀμ, aᵀΣa)
10 Convergence

Let {X1, X2, ...} be a sequence of random variables and let X be another random variable. Let Fn denote the cdf of Xn and let F denote the cdf of X.

Types of convergence
1. In distribution (weakly, in law): Xn →D X if lim_{n→∞} Fn(t) = F(t) at all t where F is continuous
2. In probability: Xn →P X if ∀ε > 0, P[|Xn − X| > ε] → 0 as n → ∞
3. Almost surely (strongly): Xn →as X if P[lim_{n→∞} Xn = X] = 1
4. In quadratic mean (L2): Xn →qm X if lim_{n→∞} E[(Xn − X)²] = 0

Relationships
- Xn →qm X ⟹ Xn →P X ⟹ Xn →D X
- Xn →as X ⟹ Xn →P X
- Xn →D X and P[X = c] = 1 for some c ∈ R ⟹ Xn →P X
- Xn →P X and Yn →P Y ⟹ Xn + Yn →P X + Y
- Xn →qm X and Yn →qm Y ⟹ Xn + Yn →qm X + Y
- Xn →P X and Yn →P Y ⟹ Xn Yn →P XY
- Xn →P X ⟹ φ(Xn) →P φ(X)
- Xn →D X ⟹ φ(Xn) →D φ(X)
- Xn →qm b ⟺ lim_{n→∞} E[Xn] = b and lim_{n→∞} V[Xn] = 0
- X1, ..., Xn iid with E[X] = μ and V[X] < ∞ ⟹ X̄n →qm μ

Slutsky's theorem
- Xn →D X and Yn →P c ⟹ Xn + Yn →D X + c
- Xn →D X and Yn →P c ⟹ Xn Yn →D cX
- In general: Xn →D X and Yn →D Y does not imply Xn + Yn →D X + Y
10.1 Law of Large Numbers (LLN)

Let {X1, ..., Xn} be iid with E[X] = μ.
- Weak (WLLN): X̄n →P μ as n → ∞
- Strong (SLLN): X̄n →as μ as n → ∞

10.2 Central Limit Theorem (CLT)

Let {X1, ..., Xn} be iid with E[X] = μ and V[X] = σ².
Zn := (X̄n − μ)/√(V[X̄n]) = √n (X̄n − μ)/σ →D Z, where Z ~ N(0, 1); that is, lim_{n→∞} P[Zn ≤ z] = Φ(z) for all z ∈ R.

CLT notations
- Zn ≈ N(0, 1)
- X̄n ≈ N(μ, σ²/n)
- X̄n − μ ≈ N(0, σ²/n)
- √n(X̄n − μ) ≈ N(0, σ²)
- √n(X̄n − μ)/σ ≈ N(0, 1)

Continuity correction
- P[X̄n ≤ x] ≈ Φ((x + 1/2 − μ)/(σ/√n))
- P[X̄n ≥ x] ≈ 1 − Φ((x − 1/2 − μ)/(σ/√n))

Delta method
Yn ≈ N(μ, σ²/n) ⟹ φ(Yn) ≈ N(φ(μ), (φ'(μ))² σ²/n)

11 Statistical Inference

Let X1, ..., Xn iid ~ F if not otherwise noted.

11.1 Point Estimation
- Point estimator θ̂n of θ is a function of X1, ..., Xn
- Consistency: θ̂n →P θ
- Sampling distribution: F(θ̂n)
- Standard error: se(θ̂n) = √V[θ̂n]
- Mean squared error: mse = E[(θ̂n − θ)²] = bias(θ̂n)² + V[θ̂n]

11.2 Normal-Based Confidence Interval
Suppose θ̂n ≈ N(θ, ŝe²). Let z_{α/2} = Φ^{−1}(1 − α/2), i.e., P[Z > z_{α/2}] = α/2 and P[−z_{α/2} < Z < z_{α/2}] = 1 − α where Z ~ N(0, 1). Then
Cn = θ̂n ± z_{α/2} ŝe

11.3 Empirical distribution
Empirical distribution function (ECDF): F̂n(x) = (1/n) Σ_{i=1}^n I(Xi ≤ x), where I(Xi ≤ x) = 1 if Xi ≤ x and 0 if Xi > x

11.4 Statistical Functionals
- Statistical functional: T(F)
- Plug-in estimator of θ = T(F): θ̂n = T(F̂n)
- Linear functional: T(F) = ∫ φ(x) dF_X(x)
- Plug-in estimator for a linear functional: T(F̂n) = ∫ φ(x) dF̂n(x) = (1/n) Σ_{i=1}^n φ(Xi)
- Often: T(F̂n) ≈ N(T(F), ŝe²), so T(F̂n) ± z_{α/2} ŝe gives an approximate 1 − α confidence interval
- Example (plug-in estimator of the correlation): ρ̂ = Σ_i (Xi − X̄n)(Yi − Ȳn)/√(Σ_i (Xi − X̄n)² Σ_i (Yi − Ȳn)²)
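A minimal Python sketch of the normal-based interval from 11.2, using the plug-in standard error of the sample mean (NumPy and SciPy assumed; the function name normal_ci is illustrative):

import numpy as np
from scipy.stats import norm   # only used for the z quantile

def normal_ci(x, alpha=0.05):
    # 1 - alpha interval: theta_hat +/- z_{alpha/2} * se
    x = np.asarray(x, dtype=float)
    theta_hat = x.mean()                      # plug-in estimator T(F_hat)
    se = x.std(ddof=1) / np.sqrt(len(x))      # estimated standard error
    z = norm.ppf(1 - alpha / 2)               # z_{alpha/2}
    return theta_hat - z * se, theta_hat + z * se

rng = np.random.default_rng(1)
sample = rng.exponential(scale=2.0, size=500)   # true mean = 2
print(normal_ci(sample))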
12 Parametric Inference

Let F = {f(x; θ) : θ ∈ Θ} be a parametric model with parameter space Θ ⊆ R^k and parameter θ = (θ1, ..., θk).

12.1 Method of Moments
- jth moment: αj(θ) = E[X^j] = ∫ x^j dF_X(x)
- jth sample moment: α̂j = (1/n) Σ_{i=1}^n Xi^j
- Method of moments estimator θ̂n: solve αj(θ̂n) = α̂j for j = 1, ..., k
- Asymptotic normality: √n(θ̂ − θ) →D N(0, Σ), where Σ = g E[YYᵀ] gᵀ, Y = (X, X², ..., X^k)ᵀ, g = (g1, ..., gk), and gj = ∂α_j^{−1}(θ)/∂θ

12.2 Maximum Likelihood
- Likelihood: Ln : Θ → [0, ∞), Ln(θ) = Π_{i=1}^n f(Xi; θ)
- Log-likelihood: ℓn(θ) = log Ln(θ) = Σ_{i=1}^n log f(Xi; θ)
- Maximum likelihood estimator (mle): θ̂n such that Ln(θ̂n) = sup_θ Ln(θ)
- Score function: s(X; θ) = ∂ log f(X; θ)/∂θ
- Fisher information: I(θ) = V_θ[s(X; θ)], In(θ) = n I(θ)
- Fisher information (exponential family): I(θ) = −E_θ[∂s(X; θ)/∂θ]
- Equivariance: θ̂n is the mle of θ ⟹ φ(θ̂n) is the mle of φ(θ)
- Asymptotic normality: with ŝe = √(1/In(θ̂n)), (θ̂n − θ)/ŝe →D N(0, 1)

12.2.1 Delta Method
If τ = φ(θ), where φ is differentiable and φ'(θ) ≠ 0:
(τ̂n − τ)/ŝe(τ̂) →D N(0, 1)
where τ̂ = φ(θ̂n) is the mle of τ and ŝe(τ̂) = |φ'(θ̂n)| ŝe(θ̂n)

12.3 Multiparameter Models
Let θ = (θ1, ..., θk) and let θ̂ = (θ̂1, ..., θ̂k) be the mle.
Hjj = ∂²ℓn/∂θj²,  Hjk = ∂²ℓn/∂θj∂θk
Fisher information matrix: In(θ) = −[ E[H11] ··· E[H1k] ; ⋮ ⋱ ⋮ ; E[Hk1] ··· E[Hkk] ]
Under regularity conditions, with Jn = In^{−1}(θ):
(θ̂j − θj)/ŝej →D N(0, 1)
where ŝej² = Jn(j, j) and Cov[θ̂j, θ̂k] = Jn(j, k)

12.3.1 Multiparameter delta method
Let τ = φ(θ1, ..., θk) with gradient ∇φ = (∂φ/∂θ1, ..., ∂φ/∂θk)ᵀ. Suppose ∇φ evaluated at θ̂ is non-zero and let τ̂ = φ(θ̂). Then
(τ̂ − τ)/ŝe(τ̂) →D N(0, 1)
where ŝe(τ̂) = √((∇̂φ)ᵀ Ĵn (∇̂φ)), Ĵn = Jn(θ̂), and ∇̂φ is ∇φ evaluated at θ = θ̂.

12.4 Parametric Bootstrap
Sample from f(x; θ̂n) instead of from F̂n, where θ̂n could be the mle or the method of moments estimator.
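A minimal sketch of numerical maximum likelihood for a Poisson model, assuming NumPy/SciPy are available; the closed-form mle x̄ and the Fisher-information standard error from 12.2 serve as checks:

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
x = rng.poisson(3.5, size=200)
n = len(x)

def neg_loglik(lam):
    # -l_n(lambda) for the Poisson model, constant terms dropped
    return -(np.sum(x) * np.log(lam) - n * lam)

res = minimize_scalar(neg_loglik, bounds=(1e-6, 50), method="bounded")
lam_hat = res.x                      # numerical mle; closed form is x.mean()
se_hat = np.sqrt(lam_hat / n)        # 1/sqrt(I_n(lam_hat)), since I(lam) = 1/lam
print(lam_hat, x.mean(), se_hat)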
13 Hypothesis Testing

H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ1

Definitions
- Null hypothesis H0
- Alternative hypothesis H1
- Simple hypothesis: θ = θ0
- Composite hypothesis: θ > θ0 or θ < θ0
- Two-sided test: H0 : θ = θ0 versus H1 : θ ≠ θ0
- One-sided test: H0 : θ ≤ θ0 versus H1 : θ > θ0
- Critical value c
- Test statistic T
- Rejection region R = {x : T(x) > c}
- Power function β(θ) = P[X ∈ R]
- Power of a test: 1 − P[Type II error] = 1 − β = inf_{θ∈Θ1} β(θ)

Error types:
              Retain H0            Reject H0
H0 true       correct              Type I error (α)
H1 true       Type II error (β)    correct (power)

p-value
- p-value = sup_{θ∈Θ0} P_θ[T(X) ≥ T(x)] = inf{α : T(x) ∈ R_α}
- p-value = sup_{θ∈Θ0} P_θ[T(X*) ≥ T(X)], where X* ~ F_θ

p-value evidence scale:
  < 0.01       very strong evidence against H0
  0.01 – 0.05  strong evidence against H0
  0.05 – 0.1   weak evidence against H0
  > 0.1        little or no evidence against H0

Wald test
- Two-sided test; reject H0 when |W| > z_{α/2}, where W = (θ̂ − θ0)/ŝe
- P[|W| > z_{α/2}] → α
- p-value = P_{θ0}[|W| > |w|] ≈ P[|Z| > |w|] = 2Φ(−|w|)

Likelihood ratio test (LRT)
- T(X) = sup_{θ∈Θ} Ln(θ)/sup_{θ∈Θ0} Ln(θ) = Ln(θ̂n)/Ln(θ̂n,0)
- λ(X) = 2 log T(X) →D χ²_{r−q}
- p-value = P_{θ0}[λ(X) > λ(x)] ≈ P[χ²_{r−q} > λ(x)]

Multinomial LRT
- mle: p̂n = (X1/n, ..., Xk/n)
- T(X) = Ln(p̂n)/Ln(p0) = Π_{j=1}^k (p̂j/p0j)^{Xj}
- λ(X) = 2 Σ_{j=1}^k Xj log(p̂j/p0j) →D χ²_{k−1}

Pearson chi-square test
- T = Σ_{j=1}^k (Xj − E[Xj])²/E[Xj], where E[Xj] = n p0j under H0
- T →D χ²_{k−1}; p-value = P[χ²_{k−1} > T(x)]
- Converges in distribution faster than the LRT, hence preferable for small n

Independence testing
- I rows, J columns, X a multinomial sample of size n = I·J cells
- mles unconstrained: p̂ij = Xij/n; mles under H0: p̂0ij = p̂i· p̂·j = (Xi·/n)(X·j/n)
- LRT and Pearson statistics are asymptotically χ²_ν with ν = (I − 1)(J − 1)
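Hedged illustrations of the Wald test and the multinomial LRT above (Python/NumPy/SciPy sketch; function names are illustrative and the LRT assumes all observed counts are positive):

import numpy as np
from scipy.stats import norm, chi2

def wald_test(theta_hat, se_hat, theta0=0.0):
    # W = (theta_hat - theta0)/se; p-value = 2 Phi(-|W|)
    w = (theta_hat - theta0) / se_hat
    return w, 2 * norm.cdf(-abs(w))

def multinomial_lrt(counts, p0):
    # lambda = 2 sum X_j log(p_hat_j / p0_j), approximately chi^2_{k-1} under H0
    counts = np.asarray(counts, dtype=float)
    p_hat = counts / counts.sum()
    lam = 2 * np.sum(counts * np.log(p_hat / p0))
    return lam, chi2.sf(lam, len(counts) - 1)

print(wald_test(1.8, 0.7))
print(multinomial_lrt([30, 50, 20], np.array([0.25, 0.5, 0.25])))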
14 Bayesian Inference

Bayes' theorem (posterior density):
f(θ | x) = f(x | θ) f(θ)/f(x) = f(x | θ) f(θ)/∫ f(x | θ) f(θ) dθ ∝ Ln(θ) f(θ)

Definitions
- X^n = (X1, ..., Xn), x^n = (x1, ..., xn)
- Prior density f(θ)
- Likelihood f(x^n | θ): joint density of the data; in particular, X^n iid ⟹ f(x^n | θ) = Π_{i=1}^n f(xi | θ) = Ln(θ)
- Posterior density f(θ | x^n)
- Normalizing constant cn = f(x^n) = ∫ f(x^n | θ) f(θ) dθ
- Kernel: part of a density that depends on θ
- Posterior mean θ̄n = ∫ θ f(θ | x^n) dθ = ∫ θ Ln(θ) f(θ) dθ / ∫ Ln(θ) f(θ) dθ

14.1 Credible Intervals
- Posterior interval: P[θ ∈ (a, b) | x^n] = ∫_a^b f(θ | x^n) dθ = 1 − α
- Equal-tail credible interval: ∫_{−∞}^a f(θ | x^n) dθ = ∫_b^∞ f(θ | x^n) dθ = α/2
- Highest posterior density (HPD) region Rn:
  1. P[θ ∈ Rn] = 1 − α
  2. Rn = {θ : f(θ | x^n) > k} for some k
- Rn is unimodal ⟹ Rn is an interval

14.2 Function of parameters
Let τ = φ(θ) and A = {θ : φ(θ) ≤ τ}.
- Posterior CDF for τ: H(τ | x^n) = P[φ(θ) ≤ τ | x^n] = ∫_A f(θ | x^n) dθ
- Posterior density: h(τ | x^n) = H'(τ | x^n)
- Bayesian delta method: τ | X^n ≈ N(φ(θ̂), ŝe |φ'(θ̂)|)
14.3 Priors

Choice
- Subjective Bayesianism
- Objective Bayesianism
- Robust Bayesianism

Types
- Flat: f(θ) ∝ constant
- Proper: ∫ f(θ) dθ = 1
- Improper: ∫ f(θ) dθ = ∞
- Jeffreys' prior (transformation-invariant): f(θ) ∝ √I(θ); multiparameter case f(θ) ∝ √det(I(θ))
- Conjugate: f(θ) and f(θ | x^n) belong to the same parametric family

14.3.1 Conjugate Priors

Discrete likelihoods (likelihood, conjugate prior, posterior hyperparameters):
- Bern(p), Beta(α, β): Beta(α + Σ xi, β + n − Σ xi)
- Bin(p) with Ni trials, Beta(α, β): Beta(α + Σ xi, β + Σ (Ni − xi))
- NBin(p) with r known, Beta(α, β): Beta(α + rn, β + Σ xi)
- Po(λ), Gamma(α, β): Gamma(α + Σ xi, β + n)
- Geo(p), Beta(α, β): Beta(α + n, β + Σ xi)
- Multinomial(p), Dir(α): Dir(α + Σ x^{(i)})

Continuous likelihoods (likelihood, conjugate prior, posterior hyperparameters):
- Unif(0, θ), Pareto(xm, k): Pareto(max{x(n), xm}, k + n)
- Exp(λ), Gamma(α, β): Gamma(α + n, β + Σ xi)
- N(μ, σc²) with known variance, N(μ0, σ0²):
  mean (μ0/σ0² + Σ xi/σc²)/(1/σ0² + n/σc²), variance (1/σ0² + n/σc²)^{−1}
- N(μc, σ²) with known mean, Scaled Inverse Chi-square(ν, σ0²):
  ν + n, (νσ0² + Σ (xi − μc)²)/(ν + n)
- N(μ, σ²), Normal-scaled Inverse Gamma(λ, ν, α, β):
  (νλ + n x̄)/(ν + n), ν + n, α + n/2, β + (1/2) Σ (xi − x̄)² + nν(x̄ − λ)²/(2(ν + n))
- MVN(μ, Σc) with known covariance, MVN(μ0, Σ0):
  covariance (Σ0^{−1} + nΣc^{−1})^{−1}, mean (Σ0^{−1} + nΣc^{−1})^{−1}(Σ0^{−1}μ0 + nΣc^{−1}x̄)
- MVN(μc, Σ) with known mean, InverseWishart(ν, Ψ): ν + n, Ψ + Σ (xi − μc)(xi − μc)ᵀ
- Pareto(xmc, k) with known minimum, Gamma(α, β): Gamma(α + n, β + Σ log(xi/xmc))
- Pareto(xm, kc) with known shape, Pareto(x0, k0): Pareto(x0, k0 − kn), where k0 > kn
- Gamma(αc, β) with known shape, Gamma(α0, β0): Gamma(α0 + nαc, β0 + Σ xi)
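A minimal sketch of a conjugate update, here Beta-Bernoulli from the table above (Python/NumPy; the helper name beta_bernoulli_update is illustrative):

import numpy as np

def beta_bernoulli_update(alpha, beta, data):
    # Posterior Beta(alpha + sum x_i, beta + n - sum x_i) for Bern(p) data
    data = np.asarray(data)
    s = data.sum()
    n = data.size
    return alpha + s, beta + n - s

rng = np.random.default_rng(3)
x = rng.binomial(1, 0.7, size=50)                      # Bernoulli(0.7) data
a_post, b_post = beta_bernoulli_update(1.0, 1.0, x)    # flat Beta(1, 1) prior
print(a_post, b_post, a_post / (a_post + b_post))      # posterior mean of p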
14.4 Bayesian Testing

If H0 : θ ∈ Θ0:
- Prior probability: P[H0] = ∫_{Θ0} f(θ) dθ
- Posterior probability: P[H0 | x^n] = ∫_{Θ0} f(θ | x^n) dθ

For K hypotheses (models) H1, ..., HK, the posterior probability of Hk is
P[Hk | x^n] = f(x^n | Hk) P[Hk] / Σ_{k=1}^K f(x^n | Hk) P[Hk]

Marginal likelihood: f(x^n | Hi) = ∫_Θ f(x^n | θ, Hi) f(θ | Hi) dθ

Posterior odds of Hi relative to Hj:
P[Hi | x^n]/P[Hj | x^n] = (f(x^n | Hi)/f(x^n | Hj)) × (P[Hi]/P[Hj]) = Bayes factor BF_ij × prior odds

With prior probability p = P[H1], the posterior probability of H1 is
p* = (p/(1 − p)) BF10 / (1 + (p/(1 − p)) BF10)

Bayes factor evidence scale:
  log10 BF10 0 – 0.5   (BF10 1 – 3.2)     weak
  log10 BF10 0.5 – 1   (BF10 3.2 – 10)    moderate
  log10 BF10 1 – 2     (BF10 10 – 100)    strong
  log10 BF10 > 2       (BF10 > 100)       decisive
15 Exponential Family

A one-parameter family of densities {f(x; θ) : θ ∈ Θ} is an exponential family if it can be written as
f(x; θ) = h(x) exp{η(θ) T(x) − B(θ)}
where T(X) is a sufficient statistic for θ.

16 Sampling Methods

16.1 The Bootstrap

Let Tn = g(X1, ..., Xn) be a statistic.
1. Estimate V_F[Tn] with V_{F̂n}[Tn].
2. Approximate V_{F̂n}[Tn] by simulation:
   (a) Repeat B times: draw a bootstrap sample X1*, ..., Xn* ~ F̂n (sample n observations with replacement from the data) and compute T*_{n,b} = g(X1*, ..., Xn*).
   (b) v_boot = V̂_{F̂n}[Tn] = (1/B) Σ_{b=1}^B (T*_{n,b} − (1/B) Σ_{r=1}^B T*_{n,r})²

16.1.1 Bootstrap Confidence Intervals

Normal-based interval: Tn ± z_{α/2} ŝe_boot

Pivotal interval (scalar parameter):
1. Location parameter θ = T(F)
2. Pivot Rn = θ̂n − θ
3. Let H(r) = P[Rn ≤ r] be the cdf of Rn
4. Let R*_{n,b} = θ̂*_{n,b} − θ̂n; approximate H using the bootstrap: Ĥ(r) = (1/B) Σ_{b=1}^B I(R*_{n,b} ≤ r)
5. Let θ*_β denote the β sample quantile of (θ̂*_{n,1}, ..., θ̂*_{n,B}) and r*_β the β sample quantile of (R*_{n,1}, ..., R*_{n,B}), i.e., r*_β = θ*_β − θ̂n.
Then an approximate 1 − α confidence interval is Cn = (â, b̂) with
â = θ̂n − Ĥ^{−1}(1 − α/2) = θ̂n − r*_{1−α/2} = 2θ̂n − θ*_{1−α/2}
b̂ = θ̂n − Ĥ^{−1}(α/2) = θ̂n − r*_{α/2} = 2θ̂n − θ*_{α/2}

Percentile interval: Cn = (θ*_{α/2}, θ*_{1−α/2})
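A minimal bootstrap sketch (Python/NumPy, illustrative function names) computing ŝe_boot together with the normal-based and percentile intervals for a statistic such as the median:

import numpy as np

def bootstrap_ci(x, stat=np.median, B=2000, alpha=0.05, seed=0):
    # Bootstrap se plus normal-based and percentile intervals for T_n = stat(x)
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    t_hat = stat(x)
    t_boot = np.array([stat(rng.choice(x, size=x.size, replace=True))
                       for _ in range(B)])
    se_boot = t_boot.std(ddof=1)
    normal = (t_hat - 1.96 * se_boot, t_hat + 1.96 * se_boot)   # z_{0.025} ~ 1.96
    percentile = tuple(np.quantile(t_boot, [alpha / 2, 1 - alpha / 2]))
    return se_boot, normal, percentile

rng = np.random.default_rng(4)
data = rng.lognormal(mean=0.0, sigma=1.0, size=200)
print(bootstrap_ci(data))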
16.2 Rejection Sampling

Setup
- We can easily sample from g(θ)
- We want to sample from h(θ), but it is difficult
- We know h(θ) up to a proportional constant: h(θ) = k(θ)/∫ k(θ) dθ
- Envelope condition: we can find M > 0 such that k(θ) ≤ M g(θ)

Algorithm
1. Draw θ_cand ~ g(θ)
2. Generate u ~ Unif(0, 1)
3. Accept θ_cand if u ≤ k(θ_cand)/(M g(θ_cand))
4. Repeat until B values of θ_cand have been accepted
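A sketch of the algorithm above in Python/NumPy (illustrative names; the target here is a Beta(3, 6) kernel with a uniform envelope, chosen only as an example):

import numpy as np

def rejection_sample(k, g_sample, g_pdf, M, B, seed=0):
    # Draw B samples from h(theta) proportional to k(theta), using M*g(theta) >= k(theta)
    rng = np.random.default_rng(seed)
    out = []
    while len(out) < B:
        cand = g_sample(rng)                     # 1. draw theta_cand ~ g
        u = rng.uniform()                        # 2. u ~ Unif(0, 1)
        if u <= k(cand) / (M * g_pdf(cand)):     # 3. accept with prob k/(M g)
            out.append(cand)                     # 4. repeat until B accepted
    return np.array(out)

k = lambda t: t**2 * (1 - t) ** 5                # Beta(3, 6) kernel on (0, 1)
grid = np.linspace(0, 1, 1001)
M = 1.05 * k(grid).max()                         # safe envelope constant
draws = rejection_sample(k, lambda r: r.uniform(), lambda t: 1.0, M, B=5000)
print(draws.mean())                              # close to 3/(3 + 6) = 1/3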
Example: to sample from the posterior h(θ) = f(θ | x^n) ∝ k(θ) = Ln(θ) f(θ), take g(θ) = f(θ) (the prior) and M = Ln(θ̂n), where θ̂n is the mle; then k(θ)/(M g(θ)) = Ln(θ)/Ln(θ̂n) ≤ 1.

16.3 Importance Sampling

Sample from an importance function g rather than from the target density h.
Algorithm to obtain an approximation of E[q(θ) | x^n]:
1. Sample from the prior: θ1, ..., θB iid ~ f(θ)
2. Compute the normalized weights wi = Ln(θi)/Σ_{j=1}^B Ln(θj)
3. E[q(θ) | x^n] ≈ Σ_{i=1}^B q(θi) wi

17 Decision Theory

Definitions
- Unknown quantity affecting our decision: θ ∈ Θ
- Decision rule: synonym for an estimator θ̂
- Action a ∈ A: possible value of the decision rule; in the estimation context, the action is just an estimate of θ, θ̂(x)
- Loss function L : Θ × A → R measures the discrepancy between θ and the action a

Loss functions
- Squared error loss: L(θ, a) = (θ − a)²
- Linear loss: L(θ, a) = K1(θ − a) if a − θ < 0, and K2(a − θ) if a − θ ≥ 0
- Absolute error loss: L(θ, a) = |θ − a| (linear loss with K1 = K2 = 1)
- Lp loss: L(θ, a) = |θ − a|^p
- Zero-one loss: L(θ, a) = 0 if a = θ, 1 if a ≠ θ

17.1 Risk

Posterior risk: r(θ̂ | x) = ∫ L(θ, θ̂(x)) f(θ | x) dθ = E_{θ|X}[L(θ, θ̂(x))]
(Frequentist) risk: R(θ, θ̂) = ∫ L(θ, θ̂(x)) f(x | θ) dx = E_{X|θ}[L(θ, θ̂(X))]
Bayes risk: r(f, θ̂) = ∫∫ L(θ, θ̂(x)) f(x, θ) dx dθ = E_{θ,X}[L(θ, θ̂(X))]
r(f, θ̂) = E_θ[E_{X|θ}[L(θ, θ̂(X))]] = E_θ[R(θ, θ̂)]
r(f, θ̂) = E_X[E_{θ|X}[L(θ, θ̂(X))]] = E_X[r(θ̂ | X)]

17.2 Admissibility

θ̂' dominates θ̂ if
∀θ: R(θ, θ̂') ≤ R(θ, θ̂)  and  ∃θ: R(θ, θ̂') < R(θ, θ̂)
θ̂ is inadmissible if at least one other estimator dominates it; otherwise it is admissible.
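A sketch of importance sampling with the prior as importance function, as in 16.3 (Python/NumPy; the toy normal model and function names are assumptions for illustration):

import numpy as np

def posterior_mean_is(log_lik, prior_sample, q, B=10_000, seed=0):
    # Self-normalized importance sampling estimate of E[q(theta) | x^n],
    # weights proportional to the likelihood L_n(theta_i)
    rng = np.random.default_rng(seed)
    theta = prior_sample(rng, B)               # theta_1, ..., theta_B iid ~ f(theta)
    logw = log_lik(theta)
    w = np.exp(logw - logw.max())              # stabilized likelihood weights
    w /= w.sum()                               # w_i = L_n(theta_i)/sum_j L_n(theta_j)
    return np.sum(q(theta) * w)

rng = np.random.default_rng(5)
x = rng.normal(1.0, 1.0, size=20)              # N(theta, 1) data, N(0, 1) prior
log_lik = lambda th: -0.5 * np.sum((x[None, :] - th[:, None]) ** 2, axis=1)
est = posterior_mean_is(log_lik, lambda r, B: r.normal(0.0, 1.0, B), q=lambda th: th)
print(est)                                     # exact posterior mean is n*xbar/(n + 1)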
17.3 Bayes Rule

Bayes rule (Bayes estimator): θ̂ such that r(f, θ̂) = inf_{θ̃} r(f, θ̃).
Since r(f, θ̂) = ∫ r(θ̂ | x) f(x) dx, choosing θ̂(x) to minimize the posterior risk r(θ̂ | x) for every x yields a Bayes rule.

17.4 Minimax Rules

Maximum risk: R̄(θ̂) = sup_θ R(θ, θ̂), R̄(a) = sup_θ R(θ, a)
Minimax rule: θ̂ such that sup_θ R(θ, θ̂) = inf_{θ̃} R̄(θ̃) = inf_{θ̃} sup_θ R(θ, θ̃)
θ̂ = Bayes rule and ∃c: R(θ, θ̂) = c (constant risk) ⟹ θ̂ is minimax
Least favorable prior: θ̂^f = Bayes rule and R(θ, θ̂^f) ≤ r(f, θ̂^f) ∀θ ⟹ θ̂^f is minimax and f is called a least favorable prior
18 Linear Regression

Definitions
- Response variable Y
- Covariate X (aka predictor variable or feature)

18.1 Simple Linear Regression

Model: Yi = β0 + β1 Xi + εi, with E[εi | Xi] = 0 and V[εi | Xi] = σ²
Fitted line: r̂(x) = β̂0 + β̂1 x
Predicted (fitted) values: Ŷi = r̂(Xi)
Residuals: ε̂i = Yi − Ŷi = Yi − (β̂0 + β̂1 Xi)
Residual sum of squares: rss = Σ_{i=1}^n ε̂i²

Least squares estimates (minimize rss):
β̂1 = Σ_{i=1}^n (Xi − X̄n)(Yi − Ȳn)/Σ_{i=1}^n (Xi − X̄n)² = (Σ Xi Yi − n X̄ Ȳ)/(Σ Xi² − n X̄²)
β̂0 = Ȳn − β̂1 X̄n
E[β̂ | X^n] = (β0, β1)ᵀ
ŝe(β̂0) = (σ̂/(sX √n)) √(Σ Xi²/n)
ŝe(β̂1) = σ̂/(sX √n)
where sX² = (1/n) Σ (Xi − X̄n)² and σ̂² = (1/(n − 2)) Σ ε̂i² (unbiased estimate)

Further properties:
- Consistency: β̂0 →P β0 and β̂1 →P β1
- Asymptotic normality: (β̂0 − β0)/ŝe(β̂0) →D N(0, 1) and (β̂1 − β1)/ŝe(β̂1) →D N(0, 1)
- Approximate 1 − α confidence intervals: β̂0 ± z_{α/2} ŝe(β̂0) and β̂1 ± z_{α/2} ŝe(β̂1)
- Wald test for H0: β1 = 0 vs. H1: β1 ≠ 0: reject H0 if |W| > z_{α/2}, where W = β̂1/ŝe(β̂1)

R²:
R² = Σ (Ŷi − Ȳ)²/Σ (Yi − Ȳ)² = 1 − Σ ε̂i²/Σ (Yi − Ȳ)² = 1 − rss/tss

Likelihood:
L = Π_{i=1}^n f(Xi, Yi) = Π fX(Xi) × Π fY|X(Yi | Xi) = L1 L2
L1 = Π_{i=1}^n fX(Xi)
L2 = Π_{i=1}^n fY|X(Yi | Xi) ∝ σ^{−n} exp{−(1/(2σ²)) Σ (Yi − (β0 + β1 Xi))²}
Under the assumption of normality, the least squares estimator is also the mle, with σ̂²_mle = (1/n) Σ ε̂i².

18.2 Prediction

Observe X = x* and predict Y* with Ŷ* = β̂0 + β̂1 x*.
Prediction standard error: ξ̂n² = σ̂² (Σ_i (Xi − x*)²/(n Σ_i (Xi − X̄)²) + 1)
Prediction interval: Ŷ* ± z_{α/2} ξ̂n

18.3 Multiple Regression

Y = Xβ + ε, where
X = [ X11 ··· X1k ; ⋮ ⋱ ⋮ ; Xn1 ··· Xnk ],  β = (β1, ..., βk)ᵀ,  ε = (ε1, ..., εn)ᵀ
Likelihood: L(β, σ²) = (2πσ²)^{−n/2} exp{−rss/(2σ²)}
rss = (Y − Xβ)ᵀ(Y − Xβ) = ‖Y − Xβ‖² = Σ_{i=1}^N (Yi − xiᵀβ)²
If the k × k matrix XᵀX is invertible:
β̂ = (XᵀX)^{−1}XᵀY
V[β̂ | X^n] = σ²(XᵀX)^{−1}
β̂ ≈ N(β, σ²(XᵀX)^{−1})
Estimate of σ²: σ̂² = (1/(n − k)) Σ ε̂i² (unbiased), σ̂²_mle = (1/n) Σ ε̂i², where ε̂ = Xβ̂ − Y
1 − α confidence interval: β̂j ± z_{α/2} ŝe(β̂j)
Hypothesis testing: H0: βj = 0 vs. H1: βj ≠ 0, for j ∈ J
Multiple-regression fitted function: r̂(x) = Σ_{j=1}^k β̂j xj
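A minimal sketch of the simple linear regression estimates and standard errors from 18.1 (Python/NumPy; simulated data, illustrative function name):

import numpy as np

def simple_ls(x, y):
    # Least squares estimates and standard errors for Y_i = b0 + b1 X_i + eps_i
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x.mean()
    resid = y - (b0 + b1 * x)
    sigma2 = np.sum(resid ** 2) / (n - 2)           # unbiased estimate of sigma^2
    se_b1 = np.sqrt(sigma2 / sxx)
    se_b0 = np.sqrt(sigma2 * np.sum(x ** 2) / (n * sxx))
    return b0, b1, se_b0, se_b1

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 100)
print(simple_ls(x, y))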
18.4 Model Selection

Consider predicting a new observation for covariates X and let S ⊆ J denote a subset of the covariates in the model, where |J| = k.

Procedure
1. Assign a score to each model
2. Search through all models to find the one with the highest score

Prediction risk: R(S) = Σ_{i=1}^n mspe_i = Σ_{i=1}^n E[(Ŷi(S) − Yi*)²], where Yi* denotes a future observation at covariate Xi
Training error: R̂tr(S) = Σ_{i=1}^n (Ŷi(S) − Yi)²
R²(S) = 1 − rss(S)/tss = 1 − R̂tr(S)/tss
The training error is a downward-biased estimate of the prediction risk:
bias(R̂tr(S)) = E[R̂tr(S)] − R(S) = −2 Σ_{i=1}^n Cov[Ŷi, Yi]
Adjusted R²: R̄²(S) = 1 − ((n − 1)/(n − k)) rss/tss
Mallows Cp statistic: R̂(S) = R̂tr(S) + 2kσ̂² = lack of fit + complexity penalty
Bayesian information criterion: BIC(S) = ℓn(β̂S, σ̂S²) − (k/2) log n
Validation and training: split the data, fit on the training set, and estimate the risk on the validation set:
R̂V(S) = Σ_i (Ŷi*(S) − Yi*)², summing over the validation observations
Leave-one-out cross-validation:
R̂CV(S) = Σ_{i=1}^n (Yi − Ŷ_(i))² = Σ_{i=1}^n ((Yi − Ŷi(S))/(1 − Uii(S)))²
where Ŷ_(i) is the prediction for Yi computed without the ith observation and Uii(S) is the ith diagonal element of the hat matrix U(S) = XS(XSᵀXS)^{−1}XSᵀ.
19 Non-parametric Function Estimation

19.1 Density Estimation

Goal: estimate f(x), where P[X ∈ A] = ∫_A f(x) dx.
Integrated square error (ise): L(f, f̂n) = ∫ (f(x) − f̂n(x))² dx = J(h) + ∫ f²(x) dx
Frequentist risk: R(f, f̂n) = E[L(f, f̂n)] = ∫ b²(x) dx + ∫ v(x) dx, where b(x) = E[f̂n(x)] − f(x) and v(x) = V[f̂n(x)]

19.1.1 Histograms

Definitions
- Number of bins m
- Binwidth h = 1/m
- Bin Bj has νj observations
- Define p̂j = νj/n and pj = ∫_{Bj} f(u) du

Histogram estimator: f̂n(x) = Σ_{j=1}^m (p̂j/h) I(x ∈ Bj)
E[f̂n(x)] = pj/h
V[f̂n(x)] = pj(1 − pj)/(nh²)
R(f̂n, f) ≈ (h²/12) ∫ (f'(u))² du + 1/(nh)
h* = (1/n^{1/3}) (6/∫ (f'(u))² du)^{1/3}
R*(f̂n, f) ≈ C/n^{2/3}, where C = (3/4)^{2/3} (∫ (f'(u))² du)^{1/3}
Cross-validation estimate of E[J(h)]:
ĴCV(h) = ∫ f̂n²(x) dx − (2/n) Σ_{i=1}^n f̂_(−i)(Xi) = 2/((n − 1)h) − ((n + 1)/((n − 1)h)) Σ_{j=1}^m p̂j²
19.1.2 Kernel Density Estimator (KDE)

Kernel K: any smooth function with K(x) ≥ 0, ∫ K(x) dx = 1, ∫ x K(x) dx = 0, and σK² ≡ ∫ x² K(x) dx > 0

KDE: f̂n(x) = (1/n) Σ_{i=1}^n (1/h) K((x − Xi)/h)
R(f, f̂n) ≈ (1/4)(h σK)^4 ∫ (f''(x))² dx + (1/(nh)) ∫ K²(x) dx
h* = c1^{−2/5} c2^{1/5} c3^{−1/5} n^{−1/5}, where c1 = σK², c2 = ∫ K²(x) dx, c3 = ∫ (f''(x))² dx
R*(f, f̂n) = (c4/n^{4/5}) (∫ K²(x) dx)^{4/5} (∫ (f'')² dx)^{1/5}, where c4 = (5/4)(σK²)^{2/5}

Epanechnikov kernel: K(x) = (3/(4√5))(1 − x²/5) for |x| < √5, and 0 otherwise

Cross-validation estimate of E[J(h)]:
ĴCV(h) = ∫ f̂n²(x) dx − (2/n) Σ_{i=1}^n f̂_(−i)(Xi) ≈ (1/(hn²)) Σ_i Σ_j K*((Xi − Xj)/h) + (2/(nh)) K(0)
where K*(x) = K^(2)(x) − 2K(x) and K^(2)(x) = ∫ K(x − y) K(y) dy

19.2 Non-parametric Regression

Estimate r(x) = E[Y | X = x] from data (x1, Y1), ..., (xn, Yn), Yi = r(xi) + εi.
k-nearest-neighbor estimator: r̂(x) = (1/k) Σ_{i : xi ∈ Nk(x)} Yi, where Nk(x) contains the k values of {x1, ..., xn} closest to x
Nadaraya-Watson kernel estimator: r̂(x) = Σ_{i=1}^n wi(x) Yi, with weights wi(x) = K((x − xi)/h)/Σ_{j=1}^n K((x − xj)/h)
Risk: R(r̂n, r) ≈ (h^4/4)(∫ x² K(x) dx)² ∫ (r''(x) + 2r'(x) f'(x)/f(x))² dx + (σ² ∫ K²(x) dx)/(nh) ∫ dx/f(x)
Optimal bandwidth: h* = c1/n^{1/5}, giving R*(r̂n, r) ≈ c2/n^{4/5}, with constants depending on C(K)
Cross-validation:
ĴCV(h) = Σ_{i=1}^n (Yi − r̂_(−i)(xi))² = Σ_{i=1}^n ((Yi − r̂(xi))/(1 − K(0)/Σ_{j} K((xi − xj)/h)))²

19.3 Smoothing Using Orthogonal Functions

Approximation: r(x) = Σ_{j=1}^∞ βj φj(x) ≈ Σ_{j=1}^J βj φj(x)
Multivariate regression form: Y = Φβ + η, where ηi = εi and
Φ = [ φ0(x1) ··· φJ(x1) ; ⋮ ⋱ ⋮ ; φ0(xn) ··· φJ(xn) ]
Least squares estimator: β̂ = (ΦᵀΦ)^{−1}ΦᵀY ≈ (1/n)ΦᵀY (for equally spaced observations only)
Cross-validation: R̂CV(J) = Σ_{i=1}^n (Yi − Σ_{j=1}^J φj(xi) β̂_{j,(−i)})²
20 Stochastic Processes

A stochastic process is a collection of random variables {Xt : t ∈ T}, also written Xt or X(t), with
- State space X
- Index set T: T = {0, 1, ...} ⊆ Z (discrete) or T = [0, ∞) (continuous)

20.1 Markov Chains

Markov chain: a process satisfying P[Xn = x | X0, ..., Xn−1] = P[Xn = x | Xn−1] for all n ∈ T, x ∈ X

Transition probabilities
- pij ≡ P[Xn+1 = j | Xn = i]
- n-step: pij(n) ≡ P[Xm+n = j | Xm = i]
- Transition matrix P (n-step: Pn) with entries pij; every row satisfies Σ_j pij = 1
- Chapman-Kolmogorov: pij(m + n) = Σ_k pik(m) pkj(n), i.e., Pm+n = Pm Pn and Pn = P · ··· · P = P^n

Marginal probability
- μn = (μn(1), ..., μn(N)), where μn(i) = P[Xn = i]
- μ0 is the initial distribution and μn = μ0 P^n

20.2 Poisson Processes

Poisson process: {Xt : t ∈ [0, ∞)} = number of events up to and including time t, with
- X0 = 0
- Independent increments
- Intensity function λ(t), with m(t) = ∫_0^t λ(s) ds and X_{s+t} − X_s ~ Po(m(s + t) − m(s))
- Homogeneous Poisson process with rate λ > 0: Xt ~ Po(λt)

Waiting times: Wt := time at which Xt occurs; Wt ~ Gamma(t, 1/λ)
Interarrival times: St = W_{t+1} − Wt; St ~ Exp(1/λ)
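A small simulation sketch for the homogeneous Poisson process above (Python/NumPy, illustrative setup): events are generated from Exp(1/λ) interarrival times, and the count on [0, T] should behave like Po(λT):

import numpy as np

rng = np.random.default_rng(9)
lam, T, reps = 2.0, 5.0, 10_000

counts = np.empty(reps, dtype=int)
for r in range(reps):
    # cumulative sums of interarrival times give the waiting times W_1, W_2, ...
    arrivals = np.cumsum(rng.exponential(scale=1.0 / lam, size=int(4 * lam * T + 50)))
    counts[r] = np.searchsorted(arrivals, T)   # number of events up to time T

print(counts.mean(), counts.var(), lam * T)    # mean and variance both close to lam*T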
21 Time Series

Mean function: μxt = E[xt] = ∫ x ft(x) dx
Autocovariance function: γx(s, t) = E[(xs − μs)(xt − μt)] = E[xs xt] − μs μt; γx(t, t) = E[(xt − μt)²] = V[xt]
Autocorrelation function (ACF): ρ(s, t) = Cov[xs, xt]/√(V[xs] V[xt]) = γ(s, t)/√(γ(s, s)γ(t, t))
Cross-covariance function (CCV): γxy(s, t) = E[(xs − μxs)(yt − μyt)]
Cross-correlation function (CCF): ρxy(s, t) = γxy(s, t)/√(γx(s, s)γy(t, t))
Backshift operator: B^k(xt) = x_{t−k}
Difference operator: ∇^d = (1 − B)^d

21.1 Stationary Time Series

Strictly stationary: the joint distribution of (x_{t1}, ..., x_{tk}) equals that of (x_{t1+h}, ..., x_{tk+h}) for all k ∈ N, all times tk, all constants ck, and all shifts h ∈ Z
Weakly stationary:
- E[xt²] < ∞ for all t ∈ Z
- E[xt] = m for all t ∈ Z
- γx(s, t) = γx(s + r, t + r) for all r, s, t ∈ Z
For a stationary series: γ(h) ≡ γ(t + h, t) = Cov[x_{t+h}, xt] and
ρx(h) = γ(t + h, t)/√(γ(t + h, t + h)γ(t, t)) = γ(h)/γ(0)
ρxy(h) = γxy(h)/√(γx(0)γy(0))

White noise: wt ~ wn(0, σw²); Gaussian white noise: wt iid ~ N(0, σw²)
- E[wt] = 0 for all t, V[wt] = σ² for all t, γw(s, t) = 0 for s ≠ t

Random walk (with drift δ): xt = δt + Σ_{j=1}^t wj; E[xt] = δt

Linear process: xt = μ + Σ_{j=−∞}^{∞} ψj w_{t−j}, where Σ_j |ψj| < ∞
- γ(h) = σw² Σ_{j=−∞}^{∞} ψ_{j+h} ψj
21.2 Estimation of Correlation

Sample mean: x̄ = (1/n) Σ_{t=1}^n xt
Its variance: V[x̄] = (1/n) Σ_{h=−n}^{n} (1 − |h|/n) γx(h)
Sample autocovariance: γ̂(h) = (1/n) Σ_{t=1}^{n−h} (x_{t+h} − x̄)(xt − x̄)
Sample autocorrelation: ρ̂(h) = γ̂(h)/γ̂(0)
Sample cross-covariance: γ̂xy(h) = (1/n) Σ_{t=1}^{n−h} (x_{t+h} − x̄)(yt − ȳ)
Sample cross-correlation: ρ̂xy(h) = γ̂xy(h)/√(γ̂x(0)γ̂y(0))
Properties
- σ_{ρ̂x(h)} = 1/√n if xt is white noise
- σ_{ρ̂xy(h)} = 1/√n if xt or yt is white noise

21.3 Non-Stationary Time Series

Classical decomposition model: xt = μt + st + wt, where μt = trend, st = seasonal component, wt = random noise term

21.3.1 Detrending

Least squares: fit a trend model (e.g., μt = β0 + β1 t) by least squares and work with the detrended residuals xt − μ̂t.
Moving average: vt = (1/(2k + 1)) Σ_{i=−k}^{k} x_{t−i}.
If (1/(2k + 1)) Σ_{i=−k}^{k} w_{t−i} ≈ 0, a linear trend function μt = β0 + β1 t passes without distortion.
Differencing: μt = β0 + β1 t ⟹ ∇μt = β1, so first differencing removes a linear trend.
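A minimal sketch of the sample ACF of 21.2, applied to a simulated AR(1) series (Python/NumPy; illustrative function name):

import numpy as np

def sample_acf(x, max_lag=20):
    # rho_hat(h) = gamma_hat(h)/gamma_hat(0), with
    # gamma_hat(h) = (1/n) sum_{t=1}^{n-h} (x_{t+h} - xbar)(x_t - xbar)
    x = np.asarray(x, float)
    n, xbar = len(x), x.mean()
    gamma0 = np.sum((x - xbar) ** 2) / n
    return np.array([np.sum((x[h:] - xbar) * (x[:n - h] - xbar)) / n / gamma0
                     for h in range(max_lag + 1)])

rng = np.random.default_rng(8)
w = rng.normal(size=500)
x = np.empty(500)
x[0] = w[0]
for t in range(1, 500):                    # AR(1) with phi = 0.7, so rho(h) = 0.7^h
    x[t] = 0.7 * x[t - 1] + w[t]
print(np.round(sample_acf(x, 5), 2))       # roughly 1, 0.7, 0.49, 0.34, ...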
21.4 ARIMA models

Autoregressive polynomial: φ(z) = 1 − φ1 z − ··· − φp z^p, z ∈ C, φp ≠ 0
Autoregressive operator: φ(B) = 1 − φ1 B − ··· − φp B^p
AR(p): xt = φ1 x_{t−1} + ··· + φp x_{t−p} + wt, i.e., φ(B)xt = wt
AR(1): xt = Σ_{j=0}^{k−1} φ^j w_{t−j} + φ^k x_{t−k} → Σ_{j=0}^∞ φ^j w_{t−j} as k → ∞ when |φ| < 1 (a linear process)
- E[xt] = Σ_{j=0}^∞ φ^j E[w_{t−j}] = 0
- ρ(h) = γ(h)/γ(0) = φ^h, with ρ(h) = φ ρ(h − 1) for h = 1, 2, ...

Moving average polynomial: θ(z) = 1 + θ1 z + ··· + θq z^q, z ∈ C, θq ≠ 0
Moving average operator: θ(B) = 1 + θ1 B + ··· + θq B^q
MA(q): xt = wt + θ1 w_{t−1} + ··· + θq w_{t−q} = θ(B)wt
- E[xt] = Σ_{j=0}^q θj E[w_{t−j}] = 0
- γ(h) = σw² Σ_{j=0}^{q−h} θj θ_{j+h} for 0 ≤ h ≤ q, and 0 for h > q
MA(1): xt = wt + θ w_{t−1}
- γ(h) = (1 + θ²)σw² for h = 0; θσw² for h = 1; 0 for h > 1
- ρ(h) = θ/(1 + θ²) for h = 1; 0 for h > 1

ARMA(p, q): xt = φ1 x_{t−1} + ··· + φp x_{t−p} + wt + θ1 w_{t−1} + ··· + θq w_{t−q}, i.e., φ(B)xt = θ(B)wt
ARIMA(p, d, q): ∇^d xt = (1 − B)^d xt is ARMA(p, q)
Seasonal ARIMA: ARIMA(p, d, q) × (P, D, Q)_s: Φ_P(B^s) φ(B) ∇_s^D ∇^d xt = δ + Θ_Q(B^s) θ(B) wt
EWMA (ARIMA(0, 1, 1)): xt = Σ_{j=1}^∞ (1 − λ)λ^{j−1} x_{t−j} + wt when |λ| < 1; one-step-ahead forecast x̃_{n+1} = (1 − λ)xn + λ x̃n

21.4.1 Causality and Invertibility

ARMA(p, q) is causal (future-independent) ⟺ the roots of φ(z) lie outside the unit circle ⟺ xt = Σ_{j=0}^∞ ψj w_{t−j} = ψ(B)wt, where ψ(z) = Σ_{j=0}^∞ ψj z^j = θ(z)/φ(z), |z| ≤ 1
ARMA(p, q) is invertible ⟺ the roots of θ(z) lie outside the unit circle ⟺ π(B)xt = Σ_{j=0}^∞ πj x_{t−j} = wt, where π(z) = Σ_{j=0}^∞ πj z^j = φ(z)/θ(z), |z| ≤ 1

Partial autocorrelation function (PACF)
- x_i^{h−1}: regression of xi on {x_{h−1}, x_{h−2}, ..., x1}
- φhh = corr(xh − x_h^{h−1}, x0 − x_0^{h−1}) for h ≥ 2
- E.g., φ11 = corr(x1, x0) = ρ(1)

Behavior of the ACF and PACF for causal and invertible ARMA models:
             ACF                     PACF
AR(p)        tails off               cuts off after lag p
MA(q)        cuts off after lag q    tails off
ARMA(p, q)   tails off               tails off
21.5 Spectral Analysis

Periodic process: xt = A cos(2πνt + φ) = U1 cos(2πνt) + U2 sin(2πνt)
- Frequency index ν (cycles per unit time), amplitude A, phase φ
- U1 = A cos φ and U2 = A sin φ are often normally distributed random variables
- γ(h) = (σ²/2) e^{−2πiν0 h} + (σ²/2) e^{2πiν0 h} = σ² cos(2πν0 h)

Periodic mixture: xt = Σ_{k=1}^q (Uk1 cos(2πνk t) + Uk2 sin(2πνk t))

Spectral representation: γ(h) = ∫_{−1/2}^{1/2} e^{2πiνh} dF(ν)
Spectral distribution function F(ν): F(−1/2) = 0 and F(1/2) = γ(0); for the single-frequency process, F(ν) jumps by σ²/2 at ±ν0

Spectral density: f(ν) = Σ_{h=−∞}^{∞} γ(h) e^{−2πiνh}, ν ∈ [−1/2, 1/2]
- Needs Σ_h |γ(h)| < ∞
- f(ν) ≥ 0, f(ν) = f(−ν), f(ν) = f(1 − ν)
- γ(h) = ∫_{−1/2}^{1/2} e^{2πiνh} f(ν) dν
- γ(0) = V[xt] = ∫_{−1/2}^{1/2} f(ν) dν
- White noise: fw(ν) = σw²
- ARMA(p, q), φ(B)xt = θ(B)wt: fx(ν) = σw² |θ(e^{−2πiν})|²/|φ(e^{−2πiν})|², where φ(z) = 1 − Σ_{k=1}^p φk z^k and θ(z) = 1 + Σ_{k=1}^q θk z^k

Discrete Fourier transform (DFT): d(νj) = n^{−1/2} Σ_{t=1}^n xt e^{−2πiνj t}
Fourier/fundamental frequencies: νj = j/n
Inverse DFT: xt = n^{−1/2} Σ_{j=0}^{n−1} d(νj) e^{2πiνj t}
Periodogram: I(j/n) = |d(j/n)|²
Scaled periodogram: P(j/n) = (4/n) I(j/n) = ((2/n) Σ_{t=1}^n xt cos(2πt j/n))² + ((2/n) Σ_{t=1}^n xt sin(2πt j/n))²
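A minimal sketch of the scaled periodogram from 21.5 via the FFT (Python/NumPy; the cosine input is an illustrative example):

import numpy as np

def scaled_periodogram(x):
    # P(j/n) = (4/n)|d(j/n)|^2, with d(nu_j) = n^(-1/2) sum_t x_t e^(-2 pi i nu_j t)
    x = np.asarray(x, float)
    n = len(x)
    d = np.fft.fft(x) / np.sqrt(n)          # DFT at the Fourier frequencies j/n
    return (4.0 / n) * np.abs(d) ** 2

# A pure cosine at frequency 10/100 concentrates the periodogram at j = 10.
t = np.arange(100)
x = 2.0 * np.cos(2 * np.pi * 0.10 * t)
P = scaled_periodogram(x)
print(np.argmax(P[:50]), np.round(P[10], 2))   # -> 10, and P(10/100) ~ 4 (= A^2)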
22 Math

22.1 Gamma Function
- Ordinary: Γ(s) = ∫_0^∞ t^{s−1} e^{−t} dt
- Upper incomplete: Γ(s, x) = ∫_x^∞ t^{s−1} e^{−t} dt
- Lower incomplete: γ(s, x) = ∫_0^x t^{s−1} e^{−t} dt
- Γ(α + 1) = αΓ(α) for α > 0
- Γ(n) = (n − 1)! for n ∈ N
- Γ(1/2) = √π

22.2 Beta Function
- Ordinary: B(x, y) = B(y, x) = ∫_0^1 t^{x−1}(1 − t)^{y−1} dt = Γ(x)Γ(y)/Γ(x + y)
- Incomplete: B(x; a, b) = ∫_0^x t^{a−1}(1 − t)^{b−1} dt
- Regularized incomplete: Ix(a, b) = B(x; a, b)/B(a, b) = Σ_{j=a}^{a+b−1} ((a + b − 1)!/(j!(a + b − 1 − j)!)) x^j (1 − x)^{a+b−1−j} for a, b ∈ N
- I0(a, b) = 0, I1(a, b) = 1
- Ix(a, b) = 1 − I_{1−x}(b, a)
22.3 Series

Finite
- Σ_{k=1}^n k = n(n + 1)/2
- Σ_{k=1}^n (2k − 1) = n²
- Σ_{k=1}^n k² = n(n + 1)(2n + 1)/6
- Σ_{k=1}^n k³ = (n(n + 1)/2)²
- Σ_{k=0}^n c^k = (c^{n+1} − 1)/(c − 1), c ≠ 1
- Σ_{k=0}^n C(n, k) = 2^n

Binomial
- Binomial theorem: Σ_{k=0}^n C(n, k) a^{n−k} b^k = (a + b)^n
- Σ_{k=0}^n C(r + k, k) = C(r + n + 1, n)
- Σ_{k=0}^n C(k, m) = C(n + 1, m + 1)
- Vandermonde's identity: Σ_{k=0}^r C(m, k) C(n, r − k) = C(m + n, r)

Infinite
- Σ_{k=0}^∞ p^k = 1/(1 − p), |p| < 1
- Σ_{k=1}^∞ p^k = p/(1 − p), |p| < 1
- Σ_{k=0}^∞ k p^{k−1} = d/dp Σ_{k=0}^∞ p^k = d/dp (1/(1 − p)) = 1/(1 − p)², |p| < 1
- Σ_{k=0}^∞ C(r + k − 1, k) x^k = (1 − x)^{−r}, r ∈ N+
- Σ_{k=0}^∞ C(α, k) p^k = (1 + p)^α, |p| < 1, α ∈ C
22.4 Combinatorics

Sampling (choose k out of n):
                      ordered                                  unordered
w/o replacement       n^(k̲) = Π_{i=0}^{k−1}(n − i) = n!/(n − k)!    C(n, k) = n^(k̲)/k! = n!/(k!(n − k)!)
w/ replacement        n^k                                      C(n − 1 + r, r) = C(n − 1 + r, n − 1)

Binomial coefficient: C(n, k) = n!/(k!(n − k)!), with C(n, k) = 0 for k > n.

Partitions (partition numbers P_{n,k}):
- P_{n,0} = 0 for n ≥ 1, P_{0,0} = 1
- P_{n,k} = 0 for k > n
- Recurrence: P_{n+k,k} = Σ_{i=1}^{k} P_{n,i}

Balls and urns (f : B → U with |B| = n balls and |U| = m urns; D = distinguishable, ¬D = indistinguishable; {n k} denotes the Stirling numbers of the second kind):
                B: D, U: D                B: ¬D, U: D        B: D, U: ¬D          B: ¬D, U: ¬D
f arbitrary     m^n                       C(m + n − 1, n)    Σ_{k=1}^m {n k}      Σ_{k=1}^m P_{n,k}
f injective     m!/(m − n)! if m ≥ n, 0 else   C(m, n)       1 if m ≥ n, 0 else   1 if m ≥ n, 0 else
f surjective    m! {n m}                  C(n − 1, m − 1)    {n m}                P_{n,m}
f bijective     n! if m = n, 0 else       1 if m = n, 0 else 1 if m = n, 0 else   1 if m = n, 0 else

References

[3] R. H. Shumway and D. S. Stoffer. Time Series Analysis and Its Applications: With R Examples. Springer, 2006.