4ST601 Formulas v3.54
STATISTICS
FORMULAS FOR 4ST601
Version 3.54
Last Update: January 2024
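$i = 1, 2, \ldots, n$
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
$\bar{x} = \frac{\sum_{j=1}^{k} x_j n_j}{\sum_{j=1}^{k} n_j} = \sum_{j=1}^{k} x_j p_j$
$\bar{x} = \frac{\sum_{j=1}^{k} \bar{x}_j n_j}{\sum_{j=1}^{k} n_j} = \sum_{j=1}^{k} \bar{x}_j p_j$
$s_x^2 = \frac{\sum_{j=1}^{k} (x_j - \bar{x})^2 n_j}{\sum_{j=1}^{k} n_j} = \sum_{j=1}^{k} x_j^2 p_j - \left(\sum_{j=1}^{k} x_j p_j\right)^2$
$s_x^2 = \overline{s_{jx}^2} + s_{\bar{x}_j}^2$
$\overline{s_{jx}^2} = \frac{\sum_{j=1}^{k} s_{jx}^2 n_j}{\sum_{j=1}^{k} n_j} = \sum_{j=1}^{k} s_{jx}^2 p_j$
$s_{\bar{x}_j}^2 = \frac{\sum_{j=1}^{k} (\bar{x}_j - \bar{x})^2 n_j}{\sum_{j=1}^{k} n_j} = \sum_{j=1}^{k} (\bar{x}_j - \bar{x})^2 p_j$
$s_x = \sqrt{s_x^2}$
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{n}{n - 1} s_x^2$
$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
$mad = median(|x_i - \tilde{x}|)$
$v_x = \frac{s_x}{\bar{x}}$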
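As a rough illustration of the measures above, the following Python sketch computes them for a small raw data set. The function name `describe` and the sample values are hypothetical, and the sketch assumes that mad is taken around the median.

```python
from math import sqrt
from statistics import median


def describe(xs):
    """Descriptive measures of a raw data set (illustrative sketch)."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)    # sum of squared deviations
    var_pop = ss / n                         # s_x^2, divides by n
    var_sample = ss / (n - 1)                # s^2, divides by n - 1
    med = median(xs)
    mad = median([abs(x - med) for x in xs])  # assumption: deviations from the median
    return {
        "mean": mean,
        "var_pop": var_pop,
        "sd_pop": sqrt(var_pop),
        "var_sample": var_sample,
        "sd_sample": sqrt(var_sample),
        "mad": mad,
        "cv": sqrt(var_pop) / mean,          # v_x = s_x / mean
    }


data = [2, 4, 4, 4, 5, 5, 7, 9]              # made-up values
summary = describe(data)
print(summary)
```

The two variances differ only in the divisor, which is why s^2 can be obtained from s_x^2 by multiplying by n / (n - 1).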
Descriptive analysis of dependence
Contingency table (r x c)
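$n_{i.} = \sum_{j=1}^{c} n_{ij}$,  $n_{.j} = \sum_{i=1}^{r} n_{ij}$,  $p_{ij} = \frac{n_{ij}}{n}$
$p_{i.} = \sum_{j=1}^{c} p_{ij} = \frac{n_{i.}}{n}$,  $p_{.j} = \sum_{i=1}^{r} p_{ij} = \frac{n_{.j}}{n}$
$p_{i|j} = \frac{n_{ij}}{n_{.j}} = \frac{p_{ij}}{p_{.j}}$,  $p_{j|i} = \frac{n_{ij}}{n_{i.}} = \frac{p_{ij}}{p_{i.}}$
$n'_{ij} = \frac{n_{i.} n_{.j}}{n}$,  $g = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - n'_{ij})^2}{n'_{ij}}$
$C = \sqrt{\frac{g}{g + n}}$,  $C_{val} \in \langle 0, \sqrt{(m - 1)/m} \rangle$
$V = \sqrt{\frac{g}{n(m - 1)}}$,  $V_{val} \in \langle 0, 1 \rangle$,  $m = \min(r, c)$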
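The contingency-table quantities above can be sketched in Python as follows. The function name `contingency_measures` and the 2 x 2 table are hypothetical; the sketch assumes that every expected frequency is positive.

```python
from math import sqrt


def contingency_measures(table):
    """Return (g, C, V) for an r x c table of observed frequencies."""
    r, c = len(table), len(table[0])
    row = [sum(table[i]) for i in range(r)]                        # n_i.
    col = [sum(table[i][j] for i in range(r)) for j in range(c)]   # n_.j
    n = sum(row)
    g = 0.0
    for i in range(r):
        for j in range(c):
            expected = row[i] * col[j] / n                         # n_i. n_.j / n
            g += (table[i][j] - expected) ** 2 / expected
    m = min(r, c)
    contingency_c = sqrt(g / (g + n))
    cramers_v = sqrt(g / (n * (m - 1)))
    return g, contingency_c, cramers_v


observed = [[10, 20], [20, 10]]   # made-up frequencies
g, C, V = contingency_measures(observed)
print(g, C, V)
```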
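$ssT = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (y_{ij} - \bar{y})^2$,  $ssTR = \sum_{j=1}^{k} (\bar{y}_j - \bar{y})^2 n_j$,  $ssE = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (y_{ij} - \bar{y}_j)^2$
$r^2 = \frac{ssTR}{ssT}$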
Correlation analysis
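$r_{yx} = r_{xy} = r = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\sqrt{(\overline{x^2} - \bar{x}^2)(\overline{y^2} - \bar{y}^2)}} = \frac{s_{xy}}{s_x s_y}$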
Regression analysis
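Regression line: $\hat{y} = b_0 + b_1 x$,  $\hat{y}_i = b_0 + b_1 x_i$,  $i = 1, 2, \ldots, n$
$b_1 = b_{yx} = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2} = \frac{s_{xy}}{s_x^2}$,  $b_0 = \bar{y} - b_1 \bar{x}$
Quadratic regression function: $\hat{y} = b_0 + b_1 x + b_2 x^2$,  $\hat{y}_i = b_0 + b_1 x_i + b_2 x_i^2$,  $i = 1, 2, \ldots, n$
Regression plane: $\hat{y} = b_0 + b_1 x_1 + b_2 x_2$,  $\hat{y}_i = b_0 + b_1 x_{1,i} + b_2 x_{2,i}$,  $i = 1, 2, \ldots, n$
Residuals: $e_i = y_i - \hat{y}_i$,  $y_i = \hat{y}_i + e_i$,  $i = 1, 2, \ldots, n$
$ssT = \sum_{i=1}^{n} (y_i - \bar{y})^2$,  $ssR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$,  $ssE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$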
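A minimal Python sketch of the correlation coefficient, the regression line and the three sums of squares, using the moment formulas above. The function name `fit_line` and the data are hypothetical. For a least-squares line with an intercept, ssT = ssR + ssE is a standard identity, so the printed values can be used as a sanity check.

```python
from math import sqrt


def fit_line(xs, ys):
    """Return r, b0, b1 and the sums of squares for y regressed on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    mean_x2 = sum(x * x for x in xs) / n
    mean_y2 = sum(y * y for y in ys) / n
    s_xy = mean_xy - mean_x * mean_y          # covariance
    var_x = mean_x2 - mean_x ** 2             # s_x^2
    var_y = mean_y2 - mean_y ** 2             # s_y^2
    b1 = s_xy / var_x
    b0 = mean_y - b1 * mean_x
    r = s_xy / sqrt(var_x * var_y)
    fitted = [b0 + b1 * x for x in xs]
    ss_t = sum((y - mean_y) ** 2 for y in ys)
    ss_r = sum((f - mean_y) ** 2 for f in fitted)
    ss_e = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return {"r": r, "b0": b0, "b1": b1, "ssT": ss_t, "ssR": ss_r, "ssE": ss_e}


x = [1, 2, 3, 4, 5]        # made-up data
y = [2, 4, 5, 4, 5]
fit = fit_line(x, y)
print(fit)
```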
Descriptive time series analysis and predictions
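$\bar{y} = \frac{1}{n} \sum_{t=1}^{n} y_t$
$\bar{y} = \frac{\frac{1}{2} y_1 + \sum_{t=2}^{n-1} y_t + \frac{1}{2} y_n}{n - 1}$
$\bar{y} = \frac{\frac{y_1 + y_2}{2} d_1 + \frac{y_2 + y_3}{2} d_2 + \ldots + \frac{y_{n-1} + y_n}{2} d_{n-1}}{d_1 + d_2 + \ldots + d_{n-1}}$
$\Delta_t = y_t - y_{t-1}$,  $\bar{\Delta} = \frac{1}{n - 1} \sum_{t=2}^{n} \Delta_t = \frac{y_n - y_1}{n - 1}$
$k_t = \frac{y_t}{y_{t-1}}$,  $\bar{k} = \sqrt[n-1]{k_2 k_3 \cdots k_n} = \sqrt[n-1]{\frac{y_n}{y_1}}$
$I_{t/1} = \frac{y_t}{y_1} = I_{2/1} I_{3/2} \cdots I_{t/t-1}$,  $I_{t/t-1} = \frac{y_t}{y_{t-1}} = \frac{I_{t/1}}{I_{t-1/1}}$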
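The following Python sketch illustrates the first differences, growth coefficients and their averages from the formulas above. The function name `growth_summary` and the series are hypothetical. It also shows that both averages depend only on the first and last observation.

```python
from math import prod


def growth_summary(ys):
    """First differences, growth coefficients and their averages."""
    n = len(ys)
    diffs = [ys[t] - ys[t - 1] for t in range(1, n)]    # y_t - y_{t-1}
    coeffs = [ys[t] / ys[t - 1] for t in range(1, n)]   # k_t = y_t / y_{t-1}
    mean_diff = sum(diffs) / (n - 1)                    # equals (y_n - y_1) / (n - 1)
    mean_coeff = prod(coeffs) ** (1 / (n - 1))          # geometric mean of k_t
    return diffs, coeffs, mean_diff, mean_coeff


series = [100.0, 110.0, 121.0, 133.1]   # made-up values growing by 10 % per period
diffs, coeffs, mean_diff, mean_coeff = growth_summary(series)
print(mean_diff, mean_coeff)
```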
Moving averages
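$m = 2p + 1$:  $\bar{y}_t = tr_t = \frac{y_{t-p} + \ldots + y_{t-1} + y_t + y_{t+1} + \ldots + y_{t+p}}{m}$
$m = 2p$:  $\bar{y}_t = tr_t = \frac{1}{2} \left( \frac{y_{t-p} + \ldots + y_{t-1} + y_t + y_{t+1} + \ldots + y_{t+p-1}}{m} + \frac{y_{t-p+1} + \ldots + y_{t-1} + y_t + y_{t+1} + \ldots + y_{t+p}}{m} \right)$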
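A short Python sketch of the two moving-average variants above: a simple average for odd m and a centred average of two neighbouring windows for even m. The function name `moving_average` is hypothetical, and returning `None` where the window does not fit is a choice made for this illustration only.

```python
def moving_average(ys, m):
    """Moving average of length m; None where the window does not fit."""
    n = len(ys)
    p = m // 2
    out = [None] * n
    for t in range(p, n - p):
        if m % 2 == 1:
            # m = 2p + 1: window y_{t-p} ... y_{t+p}
            out[t] = sum(ys[t - p:t + p + 1]) / m
        else:
            # m = 2p: mean of the windows y_{t-p}..y_{t+p-1} and y_{t-p+1}..y_{t+p}
            first = sum(ys[t - p:t + p]) / m
            second = sum(ys[t - p + 1:t + p + 1]) / m
            out[t] = (first + second) / 2
    return out


values = [1, 2, 3, 4, 5, 6, 7, 8]   # made-up series
ma3 = moving_average(values, 3)
ma4 = moving_average(values, 4)
print(ma3)
print(ma4)
```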
Simple approach to seasonal component
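Additive model: $s_t + e_t = y_t - tr_t$,  $sd_j = \frac{1}{n_j} \sum_{i=1}^{n_j} (s_{ij} + e_{ij})$,  $\overline{sd} = \frac{1}{s} \sum_{j=1}^{s} sd_j$,  adjusted seasonal component: $sd_j - \overline{sd}$
Multiplicative model: $s_t e_t = y_t / tr_t$,  $si_j = \frac{1}{n_j} \sum_{i=1}^{n_j} (s_{ij} e_{ij})$,  $\overline{si} = \frac{1}{s} \sum_{j=1}^{s} si_j$,  adjusted seasonal component: $si_j / \overline{si}$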
Probability and random variables
Probability theory
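$P(A) = \frac{m}{n}$,  $P(\bar{A}) = 1 - P(A)$
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Disjoint events: $P(A \cup B) = P(A) + P(B)$
Independent events: $P(A) = P(A \mid B)$,  $P(B) = P(B \mid A)$,  $P(A \cap B) = P(A) P(B)$
Dependent events: $P(A \cap B) = P(A) P(B \mid A) = P(B) P(A \mid B)$
$E(X) = \sum_{x} x P(x)$
$Var(X) = E(X^2) - [E(X)]^2 = \sum_{x} x^2 P(x) - \left( \sum_{x} x P(x) \right)^2$
$S.D.(X) = \sqrt{Var(X)}$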
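As a small worked illustration of the expected value and variance formulas for a discrete random variable, the Python sketch below uses a fair six-sided die. The function name `discrete_moments` and the example distribution are hypothetical.

```python
from math import sqrt


def discrete_moments(dist):
    """E(X), Var(X) and S.D.(X) for a dict mapping value x to probability P(x)."""
    mean = sum(x * p for x, p in dist.items())
    second_moment = sum(x * x * p for x, p in dist.items())   # E(X^2)
    var = second_moment - mean ** 2
    return mean, var, sqrt(var)


die = {x: 1 / 6 for x in range(1, 7)}   # fair die, made-up example
mean, var, sd = discrete_moments(die)
print(mean, var, sd)
```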
Continuous random variables
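$f(x) = F'(x)$,  $\int_{-\infty}^{\infty} f(x)\,dx = 1$
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$
$P(x_1 \le X \le x_2) = \int_{x_1}^{x_2} f(x)\,dx$
$x_P$, $0 < P < 1$:  $F(x_P) = P$
$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$
$Var(X) = E(X^2) - [E(X)]^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \left( \int_{-\infty}^{\infty} x f(x)\,dx \right)^2$
$S.D.(X) = \sqrt{Var(X)}$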
The most important probability distributions
Probability distributions of discrete random variables
Binomial distribution: $X \sim Bi(n, \pi)$
$P(x) = \binom{n}{x} \pi^x (1 - \pi)^{n - x}$,  $x = 0, 1, 2, \ldots, n$,  $0 < \pi < 1$
$E(X) = n\pi$,  $Var(X) = n\pi(1 - \pi)$
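The binomial probability function can be checked numerically with a few lines of Python. The function name `binom_pmf` and the parameter values n = 10, pi = 0.3 are hypothetical; the sketch only confirms that the moments computed from P(x) agree with the closed-form expressions for E(X) and Var(X).

```python
from math import comb


def binom_pmf(x, n, pi):
    """P(x) for X ~ Bi(n, pi)."""
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)


n, pi = 10, 0.3                                   # made-up parameters
probs = [binom_pmf(x, n, pi) for x in range(n + 1)]
mean = sum(x * p for x, p in enumerate(probs))    # should be close to n * pi
var = sum(x * x * p for x, p in enumerate(probs)) - mean ** 2
print(mean, var)
```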
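Normal distribution: $X \sim N(\mu, \sigma^2)$,  $Z \sim N(0, 1)$
$F(x) = \Phi\left( \frac{x - \mu}{\sigma} \right) = \Phi(z)$,  $x_P = \mu + \sigma z_P$
$P(x_1 \le X \le x_2) = P\left( \frac{x_1 - \mu}{\sigma} \le \frac{X - \mu}{\sigma} \le \frac{x_2 - \mu}{\sigma} \right) = P(z_1 \le Z \le z_2) = \Phi(z_2) - \Phi(z_1)$
F-distribution (Fisher – Snedecor d.): $F \sim F(df_1, df_2)$,  $f > 0$,  $f_P(df_1, df_2) = \frac{1}{f_{1-P}(df_2, df_1)}$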
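For the normal distribution, the standardisation step can be sketched with Python's standard-library `statistics.NormalDist`. The function names and the values mu = 100, sigma = 15 are hypothetical; the sketch converts the limits to z-values and uses the standard normal cumulative distribution function.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()   # Z ~ N(0, 1)


def normal_interval_prob(x1, x2, mu, sigma):
    """P(x1 <= X <= x2) for X ~ N(mu, sigma^2) via standardisation."""
    z1 = (x1 - mu) / sigma
    z2 = (x2 - mu) / sigma
    return STANDARD_NORMAL.cdf(z2) - STANDARD_NORMAL.cdf(z1)


def normal_quantile(prob, mu, sigma):
    """x_P = mu + sigma * z_P."""
    return mu + sigma * STANDARD_NORMAL.inv_cdf(prob)


mu, sigma = 100.0, 15.0          # made-up parameters
within_one_sd = normal_interval_prob(mu - sigma, mu + sigma, mu, sigma)
x_95 = normal_quantile(0.95, mu, sigma)
print(within_one_sd, x_95)
```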
Statistical inference: point and interval parameter estimates
Population variance
Normal distribution in population; $X \sim N(\mu, \sigma^2)$:  $\hat{\sigma}^2 = s^2$
Population proportion:  $\hat{\pi} = p = \frac{n_j}{n}$
Count in population:  $\widehat{N\pi} = Np$
General distribution in population, Var(X) unknown (but finite), random sample size assumed
to be large enough, two-sided confidence interval for parameter E(X):
$\left( \bar{x} - z_{1-\alpha/2} \frac{s}{\sqrt{n}},\ \bar{x} + z_{1-\alpha/2} \frac{s}{\sqrt{n}} \right)$
Population proportion
Random sample size assumed to be large enough, two-sided confidence interval for parameter $\pi$:
$\left( p - z_{1-\alpha/2} \sqrt{\frac{p(1 - p)}{n}},\ p + z_{1-\alpha/2} \sqrt{\frac{p(1 - p)}{n}} \right)$
Count in population
Random sample size assumed to be large enough, N known, two-sided confidence interval for
parameter $N\pi$:
$N \cdot \left( p - z_{1-\alpha/2} \sqrt{\frac{p(1 - p)}{n}},\ p + z_{1-\alpha/2} \sqrt{\frac{p(1 - p)}{n}} \right)$
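The three large-sample confidence intervals above can be sketched in Python as follows. The function names, the default alpha = 0.05 and all numeric inputs are hypothetical; the standard normal quantile comes from the standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def z_quantile(alpha):
    """z_{1 - alpha/2} of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def mean_ci(xbar, s, n, alpha=0.05):
    """Two-sided interval for E(X), large random sample."""
    half = z_quantile(alpha) * s / sqrt(n)
    return xbar - half, xbar + half


def proportion_ci(p, n, alpha=0.05):
    """Two-sided interval for the population proportion, large random sample."""
    half = z_quantile(alpha) * sqrt(p * (1 - p) / n)
    return p - half, p + half


def count_ci(p, n, population_size, alpha=0.05):
    """Two-sided interval for the population count, N known."""
    low, high = proportion_ci(p, n, alpha)
    return population_size * low, population_size * high


ci_mean = mean_ci(xbar=50.0, s=10.0, n=100)            # made-up inputs
ci_prop = proportion_ci(p=0.4, n=600)
ci_count = count_ci(p=0.4, n=600, population_size=10000)
print(ci_mean, ci_prop, ci_count)
```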
Statistical inference: statistical hypothesis tests
Chi-square goodness-of-fit test (random sample size and $\pi_{0,j}$ assumed to be large enough)
Test statistic: $G = \sum_{j=1}^{k} \frac{(n_j - n\pi_{0,j})^2}{n\pi_{0,j}}$
Inferential analysis of dependence, inferential time series analysis
Correlation analysis
H0: $\rho_{XY} = 0$,  H1: $\rho_{XY} \ne 0$
Test statistic: T; assuming H0 is true, $T \sim t(n - 2)$
Value of test statistic: $t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}$
Rejection region: $W_\alpha = \{ t;\ |t| \ge t_{1-\alpha/2} \}$
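A minimal Python sketch of the correlation test above. The function names and the inputs r = 0.6, n = 27 are hypothetical. Python's standard library has no t-distribution quantile, so the critical value is passed in by the caller; the value used here is assumed to come from a t-table or from Excel's T.INV(0.975, 25).

```python
from math import sqrt


def correlation_t(r, n):
    """Value of the test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


def reject_h0(r, n, t_critical):
    """True when t falls in the two-sided rejection region |t| >= t_{1 - alpha/2}."""
    return abs(correlation_t(r, n)) >= t_critical


t_value = correlation_t(0.6, 27)         # made-up sample correlation and size
t_crit = 2.0595                          # assumed quantile t_{0.975}(25) from a table
decision = reject_h0(0.6, 27, t_crit)
print(t_value, decision)
```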
Index number analysis
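All totals: $\sum = \sum_{i=1}^{c}$
$Q = pq$,  $I_Q = I_p I_q$,  $p = \frac{Q}{q}$
$I_p = \frac{p_1}{p_0}$,  $\Delta p = p_1 - p_0$
$I_q = \frac{q_1}{q_0}$,  $\Delta q = q_1 - q_0$
$I_Q = \frac{Q_1}{Q_0}$,  $\Delta Q = Q_1 - Q_0$
Homogeneous commodities
$I(\Sigma q) = \frac{\sum q_1}{\sum q_0} = \frac{\sum I_q q_0}{\sum q_0} = \frac{\sum q_1}{\sum \frac{q_1}{I_q}}$,  $\Delta(\Sigma q) = \sum q_1 - \sum q_0$
$I(\Sigma Q) = \frac{\sum Q_1}{\sum Q_0} = \frac{\sum p_1 q_1}{\sum p_0 q_0} = \frac{\sum I_Q Q_0}{\sum Q_0} = \frac{\sum Q_1}{\sum \frac{Q_1}{I_Q}}$,  $\Delta(\Sigma Q) = \sum Q_1 - \sum Q_0$
$\bar{p} = \frac{\sum Q}{\sum q} = \frac{\sum pq}{\sum q}$
$I\bar{p} = \frac{\bar{p}_1}{\bar{p}_0} = \frac{\sum Q_1 / \sum q_1}{\sum Q_0 / \sum q_0} = \frac{\sum p_1 q_1 / \sum q_1}{\sum p_0 q_0 / \sum q_0}$,  $\Delta \bar{p} = \bar{p}_1 - \bar{p}_0$
Heterogeneous commodities
$Ip(L) = \frac{\sum p_1 q_0}{\sum p_0 q_0} = \frac{\sum I_p p_0 q_0}{\sum p_0 q_0} = \frac{\sum I_p Q_0}{\sum Q_0}$
$Ip(P) = \frac{\sum p_1 q_1}{\sum p_0 q_1} = \frac{\sum p_1 q_1}{\sum \frac{p_1 q_1}{I_p}} = \frac{\sum Q_1}{\sum \frac{Q_1}{I_p}}$
$Ip(F) = \sqrt{Ip(L) \cdot Ip(P)}$
$Iq(L) = \frac{\sum p_0 q_1}{\sum p_0 q_0} = \frac{\sum I_q p_0 q_0}{\sum p_0 q_0} = \frac{\sum I_q Q_0}{\sum Q_0}$
$Iq(P) = \frac{\sum p_1 q_1}{\sum p_1 q_0} = \frac{\sum p_1 q_1}{\sum \frac{p_1 q_1}{I_q}} = \frac{\sum Q_1}{\sum \frac{Q_1}{I_q}}$
$Iq(F) = \sqrt{Iq(L) \cdot Iq(P)}$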
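The Laspeyres, Paasche and Fisher indices for heterogeneous commodities can be illustrated with a short Python sketch. The function names and the two-commodity price and quantity lists are hypothetical. The last line checks the well-known decomposition of the value index into a Laspeyres price index times a Paasche quantity index.

```python
from math import sqrt


def total(prices, quantities):
    """Sum of p * q over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


def price_indices(p0, p1, q0, q1):
    """Laspeyres, Paasche and Fisher price indices."""
    laspeyres = total(p1, q0) / total(p0, q0)   # base-period quantities as weights
    paasche = total(p1, q1) / total(p0, q1)     # current-period quantities as weights
    return laspeyres, paasche, sqrt(laspeyres * paasche)


def quantity_indices(p0, p1, q0, q1):
    """Laspeyres, Paasche and Fisher quantity indices."""
    laspeyres = total(p0, q1) / total(p0, q0)   # base-period prices as weights
    paasche = total(p1, q1) / total(p1, q0)     # current-period prices as weights
    return laspeyres, paasche, sqrt(laspeyres * paasche)


p0, p1 = [10.0, 20.0], [12.0, 22.0]   # made-up prices in periods 0 and 1
q0, q1 = [5.0, 3.0], [6.0, 2.0]       # made-up quantities in periods 0 and 1
ip_l, ip_p, ip_f = price_indices(p0, p1, q0, q1)
iq_l, iq_p, iq_f = quantity_indices(p0, p1, q0, q1)
value_index = total(p1, q1) / total(p0, q0)
print(ip_l, ip_p, ip_f, value_index, ip_l * iq_p)
```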
Selected Excel functions
What | Excel function | Lecture | Symbol
Frequency of a qualitative variable's level | COUNTIF() | 1 | $n_j$
Frequencies of a quantitative variable's values | FREQUENCY() | 1 | $n_j$
Sum (total) | SUM() | 1 | $\sum$
Quantile | PERCENTILE.INC() | 1 | $x_P$
Median | MEDIAN() | 1 | $\tilde{x}$
Mode | MODE.MULT() | 2 | $\hat{x}$
Simple arithmetic mean | AVERAGE() | 2 | $\bar{x}$
Variance (from raw data set) | VAR.P() | 2 | $s_x^2$
Standard deviation (from raw data set) | STDEV.P() | 2 | $s_x$
Sample variance (from raw data set) | VAR.S() | 2 | $s^2$
Sample standard deviation (from raw data set) | STDEV.S() | 2 | $s$
Pearson's correlation coefficient | CORREL() | 4 | $r$
Probability function for X with binomial distribution | BINOM.DIST() | 8 | $P(x)$
Probability function for X with hypergeometric distribution | HYPGEOM.DIST() | 8 | $P(x)$
Probability function for X with Poisson distribution | POISSON.DIST() | 8 | $P(x)$
Cumulative distribution function for Z | NORM.S.DIST() | 8 | $\Phi(z)$
Quantiles for Z | NORM.S.INV() | 8 | $z_P$
Cumulative distribution function for X with normal distribution | NORM.DIST() | 8 | $F(x)$
Quantiles for X with normal distribution | NORM.INV() | 8 | $x_P$
Quantiles for T with t-distribution | T.INV() | 8 | $t_P$
Quantiles for G with chi-squared distribution | CHISQ.INV() | 8 | $\chi^2_P$
Quantiles for F with F-distribution | F.INV() | 8 | $f_P$
p-value for the test of independence in the contingency table | CHISQ.TEST() | 11 | p-value