4ST601 Formulas v3.54

The document provides a comprehensive overview of statistical formulas and methods used in the Faculty of Informatics and Statistics at Prague University of Economics and Business. It covers various topics including descriptive statistics, correlation analysis, regression analysis, time series analysis, and probability theory. The content is structured to serve as a reference for statistical analysis techniques and their applications.


Prague University of Economics and Business

FACULTY OF INFORMATICS AND STATISTICS


Department of Statistics and Probability

STATISTICS
FORMULAS FOR 4ST601

Version 3.54
Last Update: January 2024

KSTP, VŠE 2024


Descriptive statistics
Relative frequencies: $p_j = \dfrac{n_j}{n}$, $\sum_{j=1}^{k} n_j = n$, $\sum_{j=1}^{k} p_j = 1$, $j = 1, 2, \ldots, k$

Ordered sample: $x_{(1)} \le x_{(2)} \le \ldots \le x_{(n)}$

Quantile $x_P$, $0 < P < 1$: $h = (n-1)P + 1$, $x_P = x_{(\lfloor h \rfloor)} + (h - \lfloor h \rfloor)\left(x_{(\lfloor h \rfloor + 1)} - x_{(\lfloor h \rfloor)}\right)$

Arithmetic mean (raw data, $i = 1, 2, \ldots, n$): $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$

Arithmetic mean (grouped data): $\bar{x} = \dfrac{\sum_{j=1}^{k} x_j n_j}{\sum_{j=1}^{k} n_j} = \sum_{j=1}^{k} x_j p_j$

Range and interquartile range: $r = x_{\max} - x_{\min}$, $r_q = x_{0.75} - x_{0.25}$

Variance (raw data): $s_x^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} = \overline{x^2} - \bar{x}^2 = \dfrac{1}{n}\left(\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right)$

Variance (grouped data): $s_x^2 = \dfrac{\sum_{j=1}^{k} (x_j - \bar{x})^2 n_j}{\sum_{j=1}^{k} n_j} = \sum_{j=1}^{k} (x_j - \bar{x})^2 p_j = \dfrac{1}{\sum_{j=1}^{k} n_j}\left(\sum_{j=1}^{k} x_j^2 n_j - \dfrac{\left(\sum_{j=1}^{k} x_j n_j\right)^2}{\sum_{j=1}^{k} n_j}\right) = \sum_{j=1}^{k} x_j^2 p_j - \left(\sum_{j=1}^{k} x_j p_j\right)^2$

Variance decomposition (within-group + between-group): $s_x^2 = \overline{s_{jx}^2} + s_{\bar{x}_j}^2 = \dfrac{\sum_{j=1}^{k} s_{jx}^2 n_j + \sum_{j=1}^{k} (\bar{x}_j - \bar{x})^2 n_j}{\sum_{j=1}^{k} n_j} = \sum_{j=1}^{k} s_{jx}^2 p_j + \sum_{j=1}^{k} (\bar{x}_j - \bar{x})^2 p_j$

Standard deviation: $s_x = \sqrt{s_x^2}$

Sample variance and sample standard deviation: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \dfrac{n}{n-1}\, s_x^2$, $s = \sqrt{s^2}$

Median absolute deviation and coefficient of variation: $\mathrm{mad} = \mathrm{median}\,|x_i - \tilde{x}|$, $v_x = \dfrac{s_x}{\bar{x}}$
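The descriptive formulas above are easy to verify numerically. A minimal Python sketch (not part of the original sheet; function and variable names are ours) implements the quantile rule $h = (n-1)P + 1$ and the identity $s^2 = \frac{n}{n-1} s_x^2$:

```python
import math

def quantile(data, P):
    """x_P via h = (n-1)P + 1 with linear interpolation
    (the same rule Excel's PERCENTILE.INC uses)."""
    xs = sorted(data)
    n = len(xs)
    h = (n - 1) * P + 1
    lo = int(math.floor(h))
    if lo >= n:                      # P = 1 -> the maximum
        return xs[-1]
    frac = h - lo
    return xs[lo - 1] + frac * (xs[lo] - xs[lo - 1])

def describe(data):
    """Mean, population variance s_x^2, sample variance s^2,
    standard deviation and coefficient of variation."""
    n = len(data)
    mean = sum(data) / n
    var_pop = sum((x - mean) ** 2 for x in data) / n      # s_x^2
    var_sample = var_pop * n / (n - 1)                    # s^2 = n/(n-1) * s_x^2
    sd_pop = math.sqrt(var_pop)
    return {"mean": mean, "var_pop": var_pop,
            "var_sample": var_sample, "sd_pop": sd_pop,
            "cv": sd_pop / mean}
```

For example, `describe([2, 4, 4, 4, 5, 5, 7, 9])` gives mean 5 and population variance 4.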

1
Descriptive analysis of dependence
Contingency table ($r \times c$)

Marginal and relative frequencies: $n_{i.} = \sum_{j=1}^{c} n_{ij}$, $n_{.j} = \sum_{i=1}^{r} n_{ij}$, $p_{ij} = \dfrac{n_{ij}}{n}$, $p_{i.} = \sum_{j=1}^{c} p_{ij} = \dfrac{n_{i.}}{n}$, $p_{.j} = \sum_{i=1}^{r} p_{ij} = \dfrac{n_{.j}}{n}$

Conditional relative frequencies: $p_{i|j} = \dfrac{n_{ij}}{n_{.j}} = \dfrac{p_{ij}}{p_{.j}}$, $p_{j|i} = \dfrac{n_{ij}}{n_{i.}} = \dfrac{p_{ij}}{p_{i.}}$

Expected frequencies and chi-square statistic: $\hat{n}_{ij} = \dfrac{n_{i.}\, n_{.j}}{n}$, $g = \sum_{i=1}^{r} \sum_{j=1}^{c} \dfrac{(n_{ij} - \hat{n}_{ij})^2}{\hat{n}_{ij}}$

Pearson's contingency coefficient and Cramér's V ($m = \min(r, c)$): $C = \sqrt{\dfrac{g}{g + n}} \in \left\langle 0, \sqrt{(m-1)/m} \right\rangle$, $V = \sqrt{\dfrac{g}{n(m-1)}} \in \langle 0, 1 \rangle$

One-way analysis of variance


$ss_T = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ji} - \bar{x})^2 = ss_{TR} + ss_E$

$ss_{TR} = \sum_{j=1}^{k} (\bar{x}_j - \bar{x})^2\, n_j \qquad ss_E = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ji} - \bar{x}_j)^2$

Coefficient of determination: $r^2 = \dfrac{ss_{TR}}{ss_T}$
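The sum-of-squares decomposition can be checked directly in Python (a sketch, not part of the sheet; $r^2 = ss_{TR}/ss_T$ follows from the returned values):

```python
def anova_ss(groups):
    """ss_T, ss_TR, ss_E for k groups of observations;
    ss_T = ss_TR + ss_E always holds."""
    all_x = [x for g in groups for x in g]
    n = len(all_x)
    grand = sum(all_x) / n                      # grand mean x-bar
    ss_t = sum((x - grand) ** 2 for x in all_x)
    ss_tr = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_e = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return ss_t, ss_tr, ss_e
```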

Correlation analysis
$r_{yx} = r_{xy} = r = \dfrac{\overline{xy} - \bar{x}\,\bar{y}}{\sqrt{\left(\overline{x^2} - \bar{x}^2\right)\left(\overline{y^2} - \bar{y}^2\right)}} = \dfrac{s_{xy}}{s_x s_y}$
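The moment form of the correlation coefficient translates directly to code (an illustrative sketch; the function name is ours):

```python
import math

def pearson_r(x, y):
    """r = (xy_bar - x_bar * y_bar) / (s_x * s_y),
    using population-form moments throughout."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum(a * b for a, b in zip(x, y)) / n - mx * my   # covariance s_xy
    sx = math.sqrt(sum(a * a for a in x) / n - mx * mx)    # s_x
    sy = math.sqrt(sum(b * b for b in y) / n - my * my)    # s_y
    return sxy / (sx * sy)
```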

Regression analysis
Regression line: $\hat{y} = b_0 + b_1 x$, $\hat{y}_i = b_0 + b_1 x_i$, $i = 1, 2, \ldots, n$

$b_1 = b_{yx} = \dfrac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2} = \dfrac{s_{xy}}{s_x^2} \qquad b_0 = \bar{y} - b_1 \bar{x}$

Quadratic regression function: $\hat{y} = b_0 + b_1 x + b_2 x^2$, $\hat{y}_i = b_0 + b_1 x_i + b_2 x_i^2$, $i = 1, 2, \ldots, n$

Regression plane: $\hat{y} = b_0 + b_1 x_1 + b_2 x_2$, $\hat{y}_i = b_0 + b_1 x_{1,i} + b_2 x_{2,i}$, $i = 1, 2, \ldots, n$

Residuals: $e_i = y_i - \hat{y}_i$, $y_i = \hat{y}_i + e_i$, $i = 1, 2, \ldots, n$

$ss_T = \sum_{i=1}^{n} (y_i - \bar{y})^2 \qquad ss_R = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \qquad ss_E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

$ss_T = ss_R + ss_E$

Coefficient of determination and its adjusted version: $r^2 = \dfrac{ss_R}{ss_T} \qquad r_{ADJ}^2 = 1 - (1 - r^2)\dfrac{n-1}{n-p}$
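A minimal least-squares fit of the regression line, using the moment formulas $b_1 = s_{xy}/s_x^2$ and $b_0 = \bar{y} - b_1\bar{x}$ (an illustrative sketch, not the course's Excel workflow; names are ours):

```python
def fit_line(x, y):
    """Fit y-hat = b0 + b1 * x; return b0, b1, r^2 and adjusted r^2 (p = 2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum(a * b for a, b in zip(x, y)) / n - mx * my   # s_xy
    sx2 = sum(a * a for a in x) / n - mx * mx              # s_x^2
    b1 = sxy / sx2
    b0 = my - b1 * mx
    yhat = [b0 + b1 * a for a in x]
    ss_t = sum((b - my) ** 2 for b in y)
    ss_e = sum((b - h) ** 2 for b, h in zip(y, yhat))
    r2 = 1 - ss_e / ss_t                                   # = ss_R / ss_T
    p = 2                                                  # number of parameters
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - p)
    return b0, b1, r2, r2_adj
```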

Descriptive time series analysis and predictions
n n 1
y1  y2 y  y3 y  yn
y
1 1
t y1   yt  yn d1  2 d 2  ...  n 1 d n 1
2 2
y
t 1
y
t 2
y 2 2 2
n n 1 d1  d 2  ...  d n 1
1 n y  y1
t = yt - yt-1  
n  1 t 2
t  n
n 1
yt yn
kt  k  n 1 k 2 k 3 ...k n  n 1
y t 1 y1
y yt I
I t /1  t  I 2 /1 I 3/ 2 ... I t / t 1 I t / t 1   t /1
y1 yt 1 I t 1/1
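The closed forms $\bar{\Delta} = (y_n - y_1)/(n-1)$ and $\bar{k} = (y_n/y_1)^{1/(n-1)}$ can be sketched as (illustrative; names are ours):

```python
def ts_summary(y):
    """Average first difference and average growth coefficient
    of a time series y_1, ..., y_n."""
    n = len(y)
    avg_diff = (y[-1] - y[0]) / (n - 1)            # delta-bar
    avg_growth = (y[-1] / y[0]) ** (1 / (n - 1))   # k-bar (geometric mean of k_t)
    return avg_diff, avg_growth
```

For the series 100, 110, 121 (10 % growth per period) the average growth coefficient is 1.1.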

Time series decomposition


Additive and multiplicative decomposition: $y_t = tr_t + s_t + c_t + e_t \qquad y_t = tr_t \cdot s_t \cdot c_t \cdot e_t$

Trend functions
Constant trend: $tr_t = b_0$, $t = 1, 2, \ldots, n$, $b_0 = \sum_{t=1}^{n} y_t / n$
Linear trend: $tr_t = b_0 + b_1 t$, $t = 1, 2, \ldots, n$
Quadratic trend: $tr_t = b_0 + b_1 t + b_2 t^2$, $t = 1, 2, \ldots, n$
Predictions: $\hat{y}_t = tr_t$, $t = n+1, n+2, \ldots$

Moving averages
Odd length $m = 2p + 1$: $\hat{y}_t = \widehat{tr}_t = \dfrac{y_{t-p} + \ldots + y_{t-1} + y_t + y_{t+1} + \ldots + y_{t+p}}{m}$
Even length $m = 2p$ (centered): $\hat{y}_t = \widehat{tr}_t = \dfrac{1}{2}\left(\dfrac{y_{t-p} + \ldots + y_{t-1} + y_t + y_{t+1} + \ldots + y_{t+p-1}}{m} + \dfrac{y_{t-p+1} + \ldots + y_{t-1} + y_t + y_{t+1} + \ldots + y_{t+p}}{m}\right)$

Simple approach to the seasonal component ($s$ = length of seasonality, $n_j$ = number of observations in season $j$)
Additive model: $s_t + e_t = y_t - tr_t$, $\overline{sd}_j = \dfrac{\sum_{i=1}^{n_j} (s_{ij} + e_{ij})}{n_j}$, $sd_j = \overline{sd}_j - \dfrac{1}{s} \sum_{j=1}^{s} \overline{sd}_j$
Multiplicative model: $s_t \cdot e_t = y_t / tr_t$, $\overline{si}_j = \dfrac{\sum_{i=1}^{n_j} (s_{ij} \cdot e_{ij})}{n_j}$, $si_j = \overline{si}_j \cdot \dfrac{s}{\sum_{j=1}^{s} \overline{si}_j}$

Regression approach to the seasonal component (linear trend, length of seasonality 4)

$y_t = tr_t + s_t + e_t = b_0 + b_1 t + sd_1 x_1 + sd_2 x_2 + sd_3 x_3 + sd_4 x_4 + e_t$

Reparameterized model: $y_t = b_0^* + b_1 t + a_1 x_1 + a_2 x_2 + a_3 x_3 + e_t$

$\bar{a} = \dfrac{a_1 + a_2 + a_3}{4} \qquad b_0 = b_0^* + \bar{a} \qquad sd_j = a_j - \bar{a},\ j = 1, 2, 3 \qquad sd_4 = -\bar{a}$

Predictions: $\hat{y}_t = tr_t + s_t = b_0 + b_1 t + sd_j$, $t = n+1, n+2, \ldots$, $j = 1, 2, 3, 4$
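The two moving-average formulas (odd length, and even length with centering) can be sketched in Python as follows (illustrative; the function name is ours; keys of the returned dict are 0-based positions $t$ for which the average is defined):

```python
def moving_average(y, m):
    """Moving averages of length m over list y.
    Odd m: plain average of m neighbours.
    Even m = 2p: centered average of two adjacent m-term means."""
    n = len(y)
    out = {}
    p = m // 2
    if m % 2 == 1:
        for t in range(p, n - p):
            out[t] = sum(y[t - p:t + p + 1]) / m
    else:
        for t in range(p, n - p):
            left = sum(y[t - p:t + p]) / m        # y_{t-p} ... y_{t+p-1}
            right = sum(y[t - p + 1:t + p + 1]) / m  # y_{t-p+1} ... y_{t+p}
            out[t] = (left + right) / 2           # centered value
    return out
```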

Probability and random variables
Probability theory
Classical probability and complement: $P(A) = \dfrac{m}{n} \qquad P(\bar{A}) = 1 - P(A)$
Addition rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Disjoint events: $P(A \cup B) = P(A) + P(B)$
Independent events: $P(A) = P(A|B)$, $P(B) = P(B|A)$, $P(A \cap B) = P(A) \cdot P(B)$
Dependent events: $P(A \cap B) = P(A) \cdot P(B|A) = P(B) \cdot P(A|B)$

Random variables (random variable $X$; the value of this random variable $x$)

Discrete random variables
$P(x) = P(X = x) \qquad \sum_x P(x) = 1$
$F(x) = P(X \le x) = \sum_{t \le x} P(t)$
$P(x_1 < X \le x_2) = F(x_2) - F(x_1) = \sum_{x_1 < x \le x_2} P(x)$
$E(X) = \sum_x x\, P(x)$
$Var(X) = E(X^2) - \left[E(X)\right]^2 = \sum_x x^2 P(x) - \left(\sum_x x\, P(x)\right)^2 \qquad S.D.(X) = \sqrt{Var(X)}$

Continuous random variables
$f(x) = F'(x) \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1 \qquad F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$
$P(x_1 \le X \le x_2) = F(x_2) - F(x_1) = \int_{x_1}^{x_2} f(x)\,dx$
Quantile $x_P$, $0 < P < 1$: $F(x_P) = P$
$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$
$Var(X) = E(X^2) - \left[E(X)\right]^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \left(\int_{-\infty}^{\infty} x f(x)\,dx\right)^2 \qquad S.D.(X) = \sqrt{Var(X)}$
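For a discrete random variable given by its probability function, the moment formulas above can be sketched as (illustrative; names are ours):

```python
import math

def moments(pmf):
    """E(X), Var(X) and S.D.(X) from a dict {x: P(x)}.
    Var(X) = E(X^2) - [E(X)]^2."""
    assert abs(sum(pmf.values()) - 1) < 1e-9  # probabilities must sum to 1
    ex = sum(x * p for x, p in pmf.items())           # E(X)
    ex2 = sum(x * x * p for x, p in pmf.items())      # E(X^2)
    var = ex2 - ex ** 2
    return ex, var, math.sqrt(var)
```

A fair die gives $E(X) = 3.5$ and $Var(X) = 35/12$.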

The most important probability distributions
Probability distributions of discrete random variables
Binomial distribution: $X \sim Bi(n, \pi)$
$P(x) = \dbinom{n}{x} \pi^x (1 - \pi)^{n-x}$, $x = 0, 1, 2, \ldots, n$, $0 < \pi < 1$
$E(X) = n\pi \qquad Var(X) = n\pi(1 - \pi)$

Poisson distribution: $X \sim Po(\lambda)$
$P(x) = \dfrac{\lambda^x}{x!} e^{-\lambda}$, $x = 0, 1, \ldots$, $\lambda > 0$
$E(X) = \lambda \qquad Var(X) = \lambda$

Hypergeometric distribution: $X \sim Hy(N, M, n)$
$P(x) = \dfrac{\dbinom{M}{x}\dbinom{N-M}{n-x}}{\dbinom{N}{n}}$, $x = \max(0, M - N + n), \ldots, \min(M, n)$, $n > 0$, $N \ge n$, $M \le N$
$E(X) = n\dfrac{M}{N} \qquad Var(X) = n\dfrac{M}{N}\left(1 - \dfrac{M}{N}\right)\dfrac{N-n}{N-1}$

Probability distributions of continuous random variables

Standard normal distribution: $Z \sim N(0, 1)$; for $X \sim N(\mu, \sigma^2)$:
$Z = \dfrac{X - \mu}{\sigma} \qquad z = \dfrac{x - \mu}{\sigma}$, $-\infty < z < \infty$
$E(Z) = 0 \qquad Var(Z) = 1$
$\Phi(z) = 1 - \Phi(-z) \qquad z_P = -z_{1-P}$

Normal distribution: $X \sim N(\mu, \sigma^2)$, $-\infty < x < \infty$, $-\infty < \mu < \infty$, $\sigma^2 > 0$
$E(X) = \mu \qquad Var(X) = \sigma^2$
$F(x) = \Phi(z) = \Phi\!\left(\dfrac{x - \mu}{\sigma}\right) \qquad x_P = \mu + \sigma z_P$
$P(x_1 \le X \le x_2) = P\!\left(\dfrac{x_1 - \mu}{\sigma} \le Z \le \dfrac{x_2 - \mu}{\sigma}\right) = P(z_1 \le Z \le z_2) = \Phi(z_2) - \Phi(z_1)$

Chi-square distribution: $G \sim \chi^2(df)$, $g \ge 0$

t-distribution (Student's distribution): $T \sim t(df)$, $-\infty < t < \infty$, $t_P(df) = -t_{1-P}(df)$

F-distribution (Fisher–Snedecor distribution): $F \sim F(df_1, df_2)$, $f \ge 0$, $f_P(df_1, df_2) = \dfrac{1}{f_{1-P}(df_2, df_1)}$

Statistical inference: point and interval parameter estimates

Calculated point population parameter estimates


Population mean (expected value)
Normal distribution in population, $X \sim N(\mu, \sigma^2)$: $\hat{\mu} = \bar{x}$
General distribution in population: $\hat{E}(X) = \bar{x}$

Population variance
Normal distribution in population, $X \sim N(\mu, \sigma^2)$: $\hat{\sigma}^2 = s^2$

Population proportion: $\hat{\pi} = p = \dfrac{n_j}{n}$

Count in population: $\hat{N}_\pi = Np$

Constructed (empirical) interval population parameter estimates

(confidence level $1 - \alpha$)

Population mean (expected value)

Normal distribution in population, population variance $\sigma^2$ unknown, two-sided confidence interval for parameter $\mu$:
$\left(\bar{x} - t_{1-\alpha/2} \dfrac{s}{\sqrt{n}},\ \bar{x} + t_{1-\alpha/2} \dfrac{s}{\sqrt{n}}\right)$, $T \sim t(n - 1)$

General distribution in population, $Var(X)$ unknown (but finite), random sample size assumed to be large enough, two-sided confidence interval for parameter $E(X)$:
$\left(\bar{x} - z_{1-\alpha/2} \dfrac{s}{\sqrt{n}},\ \bar{x} + z_{1-\alpha/2} \dfrac{s}{\sqrt{n}}\right)$

Population proportion
Random sample size assumed to be large enough, two-sided confidence interval for parameter $\pi$:
$\left(p - z_{1-\alpha/2} \sqrt{\dfrac{p(1-p)}{n}},\ p + z_{1-\alpha/2} \sqrt{\dfrac{p(1-p)}{n}}\right)$

Count in population
Random sample size assumed to be large enough, $N$ known, two-sided confidence interval for parameter $N_\pi$:
$N \cdot \left(p - z_{1-\alpha/2} \sqrt{\dfrac{p(1-p)}{n}},\ p + z_{1-\alpha/2} \sqrt{\dfrac{p(1-p)}{n}}\right)$
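The large-sample (z-based) intervals can be computed with the standard library's `statistics.NormalDist`; this sketch covers only the z case, since t quantiles are taken from Excel's T.INV in this course (names are ours):

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(xbar, s, n, conf=0.95):
    """Large-sample two-sided CI for E(X): x_bar +/- z_{1-a/2} * s / sqrt(n)."""
    z = NormalDist().inv_cdf((1 + conf) / 2)   # z_{1 - alpha/2}
    half = z * s / sqrt(n)
    return xbar - half, xbar + half

def prop_ci(p, n, conf=0.95):
    """Large-sample two-sided CI for pi: p +/- z_{1-a/2} * sqrt(p(1-p)/n)."""
    z = NormalDist().inv_cdf((1 + conf) / 2)
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half
```

Multiplying both endpoints of `prop_ci` by $N$ gives the interval for the count in the population.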
Statistical inference: statistical hypothesis tests

Statistical hypothesis tests of population parameters


Population mean (expected value); normal distribution in population, $\sigma^2$ unknown
Test statistic: $T = \dfrac{\bar{X} - \mu_0}{S}\sqrt{n}$; its value: $t = \dfrac{\bar{x} - \mu_0}{s}\sqrt{n}$; assuming $H_0$ is true: $T \sim t(n - 1)$
$H_0$: $\mu = \mu_0$
$H_1$: $\mu > \mu_0$, rejection region $W_\alpha = \{t;\ t \ge t_{1-\alpha}\}$
$H_1$: $\mu < \mu_0$, rejection region $W_\alpha = \{t;\ t \le -t_{1-\alpha}\}$
$H_1$: $\mu \ne \mu_0$, rejection region $W_\alpha = \{t;\ |t| \ge t_{1-\alpha/2}\}$

Population mean (expected value); general distribution in population, $Var(X)$ unknown (but finite), random sample size assumed to be large enough
Test statistic: $Z = \dfrac{\bar{X} - \mu_0}{S}\sqrt{n}$; its value: $z = \dfrac{\bar{x} - \mu_0}{s}\sqrt{n}$; assuming $H_0$ is true: $Z \approx N(0, 1)$
$H_0$: $E(X) = \mu_0$
$H_1$: $E(X) > \mu_0$, rejection region $W_\alpha = \{z;\ z \ge z_{1-\alpha}\}$
$H_1$: $E(X) < \mu_0$, rejection region $W_\alpha = \{z;\ z \le -z_{1-\alpha}\}$
$H_1$: $E(X) \ne \mu_0$, rejection region $W_\alpha = \{z;\ |z| \ge z_{1-\alpha/2}\}$

Population proportion; random sample size assumed to be large enough
Test statistic: $Z = \dfrac{P - \pi_0}{\sqrt{\pi_0(1 - \pi_0)/n}}$; its value: $z = \dfrac{p - \pi_0}{\sqrt{\pi_0(1 - \pi_0)/n}}$; assuming $H_0$ is true: $Z \approx N(0, 1)$
$H_0$: $\pi = \pi_0$
$H_1$: $\pi > \pi_0$, rejection region $W_\alpha = \{z;\ z \ge z_{1-\alpha}\}$
$H_1$: $\pi < \pi_0$, rejection region $W_\alpha = \{z;\ z \le -z_{1-\alpha}\}$
$H_1$: $\pi \ne \pi_0$, rejection region $W_\alpha = \{z;\ |z| \ge z_{1-\alpha/2}\}$

Statistical hypothesis test of distribution in population

Chi-square goodness-of-fit test (random sample size and $n\pi_{0,j}$ assumed to be large enough); expected frequencies $\hat{n}_j = n\pi_{0,j}$
$H_0$: $\pi_j = \pi_{0,j}$ ($j = 1, \ldots, k$); $H_1$: non $H_0$
Test statistic value: $g = \sum_{j=1}^{k} \dfrac{(n_j - \hat{n}_j)^2}{\hat{n}_j}$; assuming $H_0$ is true: $G \approx \chi^2(k - 1)$
Rejection region: $W_\alpha = \{g;\ g \ge \chi^2_{1-\alpha}\}$
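The large-sample z-test of the mean can be automated end to end with the standard library; this sketch (names are ours) covers only the z variant, because the small-sample t critical values come from Excel's T.INV in this course:

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(xbar, s, n, mu0, alpha=0.05, alternative="two-sided"):
    """Large-sample test of H0: E(X) = mu0.
    Returns the test statistic z and whether H0 is rejected."""
    z = (xbar - mu0) / s * sqrt(n)
    if alternative == "two-sided":
        crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{1 - alpha/2}
        reject = abs(z) >= crit
    elif alternative == "greater":
        crit = NormalDist().inv_cdf(1 - alpha)       # z_{1 - alpha}
        reject = z >= crit
    else:  # "less"
        crit = NormalDist().inv_cdf(1 - alpha)
        reject = z <= -crit
    return z, reject
```

For example, $\bar{x} = 10.5$, $s = 2$, $n = 100$, $\mu_0 = 10$ gives $z = 2.5$, which falls in the two-sided 5 % rejection region.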

Inferential analysis of dependence, inferential time series analysis

Contingency table ($r \times c$); all expected frequencies assumed to be large enough
$H_0$: the variables are independent; $H_1$: non $H_0$
Test statistic value: $g = \sum_{i=1}^{r} \sum_{j=1}^{c} \dfrac{(n_{ij} - \hat{n}_{ij})^2}{\hat{n}_{ij}}$; assuming $H_0$ is true: $G \approx \chi^2\big((r - 1)(c - 1)\big)$
Rejection region: $W_\alpha = \{g;\ g \ge \chi^2_{1-\alpha}\}$

One-way analysis of variance
$H_0$: $\mu_1 = \mu_2 = \ldots = \mu_k$; $H_1$: non $H_0$
Test statistic value: $f = \dfrac{ss_{TR}/(k - 1)}{ss_E/(n - k)}$; assuming $H_0$ is true: $F \sim F(k - 1, n - k)$
Rejection region: $W_\alpha = \{f;\ f \ge f_{1-\alpha}\}$

Correlation analysis
$H_0$: $\rho_{XY} = 0$; $H_1$: $\rho_{XY} \ne 0$
Test statistic value: $t = \dfrac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$; assuming $H_0$ is true: $T \sim t(n - 2)$
Rejection region: $W_\alpha = \{t;\ |t| \ge t_{1-\alpha/2}\}$

Regression line ($p = 2$), data-generating process: $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, $i = 1, 2, \ldots, n$
Hypothesis test of slope (t-test of slope), $s(b_1)$ only from Excel
$H_0$: $\beta_1 = 0$; $H_1$: $\beta_1 \ne 0$
Test statistic value: $t = \dfrac{b_1}{s(b_1)}$; assuming $H_0$ is true: $T \sim t(n - p)$
Rejection region: $W_\alpha = \{t;\ |t| \ge t_{1-\alpha/2}\}$

Linear trend ($p = 2$), data-generating process: $Y_t = \beta_0 + \beta_1 t + \varepsilon_t$, $t = 1, 2, \ldots, n$
Hypothesis test of slope (t-test of slope), $s(b_1)$ only from Excel
$H_0$: $\beta_1 = 0$; $H_1$: $\beta_1 \ne 0$
Test statistic value: $t = \dfrac{b_1}{s(b_1)}$; assuming $H_0$ is true: $T \sim t(n - p)$
Rejection region: $W_\alpha = \{t;\ |t| \ge t_{1-\alpha/2}\}$
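The test of independence can be sketched as a decision procedure; since chi-square quantiles are not in the Python standard library, the critical value is passed in by the caller (e.g. from Excel's CHISQ.INV). Names are ours:

```python
def chi2_independence(table, crit):
    """Test of independence in an r x c contingency table.
    Returns g, df = (r-1)(c-1), and whether H0 is rejected (g >= crit)."""
    r, c = len(table), len(table[0])
    rows = [sum(row) for row in table]
    cols = [sum(table[i][j] for i in range(r)) for j in range(c)]
    n = sum(rows)
    # g = sum over cells of (observed - expected)^2 / expected
    g = sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
            for i in range(r) for j in range(c))
    df = (r - 1) * (c - 1)
    return g, df, g >= crit
```

For `[[20, 10], [10, 20]]` with the 5 % critical value 3.841 for 1 degree of freedom, $g = 100/15 \approx 6.67$ and $H_0$ is rejected.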

Index number analysis
All totals $\sum$ run over the $c$ commodities, $i = 1, 2, \ldots, c$.

Value: $Q = pq \qquad I_Q = I_p \cdot I_q$

Individual indices and absolute differences:
$I_p = \dfrac{p_1}{p_0}$, $\Delta_p = p_1 - p_0 \qquad I_q = \dfrac{q_1}{q_0}$, $\Delta_q = q_1 - q_0 \qquad I_Q = \dfrac{Q_1}{Q_0}$, $\Delta_Q = Q_1 - Q_0$

Homogeneous commodities

Total quantity index: $I(\Sigma q) = \dfrac{\sum q_1}{\sum q_0} = \dfrac{\sum I_q\, q_0}{\sum q_0} = \dfrac{\sum q_1}{\sum \frac{q_1}{I_q}} \qquad \Delta(\Sigma q) = \sum q_1 - \sum q_0$

Total value index: $I(\Sigma Q) = \dfrac{\sum Q_1}{\sum Q_0} = \dfrac{\sum p_1 q_1}{\sum p_0 q_0} = \dfrac{\sum I_Q\, Q_0}{\sum Q_0} = \dfrac{\sum Q_1}{\sum \frac{Q_1}{I_Q}} \qquad \Delta(\Sigma Q) = \sum Q_1 - \sum Q_0$

Average price and its index: $\bar{p} = \dfrac{\sum pq}{\sum q} = \dfrac{\sum Q}{\sum q} \qquad I_{\bar{p}} = \dfrac{\bar{p}_1}{\bar{p}_0} = \dfrac{\sum p_1 q_1 / \sum q_1}{\sum p_0 q_0 / \sum q_0} \qquad \Delta_{\bar{p}} = \bar{p}_1 - \bar{p}_0$

Heterogeneous commodities

Laspeyres price index: $I_p(L) = \dfrac{\sum p_1 q_0}{\sum p_0 q_0} = \dfrac{\sum I_p\, p_0 q_0}{\sum p_0 q_0} = \dfrac{\sum I_p\, Q_0}{\sum Q_0}$

Paasche price index: $I_p(P) = \dfrac{\sum p_1 q_1}{\sum p_0 q_1} = \dfrac{\sum p_1 q_1}{\sum \frac{p_1 q_1}{I_p}} = \dfrac{\sum Q_1}{\sum \frac{Q_1}{I_p}}$

Fisher price index: $I_p(F) = \sqrt{I_p(L) \cdot I_p(P)}$

Laspeyres quantity index: $I_q(L) = \dfrac{\sum p_0 q_1}{\sum p_0 q_0} = \dfrac{\sum I_q\, p_0 q_0}{\sum p_0 q_0} = \dfrac{\sum I_q\, Q_0}{\sum Q_0}$

Paasche quantity index: $I_q(P) = \dfrac{\sum p_1 q_1}{\sum p_1 q_0} = \dfrac{\sum p_1 q_1}{\sum \frac{p_1 q_1}{I_q}} = \dfrac{\sum Q_1}{\sum \frac{Q_1}{I_q}}$

Fisher quantity index: $I_q(F) = \sqrt{I_q(L) \cdot I_q(P)}$
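The three aggregate price indices for heterogeneous commodities can be sketched directly from their definitions (illustrative; names are ours; `p0`, `q0`, `p1`, `q1` are parallel lists over the $c$ commodities):

```python
from math import sqrt

def price_indices(p0, q0, p1, q1):
    """Laspeyres (base-period weights), Paasche (current-period weights)
    and Fisher (geometric mean of the two) price indices."""
    laspeyres = (sum(a * b for a, b in zip(p1, q0))
                 / sum(a * b for a, b in zip(p0, q0)))   # sum p1*q0 / sum p0*q0
    paasche = (sum(a * b for a, b in zip(p1, q1))
               / sum(a * b for a, b in zip(p0, q1)))     # sum p1*q1 / sum p0*q1
    fisher = sqrt(laspeyres * paasche)
    return laspeyres, paasche, fisher
```

When every individual price exactly doubles, all three indices equal 2 regardless of the quantity weights.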

Selected Excel functions
What Excel function Lecture Symbol
Frequency of a qualitative variable's level COUNTIF() 1 $n_j$
Frequencies of quantitative variables' values FREQUENCY() 1 $n_j$
Sum (total) SUM() 1 $\Sigma$
Quantile PERCENTILE.INC() 1 $x_P$
Median MEDIAN() 1 $\tilde{x}$
Mode MODE.MULT() 2 $\hat{x}$
Simple arithmetic mean AVERAGE() 2 $\bar{x}$
Variance (from raw data set) VAR.P() 2 $s_x^2$
Standard deviation (from raw data set) STDEV.P() 2 $s_x$
Sample variance (from raw data set) VAR.S() 2 $s^2$
Sample standard deviation (from raw data set) STDEV.S() 2 $s$
Pearson's correlation coefficient CORREL() 4 $r$
Probability function for X with binomial distribution BINOM.DIST() 8 $P(x)$
Probability function for X with hypergeometric distribution HYPGEOM.DIST() 8 $P(x)$
Probability function for X with Poisson distribution POISSON.DIST() 8 $P(x)$
Cumulative distribution function for Z NORM.S.DIST() 8 $\Phi(z)$
Quantiles for Z NORM.S.INV() 8 $z_P$
Cumulative distribution function for X with normal distribution NORM.DIST() 8 $F(x)$
Quantiles for X with normal distribution NORM.INV() 8 $x_P$
Quantiles for T with t-distribution T.INV() 8 $t_P$
Quantiles for G with chi-squared distribution CHISQ.INV() 8 $\chi_P^2$
Quantiles for F with F-distribution F.INV() 8 $f_P$
p-value for the test of independence in the contingency table CHISQ.TEST() 11 p-value

