Stats formula sheet

Uploaded by

vunaschaap
STAT 201 Formula Sheet Winter 2023

Chapter 1

Sample mean
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Sample variance
$$s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

Sample standard deviation
$$s_x = \sqrt{s_x^2}$$

Inter-quartile range
$$\mathrm{IQR} = Q_3 - Q_1$$

1.5 IQR rule
$$Q_1 - 1.5\,\mathrm{IQR}, \qquad Q_3 + 1.5\,\mathrm{IQR}$$

Chapter 2

For any event $A$,
$$0 \le P(A) \le 1 \quad\text{and}\quad P(A^c) = 1 - P(A).$$

For an empty set $\emptyset$,
$$P(\emptyset) = 0.$$

For any two events $A$ and $B$,
$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$

If $A$ and $B$ are mutually exclusive, then
$$P(A \cap B) = 0.$$

For any two events $A$ and $B$,
$$P(B|A) = \frac{P(A \cap B)}{P(A)}.$$

If $A$ and $B$ are independent, then
$$P(B|A) = P(B) \quad\text{and}\quad P(A \cap B) = P(A)P(B).$$

The law of total probability
$$P(B) = P(A_1)P(B|A_1) + \cdots + P(A_k)P(B|A_k)$$
or a special case
$$P(B) = P(A)P(B|A) + P(A^c)P(B|A^c)$$

Bayes rule
$$P(A_i|B) = \frac{P(A_i)P(B|A_i)}{P(A_1)P(B|A_1) + \cdots + P(A_k)P(B|A_k)}$$
or a special case
$$P(A|B) = \frac{P(A)P(B|A)}{P(A)P(B|A) + P(A^c)P(B|A^c)}$$

De Morgan's Law
$$(A \cup B)^c = A^c \cap B^c \quad\text{and}\quad (A \cap B)^c = A^c \cup B^c$$

For a discrete r.v. $X$, its pmf is
$$p(x) = P(X = x).$$
$p(x)$ is non-negative and
$$\sum_{\text{all } x} p(x) = 1.$$
The mean and variance of $X$ are
$$\mu_X = \sum_{\text{all } x} x \cdot p(x)$$
$$\sigma_X^2 = \sum_{\text{all } x} (x - \mu_X)^2 \cdot p(x) = \Big[\sum_{\text{all } x} x^2 \cdot p(x)\Big] - \mu_X^2.$$

For a continuous r.v. $X$ with a pdf $f(x)$ and two real numbers $a$ and $b$ where $a < b$,
$$P(a < X < b) = \int_a^b f(x)\,dx.$$
$f(x)$ is non-negative and
$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$
The mean and variance of $X$ are
$$\mu_X = \int_{-\infty}^{\infty} x \cdot f(x)\,dx$$
$$\sigma_X^2 = \int_{-\infty}^{\infty} (x - \mu_X)^2 \cdot f(x)\,dx = \Big[\int_{-\infty}^{\infty} x^2 \cdot f(x)\,dx\Big] - \mu_X^2.$$
The median $x_m$ of $X$ can be solved with
$$\int_{-\infty}^{x_m} f(x)\,dx = \frac{1}{2}.$$
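The law of total probability and Bayes rule from Chapter 2 can be checked numerically. The sketch below uses made-up probabilities for a hypothetical screening test; the variable names (p_A, p_B_A, p_B_Ac) and all numbers are assumptions for illustration and do not come from the formula sheet.

```r
# Hypothetical inputs (assumed for illustration only)
p_A    <- 0.01   # P(A): prior probability of the condition
p_B_A  <- 0.95   # P(B|A): positive test given the condition
p_B_Ac <- 0.05   # P(B|A^c): positive test without the condition

# Law of total probability (special case):
# P(B) = P(A)P(B|A) + P(A^c)P(B|A^c)
p_B <- p_A * p_B_A + (1 - p_A) * p_B_Ac

# Bayes rule (special case): P(A|B) = P(A)P(B|A) / P(B)
p_A_B <- p_A * p_B_A / p_B

print(c(P_B = p_B, P_A_given_B = p_A_B))
```

With a small prior P(A), the posterior P(A|B) stays well below P(B|A), which is why the denominator from the law of total probability matters.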

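The discrete mean and variance formulas from Chapter 2 can be illustrated with a short R sketch. The fair six-sided die used here is an assumed example, not part of the sheet; it shows that the definition of the variance and the shortcut form give the same value.

```r
# Assumed example: pmf of a fair six-sided die
x <- 1:6
p <- rep(1 / 6, 6)

# A valid pmf is non-negative and sums to 1
stopifnot(all(p >= 0), isTRUE(all.equal(sum(p), 1)))

# Mean: sum of x * p(x)
mu <- sum(x * p)

# Variance by definition and by the shortcut formula
var_def      <- sum((x - mu)^2 * p)
var_shortcut <- sum(x^2 * p) - mu^2

print(c(mean = mu, var_def = var_def, var_shortcut = var_shortcut))
```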

Chapter 3

If $Y = X + c$, then
$$\sigma_Y = \sigma_X.$$

If $Y = cX$, then
$$\sigma_Y = |c|\,\sigma_X.$$

If $Y = c_1 X_1 + \cdots + c_n X_n$, where the $X_i$'s are independent measurements and each has uncertainty $\sigma_{X_i}$, then
$$\sigma_Y = \sqrt{c_1^2 \sigma_{X_1}^2 + \cdots + c_n^2 \sigma_{X_n}^2}.$$

If
$$\bar{X} = \frac{X_1 + \cdots + X_n}{n},$$
where each and every $X_i$ has the same uncertainty $\sigma$, then
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}.$$

If $Y = u(X)$, where $u(\cdot)$ is a nonlinear function, then
$$\sigma_Y \approx \left|\frac{du(X)}{dX}\right| \sigma_X.$$

Chapter 4

Bernoulli Distributions

If $X \sim \mathrm{Bernoulli}(p)$, then
$$p(x) = p^x (1-p)^{1-x} \quad\text{for } x = 0, 1$$
$$\mu_X = p \quad\text{and}\quad \sigma_X^2 = p(1-p).$$

Binomial Distributions

If $X \sim \mathrm{Bin}(n, p)$, then
$$p(x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x} \quad\text{for } x = 0, 1, \ldots, n$$
$$\mu_X = np \quad\text{and}\quad \sigma_X^2 = np(1-p).$$
P(X = x) = dbinom(x, n, p)
P(X ≤ x) = pbinom(x, n, p)

Poisson Distributions

If $X \sim \mathrm{Poisson}(\lambda)$, then
$$p(x) = \frac{e^{-\lambda} \lambda^x}{x!} \quad\text{for } x = 0, 1, \ldots$$
$$\mu_X = \lambda \quad\text{and}\quad \sigma_X^2 = \lambda.$$
P(X = x) = dpois(x, λ)
P(X ≤ x) = ppois(x, λ)

Normal Distributions

If $X \sim N(\mu, \sigma^2)$, then
$$\mu_X = \mu \quad\text{and}\quad \sigma_X^2 = \sigma^2.$$
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$$
P(X ≤ x) = pnorm(x, µ, σ)
If P(X ≤ x) = q, then x = qnorm(q, µ, σ)
The z-score is
$$z = \frac{x - \mu}{\sigma}$$

Sampling Distribution of $\bar{X}$

Let $X_1, \ldots, X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2$. Let $\bar{X}$ be the sample mean. Then
$$\mu_{\bar{X}} = \mu \quad\text{and}\quad \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}.$$
CLT: $\bar{X}$ follows a normal distribution approximately when $n$ is sufficiently large.
The z-score is
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

Sampling Distribution of $S_n$

Let $X_1, \ldots, X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2$. Let $S_n$ be the sample total. Then
$$\mu_{S_n} = n\mu \quad\text{and}\quad \sigma_{S_n}^2 = n\sigma^2.$$
CLT: $S_n$ follows a normal distribution approximately when $n$ is sufficiently large.
The z-score is
$$z = \frac{s_n - n\mu}{\sigma\sqrt{n}}$$
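The Chapter 4 distribution functions can be compared against the pmf formulas directly. The sketch below is a minimal check under assumed parameter values (n = 10, p = 0.3, λ = 2, µ = 100, σ = 15, and a sample mean of 104 from 36 observations); none of these numbers come from the sheet. Note that pnorm() and qnorm() take the standard deviation σ as their third argument, not the variance σ².

```r
# Binomial: pmf formula vs dbinom(), assumed n = 10, p = 0.3
n <- 10; p <- 0.3
binom_formula <- factorial(n) / (factorial(3) * factorial(n - 3)) *
  p^3 * (1 - p)^(n - 3)
binom_r   <- dbinom(3, n, p)          # P(X = 3)
binom_cdf <- pbinom(3, n, p)          # P(X <= 3)

# Poisson: pmf formula vs dpois(), assumed lambda = 2
lambda <- 2
pois_formula <- exp(-lambda) * lambda^4 / factorial(4)
pois_r <- dpois(4, lambda)            # P(X = 4)

# Normal: pnorm() and qnorm() are inverses, assumed mu = 100, sigma = 15
mu <- 100; sigma <- 15
q      <- pnorm(115, mu, sigma)       # P(X <= 115)
x_back <- qnorm(q, mu, sigma)         # recovers 115

# z-score of a sample mean (assumed xbar = 104, sample size 36)
n_obs <- 36
z <- (104 - mu) / (sigma / sqrt(n_obs))
upper_tail <- pnorm(z, lower.tail = FALSE)   # P(Z >= z) under the CLT

print(c(binom = binom_r, pois = pois_r, q = q, z = z, upper_tail = upper_tail))
```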


Chapter 5

Under proper conditions (that is, you will need to consider sample sizes, population distributions, etc.):

(1 − α)100% confidence interval for µ when σ is known:
$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}.$$

(1 − α)100% confidence interval for µ when σ is unknown:
$$\bar{x} \pm t_{n-1,\alpha/2} \frac{s}{\sqrt{n}}.$$

The multipliers can be found via R:
z_{α/2} = qnorm(α/2, lower.tail = FALSE),
t_{n−1,α/2} = qt(α/2, n − 1, lower.tail = FALSE).

The sample size required for a (1 − α)100% confidence interval for µ to have a precision of ±m is
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{m}\right)^2.$$

If two samples are independent, (1 − α)100% confidence interval for $\mu_X - \mu_Y$ when population variances are not equal:
$$(\bar{x} - \bar{y}) \pm t_{df,\alpha/2} \sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}},$$
where
$$df = \frac{\left(\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}\right)^2}{\frac{(s_X^2/n_X)^2}{n_X - 1} + \frac{(s_Y^2/n_Y)^2}{n_Y - 1}}.$$

If two samples are independent, (1 − α)100% confidence interval for $\mu_X - \mu_Y$ when population variances are equal:
$$(\bar{x} - \bar{y}) \pm t_{df,\alpha/2}\, s_p \sqrt{\frac{1}{n_X} + \frac{1}{n_Y}},$$
where $df = n_X + n_Y - 2$ and
$$s_p = \sqrt{\frac{(n_X - 1)s_X^2 + (n_Y - 1)s_Y^2}{n_X + n_Y - 2}}.$$

If sampled data are matched pairs, (1 − α)100% confidence interval for $\mu_X - \mu_Y$:
$$\bar{D} \pm t_{n_d - 1,\alpha/2} \frac{s_d}{\sqrt{n_d}}.$$

If a set of data has been given, you may use t.test() in R to find a CI. Please refer to the R documentation for more info.

Chapter 6

Conclusion 1:
Reject H0
Statistically significant
p-value ≤ α
(1 − α)100% CI doesn't contain µ0
|test statistic| ≥ critical point
Observed results likely not due to chance
Data not consistent with H0

Conclusion 2:
Fail to reject H0
Not statistically significant
p-value > α
(1 − α)100% CI contains µ0
|test statistic| < critical point
Observed results likely due to chance
Data consistent with H0

Under proper conditions (that is, you will need to consider sample sizes, population distributions, etc.):

The test statistic for testing $H_0: \mu = \mu_0$ when σ is known:
$$z_{ts} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}.$$

The test statistic for testing $H_0: \mu = \mu_0$ when σ is unknown:
$$t_{ts} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}.$$

If $H_1: \mu > \mu_0$, then
p-value = P(T ≥ t_ts) = pt(t_ts, df, lower.tail = FALSE).

If $H_1: \mu < \mu_0$, then
p-value = P(T ≤ t_ts) = pt(t_ts, df).


If $H_1: \mu \neq \mu_0$, then
p-value = P(|T| ≥ |t_ts|) = 2 * pt(|t_ts|, df, lower.tail = FALSE).

If H1 is one-sided, then the critical point is
qt(α, df, lower.tail = FALSE).

If H1 is two-sided, then the critical point is
qt(α/2, df, lower.tail = FALSE).

Type I error is falsely rejecting a true null hypothesis. Type II error is failing to reject a false null hypothesis when the alternative hypothesis is true.

To test $H_0: \mu_X - \mu_Y = 0$:

If two samples are independent with unequal population variances, then the test statistic is
$$t_{ts} = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}},$$
with
$$df = \frac{\left(\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}\right)^2}{\frac{(s_X^2/n_X)^2}{n_X - 1} + \frac{(s_Y^2/n_Y)^2}{n_Y - 1}}.$$

If two samples are independent with equal population variances, then the test statistic is
$$t_{ts} = \frac{\bar{x} - \bar{y}}{s_p \sqrt{\frac{1}{n_X} + \frac{1}{n_Y}}},$$
where $df = n_X + n_Y - 2$ and
$$s_p = \sqrt{\frac{(n_X - 1)s_X^2 + (n_Y - 1)s_Y^2}{n_X + n_Y - 2}}.$$

If sampled data are matched pairs, then the test statistic is
$$t_{ts} = \frac{\bar{D}}{s_d/\sqrt{n_d}}.$$

If a set of data has been given, you may use t.test() in R to perform a test. Please refer to the R documentation for more info.

To test $H_0: \sigma_X^2/\sigma_Y^2 = 1$, the test statistic is
$$F_{ts} = \frac{s_X^2}{s_Y^2}.$$

If $H_1: \sigma_X^2/\sigma_Y^2 > 1$, then
p-value = pf(F_ts, n_X − 1, n_Y − 1, lower.tail = FALSE).

If $H_1: \sigma_X^2/\sigma_Y^2 < 1$, then
p-value = pf(F_ts, n_X − 1, n_Y − 1).

If $H_1: \sigma_X^2/\sigma_Y^2 \neq 1$, then
p-value = 2 * pf(F_ts, n_X − 1, n_Y − 1, lower.tail = FALSE), if F_ts > 1;
p-value = 2 * pf(F_ts, n_X − 1, n_Y − 1), if F_ts < 1.

If you use the F distribution table to find the p-value, then you need to refer to the lecture notes method to form the test statistic.

If a set of data has been given, you may use var.test() in R to perform a test. Please refer to the R documentation for more info.

Chapter 9

You will always need to check proper conditions so that the following formulae can be used.

One way ANOVA model is
$$X_{ij} = \mu_i + \epsilon_{ij}$$
where $i = 1, \ldots, I$ for levels and $j = 1, \ldots, J$ for replicates.

To test $H_0: \mu_1 = \cdots = \mu_I$, use the one way ANOVA table as follows:

[One way ANOVA table]

where
$$SSTr = \sum_{i=1}^{I} J_i (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2$$


$$SSE = \sum_{i=1}^{I} (J_i - 1) s_i^2$$
$$SSTotal = SSTr + SSE$$

The critical value for one way ANOVA F test is
qf(α, I − 1, N − I, lower.tail = FALSE).

If the factor/treatment effect is significant, compute the CI for each $\mu_i$ as
$$\bar{x}_{i\cdot} \pm t_{N-I,\alpha/2} \sqrt{\frac{MSE}{J_i}}.$$

Two way ANOVA model is
$$X_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$$
where $i = 1, \ldots, I$ for levels of factor A, $j = 1, \ldots, J$ for levels of factor B and $k = 1, \ldots, K$ for replicates.

To test interaction effects and main effects, use the two way ANOVA table as follows:

[Two way ANOVA table]

The estimated effects are
$$\hat{\alpha}_i = \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot}$$
$$\hat{\beta}_j = \bar{x}_{\cdot j\cdot} - \bar{x}_{\cdot\cdot\cdot}$$
$$\hat{\gamma}_{ij} = \bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x}_{\cdot\cdot\cdot}$$

The critical value for a two way ANOVA F test is
qf(α, df_numerator, df_denominator, lower.tail = FALSE).

If the interaction effect is not significant but a factor main effect is significant, compute the CI for the factor means of the significant factor(s) as
$$\bar{x}_{i\cdot\cdot} \pm t_{IJ(K-1),\alpha/2} \sqrt{\frac{MSE}{JK}} \quad\text{and/or}\quad \bar{x}_{\cdot j\cdot} \pm t_{IJ(K-1),\alpha/2} \sqrt{\frac{MSE}{IK}}.$$

Block design and $2^3$ factorial experiments use similar techniques as shown in one-way or two-way factor analysis. Please study these two sections for more information.

If data are given, you may use lm(), anova() and confint(), etc. in R to perform ANOVA tests. Please refer to the R documentation for more info.

Chapter 7

You will always need to check proper conditions so that the following formulae can be used.

Correlation coefficient is
$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_X}\right)\left(\frac{y_i - \bar{y}}{s_Y}\right) = \texttt{cor(x, y)}.$$

The fitted regression line is
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x,$$
where
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = r\,\frac{s_y}{s_x}$$
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$$

The coefficient of determination is
$$r^2 = \frac{SSR}{SSTotal} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2 - \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

The residual is
$$e_i = y_i - \hat{y}_i.$$

The standard deviation of errors is
$$s = \sqrt{\frac{(1 - r^2) \sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 2}}$$

The standard deviation of $\hat{\beta}_0$ is
$$s_{\hat{\beta}_0} = s \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$


The standard deviation of $\hat{\beta}_1$ is
$$s_{\hat{\beta}_1} = \frac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$

Inference about $\beta_1$ can be obtained through
$$\hat{\beta}_1 \pm t_{n-2,\alpha/2}\, s_{\hat{\beta}_1}$$
$$t_{ts} = \frac{\hat{\beta}_1 - \beta_1^{claim}}{s_{\hat{\beta}_1}}$$

Inference about $\beta_0$ can be obtained through
$$\hat{\beta}_0 \pm t_{n-2,\alpha/2}\, s_{\hat{\beta}_0}$$
$$t_{ts} = \frac{\hat{\beta}_0 - \beta_0^{claim}}{s_{\hat{\beta}_0}}$$

Inference about the mean response y for a given x is
$$\hat{y} \pm t_{n-2,\alpha/2}\, s_{\hat{y}}$$
$$t_{ts} = \frac{\hat{y} - \mu_0}{s_{\hat{y}}}$$
where
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$
$$s_{\hat{y}} = s \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$
and $\mu_0$ is the claimed mean response value for the given x value.

Prediction interval for a future y at a given x is
$$\hat{y} \pm t_{n-2,\alpha/2}\, s_{pred}$$
where
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$
$$s_{pred} = s \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}.$$

R functions

x and y refer to generic variables.
You will need to provide proper values at __.

Descriptive Statistics

    mean(x)        sd(x)
    var(x)         median(x)
    min(x)         max(x)
    quantile(x)    summary(x)
    hist(x)        boxplot(x)

Test of one or two Means

    t.test(x, alternative=__,
           mu=__, conf.level=__)
    t.test(x, y, alternative=__,
           mu=__, paired=__, conf.level=__)
    t.test(Resp~Factor, data, alternative=__,
           mu=__, paired=__, conf.level=__)
    pt(q, df, lower.tail=__)
    qt(p, df, lower.tail=__)

Test of Equality of Variance

    var.test(x, y, alternative=__,
             conf.level=__)
    pf(q, df1, df2, lower.tail=__)
    qf(p, df1, df2, lower.tail=__)

ANOVA Models

You may need to use the factor() function in some cases.

    tapply(Resp, list(FactorA), mean)
    fit<-lm(Resp~FactorA, data)
    anova(fit)
    fit1<-lm(Resp~FactorA-1, data)
    confint(fit1)

    tapply(Resp, list(FactorA), mean)
    tapply(Resp, list(FactorB), mean)
    tapply(Resp, list(FactorA,FactorB), mean)
    fit<-lm(Resp~FactorA*FactorB, data)
    anova(fit)
    fit<-lm(Resp~FactorA+FactorB, data)
    anova(fit)
    pf(q, df1, df2, lower.tail=FALSE)
    qf(p, df1, df2, lower.tail=FALSE)

Simple Linear Regression Models

    cor(x,y)
    model<-lm(Resp~Expl, data)
    summary(model)
    anova(model)
    Expl<-data.frame(Expl=new_value)
    predict(model, newdata=Expl, interval=__)
    confint(model, level=__)

If functions that you would like to use aren't provided in this list, please feel free to search for them in the lecture notes or R documentation.
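As a hedged illustration of how the t.test() template relates to the Chapter 5 and Chapter 6 formulas, the sketch below computes a one-sample confidence interval, test statistic and two-sided p-value by hand and compares them with t.test(). The data vector x, the null value mu0 and the level alpha are made-up values, not taken from the sheet.

```r
# Hypothetical sample and null value (assumed for illustration)
x     <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5, 4.7)
mu0   <- 5
alpha <- 0.05

n    <- length(x)
xbar <- mean(x)
s    <- sd(x)

# CI for mu when sigma is unknown: xbar +/- t_{n-1, alpha/2} * s / sqrt(n)
t_mult     <- qt(alpha / 2, n - 1, lower.tail = FALSE)
ci_formula <- xbar + c(-1, 1) * t_mult * s / sqrt(n)

# Test statistic and two-sided p-value for H0: mu = mu0
t_ts        <- (xbar - mu0) / (s / sqrt(n))
p_two_sided <- 2 * pt(abs(t_ts), n - 1, lower.tail = FALSE)

# The same quantities from t.test()
res <- t.test(x, alternative = "two.sided", mu = mu0, conf.level = 1 - alpha)

print(ci_formula)
print(res$conf.int)
print(c(t_ts = t_ts, p_value = p_two_sided))
```

For a one-sided alternative, alternative = "greater" or "less" would be passed instead, matching the pt() expressions given for H1: µ > µ0 and H1: µ < µ0.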

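The simple linear regression formulas can be checked against lm() in the same spirit. The sketch below uses a small invented data frame (the values in dat and the new point x0 are assumptions for illustration) and the Resp/Expl naming from the R function list; it compares the hand-computed slope, intercept, error standard deviation and prediction interval with R's output.

```r
# Hypothetical data (assumed for illustration)
dat <- data.frame(Expl = c(1, 2, 3, 4, 5, 6),
                  Resp = c(2.1, 3.9, 6.2, 7.8, 10.1, 11.8))
x <- dat$Expl; y <- dat$Resp; n <- length(x)
sxx <- sum((x - mean(x))^2)

# Estimates from the formulas
r    <- cor(x, y)
b1   <- sum((x - mean(x)) * (y - mean(y))) / sxx   # slope
b0   <- mean(y) - b1 * mean(x)                     # intercept
s    <- sqrt((1 - r^2) * sum((y - mean(y))^2) / (n - 2))
s_b1 <- s / sqrt(sxx)

# 95% prediction interval for a future y at x0
x0     <- 3.5
y0_hat <- b0 + b1 * x0
s_pred <- s * sqrt(1 + 1 / n + (x0 - mean(x))^2 / sxx)
pi_formula <- y0_hat + c(-1, 1) * qt(0.025, n - 2, lower.tail = FALSE) * s_pred

# The same quantities from lm() and predict()
model <- lm(Resp ~ Expl, data = dat)
pi_r  <- predict(model, newdata = data.frame(Expl = x0),
                 interval = "prediction", level = 0.95)

print(coef(model))
print(c(b0 = b0, b1 = b1, s = s, s_b1 = s_b1))
print(pi_r)
```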