
Statistics Formula Sheet: Summarising Data

The document provides formulas and procedures for summarizing and analyzing statistical data, including calculating measures of central tendency and dispersion, probability distributions, hypothesis testing, and other statistical techniques. Key concepts covered include sample mean and variance, probability rules, parameter estimation, confidence intervals, hypothesis testing for means, proportions, and correlations, and goodness of fit tests.

Uploaded by

cinnamonalbee
Copyright
© © All Rights Reserved

Statistics formula sheet

Summarising data
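Sample mean:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$
Sample variance:
$$s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 = \frac{1}{n-1}\left\{\sum_{i=1}^{n}x_i^2 - n\bar{x}^2\right\}.$$
Sample covariance:
$$g = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) = \frac{1}{n-1}\left\{\sum_{i=1}^{n}x_iy_i - n\bar{x}\bar{y}\right\}.$$
Sample correlation:
$$r = \frac{g}{s_x s_y}.$$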
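As an illustration, the four summaries can be computed directly from the shortcut forms of their definitions. This is a minimal Python sketch; the function names and the small data set are invented for the example.

```python
import math

def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_cov(xs, ys):
    # g = (sum of x_i * y_i - n * xbar * ybar) / (n - 1)
    n = len(xs)
    xbar, ybar = sample_mean(xs), sample_mean(ys)
    return (sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar) / (n - 1)

def sample_var(xs):
    # The variance is the covariance of a sample with itself.
    return sample_cov(xs, xs)

def sample_corr(xs, ys):
    return sample_cov(xs, ys) / math.sqrt(sample_var(xs) * sample_var(ys))

# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 9.0]
print(sample_mean(xs), sample_var(xs), sample_cov(xs, ys), sample_corr(xs, ys))
```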
Probability
Addition law:
$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$
Multiplication law:
$$P(A \cap B) = P(A)P(B|A) = P(B)P(A|B).$$
Partition law: For a partition $B_1, B_2, \ldots, B_k$
$$P(A) = \sum_{i=1}^{k} P(A \cap B_i) = \sum_{i=1}^{k} P(A|B_i)P(B_i).$$
Bayes formula:
$$P(B_i|A) = \frac{P(A|B_i)P(B_i)}{P(A)} = \frac{P(A|B_i)P(B_i)}{\sum_{i=1}^{k} P(A|B_i)P(B_i)}.$$
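The partition law and Bayes formula combine naturally in a short calculation. The sketch below assumes three events forming a partition; the prior and conditional probabilities are invented numbers and the function names are hypothetical.

```python
def total_probability(priors, likelihoods):
    # Partition law: P(A) = sum over i of P(A|B_i) * P(B_i)
    return sum(p * l for p, l in zip(priors, likelihoods))

def bayes_posteriors(priors, likelihoods):
    # Bayes formula: P(B_i|A) = P(A|B_i) * P(B_i) / P(A)
    p_a = total_probability(priors, likelihoods)
    return [p * l / p_a for p, l in zip(priors, likelihoods)]

# Invented example: P(B_i) and P(A|B_i) for a partition B_1, B_2, B_3.
priors = [0.5, 0.3, 0.2]
likelihoods = [0.01, 0.02, 0.05]

p_a = total_probability(priors, likelihoods)
posteriors = bayes_posteriors(priors, likelihoods)
print(p_a, posteriors)
```

The posterior probabilities always sum to one, which is a convenient check on hand calculations.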
Discrete distributions
Mean value:
$$E(X) = \mu = \sum_{x_i \in S} x_i\, p(x_i).$$
Variance:
$$\mathrm{Var}(X) = \sum_{x_i \in S} (x_i-\mu)^2 p(x_i) = \sum_{x_i \in S} x_i^2\, p(x_i) - \mu^2.$$
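For a concrete case, the mean and variance of a fair six-sided die follow directly from these sums. This is a small sketch in which the probability function is stored as a dictionary; that representation and the function names are choices made for the example.

```python
def discrete_mean(pmf):
    # E(X) = sum of x_i * p(x_i) over the sample space
    return sum(x * p for x, p in pmf.items())

def discrete_var(pmf):
    # Var(X) = sum of x_i^2 * p(x_i) - mu^2
    mu = discrete_mean(pmf)
    return sum(x * x * p for x, p in pmf.items()) - mu ** 2

# Fair die: each face 1..6 has probability 1/6.
die = {x: 1 / 6 for x in range(1, 7)}
print(discrete_mean(die), discrete_var(die))
```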
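The binomial distribution:
$$p(x) = \binom{n}{x}\theta^x(1-\theta)^{n-x} \quad \text{for } x = 0, 1, \ldots, n.$$
This has mean $n\theta$ and variance $n\theta(1-\theta)$.
The Poisson distribution:
$$p(x) = \frac{\lambda^x \exp(-\lambda)}{x!} \quad \text{for } x = 0, 1, 2, \ldots.$$
This has mean $\lambda$ and variance $\lambda$.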
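The stated binomial mean and variance can be checked numerically by summing over the probability function. The sketch below uses only the Python standard library; the parameter names `theta` and `lam` and the chosen values are assumptions for the example.

```python
import math

def binomial_pmf(x, n, theta):
    return math.comb(n, x) * theta ** x * (1 - theta) ** (n - x)

def poisson_pmf(x, lam):
    return lam ** x * math.exp(-lam) / math.factorial(x)

# Check the binomial mean and variance for n = 10, theta = 0.3.
n, theta = 10, 0.3
probs = [binomial_pmf(x, n, theta) for x in range(n + 1)]
mean = sum(x * p for x, p in enumerate(probs))
var = sum(x * x * p for x, p in enumerate(probs)) - mean ** 2
print(mean, n * theta)                 # both close to 3
print(var, n * theta * (1 - theta))    # both close to 2.1

# The Poisson probabilities sum to 1 (the tail beyond 50 terms is negligible here).
poisson_total = sum(poisson_pmf(x, 2.0) for x in range(50))
```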
Continuous distributions
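Distribution function:
$$F(y) = P(X \le y) = \int_{-\infty}^{y} f(x)\,dx.$$
Density function:
$$f(x) = \frac{d}{dx}F(x).$$
Evaluating probabilities:
$$P(a < X \le b) = \int_{a}^{b} f(x)\,dx = F(b) - F(a).$$
Expected value:
$$E(X) = \mu = \int_{-\infty}^{\infty} x f(x)\,dx.$$
Variance:
$$\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2.$$
Hazard function:
$$h(t) = \frac{f(t)}{1 - F(t)}.$$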
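Normal density with mean $\mu$ and variance $\sigma^2$:
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right\} \quad \text{for } x \in (-\infty, \infty).$$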
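To see the density and the rule for evaluating probabilities working together, one can integrate the normal density numerically. The sketch below uses a simple midpoint rule, a choice made here for illustration; the function names and parameter values are assumptions.

```python
import math

def normal_pdf(x, mu, sigma2):
    # Normal density with mean mu and variance sigma2
    return math.exp(-0.5 * (x - mu) ** 2 / sigma2) / math.sqrt(2 * math.pi * sigma2)

def midpoint_integral(f, a, b, steps=20000):
    # Approximates the integral of f over [a, b] by the midpoint rule.
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

mu, sigma2 = 1.0, 4.0  # standard deviation is 2

# The total probability should be very close to 1.
total = midpoint_integral(lambda x: normal_pdf(x, mu, sigma2), mu - 20, mu + 20)

# P(mu - sigma < X <= mu + sigma), roughly 0.68 for any normal distribution.
within_one_sd = midpoint_integral(lambda x: normal_pdf(x, mu, sigma2), mu - 2, mu + 2)
print(total, within_one_sd)
```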
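Weibull density:
$$f(t) = \lambda\kappa t^{\kappa-1}\exp(-\lambda t^{\kappa}) \quad \text{for } t \ge 0.$$
Exponential density:
$$f(t) = \lambda\exp(-\lambda t) \quad \text{for } t \ge 0.$$
This has mean $\lambda^{-1}$ and variance $\lambda^{-2}$.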
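The hazard function is easy to explore for these two densities. The sketch below assumes the parameterisation $f(t) = \lambda\kappa t^{\kappa-1}\exp(-\lambda t^{\kappa})$; the distribution function $F(t) = 1 - \exp(-\lambda t^{\kappa})$ used in the code is obtained by integrating that density and is not stated on the sheet. Names such as `lam` and `kappa` are choices made for the example.

```python
import math

def weibull_pdf(t, lam, kappa):
    return lam * kappa * t ** (kappa - 1) * math.exp(-lam * t ** kappa)

def weibull_cdf(t, lam, kappa):
    # Derived by integrating the density from 0 to t.
    return 1 - math.exp(-lam * t ** kappa)

def hazard(t, lam, kappa):
    # h(t) = f(t) / (1 - F(t))
    return weibull_pdf(t, lam, kappa) / (1 - weibull_cdf(t, lam, kappa))

# kappa = 1 gives the exponential density, whose hazard is constant at lam.
print(hazard(0.7, 2.0, 1.0), hazard(1.5, 2.0, 1.0))

# kappa > 1 gives a hazard that increases with t.
print(hazard(1.0, 0.5, 3.0), hazard(2.0, 0.5, 3.0))
```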
Test for population mean
Data: Single sample of measurements $x_1, \ldots, x_n$.
Hypothesis: $H: \mu = \mu_0$.
Method:
Calculate $\bar{x}$, $s^2$, and $t = |\bar{x} - \mu_0|\sqrt{n}/s$.
Obtain critical value from t-tables, df = $n - 1$.
Reject $H$ at the 100p% level of significance if $|t| > c$, where $c$ is the tabulated value corresponding to column p.
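The method can be sketched in a few lines of Python. The data are invented, the function name is hypothetical, and the critical value is typed in from a t-table (two-sided 5% point with 3 degrees of freedom) rather than computed.

```python
import math
import statistics

def one_sample_t(xs, mu0):
    # Returns the statistic t = |xbar - mu0| * sqrt(n) / s and its degrees of freedom.
    n = len(xs)
    xbar = statistics.mean(xs)
    s = statistics.stdev(xs)  # sample standard deviation, divisor n - 1
    return abs(xbar - mu0) * math.sqrt(n) / s, n - 1

xs = [2.0, 4.0, 4.0, 6.0]   # made-up measurements
t, df = one_sample_t(xs, mu0=3.0)

c = 3.182                    # tabulated value for df = 3, two-sided 5% level
reject = t > c
print(t, df, reject)
```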
Paired sample t-test
Data: Single sample of $n$ measurements $x_1, \ldots, x_n$ which are the pairwise differences between the two original sets of measurements.
Hypothesis: $H: \mu = 0$.
Method:
Calculate $\bar{x}$, $s^2$ and $t = \bar{x}\sqrt{n}/s$.
Obtain critical value from t-tables, df = $n - 1$.
Reject $H$ at the 100p% level of significance if $|t| > c$, where $c$ is the tabulated value corresponding to column p.
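In practice the first step is to form the differences. A minimal sketch follows, with invented before and after measurements and a critical value copied from a t-table (two-sided 5% point with 3 degrees of freedom).

```python
import math
import statistics

before = [10.0, 12.0, 9.0, 11.0]   # made-up paired measurements
after = [12.0, 15.0, 10.0, 14.0]

# Work with the pairwise differences as a single sample.
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
t = statistics.mean(diffs) * math.sqrt(n) / statistics.stdev(diffs)

c = 3.182                           # tabulated value for df = n - 1 = 3
reject = abs(t) > c
print(diffs, t, reject)
```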
Two sample t-test
Data: Two separate samples of measurements $x_1, \ldots, x_n$ and $y_1, \ldots, y_m$.
Hypothesis: $H: \mu_x = \mu_y$.
Method:
Calculate $\bar{x}$, $s_x^2$, $\bar{y}$, and $s_y^2$.
Calculate
$$s^2 = \left\{(n-1)s_x^2 + (m-1)s_y^2\right\}/(n+m-2).$$
Calculate
$$t = \frac{\bar{x} - \bar{y}}{\sqrt{s^2\left(\frac{1}{n} + \frac{1}{m}\right)}}.$$
Obtain critical value from t-tables, df = $n + m - 2$.
Reject $H$ at the 100p% level of significance if $|t| > c$, where $c$ is the tabulated value corresponding to column p.
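The pooled variance and the test statistic can be computed as in the sketch below. The two samples are invented, the function name is hypothetical, and the critical value is read from a t-table (two-sided 5% point with 6 degrees of freedom).

```python
import math
import statistics

def two_sample_t(xs, ys):
    n, m = len(xs), len(ys)
    # Pooled variance s^2 = {(n-1) s_x^2 + (m-1) s_y^2} / (n + m - 2)
    pooled = ((n - 1) * statistics.variance(xs)
              + (m - 1) * statistics.variance(ys)) / (n + m - 2)
    t = (statistics.mean(xs) - statistics.mean(ys)) / math.sqrt(pooled * (1 / n + 1 / m))
    return t, pooled, n + m - 2

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up samples
ys = [2.0, 4.0, 6.0]
t, pooled, df = two_sample_t(xs, ys)

c = 2.447                         # tabulated value for df = 6
reject = abs(t) > c
print(t, pooled, df, reject)
```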
CI for population mean
Data: Sample of measurements $x_1, \ldots, x_n$.
Method:
Calculate $\bar{x}$, $s_x^2$.
Look in t-tables, df = $n - 1$, column p. Let the tabulated value be $c$ say.
100(1 − p)% confidence interval for $\mu$ is $\bar{x} \pm c\,s_x/\sqrt{n}$.
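A short sketch of the interval calculation follows. The data are invented, the function name is hypothetical, and `c` is the value read from a t-table (here the 95% interval with 3 degrees of freedom).

```python
import math
import statistics

def mean_ci(xs, c):
    # Interval xbar +/- c * s_x / sqrt(n), with c taken from t-tables.
    n = len(xs)
    xbar = statistics.mean(xs)
    half_width = c * statistics.stdev(xs) / math.sqrt(n)
    return xbar - half_width, xbar + half_width

xs = [2.0, 4.0, 4.0, 6.0]     # made-up measurements
lo, hi = mean_ci(xs, c=3.182)  # df = 3, 95% interval
print(lo, hi)
```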
CI for difference in population means
Data: Separate samples $x_1, \ldots, x_n$ and $y_1, \ldots, y_m$.
Method:
Calculate $\bar{x}$, $s_x^2$, $\bar{y}$, $s_y^2$.
Calculate
$$s^2 = \left\{(n-1)s_x^2 + (m-1)s_y^2\right\}/(n+m-2).$$
Look in t-tables, df = $n + m - 2$, column p. Let the tabulated value be $c$ say.
100(1 − p)% confidence interval for the difference in population means, i.e. $\mu_x - \mu_y$, is
$$(\bar{x} - \bar{y}) \pm c\left\{\sqrt{s^2\left(\frac{1}{n} + \frac{1}{m}\right)}\right\}.$$
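The same pooled variance feeds the interval for the difference. The sketch below uses invented samples and a hypothetical function name; `c` is typed in from a t-table (95% interval with 6 degrees of freedom).

```python
import math
import statistics

def mean_difference_ci(xs, ys, c):
    n, m = len(xs), len(ys)
    pooled = ((n - 1) * statistics.variance(xs)
              + (m - 1) * statistics.variance(ys)) / (n + m - 2)
    diff = statistics.mean(xs) - statistics.mean(ys)
    half_width = c * math.sqrt(pooled * (1 / n + 1 / m))
    return diff - half_width, diff + half_width

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up samples
ys = [2.0, 4.0, 6.0]
lo, hi = mean_difference_ci(xs, ys, c=2.447)  # df = 6, 95% interval
print(lo, hi)
```

An interval that contains zero, as this one does, corresponds to not rejecting equality of the means at the matching significance level.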
Regression and correlation
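The linear regression model:
$$y_i = \alpha + \beta x_i + z_i.$$
Least squares estimates of $\alpha$ and $\beta$:
$$\hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{(n-1)s_x^2}, \quad \text{and} \quad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}.$$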
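The estimates are straightforward to compute from the sums in the formula. This is a minimal sketch with invented data and a hypothetical function name.

```python
def least_squares(xs, ys):
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar
    sxx = sum(x * x for x in xs) - n * xbar ** 2  # equals (n - 1) * s_x^2
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    return alpha, beta

xs = [1.0, 2.0, 3.0, 4.0]   # made-up data
ys = [2.0, 4.0, 5.0, 9.0]
alpha, beta = least_squares(xs, ys)
print(alpha, beta)
```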
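Confidence interval for $\beta$
Calculate $\hat{\beta}$ as given previously.
Calculate $s_{\varepsilon}^2 = s_y^2 - \hat{\beta}^2 s_x^2$.
Calculate
$$\mathrm{SE}(\hat{\beta}) = \sqrt{\frac{s_{\varepsilon}^2}{(n-2)s_x^2}}.$$
Look in t-tables, df = $n - 2$, column p. Let the tabulated value be $c$.
100(1 − p)% confidence interval for $\beta$ is $\hat{\beta} \pm c\,\mathrm{SE}(\hat{\beta})$.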
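The steps can be followed in code as below. The data are invented, the function name is hypothetical, and `c` is copied from a t-table (95% interval with 2 degrees of freedom).

```python
import math
import statistics

def slope_ci(xs, ys, c):
    n = len(xs)
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    sx2, sy2 = statistics.variance(xs), statistics.variance(ys)
    # Least squares slope
    beta = (sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar) / ((n - 1) * sx2)
    # Residual variance term and standard error of the slope
    s_eps2 = sy2 - beta ** 2 * sx2
    se = math.sqrt(s_eps2 / ((n - 2) * sx2))
    return beta - c * se, beta + c * se, se

xs = [1.0, 2.0, 3.0, 4.0]   # made-up data
ys = [2.0, 4.0, 5.0, 9.0]
lo, hi, se = slope_ci(xs, ys, c=4.303)  # df = 2, 95% interval
print(lo, hi, se)
```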
Test for $\rho = 0$
Hypothesis: $H: \rho = 0$.
Calculate
$$t = r\left(\frac{n-2}{1-r^2}\right)^{1/2}.$$
Obtain critical value from t-tables, df = $n - 2$.
Reject $H$ at 100p% level of significance if $|t| > c$, where $c$ is the tabulated value corresponding to column p.
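A quick sketch of the calculation follows. The sample correlation and sample size are invented, the function name is hypothetical, and the critical value is read from a t-table (two-sided 5% point with 25 degrees of freedom).

```python
import math

def correlation_t(r, n):
    # t = r * sqrt((n - 2) / (1 - r^2)), compared with t-tables on n - 2 df
    return r * math.sqrt((n - 2) / (1 - r ** 2))

r, n = 0.6, 27               # made-up sample correlation and sample size
t = correlation_t(r, n)

c = 2.060                    # tabulated value for df = 25
reject = abs(t) > c
print(t, reject)
```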
Approximate CI for proportion
$$p \pm 1.96\sqrt{\frac{p(1-p)}{n-1}}$$
where $p$ is the observed proportion in the sample.
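As a small illustration, the interval can be computed as follows. The observed proportion and sample size are invented and the function name is hypothetical; the divisor $n - 1$ follows the formula on this sheet.

```python
import math

def proportion_ci(p, n, z=1.96):
    # Approximate 95% interval: p +/- 1.96 * sqrt(p * (1 - p) / (n - 1))
    half_width = z * math.sqrt(p * (1 - p) / (n - 1))
    return p - half_width, p + half_width

lo, hi = proportion_ci(p=0.4, n=101)   # made-up sample
print(lo, hi)
```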
Test for a proportion
Hypothesis: $H: \theta = \theta_0$.
Test statistic
$$z = \frac{p - \theta_0}{\sqrt{\frac{\theta_0(1-\theta_0)}{n}}}.$$
Obtain critical value from normal tables.
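The sketch below computes the statistic for invented values. Instead of normal tables it uses `statistics.NormalDist` from the Python standard library to obtain the two-sided 5% critical value; that substitution, and the function name, are choices made for the example.

```python
import math
from statistics import NormalDist

def proportion_z(p, theta0, n):
    # z = (p - theta0) / sqrt(theta0 * (1 - theta0) / n)
    return (p - theta0) / math.sqrt(theta0 * (1 - theta0) / n)

z = proportion_z(p=0.6, theta0=0.5, n=100)   # made-up sample

c = NormalDist().inv_cdf(0.975)  # about 1.96, replaces the table lookup
reject = abs(z) > c
print(z, c, reject)
```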
Comparison of proportions
Hypothesis: $H: \theta_1 = \theta_2$.
Calculate
$$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}.$$
Calculate
$$z = \frac{p_1 - p_2}{\sqrt{p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}.$$
Obtain appropriate critical value from normal tables.
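The two steps can be sketched as below. The sample sizes and observed proportions are invented, the function name is hypothetical, and 1.96 is the usual two-sided 5% point from normal tables.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    # Pooled proportion under H, then the z statistic.
    p = (n1 * p1 + n2 * p2) / (n1 + n2)
    z = (p1 - p2) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return z, p

z, pooled = two_proportion_z(p1=0.5, n1=100, p2=0.4, n2=100)  # made-up samples
reject = abs(z) > 1.96
print(z, pooled, reject)
```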
Goodness of fit
Test statistic
$$\chi^2 = \sum_{i=1}^{m} \frac{(o_i - e_i)^2}{e_i}$$
where $m$ is the number of categories.
Hypothesis $H: F = F_0$.
Calculate the expected class frequencies under $F_0$.
Calculate the $\chi^2$ test statistic given above.
Determine the degrees of freedom, $\nu$ say.
Obtain critical value from $\chi^2$ tables, df = $\nu$.
Reject $H: F = F_0$ at the 100p% level of significance if $\chi^2 > c$ where $c$ is the tabulated critical value.
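As an example, consider testing whether a die is fair from 120 invented rolls. The sketch assumes that no parameters are estimated from the data, so the degrees of freedom are taken as $m - 1$; that choice and the tabulated 5% critical value are assumptions of the example, not statements from the sheet.

```python
def chi_square_statistic(observed, expected):
    # Sum over categories of (o_i - e_i)^2 / e_i
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [18, 22, 20, 25, 15, 20]   # made-up counts from 120 rolls
expected = [20.0] * 6                  # expected counts under a fair die

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1                 # assumes no estimated parameters

c = 11.07                              # tabulated 5% value for df = 5
reject = chi2 > c
print(chi2, df, reject)
```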
