Sta1000f Notes
Is
February
....................
2023
probability
Random experiment: a procedure whose outcome (result) in a particular performance (trial) cannot be predetermined
Fair die: $\Pr(6) = 1/6$, $\Pr(\text{even}) = 3/6$
Odd S
Random variation -> implies we won't hit it once every 6 throws
Statistical distributions
Excel Formula
RANDBETWEEN
=
and 6 times: 11
=
For this game to be fair, we would need to pay out R6 for every R1 bet on a win.
Practically speaking, not a good idea: you would make no money for the local community.
percentage payout = (payout for a winning number / amount to be bet across all numbers) × 100 = 91.67%
∴ house advantage = 100% - 91.67% = 8.33%
...................
CH 2:Set theory
Why do we have to do set theory? One of Murphy's Laws states that before we can do something, we first have to do something else: here, "probability theory", and for that we need some "set theory"
Definition of sets:
The set A is determined when we can either (a) list the objects that belong to A or (b) give a rule by which we can decide whether or not a given object belongs to A.
$A = \{e, f, g\}$: the letters e, f, g belong to the set A
$B = \{x \mid 1 \le x \le 10\}$: the set B consists of all real numbers x such that x is larger than or equal to 1 but is less than or equal to 10.
$e \in A$: e is an element of A
$e \notin B$: e is not an element of B
$C = \{1, 3, 5, 9\}$ and $D = \{9, 1, 5, 3\}$
$C = D$: the order in which we list the elements of a set is irrelevant
$E = \{a, b, c, a\}$ and $F = \{a, b, c\}$
$E = F$: the set E contains only the distinguishable elements a, b, c.
Let $G = \{1, 3, 5\}$, $H = \{1, 3, 5, 9\}$ and $J = \{1, 2, 3, 4, 5\}$
∴ $G \subset H$, $H \not\subset J$, $J \not\subset G$
If $H \subset G$ and $G \subset H$, then, obviously, $H = G$.
Intersections: suppose $L = \{a, b, c\}$ and $M = \{b, c, d\}$. Then $L \cap M = \{b, c\}$
What can we write for $L = \{a, b, c\}$ and $R = \{d, e, f\}$? $L \cap R = \emptyset$
"the intersection of sets L and R is the empty set"
Pairs of sets whose intersection is the empty set are said to be mutually exclusive sets (or disjoint sets). Thus L and R are mutually exclusive.
Unions
The concept of union contrasts with the concept of intersection. The union of two sets A and B is the set that contains the elements that belong to A or B. Here we use the word "or" in an inclusive sense - we do not exclude elements that belong to both A and B.
↳ we write $U = A \cup B$ and say "U equals A union B"
Complements
Given the sample space S, we define the complement of a set A to be the set of all elements of S that are not in A, written $\bar{A}$.
1) $S = \{1, 2, 3, 4, 5, 6\}$, $A = \{1, 3, 5\}$ and $B = \{2, 4, 6\}$, then $\bar{A} = \{2, 4, 6\}$
We write $\bar{A} = B$: "the complement of A equals B" or "A complement equals B"
2) $S = \{x \mid 0 \le x \le 1\}$ and $D = \{x \mid 0 < x < 1\}$, find $\bar{D}$.
Because the set D excludes the endpoints of the interval from zero to one, $\bar{D} = \{0; 1\}$
Venn diagrams
Think of all the "points" in the rectangle as being the sample space S, and the overlap of the two circles as $A \cap B$, the set of points belonging to A and B. Similarly, the space of both individual circles and their overlap represents $A \cup B$, the set of points belonging to A or B.
* helpful to associate "intersection" with "and" and "union" with "or"
a KRmDG Up
mutually exclusive
.....................
A=
5x/ 1 x a3 =
B 31,2,3,4,5,63
=
A B
why?I
=
does not contain all the elements thatA does
Sample space
the set of all possible outcomes of a random experiment (in Venn diagram: outer rectangle)
Empty set
A set which has no elements ($\emptyset$)
Intersection
Unions
the union of sets A and B contains those elements that are either in A or in B or in both A and B
Mutually exclusive
two sets are mutually exclusive when their intersection is the empty set
Complement (the part of S left out)
the complement of a set contains all elements that are not in that set.
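The set operations in this chapter map directly onto Python's built-in `set` type. The sketch below reuses the die sample space and the sets A and B from the complement example; the variable names are only illustrative.

```python
# Sample space and two events for one throw of a die, as Python sets.
S = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}   # odd numbers
B = {2, 4, 6}   # even numbers

union = A | B                          # elements in A or in B or in both
intersection = A & B                   # elements in both A and B
complement_of_A = S - A                # elements of S that are not in A
mutually_exclusive = A.isdisjoint(B)   # True when the intersection is empty

print(union, intersection, complement_of_A, mutually_exclusive)
```

Here A and B are mutually exclusive and together make up S, so the complement of A equals B.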
CH3:Probability theory
An event is defined to be any subset of the sample space S.
The empty set is the set containing no elements.
$\emptyset$ and S are both subsets of S and thus events. ($\emptyset$ is called an
Kolmogorov's axioms of probability
Suppose that S is the sample space for a random experiment. For all events $A \subset S$, we define the probability of A, denoted Pr(A), to be a real number with the following properties:
① $0 \le \Pr(A) \le 1$ for all $A \subset S$
② $\Pr(S) = 1$
③ If $A \cap B = \emptyset$ (i.e. if A and B are mutually exclusive events) then $\Pr(A \cup B) = \Pr(A) + \Pr(B)$
-> by consequence, $\Pr(\emptyset) = 0$
Functions usually look like $y = f(x)$
- you put in a number (x), you get another number (y) out.
With Pr(A), you put a set (A) in, and out pops a number between zero and one
Relative frequencies
We start by doing n trials of the random experiment and counting the number of times r that some event $A \subset S$ occurs during the trials.
Obviously, $0 \le r/n \le 1$. Thus relative frequencies and probabilities both lie between zero and one.
We can think of the probability of the event A as the relative frequency of A as the number of trials of the random experiment gets very large
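theorem 1 -> Let $A \subset S$. Then $\Pr(\bar{A}) = 1 - \Pr(A)$
proof -> We write S as the union of two mutually exclusive events:
$S = A \cup \bar{A}$ and $A \cap \bar{A} = \emptyset$
because A and $\bar{A}$ are mutually exclusive, i.e. by axiom 3,
$\Pr(A \cup \bar{A}) = \Pr(A) + \Pr(\bar{A})$
and by axiom 2, $\Pr(A \cup \bar{A}) = \Pr(S) = 1$. Therefore,
$\Pr(A) + \Pr(\bar{A}) = 1$
and $\Pr(\bar{A}) = 1 - \Pr(A)$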
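theorem 2 -> If $A \subset S$ and $B \subset S$ then $\Pr(A \cap \bar{B}) = \Pr(A) - \Pr(A \cap B)$
proof -> Write A as the union of two mutually exclusive sets:
$A = (A \cap B) \cup (A \cap \bar{B})$
clearly, $(A \cap B) \cap (A \cap \bar{B}) = \emptyset$
therefore, using axiom 3, $\Pr(A) = \Pr(A \cap B) + \Pr(A \cap \bar{B})$
so $\Pr(A \cap \bar{B}) = \Pr(A) - \Pr(A \cap B)$
theorem 3 -> $\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$
proof -> Write $A \cup B$ as the union of two mutually exclusive sets:
$A \cup B = B \cup (A \cap \bar{B})$
because B and $A \cap \bar{B}$ are mutually exclusive, we can again apply axiom 3 and say $\Pr(A \cup B) = \Pr(B) + \Pr(A \cap \bar{B})$.
But, by theorem 2, $\Pr(A \cap \bar{B}) = \Pr(A) - \Pr(A \cap B)$, so $\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$.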
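theorem 4 -> If $B \subset A$, then $\Pr(B) \le \Pr(A)$.
proof -> If $B \subset A$ then we can write A as the union of two mutually exclusive sets: $A = B \cup (A \cap \bar{B})$
so $\Pr(A) = \Pr(B) + \Pr(A \cap \bar{B}) \ge \Pr(B)$
because $\Pr(A \cap \bar{B}) \ge 0$, as all probabilities are non-negative.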
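theorem 5 -> If $A_1, A_2, \dots, A_n$ are pairwise mutually exclusive, i.e. $A_i \cap A_j = \emptyset$ for $i \ne j$, then
$\Pr(A_1 \cup A_2 \cup \dots \cup A_n) = \Pr(A_1) + \Pr(A_2) + \dots + \Pr(A_n)$
or in more concise notation,
$\Pr\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} \Pr(A_i)$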
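The theorems above can be checked numerically on the fair die. This is a minimal sketch that assumes all six outcomes are equally likely; the helper `pr` and the events A and B are hypothetical choices for illustration, not part of the notes.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space of a fair die

def pr(event):
    """Probability of an event (a subset of S), assuming equally likely outcomes."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}   # "even" (illustrative event)
B = {4, 5, 6}   # "greater than 3" (illustrative event)

# Theorem 1: Pr(complement of A) = 1 - Pr(A)
complement_rule_holds = pr(S - A) == 1 - pr(A)

# Theorem 3: Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B)
lhs = pr(A | B)
rhs = pr(A) + pr(B) - pr(A & B)

print(complement_rule_holds, lhs, rhs)
```

Using `Fraction` keeps the arithmetic exact, so both sides of theorem 3 can be compared with `==`.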
Equally probable elementary events
In theory, all probability problems can be solved using theorem 5.
The elementary events whose union makes up the sample space S are always mutually exclusive because if one elementary event occurs, no other elementary event occurs.
Therefore, if we knew the probabilities of all the elementary events, we would also be able to find the probability of any event.
If there are N elementary events contained in S, and each one has the same probability of occurring, then the probability of each and every elementary event must be 1/N.
Equally probable elementary events occur in many games of chance. When a card is drawn at random from a pack, the probability of each card is taken to be 1/52. Let A be some event in this scenario. Then A must consist of the union of elementary events, each of probability 1/N. If we could determine the number of elementary events contained in A, then we could write,
Pr(A) = (number of elementary events contained in A) / N
"counting rules"
Permutations of n objects
Recall that a set is a collection of distinguishable objects and that the order in which the objects are listed is irrelevant. We now consider the number of different ways all the objects in a set may be arranged in order.
A set containing n distinguishable objects has
$n(n-1) \times \dots \times 3 \times 2 \times 1 = n!$ ("n factorial")
different orderings of the objects belonging to the set. We can see this by thinking in terms of n slots, each holding one object. We can choose an object for the first slot in n ways; there are then n - 1 objects left for the second slot, and so on.
We say that there are n! distinct arrangements (technically, we call each arrangement or ordering a permutation) of the n objects in the set.
First two slots can be filled in $n(n-1)$ ways.
First three slots can be filled in $n(n-1)(n-2)$ ways.
First r slots can be filled in $n(n-1) \times \dots \times (n-r+1)$ ways
=> $\frac{n(n-1) \times \dots \times (n-r+1) \times (n-r)(n-r-1) \times \dots \times 3 \times 2 \times 1}{(n-r)(n-r-1) \times \dots \times 3 \times 2 \times 1} = \frac{n!}{(n-r)!}$ ways
Thus there are $\frac{n!}{(n-r)!}$ ways of ordering r elements taken from a set containing n elements using each element at most once. Note that we are here involved in two processes, (a) choosing r objects and (b) arranging them: choosing and arranging. The number of ways of choosing and arranging r objects out of n distinguishable objects is called the number of permutations of n objects taken r at a time and is denoted by $(n)_r$ ("n permutation r")
$(n)_r = \frac{n!}{(n-r)!}$
This formula is also valid for $r = n$ if we adopt the convention that $0! = 1$
Combinations of n objects taken r at a time ...
Permutations involve a two-step process: choosing r objects and then arranging them. We are now interested only in the first operation. We recall that a subset having r objects can be arranged in r! permutations. A little reflection will convince you that there are therefore r! times more permutations than combinations.
$\binom{n}{r} = \frac{(n)_r}{r!} = \frac{n!}{r!(n-r)!}$
Now suppose we have n types of objects and r slots, and that we have at least r objects of each type available. We can thus fill the first slot with any of the n types of objects; there are still n types of objects available for the second slot because there are at least r of each type, and in the same way there are still objects of each of the n types for every later slot:
$n \times n \times \dots \times n = n^r$
Combinations with repetitions
We have n types of objects, with at least r available of each type. The number of selections of r objects allowing repetitions is given by
$\binom{n+r-1}{r}$
the proof of this result is included as an exercise at the end of the chapter
rules
Counting
n!
allowing repetitions is
2:The
He number of chosen from distinct not
ways ordering
of r
objects a
objects,
is
allowing repetitions
(n) r
=
=
r)!
not
allowing
repetitionsse)!
4:The number of r
me permutations of
objects, chosen from a distinct
r
--
Video 4: Permutations and combinations
Pr(A) = (number of equally likely elementary events in A) / (number of equally likely elementary events in S)
What is a permutation? An arrangement of objects where order does matter.
What is a combination? A selection of objects where order does not matter
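The counting rules can be evaluated with Python's standard `math` module. The values n = 5 and r = 3 below are hypothetical, chosen only to make the numbers easy to check by hand.

```python
import math

n, r = 5, 3   # illustrative values: 5 distinct objects, 3 slots

orderings = math.factorial(n)                     # n! orderings of all n objects
permutations = math.perm(n, r)                    # n!/(n-r)!, order matters, no repetition
combinations = math.comb(n, r)                    # n!/(r!(n-r)!), order does not matter
perms_with_repetition = n ** r                    # n^r, order matters, repetition allowed
combs_with_repetition = math.comb(n + r - 1, r)   # order does not matter, repetition allowed

# There are r! times more permutations than combinations.
assert permutations == combinations * math.factorial(r)

print(orderings, permutations, combinations,
      perms_with_repetition, combs_with_repetition)
```

`math.perm` and `math.comb` need Python 3.8 or later.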
Video 5: Conditional probability and independent events
1) The probability of event A occurring if event B has occurred:
$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}$
2) If A and B are unrelated or independent, $\Pr(A \mid B) = \Pr(A)$ and
$\Pr(A \cap B) = \Pr(A) \Pr(B)$
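Conditional probability
Provides a method for updating or revising probabilities in the light of new information.
Let A and B be two events in a sample space S. Then the conditional probability of A given B is $\Pr(A \mid B) = \Pr(A \cap B)/\Pr(B)$
Example, using a pack of cards:
$\Pr(\text{King} \mid \text{clubs}) = \frac{\Pr(\text{King} \cap \text{clubs})}{\Pr(\text{clubs})}$
The event "King of clubs" is a subset of the event "clubs" - so $\Pr(\text{King} \cap \text{clubs}) = \Pr(\text{King of clubs}) = 1/52$
A club can be drawn from the pack - there are 13 ways of doing this. Hence $\Pr(\text{clubs}) = 13/52$
$\Pr(\text{King} \mid \text{clubs}) = \frac{\Pr(\text{King of clubs})}{\Pr(\text{clubs})} = \frac{1/52}{13/52} = \frac{1}{13}$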
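The card example can be reproduced by listing the pack and counting. This is a sketch assuming a standard 52-card pack with every card equally likely; the rank and suit labels are illustrative.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(ranks, suits))            # 52 equally likely elementary events

clubs = {card for card in deck if card[1] == "clubs"}
kings = {card for card in deck if card[0] == "K"}

pr_clubs = Fraction(len(clubs), len(deck))                    # 13/52
pr_king_and_clubs = Fraction(len(kings & clubs), len(deck))   # 1/52

# Definition of conditional probability: Pr(King | clubs) = Pr(King and clubs) / Pr(clubs)
pr_king_given_clubs = pr_king_and_clubs / pr_clubs

print(pr_king_given_clubs)
```

The new information "the card is a club" shrinks the sample space from 52 cards to 13, of which exactly one is a King.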
Example
Suppose A and B are two events in a
sample space, and that
0.5.
Pr(AB) 0.1
=
a) Pr(B(A) Pr(B1A)/Pr(A) =
0.1 0.6
=
0.1 7
=
=
=
d) Pr(B/A) Pr(BrA)/Pr(A) = -
(1
=
-
Pr(A))
0.1/(1 0.6)
=
-
0.2 5
=
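Bayes' Theorem
For any two events A and B there are two conditional probabilities that can be considered:
$\Pr(B \mid A) = \Pr(A \cap B) / \Pr(A)$
$\Pr(A \mid B) = \Pr(A \cap B) / \Pr(B)$
If A and B are two events, then
$\Pr(A \mid B) = \frac{\Pr(B \mid A) \Pr(A)}{\Pr(B \mid A) \Pr(A) + \Pr(B \mid \bar{A}) \Pr(\bar{A})}$
Proof -> recall the definition of conditional probability
$\Pr(A \mid B) = \Pr(A \cap B)/\Pr(B)$
and theorem 2
$\Pr(B) = \Pr(A \cap B) + \Pr(\bar{A} \cap B)$
Substituting, we have
$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(A \cap B) + \Pr(\bar{A} \cap B)}$
We note that
$\Pr(A \cap B) = \Pr(B \mid A) \cdot \Pr(A)$
and
$\Pr(\bar{A} \cap B) = \Pr(B \mid \bar{A}) \cdot \Pr(\bar{A})$
therefore
$\Pr(A \mid B) = \frac{\Pr(B \mid A) \Pr(A)}{\Pr(B \mid A) \Pr(A) + \Pr(B \mid \bar{A}) \Pr(\bar{A})}$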
Independent events
A and B are independent if the occurrence of A does not change the probability of B: $\Pr(B \mid A) = \Pr(B)$.
then, using the definition of conditional probability,
$\Pr(B) = \frac{\Pr(A \cap B)}{\Pr(A)}$
or
$\Pr(A \cap B) = \Pr(A) \times \Pr(B)$ * if events are independent
in words -> the probability of the intersection of independent events is the product of their individual probabilities
extended:
$\Pr(A_1 \cap A_2 \cap \dots \cap A_n) = \Pr(A_1) \times \Pr(A_2) \times \dots \times \Pr(A_n)$
It helps to realize that "mutually exclusive" is a concept from set theory and can be represented by a Venn diagram. But "independence" is a concept in probability theory and cannot be represented by a Venn diagram.
-> independent events (each with non-zero probability) are never mutually exclusive
("price of
gold today
in
theyare conceptually, attent
Video 6: Summarising information using graphs
Bad pie charts -> the categories don't add up to 100%.
What kind of data are we dealing with?
· ordinal data (ranked or ordered) (steps aren't necessarily the same size)
CH 1: Exploring data
We define statistics as the science of decision making in the face of uncertainty.
What do we mean by data?
- data is information
Either you get data drips (too little info) or data floods (too much data)
- different types of data: qualitative (non-numerical; nominal) and quantitative (numerical)
Video 7: Summary measures of location and spread
A statistic is any quantity calculated from the data values of a sample
- standard deviation on Excel:
Step by step
1. find mean of data set (=AVERAGE(...))
2. In new column, find $(x_i - \bar{x})$ ($x_i$ = given; $\bar{x}$ = mean) for each $x_i$
3. square each deviation
4. sum the squared deviations
5. divide by (n - 1)
6. this equals variance
7. Square root to get standard deviation
$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
- or in one step: =STDEV.S(all values in data set row)
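The step-by-step calculation of the sample standard deviation can also be sketched in Python. The data values below are hypothetical; `statistics.stdev` plays the role of Excel's STDEV.S as a cross-check.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative data set, not from the notes

mean = sum(data) / len(data)                  # step 1: mean of the data set
deviations = [x - mean for x in data]         # step 2: (x_i - mean) for each x_i
squared = [d ** 2 for d in deviations]        # square each deviation
variance = sum(squared) / (len(data) - 1)     # sum and divide by (n - 1): the sample variance
std_dev = math.sqrt(variance)                 # square root gives the standard deviation

# Cross-check against the library's sample standard deviation.
assert math.isclose(std_dev, statistics.stdev(data))

print(mean, variance, std_dev)
```

Dividing by n - 1 rather than n gives the sample (not population) variance, which matches what STDEV.S computes.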
Five-number summary on Excel
min = QUARTILE(range; 0)
LQ = QUARTILE(range; 1)
median = QUARTILE(range; 2)
UQ = QUARTILE(range; 3)
max = QUARTILE(range; 4)
strays -> unusually small or large numbers (more extreme are outliers)
strays: less than median - 3(median - LQ), or more than median + 3(UQ - median)
outliers: less than median - 6(median - LQ), or more than median + 6(UQ - median)
Video 8:
random variable -> in the example of rolling a die
X = (the number on the die after it is thrown)
$X = \{x \mid x = 1, 2, 3, 4, 5, 6\}$
remember: X (big X) = the random variable, x (small x) = a value that X takes on
We random variables:the of
set possible values thatcan take on is
-
variable?
relative approach will not work. Instead,
frequency we
between two
values, and d, is equal to the
any probability
that i lies between and d
ya
-
↳ function
probability density
The probability function of a discrete random variable is known as a probability mass function.
(iii) 2p(x) =
1 for all values of
function
as a
probability density
(i) defined for all values of
In order to manipulate the events defined on a sample space mathematically, it is necessary to attach a numerical value to each elementary event.
When elementary events are quantitative, there is an obvious and natural way to assign numbers to them (i.e. hours, count of items). However, if elementary events
of a random experiment.
summing them. However, for continuous random variables, the probability density function f(x) is constructed in such a way that probabilities of events are found by integration: the area under the graph between the numbers a and b represents the probability of the event.
Video 9:
The average (mean) is also referred to as the expected value of the random variable X.
$E(X) = \sum x \cdot p(x)$, summed over all possible values of X, where X = the number of dots on the upturned face of a randomly thrown die
$E(X) = \frac{1}{6}(1) + \frac{1}{6}(2) + \frac{1}{6}(3) + \frac{1}{6}(4) + \frac{1}{6}(5) + \frac{1}{6}(6) = 3.5$
the expected value of X is also referred to as the mean of X.
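The expected value of the die can be computed directly from its probability mass function. The sketch below also evaluates the variance with the shortcut $E[X^2] - (E[X])^2$; the dictionary `pmf` is an illustrative representation of p(x) = 1/6.

```python
from fractions import Fraction

# Probability mass function of a fair die: p(x) = 1/6 for x = 1..6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

expected = sum(x * p for x, p in pmf.items())         # E[X] = sum of x * p(x)
expected_sq = sum(x ** 2 * p for x, p in pmf.items()) # E[X^2] = sum of x^2 * p(x)
variance = expected_sq - expected ** 2                # Var[X] = E[X^2] - (E[X])^2

print(float(expected), float(variance))
```

Exact fractions give E[X] = 7/2 and Var[X] = 35/12, which is about 2.917.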
(.8x.f(x) dx
Ample let X=
a continuous random variable with pdf given as:
f(x) T(z
=
-
xi) -1 x 1
O
=
elsewhere
ESX3 S.,x
=
f(x) dx
-> To " -
"1.
(t
-
-
(r) -
( -
i)
=
$\mathrm{Var}[X] = E[X^2] - (E[X])^2$
143-149
p
How do we calculate E(A + B), Var(A + B), E(A - B), Var(A - B)?
· $E(A - B) = E(A) - E(B)$
· $E(cA) = c \cdot E(A)$
· $\mathrm{Var}(A + B) = \mathrm{Var}(A) + \mathrm{Var}(B)$ (when A and B are independent)
· $\mathrm{Var}(cA) = c^2 \cdot \mathrm{Var}(A)$
Excel example
mean -> =AVERAGE(A1:A1000)
variance -> =VAR(A1:A1000)
-
e E(x] suM =
(c.p(x)]
Ia I [X] E(x--(ECX])"
=
var
·
S 0.167
6 0.167
pr (0.8)
E[X] [x2. =
p(x)
o()
=
1))
+
2))
+
3)(
+
0 0.23
=
=
0.0 0 8
=
1
=
2 0.8" x
=
0.2
=
0.1 2 8
3 0.83
=
0.5 1 2
=
14
=
-I
dx Si"(x -
14)"g(x)dx
=[12 1](!) (z x)"(8)
,(0)"(x)
- -
-
=-
24
CH 4: Probability distributions
Scenario -> the uniform distribution
$f(x) = \frac{1}{30}$ for $0 \le x \le 30$, 0 elsewhere
or generally put ->
$f(x) = \frac{1}{b-a}$ for $a \le x \le b$, 0 elsewhere
Pr (0c +
(20) relative, to the time the
of next train (between 0 and 30)
S
=apad at
0.6 7
you have
·
↳ uniform distribution
$X \sim U(a, b)$
and probability density function
$f(x) = \frac{1}{b-a}$ for $a \le x \le b$
$= 0$ elsewhere
If X can take on any value in the interval (a; b), then X has the uniform distribution $X \sim U(a, b)$
and
·
· mice
If the mice cannot read - what should we set p to be? p = 0.5
· EVENT OF INTEREST: probability of 4 out of 5 mice turning left even when they can't read
· RANDOM VARIABLE: X = the number of mice that turn left, out of 5 mice, given p = 0.5
X = the number of mice that successfully negotiate the maze
p = probability of each mouse turning left (success)
X = {no mice; one mouse; ...; five mice}
and
happen once
only once
0.5" (0.5)
∴ $P(X = 4) = 5 \times (0.5^4 \times 0.5)$
(repetitions) an
of
experiment in which the outcome at each
$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$ for $x = 0, 1, 2, \dots, n$
$= 0$ elsewhere
The probability of 4 mice turning left by chance when they can't read = 0.156
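The mice calculation is a direct use of the binomial probability mass function. A minimal sketch, assuming n = 5 independent mice and p = 0.5 as in the notes; the function name `binomial_pmf` is hypothetical.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): comb(n, x) * p^x * (1 - p)^(n - x), and 0 outside 0..n."""
    if x < 0 or x > n:
        return 0.0
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Probability that exactly 4 of 5 mice turn left when each turns left with p = 0.5.
p_four_left = binomial_pmf(4, 5, 0.5)

# The probabilities over all possible values of X must add up to one.
total = sum(binomial_pmf(x, 5, 0.5) for x in range(6))

print(round(p_four_left, 3), total)
```

The exact value is 5/32 = 0.15625, so a result like this happens by chance roughly one time in six.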
Binomial Distribution
random variable X
$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$ for $x = 0, 1, 2, \dots, n$
$= 0$ elsewhere
normal distributions
Pr(success) = p; Pr(failure) = 1 - p, or q = 1 - p, so Pr(fail) = q
If the above conditions are satisfied, we have a binomial process, and the random variable X has a binomial distribution.
-----
Binomial Distribution
- n independent trials
- two outcomes
- random variable X is number of successes in n trials
$p(x) = \binom{n}{x} p^x (1-p)^{n-x}$ for $x = 0, 1, \dots, n$
$= 0$ otherwise
-----
n and p are parameters of the distribution ($n \ge 1$) ($0 < p < 1$)
$X \sim B(n, p)$
Video 12: The Poisson and Exponential Distributions
On any repetition of a random experiment, because it is random, we can't know what the outcome will be. We can use probability, so we have an idea of what to expect over many repetitions.
----
X = number of potholes in a stretch of road
- probability mass function
the function will depend on only 1 parameter: the average rate of occurrence of potholes
average rate of occurrence = 0.85 per km
∴ λ = 0.85 × 5 = 4.25 potholes per 5 km of road
$X \sim P(4.25)$
$p(x) = \frac{e^{-\lambda} \lambda^x}{x!}$ for $x = 0, 1, 2, 3, \dots$
$= 0$ elsewhere
POISSON PROCESS
average
-
y -
E(x)
0.4
Y
S
1.5.
*3
-
-
+ (y) = y>0 O
1.5 e
dy
.e elsewhere [ -
e
-
1.5.03
I
· Probability of exactly 2 potholes in a 5 km stretch of road
- Poisson distribution, λ = 4.25
$X \sim P(4.25)$
$p(x) = \frac{e^{-\lambda} \lambda^x}{x!}$
Exponential
Probability Y is at least 3 km: exponential distribution
$Y \sim E(0.85)$ (λ = 0.85 per km of road)
$f(y) = 0.85 e^{-0.85y}$ for $y > 0$
$\Pr(Y > 3) = \int_3^\infty 0.85 e^{-0.85y} \, dy = \left[ -e^{-0.85y} \right]_3^\infty = e^{-2.55} = 0.078$
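The pothole calculations can be reproduced with the Poisson mass function and the exponential tail probability. This sketch assumes the rate of 0.85 potholes per km from the notes; the function names are hypothetical.

```python
import math

RATE_PER_KM = 0.85   # average rate of occurrence of potholes, from the notes

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson count with parameter lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def exponential_tail(y, lam):
    """Pr(Y > y) for an exponential gap between events with rate lam."""
    return math.exp(-lam * y)

# Exactly 2 potholes in a 5 km stretch: the parameter is rescaled to 5 km.
p_two_in_5km = poisson_pmf(2, RATE_PER_KM * 5)

# Distance to the next pothole is more than 3 km.
p_gap_over_3km = exponential_tail(3, RATE_PER_KM)

# No potholes in a 3 km stretch is the same event seen through the Poisson count.
p_none_in_3km = poisson_pmf(0, RATE_PER_KM * 3)

print(round(p_two_in_5km, 4), round(p_gap_over_3km, 3), round(p_none_in_3km, 3))
```

The last two probabilities agree because "no potholes in 3 km" and "the gap exceeds 3 km" describe the same event; the Poisson counts events per interval and the exponential measures the interval between events.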
stretch road
Probability finding
of no
potholes in 3km of
Poisson
↳joer
3km of road)
&roumberofeventcurrence
time
of I space/distance
per in the
&
paar b ui n ance betweenente
Chapter 5:Poisson
distribution
In Distribution
-
the same as the time period during which the events are counted.
Then has
poisson distribution
X the with parameter x,
12 X -
p(x)
Fi
=
O
-
otherwise
Example
Thereis fleet trucks. On there 12
a
large delivery
of
average are
trucks on
standby.
down in a
given day.
Pr(x x) p(x)
22
=
=
=
a.) What is the probability that on any day no standby is needed?
Pr(no breakdowns) = p(0) = 0.0907
b) What is the probability that the number of standby trucks is inadequate?
$\Pr(\text{inadequate standby}) = \Pr(X > 2) = 1 - \Pr(X \le 2) = 0.4303$
Poisson Distribution
the
->
...
N
①lm n = ( ) e
=
for
⑦ limn finite
=
= 0 value a
any
=(c)()"(1 -
)
I
when tend
see what happens we let a to
infinity
Therefore the result we
require,
p(x)
2
=
probability density
functio
otherwise
Example
A computer that operates continuously breaks down at random on average 1.5 times per week.
This tells us λ = 1.5
$f(x) = 1.5 e^{-1.5x}$ for $x > 0$
$= 0$ otherwise
a) Pr(X > 2): the exponential is continuous, so evaluate the probability by integration
$\Pr[X > 2] = \int_2^\infty 1.5 e^{-1.5x} \, dx = \left[ -e^{-1.5x} \right]_2^\infty = e^{-3} = 0.0498$
b) What is the probability of a breakdown within 3 days?
-> make time units compatible: 3 days = 3/7 week
$\Pr[X < 3/7] = \int_0^{3/7} 1.5 e^{-1.5x} \, dx = \left[ -e^{-1.5x} \right]_0^{3/7} = -0.5258 + 1 = 0.4742$
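Both breakdown probabilities follow from the exponential distribution function $1 - e^{-\lambda x}$. A minimal sketch, assuming the rate of 1.5 breakdowns per week from the example; `exponential_cdf` is a hypothetical helper.

```python
import math

RATE_PER_WEEK = 1.5   # average number of breakdowns per week, from the example

def exponential_cdf(x, lam):
    """Pr(X <= x) for an exponential waiting time with rate lam."""
    return 1 - math.exp(-lam * x)

# a) no breakdown during the first 2 weeks: Pr(X > 2)
p_more_than_2_weeks = 1 - exponential_cdf(2, RATE_PER_WEEK)

# b) a breakdown within 3 days: convert days to weeks so the units match the rate
p_within_3_days = exponential_cdf(3 / 7, RATE_PER_WEEK)

print(round(p_more_than_2_weeks, 4), round(p_within_3_days, 4))
```

The only subtle step is the unit conversion in (b): the rate is per week, so 3 days must enter as 3/7 of a week.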
=
Video 13: Normal Distribution
Excel simulation: 30 tosses of a die in each column, repeated across 100 columns (B to CW); expected mean 3.5
each toss: =RANDBETWEEN(1,6)
$\bar{X}$ = sample mean of each column of 30 tosses: =AVERAGE(B2:B32)
min and max of the 100 averages: =MIN(B32:CW32)
distribution of the averages of 30 tosses across 100 repeats of the experiment: =FREQUENCY(B32:CW32, bins), with bins 2.6, 2.8, 3, 3.2, 3.4, 3.6, 3.8, 4, 4.2, 4.4
distribution of the individual outcomes: =FREQUENCY(B2:CW32, bins)
If the random variable X is the sum of a large number of random increments, then X will tend towards a Normal Distribution.
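The spreadsheet experiment (averages of 30 die tosses, repeated 100 times) can be sketched in Python with the standard library. The seed is an arbitrary choice so the run is repeatable; the function name is hypothetical.

```python
import random
import statistics

random.seed(0)   # arbitrary seed, only so that repeated runs give the same numbers

def average_of_tosses(n_tosses=30):
    """Sample mean of n_tosses throws of a fair die."""
    return statistics.mean(random.randint(1, 6) for _ in range(n_tosses))

# Repeat the 30-toss experiment 100 times, as in the spreadsheet.
averages = [average_of_tosses() for _ in range(100)]

mean_of_averages = statistics.mean(averages)     # should sit near 3.5
spread_of_averages = statistics.stdev(averages)  # should sit near sqrt(2.917 / 30)

print(round(mean_of_averages, 2), round(spread_of_averages, 2))
```

Individual tosses are uniform on 1 to 6, but the averages pile up around 3.5 in a bell shape, which is the behaviour the statement above describes.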
---
-mentral
Limit Theorem:
functin
density
-
It has a probably
·
middle
mean
into
printer
the tails of the distribution head asymptotically towards 0
$f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
$X \sim N(100, 4)$
to calculate the probability that X lies within a given interval:
we convert from the units of X to a standard Normal Distribution (Z) where the mean = 0 and variance = 1
to convert from X to Z: $z = \frac{x - \mu}{\sigma}$
∴ for $X \sim N(100, 4)$: $z = \frac{x - 100}{2}$
=Pros, a
re
-"c00._-mso.) -
a
both
conqurent
to each other
---
book
the
* the area under the curve for a probability density function is one
point is
away from the mean
M.
for other tasks. the total time taken will also be random
Obviously, a
"
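If the $X_i$ are independent with $X_i \sim N(\mu_i, \sigma_i^2)$, then $Y = \sum_{i=1}^{n} X_i \sim N(\mu, \sigma^2)$
where $\mu = \sum_{i=1}^{n} \mu_i$ and $\sigma^2 = \sum_{i=1}^{n} \sigma_i^2$
then, $X_1 - X_2 \sim N(\mu_1 - \mu_2, \sigma_1^2 + \sigma_2^2)$
If $X \sim N(\mu, \sigma^2)$ and a and b are constants, then $Y = aX + b$ also has a normal distribution, with $Y \sim N(a\mu + b, a^2\sigma^2)$
if $X \sim N(67, 4^2)$ where X is in inches, then $Y = 2.54X + 0$
$Y \sim N(2.54 \times 67 + 0, \; 2.54^2 \times 4^2) = N(170.2, 103.2)$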
Module 5: Hypothesis testing
Video 15
The difference between a sample and a population
Population -> parameter (e.g. population variance $\sigma^2$); sample -> statistic
From the sample we can draw conclusions about the population
· Our sample needs to be representative and random:
representative -> similar in structure to the population it is drawn from
$\bar{X}$ is a random variable
We call the probability distribution of a statistic a sampling distribution
Example of 1st students
measuring heights year
Let X
be the a drawn is student
height of
randomly year
and let, X1, X2, ...
Xn. be a
randomly drawn sample
"
from the normal distribution. true variance
-
a
mean-p
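Remember: $E(A + B) = E(A) + E(B)$
and, if A and B are drawn randomly and independently, $\mathrm{Var}(A + B) = \mathrm{Var}(A) + \mathrm{Var}(B)$
$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
$E[\bar{X}] = \frac{1}{n} \sum_{i=1}^{n} E[X_i] = \frac{1}{n}(n\mu)$ -> $E[\bar{X}] = \mu$
$\mathrm{Var}(aX) = a^2 \mathrm{Var}(X)$ where a is a constant
$\mathrm{Var}(\bar{X}) = \mathrm{Var}\left(\frac{1}{n} \sum_{i=1}^{n} X_i\right) = \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{\sigma^2}{n}$
-> a large sample gives a stable estimate of the mean
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ as long as $n \ge 30$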
testimate
m
-
Interval
values
of
-range
communicate how much the estimate
uncertainty present
-
or is in
precision
X: number of dots on upturned face of a fair die after any given toss
Probability of any number: 1/6
$E(X) = \sum_i x_i p_i = 3.5$
$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \sum_i x_i^2 p_i - (3.5)^2 = \frac{91}{6} - \frac{49}{4} = \frac{35}{12} = 2.917$
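The sample mean is a random variable.
Therefore, it has a sampling distribution
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
now, for a fair die tossed 30 times:
$\bar{X} \sim N\left(3.5, \frac{2.917}{30}\right)$
$\bar{X} = 4.70$ from 30 tosses.
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
continuing: 95% confidence interval $\left(4.70 - 1.96\sqrt{\frac{2.917}{30}}; \; 4.70 + 1.96\sqrt{\frac{2.917}{30}}\right)$
↳ (4.09; 5.31)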
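The confidence interval for the die can be recomputed with the standard library. A sketch assuming the known variance 35/12 (about 2.917) of a fair die and the sample mean 4.70 from 30 tosses given in the notes; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, variance, n, confidence=0.95):
    """Confidence interval for the mean when the population variance is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    half_width = z * math.sqrt(variance / n)
    return sample_mean - half_width, sample_mean + half_width

low, high = z_confidence_interval(4.70, 35 / 12, 30)

print(round(low, 2), round(high, 2))
```

Because the half-width shrinks with the square root of n, quadrupling the number of tosses would halve the width of the interval.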
L z x
=
is apparent
-
So... LARGE SAMPLES YIELD BETTER RESULTS THAN SMALL SAMPLES
Confidence level can be written as $100(1 - \alpha)\%$.
- within how many units of the true mean do we want our estimate to be?
↳ this is the L
- how confident do you want to be of your estimate?
↳ that is your z
If we know $\sigma$, we can construct an interval such that
$\Pr\left(\bar{X} - z\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$
confidence interval method for $\mu$, $\sigma^2$ known
If we have a random sample of size n with sample mean $\bar{X}$, then A% confidence intervals for $\mu$ are given by
$\left(\bar{X} - z^* \frac{\sigma}{\sqrt{n}}, \; \bar{X} + z^* \frac{\sigma}{\sqrt{n}}\right)$
where the appropriate values of $z^*$ are given by the tables.
Sample size n required to achieve desired accuracy L, when $\sigma$ is known
$n = \left(\frac{z^* \sigma}{L}\right)^2$
Video 17: testing whether the mean is a specified value
The Hypothesis Test: statistical inference, significance tests
Step 1: define the null hypothesis ($H_0$)
$H_0$ is a statement about one of the true, unknown population parameters
$H_0$ will always state that the true parameter is equal to some (hypothesised) value.
Step 2: define the alternative hypothesis ($H_1$)
$H_1$ is still about the true, unknown population parameter
$H_1$ will always have a <, > or ≠ sign
In our example we have a strong suspicion that the true mean of this die is greater than 3.5
$H_1: \mu > 3.5$
Step 3: Set up a significance level ($\alpha$) for your test
$\alpha$ sets how much evidence we need in order to detect a difference from the hypothesised mean.
If $\alpha$ = 1% (tough)
Then even if the die is fair ($H_0$ true), we will still reject $H_0$ 1% of the time
in our example: significance level of 5%, $\alpha = 0.05$
Step 4: Set up a rejection region based on $\alpha$
In our example: according to $H_1$ ($H_1: \mu > 3.5$) we only need to consider rejecting $H_0$ if the sample mean is in the upper tail: reject $H_0$ if z > 1.645
Step 5: Calculate the test statistic
$\bar{X} = 4.7$, obtained from 30 tosses
$z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{4.7 - 3.5}{\sqrt{2.917/30}} = 3.85$ under $H_0$
Step 6: Draw a conclusion
If the test statistic falls within the rejection region then we reject $H_0$ in favour of $H_1$
If the test statistic does not fall within the rejection region then we do not reject $H_0$.
In our example - our test statistic of 3.85 falls within the rejection region (values > 1.645), therefore we reject $H_0$ and conclude that the true mean of our friend's die is greater than 3.5
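The six steps of the one-sided test on the die can be sketched as a short calculation. It assumes the known fair-die variance 35/12 (about 2.917) and the sample mean 4.7 from 30 tosses used in the notes; the variable names are illustrative.

```python
import math
from statistics import NormalDist

sample_mean = 4.7    # observed mean of 30 tosses
mu_0 = 3.5           # hypothesised mean under H0 (fair die)
variance = 35 / 12   # known variance of a fair die
n = 30
alpha = 0.05         # significance level for the one-sided test H1: mu > 3.5

# Step 4: rejection region is the upper tail beyond the critical value.
critical_value = NormalDist().inv_cdf(1 - alpha)

# Step 5: test statistic, measured in standard errors from the hypothesised mean.
z = (sample_mean - mu_0) / math.sqrt(variance / n)

# Step 6: conclusion.
reject_h0 = z > critical_value

print(round(z, 2), round(critical_value, 3), reject_h0)
```

The test statistic lies far beyond the critical value, so the sample gives strong evidence against a true mean of 3.5.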
nests
& thedefault positionunless,we are
e
-veseg in Se
2.5%
st I
⑧
one-sided test
Type I error
· Reject $H_0$ erroneously
this is controlled by $\alpha$
If $\alpha$ is very small, it will be more difficult to reject $H_0$, so the chances of making a type I error will be less
Type II error
· Accept $H_0$ erroneously
-> for the purposes of this course we generally accept that the Pr(type II error) is acceptable when $\alpha = 0.05$ or 0.01
Comparing two sample means (with known population variances)
$\bar{X}_1 \sim N\left(\mu_1, \frac{\sigma_1^2}{n_1}\right)$, $\bar{X}_2 \sim N\left(\mu_2, \frac{\sigma_2^2}{n_2}\right)$
$\bar{X}_1$ and $\bar{X}_2$ are independent variables
Video 18
Friend example continued...
statistically we are testing whether the two dice come from the same, or different populations
Are they coming from the same population that has the same underlying true mean?
your die = $X_1$, your friend's die = $X_2$
· $\sigma_1^2$ and $\sigma_2^2$ are the variances of $X_1$ and $X_2$
$\sigma^2/n$ is the variance of the sample mean
$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$
After 30 rolls of the die:
$\bar{X}_1 = 3.6$, $\bar{X}_2 = 4.4$
Step 1: $H_0: \mu_1 = \mu_2$ or $\mu_1 - \mu_2 = 0$
Step 2: $H_1: \mu_1 \ne \mu_2$
Step 3: Set up significance level ($\alpha$) for your test
$\alpha = 0.05 = 5\%$
Step 4: Set up rejection region based on $\alpha$
this is a two-sided test so we split $\alpha$ in half
↳ We will reject $H_0$ if absolute value of z > 1.96
$z = \frac{3.6 - 4.4}{\sqrt{\frac{2.917}{30} + \frac{2.917}{30}}} = \frac{-0.8}{0.441} = -1.81$
that lies 1.8 standard errors away from what we would expect.
modified approach
- rather than a significance level to be specified in advance, we report the observed level of significance or the p-value
Step 1: $H_0: \mu_1 = \mu_2$
Step 2: $H_1: \mu_1 \ne \mu_2$
...
$z = -1.81$
$\Pr(Z < -1.81) = 0.0351$
=> × 2 = 0.0702
A p-value of 0.0702 -> a difference of this size would occur approximately 7% of the time.
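The two-dice comparison and its p-value can be reproduced numerically. This sketch assumes both dice share the fair-die variance 35/12 (about 2.917) and 30 rolls each, as in the notes; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mean_1, mean_2 = 3.6, 4.4   # sample means of your die and your friend's die
variance = 35 / 12          # known variance of a fair die, assumed for both dice
n_1 = n_2 = 30

# Standard error of the difference between two independent sample means.
std_error = math.sqrt(variance / n_1 + variance / n_2)

# Test statistic under H0: mu_1 = mu_2.
z = (mean_1 - mean_2) / std_error

# Two-sided p-value: double the tail area beyond |z|.
p_value = 2 * NormalDist().cdf(-abs(z))

reject_at_5_percent = p_value < 0.05

print(round(std_error, 3), round(z, 2), round(p_value, 3), reject_at_5_percent)
```

The computed p-value is slightly below the hand value of 0.0702 because the table lookup rounds z to -1.81 first; either way it is above 0.05, so the null hypothesis is not rejected at the 5% level.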
In this course
p-value
Eme3
It the to true (ie
is
probability,ifwe assume is there
ARA ->
Pr (type 1 error)
Video 19: Hypothesis testing using the t-distribution
$\sigma^2$ for a fair die = 2.917
We can estimate $\sigma^2$ using the sample variance $s^2$
-> the consequence of this is that our test statistic is more variable than before
As n increases, the t-distribution becomes more peaked and more like the normal distribution
$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
$t = \frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t_{n-1}$
So if we have a sample of size n, and we estimate $\sigma^2$ by $s^2$, then the test statistic has a t-distribution
Video 20: comparing two means
$\bar{X}_1 = 3.9$, $\bar{X}_2 = 4.3$
$s_1 = 2.08$, $s_2 = 1.93$
Step 3: Significance level
$\alpha = 0.05$
Step 4: calculate rejection region
remember! $z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
· we assume that the population variances are equal
· $s_p$ = pooled estimate: a weighted average of the sample variances, weighted by sample size
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
test statistic -> $t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
this test statistic will follow a t-distribution with $n_1 + n_2 - 2$ degrees of freedom
Rejection region -> $|\text{test statistic}| > t^{0.05/2}_{n_1 + n_2 - 2}$
↳ we reject $H_0$ if this is true
t critical value = 2.000
test statistic = -0.78
p-value: two sided, so we double the tail area
- and use a pooled standard deviation estimate
Video 21: Hypothesis testing using paired measurements
degrees freedom
of
(n-1)
area of
Site the
p-value
variances:test
stat
assuming equal
-
two-sample
ret
Summary of hypothesis test: comparing means of two dependent samples
Step 1 - $H_0: \mu_B = \mu_A$ or $\mu_B - \mu_A = 0$ (no difference)
Step 2 - define the alternative hypothesis $H_1$ (now a one-sided test)
$H_1: \mu_B < \mu_A$ or $\mu_B - \mu_A < 0$
(in above example: we want drug to have an effect and therefore mean after drug should be greater than it was before)
Step 3 -> significance level ($\alpha$)
$\alpha = 0.05$
Step 4 -> calculate rejection region
$d = X_B - X_A$
↑ can be tested with a 1-sample t-test
degrees of freedom: n - 1
Reject $H_0$ if test statistic < $-t^{0.05}_{n-1}$ (appropriate critical value: 1.691)
Step 5 - calculate test statistic
$t = \frac{\bar{d}}{s_d/\sqrt{n}} = -4.91$
Step 6 -> conclusion
"test statistic is more extreme than critical value"
-4.91 < -1.691 ∴ falls in rejection region
reject $H_0$
the drug is effective in improving motor functioning
Modified approach
(What is the probability of getting a test statistic as small as -4.91?)
only 5 times in 10000 (very very unlikely)
---
If $H_1: \mu_B \ne \mu_A$ we would double our p-value
∴ p-value is < 0.001
Critical point
-> We need paired tests because our paired data are not independent of each other
A 10 A
example 204
pg.
=
From Chapter 8, $z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ has the standard normal distribution.
But when the "true" value for the standard deviation in the population, $\sigma$, is replaced by s, the estimate of $\sigma$, this is no longer true, although, as the sample size n increases, it rapidly becomes an excellent approximation. But for small samples it is far from true, and ultimately, they are identical.
Degrees of freedom
remember, $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
these terms are deviations of each of the $x_i$ from the sample mean
Degrees of freedom
For each parameter we need to estimate prior to evaluating the current parameter of interest, we lose one degree of freedom.
Because the shape of the t-distribution varies with the sample size, there is a whole family of t-distributions. It is therefore necessary to have some means of indicating which t-distribution is being used in a specific situation. We do this by using a subscript. Intuition suggests that the subscript should be the sample size, but it turns out that the sensible subscript is the degrees of freedom of the
-n
=
Exact p-values software.
are
usually computed using statistical
①identify the
appropriate degrees of freedom
is s2,
I
(X-tre' m, X trn)
+
where the +4 values are obtained from the t-tables. For 95%
farmer
Example: Poultry farmer is investigating ways of improving profitability of his operation. Using a standard diet, turkeys grow to mean mass of
1) $H_0: \mu = 4.5$
2.) $H_1: \mu > 4.5$, one-sided alternative (the new diet should not cause loss of mass)
3) Significance: 5%
4) rejection region is found by reasoning as follows. Because the population variance is unknown, we need to use the t-distribution. Degrees of freedom for t will be 19, s is based on 20 observations.
5) test stat = 2.68
reduction in
dizzy spells for each after
patient medicine. (how
taking many
dizzy spells had per month before the meds and
again while on meds I
1 4 S ↳ 7 8
2 3 q 18
Before (B):19 18 q 8 7 12 16 72 14 18
After (A):17 24 12 H 7 15 14 IS 16 24
d = B - A: 2, -6, -3, 4, 0, -3, -3, -3, 3, -6
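The two samples have effectively been reduced to one, by taking the differences, so we can perform a one-sample t-test.
1.) $H_0: \mu_d = 0$
2.) $H_1: \mu_d > 0$
3.) $\alpha = 0.05$
4.) If the number of dizzy spells has reduced while on the medicine, d = B - A will be a positive score. Reject $H_0$ if the test statistic > $t^{0.05}_9 = 1.833$
5.) $t_9 = \frac{\bar{d}}{s_d/\sqrt{n}} = -1.33 < 1.833$
6.) -1.33 does not fall in the rejection region, so we do not reject $H_0$: on this evidence the medicine isn't effective at reducing dizzy spells.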
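The paired test on the dizzy-spell differences can be recomputed from the d = B - A row. The list below is that row as read from the notes, with one illegible entry taken as 0 (an assumption; both readings for that patient appear to be 7); the critical value 1.833 is the one quoted in the notes.

```python
import math
import statistics

# Differences d = before - after for the 10 patients, as read from the notes
# (the fifth entry is illegible in the source and is assumed to be 0).
d = [2, -6, -3, 4, 0, -3, -3, -3, 3, -6]

n = len(d)
mean_d = statistics.mean(d)     # average difference
sd_d = statistics.stdev(d)      # sample standard deviation of the differences

# One-sample t statistic on the differences, with n - 1 = 9 degrees of freedom.
t = mean_d / (sd_d / math.sqrt(n))

critical_value = 1.833          # t table value for 9 degrees of freedom at 5%, from the notes
reject_h0 = t > critical_value  # one-sided test, H1: mean difference > 0

print(mean_d, round(sd_d, 2), round(t, 2), reject_h0)
```

Working with the differences removes the patient-to-patient variation, which is why the paired data reduce to a single sample.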
As already mentioned, in this example we appear to have two sets of data - the "before" data and the "after" data. But were these really two separate samples? If we had randomly selected 10 patients for "before" and then a different 10 randomly selected patients for "after" then the answer would be yes and the samples would be independent of one another. But, since dizziness varies from patient to patient, we looked at the same 10 patients and took two readings from each individual. Thus our two sets of data were dependent and to
two methods?
Pair I 2 -.. 9 10
&:I - II 82 -
152 -
36 -
118
relevant summary statistics: $\bar{d} = -40$, $s_d = 71.6$, n = 10 pairs
Once again we assume that this is a random representative sample from the population of differences and that the differences in reading speed are normally distributed with mean $\mu_d$.
1) $H_0: \mu_d = 0$
2.) $H_1: \mu_d \ne 0$
3.) $t_9 = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{-40}{71.6/\sqrt{10}} = -1.77$
4) the p-value for this statistic lies between 0.1 and 0.2
intervals
Erence
As before, It is also
possible to construct confidence intervals for these
Nd tree,
the 95% confidence interval be constructed
can
by:
Pr)-t, at non
0.025
S 0.9 5
=
so
Pr
(d -
+,.Nd(d +
+
i=.2) 0.99
=
Ca -
+ i... ;a+ti
Comparing two independent
sample means (variance estimated
- (Ne
Tr
I =
NN
But, unfortunately, statisticians can show that this quantity does not have the t-distribution. In order to find a test statistic which does
Two varieties of
wheat tested Twelve
being in a
country. test
plots are
given
identical treatment. Six plots sown with 1 and other six
preparatory are
variety
with Scientists level.
variety 2. hope to determine
significant difference, 5%
1:1.5
variety 1.9 1.2 1.4 2.3 1.3 tons
per plot
tons
Variety 2: 1.6 1.8 2.O 1.8 2.3
per plot
, 1.60
=
31 0.4 2
= b
n =
522 1.90
=
32 =
0.2 7 12 5
=
0
N1 Nz
2.) H1:N1-N2 F0 (a two-tailed test)
3) level:5%
significance
4.? 5.) Before we find the
rejection region we need to know the
"degrees
of freedom". The procedure is rather different from that in the test
Instead
when the
population variances were known. of
working
with the individual variances on and on we assume that both
populations have the same variance and we pool the two individual
therefore, "=A
4x(0.27)" +
6
+
5 -
2
=0.13
n nz
+ -
z =
9
of
tg, from the table is 2.262. So we to the
if
reject
observed to-value is less than -2.262 than 2.242.
or
greater
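Two-sample t-test:
$t_9 = \dfrac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = -1.372$, where $s^2$ is the pooled variance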
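The pooled-variance calculation for the wheat example can be sketched as follows. It assumes the summary statistics as read from the notes (means 1.60 and 1.90, standard deviations 0.42 and 0.27, sample sizes 6 and 5); the variable names are illustrative only.

```python
import math

# Summary statistics as read from the notes
x_bar_1, s_1, n_1 = 1.60, 0.42, 6   # variety 1
x_bar_2, s_2, n_2 = 1.90, 0.27, 5   # variety 2

# Pooled estimate of the common variance
df = n_1 + n_2 - 2
pooled_var = ((n_1 - 1) * s_1**2 + (n_2 - 1) * s_2**2) / df

# Two-sample t statistic under H0: mu_1 - mu_2 = 0
std_error = math.sqrt(pooled_var * (1 / n_1 + 1 / n_2))
t_stat = (x_bar_1 - x_bar_2) / std_error

# Two-tailed rejection region at the 5% level, using the table value for 9 df
t_crit = 2.262
reject_h0 = abs(t_stat) > t_crit

print(df, round(pooled_var, 2), round(t_stat, 3), reject_h0)
```

With these inputs the statistic falls between -2.262 and 2.262, so the sketch does not reject $H_0$ at the 5% level.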
6.) Because -
die
for eachoutcome a
Expebability role
of
mig 10 10 10 10 % o
We compare what we observe in a sample of data
1 2 3 456
Outcomes
Observed no. 7 9 7, 9 30
come
Probability of outcome: 1/6 for each outcome
Expected no. of outcomes: $\frac{1}{6} \times 60 = 10$ for each outcome
C2 1,2,3,4,5,6
=
O
=
elsewhere
3:2 0.0 3
=
Step
CHI-SQUARED DISTRIBUTION: TABLE 3, pg 327
6 - 1 = 5 degrees of freedom
$D^2 = \sum \dfrac{(\text{observed} - \text{expected})^2}{\text{expected}} = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i}$, where k = number of comparisons
would suggest you are observing something quite different to what you expect
· if the test statistic is small, it would not constitute
.. * 14.8
=
You need an expected value of at least 5 in each category for a chi-squared test
"goodness of fit tests", "tests of association in contingency tables", and "tests and confidence intervals for the sample variance". Although the rationale
distribution.
any specified
Example: In the printing industry, misprints are thought to occur at random (independently of one another). Thus the number of misprints can be expected to obey a Poisson distribution with some parameter $\lambda$. To test this, 200 pages are examined and the number of misprints on each page are noted. A 5% significance level is used.
Number of misprints    Observed number of pages
0                      43
1                      69
2                      53
3                      21
4                      8
5 or more              6
In order to test whether the Poisson distribution fits this data, the first problem is to decide what value to use for the parameter $\lambda$. Let us first consider the test of the null hypothesis that the data can be thought of as a sample from a Poisson distribution with $\lambda = 1.2$.
1) $H_0$: The distribution is Poisson with $\lambda = 1.2$
2) $H_1$: The distribution is not Poisson with $\lambda = 1.2$
3) Significance level: 5%
4) + 5) First we need to compute the expected (theoretical) frequencies, assuming that the null hypothesis is true. If misprints are occurring in this manner, the probability that a page contains $x$ misprints is
$p(x) = \dfrac{e^{-1.2} \, 1.2^x}{x!}$
The probability of no misprints is $p(0) = e^{-1.2} = 0.3012$.
A sample of 200 pages can therefore be expected to have 60.24 pages with no misprints (30.12% of 200). This theoretical frequency is to be compared with the observed frequency of 43.
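Similarly:
Misprints x   Observed   p(x)                                    Expected
0             43         $e^{-1.2} = 0.3012$                     60.24
1             69         $1.2 \times e^{-1.2}/1! = 0.3614$       72.28
2             53         $1.2^2 \times e^{-1.2}/2! = 0.2169$     43.38
3             21         $1.2^3 \times e^{-1.2}/3! = 0.0867$     17.34
4             8          $1.2^4 \times e^{-1.2}/4! = 0.0260$     5.20
5 or more     6          $1 - \sum_{x=0}^{4} p(x) = 0.0078$      1.56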
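The expected frequencies in the table can be reproduced with a short sketch. It assumes $\lambda = 1.2$ and 200 pages as in the notes; the variable names are illustrative, and unrounded probabilities are used, so the last decimal may differ slightly from the hand calculation.

```python
import math

lam = 1.2       # hypothesised Poisson parameter from the notes
n_pages = 200   # number of pages examined

# Poisson probabilities p(x) for x = 0, 1, 2, 3, 4
probs = [math.exp(-lam) * lam**x / math.factorial(x) for x in range(5)]

# The last category is "5 or more", so it takes the remaining probability
probs.append(1 - sum(probs))

# Expected (theoretical) frequencies under H0
expected = [n_pages * p for p in probs]

for label, p, e in zip(["0", "1", "2", "3", "4", "5 or more"], probs, expected):
    print(label, round(p, 4), round(e, 2))
```

Grouping the tail into "5 or more" makes the probabilities sum to 1, so the expected frequencies sum to 200.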
Even if $H_0$ is true, we anticipate that the observed and expected frequencies will not be exactly equal. We would clearly like to reject $H_0$, however, if the differences are "too large".
We need to find a test statistic which is a function of the differences between the observed and expected frequencies: the sum of the squared differences, each divided by its expected frequency.
$D^2 = \sum_{i} \dfrac{(O_i - E_i)^2}{E_i}$
The statistic $D^2$ has approximately a chi-squared ($\chi^2$) distribution. We reject $H_0$ if $D^2$ exceeds the 5% point of the $\chi^2_5$ distribution.
From the table $\chi^2_5(0.05) = 11.071$
If $D^2$ exceeds 11.071 then the observed and expected frequencies are "too far" apart to be explained by chance.
$D^2 = \sum_{i} \dfrac{(O_i - E_i)^2}{E_i} = 22.17$
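A minimal sketch of the goodness-of-fit calculation for the misprint data is given below. It assumes the observed counts and $\lambda = 1.2$ from the notes and the table value 11.071 for 5 degrees of freedom. Because it uses unrounded expected frequencies, the result is close to, but not exactly, the hand-calculated 22.17.

```python
import math

lam = 1.2
observed = [43, 69, 53, 21, 8, 6]   # pages with 0, 1, 2, 3, 4, 5-or-more misprints
n_pages = sum(observed)

# Expected frequencies under H0: Poisson with parameter 1.2
probs = [math.exp(-lam) * lam**x / math.factorial(x) for x in range(5)]
probs.append(1 - sum(probs))        # "5 or more" category
expected = [n_pages * p for p in probs]

# Chi-squared goodness-of-fit statistic
d_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 5% point of the chi-squared distribution with 5 degrees of freedom
critical_value = 11.071
reject_h0 = d_squared > critical_value

print(round(d_squared, 2), reject_h0)
```

The statistic is well above 11.071, so the sketch rejects the hypothesis that the data come from a Poisson distribution with $\lambda = 1.2$.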
distribution with R =
1.2
Remember D has x"distribution the expected
approximately a of
③ significance level 5%
(5xx) 300 =
misprints. 300 in
Degrees of freedom: $k - d - 1$
Notice that goodness of fit tests are intrinsically one-sided: we reject $H_0$ if $D^2$ is too large. If $D^2$ is small, it means that the distribution specified by $H_0$ fits the data very well.
0.2
...
1001 i
-
=Sc...it
=1.01
o sz.dis'o
6) Because 1.01 lies in the acceptance region, we can't reject $H_0$. It is
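$D^2 = \sum \dfrac{(O_i - E_i)^2}{E_i} = \sum \dfrac{O_i^2}{E_i} - 2\sum O_i + \sum E_i$
But $\sum O_i = \sum E_i = n$ (summing over the $k$ cells).
$\therefore D^2 = \sum \dfrac{O_i^2}{E_i} - n$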
Video 23: Testing for an association between categoric variables
female 40 12 4
IS 10 S
male
->
Example: Die tossing with friend
3) $\alpha = 0.05$
4) Our test statistic ($D^2$) will compare observed values with expected frequencies and follow a chi-squared distribution. (We are assuming that $H_0$ is true and that the factors are independent.)
To calculate the correct degrees of freedom:
((no. of rows) - 1) × ((no. of columns) - 1) = (2 - 1) × (6 - 1) = 5
Go to the chi-squared distribution table: critical value 11.070
We reject $H_0$ if $D^2 > \chi^2_5(0.05) = 11.070$
Remember: If A and B are independent, then $\Pr(A \cap B) = \Pr(A) \cdot \Pr(B)$
If the two variables are independent: $\Pr(A_i \cap B_j) = \dfrac{\text{row total}}{n} \times \dfrac{\text{column total}}{n}$
Assuming no association between ownership of die and outcome:
Outcome of toss of die
          1    2    3    4    5    6    Total
You       11   9    6    15   9    10   60
Friend    3    5    5    4    6    17   40
Total     14   14   11   19   15   27   100
0.004
die
your being
-R
60% of the die rolls were with your die. There's a total of 14 1's rolled. 60% of 14 = 8.4
$E_{ij} = \dfrac{\text{row total} \times \text{column total}}{n}$, with $n = 100$
$D^2 = \sum \dfrac{O^2}{E} - n = 11.03$
=
∴ At the 5% level, we do not have evidence to reject $H_0$, and so must
Ale is rolled.
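The test of association for the die-tossing example can be sketched as follows. It assumes the observed counts as reconstructed from the row and column totals in the notes (you: 11, 9, 6, 15, 9, 10; friend: 3, 5, 5, 4, 6, 17) and the table value 11.070 for 5 degrees of freedom. The variable names are illustrative.

```python
# Observed counts: rows = whose die (you, friend), columns = outcomes 1..6
observed = [
    [11, 9, 6, 15, 9, 10],   # you
    [3, 5, 5, 4, 6, 17],     # friend
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency in each cell under independence:
# (row total x column total) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Shortcut form of the statistic: sum of O^2 / E, minus n
d_squared = sum(
    o**2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
) - n

# Degrees of freedom: (rows - 1) x (columns - 1)
df = (len(observed) - 1) * (len(col_totals) - 1)

critical_value = 11.070   # 5% point of chi-squared with 5 df, from the table
reject_h0 = d_squared > critical_value

print(round(expected[0][0], 1), round(d_squared, 2), df, reject_h0)
```

The statistic lands just below the critical value, which is why the p-value is only marginally above 0.05.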
Modified approach
Chi-squared table: for $\alpha = 0.1$, $\chi^2_5 = 9.236$, and $D^2 = 11.03 > 9.236$, therefore p-value < 0.1
p-value = 0.050796
Inverse of chi-squared right-tailed:
Critical value in Excel -> =CHISQ.INV.RT(significance, df)
=CHISQ.INV.RT(0.05, 5) = 11.0705
=CHISQ.INV.RT(0.050796, 5) = 11.02962
variables associated.
categorical are
The test statistic measures the difference between what we observe and
two variables:
Are eye and hair colour independent?
Is there a tendency for eye colour to be related to hair colour?
DEGREES OF FREEDOM for tests of association
The distribution has degrees of freedom $= (r - 1)(c - 1)$
Video 24: Testing for a linear relationship between two variables
$y = a + bx$
· y is the dependent variable
· x is the independent variable
· a and b are regression coefficients
Is there a linear relationship between x and y?
evidence correlation
of measured in a sample to infer that
the
of population.
[Figure: scatterplots showing correlations ranging from -1 through 0 to +1. Low negative correlation: as x increases, y decreases. Low positive correlation: as x increases, y increases.]
the
higher the value of r (or R2), the closer (stronger) the
y Bx
I
line
slope straight
of
Parameters
e
statistics
-
$\alpha$ is the intercept parameter, not to be confused with the level of significance
is no cc
y
So we can use this as a basis for testing whether or not there is a linear relationship between x and y.
Step 1 -> $H_0: \beta = 0$ (we assume no relationship)
Step 2 -> $H_1: \beta \neq 0$ or $\beta > 0$ or $\beta < 0$
Step 3 -> $\alpha = 0.01$
Step 4 -> Rejection region under assumption of $H_0$: test statistic $\sim t_{n-2}$, where n = the number of pairs of observations
Step 5 -> test stat = (statistic (b) - hypothesised parameter ($\beta$)) / (standard error of b)
Step 6 -> If test stat falls in rejection region: reject $H_0$
modified
A pack
1) $H_0: \beta = 0$
2 x10 0
=
$\hat{y} = 3.0635 + (0.8211)(50) = 44.12$
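The prediction above is a direct substitution into the fitted line. A minimal sketch, assuming the coefficients a = 3.0635 and b = 0.8211 from the notes and a hypothetical helper named `predict`:

```python
# Fitted regression coefficients as given in the notes
a = 3.0635   # intercept
b = 0.8211   # slope


def predict(x):
    """Predicted value of y on the fitted line y = a + b*x (illustrative helper)."""
    return a + b * x


y_hat = predict(50)
print(round(y_hat, 2))
```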
Regression analysis
Correlation: Is there really a relationship between two variables?
Regression: How do we predict values for one variable, given particular values
Regression analysis
· ALWAYS one dependent variable and AT LEAST ONE independent variable
e.g. Years of education and income. Income is the dependent variable and years of education is the independent variable. Income depends on years of education but NOT conversely.
· Regression analysis involves using years of education to predict income.
Example
Person:     1   2   3   4   5
Education:  12  15  8   10  17
[Scatterplot: income (vertical axis) against years of education (horizontal axis)]
Income is (at least to some extent) dependent on years of education. Income is therefore the response variable and is plotted on the vertical (y) axis.
the
dependent and independent variables.
· Scatterplots can be used as a preliminary method for identifying relationships between two variables.
is ite ......
the form $y = a + bx$
y is the dependent variable and the variable x is independent
· a and b are chosen so that the regression line "best fits" the observed data.
Least squares
· The constants a and b are chosen to minimize the sum of squared residuals.
· A residual (e) is defined as the difference between the actual y and $\hat{y}$ (the predicted value of y).
$e = y - \hat{y}$
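A minimal sketch of the least squares idea follows. The years of education are taken from the example in the notes, but the income values are hypothetical, made up purely for illustration because the notes' income figures did not survive. The slope and intercept use the usual closed-form least squares formulas.

```python
# Years of education from the notes; the incomes are HYPOTHETICAL values
education = [12, 15, 8, 10, 17]
income = [30, 50, 10, 28, 60]   # hypothetical, for illustration only

n = len(education)
x_bar = sum(education) / n
y_bar = sum(income) / n

# Closed-form least squares estimates of slope (b) and intercept (a)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(education, income))
s_xx = sum((x - x_bar) ** 2 for x in education)
b = s_xy / s_xx
a = y_bar - b * x_bar


def sum_squared_residuals(intercept, slope):
    """Sum of squared residuals e = y - y_hat for a candidate line."""
    return sum((y - (intercept + slope * x)) ** 2
               for x, y in zip(education, income))


# One residual per pair of observations
residuals = [y - (a + b * x) for x, y in zip(education, income)]
sse = sum_squared_residuals(a, b)

print(round(a, 3), round(b, 3), round(sse, 3))
```

Any other choice of intercept or slope, for example `sum_squared_residuals(a + 0.5, b)`, gives a larger sum of squared residuals than the fitted line.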
· If we have n pairs of observations then you'll have n residuals.
Sum of squared residuals: $\sum e_i^2 = e_1^2 + \dots + e_n^2$
Correlation Analysis
· A measure of the linear relationship between x and y
· It is a numerical measure of the strength and direction of a linear relationship
1 r < 0
: .
r > 0 r
+
=
1
Coefficient of Determination
· The square of the correlation coefficient (r) is known as $R^2$
· The coefficient of determination, $R^2$, is the proportion of the variation in y that is explained by the variation in x.
proportion of in
other than X.
The closer $R^2$ is to 1, the better the regression model fits the data. The closer it is to 0, the poorer the fit.
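The link between r, $R^2$ and "proportion of variation explained" can be illustrated with a short sketch. As before, the education values come from the notes while the income values are hypothetical. The sketch computes r directly and then checks that $r^2$ equals 1 minus the ratio of unexplained to total variation.

```python
import math

# Years of education from the notes; the incomes are HYPOTHETICAL values
education = [12, 15, 8, 10, 17]
income = [30, 50, 10, 28, 60]   # hypothetical, for illustration only

n = len(education)
x_bar = sum(education) / n
y_bar = sum(income) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(education, income))
s_xx = sum((x - x_bar) ** 2 for x in education)
s_yy = sum((y - y_bar) ** 2 for y in income)   # total variation in y

# Correlation coefficient and coefficient of determination
r = s_xy / math.sqrt(s_xx * s_yy)
r_squared = r**2

# Same quantity via the fitted line: 1 - (unexplained variation / total variation)
b = s_xy / s_xx
a = y_bar - b * x_bar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(education, income))
explained_proportion = 1 - sse / s_yy

print(round(r, 3), round(r_squared, 3), round(explained_proportion, 3))
```

The two routes agree for a straight-line fit with an intercept, which is why $R^2$ can be read as the proportion of variation in y explained by x.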
·
If r is close to +1 or -1, $R^2$ will be close to 1 and if r is close
·
R2 =
interest.
population parameters of
Caution!
· A correlation coefficient close to +1 or -1, or a significant slope coefficient b, does not in itself prove a causal relationship.
· For example, one may observe a strong positive correlation between the number of drownings at a beach and the sales figures of the ice-cream vendor -
logical considerations).
END OF SYLLABUS