STA1000F Notes

Work Unit 1: Introducing probability (February 2023)

Random experiment: a procedure whose outcome (result) in a particular performance (trial) cannot be predetermined.

Fair die: equal likelihood of landing on any side.
↳ If dice didn't have this property we couldn't use them as random number generators.

Pr(an outcome) = (number of equally likely ways of getting the outcome) / (number of possible equally likely outcomes)

Pr(6) = 1/6
Pr(even) = 3/6

Odds: reflect the number of ways that give you the event of interest relative to the number of ways that don't give you the event of interest.
e.g. Pr(6) = 1/6, but odds of (6) = 1/5

Random variation -> implies we won't hit a six exactly once in every 6 throws.

Statistical distributions
In this course we will be considering a range of different statistical distributions, which can be applied to different practical situations.

Excel formula: =RANDBETWEEN(start of interval, end of interval), e.g. =RANDBETWEEN(1,6)

A fair gambling game: no one is expected to win and no one is expected to lose in the long run.
You bet R1 on a number and bet 6 times: 6 x R1 = R6.
Expectation: 1 win in every 6 throws.
For this game to be fair, the payout would need to be R6 for every winning R1 bet.
Practically speaking, that is not a good idea: you would make no money for the local community.
Therefore, one bet costs R1, but a win pays R5.50 instead of R6 (keep 50c for the community).
That retained portion is referred to as the house advantage.

percentage payout = (win for a winning number) / (amount to be bet across all numbers) x 100%
= 5.50/6 x 100 = 91.67%
:. house advantage = 8.33%
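A minimal Python sketch (my addition, not part of the notes) of this game: bet R1 on a single number each round, return R5.50 on a win, and check that the long-run payout percentage settles near 91.67%.

```python
import random

def payout_percentage(n_bets=1_000_000, win_return=5.50, seed=1):
    """Simulate betting R1 on the number 6 each round of a fair die."""
    rng = random.Random(seed)
    total_bet = total_returned = 0.0
    for _ in range(n_bets):
        total_bet += 1.0                     # each round costs R1
        if rng.randint(1, 6) == 6:           # same idea as =RANDBETWEEN(1,6)
            total_returned += win_return     # winner is returned R5.50 (a fair game would return R6)
    return 100 * total_returned / total_bet

print(payout_percentage())   # ~91.67%, i.e. a house advantage of ~8.33%
```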

CH 2: Set theory

Why do we have to do set theory? One of Murphy's Laws states that before we can do anything, we have to do something else. Before we can do "statistics" we have to do "probability theory", and for that we need some "set theory".

Definition of sets:
We define a set A to be a collection of distinguishable objects or entities.
The set A is determined when we can either (a) list the objects that belong to A, or (b) give a rule by which we can decide whether or not a given object belongs to A.

A = {e, f, g}   the letters e, f, g belong to the set A
B = {x | 1 <= x <= 10}   the set B consists of all real numbers x such that x is larger than or equal to 1 but less than or equal to 10.

e ∈ A   e is an element of A
e ∉ B   e is not an element of B

C = {1, 3, 5, a} and D = {a, 1, 5, 3}
C = D: the order in which we list the elements of a set is irrelevant.

E = {a, b, c, a} and F = {a, b, c}
E = F: the set E contains only the distinguishable elements a, b, c.

Subsets
Suppose we have two sets G and H, and that every element of G also belongs to H. Then we say that "G is a subset of H" and we write G ⊂ H. We can also write H ⊃ G and say "H contains G".
If some element in G does not also belong to H, we write G ⊄ H and say "G is not a subset of H".

Let G = {1, 3, 5}, H = {1, 3, 5, 9} and J = {1, 2, 3, 4, 5}
:. G ⊂ H, H ⊄ J, G ⊂ J

Note: the notation ⊂, ⊃ for sets is analogous (comparable) to the notation <, > for ordinary numbers. The "round end" of the subset notation tells which of the sets is "smaller" (the same way the "pointed end" shows which number is smaller).

If H ⊂ G and G ⊂ H, then obviously H = G.

Intersections
Suppose L = {a, b, c} and M = {b, c, d}. Then L ⊄ M and M ⊄ L. But if we consider the set N = {b, c}, then we see that N ⊂ L and N ⊂ M, and that no other set with this property has N as a subset. This leads us to the idea of intersection.
The intersection of any two sets is precisely the set that contains those elements which belong to both sets. For the sets L, M, N above, we write N = L ∩ M and read this "N equals L intersection M". The intersection of two sets L and M can be thought of as the set containing those elements which belong both to L and to M.

The empty set, mutually exclusive sets...

What happens if L = {a, b, c} and R = {d, e, f}? If we want L ∩ R to be a set, we must introduce a new concept, the empty set, the set that has no members. This is a sensible concept: consider the set of English-speaking fish, or consider the set of real numbers whose square is negative. We reserve the symbol ∅ to denote the empty set. We use this symbol for no other purpose. We write L ∩ R = ∅ and read this as "the intersection of sets L and R is the empty set".
Pairs of sets whose intersection is the empty set are said to be mutually exclusive sets (or disjoint sets). Thus L and R are mutually exclusive.

The universal set, the sample space...

Another reserved symbol is the letter S. It is used for the set containing all objects under consideration. Thus if, in a particular problem, the only objects of interest are the colours of traffic lights, then S = {red, amber, green}.
The set S is known to mathematicians as the universal set. In statistical jargon the set S is called the sample space.

Unions

The concept of union contrasts with the concept of intersection. The union of two sets A and B is the set that contains the elements that belong to A or B. Here we use the word "or" in an inclusive sense - we do not exclude from the union those elements that belong to both A and B.

If A = {1, 2, 3} and B = {2, 3, 4, 5} then the union of A and B is {1, 2, 3, 4, 5}.
↳ we say {1, 2, 3, 4, 5} equals A union B, written A ∪ B.
Complements
Given the sample space S, we define the complement of a set A to be the set of elements of S which are not in A. The complement of A is written Ā, and is always relative to the sample space S.

If S = {1, 2, 3, 4, 5, 6}, A = {1, 3, 5} and B = {2, 4, 6}, then Ā = {2, 4, 6}.
We write Ā = B: "the complement of A equals B", or "A complement equals B".

If S = {x | 0 <= x <= 1} and D = {x | 0 < x < 1}, find D̄.
Because the set D excludes the endpoints of the interval from zero to one,
D̄ = {0; 1}

Venn diagrams
Think of all the "points" in the rectangle as being the sample space S, and all the points inside the circles as the sets A and B respectively.
The shaded area in the diagram, where the circles overlap, represents A ∩ B, the set of points belonging to both A and B. Similarly the space of both individual circles and their overlap represents A ∪ B, the set of points belonging to A or B.

* It is helpful to associate "intersection" with "and" and "union" with "or".

[Sketches: Venn diagrams shading A ∩ B, A ∪ B, Ā ∩ B, A ∩ B ∩ C, and a pair of non-overlapping (mutually exclusive) circles.]

.....................
A = {x | 1 <= x <= 6} and B = {1, 2, 3, 4, 5, 6}
A ≠ B. Why? B does not contain all the elements that A does:
A also contains 1.2, 1.9 and 2.5, etc.

Sample space
the set of all possible outcomes of a random experiment
(in a Venn diagram, the outer rectangle)

Empty set
a set which has no elements, ∅

Intersection
the intersection of two sets contains those elements common to both

Unions
the union of sets A and B contains those elements that are either in A, or in B, or in both A and B

Mutually exclusive
two sets are mutually exclusive when their intersection is the empty set

Pairwise mutually exclusive and exhaustive sets
sets that carve up the sample space (S) into a partition, with no elements of S left out

Complement
the complement of a set contains all elements that are not in that set.
CH 3: Probability theory
An event is defined to be any subset of the sample space S.
The empty set is the set containing no elements.
∅ and S are both subsets of S and thus events. (∅ is called an impossible event and S is called a certain event.)
An elementary event is an event with exactly one member.

Kolmogorov's axioms of probability
Suppose that S is the sample space for a random experiment. For all events A ⊂ S, we define the probability of A, denoted Pr(A), to be a real number with the following properties:
① 0 <= Pr(A) <= 1 for all A ⊂ S
② Pr(S) = 1
③ If A ∩ B = ∅ (i.e. if A and B are mutually exclusive events) then Pr(A ∪ B) = Pr(A) + Pr(B).
-> by consequence, Pr(∅) = 0

Functions usually look like y = f(x): you put in a number (x), you get another number (y) out.
Pr(A) is a new kind of function: you put a set (A) in, and out pops a number between zero and one.

Relative frequency and probability
We start by doing n trials of the random experiment and counting the number of times, r, that some event A ⊂ S occurs during the n trials.
Then we define r/n to be the relative frequency of the event A.
Obviously, 0 <= r/n <= 1. Thus relative frequencies and probabilities both lie between zero and one.
We can think of the probability of the event A as the relative frequency of A as the number of trials of the random experiment gets very large.
As the number of trials increases, the relative frequency tends to get closer and closer to the "true" probability.
Some useful theorems...

Theorem 1 -> Let A ⊂ S. Then Pr(Ā) = 1 - Pr(A)
proof -> We write S as the union of two mutually exclusive events:
S = A ∪ Ā and A ∩ Ā = ∅
Because A and Ā are mutually exclusive, we can use axiom 3 to state
Pr(A ∪ Ā) = Pr(A) + Pr(Ā)
but A ∪ Ā = S, and Pr(S) = 1 by Kolmogorov's 2nd axiom,
so Pr(A ∪ Ā) = 1. Therefore Pr(A) + Pr(Ā) = 1
and Pr(Ā) = 1 - Pr(A).

Theorem 2 -> If A ⊂ S and B ⊂ S then Pr(A) = Pr(A ∩ B) + Pr(A ∩ B̄)
proof -> Write A as the union of two mutually exclusive sets:
A = (A ∩ B) ∪ (A ∩ B̄). Clearly, (A ∩ B) ∩ (A ∩ B̄) = ∅,
therefore, using axiom 3, Pr(A) = Pr(A ∩ B) + Pr(A ∩ B̄).
Notice that theorem 2 can also be expressed as
Pr(A ∩ B̄) = Pr(A) - Pr(A ∩ B)

Theorem 3: the addition rule -> For arbitrary events A and B,
Pr(A ∪ B) = Pr(A) + Pr(B) - Pr(A ∩ B)
proof -> Write A ∪ B as the union of two mutually exclusive sets:
A ∪ B = B ∪ (A ∩ B̄)
Because B and A ∩ B̄ are mutually exclusive, we can again apply axiom 3 and say Pr(A ∪ B) = Pr(B) + Pr(A ∩ B̄). But, by theorem 2, Pr(A ∩ B̄) = Pr(A) - Pr(A ∩ B); substituting gives the result.

Theorem 4 -> If B ⊂ A, then Pr(B) <= Pr(A).
proof -> If B ⊂ A then we can write A as the union of two mutually exclusive sets, A = B ∪ (A ∩ B̄), and
Pr(A) = Pr(B) + Pr(A ∩ B̄) >= Pr(B)
because Pr(A ∩ B̄) >= 0, as all probabilities are non-negative.

Theorem 5 -> If A1, A2, ..., An are pairwise mutually exclusive,
i.e. Ai ∩ Aj = ∅ for i ≠ j,
then Pr(A1 ∪ A2 ∪ ... ∪ An) = Pr(A1) + Pr(A2) + ... + Pr(An)
or, in more concise notation,
Pr(∪ i=1..n Ai) = Σ i=1..n Pr(Ai)
Equally probable elementary events

In theory, all probability problems can be solved using theorem 5. The elementary events whose union makes up the sample space S are always mutually exclusive, because if one elementary event occurs, no other elementary event occurs. Therefore, if we knew the probabilities of all the elementary events, we would also be able to compute the probability of any event in S. By theorem 5, the probability of any event is simply the sum of the probabilities of the elementary events that make up the event.

There is a wide class of problems for which we do know the probabilities of all the elementary events in a sample space. These are the problems for which it is reasonable to assume that all the elementary events are equally likely. If there are N elementary events contained in S, and each one has the same probability of occurring, then the probability of each and every elementary event must be 1/N.

Equally probable elementary events occur in many games of chance. When a card is drawn from a pack of 52 playing cards, the probability of any particular card is assumed to be 1/52. Let A be some event in this scenario. Then A must consist of the union of elementary events, each of probability 1/N. If we could determine the number of elementary events contained in A, then we could write

Pr(A) = (number of elementary events contained in A) / N

Permutations and combinations

In problems in which all elementary events are equally probable, it is usually impractical to list and count all the elementary events contained in the sample space or in the event of interest. The theory of permutations and combinations comes to the rescue, and the counts can be determined quite easily. The theory is summarized in "counting rules".

Permutations of n objects
Consider the number of different ways in which all the objects in a set may be arranged in order; the order in which the objects are listed matters.
A set containing n distinguishable objects has
n x (n-1) x ... x 3 x 2 x 1 = n!   ("n factorial")
different orderings of the objects belonging to the set. We can see this by thinking in terms of having n slots to fill with the n objects in the set. Each slot can hold one object. We can choose an object for the first slot in n ways; there are then n-1 objects available for the second slot, so we can select an object for the second slot in n-1 ways, leaving n-2 objects available for the third slot, ..., until the last remaining object has to be placed in the final slot.
We say that there are n! distinct arrangements (technically, we call each arrangement or ordering a permutation) of the n objects in the set.

Permutations of n objects taken r at a time
Suppose we have a set containing n objects, and that we have r (0 < r <= n) slots to fill. In how many ways can we do this, assuming that each object is "used up" once it is allocated to a slot? We number the slots 1 to r and fill each in turn. We can choose any of n objects to fill the first slot. Having filled the first slot there are n-1 objects available, any of which may be chosen for the second slot. Therefore, the first two slots can be filled in n(n-1) ways. The first three slots can be filled in n(n-1)(n-2) ways.
By the time we have filled the (r-1)th slot and are ready for the rth slot, we have used r-1 members of our set and therefore have n-(r-1) = n-r+1 members left to choose from. Hence, the r slots can be filled in
n(n-1) x ... x (n-r+1)
= [n(n-1) x ... x (n-r+1) x (n-r)(n-r-1) x ... x 3 x 2 x 1] / [(n-r)(n-r-1) x ... x 3 x 2 x 1]
= n!/(n-r)! ways.

Thus there are n!/(n-r)! ways of ordering r elements taken from a set containing n elements, using each element at most once. Note that we are here involved in two processes, (a) choosing objects and (b) arranging them. The number of ways of choosing and arranging r objects out of n distinguishable objects is called the number of permutations of n objects taken r at a time, and is denoted by (n)r ("n permutation r"):

(n)r = n!/(n-r)!
This formula is also valid for r = n if we adopt the convention that 0! = 1.

Combinations of n objects taken r at a time...

Now we want to count the number of ways of choosing r elements out of the n elements in our set without regard to the arrangement of the chosen elements (i.e. we want to determine the number of r-element subsets that we can form). We call this the number of combinations of n objects taken r at a time, and denote it by (n choose r) ("n combination r").

When we found the number of permutations of n objects taken r at a time, we divided the process into two operations - choosing the objects and then arranging them. We are now interested only in the first operation. We recall that a subset having r objects can be arranged in r! permutations. A little reflection will convince you that there are therefore r! times more permutations than combinations:

(n choose r) = (n)r / r! = n! / (r!(n-r)!)

Permutations, with repetitions...
Now we have n types of objects and r slots, and we have at least r objects of each type available. We can thus fill the first slot with any of the n types of objects; there are still n types of objects available for the second slot, because there are at least r objects of each type; ...; there are still objects of each of the n types available for the final, rth slot. Thus the number of permutations of n types of objects taken r at a time, allowing repetitions, is
n x n x ... x n = n^r

Combinations with repetitions
We have n types of objects, with at least r available of each type. The number of selections of r objects, allowing repetitions, is given by
(n + r - 1 choose r)
The proof of this result is included as an exercise at the end of the chapter.

Counting rules
Rule 1: the number of distinguishable arrangements of n distinct objects, not allowing repetitions, is n!
Rule 2: the number of ways of ordering r objects chosen from n distinct objects, not allowing repetitions, is
(n)r = n!/(n-r)!
Rule 3: the number of ways of choosing a set of r objects from n distinct objects, not allowing repetitions, is
(n choose r) = n!/(r!(n-r)!)
Rule 4: the number of permutations of r objects, chosen from n distinct objects, allowing repetitions, is n^r

Permutations vs combinations
Pr(A) = (number of equally likely elementary events in A) / (number of equally likely elementary events in S)
What is a permutation? An arrangement of objects where order does matter.
What is a combination? A selection of objects where order does not matter.
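The counting rules above map directly onto Python's standard library; a small check (my addition, not from the notes; math.perm and math.comb need Python 3.8+):

```python
from math import factorial, perm, comb

n, r = 6, 3
print(factorial(n))        # Rule 1: n! orderings of n distinct objects        -> 720
print(perm(n, r))          # Rule 2: (n)r = n!/(n-r)! ordered choices          -> 120
print(comb(n, r))          # Rule 3: nCr = n!/(r!(n-r)!) unordered choices     -> 20
print(n ** r)              # Rule 4: n^r ordered choices with repetition       -> 216
print(comb(n + r - 1, r))  # combinations with repetition: (n+r-1 choose r)    -> 56
```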
Video 5: conditional probability and independent events

What is conditional probability? It is a method of updating our knowledge of the probability of an event when we are provided with new information concerning another event, and this other event might or might not occur.

Pr(A|B) = the probability of event A occurring if event B has occurred.

If A and B are unrelated or independent, Pr(A|B) = Pr(A), and
Pr(A ∩ B) = Pr(A) Pr(B)

When two events are unrelated or independent THEN the probability of their intersection is equal to the product of their individual probabilities.

Conditional probability

Provides a method for updating or revising probabilities in the light of new information.
In statistical jargon, we say a weather forecast is conditional on the information available up to that point in time.

Let A and B be two events in a sample space S. Then the conditional probability of the event B given that the event A has occurred, denoted by Pr(B|A), is
Pr(B|A) = Pr(A ∩ B) / Pr(A)
provided that Pr(A) ≠ 0. Pr(B|A) is read "the probability of B given A".
The conditional probability Pr(B|A) may be thought of as a reassessment of the probability of B given the information that some other event A has occurred.

Example -> We consider an example using this definition:
Pr(King of clubs | clubs) = Pr(King of clubs ∩ clubs) / Pr(clubs)
The event "King of clubs" is a subset of the event "clubs" - so the intersection of these two events is the event "King of clubs".
Pr(clubs) is the probability of drawing a club - there are 13 ways of doing this. Hence Pr(clubs) = 13/52, and
Pr(King of clubs | clubs) = (1/52)/(13/52) = 1/13.

Example
Suppose A and B are two events in a sample space, and that
Pr(A) = 0.6, Pr(B) = 0.2 and Pr(A|B) = 0.5.
In this type of problem a useful first step is always to simplify as many conditional probabilities into absolute probabilities as possible:
Pr(A|B) = Pr(A ∩ B)/Pr(B)
0.5 = Pr(A ∩ B)/0.2
Pr(A ∩ B) = 0.1

a) Pr(B|A) = Pr(B ∩ A)/Pr(A) = 0.1/0.6 = 0.17
b) Pr(A ∪ B) = Pr(A) + Pr(B) - Pr(A ∩ B) = 0.6 + 0.2 - 0.1 = 0.7
c) Pr(B) = Pr(A ∩ B) + Pr(Ā ∩ B), by theorem 2, so
   Pr(Ā ∩ B) = Pr(B) - Pr(A ∩ B) = 0.2 - 0.1 = 0.1
d) Pr(B|Ā) = Pr(B ∩ Ā)/Pr(Ā) = Pr(B ∩ Ā)/(1 - Pr(A)) = 0.1/(1 - 0.6) = 0.25
Bayes' Theorem
For any two events A and B there are two conditional probabilities that can be considered:
Pr(B|A) = Pr(A ∩ B) / Pr(A)
Pr(A|B) = Pr(A ∩ B) / Pr(B)

If A and B are two events, then
Pr(A|B) = Pr(B|A)·Pr(A) / [Pr(B|A)·Pr(A) + Pr(B|Ā)·Pr(Ā)]

Proof -> recall the definition of conditional probability
Pr(A|B) = Pr(A ∩ B)/Pr(B)
and theorem 2
Pr(B) = Pr(A ∩ B) + Pr(Ā ∩ B)
Substituting, we have
Pr(A|B) = Pr(A ∩ B) / [Pr(A ∩ B) + Pr(Ā ∩ B)]
We note that
Pr(A ∩ B) = Pr(B|A)·Pr(A)
and
Pr(Ā ∩ B) = Pr(B|Ā)·Pr(Ā)
therefore
Pr(A|B) = Pr(B|A)·Pr(A) / [Pr(B|A)·Pr(A) + Pr(B|Ā)·Pr(Ā)]
Independent events

The intuitive feeling is that independent events have no effect upon each other. But how do we decide whether two events A and B are independent? If the occurrence of event A has nothing to do with the occurrence of event B, then we expect the conditional probability of B given A to be the same as the unconditional probability of B: Pr(B|A) = Pr(B).
The information that event A has occurred does not change the probability of B occurring.
If Pr(B|A) = Pr(B), then, using the definition of conditional probability,
Pr(B) = Pr(A ∩ B)/Pr(A)
or
Pr(A ∩ B) = Pr(A) x Pr(B)   * if events are independent
In words -> the probability of the intersection of independent events is the product of their individual probabilities.
Extended:
Pr(A1 ∩ A2 ∩ ... ∩ An) = Pr(A1) x Pr(A2) x ... x Pr(An)

It helps to realize that "mutually exclusive" is a concept from set theory and can be represented by a Venn diagram. But "independence" is a concept in probability theory and cannot be represented by a Venn diagram.
-> independent events are never mutually exclusive; the two ideas are conceptually different.
Video 6: Summarising information using graphs

Pie charts are a great way to show composition.
Bad pie charts -> the categories don't add up to 100%;
the units are not comparable;
the pie slices do not reflect the proportions.

What kind of data are we dealing with?
· quantitative (fully numeric): histogram, scatter plot or box plot
· qualitative (categorical or nominal): pie chart or bar graph
· ordinal data (ranked or ordered): steps aren't necessarily the same size
CH 1: Exploring data
We define statistics as the science of decision making in the face of uncertainty.

What do we mean by data?
- data is information
- Either you get data drips (too little info) or data floods (too much data)
- different types of data: qualitative (non-numerical, nominal) and quantitative (numerical)
Video 7: Summary measures of location and spread

Summary statistics: numerical (rather than graphical) summaries of data.
A statistic is any quantity calculated from the data values of a sample.

Standard deviation on Excel, step by step:
1. Find the mean of the data set (=AVERAGE(...)).
2. In a new column, find (xi - x̄) for each xi (xi = a given value; x̄ = the mean).
3. Take the value in step 2 and square it in a new column.
4. Find the sum of all the squared deviations from step 3 (AutoSum).
5. Divide that sum by (sample size - 1).
6. This equals the variance.
7. Take the square root to get the standard deviation.

s = √( (1/(n-1)) Σ (xi - x̄)² )

Or directly: =STDEV.S(all values in the data set row).

Five-number summary on Excel:
min      =QUARTILE(range; 0)
LQ       =QUARTILE(range; 1)
median   =QUARTILE(range; 2)
UQ       =QUARTILE(range; 3)
max      =QUARTILE(range; 4)

Strays -> unusually small or large numbers (more extreme ones are outliers)
lower stray: less than median - 3(median - LQ); lower outlier: less than median - 6(median - LQ)
upper stray: more than median + 3(UQ - median); upper outlier: more than median + 6(UQ - median)
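A Python equivalent (my addition, with made-up data) of the Excel steps above, using only the standard library:

```python
import statistics as st

data = [4, 8, 15, 16, 23, 42, 7, 11, 19, 26]      # hypothetical sample

mean = st.mean(data)
s = st.stdev(data)                                # sample standard deviation (the n-1 version, like =STDEV.S)
q1, median, q3 = st.quantiles(data, n=4, method="inclusive")

print(min(data), q1, median, q3, max(data))       # five-number summary
print(mean, st.variance(data), s)                 # variance = s**2
# stray thresholds, following the notes' rule of thumb:
print("upper strays above:", median + 3 * (q3 - median))
print("lower strays below:", median - 3 * (median - q1))
```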
Video 8: Random variables

Random variable -> in the example of rolling a die:
X = (the number of dots on the upturned face of the die after it is thrown)
X ∈ {1, 2, 3, 4, 5, 6}
Remember: X (big X) = the random variable
x (small x) = a particular outcome on any throw of the die

Discrete random variables: the set of possible values that X can take on is either finite or countably infinite.

How do we calculate probabilities associated with a continuous random variable? The relative frequency approach will not work. Instead, we define a function known as the probability density function of the random variable X, such that the area under the curve is equal to 1. The function is further defined such that the area between any two values, c and d, is equal to the probability that X lies between c and d:
Pr(c < X < d) = ∫ from c to d of f(x) dx
[Sketch: a probability density function curve with the area between c and d shaded.]

The probability function of a discrete random variable is known as a probability mass function p(x):
(i) defined for all values of x, but non-zero at only a finite (or countably infinite) subset of these values
(ii) 0 <= p(x) <= 1 for all values of x
(iii) Σ p(x) = 1, summing over all values of x

The probability function for a continuous random variable is known as a probability density function f(x):
(i) defined for all values of x
(ii) 0 <= f(x) < ∞ for all values of x
(iii) ∫ from -∞ to ∞ of f(x) dx = 1
CH 4: Random variables (pg. 97)

In order to manipulate the events defined on a sample space mathematically, it is necessary to attach a numerical value to each elementary event.
When events are quantitative, there is an obvious and natural way to assign numbers to them (i.e. hours, count of items). However, if elementary events are expressed qualitatively, we have to assign a number to each event.
Once all elementary events in a sample space have numerical values assigned to them, we follow the classic algebraic tradition and let X "stand for" the numerical values of the elementary events. We then call X a random variable.
X is a variable because it can "take on" different values. X is a random variable because the particular value it takes on depends on the outcome of a random experiment.

Discrete random variables take on isolated values along the real line, usually integer values. PROBABILITY MASS FUNCTIONS
Continuous random variables can be measured to any degree of accuracy. PROBABILITY DENSITY FUNCTIONS

Probabilities for discrete random variables are found by calculating the values of the probability mass function p(x) at the points of interest and summing them. However, for continuous random variables, the probability density function f(x) is constructed in such a way that probabilities of events are found by integration: the area under the graph between the two numbers represents the probability of the event.
Video 9: Expected value and variance

Average -> also referred to as the mean, or the expected value of the random variable X.
The expected value of X can be viewed as a weighted sum of the possible values of X.
E(X) = Σ x·p(x), where x = the number of dots on the upturned face of a randomly thrown die
E(X) = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 3.5
The expected value of X is also referred to as the mean of X.

How does this apply to the theory of continuous random variables?
E(X) = ∫ x·f(x) dx

Example: let X = a continuous random variable with pdf given as
f(x) = (3/4)(1 - x²) for -1 <= x <= 1, and 0 elsewhere.
E[X] = ∫ from -1 to 1 of x·f(x) dx
x·f(x) = (3/4)(x - x³)
E[X] = (3/4)[x²/2 - x⁴/4] evaluated from -1 to 1 = 0

Var[X] = E[X²] - (E[X])²
(pp. 143-149)

How do we calculate E(A + B), Var(A + B), E(A - B), Var(A - B)?
We use these rules:
① E(A + B) = E(A) + E(B)
② E(A - B) = E(A) - E(B)
③ E(cA) = c·E(A)
In the case where A and B are independent variables:
· Var(A + B) = Var(A) + Var(B)
· Var(A - B) = Var(A) + Var(B)
· Var(cA) = c²·Var(A)

Excel example
mean     -> =AVERAGE(A1:A1000)
variance -> =VAR(A1:A1000)
E[X]     = SUM of x·p(x)
Var[X]   = E[X²] - (E[X])²
SD(X)    = SQRT(Var(X))

(Worked spreadsheet examples follow in the notes: the fair die, where each p(x) = 0.167, and a second discrete distribution built from powers of 0.8 and 0.2; in each case E[X] = Σ x·p(x), E[X²] = Σ x²·p(x) and Var[X] = E[X²] - (E[X])².)
(A final worked example evaluates E[X] = ∫ x·f(x) dx and Var[X] = ∫ (x - E[X])²·f(x) dx for a continuous pdf by integration.)
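A quick Python check (my addition) of E[X] and Var[X] for the fair die:

```python
from math import sqrt

values = [1, 2, 3, 4, 5, 6]
p = [1 / 6] * 6                                       # fair-die probability mass function

e_x  = sum(x * px for x, px in zip(values, p))        # E[X]  = Σ x·p(x)            -> 3.5
e_x2 = sum(x * x * px for x, px in zip(values, p))    # E[X²] = Σ x²·p(x)
var_x = e_x2 - e_x ** 2                               # Var[X] = E[X²] - (E[X])²    -> 2.917
print(e_x, var_x, sqrt(var_x))
```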
CH 4: Probability distributions

Video 10: Uniform distribution

Scenario -> the train arrives at the station every 30 minutes. It takes 10 minutes to get from the gate to the platform. What are the chances you make your train?
· which probability distribution should we use?
- uniform distribution
- the train leaves every 30 minutes
- the time until the train leaves is between 0 and 30
- you don't know what time you arrive at the station relative to the train's times
- the time you arrive at the station is a random variable distributed somewhere between 0 and 30
- a continuous random variable, because it deals with time

This yields the function: f(t) = 1/30 for 0 <= t <= 30, and 0 elsewhere,
or generally put -> f(x) = 1/(b - a) for a <= x <= b, and 0 elsewhere.

Pr(0 < T < 20), relative to the time of the next train (between 0 and 30):
= ∫ from 0 to 20 of (1/30) dt = 20/30 ≈ 0.67

If you have a continuous random variable,
if it is equally likely to lie anywhere in the interval a to b, and
if it is completely impossible for it to lie outside this interval,
↳ uniform distribution

X ~ U(a, b), with probability density function
f(x) = 1/(b - a)   for a < x < b
     = 0           elsewhere

The uniform distribution
It is the simplest possible continuous distribution. It is used to model the situation where all values in some interval (a; b) are equally likely to occur.
If the continuous random variable X is equally likely to take on any value in the interval (a; b), then X has the uniform distribution X ~ U(a, b), with probability density function f(x) = 1/(b - a) for a < x < b, and 0 otherwise.
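A rough simulation sketch (my addition; it assumes you catch the train whenever the time until departure is at least the 10-minute walk, which is one reading of the example):

```python
import random

def pr_make_train(n=1_000_000, walk=10.0, cycle=30.0, seed=2):
    """Arrival is uniform on (0, cycle); count how often the wait exceeds the walk."""
    rng = random.Random(seed)
    made = sum(1 for _ in range(n) if rng.uniform(0.0, cycle) > walk)
    return made / n

print(pr_make_train())   # ≈ 20/30 ≈ 0.67, matching the integral of 1/30 over a 20-minute interval
```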
Video 11: Binomial distribution

Example: teaching mice to read. They are set off in a maze; a sign says turn left. 4 of the 5 mice turn left.
"Does the mouse turn left or right?" is a random outcome.
· If mice cannot read - what should we set p to be? p = 0.5 (a 50/50 guess)
· EVENT OF INTEREST: the probability of 4 out of 5 mice turning left, even when they can't read
· RANDOM VARIABLE: the number of mice that turn left
  - out of 5 potential mice
  - p = 0.5 given

X = the number of mice that successfully negotiate the maze
p = probability of each mouse turning left (success)
X ∈ {no mice; one mouse; ...; five mice}

For each possible outcome we calculate the associated probability (success or fail):
p x p x p x p x (1 - p) — the failing mouse could be any one of the mice; it just has to happen once and only once.
Each of these arrangements occurs with probability p⁴(1 - p)¹ = 0.5⁴(0.5).
:. P(X = 4) = 5 x (0.5⁴ x 0.5)

We can generalize this formula as follows:
If we are observing the number of successes in n independent trials (repetitions) of an experiment in which the outcome at each trial can only be success or failure, then
P(X = x) = (n choose x) pˣ (1 - p)ⁿ⁻ˣ   for x = 0, 1, 2, ..., n
         = 0 elsewhere

The probability of 4 mice turning left by chance, when they can't read visual clues, is given by:
Pr(X = 4) = (5 choose 4) 0.5⁴ (1 - 0.5)¹ = 0.156

Binomial distribution
The random variable records the number of successes in n trials of an experiment with probability of success p, which remains constant on each trial.
P(X = x) = (n choose x) pˣ (1 - p)ⁿ⁻ˣ   x = 0, 1, 2, ..., n
         = 0 elsewhere
:. binomial distribution, X ~ B(n, p)
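A minimal Python version (my addition) of the binomial pmf, applied to the mice example:

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) = (n choose x) p^x (1-p)^(n-x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

print(binom_pmf(4, n=5, p=0.5))                       # 0.15625 -> Pr(4 of 5 mice turn left)
print(sum(binom_pmf(x, 5, 0.5) for x in range(6)))    # the pmf sums to 1
```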


CH 5: Probability Distributions
(the binomial, Poisson, exponential and normal distributions)

The Binomial Distribution
may be used only when these conditions are satisfied:
① a random experiment which has a sample space with exactly two outcomes, one we label "success" and the other "failure".
② the random experiment is repeated n times, n >= 1. The outcome on any one repetition is not influenced by the outcome of any other repetition. We say we have n independent trials of a random experiment.
③ the probability of success remains constant from trial to trial:
Pr(success) = p; Pr(failure) = 1 - p, or q = 1 - p, Pr(fail) = q

Our random variable X is the number of successes we observe in n trials.
If the above conditions are satisfied, we have a binomial process, and the random variable X has a binomial distribution.

-----
Binomial Distribution
- n independent trials
- two outcomes
- random variable X is the number of successes in n trials
- p(x) = (n choose x) pˣ (1 - p)ⁿ⁻ˣ   x = 0, 1, ..., n
       = 0 otherwise
-----

n and p are the parameters of the distribution (n >= 1) (0 < p < 1)
X ~ B(n, p)
Video 12: The Poisson and Exponential Distributions

Excel spreadsheet containing the pothole example:
1000 km of road, 850 potholes on this stretch of road.
On any repetition of a random experiment the result is random, because we can't know what the outcome will be. We can use some theoretical model to give us a probability, so we have an idea of what to expect.
The observed pattern gets closer to the theoretical pattern with more repetitions.

X = the number of potholes that occur in a given stretch of road (5 km)
- discrete
- probability mass function
- the function will depend on only 1 parameter: λ, the average rate of occurrence of potholes

average rate of occurrence = 850/1000 = 0.85 per km
:. λ = 0.85 x 5 = 4.25 potholes per 5 km of road

X ~ P(λ = 4.25)
p(x) = e^(-λ) λˣ / x!   x = 0, 1, 2, 3, ...
     = 0 elsewhere

This distribution depends on nothing but events occurring randomly with an average rate of occurrence -> POISSON PROCESS

- The random variable of interest is a distance
- The random variable is continuous
Y is a continuous random variable (the distance between potholes)
Y ~ E(λ)
f(y) = λ e^(-λy)   for y > 0; 0 elsewhere

λ = the average rate of occurrence of events that occur randomly.
- Ensure that λ uses the same units as the question asked,
e.g. λ for 5 km will be half that for 10 km.

· Probability of exactly 2 potholes in a 5 km stretch of road
- Poisson distribution, λ = 4.25
Pr(X = 2) = p(2) = e^(-4.25) (4.25)² / 2!

· Probability of a 3 km stretch of (smooth) road between potholes
- Exponential distribution: probability that Y is at least 3
Y ~ E(λ = 0.85 per km of road)
f(y) = 0.85 e^(-0.85y), y > 0
Pr(Y > 3) = ∫ from 3 to ∞ of 0.85 e^(-0.85y) dy
          = [-e^(-0.85y)] from 3 to ∞
          = e^(-2.55)
          = 0.078

· Probability of finding no potholes in a 3 km stretch of road
- Poisson, λ = 0.85 x 3 = 2.55 per 3 km of road
Pr(X = 0) = e^(-2.55)

λ = the average number of event occurrences per unit of time / space / distance.
The Poisson counts the events in an interval; the exponential measures the distance (or time) between events.
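A Python check (my addition) of the pothole probabilities above:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """p(x) = e^(-λ) λ^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)

def expon_tail(y, lam):
    """Pr(Y > y) for Y ~ E(λ): the integral of λe^(-λy) from y to infinity."""
    return exp(-lam * y)

lam_5km = 0.85 * 5                      # 4.25 potholes expected per 5 km
print(poisson_pmf(2, lam_5km))          # Pr(exactly 2 potholes in 5 km) ≈ 0.129
print(expon_tail(3, 0.85))              # Pr(next pothole is more than 3 km away) ≈ 0.078
print(poisson_pmf(0, 0.85 * 3))         # the same event via the Poisson: no potholes in 3 km ≈ 0.078
```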
Chapter 5: The Poisson distribution

We are given a period of time during which events occur at random.
The average rate at which events occur is λ events per time period.
It is critical that the time period referred to in the rate must be the same as the time period during which the events are counted.
Let the random variable X be the number of events occurring during the time period.
Then X has the Poisson distribution with parameter λ, i.e. X ~ P(λ), and has the probability mass function
p(x) = e^(-λ) λˣ / x!   x = 0, 1, 2, ...
     = 0 otherwise

Example
There is a large fleet of delivery trucks. On average there are 12 breakdowns per 5-day working week. Each day there are two trucks on standby.
↳ Let X be the random variable: the number of trucks that break down in a given day.
Because we are interested in breakdowns per day, we need to convert the given weekly rate into a daily rate: λ = 12/5 = 2.4.
Thus we assume that X has a Poisson distribution with λ = 2.4:
Pr(X = x) = p(x) = e^(-2.4) (2.4)ˣ / x!

a) What is the probability that on any day no standby is needed?
Pr(no breakdowns) = p(0) = e^(-2.4) = 0.0907

b) What is the probability that the number of standby trucks is inadequate?
Pr(inadequate standby) = Pr(X > 2)
= 1 - Pr(X <= 2)
= 1 - (p(0) + p(1) + p(2))
= 1 - (0.0907 + 0.2177 + 0.2613)
= 0.4303

This means that on about 9% of days we will not use our standby trucks at all, but on about 43% of days we do not have enough.

Deriving the Poisson from the binomial
Using the limits
① lim as n -> ∞ of (1 + x/n)ⁿ = eˣ
② lim as n -> ∞ of x/n = 0 for any finite value x,
Pr(x events in the time period) = Pr(x successes in n trials)
= (n choose x)(λ/n)ˣ(1 - λ/n)ⁿ⁻ˣ
and we see what happens when we let n tend to infinity.
This gives the result we require,
p(x) = e^(-λ) λˣ / x!
the probability mass function for the Poisson distribution.
The Exponential Distribution

For the exponential distribution we consider the interval of time between events. Let the random variable X be the time between events. X is a continuous variable (it can take on any value in the sample space S = {x | 0 < x < ∞}).

If events are occurring at random with average rate λ per unit of time, then the probability density function for the random variable X, the length of time between events, is given by
f(x) = λ e^(-λx)   x > 0
     = 0           otherwise
X is said to have the exponential distribution with parameter λ, and we write X ~ E(λ).
-> the "space" HAS TO BE ONE-DIMENSIONAL (time, or distance along a line).

Example
A computer operates continuously and breaks down at random, on average 1.5 times per week.
This tells us λ = 1.5 per week, and that the random variable X, the time between breakdowns, has density function
f(x) = 1.5 e^(-1.5x)   x > 0
     = 0               otherwise

a) What is the probability of no breakdowns for 2 weeks?
X > 2; the exponential is continuous, so we evaluate the probability by integration:
Pr(X > 2) = ∫ from 2 to ∞ of 1.5 e^(-1.5x) dx = [-e^(-1.5x)] from 2 to ∞ = e^(-3) = 0.0498

b) What is the probability of a breakdown within 3 days?
Make the time units compatible: 3 days = 3/7 of a week.
Pr(X < 3/7) = ∫ from 0 to 3/7 of 1.5 e^(-1.5x) dx = [-e^(-1.5x)] from 0 to 3/7 = 1 - e^(-1.5(3/7)) = 1 - 0.5258 = 0.4742
Video 13: The Normal Distribution

Die tossing example: X ∈ {1, 2, 3, 4, 5, 6}, μ = 3.5, 30 tosses per sample.
X̄ = the sample mean.

On Excel, simulate tossing a die 30 times, then repeat this 100 times:
- each throw: =RANDBETWEEN(1,6) (30 rows of throws across 100 columns)
- per column: =AVERAGE(B2:B31) gives the average of that column's 30 tosses
- =MIN(B32:CW32) and =MAX(B32:CW32) of the 100 averages
- =FREQUENCY(B32:CW32, bins) -> the distribution of the 100 averages of 30 tosses
- =FREQUENCY(B2:CW31, bins) -> the distribution of the individual outcomes
[Histogram residue: bins from about 2.6 to 4.4 in steps of 0.2 for the averages of 30 tosses across the 100 repeats of the experiment.]

If the random variable X is the sum of a large number of random increments, then X will tend towards a Normal Distribution.

Central Limit Theorem:
If a random variable X is the sum of a large number of random increments, then X has a normal distribution.

The normal distribution
- appropriate for continuous random variables
- it has a probability density function
- a mean μ in the middle and a spread σ about the mean
- bell shaped and symmetric about the mean
- the tails of the distribution head asymptotically towards 0

f(x) = (1 / (σ√(2π))) e^(-(x - μ)²/(2σ²))

- we are unable to integrate this analytically, which means we cannot get an exact formula for the probability the way we would for the others
:. use Table 1 in IntroStat (or the table in the video).

X ~ N(100, 4)
To calculate the probability that X lies within a given interval, we convert from the units of X to the standard Normal Distribution (Z), where the mean = 0 and the variance = 1.
Z is measured in terms of: the number of standard deviations away from the mean of X.
To convert from X to Z: z = (x - μ)/σ
:. for X ~ N(100, 4), Pr(a < X < b) = Pr((a - 100)/2 < Z < (b - 100)/2); the two statements are equivalent.

* the area under the curve for a probability density function is one.

How do we make do with tables for only the standard normal distribution? Because we have an easily proved result: the proportion of the density function that lies between the mean and a specified number of standard deviations away from the mean is always constant, regardless of the numerical values of the mean and standard deviation.
The general result: for X ~ N(μ, σ²), the area between μ and some point x is the same as the area between 0 and z = (x - μ)/σ for N(0, 1). z tells us how many standard deviations the point x is away from the mean μ.
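A Python sketch (my addition) of the z-conversion using statistics.NormalDist; the interval 98 to 102 is my own illustrative choice for X ~ N(100, 4):

```python
from statistics import NormalDist

x_dist = NormalDist(mu=100, sigma=2)      # X ~ N(100, 4): variance 4, so sigma = 2
z_dist = NormalDist()                     # standard normal, N(0, 1)

# Pr(98 < X < 102) directly, and via the z-transform z = (x - mu) / sigma
print(x_dist.cdf(102) - x_dist.cdf(98))
print(z_dist.cdf(1) - z_dist.cdf(-1))     # same answer ≈ 0.683
```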

Sums and Differences of independent random variables

Suppose we have a number of tasks that have to be completed in sequence. Suppose the time taken for each task obeys a normal distribution, each having a given mean and variance, and is independent of the time taken for the other tasks. Obviously, the total time taken will also be a random variable. What will its distribution, mean and variance be? We state that the total time taken will be normally distributed, with total mean equal to the sum of the means for each task, and variance equal to the sum of the variances (not the standard deviations). :. if Xi ~ N(μi, σi²),
and Xi is independent of the time taken for the other tasks, then the distribution of the random variable Y = Σ from i=1 to n of Xi is Y ~ N(μ, σ²), where μ = Σ μi and σ² = Σ σi².

Sometimes we need to consider the difference of two independent normally distributed random variables. Suppose
X1 ~ N(μ1, σ1²) and X2 ~ N(μ2, σ2²);
then, letting Z = X1 - X2, we state without proof,
Z ~ N(μ1 - μ2, σ1² + σ2²)
The mean of the random variable Z is found by subtraction, but the variance is still found by addition.

Multiplying a normal random variable by a constant
If the random variable X ~ N(μ, σ²), and if a and b are constants, then the random variable Y = aX + b also has a normal distribution, with Y ~ N(aμ + b, a²σ²).

Example: heights follow a normal distribution with mean 67 inches and standard deviation of 4 inches.
Convert to cm: a is 2.54 and b is 0.
If X ~ N(67, 4²) where X is in inches, then Y = 2.54X + 0 and
Y ~ N((2.54 x 67 + 0), (2.54² x 4²)) = N(170.2, 103.2)
Module 5: Hypothesis testing
Video 15: The difference between a sample and a population

Population -> parameter; Sample -> statistic.
From the sample we can draw inference about the population.

- we draw a random sample that represents a population
- we measure a sample statistic
- we use this as an estimate of the true, unknown population parameter
· population mean (μ) — sample mean (X̄)
· population variance (σ²) — sample variance (S²)

Our sample needs to be representative and random:
representative -> similar in structure to the population it is drawn from
random -> every member of the population has an equal chance of being chosen as part of the sample.

X̄ is a random variable.
We call the probability distribution of a statistic the sampling distribution of that statistic.

Example: measuring the heights of 1st year students.
Let X be the height of a randomly drawn 1st year student, and let X1, X2, ..., Xn be a randomly drawn sample from a normal distribution with true mean μ and variance σ².
Remember: E(A + B) = E(A) + E(B), and Var(A + B) = Var(A) + Var(B) when A and B are drawn randomly and independently.
:. Σ Xi ~ N(nμ, nσ²) for Xi drawn randomly from the normal distribution.
X̄ = (1/n) Σ Xi
E(X̄) = (1/n)(nμ) -> E[X̄] = μ
Var(aX) = a² Var(X), where a is a constant
Var(X̄) = Var((1/n) Σ Xi) = (1/n²)(nσ²) = σ²/n

Large sample -> stable estimate of the mean, small variance (it has a big n).
Small sample -> varying mean (imprecise), big variance.

Even if the underlying distribution is not normal, X̄ ~ N(μ, σ²/n) as long as n >= 30.
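A small simulation (my addition) of the sampling distribution of the die mean, showing Var(X̄) shrinking like σ²/n:

```python
import random, statistics as st

def sample_means(n, reps=10_000, seed=3):
    """Means of `reps` samples of n fair-die throws."""
    rng = random.Random(seed)
    return [st.mean(rng.randint(1, 6) for _ in range(n)) for _ in range(reps)]

for n in (5, 30):
    means = sample_means(n)
    print(n, st.mean(means), st.variance(means), 2.917 / n)   # Var(X̄) ≈ σ²/n
```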

Vid&O 16:Confidence intervals

It is the estimate interval


point us

testimate
m

information the the estimate


no
regarding uncertainty of

-
Interval
values
of
-range
communicate how much the estimate
uncertainty present
-

or is in
precision

Ample two die each, number wins


players, one
higher
friend wins a
lot, start tosuspect unfair die
you

X: number dots
of
upturned
on face a fair die after
of toss
any given
number.S
Probability of
any

E(X) 2i =

xi
pi
=
=
=3.5
=

var(X) E(X) =
-

E(X) [xip:-(5)
-- ()"
35
=

12

=
2.917
The sample mean is a random variable.

Therefore, it has a
sampling Distribution

x -
N(r,)
now,
for a fair die tossed 30 times:

X -

N(3.5, ) where X is our observed sample mean

from 30 tosses.

E
z
=

-n(0,1)

pr(x-1.96 =<p> x 1.965)


+ =

0.95

We are 95% confident that the true mean value for a

lies between (X-1.96) and


fairdie tosses)
example X 4.70
=

continuing
:confidence interval (4.70-1.96*; 4.70 1.96
+
(

(4,09; 5,31)

We are 95% confident that the interval (4.09;5.31) contains

the true mean (p)


It excludes the true mean of a fair die (3.5)

:there to believe friend


is reason our is
cheating

What affects the width of a confidence interval?
The half-width is L = z x σ/√n, so the width of a 100(1-α)% interval is 2L.
It is apparent that:
- If you increase n, the confidence interval gets narrower.
- If you increase z (the confidence level), the interval will get wider.
So... LARGE SAMPLES YIELD BETTER RESULTS THAN SMALL SAMPLES.

A confidence interval can be written as 100(1-α)%.
We need to ask the following questions:
- within how many units of the true mean do we want our estimate to be? ↳ this is L
- how confident do you want to be of your estimate? ↳ that determines z
- how variable is the population you are studying? ↳ this determines σ

If we don't know μ, we can construct an interval that we are 100(1-α)% confident contains μ, as follows:
Pr(X̄ - z σ/√n < μ < X̄ + z σ/√n) = 1 - α

Confidence interval for μ, when σ is known
If we have a random sample of size n with sample mean X̄, then A% confidence intervals for μ are given by
(X̄ - z σ/√n, X̄ + z σ/√n)
where the appropriate values of z are given in the tables.

Sample size n required to achieve a desired accuracy L, when σ is known
To obtain a sample mean X̄ which is within L units of the population mean, with confidence level A%, the required sample size is
n = (zσ/L)²
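A Python check (my addition) of the 95% interval for the die example:

```python
from math import sqrt

def z_confidence_interval(xbar, sigma2, n, z=1.96):
    """100(1-α)% CI for μ when σ² is known: x̄ ± z·σ/√n."""
    half_width = z * sqrt(sigma2 / n)
    return xbar - half_width, xbar + half_width

print(z_confidence_interval(4.70, 2.917, 30))   # ≈ (4.09, 5.31), which excludes 3.5
```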
Video 17: Testing whether the mean is a specified value

Confidence intervals allow us to say things like:
Our best estimate of the unknown mean is X̄, and we are 100(1-α)% confident that it lies within the interval.

The Hypothesis Test (statistical inference, significance tests)

Step 1: define the null hypothesis (H0)
H0 is a statement about one of the true, unknown population parameters.
H0 will always state that the true parameter is equal to some (hypothesised) value.
H0 is generally a statement about the status quo. Here, H0 assumes no effect or difference.
In our example of the dice -> H0: μ = μ0 = 3.5

Step 2: define the alternative hypothesis (H1)
H1 is still a statement about the true, unknown population parameter.
H1 will always have a <, >, or ≠ sign.
In our example we have a strong suspicion that the true mean of this die is greater than 3.5:
H1: μ > 3.5

Step 3: Set up a significance level (α) for your test
If we set a very tough test, we make it difficult to detect a difference from the hypothesised mean.
If α = 1% (tough): even if the die is fair (H0 true), we will still erroneously reject H0 1% of the time.
If α = 10% (less tough): even if the die is fair (H0 true), we will still erroneously reject H0 10% of the time.
In our example: a significance level of α = 0.05 (5%).

Step 4: Set up a rejection region based on α
If α = 5%, we will reject H0 if the sample mean falls within the most extreme 5% of values of the standard normal distribution.
In our example: according to H1 (H1: μ > 3.5) we only need to consider rejecting H0 if the sample mean is in the greatest 5% of possible values of the distribution of X̄ (i.e. Z > 1.645).

Step 5: Calculate the test statistic
We proceed under the assumption that H0 is true.
So, if H0 is true, then X̄ ~ N(μ0, σ²/n).
In our example: X̄ = 4.7 (obtained from our sample)
test statistic: z = (X̄ - μ0)/(σ/√n) = (4.7 - 3.5)/√(2.917/30) = 3.85 under H0

Step 6: Draw a conclusion
If the test statistic falls within the rejection region, then we reject H0 in favour of H1.
If the test statistic does not fall within the rejection region, then we do not reject H0.
In our example: our test statistic of 3.85 falls within the rejection region (values > 1.645), therefore we reject H0 and conclude that the true mean of our friend's die is greater than 3.5.

Note: H0 is the default position unless we have strong evidence against it.
[Sketch: for a two-sided test at α = 5% there is 2.5% in each tail; for a one-sided test, all 5% sits in one tail.]

Type I and Type II errors
· Type I error: reject H0 erroneously — this is controlled by α.
If α is very small, it will be more difficult to reject H0, so the chances of making a Type I error will be less.
· Type II error: accept H0 erroneously.
If α is very small -> a very small probability of making a Type I error -> an increasing probability of a Type II error.
· For the purposes of this course we generally accept that Pr(Type II error) is acceptable when α = 0.05 or 0.01.
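A Python check (my addition) of steps 5 and 6 for the die example, including the one-sided p-value:

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(xbar, mu0, sigma2, n):
    """Test statistic z = (x̄ - μ0) / (σ/√n), assuming σ² is known."""
    return (xbar - mu0) / sqrt(sigma2 / n)

z = one_sample_z(4.7, 3.5, 2.917, 30)
print(z)                             # ≈ 3.85, beyond the one-sided 5% cut-off of 1.645
print(1 - NormalDist().cdf(z))       # one-sided p-value, far below 0.05 -> reject H0
```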

Comparing two sample means (with known population variances)

Video 18
Friend example continued...
Statistically we are testing whether the two dice come from the same, or different, populations.
Are they coming from the same population that has the same underlying true mean?
Your die = X1; your friend's die = X2.
Sample means: X̄1 and X̄2.
X̄1 ~ N(μ1, σ1²/n1), X̄2 ~ N(μ2, σ2²/n2); X̄1 and X̄2 are independent variables.
· σ1² and σ2² are the variances of X1 and X2
· σ²/n is the variance of a sample mean

X̄1 - X̄2 ~ N(μ1 - μ2, σ1²/n1 + σ2²/n2)

After 30 rolls of each die:
X̄1 = 3.6, X̄2 = 4.4
Assume that the variances of these dice are the same as for a fair die, i.e. σ1² = σ2² = 2.917.

Step 1: H0: μ1 = μ2, or μ1 - μ2 = 0
Step 2: H1: μ1 ≠ μ2
Step 3: Set up a significance level (α) for your test: α = 0.05 (5%)
Step 4: Set up a rejection region based on α
This is a two-sided test, so we split α in half.
↳ We will reject H0 if the absolute value of the test statistic > 1.96.
Step 5: Calculate the test statistic
z = (X̄1 - X̄2 - 0) / √(σ1²/n1 + σ2²/n2) = (3.6 - 4.4)/√(2.917/30 + 2.917/30) = -1.81
This means that if the null hypothesis is true, and the two means are actually equal to each other, we have observed a difference that lies 1.81 standard errors away from what we would expect.
Step 6: Draw a conclusion
The test statistic does not fall in the rejection region, so we cannot reject H0 and must conclude that there is no difference between the mean value of your die and that of your friend's die.

The modified approach
- rather than requiring a significance level to be specified in advance, we report the observed level of significance, or the p-value.
Step 1: H0: μ1 = μ2
Step 2: H1: μ1 ≠ μ2
Step 3: Go straight to the test statistic
z = ... = -1.81
p-value: the probability of observing a test statistic at least as extreme as our calculated test statistic.
-> If H1 is "≠" (two-sided): p-value = Pr(|Z| > |test statistic|) = 2 x Pr(Z > |test statistic|)
-> If H1 is "<" (one-sided): p-value = Pr(Z < test statistic)

In our example:
Pr(|Z| > 1.81) = (0.5 - 0.4649) x 2 = 0.0351 x 2 = 0.0702
^ the p-value

What this means is: if H0 is true (i.e. there is no difference between the dice), we would observe a difference at least of this size approximately 7% of the time.
-> In this course, a commonly used rule: we can reject H0 when the p-value is lower than 5%.

p-value
It is the probability, if we assume H0 to be true (i.e. there is no difference between the dice), that we would observe a test statistic at least as extreme as the one we calculated.
α -> Pr(Type I error)
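A Python check (my addition) of the two-sample z statistic and two-sided p-value above:

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(x1, x2, var1, var2, n1, n2):
    """z = (x̄1 - x̄2) / sqrt(σ1²/n1 + σ2²/n2), with both variances known."""
    return (x1 - x2) / sqrt(var1 / n1 + var2 / n2)

z = two_sample_z(3.6, 4.4, 2.917, 2.917, 30, 30)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
print(z, p_two_sided)      # ≈ -1.81 and ≈ 0.070: do not reject H0 at the 5% level
```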
Video 19: Hypothesis testing using the t-distribution

Previously we assumed σ² of our friend's die = σ² of a fair die = 2.917.
We can estimate σ² using the sample variance s².
-> the consequence of this is that our test statistic is more variable than before
(we now have two random variables: X̄ and s²)
:. the test statistic no longer follows a standard normal distribution.
As n increases, the t-distribution becomes more peaked and more like the Normal Distribution.

Z = (X̄ - μ)/(σ/√n) ~ N(0, 1)
t = (X̄ - μ)/(s/√n) ~ t with n-1 degrees of freedom

So if we have a sample of size n, and we estimate σ² by s², then the test statistic has a t-distribution.
The appropriate t-distribution will be determined by n.
Video 20: Comparing two means

Rerun the experiment of rolling a die with your friend (potentially biased die).
Your die: X̄1 = 3.9, s1 = 2.08
Their die: X̄2 = 4.3, s2 = 1.93

Step 1: define the null hypothesis (H0)
H0: μ1 = μ2, or H0: μ1 - μ2 = 0
Step 2: define the alternative hypothesis (H1)
H1: μ1 ≠ μ2, or H1: μ1 - μ2 ≠ 0
Step 3: significance level α = 0.05
Step 4: calculate the rejection region
Remember: z = (X̄1 - X̄2 - (μ1 - μ2)) / √(σ1²/n1 + σ2²/n2)
· we assume that the population variances are equal.
Therefore s1² and s2² can both be viewed as estimates of the same true variance, and we form
Sp² = the pooled estimate, a weighted average, weighted by sample size:
Sp² = ((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2)
-> test statistic t = (X̄1 - X̄2 - (μ1 - μ2)) / √(Sp²/n1 + Sp²/n2)
This test statistic will follow a t-distribution with n1 + n2 - 2 degrees of freedom.
Rejection region -> |test statistic| > t with n1 + n2 - 2 degrees of freedom at 0.05/2; we reject H0 if this is true.
↳ the t critical value = 2.000 (58 degrees of freedom)

Step 5: calculate the test statistic
t = (3.9 - 4.3) / √(Sp²(1/30 + 1/30)) = -0.78
:. at the 5% level we do not reject H0.
This means we conclude that the two means are equal.

What is the probability that our test statistic is >= 0.78 or <= -0.78?
(using the modified approach)
Use the table with ~60 degrees of freedom.
Find the highest significance level at which the test statistic would be significant: 0.3.
Because this is two-sided, we double it.
:. the p-value is < 0.6

We need the t-distribution because we no longer assume we know the population variances (they are estimated from the two sample variances).
We assume that the two populations have equal variances,
so we can combine the two sample statistics
into a pooled standard deviation estimate.
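A Python sketch (my addition) of the pooled two-sample t statistic used above (s1 and s2 are the sample standard deviations):

```python
from math import sqrt

def pooled_t(x1, x2, s1, s2, n1, n2):
    """Two-sample t with a pooled variance, assuming equal population variances."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (x1 - x2) / sqrt(sp2 / n1 + sp2 / n2)
    return t, n1 + n2 - 2                 # test statistic and degrees of freedom

print(pooled_t(3.9, 4.3, 2.08, 1.93, 30, 30))   # ≈ (-0.77, 58): |t| < 2.000, do not reject H0
```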
Video 21: Hypothesis testing using paired measurements

Drug testing trial for alertness. Participants must record how many times they can roll a die in a minute. They are given the drug. Then, 30 minutes later, they record how many times they can roll a die in 1 minute.
Your friend thinks they are going to analyse the mean number of rolls before the drug and compare it to the mean number of rolls after use of the drug.
Surely what actually matters is the difference between each individual participant's before-score (B) and after-score (A), i.e. B - A.

[Method summary from the notes: the test statistic for two paired sample means follows a t-distribution; the p-value comes from the absolute value of the test statistic with n - 1 degrees of freedom, in contrast with the two-sample (pooled, equal-variance) test statistic used for independent samples.]

Summary: hypothesis test comparing the means of two dependent samples

Step 1 - define the null hypothesis H0
H0: μB = μA, or μB - μA = 0 (no difference)
Step 2 - define the alternative hypothesis H1 (now a one-sided test)
H1: μB < μA, or μB - μA < 0
(In the above example we want the drug to have an effect, and therefore the mean after the drug should be greater than it was before.)
Step 3 -> significance level (α): α = 0.05
Step 4 -> calculate the rejection region
d = XB - XA can be tested with a 1-sample t-test.
degrees of freedom: n - 1, where n refers to the number of pairs of data.
Reject H0 if the test statistic < -t at the 0.05 level with n-1 degrees of freedom; the appropriate critical value is -1.691 (34 degrees of freedom).
Step 5 -> calculate the test statistic
test stat t = (d̄ - 0)/(s_d/√n) = -4.91
Step 6 -> conclusion
The test statistic is more extreme than the critical value: -4.91 < -1.691, so it falls in the rejection region.
Reject H0: the drug is effective in improving motor functioning.

The modified approach
(What is the probability of getting a test statistic as small as -4.91?)
Look at a t-table: df = 34; find the probability closest to 4.91.
↳ the p-value is less than 0.0005.
If H0 is true we would have observed a difference of this nature only 5 times in 10 000 (very, very unlikely). On this basis we have strong evidence to reject H0 and again conclude the medication is effective.

If our alternative was two-sided, H1: μB ≠ μA, we would double our p-value: the p-value is < 0.001.

· You will always have the same number of observations in your two groups.
Critical point -> We need paired tests because our paired data are not independent of each other.
A paired test is most appropriate with repeated measures.

Example 10A, pg. 204
We convert our two sets of measurements into a single measurement of the differences between the two sets.
We then use the 1-sample t-test.


The t- and F-
CH 9: distributions

=
From has the standard normal distribution.
8, z
=

Chapter
Butwhen the "true"value for standard deviation
o, single the

In the population, IS
replaced by 3, the estimate of a, this is no

the
longer true, although, as sample size n
increases, it
rapidly
becomes excellent approximation. But for small far
an
samples it is

from the truth. This is because the sample variances" (also 5)


is a
itself random variable (varies from sample to
sample)

We know that the the influences the of


size of
sample accuracy
our

estimates. The the sample the closer the estimate to


larger is
likely
be to the true value.

The of the t-distribution similar to that


ofthe normal distri
shape is

the of with the


However, shape the distribution varies sample size.

It is heavier-tailed than the normal distribution when


longer or
-

the small. As the the


sample size is
sample size
increases,
- distribution and normal distribution become
progressively closer,
-

and identical.
ultimately, they are

freedom
Degrees of

remember, g =

lie)"
these terms are deviations of each of the

cifrom the
sample mean

to acheive for, numbers, five of


those
given sample
mean six
a
say,
can be chosen then the last determined.
will,but
at is
fully
In we told the a numbers IC, and that
general are mean
-

of is

the first n-1 numbers are (Cr, (2, ...


(2n-1
We that based has n-1 of freedom
say s on a
sample sizen, degrees

of freedom
Degrees
For each need
evaluating the
to estimate
parameter we prior to
lose
current
parameter interest,
of we one
degree freedom. of

Because the shape of the t-distribution varies with the sample size,
t-distributions. It therefore
there is a whole
family of is
necessary
have of
to some means
indicating
which t-distribution is
being used
In a
specific situation. We do this
by using a
subscript. Intuition
that the should be the but it turns
suggests subscript sample size,
out that the sensible the freedom of the
subscript is
degrees of

standard deviation, which is one less than the


sample size.

-n
=
Exact p-values are usually computed using statistical software. For the z-test we could calculate these exactly, since our tables gave us the probabilities associated with z-scores. But because our t-tables report critical values rather than probabilities, p-values can no longer be computed exactly. To obtain approximate p-values:
① identify the appropriate degrees of freedom
② highlight the relevant line (corresponding to the df) in the t-tables
③ identify where the test statistic would lie along that line
④ determine the approximate probability (range) associated with the test statistic by looking at the probabilities corresponding to the two values on either side of the test statistic.
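For comparison, software gives the p-value exactly where the tables can only bracket it. A minimal Python sketch (illustrative only, assuming scipy), using the test statistic of -4.91 on 34 df from the earlier drug example:

from scipy import stats

t_stat, df = -4.91, 34
p_one_sided = stats.t.cdf(t_stat, df)    # Pr(T <= -4.91), the exact one-sided p-value
print(p_one_sided)                       # on the order of 1e-05, i.e. well below 0.0005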
A confidence interval for μ, when σ is estimated by s

If we have a random sample of size n from a population with a normal distribution, and the sample mean is x̄ and the sample variance is s², then the confidence interval is

(X̄ - t·s/√n ; X̄ + t·s/√n)

where the t values are obtained from the t-tables. For 95% confidence intervals, use the column in the tables headed 0.025. For 99% confidence intervals, use the column headed 0.005.
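A minimal Python sketch of this interval (assuming scipy; the numbers are simply those of the turkey example that follows, used here for illustration):

import math
from scipy import stats

xbar, s, n = 4.8, 0.5, 20            # sample mean, sample standard deviation, sample size
t_crit = stats.t.ppf(0.975, n - 1)   # 95% interval -> 0.025 in each tail of t with n-1 df
half_width = t_crit * s / math.sqrt(n)
print((xbar - half_width, xbar + half_width))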

Testing whether the mean is a specified value (population variance estimated from the sample)...

Example: A poultry farmer is investigating ways of improving the profitability of his operation. Using a standard diet, turkeys grow to a mean mass of 4.5 kg at age 4 months. A sample of 20 turkeys, given a special enriched diet, had an average mass of 4.8 kg after 4 months. The sample standard deviation was 0.5 kg. Using a 5% significance level, test whether the new diet is effective in increasing mass.

1.) Ho: μ = 4.5
2.) H1: μ > 4.5 (one-sided alternative: we assume the enriched diet should not cause a loss of mass)
3.) Significance level: 5%
4.) The rejection region is found by reasoning as follows. Because the population variance is unknown, we need to use the t-distribution. The degrees of freedom for t will be 19, since s is based on 20 observations. We will thus reject Ho if the "observed t-value" exceeds t19(0.05). From the table, t19(0.05) = 1.729.
5.) test stat = (4.8 - 4.5)/(0.5/√20) = 2.68
    2.68 > 1.729
6.) We reject Ho and conclude that, at the 5% significance level, we have established that the new enriched diet is effective.
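The same calculation can be sketched in Python from the summary statistics alone (just an illustration of the arithmetic above, assuming scipy is available):

import math
from scipy import stats

n, xbar, s, mu0 = 20, 4.8, 0.5, 4.5
t_stat = (xbar - mu0) / (s / math.sqrt(n))   # observed t-value, about 2.68
t_crit = stats.t.ppf(0.95, n - 1)            # one-sided 5% point of t19, about 1.729
p_value = stats.t.sf(t_stat, n - 1)          # Pr(T >= 2.68), about 0.007
print(t_stat, t_crit, p_value)               # reject Ho since 2.68 > 1.729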
Comparing two means: matched pairs or paired samples
↳ the idea of a paired t-test

Example: consider a medicine designed to reduce dizziness. We want to know if it is effective. We have 10 patients who complain of dizziness, and we examine the reduction in dizzy spells for each patient after taking the medicine (how many dizzy spells they had per month before the meds, and again while on the meds).

Patient        1    2    3    4    5    6    7    8    9   10
Before (B):   19   18    9    8    7   12   16   12   14   18
After (A):    17   24   12    4    7   15   19   15   11   24
d = B - A:     2   -6   -3    4    0   -3   -3   -3    3   -6

The two samples have effectively been reduced to one by taking the difference in scores. Therefore, testing whether the two populations ("before" and "after") have the same mean is equivalent to testing whether the (population) mean of the difference scores is 0. So the two paired samples have been reduced to one sample, on which we can perform a one-sample t-test.

The relevant summary statistics obtained from the data are d̄ = -1.5 and sd = 3.57. We perform the hypothesis test assuming a 5% level of significance.

1.) Ho: μd = 0
2.) H1: μd > 0
3.) α = 0.05
4.) critical value is t9(0.05) = 1.833
(If the number of dizzy spells has reduced while on the treatment, the before score should be greater than the after score, and since the differences were defined by d = B - A, we'd expect them to be positive. Note that the sample information is not consistent with this hypothesis; hypotheses are determined by what you are interested in finding out, NOT by the data!)
5.) test stat = d̄/(sd/√n) = -1.5/(3.57/√10) = -1.33
6.) -1.33 < 1.833
We fail to reject the null hypothesis and conclude that the meds aren't effective at reducing dizzy spells.

* It is assumed that the population of difference scores is normally distributed with a mean of μd. Also, in this case the p-value associated with the test statistic would have been between 0.8 and 0.9 (clearly resulting in a fail-to-reject decision).
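A hedged Python sketch of the paired test (the difference scores below are as reconstructed from the notes, chosen to match d̄ = -1.5 and sd ≈ 3.57, so treat them as illustrative; scipy is assumed):

import numpy as np
from scipy import stats

d = np.array([2, -6, -3, 4, 0, -3, -3, -3, 3, -6])     # B - A for the 10 patients
t_stat, p_two_sided = stats.ttest_1samp(d, popmean=0)  # one-sample t-test on the differences
p_one_sided = stats.t.sf(t_stat, df=len(d) - 1)        # Pr(T >= t) for H1: mu_d > 0
print(t_stat, p_one_sided)                             # t ~ -1.33, p ~ 0.89: fail to reject Ho

The same numbers come from stats.ttest_rel(before, after) on the raw before/after scores, which is just a convenience wrapper around the one-sample test on the differences.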

As already mentioned, in this example we appear to have two sets of data - the "before" data and the "after" data. But were these really two separate samples? If we had randomly selected 10 patients for "before" and then a different 10 randomly selected patients for "after", then the answer would be yes, and the samples would be independent of one another. But, since dizziness varies from patient to patient, we looked at the same 10 patients and took two readings from each individual. Thus our two sets of data were dependent, and to perform the test we used a single sample of differences. These dependent measures are known as repeated measures.

Example: Twenty individuals were paired on their initial rate of reading. One of each pair was randomly assigned to method I for speed reading and the other to method II. After the courses, the speed of reading was measured. Is there a difference in the effectiveness of the two methods?

Pair          1      2    ...     9     10
Method I   1114    996   ...   996    894
Method II  1032   1148   ...  1032   1012
d: I - II    82   -152   ...   -36   -118

Relevant summary statistics: d̄ = -40, sd = 71.6, n = 10 pairs.

Once again we assume that this is a random, representative sample from the population of differences, and that the differences in reading speed are normally distributed with mean μd.
1.) Ho: μd = 0
2.) H1: μd ≠ 0
3.) test stat = d̄/(sd/√n) = -40/(71.6/√10) = -1.77
4.) the p-value for this statistic lies between 0.1 and 0.2
5.) Therefore we do not reject the null hypothesis (the p-value is too large) and conclude that there isn't a difference in reading speed between the two methods.

Confidence intervals for differences

As before, it is also possible to construct confidence intervals for these differences. Following the same logic as before, and using the fact that (d̄ - μd)/(sd/√n) has a t-distribution with n - 1 degrees of freedom, the 95% confidence interval can be constructed from:

Pr( -tn-1(0.025) ≤ (d̄ - μd)/(sd/√n) ≤ tn-1(0.025) ) = 0.95

so

Pr( d̄ - tn-1(0.025)·sd/√n ≤ μd ≤ d̄ + tn-1(0.025)·sd/√n ) = 0.95

therefore the 95% confidence interval is:

( d̄ - tn-1(0.025)·sd/√n ; d̄ + tn-1(0.025)·sd/√n )
Comparing two independent sample means (variance estimated from the sample)...

When we have small samples from two populations and want to compare their means, the procedure is a little more complex than one might expect. You might anticipate that the test statistic would be

t = [ (X̄1 - X̄2) - (μ1 - μ2) ] / √( s1²/n1 + s2²/n2 )

But, unfortunately, statisticians can show that this quantity does not have the t-distribution. In order to find a test statistic which does have the t-distribution, an additional assumption needs to be made. The assumption is that the population variances in the two populations from which the samples were drawn are equal.

Example (highlights the difference between comparing means when the variance is known and when the variance is estimated from the sample):

Two varieties of wheat are being tested in a country. Twelve test plots are given identical preparatory treatment. Six plots are sown with variety 1 and the other six with variety 2. The scientists hope to determine whether there is a significant difference, at the 5% level.

Variety 1: 1.5  1.9  1.2  1.4  2.3  1.3   tons per plot
Variety 2: 1.6  1.8  2.0  1.8  2.3        tons per plot

One of the plots in variety 2 was accidentally given extra fertilizer. Its result was discarded.

x̄1 = 1.60   s1 = 0.42   n1 = 6
x̄2 = 1.90   s2 = 0.27   n2 = 5

We follow the standard hypothesis testing procedure.
1.) H0: μ1 - μ2 = 0
2.) H1: μ1 - μ2 ≠ 0 (a two-tailed test)
3.) significance level: 5%
4.) & 5.) Before we find the rejection region we need to know the "degrees of freedom". The procedure is rather different from that in the test when the population variances were known. Instead of working with the individual variances σ1² and σ2², we assume that both populations have the same variance, and we pool the two individual sample variances s1² and s2² to form a joint estimate of the variance. This assumption of equal variances is required by the mathematical theory underlying the t-distribution.

The general formula for the pooled variance:

s²pooled = [ (n1 - 1)s1² + (n2 - 1)s2² ] / (n1 + n2 - 2)

therefore, s² = [ 5 × (0.42)² + 4 × (0.27)² ] / (6 + 5 - 2) = 0.13

∴ standard deviation: s = √0.13 = 0.361

How many degrees of freedom does s have? n1 + n2 - 2 = 9.

Thus we use the t-distribution with 9 df, and because we have a two-sided alternative and a 5% significance level, we need the value of t9(0.025), which from the table is 2.262. So we reject Ho if the observed t-value is less than -2.262 or greater than 2.262.

↳ the test statistic for the two-sample t-test:

t = [ (X̄1 - X̄2) - (μ1 - μ2) ] / [ s·√(1/n1 + 1/n2) ]

· X̄1 and X̄2 are the sample means
· μ1 - μ2 is determined by the null hypothesis
· s is the pooled sample standard deviation
· n1 and n2 are the sample sizes

t = (1.60 - 1.90 - 0) / (0.361 × √(1/6 + 1/5)) = -1.372

6.) Because -1.372 does not lie in the rejection region, we conclude that the difference between the varieties is not significant.
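For reference, the same pooled test can be sketched in Python from the summary statistics (assuming scipy; ttest_ind_from_stats does the pooling automatically when equal_var=True):

from scipy import stats

res = stats.ttest_ind_from_stats(mean1=1.60, std1=0.42, nobs1=6,
                                 mean2=1.90, std2=0.27, nobs2=5,
                                 equal_var=True)   # pooled variance, n1 + n2 - 2 = 9 df
print(res.statistic, res.pvalue)                   # t ~ -1.37, two-sided p ~ 0.20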
Video 22: Testing Goodness of fit

Is the die random, producing the right proportion of numbers for all outcomes?
Expected probability for each outcome of a die roll: 1/6 for every face.

Goodness of fit test: we compare what we observe in a sample of data with what we would expect under a specified hypothesis.

We toss a die 60 times:

Outcome:                  1     2     3     4     5     6
Observed no.:             7     9     7     ·     9     ·     (two entries illegible; the six counts sum to 60)
Probability of outcome:  1/6   1/6   1/6   1/6   1/6   1/6
Expected no.:             10    10    10    10    10    10    (= 1/6 × 60 each)

Step 1: Ho: X has the following PMF:
p(x) = 1/6   for x = 1, 2, 3, 4, 5, 6
     = 0     elsewhere

Step 2: H1: X has some other distribution

Step 3: α = 0.05
CHI-SQUARED DISTRIBUTION
The Chi-squared distribution is similar to the t-distribution in that it has degrees of freedom and its shape changes according to these degrees of freedom.
· skewed to the right
· is always positive
TABLE 3, pg 327

To get the correct degrees of freedom: the number of categories we are comparing, less the number of parameters we estimate from the data, less 1.
= 6 - 0 - 1 = 5

Critical value from the table, with α = 0.05 and 5 df: 11.07

Step 4: We will reject the null hypothesis (Ho) if our test statistic > 11.07

Step 5: Our test statistic is the comparison between what we observe and what we expect if Ho is true.

If Ho is true, we expect each outcome to be produced in 10 of the 60 tosses.

D² = Σ (Oi - Ei)² / Ei     (summed over the k comparisons, where Oi = observed and Ei = expected)

· If the discrepancy is large, the test statistic is large; this would suggest you are observing something quite different to what you expect.
· If the test statistic is small, it would not constitute much evidence against Ho.

In another form, D² = Σ Oi²/Ei - n.

Here D² = 14.8, and 14.8 > 11.07.

Step 6:
Reject Ho at the 5% level and conclude that the distribution of this die is different to what we would expect.

REMEMBER! If you use the sample to estimate one (or two) parameters for the distribution you are testing, then your test statistic loses one (or two) degrees of freedom.

You need an expected value ≥ 5 in every cell for a chi-squared test to be valid. If it is not greater than 5, you may have to collapse categories together.
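A small Python sketch of the die goodness-of-fit test (the observed counts in the notes are partly illegible, so the counts below are illustrative values that sum to 60 and give D² = 14.8; scipy is assumed):

import numpy as np
from scipy import stats

observed = np.array([7, 9, 7, 5, 12, 20])   # hypothetical counts from 60 tosses
expected = np.full(6, 60 / 6)               # 10 of each face if the die is fair
D2, p_value = stats.chisquare(observed, expected)
crit = stats.chi2.ppf(0.95, df=6 - 1)       # 5% critical value with 5 df, 11.07
print(D2, crit, p_value)                    # reject Ho if D2 > 11.07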
CH 10: The Chi-squared Distribution

We consider only three applications of the chi-squared distribution - "goodness of fit tests", "tests of association in contingency tables", and "tests and confidence intervals for the sample variance". Although the rationale behind the first two applications is completely different, the calculation of the test statistic is almost identical.

Goodness of fit tests

To do the two-sample t-test, we found we had to make the assumption that the variances were equal. Another underlying assumption for the t-test was that the samples were drawn from populations which had normal distributions. There is therefore a need to be able to test this assumption. Not only can we test whether the process that generated the data has a normal distribution, we can test whether data fits any specified distribution.

Example: In the printing industry, misprints are thought to occur at random (independently of one another). Thus the number of misprints per page can be expected to obey a Poisson distribution with some parameter λ. To test this, 200 pages are examined and the number of misprints on each page is noted. A 5% significance level is used.

No. of misprints per page    Observed number of pages
0                            43
1                            69
2                            53
3                            21
4                            8
5 or more                    6

In order to test whether the Poisson distribution fits this data, the first problem is to decide what value to use for the parameter λ of the Poisson distribution. This can either be specified by the null hypothesis, or the data can be used to estimate λ. We treat these two situations separately.

Let us first consider the test of the null hypothesis that the data can be thought of as a sample from a Poisson distribution with parameter λ = 1.2 misprints per page, against the alternative that the data comes from some other distribution.

① Ho: The data comes from a Poisson distribution with λ = 1.2
② H1: The distribution is not Poisson with λ = 1.2
③ Significance level: 5%

④ + ⑤ First we need to compute the expected (theoretical) frequencies, assuming that the null hypothesis is true. If misprints are occurring in accordance with a Poisson distribution with rate λ = 1.2, then the probability that a page contains x misprints is p(x) = 1.2^x e^(-1.2) / x!.

Thus the probability of no misprints is p(0) = e^(-1.2) = 0.3012. A sample of 200 pages can therefore be expected to have 60.24 pages with no misprints (30.12% of 200). This theoretical frequency is to be compared with an observed frequency of 43. Similarly p(1) = 1.2 e^(-1.2) = 0.3614, so the expected frequency of one error is 200 × 0.3614 = 72.28. Compare this to the 69 observed pages with one error.

No. of misprints (x)   Observed (Oi)   Theoretical probability p(x)      Expected (Ei)
0                      43              1.2^0 e^(-1.2) / 0! = 0.3012      60.24
1                      69              1.2^1 e^(-1.2) / 1! = 0.3614      72.28
2                      53              1.2^2 e^(-1.2) / 2! = 0.2169      43.38
3                      21              1.2^3 e^(-1.2) / 3! = 0.0867      17.34
4                      8               1.2^4 e^(-1.2) / 4! = 0.0260      5.20
5 or more              6               1 - Σ(x=0..4) p(x) = 0.0078       1.56
Even if Ho is true, we anticipate that the observed and expected frequencies will not be exactly equal. We would clearly like to reject Ho, however, if the difference between the observed frequencies and the expected frequencies is "too large".

We need to find a test statistic which is a function of the differences between the observed and expected frequencies and which has a known sampling distribution. The sum of the differences is of no use because they sum to zero. Instead we square the differences and divide each by its expected frequency:

D² = Σ (Oi - Ei)² / Ei

The statistic D² has approximately a chi-squared (χ²) distribution. It has degrees of freedom attached to it. The correct degrees of freedom is given by k - 1, where "k" is the number of cells into which the data are categorized. Here k = 6 and therefore the df for χ² is 5. Thus we reject Ho if D² exceeds the 5% point of the χ²5 distribution. From the table, χ²5(0.05) = 11.071.

If D² exceeds 11.071, then the observed and expected frequencies are "too far" apart for their differences to be explained by chance sampling fluctuations alone.

D² = Σ (Oi - Ei)² / Ei = 22.17

⑥ This value lies in the rejection region. We thus reject the null hypothesis that the sample follows a Poisson distribution with λ = 1.2.
Remember, D² has approximately a χ² distribution only if the expected frequencies all exceed "about" 5. In the table above we see that "5 or more" does not fit this condition, and so we amalgamate it with the adjoining case, i.e. the last cell is now "4 or more". We repeat steps 4 and 5 with the new figures and conclude based on the corrected data. In this case, the conclusion remains the same.

Let's use the same data to estimate λ, and see what difference this makes to the test.

① Ho: the data fits some Poisson distribution
② H1: the data fits a distribution other than the Poisson distribution
③ significance level: 5%
④ To find λ̂ we need to estimate the rate at which misprints occurred in our sample data. The total number of misprints that occurred was (0 × 43) + (1 × 69) + (2 × 53) + ... + (5 × 6) = 300 misprints. 300 misprints in 200 pages implies that misprints occur at 1.5 per page.

Degrees of freedom: k - d - 1, where k is the number of cells, and d is the number of parameters estimated from the data. Here k = 5 (after amalgamating "4" and "5 or more") and d = 1, because we estimated one parameter, λ, from the data. Thus we must use χ² with 5 - 1 - 1 = 3 degrees of freedom.

The 5% point of χ²3 is 7.815. We reject Ho if D² exceeds 7.815.

Notice that goodness of fit tests are intrinsically one-sided - we reject Ho if D² is too large. If D² is small, it means that the distribution specified by Ho fits the data very well.

⑤ Using λ̂ = 1.5, the expected frequencies for 0, 1, 2, 3 and "4 or more" misprints are 44.6, 66.9, 50.2, 25.1 and 13.1 pages respectively, and

D² = Σ (Oi - Ei)² / Ei = 1.01

⑥ 1.01 lies in the acceptance region, so we can't reject Ho. It is reasonable to conclude that the data follows a Poisson distribution with λ = 1.5.

A short-cut formula for D²:

D² = Σ (Oi - Ei)²/Ei = Σ Oi²/Ei - 2·Σ Oi + Σ Ei

But Σ Oi = Σ Ei = n, the sample size (each summation is over the k cells).

∴ D² = Σ Oi²/Ei - n
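A Python sketch of the second version of the misprint test (λ estimated from the data, and the "4" and "5 or more" cells amalgamated as described above; scipy is assumed):

import numpy as np
from scipy import stats

observed = np.array([43, 69, 53, 21, 8 + 6])        # cells: 0, 1, 2, 3, 4-or-more misprints
lam = 300 / 200                                     # estimated misprint rate per page, 1.5
probs = [stats.poisson.pmf(k, lam) for k in range(4)]
probs.append(1 - sum(probs))                        # Pr(X >= 4)
expected = 200 * np.array(probs)
D2 = ((observed - expected) ** 2 / expected).sum()  # about 1.01
crit = stats.chi2.ppf(0.95, df=5 - 1 - 1)           # k - d - 1 = 3 df -> 7.815
print(D2, crit)                                     # D2 < 7.815: do not reject Ho

(stats.chisquare(observed, expected, ddof=1) would give the same statistic, with the p-value computed on 3 df because of the estimated parameter.)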
Video 23: Testing for an association between categorical variables

Tables are generally used to cross-tabulate categorical variables, arranged as rows and columns, where the cells contain counts.

Is there an association between gender (rows) and job level (columns)?

          worker   supervisor   manager
female      40         12          4
male        15         10          5

We assume that our data is random and representative.

-> Example: Die tossing with a friend

① Ho: there is NO ASSOCIATION between the die outcome and die ownership.
② H1: there IS AN ASSOCIATION between die outcome and die ownership.
③ α = 0.05

④ Our test statistic (D²) will compare the observed values with the expected frequencies and follows a chi-squared distribution. (We are assuming that Ho is true and that the factors are independent.)

To calculate the correct degrees of freedom: ((no. of rows) - 1) × ((no. of columns) - 1)
↳ (2 - 1) × (6 - 1) = 5

Go to the chi-squared distribution table - 5 degrees of freedom and α = 0.05; critical value: 11.070.
We reject Ho if: D² > χ²5(0.05) = 11.070

⑤ Remember: if A and B are independent, then Pr(A ∩ B) = Pr(A) · Pr(B).
In our example -> A = die ownership, B = die outcome.
If the two variables are independent: Pr(Ai ∩ Bj) = (row i total / n) × (column j total / n).

Assuming no association between ownership and outcome, the observed counts were:

            Outcome of toss of die
             1    2    3    4    5    6   | total
you         11    9    6   15    9   10   |  60
friend       3    5    5    4    6   17   |  40
total       14   14   11   19   15   27   | 100

Pr(your die AND rolling a 1) = (60/100) × (14/100) = 0.084
0.084 × 100 = 8.4 = expected number of rolls of 1 with your die.
(60% of the die rolls were with your die. There is a total of 14 ones rolled, and we would expect your die to account for 60% of them: 60% of 14 = 8.4.)

In general, for each cell: Eij = (row i total × column j total) / total

Expected values:

             1     2     3     4     5     6    | total
You         8.4   8.4   6.6  11.4   9.0  16.2   |  60
friend      5.6   5.6   4.4   7.6   6.0  10.8   |  40
total        14    14    11    19    15    27   | 100

D² = Σ Oij²/Eij - n = 11.03     (the summation is taken over all cells)

⑥ We reject Ho if D² > 11.07. Here D² = 11.03.
∴ At the 5% level, we do not have evidence to reject Ho, and so must conclude that the outcome of the die roll does not depend on whose die is rolled.

Modified approach
p-value: the probability of observing a test statistic ≥ 11.03 if Ho is true.
From the chi-squared table with 5 df: 11.03 lies between χ²5(0.1) = 9.236 and χ²5(0.05) = 11.070, so the p-value lies between 0.05 and 0.1.

Calculating the p-value in Excel:
=CHISQ.TEST(actual_range, expected_range)   i.e. the observed and expected ranges
= 0.050796

Critical value in Excel (inverse of the right-tailed chi-squared):
=CHISQ.INV.RT(significance, df)
=CHISQ.INV.RT(0.05, 5) = 11.0705

Test statistic in Excel:
=CHISQ.INV.RT(p-value, df)
=CHISQ.INV.RT(0.050796, 5) = 11.02962
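The same test of association can be sketched in Python (scipy's chi2_contingency builds the expected counts from the row and column totals, so it reproduces the calculation above):

import numpy as np
from scipy import stats

counts = np.array([[11, 9, 6, 15, 9, 10],    # your die
                   [3, 5, 5, 4, 6, 17]])     # friend's die
chi2, p, dof, expected = stats.chi2_contingency(counts)
print(chi2, p, dof)      # D2 ~ 11.03, p ~ 0.051, df = 5
print(expected)          # matches the expected-value table above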

The Chi-squared test of independence/association is used to test whether two categorical variables are associated.
The test statistic measures the difference between what we observe and what we would expect under Ho (no association).
The larger the test statistic, the more evidence there will be against Ho.
Note: with these chi-squared tests we always do a one-sided test.

Tests of association in contingency tables:
A contingency table is a simple table (or matrix) of counts. Each entry in the table is called a cell. Each member of a sample is classified according to two variables, e.g. eye colour and hair colour, and each cell in the table represents the count of the number of members of the sample who have a particular combination of the two variables. The chi-squared distribution is the sampling distribution of the test statistic which tests whether there is an association (or relationship) between the two variables:
Are eye and hair colour independent?
Is there a tendency for eye colour to be related to hair colour?

DEGREES OF FREEDOM for tests of association
For an r × c contingency table, the appropriate chi-squared distribution has degrees of freedom = (r - 1)(c - 1).
Video 24: Testing for a linear relationship between two variables

y = a + bx
· y is the dependent variable
· x is the independent variable
· a and b are regression coefficients

Is there a linear relationship between x and y?

The true (population) correlation is often unknown, so we use the evidence of correlation measured in a sample to infer that of the population.
ρ: population correlation (parameter)
r: sample correlation (statistic)

r falls on a scale from -1 to +1:
r = -1: perfect negative · r near -1: high negative · r just below 0: low negative · r = 0: independent · r just above 0: low positive · r near +1: high positive · r = +1: perfect positive.
(Negative r: as x increases, y decreases. Positive r: as x increases, y increases.)

R² measures the proportion of the variation in y that x is able to explain (often expressed as a percentage).
So... the higher the value of r (or R²), the closer (stronger) the relationship between x and y.

y = α + βx, where α is the intercept and β is the slope of the straight line.
· α and β are the regression coefficients - parameters.
· a and b are statistics - estimates of α and β (random variables).
NB: α here is the intercept parameter, not to be confused with the significance level.

In general, we are more interested in testing hypotheses about β (the slope of the straight line) than about α (the y-intercept). If β = 0, then there is no linear relationship between x and y. So we can use this as a basis for testing whether or not there is a linear relationship between x and y.

Step 1 -> H0: β = 0 (we assume no relationship)
Step 2 -> H1: β ≠ 0 or β > 0 or β < 0
Step 3 -> α = 0.01
Step 4 -> rejection region: under the assumption of Ho, the test statistic has a t-distribution with n - 2 df, where n = the number of pairs of observations.
test statistic ~ tn-2
We will reject Ho if our test statistic falls within the region that defines the 1% most extreme points of the tn-2 distribution. (For a two-sided alternative, this region is split between the two tails.)
Step 5 -> test statistic = (estimate b - hypothesised parameter β) / standard error of b
Step 6 -> if the test statistic falls in the rejection region: reject Ho
A modified approach

1.) Ho: β = 0 (we assume no relationship between x and y)
2.) H1: β ≠ 0
3.) Read from the Excel output: the test statistic = 51.97 (in the Excel example)
4.) Read from the Excel output: the p-value = 2 × 10^-24, i.e. approximately 0
5.) Conclusion: based on the p-value (approximately 0), we reject Ho and conclude that there is a relationship between year mark and final mark.

We can use the regression line to predict a value of y from a given value of x.
e.g. If year mark (x) = 50%, then the predicted final mark (y) = 44.12%:
ŷ = 3.0635 + (0.8211)(50) = 44.12
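A Python sketch of the slope test and the prediction (the year-mark/final-mark data behind the Excel output are not reproduced in the notes, so the x and y values below are hypothetical; scipy's linregress returns b, a, r and the two-sided p-value for H0: β = 0):

import numpy as np
from scipy import stats

x = np.array([40, 55, 60, 70, 85])     # hypothetical year marks (%)
y = np.array([38, 50, 55, 61, 74])     # hypothetical final marks (%)
fit = stats.linregress(x, y)
print(fit.slope, fit.intercept, fit.rvalue, fit.pvalue)   # b, a, r, p-value for the slope
y_hat = fit.intercept + fit.slope * 50                    # predicted final mark for a 50% year mark
print(y_hat)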

The value of the correlation coefficient, when squared (r²), can be used to indicate the proportion of variation in y that x is able to explain.
CH 12: SLIDES Introduction to regression analysis

Correlation: Is there really a relationship between two variables?
Regression: How do we predict values for one variable, given particular values for another variable?

Regression analysis
· ALWAYS one dependent variable and AT LEAST ONE independent variable.
e.g. Years of education and income. Income is the dependent variable and years of education is the independent variable. Income depends on years of education, but NOT conversely.
· Regression analysis involves using years of education to predict income.

Example
Person                 1    2    3    4    5
Education (years)     12   15    8   10   17
Income (in 1000's)    33   41   18   28   53

[Scatterplot: income on the vertical axis against years of education on the horizontal axis.]
Income is (at least to some extent) dependent on years of education. Income is therefore the response variable and is plotted on the vertical (y) axis.

Regression analysis is only effective if a relationship exists between the dependent and independent variables.
· Scatterplots can be used as a preliminary method for identifying relationships between two variables.
[Three scatterplot sketches: a positive relationship, a negative relationship, and no relationship.]

The regression model
· We attempt to fit a straight line (since we are dealing with linear regression) through our observed data. The regression equation is of the form ŷ = a + bx.
· y is the dependent variable and x is the independent variable.
· a and b are chosen so that the regression line "best fits" the observed data.

Least squares
· The constants a and b are chosen to minimize the sum of squared residuals.
· A residual (e) is defined as the difference between the actual y and ŷ (the predicted value of y): e = y - ŷ.
· If we have n pairs of observations, then we'll have n residuals. The sum of squared residuals is Σ ei².
[Sketch: scatterplot with the fitted regression line, the residuals shown as vertical distances from each point to the line, and the explanatory variable (x) on the horizontal axis.]

Correlation analysis
· A measure of the linear relationship between x and y.
· It is a numerical measure of the strength and direction of a linear relationship between two quantitative variables.
· We can calculate the correlation for the population (the true correlation value) as well as for the sample.
· The sample correlation coefficient (r) can be used to justify a particular regression analysis.
· The correlation coefficient always lies between -1 and 1.
[Scatterplot sketches illustrating r = -1, r < 0, r > 0 and r = +1.]

Coefficient of determination
· The square of the correlation coefficient (r²) is known as R².
· The coefficient of determination, R², is the proportion of the variation in y that is explained by the variation in x.
· R² lies between 0 and 1.
· 1 - R² is the proportion of variation in y that is explained by factors other than x.
· The closer R² is to 1, the better the regression model fits the data. The closer it is to 0, the poorer the fit.
· If r is close to +1 or -1, R² will be close to 1, and if r is close to 0, R² will also be close to 0.
· e.g. R² = 0.8697 is interpreted as: 86.97% of the variation in y is explained by the variation in x. This is a large proportion, so we conclude that the regression model fits well.
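A short Python sketch of r and R² for the education/income example (the second income value is hard to make out in the notes and is taken as 41 here, so the output is illustrative):

import numpy as np

education = np.array([12, 15, 8, 10, 17])
income = np.array([33, 41, 18, 28, 53])        # in R1000's
r = np.corrcoef(education, income)[0, 1]       # sample correlation coefficient
print(r, r ** 2)                               # r is close to +1; r**2 is the R² (proportion explained)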

Intercept and slope terms as random variables
· We use sample data to compute a and b. These are therefore statistics, as they are computed from a sample.
· Both a and b are random variables, since they vary from sample to sample. They therefore both have sampling distributions, which turn out to be normal.
Why? The fixed, but unknown, intercept and slope population parameters could theoretically be obtained by fitting a regression model to the entire population. Since this is often not practically possible, we use our sample statistics to draw inferences about the unknown population parameters of interest.
A word of caution!
· A correlation coefficient close to +1 or -1, or a significant slope coefficient b, does not in itself prove a causal relationship.
· For example, one may observe a strong positive correlation between the number of drownings at a beach and the sales figures of a nearby ice-cream vendor - but this does not imply that ice-cream sales cause an increase in drownings! Sunny weather is more likely to be the cause of a simultaneous increase in ice-cream sales and drownings.
· Correlation and regression analyses can establish a relationship between two variables, but it is up to the research worker to explain the causal mechanism (usually based on theoretical and logical considerations).

END OF SYLLABUS
