
2.2 The Poisson distribution

Chapter 2 introduces the Poisson distribution as a probability model for counting rare events occurring randomly within a fixed interval. It covers how to calculate probabilities, solve problems involving independent Poisson distributions, and use the Poisson distribution as an approximation for the binomial distribution. The chapter also discusses the relationship between the mean and variance in Poisson distributions and provides examples to illustrate its application.

Chapter 2
The Poisson distribution
In this chapter you will learn how to:

■ understand the Poisson distribution as a probability model


■ calculate probabilities using the Poisson distribution
■ solve problems involving linear combinations of independent Poisson distributions
■ use the Poisson distribution as an approximation to the binomial distribution
■ use the normal distribution as an approximation to the Poisson distribution
■ carry out hypothesis testing of a Poisson model.

Copyright Material - Review Only - Not for Redistribution


Cambridge International AS & A Level Mathematics: Probability & Statistics 2

PREREQUISITE KNOWLEDGE

| Where it comes from | What you should be able to do | Check your skills |
|---|---|---|
| Probability & Statistics 1, Chapter 7 | Calculate probabilities using the binomial distribution. | 1 If X ~ B(10, 0.2), find: a P(X = 7) b P(X < 2) |
| Probability & Statistics 1, Chapter 8 | Use normal distribution tables. | 2 If X ~ N(16, 1.8), find: a P(X < 20) b P(13 < X < 15) |
| Pure Mathematics 2 & 3, Chapter 2 | Evaluate exponential expressions. | 3 Giving your answers to 3 significant figures, evaluate: a $e^{-2}$ b $e^{-3.6}$ c $e^{-9}$ |
| Chapter 1 | Formulate and carry out hypothesis testing. | 4 State the null and alternative hypotheses and test statistic for a test for the population mean for the random variable X ~ N(42, 8), sample value 45; two-tailed test at 10% level of significance. |

Why do we study the Poisson distribution?


In Probability & Statistics 1, you learnt about discrete probability distributions. In the
binomial distribution there are a fixed number of trials with only two possible outcomes,
usually designated ‘success’ or ‘failure’, for each trial, and you count the number of
successes. In the geometric distribution there are, again, two possible outcomes, success
or failure, and you count the number of trials up to and including the one in which the first
success is obtained.
The following situation is an example where you count occurrences of an outcome,
but the situation does not fit either the binomial or geometric distribution model. In
1898, Ladislaus Bortkiewicz investigated the number of soldiers in the Prussian army
accidentally killed by horse kicks. It was rare for soldiers to be accidentally killed in this
way. There was no way to predict when it might happen: horse kicks happen at random.
Bortkiewicz collected the following data over 20 years for 14 army corps.

Number of deaths 0 1 2 3 4
Frequency 144 91 32 11 2

Each data point in the table represents the number of accidental deaths from horse kicks in
one corps in one year; so there are 144 + 91 + 32 + 11 + 2 = 280 data points in total.
From the data, the mean average number of accidental deaths from horse kicks in one
year is 0.7 (as shown in Section 2.1). The modal number is zero. The most recorded was
four. However, it was possible, although unlikely, for the number of accidental deaths
from horse kicks in an army corps to be much greater than four, and it was not possible to
predict when during each year an accidental death from a horse kick would happen. In this
situation, you count the number of outcomes, accidental deaths from horse kicks, over a
fixed interval, one year.


Numerous other situations have the same characteristics of rare outcomes occurring singly
and at random within a fixed interval. For example, the number of eagles nesting in a
region or the number of bubbles in an item of glassware or the number of typing errors per
page made by a good typist or the number of an insect type in a square metre of farmland.
Notice, too, that the fixed interval can be a period of time or space.
Consider this example. A clinic treats people with severe insect bites. It keeps records
for 100 days. On most days, no more than three people with severe insect bites attend the
clinic, but on one day there are 16 people. The bar chart shows the frequencies.

[Bar chart: Frequency (vertical axis, 0 to 50) against Number of people each day (horizontal axis, 0 to 17).]

The bar chart shows what we may expect if each member of a population has the same low
probability of attending the clinic for a severe insect bite. In theory, the maximum number
of people attending the clinic is infinite, but large numbers, much bigger than the mean, are
not expected.
To model these situations, you need a different discrete probability distribution from those
you have already studied. This type of discrete distribution is a Poisson distribution.

2.1 Introduction to the Poisson distribution


First, let us explore Bortkiewicz’s data further.

Number of deaths, x 0 1 2 3 4
Frequency, f 144 91 32 11 2

Here are calculations for the mean and the variance.

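$$\bar{x} = \frac{\Sigma fx}{\Sigma f} = \frac{(0 \times 144) + (1 \times 91) + (2 \times 32) + (3 \times 11) + (4 \times 2)}{144 + 91 + 32 + 11 + 2} = \frac{196}{280} = 0.7$$

$$\mathrm{Var}(x) = \frac{\Sigma fx^2}{\Sigma f} - \bar{x}^2 = \frac{(0^2 \times 144) + (1^2 \times 91) + (2^2 \times 32) + (3^2 \times 11) + (4^2 \times 2)}{144 + 91 + 32 + 11 + 2} - 0.7^2 = \frac{350}{280} - 0.7^2 = 0.76$$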
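The same calculation can be checked numerically. The following is a minimal sketch, not part of the original text; the helper name `freq_mean_var` is an illustrative assumption. It computes the mean as Σfx / Σf and the variance as Σfx² / Σf minus the square of the mean, using Bortkiewicz's frequency table.

```python
# Bortkiewicz's data: number of deaths and the frequency of each value.
deaths = [0, 1, 2, 3, 4]
freq = [144, 91, 32, 11, 2]


def freq_mean_var(values, freqs):
    """Return (mean, variance) for data given as a frequency table.

    Hypothetical helper for illustration only.
    """
    total = sum(freqs)                                                # Σf
    mean = sum(x * f for x, f in zip(values, freqs)) / total          # Σfx / Σf
    mean_sq = sum(x * x * f for x, f in zip(values, freqs)) / total   # Σfx² / Σf
    return mean, mean_sq - mean ** 2


mean, var = freq_mean_var(deaths, freq)
print(round(mean, 2), round(var, 2))  # prints 0.7 0.76
```

The two values are close, which is the first check suggested before choosing a Poisson model.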


We can see that the mean and variance are quite close in value. When working with
experimental data where the outcome or event occurs at random in a given interval and
the mean and variance are almost the same value, a Poisson distribution is a good choice
to model the data further. For reasons we will not explore here, when working with
experimental data we use the mean rather than the variance as the parameter to describe a
Poisson distribution.

KEY POINT 2.1


A Poisson distribution can be used to model a discrete probability distribution in which the events
occur singly, at random and independently, in a given interval of space or time. The mean and
variance of a Poisson distribution are equal; hence, a Poisson distribution has only one parameter.

Let us explore another example.

To take away some of the pressure from an accident and emergency hospital department during weekdays, a minor injuries clinic opens. To determine possible staffing requirements, the following data for 250 30-minute intervals are collected.

Number of patients arriving per 30 minutes, x   0   1   2   3   4   5   >5
Frequency, f                                    45  81  58  40  22  4   0

For these data, the mean number of patients per 30-minute interval is
$$\bar{x} = \frac{\Sigma fx}{\Sigma f} = \frac{425}{250} = 1.7$$
and the variance is
$$\frac{\Sigma fx^2}{\Sigma f} - \bar{x}^2 = \frac{1125}{250} - 1.7^2 = 1.61.$$

A statistician notes that patients arriving at the clinic generally do so independently of each other at random intervals. The data are collected for fixed 30-minute intervals. The mean and variance, 1.7 and 1.61, respectively, are quite close in value. All these factors suggest the Poisson distribution is a suitable model to use. The Poisson model for the data is as follows:

If X is the random variable ‘number of patients arriving in 30 minutes’, then X ~ Po(1.7), E(X) = 1.7 and Var(X) = 1.7.

For a Poisson distribution with mean λ, the probability of exactly r events occurring is given as:
$$P(X = r) = e^{-\lambda}\,\frac{\lambda^r}{r!}$$

FAST FORWARD

A proof of this formula is not needed at this stage, but it will be explained later in the chapter (Section 2.3); for now we will just learn how to use it.

REWIND

You may have already met the mathematical constant e in the Pure Mathematics 2 & 3 Coursebook. If you have not, then you need to know that e = 2.7183 (to 4 decimal places) and you can calculate powers of e using a calculator.

DID YOU KNOW?

The mathematical constant e is an irrational number that is the base of natural logarithms. Leonhard Euler famously discovered the value when solving a problem posed by another mathematician, Jacob Bernoulli, to find $\lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n$.

WEB LINK

You can watch a more detailed explanation in the e (Euler’s Number) Numberphile clip on YouTube.

For the minor injuries clinic example, if we have r patients arriving in 30-minute intervals, then
$$P(X = r) = e^{-1.7}\,\frac{1.7^r}{r!}.$$

This is the graph of the probabilities of numbers of patients arriving in 30-minute intervals.

[Graph: Probability (vertical axis, 0 to 0.3) against Number of patients arriving in 30-minute intervals (horizontal axis, 0 to 7).]

We can see from the graph that the probabilities peak at lower values of X and begin to get
much smaller for values of X greater than 4. It is possible, although unlikely, that a large
number of patients will arrive in a randomly chosen 30-minute time interval.
Using the probability formula for a Poisson distribution with mean 1.7, $P(X = r) = e^{-1.7}\,\frac{1.7^r}{r!}$, we can work out expected frequencies for different numbers of patients. For example, for three patients arriving in a 30-minute interval, $P(X = 3) = e^{-1.7}\,\frac{1.7^3}{3!} = 0.1496$, or 0.150 to 3 significant figures. The expected number of intervals with three patients is 250 × P(X = 3) = 250 × 0.1496 = 37, to the nearest integer. The following table shows the probabilities and the expected frequencies, together with the observed frequencies of patients per 30 minutes from the previous table.

Number of patients per half-hour (r)     0      1      2      3      4       5       6        7
P(X = r) = e^(−1.7) × 1.7^r / r!         0.183  0.311  0.264  0.150  0.0636  0.0216  0.00612  0.00149
Expected frequency, 250 × P(X = r)       46     78     66     37     16      5       2        0
Observed frequency                       45     81     58     40     22      4       0        0

A comparison of the expected frequencies with the observed frequencies shows that the
numbers are all reasonably close. This allows us to use the Poisson distribution as a model.
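The expected frequencies in the table can be reproduced with a few lines of code. This is a minimal sketch rather than part of the original text; `poisson_pmf` is an illustrative name, and the observed frequencies for 6 and 7 patients are taken as 0, as in the table.

```python
from math import exp, factorial


def poisson_pmf(r, lam):
    """P(X = r) for X ~ Po(lam), using e^(-lam) * lam^r / r!."""
    return exp(-lam) * lam ** r / factorial(r)


lam = 1.7        # mean number of patients per 30-minute interval
intervals = 250  # number of 30-minute intervals observed
observed = [45, 81, 58, 40, 22, 4, 0, 0]

# For each r: probability, expected frequency (nearest integer), observed frequency.
for r, obs in enumerate(observed):
    p = poisson_pmf(r, lam)
    print(r, f"{p:.3g}", round(intervals * p), obs)
```

Printing the observed frequency beside the expected one makes the comparison described in the text easy to read off row by row.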

Using the data, the statistician makes calculations and advises the clinic that there is about a 3% chance, $\frac{5 + 2 + 0}{250} \times 100$, of staff being needed to deal with more than four patients every 30 minutes.

EXPLORE 2.1

The data collected by Bortkiewicz given in the introduction is one of the earliest
examples of a Poisson distribution being used as a model. For Bortkiewicz’s data, use
the formula for Poisson probabilities and the calculated mean, 0.7 , to work out the
expected frequencies. Compare your calculated expected frequencies with the actual
data. Does the Poisson distribution appear to be a reasonable model for these data?
What predictions can you make from your theoretical model of the data?


KEY POINT 2.2


When modelling data using a Poisson distribution:
• Work out the mean and variance and check if they are approximately equal.
If the mean and variance are not approximately equal, the Poisson distribution is not a suitable
model to use with the data.
• Use the mean to calculate probabilities and expected frequencies.
• Compare expected frequencies with observed frequencies.

KEY POINT 2.3

If the random variable X has a Poisson distribution with parameter λ, where λ > 0, we write
X ~ Po(λ) and:
• $P(X = r) = e^{-\lambda}\,\frac{\lambda^r}{r!}$, where r = 0, 1, 2, …
• E(X) = λ
• Var(X) = λ

EXPLORE 2.2

Using graphing software, such as GeoGebra, explore the shape of the Poisson
distribution for different values of λ .
● What features do you notice?
● How does the shape of the graph change as the value of λ varies?
● For different values of λ , find the value of r where P( X = r ) is at its maximum.
● What do you notice?
● Can you give a general answer in terms of λ ?

WORKED EXAMPLE 2.1

The number of calls to a consumer hotline is modelled by a Poisson distribution with


a mean call rate of six per minute. Calculate the probability that in a given minute
there will be:
a nine calls
b three or fewer calls
c more than one call
d at least one call.


Answer

X ~ Po(6)
Always state the random variable you are working with.

a $P(X = 9) = e^{-6}\,\frac{6^9}{9!} = 0.0688$
Apply the formula.

b $P(X \le 3) = e^{-6}\,\frac{6^0}{0!} + e^{-6}\,\frac{6^1}{1!} + e^{-6}\,\frac{6^2}{2!} + e^{-6}\,\frac{6^3}{3!}$
$= e^{-6}\left(1 + 6 + \frac{6^2}{2!} + \frac{6^3}{3!}\right)$
$= 0.151$
Note that ‘three or fewer’ includes 3. Factorise out $e^{-6}$ to make the working more straightforward.

c $P(X > 1) = 1 - \left(e^{-6}\,\frac{6^0}{0!} + e^{-6}\,\frac{6^1}{1!}\right) = 1 - 0.0174 = 0.983$
Remember that the probability ‘greater than’ = 1 – probability ‘less than or equal to’.

d $P(X \ge 1) = 1 - P(X = 0) = 1 - e^{-6}\,\frac{6^0}{0!} = 1 - 0.00248 = 0.998$
Remember that P(X ≥ 1) = P(X > 0).

TIP

When working out more than one Poisson probability, first factorise out the term $e^{-\lambda}$.
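The four answers above can be checked with a short script. This is a minimal sketch under the assumption X ~ Po(6) from the example; the function names `poisson_pmf` and `poisson_cdf` are illustrative and not from the original text. The cumulative function follows the TIP by factorising out the exponential term.

```python
from math import exp, factorial


def poisson_pmf(r, lam):
    """P(X = r) for X ~ Po(lam)."""
    return exp(-lam) * lam ** r / factorial(r)


def poisson_cdf(r, lam):
    """P(X <= r): factorise out e^(-lam), then sum the remaining terms."""
    return exp(-lam) * sum(lam ** k / factorial(k) for k in range(r + 1))


lam = 6
p_a = poisson_pmf(9, lam)       # a: exactly nine calls
p_b = poisson_cdf(3, lam)       # b: three or fewer calls
p_c = 1 - poisson_cdf(1, lam)   # c: more than one call
p_d = 1 - poisson_pmf(0, lam)   # d: at least one call
print(f"{p_a:.3g} {p_b:.3g} {p_c:.3g} {p_d:.3g}")  # prints 0.0688 0.151 0.983 0.998
```

Parts c and d both use the complement, which avoids summing an infinite number of terms.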

WORKED EXAMPLE 2.2

A typesetter makes, on average, five errors per page of typing.


a State the assumptions made to model the average number of errors per page as a Poisson distribution.
b In a book with 200 pages, on how many pages would you expect to find, at most, two errors?

Answer

a Assumptions: errors occur independently, singly and at random.
Clearly state your assumptions.

b X ~ Po(5)
$P(X \le 2) = e^{-5}\left(1 + 5 + \frac{5^2}{2!}\right) = 0.124652019\ldots$
200 × 0.124652019... = 24.9 or 25 pages
First state the random variable. Remember that ‘at most’ means up to and including that value.


EXERCISE 2A

1 X ~ Po(4). Calculate:
a P(X = 5) b P(X < 3) c P(X > 0)

2 X ~ Po(7). Calculate:
a P(X = 6) b P(X ≥ 3)

3 X ~ Po(1.5). Calculate:
a P(X = 4) b P(X ≤ 2) c P(X > 2)

M 4 In a particular town, it was found that potholes occur independently and at random at a rate of five in a
1 kilometre stretch of road.
a What is the probability that in a randomly chosen 1 kilometre stretch of road in this town there will be
fewer than three potholes?
b Explain why this model may not be applicable to other towns.

M 5 A fire station receives, on average, one emergency call per hour.


a i What assumptions do you need to make to model this situation by a Poisson distribution?
ii Are the assumptions reasonable in this situation?
b Assuming the assumptions are reasonable, what is the probability of four calls being received between
11 pm and midnight?

M 6 A manufacturer of chocolate bars states that the number of whole hazelnuts in a randomly
chosen 100 g hazelnut chocolate bar can be modelled as a random variable having a Poisson
distribution with mean 7.2.
a Find the probability that in a randomly chosen 100 g hazelnut chocolate bar there are:
i exactly eight whole hazelnuts
ii at least four whole hazelnuts.
b Describe how you could check the manufacturer’s statement.

M 7 A handful of rice grains is scattered at random onto a chessboard. Jo counts the number of rice grains in each
of the 64 squares on the chessboard.

Number of rice grains per square 0 1 2 3 4 5 6 >6
Number of squares 12 20 20 7 3 1 1 0

a Use appropriate calculations to show that it may be possible to model the distribution of rice grains using a
Poisson distribution.
b Using an appropriate value for the parameter, find the expected distribution of numbers of rice grains in
the squares.


EXPLORE 2.3

For the situation in Exercise 2A Question 7 to follow a Poisson distribution, the
handful of rice is scattered at random onto
a chessboard. Will similar results be true for
different-sized boards of squares? Or for a
board of triangles?
Experiment for yourself. Tabulate your results
and calculate the mean and variance per
square or triangle.
What do your results suggest?

2.2 Adapting the Poisson distribution for different intervals


In the minor injuries clinic example, we chose to collect data on the number of patients
arriving at the clinic in 30-minute intervals. We could have chosen to collect the data in
60-minute intervals or 10-minute intervals. Suppose we had collected the data across the
whole time interval of 7500 (250 × 30) minutes. We would still have noted 425 patients
arriving. The mean arrival rate of patients would be 1.7 per 30 minutes. We could also
calculate the average rate of patients arriving per 60 minutes: $425 \times \frac{60}{7500} = 3.4$ patients
per 60 minutes. Or the average rate per 10 minutes: $425 \times \frac{10}{7500} = \frac{17}{30}$ patients per
10 minutes. The key point here is that the Poisson distribution is based on an average rate,
something per something. If we know the rate for a specific interval then we can adapt the
parameter to use for multiples of that interval.

KEY POINT 2.4


In a Poisson distribution, events occur at a constant rate; the mean average number of events in a
given interval is proportional to that interval.


WORKED EXAMPLE 2.3

People arrive at random and independently at a post office at an average rate of two people every 5 minutes.
Work out the probability of:
a three people arriving in a 10-minute period
b more than four people arriving in a half-hour period
c five people arriving in a 4-minute period
d one person arriving in a 1-minute period.

Answer

a X ~ Po(4)
$P(X = 3) = e^{-4}\,\frac{4^3}{3!} = 0.195$
First find the value of λ; 10 minutes is double 5 minutes so λ is 2 × 2.

b X ~ Po(12)
$P(X > 4) = 1 - P(X \le 4)$
$= 1 - e^{-12}\left(1 + 12 + \frac{12^2}{2!} + \frac{12^3}{3!} + \frac{12^4}{4!}\right)$
$= 1 - 0.0076 = 0.9924$ or 0.992
First find the value of λ; half-hour = 6 × 5 minutes so λ is 6 × 2. Note that > 4 doesn’t include the value 4.

c X ~ Po(1.6)
$P(X = 5) = e^{-1.6}\,\frac{1.6^5}{5!} = 0.0176$
λ does not need to be either an exact multiple or an integer; here $\lambda = \frac{4}{5} \times 2 = 1.6$.

d X ~ Po(0.4)
$P(X = 1) = e^{-0.4}\,\frac{0.4^1}{1!} = 0.268$
First find the value of λ; 1 minute is one-fifth of 5 minutes so $\lambda = \frac{1}{5} \times 2 = 0.4$.
The time period can be smaller than the original period given.
Remember the mean does not need to be a whole number.
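The rescaling of the mean in this example can be sketched in code. The helper `scaled_mean` below is a hypothetical name introduced for illustration; it assumes a constant rate, as stated in Key Point 2.4, so that the mean is proportional to the length of the interval.

```python
from math import exp, factorial


def poisson_pmf(r, lam):
    """P(X = r) for X ~ Po(lam)."""
    return exp(-lam) * lam ** r / factorial(r)


def scaled_mean(rate, base_interval, new_interval):
    """Mean for new_interval, given `rate` events per base_interval."""
    return rate * new_interval / base_interval


# Two people every 5 minutes, as in Worked example 2.3.
lam_a = scaled_mean(2, 5, 10)   # 10 minutes  -> 4
lam_b = scaled_mean(2, 5, 30)   # 30 minutes  -> 12
lam_c = scaled_mean(2, 5, 4)    # 4 minutes   -> 1.6
lam_d = scaled_mean(2, 5, 1)    # 1 minute    -> 0.4

p_a = poisson_pmf(3, lam_a)                              # three people in 10 minutes
p_b = 1 - sum(poisson_pmf(k, lam_b) for k in range(5))   # more than four in 30 minutes
p_c = poisson_pmf(5, lam_c)                              # five people in 4 minutes
p_d = poisson_pmf(1, lam_d)                              # one person in 1 minute
print(f"{p_a:.3g} {p_b:.3g} {p_c:.3g} {p_d:.3g}")
```

Only the mean changes between parts; the probability formula itself is the same each time.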

WORKED EXAMPLE 2.4

The number of breakages at a restaurant in a randomly chosen week can be modelled as a random variable having
a Poisson distribution with mean 0.8.
a Work out the probability of the following.

i Exactly one breakage in 1 week.

ii Exactly one breakage in a randomly chosen 3-week period.


b The manager offers staff a bonus if there are no breakages in 6 consecutive weeks. What is the probability
that staff receive a bonus?

Answer

a i X ~ Po(0.8)
$P(X = 1) = e^{-0.8}\,\frac{0.8^1}{1!} = 0.359$
Use the probability formula directly.

ii X ~ Po(2.4)
$P(X = 1) = e^{-2.4}\,\frac{2.4^1}{1!} = 0.218$
A randomly chosen 3-week period is a multiple of the interval.

b X ~ Po(0.8)
$P(\text{no breakages in 6 weeks}) = \left[P(X = 0)\right]^6 = \left(e^{-0.8}\,\frac{0.8^0}{0!}\right)^6 = 0.00823$
Calculate the probability of no breakages in 1 week then raise to power 6 for 6 weeks.

Or X ~ Po(4.8)
$P(X = 0) = e^{-4.8}\,\frac{4.8^0}{0!} = 0.00823$
Or find the average rate for 6 weeks, 6 × 0.8, and use this average for a single 6-week interval.

The two approaches highlight why you can use multiples of an interval.
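The agreement between the two approaches in part b can be confirmed numerically. This is a minimal sketch; the variable names are illustrative assumptions, and the weeks are assumed independent, as the Poisson model requires.

```python
from math import exp, isclose

weekly_mean = 0.8   # mean number of breakages per week
weeks = 6

# Approach 1: probability of no breakages in one week, raised to the power 6.
p_product = exp(-weekly_mean) ** weeks

# Approach 2: one 6-week interval with mean 6 × 0.8 = 4.8.
p_scaled = exp(-weekly_mean * weeks)

print(f"{p_product:.3g} {p_scaled:.3g}")  # prints 0.00823 0.00823
print(isclose(p_product, p_scaled))       # prints True
```

The two values agree because raising e^(−0.8) to the power 6 gives e^(−4.8).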

EXERCISE 2B

1 Potholes occur independently and at random at a rate of five in a stretch of road 1 kilometre long. Calculate
the probability that in a randomly chosen stretch of road 2 kilometres long there will be:
a exactly eight potholes b fewer than two potholes.

2 The number of faults in a roll of wallpaper can be modelled as a random variable having a Poisson
distribution with mean 0.6. Find the probability that a decorator using four rolls of wallpaper for a room
finds no faults in the wallpaper.

M 3 Flaws in a given length of cloth occur at the rate of 1.6 per metre. State the assumptions you
need to make to model this situation as a Poisson distribution. Find the probability that:
a in a 5-metre length of cloth there are no flaws
b in a 21 -metre length of cloth there are two or more flaws.

M 4 The number of cars passing a point on a road can be modelled as a random variable having a Poisson
distribution with mean two cars per 5 minutes.
a What is the probability in a randomly chosen 20-minute period that more than three cars will pass that
point on the road?
b What conclusions might you draw if no cars pass that point in the randomly chosen 20-minute period?
c How might installing traffic lights at one end of the road affect the Poisson model?

5 Over a long period of time, a plumber finds that, on average, he receives two emergency calls per week. Work
out the probability of:
a no emergency calls in a two-week period
b one emergency call on one day (assume the plumber is available for emergency calls five days per week).

6 A typist makes, on average, one error for every 200 keyboard strokes. Assuming the errors occur
independently and at random, find the probability that:
a in a document requiring 400 keyboard strokes there is, at most, one error
b in two documents requiring 400 keyboard strokes there is, at most, one error.

M 7 The number of orders placed at an online store is 4500 per hour.


a What assumptions do you need to make to model the number of orders placed at an online store using a
Poisson distribution?


b Assuming the number of orders placed can be modelled as a Poisson distribution, find the probability of:
i zero orders occurring per second ii one order occurring per second.

M 8 Over a period of time, a consumer complaints department notes that it receives an average of 4.2 emails per hour.
a What assumptions do you need to make to model the number of email complaints received by a consumer
complaints department as a Poisson distribution?
b Why might it be unlikely that all complaints are made independently of each other?
c Are complaints more likely to be made at certain times rather than others?
d Assuming the number of emails received can be modelled as a Poisson distribution, find the probability
that the consumer complaints department receives:
i nine emails in 2 hours ii three emails per minute.

DID YOU KNOW?

In 1837 the French mathematician Siméon-Denis Poisson published
his probability theory in his research work on the probability of
judgements in criminal and civil matters, in which he theorised on
the number of wrongful convictions. The Poisson distribution is named
in his honour.

2.3 The Poisson distribution as an approximation to the binomial distribution

A manufacturer makes plastic pipe in a continuous length before cutting the pipe into
shorter lengths to sell. Suppose there are, on average, four defects per metre of pipe; then
we have a Poisson distribution Po(4).
On average, how many defects would there be in a 10 cm length or in a 1 cm length or in a
1 mm length of plastic pipe? It would be reasonable to say that for some length of pipe there
will be either zero or one defect, where the probability of more than one defect is so small
that it may be ignored.

If we now use n pieces of this smaller length of pipe to make a 1 m length of pipe, the probability of one defect in each piece is $\frac{4}{n}$. We have a fixed number of pieces with only two outcomes, zero or one defect, and hence a binomial distribution $B\left(n, \frac{4}{n}\right)$.

From this example, we can see that the Poisson distribution and the binomial distribution are related. Our question is: for what values of n and p is it reasonable to use a Poisson distribution to approximate the binomial distribution?

In this example, we have $B\left(n, \frac{4}{n}\right) \approx \mathrm{Po}(4)$.

For the binomial, mean $= n \times \frac{4}{n} = 4$, which is the same as the mean value for Po(4).

The variance of the binomial $= n \times \frac{4}{n} \times \left(1 - \frac{4}{n}\right)$. As n gets larger, $\frac{4}{n}$ gets very small and $1 - \frac{4}{n} \approx 1$. And so the variance ≈ 4, which is the same as the variance for Po(4). This shows that the approximation of a binomial distribution by a Poisson distribution improves as n becomes larger.
Let us explore some graphs of these discrete random variables.
These graphs show Poisson distributions for different mean values.

[Graphs: Poisson distributions for λ = 1, λ = 3 and λ = 4.8.]

Graphs of Poisson distributions are always skewed, but when the mean is small, a graph
of the Poisson distribution is very skewed. As the mean increases, the graph of a Poisson
distribution becomes more symmetrical.
These graphs show binomial distributions for different values of n and p.

[Graphs: binomial distributions B(10, 0.1), B(50, 0.06) and B(60, 0.08).]

For a binomial distribution E(X ) = np, and for a Poisson distribution E(X ) = λ .
Comparing the graphs of the two distributions with the same mean value, np = λ, we see
that for B(20, 0.05) and Po(1) the Poisson graph is very skewed whereas the binomial graph
is more symmetrical.
As n becomes large and p becomes small, we can see the graph for B(60, 0.08) becoming
more similar to the graph for Po(4.8).

KEY POINT 2.5

For a binomial distribution when the value of n is large and p is small (implying occurrence of the
event is rare), such that np is moderate (as a guide, n > 50 and np < 5), the Poisson distribution
with mean np can be used as an approximation for the binomial distribution.

EXPLORE 2.4

Many binomial distributions give rise to the same Poisson distribution. For example,
Po(4.8) could be approximated from B(100, 0.048) or B(10, 0.48) or B(20, 0.24). Not all of
these are equally good approximations. Use graphing software, such as GeoGebra,
to explore different graphs of binomial and Poisson distributions where np = λ.


To find out how good an approximation a Poisson distribution is to the binomial distribution, let’s compare exact probabilities and approximate probabilities.

Gareth makes glassware. An item of glassware is imperfect if it contains bubbles. Over the years, he finds that the probability of any item containing bubbles is 0.012. Gareth made a batch of 375 items of glassware.

Let X be the random variable ‘number of glassware items containing bubbles’. Then X ~ B(375, 0.012).

Assuming the presence of a bubble in one item of glassware is independent of the presence of a bubble in another item of glassware, then np = 375 × 0.012 = 4.5 and B(375, 0.012) ≈ Po(4.5).

REWIND

Recall from Probability & Statistics 1 Coursebook, Chapter 7, that this situation can be modelled by a binomial distribution with n = 375 and p = 0.012.
The following table shows some probabilities for the number of items of glassware containing bubbles.

Number of items containing bubbles, r                               0       1       2      3
Binomial probability, P(X = r) = C(375, r) × 0.012^r × 0.988^(375 − r)   0.0108  0.0492  0.112  0.169
Poisson approximation, P(X = r) = e^(−4.5) × 4.5^r / r!                  0.0111  0.0500  0.112  0.169

We can see that the probabilities, correct to 3 significant figures, are almost the same for
all these values; hence, the Poisson is a good approximation to the binomial to use in this
situation.
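The comparison in the table can be reproduced directly. The sketch below is illustrative and not part of the original text; it assumes Python 3.8 or later for `math.comb`, and the function names are hypothetical.

```python
from math import comb, exp, factorial


def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


def poisson_pmf(r, lam):
    """P(X = r) for X ~ Po(lam)."""
    return exp(-lam) * lam ** r / factorial(r)


n, p = 375, 0.012
lam = n * p  # mean of the approximating Poisson distribution, 4.5

# For each r: exact binomial probability, then the Poisson approximation.
for r in range(4):
    print(r, f"{binomial_pmf(r, n, p):.3g}", f"{poisson_pmf(r, lam):.3g}")
```

Changing `n` and `p` while keeping `n * p` fixed is one way to see how the quality of the approximation depends on n being large and p being small.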
WORKED EXAMPLE 2.5

A company produces electrical components. Past records show that the proportion of faulty components is 0.4%.
Hari buys a box of 1000 electrical components. Using a suitable approximation, work out the probability that
more than five components are faulty.

Answer

Let the random variable X be ‘the number of faulty components’. Then X ~ B(1000, 0.004).
The situation is binomial. Probability 0.4% $= \frac{0.4}{100} = 0.004$.

B(1000, 0.004) ≈ Po(4)
Poisson approximation is suitable as n is large, p is small and np = 1000 × 0.004 = 4.

$P(X > 5) = 1 - P(X \le 5)$
$= 1 - e^{-4}\left(\frac{4^0}{0!} + \frac{4^1}{1!} + \frac{4^2}{2!} + \frac{4^3}{3!} + \frac{4^4}{4!} + \frac{4^5}{5!}\right)$
Factorise out $e^{-4}$ to simplify the working.
$= 1 - 0.7851$
$= 0.2149$
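As a check on this worked example, the Poisson approximation can be set beside the exact binomial probability. This is a sketch under the example's assumptions (n = 1000, p = 0.004); the function names are illustrative, and `math.comb` requires Python 3.8 or later.

```python
from math import comb, exp, factorial


def poisson_tail(k, lam):
    """P(X > k) for X ~ Po(lam), via 1 - P(X <= k)."""
    return 1 - exp(-lam) * sum(lam ** r / factorial(r) for r in range(k + 1))


def binomial_tail(k, n, p):
    """P(X > k) for X ~ B(n, p), computed from the binomial formula."""
    return 1 - sum(comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(k + 1))


n, p = 1000, 0.004
approx = poisson_tail(5, n * p)   # Po(4) approximation
exact = binomial_tail(5, n, p)    # B(1000, 0.004)
print(f"Poisson approximation: {approx:.4f}")
print(f"Exact binomial:        {exact:.4f}")
```

With n this large and p this small, the two values should differ only slightly, which is what the guideline in Key Point 2.5 leads us to expect.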

EXERCISE 2C

1 For the random variable X, where X ~ B(60, 0.05), use a suitable approximation to find:
a P(X < 4) b P(X ≥ 4)

2 For the random variable X, where X ~ B(120, 0.02), use a suitable approximation to find:
a P(X ≤ 5) b P(X > 2)


3 For the random variable X, where X ~ B(200, 0.01), use a suitable approximation to find P(6 < X < 10).

PS 4 Past records show that the proportion of faulty resistors manufactured by a company is 0.2%. Robert buys a
box of 450 resistors. Using a suitable approximation, work out the probability that fewer than three resistors
are faulty.

PS 5 A machine is known to produce defective components, at random and independently of each other, on
average 0.24% of the time. In a production of 500 components, state a suitable approximating distribution and
calculate the probability that, at most, three components will be defective.

M 6 A rare reaction to a prescribed medicine occurs in 0.05% of patients.


a Using a suitable approximation, find the probability that in a random sample of 3000 patients prescribed
this medicine, four or more will suffer the rare reaction.
b In a random group of n patients, the probability that none suffer the rare reaction is 0.001. Work out the
value of n.

PS 7 For a certain flower, a seed mutation occurs at random with probability 0.0004. A total of 12 000 seeds
germinate. Let X be the number of seeds that germinate and carry the mutation.
a Justify using the Poisson distribution as an approximating distribution for X .
b Use your approximating distribution to find P(X ≤ 3).
c Calculate P(X ≤ 3) given that X > 1.

39
Consider again the example of the manufacturer of plastic pipe at the start of Section 2.3.
In that example there were n small pieces of pipe in which there was only zero or one defect
in each piece, and hence X ~ B  n,  . To ensure each piece does not contain more than
4
 n
one defect, the pieces would need to be very small and we would have a large number of
pieces.
 1000   4  0  4 
1000
Suppose n = 1000, then P(X = 0) =  1 − = 0.018169…
 0   1000   1000 
0 10 000
 10 000  4   4 
For n = 10 000, P(X = 0) =      1− = 0.018300…
 0   10 000   10 000 
0 100 000
 100 000  4   4 
For n = 100 000, P(X = 0) =      1− = 0.018315…
 0   100 000   100 000 

We can see that for increasing values of n the probabilities tend towards 0.018315… = e −4.
This is the value given by the Poisson probability for P( X = 0), where X ~ Po(4) .

Let’s move on to explore the probabilities for one defect using the binomial $X \sim B\left(n, \frac{4}{n}\right)$.

For n = 1000,
$$P(X = 1) = \binom{1000}{1}\left(\frac{4}{1000}\right)^{1}\left(1 - \frac{4}{1000}\right)^{999} = 0.072969\ldots$$
For n = 10 000,
$$P(X = 1) = \binom{10\,000}{1}\left(\frac{4}{10\,000}\right)^{1}\left(1 - \frac{4}{10\,000}\right)^{9999} = 0.073233\ldots$$

If we continue to calculate probabilities for one defect, increasing the values of n, the result
will tend towards 0.073262…, and this is the value given by the Poisson probability for
P(X = 1), where X ~ Po(4), namely $4e^{-4}$.
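
The convergence described above can be checked numerically. The following is a minimal Python sketch, not part of the original text; the function names `binomial_pmf` and `poisson_pmf` are illustrative choices, and only the standard library is assumed.

```python
from math import comb, exp, factorial


def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def poisson_pmf(r, lam):
    """P(X = r) for X ~ Po(lam)."""
    return exp(-lam) * lam**r / factorial(r)


lam = 4  # mean number of defects in the pipe example
for r in (0, 1):
    # Split the pipe into more and more pieces, each with defect probability lam / n.
    for n in (1_000, 10_000, 100_000):
        print(f"r = {r}, n = {n:>7}: {binomial_pmf(r, n, lam / n):.6f}")
    print(f"r = {r}, Po({lam}) limit: {poisson_pmf(r, lam):.6f}")
```

Each extra factor of 10 in n moves the binomial value closer to the Poisson value, which is the behaviour the calculations in the text illustrate.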


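In general, for $X \sim B\left(n, \frac{\lambda}{n}\right)$ and r defects:
$$P(X = r) = \binom{n}{r}\left(\frac{\lambda}{n}\right)^{r}\left(1 - \frac{\lambda}{n}\right)^{n-r} = \frac{n(n-1)(n-2)\cdots(n-r+1)}{r!} \times \frac{\lambda^{r}}{n^{r}}\left(1 - \frac{\lambda}{n}\right)^{n-r}$$
$$= \frac{\lambda^{r}}{r!} \times \frac{n}{n} \times \frac{n-1}{n} \times \frac{n-2}{n} \times \cdots \times \frac{n-r+1}{n} \times \left(1 - \frac{\lambda}{n}\right)^{n-r}$$
As n increases, the fractions $\frac{n-1}{n}, \frac{n-2}{n}, \ldots$ tend towards 1; and $\left(1 - \frac{\lambda}{n}\right)^{n-r} \approx \left(1 - \frac{\lambda}{n}\right)^{n}$
since r is negligible compared to n. We found earlier that $\left(1 - \frac{\lambda}{n}\right)^{n}$ tends towards $e^{-\lambda}$.
Hence, in the limit, $P(X = r) = \frac{\lambda^{r}}{r!}e^{-\lambda}$, which is the probability formula for X ~ Po(λ), the formula
that was given in Section 2.1.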
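
As a quick check of this formula against the numerical limits found for the pipe example, take λ = 4. The case r = 2 is an extra illustration and is not part of the original working:

$$P(X = 0) = \frac{4^{0}}{0!}e^{-4} = e^{-4} = 0.018315\ldots, \qquad P(X = 1) = \frac{4^{1}}{1!}e^{-4} = 4e^{-4} = 0.073262\ldots, \qquad P(X = 2) = \frac{4^{2}}{2!}e^{-4} = 8e^{-4} = 0.146525\ldots$$

The first two values are the limits that the binomial probabilities for zero and one defect were seen to approach.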

2.4 Using the normal distribution as an approximation to the Poisson distribution
In Explore 2.2 and in Section 2.3 we looked at graphs of Poisson distributions with
different values of λ . This series of graphs shows the shape of the Poisson distribution as
the value of λ increases.

[Figure: graphs of the Poisson distribution for λ = 10, λ = 15 and λ = 20]

A normal distribution curve with the same mean and variance as the Poisson graph, that
is, N(λ, λ), has been drawn on each graph.
Notice that as the value of λ increases the normal curve becomes a better fit to the Poisson
graph. The shape of the Poisson distribution graph is always skewed; however, for larger
values of λ, the Poisson distribution graph resembles the shape of a normal distribution.
The Poisson distribution is a limiting distribution of binomial distributions. A binomial
distribution can be approximated by a normal distribution, so it seems reasonable
to suppose that the Poisson distribution itself can be approximated by the normal
distribution, with a continuity correction applied.

REWIND
In Probability & Statistics 1 Coursebook, Chapter 8, we saw that the normal distribution
is a continuous probability distribution described by two parameters: mean and variance.


For a Poisson distribution the mean and variance are the same, and hence Po(λ) ≈ N(λ, λ).
A Poisson distribution is a discrete probability distribution and so to use the normal
distribution as an approximation to the Poisson distribution a continuity correction has
to be applied, just as one has to be applied when using the normal distribution as an
approximation to a binomial distribution.

REWIND
In Probability & Statistics 1 Coursebook, Chapter 8, we saw that a continuity
correction had to be applied when using the normal distribution as an approximation to a
binomial distribution.

KEY POINT 2.6
For λ > 15, the Poisson distribution with mean λ can be reasonably approximated by the normal
distribution with mean λ and variance λ, with a continuity correction applied. The accuracy of this
approximation improves as λ increases.

We can explore how this works, and how good an approximation it might be, by working
out some probabilities. Assume that Manuela posts pictures on her social media page at
random points in time. She posts, on average, 24 pictures each week.
Let the random variable X be ‘the number of pictures posted each week’.
Then X ~ Po(24) .
$$P(X = n) = \frac{e^{-24}\,24^{n}}{n!}$$
The table shows some probabilities for the number of pictures posted using the Poisson
formula and using a normal approximation.

| Poisson: X ~ Po(24) | Normal approximation: Y ~ N(24, 24) | Comments |
|---|---|---|
| | | Apply the continuity correction when using the normal approximation. |
| $P(X = 30) = \frac{e^{-24}\,24^{30}}{30!} = 0.0363$ | $P(29.5 \le Y \le 30.5) = P\left(\frac{29.5 - 24}{\sqrt{24}} \le Z \le \frac{30.5 - 24}{\sqrt{24}}\right) = \Phi(1.327) - \Phi(1.123) = 0.9077 - 0.8692 = 0.0385$ | The approximate probability using the normal distribution is a range of values. |
| $P(29 < X < 31) = \frac{e^{-24}\,24^{30}}{30!} = 0.0363$ | $P(29.5 \le Y \le 30.5) = P\left(\frac{29.5 - 24}{\sqrt{24}} \le Z \le \frac{30.5 - 24}{\sqrt{24}}\right) = \Phi(1.327) - \Phi(1.123) = 0.9077 - 0.8692 = 0.0385$ | 29 and 31 are not included and so the range of values is 29.5 to 30.5. |
| $P(24 \le X \le 26) = \frac{e^{-24}\,24^{24}}{24!} + \frac{e^{-24}\,24^{25}}{25!} + \frac{e^{-24}\,24^{26}}{26!} = 0.231$ | $P(23.5 \le Y \le 26.5) = P\left(\frac{23.5 - 24}{\sqrt{24}} \le Z \le \frac{26.5 - 24}{\sqrt{24}}\right) = \Phi(0.510) - \Phi(-0.102) = 0.6950 - (1 - 0.5406) = 0.236$ | 24 and 26 are included and so the range of values is 23.5 to 26.5. |
| $P(19 < X \le 21) = \frac{e^{-24}\,24^{20}}{20!} + \frac{e^{-24}\,24^{21}}{21!} = 0.134$ | $P(19.5 \le Y \le 21.5) = P\left(\frac{19.5 - 24}{\sqrt{24}} \le Z \le \frac{21.5 - 24}{\sqrt{24}}\right) = \Phi(-0.510) - \Phi(-0.919) = (1 - 0.6950) - (1 - 0.8209) = 0.126$ | 19 is not included but 21 is included, so the range of values is 19.5 to 21.5. |

From these calculations we can see that the probabilities found using the normal distribution
as an approximation to the Poisson distribution agree, correct to 2 decimal places, with those
found using the Poisson probability formula in almost all cases. The exception is
P(24 ≤ X ≤ 26), where 0.231 and 0.236 round to 0.23 and 0.24.
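
The comparison between the exact Poisson values and the normal approximation can be reproduced with a short script. This is a sketch under stated assumptions rather than a method from the text: the helper names `phi`, `poisson_between` and `normal_between` are hypothetical, and Φ is computed from the standard library's error function instead of being read from tables.

```python
from math import erf, exp, factorial, sqrt


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def poisson_between(lo, hi, lam):
    """Exact P(lo <= X <= hi) for X ~ Po(lam), with lo and hi integers."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(lo, hi + 1))


def normal_between(lo, hi, lam):
    """Approximate P(lo <= X <= hi) using N(lam, lam).

    The continuity correction widens the integer range by 0.5 at each end.
    """
    sd = sqrt(lam)
    return phi((hi + 0.5 - lam) / sd) - phi((lo - 0.5 - lam) / sd)


lam = 24  # Manuela's mean number of pictures per week
for lo, hi in [(30, 30), (24, 26), (20, 21)]:
    exact = poisson_between(lo, hi, lam)
    approx = normal_between(lo, hi, lam)
    print(f"P({lo} <= X <= {hi}): Poisson {exact:.4f}, normal {approx:.4f}")
```

Strict inequalities are handled by converting them to inclusive integer bounds first, so P(19 < X ≤ 21) becomes the range 20 to 21.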

WORKED EXAMPLE 2.6

Over the years, a biologist notes that a species of turtle lays on average 60 eggs in each nest. The number of eggs
laid in each nest follows a Poisson distribution, and is independent of the number of eggs laid in other nests.
Calculate the probability that in a randomly chosen nest there are:
a exactly 50 eggs b over 74 eggs c 40 eggs or fewer.

Answer
Let X be the random variable ‘number of eggs in a nest’.
Then X ~ Po(60) ≈ N(60, 60).
(Always state the distribution you are using and any approximating distribution.)

a $$P(X = 50) \approx P\left(\frac{49.5 - 60}{\sqrt{60}} \le Z \le \frac{50.5 - 60}{\sqrt{60}}\right) = \Phi(-1.226) - \Phi(-1.356) = (1 - 0.8899) - (1 - 0.9125) = 0.0226$$
(You could just use Poisson here; the question is included to highlight the process for a single value.)

b $$P(X > 74) = 1 - P(X < 75) \approx 1 - P\left(Z \le \frac{74.5 - 60}{\sqrt{60}}\right) = 1 - \Phi(1.872) = 1 - 0.9693 = 0.0307$$
(You could have written 1 − P(X ≤ 74). 74.5 is the value you need to work with from writing either < 75 or ≤ 74.)

c $$P(X \le 40) \approx P\left(Z \le \frac{40.5 - 60}{\sqrt{60}}\right) = \Phi(-2.517) = 1 - 0.9941 = 0.0059$$

EXERCISE 2D

1 X ~ Po(42). Use the normal approximation to find:
a P(X < 50) b P(X ≤ 40) c P(X = 45)

2 X ~ Po(50). Use the normal approximation to find:
a P(52 < X < 56) b P(50 ≤ X ≤ 52)


3 For a particular pond, in 1 ml of pond water, on average there are 117 microorganisms. Find the probability
there are more than 1200 microorganisms in a 10 ml sample of the pond water.

M 4 On a particular train line, delayed trains occur at an average rate of eight per day.
a What is the probability that fewer than 100 trains are delayed on this line in a 14-day period?
b Decide if your Poisson model for the 14-day period is reasonable to use in this situation. Explain your decision.

PS 5 Fabio receives, on average, 48 emails at work each day. Emails are received at random and independently.
Fabio can answer, at most, 60 emails each day. Find the probability that on a randomly chosen day, Fabio can
answer all emails received.

6 Given that X ~ Po(42) and P(X > x) ≤ 0.1, find the minimum integer value of x.

PS 7 The number of tea lights that are lit at a place of memorial during one day follows a Poisson distribution with mean
38. How many tea lights should be available to be at least 98% certain that there are sufficient tea lights for the
demand?

PS 8 The number of errors made by customers when using online banking transactions each week follows a
Poisson distribution with mean 25.
a Find the probability that there are more than 32 customer errors in a randomly chosen week.
b Katya calculates that it is almost certain that the number of customer errors in a randomly chosen week
is greater than 11 and less than 39. Use calculations to show how Katya arrived at her conclusion.

2.5 Hypothesis testing with the Poisson distribution
For a single observation from a population that has a Poisson distribution, we can directly
compare Poisson probabilities or use a normal approximation to the Poisson distribution.

REWIND
Chapter 1 showed how to carry out a hypothesis test with the binomial distribution.
To carry out hypothesis testing with the Poisson distribution, the process is the
same. Go back and review Chapter 1 to remind yourself of the procedure involved.

WORKED EXAMPLE 2.7

Records show that 1% of the population has a positive reaction to a test for a
particular allergy. In a village, 120 people are tested and four people have a positive
reaction. Test at the 5% level of significance if there is any evidence of an increase in
the population with a positive reaction for this particular allergy.

Answer
X ~ B(120, 0.01) ≈ Po(1.2)
(State the distribution; n is large and np < 5 so approximate to Poisson.)
$H_0: \lambda = 1.2$; $H_1: \lambda > 1.2$
(State the null and alternative hypotheses.)
5% significance level, one-tailed test
(State the significance level of the test and whether one- or two-tailed.)
$$P(X \ge 4) = 1 - P(X \le 3) = 1 - e^{-1.2}\left(\frac{1.2^{0}}{0!} + \frac{1.2^{1}}{1!} + \frac{1.2^{2}}{2!} + \frac{1.2^{3}}{3!}\right) = 1 - 0.9662 = 3.38\%$$
(Calculate the probability.)
As 3.38% < 5%, reject the null hypothesis.
(Compare the probability with the significance level.)
There is evidence at the 5% level of significance to suggest that in this village there
has been an increase in the population who react positively to a test for this
particular allergy.
(Interpret the result.)


REWIND

Using the binomial distribution, we can see the same conclusions would be reached:
X ~ B(120, 0.01)
$H_0: p = 0.01$; $H_1: p > 0.01$
5% level one-tailed test
$$P(X \ge 4) = 1 - P(X \le 3) = 1 - \left[\binom{120}{0}0.01^{0}\,0.99^{120} + \binom{120}{1}0.01^{1}\,0.99^{119} + \binom{120}{2}0.01^{2}\,0.99^{118} + \binom{120}{3}0.01^{3}\,0.99^{117}\right]$$
$$= 1 - 0.9670 = 3.3\%$$
As 3.3% < 5%, reject the null hypothesis, as before.
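
The two tail probabilities in Worked example 2.7 can be computed side by side. The sketch below is illustrative only; the function names `poisson_upper_tail` and `binomial_upper_tail` are assumptions introduced here, and only the Python standard library is used.

```python
from math import comb, exp, factorial


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Po(lam), computed as 1 - P(X <= k - 1)."""
    return 1 - sum(exp(-lam) * lam**r / factorial(r) for r in range(k))


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ B(n, p), computed as 1 - P(X <= k - 1)."""
    return 1 - sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k))


alpha = 0.05                                    # one-tailed significance level
p_poisson = poisson_upper_tail(4, 1.2)          # Po(1.2) approximation
p_binomial = binomial_upper_tail(4, 120, 0.01)  # exact B(120, 0.01)

for name, p_value in [("Poisson", p_poisson), ("binomial", p_binomial)]:
    decision = "reject H0" if p_value < alpha else "do not reject H0"
    print(f"{name}: P(X >= 4) = {p_value:.4f}, so {decision}")
```

Both probabilities fall below 5%, so the approximation and the exact model lead to the same decision here.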

WORKED EXAMPLE 2.8

Accidents on a stretch of road occur at the rate of seven each month. New traffic measures are put in place to
reduce the number of accidents. In the following month, there are only two accidents.
a Test at the 5% level of significance if there is evidence that the new traffic measures have significantly
reduced the number of accidents.
b Over a period of 6 months, there are 32 accidents. It is claimed the new traffic measures are no longer
reducing the number of accidents.
i Test this claim at the 5% level of significance.
ii What would your conclusion be if you tested the claim at the 10% level of significance?

Answer
a X ~ Po(7)
(State the distribution.)
$H_0: \lambda = 7$; $H_1: \lambda < 7$
(State the null and alternative hypotheses.)
5% level one-tailed test
(State the significance level of the test and whether one- or two-tailed.)
$$P(X \le 2) = e^{-7}\left(\frac{7^{0}}{0!} + \frac{7^{1}}{1!} + \frac{7^{2}}{2!}\right) = 0.0296 = 2.96\%$$
(Calculate the probability.)
As 2.96% < 5%, reject the null hypothesis.
(Compare the probability with the significance level.)
There is evidence to suggest the new traffic measures reduce the number of accidents.
(Interpret the result.)

b i X ~ Po(42) ≈ N(42, 42)
(State the distribution and the approximating distribution.)
$H_0: \mu = 42$; $H_1: \mu < 42$
(State the null and alternative hypotheses; since we are approximating to the normal distribution, use µ and not λ.)
5% level one-tailed test
(State the significance level of the test and whether one- or two-tailed.)
$$P(X \le 32) \approx P\left(Z \le \frac{32.5 - 42}{\sqrt{42}}\right) = \Phi(-1.466) = 1 - 0.9287 = 0.0713$$
(Calculate the probability. Remember to use the continuity correction.)
As 7.13% > 5%, accept the null hypothesis.
(Compare the probability with the significance level.)
There is insufficient evidence to suggest the new traffic measures have reduced the number
of accidents.
(Interpret the result.)


ii 7.13% < 10%
(Show the relevant comparison; there is no need to carry out an additional calculation.)
There is evidence at the 10% significance level that the new traffic measures reduce the
number of accidents.
(Interpret the result.)
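
The probabilities used in Worked example 2.8 can be recomputed as a check. This is a minimal sketch, not the book's method: the names `poisson_cdf` and `phi` are hypothetical helpers, and Φ is evaluated with the standard library's error function rather than tables, so the last digit may differ slightly from the tabulated working.

```python
from math import erf, exp, factorial, sqrt


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam**r / factorial(r) for r in range(k + 1))


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


# Part a: one month, X ~ Po(7), observed 2 accidents.
p_a = poisson_cdf(2, 7)

# Part b: six months, X ~ Po(42) approximated by N(42, 42), observed 32 accidents.
# The continuity correction turns X <= 32 into Y <= 32.5.
p_b = phi((32.5 - 42) / sqrt(42))

for label, p_value in [("a", p_a), ("b", p_b)]:
    for alpha in (0.05, 0.10):
        decision = "reject H0" if p_value < alpha else "do not reject H0"
        print(f"part {label}: p = {p_value:.4f}, level {alpha:.0%}: {decision}")
```

The six-month probability lies between 5% and 10%, which is why the conclusion in part b changes with the significance level.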

EXPLORE 2.5

For Worked example 2.8, at a 5% level of significance results from the first month
suggest there is evidence that the new traffic measures reduce the number of
accidents, yet results from 6 months suggest there is insufficient evidence. Which
is true? What other factors might affect the result? At a 5% significance level, up to about
1 month in every 20 will show significantly few accidents by chance alone, even if the rate is unchanged. What does
it imply if the number of accidents in the first month is a statistical fluke? If drivers
become more careless over time as they get used to a new road layout, what does this
imply about your answer to part b ii? Over what period of time should you collect
data to test the effect of the new traffic measures? What else can you do to show if the
new traffic measures have reduced the number of accidents?

EXERCISE 2E

PS 1 The number of calls to a consumer hotline can be modelled by a Poisson
distribution with mean 62 calls every 5 minutes. Salina believes this average is
too low and observes the number of calls recorded during a randomly chosen
5-minute interval to be 70. Stating the null and alternative hypotheses, test
Salina’s belief at the 10% significance level.

PS 2 A small shop sells, on average, seven laptops per week. Following a price rise, the
number of laptops sold drops to four per week. Test at the 5% significance level
whether the sales of laptops have significantly reduced.

PS 3 At a certain company, machine faults occur randomly and at a constant mean
rate of 1.5 per week. Following an overhaul of the machines, the company boss
wishes to determine if the mean rate of machine faults has fallen. The number of
machine faults recorded over 26 weeks is 28. Use a suitable approximation and
test at the 5% significance level whether the mean rate has fallen.

PS 4 The number of misprints per page of a newspaper follows a Poisson distribution
with mean two per page. Following new procedures, 49 misprints are found in
32 pages of the newspaper.
a Use a suitable approximation to test at the 5% level of significance if the mean
number of misprints has changed.
b How many misprints would be needed for a Type I error to have occurred?

REWIND
For hypothesis tests based on direct evaluation of Poisson probabilities, the
probabilities of making Type I and Type II errors are calculated in the same way as
for the binomial distribution in Chapter 1.


Checklist of learning and understanding


● A Poisson distribution is a suitable model to use for events that occur:
  ● singly
  ● independently
  ● at random in a given interval of time or space
  ● at a constant rate, so that the mean number of events in a given interval is proportional
    to the size of the interval.
● If for a Poisson distribution X ~ Po(λ), where λ > 0, then:
  ● $P(X = r) = e^{-\lambda} \times \frac{\lambda^{r}}{r!}$, where r = 0, 1, 2, 3, …
  ● Mean E(X) = λ and Variance Var(X) = λ.
● A binomial distribution B(n, p), where n is large such that n > 50, and p is small such that
  np < 5, can be approximated by a Poisson distribution Po(np). The larger the value for n and
  the smaller the value for p, the better the approximation.
● A Poisson distribution Po(λ), where λ > 15, may be approximated by the normal distribution
  N(λ, λ). A continuity correction must be applied.
