Chapter 5
Discrete Probability
Distributions
When computing probabilities, the sample space, which contains all the outcomes
of the experiment, is listed. If the probabilities for all of the outcomes are
also listed, then these two together are called a probability distribution. With a
probability distribution, the shape can be determined, the mean and standard
deviation can be calculated, and the probability of events can be found. How
to find all of these concepts depends on what type of quantitative variables
are being considered. Remember there are different types of quantitative vari-
ables, called discrete or continuous. What is the difference between discrete and
continuous data? Discrete data can only take on particular values in a range.
Continuous data can take on any value in a range. Discrete data usually arises
from counting while continuous data usually arises from measuring.
If you have a variable, and can find a probability associated with that variable,
it is called a random variable. In many cases the random variable is what
you are measuring, but when it comes to discrete random variables, it is usually
what you are counting. So for the example of how tall is a plant given a new
fertilizer, the random variable is the height of the plant given a new fertilizer.
For the example of how many fleas are on prairie dogs in a colony, the random
variable is the number of fleas on a prairie dog in a colony.
Examples of each:
How tall is a plant given a new fertilizer? Continuous. This is something you
measure.
How many fleas are on prairie dogs in a colony? Discrete. This is something
you count.
Now suppose you put all the values of the random variable together with the
probability that each value would occur. You could then have a distribution
like before, but now it is called a probability distribution since it involves
probabilities. A probability distribution is an assignment of probabilities to
the values of the random variable.
With the idea of a probability distribution, the next thing is to look at the basics
of a probability distribution.
In this case the packages you need are arm and Weighted.Desc.Stat.
## Loading required package: lme4
##
## Attaching package: 'lme4'
## The following object is masked from 'package:mosaic':
##
## factorize
##
## arm (Version 1.11-2, built: 2020-7-27)
## Working directory is /Users/kathrynkozak/R_packages/Stat_book
##
## Attaching package: 'arm'
## The following objects are masked from 'package:mosaic':
##
## logit, rescale
To draw the probability distribution, use the following commands. First you
need to create variables for x and the probability (p) in R Studio. Then you
can draw the distribution.
library(arm)  # provides discrete.histogram()
x <- c(1, 2, 3, 4, 5, 6, 7)  # family sizes
p <- c(0.267, 0.336, 0.158, 0.137, 0.063, 0.024, 0.015)  # probability of each size
discrete.histogram(x, p, bar.width = 1, main = "size of family")
This command is different from the commands used in the past, but it is needed
for discrete probability distributions. To put a title on the graph, use
main="title you want" instead of title= as before.
Notice this graph (Figure 5.1) is skewed right, which means that most families
have around 2 people in them and larger families become more and more rare.
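Before drawing or summarizing a distribution, it can help to confirm that the listed probabilities really form a probability distribution: each one lies between 0 and 1, and together they add up to 1. The following is a minimal sketch in base R using the family-size values from the text. The helper name is_valid_distribution is made up for illustration and does not come from any package.

```r
# Family-size distribution from the text
x <- c(1, 2, 3, 4, 5, 6, 7)
p <- c(0.267, 0.336, 0.158, 0.137, 0.063, 0.024, 0.015)

# Hypothetical helper (not from any package): TRUE when p is a valid
# set of probabilities for the values in x
is_valid_distribution <- function(x, p, tol = 1e-8) {
  length(x) == length(p) &&      # one probability per value
    all(p >= 0 & p <= 1) &&      # each probability is between 0 and 1
    abs(sum(p) - 1) < tol        # probabilities add up to 1
}

is_valid_distribution(x, p)
```

The small tolerance allows for floating-point rounding when the probabilities are added.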
To find the mean, variance, and standard deviation using R Studio, make sure
that the package Weighted.Desc.Stat is loaded, then use the following com-
mands.
[Figure 5.1: Histogram titled "size of family"; horizontal axis x (1 to 7), vertical axis Probability (0.00 to 0.30)]
w.mean(x,p)
## [1] 2.525
w.var(x,p)
## [1] 2.023375
w.sd(x,p)
## [1] 1.422454
The mean is 2.525 people, the variance is 2.02 people², and the standard deviation
is 1.42 people.
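To see where these numbers come from, the same quantities can be computed in base R from the usual population formulas: the mean is the sum of each value times its probability, and the variance is the probability-weighted average of squared distances from the mean. This is a sketch that assumes w.mean, w.var, and w.sd apply these formulas, which is consistent with the output shown above.

```r
# Family-size distribution from the text
x <- c(1, 2, 3, 4, 5, 6, 7)
p <- c(0.267, 0.336, 0.158, 0.137, 0.063, 0.024, 0.015)

mu     <- sum(x * p)            # mean: each value weighted by its probability
sigma2 <- sum((x - mu)^2 * p)   # variance: weighted squared distance from the mean
sigma  <- sqrt(sigma2)          # standard deviation

c(mean = mu, variance = sigma2, sd = sigma)
```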
When calculating the mean and standard deviation of a probability distribution,
you can treat the probability distribution as the population, even though
it was most likely created from a large sample. Since a probability distribution
is basically a population, the mean and standard deviation that are calculated
are actually population parameters and not sample statistics. The notation
used is the same as the notation for population mean, 𝜇, and population
standard deviation, 𝜎, that was used in Chapter 3. Note: the mean can also be
thought of as the expected value. It is the value you expect to get, on average, if the trials
were repeated an infinite number of times. The mean or expected value does not
5.1. BASICS OF PROBABILITY DISTRIBUTIONS 161
need to be a whole number, even if the possible values of x are whole numbers.
This means one can find what value they can expect to get in the long run for
gambling or insurance including extended warranties using the mean of a prob-
ability distribution. First one needs to figure out the probability distribution,
and then follow the process in example 5.1.1.
## [1] -0.5
The expected value (or mean) is -0.5, that is, -$0.50. Since it is negative, you
lose an average of $0.50 each time you play the Pick 3. It seems you would be
better off putting the $1 every week into a savings account than playing the
Pick 3 lottery.
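The expected-value reasoning can be sketched in base R for any game once the net winnings and their probabilities are known. The game below is hypothetical: its ticket price, prize, and chance of winning are invented for illustration and are not the Pick 3 rules.

```r
# Hypothetical game: all three values are invented for illustration only
cost  <- 2      # ticket price in dollars
prize <- 100    # payout in dollars if the ticket wins
p_win <- 0.01   # probability of winning

# Net winnings for each outcome and the probability of each outcome
net  <- c(win = prize - cost, lose = -cost)
prob <- c(win = p_win,        lose = 1 - p_win)

expected_value <- sum(net * prob)   # long-run average net winnings per play
expected_value
```

A negative result means the player loses money on average, even though any single play is either a win or a loss.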
Solution:
To determine this, you need to look at probabilities. Again, look at the probability
of x being four or less and the probability of x being four or more.
P(x ≤ 4) = P(1) + P(2) + P(3) + P(4) = 0.267 + 0.336 + 0.158 + 0.137 = 0.898.
Since this probability is more than 5%, four is not an unusually low value.
P(x ≥ 4) = P(4) + P(5) + P(6) + P(7) = 0.137 + 0.063 + 0.024 + 0.015 = 0.239.
Since this probability is more than 5%, four is not an unusually high value. Thus,
four is not an unusual size of a family.
d. If you did come upon a family that has four people in it, what would you
think?
Solution:
Since it is not unusual for a family to have four members, then you would not
think anything is amiss.
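The check for an unusual value can be sketched in base R by adding up the probabilities in each tail of the family-size distribution and comparing them with 5%, the cutoff used in the solution above.

```r
# Family-size distribution from the text
x <- c(1, 2, 3, 4, 5, 6, 7)
p <- c(0.267, 0.336, 0.158, 0.137, 0.063, 0.024, 0.015)

value <- 4
p_at_most  <- sum(p[x <= value])   # P(x <= 4)
p_at_least <- sum(p[x >= value])   # P(x >= 4)

# The value is unusual only if one of the tail probabilities is below 5%
is_unusual <- p_at_most < 0.05 || p_at_least < 0.05

c(at_most = p_at_most, at_least = p_at_least)
is_unusual
```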
5.1.4 Homework
1. Eyeglassomatic manufactures eyeglasses for different retailers. The num-
ber of days it takes to fix defects in an eyeglass and the probability that
it will take that number of days are in the table.
Table #5.1.4: Number of Days to Fix Defects
## days prob
## 1 1 0.249
## 2 2 0.108
## 3 3 0.091
## 4 4 0.123
## 5 5 0.133
## 6 6 0.114
## 7 7 0.070
## 8 8 0.046
## 9 9 0.019
## 10 10 0.013
## 11 11 0.010
## 12 12 0.008
## 13 13 0.006
## 14 14 0.004
## 15 15 0.002
## 16 16 0.002
## 17 17 0.001
## 18 18 0.001
a. State the random variable.
b. Draw a histogram of the number of days to fix defects.
c. Find the mean number of days to fix defects.
5.2. BINOMIAL PROBABILITY DISTRIBUTION 165
3) There are only two outcomes, which are called a success and a failure.
4) The probability of a success doesn’t change from trial to trial, where p =
probability of success and q = 1 − p = probability of failure.
If you know you have a binomial experiment, then you can calculate binomial
probabilities. This is important because binomial probabilities come up often
in real life. Examples of binomial experiments are:
Toss a fair coin ten times, and find the probability of getting two heads.
Question twenty people in class, and look for the probability of more than half
being women.
Shoot five arrows at a target, and find the probability of hitting it five times.
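As a sketch of the first example, the probability of exactly two heads in ten tosses of a fair coin can be computed two ways in base R: directly from the standard binomial formula, and with the built-in dbinom function. The variable names are chosen here for illustration.

```r
n <- 10     # number of tosses (trials)
p <- 0.5    # probability of heads on a fair coin
r <- 2      # number of heads of interest

# Binomial formula: (number of ways to choose r successes) * p^r * q^(n - r)
prob_formula <- choose(n, r) * p^r * (1 - p)^(n - r)

# Built-in equivalent
prob_dbinom <- dbinom(r, n, p)

c(formula = prob_formula, dbinom = prob_dbinom)
```

Both approaches give the same value, 45/1024, or about 0.044.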
𝑃 (𝑥 ≤ 𝑟) =
pbinom(r, n, p, lower.tail=TRUE)
𝑃 (𝑥 ≥ 𝑟) =
pbinom(r-1, n, p, lower.tail=FALSE)
## [1] 0.8179069
d. Find the probability that nine have green eyes.
Solution: If nine have green eyes, then r=9. The probability that 9 have green
eyes is P(X = 9) = 1.50 × 10⁻¹³. Notice that R gives the answer as 1.50381e-13.
This is the way many computer programs write a number in scientific notation.
R does not print it as 1.50381 × 10⁻¹³, but you can write it that way
by hand. So make sure the answer is written in
correct scientific notation.
dbinom(9,20,0.01)
## [1] 1.50381e-13
e. Find the probability that at most three have green eyes.
Solution:
At most three means that three is the highest value you will have. Find the
probability that x is less than or equal to three. Since this is less than or equal to, the
lower tail of the probability distribution is being used, so P(X ≤ 3) = 0.99996
using the command in R Studio of
pbinom(3,20,0.01, lower.tail=TRUE)
## [1] 0.9999574
The answer is written to more decimal places because, when it is
rounded to three decimal places, the rounding makes the answer 1. But 1 means
that the event will definitely happen, when in reality there is a slight chance that it won’t
happen. It is best to write the answer to more decimal places, or it can be
written as >0.999 to show that the number is very close to 1, but isn’t 1.
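The rounding issue can be seen directly in R. This short sketch compares the value rounded to three decimal places with the value kept to five significant digits.

```r
p_at_most_3 <- pbinom(3, 20, 0.01, lower.tail = TRUE)

round(p_at_most_3, 3)    # rounds to 1, which wrongly suggests certainty
signif(p_at_most_3, 5)   # keeps enough digits to show the value is below 1
p_at_most_3 < 1          # TRUE: the event is not guaranteed
```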
Solution:
At most 2 means 2 or less. So find the probability that there are less than or
equal to 2. P(X ≤ 2) = 0.999, and again, this is the lower tail of the probability
distribution, so use lower.tail=TRUE in the R command:
pbinom(2,20,0.01, lower.tail=TRUE)
## [1] 0.9989964
Solution:
At least four means four or more. Find the probability of x being greater than
or equal to four. Since it is greater than or equal to, this is the right tail of
the probability distribution. However, if you just use r = 4 with lower.tail=FALSE, then
the 4 is not included, because R calculates P(X > 4). You want all numbers from 4 on up, so
you need to use r = 4 − 1 = 3 in the R command. This will include 4 in the
calculation. P(X ≥ 4) = 4.26 × 10⁻⁵
pbinom(4-1,20,0.01, lower.tail=FALSE)
## [1] 4.262093e-05
h. In Europe, four people out of twenty have green eyes. Is this unusual?
What does that tell you?
Solution:
Since the probability of finding four or more people with green eyes is much
less than 0.05, it is unusual to find four people out of twenty with green eyes.
That should make you wonder if the proportion of people in Europe with green
eyes is more than the 1% for the general population. If this is true, then you
may want to ask why Europeans have a higher proportion of green-eyed people.
That of course could lead to more questions.
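The reason for using r − 1 with lower.tail=FALSE can be checked numerically. This sketch uses the green-eyes values from the example (n = 20, p = 0.01) and compares three ways of finding P(X ≥ 4), along with the result of forgetting to subtract 1.

```r
n <- 20     # number of people
p <- 0.01   # probability a person has green eyes
r <- 4      # "at least four"

upper_sum  <- sum(dbinom(r:n, n, p))                    # add P(X = 4), ..., P(X = 20)
upper_tail <- pbinom(r - 1, n, p, lower.tail = FALSE)   # P(X > 3), same as P(X >= 4)
complement <- 1 - pbinom(r - 1, n, p)                   # 1 - P(X <= 3)
excludes_r <- pbinom(r, n, p, lower.tail = FALSE)       # P(X > 4): leaves out 4

c(upper_sum, upper_tail, complement, excludes_r)
```

The first three values agree, while the last one is smaller because it leaves out P(X = 4).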
## [1] 0.892002
d. Find the probability that seven have autism.
Solution: P(X = 7) = 2.84 × 10⁻¹²
dbinom(7,10, 1/88)
## [1] 2.837346e-12
e. Find the probability that at least five have autism.
Solution: P(X ≥ 5) = 4.553 × 10⁻⁸. Again, this is the upper tail of the
probability distribution, so use lower.tail=FALSE and r = 5 − 1 = 4 to make
sure that R calculates for 5 and on up.
pbinom(5-1, 10, 1/88, lower.tail=FALSE)
## [1] 4.553416e-08
f. Find the probability that at most two have autism.
## [1] 0.9998341
g. Suppose five children out of ten have autism. Is this unusual? What does
that tell you?
Solution:
Since the probability of five or more children in a group of ten having autism is
much less than 5%, this would be unusual. If this does happen, then one may
think that the proportion of children diagnosed with autism is actually more
than 1/88.
5.2.3 Homework
1. Approximately 10% of all people are left-handed (“11 little-known facts,”
2013). Consider a grouping of fifteen people.
i. Suppose that, of the next twelve patients discharged, ten did not fill their cardiac
medication. Would this be unusual? What does this tell you?
and then
p <- dbinom(0:n, n, p)
It looks like nothing happened, but R saved the values as variables. To see what
is in each of those variables, type
x
## [1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
## [19] 18 19 20
p
[Figure omitted: plot with horizontal axis labeled x]
## [1] 0.2
w.var(x,p)
## [1] 0.198
w.sd(x,p)
## [1] 0.4449719
You expect on average that out of 20 people, less than 1 person would have
green eyes, with a variance of 0.198 people² and a standard deviation of 0.44
people.
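These values can be cross-checked in base R. The sketch below computes the mean and variance from the weighted sums and compares them with the standard binomial shortcuts, mean = np and variance = npq, which are a well-known result not shown in this excerpt. The variable names are chosen here for illustration.

```r
n <- 20             # number of people
p_success <- 0.01   # probability a person has green eyes

x <- 0:n                            # possible numbers of green-eyed people
probs <- dbinom(x, n, p_success)    # probability of each value of x

mu     <- sum(x * probs)            # mean from the weighted sum
sigma2 <- sum((x - mu)^2 * probs)   # variance from the weighted sum

c(mean = mu, shortcut_mean = n * p_success)
c(variance = sigma2, shortcut_variance = n * p_success * (1 - p_success))
sqrt(sigma2)                        # standard deviation
```

The weighted sums and the shortcuts agree, giving a mean of 0.2, a variance of 0.198, and a standard deviation of about 0.445.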
5.3.2 Homework
1. Suppose a random variable, x, arises from a binomial experiment. Suppose
n = 6, and p = 0.13.