STAE Lecture Notes - LU5
LEARNING OBJECTIVES:
• Construct a probability mass function
• Calculate the expected value, variance and probabilities from a probability mass function
• Calculate the expected value, variance and probabilities for a binomial random variable
• Calculate the expected value, variance and probabilities for a Poisson random variable
• Understand the relationship between a probability density function and the cumulative distribution
function
• Calculate expected values from a probability density function
• Calculate percentiles from a cumulative distribution function
• Calculate the expected value, variance and probabilities for an exponential random variable
• Calculate probabilities for the normal distribution using standard scores
5.1. Introduction
For a given sample space S of some experiment, a random variable is any rule that associates a number with
each outcome in S, i.e., a real-valued function that maps the sample space onto the real line. A random variable
is a function whose domain is the sample space and whose range is a set of real numbers. Random variables
are typically denoted by uppercase letters. Random variables are either discrete or continuous. A discrete
random variable can take on a finite or countable number of values. A continuous random variable can take
on any of the values in an interval, or in the union of disjoint intervals, on the number line.
A probability distribution is an assignment of probabilities to each distinct value of a discrete random variable
or to intervals of values of a continuous random variable. Probability distributions reflect distributions in the
population context and are constructed from empirical data as an estimate of the population distribution, or
from an experiment for which the complete distribution is known in advance (a priori). Population parameters
for central tendency and variability of any probability distribution exist and can be calculated. The population
mean µ is called the expected value of a random variable X, denoted as E(X). The population variance is σ²
and the population standard deviation is σ.
For example, consider the experiment of tossing two fair coins. Let X = the number of tails observed in the
experiment.
• The sample space is S = {HH, HT, TH, TT}
• The possible values of X are {0, 1, 2}
• Therefore, X is a random variable that maps S onto the range of X given by {0, 1, 2}
Exercise 5.1
Classify the following random variables as either discrete or continuous.
Number of traffic fatalities per year in Gauteng
Distance a golf ball travels after being hit with a driver
Time required to drive from home to university on any given day
Number of ships in Durban harbour on any given day
Your weight before breakfast each morning
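A function p(x) is a legitimate probability mass function (p.m.f.) of a discrete random variable X
if:
• $0 \le p(x) \le 1$ for every possible value x
• $\sum_{x} p(x) = 1$
The cumulative distribution function (c.d.f.) is $F(x) = P(X \le x) = \sum_{y \le x} p(y)$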
The variance of X
Let X be a discrete random variable with a set of possible values D, p.m.f. p(x) and expected value µ.
$V(X) = E[(X - \mu)^2] = E(X^2) - [E(X)]^2 = \sum_{x \in D} x^2 p(x) - \mu^2$
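The expected value and variance formulas can be illustrated with a short Python sketch. It uses the two-coin example from the introduction; the p.m.f. values 0.25, 0.50 and 0.25 are derived here on the assumption that the four outcomes in S are equally likely, and the function names are illustrative only.

```python
def pmf_mean(pmf):
    """E(X): sum of x * p(x) over all possible values x."""
    return sum(x * p for x, p in pmf.items())


def pmf_variance(pmf):
    """V(X) via the computational formula E(X^2) - [E(X)]^2."""
    mu = pmf_mean(pmf)
    return sum(x ** 2 * p for x, p in pmf.items()) - mu ** 2


# X = number of tails when two fair coins are tossed
# (assumes HH, HT, TH, TT are equally likely)
coin_pmf = {0: 0.25, 1: 0.50, 2: 0.25}

# A legitimate p.m.f. has probabilities in [0, 1] that sum to 1
assert all(0 <= p <= 1 for p in coin_pmf.values())
assert abs(sum(coin_pmf.values()) - 1.0) < 1e-12

mean = pmf_mean(coin_pmf)
variance = pmf_variance(coin_pmf)
print(f"E(X) = {mean}, V(X) = {variance}")
```

The same two functions work for any p.m.f. stored as a dictionary that maps each value x to p(x).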
The population parameters can also be calculated using the calculator. The discrete values and associated
probabilities are entered into the calculator, as was done for an ungrouped frequency distribution.
Steps
1) Enter data
2) AC
3) STAT → 4:VAR → 2: x̄ → = (population mean)
STAT → 4:VAR → 3: σx → = (population standard deviation)
Exercise 5.2
1) Let X = the number of students that consult with the lecturer on any given day. Use the following p.m.f.
to determine E(X), V(X), and the probability that 1 or 2 students will consult with the lecturer
X 0 1 2 3 4
p(x) 0.20 0.25 0.30 0.15 0.10
$E(X) = \sum_{x \in D} x\,p(x) =$
$V(X) = \sum_{x \in D} x^2 p(x) - \mu^2 =$
$P(X = 1) + P(X = 2) =$
2) Consider the following p.m.f. for the random variable X. Calculate the value k. Find P( X 0) , E(X)
and V(X).
$p(x) = \begin{cases} kx^2 & x = -1, 1, 2 \\ 0 & \text{otherwise} \end{cases}$
3) A telemarketer for a company that sells home security systems contacts customers and finds that each
contact will result in no sale with probability 0.85, a R5000 sale with a probability 0.10 and a R10000
sale with probability 0.05. Calculate the expected value and standard deviation of sales.
5.2.2. Binomial distribution
A binomial random variable arises from an experiment consisting of a number of identical sub experiments
called Bernoulli trials. A Bernoulli trial has only two possible outcomes, a success or a failure. The binomial
variable counts the number of successes out of n trials. It is a discrete random variable as it can take on any
value 0, 1, ..., n. This distribution is characterised by two parameters, namely the number of trials (n) and the
probability of a success (p), and the p.m.f. has been expressed in a general form. If variable X is a binomial
random variable with n trials and a probability of success p, it is denoted by X ~ B(n, p).
An experiment that counts the number of successes in n trials is a binomial random variable if all of the
following properties are satisfied:
• The experiment consists of n identical trials
• Each trial consists of two possible outcomes: success (S) or failure (F)
o Note: a success is not necessarily a positive event, it refers to the focus of the distribution
• The probability of a success is the same for all trials, namely p
• The n trials are independent
The variance of a binomial random variable X ~ B(n, p) is $Var(X) = npq = np(1 - p)$, where q = 1 − p.
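A minimal Python sketch of the binomial calculations follows. It assumes the standard binomial p.m.f. $p(x) = \binom{n}{x} p^x (1 - p)^{n - x}$ and uses hypothetical values n = 10 and p = 0.3 that are not taken from these notes. It checks numerically that the p.m.f. sums to 1 and that the mean and variance agree with np and np(1 − p).

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p), using the standard binomial formula."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Hypothetical parameters, chosen only for illustration
n, p = 10, 0.3

pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
total = sum(pmf)

# Mean and variance computed directly from the p.m.f.
mean = sum(x * px for x, px in enumerate(pmf))
variance = sum(x ** 2 * px for x, px in enumerate(pmf)) - mean ** 2

# "At least 2 successes" is easiest through the complement
p_at_least_2 = 1 - pmf[0] - pmf[1]

print(f"sum = {total:.4f}, E(X) = {mean:.4f}, V(X) = {variance:.4f}")
print(f"P(X >= 2) = {p_at_least_2:.4f}")
```

Working through the complement avoids summing many terms when the event of interest covers most of the range of X.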
Exercise 5.3
The quality control inspector of a production plant will reject a batch of syringes if two or more defective
syringes are found in a random sample of eight syringes taken from the batch. From historical records it is
known that the plant produces 1% defective syringes.
2) What is the expected number of defective syringes the inspector will find?
7) What is the probability that the inspector finds more than 5 but at most 7 syringes that are not defective?
5.2.3. Poisson distribution
The Binomial random variable counts the number of successes out of n trials. Therefore, the total number in
the set (n) is known. Some experiments are such that this upper limit is not known, for example the number
of customers arriving at a store per day. In this case there is no limit to the total number of customers that
could arrive at the store. As a store owner it would be useful to calculate probabilities around customers’
arrivals to ensure that there are enough staff in the store to assist them. If events occur at a constant rate λ
during any given time interval, and occurrences are independent of one another, the events are said to occur
according to a Poisson process. Note: the textbook defines the rate as µ. Let X = the number of events that occur
in a time interval according to a Poisson process. Then X follows a Poisson distribution with a
constant rate of occurrence λ, i.e., X ~ P(λ), and the p.m.f. is denoted by p(x; λ).
• $V(X) = \lambda$
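A hedged Python sketch of Poisson probabilities follows. It assumes the standard Poisson p.m.f. $p(x; \lambda) = e^{-\lambda} \lambda^x / x!$ and a hypothetical rate of 2.5 events per hour observed over 2 hours; neither number comes from these notes. Because the rate of a Poisson process is constant, the rate for a longer interval is taken as the hourly rate multiplied by the interval length.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for X ~ P(lam), using the standard Poisson formula."""
    return exp(-lam) * lam ** x / factorial(x)


# Hypothetical process: 2.5 events per hour, observed for 2 hours
rate_per_hour = 2.5
hours = 2
lam = rate_per_hour * hours  # rate for the whole interval

p_exactly_3 = poisson_pmf(3, lam)

# There is no upper limit on X, so "at least 3" uses the complement
p_at_most_2 = sum(poisson_pmf(x, lam) for x in range(3))
p_at_least_3 = 1 - p_at_most_2

print(f"P(X = 3) = {p_exactly_3:.4f}, P(X >= 3) = {p_at_least_3:.4f}")
```

Rescaling λ to match the interval in the question is usually the first step in any Poisson calculation.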
The Binomial and Poisson distributions are related such that, under certain conditions, the Binomial
distribution can be approximated using the Poisson distribution. Suppose that X ~ B(n, p). If we let n → ∞ and
p → 0 such that np approaches a value λ > 0, then X can be approximated with X* ~ P(λ), where λ = np. This
approximation can be applied if n > 50 and np < 5.
Example
The probability that any given page of a 600-page book contains typographical errors is 0.005. It is assumed
that errors occur independently from page to page. Let X = the number of pages of the book that contain (at least one)
typographical error. Then X ~ B(600, 0.005). The probability that exactly 1 page out of 600 contains errors is:
$P(X = 1) = \binom{600}{1} (0.005)^1 (0.995)^{599} = 0.14899$
Using the Poisson approximation with $\lambda = np = 600(0.005) = 3$:
$P(X = 1) \approx \frac{e^{-3}\, 3^1}{1!} = 0.14936$
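The two results in this example can be checked with a few lines of Python. This is only a verification sketch using the values given in the example (n = 600, p = 0.005); the variable names are illustrative.

```python
from math import comb, exp, factorial

n, p = 600, 0.005
lam = n * p  # Poisson rate used for the approximation

# Exact binomial probability of exactly one page with errors
binomial_p1 = comb(n, 1) * p ** 1 * (1 - p) ** (n - 1)

# Poisson approximation of the same probability
poisson_p1 = exp(-lam) * lam ** 1 / factorial(1)

print(f"Binomial: {binomial_p1:.5f}")
print(f"Poisson:  {poisson_p1:.5f}")
print(f"Absolute difference: {abs(binomial_p1 - poisson_p1):.5f}")
```

The two answers differ only in the fourth decimal place, which is why the approximation is considered acceptable for large n and small np.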
Exercise 5.4
A security company receives on average 4 burglary alarm phone calls a week from a certain suburb, according
to a Poisson process.
1) What is the probability that there will be at least 3 burglary alarm phone calls next week?
2) What is the expected number of burglary alarm phone calls per week?
3) What is the standard deviation of burglary alarm phone calls during a 2-week period?
4) What is the probability that the company receives only 1 burglary alarm phone call during a 3-day
period?
5.3. Continuous Probability Distribution
A random variable X is a continuous random variable if its set of possible values consists either of all numbers
on the real line, or all numbers in a union of disjoint intervals. Furthermore, no single value of a continuous
random variable has a positive probability, i.e., P ( X = x ) = 0 for any possible value x.
Exercise 5.5
Consider the following random variables and indicate which curve/graph best describes the shape of the
distribution for each variable:
1) The time spent waiting at a traffic light, under normal conditions
2) Income distribution of the South African population currently working
3) Height of the female population
4) Lifetime of a light bulb
All of these curves are generated through mathematical equations. A curve f(x) is a probability density function
(p.d.f.) if and only if it is non-negative everywhere and the total area under the curve is equal to 1. Here
probabilities are not associated with individual values of the random variable but with the density around a value.
It therefore follows that, for any possible value x of a continuous random variable X, the probability that the
variable is exactly equal to the value x is zero, i.e., P ( X = x ) = 0 . The population parameters of a p.d.f. exist
and can be calculated using integration. To calculate probabilities associated with an interval of x values, the
p.d.f. equation is integrated over the interval range.
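Let X be a continuous random variable with p.d.f. f(x). For f(x) to be a legitimate p.d.f. it must satisfy the
following conditions:
• $f(x) \ge 0$ for all x
• $\int_{-\infty}^{\infty} f(x)\,dx = 1$
The probability that X takes on a value in the interval [a, b] is $P(a \le X \le b) = \int_a^b f(x)\,dx$. Since $P(X = x) = 0$,
it makes no difference whether the endpoints are included: $P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b)$.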
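As a rough illustration of probabilities as areas, the Python sketch below integrates a p.d.f. numerically. The density f(x) = 2x on [0, 1] is a hypothetical example chosen for this sketch, and the midpoint rule is just one simple way to approximate the integral.

```python
def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


def pdf(x):
    # Hypothetical density: f(x) = 2x for 0 <= x <= 1, and 0 otherwise
    return 2 * x if 0 <= x <= 1 else 0.0


# The total area under a legitimate p.d.f. must equal 1
total_area = integrate(pdf, 0, 1)

# P(0.25 <= X <= 0.5) is the area under f between 0.25 and 0.5
prob = integrate(pdf, 0.25, 0.5)

print(f"Total area = {total_area:.4f}")
print(f"P(0.25 <= X <= 0.5) = {prob:.4f}")
```

For this density the integral can also be done by hand, since the antiderivative is x², which gives 0.5² − 0.25² = 0.1875.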
Exercise 5.6
The current in a certain circuit is a continuous random variable X with density:
$f(x) = \begin{cases} 0.075x + 0.2 & 3 \le x \le 5 \\ 0 & \text{otherwise} \end{cases}$
3) Calculate P ( X 4 )
5.3.2. Cumulative distribution function
The cumulative distribution function (c.d.f.) is the integral function of a p.d.f. and is used to calculate
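probabilities for a continuous random variable. It is denoted by $F(x)$ and is defined as the area under the p.d.f. at
values less than or equal to x, i.e., $F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\,dy$.
• For any a: $P(X > a) = 1 - F(a)$
• The (100p)th percentile $\eta(p)$ satisfies $p = F(\eta(p)) = \int_{-\infty}^{\eta(p)} f(y)\,dy$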
Parameters
For a continuous random variable X with p.d.f. f ( x ) :
• Expected / mean value is: $\mu = E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx$
• Variance is: $\sigma^2 = V(X) = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$
• Computational formula for variance is: $V(X) = E(X^2) - [E(X)]^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2$
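The parameter formulas and the percentile definition can be sketched numerically in Python. The density f(x) = 2x on [0, 1] is a hypothetical example, the midpoint rule stands in for exact integration, and bisection is one simple way (among others) to solve F(x) = p for a percentile.

```python
def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


def pdf(x):
    # Hypothetical density: f(x) = 2x for 0 <= x <= 1, and 0 otherwise
    return 2 * x if 0 <= x <= 1 else 0.0


def cdf(x):
    """F(x) = P(X <= x), the area under the p.d.f. up to x."""
    if x <= 0:
        return 0.0
    return integrate(pdf, 0, min(x, 1.0))


def percentile(p, lo=0.0, hi=1.0, tol=1e-9):
    """Solve F(x) = p by bisection on the support [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


mean = integrate(lambda x: x * pdf(x), 0, 1)          # E(X)
second_moment = integrate(lambda x: x ** 2 * pdf(x), 0, 1)  # E(X^2)
variance = second_moment - mean ** 2                  # computational formula
median = percentile(0.5)                              # 50th percentile

print(f"E(X) = {mean:.4f}, V(X) = {variance:.4f}, median = {median:.4f}")
```

By hand, this density gives E(X) = 2/3, V(X) = 1/18 and a median of √0.5, which the numerical results should match closely.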
Exercise 5.7
Let $f(x) = \begin{cases} \frac{1}{x^2} & x \ge 1 \\ 0 & \text{otherwise} \end{cases}$
Exercise 5.8
Consider the p.d.f. $f(x) = \begin{cases} \frac{3}{8}x^2 & 0 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$
5.3.3. Exponential distribution
The family of Exponential distributions provides probability models that are widely used in engineering and
science. The Exponential distribution is frequently used as a model for the distribution of time until the
occurrence of the first event or times between the occurrence of successive events, such as customers arriving
at a service facility or calls coming into a switchboard. Another important application of the exponential
distribution is to model the distribution of component lifetime. A partial reason for the popularity of such
applications is the "memoryless" property of the Exponential distribution. An Exponential random variable X
is characterised by the rate parameter λ, which is the average number of occurrences in a single time unit,
and is denoted as X ~ Exp(λ). The Exponential distribution is related to the Poisson distribution. If the
number of occurrences in an interval follows a Poisson distribution, then the waiting time between occurrences
follows an exponential distribution, and vice versa.
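• The p.d.f. is $f(x; \lambda) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$
• The c.d.f. for x ≥ 0 is $F(x; \lambda) = P(X \le x) = 1 - e^{-\lambda x}$, so that $P(X > x) = e^{-\lambda x}$
• $P(a \le X \le b) = P(X \le b) - P(X \le a) = P(X > a) - P(X > b)$
• The expected value is $E(X) = \frac{1}{\lambda}$
• The variance is $V(X) = \frac{1}{\lambda^2}$
• The standard deviation is $\sigma = \frac{1}{\lambda}$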
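The exponential formulas above can be tried out with a short Python sketch. The rate of 2 occurrences per hour and the times used are hypothetical values for illustration only. The last part checks the "memoryless" property numerically, i.e., that P(X > s + t | X > s) equals P(X > t).

```python
from math import exp


def exp_cdf(x, lam):
    """F(x) = P(X <= x) for X ~ Exp(lam)."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0


def exp_survival(x, lam):
    """P(X > x), the probability of waiting longer than x."""
    return 1 - exp_cdf(x, lam)


# Hypothetical rate: 2 occurrences per hour (time measured in hours)
lam = 2.0

mean_wait = 1 / lam           # expected waiting time, in hours
variance_wait = 1 / lam ** 2  # variance of the waiting time

# P(0.5 <= X <= 1): waiting between 30 and 60 minutes
p_between = exp_cdf(1.0, lam) - exp_cdf(0.5, lam)

# Memoryless property: having already waited s hours does not change
# the distribution of the remaining waiting time
s, t = 0.5, 0.25
conditional = exp_survival(s + t, lam) / exp_survival(s, lam)

print(f"E(X) = {mean_wait} h, V(X) = {variance_wait} h^2")
print(f"P(0.5 <= X <= 1) = {p_between:.4f}")
print(f"P(X > s + t | X > s) = {conditional:.4f}")
print(f"P(X > t)             = {exp_survival(t, lam):.4f}")
```

When a question gives the rate per hour but asks about minutes, convert either the rate or the times first so that both use the same time unit.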
Exercise 5.9
Trains arrive at a station from 03h00 according to a Poisson process, at a rate of 4 per hour.
1) What is the average number of trains that arrive at this station between 03h00 and 05h00?
2) What is the probability that only two trains arrive in an hour?
3) Find the mean and variance of the waiting time between trains (in minutes)
4) If a passenger arrives at the station at 09h00, find the probability that he has to wait for the train until
after 09h06
5) A passenger arrives at the station at 09h00 and just misses a train. What is the probability that the next
train will arrive between 09h10 and 09h15?
5.3.4. Normal distribution
Many numerical variables follow a distribution where the majority of the observations are around the centre of
the distribution and only a few observations are in the left and right tail areas of the distribution, creating a
bell-shaped curve. Real population distributions are dynamic and may vary over time, but they will generally
follow such a bell-shaped curve.
This distribution is called the normal distribution, or Gaussian distribution, and is the most important
distribution in statistical analyses. If the shape of a population distribution can be effectively described using
the normal distribution, it is possible to calculate probabilities associated with the behaviour of the random
variable in the population. This forms the basis for statistical inference.
A normal random variable X with mean µ and variance σ² is denoted by $X \sim N(\mu, \sigma^2)$.
For example, let’s assume that height of the female population in South Africa follows a normal distribution
with a mean of 165 cm and a standard deviation of 5 cm, i.e., Height = $X \sim N(\mu = 165, \sigma^2 = 5^2)$.
Exercise 5.10
Consider the following three normal curves and comment on the differences and similarities.
[Figure: three normal curves labelled A, B and C]
Differences Similarities
A vs. B
B vs. C
A vs. C
The standard normal distribution is a very particular normal distribution that has a mean of 0 and a standard
deviation of 1 (and consequently a variance of 1). It is denoted by Z ~ N (0,1) and can take on values ranging
from −∞ to +∞. The values of Z are called z-scores or z-values. Cumulative probabilities are used to find
probabilities under any normal distribution. A set of statistical tables (Z-tables) exists for the standard normal
distribution with cumulative probabilities for selected z-scores, i.e., it gives $P(Z \le z) = P(Z < z)$, and is
used to find any area (interval) under the curve. Areas of a different form must be re-written in the form of the
tables. Note that $P(Z = z) = 0$, and $P(X = x) = 0$, for any $X \sim N(\mu, \sigma^2)$.
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
The margins of the Z-tables (first row and first column) contain the z-scores up to two decimal places. The
body of the table contains the cumulative probabilities associated with the corresponding z-scores, up to four
decimal places. As the z-score increases, the cumulative area increases. There is a one-to-one relationship
between the z-score and the area/probability, i.e., every z-score corresponds to a single cumulative area and
every cumulative area corresponds to a single z-score.
For example, to find the area between −∞ and 0.21, i.e., $P(Z \le 0.21)$, find 0.2 in the first column and 0.01 in
the first row, and read the probability/area in the body of the table where the row and column intersect.
Therefore, the area is equal to 0.5832. Similarly, to find the area between −∞ and 0.76, i.e., $P(Z \le 0.76)$, 0.7
in the first column and 0.06 in the first row intersect in the table at the value 0.7764.
The tables can also be read in reverse. Suppose the area between −∞ and a certain z-score is 0.5040. To find the
value of this z-score, find the given area (or closest value) in the body of the table and read the corresponding
z-score from the first column and the first row. For this example, the z-score that corresponds to a cumulative
area of 0.5040 is z = 0.01.
As a further example, find the z-score of the 70th percentile (P70), i.e., find the z-score such that $P(Z \le z) = 0.70$.
The probability/area value in the body of the Z-tables that is closest to 0.7 is 0.6985, corresponding to a
z-score of 0.52. Therefore $P(Z \le 0.52) \approx 0.70$.
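The table look-ups in these examples can be cross-checked in software. The sketch below uses `NormalDist` from Python's standard `statistics` module as a stand-in for the Z-tables; it is offered as a check on the table values, not as a replacement for the table method used in these notes.

```python
from statistics import NormalDist

# Standard normal distribution Z ~ N(0, 1)
Z = NormalDist(mu=0, sigma=1)

# Forward look-up: cumulative area to the left of a z-score
area_021 = Z.cdf(0.21)
area_076 = Z.cdf(0.76)

# Reverse look-up: z-score for a given cumulative area (70th percentile)
z_70 = Z.inv_cdf(0.70)

print(f"P(Z <= 0.21) = {area_021:.4f}")
print(f"P(Z <= 0.76) = {area_076:.4f}")
print(f"z for the 70th percentile = {z_70:.4f}")
```

The software value for the 70th percentile is slightly larger than 0.52 because the table only lists z-scores to two decimal places.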
Rules for Using the Standard Normal Distribution Tables
The probabilities given in the Z-tables are of the form $P(Z \le z) = P(Z < z)$. Areas of a different form must
be re-written in the form of the tables:
• $P(Z > z) = 1 - P(Z \le z)$
• $P(a \le Z \le b) = P(Z \le b) - P(Z \le a)$
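A small Python sketch of these two rules follows, using the z-scores 0.21 and 0.76 from the earlier table look-up examples. `NormalDist` from the standard `statistics` module plays the role of the Z-tables here; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Rule 1: an upper-tail area is 1 minus the cumulative area
upper_tail = 1 - Z.cdf(0.76)          # P(Z > 0.76)

# Rule 2: an interval is the difference of two cumulative areas
interval = Z.cdf(0.76) - Z.cdf(0.21)  # P(0.21 <= Z <= 0.76)

print(f"P(Z > 0.76) = {upper_tail:.4f}")
print(f"P(0.21 <= Z <= 0.76) = {interval:.4f}")
```

With the table values this is 1 − 0.7764 = 0.2236 for the upper tail and 0.7764 − 0.5832 = 0.1932 for the interval.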
Exercise 5.11
1) P ( Z 1.33) =
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
. . . . . . . . . . .
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
. . . . . . . . . . .
2) P ( Z −0.48 ) =
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
. . . . . . . . . . .
-0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
-0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
-0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
. . . . . . . . . . .
3) $P(-1.70 \le Z \le 0.24) =$
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
. . . . . . . . . . .
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
. . . . . . . . . . .
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
. . . . . . . . . . .
-1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
-1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
-1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
. . . . . . . . . . .
Calculating Probabilities for a Normal Random Variable X
The standard normal distribution is just one of an infinite number of possible normal distributions that could
exist. For a random variable X where $X \sim N(\mu, \sigma^2)$ there are no statistical tables with cumulative
probabilities. However, a very important and useful property of normal random variables states that any normal
random variable can be standardised by subtracting its mean and dividing by its standard deviation. This
transformation will always result in the standard normal random variable $Z \sim N(0, 1)$.
If $X \sim N(\mu, \sigma^2)$ then $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$
To calculate probabilities for any normally distributed variable X, standardise X to produce Z, then use the
Z-tables to find the required probabilities.
Example
It is assumed that IQ is normally distributed with a mean of 100 and standard deviation of 12, i.e.,
$X \sim N(100, 12^2)$. The probability that a randomly selected person has an IQ of at least 110 can then be
calculated by first standardising the original variable, and then using the Z-tables to find the required
probability:
$P(X \ge 110) = P\left(\frac{X - \mu}{\sigma} \ge \frac{110 - 100}{12}\right) = P(Z \ge 0.83) = 1 - P(Z < 0.83) = 1 - 0.7967 = 0.2033$
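The standardisation steps in this example can be reproduced in Python. The sketch below rounds the z-score to two decimal places to mimic the Z-tables, and also computes the probability without rounding for comparison. `NormalDist` comes from the standard `statistics` module; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 100, 12  # IQ ~ N(100, 12^2), as given in the example

# Step 1: standardise, rounding to 2 decimals as the Z-tables require
z = round((110 - mu) / sigma, 2)

# Step 2: upper-tail probability from the standard normal distribution
p_table_style = 1 - NormalDist().cdf(z)

# For comparison: the same probability without rounding the z-score
p_unrounded = 1 - NormalDist(mu, sigma).cdf(110)

print(f"z = {z}")
print(f"P(X >= 110), z rounded to 2 decimals: {p_table_style:.4f}")
print(f"P(X >= 110), no rounding:             {p_unrounded:.4f}")
```

The small gap between the two answers comes only from rounding the z-score to match the table.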
Example
It is assumed that IQ is normally distributed with a mean of 100 and standard deviation of 12. Find the highest
IQ value of the lowest 33% of the population, i.e., the x-value for which $P(X \le x) = 0.33$.
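One way to approach this example is to find the z-score with a cumulative area of 0.33 and then reverse the standardisation using x = µ + zσ. The Python sketch below follows that route, with `NormalDist` from the standard `statistics` module standing in for the reverse table look-up. It is a sketch of the calculation, not necessarily the layout used in the lecture.

```python
from statistics import NormalDist

mu, sigma = 100, 12  # IQ ~ N(100, 12^2), as given in the example

# Step 1: z-score whose cumulative area is 0.33 (reverse look-up)
z = NormalDist().inv_cdf(0.33)

# Step 2: undo the standardisation, x = mu + z * sigma
x = mu + z * sigma

# Cross-check: ask the N(100, 12^2) distribution for the percentile directly
x_direct = NormalDist(mu, sigma).inv_cdf(0.33)

print(f"z = {z:.2f}")
print(f"x = {x:.2f}")
```

The z-score is negative because 33% is less than 50%, so the required IQ value lies below the mean of 100.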
Exercise 5.12
Coal is carried in hopper (freight) cars on trains. The weights of coal loaded into each car are normally
distributed with a mean of 75 tons and a standard deviation of 0.8 ton.
1) What is the probability that a car chosen at random will have less than 74.5 tons of coal?
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
. . . . . . . . . . .
-0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
. . . . . . . . . . .
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
. . . . . . . . . . .
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
. . . . . . . . . . .
Exercise 5.13
The time a student spends learning a software package is assumed to be normally distributed with a mean
of 6 hours and a standard deviation of 1.5 hours. A student is selected at random.
1) What is the probability that the student spends longer than 7 hours learning the software package?
2) What is the probability that the student spends between 6.5 and 7 hours learning the software package?
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
. . . . . . . . . . .
-0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
. . . . . . . . . . .