Week04 Proba Distribution

This document covers probability distributions and related concepts. It introduces probability distributions, random variables, and the key differences between discrete and continuous random variables. It then discusses specific probability distributions like the binomial, Poisson, and normal distributions. It also covers how to calculate the mean, variance, and standard deviation of a probability distribution and how to determine if a result is unusually high or low. The document reviews several types of probability distributions and their applications in statistics.

Uploaded by

a.bocus2510

Lecture 4

Probability Distributions
• Review and Preview
• Probability Distributions
• Binomial Probability Distributions
• Parameters for Binomial Distributions
• Poisson Probability Distributions
• The Standard Normal Distribution
• Applications of Normal Distributions
• Sampling Distributions and Estimators
• Assessing Normality
• Normal as Approximation to Binomial and Poisson

SIS 1037Y 2020/2021 2


• This lecture combines the methods of descriptive statistics with those of probability to describe and analyze probability distributions.
• Probability distributions describe what will probably happen instead of what actually did happen, and they are often given in the format of a graph, table, or formula.
• To fully understand probability distributions, we must first understand the concept of a random variable and be able to distinguish between discrete and continuous random variables.


Combining Descriptive Methods and Probabilities
We will construct probability distributions by presenting possible outcomes along with the relative frequencies we expect.




• A probability distribution gives the probability for each value of a variable that is determined by chance.
• It also helps distinguish between outcomes that are likely to occur by chance and outcomes that are “unusual” in the sense that they are not likely to occur by chance.


• Random Variable: a variable (typically represented by X) that has a single numerical value, determined by chance, for each outcome of a procedure.
• Probability Distribution: a description that gives the probability for each value of the random variable, often expressed in the format of a graph, table, or formula.
• Discrete Random Variable: a random variable that has either a finite number of values or a countable number of values, where “countable” means that there might be infinitely many values, but they result from a counting process.
• Continuous Random Variable: a random variable that has infinitely many values, and those values can be associated with measurements on a continuous scale without gaps or interruptions.


• There is a numerical random variable x, and its values are associated with corresponding probabilities.
• The sum of all probabilities must be 1.
◦ ∑ P(x) = 1
• Each probability value must be between 0 and 1 inclusive.
◦ 0 ≤ P(x) ≤ 1


Graphs
The probability histogram is very similar to a relative frequency histogram, but the vertical scale shows probabilities.
[Figure: probability histogram for a certain variable X]

Mean, Variance and Standard Deviation of a Probability Distribution
Mean: $\mu = \sum [x \cdot P(x)]$
Variance: $\sigma^2 = \sum [(x - \mu)^2 \cdot P(x)]$
Variance (shortcut): $\sigma^2 = \sum [x^2 \cdot P(x)] - \mu^2$
Standard deviation: $\sigma = \sqrt{\sum [x^2 \cdot P(x)] - \mu^2}$

• The expected value of a discrete random variable is denoted by E, and it represents the mean value of the outcomes. It is obtained by finding the value of ∑[x · P(x)].
• E(x) = ∑[x · P(x)]


• The following table describes the probability distribution for the number of girls in two births.
• Find the mean, variance, and standard deviation.

x       P(x)
0       0.25
1       0.50
2       0.25
Total   1.00


• Working table for the number of girls in two births (using μ = 1):

x       P(x)    x·P(x)   (x − μ)²·P(x)
0       0.25    0.00     0.25
1       0.50    0.50     0.00
2       0.25    0.50     0.25
Total   1.00    1.00     0.50


• Using the formulas given previously:
• Mean = μ = ∑[x · P(x)] = 1
• Variance = σ² = ∑[(x − μ)² · P(x)] = 0.5
• Standard deviation = σ = √0.5 ≈ 0.707
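The calculation for the two-births table can be checked with a short script. This is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the lecture.

```python
from math import sqrt

# Probability distribution for the number of girls in two births (from the table).
distribution = {0: 0.25, 1: 0.50, 2: 0.25}

# Requirements of a probability distribution: probabilities sum to 1, each in [0, 1].
assert abs(sum(distribution.values()) - 1.0) < 1e-12
assert all(0.0 <= p <= 1.0 for p in distribution.values())

# Mean: sum of x * P(x)
mean = sum(x * p for x, p in distribution.items())

# Variance: sum of (x - mean)^2 * P(x), and the shortcut sum of x^2 * P(x) - mean^2
variance = sum((x - mean) ** 2 * p for x, p in distribution.items())
variance_shortcut = sum(x ** 2 * p for x, p in distribution.items()) - mean ** 2

std_dev = sqrt(variance)

print(mean, variance, round(std_dev, 3))
```

Both variance formulas give the same value, which is a quick way to catch arithmetic slips in a hand calculation.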


• According to the range rule of thumb, most values should lie within 2 standard deviations of the mean.
• We can therefore identify “unusual” values by determining whether they lie outside these limits:
• Maximum usual value = μ + 2σ
• Minimum usual value = μ − 2σ


• If, under a given assumption (such as the assumption that a coin is fair), the probability of a particular observed event (such as 992 heads in 1000 tosses of a coin) is extremely small, we conclude that the assumption is probably not correct.
• Using Probabilities to Determine When Results Are Unusual
◦ Unusually high: x successes among n trials is an unusually high number of successes if P(x or more) ≤ 0.05.
◦ Unusually low: x successes among n trials is an unusually low number of successes if P(x or fewer) ≤ 0.05.




• Binomial probability distributions allow us to deal with circumstances in which the outcomes belong to two relevant categories, such as acceptable/defective or survived/died.
• A binomial probability distribution results from a procedure that meets all the following requirements:
◦ 1. The procedure has a fixed number of trials.
◦ 2. The trials must be independent. (The outcome of any individual trial doesn’t affect the probabilities in the other trials.)
◦ 3. Each trial must have all outcomes classified into two categories (commonly referred to as success and failure).
◦ 4. The probability of a success remains the same in all trials.


• S and F (success and failure) denote the two possible categories of all outcomes; p and q denote the probabilities of S and F, respectively, so
• P(S) = p (p = probability of success)
• P(F) = 1 − p = q (q = probability of failure)
• n denotes the fixed number of trials.
• x denotes a specific number of successes in n trials, so x can be any whole number between 0 and n, inclusive.
• p denotes the probability of success in one of the n trials.
• q denotes the probability of failure in one of the n trials.
• P(x) denotes the probability of getting exactly x successes among the n trials.


• When an adult is randomly selected, there is a 0.85 probability that this person knows what Twitter is.
• Suppose we want to find the probability that exactly three of five randomly selected adults know of Twitter.
• Does this procedure result in a binomial distribution?
◦ Yes. There are five trials, which are independent. Each trial has two outcomes, and there is a constant probability of 0.85 that an adult knows of Twitter.
• Caution
◦ Be sure that x and p both refer to the same category being called a success.
◦ When sampling without replacement, consider events to be independent if n < 0.05N (the sample size n is less than 5% of the population size N).


• Two methods for finding the probabilities corresponding to the random variable x in a binomial distribution will be considered.
◦ Method 1: Using the Binomial Probability Formula
◦ Method 2: Using Technology
• Other methods
◦ Using tables


• $P(x) = \frac{n!}{(n - x)!\,x!} \, p^x q^{n - x}$
◦ for x = 0, 1, 2, …, n
• where
◦ n = number of trials
◦ x = number of successes among n trials
◦ p = probability of success in any one trial
◦ q = probability of failure in any one trial (q = 1 − p)
• $\frac{n!}{(n - x)!\,x!}$: the number of outcomes with exactly x successes among n trials
• $p^x q^{n - x}$: the probability of x successes among n trials for any one particular order


• Mathematical and statistical software, spreadsheets, and some calculators can be used to find binomial probabilities.


• Given that there is a 0.85 probability that any given adult knows of Twitter, use the binomial probability formula to find the probability of getting exactly three adults who know of Twitter when five adults are randomly selected.
• We have: n = 5, x = 3, p = 0.85, q = 0.15
• We want: P(3)
• We get P(3) = 10 × 0.85³ × 0.15² ≈ 0.138
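As an illustration of Method 2 (using technology), the binomial probability formula can be evaluated directly. The function name `binomial_pmf` is a hypothetical helper introduced here, not something defined in the lecture; `math.comb` supplies the n!/((n − x)! x!) term.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials,
    each with success probability p (binomial probability formula)."""
    q = 1.0 - p
    return comb(n, x) * p ** x * q ** (n - x)


# Twitter example: n = 5 adults, p = 0.85, exactly x = 3 know of Twitter.
p3 = binomial_pmf(3, 5, 0.85)
print(round(p3, 3))

# The probabilities over x = 0..n should sum to 1.
total = sum(binomial_pmf(x, 5, 0.85) for x in range(6))
```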




• Characteristics of a binomial distribution: centre, variation and distribution.
• Given a particular binomial probability distribution, we can find its mean, variance and standard deviation.
• A strong emphasis is placed on interpreting and understanding those values.


• Mean: μ = np
• Variance: σ² = npq
• Standard deviation: σ = √(npq)


• 95% of school leavers want to join UoM. A group consists of 12 randomly selected school leavers.
• Find the mean and standard deviation of the number of school leavers in the group who want to join UoM.


• μ = np = 12 × 0.95 = 11.4
• σ = √(npq) = √(12 × 0.95 × 0.05) ≈ 0.755
• Using the range rule of thumb given previously, the maximum usual number of school leavers wanting to join UoM is μ + 2σ ≈ 12.9. Since 12 is below this limit, it is not unusual for everyone in the group to want to join UoM.
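The binomial parameters and the range rule of thumb for the UoM example can be reproduced in a few lines. This is a sketch with illustrative variable names; it assumes only the formulas μ = np, σ = √(npq) and the limits μ ± 2σ stated earlier.

```python
from math import sqrt

# UoM example: n = 12 school leavers, p = 0.95 want to join UoM.
n, p = 12, 0.95
q = 1 - p

mu = n * p                # mean number wanting to join
sigma = sqrt(n * p * q)   # standard deviation

# Range rule of thumb: usual values lie within 2 standard deviations of the mean.
min_usual = mu - 2 * sigma
max_usual = mu + 2 * sigma

# All 12 wanting to join is "usual" if 12 lies within the limits.
all_twelve_is_usual = min_usual <= 12 <= max_usual
print(round(mu, 1), round(sigma, 3), round(min_usual, 2), round(max_usual, 2))
```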




• The Poisson distribution is another discrete probability distribution. It is important because it is often used to describe the behaviour of rare events (events with small probabilities).
• It applies to occurrences of some event over a specified interval.
• The random variable x is the number of occurrences of the event in an interval.
• The interval can be time, distance, area, volume, or some similar unit.
• $P(x) = \frac{\mu^x e^{-\mu}}{x!}$
• The letter λ is usually used for the parameter instead of μ.


• The random variable x is the number of occurrences of an event over some interval.
• The occurrences must be random.
• The occurrences must be independent of each other.
• The occurrences must be uniformly distributed over the interval being used.


• The mean number of occurrences of the event over the interval is μ.
• The variance is σ² = μ.
• The standard deviation is σ = √μ.


• The binomial distribution is affected by the sample size n and the probability p, whereas the Poisson distribution is affected only by the mean μ.
• In a binomial distribution the possible values of the random variable x are 0, 1, …, n, but a Poisson distribution has possible x values of 0, 1, 2, …, with no upper limit.


• Assume that a Poisson distribution is a suitable model for 530 cyclones over 100 years.
• Find μ, the mean number of cyclones per year.
◦ μ = number of cyclones / number of years = 530/100 = 5.3
• If P(x) is the probability of x cyclones in a randomly selected year, find P(2).
◦ $P(2) = \frac{5.3^2 e^{-5.3}}{2!} \approx 0.0701$
• Similarly
◦ $P(0) = \frac{5.3^0 e^{-5.3}}{0!} \approx 0.0050$
◦ $P(1) = \frac{5.3^1 e^{-5.3}}{1!} \approx 0.0265$
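The cyclone probabilities can be evaluated with the Poisson formula directly. This is a minimal sketch; `poisson_pmf` is a hypothetical helper name, and the data (530 cyclones over 100 years) come from the example.

```python
from math import exp, factorial


def poisson_pmf(x: int, mu: float) -> float:
    """Probability of exactly x occurrences in an interval with mean mu."""
    return mu ** x * exp(-mu) / factorial(x)


# Mean number of cyclones per year.
mu = 530 / 100

# Probabilities of 0, 1 and 2 cyclones in a randomly selected year.
for x in range(3):
    print(x, round(poisson_pmf(x, mu), 4))
```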


• The Poisson distribution is sometimes used to approximate the binomial distribution when n is large and p is small. The larger the n and the smaller the p, the better the approximation.
• Rule of thumb for using the Poisson to approximate the binomial: n ≥ 100 and np ≤ 10
◦ The approximation is good when p < 0.05 and n > 20
◦ What about n > 40 and p < 0.1?
• If both of the above requirements are met, use the Poisson distribution with:
◦ μ = np


x    B(n = 100, p = 0.01)    Poisson (μ = np = 1)
0    0.366032                0.367879
1    0.369730                0.367879
2    0.184865                0.183940
3    0.060999                0.061313
4    0.014942                0.015328
5    0.002898                0.003066
6    0.000463                0.000511

• Consider a binomial situation with n = 100 and p = 0.075.
• Calculate the probability of fewer than 10 successes.


• Bin(100, 0.075)
• We have to sum the probabilities for 0, 1, 2, …, 9 successes.
• The factorials are very large and the probabilities are very small.
• Manual calculation is very tedious.
• Answer: 0.7832687


• The Poisson approximation to the binomial sets μ equal to np, i.e. 100 × 0.075,
• so μ = 7.5.
• We still have to sum the probabilities for 0, 1, 2, …, 9 successes.
• Answer: 0.7764076
• So the approximation agrees with the binomial answer to 2 decimal places (both round to 0.78).
• But the manual calculation is much simpler than for the binomial.
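The comparison between the exact binomial answer and its Poisson approximation can be reproduced as follows. This is a sketch using only the standard library; the variable names are illustrative.

```python
from math import comb, exp, factorial

# Binomial situation from the example: n = 100, p = 0.075.
n, p = 100, 0.075
mu = n * p  # Poisson parameter used for the approximation

# P(fewer than 10 successes) = sum of P(x) for x = 0, 1, ..., 9.
binomial_cdf = sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(10))
poisson_cdf = sum(mu ** x * exp(-mu) / factorial(x) for x in range(10))

print(round(binomial_cdf, 7), round(poisson_cdf, 7))
print("absolute difference:", round(abs(binomial_cdf - poisson_cdf), 4))
```

Summing ten Poisson terms avoids the large binomial coefficients, which is why the approximation is attractive for hand calculation.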




• Look at the standard normal distribution, which has three properties:
◦ Its graph is bell-shaped.
◦ Its mean is equal to 0 (μ = 0).
◦ Its standard deviation is equal to 1 (σ = 1).
• Develop the skill to find areas (or probabilities or relative frequencies) corresponding to various regions under the graph of the standard normal distribution.
• Find z scores that correspond to areas under the graph.


• Uniform Distribution: A continuous random variable has a uniform distribution if its values are spread evenly over the range of possibilities. The graph of a uniform distribution results in a rectangular shape.
• Density Curve: A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:
◦ 1. The total area under the curve must equal 1.
◦ 2. Every point on the curve must have a vertical height that is 0 or greater. (That is, the curve cannot fall below the x-axis.)
• Because the total area under the density curve is equal to 1, there is a correspondence between area and probability.


Using Area to Find Probability
Given the uniform distribution illustrated, find the probability that a randomly selected voltage level is greater than 124.5 volts.
[Figure: uniform distribution of voltage levels; the shaded area represents voltage levels greater than 124.5 volts.]


The standard normal distribution is a normal probability distribution with μ = 0 and σ = 1. The total area under its density curve is equal to 1.


• We can find areas (probabilities) for different regions under a normal model using technology or a table.
• Table (see attached):
◦ It is designed only for the standard normal distribution, which has a mean of 0 and a standard deviation of 1.
• Using technology:
◦ Spreadsheet
◦ Maths and stats packages
◦ Programming
◦ Calculator
• Either method usually gives the cumulative area from the left up to a vertical line above a specific value of z.


• P(a < z < b) denotes the probability that the z score is between a and b.
• P(z > a) denotes the probability that the z score is greater than a.
• P(z < a) denotes the probability that the z score is less than a.


Reading the standard normal table:
• The row shows the value of Z to the first decimal place.
• The column gives the value of Z to the second decimal place.
• The value within the table gives the probability from Z = −∞ up to the desired Z value.

Z     0.00    0.01    0.02    …
0.0
0.1
…
2.0   .9772

P(Z < 2.00) = 0.9772
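When a table is not at hand, the cumulative area from the left can be computed from the error function, using the identity Φ(z) = ½[1 + erf(z/√2)]. The sketch below assumes only the Python standard library; `standard_normal_cdf` is a hypothetical helper name.

```python
from math import erf, sqrt


def standard_normal_cdf(z: float) -> float:
    """Cumulative area from the left, P(Z < z), for the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# Table lookup from the slide: P(Z < 2.00)
p_less = standard_normal_cdf(2.00)

# The other two notations follow from cumulative areas:
p_greater = 1.0 - standard_normal_cdf(2.00)                           # P(z > 2.00)
p_between = standard_normal_cdf(2.00) - standard_normal_cdf(-2.00)    # P(-2.00 < z < 2.00)

print(round(p_less, 4), round(p_greater, 4), round(p_between, 4))
```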


• 1. Draw a bell-shaped curve and identify the region under the curve that corresponds to the given probability. If that region is not a cumulative region from the left, work instead with a known region that is a cumulative region from the left.
• 2. Using the cumulative area from the left, locate the closest probability in the body of the Normal Probability Table and identify the corresponding z score.


Finding z Scores When Given Probabilities
Finding the 95th percentile: the area to the right of z is 5% (0.05), so the cumulative area from the left is 0.95. The z score will be positive: z = 1.645.
[Figure: standard normal curve with the upper 5% shaded and z = 1.645 marked.]


• Definition: For the standard normal distribution, a critical value is a z score separating unlikely values from those that are likely to occur.
• Notation: The expression z_α denotes the z score with an area of α to its right.
• Example
◦ Find the value of z_0.025.
◦ The notation z_0.025 is used to represent the z score with an area of 0.025 to its right.
◦ Referring to the bone density example, z_0.025 = 1.96.
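Critical values and percentiles can also be found with technology instead of a reverse table lookup. The sketch below uses `statistics.NormalDist.inv_cdf` from the Python standard library (Python 3.8 or later); `critical_value` is a hypothetical helper name.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)


def critical_value(alpha: float) -> float:
    """z score with an area of alpha to its right,
    i.e. a cumulative area of 1 - alpha to its left."""
    return std_normal.inv_cdf(1.0 - alpha)


z_025 = critical_value(0.025)   # area 0.025 to the right
z_05 = critical_value(0.05)     # 95th percentile

print(round(z_025, 2), round(z_05, 3))
```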




• This section covers methods for working with normal distributions that are not standard; that is, the mean is not 0 or the standard deviation is not 1, or both.
• The key concept is that we can use a simple conversion that allows us to standardize any normal distribution so that the same methods of the previous section can be used.
• Conversion formula: z = (x − μ)/σ
◦ Round z scores to 2 decimal places.


Many Normal Distributions

By varying the parameters μ and σ, we obtain different normal distributions.


• 1. Sketch a normal curve, label the mean and any specific x values, then shade the region representing the desired probability.
• 2. For each relevant x value that is a boundary for the shaded region, use the conversion formula z = (x − μ)/σ to convert that value to the equivalent z score.
• 3. Use computer software, a calculator, or the table to find the area of the shaded region. This area is the desired probability.


• Do not confuse z scores and areas. z scores are distances along the horizontal scale, but areas are regions under the normal curve.
◦ Normal tables list z scores in the left column and across the top row, but areas are found in the body of the table.
• Choose the correct (right/left) side of the graph.
• A z score must be negative whenever it is located in the left half of the normal distribution.
• Areas (or probabilities) are positive or zero values, but they are never negative.


• 1. Sketch a normal distribution curve, enter the given probability or percentage in the appropriate region of the graph, and identify the x value(s) being sought.
• 2. If using technology, refer to the instructions given. If using the table to find the z score corresponding to the cumulative left area bounded by x, refer to the body of the table to find the closest area, then identify the corresponding z score.
• 3. Using the conversion formula z = (x − μ)/σ, enter the values for μ, σ, and the z score found in step 2, and then solve for x: x = μ + (z · σ)
• 4. Refer to the sketch of the curve to verify that the solution makes sense in the context of the graph and in the context of the problem.


• Let X represent the time it takes (in seconds) to download an image file from the internet.
• Suppose X is normal with a mean of 18.0 seconds and a standard deviation of 5.0 seconds. Find P(X < 18.6).
[Figure: normal curve for X with the mean 18.0 and the value 18.6 marked.]

Finding Normal Probabilities
• Z = (x − μ)/σ = (18.6 − 18.0)/5.0 = 0.12
• P(X < 18.6) = P(Z < 0.12)
[Figure: the X curve (μ = 18, σ = 5) with 18 and 18.6 marked, beside the Z curve (μ = 0, σ = 1) with 0 and 0.12 marked.]

Standardized Normal Probability Table (Portion)

Z     .00     .01     .02
0.0   .5000   .5040   .5080
0.1   .5398   .5438   .5478
0.2   .5793   .5832   .5871
0.3   .6179   .6217   .6255

P(X < 18.6) = P(Z < 0.12) = 0.5478


• Suppose X is normal with mean 18.0 and standard deviation 5.0.
• Now find P(X > 18.6).
[Figure: normal curve for X with the region to the right of 18.6 shaded.]

• P(X > 18.6) = P(Z > 0.12) = 1.0 − P(Z ≤ 0.12) = 1.0 − 0.5478 = 0.4522
[Figure: the area to the left of Z = 0.12 is 0.5478, so the area to its right is 1.0 − 0.5478 = 0.4522.]

• Suppose X is normal with mean 18.0 and standard deviation 5.0. Find P(18 < X < 18.6).

Calculate Z-values:

$Z = \frac{X - \mu}{\sigma} = \frac{18 - 18}{5} = 0$

$Z = \frac{X - \mu}{\sigma} = \frac{18.6 - 18}{5} = 0.12$

P(18 < X < 18.6) = P(0 < Z < 0.12)


Standardized Normal Probability Table (Portion)

Z     .00     .01     .02
0.0   .5000   .5040   .5080
0.1   .5398   .5438   .5478
0.2   .5793   .5832   .5871
0.3   .6179   .6217   .6255

P(18 < X < 18.6) = P(0 < Z < 0.12) = P(Z < 0.12) − P(Z ≤ 0) = 0.5478 − 0.5000 = 0.0478

• Suppose X is normal with mean 18.0 and standard deviation 5.0.
• Now find P(17.4 < X < 18).
[Figure: normal curve for X with the region between 17.4 and 18.0 shaded.]

• P(17.4 < X < 18) = P(−0.12 < Z < 0) = P(Z < 0) − P(Z ≤ −0.12) = 0.5000 − 0.4522 = 0.0478
• The normal distribution is symmetric, so this probability is the same as P(0 < Z < 0.12).
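The four download-time probabilities worked through above can be checked with technology. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative. Small differences from the table values arise because the table rounds areas to 4 decimal places.

```python
from statistics import NormalDist

# Download time X is normal with mean 18.0 s and standard deviation 5.0 s.
download_time = NormalDist(mu=18.0, sigma=5.0)

# Standardizing 18.6 gives z = (18.6 - 18.0) / 5.0 = 0.12.
z = (18.6 - download_time.mean) / download_time.stdev

p_less = download_time.cdf(18.6)                                     # P(X < 18.6)
p_greater = 1.0 - download_time.cdf(18.6)                            # P(X > 18.6)
p_between_right = download_time.cdf(18.6) - download_time.cdf(18.0)  # P(18 < X < 18.6)
p_between_left = download_time.cdf(18.0) - download_time.cdf(17.4)   # P(17.4 < X < 18)

print(round(z, 2), round(p_less, 4), round(p_greater, 4),
      round(p_between_right, 4), round(p_between_left, 4))
```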


When designing aircraft cabins, what ceiling height will allow 95% of men to stand without bumping their heads? Men’s heights are normally distributed with a mean of 69.5 inches and a standard deviation of 2.4 inches.
First, draw the normal distribution.


• The required height is the 95th percentile of men’s heights, so the z score has a cumulative area of 0.95 to its left: z = 1.645.
• With z = 1.645, μ = 69.5, and σ = 2.4, we can solve for x.
• x = μ + (z · σ) = 69.5 + (1.645 × 2.4) = 73.448 inches
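The cabin-height calculation is an inverse problem: a probability is given and the x value is sought. A minimal sketch using `statistics.NormalDist.inv_cdf` (Python 3.8 or later) is shown below, alongside the hand calculation with the rounded table value z = 1.645; the variable names are illustrative.

```python
from statistics import NormalDist

# Men's heights: normal with mean 69.5 in and standard deviation 2.4 in.
heights = NormalDist(mu=69.5, sigma=2.4)

# Ceiling height that 95% of men fall below (the 95th percentile).
ceiling_exact = heights.inv_cdf(0.95)

# Hand calculation from the slide: x = mu + z * sigma with z = 1.645.
ceiling_by_hand = 69.5 + 1.645 * 2.4

print(round(ceiling_exact, 3), round(ceiling_by_hand, 3))
```

The two results differ only in the third decimal place because the table value of z is rounded.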


• A bone mineral density test can be helpful in identifying the presence of osteoporosis.
• The result of the test is commonly measured as a z score, which has a normal distribution with a mean of 0 and a standard deviation of 1.
• A randomly selected adult undergoes a bone density test.
• Find the probability that the result is a reading less than 1.27.


P(z < 1.27) = 0.8980

The probability of a randomly selected adult having a bone density test result less than 1.27 is 0.8980.


Example – continued

The probability of a randomly selected adult having a bone density above –1 is 0.8413.



• A population's mean, μ, is found by summing all the data values, x_i, and dividing by the total number, N: $\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$
• We may also calculate what is called a sample mean using only a subset of the population containing n of the N possible values, where n is the number of data values, x_i, in the sample: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$


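• The population standard deviation is defined as: $\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$
• The standard deviation of the sample: $s_n = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
• The sample standard deviation: $s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$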


• The main objective is to understand the concept of a sampling distribution of a statistic, which is the distribution of all values of that statistic when all possible samples of the same size are taken from the same population.
• We will also see that some statistics are better than others for estimating population parameters.


• The sampling distribution of a statistic (such as the sample mean or sample proportion) is the distribution of all values of the statistic when all possible samples of the same size n are taken from the same population. (The sampling distribution of a statistic is typically represented as a probability distribution in the format of a table, probability histogram, or formula.)
• The sampling distribution of the sample mean is the distribution of all possible sample means, with all samples having the same sample size n taken from the same population.


• Sample means target the value of the population mean.
◦ That is, the mean of the sample means is the population mean.
◦ The expected value of the sample mean is equal to the population mean.
• The distribution of the sample means tends to be a normal distribution.


• The formula for the population variance is:
◦ σ² = ∑(x − μ)²/N
• If the variance in a sample is used to estimate the variance in a population, then the previous formula (applied to the sample, with x̄ in place of μ and n in place of N) underestimates the variance, and the following formula should be used:
◦ s² = ∑(x − x̄)²/(n − 1)
◦ where s² is the estimate of the variance and x̄ is the sample mean.
◦ Note that x̄ is the mean of a sample taken from a population with a mean of μ.
◦ Since, in practice, the variance is usually computed in a sample, this formula is most often used.


• Assume the scores 1, 2, 4, and 5 were sampled from a larger population. To estimate the variance in the population you would compute s² as follows:
• x̄ = (1 + 2 + 4 + 5)/4 = 12/4 = 3
• s² = [(1 − 3)² + (2 − 3)² + (4 − 3)² + (5 − 3)²]/(4 − 1) = 10/3 ≈ 3.333
• s_n² = 10/4 = 2.5 (dividing by n instead of n − 1)
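The two variance estimates for the scores 1, 2, 4 and 5 can be checked with the Python standard library: `statistics.variance` divides by n − 1 and `statistics.pvariance` divides by n. Mapping `pvariance` onto the slide's s_n² is an interpretation made here, since the slide does not define s_n² explicitly.

```python
from statistics import mean, pvariance, variance

scores = [1, 2, 4, 5]

x_bar = mean(scores)          # sample mean
s2 = variance(scores)         # divides by n - 1 (unbiased estimate of the population variance)
s2_n = pvariance(scores)      # divides by n (underestimates the population variance)

print(x_bar, round(s2, 3), s2_n)
```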


• Definition
◦ The sampling distribution of the variance is the distribution of sample variances, with all samples having the same sample size n taken from the same population.
• Properties
◦ Sample variances target the value of the population variance. That is, the mean of the sample variances is the population variance. The expected value of the sample variance is equal to the population variance.
◦ The distribution of the sample variances tends to be a distribution skewed to the right.


• The sampling distribution of the proportion is the distribution of sample proportions, with all samples having the same sample size n taken from the same population.
• We need to distinguish between a population proportion p and some sample proportion:
• p = population proportion
• p̂ = sample proportion


• Sample proportions target the value of the population proportion.
◦ That is, the mean of the sample proportions is the population proportion.
◦ The expected value of the sample proportion is equal to the population proportion.
• The distribution of the sample proportion tends to be a normal distribution.


• In statistics, the bias (or bias function) of an estimator is the difference between this estimator's expected value and the true value of the parameter being estimated.
• An estimator or decision rule with zero bias is called unbiased. Otherwise the estimator is said to be biased.
• Sample means, variances and proportions are unbiased estimators.
• That is, they target the population parameter.
• These statistics are better for estimating the population parameter.


• Sample medians, ranges and standard deviations are biased estimators.
• That is, they do NOT target the population parameter.
• Note: the bias with the standard deviation is relatively small in large samples, so s is often used to estimate σ.


• Consider repeating this process:
◦ Roll a die 5 times. Find the mean x̄, the variance s², and the proportion of odd numbers of the results.
• What do we know about the behaviour of all sample means that are generated as this process continues indefinitely?


Example - Sampling Distributions
Specific results from 10,000 trials

All outcomes are equally likely, so the population mean is 3.5; the mean of the sample means from the 10,000 trials is 3.49. If continued indefinitely, the mean of the sample means will be 3.5. Also, notice that the distribution is approximately normal.

Example - Sampling Distributions
Specific results from 10,000 trials

All outcomes are equally likely, so the population variance is 2.9; the mean of the sample variances from the 10,000 trials is 2.88. If continued indefinitely, the mean of the sample variances will be 2.9. Also, notice that the distribution is skewed to the right.

Example - Sampling Distributions
Specific results from 10,000 trials

All outcomes are equally likely, so the population proportion of odd numbers is 0.50; the mean of the sample proportions from the 10,000 trials is 0.50. If continued indefinitely, the mean of the sample proportions will be 0.50. Also, notice that the distribution is approximately normal.

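The die-rolling experiment can be simulated to see the three sampling distributions target their population parameters. This is a sketch, not the procedure used to produce the slide figures: the seed and the function name `simulate` are hypothetical, so the exact simulated values will differ from the 3.49, 2.88 and 0.50 reported above.

```python
import random
from statistics import mean, variance

random.seed(1037)  # hypothetical seed, chosen only to make the run repeatable


def simulate(trials: int = 10_000, rolls: int = 5):
    """Repeat 'roll a die `rolls` times' and return the mean of the sample means,
    the mean of the sample variances, and the mean of the sample proportions of odd numbers."""
    sample_means, sample_variances, sample_proportions = [], [], []
    for _ in range(trials):
        sample = [random.randint(1, 6) for _ in range(rolls)]
        sample_means.append(mean(sample))
        sample_variances.append(variance(sample))  # divides by n - 1
        sample_proportions.append(sum(v % 2 for v in sample) / rolls)
    return mean(sample_means), mean(sample_variances), mean(sample_proportions)


mean_of_means, mean_of_variances, mean_of_proportions = simulate()

# Population parameters for a fair die: mean 3.5, variance 35/12 (about 2.9), P(odd) = 0.5.
print(round(mean_of_means, 2), round(mean_of_variances, 2), round(mean_of_proportions, 2))
```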
• Sampling without replacement would have the very practical advantage of avoiding wasteful duplication whenever the same item is selected more than once.
• However, we are interested in sampling with replacement for these two reasons:
◦ 1. When selecting a relatively small sample from a large population, it makes no significant difference whether we sample with replacement or without replacement.
◦ 2. Sampling with replacement results in independent events that are unaffected by previous outcomes, and independent events are easier to analyze and result in simpler calculations and formulas.


• Many methods of statistics require a simple random sample. Some samples, such as voluntary response samples or convenience samples, could easily lead to badly misleading results.




• Criteria for determining whether the requirement of a normal distribution is satisfied are presented.
• The criteria involve visual inspection of a histogram to see if it is roughly bell-shaped, identifying any outliers, and constructing a graph called a normal quantile plot.

 A normal quantile plot (or normal probability
plot) is a graph of points (x, y), where each x
value is from the original set of sample data,
and each y value is the corresponding z score
that is a quantile value expected from the
standard normal distribution.
 Determining whether it is reasonable to
assume that sample data are from a normally
distributed population:
◦ 1. Histogram: Construct a histogram. Reject
normality if the histogram departs dramatically
from a bell shape.
◦ 2. Outliers: Identify outliers. Reject normality if
there is more than one outlier present.
◦ 3. Normal Quantile Plot: If the histogram is basically
symmetric and there is at most one outlier, use
technology to generate a normal quantile plot. Use
the following criteria to determine whether or not
the distribution is normal.
 Normal Distribution: The population
distribution is normal if the pattern of the
points is reasonably close to a straight line
and the points do not show some systematic
pattern that is not a straight-line pattern.
 Not a Normal Distribution: The population
distribution is not normal if either or both of
these two conditions applies:
◦ The points do not lie reasonably close to a straight
line.
◦ The points show some systematic pattern that is
not a straight-line pattern.

Normal: The histogram of IQ scores is close to being bell-shaped, which
suggests that the IQ scores are from a normal distribution. The normal
quantile plot shows points that are reasonably close to a straight-line
pattern. It is safe to assume that these IQ scores are from a normally
distributed population.
Uniform: Histogram of data having a uniform distribution. The
corresponding normal quantile plot suggests that the points are not
normally distributed because the points show a systematic pattern that is
not a straight-line pattern. These sample values are not from a population
having a normal distribution.

Skewed: Histogram of the amounts of rainfall for every Monday during one
year. The shape of the histogram is skewed, not bell-shaped. The
corresponding normal quantile plot shows points that are not at all close to
a straight-line pattern. These rainfall amounts are not from a population
having a normal distribution.

 Step 1. First sort the data by arranging the values
in order from lowest to highest.
 Step 2. With a sample of size n, each value
represents a proportion of 1/n of the sample.
Using the known sample size n, identify the areas
of 1/(2n), 3/(2n), 5/(2n), and so on. These are the
cumulative areas to the left of the corresponding
sample values.
 Step 3. Use the standard normal distribution to
find the z scores corresponding to the
cumulative left areas found in Step 2.
◦ These are the z scores that are expected from a normally
distributed sample.

 Step 4. Match the original sorted data values
with their corresponding z scores found in
Step 3, then plot the points (x, y), where each
x is an original sample value and y is the
corresponding z score.
 Step 5. Examine the normal quantile plot and
determine whether or not the distribution is
normal.
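
Steps 1 to 4 of the manual construction can be sketched in a few lines of Python. This is an illustrative sketch, not the course's software: the function name `normal_quantile_points` and the sample values are hypothetical, and the z scores come from the standard library's `statistics.NormalDist().inv_cdf`.

```python
from statistics import NormalDist

def normal_quantile_points(data):
    """Return the (x, z) points of a normal quantile plot."""
    xs = sorted(data)                                   # Step 1: sort
    n = len(xs)
    # Step 2: cumulative left areas 1/(2n), 3/(2n), 5/(2n), ...
    areas = [(2 * i - 1) / (2 * n) for i in range(1, n + 1)]
    # Step 3: z scores with those cumulative left areas
    standard_normal = NormalDist()
    zs = [standard_normal.inv_cdf(area) for area in areas]
    # Step 4: pair each sorted value with its expected z score
    return list(zip(xs, zs))

sample = [98, 104, 87, 110, 101]   # hypothetical data, for illustration only
points = normal_quantile_points(sample)
for x, z in points:
    print(x, round(z, 4))
```

Step 5 remains a visual judgment: plot the pairs and check whether they fall reasonably close to a straight line.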

 Many data sets have a distribution that is
not normal, but we can transform the data
so that the modified values have a normal
distribution.
 One common transformation is to replace
each value of x with log(x + 1).
 If the distribution of the log(x + 1) values is
a normal distribution, the distribution of
the x values is referred to as a lognormal
distribution.

 In addition to replacing each x value with
log(x + 1), there are other
transformations, such as replacing each x
value with $\sqrt{x}$, or $1/x$, or $x^2$.
 In addition to getting a required normal
distribution when the original data values
are not normally distributed, such
transformations can be used to correct
other deficiencies, such as a requirement
(see later) that different data sets have the
same variance.
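
A minimal sketch of applying these transformations follows. The helper name `transform`, its `kind` labels, and the data are hypothetical. The slides do not say which logarithm base is meant, so base 10 is an assumption here; changing the base only rescales the values and does not affect whether they look normal.

```python
import math

def transform(values, kind="log"):
    """Apply one of the transformations named above to every value."""
    funcs = {
        "log": lambda x: math.log10(x + 1),   # assumption: base-10 logarithm
        "sqrt": math.sqrt,                    # square root of x
        "reciprocal": lambda x: 1 / x,        # 1/x (x must be nonzero)
        "square": lambda x: x ** 2,           # x squared
    }
    return [funcs[kind](x) for x in values]

skewed = [0, 9, 99, 999]           # hypothetical right-skewed data
log_values = transform(skewed, "log")
print(log_values)
```

The transformed values would then be checked with the same histogram and normal quantile plot criteria as the original data.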

Normal as Approximation to Binomial and Poisson
 This section presents a method for using a
normal distribution as an approximation to the
binomial probability distribution.
 If the conditions np ≥ 5 and nq ≥ 5 are
both satisfied, then probabilities from a
binomial probability distribution can be
approximated well by using a normal
distribution with mean $\mu = np$ and standard
deviation $\sigma = \sqrt{npq}$, where q = 1 − p.
 1. The procedure must have a fixed number
of trials.
 2. The trials must be independent.
 3. Each trial must have all outcomes
classified into two categories (commonly,
success and failure).
 4. The probability of success remains the
same in all trials.
 Solve by binomial probability formula, Table,
or technology

np  5
nq  5

then   np ,   npq and


the random variable has

a distribution.
(normal)

11
SIS 1037Y 2020/2021 4
 Using a Normal Distribution to Approximate
a Binomial Distribution
 1. Verify that both np ≥ 5 and nq ≥ 5. If not,
you must use software, a calculator, a table
or calculations using the binomial probability
formula.
 2. Find the values of the parameters μ and σ
by calculating $\mu = np$ and $\sigma = \sqrt{npq}$.
 3. Identify the discrete whole number x that
is relevant to the binomial probability
problem. Focus on this value temporarily.

 4. Draw a normal distribution centred about μ, then
draw a vertical strip area centred over x. Mark the left
side of the strip with the number equal to x – 0.5,
and mark the right side with the number equal to x +
0.5. Consider the area of the entire strip to
represent the probability of the discrete whole
number itself.
 5. Determine whether the value of x itself is included
in the probability. Determine whether you want the
probability of at least x, at most x, more than x,
fewer than x, or exactly x. Shade the area to the right
or left of the strip; also shade the interior of the strip
if and only if x itself is to be included. This total
shaded region corresponds to the probability being
sought.

 6. Using x – 0.5 or x + 0.5 in place of x, find
the area of the shaded region: find the z
score, use that z score to find the area to the
left of the adjusted value of x. Use that
cumulative area to identify the shaded area
corresponding to the desired probability.
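
The six-step procedure can be sketched as one Python function. This is an illustrative sketch only: the function name `binomial_normal_approx`, the `kind` labels, and the example numbers (n = 100, p = 0.5, x = 55) are assumptions made for the illustration, and the normal areas come from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def binomial_normal_approx(n, p, x, kind):
    """Approximate a binomial probability about the whole number x."""
    q = 1 - p
    if n * p < 5 or n * q < 5:                       # Step 1: check conditions
        raise ValueError("np and nq must both be at least 5")
    dist = NormalDist(mu=n * p, sigma=sqrt(n * p * q))   # Step 2: mu, sigma
    left, right = x - 0.5, x + 0.5                   # Steps 3-4: strip over x
    # Steps 5-6: include the strip only when x itself is included
    if kind == "at least":
        return 1 - dist.cdf(left)
    if kind == "more than":
        return 1 - dist.cdf(right)
    if kind == "at most":
        return dist.cdf(right)
    if kind == "fewer than":
        return dist.cdf(left)
    if kind == "exactly":
        return dist.cdf(right) - dist.cdf(left)
    raise ValueError(f"unknown kind: {kind!r}")

print(binomial_normal_approx(100, 0.5, 55, "at least"))
```

Raising an error when np or nq is below 5 mirrors Step 1, where the slides say to fall back on the exact binomial calculation instead.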

 In 431 football games that went to overtime,
the teams that won the coin toss went on to
win 235 of those games.
 If the coin-toss method is fair, teams winning
the toss would win about 50% of the games
(we’d expect 215.5 wins in 431 overtime
games).
 Assuming there is a 0.5 probability of
winning a game after winning the coin toss,
find the probability of getting at least 235
winning games.

 The given problem involves a binomial
distribution with n = 431 trials and an
assumed probability of success of p = 0.5.
 Use the normal approximation to the
binomial distribution.
 Step 1: Check the conditions:

$$np = 431 \cdot 0.5 = 215.5 \ge 5$$
$$nq = 431 \cdot 0.5 = 215.5 \ge 5$$
 Step 2: Find the mean and standard deviation
of the normal distribution:

$$\mu = np = 431 \cdot 0.5 = 215.5$$
$$\sigma = \sqrt{npq} = \sqrt{431 \cdot 0.5 \cdot 0.5} = 10.38027$$

 Step 3: We want the probability of at least
235 wins, so x = 235.
 Step 4: The vertical strip will go from 234.5
to 235.5.
Example – Football Coin Toss

Step 5: We will shade the area to the right of 234.5.

 Step 6: Find the z score and use technology
or a table to determine the probability.

$$z = \frac{x - \mu}{\sigma} = \frac{234.5 - 215.5}{10.380270} = 1.83$$

 The probability that the teams winning the
coin toss win at least 235 games is 0.0336.
 This probability is low enough to suggest
that the team winning the coin toss has an
unfair advantage.
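
As a cross-check on the football example, the approximation can be compared with the exact binomial tail. The sketch below uses only the numbers given in the example (n = 431, p = 0.5, x = 235); the variable names are illustrative, and the exact tail is summed directly from the binomial probability formula.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, x = 431, 0.5, 235
q = 1 - p

mu = n * p                          # 215.5
sigma = sqrt(n * p * q)             # about 10.38027
z = (x - 0.5 - mu) / sigma          # continuity-corrected z score
approx = 1 - NormalDist().cdf(z)    # area to the right of 234.5

# Exact P(X >= 235), summed from the binomial probability formula.
exact = sum(comb(n, k) * p**k * q**(n - k) for k in range(x, n + 1))

print(round(z, 2), approx, exact)
```

The approximate tail area agrees with the 0.0336 on the slide, and the exact sum shows how little is lost by using the normal curve when n is this large.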

 When we use the normal distribution (which
is a continuous probability distribution) as an
approximation to the binomial distribution
(which is discrete), a continuity correction is
made to a discrete whole number x in the
binomial distribution by representing the
discrete whole number x by the interval from
x – 0.5 to x + 0.5 (that is, adding and
subtracting 0.5).

Example – Continuity Corrections

P(X = 18) ≈ P(17.5 < X < 18.5)
P(X ≤ 18) ≈ P(X < 18.5)
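
For reference, the full set of continuity corrections around the whole number 18 can be written out as follows. On the left, X is the discrete (binomial) variable; on the right, X is the approximating normal variable. Each line follows from representing 18 by the interval from 17.5 to 18.5 and including that strip only when 18 itself is included.

```latex
\begin{align*}
P(X = 18)    &\approx P(17.5 < X < 18.5) \\
P(X \le 18)  &\approx P(X < 18.5) && \text{(at most 18: strip included)} \\
P(X < 18)    &\approx P(X < 17.5) && \text{(fewer than 18: strip excluded)} \\
P(X \ge 18)  &\approx P(X > 17.5) && \text{(at least 18: strip included)} \\
P(X > 18)    &\approx P(X > 18.5) && \text{(more than 18: strip excluded)}
\end{align*}
```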
 The normal distribution can also be
used to approximate the Poisson
distribution whenever the parameter
λ, the expected number of successes,
equals or exceeds 5.
 Since the value of the mean and the
variance of a Poisson distribution are
the same:
◦ μ = σ² = λ
 Substituting into the transformation equation gives

$$Z = \frac{X - \mu}{\sigma} = \frac{X - \lambda}{\sqrt{\lambda}}$$

 so that, for large enough λ, the random variable
Z is approximately normally distributed.
 Hence, to find approximate probabilities
corresponding to the values of the Poisson
random variable X, a continuity correction has to
be made to cater for discrete values, as for the
binomial.
 Suppose that at a certain automobile plant
the average number of work stoppages per
day due to equipment problems during the
production process is 12.0.
 Determine the approximate probability of
having 15 or fewer work stoppages due to
equipment problems on any given day.

 Substituting into the transformation equation:

$$Z = \frac{X_a - \lambda}{\sqrt{\lambda}} = \frac{15.5 - 12.0}{\sqrt{12.0}} = 1.01$$

 Here Xa, the adjusted number of successes, is 15.5.
Hence the approximate probability that X does not
exceed this value corresponds to a Z value of not
more than +1.01.
 Note that the area under the normal curve less than
Z = +1.01 is 0.8438. Therefore, the approximate
probability of having 15 or fewer work stoppages due
to equipment problems on any given day is 0.8438.
 This approximation compares quite favourably to the
exact Poisson probability, 0.8445.
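
The comparison between the approximate and exact probabilities can be reproduced with a short script. This is a sketch using only the numbers from the example (λ = 12.0, 15 or fewer stoppages); the variable names are illustrative, and the exact value is summed from the Poisson probability formula.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

lam = 12.0              # mean number of work stoppages per day
x = 15                  # "15 or fewer" stoppages
x_adjusted = x + 0.5    # continuity correction: include the strip over 15

z = (x_adjusted - lam) / sqrt(lam)
approx = NormalDist().cdf(z)        # area to the left of z

# Exact P(X <= 15), summed from the Poisson probability formula.
exact = sum(exp(-lam) * lam**k / factorial(k) for k in range(x + 1))

print(round(z, 2), approx, exact)
```

Both values agree with the slide to about three decimal places, which supports the claim that the approximation is adequate once λ is at least 5.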

 When we use the normal distribution (which
is a continuous probability distribution) as an
approximation to the binomial or Poisson
distribution (which are discrete), a continuity
correction is made to a discrete whole
number x in the discrete distributions by
representing the single value x by the interval
from
◦ x – 0.5 to x + 0.5
◦ (that is, adding and subtracting 0.5).

If x is a random variable with mean $\mu_x$ and variance $\sigma_x^2$,
and a and b are numerical constants, the random variable
y defined by y = a + bx is called a linear
function of the random variable x.

The mean of y = a + bx is
$$\mu_y = \mu_{a+bx} = a + b\mu_x$$

The variance of y is
$$\sigma_y^2 = \sigma_{a+bx}^2 = b^2\sigma_x^2$$

from which it follows that the standard deviation of y is
$$\sigma_y = \sigma_{a+bx} = |b|\,\sigma_x$$
Suppose x is the number of sales staff needed on a given day.
If the cost of doing business on a day involves fixed costs
of $255 and the cost per sales person per day is $110, find
the mean cost (the mean of y = 255 + 110x) of doing business
on a given day, where the distribution of x is given below.

x      p(x)
1      0.3
2      0.4
3      0.2
4      0.1
We need to find the mean of y = 255 + 110x.

x      p(x)   x·p(x)
1      0.3    0.3
2      0.4    0.8
3      0.2    0.6
4      0.1    0.4
Total         2.1

$$\mu_x = \sum x\,p(x) = 2.1$$
$$\mu_y = \mu_{255+110x} = 255 + 110\mu_x = 255 + 110(2.1) = \$486$$
We need to find the variance and
standard deviation of y = 255 + 110x.

x      p(x)   (x − μ)²·p(x)
1      0.3    0.3630
2      0.4    0.0040
3      0.2    0.1620
4      0.1    0.3610
Total         0.8900

$$\sigma_x^2 = 0.89, \qquad \sigma_x = \sqrt{0.89} = 0.9434$$
$$\sigma_{255+110x}^2 = (110)^2\sigma_x^2 = (110)^2(0.89) = 10769$$
$$\sigma_{255+110x} = 110\,\sigma_x = 110(0.9434) = 103.77$$
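
The sales-staff calculation can be reproduced directly from the probability distribution. The sketch below uses the distribution and the cost function y = 255 + 110x from the example; the variable names are illustrative.

```python
from math import sqrt

pmf = {1: 0.3, 2: 0.4, 3: 0.2, 4: 0.1}   # p(x) for the number of sales staff
a, b = 255, 110                          # fixed cost, cost per sales person

# Mean and variance of x from the definition
mu_x = sum(x * p for x, p in pmf.items())
var_x = sum((x - mu_x) ** 2 * p for x, p in pmf.items())

# Linear-function rules for y = a + b*x
mu_y = a + b * mu_x
var_y = b ** 2 * var_x
sd_y = abs(b) * sqrt(var_x)

print(mu_x, var_x, mu_y, var_y, sd_y)
```

The fixed cost a shifts the mean but drops out of the variance and standard deviation, which depend only on b.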


 If $x_1, x_2, \dots, x_n$ are random variables and
$a_1, a_2, \dots, a_n$ are numerical constants, the
random variable y defined as
 $y = a_1x_1 + a_2x_2 + \dots + a_nx_n$
 is a linear combination of the $x_i$'s.
 If $x_1, x_2, \dots, x_n$ are random variables with
means $\mu_1, \mu_2, \dots, \mu_n$ and variances
$\sigma_1^2, \sigma_2^2, \dots, \sigma_n^2$ respectively,
 and $y = a_1x_1 + a_2x_2 + \dots + a_nx_n$, then
 1. $\mu_y = a_1\mu_1 + a_2\mu_2 + \dots + a_n\mu_n$
 (This is true for any random variables with no conditions.)

 2. If $x_1, x_2, \dots, x_n$ are independent random
variables, then
$$\sigma_y^2 = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + \dots + a_n^2\sigma_n^2$$
and
$$\sigma_y = \sqrt{a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + \dots + a_n^2\sigma_n^2}$$
A distributor of fruit baskets is going to put
4 apples, 6 oranges and 2 bunches of
grapes in his small gift basket. The weights,
in ounces, of these items are the random
variables x1, x2 and x3 respectively, with
means and standard deviations as given in
the following table.

                     Apples   Oranges   Grapes
Mean                 8        10        7
Standard deviation   0.9      1.1       2

Find the mean, variance and standard deviation of
the random variable y = weight of fruit in a small
gift basket.
It is reasonable in this case to assume that
the weights of the different types of fruit are
independent.

$$a_1 = 4,\ a_2 = 6,\ a_3 = 2, \qquad \mu_1 = 8,\ \mu_2 = 10,\ \mu_3 = 7, \qquad \sigma_1 = 0.9,\ \sigma_2 = 1.1,\ \sigma_3 = 2$$

$$\mu_y = \mu_{a_1x_1 + a_2x_2 + a_3x_3} = a_1\mu_1 + a_2\mu_2 + a_3\mu_3 = 4(8) + 6(10) + 2(7) = 106$$

$$\sigma_y^2 = \sigma_{a_1x_1 + a_2x_2 + a_3x_3}^2 = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + a_3^2\sigma_3^2 = 4^2(0.9)^2 + 6^2(1.1)^2 + 2^2(2)^2 = 72.52$$

$$\sigma_y = \sqrt{72.52} = 8.5159$$
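
The fruit basket arithmetic can be checked with a few lines of Python. The sketch applies the linear-combination rules to the coefficients, means and standard deviations from the table, under the same independence assumption as the slide; the variable names are illustrative.

```python
from math import sqrt

coefficients = [4, 6, 2]        # apples, oranges, bunches of grapes
means = [8, 10, 7]              # mean weight of each item, in ounces
std_devs = [0.9, 1.1, 2]        # standard deviation of each item, in ounces

# Mean of a linear combination (holds with or without independence)
mu_y = sum(a * m for a, m in zip(coefficients, means))

# Variance of a linear combination (requires independence)
var_y = sum(a ** 2 * s ** 2 for a, s in zip(coefficients, std_devs))
sd_y = sqrt(var_y)

print(mu_y, var_y, sd_y)
```

Note that the standard deviations are squared before they are combined; adding the standard deviations themselves would give the wrong answer.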
 We have seen the basics of Probability
Distributions
 Two discrete Probability Distributions have
been studied:
◦ Binomial Probability Distributions
◦ Poisson Probability Distributions
 The approximation of one distribution by
another is possible under certain conditions.

 Normal distribution
 Found approximations to binomial
probabilities by using the normal distribution.
 Found approximations to Poisson
probabilities by using the normal distribution.
 Continuity Corrections
 Linear combination of random variables

Comments?
