Engr. Data Analysis
Learning Objectives
At the end of this chapter, it is expected that the students will be able to:
1. Demonstrate an understanding of the different methods of obtaining data.
Statistical Terms
Before proceeding to the discussion of the different methods of obtaining data, let
us first define some statistical terms:
Examples
Retrospective Study
Montgomery, Peck, and Vining (2001) describe an acetone-butyl alcohol
distillation column for which concentration of acetone in the distillate or output product
stream is an important variable. Factors that may affect the distillate are the reboil
temperature, the condensate temperature, and the reflux rate. Production personnel
obtain and archive the following records:
The concentration of acetone in an hourly test sample of output product
The reboil temperature log, which is a plot of the reboil temperature over time
The condenser temperature controller log
The nominal reflux rate each hour
The reflux rate should be held constant for this process. Consequently, production
personnel change this very infrequently. A retrospective study would use either all or a
sample of the historical process data archived over some period of time. The study
objective might be to discover how the two temperatures and the reflux rate affect the
acetone concentration in the output product stream. However, this
type of study presents some problems:
1. We may not be able to see the relationship between the reflux rate and acetone
concentration, because the reflux rate didn’t change much over the historical period.
2. The archived data on the two temperatures (which are recorded almost
continuously) do not correspond perfectly to the acetone concentration measurements
(which are made hourly). It may not be obvious how to construct an approximate
correspondence.
3. Production maintains the two temperatures as closely as possible to desired
targets or set points. Because the temperatures change so little, it may be difficult to
assess their real impact on acetone concentration.
4. Within the narrow ranges that they do vary, the condensate temperature tends to
increase with the reboil temperature. Consequently, the effects of these two process
variables on acetone concentration may be difficult to separate.
As you can see, a retrospective study may involve a lot of data, but that data
may contain relatively little useful information about the problem. Furthermore, some of
the relevant data may be missing, there may be transcription or recording errors
resulting in outliers (or unusual values), or data on other important factors may not have
been collected and archived. In the distillation column, for example, the specific
concentrations of butyl alcohol and acetone in the input feed stream are very
important factors, but they are not archived because the concentrations are too hard to
obtain on a routine basis. As a result of these types of issues, statistical analyses of
historical data sometimes identify interesting phenomena, but solid and reliable
explanations of these phenomena are often difficult to obtain.
Observational Study
In the distillation column, the engineer would design a form to record the two
temperatures and the reflux rate when acetone concentration measurements are made.
It may even be possible to measure the input feed stream concentrations so that the
impact of this factor could be studied. Generally, an observational study tends to solve
problems 1 and 2 above and goes a long way toward obtaining accurate and reliable
data. However, observational studies may not help resolve problems 3 and 4.
Designed Experiments
In a designed experiment the engineer makes deliberate or purposeful changes
in the controllable variables of the system or process, observes the resulting system
output data, and then makes an inference or decision about which variables are
responsible for the observed changes in output performance. Designed experiments
play a very important role in engineering design and development and in the
improvement of manufacturing processes. Generally, when products and processes are
designed and developed with designed experiments, they enjoy better performance,
higher reliability, and lower overall costs. Designed experiments also play a crucial role
in reducing the lead time for engineering design and development activities.
Planning and Conducting Surveys
A survey is a way to ask a lot of people a few well-constructed questions. The
survey is a series of unbiased questions that the subject must answer. Some
advantages of surveys are that they are efficient ways of collecting information from a
large number of people, they are relatively easy to administer, a wide variety of
information can be collected and they can be focused (researchers can stick to just the
questions that interest them.) Some disadvantages of surveys arise from the fact that
they depend on the subjects’ motivation, honesty, memory and ability to respond.
Moreover, answer choices to survey questions could lead to vague data. For example,
the choice “moderately agree” may mean different things to different people or to
whoever ends up interpreting the data.
Conducting a Survey
There are various methods of administering a survey. It can be done as a face-to-face
interview or a phone interview where the researcher is questioning the subject. A
different option is to have a self-administered survey where the subject can complete a
survey on paper and mail it back, or complete the survey online. There are advantages
and disadvantages to each of these methods.
Face to face interview
The advantages of face-to-face interviews include fewer misunderstood
questions, fewer incomplete responses, higher response rates, and greater control
over the environment in which the survey is administered; also, the researcher can
collect additional information if any of the respondents’ answers need clarifying.
The disadvantages of face-to-face interviews are that they can be expensive and
time-consuming and may require a large staff of trained interviewers. In addition, the
response can be biased by the appearance or attitude of the interviewer.
Self-administered survey
The advantages of self-administered surveys are that they are
less expensive than interviews, do not require a large staff of experienced interviewers
and can be administered in large numbers. In addition, anonymity and
privacy encourage more candid and honest responses, and there is less pressure
on respondents. The disadvantages of self-administered surveys are that
responders are more likely to stop participating mid-way through the survey and
researchers cannot ask them to clarify their answers. In addition, there are lower
response rates than in personal interviews, and often the respondents who bother to
return surveys represent extremes of the population – those people who care about
the issue strongly, whichever way their opinion leans.
Designing a Survey
Surveys can take different forms. They can be used to ask only one question or they can ask a
series of questions. We can use surveys to test out people’s opinions or to test a hypothesis.
When designing a survey, the following steps are useful:
1. Determine the goal of your survey: What question do you want to answer?
2. Identify the sample population: Whom will you interview?
3. Choose an interviewing method: face-to-face interview, phone interview, self-
administered paper survey, or internet survey.
4. Decide what questions you will ask in what order, and how to phrase them. (This is
important if there is more than one piece of information you are looking for.)
5. Conduct the interview and collect the information.
6. Analyze the results by making graphs and drawing conclusions.
1. Planning
At this stage, keep in mind these considerations:
o a thorough and precise objective identifying the need to conduct the investigation
o an assessment of time and resources available to achieve the objective
o integration of prior knowledge into the experimentation procedure
o identification of possible factors to investigate and the most appropriate response(s) to
measure
CHAPTER II
Probability is simply how likely an event is to happen. “The chance of rain today is 50%”
is a statement that expresses our belief about the possibility of rain. The likelihood of
an outcome is measured by assigning a number from the interval [0, 1], or as a
percentage from 0 to 100%. The higher the number, the more likely the event is to
happen. A probability of zero (0) indicates that the outcome is impossible, while a
probability of one (1) indicates that the outcome will inevitably occur.
Learning Objectives
At the end of this module, it is expected that the students will be able to:
1. Understand and describe sample spaces and events for random experiments
2. Explain the concept of probability and its application to different situations
3. Define and illustrate the different probability rules
4. Solve for the probability of different statistical data.
Probability
Probability is the likelihood or chance of an event occurring.
For example, the probability of flipping a coin and it being heads is ½, because
there is 1 way of getting a head and the total number of possible outcomes is 2 (a head
or tail). We write P(heads) = ½ .
Properties of Probability
b. The probability of any event is a number between 0 and 1, inclusive: 0 ≤ P(E) ≤ 1.
c. The sum of the probabilities of all possible outcomes in the sample space is 1.
d. The probability of something not happening is 1 minus the probability that it will happen.
Experiment – is used to describe any process that generates a set of data
Event – consists of a set of possible outcomes of a probability experiment. Can be one
outcome or more than one outcome.
Simple event – an event with one outcome.
Compound event – an event with more than one outcome.
This can be illustrated in a Venn Diagram. In Figure 2.1, the sample space is
represented by the rectangle and the events by the circles inside the rectangle. The
events A and B (in a to c) and A, B and C (in d and e) are all subsets of the sample
space S.
Figure 2.1 Venn diagrams of sample space with events (adapted from Montgomery et
al., 2003)
For example, if a die is rolled, the sample space is {1,2,3,4,5,6}. The event can be
{1,3,5}, which is the set of odd numbers. Similarly, when a coin is tossed twice, the
sample space is {HH, HT, TH, TT}.
Sample space and events play important roles in probability. Once we have
sample space and event, we can easily find the probability of that event. We have
following formula to find the probability of an event.
The probability of an event E is defined as the number of outcomes
favourable to E divided by the total number of equally likely outcomes in the
sample space S of the experiment.
That is,
P(E) = n(E) / n(S)
where n(E) is the number of outcomes favourable to E and n(S) is the total number of
equally likely outcomes in the sample space S of the experiment.
Let us try to understand this with the help of an example. If a die is tossed, the
sample space is {1,2,3,4,5,6}. In this set, the number of elements is 6.
Now, if the event is the set of odd numbers on a die, then we have {1, 3, 5} as the
event. In this set, we have 3 elements. So, the probability of getting an odd number in a
single throw of a die is given by
P(E) = n(E) / n(S) = 3/6 = 1/2
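As a quick check, the same counting definition can be evaluated with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative.

from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # outcomes of one roll of a die
event = {1, 3, 5}                   # the event "an odd number appears"

# P(E) = n(E) / n(S)
p_event = Fraction(len(event), len(sample_space))
print(p_event)                      # 1/2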
Null space – a subset of the sample space that contains no elements, denoted
by the symbol Ø. It is also called the empty space.
Intersection of Events
The intersection of events A and B is the event containing all the elements that
are common to A and B, and is denoted by the symbol A ∩ B.
For example,
Let A = {3,6,9,12,15} and B = {1,3,5,8,12,15,17}; then A ∩ B = {3,12,15}
Let X = {q, w, e, r, t,} and Y = {a, s, d, f}; then X ∩ Y = Ø, since X and Y have no
elements in common.
Union of Events
The union of events A and B is the event containing all the elements that belong
to A or to B or to both and is denoted by the symbol A∪ B. The elements A ∪ B
maybe listed or defined by the rule A ∪B = { x | x ∈A or x ∈B}.
For example,
Let A = {a, e, i, o, u} and B = {b, c, d, e, f}; then A ∪ B = {a, b, c, d, e, f, i, o, u}
Let X = {1,2,3,4} and Y = {3,4,5,6}; then X ∪ Y = {1,2,3,4,5,6}
Complement of an Event
The complement of an event A with respect to S is the set of all elements of S
that are not in A and is denoted by A’. The shaded region in Figure 2.1 (e) shows
(A ∩ C)’.
For example,
Consider the sample space S = {dog, cow, bird, snake, pig}
Let A = {dog, bird, pig}; then A’ = {cow, snake}
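The intersection, union, and complement operations above map directly onto Python's built-in set type. The sketch below reuses the same illustrative sets from the examples:

S = {"dog", "cow", "bird", "snake", "pig"}   # sample space
A = {"dog", "bird", "pig"}

X = {3, 6, 9, 12, 15}
Y = {1, 3, 5, 8, 12, 15, 17}

print(X & Y)     # intersection of X and Y: {3, 12, 15}
print(X | Y)     # union of X and Y
print(S - A)     # complement of A with respect to S: {'cow', 'snake'}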
The number of ways of partitioning n elements into k sets with n1 elements in the first
set, n2 in the second, ..., and nk in the k-th is
n! / (n1! n2! ⋯ nk!)
where
n1 + n2 + … + nk = n
The numerator gives the permutations of the n elements. The terms in the denominator
remove the duplicates due to the same assignments in the k sets (multinomial
coefficients).
Combinations Rule
A sample of k elements is to be chosen from a set of n elements. The number of
different samples of k elements that can be selected from n is equal to
C(n, k) = n! / [ k!(n − k)! ]
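In Python, math.comb evaluates this combinations rule directly. The sketch below, with illustrative numbers n = 5 and k = 2, also checks it against the factorial formula:

import math

n, k = 5, 2
# C(n, k) = n! / (k! (n - k)!)
print(math.comb(n, k))                                                     # 10
print(math.factorial(n) // (math.factorial(k) * math.factorial(n - k)))    # same result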
C. Rules of Probability
Before discussing the rules of probability, we state the following definitions:
Two events are mutually exclusive or disjoint if they cannot occur at the same time.
The probability that Event A occurs, given that Event B has occurred, is called
a conditional probability. The conditional probability of Event A, given Event B, is
denoted by the symbol P(A|B).
The complement of an event is the event not occurring. The probability that Event A will
not occur is denoted by P(A').
The probability that Events A and B both occur is the probability of the intersection of A
and B. The probability of the intersection of Events A and B is denoted by P(A ∩ B). If
Events A and B are mutually exclusive, P(A ∩ B) = 0.
The probability that Events A or B occur is the probability of the union of A and B. The
probability of the union of Events A and B is denoted by P(A ∪ B) .
If the occurrence of Event A changes the probability of Event B, then Events A and B
are dependent. On the other hand, if the occurrence of Event A does not change the
probability of Event B, then Events A and B are independent.
Rule of Addition
Rule 1: If two events A and B are mutually exclusive, then P(A ∪ B) = P(A) + P(B).
Rule 2: For any two events A and B, P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Example
A student goes to the library. The probability that she checks out (a) a work of
fiction is 0.40, (b) a work of non-fiction is 0.30, and (c) both fiction and non-fiction is
0.20. What is the probability that the student checks out a work of fiction, non-fiction, or
both?
Solution:
Let F = the event that the student checks out fiction;
let N = the event that the student checks out non-fiction.
Then, based on the rule of addition:
P(F ∪ N) = P(F) + P(N) − P(F ∩ N) = 0.40 + 0.30 − 0.20 = 0.50
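A minimal Python check of the general addition rule for this library example (the variable names are only for illustration):

p_fiction = 0.40
p_nonfiction = 0.30
p_both = 0.20

# General addition rule: P(F or N) = P(F) + P(N) - P(F and N)
p_fiction_or_nonfiction = p_fiction + p_nonfiction - p_both
print(p_fiction_or_nonfiction)   # 0.5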
Rule of Multiplication
Rule 1: When two events A and B are independent, then P(A ∩ B) = P(A) × P(B).
Dependent - Two outcomes are said to be dependent if knowing that one of the
outcomes has occurred affects the probability that the other occurs
Conditional Probability - the conditional probability of an event B in relationship to an
event A is the probability that event B occurs after event A has already occurred. This
probability is denoted by P(B|A).
Rule 2: When two events are dependent, the probability of both occurring is
P(A ∩ B) = P(A) × P(B|A).
Example
An urn contains 6 red marbles and 4 black marbles. Two marbles are drawn
without replacement from the urn. What is the probability that both of the marbles are
black?
Solution:
Let A = the event that the first marble is black;
and let B = the event that the second marble is black.
We know the following:
In the beginning, there are 10 marbles in the urn, 4 of which are black. Therefore, P(A)
= 4/10.
After the first selection, there are 9 marbles in the urn, 3 of which are black. Therefore,
P(B|A) = 3/9.
Therefore, based on the rule of multiplication:
P(A ∩ B) = P(A) × P(B|A) = (4/10)(3/9) = 12/90 ≈ 0.133
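The urn example can be reproduced either with the multiplication rule for dependent events or by simulation. The sketch below (standard library only, illustrative names) does both; the two numbers should agree closely.

import random
from fractions import Fraction

# Exact answer: P(A and B) = P(A) * P(B|A)
exact = Fraction(4, 10) * Fraction(3, 9)
print(exact, float(exact))          # 2/15, about 0.133

# Simulation: draw two marbles without replacement many times
urn = ["red"] * 6 + ["black"] * 4
trials = 100_000
hits = sum(random.sample(urn, 2) == ["black", "black"] for _ in range(trials))
print(hits / trials)                # close to 0.133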
Rule of Subtraction
The probability that event A will occur is equal to 1 minus the probability that
event A will not occur: P(A) = 1 − P(A').
Example
The probability of Bill not graduating from college is 0.8. What is the probability that
Bill will graduate from college?
Solution:
P(graduate) = 1 − P(not graduate) = 1 − 0.8 = 0.2
REFERENCES:
Applied Statistics and Probability for Engineers, 3rd Edition (Douglas C. Montgomery)
https://math.tutorvista.com/statistics/sample-space-and-events.html
https://stattrek.com/probability/probability-rules.aspx
https://www.ck12.org/book/CK-12-Probability-and-Statistics-Advanced-Second-Edition/
section/3.6/
Probability examples
1. Getting an ace if I choose a card at random from a standard pack of 52 playing cards?
2. Getting a 5 if I roll a die?
3. Getting an even number if I roll a die?
4. Having one Tuesday in this week?
CHAPTER III
With a discrete probability distribution, each possible value of the discrete random variable can
be associated with a non-zero probability. Thus, a discrete probability distribution is often
presented in tabular form.
Random Variables
In probability and statistics, a random variable is a variable whose value is subject to variations
due to chance (i.e. randomness, in a mathematical sense). As opposed to other mathematical
variables, a random variable conceptually does not have a single, fixed value (even if unknown);
rather, it can take on a set of possible different values, each with an associated probability.
A random variable’s possible values might represent the possible outcomes of a yet-to-be-
performed experiment, or the possible outcomes of a past experiment whose already-existing
value is uncertain (for example, as a result of incomplete information or imprecise
measurements). They may also conceptually represent either the results of an “objectively”
random process (such as rolling a die), or the “subjective” randomness that results from
incomplete knowledge of a quantity.
Random variables can be classified as either discrete (that is, taking any of a specified list of
exact values) or as continuous (taking any numerical value in an interval or collection of
intervals). The mathematical function describing the possible values of a random variable and
their associated probabilities is known as a probability distribution.
Discrete random variables can take on either a finite or at most a countably infinite set of
discrete values (for example, the integers). Their probability distribution is given by a probability
mass function which directly maps each value of the random variable to a probability. For
example, the value of x1 takes on the probability p1, the value of x2 takes on the probability p2,
and so on. The probabilities pi must satisfy two requirements: every probability pi is a number
between 0 and 1, and the sum of all the probabilities is 1. (p1+p2+⋯+pk=1)
Discrete Probability Distribution: This shows the probability mass function of a discrete
probability distribution. The probabilities of the singletons {1}, {3}, and {7} are respectively 0.2,
0.5, 0.3. A set not containing any of these points has probability zero.
Examples of discrete random variables include the values obtained from rolling a die and the
grades received on a test out of 100.
Probability Histogram: This histogram displays the probabilities of each of the three discrete
random variables
The formula, table, and probability histogram satisfy the following necessary conditions of
discrete probability distributions: each probability f(x) lies between 0 and 1, and the
probabilities sum to 1.
x    f(x)
2    0.2
3    0.3
5    0.5
Discrete Probability Distribution: This table shows the values the discrete random
variable can take on and their corresponding probabilities.
Again, F(x) accumulates all of the probability less than or equal to x. The cumulative distribution
function for continuous random variables is just a straightforward extension of that of the
discrete case. All we need to do is replace the summation with an integral.
The cumulative distribution function ("c.d.f.") of a continuous random variable X is defined
as:
F(x) = P(X ≤ x) = ∫ from −∞ to x of f(t) dt
for −∞ < x < ∞.
3.3 Expected Values of Random Variables
The expected value of a random variable is the weighted average of all possible values that this
random variable can take on.
In probability theory, the expected value (or expectation, mathematical expectation, EV, mean,
or first moment) of a random variable is the weighted average of all possible values that this
random variable can take on. The weights used in computing this average are probabilities in
the case of a discrete random variable.
The expected value may be intuitively understood by the law of large numbers: the expected
value, when it exists, is almost surely the limit of the sample mean as sample size grows to
infinity. More informally, it can be interpreted as the long-run average of the results of many
independent repetitions of an experiment (e.g. a dice roll). The value may not be expected in the
ordinary sense—the “expected value” itself may be unlikely or even impossible (such as having
2.5 children), as is also the case with the sample mean.
Average Dice Value Against Number of Rolls: An illustration of the convergence of sequence
averages of rolls of a die to the expected value of 3.5 as the number of rolls (trials) grows.
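The convergence described above is easy to see numerically. The short sketch below (a fair six-sided die is assumed; standard library only) compares the theoretical expected value with the average of many simulated rolls:

import random

# Theoretical expected value of a fair die: sum of (value * probability)
expected = sum(x * (1 / 6) for x in range(1, 7))
print(expected)                       # 3.5

# The sample mean of many rolls approaches the expected value (law of large numbers)
rolls = [random.randint(1, 6) for _ in range(100_000)]
print(sum(rolls) / len(rolls))        # close to 3.5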
Binomial Experiment
A binomial experiment is a statistical experiment that has the following properties:
The experiment consists of n repeated trials.
Each trial can result in just two possible outcomes. We call one of these outcomes a success
and the other, a failure.
The probability of success, denoted by P, is the same on every trial.
The trials are independent; that is, the outcome on one trial does not affect the outcome on
other trials.
Consider the following statistical experiment. You flip a coin 2 times and count the number of
times the coin lands on heads. This is a binomial experiment because:
The experiment consists of repeated trials. We flip a coin 2 times.
Each trial can result in just two possible outcomes - heads or tails.
The probability of success is constant - 0.5 on every trial.
The trials are independent; that is, getting heads on one trial does not affect whether we get
heads on other trials.
The following notation is helpful, when we talk about binomial probability.
x: The number of successes that result from the binomial experiment.
n: The number of trials in the binomial experiment.
P: The probability of success on an individual trial.
Q: The probability of failure on an individual trial. (This is equal to 1 - P.)
n!: The factorial of n (also known as n factorial).
b (x; n, P): Binomial probability - the probability that an n-trial binomial experiment results
in exactly x successes, when the probability of success on an individual trial is P.
nCr: The number of combinations of n things, taken r at a time.
Binomial Distribution
A binomial random variable is the number of successes x in n repeated trials of a binomial
experiment. The probability distribution of a binomial random variable is called a binomial
distribution.
Suppose we flip a coin two times and count the number of heads (successes). The binomial
random variable is the number of heads, which can take on values of 0, 1, or 2. The binomial
distribution is presented below.
Number of Heads    Probability
0                  0.25
1                  0.50
2                  0.25
Suppose a die is tossed 5 times. What is the probability of getting exactly 2 fours?
Solution: This is a binomial experiment in which the number of trials is equal to 5, the number of
successes is equal to 2, and the probability of success on a single trial is 1/6 or about 0.167.
Therefore, the binomial probability is:
b (2; 5, 0.167) = 5C2 * (0.167)^2 * (0.833)^3
b (2; 5, 0.167) = 0.161
Example 3
The probability that a student is accepted to a prestigious college is 0.3. If 5 students from the
same school apply, what is the probability that at most 2 are accepted?
Solution: To solve this problem, we compute 3 individual probabilities, using the binomial
formula. The sum of all these probabilities is the answer we seek. Thus,
b(x ≤ 2; 5, 0.3) = b(x = 0; 5, 0.3) + b(x = 1; 5, 0.3) + b(x = 2; 5, 0.3)
b(x ≤ 2; 5, 0.3) = 0.1681 + 0.3601 + 0.3087
b(x ≤ 2; 5, 0.3) = 0.8369
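Both binomial examples can be verified with the binomial probability formula b(x; n, P) = nCx · P^x · (1 − P)^(n − x). Below is a small sketch using only the Python standard library; the helper function name is ours.

import math

def binom_pmf(x, n, p):
    """b(x; n, P) = nCx * P^x * (1 - P)^(n - x)"""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

# Exactly 2 fours in 5 rolls of a die (p = 1/6)
print(binom_pmf(2, 5, 1/6))                          # about 0.161

# At most 2 acceptances out of 5 applicants (p = 0.3)
print(sum(binom_pmf(x, 5, 0.3) for x in range(3)))   # about 0.837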
Notation
The following notation is helpful, when we talk about the Poisson distribution.
e: A constant equal to approximately 2.71828. (Actually, e is the base of the natural logarithm
system.)
μ: The mean number of successes that occur in a specified region.
x: The actual number of successes that occur in a specified region.
P (x; μ): The Poisson probability that exactly x successes occur in a Poisson experiment, when
the mean number of successes is μ.
Poisson Distribution
A Poisson random variable is the number of successes that result from a Poisson experiment.
The probability distribution of a Poisson random variable is called a Poisson distribution.
Given the mean number of successes (μ) that occur in a specified region, we can compute the
Poisson probability based on the following Poisson formula.
Poisson Formula. Suppose we conduct a Poisson experiment, in which the average number of
successes within a given region is μ. Then, the Poisson probability is:
P(x; μ) = (e^−μ)(μ^x) / x!
where x is the actual number of successes that result from the experiment, and e is
approximately equal to 2.71828.
The Poisson distribution has the following properties:
The mean of the distribution is equal to μ.
The variance is also equal to μ .
Poisson Distribution Example
The average number of homes sold by the Acme Realty company is 2 homes per day. What is
the probability that exactly 3 homes will be sold tomorrow?
Solution: This is a Poisson experiment in which we know the following:
μ = 2; since 2 homes are sold per day, on average.
x = 3; since we want to find the likelihood that 3 homes will be sold tomorrow.
e = 2.71828; since e is a constant equal to approximately 2.71828.
We plug these values into the Poisson formula as follows:
P(x; μ) = (e^−μ)(μ^x) / x!
P(3; 2) = (2.71828^−2)(2^3) / 3!
P(3; 2) = (0.13534)(8) / 6
P (3; 2) = 0.180
Thus, the probability of selling 3 homes tomorrow is 0.180.
Suppose the average number of lions seen on a 1-day safari is 5. What is the probability that
tourists will see fewer than four lions on the next 1-day safari?
Solution: This is a Poisson experiment in which we know the following:
μ = 5; since 5 lions are seen per safari, on average.
x = 0, 1, 2, or 3; since we want to find the likelihood that tourists will see fewer than 4 lions; that
is, we want the probability that they will see 0, 1, 2, or 3 lions.
e = 2.71828; since e is a constant equal to approximately 2.71828.
To solve this problem, we need to find the probability that tourists will see 0, 1, 2, or 3 lions.
Thus, we need to calculate the sum of four probabilities: P (0; 5) + P (1; 5) + P (2; 5) + P (3; 5).
To compute this sum, we use the Poisson formula:
P(x ≤ 3; 5) = P(0; 5) + P(1; 5) + P(2; 5) + P(3; 5)
P(x ≤ 3; 5) = [(e^−5)(5^0) / 0!] + [(e^−5)(5^1) / 1!] + [(e^−5)(5^2) / 2!] + [(e^−5)(5^3) / 3!]
P(x ≤ 3; 5) = [(0.006738)(1) / 1] + [(0.006738)(5) / 1] + [(0.006738)(25) / 2] + [(0.006738)(125) / 6]
P(x ≤ 3; 5) = [0.0067] + [0.03369] + [0.084224] + [0.140375]
P(x ≤ 3; 5) = 0.2650
Thus, the probability of seeing no more than 3 lions is 0.2650.
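Both Poisson calculations above can be reproduced with a short sketch (standard library only; the helper function name is ours):

import math

def poisson_pmf(x, mu):
    """P(x; mu) = e^(-mu) * mu^x / x!"""
    return math.exp(-mu) * mu**x / math.factorial(x)

# Exactly 3 homes sold when the mean is 2 per day
print(poisson_pmf(3, 2))                              # about 0.180

# Fewer than 4 lions seen when the mean is 5 per safari
print(sum(poisson_pmf(x, 5) for x in range(4)))       # about 0.265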
CHAPTER IV
Figure 4.1 Typical Density Functions
simply the density function of X. Since X is defined over a continuous sample
space, it is possible for f(x) to have a finite number of discontinuities. However, most
density functions that have practical applications in the analysis of statistical data are
continuous and their graphs may take any of several forms, some of which are shown in
Figure 4.1. Because areas will be used to represent probabilities and probabilities are
positive numerical values, the density function must lie entirely above the x axis. A
probability density function is constructed so that the area under its curve bounded by
the x axis is equal to 1 when computed over the range of X for which f(x) is defined.
Should this range of X be a finite interval, it is always possible to extend the interval to
include the entire set of real numbers by defining f(x) to be zero at all points in the
extended portions of the interval. In Figure 4.2, the probability that X assumes a value
between a and b is equal to the shaded area under the density function between the
ordinates at x = a and x = b, and from integral calculus is given by
P(a < X < b) = ∫ from a to b of f(x) dx
Therefore,
The cumulative distribution function F(x) is expressed graphically in Figure 4.3.
Now,
4.2 Expected Values of Continuous Random Variables
Let X be a continuous random variable with range [a, b] and probability density function
f(x). The expected value of X is defined by
E(X) = ∫ from a to b of x f(x) dx
Let’s see how this compares with the formula for a discrete random variable:
E(X) = Σ xi p(xi)
The discrete formula says to take a weighted sum of the values xi of X, where the
weights are the probabilities p(xi). Recall that f(x) is a probability density. Its units are
prob/(unit of X).
prob/(unit of X).
So f(x) dx represents the probability that X is in an infinitesimal range of width dx around
x. Thus we can interpret the formula for E(X) as a weighted integral of the values x of X,
where the weights are the probabilities f(x) dx.
As before, the expected value is also called the mean or average.
Example 4.2
Let X ∼ uniform(0, 1). Find E(X).
SOLUTION:
Since X has a range of [0, 1] and a density of f(x) = 1:
E(X) = ∫ from 0 to 1 of x dx = 1/2
Not surprisingly, the mean is at the midpoint of the range.
Example 4.3
Let X have range [0, 2] and density . Find E(X).
Does it make sense that this X has its mean in the right half of its range?
Yes. Since the probability density increases as x increases over the range, the average
value of x should be in the right half of the range.
µ is “pulled” to the right of the midpoint 1 because there is more mass to the right.
Properties of E(X)
The properties of E(X) for continuous random variables are the same as for discrete
ones:
Example 4.4
Let X ∼ exp(λ). Find E(X2).
where μ and σ are parameters. These turn out to be the mean and standard deviation,
respectively, of the distribution. As a shorthand notation, we write X ~ N(μ, σ²).
The curve never actually reaches the horizontal axis but gets close to it beyond about
3 standard deviations on each side of the mean.
For any Normally distributed variable:
68.3% of all values will lie between μ −σ and μ + σ (i.e. μ ± σ )
95.45% of all values will lie within μ ± 2 σ
99.73% of all values will lie within μ ± 3 σ
The graphs below illustrate the effect of changing the values of μ and σ on the shape of
the probability density function. Low variability (σ = 0.71) with respect to the mean gives
a pointed bell-shaped curve with little spread. Variability of σ = 1.41 produces a flatter
bell-shaped curve with a greater spread.
Example 4.5
The volume of water in commercially supplied fresh drinking water containers is
approximately Normally distributed with mean 70 litres and standard deviation 0.75
litres. Estimate the proportion of containers likely to contain
(i) in excess of 70.9 litres, (ii) at most 68.2 litres, (iii) less than 70.5 litres.
SOLUTION
Let X denote the volume of water in a container, in litres. Then X ~ N(70, 0.75²), i.e. μ =
70, σ = 0.75 and Z = (X − 70)/0.75
X = 70.9 ; Z = (70.9 − 70)/0.75 = 1.20
P(X > 70.9) = P(Z > 1.20) = 0.1151 or 11.51%
X = 68.2 ; Z = −2.40
P(X < 68.2) = P(Z < −2.40) = 0.0082 or 0.82%
X = 70.5 ; Z = 0.67
P(X > 70.5) = 0.2514 ; P(X < 70.5) = 0.7486 or 74.86%
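The three probabilities in Example 4.5 can be checked numerically. The standard normal c.d.f. can be written with math.erf, so no external libraries are needed; the helper function name is ours.

import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 70, 0.75

print(1 - phi((70.9 - mu) / sigma))   # P(X > 70.9), about 0.115
print(phi((68.2 - mu) / sigma))       # P(X <= 68.2), about 0.008
print(phi((70.5 - mu) / sigma))       # P(X < 70.5), about 0.748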
When working out probabilities, we want to include whole rectangles, which is what
continuity correction is all about.
Example 4.6
Suppose we toss a fair coin 20 times. What is the probability of getting between 9 and
11 heads?
SOLUTION
Let X be the random variable representing the number of heads thrown.
X ~ Bin(20, ½)
Since p is close to ½ (it equals ½!), we can use the normal approximation to the
binomial. X ~ N(20 × ½, 20 × ½ × ½) so X ~ N(10, 5) .
In this diagram, the rectangles represent the binomial distribution and the curve is the
normal distribution:
We want P(9 ≤ X ≤ 11), which is the red shaded area. Notice that the first rectangle
starts at 8.5 and the last rectangle ends at 11.5 . Using a continuity correction,
therefore, our probability becomes P(8.5 < X < 11.5) in the normal distribution.
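The effect of the continuity correction can be checked by comparing the exact binomial answer with the normal approximation. A minimal sketch, standard library only, with the helper name ours:

import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Exact binomial: P(9 <= X <= 11) for X ~ Bin(20, 0.5)
exact = sum(math.comb(20, k) for k in (9, 10, 11)) / 2**20
print(exact)                                   # about 0.497

# Normal approximation with continuity correction: X ~ N(10, 5)
mu, sd = 10, math.sqrt(5)
approx = phi((11.5 - mu) / sd) - phi((8.5 - mu) / sd)
print(approx)                                  # about 0.497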
and
It is important to use consistent units in the calculation of probabilities, means, and
variances involving exponential random variables. The following example illustrates unit
conversions.
Example 4.7
In a large corporate computer network, user log-ons to the system can be modeled as a
Poisson process with a mean of 25 log-ons per hour. What is the probability that there
are no logons in an interval of 6 minutes?
SOLUTION
Let X denote the time in hours from the start of the interval until the first log-on. Then, X
has an exponential distribution with log-ons per hour. We are interested in the
probability that X exceeds 6 minutes. Because is given in log-ons per hour, we express
all time units in hours. That is, 6 minutes 0.1 hour. The probability requested is shown
as the shaded area under the probability density function in Fig. 4.5. Therefore,
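Under the stated model (exponential waiting time with λ = 25 log-ons per hour), the probability of no log-on in 6 minutes is e^(−25·0.1); a one-line check in Python:

import math

lam = 25          # log-ons per hour
t = 6 / 60        # 6 minutes expressed in hours

# P(X > t) for an exponential random variable with rate lam
print(math.exp(-lam * t))      # about 0.082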
REFERENCES
Walpole, R., Myers, R., Myers, S., & Ye, K., (2007) Probability & Statistics for Engineers
& Scientists 8th Ed. [Available online] Retrieve from
https://www.csie.ntu.edu.tw/~sdlin/download/Probability%20&%20Statistics.pdf
Orloff, J. & Bloom, J., (2014) Introduction to Probability and Statistics [Available online]
Retrieved from https://ocw.mit.edu/courses/mathematics/18-05-introduction-to-
probability-and-statistics-spring-2014/readings/MIT18_05S14_Reading6a.pdf
Montgomery, D. & Runger, G. (2003) Applied Statistics and Probability for Engineers 3rd
Ed. [Available online] Retrieved from http://www.um.edu.ar/math/montgomery.pdf
The joint probability density function of A and B defines probabilities for each pair of
outcomes. All possible outcomes are
Since each outcome is equally likely the joint probability density function becomes
Since the coin flips are independent, the joint probability density function is the product
of the marginals:
Example 5.2
The National Highway Traffic Safety Administration is interested in the effect of
seat belt use on saving lives. One study reported statistics on children under the age of
5 who were involved in motor vehicle accidents in which at least one fatality occurred.
For 7,060 such accidents between 1985 and 1989, the results are shown in the Table
5.1.
Table 5.1
Children Involved in Motor Vehicle Accidents
For each child, the record shows whether or not he or she survived and what the seat
belt situation was. Define two random variables as follows:
X1 will keep track of the number of child fatalities and X2 will keep track of the type of
restraining device used for the child.
The frequencies from Table 5.1 are turned into the relative frequencies of Table 5.2
to produce the joint probability distribution of X1 and X2. In general, we write
and call p(x1, x2) the joint probability function of (X1, X2). For example (see Table 5.2)
represents the approximate probability that a child will both survive and be in a child
seat when involved in a fatal accident.
Table 5.2
Joint Probability Distribution
0 1 Total
The probability that a child will be in a child seat is
= 0.24 + 0.05
= 0.29
In particular, if RX = {x1, x2, ...} and RY = {y1, y2, ...}, then we can always write
In fact, sometimes we define RXY = RX × RY to simplify the analysis. In this case, for some
pairs (xi, yj) in RX × RY, PXY(xi, yj) might be zero. For two discrete random variables X and Y, we
have
We can use the joint PMF to find P((X,Y)∈A) for any set A⊂R2. Specifically, we have
Let A be the event that your left sock is black and let B be the event that
your right sock is black. On the left side of the diagram, the yellow area represents the
probability that both of your socks are black. This is the joint probability P(A ∩ B). If B is
definitely true (e.g., given that your right sock is definitely black), then the space of everything
not B is dropped and everything in B is rescaled to the size of the original space. The
rescaled yellow area is now the conditional probability of A given B, expressed as P(A|B). In
other words, this is the probability that your left sock is black if you know that your right
sock is black. Note that the conditional probability of A given B is not in general equal to
the conditional probability of B given A. That would be the fraction of A that is yellow, which
in this picture is slightly smaller than the fraction of B that is yellow.
Philosophically, all probabilities are conditional probabilities. In the Euler
diagram, A and B are conditional on the box that they are in, in the same way that A ∩ B is
conditional on the box that it is in. Treating probabilities in this way makes chaining
together different types of reasoning using Bayes' theorem easier, allowing for the
combination of uncertainties about outcomes ("given that the coin is fair, how likely am I
to get a head?") with uncertainties about hypotheses ("given that Frank gave me this
coin, how likely is it to be fair?"). Historically, conditional probability has often been
misinterpreted, giving rise to the famous Monty Hall problem and Bayesian mistakes in
science. There is only one main formula regarding conditional probability, which is
P(A|B) = P(A ∩ B) / P(B)
Any other formula regarding conditional probability can be derived from the above
formula. Specifically, if you have two random variables X and Y, you can write
where C, D ⊂ R
More Than Two Random Variables
For two or more random variables, the joint probability distribution function is
defined in a similar way to what we have already seen for the case of two random
variables. Let X1, X2, ..., Xn be n discrete random variables. The joint PMF of X1, X2, ..., Xn is
defined as
For n jointly continuous random variables X1, X2, ..., Xn, the joint PDF is defined to be the
function fX1,X2,...,Xn(x1, x2, ..., xn) such that, for the probability of any set A ⊂ Rn, we can write
The marginal PDF of X1 can be obtained by integrating out all the other Xj's. For example
Example 5.3
Consider two random variables X and Y with joint PMF given in Table 5.3.
Table 5.3
Joint PMF of X and Y
a. Find .
b. Find the marginal PMFs of X and Y.
c. Find .
d. Are X and Y independent?
Solution:
We obtain
Example 5.4
A soft-drink machine has a random amount Y2 in supply at the beginning of a
given day and dispenses a random amount Y1 during the day (with measurements in
gallons). It is not resupplied during the day, hence Y1 ≤ Y2. It has been observed that
Y1 and Y2 have joint density
That is, the points (Y1, Y2) are uniformly distributed over the triangle with the given boundaries.
Find the conditional probability density of Y1 given Y2=y2. Evaluate the probability that less than
½ gallon is sold, given that the machine contains 1 gallon at the start of the day.
Solution:
The marginal density of Y2 is given by
By definition,
The probability of interest is
Note that if the machine had contained 2 gallons at the start of the day, then
Thus, the amount sold is highly dependent upon the amount in supply.
Example 5.5 Mean and Standard Deviation of Sales Commission
You pay your sales personnel a commission of 75% of the amount they sell over
$2000. X = Sales has mean $5000 and standard deviation $1000. What are the mean
and standard deviation of pay?
Solution:
X − 2000 represents the basis for the commission, and "Pay" is 75% of that, so
Pay = 0.75(X − 2000)
Mean of pay = 0.75(5000 − 2000) = $2,250 (1a)
Standard deviation of pay = 0.75(1000) = $750 (1b)
Example 5.6 The Portfolio Effect.
You are considering purchase of stock in two different companies, X and Y. The return after one
year for stock X is a random variable with mean μX = $112 and standard deviation σX = 10. The
return for stock Y (a different company) has the same μ and σ. Assuming that X and Y are
independent, which portfolio has less variability, 2 shares of X or one each of X and Y?
Solution:
The returns from 2 shares of X will be exactly twice the returns from one share, or 2X, so
Var(2X) = 4 Var(X) = 4(10)² = 400.
The returns from one each of X and Y is the sum of the two returns, X + Y; because X and Y
are independent, Var(X + Y) = Var(X) + Var(Y) = 100 + 100 = 200. The X + Y portfolio therefore
has the smaller variance (standard deviation √200 ≈ 14.1 versus 20), so it has less variability.
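The portfolio effect is easy to confirm by simulation. The following sketch assumes, purely for illustration, normally distributed returns with the stated mean and standard deviation; the variable names are ours.

import random
import statistics

mu, sigma, trials = 112, 10, 100_000
x = [random.gauss(mu, sigma) for _ in range(trials)]
y = [random.gauss(mu, sigma) for _ in range(trials)]   # independent of x

two_shares_x = [2 * xi for xi in x]
one_of_each = [xi + yi for xi, yi in zip(x, y)]

print(statistics.stdev(two_shares_x))   # about 20   (variance about 400)
print(statistics.stdev(one_of_each))    # about 14.1 (variance about 200)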
Let
Y = X²
The function g(x) = x² is strictly increasing and it admits an inverse on the support of X:
g⁻¹(y) = √y
The support of Y is RY = [1, 4]. The distribution function of Y is
In the cases in which X is either discrete or continuous, there are specialized formulae for the
probability mass and probability density functions, which are reported below.
Strictly Increasing Functions of a Discrete Random Variable
When X is a discrete random variable, the probability mass function of Y = g(X) can be computed
as follows.
Proposition (probability mass of an increasing function) Let X be a discrete random
variable with support Rx and probability mass function px(x). Let g: R→R be strictly increasing
on the support of X. Then, the support of Y=g(X) is
Let
Let
with derivative
References
En.wikipedia.org. (2019). Joint probability distribution. [online] Available at:
https://en.wikipedia.org/wiki/Joint_probability_distribution [Accessed 3 Jun. 2019].
Probabilitycourse.com. (2019). Joint Probability Mass Function | Marginal PMF | PMF. [online]
Available at: https://www.probabilitycourse.com/chapter5/5_1_1_joint_pmf.php [Accessed 3
Jun. 2019].
En.wikipedia.org. (2019). Marginal distribution. [online] Available at:
https://en.wikipedia.org/wiki/Marginal_distribution [Accessed 3 Jun. 2019].
Brilliant.org. (2019). Conditional Probability Distribution | Brilliant Math & Science Wiki. [online]
Available at: https://brilliant.org/wiki/conditional-probability-distribution/ [Accessed 3 Jun. 2019].
Scheaffer, R., MuleKar, M. and McClave, J. (2011). Probability and Statistics for Engineering Students.
Philippines: C&E Publishing, Inc., pp.355-366.
In the standard error formula, the factor sqrt[ (N - n ) / (N - 1) ] is called the finite
population correction or fpc. When the population size is very large relative to the
sample size, the fpc is approximately equal to one; and the standard error
formula can be approximated by:
where
Like the formula for the standard error of the mean, the formula for the standard
error of the proportion uses the finite population correction, sqrt[ (N - n ) / (N -
1) ]. When the population size is very large relative to the sample size, the fpc is
approximately equal to one; and the standard error formula can be approximated
by:
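A small sketch of both standard-error formulas (for a sample mean and for a sample proportion), with and without the finite population correction; the numbers are illustrative and the helper function names are ours.

import math

def se_mean(sigma, n, N=None):
    """Standard error of the sample mean, with optional finite population correction."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))   # fpc factor
    return se

def se_proportion(p, n, N=None):
    """Standard error of the sample proportion, with optional finite population correction."""
    se = math.sqrt(p * (1 - p) / n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(se_mean(20, 50))               # approximate formula, fpc ignored
print(se_mean(20, 50, N=100_000))    # nearly identical when N is much larger than n
print(se_proportion(0.5, 120))       # about 0.0456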
Let's review what we know and what we want to know. We know that the
sampling distribution of the mean is normally distributed with a mean of 80 and a
standard deviation of 2.81. We want to know the probability that a sample mean
is less than or equal to 75 pounds.
Because we know the population standard deviation and the sample size is
large, we'll use the normal distribution to find probability. To solve the problem,
we plug these inputs into the Normal Probability Calculator: mean = 80, standard
deviation = 2.81, and normal random variable = 75. The Calculator tells us that
the probability that the average weight of a sampled student is less than 75
pounds is equal to 0.038.
Note: Since the population size is more than 20 times greater than the sample
size, we could have used the "approximate" formula σx = [ σ / sqrt(n) ] to compute
the standard error. Had we done that, we would have found a standard error
equal to [ 20 / sqrt(50) ] or 2.83.
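A quick numerical check of this example (the helper function name is ours; the standard normal c.d.f. is written with math.erf):

import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean, se = 80, 2.81
print(phi((75 - mean) / se))     # P(sample mean <= 75), about 0.038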
Example No 2: Find the probability that of the next 120 births, no more than 40%
will be boys. Assume equal probabilities for the births of boys and girls. Assume
also that the number of births in the population (N) is very large, essentially
infinite.
Solution: The Central Limit Theorem tells us that the proportion of boys in 120
births will be approximately normally distributed.
The mean of the sampling distribution will be equal to the mean of the population
distribution. In the population, half of the births result in boys; and half, in girls.
Therefore, the probability of boy births in the population is 0.50. Thus, the mean
proportion in the sampling distribution should also be 0.50.
The standard deviation of the sampling distribution (i.e., the standard error) can
be computed using the following formula:
Here, the finite population correction is equal to 1.0, since the population size (N)
was assumed to be infinite. Therefore, the standard error formula reduces to:
σp = sqrt[ P(1 − P) / n ] = sqrt[ (0.5)(0.5) / 120 ] = 0.04564
Let's review what we know and what we want to know. We know that the
sampling distribution of the proportion is normally distributed with a mean of 0.50
and a standard deviation of 0.04564. We want to know the probability that no
more than 40% of the sampled births are boys.
Because we know the population standard deviation and the sample size is
large, we'll use the normal distribution to find probability. To solve the problem,
we plug these inputs into the Normal Probability Calculator: mean = .5, standard
deviation = 0.04564, and the normal random variable = .4. The Calculator tells us
that the probability that no more than 40% of the sampled births are boys is equal
to 0.014.
Note: This problem can also be treated as a binomial experiment. Elsewhere, we
showed how to analyze a binomial experiment. The binomial experiment is
actually the more exact analysis. It produces a probability of 0.018 (versus the
probability of 0.014 that we found using the normal distribution). Without a
computer, the binomial approach is computationally demanding. Therefore, many
statistics texts emphasize the approach presented above, which uses the normal
distribution to approximate the binomial.
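Both the normal-approximation answer and the exact binomial answer quoted in the note can be reproduced with the standard library; the helper function name is ours.

import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 120, 0.5

# Normal approximation for P(proportion of boys <= 0.40)
se = math.sqrt(p * (1 - p) / n)
print(phi((0.40 - p) / se))                            # about 0.014

# Exact binomial: P(X <= 48), since 48 is 40% of 120 births
print(sum(math.comb(n, k) for k in range(49)) / 2**n)  # about 0.018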
x1, x2, ..., xn
Definition. The range of possible values of the parameter θ is called the parameter
space Ω (the Greek letter "omega").
For example, if μ denotes the mean grade point average of all college students,
then the parameter space (assuming a 4-point grading scale) is:
Ω = {μ: 0 ≤ μ ≤ 4}
And, if p denotes the proportion of students who smoke cigarettes, then the
parameter space is:
Ω = {p: 0 ≤ p ≤ 1}
Definition. The function of X1, X2, ..., Xn, that is, the statistic u(X1, X2, ..., Xn), used to
estimate θ is called a point estimator of θ.
For example, the function:
is a point estimator of the population variance σ2.
Definition. The function u(x1, x2, ..., xn) computed from a set of data is an observed point
estimate of θ.
For example, if xi are the observed grade point averages of a sample of 88
students, then:
is a point estimate of μ, the mean grade point average of all the students in the
population.
And, if xi = 0 if a student has no tattoo, and xi = 1 if a student has a tattoo, then:
p = 0.11
is a point estimate of p, the proportion of all students in the population who have
a tattoo.
are the maximum likelihood estimators of μ and σ2, respectively. A natural question then
is whether or not these estimators are "good" in any sense. One measure of "good"
is "unbiasedness."
Example No. 1: If Xi is a Bernoulli random variable with parameter p, then:
is the maximum likelihood estimator (MLE) of p. Is the MLE of p an unbiased estimator
of p?
Solution. Recall that if Xi is a Bernoulli random variable with parameter p, then E(Xi)
= p. Therefore:
The first equality holds because we've merely replaced p-hat with its definition. The
second equality holds by the rules of expectation for a linear combination. The third
equality holds because E(Xi) = p. The fourth equality holds because when you add the
value p up n times, you get np. And, of course, the last equality is simple algebra. In
summary, we have shown that:
Also, recall that the expected value of a chi-square random variable is its degrees of
freedom. That is, if:
The first equality holds because we effectively multiplied the sample variance by 1. The
second equality holds by the law of expectation that tells us we can pull a constant
through the expectation. The third equality holds because of the two facts we recalled
above. That is:
In summary, we have shown that, if Xi is a normally distributed random variable with
mean μ and variance σ2, then S2 is an unbiased estimator of σ2. It turns out, however,
that S2 is always an unbiased estimator of σ2, that is, for any model, not just the normal
model. (You'll be asked to show this in the homework.) And, although S2 is always an
unbiased estimator of σ2, S is not an unbiased estimator of σ.
Example No. 3: Let T be the time that is needed for a specific task in a factory to be
completed. In order to estimate the mean and variance of T, we observe a random
sample T1, T2, ..., T6. Thus, the Ti's are i.i.d. and have the same distribution as T. We obtain
the following values (in minutes):
18, 21, 17, 16, 24, 20.
Find the values of the sample mean, the sample variance, and the sample standard
deviation for the observed sample.
The sample mean is:
T̄ = (18 + 21 + 17 + 16 + 24 + 20) / 6 = 19.33 minutes
The sample variance is given by
S² = [ (18 − 19.33)² + (21 − 19.33)² + (17 − 19.33)² + (16 − 19.33)² + (24 − 19.33)² + (20 − 19.33)² ] / (6 − 1) = 8.67
so the sample standard deviation is S = 2.94 minutes.
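The same three quantities can be obtained with Python's statistics module (which uses the n − 1 divisor for the sample variance), as a quick check on the hand calculation:

import statistics

sample = [18, 21, 17, 16, 24, 20]

print(statistics.mean(sample))       # sample mean, about 19.33
print(statistics.variance(sample))   # sample variance, about 8.67
print(statistics.stdev(sample))      # sample standard deviation, about 2.94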
Since we have already determined the bias and standard error of the estimator, calculating
its mean squared error is easy:
References:
Holton,G., (n.d.) Value-at-Risk Second Edition [Available online] Retrieve
from: https://www.value-at-risk.net/bias/
Pishro-Nik, H., (n.d.) Introduction to Probability, Statistics, and Random Processes
[Available online] Retrieve
from: https://www.probabilitycourse.com/chapter8/8_2_2_point_estimators_for_mean_a
nd_var.php
The Pennsylvania State University (2018) Probability Theory and Mathematical
Statistics [Available online] Retrieve
from: https://newonlinecourses.science.psu.edu/stat414/node/192/
Stat Trek (n.d.) Sampling Distribution [Available online] Retrieve
from: https://stattrek.com/sampling/sampling-distribution.aspx
The value z* representing the point on the standard normal density curve such that the
probability of observing a value greater than z* is equal to p is known as the
upper p critical value of the standard normal distribution. For example, if p = 0.025, the
value z* such that P(Z > z*) = 0.025, or P(Z < z*) = 0.975, is equal to 1.96. For a
confidence interval with level C, the value p is equal to (1-C)/2. A 95% confidence
interval for the standard normal distribution, then, is the interval (-1.96, 1.96), since 95%
of the area under the curve falls within this interval.
because the future observation Xn+1 is independent of the mean of the current sample X̄.
The prediction error is normally distributed. Therefore,
Example No. 1:
Reconsider the tensile adhesion tests on specimens of U-700 alloy described in
Example 8-4. The load at failure for specimens was observed, and we found that and .
The 95% confidence interval on was . We plan to test a twenty-third specimen. A 95%
prediction interval on the load at failure for this specimen is,
Notice that the prediction interval is considerably longer than the CI.
where k is a tolerance interval factor found in Appendix Table XI. Values are given
for 90%, 95%, and 99% and for 95% and 99% confidence
One-sided tolerance bounds can also be computed. The tolerance factors for these
bounds are also given in Appendix Table XI.
Example No. 1:
Let’s reconsider the tensile adhesion tests. The load at failure for specimens was
observed, and we found that and . We want to find a tolerance interval for the load at
failure that includes 90% of the values in the population with 95% confidence. From
Appendix Table XI the tolerance factor k for , , and 95% confidence is The desired
tolerance interval is
which reduces to (23.67, 39.75). We can be 95% confident that at least 90% of the
values of load at failure for this particular alloy lie between 23.67 and 39.75
megapascals.
Reference:
Valerie J. Easton and John H. McColl's Statistics Glossary v1.1
Douglas C. Montgomery and George C. Runger: Applied Statistics and Probability for
Engineers (Third Edition)
Statistical Hypotheses
The best way to determine whether a statistical hypothesis is true would be to
examine the entire population. Since that is often impractical, researchers typically
examine a random sample from the population. If sample data are not consistent with
the statistical hypothesis, the hypothesis is rejected.
There are two types of statistical hypotheses.
o Null hypothesis. The null hypothesis, denoted by Ho, is usually the hypothesis that sample
observations result purely from chance.
o Alternative hypothesis. The alternative hypothesis, denoted by H1 or Ha, is the hypothesis that
sample observations are influenced by some non-random cause.
Hypothesis testing is an act in statistics whereby an analyst tests an assumption
regarding a population parameter. The methodology employed by the analyst depends
on the nature of the data used and the reason for the analysis. Hypothesis testing is
used to infer the result of a hypothesis performed on sample data from a larger
population.
Hypothesis testing in statistics is a way for you to test the results of a survey or
experiment to see if you have meaningful results. You’re basically testing whether your
results are valid by figuring out the odds that your results have happened by chance. If
your results may have happened by chance, the experiment won’t be repeatable and so
has little use.
Hypothesis testing can be one of the most confusing aspects for students, mostly
because before you can even perform a test, you have to know what your null
hypothesis is. Often, those tricky word problems that you are faced with can be difficult
to decipher. But it’s easier than you think; all you need to do is:
1. The significance level is the probability of rejecting a null hypothesis that is correct.
2. The sampling distribution for a test statistic assumes that the null hypothesis is correct.
Consequently, to represent the critical regions on the distribution for a test
statistic, you merely shade the appropriate percentage of the distribution. For the
common significance level of 0.05, you shade 5% of the distribution.
When a test statistic falls in either critical region, your sample data are sufficiently
incompatible with the null hypothesis that you can reject it for the population.
In a two-tailed test, the generic null and alternative hypotheses are the following:
Null: The effect equals zero.
Alternative: The effect does not equal zero.
The specifics of the hypotheses depend on the type of test you perform because
you might be assessing means, proportions, or rates.
Advantages of two-tailed hypothesis tests
You can detect both positive and negative effects. Two-tailed tests are standard
in scientific research where discovering any type of effect is usually of interest to
researchers.
Again, the specifics of the hypotheses depend on the type of test you perform.
Notice how for both possible null hypotheses the tests can’t distinguish between
zero and an effect in a particular direction. For example, in the example directly above,
the null combines “the effect is greater than or equal to zero” into a single category.
That test can’t differentiate between zero and greater than zero.
8.1.2 P-value in Hypothesis Test
The P value, or calculated probability, is the probability of finding the observed,
or more extreme, results when the null hypothesis (H0) of a study question is true – the
definition of ‘extreme’ depends on how the hypothesis is being tested. P is also
described in terms of rejecting H0 when it is actually true, however, it is not a direct
probability of this state.
The term significance level (alpha) is used to refer to a pre-chosen probability and
the term "P value" is used to indicate a probability that you calculate after a given study.
If your P value is less than the chosen significance level then you reject the null
hypothesis i.e. accept that your sample gives reasonable evidence to support the
alternative hypothesis. It does NOT imply a "meaningful" or "important" difference; that
is for you to decide when considering the real-world relevance of your result.
Type I error is the false rejection of the null hypothesis and type II error is the
false acceptance of the null hypothesis. As an aide-mémoire: think that our cynical society
rejects before it accepts.
The significance level (alpha) is the probability of type I error. The power of a test
is one minus the probability of type II error (beta). Power should be maximized when
selecting statistical methods.
The following table shows the relationship between power and error in
hypothesis testing:
8.4 Test on the Variance and Standard Deviation of a Normal Distribution
Example No. 3:
An automatic filling machine is used to fill bottles with liquid detergent. A random
sample of 20 bottles results in a sample variance of fill volume of s² = 0.0153 (fluid
ounces)². If the variance of fill volume exceeds 0.01 (fluid ounces)², an unacceptable
proportion of bottles will be underfilled or overfilled. Is there evidence in the sample data
to suggest that the manufacturer has a problem with underfilled or overfilled bottles?
Use α = 0.05, and assume that fill volume has a normal distribution.
Using the eight-step procedure:
1. The parameter of interest is the population variance σ².
2. H0: σ² = 0.01
3. H1: σ² > 0.01
4. α = 0.05
5. The test statistic is
χ0² = (n − 1)s² / σ0²
6. Reject H0 if χ0² > χ²0.05,19 = 30.14.
7. Computations:
χ0² = 19(0.0153) / 0.01 = 29.07
8. Conclusions: Since χ0² = 29.07 < χ²0.05,19 = 30.14, we conclude that there is no strong evidence that the variance
of fill volume exceeds 0.01 (fluid ounces)².
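A minimal Python sketch of the same calculation, with SciPy supplying the chi-square critical value and the numbers taken from the example above:
# Hedged sketch: chi-square test on a variance, reproducing Example No. 3
from scipy.stats import chi2

n, s2, sigma0_sq, alpha = 20, 0.0153, 0.01, 0.05

chi2_0 = (n - 1) * s2 / sigma0_sq           # test statistic, about 29.07
chi2_crit = chi2.ppf(1 - alpha, df=n - 1)   # upper 5% point, about 30.14
p_value = chi2.sf(chi2_0, df=n - 1)

print(f"chi2_0 = {chi2_0:.2f}, critical value = {chi2_crit:.2f}, P-value = {p_value:.3f}")
# chi2_0 is below the critical value, so we fail to reject H0: no strong
# evidence that the fill-volume variance exceeds 0.01 (fluid ounces)^2.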
For the test on the process fraction defective (x = 4 defective units observed in a random
sample of n = 200, with a target fraction defective of p0 = 0.05), the eight-step procedure gives:
1. The parameter of interest is the process fraction defective p.
2. H0: p = 0.05
3. H1: p < 0.05
4. α = 0.05
5. The test statistic is
z0 = (x − np0) / √(np0(1 − p0))
where x = 4, n = 200, and p0 = 0.05.
6. Reject H0 if z0 < −z0.05 = −1.645.
7. Computations: The test statistic is
z0 = (4 − 200(0.05)) / √(200(0.05)(0.95)) = −1.95
8. Conclusions: Since z0 = −1.95 < −z0.05 = −1.645, we reject H0 and conclude
that the process fraction defective p is less than 0.05. The P-value for this value
of the test statistic z0 is P = 0.0256, which is less than α = 0.05. We conclude that
the process is capable.
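A minimal Python sketch of the same large-sample test on a proportion, using the values quoted above:
# Hedged sketch: large-sample test on a proportion (x = 4 defectives in n = 200)
from math import sqrt
from scipy.stats import norm

x, n, p0, alpha = 4, 200, 0.05, 0.05

z0 = (x - n * p0) / sqrt(n * p0 * (1 - p0))   # about -1.95
z_crit = -norm.ppf(1 - alpha)                 # reject H0 if z0 < -1.645
p_value = norm.cdf(z0)                        # about 0.026

print(f"z0 = {z0:.2f}, critical value = {z_crit:.3f}, P-value = {p_value:.4f}")
# z0 < -1.645, so we reject H0 and conclude the fraction defective is below 0.05.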
References:
Majaski, C. (2019). Hypothesis Testing [Available online]. Retrieved from https://www.investopedia.com/terms/h/hypothesistesting.asp
Statistics Solutions (2013). Hypothesis Testing [Available online]. Retrieved from http://www.statisticssolutions.com/academic-solutions/resources/directory-of-statistical-analyses/hypothesis-testing/
Glen, S. (2019). Hypothesis Testing [Available online]. Retrieved from https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/hypothesis-testing/
Frost, J. (2019). One-Tailed and Two-Tailed Hypothesis Tests Explained [Available online]. Retrieved from https://statisticsbyjim.com/hypothesis-testing/one-tailed-two-tailed-hypothesis-tests/
The quantity
Z = (X̄1 − X̄2 − (μ1 − μ2)) / √(σ1²/n1 + σ2²/n2)
has a standard normal distribution when the two population variances are known.
Null Hypothesis: H0: μ1 − μ2 = Δ0
Test Statistic: Z0 = (X̄1 − X̄2 − Δ0) / √(σ1²/n1 + σ2²/n2)
Sample Problem:
A product developer is interested in reducing the drying time of a primer paint. Two
formulations of the paint are tested; formulation 1 is the standard chemistry, and
formulation 2 has a new drying ingredient that should reduce drying time. From
experience, it is known that the standard deviation of drying time is 8 minutes, and this
inherent variability should be unaffected by the addition of the new ingredient. Ten
specimens are painted with formulation 1, and another 10 specimens are painted with
formulation 2; the 20 specimens are painted in random order. The two sample average
drying times are x̄1 = 121 minutes and x̄2 = 112 minutes, respectively. What conclusions
can the product developer draw about the effectiveness of the new ingredient,
using α = 0.05?
We apply the eight-step procedure to this problem as follows:
1. The quantity of interest is the difference in the mean drying times, μ1 − μ2.
2. H0: μ1 − μ2 = 0
3. H1: μ1 − μ2 > 0. We want to reject H0 if the new ingredient reduces mean drying time.
4. α = 0.05
5. The test statistic is
z0 = (x̄1 − x̄2 − 0) / √(σ1²/n1 + σ2²/n2)
where σ1² = σ2² = (8)² = 64 and n1 = n2 = 10.
6. Reject H0 if z0 > z0.05 = 1.645.
7. Computations: With x̄1 = 121 minutes and x̄2 = 112 minutes, the test statistic is
z0 = (121 − 112) / √(64/10 + 64/10) = 2.52
8. Conclusions: Since z0 = 2.52 > z0.05 = 1.645, we reject H0 at α = 0.05 and conclude that
adding the new ingredient to the paint significantly reduces the mean drying time.
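A minimal Python sketch of this two-sample z test with known, equal variances, using the values from the example:
# Hedged sketch: two-sample z test with known, equal variances (drying-time example)
from math import sqrt
from scipy.stats import norm

xbar1, xbar2 = 121.0, 112.0   # sample mean drying times in minutes
sigma, n1, n2 = 8.0, 10, 10   # known standard deviation and sample sizes
alpha = 0.05

z0 = (xbar1 - xbar2) / sqrt(sigma**2 / n1 + sigma**2 / n2)   # about 2.52
z_crit = norm.ppf(1 - alpha)                                 # 1.645
p_value = norm.sf(z0)                                        # about 0.006

print(f"z0 = {z0:.2f}, critical value = {z_crit:.3f}, P-value = {p_value:.4f}")
# z0 > 1.645, so the data support the claim that the new ingredient reduces drying time.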
Given the assumptions of the section, the quantity
T = (X̄1 − X̄2 − (μ1 − μ2)) / (Sp √(1/n1 + 1/n2))
has a t distribution with n1 + n2 − 2 degrees of freedom, where Sp is the pooled estimator of the
common standard deviation.
Definition: The Two-Sample or Pooled t-Test
Null Hypothesis:
Test statistic:
Alternative Hypothesis Rejection Criterion
Case 2:
is distributed approximately as a t distribution, with the degrees of freedom v estimated from the
sample variances and sample sizes (the Welch-Satterthwaite approximation).
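Since the test-statistic formulas for the two cases are not reproduced here, the sketch below uses SciPy on hypothetical data to illustrate Case 1 (equal variances assumed, the pooled t test) and Case 2 (unequal variances, the approximate-degrees-of-freedom version):
# Hedged sketch: two-sample t tests on hypothetical data
from scipy import stats

sample1 = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]   # hypothetical measurements
sample2 = [11.2, 11.5, 11.0, 11.4, 11.3, 11.6]

# Case 1: pooled t test (equal variances assumed, n1 + n2 - 2 degrees of freedom)
t_pooled, p_pooled = stats.ttest_ind(sample1, sample2, equal_var=True)

# Case 2: unequal variances, with the degrees of freedom v approximated from the data
t_unequal, p_unequal = stats.ttest_ind(sample1, sample2, equal_var=False)

print(f"Case 1 (pooled):   t = {t_pooled:.3f}, P-value = {p_pooled:.4f}")
print(f"Case 2 (unpooled): t = {t_unequal:.3f}, P-value = {p_unequal:.4f}")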
Type II Error and Choice of Sample Size
Case 1:
Case 2:
The development of a test procedure for these hypotheses requires a new probability
distribution, the F distribution.
Let W and Y be independent chi-square random variables with u and v degrees of freedom,
respectively. Then the ratio
F = (W/u) / (Y/v)
has the probability density function
f(x) = [Γ((u + v)/2) (u/v)^(u/2) x^(u/2 − 1)] / {Γ(u/2) Γ(v/2) [1 + (u/v)x]^((u + v)/2)},  0 < x < ∞
and is said to follow the F distribution with u degrees of freedom in the numerator and v
degrees of freedom in the denominator.
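A minimal Python sketch of how upper and lower percentage points of the F distribution can be obtained (the degrees of freedom below are hypothetical):
# Hedged sketch: percentage points of the F distribution (hypothetical degrees of freedom)
from scipy.stats import f

u, v = 5, 10      # numerator and denominator degrees of freedom
alpha = 0.05

f_upper = f.ppf(1 - alpha, u, v)   # upper 5% point
f_lower = f.ppf(alpha, u, v)       # lower 5% point, equal to 1 / f.ppf(1 - alpha, v, u)

print(f"Upper point = {f_upper:.3f}, lower point = {f_lower:.3f}")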
The following test statistic is distributed approximately as standard normal and is the
basis of the test:
Large-Sample Test on the Difference in Population Proportions
Null Hypothesis:
Test Statistic:
Alternative Hypotheses: Rejection Criterion:
Type II Error and Choice of Sample Size
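A minimal Python sketch of the large-sample test on the difference of two proportions; the counts are hypothetical, and the pooled estimate of p is used in the standard error, as is standard for this test:
# Hedged sketch: large-sample test on the difference of two proportions
# (the counts below are hypothetical; the pooled estimate of p is used in the standard error)
from math import sqrt
from scipy.stats import norm

x1, n1 = 27, 300   # defectives and sample size for population 1
x2, n2 = 19, 300   # defectives and sample size for population 2

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)

z0 = (p1_hat - p2_hat) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
p_value = 2 * norm.sf(abs(z0))   # two-sided P-value

print(f"z0 = {z0:.3f}, P-value = {p_value:.4f}")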
Suppose that x is years on the job and y is salary. Then the y-intercept (at x = 0) is the salary
for a person with zero years' experience, the starting salary. The slope is the change in
salary per year of service. A person with a salary above the line would have a positive
residual, and a person with a salary below the line would have a negative residual.
If the line trends downward so that y decreases when x increases, then the slope is
negative. For example, if x is age and y is price for used cars, then the slope gives the
drop in price per year of age. In this example, the intercept is the price when new, and
the residuals represent the difference between the actual price and the predicted price.
All other things being equal, if the straight line is the correct model, a positive residual
means a car costs more than it should, and a negative residual means a car costs less
than it should (that is, it’s a bargain).
Example 10-1: Determining If There is a Relationship
Is there a relationship between the alcohol content and the number of calories in 12-ounce beer?
To determine if there is one, a random sample was taken of beer's alcohol content and calories
("Calories in beer," 2011), and the data are in table 10-1.
Solution: To aid in figuring out if there is a relationship, it helps to draw a scatter plot of
the data. It is helpful to state the random variables, and since in an algebra class the
variables are represented as x and y, those labels will be used here. It helps to state
which variable is x and which is y.
State the random variables:
x = alcohol content in the beer
y = calories in 12-ounce beer
This scatter plot looks fairly linear. However, notice that there is one beer in the list that
is actually considered a non-alcoholic beer. That value is probably an outlier since it is a
non-alcoholic beer. The rest of the analysis will not include O’Doul’s. You cannot simply
remove data points arbitrarily, but in this case it makes more sense to exclude it, since all
the other beers have a fairly large alcohol content.
A scatter plot is a graphical representation of the relation between two or more variables. In the
scatter plot of two variables x and y, each point on the plot is an x-y pair.
To find the equation for the linear relationship, the process of regression is used to find
the line that best fits the data (sometimes called the best fitting line). The process is to
draw the line through the data and then find the distances from a point to the line, which
are called the residuals. The regression line is the line that makes the square of the
residuals as small as possible, so the regression line is also sometimes called the least
squares line. The regression line and the residuals are displayed in figure 10-3.
where α is the “true” intercept, β is the “true” slope, and ε is an error term. When you fit the line,
you try to estimate α and β, but you can never know them exactly. The estimates of α and β
we’ll label a and b. The predicted values of y using these estimates we’ll label ŷ, so that ŷ = a + bx.
The residuals are the difference between the actual values and the estimated values.
The values of a and b are chosen so that the sum of the squared residuals, Σ(yi − ŷi)²,
is as small as possible. This procedure is called the least-squares method. The
values a and b that result in the smallest possible sum for the squared residuals can be
calculated from the following formulas:
b = SSxy / SSx  and  a = ȳ − b·x̄
where SSx = Σ(xi − x̄)² and SSxy = Σ(xi − x̄)(yi − ȳ).
Note: the easiest way to find the regression equation is to use the technology.
These are called the least-squares estimates.
LEAST SQUARES PRINCIPLE
The least squares principle is that the regression line is determined by minimizing the sum of
the squares of the vertical distances between the actual Y values and the predicted values of Y.
A line is fit through the XY points such that the sum of the squared residuals (that is, the sum of
the squared vertical distances between the observations and the line) is minimized.
For example, say our data set contains the values listed in Table 10-2:
The sample averages for x and y are 1.8 and 3.4, and the estimates are b = 0.5 and
a = 3.4 − (0.5)(1.8) = 2.5.
Thus the least-squares estimate of the regression equation is ŷ = 2.5 + 0.5x.
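A minimal Python sketch of the least-squares calculation using the SS formulas above (the x and y values are hypothetical and chosen only to illustrate the arithmetic):
# Hedged sketch: least-squares slope and intercept from the SS formulas
# (x and y are hypothetical values chosen only to illustrate the arithmetic)
import numpy as np

x = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
y = np.array([2.9, 3.2, 3.6, 3.7, 4.1])

x_bar, y_bar = x.mean(), y.mean()
ss_xy = np.sum((x - x_bar) * (y - y_bar))
ss_x = np.sum((x - x_bar) ** 2)

b = ss_xy / ss_x          # slope
a = y_bar - b * x_bar     # intercept

print(f"y-hat = {a:.3f} + {b:.3f} x")
# np.polyfit(x, y, 1) gives the same slope and intercept.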
Possible Uses of Linear Regression Analysis
Montgomery (1982) outlines the following four purposes for running a regression analysis.
Description
The analyst is seeking to find an equation that describes or summarizes the relationship
between two variables. This purpose makes the fewest assumptions.
Coefficient Estimation
This is a popular reason for doing regression analysis. The analyst may have a theoretical
relationship in mind, and the regression analysis will confirm this theory. Most likely, there is
specific interest in the magnitudes and signs of the coefficients. Frequently, this purpose for
regression overlaps with others.
Prediction
The prime concern here is to predict the response variable, such as sales, delivery time,
efficiency, occupancy rate in a hospital, reaction yield in some chemical process, or strength of
some metal. These predictions may be very crucial in planning, monitoring, or evaluating some
process or system. There are many assumptions and qualifications that must be made in this
case. For instance, you must not extrapolate beyond the range of the data. Also, interval
estimates require the normality assumptions to hold.
Control
Regression models may be used for monitoring and controlling a system. For example,
you might want to calibrate a measurement system or keep a response variable within certain
guidelines. When a regression model is used for control purposes, the independent variable
must be related to the dependent variable in a causal way. Furthermore, this functional
relationship must continue over time. If it does not, continual modification of the model must
occur.
Assumptions
The following assumptions must be considered when using linear regression analysis.
Linearity
Linear regression models the straight-line relationship between Y and X. Any curvilinear
relationship is ignored. This assumption is most easily evaluated by using a scatter plot. This
should be done early on in your analysis. Nonlinear patterns can also show up in a residual plot. A
lack-of-fit test is also provided.
Constant Variance
The variance of the residuals is assumed to be constant for all values of X. This
assumption can be detected by plotting the residuals versus the independent variable. If these
residual plots show a rectangular shape, we can assume constant variance. On the other hand,
if a residual plot shows an increasing or decreasing wedge or bowtie shape, nonconstant
variance (heteroscedasticity) exists and must be corrected.
The corrective action for nonconstant variance is to use weighted linear regression or to
transform either Y or X in such a way that variance is more nearly constant. The most
popular variance stabilizing transformation is to take the logarithm of Y.
Special Causes
It is assumed that all special causes, outliers due to one-time situations, have been
removed from the data. If not, they may cause nonconstant variance, nonnormality, or other
problems with the regression model. The existence of outliers is detected by considering scatter
plots of Y and X as well as the residuals versus X. Outliers show up as points that do not follow
the general pattern.
Normality
When hypothesis tests and confidence limits are to be used, the residuals are assumed to
follow the normal distribution.
Independence
The residuals are assumed to be uncorrelated with one another, which implies that
the Y’s are also uncorrelated. This assumption can be violated in two ways: model
misspecification or time-sequenced data. When serial correlation is present, three consequences follow:
1. The regression coefficients remain unbiased, but they are no longer efficient, i.e.,
minimum variance estimates.
2. With positive serial correlation, the mean square error may be seriously underestimated.
The impact of this is that the standard errors are underestimated, the t-tests are inflated
(show significance when there is none), and the confidence intervals are shorter than
they should be.
3. Any hypothesis tests or confidence limits that require the use of the t or F distributions are
invalid. You could try to identify these serial correlation patterns informally with the
residual plots versus time. A better analytical way would be to use the Durbin-Watson
test to assess the amount of serial correlation, as sketched below.
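A minimal Python sketch of the Durbin-Watson check on simulated data; statsmodels provides the statistic, and values near 2 suggest little first-order serial correlation:
# Hedged sketch: Durbin-Watson check for serial correlation in the residuals
# (simulated data; values of the statistic near 2 suggest little first-order serial correlation)
import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
x = np.arange(20, dtype=float)
y = 3.0 + 0.5 * x + rng.normal(scale=1.0, size=20)   # simulated response

b, a = np.polyfit(x, y, 1)          # fitted slope and intercept
residuals = y - (a + b * x)

print(f"Durbin-Watson statistic = {durbin_watson(residuals):.2f}")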
Example 10-2: Calculating the Regression Equation with the Formula
Is there a relationship between the alcohol content and the number of calories in 12-ounce
beer? To determine if there is one, a random sample was taken of beer's alcohol content and
calories ("Calories in beer," 2011). Find the regression equation from the formula.
A correlation exists between two variables when the values of one variable are somehow
associated with the values of the other variable. When you see a pattern in the data you say
there is a correlation in the data. Though this book is only dealing with linear patterns, patterns
can be exponential, logarithmic, or periodic. To see this pattern, you can draw a scatter plot of
the data. Remember to read graphs from left to right, the same as you read words. If the graph
goes up the correlation is positive and if the graph goes down the correlation is
negative. The words “weak”, “moderate”, and “strong” are used to describe the strength of the
relationship between the two variables.
The correlation is a parameter of the bivariate normal distribution. This distribution is used to
describe the association between two variables. This association does not include a cause and
effect statement. That is, the variables are not labeled as dependent and independent. One
does not depend on the other. Rather, they are considered as two random variables that seem
to vary together. The important point is that in linear regression, Y is assumed to be a
random variable and X is assumed to be a fixed variable. In correlation analysis, both Y
and X are assumed to be random variables.
The linear correlation coefficient is a number that describes the strength of the linear
relationship between the two variables. It is also called the Pearson correlation
coefficient after Karl Pearson who developed it. It is the most often used measure of
correlation. The symbol for the sample linear correlation coefficient is r. The symbol for the
population correlation coefficient is ρ (Greek letter rho).
THE CORRELATION COEFFICIENT, r
The correlation coefficient can be interpreted in several ways. Here are some of the
interpretations.
1. If both Y and X are standardized by subtracting their means and dividing by their standard
deviations, the correlation is the slope of the regression of the standardized Y on the
standardized X.
2. The correlation is the standardized covariance between Y and X.
3. The correlation is the geometric average of the slopes of the regressions of Y on X and of X on
Y.
4. The correlation is the square root of R-squared, using the sign from the slope of the regression
of Y on X.
The corresponding formulas for the calculation of the correlation coefficient, r are
where sXY is the covariance between X and Y, bXY is the slope from the regression of X on Y,
and bYX is the slope from the regression of Y on X. sXY is calculated using the formula
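A minimal Python sketch of interpretation 2, computing r as the standardized covariance on hypothetical data:
# Hedged sketch: r as the standardized covariance between X and Y (hypothetical data)
import numpy as np

x = np.array([2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([5.1, 6.0, 6.8, 8.1, 8.9])

s_xy = np.cov(x, y, ddof=1)[0, 1]                   # sample covariance
r = s_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))  # standardized covariance

print(f"r = {r:.4f}")   # matches np.corrcoef(x, y)[0, 1]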
Example 10-4: Calculating the correlation coefficient
Calculations:
Assumptions of linear correlation are the same as the assumptions for the regression
line:
a. The set (x, y) of ordered pairs is a random sample from the population of all such possible (x, y) pairs.
b. For each fixed value of x, the y-values have a normal distribution. All of the y
distributions have the same variance, and for a given x-value, the distribution of y-values has a
mean that lies on the least squares line. You also assume that for a fixed y, each x has its own
normal distribution. This is difficult to verify, so you can use the following to determine if you
have a normal distribution.
To make a decision, compare the calculated t-statistic with the critical t-statistic for the
appropriate degrees of freedom and level of significance.
Example 10-5: In the previous example,
r = 0.475
N = 10
Example 10-6: Suppose the correlation coefficient is 0.2 and the number of observations is 32.
What is the calculated test statistic? Is this a significant correlation, using a 5% level of
significance?
Solution
Hypotheses:
H0: ρ = 0
Ha: ρ ≠ 0
Calculated t-statistic:
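A minimal Python sketch of the calculation for Example 10-6, using the standard test statistic t = r√(n − 2)/√(1 − r²) (assumed here, since the formula itself is not reproduced above):
# Hedged sketch: t statistic for testing H0: rho = 0 in Example 10-6
from math import sqrt
from scipy.stats import t as t_dist

r, n, alpha = 0.2, 32, 0.05

t_stat = r * sqrt(n - 2) / sqrt(1 - r**2)       # about 1.12
t_crit = t_dist.ppf(1 - alpha / 2, df=n - 2)    # two-tailed critical value, about 2.04

print(f"t = {t_stat:.3f}, critical value = {t_crit:.3f}")
# |t| is below the critical value, so the correlation is not significant at the 5% level.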
An estimate of the variance of the regression coefficients is calculated using
An estimate of the variance of the predicted mean of Y at a specific value of X, say X0 , is given
by
An estimate of the variance of the predicted value of Y for an individual for a specific value of X,
say X0 , is given by
Example 10-4: Inference for Regression and Correlation
How do you really say you have a correlation? Can you test to see if there really is a
correlation? Of course, the answer is yes. The hypothesis test for correlation is as follows:
Hypothesis Test for Correlation:
1. State the random variables in words.
x = independent variable
y = dependent variable
2. State the null and alternative hypotheses and the level of significance
H0 : ρ = 0 (There is no correlation)
HA : ρ ≠ 0 (There is a correlation)
Or
HA : ρ < 0 (There is a negative correlation)
Or
HA : ρ > 0 (There is a positive correlation)
Also, state your α level here.
3. State and check the assumptions for the hypothesis test
The assumptions for the hypothesis test are the same assumptions for regression and
correlation.
4. Find the test statistic and p-value (a sketch is given below).
5. State the conclusion: reject H0 if the p-value is less than α; otherwise, fail to reject H0.
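A minimal Python sketch of steps 4 and 5 using scipy.stats.pearsonr on hypothetical data (pearsonr returns r and a two-sided p-value):
# Hedged sketch: hypothesis test for correlation with scipy.stats.pearsonr
# (hypothetical x and y values; pearsonr returns r and a two-sided p-value)
from scipy.stats import pearsonr

x = [4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5]   # hypothetical alcohol contents (%)
y = [150, 155, 163, 170, 179, 184, 195]   # hypothetical calories per 12 ounces

r, p_value = pearsonr(x, y)
alpha = 0.05

print(f"r = {r:.3f}, p-value = {p_value:.4f}")
if p_value < alpha:
    print("Reject H0: there is evidence of a linear correlation.")
else:
    print("Fail to reject H0: there is no evidence of a linear correlation.")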
Modified Residuals
Davison and Hinkley (1999) page 279 recommend the use of a special rescaling of the
residuals when bootstrapping to keep results unbiased. These modified residuals are calculated
using
F.2 Coefficient of Determination
There are times when the degree of linear association is of interest in its own right. Here we
discuss two descriptive measures of the degree of linear association between Y and X.
Partitioning of the Total Sum of Squares:
Total Sum of Squares: SST = Σ(Yi − Ȳ)²
- Error Sum of Squares: SSE = Σ(Yi − Ŷi)²
- Regression Sum of Squares: SSR = Σ(Ŷi − Ȳ)²
The coefficient of determination r² is defined as
r² = SSR/SST = 1 − (SSE/SST)
Since 0 ≤ SSE ≤ SST, it follows that
0 ≤ r² ≤ 1
Interpret r² as the proportionate reduction of total variation associated with the use of the
predictor variable X.
The limiting values of r² occur as follows:
1. When all observations fall on the fitted regression line, then SSE = 0 and r² = 1.
2. When the fitted regression line is horizontal, so that the slope is 0 and Ŷi ≡ Ȳ, then SSE = SST and r² = 0.
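A minimal Python sketch of the partitioning and of r² on hypothetical data:
# Hedged sketch: partitioning the total sum of squares and computing r^2 (hypothetical data)
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])

b, a = np.polyfit(x, y, 1)   # fitted slope and intercept
y_hat = a + b * x

sst = np.sum((y - y.mean()) ** 2)       # total sum of squares
sse = np.sum((y - y_hat) ** 2)          # error sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)   # regression sum of squares

r2 = ssr / sst                          # equivalently, 1 - sse / sst
print(f"SST = {sst:.3f}, SSE = {sse:.3f}, SSR = {ssr:.3f}, r^2 = {r2:.4f}")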
Sample Problem
Find the coefficient of determination for the variation in calories that is explained by the linear
relationship between alcohol content and calories, and interpret the value.
Solution:
From the calculator results,
r2 = 0.8344
Using R, you can do (cor(independent variable, dependent variable))^2. So that would
be (cor(alcohol, calories))^2, and the output would be
[1] 0.8343751
Or you can just use a calculator and square the correlation value. Thus, 83.44% of the
variation in calories is explained by the linear relationship between alcohol content and calories.
The other 16.56% of the variation is due to other factors. A really good coefficient of
determination has a very small unexplained part.
10.8 Correlation
G. Correlation
A correlation exists between two variables when the values of one variable are somehow
associated with the values of the other variable. When you see a pattern in the data you say
there is a correlation in the data. Though this book is only dealing with linear patterns, patterns
can be exponential, logarithmic, or periodic. To see this pattern, you can draw a scatter plot of
the data.
Remember to read graphs from left to right, the same as you read words. If the graph goes up
the correlation is positive and if the graph goes down the correlation is negative. The words
“weak”, “moderate”, and “strong” are used to describe the strength of the relationship between
the two variables.
Correlation does not imply causation. We may say that two variables X and Y are correlated, but
that does not mean that X causes Y or that Y causes X – they simply are related or associated
with one another.
The linear correlation coefficient is a number that describes the strength of the linear
relationship between the two variables. It is also called the Pearson correlation coefficient after
Karl Pearson who developed it. The symbol for the sample linear correlation coefficient is r. The
symbol for the population correlation coefficient is ρ (Greek letter rho).
Interpretation: the correlation coefficient r is always between -1 and 1. r = -1 means
there is a perfect negative linear correlation and r = 1 means there is a perfect positive linear
correlation. The closer r is to 1 or -1, the stronger the correlation.
Careful: r = 0 does not mean there is no correlation. It just means there is no linear
correlation. There might be a very strong curved pattern.
Sample Problem. Calculating the Linear Correlation Coefficient, r
How strong is the positive relationship between the alcohol content and the number of calories
in 12-ounce beer? To determine if there is a positive linear correlation, a random sample was
taken of beer’s alcohol content and calories for several different beers ("Calories in beer,"
2011), and the data are in table #10.2.1. Find the correlation coefficient and interpret that value.
Solution:
State random variables
x= alcohol content in the beer
y= calories in 12 ounce beer
Assumptions check:
From example problem, the assumptions have been met.
To compute the correlation coefficient using the TI-83/84 calculator, use the LinRegTTest in the
STAT menu. The setup is in figure 10.2.2. The reason that > 0 was chosen is because the
question asked if there was a positive correlation. If you are asked if there is a negative
correlation, then pick < 0. If you are just asked if there is a correlation, then pick ≠ 0. Right now
the choice will not make a difference, but it will be important later.
The correlation coefficient is r= 0.913. This is close to 1, so it looks like there is
a strong, positive correlation.
Sample Problem. Using the Formula to Calculate r and r²
How strong is the relationship between the alcohol content and the number of
calories in 12-ounce beer? To determine if there is a positive linear correlation, a random
sample was taken of beer's alcohol content and calories for several different beers ("Calories in
beer," 2011), and the data are in table #10.7.1. Find the correlation coefficient and the
coefficient of determination using the formula.
Solution:
From the computations,
SSx = 12.45, SSy = 10335.5556, SSxy = 327.6667
Correlation coefficient:
r = SSxy / √(SSx · SSy) = 327.6667 / √(12.45 × 10335.5556) ≈ 0.913
Coefficient of determination:
r² = (r)² = (0.913)² ≈ 0.834
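The same arithmetic can be checked with a short Python sketch using the SS values quoted above:
# Hedged sketch: r and r^2 from the sums of squares quoted above
from math import sqrt

ss_x, ss_y, ss_xy = 12.45, 10335.5556, 327.6667

r = ss_xy / sqrt(ss_x * ss_y)   # about 0.913
r2 = r ** 2                     # about 0.834

print(f"r = {r:.3f}, r^2 = {r2:.3f}")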
References:
https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/NCSS/Linear_Regression_and_Correlation.pdf
https://www.coconino.edu/resources/files/pdfs/academics/sabbatical-reports/kate-kozak/chapter_10.pdf
http://educ.jmu.edu/~drakepp/FIN360/readings/Regression_notes.pdf
http://www.engineeringbookspdf.com/data-analysis-with-microsoft-excel/
Hypothesis Tests on the Ratio of Two Variances
Null Hypothesis:
Test statistic:
Alternative Hypotheses: Rejection Criterion:
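A minimal Python sketch of the F test on the ratio of two variances; the sample variances and sizes are hypothetical, and the two-sided rejection region uses the upper and lower α/2 percentage points:
# Hedged sketch: F test on the ratio of two variances (hypothetical samples)
from scipy.stats import f

s1_sq, n1 = 2.38, 16   # sample variance and size for population 1
s2_sq, n2 = 1.05, 16   # sample variance and size for population 2
alpha = 0.05

f0 = s1_sq / s2_sq                               # test statistic
f_upper = f.ppf(1 - alpha / 2, n1 - 1, n2 - 1)   # upper alpha/2 point
f_lower = f.ppf(alpha / 2, n1 - 1, n2 - 1)       # lower alpha/2 point

print(f"f0 = {f0:.3f}; reject H0 if f0 < {f_lower:.3f} or f0 > {f_upper:.3f}")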