FDSA UNIT 3
PART A
Addition Rule
Add together the separate probabilities of several mutually exclusive
events to find the probability that any one of these events will occur.
Independent Events
The occurrence of one event has no effect on the probability that the other
event will occur.
Multiplication Rule
Multiply together the separate probabilities of several independent
events to find the probability that these events will occur together.
8. Narrate the symbols used for the mean and standard deviation of three
types of Distributions.
11. Define Shape of the sampling distribution or state the central limit
theorem.
SHAPE OF THE SAMPLING DISTRIBUTION
Central Limit Theorem
The central limit theorem states that, regardless of the shape of the
population, the shape of the sampling distribution of the mean
approximates a normal curve if the sample size is sufficiently large.
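As a rough illustration of the theorem, the sketch below draws repeated samples from a deliberately skewed population and inspects the resulting sample means. The exponential population, the sample size of 50 and the number of samples are assumptions chosen for the demonstration, not values from these notes.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50     # assumed "sufficiently large" sample size
NUM_SAMPLES = 5000   # number of random samples drawn

# Skewed population: exponential with mean 1 and standard deviation 1.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_se = 1.0 / SAMPLE_SIZE ** 0.5  # sigma / sqrt(n)

# Share of sample means within one standard error of the population mean;
# for a normal curve this share is about 68 percent.
within_one_se = sum(
    abs(m - 1.0) <= expected_se for m in sample_means
) / NUM_SAMPLES

print(round(mean_of_means, 3), round(sd_of_means, 3), round(within_one_se, 3))
```

Although the population is far from normal, the sample means cluster around the population mean with a spread close to sigma divided by the square root of n, and roughly two thirds of them fall within one standard error, as a normal curve would predict.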
H1: μ ≠ 50.
14. Explain testing of Null Hypothesis. Define Common Outcome and Rare
Outcome.
Testing Null Hypothesis
The null hypothesis is tested by determining whether the one observed
sample mean qualifies as a common outcome or a rare outcome in the
hypothesized sampling distribution.
Common Outcomes
o An observed sample mean qualifies as a common outcome if the
difference between its value and that of the hypothesized population
mean is small enough to be viewed as a probable outcome under the
null hypothesis.
o Because there is no compelling reason for rejecting the null hypothesis,
it is retained.
Rare Outcomes
o An observed sample mean qualifies as a rare outcome if the difference
between its value and the hypothesized population mean is too large to
be reasonably viewed as a probable outcome under the null hypothesis.
19. What is the use of one-tailed and two-tailed tests in hypothesis
testing? When should each be used?
One and Two-Tailed Tests are ways to identify the relationship
between the statistical variables.
For checking the relationship between variables in a single direction
(Left or Right direction), use a one-tailed test.
A two-tailed test is used to check for a relationship between
variables in either direction.
PART B
Sample
Any subset of observations from a population.
The sample size is small relative to the population size.
Example 3.1
For each of the following pairs, indicate with a Yes or No
whether the relationship between the first and second
expressions could describe that between a sample and its
population, respectively.
(a) students in the last row; students in class
(b) citizens of Wyoming; citizens of New York
(c) 20 lab rats in an experiment; all lab rats, similar to those
used, that could undergo the same experiment
(d) all U.S. presidents; all registered Republicans
(e) two tosses of a coin; all possible tosses of a coin
Solution
(a) Yes
(b) No. Citizens of Wyoming aren’t a subset of citizens of New York.
(c) Yes
(d) No. All U.S. presidents aren’t a subset of all registered Republicans.
(e) Yes
Example 3.2
Identify all of the expressions from Example 3.1 that involve a
hypothetical population.
Solution
Expressions in 3.1(c) and 3.1(e) involve hypothetical populations.
Random Sampling
A selection process that guarantees all potential observations in the
population have an equal chance of being selected.
Inferential statistics requires that samples be random.
Example 3.3
Indicate whether each of the following statements is True or False.
A random selection of 10 playing cards from a deck of 52 cards implies that
(a) the random sample of 10 cards accurately represents the important
features of the whole deck.
(b) each card in the deck has an equal chance of being selected.
(c) it is impossible to get 10 cards from the same suit (for example, 10
hearts).
(d) any outcome, however unlikely, is possible.
Solution
a. False. Sometimes, just by chance, a random sample of 10 cards fails to
represent the important features of the whole deck.
b. True
c. False. Although unlikely, 10 hearts could appear in a random sample of 10
cards.
d. True
Example 3.4
Describe how you would use the table of random numbers to take
a. a random sample of five statistics students in a classroom where
each of nine rows consists of nine seats.
b. a random sample of size 40 from a large directory consisting of
3041 pages, with 480 lines per page.
Solution
a. There are many ways. For instance, consult the tables of random numbers,
using the first digit of each 5-digit random number to identify the row
(previously labelled 1, 2, 3, and so on), and the second digit of the same
random number to locate a particular student’s seat within that row.
Repeat this process until five students have been identified. (If the
classroom is larger, use additional digits so that every student can be
sampled.)
b. Once again, there are many ways. For instance, use the initial 4 digits
of each random number (between 0001 and 3041) to identify the page
number of the telephone directory and the next 3 digits (between 001
and 480) to identify the particular line on that page. Repeat this
process, using 7-digit numbers, until 40 telephone numbers have been
identified.
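The same two selections can be sketched with a pseudorandom number generator standing in for the printed table of random numbers. This is a substitution, not the table procedure itself: `random.sample` draws without repeats, and the seed value is arbitrary.

```python
import random

rng = random.Random(42)  # seeded only so the sketch is repeatable

# (a) Classroom: nine rows of nine seats; pick five distinct seats.
seats = [(row, seat) for row in range(1, 10) for seat in range(1, 10)]
students = rng.sample(seats, 5)

# (b) Directory: 3041 pages with 480 lines each; pick 40 distinct lines.
PAGES, LINES_PER_PAGE = 3041, 480
line_indexes = rng.sample(range(PAGES * LINES_PER_PAGE), 40)

# Convert each flat index into a (page, line) pair, both counted from 1.
directory_lines = [
    (index // LINES_PER_PAGE + 1, index % LINES_PER_PAGE + 1)
    for index in line_indexes
]

print(students)
print(directory_lines[:3])
```

Every seat, and every line of the directory, has the same chance of being chosen, which is the defining property of random sampling.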
Probability
The proportion or fraction of times that a particular event is likely to
occur.
Example 3.5
Assuming that people are equally likely to be born during any
one of the months, what is the probability of Jack being born
during
(a) June?
(b) any month other than June?
(c) either May or June?
Solution
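A minimal sketch of the addition rule applied to this example, assuming (as the example states) that the twelve months are equally likely and treating the months as mutually exclusive events. The variable names are illustrative.

```python
from fractions import Fraction

MONTHS = 12
p_month = Fraction(1, MONTHS)  # probability of being born in any one month

# (a) June is a single month.
p_june = p_month

# (b) Any month other than June: add the probabilities of the other 11 months.
p_not_june = sum(p_month for _ in range(MONTHS - 1))

# (c) Either May or June: add the two separate probabilities.
p_may_or_june = p_month + p_month

print(p_june, p_not_june, p_may_or_june)  # 1/12 11/12 1/6
```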
Independent Events
The occurrence of one event has no effect on the probability that the
other event will occur.
Multiplication Rule
Multiply together the separate probabilities of several independent
events to find the probability that these events will occur together.
Example 3.6
Assuming that people are equally likely to be born during any of the
months, and also assuming (possibly over the objections of astrology
fans) that the birthdays of married couples are independent, what’s
the probability of
(a) the husband being born during January and the wife being born
during February?
(b) both husband and wife being born during December?
(c) both husband and wife being born during the spring (April or May)?
(Hint: First, find the probability of just one person being born during
April or May.)
Solution
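A minimal sketch of the multiplication rule applied to this example, under the stated assumptions of equally likely months and independent birthdays. The variable names are illustrative.

```python
from fractions import Fraction

p_month = Fraction(1, 12)  # probability of being born in any one month

# (a) Husband in January and wife in February: multiply the two probabilities.
p_jan_and_feb = p_month * p_month

# (b) Both born in December.
p_both_december = p_month * p_month

# (c) Addition rule first (April or May for one person),
#     then multiplication rule for the two independent spouses.
p_spring_one_person = p_month + p_month
p_both_spring = p_spring_one_person * p_spring_one_person

print(p_jan_and_feb, p_both_december, p_both_spring)  # 1/144 1/144 1/36
```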
Dependent Events
When the occurrence of one event affects the probability of the other
event, these events are dependent.
Although the heights of randomly selected pairs of men are independent,
the heights of brothers are dependent.
Conditional Probability
The probability of one event, given the occurrence of another event.
Example –
Figure 3.1 – A frequency analysis of 100 drivers who caused fatal accidents
Figure 3.1 shows a frequency analysis for the 100 drivers involved in fatal
accidents.
Working from the top down, notice that among the 100 drivers, 40 are drunk
(from .40 × 100 = 40) and 20 take drugs (from .20 × 100 = 20). Also notice
that 12 of the 40 drunk drivers also take drugs (from .30 × 40 = 12). Now, it
is fairly straightforward to establish the probability of drivers both being
drunk and taking drugs. It is simply the number of drivers who are drunk
and take drugs, 12, divided by the total number of drivers, 100, that is,
12/100 = .12, which, of course, is the same as the previous answer.
Once a frequency analysis has been done, it often is easy to answer other
questions.
For example,
“What is the conditional probability of being drunk, given that the driver
takes illegal drugs?”
Referring to Figure 3.1, divide the number of drivers who are drunk and take
drugs, 12, by the number of drivers who take drugs, 20, that is, 12/20 = .60.
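The counts from the frequency analysis can be turned into probabilities directly. The sketch below uses only the four counts quoted above; the variable names are illustrative.

```python
from fractions import Fraction

total_drivers = 100
drunk = 40
take_drugs = 20
drunk_and_drugs = 12

# Joint probability: drunk and takes drugs, out of all drivers.
p_drunk_and_drugs = Fraction(drunk_and_drugs, total_drivers)

# Conditional probabilities: restrict the denominator to the given group.
p_drunk_given_drugs = Fraction(drunk_and_drugs, take_drugs)
p_drugs_given_drunk = Fraction(drunk_and_drugs, drunk)

print(float(p_drunk_and_drugs),
      float(p_drunk_given_drugs),
      float(p_drugs_given_drunk))  # 0.12 0.6 0.3
```

The two conditional probabilities differ (.60 versus .30) because each one divides the same 12 drivers by a different group.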
Example 3.7
Among 100 couples who had undergone marital counselling, 60
couples described their relationships as improved, and among this
latter group, 45 couples had children. The remaining couples
described their relationships as unimproved, and among this group, 5
couples had children. (Hint: Using a frequency analysis, begin with the
100 couples, first branch into the number of couples with improved
and unimproved relationships, then under each of these numbers,
branch into the number of couples with children and without children.
Enter a number at each point of the diagram before proceeding.)
a. What is the probability of randomly selecting a couple who described
their relationship as improved?
b. What is the probability of randomly selecting a couple with children?
c. What is the conditional probability of randomly selecting a couple
with children, given that their relationship was described as
improved?
d. What is the conditional probability of randomly selecting a couple
without children, given that their relationship was described as not
improved?
e. What is the conditional probability of an improved relationship, given
that a couple has children?
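One way to work through parts (a) to (e) is to enter the counts from the frequency analysis and divide by the appropriate group each time. The sketch below derives the missing counts from the figures given in the example; the variable names are illustrative.

```python
from fractions import Fraction

couples = 100
improved = 60
improved_with_children = 45
unimproved = couples - improved          # 40
unimproved_with_children = 5

# Counts filled in at each branch of the frequency analysis.
unimproved_without_children = unimproved - unimproved_with_children  # 35
with_children = improved_with_children + unimproved_with_children    # 50

p_improved = Fraction(improved, couples)                                 # (a)
p_children = Fraction(with_children, couples)                            # (b)
p_children_given_improved = Fraction(improved_with_children, improved)   # (c)
p_no_children_given_unimproved = Fraction(
    unimproved_without_children, unimproved)                             # (d)
p_improved_given_children = Fraction(
    improved_with_children, with_children)                               # (e)

for p in (p_improved, p_children, p_children_given_improved,
          p_no_children_given_unimproved, p_improved_given_children):
    print(p, float(p))
```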
Itemize all possible random samples, each of size two, that could be
taken from this population.
There are four possibilities on the first draw from the population and
also four possibilities on the second draw from the population, as
indicated in Table 3.1.
Table 3.1 - All possible samples of size two from a miniature population
FIGURE 3.3
Emergence of the sampling distribution of the mean from all possible
samples.
Example 3.8
Without peeking, list the special symbols for the
(a) mean of the population
(b) mean of the sampling distribution of the mean
(c) mean of the sample
(d) standard error of the mean
(e) standard deviation of the sample
(f) standard deviation of the population.
Example 3.9
Imagine a very simple population consisting of only five observations:
2, 4, 6, 8, 10.
(a) List all possible samples of size two.
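A sketch of how the samples could be itemized, assuming sampling with replacement and treating order as significant, in line with the earlier description of equal possibilities on the first and second draws. The summary statistics after the listing go beyond part (a) and are included only to show how the sampling distribution of the mean emerges.

```python
from collections import Counter
from itertools import product
from statistics import fmean, pstdev

population = [2, 4, 6, 8, 10]

# All ordered samples of size two, drawn with replacement: 5 x 5 = 25.
samples = list(product(population, repeat=2))
sample_means = [fmean(sample) for sample in samples]

# Sampling distribution of the mean: how often each sample mean occurs.
distribution = Counter(sample_means)
for mean_value in sorted(distribution):
    print(mean_value, distribution[mean_value], "/", len(samples))

print("mean of sample means:", fmean(sample_means))
print("standard error:", pstdev(sample_means))
```

The mean of all sample means equals the population mean (6), and the standard deviation of the sample means equals the population standard deviation divided by the square root of the sample size.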
Example 3.10
Indicate whether the following statements are True or False.
Example 3.11
Indicate whether the following statements are True or False. The
central limit theorem
a. states that, with sufficiently large sample sizes, the shape of the
population is normal.
b. states that, regardless of sample size, the shape of the sampling
distribution of the mean is normal.
c. ensures that the shape of the sampling distribution of the mean
equals the shape of the population.
d. applies to the shape of the sampling distribution—not to the shape
of the population and not to the shape of the sample.
Defining Hypotheses
Null hypothesis (H0):
In statistics, the null hypothesis is a general statement or default
position that there is no relationship between two measured cases or
no relationship among groups. In other words, it is a basic
assumption made on the basis of knowledge of the problem.
Example:
A company’s mean production is 50 units per day, i.e.
H0: μ = 50.
Alternative hypothesis (H1):
The alternative hypothesis is the hypothesis used in hypothesis
testing that is contrary to the null hypothesis.
Example:
A company’s mean production is not equal to 50 units per day, i.e.
H1: μ ≠ 50.
Figure 3.6 - One possible set of common and rare outcomes (values of X).
Figure 3.6 shows one possible set of boundaries for common and rare
outcomes, expressed in values of X.
If the one observed sample mean is located between 478 and 522, it will
qualify as a common outcome, and the null hypothesis will be retained.
If, however, the one observed sample mean is greater than 522 or less than
478, it will qualify as a rare outcome, and the null hypothesis will be
rejected.
Example 3.12
Calculate the value of the z test for each of the following situations:
Critical z Score
A z score that separates common from rare outcomes and hence
dictates whether H0 should be retained or rejected.
Example 3.13
Indicate what’s wrong with each of the following statistical
hypotheses:
Example 3.14
First using words, then symbols, identify the null hypothesis for
each of the following situations.
a. A school administrator wishes to determine whether sixth-grade
boys in her school district differ, on average, from the national
norms of 10.2 pushups for sixth-grade boys.
b. A consumer group investigates whether, on average, the true
weights of packages of ground beef sold by a large supermarket
chain differ from the specified 16 ounces.
c. A marriage counselor wishes to determine whether, during a
standard conflict-resolution session, his clients differ, on average,
from the 11 verbal interruptions reported for “well-adjusted
couples.”
Example 3.15
For each of the following situations, indicate whether H0 should be
retained or rejected and justify your answer by specifying the precise
relationship between observed and critical z scores. Should H0 be
retained or rejected, given a hypothesis test with critical z scores of ±
1.96 and
Example 3.16
According to the American Psychological Association, members
with a doctorate and a full-time teaching appointment earn, on
average, $82,500 per year, with a standard deviation of $6,000. An
investigator wishes to determine whether $82,500 is also the mean
salary for all female members with a doctorate and a full-time
teaching appointment. Salaries are obtained for a random sample of
100 women from this population, and the mean salary equals
$80,100.
(a) Someone claims that the observed difference between $80,100
and $82,500 is large enough by itself to support the conclusion
that female members earn less than male members. Explain why
it is important to conduct a hypothesis test.
(b) The investigator wishes to conduct a hypothesis test for what
population?
(c) What is the null hypothesis, H0?
(d) What is the alternative hypothesis, H1?
(e) Specify the decision rule, using the .05 level of significance.
(f) Calculate the value of z. (Remember to convert the standard
deviation to a standard error.)
(g) What is your decision about H0?
(h) Using words, interpret this decision in terms of the original
problem.
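Parts (e) to (g) can be sketched in a few lines. The function name `z_test` is illustrative, and the two-tailed critical value of 1.96 is an assumption based on the question being framed as whether $82,500 "is also the mean salary" for this population.

```python
from math import sqrt

def z_test(sample_mean, hypothesized_mean, population_sd, sample_size):
    """Return the z ratio for a single sample mean."""
    # Convert the standard deviation to a standard error first.
    standard_error = population_sd / sqrt(sample_size)
    return (sample_mean - hypothesized_mean) / standard_error

z = z_test(sample_mean=80_100, hypothesized_mean=82_500,
           population_sd=6_000, sample_size=100)

CRITICAL_Z = 1.96  # two-tailed test at the .05 level of significance
decision = "reject H0" if abs(z) >= CRITICAL_Z else "retain H0"

print(z, decision)  # -4.0 reject H0
```

The standard error is 6000/10 = 600, so the observed difference of $2,400 is four standard errors below the hypothesized mean, well inside the rejection region.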
One and Two-Tailed Tests are ways to identify the relationship between the
statistical variables.
For checking the relationship between variables in a single direction (Left or
Right direction), use a one-tailed test.
A two-tailed test is used to check for a relationship between variables in
either direction.
Figure 3.8a illustrates a rejection region that is associated with only the
lower tail of the hypothesized sampling distribution.
The corresponding decision rule, with its critical z of –1.65, is referred to
as a one-tailed or directional test with the lower tail critical.
Figure 3.8b illustrates a one-tailed or directional test with the upper tail
critical. This one-tailed test is the mirror image of the previous test.
The corresponding decision rule, with its critical z of 1.65, is referred to
as a one-tailed or directional test with the upper tail critical.
Figure 3.10 shows rejection regions that are associated with both tails of
the hypothesized sampling distribution.
The corresponding decision rule, with its pair of critical z scores of ±1.96,
is referred to as a two-tailed or nondirectional test.
One-Tailed Test
o The critical region lies entirely on either the right side or the left
side of the sampling distribution.
o The rejection region is on either the left side or the right side of the
sampling distribution.
o It checks the relation between the variables in a single direction.
o It is used to check whether one mean differs from another mean in a
specified direction.
Two-Tailed Test
o The critical region is given by the portion of the area lying in both
tails of the probability curve of the test statistic.
o The rejection region is on both sides, i.e. left and right, of the
sampling distribution.
o It checks the relation between the variables in any direction.
o It is used to check whether two means differ from one another or not.
Example 3.17
For each of the following situations, indicate whether H0 should be
retained or rejected.
Given a one-tailed test, lower tail critical with α = .01, and
(a) z = – 2.34 (b) z = – 5.13 (c) z = 4.04
Given a one-tailed test, upper tail critical with α = .05, and
(d) z = 2.00 (e) z = – 1.80 (f) z = 1.61
Solution
a. Reject H0 at the .01 level of significance because z = –2.34 is more negative
than –2.33.
b. Reject H0 at the .01 level of significance because z = –5.13 is more negative
than –2.33.
c. Retain H0 at the .01 level of significance because z = 4.04 is less negative
than –2.33. (The value of the observed z is in the direction of no concern.)
d. Reject H0 at the .05 level of significance because z = 2.00 is more positive
than 1.65.
e. Retain H0 at the .05 level of significance because z = –1.80 is less positive
than 1.65. (The value of the observed z is in the direction of no concern.)
f. Retain H0 at the .05 level of significance because z = 1.61 is less positive
than 1.65.
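The six decisions above follow one rule per tail, which can be written as a small function. The function name and the `tail` labels are illustrative; the critical values 2.33 and 1.65 are the ones used in the solution.

```python
def one_tailed_decision(z, critical_z, tail):
    """Return 'reject' or 'retain' for H0 in a one-tailed z test.

    tail is 'lower' (reject for sufficiently negative z) or
    'upper' (reject for sufficiently positive z).
    """
    if tail == "lower":
        return "reject" if z <= -abs(critical_z) else "retain"
    if tail == "upper":
        return "reject" if z >= abs(critical_z) else "retain"
    raise ValueError("tail must be 'lower' or 'upper'")

cases = [
    (-2.34, 2.33, "lower"),  # (a)
    (-5.13, 2.33, "lower"),  # (b)
    (4.04, 2.33, "lower"),   # (c) large, but in the direction of no concern
    (2.00, 1.65, "upper"),   # (d)
    (-1.80, 1.65, "upper"),  # (e) in the direction of no concern
    (1.61, 1.65, "upper"),   # (f)
]
decisions = [one_tailed_decision(z, crit, tail) for z, crit, tail in cases]
print(decisions)
# ['reject', 'reject', 'retain', 'reject', 'retain', 'retain']
```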
Example 3.18
Specify the decision rule for each of the following situations (referring
to Table to find critical z values):
(a) a two-tailed test with α = .05
(b) a one-tailed test, upper tail critical, with α = .01
(c) a one-tailed test, lower tail critical, with α = .05
(d) a two-tailed test with α = .01
Solution
a. Reject H0 at the .05 level of significance if z equals or is more positive than
1.96 or if z equals or is more negative than –1.96.
b. Reject H0 at the .01 level of significance if z equals or is more positive than
2.33.
c. Reject H0 at the .05 level of significance if z equals or is more negative than
–1.65.
d. Reject H0 at the .01 level of significance if z equals or is more positive than
2.58 or if z equals or is more negative than –2.58.
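The critical values in these decision rules come from a table, but they can also be approximated from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `critical_z` is illustrative. Table values are rounded, and the one-tailed .05 value of about 1.645 is conventionally reported as 1.65.

```python
from statistics import NormalDist

def critical_z(alpha, tails):
    """Return the positive critical z for a given alpha and number of tails."""
    # A two-tailed test splits alpha equally between the two tails.
    tail_area = alpha / tails
    return NormalDist().inv_cdf(1 - tail_area)

for label, alpha, tails in [("a", 0.05, 2), ("b", 0.01, 1),
                            ("c", 0.05, 1), ("d", 0.01, 2)]:
    print(label, round(critical_z(alpha, tails), 3))
```

For a lower-tail-critical test, as in (c), the decision rule uses the negative of the returned value.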
Example 3.19
A random sample of 200 graduates of U.S. colleges reveals a mean
annual income of $62,600. What is the best estimate of the
unknown mean annual income for all graduates of U.S. colleges?
Solution
$62,600
Example 3.20
Reading achievement scores are obtained for a group of fourth
graders. A score of 4.0 indicates a level of achievement
appropriate for fourth grade, a score below 4.0 indicates
underachievement, and a score above 4.0 indicates
overachievement. Assume that the population standard
deviation equals 0.4. A random sample of 64 fourth graders
reveals a mean achievement score of 3.82.
a. Construct a 95 percent confidence interval for the unknown
population mean. (Remember to convert the standard deviation
to a standard error.)
Solution
(b) We can claim, with 95 percent confidence, that the interval between
3.72 and 3.92 includes the true population mean reading score
for the fourth graders. All of these values suggest that, on
average, the fourth graders are underachieving.
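The limits 3.72 and 3.92 can be checked with a short calculation. The function name is illustrative, and 1.96 is the usual z value for 95 percent confidence.

```python
from math import sqrt

def z_confidence_interval(sample_mean, population_sd, sample_size, z_conf=1.96):
    """Return (lower, upper) limits of a confidence interval for the mean."""
    # Convert the standard deviation to a standard error.
    standard_error = population_sd / sqrt(sample_size)
    margin = z_conf * standard_error
    return sample_mean - margin, sample_mean + margin

lower, upper = z_confidence_interval(sample_mean=3.82,
                                     population_sd=0.4,
                                     sample_size=64)
print(round(lower, 2), round(upper, 2))  # 3.72 3.92
```

The standard error is 0.4/8 = 0.05, so the margin is 1.96 × 0.05 = 0.098 on either side of the sample mean. Because the whole interval lies below 4.0, it supports the underachievement reading.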
Example 3.21
Before taking the GRE, a random sample of college seniors received
special training on how to take the test. After analysing their scores
on the GRE, the investigator reported a dramatic gain, relative to
the national average of 500, as indicated by a 95 percent confidence
interval of 507 to 527. Are the following interpretations true or
false?
a. About 95 percent of all subjects scored between 507 and 527.
b. The interval from 507 to 527 refers to possible values of the
population mean for all students who undergo special training.
c. The true population mean definitely is between 507 and 527.
d. This particular interval describes the population mean about 95
percent of the time.
e. In practice, we never really know whether the interval from 507
to 527 is true or false.
f. We can be reasonably confident that the population mean is
between 507 and 527.
Solution
a. False. We can be 95 percent confident that the mean for all subjects
will be between 507 and 527.
b. True
c. False. We can be reasonably confident—but not absolutely confident—
that the true population mean lies between 507 and 527.
d. False. This particular interval either describes the one true population
mean or fails to describe the one true population mean.
e. True
f. True
LEVEL OF CONFIDENCE
The level of confidence indicates the percent of time that a series of
confidence intervals includes the unknown population characteristic,
such as the population mean.
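The "series of confidence intervals" idea can be illustrated by simulation. In the sketch below the population mean, standard deviation, sample size, number of trials and seed are all hypothetical values chosen for the demonstration.

```python
import random
from math import sqrt

rng = random.Random(1)  # seeded only so the sketch is repeatable

TRUE_MEAN, SIGMA, N = 100.0, 15.0, 25  # hypothetical normal population
Z_CONF = 1.96                          # 95 percent level of confidence
TRIALS = 2000                          # number of intervals constructed

margin = Z_CONF * SIGMA / sqrt(N)
hits = 0
for _ in range(TRIALS):
    sample_mean = sum(rng.gauss(TRUE_MEAN, SIGMA) for _ in range(N)) / N
    # Count the interval as a hit if it includes the true population mean.
    if sample_mean - margin <= TRUE_MEAN <= sample_mean + margin:
        hits += 1

coverage = hits / TRIALS
print(coverage)
```

Any single interval either includes the true mean or it does not; the level of confidence describes the long-run share of such intervals that do, which here should come out close to .95.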
Any level of confidence may be assigned to a confidence interval
merely by substituting an appropriate value for zconf in Formula
Example 3.22
On the basis of a random sample of 120 adults, a pollster
reports, with 95 percent confidence, that between 58 and 72
percent of all Americans believe in life after death.
a. If this interval is too wide, what, if anything, can be done
with the existing data to obtain a narrower confidence
interval?
b. What can be done to obtain a narrower 95 percent
confidence interval if another similar investigation is being
planned?
Solution
a. Switch to an interval having a lesser degree of confidence, such
as 90 percent or 75 percent.
b. Increase the sample size.
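Both answers follow from how the margin of error is built. The sketch below uses the z interval for a mean with a hypothetical standard deviation of 1, rather than the pollster's percentages, purely to show the two effects; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(confidence, sd, sample_size):
    """Half-width of a z confidence interval for a mean."""
    # z value leaving (1 - confidence) / 2 in each tail.
    z_conf = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z_conf * sd / sqrt(sample_size)

SD = 1.0  # hypothetical standard deviation

base = margin_of_error(0.95, SD, 120)
lower_confidence = margin_of_error(0.90, SD, 120)  # answer (a)
larger_sample = margin_of_error(0.95, SD, 480)     # answer (b)

print(round(base, 4), round(lower_confidence, 4), round(larger_sample, 4))
```

Dropping to 90 percent confidence narrows the interval with the existing data, while quadrupling the sample size halves the margin at the same 95 percent level.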
Example 3.23
In a recent scientific sample of about 900 adult Americans, 70
percent favour stricter gun control of assault weapons, with a
margin of error of ±4 percent for a 95 percent confidence interval.
Therefore, the 95 percent confidence interval equals 66 to 74
percent. Indicate whether the following interpretations are true or
false:
a. The interval from 66 to 74 percent refers to possible values of
the sample percent.
b. The true population percent is between 66 and 74 percent.
c. In the long run, a series of intervals similar to this one would
fail to include the population percent about 5 percent of the
time.
d. We can be reasonably confident that the population percent is
between 66 and 74 percent.
Example 3.23
For the population at large, the Wechsler Adult Intelligence Scale is
designed to yield a normal distribution of test scores with a mean of
100 and a standard deviation of 15. School district officials wonder
whether, on the average, an IQ score different from 100 describes the
intellectual aptitudes of all students in their district. Wechsler IQ
scores are obtained for a random sample of 25 of their students, and
the mean IQ is found to equal 105. Using the step-by-step procedure,
test the null hypothesis at the .05 level of significance.
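A worked sketch of the calculation, assuming a two-tailed test (the officials ask about a score "different from 100") with critical z scores of ±1.96 at the .05 level:

```latex
\begin{aligned}
H_0 &: \mu = 100, \qquad H_1 : \mu \neq 100 \\
\sigma_{\bar{X}} &= \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{25}} = 3 \\
z &= \frac{\bar{X} - \mu_{\text{hyp}}}{\sigma_{\bar{X}}}
   = \frac{105 - 100}{3} \approx 1.67
\end{aligned}
```

Because 1.67 is less positive than 1.96, H0 is retained at the .05 level of significance: there is no evidence that the mean IQ for all students in the district differs from 100.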
Example 3.24
Consult the power curves in Figure 11.7 to estimate the approximate
detection rates, rounded to the nearest tenth, for the following
situations:
(a) a three-point effect, with a sample size of 29
(b) a six-point effect, with a sample size of 13
(c) a twelve-point effect, with a sample size of 13
Solution
(a) .3
(b) .4
(c) .9
Example 3.25
An investigator consults a chart to determine the sample size required
to detect an eight-point effect with a probability of .80. What happens
to this detection rate of .80—will it actually be smaller, the same, or
larger—if, unknown to the investigator, the true effect actually equals
(a) twelve points?
(b) five points?
Solution
a. The power for the 12-point effect is larger than .80 because the true
sampling distribution is shifted further into the rejection region for the
false H0.
b. The power for the 5-point effect is smaller than .80 because the true
sampling distribution is shifted further into the retention region for the
false H0.
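The direction of these changes can be checked numerically. The example does not give a standard deviation or a level of significance, so the sketch below assumes a population standard deviation of 15 and a one-tailed test, upper tail critical, at the .05 level; both are hypothetical, as are the names used.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
SIGMA = 15.0          # hypothetical population standard deviation
ALPHA = 0.05          # hypothetical level of significance (one-tailed)
PLANNED_EFFECT = 8.0  # effect the investigator planned for
TARGET_POWER = 0.80

z_alpha = STD_NORMAL.inv_cdf(1 - ALPHA)
z_power = STD_NORMAL.inv_cdf(TARGET_POWER)

# Smallest sample size giving at least the target power for the planned effect.
n = ceil(((z_alpha + z_power) * SIGMA / PLANNED_EFFECT) ** 2)

def power(true_effect, sample_size=n):
    """Probability of rejecting a false H0 for a given true effect."""
    standard_error = SIGMA / sqrt(sample_size)
    return 1 - STD_NORMAL.cdf(z_alpha - true_effect / standard_error)

for effect in (5.0, 8.0, 12.0):
    print(effect, round(power(effect), 2))
```

With the sample size fixed by the eight-point plan, a true twelve-point effect is detected more often than .80 and a true five-point effect less often.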
Example 3.26
In Example 3.16, it was concluded that the mean salary
among the population of female members of the American
Psychological Association is less than that ($82,500) for all
comparable members who have a doctorate and teach full time.
(a) Given a population standard deviation of $6,000 and a sample
mean salary of $80,100 for a random sample of 100 female members,
construct a 99 percent confidence interval for the mean salary for all
female members.
(b) Given this confidence interval, is there any consistent evidence that
the mean salary for all female members falls below $82,500, the mean
salary for all members?
Solution
(b) We can claim, with 99 percent confidence, that the interval between $78,552
and $81,648 includes the true population mean salary for all female members
of the American Psychological Association. All of these values suggest that,
on average, females’ salaries are less than males’ salaries.
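The limits quoted in (b) can be reconstructed from the data in part (a), taking 2.58 as the z value for 99 percent confidence (the two-tailed critical value for α = .01):

```latex
\begin{aligned}
\sigma_{\bar{X}} &= \frac{\sigma}{\sqrt{n}} = \frac{6000}{\sqrt{100}} = 600 \\
\bar{X} \pm z_{\text{conf}}\,\sigma_{\bar{X}}
  &= 80100 \pm (2.58)(600) = 80100 \pm 1548 \\
  &= 78552 \text{ to } 81648
\end{aligned}
```

The upper limit, $81,648, is below $82,500, which is why every value in the interval points to a lower mean salary.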
Example 3.27
Imagine that one of the following 95 percent confidence intervals
estimates the effect of vitamin C on IQ scores:
(a) Which one most strongly supports the conclusion that vitamin C
increases IQ scores?
(b) Which one implies the largest sample size?
(c) Which one most strongly supports the conclusion that vitamin C
decreases IQ scores?
(d) Which one would most likely stimulate the investigator to
conduct an additional experiment using larger sample sizes?
Solution
(a) 3 (b) 1 (c) 5 (d) 4