Chapter 3
STATISTICAL
INFERENCES (WEEK 8)
3.1 Introduction
3.2 Sampling distribution
Sampling distribution:
A sampling distribution is a probability distribution of a statistic
obtained through a large number of samples drawn from a specific
population.
Why do we sample?
1. A sample contains only a portion of the population, so studying it saves
time, costs less and uses fewer resources.
2. It is more practical than studying the entire population.
Relationship between population distribution & sampling distribution
of the sample mean
1. The mean of the sample means is exactly equal to the population mean.
2. The sampling distribution of the sample mean is less dispersed than the
population distribution.
3. The sampling distribution of the sample mean tends to become bell-shaped
and approximately normal as the sample size increases.
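The three properties can be checked with a small simulation. The sketch below is only an illustration: the skewed population (exponential with mean 10), the sample size n = 40 and the number of repeated samples are all assumed values, not taken from the notes.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical skewed population: 100,000 exponential values with mean 10.
population = [random.expovariate(1 / 10) for _ in range(100_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# Draw many samples of size n and record each sample mean.
n = 40
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(2_000)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"population mean      = {pop_mean:.2f}")
print(f"mean of sample means = {mean_of_means:.2f}")
print(f"population sd        = {pop_sd:.2f}")
print(f"sd of sample means   = {sd_of_means:.2f}")
print(f"sigma / sqrt(n)      = {pop_sd / n ** 0.5:.2f}")
```

With a finite number of simulated samples the mean of the sample means only approximates the population mean, and the spread of the sample means should come out close to sigma / sqrt(n), well below the population standard deviation.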
Definitions:
Estimation
The process of estimating the value of a population parameter from
sample information.
Point estimate
A single value estimate of a parameter.
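MARGIN OF ERROR, E
The margin of error is the maximum difference between the point estimate of a
parameter and the actual value of the parameter.
σ known: $\bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
σ unknown, n ≥ 30: $\bar{x} \pm z_{\alpha/2}\,\dfrac{s}{\sqrt{n}}$
σ unknown, n < 30: $\bar{x} \pm t_{\alpha/2,\,v}\,\dfrac{s}{\sqrt{n}}$, where $v = n - 1$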
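Confidence Interval Estimates for Population Mean
The $(1-\alpha)100\%$ Confidence Interval of Population Mean, $\mu$
(i) $\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$, that is, $\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
(ii) $\bar{x} \pm z_{\alpha/2}\dfrac{s}{\sqrt{n}}$, that is, $\bar{x} - z_{\alpha/2}\dfrac{s}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\dfrac{s}{\sqrt{n}}$
(iii) $\bar{x} \pm t_{\alpha/2,\,v}\dfrac{s}{\sqrt{n}}$, that is, $\bar{x} - t_{\alpha/2,\,v}\dfrac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2,\,v}\dfrac{s}{\sqrt{n}}$, where $v = n - 1$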
Determining Sample Size for population mean problems
Sampling error (margin of error, or error of estimation): $E = z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
Sample size for the mean: $n = \left(\dfrac{z_{\alpha/2}\,\sigma}{E}\right)^2$
Example 1
If a random sample of size n = 20 from a normal population with the
variance $\sigma^2 = 225$ has the mean $\bar{x} = 64.3$, construct a 95% confidence
interval for the population mean, $\mu$.
Solution:
It is known that n = 20, $\bar{x} = 64.3$ and $\sigma^2 = 225$, thus $\sigma = 15$.
For 95% CI,
95% = 100(1 – α)%
1 – α = 0.95
α = 0.05
α/2 = 0.025, so $z_{\alpha/2} = z_{0.025} = 1.96$
Hence, 95% CI = $\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
= $64.3 \pm 1.96\,\dfrac{15}{\sqrt{20}}$
= 64.3 ± 6.57
= [57.73, 70.87]
or 57.73 < μ < 70.87
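The arithmetic of Example 1 can be reproduced in a few lines. This is a minimal sketch; the function name `z_confidence_interval` is made up for illustration, and the critical value is taken from the standard library's `NormalDist` rather than from the printed z table.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence):
    """Return (lower, upper) for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


lower, upper = z_confidence_interval(x_bar=64.3, sigma=15, n=20, confidence=0.95)
print(f"95% CI = [{lower:.2f}, {upper:.2f}]")  # 95% CI = [57.73, 70.87]
```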
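Solution:
It is known that n = 36, $\bar{x}$ = RM 70.50 and $\sigma$ = RM 4.50.
For 90% CI,
90% = 100(1 – α)%
1 – α = 0.90
α = 0.1
α/2 = 0.05, so $z_{\alpha/2} = z_{0.05} = 1.65$
Hence, 90% CI = $\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
= $70.50 \pm 1.65\,\dfrac{4.50}{\sqrt{36}}$
= 70.50 ± 1.24
= [RM 69.26, RM 71.74]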
Thus, we are 90% confident that the mean price of all such college
$n = \dfrac{z_{\alpha/2}^2\,\sigma^2}{E^2} = \dfrac{(1.645)^2 (45)^2}{5^2} = 219.19$ (always round up)
• So the required sample size is n = 220.
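The "always round up" rule is easy to get wrong by hand, so a short sketch may help. It assumes z = 1.645, the value that reproduces the 219.19 shown above, together with σ = 45 and E = 5; the function name is hypothetical.

```python
from math import ceil


def sample_size_for_mean(z, sigma, margin):
    """Smallest whole n for which z * sigma / sqrt(n) does not exceed margin."""
    return ceil((z * sigma / margin) ** 2)


n_required = sample_size_for_mean(z=1.645, sigma=45, margin=5)
print(n_required)  # 220
```

Rounding down to 219 would leave the margin of error slightly above the target, which is why the result is always rounded up.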
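Confidence Interval Estimates for Population Proportion
$\hat{p} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n}}$, where $\hat{p} = \dfrac{x}{n}$ and $\hat{q} = 1 - \hat{p}$
or
$\hat{p} - z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n}}$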
Determining Sample Size for population proportion problems
Sampling error (margin of error, or error of estimation): $E = z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
Sample size for the proportion: $n = \dfrac{z_{\alpha/2}^2\,\hat{p}(1-\hat{p})}{E^2}$
Example 1:
According to the analysis of Women Magazine in June 2005, “Stress has become a common
part of everyday life among working women in Malaysia. The demands of work, family and
home place an increasing burden on average Malaysian women”. According to this poll,
40% of working women included in the survey indicated that they had a limited amount of
time to relax. The poll was based on a randomly selected sample of 1502 working women aged 30
and above. Construct a 95% confidence interval for the corresponding population
proportion.
Solution:
Let p be the proportion of all working women aged 30 and above who have a limited
amount of time to relax.
Hence, 95% CI = $\hat{p} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
= $0.40 \pm 1.96\sqrt{\dfrac{0.40(0.60)}{1502}}$
= 0.40 ± 0.02478
= [0.375, 0.425] or 37.5% to 42.5%
Thus, we can state with 95% confidence that the proportion of all
working women aged 30 and above who have a limited amount of
time to relax is between 37.5% and 42.5%.
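The interval for the poll can be verified numerically. The sketch below plugs in the table value z = 1.96 used in the solution; the function name is an illustrative choice.

```python
from math import sqrt


def proportion_confidence_interval(p_hat, n, z):
    """Return (lower, upper, margin) for a population proportion."""
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin


lower, upper, margin = proportion_confidence_interval(p_hat=0.40, n=1502, z=1.96)
print(f"E = {margin:.5f}, CI = [{lower:.3f}, {upper:.3f}]")
# E = 0.02478, CI = [0.375, 0.425]
```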
Example 2:
How large a sample would be necessary to estimate the true proportion defective in a large
population within ±5%, with 95% confidence?
(Assume a sample yields $\hat{p} = 0.12$)
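Solution:
The margin of error is $E = z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$; for instance, $2.575\sqrt{\dfrac{(0.35)(0.65)}{400}} = 0.061$.
$\alpha = 0.05$, $\alpha/2 = 0.025$, $z_{0.025} = 1.96$, $\hat{p} = 0.12$, $E = 0.05$
$n = \left(\dfrac{z_{0.025}}{E}\right)^2 \hat{p}(1-\hat{p}) = \left(\dfrac{1.96}{0.05}\right)^2 (0.12)(0.88) = 162.27$, so n = 163 after rounding up.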
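A short sketch of the sample-size calculation, using the values in the solution (z = 1.96, an assumed pilot proportion of 0.12 and E = 0.05). The function name is hypothetical. The second call shows how quickly the required n grows when the margin is tightened to 0.03.

```python
from math import ceil


def sample_size_for_proportion(z, p_hat, margin):
    """Smallest whole n that keeps the margin of error within `margin`."""
    return ceil(z ** 2 * p_hat * (1 - p_hat) / margin ** 2)


n_five_percent = sample_size_for_proportion(z=1.96, p_hat=0.12, margin=0.05)
n_three_percent = sample_size_for_proportion(z=1.96, p_hat=0.12, margin=0.03)
print(n_five_percent)   # 163
print(n_three_percent)  # 451
```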
HYPOTHESIS TESTING
Every day, in every aspect of life, problems occur and need to be solved. From these
problems, questions arise that researchers are interested in answering.
For example:
• Is the earth warming up?
• Does a new medication lower blood pressure?
• Does the public prefer a certain color in a new fashion line?
• Is a new teaching technique better than a traditional one?
• Do seat belts reduce the severity of injuries?
These types of questions can be addressed/solved through statistical hypothesis testing,
which is a decision-making process for evaluating claims about a population.
Definitions:
Hypothesis testing can be used to determine whether a statement about the value of a
population parameter (such as mean or proportion) should or should not be rejected.
Left-tailed test: The critical value (CV), $-z_{\alpha}$, separates the critical
region from the noncritical region.
Right-tailed test:
Two-tailed test:
• The level of significance is the maximum probability of
committing a type I error. This probability is symbolized by α
(alpha). That is,
P(type I error) = α
Likewise,
P(type II error) = β (beta).
• Typical significance levels are:
0.10, 0.05, and 0.01
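The meaning of α as the probability of a type I error can be illustrated by simulation. In the sketch below the null hypothesis is true by construction, so every rejection is a type I error. The values μ0 = 50, σ = 8, n = 30 and the number of trials are assumptions chosen only for the illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(1)  # fixed seed so the run is repeatable

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha)  # right-tailed critical value
mu_0, sigma, n = 50.0, 8.0, 30            # hypothetical values; H0 is true
trials = 5_000

rejections = 0
for _ in range(trials):
    sample = [random.gauss(mu_0, sigma) for _ in range(n)]
    z_test = (fmean(sample) - mu_0) / (sigma / sqrt(n))
    if z_test > z_crit:
        rejections += 1  # H0 is true, so this rejection is a type I error

type_1_rate = rejections / trials
print(f"estimated P(type I error) = {type_1_rate:.3f}")
```

The estimated rate should land near 0.05; it will not match exactly because the number of trials is finite.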
Hypotheses
There are two hypotheses:
1. Null hypothesis, $H_0$
2. Alternative hypothesis, $H_1$
Claim
When a researcher conducts a study, he or she is generally
looking for evidence to support a claim. Therefore, the
claim should be stated as the alternative hypothesis, or
research hypothesis.
Three methods used to test hypotheses:
1. The traditional method
2. The confidence interval method
3. The P-value method
Hypothesis and Test Procedures for mean
(Traditional Method)
A standard statistical test of hypothesis consists of the following steps:
1. State the null hypothesis, $H_0$, and the alternative
hypothesis, $H_1$, using the common phrases of hypotheses.
2. Find the critical value, $z_{\alpha}$ (refer to the z table in the Textbook).
3. Calculate the test statistic, $Z_{stat}$ (also written $Z_{calc}$).
4. Determine the rejection region.
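Step 2 Find the critical value(s) from the appropriate statistical table (Table
6 in Textbook).
Step 4 Make the decision to reject or not reject the null hypothesis.
Rejection Region
Two-tailed test: $Z_{test} < -z_{\alpha/2}$ or $Z_{test} > z_{\alpha/2}$; $t_{test} < -t_{\alpha/2,\,v}$ or $t_{test} > t_{\alpha/2,\,v}$
Left-tailed test: $Z_{test} < -z_{\alpha}$; $t_{test} < -t_{\alpha,\,v}$
Right-tailed test: $Z_{test} > z_{\alpha}$; $t_{test} > t_{\alpha,\,v}$
where $v = n - 1$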
Case: left-tailed test
Example: Since $Z_{test} = -1.7 < -z_{\alpha} = -1.5$, we decide to REJECT $H_0$.
Example: Since $Z_{test} = 1.2 > -z_{\alpha} = -1.5$, we FAIL TO REJECT (DO
NOT REJECT) $H_0$.
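The rejection-region rules can be written as one small decision function. This is a sketch only: the function `decide` and its `tail` argument are illustrative names, and `critical` is assumed to be the positive table value.

```python
def decide(test_stat, critical, tail):
    """Return the decision for a z or t test.

    `critical` is the positive table value (z_alpha, or z_alpha/2 for a
    two-tailed test); `tail` is "left", "right" or "two".
    """
    if tail == "left":
        reject = test_stat < -critical
    elif tail == "right":
        reject = test_stat > critical
    elif tail == "two":
        reject = abs(test_stat) > critical
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return "reject H0" if reject else "fail to reject H0"


print(decide(-1.7, 1.5, "left"))  # reject H0
print(decide(1.2, 1.5, "left"))   # fail to reject H0
```

The two calls repeat the left-tailed examples above; the same function covers the right-tailed and two-tailed cases.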
Step 5 Summarise the results.
$H_1$: The average monthly earnings for men in M&P positions are higher than
RM 2400 (women)
$H_1: \mu > 2400$
Solution:
1. The hypotheses to be tested are
$H_0: \mu = 2400$
$H_1: \mu > 2400$ (right-tailed test)
2. $\alpha = 0.01$, the standard deviation is known and $n \ge 30$, so we will use the normal distribution, Z.
Rejection region: $Z > z_{\alpha}$; $z_{\alpha} = z_{0.01} = 2.33$
3. Test statistic
$Z = \dfrac{\bar{x} - \mu}{s/\sqrt{n}} = \dfrac{3600 - 2400}{400/\sqrt{40}} = 18.97$
4. Since $Z_{test} = 18.97$ is greater than $z_{0.01} = 2.33$ (18.97 > 2.33), it falls in the
rejection region, so we reject $H_0$.
5. Thus, we conclude that average monthly earnings for men in managerial and professional
positions are significantly higher than those for women.
Example 2: Professors’ Salaries
A researcher reports that the average salary of assistant professors is more than $42,000. A
sample of 30 assistant professors has a mean salary of $43,260. At α = 0.05, test the claim
that assistant professors earn more than $42,000 per year. The standard deviation of the
population is $5230.
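One way to work this example with the traditional method is sketched below. The notes give only the problem statement, so the steps are an illustration of the procedure rather than the course's own solution, and the critical value comes from `NormalDist` instead of the printed table.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 42_000      # hypothesized mean under H0
x_bar = 43_260     # sample mean
sigma = 5_230      # population standard deviation
n = 30
alpha = 0.05

# H0: mu = 42,000 versus H1: mu > 42,000 (claim), a right-tailed test.
z_test = (x_bar - mu_0) / (sigma / sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha)

print(f"z_test = {z_test:.2f}, z_crit = {z_crit:.3f}")
print("reject H0" if z_test > z_crit else "fail to reject H0")
```

Here the test value (about 1.32) is below the critical value (about 1.645), so under these steps the sample does not give enough evidence to support the claim.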
Example: Sugar Production
Sugar is packed in 5-pound bags. An inspector suspects the bags may not contain 5 pounds.
A sample of 50 bags produces a mean of 4.6 pounds and a standard deviation of 0.7 pound.
Is there enough evidence to conclude that the bags do not contain 5 pounds as stated at α =
0.05? Also, find the 95% confidence interval of the true mean.
Step 3: Compute the test value.
$z = \dfrac{\bar{X} - \mu}{s/\sqrt{n}} = \dfrac{4.6 - 5.0}{0.7/\sqrt{50}} = -4.04$
Recall that $H_0: \mu = 5$ and $H_1: \mu \ne 5$ (claim).
Hence, 95% CI = $\bar{x} \pm z_{\alpha/2}\dfrac{s}{\sqrt{n}} = 4.6 \pm 1.96\,\dfrac{0.7}{\sqrt{50}} = 4.6 \pm 0.2$
= [4.4, 4.8]
Notice that the 95% confidence interval of μ does not contain the
hypothesized value μ = 5. Thus we reject the null hypothesis.
Hence, there is agreement between the hypothesis test and the confidence
interval.
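The agreement between the two-tailed test and the confidence interval can be checked side by side. A minimal sketch with the sugar example's numbers and the table value 1.96:

```python
from math import sqrt

x_bar, s, n, mu_0 = 4.6, 0.7, 50, 5.0
z_half_alpha = 1.96  # two-tailed critical value for alpha = 0.05

standard_error = s / sqrt(n)
z_test = (x_bar - mu_0) / standard_error
lower = x_bar - z_half_alpha * standard_error
upper = x_bar + z_half_alpha * standard_error

# Two ways of reaching the decision: test value versus interval.
reject_by_test = abs(z_test) > z_half_alpha
reject_by_interval = not (lower <= mu_0 <= upper)

print(f"z_test = {z_test:.2f}")                # z_test = -4.04
print(f"95% CI = [{lower:.1f}, {upper:.1f}]")  # 95% CI = [4.4, 4.8]
print(reject_by_test, reject_by_interval)      # True True
```

Both flags agree because the two conditions are algebraically the same statement: μ0 lies outside the interval exactly when the absolute test value exceeds the critical value.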
P-VALUE METHOD
Hypothesis Testing
The P-value (or probability value) is the probability of getting
a sample statistic (such as the mean) or more extreme sample
statistic in the direction of the alternative hypothesis when the
null hypothesis is true.
The P-value is the probability calculated using the test statistic. The
smaller the P-value, the more strongly the data contradict $H_0$.
Hypothesis Testing
• In this section, the traditional method for solving hypothesis-
testing problems compares z-values:
• critical value
• test value
• The P-value method for solving hypothesis-testing problems
compares areas:
• alpha
• P-value
Procedures:
Solving Hypothesis-Testing Problems
(P-Value Method)
Step 1 State the hypotheses and identify the claim.
Step 2 Compute the test value.
Step 3 Find the P-value.
Step 4 Make the decision.
Step 5 Summarize the results.
P-value
The P-value is the smallest significance level at which the null hypothesis would be rejected.
A researcher wishes to test the claim that the average cost of tuition
and fees at a four-year public college is greater than $5700. She
selects a random sample of 36 four-year public colleges and finds the
mean to be $5950. The population standard deviation is $659. Is there
evidence to support the claim at α = 0.05? Use the P-value method.
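The P-value steps for this example might be sketched as follows. The right-tail area is taken from the standard library's `NormalDist` rather than from the printed z table, so the last digit can differ slightly from a table-based answer.

```python
from math import sqrt
from statistics import NormalDist

mu_0, x_bar, sigma, n, alpha = 5700, 5950, 659, 36, 0.05

# Step 1: H0: mu = 5700 versus H1: mu > 5700 (claim), a right-tailed test.
# Step 2: test value.
z_test = (x_bar - mu_0) / (sigma / sqrt(n))

# Step 3: right-tailed P-value, the area to the right of z_test.
p_value = 1 - NormalDist().cdf(z_test)

# Step 4: reject H0 when the P-value is at most alpha.
print(f"z_test = {z_test:.2f}, P-value = {p_value:.4f}")
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

The P-value comes out at roughly 0.011, below α = 0.05, so H0 is rejected and the claim is supported.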
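Test Statistic: $Z_{test} = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}}$, where $q_0 = 1 - p_0$
Rejection Region:
$H_1: p \ne p_0$: $Z < -z_{\alpha/2}$ or $Z > z_{\alpha/2}$
$H_1: p > p_0$: $Z > z_{\alpha}$
$H_1: p < p_0$: $Z < -z_{\alpha}$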
Example 1:
When working properly, a machine that is used to make chips for calculators produces
4% defective chips. Whenever the machine produces more than 4% defective chips it
needs an adjustment. To check if the machine is working properly, the quality control
department at the company often takes a sample of chips and inspects them to
determine if the chips are good or defective. One such random sample of 200 chips
taken recently from the production line contained 14 defective chips. Test at the 5%
significance level whether the machine needs an adjustment.
Solution:
3. Test statistic is
$Z_{test} = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}} = \dfrac{0.07 - 0.04}{\sqrt{\dfrac{0.04(0.96)}{200}}} = 2.17$
4. Since $Z_{test} = 2.17 > z_{0.05} = 1.65$, it falls in the rejection region,
thus we can reject $H_0$.
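The test statistic and decision for the chip example can be reproduced with a short sketch. It assumes the right-tailed setup implied by "more than 4% defective" and uses the table value 1.65 quoted in step 4.

```python
from math import sqrt

x, n = 14, 200        # defective chips found, sample size
p_0 = 0.04            # proportion defective under H0
z_alpha = 1.65        # table value for a right-tailed test at alpha = 0.05

p_hat = x / n
z_test = (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)

print(f"p_hat = {p_hat:.2f}, z_test = {z_test:.2f}")  # p_hat = 0.07, z_test = 2.17
print("reject H0" if z_test > z_alpha else "fail to reject H0")  # reject H0
```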