STAT-3247-Hypothesis-testing-concerning-two-samples
Onneth O. Tejada
STAT 3247 – Advanced Statistics
DEPARTMENT of
STATISTICS
• Two samples are independent if the sample values selected from one
population are not related to, paired with, or matched with the
sample values selected from the other population. If each value in one
sample is paired with a corresponding value in the other sample, the
samples are dependent.
Examples:
▪ One group of subjects is treated with the cholesterol-reducing drug Lipitor, while a
second and separate group of subjects is given a placebo. These two sample groups
are independent because the individuals in the treatment group are in no way
paired or matched with corresponding members in the placebo group.
▪ The effectiveness of a diet is tested using weights of subjects measured before and
after the diet treatment. These samples are dependent: each “before” value is matched
with the “after” value because each before/after pair of measurements comes from the same person.
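Test statistic (pooled-variance t-test for two independent samples):

$$t_c = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

where the pooled variance is

$$S_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}$$

Decision rule, by alternative hypothesis:
Ha: $\mu_1 \neq \mu_2$   Reject Ho if $|t_c| \geq t_{\alpha/2,\, n_1+n_2-2}$
Ha: $\mu_1 > \mu_2$   Reject Ho if $t_c > t_{\alpha,\, n_1+n_2-2}$
Ha: $\mu_1 < \mu_2$   Reject Ho if $t_c < -t_{\alpha,\, n_1+n_2-2}$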
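The pooled statistic and the three rejection rules can be sketched in R as follows. This is a minimal sketch and not part of the original slides: the helper name `pooled_t` and the summary values passed to it are hypothetical, chosen only to illustrate the arithmetic, and the null hypothesis is taken to be μ1 − μ2 = 0.

```r
# Hypothetical helper: pooled-variance t statistic from summary statistics.
pooled_t <- function(xbar1, xbar2, s1, s2, n1, n2, alpha = 0.05,
                     alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  df  <- n1 + n2 - 2
  # Pooled variance S_p^2
  sp2 <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df
  # Test statistic under Ho: mu1 - mu2 = 0
  tc  <- (xbar1 - xbar2) / sqrt(sp2 * (1 / n1 + 1 / n2))
  # Rejection rule for each form of the alternative hypothesis
  reject <- switch(alternative,
    two.sided = abs(tc) >= qt(1 - alpha / 2, df),
    greater   = tc >  qt(1 - alpha, df),
    less      = tc < -qt(1 - alpha, df))
  list(t = tc, df = df, sp2 = sp2, reject = reject)
}

# Illustrative (made-up) summary values
res <- pooled_t(xbar1 = 10, xbar2 = 8, s1 = 2, s2 = 2, n1 = 5, n2 = 5)
print(res)
```

The critical values come from `qt()`, so no t table lookup is needed.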
Example 1: The mean number of weeks that 15 New York Times hardcover
fiction books spent on the bestseller list is 22 weeks, with a standard
deviation of 6.17 weeks. The mean number of weeks that 15 New York
Times hardcover nonfiction books spent on the list is 28 weeks, with a
standard deviation of 13.2 weeks. At 𝛼 = 0.10, can we conclude that there is
a difference in the mean number of weeks the books were on
the bestseller lists?
Solution:
Step 1: State the null (Ho) and alternative (Ha) hypotheses. Identify the claim.
Ho : 𝜇1 = 𝜇2 and Ha : 𝜇1 ≠ 𝜇2 (claim)
Step 2: Determine the test statistic, critical value and tail of the distribution
where the rejection region is located.
The test to use is the t-test since the population standard deviations 𝜎 are unknown.
Since the alternative hypothesis is 𝜇1 ≠ 𝜇2, the test is two-tailed.
[Figure: two-tailed t distribution with rejection regions below −1.701 and above 1.701, and the acceptance region between them.]
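As a rough check of Example 1, the test statistic and critical value can be computed directly from the summary statistics. The sketch below assumes the pooled-variance formula applies (equal population variances) and labels fiction as sample 1 and nonfiction as sample 2; both choices are assumptions, not statements from the slides.

```r
# Summary statistics from Example 1 (fiction = sample 1, nonfiction = sample 2)
n1 <- 15; xbar1 <- 22; s1 <- 6.17
n2 <- 15; xbar2 <- 28; s2 <- 13.2
alpha <- 0.10

df   <- n1 + n2 - 2
sp2  <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df          # pooled variance
tc   <- (xbar1 - xbar2) / sqrt(sp2 * (1 / n1 + 1 / n2))   # test statistic
crit <- qt(1 - alpha / 2, df)                             # two-tailed critical value

cat("t =", round(tc, 3), " critical value = +/-", round(crit, 3), "\n")
cat("Reject Ho:", abs(tc) >= crit, "\n")
```

With these inputs the statistic is about −1.59, which lies between −1.701 and 1.701, so under the stated assumptions this sketch does not reject Ho.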
A teacher wants to know which of two sections has the higher score. The
teacher will use these scores to make recommendations to the principal.
Random samples of students are asked about their scores. Test whether the
population mean scores are different for the two sections. Use α = 0.05.
Section A  81 77 75 74 86 90 62 73 91 98
Section B  89 64 35 68 69 55 37 57 42 49
section_A <- c(81, 77, 75, 74, 86, 90, 62, 73, 91, 98)
section_B <- c(89, 64, 35, 68, 69, 55, 37, 57, 42, 49)
t.test(section_A, section_B, alternative="two.sided", paired = FALSE,
mu=0, conf.level = 0.95)
Ho: The population mean scores are not different for the two sections
Ha: The population mean scores are different for the two sections
Since the p-value (0.001) is less than the 0.05 level of significance, reject Ho. We can therefore conclude that the
population mean scores are different for the two sections.
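One point worth checking: by default `t.test()` runs Welch's test (`var.equal = FALSE`), which does not pool the variances. The sketch below, offered as an illustration and not as the slides' own analysis, runs both versions on the section scores so the degrees of freedom and p-values can be compared. Setting `var.equal = TRUE` assumes equal population variances.

```r
section_A <- c(81, 77, 75, 74, 86, 90, 62, 73, 91, 98)
section_B <- c(89, 64, 35, 68, 69, 55, 37, 57, 42, 49)

# Welch test: the t.test() default (var.equal = FALSE)
welch <- t.test(section_A, section_B, alternative = "two.sided")

# Pooled-variance test: assumes equal population variances
pooled <- t.test(section_A, section_B, alternative = "two.sided",
                 var.equal = TRUE)

print(c(welch_df = unname(welch$parameter),
        pooled_df = unname(pooled$parameter)))
print(c(welch_p = welch$p.value, pooled_p = pooled$p.value))
```

Because both samples have 10 observations, the two t statistics coincide and only the degrees of freedom differ; both versions reject Ho at the 0.05 level.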
Section A 15 18 16 17 13 22 24 17 19 21 26 28
Section B 14 9 16 19 10 12 11 8 15 18 25
Solution:
Time   17    18    18    19    19    21  22  24  25  26  28
Group  A     B     A     A     B     A   A   A   B   A   A
Rank   12.5  14.5  14.5  16.5  16.5  18  19  20  21  22  23
b.) Sum the ranks of the group with the smaller sample size. (Note: If both
groups have the same sample size, either one can be used.) In this case,
Section B has the smaller sample size (11 versus 12).
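The ranking and rank-sum steps can be reproduced in R. This is a small sketch that assumes the Section A and Section B values listed earlier in this section are the data for this solution; `rank()` assigns tied values the average of their ranks by default.

```r
section_A <- c(15, 18, 16, 17, 13, 22, 24, 17, 19, 21, 26, 28)
section_B <- c(14, 9, 16, 19, 10, 12, 11, 8, 15, 18, 25)

# Rank the combined sample; tied values receive the average of their ranks
combined <- c(section_A, section_B)
group    <- rep(c("A", "B"), times = c(length(section_A), length(section_B)))
ranks    <- rank(combined)

# Sum of ranks for the smaller sample (Section B, n = 11)
R_B <- sum(ranks[group == "B"])
print(R_B)
```

As a consistency check, the two rank sums together must equal N(N + 1)/2 for N = 23 combined observations, which is 276.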
• If the grouping variable has more than two levels, you must specify
which two you want to compare:
> wilcox.test(values ~ groups, data = dataset,
    subset = groups %in% c("Group1", "Group2"))
• If your data is in unstacked form (with the values for each sample held in
separate variables), use the command:
> wilcox.test(dataset$sample1, dataset$sample2)
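The two calling styles can be compared on a small example. The data frame below is entirely hypothetical (made-up values and group labels) and serves only to show that the stacked call with `subset` and the unstacked call give the same test.

```r
# Hypothetical stacked data with three groups
dataset <- data.frame(
  values = c(1.1, 2.3, 3.8, 4.2,  5.5, 6.1, 7.4, 2.9,  9.0, 8.2, 7.7, 6.6),
  groups = rep(c("Group1", "Group2", "Group3"), each = 4)
)

# Stacked form: restrict the comparison to two levels with subset
stacked <- wilcox.test(values ~ groups, data = dataset,
                       subset = groups %in% c("Group1", "Group2"))

# Unstacked form: pass the two samples as separate vectors
g1 <- dataset$values[dataset$groups == "Group1"]
g2 <- dataset$values[dataset$groups == "Group2"]
unstacked <- wilcox.test(g1, g2)

print(c(stacked = stacked$p.value, unstacked = unstacked$p.value))
```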
method_A <- c(85, 90, 78, 92, 88, 76, 84, 91)
method_B <- c(70, 75, 80, 78, 76, 74, 73, 79, 72, 77)
wilcox.test(method_A, method_B, alternative = "two.sided", paired = FALSE,
mu = 0, conf.level = 0.95)
Since the p-value (0.004) is less than the 0.05 level of significance, reject Ho. We can therefore conclude that there
is a significant difference in scores between the two groups.
Where

$$\bar{d} = \frac{\sum d_i}{n}, \qquad s_d = \sqrt{\frac{\sum d_i^2 - \frac{\left(\sum d_i\right)^2}{n}}{n - 1}}$$
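The mean difference and the shortcut formula for the standard deviation of the differences can be checked numerically. The differences below are hypothetical values used only for illustration; the point of the sketch is that the shortcut formula agrees with R's built-in `sd()`.

```r
# Hypothetical paired differences (illustrative values only)
d <- c(2, -1, 3, 0, 1)
n <- length(d)

d_bar <- sum(d) / n                                  # mean difference
s_d   <- sqrt((sum(d^2) - sum(d)^2 / n) / (n - 1))   # shortcut formula

# The shortcut formula agrees with the built-in sample standard deviation
print(c(d_bar = d_bar, s_d = s_d, sd_builtin = sd(d)))
```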
Example 1: As an aid for improving students’ study habits, nine students were
randomly selected to attend a seminar on the importance of education in life.
The table shows the number of hours each student studied per week before
and after the seminar. At 𝛼 = 0.05, did attending the seminar increase the
number of hours the students studied per week?
Student       1   2   3   4   5   6   7   8   9
Before (𝑋1)   9  12   6  15   3  18  10  13   7
After (𝑋2)    9  17   9  20   2  21  15  22   6
Solution:
Step 1: State the null (Ho) and alternative (Ha) hypotheses. Identify the claim.
Ho : 𝜇1 = 𝜇2 and Ha : 𝜇1 < 𝜇2 (claim)
Step 2: Determine the test statistic, critical value and tail of the distribution
where the rejection region is located.
The test to use is the t-test since the population standard deviations 𝜎 are unknown.
Since the alternative hypothesis is 𝜇1 < 𝜇2, the test is left-tailed.
[Figure: left-tailed t distribution with the rejection region below −1.860 and the acceptance region above it.]
[Figure: left-tailed t distribution with the rejection region below −1.860; the test value −2.802 lies in the rejection region.]
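The seminar example can be worked in R from the before/after table. This sketch uses the standard paired statistic t = d̄ / (s_d / √n) with d = X1 − X2, and cross-checks it against `t.test()` with `paired = TRUE`. The variable names are illustrative.

```r
before <- c(9, 12, 6, 15, 3, 18, 10, 13, 7)   # X1
after  <- c(9, 17, 9, 20, 2, 21, 15, 22, 6)   # X2

d     <- before - after
n     <- length(d)
d_bar <- mean(d)
s_d   <- sd(d)
tc    <- d_bar / (s_d / sqrt(n))   # test statistic under Ho: mean difference = 0
crit  <- qt(0.05, df = n - 1)      # left-tailed critical value at alpha = 0.05

# Cross-check against the built-in paired test
fit <- t.test(before, after, paired = TRUE, alternative = "less")
print(c(t_manual = tc, t_builtin = unname(fit$statistic), critical = crit))
```

The unrounded statistic is −2.8. The −2.802 shown in the figure presumably comes from rounding d̄ and s_d before dividing. Either value falls below the critical value, so Ho is rejected.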
Student 1 2 3 4 5 6 7 8
Before 77 74 82 73 87 68 66 80
After 72 68 76 68 84 68 61 76
Before <- c(77, 74, 82, 73, 87, 68, 66, 80)
After <- c(72, 68, 76, 68, 84, 68, 61, 76)
t.test(Before,After, alternative="less",
mu=0, paired = TRUE, conf.level = 0.95)
Solution:
b.) Find the sum of the positive ranks and the sum of the negative ranks
separately.
Positive rank sum: (+3.5) + (+5) + (+6) + (+3.5) + (+7) = +25
Negative rank sum: (−1.5) + (−1.5) = −3
c.) Select the smaller of the absolute values of the sums, |−3| = 3, and use this
absolute value as the test value. In this case the test value is 3.
Step 4: Make a decision.
Reject the null hypothesis if the test value is less than or equal to the critical
value. In this case, 3 > 2; hence, the decision is to not reject the null
hypothesis.
Step 5: Summarize the results.
There is not enough evidence to support the claim that there is a difference
in the severity of the pain before and after the pain medication.
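Steps b, c and 4 above can be sketched in R starting from the signed ranks listed in step b. The critical value of 2 is taken from Step 4 as given and is not looked up here; the variable names are illustrative.

```r
# Signed ranks as listed in step b
signed_ranks <- c(3.5, 5, 6, 3.5, 7, -1.5, -1.5)

pos_sum <- sum(signed_ranks[signed_ranks > 0])   # sum of positive ranks
neg_sum <- sum(signed_ranks[signed_ranks < 0])   # sum of negative ranks

# Test value: the smaller of the two absolute rank sums
test_value <- min(abs(pos_sum), abs(neg_sum))

critical_value <- 2                      # critical value used in Step 4
reject <- test_value <= critical_value   # reject Ho only if test value <= critical value

print(c(pos_sum = pos_sum, neg_sum = neg_sum, test_value = test_value))
print(reject)
```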
> before <- c(78, 85, 90, 88, 76, 89, 93, 82)
> after <- c(84, 88, 92, 91, 80, 90, 94, 85)
> wilcox.test(before, after, paired = TRUE)
Ho: There is no significant difference in student performance before and after the new teaching method.
Ha: There is a significant difference in student performance before and after the new teaching method.
Since the p-value (0.01368) is less than the 0.05 level of significance, reject Ho. We can therefore conclude that there is a
significant difference in student performance before and after the new teaching method.
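To see where the statistic reported by `wilcox.test()` comes from, the signed ranks can be built by hand from the same before/after scores. This is an illustrative sketch: the statistic V that R reports for a paired test is the sum of the positive ranks of `before - after`.

```r
before <- c(78, 85, 90, 88, 76, 89, 93, 82)
after  <- c(84, 88, 92, 91, 80, 90, 94, 85)

d <- before - after
d <- d[d != 0]            # zero differences are dropped before ranking
r <- rank(abs(d))         # tied absolute differences receive average ranks

V_pos <- sum(r[d > 0])    # sum of positive ranks
V_neg <- sum(r[d < 0])    # sum of negative ranks

# Ties may trigger a warning about exact p-values; it is suppressed here
fit <- suppressWarnings(wilcox.test(before, after, paired = TRUE))
print(c(V_pos = V_pos, V_neg = V_neg, V_reported = unname(fit$statistic)))
```

Every student scored higher after the new method, so all differences are negative and the positive rank sum is 0, the smallest value it can take.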