Hypothesis Tests in R
Don Symon
2024-12-09
• Z-test
• T-test
• ANOVA
• Chi-square test
• Correlation test
• The probability of making a Type I error is called the significance level (α).
• The probability of making a Type II error is denoted β; the power of the test, 1 − β, is the probability of correctly rejecting a false null hypothesis.
1. Z-test in R
A Z-test is a statistical test for means or proportions when the population variance is known and the sample size is large (typically n > 30).
When to Use a Z-Test
• When the population standard deviation (σ) is known.
• For large sample sizes (n > 30).
• To compare:
– A sample mean to a population mean.
– Two sample means or proportions.
The Z statistic for a one-sample test of the mean is calculated as:

z = (x̄ − μ₀) / (σ / √n)

where x̄ is the sample mean, μ₀ is the hypothesized population mean, σ is the known population standard deviation, and n is the sample size.
The z.test() function from the BSDA package can be used to perform one-sample and two-sample z-tests
in R.
The function syntax is:
z.test(x, y, alternative = "two.sided", mu = 0, sigma.x = NULL, sigma.y = NULL, conf.level = 0.95)
where,
• x: Values for the first sample.
• y: Values for the second sample (two-sample case only).
• mu: Mean under the null or mean difference (in the two-sample case).
• sigma.x, sigma.y: The known population standard deviations.
• conf.level: The confidence level for the reported interval.
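The one-sample output below comes from an example asking whether a medication affects IQ: we test H0: μ = 100 using 20 IQ scores with a known σ of 15 (n and σ inferred from the output). The original data vector was not shown, so this sketch uses a hypothetical one and will not reproduce the printed numbers exactly:

# Load BSDA for z.test()
if (!require("BSDA")) install.packages("BSDA")
library(BSDA)

# Hypothetical IQ scores; the original 20 values were not shown
data <- c(88, 92, 94, 96, 98, 100, 102, 104, 105, 106,
          107, 108, 109, 110, 111, 112, 113, 114, 115, 116)

# One-sample z-test of H0: mu = 100 with known sigma = 15
z.test(data, mu = 100, sigma.x = 15)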
##
## One-sample z-Test
##
## data: data
## z = 0.90933, p-value = 0.3632
## alternative hypothesis: true mean is not equal to 100
## 95 percent confidence interval:
## 96.47608 109.62392
## sample estimates:
## mean of x
## 103.05
Results:
- Z-Statistic: 0.90933
- p-value: 0.3632
Since the p-value (0.3632 > 0.05), we fail to reject the null hypothesis.
Conclusion: The medication does not significantly affect IQ levels.
Example: The IQ levels of 20 randomly selected individuals from each of two cities are recorded below. Is there a difference in mean IQ between the two cities?

City A: 82, 84, 85, 89, 91, 91, 92, 94, 99, 99, 105, 109, 109, 109, 110, 112, 112, 113, 114, 114
City B: 90, 91, 91, 91, 95, 95, 99, 99, 108, 109, 109, 114, 115, 116, 117, 117, 128, 129, 130, 133
# Enter IQ levels for 20 individuals from each city
cityA <- c(82, 84, 85, 89, 91, 91, 92, 94, 99, 99,
105, 109, 109, 109, 110, 112, 112, 113, 114, 114)
cityB <- c(90, 91, 91, 91, 95, 95, 99, 99, 108, 109,
109, 114, 115, 116, 117, 117, 128, 129, 130, 133)
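A two-sample call along the following lines produces the output shown (σ = 15 for both cities is an assumption consistent with the reported confidence interval):

# Two-sample z-test of H0: no difference in mean IQ between the cities
z.test(cityA, cityB, mu = 0, sigma.x = 15, sigma.y = 15)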
##
## Two-sample z-Test
##
## data: cityA and cityB
## z = -1.7182, p-value = 0.08577
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -17.446925 1.146925
## sample estimates:
## mean of x mean of y
## 100.65 108.80
Results:
- Z-Statistic: -1.7182
- p-value: 0.08577
Since the p-value (0.08577 > 0.05), we fail to reject the null hypothesis.
Conclusion: The mean IQ levels are not significantly different between the two cities.
Z-Test Exercises
1. A researcher claims that the average weight of a population is 70 kg. A sample of 25 individuals has a mean weight of 72 kg with a known population standard deviation of 8 kg. Test this claim at α = 0.05 (see the sketch after this list).
2. Suppose a company claims there is no difference in the average salaries of employees in two different branches. A sample of 30 employees from Branch A has a mean salary of $50,000, while a sample of 30 employees from Branch B has a mean salary of $52,000. The known population standard deviation is $5,000 for both branches. Test the company’s claim at α = 0.05.
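As a starting point for exercise 1, the z statistic can be computed directly from the summary statistics; nothing is assumed here beyond the numbers given in the exercise:

# Exercise 1: one-sample z-test from summary statistics
xbar  <- 72   # sample mean
mu0   <- 70   # hypothesized population mean
sigma <- 8    # known population standard deviation
n     <- 25   # sample size

z <- (xbar - mu0) / (sigma / sqrt(n))  # z = 1.25
p_value <- 2 * pnorm(-abs(z))          # two-sided p-value (about 0.21)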
A one-proportion z-test compares an observed sample proportion to a hypothesized population proportion. The test statistic is:

z = (p̂ − p₀) / √( p₀(1 − p₀) / n )

Where:
- p̂ is the observed sample proportion.
- p₀ is the hypothesized population proportion.
- n is the sample size.
One Proportion Z-Test in R
To perform a one proportion z-test in R, we can use the following functions:
• If n ≤ 30: binom.test(x, n, p = 0.5, alternative = "two.sided")
• If n > 30: prop.test(x, n, p = 0.5, alternative = "two.sided", correct = TRUE)
Where:
- x: the number of successes.
- n: the sample size.
- p: the hypothesized population proportion.
- alternative: the alternative hypothesis (can be “two.sided”, “less”, or “greater”).
- correct: whether or not to apply Yates’ continuity correction.
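The output below corresponds to a survey example: 64 of 100 sampled residents support a law, and we test whether the true proportion differs from 0.60. Since n > 30, we use prop.test():

# One-proportion test: 64 successes in n = 100 trials, H0: p = 0.60
prop.test(x = 64, n = 100, p = 0.60, alternative = "two.sided", correct = TRUE)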
##
## 1-sample proportions test with continuity correction
##
## data: 64 out of 100, null probability 0.6
## X-squared = 0.51042, df = 1, p-value = 0.475
## alternative hypothesis: true p is not equal to 0.6
## 95 percent confidence interval:
## 0.5372745 0.7318279
## sample estimates:
## p
## 0.64
Interpretation:
- p-value: 0.475
- 95% confidence interval: [0.5373, 0.7318]
- observed proportion (p̂): 0.64
Since the p-value (0.475) is greater than the significance level α = 0.05, we fail to reject the null
hypothesis.
Conclusion: We do not have sufficient evidence to say that the proportion of residents who support the law
is different from 0.60.
Additionally, the 95% confidence interval for the true proportion of residents who support the law is [0.5373,
0.7318]. Since this confidence interval contains 0.60, it confirms that we do not have evidence to say that the
true proportion is different from 0.60.
2. T-test in R
The t-test is a statistical test used to compare the means of two groups or a single group against a known
value. It is commonly used when the sample size is small and the population standard deviation is unknown.
The t-test follows the assumptions of normality in the data.
There are three main types of t-tests:
1. One-Sample T-Test: Compares the sample mean to a known value (e.g., population mean).
2. Two-Sample T-Test: Compares the means of two independent groups.
3. Paired Sample T-Test: Compares means from two related groups (e.g., pre-treatment and post-
treatment).
We can use the t.test() function in R to perform each type of test:
The syntax is below:
t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95)
where;
• x, y: The two samples of data (y = NULL for a one-sample test).
• mu: The hypothesized mean (one-sample) or mean difference.
• paired: Whether the samples are paired.
• var.equal: Whether to assume the variances are equal between the samples.
• conf.level: The confidence level for the interval.
Example: A researcher claims that the mean weight of a certain turtle population is 310 grams. The weights (in grams) of a random sample of 13 turtles are:

Weights: 300, 315, 320, 311, 314, 309, 300, 308, 305, 303, 305, 301, 303

Test the researcher’s claim at α = 0.05.
#define vector of turtle weights
turtle_weights <- c(300, 315, 320, 311, 314, 309, 300, 308, 305, 303, 305, 301, 303)
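We then test the researcher’s claim (H0: μ = 310):

#perform one-sample t-test against the hypothesized mean of 310 grams
t.test(turtle_weights, mu = 310)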
##
## One Sample t-test
##
## data: turtle_weights
## t = -1.5848, df = 12, p-value = 0.139
## alternative hypothesis: true mean is not equal to 310
## 95 percent confidence interval:
## 303.4236 311.0379
## sample estimates:
## mean of x
## 307.2308
Interpretation:
• t-value: -1.5848 (test statistic).
• Degrees of freedom: 12
• p-value: 0.139.
Decision: Since the p-value (0.139 > 0.05), we fail to reject the null hypothesis.
Conclusion: This means that there is no significant difference between the mean weight of the turtles and the hypothesized mean weight of 310 grams.
Example: Suppose we want to know whether the mean weights of two independent samples of turtles are equal. The recorded weights (in grams) are:

Sample 1: 300, 315, 320, 311, 314, 309, 300, 308, 305, 303, 305, 301, 303
Sample 2: 335, 329, 322, 321, 324, 319, 304, 308, 305, 311, 307, 300, 305

We can visually inspect boxplots of both samples to see whether any difference exists (a plotting sketch follows the package load below):
# Load ggplot2 package
if (!require("ggplot2")) install.packages("ggplot2")
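A minimal sketch of the plotting code, assuming a long-format data frame (the two weight vectors are entered again for the t-test further down):

# Combine both samples into long format for plotting
df <- data.frame(
  Weight = c(300, 315, 320, 311, 314, 309, 300, 308, 305, 303, 305, 301, 303,
             335, 329, 322, 321, 324, 319, 304, 308, 305, 311, 307, 300, 305),
  Group  = rep(c("Sample 1", "Sample 2"), each = 13)
)

# Side-by-side boxplots of the two samples
ggplot(df, aes(x = Group, y = Weight, fill = Group)) +
  geom_boxplot() +
  labs(x = "Group", y = "Weight")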
[Boxplot: Weight by Group, Sample 1 vs Sample 2]
From the visualization in the plot, we can clearly see a difference between the two samples. But is the difference statistically significant?
Using t.test() function
#define vector of turtle weights for each sample
sample1 <- c(300, 315, 320, 311, 314, 309, 300, 308, 305, 303, 305, 301, 303)
sample2 <- c(335, 329, 322, 321, 324, 319, 304, 308, 305, 311, 307, 300, 305)
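We then compare the two sample means:

#perform Welch two-sample t-test (unequal variances assumed by default)
t.test(sample1, sample2)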
##
## Welch Two Sample t-test
##
## data: sample1 and sample2
## t = -2.1009, df = 19.112, p-value = 0.04914
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -14.73862953 -0.03060124
## sample estimates:
## mean of x mean of y
## 307.2308 314.6154
Interpretation:
• t-test statistic: -2.1009
• degrees of freedom: 19.112
• p-value: 0.04914
• 95% confidence interval for true mean difference: [-14.7386, -0.0306]
Decision: Since the p-value (0.04914 < 0.05), we reject the null hypothesis.
Conclusion: This means we have sufficient evidence to say that the mean weights of the two samples are significantly different.
Example: We want to know whether the mean heights of males and females differ, given the following samples (in cm):

Male heights: 175, 178, 180, 185, 170, 172, 180, 177, 174, 173
Female heights: 160, 165, 163, 162, 167, 168, 170, 169, 164, 166
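A sketch of the data entry and the call implied by the labels in the output below:

# Enter the height samples (in cm)
MaleHeights   <- c(175, 178, 180, 185, 170, 172, 180, 177, 174, 173)
FemaleHeights <- c(160, 165, 163, 162, 167, 168, 170, 169, 164, 166)

# Welch two-sample t-test of mean height
t.test(MaleHeights, FemaleHeights)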
##
## Welch Two Sample t-test
##
## data: MaleHeights and FemaleHeights
## t = 1.073, df = 9.0008, p-value = 0.3112
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -185.0585 519.0585
## sample estimates:
## mean of x mean of y
## 332.4 165.4
Interpretation:
• t-test statistic: 1.073
• degrees of freedom: 9.0008
• p-value: 0.3112
• 95% confidence interval for true mean difference: [-185.0585 , 519.0585]
• mean of sample 1 heights: 332.4
• mean of sample 2 heights: 165.4
Decision: Since the p-value (0.3112 > 0.05), we fail to reject the null hypothesis.
Conclusion: This means we do not have sufficient evidence to say that the mean height of the two groups is significantly different.
We first load the data and convert the gender variable to a categorical variable.
#loading the data to the R environment
bank <- read.csv('Bank.csv')

#convert gender to a categorical (factor) variable
bank$Gender <- as.factor(bank$Gender)
head(bank)
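The salary distributions can be compared visually with ggplot2 (the column names Gender and Salary are taken from the t-test output further below):

# Boxplot of salary by gender
ggplot(bank, aes(x = Gender, y = Salary, fill = Gender)) +
  geom_boxplot() +
  labs(x = "Gender", y = "Salary")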
[Boxplot: Salary by Gender (Female, Male)]
The plot shows a difference in the salaries of female and male individuals in the bank data. Is this difference statistically significant?
#t-test of salary across Male and Female groups
t.test(Salary ~ Gender, data = bank)
##
## Welch Two Sample t-test
##
## data: Salary by Gender
## t = -4.141, df = 78.898, p-value = 8.604e-05
## alternative hypothesis: true difference in means between group Female and group Male is not equal to
## 95 percent confidence interval:
## -12.282943 -4.308082
## sample estimates:
## mean in group Female mean in group Male
## 37.20993 45.50544
Interpretation:
• t-test statistic: -4.141
• degrees of freedom: 78.898
• p-value: 8.604e-05
• 95% confidence interval for true mean difference: [-12.282943 , -4.308082]
• mean salary in the Female group: 37.20993
• mean salary in the Male group: 45.50544
Decision: Since the p-value (8.604e-05 < 0.05), we reject the null hypothesis.
Conclusion: This means we have sufficient evidence to say that the mean salary of male and female employees is significantly different.
Example: The jump heights (in inches) of 12 athletes are measured before and after a training program:

Before: 22, 24, 20, 19, 19, 20, 22, 25, 24, 23, 22, 21
After: 23, 25, 20, 24, 18, 22, 23, 28, 24, 25, 24, 20
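We enter the data and run a paired t-test:

#enter the jump heights before and after the program
before <- c(22, 24, 20, 19, 19, 20, 22, 25, 24, 23, 22, 21)
after  <- c(23, 25, 20, 24, 18, 22, 23, 28, 24, 25, 24, 20)

#paired t-test on the before/after measurements
t.test(before, after, paired = TRUE)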
##
## Paired t-test
##
## data: before and after
## t = -2.5289, df = 11, p-value = 0.02803
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -2.3379151 -0.1620849
## sample estimates:
## mean difference
## -1.25
Interpretation:
• t-test statistic: -2.5289
• degrees of freedom: 11
• p-value: 0.02803
• 95% confidence interval for true mean difference: [-2.3379151, -0.1620849]
• mean difference between before and after: -1.25
Decision: Since the p-value (0.02803 < 0.05), we reject the null hypothesis.
Conclusion: This means we have sufficient evidence to say that the mean jump height before and after
using the training program is not equal.
Example: The test scores of 10 trainees are recorded before and after a training course:

Pre-test: 75, 80, 82, 70, 65, 85, 78, 88, 90, 85
Post-test: 80, 85, 90, 78, 72, 88, 83, 92, 95, 90
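We enter the scores and run the paired test:

#enter the pre- and post-training scores
pre_test  <- c(75, 80, 82, 70, 65, 85, 78, 88, 90, 85)
post_test <- c(80, 85, 90, 78, 72, 88, 83, 92, 95, 90)

#paired t-test on the pre/post scores
t.test(pre_test, post_test, paired = TRUE)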
##
## Paired t-test
##
## data: pre_test and post_test
## t = -10.541, df = 9, p-value = 2.303e-06
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -6.680279 -4.319721
## sample estimates:
## mean difference
## -5.5
Interpretation:
• t-test statistic: -10.541
• degrees of freedom: 9
• p-value: 2.303e-06
• 95% confidence interval for true mean difference: [-6.680279 , -4.319721]
• mean difference between before and after: -5.5
Decision: Since the p-value (2.303e-06 < 0.05), we reject the null hypothesis.
Conclusion: This means we have sufficient evidence to say that the mean scores before and after the training were significantly different.
3. One-Way ANOVA in R
A one-way ANOVA is used when we want to compare the means of three or more groups based on a single
factor.
Assumptions of ANOVA
• Independence of observations: The groups must be independent of each other.
• Normality: The data within each group should be approximately normally distributed.
• Homogeneity of variances: The variance within each group should be approximately equal (homoscedasticity).
Example Problem
Suppose we have data on the average exam scores of students from three different teaching methods, and we
want to know if the teaching methods significantly affect exam scores.
Steps to Perform One-Way ANOVA in R
We can start by exploring the group means and plotting box plots to visualize any differences:
# Enter data for the three groups
method_A <- c(83, 76, 90, 85, 88)
method_B <- c(78, 70, 76, 80, 79)
method_C <- c(85, 88, 92, 91, 89)

# Combine into a single data frame with a method factor
library(dplyr)
scores <- data.frame(
  score  = c(method_A, method_B, method_C),
  method = factor(rep(c("A", "B", "C"), each = 5))
)

#find mean and standard deviation of scores for each method
scores %>%
  group_by(method) %>%
  summarise(mean = mean(score),
            sd = sd(score))
## # A tibble: 3 x 3
## method mean sd
## <fct> <dbl> <dbl>
## 1 A 84.4 5.41
## 2 B 76.6 3.97
## 3 C 89 2.74
# Box plot to observe any differences in means of the three groups
ggplot(scores, aes(x = method, y = score, fill = method)) +
  geom_boxplot() +
  labs(title = "Exam Scores by Teaching Method", x = "Method", y = "Score")
[Boxplot: exam score by teaching method (A, B, C)]
From the visualization in the plot, we can clearly see differences among the three methods. But are these differences statistically significant?
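We can fit the model with aov(); a minimal sketch using the scores data frame built above (the object name is an assumption):

# Fit the one-way ANOVA: does mean score differ by teaching method?
model_scores <- aov(score ~ method, data = scores)
summary(model_scores)  # F-test for the method effect

The same workflow applies to a larger example using the bank data: do salaries differ across the six job grades?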
## Boxplots of the salary distribution across the job grades
bank$JobGrade <- as.factor(bank$JobGrade) #treat job grade as categorical
ggplot(data = bank, aes(x = JobGrade, y = Salary, fill = JobGrade)) +
  geom_boxplot() +
  labs(title = "Comparison of Salaries by Job Grade", x = "Job Grade", y = "Salary")
[Boxplot: Salary by Job Grade (grades 1 to 6)]
The visual plot shows a difference in the salaries across the job grades in the bank data.
Is this difference statistically significant?
#one-way ANOVA of salary across job grades
model <- aov(Salary ~ JobGrade, data = bank)
summary(model)
Model assumptions
1. Normality assumption
We can inspect the Q-Q plot to check whether the assumption is violated, or run the Shapiro-Wilk test.
Ideally the standardized residuals would fall along the straight diagonal line in the plot.
# Q-Q plot
plot(model,2)
[Q-Q plot of standardized residuals: points deviate from the diagonal at both tails; observations 204, 205, and 208 are flagged]
#Running Shapiro-Wilk normality test
shapiro.test(model$residuals)
##
## Shapiro-Wilk normality test
##
## data: model$residuals
## W = 0.81763, p-value = 7.095e-15
In the plot above we can see that the residuals stray from the line quite a bit towards the beginning and the end, an indication that our normality assumption may be violated.
The p-value (7.095e-15) is less than 0.05, confirming that the assumption of normality is violated.
2. Equality of variance
We can use Levene's test from the car package, or inspect the Residuals vs Fitted plot. Ideally
we’d like to see the residuals be equally spread out for each level of the fitted values.
#Residuals vs Fitted
plot(model,1)
[Residuals vs Fitted plot: residual spread increases with the fitted values; observations 204, 205, and 208 are flagged]
# Inspecting using Levene's test (from the car package)
if (!require("car")) install.packages("car")
library(car)
leveneTest(Salary ~ JobGrade, data = bank)
Post-Hoc Tests:
Once we have verified that the model assumptions are met (or reasonably met), we can then conduct a post
hoc test to determine exactly which treatment groups differ from one another.
#perform Tukey's Test for multiple comparisons
TukeyHSD(model, conf.level = .95)

#plot the family-wise confidence intervals
plot(TukeyHSD(model, conf.level = .95), las = 1)
[Plot of the Tukey HSD 95% family-wise confidence intervals for all pairwise job-grade differences, 2-1 through 6-5]
The results of the confidence intervals are consistent with the results of the hypothesis tests.
We can see that all the confidence intervals for the mean salaries between the Job Grades do not contain the
value zero except for grades 1 and 2, which indicates that there is a statistically significant difference in mean
salary between all Job Grades except for 1 and 2.
4. Chi-Square Tests in R
The chi-square test is a statistical method used to examine whether observed data matches expected data
(goodness-of-fit) or whether two categorical variables are independent (test of independence).
Types of Chi-Square Tests
1. Goodness-of-Fit Test: - Used to determine if a sample distribution matches a theoretical distribution. -
Example: Do the proportions of a dice roll match the expected proportions?
2. Test of Independence:
• Used to test the relationship between two categorical variables.
• Example: Is gender independent of voting preference?
Assumptions:
- The data must be categorical.
- Observations must be independent.
- The expected frequency in each cell should be at least 5 (for large-sample approximation validity).
1. Goodness-of-Fit Test
A Chi-Square Goodness of Fit Test is used to determine whether or not a categorical variable follows a
hypothesized distribution.
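Example 1: A die is rolled repeatedly and we test whether all six faces are equally likely. A sketch of the call follows; the counts are hypothetical (the originals were not shown) but chosen to be consistent with the printed output:

# Hypothetical counts for the six faces of the die
observed <- c(45, 49, 56, 48, 52, 50)

# Goodness-of-fit test against a fair die (probability 1/6 per face)
chisq.test(observed, p = rep(1/6, 6))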
##
## Chi-squared test for given probabilities
##
## data: observed
## X-squared = 1.4, df = 5, p-value = 0.9243
Interpretation:
• chi square statistic: 1.4
• degrees of freedom: 5
• p-value: 0.9243
Decision: Since the p-value (0.9243 > 0.05), we fail to reject the null hypothesis.
Conclusion: This means we do not have sufficient evidence to say that the die is unfair.
Example 2:
A shop owner claims that an equal number of customers come into his shop each weekday. To test this
hypothesis, a researcher records the number of customers that come into the shop in a given week and finds
the following:
• Monday: 50 customers
• Tuesday: 60 customers
• Wednesday: 40 customers
• Thursday: 47 customers
• Friday: 53 customers
Perform a Chi-Square goodness of fit test in R to determine if the data is consistent with the shop owner’s
claim.
# Enter data and create expected frequencies
observed <- c(50, 60, 40, 47, 53)
expected <- c(.2, .2, .2, .2, .2) #must add up to 1
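We then run the goodness-of-fit test:

#run the test against the claimed equal distribution
chisq.test(x = observed, p = expected)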
##
## Chi-squared test for given probabilities
##
## data: observed
## X-squared = 4.36, df = 4, p-value = 0.3595
Interpretation:
• chi square statistic: 4.36
• degrees of freedom: 4
• p-value: 0.3595
Decision: Since the p-value (0.3595 > 0.05), we fail to reject the null hypothesis.
Conclusion: This means we do not have sufficient evidence to say that the true distribution of
customers is different from the distribution that the shop owner claimed.
2. Test of Independence
A Chi-Square Test of Independence is used to determine whether there is a significant association between two categorical variables.

Example 1: A survey of 500 voters records gender and political party preference (Republican, Democrat, Independent). Is gender independent of party preference?

#create table
data <- matrix(c(120, 90, 40, 110, 95, 45), ncol=3, byrow=TRUE)
colnames(data) <- c("Rep","Dem","Ind")
rownames(data) <- c("Male","Female")
data <- as.table(data)
data
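We then run Pearson's chi-squared test on the table:

#perform the test of independence
chisq.test(data)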
##
## Pearson's Chi-squared test
##
## data: data
## X-squared = 0.86404, df = 2, p-value = 0.6492
Interpretation:
• chi square statistic: 0.86404
• degrees of freedom: 2
• p-value: 0.6492
Decision: Since the p-value (0.6492 > 0.05), we fail to reject the null hypothesis.
Conclusion: This means we do not have sufficient evidence to say that there is an association
between gender and political party preference, i.e., gender and political party preference are independent.
Example 2: Suppose we test whether gender is associated with product preference, using a 2x3 contingency table data_table of gender by preferred product (the table construction is not shown):

#run the test of independence and print the result
chisq_test <- chisq.test(data_table)
chisq_test
##
## Pearson's Chi-squared test
##
## data: data_table
## X-squared = 14.55, df = 2, p-value = 0.0006925
Interpretation:
• chi square statistic: 14.55
• degrees of freedom: 2
• p-value: 0.0006925
Decision: Since the p-value (0.0006925 < 0.05), we reject the null hypothesis.
Conclusion: This means we have sufficient evidence to say that there is an association between gender
and product preference, i.e., gender and product preference are dependent.