
BSTA 3152 Statistical Programming: Lecture

Don Symon

2024-12-09

Statistical Hypothesis Testing in R


Objectives
• Introductory theory
• Z-test
• T-test (one-sample, paired, and independent two-sample)
• Chi-square test
• ANOVA (Analysis of Variance)
• Wilcoxon signed-rank test
• Mann-Whitney U test
• Correlation test

Introduction to Statistical Hypothesis Testing in R


A statistical hypothesis is an assumption made by the researcher about the population from which the data
for an experiment are collected. This assumption need not be true. Hypothesis testing is, in a
way, a formal process of validating the hypothesis made by the researcher.
In principle, validating a hypothesis would require taking the entire population into account. However, this is not
practically possible. Thus, to validate a hypothesis, we use random samples from the population. On the
basis of the results from testing the sample data, the hypothesis is either rejected or not rejected.

Types of Statistical Hypotheses


Statistical Hypothesis Testing can be categorized into two types:
• Null Hypothesis (H0): Hypothesis testing is carried out to test the validity of a claim or assumption
made about the larger population. This default claim about the population is known as the null
hypothesis and is denoted by H0.
• Alternative Hypothesis (H1 or Ha): The alternative hypothesis is the claim that is accepted if the null
hypothesis is shown to be false. The evidence present in the trial, including the data and statistical
computations, is then said to support the alternative hypothesis.

Steps in Hypothesis Testing


Statisticians use hypothesis testing to formally decide whether a hypothesis should be rejected or not. Hypothesis
testing is conducted in the following manner:


1. State the Hypotheses


Clearly define the null hypothesis (H0) and the alternative hypothesis (H1).
2. Formulate an Analysis Plan
Develop a plan that specifies the appropriate statistical test and significance level (α).
3. Analyze Sample Data
Perform calculations to compute the test statistic and p-value based on the analysis plan.
4. Interpret Results
Use the p-value to determine whether to reject or fail to reject the null hypothesis based on the decision
rule.

Understanding the p-Value


Hypothesis testing ultimately uses a p-value to weigh the strength of the evidence against the null hypothesis.
The p-value ranges between 0 and 1 and can be interpreted as follows:
• Small p-value (≤ 0.05): Indicates strong evidence against H0 , so you reject it.
• Large p-value (> 0.05): Indicates weak evidence against H0 , so you fail to reject it.
• p-value close to 0.05: Marginal results that could go either way.
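As a quick sketch (with an assumed value for illustration), a two-sided p-value can be computed directly from a z statistic in R:
# Two-sided p-value for a hypothetical z statistic of 1.8
z <- 1.8
2 * pnorm(abs(z), lower.tail = FALSE)   # approximately 0.072: weak evidence against H0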

Decision Errors in Hypothesis Testing


Two types of errors can occur during hypothesis testing:
1. Type I Error
• Occurs when the researcher rejects a true null hypothesis (H0 ).

• The probability of making a Type I error is called the significance level (α).

• Example: Concluding that a coin is biased when it is actually fair.


2. Type II Error
• Occurs when the researcher fails to reject a false null hypothesis (H0 ).

• The probability of making a Type II error is denoted by β; the power of the test is 1 − β.

• Example: Concluding that a coin is fair when it is actually biased.
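The meaning of α as the Type I error rate can be illustrated with a small simulation (a sketch, not part of the lecture examples): repeatedly testing a true null hypothesis at α = 0.05 should reject about 5% of the time.
# Simulate 10,000 z-tests of H0: mu = 0 when H0 is actually true (sigma = 1, n = 30)
set.seed(1)
rejections <- replicate(10000, {
  x <- rnorm(30, mean = 0, sd = 1)
  z <- mean(x) / (1 / sqrt(30))
  abs(z) > qnorm(0.975)        # reject at alpha = 0.05
})
mean(rejections)               # close to 0.05, the Type I error rate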

1. Z-test in R
A Z-Test is a statistical test used for means or proportions when the population variances are known and
the sample size is large (typically n > 30). It is commonly used in hypothesis testing.
When to Use a Z-Test
• When the population standard deviation (σ) is known.
• For large sample sizes (n > 30).
• To compare:
– A sample mean to a population mean.
– Two sample proportions.
The Z statistic is calculated as follows.


Use the formula for Z:

For means:

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

For proportions:

$$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$$
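These formulas can be applied directly in R before turning to a packaged function; a minimal sketch with assumed data and parameters:
# Manual z statistic for a mean (hypothetical data and parameters)
x <- c(102, 98, 110, 105, 95, 101, 99, 104)
mu0 <- 100                      # hypothesized mean
sigma <- 10                     # known population standard deviation
z <- (mean(x) - mu0) / (sigma / sqrt(length(x)))
2 * pnorm(abs(z), lower.tail = FALSE)   # two-sided p-value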

The z.test() function from the BSDA package can be used to perform one-sample and two-sample z-tests
in R.
The function syntax is:
z.test(x, y, alternative = "two.sided", mu = 0, sigma.x = NULL, sigma.y = NULL, conf.level = 0.95)

where,
• x: Values for the first sample.

• y: Values for the second sample (optional for two-sample z-tests).

• alternative: The alternative hypothesis (“greater”, “less”, “two.sided”).

• mu: Mean under the null or mean difference (in the two-sample case).

• sigma.x: Population standard deviation of the first sample.

• sigma.y: Population standard deviation of the second sample (optional).

• conf.level: Confidence level for the test.

Example 1: One Sample Z-Test in R


Suppose the IQ in a certain population is normally distributed with a mean of µ = 100 and a stan-
dard deviation of σ = 15. A scientist wants to know if a new medication affects IQ levels, so she
recruits 20 patients to use it for one month and records their IQ levels at the end of the month.
88, 92, 94, 94, 96, 97, 97, 97, 99, 99, 105, 109, 109, 109, 110, 112, 112, 113, 114, 115
# Load BSDA package
if (!require("BSDA")) install.packages("BSDA")

## Loading required package: BSDA


## Loading required package: lattice
##
## Attaching package: 'BSDA'
## The following object is masked from 'package:datasets':
##
## Orange
suppressPackageStartupMessages(library(BSDA))
# Enter IQ levels for 20 patients
data <- c(88, 92, 94, 94, 96, 97, 97, 97, 99, 99,
105, 109, 109, 109, 110, 112, 112, 113, 114, 115)


# Perform one-sample z-test


z.test(data, mu = 100, sigma.x = 15)

##
## One-sample z-Test
##
## data: data
## z = 0.90933, p-value = 0.3632
## alternative hypothesis: true mean is not equal to 100
## 95 percent confidence interval:
## 96.47608 109.62392
## sample estimates:
## mean of x
## 103.05
Results:
- Z-Statistic: 0.90933
- p-value: 0.3632
Since the p-value (0.3632 > 0.05), we fail to reject the null hypothesis.
Conclusion: The medication does not significantly affect IQ levels.

Example 2: Two Sample Z-Test in R


Suppose the IQ levels among individuals in two different cities are known to be normally distributed, each
with a population standard deviation of 15. A scientist wants to know if the mean IQ levels between individuals
in city A and city B are different, so she selects a simple random sample of 20 individuals from each city and
records their IQ levels.

city A : 82, 84, 85, 89, 91, 91, 92, 94, 99, 99, 105, 109, 109, 109, 110, 112, 112, 113, 114, 114
city B : 90, 91, 91, 91, 95, 95, 99, 99, 108, 109, 109, 114, 115, 116, 117, 117, 128, 129, 130, 133
# Enter IQ levels for 20 individuals from each city
cityA <- c(82, 84, 85, 89, 91, 91, 92, 94, 99, 99,
105, 109, 109, 109, 110, 112, 112, 113, 114, 114)

cityB <- c(90, 91, 91, 91, 95, 95, 99, 99, 108, 109,
109, 114, 115, 116, 117, 117, 128, 129, 130, 133)

# Perform two-sample z-test


z.test(x = cityA, y = cityB, mu = 0, sigma.x = 15, sigma.y = 15)

##
## Two-sample z-Test
##
## data: cityA and cityB
## z = -1.7182, p-value = 0.08577
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -17.446925 1.146925
## sample estimates:
## mean of x mean of y
## 100.65 108.80


Results:
- Z-Statistic: -1.7182
- p-value: 0.08577
Since the p-value (0.08577 > 0.05), we fail to reject the null hypothesis.
Conclusion: The mean IQ levels are not significantly different between the two cities.

Z-Test Exercises
1. A researcher claims that the average weight of a population is 70 kg. A sample of 25 individuals has a
mean weight of 72 kg with a known population standard deviation of 8 kg. Test this claim at α = 0.05.
2. Suppose a company claims there is no difference in the average salaries of employees in two different
branches. A sample of 30 employees from Branch A has a mean salary of $50,000, while a sample of 30
employees from Branch B has a mean salary of $52,000. The known population standard deviation is
$5,000 for both branches. Test the company’s claim at α = 0.05.
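Since only summary statistics are given, Exercise 1 can be approached by computing z directly from the formula rather than with z.test(); a sketch:
# Exercise 1 sketch: z from summary statistics
xbar <- 72; mu0 <- 70; sigma <- 8; n <- 25
z <- (xbar - mu0) / (sigma / sqrt(n))    # 1.25
2 * pnorm(abs(z), lower.tail = FALSE)    # about 0.211 > 0.05: fail to reject H0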

One Proportion Z-Test in R


A one proportion z-test is used to compare an observed proportion to a theoretical one. This test checks
whether the proportion in a sample is significantly different from a hypothesized population proportion.
• Null Hypothesis (H0): p = p0 (The population proportion is equal to the hypothesized proportion
p0 ).
The alternative hypothesis can be one of the following:
• Alternative Hypothesis (H1, two-tailed): p ̸= p0 (The population proportion is not equal to some
hypothesized value p0 ).
• Alternative Hypothesis (H1, left-tailed): p < p0 (The population proportion is less than the
hypothesized value p0 ).
• Alternative Hypothesis (H1, right-tailed): p > p0 (The population proportion is greater than the
hypothesized value p0 ).
The test statistic for a one proportion z-test is calculated as:

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$$

Where:
- p̂ is the observed sample proportion.
- p0 is the hypothesized population proportion.
- n is the sample size.
One Proportion Z-Test in R
To perform a one proportion z-test in R, we can use the following functions:
• If n ≤ 30: binom.test(x, n, p = 0.5, alternative = "two.sided")
• If n > 30: prop.test(x, n, p = 0.5, alternative = "two.sided", correct = TRUE)
Where: - x: is the number of successes.
- n: is the sample size.
- p: is the hypothesized population proportion.
- alternative: specifies the alternative hypothesis (can be “two.sided”, “less”, or “greater”).
- correct: is whether or not to apply Yates’ continuity correction.
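The statistic can also be computed by hand; a minimal sketch with hypothetical counts (one_prop_z is a helper defined here, not a base R function):
# Manual one-proportion z-test (no continuity correction)
one_prop_z <- function(x, n, p0) {
  p_hat <- x / n
  z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
  c(z = z, p.value = 2 * pnorm(abs(z), lower.tail = FALSE))
}
one_prop_z(x = 45, n = 80, p0 = 0.5)   # hypothetical sample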


Example: One Proportion Z-Test in R


Suppose we want to know whether or not the proportion of residents in a certain county who support a
certain law is equal to 60%. To test this, we collect the following data:
• p0 : hypothesized population proportion = 0.60
• x: residents who support the law = 64
• n: sample size = 100
Since our sample size is greater than 30, we can use the prop.test() function to perform a one-sample z-test.
# Perform One Proportion Z-Test
prop.test(x = 64, n = 100, p = 0.60, alternative = "two.sided")

##
## 1-sample proportions test with continuity correction
##
## data: 64 out of 100, null probability 0.6
## X-squared = 0.51042, df = 1, p-value = 0.475
## alternative hypothesis: true p is not equal to 0.6
## 95 percent confidence interval:
## 0.5372745 0.7318279
## sample estimates:
## p
## 0.64
Interpretation:
- p-value: 0.475
- 95% Confidence Interval: [0.5373, 0.7318]
- Observed Proportion (p̂): 0.64
Since the p-value (0.475) is greater than the significance level α = 0.05, we fail to reject the null
hypothesis.
Conclusion: We do not have sufficient evidence to say that the proportion of residents who support the law
is different from 0.60.
Additionally, the 95% confidence interval for the true proportion of residents who support the law is [0.5373,
0.7318]. Since this confidence interval contains 0.60, it confirms that we do not have evidence to say that the
true proportion is different from 0.60.
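Although n > 30 here, the exact binomial test is valid for any sample size and should lead to the same conclusion; a sketch using the same counts:
# Exact binomial test as an alternative to prop.test()
binom.test(x = 64, n = 100, p = 0.60, alternative = "two.sided")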

2. T-Test in R
The t-test is a statistical test used to compare the means of two groups or a single group against a known
value. It is commonly used when the sample size is small and the population standard deviation is unknown.
The t-test follows the assumptions of normality in the data.
There are three main types of t-tests:
1. One-Sample T-Test: Compares the sample mean to a known value (e.g., population mean).
2. Two-Sample T-Test: Compares the means of two independent groups.
3. Paired Sample T-Test: Compares means from two related groups (e.g., pre-treatment and post-
treatment).
We can use the t.test() function in R to perform each type of test:
The syntax is below:
t.test(x, y = NULL,
alternative = c("two.sided", "less", "greater"),


mu = 0, paired = FALSE, var.equal = FALSE,


conf.level = 0.95, ...)

where;
• x, y: The two samples of data.

• alternative: The alternative hypothesis of the test.


• mu: The true value of the mean.

• paired: Whether to perform a paired t-test or not.

• var.equal: Whether to assume the variances are equal between the samples.

• conf.level: The confidence level to use.

a). One-Sample T-Test


A one sample t-test is used to test whether or not the mean of a population is equal to some value.

Example 1: One Sample t-test in R


A researcher wants to know whether or not the mean weight of a certain species of turtle is equal to
310 pounds. He goes out and collects a simple random sample of turtles with the following weights:

Weights : 300, 315, 320, 311, 314, 309, 300, 308, 305, 303, 305, 301, 303
Test the researcher’s claim at α = 0.05.
#define vector of turtle weights
turtle_weights <- c(300, 315, 320, 311, 314, 309, 300, 308, 305, 303, 305, 301, 303)

#perform one sample t-test


t.test(x = turtle_weights, mu = 310)

##
## One Sample t-test
##
## data: turtle_weights
## t = -1.5848, df = 12, p-value = 0.139
## alternative hypothesis: true mean is not equal to 310
## 95 percent confidence interval:
## 303.4236 311.0379
## sample estimates:
## mean of x
## 307.2308
Interpretation:
• t-value: -1.5848 (test statistic).

• 95% Confidence Interval: [303.4236, 311.0379]

• Degrees of freedom: 12

• p-value: 0.139.


Decision: Since the p-value (0.139 > 0.05), we fail to reject the null hypothesis.
Conclusion: There is no significant difference between the mean weight of the turtles and the
hypothesized mean weight of 310 pounds.

b). Two Sample t-test in R: Independent t-test


A two sample t-test (independent t-test) is used to test whether or not the means of two populations are
equal.

Example 1: Two Sample t-test in R


Suppose we want to know whether or not the mean weight between two different species of turtles is equal.
To test this, we collect a simple random sample of turtles from each species with the following weights:

Sample 1 : 300, 315, 320, 311, 314, 309, 300, 308, 305, 303, 305, 301, 303

Sample 2 : 335, 329, 322, 321, 324, 319, 304, 308, 305, 311, 307, 300, 305
We can visually inspect the plots of both samples to see whether there exists any difference:
# Load ggplot2 package
if (!require("ggplot2")) install.packages("ggplot2")

## Loading required package: ggplot2


suppressPackageStartupMessages(library(ggplot2))
#define vector of turtle weights for each sample
sample1 <- c(300, 315, 320, 311, 314, 309, 300, 308, 305, 303, 305, 301, 303)
sample2 <- c(335, 329, 322, 321, 324, 319, 304, 308, 305, 311, 307, 300, 305)
# Create a data frame for the combined data
combined <- data.frame(
weight = c(sample1, sample2),
group = factor(c(rep("Sample 1", length(sample1)), rep("Sample 2", length(sample2))))
)

# Box plot to observe any differences in means of the two groups


ggplot(combined, aes(x = group, y = weight, fill= group)) +
geom_boxplot() +
labs(title = "Comparison of Turtle Weights", x = "Group", y = "Weight")


[Figure: box plot "Comparison of Turtle Weights" comparing the weights of Sample 1 and Sample 2]
From the visualization in the plot, we can clearly see there is a difference between the two samples. But is there a
statistically significant difference between the two groups?
Using t.test() function
#define vector of turtle weights for each sample
sample1 <- c(300, 315, 320, 311, 314, 309, 300, 308, 305, 303, 305, 301, 303)
sample2 <- c(335, 329, 322, 321, 324, 319, 304, 308, 305, 311, 307, 300, 305)

#perform two sample t-test


t.test(x = sample1, y = sample2)

##
## Welch Two Sample t-test
##
## data: sample1 and sample2
## t = -2.1009, df = 19.112, p-value = 0.04914
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -14.73862953 -0.03060124
## sample estimates:
## mean of x mean of y
## 307.2308 314.6154
Interpretation:
• t-test statistic: -2.1009
• degrees of freedom: 19.112
• p-value: 0.04914


• 95% confidence interval for true mean difference: [-14.74, -0.03]


• mean of sample 1 weights: 307.2308
• mean of sample 2 weights: 314.6154
Decision: Since the p-value (0.04914 < 0.05), we reject the null hypothesis.
Conclusion: This means we have sufficient evidence to say that the mean weight between the two species
is significantly different.
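Note that t.test() defaults to Welch's test (var.equal = FALSE), which does not assume equal variances. If the equal-variance assumption is judged reasonable, a pooled (Student's) test can be requested instead; a sketch using the same samples:
# Pooled two-sample t-test, assuming equal variances
t.test(x = sample1, y = sample2, var.equal = TRUE)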

Example 2: Two Sample t-test in R


Suppose we want to test if there is a significant difference in the average heights between males and females
in a population. Consider the following data and run a t-test.

Male Heights : 175, 178, 180, 185, 170, 172, 180, 177, 174, 173

Female Heights : 160, 165, 163, 162, 167, 168, 170, 169, 164, 166

Using t.test() function


#define vectors of heights for each group
MaleHeights <- c(175, 178, 180, 185, 170, 172, 180, 177, 174, 173)
FemaleHeights <- c(160, 165, 163, 162, 167, 168, 170, 169, 164, 166)

#perform two sample t-test


t.test(x = MaleHeights, y = FemaleHeights)

Running this test gives a Welch t statistic of approximately 6.3 on about 16.3 degrees of freedom,
with a p-value far below 0.05. The sample means are 176.4 (males) and 165.4 (females).
Decision: Since the p-value is less than 0.05, we reject the null hypothesis.
Conclusion: This means we have sufficient evidence to say that the mean heights of males and
females are significantly different.

Example 3: Two Sample t-test in R: From a dataset


Consider the data set Bank.csv. We want to conduct an independent t-test to investigate whether there exists
a significant difference in the salaries earned by male and female employees.


We first load the data and convert the gender variable to a categorical variable.
#loading the data to r environment
bank<-read.csv('Bank.csv')
head(bank)

## Employee EducLev JobGrade YrHired YrBorn Gender YrsPrior PCJob Salary


## 1 1 3 1 92 69 Male 1 No 32.0
## 2 2 1 1 81 57 Female 1 No 39.1
## 3 3 1 1 83 60 Female 0 No 33.2
## 4 4 2 1 87 55 Female 7 No 30.6
## 5 5 3 1 92 67 Male 0 No 29.0
## 6 6 3 1 92 71 Female 0 No 30.5
#Convert the Gender variable into a categorical variable
bank$Gender<-as.factor(bank$Gender)

We can conduct a visual inspection of the two groups.


## Box plots of the salary distribution across the two gender groups
ggplot(data = bank, aes(x = Gender, y = Salary, fill = Gender)) +
geom_boxplot() +
labs(title = "Comparison of Salaries by Gender", x = "Gender", y = "Salary")

[Figure: box plot "Comparison of Salaries by Gender" comparing Female and Male salaries]

The plot shows a difference in the salaries of female and male individuals in the bank data.
Is this difference statistically significant?
#t test across male and Female salaries


t.test(Salary~ Gender, data = bank)

##
## Welch Two Sample t-test
##
## data: Salary by Gender
## t = -4.141, df = 78.898, p-value = 8.604e-05
## alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
## 95 percent confidence interval:
## -12.282943 -4.308082
## sample estimates:
## mean in group Female mean in group Male
## 37.20993 45.50544
Interpretation:
• t-test statistic: -4.141
• degrees of freedom: 78.898
• p-value: 8.604e-05
• 95% confidence interval for true mean difference: [-12.282943, -4.308082]
• mean salary in group Female: 37.20993
• mean salary in group Male: 45.50544
Decision: Since the p-value (8.604e-05 < 0.05), we reject the null hypothesis.
Conclusion: This means we have sufficient evidence to say that the mean salaries of male and
female employees are significantly different.

c). Paired Sample t-test in R


The paired sample t-test, also called the dependent t-test, is a statistical procedure used to determine whether
the mean difference between two sets of observations is zero.
In a paired t-test, each subject is measured twice, resulting in pairs of observations.

Example 1: Paired Samples t-test in R


Suppose we want to know whether or not a certain training program is able to increase the max vertical
jump (in inches) of basketball players.
To test this, we may recruit a simple random sample of 12 college basketball players and measure each of
their max vertical jumps. Then, we may have each player use the training program for one month and then
measure their max vertical jump again at the end of the month.
The following data shows the max jump height (in inches) before and after using the training program for
each player:

Before : 22, 24, 20, 19, 19, 20, 22, 25, 24, 23, 22, 21

After : 23, 25, 20, 24, 18, 22, 23, 28, 24, 25, 24, 20

#define before and after max jump heights


before <- c(22, 24, 20, 19, 19, 20, 22, 25, 24, 23, 22, 21)
after <- c(23, 25, 20, 24, 18, 22, 23, 28, 24, 25, 24, 20)


#perform paired samples t-test


t.test(x = before, y = after, paired = TRUE)

##
## Paired t-test
##
## data: before and after
## t = -2.5289, df = 11, p-value = 0.02803
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -2.3379151 -0.1620849
## sample estimates:
## mean difference
## -1.25
Interpretation:
• t-test statistic: -2.5289
• degrees of freedom: 11
• p-value: 0.02803
• 95% confidence interval for true mean difference: [-2.3379151, -0.1620849]
• mean difference between before and after: -1.25
Decision: Since the p-value (0.02803 < 0.05), we reject the null hypothesis.
Conclusion: This means we have sufficient evidence to say that the mean jump height before and after
using the training program is not equal.
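A paired t-test is equivalent to a one-sample t-test on the within-pair differences, which can be a useful check; a sketch using the same data:
# Equivalent one-sample t-test on the paired differences
t.test(before - after, mu = 0)   # same t, df, and p-value as the paired test above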

Example 2: Paired Samples t-test in R


Suppose we want to test if a training program improves test scores. We collect pre- and post-test scores from
10 individuals.

Pre Test : 75, 80, 82, 70, 65, 85, 78, 88, 90, 85

Post Test : 80, 85, 90, 78, 72, 88, 83, 92, 95, 90

# Pre-test and post-test scores


pre_test <- c(75, 80, 82, 70, 65, 85, 78, 88, 90, 85)
post_test <- c(80, 85, 90, 78, 72, 88, 83, 92, 95, 90)

# Perform paired sample t-test


t.test(pre_test, post_test, paired = TRUE)

##
## Paired t-test
##
## data: pre_test and post_test
## t = -10.541, df = 9, p-value = 2.303e-06
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -6.680279 -4.319721
## sample estimates:
## mean difference
## -5.5


Interpretation:
• t-test statistic: -10.541
• degrees of freedom: 9
• p-value: 2.303e-06
• 95% confidence interval for true mean difference: [-6.680279 , -4.319721]
• mean difference between before and after: -5.5
Decision: Since the p-value (2.303e-06 < 0.05), we reject the null hypothesis.
Conclusion: This means we have sufficient evidence to say that the mean scores were significantly different
before and after the training.

3. Analysis of Variance Test (ANOVA)


ANOVA (Analysis of Variance) is a statistical method used to compare means across multiple groups to
determine if there is any significant difference between them. It is an extension of the t-test when there are
more than two groups to compare.
The general hypothesis in ANOVA is as follows:
• Null Hypothesis (H0): All group means are equal.
• Alternative Hypothesis (H1): At least one group mean is different from the others.

One-Way ANOVA in R
A one-way ANOVA is used when we want to compare the means of three or more groups based on a single
factor.

Assumptions of ANOVA
• Independence of observations: the groups must be independent of each other.
• Normality: the data within each group should be approximately normally distributed.
• Homogeneity of variances: the variance within each group should be approximately equal (homoscedasticity).

Example Problem
Suppose we have data on the average exam scores of students from three different teaching methods, and we
want to know if the teaching methods significantly affect exam scores.

Method A Method B Method C


83 78 85
76 70 88
90 76 92
85 80 91
88 79 89

Steps to Perform One-Way ANOVA in R
We start by exploring the group means and plotting box plots to visualize any differences:
# Enter Data for the three groups
method_A <- c(83, 76, 90, 85, 88)
method_B <- c(78, 70, 76, 80, 79)
method_C <- c(85, 88, 92, 91, 89)

# Combine the data into a data frame


scores <- data.frame(


score = c(method_A, method_B, method_C),
method = factor(rep(c("A", "B", "C"), each = 5))
)

#load dplyr package


suppressPackageStartupMessages(library(dplyr))

#find mean and standard deviation of scores for each teaching method
scores %>%
group_by(method) %>%
summarise(mean = mean(score),
sd = sd(score))

## # A tibble: 3 x 3
## method mean sd
## <fct> <dbl> <dbl>
## 1 A 84.4 5.41
## 2 B 76.6 3.97
## 3 C 89 2.74
# Box plot to observe any differences in means of the three groups
ggplot(scores, aes(x = method, y = score, fill = method)) +
geom_boxplot() +
labs(title = "Exam Score Distribution by Teaching Method", x = "Method", y = "Score")

[Figure: box plot of exam scores by teaching method (A, B, C)]

From the plot, we can clearly see differences among the three groups. But is the difference
statistically significant?


# Perform One-Way ANOVA
anova_result <- aov(score ~ method, data = scores)
summary(anova_result)

## Df Sum Sq Mean Sq F value Pr(>F)


## method 2 392.9 196.47 11.21 0.0018 **
## Residuals 12 210.4 17.53
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpretation:
• F statistic: 11.21
• p-value: 0.0018
Decision: Since the p-value (0.0018 < 0.05), we reject the null hypothesis.
Conclusion: This means we have sufficient evidence to say that the mean scores differ significantly
among the three groups.

Post-Hoc Tests (if necessary)


If the ANOVA test shows a significant result, we can perform post-hoc tests to determine which groups are
different from each other. The most commonly used post-hoc test is Tukey’s Honest Significant Difference
(HSD) test.
Tukey’s HSD test compares each pair of groups and gives adjusted p-values. If the adjusted p-value is less
than 0.05, we can conclude that there is a significant difference between those two groups.
# Perform Tukey's HSD test
tukey_result <- TukeyHSD(anova_result, conf.level = 0.95)
tukey_result   # print the pairwise comparisons (summary() is not informative for TukeyHSD objects)

The printed table lists each pairwise difference in means with its adjusted p-value; a full example of this output appears in the Bank.csv example below.

Example 2: ANOVA TEST in R: From a dataset


Consider the data set Bank.csv. We want to conduct a one-way ANOVA to investigate whether there exists
a significant difference in the salaries earned across different job grades.
We first load the data and convert the JobGrade variable to a categorical variable.
#loading the data to r environment
bank<-read.csv('Bank.csv')
head(bank)

## Employee EducLev JobGrade YrHired YrBorn Gender YrsPrior PCJob Salary


## 1 1 3 1 92 69 Male 1 No 32.0
## 2 2 1 1 81 57 Female 1 No 39.1
## 3 3 1 1 83 60 Female 0 No 33.2
## 4 4 2 1 87 55 Female 7 No 30.6
## 5 5 3 1 92 67 Male 0 No 29.0
## 6 6 3 1 92 71 Female 0 No 30.5
#Convert the JobGrade variable into a categorical variable
bank$JobGrade<-as.factor(bank$JobGrade)

We can conduct a visual inspection of the groups.


## Box plots of the salary distribution across the job grades
ggplot(data = bank, aes(x = JobGrade, y = Salary, fill = JobGrade)) +
geom_boxplot() +
labs(title = "Comparison of Salaries by Job Grade", x = "Job Grade", y = "Salary")

[Figure: box plot "Comparison of Salaries by Job Grade" for job grades 1–6]

The plot shows a difference in salaries across the job grades in the bank data.
Is this difference statistically significant?
#One-way ANOVA of salary across the job grades

model<-aov(Salary~ JobGrade, data = bank)


summary(model)

## Df Sum Sq Mean Sq F value Pr(>F)


## JobGrade 5 18495 3699 96.64 <2e-16 ***
## Residuals 202 7732 38
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpretation:
• F statistic: 96.64
• p-value: < 2e-16
Decision: Since the p-value (< 2e-16) is far below 0.05, we reject the null hypothesis.
Conclusion: This means we have sufficient evidence to say that the mean salaries differ significantly
across the various job grades.


Model assumptions
1. Normality assumption
We can inspect the Q-Q plot to check whether the assumption is violated, or run the Shapiro-Wilk test.
Ideally, the standardized residuals would fall along the straight diagonal line in the plot.
# Q-Q plot
plot(model,2)

[Figure: Normal Q-Q plot of standardized residuals for aov(Salary ~ JobGrade); observations 205, 204, and 208 flagged]
#Run the Shapiro-Wilk normality test
shapiro.test(model$residuals)

##
## Shapiro-Wilk normality test
##
## data: model$residuals
## W = 0.81763, p-value = 7.095e-15
In the plot above we can see that the residuals stray from the line quite a bit towards the beginning and the
end. This is an indication that our normality assumption may be violated.
The p-value of the Shapiro-Wilk test is less than 0.05, confirming that the normality assumption is violated.
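Since the normality assumption appears violated, a rank-based alternative such as the Kruskal-Wallis test could be considered; a sketch (not part of the original analysis):
# Kruskal-Wallis rank sum test: a nonparametric alternative to one-way ANOVA
kruskal.test(Salary ~ JobGrade, data = bank)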
2. Equality of variance
We can use Levene's test from the car package, or inspect the Residuals vs Fitted plot. Ideally,
we’d like to see the residuals equally spread out for each level of the fitted values.
#Residuals vs Fitted plot
plot(model,1)


[Figure: Residuals vs Fitted plot for aov(Salary ~ JobGrade); observations 205, 204, and 208 flagged]
# Inspect equality of variances using Levene's test

#load car package
if (!require("car")) install.packages("car")

## Loading required package: car


## Loading required package: carData
##
## Attaching package: 'carData'
## The following objects are masked from 'package:BSDA':
##
## Vocab, Wool
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
suppressPackageStartupMessages(library(car))

#conduct Levene's Test for equality of variances


leveneTest(model)

## Levene's Test for Homogeneity of Variance (center = median)


## Df F value Pr(>F)


## group 5 14.738 2.549e-12 ***


## 202
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can see that the residuals are much more spread out for the higher fitted values, which is an indication
that our equal variances assumption may be violated.
The p-value of Levene's test is well below 0.05, confirming that the equal-variances assumption is violated.
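When the equal-variances assumption fails, one option (a sketch, not part of the original analysis) is Welch's one-way ANOVA, which does not assume homogeneity of variances:
# Welch's one-way ANOVA from base R; robust to unequal variances
oneway.test(Salary ~ JobGrade, data = bank, var.equal = FALSE)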
3. Test for Independence
We can use the Durbin-Watson test, durbinWatsonTest() from the car package (dwtest() in the lmtest package is an alternative), to check this.
# Check Autocorrelation with Durbin-Watson Test
if (!require("lmtest")) install.packages("lmtest")

## Loading required package: lmtest


## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(lmtest)
durbinWatsonTest(lm(model))

## lag Autocorrelation D-W Statistic p-value


## 1 0.2476946 1.317842 0
## Alternative hypothesis: rho != 0
The reported p-value is 0, which is less than 0.05, so the residuals of the model appear to be
autocorrelated.
NB: We could attempt to transform the data so that our assumptions of normality and equality of
variances are met. (For now, we proceed as if they hold.)

Post-Hoc Tests:
Once we have verified that the model assumptions are met (or reasonably met), we can then conduct a post
hoc test to determine exactly which treatment groups differ from one another.
#perform Tukey's Test for multiple comparisons
TukeyHSD(model, conf.level=.95)

## Tukey multiple comparisons of means


## 95% family-wise confidence level
##
## Fit: aov(formula = Salary ~ JobGrade, data = bank)
##
## $JobGrade
## diff lwr upr p adj
## 2-1 2.329357 -1.2520561 5.910770 0.4228303
## 3-1 6.329484 2.7726522 9.886317 0.0000105
## 4-1 11.816976 7.7427858 15.891167 0.0000000
## 5-1 17.993405 13.4799212 22.506888 0.0000000
## 6-1 35.664833 30.3812223 40.948444 0.0000000


## 3-2 4.000127 0.1381901 7.862065 0.0375396


## 4-2 9.487619 5.1445175 13.830721 0.0000000
## 5-2 15.664048 10.9064182 20.421677 0.0000000
## 6-2 33.335476 27.8418390 38.829113 0.0000000
## 4-3 5.487492 1.1646378 9.810346 0.0044146
## 5-3 11.663920 6.9247672 16.403073 0.0000000
## 6-3 29.335349 23.8577048 34.812993 0.0000000
## 5-4 6.176429 1.0376015 11.315256 0.0085767
## 6-4 23.847857 18.0209750 29.674739 0.0000000
## 6-5 17.671429 11.5293555 23.813502 0.0000000
The p-value indicates whether or not there is a statistically significant difference between each Job Grade.
From the output, we can see that there is a statistically significant difference between the mean salary of
each Job Grade at the 0.05 significance level except for Job Grade 1 and 2.
We can also visualize the 95% confidence intervals that result from the Tukey Test by using the
plot(TukeyHSD()) function in R:
#create confidence interval for each comparison
plot(TukeyHSD(model, conf.level=.95), las = 2)

[Figure: 95% family-wise confidence intervals for all pairwise differences in mean salary between job grades]

The results of the confidence intervals are consistent with the results of the hypothesis tests.
We can see that all the confidence intervals for the mean salaries between the Job Grades do not contain the
value zero except for grades 1 and 2, which indicates that there is a statistically significant difference in mean
salary between all Job Grades except for 1 and 2.


4. Chi-Square Tests in R
The chi-square test is a statistical method used to examine whether observed data matches expected data
(goodness-of-fit) or whether two categorical variables are independent (test of independence).
Types of Chi-Square Tests
1. Goodness-of-Fit Test:
• Used to determine if a sample distribution matches a theoretical distribution.
• Example: Do the proportions of a dice roll match the expected proportions?
2. Test of Independence:
• Used to test the relationship between two categorical variables.
• Example: Is gender independent of voting preference?
Assumptions
• The data must be categorical.
• Observations must be independent.
• The expected frequency in each cell should be at least 5 (for the large-sample approximation to be valid).

1. Goodness-of-Fit Test
A Chi-Square Goodness of Fit Test is used to determine whether or not a categorical variable follows a
hypothesized distribution.
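The statistic is χ² = Σ(O − E)²/E with k − 1 degrees of freedom; a minimal sketch with hypothetical counts showing the computation behind chisq.test():
# Manual chi-square goodness-of-fit computation (hypothetical counts)
observed <- c(25, 30, 20, 25)
expected <- rep(sum(observed) / length(observed), length(observed))
stat <- sum((observed - expected)^2 / expected)
pchisq(stat, df = length(observed) - 1, lower.tail = FALSE)   # p-value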

Example 1: Fairness of a Dice

# Observed data (number of occurrences of each side of a dice)


observed <- c(16, 18, 15, 17, 14, 20)

# Expected frequencies (equal probabilities for a fair dice)


expected <- rep(sum(observed) / length(observed), length(observed))

# Perform chi-square goodness-of-fit test


chisq_test <- chisq.test(x = observed, p = rep(1/6, 6))

# Output the result


chisq_test

##
## Chi-squared test for given probabilities
##
## data: observed
## X-squared = 1.4, df = 5, p-value = 0.9243
Interpretation:
• chi square statistic: 1.4
• degrees of freedom: 5
• p-value: 0.9243
Decision: Since the p-value (0.9243 > 0.05), we fail to reject the null hypothesis.
Conclusion: This means we do not have sufficient evidence to say that the dice is unfair.

Example 2:
A shop owner claims that an equal number of customers come into his shop each weekday. To test this
hypothesis, a researcher records the number of customers that come into the shop in a given week and finds
the following:


• Monday: 50 customers

• Tuesday: 60 customers

• Wednesday: 40 customers

• Thursday: 47 customers

• Friday: 53 customers
Perform a Chi-Square goodness of fit test in R to determine if the data is consistent with the shop owner’s
claim.
# Enter data and create expected frequencies
observed <- c(50, 60, 40, 47, 53)
expected <- c(.2, .2, .2, .2, .2) #must add up to 1

#perform Chi-Square Goodness of Fit Test


chisq.test(x=observed, p=expected)

##
## Chi-squared test for given probabilities
##
## data: observed
## X-squared = 4.36, df = 4, p-value = 0.3595
Interpretation:
• chi square statistic: 4.36
• degrees of freedom: 4
• p-value: 0.3595
Decision: Since the p-value (0.3595 > 0.05), we fail to reject the null hypothesis.
Conclusion: This means we do not have sufficient evidence to say that the true distribution of
customers is different from the distribution that the shop owner claimed.

2. Chi Square Test of Independence


A Chi-Square Test of Independence is used to determine whether or not there is a significant association
between two categorical variables.

Example 1: Political Party Preference


Suppose we want to know whether or not gender is associated with political party preference. We take a
simple random sample of 500 voters and survey them on their political party preference. The following table
shows the results of the survey:

Republican Democrat Independent Total


Male 120 90 40 250
Female 110 95 45 250
Total 230 185 85 500

#First, we will create a table to hold our data:

#create table


data <- matrix(c(120, 90, 40, 110, 95, 45), ncol=3, byrow=TRUE)
colnames(data) <- c("Rep","Dem","Ind")
rownames(data) <- c("Male","Female")
data <- as.table(data)
data

## Rep Dem Ind


## Male 120 90 40
## Female 110 95 45
# we can perform the Chi-Square Test of Independence using the chisq.test() function:

#Perform Chi-Square Test of Independence


chisq.test(data)

##
## Pearson's Chi-squared test
##
## data: data
## X-squared = 0.86404, df = 2, p-value = 0.6492
Interpretation:
• chi square statistic: 0.86404
• degrees of freedom: 2
• p-value: 0.6492
Decision: Since the p-value (0.6492 > 0.05), we fail to reject the null hypothesis.
Conclusion: This means we do not have sufficient evidence to say that there is an association
between gender and political party preference, i.e., gender and political party preference are independent.
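It is good practice to check the expected cell counts behind the test (each should be at least 5); the chisq.test() result stores them:
# Expected counts under independence
chisq.test(data)$expected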

Example 2: Product Preference across gender


A survey asks 250 people about their gender and their preference for three types of products. The following
table shows the results of the survey:

preferA preferB preferC


Male 50 30 20
Female 40 60 50

# Create Contingency Table


data <- matrix(c(50, 30, 20, 40, 60, 50), nrow = 2, byrow = TRUE)
colnames(data) <- c("Prefer A", "Prefer B", "Prefer C")
rownames(data) <- c("Male", "Female")
data_table <- as.table(data)
data_table

## Prefer A Prefer B Prefer C


## Male 50 30 20
## Female 40 60 50
# Perform Chi-Square Test
chisq_test <- chisq.test(data_table)

chisq_test


##
## Pearson's Chi-squared test
##
## data: data_table
## X-squared = 14.55, df = 2, p-value = 0.0006925
Interpretation:
• chi square statistic: 14.55
• degrees of freedom: 2
• p-value: 0.0006925
Decision: Since the p-value (0.0006925 < 0.05), we reject the null hypothesis.
Conclusion: This means we have sufficient evidence to say that there is an association between gender
and product preference, i.e., gender and product preference are dependent.

Prepared by Symon K. Matonyo
