Machine Learning

Hypothesis Testing
Lecturer: Professor Hadi Farahani

February, 2024
Content

● Basics of hypothesis testing
● Hypothesis tests: Z-test, T-test, Chi-Square test
● p-Value method
● Type Ⅰ and Type Ⅱ Errors
Basics of hypothesis testing

- Hypothesis testing is a statistical method used to draw inferences or conclusions about a
population parameter based on sample data. It involves formulating a null hypothesis (typically
denoted H₀) and an alternative hypothesis (Hₐ), which are mutually exclusive statements about
the population parameter. The purpose of hypothesis testing is to assess whether the sample data
provide enough evidence to reject the null hypothesis in favor of the alternative hypothesis, or
whether there is not enough evidence to do so.
Basics of hypothesis testing

● Hypothesis: A claim or premise that we want to test.
● Null hypothesis (H₀): The currently accepted claim. In other words, H₀ is the default
state of belief about the world.
● Alternative hypothesis (Hₐ): The claim to be tested.
● H₀ and Hₐ are mathematical opposites: exactly one of them is true.
Basics of hypothesis testing

● Test statistic: A value calculated from the sample data and used to decide whether to reject H₀.
● Statistical significance: Where do we draw the line to make a decision? A result is statistically
significant when the test statistic falls beyond a threshold that would rarely be crossed by chance if
H₀ were true.
● Level of confidence (C): The probability of not rejecting the null hypothesis when it is true. In
interval terms, it is the proportion of confidence intervals, constructed this way over repeated samples,
that contain the true population parameter. (The probability of correctly rejecting a false null
hypothesis is a different quantity, the power of the test.)
● Level of significance (α): The probability of rejecting the null hypothesis when it is true. It is set
before conducting the test and determines the threshold beyond which you would reject the null
hypothesis. Commonly used levels of significance are 0.05 and 0.01, but they can vary depending
on the context and the requirements of the analysis. It is related to the confidence level by:
α = 1 - C
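As a sketch of how α and the critical values follow from the confidence level, the snippet below uses Python's standard-library `statistics.NormalDist` in place of a printed z table. It assumes the test statistic follows a standard normal distribution (a Z-test); the helper name `critical_z` is hypothetical.

```python
from statistics import NormalDist


def critical_z(confidence: float, two_tailed: bool = True) -> float:
    """Return the positive critical z value for confidence level C.

    Assumes a standard normal test statistic (Z-test).
    alpha = 1 - C; a two-tailed test puts alpha/2 in each tail,
    a one-tailed test puts all of alpha in a single tail.
    """
    alpha = 1 - confidence
    tail_area = alpha / 2 if two_tailed else alpha
    # inv_cdf(p) returns the z such that P(Z <= z) = p for Z ~ N(0, 1)
    return NormalDist().inv_cdf(1 - tail_area)


for c in (0.90, 0.95, 0.99):
    print(f"C = {c:.2f}, alpha = {1 - c:.2f}, "
          f"two-tailed z = {critical_z(c):.3f}, "
          f"one-tailed z = {critical_z(c, two_tailed=False):.3f}")
```

For C = 0.95 this gives about 1.96 for a two-tailed test, the value used in the fluid-dispenser example.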
Basics of hypothesis testing

The possible outcomes of hypothesis testing:

- Reject the null hypothesis (H₀)
- Fail to reject the null hypothesis (H₀)
Example
It is believed that a candy machine makes chocolate bars that weigh 5 g on average. A worker claims
that, after maintenance, the machine no longer makes 5 g bars.

● H₀: μ = 5g
● Hₐ: μ ≠ 5g
Hypothesis tests

There are various hypothesis tests, each suited to a different goal and type of data. Common choices
include the Z-test, T-test, Chi-square test, and F-test.

● Z-test: If the population standard deviation is known, or the sample size is greater than 30, the
Z-statistic is commonly used.
● T-test: If the population standard deviation is unknown and the sample size is less than 30, the
t-statistic is more appropriate.
● Chi-square test: The Chi-square test is used for categorical data or for testing independence in
contingency tables.
● F-test: F-test is often used in the analysis of variance (ANOVA) to compare variances or test the
equality of means across multiple groups.
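The selection rules in these bullets can be written as a small decision function. This is a simplified rule of thumb, not a complete guide to test selection: it assumes a Z-test whenever σ is known or n > 30 (the convention the worked examples follow) and an F-test whenever more than two group means are compared. The function `choose_test` and its parameters are hypothetical.

```python
def choose_test(categorical: bool, n_groups: int, n: int, sigma_known: bool) -> str:
    """Rule-of-thumb test selection (hypothetical helper mirroring the bullets).

    Assumption: a Z-test is chosen whenever sigma is known OR n > 30,
    and an F-test (ANOVA) whenever means of more than two groups are compared.
    """
    if categorical:
        # Counts in categories, e.g. a contingency table
        return "chi-square"
    if n_groups > 2:
        # Equality of means across several groups (ANOVA)
        return "F-test"
    if sigma_known or n > 30:
        # Known sigma, or a large sample where s is a good estimate of sigma
        return "Z-test"
    # Unknown sigma and a small sample
    return "t-test"


print(choose_test(categorical=False, n_groups=1, n=40, sigma_known=False))  # Z-test
print(choose_test(categorical=False, n_groups=1, n=10, sigma_known=False))  # t-test
```

The two calls correspond to the fluid-dispenser (n = 40) and battery (n = 10) examples in this lecture.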
Hypothesis tests

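● A hypothesis test can be conducted on a population mean or on a population proportion.
● In a hypothesis test for a population mean, the test statistic is computed by:
$$z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
where x̄ is the sample mean, μ₀ the hypothesized population mean, s the standard deviation (σ if the
population value is known) and n the sample size. For a t-test, the same expression gives the t statistic,
with n - 1 degrees of freedom.
● In a hypothesis test for a population proportion, the test statistic is computed by:
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}$$
where p̂ is the sample proportion and p₀ the hypothesized population proportion.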
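Assuming the standard formulas $z = (\bar{x} - \mu_0)/(s/\sqrt{n})$ for a mean and $z = (\hat{p} - p_0)/\sqrt{p_0(1 - p_0)/n}$ for a proportion, both test statistics take only a few lines of Python. The function names are hypothetical, and the proportion inputs (56 successes out of 100 against p₀ = 0.5) are made-up numbers for illustration only.

```python
import math


def mean_test_statistic(x_bar: float, mu_0: float, s: float, n: int) -> float:
    """(x_bar - mu_0) / (s / sqrt(n)): z (or t) statistic for a population mean."""
    standard_error = s / math.sqrt(n)
    return (x_bar - mu_0) / standard_error


def proportion_test_statistic(p_hat: float, p_0: float, n: int) -> float:
    """(p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n): z statistic for a population proportion."""
    standard_error = math.sqrt(p_0 * (1 - p_0) / n)
    return (p_hat - p_0) / standard_error


# Fluid-dispenser data from the example: x_bar = 78, mu_0 = 80, s = 2.5, n = 40
print(round(mean_test_statistic(78, 80, 2.5, 40), 2))       # -5.06
# Made-up proportion data: 56 successes in 100 trials, H0: p = 0.5
print(round(proportion_test_statistic(0.56, 0.5, 100), 2))  # 1.2
```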
Example
A factory has a machine that dispenses 80 mL of fluid into a bottle. An employee believes the average
amount of fluid is not 80 mL. Using 40 samples, he measures the average amount dispensed by the
machine to be 78 mL with a standard deviation of 2.5 mL. (a) State the null and alternative hypotheses.
(b) At a 95% confidence level, is there enough evidence to support the idea that the machine is not
working properly?
Example
a) H₀: μ = 80 mL, Hₐ: μ ≠ 80 mL
b) The first step is to determine the type of test: one-tailed or two-tailed? Since Hₐ states that
μ ≠ 80 mL, the true mean could be either less than or greater than 80 mL, so we need to conduct a
two-tailed test.

Here we use a Z-test because the sample size (n = 40) is greater than 30.
Example
- The confidence level (C) is 95%.
- So the significance level is α = 1 - C = 1 - 0.95 = 0.05. Because the test is two-tailed, α is split
equally between the two tails, α/2 per side. That means each shaded (rejection) region has an area
of 0.025, or 2.5%.
Example
- Next, we find the critical z value corresponding to a 95% confidence level from the standard normal
table, which is 1.96. The values ±1.96 separate the rejection regions (shaded areas) from the
fail-to-reject region (unshaded area).

- To decide whether to reject the null hypothesis, we calculate the z score of the sample data and
compare it with the critical values ±1.96.
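Example
$$z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{78 - 80}{2.5 / \sqrt{40}} \approx -5.06$$
- The calculated z is approximately -5.06, which is less than -1.96. The calculated z therefore lies in
the rejection region, so we reject the null hypothesis: at the 95% confidence level there is enough
evidence that the machine is not working properly.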
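The whole critical-value procedure for this example can be sketched in Python as below. It is a minimal illustration that uses the standard library's `statistics.NormalDist` rather than a z table, with the sample standard deviation plugged in for σ because n > 30.

```python
import math
from statistics import NormalDist

# Data from the fluid-dispenser example
mu_0 = 80.0        # hypothesized mean (mL) under H0
x_bar = 78.0       # sample mean (mL)
s = 2.5            # sample standard deviation (mL), used in place of sigma
n = 40             # sample size (> 30, so a Z-test is used)
confidence = 0.95

alpha = 1 - confidence
z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a two-tailed test
z = (x_bar - mu_0) / (s / math.sqrt(n))           # about -5.06

# Two-tailed rule: reject H0 if the statistic falls in either tail
reject_h0 = abs(z) > z_critical
print(f"z = {z:.2f}, critical = ±{z_critical:.2f}, reject H0: {reject_h0}")
```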
Example
A company manufactures car batteries with an average lifespan of 2 or more years. An engineer believes
this value to be less. Using 10 samples, he measures the average lifespan to be 1.8 years with a standard
deviation of 0.15 years. (a) State the null and alternative hypotheses. (b) At a 99% confidence level, is
there enough evidence to discard the null hypothesis?
Example
a) H₀: μ ≥ 2, Hₐ: μ < 2
b) The first step is to determine the type of test: one-tailed or two-tailed? Since Hₐ states that μ is
less than 2, only the left tail matters, so we need to conduct a one-tailed (left-tailed) test.
- Here we use a t-test because the population standard deviation is unknown and the sample size
(n = 10) is less than 30.
Example
- The confidence level (C) is 99%.
- So the significance level is α = 1 - C = 1 - 0.99 = 0.01.
- Next, we find the critical t value corresponding to the degrees of freedom (df = n - 1 = 9) and
α = 0.01 from the Student's t-distribution table. For this left-tailed test the critical value is -2.82.
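Example
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{1.8 - 2}{0.15 / \sqrt{10}} \approx -4.22$$
- The calculated t is approximately -4.22, which is less than -2.82. The calculated t therefore lies in
the rejection region, so we reject the null hypothesis: at the 99% confidence level there is enough
evidence that the average battery lifespan is less than 2 years.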
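A comparable sketch for the battery example follows. Python's standard library has no Student's t quantile function, so the critical value -2.82 is hard-coded from the t table (with SciPy installed, `scipy.stats.t.ppf(0.01, 9)` should give about -2.821). Treat the snippet as an illustration of the decision rule rather than a full t-test implementation.

```python
import math

# Data from the battery example
mu_0 = 2.0      # hypothesized mean lifespan (years) under H0
x_bar = 1.8     # sample mean (years)
s = 0.15        # sample standard deviation (years)
n = 10          # sample size (< 30 and sigma unknown, so a t-test is used)
df = n - 1      # degrees of freedom = 9

# Critical value for a left-tailed test at alpha = 0.01 with df = 9,
# hard-coded from the Student's t table (no t quantile in the standard library)
t_critical = -2.82

t = (x_bar - mu_0) / (s / math.sqrt(n))  # about -4.22

# Left-tailed rule: reject H0 if the statistic is below the critical value
reject_h0 = t < t_critical
print(f"df = {df}, t = {t:.2f}, critical = {t_critical}, reject H0: {reject_h0}")
```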
p-Value method

Conducting a hypothesis test typically proceeds in four steps:

- Step 1: Define the Null and Alternative Hypotheses
- Step 2: Construct the Test Statistic
- Step 3: Compute the p-Value
- Step 4: Decide Whether to Reject the Null Hypothesis
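The four steps can be sketched as one function for a Z-test on a population mean. This is an illustration under stated assumptions (normal approximation, sample standard deviation used in place of σ); `z_test_p_value` and its `tail` parameter are hypothetical names, not a library API.

```python
import math
from statistics import NormalDist


def z_test_p_value(x_bar: float, mu_0: float, s: float, n: int,
                   alpha: float, tail: str = "two"):
    """Steps 2-4 of the p-Value method for a Z-test on a population mean.

    Step 1 is encoded in the arguments: mu_0 is the value under H0, and
    tail selects Ha: "left" (mu < mu_0), "right" (mu > mu_0) or "two" (mu != mu_0).
    Assumes a normal approximation with s used in place of sigma.
    """
    # Step 2: construct the test statistic
    z = (x_bar - mu_0) / (s / math.sqrt(n))
    # Step 3: compute the p-Value from the standard normal CDF
    phi = NormalDist().cdf
    if tail == "left":
        p_value = phi(z)
    elif tail == "right":
        p_value = phi(-z)  # equals 1 - phi(z) by symmetry
    elif tail == "two":
        p_value = 2 * phi(-abs(z))
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    # Step 4: reject H0 if the p-Value is below the significance level
    return z, p_value, p_value < alpha


# Fluid-dispenser example: two-tailed test at alpha = 0.05
z, p, reject = z_test_p_value(78, 80, 2.5, 40, alpha=0.05, tail="two")
print(f"z = {z:.2f}, p-value = {p:.2e}, reject H0: {reject}")
```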
p-Value method

p-Value: The p-Value is the probability, assuming the null hypothesis is true, of obtaining a test statistic
at least as extreme as the one observed. It therefore measures how compatible the observed difference is
with chance variation alone: the smaller the p-Value, the stronger the evidence against H₀. A p-Value
below the significance level α leads to rejecting the null hypothesis.
Example
A factory manufactures cars with a 5-year warranty on the engine and transmission. An engineer
believes that the engine or transmission will malfunction in less than 5 years. He tested a sample of 40
cars and found the average time to malfunction to be 4.8 years, with a standard deviation of 0.50 years.
(a) State the null and alternative hypotheses. (b) At a 2% significance level, is there enough evidence to
support the idea that the warranty should be revised?
Example
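a) H₀: μ ≥ 5, Hₐ: μ < 5. Since Hₐ is a "less than" claim, this is a one-tailed (left-tailed) test, and since
the sample size (n = 40) is greater than 30 we use a Z-test.
b) First we calculate the z value corresponding to the sample data:
$$z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{4.8 - 5}{0.50 / \sqrt{40}} \approx -2.53$$
Example
- Then we find the area under the standard normal curve to the left of this z value. This area is the
p-Value: P(Z ≤ -2.53) ≈ 0.0057.
Example
● So the p-Value is 0.0057. In the p-Value method, if p-Value < α the null hypothesis is rejected, and if
p-Value ≥ α we fail to reject the null hypothesis. Here 0.0057 < 0.02, so we reject H₀: there is enough
evidence to support revising the warranty.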
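The p-Value calculation for the warranty example can be reproduced with the standard library. This sketch assumes a left-tailed Z-test (n = 40 > 30, sample standard deviation used in place of σ) and uses `NormalDist().cdf` for the area to the left of z instead of a z table.

```python
import math
from statistics import NormalDist

# Data from the warranty example
mu_0 = 5.0      # warranty period (years) under H0
x_bar = 4.8     # sample mean time to malfunction (years)
s = 0.50        # sample standard deviation (years), used in place of sigma
n = 40          # sample size (> 30, so a Z-test is used)
alpha = 0.02    # 2% significance level

z = (x_bar - mu_0) / (s / math.sqrt(n))  # about -2.53
p_value = NormalDist().cdf(z)            # left tail: area to the left of z, about 0.0057

reject_h0 = p_value < alpha
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```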
Type Ⅰ and Type Ⅱ Errors

Because a hypothesis test is conducted on a randomly selected sample rather than on the entire population,
its conclusion can be wrong. Two types of errors can occur:

● Type Ⅰ Error: Incorrectly rejecting the null hypothesis when it is true. Its probability is the significance
level α.
● Type Ⅱ Error: Failing to reject the null hypothesis when it is false. Its probability is usually denoted β.
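To make the meaning of α concrete, the simulation below repeatedly draws samples for which H₀ is actually true and counts how often a two-tailed Z-test rejects it. Every such rejection is a Type Ⅰ error, so the observed rate should land near α. It is a minimal sketch assuming normally distributed data with a known σ = 1; the sample size, number of trials and seed are arbitrary choices, and `simulate_type_i_error_rate` is a hypothetical name.

```python
import math
import random
from statistics import NormalDist


def simulate_type_i_error_rate(alpha=0.05, n=30, trials=10_000, seed=0):
    """Estimate how often a two-tailed Z-test rejects a TRUE null hypothesis.

    Assumes normal data with a known population standard deviation (sigma = 1),
    so the Z-test is exact and the long-run rejection rate equals alpha.
    """
    rng = random.Random(seed)
    mu_0, sigma = 0.0, 1.0  # H0: mu = 0, and H0 is true in every simulated sample
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu_0, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        z = (x_bar - mu_0) / (sigma / math.sqrt(n))
        if abs(z) > z_critical:  # rejecting a true H0 is a Type I error
            rejections += 1
    return rejections / trials


print(simulate_type_i_error_rate())  # should be close to alpha = 0.05
```

Lowering `alpha` to 0.01 should bring the rejection rate down accordingly, at the cost of a higher Type Ⅱ error rate when H₀ is false.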
