
Lecture Slides - Hypothesis Testing

Uploaded by

smorshed03

Statistics for Data Science

Hypothesis Testing
Agenda - Hypothesis Testing
1. Hypothesis Testing
a. Introduction
b. Hypothesis Formulation
2. Basic concepts of Hypothesis Testing
a. Importance of null
b. Importance of test statistic
c. Type I and Type II errors
d. Hypothesis testing template
3. Performing a Hypothesis Test
a. Some key ideas
b. Assumptions
c. Critical point
d. Rejection region approach
e. p-value approach
4. One-Tailed and Two-Tailed Tests
5. Confidence Interval and Hypothesis Test

Real World Problem
Suppose you are a quality analyst at a bulb manufacturing company and analyze the
reliability of bulbs. Historically, 70% of the bulbs pass the reliability test.

Now, a slightly altered manufacturing process (B) has been introduced to produce the bulbs.

Can you conclude whether the new process improves the reliability of the bulbs or not by
checking the number of reliable bulbs in a sample?

Gathering evidence for statistical Inference
We selected a random sample of 100 bulbs out of which 73 are reliable. Does this provide
strong evidence that the new manufacturing process is more reliable?
If the new manufacturing process were only as good as the current process, what is the
probability of getting 73 or more reliable bulbs in a sample of 100 bulbs?

The probability of getting 73 or more reliable bulbs in a sample of 100 bulbs is ~0.30.

Thus, there is no strong evidence that the new process improves reliability.

Gathering evidence for statistical Inference
A similar experiment was run with yet another manufacturing process (C). A sample of 100
bulbs produced using this process had 81 reliable bulbs.

The probability of getting 81 or more reliable bulbs in a sample of 100 bulbs is ~0.01.

Thus, there is strong evidence that process C improves reliability.
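The two tail probabilities quoted for the bulb samples can be checked with a short calculation. This is a minimal sketch, assuming the number of reliable bulbs follows a Binomial(100, 0.70) distribution when the new process is no better than the current one; the helper name `binom_upper_tail` is made up for this illustration.

```python
from math import comb

def binom_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n = 100      # sample size from the slides
p0 = 0.70    # historical pass rate, taken as the "no improvement" baseline

p_b = binom_upper_tail(73, n, p0)  # process B: 73 reliable bulbs observed
p_c = binom_upper_tail(81, n, p0)  # process C: 81 reliable bulbs observed

print(f"P(X >= 73 | p = 0.70) = {p_b:.3f}")
print(f"P(X >= 81 | p = 0.70) = {p_c:.3f}")
```

A large tail probability means the observed count is unsurprising under the baseline, while a small one means the count would be rare if nothing had improved.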

Why Hypothesis?

Estimation: The problem of estimation is considered when there is no previous knowledge of
the population parameter. The problem is simpler in that case. A random sample is taken,
a sample statistic is computed and an appropriate point and interval estimate is suggested.

Hypothesis Testing: Often the interest is not in the numerical value of the point estimate
of the parameter, but in knowing the plausibility of a hypothesis about the population
parameter by using sample data. Estimation is not enough to arrive at a conclusion in such
cases.

What is Hypothesis?

Often we are interested in population parameter(s)

A hypothesis is a conjecture about the population parameter(s)

For example, a bulb manufacturing company is interested in knowing whether the new
manufacturing process improves reliability of the bulbs.

The objective of hypothesis testing is to SET a value for the parameter(s) and perform
a statistical TEST to see whether that value is tenable in light of the evidence gathered
from the sample.
Overview of Applications
Applications of Hypothesis Testing

● Testing research hypotheses: e.g. a new automobile system increases the mean mpg
performance
● Testing the validity of a claim: e.g. a manufacturer claims that 1L soft drink bottles are
filled with an average of at least 0.99L
● Testing business decisions: e.g. a new online ad has resulted in higher online conversion
rates for an E-commerce website

Stating the Hypothesis
Null and Alternative Hypotheses - Two mutually exclusive statements about the
population parameter(s)

Null Hypothesis (H0): The presumed current state of the matter or status quo.
E.g. The new process for manufacturing bulbs does not improve reliability.

Alternative Hypothesis (Ha): The rival opinion or research hypothesis or an improvement target.
E.g. The new process for manufacturing bulbs improves reliability.
Null & Alternative Formulation : Example

Mean length of lumber is specified to be 8.5m for a certain building project. A construction
engineer wants to make sure that the shipments she received adhere to that specification.

The population parameter about which the hypothesis will be formed is population mean 𝜇.

The hypotheses are


H0 : 𝜇 = 8.5

Ha : 𝜇 ≠ 8.5

Null & Alternative Formulation : Example

There is a belief that 20% of men on business travel abroad bring a significant other with
them. A hotel chain claims that this number is too low.

The population parameter about which the hypothesis will be formed is population
proportion 𝜋.

The hypotheses are


H0 : 𝜋 = 0.2

Ha : 𝜋 > 0.2

Tips to formulate Null & Alternative

Null Hypothesis
● Am I testing a status quo that already exists?
● Negation of the research question
● Always contains equality (=, >=, <=)

Alternate Hypothesis
● Am I testing an assumption or claim that is beyond what I know?
● Research question to be proven
● Doesn’t contain equality (≠, >, <)

Basic Concepts of Hypothesis Testing

Importance of Null

Null hypothesis is assumed to be true unless reasonably strong evidence to the contrary is
found.

Based on a random sample a decision is made whether there exists reasonably strong
evidence against the null hypothesis.

Evidence is strong (satisfies the predetermined decision rule): reject the null hypothesis
in favour of the alternative hypothesis.

Evidence is not strong (does not satisfy the predetermined decision rule): fail to reject
the null hypothesis.

Importance of Test Statistic
The test statistic is calculated from the sample data and tested against the predetermined
Decision Rule.

Under the null hypothesis, the test statistic is a random variable that follows a standard
distribution such as Normal, t, F, or Chi-square. Sometimes the tests are named after the
test statistic.

Since hypothesis testing is done on the basis of sampling distribution, the decisions made
are probabilistic.

Hence, it is very important to understand the errors associated with hypothesis testing.

Type I and Type II Errors

                     H0 is True           H0 is False
Reject H0            Type I Error         Correct decision
                     Prob = α             Prob = 1 - β
Fail to reject H0    Correct decision     Type II Error
                     Prob = 1 - α         Prob = β

α is the level of significance; 1 - β is the power of the test.
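The four probabilities in the table can be made concrete with the bulb example. The following is a rough sketch under assumptions that are not in the slides: the decision rule is "reject H0 if at least c of 100 bulbs are reliable", the Type I error is capped at 0.05, and the true pass rate under the alternative is a hypothetical 0.80.

```python
from math import comb

def upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n = 100
p0 = 0.70         # pass rate under H0 (from the slides)
p1 = 0.80         # hypothetical pass rate under Ha (assumption)
alpha_max = 0.05  # assumed cap on the Type I error probability

# Smallest cutoff c whose rejection region {X >= c} has probability <= alpha_max under H0.
c = next(k for k in range(n + 2) if upper_tail(k, n, p0) <= alpha_max)

alpha = upper_tail(c, n, p0)  # P(reject H0 | H0 true)  = Type I error
power = upper_tail(c, n, p1)  # P(reject H0 | H0 false) = 1 - beta
beta = 1 - power              # P(fail to reject H0 | H0 false) = Type II error

print(f"cutoff c = {c}")
print(f"alpha = {alpha:.3f}, 1 - alpha = {1 - alpha:.3f}")
print(f"beta  = {beta:.3f}, power = {power:.3f}")
```

Because the count is discrete, the achieved α is usually somewhat below the cap. Raising the cutoff lowers α but raises β, which is the trade-off the table summarises.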

Type I and Type II Errors : Example

Null Hypothesis: The patient doesn’t have cancer

Alternate Hypothesis: The patient has cancer

Type I error (false positive): “The patient doesn’t have cancer but the doctor says she does”

Type II error (false negative): “The patient does have cancer but the report says she doesn’t”

Template for Hypothesis Testing

Hypothesis Testing Template

1. Identify the key question: What is the research question that you are trying to answer?

2. Establish the hypotheses: What is the metric of interest? Define the Null and Alternate Hypothesis.

3. Understand and prepare data: What data do you have? Do you understand what it means? Can it be used directly?

4. Identify the right test: Choose the method for testing based on the last three points

5. Check the assumptions: Ensure that data satisfies the assumption for the test.

6. Perform the test: Get to conclusion based on the results (p-value)

Performing a hypothesis test

Some key ideas first
Level of Significance (α)
● Probability of rejecting the null hypothesis when it is true
● Fixed before the hypothesis test.

p-value
● Probability of observing test statistic or more extreme results than the computed test
statistic, under the null hypothesis.
● Depends on the sample data. Alpha is pre-fixed but p-value depends on the value of the
test statistic

Acceptance or Rejection Region
● The total area under the distribution curve of the test statistic is partitioned into
acceptance and rejection region
● Reject the null hypothesis when the test statistic lies in the rejection region, else we
fail to reject it
Let’s start simple

Consider the following questions in hypothesis testing

● What are the null and alternative hypotheses?
● What is an appropriate test statistic?
● How to check whether the data is giving significant evidence against the null hypothesis
or not?
● What is preset level of significance?

Let’s see an example and understand the significance of the above questions

For simplicity, we will assume that the population standard deviation is known and the
sample size is more than 30.

Example

It is known from experience that for a certain E-commerce company the mean delivery time
of the products is 5 days with a standard deviation of 1.3 days.

The new customer service manager of the company is afraid that the company is slipping
and collects a random sample of 45 orders. The mean delivery time of the sampled orders
comes out to be 5.25 days.

Is there enough statistical evidence for the manager’s apprehension that the mean delivery
time of products is greater than 5 days?

This is clearly a one-tailed test concerning the population mean 𝜇, the mean delivery time
of products.

First test - z-test for One Mean

Significance of the test
Test for population mean
H0 : 𝜇 = 𝜇0

Assumptions
● Continuous data
● Normally distributed population or sample size > 30
● Known population standard deviation 𝜎
● Random sampling from the population

Test Statistic Distribution
Standard Normal distribution
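As a worked illustration, the z-test can be applied to the delivery-time example (𝜇0 = 5 days, 𝜎 = 1.3 days, n = 45, sample mean 5.25 days) with the usual statistic z = (x̄ - 𝜇0) / (𝜎 / √n). This is a minimal sketch that assumes a 5% level of significance, which the example itself does not state, and it shows both the rejection region approach and the p-value approach for Ha : 𝜇 > 𝜇0.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 5.0      # hypothesised mean delivery time (days)
sigma = 1.3    # known population standard deviation (days)
n = 45         # sample size
x_bar = 5.25   # observed sample mean (days)
alpha = 0.05   # assumed level of significance (not given in the example)

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Test statistic: how many standard errors the sample mean lies above mu0.
z = (x_bar - mu0) / (sigma / sqrt(n))

# Rejection region approach (upper tail): reject H0 if z exceeds the critical point.
z_crit = std_normal.inv_cdf(1 - alpha)
reject_by_region = z > z_crit

# p-value approach: probability of a statistic at least this large under H0.
p_value = 1 - std_normal.cdf(z)
reject_by_p = p_value < alpha

print(f"z = {z:.3f}, critical point = {z_crit:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_by_p else "Fail to reject H0")
```

Both approaches always give the same decision, because z exceeds the critical point exactly when the p-value falls below α.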

One-tailed and Two-tailed Tests

Alternative Hypothesis

One-tailed test
● Greater than type: Ha : 𝜇 > 𝜇0
● Less than type: Ha : 𝜇 < 𝜇0

Two-tailed test
● Not equal type: Ha : 𝜇 ≠ 𝜇0

Choice of One tailed vs Two tailed depends on the nature of the problem, not on the sample data!

Difference between One-tailed and Two-tailed Tests

Test statistic value does not change for two-tailed or one-tailed test.

Only the critical value(s) / p-value associated with the test statistic changes

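[Figure: two standard normal curves. One-tailed test: critical value 1.645. Two-tailed test:
critical values -1.96 and 1.96.]

One-tailed test: The difference is not tested on this side and the hypothesis test has
greater power on the other side.

Two-tailed test: The difference is tested on both the sides.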
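The critical values 1.645 and ±1.96 correspond to a 5% level of significance. The sketch below shows how the same test statistic can lead to different conclusions under the two kinds of test. The value z = 1.80 is hypothetical, chosen only because it falls between the two critical values.

```python
from statistics import NormalDist

std_normal = NormalDist()
alpha = 0.05
z = 1.80  # hypothetical test statistic, identical for both tests

# One-tailed (greater than type): all of alpha sits in the upper tail.
crit_one = std_normal.inv_cdf(1 - alpha)
p_one = 1 - std_normal.cdf(z)

# Two-tailed (not equal type): alpha is split equally between the tails.
crit_two = std_normal.inv_cdf(1 - alpha / 2)
p_two = 2 * (1 - std_normal.cdf(abs(z)))

print(f"one-tailed: critical value {crit_one:.3f}, p-value {p_one:.4f}, reject: {z > crit_one}")
print(f"two-tailed: critical values ±{crit_two:.3f}, p-value {p_two:.4f}, reject: {abs(z) > crit_two}")
```

The two-tailed p-value is twice the one-tailed one here, which is why the one-tailed test has greater power in the direction it examines.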
Connecting the dots with Confidence
Intervals

Confidence Interval vs Hypothesis Testing
Suppose we calculate the (100 - 5)% = 95% confidence interval for the mean.

We also conduct the Z-test for the mean with a 5% significance level.

The hypotheses of the Z-test are


H0 : 𝜇 = 𝜇0 against Ha : 𝜇 ≠ 𝜇0

Is there any relationship between the estimated confidence interval and the hypothesis
test?

The confidence interval contains all values of 𝜇0 for which the null hypothesis will not be
rejected.
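This relationship can be checked numerically. The sketch below reuses the delivery-time numbers (sample mean 5.25, 𝜎 = 1.3, n = 45) purely for illustration, with a two-tailed test at the 5% level, and confirms over a grid of candidate 𝜇0 values that a value lies inside the 95% confidence interval exactly when the Z-test fails to reject it. The grid of candidates is an arbitrary choice.

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n = 5.25, 1.3, 45  # delivery-time sample, reused for illustration
alpha = 0.05

se = sigma / sqrt(n)                          # standard error of the mean
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value

# 95% confidence interval for the mean with known sigma.
ci_low = x_bar - z_crit * se
ci_high = x_bar + z_crit * se

def two_sided_rejects(mu0):
    """Two-tailed Z-test of H0: mu = mu0 at level alpha."""
    return abs((x_bar - mu0) / se) > z_crit

# Arbitrary grid of candidate null values: 4.50, 4.55, ..., 6.00.
candidates = [4.5 + 0.05 * i for i in range(31)]
agree = all(
    (ci_low <= mu0 <= ci_high) == (not two_sided_rejects(mu0))
    for mu0 in candidates
)

print(f"95% CI: ({ci_low:.3f}, {ci_high:.3f})")
print(f"CI membership matches 'fail to reject' for every candidate: {agree}")
```

The agreement follows from algebra: |x̄ - 𝜇0| ≤ z·𝜎/√n is both the condition for 𝜇0 to lie in the interval and the condition for the test statistic to stay inside the acceptance region.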