4 Hypothesis Testing
• “Beware of the problem of testing too many hypotheses; the more you torture the data, the more
likely they are to confess, but confessions obtained under duress may not be admissible in the
court of scientific opinion.”
• — Stephen M Stigler
Hypothesis Testing
• LO 6-1 Understand hypothesis testing and its importance in analytics.
• LO 6-2 Learn to set up a hypothesis test and understand the concept of null and alternative hypotheses.
• LO 6-3 Understand the link between the central limit theorem and the test statistic in the one-sample Z-test and t-test.
• LO 6-4 Understand the concepts of significance (α), probability value (p-value), and Type I and Type II errors.
• LO 6-5 Understand the simple one-sample hypothesis test for the population mean when the population variance is either known or unknown.
• LO 6-6 Learn to conduct a two-sample hypothesis test and its applications in analytics.
• LO 6-7 Understand the role of non-parametric tests such as the chi-square test of independence.
• LO 6-8 Learn goodness of fit tests and their application in identifying the best probability distribution to describe a data set.
Introduction to Hypothesis Testing
• Blackout Babies
On 9 November 1965, a power failure caused a blackout lasting approximately 12 hours in New York
and surrounding areas. Nine months later, in August 1966, The New York Times published a series of
three articles claiming, based on interviews with city doctors, that the birth rate in August 1966 was
higher than normal (Izenman and Zabell, 1981). The babies were nicknamed ‘blackout babies’.
The articles raised an interesting question: do power failures result in procreation? Using time series
analysis, Izenman and Zabell (1981) concluded that there is not enough evidence to suggest that the
1965 power failure resulted in an increased birth rate nine months after the blackout.
Introduction to Hypothesis Testing
• A hypothesis test consists of two complementary statements called the null hypothesis and the
alternative hypothesis, and only one of them is true.
• In business, organizations make many claims. A few examples of such claims are listed below:
1. Children who drink Complan (a health drink owned by the company Heinz in India) are likely to
grow taller.
2. If you drink Horlicks, you can grow taller, stronger, and sharper (3 in 1).
3. Using Fair and Lovely (Fair and Handsome) cream can make one fair and lovely (fair and
handsome).
4. Wearing perfume (such as Axe) will help to attract the opposite gender (known as the Axe effect).
5. Women use camera phones more than men (Freier, 2016).
6. Beautiful people are more likely to have a girl child (Miller and Kanazawa, 2007). This is one of my
favorite hypotheses: since I have a daughter, I can claim that I am good looking.
7. Married people are happier than singles (Anon, 2015), especially those who married their best
friend (many married people may not agree!).
8. Vegetarians miss fewer flights (Siegel, 2016).
9. Smokers are better salespeople.
SETTING UP A HYPOTHESIS TEST
• In this section, we discuss the steps involved in hypothesis testing. Data analysis in general can be
classified as exploratory data analysis or confirmatory data analysis.
• In exploratory data analysis, the idea is to look for new or previously unknown hypotheses. In
confirmatory data analysis, the objective is to test the validity of a hypothesis (confirm whether the
hypothesis is true or not) using techniques such as hypothesis testing and regression.
• According to Tukey (1977), exploratory data analysis is similar to detective work that suggests
hypotheses, whereas confirmatory data analysis looks for evidence in support of hypotheses using
techniques such as hypothesis testing.
SETTING UP A HYPOTHESIS TEST
• The following steps are used in hypothesis testing:
1. Describe the hypothesis in words. A hypothesis is described using a population parameter (such
as mean, standard deviation, or proportion) about which a claim (hypothesis) is made. A few
sample claims (hypotheses) are:
(a) The average time spent by women using social media is more than that spent by men.
(b) On average, women upload more photos to social media than men.
(c) Customers with more than one mobile handset are more likely to churn.
2. Based on the claim made in step 1, define the null and alternative hypotheses. Initially we believe
that the null hypothesis is true. In general, the null hypothesis means that there is no relationship
between the two variables under consideration (for example, the null hypothesis for the claim
‘women use social media more than men’ will be ‘there is no relationship between gender and
the average time spent in social media’). Null and alternative hypotheses are defined using a
population parameter.
3. Identify the test statistic to be used for testing the validity of the null hypothesis. The test statistic
enables us to measure how far the sample departs from what the null hypothesis predicts. The choice
of test statistic depends on the sampling distribution; for example, if the test is for the mean, the
mean is calculated from a large sample, and the population standard deviation is known, then the
sampling distribution is a normal distribution and the test statistic is a Z-statistic (standard normal
statistic).
4. Decide the criteria for rejection and retention of the null hypothesis. This is called the significance
value, traditionally denoted by the symbol α. The value of α depends on the context; usually
0.1, 0.05, or 0.01 is used. The significance value α is the probability of a Type I error (discussed later).
5. Calculate the p-value (probability value), which is the conditional probability of observing a
test statistic value at least as extreme as the one calculated from the sample when the null hypothesis
is true. In simple terms, the p-value measures how compatible the sample is with the null hypothesis:
the smaller the p-value, the weaker the support for the null hypothesis.
6. Take the decision to reject or retain the null hypothesis based on the p-value and the significance
value α. The null hypothesis is rejected when the p-value is less than α and retained when the
p-value is greater than or equal to α.
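The decision rule in step 6 can be sketched in a few lines of Python. This is only an illustration of the rule as stated above; the function name `decide` and the default α = 0.05 are assumptions made for the example.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the decision on the null hypothesis for a given p-value and alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be a probability in [0, 1]")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Reject only when the p-value is strictly below the threshold alpha;
    # a p-value equal to alpha leads to retaining the null hypothesis.
    return "reject H0" if p_value < alpha else "retain H0"


print(decide(0.03))        # p-value below the default alpha of 0.05
print(decide(0.05))        # p-value equal to alpha
print(decide(0.03, 0.01))  # same p-value, stricter alpha
```

Note that the same p-value can lead to different decisions under different values of α, which is why α should be fixed before the p-value is calculated.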
Description of Hypothesis
• Hypotheses are claims that are usually stated initially in simple words, as listed below:
1. The average annual salary of machine learning experts is different for males and females.
2. On average, people with a Ph.D. in analytics earn more than people with a Ph.D. in engineering.
3. The average box-office collection of comedy movies is more than that of action movies.
4. The average life span of vegetarians is longer than that of meat eaters.
5. The proportion of married people defaulting on loan repayment is less than the proportion of
singles defaulting on loan repayment.
Null and Alternative Hypothesis
• The null hypothesis, usually denoted as H0 (H zero or H naught), refers to the statement that
there is no relationship or no difference between different groups with respect to the value of
a population parameter. The null hypothesis is the claim that is assumed to be true initially. That is,
at the beginning we assume that the null hypothesis is true and retain it unless there is strong
evidence against it.
• The alternative hypothesis, usually denoted as HA (or H1), is the complement of the null
hypothesis. It is what the researcher believes to be true, so the researcher would like to reject the
null hypothesis.
• A hypothesis test checks the validity of the null hypothesis based on the evidence from the sample.
At the beginning of the test, we assume that the null hypothesis is true. Since the researcher may
believe in the alternative hypothesis, she/he may like to reject the null hypothesis. However, in many
cases (such as goodness of fit tests), we would like to retain or fail to reject the null hypothesis.
Test Statistic
• The test statistic is the standardized difference between the estimated value of the parameter being
tested, calculated from the sample(s), and the hypothesized value (that is, the standardized difference
between X̄ and μ in the case of testing a mean). It is the standardized value used for calculating the
p-value (probability value). Since the test statistic is a standardized value, it measures the distance
(in number of standard deviations) between the value of the parameter estimated from the sample(s)
and the value stated in the null hypothesis.
Test Statistic
• The p-value is the conditional probability of observing a statistic value at least as extreme as the
one calculated from the sample when the null hypothesis is true. For example, consider the following
research hypothesis: the average annual salary of machine learning experts is more than 100,000.
The corresponding null hypothesis is H0: μm ≤ 100,000.
• Note that the p-value is a conditional probability: it is calculated given that the null hypothesis is
true. A large p-value means the sample is consistent with the null hypothesis; a small p-value is
evidence against it.
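As a rough illustration of the p-value for the salary example, the sketch below computes a right-tailed p-value for H0: μm ≤ 100,000 using the standard normal distribution. The sample mean (104,000), the population standard deviation (16,000), and the sample size (64) are made-up numbers chosen for the example, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def right_tailed_p_value(sample_mean, mu0, sigma, n):
    """Return (z, p) for H0: mu <= mu0 against HA: mu > mu0 with known sigma."""
    # Standardized difference between the sample mean and the hypothesized mean.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Probability of a statistic at least this large if the null hypothesis is true.
    p = 1.0 - NormalDist().cdf(z)
    return z, p


# Hypothetical sample summary, not taken from real salary data.
z, p = right_tailed_p_value(sample_mean=104_000, mu0=100_000, sigma=16_000, n=64)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

With these illustrative numbers the sample mean lies two standard errors above the hypothesized value, so the p-value is the area to the right of z = 2 under the standard normal curve.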
Decision Criteria – Significance Value
• The primary task in hypothesis testing is to decide whether to reject or fail to reject (retain) the
null hypothesis; thus we need a criterion for taking the decision. The significance level, usually
denoted by α, is the criterion used for deciding whether to reject or retain the null hypothesis based
on the calculated p-value. The significance value α is the threshold for the p-value: the null
hypothesis is rejected when the calculated p-value falls below α and retained otherwise.
• The value of the statistic in the sampling distribution beyond which the tail probability equals α is
called the critical value. In a right-tailed test, if the calculated statistic value is greater than the
critical value (the p-value will be less than α), then we reject the null hypothesis, whereas if the
statistic value is less than the critical value, we retain the null hypothesis. In a left-tailed test, if the
calculated statistic value is less than the critical value (the p-value will be less than α), then we reject
the null hypothesis, whereas if the statistic value is greater than the critical value, we retain the null
hypothesis. The areas beyond the critical values are known as the rejection region.
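The critical-value rule above can be sketched for a Z-statistic as follows. This is a minimal example that assumes the sampling distribution is standard normal; the function names and the `tail` argument are hypothetical.

```python
from statistics import NormalDist


def critical_value(alpha, tail):
    """Critical value of a standard normal statistic for a one-tailed test."""
    std_normal = NormalDist()
    if tail == "right":
        # Area alpha lies to the right of the critical value.
        return std_normal.inv_cdf(1.0 - alpha)
    if tail == "left":
        # Area alpha lies to the left of the critical value.
        return std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'right' or 'left'")


def reject_null(z, alpha, tail):
    """True when the statistic z falls in the rejection region."""
    c = critical_value(alpha, tail)
    return z > c if tail == "right" else z < c


print(round(critical_value(0.05, "right"), 3))
print(round(critical_value(0.05, "left"), 3))
print(reject_null(2.0, 0.05, "right"))
print(reject_null(-1.0, 0.05, "left"))
```

Comparing the statistic with the critical value and comparing the p-value with α are two views of the same rule, so they always lead to the same decision.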
ONE-TAILED AND TWO-TAILED TEST
• Consider the following three hypotheses:
1. The salary of machine learning experts is, on average, more than US $100,000.
2. The average waiting time at the London Heathrow airport security check is less than 30 minutes.
3. The average annual salaries of male and female MBA students are different at the time of
graduation.
• For the first hypothesis, on the salary of machine learning experts, the null and alternative
hypotheses are:
H0: μm ≤ 100,000
HA: μm > 100,000
where μm is the average annual salary of machine learning experts. Note that the equality symbol is
always part of the null hypothesis, since we have to measure the difference between the value
estimated from the sample and the hypothesized value. In this case, the decision to reject or retain
depends on the direction of deviation of the estimated parameter value from the hypothesized value.
• Figure 6.2 shows the rejection region on the right side of the distribution. Since the rejection region
is only on one side, this is a one-tailed test. Specifically, since the alternative hypothesis in this case is
μm > 100,000, it is called a right-tailed test.
• The average salaries of male and female MBA students at graduation are different:
H0: μm = μf
HA: μm ≠ μf
where μm and μf are the average salaries of male and female MBA students, respectively, at the time
of graduation. In this case, the rejection region is on both sides of the distribution: if the significance
level is α, then the rejection region has probability α/2 on each side. Since the rejection region is on
both sides of the distribution, this is a two-tailed test. Figure 6.4 shows the rejection region of a
two-tailed test.
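The α/2 split for a two-tailed test can be illustrated with a short sketch. It assumes a standard normal test statistic; the function name and the returned dictionary keys are hypothetical.

```python
from statistics import NormalDist


def two_tailed_test(z, alpha=0.05):
    """Critical values, p-value, and decision for a two-tailed Z-statistic."""
    std_normal = NormalDist()
    # Each tail of the rejection region has probability alpha / 2.
    upper = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Deviations in either direction count as extreme, so double the tail area.
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    return {
        "lower": -upper,
        "upper": upper,
        "p_value": p_value,
        "reject": abs(z) > upper,
    }


result = two_tailed_test(2.5)
print(result)
```

For α = 0.05 the critical values are approximately ±1.96, which is larger in magnitude than the one-tailed critical value of about 1.645 at the same α.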
TYPE I ERROR, TYPE II ERROR, AND POWER OF THE HYPOTHESIS TEST
• 1. Type I Error: The conditional probability of rejecting a null hypothesis when it is true is called
the Type I error or false positive (falsely believing that the claim made in the alternative hypothesis
is true). The significance value α is the value of the Type I error. Mathematically, Type I error can be
defined as follows:
Type I Error = α = P(Rejecting null hypothesis | H0 is true) (6.3)
It is important to understand the difference between the p-value and the significance value α.
The p-value is calculated from the observed sample and measures how compatible that sample is
with the null hypothesis, whereas the significance value α is an error rate defined over repeated
sampling. Hubbard et al. (2003) state that the p-value in a hypothesis test refers to the probability of
observing the data given the null hypothesis, whereas the significance level α refers to the incorrect
rejection of the null hypothesis when it is true under repeated trials.
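The repeated-sampling interpretation of α can be checked with a small simulation. The sketch below draws many samples from a population in which the null hypothesis is true and counts how often a right-tailed Z-test rejects it. The population (normal with mean 0 and standard deviation 1), the sample size, the number of trials, and the seed are all assumptions made for the illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_i_error_rate(alpha=0.05, mu0=0.0, sigma=1.0, n=30, trials=20_000, seed=42):
    """Estimate how often a true null hypothesis is rejected by a right-tailed Z-test."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1.0 - alpha)
    rejections = 0
    for _ in range(trials):
        # The sample comes from a population whose true mean equals mu0,
        # so every rejection here is a Type I error.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if z > critical:
            rejections += 1
    return rejections / trials


rate = type_i_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")
```

The estimated rate should be close to the chosen α, and it varies slightly with the seed and the number of trials.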
TYPE I ERROR, TYPE II ERROR
• 2. Type II Error: The conditional probability of failing to reject a null hypothesis (or retaining a
null hypothesis) when the alternative hypothesis is true is called the Type II error or false negative
(falsely believing that there is no relationship). Usually the Type II error is denoted by the symbol β.
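For a right-tailed Z-test, β can be calculated once a specific alternative value of the mean is assumed. The sketch below is one way to do this; the alternative mean `mu1`, the other numbers, and the function name are hypothetical. It also prints 1 − β, the quantity usually referred to as the power of the test.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu0, mu1, sigma, n, alpha=0.05):
    """Beta for a right-tailed Z-test of H0: mu <= mu0 when the true mean is mu1."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)
    # Distance between the true mean and the hypothesized mean, in standard errors.
    shift = (mu1 - mu0) / (sigma / sqrt(n))
    # H0 is retained when Z <= z_alpha; under the true mean mu1 the statistic Z
    # is normal with mean `shift` and standard deviation 1.
    return std_normal.cdf(z_alpha - shift)


# Hypothetical values, chosen only to illustrate the calculation.
beta = type_ii_error(mu0=100_000, mu1=104_000, sigma=16_000, n=64)
print(f"beta = {beta:.3f}, power = {1.0 - beta:.3f}")
```

Increasing the sample size or the distance between `mu1` and `mu0` reduces β, while lowering α increases it.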
HYPOTHESIS TESTING FOR POPULATION MEAN WITH
KNOWN VARIANCE: Z-TEST
• The Z-test (also known as the one-sample Z-test) is used when a claim (hypothesis) is made about a
population parameter such as the population mean or proportion when the population variance is
known. In this section, we discuss hypothesis testing for the population mean when the population
variance is known. Since the hypothesis test is carried out with just one sample, this test is also known
as the one-sample Z-test.
According to the central limit theorem (CLT) for the sampling distribution of the mean, the sample
mean of a large sample of independent and identically distributed observations follows a normal
distribution with mean μ and standard deviation $\sigma/\sqrt{n}$. The standardized value
$(\bar{X} - \mu)/(\sigma/\sqrt{n})$ follows a standard normal distribution. The Z-test uses the CLT to
conduct a hypothesis test for the population mean when the population variance is known; the test
statistic for the Z-test is given by
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
where X̄ is the sample mean, μ is the hypothesized population mean, σ is the known population
standard deviation, and n is the sample size.
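A one-sample Z-test along these lines might be sketched as follows. The data, the hypothesized mean of 100, and the known standard deviation of 6 are invented for the illustration, and the function name and its `tail` argument are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean


def one_sample_z_test(sample, mu0, sigma, tail="two-sided", alpha=0.05):
    """Return (z, p_value, reject) for a one-sample Z-test with known sigma."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf(z)
    if tail == "right":
        p_value = 1.0 - cdf
    elif tail == "left":
        p_value = cdf
    elif tail == "two-sided":
        p_value = 2.0 * min(cdf, 1.0 - cdf)
    else:
        raise ValueError("tail must be 'right', 'left', or 'two-sided'")
    return z, p_value, p_value < alpha


# Hypothetical sample of 36 observations with mean 101.5.
sample = [102, 98, 105, 101] * 9
z, p_value, reject = one_sample_z_test(sample, mu0=100, sigma=6, tail="right")
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Here the sample mean is 1.5 standard errors above the hypothesized mean, which is not enough to reject the null hypothesis at α = 0.05 in a right-tailed test.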
HYPOTHESIS TEST FOR POPULATION MEAN UNDER UNKNOWN POPULATION
VARIANCE: t-TEST
• In many cases the population variance (and thus the standard deviation) is not known and has to be
estimated from the sample itself. Let S be the standard deviation estimated from a sample of size n.
We use the fact that, for a sample drawn from a normally distributed population with unknown
variance, the statistic
$$t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$
follows a t-distribution with (n - 1) degrees of freedom.
• The t-test is used when the population follows a normal distribution and the population standard
deviation σ is unknown and is estimated from the sample. The t-test is robust to violations of
normality as long as the data are close to symmetric and there are no outliers.
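The one-sample t-test can be sketched with the Python standard library alone. Because the standard library has no t-distribution, the tail probability below is obtained by numerically integrating the t density with Simpson's rule; this is a swapped-in approximation for illustration, not how statistical packages compute it. The sample values, the hypothesized mean of 10, and all function names are hypothetical.

```python
from math import gamma, pi, sqrt
from statistics import fmean, stdev


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom (modest df only)."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_sf(t, df, steps=2000):
    """P(T > t), using Simpson's rule on [0, |t|] and the symmetry of the density."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper_tail = 0.5 - total * h / 3
    return upper_tail if t >= 0 else 1.0 - upper_tail


def one_sample_t_test(sample, mu0):
    """Return (t, df, two-sided p-value) for H0: mu = mu0."""
    n = len(sample)
    # stdev uses the (n - 1) divisor, i.e. the sample standard deviation S.
    t = (fmean(sample) - mu0) / (stdev(sample) / sqrt(n))
    df = n - 1
    return t, df, 2.0 * t_sf(abs(t), df)


# Hypothetical sample of 10 observations.
sample = [10, 12, 9, 11, 13, 8, 12, 10, 11, 14]
t, df, p = one_sample_t_test(sample, mu0=10)
print(f"t = {t:.3f}, df = {df}, two-sided p-value = {p:.3f}")
```

As a sanity check on the numerical integration, `t_sf(2.262, 9)` should be close to 0.025, the tabulated tail area for 9 degrees of freedom.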
PAIRED SAMPLE t-TEST
• In many cases, we would like to analyse whether an intervention (or treatment) such as a training
program, marketing promotion, treatment for a specific illness, or lifestyle change has significantly
changed a population parameter such as the mean or proportion.
• The objective in this case is to check whether the difference in the parameter values before and
after the intervention, or between two different types of interventions (for example, two different
types of promotions), is statistically significant. In a paired t-test, the data are captured twice from
the same subject, once before the intervention and once after the intervention. Examples include:
1. Body weight of subjects before and after attending a yoga training program.
2. Cholesterol levels of subjects before and after attending meditation training.
3. Amount of time spent by subjects on the internet before and after marriage.
4. Quantity of alcohol consumed by people before and after a breakup.
5. Level of cortisol among students during and after an exam.
• Note that, in the above examples, we are observing a value on the same subject before and after
the intervention. Assume that the mean difference in the estimated parameter value before and after
the treatment is D̄, and the corresponding standard deviation of the differences is Sd. Let μd be the
hypothesized mean difference. Then the statistic
$$t = \frac{\bar{D} - \mu_d}{S_d / \sqrt{n}} \qquad (6.9)$$
follows a t-distribution with (n - 1) degrees of freedom, where n is the number of paired
observations. Here we assume that the differences follow a normal distribution.
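The paired t statistic can be computed directly from the per-subject differences. The sketch below is a minimal illustration; the body-weight numbers are invented, the function name is hypothetical, and the differences are taken as after minus before (an assumption, since the direction only changes the sign of the statistic).

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t_statistic(before, after, mu_d=0.0):
    """Return (t, df) for paired observations, with differences = after - before."""
    if len(before) != len(after):
        raise ValueError("before and after must contain the same subjects")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = fmean(diffs)   # mean difference
    s_d = stdev(diffs)     # sample standard deviation of the differences
    t = (d_bar - mu_d) / (s_d / sqrt(n))
    return t, n - 1


# Hypothetical body weights (kg) of five subjects before and after a program.
before = [80, 75, 90, 68, 85]
after = [78, 74, 86, 67, 83]
t, df = paired_t_statistic(before, after)
print(f"t = {t:.3f}, df = {df}")
```

The resulting statistic is compared with the critical value of the t-distribution with `df` degrees of freedom; for example, the tabulated two-tailed critical value for 4 degrees of freedom at α = 0.05 is 2.776. If SciPy is installed, `scipy.stats.ttest_rel(after, before)` should give the same statistic together with a p-value.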
NON-PARAMETRIC TESTS: CHI-SQUARE TESTS
• A major difference between parametric and non-parametric tests is that in a parametric test we need
only the values of the parameters and knowledge about the distribution, whereas in a non-parametric
test we use the entire distribution of the data. Importantly, the data may not follow any parametric
distribution such as the normal distribution. Also, the test may not be about a population parameter
but about characteristics of the entire distribution (for example, whether the data follow a normal
distribution or not). A non-parametric method for hypothesis tests is used when one or more of the
following conditions exist in the test:
1. The test is not about a population parameter such as the mean or standard deviation.
2. The method does not require assumptions about the population distribution (such as the
population following a normal distribution).
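As one example of such a test, the chi-square test of independence compares the observed counts in a contingency table with the counts expected if the two variables were unrelated. The sketch below computes the statistic for a 2×2 table of made-up counts; the function name is hypothetical. For one degree of freedom the critical value is obtained from the standard normal distribution, using the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
from statistics import NormalDist


def chi_square_independence(table):
    """Return (statistic, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables are independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical 2x2 table of counts (rows: two groups, columns: two outcomes).
table = [[30, 20], [20, 30]]
statistic, df = chi_square_independence(table)

# With df = 1, the chi-square critical value equals the squared two-tailed z value.
alpha = 0.05
critical = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2
print(f"chi-square = {statistic:.2f}, df = {df}, critical value = {critical:.3f}")
print("reject H0 of independence:", statistic > critical)
```

The shortcut for the critical value only works for one degree of freedom; larger tables need chi-square tables or a statistics library.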