Hypothesis Testing
This PDF contains notes from the Introduction to Statistical Tests course on Great Learning and
might contain mistakes. Some Python code associated with some of the exercises can be found here
Inferential Statistics
0.1 Data types
Data is generally divided into numerical data and categorical data. Categorical data can
either be:
• Nominal (Unordered)
• Ordinal (Ordered)
• Binary
• The null hypothesis is the case where the assumption and the statistical results are the same (no real effect or difference)
• Comparing two models to see which one is statistically significantly better than the other
• Probability
• Conditional probability
• Hypothesis testing
A random experiment is an experiment in which all the possible outcomes are known, the sample
space is the set of all possible outcomes, an event is a subset of the sample space, and a random
variable is a variable whose value depends on the outcome of a random experiment.
0.4.2 CLT
The CLT states that as the sample size increases (usually n ≥ 30), the sampling distribution of
the sample mean becomes approximately normal, regardless of the population distribution.
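As a rough illustration (a minimal simulation sketch assuming NumPy; the exponential population and the sample size of 30 are arbitrary choices for the sketch, not from the course):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 30            # sample size, the usual rule-of-thumb threshold
n_samples = 10_000

# Skewed population: exponential with mean 1 and std 1 (illustrative choice)
sample_means = rng.exponential(scale=1.0, size=(n_samples, n)).mean(axis=1)

# The sample means cluster around the population mean with spread sigma/sqrt(n),
# and their histogram looks approximately normal even though the population is skewed.
print(sample_means.mean())        # close to 1.0
print(sample_means.std(ddof=1))   # close to 1/sqrt(30) ≈ 0.183
```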
• Define an analysis plan describing how the sample will be used to evaluate the null hypothesis
• Apply a decision rule to decide whether to accept or reject the null hypothesis
0.5.2 p-value
The p-value is a value produced by a statistical test: it is the probability of seeing a test
statistic as extreme as the calculated value if the null hypothesis is true. A low p-value
indicates that we might reject the null hypothesis; if the p-value is smaller than the significance
level α we reject the null hypothesis.
Informally, the p-value indicates how strongly the data support a conclusion: if, for example,
we have two drugs, assume that drug1 is different from drug2, perform the t-test and accept this
hypothesis, a small p-value means the observed difference would be very unlikely if the drugs were
actually the same, which is what gives us confidence in the conclusion.
The p-value is a measure of the probability of observing the data, or something more extreme,
under the assumption that the null hypothesis is true.
If we perform a statistical test on two groups and test a drug on them, our null hypothesis
is that there is no difference between the two groups (so if drug A really is better than drug B,
the null hypothesis should be rejected). If after the statistical test we get a p-value smaller
than 0.05, this means there is evidence of a difference between the groups and we reject the null hypothesis.
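A minimal sketch of such a two-group comparison with SciPy (the drug-response numbers below are made up for illustration, not course data):

```python
import numpy as np
from scipy import stats

# Hypothetical responses of two groups, one per drug
drug_a = np.array([12.1, 11.8, 12.5, 13.0, 12.2, 11.9, 12.7, 12.4])
drug_b = np.array([11.2, 11.5, 11.0, 11.8, 11.4, 11.1, 11.6, 11.3])

# Two-sample t-test; H0: the two group means are equal
t_stat, p_value = stats.ttest_ind(drug_a, drug_b)

alpha = 0.05
if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject H0, the groups differ")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject H0")
```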
• Goodness of fit : determine if the sample categorical data matches the population.
0.5.4 ANOVA
When performing experiments, if we want to see whether the results are significant or not we perform
ANOVA; it helps compare three or more samples (e.g., whether one college is better than the others).
Generally there are two types of ANOVA tests: the one-way test and the two-way test.
1 Descriptive statistics
We try to describe something using its statistical properties, for example the mean $\bar{x} = \frac{1}{N}\sum_i x_i$.
1.2 Variability
• Range: $X_{max} - X_{min}$
• Variance: $Var(X) = \frac{1}{N}\sum_i (x_i - \bar{x})^2$
1.3 Relationships
• Covariance indicates whether there is a linear relationship between two random variables, but this measure only tells us the direction of the relationship, not its strength: $Cov(X, Y) = \frac{1}{N}\sum_i (x_i - \bar{x})(y_i - \bar{y})$
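A quick sketch of these descriptive measures with NumPy (made-up data; the population formulas above divide by N, hence ddof=0 and bias=True):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 7.0])

value_range = x.max() - x.min()             # Range = Xmax - Xmin
variance = np.var(x, ddof=0)                # (1/N) * sum((xi - xbar)^2)
covariance = np.cov(x, y, bias=True)[0, 1]  # (1/N) * sum((xi - xbar)(yi - ybar))

print(value_range, variance, covariance)
```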
1.4 Skewness
Measures the degree of asymmetry of points around the mean
• If the tail of the distribution extends in the positive direction the skewness is positive and if it extends in the negative direction it is negative; if there is no tail there is no skewness
Figure 2: Skewness
Figure 3: kurtosis
• If the skewness is positive then the mean>median>mode and if the skewness is negative
mean<median<mode
• skewness $= \frac{\bar{X} - mode}{\sigma_x} = \frac{3(\bar{X} - median)}{\sigma_x} = \frac{\frac{1}{N}\sum_i (x_i - \bar{x})^3}{Var(X)^{3/2}}$
1.5 Kurtosis
Measures the sharpness of the peak of the distribution
• Platykurtic means relatively small changes in the data (kurtosis < 0) and a small number of outliers
• Leptokurtic means rapid changes in the data and indicates the existence of a lot of outliers (heavy tails), kurtosis > 0
• kurtosis $= \frac{\frac{1}{N}\sum_i (x_i - \bar{x})^4}{Var(X)^2} - 3$
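These moment-based formulas match what scipy.stats computes by default (scipy.stats.kurtosis returns excess kurtosis, i.e., it already subtracts 3); the data below is illustrative only:

```python
import numpy as np
from scipy import stats

data = np.array([2, 3, 3, 4, 4, 4, 5, 5, 6, 12])  # a right-skewed toy sample

skewness = stats.skew(data)             # biased moment estimator, matches the 1/N formula
excess_kurtosis = stats.kurtosis(data)  # Fisher definition: kurtosis - 3

print(f"skewness = {skewness:.3f}")                 # > 0: tail on the positive side
print(f"excess kurtosis = {excess_kurtosis:.3f}")   # > 0 leptokurtic, < 0 platykurtic
```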
• In 100 coin tosses I only need to know the number of times we got heads in order to determine the rest of the outcomes, thus it is a system with 1 degree of freedom
• In a traffic light I need to know which two lights are turned off in order to determine the light that is on, thus it is a system with 2 degrees of freedom
• If we already know the mean, when calculating the variance we don't have to know all N values but only N−1 of them, thus the variance has N−1 degrees of freedom. If I have {1,2,3} and I know the mean is 2, when calculating the variance I don't have to know all three numbers, because knowing the mean gives me the third one; dividing by N (here three) during the calculation of the variance results in a biased variance, while dividing by N−1 gives the unbiased estimate
• When computing a test statistic we have to use unbiased variances: a one-sample t-test has N−1 degrees of freedom since we estimate the mean, and a two-sample test has N−2 degrees of freedom
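A small sketch of the biased vs. unbiased variance with NumPy (ddof is the number of degrees of freedom subtracted from N):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])

biased = np.var(x, ddof=0)    # divides by N      -> biased estimate (here 2/3)
unbiased = np.var(x, ddof=1)  # divides by N - 1  -> unbiased estimate (here 1.0)

print(biased, unbiased)
```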
2 Hypothesis testing
We do an experiment and we assume that our result is a fact.
Hypothesis testing focuses only on the first row, since it is enough for us to decide whether
to reject or accept a hypothesis: if p-value > α we accept the null hypothesis.
2.2 Z test
A customer has complained that strawberries he had bought previously weighed under 250gm.
The retailer decides to check the weight of 85 punnets. He finds the average weight is 248.5gm
with standard deviation of 4.8gm. The question asks to use a significance test to judge
whether the retailer is selling under-weight punnets, and which of the following conclusions
is correct:
• D) A significance test is
• Large Sample Size: The Z-test is suitable for large sample sizes (typically n ≥ 30).
This is because, with a large sample size, the sample mean tends to follow a normal
distribution regardless of the distribution of the population, due to the Central Limit
Theorem.
• Sampling is Random: The sample data should be obtained through random sampling
methods to ensure that the sample is representative of the population.
• Continuous Data: The data being analyzed should be continuous and measured on
an interval or ratio scale.
Testing whether the mean height of students in a school is significantly different from the
national average height (when the population standard deviation is known). Comparing
the mean scores of two groups in an experiment when the sample sizes are large and the
population standard deviations are known. Determining if the proportion of defective items
produced by a manufacturing process is different from a specified value.
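A sketch of how the z statistic for the punnet example above could be computed, under the assumption that the test is one-sided (H1: the true mean weight is below 250gm); this is my reading of the exercise, not necessarily the official multiple-choice answer:

```python
import numpy as np
from scipy import stats

mu0 = 250.0     # claimed weight in grams
n = 85          # punnets weighed
x_bar = 248.5   # observed mean weight
s = 4.8         # observed standard deviation

z = (x_bar - mu0) / (s / np.sqrt(n))   # ≈ -2.88
p_value = stats.norm.cdf(z)            # lower-tail probability, ≈ 0.002

print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```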
• In a one-sample t-test we have a single sample and we check that sample against some criterion (previously, checking whether the mean weight of the strawberries is 250gm); for a one-sample t-test we have $t = \frac{\bar{X} - \mu}{s_X/\sqrt{N}}$ with N−1 degrees of freedom
• In a two-sample t-test we have two samples and we want to know if they are similar
• In a paired t-test we check a sample before and after to see whether a change happened (did a medicine work on a patient); for a paired test we have $t = \frac{m}{s/\sqrt{N}}$ where $s$ and $m$ are respectively the standard deviation and mean of the differences. Here $t = \frac{38.4368 - 38.3962}{s/\sqrt{1859}}$ with df = 1858, and thus the p-value > α so we accept $H_0$
Given a sufficiently large sample, it is representative according to the CLT.
For these samples, $t = \frac{156.45 - 151.358}{s/\sqrt{120}} = 3.33718$ and the p-value = 0.001, which is strictly less than α, which
means the diet has a statistically significant effect.
the data can be found here
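Since the actual data is only linked above, here is a hedged sketch of how the paired test described earlier would be run with SciPy on hypothetical before/after measurements:

```python
import numpy as np
from scipy import stats

# Hypothetical before/after measurements for the same subjects
before = np.array([38.5, 38.4, 38.3, 38.6, 38.4, 38.5, 38.2, 38.4])
after  = np.array([38.4, 38.4, 38.2, 38.5, 38.3, 38.5, 38.2, 38.3])

# Paired t-test: t = m / (s / sqrt(N)), with m and s the mean and std of the differences
t_stat, p_value = stats.ttest_rel(before, after)

alpha = 0.05
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, reject H0: {p_value < alpha}")
```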
2.4.1 Conditions
• the total number of observed frequencies should be strictly higher than 50: $\sum F_o > 50$
• the sum of the expected frequencies should be equal to the sum of the observed frequencies
• no expected frequency should be less than 5 for any group: $F_e \geq 5$
2.4.2 Properties
• The chi-square statistic is non-negative and ranges from 0 to infinity
• The variance of the chi-square distribution is 2k, where k is the degrees of freedom
2.4.3 Applications
• Are two attributes dependent or not?
• Feature selection
2.4.4 Fe Calculation
Figure 4: Fe table
We obtain $\chi^2 = 37.48$ and a p-value very close to 0, thus we reject the null hypothesis:
there is a relationship between the two attributes.
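A sketch of the independence test with SciPy; chi2_contingency also returns the expected-frequency (Fe) table it computes internally. The observed counts below are made up, not the ones from Figure 4:

```python
import numpy as np
from scipy import stats

# Hypothetical observed contingency table (rows: attribute 1, columns: attribute 2)
observed = np.array([
    [60, 40, 80],
    [30, 70, 50],
])

chi2, p_value, dof, expected = stats.chi2_contingency(observed)

print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
print("expected frequencies (Fe):")
print(expected)
```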
2.5 ANOVA
We use ANOVA to compare the means of 3 or more groups; it assumes the populations are normally
distributed, that they have the same variance and that the groups are independent. The
number of populations is k, and we have:
• $H_0: \mu_1 = \cdots = \mu_k$
• $H_1$: the populations have different means
The following calculations are necessary for the ANOVA; for example, given 3 groups A, B, C:
2.6 F-distribution
Also known as the Fisher-Snedecor distribution, this is a continuous distribution used to study
population variances. If we have two independent random variables $Y_1, Y_2$ following chi-square
distributions with $k_1$ and $k_2$ degrees of freedom, then $\frac{Y_1/k_1}{Y_2/k_2}$ follows an F distribution with $k_1, k_2$ degrees of freedom, denoted
$F_{k_1,k_2}$; this is used to test the variances of two groups.
We reject $H_0$ if the test statistic is bigger than the critical value or if the p-value is smaller
than the significance level.
A media company collects data on 30 teenagers and 25 adults; the standard deviations of social media
usage are 58 and 94 minutes respectively. Do teenagers' usage times have greater variance than adults', with α = 0.05?
• $H_0: \sigma^2_{teenagers} \leq \sigma^2_{adults}$
• $H_1: \sigma^2_{teenagers} > \sigma^2_{adults}$
$F = \frac{S_T^2}{S_A^2} = \frac{58^2}{94^2} = 0.38 \sim F_{29,24}$
$F_\alpha = 1.95$, p-value = 0.99, thus we cannot reject the null hypothesis: there is not
sufficient evidence that the variance of teenagers' social media usage is greater than that of adults.
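The same one-sided variance test can be reproduced with scipy.stats.f (a sketch; the critical value and p-value should match the ones quoted above):

```python
from scipy import stats

s_teen, s_adult = 58.0, 94.0   # sample standard deviations
n_teen, n_adult = 30, 25       # sample sizes -> df = 29 and 24

F = s_teen**2 / s_adult**2                            # ≈ 0.38
p_value = stats.f.sf(F, n_teen - 1, n_adult - 1)      # P(F_{29,24} > 0.38) ≈ 0.99
F_crit = stats.f.ppf(0.95, n_teen - 1, n_adult - 1)   # ≈ 1.95

print(f"F = {F:.2f}, critical value = {F_crit:.2f}, p-value = {p_value:.2f}")
```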
• $H_1: \sigma_1^2 \neq \sigma_2^2$
We reject $H_0$ if the test statistic is bigger than the critical value $F_{\alpha/2}$ or less than $F_{1-\alpha/2}$, or
if the p-value is smaller than the significance level.
A vendor has to choose between two services A and B and wants the more reliable of the two.
A has a sample of size 25 and B of size 16, and the variances of arrival times are 48 and 20
respectively. Do they have the same reliability, with α = 0.1?
• $H_0: \sigma_A^2 = \sigma_B^2$
• $H_1: \sigma_A^2 \neq \sigma_B^2$
$F = \frac{S_A^2}{S_B^2} = \frac{48}{20} = 2.4 \sim F_{24,15}$
$F_{\alpha/2} = 2.29$, $F_{1-\alpha/2} = 0.47$, p-value = 0.028, thus we reject the null hypothesis:
there is sufficient evidence that the services don't have the same reliability.
Figure 5: ex1
• Since they are two independent samples, it is a two-sample test.
This is why, since it is two-tailed, we compare with half of α rather than with α directly.
2.7.2 Exercise 2
Figure 7: exercise 2
• variance
• chi-square test
When we take many samples of the same size from a normal population and compute the sample
means, they follow a normal distribution. However, when we take many samples of the same
size from a normal population and compute the sample variances, they don't follow a normal
distribution but rather a chi-square distribution, which depends on the degrees of freedom.
The higher the degrees of freedom, the closer it is to a normal distribution. The area
under the curve is 1, and in the tables the cumulative (tail) probability runs from right to left:
1 towards the left and 0 towards the right.
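A simulation sketch of this fact (assuming a normal population; the scaled sample variance (n−1)S²/σ² should behave like a chi-square variable with n−1 degrees of freedom, whose mean is n−1 and variance 2(n−1)):

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma, n = 0.0, 2.0, 10
n_samples = 20_000

samples = rng.normal(mu, sigma, size=(n_samples, n))
scaled_vars = (n - 1) * samples.var(axis=1, ddof=1) / sigma**2

# Compare with the chi-square(n-1) distribution: mean = n-1, variance = 2(n-1)
print(scaled_vars.mean())  # close to 9
print(scaled_vars.var())   # close to 18
```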
2.7.3 Exercise 3
We have three groups of samples of factory emissions from different plants of the same company. The
score is computed based on the composition of the emissions. We want to find out if there is
any inconsistency or difference across the three groups. The groups and their emissions are:
• A: 57, 56, 58, 58, 56, 59, 56, 55, 53, 54, 53, 42, 44, 34, 54, 54, 34, 64, 84, 24
• B: 49, 47, 49, 47, 49, 47, 49, 46, 45, 46, 41, 42, 41, 42, 42, 42, 14, 14, 34
• C: 49, 48, 46, 46, 49, 46, 45, 55, 61, 45, 45, 45, 49, 54, 44, 74, 54, 84, 39
Here we want the difference between three groups, so we can't use the regular tests and we
perform ANOVA. For one-way ANOVA the null hypothesis is:
- H0 : µ0 = µ1 = ... = µk
One-way ANOVA is used to see if there are significant differences in means between two or
more independent samples. For ANOVA, the ratio of between-group variability to within-group
variability should follow an F distribution if the null hypothesis is true. With only two samples,
a one-way ANOVA still uses this F ratio and is equivalent to a two-sample t-test.
dof(between) = k − 1 = 3 − 1 = 2
dof(inner) = n − k = 59 − 3 = 56
dof(total) = dof (between) + dof (inner) = 58
$F_{critical} = 3.161$
$\bar{A} = 52.45 \quad \bar{B} = 41.36 \quad \bar{C} = 51.45$
$\text{overall\_mean} = \frac{\sum \text{group\_means}}{k} = 48.54$
$SS_{total} = \sum_n (x_i - \text{overall\_mean})^2 = 8548.64$
$SS_{within} = \sum (a_i - \bar{A})^2 + \sum (b_i - \bar{B})^2 + \sum (c_i - \bar{C})^2 = 7096.32$
$SS_{between} = SS_{total} - SS_{within} = 1452.32$
$MS_{between} = \frac{SS_{between}}{dof_{between}} = 726.16$
$MS_{within} = \frac{SS_{within}}{dof_{within}} = 126.72$
$F_{statistic} = \frac{MS_{between}}{MS_{within}} = 5.73$
$F_{statistic} > F_{critical}$, so we reject the null hypothesis.
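The same test with scipy.stats.f_oneway on the emission scores as transcribed above (note the hand calculation uses n = 59 while 58 values are listed, so a value may be missing from the transcription and the numbers can differ slightly):

```python
from scipy import stats

A = [57, 56, 58, 58, 56, 59, 56, 55, 53, 54, 53, 42, 44, 34, 54, 54, 34, 64, 84, 24]
B = [49, 47, 49, 47, 49, 47, 49, 46, 45, 46, 41, 42, 41, 42, 42, 42, 14, 14, 34]
C = [49, 48, 46, 46, 49, 46, 45, 55, 61, 45, 45, 45, 49, 54, 44, 74, 54, 84, 39]

# One-way ANOVA: H0 is that the three group means are equal
f_stat, p_value = stats.f_oneway(A, B, C)

print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # expected to be close to the F ≈ 5.73 above
```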
2.7.4 Exercise 4
A study found that in 2015, out of a world population of 7.7 billion, 36.7 million people
were diagnosed HIV positive. In India, which has a population of 1.3 billion, there were 2.1
million people diagnosed HIV positive. Is the world's proportion of HIV cases significantly
different from that of India?
- $H_0: P_{india} = P_{population}$
- $H_1: P_{india} \neq P_{population}$
This is a two-tailed z-test of proportions. The z statistic is calculated as $z = \frac{P_{sample} - P_0}{\sqrt{\frac{P_0(1-P_0)}{n}}}$; in
our case $z = \frac{0.0016 - 0.0047}{\sqrt{\frac{0.0047(1-0.0047)}{2100000}}} = -65.68$, thus we reject the null hypothesis and say the proportion
is significantly different.
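A sketch reproducing this calculation (using the rounded proportions and the n = 2 100 000 that appear in the formula above; scipy.stats.norm gives the two-tailed p-value):

```python
import numpy as np
from scipy import stats

p_sample = 0.0016   # 2.1 million / 1.3 billion, rounded as in the notes
p0 = 0.0047         # 36.7 million / 7.7 billion, rounded as in the notes
n = 2_100_000       # denominator used in the notes' formula

z = (p_sample - p0) / np.sqrt(p0 * (1 - p0) / n)   # ≈ -65.7
p_value = 2 * stats.norm.sf(abs(z))                # two-tailed, effectively 0

print(f"z = {z:.2f}, p-value = {p_value:.3g}")
```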