Hypothesis Testing

Bensalem Mohammed Abderrahmane

This PDF contains notes from the Introduction to Statistical Tests course on Great Learning and might contain mistakes. Python code for some of the exercises can be found here.

Inferential Statistics
0.1 Data types
Data is generally divided into numerical data and categorical data. Categorical data can
be:

• Nominal (Unordered)

• Ordinal (Ordered)

• Binary

Numerical data can either be discrete or continuous.

0.2 Data collections


The idea is to get a sample that captures as much as possible of what the entire population is doing.
So how do we take a sample? This can be done either by random sampling (selecting
randomly) or by cluster sampling (forming clusters, then taking a portion of each cluster).

0.3 Types of statistical analysis


There are basically two types, following the data types: qualitative analysis (for categorical data)
and quantitative analysis (for numerical data). There are also two ways we use statistical analysis:
inferential analysis, where we try to answer a question (this includes hypothesis testing, where we
check whether something that is true on a sample is also true on the population), and descriptive
analysis, where we try to get insights from the data at hand by describing it through its statistical
properties (mean, median, mode, ...).

0.3.1 Hypothesis testing


Hypothesis testing is about verifying whether conditions that hold on a sample also hold on the population:

• The null hypothesis is the case where the assumption and the statistical results agree

• The alternative hypothesis is the case where the assumption and the statistical results contradict each other

We have to consider randomness and bias.



0.4 Inferential statistics


If we want to know the average salary of a data analyst, we either survey the entire population,
which isn't always possible, or we take a subset (a sample) and use the average of the
sample to represent the average of the population. Inferential statistics is used for:

• Making conclusions from a sample about a population

• Concluding whether a sample is representative (statistically significant) of the population

• Comparing two models to see which one is more statistically significant than the other

• Seeing whether adding or removing a feature helps the model or makes it worse

Inferential statistics includes :

• Probability

• Conditional probability

• Hypothesis testing

• Central limit theorem (CLT)

A random experiment is an experiment where all the possible outcomes are known, a sample
space is the set of all possible outcomes, and an event is a subset of the sample
space. Any variable whose values are random is a random variable.

0.4.1 Conditional Probability


With N(·) the number of outcomes, we have:

P(Y|X) = N(Y ∩ X)/N(X) = P(Y ∩ X)/P(X)
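As a quick sanity check, here is a tiny Python sketch of this formula; the counts are made up for illustration:

```python
# Made-up counts over a sample space of N outcomes
N = 100      # total number of outcomes
N_X = 40     # outcomes where X occurs
N_YX = 10    # outcomes where both Y and X occur

# P(Y|X) = N(Y ∩ X)/N(X) = P(Y ∩ X)/P(X)
print(N_YX / N_X)              # 0.25
print((N_YX / N) / (N_X / N))  # 0.25, same value
```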

0.4.2 CLT
The CLT says that as we increase the sample size, the distribution of the sample mean becomes
closer to a normal distribution, regardless of the population's distribution; a sample size of ≥ 30 is usually considered sufficient.
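A minimal simulation sketch of the CLT, using a clearly non-normal (exponential) population; the sample means still come out roughly normally distributed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Non-normal population: exponential distribution
population = rng.exponential(scale=2.0, size=100_000)

# Draw many samples of size n >= 30 and record their means
sample_means = [rng.choice(population, size=30).mean() for _ in range(5_000)]

# The means cluster around the population mean in a roughly normal shape
print(np.mean(sample_means), population.mean())  # close to each other
print(np.std(sample_means))                      # ~ population std / sqrt(30)
```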

0.5 Hypothesis testing


In hypothesis testing we make an assumption about some parameter of a population. The
steps of hypothesis testing are:

• Define the null hypothesis and the alternate hypothesis

• Define an analysis plan for how to use the sample to evaluate the null hypothesis

• Extract the "test statistic" (t-stat)

• Apply a decision rule to decide whether to accept or reject the null hypothesis

Figure 1: p-value meaning illustration

0.5.1 The test-stat


If the t-stat is bigger than some critical value then we reject the null hypothesis, otherwise
we accept it (technically, we fail to reject it).
The critical value is a specific value from the distribution of the test statistic that defines the
boundary for rejecting the null hypothesis. It’s determined based on the chosen significance
level and the distribution of the test statistic. For example, in a z-test, the critical value
might be derived from the standard normal distribution, while in a t-test, it would come
from the t-distribution.

0.5.2 p-value
The p-value is a value produced by a statistical test and is the probability of seeing a
test statistic as extreme as the calculated value, assuming the null hypothesis is true. A low p-value
indicates that we might reject the null hypothesis: if the p-value is smaller than the significance
level α, we reject the null hypothesis.
The p-value is not the probability that a hypothesis is true; it measures how surprising the
observed data would be if the null hypothesis held. For example, if we have 2 drugs and our null
hypothesis is that drug 1 is no different from drug 2, a small p-value from the test means the
observed difference would be very unlikely under that assumption.
In short, the p-value is a measure of the probability of observing the data, or something more extreme,
under the assumption that the null hypothesis is true.
If we perform a statistical test on two groups and test a drug on them, and our null hypothesis
is that there is no difference between the two groups, then getting a p-value smaller than 0.05
after the statistical test means there is evidence of an actual difference between the groups,
and we reject the null hypothesis.

0.5.3 Chi-square test


If we want to compare categorical data we use the chi-square test; it suits two cases:

• Goodness of fit : determine if the sample categorical data matches the population.

• Compare two categorical variables to see if they relate or not


Bensalem Mohammed Abderrahmane

0.5.4 ANOVA
When performing experiments, if we want to see whether the results are significant or not, we perform
ANOVA. It helps compare 3 or more samples (e.g., whether one college is better than others). Generally
there are 2 types of ANOVA tests:

• One-way ANOVA: tests the effect of a single factor across the groups.

• Two-way ANOVA: tests the effect of two factors at once.

1 Descriptive statistics
We try to describe something using its statistical properties.

1.1 Central tendency


• Mean (average): µ = (1/N) Σ_{i=1}^{N} x_i

• Mode (most frequent value): mode = arg max_{x_i} Occ(x_i)

• Median (middle value): requires ordering the data; for odd N, median = x_{(N+1)/2}, and
for even N, median = (1/2)(x_{N/2} + x_{N/2+1}).
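A small Python sketch computing the three measures (the data values are made up):

```python
import statistics

data = [2, 3, 3, 5, 7, 8, 8, 8, 10]

print(statistics.mean(data))    # 6, the average
print(statistics.mode(data))    # 8, the most frequent value
print(statistics.median(data))  # 7, the middle value of the sorted data
```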

1.2 Variability
• Range: X_max − X_min

• Variance: Var(X) = (1/N) Σ_i (x_i − x̄)²

• Standard deviation: σ_x = √Var(X)

1.3 Relationships
• Covariance indicates that there is a linear relationship between two random variables,
but this measure only tells us the direction of the relationship, not its strength:
Cov(X, Y) = (1/N) Σ_i (x_i − x̄)(y_i − ȳ)

• Correlation is between −1 and 1 and it indicates how strong a linear relationship is
between two random variables: Corr(X, Y) = Cov(X, Y)/(σ_x σ_y)
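A quick numpy sketch (made-up data) contrasting the two measures:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x + np.array([0.1, -0.2, 0.0, 0.2, -0.1])  # roughly linear in x

# Population covariance (divide by N, matching the formula above)
cov = np.cov(x, y, bias=True)[0, 1]
corr = np.corrcoef(x, y)[0, 1]

print(cov)   # positive: x and y move together, but the scale is arbitrary
print(corr)  # close to 1: the linear relationship is strong
```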

1.4 Skewness
Measures the degree of asymmetry of points around the mean

• If the tail of the distribution stretches in the positive direction the skewness is positive, and if it
stretches in the negative direction it is negative; if there is no tail there is no skewness

Figure 2: Skewness

Figure 3: kurtosis

• If the skewness is positive then mean > median > mode, and if the skewness is negative
then mean < median < mode

• skewness = (X̄ − mode)/σ_x = 3(X̄ − median)/σ_x = [(1/N) Σ_i (x_i − x̄)³] / Var(X)^{3/2}

1.5 Kurtosis
Measures the sharpness of the peak of the distribution

• mesokurtic is the normal distribution kurtosis = 0

• platykurtic means relatively small changes in the data, kurtosis < 0, a small amount of outliers

• leptokurtic means rapid changes in the data and indicates the existence of a lot of outliers (heavy
tails), kurtosis > 0

• kurtosis = [(1/N) Σ_i (x_i − x̄)⁴] / Var(X)² − 3
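Both measures are available in scipy; a short sketch on made-up right-skewed data:

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
data = rng.exponential(scale=1.0, size=10_000)  # right-skewed, heavy tail

print(skew(data))      # > 0: the tail is on the positive side
print(kurtosis(data))  # excess kurtosis (the -3 is already applied), > 0 here
```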

1.6 Degrees of freedom


Given a system with N possible outcomes, the degrees of freedom is the number of outcomes
we have to know in order to determine all the others.

• In a 100-coin toss, I only need to know the number of times we got heads in order to
determine the rest of the outcomes, thus it is a system with 1 degree of freedom

• In a traffic light, I need to know two turned-off lights in order to determine the light
that is on, thus it is a system with 2 degrees of freedom

• If we already know the mean, when calculating the variance we don't have to know
all N values but rather N−1, thus the variance has N−1 degrees of freedom. If I have {1, 2, 3}
and I know the mean is 2, when calculating the variance I don't have to know all
three numbers because knowing the mean gives me the third one; dividing by N
during the calculation of the variance results in a biased variance

• When performing a statistical test we have to use unbiased variances: a one-sample t-test
uses N−1 degrees of freedom since we estimate one mean, and a two-sample test uses
N−2 degrees of freedom

2 Hypothesis testing
We perform an experiment, make an assumption about the result, and test whether that assumption holds.

2.1 type 1 and type 2 errors

Table 1: type 1 and 2 errors

              Actual Positive (H0 false)    Actual Negative (H0 true)

Reject null   Correct (power, 1 − β)        FP: Type I error (α)
Accept null   FN: Type II error (β)         Correct (1 − α)

Hypothesis testing focuses only on the first row, since it is enough for us to decide whether
to reject or accept a hypothesis; so if p-value > α we accept (fail to reject) the null hypothesis.

2.2 Z test
A customer has complained that strawberries he had bought previously weighed under 250 gm.
The retailer decides to check the weight of 36 punnets. He finds the average weight is 248.5 gm
with a standard deviation of 4.8 gm. Use a significance test to judge
whether the retailer is selling under-weight punnets, and decide which of the following conclusions
is correct:

• A) At 5% level he is selling under weight



• B) At 5% level he is not selling under weight

• C) At 5% level the test is inconclusive

• D) A significance test is

2.2.1 Exercise Z test


We take H0 as µ = 250 gm and H1 as µ < 250 gm. Given that X̄ = 248.5 gm and σ_x = 4.8, we
use the central limit theorem:

Z = (X̄ − µ0)/(σ_x/√N) = (248.5 − 250)/(4.8/√36) = −1.875,    α = 0.05

Here we don't have the population std, but the sample is sufficiently large (≥ 30), so in this case we
take the sample std to be the population std.
Since we are conducting the test "H1: µ < µ0" we have a left-tailed test (H1: µ > µ0
is right-tailed and H1: µ ≠ µ0 is two-tailed). All we have to do is compare the p-value of our z
score, taken from a normal distribution, with the significance level α: (Z, α) → p = 0.0307. We have
p-value < α, so we reject the null hypothesis: he is selling under weight (answer A).
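A minimal sketch of this z-test in Python, using the summary statistics from the exercise:

```python
from math import sqrt
from scipy.stats import norm

x_bar, mu0, s, n, alpha = 248.5, 250.0, 4.8, 36, 0.05

z = (x_bar - mu0) / (s / sqrt(n))  # -1.875
p = norm.cdf(z)                    # left-tailed p-value, ~0.030

print(z, p, p < alpha)             # p < alpha: reject H0, selling under weight
```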

2.2.2 Z-test conditions


The Z-test is applied under specific conditions, primarily when the following assumptions are
met:

• Population Standard Deviation is Known: The Z-test is most appropriate when
the population standard deviation (σ) is known. If the population standard deviation
is unknown, the t-test is generally used instead.

• Large Sample Size: The Z-test is suitable for large sample sizes (typically n ≥ 30).
This is because, with a large sample size, the sample mean tends to follow a normal
distribution regardless of the distribution of the population, due to the Central Limit
Theorem.

• Sampling is Random: The sample data should be obtained through random sampling
methods to ensure that the sample is representative of the population.

• Independent Observations: The observations in the sample should be independent


of each other. This means that the value of one observation does not influence the
value of another observation.

• Continuous Data: The data being analyzed should be continuous and measured on
an interval or ratio scale.

Typical applications include:

• Testing whether the mean height of students in a school is significantly different from the
national average height (when the population standard deviation is known).

• Comparing the mean scores of two groups in an experiment when the sample sizes are large and the
population standard deviations are known.

• Determining if the proportion of defective items produced by a manufacturing process is different
from a specified value.

2.3 Student T-test


If we have a small sample, we do not have the population std, and the population is
normally distributed, then we can apply the t-test.

• In a one-sample t-test we have a single sample and we check that sample against some criterion
(as previously, checking whether the mean weight of strawberries is 250 gm). For a one-sample t-test
we have t = (X̄ − µ)/(s/√N) with N−1 degrees of freedom

• In a two-sample t-test we have two samples and we want to know if they are similar

• In a paired t-test we check a sample before and after, to see whether a change happened or
not (did a medicine work on a patient). For a paired test we have t = m/(s/√N), where s and m
are respectively the std and the mean of the differences

2.3.1 Exercise-one sample test


A drink machine fills a bottle with 12 ounces of soft drink. A random sample of 49 bottles was
chosen; the mean was 11.88 ounces per bottle with std 0.35. Is the machine working properly?
We take H0: µ = 12 and H1: µ < 12.
The test statistic is t = (11.88 − 12)/(0.35/√49) = −2.4 and our α = 0.025. Now using the values (t, α, df = n−1)
and looking at the student's table for a left-tail value, we get a p-value of p ≈ 0.01 < α, thus
we reject H0, so the machine is not working properly.
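The same computation as a Python sketch, taking the left-tail p-value from the t-distribution:

```python
from math import sqrt
from scipy.stats import t

x_bar, mu0, s, n, alpha = 11.88, 12.0, 0.35, 49, 0.025

t_stat = (x_bar - mu0) / (s / sqrt(n))  # -2.4
p = t.cdf(t_stat, df=n - 1)             # left-tailed p-value, ~0.01

print(t_stat, p, p < alpha)             # p < alpha: reject H0
```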

2.3.2 Exercise-Random sampling


Given a random sample, how can I determine whether it is truly random, and whether it is
representative of the population? Given a dataset of 20k individuals with mean
= 38.3962, we take a subset of 1859 individuals with mean = 38.4368 and check whether it is
representative, given α = 0.05.
H0: sample mean = population mean; H1: sample mean ≠ population mean.

t = (38.4368 − 38.3962)/(std/√1859) with df = 1858, and the resulting p-value > α, so we accept H0.
Given a sufficiently large sample, it is representative according to the CLT.
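A sketch of this check with scipy's one-sample t-test on raw data; the array below is only a stand-in for the actual 1859-individual subset:

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
pop_mean = 38.3962

# Stand-in for the real subset (assumed spread of 12.0)
sample = rng.normal(loc=38.4368, scale=12.0, size=1859)

t_stat, p = ttest_1samp(sample, popmean=pop_mean)
print(t_stat, p)  # in the note, p > alpha: fail to reject H0
```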

2.3.3 Exercise-2 samples


The principal of a school wants to know if the average number of hours studied by boys is
significantly different from that of girls. Two independently, randomly collected samples,
one of 15 boys and one of 15 girls, were gathered (essential for two-sample student
tests). Given α = 0.05 we analyse whether the difference is significant: the first sample's
(Boys) mean is 3.53 with std 0.63 and the second's (Girls) is 3.96 with std 0.51. Is this difference significant?
H0: µ_girls = µ_boys; H1: µ_girls ≠ µ_boys.
In order to verify whether the two samples follow a normal distribution, we either use
a histogram plot or a QQ plot with a standardized line, or we use the Shapiro test (since n ≤ 50) with H0
being that the data is normally distributed. Then, to verify whether the two distributions' variances are
significantly different, we use a Levene test with H0 that they are the same and H1 that they are
different; in this case the result was that the two distributions have the same variance.
t = (X̄1 − X̄2)/√(s1²/n1 + s2²/n2) = (3.53 − 3.96)/√((0.63² + 0.51²)/15)

We obtain a p-value of 0.0507 > α, so we fail to reject H0.
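A sketch of the full workflow in Python; the arrays are placeholders for the actual study hours, but with real data the same calls apply:

```python
import numpy as np
from scipy.stats import shapiro, levene, ttest_ind

rng = np.random.default_rng(1)
boys = rng.normal(3.53, 0.63, size=15)   # placeholder data
girls = rng.normal(3.96, 0.51, size=15)  # placeholder data

# 1. Normality check on each sample (H0: data is normal)
print(shapiro(boys).pvalue, shapiro(girls).pvalue)

# 2. Equality of variances (H0: the variances are the same)
print(levene(boys, girls).pvalue)

# 3. Two-sample t-test (equal_var=True since Levene did not reject)
t_stat, p = ttest_ind(boys, girls, equal_var=True)
print(t_stat, p)  # compare p with alpha = 0.05
```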

2.3.4 Exercise-Paired t-test


The objective of this test is to compare a sample before and after some operation and see if
the difference is significant. Given a sample of 120 individuals having high blood pressure,
we apply a diet on them and look at the effect on their blood pressure before and
after: before, the mean is 156.45 and after it is 151.358. We want to know whether
this difference is statistically significant, so we set H0: µ_before = µ_after, H1: µ_before ≠ µ_after,
with sd the standard deviation of the pair differences and α = 0.05. We first have to check that the
differences are normally distributed; this can be checked using a boxplot.
The t statistic is obtained using t = d̄/(sd/√n), where d̄ is the mean of the differences between the two
samples: t = (156.45 − 151.358)/(sd/√120) = 3.33718, and the p-value = 0.001, which is strictly less than α,
which means the diet has a statistically significant effect.
the data can be found here
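A sketch of the paired test with scipy; the arrays stand in for the actual before/after measurements:

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(2)
before = rng.normal(156.45, 12.0, size=120)       # placeholder data
after = before - rng.normal(5.1, 16.5, size=120)  # placeholder diet effect

t_stat, p = ttest_rel(before, after)
print(t_stat, p)  # expect a small p-value: the diet had an effect
```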

2.3.5 Parametric vs non parametric test


Any test that makes use of the mean and standard deviation is parametric; otherwise it is
non-parametric. In the latter we convert the numbers into ranks and perform the test on the
ranks; for example, the blood pressure values become ranks, from highest to lowest.
In the case where normality is violated we use the non-parametric analogue of the paired t-test
(the Wilcoxon signed-rank test). Examples of non-parametric tests include the Kruskal-Wallis test and the Friedman test.

2.4 Chi-Square test


The chi-square test is an inferential test applied to categorical data in order to test the relationship
between two variables: H0 is that they are independent, H1 that they are dependent. The test statistic
follows a chi-square distribution with a parameter k, the degrees of freedom; with one degree of
freedom, it is the square of an N(0, 1) variable. For a system of N random variables it can be written as:

X² = Σ_{i∈N} [(x_i − µ_i)/σ_i]² = Σ_{i∈N} (F_o − F_e)²/F_e

where F_e and F_o are respectively the expected and observed frequencies.

2.4.1 Conditions
• the total number of observed frequencies should be strictly higher than 50: Σ F_o > 50

• the sum of expected frequencies should be equal to the sum of the observed frequencies

• no expected frequency should be less than 5 for any group: F_e ≥ 5

• no information about population

• the sample should be randomly selected

2.4.2 Properties
• The chi-square statistic is non-negative and ranges from 0 to infinity

• The shape of the chi-square distribution depends on the degrees of freedom k

• Its mean is k

• Its variance is 2k

2.4.3 Applications
• Are two attributes dependent or not?

• Goodness of fit (how well observed data matches expected output)

• Whether two categorical variables are of the same proportion

• Feature selection

2.4.4 Fe Calculation

Figure 4: Fe table

F_e = (CT × RT)/N,    df = (n_rows − 1)(n_cols − 1)

where CT is the column total and RT the row total. For a 2×2 table with cells Var1, Var2 (first row)
and Var3, Var4 (second row):

Expected Frequency (Var1) = (Var1 + Var2) × (Var1 + Var3) / (Var1 + Var2 + Var3 + Var4)
Expected Frequency (Var2) = (Var1 + Var2) × (Var2 + Var4) / (Var1 + Var2 + Var3 + Var4)
Expected Frequency (Var3) = (Var3 + Var4) × (Var1 + Var3) / (Var1 + Var2 + Var3 + Var4)
Expected Frequency (Var4) = (Var3 + Var4) × (Var2 + Var4) / (Var1 + Var2 + Var3 + Var4)

2.4.5 Exercise-Test of independence


Suppose we want to know the relationship between soda and fries, with α = 0.05. We have:

X² = Σ (F_o − F_e)²/F_e,    F_e = (rowSum × columnSum)/N,    df = (n_rows − 1)(n_cols − 1)

Type of drink    Fries    No Fries

Cola             20       792
Orange Soda      220      2216

The first step is to calculate the expected frequencies:

Type of drink    Fries    No Fries    Row Total

Cola             20       792         812
Orange Soda      220      2216        2436
Column Total     240      3008        3248

Now using F_e we get:

                 Fries    No Fries
Cola             60       752
Orange Soda      180      2256

We obtain X² = 37.48 and a p-value very close to 0, thus we reject the null hypothesis:
there is a relationship between the two variables.
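This exercise maps directly onto scipy's contingency-table test; a minimal sketch:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed frequencies: rows = drink type, columns = fries / no fries
observed = np.array([[20, 792],
                     [220, 2216]])

chi2, p, dof, expected = chi2_contingency(observed)
print(chi2, p, dof)  # chi2 ~ 37.4 (with Yates correction), p ~ 0, dof = 1
print(expected)      # [[60, 752], [180, 2256]]
```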

2.4.6 Advantages and disadvantages


The chi-square test helps in feature selection and in testing the independence of groups, and it judges how
well our observed data matches expected data. However, it gives no information on the parameters
of the population, it requires a sample of at least 50, and it can be misleading for big sample sizes.

2.5 ANOVA
We use ANOVA to compare the means of 3 or more groups. It assumes the populations are normally
distributed, that they have the same variance, and that the groups are independent. With k the
number of populations, we have:

• H0: µ1 = · · · = µk

• H1: the populations have different means
The following calculations are necessary for ANOVA; for example, given 3 groups A, B, C:

dof(total) = dof(between) + dof(within)
overall_mean = (Ā + B̄ + C̄)/3
SS_total = Σ (x_i − overall_mean)²    (over all observations)
SS_within = Σ(a_i − Ā)² + Σ(b_i − B̄)² + Σ(c_i − C̄)²
SS_between = SS_total − SS_within
MS_between = SS_between / dof(between)
MS_within = SS_within / dof(within)
F_statistic = MS_between / MS_within

This gives us the F statistic, from which we extract the p-value.
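These steps map onto a few lines of Python; a sketch with three made-up, equal-sized groups (scipy's f_oneway does the same computation in one call):

```python
import numpy as np
from scipy.stats import f_oneway

a = np.array([57.0, 56, 58, 55, 54])  # made-up groups
b = np.array([49.0, 47, 45, 46, 41])
c = np.array([49.0, 48, 46, 55, 45])

# Manual computation following the formulas above
all_x = np.concatenate([a, b, c])
overall_mean = all_x.mean()
ss_total = ((all_x - overall_mean) ** 2).sum()
ss_within = sum(((g - g.mean()) ** 2).sum() for g in (a, b, c))
ss_between = ss_total - ss_within
k, n = 3, len(all_x)
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))

print(f_stat)
print(f_oneway(a, b, c))  # same F statistic, plus the p-value
```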

2.6 F-distribution
Also known as the Fisher-Snedecor distribution, this is a continuous distribution used to study
population variances. If we have two independent random variables Y1, Y2, each following a chi-square
distribution with k1 and k2 degrees of freedom, then (Y1/k1)/(Y2/k2) follows an F distribution
with k1, k2 degrees of freedom, denoted F_{k1,k2}. This is used to test the variances of two groups.

2.6.1 One tailed test


• H0: σ1² ≤ σ2²

• H1: σ1² > σ2²

We reject H0 if the test statistic is bigger than the critical value, or if the p-value is smaller
than the significance level.
A media company collects data on 30 teenagers and 25 adults; the std of social media
usage is 58 and 94 minutes respectively. Do teenagers have greater variance in usage than adults, at α = 0.05?

• H0: σ²_teenagers ≤ σ²_adults

• H1: σ²_teenagers > σ²_adults

F = S²_T / S²_A = 58²/94² = 0.38 ∼ F_{29,24}

F_α = 1.95, p-value = 0.99, thus we cannot reject the null hypothesis: there is no
sufficient evidence that teenagers' social media usage varies more than adults'.
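A sketch of this one-tailed variance F-test using scipy's F distribution:

```python
from scipy.stats import f

s_teen, n_teen = 58.0, 30
s_adult, n_adult = 94.0, 25

F = s_teen**2 / s_adult**2          # ~0.38
df1, df2 = n_teen - 1, n_adult - 1  # 29, 24

p = f.sf(F, df1, df2)               # right tail: P(F_{29,24} >= 0.38) ~ 0.99
crit = f.ppf(0.95, df1, df2)        # critical value ~ 1.95

print(F, p, crit)                   # p > alpha: cannot reject H0
```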

2.6.2 Two tailed test


• H0: σ1² = σ2²

• H1: σ1² ≠ σ2²

We reject H0 if the test statistic is bigger than the critical value F_{α/2}, or less than F_{1−α/2}, or
if the p-value is smaller than the significance level.
A vendor has to choose between two services A and B and wants the more reliable of the
two. A has a sample of size 25 and B of size 16, and the variances of arrival times are 48 and 20
respectively. Do they have the same reliability, at α = 0.1?

• H0: σ_A² = σ_B²

• H1: σ_A² ≠ σ_B²

F = S_A² / S_B² = 48/20 = 2.4 ∼ F_{24,15}

F_{α/2} = 2.29, F_{1−α/2} = 0.47, p-value = 0.028, thus we reject the null hypothesis:
there is sufficient evidence that the services don't have the same reliability.

2.7 Further examples


2.7.1 exercise 1

Figure 5: ex1

• we take H0 : µdog = µno_dog and H1 : µdog ̸= µno_dog

• it’s a two tail test

• since they are two independent samples it’s a two sample test

• it’s a test of mean

• we do a two sample two tail ttest

µ1, s1 = 7.73, 1.24;    µ2, s2 = 7.58, 1.243;    α = 0.05

t = (µ1 − µ2) / √(s1²/n1 + s2²/n2) = 0.35

The cumulative probability of this statistic lies between 0.025 and 0.975 (the acceptance region),
so we don't reject the null hypothesis.

Figure 6: Confidence Intervals

This is why, since the test is two-tailed, we compare with α/2 on each side rather than with α directly.

2.7.2 Exercise 2

Figure 7: exercise 2

• H0: the variance is the same; H1: the variance of Lori's is higher

• right tailed test

• variance

• chi-square test

When we take many samples of the same size from a normal population and find the sample
means, they follow a normal distribution. However, when we take many samples of the same
size from a normal population and find the sample variances, they don't follow a normal
distribution but rather a chi-square distribution, which depends on the degrees of freedom.

Figure 8: Chi2 distribution

The higher the degrees of freedom, the closer it gets to a normal distribution. The area
under the curve is 1, and the cumulative probability runs from right to left: 1 towards the left
and 0 towards the right.

µ2 = 104.06,    v2 = 40.99,    dof = N − 1 = 14

chi_critical = 23.68

X² = dof × (v2/v1) = 35.62

Any value above chi_critical is in the rejection region, so we reject the null hypothesis:
the variance of Lori's is significantly higher than that of the population.
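A sketch of this chi-square variance test in Python; sigma2_pop, the population variance, is not shown in the reproduced exercise, so the value below is back-solved from the X² above and should be treated as an assumption:

```python
from scipy.stats import chi2

n = 15
s2 = 40.99          # sample variance
sigma2_pop = 16.11  # assumed population variance (back-solved from X² = 35.62)

dof = n - 1
x2 = dof * s2 / sigma2_pop  # ~35.6

crit = chi2.ppf(0.95, dof)  # ~23.68 for alpha = 0.05
print(x2, crit, x2 > crit)  # in the rejection region: reject H0
```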

2.7.3 Exercise 3
Three groups of samples of factory emissions from different plants of the same company. The
score is computed based on the composition of the emissions. We want to find out if there is
any inconsistency or difference across the three groups. The groups and their emissions are:
• A: 57, 56, 58, 58, 56, 59, 56, 55, 53, 54, 53, 42, 44, 34, 54, 54, 34, 64, 84, 24

• B: 49, 47, 49, 47, 49, 47, 49, 46, 45, 46, 41, 42, 41, 42, 42, 42, 14, 14, 34

• C: 49, 48, 46, 46, 49, 46, 45, 55, 61, 45, 45, 45, 49, 54, 44, 74, 54, 84, 39

Here we want the difference between three groups, so we can't use the regular tests and we
perform ANOVA. For one-way ANOVA the hypotheses are:

- H0: µ0 = µ1 = ... = µk

- H1: not all are equal

One-way ANOVA is used to see if there are significant differences in means between two or
more independent samples. For ANOVA, the ratio of between-group variability to within-group
variability should follow an F distribution if the null hypothesis is true. For two samples,
one-way ANOVA is equivalent to the two-sample t-test (F = t²).

dof(between) = k − 1 = 3 − 1 = 2
dof(within) = n − k = 59 − 3 = 56
dof(total) = dof(between) + dof(within) = 58
F_critical = 3.161

Ā = 52.45,    B̄ = 41.36,    C̄ = 51.45
overall_mean = 48.54
SS_total = Σ (x_i − overall_mean)² = 8548.64
SS_within = Σ(a_i − Ā)² + Σ(b_i − B̄)² + Σ(c_i − C̄)² = 7096.32
SS_between = SS_total − SS_within = 1452.32
MS_between = SS_between / dof(between) = 726.16
MS_within = SS_within / dof(within) = 126.72
F_statistic = MS_between / MS_within = 5.73

F_statistic > F_critical, so we reject the null hypothesis.
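The same test in one scipy call, using the emission scores listed above (the result may differ slightly from the hand computation):

```python
from scipy.stats import f_oneway

a = [57, 56, 58, 58, 56, 59, 56, 55, 53, 54, 53, 42, 44, 34, 54, 54, 34, 64, 84, 24]
b = [49, 47, 49, 47, 49, 47, 49, 46, 45, 46, 41, 42, 41, 42, 42, 42, 14, 14, 34]
c = [49, 48, 46, 46, 49, 46, 45, 55, 61, 45, 45, 45, 49, 54, 44, 74, 54, 84, 39]

f_stat, p = f_oneway(a, b, c)
print(f_stat, p)  # F above the critical value, small p: reject H0
```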

2.7.4 Exercise 4
A study found that in 2015, out of a world population of 7.7 billion, 36.7 million people
were diagnosed HIV positive. In India, which has a population of 1.3 billion, there were 2.1
million people diagnosed HIV positive. Is India's HIV proportion significantly different from
that of the world population?

- H0 : P _india = P _population

- H1 : P _india ̸= P _population
This is a two-tailed z-test of proportions. The z statistic is calculated as:

z = (P_sample − P0) / √(P0(1 − P0)/n)

In our case:

z = (0.0016 − 0.0047) / √(0.0047(1 − 0.0047)/2100000) = −65.68

Thus we reject the null hypothesis and say the proportion is significantly different.
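A sketch of this proportion z-test; n follows the note's computation above:

```python
from math import sqrt
from scipy.stats import norm

p_sample = 2.1e6 / 1.3e9  # ~0.0016, HIV proportion in India
p0 = 36.7e6 / 7.7e9       # ~0.0047, world proportion
n = 2_100_000             # sample size used in the note's computation

z = (p_sample - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 2 * norm.sf(abs(z))  # two-tailed

print(z, p_value)              # |z| is huge, p ~ 0: reject H0
```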
