
MDS 202: Data Science II with R

Lectures 10-12: What are Statistical Tests?

Dr. Shatrughan Singh (Amity University Rajasthan, Jaipur; [email protected])

Week 4 (27 Feb. - 3 March) 2023

1 LEARNING OBJECTIVES
1.1 What are you expected to learn from this lecture?
• Learn about the different statistical tests used in data science.
• Learn how to calculate these statistical tests in R.

4 Common Statistical Tests


Continued from the previous lectures …

4.2 Z-Test
• It is a parametric test of hypothesis testing.
• Essentially, it tests the significance of the difference between mean values when the sample size is large (i.e., greater than 30) and the population variance is known.
• Assumptions of this test are as follows:
– Population distribution is normal.
– Samples are random and independent.
– The sample size is large.
– Population standard deviation is known.

4.2.1 One-Sample Z-Test


To compare a sample mean with the population mean.

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

where $\bar{x}$ is the sample mean, $\mu$ the population mean, $\sigma$ the population standard deviation, and $n$ the sample size.
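Base R does not ship a z-test function (the BSDA package provides a packaged z.test()), so a minimal hand computation is sketched below; the sample x, hypothesized mean mu0, and known population standard deviation sigma are illustrative assumptions:

```r
# One-sample z-test computed by hand (base R has no built-in z-test).
set.seed(1)
x     <- rnorm(50, mean = 102, sd = 15)  # illustrative sample, n > 30
mu0   <- 100                             # hypothesized population mean
sigma <- 15                              # assumed known population std. deviation

z <- (mean(x) - mu0) / (sigma / sqrt(length(x)))
p_value <- 2 * pnorm(-abs(z))            # two-tailed p-value
c(z = z, p = p_value)
```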

4.2.2 Two-Sample Z-Test


To compare the means of two different samples.

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}}$$

where $\bar{x}_1$ and $\bar{x}_2$ are the sample means, $\sigma_1$ and $\sigma_2$ are the known population standard deviations, and $n_1$ and $n_2$ are the sample sizes for groups 1 and 2, respectively.
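The same hand computation extends to two samples; x1, x2 and the known population standard deviations sigma1, sigma2 below are illustrative:

```r
# Two-sample z-test computed by hand.
set.seed(1)
x1 <- rnorm(60, mean = 100, sd = 15)  # illustrative group 1
x2 <- rnorm(55, mean = 105, sd = 12)  # illustrative group 2
sigma1 <- 15                          # assumed known population std. deviations
sigma2 <- 12

z <- (mean(x1) - mean(x2)) / sqrt(sigma1^2 / length(x1) + sigma2^2 / length(x2))
p_value <- 2 * pnorm(-abs(z))         # two-tailed p-value
c(z = z, p = p_value)
```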

4.3 F-Test
• It is a parametric test of hypothesis testing based on Snedecor's F-distribution.
• It tests the null hypothesis that two normal populations have the same variance.
• The F-test is regarded as a comparison of the equality of sample variances.
• The F-statistic is simply a ratio of two variances.
• By changing the variances in the ratio, the F-test becomes a very flexible test. It can then be used to:
– Test the overall significance of a regression model,
– Compare the fits of different models, and
– Test the equality of means.
• Assumptions of this test are as follows:
– Population distribution is normal
– Samples are random and independent

4.3.1 F-Test is calculated as:


$$F = \frac{s_1^2}{s_2^2}$$

where

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}$$

$x_i$ = value of the observations from $i = 1$ to $n$, $\bar{X}$ is the mean of the sample, and $n$ is the number of observations.
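In R, the two-variance F-test is available as var.test() in the base stats package; the two samples below are illustrative:

```r
# F-test for equality of two variances: F = var(x) / var(y).
set.seed(1)
x <- rnorm(30, sd = 2)  # illustrative sample 1
y <- rnorm(30, sd = 3)  # illustrative sample 2

var.test(x, y)          # reports F, degrees of freedom, p-value, and CI
```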

4.4 ANOVA: ANalysis Of VAriance


• It is a parametric test of hypothesis testing.
• It is an extension of the T-test and Z-test.
• It is used to test the significance of the differences in the mean values among more than two sample groups.
• It uses the F-test to statistically test the equality of means and the relative variance between them.
• Assumptions of this test:
– Population distribution is normal
– Samples are random and independent
– Homogeneity of sample variance
• One-Way ANOVA and Two-Way ANOVA are different types of the ANOVA test.
• The F-test determines whether the variability between group means is larger than the variability of the observations within the groups:

$$F = \frac{\text{between-groups variance}}{\text{within-groups variance}}$$
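In R, a one-way ANOVA is fit with aov() and summarized with summary(); the data frame below is an illustrative sketch with three groups:

```r
# One-way ANOVA comparing mean scores across three illustrative groups.
set.seed(1)
df <- data.frame(
  score = c(rnorm(20, mean = 10), rnorm(20, mean = 12), rnorm(20, mean = 11)),
  group = factor(rep(c("A", "B", "C"), each = 20))
)

fit <- aov(score ~ group, data = df)
summary(fit)  # F-statistic and p-value for the group effect
```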

4.5 Chi-Square Test


• It is a non-parametric test of hypothesis testing.
• As a non-parametric test, chi-square can be used:
– As a test of 'goodness of fit', which determines whether a particular distribution fits the observed data or not.
– As a test of independence of two variables.
• It helps in assessing the goodness of fit between a set of observed and theoretically expected values, by comparing the expected frequencies with the observed frequencies. In a nutshell, the greater the difference, the greater the value of chi-square.
• If there is no difference between the expected and observed frequencies, then the value of chi-square is
equal to zero.
• This test is calculated as:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where $O$ = observed frequencies and $E$ = expected frequencies.
• Conditions for chi-square test:
– Randomly collect and record the observations.
– All the entities in a sample must be independent.
– None of the groups should contain very few items (say, fewer than 10).
– The overall number of observations should be reasonably large; normally, it should be at least 50, however small the number of groups may be.
• Chi-square as a parametric test is used as a test for population variance based on sample variance.
• If we take each one of a collection of sample variances, divide it by the known population variance, and multiply the quotient by $(n - 1)$, where $n$ is the number of items in the sample, we get the values of chi-square.
• It is calculated as:
$$\chi^2 = (n - 1)\,\frac{S^2}{\sigma^2}$$

where $S^2$ = sample variance, $\sigma^2$ = hypothesized population variance, and $n$ = sample size.


Or,

$$\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where $O_{ij}$ = observed frequencies and $E_{ij}$ = expected frequencies.
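In R, chisq.test() covers both uses of the test; the observed counts, hypothesized proportions, and contingency table below are illustrative:

```r
# Goodness of fit: do observed counts match hypothesized proportions?
observed   <- c(18, 22, 20, 40)      # illustrative observed frequencies
expected_p <- c(0.2, 0.2, 0.2, 0.4)  # hypothesized proportions (sum to 1)
chisq.test(observed, p = expected_p)

# Independence of two variables, via an illustrative 2x2 contingency table.
tab <- matrix(c(30, 10, 20, 40), nrow = 2)
chisq.test(tab)
```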

4.6 Mann-Whitney ‘U’ Test


• It is a non-parametric test of hypothesis testing.
• This test is used to investigate whether two independent samples were selected from a population
having the same distribution.
• It is a true non-parametric counterpart of the T-test and gives the most accurate estimates of significance, especially when sample sizes are small and the population is NOT normally distributed.
• It is based on the comparison of every observation in the first sample with every observation in the
other sample.
• Test statistic is: U
• The maximum value of U is $n_1 n_2$ and the minimum value is zero.
• This test is also known as:
– Mann-Whitney Wilcoxon Rank Sum Test (note: the Wilcoxon Signed Rank Test is different from the Wilcoxon Rank Sum Test, as the Rank Sum Test involves independent samples while the Signed Rank Test involves dependent samples)
• Mathematically, U is calculated as:

$$U_1 = R_1 - \frac{n_1(n_1 + 1)}{2}$$



$$U_2 = R_2 - \frac{n_2(n_2 + 1)}{2}$$

where $R_1$ and $R_2$ are the sums of ranks, and $n_1$ and $n_2$ are the sample sizes for samples 1 and 2, respectively.

When consulting the significance tables, the smaller of the two values $U_1$ and $U_2$ is used. The sum of the two values is given by

$$U_1 + U_2 = R_1 - \frac{n_1(n_1 + 1)}{2} + R_2 - \frac{n_2(n_2 + 1)}{2}$$

Knowing that $R_1 + R_2 = \frac{N(N+1)}{2}$ and $N = n_1 + n_2$, and doing some algebra, we can find that the sum is:

$$U_1 + U_2 = n_1 n_2$$
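In R, the Mann-Whitney U test is run with wilcox.test(); the two independent samples below are illustrative (the W statistic that R reports is the U value for the first sample):

```r
# Mann-Whitney U / Wilcoxon Rank Sum test for two independent samples.
x <- c(12, 15, 9, 20, 14, 11)  # illustrative sample 1
y <- c(16, 22, 19, 25, 18)     # illustrative sample 2

wilcox.test(x, y)              # W is U for the first sample
```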

4.7 Kruskal Wallis ‘H’ Test


• It is a non-parametric test of hypothesis testing.
• This test is used for comparing two or more independent samples of equal or different sample sizes.
• It extends the Mann-Whitney 'U' Test, which is used to compare only two groups.
• It is the non-parametric equivalent of the (parametric) One-Way ANOVA test. Actually, it is a One-Way ANOVA on RANKS.
• Assumption of normality is not required for this test.
• Test statistic is: H

$$H = \left[ \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{T_i^2}{n_i} \right] - 3(n + 1)$$

where $n$ = sum of the sample sizes for all samples, $k$ = number of samples, $T_i$ = sum of ranks in the $i$-th sample, and $n_i$ = size of the $i$-th sample.
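In R, this test is run with kruskal.test(); the values and group labels below are illustrative:

```r
# Kruskal-Wallis H test across three illustrative groups.
values <- c(7, 14, 14, 13, 12, 8, 15, 17, 16, 15, 10, 11, 19, 11, 9)
groups <- factor(rep(c("A", "B", "C"), each = 5))

kruskal.test(values ~ groups)  # H is reported as "Kruskal-Wallis chi-squared"
```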

OVERALL CONCLUSIONS
If the value of the test statistic is greater than the tabulated (critical) value, then Reject the Null Hypothesis.

End of the Lecture!!
