2071 TC2AILab5

Uploaded by

aarushgupta956

Lab 5

Principles of Data Science Engineering

Topic: Hypothesis Testing Using the Iris Dataset

Name – Neeraj Chormale

PRN – 20220802071
Batch – A2

Aim:
The purpose of this lab is to introduce hypothesis testing in Python using three statistical tests:
the t-test, ANOVA, and the chi-square test. By applying these techniques to the well-known Iris
dataset, you will learn how to test assumptions about population means and relationships between
categorical variables.

Introduction to Hypothesis Testing:

Hypothesis testing is a statistical method used to make inferences or draw conclusions about a
population based on a sample of data. It helps in determining whether there is enough evidence in a
sample of data to infer that a certain condition is true for the entire population.

Key concepts in hypothesis testing:

• Null Hypothesis (H₀): The statement that there is no effect or no difference. It is what you try
to disprove or reject.
• Alternative Hypothesis (H₁): The statement that there is an effect or a difference. It is what you
want to prove.

• p-value: The probability of observing results at least as extreme as those in the sample, assuming
the null hypothesis is true. A small p-value (conventionally < 0.05) is taken as evidence against
the null hypothesis.
• Significance Level (α): A threshold (commonly 0.05) used to decide whether to reject the null
hypothesis.
• Test Statistic: A value calculated from the data used to determine whether to reject the null
hypothesis.
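
The relationship between the p-value and the significance level can be sketched as a small decision rule. This is only an illustration of the convention described above; the function name decide is hypothetical and not part of any library.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the test decision for a p-value at significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    # Reject H0 only when the p-value falls strictly below the threshold.
    return "reject H0" if p_value < alpha else "fail to reject H0"


print(decide(0.003))  # small p-value: evidence against H0
print(decide(0.27))   # large p-value: not enough evidence against H0
```

The same rule is applied unchanged in all three problems below; only the test statistic and its reference distribution differ.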

Dataset: The Iris Dataset

The Iris dataset is one of the most famous datasets in the field of machine learning. It consists of 150
observations (50 per species), each with the following attributes:
▪ Sepal length (cm)
▪ Sepal width (cm)
▪ Petal length (cm)
▪ Petal width (cm)
▪ Species (Iris-setosa, Iris-versicolor, and Iris-virginica)

Each observation represents a different iris flower from one of the three species, and the dataset
contains measurements for each flower's sepals and petals.
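
One possible way to hold these observations in plain Python is sketched below. The IrisRow type and the group_by_species helper are hypothetical names chosen for this example, and the four rows are illustrative placeholder values, not measurements taken from the real dataset. In practice the full data could be loaded with a library (for example scikit-learn's load_iris), which is not shown here.

```python
from collections import defaultdict
from typing import NamedTuple


class IrisRow(NamedTuple):
    sepal_length: float  # cm
    sepal_width: float   # cm
    petal_length: float  # cm
    petal_width: float   # cm
    species: str


# Placeholder rows for illustration only (NOT real Iris measurements).
rows = [
    IrisRow(5.0, 3.4, 1.5, 0.2, "Iris-setosa"),
    IrisRow(4.8, 3.1, 1.4, 0.2, "Iris-setosa"),
    IrisRow(6.1, 2.9, 4.5, 1.4, "Iris-versicolor"),
    IrisRow(6.6, 3.0, 5.6, 2.1, "Iris-virginica"),
]


def group_by_species(rows, column):
    """Collect the values of one numeric column into a list per species."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.species].append(getattr(row, column))
    return dict(groups)


sepal_lengths = group_by_species(rows, "sepal_length")
print(sepal_lengths)
```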

Problem 1: Two-Sample t-test

Objective: To test if there is a significant difference in the sepal lengths between the species
Iris-setosa and Iris-versicolor.
Hypotheses:

▪ Null Hypothesis (H₀): There is no significant difference between the mean sepal lengths of setosa
and versicolor species. (μ₁ = μ₂)
▪ Alternative Hypothesis (H₁): There is a significant difference between the mean sepal lengths of
setosa and versicolor species. (μ₁ ≠ μ₂)

Steps:

1. Select the data for the two species (setosa and versicolor).
2. Calculate the mean and standard deviation of the sepal lengths for both species.
3. Use a two-sample t-test to determine if the difference in means is statistically significant.
4. Calculate the t-statistic and p-value.
5. Compare the p-value with the significance level (α = 0.05) to decide whether to reject or fail to
reject the null hypothesis.
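
Steps 2 to 4 can be sketched with the standard library alone. The lab does not say which variant of the two-sample t-test is intended, so this example assumes the classic Student's version with a pooled variance (equal variances in both groups); Welch's unequal-variance version would use a different standard error and degrees of freedom. The function name pooled_t_statistic is hypothetical and the sample values are placeholders, not real Iris measurements.

```python
import math
from statistics import mean, stdev


def pooled_t_statistic(a, b):
    """Return (t, df) for Student's two-sample t-test with pooled variance."""
    n1, n2 = len(a), len(b)
    m1, m2 = mean(a), mean(b)
    s1, s2 = stdev(a), stdev(b)  # sample standard deviations (n - 1 denominator)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / standard_error, df


# Placeholder samples for illustration only (NOT real Iris measurements).
setosa = [5.0, 4.8, 5.2, 4.9, 5.1]
versicolor = [6.0, 5.8, 6.3, 5.7, 6.2]

t_stat, df = pooled_t_statistic(setosa, versicolor)
print(f"t = {t_stat:.3f}, df = {df}")
```

With SciPy installed, scipy.stats.ttest_ind performs the same pooled test by default and also returns the p-value.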
Interpretation:
- If the p-value is less than 0.05, reject the null hypothesis, meaning there is a statistically
significant difference in mean sepal lengths between setosa and versicolor.
- If the p-value is 0.05 or greater, fail to reject the null hypothesis, meaning the data do not
show a statistically significant difference in mean sepal lengths.
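
To make step 5 concrete without external libraries, the two-sided p-value can be approximated by numerically integrating the density of Student's t distribution. The sketch below uses Simpson's rule; the function names and the number of integration steps are choices made for this example. It is meant for moderate degrees of freedom (math.gamma overflows for very large df) and loses precision for extremely small p-values, so a statistics library is preferable in real work.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, steps=2000):
    """Approximate P(|T| >= |t|) using Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 <= T <= |t|)
    # By symmetry, the two tails together hold 1 - 2 * central_area.
    return max(0.0, 1 - 2 * central_area)


p = two_sided_p_value(2.306, 8)
print(f"p = {p:.4f}")
print("reject H0" if p < 0.05 else "fail to reject H0")
```

The value t = 2.306 with 8 degrees of freedom is the standard two-sided 5% critical value, so the p-value here sits right at the threshold.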

Problem 2: One-Way ANOVA (Analysis of Variance)

Objective:
To test if there is a significant difference in the sepal lengths across all three species (setosa, versicolor,
and virginica).

Hypotheses:

▪ Null Hypothesis (H₀): The means of sepal lengths are equal for all species. (μ₁ = μ₂ = μ₃)

▪ Alternative Hypothesis (H₁): At least one species has a different mean sepal length.
(μ₁ ≠ μ₂, μ₁ ≠ μ₃, or μ₂ ≠ μ₃)

Steps:

1. Group the data by species and calculate the means and standard deviations for sepal
lengths.
2. Use the one-way ANOVA test to compare the means of sepal lengths across the three
species.
3. Calculate the F-statistic and p-value.
4. Compare the p-value with the significance level (α = 0.05) to decide whether to reject or
fail to reject the null hypothesis.
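
The F-statistic in step 3 is the ratio of the between-group mean square to the within-group mean square. A minimal standard-library sketch is given below; the function name one_way_anova_f is hypothetical and the three samples are toy numbers, not real Iris measurements. Turning F into a p-value requires the F distribution, which is not implemented here; with SciPy installed, scipy.stats.f_oneway returns both the statistic and the p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy samples for illustration only (NOT real Iris measurements).
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f"F = {f_stat:.3f}, df = ({df_b}, {df_w})")
```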
Interpretation:
▪ If the p-value is less than 0.05, reject the null hypothesis, indicating that at least one species has
a significantly different mean sepal length.
▪ If the p-value is greater than 0.05, fail to reject the null hypothesis, suggesting that the means are
not significantly different across species.

Problem 3: Chi-Square Test for Independence

Objective:
To test whether there is a relationship between species and different categories of sepal width (e.g.,
narrow, medium, wide).

Hypotheses:

- Null Hypothesis (H₀): There is no relationship between species and sepal width categories (i.e.,
the two variables are independent).

- Alternative Hypothesis (H₁): There is a relationship between species and sepal width categories
(i.e., the two variables are dependent).
Steps:
1. Divide the sepal width data into categories (e.g., narrow, medium, wide).
2. Create a contingency table showing the frequency of species across these categories.
3. Perform a chi-square test to determine if the distribution of species is independent of
sepal width categories.
4. Calculate the chi-square statistic and p-value.
5. Compare the p-value with the significance level (α = 0.05) to decide whether to reject or
fail to reject the null hypothesis.
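
Steps 1 to 4 can be sketched as follows. The lab does not specify the category boundaries, so the cut-offs of 2.8 cm and 3.3 cm are assumptions made purely for illustration, as are the function names. The table of counts in the usage example is a toy table, not one computed from the real dataset. The statistic is the plain Pearson chi-square without any continuity correction.

```python
from collections import Counter


def categorize(width, narrow_below=2.8, wide_above=3.3):
    """Bin a sepal width (cm); the cut-offs are assumed for illustration."""
    if width < narrow_below:
        return "narrow"
    if width > wide_above:
        return "wide"
    return "medium"


def contingency_table(species, widths, species_levels, width_levels):
    """Count observations in each (species, width category) cell."""
    counts = Counter(zip(species, (categorize(w) for w in widths)))
    return [[counts[(s, c)] for c in width_levels] for s in species_levels]


def chi_square_statistic(table):
    """Return (chi2, df); every row and column total must be non-zero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Toy counts for illustration only: rows = species, columns = narrow/medium/wide.
toy_table = [[1, 4, 5], [5, 4, 1], [4, 5, 1]]
chi2, df = chi_square_statistic(toy_table)
print(f"chi2 = {chi2:.3f}, df = {df}")
```

With SciPy installed, scipy.stats.chi2_contingency computes the same statistic along with the p-value and the expected counts; note that by default it applies a continuity correction to 2x2 tables.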
Interpretation:
- If the p-value is less than 0.05, reject the null hypothesis, indicating that sepal width and species
are related (dependent).
- If the p-value is greater than 0.05, fail to reject the null hypothesis, suggesting that sepal width
and species are independent.
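
For a table with three species and three width categories the test has (3 - 1) x (3 - 1) = 4 degrees of freedom. When the degrees of freedom are even, the upper-tail probability of the chi-square distribution has a closed form, which allows the p-value to be computed without a statistics library. The sketch below relies on that special case only; the function name chi2_sf_even_df is hypothetical, and odd degrees of freedom are deliberately rejected.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X >= x) for a chi-square variable with an even number of df.

    Uses the identity sf(x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    if x < 0:
        raise ValueError("the chi-square statistic cannot be negative")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


p = chi2_sf_even_df(9.4877, 4)
print(f"p = {p:.4f}")
print("reject H0" if p < 0.05 else "fail to reject H0")
```

The value 9.4877 is the standard 5% critical value for 4 degrees of freedom, so the p-value here lies right at the threshold.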
Conclusion

In this lab, we explored three statistical tests on the Iris dataset:

1. Two-sample t-test to compare the means of sepal lengths between two species.
2. One-way ANOVA to compare the means of sepal lengths across all three species.
3. Chi-square test to determine whether species and sepal width categories are independent.
