Inferential Statistics

This document discusses inferential statistics, emphasizing its role in making predictions about populations based on sample data, and outlines various sampling techniques. It covers key concepts such as hypothesis testing, including Z-Test, T-Test, correlation tests, and Chi-Square tests, explaining their applications in data analysis. The article highlights the importance of inferential statistics for data analysts and mentions that it does not cover all existing techniques and tests.

Uploaded by

aimlhod


Statistics for Data Analysts:

Inferential Statistics with Python

Introduction

In data analysis, statistics is important for
understanding data, discovering trends, and
analyzing data efficiently, all of which coincide with
the purpose of data analysis. Statistics is divided
into two broad areas based on purpose: Descriptive
Statistics and Inferential Statistics.

This article is the second in the series Statistics
for Data Analysis, and it covers only Inferential
Statistics using Python. The previous article in the
series covers Descriptive Statistics with Python.

Inferential Statistics

Inferential statistics generally involves drawing
inferences and/or making predictions about a
population. In many cases, inferences are made about
a population using a sample. Unlike descriptive
statistics, where known sample or population data is
described, inferential statistics uses sample data to
draw conclusions about the population.

Sampling and Sampling Techniques

Gathering information about the entire population
can be very difficult and, in some cases, impossible.
Due to this limitation, a smaller fraction of the
population, known as the sample, is analyzed, and
inferences about the population are made from the
sample data collected. The sample must be
representative of the population for the deductions
to be correct. Usually, this depends on factors such
as the sample size and the sampling technique used.

Sampling Techniques

Generally, there are two sampling categories:
Random/Probabilistic sampling and Non-Probability
sampling. In the former, elements are selected at
random, which guards against selection bias. In
non-probability sampling, elements are selected by
deliberate choice. For example, you might want to
select the best students to represent a school in a
competition instead of selecting students at random.
Under these two broad categories lie several
sampling techniques.

Simple Random Sampling is the simplest and most
common technique. Here, every element in the
population has an equal chance of being selected.

Another popular probabilistic sampling procedure is
the stratified sampling technique. In this case, the
population is divided into groups of related elements
called strata. Samples are then collected from each
stratum. For example, data might be collected from
each age group in the population instead of from the
population as a whole at random.

The random.sample() function is typically used to
select a sample from a population in Python. The
population is passed as the first argument and the
number of elements to select as the second. The
elements are chosen without replacement. See the
Python documentation of the random module for
details.
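
As a minimal sketch of the two techniques described above, the snippet below draws a simple random sample with random.sample() and then a stratified sample by applying the same function within each stratum. The population, the age_group field, and the stratified_sample helper are hypothetical and exist only for illustration.

```python
import random

# Hypothetical population: 100 people, each with an ID and an age group.
random.seed(42)  # fixed seed so the sketch is reproducible
population = [
    {"id": i, "age_group": random.choice(["18-29", "30-49", "50+"])}
    for i in range(100)
]

# Simple random sampling: every element has an equal chance of selection.
simple_sample = random.sample(population, k=10)


def stratified_sample(items, key, fraction):
    """Draw roughly the same fraction from each stratum defined by `key`."""
    strata = {}
    for item in items:
        strata.setdefault(item[key], []).append(item)
    sample = []
    for members in strata.values():
        # Take at least one element per stratum.
        k = max(1, round(len(members) * fraction))
        sample.extend(random.sample(members, k))
    return sample


stratified = stratified_sample(population, key="age_group", fraction=0.1)
print(len(simple_sample), "elements in the simple random sample")
print(len(stratified), "elements in the stratified sample")
```

Sampling within each stratum guarantees that every age group appears in the result, which a simple random sample of the same size does not.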

Hypothesis Testing

Hypothesis testing is a statistical inference
technique used to assess statements made about a
population using the sample data provided. We can
think of hypothesis testing as an experiment: a
hypothesis is stated before the experiment starts,
and after experimentation we check whether the
results agree with the statement or not.

Hypothesis testing is one of the most significant
aspects of inferential statistics. Several tests are
available, and the specific test to use depends on
the data and the purpose of the test. As a data
analyst, you will need to be familiar with a number
of them. This article covers the following tests:

• Z-Test & T-Test

• Correlation Test

• Chi-Square Tests

Z-Test & T-Test

The Z-Test is a hypothesis test typically used to
determine whether the means of two populations are
significantly different, or whether the mean of a
population is greater than, less than, or equal to a
specific value. This test is used when the
variance(s) of the population(s) is/are known and the
data follows a normal distribution. When the sample
size is large, the sample mean is approximately
normally distributed (by the central limit theorem),
so the test can be applied even if the data itself is
not normal.

Using a case study of the performance of students in
2 classes, the Z-Test can be used to ascertain
whether there is a significant difference in scores.
In this scenario, the null hypothesis is that the
mean scores of the two classes are equal. The
hypothesis test tells us whether the data gives
enough evidence to reject this claim. Usually, for
hypothesis tests, a 5% level of significance is
applied, and the null hypothesis is rejected if the
p-value produced is less than the level of
significance.
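
The two-class case study can be sketched as follows. This is a hand-written two-sample Z-Test that uses only the Python standard library, not necessarily the function the original article relied on (statsmodels, for instance, provides a ztest function). The scores and the "known" population standard deviations are made-up values, and two_sample_z_test is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist, mean


def two_sample_z_test(sample_a, sample_b, sigma_a, sigma_b):
    """Two-sided z-test for equal means, given known population std devs."""
    standard_error = sqrt(
        sigma_a**2 / len(sample_a) + sigma_b**2 / len(sample_b)
    )
    z = (mean(sample_a) - mean(sample_b)) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical scores for the two classes (illustrative values only).
class_a = [72, 85, 78, 90, 66, 81, 77, 88, 69, 74]
class_b = [65, 70, 82, 60, 75, 68, 73, 79, 64, 71]

# Assumed known population standard deviations for each class.
z_stat, p_value = two_sample_z_test(class_a, class_b, sigma_a=8.0, sigma_b=7.0)

alpha = 0.05  # 5% level of significance
print(f"z = {z_stat:.3f}, p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```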

The T-Test has a similar purpose to the Z-Test.
However, it is applied when the population standard
deviation is not known, or when the sample size is
small (n < 30).

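Let us paint another scenario of a coach who trains
junior athletes to run a 100-meter race. The coach
believes that the average time of her students is 10
seconds. To check this, she selects 10 athletes and
records their times. A one-sample T-Test can then
test the null hypothesis that the mean time is 10
seconds.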
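
A minimal sketch of the coach's scenario, assuming SciPy is installed and using its scipy.stats.ttest_1samp function for a one-sample T-Test. The ten recorded times are invented for illustration.

```python
from scipy import stats

# Hypothetical 100 m times (in seconds) for the 10 selected athletes.
times = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.3, 10.0, 10.6, 10.2]

# H0: the mean time is 10 seconds (the alternative is two-sided by default).
result = stats.ttest_1samp(times, popmean=10.0)

alpha = 0.05  # 5% level of significance
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
print("Reject H0" if result.pvalue < alpha else "Fail to reject H0")
```

With only 10 athletes and an unknown population standard deviation, the T-Test is the appropriate choice here rather than the Z-Test.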

Correlation Test

Correlation describes the degree of relationship
between two (or more) variables. For example, there
might be a positive relationship between hours of
practice and overall performance: “The more you
practice, the better your results in an examination
will be”.

The correlation test checks whether the relationship
between these variables is statistically significant.
The Pearson Correlation Coefficient is a popular
correlation coefficient that measures the strength
and direction of the linear relationship between 2
variables. It ranges from -1 to 1.

For instance, the relationship between test scores
and exam scores can be tested using Pearson
correlation. The pearsonr function in SciPy
(scipy.stats.pearsonr) returns the correlation
coefficient together with a p-value for testing
whether the correlation is significant. The null
hypothesis for the correlation test is that there is
no correlation between the variables.
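
The test-score versus exam-score example might look like the sketch below. The scores are hypothetical values for eight students, chosen only to illustrate the call.

```python
from scipy.stats import pearsonr

# Hypothetical test and exam scores for the same eight students.
test_scores = [55, 62, 70, 48, 81, 90, 66, 74]
exam_scores = [58, 65, 72, 50, 85, 88, 70, 71]

# pearsonr returns the coefficient r and a two-sided p-value.
r, p_value = pearsonr(test_scores, exam_scores)

alpha = 0.05  # 5% level of significance
print(f"r = {r:.3f}, p = {p_value:.5f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```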

Chi-Square Tests

There are three types of Chi-Square Tests:

• Chi-Square Test of Independence

• Chi-Square Goodness of Fit Test

• Chi-Square Test of Homogeneity

The most popular of these tests are the Chi-Square
Test of Independence and the Goodness of Fit Test.

The Chi-Square Goodness of Fit test compares the
observed frequencies of the categories in the sample
with the frequencies expected under a hypothesized
distribution. It is used, mostly, to ascertain
whether the sample data is a true representation of
the population. On the other hand, the Chi-Square
test of independence is used to determine whether
the relationship between two categorical variables
is significant. It differs from the correlation test
because, unlike the correlation test, which focuses
on quantitative variables, this chi-square test deals
with categorical variables.

SciPy's stats.chisquare function is used to compute
the goodness of fit test, while the
stats.chi2_contingency function is used to compute
the chi-square test of independence.
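
Both functions can be tried out on small made-up data sets, as in the sketch below. The die-roll counts and the 2x2 contingency table are hypothetical. Note that chi2_contingency applies Yates' continuity correction by default for 2x2 tables.

```python
from scipy.stats import chi2_contingency, chisquare

# Goodness of fit: hypothetical counts from 120 rolls of a die,
# compared with the 20 rolls per face expected from a fair die.
observed = [18, 22, 20, 25, 15, 20]
expected = [20, 20, 20, 20, 20, 20]
gof_stat, gof_p = chisquare(f_obs=observed, f_exp=expected)
print(f"Goodness of fit: chi2 = {gof_stat:.3f}, p = {gof_p:.4f}")

# Independence: hypothetical 2x2 table of two categorical variables
# (rows: group A / group B, columns: prefers X / prefers Y).
table = [[30, 20],
         [25, 35]]
chi2, p_value, dof, expected_freq = chi2_contingency(table)
print(f"Independence: chi2 = {chi2:.3f}, p = {p_value:.4f}, dof = {dof}")
```

In both cases, the null hypothesis is rejected at the 5% level only if the reported p-value is below 0.05.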

Inferential Statistics is an extremely valuable tool
for every aspiring data analyst. From applying
sampling techniques in your data collection process
to applying hypothesis tests to draw conclusions
from your data, it is too valuable to dismiss. It is
worth mentioning that this article does not exhaust
all the sampling techniques and hypothesis tests
that exist. However, it covers some important and
widely used ones that you are likely to come across.
