
Understanding Statistical Analysis: Techniques and Applications

The document provides an overview of statistical analysis, detailing its importance, types, and processes involved in analyzing data to identify patterns and trends. It covers various statistical methods such as descriptive, inferential, predictive, and hypothesis testing, explaining their applications in decision-making and forecasting. Additionally, it outlines the steps of statistical analysis, including data collection, organization, presentation, analysis, and interpretation.

Uploaded by Unor Job


Prepared by Jahbuikem Anderson

Statistical analysis is the process of collecting and analyzing data in order to discern patterns and trends. It reduces bias when evaluating data by relying on numerical analysis. This technique is useful for interpreting research findings, developing statistical models, and planning surveys and studies.

Statistical analysis is also a scientific tool in AI and ML. It helps collect and analyze large amounts of data to identify common patterns and trends and convert them into meaningful information. In simple words, statistical analysis is a data analysis tool that helps draw meaningful conclusions from raw and unstructured data.

The conclusions drawn from statistical analysis facilitate decision-making and help businesses make future predictions on the basis of past trends. Statistical analysis can be defined as the science of collecting and analyzing data to identify trends and patterns and then presenting them. It involves working with numbers, and businesses and other institutions use it to derive meaningful information from data.

Types of Statistical Analysis

Given below are six types of statistical analysis:

● Descriptive Analysis

Descriptive statistical analysis involves collecting, interpreting, analyzing, and summarizing data to present it in the form of charts, graphs, and tables. Rather than drawing conclusions, it simply makes complex data easy to read and understand.

● Inferential Analysis

Inferential statistical analysis draws meaningful conclusions about a whole population from the sample data analyzed. It is used to study the relationships between different variables or to make predictions for the whole population.

● Predictive Analysis

Predictive statistical analysis analyzes historical data to identify past trends and then predicts future events on the basis of those trends. It uses machine learning algorithms, data mining, data modelling, and artificial intelligence to conduct the statistical analysis of data.

● Prescriptive Analysis

Prescriptive analysis analyzes data and prescribes the best course of action based on the results. It is a type of statistical analysis that helps you make an informed decision.

● Exploratory Data Analysis

Exploratory data analysis is similar to inferential analysis. The difference is that it explores previously unknown associations in the data and analyzes the potential relationships within it.

● Causal Analysis

Causal statistical analysis determines the cause-and-effect relationships between different variables within the raw data. In simple words, it determines why something happens and what effect it has on other variables. Businesses can use this methodology to determine the reason for a failure.

Importance of Statistical Analysis

Statistical analysis eliminates unnecessary information and catalogs important data in an uncomplicated manner, making the monumental work of organizing inputs far more manageable. Once the data has been collected, statistical analysis can be used for a variety of purposes. Some of them are listed below:

● Statistical analysis helps summarize enormous amounts of data into easily digestible chunks.
● Statistical analysis aids in the effective design of laboratory, field, and survey investigations.
● Statistical analysis can support solid and efficient planning in any subject of study.
● Statistical analysis aids in establishing broad generalizations and in forecasting how much of something will occur under particular conditions.
● Statistical methods are effective tools for interpreting numerical data and are applied in practically every field of study. Statistical approaches have been developed for, and are increasingly applied in, the physical and biological sciences, such as genetics.
● Businesspeople, manufacturers, and researchers all use statistical approaches in their work. Statistics departments can be found in banks, insurance companies, and government agencies.
● A modern administrator, whether in the public or commercial sector, relies on statistical data to make sound decisions.
● Politicians can use statistics to support and validate their claims while also explaining the issues they address.

Statistical Analysis Process

There are five major steps involved in the statistical analysis process:

1. Data collection

The first step in statistical analysis is data collection. You can collect data through primary or secondary sources such as surveys, customer relationship management software, online quizzes, financial reports and marketing automation tools. To ensure the data is useful, choose a sample that is representative of the population. For example, a company might collect data from previous customers to understand buyer behaviors.

2. Data organization

The next step after data collection is data organization. Also known as data cleaning, this stage involves identifying and removing duplicate data and inconsistencies that may prevent you from getting an accurate analysis. This step is important because it can help companies ensure their data and the conclusions they draw from the analysis are correct.
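The cleaning step can be sketched in plain Python. This is a minimal illustration, not a prescribed procedure. It assumes the raw data arrive as a list of records with hypothetical `customer_id` and `spend` fields. The sketch normalizes inconsistent ID formatting, drops records with missing values, and removes duplicates while keeping the first occurrence.

```python
# Hypothetical raw records: an inconsistently formatted ID, a missing value,
# and an exact duplicate.
raw_records = [
    {"customer_id": "C001", "spend": "120.50"},
    {"customer_id": "c001 ", "spend": "120.50"},  # same customer, messy formatting
    {"customer_id": "C002", "spend": "80"},
    {"customer_id": "C003", "spend": None},       # missing value
    {"customer_id": "C002", "spend": "80"},       # exact duplicate
]


def clean(records):
    """Normalize fields, drop incomplete records, and remove duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        if record["spend"] is None:
            continue  # cannot be analyzed, so drop it
        customer_id = record["customer_id"].strip().upper()  # fix inconsistent formatting
        spend = float(record["spend"])                       # store numbers as numbers
        key = (customer_id, spend)
        if key in seen:
            continue  # duplicate of a record we already kept
        seen.add(key)
        cleaned.append({"customer_id": customer_id, "spend": spend})
    return cleaned


cleaned_records = clean(raw_records)
print(cleaned_records)
```

Real data sets usually need rules tailored to their own fields. The three rules here (normalize, drop missing, deduplicate) simply mirror the description above.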

3. Data presentation

Data presentation is an extension of data cleaning, as it involves arranging the data for easy analysis. Here, you can use descriptive statistics tools to summarize the data. Data presentation can also help you determine the best way to present the data based on its arrangement.
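As a small illustration of using descriptive statistics to summarize data, the sketch below uses Python's built-in `statistics` module. It works on a made-up list of monthly sales figures (the numbers are hypothetical). It only describes the data and draws no conclusions from it.

```python
import statistics

# Hypothetical monthly sales figures (units), made up for illustration.
sales = [12, 15, 11, 18, 15, 20, 14]

summary = {
    "count": len(sales),
    "mean": statistics.mean(sales),
    "median": statistics.median(sales),
    "mode": statistics.mode(sales),
    "minimum": min(sales),
    "maximum": max(sales),
    "standard deviation": statistics.stdev(sales),  # sample SD (divides by n - 1)
}

# Present the summary as a simple two-column table.
for name, value in summary.items():
    print(f"{name:<20}{round(value, 2)}")
```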

4. Data analysis

Data analysis involves manipulating data sets to identify patterns, trends and relationships using statistical techniques, such as inferential and associational statistical analysis. You can use computer software like spreadsheets to automate this process, reduce the likelihood of human error, and analyze data efficiently.

5. Data interpretation

The last step is data interpretation, which draws conclusions that address the purpose of the analysis. After analysis, you can present the results as charts, reports, scorecards and dashboards to make them accessible to nonprofessionals. For example, suppose you analyze how a 6,000-worker factory affected the crime rate in a small town of 13,000 residents. Interpreting that analysis might show that criminal activity declined. You may use a line graph to display this decline.

Four Common Statistical Analysis Methods

Here are four common methods for performing statistical analysis:

Mean

You can calculate the mean, or average, by finding the sum of a list of numbers and then dividing the answer by the number of items in the list. It is the simplest form of statistical analysis, allowing the user to determine the central point of a data set. The formula for calculating the mean is:

Mean = Sum of the numbers in the set / Number of items in the set

Example: You can find the mean of the numbers 1, 2, 3, 4, 5 and 6 in two steps. First, add the numbers together (1 + 2 + 3 + 4 + 5 + 6 = 21). Then divide the answer by the number of figures in the list, which is six. The mean of the numbers is 21 / 6 = 3.5.
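The mean formula translates directly into code. Here is a minimal Python sketch, cross-checked against the standard library's `statistics.mean`:

```python
import statistics


def mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)


data = [1, 2, 3, 4, 5, 6]
print(mean(data))  # 21 / 6 = 3.5

# The standard library gives the same answer.
assert mean(data) == statistics.mean(data)
```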

Standard deviation

Standard deviation (SD) is used to determine the dispersion of data points. It is a statistical analysis method that shows how the data spreads around the mean. A high standard deviation means the data disperses widely from the mean. A low standard deviation shows that most of the data are close to the mean.

One application of SD is to test whether participants in a survey gave similar answers. If a large percentage of respondents' answers are similar, you have a low standard deviation. The responses are then consistent, which may make it more reasonable to apply them to a larger population. To calculate standard deviation, use this formula:
σ = √( Σ(x − μ)² / n )
● σ represents standard deviation
● Σ represents the sum over all data points
● x represents each value in the data set
● μ represents the mean of the data
● n represents the number of data points in the population
The quantity under the square root, Σ(x − μ)² / n, is the variance (σ²).

Example: You can calculate the standard deviation of the data set used in the mean calculation. The first step is to find the variance of the data set. To find the variance, subtract the mean from each value in the data set and square the answer. Then add everything together and divide by the number of data points.

Variance = ((1 − 3.5)² + (2 − 3.5)² + (3 − 3.5)² + (4 − 3.5)² + (5 − 3.5)² + (6 − 3.5)²) / 6

Variance = (6.25 + 2.25 + 0.25 + 0.25 + 2.25 + 6.25) / 6

Variance = 17.5 / 6 ≈ 2.917

Next, you can calculate the square root of the variance to find the standard deviation of the data.

Standard deviation = √2.917 ≈ 1.708
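Here is the same calculation in Python, as a sketch that follows the population formula above (dividing by n). The standard library's `statistics.pvariance` and `statistics.pstdev` use the same population definitions, so they serve as a cross-check. Note that `statistics.variance` and `statistics.stdev` divide by n − 1 instead (the sample versions). They would give about 3.5 and 1.871 for this data set.

```python
import math
import statistics

data = [1, 2, 3, 4, 5, 6]
n = len(data)
mu = sum(data) / n

# Population variance: the average squared deviation from the mean (divide by n).
variance = sum((x - mu) ** 2 for x in data) / n
sd = math.sqrt(variance)

print(f"variance = {variance:.4f}, standard deviation = {sd:.4f}")

# The standard library's population versions give the same results.
assert math.isclose(variance, statistics.pvariance(data))
assert math.isclose(sd, statistics.pstdev(data))
```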

Regression

Regression is a statistical technique used to find the relationship between a dependent variable and an independent variable. It helps track how changes in one variable affect changes in another. Regression can show whether the relationship between two variables is weak, strong or varies over a time interval. The regression formula is:

Y = a + b(x)
● Y represents the dependent variable, which is the variable you want to measure or predict
● x represents the independent variable, or the data used to predict the dependent variable
● a represents the y-intercept, or the value of Y when x equals zero
● b represents the slope of the regression line
Example: Find the dollar cost of maintaining a car driven for 40,000 miles. The cost of maintenance when there is no mileage on the car is $100. Take b as 0.02, so the cost of maintenance increases by $0.02 for every additional mile driven.
● Y = cost of maintaining the car
● x = 40,000 miles
● a = $100
● b = $0.02
Y = $100 + 0.02(40,000)
Y = $100 + $800 = $900
Under this model, mileage adds $800 to the car's baseline maintenance cost, so maintenance costs rise with mileage.

Hypothesis testing

Hypothesis testing is used to test whether a conclusion is valid for a specific data set by comparing the data against a certain assumption. The default assumption, which the test may reject, is called the null hypothesis (H0). The competing claim, which is supported when the null hypothesis is rejected, is called the alternative hypothesis (H1).

Example: Using the regression calculation above, suppose you want to test whether mileage affects the maintenance costs of a car. The null hypothesis is that mileage has no effect on maintenance costs (b = 0). The alternative hypothesis is that it does (b ≠ 0). You reject the null hypothesis if a regression fitted to observed data shows a slope significantly different from zero. That result would mean mileage influences car maintenance costs.


UNDERSTANDING HYPOTHESIS TESTING

Hypothesis testing involves formulating assumptions about population parameters based on sample statistics and rigorously evaluating these assumptions against empirical evidence. This article sheds light on the significance of hypothesis testing and the critical steps involved in the process.

Hypothesis testing is a statistical method used to make decisions from experimental data. It starts from an assumption that we make about a population parameter. It then evaluates two mutually exclusive statements about the population to determine which statement is better supported by the sample data.
Defining the Hypotheses

● Null hypothesis (H0): In statistics, the null hypothesis is a general statement or default position that there is no relationship between two measured cases or among groups. In other words, it is a basic assumption made based on knowledge of the problem. Example: A company's mean production is 50 units per day, i.e. H0: μ = 50.

● Alternative hypothesis (H1): The alternative hypothesis is the hypothesis used in hypothesis testing that is contrary to the null hypothesis. Example: A company's mean production is not equal to 50 units per day, i.e. H1: μ ≠ 50.

Key Terms of Hypothesis Testing

● Level of significance: This is the threshold at which we reject the null hypothesis. Because 100% certainty is not possible, we choose a level of significance, denoted by α. It is usually 0.05 (5%). α is the probability of rejecting the null hypothesis when it is actually true, so α = 0.05 means accepting a 5% risk of that error.
● P-value: The p-value, or calculated probability, is the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis (H0) is true. If your p-value is less than the chosen significance level, you reject the null hypothesis, i.e. the sample supports the alternative hypothesis.
● Test statistic: The test statistic is a numerical value calculated from sample data during a hypothesis test. It is used to determine whether to reject the null hypothesis. It is compared with a critical value, or converted into a p-value, to decide whether the observed results are statistically significant.
● Critical value: In statistics, the critical value is a threshold or cutoff point used to determine whether to reject or accept the null hypothesis in a hypothesis test.
● Degrees of freedom: Degrees of freedom are associated with the variability, or freedom, one has in estimating a parameter. They are related to the sample size and determine the shape of distributions such as the t and chi-square distributions.

Why Use Hypothesis Testing

Hypothesis testing is an important procedure in statistics. It evaluates two mutually exclusive statements about a population to determine which statement is better supported by the sample data. It helps us determine whether a finding is statistically significant.
One-Tailed and Two-Tailed Tests

A one-tailed test focuses on one direction: either greater than (>) or less than (<) a specified value. We use a one-tailed test when there is a clear directional expectation based on prior knowledge or theory. The critical region is located on only one side of the distribution curve. If the sample falls into this critical region, the null hypothesis is rejected in favor of the alternative hypothesis.
One-Tailed Test

There are two types of one-tailed tests:

Left-Tailed (Left-Sided) Test: The alternative hypothesis asserts that the true parameter value is less than the value stated in the null hypothesis. Example: H0: μ ≥ 50 and H1: μ < 50

Right-Tailed (Right-Sided) Test: The alternative hypothesis asserts that the true parameter value is greater than the value stated in the null hypothesis. Example: H0: μ ≤ 50 and H1: μ > 50

Two-Tailed Test

A two-tailed test considers both directions, greater than and less than a specified value. We use a two-tailed test when there is no specific directional expectation and we want to detect any significant difference.

Example: H0: μ = 50 and H1: μ ≠ 50
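The sketch below shows how the choice of tail changes the rejection region. It computes critical values and p-values for a test statistic that follows the standard normal distribution (a z-test), using `statistics.NormalDist` from the Python standard library. The `p_value` helper is hypothetical, written only for this illustration. A t-test would use the t-distribution instead.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # standard normal distribution: mean 0, standard deviation 1

# Critical values for each kind of test at significance level alpha.
left_critical = std_normal.inv_cdf(alpha)                # reject H0 if z < left_critical
right_critical = std_normal.inv_cdf(1 - alpha)           # reject H0 if z > right_critical
two_tailed_critical = std_normal.inv_cdf(1 - alpha / 2)  # reject H0 if |z| > two_tailed_critical

print(f"Left-tailed:  reject H0 if z < {left_critical:.3f}")
print(f"Right-tailed: reject H0 if z > {right_critical:.3f}")
print(f"Two-tailed:   reject H0 if |z| > {two_tailed_critical:.3f}")


def p_value(z, tail):
    """Hypothetical helper: p-value of a z statistic for a 'left', 'right', or 'two' tailed test."""
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "right":
        return 1 - std_normal.cdf(z)
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")


# For a positive statistic, the right-tailed p-value is half the two-tailed one.
z = 2.04
print(f"z = {z}: right-tailed p = {p_value(z, 'right'):.4f}, two-tailed p = {p_value(z, 'two'):.4f}")
```

The two-tailed test splits α between both tails. This is why its critical value (about 1.96) is larger than the one-tailed value (about 1.645).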

Type I and Type II Errors in Hypothesis Testing

These errors are associated with the decisions made regarding the null hypothesis and the alternative hypothesis.
Type I error: Rejecting the null hypothesis when it is actually true. The probability of a Type I error is denoted by alpha (α).
Type II error: Accepting (failing to reject) the null hypothesis when it is actually false. The probability of a Type II error is denoted by beta (β).

Decision | Null Hypothesis is True | Null Hypothesis is False
Accept the null hypothesis | Correct Decision | Type II Error (False Negative)
Reject the null hypothesis (alternative hypothesis accepted) | Type I Error (False Positive) | Correct Decision
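One way to see what α means is to simulate many samples from a population where the null hypothesis is actually true. We then count how often the test wrongly rejects it. The sketch below assumes a hypothetical normal population with μ = 50 and a known σ = 5, and runs a two-tailed z-test at α = 0.05. The observed Type I error rate should come out close to 0.05. The population values, sample size, and number of trials are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist

random.seed(42)  # fixed seed so the simulation is reproducible

alpha = 0.05
mu0, sigma, n = 50, 5, 30  # hypothetical population in which H0 (mu = 50) is TRUE
critical = NormalDist().inv_cdf(1 - alpha / 2)
trials = 10_000

false_rejections = 0
for _ in range(trials):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    sample_mean = sum(sample) / n
    z = (sample_mean - mu0) / (sigma / n ** 0.5)  # two-tailed z-test of H0: mu = 50
    if abs(z) > critical:
        false_rejections += 1  # rejecting a true null hypothesis is a Type I error

type_i_rate = false_rejections / trials
print(f"Observed Type I error rate: {type_i_rate:.3f} (alpha = {alpha})")
```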

How Hypothesis Testing Works

Step 1: Define the Null and Alternative Hypotheses

State the null hypothesis (H0), representing no effect, and the alternative
hypothesis (H1), suggesting an effect or difference.

Step 2: Choose a Significance Level

Select a significance level (α), typically 0.05, to determine the threshold for rejecting the null hypothesis. It sets how strong the evidence must be before we reject the null hypothesis. It is also the risk we accept of rejecting a null hypothesis that is actually true.

Step 3: Collect and Analyze Data
Step 4: Calculate the Test Statistic

There are various hypothesis tests, each suited to a different goal and type of data. The test could be a Z-test, chi-square test, T-test, and so on.

1. Z-test: If the population standard deviation is known, the Z-statistic is commonly used.

2. T-test: If the population standard deviation is unknown and the sample size is small, the t-statistic is more appropriate.

3. Chi-square test: The chi-square test is used for categorical data or for testing independence in contingency tables.

4. F-test: The F-test is often used in analysis of variance (ANOVA) to compare variances or test the equality of means across multiple groups.

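Step 5: Compare the Test Statistic

Method 1: Using Critical Values

Compare the test statistic with the tabulated critical value. The rules below are for a right-tailed test. For a two-tailed test, compare the absolute value of the test statistic. For a left-tailed test, reverse the comparison.

● If Test Statistic > Critical Value: Reject the null hypothesis.

● If Test Statistic < Critical Value: Do not reject the null hypothesis.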

Method 2: Using P-values

● If the p-value is less than or equal to the significance level, i.e. (p ≤ α), you reject the null hypothesis. This indicates that the observed results are unlikely to have occurred by chance alone, providing evidence in favor of the alternative hypothesis.

● If the p-value is greater than the significance level, i.e. (p > α), you do not reject the null hypothesis. This suggests that the observed results are consistent with what would be expected under the null hypothesis.
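For the same test, the two methods in Step 5 should lead to the same decision. Here is a minimal sketch for a two-tailed z-test, using `statistics.NormalDist` from the Python standard library. The `decide` helper and the example statistics are hypothetical.

```python
from statistics import NormalDist


def decide(z, alpha=0.05):
    """Hypothetical helper: two-tailed z-test decision by both methods from Step 5."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha / 2)
    p = 2 * (1 - std_normal.cdf(abs(z)))

    reject_by_critical_value = abs(z) > critical  # Method 1
    reject_by_p_value = p <= alpha                # Method 2
    return reject_by_critical_value, reject_by_p_value, p


for z in (1.2, 2.04, -3.1):
    by_critical, by_p, p = decide(z)
    verdict = "reject H0" if by_p else "do not reject H0"
    print(f"z = {z:+.2f}: p = {p:.4f} -> {verdict} (methods agree: {by_critical == by_p})")
```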

Step 6: Interpret the Results

We can conclude and interpret our result using either of the methods above.

Real-Life Examples of Hypothesis Testing

Let's examine hypothesis testing using two real-life situations.

Case A: Does a New Drug Affect Blood Pressure?

Imagine a pharmaceutical company has developed a new drug that they believe can effectively lower blood pressure in patients with hypertension. Before bringing the drug to market, they need to conduct a study to assess its impact on blood pressure.
Data:

Before Treatment: 120, 122, 118, 130, 125, 128, 115, 121, 123, 119

After Treatment: 115, 120, 112, 128, 122, 125, 110, 117, 119, 114

Solution:

Step 1: Define the Hypothesis

Null Hypothesis (H0): The new drug has no effect on blood pressure.

Alternate Hypothesis (H1): The new drug has an effect on blood pressure.

Step 2: Define the Significance level

Let's set the significance level at 0.05. This means we reject the null hypothesis if the evidence suggests there is less than a 5% chance of observing results like these due to random variation alone.

Step 3: Compute the test statistic

Using a paired T-test, analyze the data to obtain a test statistic and a p-value. The test statistic (e.g., T-statistic) is calculated based on the differences between blood pressure measurements before and after treatment.

t = m/(s/√n)

where:

m = mean of the differences, where each difference is d_i = X_after,i − X_before,i

s = standard deviation of the differences (d)

n = sample size

Here, m = −3.9, s ≈ 1.37 and n = 10, so the paired t-test formula gives t = −3.9 / (1.37 / √10) ≈ −9.

Step 4: Find the p-value

The calculated t-statistic is −9 with degrees of freedom df = n − 1 = 9. You can find the p-value using statistical software or a t-distribution table.

Thus, p-value ≈ 8.54 × 10⁻⁶.

Step 5: Result

If the p-value is less than or equal to 0.05, the researchers reject the null hypothesis.

If the p-value is greater than 0.05, they fail to reject the null hypothesis.

Conclusion: Since the p-value (≈ 8.54 × 10⁻⁶) is less than the significance level (0.05), the researchers reject the null hypothesis. There is statistically significant evidence that the average blood pressure before and after treatment with the new drug is different.
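The Case A calculation can be reproduced with the Python standard library, as a sketch. The standard library has no t-distribution. The two-tailed critical value for df = 9 at α = 0.05 (2.262) is therefore taken from a t table rather than computed. If SciPy is installed, `scipy.stats.ttest_rel(after, before)` returns the same t-statistic together with its p-value.

```python
import math
import statistics

before = [120, 122, 118, 130, 125, 128, 115, 121, 123, 119]
after = [115, 120, 112, 128, 122, 125, 110, 117, 119, 114]

# Paired differences d_i = after_i - before_i
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
m = statistics.mean(diffs)   # mean difference
s = statistics.stdev(diffs)  # sample standard deviation of the differences (n - 1)
t = m / (s / math.sqrt(n))
df = n - 1

# Two-tailed critical value of t for alpha = 0.05 and df = 9, read from a t table.
T_CRITICAL = 2.262

print(f"m = {m:.2f}, s = {s:.3f}, t = {t:.2f}, df = {df}")
print("Reject H0" if abs(t) > T_CRITICAL else "Fail to reject H0")
```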

Case B: Cholesterol Level in a Population

Data: A sample of 25 individuals is taken, and their cholesterol levels are measured.

Cholesterol Levels (mg/dL): 205, 198, 210, 190, 215, 205, 200, 192, 198, 205, 198, 202, 208, 200, 205, 198, 205, 210, 192, 205, 198, 205, 210, 192, 205.

Hypothesized Population Mean (μ0): 200 mg/dL

Population Standard Deviation (σ): 5 mg/dL (given for this problem)

Step 1: Define the Hypothesis

Null Hypothesis (H0): The average cholesterol level in the population is 200 mg/dL.

Alternate Hypothesis (H1): The average cholesterol level in the population is different from 200 mg/dL.

Step 2: Define the Significance level

As the direction of deviation is not given, we assume a two-tailed test. For a significance level of 0.05 (two-tailed), the critical values from the standard normal (z) table are approximately −1.96 and 1.96.

Step 3: Compute the test statistic

The sample mean is x̄ = 5051 / 25 = 202.04 mg/dL. The test statistic is calculated using the z formula Z = (x̄ − μ0) / (σ / √n) = (202.04 − 200) / (5 / √25) = 2.04 / 1, which gives Z = 2.04.

Step 4: Result

Since the absolute value of the test statistic (2.04) is greater than the critical value (1.96), we reject the null hypothesis. We conclude that there is statistically significant evidence that the average cholesterol level in the population is different from 200 mg/dL.
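Here is a sketch of the Case B z-test in Python, using the data and the known σ from the problem. `statistics.NormalDist` supplies the two-tailed critical value and the p-value, so no z-table is needed.

```python
import math
from statistics import NormalDist

levels = [205, 198, 210, 190, 215, 205, 200, 192, 198, 205, 198,
          202, 208, 200, 205, 198, 205, 210, 192, 205, 198, 205, 210, 192, 205]
mu0 = 200     # hypothesized population mean (mg/dL)
sigma = 5     # known population standard deviation (mg/dL)
alpha = 0.05

n = len(levels)
sample_mean = sum(levels) / n
z = (sample_mean - mu0) / (sigma / math.sqrt(n))

critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-tailed p-value

print(f"n = {n}, sample mean = {sample_mean:.2f}, z = {z:.2f}, p = {p_value:.4f}")
print("Reject H0" if abs(z) > critical else "Fail to reject H0")
```

Floating-point arithmetic gives z = 2.039999999999992 rather than exactly 2.04. Formatting the result to two decimal places hides that rounding noise.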

Limitations of Hypothesis Testing
Although a useful technique, hypothesis testing does not offer a comprehensive grasp of the topic being studied. It concentrates on specific hypotheses and statistical significance without fully reflecting the intricacy or whole context of the phenomena.

The accuracy of hypothesis testing results depends on the quality of the available data and the appropriateness of the statistical methods used. Inaccurate data or poorly formulated hypotheses can lead to incorrect conclusions.

Relying solely on hypothesis testing may cause analysts to overlook significant patterns or relationships in the data that are not captured by the specific hypotheses being tested. This limitation underscores the importance of complementing hypothesis testing with other analytical approaches.

In Conclusion…

Hypothesis testing stands as a cornerstone in statistical analysis, enabling data scientists to navigate uncertainties and draw credible inferences from sample data. By systematically defining null and alternative hypotheses, choosing significance levels, and leveraging statistical tests, researchers can assess the validity of their assumptions. The article also elucidates the critical distinction between Type I and Type II errors, providing a comprehensive understanding of the nuanced decision-making process inherent in hypothesis testing. The real-life example of testing a new drug's effect on blood pressure using a paired T-test showcases the practical application of these principles, underscoring the importance of statistical rigor in data-driven decision-making.

CHI-SQUARE TEST
The chi-square test is a statistical test used to determine whether there is a significant association between two categorical variables. It is a non-parametric test, meaning it does not assume that the data follow a particular distribution such as the normal distribution. The test compares observed and expected frequencies within a contingency table. The chi-square test also helps with feature selection problems by examining the relationships between variables. It determines whether the association between two categorical variables in the sample is likely to reflect a real association in the population. Its test statistic follows the chi-square distribution, which belongs to the family of continuous probability distributions.

The chi-square distribution is a continuous probability distribution that arises in statistics as the distribution of a sum of the squares of independent standard normal random variables. It is denoted χ² and is parameterized by its degrees of freedom k.

It is widely used in statistical analysis, particularly in hypothesis testing and in calculating confidence intervals. The chi-square test itself is often applied to data that are not normally distributed, such as counts of categorical data.

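Key Terms Used in the Chi-Square Test

● Degrees of freedom (df): For a test of independence, df = (r − 1)(c − 1), where r and c are the numbers of rows and columns in the contingency table.
● Observed values (O): The actual data collected.
● Expected values (E): The values predicted by a theoretical model in the chi-square test. For a contingency table, the expected count in row i and column j is E_ij = (Ri × Cj) / N
where,
● Ri: total of row i
● Cj: total of column j
● N: total number of observations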

Contingency table: A contingency table, also known as a cross-tabulation or two-way table, is a statistical table that displays the distribution of two categorical variables.

Types of Chi-Square Tests

There are several types of chi-square tests, each designed to address specific research questions or scenarios. The two main types are the chi-square test for independence and the chi-square goodness-of-fit test.

● Chi-Square Test for Independence: This test assesses whether there is a significant association or relationship between two categorical variables. It is used to determine whether changes in one variable are independent of changes in another. This test is applied when we have counts of values for two nominal or categorical variables. To conduct this test, two requirements must be met: independence of observations and a relatively large sample size.
For example, suppose we are interested in exploring whether there is a relationship between online shopping preferences and the payment methods people choose. The first variable is the type of online shopping preference (e.g., Electronics, Clothing, Books), and the second variable is the chosen payment method (e.g., Credit Card, Debit Card, PayPal).
The null hypothesis in this case would be that the choice of online shopping preference and the selected payment method are independent.
● Chi-Square Goodness-of-Fit Test: The chi-square goodness-of-fit test is used in statistical hypothesis testing to determine whether a variable is likely to come from a given distribution. This test can be applied when we have value counts for categorical variables. With the help of this test, we can determine whether the data values are a representative sample of the entire population or whether they fit our hypothesis well.
For example, imagine you are testing the fairness of a six-sided die. The null hypothesis is that each face of the die has an equal probability of landing face up. In other words, the die is unbiased, and the proportions of each number (1 through 6) occurring are expected to be equal.
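A goodness-of-fit check for the die example can be written in a few lines of Python. The roll counts below are hypothetical (60 rolls, made up for illustration). The critical value 11.070 for df = 5 at α = 0.05 is taken from a chi-square table. With SciPy, `scipy.stats.chisquare(observed)` would compute the same statistic and a p-value.

```python
# Hypothetical counts from 60 rolls of a die (faces 1 to 6), made up for illustration.
observed = [8, 12, 9, 11, 10, 10]
total = sum(observed)
expected = [total / 6] * 6  # a fair die: each face is expected equally often

# Chi-square statistic: sum of (O - E)^2 / E over all categories.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Critical value of chi-square for df = 5 at alpha = 0.05, read from a chi-square table.
CHI2_CRITICAL = 11.070

print(f"chi-square = {chi_square:.2f}, df = {df}")
if chi_square > CHI2_CRITICAL:
    print("Reject H0: the die looks biased")
else:
    print("Fail to reject H0: the counts are consistent with a fair die")
```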

Why We Use the Chi-Square Test
● The chi-square test is widely used across diverse fields to analyze categorical data, offering valuable insights into associations or differences between categories.
● Its primary application lies in testing the independence of two categorical variables, determining if changes in one variable relate to changes in another.
● It is particularly useful for understanding relationships between factors, such as gender and preferences or product categories and purchasing behaviors.
● Researchers appreciate its simplicity and ease of application to categorical data, making it a preferred choice for statistical analysis.
● The test provides insights into patterns and associations within categorical data, aiding in the interpretation of relationships.
● Its utility extends to various fields, including genetics, market research, quality control, and social sciences, showcasing its broad applicability.
● The chi-square test helps assess the conformity of observed data to expected values, enhancing its role in statistical analysis.

Steps to Perform a Chi-Square Test

1. Define the hypotheses:
● Null Hypothesis (H0): There is no significant association between the two categorical variables.
● Alternative Hypothesis (H1): There is a significant association between the two categorical variables.
2. Create a contingency table that displays the frequency distribution of the two categorical variables.
3. Find the expected values: for each cell, E = (row total × column total) / N.
4. Calculate the chi-square statistic: χ² = Σ (O − E)² / E, summed over all cells.
5. Find the degrees of freedom: df = (r − 1)(c − 1).
6. Accept or reject the null hypothesis: Compare the calculated chi-square statistic with the critical value from the chi-square distribution table for the chosen significance level (e.g., 0.05). If the statistic exceeds the critical value, reject the null hypothesis.
To Conclude…

The chi-square test stands as a versatile tool for exploring associations in categorical data, offering valuable insights into dependencies between variables. Whether applied for independence or goodness-of-fit, its significance resonates across genetics, market research, and social sciences. Feature selection using the chi-square test can also enhance model efficiency.

T-Test
The t-test is based on Student's t-distribution, developed by William Sealy Gosset, who published under the pen name "Student."

A t-test is a type of inferential statistical test used to determine whether there is a significant difference between the means of two groups. It is often used when the data are normally distributed and the population variance is unknown.

The t-test is used in hypothesis testing to assess whether the observed difference between the means of the two groups is statistically significant or just due to random variation.

Assumptions of the T-Test
● Independence: The observations within each group must be independent of each other. This means that the value of one observation should not influence the value of another observation. Violations of independence can occur with repeated measures, paired data, or clustered data.

● Normality: The data within each group should be approximately normally distributed, i.e. the distribution of the data within each group being compared should resemble a normal (bell-shaped) distribution. This assumption is crucial for small sample sizes (n < 30).

● Homogeneity of Variances (for independent samples t-test): The variances of the two groups being compared should be equal. This assumption ensures that the groups have a similar spread of values. Unequal variances can affect the standard error of the difference between means and, consequently, the t-statistic.

● Absence of Outliers: There should be no extreme outliers in the data, as outliers can disproportionately influence the results, especially when sample sizes are small.

Types of T-tests
There are three types of t-tests, and they are categorized as dependent and independent t-tests.

1. One-sample t-test: compares the mean of a single group against a known mean.
2. Two-sample t-test: It is further divided into two types:
- Independent samples t-test: compares the means of two separate groups.
- Paired samples t-test: compares means from the same group at different times (say, one year apart).

One sample T-test

The one-sample t-test is one of the most widely used t-tests. It compares the
sample mean of the data to a specific given value, typically the true/population
mean.

We can use this when the sample size is small (under 30), the data are collected
randomly, and the data are approximately normally distributed. The t-value is
calculated as:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

where:
t = t-value
x̄ = sample mean
μ = true/population mean
s = sample standard deviation (the population standard deviation σ is usually unknown)
n = sample size
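The formula above translates directly into code. This is a minimal sketch using only the standard library; `one_sample_t` is a hypothetical helper name and the package weights are invented for illustration. `statistics.stdev` divides by n − 1, so it gives the sample standard deviation s used in the formula. The function also returns the degrees of freedom (n − 1), which are needed to look up a p-value or critical value.

```python
import math
import statistics

def one_sample_t(sample, mu):
    """One-sample t-value: t = (x_bar - mu) / (s / sqrt(n))."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    t = (x_bar - mu) / (s / math.sqrt(n))
    return t, n - 1  # t-value and degrees of freedom

# Hypothetical package weights (grams) tested against a claimed mean of 500 g
weights = [498, 502, 495, 500, 497, 499, 494, 501]
t, df = one_sample_t(weights, mu=500)
print(f"t = {t:.3f}, df = {df}")
```

A negative t-value here simply means the sample mean lies below the hypothesized mean; whether the gap is significant depends on comparing t with the t-distribution for the given degrees of freedom.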

Independent sample T-test

An independent samples t-test, commonly known as an unpaired t-test, is used to
find out whether the difference found between the means of two groups is actually
significant or just a random occurrence.

We can use this when:

➔ the population mean or standard deviation is unknown (i.e., information about
the population is unknown).
➔ the two samples are separate/independent, e.g., boys and girls (the two
groups are independent of each other).
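One common form of the independent samples test (Student's t-test) assumes equal variances and pools the two sample variances, which matches the homogeneity-of-variances assumption. The sketch below is an illustration under that assumption only; `independent_t` is a hypothetical helper and the boys' and girls' scores are invented.

```python
import math
import statistics

def independent_t(a, b):
    """Student's two-sample t-value assuming equal population variances."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    # Pooled variance: average of the two sample variances weighted by their degrees of freedom
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error of the difference in means
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, n1 + n2 - 2  # t-value and degrees of freedom

# Hypothetical test scores for two independent groups
boys = [72, 75, 68, 80, 77, 74]
girls = [78, 82, 79, 85, 80, 76]
t, df = independent_t(boys, girls)
print(f"t = {t:.3f}, df = {df}")
```

If the variances look clearly unequal, Welch's t-test, which does not pool the variances, is the usual alternative.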

Paired Two-sample T-test

The paired sample t-test, commonly known as the dependent sample t-test, is used
to find out whether the mean of the differences between two related samples is 0.
The test is done on dependent samples, usually focusing on a particular group of
people or things. In this test, each entity is measured twice, resulting in a pair
of observations.

We can use this when:

➔ The two samples are related (matched pairs), e.g., scores obtained by the same
students in English and in Math.
➔ The dependent variable (data) is continuous.
➔ The pairs of observations are independent of one another.
➔ The differences between the paired observations are approximately normally
distributed.
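Because each entity is measured twice, the paired test reduces to a one-sample t-test on the differences against a hypothesized mean of 0. A minimal sketch under that framing follows; `paired_t` is a hypothetical helper, the English and Math scores are invented, and differences are taken as second minus first.

```python
import math
import statistics

def paired_t(first, second):
    """Paired t-value: a one-sample t-test of the differences (second - first) against 0."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [y - x for x, y in zip(first, second)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # sample standard deviation of the differences
    t = d_bar / (s_d / math.sqrt(n))
    return t, n - 1  # t-value and degrees of freedom

# Hypothetical scores of the same six students in two subjects
english = [65, 70, 58, 80, 74, 69]
math_scores = [70, 72, 60, 85, 79, 70]
t, df = paired_t(english, math_scores)
print(f"t = {t:.3f}, df = {df}")
```

Working on the differences is what removes the between-student variation, which is why a paired design can detect a smaller effect than an independent samples design with the same number of observations.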

To conclude…
T-tests play a crucial role in hypothesis testing, comparing means, and drawing
conclusions about populations. The test can be one-sample, independent
two-sample, or paired two-sample, each with specific use cases and
assumptions. Interpretation of results involves considering t-values, p-values,
and critical values.

These tests aid researchers in making informed decisions based on statistical
evidence.
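Interpreting a result means comparing the t-value with a critical value, or equivalently comparing the p-value with a significance level α. The sketch below computes both from Student's t-distribution using only the standard library: the two-sided p-value by Simpson's-rule integration of the t density, and the critical value by bisection. These numerical methods are assumptions made to keep the example dependency-free; in practice a statistics library (for example SciPy's `scipy.stats.t`) would normally be used. The helper names, the t-value of -1.758 with 7 degrees of freedom, and α = 0.05 are illustrative only.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value P(|T| >= |t|) = 1 - 2 * (area under the pdf from 0 to |t|).

    The area is approximated with Simpson's rule; steps must be even.
    """
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2  # Simpson weights: 1, 4, 2, 4, ..., 4, 1
        total += weight * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)  # clamp tiny negative rounding errors

def critical_value(df, alpha=0.05):
    """Two-sided critical value t* with two_sided_p(t*, df) == alpha, found by bisection."""
    lo, hi = 0.0, 100.0  # assumes t* < 100, which holds for alpha >= 0.01 at any df >= 1
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_sided_p(mid, df) > alpha:
            lo = mid  # p-value still too large: t* lies further out
        else:
            hi = mid
    return (lo + hi) / 2

def interpret(t, df, alpha=0.05):
    """Return (p-value, critical value, whether to reject H0 at level alpha)."""
    p = two_sided_p(t, df)
    t_crit = critical_value(df, alpha)
    reject = p < alpha  # equivalent to abs(t) > t_crit
    return p, t_crit, reject

p, t_crit, reject = interpret(-1.758, df=7)
print(f"p = {p:.3f}, critical value = {t_crit:.3f}, reject H0: {reject}")
```

Here |t| is below the critical value, so the p-value is above 0.05 and the null hypothesis is not rejected at that level.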

EA
H
IT
W
RN
A
LE

Prepared by
Jahbuikem Anderson

As researched from GeeksforGeeks, Simplilearn, Statology, and Indeed.
