By-Ritisnata Nayak

Checking the normality of a dataset

Checking the normality of a dataset is important when you're performing statistical analyses that assume normally distributed data, such as parametric tests (e.g., t-tests, ANOVA, linear regression). For ANOVA and linear regression, the assumption applies to the residuals (or to the data within each group) rather than to the pooled raw data. There are several methods to check for normality, ranging from visual inspection to formal statistical tests.
Here’s how you can check for normality:
1. Visual Methods:
a) Histogram:
- A histogram is a simple way to visualize the distribution of your data. If your data is normally distributed, the histogram should be roughly bell-shaped and symmetric.
b) Q-Q Plot (Quantile-Quantile Plot):
- A Q-Q plot compares the quantiles of your data against the quantiles of a normal distribution. If your data is normally distributed, the points on the plot should fall roughly along a straight line.
c) Box Plot:
- Although a box plot doesn’t directly show normality, it can help identify outliers and asymmetry. Normally distributed data will have a roughly symmetric box plot without extreme outliers.
d) Density Plot:
- A kernel density estimate (KDE) plot provides a smoothed version of the histogram. You can compare it to a normal curve with the same mean and standard deviation to visually inspect normality.
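Plots need a human eye, but the quantities behind two of them can also be computed. The sketch below, assuming NumPy and SciPy, reports the correlation of the normal Q-Q plot (the `r` returned by `stats.probplot`, close to 1 for normal data) and flags box-plot outliers with Tukey's 1.5 × IQR fences, the usual rule for box-plot whiskers. The helper names `qq_correlation` and `tukey_outliers` are hypothetical, and the data are simulated.

```python
import numpy as np
import scipy.stats as stats


def qq_correlation(x):
    """Correlation between ordered data and normal quantiles in a Q-Q plot."""
    # probplot returns ((theoretical_quantiles, ordered_values), (slope, intercept, r))
    (_, _), (_, _, r) = stats.probplot(x, dist="norm")
    return r


def tukey_outliers(x, k=1.5):
    """Values outside the box-plot whiskers [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return x[(x < lower) | (x > upper)]


rng = np.random.default_rng(0)
normal_data = rng.normal(loc=0, scale=1, size=500)
skewed_data = rng.exponential(scale=1, size=500)  # right-skewed comparison

print(f"Q-Q r (normal): {qq_correlation(normal_data):.4f}")
print(f"Q-Q r (skewed): {qq_correlation(skewed_data):.4f}")
print(f"Outliers (normal): {tukey_outliers(normal_data).size}")
print(f"Outliers (skewed): {tukey_outliers(skewed_data).size}")
```

Neither number is a formal test; they are quick cues for when the plots deserve a closer look.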
2. Statistical Tests:
a) Shapiro-Wilk Test:
- One of the most popular and powerful tests for normality, especially for small to moderate samples.
- Null hypothesis (H0): The data is normally distributed.
- If the p-value is less than 0.05, the null hypothesis is rejected, indicating that the data is not normally distributed. A p-value above 0.05 only means that no significant departure was detected; it does not prove normality.
b) Kolmogorov-Smirnov Test:
- Compares the empirical distribution function of your data with the cumulative distribution function of a normal distribution. It is generally less powerful than Shapiro-Wilk, and when the mean and standard deviation are estimated from the same data, its standard p-values are too large unless a correction such as the Lilliefors test is used.
c) Anderson-Darling Test:
- Like K-S, it is based on the empirical distribution function, but it gives more weight to the tails of the distribution, which generally makes it more powerful than the K-S test.
d) D'Agostino's K-squared Test:
- This test combines the sample skewness and kurtosis into a single statistic to check for departures from normality.
e) Jarque-Bera Test:
- A goodness-of-fit test that checks whether the sample skewness and kurtosis match those of a normal distribution (skewness 0, kurtosis 3). It is mainly intended for large samples.
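Both the D'Agostino and Jarque-Bera statistics are built from sample skewness and kurtosis. The sketch below, assuming SciPy, rebuilds each one by hand: K² is the sum of the squared z-scores from `stats.skewtest` and `stats.kurtosistest`, and JB = n/6 · (S² + (K - 3)²/4), where K - 3 is the excess kurtosis. Both are compared with a chi-squared distribution with 2 degrees of freedom. The helper names are hypothetical, and the data are simulated.

```python
import numpy as np
import scipy.stats as stats


def dagostino_k2(x):
    """D'Agostino's K^2: squared skewness z-score plus squared kurtosis z-score."""
    z_skew = stats.skewtest(x).statistic
    z_kurt = stats.kurtosistest(x).statistic
    k2 = z_skew**2 + z_kurt**2
    return k2, stats.chi2.sf(k2, df=2)


def jarque_bera(x):
    """Jarque-Bera: n/6 * (S^2 + (excess kurtosis)^2 / 4)."""
    n = len(x)
    s = stats.skew(x)        # biased sample skewness
    ek = stats.kurtosis(x)   # biased excess kurtosis (Pearson kurtosis - 3)
    jb = n / 6 * (s**2 + ek**2 / 4)
    return jb, stats.chi2.sf(jb, df=2)


rng = np.random.default_rng(1)
x = rng.normal(size=300)

k2, p_k2 = dagostino_k2(x)
jb, p_jb = jarque_bera(x)
print(f"K^2 = {k2:.3f}, p = {p_k2:.3f}")
print(f"JB  = {jb:.3f}, p = {p_jb:.3f}")
```

The hand-built values should match SciPy's own `stats.normaltest` and `stats.jarque_bera`.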
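3. Skewness and Kurtosis:
- Skewness measures the asymmetry of the data. For a normal distribution, skewness should be close to 0.
- Kurtosis measures the "tailedness" of the distribution. For a normal distribution, kurtosis should be close to 3. Many programs, including SciPy's stats.kurtosis and SAS, report excess kurtosis (kurtosis minus 3) instead, which is close to 0 for normal data.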
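Because tools disagree on which kurtosis they report, it helps to see both conventions side by side. In SciPy, `stats.kurtosis(x, fisher=False)` returns Pearson kurtosis (about 3 for normal data) and the default `fisher=True` returns excess kurtosis (about 0); the two differ by exactly 3. The sample below is simulated and only illustrative.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(7)
x = rng.normal(loc=10, scale=2, size=5000)

skewness = stats.skew(x)
pearson_kurt = stats.kurtosis(x, fisher=False)  # normal reference value: 3
excess_kurt = stats.kurtosis(x)                 # normal reference value: 0

print(f"Skewness:         {skewness:.3f}")
print(f"Pearson kurtosis: {pearson_kurt:.3f}")
print(f"Excess kurtosis:  {excess_kurt:.3f}")
```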
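4. Transformations (If Data is Not Normally Distributed):
If the data is not normally distributed, transformations such as logarithmic, square root, or Box-Cox transformations can sometimes bring the data closer to normal, particularly when it is right-skewed. The logarithm and the Box-Cox transformation require strictly positive values.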
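As a sketch of how a transformation can help, the example below simulates right-skewed, strictly positive data (a lognormal sample, an assumption made for illustration), then applies a log transformation and a Box-Cox transformation with SciPy's `stats.boxcox`, which also estimates the power parameter λ by maximum likelihood. Skewness and the Shapiro-Wilk p-value are compared before and after.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(3)
raw = rng.lognormal(mean=0.0, sigma=0.8, size=200)  # positive, right-skewed

log_data = np.log(raw)                # requires raw > 0
boxcox_data, lam = stats.boxcox(raw)  # also requires raw > 0; lam is estimated

for name, values in [("raw", raw), ("log", log_data), ("Box-Cox", boxcox_data)]:
    p = stats.shapiro(values).pvalue
    print(f"{name:8s} skewness = {stats.skew(values):6.3f}, Shapiro-Wilk p = {p:.4f}")
print(f"Estimated Box-Cox lambda: {lam:.3f}")
```

A transformation changes the scale of interpretation, so any results are then about the transformed variable (for example, log body weight).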
How to Perform in Python:
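import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

# Sample data: 1,000 draws from a standard normal distribution (seeded for reproducibility)
rng = np.random.default_rng(42)
data = rng.normal(loc=0, scale=1, size=1000)

# Histogram
plt.hist(data, bins=30, edgecolor='k')
plt.title('Histogram')
plt.show()

# Q-Q plot against the normal distribution
stats.probplot(data, dist="norm", plot=plt)
plt.title('Q-Q Plot')
plt.show()

# Shapiro-Wilk Test
shapiro_test = stats.shapiro(data)
print(f"Shapiro-Wilk Test p-value: {shapiro_test.pvalue}")

# Kolmogorov-Smirnov Test against a normal with the sample mean and SD.
# Without args, kstest(data, 'norm') compares with N(0, 1) only. Because the
# parameters are estimated from the data, this p-value is conservative
# (a Lilliefors-corrected test is more accurate).
ks_test = stats.kstest(data, 'norm', args=(data.mean(), data.std(ddof=1)))
print(f"Kolmogorov-Smirnov Test p-value: {ks_test.pvalue}")

# Anderson-Darling Test: reports a statistic and critical values, not a p-value;
# normality is rejected at a level when the statistic exceeds its critical value
anderson_test = stats.anderson(data, dist='norm')
print(anderson_test)

# D'Agostino's K-squared Test
dagostino_test = stats.normaltest(data)
print(f"D'Agostino Test p-value: {dagostino_test.pvalue}")

# Jarque-Bera Test
jb_test = stats.jarque_bera(data)
print(f"Jarque-Bera Test p-value: {jb_test.pvalue}")

# Skewness and Kurtosis (stats.kurtosis returns excess kurtosis: about 0 for normal data)
print(f"Skewness: {stats.skew(data)}")
print(f"Excess kurtosis: {stats.kurtosis(data)}")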
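`print(anderson_test)` shows a statistic and a table of critical values but no p-value, which is easy to misread. The sketch below, assuming the `statistic`, `critical_values`, and `significance_level` (in percent) fields of SciPy's `stats.anderson` result, reports for each level whether normality is rejected, i.e. whether the statistic exceeds that level's critical value. The helper name `anderson_decisions` is hypothetical, and the skewed data are simulated.

```python
import numpy as np
import scipy.stats as stats


def anderson_decisions(x):
    """Map each significance level (%) to True if normality is rejected."""
    result = stats.anderson(x, dist='norm')
    return {
        float(level): bool(result.statistic > crit)
        for level, crit in zip(result.significance_level, result.critical_values)
    }


rng = np.random.default_rng(5)
skewed = rng.exponential(scale=1.0, size=300)

for level, reject in anderson_decisions(skewed).items():
    print(f"{level:4.1f}% level: {'reject' if reject else 'do not reject'} normality")
```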

How to Perform in R:
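# Sample data (seeded for reproducibility)
set.seed(42)
data <- rnorm(1000) # Normal distribution

# Histogram
hist(data, breaks=30, col="lightblue", main="Histogram")

# Q-Q plot
qqnorm(data)
qqline(data, col = "red")

# Shapiro-Wilk Test (R's shapiro.test accepts 3 to 5000 observations)
shapiro.test(data)

# Kolmogorov-Smirnov Test against a normal with the sample mean and SD.
# Because the parameters are estimated from the data, this p-value is
# conservative; nortest::lillie.test(data) applies the Lilliefors correction.
ks.test(data, "pnorm", mean=mean(data), sd=sd(data))

# Anderson-Darling Test (using 'nortest' package)
library(nortest)
ad.test(data)

# Skewness and Kurtosis (using 'moments' package)
# moments::kurtosis() returns Pearson kurtosis, so expect about 3 for normal data
library(moments)
skewness(data)
kurtosis(data)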

How to Perform in SAS:

In SAS, you can check for normality using a combination of visual plots and statistical tests like the
Shapiro-Wilk, Kolmogorov-Smirnov, Cramer-von Mises, and Anderson-Darling tests.
Here’s how you can perform normality checks in SAS:
1. Visual Methods:
You can use histograms and Q-Q plots to visually inspect the distribution of your data.
a) Histogram and Q-Q Plot:
proc univariate data=your_dataset;
var your_variable;
histogram / normal;
qqplot / normal(mu=est sigma=est);
run;
- The histogram / normal; statement generates a histogram with a fitted normal distribution curve.
- The qqplot / normal(mu=est sigma=est); statement generates a Q-Q plot with a normal reference line whose mean and standard deviation are estimated from the data.
2. Statistical Tests for Normality:
Adding the NORMAL option to the PROC UNIVARIATE statement runs the Shapiro-Wilk test and other formal normality tests.
proc univariate data=your_dataset normal;
var your_variable;
histogram / normal;
probplot / normal(mu=est sigma=est);
run;
- This code will generate the following:
o Shapiro-Wilk test (computed only for sample sizes ≤ 2000).
o Kolmogorov-Smirnov, Cramer-von Mises, and Anderson-Darling tests (reported for all sample sizes).
o A normal probability plot of the variable; like a Q-Q plot, it plots the ordered values against normal percentiles (a P-P plot would require the PPPLOT statement instead).
The output will provide p-values for these tests. If the p-value is less than 0.05, you reject the null hypothesis, which is evidence that the data is not normally distributed.
Explanation of Key Tests:
- Shapiro-Wilk Test: The most commonly used normality test; it has good power, especially in small to moderate samples.
- Kolmogorov-Smirnov Test: Compares the empirical distribution function of the sample with the cumulative distribution function of the specified distribution (here, normal), using the largest vertical distance between the two.
- Anderson-Darling Test: A more sensitive test than the K-S test, giving more weight to the tails of the distribution.
- Cramer-von Mises Test: Like Anderson-Darling, it accumulates squared differences between the empirical and normal distribution functions, but it weights all parts of the distribution equally instead of emphasizing the tails.
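To make the K-S description concrete, the sketch below computes the statistic by hand as the largest vertical gap between the sample's empirical distribution function and a normal CDF with the sample mean and standard deviation, and checks it against SciPy's `stats.kstest`. It also shows a common pitfall: `kstest(x, 'norm')` without `args` compares the data with a standard normal N(0, 1), so normal data on another scale is rejected. The weight-like values and the helper name `ks_statistic` are hypothetical.

```python
import numpy as np
import scipy.stats as stats


def ks_statistic(x, mu, sigma):
    """Largest gap between the empirical CDF of x and the N(mu, sigma) CDF."""
    x = np.sort(np.asarray(x))
    n = x.size
    cdf = stats.norm.cdf(x, loc=mu, scale=sigma)
    ecdf_after = np.arange(1, n + 1) / n   # ECDF just after each point
    ecdf_before = np.arange(0, n) / n      # ECDF just before each point
    return max(np.max(ecdf_after - cdf), np.max(cdf - ecdf_before))


rng = np.random.default_rng(11)
weights = rng.normal(loc=250, scale=40, size=400)  # hypothetical values
mu, sigma = weights.mean(), weights.std(ddof=1)

d_manual = ks_statistic(weights, mu, sigma)
fitted = stats.kstest(weights, 'norm', args=(mu, sigma))
standard = stats.kstest(weights, 'norm')  # pitfall: compares with N(0, 1)

print(f"Manual D = {d_manual:.4f}, SciPy D = {fitted.statistic:.4f}")
print(f"p-value, fitted normal:   {fitted.pvalue:.4f}")
print(f"p-value, standard normal: {standard.pvalue:.3g}")
```

Because μ and σ were estimated from the same data, the fitted p-value is conservative; the Lilliefors variant of the test corrects for that.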
Example:
Assume your dataset is named fish_data and you want to check the normality of a variable
body_weight.
proc univariate data=fish_data normal;
var body_weight;
histogram / normal;
qqplot / normal(mu=est sigma=est);
run;

3. Skewness and Kurtosis:

To check skewness and kurtosis in PROC UNIVARIATE:
proc univariate data=fish_data;
var body_weight;
output out=stats skewness=sk kurtosis=ku;
run;

This code will output the skewness and kurtosis into the stats dataset, where:
- Skewness (sk): Measures the symmetry of the data.
- Kurtosis (ku): Measures the "tailedness" of the distribution.
A skewness value near 0 indicates a symmetric distribution. SAS reports excess kurtosis (kurtosis minus 3), so a kurtosis value near 0, not 3, is consistent with a normal distribution.
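SAS's skewness and kurtosis are sample-size-adjusted estimates, so they differ slightly from SciPy's defaults. As a sketch, assuming SAS's default VARDEF=DF formulas, the code below computes them directly from standardized values and shows that they agree with `stats.skew(x, bias=False)` and `stats.kurtosis(x, bias=False)`. The function name `sas_style_moments` and the gamma-distributed data are hypothetical.

```python
import numpy as np
import scipy.stats as stats


def sas_style_moments(x):
    """Adjusted skewness and excess kurtosis (assumed SAS VARDEF=DF formulas)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    z = (x - x.mean()) / x.std(ddof=1)   # standardize with the n-1 SD
    skew = n / ((n - 1) * (n - 2)) * np.sum(z**3)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * np.sum(z**4)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return skew, kurt


rng = np.random.default_rng(2)
body_weight = rng.gamma(shape=4.0, scale=50.0, size=60)  # hypothetical data

sk, ku = sas_style_moments(body_weight)
print(f"Skewness (sk): {sk:.4f}")
print(f"Kurtosis (ku): {ku:.4f}")  # excess kurtosis: near 0 for normal data
```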
4. Checking Normality in a Report:
PROC MEANS or PROC SUMMARY can add skewness and kurtosis (the SKEWNESS and KURTOSIS statistic keywords) to a summary report, but they do not run formal normality tests; PROC UNIVARIATE provides the detailed normality output.

5. Transformations (If Data is Not Normally Distributed):

If your data is not normally distributed, you can try transforming it:
Example: Log Transformation:
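data transformed_data;
set fish_data;
/* LOG() is defined only for positive values; other values are left missing */
if body_weight > 0 then log_body_weight = log(body_weight);
run;

proc univariate data=transformed_data normal;
var log_body_weight;
histogram / normal;
qqplot / normal(mu=est sigma=est);
run;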

Summary of Steps in SAS:

1. Run PROC UNIVARIATE with the normal option to check normality using visual plots and
statistical tests.
2. Use histograms and Q-Q plots for visual inspection.
3. Look at p-values from tests like Shapiro-Wilk to formally check normality.
4. Skewness and kurtosis can also provide insights into normality.
5. Apply transformations like log, square root, or Box-Cox if the data is not normally
distributed.
This method provides a comprehensive approach to normality testing in SAS.
Guidelines for Choosing a Method:

- Small datasets (n < 50): Prefer the Shapiro-Wilk test.
- Large datasets (n > 2000): Visual methods such as Q-Q plots are generally more informative, as most tests will flag even minor deviations from normality as significant.
- Moderate datasets: Combine visual inspection with a test like the Shapiro-Wilk or D'Agostino's test.
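The warning about large samples is easy to demonstrate. The sketch below, assuming SciPy, draws from a Student t distribution with 10 degrees of freedom (an illustrative choice: symmetric and close to normal, with excess kurtosis 1) and runs D'Agostino's test on a small and a very large sample. With 100,000 points the small departure from normality is highly significant; with 30 points the same distribution often is not.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(9)
population = stats.t(df=10)  # skewness 0, excess kurtosis 6 / (10 - 4) = 1.0

p_values = {}
for n in (30, 100_000):
    sample = population.rvs(size=n, random_state=rng)
    p_values[n] = stats.normaltest(sample).pvalue
    print(f"n = {n:>7,}: D'Agostino p-value = {p_values[n]:.3g}")
```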
Conclusion:
- Use visual methods for an intuitive understanding of the data's distribution.
- Use statistical tests for formal checks, but be cautious in large samples, as even minor deviations from normality can lead to rejection of the null hypothesis.
