Tips For Recognizing and Transforming Non-Normal Data
Peter J. Sherman
Six Sigma professionals should be familiar with normally distributed processes: the characteristic bell-shaped curve that is symmetrical about the mean, with tails approaching plus
and minus infinity (Figure 1).
When data fits a normal distribution, practitioners can make statements about the population using common analytical techniques, including control charts and capability indices
(such as sigma level, Cp, Cpk, defects per million opportunities and so on).
But what happens when a business process is not normally distributed? How do practitioners know the data is not normal? How should this type of data be treated? Practitioners can benefit from an overview of normal and non-normal distributions, from some simple tools for detecting non-normality, and from techniques for accurately determining whether a process is in control and capable. Several signs indicate that process data may not be normally distributed:
1. The histogram does not look bell shaped. Instead, it is skewed positively or negatively (Figure 2).
2. A natural process limit exists. Zero is often the natural process limit when describing cycle times and lead times. For example, when a restaurant promises to deliver a pizza
in 30 minutes or less, zero minutes is the natural lower limit.
3. A time series plot shows large shifts in data.
4. There is known seasonal process data.
5. Process data fluctuates (e.g., because the product mix changes).
Transactional processes and most metrics that involve time measurements, such as cycle times, lead times and waiting times, tend to follow non-normal distributions.
Consider, for example, emergency room (ER) waiting times (the histogram appears in Figure 3). There are a few ways to tell the data may not be normal. First, the histogram is skewed to the right (positively). Second, the control chart shows a lower control limit that is less than the natural limit of zero. Third, there are a number of high points and no real low points. These tell-tale signs indicate the data may not be sufficiently normal for an individuals control chart. When control charts are used with non-normal data, they can give false special-cause signals. Therefore, the data must be transformed to follow the normal distribution; once this is done, standard control chart calculations can be applied to the transformed data.
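The following minimal sketch (in Python with NumPy, which the article itself does not use; the figures appear to come from a statistical software package) illustrates the problem. It computes individuals (I-chart) control limits using the standard mean ± 2.66 × average moving range formula on simulated right-skewed waiting times; the data and variable names are illustrative only.

```python
# Minimal sketch: individuals (I-chart) limits on right-skewed data.
import numpy as np

# Simulated, right-skewed "waiting times" in minutes (illustrative only)
rng = np.random.default_rng(1)
waiting_times = rng.lognormal(mean=3.0, sigma=0.6, size=100)

moving_ranges = np.abs(np.diff(waiting_times))
mr_bar = moving_ranges.mean()

center = waiting_times.mean()
ucl = center + 2.66 * mr_bar  # 2.66 = 3 / d2, with d2 = 1.128 for a moving range of 2
lcl = center - 2.66 * mr_bar

print(f"center = {center:.1f}, LCL = {lcl:.1f}, UCL = {ucl:.1f}")
# With skewed data the LCL typically falls below the natural limit of zero,
# and points pile up toward the UCL -- the tell-tale signs described above.
```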
Type A data – One way to properly analyze the data is to identify it with the appropriate distribution (e.g., lognormal, Weibull or exponential). Some common distributions, their data types and examples associated with them appear in Table 1; a sketch of fitting these candidate distributions follows the table.
Table 1.
Distribution | Data type | Typical examples
Normal | Continuous | Useful when it is equally likely that readings will fall above or below the average
Weibull | Continuous | Mean time-to-failure data, time to repair and material strength
Poisson | Discrete | Number of events in a specific time period (defect counts per interval, such as arrivals, failures or defects)
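As a sketch of this approach (using the same illustrative waiting-time data as above), SciPy's distribution objects can be fitted to the data by maximum likelihood. Fixing the location at zero (`floc=0`) to respect the natural lower limit is a modeling choice of this sketch, not something the article prescribes.

```python
# Sketch: fit candidate distributions from Table 1-style choices to positive data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
waiting_times = rng.lognormal(mean=3.0, sigma=0.6, size=100)  # illustrative data

candidates = {
    "lognormal": stats.lognorm,
    "Weibull": stats.weibull_min,
    "exponential": stats.expon,
}

for name, dist in candidates.items():
    # floc=0 anchors each distribution at the natural lower limit of zero
    params = dist.fit(waiting_times, floc=0)
    print(name, params)
```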
A second way is to transform the data so that it follows the normal distribution. A common technique is the Box-Cox transformation, a power transformation in which the original measurements are raised to a power lambda (λ). Some common lambda values, the corresponding transformation and the resulting transformed value assuming Y = 4 appear in Table 2; a sketch of applying the transformation follows the table.
Table 2.
Lambda (λ) | Transformation | Transformed value (Y = 4)
0.0 | ln(Y), the natural logarithm, with base e ≈ 2.71828 (the natural log of a positive number n is the exponent x to which e must be raised to equal n) | ln(4) ≈ 1.39
1.0 | Y | 4
2.0 | Y² | 4² = 16
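The Box-Cox transformation is available directly in SciPy. The sketch below, again on illustrative data, lets `scipy.stats.boxcox` estimate the lambda that best normalizes the data rather than fixing one of the values in Table 2.

```python
# Sketch: Box-Cox power transformation (requires strictly positive data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
waiting_times = rng.lognormal(mean=3.0, sigma=0.6, size=100)  # illustrative data

# Let SciPy estimate the lambda that best normalizes the data
transformed, best_lambda = stats.boxcox(waiting_times)
print(f"estimated lambda = {best_lambda:.2f}")

# A fixed lambda from Table 2 can also be applied directly; lambda = 0 is ln(Y)
log_transformed = stats.boxcox(waiting_times, lmbda=0.0)
```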
Type B data – If none of the distributions or transformations fit, the non-normal data may be “pollution” caused by a mixture of multiple distributions or processes. Examples of this type of pollution include complex work activities; multiple shifts, locations or customers; and seasonality. Practitioners can try stratifying, or breaking down, the data into categories to make sense of it. For example, the cycle time required for attorneys to complete contract documents is generally not normally distributed, nor does it follow a lognormal distribution. Stratifying the data can reveal that some contract documents, such as residential real estate closings, are much simpler to research, draft and execute than more complex contracts. Hence, the complex contracts account for the longer cycle times, while the simpler contracts have shorter times. Another approach is to convert all of the process data to a common denominator, such as contract draft time per page. Afterward, all of the data can be recombined and tested for a single distribution. Both approaches are sketched below.
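A sketch of the stratify-and-normalize idea using pandas; the column names (`contract_type`, `cycle_time_days`, `pages`) and the tiny data set are hypothetical and serve only to show splitting by category and converting to a common denominator such as draft time per page.

```python
# Sketch: stratify mixed process data, then convert to a common denominator.
import pandas as pd

# Hypothetical contract cycle-time records
contracts = pd.DataFrame({
    "contract_type": ["residential", "residential", "commercial", "commercial"],
    "cycle_time_days": [3, 4, 21, 35],
    "pages": [6, 8, 60, 120],
})

# Stratify: summarize each category separately
print(contracts.groupby("contract_type")["cycle_time_days"].describe())

# Common denominator: draft time per page, so simple and complex contracts
# can be recombined and tested as a single distribution
contracts["days_per_page"] = contracts["cycle_time_days"] / contracts["pages"]
print(contracts["days_per_page"])
```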
Notice that the histogram of the transformed data (Figure 6) is much closer to a normal shape (bell-shaped, symmetrical) than the histogram in Figure 3.
An alternative to transforming the data is to find a non-normal distribution that does fit the data. Figure 7 shows probability plots for the ER waiting time using the normal, lognormal,
exponential and Weibull distributions.
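Plots similar to Figure 7 can be produced with `scipy.stats.probplot` and matplotlib. The sketch below uses the illustrative data, with the lognormal and Weibull shape parameters estimated from the data; fixing the location at zero is an assumption of the sketch.

```python
# Sketch: probability plots for four candidate distributions.
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
waiting_times = rng.lognormal(mean=3.0, sigma=0.6, size=100)  # illustrative data

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

stats.probplot(waiting_times, dist="norm", plot=axes[0, 0])
axes[0, 0].set_title("Normal")

s, loc, scale = stats.lognorm.fit(waiting_times, floc=0)
stats.probplot(waiting_times, dist=stats.lognorm, sparams=(s, loc, scale), plot=axes[0, 1])
axes[0, 1].set_title("Lognormal")

stats.probplot(waiting_times, dist="expon", plot=axes[1, 0])
axes[1, 0].set_title("Exponential")

c, loc, scale = stats.weibull_min.fit(waiting_times, floc=0)
stats.probplot(waiting_times, dist=stats.weibull_min, sparams=(c, loc, scale), plot=axes[1, 1])
axes[1, 1].set_title("Weibull")

plt.tight_layout()
plt.show()
```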
The Anderson-Darling normality test can be used as an indicator of goodness-of-fit. It produces a p-value, a probability that is compared to the decision criterion, the alpha (α) risk. Assume α = 0.05, meaning there is a 5 percent risk of rejecting the null hypothesis when it is true. The hypothesis test for this example is:
H0: The data follow the specified distribution.
Ha: The data do not follow the specified distribution.
If the p-value is less than or equal to alpha, there is evidence that the data do not follow the specified distribution. Conversely, a p-value greater than alpha suggests the data are consistent with that distribution.
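The p-values in the article come from a statistical package that reports them for the Anderson-Darling test directly. SciPy's `scipy.stats.anderson` returns the test statistic and critical values instead, so the equivalent decision compares the statistic to the 5 percent critical value, as sketched below; screening lognormality by testing the logged data for normality is an assumption of the sketch, not something the article states.

```python
# Sketch: Anderson-Darling normality check at the 5 percent level.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
waiting_times = rng.lognormal(mean=3.0, sigma=0.6, size=100)  # illustrative data

result = stats.anderson(waiting_times, dist="norm")
crit_5pct = result.critical_values[list(result.significance_level).index(5.0)]

if result.statistic > crit_5pct:
    print("Reject H0: the data do not appear to be normally distributed")
else:
    print("Fail to reject H0: the data are consistent with a normal distribution")

# Lognormality can be screened the same way by testing the logged data
log_result = stats.anderson(np.log(waiting_times), dist="norm")
```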
The p-value for the lognormal distribution is 0.058, while the p-value for the Weibull distribution is 0.162. Both are above the 0.05 alpha risk, but the Weibull distribution is the better fit because its higher p-value indicates weaker evidence against the null hypothesis.
Now the Weibull distribution can be used to construct the proper individuals control chart (Figure 8). Notice that all of the data points fall within the control limits; hence, the process is stable and predictable.
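One common convention for charting non-normal data without transforming it is to place the control limits at the 0.135th and 99.865th percentiles of the fitted distribution, which cover the same 99.73 percent of values as ±3 sigma limits on a normal-based chart. The sketch below applies that convention to a Weibull fit of the illustrative data; it mirrors the general idea rather than the exact method behind Figure 8.

```python
# Sketch: control limits from a fitted Weibull (0.135th / 99.865th percentiles).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
waiting_times = rng.lognormal(mean=3.0, sigma=0.6, size=100)  # illustrative data

c, loc, scale = stats.weibull_min.fit(waiting_times, floc=0)

lcl = stats.weibull_min.ppf(0.00135, c, loc=loc, scale=scale)
center = stats.weibull_min.median(c, loc=loc, scale=scale)
ucl = stats.weibull_min.ppf(0.99865, c, loc=loc, scale=scale)

print(f"LCL = {lcl:.1f}, center = {center:.1f}, UCL = {ucl:.1f}")
```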
Now that the process is in control, it can be assessed using capability indices such as Cpk (Figure 9). Overall, this is a predictable process, with 8.85 percent of ER visit times out of specification.
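The out-of-specification percentage can be estimated from the fitted distribution's survival function. In the sketch below, the 45-minute upper specification limit is hypothetical, and the simulated data will not reproduce the article's 8.85 percent figure.

```python
# Sketch: percent of visits expected beyond a hypothetical upper spec limit (USL).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
waiting_times = rng.lognormal(mean=3.0, sigma=0.6, size=100)  # illustrative data

usl = 45.0  # hypothetical upper spec limit, in minutes
c, loc, scale = stats.weibull_min.fit(waiting_times, floc=0)

pct_out_of_spec = 100 * stats.weibull_min.sf(usl, c, loc=loc, scale=scale)
print(f"Estimated {pct_out_of_spec:.2f}% of visits exceed {usl} minutes")
```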
Comments
Yathiesh
Excellent Post – Very Informative
Kevin C
This is a great post. What impact would we see if this were a short-term analysis, to the point where the "newer" data points have less variance simply because they are newer, whereas the "older" data has had more time to accrue substantial outliers? Is there a definitive way to address this, or is it just a matter of trimming the outliers off or simply waiting longer to analyze?
Victoria
Hello, thanks for this post. One question: if you use a transformation on the data, how do you assess the error? E.g., with a regression analysis?
Chas Ward
Yes; the answer is to be found early-on:
When data fits a normal distribution, practitioners can make statements about the population using common analytical techniques, including control charts
and capability indices …
If you are clear that the sample is representative of the population, then the characteristics describing the shape should be identical for both sample and population. The normal distribution is, or should be, the shape of both the sample and the population. A larger sample size should, if randomly selected, be more representative of the population than a smaller one. HTH
AL
You can use the Kolmogorov-Smirnov test for large sample sizes and the Shapiro-Wilk test for samples smaller than 2,000.
Sean
Hi,
Very helpful! Was this completed in R? Is there any place I can find this code/dataset on the web?
Thanks!