How Normal Is Normal, Using A Q-Q Plot, ASTM Data Points, May-June 2014
By Alex T.C. Lau
Q: How can I determine if my data come from a normal distribution?

A: A quantile-quantile, or Q-Q, plot can be used to examine if a data set is approximately normal.

The lion’s share of statistical interpretation and associated decision making is based on the assumption that the universe from which the limited data set is obtained, or the statistics calculated from the data set, can be adequately represented (modeled) by the Gaussian distribution, more commonly known as the normal distribution. Many techniques can be used to check the reasonableness of this normality assumption. Most require a commercial statistical software package to carry out the necessary computations and plots. This article describes a graphical technique, the Q-Q plot, that can be used to visually determine if the data are approximately normally distributed.

The Q-Q plot is a graphical method for studying how well the underlying distribution from which the data set is collected can be approximated by the normal model. It is equivalent to the classical normal probability plot but, unlike the latter, requires no specialized scale or probability paper. The plot can be easily implemented in a spreadsheet tool such as Excel using the NORMSINV function. The data can be deemed to be “adequately” normal if most of the points in the plot lie roughly along a straight line. In addition to supporting a judgment of data normality, the Q-Q plot has other salient features:
• The y-axis is in the original units of the data,
• Potential outlier(s) can be visually identified as the point(s) that deviate significantly from the approximate straight line along which most of the data lie,
• The y-intercept of the approximate straight line (its value at z = 0) is an estimate of the median of the data set, and
• The slope of the approximate straight line is an indication of the magnitude of the data set standard deviation, where a steep slope represents a large standard deviation and a shallow slope represents a small standard deviation.

The Q-Q plot procedure is as follows:
1. Order the data from smallest to largest (n = total number of observations).
2. Create an index i next to the ordered data, where i takes on values from 1 through n, with the lowest value assigned i = 1 and the highest assigned i = n.
3. Calculate f_i = (i - 0.5)/n for each observation. This is a rank plotting position for the Q-Q plot.
4. Obtain the value of z_i for each f_i from the cumulative distribution version of a standard normal distribution table (µ = 0, σ = 1). An easier approach is to use the Excel spreadsheet function NORMSINV to compute the z_i values, as shown in Table 1. Pair each z_i with the observation of index i for plotting later.
5. Plot each observation value on the y-axis against its z_i value obtained in step 4 on the x-axis, using ordinary linear graph paper. This creates the Q-Q plot (see Figure 1).

The next step is to visually examine the plot for approximate linearity. If the Q-Q plot pattern is linear, or nearly so, the data distribution is well approximated by the normal model. Significant deviation from linearity should serve as a signal for potential failure of the normality assumption.

Interested readers are referred to ASTM D6299, Practice for Applying Statistical Quality Assurance and Control Charting Techniques to Evaluate Analytical Measurement System Performance, for a detailed description of the Q-Q plot as well as how to calculate an associated A-D (Anderson-Darling) statistic to assess data normality.
ASTM Standardization News, May/June 2014, www.astm.org/standardization-news
Table 1: Data for Q-Q Plot
Columns: Original Data; Ordered Data; Index i; f_i = (i - 0.5)/n; z_i = NORMSINV(f_i)

Figure 1: Q-Q Plot of Ordered Data versus z_i
Alex T.C. Lau, TCL Consulting, Whitby, Ontario, Canada, is chairman of Subcommittees D02.94 on Quality Assurance and Statistics and D02.01.0B on Precision, which are part of ASTM Committee D02 on Petroleum Products and Lubricants. An ASTM International fellow, Lau is also a member of Committees E11 on Quality and Statistics, E36 on Accreditation and Certification, and F08 on Sports Equipment and Facilities.

Dean V. Neubauer, Corning Inc., Corning, N.Y., is an ASTM International fellow, chairman of E11.90.03 on Publications and coordinator of the DataPoints column; he is immediate past chairman of Committee E11 on Quality and Statistics.

Get more tips for ASTM standards development at www.astm.org/sn-tips.
Find other DataPoints articles at www.astm.org/standardization-news/datapoints.