QQ-Plot Excel Assignment
The Q-Q Plot
Purpose
In this assignment you will learn how to construct a Q-Q plot correctly in Microsoft Excel. You will also
learn that there is no magic behind a Q-Q plot. In most other systems, such as R, the
normal Q-Q plot is available as a convenience feature, so you don't have to work so hard!
These instructions are loosely based on this link:
http://facweb.cs.depaul.edu/cmiller/it223/normQuant.html
The link contains a worked-out spreadsheet example that fully explains the method.
Instructions
1. Follow the instructions provided here for data named time, but replace the data with
the residuals of the FEV data of Excel Assignment 2. You should standardize the
residuals.
2. After you've obtained the Normalized Q-Q plot for your residuals, draw the regression
line and display its equation and the coefficient of determination.
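The assignment does not spell out how to standardize, so the following is only a minimal sketch under one common reading: subtract the mean and divide by the sample standard deviation. The function name standardize and the residual values are hypothetical placeholders, not the FEV residuals. In Excel, the same arithmetic could be built from the AVERAGE and STDEV functions.

```python
from statistics import mean, stdev


def standardize(values):
    """Return (v - mean) / sample standard deviation for each value."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return [(v - m) / s for v in values]


# Hypothetical residuals, for illustration only (NOT the FEV residuals).
residuals = [-0.8, -0.3, 0.1, 0.4, 0.6]
standardized = standardize(residuals)
```

After this step the values have mean 0 and standard deviation 1, which is why a Q-Q plot of standardized, roughly normal residuals should fall near the line y = x.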
Step-by-step instructions
1. Place or load your data values into the first column. Leave the first row blank for
labeling the columns. Sort the data in ascending order (look under the Data
menu).
2. Label the second column as Rank. Enter the ranks, starting with 1 in the row right
below the label. Each following row is one more than the last (note: you can
use an expression, then copy and paste it to save time).
3. Label the third column as Rank Proportion. This column shows the rank
proportion of each value. Use this expression for the first data value: =(b2 - 0.5) /
count(b$2:b$N), where N is the row number of the last cell. Finish the
column by copying the first data expression to the remaining rows. Check that
your rank proportions (percentiles) look correct: they should increase from just
above 0 to just below 1.
4. Label the fourth column as Rank-based z-scores. Excel provides these values
with the normsinv function. Apply this function to the rank proportions in the
third column (for example, =normsinv(c2)) to create the values in the fourth
column.
5. Copy the first column to the fifth column. The Excel chart wizard works better if
the x-axis values are just to the left of the y-axis values.
6. Select the fourth and fifth columns. Open the chart wizard and choose the scatter
plot. The default data values should be good, but you should provide good labels.
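To show that there is no magic in steps 1 to 4, the same columns can be sketched outside Excel. This is only an illustration, assuming Python 3.8 or later, where statistics.NormalDist().inv_cdf plays the role of Excel's normsinv. The function name qq_columns is a hypothetical helper; the data are the time values from the sample data in this handout.

```python
from statistics import NormalDist


def qq_columns(data):
    """Build (value, rank, rank proportion, rank-based z-score) rows."""
    values = sorted(data)                  # step 1: sort ascending
    n = len(values)
    rows = []
    for rank, value in enumerate(values, start=1):    # step 2: ranks 1..n
        proportion = (rank - 0.5) / n                  # step 3: (rank - 0.5) / count
        z = NormalDist().inv_cdf(proportion)           # step 4: inverse standard normal
        rows.append((value, rank, proportion, z))
    return rows


# The "time" sample data from this handout.
time = [16.042, 16.606, 18.367, 20.03, 20.042, 30.726, 31.538, 32.428,
        32.589, 33.522, 39.5, 39.619, 41.362, 41.673, 45.874, 52.135,
        59.999, 69.86, 72.879, 111.137, 113.141, 140.862, 143.063,
        157.113, 165.308, 199.531]

rows = qq_columns(time)
for value, rank, proportion, z in rows[:3]:
    print(f"{value:8.3f} {rank:3d} {proportion:.8f} {z: .9f}")
```

Plotting the z-score column on the x-axis against the value column on the y-axis gives the same scatter plot as steps 5 and 6.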
Sample Data
The data are named time and are in the first column. The remaining columns are auxiliary columns used
in creating the Q-Q plot.
time      rank   percentile   rank-based z-score   time
16.042    1      0.01923077   -2.069901831         16.042
16.606    2      0.05769231   -1.574444965         16.606
18.367    3      0.09615385   -1.303782672         18.367
20.03     4      0.13461538   -1.104835744         20.03
20.042    5      0.17307692   -0.942075775         20.042
30.726    6      0.21153846   -0.801094529         30.726
31.538    7      0.25         -0.67448975          31.538
32.428    8      0.28846154   -0.557884763         32.428
32.589    9      0.32692308   -0.448425483         32.589
33.522    10     0.36538462   -0.344102463         33.522
39.5      11     0.40384615   -0.243404178         39.5
39.619    12     0.44230769   -0.145120941         39.619
41.362    13     0.48076923   -0.048223074         41.362
41.673    14     0.51923077    0.048223074         41.673
45.874    15     0.55769231    0.145120941         45.874
52.135    16     0.59615385    0.243404178         52.135
59.999    17     0.63461538    0.344102463         59.999
69.86     18     0.67307692    0.448425483         69.86
72.879    19     0.71153846    0.557884763         72.879
111.137   20     0.75          0.67448975          111.137
113.141   21     0.78846154    0.801094529         113.141
140.862   22     0.82692308    0.942075775         140.862
143.063   23     0.86538462    1.104835744         143.063
157.113   24     0.90384615    1.303782672         157.113
165.308   25     0.94230769    1.574444965         165.308
199.531   26     0.98076923    2.069901831         199.531
[Figure: scatter plot of time (y-axis) against Rank-based Z-score (x-axis), with the fitted regression line y = 49.133x + 67.113, R² = 0.8278]
Comment: The data were not normalized (standardized) in this example, so the straight line is not close to y=x.
Also, the data do not appear quite normal, even though R-squared is quite high.
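The regression line and coefficient of determination for the sample data can also be checked by hand with ordinary least squares. The sketch below is an illustration rather than the method Excel uses internally; fit_line is a hypothetical helper name, and statistics.NormalDist (Python 3.8 or later) stands in for normsinv. Its output should come out close to the trendline equation and R-squared reported for the sample chart.

```python
from statistics import NormalDist, mean

# The "time" sample data from this handout, already sorted ascending.
time = [16.042, 16.606, 18.367, 20.03, 20.042, 30.726, 31.538, 32.428,
        32.589, 33.522, 39.5, 39.619, 41.362, 41.673, 45.874, 52.135,
        59.999, 69.86, 72.879, 111.137, 113.141, 140.862, 143.063,
        157.113, 165.308, 199.531]

n = len(time)
# Rank-based z-scores: inverse normal of (rank - 0.5) / n.
z = [NormalDist().inv_cdf((rank - 0.5) / n) for rank in range(1, n + 1)]


def fit_line(x, y):
    """Least-squares fit y = slope * x + intercept, plus R-squared."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)   # squared correlation coefficient
    return slope, intercept, r_squared


slope, intercept, r_squared = fit_line(z, time)
print(f"y = {slope:.3f}x + {intercept:.3f}, R^2 = {r_squared:.4f}")
```

Because the z-scores are symmetric about 0, the intercept is simply the mean of the data and the slope is roughly an estimate of its spread. For standardized data both would be near 0 and 1, which is the y = x line the comment refers to.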