
UNIT 6

RESIDUAL ANALYSIS

Structure

6.1 Introduction
    Expected Learning Outcomes
6.2 Model Adequacy Diagnostic
    Examination of the Linearity Assumption
6.3 Computation of Residuals
    Some Properties of Residuals
6.4 Scaling of the Residuals
    Standardised Residuals
    Studentised Residuals
    PRESS Residuals
    R-student
6.5 Residual Plots
    Various Forms of the Residual Plots
6.6 Normal Plots
    Some Patterns of the Normal Plots
6.7 Summary
6.8 Terminal Questions
6.9 Solutions/Answers

6.1 INTRODUCTION

In Unit 1, you discovered how a scatter plot helps to understand the relationship between the response and explanatory variables before fitting a regression model. You also learnt how to fit simple and multiple linear regression models using the ordinary least squares method in Unit 2.

Once you fit the regression model, you need to go ahead with diagnostics of the fitted model to verify its underlying assumptions. In this unit, we explore how to use residuals (the differences between the observed and predicted values of the response variable) or scaled residuals to verify some of the model assumptions. The essential point to focus on here is that the errors or disturbances (ε_i) and the residuals (r_i) are not the same, though they are closely related. As we have already discussed, the error of an observed value is the difference between the observed value and the actual value (or the expected value based on the whole population) of the response variable. The error term ε_i is an unobservable random variable which cannot be measured or observed directly. The error terms are assumed to be normally distributed with zero mean and common variance σ², as well as uncorrelated with each other (see Fig. 6.1).

Fig. 6.1: Distribution of the Error Terms.

However, the residual of an observed value is the difference between the observed and predicted values of the response variable. The residuals (r_i) have mean E(r_i) = 0, and the variance-covariance matrix of the residual vector is σ²(I − H), where H is the hat matrix introduced in Section 6.4.2. Since the error terms are unknown, the residuals are used to check the properties of the error terms assumed for the fitted model, such as normality, constant variance, etc.
In this unit, we will focus on the computation of residuals and scaled residuals, along with graphical diagnostics based on them. If the residuals of a fitted model do not seem reasonable, the fulfilment of one or more of the assumptions becomes doubtful. In Section 6.2, we will revisit the assumptions considered for checking model adequacy and the role of residuals in diagnosing them. We compute the residuals and discuss their properties in Section 6.3, and consider various ways to obtain the scaled residuals in Section 6.4. In Section 6.5, we will discuss residual plots based on the predicted values of the response variable and on the explanatory variables, which diagnose the assumptions of linearity and constant variance and the presence of outliers. Section 6.6 will describe how to create the normal probability, normal quantile, P-P and Q-Q plots.

In the next unit, you will learn about partial regression and partial residual plots.

Expected Learning Outcomes


After studying this unit, you should be able to:
❖ compute the residuals and scaled residuals;
❖ check the adequacy of the fitted model with the help of residual analysis;
❖ construct and interpret the various residual plots; and
❖ create and interpret the normal probability, normal quantile, P-P and Q-Q
plots.

6.2 MODEL ADEQUACY DIAGNOSTIC


We discussed the assumptions of the simple and multiple linear regression models in Section 1.4.3 of Unit 1. Recall from that section the linearity, homoscedasticity and normality assumptions required for linear regression models.

It is necessary to check the validity of these assumptions for a valid regression analysis. These assumptions are required while fitting the linear regression models, computing point and interval estimators of the regression parameters, and testing the significance of the regression parameters and the fitted regression model. Just for a quick revision, some of the required assumptions to be verified in this unit are given as follows:
• The relationship between the response and explanatory variables should
be approximately linear.
• The error terms should be normally distributed with zero mean and
constant variance.
• The error terms should be uncorrelated.
We have already discussed testing the individual regression parameters as well as the overall fitted model, accompanied by the coefficient of determination, to check the goodness of fit of the fitted model in Unit 3. Furthermore, we need to validate the regression model's underlying assumptions before concluding the regression modelling. Violating any assumption may lead to incorrect results and may also yield a model that is unstable across different samples.
In this unit, we focus on some diagnostic approaches that study the pattern of the residuals to verify that the regression model's assumptions hold. In residual analysis, we use different types of plots to diagnose whether any regression assumption is violated. Let us first examine the simplest and most immediate way to check the linear relationship between the response and explanatory variables before fitting the simple and multiple linear regression models, when we have one and more than one explanatory variable, respectively.
6.2.1 Examination of the Linearity Assumption
(i) For simple linear regression model
As discussed in Section 1.3.1 of Unit 1, we can quickly get an idea of
whether the response and explanatory variables are linearly related or
not with the help of a scatter diagram. A scatter diagram indicates an
upward or downward linear pattern if the response and explanatory
variables are linearly related. Otherwise, the relationship between them
is considered as non-linear. If you look at Fig. 6.2, you will observe that
scatter plots given in Fig. 6.2(a) and (b) show linear and non-linear
patterns, respectively.

Fig. 6.2: Scatter plots: (a) linear pattern; (b) non-linear pattern.

(ii) For multiple linear regression model


When we have more than one explanatory variable in a multiple
regression model, we create a scatter plot matrix to examine the linearity
assumption. As discussed in Section 1.4.2 of Unit 1, a scatter plot matrix
is a collection of scatter diagrams plotted between pairs of two variables
to assess the relationship between those pairs. So, we get some idea
about the linear or non-linear relationship between these pairs of
variables.
In the lab course "MSTL-012: Statistical Computing using R-II", you will explore different kinds of scatter plot matrices, which can provide additional information such as correlation coefficients, distributions of the variables, etc. We construct the scatter plot matrix before fitting a multiple regression model to be assured of the linearity between each pair of variables. Note, however, that the linearity assumption of the multiple regression model concerns the response variable and the set of explanatory variables (X_1, X_2, …, X_k) jointly, not merely the pairs of variables. A situation may also arise in which the explanatory variables are correlated among themselves, which causes a multicollinearity problem; we will discuss multicollinearity in Unit 10. In that case, we do not get a clear picture of how to validate the linearity assumption.
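As a quick illustration, a scatter plot matrix can be produced in R (the software used in MSTL-012) with the built-in pairs() function. A minimal sketch, using the first five rows of the yield data of Table 6.1 (which appears later in this unit) for concreteness; the data frame and column names are my own choice:

    # First five rows of the yield data from Table 6.1
    # (Y = yield, X1 = rainfall, X2 = area under crop)
    df <- data.frame(Y  = c(2340, 2310, 2040, 2630, 2340),
                     X1 = c(1340, 1210, 1420, 1200, 1310),
                     X2 = c(49, 49, 45, 47, 46))

    pairs(df)          # scatter plot matrix: one scatter diagram per pair of variables
    round(cor(df), 3)  # pairwise correlation coefficients as a numerical complement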
In the subsequent section, we will first understand the computation of
residual and its properties.
Let us pause here and answer the following Self Assessment Question.

SAQ 1
State the assumptions of a linear regression model.

6.3 COMPUTATION OF RESIDUALS


In this section, we explore the vital role of residuals in investigating the
accuracy of the fitted regression model and detecting departures from the
underlying assumptions of the regression model. The residuals can be
perceived as the deviations of the observed values from their corresponding
fitted values. That is why residuals also measure the variability in the response
variable that is not explained by the fitted regression model. These residuals help to check for departures from the assumptions on the random errors, and residual analysis facilitates finding any inadequacy in the fitted regression model.
In Section 2.3, we obtained the fitted value of the response variable (Y) for a multiple regression model for the ith observation as:

    ŷ_i = β̂_0 + β̂_1 x_1i + β̂_2 x_2i + … + β̂_k x_ki; i = 1, 2, …, n … (1)

In matrix form, we can define the vector of predicted values as:

    Ŷ = Xβ̂ … (2)

where Ŷ = (ŷ_1, ŷ_2, …, ŷ_n)′ is an (n × 1) vector of the n predicted values of the response variable (Y).


The ith residual is expressed as the difference between the ith observed value (y_i) and the corresponding fitted value (ŷ_i) of the response variable (see Fig. 6.3). Therefore, the ith residual is defined as:

    r_i = y_i − ŷ_i; i = 1, 2, …, n … (3)

In matrix form, we can write:

    R = Y − Ŷ … (4)

where R = (r_1, r_2, …, r_n)′ is a vector of the n residuals.

Fig. 6.3: Residuals as the vertical deviations of the observed values from the fitted regression line.

Note: Ideally, the sum of the residuals over all given observations should always equal zero, i.e., Σ_{i=1}^n r_i = 0. But sometimes rounding errors may change the result when we approximate the decimal places.

Ideally, by the property of residuals, the sum of the residuals over all given observations equals zero, i.e., Σ_{i=1}^n r_i = 0. From the properties of residuals, we also know that they have zero mean and approximate variance σ². The mean of the residuals is obtained as:

    E(r_i) = E(y_i − ŷ_i) = E(y_i) − E(ŷ_i) = 0

since E(y_i) = β_0 + β_1 x_1i + β_2 x_2i + … + β_k x_ki (because y_i = β_0 + β_1 x_1i + β_2 x_2i + … + β_k x_ki + ε_i with E(ε_i) = 0) and E(ŷ_i) takes the same value (because ŷ_i = β̂_0 + β̂_1 x_1i + β̂_2 x_2i + … + β̂_k x_ki with E(β̂_j) = β_j).

Note: The residuals possibly explain any irregularity of the fitted model that might have occurred and misled us.

As explained in equations (15) and (40) of Unit 3, when the value of σ² is unknown, we estimate the approximate variance of the residuals from the given data as:

    σ̂² = Σ_{i=1}^n (r_i − r̄)²/(n − k − 1) = Σ_{i=1}^n r_i²/(n − k − 1) = Σ_{i=1}^n (y_i − ŷ_i)²/(n − k − 1) = MSS_Res … (5)

where r̄ = (Σ_{i=1}^n r_i)/n = 0 and k is the number of explanatory variables in the model.

6.3.1 Some Properties of Residuals


Some of the properties of the residuals are given as follows:

(i) The sum of the residuals (r_i) is always zero, i.e.,

    Σ_{i=1}^n r_i = Σ_{i=1}^n (y_i − ŷ_i) = 0

(ii) The sum of the cross-products between the values of each explanatory variable and the residuals is zero. In other words, if we consider the corresponding values of an explanatory variable as weights, the weighted sum of the residuals equals zero, i.e.,

    Σ_{i=1}^n x_ji r_i = 0; j = 1, 2, …, k.

(iii) The sum of the cross-products between the predicted values of the response variable and the residuals is zero, i.e.,

    Σ_{i=1}^n ŷ_i r_i = 0

We now solve an example before studying various ways of scaling the


residuals.
Example 1: Let us consider data related to the yield of a crop recorded for 20
years. A researcher wishes to model annual yield (in kg/hectare) based on two
explanatory variables: yearly rainfall (in millimetres) and area under crop (in
million hectares). The data are given as follows:
Table 6.1: Yield of a crop for 20 years

Year Yield Rainfall Area

1 2340 1340 49
2 2310 1210 49
3 2040 1420 45
4 2630 1200 47
5 2340 1310 46
6 2330 1250 48
7 2340 1310 48
8 2340 1190 48
9 2380 1280 50
10 2360 1340 46
11 2330 1340 41
12 2180 1390 50
13 2100 1430 52
14 3160 1210 51
15 2400 1320 53
16 2390 1260 53
17 2400 1320 53
18 2400 1200 55
19 2440 1290 51
20 2420 1350 46

Determine the predicted values of the response variable and the residuals for the given data.

Solution: We can fit the regression model for the given data, as discussed in
Unit 2. We have obtained the best-fitted multiple regression model using the
method of ordinary least squares given as:
    Ŷ = 4605.52767 − 1.79313X_1 + 2.10913X_2

We can also determine the predicted values of the response variable as given
in Column 3 of Table 6.2. The observed and predicted values of the response
variable corresponding to the first observation are 2340 and 2306.0832,
respectively (see Columns 2 and 3 of Table 6.2). We now determine the value
of the first residual using the formula given in equation (3) shown as follows:
r1 = 2340 − 2306.0832 = 33.9168

In the same way, we compute other values of the residuals for all observations
given in the data and arrange them in Column (4) of Table 6.2 as follows:
Table 6.2: Computation of the residuals

S. No.   y_i   Predicted Value (ŷ_i)   Residual r_i = (y_i − ŷ_i)

(1) (2) (3) (4)


1 2340 2306.0832 33.9168
2 2310 2539.1898 –229.1898
3 2040 2154.1964 –114.1964
4 2630 2552.9029 77.0971
5 2340 2353.5496 –13.5496
6 2330 2465.3556 –135.3556
7 2340 2357.7679 –17.7679
8 2340 2572.9433 –232.9433
9 2380 2415.7800 –35.7800
10 2360 2299.7558 60.2442
11 2330 2289.2101 40.7899
12 2180 2218.5359 –38.5359
13 2100 2151.0290 –51.0290
14 3160 2543.4081 616.5919
15 2400 2350.3822 49.6178
16 2390 2457.9700 –67.9700
17 2400 2350.3822 49.6178
18 2400 2569.7759 –169.7759
19 2440 2399.9578 40.0422
20 2420 2281.8245 138.1755
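The fitted model and the residuals of Table 6.2 can be reproduced in R; a minimal sketch (the data frame and variable names are my own choice, the data are from Table 6.1):

    # Yield data from Table 6.1
    yield <- data.frame(
      Yield    = c(2340, 2310, 2040, 2630, 2340, 2330, 2340, 2340, 2380, 2360,
                   2330, 2180, 2100, 3160, 2400, 2390, 2400, 2400, 2440, 2420),
      Rainfall = c(1340, 1210, 1420, 1200, 1310, 1250, 1310, 1190, 1280, 1340,
                   1340, 1390, 1430, 1210, 1320, 1260, 1320, 1200, 1290, 1350),
      Area     = c(49, 49, 45, 47, 46, 48, 48, 48, 50, 46,
                   41, 50, 52, 51, 53, 53, 53, 55, 51, 46))

    # Ordinary least squares fit of yield on rainfall and area
    fit <- lm(Yield ~ Rainfall + Area, data = yield)
    coef(fit)        # compare with Ŷ = 4605.52767 − 1.79313 X1 + 2.10913 X2

    r <- resid(fit)  # residuals r_i = y_i − ŷ_i (Column 4 of Table 6.2)
    cbind(fitted = fitted(fit), residual = r)

    # Numerical check of the residual properties of Section 6.3.1
    sum(r)                                        # (i)   ≈ 0
    colSums(yield[, c("Rainfall", "Area")] * r)   # (ii)  ≈ 0
    sum(fitted(fit) * r)                          # (iii) ≈ 0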

It would be helpful if you now solved the following Self Assessment Question
to revise the computation of the residuals.

SAQ 2
Suppose a store manager wants to evaluate the effect of a factor such as the
price of a product (X) on the number of products sold monthly (Y). For this
purpose, the following data on 25 products are recorded to explore the
relationship between the monthly sales and the price of a product:

Table 6.3: Monthly sales and price of the products.

Product Price Sales Product Price Sales


1 75 4740 14 225 26580
2 32 2640 15 182 18960
3 112 21000 16 160 25000
4 54 4740 17 39 1020
5 64 12000 18 141 11280
6 128 21840 19 82 4620
7 150 28380 20 48 8340
8 212 47000 21 60 3360
9 121 10000 22 97 9360
10 230 41100 23 171 31680
11 202 35040 24 114 15000
12 188 31260 25 125 30000
13 83 19500

For this data, obtain the residuals.

6.4 SCALING OF THE RESIDUALS


As discussed, residuals possibly explain any irregularity in the fitted regression model that might have occurred and misled us. It is sometimes essential to work with scaled residuals instead of the raw residuals; in particular, when we have outliers or extreme values in the data, it is helpful to determine the scaled residuals. Here, we will discuss four methods for scaling the residuals:
6.4.1 Standardised Residuals
We obtain the standardised residuals by subtracting the mean of the residuals from each residual and dividing by its standard deviation. The ith standardised residual is given as:

    s_i = (r_i − E(r_i))/σ; i = 1, 2, …, n

where E(r_i) = 0. For unknown σ², we define the ith standardised residual as:

    s_i = r_i/σ̂ … (6)

where σ̂² = MSS_Res. The standardised residuals (s_i) have mean zero and variance approximately equal to one, i.e., E(s_i) = 0 and Var(s_i) ≈ 1.

We can detect outliers with the help of the standardised residuals. Generally, we consider a high absolute value of the standardised residual, say greater than three, i.e., |s_i| > 3, as an indication of an outlier.

6.4.2 Studentised Residuals


Another way of scaling the residuals gives the studentised residuals. You may have noticed that we used MSS_Res while computing the standardised residuals, which is only an approximation to the variance of the residuals (r_i). For computing the studentised residuals, we use the exact variance of each residual instead of MSS_Res: the studentised residual is obtained by dividing the residual by its estimated exact standard deviation. Before discussing how to obtain the studentised residuals, let us first understand the hat matrix, denoted by H.
Hat matrix H
In the case of the multiple linear regression model Y = Xβ + ε, the hat matrix H of order (n × n) is defined as:

    H = X(X′X)⁻¹X′ … (7)

If we denote the (i, j)th element of the matrix H by h_ij, we write:

        ⎡ h_11  h_12  …  h_1n ⎤
    H = ⎢ h_21  h_22  …  h_2n ⎥ … (8)
        ⎢   ⋮     ⋮          ⋮  ⎥
        ⎣ h_n1  h_n2  …  h_nn ⎦

Note that when we have only one explanatory variable, the value of h_ij is given as:

    h_ij = 1/n + (x_i − x̄)(x_j − x̄)/SS_x; i, j = 1, 2, …, n … (9)
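To make equation (7) concrete, the following R sketch computes H for the fit object from the earlier sketch and checks its diagonal against R's built-in hatvalues():

    X <- model.matrix(fit)                 # n x (k+1) design matrix (first column: ones)
    H <- X %*% solve(t(X) %*% X) %*% t(X)  # hat matrix H = X(X'X)^(-1)X', equation (7)

    h <- diag(H)                           # diagonal elements h_ii
    range(h)                               # each h_ii lies between 0 and 1

    all.equal(unname(h), unname(hatvalues(fit)))  # TRUE: matches R's built-in leverages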

We can express the residuals as the residual vector:

    R = (I − H)ε

where I is the identity matrix of order n (e.g., the identity matrix of order 3 has ones on the diagonal and zeros elsewhere).

The variance-covariance matrix of the residuals (r_i) is determined as:

    V(R) = V[(I − H)ε] = (I − H)V(ε)(I − H)′        (since V(ε) = σ²I)
         = (I − H)σ²I(I − H)′ = σ²(I − H)(I − H)′

Since the matrices H and (I − H) are symmetric, i.e., (I − H)′ = (I − H), and idempotent, i.e., (I − H)(I − H) = (I − H), we have:

    V(R) = σ²(I − H)

We obtain the variance of the ith residual as:

    Var(r_i) = σ²(1 − h_ii) … (10)

where 0 ≤ h_ii ≤ 1.

The covariance of the ith and jth residuals is written as:

    Cov(r_i, r_j) = −σ²h_ij … (11)

If we use the estimated approximate value of σ², i.e., σ̂² = MSS_Res, then Var(r_i) can be rewritten as:

    Var(r_i) = σ̂²(1 − h_ii) = MSS_Res(1 − h_ii) … (12)



This implies that the estimate σ̂² = MSS_Res overestimates the variance of the residuals. Thus, we compute the ith studentised residual as:

    d_i = r_i/√Var(r_i) = r_i/√(MSS_Res(1 − h_ii)) … (13)

Note that the studentised residuals are usually larger in magnitude than the ordinary residuals (r_i) or the standardised residuals (s_i), so diagnosing a violation of the model's assumptions is often easier by examining the studentised residuals than the ordinary or standardised residuals. The ith studentised residual d_i is also known as the ith internally studentised residual.

For the studentised residuals, we have E(d_i) = 0 and Var(d_i) = 1.

Note that the variance of the residuals becomes stable, especially in the case of large data sets. In many situations, the values of s_i and d_i do not differ much and often convey similar information. It is also essential to remember that if the ith data point has both a large residual and a large h_ii, the point will possibly be highly influential on the fitted regression model. In that situation, we generally advise assessing the studentised residual d_i.
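In R, the studentised residuals of equation (13) are available as rstandard(). (Beware of the naming: rstandard() gives the studentised residuals d_i, not the standardised residuals s_i of equation (6), which we compute by hand.) A sketch, continuing with the fit object:

    r <- resid(fit)
    h <- hatvalues(fit)
    MSSres <- deviance(fit) / df.residual(fit)   # RSS/(n − k − 1), equation (5)

    s_i <- r / sqrt(MSSres)                      # standardised residuals, equation (6)
    d_i <- r / sqrt(MSSres * (1 - h))            # studentised residuals, equation (13)

    all.equal(unname(d_i), unname(rstandard(fit)))   # TRUE: rstandard() implements (13)
    which(abs(s_i) > 3)                              # flag potential outliers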

The variance of the ith residual in the case of only one explanatory variable can be obtained as:

    Var(r_i) = MSS_Res[1 − (1/n + (x_i − x̄)²/SS_x)] … (14)

The studentised residual defined in equation (13) then reduces to the following form in the case of one explanatory variable:

    d_i = r_i/√(MSS_Res[1 − (1/n + (x_i − x̄)²/SS_x)]); i = 1, 2, …, n … (15)

From equation (15), we can observe that the estimated variance of r_i will be larger when x_i is close to x̄, and smaller when x_i lies towards the extreme ends of the data. Moreover, if n is large enough, it controls the impact of the term (x_i − x̄)² and makes it comparatively small. As a result, r_i may not differ considerably from d_i for large data sets.

6.4.3 PRESS Residuals


The full form of PRESS is the predicted residual error sum of squares. When the ith observation of the response variable (y_i) is an outlier, y_i will excessively influence the regression model fitted on all given observations: the fitted line is pulled towards the outlier, so the ith residual r_i = y_i − ŷ_i will be small, and it becomes challenging to identify the outlier. So we delete the ith observation from the data and fit the regression model based on the remaining observations. We then determine the ith predicted value from this model fitted after eliminating the ith observation (denoted by ŷ_(i)), which is not affected by the ith observation. In this case, the PRESS residual can help to identify the presence of an outlier quickly.
To determine the ith PRESS residual, we first determine the fitted regression model using all observations of the response and explanatory variables except the ith one. For this, we delete the ith observation from the data and then fit the regression model using the remaining (n − 1) observations. The fitted multiple regression model after excluding the ith observation is given as:

    Ŷ_(i) = β̂_(i)0 + β̂_(i)1 X_(i)1 + β̂_(i)2 X_(i)2 + … + β̂_(i)k X_(i)k … (16)

where Ŷ_(i), the X_(i)j's and the regression parameters β̂_(i)j's have their usual meanings, except that the subscript (i) denotes deletion of the ith observation from the data.

If the ith fitted value of the response variable using the regression model defined in equation (16) is denoted by ŷ_(i), we define the ith PRESS residual as:

    r_(i) = y_i − ŷ_(i); i = 1, 2, …, n … (17)

The PRESS residuals are also known as deleted residuals. Further, they are known as prediction errors, since we use them to obtain the prediction error sum of squares. We will also use the PRESS residuals in computing the PRESS statistic in Unit 7 for model validation, by examining the prediction ability of a model when comparing two or more regression models.

The relationship between the ordinary residual (r_i) and the PRESS residual (r_(i)) can be expressed as:

    r_(i) = r_i/(1 − h_ii) … (18)

From equation (18), notice that it is not necessary to fit a separate regression model excluding each observation in order to obtain the PRESS residuals; we can compute them directly from the ordinary residuals (r_i). Equation (18) also lets us view the PRESS residuals as weighted ordinary residuals, with weights determined by the diagonal elements of H. Looking at this relationship, we notice that residuals corresponding to points with large h_ii will have large PRESS residuals, and such points will usually be high-influence points. Remember that a large difference between r_i and r_(i) signals a point where the model apparently fits the data well, but would predict poorly without that point.
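As a check on equation (18), the following R sketch computes the PRESS residuals both by actually refitting the model without each observation and by the shortcut formula; the two routes agree:

    n <- nrow(yield)

    # Route 1: leave-one-out refits, r_(i) = y_i − ŷ_(i), equation (17)
    press_loo <- sapply(seq_len(n), function(i) {
      fit_i <- lm(Yield ~ Rainfall + Area, data = yield[-i, ])
      yield$Yield[i] - predict(fit_i, newdata = yield[i, , drop = FALSE])
    })

    # Route 2: shortcut from the full fit, r_(i) = r_i / (1 − h_ii), equation (18)
    press_fast <- resid(fit) / (1 - hatvalues(fit))

    all.equal(unname(press_loo), unname(press_fast))   # TRUE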
The expected value (mean) of the PRESS residuals will be
E(r(i) ) = 0 … (19)

We can determine the variance of the ith PRESS residual as:



    Var(r_(i)) = Var[r_i/(1 − h_ii)] = Var(r_i)/(1 − h_ii)² … (20)

From equation (10), we have Var(r_i) = σ²(1 − h_ii). We now obtain Var(r_(i)) as:

    Var(r_(i)) = σ²(1 − h_ii)/(1 − h_ii)² = σ²/(1 − h_ii) … (21)

The variance of the ith PRESS residual for unknown σ², using the estimate σ̂² = MSS_Res, is given as:

    Var(r_(i)) = MSS_Res/(1 − h_ii) … (22)

We define the ith standardised PRESS residual as:

    s*_(i) = (r_(i) − E(r_(i)))/√Var(r_(i)) = [r_i/(1 − h_ii)]/√(σ²/(1 − h_ii)) = r_i/√(σ²(1 − h_ii)) … (23)

When σ² is unknown, we obtain the ith standardised PRESS residual as:

    s*_(i) = r_i/√(MSS_Res(1 − h_ii)) … (24)

From equations (13) and (24), we can see that the formula for the standardised PRESS residual coincides with the studentised residual when we use the estimate σ̂² = MSS_Res.

6.4.4 R-student
As mentioned in Section 6.4.2, we conventionally estimate σ² with MSS_Res to calculate the studentised residual (d_i), which is generally used to detect an outlier. Note that we determine MSS_Res by fitting the regression model to all n observations; this kind of scaling of the residuals is termed internal scaling. There is another scaling, termed the externally studentised residual, also known as R-student. Here, we estimate σ² after removing the ith observation from the given data to obtain the ith R-student; because the estimate of σ² excludes the ith observation, it is called the externally studentised residual. The R-student residuals are more powerful in detecting outliers and in checking the assumption of equal variance.

The estimate of σ² after removing the ith observation is given as follows:

    MSS_Res(i) = [1/(n − k − 1)][(n − k)MSS_Res − r_i²/(1 − h_ii)] … (25)

We determine the ith R-student as:

    d_(i) = r_i/√(MSS_Res(i)(1 − h_ii)); i = 1, 2, …, n … (26)

Note the difference between the standardised PRESS residuals defined in equation (24) and the R-student residuals given in equation (26): the former use MSS_Res computed from the model fitted on all observations, whereas the latter use MSS_Res(i), based on the model fitted after excluding the ith observation.

Notice that MSS_Res(i) will differ significantly from MSS_Res if the ith observation is influential; in this case, the R-student will be more responsive (sensitive) to that point. Otherwise, there will not be much difference between d_(i) and d_i. Also note that d_(i) follows a t-distribution with (n − k − 1) degrees of freedom under the usual regression model assumptions.
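A short R sketch of equations (25) and (26) for the fit object used earlier. (R's built-in rstudent() also returns externally studentised residuals, but with the standard textbook degrees-of-freedom bookkeeping, which differs slightly from the convention printed in equation (25); the code below follows the unit's formula so that the output matches Column 8 of Table 6.4.)

    r <- resid(fit)
    h <- hatvalues(fit)
    n <- length(r)
    k <- length(coef(fit)) - 1                 # number of explanatory variables

    MSSres   <- deviance(fit) / (n - k - 1)    # equation (5)
    MSSres_i <- ((n - k) * MSSres - r^2 / (1 - h)) / (n - k - 1)  # equation (25)

    d_ext <- r / sqrt(MSSres_i * (1 - h))      # R-student, equation (26)
    round(d_ext, 4)                            # compare with Column 8 of Table 6.4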
It is always desirable to check the behaviour of the residuals. Residuals
possibly explain any irregularity of the fitted regression model that might have
occurred and misled us. Before discussing the different residual plots, let us
first delve into the computation of the scaled residuals using the following
example.
Example 2: For the data mentioned in Example 1, obtain the standardised,
studentised, PRESS and R-student residuals. Also, check the presence of any
outliers in the given data.
Solution: It is given that n = 20 and k = 2. We determined the values of the residuals in Example 1. From that solution, we compute the estimate of σ using equation (5) as follows:

    σ̂² = MSS_Res = 595657.5015/(20 − 2 − 1) = 35038.6766, so σ̂ = √35038.6766 = 187.1862
Now, with the help of this example, you should try to understand the above-described theoretical framework for computing the scaled residuals. We obtain the scaled residuals at the different data points by constructing Table 6.4 with eight columns. The description of these columns is as follows:

1. The second column contains the residuals determined in Table 6.2 for all observations of the response variable.

2. We compute the standardised residuals using the formula s_i = r_i/σ̂ given in equation (6) and arrange them in Column 3.

3. Before computing the other scaled residuals, we determine the ith diagonal element of the hat matrix H = X(X′X)⁻¹X′, i.e., h_ii, in Column 4.

4. In the fifth column, we present the studentised residuals using the formula d_i = r_i/√(MSS_Res(1 − h_ii)), based on the values of r_i and h_ii given in Columns 2 and 4, respectively.

5. As per equation (18), the PRESS residual can be expressed as r_(i) = r_i/(1 − h_ii). Therefore, in Column 6, we calculate r_(i) by dividing Column 2 by (1 − Column 4).

6. As discussed for equation (24), the standardised PRESS residual s*_(i) = r_i/√(MSS_Res(1 − h_ii)) is the same as the studentised residual computed in Column 5.

7. Column 7 is devoted to the estimate of σ² (given by MSS_Res(i)) after eliminating the ith observation from the data, calculated using the formula MSS_Res(i) = [1/(n − k − 1)][(n − k)MSS_Res − r_i²/(1 − h_ii)].

8. To calculate the R-student (d_(i)), we use the formula d_(i) = r_i/√(MSS_Res(i)(1 − h_ii)); Column 8 presents these values.

Note: All calculations were performed to 15 fixed decimal places to show accurate results in this block. For simplicity, we display the results up to 4 decimal places only; the results may vary if the calculations are carried out with values fixed at different decimal places.

Here, we compute the scaled residuals for all observations given in Example 1 as follows:

Table 6.4: Computation of the scaled residuals

S. No.  Residual r_i = (y_i − ŷ_i)  Standardised Residual s_i  h_ii  Studentised Residual d_i = s*_(i)  PRESS Residual r_(i)  MSS_Res(i)  R-student d_(i)
(1) (2) (3) (4) (5) (6) (7) (8)
1 33.9168 0.1812 0.0683 0.1877 36.4050 37027.1432 0.1826
2 –229.1898 –1.2244 0.1319 –1.3142 –264.0246 33540.2600 –1.3432
3 –114.1964 –0.6101 0.2344 –0.6972 –149.1557 36097.8317 –0.6869
4 77.0971 0.4119 0.1927 0.4584 95.5007 36666.6674 0.4481
5 –13.5496 –0.0724 0.0930 –0.0760 –14.9388 37087.8684 –0.0739
6 –135.3556 –0.7231 0.0849 –0.7559 –147.9179 35922.0395 –0.7466
7 –17.7679 –0.0949 0.0556 –0.0977 –18.8131 37080.1123 –0.0949
8 –232.9433 –1.2444 0.1900 –1.3827 –287.5966 33158.9693 -1.4214
9 –35.7800 –0.1911 0.0560 –0.1967 –37.9026 37020.0013 –0.1914
10 60.2442 0.3218 0.1004 0.3393 66.9663 36862.4613 0.3308
11 40.7899 0.2179 0.3487 0.2700 62.6309 36949.4981 0.2629
12 –38.5359 –0.2059 0.1527 –0.2236 –45.4784 36996.6841 –0.2176
13 –51.0290 –0.2726 0.3168 –0.3298 –74.6898 36875.5784 –0.3215
14 616.5919 3.2940 0.1319 3.5354 710.2686 11338.2547 6.2149
15 49.6178 0.2651 0.1403 0.2859 57.7155 36931.3213 0.2785
16 –67.9700 –0.3631 0.1254 –0.3883 –77.7153 36789.0512 –0.3789
17 49.6178 0.2651 0.1403 0.2859 57.7155 36931.3213 0.2785
18 –169.7759 –0.9070 0.2621 –1.0558 –230.0656 34802.1519 –1.0594
19 40.0422 0.2139 0.0676 0.2215 42.9434 36998.6253 0.2156
20 138.1755 0.7382 0.1071 0.7812 154.7405 35842.0484 0.7724

Since the standardised residual for the 14th observation exceeds 3, it indicates an outlier in the given data.
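For reference, the whole of Table 6.4 can be assembled in a few lines of R from the objects built in the earlier sketches (fit, r, h, MSSres, MSSres_i and d_ext):

    # Assemble the columns of Table 6.4 from the quantities computed earlier
    scaled <- round(data.frame(
      r_i      = r,                 # Column 2: ordinary residuals
      s_i      = r / sqrt(MSSres),  # Column 3: standardised, equation (6)
      h_ii     = h,                 # Column 4: diagonal of the hat matrix
      d_i      = rstandard(fit),    # Column 5: studentised (= standardised PRESS)
      press    = r / (1 - h),       # Column 6: PRESS residuals, equation (18)
      MSSres_i = MSSres_i,          # Column 7: equation (25)
      d_ext    = d_ext              # Column 8: R-student, equation (26)
    ), 4)

    scaled[abs(scaled$s_i) > 3, ]   # observation 14 is flagged as an outlier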
We now give the following Self Assessment Question to practise computing the scaled residuals for a simple regression model.

SAQ 3
For the data on the sales and price of the product given in SAQ 2, determine
the scaled residuals.

After obtaining the residuals and scaled residuals, we now discuss how to check that the underlying assumptions, namely linearity, constant variance and normality, hold for the fitted regression model. We use graphical methods to verify the validity of these assumptions and to see how well the regression model fits the given data. For this purpose, we draw the residual and normal probability plots, which we discuss in the following sections, one at a time.

Note: Usually, we prefer to plot the studentised residuals because they have constant variance.

6.5 RESIDUAL PLOTS


We now explore the graphical verification of the adequacy of the fitted regression model by validating the underlying assumptions. Analysing the residuals graphically is a very effective way to do this. We may use a variety of residual plots to check the various assumptions, and we will discuss some of them here in this unit. You will study how to generate these graphs using R software in MSTL-012.

Note: The residual plot can be used to detect the non-linearity and/or unequal variance assumption(s). A normal probability plot of the residuals can be used to detect the non-normality assumption.

The residuals or scaled residuals can be plotted against either of the following:

(i) Residual plot using predicted values of the response variable

For this residual plot, we take the residuals (r_i) or one of the scaled residuals (i.e., s_i, d_i, s*_(i), r_(i) or d_(i)) on the vertical axis (Y-axis) and the predicted values (ŷ_i) of the response variable on the horizontal axis (X-axis).

(ii) Residual plot using explanatory variables


We can also create a residual plot by plotting the residuals or scaled residuals against the corresponding values of each explanatory variable considered in the model. This is also very useful for identifying outliers, non-constant variance or a non-linear relationship in the case of multiple linear regression. Here, we take the residuals or scaled residuals on the Y-axis and, on the X-axis, the explanatory variable X for a simple regression model or the jth explanatory variable X_j (j = 1, 2, …, k) for a multiple regression model.

The difference between these plots lies only in the level or scale of the X-axis. As you know, the predicted values (ŷ_i) are linear combinations of the values of the explanatory variable (X) in the case of a simple linear regression model, so we may plot the residuals (or scaled residuals) versus either the predicted values (ŷ_i) or the explanatory variable (X). That is why it is not essential to plot the residuals versus the explanatory variable separately for a simple linear regression model.
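A minimal R sketch of both kinds of residual plot for the yield model of Example 1 (studentised residuals on the vertical axis, following the note above):

    d <- rstandard(fit)                      # studentised residuals

    # (i) Residuals versus the predicted values of the response variable
    plot(fitted(fit), d, xlab = "Predicted Yield", ylab = "Studentised Residual")
    abline(h = 0, lty = 2)                   # horizontal reference line

    # (ii) Residuals versus each explanatory variable
    plot(yield$Rainfall, d, xlab = "Annual Rainfall", ylab = "Studentised Residual")
    abline(h = 0, lty = 2)
    plot(yield$Area, d, xlab = "Area", ylab = "Studentised Residual")
    abline(h = 0, lty = 2)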

6.5.1 Various Forms of the Residual Plots


We elucidate the various possible forms of the residual plots for a regression
model shown as follows:

(i) Ideal plot / horizontal band


If a plot displays randomly scattered points forming a band of approximately constant width around a horizontal line, it is known as an ideal residual plot. If the linear regression model (or mean function) and its related assumptions are correct, then plots of the residuals (or scaled residuals) against any standard choice, including the fitted values of the response variable or the explanatory variables, should look like this ideal plot. An ideal plot indicates no issues with the fitted model. A satisfactory residual plot should stay more or less within a horizontal band of points (random pattern), as shown in Fig. 6.4.

Fig. 6.4: Horizontal band.

The residual plot will not be randomly scattered when the fitted model
deviates from any assumptions. Some common types of variation in the
residual plots are discussed as follows:
(ii) Variation from the assumption of constant variance
Suppose the points of a residual plot form a non-constant band, i.e., one that widens or narrows. This indicates a deviation from the constant variance assumption. With heteroscedastic data (non-constant variance), we obtain residual plots similar to the patterns shown in Figs. 6.5 to 6.7; the models depicting these patterns may account for the variability poorly.

Fig. 6.5: Outward funnel shape.

Fig. 6.5 shows an outward (right-opening) funnel pattern, indicating that the variance of the error terms is not constant and increases with the response variable. Fig. 6.6 shows an inward (left-opening) funnel pattern, indicating that the error variance decreases with the response variable.

Fig. 6.6: Inward funnel shape.

The variance of the error terms is also considered non-constant if the residuals show a pattern contained inside a double bow, as in Fig. 6.7. This also shows inequality of the variance. We generally observe the double-bow pattern when Y is a proportion lying between 0 and 1.

Fig. 6.7: Double bow pattern.

We should first try simple remedies for treating non-constant variance, such as a variance-stabilising transformation. We will discuss variance-stabilising transformations in Unit 8.
(iii) Variation from the assumption of model linearity
As discussed, if points of a residual plot are randomly dispersed around
the horizontal axis, a linear regression model is appropriate for the data.

Y

(a)
→X
0

175
Block 2 Model Adequacy Checking

Y

(b)
0 →X

Fig. 6.8: Non-linear patterns.

If we observe some curvature pattern, as in Fig. 6.8, it indicates an incorrectly specified linear regression model and signals that a non-linear regression model should be considered. If the relationship between Y and X is non-linear, we may observe residual patterns similar to Fig. 6.8. In this situation, a curvilinear relationship is suggested, and we should fit an appropriate regression model: we may apply a suitable transformation to the explanatory variable (X), the response variable (Y), or both. We may also try polynomial regression or add interaction terms to eliminate the curvilinear pattern.
(iv) Some other patterns of the residual plots
The residual plots shown in Fig. 6.9 exhibit combinations of non-linearity and non-constant variance.

Fig. 6.9: Non-linear and non-constant variance patterns, panels (a) and (b).

A trend may also be exhibited in the residual plot, as shown in Fig. 6.10. In this type of plot, there is likely to be an error in the computation of the regression model, and additional regression analysis may be needed in the model formulation. Note that we also use the residual plots to identify outliers (points with large residual values); these occur towards the extreme ends, away from most of the points. However, sometimes apparent outliers may instead indicate non-constant variance or a non-linear relationship between Y and X, so you should keep these possibilities in mind and examine them instead of treating such points only as outliers. Also, some outliers strongly influence the regression estimates, and the modified model may change the status of the outliers.

Fig. 6.10: (a) Upward trend shape and (b) downward trend shape.

You may now like to solve the following example to understand the residual
plots better.
Example 3: For the yield data given in Example 1, construct and interpret the residual plots of the scaled residuals against (i) the predicted values of yield (ŷ_i), (ii) rainfall (X1) and (iii) area under crop (X2).

Solution: (i) To obtain a residual plot, we take the predicted values (ŷ_i) on the horizontal axis and one of the scaled residuals on the vertical axis. We then mark the points corresponding to the scaled residuals (computed in Table 6.4) against the fitted values of the response variable obtained in Table 6.2. In this way, we obtain the residual plots of the scaled residuals with respect to the predicted values of the response variable, shown in Fig. 6.11.

Fig. 6.11: Residual plots (standardised, studentised, PRESS and R-student residuals) against the predicted values of yield (ŷ_i).

Similarly, we can plot the rainfall (X1) against the scaled residuals. The
resulting residual plots are shown in Fig. 6.12.

Fig. 6.12: Residual plots (standardised, studentised, PRESS and R-student residuals) against annual rainfall (X1).

The residual plots of the scaled residuals against the area under crop (X2) are presented in Fig. 6.13.

Fig. 6.13: Residual plots (standardised, studentised, PRESS and R-student residuals) against area (X2).

The scaled residual plots shown in Figs. 6.11 and 6.12 appear to follow an approximately curvilinear pattern. Hence, the linearity assumption does not seem valid; in other words, the linear regression model does not fit the given data well. Fig. 6.13 shows a slight deviation from the assumption of constant error variance. You can also infer from Figs. 6.11 to 6.13 that there is a possible outlier corresponding to the 14th observation.
You may like to pause here and check your understanding of the construction
of residual plots by solving an exercise. Now, you can try the following Self
Assessment Question.

SAQ 4
For the scaled residuals computed in SAQ 3 for the data given on sales and
price of a product, construct the residual plots corresponding to the predicted
values of sales versus (i) standardised residuals and (ii) studentised residuals.

Now that you are familiar with the residual plots, we discuss the normal plots used to check the normality assumption.

6.6 NORMAL PLOTS


As discussed in "MST-012: Probability and Probability Distributions", the central limit theorem ensures that a sufficiently large sample approximately satisfies the normality assumption. So, we can assume that the distribution of the residuals will be approximately normal for a sufficiently large sample. However, you should be cautious about possible violations of the normality assumption when dealing with small samples. As you know, the assumption of normality (i.e., that the error terms are normally distributed) plays an important role in regression analysis for small samples. You learnt in "MST-016: Statistical Inference" that statistical tests rely upon certain assumptions about the variables, including normality, which are essential for the validity of the results. When these assumptions are not satisfied, the results may not be reliable.
In regression analysis, the assumption of normality of error terms is essential
since the t-test, F-test, and confidence intervals depend on it. In this section,
we construct the normal plots to verify the normality assumption. Suppose that, for a sample of size n, we divide the area under a normal curve into n equal parts, each part corresponding to one observation with cumulative probability 1/n.

Fig. 6.14: Area of the normal curve.

The normality assumption of error terms is essential while fitting a linear regression
model. We can plot the histogram to get an idea of normality. However, the normal
probability plot is widely used to validate the assumption of normality, and it is
easier to interpret than the histogram. You will come across various normal plots
in the literature depending on the variables considered on the X and Y axes of
the plots. They are:
▪ Normal probability plot: In a normal probability plot, we plot the ordered
residuals/standardised residuals against the theoretical cumulative
probabilities of a normal distribution (which is known as cumulative area
under the normal curve or theoretical cumulative density function (CDF)).
We consider either theoretical cumulative probability (pi) or percentile
theoretical cumulative probabilities (Pi) on the vertical axis (Y-axis) and the
ordered standardised residuals on the horizontal axis (X-axis) to obtain the
normal probability plot.
▪ Normal probability-probability plot: This plot is also known as the P-P
plot. This plot marks sample cumulative probabilities of the observed data
(CDF of the ordered residuals/standardised residuals) against the
theoretical cumulative probabilities on the X and Y axes, respectively. Note
that the sample cumulative and theoretical cumulative probabilities are also
known as empirical and expected cumulative probabilities, respectively.
▪ Normal quantile plot: It is similar to the normal probability plot. The main difference is that the ordered residuals/standardised residuals are plotted against the theoretical quantiles (also known as normal scores or Z-scores) instead of the theoretical cumulative probabilities.
▪ Normal quantile-quantile plot: This plot is also known as the Q-Q plot. In
this plot, we draw sample quantiles of the observed data (normal scores of
the ordered residuals/standardised residuals) against the theoretical
quantiles (i.e., quantiles of a normal distribution).
Note that the normal quantile plot and the Q-Q plot are the same here. The term Q-Q plot is more general: the normal quantile plot is a specific type of Q-Q plot which compares the quantiles of a dataset to the quantiles of a normal distribution to assess whether the data follow a normal distribution. More generally, a Q-Q plot compares data to any theoretical distribution (e.g., normal, exponential, etc.), while the normal quantile plot compares against the quantiles of a normal distribution. Since we need to check the
normality assumptions here, the distribution being tested is the normal
distribution. The sample quantiles of the observed data (normal scores of the
ordered standardised residuals) will be the ordered standardised residuals
themselves.
Remember that we compare the sample cumulative distribution function of
given data with a specified theoretical cumulative distribution function with the
help of the P-P plot. At the same time, the quantiles of a data distribution are
compared with the quantiles of a standard theoretical distribution from a
specified family of distributions by Q-Q plot.
The interpretations of these normal plots will be the same. If the points on the
normal plot lie along a straight line, it implies that the error terms are
approximately normally distributed. In contrast, deviation from the line
indicates a departure from the normality assumption. The formulae involved in the construction of these plots are as follows:
1. If s1, s2, s3, …, sn are the standardised residuals computed using
equation (6), we obtain the ordered standardised residuals by arranging
them in increasing order as:
s(1) < s(2) < s(3) < … < s(n)
2. We assign ranks to the ordered standardised residuals. Let R1, R2, R3,
…, Rn be the ranks assigned to the ordered standardised residuals s(1),
s(2), s(3), …, s(n), respectively.
3. After ranking the standardised residuals, we compute the theoretical (expected) cumulative probability (p_i) for each ranked standardised residual as:

    p_i = (R_i − 0.5)/n; i = 1, 2, 3, …, n … (27)

4. We can also express the cumulative probability as a percentage by multiplying p_i by 100; this is known as the percentile theoretical cumulative probability (P_i), or simply percentiles. Thus:

    P_i = p_i × 100 = [(R_i − 0.5)/n] × 100; i = 1, 2, 3, …, n … (28)

5. To construct the Q-Q plot, we compute the expected normal values instead of the cumulative probabilities used in the P-P plot, as follows:

    z_i = Φ⁻¹(p_i) = Φ⁻¹[(R_i − 0.5)/n] … (29)

where Φ denotes the standard normal cumulative distribution function.
Note that we subtract ½ as a correction factor, since all observations scattered inside a band are assumed to be clustered at the midpoint of the band (Fig. 6.14). Some software uses correction factors other than ½ to compute p_i. An important point to note here is that we judge the straight line visually, usually focusing on the central values rather than those at the extreme ends. A significant departure from a straight line reveals a non-normal distribution.

If the normality assumption is satisfied, the resulting points should lie approximately along the straight line Y = X (see Fig. 6.15). If the sample comes from a normal distribution, the points on the Q-Q plot will closely follow this straight line. In the normal probability plot, we visually determine the straight line passing through the central points rather than the extreme points at both ends. You will come across different plot patterns while dealing with normal probability plots.

Fig. 6.15: Points lying along the straight line Y = X.
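The computations in equations (27) to (29) translate directly into R; a minimal sketch for the fit object used earlier (for samples of more than ten observations, R's own qqnorm() uses this same ½ correction factor internally):

    s <- resid(fit) / summary(fit)$sigma   # standardised residuals, equation (6)
    n <- length(s)

    R_i <- rank(s)                # ranks of the standardised residuals
    p_i <- (R_i - 0.5) / n        # theoretical cumulative probabilities, equation (27)
    P_i <- 100 * p_i              # percentile form, equation (28)
    z_i <- qnorm(p_i)             # expected normal values (theoretical quantiles), (29)

    # Normal probability plot: ordered standardised residuals versus p_i
    plot(sort(s), sort(p_i),
         xlab = "Ordered Standardised Residual",
         ylab = "Theoretical Cumulative Probability")

    # Normal quantile (Q-Q) plot: ordered standardised residuals versus z_i
    plot(sort(s), sort(z_i),
         xlab = "Ordered Standardised Residual",
         ylab = "Expected Normal Value")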

6.6.1 Some Patterns of the Normal Plots


The patterns or interpretation of the normal probability plot, P-P plot, normal
quantile plot or Q-Q plot will be the same. Here, let us discuss some of the
common patterns of normal probability plots given as follows:

(i) If all points lie approximately along a straight line, as shown in Fig. 6.16,
the normal probability plot is considered ideal and will satisfy the
normality assumption. A substantial amount of departure from the
straight line indicates non-normality.

Fig. 6.16: Ideal normal probability plot.

(ii) The normal probability plots shown in Figs. 6.17 (a) and (b) do not lie along a straight line. These patterns indicate a problem with the assumption of normality: a sharp upward or downward curve at both ends of the plot shows that the tails of the distribution are heavier than those of the normal distribution underlying the ideal plot in Fig. 6.16. The normal probability plots shown in Fig. 6.17 thus deviate from the normality assumption.

Fig. 6.17: Heavier-tailed normal probability plots, panels (a) and (b).

(iii) In Figs. 6.18 (a) and (b), the normal probability plots show sudden
upward and downward changes in the direction of the trend. This sharp
change in the direction indicates positively and negatively skewed
underlying distributions, respectively.

Fig. 6.18: Skewed normal probability plots, panels (a) and (b).

Remember that normally distributed observations may not always lie exactly along a straight line on a normal probability plot, and a normal probability plot may deviate considerably from a straight line for small sample sizes. Expertise and experience in the related field are therefore advantageous when interpreting normal probability plots. We generally need at least n = 20 observations in a sample to generate a stable and interpretable normal probability plot. Sometimes the normal probability plot cannot be trusted to identify departures from the normality assumption; in some cases, the departure from a straight line may be due to the presence of outliers.

Let us now solve an example to explain the construction of a normal probability plot.

Example 4: Construct the normal probability plot, P-P plot and normal
quantile plot for the data on the crop yield considered in Example 1.
Solution: In Examples 1 and 2, we have already computed the residuals and
the standardised residuals for the data mentioned in Table 6.1. To draw the
normal probability and quantile plots, we calculate the ordered standardised
residuals, their ranks, percentiles and expected normal values (theoretical
quantile). We arrange the standardised residuals computed in Column 3 of
Table 6.4 in increasing order (up to two decimal places for simplicity), as
shown in Column 2 of Table 6.5. You should note that we can also assign
ranks to the standardised residuals without arranging them in increasing order.
After arranging the ranks in Column 3, we calculate the cumulative
probabilities and percentile cumulative probabilities using equations (27) and
(28) for the corresponding ordered standardised residuals and enter them in
Columns 4 and 5, respectively. We obtain the theoretical quantiles (expected
normal values) using equation (29) in Column 6. We then compute the sample
cumulative probabilities corresponding to the ordered standardised residuals
(Column 2) using negative and positive cumulative z-values given in Tables III
and IV of the appendix given at the end of this volume. The computed sample
cumulative probabilities are placed in Column 7 of Table 6.5.
Table 6.5: Computation of percentile cumulative probabilities and theoretical quantiles

S. No.  Ordered Standardised Residual (s_(i))  Rank (R_i)  Theoretical Cumulative Probability (p_i)  Percentile Theoretical Cumulative Probability (P_i)  Theoretical Quantile (z_i)  Sample Cumulative Probability (ps_i)
(1) (2) (3) (4) (5) (6) (7)
1 – 1.24 1 0.0250 2.50 –1.95996 0.1075
2 – 1.22 2 0.0750 7.50 –1.43953 0.1112
3 – 0.91 3 0.1250 12.50 –1.15035 0.1814
4 – 0.72 4 0.1750 17.50 –0.93459 0.2358
5 – 0.61 5 0.2250 22.50 –0.75542 0.2709
6 – 0.36 6 0.2750 27.50 –0.59776 0.3594
7 – 0.27 7 0.3250 32.50 –0.45376 0.3936
8 – 0.21 8 0.3750 37.50 –0.31864 0.4168
9 – 0.19 9 0.4250 42.50 –0.18912 0.4247
10 – 0.09 10 0.4750 47.50 –0.06271 0.4641
11 – 0.07 11 0.5250 52.50 0.062707 0.4721
12 0.18 12 0.5750 57.50 0.189118 0.5714
13 0.21 13 0.6250 62.50 0.318639 0.5832
14 0.22 14 0.6750 67.50 0.453762 0.5871
15 0.27 15 0.7250 72.50 0.59776 0.6064
16 0.27 16 0.7750 77.50 0.755415 0.6064
17 0.32 17 0.8250 82.50 0.934589 0.6255
18 0.41 18 0.8750 87.50 1.150349 0.6591
19 0.74 19 0.9250 92.50 1.439531 0.7704
20 3.29 20 0.9750 97.50 1.959964 0.9995

Then, we plot (i) the ordered standardised residuals against the theoretical cumulative probabilities and (ii) the sample cumulative probabilities against the theoretical cumulative probabilities to obtain the normal probability and P-P plots, respectively, as shown in Fig. 6.19 (a) and (b).
Fig. 6.19: (a) Normal probability plot and (b) P-P plot.

In the same way, we can plot the normal quantile plot (or Q-Q plot) as shown
in Fig. 6.20.
Fig. 6.20: Normal quantile plot.

Note that not all of the resulting points lie along a straight line in Figs. 6.19 and 6.20: some points deviate slightly from the line, and one point lying far away is an outlier. This indicates that the distribution of the error terms (residuals) deviates from normality.
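R's built-in functions reproduce the normal quantile plot directly; a brief sketch for the fit object (the Shapiro-Wilk test at the end is an optional formal companion check not covered in this unit):

    s <- resid(fit) / summary(fit)$sigma   # standardised residuals

    # Built-in normal Q-Q plot; qqnorm() puts the theoretical quantiles on the
    # horizontal axis, i.e., the transpose of Fig. 6.20, but the reading is the same.
    qqnorm(s)
    qqline(s)          # reference line through the first and third quartiles

    shapiro.test(s)    # formal test of the normality of the residuals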
You can now try the following Self Assessment Question to practise constructing the normal probability and normal quantile plots.

SAQ 5
For the exercise given in SAQ 4, construct the normal probability and normal
quantile plots.

We now end this unit by summarising what you have learnt in it.

6.7 SUMMARY
• The difference between the ith observed and the corresponding predicted
(fitted) values of the response variable is defined as residual. The ith
residual is given as:

    r_i = y_i − ŷ_i; i = 1, 2, …, n

In matrix form, we can write:

    R = Y − Ŷ

where R = (r_1, r_2, …, r_n)′ and Ŷ = (ŷ_1, ŷ_2, …, ŷ_n)′.

• When the value of σ² is unknown, we can estimate it approximately from the given data as:

    σ̂² = MSS_Res = Σ_{i=1}^n r_i²/(n − k − 1) = Σ_{i=1}^n (y_i − ŷ_i)²/(n − k − 1)
• The ith standardised residual for unknown σ² is given as:

    s_i = r_i/σ̂, where σ̂² = MSS_Res

• We estimate the exact variance of the ith residual as:

    Var(r_i) = σ̂²(1 − h_ii) = MSS_Res(1 − h_ii), where 0 ≤ h_ii ≤ 1

• The ith studentised residual for unknown σ² is computed as:

    d_i = r_i/√Var(r_i) = r_i/√(MSS_Res(1 − h_ii))

• The ith PRESS residual, also known as the deleted residual, is given as:

    r_(i) = y_i − ŷ_(i); i = 1, 2, …, n

• The relationship between the ordinary residual (r_i) and the PRESS residual (r_(i)) can be expressed as:

    r_(i) = r_i/(1 − h_ii)

• The variance of the ith PRESS residual for unknown σ² is given as:

    Var(r_(i)) = MSS_Res/(1 − h_ii)

• When 2 is unknown, the ith standardised PRESS residual is obtained


as:
ri
*
s(i) =
MSSRe s (1 − hii )

• The estimated 2 after removing the ith observation can be determined


as:
1  ri2 
MSSRes(i) = (n − k)MSS − 
n − k −1 1 − hii 
Res

186
• The ith R-student is computed as:
Unit 6 Residual Analysis
ri
d(i) = ; i = 1, 2,...,n
MSSRes(i) (1 − hii )

• We perform residual analysis to check the validity of some basic


assumptions and ensure the adequacy of the regression model. We use
the residual plot to detect the non-linearity and/or unequal variance of
the error terms. In contrast, the normal probability plot is used to detect
non-normality in the error terms.
• The residual plot shows the residuals or scaled residuals on the vertical (Y) axis and the predicted values or the explanatory variable(s) on the horizontal (X) axis. If the points on a residual plot are randomly dispersed around a horizontal band, a linear regression model is appropriate for the given data; otherwise, a non-linear model may be more appropriate.
• The normality assumption is essential because the t-test, F-test and
confidence intervals depend on it. We construct the normal probability
plot/normal quantile plot to check the validity of the normality
assumption.
• We can also express the cumulative probability as a percentage by multiplying $p_i$ by 100. It is known as the percentile expected cumulative probability ($P_i$), or simply the percentiles. Thus,

  $P_i = p_i \times 100 = \left(\dfrac{R_i - \tfrac{1}{2}}{n}\right) \times 100; \quad i = 1, 2, 3, \dots, n$
• In the normal probability plot, we plot cumulative probabilities (pi) or
percentile cumulative probabilities (Pi) on the vertical (Y) axis and the
ordered standardised residuals on the horizontal (X) axis.
• The expected normal values can be computed as:

  $z_i = \Phi^{-1}(p_i) = \Phi^{-1}\left(\dfrac{R_i - \tfrac{1}{2}}{n}\right)$

  where $\Phi$ denotes the standard normal cumulative distribution function.


• The Q-Q plot compares the quantiles of a data distribution with the
quantiles of a standardised theoretical distribution from a specified family
of distributions.
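As a computational companion to the formulas summarised above, here is a minimal Python sketch (assuming NumPy, a response vector y, and a design matrix X whose first column is ones; the function name is ours, not part of the unit):

    import numpy as np

    def scaled_residuals(X, y):
        n, p = X.shape                          # p = k + 1 (intercept plus k regressors)
        k = p - 1
        H = X @ np.linalg.inv(X.T @ X) @ X.T    # hat matrix
        h = np.diag(H)                          # leverages h_ii
        r = y - H @ y                           # ordinary residuals
        mss_res = np.sum(r**2) / (n - k - 1)    # estimate of sigma^2
        s = r / np.sqrt(mss_res)                # standardised residuals
        d = r / np.sqrt(mss_res * (1 - h))      # studentised residuals
        press = r / (1 - h)                     # PRESS (deleted) residuals
        # deleted estimate of sigma^2, using the formula stated above
        mss_res_i = ((n - k) * mss_res - r**2 / (1 - h)) / (n - k - 1)
        r_student = r / np.sqrt(mss_res_i * (1 - h))
        return r, s, d, press, r_student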
You may like to solve the following exercises to acquire a better understanding of residual analysis.

6.8 TERMINAL QUESTIONS


1. Differentiate between internally and externally studentised residuals.
2. Explain the P-P and Q-Q plots.
3. To assess the relationship between the heights and diameters of trees, a researcher collected data on the diameters (in centimetres) and heights (in metres) of 20 trees. The resulting data are recorded in the following table:
Table 6.6: Diameter and height data of 20 trees

Diameter (cm) | Height (m) | Diameter (cm) | Height (m)


4.2 4.14 3.63 3.49
5.55 4.87 6.71 5.42
3.33 3.24 4.4 4.2
6.91 5.34 6.45 5.27
4.4 4.34 3 2.74
6.05 5.07 4.61 4.44
3.83 3.84 5 4.64
7.11 5.44 7.25 5.57
3.6 3.54 3.2 3.04
6.65 5.26 5.31 4.74

(i) Obtain the residuals, standardised residuals, studentised residuals, PRESS residuals and R-student values.

(ii) Construct the residual plots by plotting the standardised and studentised residuals against the diameters of the trees.

(iii) Also, draw the P-P and Q-Q plots.

6.9 SOLUTIONS / ANSWERS


Self Assessment Questions (SAQs)
1. Refer to Section 1.4.2 of Unit 1.
2. We can fit the regression model for the given data as described in Unit 2, which gives:

$\hat{Y} = -4466.1863 + 1186.1372\,X$

We now compute the predicted values of the response variable and the residuals for all observations given in the data. We arrange them in Table 6.7 as follows (a computational sketch is given after the table):

Table 6.7: Computation of the predicted values and residuals

S. No. | yi | Predicted Value (ŷi) | Residual ri = yi − ŷi | S. No. | yi | Predicted Value (ŷi) | Residual ri = yi − ŷi
1 4740 9494.1043 – 4754.1043 14 26580 37414.6854 – 10834.6854


2 2640 1490.2043 1149.7957 15 18960 29410.7855 – 10450.7855
3 21000 16381.1809 4618.8191 16 25000 25315.7669 – 315.7669
4 4740 5585.2229 – 845.2229 17 1020 2793.1648 – 1773.1648
5 12000 7446.5950 4553.4050 18 11280 21779.1600 – 10499.1600
6 21840 19359.3763 2480.6237 19 4620 10797.0647 – 6177.0647
7 28380 23454.3948 4925.6052 20 8340 4468.3997 3871.6003
8 47000 34994.9017 12005.0983 21 3360 6702.0461 – 3342.0461
9 10000 18056.4158 – 8056.4158 22 9360 13589.1228 – 4229.1228
10 41100 38345.3715 2754.6285 23 31680 27363.2762 4316.7238
11 35040 33133.5296 1906.4704 24 15000 16753.4554 –1753.4554


12 31260 30527.6087 732.3913 25 30000 18800.9646 11199.0354
13 19500 10983.2019 8516.7981
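For reference, the entries of Table 6.7 can be generated programmatically. The sketch below assumes NumPy and arrays x and y holding the SAQ data (not reproduced here); it is illustrative only.

    import numpy as np

    def fit_and_residuals(x, y):
        b1, b0 = np.polyfit(x, y, 1)    # least-squares slope and intercept
        y_hat = b0 + b1 * x             # predicted values, as in Table 6.7
        r = y - y_hat                   # residuals r_i = y_i - y_hat_i
        return b0, b1, y_hat, r

For the SAQ data, b0 and b1 should come out close to −4466.1863 and 1186.1372, the coefficients of the fitted line above.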

3. The predicted values of the response variable and the residuals are given in Table 6.7. As mentioned in Unit 3, we can compute the value of $\hat{\sigma}$ as:

$\hat{\sigma} = \sqrt{41732863.5406} = 6460.0978$

We now compute the diagonal elements of the hat matrix (hii), $MSS_{Res(i)}$, the standardised residuals, studentised residuals, PRESS residuals and R-student values for all observations given in the data and arrange them in Columns (3) to (8) of Table 6.8 as follows (a sketch for computing hii and $MSS_{Res(i)}$ is given after the table):
Table 6.8: Computation of the scaled residuals

(1) S. No. | (2) Residual ri = yi − ŷi | (3) hii | (4) MSSRes(i) | (5) Standardised Residual si = ri/σ̂ | (6) Studentised Residual di | (7) PRESS Residual r(i) | (8) R-student d(i)
1 –4754.1043 0.0670 42494059.3513 –0.7359 –0.7619 –5095.6728 –0.7550
2 1149.7957 0.1357 43480835.1138 0.1780 0.1914 1330.2514 0.1876
3 4618.8191 0.0416 42579551.9324 0.7150 0.7303 4819.2038 0.7230
4 –845.2229 0.0953 43513002.9529 –0.1308 –0.1376 –934.2589 –0.1347
5 4553.4050 0.0806 42566862.4313 0.7049 0.7351 4952.5331 0.7279
6 2480.6237 0.0402 43268586.8748 0.3840 0.3920 2584.5221 0.3849
7 4925.6052 0.0478 42439541.0064 0.7625 0.7814 5172.8226 0.7748
8 12005.0983 0.1283 36358863.3953 1.8583 1.9904 13772.0544 2.1324
9 –8056.4158 0.0401 40607487.4528 –1.2471 –1.2729 –8392.8778 –1.2904
10 2754.6285 0.1680 43150798.0250 0.4264 0.4675 3310.9257 0.4597
11 1906.4704 0.1094 43369894.1742 0.2951 0.3127 2140.6884 0.3068
12 732.3913 0.0868 43521797.9909 0.1134 0.1186 801.9910 0.1162
13 8516.7981 0.0589 40196240.9073 1.3184 1.3590 9049.7841 1.3847
14 –10834.6854 0.1562 37498247.7644 –1.6772 –1.8259 –12841.0767 –1.9262
15 –10450.7855 0.0784 38394455.8404 –1.6177 –1.6852 –11340.4146 –1.7569
16 –315.7669 0.0549 43542749.0038 –0.0489 –0.0503 –334.1005 –0.0492
17 –1773.1648 0.1216 43391707.1628 –0.2745 –0.2929 –2018.6845 –0.2872
18 –10499.1600 0.0434 38537403.9174 –1.6252 –1.6617 –10975.0147 –1.7292
19 –6177.0647 0.0598 41782797.0553 –0.9562 –0.9861 –6570.1744 –0.9856
20 3871.6003 0.1052 42818993.4526 0.5993 0.6336 4326.8607 0.6255
21 –3342.0461 0.0862 43015904.6269 –0.5173 –0.5412 –3657.3159 –0.5331
22 –4229.1228 0.0482 42730367.1769 –0.6547 –0.6710 –4443.0679 –0.6631
23 4316.7238 0.0653 42680568.2155 0.6682 0.6912 4618.2376 0.6834
24 –1753.4554 0.0411 43407929.1183 –0.2714 –0.2772 –1828.5925 –0.2718
25 11199.0354 0.0400 37867058.3227 1.7336 1.7693 11665.8604 1.8575
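Columns (3) and (4) of Table 6.8 can be obtained as follows. This is a minimal Python sketch for the simple linear regression case (one explanatory variable), assuming NumPy, the array x of explanatory values and the residuals r computed earlier; the closed form for hii used here is the standard one for a straight-line fit.

    import numpy as np

    def leverage_and_deleted_mss(x, r, mss_res, k=1):
        n = len(x)
        sxx = np.sum((x - x.mean())**2)
        h = 1.0 / n + (x - x.mean())**2 / sxx              # hat diagonals h_ii
        # deleted residual mean square MSSRes(i), per the Summary formula
        mss_res_i = ((n - k) * mss_res - r**2 / (1 - h)) / (n - k - 1)
        return h, mss_res_i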

4. The predicted values (ŷi), standardised residuals (si) and studentised residuals (di) are obtained in Table 6.8. We draw the residual plots of the predicted values versus (i) the standardised residuals and (ii) the studentised residuals, as shown in Fig. 6.21 (a) and (b), respectively.
[Figure: (a) residual plot of standardised residuals (Y axis) against predicted values (X axis); (b) residual plot of studentised residuals (Y axis) against predicted values (X axis).]

Fig. 6.21: Residual plots.

The residual plots (shown in Fig. 6.21) display an outward-opening (right-opening) funnel pattern. This pattern indicates that the error variance increases with the response variable. The linearity assumption is also not satisfied here, so we can say that the simple linear regression model does not fit the given data well.
5. We arrange the standardised residuals computed in Table 6.8 of the solution of SAQ 4 in increasing order, as shown in Column 2 of Table 6.9. After arranging the standardised residuals, we assign ranks to them in Column 3. We calculate the cumulative probabilities, percentile cumulative probabilities and expected normal values in Columns 4, 5 and 6 of Table 6.9, respectively (a computational sketch follows the table).
Table 6.9: Computation of cumulative probabilities and expected normal values

(1) S. No. | (2) Ordered Standardised Residual s(i) | (3) Rank (Ri) | (4) Cumulative Probability (pi) | (5) Percentile Cumulative Probability (Pi) | (6) Expected Normal Value
1 –1.6772 1 0.02 2.00 –2.0537
2 –1.6252 2 0.06 6.00 –1.5548
3 –1.6177 3 0.10 10.00 –1.2816
4 –1.2471 4 0.14 14.00 –1.0803
5 –0.9562 5 0.18 18.00 –0.9154
6 –0.7359 6 0.22 22.00 –0.7722
7 –0.6547 7 0.26 26.00 –0.6433
8 –0.5173 8 0.30 30.00 –0.5244
9 –0.2745 9 0.34 34.00 –0.4125
10 –0.2714 10 0.38 38.00 –0.3055
11 –0.1308 11 0.42 42.00 –0.2019
12 –0.0489 12 0.46 46.00 –0.1004
13 0.1134 13 0.50 50.00 0.0000
14 0.1780 14 0.54 54.00 0.1004

15 0.2951 15 0.58 58.00 0.2019

16 0.3840 16 0.62 62.00 0.3055

17 0.4264 17 0.66 66.00 0.4125

18 0.5993 18 0.70 70.00 0.5244

19 0.6682 19 0.74 74.00 0.6433

20 0.7049 20 0.78 78.00 0.7722

21 0.7150 21 0.82 82.00 0.9154

22 0.7625 22 0.86 86.00 1.0803

23 1.3184 23 0.90 90.00 1.2816

24 1.7336 24 0.94 94.00 1.5548

25 1.8583 25 0.98 98.00 2.0537
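Columns (4) to (6) of Table 6.9 follow mechanically from the ranks. A minimal Python sketch (assuming NumPy and SciPy; shown here for the first three ordered residuals only):

    import numpy as np
    from scipy import stats

    n = 25                             # total number of observations
    ranks = np.arange(1, 4)            # ranks of the first three ordered residuals
    p = (ranks - 0.5) / n              # cumulative probabilities: 0.02, 0.06, 0.10
    P = 100 * p                        # percentile cumulative probabilities
    z = stats.norm.ppf(p)              # expected normal values
    print(np.round(z, 4))              # [-2.0537 -1.5548 -1.2816], as in Table 6.9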

We plot the ordered (ranked) standardised residuals against the cumulative probabilities (percentiles) to construct the normal probability plot, as shown in Fig. 6.22.
[Figure: normal probability plot — ordered standardised residuals (X axis) against theoretical cumulative probabilities (Y axis).]

Fig. 6.22: Normal Probability Plot.

The ordered standardised residuals are plotted against the expected normal
values to obtain a normal quantile plot, as shown in Fig. 6.23.
[Figure: normal quantile plot — ordered standardised residuals (X axis) against theoretical quantiles, i.e. expected normal values (Y axis).]

Fig. 6.23: Normal Quantile Plot.



The resulting points do not lie approximately on a straight line, as shown in Figs. 6.22 and 6.23. Notice that some points deviate far from the straight line. This indicates that the error terms are not normally distributed.

Terminal Questions (TQs)
1. Refer to Sections 6.4.2 and 6.4.4.
2. Refer to Section 6.6.
3. (i) The simple regression model for the given data can be fitted as:

$\hat{Y} = 1.4239 + 0.5945\,X$

We also determine $\hat{\sigma} = \sqrt{0.0459} = 0.2142$.

We compute the predicted values of the response variable, the residuals, the diagonal elements of the hat matrix (hii), $MSS_{Res(i)}$, and the standardised, studentised, PRESS and R-student residuals for all observations in Table 6.10, given as follows:
Table 6.10: Computation of residuals and scaled residuals

S. No. | Predicted Value (ŷi) | Residual ri = yi − ŷi | hii | MSSRes(i) | Standardised Residual si = ri/σ̂ | Studentised Residual di | PRESS Residual r(i) | R-student d(i)

1 3.9206 0.2194 0.0689 0.0456 1.0242 1.0615 0.2357 1.0652


2 4.7231 0.1469 0.0562 0.0472 0.6858 0.7059 0.1557 0.6963
3 3.4034 –0.1634 0.1265 0.0467 –0.7627 –0.8161 –0.1871 –0.8086
4 5.5315 –0.1915 0.1376 0.0461 –0.8940 –0.9627 –0.2221 –0.9607
5 4.0395 0.3005 0.0611 0.0431 1.4028 1.4478 0.3201 1.4940
6 5.0203 0.0497 0.0751 0.0483 0.2320 0.2412 0.0537 0.2351
7 3.7006 0.1394 0.0887 0.0473 0.6506 0.6815 0.1529 0.6716
8 5.6504 –0.2104 0.1576 0.0455 –0.9822 –1.0701 –0.2498 –1.0744
9 3.5639 –0.0239 0.1045 0.0484 –0.1116 –0.1179 –0.0267 –0.1148
10 5.3770 –0.1170 0.1147 0.0476 –0.5460 –0.5803 –0.1321 –0.5699
11 3.5817 –0.0917 0.1023 0.0479 –0.4282 –0.4519 –0.1022 –0.4422
12 5.4126 0.0074 0.1197 0.0484 0.0344 0.0366 0.0084 0.0356
13 4.0395 0.1605 0.0611 0.0469 0.7494 0.7734 0.1710 0.7649
14 5.2581 0.0119 0.0995 0.0484 0.0556 0.0586 0.0132 0.0571
15 3.2072 –0.4672 0.1585 0.0340 –2.1809 –2.3774 –0.5552 –2.7609
16 4.1643 0.2757 0.0552 0.0440 1.2869 1.3240 0.2918 1.3525
17 4.3961 0.2439 0.0501 0.0450 1.1383 1.1679 0.2567 1.1799
18 5.7336 –0.1636 0.1728 0.0466 –0.7638 –0.8398 –0.1978 –0.8330
19 3.3261 –0.2861 0.1385 0.0432 –1.3355 –1.4388 –0.3321 –1.4836
20 4.5804 0.1596 0.0516 0.0470 0.7449 0.7649 0.1683 0.7563

Since the absolute values of the standardised residuals for all observations are less than 3, there is no indication of an outlier in the given data. (A computational sketch reproducing these quantities is given below.)
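The quantities in Table 6.10 can be checked with a short script. The sketch below (assuming NumPy) uses the tree data of Table 6.6 directly and should reproduce the fitted coefficients and scaled residuals up to rounding:

    import numpy as np

    x = np.array([4.20, 5.55, 3.33, 6.91, 4.40, 6.05, 3.83, 7.11, 3.60, 6.65,
                  3.63, 6.71, 4.40, 6.45, 3.00, 4.61, 5.00, 7.25, 3.20, 5.31])  # diameters
    y = np.array([4.14, 4.87, 3.24, 5.34, 4.34, 5.07, 3.84, 5.44, 3.54, 5.26,
                  3.49, 5.42, 4.20, 5.27, 2.74, 4.44, 4.64, 5.57, 3.04, 4.74])  # heights

    b1, b0 = np.polyfit(x, y, 1)            # roughly 0.5945 and 1.4239
    r = y - (b0 + b1 * x)                   # residuals
    n = len(x)
    h = 1 / n + (x - x.mean())**2 / np.sum((x - x.mean())**2)   # hat diagonals
    mss_res = np.sum(r**2) / (n - 2)        # roughly 0.0459
    s = r / np.sqrt(mss_res)                # standardised residuals, Table 6.10
    d = r / np.sqrt(mss_res * (1 - h))      # studentised residuals, Table 6.10
    press = r / (1 - h)                     # PRESS residuals, Table 6.10

The largest |s| is about 2.18 (observation 15), below the usual cut-off of 3, which is consistent with the conclusion above.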
(ii) The residual plot of standardised residuals obtained in Table 6.10
against the given values of the explanatory variable (X) is shown in
Fig. 6.24.

[Figure: residual plot — diameter (X axis) against standardised residuals (Y axis).]

Fig. 6.24: Residual plot corresponding to standardised residuals.

Similarly, the residual plot of studentised residuals obtained in Table


6.10 against the given explanatory variable (X) is shown in Fig. 6.25.
[Figure: residual plot — diameter (X axis) against studentised residuals (Y axis).]

Fig. 6.25: Residual plot corresponding to studentised residuals.

The residual plots of the standardised and studentised residuals (depicted in Figs. 6.24 and 6.25, respectively) indicate a non-linear relationship between the height and diameter of the trees.
(iii) To draw the P-P and Q-Q plots, we obtain the ordered standardised residuals, ranks, theoretical and sample cumulative probabilities, and theoretical quantiles, as shown in Table 6.11 (a computational sketch follows the table).
Table 6.11: Computation of cumulative probabilities and theoretical quantiles

S. No. | Ordered Standardised Residual | Rank | Theoretical Cumulative Probability | Theoretical Quantile | Sample Cumulative Probability
1 –2.18 1 0.03 –1.9600 0.0146
2 –1.34 2 0.08 –1.4395 0.0901
3 –0.98 3 0.13 –1.1503 0.1635
4 –0.89 4 0.18 –0.9346 0.1867
5 –0.76 5 0.23 –0.7554 0.2236
6 –0.76 6 0.28 –0.5978 0.2236
7 –0.55 7 0.33 –0.4538 0.2912
8 –0.43 8 0.38 –0.3186 0.3336
9 –0.11 9 0.43 –0.1891 0.4562
10 0.03 10 0.48 –0.0627 0.5120

11 0.06 11 0.53 0.0627 0.5239
12 0.23 12 0.58 0.1891 0.5910
13 0.65 13 0.63 0.3186 0.7422
14 0.69 14 0.68 0.4538 0.7549
15 0.74 15 0.73 0.5978 0.7704
16 0.75 16 0.78 0.7554 0.7734
17 1.02 17 0.83 0.9346 0.8461
18 1.14 18 0.88 1.1503 0.8729
19 1.29 19 0.93 1.4395 0.9015
20 1.40 20 0.98 1.9600 0.9192
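The three computed columns of Table 6.11 can be produced as in the Python sketch below (assuming NumPy and SciPy, and the 20 ordered standardised residuals s_ord from Table 6.10):

    import numpy as np
    from scipy import stats

    def pp_qq_columns(s_ord):
        n = len(s_ord)
        ranks = np.arange(1, n + 1)
        p_theor = (ranks - 0.5) / n         # theoretical cumulative probabilities
        q_theor = stats.norm.ppf(p_theor)   # theoretical quantiles
        p_sample = stats.norm.cdf(s_ord)    # sample cumulative probabilities
        return p_theor, q_theor, p_sample

The P-P plot then charts p_sample against p_theor, and the Q-Q plot charts s_ord against q_theor.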

Then, we plot the sample cumulative probabilities against the theoretical cumulative probabilities to obtain the P-P plot, as shown in Fig. 6.26.
[Figure: P-P plot — sample cumulative probabilities (X axis) against theoretical cumulative probabilities (Y axis).]

Fig. 6.26: P-P Plot.

The sample quantiles (the ordered standardised residuals) are plotted against the theoretical quantiles to obtain the Q-Q plot, as shown in Fig. 6.27.
[Figure: Q-Q plot — sample quantiles (X axis) against theoretical quantiles (Y axis).]

Fig. 6.27: Q-Q Plot.

Note that the resulting points on the P-P and Q-Q plots deviate slightly from the straight lines (see Figs. 6.26 and 6.27). This indicates that the distribution of the error terms (residuals) differs slightly from the normal distribution.