4-Regression Diagnostics SAS
Lalmohan Bhar
I.A.S.R.I., Library Avenue, Pusa, New Delhi – 110 012
[email protected]
1. Introduction
Regression analysis is a statistical methodology that utilizes the relation between two or more
quantitative variables so that one variable can be predicted from the other, or others. This
methodology is widely used in business, the social and behavioral sciences, and the biological
sciences, including agriculture and fisheries research. Regression analysis serves three major
purposes: (1) description, (2) control and (3) prediction. We frequently use equations to summarize
or describe a set of data, and regression analysis is helpful in developing such equations.
When a regression model is considered for an application, we usually cannot be certain in
advance that the model is appropriate for that application. Any one, or several, of the features of
the model, such as linearity of the regression function or normality of the error terms, may not be
appropriate for the particular data at hand. Hence, it is important to examine the aptness of the
model for the data before inferences based on that model are undertaken. In this note we discuss
some simple methods for studying the appropriateness of a model, as well as some remedial
measures that can be helpful when the data are not in accordance with the conditions of the
regression model.
2. Diagnostics
2.1 Nonlinearity of Regression Model
Whether a linear regression function is appropriate for the data being analyzed can be studied
from a residual plot against the predictor variable or equivalently from a residual plot against the
fitted values.
Figure 1(a) shows a prototype situation of the residual plot against X when a linear regression
model is appropriate. The residuals then fall within a horizontal band centred around 0,
displaying no systematic tendencies to be positive and negative.
Figure 1(b) shows a prototype situation of a departure from the linear regression model that
indicates the need for a curvilinear regression function. Here the residuals tend to vary in a
systematic fashion between being positive and negative.
The prototype plot in Figure 1(a) exemplifies residual plots when the error term variance is constant.
Figure 1(c) shows a prototype picture of the residual plot when the error variance increases with X.
In many biological science applications, departures from constancy of the error variance tend to
be of the "megaphone" type.
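A minimal SAS sketch of such residual plots, assuming a hypothetical data set mydata with response y and predictor x (PROC SGPLOT is used for the plotting):

proc reg data=mydata;
   model y = x;
   output out=diag r=resid p=fitted;   /* save residuals and fitted values */
run;
quit;

proc sgplot data=diag;                 /* residual plot against the predictor */
   scatter x=x y=resid;
   refline 0 / axis=y;
run;

proc sgplot data=diag;                 /* residual plot against the fitted values */
   scatter x=fitted y=resid;
   refline 0 / axis=y;
run;

If the residuals fall within a horizontal band around the zero reference line, as in Figure 1(a), the linear model and constant error variance appear adequate.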
Modified Levene Test: The test is based on the variability of the residuals, with the residuals divided into two groups according to the level of X. Let $e_{i1}$ denote the ith residual for group 1 and $e_{i2}$ the ith residual for group 2, and let $n_1$ and $n_2$ denote the sample sizes of the two groups, where $n_1 + n_2 = n$. Further, we shall use $\tilde{e}_1$ and $\tilde{e}_2$ to denote the medians of the residuals in the two groups. The modified Levene test uses the absolute deviations of the residuals around their group median, denoted by $d_{i1}$ and $d_{i2}$:
$$d_{i1} = |e_{i1} - \tilde{e}_1|, \qquad d_{i2} = |e_{i2} - \tilde{e}_2|.$$
With this notation, the two-sample t test statistic becomes:
$$t_L^* = \frac{\bar{d}_1 - \bar{d}_2}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},$$
where $\bar{d}_1$ and $\bar{d}_2$ are the sample means of the $d_{i1}$ and $d_{i2}$, respectively, and the pooled variance $s^2$ is:
$$s^2 = \frac{\sum_i (d_{i1} - \bar{d}_1)^2 + \sum_i (d_{i2} - \bar{d}_2)^2}{n - 2}.$$
If the error terms have constant variance and $n_1$ and $n_2$ are not too small, $t_L^*$ follows approximately the t distribution with $n - 2$ degrees of freedom. Large absolute values of $t_L^*$ indicate that the error terms do not have constant variance.
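The same test can be carried out in SAS through the Brown-Forsythe option of PROC GLM, which is the modified Levene test based on group medians; for two groups its F statistic is the square of $t_L^*$. A sketch, assuming the residuals and the predictor are in the data set diag created above and the groups are formed by splitting at the median of x:

proc means data=diag median noprint;   /* median of the predictor */
   var x;
   output out=med median=xmed;
run;

data levene;
   if _n_ = 1 then set med;
   set diag;
   group = (x > xmed);                 /* group 1: lower X values, group 2: upper X values */
run;

proc glm data=levene;                  /* Brown-Forsythe (modified Levene) test */
   class group;
   model resid = group;
   means group / hovtest=bf;
run;
quit;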
(ii) $WSSD_i$: $WSSD_i$ is an important statistic for locating points that are remote in x-space.
$WSSD_i$ measures the weighted sum of squared distances of the ith point from the centre of the
data. Generally, if the $WSSD_i$ values progress smoothly from small to large, there are probably
no extremely remote points. However, a sudden jump in the magnitude of $WSSD_i$ often
indicates that one or more extreme points are present.
(iii) Cook's $D_i$: Cook's $D_i$ is designed to measure the shift in $\hat{y}$ when the ith observation is not used in the estimation of the parameters. $D_i$ follows approximately an $F_{p,\,n-p-1}$ distribution; the lower 10% point of this distribution is taken as a reasonable cut-off (more conservative users suggest the 50% point). Alternatively, the cut-off for $D_i$ can be taken as $4/n$.
(iv) $DFFITS_i$: $DFFITS_i$ measures the difference in the ith fitted value $\hat{y}_i$ when the ith observation is deleted from the fit. It is suggested that $|DFFITS_i| > 2\left(\dfrac{p+1}{n}\right)^{1/2}$ may be used to flag influential observations.
(v) $DFBETAS_{j(i)}$: Cook's $D_i$ reveals the impact of the ith observation on the entire vector of the estimated regression coefficients. Influential observations for an individual regression coefficient are identified by $DFBETAS_{j(i)}$, $j = 1, 2, \ldots, p+1$, where each $DFBETAS_{j(i)}$ is the standardized change in $b_j$ when the ith observation is deleted.
(vi) $COVRATIO_i$: The impact of the ith observation on the variance-covariance matrix of the estimated regression coefficients is measured by the ratio of the determinants of the two variance-covariance matrices, computed with and without the ith observation. Thus $COVRATIO_i$ reflects the impact of the ith observation on the precision of the estimates of the regression coefficients. Values near 1 indicate that the ith observation has little effect on the precision of the estimates. A value of $COVRATIO_i$ greater than 1 indicates that deletion of the ith observation decreases the precision of the estimates; a ratio less than 1 indicates that deletion of the observation increases the precision of the estimates. Influential points are indicated by
$$|COVRATIO_i - 1| \ge \frac{3(p+1)}{n}.$$
(vii) $FVARATIO_i$: This statistic detects the change in the variance of $\hat{y}_i$ when an observation is deleted. A value near 1 indicates that the ith observation has negligible effect on the variance of $\hat{y}_i$. A value greater than 1 indicates that deletion of the ith observation decreases the precision of the estimate, while a value less than 1 indicates that it increases the precision.
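Most of these influence measures (Cook's $D_i$, DFFITS, DFBETAS, COVRATIO) are available from PROC REG. A sketch, again using the hypothetical data set mydata, here with three predictors x1-x3 (so that p + 1 = 4 parameters):

proc reg data=mydata;
   /* INFLUENCE prints hat values, RSTUDENT, DFFITS, DFBETAS and COVRATIO; R prints Cook's D */
   model y = x1 x2 x3 / influence r;
   output out=infl cookd=cookd dffits=dffits covratio=covratio h=leverage;
run;
quit;

data flagged;                            /* apply the cut-offs discussed above */
   set infl nobs=n;
   if cookd > 4/n
      or abs(dffits) > 2*sqrt(4/n)
      or abs(covratio - 1) >= 3*4/n then flag = 1;
run;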
Durbin-Watson test:
The Durbin-Watson test assumes the first-order autoregressive error model. The test consists of determining whether or not the autocorrelation coefficient ($\rho$, say) is zero. The usual test alternatives considered are:
$$H_0: \rho = 0; \qquad H_a: \rho > 0.$$
The Durbin-Watson test statistic D is obtained by using ordinary least squares to fit the regression function, calculating the ordinary residuals $e_t = Y_t - \hat{Y}_t$, and then calculating the statistic:
$$D = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}.$$
Exact critical values are difficult to obtain, but Durbin and Watson have obtained lower and upper bounds $d_L$ and $d_U$ such that a value of D outside these bounds leads to a definite decision. The decision rule for testing between the alternatives is:
if $D > d_U$, conclude $H_0$;
if $D < d_L$, conclude $H_a$;
if $d_L \le D \le d_U$, the test is inconclusive.
Small values of D lead to the conclusion that $\rho > 0$.
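In SAS the statistic is obtained with the DW option of PROC REG; a sketch, assuming a time-ordered data set tseries with response y and regressor x (hypothetical names):

proc reg data=tseries;
   model y = x / dw;    /* prints D and the first-order residual autocorrelation; data must be in time order */
run;
quit;

In releases that support it, the DWPROB option can be added to obtain a p-value for D, avoiding the inconclusive region between $d_L$ and $d_U$.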
Comparison of frequencies: When the number of cases is reasonably large, one possibility is to compare the actual frequencies of the residuals against the expected frequencies under normality. For example, one can determine whether about 90% of the residuals fall between $\pm 1.645\sqrt{MSE}$.
Normal probability plot: Still another possibility is to prepare a normal probability plot of the
residuals. Here each residual is plotted against its expected value under normality. A plot that is
nearly linear suggests agreement with normality, whereas a plot that departs substantially from
linearity suggests that the error distribution is not normal.
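A sketch of such a plot with PROC UNIVARIATE, assuming the residuals saved earlier in the data set diag:

proc univariate data=diag;
   var resid;
   qqplot resid / normal(mu=est sigma=est);   /* normal probability (Q-Q) plot of the residuals */
run;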
2.6 Multicollinearity
The use and interpretation of a multiple regression model depends implicitly on the assumption that the explanatory variables are not strongly interrelated. In most regression applications the explanatory variables are not orthogonal; when they are strongly interrelated, it is typically impossible to estimate the unique effects of individual variables in the regression equation. The estimated values of the coefficients are then very sensitive to slight changes in the data and to the addition or deletion of variables in the equation, and the regression coefficients have large sampling errors, which affect both inference and forecasting based on the regression model. The condition of severe non-orthogonality is also referred to as the problem of multicollinearity.
Detection of Multicollinearity
Let $R = (r_{ij})$ denote the simple correlation matrix of the explanatory variables and $R^{-1} = (r^{ij})$ its inverse, and let $\lambda_i$, $i = 1, 2, \ldots, p$ ($\lambda_p \le \lambda_{p-1} \le \cdots \le \lambda_1$), denote the eigenvalues of R. The following are common indicators of relationships among the independent variables:
1. Simple pair-wise correlations $|r_{ij}|$ close to 1.
2. Squared multiple correlations $R_i^2$, of each $x_i$ on the remaining x variables, close to 1.
3. Large diagonal elements $r^{ii}$ of the inverse matrix, the variance inflation factors $VIF_i$.
The first of these indicators, the simple correlation coefficient $r_{ij}$ between a pair of independent variables, may detect a simple relationship between $x_i$ and $x_j$: $|r_{ij}|$ close to 1 implies that the ith and jth variables are nearly proportional.
The second set of indicators, the $R_i^2$, the squared multiple correlation coefficients for the regression of each $x_i$ on the remaining x variables, indicates the degree to which $x_i$ is explained by a linear combination of all of the other input variables.
The third set of indicators are the diagonal elements of the inverse matrix, $r^{ii}$, which have been labelled the variance inflation factors, $VIF_i$. The term arises by noting that with standardized data (mean zero and unit sum of squares), the variance of the least squares estimate of the ith coefficient is proportional to $r^{ii}$. The common cut-off $VIF_i > 10$ is based on the simple relation between $R_i^2$ and $VIF_i$, namely $VIF_i = 1/(1 - R_i^2)$, so that $VIF_i > 10$ corresponds to $R_i^2 > 0.9$.
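These indicators are reported directly by SAS; a sketch with the hypothetical regressors x1-x3:

proc corr data=mydata;                       /* pairwise correlations r_ij */
   var x1 x2 x3;
run;

proc reg data=mydata;
   model y = x1 x2 x3 / vif tol collin;      /* VIF_i, tolerances, and eigenvalue/condition-index diagnostics */
run;
quit;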
When the nature of the regression function is not known, exploratory analysis that does not
require specifying a particular type of function is often useful.
Transformations are another remedial tool, both for linearizing the regression relation and for stabilizing the error variance. We first consider transformations for linearizing a nonlinear regression relation when the distribution of the error terms is reasonably close to a normal distribution and the error terms have approximately constant variance. In this situation, transformations on X should be attempted. The reason why a transformation on Y may not be desirable here is that a transformation on Y, such as $Y' = \sqrt{Y}$, may materially change the shape of the error distribution and may lead to substantially differing error term variances.
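A sketch of trying such transformations on X in SAS, keeping Y untouched (data set and variable names are the hypothetical ones used above):

data trans;
   set mydata;
   sqrt_x = sqrt(x);
   log_x  = log(x);     /* candidate linearizing transformations of the predictor */
run;

proc reg data=trans;
   model y = log_x;     /* refit and re-examine the residual plots */
run;
quit;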
3.6 Multicollinearity
i) Collection of additional data: Collecting additional data has been suggested as one of
the methods of combating multicollinearity. The additional data should be collected in a manner
designed to break up the multicollinearity in the existing data.
ii) Model respecification: Multicollinearity is often caused by the choice of model, such as
when two highly correlated regressors are used in the regression equation. In these situations
some respecification of the regression equation may lessen the impact of multicollinearity. One
approach to respecification is to redefine the regressors. For example, if x1, x2 and x3 are nearly
linearly dependent it may be possible to find some function such as x = (x1+x2)/x3 or x = x1x2x3
that preserves the information content in the original regressors but reduces the multicollinearity.
iii) Ridge Regression: When the method of least squares is used, the parameter estimates are unbiased. A number of procedures have been developed for obtaining biased estimators of the regression coefficients that tackle the problem of multicollinearity; one of these is ridge regression. The ridge estimators are found by solving a slightly modified version of the normal equations, in which a small positive quantity is added to each diagonal element of the X'X matrix.
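In SAS, ridge estimates over a grid of ridge constants can be obtained with the RIDGE= option of PROC REG; a sketch (data set and variables hypothetical):

proc reg data=mydata outest=ridge_est outvif ridge=0 to 0.10 by 0.01;
   model y = x1 x2 x3;
run;
quit;

proc print data=ridge_est;   /* inspect how the coefficients and VIFs change as the ridge constant grows */
run;

A value of the ridge constant at which the coefficients and VIFs have stabilized is usually chosen.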
How to prepare the data file and the syntax for performing the normality test of the residuals are illustrated below.
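A sketch of the intended steps, with illustrative file and variable names:

data crop;                     /* prepare the data file: one record per observation */
   infile 'yield.dat';         /* hypothetical raw data file */
   input yield fert;
run;

proc reg data=crop;
   model yield = fert;
   output out=resout r=resid;  /* save the residuals */
run;
quit;

proc univariate data=resout normal;   /* Shapiro-Wilk and related normality tests */
   var resid;
   qqplot resid / normal(mu=est sigma=est);
run;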
The following program is useful for testing heterogeneity (non-constancy) of the error variances.
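One such program, sketched here, uses the SPEC option of PROC REG, which performs White's test of the null hypothesis that the errors are homoscedastic (and the model correctly specified); the modified Levene test sketched earlier can equally be applied to the residuals:

proc reg data=crop;
   model yield = fert / spec;   /* White's test for heteroscedastic errors */
run;
quit;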
Some of the observations were identified as outliers. After deleting these observations, the fitted equations are shown in the following table, together with the multicollinearity diagnostics.