Chapter 4
Additionally, there are issues that can arise during the analysis that, while strictly speaking
are not assumptions of regression, are nonetheless of great concern to data analysts.
The validity of these assumptions is needed for the results to be meaningful. If the assumptions are
violated, the results can be incorrect and may have serious consequences. If the departures are small, the
final result may not change significantly, but if the departures are large, the model obtained may
become unstable, in the sense that a different sample could lead to an entirely different model with opposite
conclusions. Such underlying assumptions therefore have to be verified before attempting regression
modeling. This information is not available from summary statistics such as the t-statistic, the F-statistic, or
the coefficient of determination.
One important point to keep in mind is that these assumptions concern the population, whereas we work only
with a sample. So the main issue is to make a decision about the population on the basis of a sample of
data.
Several diagnostic methods for checking violations of the regression assumptions are based on the study of
the model residuals with the help of various types of graphics.
Checking the linear relationship between the study variable and the explanatory variables
1. Case of one explanatory variable
If there is only one explanatory variable in the model, it is easy to check the existence of a linear
relationship between y and X by a scatter diagram of the available data.
If the scatter diagram shows a linear trend, it indicates that the relationship between y and X is linear. If the
trend is not linear, it indicates that the relationship between y and X is nonlinear. For example, the
following figure indicates a linear trend between y and X.
Whereas the following figure indicates a nonlinear trend:
Such an arrangement helps in examining a plot and the corresponding correlation coefficient together. The
pairwise correlation coefficient should always be interpreted in conjunction with the corresponding scatter
plots because:
The correlation coefficient measures only the linear relationship, and
The correlation coefficient is non-robust, i.e., its value can be substantially influenced by
one or two observations in the data.
The presence of linear patterns is reassuring, but the absence of such patterns does not imply that a linear model
is incorrect. Most statistical software provides an option for creating a scatterplot matrix. A
view of all the plots provides an indication of whether a multiple linear regression model may provide a
reasonable fit to the data. It is to be kept in mind that we get only information on pairs of variables
through the scatterplots of (y versus X1), (y versus X2), …, (y versus Xk), whereas the assumption of
linearity concerns y jointly with X1, X2, …, Xk.
If some of the explanatory variables are themselves interrelated, these scatter diagrams can be
misleading. Other methods for sorting out the relationships between several explanatory variables
and a study variable are then used.
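As an illustration, the sketch below builds a scatterplot matrix together with the pairwise correlation coefficients using pandas. The data and the column names (X1, X2, y) are simulated purely for illustration and are not taken from any example in this chapter.

```python
# Sketch: scatterplot matrix and pairwise correlations for a study variable y
# and two explanatory variables (all data simulated for illustration).
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "X1": rng.normal(size=50),
    "X2": rng.normal(size=50),
})
df["y"] = 2 + 1.5 * df["X1"] - 0.8 * df["X2"] + rng.normal(scale=0.5, size=50)

scatter_matrix(df, figsize=(6, 6), diagonal="hist")  # pairwise scatter plots
print(df.corr().round(2))                            # pairwise correlation coefficients
plt.show()
```

Remember that each panel and each coefficient describes only a pairwise relationship; it says nothing directly about the joint linearity assumption.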
Residual Analysis in Regression
Because a linear regression model is not always appropriate for the data, you should assess the
appropriateness of the model by defining residuals and examining residual plots.
Residuals
The residual (e) is defined as the difference between the observed value of the dependent variable (y) and the
predicted value (ŷ), that is, e = y − ŷ. Each data point has one residual.
Residual Plots
A residual plot is a graph that shows the residuals on the vertical axis and the independent variable on
the horizontal axis. If the points in a residual plot are randomly dispersed around the horizontal axis, a
linear regression model is appropriate for the data; otherwise, a non-linear model is more appropriate.
The table below shows the inputs and outputs from a simple linear regression analysis, and the
accompanying chart displays the residuals (e) against the independent variable (X) as a residual plot.
X 60 70 80 85 95
Y 70 65 70 95 85
The residual plot shows a fairly random pattern - the first residual is positive, the next two are
negative, the fourth is positive, and the last residual is negative. This random pattern indicates
that a linear model provides a decent fit to the data.
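As a sketch of how such a residual plot can be produced, the code below fits a least-squares line to the X and Y values in the table above and plots the residuals against X; numpy and matplotlib are assumed, and the fitted values may differ slightly from those behind the original chart.

```python
# Sketch: compute residuals e = y - ŷ for the tabulated data and plot them against X.
import numpy as np
import matplotlib.pyplot as plt

x = np.array([60, 70, 80, 85, 95])
y = np.array([70, 65, 70, 95, 85])

b1, b0 = np.polyfit(x, y, deg=1)   # slope and intercept of the least-squares line
y_hat = b0 + b1 * x                # predicted values
residuals = y - y_hat              # one residual per data point

plt.axhline(0, color="grey")
plt.scatter(x, residuals)
plt.xlabel("X")
plt.ylabel("Residual (e)")
plt.show()
```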
Below, the residual plots show three typical patterns. The first plot shows a random pattern,
indicating a good fit for a linear model. The other plot patterns are non-random (U-shaped and
inverted U), suggesting a better fit for a non-linear model. If the relationship is linear, there should be
no systematic relationship between the residuals and the predictor variable.
When a residual plot reveals a data set to be nonlinear, it is often possible to "transform" the raw data to
make it more linear. This allows us to use linear regression techniques more effectively with nonlinear data.
Transforming a variable involves using a mathematical operation to change its measurement scale. Broadly
speaking, there are two kinds of transformations.
Linear transformation. A linear transformation preserves linear relationships between variables. Therefore,
the correlation between x and y would be unchanged after a linear transformation. Examples of a linear
transformation to variable x would be multiplying x by a constant, dividing x by a constant, or adding a
constant to x.
Nonlinear Transformation
A nonlinear transformation changes (increases or decreases) linear relationships between variables and, thus,
changes the correlation between variables. Examples of a nonlinear transformation of variable x would be taking
the square root of x or the reciprocal of x. In regression, a transformation to achieve linearity is a special kind of
nonlinear transformation: one that increases the linear relationship between two variables.
There are many ways to transform variables to achieve linearity for regression analysis. Some common
transformations are listed below.
Method                        Transformation(s)    Regression equation      Predicted value (ŷ)
Standard linear regression    None                 y = b0 + b1x             ŷ = b0 + b1x
Exponential model             log(y)               log(y) = b0 + b1x        ŷ = 10^(b0 + b1x)
Quadratic model               sqrt(y)              sqrt(y) = b0 + b1x       ŷ = (b0 + b1x)^2
Reciprocal model              1/y                  1/y = b0 + b1x           ŷ = 1 / (b0 + b1x)
Logarithmic model             log(x)               y = b0 + b1 log(x)       ŷ = b0 + b1 log(x)
Each row shows a different transformation method. The second column shows the specific
transformation applied to dependent and/or independent variables. The third column shows the regression
equation used in the analysis. And the last column shows the "back transformation" equation used to restore
the dependent variable to its original, non-transformed measurement scale.
In practice, these methods need to be tested on the data to which they are applied to be sure that they increase
rather than decrease the linearity of the relationship. Testing the effect of a transformation method involves
looking at residual plots and correlation coefficients, as described in the following sections.
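As an illustration of testing a transformation, the sketch below applies the exponential-model row of the table: it regresses log10(y) on x and back-transforms the predictions as ŷ = 10^(b0 + b1x). The data are simulated for illustration only.

```python
# Sketch: transformation to achieve linearity (exponential model).
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1, 10, 30)
y = 3.0 * np.exp(0.4 * x) * rng.lognormal(sigma=0.1, size=x.size)  # curved relationship

b1, b0 = np.polyfit(x, np.log10(y), deg=1)   # fit on the transformed scale
y_hat = 10 ** (b0 + b1 * x)                  # back-transform to the original scale

# The correlation coefficient is noticeably higher on the transformed scale.
print("corr(x, y)        =", round(np.corrcoef(x, y)[0, 1], 3))
print("corr(x, log10(y)) =", round(np.corrcoef(x, np.log10(y))[0, 1], 3))
```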
[Figures: an outlier with an extreme X value and an outlier with an extreme Y value]
Influential Points
An influential point is an outlier that greatly affects the slope of the regression line. One way to test the
influence of an outlier is to compute the regression equation with and without the outlier.
This type of analysis is illustrated below. The scatter plots are identical, except that the plot on the right
includes an outlier. The slope is flatter when the outlier is present (-3.32 vs. -4.10), so this outlier would
be considered an influential point.
The charts below compare regression statistics for another data set with and without an outlier. Here, the
chart on the right has a single outlier, located at the high end of the X axis (where x = 24). As a result of
that single outlier, the slope of the regression line changes greatly, from -2.5 to -1.6; so the outlier would
be considered an influential point.
[Charts: regression statistics without the outlier and with the outlier]
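The comparison described above can be reproduced numerically by fitting the regression with and without the suspected point, as in the sketch below. The data are made up for illustration and are not the data behind the charts.

```python
# Sketch: compare the fitted slope with and without a suspected influential point.
import numpy as np

x = np.array([2, 4, 6, 8, 10, 12, 24])    # the last point has an extreme X value
y = np.array([40, 35, 30, 27, 22, 18, 30])

slope_with, _ = np.polyfit(x, y, deg=1)
slope_without, _ = np.polyfit(x[:-1], y[:-1], deg=1)

print("slope with the outlier:   ", round(slope_with, 2))
print("slope without the outlier:", round(slope_without, 2))
```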
If your data set includes an influential point, here are some things to consider.
An influential point may represent bad data, possibly the result of measurement error. If possible,
check the validity of the data point.
Compare the decisions that would be made based on regression equations defined with and
without the influential point. If the equations lead to contrary decisions, use caution.
Studentized Residuals – Residuals divided by their estimated standard errors (like t-statistics).
Observations with values larger than 3 in absolute value are considered outliers.
Outliers: In linear regression, an outlier is an observation with a large residual. In other words, it is an
observation whose dependent-variable value is unusual given its values on the predictor variables. An
outlier may indicate a sample peculiarity, a data entry error, or some other problem.
Leverage Values (Hat Diag) – An observation with an extreme value on a predictor variable is
called a point with high leverage. Leverage is a measure of how far an observation is from the others in
terms of the levels of the independent variables (not the dependent variable). Observations with
values larger than 2(k+1)/n are considered to be potentially highly influential, where k is the number
of predictors and n is the sample size.
Influence: An observation is said to be influential if removing the observation substantially changes
the estimate of coefficients. Influence can be thought of as the product of leverage and outlierness.
DFFITS – Measure of how much an observation has affected its fitted value from the regression
model. Values larger than 2*sqrt((k+1)/n) in absolute value are considered highly influential. Use
standardized DFFITS in SPSS.
DFBETAS – Measure of how much an observation has affected the estimate of a regression
coefficient (there is one DFBETA for each regression coefficient, including the intercept). Values
larger than 2/sqrt (n) in absolute value are considered highly influential.
Cook’s D – Measure of aggregate impact of each observation on the group of regression coefficients,
as well as the group of fitted values. Values larger than 4/n are considered highly influential.
COVRATIO – Measure of the impact of each observation on the variances (and standard errors) of
the regression coefficients and their covariances. Values outside the interval 1 ± 3(k+1)/n are
considered highly influential.
Variance Inflation Factor (VIF) – Measure of how highly correlated each independent variable is
with the other predictors in the model. Values larger than 10 for a predictor imply large inflation of
standard errors of regression coefficients due to this variable being in model.
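A minimal sketch of how these diagnostics and their rule-of-thumb cut-offs can be computed with statsmodels; the simulated data and the variable names are assumptions for illustration.

```python
# Sketch: influence diagnostics and VIF with statsmodels (simulated data).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
n, k = 50, 3
X = rng.normal(size=(n, k))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

X_const = sm.add_constant(X)
results = sm.OLS(y, X_const).fit()
infl = results.get_influence()

stud_resid = infl.resid_studentized_external   # studentized residuals, |value| > 3 suspect
leverage = infl.hat_matrix_diag                # leverage, cut-off 2(k+1)/n
cooks_d = infl.cooks_distance[0]               # Cook's D, cut-off 4/n
dffits = infl.dffits[0]                        # DFFITS, cut-off 2*sqrt((k+1)/n)
dfbetas = infl.dfbetas                         # DFBETAS, cut-off 2/sqrt(n)
cov_ratio = infl.cov_ratio                     # COVRATIO, outside 1 +/- 3(k+1)/n

vif = [variance_inflation_factor(X_const, j) for j in range(1, k + 1)]  # VIF > 10 is a concern

print("possible outliers:     ", np.where(np.abs(stud_resid) > 3)[0])
print("high-leverage points:  ", np.where(leverage > 2 * (k + 1) / n)[0])
print("influential (Cook's D):", np.where(cooks_d > 4 / n)[0])
print("VIF:", np.round(vif, 2))
```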
A. Choose ANALYZE, REGRESSION, LINEAR, and input the Dependent variable and set of
Independent variables from your model of interest (possibly having been chosen via an automated
model selection method).
B. Under STATISTICS, select Collinearity Diagnostics, Casewise Diagnostics and All Cases and
CONTINUE
C. Under PLOTS, select Y:*SRESID and X:*ZPRED. Also choose HISTOGRAM. These give a plot
of studentized residuals versus standardized predicted values, and a histogram of standardized
residuals (residual/sqrt(MSE)). Select CONTINUE.
D. Under SAVE, select Studentized Residuals, Cook's, Leverage Values, Covariance Ratio,
Standardized DFBETAS, Standardized DFFITS. Select CONTINUE. The results will be added to
your original data worksheet.
Remedial Measures
There are two things you can do when you find out that your linear regression model is
not appropriate:
Change your model (use another statistical model).
Change your data.
Transformations of X.
Transformations of Y.
Omitting outliers.
Problems and Solutions
Nonlinearity of the regression function:
Use a nonlinear model, or transform X (if the residuals are reasonably normal with constant variance).
Non-constant error variance:
Use the weighted least squares estimation method (a sketch follows below).
Transform Y if the mean function is reasonably linear; this amounts to working with variance-stabilizing
transformations of Y.
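A minimal sketch of weighted least squares for non-constant error variance, assuming the error spread grows with x so that weights proportional to 1/x² are reasonable; both the data and the choice of weights are illustrative.

```python
# Sketch: weighted least squares when the error variance is not constant.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = np.linspace(1, 10, 60)
y = 2 + 3 * x + rng.normal(scale=0.5 * x, size=x.size)   # spread grows with x

X = sm.add_constant(x)
ols_fit = sm.OLS(y, X).fit()
wls_fit = sm.WLS(y, X, weights=1.0 / x**2).fit()          # weight proportional to 1/variance

print("OLS coefficients:", np.round(ols_fit.params, 3))
print("WLS coefficients:", np.round(wls_fit.params, 3))
```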
5. Remedial measures of model inadequacy
Data do not always come in a form that is immediately suitable for analysis. We often have to transform
the variables before carrying out the analysis. Transformations are applied to accomplish certain
objectives such as:
To stabilize the relationship
To stabilize the variance of the dependent variable
To normalize the dependent variable
To linearize the regression model
A number of the problems in our model can be solved by transforming X.
Why do we concentrate on x?
The distribution of the error terms depends on Y, not X.
If we were to transform Y, we would change the shape and nature of the analysis.
So, always transform X.
So, you have problems, what transformation do you use? Some common transformations are:
X′ = ln(X)    X′ = √X    X′ = exp(X)
Note: Box-Cox transformations of the response: Instead of selecting a transformation “by eye”, select
an optimal power transformation.
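A minimal sketch of a Box-Cox power transformation of the response using scipy, assuming a positive, right-skewed y simulated for illustration.

```python
# Sketch: Box-Cox transformation of the response, with lambda chosen by maximum likelihood.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = np.linspace(1, 10, 80)
y = np.exp(0.5 + 0.3 * x + rng.normal(scale=0.2, size=x.size))  # positive, skewed response

y_transformed, lam = stats.boxcox(y)   # optimal power transformation parameter lambda
print("estimated lambda:", round(lam, 3))
```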
In building a regression model, two opposite approaches can be taken:
1. In order to make the model as realistic as possible, the analyst may include as many explanatory
variables as possible.
2. In order to keep the model as simple as possible, one may include only a small number of explanatory
variables.
Both approaches have their own consequences. In fact, model building and subset selection have
contradictory objectives.
When a large number of variables is included in the model, all of these factors influence the
prediction of the study variable y.
On the other hand, when a small number of variables is included, the predictive variance of ŷ
decreases.
Also, collecting observations on a larger number of variables involves more cost, time, labour, etc.
A compromise between these consequences is struck to select the "best regression equation".
In order to keep the model simple, the analyst may delete some of the explanatory variables which may be
of importance from the point of view of theoretical considerations. There can be several reasons behind
such a decision; for example, it may be hard to quantify variables like taste, intelligence, etc., and it may
sometimes be difficult to take correct observations on variables like income.
Sometimes, out of enthusiasm to make the model more realistic, the analyst may include some
explanatory variables that are not very relevant to the model. Such variables may contribute very little to
the explanatory power of the model. This tends to reduce the degrees of freedom (n - k), and
consequently the validity of the inferences drawn may be questionable. For example, the value of the coefficient
of determination will increase, indicating that the model is getting better, which may not really be true.
Step 1: Specifying the maximum Model: The maximum model is defined to be the largest model (the
one having the most predictor variables) considered at any point in the process of model selection.
Step 2: Specifying a Criterion for Selecting a Model: There are several criteria that can be used to
evaluate subset regression models. The criterion used for model selection should be related to the
intended use of the model.
F-Test Statistic: Another reasonable criterion for selecting the best model is the F-test statistic
for comparing the full and reduced models.
This statistic may be compared to an F-distribution with k-p+1 and n-k-1 degrees of
freedom. If F-Calculated is not significant, we can use the smaller (P-1 variables) model.
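A sketch of how this partial F-statistic can be computed, assuming a full model with k regressors and a reduced model with p-1 of them; the data are simulated for illustration.

```python
# Sketch: partial F-test comparing a full model (k regressors) with a reduced model (p-1 regressors).
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(5)
n, k, p = 60, 4, 3                      # full model: k regressors; reduced model: p-1 regressors
X = rng.normal(size=(n, k))
y = 1 + 2 * X[:, 0] - X[:, 1] + rng.normal(size=n)

full = sm.OLS(y, sm.add_constant(X)).fit()
reduced = sm.OLS(y, sm.add_constant(X[:, : p - 1])).fit()

num_df = k - p + 1                      # numerator degrees of freedom
den_df = n - k - 1                      # denominator degrees of freedom
F = ((reduced.ssr - full.ssr) / num_df) / (full.ssr / den_df)
p_value = stats.f.sf(F, num_df, den_df)
print("partial F =", round(F, 3), " p-value =", round(p_value, 4))
```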
Coefficient of Determination (R²): A measure of the adequacy of a regression model that has
been widely used is the coefficient of determination, R².
R² increases as p increases and is maximum when p = k + 1. Therefore, the analyst uses this
criterion by adding regressors to the model up to the point where an additional variable only
provides a small increase in R².
All possible regression procedure: The all possible regression procedure requires that we fit
each possible regression equation.
Backward Elimination Procedure: We begin with a model that includes all candidate
regressors. Then the partial F-statistic is computed for each regressor as if it were the last
variable to enter the model. The smallest of these partial F-statistics is compared with a pre-
selected value FOUT; if it is smaller, that regressor is removed from the model. A regression model with k-1
regressors is then fit, the partial F-statistics for this new model are calculated, and the procedure is repeated. The
backward elimination algorithm terminates when the smallest partial F-value is not less than the
pre-selected cutoff value FOUT.
Forward Selection Procedure: The procedure begins with the assumption that there are no
regressors in the model other than the intercept. An effort is made to find an optimal subset by
inserting regressors into the model one at a time. At each step, the regressor having the highest partial correlation
with y (or, equivalently, the largest partial F-statistic given the other regressors already in the model) is
added to the model if its partial F-statistic exceeds the pre-selected entry level FIN.
Stepwise Regression Procedure: Stepwise regression is a modified version of forward
selection that permits re-examination, at every step, of the variables incorporated in the model in
previous steps. A variable that entered at an early stage may become superfluous at a later stage
because of its relationship with other variables subsequently added to the model.
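A minimal sketch of forward selection, using the p-value of each candidate's coefficient t-test as the entry criterion (for a single added regressor this is equivalent to a partial F threshold FIN). The data, the column names, and the helper function forward_selection are illustrative assumptions.

```python
# Sketch: greedy forward selection based on the p-value of each candidate regressor.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_selection(X, y, alpha_in=0.05):
    """Add, at each step, the regressor with the smallest p-value,
    as long as that p-value is below the entry threshold alpha_in."""
    selected, remaining = [], list(X.columns)
    while remaining:
        pvals = {}
        for name in remaining:
            fit = sm.OLS(y, sm.add_constant(X[selected + [name]])).fit()
            pvals[name] = fit.pvalues[name]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha_in:
            break
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(6)
X = pd.DataFrame(rng.normal(size=(80, 5)), columns=[f"X{i+1}" for i in range(5)])
y = 2 + 1.5 * X["X1"] - 2.0 * X["X3"] + rng.normal(size=80)
print(forward_selection(X, y))   # expected to pick up X1 and X3
```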