
REGRESSION

This procedure performs multiple linear regression with five methods for entry and
removal of variables. It also provides extensive analysis of residuals and influential
cases. A caseweight (CASEWEIGHT) and a regression weight (REGWGT) can be
specified in the model fitting.

Notation
The following notation is used throughout this chapter unless otherwise stated:

$y_i$   Dependent variable for case $i$, with variance $\sigma^2 / g_i$

$c_i$   Caseweight for case $i$; $c_i = 1$ if CASEWEIGHT is not specified

$g_i$   Regression weight for case $i$; $g_i = 1$ if REGWGT is not specified

$l$   Number of distinct cases

$w_i$   $c_i g_i$

$W$   $\sum_{i=1}^{l} w_i$

$p$   Number of independent variables

$C$   Sum of caseweights: $\sum_{i=1}^{l} c_i$

$x_{ki}$   The $k$th independent variable for case $i$

$\bar{X}_k$   Sample mean for the $k$th independent variable: $\bar{X}_k = \left( \sum_{i=1}^{l} w_i x_{ki} \right) / W$

$\bar{Y}$   Sample mean for the dependent variable: $\bar{Y} = \left( \sum_{i=1}^{l} w_i y_i \right) / W$

$h_i$   Leverage for case $i$

$\tilde{h}_i$   $\dfrac{g_i}{W} + h_i$

$S_{kj}$   Sample covariance for $X_k$ and $X_j$

$S_{yy}$   Sample variance for $Y$

$S_{ky}$   Sample covariance for $X_k$ and $Y$

$p^*$   Number of coefficients in the model; $p^* = p$ if the intercept is not included, otherwise $p^* = p + 1$

$R$   The sample correlation matrix for $X_1, \ldots, X_p$ and $Y$

Descriptive Statistics
$$
R = \begin{pmatrix}
r_{11} & \cdots & r_{1p} & r_{1y} \\
r_{21} & \cdots & r_{2p} & r_{2y} \\
\vdots &        & \vdots & \vdots \\
r_{y1} & \cdots & r_{yp} & r_{yy}
\end{pmatrix}
$$

where

$$ r_{kj} = \frac{S_{kj}}{\sqrt{S_{kk}\, S_{jj}}} $$

and

$$ r_{yk} = r_{ky} = \frac{S_{ky}}{\sqrt{S_{kk}\, S_{yy}}} $$

The sample means $\bar{X}_i$ and covariances $S_{ij}$ are computed by a provisional means
algorithm. Define

$$ W_k = \sum_{i=1}^{k} w_i = \text{cumulative weight up to case } k $$

then

$$ \bar{X}_i^{(k)} = \bar{X}_i^{(k-1)} + \left( x_{ik} - \bar{X}_i^{(k-1)} \right) \frac{w_k}{W_k} $$

and, if the intercept is included,

$$ C_{ij}^{(k)} = C_{ij}^{(k-1)} + \left( x_{ik} - \bar{X}_i^{(k-1)} \right) \left( x_{jk} - \bar{X}_j^{(k-1)} \right) \left( w_k - \frac{w_k^2}{W_k} \right) $$

Otherwise,

$$ C_{ij}^{(k)} = C_{ij}^{(k-1)} + w_k\, x_{ik}\, x_{jk} $$

where

$$ \bar{X}_i^{(1)} = x_{i1} \quad \text{and} \quad C_{ij}^{(1)} = 0 $$

The sample covariance $S_{ij}$ is computed as the final $C_{ij}$ divided by $C - 1$.
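The recursion above can be carried out in a single pass over the cases. Below is a minimal Python/NumPy sketch for the simplest situation of unit caseweights and regression weights ($w_k = 1$, so $W_k = k$); the function name and the random test data are illustrative only, not part of the procedure.

```python
import numpy as np

def provisional_means(Z):
    """One-pass (provisional) means and covariances for the columns of Z.

    Minimal sketch assuming unit case and regression weights (w_k = 1), so
    W_k = k and the final divisor is C - 1 = n - 1.
    """
    n, m = Z.shape
    mean = Z[0].astype(float)                 # X_i(1) = x_i1
    C = np.zeros((m, m))                      # C_ij(1) = 0
    W = 1.0
    for k in range(1, n):
        w = 1.0
        W += w                                # W_k
        d = Z[k] - mean                       # x_ik - X_i(k-1)
        C += np.outer(d, d) * (w - w ** 2 / W)
        mean += d * (w / W)                   # update the provisional mean
    return mean, C / (n - 1)

# Agrees with the ordinary two-pass covariance for unweighted data:
rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 3))
m_hat, S_hat = provisional_means(Z)
assert np.allclose(S_hat, np.cov(Z, rowvar=False))
```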

Sweep Operations (Dempster, 1969)


For a regression model of the form

$$ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi} + e_i $$

sweep operations are used to compute the least squares estimates b of β and the
associated regression statistics. The sweeping starts with the correlation matrix R.

Let $\tilde{R}$ be the new matrix produced by sweeping on the $k$th row and column of $R$.
The elements of $\tilde{R}$ are

$$ \tilde{r}_{kk} = \frac{1}{r_{kk}} $$

$$ \tilde{r}_{ik} = \frac{r_{ik}}{r_{kk}}, \quad i \neq k $$

$$ \tilde{r}_{kj} = \frac{r_{kj}}{r_{kk}}, \quad j \neq k $$

and

$$ \tilde{r}_{ij} = \frac{r_{ij}\, r_{kk} - r_{ik}\, r_{kj}}{r_{kk}}, \quad i \neq k,\ j \neq k $$

If the above sweep operations are repeatedly applied to each row of $R_{11}$ in

$$ R = \begin{pmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{pmatrix} $$

where $R_{11}$ contains the independent variables in the equation at the current step, the
result is

$$ \tilde{R} = \begin{pmatrix} -R_{11}^{-1} & R_{11}^{-1} R_{12} \\ R_{21} R_{11}^{-1} & R_{22} - R_{21} R_{11}^{-1} R_{12} \end{pmatrix} $$

The last row of

$$ R_{21} R_{11}^{-1} $$

contains the standardized coefficients (also called BETA), and

$$ R_{22} - R_{21} R_{11}^{-1} R_{12} $$

can be used to obtain the partial correlations for the variables not in the equation,
controlling for the variables already in the equation. Note that this routine is its own
inverse; that is, exactly the same operations are performed to remove a variable as
to enter a variable.
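As a concrete illustration, the following NumPy sketch applies a sweep operation to an augmented correlation matrix. It uses the Dempster-style convention in which the swept diagonal carries a minus sign (the $-R_{11}^{-1}$ block above); the pivot tolerance and the example matrix are assumptions of the sketch, not part of the procedure.

```python
import numpy as np

def sweep(A, k, tol=1e-12):
    """Sweep the symmetric matrix A on row and column k (cf. Dempster, 1969)."""
    d = A[k, k]
    if abs(d) < tol:                               # refuse a near-zero pivot
        raise ValueError("pivot too small to sweep")
    out = A - np.outer(A[:, k], A[k, :]) / d       # r_ij - r_ik r_kj / r_kk
    out[:, k] = A[:, k] / d                        # swept column
    out[k, :] = A[k, :] / d                        # swept row
    out[k, k] = -1.0 / d                           # swept pivot (note the sign)
    return out

# Correlation matrix of (X1, X2, Y): sweeping the predictor rows/columns
# leaves the standardized coefficients in the Y row and 1 - R^2 at (Y, Y).
R = np.array([[1.0, 0.3, 0.5],
              [0.3, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
Rs = sweep(sweep(R, 0), 1)
beta = Rs[2, :2]            # standardized coefficients
r_square = 1.0 - Rs[2, 2]
```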

Variable Selection Criteria


Let $r_{ij}$ be the element in the current swept matrix associated with $X_i$ and $X_j$.
Variables are entered or removed one at a time. $X_k$ is eligible for entry if it is an
independent variable not currently in the model with

$$ r_{kk} \geq t \quad \text{(tolerance, with a default of 0.0001)} $$

and also, for each variable $X_j$ that is currently in the model,

$$ \left( r_{jj} - \frac{r_{jk}\, r_{kj}}{r_{kk}} \right) t \leq 1 $$

The above condition is imposed so that entry of the variable does not reduce the
tolerance of variables already in the model to unacceptable levels.
The F-to-enter value for $X_k$ is computed as

$$ F\text{-to-enter}_k = \frac{\left( C - p^* - 1 \right) V_k}{r_{yy} - V_k} $$

with 1 and $C - p^* - 1$ degrees of freedom, where $p^*$ is the number of coefficients
currently in the model and

$$ V_k = \frac{r_{yk}\, r_{ky}}{r_{kk}} $$

The F-to-remove value for $X_k$ is computed as

$$ F\text{-to-remove}_k = \frac{\left( C - p^* \right) V_k}{r_{yy}} $$

with 1 and $C - p^*$ degrees of freedom.
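A small sketch of these two statistics, computed from a swept correlation matrix such as the one produced in the earlier sweep sketch. The argument names are illustrative; the absolute value on the swept diagonal only glosses over the sign convention used there.

```python
def f_to_enter(Rs, k, y, C, p_star):
    """F-to-enter for candidate variable k (sketch).

    Rs: swept (augmented) correlation matrix, y: index of the dependent
    variable, C: sum of caseweights, p_star: coefficients currently in model.
    """
    Vk = Rs[y, k] * Rs[k, y] / abs(Rs[k, k])
    return (C - p_star - 1) * Vk / (Rs[y, y] - Vk)

def f_to_remove(Rs, k, y, C, p_star):
    """F-to-remove for a variable k that is currently in the model (sketch)."""
    Vk = Rs[y, k] * Rs[k, y] / abs(Rs[k, k])
    return (C - p_star) * Vk / Rs[y, y]
```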



Methods for Variable Entry and Removal


Five methods for entry and removal of variables are available. The selection
process is repeated until the maximum number of steps (MAXSTEP) is reached or
no more independent variables qualify for entry or removal. The algorithms for
these five methods are described below.

Stepwise

If there are independent variables currently entered in the model, choose $X_k$ such
that $F\text{-to-remove}_k$ is minimum. $X_k$ is removed if $F\text{-to-remove}_k < F_{\text{out}}$
(default = 2.71) or, if probability criteria are used, if $P(F\text{-to-remove}_k) > P_{\text{out}}$
(default = 0.1). If the inequality does not hold, no variable is removed from the
model.

If there are no independent variables currently entered in the model, or if no
entered variable is to be removed, choose $X_k$ such that $F\text{-to-enter}_k$ is
maximum. $X_k$ is entered if $F\text{-to-enter}_k > F_{\text{in}}$ (default = 3.84) or if
$P(F\text{-to-enter}_k) < P_{\text{in}}$ (default = 0.05). If the inequality does not hold, no
variable is entered.

At each step, all eligible variables are considered for removal and entry.

Forward
This procedure is the entry phase of the stepwise procedure.

Backward
This procedure is the removal phase of the stepwise procedure and can be used only
after at least one independent variable has been entered in the model.

Enter (Forced Entry)


Choose $X_k$ such that $r_{kk}$ is maximum and enter $X_k$. Repeat for all variables to
be entered.

Remove (Forced Removal)


Choose $X_k$ such that $r_{kk}$ is minimum and remove $X_k$. Repeat for all variables to
be removed.

Statistics
Summary
For the summary statistics, assume p independent variables are currently entered in
the equation, of which a block of q variables have been entered or removed in the
current step.

Multiple R

$$ R = \sqrt{1 - r_{yy}} $$

R Square

$$ R^2 = 1 - r_{yy} $$

Adjusted R Square

$$ R^2_{\text{adj}} = R^2 - \frac{\left( 1 - R^2 \right) p}{C - p^*} $$

R Square Change (when a block of $q$ independent variables is added or removed)

$$ \Delta R^2 = R^2_{\text{current}} - R^2_{\text{previous}} $$
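The summary statistics above depend only on the swept element $r_{yy}$ and the counts; a minimal sketch with illustrative argument names follows.

```python
def summary_statistics(r_yy, C, p, p_star):
    """Multiple R, R square and adjusted R square from the swept r_yy (sketch).

    C: sum of caseweights, p: independent variables in the equation,
    p_star: number of coefficients (p + 1 when an intercept is fitted).
    """
    r_square = 1.0 - r_yy
    multiple_r = r_square ** 0.5
    adj_r_square = r_square - (1.0 - r_square) * p / (C - p_star)
    return multiple_r, r_square, adj_r_square
```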

F Change and Significance of F Change

$$
\Delta F =
\begin{cases}
\dfrac{\Delta R^2\, \left( C - p^* \right)}{q\, \left( 1 - R^2_{\text{current}} \right)} & \text{for the addition of } q \text{ independent variables} \\[2.5ex]
\dfrac{\Delta R^2\, \left( C - p^* - q \right)}{q\, \left( R^2_{\text{previous}} - 1 \right)} & \text{for the removal of } q \text{ independent variables}
\end{cases}
$$

The degrees of freedom for the addition are $q$ and $C - p^*$, while the degrees of
freedom for the removal are $q$ and $C - p^* - q$.

Residual Sum of Squares

$$ SS_e = r_{yy}\, \left( C - 1 \right) S_{yy} $$

with degrees of freedom $C - p^*$.

Sum of Squares Due to Regression

$$ SS_R = R^2 \left( C - 1 \right) S_{yy} $$

with degrees of freedom p.



ANOVA Table

Analysis of Variance      df           Sum of Squares     Mean Square
Regression                $p$          $SS_R$             $SS_R / p$
Residual                  $C - p^*$    $SS_e$             $SS_e / (C - p^*)$

Variance-Covariance Matrix for Unstandardized Regression Coefficient Estimates

A square matrix of size $p$ with diagonal elements equal to the variances, below-diagonal
elements equal to the covariances, and above-diagonal elements equal to the correlations:

$$ \mathrm{var}(b_k) = \frac{r_{kk}\, r_{yy}\, S_{yy}}{S_{kk}\, \left( C - p^* \right)} $$

$$ \mathrm{cov}(b_k, b_j) = \frac{r_{kj}\, r_{yy}\, S_{yy}}{\sqrt{S_{kk}\, S_{jj}}\, \left( C - p^* \right)} $$

$$ \mathrm{cor}(b_k, b_j) = \frac{r_{kj}}{\sqrt{r_{kk}\, r_{jj}}} $$

Selection Criteria

Akaike Information Criterion (AIC)

$$ \mathrm{AIC} = C \ln\!\left( \frac{SS_e}{C} \right) + 2 p^* $$

Amemiya’s Prediction Criterion (PC)

$$ \mathrm{PC} = \frac{\left( 1 - R^2 \right) \left( C + p^* \right)}{C - p^*} $$

Mallows' Cp (CP)

$$ C_p = \frac{SS_e}{\hat{\sigma}^2} + 2 p^* - C $$

where $\hat{\sigma}^2$ is the mean square error from fitting the model that includes all the
variables in the variable list.

Schwarz Bayesian Criterion (SBC)

$$ \mathrm{SBC} = C \ln\!\left( \frac{SS_e}{C} \right) + p^* \ln(C) $$
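The four criteria can be computed directly from quantities already defined. A minimal sketch, with illustrative argument names (the mean square error of the full model is passed in as `sigma2_full`):

```python
import numpy as np

def selection_criteria(sse, C, p_star, r_square, sigma2_full):
    """AIC, Amemiya's PC, Mallows' Cp and SBC as defined above (sketch)."""
    aic = C * np.log(sse / C) + 2 * p_star
    pc = (1 - r_square) * (C + p_star) / (C - p_star)
    cp = sse / sigma2_full + 2 * p_star - C
    sbc = C * np.log(sse / C) + p_star * np.log(C)
    return aic, pc, cp, sbc
```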

Collinearity

Variance Inflation Factors

$$ \mathrm{VIF}_i = \frac{1}{r_{ii}} $$

Tolerance

$$ \mathrm{Tolerance}_i = r_{ii} $$

Eigenvalues, $\lambda_k$

The eigenvalues of the scaled and uncentered cross-product matrix for the
independent variables in the equation are computed by the QL method
(Wilkinson and Reinsch, 1971).

Condition Indices

$$ \eta_k = \sqrt{\frac{\max_j \lambda_j}{\lambda_k}} $$

Variance-Decomposition Proportions

Let

$$ \mathbf{v}_i = \left( v_{i1}, \ldots, v_{ip} \right) $$

be the eigenvector associated with eigenvalue $\lambda_i$. Also, let

$$ \Phi_{ij} = v_{ij}^2 / \lambda_i \quad \text{and} \quad \Phi_j = \sum_{i=1}^{p} \Phi_{ij} $$

The variance-decomposition proportion for the $j$th regression coefficient associated
with the $i$th component is defined as

$$ \pi_{ij} = \Phi_{ij} / \Phi_j $$
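A compact NumPy sketch of these collinearity diagnostics. It forms the scaled, uncentered cross-product matrix directly and uses a general symmetric eigensolver rather than the QL routine cited above; `X` (the design matrix of the variables in the equation, including a constant column when the model has an intercept) is an assumption of the sketch.

```python
import numpy as np

def collinearity_diagnostics(X):
    """Eigenvalues, condition indices and variance-decomposition proportions."""
    Xs = X / np.linalg.norm(X, axis=0)        # scale columns, do not center
    lam, V = np.linalg.eigh(Xs.T @ Xs)        # eigenvalues and eigenvectors
    cond_index = np.sqrt(lam.max() / lam)     # eta_k
    phi = (V.T ** 2) / lam[:, None]           # Phi_ij = v_ij^2 / lambda_i
    pi = phi / phi.sum(axis=0)                # pi_ij = Phi_ij / Phi_j
    return lam, cond_index, pi
```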

Statistics for Variables in the Equation

Regression Coefficient $b_k$

$$ b_k = r_{yk} \sqrt{\frac{S_{yy}}{S_{kk}}} \quad \text{for } k = 1, \ldots, p $$

The standard error of $b_k$ is computed as

$$ \hat{\sigma}_{b_k} = \sqrt{\frac{r_{kk}\, r_{yy}\, S_{yy}}{S_{kk}\, \left( C - p^* \right)}} $$

A 95% confidence interval for $b_k$ is constructed from

$$ b_k \pm \hat{\sigma}_{b_k}\, t_{0.025,\, C - p^*} $$
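A small sketch of the coefficient, its standard error and the 95% limits, using the swept elements and sample variances defined above. The SciPy quantile call and the absolute value on the swept diagonal (a sign-convention detail) are assumptions of the sketch.

```python
import numpy as np
from scipy import stats

def coef_and_interval(r_yk, r_kk, r_yy, S_kk, S_yy, C, p_star):
    """b_k, its standard error and 95% confidence limits (sketch)."""
    b_k = r_yk * np.sqrt(S_yy / S_kk)
    se = np.sqrt(abs(r_kk) * r_yy * S_yy / (S_kk * (C - p_star)))
    t = stats.t.ppf(0.975, C - p_star)        # t_{0.025, C - p*}
    return b_k, b_k - t * se, b_k + t * se
```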

If the model includes the intercept, the intercept is estimated as

$$ b_0 = \bar{Y} - \sum_{k=1}^{p} b_k \bar{X}_k $$

The variance of $b_0$ is estimated by

$$ \hat{\sigma}^2_{b_0} = \frac{\left( C - 1 \right) r_{yy}\, S_{yy}}{C \left( C - p^* \right)} + \sum_{k=1}^{p} \bar{X}_k^2\, \hat{\sigma}^2_{b_k} + 2 \sum_{j=1}^{p-1} \sum_{k=j+1}^{p} \bar{X}_k \bar{X}_j\, \widehat{\mathrm{cov}}\!\left( b_k, b_j \right) $$

Beta Coefficients

$$ \mathrm{Beta}_k = r_{yk} $$

The standard error of $\mathrm{Beta}_k$ is estimated by

$$ \hat{\sigma}_{\mathrm{Beta}_k} = \sqrt{\frac{r_{yy}\, r_{kk}}{C - p^*}} $$

F-test for $\mathrm{Beta}_k$

$$ F = \left( \frac{\mathrm{Beta}_k}{\hat{\sigma}_{\mathrm{Beta}_k}} \right)^{2} $$

with 1 and $C - p^*$ degrees of freedom.

Part Correlation of $X_k$ with $Y$

$$ \mathrm{Part\ Corr}(X_k) = \frac{r_{yk}}{\sqrt{r_{kk}}} $$

Partial Correlation of $X_k$ with $Y$

$$ \mathrm{Partial\ Corr}(X_k) = \frac{r_{yk}}{\sqrt{r_{kk}\, r_{yy} - r_{yk}\, r_{ky}}} $$

Statistics for Variables Not in the Equation

Standardized regression coefficient $\mathrm{Beta}^*_k$ if $X_k$ enters the equation at the next step:

$$ \mathrm{Beta}^*_k = \frac{r_{yk}}{r_{kk}} $$

The F-test for $\mathrm{Beta}^*_k$ is

$$ F = \frac{\left( C - p^* - 1 \right) r_{yk}^2}{r_{kk}\, r_{yy} - r_{yk}^2} $$

with 1 and $C - p^* - 1$ degrees of freedom.



Partial Correlation of $X_k$ with $Y$

$$ \mathrm{Partial}(X_k) = \frac{r_{yk}}{\sqrt{r_{yy}\, r_{kk}}} $$

Tolerance of $X_k$

$$ \mathrm{Tolerance}_k = r_{kk} $$

The minimum tolerance among variables already in the equation if $X_k$ enters at the next step is

$$ \min_{1 \le j \le p} \left\{ \frac{1}{\,r_{jj} - r_{kj}\, r_{jk} / r_{kk}\,},\ r_{kk} \right\} $$

Residuals and Associated Statistics


There are 19 temporary variables that can be added to the active system file. These
variables can be requested with the RESIDUAL subcommand.

Centered Leverage Values

For all cases, compute

$$
h_i =
\begin{cases}
\dfrac{g_i}{C - 1} \displaystyle\sum_{j=1}^{p} \sum_{k=1}^{p} \frac{\left( X_{ji} - \bar{X}_j \right) \left( X_{ki} - \bar{X}_k \right) r_{jk}}{\sqrt{S_{jj}\, S_{kk}}} & \text{if intercept is included} \\[3ex]
\dfrac{g_i}{C - 1} \displaystyle\sum_{j=1}^{p} \sum_{k=1}^{p} \frac{X_{ji}\, X_{ki}\, r_{jk}}{\sqrt{S_{jj}\, S_{kk}}} & \text{otherwise}
\end{cases}
$$

For selected cases, leverage is $h_i$; for an unselected case $i$ with positive caseweight,
leverage is

$$
h_i' =
\begin{cases}
g_i \left[ \left( \dfrac{1}{W} + h_i \right) \left( 1 + \dfrac{1}{W} + h_i \right)^{-1} - \dfrac{1}{W + 1} \right] & \text{if intercept is included} \\[2.5ex]
\dfrac{h_i}{1 + h_i\, g_i} & \text{otherwise}
\end{cases}
$$
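For the common special case $c_i = g_i = 1$, the double sum defining the centered leverage reduces to the hat-matrix diagonal of the centered predictors, which the sketch below exploits; the consistency check against the full hat matrix is illustrative only.

```python
import numpy as np

def centered_leverage(X):
    """Centered leverage values h_i for unweighted data (sketch)."""
    Xc = X - X.mean(axis=0)                         # center each predictor
    G = np.linalg.inv(Xc.T @ Xc)                    # (Xc' Xc)^-1
    return np.einsum("ij,jk,ik->i", Xc, G, Xc)      # diag(Xc G Xc')

# Full leverage h~_i = 1/n + h_i equals the hat-matrix diagonal:
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
X1 = np.column_stack([np.ones(len(X)), X])
H = X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
assert np.allclose(centered_leverage(X) + 1 / len(X), np.diag(H))
```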

Unstandardized Predicted Values

$$
\hat{Y}_i =
\begin{cases}
\displaystyle\sum_{k=1}^{p} b_k X_{ki} & \text{if no intercept} \\[2ex]
b_0 + \displaystyle\sum_{k=1}^{p} b_k X_{ki} & \text{otherwise}
\end{cases}
$$

Unstandardized Residuals

$$ e_i = Y_i - \hat{Y}_i $$

Standardized Residuals

$$
\mathrm{ZRESID}_i =
\begin{cases}
\dfrac{e_i}{s} & \text{if no regression weight is specified} \\[1.5ex]
\text{SYSMIS} & \text{otherwise}
\end{cases}
$$
where s is the square root of the residual mean square.



Standardized Predicted Values

$$
\mathrm{ZPRED}_i =
\begin{cases}
\dfrac{\hat{Y}_i - \bar{Y}}{sd} & \text{if no regression weight is specified} \\[1.5ex]
\text{SYSMIS} & \text{otherwise}
\end{cases}
$$

where $sd$ is computed as

$$ sd = \sqrt{ \sum_{i=1}^{l} \frac{c_i \left( \hat{Y}_i - \bar{Y} \right)^2}{C - 1} } $$

Studentized Residuals

$$
\mathrm{SRES}_i =
\begin{cases}
\dfrac{e_i}{s \sqrt{\left( 1 - \tilde{h}_i \right) / g_i}} & \text{for selected cases with } c_i > 0 \\[2.5ex]
\dfrac{e_i}{\tilde{s} \sqrt{\left( 1 + h_i \right) / g_i}} & \text{otherwise}
\end{cases}
$$

Deleted Residuals

$$
\mathrm{DRESID}_i =
\begin{cases}
\dfrac{e_i}{1 - \tilde{h}_i} & \text{for selected cases with } c_i > 0 \\[1.5ex]
e_i & \text{otherwise}
\end{cases}
$$

Studentized Deleted Residuals

$$
\mathrm{SDRESID}_i =
\begin{cases}
\dfrac{\mathrm{DRESID}_i}{s(i)} & \text{for selected cases with } c_i > 0 \\[2ex]
\dfrac{e_i}{\tilde{s} \sqrt{\left( 1 + h_i \right) / g_i}} & \text{otherwise}
\end{cases}
$$

where $s(i)$ is computed as

$$ s(i) = \sqrt{ \frac{1}{C - p^* - 1} \left[ \left( C - p^* \right) s^2 - \left( 1 - \tilde{h}_i \right) \mathrm{DRESID}_i^2 \right] } $$

Adjusted Predicted Values

$$ \mathrm{ADJPRED}_i = Y_i - \mathrm{DRESID}_i $$
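The following sketch evaluates the residual quantities above for selected, unweighted cases ($c_i = g_i = 1$), following the formulas as written (in particular, SDRESID is the deleted residual divided by $s(i)$); `X1` is assumed to already contain the constant column, so its hat diagonal plays the role of $\tilde{h}_i$.

```python
import numpy as np

def residual_diagnostics(y, X1):
    """SRES, DRESID and SDRESID for selected, unweighted cases (sketch)."""
    n, p_star = X1.shape
    b = np.linalg.lstsq(X1, y, rcond=None)[0]
    e = y - X1 @ b                                        # residuals
    h = np.einsum("ij,jk,ik->i", X1, np.linalg.inv(X1.T @ X1), X1)
    s2 = e @ e / (n - p_star)                             # residual mean square
    sres = e / np.sqrt(s2 * (1 - h))                      # studentized residuals
    dresid = e / (1 - h)                                  # deleted residuals
    s2_i = ((n - p_star) * s2 - (1 - h) * dresid ** 2) / (n - p_star - 1)
    sdresid = dresid / np.sqrt(s2_i)                      # studentized deleted
    return sres, dresid, sdresid
```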

DfBeta

$$ \mathrm{DFBETA}_i = b - b(i) = \frac{g_i\, e_i\, \left( X^t W X \right)^{-1} X_i^t}{1 - \tilde{h}_i} $$

where

$$
X_i^t =
\begin{cases}
\left( 1, X_{1i}, \ldots, X_{pi} \right) & \text{if intercept is included} \\
\left( X_{1i}, \ldots, X_{pi} \right) & \text{otherwise}
\end{cases}
$$

and $W = \mathrm{diag}\left( w_1, \ldots, w_l \right)$.

Standardized DfBeta

$$ \mathrm{SDBETA}_{ij} = \frac{b_j - b_j(i)}{s(i) \sqrt{\left[ \left( X^t W X \right)^{-1} \right]_{jj}}} $$

where $b_j - b_j(i)$ is the $j$th component of $b - b(i)$.
DfFit

$$ \mathrm{DFFIT}_i = X_i \left( b - b(i) \right) = \frac{\tilde{h}_i\, e_i}{1 - \tilde{h}_i} $$

Standardized DfFit

$$ \mathrm{SDFIT}_i = \frac{\mathrm{DFFIT}_i}{s(i) \sqrt{\tilde{h}_i}} $$

Covratio

$$ \mathrm{COVRATIO}_i = \left( \frac{s(i)}{s} \right)^{2 p^*} \times \frac{1}{1 - \tilde{h}_i} $$

Mahalanobis Distance

For selected cases with ci > 0 ,

$$
\mathrm{MAHAL}_i =
\begin{cases}
\left( C - 1 \right) h_i & \text{if intercept is included} \\
C\, h_i & \text{otherwise}
\end{cases}
$$

For unselected cases with $c_i > 0$,

$$
\mathrm{MAHAL}_i =
\begin{cases}
C\, h_i' & \text{if intercept is included} \\
\left( C + 1 \right) h_i' & \text{otherwise}
\end{cases}
$$

Cook’s Distance (Cook, 1977)

For selected cases with ci > 0

$$
\mathrm{COOK}_i =
\begin{cases}
\dfrac{\mathrm{DRESID}_i^2\, \tilde{h}_i\, g_i}{s^2 \left( p + 1 \right)} & \text{if intercept is included} \\[2.5ex]
\dfrac{\mathrm{DRESID}_i^2\, h_i\, g_i}{s^2\, p} & \text{otherwise}
\end{cases}
$$

For unselected cases with $c_i > 0$,

$$
\mathrm{COOK}_i =
\begin{cases}
\dfrac{\mathrm{DRESID}_i^2 \left( h_i' + \frac{1}{W} \right)}{\tilde{s}^2 \left( p + 1 \right)} & \text{if intercept is included} \\[2.5ex]
\dfrac{\mathrm{DRESID}_i^2\, h_i'}{\tilde{s}^2\, p} & \text{otherwise}
\end{cases}
$$

where $h_i'$ is the leverage for unselected case $i$, and $\tilde{s}^2$ is computed as

$$
\tilde{s}^2 =
\begin{cases}
\dfrac{1}{C - p} \left[ SS_e + e_i^2 \left( 1 - h_i' - \dfrac{1}{1 + W} \right) \right] & \text{if intercept is included} \\[2.5ex]
\dfrac{1}{C - p + 1} \left[ SS_e + e_i^2 \left( 1 - h_i' \right) \right] & \text{otherwise}
\end{cases}
$$
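For selected, unweighted cases with an intercept, the first case above reduces to the familiar Cook's distance; a minimal sketch with illustrative argument names follows.

```python
def cooks_distance(e, h_full, s2, p):
    """Cook's distance for selected, unweighted cases with an intercept (sketch).

    e: residuals, h_full: full leverages h~_i, s2: residual mean square,
    p: number of independent variables (so p + 1 coefficients).
    """
    dresid = e / (1 - h_full)                 # deleted residuals
    return dresid ** 2 * h_full / (s2 * (p + 1))
```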

Standard Errors of the Mean Predicted Values

For all cases with positive caseweight,

$$
\mathrm{SEPRED}_i =
\begin{cases}
s \sqrt{\tilde{h}_i / g_i} & \text{if intercept is included} \\[1ex]
s \sqrt{h_i / g_i} & \text{otherwise}
\end{cases}
$$

95% Confidence Interval for Mean Predicted Response

$$ \mathrm{LMCIN}_i = \hat{Y}_i - t_{0.025,\, C - p^*}\, \mathrm{SEPRED}_i $$

$$ \mathrm{UMCIN}_i = \hat{Y}_i + t_{0.025,\, C - p^*}\, \mathrm{SEPRED}_i $$

95% Confidence Interval for a Single Observation

$$
\mathrm{LICIN}_i =
\begin{cases}
\hat{Y}_i - t_{0.025,\, C - p^*}\, s \sqrt{\left( \tilde{h}_i + 1 \right) / g_i} & \text{if intercept is included} \\[1.5ex]
\hat{Y}_i - t_{0.025,\, C - p^*}\, s \sqrt{\left( h_i + 1 \right) / g_i} & \text{otherwise}
\end{cases}
$$

$$
\mathrm{UICIN}_i =
\begin{cases}
\hat{Y}_i + t_{0.025,\, C - p^*}\, s \sqrt{\left( \tilde{h}_i + 1 \right) / g_i} & \text{if intercept is included} \\[1.5ex]
\hat{Y}_i + t_{0.025,\, C - p^*}\, s \sqrt{\left( h_i + 1 \right) / g_i} & \text{otherwise}
\end{cases}
$$

Durbin-Watson Statistic

$$ DW = \frac{\displaystyle\sum_{i=2}^{l} \left( \tilde{e}_i - \tilde{e}_{i-1} \right)^2}{\displaystyle\sum_{i=1}^{l} c_i\, \tilde{e}_i^{\,2}} $$

where $\tilde{e}_i = e_i \sqrt{g_i}$.
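A direct transcription of the statistic, assuming the residuals are supplied in case order; the default unit weights are an assumption of the sketch.

```python
import numpy as np

def durbin_watson(e, g=None, c=None):
    """Durbin-Watson statistic as defined above (sketch)."""
    e = np.asarray(e, dtype=float)
    g = np.ones_like(e) if g is None else np.asarray(g, dtype=float)
    c = np.ones_like(e) if c is None else np.asarray(c, dtype=float)
    et = e * np.sqrt(g)                       # e~_i = e_i * sqrt(g_i)
    return np.sum(np.diff(et) ** 2) / np.sum(c * et ** 2)
```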

Partial Residual Plots


Scatterplots of the residuals of the dependent variable and of an independent
variable, when both of these variables are regressed on the rest of the independent
variables, can be requested in the RESIDUAL branch. The algorithm for these
residuals is described in Velleman and Welsch (1981).

Missing Values
By default, a case that has a missing value for any variable is deleted from the
computation of the correlation matrix on which all subsequent computations are
based. Users are allowed to change the treatment of cases with missing values.

References
Cook, R. D. 1977. Detection of influential observations in linear regression. Technometrics, 19: 15–18.

Dempster, A. P. 1969. Elements of Continuous Multivariate Analysis. Reading, Mass.: Addison-Wesley.

Velleman, P. F., and Welsch, R. E. 1981. Efficient computing of regression diagnostics. The American Statistician, 35: 234–242.

Wilkinson, J. H., and Reinsch, C. 1971. Linear algebra. In: Handbook for Automatic Computation, Volume II, J. H. Wilkinson and C. Reinsch, eds. New York: Springer-Verlag.
