Supplement 5 - Multiple Regression
M. Bremer
Note: We will reserve the term multiple regression for models with two or
more predictors and one response. There are also regression models with two or
more response variables. These models are usually called multivariate regression models.
In this chapter, we will introduce a new (linear algebra based) method for computing
the parameter estimates of multiple regression models. This more compact method
is convenient for models for which the number of unknown parameters is large.
Example: A multiple linear regression model with k predictor variables $X_1, X_2, \ldots, X_k$ and a response $Y$ can be written as
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \epsilon.$$
As before, the $\epsilon$ are the residual terms of the model and the distribution assumption we place on the residuals will allow us later to do inference on the remaining model parameters. Interpret the meaning of the regression coefficients $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ in this model.
More complex models may include higher powers of one or more predictor variables, e.g.,
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \epsilon \qquad (1)$$
or interaction effects of two or more variables, e.g.,
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \epsilon \qquad (2)$$
Note: Models of this type can be called linear regression models as they can be written as linear combinations of the $\beta$-parameters in the model. The $x$-terms are the weights, and it does not matter that they may be non-linear in $x$. Confusingly, models of type (1) are also sometimes called non-linear regression models or polynomial regression models, as the regression curve is not a line. Models of type (2) are usually called linear models with interaction terms.
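As a concrete illustration (not part of the original notes), models of types (1) and (2) could be specified in R roughly as follows, assuming a data frame dat with columns y, x1 and x2 (the names are assumptions):

    # Hypothetical data frame 'dat' with columns y, x1, x2 (names are assumptions).
    fit_poly  <- lm(y ~ x1 + I(x1^2), data = dat)      # type (1): quadratic in x1
    fit_inter <- lm(y ~ x1 + x2 + x1:x2, data = dat)   # type (2): interaction term
    # The shorthand x1 * x2 expands to x1 + x2 + x1:x2.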
It helps to develop a little geometric intuition when working with regression models.
Models with two predictor variables (say x1 and x2 ) and a response variable y can
be understood as a two-dimensional surface in space. The shape of this surface
depends on the structure of the model. The observations are points in space and
the surface is fitted to best approximate the observations.
Example: The simplest multiple regression model for two predictor variables is
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$$
The surface that corresponds to the model
$$y = 50 + 10 x_1 + 7 x_2$$
looks like this. It is a plane in $\mathbb{R}^3$ with different slopes in the $x_1$ and $x_2$ directions.
[Figure: surface plot of the plane $y = 50 + 10x_1 + 7x_2$.]
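For reference, a surface like the one in the figure can be drawn in R with persp(); this is only a sketch, and the plotting range is chosen arbitrarily for illustration:

    # Sketch: drawing the plane y = 50 + 10*x1 + 7*x2 (range chosen for illustration).
    x1 <- seq(-10, 10, length.out = 30)
    x2 <- seq(-10, 10, length.out = 30)
    y  <- outer(x1, x2, function(a, b) 50 + 10 * a + 7 * b)
    persp(x1, x2, y, theta = 30, phi = 20, xlab = "x1", ylab = "x2", zlab = "y")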
Example: For a simple linear model with two predictor variables and an interaction
term, the surface is no longer flat but curved.
y = 10 + x1 + x2 + x1 x2
[Figure: surface plot of $y = 10 + x_1 + x_2 + x_1 x_2$; the interaction term makes the surface curved.]
Example: Polynomial regression models with two predictor variables and interaction terms are quadratic forms. Their surfaces can have many different shapes depending on the values of the model parameters, with the contour lines being either parallel lines, parabolas or ellipses.
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \epsilon$$
[Figures: surface plots of quadratic models for different choices of the parameters, illustrating the variety of possible shapes.]
As in simple linear regression, the parameters are estimated from $n$ observations. In observation form the model reads
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \epsilon_i, \qquad i = 1, \ldots, n$$
You can think of the observations as points in $(k+1)$-dimensional space if you like. Our goal in least-squares regression is to fit a hyper-plane into $(k+1)$-dimensional space that minimizes the sum of squared residuals
$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij} \right)^2$$
Minimizing this expression with respect to $\beta_0, \beta_1, \ldots, \beta_k$ leads to the least-squares normal equations
$$n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_{i1} + \hat\beta_2 \sum_{i=1}^{n} x_{i2} + \cdots + \hat\beta_k \sum_{i=1}^{n} x_{ik} = \sum_{i=1}^{n} y_i$$
$$\hat\beta_0 \sum_{i=1}^{n} x_{i1} + \hat\beta_1 \sum_{i=1}^{n} x_{i1}^2 + \hat\beta_2 \sum_{i=1}^{n} x_{i1} x_{i2} + \cdots + \hat\beta_k \sum_{i=1}^{n} x_{i1} x_{ik} = \sum_{i=1}^{n} x_{i1} y_i$$
$$\vdots$$
$$\hat\beta_0 \sum_{i=1}^{n} x_{ik} + \hat\beta_1 \sum_{i=1}^{n} x_{ik} x_{i1} + \hat\beta_2 \sum_{i=1}^{n} x_{ik} x_{i2} + \cdots + \hat\beta_k \sum_{i=1}^{n} x_{ik}^2 = \sum_{i=1}^{n} x_{ik} y_i$$
These equations are much more conveniently formulated with the help of vectors
and matrices.
Note: Bold-faced lower case letters will now denote vectors and bold-faced upper case letters will denote matrices. Greek letters cannot be bold-faced in LaTeX.
Whether a Greek letter denotes a random variable or a vector of random variables
should be clear from the context, hopefully.
$$\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad
\mathbf{X} = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}, \qquad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}, \qquad
\epsilon = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}$$
With this compact notation, the linear regression model can be written in the form
$$\mathbf{y} = \mathbf{X}\beta + \epsilon$$
In linear algebra terms, the least-squares parameter estimates are the vectors $\hat\beta$ that minimize
$$\sum_{i=1}^{n} \epsilon_i^2 = \epsilon'\epsilon = (\mathbf{y} - \mathbf{X}\beta)'(\mathbf{y} - \mathbf{X}\beta)$$
[Figure: the fitted vector $\hat{\mathbf{y}}$ as the orthogonal projection of $\mathbf{y}$ onto the column space of $\mathbf{X}$.]
The $\hat{y}$ are the predicted values in our regression model that all lie on the regression hyper-plane. Suppose further that $\hat\beta$ satisfies the equation above. Then the residuals $\mathbf{y} - \hat{\mathbf{y}}$ are orthogonal to the columns of $\mathbf{X}$ (by the Orthogonal Decomposition Theorem) and thus
$$\mathbf{X}'(\mathbf{y} - \mathbf{X}\hat\beta) = 0$$
$$\mathbf{X}'\mathbf{y} - \mathbf{X}'\mathbf{X}\hat\beta = 0$$
$$\mathbf{X}'\mathbf{X}\hat\beta = \mathbf{X}'\mathbf{y}$$
These vector normal equations are the same normal equations that one could obtain from taking derivatives. To solve the normal equations (i.e., to find the parameter estimates $\hat\beta$), multiply both sides by the inverse of $\mathbf{X}'\mathbf{X}$. Thus, the least-squares estimator of $\beta$ is (in vector form)
$$\hat\beta = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$
This of course works only if the inverse exists. If the inverse does not exist, the normal equations can still be solved, but the solution may not be unique. The inverse of $\mathbf{X}'\mathbf{X}$ exists if the columns of $\mathbf{X}$ are linearly independent, that is, if no column can be written as a linear combination of the other columns.
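As a sketch of this formula (using an illustrative response vector y and predictors x1, x2 that are not part of the original notes), the estimates can be computed directly from the normal equations and compared with R's built-in least-squares fit:

    # Illustrative names: y, x1, x2 are assumed numeric vectors of equal length.
    X        <- cbind(1, x1, x2)                  # design matrix with a column of ones
    beta_hat <- solve(t(X) %*% X, t(X) %*% y)     # solves (X'X) beta = X'y
    coef(lm(y ~ x1 + x2))                         # same estimates from lm()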
The vector of fitted values $\hat{\mathbf{y}}$ in a linear regression model can be expressed as
$$\hat{\mathbf{y}} = \mathbf{X}\hat\beta = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \mathbf{H}\mathbf{y}$$
The $n \times n$ matrix $\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ is often called the hat-matrix. It maps the vector of observed values $\mathbf{y}$ onto the vector of fitted values $\hat{\mathbf{y}}$ that lie on the regression hyper-plane. The regression residuals can be written in different ways as
$$\hat\epsilon = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{X}\hat\beta = \mathbf{y} - \mathbf{H}\mathbf{y} = (\mathbf{I} - \mathbf{H})\mathbf{y}$$
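Continuing the sketch above (same assumed X and y), the hat matrix, fitted values and residuals can be computed directly:

    H     <- X %*% solve(t(X) %*% X) %*% t(X)     # n x n hat matrix
    y_hat <- H %*% y                              # fitted values H y
    e_hat <- (diag(nrow(X)) - H) %*% y            # residuals (I - H) y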
[Figure: scatterplot matrix of the Delivery Time data with the variables Time, Cases, and Distance.]
Look at the panels that describe the relationship between the response (here time)
and the predictors. Make sure that the pattern is somewhat linear (look for obvious
curves in which case the simple linear model without powers or interaction terms
would not be a good fit).
Caution: Do not rely too much on a panel of scatterplots to judge how well a multiple linear regression really works; it can be very hard to see. A perfectly fitting model can look like a random confetti plot if the predictor variables are themselves correlated.
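A scatterplot matrix like the one above can be produced with pairs(); the data frame name delivery and its column names time, cases and distance are assumptions used for illustration in these sketches:

    # Assumed data frame 'delivery' with columns time, cases, distance.
    pairs(delivery[, c("time", "cases", "distance")])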
If a regression model has only two predictor variables, it is also possible to create a
three-dimensional plot of the observations.
[Figure: three-dimensional scatterplot of the Delivery Time data, with Time plotted against Cases and Distance.]
Thus,
$$\hat\beta = \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = \begin{pmatrix} 2.34123115 \\ 1.61590721 \\ 0.01438483 \end{pmatrix}$$
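A sketch of the corresponding fit in R, again assuming a data frame delivery with columns time, cases and distance (the names are illustrative, not taken from the original output):

    fit <- lm(time ~ cases + distance, data = delivery)
    coef(fit)
    # should reproduce the estimates above:
    # (Intercept) = 2.34123115, cases = 1.61590721, distance = 0.01438483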
Example: Read off the estimated residual variance from the output shown above.
Example: Find the covariance matrix of the least squares estimate vector $\hat\beta$.
The estimate of the residual variance can still be found via the residual sum of squares $SS_{Res}$, which has the same definition as in the simple linear regression case:
$$SS_{Res} = \sum_{i=1}^{n} \hat\epsilon_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
If the multiple regression model contains $k$ predictors, then the residual sum of squares has $n - k - 1$ degrees of freedom (we lose one degree of freedom for the estimation of each slope and the intercept). Thus
$$MS_{Res} = \frac{SS_{Res}}{n - k - 1} = \hat\sigma^2$$
The residual variance is model dependent: its estimate changes if additional predictor variables are included in the model or if predictors are removed. It is hard to say which one is the correct residual variance. We will learn later how to compare different models with each other. In general, a smaller residual variance is preferred in a model.
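Continuing the sketch of the delivery time fit, the residual variance estimate could be obtained as follows:

    n      <- nrow(delivery)
    k      <- 2                                   # number of predictors
    SS_Res <- sum(residuals(fit)^2)               # residual sum of squares
    MS_Res <- SS_Res / (n - k - 1)                # estimated residual variance
    summary(fit)$sigma^2                          # the same quantity from summary()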
Thus,
$$\hat\beta = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}, \qquad \hat\sigma^2 = \frac{(\mathbf{y} - \mathbf{X}\hat\beta)'(\mathbf{y} - \mathbf{X}\hat\beta)}{n}$$
$$SS_R = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad SS_{Res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \mathbf{y}'\mathbf{y} - \hat\beta'\mathbf{X}'\mathbf{y}, \qquad SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \mathbf{y}'\mathbf{y} - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2$$
The test statistic for the significance-of-regression test ($H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$) is
$$F = \frac{SS_R/k}{SS_{Res}/(n-k-1)} = \frac{MS_R}{MS_{Res}} \sim F_{k,\, n-k-1}$$
If the value of this test statistic is large, then the regression works well and at
least one predictor in the model is relevant for the response. The F -test statistic
and p-value are reported in the regression ANOVA table (columns F value and
Pr(>F)).
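In the sketch of the delivery time fit, the overall F-statistic and its degrees of freedom can be read from the model summary:

    summary(fit)              # last line reports the F-statistic and its p-value
    summary(fit)$fstatistic   # F value with numerator and denominator df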
Example: Read off and interpret the result of the F-test for significance of regression in the Delivery Time Example.
Assessing Model Adequacy: There are several ways in which to judge how well a specific model fits. We have already seen that, in general, a smaller residual variance is desirable. Other quantities that describe the goodness of fit of the model are $R^2$ and adjusted $R^2$. Recall that in the simple linear regression model, $R^2$ was simply the square of the correlation coefficient between the predictor and the response. This is no longer true in the multiple regression model. But there is another interpretation for $R^2$: in general, $R^2$ is the proportion of variation in the response that is explained through the regression on all the predictors in the model. Including more predictors in a multiple regression model will never decrease (and typically increases) the value of $R^2$, but using more predictors is not necessarily better. To weigh the proportion of variation explained against the number of predictors, we can use the adjusted $R^2$.
$$R^2_{Adj} = 1 - \frac{SS_{Res}/(n-k-1)}{SS_T/(n-1)}$$
Here, $k$ is the number of predictors in the current model, and $SS_{Res}/(n-k-1)$ is actually the estimated residual variance of the model with $k$ predictors. The adjusted $R^2$ does not automatically increase when more predictors are added to the model, and it can be used as one tool in the arsenal of finding the best model for a given data set. A higher adjusted $R^2$ indicates a better fitting model.
Example: For the Delivery Time data, find R2 and the adjusted R2 for the model
with both predictor variables in the R-output.
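A sketch of how these quantities could be read off (or recomputed) for the delivery time fit from the earlier sketches:

    summary(fit)$r.squared                            # R-squared
    summary(fit)$adj.r.squared                        # adjusted R-squared
    SS_T <- sum((delivery$time - mean(delivery$time))^2)
    1 - (SS_Res / (n - k - 1)) / (SS_T / (n - 1))     # adjusted R-squared by hand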
Testing Individual Regression Coefficients: As in the simple linear regression model, we can formulate individual hypothesis tests for each slope (or even the intercept) in the model. For instance,
$$H_0: \beta_j = 0 \quad \text{vs.} \quad H_A: \beta_j \neq 0$$
tests whether the slope associated with the $j$th predictor is significantly different from zero. The test statistic for this test is
$$t = \frac{\hat\beta_j}{se(\hat\beta_j)} \sim t(df = n - k - 1)$$
Here, $se(\hat\beta_j)$ is the square root of the $j$th diagonal entry of the estimated covariance matrix $\hat\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$ of $\hat\beta$. This test is a marginal test.
Note: As we've seen before, every two-sided hypothesis test for a regression slope can also be reformulated as a confidence interval for the same slope. The 95% confidence intervals for the slopes can also be computed by R (command confint()).
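For the delivery time sketch, the marginal t-tests and the corresponding confidence intervals are obtained as:

    summary(fit)$coefficients    # estimates, standard errors, t values, Pr(>|t|)
    confint(fit, level = 0.95)   # 95% confidence intervals for intercept and slopes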
To test whether a subset of the slopes is zero simultaneously, partition the vector of regression coefficients as
$$\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}$$
where $\beta_1$ contains the intercept and the slopes for the first $k - p$ predictors and $\beta_2$ contains the remaining $p$ slopes. We want to test
$$H_0: \beta_2 = 0 \quad \text{vs.} \quad H_A: \beta_2 \neq 0$$
We will compare two alternative regression models to each other:
$$\text{(Full Model)} \quad \mathbf{y} = \mathbf{X}\beta + \epsilon \quad \text{with } SS_R(\beta) = \hat\beta'\mathbf{X}'\mathbf{y} \ \text{($k$ degrees of freedom)}$$
$$\text{(Reduced Model)} \quad \mathbf{y} = \mathbf{X}_1\beta_1 + \epsilon \quad \text{with } SS_R(\beta_1) = \hat\beta_1'\mathbf{X}_1'\mathbf{y} \ \text{($k - p$ degrees of freedom)}$$
With this notation, the regression sum of squares that describes the contribution of the slopes in $\beta_2$, given that $\beta_1$ is already in the model, becomes
$$SS_R(\beta_2 | \beta_1) = SS_R(\beta_1, \beta_2) - SS_R(\beta_1)$$
The test statistic that tests the hypotheses described above is
$$F = \frac{SS_R(\beta_2|\beta_1)/p}{MS_{Res}} \sim F_{p,\, n-k-1}$$
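A sketch of such a partial F-test in R, using the delivery time fit and (as an illustrative choice) letting the distance slope play the role of $\beta_2$:

    reduced <- lm(time ~ cases, data = delivery)             # model with beta_1 only
    full    <- lm(time ~ cases + distance, data = delivery)  # full model
    anova(reduced, full)   # extra-sum-of-squares F-test with p = 1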
More generally, an F-test of a linear hypothesis that imposes $r$ independent restrictions on the regression coefficients uses the test statistic
$$F = \frac{SS_H / r}{SS_{Res}/(n-k-1)} \sim F_{r,\, n-k-1}$$
where $SS_H$ is the sum of squares due to the hypothesis.
[Figure: joint (elliptical) confidence region for an intercept and slope.]
Other methods for constructing simultaneous confidence intervals include the Bonferroni method, which effectively splits the significance level $\alpha$ into as many equal portions as confidence intervals need to be computed (say $p$) and then computes each interval individually at level $1 - \alpha/p$.
A $(1-\alpha)\cdot 100\%$ prediction interval for a new observation at $\mathbf{x}_0$ is
$$\hat{y}_0 \pm t_{\alpha/2,\, n-k-1} \sqrt{\hat\sigma^2 \left( 1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0 \right)}$$
Example: For the Delivery Time data, calculate a 95% prediction interval for the
time it takes to restock a vending machine with x1 = 8 cases if the driver has to
walk x2 = 275 feet.
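A sketch of this computation with predict(), using the assumed column names from the earlier delivery time sketches:

    new_obs <- data.frame(cases = 8, distance = 275)
    predict(fit, newdata = new_obs, interval = "prediction", level = 0.95)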
Note: In an introductory regression class, you may have learned that it is dangerous to predict new observations outside of the range of data you have collected. For
instance, if you have data on the ages and heights of young girls, all between age 2
and 12, it would not be a good idea to use that linear regression model to predict
the height of a 25-year-old woman. This concept of being "outside the range" of the data has to be extended in multiple linear regression.
[Figure: an elliptical region containing the original data in the $(x_1, x_2)$ plane, together with a point $(x, y)$ that lies outside the ellipse but within the individual ranges of $x_1$ and $x_2$.]
Consider a regression problem with two predictor variables in which the collected data all fall within the ellipse in the picture shown on the right. The point $(x, y)$ has coordinates that are each within the ranges of the observed variables individually, but it would still not be a good idea to predict the value of the response at this point, because we have no data to check the validity of the model in the vicinity of the point.
Standardized regression coefficients: Two common ways to scale the data are unit normal scaling,
$$z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad y_i^* = \frac{y_i - \bar{y}}{s_y}, \qquad i = 1, \ldots, n,$$
and unit length scaling,
$$w_{ij} = \frac{x_{ij} - \bar{x}_j}{\sqrt{S_{jj}}}, \qquad y_i^0 = \frac{y_i - \bar{y}}{\sqrt{SS_T}}, \qquad i = 1, \ldots, n,$$
where $S_{jj} = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2$ is the corrected sum of squares for regressor $x_j$. In this case the regression model becomes
$$y_i^0 = b_1 w_{i1} + b_2 w_{i2} + \cdots + b_k w_{ik} + \epsilon_i, \qquad i = 1, \ldots, n$$
and the vector of scaled least-squares regression coefficients is $\hat{\mathbf{b}} = (\mathbf{W}'\mathbf{W})^{-1}\mathbf{W}'\mathbf{y}^0$. The $\mathbf{W}'\mathbf{W}$ matrix is the correlation matrix for the $k$ predictor variables, i.e., $(\mathbf{W}'\mathbf{W})_{ij}$ is simply the correlation between $x_i$ and $x_j$.
The matrices $\mathbf{Z}$ in unit normal scaling and $\mathbf{W}$ in unit length scaling are closely related, and both methods will produce the exact same standardized regression coefficients $\hat{\mathbf{b}}$. The relationship between the original and scaled coefficients is
$$\hat\beta_j = \hat{b}_j \left( \frac{SS_T}{S_{jj}} \right)^{1/2}, \qquad j = 1, 2, \ldots, k$$
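A sketch of unit length scaling applied to the (assumed) delivery data, showing that $\mathbf{W}'\mathbf{W}$ is the predictor correlation matrix and recovering the standardized slopes:

    center <- function(v) v - mean(v)
    W  <- apply(delivery[, c("cases", "distance")], 2,
                function(v) center(v) / sqrt(sum(center(v)^2)))
    y0 <- center(delivery$time) / sqrt(sum(center(delivery$time)^2))
    t(W) %*% W                                 # correlation matrix of the predictors
    b_hat <- solve(t(W) %*% W, t(W) %*% y0)    # standardized (scaled) slopes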
Multicollinearity
In theory, one would like to have predictors in a multiple regression model that each have a different influence on the response and are independent of each other. In practice, the predictor variables are often correlated among themselves. Multicollinearity is the prevalence of near-linear dependence among the regressors.
If one regressor were a linear combination of the other regressors, then the matrix $\mathbf{X}$ (whose columns are the regressors) would have linearly dependent columns, which would make the matrix $\mathbf{X}'\mathbf{X}$ singular (non-invertible). In practice, it would mean
that the predictor that can be expressed through the other predictors cannot contribute any new information about the response. But, worse than that, the linear
dependence of the predictors makes the estimated slopes in the regression model
arbitrary.
Example: Consider a regression model in which somebody's height (in inches) is expressed as a function of arm-span (in inches). Suppose the true regression equation is
$$y = 12 + 1.1x$$
Now, suppose further that when measuring the arm span, two people took independent measurements in inches ($x_1$) and in centimeters ($x_2$) of the same subjects and both variables have erroneously been included in the same linear regression model:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$$
We know that in this case $x_2 = 0.394 x_1$, and thus we should have $\beta_1 + 0.394\beta_2 = 1.1$ in theory. But since this is a single equation with two unknowns, there are infinitely many possible solutions, some quite nonsensical. For instance, we could have $\beta_1 = -2.7$ and $\beta_2 = 9.645$. Of course, these slopes are not interpretable in the context of the original problem. The computer used to fit the data and to compute parameter estimates cannot distinguish between sensible and nonsensical estimates.
How can you tell whether you have multicollinearity in your data? Suppose your data have been standardized, so that $\mathbf{X}'\mathbf{X}$ is the correlation matrix for the $k$ predictors in the model. The main diagonal elements of the inverse of the predictor correlation matrix are called the variance inflation factors (VIF). The larger these factors are, the more you should worry about multicollinearity in your model. At the other extreme, VIFs of 1 mean that the predictors are all orthogonal.
In general, the variance inflation factor for the $j$th regressor coefficient can be computed as
$$VIF_j = \frac{1}{1 - R_j^2}$$
where $R_j^2$ is the coefficient of determination obtained from regressing $x_j$ on the remaining $k - 1$ predictors.
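A sketch of computing the VIFs for the (assumed) delivery data directly from the predictor correlation matrix; with the car package installed, car::vif(fit) reports the same values:

    R_x <- cor(delivery[, c("cases", "distance")])   # predictor correlation matrix
    diag(solve(R_x))                                 # VIF_j = j-th diagonal of its inverse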