
Lecture 29

Simple linear regression.

29.1 Method of least squares.


Suppose that we are given a sequence of observations
$$(X_1, Y_1), \ldots, (X_n, Y_n)$$
where each observation is a pair of numbers $(X_i, Y_i)$. Suppose that we want to
predict the variable $Y$ as a function of $X$ because we believe that there is some underlying
relationship between $Y$ and $X$ and, for example, $Y$ can be approximated by a function
of $X$, i.e. $Y \approx f(X)$. We will consider the simplest case when $f(x)$ is a linear function
of $x$:
$$f(x) = \beta_0 + \beta_1 x.$$

Figure 29.1: The least-squares line.

Of course, we want to find the line that fits our data best and one can define the
measure of the quality of the fit in many different ways. The most common approach

is to measure how $Y_i$ is approximated by $\beta_0 + \beta_1 X_i$ in terms of the squared difference
$(Y_i - (\beta_0 + \beta_1 X_i))^2$, which means that we measure the quality of approximation globally
by the loss function
$$L = \sum_{i=1}^{n} (\underbrace{Y_i}_{\text{actual}} - \underbrace{(\beta_0 + \beta_1 X_i)}_{\text{estimate}})^2 \ \longrightarrow\ \text{minimize over } \beta_0, \beta_1$$
and we want to minimize it over all choices of parameters $\beta_0, \beta_1$. The line that minimizes this loss is called the least-squares line. To find the critical points we write:
$$\frac{\partial L}{\partial \beta_0} = -\sum_{i=1}^{n} 2\,(Y_i - (\beta_0 + \beta_1 X_i)) = 0$$
$$\frac{\partial L}{\partial \beta_1} = -\sum_{i=1}^{n} 2\,(Y_i - (\beta_0 + \beta_1 X_i))\,X_i = 0$$

If we introduce the notations
$$\bar{X} = \frac{1}{n}\sum X_i, \quad \bar{Y} = \frac{1}{n}\sum Y_i, \quad \overline{X^2} = \frac{1}{n}\sum X_i^2, \quad \overline{XY} = \frac{1}{n}\sum X_i Y_i,$$
then the critical point conditions can be rewritten as
$$\beta_0 + \beta_1 \bar{X} = \bar{Y} \quad \text{and} \quad \beta_0 \bar{X} + \beta_1 \overline{X^2} = \overline{XY},$$
and solving it for $\beta_0$ and $\beta_1$ we get
$$\hat{\beta}_1 = \frac{\overline{XY} - \bar{X}\bar{Y}}{\overline{X^2} - \bar{X}^2} \quad \text{and} \quad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}.$$
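As a quick numerical illustration of these closed-form formulas (the data below are made up, and Python/numpy is just one convenient choice), the least-squares line can be computed directly from the sample averages:

```python
import numpy as np

# Illustrative data (made up): n observations (X_i, Y_i).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Sample averages used in the closed-form solution.
X_bar  = X.mean()            # (1/n) sum X_i
Y_bar  = Y.mean()            # (1/n) sum Y_i
X2_bar = (X ** 2).mean()     # (1/n) sum X_i^2
XY_bar = (X * Y).mean()      # (1/n) sum X_i Y_i

beta1_hat = (XY_bar - X_bar * Y_bar) / (X2_bar - X_bar ** 2)
beta0_hat = Y_bar - beta1_hat * X_bar

print(beta0_hat, beta1_hat)  # intercept and slope of the least-squares line
```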
If each $X_i$ is a vector $X_i = (X_{i1}, \ldots, X_{ik})$ of dimension $k$ then we can try to
approximate the $Y_i$'s as a linear function of the coordinates of $X_i$:
$$Y_i \approx f(X_i) = \beta_0 + \beta_1 X_{i1} + \ldots + \beta_k X_{ik}.$$
In this case one can also minimize the squared loss:
$$L = \sum (Y_i - (\beta_0 + \beta_1 X_{i1} + \ldots + \beta_k X_{ik}))^2 \ \longrightarrow\ \text{minimize over } \beta_0, \beta_1, \ldots, \beta_k$$
by taking the derivatives and solving the system of linear equations to find the parameters $\beta_0, \ldots, \beta_k$.
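A minimal sketch of this multivariate case (again with made-up data): stacking the vectors $X_i$, with a leading column of ones for the intercept, into a design matrix turns the problem into a standard linear least-squares problem, which numpy can solve directly (equivalently, one could form and solve the normal equations).

```python
import numpy as np

# Illustrative data: each X_i is a k-dimensional vector (here k = 2).
rng = np.random.default_rng(0)
n, k = 50, 2
X = rng.normal(size=(n, k))
Y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Prepend a column of ones so that beta_0 plays the role of the intercept.
A = np.column_stack([np.ones(n), X])

# Minimizing the squared loss is a linear least-squares problem in (beta_0, ..., beta_k).
beta_hat, *_ = np.linalg.lstsq(A, Y, rcond=None)
print(beta_hat)  # [beta_0, beta_1, ..., beta_k]
```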

29.2 Simple linear regression.


First of all, when the response variable $Y$ in a random couple $(X, Y)$ is predicted as
a function of $X$ then one can model this situation by
$$Y = f(X) + \varepsilon$$
where the random variable $\varepsilon$ is independent of $X$ (it is often called random noise)
and on average it is equal to zero: $\mathbb{E}\varepsilon = 0$. For a fixed $X$, the response variable $Y$ in
this model on average will be equal to $f(X)$ since
$$\mathbb{E}(Y \mid X) = \mathbb{E}(f(X) + \varepsilon \mid X) = f(X) + \mathbb{E}(\varepsilon \mid X) = f(X) + \mathbb{E}\varepsilon = f(X),$$
and $f(x) = \mathbb{E}(Y \mid X = x)$ is called the regression function.


Next, we will consider a simple linear regression model in which the regression
function is linear, i.e. $f(x) = \beta_0 + \beta_1 x$, and the response variable $Y$ is modeled as
$$Y = f(X) + \varepsilon = \beta_0 + \beta_1 X + \varepsilon,$$
where the random noise $\varepsilon$ is assumed to have normal distribution $N(0, \sigma^2)$.
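For concreteness, here is a small simulated sample from this model (the parameter values below are arbitrary choices for illustration):

```python
import numpy as np

# Simulate one sample from Y = beta_0 + beta_1 * X + eps, eps ~ N(0, sigma^2).
rng = np.random.default_rng(1)
beta0, beta1, sigma = 1.0, 2.0, 0.5

n = 100
X = np.linspace(0.0, 10.0, n)                    # think of the X_i as fixed design points
eps = rng.normal(loc=0.0, scale=sigma, size=n)   # random noise
Y = beta0 + beta1 * X + eps
```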


Suppose that we are given a sequence $(X_1, Y_1), \ldots, (X_n, Y_n)$ that is described by
the above model:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
and $\varepsilon_1, \ldots, \varepsilon_n$ are i.i.d. $N(0, \sigma^2)$. We have three unknown parameters, $\beta_0$, $\beta_1$ and $\sigma^2$,
and we want to estimate them using the given sample. Let us think of the points
$X_1, \ldots, X_n$ as fixed and non-random and deal with the randomness that comes from
the noise variables $\varepsilon_i$. For a fixed $X_i$, the distribution of $Y_i$ is equal to $N(f(X_i), \sigma^2)$
with p.d.f.
$$f(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y - f(X_i))^2}{2\sigma^2}}$$
and the likelihood function of the sequence $Y_1, \ldots, Y_n$ is:
$$f(Y_1, \ldots, Y_n) = \Big(\frac{1}{\sqrt{2\pi}\,\sigma}\Big)^n e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(Y_i - f(X_i))^2} = \Big(\frac{1}{\sqrt{2\pi}\,\sigma}\Big)^n e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(Y_i - \beta_0 - \beta_1 X_i)^2}.$$

Let us find the maximum likelihood estimates of $\beta_0$, $\beta_1$ and $\sigma^2$ that maximize this
likelihood function. First of all, it is obvious that for any $\sigma^2$ we need to minimize
$$\sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$$
over $\beta_0, \beta_1$, which is the same as finding the least-squares line and, therefore, the MLEs
of $\beta_0$ and $\beta_1$ are given by
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \quad \text{and} \quad \hat{\beta}_1 = \frac{\overline{XY} - \bar{X}\bar{Y}}{\overline{X^2} - \bar{X}^2}.$$
Finally, to find the MLE of $\sigma^2$ we maximize the likelihood over $\sigma^2$ and get:
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2.$$
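As a sanity check on this derivation (a sketch with made-up parameter values), one can compare the closed-form estimates with a direct numerical maximization of the likelihood:

```python
import numpy as np
from scipy.optimize import minimize

# Simulated sample from the model (parameter values are arbitrary).
rng = np.random.default_rng(0)
n = 200
beta0, beta1, sigma = 1.0, 2.0, 0.5
X = rng.uniform(0.0, 10.0, size=n)
Y = beta0 + beta1 * X + rng.normal(0.0, sigma, size=n)

# Closed-form MLEs from the lecture.
X_bar, Y_bar = X.mean(), Y.mean()
b1_hat = ((X * Y).mean() - X_bar * Y_bar) / ((X ** 2).mean() - X_bar ** 2)
b0_hat = Y_bar - b1_hat * X_bar
sigma2_hat = np.mean((Y - b0_hat - b1_hat * X) ** 2)

# Direct numerical maximization of the log-likelihood
# (parameterized by log sigma so that sigma stays positive).
def neg_log_likelihood(theta):
    a, b, log_s = theta
    s2 = np.exp(2.0 * log_s)
    resid = Y - a - b * X
    return 0.5 * n * np.log(2.0 * np.pi * s2) + np.sum(resid ** 2) / (2.0 * s2)

res = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0, 0.0]))
print(b0_hat, b1_hat, sigma2_hat)                   # closed-form MLEs
print(res.x[0], res.x[1], np.exp(2.0 * res.x[2]))   # numerical MLEs (should agree)
```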

Let us now compute the joint distribution of $\hat{\beta}_0$ and $\hat{\beta}_1$. Since the $X_i$'s are fixed, these
estimates are written as linear combinations of the $Y_i$'s, which have normal distributions,
and, as a result, $\hat{\beta}_0$ and $\hat{\beta}_1$ will have normal distributions. All we need to do is find
their means, variances and covariance. First, if we write $\hat{\beta}_1$ as
$$\hat{\beta}_1 = \frac{\overline{XY} - \bar{X}\bar{Y}}{\overline{X^2} - \bar{X}^2} = \frac{1}{n}\,\frac{\sum (X_i - \bar{X})\,Y_i}{\overline{X^2} - \bar{X}^2}$$
then its expectation can be computed:
$$\mathbb{E}\hat{\beta}_1 = \frac{\sum (X_i - \bar{X})\,\mathbb{E} Y_i}{n(\overline{X^2} - \bar{X}^2)} = \frac{\sum (X_i - \bar{X})(\beta_0 + \beta_1 X_i)}{n(\overline{X^2} - \bar{X}^2)}$$
$$= \beta_0 \underbrace{\frac{\sum (X_i - \bar{X})}{n(\overline{X^2} - \bar{X}^2)}}_{=0} + \beta_1 \frac{\sum X_i (X_i - \bar{X})}{n(\overline{X^2} - \bar{X}^2)} = \beta_1 \frac{n\overline{X^2} - n\bar{X}^2}{n(\overline{X^2} - \bar{X}^2)} = \beta_1.$$

Therefore, $\hat{\beta}_1$ is an unbiased estimator of $\beta_1$. The variance of $\hat{\beta}_1$ can be computed
(using the independence of the $Y_i$'s):
$$\mathrm{Var}(\hat{\beta}_1) = \mathrm{Var}\Big(\frac{\sum (X_i - \bar{X})\,Y_i}{n(\overline{X^2} - \bar{X}^2)}\Big) = \sum \mathrm{Var}\Big(\frac{(X_i - \bar{X})\,Y_i}{n(\overline{X^2} - \bar{X}^2)}\Big)$$
$$= \sum \Big(\frac{X_i - \bar{X}}{n(\overline{X^2} - \bar{X}^2)}\Big)^2 \sigma^2 = \frac{n(\overline{X^2} - \bar{X}^2)}{n^2(\overline{X^2} - \bar{X}^2)^2}\,\sigma^2 = \frac{\sigma^2}{n(\overline{X^2} - \bar{X}^2)}.$$
 
Therefore,
$$\hat{\beta}_1 \sim N\Big(\beta_1, \frac{\sigma^2}{n(\overline{X^2} - \bar{X}^2)}\Big).$$
A similar straightforward computation gives
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \sim N\Big(\beta_0, \Big(\frac{1}{n} + \frac{\bar{X}^2}{n(\overline{X^2} - \bar{X}^2)}\Big)\sigma^2\Big)$$
and
$$\mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1) = -\frac{\sigma^2 \bar{X}}{n(\overline{X^2} - \bar{X}^2)}.$$
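These distributional results are easy to check by simulation (a sketch with arbitrary parameter values): repeatedly generate samples with the same fixed design and compare the empirical moments of $\hat{\beta}_0, \hat{\beta}_1$ with the formulas above.

```python
import numpy as np

# Monte Carlo check of the sampling distributions (parameter values are made up).
rng = np.random.default_rng(2)
beta0, beta1, sigma = 1.0, 2.0, 0.5
n = 30
X = rng.uniform(0.0, 10.0, size=n)     # fixed design, reused in every replication

X_bar, X2_bar = X.mean(), (X ** 2).mean()
sx2 = X2_bar - X_bar ** 2              # \overline{X^2} - \bar{X}^2

b0s, b1s = [], []
for _ in range(20000):
    Y = beta0 + beta1 * X + rng.normal(0.0, sigma, size=n)
    b1 = ((X * Y).mean() - X_bar * Y.mean()) / sx2
    b0 = Y.mean() - b1 * X_bar
    b0s.append(b0)
    b1s.append(b1)
b0s, b1s = np.array(b0s), np.array(b1s)

# Empirical moments vs. the formulas derived above.
print(b1s.mean(), beta1)                                       # E beta1_hat = beta1
print(b1s.var(), sigma ** 2 / (n * sx2))                       # Var(beta1_hat)
print(b0s.var(), (1 / n + X_bar ** 2 / (n * sx2)) * sigma ** 2)  # Var(beta0_hat)
print(np.cov(b0s, b1s)[0, 1], -sigma ** 2 * X_bar / (n * sx2))   # Cov(beta0_hat, beta1_hat)
```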
