
INTRODUCTION TO REGRESSION ANALYSIS

(STAT 367)

DEPARTMENT OF STATISTICS AND ACTUARIAL SCIENCE

FACULTY OF PHYSICAL AND COMPUTATIONAL SCIENCE

COLLEGE OF SCIENCE

E. O. Owiredu

January 18, 2025


Simple Linear Regression



Introduction to Regression

Regression is a statistical tool for investigating the nature of the relationship between variables, that is, whether it is positive or negative, linear or nonlinear.

Regression Analysis: used to predict the value of one variable (the dependent variable) on the basis of other variables (the independent variables).

Dependent Variable: the variable whose variability is explained by the regression model, denoted by Y.

Independent Variable: the variable whose variation explains the variability in the dependent variable, denoted by X.



Simple Linear Regression Model (SLRM)

SLRM: a model that estimates the linear relationship between a single dependent variable Y and an independent variable X.

Yᵢ = β0 + β1 Xᵢ + εᵢ ,  i = 1, ..., n   (1)

Variables:
Y - Dependent Variable
X - Independent Variable

Parameters:
β0 - Intercept
β1 - Slope
εᵢ - Random error component
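To see what the model in eqn (1) asserts, data can be simulated from it. A minimal Python sketch; the parameter values β0 = 2, β1 = 0.5 and σ = 1 are illustrative choices, not values from the lecture:

```python
import random

# Simulate n observations from Y_i = beta0 + beta1 * X_i + eps_i.
# beta0, beta1 and sigma are illustrative choices, not lecture values.
random.seed(1)
beta0, beta1, sigma, n = 2.0, 0.5, 1.0, 50

X = [float(i) for i in range(1, n + 1)]
Y = [beta0 + beta1 * x + random.gauss(0.0, sigma) for x in X]
```

Each Yᵢ is its straight-line mean β0 + β1 Xᵢ plus a N(0, σ²) error, exactly the decomposition in eqn (1).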



Assumptions of Regression Model

Normality: the residual/error term is a normally distributed random variable with mean zero and variance σ², i.e. εᵢ ∼ N(0, σ²).

Independence: the observations (Y and X pairs) are uncorrelated with one another; εᵢ and εⱼ are uncorrelated, cov(εᵢ, εⱼ) = 0 for i ≠ j.

Linearity: there is a linear relationship between the dependent and the independent variable (linearity in parameters).



Estimating the Coefficients

Just as we estimate µ with x̄, we estimate β0 with β̂0 and β1 with β̂1, the y-intercept and slope respectively of the least squares (regression) line:

Ŷᵢ = β̂0 + β̂1 Xᵢ   (2)

This is an application of the least squares method: it produces the straight line that minimizes the sum of the squared vertical differences between the points and the line.
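This minimizing property can be checked numerically. A short sketch on illustrative data (not from the slides), fitting the line by the least squares formulas and confirming that nearby lines have a larger sum of squared distances:

```python
# Illustrative data (not from the slides): the OLS line minimizes the
# sum of squared vertical distances.
X = [1, 2, 3, 4, 5]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(X)

xbar, ybar = sum(X) / n, sum(Y) / n
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))
      / sum((x - xbar) ** 2 for x in X))
b0 = ybar - b1 * xbar          # intercept from b0 = ybar - b1 * xbar

def sse(a, b):
    """Sum of squared vertical distances to the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))

# Any nearby line has a larger sum of squared residuals.
assert sse(b0, b1) < sse(b0 + 0.1, b1)
assert sse(b0, b1) < sse(b0, b1 + 0.1)
```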



Least Squares Estimation Method

The parameters β0 and β1 are unknown and must be estimated using sample data: (X1, Y1), (X2, Y2), ..., (Xn, Yn).

The regression line/model minimizes the sum of the squared vertical distances between the actual response (Y) and the estimated response (Ŷ).



Least Squares Estimation Method Cont’d...

We minimize

L = Σᵢ₌₁ⁿ ε̂ᵢ² = Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)²   (3)

L = Σᵢ₌₁ⁿ ε̂ᵢ² = Σᵢ₌₁ⁿ (Yᵢ − β̂0 − β̂1 Xᵢ)²   (4)

∂L/∂β0 = −2 Σᵢ₌₁ⁿ (Yᵢ − β̂0 − β̂1 Xᵢ)   (5)

∂L/∂β1 = −2 Σᵢ₌₁ⁿ Xᵢ (Yᵢ − β̂0 − β̂1 Xᵢ)   (6)

Setting these two derivatives to zero gives the normal equations, which are solved to find the estimates β̂0 and β̂1.



Least Squares Estimation Method Cont’d...

From eqn (5):

Σᵢ₌₁ⁿ (Yᵢ − β̂0 − β̂1 Xᵢ) = 0   (7)

Σᵢ₌₁ⁿ Yᵢ − nβ̂0 − β̂1 Σᵢ₌₁ⁿ Xᵢ = 0   (8)

Σᵢ₌₁ⁿ Yᵢ / n = nβ̂0 / n + β̂1 Σᵢ₌₁ⁿ Xᵢ / n   (9)

β̂0 = Ȳ − β̂1 X̄   (10)



Least Squares Estimation Method Cont’d...
From eqn (6):

−2 Σᵢ₌₁ⁿ Xᵢ (Yᵢ − β̂0 − β̂1 Xᵢ) = 0   (11)

Σᵢ₌₁ⁿ Xᵢ (Yᵢ − β̂0 − β̂1 Xᵢ) = 0   (12)

Substituting eqn (10) for β̂0:

Σᵢ₌₁ⁿ Xᵢ (Yᵢ − [Ȳ − β̂1 X̄] − β̂1 Xᵢ) = 0   (13)

Σᵢ₌₁ⁿ Xᵢ (Yᵢ − Ȳ) − β̂1 Σᵢ₌₁ⁿ Xᵢ (Xᵢ − X̄) = 0   (14)



Least Squares Estimation Method Cont’d...

Σᵢ₌₁ⁿ Xᵢ (Yᵢ − Ȳ) + β̂1 Σᵢ₌₁ⁿ (X̄ − Xᵢ) Xᵢ = 0   (15)

Σᵢ₌₁ⁿ (Yᵢ − Ȳ) Xᵢ − β̂1 Σᵢ₌₁ⁿ (Xᵢ − X̄) Xᵢ = 0   (16)

Σᵢ₌₁ⁿ (Yᵢ − Ȳ) Xᵢ = β̂1 Σᵢ₌₁ⁿ (Xᵢ − X̄) Xᵢ   (17)

From eqn (17), since Σᵢ₌₁ⁿ (Yᵢ − Ȳ) = 0 and Σᵢ₌₁ⁿ (Xᵢ − X̄) = 0, each remaining Xᵢ may be replaced by (Xᵢ − X̄):

β̂1 = Σᵢ₌₁ⁿ (Yᵢ − Ȳ)(Xᵢ − X̄) / Σᵢ₌₁ⁿ (Xᵢ − X̄)(Xᵢ − X̄)   (18)



Least Squares Estimation Method Cont’d...

β̂1 = Σᵢ₌₁ⁿ (Yᵢ − Ȳ)(Xᵢ − X̄) / Σᵢ₌₁ⁿ (Xᵢ − X̄)²   (19)

β̂1 = Sxy / Sxx   (20)

β̂1 = cov(y, x) / var(x)   (21)

β̂1 = [n Σᵢ₌₁ⁿ XᵢYᵢ − (Σᵢ₌₁ⁿ Xᵢ)(Σᵢ₌₁ⁿ Yᵢ)] / [n Σᵢ₌₁ⁿ Xᵢ² − (Σᵢ₌₁ⁿ Xᵢ)²]   (22)
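The computational form (22) and the deviation form (19) always give the same slope. A sketch on illustrative data (not the lecture's dataset), computing the slope both ways and then the intercept from eqn (10); the variable names b1_comp, b1_dev and b0 are mine:

```python
# Slope estimate computed two ways on illustrative data.
X = [1, 2, 3, 4, 5, 6]
Y = [3, 5, 6, 8, 9, 11]
n = len(X)

# Computational form, eqn (22)
num = n * sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y)
den = n * sum(x * x for x in X) - sum(X) ** 2
b1_comp = num / den

# Deviation (covariance) form, eqn (19)
xbar, ybar = sum(X) / n, sum(Y) / n
b1_dev = (sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))
          / sum((x - xbar) ** 2 for x in X))

assert abs(b1_comp - b1_dev) < 1e-12   # the two forms agree
b0 = ybar - b1_comp * xbar             # intercept, eqn (10)
```

The computational form avoids first centering the data, which is why it is the version usually used with a calculator.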



Regression Equation

The regression equation describes the regression line mathematically by the intercept and the slope. We replace β̂0 by a and β̂1 by b in the graph below.



Regression Equation Cont’d...



Example 1

The amount of a chemical compound Y, which is dissolved in 100 grams of water at various temperatures X, was recorded as follows.

1 Fit the linear regression model y = β0 + β1x + ε to these data, using the method of least squares.

2 Estimate the amount of the chemical compound which will dissolve in 100 grams of water at 7.5°C.



Solution



Solution Cont’d...



Example 2

A sample of 6 persons was selected, and the values of their age (x variable) and their premium (y variable) are shown in the following table. Find the regression equation and the predicted premium when the age is 8.5 years.



Output



Now the regression equation is:

Ŷ = 4.692 + 0.923X

When age (X) is 8.5 years,

Ŷ = 4.692 + 0.923(8.5)
Ŷ = 12.538
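The prediction is a direct substitution, and can be checked in one line of Python using the fitted coefficients from the slide:

```python
# Plugging age 8.5 into the fitted equation Y-hat = 4.692 + 0.923 X
# reproduces the predicted premium on the slide.
b0, b1 = 4.692, 0.923
y_hat = b0 + b1 * 8.5   # 12.5375, i.e. 12.538 to three decimals
```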



Estimation of σ 2

This is obtained from the residual sum of squares. The residual

ε̂ᵢ = Yᵢ − Ŷᵢ

is used to estimate the error term. The sum of squares of the residuals (the error sum of squares) is:

SSRes = Σᵢ₌₁ⁿ ε̂ᵢ² = Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)²

Its expected value is:

E(SSRes) = (n − 2)σ²



Estimation of σ 2 Cont’d...

Thus the residual sum of squares divided by n − 2 is an unbiased estimator of σ²:

E(SSRes / (n − 2)) = σ²   (23)

Also, the standard error of the estimate is:

Se = √(SSRes / (n − 2))   (24)

If Se is zero, all the points fall on the regression line. If Se is small, the fit is excellent and the linear model should be used for forecasting. If Se is large, the model is poor.
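Eqns (23)-(24) can be traced through on a small example. A sketch on illustrative data (not from the slides), whose OLS fit for these particular numbers is slope 162/105 and intercept 1.6:

```python
import math

# Residual sum of squares and standard error of the estimate,
# eqns (23)-(24), on illustrative data (not from the slides).
X = [1, 2, 3, 4, 5, 6]
Y = [3, 5, 6, 8, 9, 11]
n = len(X)
b1 = 162 / 105   # OLS slope for these data
b0 = 1.6         # OLS intercept for these data

residuals = [y - (b0 + b1 * x) for x, y in zip(X, Y)]
ss_res = sum(e * e for e in residuals)
se = math.sqrt(ss_res / (n - 2))   # eqn (24)

# se is about 0.293: small relative to the Y values, so the fit is good.
```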



Evaluation of Model

Testing the slope coefficient

If no linear relationship exists between the two variables, we would expect the regression line to be horizontal, that is, to have a slope of zero.

We want to see whether there is a linear relationship, i.e., whether the slope is something other than zero. Our hypotheses become:

H0: β1 = 0 [no linear relationship]

H1: β1 ≠ 0 [there is a linear relationship]



Evaluation of Model Cont’d...

We use the following test statistic to test these hypotheses:

t = (β̂1 − β1) / Sβ̂1   (25)

where Sβ̂1 is the standard error of β̂1, defined as:

Sβ̂1 = √(σ̂² / Sxx)   (26)

with σ̂² = SSRes / (n − 2), and where

Sxx = Σᵢ₌₁ⁿ xᵢ² − (Σᵢ₌₁ⁿ xᵢ)² / n   (27)
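Eqns (25)-(27) assemble into the test in a few lines. A sketch on illustrative data with its OLS fit precomputed (slope 162/105, SSRes = 12/35); the data and the critical value are not from the lecture:

```python
import math

# t statistic for testing H0: beta1 = 0, eqns (25)-(27),
# on illustrative data (not the lecture's dataset).
X = [1, 2, 3, 4, 5, 6]
Y = [3, 5, 6, 8, 9, 11]
n = len(X)
b1 = 162 / 105                      # fitted slope for these data
sigma2_hat = (12 / 35) / (n - 2)    # SS_Res / (n - 2)

sxx = sum(x * x for x in X) - sum(X) ** 2 / n   # eqn (27)
s_b1 = math.sqrt(sigma2_hat / sxx)              # eqn (26)
t = (b1 - 0) / s_b1                             # eqn (25) under H0

# Two-tail test at alpha = 0.05 with n - 2 = 4 degrees of freedom:
# the tabulated critical value is about 2.776, so H0 is rejected here.
assert abs(t) > 2.776
```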



Evaluation of Model Cont’d...

If the error term is normally distributed, the test statistic has a Student t-distribution with n − 2 degrees of freedom. The rejection region depends on whether we are doing a one- or two-tail test (a two-tail test is most typical).

We reject the null hypothesis H0 if:

|tcal| > tα/2, n−2



Properties of the OLS Estimates

These can be summarized as: the OLS estimator is BLUE.

B - Best
L - Linear
U - Unbiased
E - Estimator

NOTE
The Gauss–Markov Theorem is required for the proof.



GROUP ASSIGNMENT

1 Prove that OLS is BLUE

2 Estimate β0 and β1 (Show Working)



