Regression analysis is a statistical technique used to investigate relationships between variables. It allows one to determine the strength of the relationship between a dependent variable (usually denoted by Y) and one or more independent variables (denoted by X). Multiple regression extends this to analyze the relationship between a dependent variable and several independent variables. The goals of regression analysis are to understand how the dependent variable changes with the independent variables and to use the independent variables to predict its value. It requires the dependent variable to be continuous, while the independent variables can be either continuous or categorical.
Regression analysis is a statistical technique for predicting a dependent variable based on one or more independent variables. Simple linear regression fits a straight line to the data to predict a continuous dependent variable (y) from a single independent variable (x). The output is an equation of the form y = b0 + b1x + ε, where b0 is the y-intercept, b1 is the slope, and ε is the error. Multiple linear regression extends this to include more than one independent variable. Regression analysis calculates the "best fit" line that minimizes the residuals, or differences between predicted and observed y values.
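The closed-form least-squares fit described above can be sketched in a few lines. This is a minimal illustration with made-up, noise-free data (the values and function name are my own, not from the summarized deck): the slope is b1 = Sxy/Sxx and the intercept is b0 = mean(y) - b1 * mean(x).

```python
# Least-squares fit of y = b0 + b1*x using the closed-form estimates
# b1 = Sxy / Sxx and b0 = mean(y) - b1 * mean(x).
import numpy as np

def fit_simple_linear(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Illustrative noise-free data lying exactly on y = 2 + 3x, so the fit is exact.
x = [1, 2, 3, 4, 5]
y = [5, 8, 11, 14, 17]
b0, b1 = fit_simple_linear(x, y)
print(b0, b1)  # 2.0 3.0
```

With real data the points scatter around the line and the same formulas return the line that minimizes the sum of squared residuals.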
Multiple regression analysis allows researchers to examine the relationship between one dependent or outcome variable and two or more independent or predictor variables. It extends simple linear regression to model more complex relationships. Stepwise regression is a technique that automates the process of building regression models by sequentially adding or removing variables based on statistical criteria. It begins with no variables in the model and adds variables one at a time based on their contribution to the model until none improve it significantly.
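The add-variables-one-at-a-time procedure described above (forward selection, the "adding" half of stepwise regression) can be sketched as a greedy loop. This is a simplified illustration on synthetic data, not the deck's own procedure: real stepwise implementations typically use F-tests or p-value thresholds rather than the raw R-squared gain and 0.01 cutoff assumed here.

```python
# Sketch of forward selection: start with no predictors and greedily add
# whichever candidate column most improves R-squared, stopping when the
# gain falls below a threshold. Synthetic data; the 0.01 threshold is an
# illustrative assumption.
import numpy as np

def r_squared(X, y):
    """R-squared of an OLS fit, with an intercept column prepended."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def forward_select(X, y, min_gain=0.01):
    chosen, remaining, best = [], list(range(X.shape[1])), 0.0
    while remaining:
        score, j = max((r_squared(X[:, chosen + [j]], y), j) for j in remaining)
        if score - best < min_gain:          # no candidate improves enough
            break
        best, chosen = score, chosen + [j]
        remaining.remove(j)
    return chosen, best

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
# Only columns 1 and 3 actually drive y; columns 0 and 2 are irrelevant.
y = 2.0 * X[:, 1] - 1.0 * X[:, 3] + rng.normal(scale=0.1, size=200)
cols, r2 = forward_select(X, y)
print(sorted(cols), round(r2, 3))
```

With the strong signal above, the loop picks the two informative columns and stops before adding the irrelevant ones.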
Simple Linear Regression: Step-By-Step, by Dan Wellisch
This presentation was given to our meetup group (https://www.meetup.com/Chicago-Technology-For-Value-Based-Healthcare-Meetup/) on 9/26/2017. Our group focuses on applying technology to healthcare in order to improve care.
The document discusses regression analysis, including definitions, uses, calculating regression equations from data, graphing regression lines, the standard error of estimate, and limitations. Regression analysis is a statistical technique used to understand the relationship between variables and allow for predictions. The document provides examples of calculating regression equations from various data sets and determining the standard error of estimate.
This PowerPoint helps students understand project design and management in general, and the components of project design in particular.
Mr. Kebede Lemu (Lecturer of Social Anthropology, Bule Hora University)
This document provides an overview of analysis of variance (ANOVA). It describes how ANOVA was developed by R.A. Fisher in 1920 to analyze differences between multiple sample means. The document outlines the F-statistic used in ANOVA to compare between-group and within-group variations. It also describes one-way and two-way classifications of ANOVA and provides examples of applications in fields like agriculture, biology, and pharmaceutical research.
The document discusses regression analysis and its key concepts. Regression analysis is used to understand the relationship between two or more variables and make predictions. There are two main types: simple linear regression, which involves two variables, and multiple regression, which involves more than two variables. Regression lines show the average relationship between the variables and can be used to predict outcomes. The regression coefficients measure the change in the dependent variable for a unit change in the independent variable. The standard error of the estimate indicates how close the data points are to the regression line.
- Regression analysis is a statistical technique for modeling relationships between variables, where one variable is dependent on the others. It allows predicting the average value of the dependent variable based on the independent variables.
- The key assumptions of regression models are that the error terms are normally distributed with zero mean and constant variance, and are independent of each other.
- Linear regression specifies that the dependent variable is a linear combination of the parameters, though the independent variables need not be linearly related. In simple linear regression with one independent variable, the least squares estimates of the intercept and slope are calculated to minimize the sum of squared errors.
- Regression analysis is a statistical tool used to examine relationships between variables and can help predict future outcomes. It allows one to assess how the value of a dependent variable changes as the value of an independent variable is varied.
- Simple linear regression involves one independent variable, while multiple regression can include any number of independent variables. Regression analysis outputs include coefficients, residuals, and measures of fit like the R-squared value.
- An example uses home size and price data from 10 houses to generate a linear regression equation predicting that price increases by around $110 for each additional square foot. This model explains 58% of the variation in home prices.
Regression analysis is a statistical technique used to estimate the relationships between variables. It allows one to predict the value of a dependent variable based on the value of one or more independent variables. The document discusses simple linear regression, where there is one independent variable, as well as multiple linear regression which involves two or more independent variables. Examples of linear relationships that can be modeled using regression analysis include price vs. quantity, sales vs. advertising, and crop yield vs. fertilizer usage. The key methods for performing regression analysis covered in the document are least squares regression and regressions based on deviations from the mean.
Introduces and explains the use of multiple linear regression, a multivariate correlational statistical technique. For more info, see the lecture page at http://goo.gl/CeBsv. See also the slides for the MLR II lecture http://www.slideshare.net/jtneill/multiple-linear-regression-ii
The document discusses simple linear regression. It defines key terms like regression equation, regression line, slope, intercept, residuals, and residual plot. It provides examples of using sample data to generate a regression equation and evaluating that regression model. Specifically, it shows generating a regression equation from bivariate data, checking assumptions visually through scatter plots and residual plots, and interpreting the slope as the marginal change in the response variable from a one unit change in the explanatory variable.
This chapter summary covers simple linear regression models. Key topics include determining the simple linear regression equation, measures of variation such as total, explained, and unexplained sums of squares, assumptions of the regression model including normality, homoscedasticity and independence of errors. Residual analysis is discussed to examine linearity and assumptions. The coefficient of determination, standard error of estimate, and Durbin-Watson statistic are also introduced.
Regression analysis is used to identify relationships between variables and make predictions. Simple linear regression fits a straight line to data using one independent variable to predict a dependent variable. Multiple linear regression uses more than one independent variable to explain variance in the dependent variable. The goal is to select variables that sufficiently explain variation in the dependent variable to allow for accurate prediction. Key outputs of regression include coefficients, R-squared, standard error, and significance values.
This document provides an overview of regression analysis, including:
- Regression analysis measures the average relationship between variables to predict dependent variables from independent variables and show relationships.
- It is widely used in business to predict things like production, prices, and profits. It is also used in sociological and economic studies.
- There are three main methods for studying regression: least squares method, deviations from means method, and deviations from assumed means method. Examples are provided of calculating regression equations for bivariate data using each method.
The document discusses simple linear regression analysis. It provides definitions and formulas for simple linear regression, including that the regression equation is y = a + bx. An example is shown of using the stepwise method to determine if there is a significant relationship between number of absences (x) and grades (y) for students. The analysis finds a significant negative relationship, meaning more absences correlated with lower grades. Formulas are provided for calculating the slope, intercept, and testing significance of the regression model.
The document provides an introduction to regression analysis and performing regression using SPSS. It discusses key concepts like dependent and independent variables, assumptions of regression like linearity and homoscedasticity. It explains how to calculate regression coefficients using the method of least squares and how to perform regression analysis in SPSS, including selecting variables and interpreting the output.
Multiple regression analysis is a powerful technique used for predicting the unknown value of a variable from the known value of two or more variables.
Here are the key steps and results:
1. Load the data and run a multiple linear regression with x1 as the target and x2, x3 as predictors.
R-squared is 0.89
2. Add x4, x5 as additional predictors.
R-squared increases to 0.94
3. Add x6, x7 as additional predictors.
R-squared further increases to 0.98
So as more predictors are added, the R-squared value increases, indicating more of the variation in x1 is explained by the model. However, adding too many predictors can lead to overfitting.
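The pattern in the steps above (R-squared never decreases as nested predictors are added, even useless ones) is exactly why adjusted R-squared, which penalizes model size, is usually reported alongside it. A small sketch on synthetic data (variable names and values are my own illustration, not the x1–x7 dataset from the source):

```python
# R-squared is non-decreasing as predictors are added to a nested model,
# even when the new columns are pure noise; adjusted R-squared penalizes
# the extra parameters. Synthetic data for illustration.
import numpy as np

def r2_and_adj(X, y):
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return r2, adj

rng = np.random.default_rng(1)
n = 50
signal = rng.normal(size=(n, 2))              # two genuine predictors
noise = rng.normal(size=(n, 5))               # five irrelevant predictors
y = signal @ np.array([1.5, -2.0]) + rng.normal(scale=1.0, size=n)

prev_r2 = 0.0
for k in range(1, 6):
    X = np.column_stack([signal, noise[:, :k]])
    r2, adj = r2_and_adj(X, y)
    assert r2 >= prev_r2 - 1e-9               # R^2 never goes down
    prev_r2 = r2
    print(f"{2 + k} predictors: R^2={r2:.3f}, adjusted R^2={adj:.3f}")
```

The plain R-squared creeps upward with every noise column, while the adjusted version can fall, flagging the overfitting the summary warns about.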
This presentation introduces regression analysis. It discusses key concepts such as dependent and independent variables, simple and multiple regression, and linear and nonlinear regression models. It also covers different types of regression including simple linear regression, cross-sectional vs time series data, and methods for building regression models like stepwise regression and forward/backward selection. Examples are provided to demonstrate calculating regression equations using the least squares method and computing deviations from mean values.
Regression analysis is simplified in this presentation. Starting from simple linear regression and moving to multiple regression analysis, it covers the relevant statistics and the interpretation of various diagnostic plots. It also shows how to verify regression assumptions and introduces some advanced concepts for choosing the best model. SAS program code for two examples is included.
This document discusses non-linear regression. Non-linear regression uses regression equations that are non-linear in terms of the variables or parameters. Two main types are discussed: models that are nonlinear in variables but linear in parameters, and models that are nonlinear in both variables and parameters. Several non-linear regression methods are described, including direct computation, derivative, and self-starting methods. Examples of non-linear regression models and the differences between linear and non-linear regression are provided. Advantages of non-linear regression include applying differential weighting and identifying outliers.
This document provides an example of simple linear regression with one independent variable. It explains that linear regression finds the line of best fit by estimating values for the slope (b1) and y-intercept (b0) that minimize the sum of the squared errors between the observed data points and the regression line. It provides the formulas for calculating the least squares estimates of b1 and b0. The document includes a table of temperature and sales data and a corresponding scatter plot as an example of simple linear regression analysis.
The document provides an overview of regression analysis. It defines regression analysis as a technique used to estimate the relationship between a dependent variable and one or more independent variables. The key purposes of regression are to estimate relationships between variables, determine the effect of each independent variable on the dependent variable, and predict the dependent variable given values of the independent variables. The document also outlines the assumptions of the linear regression model, introduces simple and multiple regression, and describes methods for model building including variable selection procedures.
This document provides an overview of regression analysis, including linear regression, multiple regression, and assessing assumptions. It defines regression as a technique for investigating relationships between variables. Simple linear regression involves one predictor and one response variable, while multiple regression extends this to multiple predictors. Key steps are outlined such as assessing the fit of regression models using R-squared, testing the significance of individual predictors, and ensuring assumptions of normality, linearity and equal variance are met. Examples are provided demonstrating how to evaluate these assumptions and interpret regression results.
The document provides an overview of regression analysis including:
- Regression analysis is a statistical process used to estimate relationships between variables and predict unknown values.
- The document outlines different types of regression like simple, multiple, linear, and nonlinear regression.
- Key aspects of regression like scatter diagrams, regression lines, and the method of least squares are explained.
- An example problem is worked through demonstrating how to calculate the slope and y-intercept of a regression line using the least squares method.
The document provides an overview of regression analysis concepts including:
- Regression analysis is used to understand relationships between variables and predict the value of one variable based on another.
- A regression model has a dependent variable on the y-axis and an independent variable on the x-axis.
- Examples of how to perform regression analysis are provided including creating a scatter plot and calculating parameters like the slope and intercept.
- Key concepts for measuring the fit of a linear regression model are defined including variability, correlation coefficient, coefficient of determination, and standard error.
The Simple Regression presentation is a partial fulfillment of the requirements in PA 297 Research for Public Administrators, presented by Atty. Gayam, Dr. Cabling and Mr. Cagampang.
The document presents the results of a simple linear regression analysis conducted by a black belt to predict the number of calls answered (dependent variable) from staffing levels (independent variable), using data collected over 240 samples in a call center. The regression model explained 83.4% of the variation in calls answered. Notable outliers and leverage points were identified that could weaken the predicted relationship between calls answered and staffing.
This document provides an overview of a data analysis course covering various statistical techniques including correlation, regression, hypothesis testing, clustering, and time series analysis. The course covers descriptive statistics, data exploration, probability distributions, simple and multiple linear regression analysis, logistic regression analysis, and model building for credit risk analysis. Notes are provided on correlation calculation and its properties. Assumptions and interpretations of linear regression are also summarized. The document is intended as a high-level overview of topics covered in the course rather than an in-depth treatment.
Regression analysis is a statistical technique used to model relationships between variables. It allows one to predict the average value of a dependent variable based on the value of one or more independent variables. The key ideas are that the dependent variable is influenced by the independent variables in a linear or curvilinear fashion, and regression provides an equation to estimate the dependent variable given values of the independent variables. Common applications of linear regression include forecasting, determining relationships between variables, and estimating how changes in one variable impact another.
Introduction to correlation and regression analysis, by Farzad Javidanrad
This document provides an introduction to correlation and regression analysis. It defines key concepts like variables, random variables, and probability distributions. It discusses how correlation measures the strength and direction of a linear relationship between two variables. Correlation coefficients range from -1 to 1, with values closer to these extremes indicating stronger correlation. The document also introduces determination coefficients, which measure the proportion of variance in one variable explained by the other. Regression analysis builds on correlation to study and predict the average value of one variable based on the values of other explanatory variables.
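The link between the correlation coefficient and the determination coefficient described above can be verified numerically: for simple linear regression, the coefficient of determination equals the square of Pearson's r. A short sketch with illustrative data (the values are made up):

```python
# Pearson correlation r lies in [-1, 1]; for simple linear regression the
# coefficient of determination R^2 equals r squared. Illustrative data.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])   # roughly y = 2x

r = np.corrcoef(x, y)[0, 1]
assert -1.0 <= r <= 1.0

# Coefficient of determination from the least-squares fit
# (np.polyfit returns coefficients highest degree first).
b1, b0 = np.polyfit(x, y, 1)
resid = y - (b0 + b1 * x)
r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
print(round(r, 4), round(r2, 4))
```

Here r is close to 1 (a strong positive linear relationship), and r² gives the proportion of variance in y explained by x.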
The document is a presentation on machine learning and simple linear regression. It introduces the concepts of a regression model, fitting a linear regression line to data by minimizing the residual sum of squares, and using the fitted line to make predictions. It discusses representing the linear regression model as an equation relating the output variable (y) to the input or feature (x), with parameters (w0, w1) estimated from training data. The parameters can be estimated by taking the gradient of the residual sum of squares and setting it equal to zero to find the optimal values for w0 and w1 that best fit the data.
This document provides an overview of simple linear regression and correlation. It defines key concepts such as the population regression line, the simple linear regression model equation, and assumptions of the model. Examples are provided to demonstrate calculating the least squares regression line, interpreting the slope and intercept, and evaluating goodness of fit using r-squared. Formulas are given for computing sums of squares, estimating the standard deviation of residuals, and constructing confidence intervals for the slope of the population regression line.
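A confidence interval for the slope of the kind mentioned above can be sketched directly from the formulas: SE(b1) = s/√Sxx with s² = SSE/(n − 2), and the interval is b1 ± t·SE(b1). The data below are illustrative, and the critical value t₀.₉₇₅ with 8 degrees of freedom (2.306) is taken from standard t-tables rather than computed.

```python
# 95% confidence interval for the slope in simple linear regression:
# b1 +/- t * SE(b1), where SE(b1) = s / sqrt(Sxx) and s^2 = SSE / (n - 2).
# Illustrative data generated around the line y = 3 + 0.5x.
import math
import numpy as np

x = np.arange(1.0, 11.0)
noise = np.array([0.2, -0.1, 0.3, -0.4, 0.1, 0.0, -0.2, 0.3, -0.3, 0.1])
y = 3.0 + 0.5 * x + noise

n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
s = math.sqrt(resid @ resid / (n - 2))   # residual standard deviation
se_b1 = s / math.sqrt(Sxx)

t_crit = 2.306                           # t_{0.975, df = 8}, from tables
lo, hi = b1 - t_crit * se_b1, b1 + t_crit * se_b1
print(f"slope = {b1:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Since the data were generated with a true slope of 0.5, the interval comfortably contains that value.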
This document summarizes a simple linear regression analysis conducted to determine if study hours can predict examination scores. The analysis found that for a sample of 10 students, study hours were not a significant predictor of exam scores at the 95% confidence level. While the regression equation calculated the relationship between hours studied and expected exam scores, it was an imperfect fit, suggesting other factors influence exam performance. A larger sample size would be needed to develop a better predictive model.
This document provides an overview of time series analysis and forecasting using neural networks. It discusses key concepts like time series components, smoothing methods, and applications. Examples are provided on using neural networks to forecast stock prices and economic time series. The agenda covers introduction to time series, importance, components, smoothing methods, applications, neural network issues, examples, and references.
The document provides an overview of a time series analysis and forecasting course. It discusses key topics that will be covered including descriptive statistics, correlation, regression, hypothesis testing, clustering, time series analysis and forecasting techniques like TCSI and ARIMA models. It notes that the presentation serves as class notes and contains informal high-level summaries intended to aid the author, and encourages readers to check the website for updated versions of the document.
- The document discusses simple linear regression analysis and how to use it to predict a dependent variable (y) based on an independent variable (x).
- Key points covered include the simple linear regression model, estimating regression coefficients, evaluating assumptions, making predictions, and interpreting results.
- Examples are provided to demonstrate simple linear regression analysis using data on house prices and sizes.
- Regression analysis is a statistical technique used to measure the relationship between two quantitative variables and make causal inferences.
- A regression model graphs the relationship between a dependent variable (Y axis) and one or more independent variables (X axis). The goal is to find the linear equation that best fits the data.
- The regression equation takes the form Y = a + bX, where a is the intercept, b is the slope coefficient, and X and Y are the variables. The coefficient b indicates the strength and direction of the relationship.
1. The document discusses linear correlation and regression between plasma amphetamine levels and amphetamine-induced psychosis scores using data from 10 patients.
2. A positive correlation was found between the two variables, and a linear regression equation was established to predict psychosis scores from amphetamine levels.
3. However, further statistical tests were needed to determine if the correlation and regression model could be generalized to the overall patient population.
This document provides an overview of time series analysis and its key components. It discusses that a time series is a set of data measured at successive times joined together by time order. The main components of a time series are trends, seasonal variations, cyclical variations, and irregular variations. Time series analysis is important for business forecasting, understanding past behavior, and facilitating comparison. There are two main mathematical models used - the additive model which assumes data is the sum of its components, and the multiplicative model which assumes data is the product of its components. Decomposition of a time series involves discovering, measuring, and isolating these different components.
This document provides an introduction to correlation and regression analysis. It defines correlation as a measure of the association between two variables and regression as using one variable to predict another. The key aspects covered are:
- Calculating correlation using Pearson's correlation coefficient r to measure the strength and direction of association between variables.
- Performing simple linear regression to find the "line of best fit" to predict a dependent variable from an independent variable.
- Using a TI-83 calculator to graphically display scatter plots of data and calculate the regression equation and correlation coefficient.
Data Science - Part IV - Regression Analysis & ANOVADerek Kane
This lecture provides an overview of linear regression analysis, interaction terms, ANOVA, optimization, log-level, and log-log transformations. The first practical example centers around the Boston housing market where the second example dives into business applications of regression analysis in a supermarket retailer.
The document provides an overview of regression analysis techniques including:
- Linear regression which estimates relationships between variables using straight line equations.
- Non-linear regression which uses non-linear equations like polynomials to model relationships.
- Multiple linear regression which models relationships between a dependent variable and more than one independent variable using linear equations.
The document discusses techniques like least squares regression to fit regression lines and planes to data and provide examples of applying simple, multiple, and non-linear regression analysis.
This document presents a nonparametric approach to multiple regression that uses ranks instead of raw values for both the dependent and independent variables. The key points are:
1. It develops a nonparametric multiple regression model using the ranks of observations on the dependent variable and ranks of observations on the independent variables.
2. The method of least squares is applied to the rank-based model to obtain estimates of the regression coefficients.
3. Prediction equations are presented that allow predicting dependent variable ranks based on independent variable ranks.
Please Subscribe to this Channel for more solutions and lectures
http://www.youtube.com/onlineteaching
Chapter 10: Correlation and Regression
10.2: Regression
An econometric model for Linear Regression using StatisticsIRJET Journal
This document discusses linear regression modeling using statistics. It begins by introducing linear regression and its assumptions. Both univariate and multivariate linear regression are covered. The coefficients are derived using statistics in matrix form. Properties of ordinary least squares estimators like their expected values and variances are proven. Hypothesis testing for multiple linear regression is presented in matrix form. The document emphasizes the importance of understanding linear regression for prediction and its application in fields like economics and social sciences. Rigorous statistical analysis is needed to ensure the validity of regression models.
Bba 3274 qm week 6 part 1 regression modelsStephen Ong
This document provides an overview and outline of regression models and forecasting techniques. It discusses simple and multiple linear regression analysis, how to measure the fit of regression models, assumptions of regression models, and testing models for significance. The goals are to help students understand relationships between variables, predict variable values, develop regression equations from sample data, and properly apply and interpret regression analysis.
This document presents information about regression analysis. It defines regression as the dependence of one variable on another and lists the objectives as defining regression, describing its types (simple, multiple, linear), assumptions, models (deterministic, probabilistic), and the method of least squares. Examples are provided to illustrate simple regression of computer speed on processor speed. Formulas are given to calculate the regression coefficients and lines for predicting y from x and x from y.
- Regression analysis is used to study the relationship between variables and predict how the value of one variable changes with the other. It is one of the most commonly used tools for business analysis.
- Simple linear regression analyzes the relationship between one independent variable and one dependent variable. The regression equation estimates the dependent variable as a linear function of the independent variable.
- Least squares regression fits a line to the data by minimizing the sum of the squared residuals, providing estimates of the slope and y-intercept coefficients in the regression equation.
simple linear regression - brief introductionedinyoka
Goal of regression analysis: quantitative description and
prediction of the interdependence between two or more variables.
• Definition of the correlation
• The specification of a simple linear regression model
• Least squares estimators: construction and properties
• Verification of statistical significance of regression model
ELectronics Boards & Product Testing_Shiju.pdfShiju Jacob
This presentation provides a high level insight about DFT analysis and test coverage calculation, finalizing test strategy, and types of tests at different levels of the product.
RICS Membership-(The Royal Institution of Chartered Surveyors).pdfMohamedAbdelkader115
Glad to be one of only 14 members inside Kuwait to hold this credential.
Please check the members inside kuwait from this link:
https://www.rics.org/networking/find-a-member.html?firstname=&lastname=&town=&country=Kuwait&member_grade=(AssocRICS)&expert_witness=&accrediation=&page=1
This paper proposes a shoulder inverse kinematics (IK) technique. Shoulder complex is comprised of the sternum, clavicle, ribs, scapula, humerus, and four joints.
We introduce the Gaussian process (GP) modeling module developed within the UQLab software framework. The novel design of the GP-module aims at providing seamless integration of GP modeling into any uncertainty quantification workflow, as well as a standalone surrogate modeling tool. We first briefly present the key mathematical tools on the basis of GP modeling (a.k.a. Kriging), as well as the associated theoretical and computational framework. We then provide an extensive overview of the available features of the software and demonstrate its flexibility and user-friendliness. Finally, we showcase the usage and the performance of the software on several applications borrowed from different fields of engineering. These include a basic surrogate of a well-known analytical benchmark function; a hierarchical Kriging example applied to wind turbine aero-servo-elastic simulations and a more complex geotechnical example that requires a non-stationary, user-defined correlation function. The GP-module, like the rest of the scientific code that is shipped with UQLab, is open source (BSD license).
Raish Khanji GTU 8th sem Internship Report.pdfRaishKhanji
This report details the practical experiences gained during an internship at Indo German Tool
Room, Ahmedabad. The internship provided hands-on training in various manufacturing technologies, encompassing both conventional and advanced techniques. Significant emphasis was placed on machining processes, including operation and fundamental
understanding of lathe and milling machines. Furthermore, the internship incorporated
modern welding technology, notably through the application of an Augmented Reality (AR)
simulator, offering a safe and effective environment for skill development. Exposure to
industrial automation was achieved through practical exercises in Programmable Logic Controllers (PLCs) using Siemens TIA software and direct operation of industrial robots
utilizing teach pendants. The principles and practical aspects of Computer Numerical Control
(CNC) technology were also explored. Complementing these manufacturing processes, the
internship included extensive application of SolidWorks software for design and modeling tasks. This comprehensive practical training has provided a foundational understanding of
key aspects of modern manufacturing and design, enhancing the technical proficiency and readiness for future engineering endeavors.
2. Outlines
• What is Regression Analysis?
• Population Regression Line
• Why do we use Regression Analysis?
• What are the types of Regression?
• Simple Linear Regression Model
• Least Square Estimation for Parameters
• Least Square for Linear Regression
• References
08-02-2017
3. What is Regression Analysis?
Regression analysis is a form of predictive modelling technique which investigates the relationship between a dependent (target) variable and independent variable(s) (predictors). This technique is used for forecasting, time series modelling, and finding the causal-effect relationship between variables.
For example, the relationship between rash driving and the number of road accidents caused by a driver is best studied through regression.
5. Population Regression Line
Example: predicting Estimated Grades (y-axis) from Study Time (x-axis). The population regression function is

ŷ = b0 + b1x

where
ŷ = Estimated Grades
x = Study Time
b0 = Intercept of the regression line
b1 = Slope of the regression line
6. Why do we need Regression Analysis?
Typically, a regression analysis is used for these purposes:
(1) Prediction of the target variable (forecasting).
(2) Modelling the relationships between the dependent variable and the explanatory variables.
(3) Testing of hypotheses.
Benefits:
1. It indicates the strength of impact of multiple independent variables on a dependent variable.
2. It indicates the significant relationships between the dependent variable and the independent variables.
These benefits help market researchers, data analysts, and data scientists to evaluate and select the best set of variables for building predictive models.
7. Types of Regression Analysis
Regression analysis is generally classified into two kinds: simple and multiple.
Simple regression involves only two variables: one dependent variable and one explanatory (independent) variable. Multiple regression involves a dependent variable and two or more explanatory variables.
A regression analysis may also involve a linear model or a nonlinear model. The term "linear" can be interpreted in two different ways:
1. Linearity in the variables
2. Linearity in the parameters
8. Simple Linear Regression Model
The simple linear regression model is a model with a single regressor x that has a linear relationship with a response y:

y = b0 + b1x + ɛ

where y is the response variable, x is the regressor variable, b0 is the intercept, b1 is the slope, and ɛ is the random error component.
In this technique, the dependent variable is a continuous random variable; the independent variable(s) can be continuous or discrete but are not random variables; and the regression line is linear in nature.
9. Some basic assumptions of the model
Simple linear regression model: yi = b0 + b1xi + ɛi for i = 1, 2, …, n
• ɛi is a random variable with zero mean and variance σ², i.e. E(ɛi) = 0 and V(ɛi) = σ².
• ɛi and ɛj are uncorrelated for i ≠ j, i.e. cov(ɛi, ɛj) = 0.
• ɛi is a normally distributed random variable: ɛi ~ind N(0, σ²).
10. For yi = b0 + b1xi + ɛi with i = 1, 2, …, n, these assumptions give:
E(yi) = E(b0 + b1xi + ɛi) = b0 + b1xi
V(yi) = V(b0 + b1xi + ɛi) = V(ɛi) = σ²
Hence ɛi ~ind N(0, σ²) implies yi ~ind N(b0 + b1xi, σ²).
NOTE: The dataset should satisfy these basic assumptions.
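As an illustrative sketch of these distributional assumptions (in Python; the parameter values b0 = 2, b1 = 0.5, σ = 1 are invented for the example), we can generate data from the model and check the implied error moments:

```python
import random

random.seed(42)

# Hypothetical population parameters, for illustration only
b0, b1, sigma = 2.0, 0.5, 1.0
n = 1000

xs = [i / 100 for i in range(n)]                    # fixed, non-random regressors
eps = [random.gauss(0.0, sigma) for _ in range(n)]  # eps_i ~ N(0, sigma^2), independent
ys = [b0 + b1 * x + e for x, e in zip(xs, eps)]     # y_i = b0 + b1*x_i + eps_i

# Sample moments should sit near the assumed E(eps) = 0 and V(eps) = sigma^2
mean_eps = sum(eps) / n
var_eps = sum((e - mean_eps) ** 2 for e in eps) / n
print(mean_eps, var_eps)
```

With a large n, the sample mean and variance of the errors land near 0 and σ², matching E(ɛi) = 0 and V(ɛi) = σ².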
11. Least Square Estimation for Parameters
The parameters b0 and b1 are unknown and must be estimated using sample data:
(x1, y1), (x2, y2), …, (xn, yn)
Each observation satisfies yi = b0 + b1xi + ɛi, and we seek the line ŷ = b0 + b1x that best fits the scatter of (x, y) points.
12. The line fitted by least squares is the one that makes the sum of squares of all vertical discrepancies as small as possible. That is, we estimate the parameters so that the sum of squares of all the vertical differences between the observations and the fitted line,

S = Σi=1..n (yi − b0 − b1xi)²,

is minimized. For the first observation, for example, the vertical discrepancy is the residual (y1 − ŷ1) = ɛ1 between the observed point (x1, y1) and the fitted point (x1, ŷ1).
13. Minimizing S requires taking the first-order conditions with respect to b0 and b1 and setting them to zero:

S = Σi=1..n (yi − b0 − b1xi)²

I:  ∂S/∂b0 = −2 Σi=1..n (yi − b0 − b1xi) = 0
II: ∂S/∂b1 = −2 Σi=1..n (yi − b0 − b1xi) xi = 0

Solving I for b0:
Σ yi − n·b0 − b1 Σ xi = 0
b0 = ȳ − b1x̄,  where ȳ = Σ yi / n and x̄ = Σ xi / n.

Substituting b0 into II and solving gives the slope estimate
b1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)².
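These closed-form estimators, b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and b0 = ȳ − b1x̄, can be sketched in Python (rather than the MATLAB of the slide notes); the data points are the small worked example used later in the deck:

```python
def fit_simple_ols(xs, ys):
    """Closed-form least squares: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Five-point example (the same X, Y values as the slide-20 table)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 2.0, 1.3, 3.75, 2.25]
b0, b1 = fit_simple_ols(xs, ys)
print(round(b0, 3), round(b1, 3))   # 0.785 0.425
```

The fitted line ŷ = 0.785 + 0.425x is exactly the Y' column of the slide-20 table.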
16. Calculating R² Using Regression Analysis
R-squared is a statistical measure of how close the data are to the fitted regression line; it measures the goodness of fit. It is also known as the coefficient of determination.
First we calculate the distances between the estimated values and the mean value (the explained variation), and the distances between the actual values and the mean value (the total variation). Then we compare the two:
R² = Σ(ŷi − ȳ)² / Σ(yi − ȳ)²
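As a Python sketch, using the five-point example data from the slide-20 table and its fitted line: for a least squares fit with an intercept, R² can equivalently be computed as one minus the ratio of residual variation to total variation:

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 2.0, 1.3, 3.75, 2.25]
b0, b1 = 0.785, 0.425                 # least squares fit for these points
preds = [b0 + b1 * x for x in xs]
ybar = sum(ys) / len(ys)

ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))  # residual variation
ss_tot = sum((y - ybar) ** 2 for y in ys)              # total variation about the mean
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 3))   # 0.393
```

So this fit explains roughly 39% of the variation in Y, which matches the loose fit visible in the table's residuals.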
19. Standard Error of the Estimate (Mean Square Error)
The standard error of the estimate is a measure of the accuracy of predictions.
Note: The regression line is the line that minimizes the sum of squared deviations of prediction (also called the sum of squares error). The standard error of the estimate is closely related to this quantity and is defined as:

σest = √( Σ(Y − Y′)² / N )

where Y = actual value, Y′ = estimated value, and N = number of observations.
20. Example

X       Y       Y'      Y−Y'     (Y−Y')²
1.00    1.00    1.210   −0.210   0.044
2.00    2.00    1.635    0.365   0.133
3.00    1.30    2.060   −0.760   0.578
4.00    3.75    2.485    1.265   1.600
5.00    2.25    2.910   −0.660   0.436
Sum:   15.00   10.30   10.30     0.000   2.791

The standard error of the estimate is therefore σest = √(2.791 / 5) ≈ 0.747.
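The table's arithmetic can be reproduced with a short Python sketch, recovering both the error sum of squares and the standard error of the estimate:

```python
import math

ys    = [1.00, 2.00, 1.30, 3.75, 2.25]        # actual Y
preds = [1.210, 1.635, 2.060, 2.485, 2.910]   # estimated Y'
n = len(ys)

ss_err = sum((y - p) ** 2 for y, p in zip(ys, preds))  # sum of squared errors
se_est = math.sqrt(ss_err / n)                          # standard error of the estimate
print(round(ss_err, 3), round(se_est, 3))   # 2.791 0.747
```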
22. Least Square for Linear Regression
Consider solving Ax = b. The columns a1, a2 of A define a vector space, range(A), and any product
Ax = x1a1 + x2a2
is an arbitrary vector in range(A). If b is a vector in Rⁿ that also lies in the column space of A, then Ax = b has a solution.

23. If b is a vector in Rⁿ but not in the column space of A, then Ax = b has no solution. In that case we try to find the x̂ that makes Ax̂ as close to b as possible; this x̂ is called the least squares solution of the problem, and b̂ = Ax̂ is the corresponding projection of b onto range(A).
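A minimal Python sketch of this least squares solution via the normal equations AᵀA x̂ = Aᵀb, solving the resulting 2×2 system directly with Cramer's rule (the five data points are the deck's worked example, with a column of ones carrying the intercept):

```python
def lstsq_2col(a, b_vec):
    """Least squares for a two-column A via the normal equations (A^T A) x = A^T b."""
    ata = [[sum(r[i] * r[j] for r in a) for j in range(2)] for i in range(2)]
    atb = [sum(r[i] * v for r, v in zip(a, b_vec)) for i in range(2)]
    # Cramer's rule on the 2x2 normal equations
    det = ata[0][0] * ata[1][1] - ata[0][1] * ata[1][0]
    x0 = (atb[0] * ata[1][1] - ata[0][1] * atb[1]) / det
    x1 = (ata[0][0] * atb[1] - ata[1][0] * atb[0]) / det
    return x0, x1

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 2.0, 1.3, 3.75, 2.25]
A = [[1.0, x] for x in xs]          # first column of ones carries the intercept
b0, b1 = lstsq_2col(A, ys)
print(round(b0, 3), round(b1, 3))   # 0.785 0.425
```

The matrix route recovers the same intercept and slope as the scalar closed-form formulas, as it must: both minimize the same sum of squared residuals.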
28. References
[1] Sykes, Alan O. "An Introduction to Regression Analysis." (1993).
[2] Chatterjee, Samprit, and Ali S. Hadi. Regression Analysis by Example. John Wiley & Sons, 2015.
[3] Draper, Norman Richard, Harry Smith, and Elizabeth Pownell. Applied Regression Analysis. Vol. 3. New York: Wiley, 1966.
[4] Montgomery, Douglas C., Elizabeth A. Peck, and G. Geoffrey Vining. Introduction to Linear Regression Analysis. John Wiley & Sons, 2015.
[5] Seber, George A. F., and Alan J. Lee. Linear Regression Analysis. Vol. 936. John Wiley & Sons, 2012.
#4: The dependent variable is variously known as the explained variable, predictand, response, or endogenous variable, while the independent variable is known as the explanatory variable, regressor, or exogenous variable.
#27: MATLAB example using the built-in accidents dataset:
load accidents
x = hwydata(:,14);   % Population of states
y = hwydata(:,4);    % Accidents per state
format long
b1 = x\y;            % least squares slope with no intercept term
yCalc1 = b1*x;
scatter(x,y)
hold on
plot(x,yCalc1)
xlabel('Population of state')
ylabel('Fatal traffic accidents per state')
title('Linear Regression Relation Between Accidents & Population')
grid on
X = [ones(length(x),1) x];   % prepend a column of ones to estimate an intercept
b = X\y;                     % least squares slope and intercept
yCalc2 = X*b;
plot(x,yCalc2,'--')
legend('Data','Slope','Slope & Intercept','Location','best');
Rsq1 = 1 - sum((y - yCalc1).^2)/sum((y - mean(y)).^2);   % R^2 for the no-intercept fit
Rsq2 = 1 - sum((y - yCalc2).^2)/sum((y - mean(y)).^2);   % R^2 for the intercept fit