
MEC3361 IIoT and Data Analytics II

Lecture 3: Multi Regression

Asst. Prof. Seow Chee Kiat


[email protected]

Outline
• Simple Linear Regression
• Multivariate Linear Regression
• Basis Function Regression
• Regularization

Simple Linear Regression

• Inputs → Density-based model (such as a Bayesian Network) → probability
• Inputs → Classifier (such as a Decision Tree or Neural Network) → predict the class
• Inputs → Regressor → predict a real-valued number
Simple Linear Regression

[Figure: two trees side by side, both predicting the dependent variable "Play?". Each splits on Weather outlook (Sunny → Humidity ≤ 70 / > 70; Cloudy → Windy True / False). The decision tree (classification, left) stores class counts in its leaves, e.g. Play: 3, Don't Play: 0. The regression tree (right) stores a predicted play time in each leaf (roughly 30, 5, 0, and 32 minutes), obtained by averaging the training play times that fall into that leaf, e.g. samples of 30 and 45 min, or 0 and 0 min.]
Simple Linear Regression

• Given an input $x$, we would like to compute $y$
  • Predict height from age
  • Predict Google's price from Yahoo's price
  • Predict distance from a wall using IoT sensors

• The model is $y = wx + \varepsilon$, where $y$ is what we are trying to predict, $x$ is the observed value, $w$ is the regression parameter, and $\varepsilon$ is the measurement noise
• Each sample satisfies $y_i = w x_i + \varepsilon_i$, so the residual is $\varepsilon_i = y_i - w x_i$; with $N$ samples there are $N$ residuals $\varepsilon_1, \varepsilon_2, \cdots, \varepsilon_N$
• The goal is to estimate $w$ from the training pairs $\langle x_i, y_i \rangle$ so as to predict $\hat{y}_i$ using least squares, where $N$ is the total number of training pairs:

$$w = \arg\min_w \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 = \arg\min_w \sum_{i=1}^{N} \varepsilon_i^2 = \arg\min_w \sum_{i=1}^{N} (y_i - w x_i)^2$$

[Figure: scatter of the training data with the fitted linear regression line.]
Simple Linear Regression

• It can be shown that the optimal value of $w$, after taking the derivative with respect to $w$ and setting it to 0, is

$$w = \arg\min_w \sum_{i=1}^{N} (y_i - w x_i)^2 \;\rightarrow\; \frac{d}{dw} \sum_{i=1}^{N} (y_i - w x_i)^2 = 0$$

$$w = \frac{\sum_{i=1}^{N} x_i y_i}{\sum_{i=1}^{N} x_i^2} = \frac{\mathrm{cov}(x, y)}{\mathrm{var}(x)} \;\text{ if the means of } x \text{ and } y \text{ are } 0$$

[Figure: 50 data points of $y = 2x$ with normal noise of standard deviation 1; the fit gives optimal $w = 2.027$.]

• If there is a bias (intercept) term $w_0$, the model becomes

$$y = w_1 x + w_0 + \varepsilon$$

• The optimal values of $w_1$ and $w_0$ using least squares are

$$w_1 = \frac{N \sum_{i=1}^{N} x_i y_i - \sum_{i=1}^{N} x_i \sum_{i=1}^{N} y_i}{N \sum_{i=1}^{N} x_i^2 - \left(\sum_{i=1}^{N} x_i\right)^2}, \qquad w_0 = \frac{\sum_{i=1}^{N} (y_i - w_1 x_i)}{N}$$

[Figure: 50 data points of $y = 2x + 5$ with normal noise of standard deviation 1; the fit gives optimal $w_1 = 2.027$ and $w_0 = 5.001$ against actual values of 2 and 5.]
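As a quick check of these closed-form expressions, here is a minimal NumPy sketch (not from the lecture) that regenerates data like the slide's example, $y = 2x + 5$ with unit-variance Gaussian noise, and recovers $w_1$ and $w_0$; the input range, random seed, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
x = rng.uniform(0.0, 10.0, N)                   # input range is an assumption
y = 2.0 * x + 5.0 + rng.normal(0.0, 1.0, N)     # y = 2x + 5 with noise std 1, as in the slide

# Closed-form least-squares estimates from the slide:
w1 = (N * np.sum(x * y) - np.sum(x) * np.sum(y)) / (N * np.sum(x ** 2) - np.sum(x) ** 2)
w0 = np.sum(y - w1 * x) / N

print(f"w1 = {w1:.3f}, w0 = {w0:.3f}")          # should land near 2 and 5
```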
Simple Linear Regression (Proof)

• $y_i = w x_i + \varepsilon_i \rightarrow \varepsilon_i = y_i - w x_i$.
• Since there are $N$ data points, there are $N$ residuals $\varepsilon_1 \cdots \varepsilon_N$.
• The total sum of errors is $\varepsilon_1 + \varepsilon_2 + \cdots + \varepsilon_N = \sum_{i=1}^{N} \varepsilon_i$.
• The total sum of squared errors is $E = \sum_{i=1}^{N} \varepsilon_i^2 = \sum_{i=1}^{N} (y_i - w x_i)^2$.

[Figure: $E$ plotted against the weight $w$: $\frac{\partial E}{\partial w} < 0$ to the left of the minimum, $\frac{\partial E}{\partial w} > 0$ to the right, and $\frac{\partial E}{\partial w} = 0$ at the minimum.]

• To get the minimum error, $w = \arg\min_w \sum_{i=1}^{N} (y_i - w x_i)^2$.
• To obtain $w$, differentiate with respect to $w$ and set to 0:

$$\frac{d}{dw} \sum_{i=1}^{N} (y_i - w x_i)^2 = 0 \rightarrow 2 \sum_{i=1}^{N} -x_i (y_i - w x_i) = 0 \rightarrow \sum_{i=1}^{N} x_i y_i - \sum_{i=1}^{N} w x_i^2 = 0 \rightarrow \sum_{i=1}^{N} w x_i^2 = \sum_{i=1}^{N} x_i y_i$$

$$\rightarrow w = \frac{\sum_{i=1}^{N} x_i y_i}{\sum_{i=1}^{N} x_i^2}$$
Simple Linear Regression

• Accuracy of prediction
• Root mean square error (RMS):

$$\text{RMS error} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \varepsilon_i^2} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$$

where $\hat{y}_i = w_1 x_i + w_0$ is the predicted $y_i$

• The more accurate the prediction, the closer the RMS error is to 0

• $R^2$ (coefficient of determination):

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} \quad \leftarrow \frac{\text{variance of the model errors}}{\text{variance of the data}}$$

where $\bar{y}$ is the mean of $y$

• The more accurate the prediction, the closer $R^2$ is to 1
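A short sketch of both accuracy metrics as a helper function (not from the slides); the usage comment assumes the `x`, `y`, `w1`, `w0` arrays from the previous sketch.

```python
import numpy as np

def rms_and_r2(y, y_hat):
    """Return (RMS error, R^2) for predictions y_hat against targets y."""
    rms = np.sqrt(np.mean((y - y_hat) ** 2))                             # -> 0 for a good fit
    r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)  # -> 1 for a good fit
    return rms, r2

# e.g., with the fit from the previous sketch:  rms, r2 = rms_and_r2(y, w1 * x + w0)
```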
Multivariate Linear Regression (MLR)

• Multiple-input (multivariate) regression
• E.g., predict Google's stock price using the Yahoo, Microsoft, eBay, and Amazon prices:

$$y = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + \varepsilon$$

where $y$ is the Google stock price and $x_1, x_2, x_3, x_4$ are the Yahoo, Microsoft, eBay, and Amazon stock prices

• In general, a multivariate regression model for $k$ features and $N$ data points can be modeled as

$$\boldsymbol{y} = \mathbf{X}\boldsymbol{w} + w_0 + \boldsymbol{\epsilon}$$

where

$$\boldsymbol{y} = (y_1 \cdots y_N)^\top, \quad \mathbf{X} = \begin{pmatrix} x_{11} & \cdots & x_{1k} \\ \vdots & \ddots & \vdots \\ x_{N1} & \cdots & x_{Nk} \end{pmatrix} = \begin{pmatrix} \mathbf{X}_1 \\ \vdots \\ \mathbf{X}_N \end{pmatrix}, \quad \boldsymbol{w} = (w_1 \cdots w_k)^\top, \quad \boldsymbol{\epsilon} = (\varepsilon_1 \cdots \varepsilon_N)^\top$$

• The task is to find the set of weights that minimizes the error or loss $J(\boldsymbol{w})$:

$$\boldsymbol{w} = \arg\min_{\boldsymbol{w}} J(\boldsymbol{w}) = \arg\min_{\boldsymbol{w}} \sum_{i=1}^{N} (y_i - \mathbf{X}_i \boldsymbol{w} - w_0)^2$$
Basis Function Regression

• Not all problems or functions can be approximated by a straight line/hyperplane; in general the underlying function is non-linear
• E.g., modeling the track of a roller coaster, a GPS route trajectory, or the trajectory of a sine wave

[Figures: track of a roller coaster; sine wave.]

• The idea is to transform the non-linear function into a linear function
• E.g., for a polynomial function:

$$y = w_0 + w_1 x_1 + w_2 x_2^2 + w_3 x_3^3 + w_4 x_4^4 + \cdots + w_k x_k^k + \varepsilon$$

$$y = w_0 + w_1 z_1 + w_2 z_2 + w_3 z_3 + w_4 z_4 + \cdots + w_k z_k + \varepsilon$$

where $z_1 = f(x_1) = x_1$, $z_2 = f(x_2) = x_2^2$, $\ldots$, $z_k = f(x_k) = x_k^k$

• $f(x)$ is called the basis function; it maps the original data to a linear form
• Any function of the input values can be used as a basis; the solution for the parameters of the regression remains the same
Basis Function Regression

• There are many non-linear basis functions that can be used, such as:
  • Polynomial: $\phi_j(x) = x^j$ for $j = 0 \cdots k$, where $k$ is the number of features
  • Gaussian: $\phi_j(x) = \exp\left(-\frac{(x - \mu_j)^2}{2\sigma_j^2}\right)$ for $j = 0 \cdots k$, where $\sigma_j$ and $\mu_j$ are the standard deviation and mean of the $j$-th feature data respectively
  • Sigmoid: $\phi_j(x) = \frac{1}{1 + e^{-x}}$ for $j = 0 \cdots k$
  • Logarithmic: $\phi_j(x) = \log(x_j + 1)$ for $j = 0 \cdots k$
• In general, basis function linear regression can be written as

$$y = \sum_{j=0}^{k} w_j \phi_j(x) + \varepsilon$$

where $\phi_j(x)$ can be either $x_j$ or one of the above non-linear basis functions, and $\phi_0(x) = 1$ for the intercept term

[Figures: 50 points of $y = \sin x$ corrupted by Gaussian noise, fit with a polynomial function of order 7 and with Gaussian basis functions; track of a roller coaster.]
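A small sketch of how such a basis expansion might be assembled as columns of the design matrix. The helper `gaussian_basis` is hypothetical, and the choice of centers $\mu_j$ and width $\sigma$ is an assumption, since the slide leaves them to the modeler.

```python
import numpy as np

def gaussian_basis(x, centers, sigma):
    """Columns phi_j(x) = exp(-(x - mu_j)^2 / (2 sigma^2)), one per center."""
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2.0 * sigma ** 2))

x = np.linspace(0.0, 10.0, 50)
mu = np.linspace(0.0, 10.0, 8)                   # assumed centers spread over the input range
Phi = np.column_stack([
    np.ones_like(x),                             # phi_0(x) = 1, the intercept term
    gaussian_basis(x, mu, sigma=1.5),            # Gaussian bases (width assumed)
])
print(Phi.shape)                                 # (50, 9): N rows, one column per basis
```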
Basis Function Regression

• Using the same least-squares approach, the task is to predict, for the $N$ training samples, $\hat{y}_1, \hat{y}_2 \cdots \hat{y}_N$, where $\hat{y}_i = \sum_{j=0}^{k} w_j \phi_j(x^i)$, by minimizing the error or loss function

$$J(\boldsymbol{w}) = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{N} \left( y_i - \sum_{j=0}^{k} w_j \phi_j(x^i) \right)^2$$

• By taking the derivative with respect to $w_j$ and setting it to 0, it can be shown that

$$\frac{\partial}{\partial w_j} J(\boldsymbol{w}) = 0 \rightarrow 2 \sum_{i=1}^{N} (y_i - \hat{y}_i)\, \phi_j(x^i) = 0 \rightarrow \boldsymbol{w} = (\boldsymbol{\phi}^\top \boldsymbol{\phi})^{-1} \boldsymbol{\phi}^\top \boldsymbol{y}$$

where

$$\boldsymbol{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \quad \boldsymbol{\phi} = \begin{pmatrix} \phi_0(x^1) & \phi_1(x^1) & \cdots & \phi_k(x^1) \\ \phi_0(x^2) & \phi_1(x^2) & \cdots & \phi_k(x^2) \\ \vdots & \vdots & \ddots & \vdots \\ \phi_0(x^N) & \phi_1(x^N) & \cdots & \phi_k(x^N) \end{pmatrix}, \quad \boldsymbol{w} = \begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_k \end{pmatrix}, \quad \boldsymbol{\varepsilon} = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{pmatrix}, \quad \boldsymbol{y} = \boldsymbol{\phi}\boldsymbol{w} + \boldsymbol{\varepsilon}$$
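The closed form $\boldsymbol{w} = (\boldsymbol{\phi}^\top \boldsymbol{\phi})^{-1} \boldsymbol{\phi}^\top \boldsymbol{y}$ is easy to sketch in NumPy with a polynomial basis. The noisy-sine setup below mirrors the slide's order-7 figure, but the sample locations, noise level, and seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
N, k = 50, 7                                    # 50 samples, polynomial order 7 as in the figure
x = np.linspace(-np.pi, np.pi, N)               # sample locations are an assumption
y = np.sin(x) + rng.normal(0.0, 0.1, N)         # y = sin x corrupted by Gaussian noise

Phi = np.vander(x, k + 1, increasing=True)      # rows [phi_0(x_i) ... phi_k(x_i)], phi_j(x) = x**j
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)     # normal equations: w = (Phi^T Phi)^-1 Phi^T y

y_hat = Phi @ w                                 # fitted values
print("RMS error:", np.sqrt(np.mean((y - y_hat) ** 2)))
```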
Regression Model Overfitting

• The use of basis functions in linear regression makes the model flexible, but it can easily lead to over-fitting.

[Figures: 50 points of $y = \sin x$ corrupted by Gaussian noise, fit with polynomial functions of order 1, 3, 7, and 20. The order-1 fit gives $w_0 = -0.012$, $w_1 = 0.348$; the order-3 fit gives $w_0 = -0.721$, $w_1 = 1.089$, $w_2 = 0.141$, $w_3 = -0.008$.]
Regression Model Overfitting

• Regularization
  • A regulated process applied to the linear model: minimize the least-squares loss function plus a penalty function
• Ridge Regression ($L_2$ regularization)
  • A form of MLR that penalizes the sum of squares of the model coefficients:

$$J(\boldsymbol{w}) = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 + P \quad \text{where } P \text{ is the penalty function}$$

$$J(\boldsymbol{w}) = \sum_{i=1}^{N} (y_i - \mathbf{X}_i \boldsymbol{w} - w_0)^2 + \alpha \left( \lVert \boldsymbol{w} \rVert^2 + w_0^2 \right)$$

where $\alpha$ is the hyperparameter for ridge regression. If $\alpha = 0$, it becomes the standard MLR loss function.

[Figures: 50 points of $y = \sin x$ corrupted by Gaussian noise, fit with a polynomial function of order 20, and with ridge regression at $\alpha = 5$.]
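Adding the penalty to $J(\boldsymbol{w})$ before differentiating gives the ridge closed form $\boldsymbol{w} = (\boldsymbol{\phi}^\top \boldsymbol{\phi} + \alpha \mathbf{I})^{-1} \boldsymbol{\phi}^\top \boldsymbol{y}$, sketched below. The order-20 polynomial and $\alpha = 5$ match the slide's figure; the data generation and the rescaling of $x$ are implementation assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(3)
N, k, alpha = 50, 20, 5.0                         # order-20 polynomial, alpha = 5 as in the figure
x = np.linspace(-np.pi, np.pi, N)
y = np.sin(x) + rng.normal(0.0, 0.1, N)           # noisy sine samples (noise level assumed)

t = x / np.pi                                     # rescale to [-1, 1] so t**20 stays well scaled
Phi = np.vander(t, k + 1, increasing=True)        # basis matrix, phi_j(t) = t**j

# Ridge closed form: (Phi^T Phi + alpha I) w = Phi^T y; alpha = 0 recovers plain least squares
w_ridge = np.linalg.solve(Phi.T @ Phi + alpha * np.eye(k + 1), Phi.T @ y)
w_plain = np.linalg.lstsq(Phi, y, rcond=None)[0]  # unregularized fit for comparison

print("max |w| plain :", np.abs(w_plain).max())   # large, wiggly coefficients
print("max |w| ridge :", np.abs(w_ridge).max())   # shrunk toward zero
```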
Regression Model Overfitting

• Regularization
• Lasso Regression ($L_1$ regularization)
  • A form of MLR that imposes an $L_1$ penalty on the regression coefficients, yielding a sparser model:

$$J(\boldsymbol{w}) = \sum_{i=1}^{N} (y_i - \mathbf{X}_i \boldsymbol{w} - w_0)^2 + \alpha \left( \lVert \boldsymbol{w} \rVert_1 + \lvert w_0 \rvert \right) \quad \text{(the second term is the penalty function)}$$

where $\alpha$ is the hyperparameter for lasso regression. If $\alpha = 0$, it becomes the standard MLR loss function.

[Figures: 50 points of $y = \sin x$ corrupted by Gaussian noise, fit with a polynomial function of order 20, and with lasso regression at $\alpha = 0.05$.]
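For comparison, a sketch using scikit-learn's Ridge and Lasso on the same kind of noisy-sine data, to show lasso driving most coefficients to zero. The $\alpha$ values follow the figures, but note that scikit-learn penalizes only the coefficients (not the intercept) and scales its loss slightly differently from the slide's $J(\boldsymbol{w})$; everything else here is an assumption.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(4)
x = np.linspace(-np.pi, np.pi, 50)
y = np.sin(x) + rng.normal(0.0, 0.1, 50)          # noisy sine, as in the figures
X = x.reshape(-1, 1)

for name, model in [("ridge", Ridge(alpha=5.0)),
                    ("lasso", Lasso(alpha=0.05, max_iter=100_000))]:
    # order-20 polynomial features, standardized so the solvers behave
    pipe = make_pipeline(PolynomialFeatures(degree=20), StandardScaler(), model)
    pipe.fit(X, y)
    coef = pipe.named_steps[name].coef_
    print(name, "non-zero coefficients:", int(np.sum(np.abs(coef) > 1e-6)))
```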
Next Lecture
Lecture 4: Classification Analysis II

#UofGWorldChangers
@UofGlasgow
