L3 Linear Regression

1) The document introduces linear regression and the ordinary least squares method for learning a linear regression function from training data. 2) Ordinary least squares finds the coefficients that minimize the residual sum of squares between the predicted and actual target values. It works by computing the closed-form solution of the normal equations. 3) The method has limitations if the data matrix is not full rank, meaning some attributes are dependent on others, making the matrix not invertible. In this case, other methods are needed.

Uploaded by

Hieu Tien Trinh


Introduction to

Machine Learning and Data Mining

(Học máy và Khai phá dữ liệu)

Khoat Than
School of Information and Communication Technology
Hanoi University of Science and Technology

2021
Contents
¡ Introduction to Machine Learning & Data Mining
¡ Supervised learning
¨ Linear regression
¡ Unsupervised learning

¡ Practical advice
Linear regression: introduction
¡ Regression problem: learn a function y = f(x) from a given
training set D = {(x1, y1), (x2, y2), …, (xM, yM)} such that
yi ≅ f(xi) for every i
¨ Each observation of x is represented by a vector in an n-dimensional
space, e.g., xi = (xi1, xi2, …, xin)T. Each dimension represents an
attribute/feature/variate.
¨ Bold characters denote vectors.
¡ Linear model: if f(x) is assumed to be of linear form
f(x) = w0 + w1x1 + … + wnxn
¨ w0, w1, …, wn are the regression coefficients/weights. w0 sometimes is
called “bias”.
¡ Note: learning a linear function is equivalent to learning the
coefficient vector w = (w0, w1, …, wn)T.
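A minimal sketch of the linear model above, assuming NumPy is available. The helper name `predict` and the example numbers are illustrative assumptions, not part of the lecture: the prediction is the bias w0 plus a dot product between the remaining weights and the attributes.

```python
import numpy as np

def predict(w, x):
    """Evaluate f(x) = w0 + w1*x1 + ... + wn*xn.

    w has n+1 entries (w[0] is the bias w0); x has n entries.
    """
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    return w[0] + np.dot(w[1:], x)

# Hypothetical coefficients and observation, chosen only for illustration.
w = [1.0, 2.0, -0.5]      # w0 = 1, w1 = 2, w2 = -0.5
x = [3.0, 4.0]            # x1 = 3, x2 = 4
print(predict(w, x))      # 1 + 2*3 - 0.5*4 = 5.0
```

Learning the function therefore amounts to choosing the n+1 numbers stored in `w`.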
Linear regression: example
¡ What is the best function?

x       y
0.13    -0.91
1.02    -0.17
3.17    1.61
-2.76   -3.31
1.44    0.18
5.28    3.36
-1.74   -2.46
7.93    5.56
...     ...

(Figure: the observations plotted as f(x) against x.)
Prediction
¡ For each observation x = (x1, x2, …, xn)T
¨ The true output: cx
(but unknown for future data)
¨ Prediction by our system:
yx = w0 + w1x1 + … + wnxn
¨ We often expect yx ≅ cx.
¡ Prediction for a future observation z = (z1, z2, …, zn)T
¨ Use the learned function to make a prediction:
f(z) = w0 + w1z1 + … + wnzn
Learning a regression function
¡ Learning goal: learn a function f* such that its prediction in the
future is the best.
¨ Its generalization is the best.
¡ Difficulty: there are infinitely many candidate functions
¨ How can we learn?
¨ Is function f better than g?
¡ Use a measure
¨ Loss function is often used to guide learning.

Loss function
¡ Definition:
¨ The error/loss of the prediction for an observation x = (x1, x2, …, xn)T:
$r(\mathbf{x}) = [c_x - f(\mathbf{x})]^2 = (c_x - w_0 - w_1 x_1 - \dots - w_n x_n)^2$
¨ The expected loss (cost, risk) of f over the whole space:
$E = E_{\mathbf{x}}[r(\mathbf{x})] = E_{\mathbf{x}}\big[(c_x - f(\mathbf{x}))^2\big]$
($E_{\mathbf{x}}$ is the expectation over x)
¡ The goal of learning is to find f* that minimizes the expected loss:
$f^* = \arg\min_{f \in H} E_{\mathbf{x}}[r(\mathbf{x})]$
¨ H is the space of functions of linear form.
¡ But, we cannot work directly with this problem during the
learning phase. (why?)
Empirical loss
¡ We can only observe a set of training data D = {(x1, y1), (x2, y2), …, (xM, yM)}, and have to learn f from D.
¡ Empirical loss (residual sum of squares):
$RSS(f) = \sum_{i=1}^{M} (y_i - f(\mathbf{x}_i))^2 = \sum_{i=1}^{M} (y_i - w_0 - w_1 x_{i1} - \dots - w_n x_{in})^2$
¨ $\frac{1}{M} RSS(f)$ is an approximation to $E_{\mathbf{x}}[r(\mathbf{x})]$ on the training set D.
¡ $\frac{1}{M} RSS(f) - E_{\mathbf{x}}[r(\mathbf{x})]$ is often known as the generalization error of f.
¡ Many learning algorithms are based on this RSS and its variants.
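To make the empirical loss concrete, here is a small sketch of computing RSS for a linear model with NumPy. The function name `rss` and the three-point data set are made up for illustration; they do not come from the lecture.

```python
import numpy as np

def rss(w, X, y):
    """Residual sum of squares of the linear model w on data (X, y).

    X is an M x n array of observations, y holds the M targets,
    and w has n+1 entries with w[0] as the bias.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    residuals = y - (w[0] + X @ w[1:])
    return float(np.sum(residuals ** 2))

# Tiny made-up data set (M = 3, n = 1), for illustration only.
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([1.0, 3.0, 4.0])

print(rss([1.0, 2.0], X, y))   # residuals 0, 0, -1 -> RSS = 1.0
print(rss([0.0, 0.0], X, y))   # residuals 1, 3, 4  -> RSS = 26.0
```

Dividing the result by M = `len(y)` gives the average loss that approximates the expected loss.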
Methods: ordinary least squares (OLS)
¡ Given D, we find f* that minimizes RSS:
$f^* = \arg\min_{f \in H} RSS(f)$
$\Leftrightarrow \mathbf{w}^* = \arg\min_{\mathbf{w}} \sum_{i=1}^{M} (y_i - w_0 - w_1 x_{i1} - \dots - w_n x_{in})^2$ (1)
¡ This method is often known as ordinary least squares (OLS).
¡ Find w* by taking the gradient of RSS and then solving the equation RSS' = 0. We have:
$\mathbf{w}^* = (\mathbf{A}^T \mathbf{A})^{-1} \mathbf{A}^T \mathbf{y}$
¨ Where A is the data matrix of size M×(n+1), whose ith row is $\mathbf{A}_i = (1, x_{i1}, x_{i2}, \dots, x_{in})$; $\mathbf{B}^{-1}$ is the inverse of matrix B; $\mathbf{y} = (y_1, y_2, \dots, y_M)^T$.
¨ Note: we assume that $\mathbf{A}^T \mathbf{A}$ is invertible.
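A short sketch of how the closed form follows from (1), written in matrix notation. The matrix form of RSS below is a restatement of (1) using the data matrix A and target vector y; the steps are the standard argument rather than something spelled out on the slide.

```latex
\begin{aligned}
RSS(\mathbf{w}) &= \sum_{i=1}^{M} (y_i - \mathbf{A}_i \mathbf{w})^2
                 = \|\mathbf{y} - \mathbf{A}\mathbf{w}\|_2^2 \\
\nabla_{\mathbf{w}} RSS(\mathbf{w}) &= -2\,\mathbf{A}^T (\mathbf{y} - \mathbf{A}\mathbf{w}) = 0 \\
\mathbf{A}^T \mathbf{A}\,\mathbf{w} &= \mathbf{A}^T \mathbf{y} \\
\mathbf{w}^* &= (\mathbf{A}^T \mathbf{A})^{-1} \mathbf{A}^T \mathbf{y}
\end{aligned}
```

The third line is the system usually called the normal equations; the last step is only valid when $\mathbf{A}^T \mathbf{A}$ is invertible.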
Methods: OLS
¡ Input: D = {(x1, y1), (x2, y2), …, (xM, yM)}
¡ Output: w*
¡ Learning: compute
$\mathbf{w}^* = (\mathbf{A}^T \mathbf{A})^{-1} \mathbf{A}^T \mathbf{y}$
¨ Where A is the data matrix of size M×(n+1), whose ith row is $\mathbf{A}_i = (1, x_{i1}, x_{i2}, \dots, x_{in})$; $\mathbf{B}^{-1}$ is the inverse of matrix B; $\mathbf{y} = (y_1, y_2, \dots, y_M)^T$.
¨ Note: we assume that $\mathbf{A}^T \mathbf{A}$ is invertible.
¡ Prediction for a new x:
$y_{\mathbf{x}} = w_0^* + w_1^* x_1 + \dots + w_n^* x_n$
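The OLS algorithm can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the function names `fit_ols` and `predict` are hypothetical, the data are synthetic and noise-free, and the normal equations are solved with `np.linalg.solve` rather than by forming the inverse explicitly (a common numerical choice, not something the slide prescribes).

```python
import numpy as np

def fit_ols(X, y):
    """Return w* = (A^T A)^{-1} A^T y, where A is X with a leading column of ones."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])   # M x (n+1) data matrix
    # Solve A^T A w = A^T y; this assumes A^T A is invertible.
    return np.linalg.solve(A.T @ A, A.T @ y)

def predict(w, X):
    """Predict w0 + w1*x1 + ... + wn*xn for every row of X."""
    X = np.asarray(X, dtype=float)
    return w[0] + X @ w[1:]

# Synthetic data generated from y = 2 + 3*x1 - x2 (an assumption for this demo).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1]

w_star = fit_ols(X, y)
print(w_star)   # close to (2, 3, -1) because the data contain no noise
```

With noisy targets the recovered coefficients would only approximate the generating ones.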
Methods: OLS example

x       y
0.13    -0.91
1.02    -0.17
3.17    1.61
-2.76   -3.31
1.44    0.18
5.28    3.36
-1.74   -2.46
7.93    5.56

(Figure: the data points and the fitted line f*; horizontal axis from -4 to 8, vertical axis from -4 to 6.)

f*(x) = 0.81x – 0.78
Methods: limitations of OLS
¡ OLS cannot work if $\mathbf{A}^T \mathbf{A}$ is not invertible.
¨ If some columns (attributes/features) of A are linearly dependent, then A does not have full column rank and therefore $\mathbf{A}^T \mathbf{A}$ is not invertible.
¡ OLS requires considerable computation because it needs a matrix inversion.
¨ Intractable for very high-dimensional problems.
¡ OLS is prone to overfitting, because the learning phase focuses only on minimizing errors on the training data.
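The first limitation can be checked numerically. The sketch below builds a synthetic data matrix in which one attribute is an exact multiple of another (the data and variable names are assumptions for illustration), and shows that A loses a rank. As an aside not covered on the slide, `np.linalg.lstsq` still returns one minimizer of RSS in this situation, namely the minimum-norm one.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=20)
x2 = 2.0 * x1                        # second attribute is an exact multiple of the first
y = 1.0 + x1 + rng.normal(scale=0.1, size=20)

A = np.column_stack([np.ones(20), x1, x2])   # 20 x 3 data matrix

rank = np.linalg.matrix_rank(A)
print(A.shape[1], rank)              # 3 columns, but rank 2

# A^T A is singular here, so the closed form (A^T A)^{-1} A^T y is undefined.
# lstsq does not invert A^T A and returns the minimum-norm least-squares solution.
w_min_norm, *_ = np.linalg.lstsq(A, y, rcond=None)
print(w_min_norm)
```

Infinitely many coefficient vectors reach the same RSS in this case, which is why a unique OLS solution does not exist.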
Methods: Ridge regression (1)
¡ Given D = {(x1, y1), (x2, y2), …, (xM, yM)}, we solve for:
$f^* = \arg\min_{f \in H} RSS(f) + \lambda \|\mathbf{w}\|_2^2$
$\Leftrightarrow \mathbf{w}^* = \arg\min_{\mathbf{w}} \sum_{i=1}^{M} (y_i - \mathbf{A}_i \mathbf{w})^2 + \lambda \sum_{j=0}^{n} w_j^2$ (2)
¨ Where $\mathbf{A}_i = (1, x_{i1}, x_{i2}, \dots, x_{in})$ is composed from $\mathbf{x}_i$; and λ is a regularization constant (λ > 0). $\|\mathbf{w}\|_2$ is the L2 norm.
Methods: Ridge regression (2)
¡ Problem (2) is equivalent to the following:
$\mathbf{w}^* = \arg\min_{\mathbf{w}} \sum_{i=1}^{M} (y_i - \mathbf{A}_i \mathbf{w})^2$ (3)
subject to $\sum_{j=0}^{n} w_j^2 \le t$
¨ for some constant t.
¡ The regularization/penalty term: $\lambda \|\mathbf{w}\|_2^2$
¨ Limits the magnitude/size of w* (i.e., reduces the search space for f*).
¨ Helps us to trade off between the fitting of f on D and its
generalization on future observations.
Methods: Ridge regression (3)
¡ We solve for w* by taking the gradient of the objective function in (2), and then zeroing it. Therefore we obtain:
$\mathbf{w}^* = (\mathbf{A}^T \mathbf{A} + \lambda \mathbf{I}_{n+1})^{-1} \mathbf{A}^T \mathbf{y}$
¨ Where A is the data matrix of size M×(n+1), whose ith row is $\mathbf{A}_i = (1, x_{i1}, x_{i2}, \dots, x_{in})$; $\mathbf{B}^{-1}$ is the inverse of matrix B; $\mathbf{y} = (y_1, y_2, \dots, y_M)^T$; $\mathbf{I}_{n+1}$ is the identity matrix of size n+1.
¡ Compared with OLS, Ridge can
¨ Avoid the cases of singularity, unlike OLS: $\mathbf{A}^T \mathbf{A} + \lambda \mathbf{I}_{n+1}$ is invertible for every λ > 0. Hence Ridge always works.
¨ Reduce overfitting.
¨ But the error on the training data might be greater than that of OLS.
¡ Note: the predictiveness of Ridge depends heavily on the
choice of the hyperparameter λ.
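A minimal NumPy sketch of the Ridge closed form follows. The function name `fit_ridge`, the value λ = 0.1, and the synthetic data are assumptions for illustration. The data deliberately contain two linearly dependent attributes, a case where $\mathbf{A}^T \mathbf{A}$ alone is singular, to show that the regularized system can still be solved.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Return w* = (A^T A + lam * I)^{-1} A^T y, with A = [1, X] and lam > 0."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])   # M x (n+1) data matrix
    I = np.eye(A.shape[1])                      # identity of size n+1
    return np.linalg.solve(A.T @ A + lam * I, A.T @ y)

# Synthetic data with dependent attributes: x2 = 2 * x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=20)
X = np.column_stack([x1, 2.0 * x1])
y = 1.0 + x1 + rng.normal(scale=0.1, size=20)

w_ridge = fit_ridge(X, y, lam=0.1)
print(w_ridge)   # finite coefficients even though A^T A is singular
```

Note that this version penalizes every coefficient, including w0, exactly as in (2).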
Methods: Ridge regression (4)
¡ Input: D = {(x1, y1), (x2, y2), …, (xM, yM)} and λ > 0
¡ Output: w*
¡ Learning: compute
$\mathbf{w}^* = (\mathbf{A}^T \mathbf{A} + \lambda \mathbf{I}_{n+1})^{-1} \mathbf{A}^T \mathbf{y}$
¡ Prediction for a new x:
$y_{\mathbf{x}} = w_0^* + w_1^* x_1 + \dots + w_n^* x_n$
¡ Note: to avoid some negative effects of the magnitude of y on
covariates x, one should remove w0 from the penalty term in (2).
In this case, the solution of w* should be modified slightly.
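One common way to leave w0 out of the penalty is to replace the identity matrix by a diagonal matrix whose first entry is zero. The sketch below assumes that variant; the slide does not state which modification it has in mind, and the function name `fit_ridge_free_intercept` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_ridge_free_intercept(X, y, lam):
    """Ridge solution where only w1..wn are penalized (w0 is left free)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])
    D = np.eye(A.shape[1])
    D[0, 0] = 0.0                     # no penalty on the bias w0
    return np.linalg.solve(A.T @ A + lam * D, A.T @ y)

# Synthetic data, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=30)

w = fit_ridge_free_intercept(X, y, lam=1.0)
w_shifted = fit_ridge_free_intercept(X, y + 5.0, lam=1.0)

# Shifting every target by 5 moves only the bias; the other weights are unchanged.
print(w_shifted - w)
```

This shift behaviour is the practical benefit: the overall level of y no longer influences how strongly the attribute weights are shrunk.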
An example of using Ridge and OLS
¡ The training set D contains 67 observations on prostate cancer,
each represented by 8 attributes. Ridge and OLS were
learned from D, and then used to predict 30 new observations.

w              Ordinary Least Squares   Ridge
0 (intercept)  2.465                    2.452
lcavol         0.680                    0.420
lweight        0.263                    0.238
age            −0.141                   −0.046
lbph           0.210                    0.162
svi            0.305                    0.227
lcp            −0.288                   0.000
gleason        −0.021                   0.040
pgg45          0.267                    0.133
Test RSS       0.521                    0.492
Effects of λ in Ridge regression
¡ w* = (w0, S1, S2, S3, S4, S5, S6, AGE, SEX, BMI, BP) changes as the
regularization constant λ changes.
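The effect of λ can be explored with a small experiment. The sketch below uses synthetic data rather than the data set behind the slide, and the function name `fit_ridge` and the grid of λ values are assumptions; it prints the Ridge coefficients for increasing λ and records their norm.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Return w* = (A^T A + lam * I)^{-1} A^T y, with A = [1, X]."""
    A = np.column_stack([np.ones(len(X)), X])
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)

# Synthetic data, for illustration only.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 4))
y = X @ np.array([3.0, -1.0, 0.0, 2.0]) + rng.normal(scale=0.5, size=40)

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
norms = []
for lam in lambdas:
    w = fit_ridge(X, y, lam)
    norms.append(float(np.linalg.norm(w)))
    print(f"lambda={lam:8.2f}  w={np.round(w, 3)}")

# The norm of w* shrinks as lambda grows: a stronger penalty pulls w* toward 0.
print(np.round(norms, 3))
```

In practice λ is usually chosen by evaluating several candidate values on held-out data.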
LASSO
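¡ Ridge regression uses the L2 norm for regularization:
$\mathbf{w}^* = \arg\min_{\mathbf{w}} \sum_{i=1}^{M} (y_i - \mathbf{A}_i \mathbf{w})^2$, subject to $\sum_{j=0}^{n} w_j^2 \le t$ (3)
¡ Replacing the L2 norm by the L1 norm results in LASSO:
$\mathbf{w}^* = \arg\min_{\mathbf{w}} \sum_{i=1}^{M} (y_i - \mathbf{A}_i \mathbf{w})^2$
subject to $\sum_{j=0}^{n} |w_j| \le t$
¡ Equivalently:
$\mathbf{w}^* = \arg\min_{\mathbf{w}} \sum_{i=1}^{M} (y_i - \mathbf{A}_i \mathbf{w})^2 + \lambda \|\mathbf{w}\|_1$ (4)
¡ The objective in (4) is non-differentiable (at points where some w_j = 0), so the training algorithm
must be more complex than that of Ridge.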
LASSO: regularization role
¡ The regularization types lead to different domains for w.
¡ LASSO often produces sparse solutions, i.e., many components
of w are zero.
¨ Shrinkage and selection at the same time

Figure by Nicoguaro - Own work, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=58258966
OLS, Ridge, and LASSO
¡ The training set D contains 67 observations on prostate cancer,
each represented by 8 attributes. OLS, Ridge, and LASSO
were trained from D, and then used to predict 30 new observations.

w              Ordinary Least Squares   Ridge     LASSO
0 (intercept)  2.465                    2.452     2.468
lcavol         0.680                    0.420     0.533
lweight        0.263                    0.238     0.169
age            −0.141                   −0.046    0
lbph           0.210                    0.162     0.002
svi            0.305                    0.227     0.094
lcp            −0.288                   0.000     0
gleason        −0.021                   0.040     0
pgg45          0.267                    0.133     0
Test RSS       0.521                    0.492     0.479

¨ Some LASSO weights are 0 → some attributes may not be important.
References
¡ Trevor Hastie, Robert Tibshirani, Jerome Friedman. The Elements of Statistical Learning. Springer, 2009.
¡ Robert Tibshirani. "Regression Shrinkage and Selection via the Lasso". Journal of the Royal Statistical Society, Series B (Methodological), 58(1): 267–288, 1996.
Exercises
¡ Derive the solutions of (1) and (2) in detail.
¡ Derive the solution of (2) when w0 is removed from the penalty
term.
