Kayatu
1 Least Squares
We observe data:
\[
T = (x_1, y_1), \ldots, (x_n, y_n)
\]
drawn from some distribution. Our goal may be to predict $Y$ given some $X$. If $Y$ is real, we may wish to learn the conditional expectation $E[Y|X]$.
Typically, in supervised learning, we are interested in some notion of our prediction loss. For example, in regression, the average squared error for a function $f$ is:
\[
L(f) = E\big[(f(X) - Y)^2\big]
\]
where the expectation is with respect to a random $X, Y$ pair. (Note: sometimes the average error in machine learning is referred to as the risk.)
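For concreteness, the following sketch estimates this average squared error by Monte Carlo for a toy setup; the data distribution, the predictor $f$, and the sample size are all made up for illustration.

import numpy as np

rng = np.random.default_rng(0)

# Toy data distribution (an assumption for illustration): Y = 2X + Gaussian noise.
def sample_xy(n):
    X = rng.uniform(-1.0, 1.0, size=n)
    Y = 2.0 * X + rng.normal(scale=0.5, size=n)
    return X, Y

# A candidate predictor f (also made up for the example).
def f(x):
    return 1.5 * x

# Monte Carlo estimate of L(f) = E[(f(X) - Y)^2] using a large fresh sample.
X, Y = sample_xy(100_000)
L_hat = np.mean((f(X) - Y) ** 2)
print(f"estimated squared-error loss L(f) ~ {L_hat:.3f}")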
Note that minimizing the squared loss function also corresponds to doing maximum likelihood estimation under the model $\Pr(Y|X, f) = \mathcal{N}(f(X), \sigma^2)$. To see this, observe that
\[
-\log \Pr(Y|X, f) = \log \sqrt{2\pi\sigma^2} + \frac{(f(X) - Y)^2}{2\sigma^2}
\]
which is the square loss (up to a linear transformation).
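To see the same point numerically, here is a minimal sketch (with a made-up one-parameter linear model and noise level) checking that the average negative Gaussian log-likelihood is an affine transformation of the average squared loss, so both are minimized at the same parameter.

import numpy as np

rng = np.random.default_rng(1)
sigma = 0.5

# Made-up data from a one-parameter linear model f_w(x) = w * x.
x = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 * x + rng.normal(scale=sigma, size=200)

ws = np.linspace(0.0, 4.0, 401)  # candidate parameters

# Average squared loss and average negative Gaussian log-likelihood for each w.
sq_loss = np.array([np.mean((w * x - y) ** 2) for w in ws])
neg_ll = np.log(np.sqrt(2 * np.pi * sigma**2)) + sq_loss / (2 * sigma**2)

# Both objectives are minimized at the same w (affine maps preserve the argmin).
assert np.argmin(sq_loss) == np.argmin(neg_ll)
print("argmin of squared loss:", ws[np.argmin(sq_loss)])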
Our goal is to use our training set $T$ to estimate a function $\hat{f}$ which has low error. Also, note that the lowest possible squared error is achieved by
\[
f_*(X) = E[Y|X]
\]
which is the conditional expectation.
A learning algorithm (or a decision rule) $\delta$ is a mapping from $T$ to some hypothesis space. In the case of regression, it is a mapping from $T$ to a function $f$. The notion of risk in statistics measures the quality of this procedure, on average.
Let $f^*$ be the minimizer of $L$ in some set $\mathcal{F}$, e.g. $f^* \in \arg\min_{f \in \mathcal{F}} L(f)$. The quantity
\[
L(f) - L(f^*)
\]
is a measure of the sub-optimality of $f$.
The risk is some measure of the (average) performance of a decision rule, where, importantly, an expectation is taken over the training set $T$. One natural definition of the risk function is:
\[
R(\delta) = E_T\big[L(\delta(T)) - L(f^*)\big]
\]
Note that the expectation is over the training set. Other definitions may also be appropriate (though, technically, the risk should always refer to the performance of the decision rule $\delta$).
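As a rough sketch of how one might estimate such a risk by simulation, the code below draws many independent training sets from a made-up linear-Gaussian model and averages the excess loss of the least squares decision rule; all distributional choices here are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(2)
n, d, sigma = 20, 3, 0.5
beta_true = np.array([1.0, -2.0, 0.5])   # made-up "true" linear predictor

def sample_training_set():
    X = rng.normal(size=(n, d))
    Y = X @ beta_true + rng.normal(scale=sigma, size=n)
    return X, Y

def delta(X, Y):
    # The decision rule: map a training set to a least squares fit.
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return beta_hat

# For this model, with standard normal covariates,
# L(beta) - L(beta_true) = E[(X . (beta - beta_true))^2] = ||beta - beta_true||^2.
excess = []
for _ in range(2000):
    X, Y = sample_training_set()
    b = delta(X, Y)
    excess.append(np.sum((b - beta_true) ** 2))
print("estimated risk (expected excess loss):", np.mean(excess))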
2 Linear Regression
Suppose that $X \in \mathbb{R}^d$. Our prediction loss on our training set for a linear predictor is:
\[
\frac{1}{n} \sum_{i=1}^{n} (X_i \cdot w - Y_i)^2 = \frac{1}{n} \|\mathbf{X} w - \mathbf{Y}\|^2
\]
where $\mathbf{X}$ ($X$ in boldface) is defined to be the $n \times d$ matrix whose rows are the $X_i$, and $\mathbf{Y}$ ($Y$ in boldface) is the vector $[Y_1, Y_2, \ldots, Y_n]^\top$.
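A minimal numpy check of this identity, with made-up dimensions and data:

import numpy as np

rng = np.random.default_rng(3)
n, d = 50, 4
X = rng.normal(size=(n, d))   # rows are the X_i
Y = rng.normal(size=n)
w = rng.normal(size=d)

# Average squared error written as a sum over examples ...
loss_sum = np.mean([(X[i] @ w - Y[i]) ** 2 for i in range(n)])
# ... and written in matrix form as (1/n) ||Xw - Y||^2.
loss_matrix = np.linalg.norm(X @ w - Y) ** 2 / n

assert np.allclose(loss_sum, loss_matrix)
print(loss_sum, loss_matrix)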
The least squares estimator using an outcome $\mathbf{Y}$ is just:
\[
\hat{\beta} = \arg\min_{w} \frac{1}{n} \|\mathbf{Y} - \mathbf{X} w\|^2
\]
The first derivative condition, often referred to as the normal equations, is that:
\[
\mathbf{X}^\top (\mathbf{Y} - \mathbf{X} \hat{\beta}) = 0
\]
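A quick numerical sanity check of the normal equations, using numpy's least squares solver on made-up data:

import numpy as np

rng = np.random.default_rng(4)
n, d = 100, 5
X = rng.normal(size=(n, d))
Y = X @ rng.normal(size=d) + rng.normal(scale=0.1, size=n)

# Least squares fit; lstsq minimizes ||Y - X w||^2.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# The first-order condition: the residual is orthogonal to the columns of X.
grad_term = X.T @ (Y - X @ beta_hat)
assert np.allclose(grad_term, 0.0, atol=1e-8)
print("X^T (Y - X beta_hat) =", grad_term)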
Theorem 3.1. (SVD) Let $\mathbf{X} \in \mathbb{R}^{n \times d}$. There exist orthogonal matrices $U \in \mathbb{R}^{n \times n}$ and $V \in \mathbb{R}^{d \times d}$ (e.g. matrices with orthonormal rows and columns, so that $U U^\top = I_n$ and $V V^\top = I_d$, where $I_k$ is the $k \times k$ identity matrix) such that:
\[
\mathbf{X} = \sum_i \lambda_i u_i v_i^\top = U \, \mathrm{diag}(\lambda_1, \ldots, \lambda_{\min\{n,d\}}) \, V^\top
\]
where $\mathrm{diag}(\cdot)$ is a diagonal matrix in $\mathbb{R}^{n \times d}$, the $u_i$ and $v_i$ are the columns of $U$ and $V$, and the $\lambda_i$'s are referred to as the singular values.
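A small numpy illustration of this decomposition; np.linalg.svd with full_matrices=True (the default) returns the square $U$ and $V$ of the theorem, with the dimensions below made up for the example.

import numpy as np

rng = np.random.default_rng(5)
n, d = 6, 4
X = rng.normal(size=(n, d))

# Full SVD: U is n x n, Vt is d x d, s holds the min(n, d) singular values.
U, s, Vt = np.linalg.svd(X, full_matrices=True)

# Rebuild the n x d "diagonal" matrix diag(lambda_1, ..., lambda_min{n,d}).
D = np.zeros((n, d))
D[:min(n, d), :min(n, d)] = np.diag(s)

# Check X = U diag(s) V^T and the orthogonality of U and V.
assert np.allclose(X, U @ D @ Vt)
assert np.allclose(U @ U.T, np.eye(n))
assert np.allclose(Vt @ Vt.T, np.eye(d))
print("singular values:", s)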
First, consider the noiseless case, where we seek a $\beta$ that solves the linear system:
\[
\mathbf{X} \beta = \mathbf{Y}
\]
If this system has a unique solution, it is $\beta = \mathbf{X}^{-1} \mathbf{Y}$, where $\mathbf{X}^{-1}$ is the inverse of $\mathbf{X}$ (it exists and is unique since we have assumed the linear system has a unique solution).
In regression, there is typically noise, and we find a $\beta$ which minimizes:
\[
\|\mathbf{X} \beta - \mathbf{Y}\|^2
\]
Clearly, if there is no noise, then a solution is given by $\beta = \mathbf{X}^{-1} \mathbf{Y}$, assuming no degeneracies. In general though, the minimizer of this error, referred to as the least squares estimator, is:
\[
\beta = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{Y}. \tag{1}
\]
Furthermore, Equation 1 above only holds if $\mathbf{X}$ is of rank $d$ (else $\mathbf{X}^\top \mathbf{X}$ would not be invertible).
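A numerical check of Equation 1 on made-up full-rank data, compared against numpy's built-in least squares solver:

import numpy as np

rng = np.random.default_rng(6)
n, d = 200, 5
X = rng.normal(size=(n, d))                  # full column rank with probability 1
Y = X @ rng.normal(size=d) + rng.normal(scale=0.3, size=n)

# Closed form of Equation 1: beta = (X^T X)^{-1} X^T Y (valid since rank(X) = d).
beta_closed = np.linalg.solve(X.T @ X, X.T @ Y)

# Reference solution from numpy's least squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

assert np.allclose(beta_closed, beta_lstsq)
print(beta_closed)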
Now let us define the Moore-Penrose pseudo-inverse. First, let us define the "thin" SVD.
Definition 3.2. We say $\mathbf{X} = U D V^\top$ is the "thin" SVD of $\mathbf{X} \in \mathbb{R}^{n \times p}$ if: $U \in \mathbb{R}^{n \times r}$ and $V \in \mathbb{R}^{p \times r}$ have orthonormal columns (e.g. where $r$ is the number of columns) and $D \in \mathbb{R}^{r \times r}$ is diagonal, with all its diagonal entries being non-zero. Here, $r$ is the rank of $\mathbf{X}$.
Definition 3.3. (Moore-Penrose pseudo-inverse) If $\mathbf{X} = U D V^\top$ is the thin SVD of $\mathbf{X}$, then the pseudo-inverse of $\mathbf{X}$ is
\[
\mathbf{X}^{+} = V D^{-1} U^\top .
\]
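To make the definition concrete, the sketch below builds $\mathbf{X}^{+}$ from the thin SVD and compares it with numpy's pinv; the example matrix is made up and deliberately rank-deficient.

import numpy as np

rng = np.random.default_rng(7)
n, p, r = 8, 5, 3
# A made-up rank-r matrix: product of an n x r and an r x p factor.
X = rng.normal(size=(n, r)) @ rng.normal(size=(r, p))

# Thin SVD: keep only the r strictly positive singular values.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U, s, Vt = U[:, :r], s[:r], Vt[:r, :]

# Moore-Penrose pseudo-inverse via the thin SVD: X^+ = V D^{-1} U^T.
X_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

# Agrees with numpy's pinv (rcond chosen to discard the numerically-zero singular values),
# and satisfies the defining identity X X^+ X = X.
assert np.allclose(X_pinv, np.linalg.pinv(X, rcond=1e-10))
assert np.allclose(X @ X_pinv @ X, X)
print("rank of X:", np.linalg.matrix_rank(X))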
Using this terminology, we can write the least squares estimator in a more interpretable way:
Lemma 3.4. The least squares estimator is:
\[
\beta = \mathbf{X}^{+} \mathbf{Y}
\]
(Note that the above is always a minimizer, while the solution provided in Equation 1 only holds if $\mathbf{X}^\top \mathbf{X}$ is invertible, in which case the minimizer is unique.)
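As an illustration of the lemma, the sketch below uses a made-up rank-deficient design, so that $(\mathbf{X}^\top \mathbf{X})^{-1}$ does not exist, and checks that $\beta = \mathbf{X}^{+} \mathbf{Y}$ still attains the minimum squared error (matching numpy's least squares solver).

import numpy as np

rng = np.random.default_rng(8)
n, d = 30, 6
# Made-up rank-deficient design: the last column duplicates the first,
# so X^T X is singular and Equation 1 does not apply.
X = rng.normal(size=(n, d))
X[:, -1] = X[:, 0]
Y = rng.normal(size=n)

beta_pinv = np.linalg.pinv(X) @ Y                      # beta = X^+ Y
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)     # numpy's minimizer

# Both attain the same (minimal) value of ||X beta - Y||^2.
loss = lambda b: np.linalg.norm(X @ b - Y) ** 2
assert np.isclose(loss(beta_pinv), loss(beta_lstsq))
print(loss(beta_pinv), loss(beta_lstsq))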