
DS303: Introduction to Machine Learning

Solution to Quiz 1
Prepared by - Sai Vivek and Siddhant

Solution to Question 1: Consider a dataset S = {(x^(1), y^(1)), (x^(2), y^(2)), . . . , (x^(m), y^(m))} where x^(i) ∈ R^d and y^(i) ∈ R for all i = 1, 2, . . . , m. Let r := (r_1, r_2, . . . , r_m) be any real-valued vector. For any Θ ∈ R^d and Θ_0 ∈ R, consider the following weighted mean squared error

L(S, r, Θ) = Σ_{i=1}^{m} r_i (y^(i) − x^(i)T Θ − Θ_0)^2

1.1 [2 points] Find the value of (Θ_0, Θ) ∈ R^{d+1} that minimizes it.

L(S, r, Θ) = Σ_{i=1}^{m} r_i (y^(i) − x^(i)T Θ − Θ_0)^2

Let Θ = [θ_0, θ_1, . . . , θ_d]^T with θ_0 = Θ_0, and augment each input as x^(i)T = [1, x_1^(i), . . . , x_d^(i)]. Assuming each r_i ≥ 0 so that √r_i is real,

L(S, r, Θ) = Σ_{i=1}^{m} r_i (y^(i) − x^(i)T Θ)^2
           = Σ_{i=1}^{m} (√r_i y^(i) − √r_i x^(i)T Θ)^2
Now define

Y = [y^(1), y^(2), . . . , y^(m)]^T,
R = diag(√r_1, √r_2, . . . , √r_m),
X = [x^(1)T; x^(2)T; . . . ; x^(m)T], the m × (d + 1) matrix whose i-th row is x^(i)T.

Loss function in matrix notation:

L(S, r, Θ) = ‖RY − RXΘ‖^2
           = (RY − RXΘ)^T (RY − RXΘ)
           = (Y^T R^T − Θ^T X^T R^T)(RY − RXΘ)
           = Y^T R^T R Y − Y^T R^T R X Θ − Θ^T X^T R^T R Y + Θ^T X^T R^T R X Θ

Set the gradient of the loss function with respect to Θ to zero to find the optimal value:

∇_Θ L = −X^T R^T R Y − X^T R^T R Y + 2 X^T R^T R X Θ = 0
X^T R^T R X Θ = X^T R^T R Y
Θ̂ = (X^T R^T R X)^{-1} X^T R^T R Y

The first entry of Θ̂ is the optimal Θ_0 and the remaining d entries are the optimal Θ.
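As a quick sanity check, here is a minimal numerical sketch of the closed-form solution Θ̂ = (X^T R^T R X)^{-1} X^T R^T R Y derived above (Python/NumPy; the data, true parameters, and weights below are synthetic and purely illustrative):

import numpy as np

# Minimal sketch of the closed-form weighted least-squares solution derived above.
# The dataset and the weights r are synthetic; only the formula itself comes from the text.
rng = np.random.default_rng(0)
m, d = 200, 3
X_raw = rng.normal(size=(m, d))
theta_true = np.array([1.5, -2.0, 0.5])
theta0_true = 0.7
y = theta0_true + X_raw @ theta_true + rng.normal(size=m)

# Augment with a column of ones so that Theta_0 is absorbed into Theta, as in the solution.
X = np.hstack([np.ones((m, 1)), X_raw])
r = rng.uniform(0.5, 2.0, size=m)       # arbitrary positive weights r_i
R = np.diag(np.sqrt(r))                 # R^T R = diag(r)

# Theta_hat = (X^T R^T R X)^{-1} X^T R^T R Y
theta_hat = np.linalg.solve(X.T @ R.T @ R @ X, X.T @ R.T @ R @ y)
print("Theta_0:", theta_hat[0], "Theta:", theta_hat[1:])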

1.2 [2 points] Briefly explain how the vector r affects the optimal values.

The vector r lets the model handle situations where the variance of the errors is not uniform across observations. Observations whose errors have high variance receive small weights and influence the fit less, while observations whose errors have low variance receive large weights and influence the fit more.

1.3 [1 point] Is there any particular way you like to choose r?

One choice of the vector r is r_i = 1/σ_i^2, where σ_i^2 is the variance of the error for the i-th observation x^(i).
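A small sketch of this choice (Python/NumPy; the noise model and numbers below are hypothetical): when the noise level grows with x, inverse-variance weights downweight the noisiest observations, and the weighted estimate is typically closer to the true coefficients than the unweighted one.

import numpy as np

# Inverse-variance weighting r_i = 1 / sigma_i^2 under heteroscedastic noise.
# The noise model sigma_i = 0.2 + 0.4 * x_i is an assumption made for illustration.
rng = np.random.default_rng(1)
m = 500
x = rng.uniform(0, 10, size=m)
sigma = 0.2 + 0.4 * x                       # noise level grows with x
y = 3.0 + 2.0 * x + sigma * rng.normal(size=m)

X = np.column_stack([np.ones(m), x])
W = np.diag(1.0 / sigma**2)                 # W = R^T R with r_i = 1 / sigma_i^2

theta_ols = np.linalg.solve(X.T @ X, X.T @ y)
theta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print("OLS:", theta_ols)                    # both estimate [3.0, 2.0];
print("WLS:", theta_wls)                    # the weighted estimate has lower variance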

Solution to Question 2: Consider a dataset S = {(x^(1), y^(1)), (x^(2), y^(2)), . . . , (x^(m), y^(m))} where x^(i) ∈ R^d and y^(i) ∈ R for all i = 1, 2, . . . , m. The samples are drawn i.i.d. Let ȳ = (1/m) Σ_{i=1}^{m} y^(i) be the average of the labels. Consider a linear model with parameters Θ ∈ R^d and Θ_0 ∈ R, giving predictions ŷ^(i) = Θ_0 + x^(i)T Θ for all i. Does the following relationship hold? Prove or disprove.

Σ_{i=1}^{m} (y^(i) − ȳ)^2 = Σ_{i=1}^{m} (ŷ^(i) − ȳ)^2 + Σ_{i=1}^{m} (y^(i) − ŷ^(i))^2

Solution:
For multivariate linear regression, suppose we have a dataset S = {(x^(1), y^(1)), (x^(2), y^(2)), . . . , (x^(m), y^(m))} where x^(i) ∈ R^d and y^(i) ∈ R for all i = 1, 2, . . . , m. Let x_ij, i ∈ {1, . . . , m}, j ∈ {1, . . . , d}, be the j-th feature of the i-th sample.
The model for y_i is

y_i = x_i1 · β_1 + x_i2 · β_2 + · · · + x_id · β_d + 1 · β_0 + ε_i,   i ∈ {1, . . . , m},

where ε_i is the i-th error term, and ŷ_i denotes the fitted value obtained from the estimated coefficients.
Putting everything in matrix form (vectors/matrices written in bold), let

Y = [y_1, y_2, . . . , y_m]^T,   β = [β_0, β_1, . . . , β_d]^T,   ε = [ε_1, ε_2, . . . , ε_m]^T,

and let X be the m × (d + 1) matrix whose i-th row is [1, x_i1, . . . , x_id]. Then the model in matrix form is

Y = Xβ + ε
For ordinary least squares estimation, we want to minimize the sum of squared errors (SSE); that is, the objective function is ε^T ε. Substituting ε = Y − Xβ into the SSE formula, we get the optimization problem

min_β ε^T ε = min_β (Y − Xβ)^T (Y − Xβ)

Okay, let us recall the first-order partial derivatives in matrix form; you can expand and verify the rules below in their scalar form. Assume W is symmetric and differentiate with respect to the vector X:
Rule #1: (β^T X)' = β and (W X)' = W
Rule #2: (X^T W X)' = 2W X
In the special case of Rule #2 when W = I, (X^T X)' = 2X
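Rule #2 can be checked numerically with central finite differences; a brief sketch (Python/NumPy, with a random symmetric W and a random point X, both chosen only for illustration):

import numpy as np

# Finite-difference check of Rule #2: the gradient of f(X) = X^T W X with respect
# to the vector X equals 2 W X when W is symmetric.
rng = np.random.default_rng(6)
n = 5
A = rng.normal(size=(n, n))
W = (A + A.T) / 2                       # symmetrize W
X = rng.normal(size=n)

f = lambda v: v @ W @ v
h = 1e-6
numeric_grad = np.array([
    (f(X + h * np.eye(n)[k]) - f(X - h * np.eye(n)[k])) / (2 * h) for k in range(n)
])
print(np.allclose(numeric_grad, 2 * W @ X, atol=1e-5))   # True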
Therefore, for this continuous function SSE(β) = (Y − Xβ)^T (Y − Xβ), the first-order necessary optimality condition is (ε^T ε)' = 0; that is, by the chain rule,

−2X^T (Y − Xβ) = 0

Hence, the optimal β̂ satisfies X^T Y = X^T X β̂, thus we get

β̂ = (X^T X)^{-1} X^T Y

and

Ŷ = X β̂,

where (X^T X)^{-1} X^T is called the left pseudo-inverse of X.
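The normal-equation formula and the left pseudo-inverse give the same β̂; here is a minimal numerical sketch (Python/NumPy) with synthetic data, where the shapes and noise level are arbitrary assumptions:

import numpy as np

# beta_hat from the normal equations versus the left pseudo-inverse of X.
rng = np.random.default_rng(2)
m, d = 100, 4
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, d))])   # intercept column + features
beta_true = rng.normal(size=d + 1)
Y = X @ beta_true + 0.1 * rng.normal(size=m)

beta_normal = np.linalg.solve(X.T @ X, X.T @ Y)   # solves X^T X beta = X^T Y
beta_pinv = np.linalg.pinv(X) @ Y                 # (X^T X)^{-1} X^T applied to Y
print(np.allclose(beta_normal, beta_pinv))        # True
Y_hat = X @ beta_normal                           # fitted values Y_hat = X beta_hat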
Note that for simple linear regression (only one explanatory variable), the above reduces to

β_1 = cov(x, y) / var(x)

To see this, we write out the variables in their explicit form:

X_{m×2} = [ [1, x_1], [1, x_2], . . . , [1, x_m] ] (written row by row),   Y = [y_1, y_2, . . . , y_m]^T

We get

β̂_{2×1} = [β_0, β_1]^T = (X^T X)^{-1} X^T Y
        = [ [m, Σ_i x_i], [Σ_i x_i, Σ_i x_i^2] ]^{-1} [Σ_i y_i, Σ_i x_i y_i]^T

Bear in mind that we have

[ [a, b], [c, d] ]^{-1} = (1/(ad − bc)) [ [d, −b], [−c, a] ]

We can get

β_1 = (m Σ_i x_i y_i − Σ_i x_i · Σ_i y_i) / (m Σ_i x_i^2 − Σ_i x_i · Σ_i x_i) = cov(x, y) / var(x)
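A quick numerical check of this reduction (Python/NumPy, synthetic data): the slope obtained from the normal equations matches cov(x, y)/var(x) when both are computed with the same normalization.

import numpy as np

# Check that the normal-equation slope equals cov(x, y) / var(x) for simple regression.
rng = np.random.default_rng(3)
m = 50
x = rng.normal(size=m)
y = 1.0 + 2.5 * x + 0.3 * rng.normal(size=m)

X = np.column_stack([np.ones(m), x])
beta0, beta1 = np.linalg.solve(X.T @ X, X.T @ y)

slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)   # both use the 1/m normalization
print(np.isclose(beta1, slope))                     # True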
We now focus on proving

SST = SSM + SSE

Let Ȳ := ȳ · [1, 1, . . . , 1]^T denote the m-vector with every entry equal to ȳ. The total sum of squares (SST) is given by

Σ_{i=1}^{m} (y^(i) − ȳ)^2 = (Y − Ȳ)^T (Y − Ȳ)
                          = Y^T Y + Ȳ^T Ȳ − 2 Ȳ^T Y
The sum of squared errors (SSE), a.k.a. sum of squared residuals (SSR), is given by

Σ_{i=1}^{m} (y^(i) − ŷ^(i))^2 = (Y − Ŷ)^T (Y − Ŷ)
                              = (Y − Xβ̂)^T (Y − Xβ̂)
                              = Y^T (Y − Xβ̂) − β̂^T X^T (Y − Xβ̂)
                              = Y^T Y − Y^T Xβ̂

where the last step uses the normal equations, X^T (Y − Xβ̂) = 0.

The SSM/regression sum of squares (RSS), a.k.a. explained sum of squares (ESS), is given by

Σ_{i=1}^{m} (ŷ^(i) − ȳ)^2 = (Ŷ − Ȳ)^T (Ŷ − Ȳ)
                           = (Xβ̂ − Ȳ)^T (Xβ̂ − Ȳ)
                           = β̂^T X^T X β̂ + Ȳ^T Ȳ − 2 β̂^T X^T Ȳ

Therefore,

SST − SSM − SSE
= (Y^T Y + Ȳ^T Ȳ − 2 Ȳ^T Y) − (β̂^T X^T X β̂ + Ȳ^T Ȳ − 2 β̂^T X^T Ȳ) − (Y^T Y − Y^T Xβ̂)
= 2 β̂^T X^T Ȳ − 2 Ȳ^T Y + Y^T X β̂ − β̂^T X^T X β̂

where β̂ = (X^T X)^{-1} X^T Y. We see that

Y^T X β̂ − β̂^T X^T X β̂ = Y^T X β̂ − β̂^T X^T Y      (using X^T X β̂ = X^T Y)
                      = Y^T X β̂ − Y^T X β̂ = 0      (a scalar equals its own transpose)

so the last two terms cancel.

So, it suffices to prove that

β̂^T X^T Ȳ − Ȳ^T Y = 0

to get SST = SSM + SSE.
We may ask: is this true in general? No! But we do have assumptions when we conduct Ordinary Least Squares (OLS) regression.
Remember the moment restrictions for a simple linear OLS regression:

E[(y − β_0 − β_1 x)] = 0   and   E[x (y − β_0 − β_1 x)] = 0

The expected value of the error term should be zero, and the error term should be uncorrelated with the explanatory variables. The in-sample counterparts of these conditions are the normal equations X^T ε̂ = 0 for the residual vector ε̂ = Y − Xβ̂; in particular, since the first column of X is the vector of ones e_{m×1} = [1, 1, . . . , 1]^T, the residuals sum to zero, e^T ε̂ = 0. Hence

β̂^T X^T Ȳ − Ȳ^T Y = −(Y − Xβ̂)^T Ȳ = −ε̂^T Ȳ = −ȳ ε̂^T e = 0
Therefore

SST = SSM + SSE

If the assumption that the residuals sum to zero is violated (for example, when the regression is fit without an intercept term), then in general

SST ≠ SSM + SSE
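A short numerical sketch of this conclusion (Python/NumPy; the data and shapes are synthetic assumptions): with an intercept column the residuals sum to zero and the decomposition holds to machine precision, while dropping the intercept generally breaks it.

import numpy as np

# Verify SST = SSM + SSE with an intercept, and show it can fail without one.
rng = np.random.default_rng(4)
m = 80
x = rng.normal(size=(m, 2))
y = 1.0 + x @ np.array([2.0, -1.0]) + 0.5 * rng.normal(size=m)

def decompose(X, y):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    y_hat = X @ beta
    sst = np.sum((y - y.mean())**2)
    ssm = np.sum((y_hat - y.mean())**2)
    sse = np.sum((y - y_hat)**2)
    return sst, ssm, sse

sst, ssm, sse = decompose(np.hstack([np.ones((m, 1)), x]), y)   # with intercept
print(np.isclose(sst, ssm + sse))                               # True

sst, ssm, sse = decompose(x, y)                                 # no intercept column
print(np.isclose(sst, ssm + sse))                               # generally False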

Reference
Larry Li @ http://www.larrylisblog.net/WebContents/SST EQ RSS PLUS SSE.pdf
