MLF Combined
REVISION (WEEK 2)
• Continuity
• Differentiability
• Linear Approximation
• Higher order approximations
• Multivariate Linear Approximation
• Directional Derivative
Continuity
Differentiability
Linear Approximation
y − y₁ = m(x − x₁)
y = y₁ + m(x − x₁)
y = f(a) + f′(a)(x − a)
Higher order approximations

Linear Approximation
L(x) = f(a) + f′(a)(x − a)

Quadratic Approximation
L(x) = f(a) + f′(a)(x − a) + (f″(a)/2)(x − a)²

Higher-order Approximations
L(x) = f(a) + f⁽¹⁾(a)(x − a) + (f⁽²⁾(a)/2)(x − a)² + (f⁽³⁾(a)/(3⋅2))(x − a)³ + (f⁽⁴⁾(a)/(4⋅3⋅2))(x − a)⁴ + ...
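A minimal Python sketch of these approximations, with an illustrative choice f(x) = eˣ around a = 0 (this function is an assumption, not an example from the slides):

import math

# Illustrative function: f(x) = exp(x) expanded around a = 0,
# so f(a) = f'(a) = f''(a) = 1.
def f(x):
    return math.exp(x)

a = 0.0
f_a = f1_a = f2_a = math.exp(a)

def linear(x):
    # L(x) = f(a) + f'(a)(x - a)
    return f_a + f1_a * (x - a)

def quadratic(x):
    # L(x) = f(a) + f'(a)(x - a) + f''(a)/2 * (x - a)^2
    return linear(x) + 0.5 * f2_a * (x - a) ** 2

x = 0.3
print(f(x), linear(x), quadratic(x))   # ~1.3499, 1.3, 1.345: each extra term tightens the fit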
Multivariate linear approximation: Linear approximation of functions involving
multiple variables
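A small sketch of the multivariate linear approximation L(x, y) = f(a, b) + fx(a, b)(x − a) + fy(a, b)(y − b); the function f(x, y) = x²y below is illustrative, not taken from the slides:

# Illustrative function (not from the slides): f(x, y) = x^2 * y near (a, b) = (1, 2).
def f(x, y):
    return x ** 2 * y

a, b = 1.0, 2.0
f_ab = f(a, b)        # f(1, 2) = 2
fx_ab = 2 * a * b     # partial derivative wrt x: 2xy -> 4
fy_ab = a ** 2        # partial derivative wrt y: x^2 -> 1

def L(x, y):
    # L(x, y) = f(a, b) + fx(a, b)(x - a) + fy(a, b)(y - b)
    return f_ab + fx_ab * (x - a) + fy_ab * (y - b)

print(f(1.1, 2.1), L(1.1, 2.1))   # 2.541 vs 2.5: good agreement near (a, b)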
DIRECTIONAL DERIVATIVES
Directional Derivative
• fx(x, y) = ∂f/∂x (x, y) = Rate of change of f as we vary x (keeping y fixed).
• fy(x, y) = ∂f/∂y (x, y) = Rate of change of f as we vary y (keeping x fixed).
• Directional derivative of f(x, y) = Rate of change of f if we allow both x and y to change simultaneously (in some direction u = (u₁, u₂)).

Dᵤ f(x, y) = ∇f ⋅ u
           = [∂f/∂x, ∂f/∂y] ⋅ [u₁, u₂]
           = (∂f/∂x) u₁ + (∂f/∂y) u₂
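A short NumPy sketch of the formula above, again using the illustrative f(x, y) = x²y (an assumption, not a slide example); the direction u is normalized to a unit vector before the dot product:

import numpy as np

# Illustrative function: f(x, y) = x^2 * y, so grad f = [2xy, x^2].
def grad_f(x, y):
    return np.array([2 * x * y, x ** 2])

u = np.array([3.0, 4.0])
u = u / np.linalg.norm(u)        # direction must be a unit vector: [0.6, 0.8]

x0, y0 = 1.0, 2.0
D_u = grad_f(x0, y0) @ u         # grad f . u = fx*u1 + fy*u2
print(D_u)                       # 4*0.6 + 1*0.8 = 3.2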
WEEK 3: REVISION
1. Four Fundamental Subspaces
2. Orthogonal Vectors and Subspaces
3. Projections
4. Least Squares and Projections onto a Subspace
5. Example of Least Squares
Suppose A is an m × n matrix.
For a given matrix A, the nullspace is N(A) = {x : Ax = 0}.
▪ Find the condition on (b₁, b₂, b₃) for Ax = b to be solvable, for a given A.
rank(A) + nullity(A) = n
dim(C(Aᵀ)) + dim(N(Aᵀ)) = m
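A quick numerical check of these two identities on a hypothetical 3 × 4 matrix (not a slide example), using scipy.linalg.null_space for the nullspace dimensions:

import numpy as np
from scipy.linalg import null_space

# Hypothetical 3x4 matrix of rank 2 (third row = first row + second row).
A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 0.],
              [1., 3., 1., 1.]])
m, n = A.shape
r = np.linalg.matrix_rank(A)

print(r + null_space(A).shape[1] == n)      # rank(A) + nullity(A) = n
print(r + null_space(A.T).shape[1] == m)    # dim C(A^T) = rank(A), plus dim N(A^T), equals m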
▪ The projection p of b onto a line through a: p = (aᵀb / aᵀa) a, with projection matrix P = aaᵀ / aᵀa.
▪ Projection matrix of a = [−1, 3, −2, 1]ᵀ: P = aaᵀ / aᵀa = aaᵀ / 15.
Projection of b = [1, 0, 1, 1]ᵀ onto a: p = (aᵀb / aᵀa) a = (−2/15) a.
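A NumPy sketch of the two computations above, using the a and b from these slides:

import numpy as np

a = np.array([-1., 3., -2., 1.])
b = np.array([1., 0., 1., 1.])

P = np.outer(a, a) / (a @ a)   # projection matrix onto the line through a; a.a = 15
p = P @ b                      # same as (a.b / a.a) * a
print(a @ b)                   # -2
print(p)                       # (-2/15) * a = [ 0.1333 -0.4     0.2667 -0.1333]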
▪ It often happens that Ax = b has no solution.
▪ The usual reason is: too many equations.
▪ The matrix A has more rows than columns.
▪ There are more equations than unknowns (m is greater than n).
▪ Then the columns of A span only a small part of m-dimensional space.
▪ We cannot always get the error e = b − Ax down to zero. When e is zero, x is an exact solution to Ax = b.
▪ When the length of e is as small as possible, x̂ is a least squares solution.
AᵀA x̂ = Aᵀb   (the normal equations)
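A small sketch of solving the normal equations for a least-squares line fit; the three data points below are made up for illustration, so the fitted line is not the y = 0.7x + 0.7 line from the next slide:

import numpy as np

# Made-up points (for illustration only): (0, 1), (1, 2), (2, 2).
x = np.array([0., 1., 2.])
y = np.array([1., 2., 2.])

# Fit y = c + d*x; the columns of A are [1, x], so Ax = y has 3 equations, 2 unknowns.
A = np.column_stack([np.ones_like(x), x])

x_hat = np.linalg.solve(A.T @ A, A.T @ y)     # normal equations: (A^T A) x_hat = A^T y
print(x_hat)                                   # [1.1667, 0.5] -> best fit y = 1.1667 + 0.5x
print(np.linalg.lstsq(A, y, rcond=None)[0])    # lstsq gives the same least squares solution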
Best fit line: y = 0.7x + 0.7
Machine Learning Foundations
Week 4
Machine Learning Foundations
Week-5 Revision
Arun Prakash A
Complex vectors
x ∈ Cⁿ, y ∈ Cⁿ

x = [3 − 2i, −2 + i, −4 − 3i]ᵀ ∈ C³     y = [−2 + 4i, 5 − i, −2i]ᵀ ∈ C³

Operations:
Addition: z = x + y ∈ Cⁿ; here z = [1 + 2i, 3, −4 − 5i]ᵀ ∈ C³
Conjugate: z̄ = [1 − 2i, 3, −4 + 5i]ᵀ
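A minimal NumPy check of the addition and conjugation shown above (Python writes the imaginary unit as j):

import numpy as np

x = np.array([3 - 2j, -2 + 1j, -4 - 3j])
y = np.array([-2 + 4j, 5 - 1j, -2j])

z = x + y           # addition: [1+2j, 3+0j, -4-5j]
print(z)
print(np.conj(z))   # conjugate: [1-2j, 3-0j, -4+5j]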
Inner Product: x ⋅ y = x̄ᵀ y ∈ C

For x = [3 − 2i, −2 + i, −4 − 3i]ᵀ and y = [−2 + 4i, 5 − i, −2i]ᵀ:
x̄ᵀ = [3 + 2i, −2 − i, −4 + 3i], so x ⋅ y = x̄ᵀ y = −19 + 13i

Properties
1. x ⋅ y = conj(y ⋅ x)
2. (x + y) ⋅ z = x ⋅ z + y ⋅ z
3. x ⋅ (cy) = c (x ⋅ y)
4. (cx) ⋅ y = conj(c) (x ⋅ y)
5. x ⋅ x = ∥x∥²
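The same vectors in NumPy; np.vdot conjugates its first argument, which matches the inner product x ⋅ y = x̄ᵀ y used above:

import numpy as np

x = np.array([3 - 2j, -2 + 1j, -4 - 3j])
y = np.array([-2 + 4j, 5 - 1j, -2j])

print(np.vdot(x, y))        # conj(x)^T y = (-19+13j), matching the slide
print(np.vdot(x, x).real)   # x . x = ||x||^2 = 13 + 5 + 25 = 43, always real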
Complex Matrices
A = [2, 3 − 3i; 3 + 3i, 5]        B = [2i, 3 + 3i; 3 + 3i, 5i]

Hermitian if: A* = Āᵀ = A   (with this notation, the inner product is x ⋅ y = x* y ∈ C)

A = [2, 3 − 3i; 3 + 3i, 5] is Hermitian.
B = [2i, 3 + 3i; 3 + 3i, 5i] is not Hermitian.
C = [2i, 3 + 3i; 3 − 3i, 5i] is not Hermitian.
Properties of Hermitian Matrices
1. All eigenvalues λᵢ are real.
2. Eigenvectors are orthogonal if λᵢ ≠ λⱼ for i ≠ j.
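A NumPy check of both properties for the Hermitian matrix A above (np.linalg.eigh is the eigensolver for Hermitian matrices):

import numpy as np

A = np.array([[2, 3 - 3j],
              [3 + 3j, 5]])

print(np.allclose(A, A.conj().T))              # A equals its conjugate transpose: Hermitian

w, V = np.linalg.eigh(A)
print(w)                                       # eigenvalues are real: [-1.  8.]
print(np.allclose(V.conj().T @ V, np.eye(2)))  # eigenvectors are orthonormal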
Diagonalization of Hermitian Matrices
Schur's Theorem: any n × n matrix A is similar to an upper triangular matrix T, that is, A = U T U*.

Example:
A = [5, 7; −2, −4],   λ₁ = −2, λ₂ = 3,   x₁ = [1, −1]ᵀ,   x₂ = [7, −2]ᵀ

Find a vector orthogonal to x₁ = [1, −1]ᵀ (you could have picked x₂ as well):
q₁ = (1/√2) [1, −1]ᵀ (normalized x₁); a unit vector orthogonal to it is (1/√2) [1, 1]ᵀ, giving
U = (1/√2) [1, 1; −1, 1]
This is my claim!
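A NumPy check of the claim for this example: with the U built above, U*AU comes out upper triangular with the eigenvalues on the diagonal:

import numpy as np

A = np.array([[5., 7.],
              [-2., -4.]])
U = np.array([[1., 1.],
              [-1., 1.]]) / np.sqrt(2)     # first column is q1, the normalized x1

T = U.conj().T @ A @ U
print(np.round(T, 6))                      # [[-2.  9.] [ 0.  3.]]: upper triangular
print(np.allclose(A, U @ T @ U.conj().T))  # A = U T U*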
Gram-Schmidt process: used to turn such a set of vectors into an orthonormal basis (the columns of U).

Spectral Theorem
Any Hermitian matrix is similar to a diagonal matrix D, that is, A = U D U*.
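A NumPy sketch of the spectral decomposition for the Hermitian matrix A from the earlier slide:

import numpy as np

A = np.array([[2, 3 - 3j],
              [3 + 3j, 5]])

w, U = np.linalg.eigh(A)                    # U is unitary, w holds the real eigenvalues
D = np.diag(w)
print(np.allclose(A, U @ D @ U.conj().T))   # spectral theorem: A = U D U*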
Singular Value Decomposition (SVD)
Any matrix A can be factorized as A = Q₁ Σ Q₂ᵀ, where the columns of Q₁ are the eigenvectors of AAᵀ and the columns of Q₂ are the eigenvectors of AᵀA.

There is no problem in the computation steps as long as none of the singular values is zero.

If any singular value is zero, we need to bring in the Gram-Schmidt process to complete Q₁ and Q₂ into unitary matrices.
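A NumPy sketch on a hypothetical 2 × 3 matrix (not from the slides), checking that the squared singular values are the eigenvalues of AAᵀ and that A is recovered from Q₁ Σ Q₂ᵀ:

import numpy as np

A = np.array([[3., 2., 2.],
              [2., 3., -2.]])                # hypothetical 2x3 matrix

Q1, s, Q2T = np.linalg.svd(A)                # A = Q1 Sigma Q2^T, singular values s = [5, 3]
print(np.sort(s ** 2))                       # [ 9. 25.]
print(np.sort(np.linalg.eigvalsh(A @ A.T)))  # same numbers: eigenvalues of A A^T

Sigma = np.zeros_like(A)
Sigma[:2, :2] = np.diag(s)
print(np.allclose(A, Q1 @ Sigma @ Q2T))      # reconstruction check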
Add-on
If SVD is used for PCA, the singular values capture the variance of the data along the principal directions: the higher the singular value, the higher the variance. (Watch the image compression tutorial in Week 5 again, keeping this in mind.)
WEEK 9: REVISION (FINAL EXAM)
CONTENTS
1. Properties of Convex Functions
2. Applications of Optimization in Machine Learning
3. Revisiting Constrained Optimization
4. Relation between Primal and Dual Problem, KKT Conditions
5. KKT conditions continued
1. PROPERTIES OF CONVEX FUNCTIONS
Necessary and sufficient conditions for optimality of convex functions
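As a quick reference (a standard result; the slide's exact statement may differ): for a differentiable convex function, stationarity is both necessary and sufficient for global optimality.

\[
  f \text{ convex and differentiable:}\qquad
  \nabla f(x^{*}) = 0 \iff x^{*} \text{ is a global minimizer of } f .
\]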
2. APPLICATIONS OF OPTIMIZATION IN ML
3. CONSTRAINED OPTIMIZATION
Consider the constrained optimization problem as follows:
4. RELATION BETWEEN PRIMAL AND DUAL PROBLEM
5. KARUSH-KUHN-TUCKER CONDITIONS
Consider the optimization problem with multiple equality and inequality constraints as
follows:
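For reference, the generic form of such a problem and the KKT conditions (standard statement; the slide's specific problem and notation may differ):

\[
\begin{aligned}
&\min_{x}\; f(x) \quad \text{s.t.}\quad g_i(x) \le 0,\ i = 1,\dots,m, \qquad h_j(x) = 0,\ j = 1,\dots,p,\\
&\text{KKT conditions at } x^{*} \text{ with multipliers } \mu_i \ge 0 \text{ and } \lambda_j:\\
&\quad \nabla f(x^{*}) + \textstyle\sum_i \mu_i \nabla g_i(x^{*}) + \textstyle\sum_j \lambda_j \nabla h_j(x^{*}) = 0 \quad \text{(stationarity)}\\
&\quad g_i(x^{*}) \le 0,\quad h_j(x^{*}) = 0 \quad \text{(primal feasibility)}\\
&\quad \mu_i \ge 0 \quad \text{(dual feasibility)}\\
&\quad \mu_i\, g_i(x^{*}) = 0 \quad \text{(complementary slackness)}
\end{aligned}
\]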
SOME SOLVED PROBLEMS
Properties of convex functions
https://www.geogebra.org/m/esqcd4he
Given below is a set of data points and their labels.
Stationarity conditions 1