
Vector/Matrix Calculus

In neural networks, we often encounter problems involving the analysis of several variables. Vector/matrix calculus extends the calculus of one variable to that of a vector or a matrix of variables.

Vector Gradient: Let g(w) be a differentiable scalar function of m variables, where

$$\mathbf{w} = [w_1, \ldots, w_m]^T.$$

Then the vector gradient of g(w) w.r.t. w is the m-dimensional vector of partial derivatives of g:

$$\nabla g = \nabla_{\mathbf{w}} g = \frac{\partial g}{\partial \mathbf{w}} = \begin{bmatrix} \dfrac{\partial g}{\partial w_1} \\ \vdots \\ \dfrac{\partial g}{\partial w_m} \end{bmatrix}.$$
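To make the definition concrete, here is a minimal NumPy sketch (a numerical check added for illustration, not part of the original notes; the helper name num_gradient is an arbitrary choice) that approximates the vector gradient by central differences:

```python
import numpy as np

def num_gradient(g, w, h=1e-6):
    """Central-difference approximation of the gradient of a scalar
    function g at the point w."""
    grad = np.zeros_like(w, dtype=float)
    for i in range(w.size):
        e = np.zeros_like(w, dtype=float)
        e[i] = h
        grad[i] = (g(w + e) - g(w - e)) / (2 * h)
    return grad

# Sanity check: g(w) = sum_i w_i^2 has gradient 2w.
w = np.array([1.0, -2.0, 0.5])
print(num_gradient(lambda v: np.sum(v**2), w))  # ~ [ 2. -4.  1.]
```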

Similarly, we can define the second-order gradient, or Hessian matrix. The Hessian matrix is defined as

$$\nabla^2 g = \frac{\partial^2 g}{\partial \mathbf{w}^2} = \begin{bmatrix} \dfrac{\partial^2 g}{\partial w_1^2} & \cdots & \dfrac{\partial^2 g}{\partial w_1 \partial w_m} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 g}{\partial w_m \partial w_1} & \cdots & \dfrac{\partial^2 g}{\partial w_m^2} \end{bmatrix}.$$
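The Hessian can be checked numerically in the same spirit; the sketch below (again an added illustration, with the assumed helper name num_hessian) uses a second-order central difference for each entry:

```python
import numpy as np

def num_hessian(g, w, h=1e-4):
    """Second-order central-difference approximation of the Hessian
    of a scalar function g at the point w."""
    m = w.size
    H = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            ei = np.zeros(m)
            ej = np.zeros(m)
            ei[i] = h
            ej[j] = h
            H[i, j] = (g(w + ei + ej) - g(w + ei - ej)
                       - g(w - ei + ej) + g(w - ei - ej)) / (4 * h * h)
    return H

# Sanity check: g(w) = sum_i w_i^2 has Hessian 2I.
print(num_hessian(lambda v: np.sum(v**2), np.array([1.0, -2.0, 0.5])))
```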

Jacobian Matrix: Generalization to the vector-valued function

$$\mathbf{g}(\mathbf{w}) = [g_1(\mathbf{w}), \ldots, g_n(\mathbf{w})]^T$$

leads to the definition of the Jacobian matrix of g w.r.t. w:

$$\frac{\partial \mathbf{g}}{\partial \mathbf{w}} = \begin{bmatrix} \dfrac{\partial g_1}{\partial w_1} & \cdots & \dfrac{\partial g_n}{\partial w_1} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial g_1}{\partial w_m} & \cdots & \dfrac{\partial g_n}{\partial w_m} \end{bmatrix}.$$

In this vector convention the columns of the Jacobian matrix are the gradients of the corresponding component functions g_i(w) w.r.t. the vector w.
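Note that this convention produces an m × n matrix (columns are gradients), the transpose of the numerator-layout convention used in some other texts. The sketch below (illustrative, not from the notes) builds the Jacobian in this column convention by finite differences:

```python
import numpy as np

def num_jacobian(g, w, h=1e-6):
    """Numerical Jacobian in the convention of these notes: column i
    holds the gradient of the component function g_i, so the result
    is m x n for g mapping R^m into R^n."""
    n = np.atleast_1d(g(w)).size
    J = np.zeros((w.size, n))
    for i in range(w.size):
        e = np.zeros_like(w, dtype=float)
        e[i] = h
        J[i, :] = (np.atleast_1d(g(w + e)) - np.atleast_1d(g(w - e))) / (2 * h)
    return J

# Example: g(w) = (w1 + w2, w1 * w2); gradients (1, 1)^T and (w2, w1)^T.
g = lambda v: np.array([v[0] + v[1], v[0] * v[1]])
print(num_jacobian(g, np.array([2.0, 3.0])))  # ~ [[1. 3.] [1. 2.]]
```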

Differentiation Rules: The differentiation rules are analogous to those for ordinary functions:

$$\frac{\partial\, f(\mathbf{w})\, g(\mathbf{w})}{\partial \mathbf{w}} = \frac{\partial f(\mathbf{w})}{\partial \mathbf{w}}\, g(\mathbf{w}) + f(\mathbf{w})\, \frac{\partial g(\mathbf{w})}{\partial \mathbf{w}}$$

$$\frac{\partial\, f(\mathbf{w})/g(\mathbf{w})}{\partial \mathbf{w}} = \frac{\dfrac{\partial f(\mathbf{w})}{\partial \mathbf{w}}\, g(\mathbf{w}) - f(\mathbf{w})\, \dfrac{\partial g(\mathbf{w})}{\partial \mathbf{w}}}{g^2(\mathbf{w})}$$

$$\frac{\partial\, f(g(\mathbf{w}))}{\partial \mathbf{w}} = f'(g(\mathbf{w}))\, \frac{\partial g(\mathbf{w})}{\partial \mathbf{w}}$$
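As a quick sanity check of the product rule, the sketch below compares both sides numerically for the illustrative choices f(w) = a^T w and g(w) = w^T w (these functions and the helper are assumptions made for the example, not from the notes):

```python
import numpy as np

def num_gradient(g, w, h=1e-6):
    # Central-difference gradient (same helper as in the earlier sketch).
    return np.array([(g(w + h * e) - g(w - h * e)) / (2 * h)
                     for e in np.eye(w.size)])

a = np.array([1.0, 2.0, 3.0])
f = lambda v: a @ v   # gradient: a
g = lambda v: v @ v   # gradient: 2v
w = np.array([0.5, -1.0, 2.0])

lhs = num_gradient(lambda v: f(v) * g(v), w)
rhs = num_gradient(f, w) * g(w) + f(w) * num_gradient(g, w)
print(np.allclose(lhs, rhs, atol=1e-5))  # True
```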

Example: Consider

$$g(\mathbf{w}) = \sum_{i=1}^{m} a_i w_i = \mathbf{a}^T \mathbf{w}$$

where a is a constant vector. Thus,

$$\frac{\partial g}{\partial \mathbf{w}} = \begin{bmatrix} a_1 \\ \vdots \\ a_m \end{bmatrix},$$

or in the vector notation,

$$\frac{\partial\, \mathbf{a}^T \mathbf{w}}{\partial \mathbf{w}} = \mathbf{a}.$$

Example: Let $\mathbf{w} = (w_1, w_2, w_3)^T \in \mathbb{R}^3$ and

$$g(\mathbf{w}) = 2w_1 + 5w_2 + 12w_3 = \begin{bmatrix} 2 & 5 & 12 \end{bmatrix} \mathbf{w},$$

the second form being the vector notation. Because

$$\frac{\partial g}{\partial w_1} = \frac{\partial}{\partial w_1}\left(2w_1 + 5w_2 + 12w_3\right) = 2 + 0 + 0 = 2,$$

and similarly $\partial g/\partial w_2 = 5$ and $\partial g/\partial w_3 = 12$, hence

$$\frac{\partial g}{\partial \mathbf{w}} = \begin{bmatrix} 2 \\ 5 \\ 12 \end{bmatrix}.$$
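A short numerical confirmation of this example (an added check, assuming NumPy):

```python
import numpy as np

# The gradient of g(w) = a^T w should be the constant vector a,
# here with a = (2, 5, 12)^T from the example above.
a = np.array([2.0, 5.0, 12.0])
w = np.random.randn(3)
h = 1e-6
grad = np.array([(a @ (w + h * e) - a @ (w - h * e)) / (2 * h)
                 for e in np.eye(3)])
print(np.allclose(grad, a))  # True, independent of the chosen w
```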


Example: Consider

$$g(\mathbf{w}) = \sum_{i=1}^{m}\sum_{j=1}^{m} a_{ij}\, w_i w_j = \mathbf{w}^T A \mathbf{w}$$

where A is a constant square matrix. Thus,

$$\frac{\partial g}{\partial \mathbf{w}} = \begin{bmatrix} \sum_{j=1}^{m} w_j a_{1j} + \sum_{i=1}^{m} w_i a_{i1} \\ \vdots \\ \sum_{j=1}^{m} w_j a_{mj} + \sum_{i=1}^{m} w_i a_{im} \end{bmatrix},$$

and so in the vector notation

$$\frac{\partial\, \mathbf{w}^T A \mathbf{w}}{\partial \mathbf{w}} = A\mathbf{w} + A^T\mathbf{w}.$$

The Hessian of g(w) is

$$\frac{\partial^2\, \mathbf{w}^T A \mathbf{w}}{\partial \mathbf{w}^2} = \begin{bmatrix} 2a_{11} & \cdots & a_{1m} + a_{m1} \\ \vdots & \ddots & \vdots \\ a_{m1} + a_{1m} & \cdots & 2a_{mm} \end{bmatrix},$$

which equals $A + A^T$.
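The identity can be verified numerically; the sketch below (illustrative, with a randomly drawn A and w) checks the gradient formula by central differences:

```python
import numpy as np

# Check d(w^T A w)/dw = (A + A^T) w for a random square A.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
w = rng.standard_normal(4)
g = lambda v: v @ A @ v
h = 1e-5

grad = np.array([(g(w + h * e) - g(w - h * e)) / (2 * h)
                 for e in np.eye(4)])
print(np.allclose(grad, (A + A.T) @ w, atol=1e-6))  # True

# The gradient (A + A^T) w is linear in w, so the Hessian is the
# constant matrix A + A^T, as stated in the text.
```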

Example: Let $\mathbf{w} = (w_1, w_2)^T \in \mathbb{R}^2$ and

$$g(\mathbf{w}) = 3w_1 w_1 + 2w_1 w_2 + 6w_2 w_1 + 5w_2 w_2 = \begin{bmatrix} w_1 & w_2 \end{bmatrix} \begin{bmatrix} 3 & 2 \\ 6 & 5 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}.$$

Then

$$\frac{\partial g}{\partial w_1} = \frac{\partial}{\partial w_1}\left(3w_1 w_1 + 2w_1 w_2 + 6w_2 w_1 + 5w_2 w_2\right) = 3 \cdot 2 w_1 + 2w_2 + 6w_2 + 0 = 6w_1 + 8w_2,$$

$$\frac{\partial g}{\partial w_2} = 0 + 2w_1 + 6w_1 + 5 \cdot 2 w_2 = 8w_1 + 10w_2,$$

and so in the vector notation

$$\frac{\partial\, \mathbf{w}^T A \mathbf{w}}{\partial \mathbf{w}} = \begin{bmatrix} 3 & 2 \\ 6 & 5 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} + \begin{bmatrix} 3 & 6 \\ 2 & 5 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 8 & 10 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}.$$

The Hessian of g(w) is

$$\frac{\partial^2\, \mathbf{w}^T A \mathbf{w}}{\partial \mathbf{w}^2} = \frac{\partial}{\partial \mathbf{w}}\left( \frac{\partial\, \mathbf{w}^T A \mathbf{w}}{\partial \mathbf{w}} \right) = \begin{bmatrix} 2 \cdot 3 & 2 + 6 \\ 6 + 2 & 2 \cdot 5 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 8 & 10 \end{bmatrix}.$$

Matrix Gradient: Consider a scalar-valued function g(W) of the m × n matrix W = {w_ij} (e.g. the determinant of a matrix). The matrix gradient w.r.t. W is a matrix of the same dimensions as W, consisting of the partial derivatives of g(W) w.r.t. the components of W:

$$\frac{\partial g}{\partial W} = \begin{bmatrix} \dfrac{\partial g}{\partial w_{11}} & \cdots & \dfrac{\partial g}{\partial w_{1n}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial g}{\partial w_{m1}} & \cdots & \dfrac{\partial g}{\partial w_{mn}} \end{bmatrix}.$$

Example: If g(W) = tr(W), then

$$\frac{\partial g}{\partial W} = I.$$

Example: Consider a matrix function

$$g(W) = \sum_{i=1}^{m}\sum_{j=1}^{m} w_{ij}\, a_i a_j = \mathbf{a}^T W \mathbf{a},$$

i.e. assume that a is a constant vector, whereas W is a matrix of variables. Taking the gradient w.r.t. $w_{ij}$ yields

$$\frac{\partial\, \mathbf{a}^T W \mathbf{a}}{\partial w_{ij}} = a_i a_j.$$

Thus, in the matrix form,

$$\frac{\partial\, \mathbf{a}^T W \mathbf{a}}{\partial W} = \mathbf{a}\mathbf{a}^T.$$
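The matrix-gradient formula can also be checked entry-wise; the following sketch (an added illustration) perturbs each element of W in turn:

```python
import numpy as np

# Check d(a^T W a)/dW = a a^T by perturbing each entry of W.
a = np.array([1.0, -2.0, 3.0])
W = np.random.randn(3, 3)
g = lambda M: a @ M @ a
h = 1e-6

grad = np.zeros_like(W)
for i in range(3):
    for j in range(3):
        E = np.zeros_like(W)
        E[i, j] = h
        grad[i, j] = (g(W + E) - g(W - E)) / (2 * h)
print(np.allclose(grad, np.outer(a, a)))  # True
```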

Example: Let

$$g(W) = 9w_{11} + 6w_{21} + 6w_{12} + 4w_{22} = \begin{bmatrix} 3 & 2 \end{bmatrix} \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix} \begin{bmatrix} 3 \\ 2 \end{bmatrix}.$$

Thus we have

$$\frac{\partial g}{\partial w_{11}} = 9, \qquad \frac{\partial g}{\partial w_{12}} = 6, \qquad \frac{\partial g}{\partial w_{21}} = 6, \qquad \frac{\partial g}{\partial w_{22}} = 4,$$

hence

$$\frac{\partial g}{\partial W} = \begin{bmatrix} 9 & 6 \\ 6 & 4 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix} \begin{bmatrix} 3 & 2 \end{bmatrix}.$$

Example: Let W be an invertible square matrix of dimension m with determinant det(W). Then

$$\frac{\partial \det W}{\partial W} = (W^T)^{-1} \det W.$$

To see this, recall that

$$W^{-1} = \frac{1}{\det W}\,\operatorname{adj}(W),$$

where adj(W) is the adjoint (adjugate) of W:

$$\operatorname{adj}(W) = \begin{bmatrix} W_{11} & \cdots & W_{m1} \\ \vdots & \ddots & \vdots \\ W_{1m} & \cdots & W_{mm} \end{bmatrix}$$

and $W_{ij}$ is the cofactor obtained by multiplying the term $(-1)^{i+j}$ by the determinant of the matrix obtained from W by removing the i-th row and the j-th column. Recall that the determinant of W can also be obtained using cofactors:

$$\det W = \sum_{k=1}^{m} w_{ik} W_{ik},$$

where i denotes an arbitrary row. Now taking the derivative of det(W) w.r.t. $w_{ij}$ gives

$$\frac{\partial \det(W)}{\partial w_{ij}} = W_{ij},$$

and from the definition of the matrix gradient it follows that

$$\frac{\partial \det(W)}{\partial W} = \operatorname{adj}(W)^T.$$

Using the formula for the inverse of W, we have

$$\frac{\partial \det(W)}{\partial W} = (W^T)^{-1} \det W.$$

Homework: Prove that

$$\frac{\partial \log|\det(W)|}{\partial W} = \frac{1}{|\det W|}\,\frac{\partial |\det W|}{\partial W} = (W^T)^{-1}.$$
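Both the determinant identity and the homework identity can be verified numerically; the sketch below (illustrative, using a random and almost surely invertible W) compares finite differences against the closed forms:

```python
import numpy as np

# Check d(det W)/dW = det(W) (W^T)^{-1} and
#       d(log|det W|)/dW = (W^T)^{-1}.
rng = np.random.default_rng(1)
W = rng.standard_normal((4, 4))  # almost surely invertible
h = 1e-6
logabsdet = lambda M: np.linalg.slogdet(M)[1]  # log|det M|

gd = np.zeros_like(W)  # finite-difference gradient of det
gl = np.zeros_like(W)  # finite-difference gradient of log|det|
for i in range(4):
    for j in range(4):
        E = np.zeros_like(W)
        E[i, j] = h
        gd[i, j] = (np.linalg.det(W + E) - np.linalg.det(W - E)) / (2 * h)
        gl[i, j] = (logabsdet(W + E) - logabsdet(W - E)) / (2 * h)

print(np.allclose(gd, np.linalg.det(W) * np.linalg.inv(W.T), atol=1e-5))  # True
print(np.allclose(gl, np.linalg.inv(W.T), atol=1e-5))                     # True
```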