
Vector/Matrix Calculus

In neural networks, we often encounter problems involving the analysis of several variables. Vector/matrix calculus extends the calculus of one variable to that of a vector or a matrix of variables.

Vector Gradient: Let g(w) be a differentiable scalar function of m variables, where

$$\mathbf{w} = [w_1, \ldots, w_m]^T.$$

Then the vector gradient of g(w) w.r.t. w is the m-dimensional vector of partial derivatives of g:

$$\nabla g = \nabla_{\mathbf{w}} g = \frac{\partial g}{\partial \mathbf{w}} = \begin{bmatrix} \dfrac{\partial g}{\partial w_1} \\ \vdots \\ \dfrac{\partial g}{\partial w_m} \end{bmatrix}.$$
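To make the definition concrete, here is a minimal NumPy sketch (a numerical check added for illustration, not part of the original notes; the helper name num_gradient is an arbitrary choice) that approximates the vector gradient by central differences:

```python
import numpy as np

def num_gradient(g, w, h=1e-6):
    """Central-difference approximation of the gradient of a scalar
    function g at the point w."""
    grad = np.zeros_like(w, dtype=float)
    for i in range(w.size):
        e = np.zeros_like(w, dtype=float)
        e[i] = h
        grad[i] = (g(w + e) - g(w - e)) / (2 * h)
    return grad

# Sanity check: g(w) = sum_i w_i^2 has gradient 2w.
w = np.array([1.0, -2.0, 0.5])
print(num_gradient(lambda v: np.sum(v**2), w))  # ~ [ 2. -4.  1.]
```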

Similarly, we can define the second-order gradient, or Hessian matrix. The Hessian matrix is defined as

$$\nabla^2 g = \frac{\partial^2 g}{\partial \mathbf{w}^2} = \begin{bmatrix} \dfrac{\partial^2 g}{\partial w_1^2} & \cdots & \dfrac{\partial^2 g}{\partial w_1 \partial w_m} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 g}{\partial w_m \partial w_1} & \cdots & \dfrac{\partial^2 g}{\partial w_m^2} \end{bmatrix}.$$
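The Hessian can be checked numerically in the same spirit; the sketch below (again an added illustration, with the assumed helper name num_hessian) uses a second-order central difference for each entry:

```python
import numpy as np

def num_hessian(g, w, h=1e-4):
    """Second-order central-difference approximation of the Hessian
    of a scalar function g at the point w."""
    m = w.size
    H = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            ei = np.zeros(m)
            ej = np.zeros(m)
            ei[i] = h
            ej[j] = h
            H[i, j] = (g(w + ei + ej) - g(w + ei - ej)
                       - g(w - ei + ej) + g(w - ei - ej)) / (4 * h * h)
    return H

# Sanity check: g(w) = sum_i w_i^2 has Hessian 2I.
print(num_hessian(lambda v: np.sum(v**2), np.array([1.0, -2.0, 0.5])))
```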

Jacobian Matrix: Generalization to the vector-valued function

$$\mathbf{g}(\mathbf{w}) = [g_1(\mathbf{w}), \ldots, g_n(\mathbf{w})]^T$$

leads to the definition of the Jacobian matrix of g w.r.t. w:

$$\frac{\partial \mathbf{g}}{\partial \mathbf{w}} = \begin{bmatrix} \dfrac{\partial g_1}{\partial w_1} & \cdots & \dfrac{\partial g_n}{\partial w_1} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial g_1}{\partial w_m} & \cdots & \dfrac{\partial g_n}{\partial w_m} \end{bmatrix}.$$

In this vector convention the columns of the Jacobian matrix are the gradients of the corresponding component functions g_i(w) w.r.t. the vector w.
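Note that this convention produces an m × n matrix (columns are gradients), the transpose of the numerator-layout convention used in some other texts. The sketch below (illustrative, not from the notes) builds the Jacobian in this column convention by finite differences:

```python
import numpy as np

def num_jacobian(g, w, h=1e-6):
    """Numerical Jacobian in the convention of these notes: column i
    holds the gradient of the component function g_i, so the result
    is m x n for g mapping R^m into R^n."""
    n = np.atleast_1d(g(w)).size
    J = np.zeros((w.size, n))
    for i in range(w.size):
        e = np.zeros_like(w, dtype=float)
        e[i] = h
        J[i, :] = (np.atleast_1d(g(w + e)) - np.atleast_1d(g(w - e))) / (2 * h)
    return J

# Example: g(w) = (w1 + w2, w1 * w2); gradients (1, 1)^T and (w2, w1)^T.
g = lambda v: np.array([v[0] + v[1], v[0] * v[1]])
print(num_jacobian(g, np.array([2.0, 3.0])))  # ~ [[1. 3.] [1. 2.]]
```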

Differentiation Rules: The differentiation rules are analogous to those for ordinary functions:

$$\frac{\partial\, f(\mathbf{w})\, g(\mathbf{w})}{\partial \mathbf{w}} = \frac{\partial f(\mathbf{w})}{\partial \mathbf{w}}\, g(\mathbf{w}) + f(\mathbf{w})\, \frac{\partial g(\mathbf{w})}{\partial \mathbf{w}}$$

$$\frac{\partial\, f(\mathbf{w})/g(\mathbf{w})}{\partial \mathbf{w}} = \frac{\dfrac{\partial f(\mathbf{w})}{\partial \mathbf{w}}\, g(\mathbf{w}) - f(\mathbf{w})\, \dfrac{\partial g(\mathbf{w})}{\partial \mathbf{w}}}{g^2(\mathbf{w})}$$

$$\frac{\partial\, f(g(\mathbf{w}))}{\partial \mathbf{w}} = f'(g(\mathbf{w}))\, \frac{\partial g(\mathbf{w})}{\partial \mathbf{w}}$$
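As a quick sanity check of the product rule, the sketch below compares both sides numerically for the illustrative choices f(w) = a^T w and g(w) = w^T w (these functions and the helper are assumptions made for the example, not from the notes):

```python
import numpy as np

def num_gradient(g, w, h=1e-6):
    # Central-difference gradient (same helper as in the earlier sketch).
    return np.array([(g(w + h * e) - g(w - h * e)) / (2 * h)
                     for e in np.eye(w.size)])

a = np.array([1.0, 2.0, 3.0])
f = lambda v: a @ v   # gradient: a
g = lambda v: v @ v   # gradient: 2v
w = np.array([0.5, -1.0, 2.0])

lhs = num_gradient(lambda v: f(v) * g(v), w)
rhs = num_gradient(f, w) * g(w) + f(w) * num_gradient(g, w)
print(np.allclose(lhs, rhs, atol=1e-5))  # True
```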

Example: Consider

$$g(\mathbf{w}) = \sum_{i=1}^{m} a_i w_i = \mathbf{a}^T \mathbf{w}$$

where a is a constant vector. Thus,

$$\frac{\partial g}{\partial \mathbf{w}} = \begin{bmatrix} a_1 \\ \vdots \\ a_m \end{bmatrix},$$

or in the vector notation,

$$\frac{\partial\, \mathbf{a}^T \mathbf{w}}{\partial \mathbf{w}} = \mathbf{a}.$$

Example: Let $\mathbf{w} = (w_1, w_2, w_3)^T \in \mathbb{R}^3$ and

$$g(\mathbf{w}) = 2w_1 + 5w_2 + 12w_3 = \begin{bmatrix} 2 & 5 & 12 \end{bmatrix} \mathbf{w},$$

the second form being the vector notation. Because

$$\frac{\partial g}{\partial w_1} = \frac{\partial}{\partial w_1}\left(2w_1 + 5w_2 + 12w_3\right) = 2 + 0 + 0 = 2,$$

and similarly $\partial g/\partial w_2 = 5$ and $\partial g/\partial w_3 = 12$, hence

$$\frac{\partial g}{\partial \mathbf{w}} = \begin{bmatrix} 2 \\ 5 \\ 12 \end{bmatrix}.$$
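A short numerical confirmation of this example (an added check, assuming NumPy):

```python
import numpy as np

# The gradient of g(w) = a^T w should be the constant vector a,
# here with a = (2, 5, 12)^T from the example above.
a = np.array([2.0, 5.0, 12.0])
w = np.random.randn(3)
h = 1e-6
grad = np.array([(a @ (w + h * e) - a @ (w - h * e)) / (2 * h)
                 for e in np.eye(3)])
print(np.allclose(grad, a))  # True, independent of the chosen w
```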


Example: Consider

$$g(\mathbf{w}) = \sum_{i=1}^{m}\sum_{j=1}^{m} a_{ij}\, w_i w_j = \mathbf{w}^T A \mathbf{w}$$

where A is a constant square matrix. Thus,

$$\frac{\partial g}{\partial \mathbf{w}} = \begin{bmatrix} \sum_{j=1}^{m} w_j a_{1j} + \sum_{i=1}^{m} w_i a_{i1} \\ \vdots \\ \sum_{j=1}^{m} w_j a_{mj} + \sum_{i=1}^{m} w_i a_{im} \end{bmatrix},$$

and so in the vector notation

$$\frac{\partial\, \mathbf{w}^T A \mathbf{w}}{\partial \mathbf{w}} = A\mathbf{w} + A^T\mathbf{w}.$$

The Hessian of g(w) is

$$\frac{\partial^2\, \mathbf{w}^T A \mathbf{w}}{\partial \mathbf{w}^2} = \begin{bmatrix} 2a_{11} & \cdots & a_{1m} + a_{m1} \\ \vdots & \ddots & \vdots \\ a_{m1} + a_{1m} & \cdots & 2a_{mm} \end{bmatrix},$$

which equals $A + A^T$.
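The identity can be verified numerically; the sketch below (illustrative, with a randomly drawn A and w) checks the gradient formula by central differences:

```python
import numpy as np

# Check d(w^T A w)/dw = (A + A^T) w for a random square A.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
w = rng.standard_normal(4)
g = lambda v: v @ A @ v
h = 1e-5

grad = np.array([(g(w + h * e) - g(w - h * e)) / (2 * h)
                 for e in np.eye(4)])
print(np.allclose(grad, (A + A.T) @ w, atol=1e-6))  # True

# The gradient (A + A^T) w is linear in w, so the Hessian is the
# constant matrix A + A^T, as stated in the text.
```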

Example: Let $\mathbf{w} = (w_1, w_2)^T \in \mathbb{R}^2$ and

$$g(\mathbf{w}) = 3w_1 w_1 + 2w_1 w_2 + 6w_2 w_1 + 5w_2 w_2 = \begin{bmatrix} w_1 & w_2 \end{bmatrix} \begin{bmatrix} 3 & 2 \\ 6 & 5 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}.$$

Then

$$\frac{\partial g}{\partial w_1} = \frac{\partial}{\partial w_1}\left(3w_1 w_1 + 2w_1 w_2 + 6w_2 w_1 + 5w_2 w_2\right) = 3 \cdot 2 w_1 + 2w_2 + 6w_2 + 0 = 6w_1 + 8w_2,$$

$$\frac{\partial g}{\partial w_2} = 0 + 2w_1 + 6w_1 + 5 \cdot 2 w_2 = 8w_1 + 10w_2,$$

and so in the vector notation

$$\frac{\partial\, \mathbf{w}^T A \mathbf{w}}{\partial \mathbf{w}} = \begin{bmatrix} 3 & 2 \\ 6 & 5 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} + \begin{bmatrix} 3 & 6 \\ 2 & 5 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 8 & 10 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}.$$

The Hessian of g(w) is

$$\frac{\partial^2\, \mathbf{w}^T A \mathbf{w}}{\partial \mathbf{w}^2} = \frac{\partial}{\partial \mathbf{w}}\left( \frac{\partial\, \mathbf{w}^T A \mathbf{w}}{\partial \mathbf{w}} \right) = \begin{bmatrix} 2 \cdot 3 & 2 + 6 \\ 6 + 2 & 2 \cdot 5 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 8 & 10 \end{bmatrix}.$$

Matrix Gradient: Consider a scalar-valued function g(W) of the m × n matrix W = {w_ij} (e.g. the determinant of a matrix). The matrix gradient w.r.t. W is a matrix of the same dimensions as W, consisting of the partial derivatives of g(W) w.r.t. the components of W:

$$\frac{\partial g}{\partial W} = \begin{bmatrix} \dfrac{\partial g}{\partial w_{11}} & \cdots & \dfrac{\partial g}{\partial w_{1n}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial g}{\partial w_{m1}} & \cdots & \dfrac{\partial g}{\partial w_{mn}} \end{bmatrix}.$$

Example: If g(W) = tr(W), then

$$\frac{\partial g}{\partial W} = I.$$

Example: Consider a matrix function

$$g(W) = \sum_{i=1}^{m}\sum_{j=1}^{m} w_{ij}\, a_i a_j = \mathbf{a}^T W \mathbf{a},$$

i.e. assume that a is a constant vector, whereas W is a matrix of variables. Taking the gradient w.r.t. $w_{ij}$ yields

$$\frac{\partial\, \mathbf{a}^T W \mathbf{a}}{\partial w_{ij}} = a_i a_j.$$

Thus, in the matrix form,

$$\frac{\partial\, \mathbf{a}^T W \mathbf{a}}{\partial W} = \mathbf{a}\mathbf{a}^T.$$
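The matrix-gradient formula can also be checked entry-wise; the following sketch (an added illustration) perturbs each element of W in turn:

```python
import numpy as np

# Check d(a^T W a)/dW = a a^T by perturbing each entry of W.
a = np.array([1.0, -2.0, 3.0])
W = np.random.randn(3, 3)
g = lambda M: a @ M @ a
h = 1e-6

grad = np.zeros_like(W)
for i in range(3):
    for j in range(3):
        E = np.zeros_like(W)
        E[i, j] = h
        grad[i, j] = (g(W + E) - g(W - E)) / (2 * h)
print(np.allclose(grad, np.outer(a, a)))  # True
```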

Example: Let

$$g(W) = 9w_{11} + 6w_{21} + 6w_{12} + 4w_{22} = \begin{bmatrix} 3 & 2 \end{bmatrix} \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix} \begin{bmatrix} 3 \\ 2 \end{bmatrix}.$$

Thus we have

$$\frac{\partial g}{\partial w_{11}} = 9, \qquad \frac{\partial g}{\partial w_{12}} = 6, \qquad \frac{\partial g}{\partial w_{21}} = 6, \qquad \frac{\partial g}{\partial w_{22}} = 4,$$

hence

$$\frac{\partial g}{\partial W} = \begin{bmatrix} 9 & 6 \\ 6 & 4 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix} \begin{bmatrix} 3 & 2 \end{bmatrix}.$$

Example: Let W be an invertible square matrix of dimension m with determinant det(W). Then

$$\frac{\partial \det W}{\partial W} = (W^T)^{-1} \det W.$$

To see this, recall that

$$W^{-1} = \frac{1}{\det W}\,\operatorname{adj}(W),$$

where adj(W) is the adjoint (adjugate) of W:

$$\operatorname{adj}(W) = \begin{bmatrix} W_{11} & \cdots & W_{m1} \\ \vdots & \ddots & \vdots \\ W_{1m} & \cdots & W_{mm} \end{bmatrix}$$

and $W_{ij}$ is the cofactor obtained by multiplying the term $(-1)^{i+j}$ by the determinant of the matrix obtained from W by removing the i-th row and the j-th column. Recall that the determinant of W can also be obtained using cofactors:

$$\det W = \sum_{k=1}^{m} w_{ik} W_{ik},$$

where i denotes an arbitrary row. Now taking the derivative of det(W) w.r.t. $w_{ij}$ gives

$$\frac{\partial \det(W)}{\partial w_{ij}} = W_{ij},$$

and from the definition of the matrix gradient it follows that

$$\frac{\partial \det(W)}{\partial W} = \operatorname{adj}(W)^T.$$

Using the formula for the inverse of W, we have

$$\frac{\partial \det(W)}{\partial W} = (W^T)^{-1} \det W.$$

Homework: Prove that

$$\frac{\partial \log|\det(W)|}{\partial W} = \frac{1}{|\det W|}\,\frac{\partial |\det W|}{\partial W} = (W^T)^{-1}.$$
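Both the determinant identity and the homework identity can be verified numerically; the sketch below (illustrative, using a random and almost surely invertible W) compares finite differences against the closed forms:

```python
import numpy as np

# Check d(det W)/dW = det(W) (W^T)^{-1} and
#       d(log|det W|)/dW = (W^T)^{-1}.
rng = np.random.default_rng(1)
W = rng.standard_normal((4, 4))  # almost surely invertible
h = 1e-6
logabsdet = lambda M: np.linalg.slogdet(M)[1]  # log|det M|

gd = np.zeros_like(W)  # finite-difference gradient of det
gl = np.zeros_like(W)  # finite-difference gradient of log|det|
for i in range(4):
    for j in range(4):
        E = np.zeros_like(W)
        E[i, j] = h
        gd[i, j] = (np.linalg.det(W + E) - np.linalg.det(W - E)) / (2 * h)
        gl[i, j] = (logabsdet(W + E) - logabsdet(W - E)) / (2 * h)

print(np.allclose(gd, np.linalg.det(W) * np.linalg.inv(W.T), atol=1e-5))  # True
print(np.allclose(gl, np.linalg.inv(W.T), atol=1e-5))                     # True
```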