Magnus Matrix Differentials Presentation
Outline
Introduction
Notation
Matrix Calculus: Idea Two
References
Steven W. Nydick
Introduction
Notation
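$\mathbf{X}$: A matrix
$\mathbf{x}$: A vector
$x$: A scalar
$\phi(x)$, $\phi(\mathbf{x})$, or $\phi(\mathbf{X})$: A scalar function
$\mathbf{f}(x)$, $\mathbf{f}(\mathbf{x})$, or $\mathbf{f}(\mathbf{X})$: A vector function
$\mathbf{F}(x)$, $\mathbf{F}(\mathbf{x})$, or $\mathbf{F}(\mathbf{X})$: A matrix function
$\mathbf{x}^T$ or $\mathbf{X}^T$: The transpose of $\mathbf{x}$ or $\mathbf{X}$
$x_{ij}$: The element in the $i$th row and $j$th column of $\mathbf{X}$
$(\mathbf{X}^T)_{ij}$: The element in the $i$th row and $j$th column of $\mathbf{X}^T$
$\mathrm{D}\,\mathbf{f}(\mathbf{x})$: The derivative of the function $\mathbf{f}(\mathbf{x})$
$\mathrm{d}\,\mathbf{f}(\mathbf{x})$: The differential of the function $\mathbf{f}(\mathbf{x})$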
Basic Idea
Vector calculus is well established, but matrix calculus is difficult.
The paper written by Schneman took one version of the Calculus of
Vectors and applied it to matrices:
There are several matrix algebra properties and matrices that Magnus
references throughout his paper and book.
Vectorized Operators
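$$
A \otimes B =
\begin{pmatrix}
a_{11}B & a_{12}B & \cdots & a_{1n}B \\
a_{21}B & a_{22}B & \cdots & a_{2n}B \\
\vdots  & \vdots  & \ddots & \vdots  \\
a_{m1}B & a_{m2}B & \cdots & a_{mn}B
\end{pmatrix}
\tag{1}
$$

$$
\operatorname{vec}(A) =
\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix},
\qquad a_j = j\text{th column of } A
\tag{2}
$$

Note that $\operatorname{vec}(A)$ is an $mn \times 1$ column vector.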
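$$
\operatorname{vec}(ab^T)
= \operatorname{vec}\begin{pmatrix} ab_1 & ab_2 & \cdots & ab_n \end{pmatrix}
= \begin{pmatrix} ab_1 \\ ab_2 \\ \vdots \\ ab_n \end{pmatrix}
= \begin{pmatrix} b_1 a \\ b_2 a \\ \vdots \\ b_n a \end{pmatrix}
= b \otimes a
$$

Thus, as a basic rule

$$
\operatorname{vec}(ab^T) = b \otimes a
\tag{3}
$$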
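$$
\begin{aligned}
\sum_{j=1}^{q} x_j e_j^T
&= x_1 \begin{pmatrix} 1 & 0 & \cdots & 0 \end{pmatrix}
 + x_2 \begin{pmatrix} 0 & 1 & \cdots & 0 \end{pmatrix}
 + \cdots
 + x_q \begin{pmatrix} 0 & 0 & \cdots & 1 \end{pmatrix} \\
&= \begin{pmatrix} x_1 & 0 & \cdots & 0 \end{pmatrix}
 + \begin{pmatrix} 0 & x_2 & \cdots & 0 \end{pmatrix}
 + \cdots
 + \begin{pmatrix} 0 & 0 & \cdots & x_q \end{pmatrix} \\
&= X
\end{aligned}
$$

Here $x_j$ is the $j$th column of $X$ and $e_j$ is the $j$th unit vector, so a matrix can be written as a sum of rank-one matrices, each of which carries a single column of $X$.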
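$$
\begin{aligned}
\operatorname{vec}(AXC)
&= \operatorname{vec}\Big[A\Big(\sum_{j=1}^{q} x_j e_j^T\Big)C\Big] \\
&= \operatorname{vec}\sum_{j=1}^{q} (A x_j e_j^T C) \\
&= \operatorname{vec}\sum_{j=1}^{q} \big[(A x_j)(e_j^T C)\big] \\
&= \sum_{j=1}^{q} \operatorname{vec}\big[(A x_j)(e_j^T C)\big] \\
&= \sum_{j=1}^{q} \big[(e_j^T C)^T \otimes (A x_j)\big] && \text{by (3)} \\
&= \sum_{j=1}^{q} \big[(C^T e_j) \otimes (A x_j)\big] \\
&= \sum_{j=1}^{q} \big[(C^T \otimes A)(e_j \otimes x_j)\big] \\
&= (C^T \otimes A) \sum_{j=1}^{q} \operatorname{vec}(x_j e_j^T) && \text{by (3)} \\
&= (C^T \otimes A) \operatorname{vec}\Big(\sum_{j=1}^{q} x_j e_j^T\Big) \\
&= (C^T \otimes A) \operatorname{vec}(X)
\end{aligned}
\tag{4}
$$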
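Identity (4) is easy to spot-check numerically. The following is a minimal, dependency-free Python sketch; the helper names `matmul`, `transpose`, `vec`, `kron`, and `matvec` are ours, and the matrices are arbitrary integers. It builds both sides of (4) for one example and compares them exactly.

```python
# Sketch: check vec(AXC) = (C^T kron A) vec(X), equation (4), on one example.
# Matrices are nested lists of rows; helper names are illustrative only.

def matmul(P, Q):
    """Matrix product of P (r x s) and Q (s x t)."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def vec(P):
    """Stack the columns of P into one list (column-major order)."""
    return [P[i][j] for j in range(len(P[0])) for i in range(len(P))]

def kron(P, Q):
    """Kronecker product: block (i, j) of the result is P[i][j] * Q."""
    return [[P[i][j] * Q[k][l]
             for j in range(len(P[0])) for l in range(len(Q[0]))]
            for i in range(len(P)) for k in range(len(Q))]

def matvec(P, v):
    return [sum(p * x for p, x in zip(row, v)) for row in P]

A = [[1, 2], [3, 4], [5, 6]]      # 3 x 2
X = [[1, -1, 0], [2, 0, 3]]       # 2 x 3
C = [[2, 1], [0, 1], [1, -2]]     # 3 x 2

lhs = vec(matmul(matmul(A, X), C))
rhs = matvec(kron(transpose(C), A), vec(X))
print(lhs == rhs)  # True
```

Integer entries keep the comparison exact, so no tolerance is needed.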
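$$
A =
\begin{pmatrix}
a_{11} & a_{21} & \cdots & a_{n1} \\
a_{21} & a_{22} & \cdots & a_{n2} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix},
\qquad
\operatorname{vech}(A) =
\begin{pmatrix}
a_{11} \\ a_{21} \\ \vdots \\ a_{n1} \\ a_{22} \\ \vdots \\ a_{n2} \\ \vdots \\ a_{nn}
\end{pmatrix}
\tag{5}
$$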
Patterned Matrices
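$K_{mn}$: the commutation matrix, the $mn \times mn$ permutation matrix defined by

$$
K_{mn}\operatorname{vec}(A) = \operatorname{vec}(A^T) \quad \text{for every } m \times n \text{ matrix } A
\tag{6}
$$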
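$$
A_{32} =
\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix},
\qquad
K_{32} =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
$$

$$
K_{32}\operatorname{vec}(A)
= K_{32}\operatorname{vec}\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}
= K_{32}\begin{pmatrix} 1 \\ 3 \\ 5 \\ 2 \\ 4 \\ 6 \end{pmatrix}
= \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \\ 6 \end{pmatrix}
= \operatorname{vec}\begin{pmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{pmatrix}
= \operatorname{vec}(A^T)
$$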
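The commutation matrix can be built directly from its defining property (6). The sketch below is one possible construction in plain Python. The function `commutation` is a hypothetical helper, not something from Magnus. It places a single 1 in each row by matching the position of $a_{ij}$ in $\operatorname{vec}(A)$ with its position in $\operatorname{vec}(A^T)$, then repeats the $K_{32}$ example.

```python
# Sketch: build the commutation matrix K_mn from definition (6),
# K_mn vec(A) = vec(A^T) for any m x n matrix A, and redo the K_32 example.

def vec(P):
    """Stack the columns of P into one list (column-major order)."""
    return [P[i][j] for j in range(len(P[0])) for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def commutation(m, n):
    """Return K_mn as an (mn x mn) list of rows of 0s and 1s."""
    K = [[0] * (m * n) for _ in range(m * n)]
    for i in range(m):
        for j in range(n):
            # a_ij is entry j*m + i of vec(A) and entry i*n + j of vec(A^T)
            K[i * n + j][j * m + i] = 1
    return K

A = [[1, 2], [3, 4], [5, 6]]
K32 = commutation(3, 2)
for row in K32:
    print(row)

result = [sum(k * a for k, a in zip(row, vec(A))) for row in K32]
print(result)                        # [1, 2, 3, 4, 5, 6]
print(result == vec(transpose(A)))   # True
```

The printed rows of `K32` are the rows of $K_{32}$ in the example.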
The first $m$ columns of $K_{mn}$ act only on the first $m$ elements
of $\operatorname{vec}(X)$, that is, on the first column of $X$.
The second $m$ columns of $K_{mn}$ act only on the second $m$
elements of $\operatorname{vec}(X)$.
There are $n$ of these blocks, one for each column of $X$.
$$
K_{mn} = \big( k_1 \cdots k_m \,\big|\, k_{m+1} \cdots k_{2m} \,\big|\, \cdots \,\big|\, k_{m(n-1)+1} \cdots k_{mn} \big)
$$
$$
K_{mn} =
\begin{pmatrix}
E_{11} & E_{21} & \cdots & E_{n1} \\
E_{12} & E_{22} & \cdots & E_{n2} \\
\vdots & \vdots & \ddots & \vdots \\
E_{1m} & E_{2m} & \cdots & E_{nm}
\end{pmatrix}
$$

where $E_{ji}$ is the $n \times m$ matrix with a 1 in position $(j, i)$ and zeros elsewhere, so every row and every column of $K_{mn}$ contains exactly one 1.
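For an $m \times n$ matrix $A$, a $p \times q$ matrix $B$, and a $q \times n$ matrix $X$:

$$
\begin{aligned}
\operatorname{vec}(AX^TB^T) &= (B \otimes A)\operatorname{vec}(X^T) && \text{by (4)} \\
&= (B \otimes A)K_{qn}\operatorname{vec}(X) && \text{by (6)}
\end{aligned}
$$

But because $AX^TB^T = (BXA^T)^T$,

$$
\operatorname{vec}(AX^TB^T) = K_{pm}\operatorname{vec}(BXA^T) = K_{pm}(A \otimes B)\operatorname{vec}(X) \qquad \text{by (6) and (4)}
$$

and both expressions hold for every $X$, it follows that

$$
(B \otimes A)K_{qn} = K_{pm}(A \otimes B)
\tag{7}
$$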
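Identity (7) can be checked the same way. This sketch reuses the construction of $K_{mn}$ from (6), again with illustrative helper names and arbitrary integer matrices ($A$ is $2 \times 3$, $B$ is $3 \times 2$), and compares both sides of (7) entry by entry.

```python
# Sketch: numerically check (B kron A) K_qn = K_pm (A kron B), equation (7),
# for A (m x n) and B (p x q). Plain Python; helper names are illustrative.

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def kron(P, Q):
    """Kronecker product: block (i, j) of the result is P[i][j] * Q."""
    return [[P[i][j] * Q[k][l]
             for j in range(len(P[0])) for l in range(len(Q[0]))]
            for i in range(len(P)) for k in range(len(Q))]

def commutation(m, n):
    """K_mn, defined by K_mn vec(A) = vec(A^T) for every m x n matrix A."""
    K = [[0] * (m * n) for _ in range(m * n)]
    for i in range(m):
        for j in range(n):
            K[i * n + j][j * m + i] = 1
    return K

A = [[1, 2, 0], [-1, 3, 4]]        # m = 2, n = 3
B = [[2, 1], [0, 5], [7, -3]]      # p = 3, q = 2
m, n = len(A), len(A[0])
p, q = len(B), len(B[0])

lhs = matmul(kron(B, A), commutation(q, n))
rhs = matmul(commutation(p, m), kron(A, B))
print(lhs == rhs)  # True
```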
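$D_n$: the duplication matrix, the $n^2 \times \tfrac{1}{2}n(n+1)$ matrix defined by

$$
D_n\operatorname{vech}(A) = \operatorname{vec}(A) \quad \text{for every symmetric } n \times n \text{ matrix } A
\tag{8}
$$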
Number of columns of $D_n$: $\dfrac{n(n+1)}{2}$
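$$
A_{33} = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{pmatrix},
\qquad
D_3 = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
$$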
$$
D_3\operatorname{vech}(A)
= D_3\operatorname{vech}\begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{pmatrix}
= D_3\begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \\ 6 \end{pmatrix}
= \begin{pmatrix} 1 & 2 & 3 & 2 & 4 & 5 & 3 & 5 & 6 \end{pmatrix}^T
= \operatorname{vec}\begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{pmatrix}
= \operatorname{vec}(A)
$$
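Likewise, the duplication matrix follows from its defining property (8). The hypothetical helper `duplication` below is a sketch, not a library routine. Each row of $D_n$ corresponds to one entry $a_{ij}$ of $\operatorname{vec}(A)$ and points at the lower-triangle copy of that entry in $\operatorname{vech}(A)$. The code reproduces the $D_3$ example.

```python
# Sketch: build the duplication matrix D_n from definition (8),
# D_n vech(A) = vec(A) for symmetric n x n A, and redo the D_3 example.

def vec(A):
    return [A[i][j] for j in range(len(A[0])) for i in range(len(A))]

def vech(A):
    """Stack the on-and-below-diagonal part of each column of A."""
    n = len(A)
    return [A[i][j] for j in range(n) for i in range(j, n)]

def duplication(n):
    """Return D_n as an (n^2 x n(n+1)/2) list of rows of 0s and 1s."""
    # Position of the lower-triangle element (i, j), i >= j, inside vech(A).
    position = {}
    for j in range(n):
        for i in range(j, n):
            position[(i, j)] = len(position)
    D = [[0] * len(position) for _ in range(n * n)]
    for j in range(n):
        for i in range(n):
            # a_ij sits at row j*n + i of vec(A); by symmetry it equals the
            # lower-triangle element (max(i, j), min(i, j)).
            D[j * n + i][position[(max(i, j), min(i, j))]] = 1
    return D

A = [[1, 2, 3], [2, 4, 5], [3, 5, 6]]
D3 = duplication(3)
result = [sum(d * v for d, v in zip(row, vech(A))) for row in D3]
print(vech(A))           # [1, 2, 3, 4, 5, 6]
print(result)            # [1, 2, 3, 2, 4, 5, 3, 5, 6]
print(result == vec(A))  # True
```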
Note that $\operatorname{vech}(X)$ takes ever fewer elements from each successive column of $X$.
$$
D_n = \big( d_1 \cdots d_n \,\big|\, d_{n+1} \cdots d_{n+(n-1)} \,\big|\, \cdots \,\big|\, d_{n(n+1)/2} \big)
$$

Rather than dividing $D_n$ into blocks of the same length, the separators divide
it into successively shorter blocks, because the number of
elements in $\operatorname{vech}(X)$ corresponding to a particular column of $X$
decreases by 1 from one column to the next.
How many elements are in each column of $D_n$?
The first column block of $D_n$ takes the elements of $\operatorname{vech}(X)$ that come from the first column of $X$ and places them in both the first column and the first row of $X$.
The second column block of $D_n$ takes the elements from the second column of $X$ and places them in both the second column and the second row of $X$.
Or, column by column:

$$
D_n = \big( \operatorname{vec}(T_{11}), \operatorname{vec}(T_{21}), \ldots, \operatorname{vec}(T_{n1}), \operatorname{vec}(T_{22}), \ldots, \operatorname{vec}(T_{n2}), \ldots, \operatorname{vec}(T_{nn}) \big)
$$

where $T_{ii} = E_{ii}$ and $T_{ij} = E_{ij} + E_{ji}$ for $i > j$, with $E_{ij}$ the $n \times n$ matrix with a 1 in position $(i, j)$ and zeros elsewhere.
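$$
\phi(x) = \sum_{k=0}^{\infty} \frac{\phi^{(k)}(c)}{k!}(x-c)^k
= \sum_{k=0}^{p} \frac{\phi^{(k)}(c)}{k!}(x-c)^k + r_c(x-c)
$$

Setting $u = x - c$:

$$
\begin{aligned}
\phi(c+u) &= \sum_{k=0}^{p} \frac{\phi^{(k)}(c)}{k!}u^k + r_c(u) \\
&= \phi(c) + u\phi'(c) + r_{1c}(u) && (p = 1) \\
&= \phi(c) + u\phi'(c) + \frac{u^2}{2}\phi''(c) + r_{2c}(u) && (p = 2)
\end{aligned}
$$

$$
\lim_{u \to 0} \frac{\phi(c+u) - \phi(c)}{u} = \phi'(c)
$$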
Differentiability
From the Taylor expansion above, we have

$$
\phi(c+u) = \phi(c) + u\phi'(c) + r_{1c}(u)
$$

so that $\phi(c) + u\phi'(c)$ is the best linear approximation to the original
function near $c$. How good this approximation is depends on the
size of the remainder $r_{1c}(u)$.
The first differential:

$$
\mathrm{d}\phi(c; u) = u\phi'(c)
\tag{9}
$$
Note 3: There is only one first derivative. The rows of the Jacobian
are the gradients of the individual component functions of the vector function $f$,
whereas the columns are the partial derivatives of $f$ with respect to a
particular element of $c$.
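$$
\mathrm{D}h(c) = \mathrm{D}g(b)\,\mathrm{D}f(c)
\tag{10}
$$

where $h(c) = g(f(c))$ and $b = f(c)$:

$$
\begin{pmatrix}
\dfrac{\partial h(c)_1}{\partial c_1} & \cdots & \dfrac{\partial h(c)_1}{\partial c_n} \\
\vdots & & \vdots \\
\dfrac{\partial h(c)_k}{\partial c_1} & \cdots & \dfrac{\partial h(c)_k}{\partial c_n}
\end{pmatrix}
=
\begin{pmatrix}
\dfrac{\partial g(b)_1}{\partial b_1} & \cdots & \dfrac{\partial g(b)_1}{\partial b_p} \\
\vdots & & \vdots \\
\dfrac{\partial g(b)_k}{\partial b_1} & \cdots & \dfrac{\partial g(b)_k}{\partial b_p}
\end{pmatrix}
\begin{pmatrix}
\dfrac{\partial f(c)_1}{\partial c_1} & \cdots & \dfrac{\partial f(c)_1}{\partial c_n} \\
\vdots & & \vdots \\
\dfrac{\partial f(c)_p}{\partial c_1} & \cdots & \dfrac{\partial f(c)_p}{\partial c_n}
\end{pmatrix}
$$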
$$
g(x) = x_1^2 + 2x_2,
\qquad
f(t) = \begin{pmatrix} t + 2\cos(t) \\ \ln(t) \end{pmatrix}
$$
Method 1:

$$
\begin{aligned}
\phi(t) &= g(f(t)) \\
&= (t + 2\cos(t))^2 + 2\ln(t) \\
&= t^2 + 4t\cos(t) + 4\cos^2(t) + 2\ln(t)
\end{aligned}
$$

So

$$
\begin{aligned}
\frac{\mathrm{d}\phi(t)}{\mathrm{d}t}
&= 2t + 4t[-\sin(t)] + 4\cos(t) + 8\cos(t)[-\sin(t)] + 2(1/t) \\
&= 2t - 4t\sin(t) + 4\cos(t) - 8\cos(t)\sin(t) + 2/t
\end{aligned}
$$
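Method 2:

$$
\phi(t) = g(f(t))
$$

So

$$
\begin{aligned}
\frac{\mathrm{d}\phi(t)}{\mathrm{d}t}
&= \begin{pmatrix} \dfrac{\partial g(f(t))}{\partial f_1(t)} & \dfrac{\partial g(f(t))}{\partial f_2(t)} \end{pmatrix}
\begin{pmatrix} \dfrac{\partial f_1(t)}{\partial t} \\[4pt] \dfrac{\partial f_2(t)}{\partial t} \end{pmatrix} \\
&= \begin{pmatrix} 2[t + 2\cos(t)] & 2 \end{pmatrix}
\begin{pmatrix} 1 - 2\sin(t) \\ 1/t \end{pmatrix} \\
&= [2(t + 2\cos(t))][1 - 2\sin(t)] + [2][1/t] \\
&= 2t - 4t\sin(t) + 4\cos(t) - 8\cos(t)\sin(t) + 2/t
\end{aligned}
$$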
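Both methods can also be checked against a finite-difference approximation. The sketch below assumes $t = 1.3$; any $t > 0$ works, since $\ln t$ must be defined. It uses a central difference with step $h = 10^{-6}$, and the function names are illustrative.

```python
# Sketch: compare Method 1, Method 2 (chain rule with Jacobians, (10)),
# and a central finite difference for phi(t) = g(f(t)).
import math

def g(x1, x2):
    return x1 ** 2 + 2 * x2

def f(t):
    return (t + 2 * math.cos(t), math.log(t))

def phi(t):
    return g(*f(t))

def dphi_method1(t):
    """Derivative obtained by expanding phi(t) and differentiating directly."""
    return (2 * t - 4 * t * math.sin(t) + 4 * math.cos(t)
            - 8 * math.cos(t) * math.sin(t) + 2 / t)

def dphi_method2(t):
    """Chain rule: D g(b) at b = f(t), times D f(t)."""
    x1, _ = f(t)
    Dg = [2 * x1, 2]                       # 1 x 2 row vector
    Df = [1 - 2 * math.sin(t), 1 / t]      # 2 x 1 column vector
    return Dg[0] * Df[0] + Dg[1] * Df[1]

def central_difference(func, t, h=1e-6):
    return (func(t + h) - func(t - h)) / (2 * h)

t = 1.3
m1 = dphi_method1(t)
m2 = dphi_method2(t)
fd = central_difference(phi, t)
print(m1, m2, fd)
```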
$$
\frac{\mathrm{d}\phi}{\mathrm{d}t} = \sum_{i=1}^{p} \frac{\partial g}{\partial x_i}\frac{\partial x_i}{\partial t}
$$
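$$
\begin{aligned}
\mathrm{d}h(c; u) &= \mathrm{D}h(c)\,u && \text{by (9)} \\
&= \mathrm{D}g(b)\,\mathrm{D}f(c)\,u && \text{by (10)} \\
&= \mathrm{D}g(b)\,\mathrm{d}f(c; u) && \text{by (9)}
\end{aligned}
\tag{11}
$$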
The Hessian
$$
\lim_{\|u\| \to 0} \frac{r(u)}{\|u\|^2} = 0
\tag{12}
$$
Therefore

$$
\mathrm{d}^2 h(c; u) = \mathrm{d}^2 g\big(b; \mathrm{d}f(c; u)\big) + \mathrm{d}g\big(b; \mathrm{d}^2 f(c; u)\big)
\tag{14}
$$
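$$
\mathrm{D}f(x) = \frac{\partial f(x)}{\partial x^T},
\qquad
\mathrm{D}F(X) = \frac{\partial \operatorname{vec} F(X)}{\partial [\operatorname{vec}(X)]^T}
$$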
(15)
A is the derivative.
1
2 (B(X)
Preliminary Results
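For a constant matrix $A$ and a constant scalar $\alpha$:

$$
\mathrm{d}A = O
\tag{16}
$$

$$
\mathrm{d}(\alpha F) = \alpha\,\mathrm{d}F
\tag{17}
$$

$$
\mathrm{d}(F + G) = \mathrm{d}F + \mathrm{d}G
\tag{18}
$$

$$
\mathrm{d}\operatorname{tr}F = \operatorname{tr}(\mathrm{d}F)
\tag{19}
$$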
(20)
(21)
$$
\mathrm{d}_i(F + G) = \sum_j D_{ij}(F + G)u_j
= \sum_j \big(D_{ij}(F)u_j\big) + \sum_j \big(D_{ij}(G)u_j\big)
= \mathrm{d}_i F + \mathrm{d}_i G
$$

Because linearity holds for an arbitrary element of the differential
vector, it holds for the entire vector of differentials.
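$$
\begin{aligned}
(\mathrm{d}(FG))_{ij} = \mathrm{d}(FG)_{ij}
&= \mathrm{d}\sum_k f_{ik}g_{kj} = \sum_k \mathrm{d}(f_{ik}g_{kj}) \\
&= \sum_k \big[(\mathrm{d}f_{ik})g_{kj} + f_{ik}(\mathrm{d}g_{kj})\big] \\
&= \sum_k \big[(\mathrm{d}f_{ik})g_{kj}\big] + \sum_k \big[f_{ik}(\mathrm{d}g_{kj})\big] \\
&= \big[(\mathrm{d}F)G\big]_{ij} + \big[F(\mathrm{d}G)\big]_{ij}
\end{aligned}
$$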
Scalar Functions
By (17), with $a$ constant,

$$
\mathrm{d}(a^Tx) = a^T\,\mathrm{d}x
\tag{22}
$$

Thus

$$
\mathrm{D}(a^Tx) = a^T
\tag{23}
$$
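$$
\begin{aligned}
\mathrm{d}(x^TAx) &= \mathrm{d}(x)^T Ax + x^T A\,\mathrm{d}(x) && \text{by (20), (17)} \\
&= x^T A^T\,\mathrm{d}(x) + x^T A\,\mathrm{d}(x) && \text{(a scalar equals its transpose)} \\
&= [x^T(A^T + A)]\,\mathrm{d}(x)
\end{aligned}
$$

Thus

$$
\mathrm{d}(x^TAx) = [x^T(A^T + A)]\,\mathrm{d}(x)
\tag{24}
$$

$$
\mathrm{D}(x^TAx) = x^T(A^T + A)
\tag{25}
$$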
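A quick numerical check of (25): the sketch below uses an arbitrary non-symmetric $A$, so that $A^T + A$ differs from $2A$, and compares $x^T(A^T + A)$ with central differences. Because $x^TAx$ is quadratic, the central difference is exact up to rounding error. Helper names are ours.

```python
# Sketch: check D(x^T A x) = x^T (A^T + A), equation (25), against central
# differences for a non-symmetric A. Pure Python, illustrative values only.

def quad(x, A):
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def derivative_25(x, A):
    """Row vector x^T (A^T + A); entry j is sum_i x_i (A_ji + A_ij)."""
    n = len(x)
    return [sum(x[i] * (A[j][i] + A[i][j]) for i in range(n)) for j in range(n)]

def numeric_gradient(func, x, h=1e-4):
    """Central-difference gradient of func at the point x (a list)."""
    grad = []
    for j in range(len(x)):
        up = x[:]
        dn = x[:]
        up[j] += h
        dn[j] -= h
        grad.append((func(up) - func(dn)) / (2 * h))
    return grad

A = [[1.0, 2.0, 0.0], [-1.0, 3.0, 4.0], [2.0, 0.0, -2.0]]
x = [0.5, -1.0, 2.0]

exact = derivative_25(x, A)
approx = numeric_gradient(lambda v: quad(v, A), x)
print(exact)
print(approx)
```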
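$$
\begin{aligned}
\mathrm{d}(a^TXb) &= \operatorname{vec}[\mathrm{d}(a^TXb)] && \text{by (15)} \\
&= \operatorname{vec}[a^T(\mathrm{d}X)b] && \text{by (17)} \\
&= (b^T \otimes a^T)\operatorname{vec}(\mathrm{d}X) && \text{by (4)} \\
&= (b^T \otimes a^T)\,\mathrm{d}\operatorname{vec}(X) && \text{by (15)}
\end{aligned}
$$

$$
\mathrm{d}(a^TXb) = (b^T \otimes a^T)\,\mathrm{d}\operatorname{vec}(X)
\tag{26}
$$

$$
\mathrm{D}(a^TXb) = b^T \otimes a^T
\tag{27}
$$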
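The derivative $\mathrm{D}(a^TXb) = b^T \otimes a^T$ can be checked by perturbing one entry of $X$ at a time, visiting the entries in the same column-major order as $\operatorname{vec}(X)$. This is a minimal sketch with arbitrary values, and `bilinear` and `kron_rows` are illustrative helpers.

```python
# Sketch: check D(a^T X b) = b^T kron a^T by perturbing each entry of X in
# vec (column-major) order. Pure Python, illustrative values.

def bilinear(a, X, b):
    return sum(a[i] * X[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))

def kron_rows(u, v):
    """Kronecker product of row vectors u and v: [u_0*v, u_1*v, ...]."""
    return [ui * vk for ui in u for vk in v]

a = [1.0, -2.0, 0.5]
b = [3.0, 1.0]
X = [[0.2, 1.0], [-1.0, 0.5], [2.0, 0.0]]

h = 1e-6
numeric = []
for j in range(len(b)):          # columns of X, outer loop = vec order
    for i in range(len(a)):      # rows of X
        up = [row[:] for row in X]
        dn = [row[:] for row in X]
        up[i][j] += h
        dn[i][j] -= h
        numeric.append((bilinear(a, up, b) - bilinear(a, dn, b)) / (2 * h))

formula = kron_rows(b, a)        # entry j*m + i equals b_j * a_i
print(formula)                   # [3.0, -6.0, 1.5, 1.0, -2.0, 0.5]
print(numeric)
```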
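$$
\begin{aligned}
\mathrm{d}(a^TXX^Ta) &= \operatorname{vec}[\mathrm{d}(a^TXX^Ta)] && \text{by (15)} \\
&= \operatorname{vec}[a^T(\mathrm{d}X)X^Ta + a^TX\,\mathrm{d}(X^T)a] \\
&= 2\operatorname{vec}[a^T(\mathrm{d}X)X^Ta] && \text{(scalar transpose)} \\
&= 2[(X^Ta)^T \otimes a^T]\operatorname{vec}(\mathrm{d}X) && \text{by (4)} \\
&= 2[(X^Ta)^T \otimes a^T]\,\mathrm{d}\operatorname{vec}(X) && \text{by (15)}
\end{aligned}
$$

$$
\mathrm{d}(a^TXX^Ta) = 2[(X^Ta)^T \otimes a^T]\,\mathrm{d}\operatorname{vec}(X)
\tag{28}
$$

$$
\mathrm{D}(a^TXX^Ta) = 2(X^Ta)^T \otimes a^T
\tag{29}
$$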
Trace Functions
Finding the differentials of trace functions uses the identity

$$
\operatorname{tr}(A^TB) = \operatorname{vec}(A)^T\operatorname{vec}(B)
\tag{30}
$$

$$
\operatorname{tr}(A^TB) = \sum_{j=1}^{n}\sum_{i=1}^{m}(a_{ij}b_{ij}) = \operatorname{vec}(A)^T\operatorname{vec}(B)
$$

Vectorizing two matrices and taking the dot product is equivalent to
summing the products of their corresponding entries.
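Identity (30) on a small example, as a plain-Python sketch (helper names are ours): the trace of $A^TB$ and the dot product of $\operatorname{vec}(A)$ with $\operatorname{vec}(B)$ give the same number.

```python
# Sketch: check tr(A^T B) = vec(A)^T vec(B), equation (30), on a 3 x 2 example.

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def vec(P):
    """Stack the columns of P into one list (column-major order)."""
    return [P[i][j] for j in range(len(P[0])) for i in range(len(P))]

A = [[1, 2], [3, 4], [5, 6]]
B = [[2, 0], [1, -1], [0, 3]]

AtB = matmul(transpose(A), B)
trace = sum(AtB[k][k] for k in range(len(AtB)))
dot = sum(x * y for x, y in zip(vec(A), vec(B)))
print(trace, dot)  # 19 19
```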
$$
= \operatorname{vec}(A)^T\,\mathrm{d}\operatorname{vec}(X)
$$
by (30)
by (17)
(31)
(32)
by (30)
Both the second to third line and the third to fourth line use typical
trace rules (e.g., linearity of traces and cyclic permutation).
(33)
(34)
by (20)
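And we have

$$
\begin{aligned}
\mathrm{d}\operatorname{tr}(X^TX) &= 2\operatorname{tr}[X^T\mathrm{d}(X)] \\
&= 2\operatorname{vec}(X)^T\,\mathrm{d}\operatorname{vec}(X) && \text{by (30)}
\end{aligned}
\tag{35}
$$

which implies

$$
\mathrm{D}\operatorname{tr}(X^TX) = 2\operatorname{vec}(X)^T
\tag{36}
$$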
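A finite-difference check of (36). This sketch computes $\operatorname{tr}(X^TX)$ through an explicit matrix product, using an arbitrary $3 \times 2$ matrix, and perturbs the entries of $X$ in $\operatorname{vec}$ order. Helper names are ours.

```python
# Sketch: check D tr(X^T X) = 2 vec(X)^T, equation (36), with central
# differences over the entries of X taken in vec (column-major) order.

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def vec(P):
    return [P[i][j] for j in range(len(P[0])) for i in range(len(P))]

def trace_XtX(X):
    XtX = matmul(transpose(X), X)
    return sum(XtX[k][k] for k in range(len(XtX)))

X = [[1.0, -0.5], [2.0, 0.25], [0.0, 3.0]]
h = 1e-5
numeric = []
for j in range(len(X[0])):
    for i in range(len(X)):
        up = [row[:] for row in X]
        dn = [row[:] for row in X]
        up[i][j] += h
        dn[i][j] -= h
        numeric.append((trace_XtX(up) - trace_XtX(dn)) / (2 * h))

formula = [2 * v for v in vec(X)]
print(formula)   # [2.0, 4.0, 0.0, -1.0, 0.5, 6.0]
print(numeric)
```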
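$$
\begin{aligned}
\mathrm{d}\operatorname{tr}(XAXB) &= \operatorname{tr}[(\mathrm{d}X)AXB] + \operatorname{tr}[XA(\mathrm{d}X)B] && \text{by (20)} \\
&= \operatorname{tr}[(AXB + BXA)\,\mathrm{d}X] \\
&= \operatorname{vec}[(AXB + BXA)^T]^T\,\mathrm{d}\operatorname{vec}(X) && \text{by (30)}
\end{aligned}
$$

And thus

$$
\mathrm{d}\operatorname{tr}(XAXB) = \operatorname{vec}[(AXB + BXA)^T]^T\,\mathrm{d}\operatorname{vec}(X)
\tag{37}
$$

$$
\mathrm{D}\operatorname{tr}(XAXB) = \operatorname{vec}[(AXB + BXA)^T]^T
\tag{38}
$$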
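The same finite-difference approach checks (38). The sketch below uses arbitrary non-symmetric $2 \times 2$ matrices $A$, $B$, $X$ and compares the numerical gradient, taken in $\operatorname{vec}$ order, with $\operatorname{vec}[(AXB + BXA)^T]$. Helper names are illustrative.

```python
# Sketch: check D tr(XAXB) = vec[(AXB + BXA)^T]^T, equation (38), for
# non-symmetric 2 x 2 matrices. Pure Python; values are illustrative.

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def vec(P):
    return [P[i][j] for j in range(len(P[0])) for i in range(len(P))]

def trace(P):
    return sum(P[k][k] for k in range(len(P)))

def phi(X, A, B):
    return trace(matmul(matmul(matmul(X, A), X), B))

A = [[1.0, 2.0], [0.0, -1.0]]
B = [[0.5, 0.0], [3.0, 1.0]]
X = [[0.3, -1.2], [2.0, 0.7]]

AXB = matmul(matmul(A, X), B)
BXA = matmul(matmul(B, X), A)
S = [[AXB[i][j] + BXA[i][j] for j in range(2)] for i in range(2)]
formula = vec(transpose(S))

h = 1e-5
numeric = []
for j in range(2):               # vec order: columns outer, rows inner
    for i in range(2):
        up = [row[:] for row in X]
        dn = [row[:] for row in X]
        up[i][j] += h
        dn[i][j] -= h
        numeric.append((phi(up, A, B) - phi(dn, A, B)) / (2 * h))

print(formula)
print(numeric)
```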
$$
\sum_i\sum_j x_{ij}^2: \quad A = I,\ B = I,\ \&\ p = 1
$$
(40)
Even though Equations (39) and (40) do not look interesting, they
generalize to all of the matrix differentials on pages 358-359 of Magnus.
For instance:

$$
\begin{aligned}
\mathrm{D}[\operatorname{tr}(X^TX)] &= 1\cdot\operatorname{vec}\{IX[I(X^TIXI)^0] + I^TX[I(X^TIXI)^0]^T\}^T \\
&= \operatorname{vec}[XI + XI^T]^T \\
&= \operatorname{vec}[X + X]^T = 2\operatorname{vec}(X)^T
\end{aligned}
$$
Trace Differentials
Combine terms.
Vector Functions
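$$
\begin{aligned}
\mathrm{d}[A(x)x] &= \mathrm{d}[A(x)]x + A(x)\,\mathrm{d}x && \text{by (20)} \\
&= \operatorname{vec}\{\mathrm{d}[A(x)]x\} + A(x)\,\mathrm{d}x \\
&= \operatorname{vec}\{I\,\mathrm{d}[A(x)]x\} + A(x)\,\mathrm{d}x \\
&= (x^T \otimes I)\operatorname{vec}\{\mathrm{d}[A(x)]\} + A(x)\,\mathrm{d}x \\
&= (x^T \otimes I)\,\mathrm{D}\operatorname{vec}[A(x)]\,\mathrm{d}x + A(x)\,\mathrm{d}x && \text{by (9)} \\
&= \big[(x^T \otimes I)\,\mathrm{D}\operatorname{vec}[A(x)] + A(x)\big]\,\mathrm{d}x
\end{aligned}
\tag{41}
$$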
by (20)
by (24)
(42)
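$$
\begin{aligned}
\mathrm{d}(Xa) &= \operatorname{vec}(\mathrm{d}[X]a) && \text{by (17)} \\
&= \operatorname{vec}(I\,\mathrm{d}[X]a) \\
&= (a^T \otimes I_n)\operatorname{vec}(\mathrm{d}X) && \text{by (4)} \\
&= (a^T \otimes I_n)\,\mathrm{d}\operatorname{vec}(X)
\end{aligned}
\tag{43}
$$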
Matrix Functions
by (20)
by (4)
(44)
by (20)
(45)
by (6)
$$
= K_{mn}\,\mathrm{d}\operatorname{vec}(X)
$$
(46)
by (20)
by (4)
by (6)
by (7)
(47)
by (20)
by (4)
by (6)
(48)
by (21)
by (4)
(49)
The Inverse
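We have

$$
\begin{aligned}
\mathrm{d}(X^{-1}X) &= \mathrm{d}(I) \\
\mathrm{d}(X^{-1}X) &= O && \text{by (16)} \\
\mathrm{d}(X^{-1})X + X^{-1}\mathrm{d}(X) &= O && \text{by (21)} \\
\mathrm{d}(X^{-1})X &= -X^{-1}\mathrm{d}(X) \\
\mathrm{d}(X^{-1}) &= -X^{-1}\mathrm{d}(X)X^{-1}
\end{aligned}
$$

And

$$
\mathrm{d}(X^{-1}) = -X^{-1}\,\mathrm{d}(X)\,X^{-1}
\tag{50}
$$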
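Result (50) can be checked by moving $X$ a small step along an arbitrary direction $E$, which plays the role of $\mathrm{d}X$, and comparing a central difference of $X^{-1}$ with $-X^{-1}EX^{-1}$. This sketch is restricted to $2 \times 2$ matrices so that the inverse has a closed form; `inv2` is a hypothetical helper.

```python
# Sketch: check d(X^{-1}) = -X^{-1} (dX) X^{-1}, equation (50), by moving X
# a small step along a direction E and comparing with a central difference.

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def inv2(X):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

X = [[2.0, 1.0], [0.5, 3.0]]
E = [[0.3, -1.0], [0.7, 0.2]]       # plays the role of dX
h = 1e-6

Xp = [[X[i][j] + h * E[i][j] for j in range(2)] for i in range(2)]
Xm = [[X[i][j] - h * E[i][j] for j in range(2)] for i in range(2)]
Ip, Im = inv2(Xp), inv2(Xm)
numeric = [[(Ip[i][j] - Im[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

Xi = inv2(X)
formula = [[-v for v in row] for row in matmul(matmul(Xi, E), Xi)]
print(formula)
print(numeric)
```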
Vectorize everything.
Isolate $\mathrm{d}(X^{-1})$.
by (50)
(51)
(52)
(53)
maps y into the space orthogonal to the predictors (the error space).
$Hy = \hat{y}$
$My = y - \hat{y}$
by (21)
by (50)
by (21)
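$$
\begin{aligned}
&= I_{m^2}\,[X(X^TX)^{-1} \otimes M]\,\mathrm{d}\operatorname{vec}(X) + K_{mm}\,[X(X^TX)^{-1} \otimes M]\,\mathrm{d}\operatorname{vec}(X) && \text{by (6), (7)} \\
&= (I_{m^2} + K_{mm})\,[X(X^TX)^{-1} \otimes M]\,\mathrm{d}\operatorname{vec}(X)
\end{aligned}
\tag{54}
$$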
Find the differential with respect to $\mathrm{d}\operatorname{vec}(X)$, but use the duplication matrix to
limit the freely varying terms to those on and below the diagonal.
by (16)
d(X)X
by (50)
by (4)
by the symmetry of $X^{-1}$
AX
] d[Dn vech(X)]
(55)
(56)
(57)
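Inverse Example 4: $\phi = \iota^TX^{-1}\iota$
Assume $\iota$ is a vector of 1s. Then

$$
\phi = \iota^TX^{-1}\iota
$$

is the sum of all of the elements in $X^{-1}$.
Now, if $X$ is symmetric, then

$$
\begin{aligned}
\mathrm{d}\operatorname{vec}(\phi) &= \operatorname{vec}[\mathrm{d}(\iota^TX^{-1}\iota)] \\
&= \operatorname{vec}[\iota^T\mathrm{d}(X^{-1})\iota] && \text{by (16)} \\
&= \operatorname{vec}[-\iota^TX^{-1}\mathrm{d}(X)X^{-1}\iota] && \text{by (50)} \\
&= -[(X^{-1}\iota)^T \otimes \iota^TX^{-1}]\,\mathrm{d}\operatorname{vec}(X) && \text{by (4)} \\
&= -[\iota^TX^{-1} \otimes \iota^TX^{-1}]\,D_n\,\mathrm{d}\operatorname{vech}(X)
\end{aligned}
\tag{58}
$$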
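$$
\phi(x) = e^x = \sum_{k=0}^{\infty}\frac{\phi^{(k)}(0)}{k!}x^k
= \sum_{k=0}^{\infty}\frac{x^k}{k!}
= 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + \cdots
$$

$$
e^{f(x)} = \sum_{k=0}^{\infty}\frac{f(x)^k}{k!}
$$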
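$$
\begin{aligned}
\mathrm{d}(e^{xA}) &= \mathrm{d}\left(\sum_{k=0}^{\infty}\frac{(xA)^k}{k!}\right)
= \mathrm{d}\left(\sum_{k=0}^{\infty}\frac{x^kA^k}{k!}\right) \\
&= \sum_{k=0}^{\infty}\frac{\mathrm{d}(x^kA^k)}{k!}
= \sum_{k=0}^{\infty}\frac{\mathrm{d}(x^k)A^k}{k!} && \text{by (16)} \\
&= \sum_{k=0}^{\infty}\frac{[kx^{k-1}\,\mathrm{d}x]A^k}{k!}
= \sum_{k=1}^{\infty}\frac{x^{k-1}A^k}{(k-1)!}\,\mathrm{d}x && (\text{the } k = 0 \text{ term is zero})
\end{aligned}
$$

$$
\sum_{k=1}^{\infty}\frac{x^{k-1}A^k}{(k-1)!}\,\mathrm{d}x
= \sum_{k=1}^{\infty}\frac{x^{k-1}[AA^{k-1}]}{(k-1)!}\,\mathrm{d}x
= A\sum_{k=1}^{\infty}\frac{x^{k-1}A^{k-1}}{(k-1)!}\,\mathrm{d}x
$$

And set $m = k - 1$:

$$
\mathrm{d}(e^{xA}) = A\sum_{m=0}^{\infty}\frac{x^mA^m}{m!}\,\mathrm{d}x
= A\sum_{m=0}^{\infty}\frac{(xA)^m}{m!}\,\mathrm{d}x
= Ae^{xA}\,\mathrm{d}x
\tag{59}
$$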
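A numerical check of (59). The sketch approximates $e^{xA}$ by its first 30 series terms, an assumption that is adequate only for small matrices like the arbitrary $2 \times 2$ matrix $A$ used here; a production implementation would use a dedicated matrix-exponential routine instead. Helper names are ours.

```python
# Sketch: check d(e^{xA}) = A e^{xA} dx, equation (59), with a central
# difference in x. e^{xA} is approximated by a truncated power series.

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def scale(P, s):
    return [[s * v for v in row] for row in P]

def expm_series(M, terms=30):
    """Sum_{k < terms} M^k / k!, building each term from the previous one."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = scale(matmul(term, M), 1.0 / k)
        result = [[r + t for r, t in zip(rr, tt)] for rr, tt in zip(result, term)]
    return result

A = [[0.0, 1.0], [-0.5, 0.3]]
x, h = 0.7, 1e-5

Ep = expm_series(scale(A, x + h))
Em = expm_series(scale(A, x - h))
numeric = [[(Ep[i][j] - Em[i][j]) / (2 * h) for j in range(2)] for i in range(2)]
formula = matmul(A, expm_series(scale(A, x)))
print(formula)
print(numeric)
```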
$$
F(X) = \exp(X) = \sum_{k=0}^{\infty}\frac{1}{k!}X^k
\tag{60}
$$
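$$
\begin{aligned}
\mathrm{d}F(X) &= \sum_{k=0}^{\infty}\frac{1}{k!}\,\mathrm{d}(X^k) \\
&= \sum_{k=0}^{\infty}\frac{1}{k!}\big[(\mathrm{d}X)X^{k-1} + X(\mathrm{d}X)X^{k-2} + \cdots + X^{k-1}(\mathrm{d}X)\big] && \text{by (21)} \\
&= \sum_{k=0}^{\infty}\frac{1}{k!}\sum_{j=0}^{k-1}X^j(\mathrm{d}X)X^{k-j-1} \\
&= \sum_{k=1}^{\infty}\frac{1}{k!}\sum_{j=0}^{k-1}X^j(\mathrm{d}X)X^{k-j-1}
\end{aligned}
$$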
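$$
\begin{aligned}
\mathrm{d}F(X) &= \sum_{k=1}^{\infty}\frac{1}{k!}\sum_{j=0}^{k-1}X^j(\mathrm{d}X)X^{k-j-1} \\
&= \sum_{m=0}^{\infty}\frac{1}{(m+1)!}\sum_{j=0}^{m}X^j(\mathrm{d}X)X^{m-j}
\end{aligned}
$$

$$
\begin{aligned}
\operatorname{tr}\mathrm{d}[\exp(X)] &= \operatorname{tr}\sum_{m=0}^{\infty}\frac{1}{(m+1)!}\sum_{j=0}^{m}X^j(\mathrm{d}X)X^{m-j} \\
&= \sum_{m=0}^{\infty}\frac{1}{(m+1)!}\sum_{j=0}^{m}\operatorname{tr}\big[X^j(\mathrm{d}X)X^{m-j}\big] \\
&= \sum_{m=0}^{\infty}\frac{1}{(m+1)!}(m+1)\operatorname{tr}\big[X^{m-j}X^j(\mathrm{d}X)\big] \\
&= \operatorname{tr}\left(\sum_{m=0}^{\infty}\frac{1}{m!}X^m(\mathrm{d}X)\right) \\
&= \operatorname{tr}\big[\exp(X)\,\mathrm{d}(X)\big] = \operatorname{vec}[\exp(X)^T]^T\,\mathrm{d}\operatorname{vec}(X)
\end{aligned}
\tag{61}
$$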
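Equation (61) says that the gradient of $\operatorname{tr}\exp(X)$ with respect to $x_{ij}$ is the $(j, i)$ entry of $\exp(X)$. The sketch below checks this entry by entry with central differences, again approximating $\exp$ by a truncated series (an assumption suited to small matrices only). Helper names are ours.

```python
# Sketch: check d tr exp(X) = tr[exp(X) dX], equation (61), entry by entry.
# exp(X) is approximated by a truncated power series (adequate for small X).

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def expm_series(M, terms=30):
    """Sum_{k < terms} M^k / k!, building each term from the previous one."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, M)]
        result = [[r + t for r, t in zip(rr, tt)] for rr, tt in zip(result, term)]
    return result

def trace(P):
    return sum(P[k][k] for k in range(len(P)))

X = [[0.1, 0.4], [-0.2, 0.3]]
h = 1e-5
numeric = [[0.0, 0.0], [0.0, 0.0]]
for i in range(2):
    for j in range(2):
        up = [row[:] for row in X]
        dn = [row[:] for row in X]
        up[i][j] += h
        dn[i][j] -= h
        numeric[i][j] = (trace(expm_series(up)) - trace(expm_series(dn))) / (2 * h)

E = expm_series(X)
formula = [[E[j][i] for j in range(2)] for i in range(2)]   # exp(X)^T
print(formula)
print(numeric)
```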
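$$
\begin{aligned}
\phi(x) = \ln(1+x) &= \sum_{k=0}^{\infty}\frac{\phi^{(k)}(0)}{k!}x^k
= \phi(0) + \phi'(0)x + \frac{\phi''(0)x^2}{2!} + \frac{\phi^{(3)}(0)x^3}{3!} + \frac{\phi^{(4)}(0)x^4}{4!} + \cdots \\
&= \ln(1+0) + \frac{x}{1+0} + \frac{(-1)x^2}{2!(1+0)^2} + \frac{(-2)(-1)x^3}{3!(1+0)^3} + \cdots \\
&= 0 + x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots
= \sum_{k=1}^{\infty}\frac{(-1)^{k+1}x^k}{k}
\end{aligned}
$$

$$
\ln(1-x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \frac{x^4}{4} - \cdots = -\sum_{k=1}^{\infty}\frac{x^k}{k}
$$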
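$$
\begin{aligned}
\mathrm{d}\big[\ln(I_n - xA)\big]
&= \mathrm{d}\left(-\sum_{k=1}^{\infty}\frac{(xA)^k}{k}\right)
= -\sum_{k=1}^{\infty}\frac{\mathrm{d}(x^k)A^k}{k} && \text{by (16)} \\
&= -\sum_{k=1}^{\infty}\frac{kx^{k-1}\,\mathrm{d}x\,A^k}{k} \\
&= -A\sum_{k=1}^{\infty}[x^{k-1}A^{k-1}]\,\mathrm{d}x \\
&= -A\sum_{m=0}^{\infty}[xA]^m\,\mathrm{d}x
\end{aligned}
$$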
$$
\sum_{k=0}^{\infty}x^k = \frac{1}{1-x}
$$

$$
\begin{aligned}
\mathrm{d}\big[\ln(I_n - xA)\big] &= -A\sum_{m=0}^{\infty}[xA]^m\,\mathrm{d}x \\
&= -A(I_n - xA)^{-1}\,\mathrm{d}x
\end{aligned}
\tag{62}
$$
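Finally, a check of (62). The sketch evaluates $\ln(I_n - xA)$ through the series $-\sum_k (xA)^k/k$, which assumes $xA$ is small enough for the series to converge (it is, for the arbitrary values chosen here). It then compares a central difference in $x$ with $-A(I_n - xA)^{-1}$. Helper names, including `log_I_minus` and `inv2`, are ours.

```python
# Sketch: check d ln(I_n - xA) = -A (I_n - xA)^{-1} dx, equation (62). The
# logarithm is evaluated through the series -sum_k (xA)^k / k, which assumes
# xA is small enough for the series to converge.

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def scale(P, s):
    return [[s * v for v in row] for row in P]

def inv2(X):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def log_I_minus(M, terms=100):
    """ln(I - M) approximated by -sum_{k=1}^{terms} M^k / k."""
    n = len(M)
    power = [[float(i == j) for j in range(n)] for i in range(n)]
    result = [[0.0] * n for _ in range(n)]
    for k in range(1, terms + 1):
        power = matmul(power, M)                     # M^k
        result = [[r - pk / k for r, pk in zip(rr, pr)]
                  for rr, pr in zip(result, power)]
    return result

A = [[0.2, 0.5], [-0.1, 0.3]]
x, h = 0.8, 1e-5

Lp = log_I_minus(scale(A, x + h))
Lm = log_I_minus(scale(A, x - h))
numeric = [[(Lp[i][j] - Lm[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

I_minus_xA = [[float(i == j) - x * A[i][j] for j in range(2)] for i in range(2)]
formula = scale(matmul(A, inv2(I_minus_xA)), -1.0)
print(formula)
print(numeric)
```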
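$$
F(X) = \ln(I_n - X) = -\sum_{k=1}^{\infty}\frac{1}{k}X^k
$$

To take the differential of $\ln(I_n - X)$, notice that we ultimately use the
same expansion as for the exponential differential.

$$
\begin{aligned}
\mathrm{d}F(X) = \mathrm{d}\big[\ln(I_n - X)\big]
&= \mathrm{d}\left(-\sum_{k=1}^{\infty}\frac{1}{k}X^k\right) \\
&= -\sum_{k=1}^{\infty}\frac{1}{k}\,\mathrm{d}(X^k) && \text{by (16)} \\
&= -\sum_{k=1}^{\infty}\frac{1}{k}\sum_{j=0}^{k-1}X^j(\mathrm{d}X)X^{k-j-1}
\end{aligned}
$$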
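$$
\begin{aligned}
\mathrm{d}F(X) &= -\sum_{k=1}^{\infty}\frac{1}{k}\sum_{j=0}^{k-1}X^j(\mathrm{d}X)X^{k-j-1} \\
&= -\sum_{m=0}^{\infty}\frac{1}{m+1}\sum_{j=0}^{m}X^j(\mathrm{d}X)X^{m-j}
\end{aligned}
$$

$$
\begin{aligned}
\operatorname{tr}\mathrm{d}\big[\ln(I_n - X)\big]
&= -\operatorname{tr}\sum_{m=0}^{\infty}\frac{1}{m+1}\sum_{j=0}^{m}X^j(\mathrm{d}X)X^{m-j} \\
&= -\sum_{m=0}^{\infty}\frac{1}{m+1}\sum_{j=0}^{m}\operatorname{tr}\big[X^j(\mathrm{d}X)X^{m-j}\big] \\
&= -\sum_{m=0}^{\infty}\frac{1}{m+1}(m+1)\operatorname{tr}\big[X^{m-j}X^j(\mathrm{d}X)\big] \\
&= -\operatorname{tr}\left(\sum_{m=0}^{\infty}X^m(\mathrm{d}X)\right) \\
&= -\operatorname{tr}\big[(I_n - X)^{-1}\,\mathrm{d}X\big] \\
&= -\operatorname{vec}\big[\big((I_n - X)^{-1}\big)^T\big]^T\,\mathrm{d}\operatorname{vec}(X)
\end{aligned}
\tag{63}
$$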
References