02 Lecture
Edgar Solomonik
It's often helpful to use alternative views of the same collection of elements:
I Folding a tensor yields a higher-order tensor with the same elements
I Unfolding a tensor yields a lower-order tensor with the same elements
I In linear algebra, we have the unfolding v = vec(A), which stacks the
columns of A ∈ R^{m×n} to produce v ∈ R^{mn}
I For a tensor T ∈ R^{s1×s2×s3}, v = vec(T) gives v ∈ R^{s1 s2 s3} with
v_{i1 + (i2−1)s1 + (i3−1)s1 s2} = t_{i1 i2 i3}
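A minimal NumPy sketch of vec for a matrix and an order-3 tensor (the shapes are illustrative; column-major order matches the column-stacking definition of vec):

```python
import numpy as np

# vec stacks columns: for A in R^{m x n}, v = vec(A) in R^{mn}
A = np.arange(6).reshape(2, 3)          # m = 2, n = 3
v = A.reshape(-1, order="F")            # column-major (Fortran-order) flattening
# v[i + j*m] == A[i, j] for all i, j
assert all(v[i + j * 2] == A[i, j] for i in range(2) for j in range(3))

# the same flattening applies to an order-3 tensor T in R^{s1 x s2 x s3}
T = np.arange(24).reshape(2, 3, 4)      # s1, s2, s3 = 2, 3, 4
w = T.reshape(-1, order="F")
# w[i1 + i2*s1 + i3*s1*s2] == T[i1, i2, i3]
assert w[1 + 2 * 2 + 3 * 2 * 3] == T[1, 2, 3]
```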
I What is a tensor?
I A collection of numbers arranged into an array of a particular order, with
dimensions l × m × n × ···, e.g., T ∈ R^{l×m×n} is order 3
I A multilinear operator z = f_T(x, y), given elementwise by
z_i = Σ_{j,k} t_{ijk} x_j y_k
I A multilinear form
Σ_{i,j,k} t_{ijk} x_i y_j z_k
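Both views can be sketched with np.einsum (shapes and the random seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((4, 5, 6))
x = rng.standard_normal(4)
y = rng.standard_normal(5)
z = rng.standard_normal(6)

# multilinear operator: w_i = sum_{j,k} t_{ijk} y_j z_k (contract modes 2 and 3)
w = np.einsum("ijk,j,k->i", T, y, z)

# multilinear form: the scalar sum_{i,j,k} t_{ijk} x_i y_j z_k
form = np.einsum("ijk,i,j,k->", T, x, y, z)

# the form is the operator output contracted once more against x
assert np.isclose(form, x @ w)
```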
Tensor Transposition
For tensors of order ≥ 3, there is more than one way to transpose modes: any
permutation of the modes defines a transposition, generalizing the matrix
transpose b_{i2 i1} = a_{i1 i2}
The Kronecker product between two matrices A ∈ R^{m1×m2} and B ∈ R^{n1×n2} is
the matrix A ⊗ B ∈ R^{m1 n1 × m2 n2} composed of the blocks a_{ij}B
I This form omits 'Hadamard indices', i.e., indices that appear in both inputs
and the output (as in the pointwise (Hadamard) product and batched matrix
multiplication)
I Other contractions can be mapped to this form after transposition
Unfolding the tensors reduces the tensor contraction to matrix multiplication
I Combine (unfold) consecutive indices in appropriate groups of size s, t, or v
I If all tensor modes are of dimension n, obtain the matrix–matrix product
C = AB where C ∈ R^{n^s × n^t}, A ∈ R^{n^s × n^v}, and B ∈ R^{n^v × n^t}
I Assuming classical matrix multiplication, the contraction requires n^{s+t+v}
elementwise products and n^{s+t+v} − n^{s+t} additions
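A NumPy sketch of this reduction, assuming illustrative group sizes s = t = v = 2 and mode dimension n = 3:

```python
import numpy as np

n, s, t, v = 3, 2, 2, 2     # all modes of dimension n; index groups of size s, t, v
rng = np.random.default_rng(1)
A = rng.standard_normal((n,) * (s + v))   # a_{abcd}
B = rng.standard_normal((n,) * (v + t))   # b_{cdij}

# direct contraction: c_{abij} = sum_{cd} a_{abcd} b_{cdij}
C = np.einsum("abcd,cdij->abij", A, B)

# same contraction as an (n^s x n^v) times (n^v x n^t) matrix product,
# obtained by unfolding (grouping) consecutive indices
Cmat = A.reshape(n**s, n**v) @ B.reshape(n**v, n**t)
assert np.allclose(C.reshape(n**s, n**t), Cmat)
```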
Properties of Einsums
Given an elementwise expression containing a product of tensors, the operands
commute
I For example, AB ≠ BA, but
Σ_k a_ik b_kj = Σ_k b_kj a_ik
I Similarly, with multiple terms, we can bring summations out and reorder them
as needed, e.g., for ABC,
Σ_k a_ik (Σ_l b_kl c_lj) = Σ_{k,l} c_lj b_kl a_ik
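These properties can be checked numerically with np.einsum (random 3×3 matrices assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))

# operand order in the elementwise expression does not matter
AB1 = np.einsum("ik,kj->ij", A, B)
AB2 = np.einsum("kj,ik->ij", B, A)
assert np.allclose(AB1, AB2)

# summations can be nested or merged: sum_k a_ik (sum_l b_kl c_lj)
ABC_nested = np.einsum("ik,kj->ij", A, np.einsum("kl,lj->kj", B, C))
ABC_merged = np.einsum("ik,kl,lj->ij", A, B, C)
assert np.allclose(ABC_nested, ABC_merged)
```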
I W_(m) and U_(m) are unfoldings where the mth mode is mapped to be an index
into the rows of the matrix
I To perform multiple tensor-times-matrix products, we can write, e.g.,
W = U ×_1 X ×_2 Y ×_3 Z ⇒ w_{ijk} = Σ_{p,q,r} u_{pqr} x_{ip} y_{jq} z_{kr}
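A NumPy sketch of the multi-TTM product, both as one einsum and one mode at a time (dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
U = rng.standard_normal((4, 5, 6))
X = rng.standard_normal((4, 4))
Y = rng.standard_normal((5, 5))
Z = rng.standard_normal((6, 6))

# W = U x_1 X x_2 Y x_3 Z in one einsum: w_ijk = sum_pqr u_pqr x_ip y_jq z_kr
W = np.einsum("pqr,ip,jq,kr->ijk", U, X, Y, Z)

# equivalently, one tensor-times-matrix product per mode
W1 = np.einsum("pqr,ip->iqr", U, X)      # U x_1 X
W2 = np.einsum("iqr,jq->ijr", W1, Y)     # ... x_2 Y
W3 = np.einsum("ijr,kr->ijk", W2, Z)     # ... x_3 Z
assert np.allclose(W, W3)
```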
(A ⊗ B)(C ⊗ D) = AC ⊗ BD
(A ⊙ B)^T (C ⊙ D) = A^T C ∗ B^T D, where ⊙ is the Khatri–Rao product and ∗ the Hadamard product
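Both identities can be verified numerically; the `khatri_rao` helper below is an illustrative implementation of the column-wise Kronecker product (shapes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
A, C = rng.standard_normal((3, 4)), rng.standard_normal((3, 4))
B, D = rng.standard_normal((5, 4)), rng.standard_normal((5, 4))

# mixed-product property of the Kronecker product
M1, M2 = rng.standard_normal((4, 2)), rng.standard_normal((4, 2))
lhs_kron = np.kron(A, B) @ np.kron(M1, M2)
rhs_kron = np.kron(A @ M1, B @ M2)
assert np.allclose(lhs_kron, rhs_kron)

def khatri_rao(A, B):
    # column-wise Kronecker product: column j is A[:, j] (x) B[:, j]
    return np.einsum("ik,jk->ijk", A, B).reshape(A.shape[0] * B.shape[0], -1)

# (A (.) B)^T (C (.) D) = (A^T C) * (B^T D), with * the Hadamard product
lhs_kr = khatri_rao(A, B).T @ khatri_rao(C, D)
rhs_kr = (A.T @ C) * (B.T @ D)
assert np.allclose(lhs_kr, rhs_kr)
```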
I Denoting S^{n−1} ⊂ R^n as the unit sphere (the set of vectors with norm one),
we define the tensor operator (spectral) norm to generalize the matrix 2-norm as
‖T‖_2 = sup_{x^(1),...,x^(d) ∈ S^{n−1}} Σ_{i1...id} t_{i1...id} x^(1)_{i1} ··· x^(d)_{id}
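For d = 2 this definition recovers the matrix 2-norm; a small NumPy check (random matrix assumed) confirms that the top singular vector pair attains the supremum:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 6))

# sup over unit vectors x, y of sum_ij a_ij x_i y_j is the top singular value
U, s, Vt = np.linalg.svd(A)
x, y = U[:, 0], Vt[0, :]                 # top singular vector pair (unit norm)
attained = np.einsum("ij,i,j->", A, x, y)
assert np.isclose(attained, s[0])
assert np.isclose(s[0], np.linalg.norm(A, 2))

# any other pair of unit vectors gives a value no larger
u = rng.standard_normal(6); u /= np.linalg.norm(u)
v = rng.standard_normal(6); v /= np.linalg.norm(v)
assert abs(np.einsum("ij,i,j->", A, u, v)) <= s[0] + 1e-12
```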
The tensor train decomposition expresses an order-d tensor via a chain of cores
of order at most 3:
t_{i1...id} = Σ_{r1=1}^{R1} ··· Σ_{rd−1=1}^{Rd−1} u^(1)_{i1 r1} (∏_{j=2}^{d−1} u^(j)_{rj−1 ij rj}) u^(d)_{rd−1 id}
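The chain structure can be sketched in NumPy for a small order-4 tensor (dimensions and ranks are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n, R = 3, 2                 # all dimensions n, all TT ranks R, order d = 4

# TT cores: first and last are matrices, interior cores are order 3
U1 = rng.standard_normal((n, R))          # u^(1)_{i1 r1}
U2 = rng.standard_normal((R, n, R))       # u^(2)_{r1 i2 r2}
U3 = rng.standard_normal((R, n, R))       # u^(3)_{r2 i3 r3}
U4 = rng.standard_normal((R, n))          # u^(4)_{r3 i4}

# contract over the rank indices r1, r2, r3 to reconstruct t_{i1 i2 i3 i4}
T = np.einsum("ia,ajb,bkc,cl->ijkl", U1, U2, U3, U4)

# each entry is a product of matrices/vectors along the "train"
entry = U1[0] @ U2[:, 1, :] @ U3[:, 2, :] @ U4[:, 0]
assert np.isclose(T[0, 1, 2, 0], entry)
```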
We can compare the aforementioned decompositions for an order-d tensor with all
dimensions equal to n and all decomposition ranks equal to R:
decomposition        | CP                | Tucker     | tensor train
---------------------|-------------------|------------|-------------------
size                 | dnR               | dnR + R^d  | 2nR + (d−2)nR^2
uniqueness           | if R ≤ (3n−2)/2   | no         | no
orthogonalizability  | none              | partial    | partial
exact decomposition  | NP hard           | O(n^{d+1}) | O(n^{d+1})
approximation        | NP hard           | NP hard    | NP hard
typical method       | ALS               | HOSVD      | TT-ALS (implicit)