15 Tensors and Hypermatrices

Lek-Heng Lim
University of Chicago

15.1 Hypermatrices
15.2 Tensors and Multilinear Functionals
15.3 Tensor Rank
15.4 Border Rank
15.5 Generic and Maximal Rank
15.6 Rank-Retaining Decomposition
15.7 Multilinear Rank
15.8 Norms
15.9 Hyperdeterminants
15.10 Odds and Ends
References
Most chapters in this handbook are concerned with various aspects and implications of lin-
earity; Chapter 14 and this chapter are unusual in that they are about multilinearity. Just as
linear operators and their coordinate representations, i.e., matrices, are the main objects of
interest in other chapters, tensors and their coordinate representations, i.e., hypermatrices,
are the main objects of interest in this chapter. The parallel is summarized in the following schematic:

    linear operator  <-->  matrix
    tensor  <-->  hypermatrix
Chapter 14, or indeed the monographs on multilinear algebra such as [Gre78, Mar23,
Nor84, Yok92], are about properties of a whole space of tensors. This chapter is about
properties of a single tensor and its coordinate representation, a hypermatrix.
The first two sections introduce (1) a hypermatrix, (2) a tensor as an element of a tensor
product of vector spaces, its coordinate representation as a hypermatrix, and a tensor as a
multilinear functional. The next sections discuss the various generalizations of well-known
linear algebraic and matrix theoretic notions, such as rank, norm, and determinant, to
tensors and hypermatrices. The realization that these notions may be defined for order-d
hypermatrices where d > 2 and that there are reasonably complete theories which parallel
and generalize those for usual 2-dimensional matrices is a recent one. However, some of these
hypermatrix notions have roots that go back as early as those for matrices. For example,
the determinant of a 2 × 2 × 2 hypermatrix can be found in Cayley’s 1845 article [Cay45];
in fact, he studied 2-dimensional matrices and d-dimensional ones on an equal footing. The
final section describes material that is omitted from this chapter for reasons of space.
In modern mathematics, there is a decided preference for coordinate-free, basis-independ-
ent ways of defining objects but we will argue here that this need not be the best strategy.
The view of tensors as hypermatrices, while strictly speaking incorrect, is nonetheless a very
useful device. First, it gives us a concrete way to think about tensors, one that allows a
parallel to the usual matrix theory. Second, a hypermatrix is what we often get in practice:
As soon as measurements are performed in some units, bases are chosen implicitly, and the
values of the measurements are then recorded in the form of a hypermatrix. (There are of
course good reasons not to just stick to the hypermatrix view entirely.)
We have strived to keep this chapter as elementary as possible, to show the reader that
studying hypermatrices is in many instances no more difficult than studying the usual 2-
dimensional matrices. Many exciting developments have to be omitted because they require
too much background to describe.
Unless otherwise specified, everything discussed in this chapter applies to tensors or hy-
permatrices of arbitrary order d ≥ 2 and all may be regarded as appropriate generalizations
of properties of linear operators or matrices in the sense that they agree with the usual
definitions when specialized to order d = 2. When notational simplicity is desired and when
nothing essential is lost, we shall assume d = 3 and phrase our discussions in terms of
3-tensors. We will sometimes use the notation ⟨n⟩ := {1, . . . , n} for any n ∈ N. The bases
in this chapter are always implicitly ordered according to their integer indices. All vector
spaces in this chapter are finite dimensional.
We use standard notation for groups and modules. Sd is the symmetric group of permu-
tations on d elements. An Sd -module means a C[Sd ]-module, where C[Sd ] is the set of all
formal linear combinations of elements in Sd with complex coefficients (see, e.g., [AW92]).
The general linear group of the vector space V is the group GL(V ) of linear isomorphisms from V onto itself, with function composition as the group operation. GL(n, F ) is the general linear group
of invertible n × n matrices over F . We will however introduce a shorthand for products of
such classical groups, writing
GL(n1 , . . . , nd , F ) := GL(n1 , F ) × · · · × GL(nd , F ),
and likewise for SL(n1 , . . . , nd , F ) (where SL(n, F ) is the special linear group of n × n
matrices over F having determinant one) and U(n1 , . . . , nd , C) (where U(n, C) is the group
of n × n unitary matrices).
In this chapter, as in most other discussions of tensors in mathematics, we use ⊗ in
multiple ways: (i) When applied to abstract vector spaces U , V , W , the notation U ⊗V ⊗W
is a tensor product space as defined in Section 15.2; (ii) when applied to vectors u, v, w from
abstract vector spaces U , V , W , the notation u ⊗ v ⊗ w is a symbol for a special element of
U ⊗ V ⊗ W ; (iii) when applied to l-, m-, n-tuples in F^l, F^m, F^n, it means the Segre outer product as defined in Section 15.1; (iv) when applied to F^l, F^m, F^n, the notation F^l ⊗ F^m ⊗ F^n means the set of all linear combinations of Segre outer products like those in Eq. (15.3). Nonetheless, these uses are all consistent with one another.
15.1 Hypermatrices
What is the difference between an m × n matrix A ∈ C^{m×n} and an mn-tuple a ∈ C^{mn}? The
immediate difference is a superficial one: Both are lists of mn complex numbers except that
we usually write A as a 2-dimensional array of numbers and a as a 1-dimensional array of
numbers. The more important distinction comes from consideration of the natural group
actions on Cm×n and Cmn . One may multiply A ∈ Cm×n on “two sides” independently by
an m×m matrix and an n×n matrix, whereas one may only multiply a ∈ Cmn on “one side”
by an mn × mn matrix. In algebraic parlance, Cm×n is a Cm×m × Cn×n -module whereas
Cmn is a Cmn×mn -module. This extends to any order-d hypermatrices (i.e., d-dimensional
matrices).
In Sections 15.3 to 15.9 we will be discussing various properties of hypermatrices and
tensors. Most of these are generalizations of well-known notions for matrices or order-2
tensors. Since the multitude of indices when discussing an order-d hypermatrix can be
distracting, for many of the discussions we assume that d = 3. The main differences between
usual matrices and hypermatrices come from the transition from d = 2 to d = 3. An
advantage of emphasizing 3-hypermatrices is that these may be conveniently written down
on a 2-dimensional piece of paper as a list of usual matrices. This is illustrated in the
examples.
Definitions:
For n_1, . . . , n_d ∈ N, a function f : ⟨n_1⟩ × · · · × ⟨n_d⟩ → F is a hypermatrix, also called an order-d hypermatrix or d-hypermatrix. We often just write a_{k_1···k_d} to denote the value f(k_1, . . . , k_d) of f at (k_1, . . . , k_d) and think of f (renamed as A) as specified by a d-dimensional table of values, writing A = [a_{k_1···k_d}]_{k_1,...,k_d=1}^{n_1,...,n_d}, or A = [a_{k_1···k_d}].
The set of order-d hypermatrices (with domain ⟨n_1⟩ × · · · × ⟨n_d⟩) is denoted by F^{n_1×···×n_d}, and we define entrywise addition and scalar multiplication: for any [a_{k_1···k_d}], [b_{k_1···k_d}] ∈ F^{n_1×···×n_d} and γ ∈ F , [a_{k_1···k_d}] + [b_{k_1···k_d}] := [a_{k_1···k_d} + b_{k_1···k_d}] and γ[a_{k_1···k_d}] := [γa_{k_1···k_d}].
The standard basis for F^{n_1×···×n_d} is E := {E_{k_1k_2···k_d} : 1 ≤ k_1 ≤ n_1, . . . , 1 ≤ k_d ≤ n_d}, where E_{k_1k_2···k_d} denotes the hypermatrix with 1 in the (k_1, k_2, . . . , k_d)-coordinate and 0s everywhere else.
Let X_1 = [x_{ij}^{(1)}] ∈ F^{m_1×n_1}, . . . , X_d = [x_{ij}^{(d)}] ∈ F^{m_d×n_d} and A ∈ F^{n_1×···×n_d}. Define multilinear matrix multiplication by A′ = (X_1, . . . , X_d) · A ∈ F^{m_1×···×m_d} where

    a′_{j_1···j_d} = Σ_{k_1,...,k_d=1}^{n_1,...,n_d} x_{j_1k_1}^{(1)} · · · x_{j_dk_d}^{(d)} a_{k_1···k_d}   for j_1 ∈ ⟨m_1⟩, . . . , j_d ∈ ⟨m_d⟩.   (15.1)
The Segre outer product of a = [a_1, . . . , a_l]^T ∈ F^l, b = [b_1, . . . , b_m]^T ∈ F^m, and c = [c_1, . . . , c_n]^T ∈ F^n is

    a ⊗ b ⊗ c := [a_ib_jc_k]_{i,j,k=1}^{l,m,n} ∈ F^{l×m×n}.   (15.3)
In general if A ∈ F^{n_1×···×n_d} and B ∈ F^{m_1×···×m_e} have an index with a common range, say, n_p = m_q = n, then C ∈ F^{n_1×···×n̂_p×···×n_d×m_1×···×m̂_q×···×m_e} is the hypermatrix with entries

    c_{i_1···î_p···i_d j_1···ĵ_q···j_e} = Σ_{k=1}^n a_{i_1···k···i_d} b_{j_1···k···j_e},   (15.7)

where by convention a caret over any entry means that the respective entry is omitted (e.g., a_{iĵk} = a_{ik} and F^{l̂×m×n} = F^{m×n}).
Contractions are not restricted to one pair of indices at a time. For hypermatrices A and B, the
hypermatrix
    ⟨A, B⟩_{α:λ, β:µ, . . . , γ:ν}
is the hypermatrix obtained from contracting the αth index of A with the λth index of B, the βth
index of A with the µth index of B, . . . , the γth index of A with the νth index of B (assuming that
the indices that are contracted run over the same range and in the same order).
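In computational terms the contraction product is numpy.tensordot. A minimal sketch, with our own helper name contract and 0-based indices (the text's indices are 1-based):

    import numpy as np

    def contract(A, B, pairs):
        """The contraction <A, B>_{alpha:lambda, ...} of Eq. (15.7): sum the
        alpha-th index of A against the lambda-th index of B for each pair."""
        return np.tensordot(A, B, axes=([a for a, _ in pairs],
                                        [b for _, b in pairs]))

    rng = np.random.default_rng(1)
    A = rng.standard_normal((2, 3, 4))
    B = rng.standard_normal((4, 5, 2))
    C = contract(A, B, [(2, 0)])    # contract 3rd index of A with 1st of B
    assert C.shape == (2, 3, 5, 2)  # free indices of A, then free indices of B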
Facts:
Facts requiring proof for which no specific reference is given can be found in [Lim] and the
references therein.
1. F n1 ×···×nd with entrywise addition and scalar multiplication is a vector space.
2. The standard basis E is a basis for F n1 ×n2 ×···×nd , and dim F n1 ×n2 ×···×nd = n1 n2 · · · nd .
3. The elements of the standard basis of F^{n_1×···×n_d} may be written as

    E_{k_1k_2···k_d} = e_{k_1} ⊗ e_{k_2} ⊗ · · · ⊗ e_{k_d},

using the Segre outer product (15.3).
4. Let A ∈ F^{n_1×···×n_d} and X_k ∈ F^{l_k×m_k}, Y_k ∈ F^{m_k×n_k} for k = 1, . . . , d. Then

    (X_1, . . . , X_d) · [(Y_1, . . . , Y_d) · A] = (X_1Y_1, . . . , X_dY_d) · A.

5. Let A, B ∈ F^{n_1×···×n_d}, α, β ∈ F , and X_k ∈ F^{m_k×n_k} for k = 1, . . . , d. Then

    (X_1, . . . , X_d) · [αA + βB] = α(X_1, . . . , X_d) · A + β(X_1, . . . , X_d) · B.

6. Let A ∈ F^{n_1×···×n_d}, α, β ∈ F , and X_k, Y_k ∈ F^{m_k×n_k} for k = 1, . . . , d. Then

    [α(X_1, . . . , X_d) + β(Y_1, . . . , Y_d)] · A = α(X_1, . . . , X_d) · A + β(Y_1, . . . , Y_d) · A.
7. The Segre outer product interacts with multilinear matrix multiplication in the following manner:

    (X_1, . . . , X_d) · [Σ_{p=1}^r β_p v_p^{(1)} ⊗ · · · ⊗ v_p^{(d)}] = Σ_{p=1}^r β_p (X_1v_p^{(1)}) ⊗ · · · ⊗ (X_dv_p^{(d)}).
If a_i ∈ F^l, x_j ∈ F^p, z_k ∈ F^q, and b_i, y_j and c_i, w_k are pairs of vectors of matching dimensions, then

    ⟨Σ_{i=1}^r a_i ⊗ b_i ⊗ c_i, Σ_{j,k=1}^{s,t} w_k ⊗ x_j ⊗ y_j ⊗ z_k⟩_{2:3, 3:1} = Σ_{i,j,k=1}^{r,s,t} ⟨b_i, y_j⟩⟨c_i, w_k⟩ a_i ⊗ x_j ⊗ z_k ∈ F^{l×p×q}.

Contractions in distinct pairs of indices may be performed in any order; e.g., for matrices X, Y, Z and a 3-hypermatrix A of compatible dimensions,

    ⟨X, ⟨Y, ⟨Z, A⟩_{2:3}⟩_{2:2}⟩_{2:1} = ⟨Y, ⟨X, ⟨Z, A⟩_{2:3}⟩_{2:1}⟩_{2:2} = · · · = ⟨Z, ⟨Y, ⟨X, A⟩_{2:1}⟩_{2:2}⟩_{2:3}.
14. For the special case where we contract two hypermatrices A, B ∈ C^{n_1×···×n_d} in all indices to get a scalar in C, we shall drop all indices and denote it by

    ⟨A, B⟩ = Σ_{j_1,...,j_d=1}^{n_1,...,n_d} a_{j_1···j_d} b_{j_1···j_d}.
If we replace B by its complex conjugate, this gives the usual Hermitian inner product.
Examples:
1. A 3-hypermatrix A ∈ C^{l×m×n} has "three sides" and may be multiplied by three matrices X ∈ C^{p×l}, Y ∈ C^{q×m}, Z ∈ C^{r×n}. This yields another 3-hypermatrix A′ ∈ C^{p×q×r} where

    A′ = (X, Y, Z) · A,   a′_{αβγ} = Σ_{i,j,k=1}^{l,m,n} x_{αi} y_{βj} z_{γk} a_{ijk}.
2. A 3-hypermatrix A = [a_{ijk}] ∈ C^{l×m×n} may be conveniently written down as a list of its matrix slices,

    A = [A_1 | A_2 | · · · | A_n] ∈ C^{l×m×n},

where A_k = [a_{ijk}] ∈ C^{l×m}, k = 1, . . . , n, and may be identified with a matrix of linear forms,

    x_1A_1 + x_2A_2 + · · · + x_nA_n ∈ C[x_1, . . . , x_n]^{l×m}.
Applications:
2. Hooke's law in one spatial dimension, with x = extension, F = force, c = the spring constant, is F = −cx. Hooke's law in three spatial dimensions is given by the linear elasticity equation

    σ_{ij} = Σ_{k,l=1}^3 c_{ijkl} γ_{kl},

where C = [c_{ijkl}] ∈ R^{3×3×3×3} is the elasticity tensor (also called the stiffness tensor), Σ = [σ_{ij}] ∈ R^{3×3} is the stress tensor, and Γ = [γ_{kl}] ∈ R^{3×3} is the strain tensor. Hooke's law may be expressed in terms of the contraction product as

    Σ = ⟨C, Γ⟩_{3:1,4:2}.

(A short computational sketch follows these applications.)
3. The observant reader might have noted that the word "tensor" was used above to denote tensors of order 2: the stress and strain tensors are both of order 2. This is in fact the most common use of the term "tensor" in physics, where order-2 tensors occur far more frequently than those of higher orders. Some authors (e.g., [Bor90]) use the term "tensor" to mean exclusively a tensor of order 2.
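As mentioned in Application 2 above, the contraction product in Hooke's law is a one-line numpy.einsum. A minimal sketch with random placeholder arrays (not physical elasticity data):

    import numpy as np

    rng = np.random.default_rng(2)
    C = rng.standard_normal((3, 3, 3, 3))       # elasticity hypermatrix
    Gamma = rng.standard_normal((3, 3))         # strain
    Sigma = np.einsum('ijkl,kl->ij', C, Gamma)  # Sigma = <C, Gamma>_{3:1,4:2}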
15.2 Tensors and Multilinear Functionals

Halmos's succinct answer to the question "What is a vector?" is, "It is an element of a vector space"
[Hal85, p. 153]. One advantage of such an approach is that it allows us to examine the
entity in the appropriate context. Depending on the context, a matrix can be an element of
a vector space F n×n , of a ring, e.g., the endomorphism ring L(F n ) of F n , of a Lie algebra,
e.g., gl(n, F ), etc. Depending on what properties of the matrix one is interested in studying,
one chooses the space it lives in accordingly.
The same philosophy applies to tensors, where one first defines a tensor product of d vector
spaces V1 ⊗ · · · ⊗ Vd and then subsequently defines an order-d tensor as an element of such
a tensor product space. Since a tensor product space is defined via a universal factorization
property (the definition used in Section 14.2), it can be interpreted in multiple ways, such as
a quotient module (the equivalent definition used here) or a space of multilinear functionals.
We will regard tensors as multilinear functionals. A perhaps unconventional aspect of our
approach is that for clarity we isolate the notion of covariance and contravariance (see
Section 15.10) from our definition of a tensor. We do not view this as an essential part of
the definition but a source of obfuscation.
Tensors can also be represented as hypermatrices by choosing a basis. Given a set of
bases, the essential information about a tensor T is captured by the coordinates aj1 ···jd ’s
(cf. Fact 3 below). We may view the coefficient aj1 ···jd as the (j1 , . . . , jd )-entry of the d-
dimensional matrix A = [aj1 ···jd ] ∈ F n1 ×···×nd , where A is a coordinate representation of T
with respect to the specified bases.
See also Section 14.2 for more information on tensors and tensor product spaces.
Definitions:
Let F be a field and let V1 , . . . , Vd be F -vector spaces.
The tensor product space V_1 ⊗ · · · ⊗ V_d is the quotient module F (V_1, . . . , V_d)/R where F (V_1, . . . , V_d) is the free module generated by all d-tuples (v_1, . . . , v_d) ∈ V_1 × · · · × V_d and R is the submodule of F (V_1, . . . , V_d) generated by elements of the form

    (v_1, . . . , αv_k + βv_k′, . . . , v_d) − α(v_1, . . . , v_k, . . . , v_d) − β(v_1, . . . , v_k′, . . . , v_d)

for all v_k, v_k′ ∈ V_k, α, β ∈ F , and k ∈ {1, . . . , d}. We write v_1 ⊗ · · · ⊗ v_d for the element (v_1, . . . , v_d) + R in the quotient module F (V_1, . . . , V_d)/R.
An element of V1 ⊗· · ·⊗Vd that can be expressed in the form v1 ⊗· · ·⊗vd is called decomposable.
The symbol ⊗ is called the tensor product when applied to vectors from abstract vector spaces.
The elements of V_1 ⊗ · · · ⊗ V_d are called order-d tensors or d-tensors, and n_k = dim V_k, k = 1, . . . , d, are the dimensions of the tensor.
Let B_k = {b_1^{(k)}, . . . , b_{n_k}^{(k)}} be a basis for V_k, k = 1, . . . , d. For a tensor T ∈ V_1 ⊗ · · · ⊗ V_d, the coordinate representation of T with respect to the specified bases is [T]_{B_1,...,B_d} = [a_{j_1···j_d}], where

    T = Σ_{j_1,...,j_d=1}^{n_1,...,n_d} a_{j_1···j_d} b_{j_1}^{(1)} ⊗ · · · ⊗ b_{j_d}^{(d)}.   (15.8)
Facts:
Facts requiring proof for which no specific reference is given can be found in [Bou98,
Chap. II], [KM97, Chap. 4], [Lan02, Chap. XVI], and [Yok92, Chap. 1]. Additional facts
about tensors can be found in Section 14.2.
1. The tensor product space V_1 ⊗ · · · ⊗ V_d with ν : V_1 × · · · × V_d → V_1 ⊗ · · · ⊗ V_d defined by

    ν(v_1, . . . , v_d) = v_1 ⊗ · · · ⊗ v_d = (v_1, . . . , v_d) + R ∈ F (V_1, . . . , V_d)/R

satisfies the Universal Factorization Property that can be used to define tensor product spaces (cf. Section 14.2), namely: if ϕ is a multilinear map from V_1 × · · · × V_d into a vector space U , then there exists a unique linear map ψ from V_1 ⊗ · · · ⊗ V_d into U such that ψ ∘ ν = ϕ.
2. If U = F^l, V = F^m, W = F^n, we may identify

    F^l ⊗ F^m ⊗ F^n = F^{l×m×n}

through the interpretation of the tensor product of vectors as a hypermatrix via the Segre outer product (cf. Eq. (15.3)).
3. In older literature, the a_{j_1···j_d}'s in Eq. (15.8) are often called the components of T .
4. One loses information when going from the tensor to its hypermatrix representation,
in the sense that the bases B1 , . . . , Bd must be specified in addition to the hypermatrix
A in order to recover the tensor T .
6. Let B_k and B_k′ be two bases of V_k for k = 1, . . . , d, and let A = [a_{k_1···k_d}] = [T]_{B_1,...,B_d} and A′ = [a′_{k_1···k_d}] = [T]_{B_1′,...,B_d′} ∈ F^{n_1×···×n_d} be the corresponding coordinate representations of T . The relationship between A and A′ is given by the multilinear matrix multiplication

    A′ = (X_1, . . . , X_d) · A,

where X_k ∈ GL(n_k, F ) is the change-of-basis matrix from B_k to B_k′, k = 1, . . . , d. In older literature, the a_{j_1···j_d}'s are also often called the components of T as in Fact 3.
13. Let B_k = {b_1^{(k)}, . . . , b_{n_k}^{(k)}} be a basis for V_k for k = 1, . . . , d.
(a) For each i_1 ∈ ⟨n_1⟩, . . . , i_d ∈ ⟨n_d⟩, define the multilinear functional ϕ_{i_1···i_d} : V_1 × · · · × V_d → F by

    ϕ_{i_1···i_d}(b_{j_1}^{(1)}, . . . , b_{j_d}^{(d)}) = 1 if i_1 = j_1, . . . , i_d = j_d, and 0 otherwise.

(b) The set B* := {ϕ_{i_1···i_d} : i_1 ∈ ⟨n_1⟩, . . . , i_d ∈ ⟨n_d⟩} is a basis for V_1 ⊗ · · · ⊗ V_d.
(c) For a multilinear functional T : V_1 × · · · × V_d → F with a_{j_1···j_d} as defined in Fact 12, T = Σ_{j_1,...,j_d=1}^{n_1,...,n_d} a_{j_1···j_d} ϕ_{j_1···j_d}.
(d) dim V_1 ⊗ · · · ⊗ V_d = dim V_1 · · · dim V_d since |B*| = |B| = n_1 · · · n_d.
14. T_d(V ) := V ⊗ · · · ⊗ V (d copies) is an End(V )-module (where End(V ) is the algebra of linear operators on V ) with the natural action defined on decomposable elements via

    g(v_1 ⊗ · · · ⊗ v_d) = g(v_1) ⊗ · · · ⊗ g(v_d)

for any g ∈ End(V ) and then extended linearly to all of T_d(V ).
Examples:
For notational convenience, let d = 3.
1. Explicitly, the definition of a tensor product space above simply means that

    U ⊗ V ⊗ W := {Σ_{i=1}^n α_i u_i ⊗ v_i ⊗ w_i : u_i ∈ U, v_i ∈ V, w_i ∈ W, α_i ∈ F, n ∈ N}

where ⊗ satisfies

    (αu + βu′) ⊗ v ⊗ w = α(u ⊗ v ⊗ w) + β(u′ ⊗ v ⊗ w),
    u ⊗ (αv + βv′) ⊗ w = α(u ⊗ v ⊗ w) + β(u ⊗ v′ ⊗ w),
    u ⊗ v ⊗ (αw + βw′) = α(u ⊗ v ⊗ w) + β(u ⊗ v ⊗ w′).

The "modulo relation" simply means that ⊗ obeys these rules. The statement that the u ⊗ v ⊗ w are generators of U ⊗ V ⊗ W simply means that U ⊗ V ⊗ W is the set of all possible linear combinations of elements of the form u ⊗ v ⊗ w where u ∈ U , v ∈ V , w ∈ W .
2. We emphasize here that a tensor and a hypermatrix are quite different. To specify a tensor
T ∈ V1 ⊗ · · · ⊗ Vd , we need both the hypermatrix [T ]B1 ,...,Bd ∈ Cn1 ×···×nd and the bases
B1 , . . . , Bd that we chose for V1 , . . . , Vd .
3. Each hypermatrix in F^{n_1×···×n_d} has a unique, natural tensor associated with it: the tensor in F^{n_1} ⊗ · · · ⊗ F^{n_d} whose coordinate representation with respect to the standard bases is the given hypermatrix. (The same is true for matrices and linear operators.)
Applications:
2. [Cor84] In the standard model of particle physics, a proton is made up of two up quarks and one down quark. A more precise statement is that the state of a proton is a 3-tensor

    (1/√2)(e_1 ⊗ e_1 ⊗ e_2 − e_2 ⊗ e_1 ⊗ e_1) ∈ V ⊗ V ⊗ V,

where V is a 3-dimensional inner product space spanned by orthonormal vectors e_1, e_2, e_3 representing quark flavor states, e_1 and e_2 being the up and down quark states respectively.
3. In physics, the question “What is a tensor?” is often taken to mean “What kind of physical
quantities should be represented by tensors?” It is often cast in the form of questions such as
“Is elasticity a tensor?”, “Is gravity a tensor?”, etc. The answer is that the physical quantity
in question is a tensor if it obeys the change-of-bases rule in Fact 15.2.6: A d-tensor is an
object represented by a list of numbers aj1 ···jd ∈ C, jk = 1, . . . , nk , k = 1, . . . , d, once a basis
is chosen, but only if these numbers transform themselves as expected when one changes the
basis.
4. Elasticity is an order-4 tensor and may be represented by a hypermatrix C ∈ R^{3×3×3×3}. If we measure stress using a different choice of coordinates (i.e., a different basis), then the new hypermatrix representation C′ ∈ R^{3×3×3×3} must be related to C via

    C′ = (X, X, X, X) · C,   (15.12)

where X ∈ GL(3, R) is the change-of-basis matrix, and Eq. (15.12) is defined according to Fact 6:

    c′_{pqrs} = Σ_{i,j,k,l=1}^3 x_{pi} x_{qj} x_{rk} x_{sl} c_{ijkl},   p, q, r, s = 1, 2, 3.   (15.13)
5. Let A be an algebra over a field F, i.e., a vector space on which a notion of vector multiplication · : A × A → A, (a, b) ↦ a · b is defined. Let B = {e_1, . . . , e_n} be a basis for A. Then A is completely determined by the hypermatrix C = [c_{ijk}] ∈ F^{n×n×n} where

    e_i · e_j = Σ_{k=1}^n c_{ijk} e_k.
The n³ entries of C are often called the structure constants of A. This hypermatrix is the coordinate representation of a tensor in A ⊗ A ⊗ A with respect to the basis B. If we had chosen a new basis B′, then the new coordinate representation C′ would be related to C as in Fact 6 — in this case C′ = (X, X, X) · C where X is the change-of-basis matrix from B to B′. Note that this says that the entries of the hypermatrix of structure constants are coordinates of a tensor with respect to a basis.
6. For an explicit example, the Lie algebra so(3) is the set of all skew-symmetric matrices in R^{3×3}. A basis is given by

    Z_1 = [0 0 0; 0 0 −1; 0 1 0],   Z_2 = [0 0 1; 0 0 0; −1 0 0],   Z_3 = [0 −1 0; 1 0 0; 0 0 0],

written row by row.
The product · for so(3) is the commutator product [X, Y ] = XY − Y X. Note that [Y, X] =
−[X, Y ] and [X, X] = 0. Since [Z1 , Z2 ] = Z3 , [Z2 , Z3 ] = Z1 , [Z3 , Z1 ] = Z2 , the structure
constants of so(3) are given by the hypermatrix ε = [ε_{ijk}] ∈ R^{3×3×3} defined by

    ε_{ijk} = +1 if (i, j, k) = (1, 2, 3), (2, 3, 1), (3, 1, 2),
              −1 if (i, j, k) = (1, 3, 2), (2, 1, 3), (3, 2, 1),
               0 if i = j, j = k, or k = i;

equivalently, ε_{ijk} = (i − j)(j − k)(k − i)/2.
ε is often called the Levi-Civita symbol.
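As a sanity check on Example 6, the following sketch recovers the Levi-Civita structure constants from the commutators of Z_1, Z_2, Z_3; the helper coords (our own naming) reads off the coordinates of a skew-symmetric matrix in this basis, which sit in its (3,2), (1,3), and (2,1) entries.

    import numpy as np

    Z = np.array([[[0, 0, 0], [0, 0, -1], [0, 1, 0]],
                  [[0, 0, 1], [0, 0, 0], [-1, 0, 0]],
                  [[0, -1, 0], [1, 0, 0], [0, 0, 0]]], dtype=float)

    def coords(M):
        # coordinates of a skew-symmetric M in the basis Z_1, Z_2, Z_3
        return np.array([M[2, 1], M[0, 2], M[1, 0]])

    eps = np.zeros((3, 3, 3))
    for i in range(3):
        for j in range(3):
            eps[i, j, :] = coords(Z[i] @ Z[j] - Z[j] @ Z[i])

    assert eps[0, 1, 2] == 1 and eps[1, 0, 2] == -1  # eps_123 = 1, eps_213 = -1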
15.3 Tensor Rank

Definitions:
Let F be a field.
A hypermatrix A ∈ F^{n_1×···×n_d} has rank one or rank-1 if there exist nonzero v^{(k)} ∈ F^{n_k}, k = 1, . . . , d, so that A = v^{(1)} ⊗ · · · ⊗ v^{(d)}, where v^{(1)} ⊗ · · · ⊗ v^{(d)} is the Segre outer product defined in Eq. (15.3).
The rank of a hypermatrix A ∈ F^{n_1×···×n_d} is defined to be the smallest r such that it may be written as a sum of r rank-1 hypermatrices, i.e.,

    rank(A) := min{r : A = Σ_{p=1}^r v_p^{(1)} ⊗ · · · ⊗ v_p^{(d)}}.   (15.14)

The rank of a tensor T ∈ V_1 ⊗ · · · ⊗ V_d is defined in the same way; here v_p^{(k)} is a vector in the abstract vector space V_k and ⊗ denotes the tensor product as defined in Section 15.2.
A hypermatrix or a tensor has rank zero if and only if it is zero (in accordance with the
convention that the sum over the empty set is zero).
A minimum length decomposition of a tensor or hypermatrix, i.e.,

    T = Σ_{p=1}^{rank(T)} v_p^{(1)} ⊗ · · · ⊗ v_p^{(d)},   (15.16)

is called a rank-retaining decomposition (cf. Section 15.6).
Facts:
Facts requiring proof for which no specific reference is given can be found in [BCS96,
Chap. 19], [Lan12, Chap. 3], [Lim], and the references therein.
Examples:
1. The phenomenon of rank dependence on field was first observed by Bergman [Ber69]. Take linearly independent pairs of vectors x_1, y_1 ∈ R^l, x_2, y_2 ∈ R^m, x_3, y_3 ∈ R^n and set z_k = x_k + iy_k and z̄_k = x_k − iy_k; then

    A = x_1 ⊗ x_2 ⊗ x_3 + x_1 ⊗ y_2 ⊗ y_3 − y_1 ⊗ x_2 ⊗ y_3 + y_1 ⊗ y_2 ⊗ x_3   (15.17)
      = (1/2)(z̄_1 ⊗ z_2 ⊗ z̄_3 + z_1 ⊗ z̄_2 ⊗ z_3).

One may in fact show that rank_C(A) = 2 < 3 = rank_R(A). (A numerical check of this identity appears after Example 2.)
2. An example where rank_R(A) ≤ 2 < rank_Q(A) is given by

    z ⊗ z̄ ⊗ z + z̄ ⊗ z ⊗ z̄ = 2x ⊗ x ⊗ x − 4y ⊗ y ⊗ x + 4y ⊗ x ⊗ y − 4x ⊗ y ⊗ y ∈ Q^{2×2×2},

where z = x + √2 y and z̄ = x − √2 y.
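As noted in Example 1, the identity (15.17) is easy to verify numerically. A minimal sketch with randomly chosen x_k, y_k:

    import numpy as np

    rng = np.random.default_rng(3)
    x = [rng.standard_normal(n) for n in (2, 3, 4)]
    y = [rng.standard_normal(n) for n in (2, 3, 4)]
    z = [xk + 1j * yk for xk, yk in zip(x, y)]

    def outer3(a, b, c):
        return np.einsum('i,j,k->ijk', a, b, c)  # einsum does not conjugate

    A = (outer3(x[0], x[1], x[2]) + outer3(x[0], y[1], y[2])
         - outer3(y[0], x[1], y[2]) + outer3(y[0], y[1], x[2]))
    B = 0.5 * (outer3(z[0].conj(), z[1], z[2].conj())
               + outer3(z[0], z[1].conj(), z[2]))
    assert np.allclose(A, B)  # two complex rank-1 terms vs. four real ones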
Applications:
1. Let M, C, K ∈ R^{n×n} be the mass, damping, and stiffness matrices of a viscously damped linear system in free vibration,

    Mẍ(t) + Cẋ(t) + Kx(t) = 0,

where M, C, K are all symmetric positive definite. The system may be decoupled using classical modal analysis [CO65] if and only if

    CM^{−1}K = KM^{−1}C.

Formulated in hypermatrix language, this asks when A = [M | C | K] ∈ R^{n×n×3} has rank(A) ≤ n.
2. The notion of tensor rank arises in several areas and a well-known one is algebraic com-
putational complexity [BCS96], notably the complexity of matrix multiplications. This is
surprisingly easy to explain. For matrices X = [x_{ij}], Y = [y_{jk}] ∈ C^{n×n}, observe that the product may be expressed as

    XY = Σ_{i,j,k=1}^n x_{ik} y_{kj} E_{ij} = Σ_{i,j,k=1}^n ϕ_{ik}(X) ϕ_{kj}(Y) E_{ij},   (15.18)

where E_{ij} = e_ie_j^* ∈ C^{n×n} has all entries 0 except 1 in the (i, j)-entry and ϕ_{ij}(X) = tr(E_{ij}^*X) = x_{ij} is the linear functional ϕ_{ij} : C^{n×n} → C dual to E_{ij}. Let T_n : C^{n×n} × C^{n×n} → C^{n×n} be the map that takes a pair of matrices (X, Y) ∈ C^{n×n} × C^{n×n} to their product T_n(X, Y) = XY ∈ C^{n×n}. Then by Eq. (15.18), T_n is given by the tensor

    T_n = Σ_{i,j,k=1}^n ϕ_{ik} ⊗ ϕ_{kj} ⊗ E_{ij} ∈ (C^{n×n})* ⊗ (C^{n×n})* ⊗ C^{n×n}.   (15.19)
The exponent of matrix multiplication is then a positive number ω defined in terms of tensor rank,

    ω := inf{α : rank(T_n) = O(n^α), n ∈ N}.

It is not hard to see that whatever the value of ω > 0, there must exist O(n^ω) algorithms for multiplying n × n matrices. In fact, every r-term decomposition (15.16) of T_n yields an explicit algorithm for multiplying two n × n matrices in O(n^{log_n r}) complexity. Via elementary row operations, we may deduce that any O(n^ω) algorithm for computing matrix multiplications (A, B) ↦ AB would also yield a corresponding O(n^ω) algorithm for matrix inversion A ↦ A^{−1} (and thus for solving linear systems Ax = b). If we choose the standard basis {E_{ij} : i, j ∈ ⟨n⟩} on C^{n×n} and its dual basis {ϕ_{ij} : i, j ∈ ⟨n⟩} on the dual space (C^{n×n})*, then the 3-tensor T_n may be represented by a 3-hypermatrix

    M_n ∈ C^{n²×n²×n²}.
The connection between the exponent of matrix multiplication and tensor rank was first
noted by Strassen in [Str73]. We refer the reader to Chapter 61 and [Knu98] for very readable
accounts and to [BCS96] for an extensive in-depth discussion.
3. The special case n = 2 is behind Strassen’s algorithm for matrix multiplication and inversion
with O(n^{log_2 7}) time complexity [Str69]. We shall present it in the modern language of hypermatrices. We write M_2 = [A_1 | A_2 | A_3 | A_4] ∈ C^{4×4×4} where the matrix slices of M_2 are, in 2 × 2 blocks (I, O ∈ C^{2×2}),

    A_1 = [I O; O O],   A_2 = [O O; I O],   A_3 = [O I; O O],   A_4 = [O O; O I].

One may check that XD_jY^T = A_j for j = 1, 2, 3, 4 for appropriate X, Y ∈ C^{4×7} and diagonal D_1, . . . , D_4 ∈ C^{7×7}. In other words, this is a simultaneous diagonalization of A_1, A_2, A_3, A_4 by X and Y in the sense of Fact 1, and so we conclude that rank(M_2) ≤ 7. In fact, it has been shown that rank(M_2) = 7 [HK71, Win71] and, much more recently, that the border rank (see Section 15.4) of M_2 is 7 [Lan06]. (A sketch of the resulting algorithm appears after Application 4.)
4. In quantum computing, a pure state, also known as a completely separable state, corresponds to a rank-1 tensor. A quantum state that is not pure is called entangled. A natural, but not the most commonly used, measure of the degree of entanglement is therefore tensor rank [Bry02], i.e., the minimal number of pure states of which it can be written as a sum. For example, the well-known Greenberger–Horne–Zeilinger state [GHZ89] may be regarded as a 2 × 2 × 2 hypermatrix of rank 2,

    |GHZ⟩ = (1/√2)(|0⟩ ⊗ |0⟩ ⊗ |0⟩ + |1⟩ ⊗ |1⟩ ⊗ |1⟩) ∈ C^{2×2×2},

while the W state is one of rank 3,

    |W⟩ = (1/√3)(|0⟩ ⊗ |0⟩ ⊗ |1⟩ + |0⟩ ⊗ |1⟩ ⊗ |0⟩ + |1⟩ ⊗ |0⟩ ⊗ |0⟩) ∈ C^{2×2×2}.
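As promised in Application 3, any rank-7 decomposition of M_2 translates into Strassen's algorithm. A sketch of one recursion level (the name strassen_2x2 is ours; the seven products are Strassen's classical ones [Str69]):

    import numpy as np

    def strassen_2x2(X, Y):
        """Multiply 2 x 2 matrices (or block matrices) with 7 multiplications
        instead of 8."""
        a, b, c, d = X[0, 0], X[0, 1], X[1, 0], X[1, 1]
        e, f, g, h = Y[0, 0], Y[0, 1], Y[1, 0], Y[1, 1]
        p1 = (a + d) * (e + h)
        p2 = (c + d) * e
        p3 = a * (f - h)
        p4 = d * (g - e)
        p5 = (a + b) * h
        p6 = (c - a) * (e + f)
        p7 = (b - d) * (g + h)
        return np.array([[p1 + p4 - p5 + p7, p3 + p5],
                         [p2 + p4, p1 - p2 + p3 + p6]])

    X, Y = np.random.default_rng(4).standard_normal((2, 2, 2))
    assert np.allclose(strassen_2x2(X, Y), X @ Y)

Applied recursively to 2 × 2 blocks, this yields the O(n^{log_2 7}) complexity.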
15.4 Border Rank

Definitions:

The border rank of a hypermatrix A ∈ C^{n_1×···×n_d} is

    r̲a̲n̲k̲(A) := min{r : inf_{rank(B)≤r} ‖A − B‖ = 0},

where ‖ · ‖ is any norm on hypermatrices (including one obtained by identifying C^{n_1×···×n_d} with C^{n_1···n_d}; see Section 15.8).
Facts:
Facts requiring proof for which no specific reference is given can be found in [BCS96,
Chap. 19], [Lan12, Chap. 3], [Lim], and the references therein.
1. For A ∈ C^{n_1×···×n_d}, r̲a̲n̲k̲(A) ≤ rank(A).
2. For a matrix A ∈ C^{m×n}, r̲a̲n̲k̲(A) = rank(A).
3. There exist examples of a sequence of 3-hypermatrices of rank not more than 2 that
converges to a limit that has rank 3 [BLR80] (see Examples 1 and 2). In fact, the gap
between rank and border rank can be arbitrarily large.
4. Let A = [A_1 | A_2 | A_3] ∈ C^{n×n×3} where A_2 is invertible. Then

    r̲a̲n̲k̲(A) ≥ n + (1/2) rank(A_1A_2^{−1}A_3 − A_3A_2^{−1}A_1).
5. Let A ∈ C^{n_1×···×n_d} and (X_1, . . . , X_d) ∈ GL(n_1, . . . , n_d, C). Then

    r̲a̲n̲k̲((X_1, . . . , X_d) · A) = r̲a̲n̲k̲(A).
6. While border rank is defined here for hypermatrices, the definition of border rank
extends to tensors via coordinate representation because of the invariance of rank
under change of basis (cf. Fact 15.3.2).
7. Suppose rank(T ) = r and r̲a̲n̲k̲(T ) = s. If s < r, then T has no best rank-s approximation.
Examples:
1. [BLR80] The original example of a hypermatrix whose border rank is less than its rank is a 2 × 2 × 2 hypermatrix. In the notation of Section 15.1, if we choose x_p = (1, 0)^T, y_p = (0, 1)^T ∈ C² for p = 1, 2, 3, then

    A = x_1 ⊗ x_2 ⊗ y_3 + x_1 ⊗ y_2 ⊗ x_3 + y_1 ⊗ x_2 ⊗ x_3 = [A_1 | A_2] ∈ C^{2×2×2},
    A_1 = [0 1; 1 0],   A_2 = [1 0; 0 0],

and, for ε ≠ 0,

    A_ε = (1/ε)[(x_1 + εy_1) ⊗ (x_2 + εy_2) ⊗ (x_3 + εy_3) − x_1 ⊗ x_2 ⊗ x_3] = [B_1 | B_2] ∈ C^{2×2×2},
    B_1 = [0 1; 1 ε],   B_2 = [1 ε; ε ε²],

from which it is clear that lim_{ε→0} A_ε = A while rank(A_ε) ≤ 2. (A numerical illustration appears after Example 4.) We note here that the hypermatrix A is actually the same hypermatrix that represents the W state in quantum computing (cf. Example 15.3.4).
2. Choose linearly independent pairs of vectors x_1, y_1 ∈ C^l, x_2, y_2 ∈ C^m, x_3, y_3 ∈ C^n (so l, m, n ≥ 2 necessarily). Define the 3-hypermatrix

    A := x_1 ⊗ x_2 ⊗ y_3 + x_1 ⊗ y_2 ⊗ x_3 + y_1 ⊗ x_2 ⊗ x_3 ∈ C^{l×m×n}

and, for ε ≠ 0, the hypermatrix A_ε := (1/ε)[(x_1 + εy_1) ⊗ (x_2 + εy_2) ⊗ (x_3 + εy_3) − x_1 ⊗ x_2 ⊗ x_3] of rank at most 2. Then

    A − A_ε = −ε(x_1 ⊗ y_2 ⊗ y_3 + y_1 ⊗ x_2 ⊗ y_3 + y_1 ⊗ y_2 ⊗ x_3) − ε² y_1 ⊗ y_2 ⊗ y_3   (15.20)

and so

    ‖A − A_ε‖ = O(ε).

In other words, A can be approximated arbitrarily closely by A_ε. As a result,

    lim_{ε→0} A_ε = A.

One may also check that rank(A) = 3 while it is clear that rank(A_ε) ≤ 2. By definition, the border rank of A is then not more than 2. In fact, r̲a̲n̲k̲(A) = 2: a rank-1 hypermatrix always has border rank equal to 1, and since the set of hypermatrices of rank at most 1 is closed, a hypermatrix of border rank 1 also has rank 1; hence A, having rank 3, must have border rank at least 2.
3. There are various results regarding the border rank of hypermatrices of specific dimensions. For instance, a hypermatrix A = [A_1 | A_2] ∈ C^{m×n×2} may be viewed as a matrix pencil, and one may rely on the Kronecker–Weierstraß canonical form to deduce results about its rank and border rank.
4. One may define border rank algebraically over arbitrary fields, without involving norms, by considering expressions like those in Eq. (15.20) modulo ε. We refer the reader to [BCS96, Knu98] for more details on this.
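The convergence in Example 1 (referenced there) can be illustrated numerically: the rank-2 hypermatrices A_ε approach the rank-3 hypermatrix A at rate O(ε). A minimal sketch:

    import numpy as np

    def outer3(a, b, c):
        return np.einsum('i,j,k->ijk', a, b, c)

    e0, e1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    A = outer3(e0, e0, e1) + outer3(e0, e1, e0) + outer3(e1, e0, e0)

    for eps in (1e-1, 1e-3, 1e-5):
        u = e0 + eps * e1
        A_eps = (outer3(u, u, u) - outer3(e0, e0, e0)) / eps  # rank <= 2
        print(eps, np.linalg.norm(A - A_eps))                 # error is O(eps)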
15.5 Generic and Maximal Rank

Definitions:

Let F be a field.

The maximum rank of F^{n_1×···×n_d} is

    maxrank_F(n_1, . . . , n_d) := max{rank(A) : A ∈ F^{n_1×···×n_d}}.

The generic rank genrank_C(n_1, . . . , n_d) of C^{n_1×···×n_d} is the value r attained by rank(A) on a dense subset of C^{n_1×···×n_d}; such a value always exists over C.
Facts:
Facts requiring proof for which no specific reference is given can be found in [Lan12, Chap. 3].
1. The concepts of maxrank and genrank are uninteresting for matrices (i.e., 2-hypermatrices) since

    genrank_C(m, n) = maxrank_C(m, n) = min{m, n},

but for d > 2, the exact values of genrank_C(l, m, n) and maxrank_C(l, m, n) are mostly unknown.
2. Since r̲a̲n̲k̲(A) ≤ rank(A) for every A, genrank_C(n_1, . . . , n_d) ≤ maxrank_C(n_1, . . . , n_d).
3. A simple dimension count yields the following lower bound:

    genrank_C(n_1, . . . , n_d) ≥ n_1 · · · n_d / (n_1 + · · · + n_d − d + 1).

Note that strict inequality can occur when d > 2.
15.6 Rank-Retaining Decomposition

The Kruskal rank is a concept that has been reinvented multiple times under different names (originally as girth in matroid theory [Oxl11], as krank in tensor decompositions [SB00], and most recently as spark in compressive sensing [DE03]). The term "Kruskal rank" was not coined by Kruskal himself, who simply denoted it as k_S. It is an unfortunate nomenclature, since it is not a notion of rank; "Kruskal dimension" would have been more appropriate.
Definitions:
Let V be a vector space, v1 , . . . , vn ∈ V , and S = {v1 , . . . , vn }.
The Kruskal rank or krank of S, denoted krank(S) or krank(v_1, . . . , v_n), is the largest k ∈ N so that every k-element subset of S is linearly independent. (A brute-force computational sketch appears after these definitions.)
The girth or spark of S is the smallest s ∈ N so that there exists an s-element subset of S that is linearly dependent.
We say that a decomposition of the form (15.21) is essentially unique if given another such decomposition,

    Σ_{p=1}^r α_p v_p^{(1)} ⊗ · · · ⊗ v_p^{(d)} = Σ_{p=1}^r β_p w_p^{(1)} ⊗ · · · ⊗ w_p^{(d)},

we must have (i) the coefficients α_p = β_p for all p = 1, . . . , r; and (ii) the factors v_p^{(1)}, . . . , v_p^{(d)} and w_p^{(1)}, . . . , w_p^{(d)} differ at most via unimodulus scaling, i.e.,

    w_p^{(k)} = e^{iθ_{kp}} v_p^{(k)},   k = 1, . . . , d,

where θ_{1p} + · · · + θ_{dp} ≡ 0 mod 2π, for all p = 1, . . . , r. In the event that successive coefficients are equal, σ_{p−1} > σ_p = σ_{p+1} = · · · = σ_{p+q} > σ_{p+q+1}, the uniqueness of the factors in (ii) is only up to a relabeling of the indices p, . . . , p + q.
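As promised after the definition of the Kruskal rank, here is a brute-force sketch of its computation; it is exponential in the number of vectors and is meant only as an illustration of the definition.

    import numpy as np
    from itertools import combinations

    def krank(vectors):
        """Largest k such that every k-element subset of the given vectors
        is linearly independent."""
        k = 0
        for size in range(1, len(vectors) + 1):
            if all(np.linalg.matrix_rank(np.column_stack(S)) == size
                   for S in combinations(vectors, size)):
                k = size
            else:
                break
        return k

    rng = np.random.default_rng(5)
    # Three generic vectors in R^2: any two are independent, all three are not.
    print(krank([rng.standard_normal(2) for _ in range(3)]))  # 2 almost surely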
Facts:
Facts requiring proof for which no specific reference is given can be found in [Lim] and the
references therein.
1. Let A ∈ C^{n_1×···×n_d}. Then A has a rank-retaining decomposition

    A = Σ_{p=1}^r σ_p v_p^{(1)} ⊗ · · · ⊗ v_p^{(d)},   (15.21)

where r = rank(A),

    σ_1 ≥ σ_2 ≥ · · · ≥ σ_r > 0,

and v_p^{(k)} ∈ C^{n_k} has unit norm ‖v_p^{(k)}‖_2 = 1 for all k = 1, . . . , d and all p = 1, . . . , r.
2. Fact 1 has a coordinate-free counterpart: if T ∈ V_1 ⊗ · · · ⊗ V_d, then we may also write

    T = Σ_{p=1}^r σ_p v_p^{(1)} ⊗ · · · ⊗ v_p^{(d)},

where r = rank(T ) and the v_p^{(k)} ∈ V_k are vectors in an abstract C-vector space. Furthermore, if the V_k's are all equipped with norms, then the v_p^{(k)} may all be chosen to be unit vectors and the σ_p may all be chosen to be positive real numbers. As such, we may rightly call a decomposition of the form (15.21) a tensor decomposition, with Fact 1 being the special case when V_k = C^{n_k}.
3. For the special case d = 2, the unit-norm vectors {v_1^{(1)}, . . . , v_r^{(1)}} and {v_1^{(2)}, . . . , v_r^{(2)}} may in fact be chosen to be orthonormal. If we write U := [v_1^{(1)}, . . . , v_r^{(1)}] ∈ C^{n_1×r}, V := [v_1^{(2)}, . . . , v_r^{(2)}] ∈ C^{n_2×r}, and Σ := diag(σ_1, . . . , σ_r), Fact 1 yields the singular value decomposition A = UΣV^T.
8. A decomposition

    A = Σ_{p=1}^r σ_p u_p ⊗ v_p ⊗ w_p ∈ C^{n_1×n_2×n_3}

is both rank-retaining, i.e., r = rank(A), and essentially unique if the following combinatorial condition is met:

    krank(u_1, . . . , u_r) + krank(v_1, . . . , v_r) + krank(w_1, . . . , w_r) ≥ 2r + 2.

This is the Kruskal uniqueness theorem.
9. [SB00] Fact 8 has been generalized to arbitrary order d ≥ 3. Let A ∈ C^{n_1×···×n_d} where d ≥ 3. Then a decomposition of the form

    A = Σ_{p=1}^r σ_p v_p^{(1)} ⊗ · · · ⊗ v_p^{(d)}

is both rank-retaining, i.e., r = rank(A), and essentially unique if the following condition is satisfied:

    Σ_{k=1}^d krank(v_1^{(k)}, . . . , v_r^{(k)}) ≥ 2r + d − 1.   (15.22)
10. A result analogous to the Kruskal uniqueness theorem (Fact 8) does not hold for d = 2, since a decomposition of a matrix A ∈ C^{n×n} can be written in infinitely many different ways, A = UV^T = (UX^{−1})(XV^T) for any X ∈ GL(n, C). This is not surprising since Eq. (15.22) can never be true for d = 2 because of Fact 7.
11. [Der13] For d ≥ 3, the condition (15.22) is sharp in the sense that the right-hand side
cannot be further reduced.
12. One may also write the decomposition in Fact 1 (for d = 3) in the form of a multilinear matrix multiplication,

    A = Σ_{i=1}^r λ_i x_i ⊗ y_i ⊗ z_i = (X, Y, Z) · Λ,   a_{αβγ} = Σ_{i=1}^r λ_i x_{αi} y_{βi} z_{γi},

where X = [x_1, . . . , x_r] ∈ C^{n_1×r}, Y = [y_1, . . . , y_r] ∈ C^{n_2×r}, Z = [z_1, . . . , z_r] ∈ C^{n_3×r}, and Λ ∈ C^{r×r×r} is the diagonal hypermatrix with λ_1, . . . , λ_r on its diagonal and zeros elsewhere. Note however that this is not a multilinear rank-retaining decomposition in the sense of the next section, since µrank(A) ≠ (r, r, r) in general.
15.7 Multilinear Rank

For a matrix A = [a_{ij}] ∈ F^{n×m}, let A_{i•} = [a_{i1}, . . . , a_{im}]^T ∈ F^m and A_{•j} = [a_{1j}, . . . , a_{nj}]^T ∈ F^n denote the ith row and jth column vectors of A, and consider the numbers

    dim span{A_{•1}, . . . , A_{•m}},   (15.23)
    dim span{A_{1•}, . . . , A_{n•}},   (15.24)
    min{r : A = Σ_{p=1}^r u_pv_p^T}.   (15.25)

The numbers in Eqs. (15.24) and (15.23) are the row and column ranks of A. Their equality is a standard fact in linear algebra and the common value is called the rank of A. The number in Eq. (15.25) is also easily seen to be equal to rank(A).
For a hypermatrix A = [a_{ijk}] ∈ F^{l×m×n}, one may also define analogous numbers. However, we generally have four distinct numbers: the first three, called the multilinear rank, are associated with the three different ways to slice A, while the analog of Eq. (15.25) is the tensor rank of A (cf. Section 15.3). The multilinear rank is essentially matrix rank and so inherits many of the latter's properties; consequently we do not see the sort of anomalies discussed in Sections 15.4 and 15.5.
Like the tensor rank, the notion of multilinear rank was due to Hitchcock [Hit27b], as a
special case (2-plex rank) of his multiplex rank.
Definitions:

For a hypermatrix A = [a_{ijk}] ∈ F^{l×m×n}, the multilinear rank of A is µrank(A) := (r_1(A), r_2(A), r_3(A)), where r_1(A), r_2(A), r_3(A) are the dimensions of the spans of the slices of A with respect to each of its three indices.

More generally, for A ∈ F^{n_1×···×n_d}, the kth flattening of A is the matrix ♭_k(A) ∈ F^{n_k×(n_1···n̂_k···n_d)} defined by

    (♭_k(A))_{ij} = (A)_{s_k(i,j)}

where s_k(i, j) is the jth element in lexicographic order in the subset of ⟨n_1⟩ × · · · × ⟨n_d⟩ consisting of elements that have kth coordinate equal to i, and by convention a caret over any entry of a d-tuple means that the respective entry is omitted. See Example 1.

The multilinear kernels or nullspaces of T ∈ U ⊗ V ⊗ W , regarded as a multilinear functional, are

    ker_1(T ) := {u ∈ U : T (u, v, w) = 0 for all v ∈ V, w ∈ W},

and likewise for ker_2(T ) ⊆ V and ker_3(T ) ⊆ W .
Facts:
Facts requiring proof for which no specific reference is given can be found in [Lan12, Chap. 2]
as well as [Lim] and the references therein.
1. For A ∈ F^{l×m×n}, the multilinear rank µrank(A) = (r_1, r_2, r_3) is given by

    r_1 = rank(♭_1(A)),   r_2 = rank(♭_2(A)),   r_3 = rank(♭_3(A)),

where rank here is of course the usual matrix rank of the matrices ♭_1(A), ♭_2(A), ♭_3(A).
2. Let A ∈ F l×m×n . For any X ∈ GL(l, F ), Y ∈ GL(m, F ), Z ∈ GL(n, F ),
µ rank((X, Y, Z) · A) = µ rank(A).
5. If A ∈ F^{l×m×n} has µrank(A) = (p, q, r), then there exist matrices X ∈ F^{l×p}, Y ∈ F^{m×q}, Z ∈ F^{n×r} of full column rank and C ∈ F^{p×q×r} such that

    A = (X, Y, Z) · C.

7. Let A ∈ F^{l×m×n}. Then there exist lower triangular matrices L_1, L_2, L_3 and a hypermatrix U such that

    A = (L_1, L_2, L_3) · U.

This may be viewed as an analogue of the LU decomposition for hypermatrices with respect to the notion of multilinear rank.
8. Let A ∈ C^{l×m×n} and µrank(A) = (p, q, r). There exist Q_1 ∈ C^{l×p}, Q_2 ∈ C^{m×q}, Q_3 ∈ C^{n×r} with orthonormal columns and R ∈ C^{p×q×r} such that

    A = (Q_1, Q_2, Q_3) · R.

This may be viewed as an analogue of the QR decomposition for hypermatrices with respect to the notion of multilinear rank.
9. The LU and QR decompositions in Facts 7 and 8 are among the few things that
can be computed for hypermatrices, primarily because multilinear rank is essen-
tially a matrix notion. For example, one may apply usual Gaussian elimination or
Householder/Givens QR to the flattenings of A ∈ F l×m×n successively, reducing all
computations to standard matrix computations.
Examples:
1. Let A = [A_1 | A_2] ∈ C^{4×3×2} with slices

    A_1 = [ a_{111} a_{121} a_{131}        A_2 = [ a_{112} a_{122} a_{132}
            a_{211} a_{221} a_{231}                a_{212} a_{222} a_{232}
            a_{311} a_{321} a_{331}                a_{312} a_{322} a_{332}
            a_{411} a_{421} a_{431} ],             a_{412} a_{422} a_{432} ];

then

    ♭_1(A) = [ a_{111} a_{112} a_{121} a_{122} a_{131} a_{132}
               a_{211} a_{212} a_{221} a_{222} a_{231} a_{232}
               a_{311} a_{312} a_{321} a_{322} a_{331} a_{332}
               a_{411} a_{412} a_{421} a_{422} a_{431} a_{432} ] ∈ C^{4×6},

    ♭_2(A) = [ a_{111} a_{112} a_{211} a_{212} a_{311} a_{312} a_{411} a_{412}
               a_{121} a_{122} a_{221} a_{222} a_{321} a_{322} a_{421} a_{422}
               a_{131} a_{132} a_{231} a_{232} a_{331} a_{332} a_{431} a_{432} ] ∈ C^{3×8},

    ♭_3(A) = [ a_{111} a_{121} a_{131} a_{211} a_{221} a_{231} a_{311} a_{321} a_{331} a_{411} a_{421} a_{431}
               a_{112} a_{122} a_{132} a_{212} a_{222} a_{232} a_{312} a_{322} a_{332} a_{412} a_{422} a_{432} ] ∈ C^{2×12}.

(A computational sketch follows Example 2.)
2. Note that if we had a bilinear form represented by a matrix M ∈ C^{m×n}, the analogues of these subspaces would be the four fundamental subspaces of the matrix: ker_1(M) = ker(M), im_1(M) = im(M), ker_2(M) = ker(M^T), im_2(M) = im(M^T), and the rank-nullity theorem reduces to the usual one.
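As referenced in Example 1, the flattenings, and hence the multilinear rank, reduce to ordinary matrix computations. A minimal sketch, where the lexicographic column ordering of the definition corresponds to reshaping a C-ordered array:

    import numpy as np

    def flattening(A, k):
        """The k-th flattening of A (0-based k): index k becomes the row
        index; the remaining indices, in lexicographic order, the columns."""
        return np.moveaxis(A, k, 0).reshape(A.shape[k], -1)

    def mu_rank(A):
        return tuple(np.linalg.matrix_rank(flattening(A, k))
                     for k in range(A.ndim))

    rng = np.random.default_rng(6)
    A = rng.standard_normal((4, 3, 2))
    print(mu_rank(A))  # (4, 3, 2) for generic A: each flattening has full rank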
15.8 Norms
In this section we will discuss the Hölder, induced, and nuclear norms of hypermatrices. When discussing multiple varieties of norms, one has to introduce different notation to distinguish them, and here we follow essentially the notation and terminology for matrix norms in Chapter 24, adapted as needed for hypermatrices. For the induced and nuclear norms, we assume d = 3 to avoid notational clutter.
Definitions:
For A = [a_{j_1···j_d}]_{j_1,...,j_d=1}^{n_1,...,n_d} ∈ C^{n_1×···×n_d} and p ∈ [1, ∞], the Hölder p-norm is defined by

    ‖A‖_{H,p} := (Σ_{j_1,...,j_d=1}^{n_1,...,n_d} |a_{j_1···j_d}|^p)^{1/p}.   (15.29)
‖A‖_{H,2} is often denoted ‖A‖_F and called the Frobenius norm or Hilbert–Schmidt norm of the hypermatrix A ∈ C^{n_1×···×n_d}.
For hypermatrices, the induced norm, natural norm, or operator norm is defined for p, q, r ∈ [1, ∞] by the quotient

    ‖A‖_{p,q,r} := max_{x,y,z≠0} |A(x, y, z)| / (‖x‖_p‖y‖_q‖z‖_r)   (15.30)

where

    A(x, y, z) = Σ_{i,j,k=1}^{l,m,n} a_{ijk} x_i y_j z_k.
For A ∈ C^{l×m×n} and p ∈ [1, ∞), the Schatten p-norm is

    ‖A‖_{S,p} := min{(Σ_{i=1}^r |λ_i|^p)^{1/p} : A = Σ_{i=1}^r λ_i u_i ⊗ v_i ⊗ w_i, ‖u_i‖_2 = ‖v_i‖_2 = ‖w_i‖_2 = 1, r ∈ N},   (15.31)

where the minimum is taken over all such decompositions; ‖A‖_{S,1} is called the nuclear norm. For hypermatrices A ∈ C^{n_1×···×n_d} of arbitrary order, use the obvious generalizations, and of course for p = ∞, replace the sum by max{|λ_1|, . . . , |λ_r|}.

The inner product on C^{n_1×···×n_d}, obtained from the dot product on C^{n_1···n_d} by viewing hypermatrices as complex vectors of dimension n_1 · · · n_d, will be denoted by

    ⟨A, B⟩ := Σ_{j_1,...,j_d=1}^{n_1,...,n_d} a_{j_1···j_d} b̄_{j_1···j_d}.   (15.32)
Facts:
Facts requiring proof for which no specific reference is given can be found in [DF93, Chap. I]
as well as [Lim] and the references therein.
1. The hypermatrix Hölder p-norm ‖A‖_{H,p} of A ∈ C^{n_1×···×n_d} is the (vector Hölder) p-norm of A when regarded as a vector of dimension n_1 · · · n_d (see Section 50.1).
2. The Hölder p-norm has the property of being multiplicative on rank-1 hypermatrices in the following sense:

    ‖u ⊗ v ⊗ · · · ⊗ z‖_{H,p} = ‖u‖_p‖v‖_p · · · ‖z‖_p,

where u ∈ C^{n_1}, v ∈ C^{n_2}, . . . , z ∈ C^{n_d} and the norms on these are the usual p-norms on vectors.
3. The Frobenius norm is the norm (length) in the inner product space C^{n_1×···×n_d} with Eq. (15.32).
4. The inner product (15.32) (and thus the Frobenius norm) is invariant under multilinear matrix multiplication by unitary matrices:

    ⟨(Q_1, . . . , Q_d) · A, (Q_1, . . . , Q_d) · B⟩ = ⟨A, B⟩

for any Q_1 ∈ U(n_1), . . . , Q_d ∈ U(n_d). (A numerical check appears at the end of these facts.)
12. [DF93] The nuclear norm defines a norm on C^{n_1×···×n_d} (norm as defined in Section 50.1).
13. The nuclear norm is invariant under multilinear unitary matrix multiplication, i.e.,

    ‖(Q_1, . . . , Q_d) · A‖_{S,1} = ‖A‖_{S,1}

for any Q_1 ∈ U(n_1), . . . , Q_d ∈ U(n_d).
14. The nuclear norm and the spectral norm are dual norms to each other, i.e.,

    ‖A‖_{S,1} = max{|⟨A, B⟩| : ‖B‖_{2,...,2} = 1},
    ‖A‖_{2,...,2} = max{|⟨A, B⟩| : ‖B‖_{S,1} = 1}.
15. Since Cn1 ×···×nd is finite dimensional, all norms are necessarily equivalent (see Sec-
tion 50.1). Given that all norms induce the same topology, questions involving con-
vergence of sequences of hypermatrices, whether a set of hypermatrices is closed, etc.,
are independent of the choice of norms.
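A numerical check of Fact 4 (referenced there), with random unitary matrices obtained from QR factorizations:

    import numpy as np

    rng = np.random.default_rng(7)
    A = rng.standard_normal((3, 4, 5)) + 1j * rng.standard_normal((3, 4, 5))
    Qs = [np.linalg.qr(rng.standard_normal((n, n))
                       + 1j * rng.standard_normal((n, n)))[0] for n in A.shape]
    B = np.einsum('ai,bj,ck,ijk->abc', Qs[0], Qs[1], Qs[2], A)
    assert np.isclose(np.linalg.norm(A), np.linalg.norm(B))  # Frobenius norms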
Examples:
1. [Der13a] Let T_n be the matrix multiplication tensor in Eq. (15.19). Then ‖T_n‖_F = n^{3/2}, ‖T_n‖_{S,1} = n³, and ‖T_n‖_{2,2,2} = 1.
15.9 Hyperdeterminants
There are two ways to extend the determinant of a matrix to hypermatrices of higher
order. One is to simply extend the usual expression of an n × n matrix determinant as
a sum of n! monomials in the entries of the matrix, which we will call the combinatorial
hyperdeterminant, and the other, which we will call the geometric hyperdeterminant, is by
using the characterization that a matrix has det A = 0 if and only if Ax = 0 has nonzero
solutions. Both approaches were proposed by Cayley [Cay49, Cay45], who also gave the
explicit expression of a 2 × 2 × 2 geometric hyperdeterminant. Gelfand, Kapranov, and Zelevinsky [GKZ94, GKZ92] gave a vast generalization of Cayley's result, describing the dimensions of hypermatrices for which geometric hyperdeterminants exist.
Definitions:

The combinatorial hyperdeterminant of a cubical d-hypermatrix A = [a_{i_1i_2···i_d}] ∈ F^{n×···×n} is defined as

    det(A) = (1/n!) Σ_{π_1,...,π_d∈S_n} sgn(π_1) · · · sgn(π_d) Π_{i=1}^n a_{π_1(i)···π_d(i)}.   (15.33)

The geometric hyperdeterminant Det_{n_1,...,n_d}(A), when it exists (see Fact 4), is the irreducible homogeneous polynomial in the entries of A ∈ C^{n_1×···×n_d}, unique up to a scalar multiple, whose vanishing characterizes the existence of a nontrivial solution to the system obtained by setting the gradient of the multilinear functional associated with A to zero (cf. Example 2).
Facts:
Facts requiring proof for which no specific reference is given may be found in [GKZ94,
Chap. 13].
2. For a cubical hypermatrix A ∈ F^{n×···×n},

    det(A) = Σ_{π_2,...,π_d∈S_n} sgn(π_2 · · · π_d) Π_{i=1}^n a_{iπ_2(i)···π_d(i)}.

For d = 2, this reduces to the usual expression for the determinant of an n × n matrix.
3. Taking the π-transpose of an order-d hypermatrix A ∈ F^{n×···×n} leaves the combinatorial hyperdeterminant invariant, i.e., det(A^π) = det(A) for any π ∈ S_d.
4. (Gelfand–Kapranov–Zelevinsky) A geometric hyperdeterminant exists for C^{n_1×···×n_d} if and only if for all k = 1, . . . , d, the following dimension condition is satisfied:

    n_k − 1 ≤ Σ_{j≠k} (n_j − 1).
6. The geometric hyperdeterminant exists for all cubical hypermatrices (and by definition is nontrivial when it exists, including for odd-order cubical hypermatrices). The geometric hyperdeterminant also exists for some non-cubical hypermatrices (see Fact 4).
7. For Cm×n , the dimension condition in Fact 4 is m ≤ n and n ≤ m, which may be
viewed as a reason why matrix determinants are only defined for square matrices.
8. It is in general nontrivial to find an explicit expression for Det_{n_1,...,n_d}(A). While systematic methods for finding it exist, notably one due to Schläfli [GKZ94], an expression may nonetheless contain a large number of terms when expressed as a sum of monomials. For example, Det_{2,2,2,2}(A) has more than 2.8 million monomial terms [GHS08] even though a hypermatrix A ∈ C^{2×2×2×2} has only 16 entries.
9. Unlike rank and border rank (cf. Facts 15.3.2 and 15.4.5), the geometric hyperde-
terminant is not invariant under multilinear matrix multiplication by nonsingular
matrices (which is expected since ordinary matrix determinant is not invariant under
left and right multiplications by nonsingular matrices either). However, it is relatively
invariant in the following sense [GKZ94].
Let n_1, . . . , n_d satisfy the dimension condition in Fact 4. Then for any A ∈ C^{n_1×···×n_d} and any X_1 ∈ GL(n_1, C), . . . , X_d ∈ GL(n_d, C),

    Det_{n_1,...,n_d}((X_1, . . . , X_d) · A) = det(X_1)^{m/n_1} · · · det(X_d)^{m/n_d} Det_{n_1,...,n_d}(A)   (15.34)

where m is the degree of Det_{n_1,...,n_d}. Hence for X_1 ∈ SL(n_1, C), . . . , X_d ∈ SL(n_d, C), we get

    Det_{n_1,...,n_d}((X_1, . . . , X_d) · A) = Det_{n_1,...,n_d}(A).
10. A consequence of Fact 9 is that the properties of the usual matrix determinant under row/column interchanges, addition of a scalar multiple of a row/column to another, etc., are also true for the hyperdeterminant. For notational convenience, we shall just state the following corollary for a 3-hypermatrix although it is true in general for any d-hypermatrix [GKZ94]. We use the notation in Example 15.1.2 where A_1, . . . , A_n denote the slices of the 3-hypermatrix A, analogues of rows or columns of a matrix. Let l, m, n satisfy the dimension condition in Fact 4 and let A = [A_1 | . . . | A_n] ∈ C^{l×m×n}. Then
(a) interchanging two slices leaves the hyperdeterminant invariant up to sign:

    Det_{l,m,n}([A_1|...|A_i|...|A_j|...|A_n]) = ± Det_{l,m,n}([A_1|...|A_j|...|A_i|...|A_n]);

(b) adding a scalar multiple of a slice to another leaves the hyperdeterminant invariant:

    Det_{l,m,n}([A_1|...|A_i|...|A_j|...|A_n]) = Det_{l,m,n}([A_1|...|A_i + αA_j|...|A_j|...|A_n])

for any α ∈ C;
(c) having two proportional slices implies that the hyperdeterminant vanishes:

    Det_{l,m,n}([A_1|...|A_i|...|αA_i|...|A_n]) = 0 for any α ∈ C.

In particular, the analogous properties hold for slices of A in any other fixed index.
11. Detn1 ,...,nd (A) = 0 if and only if 0 is a singular value of A.
Examples:

1. For A = [a_{ijk}] ∈ F^{2×2×2}, the combinatorial hyperdeterminant is

    det(A) = a_{111}a_{222} − a_{112}a_{221} − a_{121}a_{212} + a_{122}a_{211}.

This is immediate from Fact 2, or by applying the definition (15.33) and simplifying.
2. [Cay45] The geometric hyperdeterminant of A = [A_1 | A_2] = [a_{ijk}] ∈ C^{2×2×2} (where A_1 = [a_{ij1}] and A_2 = [a_{ij2}] are the slices of A, cf. Section 15.1) is

    Det_{2,2,2}(A) = (1/4)[det(A_1 + A_2) − det(A_1 − A_2)]² − 4 det(A_1) det(A_2),

which is precisely the discriminant of the quadratic polynomial t ↦ det(A_1 + tA_2).
The statement that the gradient ∇A(x, y, z) of the trilinear functional defined by A vanishes for some nonzero vectors x, y, z is the statement that the system of bilinear equations

    Σ_{j,k=1}^2 a_{ijk} y_j z_k = 0,   Σ_{i,k=1}^2 a_{ijk} x_i z_k = 0,   Σ_{i,j=1}^2 a_{ijk} x_i y_j = 0,   i, j, k = 1, 2,

has a solution with x, y, z all nonzero; Det_{2,2,2}(A) = 0 is precisely the condition for such a solution to exist.
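Cayley's formula in Example 2 is straightforward to implement, and the relative invariance (15.34) can be checked numerically. A minimal sketch (det222 is our naming; the matrices below are scaled to determinant ±1, for which the factors in (15.34) equal 1):

    import numpy as np

    def det222(A):
        """Cayley's hyperdeterminant of A in C^{2x2x2}, using the slices
        A_1 = A[..., 0], A_2 = A[..., 1] and the formula of Example 2."""
        A1, A2 = A[..., 0], A[..., 1]
        d = np.linalg.det
        return 0.25 * (d(A1 + A2) - d(A1 - A2)) ** 2 - 4 * d(A1) * d(A2)

    rng = np.random.default_rng(8)
    A = rng.standard_normal((2, 2, 2))
    X, Y, Z = (M / np.sqrt(abs(np.linalg.det(M)))
               for M in rng.standard_normal((3, 2, 2)))
    B = np.einsum('ai,bj,ck,ijk->abc', X, Y, Z, A)
    assert np.isclose(det222(B), det222(A))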
Applications:
1. In Example 15.3.4, we saw how one may characterize the notion of a pure state in terms of
the rank of a hypermatrix, namely rank(A) = 1. The hyperdeterminant characterizes the
dual notion — the degenerate entangled states are precisely the hypermatrices A ∈ C2×2×2
with Det2,2,2 (A) = 0 [MW02]. More generally, when Det2,2,2 (A) 6= 0, the magnitude of the
hyperdeterminant |Det2,2,2 (A)| gives another way to measure the amount of entanglement
called the concurrence [HW97].
Acknowledgment
The author would like to express his heartfelt gratitude to Leslie Hogben for her tireless
editorial efforts, transforming a manuscript that is an earlier version of [Lim] into the HLA
format that one sees here. It is an arduous undertaking without which this chapter would
not have made it in time for inclusion in this volume. The author would also like to thank
Harm Derksen, Shenglong Hu, J. M. Landsberg, Peter McCullagh, and Ke Ye for their very
helpful comments.
References
[AW92] W. A. Adkins and S. H. Weintraub. Algebra: An Approach Via Module Theory.
Springer-Verlag, New York, 1992.
[Bax78] R. J. Baxter. Solvable eight-vertex model on an arbitrary planar lattice. Philos. Trans.