Semidefinite Programming & Algebraic Geometry
main
2012/11/1
page v
i
Contents
List of Contributors
List of Figures
Preface
List of Notation
Semidefinite Optimization
Pablo A. Parrilo
2.1 From Linear to Semidefinite Optimization
2.2 Applications of Semidefinite Optimization
2.3 Algorithms and Software
Bibliography
4.3 The Hypercube Example
4.4 Symmetries, Dual Cones, and Facial Structure
4.5 Generalizing the Hypercube Example
4.6 Dual Cone of $\Sigma_{n,2d}$
4.7 Ranks of Extreme Rays of $\Sigma^*_{3,6}$ and $\Sigma^*_{4,4}$
4.8 Extracting Finite Point Sets
4.9 Volumes
4.10 Convex Forms
Bibliography
Dualities
Philipp Rostalski and Bernd Sturmfels
5.1 Introduction
5.2 Ingredients
5.3 The Optimal Value Function
5.4 An Algebraic View of Convex Hulls
5.5 Spectrahedra and Semidefinite Programming
5.6 Projected Spectrahedra
Bibliography
Semidefinite Representability
Jiawang Nie
6.1 Introduction
6.2 Spectrahedra
6.3 Projected Spectrahedra
6.4 Constructing Semidefinite Representations
Bibliography
Convex Hulls of Algebraic Sets
João Gouveia and Rekha R. Thomas
7.1 Introduction
7.2 The Method
7.3 Convergence of Theta Bodies
7.4 Combinatorial Optimization
Bibliography
Free Convexity
J. William Helton, Igor Klep, and Scott McCullough
8.1 Introduction
8.2 Basics of Noncommutative Polynomials and Their Convexity
8.3 Computer Algebra Support
8.4 A Gram-like Representation
8.5 Der Quadratische Positivstellensatz
8.6 Noncommutative Varieties with Positive Curvature Have Degree 2
8.7 Convex Semialgebraic Noncommutative Sets
8.8 From Free Real Algebraic Geometry to the Real World
Bibliography
Index
List of Contributors
Grigoriy Blekherman
Georgia Institute of Technology
Pablo A. Parrilo
Massachusetts Institute of Technology
João Gouveia
University of Coimbra
Mihai Putinar
University of California, Santa Barbara
Philipp Rostalski
University of Frankfurt and Drägerwerk AG & Co. KGaA, Lübeck
William Helton
University of California, San Diego
Igor Klep
The University of Auckland
Bernd Sturmfels
University of California, Berkeley
Scott McCullough
University of Florida
Jiawang Nie
University of California, San Diego
Rekha Thomas
University of Washington
List of Figures
2.1 Feasible sets of the primal and dual LP problems (2.1) and (2.2).
2.2 The shaded set is a spectrahedron, with a semidefinite representation given by (2.4).
2.3 A projected spectrahedron defined by (2.6).
2.4 A spectrahedron and its projection.
2.5 Feasible set of the primal SDP problem (2.7).
2.6 Unit balls of the spectral norm and the nuclear norm, for the space of $2 \times 2$ symmetric matrices.
2.7 A 3-ellipse, a 4-ellipse, and a 5-ellipse, each with its foci.
2.8 Petersen graph.
3.1 The discriminant $\mathrm{Dis}_x(p)$.
3.2 The zero set of the discriminant of the polynomial $x^4 + 4ax^3 + 6bx^2 + 4cx + 1$.
3.3 A three-dimensional convex set.
3.4 Relationships between set classes.
3.5 Convex hulls of the graphs of cubic polynomials on an interval.
3.6 Projection of a rounded solution.
3.7 The boundary of the domain of stability is defined by $f(a, b) = 0$.
3.8 Newton polytope of the polynomial $5 - xy - x^2y^2 + 3y^2 + x^4$.
3.9 The polynomials $p = 10 - x^2 - y$ and $(3 - \frac{y}{6})^2 + \frac{35}{36}y^2$ take exactly the same values on the unit circle $x^2 + y^2 = 1$.
3.10 Set of valid moments $(\mu_1, \mu_2, \mu_3)$ of a probability measure supported on $[-1, 1]$.
3.11
3.12
3.13
5.1
5.2
5.3 The unit balls for the $L_4$-norm and the $L_{4/3}$-norm are dual. The curve on the left has degree 4, while its dual curve on the right has degree 12.
5.4 The bicuspid curve in Example 5.25.
5.5 A quartic curve in the plane can have up to 28 real bitangents.
5.6 The convex hull of the curve $(\cos(\theta), \cos(2\theta), \sin(3\theta))$ in $\mathbb{R}^3$.
5.7 The curve on the unit sphere discussed in Examples 5.37 and 5.61.
5.8 The elliptope $P = E_3$ and its dual convex body $P^\circ$.
5.9 The discriminant in Example 5.59 defines a curve in the $(a, b)$-plane. The projected spectrahedron $C$ is the set of points where the ternary quartic $f_{a,b}$ is sos. The ranks of the corresponding sos matrices $Q$ are indicated.
5.10 Convex hull as intersection of half spaces.
5.11 Convex hull of the curve in Figure 5.7 and its dual convex body.
6.1 The TV screen $\{(x_1, x_2) : x_1^4 + x_2^4 \le 1\}$.
6.2 A line passing through $(0.5, 0)$ intersects the curve $x_1^3 - 3x_2^2 x_1 - (x_1^2 + x_2^2)^2 = 0$ in only 2 real points.
6.3 The shaded area is the union of $T_1$ and $T_2$ in Example 6.16.
6.4 The semialgebraic set of Example 6.18.
6.5 Projected spectrahedron defined in Example 6.19.
6.6 The convex set defined by $x_1^2 + x_2^2 \ge x_1^4 + x_1^2 x_2^2 + x_2^4$.
6.7 The convex set in Example 6.40.
7.1
7.2
7.3
7.4
7.5
7.6
7.7
7.8
7.9
7.10
7.11
7.12
7.13
7.14 On the left we see the cardioid $p(x) = 0$ and its convex hull. On the right we see the graph of $p$, its intersection with the plane $z = 0$, and the ellipsoidal region where the graph and the boundary of its convex hull differ.
7.15 Graph of the polynomial $x - x^2 - x^3 + x^4$, its convex hull, and intersection with the $x$-axis.
7.16 $\mathrm{TH}_2(I)$, $\mathrm{TH}_3(I)$, $\mathrm{TH}_4(I)$, and $\mathrm{TH}_5(I)$: all contain the origin in their interior.
7.17 The curved eight variety and its convex hull.
7.18 Serpentine curve and the closure of its convex hull.
7.19 5-wheel, partial 5-wheel, and Petersen graph.
Preface
In the past decade there has been a surge of interest in algebraic approaches to optimization problems defined in terms of multivariate polynomials. Fundamental mathematical challenges that arise in this program include understanding the structure of nonnegative polynomials, the interplay between efficiency and complexity of different representations of algebraic sets, and the development of effective algorithms. Remarkably, and perhaps unexpectedly, convexity provides a new viewpoint and a powerful framework for addressing these questions. This naturally brings us to the intersection of algebraic geometry, optimization, and convex geometry, with an emphasis on algorithms and computation. This emerging area has become known as convex algebraic geometry.
Our aim is to provide an accessible and unifying introduction to the many facets of this fast-growing interdisciplinary area. Each chapter addresses a fundamental aspect of convex algebraic geometry, ranging from the well-established core mathematical theory to the forefront of current research and open questions. Throughout we showcase the rich interactions between theory and applications.
This book is suitable as a textbook in a graduate course in mathematics and engineering. The chapters make connections to several areas of pure and applied mathematics and contain exercises at many levels, providing multiple entry points for readers with varied backgrounds.
We thank the National Science Foundation for funding a Focused Research Group grant (2008–2011) awarded to Bill Helton, Jiawang Nie, Pablo A. Parrilo, Bernd Sturmfels, and Rekha R. Thomas. This award enabled a flurry of research activity in semidefinite optimization and convex algebraic geometry. Several workshops and conferences were organized under this grant's support. In particular this book was inspired by the lectures at the workshop LMIPO organized by Bill Helton and Jiawang Nie at the University of California, San Diego in March 2010.
We thank all our contributors for their hard work and perseverance through multiple rounds of edits. We also thank Tom Liebling, Sara Murphy, and Ann Manning Allen at SIAM for their support and patience with the production of this book. Special thanks to our students and colleagues who read versions of this book and sent us comments, in particular Chris Aholt, Hamza Fawzi, Fabiana Ferracina, Alexander Fuchs, Chris Jordan-Squire, Frank Permenter, James Pfeiffer, Stefan Richter, Richard Robinson, Raman Sanyal, James Saunderson, Rainer Sinn, and Thao Vuong.
Greg Blekherman1
Atlanta, GA
Pablo A. Parrilo2
Cambridge, MA
Rekha R. Thomas3
Seattle, WA
1 The work of Greg Blekherman was supported by a Sloan Fellowship, NSF grant DMS-0757212, the Mittag-Leffler Institute Sweden, and IPAM UCLA.
2 The work of Pablo A. Parrilo was supported by NSF grant DMS-0757207 and a Finmeccanica
Career Development Chair.
3 The work of Rekha R. Thomas was supported by NSF grants DMS-0757371 and DMS-1115293
and a Robert R. and Elaine F. Phelps Endowed Professorship.
List of Notation
Basics:
fields, rings: $\mathbb{R}, \mathbb{C}, \mathbb{P}, \mathbb{Q}, \mathbb{Z}$
nonnegative integers: $\mathbb{N}$
nonnegative orthant: $\mathbb{R}^n_+$
positive orthant: $\mathbb{R}^n_{++}$
standard simplex in $\mathbb{R}^n_+$: $\Delta_n := \{x \in \mathbb{R}^n_+ : \sum x_i = 1\}$
standard basis vectors: $e_i$
Matrices:
$m \times n$ matrices: $\mathbb{R}^{m \times n}$
matrix brackets: $[\;]$
$n \times n$ symmetric matrices: $S^n$
$n \times n$ positive semidefinite matrices: $S^n_+$
$n \times n$ positive definite matrices: $S^n_{++}$
inner product in $S^n$: $\langle A, B \rangle$
matrix multiplication: $AB$
trace: $\mathrm{Tr}$
matrix transpose: $A^T$
determinant: $\det M$
rank: $\mathrm{rank}\, M$
diagonal of a matrix $M$ as a vector: $\mathrm{diag}(M)$
diagonal matrix obtained from a matrix $M$: $\mathrm{Diag}(M)$
lower triangular matrix from matrix $M$: $\mathrm{Tril}(M)$
turning a vector $v$ into a diagonal matrix: $\mathrm{Diag}(v)$
block diagonal matrix with blocks $A$, $B$, etc.: $\mathrm{BlockDiag}(A, B, \ldots)$
positive semidefinite: $\succeq 0$
positive definite: $\succ 0$
max/min eigen/singular value: $\lambda_{\max}, \lambda_{\min}$
Geometry:
$p$-norm: $\|u\|_p$
ball with center $u$, radius $r$: $B(u, r)$
vector space dual: $V^*$
orthogonal complement of vector space: $V^\perp$
dimension: $\dim V$
codimension: $\mathrm{codim}\, V$
cone dual: $C^*$
polar dual of convex body: $P^\circ$
dual face to an exposed face: $F^\diamond$
dual variety: $X^*$
interior of a set: $\mathrm{int}(C)$
boundary of set: $\partial C$
algebraic boundary: $\partial_a C$
closure of set: $\mathrm{cl}(C)$ or $\overline{C}$
convex hull of set $C$: $\mathrm{conv}(C)$
conical hull of set $C$: $\mathrm{cone}(C)$
gauge function of a convex body $K$: $G_K(x)$
Optimization:
optimal solution: $u^*$
semidefinite program: SDP
$k$th theta body of ideal $I$: $\mathrm{TH}_k(I)$
characteristic vector of a set $S$: $\chi^S$
Algebra:
ideal generated by
variety of ideal
vanishing ideal of a set
Jacobian
gradient
Hessian
singular locus
smooth points in a variety
polynomial ring in n variables
polynomials in n variables, degree at most d
if n clear
monomials of degree at most d
Nn (for exponents of monomials)
nonnegative polynomials in n variables, degree
at most 2d
if n is clear
sum of squares in n variables of degree at
most 2d
if n is clear
forms in n variables, degree equal to d
if n clear
monomials of degree d
nonnegative forms in n variables, degree 2d
if n is clear
sos forms in n variables of degree 2d
if n is clear
f1 , . . . , fm
VR (I), VC (I)
I(S)
Jac( )
2
Sing( )
Xreg
R[x], C[x]
R[x]n,d
R[x]d
[x]d
|| = i
Pn,2d
P2d
n,2d
2d
R[x]n,d
R[x]d
[x]d
Pn,2d
P2d
n,2d
2d
sos polynomials mod an ideal I
polynomials in R[x]n,d that are k-sos mod I
if n is clear
affine linear polynomials in above
Newton polytope of f
linear functionals on R[x]
linear functionals that are evaluations at v
quadratic forms on R[x]n,d
nonnegative quadratic forms in S n,d
preorder of g1 , . . . , gm /truncated
quadratic module of g1 , . . . , gm /truncated
(I)
kn,d (I)
kd (I)
k1 (I)
N (f )
v
S n,d
n,d
S+
preorder(g1 , . . . , gm ),
preorderk (g1 , . . . , gm )
qmodule(g1 , . . . , gm ),
qmodulek (g1 , . . . , gm )
Chapter 1
What is Convex
Algebraic Geometry?
Chapter 2
Semidefinite Optimization
Pablo A. Parrilo
In this chapter we introduce one of the core theoretical and computational techniques in convex algebraic geometry, namely, semidefinite optimization. We begin by reviewing linear programming and proceed to define and discuss semidefinite programs from the algebraic, geometric, and computational perspectives. We define spectrahedra as the feasible sets of semidefinite programs, study their properties, and discuss numerous examples. Despite the many parallels, the duality theory of semidefinite optimization is more complicated than in the case of linear programming, and we elaborate on the similarities and differences. We also showcase a number of applications of semidefinite optimization in several areas of applied mathematics and engineering and give a short discussion of algorithmic and software aspects. For the convenience of the reader, we present additional background material on convex geometry and optimization in Appendix A.
2.1 From Linear to Semidefinite Optimization
Semidefinite optimization is a branch of convex optimization that is of great theoretical and practical interest. Informally, the main idea is to generalize linear programming and the associated feasible sets (polyhedra) to the case where the decision variables are symmetric matrices, and the inequalities are to be understood as matrices being positive semidefinite. Formal definitions and examples will be presented shortly in Subsection 2.1.2, preceded by a review of the familiar case of linear programming. A few selected standard references for linear programming and its applications are the books [5, 12, 29, 42].
2.1.1 Linear Programming
Linear programming is the problem of minimizing a linear function subject to linear constraints. A linear programming problem (LP) in standard form is usually written as
\[
\begin{array}{ll}
\text{minimize} & c^T x \\
\text{subject to} & Ax = b, \\
 & x \ge 0,
\end{array}
\tag{LP-P}
\]
where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and we are minimizing over the decision variable $x \in \mathbb{R}^n$. The inequality $x \ge 0$ is interpreted componentwise, i.e., $x_i \ge 0$ for $i = 1, \ldots, n$.
Geometrically, an LP problem has a nice and natural interpretation. Its feasible set is the intersection of an affine subspace (defined by the equations $Ax = b$) and the nonnegative orthant. Since it is the intersection of two convex sets, the feasible set of (LP-P) is always convex. In general, a set defined by finitely many linear inequalities or equations is called a polyhedron, and it is always convex. Thus, linear programming corresponds exactly to the minimization of a linear function over a polyhedron. If a polyhedron is bounded, it is called a polytope.
Perhaps one of the most remarkable and useful features of linear programming is that to every LP problem we can associate a corresponding dual problem. This is another LP problem (its dual LP), which for the case of (LP-P) is
\[
\begin{array}{ll}
\text{maximize} & b^T y \\
\text{subject to} & A^T y \le c.
\end{array}
\tag{LP-D}
\]
Notice that here we are again optimizing a linear function over a polyhedron. As we will see, there are very natural and direct algebraic relationships between the primal problem (LP-P) and its dual problem (LP-D).
Remark 2.1. In practice, LP problems may not naturally present themselves in
the form (LP-P), where all the decision variables are nonnegative and only equality
constraints are present, or the form (LP-D), where there are no sign restrictions
on the variables and only inequalities appear. However, they can always be put in
either form, by introducing additional slack variables and/or splitting variables if
necessary. The details can be found in any textbook on linear programming.
Example 2.2. Consider the following LP problem:
\[
\begin{array}{lrcl}
\text{minimize} & x_1 - 8x_2 & & \\
\text{subject to} & -x_1 + 3x_2 + x_3 & = & 4, \\
 & 4x_1 - x_2 + x_4 & = & 6, \\
 & x_1, x_2, x_3, x_4 & \ge & 0.
\end{array}
\tag{2.1}
\]
The feasible region is a two-dimensional polyhedron. Its projection into the $(x_1, x_2)$-plane is drawn in Figure 2.1. Notice that the optimal solution is achieved at a vertex, namely, $x^* = (2, 2, 0, 0)$, with optimal cost $p^* = -14$.
Figure 2.1. Feasible sets of the primal and dual LP problems (2.1) and (2.2).
The corresponding dual LP is
\[
\begin{array}{lrcl}
\text{maximize} & 4y_1 + 6y_2 & & \\
\text{subject to} & -y_1 + 4y_2 & \le & 1, \\
 & 3y_1 - y_2 & \le & -8, \\
 & y_1 & \le & 0, \\
 & y_2 & \le & 0.
\end{array}
\tag{2.2}
\]
The dual feasible set $(y_1, y_2)$ is presented in the same figure, with optimal solution $y^* = (-\frac{31}{11}, -\frac{5}{11})$ and optimal cost $d^* = -14$. For this example we have $p^* = d^* = -14$, and thus the optimal values of the primal and dual problems are the same.
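The optimal vertex of Example 2.2 can be checked by brute force. The sketch below is only an illustration, not a method from the text: it enumerates the basic feasible solutions of (2.1) in exact rational arithmetic and compares the best primal cost with the dual cost at the quoted $y^*$. The signs of the problem data are an assumption, recovered from the stated optimum $x^* = (2, 2, 0, 0)$, and the helper names are hypothetical. Enumerating vertices finds the optimum here because the feasible polygon is bounded.

```python
from fractions import Fraction as F
from itertools import combinations

# Data of Example 2.2 in the standard form (LP-P). The signs are an
# assumption recovered from the stated optimum x* = (2, 2, 0, 0).
A = [[F(-1), F(3), F(1), F(0)],
     [F(4), F(-1), F(0), F(1)]]
b = [F(4), F(6)]
c = [F(1), F(-8), F(0), F(0)]


def solve_2x2(M, rhs):
    """Solve a 2x2 linear system exactly; return None if it is singular."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0:
        return None
    return [(rhs[0] * M[1][1] - M[0][1] * rhs[1]) / det,
            (M[0][0] * rhs[1] - rhs[0] * M[1][0]) / det]


def basic_feasible_solutions():
    """Pick two basic columns, set the other variables to zero, keep x >= 0."""
    vertices = []
    for i, j in combinations(range(4), 2):
        sol = solve_2x2([[A[0][i], A[0][j]], [A[1][i], A[1][j]]], b)
        if sol is None or min(sol) < 0:
            continue
        x = [F(0)] * 4
        x[i], x[j] = sol
        vertices.append(x)
    return vertices


def cost(x):
    """Primal objective c^T x."""
    return sum(ci * xi for ci, xi in zip(c, x))


best = min(basic_feasible_solutions(), key=cost)
y_star = [F(-31, 11), F(-5, 11)]   # dual solution quoted in the text
dual_value = sum(bi * yi for bi, yi in zip(b, y_star))
print(best, cost(best), dual_value)
```

Exact fractions are used so that the equality of the primal and dual values is not blurred by rounding.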
Even in this simple example, we can observe many of the important features
of linear programming. The following facts are well known.
Geometry of the feasible set: The feasible sets of linear programs are polyhedra. The geometry of polyhedra is quite well understood. In particular, the Minkowski–Weyl theorem (e.g., Appendix A, [5], or [48, Section 1.1]) states that every polyhedron $P$ is finitely generated, i.e., it can be written as
\[
P = \mathrm{conv}(u_1, \ldots, u_r) + \mathrm{cone}(v_1, \ldots, v_s),
\]
where $u_i$, $v_i$ are the vertices and extreme rays of $P$, respectively, and the convex hull and conical hull are defined by
\[
\mathrm{conv}(u_1, \ldots, u_r) = \Big\{ \sum_{i=1}^{r} \lambda_i u_i \;:\; \sum_{i=1}^{r} \lambda_i = 1,\ \lambda_i \ge 0,\ i = 1, \ldots, r \Big\}
\]
and
\[
\mathrm{cone}(v_1, \ldots, v_s) = \Big\{ \sum_{i=1}^{s} \lambda_i v_i \;:\; \lambda_i \ge 0,\ i = 1, \ldots, s \Big\}.
\]
Weak duality: For any feasible solutions $x$, $y$ of the primal and dual problems, we have
\[
c^T x - b^T y = c^T x - (Ax)^T y = x^T (c - A^T y) \ge 0,
\tag{2.3}
\]
where the last inequality follows from the feasibility conditions $x \ge 0$ and $A^T y \le c$. Thus, from any feasible dual solution one can obtain a lower bound on the value of the primal. Conversely, primal feasible solutions give upper bounds on the value of the dual.
Strong duality: If both primal and dual problems are feasible, then they achieve exactly the same optimal value, and there exist optimal feasible solutions $x^*$, $y^*$ such that $c^T x^* = b^T y^*$. This is a consequence of the separation theorems for convex sets; see, e.g., Section A.3.3 in Appendix A.
Complementary slackness: Strong duality, combined with (2.3), implies that at optimality we must have
\[
x_i^* (c - A^T y^*)_i = 0, \qquad i = 1, \ldots, n.
\]
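Weak duality and complementary slackness are easy to check numerically on Example 2.2. The following sketch is illustrative only: the problem data carry the signs assumed from the stated optimum, the feasible pair `x_feas`, `y_feas` is chosen here for the demonstration, and the function names are hypothetical.

```python
from fractions import Fraction as F

# Data of Example 2.2 (signs assumed from the stated optimum).
A = [[-1, 3, 1, 0],
     [4, -1, 0, 1]]
b = [4, 6]
c = [1, -8, 0, 0]


def dual_slack(y):
    """Return s = c - A^T y; dual feasibility means every entry is >= 0."""
    return [c[j] - sum(A[i][j] * y[i] for i in range(2)) for j in range(4)]


def duality_gap(x, y):
    """c^T x - b^T y, which equals x^T (c - A^T y) whenever Ax = b."""
    primal = sum(cj * xj for cj, xj in zip(c, x))
    dual = sum(bi * yi for bi, yi in zip(b, y))
    return primal - dual


# A feasible (but not optimal) pair chosen for illustration.
x_feas = [F(1), F(1), F(2), F(3)]    # satisfies Ax = b, x >= 0
y_feas = [F(-3), F(-1)]              # satisfies A^T y <= c

# The optimal pair quoted in the text.
x_star = [F(2), F(2), F(0), F(0)]
y_star = [F(-31, 11), F(-5, 11)]

print(duality_gap(x_feas, y_feas))   # nonnegative by weak duality
print(duality_gap(x_star, y_star))   # zero at optimality
print([xi * si for xi, si in zip(x_star, dual_slack(y_star))])
```

At the optimal pair every product $x_i^* (c - A^T y^*)_i$ vanishes, while for the non-optimal pair the gap is strictly positive.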
Exercise 2.4. Consider the set of $n \times n$ matrices with nonnegative entries that have all row and column sums equal to 1 (i.e., the doubly stochastic matrices).
1. Write explicitly the equations and inequalities describing this set for $n = 2, 3, 4$.
2. Compute (using CDD, lrs, or other software; see Section 2.3.2) all the extreme points of these polytopes.
3. How many extreme points did you find? What is the structure of the extreme points? Can you conjecture what happens for arbitrary values of $n$?
4. Google "Birkhoff–von Neumann theorem," and check your guess.
2.1.2 Semidefinite Programming
\[
\sum_{i=1}^{m} A_i x_i \succeq 0,
\]
Figure 2.2. The shaded set is a spectrahedron, with a semidefinite representation given by (2.4).
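Definition 2.6. A set $S \subset \mathbb{R}^m$ is a spectrahedron if it has the form
\[
S = \Big\{ (x_1, \ldots, x_m) \in \mathbb{R}^m \;:\; A_0 + \sum_{i=1}^{m} A_i x_i \succeq 0 \Big\},
\]
for some given symmetric matrices $A_0, A_1, \ldots, A_m$.
Example 2.7. Consider the spectrahedron in $\mathbb{R}^2$ given by
\[
\left\{ (x, y) \in \mathbb{R}^2 \;:\; A(x, y) := \begin{pmatrix} x + 1 & 0 & y \\ 0 & 2 & -x - 1 \\ y & -x - 1 & 2 \end{pmatrix} \succeq 0 \right\}.
\tag{2.4}
\]
This set is shown in Figure 2.2. To obtain scalar inequalities defining the set, let $p_A(t) = \det(tI - A(x, y)) = t^3 + p_2 t^2 + p_1 t + p_0$ be the characteristic polynomial of $A(x, y)$.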
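One standard way to turn the characteristic polynomial into scalar inequalities uses the fact that a symmetric matrix has only real eigenvalues, so it is positive semidefinite exactly when the coefficients of $\det(tI - A)$ alternate in sign; for a $3 \times 3$ matrix this reads $p_2 \le 0$, $p_1 \ge 0$, $p_0 \le 0$. The sketch below applies this test to the matrix in (2.4). It is a minimal illustration: the entry signs of $A(x, y)$ are an assumption (taken as $-x - 1$ in the off-diagonal positions), and the function names are hypothetical.

```python
def A_matrix(x, y):
    """The 3x3 matrix A(x, y) of (2.4); the entry signs are an assumption."""
    return [[x + 1, 0, y],
            [0, 2, -x - 1],
            [y, -x - 1, 2]]


def charpoly_coeffs(M):
    """Coefficients (p2, p1, p0) of det(tI - M) = t^3 + p2 t^2 + p1 t + p0."""
    (a, b, c), (d, e, f), (g, h, i) = M
    trace = a + e + i
    # Sum of the three 2x2 principal minors.
    minors2 = (a * e - b * d) + (a * i - c * g) + (e * i - f * h)
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return -trace, minors2, -det


def in_spectrahedron(x, y):
    """Symmetric 3x3 matrix is PSD iff p2 <= 0, p1 >= 0, and p0 <= 0."""
    p2, p1, p0 = charpoly_coeffs(A_matrix(x, y))
    return p2 <= 0 and p1 >= 0 and p0 <= 0


for point in [(0, 0), (0, 1), (2, 0), (-2, 0)]:
    print(point, in_spectrahedron(*point))
```

With these signs, $-p_0 = \det A(x, y) = 3 + x - 3x^2 - x^3 - 2y^2$, so the boundary of the set lies on a cubic curve.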
Definition 2.8. A set $S \subset \mathbb{R}^m$ is a projected spectrahedron if it has the form
\[
S = \Big\{ (x_1, \ldots, x_m) \in \mathbb{R}^m \;:\; \exists (y_1, \ldots, y_p) \in \mathbb{R}^p,\ A_0 + \sum_{i=1}^{m} A_i x_i + \sum_{j=1}^{p} B_j y_j \succeq 0 \Big\},
\tag{2.5}
\]
where $A_0, A_1, \ldots, A_m, B_1, \ldots, B_p$ are given symmetric matrices.
As the name indicates, geometrically this corresponds to a spectrahedron in $\mathbb{R}^{m+p}$ that is projected under the linear map $\pi : \mathbb{R}^{m+p} \to \mathbb{R}^m$, $(x, y) \mapsto x$. Since spectrahedra are semialgebraic sets, by the Tarski–Seidenberg theorem (Section A.4.4 in Appendix A) projected spectrahedra are also semialgebraic. Thus, they can be defined in terms of finite unions of sets defined by polynomial inequalities involving only the variables $x_i$, although in practice it is not always easy or convenient to do so.
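Example 2.9. Consider the projected spectrahedron in $\mathbb{R}^2$ given by
\[
\left\{ (x, y) \in \mathbb{R}^2 \;:\; \exists z \in \mathbb{R},\ \begin{pmatrix} z + y & 2z - x \\ 2z - x & z - y \end{pmatrix} \succeq 0,\ z \le 1 \right\}.
\tag{2.6}
\]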
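For a set as small as (2.6), the lifting variable $z$ can be eliminated by hand, which gives a concrete feel for the projection. The $2 \times 2$ block is positive semidefinite exactly when $z \ge |y|$ and $3z^2 - 4xz + x^2 + y^2 \le 0$, so the admissible $z$ form an interval. The sketch below is an illustration under the assumption that (2.6) reads as the block with entries $z + y$, $2z - x$, $z - y$ together with $z \le 1$; the function names are hypothetical.

```python
import math


def lifted_interval(x, y):
    """Interval of z with the 2x2 block of (2.6) PSD and z <= 1, or None."""
    disc = x * x - 3.0 * y * y          # from 3z^2 - 4xz + x^2 + y^2 <= 0
    if disc < 0:
        return None
    root = math.sqrt(disc)
    lo = max((2.0 * x - root) / 3.0, abs(y))   # det >= 0 and z >= |y|
    hi = min((2.0 * x + root) / 3.0, 1.0)      # det >= 0 and z <= 1
    return (lo, hi) if lo <= hi else None


def in_projection(x, y):
    """(x, y) lies in the projected set iff some admissible z exists."""
    return lifted_interval(x, y) is not None


def block_is_psd(x, y, z, tol=1e-12):
    """Direct 2x2 test: nonnegative diagonal entries and determinant."""
    a, b, c = z + y, 2.0 * z - x, z - y
    return a >= -tol and c >= -tol and a * c - b * b >= -tol


for point in [(1.0, 0.0), (1.0, 0.5), (3.5, 0.0), (-1.0, 0.0), (1.0, 1.0)]:
    print(point, in_projection(*point))
```

Any $z$ in the returned interval is a certificate of membership, which can be confirmed with `block_is_psd`.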
Figure 2.3. A projected spectrahedron defined by (2.6).
Figure 2.4. A spectrahedron and its projection.
(see Section 5.6 in Chapter 5), having a representation of the form (2.5) will often
be enough for optimization purposes.
Exercise 2.10. Both spectrahedra and projected spectrahedra are convex sets.
Show that spectrahedra are always closed sets. What about projected spectrahedra?
Primal SDP formulation. Semidefinite programs are linear optimization problems over spectrahedra. An SDP problem in standard primal form is written as
\[
\begin{array}{ll}
\text{minimize} & \langle C, X \rangle \\
\text{subject to} & \langle A_i, X \rangle = b_i, \quad i = 1, \ldots, m, \\
 & X \succeq 0,
\end{array}
\tag{SDP-P}
\]
where $C, A_i \in S^n$, and $\langle X, Y \rangle := \mathrm{Tr}(X^T Y) = \sum_{ij} X_{ij} Y_{ij}$. The matrix $X \in S^n$ is the variable over which the minimization is performed. The inequality in the third line means that the matrix $X$ must be positive semidefinite. Notice the strong formal similarities to the LP formulation (LP-P). As we will see in Section 2.1.4, this formal analogy can be pushed even further to conic optimization problems.
Let us make a few quick comments before presenting examples of semidefinite programs. The set of feasible solutions of (SDP-P), i.e., the set of matrices $X$ that satisfy the constraints, is a spectrahedron, and thus it is always convex. This follows directly from the fact that the feasible set is the intersection of an affine subspace and the positive semidefinite cone $S^n_+$, both of which are convex sets. However, unlike the linear programming case, in general the set of feasible solutions will not be polyhedral.
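Example 2.11. Consider the semidefinite optimization problem
\[
\begin{array}{ll}
\text{minimize} & 2x_{11} + 2x_{12} \\
\text{subject to} & x_{11} + x_{22} = 1, \\
 & \begin{pmatrix} x_{11} & x_{12} \\ x_{12} & x_{22} \end{pmatrix} \succeq 0.
\end{array}
\tag{2.7}
\]
This problem has the form (SDP-P) with $m = 1$ and
\[
C = \begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix}, \qquad A_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad b_1 = 1.
\]
The constraints are satisfied if and only if $x_{11}(1 - x_{11}) \ge x_{12}^2$, and thus the feasible set is a closed disk, which is not polyhedral. Figure 2.5 shows the feasible set, parametrized by the variables $(x_{11}, x_{12})$. The optimal solution is equal to
\[
X^* = \begin{pmatrix} \frac{2 - \sqrt{2}}{4} & -\frac{1}{2\sqrt{2}} \\ -\frac{1}{2\sqrt{2}} & \frac{2 + \sqrt{2}}{4} \end{pmatrix},
\]
with optimal cost $1 - \sqrt{2}$.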
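Figure 2.5. Feasible set of the primal SDP problem (2.7).
The dual problem associated with (SDP-P) is
\[
\begin{array}{ll}
\text{maximize} & b^T y \\
\text{subject to} & \displaystyle\sum_{i=1}^{m} A_i y_i \preceq C,
\end{array}
\tag{SDP-D}
\]
where $y \in \mathbb{R}^m$. For any primal feasible $X$ and dual feasible $y$, we have
\[
\langle C, X \rangle - b^T y = \langle C, X \rangle - \sum_{i=1}^{m} y_i \langle A_i, X \rangle = \Big\langle C - \sum_{i=1}^{m} A_i y_i,\ X \Big\rangle \ge 0,
\tag{2.8}
\]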
where the last inequality follows from the fact that the inner product of two positive semidefinite matrices is nonnegative. From (SDP-P) and (SDP-D) we can see that the left-hand side of (2.8) is the difference between the primal and dual objective functions. The inequality in (2.8) tells us that the value of the primal objective function evaluated at any feasible matrix $X$ is always greater than or equal to the dual objective function at any dual feasible $y$. This is known as weak duality. Thus, we can use any $X$ for which (SDP-P) is feasible to compute an upper bound for the value of $b^T y$ in (SDP-D), and we can also use any feasible $y$ of (SDP-D) to compute a lower bound for the value of $\langle C, X \rangle$ in (SDP-P). Furthermore, in the case of feasibility problems (i.e., $C = 0$), the dual problem can be used to certify nonexistence of solutions to the primal problem. This property will be crucial in our later developments.
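Weak duality for semidefinite programs can be observed on the data of Example 2.11. The sketch below is an illustration under stated assumptions: the dual constraint is taken to be $C - \sum_i A_i y_i \succeq 0$, as in the complementary slackness condition (2.9), the matrices $C$ and $A_1$ are read off from (2.7), and the feasible pair is chosen here for the demonstration. The function names are hypothetical.

```python
def inner(X, Y):
    """Trace inner product <X, Y> = sum_ij X_ij Y_ij for 2x2 matrices."""
    return sum(X[i][j] * Y[i][j] for i in range(2) for j in range(2))


def is_psd_2x2(M, tol=1e-12):
    """Symmetric 2x2 matrix is PSD iff diagonal and determinant are >= 0."""
    return (M[0][0] >= -tol and M[1][1] >= -tol
            and M[0][0] * M[1][1] - M[0][1] * M[1][0] >= -tol)


C = [[2.0, 1.0], [1.0, 0.0]]     # <C, X> = 2 x11 + 2 x12
A1 = [[1.0, 0.0], [0.0, 1.0]]    # <A1, X> = x11 + x22
b1 = 1.0


def slack(y):
    """Dual slack matrix C - y * A1 (assumed form of the dual constraint)."""
    return [[C[i][j] - y * A1[i][j] for j in range(2)] for i in range(2)]


X = [[0.5, 0.0], [0.0, 0.5]]     # primal feasible: trace one and PSD
y = -1.0                         # dual feasible: C + I is PSD

gap = inner(C, X) - b1 * y       # primal objective minus dual objective
print(gap, inner(slack(y), X))   # the two numbers agree, as in (2.8)
```

The gap is the inner product of two positive semidefinite matrices, which is why it cannot be negative.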
If $X$ and $Y$ are positive semidefinite matrices, then $\langle X, Y \rangle = 0$ if and only if $XY = YX = 0$ (e.g., Corollary A.24). Thus, the expression (2.8) allows us to give a simple sufficient characterization of optimality.
Lemma 2.12 (optimality conditions for SDP). Assume $(X, y)$ are primal and dual feasible solutions of (SDP-P) and (SDP-D), respectively, that satisfy the complementary slackness condition
\[
\Big( C - \sum_{i=1}^{m} A_i y_i \Big) X = 0
\tag{2.9}
\]
(and thus achieve the same cost $\langle C, X \rangle = b^T y$). Then, $(X, y)$ are primal and dual optimal solutions of the SDP problem.
In general, the converse statement may require some additional assumptions, to be discussed shortly.
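Example 2.13. Here we continue Example 2.11. The SDP dual to (2.7) is
\[
\begin{array}{ll}
\text{maximize} & y \\
\text{subject to} & \begin{pmatrix} 2 - y & 1 \\ 1 & -y \end{pmatrix} \succeq 0.
\end{array}
\]
At the optimal solution $y^* = 1 - \sqrt{2}$, the complementary slackness condition (2.9) holds:
\[
\Big( C - \sum_{i=1}^{m} A_i y_i^* \Big) X^* = \begin{pmatrix} 1 + \sqrt{2} & 1 \\ 1 & \sqrt{2} - 1 \end{pmatrix} \begin{pmatrix} \frac{2 - \sqrt{2}}{4} & -\frac{1}{2\sqrt{2}} \\ -\frac{1}{2\sqrt{2}} & \frac{2 + \sqrt{2}}{4} \end{pmatrix} = 0.
\]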
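The optimality certificate of Examples 2.11 and 2.13 can be verified in floating point. The sketch below is only a numerical check under the assumption that the optimal pair is $X^*$ with entries $(2 \mp \sqrt{2})/4$ and $-1/(2\sqrt{2})$, and $y^* = 1 - \sqrt{2}$; it confirms that the primal and dual costs agree and that the product in (2.9) vanishes. The helper `matmul` is hypothetical.

```python
import math

s = math.sqrt(2.0)

# Assumed primal optimum of (2.7) and dual optimum of its dual.
X_star = [[(2 - s) / 4, -1 / (2 * s)],
          [-1 / (2 * s), (2 + s) / 4]]
y_star = 1 - s

# Dual slack matrix C - y* I with C = [[2, 1], [1, 0]].
Z_star = [[2 - y_star, 1.0],
          [1.0, -y_star]]


def matmul(P, Q):
    """Product of two 2x2 matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


primal_cost = 2 * X_star[0][0] + 2 * X_star[0][1]
product = matmul(Z_star, X_star)

print(primal_cost, y_star)   # both are 1 - sqrt(2) up to rounding
print(product)               # entries vanish up to rounding
```

Because both matrices are positive semidefinite and their product is zero, Lemma 2.12 certifies optimality without solving either problem.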
As opposed to the linear programming case, strong duality may fail in general semidefinite programming. We present below a simple example (from [36]), for which both the primal and dual problems are feasible, but their optimal values are different (i.e., there is a nonzero finite duality gap). Further examples and a detailed discussion will be presented in Section 2.1.5.
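Example 2.14. Let $\alpha \ge 0$, and consider the primal-dual pair
\[
\begin{array}{ll}
\text{minimize} & \alpha X_{11} \\
\text{subject to} & X_{22} = 0, \\
 & X_{11} + 2X_{23} = 1, \\
 & X \succeq 0,
\end{array}
\qquad
\begin{array}{ll}
\text{maximize} & y_2 \\
\text{subject to} & \begin{pmatrix} y_2 & 0 & 0 \\ 0 & y_1 & y_2 \\ 0 & y_2 & 0 \end{pmatrix} \preceq \begin{pmatrix} \alpha & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
\end{array}
\]
For a primal feasible point, $X$ being positive semidefinite and $X_{22} = 0$ imply $X_{23} = 0$, and thus $X_{11} = 1$. The primal optimal cost $p^*$ is then equal to $\alpha$ (and is achieved). On the dual side, the vanishing of the $(3, 3)$ entry implies that $y_2$ must be zero, and thus $d^* = 0$. The duality gap $p^* - d^*$ is then equal to $\alpha$.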
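The mechanism behind the gap in Example 2.14 is that a zero diagonal entry of a positive semidefinite matrix forces its whole row and column to vanish. The sketch below illustrates this with a principal-minor test; it assumes the data $C = \alpha E_{11}$, $A_1 = E_{22}$, $A_2 = E_{11} + E_{23} + E_{32}$, $b = (0, 1)$, and the function names are hypothetical.

```python
from itertools import combinations


def det(M):
    """Determinant by cofactor expansion (adequate for tiny matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def is_psd(M, tol=1e-12):
    """Symmetric matrix is PSD iff every principal minor is nonnegative."""
    n = len(M)
    return all(det([[M[i][j] for j in idx] for i in idx]) >= -tol
               for k in range(1, n + 1)
               for idx in combinations(range(n), k))


def dual_slack(alpha, y1, y2):
    """C - y1*A1 - y2*A2 for the assumed data of Example 2.14."""
    return [[alpha - y2, 0.0, 0.0],
            [0.0, -y1, -y2],
            [0.0, -y2, 0.0]]


alpha = 1.0
# Primal feasible point: X22 = 0 and X11 + 2*X23 = 1 force X11 = 1.
X = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
primal_value = alpha * X[0][0]
dual_value = 0.0   # attained at y = (0, 0); any y2 != 0 is infeasible

print(is_psd(dual_slack(alpha, -1.0, 0.0)))   # feasible: y2 = 0
print(is_psd(dual_slack(alpha, -1.0, 0.1)))   # infeasible: y2 != 0
print(primal_value - dual_value)              # the duality gap alpha
```

The infeasibility for $y_2 \neq 0$ comes from the $2 \times 2$ principal minor on rows 2 and 3, which equals $-y_2^2$.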
The example above (and others like it) are somewhat pathological. We will see in Section 2.1.5 that under relatively mild conditions, usually called constraint qualifications, strong duality will also hold in semidefinite programming. The simplest and most useful case corresponds to the so-called Slater conditions, where the primal and/or dual problems are required to be strictly feasible. On the primal side, this means that there exists $X \succ 0$ that satisfies the linear constraints, and on the dual side, there exists $y$ such that $C - \sum_i A_i y_i \succ 0$ (notice that the inequalities are strict). In this case, the situation is as nice as in the linear programming case.
Theorem 2.15. Assume that both the primal (SDP-P) and dual (SDP-D) semidefinite programs are strictly feasible. Then, both problems have optimal solutions, and the corresponding optimal costs are equal; i.e., there is no duality gap.
This statement will reappear, in a more general setting, in Section 2.1.5. For many problems (for instance, the ones discussed in the next section), these assumptions hold and are relatively straightforward to verify. In full generality, however, they may be restrictive, and thus we investigate in Section 2.1.5 the geometric reasons why strong duality may fail in semidefinite optimization, as well as possible workarounds.
Exercise 2.16. Consider the following SDP problem:
\[
\begin{array}{ll}
\text{minimize} & x \\
\text{subject to} & \begin{pmatrix} x & 1 \\ 1 & y \end{pmatrix} \succeq 0.
\end{array}
\]
1. Draw the feasible set. Is it convex?
2. Is the primal strictly feasible? Is the dual strictly feasible?
3. What can you say about strong duality? Are the results consistent with Theorem 2.15?
Exercise 2.17. Do the assumptions of Theorem 2.15 hold for Example 2.14?
2.1.3
Before proceeding further, we present several interesting examples of sets that are expressible in terms of semidefinite programming. We will revisit several of these throughout the different chapters in this book.
Spectraplex: The spectraplex or free spectrahedron $O_n$ is the set of $n \times n$ positive semidefinite matrices of trace one, i.e.,
\[
O_n = \{ X \in S^n \;|\; X \succeq 0,\ \mathrm{Tr}\, X = 1 \}.
\]
The hyperplane $\mathrm{Tr}\, X = 1$ intersects $S^n_+$ on a compact set and thus defines a base for this cone. The extreme points of $O_n$ are exactly the rank one matrices of the form $X = xx^T$, where $x \in \mathbb{R}^n$ and $\|x\| = 1$. The two-dimensional spectraplex $O_2$ is affinely isomorphic to the unit disk in the plane and has already appeared in Example 2.11.
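The identification of $O_2$ with a disk can be made explicit. Writing $X = \begin{pmatrix} a & b \\ b & 1 - a \end{pmatrix}$, the condition $X \succeq 0$ becomes $a(1 - a) - b^2 \ge 0$, i.e., $(a - \tfrac12)^2 + b^2 \le \tfrac14$. The sketch below checks this equivalence on a grid and confirms that the rank one matrices $xx^T$ with $\|x\| = 1$ land on the bounding circle. The coordinates $(a, b)$ and the function names are choices made here for illustration.

```python
import math


def in_spectraplex_2(a, b, tol=1e-12):
    """X = [[a, b], [b, 1 - a]] has trace one; test X >= 0 directly."""
    return a >= -tol and 1 - a >= -tol and a * (1 - a) - b * b >= -tol


def in_disk(a, b, tol=1e-12):
    """Disk of radius 1/2 centered at (1/2, 0) in the (X11, X12) plane."""
    return (a - 0.5) ** 2 + b ** 2 <= 0.25 + tol


def extreme_point(theta):
    """Rank one matrix x x^T for the unit vector (cos(theta), sin(theta))."""
    x = (math.cos(theta), math.sin(theta))
    return [[x[0] * x[0], x[0] * x[1]],
            [x[1] * x[0], x[1] * x[1]]]


# The two membership tests agree on a grid of sample points.
grid = [(i / 10, j / 10) for i in range(-5, 16) for j in range(-10, 11)]
print(all(in_spectraplex_2(a, b) == in_disk(a, b) for a, b in grid))

# Extreme points lie on the boundary circle of the disk.
X = extreme_point(0.7)
print((X[0][0] - 0.5) ** 2 + X[0][1] ** 2)   # close to 0.25
```

The map $\theta \mapsto xx^T$ traverses the circle twice as $\theta$ runs over $[0, 2\pi)$, reflecting the fact that $x$ and $-x$ give the same matrix.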
Elliptope and dual elliptope: Let $E_n$ be the set of positive semidefinite matrices with unit diagonal, i.e.,
\[
E_n = \{ X \in S^n \;|\; X \succeq 0,\ X_{ii} = 1,\ i = 1, \ldots, n \}.
\tag{2.10}
\]
For nice pictures of the $3 \times 3$ elliptope and its dual body, see Figure 5.8 in Chapter 5.
Operator and nuclear norms: Let $A \in \mathbb{R}^{n_1 \times n_2}$ be a matrix. The spectral or operator norm of $A$ is given by its maximum norm gain, i.e.,
\[
\|A\| = \max_{v \in \mathbb{R}^{n_2},\ \|v\| = 1} \|Av\| = \sigma_1(A),
\]
\[
\|A\|_* := \sum_{i=1}^{r} \sigma_i(A),
\tag{2.11}
\]
B
.
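Furthermore, the following inequalities hold for any matrix $A$ of rank at most $r$:
\[
\|A\| \le \|A\|_F \le \|A\|_* \le \sqrt{r}\, \|A\|_F \le r\, \|A\|,
\tag{2.12}
\]
where $\|A\|_F$ is the Frobenius norm, defined as $\|A\|_F := (\mathrm{Tr}\, A^T A)^{\frac{1}{2}} = (\sum_{ij} a_{ij}^2)^{\frac{1}{2}}$.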
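The chain of inequalities (2.12) can be checked by hand for $2 \times 2$ matrices, where the singular values have a closed form: $\sigma_1^2 + \sigma_2^2 = \|A\|_F^2$ and $\sigma_1 \sigma_2 = |\det A|$. The sketch below uses these two identities; it is an illustration for the $2 \times 2$ case only, and the function names are hypothetical.

```python
import math


def norms_2x2(A):
    """Operator, Frobenius, and nuclear norms of a real 2x2 matrix."""
    (a, b), (c, d) = A
    fro2 = a * a + b * b + c * c + d * d      # sigma1^2 + sigma2^2
    absdet = abs(a * d - b * c)               # sigma1 * sigma2
    gap = math.sqrt(max(fro2 * fro2 - 4 * absdet * absdet, 0.0))
    sigma1 = math.sqrt((fro2 + gap) / 2)
    sigma2 = math.sqrt(max((fro2 - gap) / 2, 0.0))
    return sigma1, math.sqrt(fro2), sigma1 + sigma2


def chain_holds(A, r=2, tol=1e-12):
    """Check ||A|| <= ||A||_F <= ||A||_* <= sqrt(r)||A||_F <= r||A||."""
    op, fro, nuc = norms_2x2(A)
    values = [op, fro, nuc, math.sqrt(r) * fro, r * op]
    return all(u <= v + tol for u, v in zip(values, values[1:]))


A = [[3.0, 0.0], [4.0, 5.0]]        # full rank: sigma = 3*sqrt(5), sqrt(5)
B = [[1.0, 2.0], [2.0, 4.0]]        # rank one: all three norms equal 5

print(norms_2x2(A), chain_holds(A, r=2))
print(norms_2x2(B), chain_holds(B, r=1))
```

For the rank one matrix every inequality in (2.12) with $r = 1$ holds with equality, which shows the bounds cannot be improved in general.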
Both the operator norm and the nuclear norm have nice characterizations in terms of semidefinite programming. In particular, the operator norm $\|A\|$ is the optimal value of the primal-dual pair of semidefinite programs
\[
\begin{array}{ll}
\text{maximize} & \mathrm{Tr}\, 2A^T X_{12} \\
\text{subject to} & \mathrm{Tr} \begin{pmatrix} X_{11} & X_{12} \\ X_{12}^T & X_{22} \end{pmatrix} = 1, \\
 & X \succeq 0,
\end{array}
\qquad
\begin{array}{ll}
\text{minimize} & t \\
\text{subject to} & \begin{pmatrix} tI_{n_1} & A \\ A^T & tI_{n_2} \end{pmatrix} \succeq 0.
\end{array}
\tag{2.13}
\]
To see the exact correspondence between the standard form (SDP-P)-(SDP-D) and this formulation, notice that we can take $m = 1$, $X$ is a block $(n_1 + n_2) \times (n_1 + n_2)$ matrix, $A_1$ is the $(n_1 + n_2) \times (n_1 + n_2)$ identity matrix, $b_1 = 1$, and the cost matrix $C$ is the block matrix $\begin{pmatrix} 0 & A \\ A^T & 0 \end{pmatrix}$. Notice that we have the factor of 2 here because $\mathrm{Tr}\, CX = \mathrm{Tr}\, 2A^T X_{12}$, and we have "maximize" in (2.13) instead of "minimize" in (SDP-P) due to the change of sign in the objective function.
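The minimization problem in (2.13) can be explored without an SDP solver when $A$ is $2 \times 2$. For $t > 0$, a Schur complement argument shows that the block matrix is positive semidefinite exactly when $t^2 I - A^T A \succeq 0$, and feasibility is monotone in $t$, so bisection finds the smallest feasible $t$. The sketch below is a hand-rolled illustration, not an algorithm from the text; the function names and the bisection bracket are choices made here.

```python
import math


def lmi_feasible(A, t, tol=1e-12):
    """Test [[t I, A], [A^T, t I]] >= 0 for a 2x2 matrix A.

    For t > 0 the Schur complement reduces this to t^2 I - A^T A >= 0.
    """
    (a, b), (c, d) = A
    g11 = t * t - (a * a + c * c)      # entries of t^2 I - A^T A
    g22 = t * t - (b * b + d * d)
    g12 = -(a * b + c * d)
    return (t > 0 and g11 >= -tol and g22 >= -tol
            and g11 * g22 - g12 * g12 >= -tol)


def min_t(A, iters=100):
    """Bisection for the smallest feasible t in the minimization of (2.13)."""
    lo = 0.0
    hi = 1.0 + math.sqrt(sum(v * v for row in A for v in row))  # feasible
    for _ in range(iters):
        mid = (lo + hi) / 2
        if lmi_feasible(A, mid):
            hi = mid
        else:
            lo = mid
    return hi


A = [[3.0, 0.0], [4.0, 5.0]]
print(min_t(A), 3 * math.sqrt(5))   # both approximate sigma_1(A)
```

The upper end of the bracket, $1 + \|A\|_F$, is always feasible because $\|A\|_F \ge \sigma_1(A)$.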
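Similarly (or dually), the nuclear norm $\|A\|_*$ corresponds to the optimal value of the primal-dual pair
\[
\begin{array}{ll} \text{maximize} & \mathrm{Tr}\, A^T Y \\ \text{subject to} & \begin{bmatrix} I_{n_1} & Y \\ Y^T & I_{n_2} \end{bmatrix} \succeq 0, \end{array}
\qquad
\begin{array}{ll} \text{minimize} & \frac{1}{2} (\mathrm{Tr}\, W_1 + \mathrm{Tr}\, W_2) \\ \text{subject to} & \begin{bmatrix} W_1 & A \\ A^T & W_2 \end{bmatrix} \succeq 0. \end{array}
\tag{2.14}
\]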
Since the operator norm and the nuclear norm are dual norms, their unit balls are dual polar convex bodies. In Figure 2.6 we illustrate these convex sets for the case of a $2 \times 2$ symmetric matrix given by
\[ A = \begin{bmatrix} x & y \\ y & z \end{bmatrix}. \tag{2.15} \]
Figure 2.6. Unit balls of the spectral norm and the nuclear norm, for the space of $2 \times 2$ symmetric matrices (axes $x$, $y$, $z$).
k-ellipse: We consider a class of planar convex sets defined by the algebraic curves known as k-ellipses [33]. Recall that the standard ellipse in $\mathbb{R}^2$ is defined as the locus of points with the sum of distances to two fixed points (the foci) a fixed constant. Extending this definition to $k$ foci, one can define the k-ellipse as the algebraic curve in $\mathbb{R}^2$ consisting of all points whose sum of distances from $k$ given points is a fixed number. More formally, fix a positive real number $d$, and fix $k$ distinct points $(u_1, v_1), (u_2, v_2), \dots, (u_k, v_k)$ in $\mathbb{R}^2$. The k-ellipse with foci $(u_i, v_i)$ and radius $d$ is the following curve in the plane:
\[ \left\{ (x, y) \in \mathbb{R}^2 : \sum_{i=1}^{k} \sqrt{(x - u_i)^2 + (y - v_i)^2} = d \right\}. \tag{2.16} \]
In Figure 2.7, we present a few k-ellipses with different numbers of foci. In contrast to the classical circle (corresponding to $k = 1$) and ellipse ($k = 2$), a k-ellipse does not necessarily contain all the foci in its interior. We define the closed convex set $C_k$ to be the region whose boundary is the k-ellipse, and it is a sublevel set of the convex function
\[ (x, y) \mapsto \sum_{i=1}^{k} \sqrt{(x - u_i)^2 + (y - v_i)^2}. \tag{2.17} \]
Figure 2.7. A 3-ellipse, a 4-ellipse, and a 5-ellipse, each with its foci.
\[ \sum_{i=1}^{k} d_i \le d, \qquad \begin{bmatrix} d_i + x - u_i & y - v_i \\ y - v_i & d_i - x + u_i \end{bmatrix} \succeq 0, \quad i = 1, \dots, k. \]
To see this, notice that the $2 \times 2$ matrix above is positive semidefinite if and only if $(x - u_i)^2 + (y - v_i)^2 \le d_i^2$ and $d_i \ge 0$.
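The equivalence between the $2 \times 2$ block and the distance inequality can be tried out directly. The sketch below assumes the block has the form $\begin{bmatrix} d_i + x - u_i & y - v_i \\ y - v_i & d_i - x + u_i \end{bmatrix}$ and tests positive semidefiniteness through its trace and determinant; the function names, foci, and radius are hypothetical.

```python
import math

def block_is_psd(x, y, u, v, d, tol=1e-12):
    """PSD test for [[d + x - u, y - v], [y - v, d - x + u]] via trace and determinant."""
    a, b, c = d + x - u, y - v, d - x + u
    return a + c >= -tol and a * c - b * b >= -tol

def in_k_ellipse_region(x, y, foci, radius):
    """Use the smallest admissible slacks d_i (the distances) and test sum d_i <= radius."""
    dists = [math.hypot(x - u, y - v) for (u, v) in foci]
    blocks_ok = all(block_is_psd(x, y, u, v, di) for (u, v), di in zip(foci, dists))
    return blocks_ok and sum(dists) <= radius

# Hypothetical 3-ellipse data.
foci = [(0.0, 0.0), (2.0, 0.0), (1.0, 2.0)]
inside = in_k_ellipse_region(1.0, 0.5, foci, 5.0)    # sum of distances is about 3.74
outside = in_k_ellipse_region(5.0, 5.0, foci, 5.0)   # one distance alone exceeds 5
```

Here the determinant of the block equals $d_i^2 - (x - u_i)^2 - (y - v_i)^2$ and its trace equals $2 d_i$, which is exactly the pair of conditions stated above.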
In a less obvious fashion, the k-ellipse can also be represented without additional slack variables, so it is also a spectrahedron. However, in this case the size of the matrices is much bigger. Below we present a concrete statement; see [33] for a sharper result and an explicit construction of this representation.
Theorem 2.18. The convex set $C_k$ whose boundary is the k-ellipse of foci $(u_i, v_i)$ and radius $d$ is defined by the LMI
\[ x A_k + y B_k + C_k \succeq 0, \tag{2.18} \]
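\[
\begin{bmatrix}
d+3x-u_1-u_2-u_3 & y-v_1 & y-v_2 & 0 & y-v_3 & 0 & 0 & 0 \\
y-v_1 & d+x+u_1-u_2-u_3 & 0 & y-v_2 & 0 & y-v_3 & 0 & 0 \\
y-v_2 & 0 & d+x-u_1+u_2-u_3 & y-v_1 & 0 & 0 & y-v_3 & 0 \\
0 & y-v_2 & y-v_1 & d-x+u_1+u_2-u_3 & 0 & 0 & 0 & y-v_3 \\
y-v_3 & 0 & 0 & 0 & d+x-u_1-u_2+u_3 & y-v_1 & y-v_2 & 0 \\
0 & y-v_3 & 0 & 0 & y-v_1 & d-x+u_1-u_2+u_3 & 0 & y-v_2 \\
0 & 0 & y-v_3 & 0 & y-v_2 & 0 & d-x-u_1+u_2+u_3 & y-v_1 \\
0 & 0 & 0 & y-v_3 & 0 & y-v_2 & y-v_1 & d-3x+u_1+u_2+u_3
\end{bmatrix}.
\]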
Exercise 2.19. Prove the relation (2.10) between the elliptope and the spectraplex.
Exercise 2.20. Show that the two semidefinite programs in (2.14) are indeed a primal-dual pair.
Exercise 2.21. Prove the correctness of the semidefinite characterizations of the operator and nuclear norms given in (2.13) and (2.14).
Exercise 2.22. Show that for the symmetric matrix in (2.15), the inequalities that define the boundary of the unit balls of the spectral and nuclear norms shown in Figure 2.6 are
\[ y^2 + (x + z) - xz \le 1, \qquad y^2 - (x + z) - xz \le 1 \]
and
\[ (x - z)^2 + 4y^2 \le 1, \qquad x + z \le 1, \qquad -(x + z) \le 1, \]
respectively.
Exercise 2.23. Analyze the structure of the convex sets in Figure 2.6. What are the matrices associated with the flat facets (or the vertices)? How can you interpret the rotational symmetries of these convex bodies?
2.1.4 Conic Programming
The strong formal similarities between linear programming and semidefinite programming (equations (LP-P)-(LP-D) vs. (SDP-P)-(SDP-D)) suggest that a more general formulation, encompassing both cases, may be possible. Indeed, a general class of optimization problems that unifies linear and semidefinite optimization (as well as a few other additional cases) is conic programming. We describe the conic framework next, explaining first the key idea, followed by the mathematical formulation.

(LP)   Primal: minimize $c^T x$ subject to $Ax = b$, $x \ge 0$.   Dual: maximize $b^T y$ subject to $A^T y \le c$.
(SDP)  Primal: minimize $\langle C, X \rangle$ subject to $\langle A_i, X \rangle = b_i$, $X \succeq 0$.   Dual: maximize $b^T y$ subject to $\sum_i A_i y_i \preceq C$.
(CP)   Primal: minimize $\langle c, x \rangle_S$ subject to $Ax = b$, $x \in K$.   Dual: maximize $\langle y, b \rangle_T$ subject to $c - A^* y \in K^*$.

Table 2.1. Primal-dual formulations of linear programming (LP), semidefinite programming (SDP), and general conic programming (CP).
The starting point is the geometric interpretation of linear and semidefinite programming. The feasible set of an LP problem in standard form (LP-P) is the intersection of an affine subspace (described by the equations $Ax = b$) and the nonnegative orthant $\mathbb{R}^n_+$. Similarly, the feasible set of a semidefinite program (SDP-P) is the intersection of an affine subspace (described by $\langle A_i, X \rangle = b_i$) with the set of positive semidefinite matrices $S^n_+$. Since both $\mathbb{R}^n_+$ and $S^n_+$ are closed convex cones (in fact, they are proper cones; see below), one can define a general class of optimization problems where the feasible set is the intersection of a proper cone and an affine subspace. This is exactly what conic optimization will do!
We present a formal description next. We will be a bit more careful than usual here in the definition of the respective spaces and mappings. It does not make much of a difference if we are working in $\mathbb{R}^n$ (since we can identify a space and its dual through the inner product), but it is good hygiene to keep these distinctions in mind and will prove useful when dealing with more complicated spaces. We consider two real vector spaces, $S$ and $T$, and a linear mapping $A : S \to T$. Recall that every real vector space has an associated dual space, which is the vector space of real-valued linear functionals. We denote these dual spaces by $S^*$ and $T^*$, respectively, and the pairing between an element of a vector space and one of the dual as $\langle \cdot, \cdot \rangle$ (i.e., $f(x) = \langle f, x \rangle$). Recall that the adjoint mapping of $A$ is the unique linear map $A^* : T^* \to S^*$ defined by
\[ \langle A^* y, x \rangle_S = \langle y, Ax \rangle_T \qquad \forall x \in S,\ y \in T^*. \]
Notice here that the brackets on the left-hand side of the equation represent the pairing in $S$, and those on the right-hand side correspond to the pairing in $T$.
A cone $K \subset S$ is pointed if $K \cap (-K) = \{0\}$ and is solid if it is full-dimensional (i.e., $\dim K = \dim S$). A cone that is convex, closed, pointed, and solid is called a proper cone. Given a cone $K$, its dual cone is $K^* := \{z \in S^* : \langle z, x \rangle_S \ge 0 \ \forall x \in K\}$. The dual of a proper cone is also a proper cone; see Exercise 2.24. An element $x$ is in the interior of the proper cone $K$ if and only if $\langle x, z \rangle > 0$ for all $z \in K^*$, $z \ne 0$.
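Standard conic programs. Given a linear map $A : S \to T$ and a proper cone $K \subset S$, we define the primal-dual pair of (conic) optimization problems
\[
\begin{array}{ll} \text{minimize} & \langle c, x \rangle_S \\ \text{subject to} & Ax = b, \\ & x \in K, \end{array}
\qquad
\begin{array}{ll} \text{maximize} & \langle y, b \rangle_T \\ \text{subject to} & c - A^* y \in K^*, \end{array}
\]
where $b \in T$, $c \in S^*$. Notice that exactly the same proof presented earlier works here to show weak duality:
\[
\begin{aligned}
\langle c, x \rangle_S - \langle y, b \rangle_T &= \langle c, x \rangle_S - \langle y, Ax \rangle_T \\
&= \langle c, x \rangle_S - \langle A^* y, x \rangle_S \\
&= \langle c - A^* y, x \rangle_S \\
&\ge 0.
\end{aligned}
\tag{2.19}
\]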
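The weak duality computation (2.19) can be traced on a concrete instance. The following sketch uses a tiny linear program, so that $K = K^*$ is the nonnegative orthant and the adjoint is the transpose; all data are hypothetical and chosen by hand.

```python
# Tiny LP instance of the conic pair, with K = K^* = nonnegative orthant in R^3.
A = [[1.0, 1.0, 1.0]]          # linear map A : R^3 -> R^1
b = [1.0]
c = [1.0, 2.0, 3.0]

x = [0.2, 0.3, 0.5]            # primal candidate: Ax = b, x in K
y = [0.5]                      # dual candidate

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

Ax = [dot(row, x) for row in A]
A_adj_y = [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(c))]  # A^* y
slack = [cj - aj for cj, aj in zip(c, A_adj_y)]                                # c - A^* y

primal_feasible = all(abs(l - r) < 1e-12 for l, r in zip(Ax, b)) and all(xi >= 0 for xi in x)
dual_feasible = all(s >= 0 for s in slack)

# Duality gap, and the last line of (2.19): <c - A^* y, x>.
gap = dot(c, x) - dot(y, b)
pairing = dot(slack, x)
```

The two numbers `gap` and `pairing` agree, and both are nonnegative because `slack` lies in $K^*$ and `x` lies in $K$.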
In the usual cases (e.g., LP and SDP), all vector spaces are finite-dimensional and thus isomorphic to their duals. The specific correspondence between these is given through whatever inner product we use.
Among the classes of problems that can be interpreted as particular cases of the general conic formulation we have linear programs, second-order cone programs (SOCP), and semidefinite programs, when we take the cone $K$ to be the nonnegative orthant $\mathbb{R}^n_+$, the second-order cone $L^n_+$ (Exercise 2.25), or the positive semidefinite cone $S^n_+$, respectively. Two other important cases are when $K$ is the hyperbolicity cone associated with a given hyperbolic polynomial [22, 40] and the cone $\Sigma_{n,2d}$ of multivariate polynomials that are sums of squares. We discuss this latter example in much more detail in Chapter 3.
Despite the formal similarities, there are a number of differences between linear programming and general conic programming. We have already seen in (2.19) that weak duality always holds for conic programming. However, recall from Example 2.14 that in semidefinite programming (and thus, in general conic programming) there may be a nonzero duality gap. In the next section, we explore the geometric reasons for the possible failure of strong duality in conic programming.
Exercise 2.24. Let $K \subset S$ be a proper cone. Show that its dual cone $K^* \subset S^*$ is also a proper cone, and $K^{**} = K$.
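Exercise 2.25. The second-order cone $L^n_+$ is defined as
\[ L^n_+ = \left\{ (x_0, x_1, \dots, x_n) \in \mathbb{R}^{n+1} : \Big( \sum_{i=1}^{n} x_i^2 \Big)^{1/2} \le x_0 \right\}. \]
Show that $L^n_+$ is a proper cone and is isomorphic to its dual cone.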
Exercise 2.26. Classify the following statements as true or false. A proof or
counterexample is required.
Let $A : \mathbb{R}^n \to \mathbb{R}^m$ be a linear mapping and $K \subset \mathbb{R}^n$ a cone.
1. If K is convex, then A(K) is convex.
2. If K is solid, then A(K) is solid.
3. If K is pointed, then A(K) is pointed.
4. If K is closed, then A(K) is closed.
Do the answers change if A is injective and/or surjective? How?
2.1.5 Strong Duality
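\[
\begin{array}{ll} \text{minimize} & x_{11} \\ \text{subject to} & 2x_{12} = 1, \\ & \begin{bmatrix} x_{11} & x_{12} \\ x_{12} & x_{22} \end{bmatrix} \succeq 0, \end{array}
\qquad
\begin{array}{ll} \text{maximize} & y \\ \text{subject to} & \begin{bmatrix} 0 & y \\ y & 0 \end{bmatrix} \preceq \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}. \end{array}
\]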
For the dual problem, $y = 0$ provides an optimal solution, with optimal value $d^* = 0$. On the primal side, however, we cannot have $x_{11} = 0$, since this would violate the positive semidefiniteness constraint. However, by choosing $x_{11} = \varepsilon$, $x_{22} = 1/\varepsilon$, we obtain a cost $p$ that is arbitrarily small but always strictly positive.
The example above shows that, in contrast with the case of linear programming, in semidefinite or conic programming optimal solutions may not be attained, even if there is zero duality gap.
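The non-attainment phenomenon is easy to observe numerically. The sketch below assumes, as the discussion suggests, that the primal cost is $x_{11}$ and that the constraint fixes $x_{12} = 1/2$; it checks that the points $x_{11} = \varepsilon$, $x_{22} = 1/\varepsilon$ are feasible with cost $\varepsilon$, while no sampled point with $x_{11} = 0$ is feasible.

```python
def is_psd_2x2(x11, x12, x22, tol=1e-12):
    """A symmetric 2x2 matrix is PSD iff its diagonal and determinant are nonnegative."""
    return x11 >= -tol and x22 >= -tol and x11 * x22 - x12 * x12 >= -tol

# Feasible points with cost x11 = eps, for decreasing eps.
costs = []
for eps in (1.0, 1e-2, 1e-4, 1e-6):
    x11, x12, x22 = eps, 0.5, 1.0 / eps
    if is_psd_2x2(x11, x12, x22):
        costs.append(x11)

# With x11 = 0 the determinant is -1/4 regardless of x22, so the point is infeasible.
zero_cost_feasible = any(is_psd_2x2(0.0, 0.5, x22) for x22 in (1.0, 1e3, 1e6, 1e12))
```

The costs decrease toward the dual value $0$, yet the value $0$ itself is never reached by a feasible point.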
There are several geometric interpretations of what causes the failure of strong duality for general conic problems. Perhaps the most natural one is based on the fact that the image of a proper cone under a linear map may not be closed, and thus it is not necessarily a proper cone. This fact may seem a bit surprising (or perhaps wrong!) the first time one encounters it, but after a while it becomes quite reasonable. (If this is the first time you have heard about this, we strongly encourage you to stop reading and think of a counterexample right now! Or, see Exercise 2.30.)
Strong duality and infeasibility certificates. To better understand strong duality, we begin with a simple geometric interpretation in the conic setting, in terms of the separating hyperplane theorem. Recall that this theorem (see Section A.3.3 in Appendix A for several versions of this important result) establishes that if we have two disjoint convex sets, where one of them is closed and the other compact, there always exists a hyperplane that separates the two sets. For simplicity, we concentrate only on the case of conic feasibility, i.e., where we are interested in deciding the existence of a solution $x$ to the equations
\[ Ax = b, \qquad x \in K, \tag{2.20} \]
\[ \langle A^* y, x \rangle \ge 0 \quad \forall x \in K \quad \Longleftrightarrow \quad A^* y \in K^*. \]
Thus, if (2.20) is infeasible, and provided the hypotheses of the separating hyperplane theorem apply, there exists a (suitably normalized) linear functional $y$ which satisfies
\[ \langle y, b \rangle = -1, \qquad A^* y \in K^*. \tag{2.21} \]
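A certificate of the form (2.21) is simple to verify once it is given. The following sketch takes $K$ to be the nonnegative orthant in $\mathbb{R}^2$ (so $K^* = K$) and a hand-made infeasible system $x_1 + x_2 = -1$, $x \ge 0$; the data and the candidate $y$ are hypothetical.

```python
# Infeasible conic system: x1 + x2 = -1 with x in the nonnegative orthant.
A = [[1.0, 1.0]]
b = [-1.0]
y = [1.0]                      # candidate certificate

# Adjoint applied to y (here simply A^T y).
A_adj_y = [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]
y_dot_b = sum(yi * bi for yi, bi in zip(y, b))

# Conditions (2.21): <y, b> = -1 and A^* y in K^*.
certificate_ok = all(v >= 0 for v in A_adj_y) and y_dot_b == -1.0
```

Such a $y$ rules out feasibility: any $x \in K$ with $Ax = b$ would give $\langle y, b \rangle = \langle A^* y, x \rangle \ge 0$, contradicting $\langle y, b \rangle = -1$.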
Exercise 2.30. Consider the set $K = \{(x, y, z) : y^2 \le xz,\ z \ge 0\}$. Show that $K$ is a proper cone. Show that its projection onto the $(x, y)$ plane is not a proper cone.
Exercise 2.31. Let $K_1$, $K_2$ be closed convex cones. Show, via a counterexample, that the Minkowski sum $K_1 + K_2$ does not have to be closed.
Exercise 2.32. Let $L \subset S$ be a subspace, and $K \subset S$ be a proper cone. Show that the following two propositions are equivalent:
(i) $L \cap K = \{0\}$.
(ii) There exists $z \in L^\perp \cap \mathrm{int}(K^*)$.
Hint: For the difficult direction (i) $\Rightarrow$ (ii), argue by contradiction, and use homogeneity and the separation theorem for convex sets.
Although, as we have seen, standard duality may fail in semidefinite (or conic) programming, it is nevertheless possible to formulate a more complicated semidefinite dual program (called the Extended Lagrange-Slater Dual in [36]) for which strong duality always holds, regardless of interior-point assumptions. For details, as well as a comparison with the more general minimal cone approach, we refer the reader to [36, 37].
2.2 Applications of Semidefinite Optimization
2.2.1
\[ x[k+1] = A\, x[k], \qquad x[0] = x_0. \tag{2.22} \]
This kind of linear recurrence equation is a simple example of a discrete-time dynamical system, where the state $x[k]$ evolves over time, starting from an initial condition $x_0$. The difference equation (2.22), or its continuous-time analogue (the linear differential equation $\frac{d}{dt} x(t) = Ax(t)$), is often used to model the time evolution of quantities such as temperature of objects, size of a population, voltage of electrical circuits, and concentration of chemical mixtures.
A natural and important question about (2.22) is the long-term behavior of the state. In particular, as $k \to \infty$, under what conditions can we guarantee that the state $x[k]$ remains bounded, or converges to zero? It is well known (and easy to prove; see Exercise 2.35) that $x[k]$ converges to zero for all initial conditions $x_0$ if and only if the spectral radius of the matrix $A$ is smaller than one, i.e., all the eigenvalues $\lambda_i$ satisfy $|\lambda_i(A)| < 1$ for $i = 1, \dots, n$. In this case we say that the system (2.22), or the matrix $A$, is stable (or Schur stable, if the discrete-time aspect is not clear from the context).
While this spectral characterization is very useful, an alternative viewpoint is sometimes even more convenient. The basic idea is to consider a generalization and abstraction of the notion of energy, usually known as a Lyapunov function. These are functions of the state $x[k]$, with the property that they decrease monotonically along trajectories of the system (2.22). It turns out that for linear systems there is a simple characterization of stability in terms of a quadratic Lyapunov function $V(x[k]) = x[k]^T P x[k]$. Notice first that the monotonicity condition $V(x[k+1]) \le V(x[k])$ (for all states $x[k]$) can be equivalently expressed in terms of the matrix inequality $A^T P A - P \preceq 0$. We then have the following result.
Theorem 2.33. Given a matrix $A \in \mathbb{R}^{n \times n}$, the following conditions are equivalent:
1. All eigenvalues of $A$ are inside the unit circle; i.e., $|\lambda_i(A)| < 1$ for $i = 1, \dots, n$.
2. There exists a matrix $P \in S^n$ such that
\[ P \succ 0, \qquad A^T P A - P \prec 0. \]
\[ \sum_{k=1}^{\infty} (A^k)^T A^k - \sum_{k=0}^{\infty} (A^k)^T A^k = -I \prec 0. \]
Thus, the characterization given above enables the study of the stability properties of the linear difference equation (2.22) in terms of a semidefinite programming problem, whose feasible solutions correspond to Lyapunov functions. In Section 3.6.2 we will explore extensions of these ideas to more complicated dynamics, not necessarily linear.
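For a stable matrix, one Lyapunov function can be written down explicitly. The sketch below assumes the candidate $P = \sum_{k \ge 0} (A^k)^T A^k$, truncates the series, and checks that $A^T P A - P$ is close to $-I$; the matrix $A$ and the helper names are illustrative.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def lyapunov_series(A, terms=200):
    """Truncated series P = sum_{k=0}^{terms-1} (A^k)^T A^k."""
    n = len(A)
    P = [[0.0] * n for _ in range(n)]
    Ak = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A^0 = I
    for _ in range(terms):
        term = matmul(transpose(Ak), Ak)
        P = [[P[i][j] + term[i][j] for j in range(n)] for i in range(n)]
        Ak = matmul(Ak, A)
    return P

A = [[0.5, 1.0], [0.0, -0.3]]    # triangular, eigenvalues 0.5 and -0.3
P = lyapunov_series(A)
APA = matmul(transpose(A), matmul(P, A))
residual = [[APA[i][j] - P[i][j] for j in range(2)] for i in range(2)]   # close to -I
det_P = P[0][0] * P[1][1] - P[0][1] * P[1][0]
```

For an unstable $A$ the series diverges, which is consistent with the equivalence stated in Theorem 2.33.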
Control design. Consider now the case of a linear system, where there is a control input $u[k]$:
\[ x[k+1] = A\, x[k] + B\, u[k], \qquad x[0] = x_0, \tag{2.23} \]
where $B \in \mathbb{R}^{n \times m}$. The idea here is that by properly choosing the control input $u[k] \in \mathbb{R}^m$ at each time instant, we may be able (under certain conditions) to affect or steer the behavior of $x[k]$ toward some desired goal. We are interested in the case where the matrix $A$ is not stable, but we can use linear state feedback to set $u[k] = Kx[k]$ for some fixed matrix $K$ (to be chosen appropriately). It is easy to see that after this substitution, the system is described by (2.22), where the matrix $A$ is replaced by $A(K) = A + BK$. Thus, our goal is stabilization; i.e., we want to find a matrix $K$ such that $A + BK$ is stable (all eigenvalues have absolute value smaller than one).
Although this problem seems (and is!) fairly complicated due to the nonlinear dependence of the eigenvalues of $A + BK$ on the unknown matrix $K$, it turns out that it can be nicely solved using semidefinite optimization and the Lyapunov characterization given earlier. Indeed, we can use Schur complements (see Appendix A) to rewrite the condition
\[ (A + BK)^T P (A + BK) - P \prec 0, \qquad P \succ 0, \]
as
\[ \begin{bmatrix} P & (A + BK)^T P \\ P (A + BK) & P \end{bmatrix} \succ 0. \]
Although nicer, this condition is not quite an SDP yet, since it is bilinear in $(P, K)$ (and, thus, not jointly convex). However, defining $Q := P^{-1}$, and left- and right-multiplying the inequality above with the matrix $\mathrm{BlockDiag}(Q, Q)$, we obtain
\[ \begin{bmatrix} Q & Q (A + BK)^T \\ (A + BK) Q & Q \end{bmatrix} \succ 0. \]
Notice that this expression contains both $Q$ and $KQ$, but $K$ never appears on its own. Thus, we can define a new variable $Y := KQ$, to obtain
\[ \begin{bmatrix} Q & Q A^T + Y^T B^T \\ AQ + BY & Q \end{bmatrix} \succ 0. \tag{2.24} \]
This problem is now linear in the new variables $(Q, Y)$. In fact, it is a semidefinite programming problem! After solving it, we can recover the controller $K$ via $K = Y Q^{-1}$ (since $Y = KQ$ and $Q \succ 0$ is invertible). We summarize our discussion in the following result.
Theorem 2.34. Given two matrices $A$ and $B$, there exists a matrix $K$ such that $A + BK$ is stable if and only if the spectrahedron described by (2.24) is nonempty, i.e., there exist matrices $(Q, Y)$ satisfying this (strict) linear matrix inequality.
Hence our control design problem is equivalent to solving a semidefinite programming feasibility problem.
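The change of variables $Y = KQ$ can be followed on a small instance without an SDP solver. In the sketch below, the pair $(Q, Y)$ is a hand-picked point (hypothetical data, not the output of any solver); the code checks that it satisfies the strict LMI (2.24) by a Cholesky attempt and recovers the feedback gain from $Y = KQ$.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def is_positive_definite(M):
    """Attempt a Cholesky factorization; it succeeds iff the symmetric M is positive definite."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                L[i][i] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True

A = [[1.5, 0.0], [0.0, 0.5]]      # unstable: eigenvalue 1.5
B = [[1.0], [0.0]]
Q = [[1.0, 0.0], [0.0, 0.75]]     # hand-picked candidate (hypothetical)
Y = [[-1.5, 0.0]]

N = add(matmul(A, Q), matmul(B, Y))             # AQ + BY
NT = transpose(N)                               # Q A^T + Y^T B^T
lmi = [Q[i] + NT[i] for i in range(2)] + [N[i] + Q[i] for i in range(2)]  # block matrix of (2.24)

det_Q = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
Q_inv = [[Q[1][1] / det_Q, -Q[0][1] / det_Q], [-Q[1][0] / det_Q, Q[0][0] / det_Q]]
K = matmul(Y, Q_inv)                            # K = Y Q^{-1}
closed_loop = add(A, matmul(B, K))              # A + BK
```

For these data the closed-loop matrix is diagonal with entries $0$ and $0.5$, so the recovered gain is indeed stabilizing.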
Semidefinite programming techniques have become quite central in the analysis and design of control systems. The example above describes only the tip of the iceberg in terms of the many design problems that can be attacked with these techniques; we refer the reader to the works [6, 47] and the references therein.
We remark that the formulas in this example (e.g., (2.24)) do not explicitly depend on the dimensions of the matrices $A, B, K, Y, Q$. Hence, these kinds of problems are sometimes called dimension-free. This dimension-free feature applies to many classical problems in linear systems and has strong implications. Linear control theory problems can often be reduced to polynomials in matrix variables where the feasible set is defined by these polynomials being positive semidefinite. Analyzing this situation requires a theory of inequalities for free noncommutative polynomials extending classical real geometry for commutative polynomials. The convexity aspects of this new area, noncommutative real algebraic geometry, are the subject of Chapter 8.
Exercise 2.35. Show that for the linear difference equation (2.22), the state $x[k]$ converges to zero for all initial conditions $x_0$ if and only if $|\lambda_i(A)| < 1$ for $i = 1, \dots, n$. Hint: show that $x[k] = A^k x_0$, and consider first the case where the matrix $A$ is diagonalizable.
Exercise 2.36. The system (2.23) has a nonstabilizable mode if the matrix $A$ has a left eigenvector $w$ such that $w^T A = \lambda w^T$, $w^T B = 0$, and $|\lambda| \ge 1$. Show that if this is the case, then the SDP (2.24) cannot be feasible. Interpret this statement in terms of the eigenvalues of $A + BK$. What does this say about the dual SDP?
2.2.2
\[ \begin{array}{ll} \text{minimize} & x^T Q x \\ \text{subject to} & x_i \in \{-1, 1\}, \end{array} \tag{2.25} \]
where $Q \in S^n$. There are many well-known problems that can be naturally written in the form above. Among these, we mention the maximum cut (MAXCUT) problem, 0-1 knapsack, etc.
Notice that the Boolean constraints can be modeled using quadratic equations, i.e.,
\[ x_i \in \{-1, 1\} \quad \Longleftrightarrow \quad x_i^2 = 1. \]
\[ \begin{array}{ll} \text{minimize} & x^T Q x \\ \text{subject to} & x_i^2 = 1, \end{array} \tag{2.26} \]
and we denote the optimal value and optimal solution of this problem as $f^*$ and $x^*$, respectively. It is well known that the decision version of this problem is NP-complete (e.g., [18]). Notice that this is true even if the objective function is convex (i.e., the matrix $Q$ is positive definite), since we can always assume $Q \succ 0$ by adding to it a large constant multiple of the identity (this only shifts the objective by a constant).
Computing good solutions to the binary optimization problem (2.26) is a quite difficult task, so it is of interest to produce accurate bounds on its optimal value. As in all minimization problems, upper bounds can be directly obtained from feasible points. In other words, if $x_0 \in \mathbb{R}^n$ has entries equal to $\pm 1$, it always holds that $f^* \le x_0^T Q x_0$ (of course, for a poorly chosen $x_0$, this upper bound may be very loose).
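To prove lower bounds, we need a different technique. There are several approaches to doing this, but many of them will turn out to be exactly equivalent in the end. In particular, we can provide a lower bound in terms of the following primal-dual pair of semidefinite programming problems:
\[
\begin{array}{ll} \text{minimize} & \mathrm{Tr}\, QX \\ \text{subject to} & X_{ii} = 1, \\ & X \succeq 0, \end{array}
\qquad
\begin{array}{ll} \text{maximize} & \mathrm{Tr}\, \Lambda \\ \text{subject to} & Q \succeq \Lambda, \\ & \Lambda \text{ diagonal.} \end{array}
\tag{2.27}
\]
\[ \sum_{i=1}^{n} \Lambda_{ii} x_i^2 = \mathrm{Tr}\, \Lambda, \]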
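Any dual feasible $\Lambda$ in (2.27) gives a lower bound on $f^*$, and one such $\Lambda$ can be written down without solving an SDP. The sketch below uses the diagonally dominant choice $\Lambda_{ii} = Q_{ii} - \sum_{j \ne i} |Q_{ij}|$, which makes $Q - \Lambda$ diagonally dominant with nonnegative diagonal and hence positive semidefinite; this particular choice and the matrix $Q$ are illustrative, and the bound is in general weaker than the optimal SDP bound.

```python
from itertools import product

Q = [[0, 1, 2],
     [1, 0, 3],
     [2, 3, 0]]
n = len(Q)

def quad(Q, x):
    return sum(Q[i][j] * x[i] * x[j] for i in range(len(x)) for j in range(len(x)))

# Exact optimum of (2.26) by enumeration (only viable for very small n).
f_star = min(quad(Q, x) for x in product((-1, 1), repeat=n))

# Dual feasible diagonal matrix: Q - Lambda is diagonally dominant, hence PSD.
lam = [Q[i][i] - sum(abs(Q[i][j]) for j in range(n) if j != i) for i in range(n)]
lower_bound = sum(lam)                      # Tr(Lambda)

# Upper bound from a feasible point, here the all-ones vector.
upper_bound = quad(Q, (1,) * n)
```

On this instance the three numbers are $-12 \le -8 \le 12$, so both bounds are valid but neither is tight.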
\[ \frac{2}{\pi} \mathrm{Tr}\, Q \arcsin[X]. \tag{2.28} \]
The notation $\arcsin[\cdot]$ indicates that the arcsine function is applied componentwise, i.e., $(\arcsin[X])_{ij} = \arcsin X_{ij}$.
Exercise 2.38. Prove Lemma 2.37, and verify that it implements the hyperplane
rounding scheme.
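The arcsine formula (2.28) can be explored by simulation. The sketch below assumes the usual description of hyperplane rounding: factor $X$ as a Gram matrix of unit vectors $v_i$, draw a random Gaussian vector $r$, and set $x_i = \mathrm{sign}(v_i^T r)$. It estimates $E[x_1 x_2]$ for two unit vectors in the plane and compares it with $\frac{2}{\pi} \arcsin X_{12}$; the function name and sample size are arbitrary.

```python
import math
import random

def rounding_correlation(theta, samples=100_000, seed=0):
    """Monte Carlo estimate of E[x1 * x2] for unit vectors at angle theta."""
    rng = random.Random(seed)
    v1 = (1.0, 0.0)
    v2 = (math.cos(theta), math.sin(theta))
    total = 0
    for _ in range(samples):
        r = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))      # random hyperplane normal
        s1 = 1 if r[0] * v1[0] + r[1] * v1[1] >= 0 else -1
        s2 = 1 if r[0] * v2[0] + r[1] * v2[1] >= 0 else -1
        total += s1 * s2
    return total / samples

theta = math.pi / 3
X12 = math.cos(theta)                                # Gram entry <v1, v2>
empirical = rounding_correlation(theta)
predicted = (2.0 / math.pi) * math.asin(X12)         # entrywise version of (2.28)
```

With this angle the predicted value is $1/3$, and the estimate should agree up to sampling error of a few thousandths.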
Approximation ratios. In many problems, we want to understand how far these upper and lower bounds are from each other. Depending on the specific assumptions on the cost function, the hyperplane rounding method (or slight variations) will give
\[ x_i^2 = 1 \tag{2.29} \]
and state below our assumptions in terms of the matrix $A$ (or, equivalently, the matrix $Q$ in the minimization formulation (2.25)).
We describe next three well-known cases where constant approximation ratios can be obtained.
Diagonally dominant: A symmetric matrix $A$ is diagonally dominant if $a_{ii} \ge \sum_{j \ne i} |a_{ij}|$ for all $i$. This is an important case that corresponds, for instance, to the MAXCUT problem, where the cost function to be maximized is the Laplacian of a graph $(V, E)$, given by $\frac{1}{4} \sum_{(i,j) \in E} (x_i - x_j)^2$. Every diagonally dominant quadratic form can be written as a nonnegative linear combination of terms of the form $x_i^2$ and $(x_i \pm x_j)^2$ [4]. Thus, to analyze the performance of hyperplane rounding when $A$ is diagonally dominant, it is enough to consider the inequality
\[ E[(x_i \pm x_j)^2 / 2] = E[1 \pm x_i x_j] = 1 \pm \frac{2}{\pi} \arcsin X_{ij} \ge \alpha_{GW} (1 \pm X_{ij}), \]
\[ \frac{2}{\pi} \mathrm{Tr}\, A \arcsin[X] \ge \frac{2}{\pi} \mathrm{Tr}\, AX. \]
Notice that $2/\pi \approx 0.636$, so the approximation ratio in this case is slightly worse than for the diagonally dominant case.
Bipartite: This case corresponds to the cost function being bilinear and has been analyzed in [2, 30]. We assume that the matrix $A$ has the structure
\[ A = \frac{1}{2} \begin{bmatrix} 0 & S \\ S^T & 0 \end{bmatrix}. \]
Letting $x = [p; q]$, an equivalent formulation is in terms of a bilinear optimization problem
\[ \text{maximize} \quad p^T S q, \]
where $S \in \mathbb{R}^{n \times m}$ and $p$, $q$ are in $\{+1, -1\}^n$ and $\{+1, -1\}^m$, respectively.
This problem has a long history in operator theory and functional analysis and was first analyzed (in a quite different form) by Grothendieck. For this class of problems, it follows from his results that a constant ratio approximation is possible. In fact, the worst-case ratio (over all instances) between the values of the semidefinite relaxation and the bilinear binary optimization problem is called the Grothendieck constant and is usually denoted $K_G$,
\[ K_G := \sup_{A} \frac{\mathrm{Tr}\, A X^*}{f^*}, \]
where $X^*$ is, as before, the optimal solution of the SDP relaxation. The exact value of this constant is unknown at this time. The argument below is essentially due to Krivine [25] and provides an upper bound to the Grothendieck constant.
Since there are no assumptions about the sign of the entries of the matrix $S$, we cannot directly apply the techniques discussed earlier to prove a bound on the quality of hyperplane rounding. The basic strategy in Krivine's approach is the following: instead of using hyperplane rounding directly on the solution $X$ of the SDP relaxation, we will first apply a particular componentwise transformation, to obtain a matrix $Y$, and then apply hyperplane rounding to $Y$. The reason is that this will considerably simplify the computation of the expected value of the objective function.
To do this, we use a block version of Lemma 2.39.
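\[ \sinh(t) = \sum_{k=0}^{\infty} \frac{t^{2k+1}}{(2k+1)!}, \qquad \sin(t) = \sum_{k=0}^{\infty} (-1)^k \frac{t^{2k+1}}{(2k+1)!}, \]
\[ \frac{2}{\pi} \mathrm{Tr}\, A \arcsin[Y] = \frac{2}{\pi} \mathrm{Tr}\, S (c_K \pi X_{12} / 2) = c_K \mathrm{Tr}\, S X_{12}, \]
$K_G \le 1/c_K \approx 1.7822$. It has been recently shown that this rounding method (and thus, the value $1/c_K$) is not the best possible one [8], but the exact approximation ratio is not currently known.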
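The numerical constants quoted in the three cases above can be reproduced in a few lines. The sketch below makes two assumptions that are not spelled out in this excerpt: that $\alpha_{GW}$ is the largest constant with $1 - \frac{2}{\pi} \arcsin t \ge \alpha_{GW} (1 - t)$ on $[-1, 1]$, and that $c_K = \frac{2}{\pi} \sinh^{-1}(1)$, which is the value consistent with the bound $1/c_K \approx 1.7822$.

```python
import math

def gw_ratio(t):
    """Ratio (1 - (2/pi) arcsin t) / (1 - t), defined for -1 <= t < 1."""
    return (1.0 - (2.0 / math.pi) * math.asin(t)) / (1.0 - t)

# Grid search for the minimum of the ratio over t in [-1, 1).
grid_size = 200_000
alpha_gw = min(gw_ratio(-1.0 + 2.0 * k / grid_size) for k in range(grid_size))

# Ratio for the positive semidefinite case.
two_over_pi = 2.0 / math.pi

# Assumed form of Krivine's constant and the resulting bound on K_G.
c_K = (2.0 / math.pi) * math.asinh(1.0)
krivine_bound = 1.0 / c_K
```

A grid search is crude but adequate here, since the ratio is smooth and its minimum is attained in the interior of the interval.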
Exercise 2.41. Show that the optimal values of the primal and dual semidefinite programs in (2.27) are equal, i.e., there is no duality gap.
Exercise 2.42. The entrywise product $A \circ B$ of two matrices is given by $(A \circ B)_{ij} = A_{ij} B_{ij}$. This product is also known as the Hadamard or Schur product. The Schur product theorem says that if two matrices $A$, $B$ are positive semidefinite, so is their product $A \circ B$.
2.2.3
Given an undirected graph $G = (V, E)$, a stable set (or independent set) is a subset of the set of vertices $V$ with the property that the induced subgraph has no edges. In other words, none of the selected vertices are adjacent to each other.
The stability number of a graph, usually denoted by $\alpha(G)$, is the cardinality of the largest stable set. Computing the stability number of a graph is NP-hard. There are many interesting applications of the stable set problem. In particular, it can be used to provide upper bounds on the Shannon capacity of a graph [28], a problem that appears in coding theory (when computing the zero-error capacity of a noisy channel [43]). In fact, this was one of the first appearances of semidefinite programming.
In many problems, it is of interest to compute upper bounds on $\alpha(G)$. The Lovász theta function of the graph $G$ is denoted by $\vartheta(G)$ and is defined as the optimal value of the primal-dual SDP pair:
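\[
\begin{array}{ll} \text{maximize} & \mathrm{Tr}\, JX \\ \text{subject to} & \mathrm{Tr}\, X = 1, \\ & X_{ij} = 0, \quad (i, j) \in E, \\ & X \succeq 0, \end{array}
\qquad
\begin{array}{ll} \text{minimize} & t \\ \text{subject to} & Y \preceq tI, \\ & Y_{ii} = 1, \quad i \in V, \\ & Y_{ij} = 1, \quad (i, j) \notin E. \end{array}
\tag{2.31}
\]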
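Any feasible point of the minimization problem in (2.31) yields an upper bound on the stability number. The sketch below works this out for the 5-cycle, assuming the dual constraints read $Y_{ii} = 1$ and $Y_{ij} = 1$ for non-adjacent pairs, with the entries on edges free. It picks a circulant $Y$ with a common edge value (a hand-made choice) and computes its largest eigenvalue from the standard cosine formula for symmetric circulant matrices.

```python
import math
from itertools import combinations

n = 5
edges = {frozenset((i, (i + 1) % n)) for i in range(n)}     # the 5-cycle

def is_stable(S):
    return all(frozenset(pair) not in edges for pair in combinations(S, 2))

# Stability number by brute force.
alpha = max(len(S) for k in range(n + 1) for S in combinations(range(n), k) if is_stable(S))

# Dual candidate: Y = J + y * (adjacency matrix), i.e. entries 1 + y on edges, 1 elsewhere.
golden = (1.0 + math.sqrt(5.0)) / 2.0
y = -5.0 / (2.0 + golden)
first_row = [1.0 + (y if frozenset((0, j)) in edges else 0.0) for j in range(n)]

# Eigenvalues of a symmetric circulant matrix with this first row.
eigenvalues = [sum(first_row[j] * math.cos(2.0 * math.pi * j * k / n) for j in range(n))
               for k in range(n)]
t = max(eigenvalues)        # smallest t with Y <= t I for this particular Y
```

For this choice the bound is $t = \sqrt{5} \approx 2.236$, which is consistent with the stability number $2$ of the 5-cycle.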
$\vartheta(G) \le \chi(\bar{G})$ holds, where $\bar{G}$ is the complement of the graph $G$.
Hint: Given a coloring of $\bar{G}$, construct a feasible solution of the dual SDP in (2.31).
2.2.4
In many applications, one tries to find a function in a given function class that takes specific values at prescribed points. These kinds of questions are known as interpolation problems. A classical and important class of interpolation problems involves bounded analytic functions. The mathematical background for these problems is reviewed and developed further in Chapter 9. Good general references include [3] for the theoretical aspects, and [24, 47] for specific applications of interpolation in systems and control theory.
We discuss here two specific problems related to this area. The first is the computation of the $H_\infty$-norm of an analytic function, and the second is the classical Nevanlinna-Pick interpolation problem. Additional connections between analytic interpolation and convex optimization can be found in [6].
Norms of rational analytic functions. Let $D$ be the complex open unit disk $D = \{z \in \mathbb{C} : |z| < 1\}$. Consider a scalar rational function of a complex variable $z$ given by
\[ f(z) = c^T (z^{-1} I - A)^{-1} b + d, \tag{2.32} \]
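\[ \|f\|_\infty = \sup_{z \in D} |f(z)|. \tag{2.33} \]
$\|f\|_\infty < \gamma$ if and only if the semidefinite program
\[ \begin{bmatrix} A & b \\ c^T & d \end{bmatrix}^T \begin{bmatrix} P & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} A & b \\ c^T & d \end{bmatrix} \prec \begin{bmatrix} P & 0 \\ 0 & \gamma^2 \end{bmatrix}, \qquad P \succ 0, \tag{2.34} \]
is feasible, where the decision variable is the matrix $P \in S^n$.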
A full proof can be found, for instance, in [3, 47]. We present here only the easy direction, i.e., showing that if (2.34) holds, then we have $\|f(z)\|_\infty < \gamma$. For this, let $v = (z^{-1} I - A)^{-1} b$, and multiply the first inequality in (2.34) left and right by $[v^* \ 1]$ and its conjugate transpose, respectively. From the identity
\[ \begin{bmatrix} A & b \\ c^T & d \end{bmatrix} \begin{bmatrix} v \\ 1 \end{bmatrix} = \begin{bmatrix} z^{-1} v \\ f(z) \end{bmatrix}, \]
we have that
\[ (|z^{-1}|^2 - 1)(v^* P v) + (|f(z)|^2 - \gamma^2) < 0. \]
Since $|z^{-1}| > 1$ for $z \in D$ and $P \succ 0$, the first term is nonnegative, and thus the conclusion directly follows. The converse direction takes a bit more work; see Chapter 9. There are extensions of this result to the matrix case, i.e., where $f(z)$ is matrix-valued.
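The identity used in the argument, and the feasibility test itself, can be tried out in the scalar case $n = 1$. The sketch below assumes (2.34) has the form $M^T \mathrm{diag}(P, 1) M \prec \mathrm{diag}(P, \gamma^2)$ with $M = \begin{bmatrix} A & b \\ c^T & d \end{bmatrix}$; the data $a, b, c, d$ are hypothetical, and the scan over $p$ is a crude substitute for an SDP solver.

```python
import cmath
import math

a, b, c, d = 0.5, 1.0, 1.0, 0.0      # hypothetical scalar data (n = 1), |a| < 1

def f(z):
    """Scalar instance of (2.32): f(z) = c (1/z - a)^{-1} b + d."""
    return c * b / (1.0 / z - a) + d

# Estimate sup |f| from samples on the unit circle, where the supremum over D is approached.
samples = [cmath.exp(2j * math.pi * k / 3600) for k in range(3600)]
sup_estimate = max(abs(f(z)) for z in samples)

# The identity [a b; c d] [v; 1] = [v / z; f(z)] at a point inside the disk.
z = 0.3 + 0.4j
v = b / (1.0 / z - a)
identity_error = max(abs(a * v + b - v / z), abs(c * v + d - f(z)))

def lmi_holds(p, gamma):
    """Strict scalar version of (2.34): the 2x2 difference is negative definite and p > 0."""
    m11 = a * a * p + c * c - p
    m12 = a * b * p + c * d
    m22 = b * b * p + d * d - gamma * gamma
    return p > 0 and m11 < 0 and m11 * m22 - m12 * m12 > 0

grid = [0.01 * k for k in range(1, 2001)]
feasible_above = any(lmi_holds(p, 2.1) for p in grid)   # gamma above the norm
feasible_below = any(lmi_holds(p, 1.9) for p in grid)   # gamma below the norm
```

For these data $f(z) = z / (1 - z/2)$ and the norm is $2$, so the test is feasible for $\gamma = 2.1$ and infeasible for $\gamma = 1.9$.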
Exercise 2.46. Use the given formulation to compute the H -norm of the analytic function f (z) = z3 +zz2
2 z+3 . How can you compute, from the semidenite
formulation, a value of z at which the maximum is achieved?
2 We remark that the notation used here is slightly different from the usual notation in systems and control theory, where $z$ is used instead of $z^{-1}$ in (2.32). The reason is that for interpolation, it is more natural to use functions that are analytic on $D$ (poles outside the unit circle) than functions that are analytic outside $D$. To avoid distracting technical issues of controllability and/or observability, we use strict inequalities throughout.
Exercise 2.47. Formulate a similar statement for the matrix case. Do the same
formulas work?
Nevanlinna-Pick interpolation. Consider now the following problem. We want to find an analytic function on $D$ satisfying the interpolation constraints:
\[ f(a_k) = c_k \quad \text{for} \quad k = 1, \dots, m, \tag{2.35} \]
where $a_k \in D$. When does there exist an analytic function, satisfying the interpolation conditions, whose absolute value is bounded by 1 on the unit disk?
Clearly, a necessary condition is that the interpolated values $c_k$ must satisfy $|c_k| \le 1$ for all $k$. However, due to the analyticity constraint, this is not sufficient. Consider, for instance, the case $m = 2$ and the constraints $f(0) = 0$ and $f(1/2) = c$. In this case, a necessary condition is $|c| \le 1/2$, which is stronger than the obvious condition $|c| \le 1$. To see this, notice that, due to the first interpolation constraint, $f(z)$ must have the form $f(z) = z g(z)$, where $g(z) = f(z)/z$ is also analytic on $D$ and bounded by one (by the maximum modulus theorem, since $|f(z)| = |g(z)|$ on the unit circle). Thus, we must have $1 \ge |g(1/2)| = 2|c|$, and thus $|c| \le 1/2$.
Necessary and sufficient conditions for the interpolation problem to be feasible are given by the Nevanlinna-Pick theorem; see Chapter 9. The formulation below is convenient from the optimization viewpoint.
Theorem 2.48. There exists a function f (z) analytic on D, satisfying the norm
bound
f (z)
and the interpolation constraints (2.35) if and only if
Z
C
0,
(2.36)
C Z 1
where Zjk =
1
1a
j ak
and C = Diag(c1 , . . . , cm ).
Using Schur complements, it can be easily seen that this formulation is equivalent to the more usual characterization where the m m Pick matrix P given by
Pjk =
2 cj ck
1 aj ak
is required to be positive semidefinite (e.g., Section 9.8). The advantage of condition (2.36) is that it is linear in the interpolation values $c_k$. This allows its use in a variety of system identification problems; see, for instance, [11, 35]. The Nevanlinna-Pick interpolation problem has many important applications in systems and control theory; see, for instance, [14] and [47] and the references therein.
2.2.5
Assume we are given a list of pairwise distances between a finite number of points. Under what conditions can the points be embedded in some finite-dimensional space and those distances be realized as the Euclidean metric between the embedded points? This problem appears in a large number of applications, including distance geometry, computational chemistry, sensor network localization, and machine learning.
Concretely, assume we have a list of distances $d_{ij}$ for $1 \le i < j \le n$. We would like to find points $x_i \in \mathbb{R}^k$ (for some value of $k$) such that $\|x_i - x_j\| = d_{ij}$ for all $i, j$. What are necessary and sufficient conditions for such an embedding to exist? In 1935, Schoenberg [41] gave an exact characterization in terms of the semidefiniteness of the matrix of squared distances.
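Theorem 2.49. The distances $d_{ij}$ can be embedded in a Euclidean space if and only if the $n \times n$ matrix
\[
D := \begin{bmatrix}
0 & d_{12}^2 & d_{13}^2 & \dots & d_{1n}^2 \\
d_{12}^2 & 0 & d_{23}^2 & \dots & d_{2n}^2 \\
d_{13}^2 & d_{23}^2 & 0 & \dots & d_{3n}^2 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
d_{1n}^2 & d_{2n}^2 & d_{3n}^2 & \dots & 0
\end{bmatrix}
\]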
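\[
G := \begin{bmatrix}
\langle x_1, x_1 \rangle & \langle x_1, x_2 \rangle & \dots & \langle x_1, x_n \rangle \\
\langle x_2, x_1 \rangle & \langle x_2, x_2 \rangle & \dots & \langle x_2, x_n \rangle \\
\vdots & \vdots & \ddots & \vdots \\
\langle x_n, x_1 \rangle & \langle x_n, x_2 \rangle & \dots & \langle x_n, x_n \rangle
\end{bmatrix}
= [x_1, \dots, x_n]^T [x_1, \dots, x_n],
\]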
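The semidefiniteness condition on the squared-distance matrix can be probed numerically. The statement of Theorem 2.49 is incomplete in this excerpt, so the sketch below assumes the classical form of Schoenberg's criterion: $D$ is negative semidefinite on the subspace orthogonal to the all-ones vector $e$. For embedded points this follows from $v^T D v = -2 \|\sum_i v_i x_i\|^2$ whenever $\sum_i v_i = 0$. The random search is only a heuristic test, and all names and data are illustrative.

```python
import random

def squared_distance_matrix(points):
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]

def max_form_on_e_perp(D, trials=2000, seed=0):
    """Largest sampled value of v^T D v / v^T v over random v with sum(v) = 0."""
    rng = random.Random(seed)
    n = len(D)
    best = float("-inf")
    for _ in range(trials):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(v) / n
        v = [vi - mean for vi in v]                      # project onto the complement of e
        norm2 = sum(vi * vi for vi in v)
        form = sum(D[i][j] * v[i] * v[j] for i in range(n) for j in range(n))
        best = max(best, form / norm2)
    return best

# Distances coming from actual points in the plane.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
embeddable = max_form_on_e_perp(squared_distance_matrix(points))

# Distances d12 = 1, d23 = 1, d13 = 3 violate the triangle inequality.
bad_D = [[0.0, 1.0, 9.0], [1.0, 0.0, 1.0], [9.0, 1.0, 0.0]]
not_embeddable = max_form_on_e_perp(bad_D)
```

A positive sampled value certifies that the condition fails, while a nonpositive maximum over finitely many samples is only evidence, not a proof, that it holds.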
2.2.6
An interesting class of optimization problems appearing in many application domains is rank minimization problems. These have the form
\[ \begin{array}{ll} \text{minimize} & \mathrm{rank}\, X \\ \text{subject to} & X \in C, \end{array} \tag{2.37} \]
where the matrix $X \in \mathbb{R}^{m \times n}$ is the decision variable, and $C$ is a given convex constraint set. Notice that the cost function is integer-valued, and thus (unless the problem is trivial) these optimization problems are not convex.
Rank minimization questions arise in many different areas, since notions such as order, complexity, and dimensionality can often be expressed by means of the rank of an appropriate matrix. For example, a low-rank matrix could correspond to a low-degree statistical model for a random process (e.g., factor analysis), a low-order realization of a linear dynamical system, or a low-dimensional embedding of data in Euclidean space (as in Section 2.2.5). If the set of models that satisfy the desired constraints is convex, then choosing the simplest one in a given family can be formulated as a rank minimization problem of the form (2.37).
In general, rank minimization problems can be quite difficult to solve, both
in theory and practice. However, several researchers have proposed heuristic techniques
to obtain good approximate solutions. A particularly interesting method is
the nuclear norm heuristic, originally proposed in [17, 16]. In this method, instead
of directly solving the problem (2.37), one solves
\[
\begin{array}{ll}
\text{minimize} & \|X\|_* \\
\text{subject to} & X \in \mathcal{C},
\end{array}
\tag{2.38}
\]
where $\|\cdot\|_*$ is the nuclear norm defined earlier in (2.11). In other words, the
difficult objective function (rank) is replaced by a nicer cost function (nuclear
norm) which is convex, and thus the resulting problem is convex.
Under certain conditions on the set $\mathcal{C}$, it has been shown that the solution of
the problem (2.38) coincides with the lowest-rank solution, i.e., the true solution
of (2.37). For example, a typical formulation (see, e.g., [39] for a specific statement)
would establish that if the set $\mathcal{C}$ is a subspace of dimension $O(n \log n)$, uniformly chosen
according to a natural rotation-invariant probability measure, then the nuclear
norm heuristic succeeds with high probability.
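To see concretely why (2.38) fits the semidefinite framework, it may help to recall a standard semidefinite characterization of the nuclear norm. It is stated here as a well-known fact rather than quoted from this chapter, and it may coincide with the definition referenced in (2.11); the auxiliary symmetric matrices $W_1$ and $W_2$ are symbols introduced only for this illustration.

```latex
\[
\|X\|_* \;=\; \min_{W_1,\, W_2} \ \tfrac{1}{2}\bigl(\operatorname{Tr} W_1 + \operatorname{Tr} W_2\bigr)
\quad \text{subject to} \quad
\begin{bmatrix} W_1 & X \\ X^T & W_2 \end{bmatrix} \succeq 0 .
\]
```

Under this description, whenever the convex set in (2.38) is itself described by linear equations or linear matrix inequalities, the heuristic amounts to solving a single semidefinite program in the variables $(X, W_1, W_2)$.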
Atomic norms. An interesting generalization of these methods is obtained by
considering more general atomic norms [10]. Consider a set $\mathcal{A}$ of atoms $v_i$ in some
vector space $V$ (the set $\mathcal{A}$ can be finite or infinite). Given an element $a \in V$, we are
interested in the smallest decomposition of $a$ in terms of the elements $v_i$, i.e., the
one that satisfies
\[
\begin{array}{ll}
\text{minimize} & \sum_i |\alpha_i| \\
\text{subject to} & a = \sum_i \alpha_i v_i.
\end{array}
\tag{2.39}
\]
We can then define the atomic norm $\|a\|_{\mathcal{A}}$ as the optimal value of this optimization
problem. If the set of atoms is finite, this is a linear programming problem. In most
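\[
\begin{array}{ll}
\text{maximize} & \operatorname{Tr} A^T Y \\[2pt]
\text{subject to} & \begin{bmatrix} \operatorname{Diag}(p) & Y \\ Y^T & \operatorname{Diag}(q) \end{bmatrix} \succeq 0, \\[8pt]
& \sum_{i=1}^{m} p_i + \sum_{i=1}^{n} q_i = 2,
\end{array}
\qquad\qquad
\begin{array}{ll}
\text{minimize} & t \\[2pt]
\text{subject to} & \begin{bmatrix} V & A \\ A^T & W \end{bmatrix} \succeq 0, \\[8pt]
& V_{ii} = t, \quad W_{ii} = t.
\end{array}
\tag{2.40}
\]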
A
2
A
A KG
A
2 ,
where KG is the Grothendieck constant.
Exercise 2.54. Based on the previous exercise, explain the geometric relationship
between the unit ball of the $\gamma_2$-norm in $\mathbb{R}^{m \times n}$ and the elliptope $\mathcal{E}_{m+n}$ defined earlier
in Section 2.1.3.
2.3
Algorithms and Software
2.3.1
In this section we describe a few algorithmic and complexity aspects of the numerical
solution of semidefinite optimization problems. For a complete treatment, we refer
the reader to articles and monographs such as [13, 32, 44, 45].
Semidefinite programs are convex optimization problems and, as such, can be
solved using general convex optimization techniques. Under natural assumptions
(e.g., to rule out doubly exponentially small solutions), semidefinite optimization is
solvable in polynomial time, in the sense that $\epsilon$-suboptimal, weakly feasible solutions
can be computed in time polynomial in $\log \frac{1}{\epsilon}$. This follows, for instance, from general
results about the ellipsoid method [21].
Despite these nice theoretical results, the ellipsoid method is often too slow
in practice. Since SDP is a generalization of linear programming, it is natural that
some of the most effective practical methods for SDP have been inspired by state-of-the-art
techniques from LP. This has led to the development of interior-point
methods [1, 32] for SDP. The basic idea of interior-point methods is to consider the
optimality conditions of Lemma 2.12 and to perturb the complementarity slackness
condition to $(C - \sum_i A_i y_i) X = \mu I$. As $\mu$ varies, these equations implicitly define a
curve $(X_\mu, y_\mu)$ called the central path, and to solve the original problem we need to
compute $(X_\mu, y_\mu)$ as $\mu \to 0$. These equations are relatively easy to solve for large $\mu$,
and by carefully decreasing the value of $\mu$, it is possible to use Newton's method to
efficiently track solutions as $\mu$ decreases to zero. There are several different versions
of these methods (depending on the exact form of the equations to which Newton's
method is applied), although they all share fairly similar features. In particular,
primal-dual interior-point methods of this kind are among the most efficient known
methods for small- and medium-scale SDP problems.
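For orientation, the perturbed system behind the central path can be written out in full. The display below is a sketch under the assumption that (SDP-P) is in the usual equality form (minimize $\langle C, X \rangle$ subject to $\langle A_i, X \rangle = b_i$ for $i = 1, \dots, m$ and $X \succeq 0$, with $X$ of size $n \times n$); the slack matrix $S$ is a symbol introduced here for illustration.

```latex
\[
\begin{aligned}
\langle A_i, X \rangle &= b_i, \qquad i = 1, \dots, m, \\
\textstyle\sum_i A_i y_i + S &= C, \\
X S &= \mu I, \qquad X \succ 0, \quad S \succ 0 .
\end{aligned}
\]
```

Taking the trace of the last equation gives $\langle C, X \rangle - b^T y = \langle X, S \rangle = n\mu$, so under these assumptions the duality gap along the central path equals $n\mu$ and vanishes as $\mu$ decreases to zero.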
Besides interior-point methods, there are several alternative techniques for
solving SDPs that are sometimes preferable to pure primal-dual methods due
to speed or memory efficiency issues. Examples of these are techniques based on
low-rank factorizations [9], spectral bundle methods [23], or augmented Lagrangian
methods for large-scale problems [46], among others.
2.3.2
Software
There are a number of useful software packages for polyhedral computations, linear
and semidefinite programming, and algebraic visualization. We present below
a partial annotated selection. A few good up-to-date web resources for general
information about semidefinite programming include Christoph Helmberg's SDP
page www-user.tu-chemnitz.de/helmberg/semidef.html and the SDPA website
sdpa.sourceforge.net.
SeDuMi, originally by Jos Sturm, currently being maintained by the optimization group at Lehigh University (sedumi.ie.lehigh.edu), is a widely used
MATLAB package for linear, quadratic, second order conic, and semidefinite
optimization, and any combination of these.
An easy and convenient way to try out many of these packages, without installing
them in a local machine, is through the NEOS Optimization server (neos-server.org),
currently hosted by the University of Wisconsin-Madison.
Parsers. In practice, specifying a semidefinite programming problem by explicitly
defining matrices $A_i$, $C$, and $b$ in (SDP-P) can be cumbersome and error-prone.
A much more convenient and reliable way is to use a natural description of the
variables and inequalities and to automatically translate these into standard form
using a parser or modeling language. Two well-known and convenient modeling
environments for semidefinite programming are the following:
CVX, by Michael Grant and Stephen Boyd.
cvxr.com/cvx. CVX is a MATLAB-based disciplined convex programming
software. It is particularly well suited to conic optimization, including semidefinite and geometric programming.
YALMIP, by Johan Löfberg.
yalmip.org. YALMIP is a MATLAB-based parser and solver for the modeling
and solution of convex and nonconvex optimization problems.
Bibliography
[1] F. Alizadeh. Interior point methods in semidefinite programming with applications to combinatorial optimization. SIAM J. Optim., 5(1):13–51, 1995.
[2] N. Alon and A. Naor. Approximating the cut-norm via Grothendieck's inequality. In Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, ACM, New York, 2004, pp. 72–80.
[3] J.A. Ball, I. Gohberg, and L. Rodman. Interpolation of Rational Matrix Functions. Birkhäuser, Basel, 1990.
[4] G.P. Barker and D. Carlson. Cones of diagonally dominant matrices. Pacific J. Math., 57(1):15–32, 1975.
[5] D. Bertsimas and J. N. Tsitsiklis. Introduction to Linear Optimization. Athena Scientific, Cambridge, MA, 1997.
[6] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan. Linear Matrix Inequalities in System and Control Theory, Studies in Applied Mathematics 15. SIAM, Philadelphia, 1994.
[7] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, UK, 2004.
[8] M. Braverman, K. Makarychev, Y. Makarychev, and A. Naor. The Grothendieck constant is strictly smaller than Krivine's bound. In the IEEE 52nd Annual Symposium on Foundations of Computer Science (FOCS), IEEE, Washington, DC, 2011, pp. 453–462.
[9] S. Burer and R. D.C. Monteiro. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Mathematical Programming, 95(2):329–357, 2003.
[10] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A.S. Willsky. The convex geometry of linear inverse problems. Foundations of Computational Mathematics, 12:805–849, 2012.
[11] J. Chen, C.N. Nett, and M.K.H. Fan. Worst case system identification in $H_\infty$: Validation of a priori information, essentially optimal algorithms, and error bounds. IEEE Transactions on Automatic Control, 40(7):1260–1265, 1995.
[12] V. Chvátal. Linear Programming. W.H. Freeman, New York, 1983.
[13] E. de Klerk. Aspects of Semidefinite Programming: Interior Point Algorithms and Selected Applications, Applied Optimization 65. Kluwer Academic Publishers, Dordrecht, The Netherlands, 2002.
[14] P. Delsarte, Y. Genin, and Y. Kamp. On the role of the Nevanlinna–Pick problem in circuit and system theory. International Journal of Circuit Theory and Applications, 9(2):177–187, 1981.
[15] M. M. Deza and M. Laurent. Geometry of Cuts and Metrics, Algorithms and Combinatorics 15. Springer-Verlag, Berlin, 1997.
[16] M. Fazel. Matrix Rank Minimization with Applications. Ph.D. thesis, Stanford University, Stanford, CA, 2002.
[17] M. Fazel, H. Hindi, and S.P. Boyd. A rank minimization heuristic with application to minimum order system approximation. In Proceedings of the American Control Conference, volume 6, IEEE, Washington, DC, 2001, pp. 4734–4739.
[18] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, New York, 1979.
[19] M. X. Goemans. Semidefinite programming in combinatorial optimization. Math. Programming, 79(1–3):143–161, 1997.
[20] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42(6):1115–1145, 1995.
[21] M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization, 2nd ed., Algorithms and Combinatorics 2. Springer-Verlag, Berlin, 1993.
[22] O. Güler. Hyperbolic polynomials and interior point methods for convex programming. Math. Oper. Res., 22(2):350–377, 1997.
[23] C. Helmberg and F. Rendl. A spectral bundle method for semidefinite programming. SIAM Journal on Optimization, 10(3):673–696, 2000.
[24] J.W. Helton. Operator Theory, Analytic Functions, Matrices, and Electrical Engineering. CBMS Regional Conference Series in Mathematics 68. AMS, Providence, RI, 1987.
[25] J.L. Krivine. Constantes de Grothendieck et fonctions de type positif sur les sphères. Adv. Math., 31:16–30, 1979.
[26] M. Laurent and S. Poljak. On a positive semidefinite relaxation of the cut polytope. Linear Algebra and Its Applications, 223:439–461, 1995.
[27] T. Lee and A. Shraibman. Lower bounds in communication complexity. Foundations and Trends in Theoretical Computer Science, 3(4), 2009.
[28] L. Lovász. On the Shannon capacity of a graph. IEEE Transactions on Information Theory, 25(1):1–7, 1979.
[29] J. Matoušek and B. Gärtner. Understanding and Using Linear Programming. Springer-Verlag, New York, 2007.
[30] A. Megretski. Relaxations of quadratic programs in operator theory and system analysis. In Systems, Approximation, Singular Integral Operators, and Related Topics (Bordeaux, 2000), Oper. Theory Adv. Appl. 129. Birkhäuser, Basel, 2001, pp. 365–392.
[31] Y. Nesterov. Semidefinite relaxation and nonconvex quadratic optimization. Optimization Methods and Software, 9:141–160, 1998.
[32] Y. E. Nesterov and A. Nemirovski. Interior Point Polynomial Methods in Convex Programming, Studies in Applied Mathematics 13. SIAM, Philadelphia, 1994.
[33] J. Nie, P. A. Parrilo, and B. Sturmfels. Semidefinite representation of the k-ellipse. IMA Volumes in Mathematics and Its Applications, 146:117–132, 2008.
[34] A. Packard and J. C. Doyle. The complex structured singular value. Automatica J. IFAC, 29(1):71–109, 1993.
[35] P. A. Parrilo, M. Sznaier, R.S. Sánchez Peña, and T. Inanc. Mixed time/frequency-domain based robust identification. Automatica J. IFAC, 34(11):1375–1389, 1998.
[36] M. V. Ramana. An exact duality theory for semidefinite programming and its complexity implications. Math. Programming, 77(2, Ser. B):129–162, 1997.
Chapter 3
Polynomial Optimization, Sums of Squares, and Applications
Pablo A. Parrilo
We begin the study of one of the main themes of the book, namely, the relationships
between nonnegative polynomials, sums of squares, and semidefinite programming.
The two key ideas around which this chapter is structured are
sum of squares decompositions of polynomials can be computed using
semidefinite programming,
and
the search for infeasibility certificates for real polynomial systems is a
convex problem. Given an upper bound on the degree of the certificates,
they can be found by solving a sum of squares program.
In the rest of this chapter, we define and explain the basic concepts needed to make
these assertions precise. For this, in Section 3.1 we introduce nonnegative polynomials,
sum of squares decompositions, and the notion of sum of squares programs,
followed by a few simple but important applications in Section 3.2. In Section 3.3
we explore how the presence of additional algebraic structure, such as symmetries
or sparsity, enables more efficient computations. We then explain how these results
can be used to provide infeasibility certificates for systems of polynomial inequalities
and the important implications for polynomial optimization (Section 3.4). Section 3.5
explores the dual side, including geometric and probabilistic interpretations.
Finally, in Section 3.6, we present additional applications of the methods in diverse
areas of applied mathematics and engineering, concluding with a short discussion
of current software implementations.
3.1
3.1.1
Nonnegative Polynomials
We consider polynomials in $n$ variables, with real coefficients. A multivariate polynomial $p(x_1, \dots, x_n)$ is nonnegative if it takes only nonnegative values, i.e.,
\[ p(x_1, \dots, x_n) \ge 0 \quad \text{for all } (x_1, \dots, x_n) \in \mathbb{R}^n. \tag{3.1} \]
(3.2)
We normally assume that the leading coefficient $p_d$ is not zero, and occasionally we
will normalize it to $p_d = 1$, in which case we say that $p(x)$ is monic. The roots are
the values of $x$ at which $p(x)$ vanishes. By the fundamental theorem of algebra,
there is a unique factorization
\[ p(x) = p_d \prod_{i=1}^{d} (x - x_i), \tag{3.3} \]
where the (complex) roots $x_i$ may have multiplicities, i.e., they are not necessarily
all distinct.
How do we decide if $p$ is nonnegative? An obvious necessary condition
is that the degree of $p(x)$ be even. Otherwise, if the degree is odd, then either as
$x \to \infty$ or as $x \to -\infty$, the polynomial $p(x)$ will become negative.
In some simple cases, it is possible to give direct characterizations.
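\[
H_1(p) = \begin{bmatrix}
s_0 & s_1 & \cdots & s_{d-1} \\
s_1 & s_2 & \cdots & s_d \\
\vdots & \vdots & \ddots & \vdots \\
s_{d-1} & s_d & \cdots & s_{2d-2}
\end{bmatrix},
\qquad
s_k = \sum_{j=1}^{d} x_j^k,
\tag{3.4}
\]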
where, as before, $x_j$ are the roots of $p(x)$. The quantities $s_k$ are known as the power
sums and, remarkably, can be obtained directly from the coefficients of $p(x)$ using
the Newton identities, with no root computation needed; see Exercise 3.5. When
$p(x)$ is monic, the $s_k$ are polynomials of degree $k$ in the coefficients of $p(x)$.
It turns out that we can count the real roots of $p(x)$ by analyzing the inertia
of its Hermite matrix (see Appendix A for background material on matrix inertia).
The following theorems make this connection precise.
Theorem 3.2. The rank of the Hermite matrix $H_1(p)$ is equal to the number of
distinct (complex) roots. Its signature is equal to the number of distinct real roots.
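The root-counting statement of Theorem 3.2 can be tried out in exact arithmetic. The following is a minimal sketch, not the book's algorithm: it assumes a monic polynomial $x^d + c_{d-1}x^{d-1} + \cdots + c_0$ given as the list `[c_0, ..., c_{d-1}]` (an indexing convention chosen here), uses the standard form of Newton's identities for the power sums, and takes the signature to be the number of positive minus the number of negative eigenvalues, computed by a congruence diagonalization. All function names are hypothetical.

```python
from fractions import Fraction


def power_sums(coeffs, count):
    """Power sums s_0..s_{count-1} of the roots of the monic polynomial
    x^d + coeffs[d-1] x^(d-1) + ... + coeffs[0], via Newton's identities."""
    d = len(coeffs)
    s = [Fraction(d)]
    for k in range(1, count):
        total = Fraction(0)
        for j in range(1, min(k, d + 1)):      # j = 1 .. min(k - 1, d)
            total += coeffs[d - j] * s[k - j]
        if k <= d:
            total += k * coeffs[d - k]
        s.append(-total)
    return s


def hermite_matrix(coeffs):
    """Hankel matrix with entries s_{i+j}, 0 <= i, j <= d - 1, as in (3.4)."""
    d = len(coeffs)
    s = power_sums(coeffs, 2 * d - 1)
    return [[s[i + j] for j in range(d)] for i in range(d)]


def _swap(A, i, j):
    """Swap rows and columns i and j (a congruence transformation)."""
    A[i], A[j] = A[j], A[i]
    for row in A:
        row[i], row[j] = row[j], row[i]


def rank_and_signature(M):
    """Rank and signature (#positive - #negative) of a symmetric matrix,
    by exact symmetric elimination (Sylvester's law of inertia)."""
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    pos = neg = 0
    for k in range(n):
        if A[k][k] == 0:
            diag = next((i for i in range(k + 1, n) if A[i][i] != 0), None)
            if diag is not None:
                _swap(A, k, diag)
            else:
                pair = next(((i, j) for i in range(k, n)
                             for j in range(i + 1, n) if A[i][j] != 0), None)
                if pair is None:
                    break                      # remaining block is zero
                i, j = pair
                for c in range(n):             # row_i += row_j
                    A[i][c] += A[j][c]
                for r in range(n):             # col_i += col_j
                    A[r][i] += A[r][j]
                _swap(A, k, i)
        pivot = A[k][k]
        if pivot > 0:
            pos += 1
        else:
            neg += 1
        for i in range(k + 1, n):
            f = A[i][k] / pivot
            if f != 0:
                for c in range(n):
                    A[i][c] -= f * A[k][c]
                for r in range(n):
                    A[r][i] -= f * A[r][k]
    return pos + neg, pos - neg


# x^4 - 1 has four distinct roots (1, -1, i, -i), two of them real.
example = rank_and_signature(hermite_matrix([-1, 0, 0, 0]))
```

For instance, `rank_and_signature(hermite_matrix([1, -2]))` treats $(x-1)^2$, which has a single distinct root that is real.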
Theorem 3.3. Let p(x) be a monic univariate polynomial of degree 2d. Then, the
following are equivalent:
1. The polynomial p(x) is strictly positive.
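\[
\mathcal{I}(H_1(p)) =
\begin{cases}
(0, 0, 2) & \text{if } \Delta > 0, \\
(0, 1, 1) & \text{if } \Delta = 0, \\
(1, 0, 1) & \text{if } \Delta < 0,
\end{cases}
\]
and thus $p$ is strictly positive if and only if $p_1^2 - 4p_0 < 0$.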
Exercise 3.5. Let p(x) be a monic univariate polynomial as in (3.2). Show that
the power sums sk satisfy the recursive equations:
s0 = d,
sk =
k
(1)j1 pj skj ,
k = 1, 2, . . . .
j=1
\[ \sum_{i=1}^{d} f(x_i)^2, \]
1 T
x Ax + 2bT x + c,
2
This agrees with Example 3.1, which corresponds to the monic case where
p2 = 1.
As we will shortly see, although always convex, the cone of nonnegative polynomials
has a fairly complicated geometry in the general case. In Chapter 4, further features
of this set will be studied in detail.
Exercise 3.12. Prove Theorem 3.10.
Except for special situations like the quadratic case of Example 3.11, it will
not be easy to efficiently obtain explicit descriptions of $P_{n,2d}$. The reason is that the
algebraic and combinatorial structure of the set of nonnegative polynomials can be
extremely complicated, even though it is a convex set. As a consequence, obtaining
general explicit inequalities (e.g., on the coefficients) that define when a polynomial
is nonnegative can be a very complex, or even hopeless, task.
To understand this situation in more detail, we discuss the algebraic and
geometric situation with the help of a few examples, followed by a discussion of the
computational complexity aspects.
$P_{n,2d}$ is semialgebraic but is not basic semialgebraic. Recall that in Example 3.1
we provided explicit inequalities for the set $P_{1,2}$ of univariate quadratics.
Since this description did not include quantifiers or logical operations (e.g., set
unions, implications), we obtained a basic semialgebraic set (see Section A.4.4 in
Appendix A). As we will see, such convenient descriptions are not possible in general,
since the set of nonnegative polynomials is not basic semialgebraic for $2d \ge 4$.
To see why this is the case, consider the following example, describing a particular affine section of $P_{1,4}$.
Example 3.13. Let $p(x)$ be the quartic univariate polynomial $p(x) = x^4 + 2ax^2 + b$.
For what values of $a, b$ is $p(x)$ nonnegative? Since the leading term $x^4$ has even
degree and is strictly positive, $p(x)$ is strictly positive if and only if it has no real
roots. The discriminant$^1$ of $p(x)$ is equal to $\mathrm{Dis}_x(p) = 256\, b\, (a^2 - b)^2$. For the
number of real roots to change, the discriminant must vanish, and thus the zero
set of the discriminant partitions the set of parameters $(a, b)$ into regions where
the number of real roots is constant. The subset of $(a, b) \in \mathbb{R}^2$ for which $p(x)$ is
positive corresponds to the case of no real roots, with its closure being the region
of nonnegativity. Notice that (as expected) this subset is convex and is shown in
Figure 3.1.
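The region of nonnegativity in this example can also be obtained by hand. The short derivation below is added for illustration; the auxiliary variable $y = x^2$ is a symbol introduced here. Since $p$ depends on $x$ only through $x^2$, nonnegativity of $p$ on $\mathbb{R}$ is the same as nonnegativity of a quadratic on the half-line $y \ge 0$, whose minimum is attained at $y = 0$ when $a \ge 0$ and at $y = -a$ when $a < 0$.

```latex
\[
x^4 + 2ax^2 + b \ \ge 0 \ \ \forall x \in \mathbb{R}
\quad\Longleftrightarrow\quad
y^2 + 2ay + b \ \ge 0 \ \ \forall y \ge 0
\quad\Longleftrightarrow\quad
\begin{cases}
b \ge 0 & \text{if } a \ge 0, \\
b \ge a^2 & \text{if } a < 0.
\end{cases}
\]
```

Both boundary pieces, the ray $b = 0$, $a \ge 0$ and the arc $b = a^2$, $a < 0$, lie in the zero set of $256\, b\, (a^2 - b)^2$. The other half of the parabola, $b = a^2$ with $a > 0$, also lies in that zero set but falls strictly inside the region, since there $p(x) = (x^2 + a)^2$ has a repeated pair of complex roots and no real root.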
As the example illustrates, in the univariate case it is easy to see that if $p(x)$
lies on the boundary of the set $P_{1,2d}$, then it must have a real root, of multiplicity at
least two. Indeed, if there is no real root, then $p(x)$ is in the strict interior of $P_{1,2d}$
(small enough perturbations will not create a root), and if it has a simple real root
it clearly cannot be nonnegative. Thus, on the boundary of $P_{1,2d}$, the discriminant
$^1$ The discriminant $\mathrm{Dis}_x(p)$ of a univariate polynomial $p(x)$ is a polynomial in the coefficients of
$p$ that vanishes if and only if $p$ has a multiple root. It is defined as the resultant between $p(x)$ and
its derivative $p'(x)$; see [32] or [120] for an introduction to polynomial resultants and discriminants.
Figure 3.1. The discriminant $\mathrm{Dis}_x(p)$ partitions the parameter space $(a, b)$
into regions where the number of real roots is constant. The numbers indicate how
many real roots the polynomial $x^4 + 2ax^2 + b$ has whenever $(a, b)$ are in the corresponding region. The shaded set corresponds to the polynomial being nonnegative.
$\mathrm{Dis}_x(p)$ must necessarily vanish. However, it turns out that the discriminant does
not vanish only on the boundary, but it may also vanish at points inside the set;
see Figure 3.1. The algebraic reason is that pairs of complex roots may coincide,
which will cause the discriminant to vanish, even though this does not directly affect
nonnegativity of $p$.
This situation can create some serious difficulties. For instance, even though
we have a perfectly valid analytic expression for the boundary of the set, we cannot
get a good sense of how far we are from the boundary by looking at the absolute
value of the discriminant (this would be very useful for numerical optimization over
$P_{n,2d}$). A more algebraic way of describing the situation is that $P_{n,2d}$ is a convex set
with the complicating feature that the Zariski closure of the boundary intersects
the interior of the set.
In general, these sets are not very convenient to work with since we cannot
describe them in terms of unquantified inequalities.
Lemma 3.14. The set discussed in Example 3.13 and presented in Figure 3.1 is
not basic semialgebraic.
The fact that $P_{n,2d}$ is not basic semialgebraic (for $2d \ge 4$) means that there
is no description of $P_{n,2d}$ in terms of a finite collection of polynomial inequalities
$\{g_1(p_\alpha) \ge 0, \dots, g_m(p_\alpha) \ge 0\}$ in the coefficients $p_\alpha$. In other words, any characterization of the set $P_{n,2d}$ using polynomial inequalities must necessarily include logical
operations between sets (e.g., unions, complements) or other similar complications.
Things can be even more complicated than what Figure 3.1 suggests in the
sense that (as opposed to what may be inferred from this figure) in higher dimensions
it is impossible to remove the undesired component (i.e., the discriminant does
not factor, as it did in this example). Consider the case of a quartic polynomial of
Figure 3.2. The zero set of the discriminant of the polynomial $x^4 + 4ax^3 + 6bx^2 + 4cx + 1$. The convex set inside the bowl corresponds to the region of
nonnegativity. There is an additional one-dimensional component inside the set.
the form $p(x) = x^4 + 4ax^3 + 6bx^2 + 4cx + 1$. Its discriminant (up to a nonessential
numerical factor) is the irreducible polynomial
\[
\begin{aligned}
&1 - 27a^4 - 64c^3a^3 + 108bca^3 - 54b^3a^2 + 36b^2c^2a^2 - 6c^2a^2 + 54ba^2 \\
&\quad + 108bc^3a - 180b^2ca - 12ca + 81b^4 - 27c^4 - 18b^2 - 54b^3c^2 + 54bc^2.
\end{aligned}
\]
The zero set of this discriminant, shown in Figure 3.2, is an algebraic surface that
defines the boundary of a three-dimensional convex set, corresponding to the values
of $(a, b, c)$ for which $p(x)$ is nonnegative. It can be shown that this convex set is the
convex hull of two parabolas, defined parametrically as
\[
t \mapsto \left( t, \ \frac{2t^2 - 1}{3}, \ -t \right),
\qquad
t \mapsto \left( t, \ \frac{2t^2 + 1}{3}, \ t \right),
\]
respectively, and that the surface is singular along these parabolas (these correspond
to the cases when the polynomial factors as $p(x) = (x^2 + 2tx \pm 1)^2$).
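The claim that the two parabolas lie on the discriminant surface is easy to test in exact arithmetic. The sketch below is an illustration, not part of the original text: it types in the sixteen-term polynomial with signs fixed so that it agrees with the classical invariant expression $I^3 - 27J^2$ of a binary quartic, and evaluates it at rational points of the curves obtained from $(x^2 + 2tx + 1)^2$ and $(x^2 + 2tx - 1)^2$. The function names are hypothetical.

```python
from fractions import Fraction


def quartic_discriminant(a, b, c):
    """Discriminant (up to a constant factor) of x^4 + 4a x^3 + 6b x^2 + 4c x + 1."""
    return (1 - 27 * a**4 - 64 * c**3 * a**3 + 108 * b * c * a**3
            - 54 * b**3 * a**2 + 36 * b**2 * c**2 * a**2 - 6 * c**2 * a**2
            + 54 * b * a**2 + 108 * b * c**3 * a - 180 * b**2 * c * a
            - 12 * c * a + 81 * b**4 - 27 * c**4 - 18 * b**2
            - 54 * b**3 * c**2 + 54 * b * c**2)


def invariant_form(a, b, c):
    """Same quantity written as I^3 - 27 J^2 with the classical invariants."""
    I = 1 - 4 * a * c + 3 * b**2
    J = b + 2 * a * b * c - b**3 - c**2 - a**2
    return I**3 - 27 * J**2


def parabola_plus(t):
    """Coefficients (a, b, c) of (x^2 + 2t x + 1)^2."""
    return (t, (2 * t * t + 1) / 3, t)


def parabola_minus(t):
    """Coefficients (a, b, c) of (x^2 + 2t x - 1)^2."""
    return (t, (2 * t * t - 1) / 3, -t)


samples = [Fraction(n, 4) for n in range(-8, 9)]
values_plus = [quartic_discriminant(*parabola_plus(t)) for t in samples]
values_minus = [quartic_discriminant(*parabola_minus(t)) for t in samples]

# x^4 + 1 has four distinct roots, so the discriminant is nonzero at the origin.
value_at_origin = quartic_discriminant(Fraction(0), Fraction(0), Fraction(0))
```

Vanishing on finitely many sample points is of course only a sanity check; the general statement follows because each point on the parabolas gives a perfect square, which has repeated roots.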
From the numerical optimization viewpoint, the presence of extraneous components of the discriminant in the interior of the feasible set is also an important
roadblock for the availability of easily computable barrier functions for these sets
(even in the univariate case). Indeed, every polynomial that vanishes on the boundary of the set $P_{1,2d}$ must necessarily contain the discriminant as a factor. This is a
striking difference from the case of the nonnegative orthant or the positive semidefinite cone, where the standard barriers are given (up to a logarithm) by products
of the linear constraints or a determinant (which are polynomials). A possible
\[ b \ge (a - t)^2, \quad t \ge 0\}. \]
In Figure 3.3 we present a plot of this three-dimensional convex set and its projection
onto the plane $(a, b)$ that gives exactly the set of Figure 3.1.
As we shall see in detail in the next section, this idea will allow us to exactly
represent the set $P_{1,2d}$ of univariate nonnegative polynomials as the projection of
a nice spectrahedral set. Furthermore, the same techniques will make it possible
to obtain good approximations for the set $P_{n,2d}$ of multivariate nonnegative polynomials. The techniques will be based on the connection between sums of squares
polynomials and semidefinite programming.
x 0,
y 0,
z 0}.
1. Is it convex?
2. Is it a spectrahedron?
3. Is it a projected spectrahedron?
Hint: If you need help with item 2, try the real zero condition in Chapter 6.
Exercise 3.23. Prove the validity of the set containment relationships described
in Figure 3.4, and give counterexamples for all noninclusions.
3.1.2
Sums of Squares
\[ p(x) = \sum_{k=1}^{m} q_k^2(x). \tag{3.5} \]
p(x1 , x2 ) =
It quickly follows from its definition that the set $\Sigma_{n,2d}$ of sos polynomials is
invariant under nonnegative scalings and convex combinations; i.e., it is a convex
cone. In fact, more is true, as follows.
Theorem 3.26. The set of sos polynomials $\Sigma_{n,2d}$ is a proper cone (i.e., closed,
convex, pointed, and solid) in $\mathbb{R}[x]_{n,2d} \cong \mathbb{R}^N$.
One of the central questions in convex algebraic geometry is to understand the
relationships between the two cones $P_{n,2d}$ and $\Sigma_{n,2d}$. In the remainder of this chapter, as well as in Chapter 4, we analyze this problem from the algebraic, geometric,
and computational viewpoints.
Exercise 3.27. Consider the sum of squares representation (3.5). Show that if
$p(x)$ has degree $2d$, then the polynomials $q_i$ necessarily have degree less than or
equal to $d$, by considering the coefficients corresponding to the highest order terms.
Exercise 3.28. Using finitely many squares in Definition 3.24 may seem restrictive
at first. Show using Carathéodory's theorem (Theorem A.10 in Appendix A) that
in Definition 3.24 we can always take $m \le \binom{n+d}{d}$.
(3.6)
where $r_j$ and $(z_k, z_k^*)$ are the real and complex roots of $p(x)$, $p_{2d} > 0$, and the
multiplicities $n_j$ of the real roots are even.
2. Show that if $z$ is a complex number, the quadratic polynomial $(x - z)(x - z^*)$
is a sum of two squares.
3. Use Exercise 3.29 to conclude that $p(x)$ is itself a sum of two squares.
Exercise 3.31. Using the previous exercise, compute a decomposition of $p(x) = x^4 + 2x^3 + 6x^2 - 22x + 13$ as a sum of two squares.
Exercise 3.32. Let p(x1 , . . . , xn ) be a quadratic polynomial (i.e., 2d = 2). Show
that if p(x1 , . . . , xn ) is nonnegative, then it is a sum of squares.
Hint: Recall Example 3.11 and matrix factorizations.
3.1.3
Univariate Polynomials
In this section we explain in detail the computation of sos decompositions of univariate polynomials, with a full discussion of the multivariate case in the next section.
The main reason for starting with the univariate case is that it is notationally
simpler, and it is fairly similar to the general case.
Consider a univariate polynomial $p(x)$ of degree $2d$:
\[ p(x) = p_{2d} x^{2d} + p_{2d-1} x^{2d-1} + \cdots + p_1 x + p_0. \tag{3.7} \]
Assume that $p(x)$ is a sum of squares; i.e., it can be written as in (3.5). Notice that
the degree of the polynomials $q_k$ must be at most equal to $d$, since the coefficient of
the highest term of each $q_k^2$ is positive, and thus there cannot be any cancellation
in the highest power of $x$ (cf. Exercise 3.27). Then, we can write
\[
\begin{bmatrix} q_1(x) \\ q_2(x) \\ \vdots \\ q_m(x) \end{bmatrix}
= V \begin{bmatrix} 1 \\ x \\ \vdots \\ x^d \end{bmatrix},
\tag{3.8}
\]
where $V \in \mathbb{R}^{m \times (d+1)}$, and its $k$th row contains the coefficients of the polynomial
$q_k$. For future reference, let $[x]_d$ be the vector of monomials on the right-hand side
of (3.8), and define the matrix $Q := V^T V$. We then have
\[
p(x) = \sum_{k=1}^{m} q_k^2(x) = (V [x]_d)^T (V [x]_d) = [x]_d^T V^T V [x]_d = [x]_d^T Q [x]_d.
\]
\[ Q \succeq 0. \tag{3.9} \]
The matrix $Q$ is usually called the Gram matrix of the sos representation. One
direction of the lemma follows directly from noticing that the matrix $Q = V^T V$ constructed above is positive semidefinite. For the other direction, assume there exists a
positive semidefinite matrix $Q$ for which (3.9) holds. Then, by factorizing $Q = V^T V$
(e.g., via Cholesky or square root factorization), we obtain an sos decomposition
of $p(x)$.
Although perhaps not immediately obvious at first, the condition in (3.9) is a
semidefinite program! Indeed, notice that the constraint $p(x) = [x]_d^T Q [x]_d$ is affine
in the matrix $Q$, and thus the set of possible Gram matrices $Q$ is given exactly by
the intersection of an affine subspace and the cone of positive semidefinite matrices.
To obtain explicit equations for this semidefinite program, we index the rows
and columns of $Q$ by $\{0, \dots, d\}$ as
\[
[x]_d^T Q [x]_d = \sum_{i=0}^{d} \sum_{j=0}^{d} Q_{ij} x^{i+j}
= \sum_{k=0}^{2d} \Bigl( \sum_{i+j=k} Q_{ij} \Bigr) x^k.
\]
Thus, for this expression to be equal to $p(x)$, it must be the case that
\[ p_k = \sum_{i+j=k} Q_{ij}, \qquad k = 0, \dots, 2d. \tag{3.10} \]
This is a system of $2d + 1$ linear equations between the entries of $Q$ and the coefficients of $p(x)$. Thus, since $Q$ is simultaneously constrained to be positive semidefinite, and to belong to the affine subspace defined by these equations, an sos condition
is exactly equivalent to a semidefinite programming problem. We have shown, then,
the following.
Lemma 3.34. A univariate polynomial $p(x) = \sum_{k=0}^{2d} p_k x^k$ is a sum of squares if
and only if there exists a positive semidefinite matrix $Q \in \mathcal{S}^{d+1}$ satisfying (3.10).
This is a semidefinite programming problem.
Recall that in the univariate case, nonnegativity and sum of squares are equivalent conditions. Thus, Lemma 3.34 completely characterizes the set of univariate
nonnegative polynomials and shows that the set $P_{1,2d} = \Sigma_{1,2d}$ is a projected spectrahedron.
Example 3.35. Consider the univariate polynomial
\[ p(x) = x^4 + 4x^3 + 6x^2 + 4x + 5, \]
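\[
p(x) = \begin{bmatrix} 1 \\ x \\ x^2 \end{bmatrix}^T
\begin{bmatrix} q_{00} & q_{01} & q_{02} \\ q_{01} & q_{11} & q_{12} \\ q_{02} & q_{12} & q_{22} \end{bmatrix}
\begin{bmatrix} 1 \\ x \\ x^2 \end{bmatrix}
\]
\[
Q = \begin{bmatrix} 5 & 2 & 0 \\ 2 & 6 & 2 \\ 0 & 2 & 1 \end{bmatrix} = V^T V,
\qquad
V = \begin{bmatrix} 0 & 2 & 1 \\ \sqrt{2} & \sqrt{2} & 0 \\ \sqrt{3} & 0 & 0 \end{bmatrix},
\]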
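The coefficient-matching equations (3.10) and the factorization in this example can be checked mechanically. The sketch below is an illustration with hypothetical helper names: it uses the Gram matrix $Q$ with rows $(5,2,0)$, $(2,6,2)$, $(0,2,1)$, and, to stay in integer arithmetic, pulls the square roots out of the rows of $V$ as weights, so that the candidate decomposition reads $(x^2 + 2x)^2 + 2(x + 1)^2 + 3 \cdot 1^2$.

```python
def gram_to_coefficients(Q):
    """Coefficients p_0..p_{2d} of [x]_d^T Q [x]_d, using p_k = sum_{i+j=k} Q_ij."""
    n = len(Q)
    coeffs = [0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            coeffs[i + j] += Q[i][j]
    return coeffs


def weighted_squares_to_gram(weights, vectors):
    """Gram matrix sum_k w_k u_k u_k^T of the weighted sum of squares
    sum_k w_k (u_k^T [x]_d)^2; nonnegative weights make it positive semidefinite."""
    n = len(vectors[0])
    return [[sum(w * u[i] * u[j] for w, u in zip(weights, vectors))
             for j in range(n)] for i in range(n)]


# Gram matrix from the example; basis [1, x, x^2].
Q = [[5, 2, 0],
     [2, 6, 2],
     [0, 2, 1]]

# Target polynomial 5 + 4x + 6x^2 + 4x^3 + x^4, lowest degree first.
p = [5, 4, 6, 4, 1]

# Rows of V with square roots factored out as weights (a choice made here):
# 1 * (2x + x^2)^2 + 2 * (1 + x)^2 + 3 * (1)^2
weights = [1, 2, 3]
vectors = [[0, 2, 1], [1, 1, 0], [1, 0, 0]]

coefficients_from_Q = gram_to_coefficients(Q)
Q_rebuilt = weighted_squares_to_gram(weights, vectors)
```

Writing $Q$ as a nonnegative combination of rank-one terms $u_k u_k^T$ certifies positive semidefiniteness directly, without any eigenvalue computation.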
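\[
C_p := \begin{bmatrix}
0 & 0 & \cdots & 0 & -p_0 \\
1 & 0 & \cdots & 0 & -p_1 \\
0 & 1 & \cdots & 0 & -p_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & -p_{2d-1}
\end{bmatrix}
\]
2: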
U11
U21
U12
U22
11
0
12
22
U11
U21
U12
U22
,
3.1.4
Multivariate Polynomials
The general multivariate case is quite similar to the univariate case discussed in
the previous section. The main differences are the need for multi-index notation for
monomials, and the fact that sos will only be a sufficient condition for nonnegativity.
Consider a polynomial $p(x_1, \dots, x_n)$ of degree $2d$ in $n$ variables. The number
of coefficients of $p$ is equal to $\binom{n+2d}{2d}$. We let $p(x) = \sum_\alpha p_\alpha x^\alpha$, where $\alpha$ are tuples
of exponents $\{(\alpha_1, \dots, \alpha_n) : \alpha_1 + \cdots + \alpha_n \le 2d,\ \alpha_i \ge 0,\ i = 1, \dots, n\}$.
Let $[x]_d := [1, x_1, \dots, x_n, x_1^2, x_1 x_2, \dots, x_n^d]^T$ be the vector of all $\binom{n+d}{d}$ monomials
in $x_1, \dots, x_n$ of degree less than or equal to $d$, and consider the equation
\[ p(x) = [x]_d^T Q [x]_d, \tag{3.11} \]
where $Q$ is an $\binom{n+d}{d} \times \binom{n+d}{d}$ symmetric matrix. Proceeding exactly as in the previous
section, and indexing the matrix $Q$ by the $\binom{n+d}{d}$ monomials in $n$ variables of degree
at most $d$ (or, more precisely, the associated exponent tuples), we obtain the following
conditions:
\[ p_\alpha = \sum_{\beta + \gamma = \alpha} Q_{\beta\gamma}, \qquad Q \succeq 0. \tag{3.12} \]
$n+2d%
q00,00
1
x q00,10
y q00,01
p(x, y) = 2
x q00,20
xy q00,11
q00,02
y2
x y :
2
y :
2 = q20,20 ,
1 = q00,22 + 2 q01,21 + q11,11 ,
0 = 2 q00,02 + q01,01 .
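\[
Q = \frac{1}{3}
\begin{bmatrix}
6 & 3 & 0 & -2 & 0 & -2 \\
3 & 4 & 0 & 0 & 0 & 0 \\
0 & 0 & 4 & 0 & 0 & 0 \\
-2 & 0 & 0 & 6 & 3 & -4 \\
0 & 0 & 0 & 3 & 5 & 0 \\
-2 & 0 & 0 & -4 & 0 & 15
\end{bmatrix}.
\]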
Any factorization of this positive semidefinite matrix will give an explicit sos decomposition of $p(x, y)$, for instance,
\[
\begin{aligned}
p(x, y) ={}& \frac{4}{3} y^2 + \frac{1349}{705} y^4 + \frac{1}{12} (4x + 3)^2 + \frac{1}{15} (3x^2 + 5xy)^2 \\
&+ \frac{1}{315} (-21x^2 + 20y^2 + 10)^2 + \frac{1}{59220} (328y^2 - 235)^2.
\end{aligned}
\]
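A decomposition with fractions like these is easy to mistype, so an exact check is worthwhile. The following sketch (hypothetical helper names, polynomials stored as dictionaries from exponent pairs to coefficients) expands the six weighted squares with signs as read here and, separately, the quadratic form of the Gram matrix in the monomial basis $[1, x, y, x^2, xy, y^2]$. The target polynomial $2x^4 + 2x^3y - x^2y^2 + 5y^4 + 2x + 2$ is the one obtained from that expansion; treat it as an inference from the displayed data rather than a quotation.

```python
from collections import defaultdict
from fractions import Fraction as F


def multiply(p, q):
    """Product of two bivariate polynomials {(deg_x, deg_y): coefficient}."""
    out = defaultdict(F)
    for (a1, b1), c1 in p.items():
        for (a2, b2), c2 in q.items():
            out[(a1 + a2, b1 + b2)] += c1 * c2
    return out


def weighted_sum_of_squares(terms):
    """Expand sum_k w_k * q_k^2 for a list of (w_k, q_k) pairs."""
    total = defaultdict(F)
    for weight, q in terms:
        for mono, c in multiply(q, q).items():
            total[mono] += weight * c
    return {m: c for m, c in total.items() if c != 0}


def gram_to_polynomial(Q, monomials):
    """Expand [x]^T Q [x], i.e. p_alpha = sum_{beta+gamma=alpha} Q_{beta,gamma}."""
    total = defaultdict(F)
    for i, (a1, b1) in enumerate(monomials):
        for j, (a2, b2) in enumerate(monomials):
            total[(a1 + a2, b1 + b2)] += Q[i][j]
    return {m: c for m, c in total.items() if c != 0}


terms = [
    (F(4, 3), {(0, 1): F(1)}),                                  # (4/3) y^2
    (F(1349, 705), {(0, 2): F(1)}),                             # (1349/705) y^4
    (F(1, 12), {(1, 0): F(4), (0, 0): F(3)}),                   # (4x + 3)^2 / 12
    (F(1, 15), {(2, 0): F(3), (1, 1): F(5)}),                   # (3x^2 + 5xy)^2 / 15
    (F(1, 315), {(2, 0): F(-21), (0, 2): F(20), (0, 0): F(10)}),
    (F(1, 59220), {(0, 2): F(328), (0, 0): F(-235)}),
]

monomials = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
three_Q = [[6, 3, 0, -2, 0, -2],
           [3, 4, 0, 0, 0, 0],
           [0, 0, 4, 0, 0, 0],
           [-2, 0, 0, 6, 3, -4],
           [0, 0, 0, 3, 5, 0],
           [-2, 0, 0, -4, 0, 15]]
Q = [[F(v, 3) for v in row] for row in three_Q]

# 2x^4 + 2x^3 y - x^2 y^2 + 5y^4 + 2x + 2
target = {(4, 0): 2, (3, 1): 2, (2, 2): -1, (0, 4): 5, (1, 0): 2, (0, 0): 2}

from_squares = weighted_sum_of_squares(terms)
from_gram = gram_to_polynomial(Q, monomials)
```

Both expansions use rational arithmetic throughout, so agreement with `target` is an exact identity rather than a floating-point approximation.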
We summarize the contents of this section in the following theorem, describing the
direct relation between positive semidefinite matrices and an sos condition.
3.1.5
Computational Formulations
\[ \sum_{i,j=1}^{s} Q_{ij} \langle v_i v_j, w_k \rangle, \qquad k = 1, \dots, t. \]
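\[ B_s = \left\{ \binom{d}{\alpha}^{1/2} x^\alpha \right\}, \]
where $\binom{d}{\alpha}$ denotes the multinomial coefficient $\binom{d}{\alpha_1, \dots, \alpha_n} = \frac{d!}{\alpha_1!\, \alpha_2! \cdots \alpha_n!}$.
The rationale behind this choice is the following: consider the inner product
between polynomials given by
\[
\langle p(x), q(x) \rangle := \Bigl\langle \sum_\alpha p_\alpha x^\alpha, \ \sum_\alpha q_\alpha x^\alpha \Bigr\rangle
= \sum_\alpha \binom{d}{\alpha}^{-1} p_\alpha q_\alpha.
\]
This inner product is known under many different names, such as the apolar, Fischer,
Calderón, or Bombieri inner product. Its defining property is
the direct relationship between powers of linear forms and point evaluations.
Indeed, if $p$ is a homogeneous polynomial of degree $d$, we have
\[
\langle p(x), (v^T x)^d \rangle = \sum_\alpha \binom{d}{\alpha}^{-1} p_\alpha \binom{d}{\alpha} v^\alpha = p(v).
\]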
\[ \prod_{k \neq i} \frac{x - x_k}{x_i - x_k}, \qquad i = 0, \dots, d, \]
form a basis of R[x]1,d . Also of interest is that the corresponding dual basis of
the dual space R[x]1,d is then given by the point evaluations xi that satisfy
xi (p) = p(xi ).
This choice is particularly appealing in the case where the polynomial is presented in terms of its values at a given set of points, instead of an explicit
description in terms of coecients. This approach also has some convenient
numerical properties related to the use of interior-point methods in the solution of the corresponding semidenite programs; see [75] for more details.
Exercise 3.42. Consider a univariate cubic polynomial p(x) on the interval [a, b],
for which we want to describe the convex hull of its graph, i.e., the set
\[ S = \operatorname{conv}\{(t, p(t)) \in \mathbb{R}^2 : t \in [a, b]\}. \]
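\[ x_2 = a + \tfrac{1}{4}(b - a), \qquad x_3 = a + \tfrac{3}{4}(b - a), \qquad x_4 = b. \]
\[ \sum_{i=1}^{4} \lambda_i = 1, \qquad \sum_{i=1}^{4} \lambda_i x_i = x, \qquad \sum_{i=1}^{4} \lambda_i\, p(x_i) = y. \]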
The set S is then given by the projection of this spectrahedron onto the variables
(x, y). Notice that in this description, the explicit expression of the polynomial p(x)
is never used, but instead only the interpolation values p(xi ) appear.
1. Prove the validity of this description using an sos formulation based on Lagrange interpolation.
2. Generalize this representation to univariate polynomials of any degree.
3.1.6
We have seen in previous sections how to compute sos decompositions using semidefinite programming. These convex optimization problems are usually solved numerically, using floating-point arithmetic. Although floating-point techniques in principle allow for numerical approximations of arbitrary precision, the computed solutions will typically not be exact. This may mean, for instance, that the equation $p(x) = [x]_d^T Q [x]_d$ is only approximately satisfied, or that the matrix Q may have very small negative eigenvalues.
In many applications, particularly those arising from problems in pure mathematics, it is desirable or necessary to obtain exact solutions. Examples of this are
the use of sos methods for geometric theorem proving (e.g., Section 3.6.5), for establishing the validity of certain algebraic inequalities between matrices [68], or a case
of the monotone column permanent (MCP) conjecture [64]. A remarkable recent
application is the work in [10], where sos methods were used to prove new upper
bounds on kissing numbers, a well-known problem in sphere packings. A common
element in all these works is the use of exact algebraic identities obtained from
inspection of a numerically computed solution as the basic ingredients in a rigorous
proof.
In this section, we show that under a strict feasibility assumption, we can obtain a rational sos representation from an approximate solution to the semidefinite program of Theorem 3.39. The basic idea is to round and project the numerically obtained Gram matrix onto the feasible subspace. We quantify the relation between the numerical error in the subspace and semidefinite constraints, versus the rounding tolerance, that will guarantee that the rounded and projected solution remains feasible. For a full exposition of these ideas, as well as alternative approaches and improvements, we refer the reader to [98], [60], [65], and the references therein.
To obtain rational sos decompositions, it is enough to focus on rational Gram matrices. This follows from the $LDL^T$ decomposition; see Exercise 3.46.
Theorem 3.43. There exists a rational sos decomposition, i.e., $p(x) = \sum_i p_i(x)^2$, where $p_i(x) \in \mathbb{Q}[x]$, if and only if there is a Gram matrix with rational entries.
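To make the role of the $LDL^T$ decomposition concrete, here is a small sketch in exact arithmetic. It assumes a positive definite rational Gram matrix (so that no pivot vanishes and no pivoting is needed); the function name `ldl` and the toy polynomial are illustrative only. The factorization yields a weighted sum of squares with rational weights d_j and rational polynomials.

```python
from fractions import Fraction

def ldl(Q):
    """Exact LDL^T factorization of a symmetric positive definite rational matrix.

    Returns (L, d) with L unit lower triangular and d the diagonal of D.
    No pivoting: a zero pivot (Q not positive definite) raises ZeroDivisionError.
    """
    Q = [[Fraction(x) for x in row] for row in Q]
    n = len(Q)
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    d = []
    for j in range(n):
        d.append(Q[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j)))
        for i in range(j + 1, n):
            L[i][j] = (Q[i][j] - sum(L[i][k] * L[j][k] * d[k] for k in range(j))) / d[j]
    return L, d

# Gram matrix of p(x) = 2 + 2x + 2x^2 in the monomial basis [1, x].
Q = [[2, 1], [1, 2]]
L, d = ldl(Q)

# Column j of L gives the polynomial sum_i L[i][j] * basis[i]; here
# p(x) = 2 * (1 + x/2)^2 + (3/2) * x^2, with rational weights and coefficients.
print(d, L)
```

The weights d_j are positive rationals rather than ones; turning them into honest rational squares is the remaining step addressed by Exercise 3.46.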
The approach we will use to obtain rational sums of squares is to take advantage of the computational efficiency of interior-point solvers: we first compute an approximate numerical solution, and in a second step we round this numerical solution to an exact rational one. We have the following standing assumption.
Assumption. There exists a positive definite Gram matrix Q for p(x).
This assumption is equivalent to the polynomial p(x) being in the interior of the cone of sums of squares. The method described here could fail in general for sums of squares that are not strictly positive: if there is an $x^\star$ such that $p(x^\star) = 0$, it follows from the identity $p(x^\star) = [x^\star]_d^T Q [x^\star]_d$ that the monomial vector $[x^\star]_d$ is in the kernel of Q. Hence Q cannot be positive definite. Nevertheless, this assumption is reasonable for many problems of interest. Furthermore, very recent work of Scheiderer [108] shows that this assumption (or a similar one) is required by giving a construction of sos polynomials with rational coefficients for which no rational decompositions exist.
We assume the sos problem is posed as a semidefinite problem in primal form, as described in Section 3.1.4. After solving the SDP problem, in general the numerical solution Q will not exactly satisfy (3.11). For an exact representation of the original polynomial p(x), we have to find a rational approximation to Q which satisfies the equality constraints. The simplest procedure is to compute a rational approximation $\tilde{Q}$, either by naive rounding or by more sophisticated techniques like continued fractions. This rational approximation $\tilde{Q}$ is then projected onto the subspace defined by the equations. Since this subspace is defined by rational data (the coefficients of p(x)), an orthogonal projection onto this subspace will yield a rational matrix $\Pi(\tilde{Q})$; see Exercise 3.47.
Figure 3.6. Projection of a rounded solution. The matrix $Q$ is the numerical solution of the SDP problem, and the orthogonal projections of the matrices $Q$ and $\tilde{Q}$ onto the subspace $L$ are denoted by $\Pi(Q)$ and $\Pi(\tilde{Q})$, respectively. The shaded cone PSD represents the cone of positive semidefinite matrices.
Now we obtain conditions to ensure that the truncated and projected matrix $\Pi(\tilde{Q})$ remains positive semidefinite. For this, we will estimate the rounding tolerance needed. Assuming strict feasibility of the numerical solution Q returned by the SDP solver, we quantify how far it is from the boundary of the positive semidefinite cone and the affine subspace through two parameters $\epsilon$ and $\delta$. The parameter $\epsilon > 0$ will satisfy $Q \succeq \epsilon I$ and is a lower bound on the minimum eigenvalue of Q. The parameter $\delta$ quantifies the distance of Q to the subspace, and thus
\[ \|Q - \Pi(Q)\|_F \le \delta, \]
where $\|\cdot\|_F$ denotes the Frobenius norm. The matrix Q will be approximated by $\tilde{Q}$ such that the distance $\|Q - \tilde{Q}\|_F$ stays within the rounding tolerance.
As described in [96], these ideas have been implemented in the software package SOS.m2 for the computer algebra system Macaulay 2 [54]. This package can be
used to compute rational sos decompositions and is available for download at [97].
Similar concepts have been recently implemented by Harrison in the open source
theorem prover HOL Light [60].
In SOS.m2, the main function is getSOS, which tries to compute a rational sos
decomposition for a given polynomial. In the following example we demonstrate how
to use the getSOS command for computing an sos decomposition of a polynomial
of degree 4 with 4 variables.
Example 3.45. Consider the polynomial
\[ p(x, y, z, w) = 2x^4 + x^2 y^2 + y^4 - 4x^2 z - 4xyz - 2y^2 w + y^2 - 2yz + 8z^2 - 2zw + 2w^2 . \]
We first load the SOS package and define p(x, y, z, w):
i1 : loadPackage "SOS";
i2 : P = QQ[x,y,z,w];
i3 : p = 2*x^4 + x^2*y^2 + y^4 - 4*x^2*z - 4*x*y*z - 2*y^2*w +
y^2 - 2*y*z + 8*z^2 - 2*z*w + 2*w^2;
If successful, the function getSOS returns a weighted sos representation such that $p(x, y, z, w) = \sum_i d_i\, g_i(x, y, z, w)^2$. Otherwise an error message is displayed.
i4 : (g,d) = getSOS p
... omitted output ...
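o8 = ({-(1/4)*x^2 - (1/4)*x*y - (1/8)*y + z - (1/8)*w,
       -(2/15)*x^2 - (2/15)*x*y - (8/15)*y^2 - (1/15)*y + w,
       x^2 - (4/11)*x*y - (4/11)*y^2 - (2/11)*y,
       x*y - (18/59)*y^2 - (20/59)*y,
       -(81/205)*y^2 + y,
       y^2},
      {8, 15/8, 22/15, 59/55, 41/59, 66/1025})
This corresponds to the decomposition
\[
\begin{aligned}
p(x, y, z, w) = {} & 8\left(-\tfrac{1}{4}x^2 - \tfrac{1}{4}xy - \tfrac{1}{8}y + z - \tfrac{1}{8}w\right)^2
 + \tfrac{15}{8}\left(-\tfrac{2}{15}x^2 - \tfrac{2}{15}xy - \tfrac{8}{15}y^2 - \tfrac{1}{15}y + w\right)^2 \\
& + \tfrac{22}{15}\left(x^2 - \tfrac{4}{11}xy - \tfrac{4}{11}y^2 - \tfrac{2}{11}y\right)^2
 + \tfrac{59}{55}\left(xy - \tfrac{18}{59}y^2 - \tfrac{20}{59}y\right)^2 \\
& + \tfrac{41}{59}\left(-\tfrac{81}{205}y^2 + y\right)^2 + \tfrac{66}{1025}\,y^4 .
\end{aligned}
\]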
Correctness of the obtained decomposition may be veried with the function sumSOS,
which expands a weighted sum of squares decomposition:
i5 : sumSOS (g,d) - p
o5 = 0
Exercise 3.46. Prove Theorem 3.43. Use the $LDL^T$ decomposition (see Appendix A, Section A.1.2).
Exercise 3.47. Consider the affine subspace in $\mathbb{R}^n$ defined by the equations $Ax = b$, and a point $x_0 \in \mathbb{R}^n$. Show that the orthogonal projection of $x_0$ onto the subspace is given by
\[ \Pi(x_0) = A^{+} b + (I - A^{+} A)\, x_0, \]
where $A^{+}$ is the Moore–Penrose pseudoinverse of A. If the rows of A are linearly independent, we have $A^{+} = A^T (A A^T)^{-1}$, and thus this formula can be written as
\[ \Pi(x_0) = x_0 - A^T (A A^T)^{-1} (A x_0 - b). \]
Show that if the matrices A and b are rational, and $x_0$ is a rational point, then so is $\Pi(x_0)$. Prove these facts, and show how to use them to convert an approximate Gram matrix into a rational Gram matrix.
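The round-and-project step can be sketched in a few lines of exact arithmetic. This is only an illustration under stated assumptions: A is taken to have linearly independent rows, the Gram matrix is thought of as flattened into the vector x0, and the names `solve`, `project`, as well as the toy data A, b, x_num, are made up for the example. Rounding uses `Fraction.limit_denominator` from the standard library, which is one of several possible rounding schemes.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve M y = rhs exactly by Gauss-Jordan elimination (M square, nonsingular)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        piv = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[piv] = aug[piv], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n] for row in aug]

def project(A, b, x0):
    """Orthogonal projection x0 - A^T (A A^T)^{-1} (A x0 - b) onto {x : A x = b}."""
    m, n = len(A), len(x0)
    residual = [sum(A[i][j] * x0[j] for j in range(n)) - b[i] for i in range(m)]
    gram = [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(m)] for i in range(m)]
    y = solve(gram, residual)
    return [x0[j] - sum(A[i][j] * y[i] for i in range(m)) for j in range(n)]

# Hypothetical data: two rational equality constraints on three unknowns.
A = [[1, 2, 0], [0, 1, 1]]
b = [1, 1]
x_num = [0.25, 0.3332, 0.6]                                        # floating-point "solver output"
x_round = [Fraction(v).limit_denominator(100) for v in x_num]     # rational rounding
x_proj = project(A, b, x_round)                                   # exact, satisfies A x = b
print(x_round, x_proj)
```

The projected point satisfies the equations exactly; whether it is still positive semidefinite is a separate question, which is what the tolerance analysis above is about.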
Exercise 3.48. Prove Theorem 3.44.
3.1.7
We have described in previous sections how to check whether a given, fixed multivariate polynomial is a sum of squares. These results can be nicely generalized to define a natural class of convex optimization problems which we will call sum of squares (sos) programs.
Recall that the main objects of interest in semidefinite programming are
quadratic forms that are positive semidefinite.
When attempting to generalize this to homogeneous polynomials of higher degree, a difficulty appears: deciding nonnegativity for quartic or higher degree forms is NP-hard. Therefore, a computationally tractable replacement is the following:
even degree polynomials that are sums of squares.
Sum of squares programs can then be defined as conic optimization problems, where the feasible set is given by the intersection of an affine family of polynomials and the proper cone $\Sigma_{n,2d}$ of sos polynomials. As in the case of pure semidefinite programming, there are several possible equivalent descriptions. We choose below a free variables formulation to highlight the analogy with the standard SDP dual form (SDP-D) discussed in Chapter 2.
Definition 3.49. An sos optimization problem or sos program is a convex optimization problem of the form
\[
\begin{array}{ll}
\text{maximize}_{y} & b_1 y_1 + \cdots + b_m y_m \\
\text{subject to} & p_i(x; y) \ \text{are sos in } \mathbb{R}[x], \quad i = 1, \dots, k,
\end{array}
\tag{3.13}
\]
where $p_i(x; y) := c_i(x) + a_{i1}(x) y_1 + \cdots + a_{im}(x) y_m$, and the $c_i$, $a_{ij}$ are given multivariate polynomials in $\mathbb{R}[x]$.
Notice that the $p_i(x; y)$ are arbitrary polynomial expressions that are affine in
the parameters $y_1, \dots, y_m$ (the decision variables). Also, note that the variables x
are dummy variables, in the sense that we are not optimizing over them, but they
are the indeterminates of the underlying polynomials. Sum of squares programs are
very useful, since they directly operate with polynomials as their basic data type,
thus providing a quite natural modelling formulation for many problems. We will
discuss several examples later in this chapter, including Lyapunov functions for
nonlinear systems [89, 87], probability inequalities [16], and convex relaxations for
nonconvex optimization [89, 72].
Example 3.50. Consider the following simple sos program:
\[
\begin{array}{ll}
\text{maximize}_{y} & y_1 + y_2 \\
\text{subject to} & x^4 + y_1 x + (2 + y_2) \ \text{is sos}, \\
 & (y_1 - y_2 + 1)\, x^2 + y_2 x + 1 \ \text{is sos}.
\end{array}
\]
The constraints involve two univariate polynomials (in x), whose coefficients are affine functions of the parameters (or decision variables) $(y_1, y_2)$. Notice that the feasible set (i.e., the set of $y_1, y_2$ for which both polynomials are sos) is necessarily convex, since it is defined by the intersection of an affine subspace and the sos cone.
Interestingly enough, despite their apparently greater generality, sos programs are in fact equivalent to SDPs. To see this, notice that, on the one hand, by choosing the polynomials $c_i(x)$, $a_{ij}(x)$ to be quadratic forms, we recover the standard SDP formulation. On the other hand, it is possible to exactly embed every sos program into a larger semidefinite program. Indeed, the constraints requiring $p_i(x; y)$ to be sos in $\mathbb{R}[x]$ are equivalent to the existence of matrices $Q_i \succeq 0$ satisfying
\[ p_i(x; y) = [x]_d^T\, Q_i\, [x]_d, \qquad i = 1, \dots, k. \]
where the matrices Q, R are positive semidefinite. Expanding and equating the left- and right-hand sides, we obtain affine equations between the decision variables $y_1, y_2$ and the entries of the matrices Q, R. For instance, for the first constraint we obtain
\[
\begin{array}{rl}
x^4: & 1 = q_{22}, \\
x^3: & 0 = 2 q_{12}, \\
x^2: & 0 = q_{11} + 2 q_{02}, \\
x: & y_1 = 2 q_{01}, \\
1: & 2 + y_2 = q_{00},
\end{array}
\]
and for the second
\[
\begin{array}{rl}
x^2: & y_1 - y_2 + 1 = r_{11}, \\
x: & y_2 = 2 r_{01}, \\
1: & 1 = r_{00}.
\end{array}
\]
Putting together these linear equations with the conditions $Q \succeq 0$ and $R \succeq 0$ yields a standard semidefinite program.
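The resulting constraints are small enough to test by hand or with a few lines of code. The sketch below is an illustration only: it reads the second constraint polynomial as (y1 - y2 + 1) x^2 + y2 x + 1, uses the monomial bases [1, x, x^2] and [1, x], and treats t = q02 as the single free Gram entry. The helper names are hypothetical, and the positive semidefiniteness test by principal minors is chosen for transparency on tiny matrices, not for efficiency.

```python
from itertools import combinations

def det(M):
    """Determinant by cofactor expansion (adequate for 1x1 to 3x3 matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def is_psd(M, tol=1e-12):
    """A symmetric matrix is PSD iff all of its principal minors are nonnegative."""
    n = len(M)
    return all(det([[M[i][j] for j in idx] for i in idx]) >= -tol
               for k in range(1, n + 1) for idx in combinations(range(n), k))

def gram_matrices(y1, y2, t):
    """Q and R fixed by the coefficient equations; t = q02 is the free parameter."""
    Q = [[2 + y2, y1 / 2, t],
         [y1 / 2, -2 * t, 0],
         [t,      0,      1]]
    R = [[1,      y2 / 2],
         [y2 / 2, y1 - y2 + 1]]
    return Q, R

def certified_feasible(y1, y2, t):
    """True if this particular (y1, y2, t) gives PSD Gram matrices for both constraints."""
    Q, R = gram_matrices(y1, y2, t)
    return is_psd(Q) and is_psd(R)

print(certified_feasible(0, 0, 0))    # x^4 + 2 and x^2 + 1
print(certified_feasible(2, 1, -1))   # a feasible point with objective y1 + y2 = 3
print(certified_feasible(0, 3, 0))    # second polynomial has negative leading coefficient
```

A `False` answer only says that this choice of t fails; an SDP solver searches over (y1, y2, t) simultaneously, which is exactly what the conversion described above sets up.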
As we see, the conversion process from an sos program to a standard semidefinite program is fully algorithmic (and somewhat messy and cumbersome if done
by hand!). For these reasons, it has been implemented in several parsers/solvers
such as SOSTOOLS [101], YALMIP [74], and SPOT [78]. Furthermore, it is quite
useful from both theoretical and practical viewpoints to abstract out the fact
that (under the hood) sos programs are solved via semidefinite programming and
instead just think of them as a tractable class of convex optimization problems that
we can freely use for modeling and implementation. In fact, from the next chapter
on, we will rarely mention semidefinite programming, and all our formulations will
be given directly in terms of sos programs.
Although sos programs and semidefinite programming are equivalent in the
sense described earlier, the rich algebraic structure of sos programs makes possible
a much deeper understanding of their special properties. This also enables customized, more efficient algorithms for their numerical solution [50, 75, 107]. As
illustrated in later sections, there are numerous questions in a number of application domains, as well as foundational issues in nonconvex optimization that have
simple and natural formulations as sos programs.
Exercise 3.52. Plot the feasible set of the sos program of Example 3.50. Find the
corresponding optimal solution (y1 , y2 ) as well as explicit sos decompositions of the
constraint polynomials at optimality.
Exercise 3.53. Show that sos programs can be written as conic optimization
problems in terms of the cone $\Sigma_{n,2d}$ of sos polynomials. Write the corresponding
dual conic program.
3.2
In this section we elaborate on several natural extensions of the basic sos methods
discussed so far. In combination with the more advanced techniques presented later,
these will serve as building blocks for more complex, domain-specific applications
developed in Section 3.6.
3.2.1
\[ p(x) \ge \gamma \quad \forall x \in \mathbb{R} \qquad \Longleftrightarrow \qquad p(x) - \gamma \ge 0 \quad \forall x \in \mathbb{R}. \]
Notice that the polynomial $p(x) - \gamma$ has coefficients that depend affinely on $\gamma$. This suggests considering the optimization problem
\[
\begin{array}{ll}
\text{maximize} & \gamma \\
\text{subject to} & p(x) - \gamma \ \text{is nonnegative}.
\end{array}
\tag{OPT-NN}
\]
Clearly, this is a convex problem, since the feasible set is defined by an infinite number of linear inequalities (one for each value of x). Its optimal value $p^\star$ is equal to the global minimum of the polynomial, $p(x^\star)$.
Consider now instead the following optimization problem, where the nonnegativity condition has been replaced by an sos constraint:
\[
\begin{array}{ll}
\text{maximize} & \gamma \\
\text{subject to} & p(x) - \gamma \ \text{is sos}.
\end{array}
\tag{OPT-SOS}
\]
The key distinction between the problems (OPT-NN) and (OPT-SOS) is the replacement of nonnegativity by an sos condition. However, since in the univariate case nonnegativity is equivalent to sum of squares, these two optimization problems are, in fact, equivalent. Furthermore, (OPT-SOS) has exactly the form of an sos program, and it is thus equivalent to a standard semidefinite program; see Exercise 3.54 for its explicit formulation.
As a consequence, we can obtain the value of the global minimum of a univariate polynomial by solving an sos program. Notice also that at optimality we have $0 = p(x^\star) - p^\star = \sum_{k=1}^{m} q_k^2(x^\star)$ and thus all the $q_k$ simultaneously vanish at $x^\star$, which in principle gives a way of computing the minimizer $x^\star$. As we shall see later, a better alternative is to obtain the solution $x^\star$ directly from the dual SDP problem by using complementary slackness.
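As a small worked illustration (the quadratic below is our own choice, and $\gamma$ denotes the lower bound being maximized), take $p(x) = x^2 - 2x + 3$. In the basis $[1, x]$ the Gram matrix of $p(x) - \gamma$ is unique, so the sos program can be solved by inspection:

```latex
\[
p(x) - \gamma =
\begin{bmatrix} 1 \\ x \end{bmatrix}^{T}
\begin{bmatrix} 3 - \gamma & -1 \\ -1 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ x \end{bmatrix},
\qquad
\begin{bmatrix} 3 - \gamma & -1 \\ -1 & 1 \end{bmatrix} \succeq 0
\iff (3 - \gamma) - 1 \ge 0
\iff \gamma \le 2 .
\]
\[
\text{At the optimum } \gamma = 2: \qquad p(x) - 2 = (x - 1)^2 ,
\]
```

so the optimal value is 2, and the single square vanishes exactly at the minimizer x = 1, in line with the remark above.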
Even though p(x) may be highly nonconvex, the proposed convex formulation nevertheless effectively computes its global minimum. This will extend, with suitable modifications, to the general multivariate case.
Exercise 3.54. Let $p(x) = \sum_{k=0}^{2d} c_k x^k$. Give an explicit SDP formulation to compute the value of the global minimum of p(x). Apply your formulation to the polynomial $p(x) = x^4 - 20x^2 + x$.
3.2.2
Rational Functions
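\[ \frac{p(x)}{q(x)} \ge \gamma \qquad \Longleftrightarrow \qquad p(x) - \gamma\, q(x) \ge 0, \]
it follows that one can find the global minimum of the rational function by solving
\[
\begin{array}{ll}
\text{maximize} & \gamma \\
\text{subject to} & p(x) - \gamma\, q(x) \ \text{is sos}.
\end{array}
\]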
3.2.3
Multivariate Optimization
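\[
\begin{array}{ll}
\text{maximize} & \gamma \\
\text{subject to} & p(x_1, \dots, x_n) - \gamma \ \text{is nonnegative}.
\end{array}
\tag{MOPT-NN}
\]
Despite being convex (why?), this formulation is in general intractable, since the constraint set involves the set of nonnegative polynomials. As in the univariate case, this suggests considering its sos alternative:
\[
\begin{array}{ll}
\text{maximize} & \gamma \\
\text{subject to} & p(x_1, \dots, x_n) - \gamma \ \text{is sos}.
\end{array}
\tag{MOPT-SOS}
\]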
Let $p^\star$ be the optimal value of (MOPT-NN) (i.e., the global minimum$^2$ of the polynomial $p(x_1, \dots, x_n)$) and $p_{sos}$ be the optimal value of (MOPT-SOS). It should be clear that one can compute $p_{sos}$ efficiently by solving the corresponding sos program (e.g., using an SDP solver).
$^2$ Unlike in the univariate case, a multivariate polynomial that is bounded below need not achieve its global minimum (as an example, consider the polynomial $x^2 + (1 - xy)^2$). Therefore, to make things fully rigorous one should consider here the supremum rather than the maximum.
Recall that for the general multivariate case, nonnegativity and sum of squares are no longer equivalent. Thus, since the feasible set of the second problem is a (possibly strict) subset of the feasible set of the first problem, we have the inequality
\[ p_{sos} \le p^\star, \]
and thus the sos technique is (in principle) only guaranteed to produce a lower bound on the value of the global minimum of p. Notice that, on computational complexity grounds, this is to be expected, since multivariate polynomial optimization is NP-hard, while semidefinite programming is polynomial-time (to any given accuracy).
Interestingly, there is strong experimental evidence that shows that, at least for relatively small problems, we very often have $p^\star = p_{sos}$; see, e.g., [94]. The reasons for this phenomenon are not yet completely understood, except in particular cases. As explained in Chapter 4, perhaps the opposite trend should be expected for large enough dimension. Nevertheless, as we shall see shortly in Section 3.2.6, even in those situations where $p_{sos} < p^\star$, we will be able to produce stronger sos conditions that will improve upon the plain sos lower bound $p_{sos}$.
Exercise 3.57. Find the value of $p_{sos}$ for the trivariate polynomial
\[ p(x, y, z) = x^4 + y^4 + z^4 - 4xyz + 2x + 3y + 4z. \]
Is the computed value of $p_{sos}$ equal to the global minimum $p^\star$?
Exercise 3.58. Find a bivariate polynomial p(x, y) for which $p_{sos} < p^\star$.
Exercise 3.59. Assume that p(x) is bounded below. Is $p_{sos}$ necessarily finite? Prove or disprove with a counterexample.
3.2.4
Equations. For simplicity, let us assume first that the set S is described by a set of polynomial equations, i.e., that it is a real algebraic variety of the form
\[ S = \{x \in \mathbb{R}^n : f_1(x) = 0, \dots, f_m(x) = 0\}. \]
Recalling the formal similarity with weak duality and Lagrange multipliers, it is natural to write a condition of the following type:
\[ p(x) + \sum_{i=1}^{m} \lambda_i(x) f_i(x) \ \text{is sos}, \tag{3.14} \]
where $\lambda_i(x)$ are arbitrary polynomials. Notice that this condition does what we want, since it obviously implies that p(x) is nonnegative on the set S. Indeed, if (3.14) holds, by evaluating this expression at any point $x_0 \in S$, we immediately conclude that $p(x_0) \ge 0$. Notice also that the expression (3.14) is affine in the unknown polynomials $\lambda_i(x)$, and once the set of allowable multipliers $\lambda_i(x)$ has been fixed (e.g., by restricting their degrees), this condition has the form of an sos program.
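A minimal worked instance of a certificate of type (3.14), with a polynomial and a variety chosen by us purely for illustration: take $p(x, y) = 1 - y$ on the unit circle, defined by $f_1(x, y) = x^2 + y^2 - 1$, and use the constant multiplier $1/2$ (written $\lambda_1$ below, a symbol we assume for the multiplier).

```latex
\[
p(x, y) + \lambda_1(x, y)\, f_1(x, y)
  = (1 - y) + \tfrac{1}{2}\,(x^2 + y^2 - 1)
  = \tfrac{1}{2}\,x^2 + \tfrac{1}{2}\,(y - 1)^2 ,
\]
```

which is a sum of squares. Evaluating at any point of the circle, where $f_1$ vanishes, gives $1 - y \ge 0$ there. Note that $1 - y$ is not globally nonnegative, so the multiplier term is essential.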
In more algebraic terms, condition (3.14) considers the polynomial ideal I generated by the constraints $f_i(x)$. If p(x) is congruent with a sum of squares modulo the ideal I, then this obviously certifies nonnegativity of p(x) on S. We elaborate more on this algebraic viewpoint in Section 3.3.5 and Chapter 7.
Inequalities. If the set S is described using polynomial inequalities (as opposed to equations), we can do something very similar. Assume the set S has a description:
\[ S = \{x \in \mathbb{R}^n : g_1(x) \ge 0, \dots, g_m(x) \ge 0\}. \]
Similar to the previous subsection, and again inspired by weak duality, one can now consider expressions of the type
\[ p(x) = s_0(x) + \sum_{i=1}^{m} s_i(x)\, g_i(x), \tag{3.15} \]
where $s_0(x)$ and $s_i(x)$ are sos polynomials. Indeed, this serves as a self-evident certificate of nonnegativity of p(x) on the set S, since evaluating such a representation at any point $x_0 \in S$ will directly prove $p(x_0) \ge 0$. In addition, notice that we can consider more powerful expressions by allowing finite products of constraints of the form
\[ p(x) = s_0(x) + \sum_{i=1}^{m} s_i(x)\, g_i(x) + \sum_{i \neq j}^{m} s_{ij}(x)\, g_i(x)\, g_j(x) + \cdots, \tag{3.16} \]
where as before the polynomials $s_0(x), s_i(x), s_{ij}(x), \dots$ are sums of squares. Again, once the structure of these polynomials has been fixed (e.g., by restricting their degrees), the conditions boil down to sos programs. Any representation of the type (3.16) serves as an obvious certificate of nonnegativity of p(x) on S.
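To see the two kinds of certificates side by side, consider a toy case of our own choosing: $p(x) = 1 - x^2$ on the interval described by $g_1(x) = 1 + x \ge 0$ and $g_2(x) = 1 - x \ge 0$.

```latex
\[
\text{Form (3.16), product of constraints:} \qquad
1 - x^2 = \underbrace{1}_{s_{12}} \cdot\, g_1(x)\, g_2(x) .
\]
\[
\text{Form (3.15), no products:} \qquad
1 - x^2 = \underbrace{\tfrac{1}{2}(1 - x)^2}_{s_1(x)}\,(1 + x)
        + \underbrace{\tfrac{1}{2}(1 + x)^2}_{s_2(x)}\,(1 - x) .
\]
```

Both identities are easy to verify by expansion. The product certificate uses constant multipliers, while the product-free one needs multipliers of degree two, which illustrates the trade-off between allowing products and the degrees of the sos multipliers.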
Remark 3.60. In principle, one could perhaps think of using nonnegative polynomials instead of sums of squares for the $s_i(x)$ in the previous expressions, since evaluating them at candidate points $x_0$ would certainly show nonnegativity of p(x) on the set S. Notice, however, that in this case one would have to rely on a promise that the polynomials $s_i$ indeed have the stated property. The reason why sums of squares are of relevance is that their (unconstrained) positivity is certified by the sos decomposition itself, and thus they serve as a bona fide mathematical proof of nonnegativity of p(x) on S.
Under certain assumptions, converse results or representation theorems will ensure that whenever p(x) is nonnegative on a given set S, a certificate of a specified form must exist. We emphasize, however, that in most practical applications of sos techniques only the easy direction is actually used, in the sense that once an sos certificate has effectively been computed, it transparently proves the desired property (e.g., polynomial nonnegativity, etc.).
S-procedure. In the particular case when the $g_i(x)$ are quadratic forms, and the $s_i(x)$ are nonnegative scalars, the sufficient condition (3.15) is known as the S-procedure in the mathematical optimization and control literature. Under suitable assumptions, this condition is lossless; i.e., it exactly characterizes nonnegativity of a quadratic form on a quadratically constrained set.
Lemma 3.61 (S-lemma). Let p(x) and $g_1(x)$ be quadratic forms, and assume that the set S has an interior point (i.e., there exists an $x_0 \in \mathbb{R}^n$ such that $g_1(x_0) > 0$). In this case, if p(x) is nonnegative on S, it has a representation as in (3.16), i.e.,
\[ p(x) = s_0(x) + s_1\, g_1(x), \]
where $s_0(x)$ is a positive semidefinite quadratic form, and $s_1$ is a nonnegative constant.
For more about the S-procedure, the S-lemma, and their many applications,
see the books [21, 15] or the survey [99].
Exercise 3.62. Let $p(x) = x^4 - 3x^2 + 1$. Give an sos certificate of the nonnegativity of p(x) on the set $S = \{x \in \mathbb{R} : x^3 - 4x = 1\}$.
Exercise 3.63. Allowing products of constraints (as in (3.16) as opposed to (3.15)) sometimes makes possible the existence of much more concise nonnegativity certificates (or even makes possible their existence). Consider, for instance, the polynomial p(x, y) = xy, which is obviously nonnegative on the compact set $S = \{(x, y) \in \mathbb{R}^2 : x \ge 0,\ y \ge 0,\ x + y \le 1\}$.
1. Show that no nonnegativity certificate of the form (3.15) exists.
2. Give a nonnegativity certificate of the form (3.16).
Exercise 3.64. Assume that the set S is described using both equations and
inequalities; i.e., it has the form
\[ S = \{x \in \mathbb{R}^n : f_1(x) = 0, \dots, f_k(x) = 0,\ g_1(x) \ge 0, \dots, g_m(x) \ge 0\}. \]
What conditions would you propose to use to certify nonnegativity of a polynomial
p(x) on S?
3.2.5
subject to
fi (x) = 0, i = 1, . . . , m.
The true minimum value d of this problem yields the distance from x0 to the
variety V , and thus any valid lower bound on d will give a guaranteed neighborhood
of x0 that does not intersect the variety. Based on the same arguments as in the
previous section, it should be clear that one can compute lower bounds on d and
safe neighborhoods by considering sos problems of the form
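\[
\begin{array}{ll}
\text{maximize} & \gamma \\
\text{subject to} & \left( \|x - x_0\|^2 - \gamma \right) + \sum_{i=1}^{m} \lambda_i(x)\, f_i(x) \ \text{is sos}.
\end{array}
\tag{3.17}
\]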
0
1
2
3
4
2
1+b
0
a
1
a
2 b + a 1 ,
A=
3
0
b
2
whose characteristic polynomial is
\[
\det(tI - A) = \big[\, 27t^3 + (-45 - 9a)\,t^2 + (24 + 9a + 3ab - 3b^2)\,t
 + (-4 - 2a - b - 2ab + a^2 b + 3b^2) \,\big] / 27 .
\]
When the parameters (a, b) vanish, i.e., for (a, b) = (0, 0), the eigenvalues of A are (1/3, 2/3, 2/3), and thus the difference equation is clearly stable. We want to determine how large a perturbation in (a, b) can be (measured in the Euclidean norm) for the difference equation to remain stable.
To apply the methods described in this section, we can consider the algebraic variety defined by the Zariski closure of the boundary of the region of stability. Clearly, A is on the boundary of stability if and only if some eigenvalue $\lambda_i$ lies on the unit circle, i.e., satisfies $\lambda_i \lambda_i^{*} = 1$. We can easily characterize this condition algebraically. For instance, one can consider the polynomial
\[ f(a, b) := \det(A \otimes A - I), \]
since the eigenvalues of the Kronecker product $A \otimes A$ are the products $\lambda_i \lambda_j$, and because A is real its eigenvalues appear in complex conjugate pairs. For our example, after removing constants and multiplicities from the factors, this yields the polynomial
\[
\begin{aligned}
f(a, b) = {} & (2 - 2a - b + ab + a^2 b)\,(-100 - 20a - b - 5ab + a^2 b + 6b^2) \\
& \cdot (-245 + 133a - 14a^2 - 37b + 2ab + 27a^2 b + 5a^3 b + 31b^2 + 19ab^2 \\
& \quad + 2a^2 b^2 - 4a^3 b^2 + a^4 b^2 - 6b^3 - 12ab^3 + 6a^2 b^3 + 9b^4).
\end{aligned}
\tag{3.18}
\]
This polynomial defines the variety of interest, and it can be seen that it factors into three components. This factorization is structural and corresponds to the conditions of the matrix A having eigenvalues at 1, at $-1$, or on the remainder of the unit circle. (As an aside, a more efficient alternative is to directly compute a factorized representation using the bialternate matrix product instead of the Kronecker product, since this removes multiplicities associated with the pairs $\lambda_i \lambda_j$ and $\lambda_j \lambda_i$; see, e.g., [57].)
We can now compute, using (3.17), the size of a neighborhood of (a, b) = (0, 0) that is guaranteed not to intersect this variety. Notice that, for our example, since the variety is defined by a single polynomial that factors, it is possible (and more efficient) to consider each factor separately. In this case, for each of the three factors in (3.18), we obtain values
\[ \gamma_1 \approx 0.8875, \qquad \gamma_2 \approx 9.0696, \qquad \gamma_3 \approx 2.1974. \]
Of these three, $\gamma_1$ defines the smallest neighborhood, and thus it yields a region $a^2 + b^2 < 0.8875$ where the linear difference equation is certified to be stable. This neighborhood and the corresponding varieties are presented in Figure 3.7.
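The first of these values can be cross-checked without any sos machinery. The sketch below assumes that the first factor of (3.18) reads 2 - 2a - b + ab + a^2 b (signs as reconstructed here). Since that factor is linear in b, its zero set can be parametrized as b = (2a - 2)/(a^2 + a - 1), and the squared distance to the origin can be scanned along the curve. The parametrization, the scan range, and the function name are our own devices; a grid scan is a numerical sanity check, not a certificate.

```python
def min_squared_distance_first_factor(step=1e-4, lo=-3.0, hi=3.0):
    """Scan the curve 2 - 2a - b + a*b + a^2*b = 0 and return min of a^2 + b^2.

    The factor is linear in b, so b = (2a - 2) / (a^2 + a - 1) wherever the
    denominator is nonzero; where it vanishes the curve has no point, because
    the numerator 2a - 2 is nonzero there. Points with |a| > 3 are at squared
    distance above 9 and cannot be minimizers.
    """
    best = float("inf")
    steps = int(round((hi - lo) / step))
    for k in range(steps + 1):
        a = lo + k * step
        denom = a * a + a - 1.0
        if abs(denom) < 1e-9:
            continue
        b = (2.0 * a - 2.0) / denom
        best = min(best, a * a + b * b)
    return best

print(min_squared_distance_first_factor())
```

The scan returns a value that agrees with the reported bound of about 0.8875 to the printed precision, which suggests that for this factor the sos lower bound is essentially tight.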
Remark. In the robust control literature, there are several methods that can
partially exploit the determinantal structure of these kinds of problems. The notion
of structured singular value and associated convex bounds are particularly relevant;
see e.g. [18, 43, 125] and the references therein.
Remark 3.66. Notice that in the optimization problem (3.17) the unknown multipliers $\lambda_i(x)$ are otherwise unconstrained. We will see in Section 3.3.5 and Chapter 7 that it is possible to exploit this structure for more efficient computation by computing sums of squares on the quotient ring $\mathbb{R}[x]/I(V)$.
3.2.6
(3.19)
which clearly certifies that $M(x, y) \ge 0$, despite the fact that M(x, y) itself is not a
sum of squares.
We will discuss a far-reaching generalization of this basic idea in Section 3.4,
where we explain how to approximate any semialgebraic problem (including of
course the simple case of a single polynomial being nonnegative) by sos techniques.
However, let us elaborate at this point on a number of interesting connections.
Sums of squares of rational functions. A simple explanation of why a multiplier q(x) makes possible more powerful nonnegativity certificates can be obtained by considering the case where $q(x) = \sum_i q_i(x)^2$ is a sum of squares. In this case, we can reinterpret an sos certificate for the product as
\[ q(x)\, p(x) = \sum_j s_j^2(x) \qquad \Longrightarrow \qquad p(x) = \frac{\sum_j s_j^2(x)}{q(x)} . \]
is clearly affine in the unknown polynomial q(x) and thus can be reduced to an sos
program (and solved via semidefinite programming).
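For completeness, here is the short computation behind the phrase "sums of squares of rational functions", written out under the assumption that both $q(x) = \sum_i q_i(x)^2$ and $q(x)\,p(x) = \sum_j s_j(x)^2$ are sums of squares (the index names are ours):

```latex
\[
p(x) = \frac{q(x)\,p(x)}{q(x)}
     = \frac{q(x) \cdot q(x)\,p(x)}{q(x)^2}
     = \frac{\big(\sum_i q_i(x)^2\big)\big(\sum_j s_j(x)^2\big)}{q(x)^2}
     = \sum_{i,j} \left( \frac{q_i(x)\, s_j(x)}{q(x)} \right)^{2} .
\]
```

Each summand is the square of a rational function with the common denominator q(x), so p is nonnegative wherever q does not vanish, and hence everywhere by continuity when q is not the zero polynomial.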
Uniform denominators and Pólya's theorem. Artin's solution to Hilbert's 17th problem ensures that for every nonnegative polynomial there is a decomposition as a sum of squares of rational functions, or alternatively, a suitable multiplier always exists. In many situations, it is convenient or necessary to restrict the structure of the possible multipliers (we will see examples of this later when discussing copositive matrices in Section 3.6.1). Recall that a form is a homogeneous polynomial, i.e., one for which all monomials have the same degree. A well-known theorem by Pólya about forms that are positive on the nonnegative orthant states precisely a case where this situation holds.
Theorem 3.67 ([59, Section 2.24]). Given a form $f(x_1, x_2, \dots, x_n)$ strictly positive for $x_i \ge 0$, $\sum_i x_i > 0$, then f can be expressed as
\[ f = \frac{g}{h}, \]
where g and h are forms with positive coefficients. In particular, we can choose
\[ h = (x_1 + x_2 + \cdots + x_n)^r \]
for a suitable r.
As we see, a representation of this kind gives an obvious certificate of the nonnegativity of f on the nonnegative orthant. To see the relationship with sums of squares, notice that if f is positive on the nonnegative orthant, then we can write $\bar{f}(x_1, \dots, x_n) := f(x_1^2, \dots, x_n^2) = g(x_1^2, \dots, x_n^2)/(x_1^2 + \cdots + x_n^2)^r$, and thus Pólya's theorem yields a representation of the positive even form $\bar{f}$ as a sum of squares of rational functions, with a denominator of a fixed form. Pólya's theorem was generalized by Reznick [105], who showed that for any strictly positive form (not necessarily even), after multiplying by a suitable factor $(\sum_i x_i^2)^r$ it becomes a sum of squares (for r large enough). Furthermore, he also provided quantitative estimates for the exponent r.
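Pólya's theorem is easy to experiment with. The sketch below is restricted, for simplicity, to binary forms, stored as a coefficient list where entry k is the coefficient of x^(d-k) y^k; the function name and the cap `max_r` are our own choices. It repeatedly multiplies by (x + y) and reports the first exponent r for which all coefficients are strictly positive.

```python
def polya_exponent(coeffs, max_r=50):
    """Smallest r such that (x + y)^r * f has only strictly positive coefficients.

    coeffs[k] is the coefficient of x^(d-k) * y^k in the binary form f.
    Returns None if no such r <= max_r is found (for instance, when f is not
    strictly positive on the nonnegative orthant minus the origin).
    """
    c = list(coeffs)
    for r in range(max_r + 1):
        if all(v > 0 for v in c):
            return r
        # Multiply by (x + y): the new coefficient k is c[k] + c[k - 1].
        c = [(c[k] if k < len(c) else 0) + (c[k - 1] if k > 0 else 0)
             for k in range(len(c) + 1)]
    return None

# f(x, y) = x^2 - x*y + y^2 is positive on the orthant but has a negative coefficient.
print(polya_exponent([1, -1, 1]))
```

For this form, one multiplication gives x^3 + y^3 and two give x^4 + x^3 y + x y^3 + y^4, both with vanishing middle coefficients; the third multiplication is the first to make every coefficient strictly positive.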
Exercise 3.68. Let q(x)p(x) and q(x) be sums of squares, where the multiplier
q(x) is not the zero polynomial. Show that p(x) is nonnegative.
Exercise 3.69. Consider the quartic form in four variables
\[ p(w, x, y, z) := w^4 + x^2 y^2 + x^2 z^2 + y^2 z^2 - 4wxyz. \]
1. Show that p(w, x, y, z) is not a sum of squares.
2. Find a multiplier q(w, x, y, z) such that q(w, x, y, z) p(w, x, y, z) is a sum of
squares.
3.3
3.3.1
Univariate Intervals
For univariate polynomials, we have seen how to exactly characterize global nonnegativity (i.e., for $x \in (-\infty, \infty)$) in terms of semidefinite programming. But what if we are interested in polynomials that are nonnegative only on an interval (either finite or semi-infinite)? As explained below, we can use very similar ideas and two classical characterizations usually associated to the names Pólya–Szegő, Fekete, or Markov–Lukács. The basic results are the following.
Theorem 3.71. A univariate polynomial p(x) is nonnegative on $[0, \infty)$ if and only if it can be written as
\[ p(x) = s(x) + x\, t(x), \]
where s(x), t(x) are sums of squares. If deg(p) = 2d, then we have $\deg(s) \le 2d$, $\deg(t) \le 2d - 2$, while if deg(p) = 2d + 1, then $\deg(s) \le 2d$, $\deg(t) \le 2d$.
A similar result holds for closed finite intervals.
Theorem 3.72. Let a < b. Then the univariate polynomial p(x) is nonnegative on [a, b] if and only if it can be written as
\[
\begin{array}{ll}
p(x) = s(x) + (x - a)(b - x)\, t(x) & \text{if } \deg(p) \text{ is even}, \\
p(x) = (x - a)\, s(x) + (b - x)\, t(x) & \text{if } \deg(p) \text{ is odd},
\end{array}
\]
where s(x), t(x) are sums of squares. In the first case, we have deg(p) = 2d, and $\deg(s) \le 2d$, $\deg(t) \le 2d - 2$. In the second, deg(p) = 2d + 1, and $\deg(s) \le 2d$, $\deg(t) \le 2d$.
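A quick worked instance of the half-line representation, for a cubic chosen by us: $p(x) = x^3 - 3x + 2 = (x - 1)^2 (x + 2)$, which is nonnegative on $[0, \infty)$ but negative for $x < -2$.

```latex
\[
x^3 - 3x + 2 = (x - 1)^2 (x + 2)
  = \underbrace{2\,(x - 1)^2}_{s(x)} \;+\; x \cdot \underbrace{(x - 1)^2}_{t(x)} .
\]
```

Here deg(p) = 3 = 2d + 1 with d = 1, and both s and t are sums of squares of degree 2, consistent with the degree bounds stated for the odd case.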
Notice the similarity to the conditions discussed in Section 3.2.4 and the fact
that these representations obviously certify that $p(x) \ge 0$ on the corresponding
set. From the existence of these sos representations, it also follows directly that
3.3.2
The notions of positive semidefiniteness and sums of squares of scalar polynomials can be naturally extended to polynomial matrices, i.e., matrices with entries in $\mathbb{R}[x_1, \dots, x_n]$. Sum of squares matrices are of interest in many situations, including the characterization of sos convexity (Section 3.3.3) and representations for symmetry-invariant polynomials (Section 3.3.6).
We say that a symmetric polynomial matrix $P(x) \in \mathbb{R}[x]^{m \times m}$ is positive semidefinite if $P(x) \succeq 0$ for all $x \in \mathbb{R}^n$ (i.e., it is pointwise positive semidefinite). The definition of an sos matrix is as follows [69, 48, 109].
Definition 3.76. A symmetric polynomial matrix $P(x) \in \mathbb{R}[x]^{m \times m}$, $x \in \mathbb{R}^n$, is an sos matrix if there exists a polynomial matrix $M(x) \in \mathbb{R}[x]^{s \times m}$ for some $s \in \mathbb{N}$, such that $P(x) = M^T(x) M(x)$.
When m = 1, i.e., for scalar polynomials, this corresponds to the standard sos notion. Also, when P is a constant matrix, then the condition simply states that P is positive semidefinite. Thus, sos matrices are a common generalization of positive semidefinite (constant) matrices and sos polynomials.
literature, as well as an efficient eigenvalue-based method for finding their sos decomposition, we refer the reader to [9].
In the multivariate case, however, not all positive polynomial matrices are
sums of squares. A well-known counterexample is due to Choi [27], who constructed
a positive semidefinite biquadratic form that is not a sum of squares of bilinear
forms. His counterexample can be rewritten as the polynomial matrix
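\[
C(x) = \begin{bmatrix}
x_1^2 + 2x_2^2 & -x_1 x_2 & -x_1 x_3 \\
-x_1 x_2 & x_2^2 + 2x_3^2 & -x_2 x_3 \\
-x_1 x_3 & -x_2 x_3 & x_3^2 + 2x_1^2
\end{bmatrix},
\tag{3.20}
\]
which is positive semidefinite for all $(x_1, x_2, x_3) \in \mathbb{R}^3$ but is not an sos matrix.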
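The positive semidefiniteness claim can be spot-checked numerically. The sketch below evaluates the Choi matrix on a grid of rational points and tests all principal minors in exact arithmetic. It assumes that the off-diagonal entries carry minus signs, as in Choi's biquadratic form; the helper names are hypothetical. A finite sample is of course not a proof, and it says nothing about the failure of the sos property.

```python
from fractions import Fraction
from itertools import combinations, product

def choi(x1, x2, x3):
    # Choi matrix (3.20); off-diagonal signs are assumed negative.
    return [
        [x1**2 + 2 * x2**2, -x1 * x2, -x1 * x3],
        [-x1 * x2, x2**2 + 2 * x3**2, -x2 * x3],
        [-x1 * x3, -x2 * x3, x3**2 + 2 * x1**2],
    ]

def det(m):
    # Determinant for matrices of size 1, 2, or 3.
    n = len(m)
    if n == 1:
        return m[0][0]
    if n == 2:
        return m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def is_psd_3x3(m):
    # A symmetric matrix is PSD iff every principal minor is nonnegative.
    for k in (1, 2, 3):
        for idx in combinations(range(3), k):
            sub = [[m[i][j] for j in idx] for i in idx]
            if det(sub) < 0:
                return False
    return True

grid = [Fraction(k, 2) for k in range(-4, 5)]
all_psd = all(is_psd_3x3(choi(*pt)) for pt in product(grid, repeat=3))
print(all_psd)
```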
Exercise 3.81. Prove Lemma 3.78.
Exercise 3.82. Let $P(x)$ be an sos matrix. Show that all principal minors of $P(x)$
are scalar sos polynomials. (Hint: Use the Cauchy–Binet matrix identity.)
Exercise 3.83. Show that the Choi matrix (3.20) is positive semidefinite for all
real values of $(x_1, x_2, x_3)$ but is not an sos matrix.
Exercise 3.84. Modify the algorithm given in Exercise 3.36 so that it will compute
a decomposition of a univariate sos matrix P (x).
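Exercise 3.85. Certain optimization problems include constraints that are naturally expressed in matrix form. For instance, a set $S$ could be defined as
\[
S = \left\{ (x_1, x_2, x_3) \in \mathbb{R}^3 : G(x) :=
\begin{bmatrix} 1 & x_1 & x_2 \\ x_1 & 1 & x_3 \\ x_2 & x_3 & 1 \end{bmatrix} \succeq 0 \right\}
\]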
3.3.3
\[
\begin{aligned}
& 32x_1^8 + 118x_1^6x_2^2 + 40x_1^6x_3^2 + 25x_1^4x_2^4 - 43x_1^4x_2^2x_3^2 - 35x_1^4x_3^4 + 3x_1^2x_2^4x_3^2 \\
& \quad - 16x_1^2x_2^2x_3^4 + 24x_1^2x_3^6 + 16x_2^8 + 44x_2^6x_3^2 + 70x_2^4x_3^4 + 60x_2^2x_3^6 + 30x_3^8
\end{aligned}
\]
The work [4] presents a complete classification of the cases for which convexity
and sos-convexity coincide. This description is in a certain sense the analogue to
Hilbert's classification of nonnegativity described in Section 3.1.2.
Another motivation and justification for studying sos-convexity is its computational tractability. Deciding convexity of a multivariate polynomial is an NP-hard
problem [3], while it follows from our earlier discussions that sos-convexity can be
checked using semidefinite programming. Sos-convexity will appear prominently in
the characterization of semidefinite representability of convex sets; see Section 6.4.3
in Chapter 6 for details. For more results and background material on sos-convexity,
we refer the reader to [5, 4].
Exercise 3.88. Show that the Choi matrix (3.20) is not the Hessian of any polynomial.
Exercise 3.89. Prove Theorem 3.87. Hint: To show that p(x) is not sos-convex,
analyze the (1, 1) entry of the Hessian.
Exercise 3.90. In this exercise, we explore the use of sos-convexity for the problem of fitting a polynomial to data, under a convexity restriction (e.g., [76]).
Consider a finite set of data $\{x_i, f_i\}$ for $i = 1, \ldots, N$, where $x_i \in D \subset \mathbb{R}^n$ and
$f_i \in \mathbb{R}$. We want to fit these data points with a polynomial function $p(x)$ of degree
$d$, making the least-squares fitting error $\sum_{i=1}^{N} (p(x_i) - f_i)^2$ as small as possible.
1. Give an sos formulation for this problem, in the case where p(x) is required
to be a globally convex polynomial. Explain whether the formulation solves
this problem exactly.
2. How would you modify your formulation if we only require that p(x) be convex
on the domain D of interest?
3. Generate data points where $x_i \in D := [-1, 1] \times [-1, 1]$, and numerically solve
your formulation for those two cases (p(x) is convex everywhere, or is only
convex on the domain D).
3.3.4
Many of the polynomial systems that appear in practice are far from being generic
but rather present a number of structural features that, when properly exploited,
allow for much more efficient computational techniques. This is quite similar to the
situation in numerical linear algebra, where there is a big difference in performance
between algorithms that take into account matrix sparsity and those that do not.
For matrices, the notion of sparsity is often relatively straightforward and relates
mostly to the number of nonzero coefficients. In computational algebra, however,
there exists a much more refined notion of sparsity that refers not only to the
number of zero coefficients of a polynomial, but also to the underlying combinatorial
structure of the nonzero coefficients.
[Figure: x degree versus y degree.]
$\sum_i q_i(x)^2$,   (3.21)
This theorem allows us, without loss of generality, to restrict the set of monomials appearing in the sos representation (3.12) to those in the Newton polytope
of $p$, scaled by a factor of $\tfrac{1}{2}$. This reduces the size of the corresponding matrix $Q$,
thus simplifying the semidefinite program to be solved.
3.3.5
(3.23)
Both expressions are sufficient conditions for the nonnegativity of $p$ on the variety
defined by $f_i(x) = 0$. As we will see, we can use this to give a more efficient version
of the SDP formulation of sum of squares.
Sum of squares on quotient rings. We describe next a natural modification
of the standard sos methods that will allow us to compute sos decompositions on
quotient rings. This can be done by using essentially the same SDP techniques
as in the standard case. Since we will need to do effective computations on the
quotient, we assume that a Gröbner basis $G = \{b_1, \ldots, b_k\}$ of the polynomial ideal
$I$ is available; see Appendix A and [32] for an introduction to computational algebra
and Gröbner basis methods.
The method will be basically the same as in the standard case explained in
Section 3.1.4 (expressing the polynomial as a quadratic form on a vector of monomials and writing linear equations to obtain a semidefinite program), but with two
main differences:
Instead of indexing the rows and columns of the matrix $Q$ in the semidefinite
program by the usual monomials, we use standard monomials corresponding
to the Gröbner basis $G$ of the ideal $I$. These are the monomials that are not
divisible by any leading term of the polynomials $b_i$ in the Gröbner basis.
When equating the left- and right-hand sides to form linear equations defining
the subspace of valid Gram matrices, all operations are performed in the
quotient ring; i.e., we rewrite the terms in normal form after multiplication.
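\[
10 - x^2 - y \equiv
\begin{bmatrix} 1 \\ x \\ y \end{bmatrix}^T
\begin{bmatrix} q_{11} & q_{12} & q_{13} \\ q_{12} & q_{22} & q_{23} \\ q_{13} & q_{23} & q_{33} \end{bmatrix}
\begin{bmatrix} 1 \\ x \\ y \end{bmatrix}
\equiv (q_{11} + q_{33}) + 2q_{12}\, x + 2q_{13}\, y + (q_{22} - q_{33})\, x^2 + 2q_{23}\, xy \mod I,
\]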
where, in the last line, we used reduction modulo the ideal to rewrite some terms
as linear combinations of standard monomials only (e.g., the term $q_{33} y^2$ is replaced
by $q_{33} - q_{33} x^2$). Matching coefficients between left and right, we obtain the linear
equations
\[
\begin{aligned}
1 &: & 10 &= q_{11} + q_{33}, \\
x &: & 0 &= 2q_{12}, \\
y &: & -1 &= 2q_{13}, \\
x^2 &: & -1 &= q_{22} - q_{33}, \\
xy &: & 0 &= 2q_{23}
\end{aligned}
\]
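that define the subspace. Thus, we obtain again a simple semidefinite program.
Solving it, we have
\[
Q = \begin{bmatrix} 9 & 0 & -\tfrac{1}{2} \\ 0 & 0 & 0 \\ -\tfrac{1}{2} & 0 & 1 \end{bmatrix} = L^T L,
\qquad
L = \begin{bmatrix} 3 & 0 & -\tfrac{1}{6} \\ 0 & 0 & \tfrac{\sqrt{35}}{6} \end{bmatrix},
\]
and therefore
\[
10 - x^2 - y \equiv \left(3 - \frac{y}{6}\right)^2 + \frac{35}{36}\, y^2 \mod I,
\]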
which shows that p is indeed a sum of squares on R[x, y]/I. A simple geometric
interpretation is shown in Figure 3.9. As expected, by the condition above, p coincides with an sos polynomial on the variety, and thus it is obviously nonnegative
on that set.
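The computation above can be double-checked in exact rational arithmetic. The sketch below, with hypothetical helper names, confirms that the Gram matrix satisfies the five linear equations and that $p$ and the sos expression agree at rational points of the unit circle (generated by the standard rational parametrization) while differing off the circle.

```python
from fractions import Fraction

def p(x, y):
    return 10 - x**2 - y

def sos(x, y):
    # (3 - y/6)^2 + (35/36) y^2
    return (3 - y / 6) ** 2 + Fraction(35, 36) * y**2

def circle_point(t):
    # Rational parametrization of the unit circle x^2 + y^2 = 1.
    t = Fraction(t)
    return (1 - t**2) / (1 + t**2), 2 * t / (1 + t**2)

Q = [[Fraction(9), Fraction(0), Fraction(-1, 2)],
     [Fraction(0), Fraction(0), Fraction(0)],
     [Fraction(-1, 2), Fraction(0), Fraction(1)]]

# Coefficient-matching equations for the monomials 1, x, y, x^2, xy.
linear_eqs_ok = (
    Q[0][0] + Q[2][2] == 10
    and 2 * Q[0][1] == 0
    and 2 * Q[0][2] == -1
    and Q[1][1] - Q[2][2] == -1
    and 2 * Q[1][2] == 0
)

points = [circle_point(Fraction(k, 7)) for k in range(-20, 21)]
agree_on_circle = all(p(x, y) == sos(x, y) for x, y in points)

# Away from the variety the two polynomials are different functions.
differ_off_circle = p(Fraction(2), Fraction(0)) != sos(Fraction(2), Fraction(0))

print(linear_eqs_ok, agree_on_circle, differ_off_circle)
```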
Figure 3.9. The polynomials $p = 10 - x^2 - y$ and $\left(3 - \frac{y}{6}\right)^2 + \frac{35}{36} y^2$ take
exactly the same values on the unit circle $x^2 + y^2 = 1$. Thus, $p$ is nonnegative on
the circle.
Remark 3.100. Despite the similarities between the standard case of sum of
squares on the polynomial ring $\mathbb{R}[x]$ versus the quotient ring $\mathbb{R}[x]/I$, there are a
few important differences. A key distinction is related to computational complexity
issues. Consider an sos decomposition $p(x) = \sum_i q_i(x)^2$. When working on $\mathbb{R}[x]$,
we can always bound a priori the degree of the polynomials $q_i$ in terms of the degree
of $p$ (namely, $\deg(q_i) \le \tfrac{1}{2} \deg(p)$). This is not true when working on a quotient
ring, since monomials can wrap around when computing normal forms. This is
the reason why when working on $\mathbb{R}[x]/I$ we typically have some freedom in choosing
a finite set of standard monomials to index the matrix $Q$ (unless it is feasible to
include all of them).
In fact, since for the ideal $I = \langle x_1^2 - 1, \ldots, x_n^2 - 1 \rangle$ every polynomial nonnegative
on $V(I)$ is a sum of squares on $\mathbb{R}[x]/I$ (Exercise 3.105), it directly follows that,
in the general case, deciding whether a polynomial is a sum of squares modulo $I$ is
NP-hard.
Even though in the worst case computing a Gröbner basis for $I$ may be
troublesome, for many practical problems such bases are often directly available or relatively easy to compute. A typical example is the case of combinatorial optimization
problems, where the equations defining the Boolean ideal $\langle x_1^2 - 1, \ldots, x_n^2 - 1 \rangle$ are
already a Gröbner basis. Another frequent situation is when the ideal is defined by
a single constraint, in which case the defining equation is again obviously a Gröbner
basis of the corresponding ideal.
SDP dimensions and Hilbert series. Another advantage of the ideal-theoretic formulation is the ease with which structural results can be obtained
through basic algebraic notions. For instance, consider the following question: what
are the matrix dimensions of the semidefinite programs for sum of squares modulo an
ideal? Recall that in the standard sos case (over $\mathbb{R}[x]$, for a polynomial of degree
$2d$), the matrices are indexed by all monomials of degree less than or equal to $d$ and
thus have size $\binom{n+d}{d}$. This can be rewritten as
$\binom{n+d}{d} = \sum_{k=0}^{d} \binom{n+k-1}{k}$, where each
term in the sum corresponds to the number of monomials of total degree $k$. How
can we generalize this?
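The identity between the total size and the per-degree counts is easy to confirm for small cases. The following sketch uses only the standard library; the function names are illustrative.

```python
from math import comb

def gram_size(n, d):
    # Number of monomials of degree <= d in n variables.
    return comb(n + d, d)

def gram_size_by_degree(n, d):
    # Number of monomials of total degree exactly k, for k = 0, ..., d.
    return [comb(n + k - 1, k) for k in range(d + 1)]

for n, d in [(1, 3), (2, 2), (3, 2), (4, 3)]:
    per_degree = gram_size_by_degree(n, d)
    assert sum(per_degree) == gram_size(n, d)
    print(n, d, per_degree, gram_size(n, d))
```

For $n = 3$ and $d = 2$ the per-degree counts are 1, 3, and 6, which add up to the 10 monomials used for a trivariate quartic.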
For quotient rings, there is a nice way of counting the dimensions of the
different homogeneous components, known as the Hilbert series; see, e.g., [33]. The
Hilbert series $H(I, t)$ is the generating function (a formal power series) of the Hilbert
function $H_I(k)$, which gives the dimension of the degree-$k$ homogeneous part of
the quotient ring, i.e.,
\[
H(I, t) = \sum_{k=0}^{\infty} H_I(k)\, t^k.
\]
For example, consider the ideal $I = \{0\}$. The Hilbert series for $\mathbb{R}[x]/I = \mathbb{R}[x]$ is
$H(I, t) = 1/(1 - t)^n = \sum_{k=0}^{\infty} \binom{n+k-1}{k} t^k$, which corresponds exactly to the dimensions computed earlier.
Example 3.101. Consider the ideal $I = \langle x^2 + y^2 - 1 \rangle$ of Example 3.99. Its Hilbert
series is
\[
H(I, t) = \frac{1 + t}{1 - t} = 1 + 2t + 2t^2 + 2t^3 + 2t^4 + \cdots,
\]
which counts the number of standard monomials of each degree. The terms of the
series allow us to determine, given a bound on the total degree of the monomials
to be considered, what size the corresponding semidefinite program will be. For
instance, since in Example 3.99 we used only monomials of degree less than or
equal to 1, the size of the corresponding semidefinite program is $1 + 2 = 3$.
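The coefficients of this series can be reproduced by listing standard monomials directly. The sketch below assumes that the leading term of the defining polynomial is $y^2$ (so that $y^2$ is rewritten as $1 - x^2$, as in the reduction used in Example 3.99); under that assumption the standard monomials are the $x^a y^b$ with $b \le 1$. The function name is hypothetical.

```python
def standard_monomials(max_degree):
    """Exponent pairs (a, b) of monomials x^a y^b not divisible by y^2,
    grouped by total degree a + b."""
    by_degree = {}
    for k in range(max_degree + 1):
        by_degree[k] = [(k - b, b) for b in range(0, min(k, 1) + 1)]
    return by_degree

mons = standard_monomials(4)
counts = [len(mons[k]) for k in range(5)]   # coefficients of 1, t, ..., t^4
sdp_size = sum(counts[:2])                  # monomials of degree <= 1

print(counts, sdp_size)
```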
In Exercise 3.106 we discuss another natural and important example, namely,
the Boolean ideal $\langle x_1^2 - 1, \ldots, x_n^2 - 1 \rangle$. These ideas will appear again in Chapter 7,
when computing semidefinite representations of convex hulls of algebraic varieties.
Exercise 3.102. Prove formally that the expressions (3.22) and (3.23) are equivalent.
Exercise 3.103. Consider the polynomial $f(x, y, z) := 1 + xy + yz + xz$, and the
variety $V(I)$, where $I = \langle x^2 - 1, y^2 - 1, z^2 - 1 \rangle$. Notice that $V(I)$ is finite.
1. Show, by explicit enumeration, that f is nonnegative on V (I).
2. Write f as a sum of squares on R[x]/I.
3.3.6
Symmetries
Another useful property that can be exploited in the sos context is symmetry. Symmetric problems arise very frequently in applications for a variety of reasons. Sometimes symmetry reflects the underlying structure of existing physical systems (e.g.,
time-invariance, conservation laws), while in some other cases it arises as a result
of the chosen mathematical abstraction. Symmetry reduction techniques have been
explored in many contexts, with areas such as crystallography, dynamical systems
[53], and geometric mechanics [77] being prominent examples.
In optimization, as we shall see, symmetry interacts in a very interesting way
with convexity, particularly in the case of semidefinite programming. In general,
there are many potential advantages in exploiting symmetries:
Problem size. The first immediate advantage is the reduction in problem size,
as the new instance can have a significantly smaller number of variables and
constraints.
Degeneracy removal. In symmetric SDP problems, there are repeated eigenvalues of high multiplicity that are difficult to handle numerically. These can be
removed by a proper handling of the symmetry.
Conditioning and reliability. Symmetry-aware methodologies have in general
much better numerical conditioning, and the resulting smaller size instances
are usually less prone to numerical errors.
An in-depth discussion of symmetries in sum of squares and semidefinite programming requires some elements of group representation theory and invariant theory. In this section, we present and isolate the key ideas, referring to the literature
for the full technical details; see, e.g., [48, 123]. We consider the simple situation
where we want to compute an sos decomposition of a single polynomial, and the
underlying symmetry group is finite; the extensions to more general cases are relatively straightforward. The main message is that the presence of symmetry in sos
problems can be exploited at three levels of increasing sophistication: (a) convexity,
(b) semidefinite programming, and (c) sum of squares.
The set-up is as follows: we consider a polynomial $p(x_1, \ldots, x_n)$ that is invariant under the action of a finite group $G$. A formal definition is given below in (3.24),
but the idea is that the polynomial is unchanged under certain transformations of
the variables. We will use the following as a running example.
Example 3.107. Consider the (nonconvex) quartic trivariate polynomial
\[
p(x, y, z) = x^4 + y^4 + z^4 - 4xyz + x + y + z.
\]
This polynomial is invariant under all permutations of $\{x, y, z\}$ (the full symmetric
group $S_3$). The global minimum of $p$ is $p_\star \approx -2.1129$ and is achieved at the orbit
of global minimizers:
\[
(0.988, -1.102, -1.102), \quad (-1.102, 0.988, -1.102), \quad (-1.102, -1.102, 0.988).
\]
For this polynomial, it holds that $p_{\mathrm{sos}} = p_\star$.
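The stated values are easy to reproduce numerically. The sketch below evaluates $p$ on the orbit of the (rounded) minimizer, with the two repeated coordinates taken negative, and checks permutation invariance at an arbitrary test point; the test point and variable names are illustrative choices.

```python
from itertools import permutations

def p(x, y, z):
    return x**4 + y**4 + z**4 - 4 * x * y * z + x + y + z

# Orbit of the rounded minimizer under all permutations of the coordinates.
base = (0.988, -1.102, -1.102)
orbit = sorted(set(permutations(base)))
values = [p(*pt) for pt in orbit]

# Invariance check at an arbitrary point (floating-point tolerance).
test_point = (0.3, -1.7, 2.5)
invariant = all(abs(p(*perm) - p(*test_point)) < 1e-12
                for perm in permutations(test_point))

print(orbit)
print(values)      # each value is close to -2.1129
print(invariant)
```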
Recall that a linear representation of a group $G$ is a homomorphism $\rho : G \to GL(\mathbb{R}^n)$
(i.e., $\rho(st) = \rho(s)\rho(t)$ for all $s, t \in G$), where $GL(\mathbb{R}^n)$ is the group of invertible
$n \times n$ real matrices. The assumption that $p$ is invariant under the group action
means that
\[
p(\rho(g)x) = p(x) \qquad \forall g \in G.
\tag{3.24}
\]
$\forall g \in G$
and
$x \in S \;\Rightarrow\; \rho(g)x \in S \quad \forall g \in G,$
$\forall g \in G\}.$
To see why the statement is true, consider any feasible solution $x_0 \in S$, and define
the group average
\[
\bar{x}_0 = \frac{1}{|G|} \sum_{g \in G} \rho(g) x_0
\]
that expresses $\bar{x}_0$ as a convex combination of the images of $x_0$ under the group
action. By construction, $\bar{x}_0 \in F$. Since $S$ is convex and invariant, we have $\bar{x}_0 \in S$,
and convexity and invariance of $f$ yield
\[
f(\bar{x}_0) \le \frac{1}{|G|} \sum_{g \in G} f(\rho(g) x_0) = f(x_0).
\]
Thus, without loss of generality, for invariant convex problems we can restrict the search for optimal solutions to a potentially much smaller subset $S \cap F$
(of course, this is most useful whenever the dimension of the subspace $F$ is small).
In other words, for convex problems, no symmetry-breaking is ever necessary.
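The averaging argument can be illustrated with a small numerical experiment. In the sketch below the group is $S_3$ acting on $\mathbb{R}^3$ by permuting coordinates, the feasible set is all of $\mathbb{R}^3$, and the objective $f$ is a convex, permutation-invariant function chosen purely for illustration; none of these choices comes from the text.

```python
from itertools import permutations

def f(x):
    # Convex and invariant under permutations of the coordinates.
    return sum(xi**4 for xi in x) + (sum(x) - 1) ** 2

def group_average(x0):
    # Average of the images rho(g) x0 over all g in S_3.
    images = list(permutations(x0))
    n = len(images)
    return tuple(sum(img[i] for img in images) / n for i in range(len(x0)))

x0 = (2.0, -1.0, 0.5)
x_bar = group_average(x0)

print(x_bar)                # lies in the fixed-point subspace: all coordinates equal
print(f(x_bar), f(x0))      # the averaged point is no worse than x0
```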
Example 3.108. The entropy of a probability vector $(p_1, \ldots, p_n)$ with
$\sum_{i=1}^{n} p_i = 1$, $p_i \ge 0$, is defined as
\[
H(p) := -\sum_{i=1}^{n} p_i \log p_i,
\]
$\forall g \in G\}$;   (3.25)
i.e., $X$ must commute with all matrices in the representation of $G$. In this case,
using Schur's lemma of representation theory, one can show that in the appropriate symmetry-adapted basis, the fixed-point subspace will have a block-diagonal
structure.
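Example 3.109. Consider an invariant semidefinite program where the matrices
in the fixed-point subspace have the structure
\[
X = \begin{bmatrix} a & b & b \\ b & c & d \\ b & d & c \end{bmatrix}.
\]
Notice that these matrices are invariant under simultaneous permutation of the last
two rows and columns. We now show that these matrices can be put into a more
convenient form. By pre- and postmultiplying by the orthogonal matrix
\[
T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \alpha & \alpha \\ 0 & \alpha & -\alpha \end{bmatrix},
\qquad \alpha = \frac{1}{\sqrt{2}},
\]
we obtain
\[
T^T X T = \begin{bmatrix} a & \sqrt{2}\, b & 0 \\ \sqrt{2}\, b & c + d & 0 \\ 0 & 0 & c - d \end{bmatrix},
\]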
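A direct numerical check of this change of basis is straightforward. The sketch below uses arbitrary sample values for $a, b, c, d$ and plain nested lists (the helper functions are illustrative) to confirm that $T^T X T$ has the stated block-diagonal form.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

a, b, c, d = 2.0, 0.5, 3.0, -1.0          # arbitrary sample values
X = [[a, b, b], [b, c, d], [b, d, c]]

alpha = 1 / math.sqrt(2)
T = [[1, 0, 0], [0, alpha, alpha], [0, alpha, -alpha]]

Y = matmul(transpose(T), matmul(X, T))
expected = [[a, math.sqrt(2) * b, 0],
            [math.sqrt(2) * b, c + d, 0],
            [0, 0, c - d]]
max_err = max(abs(Y[i][j] - expected[i][j]) for i in range(3) for j in range(3))
print(max_err)    # zero up to rounding error
```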
Example 3.110. Consider our running example, Example 3.107. Since $p(x, y, z)$
has $n = 3$ variables, degree $2d = 4$, and a full Newton polytope, its standard sos
formulation is indexed by all $\binom{n+d}{d} = \binom{5}{2} = 10$ monomials of degree $\le 2$, i.e.,
\[
p(x, y, z) =
\begin{bmatrix} 1 \\ x \\ y \\ z \\ x^2 \\ y^2 \\ z^2 \\ yz \\ xz \\ xy \end{bmatrix}^T
\begin{bmatrix}
q_{00} & q_{01} & q_{02} & \cdots & q_{09} \\
q_{01} & q_{11} & q_{12} & \cdots & q_{19} \\
q_{02} & q_{12} & q_{22} & \cdots & q_{29} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
q_{09} & q_{19} & q_{29} & \cdots & q_{99}
\end{bmatrix}
\begin{bmatrix} 1 \\ x \\ y \\ z \\ x^2 \\ y^2 \\ z^2 \\ yz \\ xz \\ xy \end{bmatrix},
\]
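where the matrix $Q$ above will be constrained to be positive semidefinite. Recall that
$p$ is invariant under all permutations of the variables (the full symmetric group $S_3$).
Thus, we can constrain the matrix $Q$ to be in the fixed-point subspace, i.e., it
should satisfy $Q = \rho(g)^T Q \rho(g)$ for all $g \in G$, where $\rho : G \to GL(\mathbb{R}^{10})$ is the induced
representation on the vector of monomials that arises from permuting the variables
$(x, y, z)$. Solving the equations (3.25) that define the fixed-point subspace, we find
that the matrices there have the structure
\[
Q = \begin{bmatrix}
r_0 & r_1 & r_1 & r_1 & r_2 & r_2 & r_2 & r_3 & r_3 & r_3 \\
r_1 & r_4 & r_5 & r_5 & r_6 & r_7 & r_7 & r_8 & r_9 & r_9 \\
r_1 & r_5 & r_4 & r_5 & r_7 & r_6 & r_7 & r_9 & r_8 & r_9 \\
r_1 & r_5 & r_5 & r_4 & r_7 & r_7 & r_6 & r_9 & r_9 & r_8 \\
r_2 & r_6 & r_7 & r_7 & r_{10} & r_{11} & r_{11} & r_{12} & r_{13} & r_{13} \\
r_2 & r_7 & r_6 & r_7 & r_{11} & r_{10} & r_{11} & r_{13} & r_{12} & r_{13} \\
r_2 & r_7 & r_7 & r_6 & r_{11} & r_{11} & r_{10} & r_{13} & r_{13} & r_{12} \\
r_3 & r_8 & r_9 & r_9 & r_{12} & r_{13} & r_{13} & r_{14} & r_{15} & r_{15} \\
r_3 & r_9 & r_8 & r_9 & r_{13} & r_{12} & r_{13} & r_{15} & r_{14} & r_{15} \\
r_3 & r_9 & r_9 & r_8 & r_{13} & r_{13} & r_{12} & r_{15} & r_{15} & r_{14}
\end{bmatrix}.
\tag{3.26}
\]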
T = BlockDiag(1, R, R, R) ,
R = ,
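It can be verified that under this transformation, the matrix in (3.26) now takes
the form
\[
T^T Q T = \mathrm{BlockDiag}(Q_1, Q_2, Q_2),
\]
where
\[
Q_1 = \begin{bmatrix}
r_0 & \sqrt{3}\, r_1 & \sqrt{3}\, r_2 & \sqrt{3}\, r_3 \\
\sqrt{3}\, r_1 & r_4 + 2r_5 & r_6 + 2r_7 & r_8 + 2r_9 \\
\sqrt{3}\, r_2 & r_6 + 2r_7 & r_{10} + 2r_{11} & r_{12} + 2r_{13} \\
\sqrt{3}\, r_3 & r_8 + 2r_9 & r_{12} + 2r_{13} & r_{14} + 2r_{15}
\end{bmatrix},
\qquad
Q_2 = \begin{bmatrix}
r_4 - r_5 & r_6 - r_7 & r_8 - r_9 \\
r_6 - r_7 & r_{10} - r_{11} & r_{12} - r_{13} \\
r_8 - r_9 & r_{12} - r_{13} & r_{14} - r_{15}
\end{bmatrix}.
\]
Notice that the $10 \times 10$ matrix has split into three blocks, one of size $4 \times 4$ and two
identical blocks of size $3 \times 3$. Also, all entries are otherwise linearly independent
(in fact, we have the dimension count $\binom{5}{2} + \binom{4}{2} = 10 + 6 = 16$, the number of free
parameters in (3.26)).
Since $Q \succeq 0$ if and only if $T^T Q T \succeq 0$, this implies that instead of solving
an SDP problem with a positivity constraint on a $10 \times 10$ matrix, we have now a
$4 \times 4$ matrix and a $3 \times 3$ matrix instead (clearly, we need only one copy of the two
identical $3 \times 3$ blocks), which is a lot simpler.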
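The block structure can be reproduced numerically. The sketch below builds a matrix with the invariance pattern of (3.26) from 16 arbitrary parameters (filling the lower rows by symmetry under permutations of the variables) and conjugates it with an orthogonal change of basis. The three orthonormal vectors used here, and the ordering of the columns of $T$, are an assumed standard symmetry-adapted basis for $S_3$ permuting three coordinates; they are not claimed to coincide with the matrix $R$ of the text.

```python
import math

def sym_block(a, b):
    # 3x3 block with a on the diagonal and b elsewhere.
    return [[a if i == j else b for j in range(3)] for i in range(3)]

def build_Q(r):
    # Monomial order: 1 | x, y, z | x^2, y^2, z^2 | yz, xz, xy.
    Q = [[0.0] * 10 for _ in range(10)]
    Q[0][0] = r[0]
    for blk in range(3):
        for i in range(3):
            Q[0][1 + 3 * blk + i] = Q[1 + 3 * blk + i][0] = r[1 + blk]
    params = {(0, 0): (r[4], r[5]), (0, 1): (r[6], r[7]), (0, 2): (r[8], r[9]),
              (1, 1): (r[10], r[11]), (1, 2): (r[12], r[13]), (2, 2): (r[14], r[15])}
    for (I, J), (a, b) in params.items():
        B = sym_block(a, b)
        for i in range(3):
            for j in range(3):
                Q[1 + 3 * I + i][1 + 3 * J + j] = B[i][j]
                Q[1 + 3 * J + j][1 + 3 * I + i] = B[i][j]
    return Q

def build_T():
    # Assumed orthonormal symmetry-adapted vectors (an illustrative choice).
    u = [[1 / math.sqrt(3)] * 3,
         [2 / math.sqrt(6), -1 / math.sqrt(6), -1 / math.sqrt(6)],
         [0.0, 1 / math.sqrt(2), -1 / math.sqrt(2)]]
    cols = [[1.0] + [0.0] * 9]
    for vec in u:                      # group columns by symmetry type
        for blk in range(3):
            col = [0.0] * 10
            col[1 + 3 * blk: 4 + 3 * blk] = vec
            cols.append(col)
    return [[cols[j][i] for j in range(10)] for i in range(10)]

def congruence(T, Q):
    # Returns T^T Q T.
    n = len(Q)
    QT = [[sum(Q[i][k] * T[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return [[sum(T[k][i] * QT[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

r = [1.0 + 0.5 * k for k in range(16)]       # arbitrary parameter values
M = congruence(build_T(), build_Q(r))

blocks = [range(0, 4), range(4, 7), range(7, 10)]
def same_block(i, j):
    return any(i in b and j in b for b in blocks)

off_block = max(abs(M[i][j]) for i in range(10) for j in range(10)
                if not same_block(i, j))
print(off_block)   # zero up to rounding error
```

With this ordering, the leading $4 \times 4$ block of `M` plays the role of $Q_1$ and the two trailing $3 \times 3$ blocks are equal copies of $Q_2$.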
As we can see, exploiting symmetry can allow for a significant reduction in the
computational cost. Depending on how much symmetry the problem has, the gains
can be very significant and may enable the solution of problems that are otherwise
practically impossible to solve.
Sums of squares. We showed in the previous section how to simplify and decompose a specific semidefinite program, corresponding to the sos decomposition of a
given polynomial. We can use similar techniques to simultaneously decompose the
semidefinite programs associated with sos decompositions of all polynomials invariant
under a given symmetry group. In other words, if before we were using a symmetry-adapted basis to split a fixed vector of monomials into isotypic components, now
we will instead simultaneously decompose the whole polynomial ring.
The results we present can be expressed in a very appealing form using a few
basic concepts of invariant theory. Given a finite group $G$ acting on $(x_1, \ldots, x_n)$,
recall that the invariant ring is the set of invariant polynomials $\mathbb{R}[x]^G := \{p \in
\mathbb{R}[x] : p(\rho(g)x) = p(x) \ \forall g \in G\}$, with the natural operations. For simplicity, we will
restrict ourselves to the simple situation where the invariant ring $\mathbb{R}[x]^G$ is isomorphic
to a polynomial ring.³ In this case, we have $\mathbb{R}[x]^G \simeq \mathbb{R}[\theta_1, \ldots, \theta_n]$, where $\theta_1, \ldots, \theta_n$
are algebraically independent invariant polynomials.
³ In general, the invariant ring is a finitely generated algebra but is not necessarily isomorphic to
a polynomial ring; i.e., there may not exist a set of algebraically independent generators; see, e.g.,
[119, 38]. A simple example of this situation is the cyclic group $C_3$ acting on $\mathbb{R}[x, y, z]$ by cyclically
permuting the indeterminates. In this case, a minimal set of generators for the invariant ring $\mathbb{R}[x]^G$
is $\{s_1, s_2, s_3, s_4\} := \{x + y + z,\ xy + yz + zx,\ xyz,\ x^2 y + y^2 z + z^2 x\}$. However, these are algebraically
dependent since they satisfy the relation $9s_3^2 + 3s_3 s_4 + s_4^2 - 6s_1 s_2 s_3 - s_1 s_2 s_4 + s_2^3 + s_1^3 s_3 = 0$.
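The algebraic dependency in the footnote can be checked by brute force on integer points. The sketch below takes the fourth generator to be the cyclic sum $x^2 y + y^2 z + z^2 x$ and evaluates the relation exactly on a small grid; the function names are illustrative.

```python
from itertools import product

def gens(x, y, z):
    s1 = x + y + z
    s2 = x * y + y * z + z * x
    s3 = x * y * z
    s4 = x**2 * y + y**2 * z + z**2 * x     # cyclic, but not fully symmetric
    return s1, s2, s3, s4

def relation(s1, s2, s3, s4):
    return (9 * s3**2 + 3 * s3 * s4 + s4**2
            - 6 * s1 * s2 * s3 - s1 * s2 * s4 + s2**3 + s1**3 * s3)

grid = list(product(range(-3, 4), repeat=3))
relation_holds = all(relation(*gens(x, y, z)) == 0 for x, y, z in grid)
cyclic_invariant = all(gens(x, y, z) == gens(y, z, x) for x, y, z in grid)

print(relation_holds, cyclic_invariant)
```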
where $\Pi_i \in \mathbb{R}[\theta]^{r_i \times r_i}$ are symmetric matrices that depend only on the group action
and $S_i \in \mathbb{R}[\theta]^{r_i \times r_i}$ are sos matrices.
2113
1000
s2 = 0,
S3 =
+ e1 +
79
282
79
47 e2
79 2
141 e1
74
304 2
1279 e1 + 693 e1
749
92 + 1636
e1
1120
11511 e1 e2
29 +
749
1636 e1
3469
4908
148 3
1279 e1
1439 2
2454 e2
85469 2
188958 e1 e2
85 4
693 e1 ,
.
It is easy to check that $s_1$, $s_2$, and $S_3$ are indeed sums of squares and that they
satisfy $p + \frac{2113}{1000} = s_1 + \langle S_3, \Pi_3 \rangle$, and therefore serve as a valid algebraic certificate
for the lower bound $-2.113$.
                                      Equality constraints        Symmetries
monomials (deg $k$)                   standard monomials          isotypic components
$\frac{1}{(1-t)^n} = \sum_{k=0}^{\infty} \binom{n+k-1}{k} t^k$    Hilbert series              Molien series
                                      Finite convergence on       Block diagonalization
                                      zero-dimensional ideals
3.4
Infeasibility Certificates
At several points in this chapter, we have given sos-based sufficient conditions for
different problems (e.g., nonnegativity of polynomials over sets in Section 3.2.4). We
now study in more detail the structure of these certificates, as well as the question
of when converse results hold, i.e., how to use sos techniques to certify properties
of systems of equations and inequalities over the real numbers. As we shall see,
sos techniques are very powerful in the sense that they can always provide proofs
of infeasibility for general basic semialgebraic sets. The key role of sum of squares
in these infeasibility certificates is developed in Section 3.4.2, where we introduce
the Positivstellensatz, highlighting the similarities to and differences from other
well-known algebraic infeasibility certificates.
3.4.1
The feasible set $S$ of an optimization problem is usually described by a finite number of polynomial equations and/or inequalities. However, at least in principle,
one could write many other constraints that are equally valid on the set $S$. For
instance, for a linear programming problem, we could consider nonnegative linear combinations of the given inequalities. Recall that this issue appeared already
in Section 3.2.4, when considering polynomial nonnegativity over a set, and we
described there two techniques (for equations and inequalities, respectively) of producing further valid constraints. We would like to understand the set of all possible
valid constraints and, in particular, how to algorithmically generate them. To do
so, we revisit those constructions next and formalize their properties in terms of
two important algebraic objects: ideals and preorders.
For the case of a set described by equations $f_i(x) = 0$, we were able to produce
further polynomials vanishing on the set $S$ by considering linear combinations with
polynomial coefficients. The set of all polynomials generated this way is a polynomial ideal. We restate the familiar definition here, for easy comparison with the
new concepts introduced later.
Definition 3.118. Given multivariate polynomials $\{f_1, \ldots, f_m\}$, the ideal generated by the $f_i$ is
\[
\langle f_1, \ldots, f_m \rangle := \{ f : f = t_1 f_1 + \cdots + t_m f_m, \quad t_i \in \mathbb{R}[x] \}.
\]
Similarly, for a set described by inequalities $g_i(x) \ge 0$, one can generate new
valid inequality constraints by multiplying the $g_i(x)$ against sos polynomials, or
by taking conic combinations of valid constraints. This is formalized through the
notion of quadratic module.
Definition 3.119. Given multivariate polynomials $\{g_1, \ldots, g_m\}$, the quadratic
module generated by the $g_i$ is the set
\[
\mathrm{qmodule}(g_1, \ldots, g_m) := \{ g : g = s_0 + s_1 g_1 + \cdots + s_m g_m \},
\]
where $s_0, s_1, \ldots, s_m \in \mathbb{R}[x]$ are sums of squares.
\[
\mathrm{preorder}(g_1, \ldots, g_m) :=
\Big\{ g : g = s_0 + \sum_{\{i\}} s_i g_i + \sum_{\{i,j\}} s_{ij} g_i g_j + \sum_{\{i,j,k\}} s_{ijk} g_i g_j g_k + \cdots \Big\},
\]
where each term in the sum is a square-free product of the polynomials $g_i$, with a
coefficient $s_\alpha \in \mathbb{R}[x]$ that is a sum of squares. The sum is finite, with a total of
$2^m$ terms, corresponding to all subsets of $\{g_1, \ldots, g_m\}$.
Clearly $\mathrm{qmodule}(g_1, \ldots, g_m) \subseteq \mathrm{preorder}(g_1, \ldots, g_m)$, so, in principle, the
latter yields a possibly larger set of valid constraints. By construction, ideals,
quadratic modules, and preorders contain only valid constraints, which are logical
consequences of the given equations and inequalities. Indeed, every polynomial
in the ideal $\langle f_1, \ldots, f_m \rangle$ vanishes on the solution set of $f_i(x) = 0$. Similarly, every element of $\mathrm{preorder}(g_1, \ldots, g_m)$ is clearly nonnegative on the feasible set of
$g_i(x) \ge 0$.
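To make the counting concrete, the following sketch enumerates the square-free products that index the terms of a preorder element, and singles out those that already appear in the quadratic module. Generators are represented only by their names, and the function name is hypothetical.

```python
from itertools import combinations

def preorder_products(names):
    """All square-free products of the generators, one per subset
    (the empty subset corresponds to the term s_0)."""
    terms = []
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            terms.append(subset)
    return terms

generators = ["g1", "g2", "g3"]
terms = preorder_products(generators)

# The quadratic module only uses the empty product and the single generators.
qmodule_terms = [t for t in terms if len(t) <= 1]

print(len(terms), len(qmodule_terms))
print(terms)
```

For $m = 3$ generators the preorder has $2^3 = 8$ terms, of which $m + 1 = 4$ belong to the quadratic module.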
A natural question arises: Can all valid constraints be generated this way?
Unless further assumptions are made, ideals and preorders (and thus, quadratic
modules) may not necessarily contain all valid constraints; see Exercise 3.121. Remarkably, however, they will be powerful enough to always detect and certify the
possible infeasibility (i.e., emptiness) of the corresponding feasible set; the Positivstellensatz (Theorem 3.127) formalizes this statement.
The notions of ideal, preorder, and quadratic module as used above are standard in real algebraic geometry; see, for instance, [19] (the preorders are sometimes
also referred to as cones). Notice that, as geometric objects, ideals are affine sets,
and quadratic modules and preorders are closed under convex combinations and
nonnegative scalings (i.e., they are actually cones in the convex geometry sense).
These convexity properties, coupled with the relationships between semidefinite
programming and sums of squares, will be key for our developments in the next
section.
Exercise 3.121. In general, ideals and preorders may not contain all valid constraints. In this exercise, we illustrate a few cases where things may go wrong.
1. Let $S = \{x \in \mathbb{R} : x^2 = 0\}$. Show that the polynomial $x$ vanishes on the
feasible set but is not in the ideal $\langle x^2 \rangle$.
3.4.2
Certificates of Infeasibility
A central theme throughout convex optimization is the concept of infeasibility certificates, or, equivalently, theorems of the alternative. The key links relating algebraic techniques and optimization will be the facts that infeasibility of a given
polynomial system can always be certified through a particular algebraic identity,
and that this identity itself can be found via convex optimization.
Let us start by considering the following question: If a system of equations
does not have solutions, how can we prove this fact? In particular, what kind of
evidence could we show to a third party to convince them that the given equations
are indeed unsolvable?
Remark 3.122. Notice the asymmetry between this question (proving or certifying
nonexistence of solutions) versus providing evidence that the equations truly have
solutions. The latter could be certified (at least in principle) by producing a candidate
point $x_0$ that satisfies all equations (finding such a point $x_0$ could be very hard, but
that is not the issue here). In complexity-theoretic terms, this is essentially the
distinction between the NP and co-NP complexity classes (over either the Turing or
the real computation model).
Fortunately, for problems with algebraic structure, there are quite natural
ways of providing infeasibility certificates. These are formal algebraic identities that
give irrefutable evidence about the nonexistence of solutions. We briefly recall and
illustrate several well-known special cases before proceeding to the general case of
polynomial systems over the reals. Table 3.2 contains a summary of the infeasibility
certificates to be discussed and the associated computational techniques.
Linear equations. We consider first linear systems of equations over either the
real or the complex numbers (in fact, any field will do). It is a well-known result
from linear algebra that if a set of linear equations $Ax = b$ is infeasible, there exists a
linear combination of the given equations such that the left-hand side is identically
zero, but the right-hand side does not vanish (and thus, infeasibility is evident).
Such a linear combination can be found, for instance, by Gaussian elimination.
This result is also known as the Fredholm alternative.
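The linear combination described above is easy to exhibit on a toy system. In the sketch below the matrix, right-hand side, and multiplier vector are illustrative choices; the multipliers are scaled so that the combined right-hand side equals $-1$, but any nonzero value would certify infeasibility equally well.

```python
# Infeasible system: x + 2y = 1, 2x + 4y = 3, x = 0.
A = [[1, 2], [2, 4], [1, 0]]
b = [1, 3, 0]

# Multipliers: 2 * (first equation) - (second equation).
mu = [2, -1, 0]

def combine(A, b, mu):
    # Left-hand-side coefficients and right-hand side of sum_i mu_i * (row i).
    lhs = [sum(mu[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]
    rhs = sum(mu[i] * b[i] for i in range(len(b)))
    return lhs, rhs

lhs, rhs = combine(A, b, mu)
print(lhs, rhs)   # zero left-hand side, nonzero right-hand side
```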
Complex                              Real
Range/kernel                         Farkas lemma
Linear algebra                       Linear programming

Nullstellensatz                      Positivstellensatz
Bounded degree: Linear algebra       Bounded degree: SDP
Gröbner bases
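\[
Ax = b \ \text{ is infeasible} \quad \Longleftrightarrow \quad \exists\, \mu \ \text{ s.t. } \ A^T \mu = 0, \quad b^T \mu = -1.
\]
Notice that one direction of the theorem (existence of a suitable $\mu$ implies
infeasibility) is obvious: premultiply the equations with $\mu^T$ to obtain
\[
Ax = b \quad \Rightarrow \quad \mu^T A x = \mu^T b \quad \Rightarrow \quad 0 = -1,
\]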
\[
f_i(z) = 0 \ \ (i = 1, \ldots, m) \ \text{ is infeasible in } \mathbb{C}^n
\quad \Longleftrightarrow \quad -1 \in \langle f_1, \ldots, f_m \rangle.
\]
Again, the easy direction is almost trivial. If $-1$ is in the ideal generated
by the $f_i$, there exist polynomials $t_1(z), \ldots, t_m(z)$ such that
\[
t_1(z) f_1(z) + \cdots + t_m(z) f_m(z) = -1.
\]
Evaluating this expression at any candidate solution of the polynomial system, we
obtain a contradiction, since the left-hand side vanishes, while the right-hand side
does not. The polynomials $t_i$ prove infeasibility of the given equations and constitute
a Nullstellensatz refutation for the polynomial system. Their effective computation
can be accomplished in a variety of ways. This could be done, for instance, via
Gröbner basis techniques, or, if a bound on the degree of the polynomials $t_i$ is
assumed a priori, via straightforward (but possibly inefficient) linear algebra.
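A Nullstellensatz refutation is simply a polynomial identity, so it can be verified mechanically once the multipliers are known. The sketch below checks one for the toy system $xy - 1 = 0$, $x = 0$, showing that the constant $-1$ lies in the ideal (equivalently, that $1$ does). The system, the multipliers, and the dictionary encoding of bivariate polynomials are illustrative assumptions.

```python
# Bivariate polynomials as dictionaries {(deg_x, deg_y): coefficient}.

def poly_add(p, q):
    out = dict(p)
    for mon, c in q.items():
        out[mon] = out.get(mon, 0) + c
    return {m: c for m, c in out.items() if c != 0}

def poly_mul(p, q):
    out = {}
    for (i1, j1), c1 in p.items():
        for (i2, j2), c2 in q.items():
            mon = (i1 + i2, j1 + j2)
            out[mon] = out.get(mon, 0) + c1 * c2
    return {m: c for m, c in out.items() if c != 0}

# f1 = x*y - 1 and f2 = x have no common zero: x = 0 forces f1 = -1.
f1 = {(1, 1): 1, (0, 0): -1}
f2 = {(1, 0): 1}

# Candidate multipliers t1 = 1 and t2 = -y.
t1 = {(0, 0): 1}
t2 = {(0, 1): -1}

combo = poly_add(poly_mul(t1, f1), poly_mul(t2, f2))
print(combo)   # the constant polynomial -1
```

Finding such multipliers of bounded degree amounts to solving a linear system in their unknown coefficients, which is the linear algebra approach mentioned above.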
At this point, we should mention an important complexity-theoretic distinction between this case and the simpler case of linear equations discussed earlier.
Since deciding feasibility of polynomial equations includes propositional satisfiability (which is NP-hard) as a special case, it would be unreasonable to expect that
short certificates of infeasibility always exist. Thus, in general one should not
expect to always be able to produce certificates $t_i(z)$ of small degree for every infeasible system. In fact, explicit systems of equations are known whose Nullstellensatz
refutations necessarily have large degree; see Exercise 3.135, as well as [24, 55, 36]
and the references therein.
Remark 3.125. The two results discussed above deal only with equations (either
linear equations over any field, or polynomial equations over the complex numbers).
Working with inequalities, or trying to distinguish between real versus complex
solutions, will bring additional algebraic challenges. As we will see, to do this one
needs to take into account special properties of the reals (mainly, the fact that $\mathbb{R}$ is
an ordered field) that are not true for the complex numbers.
Linear inequalities. For systems of linear inequalities, strong LP duality provides efficient certificates of infeasibility. These are essentially an algebraic interpretation of the separation theorem for polyhedral sets and are usually presented
in terms of theorems of the alternative such as the celebrated Farkas lemma.
Theorem 3.126 (Farkas lemma).
Ax + b = 0,
Cx + d 0
0, s.t.
is infeasible
AT + C T
bT + dT
= 0,
= 1.
As in the previous cases, the easy direction is straightforward. It is equivalent to the weak duality of linear programming and follows from direct syntactic manipulations (premultiply the first equation by $\mu^T$ and the second equation
by $\lambda^T$, and add to obtain a contradiction). The difficult converse direction is
equivalent to strong duality, which always holds for linear programming problems.
A suitable certificate pair $(\lambda, \mu)$ can be obtained by solving the corresponding LP,
which can be done in polynomial time using the ellipsoid algorithm or interior-point
methods.
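To make the easy direction concrete, the following minimal sketch checks a Farkas certificate in exact rational arithmetic. The toy system ($x_1 + x_2 - 1 = 0$, $-x_1 - 1 \ge 0$, $-x_2 - 1 \ge 0$), the function name `is_farkas_certificate`, and the particular multipliers are our own illustrative assumptions; in practice the pair $(\lambda, \mu)$ would come from an LP solver rather than being written down by hand.

```python
from fractions import Fraction as Fr

def is_farkas_certificate(A, b, C, d, lam, mu):
    """Check lam >= 0, A^T mu + C^T lam = 0 and b^T mu + d^T lam = -1 exactly."""
    n = len(A[0]) if A else len(C[0])
    if any(l < 0 for l in lam):
        return False
    for j in range(n):
        # j-th coordinate of A^T mu + C^T lam
        col = sum(A[i][j] * mu[i] for i in range(len(A)))
        col += sum(C[i][j] * lam[i] for i in range(len(C)))
        if col != 0:
            return False
    const = sum(bi * mi for bi, mi in zip(b, mu))
    const += sum(di * li for di, li in zip(d, lam))
    return const == -1

# Hypothetical toy system: x1 + x2 - 1 = 0, -x1 - 1 >= 0, -x2 - 1 >= 0.
A = [[Fr(1), Fr(1)]]
b = [Fr(-1)]
C = [[Fr(-1), Fr(0)], [Fr(0), Fr(-1)]]
d = [Fr(-1), Fr(-1)]

mu = [Fr(1, 3)]               # multiplier of the equality (sign unrestricted)
lam = [Fr(1, 3), Fr(1, 3)]    # multipliers of the inequalities (nonnegative)
certified = is_farkas_certificate(A, b, C, d, lam, mu)
print("valid infeasibility certificate:", certified)
```

Verifying a certificate needs only matrix-vector products, which is what makes it a short proof of infeasibility; finding it is the part that requires solving an LP.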
These classical results can be generalized and unified to handle the case of
systems of polynomial equations and inequalities over the real numbers. This will
yield a simultaneous generalization of Farkas lemma (to allow for polynomial inequalities), as well as the possibility of distinguishing between real and complex
solutions (unlike the Nullstellensatz).
3.4.3
The Positivstellensatz
Consider a general system of polynomial equations and inequalities for which one
wants to show that it has no solutions over the real numbers. How do we certify
its infeasibility? As we describe next, a very natural class of algebraic certificates
exists for this case, under no assumptions whatsoever. This result is known as
the Positivstellensatz and is one of the cornerstones of real algebraic geometry. It
essentially appears in this form in [19] and is due to Stengle [114].
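Theorem 3.127 (Positivstellensatz).
\[
\left\{ \begin{array}{ll} f_i(x) = 0 & (i = 1, \ldots, m), \\ g_i(x) \ge 0 & (i = 1, \ldots, p) \end{array} \right. \ \text{is infeasible in } \mathbb{R}^n
\quad\Longleftrightarrow\quad
\exists\, F(x), G(x) \in \mathbb{R}[x] \ \text{ s.t. }
\left\{ \begin{array}{l} F(x) + G(x) = -1, \\ F(x) \in \langle f_1, \ldots, f_m \rangle, \\ G(x) \in \text{preorder}(g_1, \ldots, g_p). \end{array} \right.
\tag{3.27}
\]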
The theorem states that for every infeasible system of polynomial equations
and inequalities, there exists a simple algebraic identity that directly certifies the
nonexistence of real solutions. The certificate has a very simple form: a polynomial $F(x)$ from the ideal generated by the equality constraints and a polynomial
$G(x)$ from the preorder generated by the inequalities that add up to the polynomial
$-1$. The easy direction is immediate: by construction, evaluating $F(x) + G(x)$
at any feasible point should produce a nonnegative number. However, since this
expression is identically equal to the polynomial $-1$, we arrive at a contradiction.
Remarkably, the Positivstellensatz holds under no assumptions whatsoever on the
polynomials.
Naturally, we are concerned with the effective computation of these certificates. Recall that for the cases of Theorems 3.123–3.126, the corresponding refutations can be obtained using either linear algebra, linear programming, or Gröbner
bases techniques. For the Positivstellensatz, we have established that ideals and
preorders are convex cones in the space of polynomials. As a consequence, the
conditions in Theorem 3.127 for a certificate to exist are convex, regardless of any
convexity property of the original problem. Furthermore, the same property holds
if we consider only bounded-degree sections, i.e., the intersection with the subspace
of polynomials of degree less than or equal to a given number $D$. In this case,
the conditions in the Positivstellensatz have exactly the form of an sos program.
This implies that we can find bounded-degree certificates by solving semidefinite
programs.
Theorem 3.128. Consider a system of polynomial equations and inequalities that
has no real solutions. The search for bounded-degree Positivstellensatz infeasibility
certificates is an sos program and thus is solvable via semidefinite programming.
If the degree bound is sufficiently large, infeasibility certificates $F(x)$, $G(x)$ for the
original system will be obtained from the corresponding sos program.
Since infeasibility certificates are naturally ordered by their degree, this gives
rise to a natural hierarchy of semidefinite relaxations for semialgebraic problems,
indexed by certificate degree [89, 91]. The Positivstellensatz guarantees that this
hierarchy is complete in the sense that, for every infeasible system, a suitable refutation will eventually be found.
Example 3.129. Consider the following polynomial system:
\[
f_1 := x_1^2 + x_2^2 - 1 = 0, \qquad
g_1 := 3x_2 - x_1^3 - 2 \ge 0, \qquad
g_2 := x_1 - 8x_2^3 \ge 0.
\]
We will prove that it has no solutions $(x_1, x_2) \in \mathbb{R}^2$. By the Positivstellensatz, the
system is infeasible if and only if there exist polynomials $t_1, s_0, s_1, s_2, s_{12} \in \mathbb{R}[x_1, x_2]$
that satisfy
\[
\underbrace{f_1 \cdot t_1}_{\text{ideal}\langle f_1 \rangle} + \underbrace{s_0 + s_1 \cdot g_1 + s_2 \cdot g_2 + s_{12} \cdot g_1 \cdot g_2}_{\text{preorder}(g_1, g_2)} = -1.
\tag{3.28}
\]
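Before searching for an algebraic certificate, it can be reassuring to check the claim numerically. The sketch below samples the unit circle $f_1 = 0$ and records the largest value of $\min(g_1, g_2)$; a strictly negative result is consistent with infeasibility. This sampling is only a sanity check under our own choice of grid, not a Positivstellensatz certificate, and the function names are ours.

```python
import math

def g1(x1, x2):
    return 3 * x2 - x1**3 - 2

def g2(x1, x2):
    return x1 - 8 * x2**3

def worst_violation(num_samples=100_000):
    """Largest value of min(g1, g2) over sampled points of the circle x1^2 + x2^2 = 1."""
    best = -math.inf
    for k in range(num_samples):
        theta = 2 * math.pi * k / num_samples
        x1, x2 = math.cos(theta), math.sin(theta)
        best = max(best, min(g1(x1, x2), g2(x1, x2)))
    return best

# If the system were feasible, some point would give min(g1, g2) >= 0.
margin = worst_violation()
print("max over the circle of min(g1, g2):", margin)
```

A negative margin on a fine grid suggests, but does not prove, that no real solution exists; the identity (3.28) is what turns this observation into a proof.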
To summarize our discussions, there is a direct path connecting general polynomial optimization problems to semidefinite programming, via Positivstellensatz
infeasibility certificates. Pictorially, we have the following:
Polynomial systems $\Rightarrow$ Positivstellensatz certificates $\Rightarrow$ Semidefinite programming.
Even though so far we have discussed only feasibility problems, there are obvious
straightforward connections with optimization questions, which we make more concrete in the next section. As we did earlier in the case of unconstrained optimization,
by considering the emptiness of the sublevel sets of the objective function, sequences
of converging bounds indexed by certificate degree can be directly constructed.
Exercise 3.130. Consider a single quadratic polynomial equation $ax^2 + bx + c = 0$. What conditions must $(a, b, c)$ satisfy for this equation to have no real
solutions? Assuming this condition holds, give a Positivstellensatz certificate of the
nonexistence of real solutions.
Exercise 3.131. Explain how Theorem 3.127 simplifies in the following cases:
1. There are no equality constraints.
2. There are no inequality constraints. Is this case equivalent to Hilbert's Nullstellensatz? Explain why or why not.
Exercise 3.132. Consider the polynomial system $\{x + y^3 = 2,\ x^2 + y^2 = 1\}$.
1. Is it feasible over $\mathbb{C}$? How many solutions are there?
2. Is it feasible over $\mathbb{R}$? If not, give a Positivstellensatz-based infeasibility certificate of this fact.
Exercise 3.133. Assume that in the statement of the Positivstellensatz, we replace
preorder$(g_1, \ldots, g_p)$ with the (potentially smaller) set qmodule$(g_1, \ldots, g_p)$. Is the
result still true? Prove, or disprove via a counterexample.
Exercise 3.134. Prove, using the Positivstellensatz, that every nonnegative polynomial is a sum of squares of rational functions. (Hint: A polynomial $f(x)$ satisfies
$f(x) \ge 0$ for all $x \in \mathbb{R}^n$ if and only if the set $\{(x, y) \in \mathbb{R}^n \times \mathbb{R} : f(x) \le 0,\ y f(x) = 1\}$
is empty.)
3.4.4
(3.29)
1 + q1 (x)
,
q2 (x)
(3.30)
As we can see, these representations are simpler in the sense that the conditions involve fewer sos multipliers (recall that the preorder contains terms corresponding to squarefree products between inequalities). Notice, however, that these
results say nothing about the degrees of the corresponding sos polynomials. It may
be possible, at least in certain cases, that the degrees appearing in simpler representations are much larger than those of more complicated ones; see, e.g., [115].
We explore some of these issues in the exercises.
Hierarchies of relaxations. All the sos conditions that we have discussed, including Positivstellensatz certificates (Theorem 3.127) and the representation theorems of Schmüdgen (Theorem 3.136) and Putinar (Theorem 3.138), depend on
the degree of the sos multipliers. Thus, each of these theorems gives rise to a corresponding hierarchy of sos relaxations, obtained by increasing the corresponding
certificate degree. For instance, when minimizing a polynomial $p(x)$ over a set
$S$ of the form (3.29), we can consider as before Positivstellensatz certificates of
the form
p(x) =
1 + q1 (x)
,
q2 (x)
(3.31)
exists when $\epsilon = 0$, where $s_0(x)$, $s_1(x)$ are sums of squares. He also showed that
as $\epsilon \to 0$, the degrees of $s_0$, $s_1$ necessarily have to go to infinity, and provided the
bounds $c_1 \epsilon^{-1/2} \le \deg(s_0) \le c_2 \epsilon^{-1/2} \log \frac{1}{\epsilon}$ for some constants $c_1$, $c_2$.
1. Give a Positivstellensatz certificate of the form (3.30) for strict positivity of
$p(x) + \epsilon$ on $S$. Does the certificate degree depend on $\epsilon$?
2. Verify that the expressions below give the best representation of the form
(3.31). Let the degree of $s_0(x)$ be equal to $4N$. Then, the optimal solution
that minimizes $\epsilon$ is
=
N
1
,
(2N + 2)2 1
s0 (x) = q0 (x)2 ,
s1 (x) = q1 (x)2 ,
where
$
%
q0 (x) = 2(N + 1) 2 F1 N, N + 2 ; 12 ; x2 ,
$
%
1
q1 (x) = x 2 F1 N 1, N + 1 ; 32 ; x2 ,
N
and 2 F1 (a, b; c, x) is the standard Gauss hypergeometric function [1, Chapter 15].
Exercise 3.141. Recall the set $S$ from Exercise 3.63:
\[ S = \{(x, y) \in \mathbb{R}^2 : x \ge 0,\ y \ge 0,\ x + y \le 1\}. \]
The polynomial $p(x, y) = xy + \epsilon$ (for $\epsilon > 0$) is strictly positive on $S$. Analyze
experimentally the smallest values of $\epsilon$, provable using the positivity certificates of
Theorems 3.136 and 3.138, as a function of certificate degree. Compare this against
the Positivstellensatz certificates (3.30).
3.5
The sets of nonnegative and sos polynomials, being convex cones, have a rich duality
3.5.1
Recall that the sets of nonnegative polynomials $P_{n,2d}$ and sums of squares $\Sigma_{n,2d}$
are proper cones in $\mathbb{R}[x]_{n,2d}$. It then follows that the corresponding duals $P^*_{n,2d}$
and $\Sigma^*_{n,2d}$ are also proper cones (in the vector space $\mathbb{R}[x]^*_{n,2d}$) and that the reverse
containment holds:
\[ \Sigma_{n,2d} \subseteq P_{n,2d} \quad\Longrightarrow\quad \Sigma^*_{n,2d} \supseteq P^*_{n,2d}. \]
What is the interpretation of these dual cones? Are there natural objects associated with them?
The dual space. Let us consider first the dual space to polynomials $\mathbb{R}[x]^*_{n,2d}$. The
elements of this vector space are linear functionals on polynomials, i.e., linear maps
of the form $\ell : \mathbb{R}[x]_{n,2d} \to \mathbb{R}$, that take a polynomial and return a real number.
There are many such functionals, and they can superficially look quite different.
For instance, some examples of such linear maps are
• evaluation of $p$ at a point $x_0 \in \mathbb{R}^n$: $p \mapsto p(x_0)$,
• integration of $p$ over a subset $S \subseteq \mathbb{R}^n$: $p \mapsto \int_S p(x)\,dx$,
• evaluation of derivatives of $p$ at a point $x_0 \in \mathbb{R}^n$: $p \mapsto \frac{\partial p}{\partial x_i \cdots \partial x_k}(x_0)$,
Dual cone of nonnegative polynomials. What about the dual cone Pn,2d
=
Dual cone of sums of squares. For the cone $\Sigma^*_{n,2d}$ (dual of sums of squares),
the situation is a bit simpler. Since the cone $\Sigma_{n,2d}$ is generated by the squares, we
have almost by definition the description $\Sigma^*_{n,2d} = \{\ell \in \mathbb{R}[x]^*_{n,2d} : \ell(q^2) \ge 0 \ \ \forall q \in \mathbb{R}[x]_{n,d}\}$. This directly gives a characterization of $\Sigma^*_{n,2d}$ as a spectrahedron; see
Exercise 3.144. However, in this case the geometric interpretation is less clear,
13
2
3. Is $P^*_{n,2d}$ or $\Sigma^*_{n,2d}$ basic semialgebraic?
Exercise 3.145. Find an extreme point of $\Sigma^*_{2,4}$ that is not a conic combination
of point evaluations. Hint: Think about the Motzkin polynomial. How would you
prove that it is not a sum of squares?
3.5.2
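tive measure. Then $\int p \, d\mu = \sum_\alpha c_\alpha \mu_\alpha \ge 0$, where $\mu_\alpha := \int x^\alpha \, d\mu$. Conversely, given
a set of numbers $\mu_\alpha$, if $\sum_\alpha c_\alpha \mu_\alpha \ge 0$ for all nonnegative $p$, then the linear functional
$\ell_\mu(p) := \sum_\alpha c_\alpha \mu_\alpha$ is in $P^*_{n,2d}$, and thus it is (up to closure) a conic combination
of point evaluations. We can interpret this as a nonnegative measure $\mu$, which will
satisfy $\mu_\alpha = \ell_\mu(x^\alpha) = \int x^\alpha \, d\mu$. Thus, we can identify (again, up to closure) the
\[ \mu_k := E[X^k] = \int x^k \, d\mu(x). \tag{3.32} \]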
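\[
a_0^2 + 2 a_0 a_1 E[X] + a_1^2 E[X^2]
= \begin{bmatrix} a_0 \\ a_1 \end{bmatrix}^T
\begin{bmatrix} \mu_0 & \mu_1 \\ \mu_1 & \mu_2 \end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \end{bmatrix}
\]
(recall that $\mu_0 = E[1] = 1$), which implies that the $2 \times 2$ matrix above must be positive semidefinite. Interestingly, this is equivalent to the inequality obtained earlier.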
The same procedure can be repeated for higher-order moments. Let $\mu = (\mu_0, \mu_1, \ldots, \mu_{2d})$ be given. By considering the expectation of the square of a generic
polynomial
\[ 0 \le E[(a_0 + a_1 X + \cdots + a_d X^d)^2], \]
we have that the higher-order moments of a random variable must satisfy
\[
H(\mu) :=
\begin{bmatrix}
\mu_0 & \mu_1 & \mu_2 & \cdots & \mu_d \\
\mu_1 & \mu_2 & \mu_3 & \cdots & \mu_{d+1} \\
\mu_2 & \mu_3 & \mu_4 & \cdots & \mu_{d+2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mu_d & \mu_{d+1} & \mu_{d+2} & \cdots & \mu_{2d}
\end{bmatrix} \succeq 0.
\tag{3.33}
\]
Notice that $H(\mu)$ is a Hankel matrix, and the diagonal elements correspond to the
even-order moments, which should obviously be nonnegative.
As we will see below, this condition is almost necessary and sufficient in
the univariate case in the sense that it characterizes the set of valid moments up to
closure.
Theorem 3.146. Let $\mu = (\mu_0, \mu_1, \ldots, \mu_{2d})$ be given, where $\mu_0 = 1$. If $\mu$ is a valid
set of moments, then the associated Hankel matrix $H(\mu)$ is positive semidefinite.
Conversely, if $H(\mu)$ is (strictly) positive definite, then $\mu$ is valid; i.e., there exists
a real-valued random variable with this set of moments.
The derivation given earlier shows the necessity of the semidefiniteness condition. Sufficiency will follow from the explicit construction of Section 3.5.5.
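The two conditions in Theorem 3.146 are easy to test for small examples. The following sketch builds $H(\mu)$ and checks semidefiniteness and definiteness exactly through principal minors; the helper names (`det`, `hankel`, `is_psd`, `is_pd`) and the sample moment vectors are our own, and the minor-based test is only sensible for small matrices (a numerical eigenvalue or Cholesky routine would be the practical choice).

```python
from fractions import Fraction as Fr
from itertools import combinations

def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fr(v) for v in row] for row in M]
    n, sign, result = len(M), 1, Fr(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fr(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            sign = -sign
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return sign * result

def hankel(mu):
    """H(mu)[i][j] = mu[i + j] for a moment vector (mu_0, ..., mu_2d)."""
    d = (len(mu) - 1) // 2
    return [[mu[i + j] for j in range(d + 1)] for i in range(d + 1)]

def is_psd(M):
    """A symmetric matrix is PSD iff every principal minor is nonnegative."""
    n = len(M)
    return all(det([[M[i][j] for j in idx] for i in idx]) >= 0
               for k in range(1, n + 1) for idx in combinations(range(n), k))

def is_pd(M):
    """Sylvester's criterion: every leading principal minor is positive."""
    return all(det([row[:k] for row in M[:k]]) > 0 for k in range(1, len(M) + 1))

gaussian = [1, 0, 1, 0, 3]     # first five moments of a standard normal
bogus = [1, 2, 1, 0, 1]        # would require the variance mu_2 - mu_1^2 = -3
print(is_pd(hankel(gaussian)), is_psd(hankel(bogus)))
```

A positive definite $H(\mu)$ guarantees validity, while a matrix that is semidefinite but singular falls in the boundary case not settled by the theorem.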
Remark 3.147. For the case of measures supported on the real line, the semidefinite condition in (3.33) characterizes the closure of the set of moments, but not
necessarily the whole set. As an example, consider $\mu = (1, 0, 0, 0, 1)$, corresponding
to the Hankel matrix
\[ H(\mu) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \]
Although this matrix is positive semidefinite, there is no nonnegative measure corresponding to those moments (notice that $\mu_2 = 0$). However, the parametrized atomic
measure given by
\[ \mu_\epsilon = \frac{\epsilon^4}{2}\, \delta\Big(x + \frac{1}{\epsilon}\Big) + (1 - \epsilon^4)\, \delta(x) + \frac{\epsilon^4}{2}\, \delta\Big(x - \frac{1}{\epsilon}\Big) \]
has as first five moments $(1, 0, \epsilon^2, 0, 1)$, and thus as $\epsilon \to 0$ they converge to those
given above.
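The moment computation in the remark can be reproduced exactly. The sketch below evaluates the first five moments of a three-atom measure with atoms at $-1/\epsilon$, $0$, $1/\epsilon$ and weights $\epsilon^4/2$, $1 - \epsilon^4$, $\epsilon^4/2$, using rational arithmetic; the function name and the sample values of $\epsilon$ are our own choices.

```python
from fractions import Fraction as Fr

def atomic_moments(eps, count=5):
    """Moments of (eps^4/2) d(x + 1/eps) + (1 - eps^4) d(x) + (eps^4/2) d(x - 1/eps)."""
    atoms = [(-1 / eps, eps**4 / 2), (Fr(0), 1 - eps**4), (1 / eps, eps**4 / 2)]
    return [sum(w * x**k for x, w in atoms) for k in range(count)]

# The second moment eps^2 shrinks to zero while the fourth moment stays equal to 1.
for eps in (Fr(1, 2), Fr(1, 10), Fr(1, 100)):
    print(eps, atomic_moments(eps))
```

The atoms escape to infinity as $\epsilon$ decreases, which is exactly why the limit point is not itself the moment vector of a measure on the real line.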
As the remark above illustrates, the fact that the semidefinite description is
correct only up to closure is a consequence of considering measures supported on
the whole real line, which is not compact. For the case of compact intervals, the
situation will be nicer, as we will see in the next section.
As we move on to the general multivariate case, however, much more serious
difficulties will appear (essentially, once again, the difference between polynomial
nonnegativity versus sums of squares). We will discuss this situation in Section 3.5.6.
3.5.3
\[
(1 + x) \Big( \sum_{i=0}^{d} a_i x^i \Big)^2, \qquad
(1 - x) \Big( \sum_{i=0}^{d} a_i x^i \Big)^2,
\tag{3.34}
\]
which are obviously nonnegative for $x \in [-1, 1]$. As before, by computing the
expectation of these polynomials, we obtain necessary conditions in terms of the
quadratic form (in the coefficients $a_i$):
\[
0 \le E\Big[ (1 \pm X) \Big( \sum_{i=0}^{d} a_i X^i \Big)^2 \Big]
= \sum_{j=0}^{d} \sum_{k=0}^{d} (\mu_{j+k} \pm \mu_{j+k+1})\, a_j a_k.
\]
Since the polynomials of the form (3.34) generate all nonnegative polynomials on
$[-1, 1]$, and this interval is compact, these conditions give a full characterization.
We formalize this in the next result.
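\[
\begin{bmatrix}
\mu_0 & \mu_1 & \mu_2 & \cdots & \mu_d \\
\mu_1 & \mu_2 & \mu_3 & \cdots & \mu_{d+1} \\
\mu_2 & \mu_3 & \mu_4 & \cdots & \mu_{d+2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mu_d & \mu_{d+1} & \mu_{d+2} & \cdots & \mu_{2d}
\end{bmatrix}
\pm
\begin{bmatrix}
\mu_1 & \mu_2 & \mu_3 & \cdots & \mu_{d+1} \\
\mu_2 & \mu_3 & \mu_4 & \cdots & \mu_{d+2} \\
\mu_3 & \mu_4 & \mu_5 & \cdots & \mu_{d+3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mu_{d+1} & \mu_{d+2} & \mu_{d+3} & \cdots & \mu_{2d+1}
\end{bmatrix}
\succeq 0.
\tag{3.35}
\]
\[ (1 - x^2) \Big( \sum_{i=0}^{d-1} a_i x^i \Big)^2, \]
which are again obviously nonnegative in $[-1, 1]$. This yields the following lemma.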
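Lemma 3.149. There exists a nonnegative finite measure supported in $[-1, 1]$ with
moments $(\mu_0, \mu_1, \ldots, \mu_{2d})$ if and only if
\[
\begin{bmatrix}
\mu_0 & \mu_1 & \mu_2 & \cdots & \mu_d \\
\mu_1 & \mu_2 & \mu_3 & \cdots & \mu_{d+1} \\
\mu_2 & \mu_3 & \mu_4 & \cdots & \mu_{d+2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mu_d & \mu_{d+1} & \mu_{d+2} & \cdots & \mu_{2d}
\end{bmatrix} \succeq 0,
\]
\[
\begin{bmatrix}
\mu_0 & \mu_1 & \mu_2 & \cdots & \mu_{d-1} \\
\mu_1 & \mu_2 & \mu_3 & \cdots & \mu_d \\
\mu_2 & \mu_3 & \mu_4 & \cdots & \mu_{d+1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mu_{d-1} & \mu_d & \mu_{d+1} & \cdots & \mu_{2d-2}
\end{bmatrix}
-
\begin{bmatrix}
\mu_2 & \mu_3 & \mu_4 & \cdots & \mu_{d+1} \\
\mu_3 & \mu_4 & \mu_5 & \cdots & \mu_{d+2} \\
\mu_4 & \mu_5 & \mu_6 & \cdots & \mu_{d+3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mu_{d+1} & \mu_{d+2} & \mu_{d+3} & \cdots & \mu_{2d}
\end{bmatrix}
\succeq 0.
\tag{3.36}
\]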
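As a small illustration of Lemma 3.149, the sketch below forms the two matrices for $d = 2$ and tests them exactly. We assume the second condition has entries $\mu_{j+k} - \mu_{j+k+2}$, which is what the expectation of $(1 - X^2)(\sum_i a_i X^i)^2$ gives; the helper names and the two sample moment vectors (uniform on $[-1, 1]$ and standard Gaussian) are our own.

```python
from fractions import Fraction as Fr
from itertools import combinations

def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fr(v) for v in row] for row in M]
    n, sign, result = len(M), 1, Fr(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fr(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            sign = -sign
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return sign * result

def is_psd(M):
    """PSD iff every principal minor is nonnegative (adequate for small matrices)."""
    n = len(M)
    return all(det([[M[i][j] for j in idx] for i in idx]) >= 0
               for k in range(1, n + 1) for idx in combinations(range(n), k))

def interval_conditions(mu):
    """The two matrices of the lemma for moments (mu_0, ..., mu_2d)."""
    d = (len(mu) - 1) // 2
    H = [[mu[i + j] for j in range(d + 1)] for i in range(d + 1)]
    L = [[mu[i + j] - mu[i + j + 2] for j in range(d)] for i in range(d)]
    return H, L

# Uniform distribution on [-1, 1]: mu_k = 1/(k+1) for even k, 0 for odd k.
uniform = [Fr(1, k + 1) if k % 2 == 0 else Fr(0) for k in range(5)]
# Standard Gaussian: valid moments, but the support is the whole real line.
gaussian = [Fr(1), Fr(0), Fr(1), Fr(0), Fr(3)]

uniform_ok = all(is_psd(M) for M in interval_conditions(uniform))
gaussian_ok = all(is_psd(M) for M in interval_conditions(gaussian))
print(uniform_ok, gaussian_ok)
```

The Gaussian moments pass the Hankel test but fail the second condition, which is how the lemma detects mass outside the interval.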
3.5.4
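\[
\begin{bmatrix} \mu_0 & \mu_1 \\ \mu_1 & \mu_2 \end{bmatrix}
\pm
\begin{bmatrix} \mu_1 & \mu_2 \\ \mu_2 & \mu_3 \end{bmatrix}
\succeq 0, \qquad \mu_0 = 1.
\]
Since both semidefinite constraints are given by $2 \times 2$ matrices, the moment space
is the intersection of two circular cones.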
Exercise 3.152. Explain Remark 3.147 from this geometric perspective. What can
you say about the closedness of the convex hull of the moment curve in $\mathbb{R}^d$? Show
that if we consider closed intervals (i.e., $t \in [a, b]$), then the corresponding convex
hull is compact. What happens in the unconstrained case, i.e., when $t \in (-\infty, \infty)$?
3.5.5
Constructing a Measure
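\[
\begin{bmatrix}
\mu_0 & \mu_1 & \cdots & \mu_{n-1} \\
\mu_1 & \mu_2 & \cdots & \mu_n \\
\vdots & \vdots & \ddots & \vdots \\
\mu_{n-1} & \mu_n & \cdots & \mu_{2n-2}
\end{bmatrix}
\begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_{n-1} \end{bmatrix}
= -
\begin{bmatrix} \mu_n \\ \mu_{n+1} \\ \vdots \\ \mu_{2n-1} \end{bmatrix}.
\tag{3.38}
\]
The Hankel matrix on the left-hand side of this equation is $H(\mu)$, and thus the
linear system in (3.38) has a unique solution if the matrix is positive definite. In
this case, we let $x_i$ be the roots of the univariate polynomial
\[ x^n + c_{n-1} x^{n-1} + \cdots + c_1 x + c_0 = 0, \]
which are all real and distinct (why?). We can then obtain the corresponding
weights $w_i$ by solving the nonsingular Vandermonde system given by
\[ \sum_{i=1}^{n} w_i x_i^j = \mu_j \qquad (0 \le j \le n - 1). \]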
In Exercise 3.155 we will prove that this method actually works (i.e., the atoms
xi are real and distinct, the weights wi are nonnegative, and the moments are the
correct ones).
Example 3.153. Consider the problem of finding a nonnegative measure whose
first six moments are given by $(1, 1, 2, 1, 6, 1)$. The solution of the linear system (3.38) yields the polynomial
\[ x^3 - 4x^2 - 9x + 16 = 0, \]
whose roots are $-2.4265$, $1.2816$, and $5.1449$. The corresponding weights are $0.0772$,
$0.9216$, and $0.0012$, respectively. It can be easily verified that the found measure
indeed satisfies the desired constraints.
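The computation in Example 3.153 can be reproduced with a few lines of code. The sketch below follows the three steps of the method (Hankel system, roots of the resulting polynomial, Vandermonde system for the weights) using only the standard library. The helper names `solve`, `real_roots`, and `prony` are ours, and the root finder (grid scan plus bisection) is a simple stand-in that assumes the roots are real and distinct, as they are here.

```python
from fractions import Fraction as Fr

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination (Fractions or floats)."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [a - M[r][c] * pv for a, pv in zip(M[r], M[c])]
    return [row[n] for row in M]

def real_roots(c, step=1e-2):
    """Roots of x^n + c[n-1] x^(n-1) + ... + c[0], assumed real and distinct."""
    n = len(c)
    p = lambda x: x**n + sum(float(ck) * x**k for k, ck in enumerate(c))
    bound = 1 + max(abs(float(ck)) for ck in c)   # Cauchy bound on |root|
    roots, lo = [], -bound
    while lo < bound:
        hi = lo + step
        if p(lo) * p(hi) < 0:                      # sign change: bisect
            a, b = lo, hi
            for _ in range(60):
                mid = (a + b) / 2
                if p(a) * p(mid) <= 0:
                    b = mid
                else:
                    a = mid
            roots.append((a + b) / 2)
        lo = hi
    return roots

def prony(mu):
    """Atoms and weights of an n-atom measure matching mu_0, ..., mu_{2n-1}."""
    n = len(mu) // 2
    H = [[mu[i + j] for j in range(n)] for i in range(n)]
    c = solve(H, [-mu[n + i] for i in range(n)])   # linear system (3.38)
    x = real_roots(c)
    V = [[xi**j for xi in x] for j in range(n)]    # Vandermonde system
    w = solve(V, [float(mu[j]) for j in range(n)])
    return c, x, w

mu = [Fr(v) for v in (1, 1, 2, 1, 6, 1)]
c, atoms, weights = prony(mu)
print("coefficients (c0, c1, c2):", c)
print("atoms:", atoms)
print("weights:", weights)
```

Solving the Hankel system in rational arithmetic keeps the polynomial coefficients exact; only the root finding and the weights are computed in floating point.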
Remark 3.154. The measure recovery method described above always works correctly, provided the computations are done in exact arithmetic. In most practical
applications, it is necessary or convenient to use floating-point computations. Furthermore, in many settings the moment information may be noisy, and therefore the
matrices may contain some (hopefully small) perturbations from their nominal values. For these reasons, it is of interest to understand sensitivity issues at the level
of what is intrinsic about both the problem (conditioning) and the specific algorithm
used (numerical stability).
When using floating-point arithmetic, this technique may run into numerical
difficulties. On the conditioning side, it is well known that from the numerical viewpoint, the monomial basis (with respect to which we are taking moments) is a bad
basis for the space of polynomials. On the numerical stability side, the algorithm
above does a number of inefficient calculations, such as explicitly computing the coefficients $c_i$ of the polynomial corresponding to the support of the measure. Better
approaches involve, for instance, directly computing the nodes $x_i$ as the generalized
eigenvalues of a matrix pencil; see, e.g., [51, 52].
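One way to get a feeling for the conditioning issue is to look at a familiar special case. For the uniform measure on $[0, 1]$ the moments are $\mu_k = 1/(k+1)$, so the Hankel moment matrix is the Hilbert matrix, a standard example of an ill-conditioned matrix. The sketch below computes its determinant exactly for small sizes; the determinant is used here only as a rough proxy for near-singularity (it is not the condition number), and the helper names are ours.

```python
from fractions import Fraction as Fr

def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fr(v) for v in row] for row in M]
    n, sign, result = len(M), 1, Fr(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fr(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            sign = -sign
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return sign * result

def uniform_hankel(n):
    """n x n Hankel moment matrix of the uniform measure on [0, 1]."""
    return [[Fr(1, i + j + 1) for j in range(n)] for i in range(n)]

dets = {n: det(uniform_hankel(n)) for n in range(1, 7)}
for n, value in dets.items():
    print(n, float(value))
```

The determinants collapse toward zero very quickly with the size, so small perturbations of the moments can change the solution of the Hankel system substantially when monomial moments are used.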
Exercise 3.155. Prove that the algorithm described above always produces a
valid measure, provided the initial matrix of moments is positive definite. Hint:
Show that if $p(x)$ is a polynomial that vanishes at the points $x_i$ then $E[p(X)^2] = 0$.
From this, using the assumed positive definiteness of the Hankel matrix, determine
what equations $p(x)$ must satisfy. What is the relation between this matrix and the
Hermite form?
Exercise 3.156.
1. Find a discrete measure having the same first eight moments as a standard
Gaussian distribution of zero mean and unit variance.
2. What does the previous result imply if we are interested in computing integrals
of the type
\[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} p(x)\, e^{-x^2/2} \, dx, \]
where $p(x)$ is a polynomial of degree less than eight? What would you do if
$p(x)$ is an arbitrary (smooth) function?
3. Use these ideas to give an approximate numerical value of the definite integral
\[ \int_{-\infty}^{\infty} \cos(2x + 1)\, e^{-2x^2} \, dx. \]
How does your answer compare with the exact value $\sqrt{\pi/(2e)}\, \cos(1)$?
Note. In the general case where we are matching $2d$ moments of a standard Gaussian, it can be shown
that the support of these discrete measures will be given by
the $d$ zeros of $H_d(x/\sqrt{2})$, where $H_d$ is the standard Hermite polynomial of degree $d$.
These numerical techniques are called Gaussian quadrature; see, e.g., [116, 49] for
details.
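The note can be checked directly for $d = 4$. The zeros of $H_4(x/\sqrt{2})$ are $\pm\sqrt{3 \pm \sqrt{6}}$, and the closed-form weights used below are the standard four-point Gauss-Hermite weights normalized to sum to one. The sketch compares the moments of this four-atom measure with those of a standard Gaussian; the variable and function names are ours.

```python
import math

s6 = math.sqrt(6)
# Zeros of H_4(x / sqrt(2)), i.e. roots of x^4 - 6 x^2 + 3.
nodes = [-math.sqrt(3 + s6), -math.sqrt(3 - s6), math.sqrt(3 - s6), math.sqrt(3 + s6)]
# Corresponding quadrature weights, normalized as a probability measure.
weights = [(3 - s6) / 12, (3 + s6) / 12, (3 + s6) / 12, (3 - s6) / 12]

def gaussian_moment(k):
    """E[X^k] for X ~ N(0, 1): zero for odd k, (k - 1)!! for even k."""
    return 0 if k % 2 else math.prod(range(1, k, 2))

quad_moments = [sum(w * x**k for x, w in zip(nodes, weights)) for k in range(9)]
for k, m in enumerate(quad_moments):
    print(k, round(m, 10), gaussian_moment(k))
```

Moments of order 0 through 7 agree, while the eighth moment of the atomic measure differs from the Gaussian value 105, as expected for a measure with only four atoms.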
Exercise 3.157. What is the geometric interpretation of the atomic measure
produced by the algorithm described in this section? Explain your answer in terms
of Figure 3.10 and the set of moments $\mu = \big(1, \tfrac{1}{5}, \tfrac{1}{2}, \tfrac{1}{7}\big)$.
3.5.6
The same questions we have considered so far in this section for the univariate case
can be formulated for nonnegative measures in several variables. Concretely, given a
set of numbers $\mu_\alpha$, with $\alpha \in \mathbb{N}^n$ and $|\alpha| \le 2d$, does there exist a nonnegative measure
in $\mathbb{R}^n$ matching these moments? By our earlier discussions, this is essentially the
can produce tighter outer approximations to the set $P^*_{n,2d}$ that improve upon the
straightforward outer bound $\Sigma^*_{n,2d}$ while still being computationally tractable. To
do this, we simply dualize the hierarchies of inner approximations to the set of nonnegative polynomials that we obtained via sos methods. Each variation of the sos
methods that we have seen (Positivstellensatz, Pólya/Reznick theorem, Schmüdgen,
and Putinar representations) can be used to produce a matching sequence of dual
approximations to the corresponding dual cone. For concreteness, we illustrate this
discussion with two specific examples.
Polynomial multipliers and rational moments. Recall from Section 3.2.6 that
a way of producing stronger sos conditions in the multivariate case was to multiply
the given polynomial $p(x)$ by a fixed sos factor $q(x)$. What does this construction
correspond to on the dual side?
A dual interpretation of this method is in terms of rational moments, i.e.,
expectations of rational functions
\[ \nu_\alpha = E[X^\alpha / q(X)]. \]
Indeed, one can easily write necessary conditions that these rational moments
should satisfy, of the form
\[ E[p(X)^2 / q(X)] \ge 0, \tag{3.39} \]
which, as before (after parametrizing polynomials $p(x)$ up to a given degree), give
spectrahedral conditions on the rational moments $\nu_\alpha$. Furthermore, the standard
moments $\mu_\alpha = E[X^\alpha]$ are
given by a linear transformation of the rational moments $\nu_\alpha$, since if $q(x) = \sum_\beta c_\beta x^\beta$, then
\[
\mu_\alpha = E[X^\alpha] = E[q(X)\,(X^\alpha / q(X))] = \sum_\beta c_\beta\, E[X^{\alpha+\beta} / q(X)] = \sum_\beta c_\beta\, \nu_{\alpha+\beta}.
\]
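The linear relation between standard and rational moments can be verified on a toy example. The sketch below uses a hypothetical three-atom measure on the real line and the multiplier $q(x) = 1 + x^2$, both chosen by us purely for illustration, and checks the identity exactly in rational arithmetic.

```python
from fractions import Fraction as Fr

# Hypothetical discrete measure: (point, weight) pairs with weights summing to 1.
atoms = [(Fr(-1), Fr(1, 4)), (Fr(1, 2), Fr(1, 2)), (Fr(2), Fr(1, 4))]
# Coefficients of q(x) = 1 + x^2, indexed by exponent.
q_coeffs = {0: Fr(1), 2: Fr(1)}

def q(x):
    return sum(c * x**b for b, c in q_coeffs.items())

def mu(a):
    """Standard moment E[X^a]."""
    return sum(w * x**a for x, w in atoms)

def nu(a):
    """Rational moment E[X^a / q(X)]."""
    return sum(w * x**a / q(x) for x, w in atoms)

# Check mu_a = sum_b c_b nu_{a+b} for the first few exponents.
identity_holds = all(
    mu(a) == sum(c * nu(a + b) for b, c in q_coeffs.items()) for a in range(6)
)
print(identity_holds)
```

Because the map from rational moments to standard moments is linear, a spectrahedral description in the rational moments projects to a convex description of the standard moments.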
$E[x^4 y^2] + E[x^2 y^4]$ and $\beta = \mu_{22} = E[x^2 y^2]$. The simple sos approximation $\Sigma^*_{2,6} \supseteq P^*_{2,6}$
in this case yields the trivial orthant outer bound $\alpha \ge 0$, $\beta \ge 0$.
We can produce tighter bounds by considering the multiplier-based relaxations
described earlier. Let us describe the geometry first. For this, define the Motzkin-like family of polynomials $M_t(x, y) = t^3 x^4 y^2 + t^3 x^2 y^4 + 1 - 3 t^2 x^2 y^2$ (for $t = 1$, this
is the standard Motzkin polynomial). It can be shown (e.g., via the arithmetic-geometric inequality or Exercise 3.160) that $M_t(x, y)$ is nonnegative for $t \ge 0$.
Therefore, we have the parametrized family of linear inequalities
\[ 0 \le E[M_t(X, Y)] = t^3 \alpha + 1 - 3 t^2 \beta \]
3
p,
which is a quadratic form in the coefficients $a_i$. Expressing this in matrix form, one
obtains a $14 \times 14$ matrix$^4$ whose entries are the rational moments $\nu_{jk}$. We also have
the normalization condition $E[1] = \nu_{20} + \nu_{02} = 1$. Since $(\alpha, \beta) = (\mu_{24} + \mu_{42}, \mu_{22})$,
the desired projection is then given by $\nu_{jk} \mapsto (\nu_{62} + 2\nu_{44} + \nu_{26},\ \nu_{42} + \nu_{24})$.
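The geometry of the Motzkin-like family can be explored numerically. In the sketch below, `a` stands for $E[x^4 y^2] + E[x^2 y^4]$ and `b` for $E[x^2 y^2]$ (our own names). The code checks nonnegativity of $M_t$ on a grid and evaluates the tightest of the inequalities $t^3 a + 1 - 3 t^2 b \ge 0$ over sampled $t \ge 0$. Minimizing over $t$ by hand (the minimizer is $t = 2b/a$) suggests that, for positive $a$ and $b$, the family is equivalent to $a^2 \ge 4 b^3$; this closed form is our own side computation.

```python
def motzkin_t(t, x, y):
    """M_t(x, y) = t^3 x^4 y^2 + t^3 x^2 y^4 + 1 - 3 t^2 x^2 y^2."""
    return t**3 * x**4 * y**2 + t**3 * x**2 * y**4 + 1 - 3 * t**2 * x**2 * y**2

def tightest_inequality(a, b, grid=20_000, t_max=20.0):
    """Minimum over sampled t in [0, t_max] of t^3 a + 1 - 3 t^2 b."""
    values = []
    for k in range(grid + 1):
        t = k * t_max / grid
        values.append(t**3 * a + 1 - 3 * t**2 * b)
    return min(values)

# Grid check of nonnegativity (a sanity check, not a proof).
pts = [i / 4 for i in range(-12, 13)]
min_value = min(motzkin_t(t, x, y)
                for t in (0.0, 0.5, 1.0, 2.0, 5.0) for x in pts for y in pts)
print("smallest sampled value of M_t:", min_value)

# A point mass at (1, 1) has a = 2, b = 1 and lies on the boundary.
print(tightest_inequality(2.0, 1.0), tightest_inequality(1.9, 1.0))
```

The pair $(a, b) = (2, 1)$ makes the inequality tight at $t = 1$, matching the zero of the Motzkin polynomial at $(1, 1)$, while $(1.9, 1)$ violates it and therefore cannot arise from a nonnegative measure.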
Moments on compact sets. Consider a basic semialgebraic set $S = \{x \in \mathbb{R}^n : g_1(x) \ge 0, \ldots, g_m(x) \ge 0\}$. We want to describe (or approximate) the set of
valid moments of nonnegative measures supported on $S$.
4 In this specific case, the problem can be much simplified by exploiting the sparsity and
symmetry present in the problem. For simplicity, the details are omitted.
As before, we can easily write necessary conditions that the moments should
satisfy by computing expectations of polynomials that are obviously nonnegative
on $S$. Since squares are certainly nonnegative, and so are the products of squares
with the defining polynomials $g_i$, we can consider the expressions
\[
E[p(X)^2] \ge 0, \qquad E[g_1(X)\, p(X)^2] \ge 0, \qquad \ldots, \qquad E[g_m(X)\, p(X)^2] \ge 0,
\tag{3.40}
\]
where $p(x)$ are arbitrary polynomials. Exactly as in the univariate case, imposing
this condition for all $p(x)$ up to a certain degree, these yield quadratic forms in
the coefficients of $p(x)$ that depend linearly on the moments $\mu_\alpha$. Thus, the conditions (3.40) give a family of spectrahedral approximations of the set of moments
of $S$-supported nonnegative measures. By increasing the degree of the polynomial $p(x)$, tighter approximations are obtained. Under the right assumptions
(essentially, if we can approximate the set of nonnegative polynomials), this dual
hierarchy will approximate the set of moments arbitrarily well. For instance, recall
from Section 3.4.4 that this will be the case if qmodule$(g_1, \ldots, g_m)$ satisfies the
Archimedean property of Definition 3.137 (and thus, $S$ is compact), as was done
in [72]. Notice that these approximations can be strengthened by including products
of the form $E[g_i(X) \cdots g_k(X)\, p(X)^2] \ge 0$, which correspond to the distinction between preorders and quadratic modules, or, equivalently, Schmüdgen versus Putinar
representations.
Constructing multivariate measures. In the univariate case, we have discussed
in Section 3.5.5 how to produce an atomic measure matching a given finite set of
moments using Prony's method. This is possible because in that case there is a full
characterization of the moment space. In the multivariate case, as we have seen,
even the decision question ("Are these valid moments?") is NP-hard, and thus,
in general, unless further assumptions are satisfied, no such efficient procedure is
available.
Given a truncated moment sequence (or, equivalently, a functional $\ell \in \mathbb{R}[x]^*_{n,2d}$),
the positivity condition $\ell(p^2) \ge 0$ is of course necessary for the existence of a
nonnegative measure. A well-known case where it is possible to construct such
a measure is whenever the flat extension property [34] holds. This is a condition on the given moment sequence that requires the rank of the quadratic form
$p \mapsto \ell(p^2)$ to remain the same when considering polynomials $p$ of degree $d$ or $d + 1$
for some value of $d$. Whenever this condition holds, a natural generalization of
the method described in the univariate case can be applied to obtain an atomic
measure matching the given moment sequence. The basic idea of this construction is sketched below and appears in a number of related forms in the literature
(e.g., Gelfand–Neimark–Segal construction, Stickelberger/Stetter–Möller/eigenvalue
method for polynomial equations [32, 121], etc.). Under the flat extension assumption, one can define finite-dimensional commuting multiplication operators (i.e.,
matrices) associated to each of the variables $x_i$. To do this, one considers the linear
maps $M_{x_i} : f \mapsto x_i f$, where $M_{x_i} : \mathbb{R}[x]_{n,d}/S \to \mathbb{R}[x]_{n,d}/S$ and $S$ is the subspace
$\{p \in \mathbb{R}[x]_{n,d} : \ell(p^2) = 0\}$. By construction, these matrices pairwise commute, and
they can be simultaneously diagonalized. From their diagonal representation, one
can directly read the components of the support of the measure and then obtain the
corresponding weights. For a full exposition of the procedure, we refer the reader
to [63, 73].
3.6
3.6.1
Copositive Matrices
\[ x^T M x \ge 0. \]
\[ \min_{Ax \ge 0,\ x^T x = 1} x^T Q x. \]
It is easy to verify that $M$ is copositive if and only if the form $P(z)$ is nonnegative,
i.e., $P(z) \ge 0$ for all $z \in \mathbb{R}^n$. This shows that we can indeed identify the cone
\[ M = P + N, \qquad P \succeq 0, \qquad N_{ij} \ge 0 \ \text{ for } i \ne j \]
(without loss of generality, we can take $N_{ii} = 0$). If this holds, then $M$ is copositive.
The condition in Lemma 3.164 is only sufficient for copositivity. A well-known
example showing this is the matrix
\[
H = \begin{bmatrix}
1 & -1 & 1 & 1 & -1 \\
-1 & 1 & -1 & 1 & 1 \\
1 & -1 & 1 & -1 & 1 \\
1 & 1 & -1 & 1 & -1 \\
-1 & 1 & 1 & -1 & 1
\end{bmatrix}.
\tag{3.41}
\]
This matrix, originally introduced by A. Horn, is copositive even though it does not
satisfy the $P + N$ condition of Lemma 3.164.
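Two simple properties of the Horn matrix can be observed numerically: it is not positive semidefinite, yet its quadratic form does not appear to take negative values on the nonnegative orthant. The sketch below evaluates the form on one mixed-sign vector and on random nonnegative samples. The random search is only a sanity check under our own sampling choices; it does not prove copositivity, and it says nothing about the $P + N$ condition, which would require solving a semidefinite program.

```python
import math
import random

H = [[ 1, -1,  1,  1, -1],
     [-1,  1, -1,  1,  1],
     [ 1, -1,  1, -1,  1],
     [ 1,  1, -1,  1, -1],
     [-1,  1,  1, -1,  1]]

def quad(M, x):
    """Quadratic form x^T M x."""
    n = len(x)
    return sum(M[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

# A mixed-sign vector on which the form is negative, so H is not PSD.
v = [math.cos(2 * math.pi * i / 5) for i in range(5)]
indefinite_value = quad(H, v)

# Random search over the nonnegative orthant.
rng = random.Random(0)
sampled_min = min(quad(H, [rng.random() for _ in range(5)]) for _ in range(20_000))
print(indefinite_value, sampled_min)
```

The vector `v` exploits the circulant structure of $H$, which is why a negative value is easy to exhibit once sign changes are allowed.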
This motivates the definition of a natural hierarchy of approximations to the
copositive cone [89, 35]. Consider the family of $2(r + 2)$-forms given by
\[ P_r(z) = \Big( \sum_{i=1}^{n} z_i^2 \Big)^r P(z), \tag{3.42} \]
and define the cones $K_r = \{M \in S^n : P_r(z) \text{ is sos}\}$ (for simplicity, we omit the
dependence on $n$). It is easy to see that if $P_r$ is a sum of squares, then $P_{r+1}$ is also
a sum of squares. The converse proposition, however, does not necessarily hold;
i.e., $P_{r+1}$ could be a sum of squares even if $P_r$ is not. Additionally, if $P_r(z)$ is
nonnegative, then so is $P(z)$. Thus, by testing whether $P_r(z)$ is a sum of squares,
we can guarantee the nonnegativity of $P(z)$ and, as a consequence, the copositivity
of $M$. This yields the hierarchy of inclusions
\[ S^n_+ + \mathbb{R}_+^{\binom{n}{2}} = K_0 \subseteq K_1 \subseteq \cdots \subseteq K_r \subseteq \cdots \subseteq C_n, \tag{3.43} \]
where (abusing notation) the first equality expresses the statement of Lemma 3.164.
The containment between these cones is in general strict. For instance, the Horn
matrix presented in (3.41) is not in $K_0$, but it is in $K_1$; see Exercise 3.170.
Clearly, this hierarchy gives computable conditions that are at least as powerful as the $P + N$ test of Lemma 3.164. But how conservative is this procedure?
Does it approximate the copositive cone $C_n$ to arbitrary precision? It follows from
our discussion of Pólya's theorem in Section 3.2.6 that for any strictly copositive
matrix $M$, there is a finite $r$ for which $M \in K_r$. However, the minimum $r$ cannot be chosen as a constant (uniformly over all strictly copositive matrices). In
general, the known lower bounds for $r$ usually involve a "condition number" for
the form $P(z)$: the minimum $r$ grows as the form tends to degeneracy (nontrivial
solutions). This is consistent with the computational complexity results mentioned
earlier: if the value of $r$ were uniformly bounded above, then we could always produce a polynomial-time certificate for copositivity (namely, an sos decomposition of
$P_r(z)$), contradicting NP $\ne$ co-NP.
Circulant copositive matrices. In general, particularly in high dimensions, the
geometry of the copositive cone is very complicated. As such, it is often useful
to consider low-dimensional sections, where we can gain some intuition and understanding. A nice case, which we analyze next, is the case of circulant (or cyclic)
matrices.
An $n \times n$ matrix is circulant if its $(i, j)$ entry depends only on $|i - j| \bmod n$.
We denote the subspace of $n \times n$ circulant matrices by $O_n$. For the case of $n = 5$,
we provide below a complete characterization of the circulant copositive matrices
and the associated relaxations. A general $5 \times 5$ circulant matrix has the form
\[
M(a, b, c) = \begin{bmatrix}
a & b & c & c & b \\
b & a & b & c & c \\
c & b & a & b & c \\
c & c & b & a & b \\
b & c & c & b & a
\end{bmatrix}.
\tag{3.44}
\]
For circulant matrices, the second relaxation $K_1$ will be enough to recognize copositivity, i.e., $C_5 \cap O_5 = K_1 \cap O_5$. Notice that if $a = 0$, then all the other elements
must be nonnegative. For later reference, we define the constant
= (1 + 5)/4 0.809.
Theorem 3.165. Consider a circulant matrix M = M (a, b, c) as in (3.44). Then
the following hold.
1. The matrix M is in K0 if and only if
a 0,
a + b 0,
a + c 0,
a + 2b + 2c 0.
a + b 0,
a + c 0,
a + 2b + 2c 0,
k
vi viT ,
i=1
nij xi xj ,
i=j
n
i=1
xi
(xT M x) =
n
i=1
xi (xT Qi x) +
ijk xi xj xk ,
i=j=k
3.6.2
Lyapunov Functions
makes possible searching over affinely parametrized polynomial or rational Lyapunov functions for systems with dynamics of the form
\[ \dot{x}_i(t) = f_i(x(t)) \quad \text{for } i = 1, \ldots, n, \tag{3.45} \]
where the functions $f_i$ are polynomials or rational functions. Recall that, for a
system to be globally asymptotically stable, it is sufficient to prove the existence of
a Lyapunov function that satisfies
\[ V(x) > 0, \qquad \dot{V}(x) = \Big( \frac{\partial V}{\partial x} \Big)^T f(x) < 0 \]
for all $x \in \mathbb{R}^n \setminus \{0\}$, where without loss of generality we have assumed that the
dynamical system (3.45) has an equilibrium at the origin (see, e.g., [67]).
As mentioned earlier, we will consider candidate Lyapunov functions that are
polynomials (or rational functions). Since polynomial nonnegativity is computationally hard, we will instead impose that the candidate Lyapunov function $V(x)$
and its Lie derivative $\dot{V}(x)$ both satisfy the (possibly stronger) condition:$^5$
\[ V(x) \text{ is sos}, \qquad -\dot{V}(x) = -\Big( \frac{\partial V}{\partial x} \Big)^T f(x) \text{ is sos}. \]
Parametrizing a candidate Lyapunov function (e.g., by considering all possible polynomials of degree less than or equal to $2d$), the conditions given above can be expressed as sos constraints in terms of the coefficients of the Lyapunov function.
Since both conditions are affine in the coefficients of $V(x)$, using the techniques
described earlier in this chapter, these can be easily transformed into a standard
semidefinite optimization formulation.
As an example, consider the following nonlinear dynamical system that corresponds to the Moore–Greitzer model of a jet engine with stabilizing feedback
operating in the no-stall mode (see, e.g., [71]). The dynamic equations take the form
\[
\begin{aligned}
\dot{x} &= -y - \tfrac{3}{2} x^2 - \tfrac{1}{2} x^3, \\
\dot{y} &= 3x - y.
\end{aligned}
\tag{3.46}
\]
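Before computing a Lyapunov function, one can get a qualitative picture of (3.46) by simulation. The sketch below integrates the system with a classical fourth-order Runge-Kutta scheme from a few initial conditions of our own choosing and reports the final states. The step size and horizon are assumptions made for illustration, and a handful of convergent trajectories is of course no substitute for a stability proof.

```python
def f(x, y):
    """Right-hand side of the system: (x_dot, y_dot)."""
    return -y - 1.5 * x**2 - 0.5 * x**3, 3 * x - y

def rk4(x, y, dt=0.01, steps=4000):
    """Integrate for steps * dt time units with classical Runge-Kutta."""
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + dt / 2 * k1[0], y + dt / 2 * k1[1])
        k3 = f(x + dt / 2 * k2[0], y + dt / 2 * k2[1])
        k4 = f(x + dt * k3[0], y + dt * k3[1])
        x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return x, y

starts = [(1.0, 1.0), (-2.0, 3.0), (3.0, -3.0), (-4.0, -4.0)]
finals = [rk4(x0, y0) for x0, y0 in starts]
for start, final in zip(starts, finals):
    print(start, "->", final)
```

All sampled trajectories approach the origin, which is consistent with the existence of a global Lyapunov function; the sos formulation is what certifies this for every initial condition.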
3.6.3
Probability Bounds
Two of the most useful results in basic probability theory are the classic Markov
and Chebyshev inequalities. Markov's inequality states that if $X$ is a nonnegative
scalar random variable, then, for all $a > 0$,
\[ P(X \ge a) \le \frac{E[X]}{a}. \tag{3.47} \]
Similarly, Chebyshev's inequality says that for any random variable $X$ with mean
$\mu$ and variance $\sigma^2$, we have
\[ P(|X - \mu| \ge a) \le \frac{\sigma^2}{a^2}. \tag{3.48} \]
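Both inequalities are easy to check against a distribution whose tails are known in closed form. The sketch below uses an exponential random variable with rate one (mean 1, variance 1), which is our own choice of test case, and compares the exact tail probabilities with the Markov and Chebyshev bounds.

```python
import math

def exponential_tail(a):
    """P(X >= a) for X ~ Exp(1), which has mean 1 and variance 1."""
    return math.exp(-a) if a > 0 else 1.0

def two_sided_tail(a, mean=1.0):
    """P(|X - mean| >= a) for X ~ Exp(1)."""
    upper = math.exp(-(mean + a))                       # P(X >= mean + a)
    lower = 1.0 - math.exp(-(mean - a)) if a < mean else 0.0   # P(X <= mean - a)
    return upper + lower

# Columns: a, exact P(X >= a), Markov bound, exact P(|X - 1| >= a), Chebyshev bound.
rows = [(a, exponential_tail(a), 1.0 / a, two_sided_tail(a), 1.0 / a**2)
        for a in (0.5, 1.0, 2.0, 4.0)]
for row in rows:
    print(row)
```

The bounds hold in every row but are far from tight for this distribution, which is expected since they use only the first one or two moments.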
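\[
\begin{array}{ll}
\text{minimize} & \displaystyle\sum_{k=0}^{d} c_k \mu_k \\
\text{subject to} & \displaystyle\sum_{k=0}^{d} c_k x^k \ge 1 \quad \forall x \in S, \\
& \displaystyle\sum_{k=0}^{d} c_k x^k \ge 0 \quad \forall x \in \Omega.
\end{array}
\tag{3.50}
\]
Notice that when $\Omega$ and $S$ are (unions of) univariate intervals, it follows from the
characterizations given in Section 3.3.1 that this is an sos optimization program of
the form discussed in Section 3.1.7.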
We claim that any feasible solution of (3.49) gives a valid upper bound on
$P(X \in S)$. To see this, notice that if $1_S(x)$ is the indicator function of the set $S$ (i.e., it
is equal to 1 if $x \in S$ and 0 otherwise), then the constraints in (3.49) imply the inequality
$1_S(x) \le p(x)$ for all $x \in \Omega$. It then follows that
$$P(X \in S) = \int 1_S(x)\, dP(x) \le \int p(x)\, dP(x) = E[p(X)].$$
In simpler terms, these bounds work by approximating (from above, in the case
of upper bounds) the indicator function of the event S by a polynomial. Since we
know the moments of X, we can compute in closed form the expectation of this
polynomial. By optimizing over the coefficients $c_k$, we find the best polynomial
approximation of the indicator function and thus the best upper bound provable by
this method.
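The mechanism can be illustrated with a few lines of code: any polynomial that dominates the indicator of $S$ on the support yields a bound that is computable from the moments alone. In the sketch below the support $[0, 4]$, the event $S = [3, 4]$, the moments $(1, 1, 2)$, and the two candidate polynomials are all hypothetical choices made for illustration, and feasibility is only checked on a grid rather than certified by an sos decomposition.

```python
from fractions import Fraction

def poly_eval(c, x):
    """Evaluate p(x) = sum_k c[k] * x**k."""
    return sum(ck * x**k for k, ck in enumerate(c))

def is_feasible_on_grid(c, omega, S, n_grid=400):
    """Grid check of p >= 0 on omega and p >= 1 on S (not a certificate)."""
    lo, hi = omega
    for i in range(n_grid + 1):
        x = lo + (hi - lo) * Fraction(i, n_grid)
        val = poly_eval(c, x)
        if val < 0:
            return False
        if S[0] <= x <= S[1] and val < 1:
            return False
    return True

def moment_bound(c, moments):
    """E[p(X)] = sum_k c[k] * mu_k, computed from the moments only."""
    return sum(ck * mk for ck, mk in zip(c, moments))

# Hypothetical data: moments (mu_0, mu_1, mu_2) of a variable supported on [0, 4].
moments = [Fraction(1), Fraction(1), Fraction(2)]
omega = (Fraction(0), Fraction(4))
S = (Fraction(3), Fraction(4))

linear = [Fraction(0), Fraction(1, 3), Fraction(0)]      # p(x) = x / 3
quadratic = [Fraction(0), Fraction(0), Fraction(1, 9)]   # p(x) = x^2 / 9

for name, c in [("linear", linear), ("quadratic", quadratic)]:
    print(name, is_feasible_on_grid(c, omega, S), moment_bound(c, moments))
```

Both polynomials are feasible, and the quadratic one gives the smaller (better) upper bound; optimizing over all coefficients $c_k$ is what the sos program automates.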
Essentially the same techniques apply to much more complicated situations
(e.g., the multivariate case, partial moment information, martingale inequalities,
etc.). For a detailed treatment, see [16, 17] and references therein.
Exercise 3.176. Show that the Markov and Chebyshev bounds can be interpreted as closed-form solutions of (3.49) for specific sets $\Omega$ and $S$. What are the
corresponding optimal polynomials $p(x)$?
Exercise 3.177. Assume that $\Omega = [0, 5]$, $S = [4, 5]$, and the mean and variance
of the random variable $X$ are equal to 1 and 1/2, respectively. Give upper and
lower bounds on $P(X \in S)$. Are these bounds tight? Can you find the worst-case
distributions?
3.6.4
$S_+^{n_1 n_2}$.
The physical interpretation of a separable state corresponds to a probabilistic superposition (with probabilities given by the $p_i$), where one subsystem is in state
$x_i$ and the other subsystem is in state $y_i$. If no such decomposition is possible,
then it is not possible to think of the two subsystems as being independent (even
though they may be physically separated), and thus actions/measurements on one
subsystem may affect the other (i.e., they are entangled).
The quantum separability or quantum entanglement question is the following:
Given the density matrix $\rho$ of a quantum state, how do we decide whether $\rho$ is
entangled or not? If it is entangled (or separable), how can we certify this property?
It has been shown by Gurvits that in general this is an NP-hard question [58].
As we shall see, quantum entanglement is intimately related to polynomial
nonnegativity. A natural mathematical object to study in this context is the set of
positive maps. These are the linear operators $\Lambda : S^{n_1} \to S^{n_2}$ that satisfy $X \succeq 0 \Rightarrow \Lambda(X) \succeq 0$; i.e., they map positive semidefinite matrices into positive semidefinite
matrices. Notice that to any such $\Lambda$, we can associate a unique observable $W \in S^{n_1 n_2}$ that satisfies $y^T \Lambda(xx^T) y = (x \otimes y)^T W (x \otimes y)$. Furthermore, if $\Lambda$ is a positive
map, then the pairing between the observable $W$ and any separable state $\rho$ will
always give a nonnegative number, since
$$\langle W, \rho \rangle = \mathrm{Tr}\, W \Big( \sum_i p_i (x_i x_i^T) \otimes (y_i y_i^T) \Big) = \sum_i p_i \, \mathrm{Tr}\, W (x_i \otimes y_i)(x_i \otimes y_i)^T = \sum_i p_i (x_i \otimes y_i)^T W (x_i \otimes y_i) = \sum_i p_i \, y_i^T \Lambda(x_i x_i^T) y_i \ge 0.$$
In other words, every positive map yields a separating hyperplane for the convex set
of separable states. It can further be shown that every valid inequality corresponds
to a positive map, so this yields, in fact, a complete characterization (and thus,
the sets of separable states and positive maps are dual to each other). For this
reason, the observables W associated to positive maps are called entanglement
witnesses.
The set of positive maps (and thus, entanglement witnesses) can be exactly
characterized in terms of multivariate polynomial nonnegativity, since a linear map
$\Lambda : S^{n_1} \to S^{n_2}$ is positive if and only if the biquadratic form in $n_1 + n_2$ variables
$p(x, y) = y^T \Lambda(xx^T) y$ is nonnegative for all $x, y$ (why?). Replacing nonnegativity
with sos-based conditions, we can obtain a family of efficiently computable criteria
that certify entanglement.
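The biquadratic form attached to a map is straightforward to evaluate numerically. The sketch below does this for the map of Choi that appears in Exercise 3.178, assuming it reads $C(A) = \mathrm{diag}(2a_{11} + a_{22},\, 2a_{22} + a_{33},\, 2a_{33} + a_{11}) - A$, and samples $p(x, y) = y^T C(xx^T) y$ at random points. Random sampling can only fail to find a negative value; it is not a proof of positivity, and it says nothing about whether the form is a sum of squares.

```python
import random

def choi_map(A):
    """Apply the (assumed) Choi map to a 3x3 matrix given as nested lists."""
    d = [2 * A[i][i] + A[(i + 1) % 3][(i + 1) % 3] for i in range(3)]
    return [[(d[i] if i == j else 0.0) - A[i][j] for j in range(3)]
            for i in range(3)]

def biquadratic(x, y):
    """p(x, y) = y^T C(x x^T) y."""
    xxT = [[x[i] * x[j] for j in range(3)] for i in range(3)]
    C = choi_map(xxT)
    return sum(y[i] * C[i][j] * y[j] for i in range(3) for j in range(3))

random.seed(0)
samples = [([random.uniform(-1, 1) for _ in range(3)],
            [random.uniform(-1, 1) for _ in range(3)])
           for _ in range(20000)]
min_value = min(biquadratic(x, y) for x, y in samples)
print(min_value)                                  # smallest sampled value
print(biquadratic([1, 1, 1], [1, 1, 1]))          # the form vanishes here
```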
Concretely, given a state $\rho$ for which we want to determine whether it is entangled, the first such test corresponds to the optimization problem of finding an
entanglement witness $W$ (or linear map $\Lambda$) such that
$$\langle W, \rho \rangle < 0, \qquad y^T \Lambda(xx^T) y \text{ is sos}. \tag{3.51}$$
(3.52)
for $k \ge 0$ that obviously generalize (3.51) (which corresponds to the case $k = 0$). It
should be clear that these sos programs can be numerically solved using semidefinite
programming. It can also be shown [40, 41] that this hierarchy is complete in the
sense that every entangled state is eventually certified by some value of $k$.
For more background and details about quantum entanglement and the separability problem, see [40, 41] and the references therein. It has been recently
shown [22] that the sos-based algorithm described above can be used to provide a
quasipolynomial time algorithm for the quantum separability problem.
Exercise 3.178. Consider linear maps between symmetric matrices of the form
$\Lambda : S^{n_1} \to S^{n_2}$.
1. Show that any linear map of the form $A \mapsto \sum_i P_i^T A P_i$, where $P_i \in \mathbb{R}^{n_1 \times n_2}$,
is positive. These maps are known as decomposable maps.
2. Consider the polynomial defined by $p(x, y) := y^T \Lambda(xx^T) y$. Show that $\Lambda$ is a
positive map if and only if $p(x, y)$ is nonnegative and that $\Lambda$ is a decomposable
map if and only if $p(x, y)$ is a sum of squares.
3. Show that the linear map $C : S^3 \to S^3$ (due to M.-D. Choi) given by
$$C : A \mapsto \begin{bmatrix} 2a_{11} + a_{22} & 0 & 0 \\ 0 & 2a_{22} + a_{33} & 0 \\ 0 & 0 & 2a_{33} + a_{11} \end{bmatrix} - A$$
3.6.5
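We give next a simple sos proof of this inequality for the case $k = 1$, easily
obtainable via semidefinite programming. Define
$$S_1 = \frac{1}{2} \begin{bmatrix} x^2 + yz \\ y^2 + xz \\ z^2 + xy \end{bmatrix}^{T} \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix} \begin{bmatrix} x^2 + yz \\ y^2 + xz \\ z^2 + xy \end{bmatrix},$$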
(3.54)
holds for all triangles. The statement was subsequently shown to be false in general
[11] but proved to hold whenever the triangle in question is acute (all angles are
less than or equal to $\pi/2$) [12]. Using sos techniques, we will obtain a very concise
proof.
For this, we can express the premise that the triangle be acute as the three
polynomial inequalities
$$t_1 := a^2 + b^2 - c^2 \ge 0, \qquad t_2 := b^2 + c^2 - a^2 \ge 0, \qquad t_3 := c^2 + a^2 - b^2 \ge 0. \tag{3.55}$$
It is well known (Heron's formula) that we can rewrite the square of the area $K$ as
a polynomial in $a, b, c$:
$$K^2 = s(s - a)(s - b)(s - c), \qquad s = \frac{a + b + c}{2}.$$
The question, therefore, reduces to verifying that (3.54) holds whenever the inequalities (3.55) are satisfied. A simple proof of Ono's inequality can then be found using
the Positivstellensatz and sos methods: define the sos polynomial
$$s(x, y, z) := (x^4 + x^2 y^2 - 2y^4 - 2x^2 z^2 + y^2 z^2 + z^4)^2 + 15\,(x - z)^2 (x + z)^2 (z^2 + x^2 - y^2)^2.$$
We have then
$$(4K)^6 - 27\, t_1^2 t_2^2 t_3^2 = s(a, b, c)\, t_1 t_2 + s(c, a, b)\, t_1 t_3 + s(b, c, a)\, t_2 t_3, \tag{3.56}$$
therefore proving the inequality.
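The identity (3.56) is a polynomial identity in $a, b, c$ and can be checked mechanically. The sketch below assumes the sign pattern $s(x,y,z) = (x^4 + x^2y^2 - 2y^4 - 2x^2z^2 + y^2z^2 + z^4)^2 + 15(x-z)^2(x+z)^2(z^2+x^2-y^2)^2$ and uses Heron's formula in the expanded form $16K^2 = 2(a^2b^2 + b^2c^2 + c^2a^2) - (a^4 + b^4 + c^4)$. Both sides have degree at most 12 in each variable, so agreement on a $13 \times 13 \times 13$ integer grid implies that they agree identically.

```python
from itertools import product

def s(x, y, z):
    """The sos polynomial s(x, y, z), with the signs assumed in the lead-in."""
    first = x**4 + x**2 * y**2 - 2 * y**4 - 2 * x**2 * z**2 + y**2 * z**2 + z**4
    second = (x - z) * (x + z) * (z**2 + x**2 - y**2)
    return first**2 + 15 * second**2

def residual(a, b, c):
    """Left-hand side minus right-hand side of (3.56), in exact integers."""
    t1 = a * a + b * b - c * c
    t2 = b * b + c * c - a * a
    t3 = c * c + a * a - b * b
    # Heron's formula: 16 K^2 as a polynomial in the side lengths.
    sixteen_k2 = 2 * (a*a*b*b + b*b*c*c + c*c*a*a) - (a**4 + b**4 + c**4)
    lhs = sixteen_k2**3 - 27 * (t1 * t2 * t3) ** 2
    rhs = s(a, b, c) * t1 * t2 + s(c, a, b) * t1 * t3 + s(b, c, a) * t2 * t3
    return lhs - rhs

grid = range(13)  # 13 values per variable suffice for degree <= 12
identity_holds = all(residual(a, b, c) == 0
                     for a, b, c in product(grid, repeat=3))
print(identity_holds)
```

Since each $s(\cdot)$ is a sum of squares and each product $t_i t_j$ is nonnegative for acute triangles, the right-hand side is nonnegative there, which is exactly the argument in the text.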
Another, more complicated, application of these techniques is given in [93].
In that paper, the subadditivity of a geometric quantity for triangles, expressible in
terms of its side lengths and an angle, is proved via sos methods. The problem can
be reduced to proving the nonnegativity of the polynomial
2 2 ( )2 + 2 (1 )(1 + 22 ) 2
+ 2 (1 )(1 + 2 2 ) 2 (2 + 3 4 + 3 )
+ (1 )(2 2 ) 3 (2 + 2 + 23 3 42 2 ) 2 2
+ (1 )(2 2 ) 3 + ( )2 3 3
(3.57)
$$a^2 + b^2 + c^2 \ge 4\sqrt{3}\, K,$$
Exercise 3.180 (Pedoe's inequality). Consider two triangles with side lengths
equal to $(a_1, b_1, c_1)$ and $(a_2, b_2, c_2)$ and areas $K_1, K_2$, respectively. Give an sos proof
of the inequality
$$a_1^2 (b_2^2 + c_2^2 - a_2^2) + b_1^2 (c_2^2 + a_2^2 - b_2^2) + c_1^2 (a_2^2 + b_2^2 - c_2^2) \ge 16 K_1 K_2.$$
Is 16 the best possible constant? What happens if one of the triangles is equilateral?
Exercise 3.181. Prove that the polynomial (3.57) is nonnegative when all of its variables lie in the interval $[0, 1]$. Find an sos certificate of this fact.
3.6.6
Polynomial Games
The mathematical theory of games was developed to model and analyze strategic
interactions among multiple decision makers with possibly conflicting objectives.
Game theory has been successfully used in many domains, including economics,
engineering, and biology. Standard modern references include [46, 82]. In this
section we present an application of sos methods in game theory, initially described
in [92].
We consider two-player zero-sum games, where the payoffs are polynomial
functions. This class of polynomial games was originally introduced and studied
by Dresher, Karlin, and Shapley in 1950 [42]. In the basic set-up there are two
players (which we will denote as Player 1 and Player 2), who simultaneously and
independently choose actions parametrized by real numbers $x, y$, respectively, in the
interval $[-1, 1]$. The payoff associated with these choices is given by a polynomial
function
$$P(x, y) = \sum_{i=0}^{n} \sum_{j=0}^{m} p_{ij} x^i y^j \tag{3.58}$$
that assigns payments from Player 2 to Player 1. Thus, Player 1 wants to choose
his strategy $x$ to maximize $P(x, y)$, while Player 2 tries to make this expression as
small as possible. Players are allowed, and often it is in their interest, to choose
their actions randomly according to specific probability distributions; these are
called mixed strategies (the game of rock-paper-scissors is a simple example of this
situation).
The solution concept of interest is called Nash equilibrium. This corresponds
to a choice of strategies for both players, for which there is no incentive for a player
to deviate, assuming the other player keeps their strategy fixed. It is well known
that for zero-sum games, this notion reduces to the simpler minimax or saddle-point
equilibrium; see (3.60).
Example 3.182. Consider a polynomial game on $[-1, 1] \times [-1, 1]$, with payoff function given by $P(x, y) = (x - y)^2$. Since Player 2 wants to minimize the payoff, she
should try to guess the number chosen by Player 1. Conversely, the first player
should try to make his number as difficult to guess as possible (in the sense defined
by $P(x, y)$). It is easy to see in this case that the optimal strategy for Player 1 is
to randomize between $x = -1$ and $x = 1$ with equal probability, while the optimal
strategy for Player 2 is to always choose $y = 0$. Assuming the other player keeps
their strategy fixed, no player has incentive to deviate from these strategies, and
thus this yields an equilibrium, with the corresponding value of the game being
equal to 1.
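The equilibrium claimed in Example 3.182 can be checked directly from the definition: against Player 1's randomization no pure action of Player 2 pays less than 1, and against $y = 0$ no pure action of Player 1 earns more than 1. The sketch below performs this check on a grid of pure actions (the grid resolution is an arbitrary choice); since expected payoffs are linear in each player's mixed strategy, checking pure deviations is enough.

```python
def payoff(x, y):
    """Payoff of the guessing game, paid by Player 2 to Player 1."""
    return (x - y) ** 2

def expected(mu, nu):
    """Expected payoff under discrete mixed strategies {action: probability}."""
    return sum(px * py * payoff(x, y)
               for x, px in mu.items() for y, py in nu.items())

mu_star = {-1.0: 0.5, 1.0: 0.5}   # Player 1 randomizes between -1 and 1
nu_star = {0.0: 1.0}              # Player 2 always plays 0

grid = [-1 + 2 * k / 200 for k in range(201)]   # pure actions in [-1, 1]

value = expected(mu_star, nu_star)
# Best pure reply of Player 2 against mu_star (she minimizes).
worst_for_p1 = min(expected(mu_star, {y: 1.0}) for y in grid)
# Best pure reply of Player 1 against nu_star (he maximizes).
best_for_p1 = max(expected({x: 1.0}, nu_star) for x in grid)

print(value, worst_for_p1, best_for_p1)
```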
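The question of interest is the following: given a game described by its payoff
function $P(x, y)$, how do we efficiently compute its equilibrium solution, i.e., the
optimal strategies both players should use?
Recall that players can randomize over their choices, so their strategies will
be described by probability measures $\mu$ and $\nu$, respectively, supported on $[-1, 1]$.
When considering mixed strategies, and similarly to the finite case, we need to
consider the expressions
$$\max_{\mu} \min_{\nu} E_{\mu \times \nu}[P(x, y)] \qquad \text{and} \qquad \min_{\nu} \max_{\mu} E_{\mu \times \nu}[P(x, y)],$$
where $E_{\mu \times \nu}[\cdot]$ denotes the expectation under the product measure. Writing $\mu_i := \int x^i \, d\mu(x)$ and $\nu_j := \int y^j \, d\nu(y)$ for the moments of the two measures, we can rewrite
these as bilinear expressions
$$\max_{\mu_i} \min_{\nu_j} \sum_{i=0}^{n} \sum_{j=0}^{m} p_{ij} \mu_i \nu_j, \qquad \min_{\nu_j} \max_{\mu_i} \sum_{i=0}^{n} \sum_{j=0}^{m} p_{ij} \mu_i \nu_j. \tag{3.59}$$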
Recall from Section 3.5.4 that the moment spaces (i.e., the image of the probability
measures under the moment map given above) are compact convex sets in $\mathbb{R}^{n+1}$
and $\mathbb{R}^{m+1}$. Since the objective function in the problems (3.59) is bilinear, and
the feasible sets are convex and compact, the minimax theorem (Theorem A.6
in Appendix A) can be used to show that these two quantities are equal. As a
consequence, there exist measures $\mu^\star, \nu^\star$ that satisfy the saddle-point condition:
$$\sum_{i=0}^{n} \sum_{j=0}^{m} p_{ij} \mu_i \nu_j^\star \;\le\; \sum_{i=0}^{n} \sum_{j=0}^{m} p_{ij} \mu_i^\star \nu_j^\star \;\le\; \sum_{i=0}^{n} \sum_{j=0}^{m} p_{ij} \mu_i^\star \nu_j. \tag{3.60}$$
The key fact here is that, due to the separable structure of the payoffs, the optimal
strategies can be characterized only in terms of their first $m$ (or $n$) moments. Higher
moments are irrelevant, at least in terms of the payoffs of the players.
From the previous discussion, we have the following result, essentially contained in [42].
Theorem 3.183. Consider the two-player zero-sum game on $[-1, 1] \times [-1, 1]$, with
payoffs given by (3.58). Then, the value of the game is well defined, and there exist
optimal mixed strategies $\mu^\star, \nu^\star$ satisfying a saddle-point condition. Furthermore,
without loss of generality, the optimal measures can be taken to be discrete, with at
most $\min(n, m) + 1$ atoms.
The derivation and computation of the mixed strategies and the value of the
game can be done as follows. We first characterize security strategies that provide
a minimum guaranteed payoff. We can then invoke convex duality to prove that
this actually yields the unique value of the game. Proceeding along these lines,
by analogy to the finite case, a security strategy of Player 2 can be computed by
solving
$$\text{minimize } \gamma \quad \text{s.t.} \quad E_{\nu}[P(x, y)] \le \gamma \;\; \forall x \in [-1, 1], \qquad \int_{-1}^{1} d\nu(y) = 1. \tag{3.61}$$
Indeed, if Player 2 plays the mixed strategy $\nu$ obtained from the solution of this
problem, the best that Player 1 can do is to choose a value of $x$ that maximizes
$E_{\nu}[P(x, y)]$, thus limiting his gain (and Player 2's loss) to $\gamma$.
Since $P(x, y)$ is a polynomial, this expectation can be equivalently written in
terms of the moments $\nu_0, \ldots, \nu_m$ of the measure $\nu$, i.e.,
$$E_{\nu}[P(x, y)] = \int_{-1}^{1} P(x, y) \, d\nu(y) = \sum_{i=0}^{n} \sum_{j=0}^{m} p_{ij} \nu_j x^i.$$
Notice that this is a univariate polynomial in the action $x$ of Player 1, with coefficients that depend affinely on the moments $\nu_j$ of the mixed strategy of Player 2.
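The passage from a mixed strategy to a univariate polynomial in $x$ takes only a few lines of code. The sketch below stores the payoff as a coefficient table `p[i][j]` (the coefficient of $x^i y^j$), computes the moments of a discrete measure, and forms the coefficients $\sum_j p_{ij} \nu_j$ of $E_\nu[P(x, y)]$. The function names and the use of the guessing game $(x - y)^2$ as test data are illustrative choices.

```python
from fractions import Fraction

def moments(measure, m):
    """Moments nu_0..nu_m of a discrete measure given as {atom: weight}."""
    return [sum(w * y**j for y, w in measure.items()) for j in range(m + 1)]

def expected_payoff_poly(p, nu_moments):
    """Coefficients in x of E_nu[P(x, y)]: entry i equals sum_j p[i][j] * nu_j."""
    return [sum(pij * nuj for pij, nuj in zip(row, nu_moments)) for row in p]

# (x - y)^2 = x^2 - 2xy + y^2, stored as p[i][j] = coefficient of x^i y^j.
p = [[0, 0, 1],
     [0, -2, 0],
     [1, 0, 0]]

always_zero = {Fraction(0): Fraction(1)}
coin_flip = {Fraction(-1): Fraction(1, 2), Fraction(1): Fraction(1, 2)}

coeffs_zero = expected_payoff_poly(p, moments(always_zero, 2))   # x^2
coeffs_flip = expected_payoff_poly(p, moments(coin_flip, 2))     # 1 + x^2

print(coeffs_zero, coeffs_flip)
```

Only the moments enter the computation, which is why two measures with the same first $m + 1$ moments are indistinguishable from the point of view of the payoff.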
Consider now the problem (3.61), but instead of writing it in terms of the
decision variable $\nu$ (which is a probability measure), let us use instead the moments
$\{\nu_j\}_{j=0}^{m}$. The problem is then reduced to the minimization of the safety level $\gamma$,
subject to the following conditions:
• The univariate polynomial $\gamma - \sum_{i=0}^{n} \sum_{j=0}^{m} p_{ij} \nu_j x^i$ is nonnegative on $[-1, 1]$.
• The sequence $\{\nu_j\}_{j=0}^{m}$ is a valid moment sequence for a probability measure
supported in $[-1, 1]$.
We can rewrite this in a more compact form, as the optimization problem
$$\text{minimize } \gamma \quad \text{s.t.} \quad \gamma - \sum_{i=0}^{n} \sum_{j=0}^{m} p_{ij} x^i \nu_j \in P_{1,n}, \qquad \nu \in M_m, \tag{3.62}$$
where $P_{1,n}$ is the set of univariate polynomials of degree $n$ nonnegative in $[-1, 1]$,
and $M_m$ is the set of $m + 1$ first moments of a probability measure with support
on the same interval.
By the characterizations provided in earlier sections, it is clear that both of
these conditions can be rewritten in terms of semidefinite programming and thus
efficiently solved. Furthermore, using the procedure described in Section 3.5.5, the
corresponding optimal mixed strategies can be obtained.
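Example 3.184. Consider the guessing game discussed in Example 3.182. In this
case, the decision variables $(\nu_0, \nu_1, \nu_2)$ are the moments of the mixed strategy of
Player 2. To compute the optimal strategies, we must then solve (3.62), i.e.,
$$\text{minimize } \gamma \quad \text{s.t.} \quad \gamma - (x^2 \nu_0 - 2x \nu_1 + \nu_2) = s_0(x) + s_1 (1 - x^2), \qquad \begin{bmatrix} \nu_0 & \nu_1 \\ \nu_1 & \nu_2 \end{bmatrix} \succeq 0, \qquad \nu_0 - \nu_2 \ge 0, \qquad \nu_0 = 1,$$
where we have used the sos/semidefinite characterizations of univariate polynomials (Section 3.3.1) and moments constraints (Section 3.5.3) for the interval $[-1, 1]$.
The optimal solution of this problem is $\gamma = 1$, $(\nu_0, \nu_1, \nu_2) = (1, 0, 0)$, $s_0(x) = 0$, and
$s_1 = 1$; indeed, with these values the equality constraint reads $1 - x^2 = 1 \cdot (1 - x^2)$. From this, the optimal strategy $\delta(y)$ for Player 2 (always play $y = 0$) follows directly, and the corresponding optimal strategy for Player 1 is $\frac{1}{2}\delta(x - 1) + \frac{1}{2}\delta(x + 1)$, in agreement with Example 3.182.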
Exercise 3.185. Consider a two-player game on $[-1, 1] \times [-1, 1]$ with payoff function given by
$$P(x, y) = 5xy - 2x^2 - 2xy^2 - y.$$
Notice this function is neither convex nor concave.
Formulate and solve the corresponding optimization problem to find the optimal solution of this game. Verify that the optimal strategies correspond to Player 1
always choosing $x = 0.2$, and Player 2 choosing $y = 1$ with probability 0.78, and
$y = -1$ with probability 0.22.
3.7
Software Implementations
Despite the many advances in theoretical and modeling aspects of SDP and sos
methods, much of their impact in applications has undoubtedly been a direct consequence of the efforts of many researchers in producing and making available good
quality software implementations. In this section we give pointers to and discuss
briefly some of the current computational tools for effectively formulating and solving SDP and sos programs.
Most SDP solvers (e.g., those described in Section 2.3.2) usually take as input
either text files containing a problem description or directly the matrices $(A_i, b, C)$
corresponding to the standard primal/dual formulation. This is often inconvenient
at the initial modeling and solution stages. A more flexible approach is to formulate the problem using a more natural description, closer to its mathematical
formulation, that can later be automatically translated to fit the requirements of
each solver. For generic optimization problems, this has indeed been the approach of
much of the operations research community, which has developed some well-known
standard file formats, such as MPS, or optimization modeling languages like AMPL
and GAMS. An important remark to keep in mind, much more critical in the SDP
case than for linear optimization, is the extent to which the problem structure can
be signaled to the solver.
For sos programs, as we have seen, the conversion process to an SDP formulation is algorithmic, and there are parsers that partially or fully automate this
conversion task and can be used from within a problem-solving environment such as
MATLAB. The software SOSTOOLS [101] is a free, third-party MATLAB toolbox
for formulating and solving general sos programs. The related software Gloptipoly
[62] is oriented toward global optimization problems and the associated moment
problems. In their current version, both use the SDP solver SeDuMi [118] for numerical computations. Other possibilities include YALMIP [74], a very complete
modeling language for convex and nonconvex optimization that includes several
sos/moments features, as well as the more specialized toolbox SPOT [78], oriented
toward problems in systems and control theory. An interesting new addition to this
area is the MATLAB toolbox NCSOStools [25] that specializes in sums of squares
in noncommuting variables, a topic that will be discussed extensively in Chapter 8.
Any of these parsers can make formulating and solving sos programs a much simpler
and more enjoyable task than manual, error-prone methods.
Bibliography
[1] M. Abramowitz and I.A. Stegun, eds. Handbook of Mathematical Functions.
Dover, New York, 1964.
[2] A.A. Ahmadi, M. Krstic, and P. A. Parrilo. A globally asymptotically stable
polynomial vector field with no polynomial Lyapunov function. In Proceedings
of the 50th IEEE Conference on Decision and Control, IEEE, Washington,
DC, 2011.
[3] A.A. Ahmadi, A. Olshevsky, P. A. Parrilo, and J.N. Tsitsiklis. NP-hardness of
deciding convexity of quartic polynomials and related problems. Mathematical
Programming, 124, 2011.
[4] A.A. Ahmadi and P. A. Parrilo. A complete characterization of the gap between convexity and sos-convexity. Mathematical Programming, to appear.
arXiv:1111.4587, 2011.
[5] A.A. Ahmadi and P. A. Parrilo. A convex polynomial that is not sos-convex.
Mathematical Programming, 135:275–292, 2012.
[6] N. I. Akhiezer. The Classical Moment Problem. Hafner Publishing Company,
New York, 1965.
[7] C. Andradas. Characterization and description of basic semialgebraic sets.
In Algorithmic and Quantitative Real Algebraic Geometry (Piscataway, NJ,
2001), DIMACS Ser. Discrete Math. Theoret. Comput. Sci. 60, Amer. Math.
Soc., Providence, RI, 2003, pp. 1–12.
[8] C. Andradas and J.M. Ruiz. Ubiquity of Łojasiewicz's example of a nonbasic
semialgebraic set. The Michigan Mathematical Journal, 41:465–472, 1994.
[9] E. M. Aylward, S. M. Itani, and P. A. Parrilo. Explicit SOS decomposition of
univariate polynomial matrices and the Kalman-Yakubovich-Popov lemma.
[10] C. Bachoc and F. Vallentin. New upper bounds for kissing numbers from
semidefinite programming. J. Amer. Math. Soc., 21:909–924, 2008.
[11] F. Balitrand. Problem 4417. Intermed. Math., 22:66, 1915.
[12] F. Balitrand. Problem 4417. Intermed. Math., 23:86–87, 1916.
[13] S. Basu, R. Pollack, and M.-F. Roy. Algorithms in Real Algebraic Geometry,
Algorithms and Computation in Mathematics 10, Springer-Verlag, Berlin,
2003.
[14] A. Ben-Tal, L. El Ghaoui, and A.S. Nemirovski. Robust Optimization. Princeton University Press, Princeton, NJ, 2009.
[15] A. Ben-Tal and A. Nemirovski. Lectures on Modern Convex Optimization.
MPS/SIAM Series on Optimization 2. SIAM, Philadelphia, 2001.
[16] D. Bertsimas and I. Popescu. Optimal inequalities in probability theory: A
convex optimization approach. SIAM J. Optim., 15:780–804, 2005.
[17] D. Bertsimas and J. Sethuraman. Moment problems and semidefinite optimization. In Handbook of Semidefinite Programming, H. Wolkowicz, R. Saigal,
and L. Vandenberghe, eds., Springer, New York, 2000, pp. 469–509.
[18] S.P. Bhattacharyya, H. Chapellat, and L.H. Keel. Robust Control: The Parametric Approach. Prentice-Hall, Englewood Cliffs, NJ, 1995.
[19] J. Bochnak, M. Coste, and M-F. Roy. Real Algebraic Geometry. Springer, New
York, 1998.
[20] J.M. Borwein and H. Wolkowicz. Facial reduction for a cone-convex programming problem. J. Austral. Math. Soc. Ser. A, 30:369–380, 1980.
[21] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan. Linear Matrix Inequalities in System and Control Theory, Stud. Appl. Math. 15. SIAM, Philadelphia,
1994.
[22] F.G.S.L. Brandão, M. Christandl, and J. Yard. A quasipolynomial-time algorithm for the quantum separability problem. In Proceedings of the 43rd
Annual ACM Symposium on Theory of Computing, ACM, New York, 2011,
pp. 343–352.
[23] C.W. Brown. QEPCAD – Quantifier Elimination by Partial Cylindrical Algebraic Decomposition, 2003. Available from www.cs.usna.edu/qepcad/B/QEPCAD.html.
[24] S. R. Buss and T. Pitassi. Good degree bounds on Nullstellensatz refutations
of the induction principle. J. Comp. System Sci., 57:162–171, 1998.
[114] G. Stengle. A Nullstellensatz and a Positivstellensatz in semialgebraic geometry. Math. Ann., 207:87–97, 1974.
[115] G. Stengle. Complexity estimates for the Schmüdgen Positivstellensatz.
J. Complexity, 12:167–174, 1996.
[116] J. Stoer and R. Bulirsch. Introduction to Numerical Analysis, Texts Appl.
Math. 12. Springer-Verlag, New York, 2002.
[117] A. Strzebonski. Solving algebraic inequalities. The Mathematica Journal,
7:525–541, 2000.
[118] J. Sturm, O. Romanko, and I. Polik. SeDuMi version 1.3, 2010. MATLAB
toolbox, available from sedumi.ie.lehigh.edu.
[119] B. Sturmfels. Algorithms in Invariant Theory, Texts Monogr. Symbol. Comput. 1. Springer, Wien, 1993.
[120] B. Sturmfels. Introduction to resultants. In Applications of Computational
Algebraic Geometry (San Diego, CA, 1997), Proc. Sympos. Appl. Math. 53,
AMS, Providence, RI, 1998, pp. 25–39.
[121] B. Sturmfels. Solving Systems of Polynomial Equations. AMS, Providence,
RI, 2002.
[122] H. Väliaho. Criteria for copositive matrices. Linear Algebra Appl., 81:19–34,
1986.
[123] F. Vallentin. Symmetry in semidefinite programs. Linear Algebra Appl.,
430:360–369, 2009.
[124] H. Waki, S. Kim, M. Kojima, and M. Muramatsu. Sums of squares and
semidefinite program relaxations for polynomial optimization problems with
structured sparsity. SIAM J. Optim., 17:218–242, 2006.
[125] K. Zhou, K. Glover, and J. C. Doyle. Robust and Optimal Control. Prentice
Hall, Englewood Cliffs, NJ, 1995.
Chapter 4
Nonnegative Polynomials and Sums of Squares
Grigoriy Blekherman
A central question, for both practical and theoretical reasons, is how to efficiently
test whether a polynomial $p$ is nonnegative. We reformulate this problem in the
following way: given a nonnegative polynomial $p$, how do we efficiently find a representation of $p$, so that nonnegativity of $p$ is apparent from this representation?
In other words, how do we efficiently represent $p$ as an obviously nonnegative
polynomial? Some polynomials are obviously nonnegative. If we can write $p$ as a
sum of squares of polynomials, then it is clear that $p$ is nonnegative just from this
presentation. Very importantly, if $p$ is a sum of squares then its sum of squares
representation can be efficiently computed via semidefinite programming. This
connection was described in detail in Chapter 3. As we will see, the set of sums of
squares is a projected spectrahedron, while the set of nonnegative polynomials is far
more challenging computationally. The main question for this chapter is: what is
the relationship between nonnegative polynomials and sums of squares?
4.1
Introduction
Our story begins in 1885, when twenty-three-year-old David Hilbert was one of the
examiners in the Ph.D. defense of twenty-one-year-old Hermann Minkowski. During
the examination Minkowski claimed that there exist nonnegative polynomials that
are not sums of squares. Although he did not provide an example or a proof, his
argument must have been convincing, as he defended successfully.
Three years later Hilbert published a paper in which he classified all of the
(few) cases, in terms of degree and number of variables, in which nonnegative polynomials are the same as sums of squares. In all other cases Hilbert showed that
there exist nonnegative polynomials that are not sums of squares. Interestingly,
Hilbert did not provide an explicit example of such polynomials. The first explicit
example was found only seventy years later and is due to Theodore Motzkin. In
fact, Motzkin was not aware of what he constructed. Olga Taussky-Todd, who was
present during the seminar in which Motzkin described his construction, later notified him that he found the first example of a nonnegative polynomial that is not
a sum of squares [22].
We examine the relationship between nonnegativity and sums of squares in
two different fundamental ways. We first consider the structures that prevent sums
of squares from capturing all nonnegative polynomials, and show that equality occurs precisely when these structures are not present. We then examine in detail
the smallest cases where there exist nonnegative polynomials that are not sums of
squares and show that the inequalities separating nonnegative polynomials from
sums of squares have a simple and elegant structure. Second, we look at the quantitative relationship between nonnegative polynomials and sums of squares. Here
we show that when the degree is fixed and the number of variables grows, there are
significantly more nonnegative polynomials than sums of squares. We also apply
these ideas to studying the relationship between sums of squares and convex polynomials. While the techniques we develop for the two approaches are quite different
in nature, the unifying theme is that we examine the sets of nonnegative polynomials and sums of squares geometrically. Algebraic geometry is at the forefront of our
examination of fundamental differences between nonnegative polynomials and sums
of squares, while convex geometry and analysis are used to examine the quantitative
relationship.
The chapter is structured as follows: After discussing Hilbert's theorem and
Motzkin's example in Section 4.2, we begin a detailed examination of the underlying causes of differences between nonnegative polynomials and sums of squares
in Section 4.3. On the way we will see that nonnegative polynomials and sums of
squares form fascinating convex sets. Section 4.4 is devoted to the examination of
these objects from the point of view of convex algebraic geometry. We note that
many basic questions remain open.
The fundamental reasons for the existence of nonnegative polynomials that
are not sums of squares come from Cayley–Bacharach theory in classical algebraic
geometry and, in fact, Hilbert's original proof of his theorem already used some of
these ideas. We begin developing the necessary techniques in Section 4.5. Duality
from convex geometry and its interplay with commutative algebra will play a central
role in our investigation. Section 4.6 develops the duality ideas and presents a unified
proof of the equality cases of Hilbert's theorem. Sections 4.7 and 4.8 investigate
the smallest cases in which there exist nonnegative polynomials that are not sums
of squares. We show that this situation fundamentally arises from the existence of
Cayley–Bacharach relations and present some consequences.
We proceed by examining the quantitative relationship between nonnegative
polynomials and sums of squares in Section 4.9. This is done by establishing bounds
on the volume of sets of nonnegative polynomials and sums of squares, and analytic aspects of convex geometry come to the fore in this examination. We will
explain that if the degree is fixed and the number of variables is allowed to grow,
then there are significantly more nonnegative polynomials than sums of squares [5].
This happens despite the difficulty of constructing explicit examples of nonnegative polynomials that are not sums of squares, and numerical evidence that sums
of squares approximate nonnegative polynomials well if the degree and number of
variables are small [19]. The question of precisely when nonnegative polynomials
begin to significantly overtake sums of squares is currently poorly understood.
Section 4.10 presents an application of the volume ideas to showing that there
exist homogeneous polynomials that are convex functions but are not sums of
squares. There is no known explicit example of such a polynomial, and this is
the only known method of showing their existence.
4.2
A Deeper Look
$$\left( \frac{x_1}{x_{n+1}}, \ldots, \frac{x_n}{x_{n+1}} \right).$$
4.2.1
Hilbert's Theorem
The first fundamental result about the relationship between $P_{n,2d}$ and $\Sigma_{n,2d}$ was
shown by Hilbert in 1888.
Theorem 4.3. Nonnegative forms are the same as sums of squares, $P_{n,2d} = \Sigma_{n,2d}$,
in the following three cases: $n = 2$ (univariate nonhomogeneous case), $2d = 2$
(quadratic forms), and $n = 3$, $2d = 4$ (ternary quartics). In all other cases there
exist nonnegative forms that are not sums of squares.
The proof of the three equality cases in Hilbert's theorem usually proceeds by
treating each of the three cases separately. For example, it is a simple exercise to
show that $P_{n,2} = \Sigma_{n,2}$.
Exercise 4.4. Deduce that $P_{n,2} = \Sigma_{n,2}$ from diagonalization of symmetric matrices.
We adopt a different approach: We begin by examining the structures that
allow the existence of nonnegative forms that are not sums of squares. In Section
4.6.1 we show that the three cases of Hilbert's theorem are the only cases in which
these structures do not exist. This provides a unified proof of the three equality
cases of Hilbert's theorem, which are usually treated separately.
4.2.2
Motzkin's Example
The first explicit example of a nonnegative form that is not a sum of squares is due
to Motzkin:
$$M(x, y, z) = x^4 y^2 + x^2 y^4 + z^6 - 3x^2 y^2 z^2.$$
The form $M$ can be seen to be nonnegative by the application of the arithmetic
mean-geometric mean inequality. Why is $M$ not a sum of squares?
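For completeness, the arithmetic mean-geometric mean step can be written out explicitly. It is applied to the three nonnegative monomials $x^4 y^2$, $x^2 y^4$, and $z^6$, assuming the Motzkin form reads $M(x, y, z) = x^4 y^2 + x^2 y^4 + z^6 - 3x^2 y^2 z^2$:

```latex
\frac{x^4 y^2 + x^2 y^4 + z^6}{3}
  \;\ge\; \sqrt[3]{x^4 y^2 \cdot x^2 y^4 \cdot z^6}
  \;=\; \sqrt[3]{x^6 y^6 z^6}
  \;=\; x^2 y^2 z^2
\quad\Longrightarrow\quad
M(x, y, z) = x^4 y^2 + x^2 y^4 + z^6 - 3 x^2 y^2 z^2 \;\ge\; 0 .
```

Equality holds exactly when the three monomials coincide, for instance at $|x| = |y| = |z|$.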
In the following exercises we develop a general method for showing that a
form is not a sum of squares, based on the monomials that occur in the form.
This method can also be applied to reduce the size of the semidefinite program that
computes the sum of squares decomposition, as explained in Chapter 3. These ideas
are originally due to Choi, Lam, and Reznick [22].
Exercise 4.5. For a polynomial $p$ define its Newton polytope $N(p)$ to be the
convex hull of the vectors of exponents of monomials that occur in $p$. For example,
if $p = x_1 x_2^2 + x_2^2 + x_1 x_2 x_3$, then $N(p) = \mathrm{conv}(\{(1, 2, 0), (0, 2, 0), (1, 1, 1)\})$, which is
a triangle in $\mathbb{R}^3$.
Show that if $p = \sum_i q_i^2$, then
$$N(q_i) \subseteq \frac{1}{2} N(p).$$
Exercise 4.6. Calculate the Newton polytope of the Motzkin form and use Exercise 4.5 to show that the Motzkin form is not a sum of squares.
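The bookkeeping behind these two exercises can be automated in small cases. The sketch below lists the degree-3 monomials whose exponent vectors lie in $\frac{1}{2} N(M)$ for the Motzkin form, assuming its monomials are $x^4y^2$, $x^2y^4$, $z^6$, and $x^2y^2z^2$, so that $N(M)$ is the triangle with vertices $(4,2,0)$, $(2,4,0)$, $(0,0,6)$. Membership is tested with exact barycentric coordinates via Cramer's rule; this shortcut works only because the polytope here is a triangle with linearly independent vertices, and a general Newton polytope would need a convex hull or linear programming routine instead.

```python
from fractions import Fraction
from itertools import product

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def barycentric(vertices, point):
    """Solve sum_k lam_k * vertices[k] = point by Cramer's rule."""
    A = [[vertices[c][r] for c in range(3)] for r in range(3)]  # vertices as columns
    d = det3(A)
    lams = []
    for k in range(3):
        Ak = [[point[r] if c == k else A[r][c] for c in range(3)]
              for r in range(3)]
        lams.append(Fraction(det3(Ak), d))
    return lams

# Vertices of N(M) (assumed as in the lead-in), then halved.
newton_vertices = [(4, 2, 0), (2, 4, 0), (0, 0, 6)]
half_vertices = [tuple(e // 2 for e in v) for v in newton_vertices]

half_points = []
for e in product(range(4), repeat=3):
    if sum(e) != 3:
        continue  # squares of these monomials must have degree 6
    lams = barycentric(half_vertices, e)
    if all(l >= 0 for l in lams) and sum(lams) == 1:
        half_points.append(e)

print(half_points)  # exponent vectors of monomials allowed in each q_i
```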
For much more on explicit examples of nonnegative polynomials that are not
sums of squares see [22].
4.2.3
Quantitative Relationship
While Hilbert's theorem completely settles all cases of equality between $P_{n,2d}$ and
$\Sigma_{n,2d}$, it does not shed light on whether these cones are close to each other, even if
the cone of nonnegative polynomials is strictly larger. Due to the difficulty of constructing explicit examples and numerical evidence for a small number of variables
and degrees, it is tempting to assume that $\Sigma_{n,2d}$ approximates $P_{n,2d}$ fairly well.
However, it was shown in [5] that if the degree $2d$ is fixed and at least 4, then
as the number of variables $n$ grows, there are significantly more nonnegative forms
than sums of squares. We will make this statement precise and present a proof in
Section 4.9. The main idea is that, although the cones themselves are unbounded,
we can slice both cones with the same hyperplane, so that the section of each cone
is compact. We then derive separate bounds on the volume of each section.
For now we would like to note that the bounds guarantee that the difference between $P_{n,2d}$ and $\Sigma_{n,2d}$ is large only for a very large number of variables $n$.
Whether this is an artifact of the techniques used to derive the bounds is unclear.
As we will see, for a small number of variables the distinction between $P_{n,2d}$ and
$\Sigma_{n,2d}$ is quite delicate, and it is not known at what point $P_{n,2d}$ becomes much
larger than $\Sigma_{n,2d}$.
We now begin a systematic examination of differences between nonnegative
forms and sums of squares. It is actually possible to see that there exist nonnegative forms that are not sums of squares by considering values of forms on finitely
many points. The following example will illustrate this idea and explain some of
the major themes in our investigation.
4.3
The Hypercube Example
According to Hilbert's theorem the smallest cases where $P_{n,2d}$ and $\Sigma_{n,2d}$ differ are
forms in 3 variables of degree 6, and forms in 4 variables of degree 4. We take a
close look at an explicit example for the case of forms in 4 variables of degree 4.
Let $S = \{s_1, \ldots, s_8\}$ be the following set of 8 points in $\mathbb{R}^4$:
\[ S = \{(1, \pm 1, \pm 1, \pm 1)\}. \]
We will see that there is a difference between nonnegative forms and sums
of squares by simply looking at the values that nonnegative polynomials and sums
of squares take on $S$. Accordingly, let us define a projection $\pi$ from $\mathbb{R}[x]_{4,4}$ to $\mathbb{R}^8$
given by evaluation on $S$:
\[ \pi(f) = (f(s_1), \ldots, f(s_8)) \quad \text{for } f \in \mathbb{R}[x]_{4,4}. \]
We will explicitly describe the images of $P_{4,4}$ and $\Sigma_{4,4}$ under this projection. Let
$P' = \pi(P_{4,4})$ and $\Sigma' = \pi(\Sigma_{4,4})$.
As they are images of convex cones under a linear map, it is clear that both
$P'$ and $\Sigma'$ are convex cones in $\mathbb{R}^8$. Although both $P'$ and $\Sigma'$ will turn out to be
closed, projections of closed convex cones do not have to be closed in general.
Exercise 4.7. Construct a closed convex cone $C$ in $\mathbb{R}^3$ and a linear map $\pi : \mathbb{R}^3 \to \mathbb{R}^2$
such that $\pi(C)$ is not closed.
4.3.1
We first look at values on $S$ that are achievable by nonnegative forms. Let $\mathbb{R}^8_+$ be
the nonnegative orthant of $\mathbb{R}^8$:
\[ \mathbb{R}^8_+ = \{(x_1, \ldots, x_8) \mid x_i \ge 0 \text{ for } i = 1, \ldots, 8\}. \]
Since we are evaluating nonnegative polynomials, it is clear that $P' \subseteq \mathbb{R}^8_+$. We
claim that, in fact, $P' = \mathbb{R}^8_+$. In other words, any 8-tuple of nonnegative numbers
can be attained on $S$ by a globally nonnegative form. By convexity of $P'$ it suffices
to show that all the standard basis vectors $e_i$ are in $P'$. Moreover, substitutions
$x_i \to -x_i$ permute the set $S$, and therefore it is enough to show that $e_i \in P'$ for
some $i$.
Exercise 4.8. Let $p \in \mathbb{R}[x]_{4,4}$ be the following symmetric form:
\[ p = \sum_{i=1}^{4} x_i^4 + 2 \sum_{i \ne j \ne k} x_i^2 x_j x_k + 4 x_1 x_2 x_3 x_4. \]
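The values of this form on the hypercube points can be tabulated directly. The sketch below is only a numerical check and makes two assumptions that are not stated in the text: the point set is read as the 8 points $(1, \pm 1, \pm 1, \pm 1)$, and the middle sum is read as having one term $x_i^2 x_j x_k$ for each index $i$ and each unordered pair $\{j, k\}$ of distinct indices different from $i$. It says nothing about global nonnegativity of $p$.

```python
from itertools import combinations, product

def p(x):
    """Evaluate the symmetric quartic of Exercise 4.8 at a point x of R^4."""
    n = len(x)
    quartic = sum(t ** 4 for t in x)
    # Assumed reading of the middle sum: one term x_i^2 x_j x_k for each i and
    # each unordered pair {j, k} of distinct indices different from i.
    mixed = sum(
        x[i] ** 2 * x[j] * x[k]
        for i in range(n)
        for j, k in combinations([t for t in range(n) if t != i], 2)
    )
    return quartic + 2 * mixed + 4 * x[0] * x[1] * x[2] * x[3]

# Assumed point set: (1, +-1, +-1, +-1), listed with (1, 1, 1, 1) first.
S = [(1,) + signs for signs in product((1, -1), repeat=3)]
values = [p(s) for s in S]
print(values)
```

With these conventions the form is positive at $(1, 1, 1, 1)$ and vanishes at the other seven points, so its vector of values on $S$ is a positive multiple of a standard basis vector.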
4.3.2
In order to analyze the values of sums of squares, we need to take a look at the
values of the forms that we are squaring. The values of quadratic forms on S are not
linearly independent. Here is the unique (up to a constant multiple) linear relation
between the values on the points si that all quadratic forms in 4 variables satisfy:
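\[ \sum_{s_i \text{ has even number of } 1\text{s}} f(s_i) \;=\; \sum_{s_i \text{ has odd number of } 1\text{s}} f(s_i). \tag{4.1} \]
Exercise 4.9. Verify that the relation (4.1) holds for all quadratic forms $f \in \mathbb{R}[x]_{4,2}$
and that it is unique up to a constant multiple.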
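A quick computational check of relation (4.1) is possible because the relation is linear in $f$, so it is enough to test it on the 10 quadratic monomials. The sketch below assumes the point set $(1, \pm 1, \pm 1, \pm 1)$ and reads the relation as "sum over points with an even number of 1s equals sum over points with an odd number of 1s"; the `rank` helper is a plain exact Gaussian elimination written for this illustration.

```python
from fractions import Fraction
from itertools import combinations_with_replacement, product

# Assumed point set of the hypercube example: (1, +-1, +-1, +-1).
S = [(1,) + signs for signs in product((1, -1), repeat=3)]

# Monomial basis x_i x_j (i <= j) of quadratic forms in 4 variables.
monomials = list(combinations_with_replacement(range(4), 2))

# Evaluation matrix: row for each point s, column for each monomial x_i x_j.
E = [[s[i] * s[j] for (i, j) in monomials] for s in S]

# Weight +1 for points with an even number of 1s, -1 for an odd number.
weights = [1 if s.count(1) % 2 == 0 else -1 for s in S]

# By linearity, (4.1) holds for all quadratic forms iff it holds on each monomial.
relation_holds = all(
    sum(w * row[c] for w, row in zip(weights, E)) == 0
    for c in range(len(monomials))
)

def rank(rows):
    """Rank of a rational matrix by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

image_dimension = rank(E)
print(relation_holds, image_dimension)
```

An image of dimension 7 inside $\mathbb{R}^8$ means that the values of quadratic forms on $S$ satisfy exactly one linear relation up to scaling, which is consistent with the uniqueness claim.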
We are now ready to see how the relation (4.1) prevents sums of squares from
attaining all values in $\mathbb{R}^8_+$.
Proposition 4.10 (Hilbert's original insight). Let $e_i$ be the $i$th standard basis
vector in $\mathbb{R}^8$. Then $e_i \notin \Sigma'$ for all $i$.
Proof. Since we did not attach a specific labeling to the points of $S$ it will suffice
to show that $e_1 \notin \Sigma' = \pi(\Sigma_{4,4})$. Suppose that there exists $p \in \Sigma_{4,4}$ such that
$\pi(p) = e_1$. Write $p = \sum_j q_j^2$ for some $q_j \in \mathbb{R}[x]_{4,2}$. The form $p$ vanishes on $s_2, \ldots, s_8$,
and it has value 1 on $s_1$. Since $p = \sum_j q_j^2$ it follows that each $q_j$ vanishes on
$s_2, \ldots, s_8$. Each $q_j$ is a quadratic form in 4 variables, and therefore each $q_j$ satisfies
relation (4.1). From this relation it follows that $q_j(s_1) = 0$ for all $j$. Therefore
$p(s_1) = 0$, which is a contradiction.
Hilbert's original proof did not use an explicit example to show that the vectors $e_i$ can be realized as values of a nonnegative form, which we did in Exercise
4.8. Instead he provided a recipe for constructing such a form, and proved that
the construction works. We largely followed Hilbert's recipe to construct our counterexample. For more information on Hilbert's construction see [23].
4.3.3
Complete Description of $\Sigma'$
We can do better than just describing some points that are not in $\Sigma'$. Our next goal
is to completely describe $\Sigma'$ and, in particular, we will see how far the points $e_i$ are
from being the values of a sum of squares.
We use $\pi$ to also denote the same evaluation projection on quadratic forms in
4 variables:
\[ \pi(f) = (f(s_1), \ldots, f(s_8)) \quad \text{for } f \in \mathbb{R}[x]_{4,2}. \]
Let $L$ be the projection of the entire vector space of quadratic forms:
\[ L = \pi(\mathbb{R}[x]_{4,2}). \]
Using relation (4.1) and Exercise 4.9 we see that $L$ is a hyperplane in $\mathbb{R}^8$. Let
$C$ be the set of points that are coordinatewise squares of points in $L$:
\[ C = \{(v_1^2, \ldots, v_8^2) \mid v = (v_1, \ldots, v_8) \in L\}. \]
We first show the following description of $\Sigma'$.
Lemma 4.11. $\Sigma'$ is equal to the convex hull of $C$:
\[ \Sigma' = \mathrm{conv}(C). \]
Proof. Let $v = (v_1, \ldots, v_8) \in L$. Then there exists a quadratic form $f \in \mathbb{R}[x]_{4,2}$
such that $f(s_i) = v_i$ for $i = 1, \ldots, 8$. It follows that for the square of $f$ we have
$f^2(s_i) = v_i^2$. In other words,
\[ \pi(f^2) = (v_1^2, \ldots, v_8^2), \]
where $v = (v_1, \ldots, v_8) = \pi(f)$.
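$\sum_i q_i^2$
\[ T_m = \Big\{ (x_1, \ldots, x_m) \in \mathbb{R}^m_+ \;\Big|\; \sum_{i=1}^{m} \sqrt{x_i} \ge 2\sqrt{x_k} \text{ for all } k \Big\}. \]
\[ \|x\|_{1/2} = (\sqrt{x_1} + \cdots + \sqrt{x_m})^2. \]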
We can restate the inequalities of $T_m$ as $x_k \ge 0$ and $\|x\|_{1/2} \ge 4x_k$ for all $k$. Now
suppose that $x, y \in T_m$ and let $z = \lambda x + (1 - \lambda) y$ for some $0 \le \lambda \le 1$. It is clear that
$z_k \ge 0$ for all $k$. It is known by the Minkowski inequality [11, p. 30] that the $L^{1/2}$-norm
is a concave function:
\[ \|\lambda x + (1 - \lambda) y\|_{1/2} \ge \lambda \|x\|_{1/2} + (1 - \lambda) \|y\|_{1/2}. \]
Therefore
\[ \|z\|_{1/2} \ge \lambda \|x\|_{1/2} + (1 - \lambda) \|y\|_{1/2} \ge 4\lambda x_k + 4(1 - \lambda) y_k = 4 z_k \quad \text{for all } k. \]
Thus $T_m$ is a convex cone.
To show that $T_m$ is the convex hull of the points where $\|x\|_{1/2} = 4x_k$ for some
$k$ we proceed by induction. The base case $m = 2$ is simple since $T_2$ is just a ray
spanned by the point $(1, 1)$. For the induction step we observe that any convex set
is the convex hull of its boundary. For any point on the boundary of $T_m$ one of
the defining $2m$ inequalities must be sharp. If a point $x$ is on the boundary of $T_m$
and $x_i \ne 0$ for all $i$, then the inequalities $x_i \ge 0$ are not sharp at $x$; therefore the
inequality $\|x\|_{1/2} \ge 4x_k$ must be sharp for some $k$, and we are done.
If $x_i = 0$ for some $i$, then the point $x$ lies in the set $T_{m-1}$ in the subspace
spanned by the $m - 1$ standard basis vectors excluding $e_i$, and we are done by
induction.
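The convexity argument can be sanity-checked numerically. The sketch below assumes the description of $T_m$ as the set of nonnegative vectors $x$ with $(\sqrt{x_1} + \cdots + \sqrt{x_m})^2 \ge 4 x_k$ for every $k$; the function names, the sampling scheme, and the tolerance are choices made for this illustration, and a random test of this kind is evidence only, not a proof.

```python
import math
import random

def half_norm(x):
    """(sqrt(x_1) + ... + sqrt(x_m))^2, the quantity written ||x||_{1/2}."""
    return sum(math.sqrt(t) for t in x) ** 2

def in_T(x, tol=1e-9):
    """Membership test for T_m: x >= 0 and ||x||_{1/2} >= 4 x_k for every k."""
    return all(t >= 0 for t in x) and all(half_norm(x) >= 4 * t - tol for t in x)

def sample_T(m, rng):
    """Rejection-sample a point of T_m from the unit cube."""
    while True:
        x = [rng.random() for _ in range(m)]
        if in_T(x, tol=0.0):
            return x

rng = random.Random(0)
m = 4
convex_ok = True
for _ in range(1000):
    x, y = sample_T(m, rng), sample_T(m, rng)
    lam = rng.random()
    z = [lam * a + (1 - lam) * b for a, b in zip(x, y)]
    # Every convex combination of two points of T_m should again lie in T_m.
    convex_ok = convex_ok and in_T(z)

print(convex_ok)
```

For $m = 2$ the two inequalities force $x_1 = x_2$, so `in_T((1.0, 1.0))` holds while `in_T((1.0, 2.0))` fails, in line with the base case of the induction.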
Exercise 4.13. Show that the cone $T_4 \subset \mathbb{R}^4$ can be transformed via a nonsingular
linear transformation into the dual cone of $3 \times 3$ positive semidefinite matrices with
equal diagonal elements:
\[ \left\{ (x_1, x_2, x_3, x_4) \in \mathbb{R}^4 \ \text{ such that } \ \begin{pmatrix} x_1 & x_2 & x_3 \\ x_2 & x_1 & x_4 \\ x_3 & x_4 & x_1 \end{pmatrix} \succeq 0 \right\}. \]
If we restrict $x_1$ to being 1 then we obtain the elliptope $E_3$, which we have already
seen in Chapter 2.
(4.2)
i=1
t
+
+
t
.
Since $v_1$ has the largest absolute value among the $v_i$, it follows that
$2\sqrt{t_k} \le \sqrt{t_1} + \cdots + \sqrt{t_8}$ for all $1 \le k \le 8$. Hence we see that $\Sigma' \subseteq T_8$.
To show the reverse inclusion $T_8 \subseteq \Sigma'$ we use Lemma 4.12. It suffices to
in $\Sigma'$. Without loss of generality we may assume that $k = 1$ and we have $x_1 =$
4.4
Symmetries, Dual Cones, and Facial Structure
4.4.1
The cones $P_{n,2d}$ and $\Sigma_{n,2d}$ have a lot of built-in symmetries coming from linear
changes of coordinates. Suppose that $A \in GL_n(\mathbb{R})$ is a nonsingular linear transformation of $\mathbb{R}^n$.
Exercise 4.15. Show that if $p(x) \in \mathbb{R}[x]_{2d}$ is a nonnegative form, then $p(Ax)$ is
also a nonnegative form in $\mathbb{R}[x]_{2d}$. Similarly, if $p(x)$ is a sum of squares, then $p(Ax)$
is also a sum of squares.
In more formal terms, a nonsingular linear transformation $A$ of $\mathbb{R}^n$ induces a
nonsingular transformation $A$ of $\mathbb{R}[x]_{2d}$, which maps $p(x) \in \mathbb{R}[x]_{2d}$ to $p(A^{-1}(x))$.
We say that the group $GL_n(\mathbb{R})$ acts on $\mathbb{R}[x]_{2d}$. It follows from Exercise 4.15 that
both cones $P_{n,2d}$ and $\Sigma_{n,2d}$ are invariant under this action. In other words, $P_{n,2d}$
and $\Sigma_{n,2d}$ are invariant under nonsingular linear changes of coordinates.
Exercise 4.16. Show that, up to a constant multiple, $r^{2d} = (x_1^2 + \cdots + x_n^2)^d$ is the
only form in $\mathbb{R}[x]_{2d}$ that is fixed under all orthogonal changes of coordinates; i.e.,
it is the only form in $\mathbb{R}[x]_{2d}$ that satisfies
\[ p(x) = p(Ax) \quad \text{for all } A \in O_n, \]
where $O_n$ is the group of orthogonal transformations of $\mathbb{R}^n$.
We note that even if a linear transformation $A$ of $\mathbb{R}^n$ is singular, it still induces
a linear transformation $A$ in the same way. However the linear map $A$ will also
be singular. The map $A$ still sends $P_{n,2d}$ and $\Sigma_{n,2d}$ into themselves, but it will
no longer preserve the cones. Closed convex cones in $\mathbb{R}[x]_{2d}$ that are mapped into
themselves under any linear change of coordinates are called blenders [24].
4.4.2
Let $K$ be a convex cone in a real vector space $V$. Let $V^*$ be the dual vector space
of linear functionals on $V$. The dual cone $K^*$ is defined as the set of all linear
functionals in $V^*$ that are nonnegative on $K$:
\[ K^* = \{\ell \in V^* \mid \ell(x) \ge 0 \text{ for all } x \in K\}. \]
$f \in \mathbb{R}[x]_{2d}$.
\[ P^*_{n,2d} = \mathrm{cone}\left( \ell_v \mid v \in S^{n-1} \right). \]
Proof. Let $L_{n,2d} \subset \mathbb{R}[x]^*_{2d}$ be the conical hull of functionals $\ell_v$ with $v \in S^{n-1}$.
The dual cone $L^*_{n,2d}$ is the set of all forms $p \in \mathbb{R}[x]_{2d}$ such that
\[ \ell_v(p) = p(v) \ge 0 \quad \text{for all } v \in S^{n-1}. \]
Therefore we see that $L^*_{n,2d} = P_{n,2d}$. Using biduality we see that the dual
cone $P^*_{n,2d}$ is equal to the closure of $L_{n,2d}$:
\[ P^*_{n,2d} = (L^*_{n,2d})^* = \overline{L_{n,2d}}. \]
We now just need to show that the cone $L_{n,2d}$ is closed and then $L_{n,2d} =
\overline{L_{n,2d}}$. Consider the set $C$ of all linear functionals $\ell_v$ with $v \in S^{n-1}$. The set $C$ is
given by a continuous embedding of the unit sphere $S^{n-1}$ into $\mathbb{R}[x]^*_{2d}$, and therefore
$C$ is compact. If we can show that the convex hull of $C$ does not contain the origin,
then we are done by applying Exercise 4.17.
Let $r^{2d} = (x_1^2 + \cdots + x_n^2)^d$ be the form in $\mathbb{R}[x]_{2d}$ that is constantly 1 on
the unit sphere. Suppose that $m = \sum c_v \ell_v \in \mathrm{conv}(C)$. Then it follows that
$m(r^{2d}) = \sum c_v = 1$, and therefore $m$ cannot be the zero functional in $\mathbb{R}[x]^*_{2d}$. It
follows that $\mathrm{conv}(C)$ is a compact convex set with $0 \notin \mathrm{conv}(C)$ and we are done.
Exercise 4.19. Use the apolar inner product from Chapter 3 to identify $\mathbb{R}[x]_{2d}$
with the dual space $\mathbb{R}[x]^*_{2d}$. Show that the dual cone $P^*_{n,2d}$ is identified with the
cone of sums of $2d$th powers of linear forms:
\[ \Big\{ p \in \mathbb{R}[x]_{2d} \;\Big|\; p = \sum_i q_i^{2d} \ \text{ with } \ q_i \in \mathbb{R}[x]_{n,1} \Big\}. \]
Remark 4.20. The map that sends a point $v \in \mathbb{R}^n$ to the form $(v_1 x_1 + \cdots + v_n x_n)^{2d}$
is called the $2d$th Veronese embedding and its image is called the Veronese variety. It
and the functionals $\ell_v$ form the complete set of extreme rays of $P^*_{n,2d}$.
and convex geometry point of view. For example, given a linear functional
4.4.3
The boundary and the interior of the cone of nonnegative forms $P_{n,2d}$ are easy to
4.4.4
Exposed faces of $P_{n,2d}$ are conceptually easy to understand due to our knowledge of
4.4.5
The cone $P_{n,2d}$ has many nonexposed faces. If a nonnegative form $p$ has a zero at a point
$v \in \mathbb{R}^n$, then it must have a double zero at $v$. Exposed faces of $P_{n,2d}$ capture double
zeroes on any set of points $v_1, \ldots, v_k$, but exposed faces fail to capture zeroes of
higher order.
4.4.6
Algebraic Boundaries
The boundaries of the cones $P_{n,2d}$ and $\Sigma_{n,2d}$ are hypersurfaces in $\mathbb{R}[x]_{2d}$. Suppose
that we would like to describe these hypersurfaces by polynomial equations. This
leads to the notion of algebraic boundary of the cones $P_{n,2d}$ and $\Sigma_{n,2d}$, which is
obtained by taking the Zariski closure of the boundary hypersurfaces. As explained
in Chapter 5, the algebraic boundary of $P_{n,2d}$ is cut out by a single polynomial, the
discriminant. The algebraic boundary of the cone of sums of squares is significantly
more complicated.
Exercise 4.28. Show that the hypersurface cut out by the discriminant is a component of the algebraic boundary of $\Sigma_{n,2d}$.
The above exercise shows that the algebraic boundary of $P_{n,2d}$ is included in
the algebraic boundary of $\Sigma_{n,2d}$. This seems counterintuitive, but it occurs because
we passed to the Zariski closures of the actual boundaries. We will see below that
for $\Sigma_{3,6}$ and $\Sigma_{4,4}$ the algebraic boundary of the cone of sums of squares has one
more component, which is described in Exercise 4.51.
4.5
Generalizing the Hypercube Example
4.5.1
4.5.2
Zero-Dimensional Intersections
\[ P' = \pi_S(P_{n,2d}). \]
Before proving Theorem 4.29 we make some remarks. As we know from Exercise 4.7 we cannot simply conclude that $P' = H \cap \mathbb{R}^k_+$ using a closure argument,
since a projection of a closed cone does not have to be closed. We now show that
this occurs for evaluation projections as well.
Exercise 4.30. Let $S \subset \mathbb{R}^5$ be the set of 16 points $S = \{(1, \pm 1, \pm 1, \pm 1, \pm 1)\}$. Show
that $S$ can be defined as a common zero set of four quadratic forms in $\mathbb{R}[x]_{5,2}$, and
use Theorem 4.29 to show that $\mathbb{R}^{16}_{++} \subset \pi_S(P_{5,4})$. Show that the standard basis
vectors $e_i \in \mathbb{R}^{16}$ are not in the image $\pi_S(P_{5,4})$. In other words, the vectors $e_i$ are
not realized as values on $S$ of a nonnegative form of degree 4 in 5 variables, but all
strictly positive points in $\mathbb{R}^{16}_{++}$ are realized.
Proof of Theorem 4.29. Let $v = (v_1, \ldots, v_k) \in H \cap \mathbb{R}^k_{++}$. Since $v \in H$ there
exists a form $f \in \mathbb{R}[x]_{2d}$ such that $f(s_i) = v_i$. Let $g = q_1^2 + \cdots + q_m^2$, where $q_i$ are
the forms defining $V$. We claim that for large enough $\lambda \in \mathbb{R}$ the form $\bar{f} = f + \lambda g$
will be nonnegative, and since each $q_i$ is zero on $S$ we will also have $\pi_S(\bar{f}) = v$.
By homogeneity of $\bar{f}$ it suffices to show that it is nonnegative on the unit
sphere $S^{n-1}$. Furthermore, we may assume that the evaluation points $s_i$ lie on the
unit sphere. Since we are dealing with forms, evaluation on the points outside of
the unit sphere amounts to rescaling of the values on $S^{n-1}$.
Let $B_\epsilon(S)$ be the open epsilon neighborhood of $S$ in the unit sphere $S^{n-1}$.
Since $f(s_i) > 0$ for all $i$, it follows that for sufficiently small $\epsilon$ the form $f$ is strictly
positive on $B_\epsilon(S)$:
\[ f(x) > 0 \quad \text{for all } x \in B_\epsilon(S). \]
The complement of $B_\epsilon(S)$ in $S^{n-1}$ is compact, and therefore we can let $m_1$ be the
minimum of $g$ and $m_2$ be the minimum of $f$ on $S^{n-1} \setminus B_\epsilon(S)$. If $m_2 \ge 0$, then $f$
itself is nonnegative and we are done. Therefore, we may assume $m_2 < 0$. We also
note that since $g$ vanishes on $S$ only, it follows that $m_1$ is strictly positive.
Now let $\lambda \ge -\frac{m_2}{m_1}$. The form $\bar{f} = f + \lambda g$ is positive on $B_\epsilon(S)$. By construction
of $B_\epsilon(S)$ we also see that the minimum of $\bar{f}$ on the complement of $B_\epsilon(S)$ is at
least 0. Therefore $\bar{f}$ is nonnegative on the unit sphere $S^{n-1}$, and we are done.
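The final estimate of the proof can be spelled out in one line. In the sketch below, $\lambda$ denotes the multiplier of $g$, $m_1 > 0$ is the minimum of $g$ and $m_2 < 0$ is the minimum of $f$ on the complement of the neighborhood $B_\epsilon(S)$ in the unit sphere, and the multiplier is assumed to satisfy $\lambda \ge -m_2/m_1$ (so in particular $\lambda > 0$):

```latex
\[
f(x) + \lambda\, g(x) \;\ge\; m_2 + \lambda\, m_1
\;\ge\; m_2 + \Big(-\frac{m_2}{m_1}\Big) m_1 \;=\; 0
\qquad \text{for all } x \in S^{n-1} \setminus B_\epsilon(S).
\]
```

Inside $B_\epsilon(S)$ no estimate is needed, since there $f > 0$, $g \ge 0$, and $\lambda > 0$.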
We proved in Theorem 4.29 that any set of strictly positive values on the finite
set $S$, coming from real zeroes of forms of degree $d$, can be achieved by a globally
nonnegative form of degree $2d$. We now look at the values that sums of squares can
take on such sets $S$.
4.5.3
We recall from Section 4.3 that the reason that sums of squares could not achieve all
the possible nonnegative values on the hypercube was that the values of quadratic
forms on the hypercube satisfied a linear relation. The points of the hypercube
come from common zeroes of the quadratic forms, as we have seen in Section 4.5.1.
There is a general theory in algebraic geometry on the number of relations
that values of forms of certain degree have to satisfy on finite sets of points. These
relations are known as Cayley-Bacharach relations. For more details we refer the
reader to [10].
At first glance it is surprising that there should be any linear relation at all.
If the points were chosen generically then the values of forms of degree $d$ on these
points would be linearly independent, at least until we have as many points as the
dimension of the vector space of forms of degree $d$. However, our choice of points
is not generic; point sets that come from common zeroes are special.
For the cases $\mathbb{R}[x]_{4,4}$ and $\mathbb{R}[x]_{3,6}$ it is easy to establish the existence of the
linear relation by simple dimension counting. We explain the case of $\mathbb{R}[x]_{4,4}$.
Since common zeroes of real forms do not have to be real, for this section
we will work with complex forms. Suppose that $q_1, q_2, q_3 \in \mathbb{C}[x]_{4,2}$ are complex
quadratic forms in 4 variables. As before let $V$ be the complete set of projective
zeroes of some forms $q_1, q_2, q_3$:
\[ V = \{\bar{x} \in \mathbb{CP}^3 \mid q_1(\bar{x}) = q_2(\bar{x}) = q_3(\bar{x}) = 0\}. \]
Three quadratic forms in $\mathbb{C}[x]_{4,2}$ are expected to generically have $2^3 = 8$ common
zeroes. Suppose that this is the case and let $V = \{\bar{s}_1, \ldots, \bar{s}_8\}$.
For each $\bar{s}_i \in V$ let $s_i$ be an affine representative of $\bar{s}_i$ lying on the line
corresponding to $\bar{s}_i$. Let $S = \{s_1, \ldots, s_8\}$ be the set of affine representatives
corresponding to the common zeroes of $q_i$. Define $\pi_S : \mathbb{C}[x]_{4,2} \to \mathbb{C}^8$ to be the
evaluation projection.
Lemma 4.31. The values of quadratic forms in $\mathbb{C}[x]_{4,2}$ satisfy a linear relation on
the points of $S$. In other words there exist $\mu_1, \ldots, \mu_8 \in \mathbb{C}$ such that
\[ \mu_1 f(s_1) + \cdots + \mu_8 f(s_8) = 0 \quad \text{for all } f \in \mathbb{C}[x]_{4,2}. \tag{4.3} \]
Proof. The dimension of $\mathbb{C}[x]_{4,2}$ is 10. Note that the kernel of $\pi_S$ contains the
three forms $q_i$, since each $q_i$ evaluates to 0 on $S$. Therefore the dimension of the
kernel of $\pi_S$ is at least 3. It follows that the image of $\pi_S$ has dimension at most
$10 - 3 = 7$. Since the image of $\pi_S$ lies inside $\mathbb{C}^8$, it follows that there exists a linear
functional that vanishes on the image of $\pi_S$. This linear functional gives us the
desired linear relation.
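The dimension count in this proof is easy to reproduce. The sketch below uses the standard formula for the dimension of the space of forms of degree $d$ in $n$ variables and assumes, as the proof implicitly does, that the forms cutting out the points are linearly independent. The second computation applies the same count to two cubics in three variables, which is the setting of Exercise 4.33; the variable names are illustrative only.

```python
from math import comb

def dim_forms(n, d):
    """Dimension of the space of forms of degree d in n variables."""
    return comb(n + d - 1, d)

# Quadratic forms in 4 variables evaluated on the 8 common zeroes of q1, q2, q3.
quadrics_in_4 = dim_forms(4, 2)
image_bound_quadrics = quadrics_in_4 - 3  # the kernel contains q1, q2, q3

# Cubic forms in 3 variables evaluated on the 9 common zeroes of two cubics.
cubics_in_3 = dim_forms(3, 3)
image_bound_cubics = cubics_in_3 - 2      # the kernel contains q1, q2

print(quadrics_in_4, image_bound_quadrics, cubics_in_3, image_bound_cubics)
```

In both cases the bound on the dimension of the image is smaller than the number of points (7 < 8 and 8 < 9), which is what forces a linear relation among the values.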
Remark 4.32. It is possible to show in the above proof that the dimension of the
kernel of $\pi_S$ is exactly 3 and therefore the linear relation (4.3) is unique. Furthermore, it can be shown that each $\mu_i \ne 0$, or, in other words, the unique linear relation
has to involve all of the points of $S$.
Exercise 4.33. Suppose that $q_1, q_2 \in \mathbb{C}[x]_{3,3}$ are two cubic forms intersecting in
$3^2 = 9$ points in $\mathbb{CP}^2$. Let $S$ be the set of affine representatives of the common
zeroes of $q_1$ and $q_2$. Use the argument of Lemma 4.31 to show that the values of
cubic forms on $S$ satisfy a linear relation.
Exercise 4.34. The Robinson form
\[ R(x, y, z) = x^6 + y^6 + z^6 - (x^4 y^2 + x^2 y^4 + x^4 z^2 + x^2 z^4 + y^4 z^2 + y^2 z^4) + 3x^2 y^2 z^2 \]
4.6
Dual Cone of $\Sigma_{n,2d}$
We gave a simple description of the extreme rays of the dual cone $P^*_{n,2d}$ in Corollary
4.21. The description of the extreme rays of the dual cone $\Sigma^*_{n,2d}$ is significantly more
complicated. We will see that evaluation on the special finite point sets we described
in Section 4.5 will naturally lead to extreme rays of $\Sigma^*_{n,2d}$.
We first describe the connection between $\Sigma^*_{n,2d}$ and the cone of positive
semidefinite matrices that lies at the heart of semidefinite programming approaches
to polynomial optimization. To every linear functional $\ell \in \mathbb{R}[x]^*_{2d}$ we can associate
a quadratic form $Q_\ell$ defined on $\mathbb{R}[x]_d$ by setting
\[ Q_\ell(f) = \ell(f^2) \quad \text{for all } f \in \mathbb{R}[x]_d. \]
The cone $\Sigma^*_{n,2d}$ can be thought of as a section of the cone of positive semidefinite quadratic forms. We now show how this description arises.
Lemma 4.35. Let $\ell$ be a linear functional in $\mathbb{R}[x]^*_{2d}$. Then $\ell \in \Sigma^*_{n,2d}$ if and only
if the quadratic form $Q_\ell$ is positive semidefinite.
Proof. Suppose that $\ell \in \Sigma^*_{n,2d}$. Then $\ell(f^2) \ge 0$ for all $f \in \mathbb{R}[x]_d$. Therefore
$Q_\ell(f) \ge 0$ for all $f \in \mathbb{R}[x]_d$ and $Q_\ell$ is positive semidefinite.
Now suppose that $Q_\ell$ is positive semidefinite. Then $\ell(f^2) \ge 0$ for all $f \in \mathbb{R}[x]_d$.
Let $g = \sum f_i^2 \in \Sigma_{n,2d}$. Then $\ell(g) = \sum \ell(f_i^2) \ge 0$ and $\ell \in \Sigma^*_{n,2d}$.
An Aside: The Monomial Basis and Moment Matrices
Suppose that we fix the monomial basis for $\mathbb{R}[x]_d$. Given a linear functional
$\ell \in \mathbb{R}[x]^*_{2d}$ we can write an explicit matrix $M(\ell)$ for the quadratic form $Q_\ell$ using
the monomial basis of $\mathbb{R}[x]_d$. The matrix $M(\ell)$ is known as the moment matrix or
generalized Hankel matrix. The entries of $M(\ell)$ are indexed by monomials $x^\alpha, x^\beta \in
\mathbb{R}[x]_d$. The entry $M(\ell)_{\alpha,\beta}$ is given by evaluating $\ell$ on $x^\alpha x^\beta = x^{\alpha+\beta}$:
\[ M(\ell)_{\alpha,\beta} = \ell(x^{\alpha+\beta}). \]
For example, consider the linear functional $\ell_v : \mathbb{R}[x]_{2,4} \to \mathbb{R}$ given by evaluation on $v = (1, 2)$. The monomial basis of $\mathbb{R}[x]_{2,2}$ is given by $x^2, xy, y^2$ and the moment matrix is
\[ M(\ell_v) = \begin{pmatrix} 1 & 2 & 4 \\ 2 & 4 & 8 \\ 4 & 8 & 16 \end{pmatrix}. \]
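The moment matrix of a point evaluation can be computed mechanically from the definition: the entry indexed by two monomials is the value of their product at the point. The sketch below is a minimal illustration for evaluation at $v = (1, 2)$ with the monomial basis ordered as $x^2, xy, y^2$; the function names `exponents`, `evaluate_monomial`, and `moment_matrix` are hypothetical helpers, not part of any library.

```python
def exponents(n, d):
    """Exponent vectors of all monomials of degree d in n variables,
    ordered with the highest power of the first variable first."""
    if n == 1:
        return [(d,)]
    return [(k,) + rest
            for k in range(d, -1, -1)
            for rest in exponents(n - 1, d - k)]

def evaluate_monomial(alpha, v):
    """Value of the monomial x^alpha at the point v."""
    result = 1
    for a, t in zip(alpha, v):
        result *= t ** a
    return result

def moment_matrix(v, d):
    """Moment matrix of the point evaluation at v on forms of degree 2d:
    the (alpha, beta) entry is the value of x^(alpha + beta) at v."""
    basis = exponents(len(v), d)
    return [[evaluate_monomial(tuple(a + b for a, b in zip(alpha, beta)), v)
             for beta in basis]
            for alpha in basis]

M = moment_matrix((1, 2), 2)
for row in M:
    print(row)
```

Every row of the result is a multiple of $(1, 2, 4)$, the vector of values of $x^2, xy, y^2$ at $v$, so the matrix has rank 1.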
The rank of the quadratic form $Q_\ell$ is the same as the rank of its moment matrix
$M(\ell)$, and $Q_\ell$ being nonnegative is equivalent to having a positive semidefinite
moment matrix $M(\ell)$. However, the moment approach is tied to the specific choice
of the monomial basis. Below we prefer to keep a basis independent approach
with emphasis on the underlying geometry, but we note that the results are readily
translatable into the terminology of moments.
Let $S^{n,d}$ be the vector space of real quadratic forms on $\mathbb{R}[x]_d$. We can view
the dual space $\mathbb{R}[x]^*_{2d}$ as a subspace of $S^{n,d}$ by identifying the linear functional
$\ell \in \mathbb{R}[x]^*_{2d}$ with its quadratic form $Q_\ell$. Let $S^{n,d}_+$ be the cone of positive semidefinite
forms in $S^{n,d}$:
\[ S^{n,d}_+ = \left\{ Q \in S^{n,d} \mid Q(f) \ge 0 \text{ for all } f \in \mathbb{R}[x]_d \right\}. \]
We can restate Lemma 4.35 as follows.
Corollary 4.36. The cone $\Sigma^*_{n,2d}$ is the section of the cone of positive semidefinite
matrices $S^{n,d}_+$ with the subspace $\mathbb{R}[x]^*_{2d}$:
\[ \Sigma^*_{n,2d} = S^{n,d}_+ \cap \mathbb{R}[x]^*_{2d}. \]
We would like to see what separates sums of squares from nonnegative forms.
The extreme rays of $\Sigma^*_{n,2d}$ cut out the cone of sums of squares. Therefore we would
like to find extreme rays of $\Sigma^*_{n,2d}$ that are not in the dual cone $P^*_{n,2d}$, since these
are the functionals that distinguish the cone of sums of squares from the cone of
nonnegative forms.
Formally the dual cone $\Sigma^*_{n,2d}$ is defined as the cone of linear functionals nonnegative on $\Sigma_{n,2d}$, which is equivalent to being nonnegative on squares. One way
of constructing linear functionals nonnegative on squares is to consider point evaluation functionals $\ell_v$ with $v \in \mathbb{R}^n$ that send $p \in \mathbb{R}[x]_{2d}$ to $p(v)$. However, as we
have seen in Corollary 4.21, point evaluation functionals are precisely the extreme
rays of $P^*_{n,2d}$. Therefore, these linear functionals are not helpful in distinguishing
between $\Sigma^*_{n,2d}$ and $P^*_{n,2d}$. Our goal now is to find a new way of constructing functionals nonnegative on squares and also to understand why such functionals do not
exist when $\Sigma_{n,2d} = P_{n,2d}$.
We showed in Corollary 4.36 that the cone $\Sigma^*_{n,2d}$ is a spectrahedron. We
now prove a general lemma about spectrahedra that states that extreme rays of a
spectrahedron are quadratic forms with maximal kernel [20]. The examination of
the kernels of extreme rays of $\Sigma^*_{n,2d}$ will provide a crucial tool for our understanding
of $\Sigma^*_{n,2d}$.
Let S be the vector space of quadratic forms on a real vector space V . Let
S+ be the cone of psd forms in S.
Lemma 4.39. Let $L$ be a linear subspace of $S$ and let $K$ be the section of $S_+$
with $L$:
\[ K = S_+ \cap L. \]
Suppose that a quadratic form $Q$ spans an extreme ray of $K$. Then the kernel of $Q$
is maximal for all quadratic forms in $L$: if $P \in L$ and $\ker Q \subseteq \ker P$ then $P = \lambda Q$
for some $\lambda \in \mathbb{R}$.
Proof. Suppose not, so that there exists an extreme ray $Q$ of $K$ and a quadratic
form $P \in L$ such that $\ker Q \subseteq \ker P$ and $P \ne \lambda Q$. Since $\ker Q \subseteq \ker P$ it follows
that all eigenvectors of both $Q$ and $P$ corresponding to nonzero eigenvalues lie in
the orthogonal complement $(\ker Q)^\perp$ of $\ker Q$. Furthermore, $Q$ is positive definite
on $(\ker Q)^\perp$.
It follows that $Q$ and $P$ can be simultaneously diagonalized to matrices $Q'$
and $P'$ with the additional property that whenever the diagonal entry $Q'_{ii}$ is 0 the
corresponding entry $P'_{ii}$ is also 0. Therefore, for sufficiently small $\epsilon \in \mathbb{R}$ we have
that $Q + \epsilon P$ and $Q - \epsilon P$ are positive semidefinite and therefore $Q + \epsilon P, Q - \epsilon P \in K$.
Then $Q$ is not an extreme ray of $K$, which is a contradiction.
We now apply Lemma 4.39 to the case $\Sigma^*_{n,2d}$. This gives us a crucial tool for
studying extreme rays of $\Sigma^*_{n,2d}$.
Corollary 4.40. Suppose that $Q_\ell$ spans an extreme ray of $\Sigma^*_{n,2d}$. Then either
$\operatorname{rank} Q_\ell = 1$ or the forms in the kernel of $Q_\ell$ have no common zeroes, real or complex.
Proof. Let $W \subset \mathbb{R}[x]_d$ be the kernel of $Q_\ell$ and suppose that the forms in $W$
have a common real zero $v \ne 0$. Let $\ell_v \in \mathbb{R}[x]^*_{2d}$ be the linear functional given
by evaluation at $v$: $\ell_v(f) = f(v)$ for all $f \in \mathbb{R}[x]_{2d}$. Then $Q_{\ell_v}$ is a rank 1 positive
semidefinite quadratic form and $\ker Q_\ell \subseteq \ker Q_{\ell_v}$. By Lemma 4.39 it follows that
$Q_\ell = \lambda Q_{\ell_v}$ and thus $Q_\ell$ has rank 1.
Now suppose that the forms in $W$ have a common complex zero $z \ne 0$. Let
$\ell_z \in \mathbb{R}[x]^*_{2d}$ be the linear functional given by taking the real part of the value at $z$:
$\ell_z(f) = \operatorname{Re} f(z)$ for all $f \in \mathbb{R}[x]_{2d}$. It is easy to check that the kernel of $Q_{\ell_z}$ includes
all forms that vanish at $z$ and therefore $W \subseteq \ker Q_{\ell_z}$. Therefore by applying Lemma
4.39 we again see that $Q_\ell = \lambda Q_{\ell_z}$. However, we claim that $Q_{\ell_z}$ is not a positive
semidefinite form.
The quadratic form $Q_{\ell_z}$ is given by $Q_{\ell_z}(f) = \operatorname{Re} f^2(z)$ for $f \in \mathbb{R}[x]_d$. However,
there exist $f \in \mathbb{R}[x]_d$ such that $f(z)$ is purely imaginary and therefore $Q_{\ell_z}(f) < 0$.
The corollary now follows.
Corollary 4.40 shows that extreme rays of $\Sigma^*_{n,2d}$ are of two types: either they
are rank 1 quadratic forms or they have a kernel with no common zeroes. We now
deal with the rank 1 extreme rays of $\Sigma^*_{n,2d}$. For $v \in \mathbb{R}^n$ let $\ell_v$ be the linear functional
in $\mathbb{R}[x]^*_{2d}$ given by evaluation at $v$,
\[ \ell_v(f) = f(v) \quad \text{for } f \in \mathbb{R}[x]_{2d}, \]
and let $Q_v$ be the quadratic form associated to $\ell_v$: $Q_v(f) = f^2(v)$. In this case we
say that $Q_v$ (or $\ell_v$) corresponds to point evaluation. Recall that the inequalities
$\ell_v \ge 0$ are the defining inequalities of the cone of nonnegative forms $P_{n,2d}$. The
following lemma shows that all rank 1 forms in $\mathbb{R}[x]^*_{2d}$ correspond to point evaluations. Since we are interested in the inequalities that are valid on $\Sigma_{n,2d}$ but not
valid on $P_{n,2d}$ it allows us to disregard rank 1 extreme rays of $\Sigma^*_{n,2d}$ and focus on
the case of a kernel with no common zeros.
Lemma 4.41. Suppose that $Q$ is a rank 1 quadratic form in $\mathbb{R}[x]^*_{2d}$. Then $Q = \lambda Q_v$
for some $v \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}$.
Proof. Let $Q$ be a rank 1 form in $\mathbb{R}[x]^*_{2d}$. Then $Q(f) = \lambda s^2(f)$ for some linear
functional $s \in \mathbb{R}[x]^*_d$. Therefore it suffices to show that if $Q = s^2(f)$ for some
$s \in \mathbb{R}[x]^*_d$, then $Q = Q_v$ for some $v \in \mathbb{R}^n$.
Since $Q \in \mathbb{R}[x]^*_{2d}$ we know that $Q$ is defined by $Q(f) = \ell(f^2)$ for a linear
functional $\ell \in \mathbb{R}[x]^*_{2d}$ and therefore $\ell(f^2) = s^2(f)$ for all $f \in \mathbb{R}[x]_d$. We have $Q(f +
g) = \ell((f + g)^2) = \ell(f^2) + 2\ell(fg) + \ell(g^2) = (s(f) + s(g))^2 = s^2(f) + 2s(f)s(g) + s^2(g)$
and it follows that $\ell(fg) = s(f)s(g)$ for all $f, g \in \mathbb{R}[x]_d$.
Let $x^\alpha$ denote the monomial $x_1^{\alpha_1} \cdots x_n^{\alpha_n}$. If we take monomials $x^\alpha, x^\beta, x^\gamma, x^\delta$
in $\mathbb{R}[x]_d$ such that $x^\alpha x^\beta = x^\gamma x^\delta$, then we must have $s(x^\alpha)s(x^\beta) = s(x^\gamma)s(x^\delta)$.
Suppose that $s(x_i^d) = 0$ for all $i$. Then we see that
\[ s(x_i^{d-1} x_j)^2 = s(x_i^d)\, s(x_i^{d-2} x_j^2) = 0, \]
and continuing in similar fashion we have $s(x^\alpha) = 0$ for all monomials in $\mathbb{R}[x]_d$.
Then $\ell$ is the zero functional and $Q$ does not have rank one, which is a contradiction.
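We may assume without loss of generality that $s(x_1^d) \ne 0$. Since we are interested in $\ell(f^2) = s^2(f)$ we can work with $-s$ if necessary, and thus we may assume
that $s(x_1^d) > 0$. Let $s_i = s(x_1^{d-1} x_i)$ for $1 \le i \le n$. We will express $s(x^\alpha)$ in
terms of $s_i$ for all $x^\alpha \in \mathbb{R}[x]_d$. Since $(x_1^d)(x_1^{d-2} x_i x_j) = (x_1^{d-1} x_i)(x_1^{d-1} x_j)$ we have
$s(x_1^{d-2} x_i x_j) = s_i s_j / s_1$. Continuing in this fashion we find that
\[ s(x_1^{\alpha_1} \cdots x_n^{\alpha_n}) = \frac{s_2^{\alpha_2} \cdots s_n^{\alpha_n}}{s_1^{d - \alpha_1 - 1}}. \]
\[ v = \left( s_1^{1/d},\ s_1^{-(d-1)/d} s_2,\ \ldots,\ s_1^{-(d-1)/d} s_n \right). \]
\[ s_v(x_1^{\alpha_1} \cdots x_n^{\alpha_n}) = s_2^{\alpha_2} \cdots s_n^{\alpha_n}\, s_1^{\frac{\alpha_1}{d} - \frac{(d-1)(\alpha_2 + \cdots + \alpha_n)}{d}} = \frac{s_2^{\alpha_2} \cdots s_n^{\alpha_n}}{s_1^{d - \alpha_1 - 1}}. \]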
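The closed formula for the values of $s$ on monomials can be checked in the forward direction with exact arithmetic. The sketch below assumes $s$ is evaluation at a rational point $v$ with $v_1 > 0$, sets $s_i = v_1^{d-1} v_i$ (the value of $x_1^{d-1} x_i$ at $v$), and compares the value of each monomial $x^\alpha$ of degree $d$ at $v$ with $s_2^{\alpha_2} \cdots s_n^{\alpha_n} / s_1^{d - \alpha_1 - 1}$. The point and the degree used here are arbitrary choices for illustration, and the helper names are hypothetical.

```python
from fractions import Fraction

def exponents(n, d):
    """Exponent vectors of all monomials of degree d in n variables."""
    if n == 1:
        return [(d,)]
    return [(k,) + rest
            for k in range(d, -1, -1)
            for rest in exponents(n - 1, d - k)]

def evaluate_monomial(alpha, v):
    """Value of the monomial x^alpha at the point v."""
    result = Fraction(1)
    for a, t in zip(alpha, v):
        result *= t ** a
    return result

def formula_holds(v, d):
    """Check x^alpha(v) == s_2^a2 ... s_n^an / s_1^(d - a1 - 1) for all alpha,
    where s_i = v_1^(d-1) * v_i is the value of x_1^(d-1) x_i at v."""
    s = [v[0] ** (d - 1) * vi for vi in v]
    for alpha in exponents(len(v), d):
        numerator = Fraction(1)
        for si, a in zip(s[1:], alpha[1:]):
            numerator *= si ** a
        predicted = numerator / s[0] ** (d - alpha[0] - 1)
        if predicted != evaluate_monomial(alpha, v):
            return False
    return True

v = (Fraction(2), Fraction(-3), Fraction(1, 2))
print(formula_holds(v, 3))
```

Exact rational arithmetic is used so that the comparison is an identity check rather than a floating-point approximation.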
Suppose that $Q_\ell$ spans an extreme ray of $\Sigma^*_{n,2d}$ that does not correspond to
point evaluation. Let $W_\ell$ be the kernel of $Q_\ell$. Then by Corollary 4.40 and Lemma
4.41 we know that the forms in $W_\ell$ have no common zeroes real or complex. This
condition gives us a lot of dimensional information about $W_\ell$ and places strong
restrictions on the linear functionals $\ell$. As we will see, for the three equality cases
of Hilbert's theorem the dimensional restrictions on $W_\ell$ will allow us to derive nonexistence of the extreme rays of $\Sigma^*_{n,2d}$ with kernel $W_\ell$, thus proving the equality
between nonnegative forms and sums of squares.
Let $W$ be a linear subspace of $\mathbb{R}[x]_d$ and define $W^{\langle 2 \rangle}$ to be the degree $2d$ part
of the ideal generated by $W$:
\[ W^{\langle 2 \rangle} = \langle W \rangle_{2d}. \]
We use $V_{\mathbb{C}}(W)$ to denote the set of common zeroes (real and complex) of forms
in $W$.
We next show that there is a strong relation between the linear functional $\ell$
and the kernel $W_\ell$ of the quadratic form $Q_\ell$. Namely, we show that $\ell$ vanishes on
all of $W_\ell^{\langle 2 \rangle}$:
\[ \ell(p) = 0 \quad \text{for all } p \in W_\ell^{\langle 2 \rangle}. \tag{4.4} \]
We will write the condition (4.4) as $\ell(W_\ell^{\langle 2 \rangle}) = 0$ for short. We also now show
that $W_\ell$ is the maximal subspace among all $W$ such that $\ell(W^{\langle 2 \rangle}) = 0$.
Lemma 4.42. Let $Q_\ell$ be a quadratic form in $\Sigma^*_{n,2d}$ and let $W_\ell \subset \mathbb{R}[x]_d$ be the
kernel of $Q_\ell$. Then $p \in W_\ell$ if and only if $\ell(pq) = 0$ for all $q \in \mathbb{R}[x]_d$.
\[ \frac{Q_\ell(p + q) - Q_\ell(p) - Q_\ell(q)}{2} \quad \text{for } p, q \in \mathbb{R}[x]_d. \]
\[ = n \dim \mathbb{R}[x]_d - \binom{n}{2}. \]
4.6.1
We have obtained enough information on the dual cone $\Sigma^*_{n,2d}$ to give a unified proof
of the equality cases of Hilbert's theorem.
Proof of equality cases in Hilbert's theorem. Suppose that $\Sigma_{n,2d} \ne P_{n,2d}$.
Then there exists an extreme ray of $\Sigma^*_{n,2d}$ that does not come from point evaluation.
Let $\ell$ be such an extreme ray and let $W_\ell$ be the kernel of $Q_\ell$. By Lemma 4.41 it
follows that $\operatorname{rank} Q_\ell > 1$, and therefore by Corollary 4.40 we see that $V_{\mathbb{C}}(W_\ell) = \emptyset$.
Therefore $\dim W_\ell \ge n$ and we can find forms $p_1, \ldots, p_n \in W_\ell$ such that
$V_{\mathbb{C}}(p_1, \ldots, p_n) = \emptyset$. Let $I = \langle p_1, \ldots, p_n \rangle$ be the ideal generated by $p_i$. It follows
that $W_\ell^{\langle 2 \rangle}$ includes $I_{2d}$ and $\dim I_{2d} = n \dim \mathbb{R}[x]_d - \binom{n}{2}$ by Lemma 4.43. Therefore
we see that
\[ \dim W_\ell^{\langle 2 \rangle} \ge n \dim \mathbb{R}[x]_d - \binom{n}{2}. \]
However, by (4.4) we must also have
\[ \dim W_\ell^{\langle 2 \rangle} \le \dim \mathbb{R}[x]_{2d} - 1, \]
4.7
Ranks of Extreme Rays of $\Sigma^*_{3,6}$ and $\Sigma^*_{4,4}$
We first examine, in the cases (3, 6) and (4, 4), the structure of linear functionals
$\ell \in \mathbb{R}[x]^*_{2d}$ with a given kernel $W_\ell$ such that $V_{\mathbb{C}}(W_\ell) = \emptyset$.
Proposition 4.45. Let $W$ be a three-dimensional subspace of $\mathbb{R}[x]_{3,3}$ such that
$V_{\mathbb{C}}(W) = \emptyset$. Then $\dim W^{\langle 2 \rangle} = 27$ and there exists a unique quadratic form $Q_\ell \in
\mathbb{R}[x]^*_{3,6}$ containing $W$ in its kernel. Furthermore $\ker Q_\ell = W$.
Before we prove Proposition 4.45 we note that the unique form $Q_\ell$ with kernel
$W$ need not be positive semidefinite. The investigation of positive definiteness of
$Q_\ell$ will lead us to evaluation on finite point sets in the next section.
Proof of Proposition 4.45. By applying Lemma 4.43 we see that
\[ \dim W^{\langle 2 \rangle} = 3 \dim \mathbb{R}[x]_{3,3} - 3 = 27. \]
Since $\dim \mathbb{R}[x]_{3,6} = 28$ it follows that $W^{\langle 2 \rangle}$ is a hyperplane in $\mathbb{R}[x]_{3,6}$ and
therefore there is a unique linear functional $\ell$ vanishing on $W^{\langle 2 \rangle}$. By Lemma 4.42 it
follows that $Q_\ell$ is the unique (up to a constant multiple) quadratic form with $W$ in
its kernel.
We leave the part that the dimension of the kernel of $Q_\ell$ cannot be more
than 3 as an exercise.
There is also the corresponding proposition for the case (4, 4) with the same
proof.
Proposition 4.46. Let $W$ be a four-dimensional subspace of $\mathbb{R}[x]_{4,2}$ such that
$V_{\mathbb{C}}(W) = \emptyset$. Then $\dim W^{\langle 2 \rangle} = 34$ and there exists a unique quadratic form $Q_\ell \in
\mathbb{R}[x]^*_{4,4}$ containing $W$ in its kernel. Furthermore $\ker Q_\ell = W$.
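The numbers 27 and 34 in these propositions, and the dimension comparison behind the equality cases of Hilbert's theorem, can be tabulated with a few lines of code. The sketch below takes the formula $n \dim \mathbb{R}[x]_{n,d} - \binom{n}{2}$ as quoted from Lemma 4.43 (the lemma itself and its hypotheses are not reproduced here) and compares it with $\dim \mathbb{R}[x]_{n,2d} - 1$; the list of sample pairs $(n, d)$ and the function names are choices made for this illustration.

```python
from math import comb

def dim_forms(n, d):
    """Dimension of the space of forms of degree d in n variables."""
    return comb(n + d - 1, d)

def lower_bound(n, d):
    """n * dim R[x]_{n,d} - C(n, 2), the formula quoted from Lemma 4.43."""
    return n * dim_forms(n, d) - comb(n, 2)

def upper_bound(n, d):
    """dim R[x]_{n,2d} - 1, the largest possible dimension of a proper subspace."""
    return dim_forms(n, 2 * d) - 1

# Sample (n, d) pairs: three equality cases, then the cases (3, 6) and (4, 4).
cases = {(n, d): (lower_bound(n, d), upper_bound(n, d))
         for (n, d) in [(2, 3), (3, 1), (3, 2), (3, 3), (4, 2)]}
for key, (low, high) in cases.items():
    print(key, low, high, "contradiction" if low > high else "compatible")
```

For the sampled equality cases the lower bound exceeds the upper bound, while for three cubics in three variables and four quadrics in four variables both bounds equal 27 and 34 respectively, which matches the hyperplane statements in Propositions 4.45 and 4.46.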
The equivalent corollaries hold for the case (4, 4), although the proof of Corollary 4.50 requires slightly more work, while the proof of Corollary 4.49 is exactly
the same. For complete details see [7].
Corollary 4.49. Suppose that $\ell$ spans an extreme ray of $\Sigma^*_{4,4}$ and $\ell$ does not
correspond to point evaluation. Then $\operatorname{rank} Q_\ell = 6$. Conversely, suppose that $Q_\ell$ is
a positive semidefinite form of rank 6 in $S^{4,2}_+$ and let $W_\ell$ be the kernel of $Q_\ell$. If
$V_{\mathbb{C}}(W_\ell) = \emptyset$, then $Q_\ell$ spans an extreme ray of $\Sigma^*_{4,4}$.
Corollary 4.50. Suppose that $p \in \Sigma_{4,4}$ lies on the boundary of the cone of sums
of squares and $p$ is a strictly positive form. Then $p$ is a sum of exactly 4 squares.
Corollaries 4.48 and 4.50 were used to study the algebraic boundary of the
cones $\Sigma_{3,6}$ and $\Sigma_{4,4}$ in [8].
Exercise 4.51. Show that all forms in $\mathbb{R}[x]_{3,6}$ that can be written as linear combinations of squares of 3 cubics form an irreducible hypersurface in $\mathbb{R}[x]_{3,6}$. Similarly,
show that all forms in $\mathbb{R}[x]_{4,4}$ that are linear combinations of squares of 4 quadratics also form an irreducible hypersurface in $\mathbb{R}[x]_{4,4}$. (Hint: Use Terracini's lemma.)
Use Corollaries 4.48 and 4.50 to show that the algebraic boundary of $\Sigma_{3,6}$ and $\Sigma_{4,4}$
has a single component in addition to the discriminant hypersurface.
It was shown in [8] that despite their simple definition the hypersurfaces of
Exercise 4.51 have very high degree: 83200 in the case (3, 6) and 38475 in the
case (4, 4). This shows that the boundary of the cone of sums of squares is quite
complicated from the algebraic point of view.
4.8
Extracting Finite Point Sets
We have established in the previous section that the interesting extreme rays of
$\Sigma_{3,6}^*$ have rank 7 and those of $\Sigma_{4,4}^*$ have rank 6. Let us consider the case of 4 variables
of degree 4. We have shown that a four-dimensional subspace $W$ leads to a unique
form $Q_\ell$ of rank 6 such that the kernel of $Q_\ell$ contains $W$. However, the form $Q_\ell$ does
not have to lie in $\Sigma_{4,4}^*$, since the form $Q_\ell$ is not necessarily positive semidefinite.
In order to examine positive semidefiniteness of $Q_\ell$ we reduce the problem to
looking at an evaluation on finite point sets.
Exercise 4.52. Let $W$ be a subspace of $\mathbb{R}[x]_d$ such that $V_{\mathbb{C}}(W) = \emptyset$. Show that
there exist forms $q_1, \ldots, q_{n-1} \in W$ that intersect in $d^{n-1}$ projective points in $\mathbb{CP}^{n-1}$:
$$V_{\mathbb{C}}(q_1, \ldots, q_{n-1}) = \{ s_1, \ldots, s_{d^{n-1}} \mid s_i \in \mathbb{CP}^{n-1} \}.$$
We apply this result to our case of $W \subset \mathbb{R}[x]_{4,2}$ and obtain forms $q_1, q_2, q_3 \in W$
intersecting in $2^3 = 8$ projective points $s_i \in \mathbb{CP}^3$. We can take their affine
representatives $s_1, \ldots, s_8 \in \mathbb{C}^4$. Unfortunately, even though the forms $q_i \in W$ are
real, their points of intersection may be complex.
However, as was shown in [7], the fact that the form $Q_\ell$ is positive semidefinite
restricts the number of complex zeroes. Since complex zeroes of real forms come
in conjugate pairs, the smallest nonzero number of complex zeroes that the forms $q_i$ may
have is 2.
Theorem 4.53. Suppose that $\ell \in \mathbb{R}[x]_{4,4}^*$ is an extreme ray of $\Sigma_{4,4}^*$ that does not
correspond to point evaluation and let $W$ be the kernel of $Q_\ell$. Let $q_1, q_2, q_3 \in W$
be any three forms intersecting in $2^3 = 8$ projective points in $\mathbb{CP}^3$. Then the forms
$q_i$ have at most 2 common complex zeroes. Conversely, given $q_1, q_2, q_3 \in \mathbb{R}[x]_{4,2}$
intersecting in 8 points with at most 2 of them complex, there exists an extreme ray
of $\Sigma_{4,4}^*$ whose kernel contains $q_1, q_2, q_3$.
There is an equivalent theorem for the case (3, 6).
Theorem 4.54. Suppose that $\ell \in \mathbb{R}[x]_{3,6}^*$ is an extreme ray of $\Sigma_{3,6}^*$ that does not
correspond to point evaluation and let $W$ be the kernel of $Q_\ell$. Let $q_1, q_2 \in W$ be
any two forms intersecting in $3^2 = 9$ projective points in $\mathbb{CP}^2$. Then the forms
$q_i$ have at most 2 common complex zeroes. Conversely, given $q_1, q_2 \in \mathbb{R}[x]_{3,3}$
intersecting in 9 points with at most 2 of them complex, there exists an extreme ray
of $\Sigma_{3,6}^*$ whose kernel contains $q_1, q_2$.
It is possible to apply the Cayley–Bacharach machinery explained in Section
4.5 to completely describe the structure of the extreme rays of $\Sigma_{n,2d}^*$ for the cases
(4, 4) and (3, 6) using the coefficients of the unique Cayley–Bacharach relation that
exists on the points of intersection of the forms $q_i$.
We have now come full circle, from using a finite point set to establish that
there exist nonnegative forms that are not sums of squares in Section 4.3 to showing
that these sets underlie all linear inequalities that separate $\Sigma_{n,2d}$ from $P_{n,2d}$.
4.9
Volumes
We now switch gears completely and turn to the question of the quantitative relationship between $P_{n,2d}$ and $\Sigma_{n,2d}$. Our goal is to compare the relative sizes of
the cones $P_{n,2d}$ and $\Sigma_{n,2d}$. While the cones themselves are unbounded objects, we
can take a section of each cone with the same hyperplane so that both sections are
compact.
Let $L_{n,2d}$ be an affine hyperplane in $\mathbb{R}[x]_{2d}$ consisting of all forms with integral (average) 1 on the unit sphere $S^{n-1}$ in $\mathbb{R}^n$:
$$L_{n,2d} = \left\{ p \in \mathbb{R}[x]_{2d} \;\middle|\; \int_{S^{n-1}} p \, d\sigma = 1 \right\},$$
where $\sigma$ is the rotation invariant probability measure on $S^{n-1}$. Let $\bar{P}_{n,2d}$ and
$\bar{\Sigma}_{n,2d}$ be the sections of $P_{n,2d}$ and $\Sigma_{n,2d}$ with $L_{n,2d}$:
$$\bar{P}_{n,2d} = P_{n,2d} \cap L_{n,2d} \quad \text{and} \quad \bar{\Sigma}_{n,2d} = \Sigma_{n,2d} \cap L_{n,2d}.$$
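Let $r^{2d} = (x_1^2 + \cdots + x_n^2)^d$ be the form in $\mathbb{R}[x]_{2d}$ that is constantly 1 on the
unit sphere. Convex bodies $\bar{P}_{n,2d}$ and $\bar{\Sigma}_{n,2d}$ lie in the affine hyperplane $L_{n,2d}$ of
forms of integral 1 on the unit sphere. We now translate them to lie in the linear
hyperplane $M_{n,2d}$ of forms of integral 0 on the unit sphere by subtracting $r^{2d}$:
$$\tilde{P}_{n,2d} = \bar{P}_{n,2d} - r^{2d} = \{ p \in \mathbb{R}[x]_{2d} \mid p + r^{2d} \in \bar{P}_{n,2d} \}$$
and
$$\tilde{\Sigma}_{n,2d} = \bar{\Sigma}_{n,2d} - r^{2d} = \{ p \in \mathbb{R}[x]_{2d} \mid p + r^{2d} \in \bar{\Sigma}_{n,2d} \}.$$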
(Vol K) n .
4.9.1
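Let $M_{n,2d}$ be the linear hyperplane of forms of integral 0 on the unit sphere:
$$M_{n,2d} = \left\{ p \in \mathbb{R}[x]_{2d} \;\middle|\; \int_{S^{n-1}} p \, d\sigma = 0 \right\}.$$
$$\|p\|^2 = \langle p, p \rangle = \int_{S^{n-1}} p^2 \, d\sigma = \|p\|_2^2.$$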
$$\left( \frac{\operatorname{Vol} \tilde{P}_{n,2d}}{\operatorname{Vol} B^N} \right)^{1/N} \ge \frac{1}{2\sqrt{4d+2}} \, n^{-1/2}.$$
K d,
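$$\left( \frac{\operatorname{Vol} K}{\operatorname{Vol} B^n} \right)^{1/n} \ge \int_{S^{n-1}} G_K^{-1} \, d\sigma.$$
Exercise 4.58. Use Exercise 4.57 and Jensen's inequality to show that
$$\left( \frac{\operatorname{Vol} K}{\operatorname{Vol} B^n} \right)^{1/n} \ge \left( \int_{S^{n-1}} G_K \, d\sigma \right)^{-1}.$$
$$\left( \frac{\operatorname{Vol} \tilde{P}_{n,2d}}{\operatorname{Vol} B^N} \right)^{1/N} \ge \left( \int_{S^{N-1}} \|p\|_\infty \, dp \right)^{-1}.$$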
Proof. We observe that $\bar{P}_{n,2d}$ consists of all forms of integral 1 on $S^{n-1}$ whose
minimum on $S^{n-1}$ is at least 0. Therefore $\tilde{P}_{n,2d}$ consists of all forms of integral 0
Sn1
(4.5)
xSn1
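Using Exercise 4.58 we can bound the volume of $\tilde{P}_{n,2d}$ from below:
$$\left( \frac{\operatorname{Vol} \tilde{P}_{n,2d}}{\operatorname{Vol} B^N} \right)^{1/N} \ge \left( \int_{S^{N-1}} |\min(p)| \, dp \right)^{-1}.$$
Since $|\min(p)| \le \|p\|_\infty$, it follows that
$$\left( \frac{\operatorname{Vol} \tilde{P}_{n,2d}}{\operatorname{Vol} B^N} \right)^{1/N} \ge \left( \int_{S^{N-1}} \|p\|_\infty \, dp \right)^{-1}$$
as desired.
From Lemma 4.59 we see that in order to obtain a lower bound on the volume
of $\tilde{P}_{n,2d}$ we need to find an upper bound on the average $L^\infty$-norm of forms in $S^{N-1}$:
$$\int_{S^{N-1}} \|p\|_\infty \, dp.$$
It is easy to see that the $L^\infty$-norm of any polynomial is bounded from below by any
of its $L^{2k}$-norms:
$$\|p\|_\infty \ge \|p\|_{2k}$$
for all $k$. Finding upper bounds on the $L^\infty$-norm of forms in $\mathbb{R}[x]_{2d}$ in terms of
their $L^{2k}$-norms is significantly more challenging.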
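Exercise 4.60. It was shown by Barvinok in [3] that the following inequality holds
for all $p \in \mathbb{R}[x]_{2d}$ and all $k$:
$$\|p\|_\infty \le \binom{2kd + n - 1}{2kd}^{\frac{1}{2k}} \|p\|_{2k}.$$
Use this inequality with $k = n$ to show that
$$\|p\|_\infty \le 2\sqrt{2d+1} \, \|p\|_{2n}$$
for all $p \in \mathbb{R}[x]_{2d}$.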
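The specialization $k = n$ in Exercise 4.60 can be sanity-checked numerically. The following is a minimal sketch, not a proof: it assumes the constant in Barvinok's inequality is $\binom{2kd+n-1}{2kd}^{1/(2k)}$ and that the target constant is $2\sqrt{2d+1}$, and it compares the two exactly by raising both sides to the power $2n$. The function names are illustrative only.

```python
from math import comb, sqrt


def barvinok_constant(n: int, d: int, k: int) -> float:
    """Constant binom(2kd + n - 1, 2kd)^(1/(2k)) in Barvinok's inequality."""
    return comb(2 * k * d + n - 1, 2 * k * d) ** (1.0 / (2 * k))


def bound_holds_for_k_equal_n(n: int, d: int) -> bool:
    """Exact check of binom(2nd + n - 1, 2nd)^(1/(2n)) <= 2*sqrt(2d + 1).

    Raising both sides to the power 2n turns this into the integer
    comparison binom(2nd + n - 1, 2nd) <= (4 * (2d + 1))^n.
    """
    return comb(2 * n * d + n - 1, 2 * n * d) <= (4 * (2 * d + 1)) ** n


if __name__ == "__main__":
    for n, d in [(2, 1), (3, 2), (5, 3), (10, 2)]:
        print(n, d, barvinok_constant(n, d, n), 2 * sqrt(2 * d + 1),
              bound_holds_for_k_equal_n(n, d))
```

The integer comparison avoids floating-point error, so it remains reliable for larger values of $n$ and $d$ where the binomial coefficient is huge.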
Remark 4.61. It is possible to obtain slightly better bounds for our purposes by
using $k = n \log(2d + 1)$ in the above inequality. See [4] for details.
We use Barvinok's inequality to convert the problem of bounding the average
$L^\infty$-norm on $S^{N-1}$ into bounding the average $L^{2n}$-norm. In order for this to be
useful we need upper bounds on the average $L^{2k}$-norms. We will show the following
bound.
Lemma 4.62.
$$\int_{S^{N-1}} \|p\|_{2k} \, dp \le \sqrt{2k}.$$
(4.6)
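Now let $\ell : \mathbb{R}^n \to \mathbb{R}$ be a linear form given by $\ell(x) = \langle x, \xi \rangle$ for some vector $\xi \in \mathbb{R}^n$.
Use (4.6) to show that
$$\int_{S^{n-1}} \ell(x)^{2k} \, dx = \|\xi\|^{2k} \, \frac{\Gamma\!\left(\frac{n}{2}\right) \Gamma\!\left(k + \frac{1}{2}\right)}{\Gamma\!\left(\frac{1}{2}\right) \Gamma\!\left(k + \frac{n}{2}\right)}. \tag{4.7}$$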
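Formula (4.7) can be compared against two classical special cases. The sketch below assumes that (4.7) reads $\|\xi\|^{2k}\,\Gamma(n/2)\Gamma(k+1/2)/(\Gamma(1/2)\Gamma(k+n/2))$ and that the integral is taken against the rotation invariant probability measure; the function name `sphere_moment` is hypothetical.

```python
from math import comb, gamma, isclose


def sphere_moment(n: int, k: int, xi_norm: float = 1.0) -> float:
    """Average of <x, xi>^(2k) over the unit sphere S^(n-1), following (4.7)."""
    return (xi_norm ** (2 * k)) * gamma(n / 2) * gamma(k + 0.5) / (
        gamma(0.5) * gamma(k + n / 2))


if __name__ == "__main__":
    k = 3
    # On the circle (n = 2) the average of cos^(2k) is binom(2k, k) / 4^k.
    print(sphere_moment(2, k), comb(2 * k, k) / 4 ** k)
    # On the 2-sphere (n = 3) a coordinate is uniform on [-1, 1],
    # so the average of its 2k-th power is 1 / (2k + 1).
    print(sphere_moment(3, k), 1 / (2 * k + 1))
    # For k = 1 the average of a squared coordinate is 1 / n.
    print(sphere_moment(7, 1), 1 / 7)
```

The case $k = 1$ is the one used for Lemma 4.64: the average of the square of a linear form with unit coefficient vector over the sphere in dimension $N$ equals $1/N$.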
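In order to apply the result of Exercise 4.63 we will need to know the $L^2$-norm
of a special form in $M_{n,2d}$.
Lemma 4.64. Let $v \in S^{n-1}$ be a unit vector and let $\ell_v \in M_{n,2d}$ be the form such
that
$$\langle p, \ell_v \rangle = p(v) \quad \text{for all} \quad p \in M_{n,2d}.$$
Then
$$\|\ell_v\| = \sqrt{\dim M_{n,2d}} = \sqrt{\binom{n + 2d - 1}{2d} - 1}.$$
Proof. Consider the integral
$$\int_{S^{N-1}} \langle p, \ell_v \rangle^2 \, dp.$$
On one hand it is the average of a quadratic form on the unit sphere and by Exercise
4.63 we have
$$\int_{S^{N-1}} p^2(v) \, dp = \frac{\|\ell_v\|^2}{\dim M_{n,2d}}.$$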
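$$\int_{S^{N-1}} p^2(v) \, dp = \int_{S^{n-1}} \int_{S^{N-1}} p^2(v) \, dp \, dv.$$
We observe that
$$\int_{S^{n-1}} \int_{S^{N-1}} p^2(v) \, dp \, dv = \int_{S^{N-1}} \int_{S^{n-1}} p^2(v) \, dv \, dp.$$
$$\int_{S^{N-1}} \|p\|_{2k} \, dp = \int_{S^{N-1}} \left( \int_{S^{n-1}} p^{2k}(x) \, dx \right)^{\frac{1}{2k}} dp.$$
By applying the Hölder inequality we can move the exponent $\frac{1}{2k}$ outside:
$$\int_{S^{N-1}} \|p\|_{2k} \, dp \le \left( \int_{S^{N-1}} \int_{S^{n-1}} p^{2k}(x) \, dx \, dp \right)^{\frac{1}{2k}}.$$
Interchanging the order of integration gives
$$\int_{S^{N-1}} \|p\|_{2k} \, dp \le \left( \int_{S^{n-1}} \int_{S^{N-1}} p^{2k}(x) \, dp \, dx \right)^{\frac{1}{2k}}.$$
Consider the inner integral
$$\int_{S^{N-1}} p^{2k}(x) \, dp. \tag{4.8}$$
By rotational invariance it does not depend on the choice of the point $x \in S^{n-1}$.
Therefore the outer integral over $S^{n-1}$ is redundant and we obtain
$$\int_{S^{N-1}} \|p\|_{2k} \, dp \le \left( \int_{S^{N-1}} p^{2k}(v) \, dp \right)^{\frac{1}{2k}}. \tag{4.9}$$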
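$$\int_{S^{N-1}} \|p\|_{2k} \, dp \le \left( \int_{S^{N-1}} \langle p, \ell_v \rangle^{2k} \, dp \right)^{\frac{1}{2k}}.$$
Now we see that the integral in (4.8) is actually just the average of the $2k$th
power of a linear form and we can apply Exercise 4.63 to see that
$$\int_{S^{N-1}} \langle p, \ell_v \rangle^{2k} \, dp = \|\ell_v\|^{2k} \, \frac{\Gamma\!\left(\frac{N}{2}\right) \Gamma\!\left(k + \frac{1}{2}\right)}{\Gamma\!\left(\frac{1}{2}\right) \Gamma\!\left(k + \frac{N}{2}\right)}.$$
By Lemma 4.64 we know that
$$\|\ell_v\|^2 = \dim M_{n,2d} = N.$$
Putting it all together with (4.9) we see that
$$\int_{S^{N-1}} \|p\|_{2k} \, dp \le \sqrt{N} \left( \frac{\Gamma\!\left(\frac{N}{2}\right) \Gamma\!\left(k + \frac{1}{2}\right)}{\Gamma\!\left(\frac{1}{2}\right) \Gamma\!\left(k + \frac{N}{2}\right)} \right)^{\frac{1}{2k}}.$$
$$\left( \frac{\Gamma\!\left(\frac{N}{2}\right)}{\Gamma\!\left(k + \frac{N}{2}\right)} \right)^{\frac{1}{2k}} \le \sqrt{\frac{2}{N}}
\quad \text{and} \quad
\left( \frac{\Gamma\!\left(k + \frac{1}{2}\right)}{\Gamma\!\left(\frac{1}{2}\right)} \right)^{\frac{1}{2k}} \le \sqrt{k}.$$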
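By Lemma 4.59 we have
$$\left( \frac{\operatorname{Vol} \tilde{P}_{n,2d}}{\operatorname{Vol} B^N} \right)^{1/N} \ge \left( \int_{S^{N-1}} \|p\|_\infty \, dp \right)^{-1},$$
and by Exercise 4.60
$$\|p\|_\infty \le 2\sqrt{2d+1} \, \|p\|_{2n}.$$
Therefore we see that
$$\left( \frac{\operatorname{Vol} \tilde{P}_{n,2d}}{\operatorname{Vol} B^N} \right)^{1/N} \ge \frac{1}{2\sqrt{2d+1}} \left( \int_{S^{N-1}} \|p\|_{2n} \, dp \right)^{-1}.$$
Now we can apply Lemma 4.62 with $k = n$ and obtain
$$\left( \frac{\operatorname{Vol} \tilde{P}_{n,2d}}{\operatorname{Vol} B^N} \right)^{1/N} \ge \frac{1}{2\sqrt{4d+2}} \, n^{-1/2}$$
as desired.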
4.9.2
We now turn our attention to the cone of sums of squares $\Sigma_{n,2d}$. Although it will
be somewhat obscured by our presentation, the main reason for our ability to derive
bounds on the volume of $\tilde{\Sigma}_{n,2d}$ comes from the fact that the dual cone $\Sigma_{n,2d}^*$ is a
section of the cone of positive semidefinite matrices.
We have just seen how to derive lower bounds on the volume of the cone of
nonnegative forms. These bounds, of course, apply to quadratic forms, and they
can be extended to work for sections of the cone. This gives us a lower bound on
the volume of the dual cone, which can be turned around into an upper bound on
the volume of $\tilde{\Sigma}_{n,2d}$. The approach to bounding the volume of $\tilde{\Sigma}_{n,2d}$ is therefore
very similar to what we did for nonnegative forms. In fact, the technique in the
proofs of the main bounds in Lemma 4.70 and Lemma 4.62 is nearly identical.
Let $D$ be the dimension of $\mathbb{R}[x]_d$. Our main result on the volume of $\tilde{\Sigma}_{n,2d}$ is
as follows.
Theorem 4.65.
$$\left( \frac{\operatorname{Vol} \tilde{\Sigma}_{n,2d}}{\operatorname{Vol} B^N} \right)^{1/N} \le 2^{4d+1} \sqrt{\frac{6D}{N}}.$$
Here
$$N = \dim M_{n,2d} = \binom{n + 2d - 1}{2d} - 1 \quad \text{and} \quad D = \binom{n + d - 1}{d}.$$
Therefore, for fixed degree $d$ our upper bound on the volume of $\tilde{\Sigma}_{n,2d}$ is of the
order $n^{-d/2}$. In Theorem 4.55 we proved a lower bound on the volume of $\tilde{P}_{n,2d}$ that
is of the order $n^{-1/2}$. Therefore, when the total degree $2d$ is at least 4, the lower
bound on the volume of $\tilde{P}_{n,2d}$ is asymptotically much larger than the upper bound
on the volume of $\tilde{\Sigma}_{n,2d}$. Thus we see that if the degree $2d$ is fixed and at least 4,
there are significantly more nonnegative forms than sums of squares.
It is possible to show that the bounds of Theorems 4.55 and 4.65 are asymptotically tight for the case of fixed degree $2d$. See [5] for more details.
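To get a feeling for the two bounds, one can evaluate them for concrete $n$ and $d$. The sketch below assumes the lower bound of Theorem 4.55 is $\frac{1}{2\sqrt{4d+2}}\,n^{-1/2}$ and the upper bound of Theorem 4.65 is $2^{4d+1}\sqrt{6D/N}$, with $D = \binom{n+d-1}{d}$ and $N = \binom{n+2d-1}{2d} - 1$; the function names are illustrative.

```python
from math import comb, sqrt


def dims(n: int, d: int) -> tuple:
    """Return (D, N): D = dim R[x]_{n,d}, N = dim M_{n,2d}."""
    D = comb(n + d - 1, d)
    N = comb(n + 2 * d - 1, 2 * d) - 1
    return D, N


def psd_lower_bound(n: int, d: int) -> float:
    """Lower bound on (Vol P~ / Vol B^N)^(1/N), of order n^(-1/2)."""
    return 1.0 / (2.0 * sqrt(4 * d + 2) * sqrt(n))


def sos_upper_bound(n: int, d: int) -> float:
    """Upper bound on (Vol Sigma~ / Vol B^N)^(1/N), of order n^(-d/2)."""
    D, N = dims(n, d)
    return 2.0 ** (4 * d + 1) * sqrt(6 * D / N)


if __name__ == "__main__":
    for n in (10, 10 ** 4, 10 ** 10):
        print(n, psd_lower_bound(n, 2), sos_upper_bound(n, 2))
```

Because of the large constant $2^{4d+1}$, the upper bound for sums of squares drops below the lower bound for nonnegative forms only when $n$ is very large; the statement in the text is an asymptotic one for fixed degree.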
In Exercises 4.56–4.58 we showed how to bound the volume of a convex body
$K$ from below using the average of its gauge over the unit sphere $S^{n-1}$. As we
explained above, we are now dealing with the dual situation, and we need a related
dual inequality that bounds the volume of $K$ from above by the average gauge of
its dual body $K^\circ$.
Exercise 4.67. Let $K \subset \mathbb{R}^n$ be a convex body with 0 in its interior and let $K^\circ$ be
the dual convex body defined as
$$K^\circ = \{ x \in \mathbb{R}^n \mid \langle x, y \rangle \le 1 \quad \text{for all} \quad y \in K \}.$$
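$$\left( \frac{\operatorname{Vol} K}{\operatorname{Vol} B^n} \right)^{1/n} \le \int_{S^{n-1}} G_{K^\circ}(x) \, dx.$$
In order to apply Lemma 4.68 we need a description of the gauge of $\tilde{\Sigma}_{n,2d}^\circ$.
Let $S^{D-1}$ be the unit sphere in $\mathbb{R}[x]_d$ with respect to the $L^2$ inner product.
Lemma 4.69. We have the following description of the gauge of $\tilde{\Sigma}_{n,2d}^\circ$: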
G
n,2d
n,2d
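We observe that the maximal inner product $\max_{q \in \tilde{\Sigma}_{n,2d}} \langle p, q \rangle$ always occurs at an
extreme point of $\tilde{\Sigma}_{n,2d}$. Extreme points of $\bar{\Sigma}_{n,2d}$ are all squares, and therefore
extreme points of $\tilde{\Sigma}_{n,2d}$ are translates of squares and have the form
$$q^2 - r^{2d} \quad \text{with} \quad q \in \mathbb{R}[x]_d \quad \text{and} \quad \int_{S^{n-1}} q^2 \, d\sigma = 1.$$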
for all
p Mn,2d.
n,2d
for
q R[x]d .
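$$\left( \frac{\operatorname{Vol} \tilde{\Sigma}_{n,2d}}{\operatorname{Vol} B^N} \right)^{1/N} \le \int_{S^{N-1}} \|Q_p\|_\infty \, dp.$$
Now we can apply Barvinok's inequality to bound $\|Q_p\|_\infty$ by high $L^{2k}$-norms.
Using Exercise 4.60 with $k = D$ we see that
$$\|Q_p\|_\infty \le 2\sqrt{3} \, \|Q_p\|_{2D}.$$
Therefore we obtain
$$\left( \frac{\operatorname{Vol} \tilde{\Sigma}_{n,2d}}{\operatorname{Vol} B^N} \right)^{1/N} \le 2\sqrt{3} \int_{S^{N-1}} \|Q_p\|_{2D} \, dp.$$
The proof is now finished with the following estimate, which proceeds in nearly
the same way as the proof of Lemma 4.62.
Lemma 4.70.
$$\int_{S^{N-1}} \|Q_p\|_{2D} \, dp \le 2^{4d} \sqrt{\frac{2D}{N}}.$$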
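$$\int_{S^{N-1}} \|Q_p\|_{2D} \, dp = \int_{S^{N-1}} \left( \int_{S^{D-1}} \langle p, q^2 \rangle^{2D} \, dq \right)^{\frac{1}{2D}} dp.$$
$$\int_{S^{N-1}} \|Q_p\|_{2D} \, dp \le \left( \int_{S^{N-1}} \int_{S^{D-1}} \langle p, q^2 \rangle^{2D} \, dq \, dp \right)^{\frac{1}{2D}}.$$
$$\int_{S^{N-1}} \|Q_p\|_{2D} \, dp \le \left( \int_{S^{D-1}} \int_{S^{N-1}} \langle p, q^2 \rangle^{2D} \, dp \, dq \right)^{\frac{1}{2D}}. \tag{4.10}$$
$$\int_{S^{N-1}} \langle p, q^2 \rangle^{2D} \, dp. \tag{4.11}$$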
2 2D
SN 1
%
$N % $
D + 12
2
%.
$ % $
12 D + N2
2 2D
dp
q
$$\|q^2\| = \|q\|_4^2.$$
Since $q$ lies in the unit sphere $S^{D-1}$ it follows that $\|q\| = 1$. By a result of
Duoandikoetxea in [9] we know that
q
4 42d
q
.
Putting it all together we get
,
p, q
2 2D
SN 1
dp 4
%
$N % $
D + 12
2
$ % $
%.
12 D + N2
4dD
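We note that this estimate is independent of $q$ and therefore the outer integral in (4.10) is redundant and we obtain
$$\int_{S^{N-1}} \|Q_p\|_{2D} \, dp \le 4^{2d} \left( \frac{\Gamma\!\left(\frac{N}{2}\right) \Gamma\!\left(D + \frac{1}{2}\right)}{\Gamma\!\left(\frac{1}{2}\right) \Gamma\!\left(D + \frac{N}{2}\right)} \right)^{\frac{1}{2D}}.$$
$$\left( \frac{\Gamma\!\left(\frac{N}{2}\right)}{\Gamma\!\left(D + \frac{N}{2}\right)} \right)^{\frac{1}{2D}} \le \sqrt{\frac{2}{N}}
\quad \text{and} \quad
\left( \frac{\Gamma\!\left(D + \frac{1}{2}\right)}{\Gamma\!\left(\frac{1}{2}\right)} \right)^{\frac{1}{2D}} \le \sqrt{D}.$$
Therefore we have
$$\int_{S^{N-1}} \|Q_p\|_{2D} \, dp \le 2^{4d} \sqrt{\frac{2D}{N}}.$$
4.10
Convex Forms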
There is another very interesting convex cone inside $\mathbb{R}[x]_{2d}$, the cone of convex
forms $C_{n,2d}$. A form $p \in \mathbb{R}[x]_{2d}$ is called convex if $p$ is a convex function on $\mathbb{R}^n$:
$$p\!\left( \frac{x + y}{2} \right) \le \frac{p(x) + p(y)}{2} \quad \text{for all } x, y \in \mathbb{R}^n.$$
It is an easy exercise to show that $C_{n,2d}$ is contained in the cone of nonnegative
forms.
Exercise 4.71. Show that if a form $p \in \mathbb{R}[x]_{2d}$ is convex, then $p$ is nonnegative.
Show that $x_1^2 x_2^2 \in P_{2,4}$ is not convex.
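The second part of Exercise 4.71 can be checked with a single pair of points. The sketch below evaluates the midpoint inequality for $p = x_1^2 x_2^2$ at $x = (1, 0)$ and $y = (0, 1)$, a choice of points made here for illustration; exact rational arithmetic is used so that the violation is not a rounding artifact.

```python
from fractions import Fraction


def p(x1, x2):
    """The nonnegative form x1^2 * x2^2."""
    return x1 ** 2 * x2 ** 2


def midpoint_gap(x, y):
    """Return p((x + y)/2) - (p(x) + p(y))/2.

    A convex function has gap <= 0 for every pair of points,
    so a strictly positive gap certifies that p is not convex.
    """
    mid = tuple((a + b) / 2 for a, b in zip(x, y))
    return p(*mid) - (p(*x) + p(*y)) / 2


x = (Fraction(1), Fraction(0))
y = (Fraction(0), Fraction(1))
gap = midpoint_gap(x, y)

if __name__ == "__main__":
    print(gap)  # positive, so the midpoint inequality fails
```

Both endpoints are zeroes of $p$, while the midpoint $(1/2, 1/2)$ is not, which is exactly what convexity forbids for a nonnegative function.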
4.10.1
As before we can take a compact section of $C_{n,2d}$ with the hyperplane $L_{n,2d}$ of
forms of integral 1 on $S^{n-1}$:
$$\bar{C}_{n,2d} = C_{n,2d} \cap L_{n,2d}.$$
We also let $\tilde{C}_{n,2d}$ be $\bar{C}_{n,2d}$ translated by subtracting $r^{2d}$:
$$\tilde{C}_{n,2d} = \bar{C}_{n,2d} - r^{2d}.$$
The convex body $\tilde{C}_{n,2d}$ lies in the hyperplane $M_{n,2d}$ of forms of average 0 on
the unit sphere $S^{n-1}$. We will show the following estimate on the volume of $\tilde{C}_{n,2d}$
that, together with Theorems 4.55 and 4.65, implies that if the degree $2d$ is fixed
and the number of variables grows then there are significantly more convex forms
than sums of squares. This is the only currently known method of establishing
existence of convex forms that are not sums of squares.
Theorem 4.73.
$$\left( \frac{\operatorname{Vol} \tilde{C}_{n,2d}}{\operatorname{Vol} \tilde{P}_{n,2d}} \right)^{1/N} \ge \frac{1}{2(2d - 1)}.$$
Remark 4.74. From Exercise 4.71 it follows that $\tilde{C}_{n,2d} \subseteq \tilde{P}_{n,2d}$. Therefore the
estimate of Theorem 4.73 is asymptotically tight for the case of fixed degree $2d$.
Our first goal is to show that if a form $p \in \mathbb{R}[x]_{2d}$ is sufficiently close to being
constant on the unit sphere, then $p$ must be convex.
Theorem 4.75. Let $p$ be a form in $\mathbb{R}[x]_{2d}$. If for all $v \in S^{n-1}$
$$1 - \frac{1}{2d - 1} \le p(v) \le 1 + \frac{1}{2d - 1},$$
then $p$ is convex.
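For a point $\xi \in S^{n-1}$ we can think of $\xi$ as a direction. We will use
$$\frac{\partial p}{\partial \xi} = \langle \nabla p, \xi \rangle.$$
This follows since
$$\left| \frac{\partial p}{\partial \xi} \right| = |\langle \nabla p, \xi \rangle| \le |\nabla p| \, |\xi| = |\nabla p|.$$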
Proof. We proceed by induction on the order of partial derivatives $k$. The base case
$k = 1$ is covered by Theorem 4.76. Now we need to show the induction step. We
assume that the statement holds for all derivatives of order at most $k$ and consider
$$\frac{\partial^{k+1} p}{\partial \xi_1 \cdots \partial \xi_{k+1}}(v)$$
for some $\xi_1, \ldots, \xi_{k+1} \in S^{n-1}$.
Let
$$q = \frac{\partial p}{\partial \xi_1}.$$
(4.12)
$$-\frac{1}{2d - 1} \le q(v) \le \frac{1}{2d - 1}.$$
In other words
$$\|q\|_\infty \le \frac{1}{2d - 1}.$$
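$$\left( \frac{\operatorname{Vol} (K \cap -K)}{\operatorname{Vol} K} \right)^{\frac{1}{n}} \ge \frac{1}{2}.$$
Exercise 4.79. The set $\tilde{P}_{n,2d}$ is a convex body in the hyperplane $M_{n,2d}$ of all
forms of integral 0 on the unit sphere. Use invariance of $\tilde{P}_{n,2d}$ under orthogonal
changes of coordinates to show that 0 is the barycenter of $\tilde{P}_{n,2d}$. Let $-\tilde{P}_{n,2d}$ be
the reflection of $\tilde{P}_{n,2d}$ through the origin. Show that $\tilde{P}_{n,2d} \cap -\tilde{P}_{n,2d}$ consists of all
forms in $M_{n,2d}$ whose values on the unit sphere are between $-1$ and 1, i.e., the forms with
$L^\infty$-norm at most 1:
$$\tilde{P}_{n,2d} \cap -\tilde{P}_{n,2d} = \{ p \in M_{n,2d} \mid \|p\|_\infty \le 1 \}.$$
Proof of Theorem 4.73. Let $K_{n,2d}$ be the set of forms that take values only
between $1 - \frac{1}{2d-1}$ and $1 + \frac{1}{2d-1}$ on the unit sphere:
$$K_{n,2d} = \left\{ p \in \mathbb{R}[x]_{2d} \;\middle|\; 1 - \frac{1}{2d - 1} \le p(v) \le 1 + \frac{1}{2d - 1} \text{ for all } v \in S^{n-1} \right\}.$$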
2d 1
By Exercise 4.79 it follows that
1 .
2d 1
/
n,2d.
Pn,2d Pn,2d K
1/N
1
.
2
$$\left( \frac{\operatorname{Vol} \tilde{K}_{n,2d}}{\operatorname{Vol} \tilde{P}_{n,2d}} \right)^{1/N} \ge \frac{1}{2(2d - 1)}.$$
Bibliography
[1] A. A. Ahmadi and P. A. Parrilo. A convex polynomial that is not sos-convex.
Math. Program. Ser. A, 135:275–292, 2012.
[2] A. Barvinok. A Course in Convexity. American Mathematical Society, Providence, RI, 2002.
[3] A. Barvinok. Estimating $L^\infty$ norms by $L^{2k}$ norms for functions on orbits.
Found. Comput. Math., 2:393–412, 2002.
[4] A. Barvinok and G. Blekherman. Convex geometry of orbits. In Combinatorial and Computational Geometry, Math. Sci. Res. Inst. Publ. 52, Cambridge
University Press, Cambridge, UK, 2005, pp. 51–77.
[5] G. Blekherman. There are significantly more nonnegative polynomials than
sums of squares. Israel J. Math., 183:355–380, 2006.
[6] G. Blekherman. Dimensional differences between nonnegative polynomials and
sums of squares. Submitted for publication, arXiv:0907.1339.
Chapter 5
Dualities
Dualities are ubiquitous in mathematics and its applications. This chapter compares
several notions of duality that are central to the connections between convexity,
optimization, and algebraic geometry developed in this book. It is meant as a first
introduction and is intended for a diverse audience ranging from graduate students
in mathematics to practitioners of optimization who are based in engineering.
5.1
Introduction
Convex algebraic geometry concerns the interplay between optimization theory and
real algebraic geometry. Its objects of study include convex semialgebraic sets that
arise in semidefinite programming and from sums of squares. This chapter compares
three notions of duality that are relevant in these contexts: duality of convex bodies,
duality of projective varieties, and the Karush–Kuhn–Tucker conditions derived
from Lagrange duality. We show that the optimal value of a polynomial program is
an algebraic function whose minimal polynomial is expressed by the hypersurface
projectively dual to the constraint set. We give an introduction to the algebraic
geometry in the boundary of the convex hull of a compact variety. Our focus lies
on making the polynomials that vanish on that boundary explicit, in contrast to
the representation of convex bodies as projected spectrahedra. We also explore the
geometric underpinnings of semidefinite programming duality.
Duality for vector spaces lies at the heart of linear algebra and functional
analysis. Duality in convex geometry is essentially an involution on the set of
Philipp Rostalski was supported by the Alexander-von-Humboldt Foundation through a
Feodor Lynen postdoctoral fellowship.
Bernd Sturmfels was supported by NSF grants DMS-0757207 and DMS-0968882.
5.1.1
$$Q(x, y, z) = \begin{pmatrix} 1 & x & 0 & x \\ x & 1 & y & 0 \\ 0 & y & 1 & z \\ x & 0 & z & 1 \end{pmatrix}. \tag{5.1}$$
$$x^2 (y - z)^2 - 2x^2 - y^2 - z^2 + 1 = 0.$$
(5.3)
We find these from a Gröbner basis of the ideal of $3 \times 3$ minors of $Q(x, y, z)$:
$$\langle\, 2x^2 - 1, \; 2z^2 - 1, \; y + z \,\rangle.$$
The linear polynomial $y + z$ in this Gröbner basis defines the symmetry plane of the
pillow $P$. The four singular points form a square in that plane. Its edges are also
edges of $P$. All other faces of $P$ are exposed points. These come in two families,
sometimes called protrusions, one above the plane $y + z = 0$ and one below it.
The protrusions are drawn in two different colors on the left in Figure 5.2.
Note that the surface $\partial P$ is smooth along the four edges that separate the two
protrusions. To be more precise, the four points (5.3) are the only singular points
in $\partial P$. All points in the relative interiors of the four edges are nonsingular in $\partial P$.
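The quartic bounding the pillow can be cross-checked against the determinant of $Q(x, y, z)$. The sketch below assumes that $Q$ is the symmetric $4 \times 4$ matrix with unit diagonal, entries $x$ in positions (1,2) and (1,4), $y$ in position (2,3), and $z$ in position (3,4), as read off from (5.1), and that the quartic is $x^2(y-z)^2 - 2x^2 - y^2 - z^2 + 1$. The helper names are hypothetical.

```python
from fractions import Fraction
from math import sqrt


def pillow_matrix(x, y, z):
    """Symmetric matrix Q(x, y, z) assumed for (5.1)."""
    return [[1, x, 0, x],
            [x, 1, y, 0],
            [0, y, 1, z],
            [x, 0, z, 1]]


def det(m):
    """Exact determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def quartic(x, y, z):
    """Candidate defining polynomial of the boundary surface."""
    return x ** 2 * (y - z) ** 2 - 2 * x ** 2 - y ** 2 - z ** 2 + 1


if __name__ == "__main__":
    pt = (Fraction(1, 2), Fraction(1, 3), Fraction(-1, 5))
    print(det(pillow_matrix(*pt)) == quartic(*pt))
    # One of the four corners: 2x^2 = 1, 2z^2 = 1, y = -z.
    r = 1 / sqrt(2)
    print(quartic(r, -r, r))  # numerically zero
```

The corner used in the last line satisfies the three generators $2x^2 - 1$, $2z^2 - 1$, $y + z$ of the Gröbner basis, so it lies on the surface.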
Like all convex bodies, our pillow $P$ has an associated dual convex body
$$P^\Delta = \{ (a, b, c) \in \mathbb{R}^3 \mid ax + by + cz \le 1 \text{ for all } (x, y, z) \in P \}, \tag{5.4}$$
consisting of all linear forms that evaluate to at most one on the convex body $P$.
The dual pillow $P^\Delta$ is shown on the right in Figure 5.2. Note the association
of faces under duality. The pillow $P$ has four one-dimensional faces, four singular
zero-dimensional faces, and two smooth families of zero-dimensional faces. The
corresponding dual faces of $P^\Delta$ have dimensions 0, 2, and 0, respectively.
Semidefinite programming was introduced in Chapter 2 as the computational
problem of optimizing a linear function over a spectrahedron. For our pillow $P$,
this optimization problem takes the form
$$p^*(a, b, c) = \max_{(x,y,z) \in \mathbb{R}^3} \; ax + by + cz \quad \text{subject to} \quad Q(x, y, z) \succeq 0. \tag{5.5}$$
$$\min_{\lambda \in \mathbb{R}} \; \lambda \quad \text{subject to} \quad \frac{1}{\lambda}(a, b, c) \in P^\Delta. \tag{5.6}$$
$$\min_{u \in \mathbb{R}^7} \; u_1 + u_4 + u_6 + u_7 \quad \text{subject to} \quad
\begin{pmatrix} 2u_1 & 2u_2 & 2u_3 & -2u_2 - a \\ 2u_2 & 2u_4 & -b & 2u_5 \\ 2u_3 & -b & 2u_6 & -c \\ -2u_2 - a & 2u_5 & -c & 2u_7 \end{pmatrix} \succeq 0. \tag{5.7}$$
The derivation of such a dual formulation will be explained in Section 5.5. Since
(5.5) and (5.7) are both strictly feasible, strong duality holds [5, Subsection 5.2.3];
i.e., the two programs attain the same optimal value: $p^*(a, b, c) = d^*(a, b, c)$. Hence,
problem (5.7) can be derived from (5.6), as we shall see in Section 5.5.
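The weak duality behind the pair (5.5) and (5.7) rests on a single trace identity. The sketch below assumes the matrices $Q(x,y,z)$ and $M(u; a, b, c)$ have the entries written in the code (a reading of (5.1) and (5.7), with signs restored so that the identity holds) and verifies that $\operatorname{trace}(Q M) = 2(u_1 + u_4 + u_6 + u_7) - 2(ax + by + cz)$. Since the trace of a product of two positive semidefinite matrices is nonnegative, this gives $ax + by + cz \le u_1 + u_4 + u_6 + u_7$ for every feasible pair.

```python
from fractions import Fraction


def Q(x, y, z):
    """Primal matrix, assumed as in (5.1)."""
    return [[1, x, 0, x],
            [x, 1, y, 0],
            [0, y, 1, z],
            [x, 0, z, 1]]


def M(u, a, b, c):
    """Dual matrix, assumed as in (5.7); u has seven entries u1..u7."""
    u1, u2, u3, u4, u5, u6, u7 = u
    return [[2 * u1, 2 * u2, 2 * u3, -2 * u2 - a],
            [2 * u2, 2 * u4, -b, 2 * u5],
            [2 * u3, -b, 2 * u6, -c],
            [-2 * u2 - a, 2 * u5, -c, 2 * u7]]


def trace_product(A, B):
    """trace(A B) for square matrices of equal size."""
    n = len(A)
    return sum(A[i][j] * B[j][i] for i in range(n) for j in range(n))


def duality_gap_expression(u, a, b, c, x, y, z):
    """2 * (dual objective - primal objective)."""
    return 2 * (u[0] + u[3] + u[5] + u[6]) - 2 * (a * x + b * y + c * z)


if __name__ == "__main__":
    u = [Fraction(i, 3) for i in range(1, 8)]
    a, b, c = Fraction(2), Fraction(3), Fraction(7)
    x, y, z = Fraction(1, 5), Fraction(-1, 4), Fraction(1, 6)
    print(trace_product(Q(x, y, z), M(u, a, b, c))
          == duality_gap_expression(u, a, b, c, x, y, z))
```

The identity holds for all values of the variables, not only feasible ones, which is why it can be tested at arbitrary rational points.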
We write $M(u; a, b, c)$ for the $4 \times 4$ matrix in (5.7). The following equations and
inequalities, known as the Karush–Kuhn–Tucker conditions (KKT), are necessary
and sufficient for any pair of optimal solutions:
$$Q(x, y, z) \cdot M(u; a, b, c) = 0 \quad \text{(complementary slackness)},$$
$$Q(x, y, z) \succeq 0, \qquad M(u; a, b, c) \succeq 0.$$
We relax the inequality constraints and consider the system of equations
$$\lambda = ax + by + cz$$
(5.8)
In the latter case it comes from the four corners of the pillow, and it satisfies
$$(2\lambda^2 - a^2 + 2ab - b^2 + 2bc - c^2 - 2ac) \cdot (2\lambda^2 - a^2 - 2ab - b^2 + 2bc - c^2 + 2ac) = 0. \tag{5.9}$$
These two equations describe the algebraic boundary of the dual body $P^\Delta$. Namely,
after setting $\lambda = 1$, the irreducible polynomial in (5.8) describes the quartic surface
that makes up the curved part of the boundary of $P^\Delta$, as seen in Figure 5.2. In
addition, there are four planes spanned by flat two-dimensional faces of $P^\Delta$. The
product of the four corresponding affine linear forms is the expression (5.9). Indeed, each of the two quadrics in (5.9) factors into two linear factors. These two
characterize the planes spanned by opposite 2-faces of $P^\Delta$.
The two equations (5.8) and (5.9) also offer a first glimpse of the concept
of projective duality in algebraic geometry, defined precisely in Subsection 5.2.4.
Namely, consider the surface in projective space $\mathbb{P}^3$ defined by $\det(Q(x, y, z)) = 0$
after replacing the ones along the diagonals by a homogenization variable. Then
(5.8) is its dual surface in the dual projective space $(\mathbb{P}^3)^*$. The surface (5.9) in $(\mathbb{P}^3)^*$
is dual to the zero-dimensional variety in $\mathbb{P}^3$ cut out by the $3 \times 3$ minors of $Q(x, y, z)$.
The optimal value function of the optimization problem (5.5) is represented,
in the sense of Section 5.3, by the algebraic surfaces dual to the boundary of $P$
and its singular locus. We have seen two different ways of dualizing (5.5): the dual
optimization problem (5.7) and the optimization problem (5.6) on $P^\Delta$. These two
formulations are related as follows. If we regard (5.7) as specifying a 10-dimensional
spectrahedron, then the dual pillow $P^\Delta$ is a projection of that spectrahedron:
$$P^\Delta = \{ (a, b, c) \in \mathbb{R}^3 \mid \exists\, u \in \mathbb{R}^7 : M(u; a, b, c) \succeq 0 \text{ and } u_1 + u_4 + u_6 + u_7 = 1 \}.$$
Linear projections of spectrahedra, so-called projected spectrahedra, were introduced
in Chapter 2. They are at the heart of several parts of this book, most notably,
Chapters 6 and 7. The dual of a spectrahedron is generally not a spectrahedron,
but it is always a projected spectrahedron. We shall see this in Theorem 5.57.
5.1.2
Duality is a central concept in convexity and convex optimization, and numerous authors have written about their connections and their interplay with other notions of
duality and polarity. Relevant references include Barvinok's textbook [1, Section 4]
and the survey by Luenberger [24]. The latter focuses on dualities used in engineering, such as duality of vector spaces, polytopes, graphs, and control systems. The
objective of this chapter is to revisit the theme of duality in the context of convex
algebraic geometry and semidefinite optimization. In algebraic geometry, there is
a natural notion of projective duality, which associates to every algebraic variety a
dual variety. One of our main goals is to explore the meaning of projective duality
for optimization theory. It is precisely this deeper connection with algebra which
distinguishes this chapter from other treatments of duality in convex optimization.
Our presentation is organized as follows. In Section 5.2 we cover preliminaries
needed for the rest of the chapter. Here the various dualities are carefully defined
and their basic properties are illustrated by means of examples. In Section 5.3
we derive the result that the optimal value function of a polynomial program is
represented by the defining equation of the hypersurface projectively dual to the
manifold describing the boundary of all feasible solutions. This highlights the important fact that the duality best known to algebraic geometers arises very naturally
in convex optimization. Section 5.4 concerns the convex hull of a compact algebraic
variety in $\mathbb{R}^n$. We discuss work of Ranestad and Sturmfels [31, 32] on the hypersurfaces in the boundary of such a convex body, and we present several examples
and applications.
In Section 5.5 we focus on semidefinite programming (SDP), and we offer a
concise geometric introduction to SDP duality. This leads us to the concept of
algebraic degree of SDP [12, 27] or, more geometrically, to projective duality for
varieties defined by rank constraints on symmetric matrices of linear forms.
A projected spectrahedron is the image of a spectrahedron under a linear projection. Its dual body is a linear section of the dual body to the spectrahedron. In
Section 5.6 we examine this situation in the context of sums-of-squares programming, and we discuss linear families of nonnegative polynomials. The figures in
this chapter were made with the software package Bermeja [34], which specializes
in computations in convex algebraic geometry.
We now come to the first round of exercises in this chapter. They are meant for
our readers to get their hands dirty right away. The problems can be approached
from first principles. No knowledge of any general algorithms or theorems is needed.
The use of both numerical software and computer algebra tools is encouraged.
Exercises
Exercise 5.1. Maximize the function $2x + 3y + 7z$ over the spectrahedron $P$ given
in (5.2). Express the optimal solution in exact arithmetic. Locate the cost function
on the right in Figure 5.2 and locate the optimal solution on the left.
Exercise 5.2. Compute the projections of the spectrahedron $P$ into the $(x, y)$-plane and into the $(y, z)$-plane. Determine polynomials $f(x, y)$ and $g(y, z)$ that
vanish on the boundaries of these two planar convex bodies.
Exercise 5.3. Project $P$ into a random plane and compute the irreducible polynomial of degree eight in two variables that vanishes on the boundary of the image.
Exercise 5.4. Does there exist a projected spectrahedron that is not a spectrahedron?
5.2
Ingredients
In this section we review the mathematical preliminaries needed for the rest of the
chapter, we give precise definitions, and we fix more of the notation. We begin
with the notion of duality for vector spaces and cones therein; then we move on to
convex bodies, polytopes, Lagrange duality in optimization, the KKT conditions,
and projective duality in algebraic geometry, and we conclude with discriminants.
5.2.1
lead to intermediate fields, e.g., the singular points in (5.3) live over the field $\mathbb{Q}(\sqrt{2})$.
Puiseux series come in handy when one needs a deformation parameter to deal
with degeneracies. This is standard for algorithms in real algebraic geometry [2].
Fix a finite-dimensional vector space $V$ over an ordered field $K$. The dual
vector space is the set $V^* = \operatorname{Hom}(V, K)$ of all linear forms on $V$. Let $V$ and $W$
be vector spaces and $\varphi : V \to W$ a linear map. The adjoint $\varphi^* : W^* \to V^*$ is the
linear map defined by $\varphi^*(w^*) = w^* \circ \varphi \in V^*$ for every $w^* \in W^*$. If we fix bases of
both $V$ and $W$, then $\varphi$ is represented by a matrix $A$. The adjoint $\varphi^*$ is represented,
relative to the dual bases for $W^*$ and $V^*$, by the transpose $A^T$ of the matrix $A$.
A subset $C \subset V$ is a cone if it is closed under multiplication with positive
scalars. A cone $C$ need not be convex, but its dual cone
$$C^\vee = \{\, l \in V^* \mid \text{for all } x \in C : l(x) \ge 0 \,\} \tag{5.10}$$
is always closed and convex in $V^*$. If $C$ is a convex cone, then the second dual
$(C^\vee)^\vee$ is the closure of $C$. Thus, if $C$ is a closed convex cone in $V$, then
$$(C^\vee)^\vee = C. \tag{5.11}$$
in V .
(5.12)
Now, it makes sense to consider this convex set modulo $L^\perp$. We can thus identify
$$(C \cap L)^\vee = \overline{\pi_L(C^\vee)} \quad \text{in } V^*/L^\perp. \tag{5.13}$$
This formula expresses the fact that projection and intersection are dual operations.
Example 5.6. It is necessary to take the closure of $\pi_L(C^\vee)$ in (5.12) and (5.13)
because projections of closed convex cones need not be closed. The following simple
example is derived from [18, Example 3.5, p. 196]. Consider the closed convex cone
$$C^\vee = \{ (u, x, y, z) \in \mathbb{R}^4 : u \ge 0, \; u + x \ge 0, \; y \ge 0, \; z \ge 0, \text{ and } (u + x) y \ge z^2 \},$$
and fix the hyperplane $L = \{ (0, x, y, z) : x, y, z \in \mathbb{R} \} \simeq \mathbb{R}^3$. Then $\pi_L$ is the
projection from $\mathbb{R}^4$ to $\mathbb{R}^3$ given by dropping the $u$-coordinate. We claim that the
image $\pi_L(C^\vee)$ is not closed. To see this, we note that for every $\epsilon > 0$ the vector
$(1/\epsilon, 0, \epsilon, 1)$ lies in $C^\vee$, and hence $(0, \epsilon, 1)$ lies in $\pi_L(C^\vee)$. On the other hand, $(0, 0, 1)$
does not lie in $\pi_L(C^\vee)$ because $z = 1$ implies $(u + x) y \ge 1$ and hence $y > 0$.
The results summarized above are fundamental in convex analysis. For proofs
and details we refer to the textbook by Rockafellar [33, Section 16]. The space
$V^*/L^\perp$ is the space $\operatorname{Hom}(L, K)$ of linear functionals on $L$. In applications one often
identifies this space with $L$ itself, by means of an inner product on the ambient space
$V$. The linear map $\pi_L$ then becomes the orthogonal projection from $V$ onto $L$, and
(5.13) is the closure of the image of $C^\vee$ under that orthogonal projection.
A subset $F \subseteq C$ of a convex set $C$ is a face if $F$ is itself convex and contains
any line segment $L \subset C$ whose relative interior intersects $F$. We say that $F$ is an
exposed face if there exists a linear functional $l$ that attains its minimum over $C$
precisely at $F$. Clearly, every exposed face of $C$ is a face, but the converse does not
hold. For instance, the edges of the triangle on the top in Figure 5.6 are nonexposed
faces of the three-dimensional convex body shown there.
An exposed face $F$ of a cone $C$ determines a face of the dual cone $C^\vee$ via
$$F^\diamond = \{\, l \in C^\vee \mid l \text{ attains its minimum over } C \text{ at } F \,\}.$$
The dimensions of the faces $F$ of $C$ and $F^\diamond$ of $C^\vee$ satisfy the inequality
$$\dim(F) + \dim(F^\diamond) \le \dim(V). \tag{5.14}$$
5.2.2
(5.15)
This is derived from (5.10) using the identication l(x) = z (x) for z = 1. We
note that the dual of a convex body (as opposed to the dual of a cone) is not an
intrinsic construction, but it depends on the position of $P$ relative to the origin.
Just as in the case of convex cones, if $P$ is closed, then biduality holds:
$$(P^\Delta)^\Delta = P.$$
The definition (5.15) makes sense for arbitrary subsets $P$ of $V$. That is, $P$ need
not be convex or closed. A standard fact from convex analysis [33, Corollary 12.1.1
and Section 14] says that the double dual is the closure of the convex hull with the
origin:
$$(P^\Delta)^\Delta = \overline{\operatorname{conv}(P \cup \{0\})}.$$
All convex bodies discussed in this chapter are semialgebraic, that is, they can
be described by Boolean combinations of polynomial inequalities. We note that if $P$
is semialgebraic then its dual body $P^\Delta$ is also semialgebraic. This is a consequence
of Tarski's theorem on quantifier elimination in real algebraic geometry [2, 4].
The algebraic boundary of a semialgebraic convex body $P$, denoted $\partial_a P$, is the
smallest complex algebraic variety that contains the boundary $\partial P$. In geometric
language, $\partial_a P$ is the Zariski closure of $\partial P$. It is identified with the squarefree
polynomial $f_P$ that vanishes on $\partial P$. Namely, $\partial_a P = V_{\mathbb{C}}(f_P)$ is the zero set of the
polynomial $f_P$. Note that $f_P$ is unique up to a multiplicative constant. Thus $\partial_a P$
is the smallest complex algebraic hypersurface which contains the boundary $\partial P$.
A polytope is the convex hull of a finite subset of $V$. If $P$ is a polytope, then
so is its dual $P^\Delta$ [37]. The boundary of $P^\Delta$ consists of finitely many facets $F$. These
are the faces $F = v^\diamond$ dual to the vertices $v$ of $P$. The algebraic boundary $\partial_a P^\Delta$ is
the arrangement of hyperplanes spanned by the facets of $P^\Delta$. Its defining polynomial
$f_{P^\Delta}$ is the product of the linear polynomials $\langle v, x \rangle - 1$.
$$\|y\|_q = \sup\{ \langle y, x \rangle \mid x \in \mathbb{R}^n, \; \|x\|_p \le 1 \}.$$
Geometrically, the unit balls for these norms are dual as convex bodies.
Example 5.8. Consider the case $n = 2$ and $p = 4$. Here the unit ball equals
$$P = \{ (x, y) \in \mathbb{R}^2 : x^4 + y^4 \le 1 \}.$$
This planar convex set is shown in Figure 5.3. The ordinary boundary $\partial P$ of this
convex set is the real curve defined by the quartic equation $x^4 + y^4 = 1$. In this
example, the ordinary boundary coincides with the algebraic boundary $\partial_a P$.
Figure 5.3. The unit balls for the $L^4$-norm and the $L^{4/3}$-norm are dual.
The curve on the left has degree 4, while its dual curve on the right has degree 12.
The dual body is the unit ball for the $L_{4/3}$-norm on $\mathbb{R}^2$:
$P^\Delta = \{(a, b) \in \mathbb{R}^2 : |a|^{4/3} + |b|^{4/3} \le 1\}$.
The algebraic boundary of $P^\Delta$ is an irreducible algebraic curve of degree 12,
$\partial_a P^\Delta = V\bigl(a^{12} + 3a^8b^4 + 3a^4b^8 + b^{12} - 3a^8 + 21a^4b^4 - 3b^8 + 3a^4 + 3b^4 - 1\bigr)$, (5.16)
which again coincides precisely with the (geometric) boundary $\partial P^\Delta$. This dual
polynomial is easily produced by the following one-line program in the computer
algebra system Macaulay2 due to Grayson and Stillman [13]:
R = QQ[x,y,a,b]; eliminate({x,y},ideal(x^4+y^4-1,x^3-a,y^3-b))
In Subsection 5.2.4 we shall introduce the general algebraic framework for performing such duality computations, not just for curves, but for arbitrary varieties.
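As a plausibility check on the degree-12 polynomial, one can sample boundary points of the quartic, map each to its tangent line, and evaluate the polynomial there. The sketch below assumes the signs $a^{12} + 3a^8b^4 + 3a^4b^8 + b^{12} - 3a^8 + 21a^4b^4 - 3b^8 + 3a^4 + 3b^4 - 1$, which is what expanding $(a^4 + b^4 - 1)^3 + 27a^4b^4$ gives; the function names are hypothetical. The tangent line at $(x, y)$ is $x^3 a' + y^3 b' = 1$, so the dual point is $(a, b) = (x^3, y^3)$, matching the relations `x^3-a` and `y^3-b` in the elimination ideal.

```python
def dual_quartic(a, b):
    """Evaluate the degree-12 polynomial of (5.16) at (a, b), written in u = a^4, v = b^4."""
    u, v = a ** 4, b ** 4
    return (u**3 + 3*u**2*v + 3*u*v**2 + v**3
            - 3*u**2 + 21*u*v - 3*v**2 + 3*u + 3*v - 1)

def dual_point(x):
    """Map the boundary point (x, y), y >= 0, x^4 + y^4 = 1, to (a, b) = (x^3, y^3)."""
    y = (1.0 - x ** 4) ** 0.25
    return x ** 3, y ** 3

# Sample the upper half of the quartic curve and evaluate the dual polynomial.
samples = [dual_point(k / 10.0) for k in range(-9, 10)]
residuals = [abs(dual_quartic(a, b)) for a, b in samples]
```

All residuals should vanish up to floating point error, and each sampled $(a, b)$ should satisfy $|a|^{4/3} + |b|^{4/3} = 1$.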
5.2.3
We now come to a standard concept of duality in optimization theory. The treatment here is more general than duality in convex optimization, which was presented
in Chapter 2. Let us consider the following general nonlinear polynomial optimization problem:
$\underset{x \in \mathbb{R}^n}{\text{minimize}}\ f(x)$
subject to $g_i(x) \le 0,\ i = 1, \dots, m,$
$h_j(x) = 0,\ j = 1, \dots, p.$ (5.17)
Here the $g_1, \dots, g_m, h_1, \dots, h_p$ and $f$ are polynomials in $\mathbb{R}[x_1, \dots, x_n]$. The Lagrangian associated with the optimization problem (5.17) is the function
$L : \mathbb{R}^n \times \mathbb{R}^m_+ \times \mathbb{R}^p \to \mathbb{R}$,
$(x, \lambda, \mu) \mapsto f(x) + \sum_{i=1}^m \lambda_i g_i(x) + \sum_{j=1}^p \mu_j h_j(x)$.
The scalars $\lambda_i \in \mathbb{R}_+$ and $\mu_j \in \mathbb{R}$ are the Lagrange multipliers for the constraints
$g_i(x) \le 0$ and $h_j(x) = 0$. The Lagrangian $L(x, \lambda, \mu)$ can be interpreted as an augmented cost function with penalty terms for the constraints. For more information
on the above formulation see [5, Section 5.1].
One can show that the problem (5.17) is equivalent to finding
$u^* = \min_{x \in \mathbb{R}^n}\ \max_{\mu \in \mathbb{R}^p,\ \lambda \ge 0} L(x, \lambda, \mu)$.
The key observation here is that any positive evaluation of one of the polynomials
gi (x), or any nonzero evaluation of one of the polynomials hj (x), would render the
inner optimization problem unbounded.
The dual optimization problem to (5.17) is obtained by exchanging the order
of the two nested optimization subproblems in the above formulation:
$v^* = \max_{\mu \in \mathbb{R}^p,\ \lambda \ge 0}\ \underbrace{\min_{x \in \mathbb{R}^n} L(x, \lambda, \mu)}_{\phi(\lambda, \mu)}$.
The function $\phi(\lambda, \mu)$ is known as the Lagrange dual function to our problem. This
function is always concave, so the dual is always a convex optimization problem.
It follows from the definition of the dual function that $\phi(\lambda, \mu) \le u^*$ for all $\lambda, \mu$.
Hence the optimal values satisfy the inequality
$v^* \le u^*$.
If equality occurs, $v^* = u^*$, then we say that strong duality holds. A necessary
condition for strong duality is $\lambda_i^* g_i(x^*) = 0$ for all $i = 1, \dots, m$, where $(x^*, \lambda^*, \mu^*)$
denote a primal and dual optimizer. We see this by evaluating the Lagrangian at
an optimizer and taking into account the fact that $h_j(x) = 0$ for all feasible $x$.
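A toy instance makes the inequality between the two optimal values concrete. The sketch below is an illustration chosen here, not an example from the text: minimize $f(x) = x^2$ subject to $g(x) = 1 - x \le 0$, whose primal optimum is $u^* = 1$ at $x^* = 1$. The Lagrangian $x^2 + \lambda(1 - x)$ is minimized over $x$ at $x = \lambda/2$, which gives the dual function in closed form.

```python
def lagrangian(x, lam):
    """L(x, lam) = f(x) + lam * g(x) for f(x) = x^2 and g(x) = 1 - x."""
    return x * x + lam * (1.0 - x)

def dual_function(lam):
    """phi(lam) = min_x L(x, lam); the minimizer is x = lam / 2."""
    return lagrangian(lam / 2.0, lam)

u_star = 1.0                                   # primal optimum, attained at x = 1
grid = [k / 100.0 for k in range(0, 501)]      # multipliers lam in [0, 5]
dual_values = [dual_function(lam) for lam in grid]
v_star = max(dual_values)                      # dual optimum on the grid
lam_star = grid[dual_values.index(v_star)]     # maximizing multiplier
```

On this grid every dual value stays below `u_star`, and the maximum is attained at `lam_star = 2.0` with `v_star = 1.0`, so strong duality holds for this instance; the product $\lambda^* g(x^*) = 2 \cdot 0$ vanishes, as the necessary condition requires.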
Collecting all inequality and equality constraints in the primal and dual optimization problems yields the following optimality conditions.
Theorem 5.9 (KKT conditions). Let $(x^*, \lambda^*, \mu^*)$ be primal and dual optimal
solutions with $u^* = v^*$ (strong duality). Then
$\nabla_x f\big|_{x^*} + \sum_{i=1}^m \lambda_i^* \nabla_x g_i\big|_{x^*} + \sum_{j=1}^p \mu_j^* \nabla_x h_j\big|_{x^*} = 0$,
$g_i(x^*) \le 0$ for $i = 1, \dots, m$,
$\lambda_i^* \ge 0$ for $i = 1, \dots, m$,
$h_j(x^*) = 0$ for $j = 1, \dots, p$,
Complementary slackness: $\lambda_i^* g_i(x^*) = 0$ for $i = 1, \dots, m$. (5.18)
For a derivation of this theorem see [5, Subsection 5.5.2]. Several comments
on the KKT conditions are in order. First, we note that complementary slackness
amounts to a case distinction between active (gi = 0) and inactive inequalities
($g_i < 0$). For any index $i$ with $g_i(x^*) \ne 0$ we need $\lambda_i^* = 0$, so the corresponding
inequality does not play a role in the gradient condition. On the other hand, if
$g_i(x^*) = 0$, then this can be treated as an equality constraint.
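The KKT conditions can be verified by direct substitution on a small instance. The sketch below uses an illustrative problem chosen here (not taken from the text): minimize $x_1 + x_2$ over the unit disk $g(x) = x_1^2 + x_2^2 - 1 \le 0$, with candidate optimizer $x^* = (-1/\sqrt{2}, -1/\sqrt{2})$ and multiplier $\lambda^* = 1/\sqrt{2}$. The function name `kkt_residuals` is hypothetical.

```python
import math

def kkt_residuals(x, lam):
    """Return the KKT quantities for min x1 + x2 subject to x1^2 + x2^2 - 1 <= 0."""
    grad_f = (1.0, 1.0)                       # gradient of the linear cost
    grad_g = (2.0 * x[0], 2.0 * x[1])         # gradient of the constraint
    g = x[0] ** 2 + x[1] ** 2 - 1.0           # constraint value (active if zero)
    stationarity = [grad_f[i] + lam * grad_g[i] for i in range(2)]
    return {"stationarity": stationarity, "g": g, "lam": lam, "slackness": lam * g}

s = 1.0 / math.sqrt(2.0)
res = kkt_residuals((-s, -s), s)
```

Here the inequality is active, so complementary slackness holds with a positive multiplier, and the gradient condition balances the cost gradient against the constraint gradient.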
From an algebraic point of view, it is natural to relax the inequalities and to
focus on the KKT equations. These are the polynomial equations in (5.18):
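$\nabla_x f\big|_{x} + \sum_{i=1}^m \lambda_i \nabla_x g_i\big|_{x} + \sum_{j=1}^p \mu_j \nabla_x h_j\big|_{x} = 0$,
$h_j(x) = 0$ for $j = 1, \dots, p$, and $\lambda_i g_i(x) = 0$ for $i = 1, \dots, m$. (5.19)
If we wish to solve our optimization problem exactly, then we must compute the
algebraic variety in $\mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^p$ that is defined by these equations.
In what follows we explore Lagrange duality and the KKT conditions in two
special cases, namely in optimizing a linear function over an algebraic variety (Section 5.3) and in semidefinite programming (Section 5.5).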
5.2.4
of the real numbers $\mathbb{R}$, and their points have coordinates in $\mathbb{C}$. It is also customary
to work in projective space $\mathbb{P}^n$ rather than affine space $\mathbb{C}^n$, i.e., we work with
equivalence classes $x \sim \lambda x$ for all $\lambda \in \mathbb{C}\setminus\{0\}$, $x \in \mathbb{C}^{n+1}\setminus\{0\}$. Points $(x_0 : x_1 : \cdots : x_n)$
in projective space $\mathbb{P}^n$ are lines through the origin in $\mathbb{C}^{n+1}$, and the usual
affine coordinates are obtained by dehomogenization with respect to $x_0$ (i.e., setting
$x_0 = 1$). All points with $x_0 = 0$ are then considered as points at infinity. We refer
to [8, Chapter 8] for an elementary introduction to projective algebraic geometry.
Let $I = \langle h_1, \dots, h_p\rangle$ be a homogeneous ideal in the ring $K[x_0, x_1, \dots, x_n]$ of
polynomials in $n + 1$ unknowns with coefficients in $K$. We write $X = V_{\mathbb{C}}(I)$ for
its variety in the projective space $\mathbb{P}^n$ over $\mathbb{C}$. The singular locus $\mathrm{Sing}(X)$ is a
proper subvariety of $X$. It is defined inside $X$ by the vanishing of the $c \times c$ minors
of the $p \times (n+1)$ Jacobian matrix $\mathrm{Jac}(X) = (\partial h_i/\partial x_j)$, where $c = \mathrm{codim}(X)$. See
[8, Section 9.6] for background on singularities and dimension. While the matrix
$\mathrm{Jac}(X)$ depends on our choice of ideal generators $h_i$, the singular locus of $X$ is
independent of that choice. Points in $\mathrm{Sing}(X)$ are called singular points of $X$. We
write $X_{\mathrm{reg}} = X \setminus \mathrm{Sing}(X)$ for the set of regular points in $X$. We say that the
projective variety $X$ is smooth if $\mathrm{Sing}(X) = \emptyset$ or, equivalently, if $X = X_{\mathrm{reg}}$.
The dual projective space $(\mathbb{P}^n)^*$ parametrizes hyperplanes in $\mathbb{P}^n$. A point
$(u_0 : u_1 : \cdots : u_n) \in (\mathbb{P}^n)^*$ represents the hyperplane $\{x \in \mathbb{P}^n \mid \sum_{i=0}^n u_i x_i = 0\}$. We
say that $u$ is tangent to $X$ at a regular point $x \in X_{\mathrm{reg}}$ if $x$ lies in that hyperplane
and its representing vector $(u_0, u_1, \dots, u_n)$ lies in the row space of the Jacobian
matrix $\mathrm{Jac}(X)$ at the point $x$.
We define the conormal variety $\mathrm{CN}(X)$ of $X$ to be the closure of the set
$\{(x, u) \in \mathbb{P}^n \times (\mathbb{P}^n)^* \mid x \in X_{\mathrm{reg}} \text{ and } u \text{ is tangent to } X \text{ at } x\}$.
The projection of $\mathrm{CN}(X)$ onto the second factor is denoted $X^*$ and is called the
dual variety. More precisely, the dual variety $X^*$ is the closure of the set
$\{u \in (\mathbb{P}^n)^* \mid \text{the hyperplane } u \text{ is tangent to } X \text{ at some regular point}\}$.
In our definitions of conormal variety and dual variety, the word closure can mean
either Zariski closure or the classical strong closure over the complex numbers. Both
will lead to the same complex projective variety in the situations considered here.
Proposition 5.10. The conormal variety $\mathrm{CN}(X)$ has dimension $n - 1$.
Proof sketch. We may assume that $X$ is irreducible. Let $c = \mathrm{codim}(X)$. There
are $n - c$ degrees of freedom in picking a point $x$ in $X_{\mathrm{reg}}$. Once the regular point $x$
is fixed, the possible tangent vectors $u$ to $X$ at $x$ form a linear space of dimension
$c - 1$. Hence the dimension of $\mathrm{CN}(X)$ is $(n - c) + (c - 1) = n - 1$.
Since the dual variety $X^*$ is a linear projection of the conormal variety $\mathrm{CN}(X)$,
Proposition 5.10 implies that the dimension of $X^*$ is at most $n - 1$. We expect $X^*$
to have dimension $n - 1$. In other words, regardless of the dimension of $X$, the dual
variety $X^*$ is typically a hypersurface in the dual projective space $(\mathbb{P}^n)^*$. We shall
see many examples of such dual hypersurfaces throughout this chapter.
$\begin{pmatrix} a & b & c \\ 4x^3 & 4y^3 & -4z^3 \end{pmatrix}$.
We write $J$ for the ideal generated by these four polynomials in $\mathbb{Q}[x, y, z, a, b, c]$.
We then replace $J$ by its saturation
$J = (J : \langle x, y, z\rangle^\infty)$. (5.20)
$J := \bigl(J : \langle c \times c \text{ minors of } \mathrm{Jac}(X)\rangle^\infty\bigr)$.
The steps in this algorithm can be executed either using exact arithmetic in a
computer algebra system, such as Macaulay2, or using floating point arithmetic in
the framework of numerical algebraic geometry. Such a numerical implementation
in the software Bertini [3] is currently being developed by Jonathan Hauenstein.
Remark 5.12. The ideal $J$ in step 3 above is bihomogeneous in $x$ and $u$, respectively. Its zero set in $\mathbb{P}^n \times (\mathbb{P}^n)^*$ is the conormal variety $\mathrm{CN}(X)$.
Theorem 5.13 (Biduality, [11, Theorem 1.1]). Every irreducible projective
variety $X \subset \mathbb{P}^n$ satisfies
$(X^*)^* = X$.
Proof sketch. The main step in proving this important theorem is that the conormal variety is self-dual, in the sense that $\mathrm{CN}(X) = \mathrm{CN}(X^*)$. In this identity, the
roles of $x \in \mathbb{P}^n$ and $u \in (\mathbb{P}^n)^*$ are swapped. It implies $(X^*)^* = X$. A proof for the
self-duality of the conormal variety is found in [11, Subsection I.1.3].
Example 5.14. Suppose that $X \subset \mathbb{P}^n$ is a general smooth hypersurface of degree $d$.
Then $X^*$ is a hypersurface of degree $d(d-1)^{n-1}$ in $(\mathbb{P}^n)^*$. A concrete instance for
$d = 4$ and $n = 2$ was seen in Examples 5.8 and 5.11. When $X$ is a hypersurface
that is not smooth, then the dual variety $X^*$ is either a hypersurface of degree less
than $d(d-1)^{n-1}$, or $X^*$ is a variety of codimension at least 2.
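The degree formula is easy to tabulate. The following sketch (the function name `dual_degree` is hypothetical) evaluates $d(d-1)^{n-1}$ and recovers the degree 12 of the dual quartic curve.

```python
def dual_degree(d, n):
    """Degree of the dual of a general smooth degree-d hypersurface in P^n."""
    return d * (d - 1) ** (n - 1)

# Plane curves (n = 2): a conic is dual to a conic, a cubic to a sextic,
# and a quartic to a curve of degree 12.
plane_curve_degrees = {d: dual_degree(d, 2) for d in (2, 3, 4)}
```

For quadrics the formula returns 2 in every dimension, in line with the fact that the dual of a smooth quadric is again a quadric.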
Example 5.15. Let $X$ be the variety of symmetric $m \times m$ matrices of rank at
most $r$. Then $X^*$ is the variety of symmetric $m \times m$ matrices of rank at most $m - r$
[11, Subsection I.1.4]. Here the conormal variety $\mathrm{CN}(X)$ consists of pairs of symmetric matrices $A$ and $B$ such that $A \cdot B = 0$. This conormal variety will be important
for our discussion of duality in semidefinite programming in Section 5.5.
An important class of examples, arising from toric geometry, is featured in the
book by Gelfand, Kapranov, and Zelevinsky [11]. A projective toric variety $X_A$ in
$\mathbb{P}^n$ is specified by an integer matrix $A$ of format $r \times (n+1)$ and rank $r$ with columns
$a_0, a_1, \dots, a_n$ and whose row space contains the vector $(1, 1, \dots, 1)$. We define $X_A$
as the closure in $\mathbb{P}^n$ of the set $\{(t^{a_0} : t^{a_1} : \cdots : t^{a_n}) \mid t \in (\mathbb{C}\setminus\{0\})^r\}$.
that vanishes on $X_A^*$. The A-discriminant is indeed a discriminant in the sense that
its vanishing characterizes Laurent polynomials
$p(t) = \sum_{j=0}^n c_j \cdot t_1^{a_{1j}} t_2^{a_{2j}} \cdots t_r^{a_{rj}}$
with the property that the hypersurface $\{p(t) = 0\}$ has a singular point in $(\mathbb{C}\setminus\{0\})^r$.
In other words, we can define (and compute) the A-discriminant as the unique
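$X_A^* = \bigl\{ c \in (\mathbb{P}^n)^* \mid \exists\, t \in (\mathbb{C}\setminus\{0\})^r \text{ with } p(t) = \frac{\partial p}{\partial t_1} = \cdots = \frac{\partial p}{\partial t_r} = 0 \bigr\}$.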
Example 5.16. Let $r = 2$, $n = 4$, and fix the matrix
$A = \begin{pmatrix} 4 & 3 & 2 & 1 & 0 \\ 0 & 1 & 2 & 3 & 4 \end{pmatrix}$.
The associated toric variety is the rational normal curve
$X_A = \{(t_1^4 : t_1^3 t_2 : t_1^2 t_2^2 : t_1 t_2^3 : t_2^4) \in \mathbb{P}^4 \mid (t_1 : t_2) \in \mathbb{P}^1\}$
$= V(x_0x_2 - x_1^2,\ x_0x_3 - x_1x_2,\ x_0x_4 - x_2^2,\ x_1x_3 - x_2^2,\ x_1x_4 - x_2x_3,\ x_2x_4 - x_3^2)$.
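The six quadrics can be checked against the monomial parametrization with exact integer arithmetic. This is a minimal sketch with hypothetical function names; it assumes the binomials are read with minus signs, as in $x_0x_2 - x_1^2$.

```python
def rational_normal_point(t1, t2):
    """Return (t1^4 : t1^3 t2 : t1^2 t2^2 : t1 t2^3 : t2^4) as integer coordinates."""
    return (t1**4, t1**3 * t2, t1**2 * t2**2, t1 * t2**3, t2**4)

def quadrics(x):
    """Evaluate the six quadratic binomials cutting out the rational normal quartic."""
    x0, x1, x2, x3, x4 = x
    return [x0*x2 - x1**2, x0*x3 - x1*x2, x0*x4 - x2**2,
            x1*x3 - x2**2, x1*x4 - x2*x3, x2*x4 - x3**2]

# Every binomial vanishes identically on the parametrized curve.
on_curve = all(v == 0
               for t1 in range(-3, 4) for t2 in range(-3, 4)
               for v in quadrics(rational_normal_point(t1, t2)))

# A point off the curve, such as (1 : 0 : 0 : 0 : 1), violates x0*x4 - x2^2 = 0.
off_curve_values = quadrics((1, 0, 0, 0, 1))
```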
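A hyperplane $\{\sum_{j=0}^4 c_j x_j = 0\}$ is tangent to $X_A$ if and only if the binary form
$p(t_1, t_2) = c_0 t_1^4 + c_1 t_1^3 t_2 + c_2 t_1^2 t_2^2 + c_3 t_1 t_2^3 + c_4 t_2^4$
has a multiple root, that is, if and only if the A-discriminant vanishes:
$\Delta_A = \frac{1}{c_4} \det \begin{pmatrix}
c_0 & c_1 & c_2 & c_3 & c_4 & 0 & 0 \\
0 & c_0 & c_1 & c_2 & c_3 & c_4 & 0 \\
0 & 0 & c_0 & c_1 & c_2 & c_3 & c_4 \\
c_1 & 2c_2 & 3c_3 & 4c_4 & 0 & 0 & 0 \\
0 & c_1 & 2c_2 & 3c_3 & 4c_4 & 0 & 0 \\
0 & 0 & c_1 & 2c_2 & 3c_3 & 4c_4 & 0 \\
0 & 0 & 0 & c_1 & 2c_2 & 3c_3 & 4c_4
\end{pmatrix}$, (5.21)
given here in the form of the determinant of a Sylvester matrix, see [9, Section 3].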
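The Sylvester-matrix form of the discriminant of a quartic can be evaluated exactly over the rationals. The sketch below is an illustration under stated assumptions: the $7 \times 7$ matrix has three shifted rows of coefficients $(c_0, \dots, c_4)$ and four shifted rows of derivative coefficients $(c_1, 2c_2, 3c_3, 4c_4)$, the determinant is divided by $c_4$, and $c_4 \ne 0$. All function names are hypothetical.

```python
from fractions import Fraction

def determinant(rows):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        # Find a nonzero pivot in this column, at or below the diagonal.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n):
                m[r][k] -= factor * m[col][k]
    return det

def sylvester_matrix(c):
    """7x7 Sylvester matrix of p = c0 + c1 t + ... + c4 t^4 and its derivative."""
    p = list(c)
    dp = [c[1], 2 * c[2], 3 * c[3], 4 * c[4]]
    rows = [[0] * i + p + [0] * (2 - i) for i in range(3)]
    rows += [[0] * i + dp + [0] * (3 - i) for i in range(4)]
    return rows

def quartic_discriminant(c):
    """Sylvester determinant divided by c4 (assumes c4 != 0)."""
    return determinant(sylvester_matrix(c)) / c[4]

double_root = quartic_discriminant((1, -2, 2, -2, 1))   # (1 - t)^2 (1 + t^2)
squarefree = quartic_discriminant((-1, 0, 0, 0, 1))     # t^4 - 1
```

The first polynomial has the double root $t = 1$, so the determinant vanishes; for $t^4 - 1$ the matrix is upper triangular and the value is $(-1)^3 \cdot 4^4 = -256$.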
5.3
The Optimal Value Function
(5.22)
the optimal solution depends in a convex and piecewise linear manner on the cost
vector w and the right hand side b, and it is a piecewise rational function of the
entries of the matrix A. The area of mathematics which studies these functions
is geometric combinatorics, specifically the theory of matroids for the dependence
on A, and the theory of regular polyhedral subdivisions for the dependence on w
and b. Exercise 5.30 at the end of this section asks for a further exploration.
If we replace (5.22) with the corresponding integer programming problem, where
the coordinates of x are required to be integers, then the dependence on w and
b becomes more subtle and finite Abelian groups enter the picture. The optimal
value function of an integer program has a certain arithmetic behavior, in addition
to the polyhedral structures which govern the parametric versions of the linear
programming problem.
For a second example, consider the following basic question in game theory:
Given a game, compute its Nash equilibria.
(5.23)
If there are only two players and one is interested in fully mixed Nash equilibria,
then this is a linear problem and in fact closely related to linear programming. On
the other hand, if the number of players is more than two, then the problem (5.23) is
universal in the sense of real algebraic geometry: Datta [10] showed that every real
algebraic variety is isomorphic to the set of Nash equilibria of some three-person
game. A corollary of her construction is that, if the Nash equilibria are discrete,
then their coordinates can be arbitrary algebraic functions of the given input data.
Our third motivating example concerns maximum likelihood estimation in
statistical models for discrete data. Here the optimization problem is as follows:
maximize $p_1(\theta)^{u_1} p_2(\theta)^{u_2} \cdots p_n(\theta)^{u_n}$ subject to $\theta \in \Theta$, (5.24)
one, and the $u_i$ are positive integers (these are the data). The optimal solution $\hat\theta$,
which is the maximum likelihood estimator, depends algebraically on the data:
$(u_1, \dots, u_n) \mapsto \hat\theta(u_1, \dots, u_n)$. (5.25)
Catanese et al. [7] give a formula for the degree of this algebraic function under
certain hypotheses on the polynomials $p_i(\theta)$ which specify the statistical model.
In this section we study this issue for the polynomial optimization problem
(5.17). We shall assume throughout that the cost function f (x) is linear and that
there are no inequality constraints gi (x). The purpose of these restrictions is to
simplify the presentation and focus on the key ideas. Also, this is compatible with
Chapter 7, which offers an algebraic method for the important problem of computing
lower bounds on the optimal value function. Our analysis can be extended to the
general problem (5.17), and we discuss this briefly at the end of this section.
To be precise, we consider the problem of optimizing a linear cost function
over a compact real algebraic variety $X$ in $\mathbb{R}^n$. This is written formally as follows:
$c_0^* = \min_x\ \langle c, x\rangle$ subject to $x \in X = \{v \in \mathbb{R}^n \mid h_1(v) = \cdots = h_p(v) = 0\}$. (5.26)
0.
(5.27)
Our aim is to compute such a polynomial of least possible degree. The input
consists of the polynomials $h_1, \dots, h_p$ that cut out the variety $X$. The degree of
$\Phi$ in the unknown $c_0$ is called the algebraic degree of the optimization problem
(5.17). This number is an intrinsic algebraic complexity measure for the problem of
optimizing a linear function over $X$. For instance, if $c_1, \dots, c_n$ are rational numbers,
then the algebraic degree indicates the degree of the field extension $K$ over $\mathbb{Q}$ that
contains the coordinates of the optimal solution.
We illustrate our discussion by computing the optimal value function and its
algebraic degree for the trigonometric space curve featured in [31, Section 1].
Example 5.22. Let $X$ be the curve in $\mathbb{R}^3$ with parametric representation
$(x_1, x_2, x_3) = (\cos(\theta), \sin(2\theta), \cos(3\theta))$.
In terms of equations, our curve can be written as $X = V(h_1, h_2)$, where
$h_1 = x_1^2 - x_2^2 - x_1 x_3$
The optimal value function for maximizing c1 x1 +c2 x2 +c3 x3 over X is given by
= (11664c43 ) c60 + (864c31 c33 + 1512c21 c22 c23 19440c21 c43
+576c1 c42 c3 1296c1 c22 c33 + 64c62 25272c22 c43 34992c63 ) c40
6 2
+ (16c1 c3 + 8c51 c22 c3 1152c51 c33 1920c41 c22 c23 + 8208c41 c43 724c31 c42 c3 + 144c31 c22 c33
+c41 c42 17280c31 c53 80c21 c62 2802c21 c42 c23 3456c21 c22 c43 + 3888c21 c63 1120c1 c62 c3
+540c1 c42 c33 + 55080c1 c22 c53 128c82 208c62 c23 +15417c42 c43 +15552c22 c63 +34992c83 ) c20
+ (16c81 c23 8c71 c22 c3 + 256c71 c33 c61 c42 + 328c61 c22 c23 1600c61 c43 + 114c51 c42 c3
2856c51 c22 c33 + 4608c51 c53 + 12c41 c62 1959c41 c42 c23 + 9192c41 c22 c43 4320c41 c63
528c31 c62 c3 + 7644c31 c42 c33 7704c31 c22 c53 6912c31 c73 48c21 c82 + 3592c21 c62 c23
4863c21 c42 c43 13608c21 c22 c63 + 15552c21 c83 + 800c1 c82 c3 400c1 c62 c33 10350c1 c42 c53
8 2
6 4
4 6
2 8
10
+16200c1 c22 c73 + 64c10
2 + 80c2 c3 1460c2 c3 + 135c2 c3 + 9720c2 c3 11664c3 ).
The optimal value function $c_0^*$ is the algebraic function of $c_1, c_2, c_3$ obtained by solving $\Phi = 0$ for the unknown $c_0$. Since $c_0$ has degree 6 in $\Phi$, we see that the algebraic
degree of this optimization problem is 6. Note that there are no odd powers of $c_0$
in $\Phi$. Thus, $\Phi$ is a cubic polynomial in $c_0^2$, and this implies that we can write the
optimal value function $c_0^*$ as an expression in radicals in $c_1, c_2, c_3$.
We now come to the main result in this section. It will explain what the
polynomial $\Phi$ means and how it was computed in the previous example. For the
sake of simplicity, we shall first assume that the given variety $X$ is smooth, i.e.
$X = X_{\mathrm{reg}}$, where the set $X_{\mathrm{reg}}$ denotes all regular points on $X$.
Theorem 5.23. Let $X^* \subset (\mathbb{P}^n)^*$ be the dual variety to the projective closure of a
real affine variety $X$ in $\mathbb{R}^n$. If $X$ is irreducible, smooth, and compact in $\mathbb{R}^n$, then $X^*$
is an irreducible hypersurface, and its defining polynomial equals $\Phi(-c_0, c_1, \dots, c_n)$
where $\Phi$ represents the optimal value function as in (5.27) of the optimization problem (5.26). In particular, the algebraic degree of (5.26) is the degree in $c_0$ of the
irreducible polynomial that vanishes on the dual hypersurface $X^*$.
Here the change of sign in the coordinate $c_0$ is needed because the equation
$c_0 = c_1x_1 + \cdots + c_nx_n$ for the objective function value in $\mathbb{R}^n$ becomes the homogenized equation $(-c_0)x_0 + c_1x_1 + \cdots + c_nx_n = 0$ when we pass to $\mathbb{P}^n$.
Proof. Since $X$ is compact, for every cost vector $c$ there exists an optimal solution
$x^*$. Our assumption that $X$ is smooth ensures that $x^*$ is a regular point of $X$, and
$c$ lies in the span of the gradient vectors $\nabla_x h_i\big|_{x^*}$ for $i = 1, \dots, p$. In other words,
the KKT conditions are necessary at the point $x^*$:
$c = \sum_{i=1}^p \lambda_i \nabla_x h_i\big|_{x^*}$, $\quad h_i(x^*) = 0$ for $i = 1, 2, \dots, p$.
(5.28)
c2 = 1 4x32 ,
x41 + x42 = 1.
(5.29)
Again, the optimal value function is represented by a unique square-free polynomial $\Phi(c_0, c_1, \dots, c_n)$, and each factor of this polynomial is the dual hypersurface
$Y^*$ of some variety $Y$ that is obtained from $X$ by setting $g_i(x) = 0$ for some of
the inequality constraints, by recursively passing to singular loci. In Section 5.5 we
shall explore this for semidefinite programming.
We close this section with a simple example involving A-discriminants.
Example 5.27. Consider the calculus exercise of minimizing a polynomial
$q(t) = c_1 t + c_2 t^2 + c_3 t^3 + c_4 t^4$
5.4
An Algebraic View of Convex Hulls
Figure 5.5. A quartic curve in the plane can have up to 28 real bitangents.
Each reduces over R to four parallel lines (cf. Figure 5.5), two of which contribute
to the boundary. The point of this example is to stress the role of the (arithmetic
of) bitangents in any exact description of the convex hull of a plane curve.
We now present a general formula for the algebraic boundary of the convex hull
of a compact variety $X$ in $\mathbb{R}^n$. The key observation is that the algebraic boundary
of $P = \mathrm{conv}(X)$ will consist of different types of components, resulting from planes
that are simultaneously tangent at $k$ different points of $X$, for various values of the
integer $k$. For the Trott curve $X$ in Example 5.33, the relevant integers were $k = 1$
and $k = 2$, and we demonstrated that the algebraic boundary of its convex hull $P$
is a reducible curve of degree 12:
$\partial_a(P) = X \cup Y$. (5.31)
$\partial_a P \subseteq \bigcup_{k=1}^{n} (X^{[k]})^*$. (5.32)
its irreducible components (over the field $K$ of interest). For each component we
then check, usually by means of numerical computations, whether it meets the
boundary $\partial P$ in a regular point. The irreducible hypersurfaces which survive this
test are precisely the components of $\partial_a X$.
Example 5.35. When $X$ is a plane curve in $\mathbb{R}^2$, (5.32) says that
$\partial_a P \subseteq X \cup (X^{[2]})^*$. (5.34)
Here $X^{[2]}$ is the set of points in $(\mathbb{P}^2)^*$ that are dual to the bitangent lines of $X$, and
$(X^{[2]})^*$ is the union of those lines in $\mathbb{P}^2$. If we work over $K = \mathbb{Q}$ and the curve $X$
is general enough then we expect equality to hold in (5.34). For special curves the
inclusion can be strict. This happens for the Trott curve (5.30) since $Y$ is a proper
subset of $(X^{[2]})^*$. Namely, $Y$ consists of two of the six $\mathbb{Q}$-components of $(X^{[2]})^*$.
However, a small perturbation of the coefficients in (5.30) leads to a curve $X$ with
equality in (5.34), as the relevant Galois group acts transitively on the 28 points
in $X^{[2]}$ for general quartics $X$. See [28] for more details. We conclude that the
algebraic boundary of $X$ over $\mathbb{Q}$ is a reducible curve of degree 32 = 28 + 4.
If we are given the variety $X$ in terms of equations or in parametric form,
then we can compute equations for $X^{[k]}$ by an elimination process similar to the
computation of the dual variety $X^*$ in Algorithm 5.1. However, expressing the
tangency condition at $k$ different points requires a larger number of additional
variables (which need to be eliminated afterwards) and thus the computations are
quite involved. The subsequent step of dualizing $X^{[k]}$ to get the right-hand side of
(5.32) is even more forbidding. The resulting hypersurfaces $(X^{[k]})^*$ tend to have
high degree and their defining polynomials are very large when $n \ge 3$.
The article [31] offers a detailed study of the case when $X$ is a space curve in
$\mathbb{R}^3$. Here the lower bound (5.33) tells us that $\partial_a X \subseteq (X^{[2]})^* \cup (X^{[3]})^*$. The surface
$(X^{[2]})^*$ is the edge surface of the curve $X$, and $(X^{[3]})^*$ is the union of all tritangent
planes of $X$. The following example illustrates these objects.
Example 5.36. We consider the trigonometric curve $X$ in $\mathbb{R}^3$ parametrized by
$x = \cos(\theta)$, $y = \cos(2\theta)$, $z = \sin(3\theta)$. This is an algebraic curve of degree six. Its
implicit representation equals $X = V(h_1, h_2)$, where
$h_1 = 2x^2 - y - 1$ and $h_2 = 4y^3 + 2z^2 - 3y - 1$.
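The implicit equations follow from the double-angle identity $\cos(2\theta) = 2\cos^2(\theta) - 1$ and from $2\sin^2(3\theta) = 1 - \cos(6\theta)$ with $\cos(6\theta) = 4\cos^3(2\theta) - 3\cos(2\theta)$. A short numerical check, assuming the signs $h_1 = 2x^2 - y - 1$ and $h_2 = 4y^3 + 2z^2 - 3y - 1$ that these identities dictate (function names are hypothetical):

```python
import math

def h1(x, y, z):
    """Quadric through the curve: 2x^2 - y - 1."""
    return 2 * x**2 - y - 1

def h2(x, y, z):
    """Cubic through the curve: 4y^3 + 2z^2 - 3y - 1."""
    return 4 * y**3 + 2 * z**2 - 3 * y - 1

def curve_point(theta):
    """Point (cos(theta), cos(2 theta), sin(3 theta)) on the trigonometric curve."""
    return (math.cos(theta), math.cos(2 * theta), math.sin(3 * theta))

# Evaluate both polynomials at 50 equally spaced parameter values.
thetas = [2 * math.pi * k / 50 for k in range(50)]
max_residual = max(max(abs(h1(*curve_point(t))), abs(h2(*curve_point(t))))
                   for t in thetas)
```

Both polynomials should vanish on all sampled points up to rounding error.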
The edge surface $(X^{[2]})^*$ has three irreducible components. Two of the components are the quadric $V(h_1)$ and the cubic $V(h_2)$. The third and most interesting
component of $(X^{[2]})^*$ is the surface of degree 16 with equation $h_3 =$
419904x14 y 2 + 664848x12 y 4 419904x10 y 6 + 132192x8 y 8 20736x6 y 10 + 1296x4 y 12
46656x14 z 2 + 373248x12 y 2 z 2 69984x10 y 4 z 2 22464x8 y 6 z 2 +4320x6 y 8 z 2 +31104x12 z 4
+ 5184x10 y 2 z 4 + 4752x8 y 4 z 4 + 1728x10 z 6 + 699840x14 y 46656x12 y 3 902016x10 y 5
+694656x8 y 7 209088x6 y 9 1150848x10 y 3 z 2 +279936x8 y 5 z 2 +17280x6 y 7 z 2 4032x4 y 9 z 2
98496x10 yz 4 + 27072x4 y 11 1152x2 y 13 419904x12 yz 2 25920x8 y 3 z 4 4608x6 y 5 z 4
Figure 5.6. The convex hull of the curve $(\cos(\theta), \cos(2\theta), \sin(3\theta))$ in $\mathbb{R}^3$.
$X^*$ contains the curve $X^{[2]}$ which is the union of four quadratic curves. The duals
of these four plane curves are the singular quadratic surfaces defined by
$h_3 = x^2 - 2y^2 - z^2$, $h_4 = 2x^2 - y^2 - 1$, $h_5 = 3y^2 + 2z^2 - 1$, $h_6 = 3x^2 + z^2 - 2$.
The edge surface of $X$ is the union of these four quadrics:
$(X^{[2]})^* = V(h_3) \cup V(h_4) \cup V(h_5) \cup V(h_6)$.
The algebraic boundary of $P$ consists of the last two among these quadrics:
$\partial_a P = V(h_5) \cup V(h_6)$.
Figure 5.7. The curve on the unit sphere discussed in Examples 5.37 and 5.61.
These two quadrics are convex. From this we derive a representation of P as a
spectrahedron by applying Schur complements to the quadrics h5 and h6 :
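$\left\{ (x, y, z) \in \mathbb{R}^3 \;\middle|\; \begin{pmatrix}
1 + \sqrt{3}\,y & \sqrt{2}\,z & 0 & 0 \\
\sqrt{2}\,z & 1 - \sqrt{3}\,y & 0 & 0 \\
0 & 0 & \sqrt{2} + z & \sqrt{3}\,x \\
0 & 0 & \sqrt{3}\,x & \sqrt{2} - z
\end{pmatrix} \succeq 0 \right\}$.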
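The relation between the quadrics and the block matrix can be checked by hand or by machine. The sketch below assumes the quadrics are read as $h_3 = x^2 - 2y^2 - z^2$, $h_4 = 2x^2 - y^2 - 1$, $h_5 = 3y^2 + 2z^2 - 1$, $h_6 = 3x^2 + z^2 - 2$, and that the two diagonal blocks are $\begin{pmatrix} 1+\sqrt{3}y & \sqrt{2}z \\ \sqrt{2}z & 1-\sqrt{3}y \end{pmatrix}$ and $\begin{pmatrix} \sqrt{2}+z & \sqrt{3}x \\ \sqrt{3}x & \sqrt{2}-z \end{pmatrix}$; these readings of signs and square roots are assumptions. Under them the four quadrics lie in one pencil, and the block determinants equal $-h_5$ and $-h_6$.

```python
import math

def h3(x, y, z): return x**2 - 2*y**2 - z**2
def h4(x, y, z): return 2*x**2 - y**2 - 1
def h5(x, y, z): return 3*y**2 + 2*z**2 - 1
def h6(x, y, z): return 3*x**2 + z**2 - 2

def pencil_identities_hold(x, y, z):
    """Check h6 - 2*h5 = 3*h3 and 2*h6 - h5 = 3*h4 at an integer point."""
    return (h6(x, y, z) - 2*h5(x, y, z) == 3*h3(x, y, z)
            and 2*h6(x, y, z) - h5(x, y, z) == 3*h4(x, y, z))

def block_determinants(x, y, z):
    """Determinants of the two 2x2 diagonal blocks of the 4x4 matrix."""
    r2, r3 = math.sqrt(2.0), math.sqrt(3.0)
    d1 = (1 + r3*y) * (1 - r3*y) - (r2*z)**2   # equals 1 - 3y^2 - 2z^2
    d2 = (r2 + z) * (r2 - z) - (r3*x)**2       # equals 2 - z^2 - 3x^2
    return d1, d2

identities = all(pencil_identities_hold(*pt)
                 for pt in [(1, 2, 3), (0, 1, -1), (2, -1, 4)])
d1, d2 = block_determinants(0.3, -0.2, 0.5)
```

Since each $2 \times 2$ block has positive trace near the origin, positive semidefiniteness of the block matrix amounts to $h_5 \le 0$ and $h_6 \le 0$ there.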
Exercise 5.42. Intersect the unit sphere in 3-space with a general quadratic
surface. Show that the convex hull of the resulting curve is a spectrahedron.
5.5
Spectrahedra and Semidefinite Programming
m
m
n
P =
xR C
xi Ai 0
$ K S+
.
(5.35)
i=1
We shall assume that $C$ is positive definite or, equivalently, that $0 \in \mathrm{int}(P)$. The
dual body to our spectrahedron is written in the coordinates on $\mathbb{R}^m$ as
$P^\Delta = \{y \in \mathbb{R}^m \mid \langle y, x\rangle \le 1 \text{ for all } x \in P\}$.
We can express $P^\Delta$ as a projection of the $\binom{n+1}{2}$-dimensional spectrahedron
$Q = \{U \in S^n_+ \mid \langle U, C\rangle \le 1\}$. (5.36)
$\mathcal{E}_3 = \left\{ (x, y, z) \in \mathbb{R}^3 \;\middle|\; \begin{pmatrix} 1 & x & y \\ x & 1 & z \\ y & z & 1 \end{pmatrix} \succeq 0 \right\}$. (5.37)
This spectrahedron of dimension $m = 3$ is shown on the left in Figure 5.8. The
algebraic boundary of $\mathcal{E}_3$ is the cubic surface $X$ defined by the vanishing of the $3 \times 3$
determinant in (5.37). That surface has four isolated singular points
$X_{\mathrm{sing}} = \{(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)\}$.
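The six edges of the tetrahedron $\mathrm{conv}(X_{\mathrm{sing}})$ are edges of the elliptope $\mathcal{E}_3$. The dual
body, shown on the right of Figure 5.8, is the projected spectrahedron
$\mathcal{E}_3^\Delta = \left\{ (a, b, c) \in \mathbb{R}^3 \;\middle|\; \exists\, u, v \in \mathbb{R} : \begin{pmatrix} u & -a & -b \\ -a & v & -c \\ -b & -c & 2 - u - v \end{pmatrix} \succeq 0 \right\}$. (5.38)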
The algebraic boundary of $\mathcal{E}_3^\Delta$ can be computed by the following method. We form
the ideal generated by the determinant in (5.38) and its derivatives with respect to
$u$ and $v$, and we eliminate $u, v$. This results in the polynomial
$(a^2b^2 + b^2c^2 + a^2c^2 + 2abc)(a + b + c - 1)(a - b - c - 1)(a - b + c + 1)(a + b - c + 1)$.
The first factor is the equation of Steiner's quartic surface $X^*$, which is dual to
Cayley's cubic surface $X = \partial_a \mathcal{E}_3$. The four linear factors represent the arrangement
$(X_{\mathrm{sing}})^*$ of the four planes dual to the four singular points.
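The duality between Cayley's cubic and Steiner's quartic can be spot-checked with exact rational arithmetic. The sketch below assumes the cubic is $1 + 2xyz - x^2 - y^2 - z^2$ (the determinant of the $3 \times 3$ matrix with unit diagonal and off-diagonal entries $x, y, z$), the singular points are the four sign patterns with $xyz = 1$, and the quartic factor is $a^2b^2 + b^2c^2 + a^2c^2 + 2abc$; function names are hypothetical. A tangent plane at a smooth point $p$ is normalized to $ax + by + cz = 1$.

```python
from fractions import Fraction

def cayley(x, y, z):
    """Determinant of [[1, x, y], [x, 1, z], [y, z, 1]]."""
    return 1 + 2*x*y*z - x*x - y*y - z*z

def cayley_gradient(x, y, z):
    """Gradient of the Cayley cubic."""
    return (2*y*z - 2*x, 2*x*z - 2*y, 2*x*y - 2*z)

def steiner(a, b, c):
    """First factor of the eliminant: Steiner's quartic."""
    return a*a*b*b + b*b*c*c + a*a*c*c + 2*a*b*c

def dual_point(p):
    """Return (a, b, c) such that a*x + b*y + c*z = 1 is the tangent plane at p."""
    g = cayley_gradient(*p)
    s = sum(gi * pi for gi, pi in zip(g, p))   # nonzero at the sampled points
    return tuple(gi / s for gi in g)

singular_points = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

# Smooth points of the form (t, t, 2t^2 - 1) lie on the cubic for every t.
smooth_points = [(t, t, 2*t*t - 1) for t in (Fraction(1, 2), Fraction(1, 3))]
dual_points = [dual_point(p) for p in smooth_points]
```

At each singular point both the cubic and its gradient vanish, and each computed dual point satisfies the quartic; for $t = 1/2$ the dual point is $(2/3, 2/3, -2/3)$.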
Thus the algebraic boundary of the dual body $\mathcal{E}_3^\Delta$ is the reducible surface
$\partial_a \mathcal{E}_3^\Delta = X^* \cup (X_{\mathrm{sing}})^* \subset (\mathbb{P}^3)^*$. (5.39)
(5.40)
Here $P$ is as in (5.35). As the semidefiniteness of a matrix is equivalent to the simultaneous nonnegativity of its principal minors, SDP is an instance of the polynomial
optimization problem (5.17). Lagrange duality theory applies here by [5, Section 5].
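The principal-minor criterion turns a semidefinite constraint into finitely many polynomial inequalities. The sketch below is a naive illustration for small symmetric matrices with exact entries, using hypothetical helper names; it enumerates all $2^n - 1$ principal minors, so it is meant only to show the criterion, not as a practical test.

```python
from fractions import Fraction
from itertools import combinations

def determinant(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * determinant(minor)
    return total

def principal_minors(m):
    """Yield the determinants of all principal submatrices of the square matrix m."""
    n = len(m)
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            yield determinant([[m[i][j] for j in idx] for i in idx])

def is_psd(m):
    """A symmetric matrix is PSD iff all principal minors are nonnegative."""
    return all(minor >= 0 for minor in principal_minors(m))

half = Fraction(1, 2)
inside = is_psd([[1, half, half], [half, 1, half], [half, half, 1]])
outside = is_psd([[1, 1, 1], [1, 1, -1], [1, -1, 1]])
# Leading minors alone are not enough: here they are 0 and 0, yet the matrix is not PSD.
degenerate = is_psd([[0, 0], [0, -1]])
```

The last matrix shows why all principal minors are needed, not just the leading ones.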
We shall derive the optimization problem dual to (5.40) from
d := minimize subject to
1
b P .
(5.41)
subject to
Ai , Y = bi for i = 1, . . . , m
and Y
(5.42)
n
S+
.
(5.43)
The following reformulation of (5.40) highlights the symmetry between the primal
and dual formulations of our SDP problem:
$p^* := \max_X\ \langle B, C - X\rangle$ subject to $X \in (C + \mathcal{W}) \cap S^n_+$. (5.44)
(5.45)
Given the data $B$, $C$, and $\mathcal{W}$, our problem is to solve the polynomial equations
(5.45). The theorem ensures that, among its solutions $(X, Y)$, there is precisely one
pair of positive semidefinite matrices. That pair is the one desired in SDP.
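Example 5.46. Consider the problem of minimizing a linear function $Y \mapsto \langle C, Y\rangle$
over the set of all correlation matrices $Y$, that is, over the elliptope $\mathcal{E}_n$ of Example
5.44. Here $m = n$, $B$ is the identity matrix, $C$ is any symmetric matrix, $\mathcal{W}$ is
the space of all diagonal matrices, and $\mathcal{W}^\perp$ consists of matrices with zero diagonal.
This problem is dual to maximizing the trace of $C - X$ over all matrices $X \in S^n_+$
such that $C - X$ is diagonal. Equivalently, we seek to find the minimum trace $t^*$ of
any positive semidefinite matrix that agrees with $C$ in its off-diagonal entries.
For $n = 4$, the KKT equations (5.45) can be written in the form
$X \cdot Y = \begin{pmatrix} x_1 & c_{12} & c_{13} & c_{14} \\ c_{12} & x_2 & c_{23} & c_{24} \\ c_{13} & c_{23} & x_3 & c_{34} \\ c_{14} & c_{24} & c_{34} & x_4 \end{pmatrix} \cdot \begin{pmatrix} 1 & y_{12} & y_{13} & y_{14} \\ y_{12} & 1 & y_{23} & y_{24} \\ y_{13} & y_{23} & 1 & y_{34} \\ y_{14} & y_{24} & y_{34} & 1 \end{pmatrix} = 0$. (5.46)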
This is a system of 16 quadratic equations in 10 unknowns. For general values of
the 6 parameters cij , these equations have 14 solutions. Eight of these solutions
have $\mathrm{rank}(X) = 3$ and $\mathrm{rank}(Y) = 1$ and they are defined over $\mathbb{Q}(c_{ij})$. The other
six solutions form an irreducible variety over $\mathbb{Q}(c_{ij})$ and they satisfy $\mathrm{rank}(X) =
\mathrm{rank}(Y) = 2$. This case distinction reflects the boundary structure of the dual body
to the six-dimensional elliptope $\mathcal{E}_4$:
$\partial_a \mathcal{E}_4^\Delta = \{\mathrm{rank}(Y) \le 2\}^* \cup \{\mathrm{rank}(Y) = 1\}^*$. (5.47)
Indeed, the boundary of $\mathcal{E}_4$ is the quartic hypersurface $\{\mathrm{rank}(Y) \le 3\}$, its singular
locus is the degree 10 threefold $\{\mathrm{rank}(Y) \le 2\}$, and, finally, the singular locus of
that threefold consists of eight matrices of rank 1:
$\{\mathrm{rank}(Y) = 1\} = \{(u_1, u_2, u_3, u_4)^T (u_1, u_2, u_3, u_4) : u_i \in \{-1, +1\}\}$.
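The count of eight rank-one matrices comes from the sixteen sign vectors $u \in \{-1, +1\}^4$, identified in pairs because $u$ and $-u$ give the same outer product. A brief enumeration, with a hypothetical function name:

```python
from itertools import product

def rank_one_sign_matrices(n=4):
    """Return the set of distinct outer products u^T u for sign vectors u of length n."""
    seen = set()
    for u in product((-1, 1), repeat=n):
        # Store each matrix as a tuple of tuples so that it can be put into a set.
        seen.add(tuple(tuple(ui * uj for uj in u) for ui in u))
    return seen

matrices = rank_one_sign_matrices(4)
```

Each of these matrices has unit diagonal, so it is a correlation matrix, and in general there are $2^{n-1}$ of them.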
The last two strata are dual to the hypersurfaces in (5.47). The second component
in (5.47) consists of eight hyperplanes, while the first component is irreducible of
degree 18. The corresponding projective hypersurface is defined by an irreducible
homogeneous polynomial of degree 18 in seven unknowns $c_{12}, c_{13}, c_{14}, c_{23}, c_{24}, c_{34}, t^*$.
That polynomial has degree 6 in the special unknown $t^*$. Hence, the algebraic degree
of our SDP, i.e., the degree of the optimal value function, is 6 when $\mathrm{rank}(Y) = 2$.
We note that $\{\mathrm{rank}(Y) \le 3\}^*$ does not appear as a component in the union
(5.47) since it is not a hypersurface. Nevertheless, it is still a subset of $\partial_a \mathcal{E}_4^\Delta$.
In algebraic geometry, it is natural to regard the matrix pairs $(X, Y)$ as points
in the product of projective spaces $\mathbb{P}(S^n) \times \mathbb{P}(S^n)$. This has the advantage that
solutions of (5.45) are invariant under scaling, i.e., whenever $(X, Y)$ is a solution,
then so is $(\lambda X, \mu Y)$ for any nonzero $\lambda, \mu \in \mathbb{R}$. In that setting, there are no worries
about complications due to solutions at infinity.
For the algebraic formulation we assume that, without loss of generality,
$b_1 = 1,\ b_2 = 0,\ b_3 = 0,\ \dots,\ b_m = 0$.
This means that $\langle A_1, X\rangle = 1$ plays the role of the homogenizing variable. Our SDP
instance is specified by two linear subspaces of symmetric matrices:
$\mathcal{L} = \mathrm{Span}(A_2, A_3, \dots, A_m) \subset \mathcal{U} = \mathrm{Span}(C, A_1, A_2, \dots, A_m) \subset S^n$.
Note that we have the following identifications:
$\mathbb{R}C + \mathcal{W} = \mathcal{U}$ and $\mathbb{R}B + \mathcal{W}^\perp = \mathbb{R}B + (\mathcal{L}^\perp \cap A_1^\perp) = \mathcal{L}^\perp$.
(5.48)
Here is an abstract definition of SDP that might appeal to some of our algebraically inclined readers: Given two nested linear subspaces $\mathcal{L} \subset \mathcal{U} \subset S^n$ with
$\dim(\mathcal{U}/\mathcal{L}) = 2$, locate the unique semidefinite point in the variety (5.48).
For instance, in Example 5.46 the space $\mathcal{L}$ consists of traceless diagonal matrices and $\mathcal{U}/\mathcal{L}$ is spanned by the unit matrix $B$ and one off-diagonal matrix $C$. We
seek to solve the matrix equation $X \cdot Y = 0$ where the diagonal entries of $X$ are
constant and the off-diagonal entries of $Y$ are proportional to $C$.
The formulation (5.48) suggests that we study the variety $\{XY = 0\}$ for pairs
of symmetric matrices $X$ and $Y$. In [27, Equation (3.9)] it was shown that this
variety has the following decomposition into irreducible components:
$\{XY = 0\} = \bigcup_{r=1}^{n-1} \{XY = 0\}^r \subset \mathbb{P}(S^n) \times \mathbb{P}(S^n)$.
Here $\{XY = 0\}^r$ denotes the subvariety consisting of pairs $(X, Y)$ where $\mathrm{rank}(X) \le r$
and $\mathrm{rank}(Y) \le n - r$. This is irreducible because, by Example 5.15, it is the conormal variety of the variety of symmetric matrices of rank $\le r$. See also Exercise 5.19
at the end of Section 5.2.
The KKT equations describe sections of these conormal varieties:
$\{XY = 0\}^r \cap \bigl(\mathbb{P}(\mathcal{U}) \times \mathbb{P}(\mathcal{L}^\perp)\bigr)$. (5.49)
All solutions of a semidefinite optimization problem (and thus also the boundary of
a spectrahedron and its dual) can be characterized by rank conditions. The main
result in [27] describes the case when the section in (5.49) is generic:
Theorem 5.47 ([27, Theorem 7]). For generic subspaces $\mathcal{L} \subset \mathcal{U} \subset S^n$ with
$\dim(\mathcal{L}) = m - 1$ and $\dim(\mathcal{U}) = m + 1$, the variety (5.49) is empty unless
$\binom{n-r+1}{2} \le m$ and $\binom{r+1}{2} \le \binom{n+1}{2} - m$. (5.50)
In that case, the variety (5.49) is reduced, nonempty, and zero-dimensional and at
each point the rank of $X$ and $Y$ is $r$ and $n - r$, respectively (strict complementarity).
The cardinality of this variety depends only on $m$, $n$, and $r$.
The generic choice of nested subspaces $\mathcal{L} \subset \mathcal{U}$ corresponds to the assumption
that our matrices $A_1, A_2, \dots, A_m, B, C$ lie in a certain dense open subset in the space
of all SDP instances. The inequalities (5.50) are known as Pataki's inequalities.
If $m$ and $n$ are fixed, then they give a lower bound and an upper bound for the
possible ranks $r$ of the optimal matrix of a generic SDP instance. The variety
(5.49) represents all complex solutions of the KKT equations for such a generic
SDP instance. Its cardinality, denoted $\delta(m, n, r)$, is known as the algebraic degree
of SDP.
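Pataki's inequalities are simple to evaluate for given $m$ and $n$. The sketch below assumes they read $\binom{n-r+1}{2} \le m$ and $\binom{r+1}{2} \le \binom{n+1}{2} - m$, and restricts $r$ to the range $1, \dots, n-1$ of the components $\{XY = 0\}^r$; the function name `pataki_ranks` is hypothetical.

```python
from math import comb

def pataki_ranks(m, n):
    """Return the ranks r in 1..n-1 that satisfy both of Pataki's inequalities."""
    total = comb(n + 1, 2)   # dimension of the space of symmetric n x n matrices
    return [r for r in range(1, n)
            if comb(n - r + 1, 2) <= m and comb(r + 1, 2) <= total - m]

ranks_3_3 = pataki_ranks(3, 3)   # three constraints on 3x3 matrices
ranks_6_4 = pataki_ranks(6, 4)   # six constraints on 4x4 matrices
ranks_1_4 = pataki_ranks(1, 4)   # a single constraint on 4x4 matrices
```

For a single constraint only the corank-one case $r = n - 1$ survives, while larger $m$ admits lower ranks.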
Corollary 5.48. Consider the variety of symmetric $n \times n$ matrices of rank $\le r$ that
lie in the generic $m$-dimensional linear subspace $\mathbb{P}(\mathcal{U})$ of $\mathbb{P}(S^n)$. Its dual variety is
a hypersurface if and only if Pataki's inequalities (5.50) hold, and the degree of that
hypersurface is $\delta(m, n, r)$, the algebraic degree of SDP.
Proof. The genericity of $\mathcal{U}$ ensures that $\{XY = 0\}^r \cap (\mathbb{P}(\mathcal{U}) \times \mathbb{P}(\mathcal{U})^*)$ is the
conormal variety of the given variety. We obtain its dual by projection onto the
second factor $\mathbb{P}(\mathcal{U})^* = \mathbb{P}(S^n/\mathcal{U}^\perp)$. The degree of the dual hypersurface is found by
intersecting with a generic line. The line we take is $\mathbb{P}(\mathcal{L}^\perp/\mathcal{U}^\perp)$. That intersection
corresponds to the second factor $\mathbb{P}(\mathcal{L}^\perp)$ in (5.49).
We note that the symmetry in the equations (5.48) implies the duality
\[ \delta(m, n, r) = \delta\Bigl(\binom{n+1}{2} - m,\; n,\; n-r\Bigr), \]
first shown in [27, Proposition 9]. See also [27, Table 2]. Bothmer and Ranestad
[12] derived an explicit combinatorial formula for the algebraic degree of SDP. Their
result implies that $\delta(m,n,r)$ is a polynomial of degree $m$ in $n$ when $n-r$ is fixed.
For example, in addition to [27, Theorem 11], we have
\[ \delta(6, n, n-2) = \frac{1}{72}\bigl(11n^6 - 81n^5 + 185n^4 - 75n^3 - 196n^2 + 156n\bigr). \]
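A quick way to sanity-check a formula of this kind is to evaluate it exactly at small $n$. The sketch below assumes the sign pattern $11n^6 - 81n^5 + 185n^4 - 75n^3 - 196n^2 + 156n$ for the numerator; the function name is hypothetical.

```python
from fractions import Fraction

def delta_6_corank_2(n):
    """Evaluate (11n^6 - 81n^5 + 185n^4 - 75n^3 - 196n^2 + 156n) / 72 exactly."""
    n = Fraction(n)
    numerator = 11*n**6 - 81*n**5 + 185*n**4 - 75*n**3 - 196*n**2 + 156*n
    return numerator / 72

for n in range(3, 7):
    print(n, delta_6_corank_2(n))
```

The value at $n = 3$ is zero, in line with the fact that $n = 3$, $m = 6$, $r = 1$ violates Pataki's inequalities, and the values at $n = 4, 5, 6$ are integers, as a degree must be.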
\[ \bigcup_{r \text{ as in } (5.50)} \{ X \in \mathcal{L} \mid \operatorname{rank}(X) \le r \}. \qquad (5.51) \]
2D12
D12 +D13 D23
The $D_{ij}$ are the squared distances among six points in $\mathbb{R}^3$ if and only if this matrix
is positive semidefinite of rank at most 3. The points represent the carbon atoms in
cyclohexane $C_6H_{12}$ if and only if $D_{i,i+1} = 1$ and $D_{i,i+2} = 8/3$ for all indices $i$,
understood cyclically. The three diagonal distances $x = D_{14}$, $y = D_{25}$, and
$z = D_{36}$ are unknowns, so, for cyclohexane conformations, the above Schönberg
matrix equals
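\[
C_6(x,y,z) =
\begin{pmatrix}
2 & 8/3 & x - 5/3 & 11/3 - y & -2/3 \\
8/3 & 16/3 & x + 5/3 & 8/3 & 11/3 - z \\
x - 5/3 & x + 5/3 & 2x & x + 5/3 & x - 5/3 \\
11/3 - y & 8/3 & x + 5/3 & 16/3 & 8/3 \\
-2/3 & 11/3 - z & x - 5/3 & 8/3 & 2
\end{pmatrix}.
\]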
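The rank condition can be tested exactly on a known conformation. The sketch below assumes that the Schönberg matrix has entries $D_{1i} + D_{1j} - D_{ij}$ for $i, j = 2, \dots, 6$, and it uses the chair conformation, for which an elementary computation with six points on two parallel triangles gives $x = y = z = 11/3$. All function names are illustrative.

```python
from fractions import Fraction as F

def cyclohexane_distances(x, y, z):
    """Squared-distance table D (0-based labels) with D[i][i+1]=1, D[i][i+2]=8/3."""
    D = [[F(0)] * 6 for _ in range(6)]
    for i in range(6):
        D[i][(i + 1) % 6] = D[(i + 1) % 6][i] = F(1)
        D[i][(i + 2) % 6] = D[(i + 2) % 6][i] = F(8, 3)
    for i, d in ((0, x), (1, y), (2, z)):  # the three long diagonals
        D[i][i + 3] = D[i + 3][i] = F(d)
    return D

def schoenberg(D):
    """5 x 5 matrix with entries D[0][i] + D[0][j] - D[i][j] (assumed convention)."""
    return [[D[0][i] + D[0][j] - D[i][j] for j in range(1, 6)] for i in range(1, 6)]

def rank(M):
    """Rank by exact Gaussian elimination over the rationals."""
    A = [row[:] for row in M]
    r = 0
    for c in range(len(A[0])):
        pivot = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, len(A)):
            factor = A[i][c] / A[r][c]
            A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

chair = schoenberg(cyclohexane_distances(F(11, 3), F(11, 3), F(11, 3)))
print(rank(chair))
```

Because the chair is an honest configuration of six non-coplanar points in space, the matrix is a Gram matrix of vectors spanning $\mathbb{R}^3$, so the rank comes out as 3.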
The cyclohexatope $\mathrm{Cyc}_6$ is the spectrahedron in $\mathbb{R}^3$ defined by $C_6(x,y,z) \succeq 0$. Its
algebraic boundary decomposes as $\partial_a \mathrm{Cyc}_6 = V(f) \cup V(g)$, where
f
g
=
=
\[
\begin{pmatrix}
x & z+1 & x+y+z \\
z+1 & y & x-y \\
x+y+z & x-y & 1-x-y
\end{pmatrix} \succeq 0.
\]
Determine the values $x^*$, $y^*$, and $z^*$ for the optimal matrix as floating point numbers.
Make sure that you have at least twenty accurate digits. If this is possible, write
$x^*$, $y^*$, and $z^*$ in terms of radicals over $\mathbb{Q}$.
5.6
Projected Spectrahedra
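\[
P = \Bigl\{ x \in \mathbb{R}^m \;\Bigm|\; \exists\, y \in \mathbb{R}^p \text{ with } C + \sum_{i=1}^{m} x_i A_i + \sum_{j=1}^{p} y_j B_j \succeq 0 \Bigr\}.
\]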
An expression for the dual body $P^{\Delta}$ is obtained by the following variant of the
construction in Remark 5.43. We consider the same linear map as before:
\[ \pi : \mathcal{S}^n_+ \to \mathbb{R}^m, \quad U \mapsto (\langle A_1, U\rangle, \dots, \langle A_m, U\rangle). \]
(5.52)
By Hilbert's theorem [25, Theorem 1.2.6], this inclusion of convex cones is strict
unless $(n, 2d)$ equals $(1, 2d)$ or $(n, 2)$ or $(2, 4)$. The sos cone is easily seen to be a
projected spectrahedron. Indeed, consider an unknown symmetric matrix $Q \in \mathcal{S}^N$
and write $p = v^T Q v$, where $v$ is the vector of all $N$ monomials of degree $d$. The
matrix $Q$ is positive semidefinite if it has a Cholesky factorization $Q = C^T C$. The
resulting identity $p = (Cv)^T (Cv)$ can be rewritten as (5.52). Hence the sos cone is
the image of $\mathcal{S}^N_+$ under the linear map $Q \mapsto v^T Q v$.
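The map $Q \mapsto v^T Q v$ is easy to make concrete. The following sketch works in the simplest setting, a univariate polynomial with $v = (1, x, x^2)$, which is an assumption made here for brevity; the helper names are illustrative.

```python
def gram_from_factor(C):
    """Q = C^T C, where each row of C holds the coefficients of one square q_k."""
    n = len(C[0])
    return [[sum(row[i] * row[j] for row in C) for j in range(n)] for i in range(n)]

def gram_to_coeffs(Q):
    """Coefficients (constant term first) of p(x) = v^T Q v for v = (1, x, ..., x^d)."""
    n = len(Q)
    coeffs = [0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            coeffs[i + j] += Q[i][j]  # the entry Q[i][j] multiplies x^(i+j)
    return coeffs

# Two squares: q_1 = 1 + x^2 and q_2 = x.
C = [[1, 0, 1],
     [0, 1, 0]]
Q = gram_from_factor(C)
print(gram_to_coeffs(Q))
```

Here $p = (1 + x^2)^2 + x^2 = 1 + 3x^2 + x^4$, and the printed coefficient list is `[1, 0, 3, 0, 1]`. Different matrices $Q$ can map to the same $p$, which is why the sos cone is a projection rather than a linear section.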
The boundaries of our two cones and their duals have been described in detail
already in Chapter 4, and here we want only to briefly make some connections to
our previous discussion about dualities. In the work of Nie [26] the structure of
these boundaries was approached by computations with discriminants, encountered
at the end of Section 5.2.4.
Proposition 5.58 (Theorem 4.1 in [26]). The algebraic boundary of the cone
of homogeneous polynomials p of degree 2d that are nonnegative on Rn is given
\[
A = \begin{pmatrix}
4 & 0 & 3 & 0 & 0 & 2 & 0 \\
0 & 4 & 0 & 2 & 3 & 0 & 0 \\
0 & 0 & 1 & 2 & 1 & 2 & 4
\end{pmatrix}. \qquad (5.53)
\]
This $A$-discriminant is an irreducible homogeneous polynomial of degree 24 in the
seven coefficients. What we are interested in here is the specialized discriminant
which is obtained from $\Delta_A$ by substituting the vector of coefficients $(1, 1, a, a, b, b, a+b)$
corresponding to our polynomial $f_{a,b}$. The specialized discriminant is an inhomogeneous
polynomial of degree 24 in the two unknowns $a$ and $b$, and it is no longer
irreducible. A computation reveals that it is the product of four irreducible factors
whose degrees are 1, 5, 5, and 13.
The linear factor equals $a + b$. The two factors of degree 5 are
\[ 256a^2 - 27a^5 + 512ab + 144a^3b - 27a^4b + 256b^2 - 128ab^2 + 144a^2b^2 - 128b^3 - 4a^2b^3 + 16b^4, \]
\[ 256a^2 - 128a^3 + 16a^4 + 512ab - 128a^2b + 256b^2 + 144a^2b^2 - 4a^3b^2 + 144ab^3 - 27ab^4 - 27b^5. \]
\[
\begin{aligned}
&+ 663552a^9 + 2949120a^8b + 10539008a^7b^2 + 17727488a^6b^3 + 9981952a^5b^4 \\
&+ 9981952a^4b^5 + 17727488a^3b^6 + 10539008a^2b^7 + 2949120ab^8 + 663552b^9 \\
&- 2719744a^8 - 8847360a^7b - 14974976a^6b^2 - 36503552a^5b^3 - 56360960a^4b^4 \\
&- 36503552a^3b^5 - 14974976a^2b^6 - 8847360ab^7 - 2719744b^8 + 4587520a^7 \\
&+ 25821184a^6b + 52035584a^5b^2 + 50724864a^4b^3 + 50724864a^3b^4 + 52035584a^2b^5 \\
&+ 25821184ab^6 + 4587520b^7 - 6291456a^6 - 31457280a^5b - 94371840a^4b^2 \\
&- 138412032a^3b^3 - 94371840a^2b^4 - 31457280ab^5 - 6291456b^6 + 16777216a^5 \\
&+ 50331648a^4b + 67108864a^3b^2 + 67108864a^2b^3 + 50331648ab^4 + 16777216b^5 \\
&- 16777216a^4 - 67108864a^3b - 100663296a^2b^2 - 67108864ab^3 - 16777216b^4.
\end{aligned}
\]
The relevant pieces of these four curves in the (a, b)-plane are depicted in Figure 5.9.
The line a + b = 0 is seen in the lower left, the degree 13 curve is the swallowtail
in the upper right, and the two quintic curves form the upper-left and lower-right
boundaries of the enclosed convex region C.
Figure 5.9. The discriminant in Example 5.59 defines a curve in the $(a, b)$-plane.
The projected spectrahedron $C$ is the set of points where the ternary quartic
$f_{a,b}$ is sos. The ranks of the corresponding sos matrices $Q$ are indicated
(labels in the figure: rank 3, rank 4, rank 5, rank 5, rank 6).
For each $(a, b) \in C$, the ternary quartic $f_{a,b}$ has an sos representation
\[ f_{a,b}(x, y, z) = (x^2, xy, y^2, xz, yz, z^2) \cdot Q \cdot (x^2, xy, y^2, xz, yz, z^2)^T, \qquad (5.54) \]
then the fiber consists of a single point. The ranks of these unique matrices are
indicated in Figure 5.9. Notice that $C$ has three singular points, at which the rank
drops from 5 to 4 and 3, respectively.
We now turn our attention to the question of approximating the convex hull
of a variety by a nested family of projected spectrahedra. Let $I$ be an ideal in
$\mathbb{R}[x_1, \dots, x_n]$ and $V_{\mathbb{R}}(I)$ the variety it defines in $\mathbb{R}^n$. Consider the set of affine-linear
polynomials that are nonnegative on $V_{\mathbb{R}}(I)$:
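\[ P_1(I) = \{ f \in \mathbb{R}[x_1, \dots, x_n]_{\le 1} \mid f(p) \ge 0 \text{ for all } p \in V_{\mathbb{R}}(I) \}. \]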
In light of the biduality theorem for convex sets (cf. Section 5.2.2), we can characterize the (closure of) the convex hull of our variety as follows:
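\[ \overline{\operatorname{conv}(V_{\mathbb{R}}(I))} = \{ x \in \mathbb{R}^n \mid f(x) \ge 0 \text{ for all } f \in P_1(I) \}. \qquad (5.55) \]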
We now dualize the situation by considering the subsets of $\mathbb{R}^n$ where the various $f$
are nonnegative. The $d$th theta body of the ideal $I$ is the set
\[ \mathrm{TH}_d(I) = \{ x \in \mathbb{R}^n \mid f(x) \ge 0 \text{ for all } f \in \Sigma_d^1(I) \}. \]
(5.56)
This chain of outer approximations can fail to converge in general, but there are
various convergence results when the geometry is nice. For instance, if the real
variety $V_{\mathbb{R}}(I)$ is compact then Schmüdgen's Positivstellensatz [35, Section 3] ensures
asymptotic convergence. When $V_{\mathbb{R}}(I)$ is a finite set, so that $\operatorname{conv}(V_{\mathbb{R}}(I))$ is a polytope,
then finite convergence follows from [19], that is, $\exists\, d : \mathrm{TH}_d(I) = \operatorname{conv}(V_{\mathbb{R}}(I))$.
More information on theta bodies and related constructions is given in Chapter 7.
The main point we wish to record here is the following:
Theorem 5.60 ([16, 22]). Each theta body THd (I) is a projected spectrahedron.
Proof. We may assume, without loss of generality, that the origin 0 lies in the
interior of $\operatorname{conv}(V_{\mathbb{R}}(I))$. Then $\Sigma_d^1(I)$ is the cone over the convex set dual to $\mathrm{TH}_d(I)$.
Since the class of projected spectrahedra is closed under duality, and under intersection
with affine hyperplanes, it suffices to show that $\Sigma_d^1(I)$ is a projected
spectrahedron. But this follows from the formula $f - q_1^2 - \cdots - q_r^2 \in I$ by an
argument similar to that given after (5.52).
In this chapter we have seen two rather different representations of the convex
hull of a real variety, namely, the characterization of the algebraic boundary
in Section 5.4, and the representation as a theta body suggested above. The
relationship between these two is not yet well understood. A specific question is how
to efficiently compute the algebraic boundary of a projected spectrahedron. This
leads to problems in elimination theory that seem to be particularly challenging for
current computer algebra systems.
We conclude by revisiting one of the examples we had seen in Section 5.4.
Example 5.61 (Example 5.37 continued). We revisit the curve $X = V(h_1, h_2)$
with
\[ h_1 = x^2 + y^2 + z^2 - 1, \qquad h_2 = 19x^2 + 21y^2 + 22z^2 - 20. \]
Scheiderer [35] proved that finite convergence holds in (5.56) whenever $I$ defines
a curve of genus 1, such as $X$. We will show that $d = 1$ suffices in our example;
i.e., we will show that $\mathrm{TH}_1(I) = \operatorname{conv}(X)$ for the ideal $I = \langle h_1, h_2 \rangle$.
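We are interested in affine linear forms $f$ that admit a representation
\[ f = 1 + ux + vy + wz = \mu_1 h_1 + \mu_2 h_2 + \sum_i q_i^2. \qquad (5.57) \]
\[
\Sigma_1^1(I) = \Bigl\{ (u, v, w) \in \mathbb{R}^3 \;\Bigm|\; \exists\, \mu_1, \mu_2 :
\begin{pmatrix}
1 + \mu_1 + 20\mu_2 & u & v & w \\
u & -\mu_1 - 19\mu_2 & 0 & 0 \\
v & 0 & -\mu_1 - 21\mu_2 & 0 \\
w & 0 & 0 & -\mu_1 - 22\mu_2
\end{pmatrix} \succeq 0 \Bigr\}.
\]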
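Dual to this is the theta body $\mathrm{TH}_1(I) = \Sigma_1^1(I)^{\Delta}$. It has the representation
\[
\mathrm{TH}_1(I) = \Bigl\{ (x, y, z) \in \mathbb{R}^3 \;\Bigm|\; \exists\, u_1, u_2, u_3, u_4 :
\begin{pmatrix}
1 & x & y & z \\
x & \tfrac{2}{3} - \tfrac{1}{3}u_4 & u_1 & u_2 \\
y & u_1 & u_4 & u_3 \\
z & u_2 & u_3 & \tfrac{1}{3} - \tfrac{2}{3}u_4
\end{pmatrix} \succeq 0 \Bigr\}.
\]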
To show that $\mathrm{TH}_1(I) = \operatorname{conv}(X)$, we use the general approach outlined in Remark
5.62 below. We consider the ideal generated by this $4 \times 4$ determinant and its derivatives
with respect to $u_1, u_2, u_3, u_4$, we saturate by the ideal of $3 \times 3$ minors, and then
we eliminate $u_1, u_2, u_3, u_4$. The result is the principal ideal $\langle h_4 h_5 h_6 \rangle$, with $h_i$ as in
Example 5.37. This computation reveals that the algebraic boundary of $\operatorname{conv}(X)$
consists of quadrics, and we can conclude that $\mathrm{TH}_1(I) = \operatorname{conv}(X)$.
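In a first-order moment matrix for this ideal, the diagonal entries for $x^2$ and $z^2$ can be expressed through the entry for $y^2$ by solving $h_1 = h_2 = 0$, which gives $x^2 = \tfrac{2}{3} - \tfrac{1}{3}y^2$ and $z^2 = \tfrac{1}{3} - \tfrac{2}{3}y^2$. The sketch below checks these two relations numerically on points of the curve; choosing $y^2$ as the free moment is an assumption of this illustration, and the function names are hypothetical.

```python
import math

def h1(x, y, z):
    return x * x + y * y + z * z - 1

def h2(x, y, z):
    return 19 * x * x + 21 * y * y + 22 * z * z - 20

def curve_point(y, sign_x=1, sign_z=1):
    """Point of X = V(h1, h2) with prescribed y, using the reduced relations."""
    x_sq = 2 / 3 - y * y / 3
    z_sq = 1 / 3 - 2 * y * y / 3
    if z_sq < 0:
        raise ValueError("no real point of the curve has this y-coordinate")
    return sign_x * math.sqrt(x_sq), y, sign_z * math.sqrt(z_sq)

for y in (0.0, 0.3, 0.6):
    p = curve_point(y)
    print(y, h1(*p), h2(*p))  # both residuals should be at rounding level
```

Every real point of the curve has $y^2 \le 1/2$, which is why larger values of $y$ are rejected.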
Figure 5.11. Convex hull of the curve in Figure 5.7 and its dual convex body.
Pictures of our convex body and its dual are shown in Figure 5.11. Diagrams
such as these can be drawn fairly easily for any projected spectrahedron in $\mathbb{R}^3$. To
be precise, the matrix representation of $\mathrm{TH}_1(I)$ and $\Sigma_1^1(I)$ given above can be
used to rapidly sample the boundaries of these convex bodies, by maximizing many
linear functions via SDP.
Remark 5.62. It would be desirable to develop a practical algorithm for computing
the algebraic boundary of a projected spectrahedron. After a linear change of
coordinates, we may assume that the given spectrahedron is represented by a symmetric
matrix whose entries are linear forms in some unknowns, and our task is to
eliminate a subset of these unknowns. To do this, we consider the ideal generated by
the determinant and its partial derivatives with respect to the unknowns to be eliminated.
The variety of this ideal contains the ramification locus of the projection,
but it also contains the singular locus of the determinantal hypersurface. The main
difficulty in the computation is that we need to remove that singular locus before we
eliminate the unknowns. Frequently, as in the previous example, the singular locus
is given by the vanishing of the comaximal minors. However, this need not always
be the case. A concrete example is discussed below in Example 5.63. Thus, one
issue is how to best represent the singular locus of the algebraic boundary of a
spectrahedron, in order to perform the saturation step. Once we have the correct ideal
for the ramification locus, then we can compute the branch locus by elimination,
and the result will be the desired hypersurface.
Example 5.63. Consider the surface in 3-space defined by
x
y + z
det
x
y
y+z
1
y
1
x
y
z
x
y
1
= 0.
x
1
Its singular locus is the line $x - y = z = 0$. This does not coincide, in this example,
with the variety defined by the vanishing of the (comaximal) $3 \times 3$ minors, which
consists only of the two points $(0, 0, 0)$ and $(1, 1, 0)$.
Exercises
Exercise 5.64. Find an explicit symmetric $6 \times 6$ matrix $Q$, with entries that are
linear in $a$ and $b$, that satisfies the identity (5.54). Is your matrix $Q$ unique?
Exercise 5.65. The polynomial $p(x) = 1 + x + x^2 + x^3 + x^4 + x^5 + x^6$ is nonnegative
on the real line. What is its minimum value? Write $p(x)$ as a sum of squares. The set
of all sums of squares representations of $p(x)$ is a three-dimensional spectrahedron.
Draw a picture of this spectrahedron. Determine all possible representations of $p(x)$
as a sum of two squares.
Exercise 5.66. Let $C$ denote the convex set of all points $(u, v) \in \mathbb{R}^2$ such that
$f_{u,v}(x) = x^4 + ux^2 + vx + 1$ is a sum of squares. Draw a picture of $C$, express $C$
as a projected spectrahedron, and compute a polynomial $g(u, v)$ that vanishes on
the boundary of $C$.
Exercise 5.67. Let $I = \langle h_1 \rangle$, where $h_1 = (x_1^2 - 1)(x_1 - 1)^2 + (x_2^2 - 1)^2$ is the bicuspid
curve in Example 5.25. Compute and draw the second theta body $\mathrm{TH}_2(I)$.
Exercise 5.68. The $A$-discriminant $\Delta_A$ of the $3 \times 7$ matrix in (5.53) is a homogeneous
polynomial of degree 24 in seven indeterminates. Can you compute $\Delta_A$
explicitly? How many monomials appear in the expansion of $\Delta_A$?
Notes. This chapter grew out of the notes for three lectures given by Bernd Sturmfels
on March 22–24, 2010, at the spring school on Linear Matrix Inequalities and
Polynomial Optimization (LMIPO) at UC San Diego. Later that spring, Bernd
Sturmfels lectured on convex algebraic geometry at the Università de Roma 3. This
led to the publication of a first version of the material in this chapter under the
title "Dualities in Convex Algebraic Geometry" in Rendiconti di Matematica, Serie
VII, 30:285–327, 2010.
Bibliography
[1] A. I. Barvinok. A Course in Convexity, Grad. Stud. in Math. 54. American
Mathematical Society, Providence, RI, 2002.
[2] S. Basu, R. Pollack, and M.-F. Roy. Algorithms in Real Algebraic Geometry.
Springer, Berlin, 2006.
[3] D. Bates, J. Hauenstein, A. Sommese, and C. Wampler. Bertini: Software for
Numerical Algebraic Geometry. Available at http://www.nd.edu/~sommese/bertini.
[4] J. Bochnak, M. Coste, and M.-F. Roy. Géométrie Algébrique Réelle, Ergebn.
Math. Grenzgeb. 12. Springer, Berlin, 1987.
[5] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University
Press, Cambridge, UK, 2004.
[6] S. Boyd and L. Vandenberghe. Semidefinite programming. SIAM Rev., 38:49–95, 1996.
[7] F. Catanese, S. Hoşten, A. Khetan, and B. Sturmfels. The maximum likelihood
degree. Amer. J. Math., 128:671–697, 2006.
[8] D. Cox, J. Little, and D. O'Shea. Ideals, Varieties and Algorithms, 3rd edition,
Undergrad. Texts Math. Springer, New York, 2007.
[9] D. Cox, J. Little, and D. O'Shea. Using Algebraic Geometry, 2nd edition, Grad.
Texts in Math. Springer, New York, 2005.
[10] R. Datta. Universality of Nash equilibria. Math. Oper. Res., 28:424–432, 2003.
[11] I. Gelfand, M. Kapranov, and A. Zelevinsky. Discriminants, Resultants and
Multidimensional Determinants. Birkhäuser, Boston, 1994.
[12] H.-C. Graf von Bothmer and K. Ranestad. A general formula for the algebraic
degree in semidefinite programming. Bull. Lond. Math. Soc., 41:193–197, 2009.
[30] M. Ramana and A. J. Goldman. Some geometric results in semidefinite programming.
J. Global Optim., 7:33–50, 1995.
[31] K. Ranestad and B. Sturmfels. On the convex hull of a space curve. Adv. Geom.,
12:157–178, 2012.
[32] K. Ranestad and B. Sturmfels. The convex hull of a variety. In P. Brändén,
M. Passare, and M. Putinar, editors, Notions of Positivity and the Geometry
of Polynomials. Trends Math. Springer-Verlag, Basel, 2011, pp. 331–344.
[33] R. T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, NJ,
1970.
[34] P. Rostalski. Bermeja, Software for Convex Algebraic Geometry. Available at
http://math.berkeley.edu/~philipp/cagwiki.
[35] C. Scheiderer. Convex hulls of curves of genus one. Adv. Math., 228:2606–2622,
2011.
[36] M. Trott. Applying GroebnerBasis to three problems in geometry. Mathematica
in Education and Research, 6:15–28, 1997.
[37] G. Ziegler. Lectures on Polytopes. Grad. Texts in Math. Springer, New York,
1995.
Chapter 6
Semidefinite Representability
Jiawang Nie
6.1
Introduction
6.2
Spectrahedra
(6.1)
Here, each $A_i$ is a constant symmetric matrix, and if the origin is in the interior of
$S$, then $A_0$ can be chosen to be positive definite. Furthermore, if $A_0 \succ 0$, we can
apply a congruence transformation to the matrices $A_1, \dots, A_n$ and make $A_0 = I$.
For instance, if $A_0 = BB^T$ with $B$ nonsingular, then $S$ can be described by
\[ I + x_1 B^{-1} A_1 B^{-T} + \cdots + x_n B^{-1} A_n B^{-T} \succeq 0. \]
When $A_0 = I$, the linear matrix inequality in (6.1) is said to be monic and the origin
is in the interior of $S$. Conversely, if $S$ defined by (6.1) has nonempty interior, we
may assume $A_0$ is positive definite by translating an interior point to the origin. The
expression $A_0 + x_1 A_1 + \cdots + x_n A_n$ is called a symmetric linear matrix polynomial
or a linear pencil.
6.2.1
Examples of Spectrahedra
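for a symmetric positive definite matrix $E \in \mathcal{S}^n_{++}$ and a vector $c \in \mathbb{R}^n$. The
vector $c$ is called the center of $\mathcal{E}$, and $E$ is called the shape matrix of $\mathcal{E}$. An
ellipsoid $\mathcal{E}$ is a spectrahedron because a point $x$ is in $\mathcal{E}$ if and only if it satisfies
the linear matrix inequality
\[
\begin{pmatrix} E & x - c \\ (x - c)^T & 1 \end{pmatrix}
= \begin{pmatrix} E & -c \\ -c^T & 1 \end{pmatrix}
+ \sum_{i=1}^{n} x_i \begin{pmatrix} 0 & e_i \\ e_i^T & 0 \end{pmatrix} \succeq 0.
\]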
We can use the Schur complement to verify that the above linear matrix inequality
describes $\mathcal{E}$. Ellipsoids have wide applications in optimization [3, 7, 8, 32].
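The Schur complement argument can be checked by hand in the plane. The sketch below assumes the usual convention that the ellipsoid with shape matrix $E$ and center $c$ is $\{x : (x - c)^T E^{-1} (x - c) \le 1\}$, and it verifies the determinant identity $\det M(x) = \det(E)\,\bigl(1 - (x - c)^T E^{-1} (x - c)\bigr)$ for the bordered $3 \times 3$ matrix $M(x)$. Function names are illustrative.

```python
from fractions import Fraction as F

def bordered_matrix(E, c, x):
    """The 3 x 3 matrix [[E, x - c], [(x - c)^T, 1]] for a 2 x 2 shape matrix E."""
    d = [xi - ci for xi, ci in zip(x, c)]
    return [[E[0][0], E[0][1], d[0]],
            [E[1][0], E[1][1], d[1]],
            [d[0], d[1], 1]]

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def quadratic_form(E, c, x):
    """(x - c)^T E^{-1} (x - c), using the adjugate formula for a 2 x 2 inverse."""
    d0, d1 = x[0] - c[0], x[1] - c[1]
    det_E = E[0][0] * E[1][1] - E[0][1] * E[1][0]
    return (E[1][1] * d0 * d0 - 2 * E[0][1] * d0 * d1 + E[0][0] * d1 * d1) / det_E

E = [[F(2), F(1)], [F(1), F(2)]]
c = [F(1), F(0)]
x = [F(3, 2), F(1, 2)]
print(det3(bordered_matrix(E, c, x)), quadratic_form(E, c, x))
```

Since $E \succ 0$, the bordered matrix is positive semidefinite exactly when this determinant is nonnegative, which is the inequality $(x - c)^T E^{-1} (x - c) \le 1$.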
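Second order cones. The set $\{(x, t) \in \mathbb{R}^n \times \mathbb{R}_+ : \|x\|_2 \le t\}$ is called the
second order cone (also Lorentz cone or ice cream cone). We have already
seen this cone in Chapter 2. It is a spectrahedron, because it is defined by
the linear matrix inequality
\[
\begin{pmatrix} tI_n & x \\ x^T & t \end{pmatrix}
= \sum_{i=1}^{n} x_i \begin{pmatrix} 0 & e_i \\ e_i^T & 0 \end{pmatrix}
+ t \begin{pmatrix} I_n & 0 \\ 0 & 1 \end{pmatrix} \succeq 0.
\]
Second order cones also have wide applications in optimization (cf. [2]).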
Convex quadratic sets. More general convex sets than ellipsoids and second order cones are dened by quadratic inequalities. Let Q := {x Rn :
q(x) 0} be a nonempty set, with
q(x) := xT Bx + bT x + c
being a quadratic function. Here B is a symmetric matrix. It is interesting
to note that the set Q is convex if and only if it is a spectrahedron. We leave
this as an exercise to the readers.
Matrices with bounded eigenvalues or singular values. Denote by
$\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$, respectively, the minimum and maximum eigenvalues of
a symmetric matrix. Let $X \in \mathbb{R}^{n \times n}$. If $X$ is symmetric, then $\lambda_{\max}(X) \le t$ if
and only if
\[ tI - X \succeq 0 \]
and $\lambda_{\min}(X) \ge t$ if and only if
\[ X - tI \succeq 0. \]
If $X$ is not symmetric, then its maximum singular value $\sigma_{\max}(X) \le t$ if and
only if
\[ \begin{pmatrix} tI & X \\ X^T & tI \end{pmatrix} \succeq 0. \]
These linear matrix inequalities all define spectrahedra in the space of $(X, t)$.
m
A
+
t
A
0
whenever
0
k k
n
k=1
(x, d) R R
,
C =
for k = 1, 2, . . . , m
where every $A_k$ is a constant symmetric matrix. The set $C$ is a spectrahedron
(cf. [30]), because there exists a symmetric linear matrix polynomial $L(x, d)$
in $(x, d)$ such that
\[ C = \{ (x, d) \in \mathbb{R}^n \times \mathbb{R} : L(x, d) \succeq 0 \}. \]
The construction of $L(x, d)$ is given in [30].
A special case of matrix cubes is the $k$-ellipse, which consists of all points
in the plane that have a constant sum of distances to a set of given foci (cf.
[31]). We have already encountered the $k$-ellipse in Section 2.1.3. For instance,
the 3-ellipse with foci $(0, 0)$, $(1, 0)$, $(0, 1)$ and radius $d = 5$ is defined by the
equation
\[ \sqrt{x_1^2 + x_2^2} + \sqrt{(x_1 - 1)^2 + x_2^2} + \sqrt{x_1^2 + (x_2 - 1)^2} = 5. \]
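The region surrounded by this 3-ellipse is convex and can be described by the
linear matrix inequality:
\[
\begin{pmatrix}
6 - 3x_1 & x_2 & x_2 - 1 & 0 & x_2 & 0 & 0 & 0 \\
x_2 & 6 - x_1 & 0 & x_2 - 1 & 0 & x_2 & 0 & 0 \\
x_2 - 1 & 0 & 6 - x_1 & x_2 & 0 & 0 & x_2 & 0 \\
0 & x_2 - 1 & x_2 & 6 + x_1 & 0 & 0 & 0 & x_2 \\
x_2 & 0 & 0 & 0 & 4 - x_1 & x_2 & x_2 - 1 & 0 \\
0 & x_2 & 0 & 0 & x_2 & 4 + x_1 & 0 & x_2 - 1 \\
0 & 0 & x_2 & 0 & x_2 - 1 & 0 & 4 + x_1 & x_2 \\
0 & 0 & 0 & x_2 & 0 & x_2 - 1 & x_2 & 4 + 3x_1
\end{pmatrix} \succeq 0.
\]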
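A pencil of this shape can be assembled as a Kronecker sum of three $2 \times 2$ blocks, one per focus, and its eigenvalues are then $5 \pm r_1 \pm r_2 \pm r_3$, where $r_k$ is the distance from $x$ to the $k$th focus. The sketch below builds the $8 \times 8$ matrix in plain Python and compares its determinant with the product of these eight numbers. The block form $\begin{pmatrix} -(x_1 - u) & x_2 - v \\ x_2 - v & x_1 - u \end{pmatrix}$ and the ordering of the foci are assumptions chosen to reproduce the diagonal pattern $6 - 3x_1, \dots, 4 + 3x_1$; all function names are illustrative.

```python
import math
from itertools import product

I2 = [[1, 0], [0, 1]]

def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def block(x1, x2, focus):
    u, v = focus
    return [[-(x1 - u), x2 - v], [x2 - v, x1 - u]]  # eigenvalues are +r and -r

def three_ellipse_pencil(x1, x2, d=5, foci=((1, 0), (0, 1), (0, 0))):
    A, B, C = (block(x1, x2, f) for f in foci)
    terms = [kron(kron(A, I2), I2), kron(kron(I2, B), I2), kron(kron(I2, I2), C)]
    return [[(d if i == j else 0) + sum(T[i][j] for T in terms) for j in range(8)]
            for i in range(8)]

def det(M):
    """Determinant by Gaussian elimination with partial pivoting (floating point)."""
    A = [list(map(float, row)) for row in M]
    n, result = len(A), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(A[i][c]))
        if A[p][c] == 0.0:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]
            result = -result
        result *= A[c][c]
        for i in range(c + 1, n):
            f = A[i][c] / A[c][c]
            for j in range(c, n):
                A[i][j] -= f * A[c][j]
    return result

def eigenvalue_product(x1, x2, d=5, foci=((1, 0), (0, 1), (0, 0))):
    r = [math.hypot(x1 - u, x2 - v) for u, v in foci]
    return math.prod(d + s[0] * r[0] + s[1] * r[1] + s[2] * r[2]
                     for s in product((1, -1), repeat=3))

M = three_ellipse_pencil(0.3, 0.4)
print(det(M), eigenvalue_product(0.3, 0.4))
```

The smallest eigenvalue is $5 - (r_1 + r_2 + r_3)$, so the matrix is positive semidefinite exactly when the sum of the three distances is at most 5, and the determinant vanishes on the 3-ellipse.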
6.2.2
Let $S$ be a spectrahedron defined as in (6.1), and let $p_I(x)$ denote the principal minor
of the linear pencil
\[ A(x) := A_0 + A_1 x_1 + \cdots + A_n x_n \]
whose rows and columns are indexed by a nonempty set $I \subseteq \{1, 2, \dots, m\}$, where
$m$ is the size of the matrices $A_i$. Then a point $x$ lies in $S$ if and only if all the principal
minors are nonnegative at $x$:
\[ p_I(x) \ge 0 \quad \text{for all nonempty } I \subseteq \{1, 2, \dots, m\}. \]
Therefore, $S$ is a basic closed semialgebraic set (defined by finitely many weak
polynomial inequalities). The boundary of $S$ lies on the determinantal hypersurface
\[ \det A(x) = 0. \]
If $A_0 \succ 0$ (the origin is in the interior of $S$), then $S$ is the closure of the connected
component of the set
\[ \{x : \det A(x) > 0\} \]
containing the origin.
The above observation leads to the definition of algebraic interior, which was
introduced by Helton and Vinnikov [17]. A subset $T$ of $\mathbb{R}^n$ is an algebraic interior
if it equals the closure of a connected component of the set $\{x : p(x) > 0\}$ for
some polynomial $p$. The polynomial $p$ is called a defining polynomial of $T$. The
defining polynomial of an algebraic interior is not unique. However, the one of
the smallest degree is unique up to a positive constant factor, and divides all the
defining polynomials of $T$. Its degree is called the degree of $T$.
Example 6.1. Consider the spectrahedron defined by
\[
\begin{pmatrix} 1 & x_1 & x_2 \\ x_1 & 1 & x_3 \\ x_2 & x_3 & 1 \end{pmatrix} \succeq 0.
\]
\[ p_{\{1,2\}} = 1 - x_1^2 \ge 0, \qquad p_{\{1,3\}} = 1 - x_2^2 \ge 0, \qquad p_{\{2,3\}} = 1 - x_3^2 \ge 0. \]
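The principal-minor test for membership in a spectrahedron can be carried out mechanically. The following sketch enumerates all principal minors of the $3 \times 3$ pencil with unit diagonal and off-diagonal entries $x_1, x_2, x_3$, using exact rational arithmetic; the helper names are illustrative.

```python
from fractions import Fraction as F
from itertools import combinations

def det(M):
    """Determinant by cofactor expansion along the first row (fine for small M)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def pencil(x1, x2, x3):
    return [[1, x1, x2], [x1, 1, x3], [x2, x3, 1]]

def principal_minors(M):
    """Map each nonempty index set I (1-based, as a tuple) to the minor p_I."""
    n = len(M)
    return {tuple(i + 1 for i in I): det([[M[i][j] for j in I] for i in I])
            for k in range(1, n + 1) for I in combinations(range(n), k)}

def in_spectrahedron(x1, x2, x3):
    return all(p >= 0 for p in principal_minors(pencil(x1, x2, x3)).values())

half = F(1, 2)
print(principal_minors(pencil(half, half, half)))
print(in_spectrahedron(half, half, half), in_spectrahedron(1, 1, -1))
```

At $(1, 1, -1)$ every $2 \times 2$ principal minor vanishes, but the determinant equals $-4$, so the point lies outside. This shows that the $2 \times 2$ minors alone do not describe the set.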
6.2.3
Rigid Convexity
Since W is symmetric, the equation p(x(t)) = 0 has only real roots. This is an
important property satised by spectrahedra.
A polynomial $p \in \mathbb{R}[x]$ is called real zero with respect to a point $u$ with
$p(u) > 0$ if for every $0 \ne w \in \mathbb{R}^n$ the univariate polynomial $p(u + tw) \in \mathbb{R}[t]$ has
only real zeros. If $u = 0$, we simply say that $p$ is real zero. Real zero polynomials are
nonhomogeneous versions of hyperbolic polynomials. A homogeneous polynomial
$h(x)$ is hyperbolic with respect to a direction $u \in \mathbb{R}^n$ with $h(u) > 0$ if for every
$0 \ne w \in \mathbb{R}^n$ the univariate polynomial $h(u + tw) \in \mathbb{R}[t]$ has only real zeros. If a
form $h(x)$ is hyperbolic with respect to $u = (1, u_2, \dots, u_n)$, then the dehomogenized
polynomial $h(1, x_2, \dots, x_n)$ is real zero with respect to $(u_2, \dots, u_n)$.
Example 6.2. (i) The cubic polynomial from Example 6.1,
\[ 2x_1 x_2 x_3 - x_1^2 - x_2^2 - x_3^2 + 1, \]
is real zero, because it is the determinant of a monic linear pencil.
(ii) The polynomial $p(x) = 1 - (x_1^4 + x_2^4)$ is not real zero [17]. For every
$0 \ne (w_1, w_2) \in \mathbb{R}^2$, the univariate polynomial in $t$
\[ p(tw) = \bigl(1 - t^2 (w_1^4 + w_2^4)^{1/2}\bigr)\bigl(1 + t^2 (w_1^4 + w_2^4)^{1/2}\bigr) \]
has two nonreal zeros. The origin lies in the interior of $\{x : p(x) > 0\}$.
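The failure of the real zero property for $1 - (x_1^4 + x_2^4)$ can be seen by listing the four roots of $t \mapsto p(tw)$ explicitly: with $c = w_1^4 + w_2^4$ they are $c^{-1/4}$ times the fourth roots of unity. A minimal sketch, with an illustrative function name:

```python
def tv_screen_roots(w1, w2):
    """The four complex roots t of 1 - t^4 * (w1^4 + w2^4) = 0 for w != 0."""
    c = w1 ** 4 + w2 ** 4
    radius = c ** -0.25
    return [radius * 1j ** k for k in range(4)]  # radius * {1, i, -1, -i}

roots = tv_screen_roots(1.0, 2.0)
nonreal = [t for t in roots if abs(t.imag) > 1e-12]
print(roots)
print(len(nonreal))
```

Two of the roots are real and two are purely imaginary, for every direction $w \ne 0$.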
Figure 6.2. A line passing through $(0.5, 0)$ intersects the curve
$x_1^3 - 3x_2^2 x_1 - (x_1^2 + x_2^2)^2 = 0$ in only 2 real points.
6.2.4
Given a polynomial $p \in \mathbb{R}[x]$, we say it has a symmetric determinantal representation
if there exists a linear pencil
\[ L(x) := L_0 + x_1 L_1 + \cdots + x_n L_n \]
such that $p = \det L(x)$ and every $L_i$ is symmetric. If $L_0 \succ 0$, we say that $p$
admits a monic symmetric determinantal representation. An important result due
(6.2)
\[
1 + x_1^2 + x_2^2 = \det \begin{pmatrix} -1 & 0 & x_1 \\ 0 & -1 & x_2 \\ x_1 & x_2 & 1 \end{pmatrix}.
\]
This linear pencil is clearly not monic.
(ii) Consider the following bivariate quartic polynomial:
1 + x21 + x22 + 4x21 x2 4x1 x22 + x41 2x31 x2 2x1 x32 x21 x22 + x42 .
It is the determinant of the following linear
1 x1 x2
x1 1 x1
x2 x1 1
x2 x2 x1
x2
x2
.
x1
1
uses complexification of projective algebraic curves and the constructions are mostly
theoretical. Computational aspects of these constructions are discussed in [35].
When $S \subset \mathbb{R}^n$ ($n > 2$) is an algebraic interior that is rigidly convex, its minimum
degree defining polynomial $p$ might not admit a monic symmetric determinantal
representation. However, this does not exclude the possibility of a multiple
of $p$ having a monic symmetric determinantal representation. If this is true, then $S$
would be a spectrahedron. Indeed, Helton and Vinnikov [17] conjectured that every
rigidly convex algebraic interior of $\mathbb{R}^n$ is a spectrahedron.
6.2.5
Exercises
Represent the convex region surrounded by this 3-ellipse by a linear matrix inequality
in variables $x_1$ and $x_2$ only. What is the polynomial of smallest degree (up to a
constant factor) vanishing on this 3-ellipse?
Exercise 6.13. Suppose $S$ is a spectrahedron. Show that every face of $S$ is
exposed. (A face $F$ of $S$ is called exposed if either $F = S$ or there exists a supporting
hyperplane $H$ of $S$ such that $H \cap S = F$.)
6.3
Projected Spectrahedra
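\[
S = \Bigl\{ x \in \mathbb{R}^n \;\Bigm|\; A_0 + \sum_{i=1}^{n} x_i A_i + \sum_{j=1}^{k} y_j B_j \succeq 0 \text{ for some } y \in \mathbb{R}^k \Bigr\}. \qquad (6.4)
\]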
Projected spectrahedra are a much larger class of convex sets than spectrahedra,
with significantly greater modeling power. Unlike in the case of spectrahedra where
rigid convexity is a natural requirement, no nontrivial obstructions to being a projected
spectrahedron are known. In the remainder of this chapter we discuss representability
of convex sets as projected spectrahedra.
6.3.1
We now give several examples of projected spectrahedra, many of which are important in applications.
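The TV screen $\{(x_1, x_2) : 1 - x_1^4 - x_2^4 \ge 0\}$ of Example 6.2 is a projected
spectrahedron since it admits the semidefinite representation
\[
\operatorname{BlockDiag}\left(
\begin{pmatrix} 1 + y_1 & y_2 \\ y_2 & 1 - y_1 \end{pmatrix},
\begin{pmatrix} 1 & x_1 \\ x_1 & y_1 \end{pmatrix},
\begin{pmatrix} 1 & x_2 \\ x_2 & y_2 \end{pmatrix}
\right) \succeq 0.
\]
It has two lifting variables, and we have seen that the TV screen is not a
spectrahedron.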
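The three-dimensional hyperboloid $H = \{x \in \mathbb{R}^3_+ : x_1 x_2 x_3 \ge 1\}$ is a projected
spectrahedron, since it admits the semidefinite representation
\[
\operatorname{BlockDiag}\left(
\begin{pmatrix} x_1 & y_1 \\ y_1 & x_2 \end{pmatrix},
\begin{pmatrix} y_1 & 1 \\ 1 & y_2 \end{pmatrix},
\begin{pmatrix} x_3 & y_2 \\ y_2 & 1 \end{pmatrix}
\right) \succeq 0.
\]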
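A lifted representation of this kind can be tested pointwise. The sketch below assumes the three $2 \times 2$ blocks $\begin{pmatrix} x_1 & y_1 \\ y_1 & x_2 \end{pmatrix}$, $\begin{pmatrix} y_1 & 1 \\ 1 & y_2 \end{pmatrix}$, $\begin{pmatrix} x_3 & y_2 \\ y_2 & 1 \end{pmatrix}$, which encode $x_1 x_2 \ge y_1^2$, $y_1 y_2 \ge 1$, and $x_3 \ge y_2^2$, and it uses the candidate lifting $y_1 = \sqrt{x_1 x_2}$, $y_2 = \sqrt{x_3}$. The names and the tolerance are choices made for this illustration.

```python
import math

def is_psd_2x2(M, tol=1e-12):
    """A symmetric 2 x 2 matrix is PSD iff both diagonal entries and the determinant are >= 0."""
    (a, b), (_, c) = M
    return a >= -tol and c >= -tol and a * c - b * b >= -tol

def blocks(x, y):
    x1, x2, x3 = x
    y1, y2 = y
    return [[[x1, y1], [y1, x2]],
            [[y1, 1.0], [1.0, y2]],
            [[x3, y2], [y2, 1.0]]]

def candidate_lift(x):
    """Lifting variables that make the first and third blocks singular."""
    x1, x2, x3 = x
    return math.sqrt(x1 * x2), math.sqrt(x3)

def lift_certifies(x):
    return all(is_psd_2x2(B) for B in blocks(x, candidate_lift(x)))

print(lift_certifies((2.0, 2.0, 0.5)))  # x1*x2*x3 = 2
print(lift_certifies((1.0, 1.0, 0.5)))  # x1*x2*x3 = 0.5
```

With this choice of $y$, the middle block is positive semidefinite exactly when $\sqrt{x_1 x_2 x_3} \ge 1$. Conversely, any feasible $y$ gives $x_1 x_2 x_3 \ge y_1^2 y_2^2 \ge 1$.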
(6.5)
is a projected spectrahedron (cf. [3, Section 3.3]). As we will see below, the
sets $H(m, r)$ are useful in constructing semidefinite representations for convex
sets.
Sums of largest eigenvalues [3]. In optimization one often needs to minimize
the sum of the $k$ largest eigenvalues over an affine subspace of symmetric
matrices. This optimization problem is convex and can be formulated as a
semidefinite program. For $X \in \mathcal{S}^n$, let $\lambda_i(X)$ be the $i$th largest eigenvalue
of $X$. Define $s_k(X) := \lambda_1(X) + \cdots + \lambda_k(X)$ to be the sum of the $k$ largest
eigenvalues of $X$. Denote the set
\[ S_k^n := \{ (X, t) \in \mathcal{S}^n \times \mathbb{R} : s_k(X) \le t \}. \]
Note that $s_k(X) \le t$ if and only if there exists $(Z, \tau) \in \mathcal{S}^n \times \mathbb{R}$ such that [3,
Section 4.2]
\[ t - k\tau - \operatorname{Tr}(Z) \ge 0, \qquad Z \succeq 0, \qquad Z - X + \tau I_n \succeq 0. \qquad (6.6) \]
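For a diagonal matrix the system (6.6) reduces to scalar inequalities, which makes it easy to experiment with. The sketch below assumes the system reads $t - k\tau - \operatorname{Tr}(Z) \ge 0$, $Z \succeq 0$, $Z - X + \tau I_n \succeq 0$, and it tries the standard candidate $\tau = \lambda_k(X)$, $Z = \operatorname{diag}(\max(\lambda_i - \tau, 0))$; both the candidate and the function names are assumptions of this illustration.

```python
def candidate_certificate(eigenvalues, k):
    """Return (tau, diagonal of Z) for a diagonal X with the given eigenvalues."""
    lam = sorted(eigenvalues, reverse=True)
    tau = lam[k - 1]                       # the k-th largest eigenvalue
    z = [max(l - tau, 0) for l in lam]     # Z = diag(z) is PSD by construction
    return tau, z

def satisfies_system(eigenvalues, k, t):
    lam = sorted(eigenvalues, reverse=True)
    tau, z = candidate_certificate(lam, k)
    budget_ok = t - k * tau - sum(z) >= 0
    z_psd = all(zi >= 0 for zi in z)
    shifted_psd = all(zi - li + tau >= 0 for zi, li in zip(z, lam))
    return budget_ok and z_psd and shifted_psd

eigs = [3, 1, 2, -1]                       # s_2 = 3 + 2 = 5
print(satisfies_system(eigs, 2, 5), satisfies_system(eigs, 2, 4))
```

With this candidate, $k\tau + \operatorname{Tr}(Z)$ equals $s_k(X)$ exactly, so the first inequality holds precisely when $t \ge s_k(X)$.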
It can be checked that (6.6) implies $s_k(X) \le t$. Conversely, if $s_k(X) \le t$,
then we can find a pair $(Z, \tau) \in \mathcal{S}^n \times \mathbb{R}$ satisfying (6.6). To see this, we may
assume that $X$ is diagonal (up to an orthogonal transformation) and choose
$\tau = \lambda_k(X)$,
X
0
'
.
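\[ \Bigl\{ (f_0, f_1, f_2, f_3, f_4) \in \mathbb{R}^5 \;\Bigm|\; \sum_{i=0}^{4} f_i x^i \ge 0 \;\; \forall x \in \mathbb{R} \Bigr\}. \]
\[
\begin{pmatrix}
f_0 & \tfrac{1}{2} f_1 & \tfrac{1}{3} f_2 - \lambda \\
\tfrac{1}{2} f_1 & \tfrac{1}{3} f_2 + 2\lambda & \tfrac{1}{2} f_3 \\
\tfrac{1}{3} f_2 - \lambda & \tfrac{1}{2} f_3 & f_4
\end{pmatrix} \succeq 0.
\]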
Truncated quadratic modules and preordering. In constrained polynomial optimization, weighted sos polynomials are very useful in representing
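\[
\operatorname{qmodule}_k(g) = \Bigl\{ \sum_{i=0}^{m} \sigma_i g_i \;\Bigm|\; \deg(\sigma_i g_i) \le 2k \text{ for all } i,\;\; \sigma_0, \dots, \sigma_m \text{ are sos} \Bigr\}, \qquad (6.7)
\]
\[
\operatorname{preorder}_k(g) = \Bigl\{ \sum_{\nu \in \{0,1\}^m} \sigma_\nu g_\nu \;\Bigm|\; \deg(\sigma_\nu g_\nu) \le 2k,\;\; \sigma_\nu \text{ are sos} \Bigr\}. \qquad (6.8)
\]
In the above, we denote $g_\nu := g_1^{\nu_1} \cdots g_m^{\nu_m}$ and $g_0 = 1$. The set of all sos
polynomials with a fixed degree is a projected spectrahedron, as shown in
the preceding example. Therefore, both $\operatorname{qmodule}_k(g)$ and $\operatorname{preorder}_k(g)$ are
projected spectrahedra.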
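For instance, in the case of two variables ($n = 2$), $\operatorname{qmodule}_1(1 - x_1^2 - x_2^2)$
admits the semidefinite representation with one lifting variable $\lambda$:
\[
\Bigl\{ a + 2b^T x + x^T C x \;\Bigm|\;
\begin{pmatrix} a - \lambda & b^T \\ b & C + \lambda I_2 \end{pmatrix} \succeq 0, \;\; \lambda \ge 0 \Bigr\}.
\]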
6.3.2
Necessary Conditions
The geometry of the boundary is very important in investigating semidefinite
representability of convex sets. The notion of curvature plays a crucial role.
Let $f$ be a polynomial in $\mathbb{R}[x]$. Consider its real variety
\[ V_{\mathbb{R}}(f) = \{x \in \mathbb{R}^n : f(x) = 0\} \]
and a point $u \in V_{\mathbb{R}}(f)$. We say $f$ is nonsingular at $u$ if $\nabla f(u) \ne 0$. If $f$ is nonsingular
at $u \in V_{\mathbb{R}}(f)$, we say $V_{\mathbb{R}}(f)$ has positive curvature at $u$ if for either $s = 1$ or $s = -1$
\[ s\, v^T \nabla^2 f(u)\, v > 0 \quad \text{for all } 0 \ne v \in \nabla f(u)^{\perp}. \qquad (6.9) \]
Its boundary has zero curvature on four points $(\pm 1, 0)$, $(0, \pm 1)$ and has positive
curvature everywhere else.
A polynomial function $f(x)$ is said to be strictly quasi-concave at $u$ if the
condition (6.9) holds for $s = -1$. For a subset $V \subset \mathbb{R}^n$, we say $f(x)$ is strictly
quasi-concave on $V$ if $f(x)$ is strictly quasi-concave on every point of $V$. When $>$ is
replaced by $\ge$ in (6.9) for $s = -1$, we can similarly define $f(x)$ to be quasi-concave.
Similarly, quasi-convexity and strict quasi-convexity are defined by requiring $s = 1$
in (6.9). Our definitions of quasi-convexity and quasi-concavity are slightly less
demanding than the ones in the existing literature (e.g., [8, Section 3.4.3]).
Example 6.15. Consider the two-dimensional hyperboloid
\[ H := \{x \in \mathbb{R}^2_+ : x_1 x_2 - 1 \ge 0\}. \]
We see that
\[ -v^T \nabla^2 (x_1 x_2 - 1)\, v = -2 v_1 v_2 > 0 \]
whenever $0 \ne v \in x^{\perp}$ and $x_1 x_2 = 1$. Hence the boundary $\partial H$ has positive curvature.
The defining polynomial is not convex anywhere, but it is strictly quasi-concave on
the boundary of $H$.
Now we present some necessary conditions for a set to be a projected spectrahedron.
We are interested in closed semialgebraic sets:
\[ S = \bigcup_{k=1}^{m} T_k, \qquad T_k = \{x \in \mathbb{R}^n : g_1^k(x) \ge 0, \dots, g_{m_k}^k(x) \ge 0\}. \]
Each $g_i^k$ is a polynomial and the sets $T_k$ are called basic closed semialgebraic. Denote
by $\partial T_k$ the boundary of $T_k$ in the standard Euclidean topology. For any $u \in \partial T_k$,
the active set $I_k(u) := \{1 \le i \le m_k : g_i^k(u) = 0\}$ is nonempty.
The description of a semialgebraic set by polynomials is usually not unique,
and its boundary might have singularities. We say $u$ is a nonsingular point of $\partial T_k$
if $|I_k(u)| = 1$ and $\nabla g_i^k(u) \ne 0$ for $i \in I_k(u)$; otherwise, we say $u$ is a singular
point of $\partial T_k$. A point $u$ on $\partial T_k$ is called a corner point of $T_k$ if $|I_k(u)| > 1$. For
$u \in \partial S$ and $i \in I_k(u) \ne \emptyset$, we say $g_i^k$ is irredundant at $u$ with respect to $\partial S$ (or just
irredundant at $u$ if the set $S$ is clear from the context) if there exists a sequence
of nonsingular points $\{u_N\} \subset V(g_i^k) \cap \partial S$ of $\partial T_k$ such that $u_N \to u$; otherwise, we
say $g_i^k$ is redundant at $u$. We say $g_i^k$ is nonsingular at $u$ if $\nabla g_i^k(u) \ne 0$. Geometrically, when
$g_i^k$ is nonsingular at $u \in \partial S$, $g_i^k$ being redundant at $u$ means that the inequality
$g_i^k(x) \ge 0$ is not necessary for describing $S$ in a small neighborhood of $u$.
Example 6.16. Consider the convex set that is drawn in the shaded area of Figure 6.3.
It is the union of the following two basic closed semialgebraic sets:
\[ T_1 = \{g_1^1(x) := x_2 \ge 0,\;\; g_2^1(x) := 1 - x_2 \ge 0,\;\; g_3^1(x) := x_2^4 - x_1^6 \ge 0\}, \]
\[ T_2 = \{g_1^2(x) := x_1 \ge 0,\;\; g_2^2(x) := 1 - x_2 \ge 0,\;\; g_3^2(x) := 10x_2^3 - x_1^5 \ge 0\}. \]
Figure 6.3. The shaded area is the union of T1 and T2 in Example 6.16.
The corner points of $T_1$ are $(-1, 1)$, $(0, 0)$, $(1, 1)$. The polynomial $g_3^1$ is irredundant
at $(-1, 1)$ and $(0, 0)$ but redundant at $(1, 1)$. The polynomial $g_3^1$ is
nonsingular at $(-1, 1)$ but singular at $(0, 0)$. The polynomial $g_1^1$ is redundant at
$(0, 0)$. The corner points of $T_2$ are $(0, 0)$, $(0, 1)$, $(\sqrt[5]{10}, 1)$. The polynomial $g_3^2$ is
irredundant at both $(0, 0)$ and $(\sqrt[5]{10}, 1)$. It is nonsingular at $(\sqrt[5]{10}, 1)$ but singular
at $(0, 0)$. The polynomial $g_1^2$ is redundant at $(0, 1)$ and $(0, 0)$. Both $g_2^1$ and $g_2^2$ are
irredundant on the section $x_2 = 1$ of the boundary.
Now we present necessary conditions for semidefinite representability.
Theorem 6.17 ([13]). Let $S \subset \mathbb{R}^n$ be a projected spectrahedron. Then $S$ is convex
and has the following additional properties:
(a) The interior $\operatorname{int}(S)$ of $S$ is a finite union of basic open semialgebraic sets, i.e.,
\[ \operatorname{int}(S) = \bigcup_{k=1}^{m} T_k, \qquad T_k = \{x \in \mathbb{R}^n : g_1^k(x) > 0, \dots, g_{m_k}^k(x) > 0\}. \]
\[ \bigcup_{k=1}^{m} T_k, \qquad T_k = \{x \in \mathbb{R}^n : g_1^k(x) \ge 0, \dots, g_{m_k}^k(x) \ge 0\}. \]
Theorem 6.17 says that a projected spectrahedron must be convex and semialgebraic,
and its boundary must have nonnegative curvature at smooth points. In
particular, the first two parts establish the necessary algebraic structure of projected
spectrahedra, while nonnegativity of curvature follows from convexity. In other
words, convexity and being semialgebraic are necessary conditions for semidefinite
representability. It is not clear whether they are also sufficient. Indeed, it was
conjectured in [13] that every convex semialgebraic set in $\mathbb{R}^n$ is semidefinite representable.
Proof of Theorem 6.17. The convexity of $S$ is obvious. Parts (a) and (b)
immediately follow from the Tarski–Seidenberg quantifier elimination [6].
(c) Let $u \in \partial S \cap \partial T_k$. Note that $\bar{S}$ is a convex set and has the same boundary
as $S$. (If a set is not closed, then its boundary is defined to be the boundary of its
closure.)
First, consider the case that $u$ is a smooth point. Since $\bar{S}$ is convex, it has a
supporting hyperplane $u + w^{\perp} = \{u + x : w^T x = 0\}$. $\bar{S}$ lies on one side of $u + w^{\perp}$
and so does $T_k$, since $T_k$ is contained in $\bar{S}$. Since $u$ is a smooth point, $I_k(u) = \{i\}$
has cardinality one. For some $\delta > 0$ sufficiently small, we have
\[ T_k \cap B(u, \delta) = \{x \in \mathbb{R}^n : g_i^k(x) \ge 0,\;\; \delta^2 - \|x - u\|_2^2 > 0\}. \]
Note $u + w^{\perp}$ is also a supporting hyperplane of $T_k$ passing through $u$. So, the
gradient $\nabla g_i^k(u)$ must be parallel to $w$, i.e., $\nabla g_i^k(u) = \mu_i^k w$ for some nonzero scalar
$\mu_i^k \ne 0$. Thus, for all $0 \ne v \in w^{\perp}$ and $\epsilon > 0$ small enough, the point $u + \frac{\epsilon}{\|v\|_2} v$ is not
in the interior of $T_k \cap B(u, \delta)$, which implies
\[ g_i^k\Bigl(u + \frac{\epsilon}{\|v\|_2} v\Bigr) \le 0 \quad \text{for all } 0 \ne v \in w^{\perp} = \nabla g_i^k(u)^{\perp}. \]
\[
R(v) := I_n - \|\nabla g_i^k(v)\|_2^{-2}\, \nabla g_i^k(v)\, \nabla g_i^k(v)^T.
\]
So the quasi-concavity of $g_i^k$ at $u_N$ is equivalent to
\[
R(u_N)^T\, \nabla^2 g_i^k(u_N)\, R(u_N) \preceq 0.
\]
\[
\begin{bmatrix} 1 & 0 & x_1 \\ 0 & 1 & x_2 - y \\ x_1 & x_2 - y & y \end{bmatrix} \succeq 0.
\]
Its picture is shown in the shaded area of Figure 6.5. The above linear matrix
inequality is equivalent to
\[
f(x, y) := x_1^2 + (x_2 - y)^2 - y \le 0,
\]
where $-f(x, y)$ is the determinant of the defining linear pencil. If a point $x$ lies on the
boundary of $S$, then there exists $y$ such that $f(x, y) = 0$ and $y$ is a local minimizer
of the function $y \mapsto f(x, y)$, which implies
\[
f_y = -2x_2 + 2y - 1 = 0.
\]
Eliminating $y$ from $f(x, y) = f_y(x, y) = 0$ gives the equation
\[
g(x) := 1 + 4(x_2 - x_1^2) = 0.
\]
On the other hand, for every $x$ satisfying $g(x) \ge 0$, the equation $f(x, y) = 0$ has
a real solution $y$ and the pair $(x, y)$ satisfies $f(x, y) \le 0$. Therefore, we get an
equivalent description for $S$ as
\[
S = \{(x_1, x_2) : 1 + 4(x_2 - x_1^2) \ge 0\}.
\]
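The elimination step above can be checked with exact rational arithmetic. The following is a minimal sketch, not part of the original text: the helper names `best_y` and `in_projection` are illustrative, and the check relies on the observation that $f(x, \cdot)$ is a convex quadratic in $y$ whose minimum value is $-g(x)/4$.

```python
from fractions import Fraction

def f(x1, x2, y):
    """f(x, y) = x1^2 + (x2 - y)^2 - y; the lifted set is f(x, y) <= 0."""
    return x1**2 + (x2 - y)**2 - y

def g(x1, x2):
    """Eliminant obtained from f = f_y = 0."""
    return 1 + 4 * (x2 - x1**2)

def best_y(x2):
    """Root of f_y = -2*x2 + 2*y - 1. Since f is a convex quadratic in y,
    this value of y minimizes f(x, .)."""
    return x2 + Fraction(1, 2)

def in_projection(x1, x2):
    """x lies in the projection iff the smallest value of f(x, .) is <= 0."""
    return f(x1, x2, best_y(x2)) <= 0
```

On any rational grid, `in_projection(x1, x2)` agrees with the test `g(x1, x2) >= 0`, which is the description of $S$ derived above.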
The defining polynomial $g(x)$ is concave. The boundary of $S$ has positive curvature
everywhere.
6.3.3
Exercises
1 x1 y
x1
(a) : x1 1 x2 0; (b) : y1
y x2 1
1
1
1
y2 0; (c) : y1
x2
y2
y1
y2
x1
y2
x1 0.
x2
6.4
6.4.1
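To illustrate the basic idea of moment constructions, we begin with a simple example
of a one-dimensional convex set defined by a single quartic inequality
\[
a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 \ge 0.
\]
We introduce a new variable $y_i$ for each monomial $x^i$ and convert the defining
quartic inequality to the following system:
\[
a_0 y_0 + a_1 y_1 + a_2 y_2 + a_3 y_3 + a_4 y_4 \ge 0, \qquad
\begin{bmatrix} y_0 & y_1 & y_2 \\ y_1 & y_2 & y_3 \\ y_2 & y_3 & y_4 \end{bmatrix}
= \begin{bmatrix} 1 \\ x \\ x^2 \end{bmatrix} \begin{bmatrix} 1 \\ x \\ x^2 \end{bmatrix}^T
= \begin{bmatrix} 1 & x & x^2 \\ x & x^2 & x^3 \\ x^2 & x^3 & x^4 \end{bmatrix}.
\]
The matrix
\[
\begin{bmatrix} 1 & x & x^2 \\ x & x^2 & x^3 \\ x^2 & x^3 & x^4 \end{bmatrix}
\]
is positive semidefinite and has rank one. Dropping the rank condition while keeping
$y_0 = 1$, $y_1 = x$, and positive semidefiniteness gives the relaxed system
\[
a_0 + a_1 x + a_2 y_2 + a_3 y_3 + a_4 y_4 \ge 0, \qquad
\begin{bmatrix} 1 & x & y_2 \\ x & y_2 & y_3 \\ y_2 & y_3 & y_4 \end{bmatrix} \succeq 0,
\]
which yields a projected spectrahedron with lifting variables $y_2, y_3, y_4$.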
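As a small illustration of the lifted system, the sketch below builds the $3 \times 3$ Hankel matrix from $(y_0, \ldots, y_4)$ and tests membership in the lifted set. It is an assumption-laden sketch rather than the chapter's own code: the function names are hypothetical, and positive semidefiniteness is tested by checking that all principal minors are nonnegative, which is practical only for very small matrices.

```python
from fractions import Fraction

def moment_matrix(y):
    """Hankel matrix [[y0,y1,y2],[y1,y2,y3],[y2,y3,y4]] built from y = (y0,...,y4)."""
    return [[y[i + j] for j in range(3)] for i in range(3)]

def det(m):
    """Determinant by Laplace expansion along the first row (fine for tiny matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def is_psd(m):
    """A symmetric matrix is PSD iff every principal minor is nonnegative."""
    n = len(m)
    for mask in range(1, 2 ** n):
        idx = [i for i in range(n) if (mask >> i) & 1]
        if det([[m[i][j] for j in idx] for i in idx]) < 0:
            return False
    return True

def in_lifted_set(a, x, y2, y3, y4):
    """Check a0 + a1*x + a2*y2 + a3*y3 + a4*y4 >= 0 together with the LMI."""
    y = (1, x, y2, y3, y4)
    linear = sum(ai * yi for ai, yi in zip(a, y))
    return linear >= 0 and is_psd(moment_matrix(y))
```

For instance, with the illustrative coefficients `a = (1, 0, 0, 0, -1)` (that is, $1 - x^4 \ge 0$), the true moments of `x = Fraction(1, 2)` are accepted, while the true moments of `x = 2` are rejected by the linear inequality.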
(6.10)
(6.11)
Let
Write every $g_i$ as
\[
g_i(x) = \sum_{|\alpha| \le 2d} g^{(i)}_\alpha x^\alpha.
\]
y0 = 1, Md (y) 0,
R = x Rn
.
(6.12)
x1 = ye1 , . . . , xn = yen
The lifting variables in $R$ are $y_\alpha$, where $|\alpha| \ge 2$.
Example 6.28. Consider the set $S = \{(x_1, x_2) \in \mathbb{R}^2 : 1 - x_1^4 - x_2^4 - x_1^2 x_2^2 \ge 0\}$.
The construction (6.12) gives a semidefinite relaxation $R$ of $S$ defined by
1
x1 x2 y20 y11 y02
x1 y20 y11 y30 y21 y12
When $B$ is positive semidefinite with nonnegative entries and $d$ is even, the equality
$S = R$ holds, which will be shown in Section 6.4.3.
6.4.2
In general, the semidefinite relaxation $R$ given by (6.12) does not equal $S$, except in
the special case of sos-convex sets (defined in the next subsection). Hence tighter
constructions by using higher order moments are necessary. We describe two basic types of refined moment constructions: Putinar and Schmüdgen semidefinite
relaxations.
To describe them, we need to define localizing matrices. Let $p$ be a polynomial
with $\deg(p) \le 2N$. Write
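\[
p(x)\,[x]_{N-k}[x]_{N-k}^T = \sum_{|\alpha| \le 2N} A^{(N)}_\alpha x^\alpha
\quad (k = \lceil \deg(p)/2 \rceil); \qquad
L^{(N)}_p(y) := \sum_{|\alpha| \le 2N} A^{(N)}_\alpha y_\alpha.
\]
The pencil $L^{(N)}_p(y)$ is called the $N$th order localizing matrix of $p$. If $p$ is nonnegative on $S$, then for every $x \in S$ we have
\[
L^{(N)}_p(y) \succeq 0 \quad \text{if every } y_\alpha = x^\alpha.
\]
Note that $g_0 = 1$ and $L^{(d)}_{g_0} = M_d(y)$ as before. Since all $g_0, g_1, \ldots, g_m$ are nonnegative on $S$, for every $N$ the set $S$ is contained in the projected spectrahedron
\[
S_N = \Bigl\{ x \in \mathbb{R}^n : L^{(N)}_{g_i}(y) \succeq 0,\ i = 0, 1, \ldots, m;\ \ y_0 = 1,\ x_1 = y_{e_1}, \ldots, x_n = y_{e_n} \Bigr\}. \tag{6.13}
\]
The set $S_N$ is called a Putinar semidefinite relaxation of $S$.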
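The product of polynomials from any subset of $g_1, \ldots, g_m$ is also nonnegative
on $S$. For every $\nu \in \{0, 1\}^m$, define $g_\nu := g_1^{\nu_1} \cdots g_m^{\nu_m}$. Each $g_\nu$ is nonnegative on $S$.
So every $x \in S$ satisfies
\[
y_0 = 1, \qquad L^{(N)}_{g_\nu}(y) \succeq 0 \quad \text{if every } y_\alpha = x^\alpha.
\]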
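This implies that for every $N$ the set $S$ is contained in the projected spectrahedron
\[
\hat{S}_N = \Bigl\{ x \in \mathbb{R}^n : L^{(N)}_{g_\nu}(y) \succeq 0 \text{ for all } \nu \in \{0, 1\}^m,\ \ y_0 = 1,\ x_1 = y_{e_1}, \ldots, x_n = y_{e_n} \Bigr\}. \tag{6.14}
\]
The set $\hat{S}_N$ is called a Schmüdgen semidefinite relaxation of $S$. Clearly, for every $N$,
$\hat{S}_N \subseteq S_N$ because (6.14) has extra conditions in addition to those in (6.13). We
have the nesting relation
\[
S_1 \supseteq \cdots \supseteq S_N \supseteq \cdots \supseteq S, \qquad
\hat{S}_1 \supseteq \cdots \supseteq \hat{S}_N \supseteq \cdots \supseteq S, \qquad
\hat{S}_N \subseteq S_N .
\]
Later we will see that both $S_N$ and $\hat{S}_N$ are equal to $S$ for $N$ large enough, under
some general conditions. Typically, it is very difficult to get explicit bounds on $N$
for which $S_N = S$ or $\hat{S}_N = S$. In some special cases, such bounds can be estimated,
e.g., in [29, Section 3].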
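Example 6.30. Consider the convex set $S$ defined by
\[
g_1(x) := x_2 - x_1^3 \ge 0, \qquad g_2(x) := x_2 + x_1^3 \ge 0.
\]
The relaxation $S_3$ is given by
\[
L^{(3)}_{g_1}(y) = \begin{bmatrix}
y_{01} - y_{30} & y_{11} - y_{40} & y_{02} - y_{31} \\
y_{11} - y_{40} & y_{21} - y_{50} & y_{12} - y_{41} \\
y_{02} - y_{31} & y_{12} - y_{41} & y_{03} - y_{32}
\end{bmatrix} \succeq 0,
\]
\[
L^{(3)}_{g_2}(y) = \begin{bmatrix}
y_{01} + y_{30} & y_{11} + y_{40} & y_{02} + y_{31} \\
y_{11} + y_{40} & y_{21} + y_{50} & y_{12} + y_{41} \\
y_{02} + y_{31} & y_{12} + y_{41} & y_{03} + y_{32}
\end{bmatrix} \succeq 0,
\]
\[
x_1 = y_{10}, \qquad x_2 = y_{01}, \qquad y_{00} = 1, \qquad M_3(y) \succeq 0.
\]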
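The entries of the localizing matrices in Example 6.30 can be generated mechanically from the definition $p(x)[x]_{N-k}[x]_{N-k}^T$ with each monomial $x^\alpha$ replaced by $y_\alpha$. The following is a minimal bivariate sketch under stated assumptions: polynomials are stored as dictionaries from exponent pairs to coefficients, the monomial vector is ordered as $1, x_1, x_2, \ldots$, and the function names are illustrative rather than taken from any library.

```python
def monomials_up_to(deg):
    """Exponent pairs (i, j) of x1^i * x2^j with i + j <= deg, in graded order."""
    return [(t - j, j) for t in range(deg + 1) for j in range(t + 1)]

def localizing_matrix(p, N):
    """Entries of the N-th order localizing matrix of p.

    p maps exponent pairs to coefficients. Each entry of the result is a dict
    {moment index alpha: coefficient of y_alpha}.
    """
    deg_p = max(i + j for (i, j) in p)
    k = -(-deg_p // 2)                      # ceil(deg(p) / 2)
    basis = monomials_up_to(N - k)          # exponents of the vector [x]_{N-k}
    matrix = []
    for (a1, a2) in basis:
        row = []
        for (b1, b2) in basis:
            entry = {}
            for (c1, c2), coeff in p.items():
                alpha = (a1 + b1 + c1, a2 + b2 + c2)
                entry[alpha] = entry.get(alpha, 0) + coeff
            row.append(entry)
        matrix.append(row)
    return matrix

g1 = {(0, 1): 1, (3, 0): -1}                # x2 - x1^3
L = localizing_matrix(g1, 3)
```

Here `L[0][0]` encodes $y_{01} - y_{30}$ and `L[1][2]` encodes $y_{12} - y_{41}$, in agreement with the matrix displayed in the example. Calling the same function with the constant polynomial `{(0, 0): 1}` returns the entries of the moment matrix.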
6.4.3
Sos-convex Sets
In the rest of this subsection we present the proof of this result. It gives a
general framework for proving that moment relaxations provide semidefinite representations. A typical approach for proving equality of two convex sets is to use
duality theory via separating hyperplanes. Let $S$ be as in Theorem 6.31. Suppose
$a^T x + b = 0$ ($a \ne 0$) is a supporting hyperplane of $S$; then
\[
a^T x + b \ge 0 \ \text{ for all } x \in S, \qquad a^T u + b = 0 \ \text{ for some } u \in S.
\]
The point $u$ is a minimizer of $a^T x + b$ over $S$ and belongs to the boundary $\partial S$. Since $S$
has nonempty interior, there exists a point $v \in \mathbb{R}^n$ such that every $g_i(v) > 0$ (Slater's
condition) and every $g_i$ is concave. So, the first order optimality condition holds
at $u$ (cf. [5, Proposition 5.3.5]); i.e., there exist Lagrange multipliers $\lambda_1 \ge 0, \ldots, \lambda_m \ge 0$ such that
\[
a = \lambda_1 \nabla g_1(u) + \cdots + \lambda_m \nabla g_m(u), \qquad \lambda_i g_i(u) = 0 \ \ (1 \le i \le m).
\]
(6.15)
Since $\nabla^2 p(x)$ is sos, the double integral above is sos by Lemma 6.32. Thus $p(x)$ is
also sos.
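\[
a^T \hat{x} + b < 0, \qquad a^T u + b = 0 \ \text{ for some } u \in S.
\]
The point $u$ minimizes $a^T x + b$ over $S$. Since $\operatorname{int}(S) \ne \emptyset$ and each $g_i$ is concave, the
first order optimality condition holds (cf. [5]) and there must exist $(\lambda_1, \ldots, \lambda_m) \ge 0$
such that the Lagrangian $L(x)$ in (6.15) is a convex nonnegative polynomial satisfying $L(u) = 0$ and $\nabla L(u) = 0$. Furthermore, its Hessian
\[
\nabla^2 L(x) = \sum_{i=1}^{m} \lambda_i \bigl(-\nabla^2 g_i(x)\bigr)
\]
is sos, and Lemma 6.33 implies $L(x)$ is sos. The degree of $L(x)$ is at most $2d$. So
there exists a symmetric matrix $W \succeq 0$ such that
\[
a^T x + b = \sum_{i=1}^{m} \lambda_i g_i(x) + [x]_d^T W [x]_d.
\]
Replacing each monomial $x^\alpha$ by $\hat{y}_\alpha$, we get
\[
a^T \hat{x} + b = \sum_{i=1}^{m} \lambda_i L_{g_i}(\hat{y}) + \operatorname{Tr}\bigl(W M_d(\hat{y})\bigr) \ge 0,
\]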
x
4 1
x2
x1
x2
T
+2
2x1
2
x1
+2
x2
2
2x2
\[
a_i(x) = 2d(d - 1) \sum_{j \ne i} B_{ij}\, x_i^{d-2} x_j^{d}.
\]
If $B \succeq 0$ and $d \ge 1$, then $W \succeq 0$ and must be sos; if each $B_{ij} \ge 0$ and $d > 0$ is even,
then all $a_i(x)$ are sos. Therefore, when $B \succeq 0$, every $B_{ij} \ge 0$ and $d > 0$ is even, the
form $(x^d)^T B x^d$ is sos-convex, and by Theorem 6.31 the projected spectrahedron $R$
given by (6.12) is a semidefinite representation for $S$.
Sos-convexity is a very strong condition, and not all convex polynomials are
sos-convex. An explicit example is given in [1]. More generally, a nonnegative
convex polynomial need not be a sum of squares (cf. Chapter 4). Generally, the
projected spectrahedron $R$ given by (6.12) does not equal $S$ if the $g_i$ are not sos-concave.
On the other hand, sos-convexity can be verified by semidefinite programming.
A polynomial $f$ is sos-convex if and only if its Hessian $\nabla^2 f$ is sos. This can be
checked numerically by solving a single SDP feasibility problem, and therefore, sos-convexity is a favorable condition in practice.
6.4.4
When $S$ is not sos-convex, the basic moment relaxation $R$ given by (6.12) might
not be a semidefinite representation of $S$. The projected spectrahedra $S_N$ in (6.13)
and $\hat{S}_N$ in (6.14) are better candidates for a semidefinite representation of $S$. We
now examine weaker conditions than sos-convexity that guarantee that $S_N = S$ (or
$\hat{S}_N = S$) for some finite $N$.
A sufficient condition for $S_N = S$ or $\hat{S}_N = S$ is the bounded degree representation (BDR) introduced by Lasserre in [19]. BDR is typically very difficult to check.
More easily checkable conditions are strict convexity and strict quasi-convexity. We
now discuss these cases.
Bounded Degree Representation Condition
A general approach for showing that a moment relaxation produces a semidefinite
representation is given in the proof of Theorem 6.31. The key point is to prove a
weighted sos representation with uniform degree bounds for all linear functionals
nonnegative on $S$. If a linear functional $a^T x + b$ is positive on $S$, then Putinar's
Positivstellensatz [37] says that
\[
a^T x + b = \sigma_0 + \sigma_1 g_1 + \cdots + \sigma_m g_m, \tag{6.16}
\]
where each $\sigma_i$ is an sos polynomial. To make sure that (6.16) holds, we require that
the presentation of $S$ satisfies the archimedean condition: there exist sos polynomials
$s_0, s_1, \ldots, s_m$ and a number $M > 0$ such that
\[
M - \|x\|_2^2 = s_0 + s_1 g_1 + \cdots + s_m g_m.
\]
The archimedean condition implies that $S$ is compact, but the reverse is not necessarily true. However, the presentation of any compact set $S$ can be strengthened to satisfy the archimedean condition by adding a redundant ball constraint
$M - \|x\|_2^2 \ge 0$ for a sufficiently large $M$. Generally, the degrees of the polynomials
$\sigma_i$ in (6.16) go to infinity as the minimum value of $a^T x + b$ on $S$ tends to zero.
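\[
a^T x + b = \sigma_0 + \sigma_1 g_1 + \cdots + \sigma_m g_m
\]
with $\sigma_i$ sos and $\deg(\sigma_i g_i) \le 2N$ for all $i$. If almost all positive linear functionals
on $S$ have such a representation, then we say that the presentation of $S$ admits a
Putinar–Prestel bounded degree representation (PP-BDR) of order $N$ (cf. [19]).
For the Schmüdgen moment relaxation $\hat{S}_N$ in (6.14), to guarantee that $\hat{S}_N = S$
for some order $N$, we need a Schmüdgen bounded degree representation (S-BDR) of
order $N$ (cf. [19]): for almost every pair $(a, b) \in \mathbb{R}^n \times \mathbb{R}$,
\[
a^T x + b > 0 \ \text{on } S \quad \Longrightarrow \quad a^T x + b = \sum_{\nu \in \{0,1\}^m} \sigma_\nu g_\nu,
\]
with each $\sigma_\nu$ sos and $\deg(\sigma_\nu g_\nu) \le 2N$.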
\[
a^T \hat{x} + b < 0.
\]
Since $\operatorname{conv}(S)$ is compact, we can choose the above $(a, b)$ generically. Since PP-BDR
of order $N$ holds for the presentation of $S$, there exist sos polynomials $\sigma_0, \ldots, \sigma_m$
such that (6.16) is true and $\deg(\sigma_i g_i) \le 2N$. For each $i$, we can find a symmetric
$W_i \succeq 0$ such that $\sigma_i(x) = [x]_{N-d_i}^T W_i [x]_{N-d_i}$ with $d_i = \lceil \deg(g_i)/2 \rceil$. Replacing each
monomial $x^\alpha$ by $\hat{y}_\alpha$, we get
\[
a^T \hat{x} + b = \operatorname{Tr}\bigl(L^{(N)}_{g_0}(\hat{y}) W_0\bigr) + \cdots + \operatorname{Tr}\bigl(L^{(N)}_{g_m}(\hat{y}) W_m\bigr) \ge 0,
\]
which contradicts the previous assertion that $a^T \hat{x} + b < 0$. Therefore, $\operatorname{conv}(S) = S_N$.
Part (b) is proved in almost exactly the same way.
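\[
a = \sum_{i=1}^{m} \lambda_i \nabla g_i(u), \qquad \lambda_1 g_1(u) = \cdots = \lambda_m g_m(u) = 0.
\]
Let $L(x)$ be the Lagrangian defined in (6.15). Note that $L(u) = 0$ and $\nabla L(u) = 0$.
By Taylor expansion
\[
L(x) = (x - u)^T \Biggl( \sum_{i=1}^{m} \lambda_i
\underbrace{\int_0^1 \!\! \int_0^t -\nabla^2 g_i\bigl(u + s(x - u)\bigr)\, ds\, dt}_{H_i(x, u)} \Biggr) (x - u).
\]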
We close by noting that if the set $S$ is convex, then Theorem 6.37 gives concrete
conditions under which $S_N$ and $\hat{S}_N$ give semidefinite representations of $S$.
Convex sets with positively curved boundaries
When a semialgebraic set $S$ is convex its defining polynomials are not necessarily
concave. For instance, the hyperboloid $\{x \in \mathbb{R}^2_+ : x_1 x_2 - 1 \ge 0\}$ is convex, while its
defining polynomial is neither concave nor convex. However, because of convexity,
the boundary of $S$ must have nonnegative curvature at smooth points (see Theorem 6.17). Therefore, the defining polynomials are quasi-concave at smooth points.
This observation leads to weaker conditions, such as strict quasi-concavity of the
defining polynomials.
Theorem 6.38 ([15]). Assume that the set $S$ defined in (6.10) is compact and
convex and has nonempty interior. If each $g_i(x)$ is either sos-concave or strictly
quasi-concave on $S$, then the Schmüdgen relaxation $\hat{S}_N$ equals $S$ for $N$ sufficiently large. If, in addition, the
archimedean condition holds, then the Putinar relaxation $S_N$ equals $S$ for $N$ sufficiently large.
The proof of Theorem 6.38 is based on Theorem 6.37. The basic idea is that
we are able to find a different set of strictly concave defining polynomials for $S$ by
using strict quasi-concavity. When $g_i(x)$ is strictly quasi-concave on $S$, we can find
a polynomial $h_i(x)$ positive on $S$ such that $p_i(x) = g_i(x) h_i(x)$ is strictly concave
on $S$. We refer to [15] for the details of the proof but provide an example below.
Consider the set
\[
S = \Bigl\{ x \in \mathbb{R}^2 : g_1(x) := x_1 x_2 - 1 \ge 0,\ \ g_2(x) := \tfrac{1}{9} - (x_1 - 1)^2 - (x_2 - 1)^2 \ge 0 \Bigr\}.
\]
The set $S$ is compact and convex. The polynomial $g_1$ is strictly quasi-concave, but
not concave. However, the set $S$ can also be equivalently described as
\[
S = \Bigl\{ x \in \mathbb{R}^2 : p_1(x) := (x_1 x_2 - 1)(3 - x_1 - x_2) \ge 0,\ \ g_2(x) := \tfrac{1}{9} - (x_1 - 1)^2 - (x_2 - 1)^2 \ge 0 \Bigr\},
\]
where $p_1(x)$ is strictly concave on $S$.
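The concavity claim for $p_1$ can be sanity-checked numerically. The sketch below is an illustration only: it samples a rational grid rather than proving anything, the Hessian entries were expanded by hand from $p_1 = 3x_1x_2 - x_1^2x_2 - x_1x_2^2 - 3 + x_1 + x_2$, and the function names are hypothetical.

```python
from fractions import Fraction

def hessian_p1(x1, x2):
    """Hessian entries (h11, h12, h22) of p1(x) = (x1*x2 - 1)*(3 - x1 - x2)."""
    return -2 * x2, 3 - 2 * x1 - 2 * x2, -2 * x1

def negative_definite(h11, h12, h22):
    """A symmetric 2x2 matrix is negative definite iff h11 < 0 and det > 0."""
    return h11 < 0 and h11 * h22 - h12 * h12 > 0

def in_S(x1, x2):
    """Membership test for S = {g1 >= 0, g2 >= 0}."""
    g1 = x1 * x2 - 1
    g2 = Fraction(1, 9) - (x1 - 1) ** 2 - (x2 - 1) ** 2
    return g1 >= 0 and g2 >= 0

def check_concavity_on_S(n=30):
    """Sample the box [2/3, 4/3]^2 (which contains S) on an (n+1) x (n+1) grid.

    Returns (number of sampled points in S, whether the Hessian of p1 is
    negative definite at every one of them).
    """
    count, ok = 0, True
    for i in range(n + 1):
        for j in range(n + 1):
            x1 = Fraction(2, 3) + Fraction(2, 3) * i / n
            x2 = Fraction(2, 3) + Fraction(2, 3) * j / n
            if in_S(x1, x2):
                count += 1
                ok = ok and negative_definite(*hessian_p1(x1, x2))
    return count, ok
```

By contrast, the Hessian of $g_1 = x_1 x_2 - 1$ is the constant matrix with zero diagonal and off-diagonal entries $1$, which is indefinite, so `negative_definite(0, 1, 0)` is false.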
For a convex basic closed semialgebraic set $S$, the Putinar moment relaxation
produces a semidefinite representation of $S$ only if all faces of $S$ are exposed (cf.
[25]). There are further different conditions under which the Putinar relaxation $S_N$ or the Schmüdgen relaxation $\hat{S}_N$ gives semidefinite
representations of $S$ (cf. [20, 28]).
6.4.5
Generalizations
In many applications convex sets are naturally defined by rational function inequalities or polynomial matrix inequalities. In these cases semidefinite representations
can also be constructed by using moments. We show some examples without going
into the details. Further results on these topics can be found in [28, 29].
Figure 6.6. The convex set defined by $x_1^2 + x_2^2 \ge x_1^4 + x_1^2 x_2^2 + x_2^4$.
Interestingly, the rational function $f(x)$ is concave everywhere. It satisfies a so-called first order sos-concavity condition, and the set $S$ admits the following semidefinite representation (cf. [28]):
\[
\Bigl\{ x \in \mathbb{R}^2 : \exists\, y = (y_{ij}),\ z = (z_{ij}) \ \text{s.t.}\ \ 1 \ge y_{20} + z_{04},\ \ L_1(x, y, z) + L_2(x, y, z) \succeq 0 \Bigr\}
\]
0 0
0
1
0
0
0 1
0
x1
x2
0
0 0
0
x
0
0
2
,
L1 (x, y, z) =
1
x
x
y
y
y
y
1
2
20
02
11
02
0 x2 0
y11
y02 0
0
0
0 0
0
y02
z00
z10
z01 z02
z11
z02
z10 z02
z
z
z
z12
11
12
03
z01
z
z
z
z
z03
11
02
03
12
L2 (x, y, z) =
z02 z12 z03
z
z
z
04
13
04
z11 z03
z12 z13 z04
z13
z02
z12
z03 z04
z13
z04
The lifting variables $y_{ij}$ correspond to regular moments, while $z_{ij}$ correspond to
moments with the weight $(x_1^2 + x_2^2)^{-1}$, i.e., the integrals of type
\[
\int \frac{x_1^i x_2^j}{x_1^2 + x_2^2}\, d\mu(x)
\]
with respect to some positive measure $\mu$ on $\mathbb{R}^n$. The details of constructing $L_1$, $L_2$
are described in [28].
Now we consider the case of a convex set defined by a polynomial matrix inequality. A semidefinite relaxation as in (6.12) can be constructed by using moments.
Under a matrix sos-convexity condition, this construction gives a semidefinite representation of the convex set (cf. [29]).
Example 6.40 ([29]). Consider the set $S$ defined by the polynomial matrix inequality:
2 x21 2x23
1 + x1 x2
x1 x3
1 + x1 x2
2 x22 2x21
1 + x2 x3 0.
x1 x3
1 + x2 x3
2 x23 2x22
The above quadratic matrix polynomial is matrix sos-concave (cf. [29]). A picture of
this set is drawn in Figure 6.7. As in (6.12), a basic moment semidefinite relaxation
of $S$ is
y
1
+
y
2
2y
101
011
002
020
y
s.t.
ijk
3
xR
.
1
x1
x2
x3
x2 y110 y020 y011 0
x3 y101 y011 y002
Indeed, the above is a semidefinite representation of $S$, as shown in [29]. Therefore,
$S$ is a projected spectrahedron.
6.4.6
Suppose we can divide a convex set $S$ into several parts and find a semidefinite representation for each piece. Then a natural question is whether these representations
can be glued together to provide a semidefinite representation of $S$. This brings us
to the main question of this section: Is the convex hull of a union of projected
spectrahedra a projected spectrahedron? If so, how can we construct a semidefinite
representation of it? Interestingly, there exist positive answers to these questions.
A simple implementation of the above idea is to cover the compact set by
finitely many balls. If the intersection of each ball with the convex set is a projected spectrahedron, then we can glue them together to get a uniform semidefinite
representation for the whole convex set. This approach is called localization. The
necessary tool is building a single semidefinite representation for the convex hull of
several projected spectrahedra. Since balls (ellipsoids) are spectrahedra, the question of semidefinite representability of a convex set reduces to the representability
of the intersections of balls with the boundary of the set. Thus we can focus on
local properties of the boundary.
Let $W_1, \ldots, W_m \subseteq \mathbb{R}^n$ be convex sets. Their Minkowski sum is the convex set
defined as
\[
W_1 + \cdots + W_m := \Bigl\{ \sum_{k=1}^{m} x_k : x_k \in W_k,\ k = 1, \ldots, m \Bigr\}.
\]
(6.17)
j=1
.
/
.
/
.
/
(k)
(k) (k)
(1) (1)
(m) (m)
0 for pairs x , y
,..., x ,y
.
x Lk x , y
k=1
(6.18)
where $\Delta_m = \{\lambda \in \mathbb{R}^m_+ : \lambda_1 + \cdots + \lambda_m = 1\}$ is the standard simplex.
Proof. The proof follows readily from the definitions of convex hull and Minkowski
sum. See, for instance, [13].
Using Lemma 6.41, we can get a single semidefinite representation for the
convex hull $\operatorname{conv}\bigl(\bigcup_{k=1}^{m} W_k\bigr)$ from those of the individual $W_k$.
Theorem 6.42 ([13]). Let $W_1, \ldots, W_m$ be nonempty projected spectrahedra defined
in (6.17), and $W := \operatorname{conv}\bigl(\bigcup_{k=1}^{m} W_k\bigr)$ be the convex hull of their union. Define
k , u(k) (k = 1, . . . , m)
(k)
1 , . . . , m 0, 1 + + m = 1,
.
(6.19)
C :=
x
k=1
(1)
(1)
+
x
x
+
1
1
1
2
0
(1)
(1)
x
+
x
1
2
1
(1)
(2)
(2)
(2)
x=x +x
.
x
x
2
1
2
(2)
(2) 0
x2 2 22 x1
1 + 2 = 1, 1 , 2 0
Setting x(2) = (u1 , u2 ), we get a projected spectrahedron with three lifting variables:
21 + x1 u1 x2 u2 + 1
0
x2 u2 + 1
u1 x1
2
u1
u2 + 1 1
.
xR
0
u
+
1
2
2
1
1
1
1 0, 1 1 0
When some Wk are unbounded, C and the convex hull W may not be equal,
but they have the same closure and interior. Note that both C and W are not
necessarily closed even when all Wi are.
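Example 6.44 ([13]). (i) Consider the following spectrahedra:
\[
W_1 = \Bigl\{ x \in \mathbb{R}^2 : \begin{bmatrix} x_1 & 1 \\ 1 & x_2 \end{bmatrix} \succeq 0 \Bigr\}, \qquad W_2 = \{0\}.
\]
Their convex hull is
\[
\operatorname{conv}(W_1 \cup W_2) = \{x \in \mathbb{R}^2_+ : x_1 = x_2 = 0 \ \text{or}\ x_1 x_2 > 0\}.
\]
However, the set $C$ in (6.19) is
\[
\Bigl\{ x \in \mathbb{R}^2 : \exists\, 0 \le \lambda_1 \le 1,\ \begin{bmatrix} x_1 & \lambda_1 \\ \lambda_1 & x_2 \end{bmatrix} \succeq 0 \Bigr\} = \mathbb{R}^2_+.
\]
So, $C \ne \operatorname{conv}(W_1 \cup W_2)$, but they have the same interior. Both $W_1$ and $W_2$ are closed
while $\operatorname{conv}(W_1 \cup W_2)$ is not.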
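The gap between $C$ and the convex hull in part (i) comes entirely from the multiplier value $\lambda_1 = 0$. The sketch below makes this explicit; it is an illustration with hypothetical function names. It uses the fact that $x = \lambda_1 w$ with $w \in W_1$ and $\lambda_1 > 0$ is equivalent to the $2 \times 2$ matrix with diagonal $(x_1, x_2)$ and off-diagonal $\lambda_1$ being positive semidefinite, which in turn means $x_1, x_2 \ge 0$ and $x_1 x_2 \ge \lambda_1^2$.

```python
def exists_multiplier(x1, x2, allow_zero):
    """Is there lam in [0, 1] (or in (0, 1] when allow_zero is False) such that
    [[x1, lam], [lam, x2]] is PSD, i.e. x1 >= 0, x2 >= 0, x1*x2 >= lam**2?

    The last constraint only gets easier as lam shrinks, so a positive lam
    exists iff x1*x2 > 0, and lam = 0 works whenever x1, x2 >= 0.
    """
    if x1 < 0 or x2 < 0:
        return False
    return True if allow_zero else x1 * x2 > 0

def in_C(x1, x2):
    """The set C of (6.19) for this example: lam = 0 is allowed."""
    return exists_multiplier(x1, x2, allow_zero=True)

def in_conv_hull(x1, x2):
    """conv(W1 U {0}): either x = 0 (the W2 part) or x = lam*w with lam > 0."""
    return (x1 == 0 and x2 == 0) or exists_multiplier(x1, x2, allow_zero=False)
```

The point $(1, 0)$ lies in $C$ but not in the convex hull, while interior points such as $(0.1, 0.1)$ lie in both sets.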
(ii) Consider the projected spectrahedra
x1
2
W1 = x R : u 0,
1 + x2
1 + x2
0
1+u
u1 x1
u1
1
1 1
0,
0
u
1
u
1
2
2
1
2
x R2
.
2
u R , 0 1 1
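This is a semidefinite representation with three lifting variables.
Putting all of the above together we obtain the following result.
Theorem 6.46 ([13]). Let $S \subset \mathbb{R}^n$ be a compact convex set. Then $S$ is a projected
spectrahedron if and only if for every $u \in \partial S$ there exists $\delta > 0$ such that the
intersection $S \cap \bar{B}(u, \delta)$ is a projected spectrahedron.
Proof. The "only if" part is trivial, because the closed ball
\[
\bar{B}(u, \delta) = \{x : \|x - u\|_2 \le \delta\}
\]
is a spectrahedron. For the "if" part, suppose for every $u \in \partial S$ and some $\delta_u > 0$,
the set $S \cap \bar{B}(u, \delta_u)$ is a projected spectrahedron. Note that $\{B(u, \delta_u) : u \in \partial S\}$ is
an open cover for the compact set $\partial S$. So there are a finite number of balls, say,
$B(u_1, \delta_1), \ldots, B(u_L, \delta_L)$, to cover $\partial S$. Note that
\[
S = \operatorname{conv}(\partial S) = \operatorname{conv}\Bigl( \bigcup_{k=1}^{L} \bigl(\partial S \cap \bar{B}(u_k, \delta_k)\bigr) \Bigr)
\subseteq \operatorname{conv}\Bigl( \bigcup_{k=1}^{L} \bigl(S \cap \bar{B}(u_k, \delta_k)\bigr) \Bigr) \subseteq S.
\]
The sets $S \cap \bar{B}(u_k, \delta_k)$ are all bounded. By Theorem 6.42, we see that
\[
S = \operatorname{conv}\Bigl( \bigcup_{k=1}^{L} S \cap \bar{B}(u_k, \delta_k) \Bigr)
\]
is a projected spectrahedron.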
6.4.7
We now have all the tools to present a sufficient condition for a compact convex
semialgebraic set $S \subset \mathbb{R}^n$ to be a projected spectrahedron. The condition essentially
requires that the boundary of $S$ has positive curvature.
Theorem 6.47 ([13]). Suppose $S \subset \mathbb{R}^n$ is a compact convex set defined by
\[
S = \bigcup_{k=1}^{m} T_k, \qquad T_k := \{x \in \mathbb{R}^n : g_1^k(x) \ge 0, \ldots, g_{m_k}^k(x) \ge 0\},
\]
where $g_i^k$ are polynomials. If for every $u \in \partial S$ and every $g_i^k$ satisfying $g_i^k(u) = 0$,
$T_k$ has interior near $u$ (i.e., for any $\delta > 0$, the ball $B(u, \delta)$ intersects the interior
of $T_k$) and $g_i^k(x)$ is strictly quasi-concave at $u$, then $S$ is a projected spectrahedron.
Theorem 6.47 is proved by applying Theorem 6.46. It is enough to show that
for every $u \in \partial S$, there exists a ball $B(u, \delta)$ so that $S \cap B(u, \delta)$ is a projected
spectrahedron. Note that $S \cap B(u, \delta)$ is a finite union of intersections $T_k \cap B(u, \delta)$.
By Theorem 6.38, every $T_k \cap B(u, \delta)$ is a projected spectrahedron, under the assumption of strict quasi-concavity of the defining polynomials. A complete proof of
this result can be found in Theorem 4.5 of Helton and Nie [13].
In Theorem 6.47, if the set $S$ is not convex, but the other conditions are satisfied, then we can conclude that the convex hull of $S$ is a projected spectrahedron.
Here we give some remarks on the conditions in Theorems 6.31, 6.37, and 6.47.
Theorem 6.31 assumes that all $g_i$ are sos-concave, which is the strongest assumption, but its conclusion is also the strongest: (6.12) is an explicit representation of $S$
as a projected spectrahedron. Theorem 6.37 assumes that $g_i$ are either sos-concave
or strictly quasi-concave, which is weaker than Theorem 6.31, and its conclusion
is also weaker: the Putinar relaxation $S_N$ or the Schmüdgen relaxation $\hat{S}_N$ provides a representation of $S$ as a projected spectrahedron for some large enough $N$. Theorem 6.47 assumes the weakest condition, but
its conclusion is also the weakest: there exists a semidefinite representation of $S$
(an explicit description is typically quite complicated).
By comparing Theorems 6.17 and 6.47, we can see that the presented necessary
and sufficient conditions for semidefinite representability are not too far away from
each other. The difference between them is nonnegative versus positive curvature
and singularity versus nonsingularity.
6.4.8
Exercises
1
x1
x2
x1
1
x3
x2
x3 0,
1
1
x1 1 x2 1
x1 1
1
x3 1 0.
x2 1 x3 1
1
x1
x2
x2
0,
x3
x1
x2 0.
x3
Exercise 6.55. Let $P$ be the set of univariate quadratic polynomials that are either
nonnegative on $[-1, 0]$ or nonnegative on $[0, 1]$. Find a semidefinite representation
for the convex hull of $P$ with the smallest number of lifting variables.
Exercise 6.56. Prove Theorem 6.42. (Hint: use Lemma 6.41.)
Exercise 6.57. Let $T$ be a compact nonconvex set in $\mathbb{R}^n$. Its convex boundary is
defined as $\partial_c T := T \cap \partial \operatorname{conv}(T)$. Show that $\operatorname{conv}(\partial_c T) = \operatorname{conv}(T)$. Is this also
true if $T$ is not compact?
Bibliography
[1] A. A. Ahmadi and P. A. Parrilo. A convex polynomial that is not sos-convex.
Math. Program., 135:275–292, 2012.
[2] F. Alizadeh and D. Goldfarb. Second-order cone programming. Math. Program., 95:3–51, 2003.
[3] A. Ben-Tal and A. Nemirovski. Lectures on Modern Convex Optimization:
Analysis, Algorithms, and Engineering Applications, MPS/SIAM Ser. Optim.
SIAM, Philadelphia, 2001.
[4] G. Blekherman. Convex forms that are not sums of squares. Preprint, 2009.
http://arxiv.org/abs/0910.0656.
[5] D. P. Bertsekas. Convex Optimization Theory. Athena Scientific, Belmont, MA,
2009.
[6] J. Bochnak, M. Coste, and M.-F. Roy. Real Algebraic Geometry, Springer,
Berlin, 1998.
[7] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan. Linear Matrix Inequalities in System and Control Theory, SIAM Stud. Appl. Math. 15, SIAM,
Philadelphia, 1994.
[8] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University
Press, Cambridge, UK, 2004.
[9] P. Brändén. Obstructions to determinantal representability. Adv. Math.,
226:1202–1212, 2011.
[10] J. B. Conway. A Course in Functional Analysis. Grad. Texts in Math. Springer,
Berlin, 1985.
[11] D. Cox, J. Little, and D. O'Shea. Ideals, Varieties, and Algorithms. An Introduction to Computational Algebraic Geometry and Commutative Algebra, 3rd
edition, Undergrad. Texts in Math. Springer, New York, 2007.
[12] J. W. Helton and J. Nie. Semidefinite representation of convex sets. Math.
Program., 122:21–64, 2010.
[13] J. W. Helton and J. Nie. Sufficient and necessary conditions for semidefinite
representability of convex hulls and sets. SIAM J. Optim., 20:759–791, 2009.
[14] J. W. Helton and J. Nie. Structured semidefinite representation of some convex
sets. Proceedings of 47th IEEE Conference on Decision and Control (CDC),
Cancun, Mexico, Dec. 9–11, 2008, pp. 4797–4800.
[15] J. W. Helton and J. Nie. Semidefinite representation of convex sets and convex
hulls. In M. Anjos and J. Lasserre, editors, Handbook on Semidefinite, Cone
and Polynomial Optimization: Theory, Algorithms, Software and Applications,
to appear.
[32] J. Nie and J. Demmel. Minimum ellipsoid bounds for solutions of polynomial
systems via sum of squares. J. Global Optim., 33:511–525, 2005.
[33] P. A. Parrilo. Exact semidefinite representation for genus zero curves. Talk at
the Banff Workshop "Positive Polynomials and Optimization", Banff, Canada,
October 8–12, 2006.
[34] P. A. Parrilo and B. Sturmfels. Minimizing polynomial functions. In S. Basu
and L. Gonzalez-Vega, editors, Proceedings of the DIMACS Workshop on Algorithmic and Quantitative Aspects of Real Algebraic Geometry in Mathematics
and Computer Science (March 2001), American Mathematical Society, Providence, RI, 2003, pp. 83–100.
[35] D. Plaumann, B. Sturmfels, and C. Vinzant. Computing linear matrix representations of Helton-Vinnikov curves. In H. Dym, M. de Oliveira, and M. Putinar,
editors, Mathematical Methods in Systems, Optimization, and Control, Oper.
Theory Adv. Appl., Birkhäuser, Basel, 2011.
[36] S. Prajna, A. Papachristodoulou, P. Seiler, and P. Parrilo. SOSTOOLS User's
Guide. Website: http://www.mit.edu/parrilo/sosTOOLS/.
[37] M. Putinar. Positive polynomials on compact semi-algebraic sets, Indiana Univ.
Math. J., 42:969–984, 1993.
[38] K. Schmüdgen. The K-moment problem for compact semialgebraic sets. Math.
Ann., 289:203–206, 1991.
[39] M. Spivak. A Comprehensive Introduction to Differential Geometry. Vol. II,
2nd edition. Publish or Perish, Inc., Wilmington, DE, 1979.
[40] H. Wolkowicz, R. Saigal, and L. Vandenberghe, editors. Handbook of Semidefinite Programming. Kluwer, Amsterdam, 2000.
Chapter 7
Spectrahedral
Approximations of
Convex Hulls of
Algebraic Sets
João Gouveia and Rekha R. Thomas
7.1
Introduction
\[
\text{maximize} \ \Bigl\{ \langle c, x \rangle : A_0 + \sum_{i=1}^{n} A_i x_i \succeq 0 \Bigr\}
\]
iterations required by the procedure are lacking. The work presented in this chapter
was inspired by a question posed by Lovász in [19] that asked for a characterization
of ideals for which the first approximation in our hierarchy will yield a semidefinite
representation of the convex hull of the variety of the ideal. In Section 7.3 we answer
this question for finite varieties. The case of infinite varieties is far less understood.
We identify conditions that prevent finite convergence of these approximations to
the closure of the convex hull of the variety. However, again a full characterization
is missing. Thus, the material in this chapter offers both advances in spectrahedral
representations of algebraic sets as well as many avenues for further research.
This chapter is organized as follows. In Section 7.2 we explain the procedure for finding spectrahedral approximations of the convex hull of an algebraic
set. These techniques were developed in [8], coauthored with Parrilo. One of the
key theorems needed in this section (Theorem 7.6) was strengthened in this presentation with the help of Greg Blekherman. We illustrate the method with various
examples and explain the underlying computations. In Section 7.3 we discuss the
situations in which this method converges, either asymptotically or finitely, to an
exact semidefinite representation of the convex hull of the variety. The most useful
scenario is when the first approximation yields an exact semidefinite representation
of the convex hull of the variety. We characterize all finite varieties for which this
happens. We conclude in Section 7.4 with examples from combinatorial optimization where the underlying varieties are all finite. The methods we describe have
algorithmic impact on certain classes of combinatorial optimization problems and
the algebra becomes endowed with rich combinatorics in these cases.
7.2
The Method
gi fi : gi R[x], m N R[x].
I = f1 , . . . , fm =
i=1
It is not so clear how to work with this description. Even for a single linear polynomial $l$, checking whether $l(x)$ is nonnegative on $V_{\mathbb{R}}(I)$ is a difficult task. A natural
idea is to relax the condition $l|_{V_{\mathbb{R}}(I)} \ge 0$ to something easier to check, at the risk
of losing some of the $l(x)$ in the above intersection, and obtaining a superset of
$\operatorname{cl}(\operatorname{conv}(V_{\mathbb{R}}(I)))$. As seen already in Chapters 3 and 4, the classical method to
certify the nonnegativity of a polynomial on all of $\mathbb{R}^n$ is to write it as a sum of
squares (sos) of other polynomials. In our case, we just need to certify that $l(x)$ is
nonnegative on $V_{\mathbb{R}}(I)$, a subset of $\mathbb{R}^n$.
Let $\Sigma$ denote the set of all sos polynomials in $\mathbb{R}[x]$, $\mathbb{R}[x]_k$ the set of all
polynomials in $\mathbb{R}[x]$ of degree at most $k$, and $\Sigma_{2k}$ the set of all sos polynomials $\sum h_j^2$,
where $h_j \in \mathbb{R}[x]_k$. Nonnegativity of $l(x)$ on $V_{\mathbb{R}}(I)$ is guaranteed if
\[
l(x) = \sigma(x) + \sum_{i=1}^{m} g_i(x) f_i(x) \tag{7.1}
\]
for $\sigma(x) \in \Sigma$ and $g_i \in \mathbb{R}[x]$, since then for all $s \in V_{\mathbb{R}}(I)$, $l(s) = \sigma(s) \ge 0$. In
Chapter 3 we saw that semidefinite programming can be used to check whether a
polynomial is sos. In (7.1) we need to find both $\sigma(x)$ and the polynomials $g_i$ to
write $l(x)$ as sos mod $I$. Therefore, to check (7.1) in practice, we impose degree
restrictions and proceed in one of two possible ways.
(i) In the first method, we ask that $\sigma \in \Sigma_{2k}$ and $g_i f_i \in \mathbb{R}[x]_{2k}$ for a fixed positive
integer $k$ and, if so, say that $l(x)$ is $k$-sos mod $\{f_1, \ldots, f_m\}$. This is the basic
idea that underlies Lasserre's moment method for approximating the convex
hull of a semialgebraic set described in Chapter 6.
(ii) In the second method, we ask only that $\sigma \in \Sigma_{2k}$ for a fixed positive integer $k$,
which reduces (7.1) to $l(x) = \sigma(x) + h(x)$ where $h(x) \in I$. If this is the case, we
say that $l(x)$ is $k$-sos mod $I$. This method is more natural if one is interested
in the geometry of $V_{\mathbb{R}}(I)$ and $\operatorname{conv}(V_{\mathbb{R}}(I))$ as it removes the dependence of the
method on the choice of a particular generating set of $I$. The only issue is whether
the computation can be done in practice at the level of the ideal $I$ and not
the input $f_1, \ldots, f_m$.
Both methods yield a hierarchy of convex relaxations of $\operatorname{conv}(V_{\mathbb{R}}(I))$ obtained
as the intersection of all half spaces $\{x : l(x) \ge 0\}$ as $l(x)$ ranges over the linear
polynomials that are $k$-sos in the sense of the method. Since if $l(x)$ is $k$-sos mod
$\{f_1, \ldots, f_m\}$ then it is also $k$-sos mod $I$, method (ii) yields a relaxation that is no
worse than that from method (i) for each value of $k$. On the other hand, method
(ii) requires the knowledge of a basis of $\mathbb{R}[x]/I$ as we will see below, which for some
problems may be hard to compute in practice. To see the computational differences
that can occur between the two methods, consult Remark 7.14.
In this chapter we focus on method (ii). The $k$th iteration of (ii) yields a
closed convex set, called the $k$th theta body of $I$, defined as
\[
\mathrm{TH}_k(I) := \{x \in \mathbb{R}^n : l(x) \ge 0 \ \text{for all } l \text{ linear and } k\text{-sos mod } I\}.
\]
Clearly $V_{\mathbb{R}}(I)$, and hence $\operatorname{cl}(\operatorname{conv}(V_{\mathbb{R}}(I)))$, is contained in $\mathrm{TH}_k(I)$ for all $k$. Thus the
theta bodies of $I$ form a hierarchy of closed convex approximations of $\operatorname{conv}(V_{\mathbb{R}}(I))$
as follows:
\[
\mathrm{TH}_1(I) \supseteq \mathrm{TH}_2(I) \supseteq \cdots \supseteq \mathrm{TH}_k(I) \supseteq \mathrm{TH}_{k+1}(I) \supseteq \cdots \supseteq \operatorname{cl}(\operatorname{conv}(V_{\mathbb{R}}(I))).
\]
An immediate question is when this hierarchy converges to $\operatorname{cl}(\operatorname{conv}(V_{\mathbb{R}}(I)))$ either
finitely or asymptotically. Finite convergence allows an exact representation of
$\operatorname{cl}(\operatorname{conv}(V_{\mathbb{R}}(I)))$ as a theta body, which would be extremely useful if we can represent
and optimize over a theta body efficiently. We will show in Section 7.2.2 that each
$\mathrm{TH}_k(I)$ is the closure of a projected spectrahedron. This enables optimization
over a real variety using semidefinite programming. In Section 7.4, we will learn
the motivation for the name "theta bodies." We begin with some background on
working modulo a polynomial ideal.
7.2.1
Let $I \subseteq \mathbb{R}[x]$ be an ideal and $V_{\mathbb{R}}(I)$ be its real variety. For two polynomials $f, g \in
\mathbb{R}[x]$, if $f - g \in I$, then $f(s) = g(s)$ for all $s \in V_{\mathbb{R}}(I)$. If $f - g \in I$, then $f$ and $g$
are said to be congruent mod $I$, written as $f \equiv g$ mod $I$. Congruence mod $I$ is an
equivalence relation on $\mathbb{R}[x]$. The equivalence class of $f$ is denoted as $f + I$, and the
set of equivalence classes is denoted as $\mathbb{R}[x]/I$. The set $\mathbb{R}[x]/I$ is both an $\mathbb{R}$-vector
space and a ring over $\mathbb{R}$ where addition, scalar multiplication, and multiplication
are defined as follows. Given $f, g \in \mathbb{R}[x]$ and $\lambda \in \mathbb{R}$, $(f + I) + (g + I) = (f + g) + I$,
$\lambda(f + I) = \lambda f + I$, and $(f + I)(g + I) = f g + I$. We will denote vector space bases
of $\mathbb{R}[x]/I$ by $\mathcal{B}$ in this chapter. By the degree of an equivalence class $f + I$, we mean
the smallest degree of an element in the class. With this definition, we may assume
that the elements of $\mathcal{B}$ are listed in order of increasing degree. Further, for each
$k \in \mathbb{N}$, the set $\mathcal{B}_k$ of all elements in $\mathcal{B}$ of degree at most $k$ is then well-defined.
Computations in $\mathbb{R}[x]/I$ can be done via Gröbner bases of $I$. Recall that if
$G$ is any reduced Gröbner basis of $I$, then a polynomial $h$ lies in $I$ if and only
if the normal form of $h$ with respect to $G$ is zero. Therefore, $f \equiv g$ mod $I$ if
and only if the normal form of $f - g$ with respect to $G$ is zero, or equivalently,
$f$ and $g$ have the same normal form with respect to $G$. This provides an algorithm
to check whether two polynomials are congruent mod $I$. The unique normal form
of all polynomials in the same equivalence class serves as a canonical representative
for this class given $G$. If $M$ is the initial ideal of $I$ corresponding to the reduced
Gröbner basis $G$, then recall that the standard monomials of $M$ form an $\mathbb{R}$-vector
space basis for $\mathbb{R}[x]/I$. Therefore, the normal form of a polynomial with respect
to $G$ can be written as an $\mathbb{R}$-linear combination of the standard monomials of the
initial ideal $M$. The vector space $\mathbb{R}[x]/I$ has many other bases, some of which may
be better suited for computations than the standard monomial bases coming from
an initial ideal of $I$. See Chapter 3 for a discussion of alternative bases of $\mathbb{R}[x]$ and
hence $\mathbb{R}[x]/I$. In this chapter we will use only a standard monomial basis of $\mathbb{R}[x]/I$.
A quick tour of the algebraic notions needed in this chapter can be found in the
appendix. For a thorough introduction to the theory of Gröbner bases and related
notions, we refer the reader to [6].
We now come to sum of squares polynomials modulo an ideal $I$, and the question of how to check whether a
polynomial $f \in \mathbb{R}[x]$ is $k$-sos mod $I$. A polynomial
$f \in \mathbb{R}[x]$ is sos mod $I$ if $f \equiv \sum_{j=1}^{t} h_j^2$ mod $I$ for some $h_j \in \mathbb{R}[x]$, and $k$-sos mod $I$
if $h_j \in \mathbb{R}[x]_k$ for all $j$. Hence, the equivalence classes of polynomials that are sos
mod $I$ (respectively, $k$-sos mod $I$) are precisely those in
\[ \Sigma/I := \{ \sigma + I : \sigma \in \Sigma \} \]
(respectively, $\Sigma_{2k}/I$). It is worthwhile to note that many polynomials that are not
sos in $\mathbb{R}[x]$ can become sos mod an ideal $I$. For instance, the univariate linear
polynomial $x$ is congruent to $x^2$ mod the ideal $\langle x - x^2 \rangle \subseteq \mathbb{R}[x]$.
Let $[x]_k$ denote the vector of all monomials in $\mathbb{R}[x]_k$ in a fixed order, say degree
lexicographic. Recall from Chapter 3 that a polynomial $f \in \Sigma_{2k}$ if and only if there
exists a positive semidefinite matrix $A$, denoted $A \succeq 0$, such that $f = [x]_k^T A [x]_k$.
The matrix $A$ can be solved for using semidefinite programming, and a Cholesky
factorization of it as $A = V^T V$ yields an sos expression $\sum_j h_j^2$ for $f$, where $h_j(x)$
is the inner product of the $j$th row of $V$ and the vector of monomials $[x]_k$. This
method can be adapted to check whether $f$ is $k$-sos mod $I$ as follows. The vector
$[x]_k$ can be replaced by the vector of monomials from $\mathcal{B}_k$, denoted as $[x]_{\mathcal{B}_k}$, since
$\mathbb{R}[x]_k/I$ is spanned by $\mathcal{B}_k$. Since the size of $\mathcal{B}_k$ is no larger than the size of a
basis of $\mathbb{R}[x]_k$, this can decrease the size of the unknown matrix $A$ considerably,
making the final SDP much smaller than before. Setting up $A$ as a symmetric
matrix of indeterminates $A_{ij}$ and multiplying out $[x]_{\mathcal{B}_k}^T A [x]_{\mathcal{B}_k}$, we get a polynomial
$g \in \mathbb{R}[x]_{2k}$. Let the normal forms of $f$ and $g$ with respect to a reduced Gröbner
basis $G$ of $I$ be $\bar{f}$ and $\bar{g}$, respectively. Then since $f \equiv \bar{f}$ and $g \equiv \bar{g}$ mod $I$ and $\bar{f}$
and $\bar{g}$ are fully reduced with respect to $G$, we have that $f \equiv g$ mod $I$ if and only if
$\bar{f} = \bar{g}$. Therefore, to check if $f$ is $k$-sos mod $I$, we equate the coefficients of $\bar{f}$ and
$\bar{g}$ for like monomials and check whether the resulting linear system in the $A_{ij}$'s has
a solution with $A \succeq 0$.
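As a small worked illustration of this procedure, here is a sketch for the univariate ideal $I = \langle x - x^2 \rangle$ mentioned earlier, with $f = x$ and $k = 1$. The choices $\mathcal{B}_1 = \{1, x\}$ and the entry names $a_{11}, a_{12}, a_{22}$ are assumptions made for this illustration only; the reduction uses the single rule $x^2 \equiv x$ mod $I$.

```latex
\begin{align*}
g &= \begin{pmatrix} 1 & x \end{pmatrix}
     \begin{pmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{pmatrix}
     \begin{pmatrix} 1 \\ x \end{pmatrix}
   = a_{11} + 2a_{12}\,x + a_{22}\,x^2
   \;\equiv\; a_{11} + (2a_{12} + a_{22})\,x \mod I, \\
\bar{f} = \bar{g}
  \;&\Longleftrightarrow\; a_{11} = 0, \qquad 2a_{12} + a_{22} = 1 .
\end{align*}
% A \succeq 0 with a_{11} = 0 forces a_{12} = 0, hence a_{22} = 1:
\[
A = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = V^T V,
\qquad V = \begin{pmatrix} 0 & 1 \end{pmatrix},
\qquad h_1(x) = x,
\qquad x \equiv x^2 \mod I .
\]
```

The linear system together with $A \succeq 0$ has a unique solution here, and its factorization recovers the congruence $x \equiv x^2$ noted above.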
Example 7.1. Consider the polynomial $f(x, y) = x^4 + y^4 + 2x^2y^2 - x^2 + y^2$ and
the principal ideal $I = \langle f \rangle \subset \mathbb{R}[x, y]$. The real variety $V_{\mathbb{R}}(I)$, which is the set of real
zeros of $f$, is a Bernoulli lemniscate (shown in Figure 7.1) with foci $(\pm\frac{1}{\sqrt{2}}, 0)$.
It is easy to check that the horizontal line $y = \frac{1}{\sqrt{8}}$ is a bitangent to $V_{\mathbb{R}}(I)$ and
that $l(x, y) := -y + \frac{1}{\sqrt{8}}$ is nonnegative on $V_{\mathbb{R}}(I)$. Since $f$ has degree 4 and $l$ has
degree 1, $l$ cannot be 1-sos mod $I$ but has a chance to be 2-sos mod $I$. We apply
the method described above to verify this.
The set $\{f\}$ is a reduced Gröbner basis of $I$ with respect to every term order.
The initial ideal of $I$ under the total degree order with ties broken lexicographically
with $x > y$ is generated by $x^4$. Hence a basis $\mathcal{B}$ for $\mathbb{R}[x, y]/I$ is given by the infinite
set of standard monomials of $\langle x^4 \rangle \subset \mathbb{R}[x, y]$, which are all the monomials in $x$ and $y$
not divisible by $x^4$.
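\[
g = \begin{pmatrix} 1 & x & y & x^2 & xy & y^2 \end{pmatrix}
\begin{pmatrix}
a_{11} & a_{12} & a_{13} & a_{14} & a_{15} & a_{16} \\
a_{12} & a_{22} & a_{23} & a_{24} & a_{25} & a_{26} \\
a_{13} & a_{23} & a_{33} & a_{34} & a_{35} & a_{36} \\
a_{14} & a_{24} & a_{34} & a_{44} & a_{45} & a_{46} \\
a_{15} & a_{25} & a_{35} & a_{45} & a_{55} & a_{56} \\
a_{16} & a_{26} & a_{36} & a_{46} & a_{56} & a_{66}
\end{pmatrix}
\begin{pmatrix} 1 \\ x \\ y \\ x^2 \\ xy \\ y^2 \end{pmatrix}
\]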
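Equating coefficients of the normal forms of $l$ and $g$, as described above, restricts $A$ to matrices of the form
\[
\begin{pmatrix}
\frac{1}{\sqrt{8}} & 0 & -\frac{1}{2} & a_{14} & a_{15} & a_{16} \\
0 & a_{22} & -a_{15} & 0 & a_{25} & a_{26} \\
-\frac{1}{2} & -a_{15} & a_{33} & -a_{25} & -a_{26} & 0 \\
a_{14} & 0 & -a_{25} & a_{44} & 0 & a_{46} \\
a_{15} & a_{25} & -a_{26} & 0 & a_{55} & 0 \\
a_{16} & a_{26} & 0 & a_{46} & 0 & a_{44}
\end{pmatrix}
\]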
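\[
A = \begin{pmatrix}
2^{-3/2} & 0 & -1/2 & -2^{-3/2} & 0 & -2^{-3/2} \\
0 & 0 & 0 & 0 & 0 & 0 \\
-1/2 & 0 & 2^{1/2} & 0 & 0 & 0 \\
-2^{-3/2} & 0 & 0 & 2^{-1/2} & 0 & 2^{-1/2} \\
0 & 0 & 0 & 0 & 0 & 0 \\
-2^{-3/2} & 0 & 0 & 2^{-1/2} & 0 & 2^{-1/2}
\end{pmatrix}
\]
is positive semidefinite and satisfies the conditions given above. This matrix $A$
factors as $A = V^T V$ with
\[
V = \begin{pmatrix}
-2^{-5/4} & 0 & 0 & 2^{-1/4} & 0 & 2^{-1/4} \\
2^{-5/4} & 0 & -2^{1/4} & 0 & 0 & 0
\end{pmatrix},
\]
and hence,
\[
\frac{1}{\sqrt{8}} - y \equiv \frac{1}{4\sqrt{2}} \left( 2x^2 + 2y^2 - 1 \right)^2 + \sqrt{2} \left( y - \frac{1}{\sqrt{8}} \right)^2 \mod I.
\]
In general, finding exact sos expressions, as above, is difficult. This particular sos
decomposition was found by Bruce Reznick using a series of tricks. He showed that
\[
\left( \frac{1}{\sqrt{8}} - y \right) + \frac{1}{\sqrt{2}} \left( (x^2 + y^2)^2 - (x^2 - y^2) \right)
= \frac{1}{4\sqrt{2}} \left( 2x^2 + 2y^2 - 1 \right)^2 + \sqrt{2} \left( y - \frac{1}{\sqrt{8}} \right)^2 .
\]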
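The identity attributed to Reznick can be sanity-checked numerically. The following is a minimal sketch, not part of the original computation: it compares both sides of the identity at random points and rebuilds the Gram matrix from a two-row factor $V$. The function names (`lhs`, `rhs`, `max_gap`, `gram_from_factor`) and the sign convention chosen for the rows of `V` are assumptions made for this illustration.

```python
import math
import random

SQRT2 = math.sqrt(2.0)
SQRT8 = math.sqrt(8.0)


def f(x, y):
    """Lemniscate polynomial f = x^4 + y^4 + 2x^2y^2 - x^2 + y^2."""
    return x**4 + y**4 + 2 * x**2 * y**2 - x**2 + y**2


def lhs(x, y):
    """(1/sqrt(8) - y) + (1/sqrt(2)) * f(x, y)."""
    return (1 / SQRT8 - y) + f(x, y) / SQRT2


def rhs(x, y):
    """(1/(4 sqrt(2))) (2x^2 + 2y^2 - 1)^2 + sqrt(2) (y - 1/sqrt(8))^2."""
    return (2 * x**2 + 2 * y**2 - 1) ** 2 / (4 * SQRT2) + SQRT2 * (y - 1 / SQRT8) ** 2


def max_gap(samples=1000, seed=0):
    """Largest absolute difference between both sides over random sample points."""
    rng = random.Random(seed)
    gap = 0.0
    for _ in range(samples):
        x, y = rng.uniform(-2, 2), rng.uniform(-2, 2)
        gap = max(gap, abs(lhs(x, y) - rhs(x, y)))
    return gap


def gram_from_factor():
    """Return V^T V for a factor V indexed by the monomials (1, x, y, x^2, xy, y^2)."""
    V = [
        [-(2**-1.25), 0.0, 0.0, 2**-0.25, 0.0, 2**-0.25],
        [2**-1.25, 0.0, -(2**0.25), 0.0, 0.0, 0.0],
    ]
    return [[sum(row[i] * row[j] for row in V) for j in range(6)] for i in range(6)]
```

Agreement up to floating-point error is only numerical evidence for the identity, not a proof; an exact check would expand both sides symbolically.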
In practice, one can use an SDP solver to find $A$. To do this computation
in MATLAB with YALMIP [17], we input the following code:
sdpvar a14 a15 a16 a22 a25 a26 a33 a44 a46 a55
A=[ 1/sqrt(8)  0     -1/2   a14   a15   a16;
    0          a22   -a15   0     a25   a26;
    -1/2       -a15  a33    -a25  -a26  0  ;
    a14        0     -a25   a44   0     a46;
    a15        a25   -a26   0     a55   0  ;
    a16        a26   0      a46   0     a44];
We ran this code with SeDuMi 1.1 as the underlying SDP solver in YALMIP. The
matrix can now be recovered by simply typing double(A) and we obtain
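\[
A = \begin{pmatrix}
0.3536 & 0.0000 & -0.5000 & -0.4052 & 0.0000 & -0.1985 \\
0.0000 & 0.1034 & 0.0000 & 0.0000 & -0.2924 & 0.0000 \\
-0.5000 & 0.0000 & 1.1041 & 0.2924 & 0.0000 & 0.0000 \\
-0.4052 & 0.0000 & 0.2924 & 0.7071 & 0.0000 & 0.2936 \\
\vdots & & & & & \vdots
\end{pmatrix},
\]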
$0.3536000000 - y$
$0.707\,(x^4 + 2x^2y^2 + y^4 - x^2 + y^2)$
$10^{-11}\,(8.089965190\,x^2y - 3.247827064\,y^3)$.
The last command will actually display a list of polynomials whose squares
sum up to (approximately) l(x, y) + f (x, y). In our example, the following output
is obtained
which should be interpreted as saying that $l(x, y)$ is the sum of squares of the
polynomials shown on each line. Note that the last two polynomials in the list
above again point to the fact that the software only provided reasonable evidence
that $l(x, y)$ is 2-sos mod $I$.
The above computations also give a glimpse into the intertwining of algebraic
and numerical methods that is prevalent in convex algebraic geometry. The question
of whether a polynomial is a sum of squares modulo an ideal is purely algebraic.
However, the search for an sos expression is done via semidefinite programming,
which is solved using numerical methods. The answer provided by these numerical
solvers is often not exact. Massaging the numerical information into a certifiable
answer can sometimes be an art.
Example 7.2. Consider the polynomial $g(x, y) := y^2(1 - x^2) - (x^2 + 2y - 1)^2$ and
the ideal $I = \langle g(x, y) \rangle$ defining the bicorn curve shown in Figure 7.2. It is clear
that $y \ge 0$ over the curve. Instead of checking if $y$ is $k$-sos mod $I$ for some $k$ (which
is never the case as we will see in the next section), it is in general more useful to
search for the smallest $\mu$ such that $y + \mu$ is $k$-sos mod $I$. That way, if $y$ is not sos
mod $I$, we will at least obtain a valid inequality $y + \mu \ge 0$ on $V_{\mathbb{R}}(I)$, which will then
be valid for $\mathrm{TH}_k(I)$. In general, $y + \mu$ is $k$-sos mod $I$ if there exists some polynomial
$h(x, y)$ of degree $2k - 4$ such that $(y + \mu) + h(x, y)g(x, y)$ is sos. As before, this can
be checked easily using YALMIP.
k=2;
sdpvar x y mu
[h,c]=polynomial([x y],2*k-4);
g=y^2*(1-x^2)-(x^2+2*y-1)^2;
F=sos(y+mu-h*g);
solvesos(F,mu,[],[mu;c]);
7.2.2
Theta Bodies
We now come back to theta bodies of the ideal $I$ and their representations. Recall
that the $k$th theta body of $I$ is
\[ \mathrm{TH}_k(I) := \{ x \in \mathbb{R}^n : l(x) \ge 0 \text{ for all } l \text{ linear and } k\text{-sos mod } I \}. \]
Given any polynomial, it is possible to check whether it is $k$-sos mod $I$ using Gröbner
bases and semidefinite programming as seen in Section 7.2.1. The bottleneck in using the definition of $\mathrm{TH}_k(I)$ in practice is that it requires knowledge of all the linear
polynomials (infinitely many) that are $k$-sos mod $I$. To overcome this difficulty we
will now derive an alternative description of $\mathrm{TH}_k(I)$ as a projected spectrahedron
(up to closure), which enables computations via semidefinite programming.
We may assume that there are no linear polynomials in the ideal $I$ since
otherwise, some variable $x_i$ is congruent to a linear combination of other variables
mod $I$, and we may work in a smaller polynomial ring. Therefore, $\mathbb{R}[x]_1/I \cong \mathbb{R}[x]_1$
and $\{1 + I, x_1 + I, \ldots, x_n + I\}$ can be completed to a basis $\mathcal{B}$ of $\mathbb{R}[x]/I$. Recall
the definition of degree of $f + I$. We will assume that each element in a basis
$\mathcal{B} = \{f_i + I\}$ of $\mathbb{R}[x]/I$ is represented by a polynomial whose degree equals the degree
of its equivalence class, and that $\mathcal{B}$ is ordered so that $\deg(f_i + I) \le \deg(f_{i+1} + I)$.
Further, $\mathcal{B}_k$ denotes the ordered subset of $\mathcal{B}$ of degree at most $k$.
Definition 7.3. Let $I \subseteq \mathbb{R}[x]$ be an ideal. A basis $\mathcal{B} = \{f_0 + I, f_1 + I, \ldots\}$ of $\mathbb{R}[x]/I$
is a $\theta$-basis if it has the following properties:
1. $\mathcal{B}_1 = \{1 + I, x_1 + I, \ldots, x_n + I\}$.
2. If $\deg(f_i + I), \deg(f_j + I) \le k$, then $f_i f_j + I$ is in the $\mathbb{R}$-span of $\mathcal{B}_{2k}$.
Our goal will be to first express the $k$th theta body $\mathrm{TH}_k(I)$ as the closure
of a certain set of linear functionals on the $k$-sos polynomials mod $I$. This will be
achieved in Theorem 7.6. In the case where $I$ contains the polynomials $x_i^2 - x_i$
for all $i = 1, \ldots, n$, the closure can be removed (Theorem 7.8). Such ideals appear
in combinatorial optimization and hence this result will have an important role in
Section 7.4. After this, we use a $\theta$-basis of the quotient ring $\mathbb{R}[x]/I$ to turn the
description of $\mathrm{TH}_k(I)$ in Theorem 7.6 into an explicit semidefinite representation.
This allows concrete computations and examples. We proceed toward Theorem 7.6.
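In what follows, we identify a linear polynomial $\alpha + \langle a, x \rangle \in \mathbb{R}[x]_1$ with the
vector $(\alpha, a) \in \mathbb{R}^{n+1}$. Let $\Sigma_k^1(I) := \{f + I : f \in \mathbb{R}[x]_1,\ f \text{ is } k\text{-sos mod } I\}$. Then $\Sigma_k^1(I)$
is a cone in the vector space $\mathbb{R}[x]_1/I \cong \mathbb{R}[x]_1$, and its dual cone $\Sigma_k^1(I)^*$ lives in
$(\mathbb{R}[x]_1/I)^* \cong \mathbb{R}[x]_1^* \cong \mathbb{R}^{n+1}$. Thus,
\[ \Sigma_k^1(I)^* = \{ (t, x) \in \mathbb{R} \times \mathbb{R}^n : \alpha t + \langle a, x \rangle \ge 0 \text{ for all } (\alpha, a) \in \Sigma_k^1(I) \}. \]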
Proof. Since $\{1\} \times Q_k(I) = S_k(I) \cap H$, we have $\{1\} \times \mathrm{cl}(Q_k(I)) = \mathrm{cl}(S_k(I) \cap H)$.
Since $\mathrm{cl}(S_k(I)) = \Sigma_k^1(I)^*$, it follows from (7.2) that $\{1\} \times \mathrm{TH}_k(I) = \mathrm{cl}(S_k(I)) \cap H$.
Therefore, the theorem will follow if we can show that
\[ \mathrm{cl}(S_k(I)) \cap H = \mathrm{cl}(S_k(I) \cap H). \]
By Lemma 7.5, this equality holds if $H$ intersects $S_k(I)$ in its relative interior. Again, by Lemma 7.5, $\mathrm{relint}(\Sigma_k^1(I)^*) \subseteq S_k(I)$. Lemma 7.4 showed that $H$
intersects the relative interior of $\Sigma_k^1(I)^*$ and hence the relative interior of $S_k(I)$.
We now focus on an important situation where the closure is not needed in
Theorem 7.6. In many cases in practice, we are interested in finding the convex hull
of a set $S \subset \mathbb{R}^n$ that may not be presented as the real variety of an ideal. However,
the approximation $\mathrm{TH}_k(I)$ of $\mathrm{conv}(S)$ is defined with respect to an ideal $I$ whose
real variety is $S$. In this case, the canonical choice for such an ideal is the vanishing
ideal of $S$, denoted as $I(S)$, which consists of all polynomials in $\mathbb{R}[x]$ that vanish
on $S$. The real radical of an ideal $I \subseteq \mathbb{R}[x]$ is the ideal
\[ \sqrt[\mathbb{R}]{I} = \Big\{ f \in \mathbb{R}[x] : f^{2m} + \sum g_i^2 \in I,\ m \in \mathbb{N},\ g_i \in \mathbb{R}[x] \Big\}, \]
and the ideal $I$ is said to be real radical if $I = \sqrt[\mathbb{R}]{I}$. The real Nullstellensatz [21]
states that $I$ is real radical if and only if $I = I(V_{\mathbb{R}}(I))$. This is the analogue of
Hilbert's Nullstellensatz for real algebraic varieties. Computing any ideal $I$ such that
$V_{\mathbb{R}}(I) = S$ might be hard, and in general, computing $I(S)$, given $S$, might also be
hard. However, in many cases of practical interest, $I(S)$ is available. A large source
of such examples is combinatorial optimization, where $S$ is usually a finite set of
0/1 points for which a generating set for $I(S)$ can be computed using combinatorial
arguments. We will see several such examples in Section 7.4. If $S$ is a subset of
$\{0, 1\}^n$ and $I = I(S)$, then Theorem 7.6 can be improved to Theorem 7.8. We first
prove a lemma.
Lemma 7.7. Let $J$ be any ideal that contains $x_i^2 - x_i$ for all $i = 1, \ldots, n$. Then
$1 + J$ is in the relative interior of $\Sigma_k(J) = \{f + J : f \text{ is } k\text{-sos mod } J\}$.
Proof. Let $I := \langle x_i^2 - x_i \text{ for all } i = 1, \ldots, n \rangle$. We will first show that $1 + I$ is in
the relative interior of $\Sigma_k(I) \subseteq \mathbb{R}[x]_{2k}/I$. The cone $\Sigma_k(J)$ is a projection of $\Sigma_k(I)$
since $I \subseteq J$, and hence, if $1 + I \in \mathrm{relint}(\Sigma_k(I))$, then $1 + J \in \mathrm{relint}(\Sigma_k(J))$.
Since $\Sigma_k(I)$ is a cone in the vector space $\mathbb{R}[x]_{2k}/I$, to prove that $1 + I$ lies in its
relative interior we will show that for any polynomial $p \in \mathbb{R}[x]_{2k}$, $(1 + \epsilon p) + I \in \Sigma_k(I)$ for
some $\epsilon > 0$. Since we are working modulo $I$, we may assume that every monomial
in $p$ is square-free. Further, since every monomial is a square modulo $I$, it suffices
to show that $(1 - \epsilon q) + I \in \Sigma_k(I)$ for any square-free monomial $q$ of degree at most
$2k$ and some $\epsilon > 0$. Write $q = q_1 q_2$ for some square-free monomials $q_1, q_2$ of degree
at most $k$. Now note that
\[ (1 - q_2)^2 = 1 - 2q_2 + q_2^2 \equiv 1 - q_2 \mod I, \text{ and} \]
\[ (1 - q_1 + q_2)^2 = 1 + q_1^2 + q_2^2 - 2q_1 + 2q_2 - 2q_1 q_2 \equiv 1 - q_1 + 3q_2 - 2q_1 q_2 \mod I. \]
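The two congruences in this proof can be checked mechanically. The sketch below is an illustration under stated assumptions rather than anything from the text: it models the quotient by $\langle x_i^2 - x_i \rangle$ by storing each square-free monomial as a `frozenset` of variable names, so that multiplying monomials is a set union. The helper names `mono`, `mul`, and `lin`, and the sample monomials $q_1 = x_1 x_2$, $q_2 = x_3$, are hypothetical choices.

```python
from collections import defaultdict

ONE = {frozenset(): 1}  # the constant polynomial 1


def mono(*variables):
    """Square-free monomial in the given variables, with coefficient 1."""
    return {frozenset(variables): 1}


def mul(p, q):
    """Product modulo x_i^2 = x_i: exponents collapse, so supports are united."""
    out = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[m1 | m2] += c1 * c2
    return {m: c for m, c in out.items() if c != 0}


def lin(*terms):
    """Linear combination of polynomials given as (coefficient, polynomial) pairs."""
    out = defaultdict(int)
    for coeff, poly in terms:
        for m, c in poly.items():
            out[m] += coeff * c
    return {m: c for m, c in out.items() if c != 0}


q1 = mono("x1", "x2")
q2 = mono("x3")

# (1 - q2)^2 reduces to 1 - q2
square_a = mul(lin((1, ONE), (-1, q2)), lin((1, ONE), (-1, q2)))
expected_a = lin((1, ONE), (-1, q2))

# (1 - q1 + q2)^2 reduces to 1 - q1 + 3 q2 - 2 q1 q2
s = lin((1, ONE), (-1, q1), (1, q2))
square_b = mul(s, s)
expected_b = lin((1, ONE), (-1, q1), (3, q2), (-2, mul(q1, q2)))
```

Comparing `square_a` with `expected_a` and `square_b` with `expected_b` confirms both reductions for this particular choice of $q_1$ and $q_2$; the general statement follows from the same expansion.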
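\[
[x]_{\mathcal{B}_3} [x]_{\mathcal{B}_3}^T =
\begin{pmatrix}
1 & x & x^2 & x^3 \\
x & x^2 & x^3 & x^4 \\
x^2 & x^3 & x^4 & x^5 \\
x^3 & x^4 & x^5 & x^6
\end{pmatrix},
\]
which is entrywise equivalent mod $I$ to
\[
\begin{pmatrix}
1 & x & x^2 & x^3 \\
x & x^2 & x^3 & x^3 + x^2 - x \\
x^2 & x^3 & x^3 + x^2 - x & 2x^3 - x \\
x^3 & x^3 + x^2 - x & 2x^3 - x & 2x^3 + x^2 - 2x
\end{pmatrix}.
\]
\[
M_{\mathcal{B}_3}(y) =
\begin{pmatrix}
1 & y_1 & y_2 & y_3 \\
y_1 & y_2 & y_3 & y_3 + y_2 - y_1 \\
y_2 & y_3 & y_3 + y_2 - y_1 & 2y_3 - y_1 \\
y_3 & y_3 + y_2 - y_1 & 2y_3 - y_1 & 2y_3 + y_2 - 2y_1
\end{pmatrix}.
\]
The reduced moment matrices $M_{\mathcal{B}_1}(y)$ and $M_{\mathcal{B}_2}(y)$ are the upper left $2 \times 2$
and $3 \times 3$ principal submatrices of $M_{\mathcal{B}_3}(y)$.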
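The entries of this reduced moment matrix can be reproduced with a few lines of code. The sketch below assumes only the relation $x^4 \equiv x^3 + x^2 - x$ mod $I$, which is read off from the reduced matrix above; the function names `reduce_power` and `moment_matrix` are hypothetical. Each normal form is stored as its coefficient list with respect to $(1, x, x^2, x^3)$, and linearization replaces $x^d$ by $y_d$ with $y_0 = 1$.

```python
# Coefficients of the normal form of x^4 with respect to (1, x, x^2, x^3),
# taken from the relation x^4 = x^3 + x^2 - x (mod I) visible in the matrix above.
RELATION = [0, -1, 1, 1]


def reduce_power(n):
    """Coefficient list (for 1, x, x^2, x^3) of the normal form of x^n."""
    if n < 4:
        coeffs = [0, 0, 0, 0]
        coeffs[n] = 1
        return coeffs
    prev = reduce_power(n - 1)
    shifted = [0] + prev[:3]  # multiply the lower-degree part by x
    top = prev[3]             # the x^3 term becomes x^4, which is rewritten
    return [s + top * r for s, r in zip(shifted, RELATION)]


def moment_matrix(y1, y2, y3):
    """Evaluate the 4x4 reduced moment matrix at the point (y1, y2, y3)."""
    y = [1, y1, y2, y3]
    return [
        [sum(c * v for c, v in zip(reduce_power(i + j), y)) for j in range(4)]
        for i in range(4)
    ]
```

For instance, `reduce_power(5)` encodes $2x^3 - x$ and `reduce_power(6)` encodes $2x^3 + x^2 - 2x$, matching the last row of the matrix. Evaluating `moment_matrix` at the moments $(s, s^2, s^3)$ of a point $s$ satisfying the relation, such as $s = 1$ or $s = -1$, returns the rank-one matrix of powers of $s$.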
Example 7.11. Consider the ideal $I = \langle x^4 - y^2 - z^2,\ x^4 + x^2 + y^2 - 1 \rangle$. Using a
computer algebra package such as Macaulay2 [10] one can calculate a total degree
reduced Gröbner basis of $I$ as follows:
Macaulay2, version 1.3
i1 : R
i2 : I
i3 : G
o3 = |
which says that this Gröbner basis consists of the two polynomials
\[ x^2 + 2y^2 + z^2 - 1 \quad \text{and} \quad 4y^4 + 4y^2z^2 + z^4 - 5y^2 - 3z^2 + 1. \]
A basis for the quotient ring $\mathbb{R}[x, y, z]/I$ is given by the standard monomials of the
initial ideal $\langle x^2, y^4 \rangle$, which gives the following partial bases:
\begin{align*}
\mathcal{B}_1 &= \{1, x, y, z\}, \\
\mathcal{B}_2 &= \mathcal{B}_1 \cup \{xy, y^2, xz, yz, z^2\}, \\
\mathcal{B}_3 &= \mathcal{B}_2 \cup \{xy^2, y^3, xyz, y^2z, xz^2, yz^2, z^3\}, \\
\mathcal{B}_4 &= \mathcal{B}_3 \cup \{xy^3, xy^2z, y^3z, xyz^2, y^2z^2, xz^3, yz^3, z^4\}.
\end{align*}
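\[
\begin{array}{cccccccc}
x & y & z & xy & y^2 & xz & yz & z^2 \\
y_1 & y_2 & y_3 & y_4 & y_5 & y_6 & y_7 & y_8 \\[4pt]
xy^2 & y^3 & xyz & y^2z & xz^2 & yz^2 & z^3 & \\
y_9 & y_{10} & y_{11} & y_{12} & y_{13} & y_{14} & y_{15} & \\[4pt]
xy^3 & xy^2z & y^3z & xyz^2 & y^2z^2 & xz^3 & yz^3 & z^4 \\
y_{16} & y_{17} & y_{18} & y_{19} & y_{20} & y_{21} & y_{22} & y_{23}.
\end{array}
\]
\[
M_{\mathcal{B}_2}(y) =
\begin{pmatrix}
1 & y_1 & y_2 & y_3 & y_4 & y_5 & y_6 & y_7 & y_8 \\
  & T_1 & y_4 & y_6 & T_2 & y_9 & T_3 & y_{11} & y_{13} \\
  &     & y_5 & y_7 & y_9 & y_{10} & y_{11} & y_{12} & y_{14} \\
  &     &     & y_8 & y_{11} & y_{12} & y_{13} & y_{14} & y_{15} \\
  &     &     &     & T_4 & y_{16} & T_5 & y_{17} & y_{19} \\
  &     &     &     &     & T_6 & y_{17} & y_{18} & y_{20} \\
  &     &     &     &     &     & T_7 & y_{19} & y_{21} \\
  &     &     &     &     &     &     & y_{20} & y_{22} \\
  &     &     &     &     &     &     &        & y_{23}
\end{pmatrix},
\]
where we have filled in only the upper triangular region. The unknowns $T_1, T_2, \ldots$
stand for the following expressions:
\begin{align*}
T_1 &= -2y_5 - y_8 + 1, \\
T_2 &= -2y_{10} - y_{14} + y_2, \\
T_3 &= -2y_{12} - y_{15} + y_3, \\
T_4 &= y_{20} + \tfrac{1}{2}y_{23} - \tfrac{3}{2}y_5 - \tfrac{3}{2}y_8 + \tfrac{1}{2}, \\
T_5 &= -2y_{18} - y_{22} + y_7, \\
T_6 &= -y_{20} - \tfrac{1}{4}y_{23} + \tfrac{5}{4}y_5 + \tfrac{3}{4}y_8 - \tfrac{1}{4}, \\
T_7 &= -2y_{20} - y_{23} + y_8 .
\end{align*}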
The $T_i$'s can be calculated using Macaulay2 by first finding the normal form of the
needed monomial with respect to the Gröbner basis that was calculated and then
linearizing using the $y_i$'s. For instance, $T_2$ is the linearization of the normal form
of $x^2y$, which by the calculation below is $-2y^3 - yz^2 + y$.
i6 : x^2*y%G

         3      2
o6 = - 2y  - y*z  + y
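The same normal forms can be reproduced outside Macaulay2 by applying the two rewriting rules given by the Gröbner basis elements stated above, namely $x^2 \to 1 - 2y^2 - z^2$ and $y^4 \to -y^2z^2 - \tfrac14 z^4 + \tfrac54 y^2 + \tfrac34 z^2 - \tfrac14$. The following is a sketch under those assumptions; polynomials are dictionaries from exponent triples $(a, b, c)$ for $x^a y^b z^c$ to exact rational coefficients, and the name `normal_form` is hypothetical.

```python
from fractions import Fraction as Fr

# Rewriting rules from the two Groebner basis elements (leading terms x^2 and y^4).
X2_RULE = {(0, 0, 0): Fr(1), (0, 2, 0): Fr(-2), (0, 0, 2): Fr(-1)}
Y4_RULE = {
    (0, 2, 2): Fr(-1),
    (0, 0, 4): Fr(-1, 4),
    (0, 2, 0): Fr(5, 4),
    (0, 0, 2): Fr(3, 4),
    (0, 0, 0): Fr(-1, 4),
}


def normal_form(poly):
    """Rewrite until no monomial is divisible by x^2 or y^4."""
    poly = dict(poly)
    while True:
        target = next((m for m in poly if m[0] >= 2 or m[1] >= 4), None)
        if target is None:
            return {m: c for m, c in poly.items() if c != 0}
        coeff = poly.pop(target)
        a, b, c = target
        if a >= 2:
            rest, rule = (a - 2, b, c), X2_RULE
        else:
            rest, rule = (a, b - 4, c), Y4_RULE
        # Replace the reducible factor by the right-hand side of the rule.
        for (ra, rb, rc), rcoef in rule.items():
            m = (rest[0] + ra, rest[1] + rb, rest[2] + rc)
            poly[m] = poly.get(m, 0) + coeff * rcoef


nf_x2y = normal_form({(2, 1, 0): Fr(1)})    # normal form of x^2 y
nf_x2y2 = normal_form({(2, 2, 0): Fr(1)})   # normal form of x^2 y^2
nf_x2yz = normal_form({(2, 1, 1): Fr(1)})   # normal form of x^2 y z
```

Linearizing `nf_x2y`, `nf_x2y2`, and `nf_x2yz` with the variable table above gives the expressions for $T_2$, $T_4$, and $T_5$. This is plain term rewriting for this specific basis, not a general Gröbner basis implementation.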
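The reduced moment matrix $M_{\mathcal{B}_k}(y)$ can also be defined in terms of linear
functionals on $\mathbb{R}[x]_{2k}/I$. For a vector $y = (y_b) \in \mathbb{R}^{\mathcal{B}_{2k}}$, define $L_y \in (\mathbb{R}[x]_{2k}/I)^*$
as $L_y(b) := y_b$ for all $b \in \mathcal{B}_{2k}$. Then every $L \in (\mathbb{R}[x]_{2k}/I)^*$ is equal to $L_y$ for
$y = (L(b) : b \in \mathcal{B}_{2k}) \in \mathbb{R}^{\mathcal{B}_{2k}}$. If $y \in \mathbb{R}^{\mathcal{B}_{2k}}$, let $y_0 := y_{1+I}$ and $y_i := y_{x_i + I}$ for $i = 1, \ldots, n$.
Further, let $\pi_{\mathbb{R}^n}$ be the projection map that sends $y \in \mathbb{R}^{\mathcal{B}_{2k}}$ to $(y_1, \ldots, y_n) \in \mathbb{R}^n$.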
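Lemma 7.12.
1. For a vector $y \in \mathbb{R}^{\mathcal{B}_{2k}}$ with $y_0 = 1$, the entry of $M_{\mathcal{B}_k}(y)$ indexed by $b_i, b_j \in \mathcal{B}_k$
is $L_y(b_i b_j)$.
2. $M_{\mathcal{B}_k}(y) \succeq 0 \iff L_y(f^2 + I) \ge 0$ for all $f + I \in \mathbb{R}[x]_k/I$.
Proof. The first part follows from the definition of $M_{\mathcal{B}_k}(y)$ and $L_y$. For $f + I \in
\mathbb{R}[x]_k/I$, let $\hat{f}$ be the unique vector in $\mathbb{R}^{\mathcal{B}_k}$ such that $f + I = \sum_{b_i \in \mathcal{B}_k} \hat{f}_i b_i$. Therefore,
$f^2 + I = \sum_{b_i, b_j \in \mathcal{B}_k} \hat{f}_i \hat{f}_j (b_i b_j)$, which implies that
\[ L_y(f^2 + I) = \sum_{b_i, b_j \in \mathcal{B}_k} \hat{f}_i \hat{f}_j L_y(b_i b_j) = \hat{f}^T M_{\mathcal{B}_k}(y) \hat{f}. \]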
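We then get $\mathrm{cl}(Q_{\mathcal{B}_2}(I)) \approx [-1.0000, 1.0417]$, and we will later see that it is actually
exactly $[-1, \frac{25}{24}]$.
To finish, we compute $Q_{\mathcal{B}_3}(I) = \{y_1 : \exists\, y \in \mathbb{R}^3 \text{ s.t. } M_{\mathcal{B}_3}(y) \succeq 0\}$. This is the
projection onto the $y_1$-coordinate of the spectrahedron in $\mathbb{R}^3$ described by all the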
Figure 7.4. The variety of Example 7.11 and its first theta body.
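\[
\begin{array}{c|cccccc}
 & 1 & x_1 & x_2 & x_1^2 & x_1x_2 & x_2^2 \\ \hline
1 & 1 & y_1 & y_2 & y_3 & y_4 & y_5 \\
x_1 & y_1 & y_3 & y_4 & y_6 & 1 & y_7 \\
x_2 & y_2 & y_4 & y_5 & 1 & y_7 & y_8 \\
x_1^2 & y_3 & y_6 & 1 & y_9 & y_1 & y_2 \\
x_1x_2 & y_4 & 1 & y_7 & y_1 & y_2 & y_{10} \\
x_2^2 & y_5 & y_7 & y_8 & y_2 & y_{10} & y_{11}
\end{array}
\]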
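If $M_{\mathcal{B}_2}(y) \succeq 0$, then the principal minor indexed by $x_1$ and $x_1x_2$ implies that
$y_2 y_3 \ge 1$, and so in particular, $y_2 \ne 0$ for all $y \in Q_{\mathcal{B}_2}(I)$; in fact $y_2 > 0$, because
the diagonal entry $y_3$ is nonnegative. However, since $Q_{\mathcal{B}_2}(I) \supseteq
\mathrm{conv}(V_{\mathbb{R}}(I)) = \{(s_1, s_2) \in \mathbb{R}^2 : s_2 > 0\}$, it must be that $Q_{\mathcal{B}_2}(I) = \mathrm{conv}(V_{\mathbb{R}}(I))$,
which shows that $Q_{\mathcal{B}_2}(I)$ is not closed.
We will see in the next section that when $S$ is a finite set of points in $\mathbb{R}^n$,
the ideal $I = I(S)$ of all polynomials that vanish on $S$ has the property that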
$\mathrm{TH}_l(I) = \mathrm{conv}(V_{\mathbb{R}}(I)) = \mathrm{conv}(S)$ for a finite $l$ that depends on $I$. Moreover, since
$\mathrm{conv}(S) \subseteq Q_{\mathcal{B}_l}(I) \subseteq \mathrm{TH}_l(I)$, we also get that $Q_{\mathcal{B}_l}(I)$ is closed. Even in this case,
$Q_{\mathcal{B}_k}(I)$ may not be closed for some $k < l$.
Example 7.16. Consider the finite set of points $S = \{(t, 1/t^2) : t = \pm 1, \ldots, \pm 7\}$
lying on the curve $x_1^2 x_2 = 1$. Then
\[ I(S) = \langle x_1^2 x_2 - 1,\ (x_1^2 - 1)(x_1^2 - 4)(x_1^2 - 9)(x_1^2 - 16)(x_1^2 - 25)(x_1^2 - 36)(x_1^2 - 49) \rangle. \]
This is a zero-dimensional ideal, and a basis for $\mathbb{R}[x_1, x_2]/I(S)$ is given by
\[ \mathcal{B} = \{1, x_1, x_2, x_1^2, x_1x_2, x_2^2, x_1x_2^2, x_1^3, x_2^3, x_1x_2^3, x_1^4, x_2^4, x_1^5, x_1x_2^4\} + I. \]
In particular, $\mathcal{B}_4$ is the same as the $\mathcal{B}_4$ in Example 7.15 and the initial ideal of $I(S)$
whose standard monomials are the monomials in $\mathcal{B}$ is generated by $\{x_1^2 x_2, x_2^5, x_1^6\}$.
Therefore, the reduced moment matrix $M_{\mathcal{B}_2}(y)$ of $I(S)$ and the set $Q_{\mathcal{B}_2}(I(S))$ agree with those in Example 7.15, which implies that $Q_{\mathcal{B}_2}(I(S))$ is not closed.
Another natural question is whether the theta bodies of different ideals with
the same real variety can have drastically different behaviors, especially with respect
to convergence to the convex hull of the variety. For instance, an ideal $I$ and its
real radical $\sqrt[\mathbb{R}]{I}$ have the same real variety and, since $I \subseteq \sqrt[\mathbb{R}]{I}$, $\mathrm{TH}_k(\sqrt[\mathbb{R}]{I}) \subseteq \mathrm{TH}_k(I)$
for all $k$.
Theorem 7.17. Fix an ideal $I$. Then there exists a function $\Psi : \mathbb{N} \to \mathbb{N}$ such that
$\mathrm{TH}_{\Psi(k)}(I) \subseteq \mathrm{TH}_k(\sqrt[\mathbb{R}]{I})$ for all $k$.
We refer the reader to [9, Section 2.2] for a proof. The main message to take
away from this result is that whether or not the theta body hierarchy of an ideal
converges to $\mathrm{cl}(\mathrm{conv}(V_{\mathbb{R}}(I)))$ is determined by the real variety of $I$. In particular,
whether the theta body sequence of an ideal converges to $\mathrm{cl}(\mathrm{conv}(V_{\mathbb{R}}(I)))$ in finitely
many steps, or not, is determined by $\sqrt[\mathbb{R}]{I}$.
7.2.3
Possible Extensions
The focus of this chapter is on polynomial equations and sums of squares relaxations. However, all this theory can potentially be adapted to work in some more
complicated cases. In this section we give examples of some constructions that give
a flavor of possible extensions. Similar constructions were also seen in Chapter 6,
and we refer to [22] for a more systematic study of the types of techniques we will
see below (in a slightly different setting).
Example 7.18. The theta body sequence can be modified to deal with polynomial inequalities, using Lasserre's ideas. Given an ideal $I$ and some polynomials $g_1, \ldots, g_t$, we might want to find the convex hull of the semialgebraic set
$S = \{x \in V_{\mathbb{R}}(I) : g_1(x) \ge 0, \ldots, g_t(x) \ge 0\}$. To do this we use shifted reduced
moment matrices in addition to the reduced moment matrices of $I$.
\[
\begin{pmatrix}
1 & x & y & w_{20} & w_{11} & w_{02} \\
x & w_{20} & w_{11} & w_{30} & w_{21} & w_{12} \\
y & w_{11} & w_{02} & w_{21} & w_{12} & w_{03} \\
w_{20} & w_{30} & w_{21} & w_{20} - w_{02} & w_{31} & w_{22} \\
w_{11} & w_{21} & w_{12} & w_{31} & w_{22} & w_{13} \\
w_{02} & w_{12} & w_{03} & w_{22} & w_{13} & w_{04}
\end{pmatrix},
\]
where $w_{ij}$ is the linearization of $x^i y^j$. The combinatorial moment matrix shifted by
$x$ and truncated at $k = 1$ is
\[
\begin{pmatrix}
x & w_{20} & w_{11} \\
w_{20} & w_{30} & w_{21} \\
w_{11} & w_{21} & w_{12}
\end{pmatrix}.
\]
If we force both matrices to be positive semidefinite and project over the $x, y$ coordinates, we get an approximation of the convex hull of the right half of the lemniscate,
as shown in Figure 7.7. By increasing the truncation parameter of the reduced moment matrix and the shifted moment matrix we get better approximations to the
convex hull.
Note that in this example we are essentially searching for certificates of nonnegativity of the form $l(x, y) \equiv \sigma_0(x, y) + x\,\sigma_1(x, y)$ mod $I$, where $\sigma_0$ and $\sigma_1$ are
2-sos and 1-sos, respectively.
\[
\begin{pmatrix}
1 & x & y & w_{20} & w_{11} & w_{02} \\
x & w_{20} & w_{11} & w_{30} & w_{21} & w_{12} \\
y & w_{11} & w_{02} & w_{21} & w_{12} & w_{03} \\
w_{20} & w_{30} & w_{21} & w_{30} - w_{02} & w_{31} & w_{22} \\
w_{11} & w_{21} & w_{12} & w_{31} & w_{22} & w_{13} \\
w_{02} & w_{12} & w_{03} & w_{22} & w_{13} & w_{04}
\end{pmatrix},
\]
where $w_{ij}$ is a variable that linearizes the monomial $x^i y^j$, and so the rows and
columns are indexed by $\{1, x, y, x^2, xy, y^2\}$. One can in this case strengthen the
condition by adding a new row and column to the matrix, indexed not by a monomial
but by the fraction $y/x$ that we linearize as $w_{-1,1}$. We then use the same strategy as
before, of linearizing all resulting products modulo the relation $x^4 = x^3 - y^2$ (which
allows us to get rid of $w_{40}$) and the relations $y^2/x = x^2 - x^3$ and $y^2/x^2 = x - x^2$ (which
eliminates two more variables). This new pseudomoment matrix is given by
\[
M(x, y, w) =
\begin{pmatrix}
1 & x & y & w_{20} & w_{11} & w_{02} & w_{-1,1} \\
x & w_{20} & w_{11} & w_{30} & w_{21} & w_{12} & y \\
y & w_{11} & w_{02} & w_{21} & w_{12} & w_{03} & w_{20} - w_{30} \\
w_{20} & w_{30} & w_{21} & w_{30} - w_{02} & w_{31} & w_{22} & w_{11} \\
w_{11} & w_{21} & w_{12} & w_{31} & w_{22} & w_{13} & w_{02} \\
w_{02} & w_{12} & w_{03} & w_{22} & w_{13} & w_{04} & w_{-1,3} \\
w_{-1,1} & y & w_{20} - w_{30} & w_{11} & w_{02} & w_{-1,3} & x - w_{20}
\end{pmatrix}.
\]
Since the original moment matrix is a submatrix of $M(x, y, w)$, the body $Q =
\{(x, y) : \exists\, w \text{ s.t. } M(x, y, w) \succeq 0\}$ must be contained in $\mathrm{TH}_2(\langle p \rangle)$, and a simple
numeric computation seems to show that $Q$ actually matches the convex hull of the
real variety $V_{\mathbb{R}}(p)$, as we can see in Figure 7.8. In this figure we see a comparison
of the second theta body and $Q$, drawn numerically using YALMIP. The fact that
$Q$ seems to be exact is related to the fact that we can now use the term $y/x$ to get
sos certificates. For example, $x = x^2 + (y/x)^2$ modulo the new identities that we
introduced.
Exercise 7.20. Let $I = \langle x^2 \rangle$.
1. Show that $x$ is not $k$-sos mod $I$ for any $k$.
2. Show that for any $\epsilon > 0$, the polynomial $x + \epsilon$ is 1-sos mod $I$.
3. Describe $\mathrm{TH}_1(I)$.
Figure 7.8. In the darker color we see $\mathrm{TH}_2(\langle p \rangle)$, while in the lighter color
we see the strengthening $Q$ as defined in Example 7.19. In black we see the variety
itself.
7.3
2. For a finite integer $k$, the ideal $I$ is $\mathrm{TH}_k$-exact if $\mathrm{TH}_k(I) = \mathrm{cl}(\mathrm{conv}(V_{\mathbb{R}}(I)))$.
3. If $I$ is $\mathrm{TH}_k$-exact for a finite integer $k$, then we say that the theta body sequence of $I$ converges to $\mathrm{cl}(\mathrm{conv}(V_{\mathbb{R}}(I)))$ in finitely many steps. If the theta
body sequence of $I$ converges to $\mathrm{cl}(\mathrm{conv}(V_{\mathbb{R}}(I)))$ but there is no finite $k$ for
which $I$ is $\mathrm{TH}_k$-exact, then we say that the theta body sequence of $I$ converges
asymptotically to $\mathrm{cl}(\mathrm{conv}(V_{\mathbb{R}}(I)))$.
We will see in Section 7.3.1 that if $V_{\mathbb{R}}(I)$ is finite, then there is always some
finite $k$ for which $I$ is $\mathrm{TH}_k$-exact. However, tight bounds on $k$ for which $I$ is $\mathrm{TH}_k$-exact are not known in general. The best scenario is when $I$ is $\mathrm{TH}_1$-exact. We
characterize finite varieties whose real radical ideal is $\mathrm{TH}_1$-exact. Recall from the
discussion following Theorem 7.17 that there is no loss of generality in passing to
the real radical of $I$ in discussing convergence.
When $V_{\mathbb{R}}(I)$ is infinite, much less is understood about the convergence of
the theta body sequence of $I$. In Section 7.3.2 we explain what we know about
this case. The best general result is that when $V_{\mathbb{R}}(I)$ is compact, the theta body
sequence is guaranteed to converge to $\mathrm{cl}(\mathrm{conv}(V_{\mathbb{R}}(I)))$ asymptotically. However,
finite convergence, and even convergence in the first step, are sometimes possible for
infinite varieties, although no characterization is known in either case. We show that
certain singularities can prevent finite convergence when the variety is compact.
7.3.1
Theorem 7.26. Let $I$ be an ideal such that $V_{\mathbb{R}}(I)$ is finite; then there exists some
$k$ such that $\mathrm{TH}_k(I) = \mathrm{conv}(V_{\mathbb{R}}(I))$.
Proof. First note that by Theorem 7.17 we just need to prove the existence of
such a $k$ for $J = \sqrt[\mathbb{R}]{I}$. Let $V_{\mathbb{R}}(I) := \{P_1, \ldots, P_m\} \subset \mathbb{R}^n$ and, for each $P_i$, let $q_i$
be a polynomial such that $q_i(P_i) = 1$ and $q_i(P_j) = 0$ for $j \ne i$. Then given any
polynomial $f(x)$ that is nonnegative on $V_{\mathbb{R}}(I)$ we have that
\[ f(x) - \sum_{j=1}^{m} \left( \sqrt{f(P_j)}\, q_j(x) \right)^2 \]
vanishes at all $P_i$, and hence it belongs to $J$, and $f$ is sos modulo $J$. So all nonnegative polynomials on $V_{\mathbb{R}}(J)$ are sos modulo $J$, which in particular implies that
each of them is nonnegative over some $\mathrm{TH}_k(J)$. Since the convex hull of $V_{\mathbb{R}}(I)$
is a polytope, it is cut out by a finite number of linear inequalities. Pick $k$ large
enough for all these linear inequalities to be valid on $\mathrm{TH}_k(J)$ simultaneously. Then
$\mathrm{conv}(V_{\mathbb{R}}(I)) = \mathrm{TH}_k(J)$.
Clearly, Theorem 7.26 implies that when $V_{\mathbb{C}}(I)$ is finite, the ideal $I$ is $\mathrm{TH}_k$-exact for some finite $k$. When the ideal $I$ is also radical, finite convergence of
its theta body sequence to the convex hull of the variety was proved by Parrilo
(see Theorem 2.4 in [16]). Having established finite convergence of the theta body
sequence of $I$ when $V_{\mathbb{R}}(I)$ is finite, one can ask the more ambitious question of when
such an $I$ is $\mathrm{TH}_1$-exact. This is the most useful and computationally practical case
of finite convergence. If the ideal defining a finite set of points is always assumed to
be the vanishing ideal of the variety (and hence real radical), we can give a complete
geometric characterization of when they are $\mathrm{TH}_1$-exact. We will need the following
fact about real radical ideals.
Lemma 7.27 ([8]). If $I \subseteq \mathbb{R}[x]$ is a real radical ideal, then a linear inequality
$l(x) \ge 0$ is valid for $\mathrm{TH}_k(I)$ if and only if $l(x)$ is $k$-sos modulo $I$.
In order to characterize real radical ideals with finite real varieties, we need a
new definition.
Definition 7.28. Given a polytope $P$, we say that $P$ is 2-level if for each facet $F$
of $P$ and its affine span $H_F$, all vertices of $P$ are either in $F$ or in a unique translate
of $H_F$.
Example 7.29. In $\mathbb{R}^3$, up to affine equivalence there are five three-dimensional
2-level polytopes, shown in the upper part of Figure 7.10. It is easy to see that a
2-level polytope must be affinely equivalent to a 0/1-polytope. In the bottom of
Figure 7.10 we show the three remaining 0/1-polytopes (up to affine equivalence)
with a face that fails to verify the 2-level condition highlighted.
Figure 7.10. The top row contains all 0/1 three-dimensional 2-level polytopes (up to affine equivalence). The bottom row contains all 0/1 three-dimensional
polytopes (up to affine equivalence) that are not 2-level.
Theorem 7.30. Let $I$ be real radical with $S := V_{\mathbb{R}}(I)$ finite. Then $I$ is $\mathrm{TH}_1$-exact
if and only if $S$ is the set of vertices of a 2-level polytope.
Proof. Assume without loss of generality that $S$ spans the entire space and let
$f_1(x) \ge 0, \ldots, f_m(x) \ge 0$ be a minimal list of linear inequalities describing $P :=
\mathrm{conv}(S)$, i.e., each $f_i$ corresponds to a facet $F_i$ of $P$ and is zero on that facet. By
Lemma 7.27, $I$ is $\mathrm{TH}_1$-exact if and only if all $f_i$ are 1-sos mod $I$, since every affine
linear polynomial that is nonnegative on $S$ is a nonnegative linear combination of
the $f_i$'s.
If $I$ is $\mathrm{TH}_1$-exact, for each $i = 1, \ldots, m$, we have $f_i(x) \equiv \sum_k (h_k(x))^2$ mod $I$,
where all $h_k$ are linear. But since $f_i$ vanishes on $S \cap F_i$ so must all $h_k$ and
therefore, since they are linear, they must vanish on the affine space generated
by $F_i$. This means that they are actually just scalar multiples of $f_i$ and we have
$f_i(x) \equiv \lambda (f_i(x))^2$ mod $I$, for some nonnegative $\lambda$. In particular, all points $P \in S$
must satisfy either $f_i(P) = 0$ or $f_i(P) = 1/\lambda$, proving the 2-level condition.
Suppose now that $P$ is 2-level. Then for each $f_i$, all points $P \in S$ must satisfy
$f_i(P) = 0$ or $f_i(P) = \lambda_i$, for some fixed $\lambda_i > 0$. But then $f_i(f_i - \lambda_i)$ vanishes on
$S$, and therefore belongs to $I$. This implies $f_i \equiv (1/\lambda_i) f_i^2$ mod $I$ and $f_i$ is 1-sos
modulo $I$.
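The 2-level condition used in this proof is easy to test once a facet description is available. The sketch below is an illustration, not an algorithm from the text: it assumes each facet is supplied as a pair `(a, b)` meaning $\langle a, x \rangle + b \ge 0$, and checks that the facet functional takes exactly the two values $0$ and $\lambda_i$ on the vertices. The names `facet_levels` and `is_two_level` and the two sample polygons are hypothetical.

```python
def facet_levels(vertices, facet):
    """Sorted distinct values of the facet functional <a, x> + b on the vertices."""
    a, b = facet
    return sorted({sum(ai * vi for ai, vi in zip(a, v)) + b for v in vertices})


def is_two_level(vertices, facets):
    """True if every facet functional takes only the value 0 and one other value."""
    for facet in facets:
        levels = facet_levels(vertices, facet)
        if len(levels) != 2 or levels[0] != 0:
            return False
    return True


# Unit square: facets x >= 0, y >= 0, 1 - x >= 0, 1 - y >= 0.
SQUARE = [(0, 0), (1, 0), (0, 1), (1, 1)]
SQUARE_FACETS = [((1, 0), 0), ((0, 1), 0), ((-1, 0), 1), ((0, -1), 1)]

# Trapezoid: facets y >= 0, 1 - y >= 0, x - y >= 0, 3 - x - y >= 0.
TRAPEZOID = [(0, 0), (3, 0), (1, 1), (2, 1)]
TRAPEZOID_FACETS = [((0, 1), 0), ((0, -1), 1), ((1, -1), 0), ((-1, -1), 3)]
```

The square passes the test, while the trapezoid fails because the functional $x - y$ takes the three values $0, 1, 3$ on its vertices. For a polytope that passes, each facet functional $f_i$ satisfies $f_i(f_i - \lambda_i) = 0$ on the vertex set, which is the relation used in the last step of the proof.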
Theorem 7.30 will turn out to be very useful in the context of combinatorial
optimization as we will see in the next section. Polytopes with integer vertices
that are 2-level are called compressed polytopes in the literature [34, 35] and play an
important role in other research areas. Being 2-level is a highly restrictive condition
that immediately gives us much information on the polytope. Since all the vertices
of a 2-level polytope in $\mathbb{R}^n$ can be assumed to be 0/1 vectors, it is clear that they
have at most $2^n$ vertices. It was shown in [8] that they also have at most $2^n$ facets,
which is not obvious. There are many infinite families of 2-level polytopes such as
simplices, hypercubes, cross polytopes, and hypersimplices.
7.3.2
We begin by showing that unlike for finite varieties, the theta body approximations
can fail drastically when $V_{\mathbb{R}}(I)$ is infinite. The following simple example is adapted
from Example 1.3.2 in [21].
Example 7.31. Consider the ideal $I = \langle x^2 - y^3 \rangle$ defining the cusp in Figure 7.11.
The closure of the convex hull of this curve is the upper half-plane, so the only linear
inequalities valid on the curve are of the form $l_\alpha(x, y) = y + \alpha$, where $\alpha \ge 0$. Suppose
there exists some $l_\alpha$ with an sos certificate modulo $I$; then $l_\alpha(x, y) \equiv \sum_i p_i(x, y)^2$
mod $I$ for some polynomials $p_i$. Note that any polynomial $p$ has a unique standard
form of the type $a(y) + x\,b(y)$ modulo this ideal, which we can obtain by reducing all
multiples of $x^2$, using the fact that $x^2 \equiv y^3$ mod $I$. Two polynomials are the same
modulo the ideal if they have the same standard form. Since $l_\alpha(x, y)$ is already in
this form, we can simply reduce the right-hand side in the congruence relation to its
standard form too. Suppose each $p_i = a_i(y) + x\,b_i(y)$. Then it is easy to check that
\[ \sum_i p_i(x, y)^2 \equiv \sum_i \left( a_i(y)^2 + y^3 b_i(y)^2 \right) + 2x \sum_i a_i(y) b_i(y) \mod I. \]
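One way to finish the comparison of standard forms is a degree count. The following sketch is offered as a plausible completion of the argument rather than as the text's own wording; it writes the linear polynomial as $y + \alpha$ and each $p_i$ as $a_i(y) + x\,b_i(y)$, and uses only the reduction $x^2 \equiv y^3$ mod $I$.

```latex
\begin{align*}
p_i^2 &= a_i(y)^2 + 2x\,a_i(y)b_i(y) + x^2 b_i(y)^2
       \;\equiv\; \big(a_i(y)^2 + y^3 b_i(y)^2\big) + x \cdot 2a_i(y)b_i(y) \mod I, \\
y + \alpha &= \sum_i a_i(y)^2 \;+\; y^3 \sum_i b_i(y)^2
  \qquad \text{(comparing the parts of the standard forms that are free of } x\text{)} .
\end{align*}
% Degree count on the right-hand side:
%   * \sum_i a_i^2 is zero or has even degree 2 \max_i \deg a_i with positive leading coefficient;
%   * y^3 \sum_i b_i^2 is zero or has odd degree 3 + 2 \max_i \deg b_i with positive leading coefficient;
%   * the two leading terms have different degrees, so they cannot cancel.
% Degree 1 on the left forces every b_i = 0 (otherwise the degree is at least 3),
% and then the right-hand side has even degree, which contradicts \deg(y + \alpha) = 1.
```

Under this reading no $l_\alpha$ admits an sos certificate modulo $I$ of any degree, which is the sense in which the approximations fail for the cusp.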
This is an immediate consequence of Schmüdgen's Positivstellensatz (see Chapter 3). To see the connection, just consider any set of generators $\{g_1, \ldots, g_t\}$ for $I$
and the semialgebraic set $S = \{x \in \mathbb{R}^n : \pm g_1(x) \ge 0, \ldots, \pm g_t(x) \ge 0\} = V_{\mathbb{R}}(I)$. When applied to $S$, Schmüdgen's Positivstellensatz guarantees that every linear polynomial
that is strictly positive over $V_{\mathbb{R}}(I)$ is sos modulo $I$.
Example 7.33. The existence of varieties as in Example 7.31 does not imply that
for all unbounded varieties we have problems with the theta body sequence. Consider the strophoid curve given by $p(x, y) := (1 - y)x^2 - (1 + y)y^2 = 0$, shown in
Figure 7.12. The closure of the convex hull of this variety is the band $B$ defined by
$-1 \le y \le 1$. We claim that $\mathrm{TH}_2(I) = B$. To show this it is enough to prove that
both $1 - y$ and $1 + y$ are 2-sos modulo $I$, which is true since
\[
1 - y = \left( 1 - \frac{1}{2}y - \frac{1}{2}y^2 \right)^2 + \frac{1}{4}\left( y - y^2 \right)^2 + \frac{1}{2}(xy - x)^2 + \frac{1}{2}(y - 1)\,p(x, y).
\]
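The certificate for $1 - y$ is a polynomial identity with rational coefficients, so it can be verified exactly. The sketch below is an illustrative check, not part of the original example: the difference of the two sides has degree at most 4 in each variable, so agreement on a $5 \times 5$ grid of rational points implies the identity. The function names `p`, `certificate`, and `identity_holds` are hypothetical.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)
QUARTER = Fraction(1, 4)


def p(x, y):
    """Strophoid polynomial p = (1 - y) x^2 - (1 + y) y^2."""
    return (1 - y) * x**2 - (1 + y) * y**2


def certificate(x, y):
    """Right-hand side of the claimed expression for 1 - y."""
    return (
        (1 - HALF * y - HALF * y**2) ** 2
        + QUARTER * (y - y**2) ** 2
        + HALF * (x * y - x) ** 2
        + HALF * (y - 1) * p(x, y)
    )


def identity_holds():
    """Exact comparison on a 5 x 5 grid, enough for degree at most 4 per variable."""
    grid = [Fraction(n) for n in range(-2, 3)]
    return all(certificate(x, y) == 1 - y for x, y in product(grid, grid))
```

Because the arithmetic is exact, a `True` result from `identity_holds()` establishes the identity, and on the curve $p = 0$ the last term vanishes, leaving $1 - y$ as a sum of three squares of polynomials of degree at most 2.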
In what follows we concentrate our efforts on the compact case, where asymptotic convergence of the theta body sequence is guaranteed. The next natural
question when $V_{\mathbb{R}}(I)$ is infinite but compact is whether we can understand when
the theta body sequence converges in finitely many steps to $\mathrm{cl}(\mathrm{conv}(V_{\mathbb{R}}(I)))$. Finite convergence would prove that $\mathrm{conv}(V_{\mathbb{R}}(I))$ is the projection of a spectrahedron,
which is an important feature of a convex semialgebraic set as seen in Chapter 6.
There is no complete understanding of this situation, but in the remainder of this
section, we discuss the known results.
$\mathrm{TH}_1$-exactness. We begin by discussing the strongest scenario within finite convergence, namely $\mathrm{TH}_1$-exactness of an ideal. In spite of the strength of this property,
there are surprisingly many interesting examples of such ideals with infinite real varieties. We begin by taking a general look at the notion of $\mathrm{TH}_1$-exactness for all
ideals. Roughly speaking, $\mathrm{TH}_1$-exact ideals are those whose quadratic elements are
enough to describe their convex geometry, a statement that will be made precise
shortly. We start with a small lemma concerning convex quadrics.
Lemma 7.34. If $p \in \mathbb{R}[x]$ is a convex quadric polynomial, then $\langle p \rangle$ is $\mathrm{TH}_1$-exact.
Proof. This result will follow from Proposition 7.41, where we will show that the first theta body of any quadric is simply the convex hull of its graph intersected with the $x$-plane. This intersection is precisely $\mathrm{conv}(V_{\mathbb{R}}(p))$ if $p$ is convex.
We now give an alternative characterization of $\mathrm{TH}_1(I)$ for any ideal $I$.
Proposition 7.35. For any ideal $I \subseteq \mathbb{R}[x]$, $\mathrm{TH}_1(I)$ equals the intersection of $\mathrm{conv}(V_{\mathbb{R}}(p))$ as $p$ varies over all convex quadrics in $I$.
Proof. The inclusion $\mathrm{TH}_1(I) \subseteq \mathrm{conv}(V_{\mathbb{R}}(p))$ for all convex quadrics $p \in I$ is easy, since a linear inequality is valid over the second set if and only if it is 1-sos modulo $\langle p \rangle$, which immediately implies that it is 1-sos modulo $I$ and therefore valid on $\mathrm{TH}_1(I)$. For the second inclusion note that if $l(x)$ is 1-sos mod $I$, then
\[
l(x) = \sigma(x) + g(x),
\]
where $\sigma$ is a sum of squares and $g$ is a quadric in $I$. But note that $\nabla^2 g = -\nabla^2 \sigma \preceq 0$, which implies that $-g$ is a convex quadric in $I$, and $l(x)$ is 1-sos modulo $\langle g \rangle$. Therefore, $l(x) \geq 0$ is valid on $\mathrm{conv}(V_{\mathbb{R}}(g))$ and hence also valid on the intersection of $\mathrm{conv}(V_{\mathbb{R}}(p))$ as $p$ varies over all convex quadrics in $I$.
Example 7.36. Consider the ideal $I = \langle x^4 - y^2 - z^2,\; x^4 + x^2 + y^2 - 1 \rangle$ that we introduced in Example 7.11. Its real variety is the intersection of two quartic surfaces in $\mathbb{R}^3$. The Gröbner basis computation we did then shows that there exists a single quadric in this ideal (up to scalar multiplication), which is the polynomial $-1 + x^2 + 2y^2 + z^2$. Therefore, $\mathrm{TH}_1(I)$ equals the ellipsoid $\{(x, y, z) \in \mathbb{R}^3 : x^2 + 2y^2 + z^2 \leq 1\}$, as seen in Figure 7.4.
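As a quick sanity check, the quadric in Example 7.36 can be seen directly without a Gröbner basis: subtracting the first generator from the second cancels the quartic term. (That it is the only quadric up to scaling still relies on the Gröbner basis computation cited in the example.)

```latex
\[
\big(x^4 + x^2 + y^2 - 1\big) - \big(x^4 - y^2 - z^2\big)
  \;=\; x^2 + 2y^2 + z^2 - 1 \;\in\; I .
\]
```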
Proposition 7.35 can sometimes be used to prove $\mathrm{TH}_1$-exactness.
Example 7.37. Consider the ideal $I = \langle x^2 + y^2 + z^2 - 4,\; (x - 1)^2 + y^2 - 1 \rangle$ from Example 7.47. Note that the quadratic polynomials $p_1 = (x - 1)^2 + y^2 - 1$ and $p_2 = 2x + z^2 - 4$ belong to $I$. Write $I_1 = \langle p_1 \rangle$ and $I_2 = \langle p_2 \rangle$. Then we claim that
\[
\mathrm{conv}(V_{\mathbb{R}}(I)) = \mathrm{conv}(V_{\mathbb{R}}(I_1)) \cap \mathrm{conv}(V_{\mathbb{R}}(I_2)),
\]
and therefore $I$ is $\mathrm{TH}_1$-exact. To see this note that the variety $V_{\mathbb{R}}(I)$ can be written as
Figure 7.14. On the left we see the cardioid $p(x) = 0$ and its convex hull. On the right we see the graph of $p$, its intersection with the plane $z = 0$, and the ellipsoidal region where the graph and the boundary of its convex hull differ.
separation theorem for convex sets we can therefore take a hyperplane $H$ that strictly separates $L$ and $C$. Since $H$ does not touch the graph of $p$, it depends on $x_0$, and since it does not touch $L$, it must be parallel to it. Therefore we have a hyperplane of the form $\tilde{l}(x_0, x) := x_0 + \gamma(l(x) - \alpha) = 0$, with $\gamma \neq 0$, $\alpha > 0$. Since $\tilde{p}(x_0, x) = x_0 - p(x)$, this means that $\sigma(x) := p(x) + \gamma(l(x) - \alpha)$ is always nonnegative or always nonpositive. Without loss of generality assume it is always nonnegative (which implies $\gamma > 0$). Since the degree and number of variables of this polynomial fall under Hilbert's result (see Chapter 4), $\sigma(x)$ is a sum of squares. Hence, $l(x) = \sigma(x)/\gamma + \alpha - p(x)/\gamma$ is $d$-sos modulo the ideal, which implies that $l(x) \geq 0$ is valid over $\mathrm{TH}_d(p)$, proving the inclusion.
Example 7.42. We use the above result to prove $\mathrm{TH}_2$-exactness of the following principal ideal. Consider
\[
p(x, y) = (x^2 + y^2 + 2x)^2 - 4(x^2 + y^2)
\]
defining a cardioid, and the function
\[
q(x, y) =
\begin{cases}
p(x, y) & \text{if } (x + 1)^2 + y^2 \geq 3, \\
8x - 4 & \text{if } (x + 1)^2 + y^2 < 3.
\end{cases}
\]
One can check that $q$ is smooth and convex by noticing that $p(x, y) = ((x + 1)^2 + y^2 - 3)^2 + 8x - 4$ and by looking at its Hessian. Furthermore, the convex hull of the graph of $p$ is just the region above the graph of $q$. Therefore $\mathrm{sh}(p) = \{(x, y) : q(x, y) \leq 0\}$, and we can see in Figure 7.14 that $\mathrm{sh}(p)$ is the convex hull of the cardioid.
Even for one-variable polynomials this result is interesting.
Figure 7.16. TH2 (I), TH3 (I), TH4 (I), and TH5 (I): all contain the origin
in their interior.
Proof. This follows from the previous proposition and Lemma 7.27.
Example 7.46. Let $p(x, y) = (x^2 + y^2)^2 - (x + 5y)x^2$ and $I = \langle p \rangle$. This ideal defines a bifolium with a singularity at the origin, which implies $N_{(0,0)}(I) = \{(0, 0)\}$. Furthermore the linear inequality $x + 5y \geq 0$ is valid on the variety and holds with equality at the origin. Since $(1, 5) \notin N_{(0,0)}(I)$ we immediately have that this inequality does not hold for any theta body relaxation of this ideal. In Figure 7.16 we can see $\mathrm{TH}_k(I)$ for $k = 2, 3, 4, 5$, and see that in fact the inequality does not hold for any of them.
Corollary 7.45 essentially tells us that certain singularities of the ideal $I$ that are in the boundary of the convex hull of $V_{\mathbb{R}}(I)$ affect the convergence of the theta bodies of $I$. For a point $P \in V_{\mathbb{R}}(I)$, the expected dimension of the normal space $N_P(I)$ is the codimension of $V_{\mathbb{R}}(I)$. A reasonable notion of a singularity of $I$ is a point $P \in V_{\mathbb{R}}(I)$ for which $N_P(I)$ has smaller dimension than expected. The next example will show that just the existence of singularities of $I$ on the boundary of $\mathrm{conv}(V_{\mathbb{R}}(I))$ is not enough for Corollary 7.45 to apply.
Example 7.47. Consider the variety $V_{\mathbb{R}}(I)$ in $\mathbb{R}^3$ defined by the ideal
\[
I = \langle x^2 + y^2 + z^2 - 4,\; (x - 1)^2 + y^2 - 1 \rangle.
\]
As seen in Figure 7.17, this variety looks like a curved figure-eight and has a singularity at the point $P = (2, 0, 0)$, which belongs to the boundary of $\mathrm{conv}(V_{\mathbb{R}}(I))$. This happens since $N_P(I) = \mathbb{R}\{(1, 0, 0)\}$ has dimension one, smaller than the codimension of the variety, which is two. However, $(2, 0, 0)$ does not cause problems for the convergence of theta bodies since the only linear polynomial that is zero at $P$ and nonnegative on $V_{\mathbb{R}}(I)$ is the polynomial $2 - x$, whose gradient is in $N_P(I)$. Indeed, the first theta body of $I$ already equals $\mathrm{conv}(V_{\mathbb{R}}(I))$, as we will see in Example 7.37.
Figure 7.17. The curved eight variety and its convex hull.
A better, more refined, way of looking at singularities was introduced by Omar and Osserman in [23]. They introduce a stronger notion of nonnegativity over varieties that yields a stronger necessary condition for finite convergence of the theta body hierarchy. As a byproduct they prove the following result.
Theorem 7.48. Let $f(x)$ be a polynomial such that there exists some positive integer $n$ and an $\mathbb{R}$-algebra homomorphism $\varphi : \mathbb{R}[x]/I \to \mathbb{R}[\varepsilon]/\langle \varepsilon^n \rangle$ for which $\varphi(f) = a_0 + a_1 \varepsilon + \cdots + a_{n-1} \varepsilon^{n-1}$. If the first nonzero (leading) coefficient $a_i$ is negative, then $f$ is not a sum of squares modulo $I$.
Proof. Just note that homomorphisms send sums of squares to sums of squares, and sums of squares in $\mathbb{R}[\varepsilon]/\langle \varepsilon^n \rangle$ always have their leading coefficient nonnegative.
Again this immediately gives us a new criterion.
Corollary 7.49. Let $I$ be a real radical ideal and $l(x) \geq 0$ a linear inequality valid on $V_{\mathbb{R}}(I)$. If there exists an $\mathbb{R}$-algebra homomorphism $\varphi : \mathbb{R}[x]/I \to \mathbb{R}[\varepsilon]/\langle \varepsilon^n \rangle$ for which $\varphi(l)$ has negative leading coefficient, then $I$ is not $\mathrm{TH}_k$-exact for any $k$.
This corollary is much stronger than Corollary 7.45, and examples showing the difference are presented in [23]. In our next example we just show that we can recover Corollary 7.45 from Corollary 7.49 for the variety in Example 7.46 but, in fact, we can do so for any variety just by considering maps to $\mathbb{R}[\varepsilon]/\langle \varepsilon^2 \rangle$.
Example 7.50. Let $p(x, y) = (x^2 + y^2)^2 - (x + 5y)x^2$ and $I = \langle p \rangle$ as in Example 7.46. Then the map $\varphi : \mathbb{R}[x, y]/I \to \mathbb{R}[\varepsilon]/\langle \varepsilon^2 \rangle$ defined by $\varphi(x) = \varphi(y) = -\varepsilon$ is well defined, since $\varphi(p) = 0$. However, $\varphi(x + 5y) = -6\varepsilon$ has a negative leading coefficient despite $x + 5y \geq 0$ being valid on the variety. Hence, $\langle p \rangle$ is not $\mathrm{TH}_k$-exact for any $k$.
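The computation in Example 7.50 amounts to arithmetic on truncated power series in $\varepsilon$. The sketch below assumes the substitution $x, y \mapsto -\varepsilon$ and represents an element of $\mathbb{R}[\varepsilon]/\langle \varepsilon^n \rangle$ by its list of $n$ coefficients; the helper names `mul_trunc` and `images` are illustrative only.

```python
def mul_trunc(a, b, n):
    """Multiply two coefficient lists in R[eps]/<eps^n> (index = power of eps)."""
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:  # powers eps^n and higher vanish in the quotient
                out[i + j] += ai * bj
    return out

def images(n=2):
    """Images of p and of x + 5y under x, y -> -eps, for n >= 2."""
    x = [0, -1] + [0] * (n - 2)  # coefficient list of -eps
    y = list(x)
    x2 = mul_trunc(x, x, n)
    r = [s + t for s, t in zip(x2, mul_trunc(y, y, n))]  # x^2 + y^2
    lin = [s + 5 * t for s, t in zip(x, y)]              # x + 5y
    # p = (x^2 + y^2)^2 - (x + 5y) x^2
    phi_p = [s - t for s, t in zip(mul_trunc(r, r, n), mul_trunc(lin, x2, n))]
    return phi_p, lin

phi_p, phi_lin = images(2)
print(phi_p, phi_lin)
```

With `n = 2` the image of $p$ vanishes, so the map is well defined, while the image of $x + 5y$ has leading coefficient $-6$. Calling `images(4)` shows the image of $p$ becoming $6\varepsilon^3 \neq 0$, so the same substitution no longer defines a map to $\mathbb{R}[\varepsilon]/\langle \varepsilon^4 \rangle$.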
One should keep in mind that singularities are not necessarily the only things that prevent finite convergence of the theta body sequence to $\mathrm{cl}(\mathrm{conv}(V_{\mathbb{R}}(I)))$. For compact smooth curves and surfaces, Scheiderer proved that nonnegativity and
Figure 7.18. Serpentine curve and the closure of its convex hull.
sums of squares modulo the ideal are equivalent [28, 29]. However, even in these cases, it is an open question if one can bound the degree needed to represent every nonnegative affine polynomial as a sum of squares modulo the ideal. Thus there might be examples of smooth curves and surfaces with no finite convergence of the theta body hierarchy to $\mathrm{conv}(V_{\mathbb{R}}(I))$. The only case where we know a little more is when the genus of the curve is one.
Proposition 7.51 (Theorem 2.1 [30]). If $V_{\mathbb{R}}(I)$ is a smooth curve of genus 1 with at least one nonreal point at infinity, then $I$ is $\mathrm{TH}_k$-exact for some $k$.
Genus zero curves can be rationally parametrized, which allows semidefinite representations of their convex hulls by means of sums of squares, as seen in [13]. However, such constructions do not automatically translate to finite convergence of the theta body sequence to the convex hull of the curve, even in the smooth case.
For varieties of dimension greater than two, there always exist nonnegative polynomials that are not sums of squares modulo any ideal that defines them, even in the smooth compact case, as seen in [27]. It is therefore very natural to expect examples of smooth compact varieties with no finite convergence of the theta body hierarchy, but we do not know a concrete example at this point.
Exercise 7.52. Consider the serpentine curve given by $p(x, y) := y(x^2 + 1) - x = 0$, depicted in Figure 7.18. The closure of its convex hull is the band cut out by the inequalities $-1/2 \leq y \leq 1/2$. Show that the ideal $I = \langle p \rangle$ is $\mathrm{TH}_2$-exact by giving an exact expression of $1 - 2y$ and $1 + 2y$ as 2-sos polynomials modulo $I$.
Exercise 7.53. Using Proposition 7.35 show that the first theta body of the vanishing ideal of the points $\{(0, 0), (1, 0), (0, 1), (2, 2)\}$ is cut out by precisely two polynomial inequalities, and write them explicitly.
Exercise 7.54. Consider the ideal $I = \langle y^2 - x^5,\; z - x^3 \rangle$. The inequality $z \geq 0$ is valid on the variety $V_{\mathbb{R}}(I)$.
1. Can we use Proposition 7.44 to prove that $z$ is not $k$-sos modulo $I$ for any $k$?
2. Use Theorem 7.48 to prove that $z$ is not $k$-sos modulo $I$ for any $k$.
7.4 Combinatorial Optimization
relaxations of $\mathrm{conv}(V_{\mathbb{R}}(I))$ for an ideal $I$. In the special case of the combinatorial optimization model described above, the starting point is the finite set $\{\chi^T\}$, which is a finite algebraic variety, and we typically take its vanishing ideal as the ideal whose theta bodies are to be computed. As we saw in Section 7.3.1, these real radical ideals are always $\mathrm{TH}_k$-exact for some finite $k$. We take a closer look at some combinatorial optimization problems whose theta bodies have been explored.
7.4.1
An example that is at the heart of the history of theta bodies is the maximum stable set problem in an undirected graph $G = ([n], E)$ with vertex set $[n]$ and edge set $E$. A stable set in $G$ is a set $U \subseteq [n]$ such that for all $i, j \in U$, $\{i, j\} \notin E$. The maximum stable set problem seeks the stable set of largest cardinality in $G$, the size of which is the stability number, $\alpha(G)$, of $G$.
The maximum stable set problem can be modeled as follows. For each stable set $U \subseteq [n]$, let $\chi^U \in \{0, 1\}^n$ be its characteristic vector defined as $\chi^U_i = 1$ if $i \in U$ and $\chi^U_i = 0$ otherwise. Let $S_G \subseteq \{0, 1\}^n$ be the set of characteristic vectors of all stable sets in $G$. Then $\mathrm{STAB}(G) := \mathrm{conv}(S_G)$ is called the stable set polytope of $G$ and the maximum stable set problem is, in theory, the linear program $\max\{\sum_{i=1}^n x_i : x \in \mathrm{STAB}(G)\}$ with optimal value $\alpha(G)$. However, $\mathrm{STAB}(G)$ is not known a priori, and so one resorts to relaxations of it over which to optimize $\sum_{i=1}^n x_i$.
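For small graphs the set $S_G$ and the stability number can be computed by brute force, which makes the definitions above concrete. The sketch below is only an illustration on the 5-cycle (with the edge labeling assumed to be $12, 23, 34, 45, 15$); the function names are hypothetical, and the enumeration is exponential in $n$, which is exactly why relaxations are needed in general.

```python
from itertools import combinations

def stable_sets(n, edges):
    """All stable sets of the graph ([n], edges), as sorted tuples of vertices."""
    edge_set = {frozenset(e) for e in edges}
    result = []
    for k in range(n + 1):
        for U in combinations(range(1, n + 1), k):
            # U is stable if no pair of its vertices forms an edge
            if all(frozenset(pair) not in edge_set for pair in combinations(U, 2)):
                result.append(U)
    return result

def characteristic_vector(U, n):
    """The 0/1 vector chi^U in {0,1}^n."""
    return tuple(1 if i in U else 0 for i in range(1, n + 1))

C5_EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 5)]
sets_c5 = stable_sets(5, C5_EDGES)
S_G = [characteristic_vector(U, 5) for U in sets_c5]
alpha_c5 = max(len(U) for U in sets_c5)
print(len(S_G), alpha_c5)
```

For the pentagon this finds the empty set, the five vertices, and the five non-adjacent pairs, so the stability number is 2.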
Polyhedral relaxations of $\mathrm{STAB}(G)$ can be constructed from combinatorial arguments. For instance, a well-known relaxation is the polytope
\[
\mathrm{FRAC}(G) := \{x \in \mathbb{R}^n : x_i + x_j \leq 1 \ \forall\, \{i, j\} \in E,\ x_i \geq 0 \ \forall\, i \in [n]\},
\]
where the constraint $x_i + x_j \leq 1$ for $\{i, j\} \in E$ comes from the fact that both endpoints of an edge cannot be in a stable set. It can be checked that $\mathrm{STAB}(G)$ is exactly the convex hull of the integer points in $\mathrm{FRAC}(G)$. The polytope $\mathrm{FRAC}(G)$ and several tighter polyhedral relaxations of $\mathrm{STAB}(G)$ have been studied extensively in the literature; see [11, Chapter 9].
Since the set $S_G$ is an algebraic variety, the theta bodies of its vanishing ideal offer convex relaxations of $\mathrm{STAB}(G)$. This vanishing ideal is
\[
I_G := \langle x_i^2 - x_i \ \forall\, i \in [n],\ x_i x_j \ \forall\, \{i, j\} \in E \rangle \subseteq \mathbb{R}[x_1, \ldots, x_n].
\]
For $U \subseteq [n]$, let $x^U := \prod_{i \in U} x_i$. From the generators of $I_G$ it follows that if $f \in \mathbb{R}[x]$, then $f \equiv g$ mod $I_G$, where $g$ is in the $\mathbb{R}$-span of the set of monomials $\{x^U : U \text{ is a stable set in } G\}$. In particular,
\[
B := \{x^U + I_G : U \text{ stable set in } G\}
\]
is a $\theta$-basis of $\mathbb{R}[x]/I_G$ (containing $1 + I_G, x_1 + I_G, \ldots, x_n + I_G$). This implies that $B_k = \{x^U + I_G : U \text{ stable set in } G, |U| \leq k\}$, and for $x^{U_i} + I_G, x^{U_j} + I_G \in B_k$, their product is $x^{U_i \cup U_j} + I_G$, which is $0 + I_G$ if $U_i \cup U_j$ is not a stable set in $G$. This product formula allows us to compute $M_{B_k}(y)$, where we index the element
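$x^U + I_G \in B_k$ by the set $U$. Since $S_G \subseteq \{0, 1\}^n$ and $I_G$ is the vanishing ideal of $S_G$, by Theorem 7.8, we have that
\[
\mathrm{TH}_k(I_G) = \left\{ y \in \mathbb{R}^n :
\begin{array}{l}
\exists\, M \succeq 0,\ M \in \mathbb{R}^{|B_k| \times |B_k|} \text{ such that} \\
M_{\emptyset\emptyset} = 1, \\
M_{\emptyset\{i\}} = M_{\{i\}\emptyset} = M_{\{i\}\{i\}} = y_i, \\
M_{UU'} = 0 \text{ if } U \cup U' \text{ is not stable in } G, \\
M_{UU'} = M_{WW'} \text{ if } U \cup U' = W \cup W'
\end{array}
\right\}.
\]
In particular, indexing the one-element stable sets by the vertices of $G$,
\[
\mathrm{TH}_1(I_G) = \left\{ y \in \mathbb{R}^n :
\begin{array}{l}
\exists\, M \succeq 0,\ M \in \mathbb{R}^{(n+1) \times (n+1)} \text{ such that} \\
M_{00} = 1, \\
M_{0i} = M_{i0} = M_{ii} = y_i \ \forall\, i \in [n], \\
M_{ij} = 0 \ \forall\, \{i, j\} \in E
\end{array}
\right\}.
\]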
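\[
M_{B_1}(y) =
\begin{pmatrix}
1 & y_1 & y_2 & y_3 & y_4 & y_5 \\
y_1 & y_1 & 0 & y_6 & y_7 & 0 \\
y_2 & 0 & y_2 & 0 & y_8 & y_9 \\
y_3 & y_6 & 0 & y_3 & 0 & y_{10} \\
y_4 & y_7 & y_8 & 0 & y_4 & 0 \\
y_5 & 0 & y_9 & y_{10} & 0 & y_5
\end{pmatrix}.
\]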
Note that $x_i \equiv x_i^2$ and $1 - x_i \equiv (1 - x_i)^2$ mod $I_G$ for any graph $G$, so $\mathrm{TH}_1(I_G)$ is always contained in the cube $[0, 1]^n$.
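The zero pattern of a reduced moment matrix follows mechanically from the product formula $x^{U} x^{W} \equiv x^{U \cup W}$, which vanishes modulo $I_G$ when $U \cup W$ is not stable. The sketch below builds that pattern symbolically for the pentagon; it assumes the edge labeling $12, 23, 34, 45, 15$ and uses hypothetical helper names. Each entry is either `None` (a forced zero) or the sorted tuple of the union, which stands for the moment variable $y_{U \cup W}$ (the empty tuple stands for the constant 1).

```python
from itertools import combinations

def is_stable(U, edge_set):
    """True if no two vertices of U are joined by an edge."""
    return all(frozenset(pair) not in edge_set
               for pair in combinations(sorted(U), 2))

def moment_pattern(n, edges, k):
    """Index sets and symbolic entries of the reduced moment matrix M_{B_k}."""
    edge_set = {frozenset(e) for e in edges}
    # Rows and columns are indexed by the stable sets of size at most k.
    index = [frozenset(U)
             for size in range(k + 1)
             for U in combinations(range(1, n + 1), size)
             if is_stable(U, edge_set)]
    pattern = []
    for U in index:
        row = []
        for W in index:
            union = U | W
            row.append(tuple(sorted(union)) if is_stable(union, edge_set) else None)
        pattern.append(row)
    return index, pattern

C5_EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 5)]
index, pattern = moment_pattern(5, C5_EDGES, 1)
for row in pattern:
    print(row)
```

For $k = 1$ this yields a $6 \times 6$ pattern with zeros exactly at the edges of the pentagon and five distinct two-element labels, one for each non-adjacent pair of vertices.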
The first example of an SDP relaxation of a combinatorial optimization problem was the theta body of a graph $G = ([n], E)$ constructed by Lovász in [18] while studying the Shannon capacity of graphs. The theta body of $G$, denoted as $\mathrm{TH}(G)$, is a relaxation of $\mathrm{STAB}(G)$ that was originally defined as the intersection of the infinitely many half spaces that arise from the orthonormal representations of $G$. Several equivalent definitions can be found in [18] and [11, Chapter 9]. However, none of them point to an obvious generalization of the construction to other discrete
\[
\max\Big\{ \sum_{i=1}^n x_i : x \in \mathrm{TH}(G) = \mathrm{TH}_1(I_G) \Big\},
\]
which is an upper bound (and approximation) for the stability number $\alpha(G)$ of a graph. We can now easily compute $\vartheta(C_5)$, the theta number of the 5-cycle, numerically using YALMIP, since we have the precise structure of the reduced moment matrix.
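% y(1..5): vertices of C5; y(6..10): its non-adjacent vertex pairs.
y = sdpvar(1,10);
% Reduced moment matrix M_{B_1}(y) of the pentagon.
M = [1    y(1) y(2) y(3)  y(4) y(5);
     y(1) y(1) 0    y(6)  y(7) 0;
     y(2) 0    y(2) 0     y(8) y(9);
     y(3) y(6) 0    y(3)  0    y(10);
     y(4) y(7) y(8) 0     y(4) 0;
     y(5) 0    y(9) y(10) 0    y(5)];
obj = y(1)+y(2)+y(3)+y(4)+y(5);
% Maximize obj subject to M being positive semidefinite.
solvesdp(M>=0,-obj);
double(obj)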
This will return the answer $\vartheta(C_5) \approx 2.236$ (the exact value is $\sqrt{5}$). Note that $\alpha(C_5) = 2$, so we do get an upper approximation as expected, but it is clear that $I_{C_5}$ is not $\mathrm{TH}_1$-exact.
A particular reason for Lovász's interest in [19, Problem 8.3] was the fact that $\mathrm{STAB}(G) = \mathrm{TH}(G)$ if and only if $G$ is a perfect graph [11, Corollary 9.3.27]. Recall that a graph is perfect if and only if it has no induced odd cycle of length at least five or its complement [4]. Since $\mathrm{TH}(G) = \mathrm{TH}_1(I_G)$ for all graphs $G$, it follows that $I_G$ is $\mathrm{TH}_1$-exact if and only if $G$ is perfect. The pentagon in Example 7.58 is not perfect, which justifies our observation that its ideal $I_G$ is not $\mathrm{TH}_1$-exact. Chvátal and Fulkerson had shown that $\mathrm{STAB}(G) = \mathrm{QSTAB}(G)$ if and only if $G$ is a perfect graph, where
\[
\mathrm{QSTAB}(G) := \Big\{ x \in \mathbb{R}^n : x_i \geq 0 \ \forall\, i \in [n],\ \sum_{i \in K} x_i \leq 1 \text{ for all cliques } K \text{ in } G \Big\}.
\]
7.4.2 A General Framework
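The stable set problem and many others in combinatorial optimization can be modeled as arising from a simplicial complex. A simplicial complex or independence system, $\Delta$, with vertex set $[n]$, is a collection of subsets of $[n]$, called the faces of $\Delta$, such that whenever $S \in \Delta$ and $T \subseteq S$, then $T \in \Delta$. The Stanley–Reisner ideal of $\Delta$ is the ideal $J_\Delta$ generated by the square-free monomials $x_{i_1} x_{i_2} \cdots x_{i_k}$ such that $\{i_1, i_2, \ldots, i_k\} \subseteq [n]$ is not a face of $\Delta$. If $I_\Delta := J_\Delta + \langle x_i^2 - x_i : i \in [n] \rangle$, then $V_{\mathbb{R}}(I_\Delta) = \{s \in \{0, 1\}^n : \mathrm{support}(s) \in \Delta\}$. The support of a vector $v \in \mathbb{R}^n$ is the set $\{i \in [n] : v_i \neq 0\}$. Further, for $T \subseteq [n]$, if $x^T := \prod_{i \in T} x_i$, then $B := \{x^T + I_\Delta : T \in \Delta\}$ is a $\theta$-basis of $\mathbb{R}[x]/I_\Delta$. This implies that the $k$th theta body of $I_\Delta$ is
\[
\mathrm{TH}_k(I_\Delta) = \pi_{\mathbb{R}^n}\{y \in \mathbb{R}^{B_{2k}} : M_{B_k}(y) \succeq 0,\ y_0 = 1\}.
\]
Since $B$ is in bijection with the faces of $\Delta$ and $x_i^2 - x_i \in I_\Delta$ for all $i \in [n]$, the theta body can be written explicitly as
\[
\mathrm{TH}_k(I_\Delta) = \left\{ y \in \mathbb{R}^n :
\begin{array}{l}
\exists\, M \succeq 0,\ M \in \mathbb{R}^{|B_k| \times |B_k|} \text{ such that} \\
M_{\emptyset\emptyset} = 1, \\
M_{\emptyset\{i\}} = M_{\{i\}\emptyset} = M_{\{i\}\{i\}} = y_i, \\
M_{UU'} = 0 \text{ if } U \cup U' \notin \Delta, \\
M_{UU'} = M_{WW'} \text{ if } U \cup U' = W \cup W'
\end{array}
\right\}.
\]
If the dimension of $\Delta$ is $d - 1$ (i.e., the largest faces in $\Delta$ have size $d$), then $I_\Delta$ is $\mathrm{TH}_d$-exact since all elements of $B$ have degree at most $d$ and hence the last possible theta body $\mathrm{TH}_d(I_\Delta)$ must coincide with $\mathrm{conv}(V_{\mathbb{R}}(I_\Delta))$ as $V_{\mathbb{R}}(I_\Delta)$ is finite. However, in many examples, $I_\Delta$ could be $\mathrm{TH}_k$-exact for a $k$ much smaller than $d$.
In the case of the stable set problem on $G = ([n], E)$, $\Delta$ is the set of all stable sets in $G$. This is a simplicial complex with vertex set $[n]$ whose nonfaces are the sets $T \subseteq [n]$ containing a pair $i, j \in [n]$ such that $\{i, j\} \in E$. Hence the minimal nonfaces (by set inclusion) are precisely the edges of $G$ and so $J_\Delta = \langle x_i x_j : \{i, j\} \in E \rangle$.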
Then $I_\Delta = J_\Delta + \langle x_i^2 - x_i : i \in [n] \rangle$, which is precisely the ideal $I_G$ from Section 7.4.1, and the remaining facts about the $\theta$-basis $B$ used in Section 7.4.1 and the structure of the theta bodies of $I_G$ follow from the general setup described above.
An example from combinatorial optimization that does not follow the simplicial complex framework is the maximum cut problem of finding the largest size cut in a graph. Recall that a cut in $G$ is the collection of edges that go between the two parts of a partition of the vertices of $G$. Note that a subset of a cut is not necessarily a cut and hence the set of cuts in a graph does not form a simplicial complex. In [7] the theta body hierarchy for the maximum cut problem, and more generally for binary matroids, is studied. In this case, a $\theta$-basis for the ideal in question is not as obvious as in the simplicial complex model.
7.4.3
We finish the chapter with a second example from combinatorial optimization that fits the simplicial complex model. A subgraph $H$ of a graph $G = ([n], E)$ is triangle-free if it does not contain a triangle ($K_3$, the complete graph on 3 vertices). Given weights on the edges of $G$, the triangle-free subgraph problem in $G$ asks for a triangle-free subgraph of $G$ of maximum weight. If all the edge weights are one, then the problem seeks a triangle-free subgraph in $G$ with the largest number of edges. The triangle-free subgraph problem is known to be NP-hard [36] and is relevant in various contexts within optimization.
The integer programming formulation of the triangle-free subgraph problem optimizes the linear function $\sum_{e \in E} w_e x_e$, where $w_e$ is the weight on edge $e \in E$, over the characteristic vectors $\{\chi^H : H \text{ is triangle-free in } G\}$. This is equivalent to maximizing $\sum_{e \in E} w_e x_e$ over
\[
P_{\mathrm{tf}}(G) := \mathrm{conv}\{\chi^H : H \text{ is triangle-free in } G\},
\]
the triangle-free subgraph polytope of $G$. Note that $P_{\mathrm{tf}}(G)$ is a full-dimensional 0/1 polytope in $\mathbb{R}^E$. The triangle-free subgraph polytope of a graph has been studied by various authors (see, for instance, [3, 5]), and a number of facet defining inequalities of the polytope are known, although a full inequality description is not known or expected.
Taking $\Delta$ to be the simplicial complex on $E$ consisting of all triangle-free subgraphs in $G$, and $I_{\mathrm{tf}}(G) := I_\Delta$, we have that
\[
V_{\mathbb{R}}(I_{\mathrm{tf}}(G)) = \{\chi^H : H \text{ is triangle-free in } G\}.
\]
Hence the theta bodies of $I_{\mathrm{tf}}(G)$ provide convex relaxations of the triangle-free subgraph polytope $P_{\mathrm{tf}}(G)$. From the general framework in Section 7.4.2, $B = \{x^H + I_{\mathrm{tf}}(G) : H \text{ triangle-free in } G\}$ is a $\theta$-basis of $\mathbb{R}[x]/I_{\mathrm{tf}}(G)$. Therefore, the rows and columns of $M_{B_k}(y)$ are indexed by the triangle-free subgraphs in $G$ with at most $k$ edges. For ease of exposition, let us denote the entry of $M_{B_k}(y)$ corresponding to row indexed by $x^{H_1}$ and column indexed by $x^{H_2}$ by $M_{B_k}(y)_{H_1 H_2}$, let $H_1 \cup H_2$ denote the subgraph of $G$ whose edge set is the union of the edge sets of $H_1$ and $H_2$,
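and $y_H$ denote the entry of $y \in \mathbb{R}^B$ corresponding to the basis element $x^H + I_{\mathrm{tf}}(G)$. Then
\[
\mathrm{TH}_k(I_{\mathrm{tf}}(G)) = \left\{ y \in \mathbb{R}^E :
\begin{array}{l}
\exists\, M \succeq 0,\ M \in \mathbb{R}^{|B_k| \times |B_k|} \text{ such that} \\
M_{\emptyset\emptyset} = 1, \\
M_{H_1 H_2} =
\begin{cases}
0 & \text{if } H_1 \cup H_2 \text{ has a triangle}, \\
y_{H_1 \cup H_2} & \text{otherwise}
\end{cases}
\end{array}
\right\}.
\]
Since all subgraphs of $G$ with at most two edges are triangle-free, and $B_1 = \{1 + I_{\mathrm{tf}}(G)\} \cup \{x_e + I_{\mathrm{tf}}(G) : e \in E\}$, $\mathrm{TH}_1(I_{\mathrm{tf}}(G))$ is exactly the same as the first theta body of the ideal $\langle x_e^2 - x_e : e \in E \rangle$, which is $\mathrm{TH}_1$-exact by Theorem 7.30. Hence $\mathrm{TH}_1(I_{\mathrm{tf}}(G)) = [0, 1]^E$, and $I_{\mathrm{tf}}(G)$ is $\mathrm{TH}_1$-exact if and only if every subgraph of $G$ is triangle-free, or equivalently, $G$ is triangle-free.
For graphs $G$ that contain triangles, the second theta body of $I_{\mathrm{tf}}(G)$ is more interesting, as triples and quadruples of edges in $G$ can contain triangles, which forces some of the entries in $M_{B_2}(y)$ to be zero.
Example 7.60. Suppose $G = K_3$ with edges labeled 1, 2, 3. Then $P_{\mathrm{tf}}(G)$ is the convex hull of all 0/1 vectors in $\mathbb{R}^3$ except $(1, 1, 1)$, which is the first polytope shown in the second row of polytopes in Figure 7.10. This polytope is $\mathrm{TH}_2$-exact since
\[
B_2 = \{1, x_1, x_2, x_3, x_1 x_2, x_1 x_3, x_2 x_3\} + I_{\mathrm{tf}}(G) = B.
\]
Denoting $y \in \mathbb{R}^{B_2}$, with first entry one, to be $y = (1, y_1, y_2, y_3, y_{12}, y_{13}, y_{23})$, we have that
\[
M_{B_2}(y) =
\begin{pmatrix}
1 & y_1 & y_2 & y_3 & y_{12} & y_{13} & y_{23} \\
y_1 & y_1 & y_{12} & y_{13} & y_{12} & y_{13} & 0 \\
y_2 & y_{12} & y_2 & y_{23} & y_{12} & 0 & y_{23} \\
y_3 & y_{13} & y_{23} & y_3 & 0 & y_{13} & y_{23} \\
y_{12} & y_{12} & y_{12} & 0 & y_{12} & 0 & 0 \\
y_{13} & y_{13} & 0 & y_{13} & 0 & y_{13} & 0 \\
y_{23} & 0 & y_{23} & y_{23} & 0 & 0 & y_{23}
\end{pmatrix}.
\]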
Bibliography
[1] Y. H. Au and L. Tunçel. Complexity analyses of Bienstock-Zuckerberg and Lasserre relaxations on the matching and stable set polytopes. In Integer Programming and Combinatorial Optimization, Lecture Notes in Comput. Sci. 6655, Springer, Heidelberg, 2011, pp. 14–26.
[2] E. Balas, S. Ceria, and G. Cornuéjols. A lift-and-project cutting plane algorithm for mixed 0-1 programs. Math. Program., 58:295–324, 1993.
[3] F. Bendali, A. R. Mahjoub, and J. Mailfert. Composition of graphs and the triangle-free subgraph polytope. J. Comb. Optim., 6:359–381, 2002.
[4] M. Chudnovsky, N. Robertson, P. Seymour, and R. R. Thomas. The strong perfect graph theorem. Ann. of Math. (2), 164:51–229, 2006.
[5] M. Conforti, D. G. Corneil, and A. R. Mahjoub. $K_i$-covers. I. Complexity and polytopes. Discrete Math., 58:121–142, 1986.
[6] D. Cox, J. Little, and D. O'Shea. Ideals, Varieties and Algorithms. Springer-Verlag, New York, 1992.
[22] J. Nie. First order conditions for semidefinite representations of convex sets defined by rational or singular polynomials. Math. Program., 131:1–36, 2012.
[23] M. Omar and B. Osserman. Strong nonnegativity and sums of squares on real varieties. arXiv:1101.0826.
[24] R. T. Rockafellar. Convex Analysis, Princeton Landmarks in Mathematics and Physics. Princeton University Press, Princeton, NJ, 1996.
[25] P. Rostalski. Bermeja, Software for Convex Algebraic Geometry. Available at http://math.berkeley.edu/~philipp/Software/Software.
[26] R. Sanyal. Orbitopes and theta bodies. Talk at IPAM Workshop on Convex Optimization and Algebraic Geometry, slides available at http://math.berkeley.edu/~bernd/raman.pdf, 2010.
[27] C. Scheiderer. Sums of squares of regular functions on real algebraic varieties. Trans. Amer. Math. Soc., 352:1039–1069, 2000.
[28] C. Scheiderer. Sums of squares on real algebraic curves. Math. Z., 245:725–760, 2003.
[29] C. Scheiderer. Sums of squares on real algebraic surfaces. Manuscripta Math., 119:395–410, 2006.
[30] C. Scheiderer. Convex hulls of curves of genus one. Adv. Math., 228:2606–2622, 2011.
[31] A. Schrijver. Theory of Linear and Integer Programming, Wiley-Interscience Series in Discrete Mathematics and Optimization. Wiley, New York, 1986.
[32] A. Schrijver. Combinatorial Optimization. Polyhedra and Efficiency. Vol. B, Algorithms Combin. 24. Springer-Verlag, Berlin, 2003.
[33] H. D. Sherali and W. P. Adams. A hierarchy of relaxations between the continuous and convex hull representations for zero-one programming problems. SIAM J. Discrete Math., 3:411–430, 1990.
[34] R. P. Stanley. Decompositions of rational convex polytopes. Ann. Discrete Math., 6:333–342, 1980.
[35] S. Sullivant. Compressed polytopes and statistical disclosure limitation. Tohoku Math. J. (2), 58:433–445, 2006.
[36] M. Yannakakis. Edge-deletion problems. SIAM J. Comput., 10:297–309, 1981.
Chapter 8
A new development is the extension of the algebraic certificates of real algebraic geometry to noncommutative polynomials, thereby giving a theory of noncommutative polynomial inequalities. Here we shall focus on convexity aspects of noncommutative real algebraic geometry, and we shall see that this leads to a very rigid structure. Our subject pertains to optimization problems where the unknowns are matrices.
8.1 Introduction
This chapter is a tutorial on techniques and results in free convex algebraic geometry and free positivity. As such it also serves as a point of entry into the larger field of free real algebraic geometry and makes contact with noncommutative real algebraic geometry [27, 30, 32, 33, 38, 47, 48, 53, 59, 62, 63], free analysis and free probability (lying at the origins of free analysis; cf. [64]), and free analytic function theory and free harmonic analysis [28, 29, 34, 54, 60, 69, 70, 46].
The term free here refers to the central role played by algebras of noncommuting polynomials $\mathbb{R}\langle x \rangle$ in free (freely noncommuting) variables $x = (x_1, \ldots, x_g)$. A striking difference between the free and classical settings is the following Positivstellensatz.
J. William Helton was partially supported by NSF grants DMS-0700758, DMS-0757212, and
DMS-1160802 and by the Ford Motor Company.
Igor Klep was supported by the Faculty Research Development Fund (FRDF) of The University of Auckland (project 3701119) and was partially supported by the Slovenian Research Agency
(program P1-0222).
Scott McCullough was supported by NSF grant DMS-1101137.
8.1.1 Motivation
While the theory is both mathematically pleasing and natural, much of the excitement of free convexity and positivity stems from its applications. Indeed, the
fact that a large class of linear systems engineering problems naturally lead to free
inequalities provided the main force behind the development of the subject. In this
motivational section, we describe in some detail the linear systems point of view.
We also give a brief introduction to other applications.
Linear systems engineering
The layout of a linear systems problem is typically specified by a signal flow diagram. Signals go into boxes and other signals come out. The boxes in a linear system contain constant coefficient linear differential equations which are specified entirely by matrices (the coefficients of the differential equations). Often many boxes appear and many signals transmit between them. In a typical problem some boxes are given, and some we get to design subject to the condition that the $L^2$-norms of various signals must compare in a prescribed way; e.g., the input to the system has $L^2$-norm bigger than the output. The signal flow diagram itself and the corresponding problems do not specify the size of the matrices involved. So ideally any algorithms derived apply to matrices of all sizes. Hence the problems are called dimension free.
An empirical observation is that system problems of this type convert to inequalities on polynomials in matrices, the form of the polynomials being determined entirely by the signal flow layout (and independent of the matrices involved). Thus the systems problem naturally leads to free polynomials and free positivity conditions.
For a still more detailed discussion of this example, see [13, Section 4.1]. Those who read Chapter 2 saw a basic example of this in Section 2.2.1. Next we give more
called a signal flow diagram. Here $u$ is a signal going into the closed loop system and $y$ is the signal coming out. The signal flow diagram is equivalent to a collection of equations. The systems $F$ and $G$ themselves are, respectively, given by the linear differential equations
\[
\frac{dx}{dt} = Ax + Be, \quad y = Cx, \qquad \frac{d\xi}{dt} = Q\xi + Rw, \quad v = S\xi, \qquad \text{and} \qquad e = u - v.
\]
Putting these relations together gives that the closed loop system is described by the differential equations
\[
\frac{dx}{dt} = Ax - BS\xi + Bu, \qquad \frac{d\xi}{dt} = Q\xi + Ry = Q\xi + RCx, \qquad y = Cx,
\]
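which in matrix form reads
\[
\frac{d}{dt}\begin{pmatrix} x \\ \xi \end{pmatrix}
= \begin{pmatrix} A & -BS \\ RC & Q \end{pmatrix}
\begin{pmatrix} x \\ \xi \end{pmatrix}
+ \begin{pmatrix} B \\ 0 \end{pmatrix} u,
\qquad
y = \begin{pmatrix} C & 0 \end{pmatrix}\begin{pmatrix} x \\ \xi \end{pmatrix},
\tag{8.1}
\]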
where the state space of the closed loop system is the direct sum $\mathcal{X} \oplus \mathcal{Y}$ of the state spaces $\mathcal{X}$ of $F$ and $\mathcal{Y}$ of $G$. From (8.1), the coefficients of the ODE are (block) matrices whose entries are (in this case simple) polynomials in the matrices $A, B, C, Q, R, S$.
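The block structure of the closed loop coefficients can be assembled for matrices of any compatible sizes, which is the point of the dimension-free viewpoint. The sketch below assumes the interconnection $e = u - v$ and $w = y$ (the latter read off from the closed loop equations, where $R$ multiplies $y$) and uses plain nested lists; `matmul` and `closed_loop` are illustrative helper names.

```python
def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def closed_loop(A, B, C, Q, R, S):
    """Coefficient matrices of the interconnected system with state (x, xi)."""
    BS = matmul(B, S)  # coupling from xi into dx/dt (enters with a minus sign)
    RC = matmul(R, C)  # coupling from x into dxi/dt
    top = [list(ra) + [-v for v in rbs] for ra, rbs in zip(A, BS)]
    bottom = [list(rrc) + list(rq) for rrc, rq in zip(RC, Q)]
    A_cl = top + bottom                                   # [[A, -BS], [RC, Q]]
    B_cl = [list(row) for row in B] + [[0] * len(B[0]) for _ in Q]  # [B; 0]
    C_cl = [list(row) + [0] * len(Q) for row in C]        # [C, 0]
    return A_cl, B_cl, C_cl

# Scalar illustration: every "matrix" is 1 x 1.
A_cl, B_cl, C_cl = closed_loop([[1]], [[2]], [[3]], [[4]], [[5]], [[6]])
print(A_cl, B_cl, C_cl)
```

The same function applies unchanged to larger blocks, as long as the products $BS$ and $RC$ are defined; only the entries, never the formula, depend on the dimensions.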
This illustrates the moral of the general story:
System connections produce a new system whose coefficients are matrices with entries which are noncommutative polynomials (or at worst rational expressions) in the coefficient matrices of the component systems.
Complicated signal flow diagrams give complicated matrices of noncommutative polynomials or rationals. Note that in what was said the dimensions of vector spaces and matrices $A, B, C, Q, R, S$ never entered explicitly; the algebraic form of (8.1) is completely determined by the flow diagram. Thus, such linear systems lead to dimension free problems.
Next we turn to how noncommutative inequalities arise. The main constraint producing them can be thought of as energy dissipation, a special case of which are the Lyapunov functions already seen in Section 2.2.1.
Energy dissipation
We have a system $F$ and want a condition which checks whether
\[
\int_0^\infty |u|^2\, dt \geq \int_0^\infty |Fu|^2\, dt, \qquad x(0) = 0,
\]
holds for all input functions $u$, where $Fu = y$ in the above notation. If this holds, $F$ is called a dissipative system.
L2 [0, ]
L2 [0, ]
\[
V(x(t_1)) + \int_{t_1}^{t_2} |u(t)|^2\, dt \geq V(x(t_2)) + \int_{t_1}^{t_2} |y(t)|^2\, dt
\]
for all input functions $u$ and initial states $x_1$ is called a storage function. The displayed inequality is interpreted physically as
potential energy now + energy in $\geq$ potential energy then + energy out.
Assuming enough smoothness of $V$, we can differentiate this integral condition and use $\frac{d}{dt}x(t_1) = Ax(t_1) + Bu(t_1)$ to obtain a differential inequality
\[
0 \geq \nabla V(x)(Ax + Bu) + |Cx|^2 - |u|^2
\tag{8.2}
\]
on what is called the reachable set (which we do not need to define here).
In the case of linear systems, $V$ can be chosen to be a quadratic. So it has the form $V(x) = \langle Ex, x \rangle$ with $E \succeq 0$ and $\nabla V(x) = 2Ex$.
Theorem 8.2. The linear system $A, B, C$ is dissipative if inequality (8.2) holds for all $u \in \mathcal{U}$, $x \in \mathcal{X}$. Conversely, if $A, B, C$ is reachable,$^1$ then dissipativity implies that inequality (8.2) holds for all $u \in \mathcal{U}$, $x \in \mathcal{X}$.
In the linear case, we may substitute $\nabla V(x) = 2Ex$ in (8.2) to obtain
\[
0 \geq 2(Ex)^*(Ax + Bu) + |Cx|^2 - |u|^2
\]
for all $u, x$. Then maximize in $u$ to get
\[
0 \geq x^*[EA + A^*E + EBB^*E + C^*C]x.
\]
Thus the classical Riccati matrix inequality
\[
0 \succeq EA + A^*E + EBB^*E + C^*C \quad \text{with} \quad E \succeq 0
\tag{8.3}
\]
ensures dissipativity of the system and, it turns out, is also implied by dissipativity when the system is reachable.
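The maximization step behind the Riccati inequality can be spelled out by completing the square in $u$. The following is a sketch for real matrices with $E$ symmetric, writing $^*$ for the transpose (an assumption about the notation):

```latex
\begin{align*}
2(Ex)^* B u - |u|^2
  &= |B^* E x|^2 - |u - B^* E x|^2
   \;\le\; x^* E B B^* E\, x,
   \qquad \text{with equality at } u = B^* E x, \\
2(Ex)^* A x &= x^* (EA + A^* E)\, x, \qquad |Cx|^2 = x^* C^* C\, x .
\end{align*}
```

Adding the three contributions shows that the inequality holds for all $u$ exactly when $x^*[EA + A^*E + EBB^*E + C^*C]x \leq 0$, and requiring this for all $x$ gives the matrix inequality.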
It is inequality (8.3), applied in many, many contexts, which leads to positive semidefinite inequalities throughout all of linear systems theory.
As an aside we return to the very special case of dissipativity, namely Lyapunov stability, described in Section 2.2.1. Our discussion starts with the miracle of inequality (8.3): when $B = 0$ it becomes the Lyapunov inequality. However, this is merely magic (no miracle whatsoever), the trick being that if the input $u$ is identically zero, then dissipativity implies stability. The converse is less intuitive, but true: stability of $\dot{x} = Ax$ implies existence of a virtual potential energy $V(x) = \langle Ex, x \rangle$ and output $C$ making the virtual system dissipative.
Schur complements and linear matrix inequalities
Using Schur complements, the Riccati inequality of (8.3) is equivalent to the inequality
\[
L(E) := -\begin{pmatrix} EA + A^*E + C^*C & EB \\ B^*E & -I \end{pmatrix} \succeq 0.
\]
1A
controller for a helicopter puts in the mathematical systems model for his helicopter and puts in matrices; for example, $A$ is a particular $8 \times 8$ real matrix, etc. Another user who designs a satellite controller might have a 50-dimensional state space and of course would pick completely different $A, B, C$. Essentially any matrices of any compatible dimensions can occur. Any claim we make about our formulas must be valid regardless of the size of the matrices plugged in.
The toolbox designer faces two completely different tasks. One is manipulation of algebraic inequalities; the other is numerical solutions. Often the first is far more daunting since the numerics is handled by some standard package (although for numerics problem size is a demon). Thus there is a great need for algebraic theory. Most of this chapter bears on questions like (3) above, where the unknowns are matrices. The first two questions will not be addressed. Here we treat (3) when there are no $a$ variables. When there are $a$ variables, see [26, 1]. Thus we shall consider polynomials $p(x)$ in free noncommutative variables $x$ and focus on their convexity on free semialgebraic sets.
What are the implications of our study for engineering? Herein you will see strong results on free convexity, but what do they say to an engineer? We foreshadow the forthcoming answer by saying it is fairly negative, but postpone further disclosure till the final page of these writings, not so much to promote suspense as for the conclusion to arrive after you have absorbed the theory.
Quantum phenomena
Free Positivstellensätze (algebraic certificates for positivity), of which Theorem 8.1 is the grandfather, have physical applications. Applications to quantum physics are explained by Pironio, Navascués, and Acín [59], who also consider computational aspects related to noncommutative sum of squares. How this pertains to operator algebras is discussed by Klep and Schweighofer in [47]. The important Bessis-Moussa-Villani conjecture (BMV) from quantum statistical mechanics is tackled in [48, 7]. Doherty et al. [12] employ noncommutative positivity and the Positivstellensatz [37] of the first and the third author to consider the quantum moment problem and multiprover games.
A particularly elegant recent development, independent of the line of history containing the work in this chapter, was initiated by Effros. The classic perspective transformation carries a function on $\mathbb{R}^n$ to a function on $\mathbb{R}^{n+1}$. It is used for various purposes, one being in algebraic geometry to produce blowups of singularities, thereby removing them. It has the property that convex functions map to convex functions. What about convex functions on free variables? This question was asked by Effros and settled affirmatively in [18] for natural cases as a way to show that quantum relative entropy is convex. Subsequently, [19] showed that the perspective transformation in free variables always maps convex functions to convex functions.
Miscellaneous applications
A number of other scientific disciplines use free analysis, though less systematically than in free real algebraic geometry.
Free probability. Voiculescu developed it to attack one of the purest of mathematical questions regarding von Neumann algebras. From the outset (about 20 years ago) it was elegant and it came to have great depth. Subsequently, it was discovered to bear forcefully and effectively on random matrices. The area is vast, so we do not dive in but refer the reader to an introduction [64, 71].
Nonlinear engineering systems. A classical technique in nonlinear systems theory developed by Fliess is based on manipulation of power series with noncommutative variables (the Chen series). The area has a new impetus coming from the
problem of data compression, so now is a time when these correspondences are being
worked out; cf. [21, 22, 52].
8.1.2
Further Reading
We pause here to offer some suggestions for further reading. For further engineering motivation we recommend the paper [65] or the longer version [66] for related new directions. Descriptions of Positivstellensätze are in the surveys [31, 13, 43, 63], with the first three also briskly touring free convexity. The survey article [40] is aimed at engineers.
Noncommutative is a broad term, encompassing essentially all algebras. In
between the extremes of commutative and free lie many important topics, such as
Lie algebras, Hopf algebras, quantum groups, $C^*$-algebras, von Neumann algebras,
etc. For instance, there are elegant noncommutative real algebraic geometry results
for the Weyl algebra [62]; cf. [63].
8.1.3
The goal of this tutorial is to introduce the reader to the main results and techniques
used to study free convexity. Fortunately, the subject is new and the techniques
not too numerous so that one can quickly become an expert.
The basics of free, or noncommutative, polynomials and their evaluations are
developed in Section 8.2. The key notions are positivity and convexity for free polynomials. The principal fact is that the second directional derivative (in direction h)
of a free convex polynomial is a positive quadratic polynomial in h (just like in the
commutative case). Free quadratic (in h) polynomials have a Gram-type representation which thus figures prominently in studying convexity. The nuts and bolts of
this Gram representation and some of its consequences, including Theorem 8.1, are
the subjects of Sections 8.4 and 8.5, respectively.
The Gram representation techniques actually require only a small amount of
convexity, and thus there is a theory of geometry on free varieties having signed
(e.g., positive) curvature. Some details are in Section 8.6.
A couple of free real algebraic geometry results which have a heavy convexity
component are described in the last section, Section 8.7. The first is an optimal
free convex Positivstellensatz which generalizes Theorem 8.1. The second says that
free convex semialgebraic sets are free spectrahedra, giving another example of the
much more rigid structure in the free setting.
Section 8.3 introduces software which handles free noncommutative computations. You may find it useful in your free studies.
In what follows, mildly incorrectly but in keeping with the usage in the literature, the terms noncommutative and free are used synonymously.
8.2
This section treats the basics of polynomials in noncommutative variables, noncommutative differential calculus, and noncommutative inequalities. There is also
a brief introduction to noncommutative rational functions and inequalities.
8.2.1
Noncommutative Polynomials
(8.4)
It is precisely the fact that x1 and x2 do not commute that makes c nonzero.
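While a commutative polynomial $q \in \mathbb{R}[t_1, t_2]$ is naturally evaluated at points $t \in \mathbb{R}^2$, noncommutative polynomials are naturally evaluated on tuples of square matrices. For instance, with
$$X_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad X_2 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},$$
and $X = (X_1, X_2)$, one finds
$$c(X) = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$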
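Evaluating a noncommutative polynomial on matrices is a direct substitution, which the short Python sketch below illustrates. It assumes that $c$ is the commutator $c(x) = x_1x_2 - x_2x_1$ (the displayed definition is not reproduced here, so this is an assumption) and uses the two $2 \times 2$ matrices from the text; the function name `commutator` is hypothetical.

```python
import numpy as np

def commutator(X1, X2):
    """Evaluate c(x) = x1*x2 - x2*x1 at a pair of square matrices (assumed form of c)."""
    return X1 @ X2 - X2 @ X1

X1 = np.array([[0, 1], [1, 0]])
X2 = np.array([[1, 0], [0, 0]])

cX = commutator(X1, X2)
print(cX)

# On commuting inputs (for example 1x1 matrices, i.e. scalars) c vanishes.
scalar_case = commutator(np.array([[2]]), np.array([[5]]))
print(scalar_case)
```

The second evaluation shows why scalar substitutions cannot detect $c$: every pair of $1 \times 1$ matrices commutes.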
(8.5)
$$\tfrac{1}{2}X^4 + \tfrac{1}{2}Y^4 - \left(\tfrac{1}{2}X + \tfrac{1}{2}Y\right)^4 = \begin{pmatrix} 164 & 120 \\ 120 & 84 \end{pmatrix}$$
8.2.2
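Given positive integers $d, d'$, let $\mathbb{R}^{d\times d'}\langle x\rangle$ denote the $d \times d'$ matrices with entries from $\mathbb{R}\langle x\rangle$. Thus elements of $\mathbb{R}^{d\times d'}\langle x\rangle$ are matrix-valued noncommutative polynomials. The involution on $\mathbb{R}\langle x\rangle$ naturally extends to a mapping ${}^{\top} : \mathbb{R}^{d\times d'}\langle x\rangle \to \mathbb{R}^{d'\times d}\langle x\rangle$. In particular, if
$$P = \big(p_{i,j}\big)_{i,j=1}^{d,d'} \in \mathbb{R}^{d\times d'}\langle x\rangle,$$
then
$$P^{\top} = \big(p_{j,i}^{\top}\big)_{i,j=1}^{d',d} \in \mathbb{R}^{d'\times d}\langle x\rangle.$$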
$$\sum_{j=1}^{g} A_j x_j \in \mathbb{S}^{d\times d}\langle x\rangle \tag{8.6}$$
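$$A_1 = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad A_3 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$
Then
$$I + \sum_{j} A_j x_j = \begin{pmatrix} 1 & x_1 & 0 & 0 \\ x_1 & 1 & x_2 & 0 \\ 0 & x_2 & 1 & x_3 \\ 0 & 0 & x_3 & 1 \end{pmatrix}.$$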
0 4 0 2
4 0 2 0
=
0 3 0 0 .
3 0 0 0
'
1
0
with matrix entries. We have reserved the tensor product notation $\otimes$ for the tensor product of matrices and have eschewed the strong temptation of using $A \otimes x$ in place of $Ax$ when $x$ is one of the variables.
Proposition 8.6. Suppose $p \in \mathbb{R}\langle x\rangle$. In increasing levels of generality,
1. if $p(X) = 0$ for all $n$ and all $X \in (\mathbb{S}^{n\times n})^g$, then $p = 0$;
2. if there is a nonempty noncommutative basic open semialgebraic set $O$ such that $p(X) = 0$ on $O$ (meaning for every $n$ and $X \in O(n)$, $p(X) = 0$), then $p = 0$;
3. there is an $N$, depending only upon the degree of $p$, so that for any $n \ge N$ if there is an open subset $O \subset (\mathbb{S}^{n\times n})^g$ with $p(X) = 0$ for all $X \in O$, then $p = 0$.
Proof. See Exercises 8.28, 8.31, and 8.34.
Exercise 8.7. Use Proposition 8.6 to prove the following statement.
Proposition 8.8. Suppose $p \in \mathbb{R}\langle x\rangle$. Show $p(X)$ is symmetric for every $n$ and every $X \in (\mathbb{S}^{n\times n})^g$ if and only if $p^{\top} = p$.
8.2.3
Now we return with a bit more detail to our main theme, convexity. A symmetric polynomial $p$ is matrix convex if, for each positive integer $n$, each pair of $g$-tuples $X = (X_1, \dots, X_g)$ and $Y = (Y_1, \dots, Y_g)$ in $(\mathbb{S}^{n\times n})^g$, and each $0 \le t \le 1$,
$$tp(X) + (1 - t)p(Y) - p\big(tX + (1 - t)Y\big) \succeq 0,$$
where, for an $n \times n$ matrix $A \in \mathbb{R}^{n\times n}$, the notation $A \succeq 0$ means $A$ is positive semidefinite. Synonyms for matrix convex include both noncommutative convex and simply convex.
Exercise 8.9. Show that the definition here of (matrix) convex is equivalent to that given in (8.5) in the informal introduction to noncommutative polynomials.
As we have already seen in the informal introduction to noncommutative polynomials, even in one variable, convexity in the noncommutative setting differs from convexity in the commutative case because here $Y$ need not commute with $X$. Thus, although the polynomial $x^4$ is a convex function of one real variable, it is not matrix convex. On the other hand, to verify that $x^2$ is a matrix convex polynomial, observe that
$$tX^2 + (1 - t)Y^2 - (tX + (1 - t)Y)^2 = t(1 - t)(X^2 - XY - YX + Y^2) = t(1 - t)(X - Y)^2 \succeq 0.$$
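Both claims are easy to probe numerically. The sketch below, in Python with NumPy, evaluates the convexity gap $tp(X) + (1-t)p(Y) - p(tX + (1-t)Y)$ at $t = 1/2$ for $p = x^2$ and $p = x^4$. The particular symmetric matrices `X` and `Y` are an assumed test pair chosen for illustration, and `convexity_gap` is a hypothetical helper name.

```python
import numpy as np
from numpy.linalg import matrix_power, eigvalsh

def convexity_gap(p, X, Y, t=0.5):
    """t*p(X) + (1-t)*p(Y) - p(t*X + (1-t)*Y); PSD for all X, Y, t iff p is matrix convex."""
    return t * p(X) + (1 - t) * p(Y) - p(t * X + (1 - t) * Y)

# Assumed test pair of symmetric matrices (they do not commute).
X = np.array([[4.0, 2.0], [2.0, 2.0]])
Y = np.array([[2.0, 0.0], [0.0, 0.0]])

gap_sq = convexity_gap(lambda M: matrix_power(M, 2), X, Y)
gap_4 = convexity_gap(lambda M: matrix_power(M, 4), X, Y)

print(gap_sq, eigvalsh(gap_sq))   # equals (X - Y)^2 / 4, no negative eigenvalue
print(gap_4, eigvalsh(gap_4))     # has a negative eigenvalue, so x^4 is not matrix convex
```

A single pair with an indefinite gap is enough to refute matrix convexity of $x^4$, whereas the positive result for $x^2$ rests on the algebraic identity above rather than on any finite number of samples.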
A polynomial $p \in \mathbb{R}\langle x\rangle$ is matrix positive, synonymously noncommutative positive or simply positive, if $p(X) \succeq 0$ for all tuples $X = (X_1, \dots, X_g) \in (\mathbb{S}^{n\times n})^g$.
8.2.4
Thus p
()
We let $p'(x)[h]$ denote the first derivative, and the Hessian, denoted $p''(x)[h]$, of $p(x)$ is the second directional derivative of $p$ in the direction $h$.
Equivalently, the Hessian of $p(x)$ can also be defined as the part of the polynomial
$$r(x)[h] := 2\big(p(x + h) - p(x)\big)$$
in
$$\mathbb{R}\langle x\rangle[h] := \mathbb{R}\langle x_1, \dots, x_g, h_1, \dots, h_g\rangle$$
that is homogeneous of degree two in $h$.
If $p'' \ne 0$, that is, if $p = p(x)$ is a noncommutative polynomial of degree two or more, then the polynomial $p''(x)[h]$ in the $2g$ variables $x_1, \dots, x_g, h_1, \dots, h_g$ is homogeneous of degree 2 in $h$ and has degree equal to the degree of $p$.
Example 8.11.
(1) The Hessian of the polynomial $p = x_1^2x_2$ is
$$p''(x)[h] = 2(h_1^2x_2 + h_1x_1h_2 + x_1h_1h_2).$$
(2) The Hessian of the polynomial $f(x) = x^4$ (just one variable) is
$$f''(x)[h] = 2(h^2x^2 + hxhx + hx^2h + xhxh + xh^2x + x^2h^2).$$
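The Hessian of a noncommutative polynomial can be computed word by word: in each monomial, replace two of the $x$ letters by the matching $h$ letters in every possible way, and multiply by 2. The following Python sketch implements this under an assumed representation (a polynomial is a dictionary from words, stored as tuples of variable indices, to coefficients); the representation and the function name `hessian` are illustrative choices, not part of any package mentioned in this chapter.

```python
from collections import defaultdict
from itertools import combinations

def hessian(poly):
    """Second directional derivative of a noncommutative polynomial.

    poly maps words (tuples of variable indices, e.g. (1, 1, 2) for x1*x1*x2)
    to coefficients. The result maps words over letters ('x', i) and ('h', i)
    to coefficients. Each pair of positions receives an h, with a factor 2
    coming from d^2/dt^2 of t^2.
    """
    out = defaultdict(int)
    for word, coeff in poly.items():
        for i, j in combinations(range(len(word)), 2):
            new_word = tuple(
                ('h' if pos in (i, j) else 'x', var)
                for pos, var in enumerate(word)
            )
            out[new_word] += 2 * coeff
    return dict(out)

# p = x1^2 * x2, as in part (1) of the example
hess_p = hessian({(1, 1, 2): 1})
# f = x^4 in one variable, as in part (2)
hess_f = hessian({(1, 1, 1, 1): 1})

for word, coeff in hess_p.items():
    print(coeff, "*".join(f"{kind}{idx}" for kind, idx in word))
```

Words of length less than two contribute nothing, which matches the fact that the Hessian of a polynomial of degree at most one is zero.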
Noncommutative convexity is neatly described in terms of the Hessian.
Lemma 8.12. $p \in \mathbb{R}\langle x\rangle$ is noncommutative convex if and only if $p''(x)[h]$ is noncommutative positive.
Proof. See Exercise 8.26.
8.2.5
To this point, our variables $x$ have been symmetric in the sense that, under the involution, $x_j^{\top} = x_j$. The corresponding polynomials, elements of $\mathbb{R}\langle x\rangle$, are then the noncommutative analogue of polynomials in real variables, with evaluations at tuples in $\mathbb{S}^{n\times n}$. In various applications and settings it is natural to consider noncommutative polynomials in other types of variables.
Free variables
The noncommutative analogue of polynomials in complex variables is obtained by allowing evaluations on tuples $X$ of not necessarily symmetric matrices. In this case, the involution must be interpreted differently, and the variables are called free.
In this setting, given the noncommutative variables $x = (x_1, \dots, x_g)$, let $x^{\top} = (x_1^{\top}, \dots, x_g^{\top})$ denote another collection of noncommutative variables. On the ring $\mathbb{R}\langle x, x^{\top}\rangle$ define the involution ${}^{\top}$ by requiring $x_j \mapsto x_j^{\top}$; $x_j^{\top} \mapsto x_j$; ${}^{\top}$ reverses the order of words; and linearity. For instance, for
q(x) = 1 + x1 x2 x2 x1 R<x, x >,
we have
Elements of $\mathbb{R}\langle x, x^{\top}\rangle$ are polynomials in free variables, and in this setting the variables themselves are free.
A polynomial $p \in \mathbb{R}\langle x, x^{\top}\rangle$ is symmetric provided $p^{\top} = p$. In particular, $q$ above is not symmetric, but
p = 1 + x1 x2 + x2 x1
(8.7)
is.
p(X) =
3 0
.
0 1
The space $\mathbb{R}^{d\times d'}\langle x, x^{\top}\rangle$ is defined by analogy with $\mathbb{R}^{d\times d'}\langle x\rangle$, and evaluation of elements in $\mathbb{R}^{d\times d'}\langle x, x^{\top}\rangle$ at a tuple $X \in (\mathbb{R}^{\ell\times\ell})^g$ is defined in the obvious way.
Exercise 8.13. State and prove analogues of Propositions 8.6 and 8.8 for $\mathbb{R}\langle x, x^{\top}\rangle$ and evaluations from $(\mathbb{R}^{\ell\times\ell})^g$.
Mixed variables
At times it is desirable to mix free and symmetric variables. We won't introduce notation for this situation, as it will generally be understood from the context. Here are some examples:
Example 8.14.
$$p(x) = x_1^{\top}x_1 + x_2 + \tfrac{3}{4}x_1^{\top}x_2x_1, \qquad x_2 = x_2^{\top}; \tag{8.8}$$
$$x = x^{\top}.$$
In the first case $x_1$ is free, but $x_2$ is symmetric; and in the second $a_1$ and $a_2$ are free, but $x$ is symmetric. Two additional remarks are in order about the second polynomial. First, it is a Riccati polynomial ubiquitous in control theory. Second, we have separated the variables into two classes of variables, the $a$ variables and the $x$ variable(s); thus $p \in \mathbb{R}\langle a, x = x^{\top}\rangle$. In applications, the $a$ variables can be chosen to represent known (system parameters), while the $x$ variables are unknown(s). Of course, it could be that some of the $a$ variables are symmetric and some free and ditto for the $x$ variables.
Example 8.15. Various directional derivatives of $p$ in (8.8) are
$$D_{x_1}p(x)[h_1] = h_1^{\top}x_1 + x_1^{\top}h_1 + \tfrac{3}{4}h_1^{\top}x_2x_1 + \tfrac{3}{4}x_1^{\top}x_2h_1,$$
$$D_{x_2}p(x)[h_2] = h_2 + \tfrac{3}{4}x_1^{\top}h_2x_1,$$
$$D_xp(x)[h] = h_1^{\top}x_1 + x_1^{\top}h_1 + h_2 + \tfrac{3}{4}h_1^{\top}x_2x_1 + \tfrac{3}{4}x_1^{\top}x_2h_1 + \tfrac{3}{4}x_1^{\top}h_2x_1.$$
Continuing with the variable class warfare, consider the following matrix-valued example.
a x + xa1
L(a1 , a2 , x) = 1
xa2
a2 x
.
1
We consider $L \in \mathbb{R}^{2\times 2}\langle a, x = x^{\top}\rangle$; i.e., the $a$ variables are free, and the $x$-variables symmetric. Note that $L$ is linear in $x$ if we consider $a_1, a_2$ fixed. Of course, if $a_1$, $a_2$, and $x$ are all scalars, then using Schur complements tells us there is a close relation between $L$ in this example and the Riccati of the previous example.
8.2.6
While it is possible to dene noncommutative functions [67, 64, 69, 70, 60, 61, 46,
28, 29], in this section we content ourselves with a relatively informal discussion of
noncommutative rational functions [10, 11, 41, 45].
Rational functions, a gentle introduction
Noncommutative rational expressions are obtained by allowing inverses of polynomials. An example is the discrete time algebraic Riccati equation
r(a, x) = a1 xa1 (a1 xa2 )a1 (a3 + a2 xa2 )1 (a2 xa1 ) + a4 ,
x = x .
(8.9)
Thus, we define (scalar) noncommutative rational expressions for free noncommutative variables $x$ by starting with noncommutative polynomials and then applying successive arithmetic operations: addition, multiplication, and inversion. We emphasize that an expression includes the order in which it is composed, and no two distinct expressions are identified, e.g., $(x_1) + (-x_1)$, $(-1) + (((x_1)^{-1})(x_1))$, and $0$ are different noncommutative rational expressions.
Evaluation on polynomials naturally extends to rational expressions. If $r$ is a rational expression in free variables and $X \in (\mathbb{R}^{\ell\times\ell})^g$, then $r(X)$ is defined, in the obvious way, as long as any inverses appearing actually exist. Indeed, our main interest is in the evaluation of a rational expression. For instance, for the rational expression $s$ above in one free variable, $s(X)$ is defined as long as $I - XX^{\top}$ is invertible and in this case,
$$s(X) = X^{\top}(I - XX^{\top})^{-1}.$$
Generally, a noncommutative rational expression $r$ can be evaluated on a $g$-tuple $X$ of $n \times n$ matrices in its domain of regularity, $\operatorname{dom} r$, which is defined as the set of all $g$-tuples of square matrices of all sizes such that all the inverses involved in the calculation of $r(X)$ exist. For example, if $r = (x_1x_2 - x_2x_1)^{-1}$, then $\operatorname{dom} r = \{X = (X_1, X_2) : \det(X_1X_2 - X_2X_1) \ne 0\}$. We assume that $\operatorname{dom} r \ne \emptyset$. In other words,
(8.11)
Exercise 8.18. Consider the function $W$ from (8.10). Let $R, X$ be $n \times n$ matrices and assume $c\big(X, c(X, R)^{-1}\big)$ exists and is invertible. Prove the following:
(1) If $n = 2$, then $W(R) = 0$.
(2) If $n = 3$, then $W(R) = \det(c(X, R))$.
Exercise 8.19. Consider Bergman's rational function (8.11).
(1) Show that on a dense set of $2 \times 2$ matrices $(X, Y)$, $b(X, Y) = 0$.
(2) Prove that on a dense set of $3 \times 3$ matrices $(X, Y)$, $b(X, Y) = 1$.
The moral of Exercise 8.19 is that, unlike in the case of polynomial identities, a noncommutative rational function that vanishes on (a dense set of) $3 \times 3$ matrices need not vanish on (a dense set of) $2 \times 2$ matrices.
Matrices of rational functions; $LDL^{\top}$
One of the main ways noncommutative rational functions occur in systems engineering is in the manipulation of matrices of polynomials. Extremely important is the $LDL^{\top}$ decomposition. Consider the $2 \times 2$ matrix with noncommutative entries
$$M = \begin{pmatrix} a & b^{\top} \\ b & c \end{pmatrix},$$
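$$D = \operatorname{diag}\big(d_1, \dots, d_k, D_{k+1}, \dots, D_{\ell}, E\big) \quad \text{(block diagonal)} \tag{8.12}$$
and
$$L = \begin{pmatrix} 1 & 0 & \cdots & \cdots & \cdots & \cdots & 0 \\ * & \ddots & \ddots & & & & \vdots \\ * & * & 1 & 0 & & & \vdots \\ * & * & * & I_2 & 0 & & \vdots \\ \vdots & & & * & \ddots & \ddots & \vdots \\ * & * & * & * & * & I_2 & 0 \\ * & * & * & * & * & * & I_a \end{pmatrix}, \tag{8.13}$$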
where $d_j$ are symmetric rational functions, and the $D_j$ are nonzero $2 \times 2$ matrices of the form
$$D_j = \begin{pmatrix} 0 & b_j \\ b_j^{\top} & 0 \end{pmatrix}.$$
$E$ is a square 0 matrix (possibly of size $0 \times 0$ and thus absent), $I_2$ is the $2 \times 2$ identity, the $*$s represent possibly nonzero rational expressions (in some cases matrices of rational functions), some of the 0s are zero matrices (of the appropriate sizes), and $a$ is the dimension of the space that $E$ acts upon. The permutation is necessary in cases where the procedure hits a 0 on the diagonal, necessitating a permutation to bring a nonzero diagonal entry into the pivot position.
8.2.7
Exercises
Exercise 8.22.
(a) What is the derivative with respect to $x_1$ in direction $h_1$ of $q$ and $s$?
(b) Concerning the formal derivative with respect to $x_1$ in direction $h_1$,
(i) show the derivative of $r(x_1) = x_1^{-1}$ is $-x_1^{-1}h_1x_1^{-1}$;
d2 f (X+tH)
dt2
Symn
k
gj .
(8.14)
j=0
Using the staircase matrices $E_{11}, E_{12}, E_{22}, E_{23}, \dots, E_{n-1,n}, E_{nn}$ show that a nonzero multilinear polynomial $q$ of degree $n$ cannot vanish on all $n \times n$ matrices.
(c) By (a) we may assume $p$ is homogeneous. By induction on the biggest degree a variable in $p$ can have, prove that $p = 0$. Hint: What are the degrees of the variables appearing in
$$p(x_1 + \hat{x}_1, x_2, \dots, x_g) - p(x_1, x_2, \dots, x_g) - p(\hat{x}_1, x_2, \dots, x_g)?$$
Exercise 8.30. Redo Exercise 8.29 for a polynomial
(a) $p \in \mathbb{R}\langle x, x^{\top}\rangle$, not necessarily analytic, vanishing on all tuples of matrices;
(b) $p \in \mathbb{R}\langle x\rangle$ vanishing on all tuples of symmetric matrices.
Exercise 8.31. Show that if $p \in \mathbb{R}\langle x\rangle$ vanishes on a nonempty basic open semialgebraic set, then $p = 0$.
Exercise 8.32. Suppose $p \in \mathbb{R}\langle x\rangle$, $n$ is a positive integer, and $O \subset (\mathbb{S}^{n\times n})^g$ is an open set. Show that if $p(X) = 0$ for each $X \in O$, then $p(X) = 0$ for each $X \in (\mathbb{S}^{n\times n})^g$. Hint: Given $X_0 \in O$ and $X \in (\mathbb{S}^{n\times n})^g$, consider the matrix valued polynomial
$$q(t) = p(X_0 + tX).$$
Exercise 8.33. Suppose $r \in \mathbb{R}(\!\langle x\rangle\!)$ is a rational function and there is a nonempty noncommutative basic open semialgebraic set $O \subset \operatorname{dom}(r)$ with $r|_O = 0$. Show that $r = 0$.
Exercise 8.34. Prove item (3) of Proposition 8.6. You may wish to use Exercises 8.32 and 8.28.
Exercise 8.35. Prove the following proposition.
Proposition 8.36. If $\pi : \mathbb{R}\langle x\rangle \to \mathbb{R}^{n\times n}$ is an involution preserving homomorphism, then there is an $X \in (\mathbb{S}^{n\times n})^g$ such that $\pi(p) = p(X)$; i.e., all finite dimensional representations of $\mathbb{R}\langle x\rangle$ are evaluations.
Exercise 8.37. Do the algebra to show
$$x^{\top}(1 - xx^{\top})^{-1} = (1 - x^{\top}x)^{-1}x^{\top}.$$
(This is a key fact used in the model theory for contractions [55].)
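Before doing the algebra, the identity can be checked numerically on a random matrix. The sketch below uses Python with NumPy and assumes the identity reads $X^{\top}(I - XX^{\top})^{-1} = (I - X^{\top}X)^{-1}X^{\top}$; the matrix is rescaled to have spectral norm $1/2$ so that both inverses exist. The seed, size, and scaling are arbitrary illustrative choices, and a numerical check is of course not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
X = rng.standard_normal((n, n))
X /= 2 * np.linalg.norm(X, 2)      # rescale so the spectral norm is 1/2

I = np.eye(n)
lhs = X.T @ np.linalg.inv(I - X @ X.T)
rhs = np.linalg.inv(I - X.T @ X) @ X.T

max_diff = np.abs(lhs - rhs).max()
print(max_diff)                     # of the order of machine precision
```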
Exercise 8.38. Give an example of symmetric $2 \times 2$ matrices $X, Y$ such that $X \succeq Y \succeq 0$ but $X^2 \not\succeq Y^2$.
This failure of a basic order property of $\mathbb{R}$ for $\mathbb{S}^{n\times n}$ is closely related to the rigid nature of positivity and convexity in the noncommutative setting.
Exercise 8.39. Antiderivatives.
(a) Is q(x)[h] = xh + hx the derivative of any noncommutative polynomial p? If so,
what is p?
(b) Is q(x)[h] = hhx + hxh + xhh the second derivative of any noncommutative
polynomial p? If so, what is p?
(c) Describe in general which polynomials q(x)[h] are the derivative of some noncommutative polynomial p(x).
(d) Check your answer against the theory in [23].
Exercise 8.40. (Requires background in algebra) Show that $\mathbb{R}(\!\langle x\rangle\!)$ is a division ring; i.e., the noncommutative rational functions form a ring in which every nonzero element is invertible.
Exercise 8.41. In this exercise we will establish that it is possible to embed the free algebra $\mathbb{R}\langle x_1, \dots, x_g\rangle$ into $\mathbb{R}\langle x, y\rangle$ for any $g \in \mathbb{N}$.
(a) Show that the subalgebra of $\mathbb{R}\langle x, y\rangle$ generated by $xy^n$, $n \in \mathbb{N}_0$, is free.
(b) Ditto for the subalgebra generated by
$$x_1 = x, \quad x_2 = c(x_1, y), \quad x_3 = c(x_2, y), \quad \dots, \quad x_n = c(x_{n-1}, y), \ \dots.$$
X1 + Y1
2
&
'4
X2 + Y2
2
'4
0.
8.3
There are several computer algebra packages available to ease the first contact with free convexity and positivity. In this section we briefly describe two of them:
(1) NCAlgebra running under Mathematica;
(2) NCSOStools running under MATLAB.
The former is more universal in that it implements manipulation with noncommutative variables, including noncommutative rationals, and several algorithms pertaining to convexity. The latter is focused on noncommutative positivity and numerics.
8.3.1
NCAlgebra
NCAlgebra [42] runs under Mathematica and gives it the capability of manipulating
noncommuting algebraic expressions. An important part of the package (which we
shall not go into here) is NCGB, which computes noncommutative Groebner bases
and has extensive sorting and display features as well as algorithms for automatically
discarding redundant polynomials.
We recommend that the user have a look at the Mathematica notebook
NCBasicCommandsDemo available from the NCAlgebra website
http://math.ucsd.edu/~ncalg/
for the basic commands and their usage in NCAlgebra. Here is a sample.
The basic ingredients are (symbolic) variables, which can be either noncommutative or commutative. At present, single-letter lowercase variables are noncommutative by default and all others are commutative by default. To change this one
can employ
Slightly more advanced is the NCAlgebra command to generate the directional derivative of a polynomial $p(x, y)$ with respect to $x$, which is denoted by $D_xp(x, y)[h]$:
NCAlgebra Command: DirectionalD[Function p, x, h], and is abbreviated
NCAlgebra Command: DirD.
Example 8.44. Consider
a = x ** x ** y - y ** x ** y
Then
DirD[a, x, h] = (h ** x + x ** h) ** y - y ** h ** y
or in expanded form,
NCExpand[DirD[a, x, h]] = h ** x ** y + x ** h ** y - y ** h ** y
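For readers without Mathematica at hand, the same directional derivative can be reproduced with a few lines of Python. This is a minimal sketch under an assumed representation (a polynomial is a dictionary from words, stored as tuples of variable names, to integer coefficients); it is not NCAlgebra and does not use its API, and the name `directional_derivative` is hypothetical.

```python
from collections import defaultdict

def directional_derivative(poly, var, direction):
    """Derivative of a noncommutative polynomial with respect to `var` in `direction`.

    Each occurrence of `var` in each word is replaced, one at a time, by `direction`.
    """
    out = defaultdict(int)
    for word, coeff in poly.items():
        for pos, letter in enumerate(word):
            if letter == var:
                out[word[:pos] + (direction,) + word[pos + 1:]] += coeff
    return {w: c for w, c in out.items() if c != 0}

# a = x**x**y - y**x**y, as in the example above
a = {('x', 'x', 'y'): 1, ('y', 'x', 'y'): -1}
da = directional_derivative(a, 'x', 'h')

for word, coeff in da.items():
    print(coeff, " ** ".join(word))
```

The result lists the three expanded terms h ** x ** y, x ** h ** y, and y ** h ** y with coefficients 1, 1, and -1, in agreement with the expanded form shown above.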
8.3.2
NCSOStools
8.4
A Gram-like Representation
The next two sections are devoted to a powerful representation of quadratic functions $q$ in noncommutative variables which takes a strong form when $q$ is matrix positive; we call it a QuadratischePositivstellensatz. Ultimately we shall apply this to $q(x)[h] = p''(x)[h]$ and show that if $p$ is matrix convex (i.e., $q$ is matrix positive), then $p$ has degree at most 2. We begin by illustrating our grand scheme with examples.
8.4.1
Example 8.53. The (symmetric) polynomial $p(x) = x_1x_2x_1 + x_2x_1x_2$ (in symmetric variables) has Hessian $q(x)[h] = p''(x)[h]$, which is homogeneous quadratic in $h$ and is
$$q(x)[h] = 2h_1h_2x_1 + 2h_1x_2h_1 + 2h_2h_1x_2 + 2h_2x_1h_2 + 2x_1h_2h_1 + 2x_2h_1h_2.$$
2
q(x)[h] = h1
h2
x2 h1
2x2
3 0
x1 h2
0
2
0
2x1
2
0
0
2
0
0
2
h1
0
h2 .
0 h1 x2
0 h2 x1
h1
0
2x2 x1 2x2
0
0 2
2x1 x2
0
0
2x1 2 0
h2
2x1
h1 x2
0
0
2
0
0
.
2x2
2
0
0 0
h2 x1
0
2
0
0
0 0 h1 x2 x1
2
0
0
0
0 0 h2 x1 x2
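Example 8.55. In the one variable with $h_1 = h_1^{\top}$ we abbreviate $h_1$ to $h$. Fix some noncommutative variables, not necessarily symmetric, $w := (a, b, d, e)$ and consider
$$q(w)[h] := hah + e^{\top}hbh + hb^{\top}he + e^{\top}hdhe, \tag{8.15}$$
which is a quadratic function of $h$. It can be written in the BV-MM form
$$q(w)[h] = \begin{pmatrix} h & e^{\top}h \end{pmatrix} \begin{pmatrix} a & b^{\top} \\ b & d \end{pmatrix} \begin{pmatrix} h \\ he \end{pmatrix}. \tag{8.16}$$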
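$$\begin{aligned} q(x)[h] = {} & 2(h^2x^2 + xh^2x + x^2h^2) \\ & + 2(hxhx + xhxh) \\ & + 2hx^2h, \end{aligned} \tag{8.17}$$
a polynomial that is homogeneous of degree 2 in $h$ that can be expressed as
$$q(x)[h] = 2\begin{pmatrix} h & xh & x^2h \end{pmatrix} \begin{pmatrix} x^2 & x & 1 \\ x & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} h \\ hx \\ hx^2 \end{pmatrix}.$$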
Notice that the contribution of the main antidiagonal of the middle matrix for $q$ in Example 8.56 (all 1's) corresponds to the right-hand side of the first line of (8.17). Indeed, each antidiagonal corresponds to a line of (8.17).
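A border vector and middle matrix representation can be verified numerically by substituting symmetric matrices. The Python sketch below does this for the Hessian of $x^4$, assuming the border vector $(h, hx, hx^2)$ and the middle matrix with rows $(x^2, x, 1)$, $(x, 1, 0)$, $(1, 0, 0)$, together with an overall factor 2; the random test matrices and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3

def random_symmetric(size):
    M = rng.standard_normal((size, size))
    return (M + M.T) / 2

X, H = random_symmetric(n), random_symmetric(n)
I, O = np.eye(n), np.zeros((n, n))
X2, H2 = X @ X, H @ H

# Hessian of x^4: twice the sum of all words with two h's and two x's.
hess = 2 * (H2 @ X2 + H @ X @ H @ X + H @ X2 @ H
            + X @ H @ X @ H + X @ H2 @ X + X2 @ H2)

# Border vector V = (h, hx, hx^2) stacked as a 3n x n block column,
# and middle matrix Z(X) as a 3n x 3n block matrix.
V = np.vstack([H, H @ X, H @ X2])
Z = np.block([[X2, X, I],
              [X,  I, O],
              [I,  O, O]])

bvmm = 2 * V.T @ Z @ V
print(np.abs(hess - bvmm).max())
```

Because `X` and `H` are symmetric, the transpose of the block column `V` is the block row $(h, xh, x^2h)$ evaluated at the same matrices, so `V.T @ Z @ V` is exactly the evaluated representation.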
Exercise 8.57. In Example 8.56, for which symmetric matrices $X$ is $Z(X)$ positive semidefinite?
Exercise 8.58. What is the middle matrix $Z(x)$ for $p(x) = x^3$? For which symmetric matrices $X$ is $Z(X)$ positive semidefinite?
Exercise 8.59. Compute middle matrix representations using NCAlgebra. The command is
{lt, mq, rt} =NCMatrixOfQuadratic[q, {h, k}]
In the output mq is the middle matrix, rt is the border vector, and lt is $(\mathrm{rt})^{\top}$. For examples, see NCConvexityRegionDemo.nb in the NC/DEMOS directory.
The positivity of q vs. positivity of the middle matrix
In this section we let q(x)[h] denote a polynomial which is homogeneous of degree
two in h, but which is not necessarily the Hessian of a noncommutative polynomial.
While we have focused on Hessians, such a q will still have a BV-MM representation. So what good is this representation? After all one expects that q could have
wonderful properties, such as positivity, which are not shared by its middle matrix.
No, the striking thing is that positivity of q implies positivity of the middle matrix.
Roughly we shall prove what we call the QuadratischePositivstellensatz, which is
essentially Theorem 3.1 of [9].
Theorem 8.60. If the polynomial$^2$ $q(x)[h]$ is homogeneous quadratic in $h$, then $q$ is matrix positive if and only if its middle matrix $Z$ is matrix positive.
$^2$ This theorem is true (but not proved here) for $q$ which are noncommutative rational in $x$.
More generally, suppose $O$ is a nonempty noncommutative basic open semialgebraic set. If $q(X)[H]$ is positive semidefinite for all $n \in \mathbb{N}$, $X \in O(n)$, and $H \in (\mathbb{S}^{n\times n})^g$, then $Z(X) \succeq 0$ for all $X \in O$.
We emphasize that, in the theorem, the convention that the terms of the border vector are distinct is in force.
To foreshadow Section 8.5 and to give an idea of the proof of Theorem 8.60, we illustrate it on an example in one variable. This time we use a free rather than symmetric variable since proofs are a bit easier.
Consider the noncommutative quadratic function $q$ given by
$$q(w)[h] := h^{\top}bh + e^{\top}h^{\top}ch + h^{\top}c^{\top}he + e^{\top}h^{\top}ahe, \tag{8.18}$$
where $w = (a, b, c, e)$. The border vector $V(w)[h]$ and the coefficient matrix $Z(w)$ with noncommutative entries are
$$V(w)[h] = \begin{pmatrix} h \\ he \end{pmatrix} \quad \text{and} \quad Z(w) = \begin{pmatrix} b & c^{\top} \\ c & a \end{pmatrix};$$
that is, $q$ has the form
$$q(w)[h] = V(w)[h]^{\top}Z(w)V(w)[h] = \begin{pmatrix} h^{\top} & e^{\top}h^{\top} \end{pmatrix} \begin{pmatrix} b & c^{\top} \\ c & a \end{pmatrix} \begin{pmatrix} h \\ he \end{pmatrix}.$$
3
v E H .
Thus we are finished unless for all $v$ the vectors $v$ and $Ev$ are linearly dependent. That is, for all $v$, $\lambda_1(v)v + \lambda_2(v)Ev = 0$ for nonzero $\lambda_1(v)$ and $\lambda_2(v)$. Note $\lambda_2(v) \ne 0$, unless $v = 0$. Set $\lambda(v) := \frac{\lambda_1(v)}{\lambda_2(v)}$; then the linear dependence becomes $\lambda(v)v + Ev = 0$ for all $v$. It turns out that this does not happen unless $E = \lambda I$ for some $\lambda \in \mathbb{R}$. This is a baby case of Theorem 8.92 which comes later and is a subject unto itself.
To finish the proof pick a $v$ which makes $R_v$ equal all of $\mathbb{R}^{2n}$. Then $v^{\top}q(W)[H]v \ge 0$ implies that $Z \succeq 0$ by (8.19).
8.4.2
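$$q(x)[h] = \begin{pmatrix} V_0^{\top} & V_1^{\top} & \cdots & V_{\ell-1}^{\top} & V_{\ell}^{\top} \end{pmatrix} \begin{pmatrix} Z_{00} & Z_{01} & \cdots & Z_{0,\ell-1} & Z_{0\ell} \\ Z_{10} & Z_{11} & \cdots & Z_{1,\ell-1} & 0 \\ \vdots & \vdots & & & \vdots \\ Z_{\ell-1,0} & Z_{\ell-1,1} & 0 & \cdots & 0 \\ Z_{\ell 0} & 0 & 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} V_0 \\ V_1 \\ \vdots \\ V_{\ell-1} \\ V_{\ell} \end{pmatrix}, \tag{8.21}$$
where the following hold:
1. The degree $d$ of $q(x)[h]$ is $d = \ell + 2$.
2. $V_j = V_j(x)[h]$, $j = 0, \dots, \ell$, is a vector of height $g^{j+1}$ whose entries are monomials of degree $j$ in the $x$ variables and degree 1 in the $h$ variables. The $h$ always appears to the left. In particular, $V(x)[h]$ is a vector of height $g(1 + g + \cdots + g^{\ell})$, the sum being the one in (8.14).
3. $Z_{ij} = Z_{ij}(x)$ is a matrix of size $g^{i+1} \times g^{j+1}$ whose entries are polynomials in the noncommuting variables $x_1, \dots, x_g$ of degree $\le \ell - (i + j)$. In particular, $Z_{i,\ell-i} = Z_{i,\ell-i}(x)$ is a constant matrix for $i = 0, \dots, \ell$.
4. $Z_{ij}^{\top} = Z_{ji}$.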
to check that a minimal length border vector contains distinct monomials, and once
the ordering of entries of V is set, the middle matrix for a given q is unique; see
Lemma 8.62 below.
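Example 8.61. Returning to Example 8.54, we have for the middle matrix representation of $q$ that
$$V_0 = \begin{pmatrix} h_1 \\ h_2 \end{pmatrix}, \qquad V_1 = \begin{pmatrix} h_1x_2 \\ h_2x_1 \end{pmatrix}, \qquad V_2 = \begin{pmatrix} h_1x_2x_1 \\ h_2x_1x_2 \end{pmatrix},$$
and, for instance,
$$Z_{00} = \begin{pmatrix} 0 & 2x_2x_1 \\ 2x_1x_2 & 0 \end{pmatrix}, \qquad Z_{01} = \begin{pmatrix} 2x_2 & 0 \\ 0 & 2x_1 \end{pmatrix}, \qquad Z_{02} = \begin{pmatrix} 0 & 2 \\ 2 & 0 \end{pmatrix}.$$
Note that generically for a polynomial $q$ in two variables the $V_j$ have additional terms. For instance, usually $V_1$ is the column
$$\begin{pmatrix} h_1x_1 \\ h_1x_2 \\ h_2x_1 \\ h_2x_2 \end{pmatrix}.$$
Likewise generically $V_2$ has eight terms. As for the $Z_{ij}$, $Z_{01}$, for instance, is generically $2 \times 4$.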
Lemma 8.62. The entries in the middle matrix $Z(x)$ are uniquely determined by the polynomial $q(x)[h]$ and the border vector $V(x)[h]$.
Proof. Note every monomial in $q(x)[h]$ has the form
$$m_L h_i m_M h_j m_R.$$
Define
$$R_j := \{h_j m : m_L h_i m_M h_j m \text{ is a term in } q(x)[h]\}.$$
Given the representation $V^{\top}ZV$ for $q$, let $E_V$ denote the monomials in $V$. Then it is clear that each monomial in $E_V$ must occur in some term of $q$, so it appears in $R_j$ for some $j$. Conversely, each term $h_j m$ in $R_j$ corresponds to at least one term $m_L h_i m_M h_j m$ of $q$, so it must be in $E_V$.
Exercise 8.63. Consider (8.21) and prove the degree bound on the $Z_{ij}$ in (3). Hint: Read Example 8.64 first.
Example 8.64. If $p(x)$ is a symmetric polynomial of degree $d = 4$ in $g$ noncommuting variables, then the middle matrix $Z(x)$ in the representation of the Hessian $p''(x)[h]$ is
where the block entries Zij = Zij (x) have the following structure:
Z00
Z01
Z02
All of these are proved merely by keeping track of the degrees. For example, the contribution of $Z_{02}$ to $p''$ is $V_0^{\top}Z_{02}V_2$, whose degree is
$$\deg(V_0^{\top}) + \deg(Z_{02}) + \deg(V_2) = 1 + \deg(Z_{02}) + 3 \le 4,$$
so $\deg(Z_{02}) = 0$.
8.4.3
The middle matrix $Z(x)$ of the Hessian $p''(x)[h]$ of a noncommutative symmetric
polynomial p(x) plays a key role. These middle matrices have a very rigid structure
similar to that in Example 8.56. We illustrate with an example and then with
exercises.
Example 8.65. As a warm-up we rst illustrate that Z02 (X) = 0 if and only if
Z11 (X) = 0 for Example 8.54. To this end, observe that the contribution of the
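middle matrix's extreme outer diagonal element $Z_{02}$ to $q$ is as follows:
$$V_0(x)[h]^{\top}Z_{02}(x)V_2(x)[h] = \begin{pmatrix} h_1 & h_2 \end{pmatrix} \begin{pmatrix} 0 & 2 \\ 2 & 0 \end{pmatrix} \begin{pmatrix} h_1x_2x_1 \\ h_2x_1x_2 \end{pmatrix} = 2h_1h_2x_1x_2 + 2h_2h_1x_2x_1.$$
Substitute $h_j \mapsto x_j$ and get $2x_1x_2x_1x_2 + 2x_2x_1x_2x_1$, which is $2p(x)$. That is,
$$p(x) = \tfrac{1}{2}V_0(x)[x]^{\top}Z_{02}(x)V_2(x)[x],$$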
$$\tfrac{1}{2}V_1(x)[x]^{\top}Z_{11}(x)V_1(x)[x].$$
Exercise 8.67. Suppose $p$ is homogeneous of degree $d$ and its Hessian $q$ has the BV-MM representation $q(x)[h] = V(x)[h]^{\top}Z(x)V(x)[h]$.
(a) Show
$$p = \tfrac{1}{2}V_0(x)[x]^{\top}Z_{0\ell}V_{\ell}(x)[x]$$
$$\tfrac{1}{2}V_1(x)[x]^{\top}Z_{1,\ell-1}(x)V_{\ell-1}(x)[x].$$
Do not cheat and look this up in [14], but do compare with Exercise 8.63.
Exercise 8.68. Let $Z$ denote the middle matrix for the Hessian of a noncommutative polynomial $p$. Show, if $i + j = i' + j'$, then $Z_{ij} = 0$ if and only if $Z_{i'j'} = 0$.
8.4.4
A
B
B
0
Z00 Z0
.. .
.
Z = ...
..
.
Z0
A
B
B
,
0
where $A = A^{\top}$ and $B = \begin{pmatrix} Z_{0\ell}(X) & 0 & \cdots & 0 \end{pmatrix}$. From Exercise 8.67, $p_d$, the homogeneous degree $d$ part of $p$, can be reconstructed from $Z_{0\ell}$. Now there is an $X \in O$ such that $p_d(X)$ is nonzero, as otherwise $p_d$ vanishes on a basic open semialgebraic set and is equal to 0. It follows that there is an $X \in O$ such that $Z_{0\ell}(X)$ is not zero. Hence $B(X)$ is not zero which implies, by Exercise 8.69, the contradiction that $Z(X)$ is not positive semidefinite.
We have now reached our goal of showing that convex polynomials have degree 2.
Theorem 8.71. If $p \in \mathbb{R}\langle x\rangle$ is a symmetric polynomial which is convex on a nonempty noncommutative basic open semialgebraic set $O$, then it has degree at most 2.
There is a version of the theorem for free variables, i.e., with $p \in \mathbb{R}\langle x, x^{\top}\rangle$.
Proof. The convexity of $p$ on $O$ is equivalent to $p''(X)[H]$ being positive semidefinite for all $X$ in $O$; see Exercise 8.26. By the QuadratischePositivstellensatz the middle matrix $Z(x)$ for $p''(x)[h]$ is positive on $O$; that is, $Z(X) \succeq 0$ for all $X \in O$. Proposition 8.70 implies degree $p$ is at most 2.
8.4.5
This section introduces the notion of the signature $\mu_\pm(Z(x))$ of $Z(x)$, the middle matrix of a Hessian, or more generally of a polynomial $q(x)[h]$ which is homogeneous of degree 2 in $h$.
The signature of a symmetric matrix $M$ is a triple of integers
\[
\bigl(\mu_-(M),\ \mu_0(M),\ \mu_+(M)\bigr),
\]
where $\mu_-(M)$ is the number of negative eigenvalues (counted with multiplicity); $\mu_+(M)$ is the number of positive eigenvalues; and $\mu_0(M)$ is the dimension of the null space of $M$.
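The signature triple just defined is easy to compute from the eigenvalues. The sketch below assumes NumPy; the function name `signature` and the tolerance `tol` used to decide which eigenvalues count as zero are our own choices, not part of the text.

```python
import numpy as np

def signature(M, tol=1e-10):
    """Return (number of negative, zero, positive eigenvalues) of a symmetric M."""
    eig = np.linalg.eigvalsh(M)          # real eigenvalues, with multiplicity
    neg = int(np.sum(eig < -tol))
    pos = int(np.sum(eig > tol))
    zero = len(eig) - neg - pos          # dimension of the null space
    return neg, zero, pos

sig_diag = signature(np.diag([-2.0, 0.0, 0.0, 1.0, 3.0]))
sig_swap = signature(np.array([[0.0, 1.0], [1.0, 0.0]]))
print(sig_diag, sig_swap)
```

In floating point the null space dimension depends on the chosen tolerance, so for matrices with tiny nonzero eigenvalues the triple should be read with care.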
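Lemma 8.72. A noncommutative symmetric polynomial $q(x)[h]$ homogeneous of degree 2 in $h$ has middle matrix $Z$ of the form in (8.21), and $Z$ being positive semidefinite implies $Z$ is of the form
\[
\begin{pmatrix}
Z_{00} & Z_{01} & \cdots & Z_{0,\frac{\ell}{2}} & 0 & \cdots & 0 \\
Z_{10} & Z_{11} & \cdots & Z_{1,\frac{\ell}{2}} & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
Z_{\frac{\ell}{2},0} & Z_{\frac{\ell}{2},1} & \cdots & Z_{\frac{\ell}{2},\frac{\ell}{2}} & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0
\end{pmatrix}.
\]
\[
E = \begin{pmatrix} A & B & C \\ B^* & D & 0 \\ C^* & 0 & 0 \end{pmatrix}
\]
is a real symmetric matrix, then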
8.4.6
Exercises
be a vector consisting of
8.4.7
A Glimpse of History
There is a theory of operator monotone and operator convex functions which overlaps with the matrix convex functions considered here in the case of one variable. However, the points of view are substantially different, diverging markedly in several variables. Löwner introduced a class of real analytic functions in one real variable called matrix monotone functions, which we shall not define here. Löwner gave integral representations and these have developed substantially over the years. The contact with convexity came when Löwner's student Kraus [49] introduced matrix convex functions $f$ in one variable. Such a function $f$ on $[0, \infty) \subset \mathbb{R}$ can be represented as $f(t) = tg(t)$ with $g$ matrix monotone, so the representations for $g$ produce representations for $f$. Hansen has extensive in-depth work on matrix convex and monotone functions whose definition in several variables is different from the one we use here; see [25] or [24]. All of this gives a beautiful integral representation characterizing matrix convex functions using techniques very different from ours. An excellent treatment of the one-variable case is [3, Chapter 5]. Interestingly, to the best of our knowledge, the one-variable version of Theorem 8.71 [36] does not seem to be explicit in this classical literature. However, it is an immediate consequence of the results of [25], where (not necessarily polynomial) operator convex functions on an interval are described. This and the papers of Hansen and [56, 68] are some of the more recent references in this line of convexity history orthogonal to ours.
8.5
Der QuadratischePositivstellensatz
8.5.1
At the root of the CHSY lemma [9] is the following linear algebra fact.
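Lemma 8.81. Fix $n > d$. If $\{z_1, \dots, z_d\}$ is a linearly independent set in $\mathbb{R}^n$, then the codimension of
\[
\left\{ \begin{pmatrix} Hz_1 \\ Hz_2 \\ \vdots \\ Hz_d \end{pmatrix} : H \in \mathbb{S}^{n\times n} \right\} \subset \mathbb{R}^{nd}
\]
is $\frac{d(d-1)}{2}$.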
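\[
\left\{ \begin{pmatrix} Hz_1 \\ Hz_2 \\ \vdots \\ Hz_d \end{pmatrix} : H \in \mathbb{R}^{n\times n} \right\} = \mathbb{R}^{nd}.
\]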
Hint: It proceeds like the proof of (8.20).
Proof of Lemma 8.81. Consider the mapping $\Phi : \mathbb{S}^{n\times n} \to \mathbb{R}^{nd}$ given by
\[
H \mapsto \begin{pmatrix} Hz_1 \\ Hz_2 \\ \vdots \\ Hz_d \end{pmatrix}.
\]
Since the span of $\{z_1, \dots, z_d\}$ has dimension $d$, it follows that the kernel of $\Phi$ has dimension $\frac{(n-d)(n-d+1)}{2}$, and hence the range has dimension $\frac{n(n+1)}{2} - \frac{(n-d)(n-d+1)}{2}$. To see this assertion, it suffices to assume that the span of $\{z_1, \dots, z_d\}$ is the span of $\{e_1, \dots, e_d\} \subset \mathbb{R}^n$ (the first $d$ standard basis vectors in $\mathbb{R}^n$). In this case (since $H$ is symmetric) $Hz_j = 0$ for all $j$ if and only if
\[
H = \begin{pmatrix} 0 & 0 \\ 0 & H' \end{pmatrix},
\]
where $H'$ is a symmetric matrix of size $(n-d) \times (n-d)$; in other words, this is the kernel of $\Phi$.
From this we deduce that the codimension of the range of $\Phi$ is
\[
nd - \left( \frac{n(n+1)}{2} - \frac{(n-d)(n-d+1)}{2} \right) = \frac{d(d-1)}{2},
\]
concluding the proof.
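The codimension count of Lemma 8.81 can be confirmed numerically by computing the rank of the map $H \mapsto (Hz_1, \dots, Hz_d)$ on a basis of the symmetric matrices. This is a minimal sketch assuming NumPy; the function name `codim_of_range` is ours, and random vectors are used as a stand-in for a generic linearly independent set.

```python
import numpy as np

def codim_of_range(n, d, seed=0):
    """Codimension in R^{nd} of {(Hz_1,...,Hz_d) : H symmetric n x n}."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, d))       # columns z_1..z_d, generically independent
    images = []
    for i in range(n):
        for j in range(i, n):
            E = np.zeros((n, n))
            E[i, j] = E[j, i] = 1.0       # basis element of the symmetric matrices
            images.append((E @ Z).flatten(order="F"))   # stack Ez_1, ..., Ez_d
    A = np.column_stack(images)           # matrix of the map, size nd x n(n+1)/2
    return n * d - np.linalg.matrix_rank(A)

results = {(n, d): codim_of_range(n, d) for (n, d) in [(5, 3), (6, 2), (7, 4)]}
print(results)
```

The computed values do not depend on $n$, only on $d$, in line with the formula $d(d-1)/2$.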
Next is a straightforward extension of Lemma 8.81.
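Lemma 8.83 ([9]). If $n > d$ and $\{z_1, \dots, z_d\}$ is a linearly independent subset of $\mathbb{R}^n$, then the codimension of
\[
\left\{ \bigoplus_{j=1}^{g} \begin{pmatrix} H_j z_1 \\ H_j z_2 \\ \vdots \\ H_j z_d \end{pmatrix} : H = (H_1, \dots, H_g) \in (\mathbb{S}^{n\times n})^g \right\} \subset \mathbb{R}^{gnd}
\]
is $g\,\frac{d(d-1)}{2}$ and is independent of $n$.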
Proof. See Exercise 8.94.
\[
\kappa = \sum_{j=0}^{d} g^{j}
\]
and where
\[
V = \bigoplus_{i=1}^{g} \bigoplus_{m \in \langle x\rangle_d} H_i m
\]
is the border vector associated with $\langle x\rangle_d$. Again, this codimension is independent of $n$ as it depends only upon the number of variables $g$ and the degree $d$ of the polynomial.
Proof. Let $z_m = m(X)v$ for $m \in \langle x\rangle_d$. There are at most $\kappa$ of these. Now apply the previous lemma.
8.5.2
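The main result in this section, Theorem 8.92, says roughly that if each evaluation of a set $G_1, \dots, G_\ell$ of rational functions produces linearly dependent matrices, then they satisfy a universal linear dependence relation. We begin with a clean and easily stated consequence of Theorem 8.92.
In Section 8.2.1 we defined noncommutative basic open semialgebraic sets. Here we define a noncommutative basic semialgebraic set. Given matrix-valued symmetric noncommutative polynomials $\rho$ and $\tilde\rho$, let
\[
\mathcal{D}^{\rho}_{+}(n) = \{ X \in (\mathbb{S}^{n\times n})^g : \rho(X) \succ 0 \}
\]
and
\[
\mathcal{D}^{\tilde\rho}(n) = \{ X \in (\mathbb{S}^{n\times n})^g : \tilde\rho(X) \succeq 0 \}.
\]
Then $\mathcal{D}$ is a noncommutative basic semialgebraic set if there exist $\rho_1, \dots, \rho_k$ and $\tilde\rho_1, \dots, \tilde\rho_k$ such that $\mathcal{D} = (\mathcal{D}(n))_{n \in \mathbb{N}}$, where
\[
\mathcal{D}(n) = \bigcap_{j} \mathcal{D}^{\rho_j}_{+}(n) \ \cap\ \bigcap_{j} \mathcal{D}^{\tilde\rho_j}(n).
\]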
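\[
\sum_{j=1}^{\ell} \lambda_j G_j(X) \qquad \text{for all } X \in \mathcal{D}.
\]
If, in addition, $\mathcal{D}$ contains an $\epsilon$-neighborhood of 0 for some $\epsilon > 0$, then there exists a nonzero $\lambda \in \mathbb{R}^{\ell}$ such that
\[
0 = \sum_{j=1}^{\ell} \lambda_j G_j.
\]
\[
\sum_{j=1}^{\ell} \lambda_j G_j = 0.
\]
\[
X_1 \oplus X_2 := \begin{pmatrix} X_1 & 0 \\ 0 & X_2 \end{pmatrix},
\qquad
v_1 \oplus v_2 := \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}.
\]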
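where each $\mathcal{B}(n)$ is a set whose members are pairs $(X, v)$, where $X$ is in $(\mathbb{S}^{n\times n})^g$ and $v \in \mathbb{R}^n$.
Definition 8.89. The set $\mathcal{B}$ is said to respect direct sums if $(X^j, v^j)$ with $X^j \in (\mathbb{S}^{n_j \times n_j})^g$ and $v^j \in \mathbb{R}^{n_j}$ for $j = 1, \dots, \ell$ being contained in the set $\mathcal{B}(n_j)$ implies that the direct sum
\[
(X^1 \oplus \cdots \oplus X^\ell,\ v^1 \oplus \cdots \oplus v^\ell) = \Bigl( \bigoplus_{j=1}^{\ell} X^j,\ \bigoplus_{j=1}^{\ell} v^j \Bigr)
\]
is also contained in $\mathcal{B}(\sum n_j)$.
Definition 8.90. By a natural map $G$ on $\mathcal{B}$, we mean a sequence of functions $G(n) : \mathcal{B}(n) \to \mathbb{R}^n$, which respects direct sums in the sense that, if $(X^j, v^j) \in \mathcal{B}(n_j)$ for $j = 1, 2, \dots, \ell$, then
\[
G\Bigl(\sum_{1}^{\ell} n_j\Bigr)\Bigl( \bigoplus_{j=1}^{\ell} X^j,\ \bigoplus_{j=1}^{\ell} v^j \Bigr) = \bigoplus_{1}^{\ell} G(n_j)(X^j, v^j).
\]
\[
\sum_{j=1}^{\ell} \lambda_j G_j(X, v)
\]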
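A concrete instance of a map that respects direct sums is $(X, v) \mapsto p(X)v$ for a noncommutative polynomial $p$. The sketch below, assuming NumPy, checks this for one hypothetical choice $p(x) = x_1x_2 + x_2x_1 + x_1$; the polynomial and the helper names `direct_sum`, `random_symmetric`, and `G` are ours and serve only as an illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_symmetric(n):
    a = rng.standard_normal((n, n))
    return (a + a.T) / 2

def direct_sum(A, B):
    """Block diagonal matrix A (+) B."""
    return np.block([[A, np.zeros((A.shape[0], B.shape[1]))],
                     [np.zeros((B.shape[0], A.shape[1])), B]])

def G(X, v):
    # hypothetical natural map: G(X, v) = p(X) v with p(x) = x1 x2 + x2 x1 + x1
    X1, X2 = X
    return (X1 @ X2 + X2 @ X1 + X1) @ v

# two points (X, v) and (Y, w) of different sizes
X = (random_symmetric(2), random_symmetric(2)); v = rng.standard_normal(2)
Y = (random_symmetric(3), random_symmetric(3)); w = rng.standard_normal(3)

XY = tuple(direct_sum(a, b) for a, b in zip(X, Y))
lhs = G(XY, np.concatenate([v, w]))               # G evaluated at the direct sum
rhs = np.concatenate([G(X, v), G(Y, w)])          # direct sum of the values
print(np.allclose(lhs, rhs))
```

The check works because polynomial evaluation commutes with block diagonal sums, which is the property the definition isolates.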
(R)j Gj (X)v = 0
j=1
j=1
j Gj (X, v) = 0 .
(X,v) = B : G(X)v =
8.5.3
We are now ready to give the proof of Theorem 8.60. Accordingly, let $\mathcal{O}$ be a given basic open semialgebraic set. Suppose
\[
q(x)[h] = V(x)[h]^* Z(x) V(x)[h], \tag{8.22}
\]
where $V$ is the border vector and $Z$ is the middle matrix; cf. (8.21). Clearly, if $Z$ is matrix-positive on $\mathcal{O}$, then $q(X)[H]$ is positive semidefinite for each $n$, $X \in \mathcal{O}(n)$, and $H \in (\mathbb{S}^{n\times n})^g$.
The converse is less trivial and requires the CHSY lemma plus our main result on linear dependence of noncommutative rational functions. Let $\ell$ denote the degree of $q(x)[h]$ in the variable $x$. In particular, the border vector in the representation of $q(x)[h]$ itself has degree $\ell$ in $x$. Recall Exercise 8.28.
Suppose that for some $s$ and $g$-tuple of symmetric matrices $\tilde{X} = (\tilde{X}_1, \dots, \tilde{X}_g) \in \mathcal{O}(s)$, the matrix $Z(\tilde{X})$ is not positive semidefinite. By Lemma 8.84 and Theorem 8.85, there is a $t$, a $Y \in \mathcal{O}(t)$, and a vector $\eta$ so that $\{m(Y)\eta : m \in \langle x\rangle_\ell\}$ is linearly independent. Let $X = \tilde{X} \oplus Y$ and $\gamma = 0 \oplus \eta \in \mathbb{R}^{s+t}$. Then $Z(X)$ is not positive semidefinite and $\{m(X)\gamma : m \in \langle x\rangle_\ell\}$ is linearly independent.
Let $N = g\,\frac{\kappa(\kappa-1)}{2} + 1$, where $\kappa$ is given in Lemma 8.84, and let $n = (s+t)N$. Consider $W = X \otimes I_N = (X_1 \otimes I_N, \dots, X_g \otimes I_N)$ and the vector $\omega = \gamma \otimes e$, for any nonzero vector $e \in \mathbb{R}^{N}$. The set $\{m(W)\omega : m \in \langle x\rangle_\ell\}$ is linearly independent, and thus by Lemma 8.84, the codimension of $M = \{V(W)[H]\omega : H \in (\mathbb{S}^{n\times n})^g\}$ is at most $N - 1$. On the other hand, because $Z(X)$ has a negative eigenvalue, the matrix $Z(W)$ has an eigenspace $E$, corresponding to a negative eigenvalue, of dimension at least $N$. It follows that $E \cap M$ is nontrivial; i.e., there is an $H \in (\mathbb{S}^{n\times n})^g$ such that $V(W)[H]\omega$ is a nonzero vector in $E$. In particular, this together with (8.22) implies
\[
\langle q(W)[H]\omega, \omega\rangle = \langle Z(W)V(W)[H]\omega, V(W)[H]\omega\rangle < 0,
\]
and thus, $q(W)[H]$ is not positive semidefinite.
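The dimension count in the proof relies on the fact that tensoring with an identity of size $N$ multiplies the multiplicity of every eigenvalue by $N$. The following sketch, assuming NumPy, illustrates this for a small symmetric matrix standing in for a middle matrix with one negative eigenvalue; the matrix `Z` and the value of `N` are arbitrary choices of ours.

```python
import numpy as np

# A symmetric matrix with eigenvalues 3 and -1 (one negative eigenvalue).
Z = np.array([[1.0, 2.0],
              [2.0, 1.0]])
N = 4

# Evaluating at X (x) I_N gives, up to a permutation of coordinates, Z (x) I_N.
ZW = np.kron(Z, np.eye(N))
eigenvalues = np.linalg.eigvalsh(ZW)
negative_dim = int(np.sum(eigenvalues < 0))   # dimension of the negative eigenspace
print(negative_dim)
```

With a negative eigenspace of dimension at least $N$ and a subspace of codimension at most $N - 1$, the two must share a nonzero vector, which is the step used in the argument.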
8.5.4
Exercises
(8.23)
8.6
(8.24)
where $\operatorname{Hess} p$ is the Hessian of $p$, and $h \in \mathbb{R}^g$ is in the tangent space to the surface defined by $p$ at $x_0$; i.e., $\nabla p(x_0) \cdot h = 0$ (see footnote 3).
We shall show that in the noncommutative setting the zero set $\mathcal{V}(p)$ of a noncommutative polynomial $p$ (subject to appropriate irreducibility constraints) having positive curvature (even in a small neighborhood) implies that $p$ is convex and thus $p$ has degree at most two, and $\mathcal{V}(p)$ has positive curvature everywhere; see Theorem 8.103 for the precise statements.
In fact there is a natural notion of the signature $C_\pm(\mathcal{V}(p))$ of a variety $\mathcal{V}(p)$, and the bound
\[
\deg(p) \le 2\,C_\pm(\mathcal{V}(p)) + 2
\]
on the degree of $p$ in terms of the signature $C_\pm(\mathcal{V}(p))$ was obtained in [16]. The convention that $C_+(\mathcal{V}(p)) = 0$ corresponds to positive curvature, since in our examples, defining functions $p$ are typically concave or quasiconcave. One could consider characterizing $p$ for which $C_\pm(\mathcal{V}(p))$ satisfies a less restrictive hypothesis than being equal to zero, and this has been done to some extent in [14]; however, this higher level of generality is beyond our focus here. Since our goal is to present the basic ideas, we stick to positive curvature.
Footnote 3. The choice of the minus sign in (8.24) is somewhat arbitrary. Classically the sign of the second fundamental form is associated with the choice of a smoothly varying vector that is normal to the surface. The zero set of $p$ has positive curvature at $x_0$ if the second fundamental form is either positive semidefinite or negative semidefinite at $x_0$. For example, if we define the surface using a concave function $p$, then the second fundamental form is negative semidefinite, while for the same set defined using $-p$ the second fundamental form is positive semidefinite.
8.6.1
We next define a number of basic geometric objects associated to the noncommutative variety determined by a noncommutative polynomial $p$.
Varieties, tangent planes, and the second fundamental form
The variety (zero set) of a $p \in \mathbb{R}\langle x\rangle$ is
\[
\mathcal{V}(p) := \bigcup_{n \ge 1} \mathcal{V}_n(p),
\]
where
\[
\mathcal{V}_n(p) := \bigl\{ (X, v) \in (\mathbb{S}^{n\times n})^g \times \mathbb{R}^n : p(X)v = 0 \bigr\}.
\]
The clamped tangent plane to $\mathcal{V}(p)$ at $(X, v) \in \mathcal{V}_n(p)$ is
\[
T_p(X, v) := \{ H \in (\mathbb{S}^{n\times n})^g : p'(X)[H]v = 0 \}.
\]
The clamped second fundamental form for $\mathcal{V}(p)$ at $(X, v) \in \mathcal{V}_n(p)$ is the quadratic form
\[
T_p(X, v) \to \mathbb{R},
\]
Note that
\[
\{ X \in (\mathbb{S}^{n\times n})^g : (X, v) \in \mathcal{V}(p) \text{ for some } v \neq 0 \} = \{ X \in (\mathbb{S}^{n\times n})^g : \det(p(X)) = 0 \}
\]
is a variety in $(\mathbb{S}^{n\times n})^g$ and typically has a true (commutative) tangent plane at many points $X$, which of course has codimension one, whereas the clamped tangent plane at a typical point $(X, v) \in \mathcal{V}_n(p)$ has codimension on the order of $n$ and is contained inside the true tangent plane.
Full rank points
The point $(X, v) \in \mathcal{V}(p)$ is a full rank point of $p$ if the mapping
\[
(\mathbb{S}^{n\times n})^g \to \mathbb{R}^n, \qquad H \mapsto p'(X)[H]v
\]
8.6.2
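is computed for certain choices of $p$, $X$, and $v$. Recall that if $p(X)v = 0$, then the subspace $T$ is the clamped tangent plane introduced in Subsection 8.6.1.
Example 8.100. Let $X \in \mathbb{S}^{n\times n}$, $v \in \mathbb{R}^n$, $v \neq 0$, and let $p(x) = x^k$ for some integer $k \ge 1$. Suppose that $(X, v) \in \mathcal{V}(p)$, that is, $X^k v = 0$. Then, since
\[
X^k v = 0 \iff Xv = 0 \qquad \text{when } X \in \mathbb{S}^{n\times n},
\]
it follows that $p$ is a minimum degree defining polynomial for $\mathcal{V}(p)$ if and only if $k = 1$.
It is readily checked that
\[
(X, v) \in \mathcal{V}(p) \implies p'(X)[H]v = X^{k-1}Hv
\]
and hence that $X$ is a full rank point for $p$ if and only if $X$ is invertible.
Now suppose $k \ge 2$. Then
\[
\langle p''(X)[H]v, v\rangle = 2\langle HX^{k-2}Hv, v\rangle.
\]
Therefore, if $k > 2$,
\[
(X, v) \in \mathcal{V}(p) \text{ and } p'(X)[H]v = 0 \implies XHv = 0, \text{ and so } \langle p''(X)[H]v, v\rangle = 0.
\]
To count the dimension of $T$ we can suppose without loss of generality that
\[
X = \begin{pmatrix} 0 & 0 \\ 0 & Y \end{pmatrix}
\quad\text{and}\quad
v = \begin{pmatrix} 1 & 0 & \cdots & 0 \end{pmatrix}^{*},
\]
where $Y \in \mathbb{S}^{(n-1)\times(n-1)}$ is invertible. Then, for the simple case under consideration,
\[
T = \{ H \in \mathbb{S}^{n\times n} : h_{21} = \cdots = h_{n1} = 0 \},
\]
where $h_{ij}$ denotes the $ij$ entry of $H$. Thus,
\[
\dim T = \frac{n^2 + n}{2} - (n - 1),
\]
i.e., $\operatorname{codim} T = n - 1$.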
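The codimension of the clamped tangent plane in Example 8.100 can be checked by computing the rank of $H \mapsto X^{k-1}Hv$ on a basis of the symmetric matrices, with $X$ and $v$ in the normal form used in the example. This is a minimal sketch assuming NumPy; the function name `clamped_tangent_codim` and the random diagonal choice of the invertible block $Y$ are ours.

```python
import numpy as np

def clamped_tangent_codim(n, k, seed=0):
    """Rank of H -> X^(k-1) H v on symmetric H, i.e. the codimension of T."""
    rng = np.random.default_rng(seed)
    X = np.zeros((n, n))
    X[1:, 1:] = np.diag(rng.uniform(1.0, 2.0, size=n - 1))   # invertible block Y
    v = np.zeros(n)
    v[0] = 1.0                                               # so that X v = 0
    Xp = np.linalg.matrix_power(X, k - 1)
    images = []
    for i in range(n):
        for j in range(i, n):
            H = np.zeros((n, n))
            H[i, j] = H[j, i] = 1.0
            images.append(Xp @ H @ v)
    return int(np.linalg.matrix_rank(np.column_stack(images)))

codim_k3 = clamped_tangent_codim(n=5, k=3)   # case k >= 2
codim_k1 = clamped_tangent_codim(n=5, k=1)   # case k = 1, where the map is H -> Hv
print(codim_k3, codim_k1)
```

For $k \ge 2$ the entry $h_{11}$ is unconstrained, which accounts for the value $n - 1$; for $k = 1$ the whole first column of $H$ must vanish and the codimension is $n$.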
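Remark 8.101. We remark that
\[
X^k v = 0 \text{ and } \langle p''(X)[H]v, v\rangle = 0 \implies p'(X)[H]v = 0 \quad \text{if } k = 2t \ge 4,
\]
as follows easily from the formula
\[
\langle p''(X)[H]v, v\rangle = 2\langle X^{t-1}Hv, X^{t-1}Hv\rangle.
\]
Exercise 8.102. Let $A \in \mathbb{S}^{n\times n}$ and let $U$ be a maximal strictly negative subspace of $\mathbb{R}^n$ with respect to the quadratic form $\langle Au, u\rangle$. Prove that there exists a complementary subspace $V$ of $U$ in $\mathbb{R}^n$ such that $\langle Av, v\rangle \ge 0$ for every $v \in V$.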
8.6.3
8.6.4
Our aim is to give the idea behind the proof of Theorem 8.103 under much stronger
hypotheses. We saw earlier the positivity of a quadratic on a noncommutative basic
open set O imparts positivity to its middle matrix there. The following shows this
happens for thin sets (noncommutative varieties) too. Thus, the following theorem
generalizes the QuadratischePositivstellensatz, Theorem 8.60.
Theorem 8.104. Let $p$, $\mathcal{O}$, $R$ be as in Theorem 8.103. Let $q(x)[h]$ be a polynomial which is quadratic in $h$ having middle matrix representation $q = V^* Z V$ for which $\deg(V) \le \deg(p)$. If
\[
v^* q(X)[H] v \ge 0 \qquad \text{for all} \tag{8.25}
\]
second fundamental form as is consistent with the choice of sign in our definition in Subsection 8.6.1. (All this concern with the sign is unimportant to the content of this chapter and can be ignored by the reader.)
Next, in view of the presumed strict positive curvature of each level set, the matrix $A$ at each point of the level set is negative definite but the Hessian could have a negative eigenvalue. However, by standard Schur complement arguments, $R$ will be negative definite if
\[
D + \lambda G - B^* A^{-1} B \prec 0
\]
on this region. Thus, strict convexity assumptions on the sublevel sets of $p$ make the modified Hessian negative definite for negative enough $\lambda$. One can make this negative definiteness uniform in $X$ in various neighborhoods under modest assumptions.
Very unfortunately in the noncommutative case, Remark 6.8 [17] implies that if $n$ is large enough, then the second fundamental form will have a nonzero null space, thus strict negative definiteness of the $A$ part of the modified Hessian is impossible.
Our trick for dealing with the likely reality that $A$ is only positive semidefinite and obtaining a negative definite $R$ is to add another negative term, say $\delta I$, with arbitrarily small $\delta < 0$. After adding such $\delta$, the argument based on choosing $-\lambda$ large succeeds as before. This $\delta$ term plus the $\lambda$ term produces the relaxed Hessian, to be introduced next, and proper selection of these terms makes it negative definite.
The relaxed Hessian
Let $V_k(x)[h]$ denote the vector of polynomials with entries $h_j w(x)$, where $w \in \langle x\rangle$ runs through the set of $g^k$ words of length $k$, $j = 1, \dots, g$. Although the order of the entries is fixed in some of our earlier applications (see e.g. [16, (2.3)]) it is irrelevant for the moment. Thus, $V_k = V_k(x)[h]$ is a vector of height $g^{k+1}$, and the vectors
\[
V(x)[h] = \operatorname{col}(V_0, \dots, V_{d-2}) \quad\text{and}\quad \widetilde{V}(x)[h] = \operatorname{col}(V_0, \dots, V_{d-1})
\]
are vectors of height $g\sum_{k=0}^{d-2} g^k$ and $g\sum_{k=0}^{d-1} g^k$, respectively. Note that
\[
\widetilde{V}(x)[h]^* \widetilde{V}(x)[h] = \sum_{j=1}^{g} \sum_{\deg(w) \le d-1} w(x)^* h_j^2 w(x).
\]
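The height counts for the border vectors can be confirmed by enumerating the entries $h_j w(x)$ directly. The sketch below uses only the Python standard library; words are represented as tuples of variable indices, and the function names `words`, `border_block`, and `border_vector` are ours.

```python
from itertools import product

def words(g, k):
    """All words of length k in g noncommuting letters, as index tuples."""
    return list(product(range(1, g + 1), repeat=k))

def border_block(g, k):
    """Entries h_j w(x) of V_k, encoded as pairs (j, w) with |w| = k."""
    return [(j, w) for j in range(1, g + 1) for w in words(g, k)]

def border_vector(g, max_len):
    """col(V_0, ..., V_max_len) as a flat list of entries."""
    return [entry for k in range(max_len + 1) for entry in border_block(g, k)]

g, d = 2, 4
block_heights = [len(border_block(g, k)) for k in range(d)]   # g^(k+1) for each k
height_V = len(border_vector(g, d - 2))        # col(V_0, ..., V_{d-2})
height_V_tilde = len(border_vector(g, d - 1))  # col(V_0, ..., V_{d-1})
print(block_heights, height_V, height_V_tilde)
```

Fixing an enumeration order like this one is also what pins down the rows and columns of a middle matrix in computations.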
then for every < 0 there exists a < 0 such that for all ,
p, (X)[H]v, v 0
T + I + rr
T + I + R = 0
cr
rc
.
+ cc
Now
r( 2 c ( + cc )1 c)r = rc ((cc )1 ( + cc )1 )cr
= rc (cc )1 ( + (cc ))1 cr
r(cc )1 r .
Hence,
T0 + I + r(cc )1 r 0.
Since the above inequality holds for all > 0, it follows that T0 0.
We now have enough machinery developed to prove Theorem 8.103.
Proof of Theorem 8.103. Fix $\lambda, \delta > 0$ and consider $q(x)[h] = p''_{\lambda,\delta}(x)[h]$. We are led to investigate the middle matrix $Z^{\lambda,\delta}$ of $q(x)[h]$, whose border vector $V(x)[h]$ includes all monomials of the form $h_j m$, where $m$ is a word in $x$ only of length at most $d - 1$; here $d$ is the degree of $p$. Indeed,
\[
Z^{\lambda,\delta} = Z + \delta I + \lambda W,
\]
where $Z$ is the middle matrix for $p''(x)[h]$ and $W$ is the middle matrix for $p'(x)[h]^* p'(x)[h]$. With an appropriate choice of ordering for the border vector $V$, we have $W = CC^*$, where
\[
C(x) = \begin{pmatrix} w(x) \\ c \end{pmatrix}
\]
for a nonzero vector $c$, and at the same time,
\[
Z(x) = \begin{pmatrix} Z^{0,0}(x) & 0 \\ 0 & 0 \end{pmatrix}.
\]
Hence, by Theorem 8.104, the middle matrix $Z^{\lambda,\delta}(X)$ for $q(x)[h]$ is positive semidefinite. We are in the setting of Lemma 8.107, from which we obtain $Z^{0,0}(X) \succeq 0$. If this held for $X$ in a noncommutative basic open semialgebraic set, then Theorem 8.71 forces $p$ to have degree no greater than 2. The proof of that theorem applies easily here to finish this proof.
8.6.5
Exercises
Exercise 8.108. Compute the BV-MM representation for the relaxed Hessian of $x^3$ and $x^4$.
8.7
In this section we will give a brief overview of convex semialgebraic noncommutative sets and positivity of noncommutative polynomials on them. We shall see that
their structure is much more rigid than that of their commutative counterparts.
For example, roughly speaking, each convex semialgebraic noncommutative set is a
spectrahedron, i.e., a solution set of a linear matrix inequality (LMI) (cf. Section
8.7.1 below). Similarly, every noncommutative polynomial nonnegative on a spectrahedron admits a sum of squares representation with weights and optimal degree
bounds (see Section 8.7.2 for details and precise statements).
8.7.1
Noncommutative Spectrahedra
Let $L$ be an affine linear pencil. Then the solution set of the LMI $L(x) \succ 0$ is
\[
\mathcal{D}_L = \bigcup_{n \in \mathbb{N}} \bigl\{ X \in (\mathbb{S}^{n\times n})^g : L(X) \succ 0 \bigr\}
\]
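Membership of a matrix tuple in the solution set of an LMI can be tested numerically. The sketch below assumes NumPy and assumes the usual evaluation convention $L(X) = A_0 \otimes I + \sum_j A_j \otimes X_j$ for a pencil $L(x) = A_0 + \sum_j A_j x_j$; the function names `eval_pencil` and `in_solution_set` are ours, and the example pencil is the $2 \times 2$ pencil with ones on the diagonal and $x$ off the diagonal.

```python
import numpy as np

def eval_pencil(A0, As, Xs):
    """Evaluate L(X) = A0 (x) I + sum_j A_j (x) X_j at a tuple of symmetric matrices."""
    n = Xs[0].shape[0]
    L = np.kron(A0, np.eye(n))
    for A, X in zip(As, Xs):
        L = L + np.kron(A, X)
    return L

def in_solution_set(A0, As, Xs, tol=1e-12):
    """True if L(X) is positive definite (up to the tolerance tol)."""
    return bool(np.linalg.eigvalsh(eval_pencil(A0, As, Xs)).min() > tol)

A0 = np.eye(2)
A1 = np.array([[0.0, 1.0], [1.0, 0.0]])        # pencil [[1, x], [x, 1]]

inside = in_solution_set(A0, [A1], [np.diag([0.5, -0.9])])   # all |eigenvalues| < 1
outside = in_solution_set(A0, [A1], [np.diag([0.5, 1.2])])   # one eigenvalue exceeds 1
print(inside, outside)
```

For this pencil the eigenvalues of $L(X)$ are $1 \pm \mu$ with $\mu$ ranging over the eigenvalues of $X$, so the test reduces to checking that $X$ has norm less than one.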
Chief among a pleasant list of natural properties is the fact that there is an $(X, v)$ with $X \in \partial\mathcal{D}_p$ and $p(X)v = 0$ for which $Z_d(X, v)$ contains all pairs $(Y, w)$ such that $Y \in \partial\mathcal{D}_p$ and $p(Y)w = 0$. Combining this with the Effros-Winkler theorem and battling degeneracies is a bit tricky, but separation prevails in the end. See [38] for the details.
An unexpected consequence of Theorem 8.109 is that projections of noncommutative semialgebraic sets may not be semialgebraic; see Exercise 8.112. For perspective, in the commutative case of a basic open semialgebraic subset $C$ of $\mathbb{R}^g$, there is a stringent condition, called the line test (see Chapter 6 for more details), which, in addition to convexity, is necessary for $C$ to be a spectrahedron. In two dimensions the line test is necessary and sufficient [44], a result used by Lewis-Parrilo-Ramana [51] to settle a 1958 conjecture of Peter Lax on hyperbolic polynomials.
In summary, if a (commutative) bounded basic open semialgebraic convex set is a spectrahedron, then it must pass the highly restrictive line test; whereas a noncommutative basic open semialgebraic set is a spectrahedron if and only if it is convex.
8.7.2
Noncommutative Positivstellensätze under Convexity Assumptions
\[
\sum_{j}^{\text{finite}} f_j^* L f_j, \tag{8.26}
\]
8.7.3
Exercises
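Exercise 8.111. Suppose $L$ is an affine linear pencil such that $0 \in \mathcal{D}_L(1)$. Show that there is a monic linear pencil $\hat{L}$ with $\mathcal{D}_L = \mathcal{D}_{\hat{L}}$.
Exercise 8.112. Chapters 5 and 6 discuss sets $\mathcal{D} \subset \mathbb{R}^g$ which have a semidefinite representation as a strict generalization of a spectrahedron. For instance, consider the TV screen (cf. Section 8.2.1)
\[
\mathrm{ncTV}(1) = \{ X \in \mathbb{R}^2 : 1 - X_1^4 - X_2^4 > 0 \} \subset \mathbb{R}^2.
\]
Given $\alpha$ a positive real number, choose $\gamma^4 = 1 + 2\alpha^2$ and let
\[
L_0 = \begin{pmatrix} 1 & 0 & y_1 \\ 0 & 1 & y_2 \\ y_1 & y_2 & 1 - 2\alpha(y_1 + y_2) \end{pmatrix} \tag{8.27}
\]
and
\[
L_j = \begin{pmatrix} 1 & \gamma x_j \\ \gamma x_j & \alpha + y_j \end{pmatrix}, \qquad j = 1, 2. \tag{8.28}
\]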
Note that the $L_j$ are not monic, but because $L_j(0) \succ 0$, they can be normalized to be monic without altering the solution sets of $L_j(X) \succ 0$; cf. Exercise 8.111. Let $L = L_0 \oplus L_1 \oplus L_2$. It is readily verified that ncTV(1) is the projection onto the first two (the $x$) coordinates of the set $\mathcal{D}_L(1)$; i.e.,
\[
\mathrm{ncTV}(1) = \{ X \in \mathbb{R}^2 : \exists\, Y \in \mathbb{R}^2,\ L(X, Y) \succ 0 \}.
\]
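The scalar-level projection claim can be explored numerically. The sketch below, assuming NumPy, builds the three blocks of the pencil from Exercise 8.112 for a given point $x$ and a candidate lift $y$; the parameter names `alpha` and `gamma` are our labels for the two positive constants of the exercise (with the fourth power of `gamma` equal to `1 + 2 * alpha**2`), and the specific lift `y_j = gamma^2 x_j^2 - alpha + eps` is our own choice of witness, not taken from the text.

```python
import numpy as np

def is_pd(M):
    """True if the symmetric matrix M is positive definite."""
    return bool(np.linalg.eigvalsh(M).min() > 0)

def lift_witness(x, alpha=1.0, eps=1e-6):
    """Try the lift y_j = gamma^2 x_j^2 - alpha + eps and test all three blocks."""
    gamma_sq = np.sqrt(1 + 2 * alpha**2)          # gamma^2, since gamma^4 = 1 + 2 alpha^2
    gamma = np.sqrt(gamma_sq)
    y = gamma_sq * np.asarray(x, dtype=float)**2 - alpha + eps
    L0 = np.array([[1.0, 0.0, y[0]],
                   [0.0, 1.0, y[1]],
                   [y[0], y[1], 1 - 2 * alpha * (y[0] + y[1])]])
    Ls = [np.array([[1.0, gamma * x[j]],
                    [gamma * x[j], alpha + y[j]]]) for j in range(2)]
    return y, is_pd(L0) and all(is_pd(Lj) for Lj in Ls)

_, ok_inside = lift_witness((0.5, 0.7))     # 0.5^4 + 0.7^4 < 1
_, ok_outside = lift_witness((0.95, 0.95))  # 0.95^4 + 0.95^4 > 1
print(ok_inside, ok_outside)
```

The 2-by-2 blocks force $\alpha + y_j > \gamma^2 x_j^2$, and the 3-by-3 block forces $(y_1+\alpha)^2 + (y_2+\alpha)^2 < \gamma^4$, so this nearly minimal lift succeeds exactly when the point lies inside the TV screen (for `eps` small enough).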
1. Show that ncTV(1) is not a spectrahedron. (Hint: How often is $L_{TV}(tX, tY)$ for $t \in \mathbb{R}$ singular?)
2. Show that ncTV is not the projection of the noncommutative spectrahedron $\mathcal{D}_L$.
3. Show that ncTV is not the projection of any noncommutative spectrahedron.
4. Is ncTV(2) a projection of a spectrahedron? (Feel free to use the results about ncTV and LMI representable sets (spectrahedra), stated without proofs, from Sections 8.2.1 and 8.7.1.)
Exercise 8.113. If $q$ is a symmetric concave matrix-valued polynomial with $q(0) = I$, then there exists a linear pencil $L$ and a matrix-valued linear polynomial $\Lambda$ such that
\[
q = I - L - \Lambda^*\Lambda.
\]
Exercise 8.114. Consider the monic linear pencil
\[
M(x) = \begin{pmatrix} 1 & x \\ x & 1 \end{pmatrix}.
\]
1. Determine $\mathcal{D}_M$.
2. Show that $1 + x$ is positive semidefinite on $\mathcal{D}_M$.
3. Construct a representation for $1 + x$ of the form (8.26).
Exercise 8.115. Consider the univariate affine linear pencil
\[
L(x) = \begin{pmatrix} 1 & x \\ x & 0 \end{pmatrix}.
\]
1. Determine $\mathcal{D}_L$.
2. Show that $x$ is positive semidefinite on $\mathcal{D}_L$.
3. Does $x$ admit a representation of the form (8.26)?
Exercise 8.116. Let $L$ be an affine linear pencil. Prove that
1. $\mathcal{D}_L$ is bounded if and only if $\mathcal{D}_L(1)$ is bounded;
2. $\mathcal{D}_L = \emptyset$ if and only if $\mathcal{D}_L(1) = \emptyset$.
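\[
\Delta(x_1, x_2) = I + \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} x_1 + \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix} x_2
= \begin{pmatrix} 1 & x_1 & x_2 \\ x_1 & 1 & 0 \\ x_2 & 0 & 1 \end{pmatrix}
\]
and
\[
\Gamma(x_1, x_2) = I + \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} x_1 + \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} x_2
= \begin{pmatrix} 1 + x_1 & x_2 \\ x_2 & 1 - x_1 \end{pmatrix}
\]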
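At the scalar level, both pencils displayed above are positive definite exactly on the open unit disk, which can be seen from a Schur complement for the 3-by-3 pencil and from the trace and determinant for the 2-by-2 pencil. The sketch below, assuming NumPy, samples random points to illustrate this; the function names `Delta`, `Gamma`, and `is_pd` are labels chosen for this sketch.

```python
import numpy as np

def Delta(x1, x2):
    """The 3 x 3 pencil with x1, x2 in the first row and column."""
    return np.array([[1.0, x1, x2],
                     [x1, 1.0, 0.0],
                     [x2, 0.0, 1.0]])

def Gamma(x1, x2):
    """The 2 x 2 pencil [[1 + x1, x2], [x2, 1 - x1]]."""
    return np.array([[1.0 + x1, x2],
                     [x2, 1.0 - x1]])

def is_pd(M):
    return bool(np.linalg.eigvalsh(M).min() > 0)

rng = np.random.default_rng(1)
points = rng.uniform(-1.5, 1.5, size=(2000, 2))
agree = all(
    is_pd(Delta(a, b)) == is_pd(Gamma(a, b)) == bool(a * a + b * b < 1)
    for a, b in points
)
print(agree)
```

Agreement on scalar points says nothing about matrix points of larger size, where two pencils with the same scalar solution set may well differ.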
8.8
m
i=1
Now that you have gone through the mathematics we return to its implications. In
the linear systems engineering problems you have seen both in Section 8.1.1 and in
Section 2.2.1, the conclusion was that the problem was equivalent to solving an LMI.
Indeed this is what one sees throughout the literature. Thousands of engineering
papers have a dimension free problem and it converts (often by serious cleverness)
to an LMI in the best of cases, or more likely there is some approximate solution
which is an LMI.
While engineers would be satisfied with convexity, what they actually do get
is an LMI. One would hope that there is a rich world of convex situations not
equivalent to an LMI. Then there would be a variety of methods waiting to be
discovered for dealing with them. Alas what we have shown here is compelling
evidence that any convex dimension free problem is equivalent to an LMI. Thus
there is no rich world of convexity beyond what is already known and no armada
of techniques beyond those for producing LMIs which we already see all around us.
Bibliography
[1] S. Balasubramanian and S. McCullough. Quasi-convex free polynomials. To appear in Proc. Amer. Math. Soc. http://arxiv.org/abs/1208.3582.
[2] G. M. Bergman. Rational relations and rational identities in division rings I. J. Algebra, 43:252–266, 1976.
[3] R. Bhatia. Matrix Analysis. Springer-Verlag, Berlin, 1997.
[4] D. P. Blecher and C. Le Merdy. Operator Algebras and Their Modules: An Operator Space Approach. Oxford Science Publications, Oxford, UK, 2004.
[5] J. Bochnak, M. Coste, and M. F. Roy. Real Algebraic Geometry. Springer-Verlag, Berlin, 1998.
[6] M. Brešar and I. Klep. A local-global principle for linear dependence of noncommutative polynomials. Israel J. Math., to appear.
[7] K. Cafuta, I. Klep, and J. Povh. A note on the nonexistence of sum of squares certificates for the Bessis-Moussa-Villani conjecture. J. Math. Phys., 51:083521, 2010.
[8] K. Cafuta, I. Klep, and J. Povh. NCSOStools: a computer algebra system for symbolic and numerical computation with noncommutative polynomials. Optim. Methods Softw., 26:363–380, 2011.
[9] J. F. Camino, J. W. Helton, R. E. Skelton, and J. Ye. Matrix inequalities: A symbolic procedure to determine convexity automatically. Integral Equations Operator Theory, 46:399–454, 2003.
[10] P. M. Cohn. Skew Fields. Theory of General Division Rings. Cambridge University Press, Cambridge, UK, 1995.
[11] P. M. Cohn. Free Ideal Rings and Localization in General Rings. Cambridge University Press, Cambridge, UK, 2006.
[12] A. C. Doherty, Y.-C. Liang, B. Toner, and S. Wehner. The quantum moment problem and bounds on entangled multi-prover games. In Twenty-Third Annual IEEE Conference on Computational Complexity, 2008, pp. 199–210.
[13] M. de Oliveira, J. W. Helton, S. McCullough, and M. Putinar. Engineering systems and free semi-algebraic geometry. In Emerging Applications of Algebraic Geometry, IMA Vol. Math. Appl. 149. Springer-Verlag, Berlin, 2009, pp. 17–62.
[14] H. Dym, J. M. Greene, J. W. Helton, and S. McCullough. Classification of all noncommutative polynomials whose Hessian has negative signature one and a noncommutative second fundamental form. J. Anal. Math., 108:19–59, 2009.
[15] H. Dym, J. W. Helton, and S. McCullough. Irreducible noncommutative defining polynomials for convex sets have degree four or less. Indiana Univ. Math. J., 56:1189–1232, 2007.
paths. In École d'Été de Probabilités de Saint-Flour XXXIV, Lecture Notes in Math. 1908, Springer-Verlag, Berlin, 2004.
[53] S. McCullough. Factorization of operator-valued polynomials in several noncommuting variables. Linear Algebra Appl., 326:193–203, 2001.
[54] P. S. Muhly and B. Solel. Progress in noncommutative function theory. Sci. China Ser. A, 54:2275–2294, 2011.
[55] B. Sz.-Nagy, C. Foias, H. Bercovici, and L. Kérchy. Harmonic Analysis of Operators on Hilbert Space. Springer-Verlag, New York, 2010.
[56] H. Osaka, S. Silvestrov, and J. Tomiyama. Monotone operator functions, gaps and power moment problem. Math. Scand., 100:161–183, 2007.
[57] V. Paulsen. Completely Bounded Maps and Operator Algebras. Cambridge University Press, Cambridge, UK, 2002.
[58] G. Pisier. Introduction to Operator Space Theory. Cambridge University Press, Cambridge, UK, 2003.
[59] S. Pironio, M. Navascués, and A. Acín. Convergent relaxations of polynomial optimization problems with noncommuting variables. SIAM J. Optim., 20:2157–2180, 2010.
[60] G. Popescu. Free holomorphic functions on the unit ball of $B(H)^n$. J. Funct. Anal., 241:268–333, 2006.
[61] G. Popescu. Free holomorphic automorphisms of the unit ball of $B(H)^n$. J. Reine Angew. Math., 638:119–168, 2010.
[62] K. Schmüdgen. A strict Positivstellensatz for the Weyl algebra. Math. Ann., 331:779–794, 2005.
[63] K. Schmüdgen. Noncommutative real algebraic geometry: some basic concepts and first ideas. In Emerging Applications of Algebraic Geometry, IMA Vol. Math. Appl. 149. Springer-Verlag, Berlin, 2009, pp. 325–350.
Chapter 9
Sums of Hermitian
Squares: Old and New
Mihai Putinar
This final chapter marks a departure from the main framework of the book by putting emphasis on hermitian forms over the complex field rather than symmetric forms over the real field. The passage is both natural and necessary. To give a simple motivation: polynomial or rational functions with real coefficients, so much praised in the preceding chapters, may very well have complex roots or complex poles. Taking them into account greatly simplifies computations and conceptual thinking, as we all remember from elementary algebra. A second important observation goes back to the dictionary between elementary functions and matrices: a real valued polynomial (in any number of variables) written in complex coordinates, $p(z, \bar z) = \sum c_{\alpha\beta} z^\alpha \bar z^\beta$, uniquely determines the hermitian matrix $(c_{\alpha\beta})$, while a similar decomposition $q(x) = \sum \gamma_{\alpha\beta} x^{\alpha+\beta}$, with real coefficients $\gamma_{\alpha\beta}$, so much needed for semidefinite programming, has a clear ambiguity. The appearance at this late stage of the book of imaginary ghosts related to the basic entities encountered so far should not discourage the truly real and very applied reader.
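The ambiguity of the real decomposition can be seen on a one-variable example of our own choosing: for $q(x) = x^4 + 2x^2 + 1$ and the monomial vector $(1, x, x^2)$, a whole one-parameter family of symmetric matrices represents the same polynomial, because $x^2$ arises both as $x \cdot x$ and as $1 \cdot x^2$. The sketch below assumes NumPy; the names `gram` and `q_from_gram` are ours.

```python
import numpy as np

def gram(t):
    """Symmetric matrix G_t with v^T G_t v = x^4 + 2 x^2 + 1 for v = (1, x, x^2)."""
    return np.array([[1.0, 0.0, 1.0 - t],
                     [0.0, 2.0 * t, 0.0],
                     [1.0 - t, 0.0, 1.0]])

def q_from_gram(G, x):
    v = np.array([1.0, x, x * x])
    return float(v @ G @ v)

xs = np.linspace(-2.0, 2.0, 9)
same_polynomial = all(
    np.isclose(q_from_gram(gram(t), x), x**4 + 2 * x**2 + 1)
    for t in (0.0, 0.5, 1.0) for x in xs
)
distinct_matrices = not np.allclose(gram(0.0), gram(1.0))
print(same_polynomial, distinct_matrices)
```

In the hermitian setting the monomials $z^\alpha \bar z^\beta$ are linearly independent for distinct pairs $(\alpha, \beta)$, so no such freedom exists; in the real setting this freedom is precisely what a semidefinite program searches over.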
9.1
Introduction
A question arises from the very beginning: how much of the vast theory of hermitian forms (in a finite or infinite number of variables) should the student or practitioner in applied areas of real algebra, functional analysis, algebraic geometry, or optimization theory know? Due to the depth and wide ramifications of hermitian forms (over the complex field) versus forms over real fields, the answer is: quite a lot! The good news is that the material, old and new, either is well known, circulating in part as folklore, or is accessible, due to a century and a half of continuous development
9.2
We start by recalling a few well-known facts about canonical forms of matrices and positive definite kernels. Let $\mathbb{C}$ be the complex field and denote by $M_d(\mathbb{C})$ the algebra of $d \times d$ matrices over $\mathbb{C}$, regarded as linear transforms of the space $\mathbb{C}^d$. We endow $\mathbb{C}^d$ with its hermitian structure, that is, the inner product
\[
\langle z, w\rangle = z \cdot \overline{w} = z_1\overline{w}_1 + \cdots + z_d\overline{w}_d,
\]
where $z = (z_1, \dots, z_d)$, $w = (w_1, \dots, w_d) \in \mathbb{C}^d$. We put as usual $\|z\|^2 = \langle z, z\rangle$. The adjoint of a linear transform $A \in L(\mathbb{C}^d)$ is defined by the identity
\[
\langle Az, w\rangle = \langle z, A^* w\rangle.
\]
Let $e_1, \dots, e_d$ denote the canonical orthonormal basis of $\mathbb{C}^d$. When representing $A = (a_{jk})_{j,k=1}^{d}$ and $z = z_1e_1 + \cdots + z_de_d$ as a column vector, we have
\[
(Az)_j = \langle Az, e_j\rangle = \sum_{k=1}^{d} a_{jk} z_k,
\]
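The defining identity of the adjoint and the coordinate formula can be verified numerically for a random complex matrix. This is a small sketch assuming NumPy; the helper `inner` implements the inner product that is linear in the first argument and conjugate-linear in the second, and all variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

def inner(z, w):
    """Hermitian inner product <z, w> = sum_j z_j * conj(w_j)."""
    return np.sum(z * np.conj(w))

A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
z = rng.standard_normal(d) + 1j * rng.standard_normal(d)
w = rng.standard_normal(d) + 1j * rng.standard_normal(d)

A_star = A.conj().T                         # adjoint = conjugate transpose
lhs = inner(A @ z, w)
rhs = inner(z, A_star @ w)

basis = np.eye(d)
coords = np.array([inner(A @ z, basis[j]) for j in range(d)])   # <Az, e_j>
print(np.isclose(lhs, rhs), np.allclose(coords, A @ z))
```

Note that NumPy's `np.vdot` conjugates its first argument, so it corresponds to the opposite ordering of the two slots; writing the sum explicitly avoids that pitfall.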
9.2.1
\[
\sum_{j,k=1}^{d}
\]
where $u, v \in \mathbb{C}^d$ and $i = \sqrt{-1}$.
9.2.2
The spectral theorem asserts that the quadratic form $q_A(z)$ can be written as a weighted sum of squares of complex linear forms:
\[
q_A(z) = \sum_{j=1}^{d} \lambda_j |w_j|^2,
\]
where
\[
w_j = \sum_{k=1}^{d} u_{jk} z_k, \quad 1 \le j \le d,
\]
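The weighted sum of squares decomposition can be reproduced with a numerical eigendecomposition. The sketch below assumes NumPy and assumes that the form is $q_A(z) = \langle Az, z\rangle$ for a hermitian matrix $A$; the matrix $U$ of coefficients $u_{jk}$ is taken to be the conjugate transpose of the eigenvector matrix returned by `np.linalg.eigh`.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

B = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
A = (B + B.conj().T) / 2                    # a random hermitian matrix
z = rng.standard_normal(d) + 1j * rng.standard_normal(d)

lam, V = np.linalg.eigh(A)                  # A = V diag(lam) V^*, lam real
U = V.conj().T                              # unitary matrix (u_jk)
w = U @ z                                   # the linear forms w_j = sum_k u_jk z_k

q_direct = np.real(np.conj(z) @ A @ z)      # <Az, z> = z^* A z, real for hermitian A
q_sum_of_squares = np.sum(lam * np.abs(w) ** 2)
print(np.isclose(q_direct, q_sum_of_squares))
```

The signs of the weights `lam` are what separate the positive squares from the negative ones in such a decomposition.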
\[
\sum_{j=1}^{n} |P_j(z)|^2 - \sum_{j=n+1}^{r} |P_j(z)|^2, \tag{9.1}
\]
9.2.3
Min-max Principle
\[
\lambda_k(A) = \min_{\dim V = d-k+1}\ \max_{z \in V \setminus \{0\}} \frac{q_A(z)}{\|z\|^2}, \quad 1 \le k \le d.
\]
For this, and other, reasons, the numbers $\lambda_k(A)$ are also known as the characteristic values of the form $q_A(z)$.
9.2.4
Exercises
9.3
9.3.1
and
\[
\sum_{i,j \in I} K(i, j)\, c_i \overline{c_j} \ge 0
\]
Denote the vectors of zero norm by $N = \{f \in F(X);\ \langle f, f\rangle = 0\}$. Note that by the classical Cauchy-Schwarz inequality we infer
\[
|\langle f, g\rangle|^2 \le \langle f, f\rangle \langle g, g\rangle.
\]
Thus $N$ is a vector subspace of $F(X)$ and the quotient $F(X)/N$ carries a nondegenerate inner product induced on equivalence classes by $\langle f, g\rangle$. The Hilbert space completion $H$ then contains $F(X)/N$ as a dense subspace and the map $F : X \to H$ defined by the class of the characteristic function, $F(x)(y) = 0$ if $x \neq y$ and $F(x)(x) = 1$, induces then the factorization in the statement.
Note that in general the Hilbert space constructed in the proof is nonseparable. A uniqueness of the factorization can be immediately derived from the same proof.
Corollary 9.9. Assume that the positive semidefinite kernel $L : X \times X \to \mathbb{C}$ admits two factorizations $L(x, y) = \langle F(x), F(y)\rangle_H = \langle G(x), G(y)\rangle_K$, where $H$, $K$ are Hilbert spaces and the maps $F : X \to H$, $G : X \to K$ both have dense ranges. Then there exists a unitary transformation $U : H \to K$ with the property $UF = G$.
9.3.2
$z, w \in \Omega$.
Proof. By its very construction, the factorization proved in Theorem 9.8 has the property that the scalar function $z \mapsto \langle F(z), y\rangle$ is analytic for every vector $y$ belonging to a dense subspace of $H$. By taking limits of sequences of the form $\langle F(z), y_n\rangle$ we find that the map $z \mapsto \langle F(z), u\rangle$ is analytic for every $u \in H$. Hence $F(z)$ is analytic, due to the equivalence between weak and strong analyticity of Hilbert space valued maps; see, for instance, [30].
To prove that the space $H$ is separable, simply note that the vectors $F(\lambda)$, $\lambda \in G$, span $H$ as soon as the countable set $G$ is everywhere dense in $\Omega$.
When expanding in a Taylor, or Fourier, series we will encounter later the natural question of whether the matrix of coefficients of a kernel reflects its positivity as a map, as defined at the beginning of this section. For instance, take $\Omega$ to be a polydisk (that is, a product of disks) in $\mathbb{C}^d$ centered at $z = 0$, and assume that the map $K : \Omega \times \Omega \to \mathbb{C}$ is analytic/antianalytic. Then a power series expansion
\[
K(z, w) = \sum_{\alpha,\beta \in \mathbb{N}^d} c_{\alpha,\beta}\, z^\alpha \overline{w}^\beta \tag{9.2}
\]
,Nd
,
...
K(z, w)z
|zj |=|wk |=
'
d &
#
dzj dwj
.
2izj 2iwj
j=1
A Riemann sum approximation of the integral proves then that $(c_{\alpha,\beta})_{\alpha,\beta \in \mathbb{N}^d}$ is a positive semidefinite discrete kernel. Conversely, assuming that $(c_{\alpha,\beta})_{\alpha,\beta \in \mathbb{N}^d}$ is positive semidefinite, the convergence of the power series expansion implies the positivity of $K$.
Exactly as in the case of hermitian forms, an analytic/antianalytic kernel $K(z, w)$ is determined by its values on the diagonal $K(z, z)$.
9.3.3
Hadamard's Product
Besides the natural operations which preserve positivity of kernels, their pointwise product stands out:
Theorem 9.12 (Schur). Let $K_j : X \times X \to \mathbb{C}$, $j = 1, 2$, be two positive semidefinite kernels. Then $K(x, y) = K_1(x, y)K_2(x, y)$, $x, y \in X$, is also positive semidefinite.
For the proof see [15].
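On a finite set, Schur's theorem says that the entrywise (Hadamard) product of two positive semidefinite hermitian matrices is again positive semidefinite. The sketch below, assuming NumPy, illustrates this on random matrices; the helper name `random_psd` and the numerical tolerance are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_psd(n):
    """Random positive semidefinite hermitian matrix of the form B B^*."""
    B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return B @ B.conj().T

K1, K2 = random_psd(5), random_psd(5)
K = K1 * K2                                  # entrywise product, not the matrix product
min_eigenvalue = np.linalg.eigvalsh(K).min()
print(min_eigenvalue >= -1e-9)
```

The ordinary matrix product of two such matrices is in general not even hermitian, which is why the entrywise product is the natural operation for kernels.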
To give a single, illustrative application of Schur's theorem, consider an open set $\Omega \subset \mathbb{C}^d$ and a positive definite kernel $K : \Omega \times \Omega \to \mathbb{C}$ which is analytic/antianalytic in the sense discussed above. Assume that there exists a positive constant $M$ such that $K(z, z) < M$ for all $z \in \Omega$. Then the new kernel
\[
\frac{1}{M^2 - K(z, w)}, \quad z, w \in \Omega,
\]
has the same properties (i.e., analyticity and positivity). Indeed, by virtue of the Cauchy-Schwarz inequality, $|K(z, w)| < M$ for all $z, w \in \Omega$. Then Neumann series decomposition and Schur's theorem lead to the desired conclusion:
\[
\frac{1}{M^2 - K(z, w)} = M^{-2}\left( 1 + \frac{K(z, w)}{M^2} + \frac{K(z, w)^2}{M^4} + \cdots \right).
\]
9.3.4
Bergman's Kernel
f
22, =
It is easy to see that $A^2(\Omega)$ is complete with respect to this norm and that the
evaluation functional $f \mapsto f(a)$ is continuous for every $a \in \Omega$ (for the proof use
the mean value theorem on a polydisk centered at $z = a$ and fully contained in $\Omega$).
Thus, according to Riesz's representation theorem (see [30]), there exists a unique
element $k_a \in A^2(\Omega)$ which represents this functional:
$$f(a) = \langle f, k_a\rangle, \quad f \in A^2(\Omega).$$
The positive definite kernel $K_\Omega(z,w) = \langle k_w, k_z\rangle$, also known as the Bergman kernel
of the domain $\Omega$, consequently satisfies the reproducing property
$$f(z) = \int_\Omega K_\Omega(z,w) f(w)\, d\lambda_{2d}(w), \quad z \in \Omega,\ f \in A^2(\Omega).$$
(z)
(w) = K1 (z, w),
z
w
where
z (z) denotes the complex Jacobian. Since K1 (z, z) > 0 for all z 1 we
infer that the differential form
d
log K (z, z)
dzj dz k
zj z k
j,k=1
$$K_B(z,w) = \frac{1}{|B|\,(1 - \langle z, w\rangle)^{d+1}}, \qquad K_{D^d}(z,w) = \frac{1}{|D^d|\,(1 - z_1\bar w_1)^2 \cdots (1 - z_d\bar w_d)^2},$$
where $|A|$ denotes the volume of the set $A$. One can prove via these invariants that
the ball and the polydisk are not biholomorphically equivalent as soon as $d \ge 2$;
see [33].
9.3.5
Exercises
Exercise 9.13. Let $(X, \mu)$ be a compact space endowed with a Borel probability
measure, and let
$$T_K : L^2(X,\mu) \to L^2(X,\mu), \quad (T_K f)(x) = \int_X K(x,y) f(y)\, d\mu(y),$$
hn (z)hn (w),
n=0
9.4
In an inspired and undeservedly forgotten work of his early career, Hermite
developed an algebraic method for counting the number of solutions of systems of
polynomial equations which are contained in a prescribed basic semialgebraic subset
of $\mathbb{R}^n$ or $\mathbb{C}^n$. He was aiming at bypassing, via purely algebraic methods, Cauchy's
residue integral method for counting roots of complex polynomials, in analogy with
the widely circulated (at that time) algebraic algorithm developed by Sturm for
counting real zeros of polynomials. For this very reason Hermite introduced and
studied what we call today hermitian forms. For complete mathematical details
and ample historical comments see the (also forgotten) little book by Krein and
Naimark [27].
We illustrate below Hermite's ideas in a couple of typical examples. For simplicity we present Hermite's idea in two variables, the transition to a larger number
of variables being straightforward. Suppose that two polynomials $P_1, P_2 \in \mathbb{R}[x,y]$ of
degrees $n_1$ (respectively, $n_2$) possess exactly $n = n_1 n_2$ common roots $V(P_1,P_2) =
\{(a_j, b_j),\ 1 \le j \le n\}$, complex or real. Fix rational real functions $\phi, \psi_1, \ldots, \psi_n$ so
that $\phi$ does not vanish on $V(P_1,P_2)$,
$$\det\big((\psi_j(a_k,b_k))_{j,k=1}^n\big) \ne 0,$$
and consider the hermitian form on $\mathbb{C}^n$:
$$H(z,\bar z) = \sum_{j=1}^n \phi(a_j,b_j)\,|z_1\psi_1(a_j,b_j) + z_2\psi_2(a_j,b_j) + \cdots + z_n\psi_n(a_j,b_j)|^2.$$
Since the sum is symmetric in the variables $(a_j, b_j)$ the hermitian form $H$ depends
only on the coefficients of the polynomials $P_1, P_2$. Denote the number of roots in
different sectors as follows:
$$N_c(P_1,P_2) = \#(V(P_1,P_2) \setminus \mathbb{R}^2),$$
$$N_+(P_1,P_2) = \#(V(P_1,P_2) \cap \{(x,y) \in \mathbb{R}^2;\ \phi(x,y) > 0\}),$$
$$N_-(P_1,P_2) = \#(V(P_1,P_2) \cap \{(x,y) \in \mathbb{R}^2;\ \phi(x,y) < 0\}).$$
By the inertia theorem we infer, following Hermite, that
$$n_-(H) = N_c(P_1,P_2) + N_-(P_1,P_2), \quad n_+(H) = N_c(P_1,P_2) + N_+(P_1,P_2),$$
where $(n_-(H), n_+(H))$ is the signature of the form $H$. Although it is difficult in
general to eliminate the variables $(a_j, b_j)$ in the form $H$, counting the number of
real common zeros of some given polynomials contained in a rectangle leads to an
elegant closed form, as pointed out by Hermite; see [27] for details.
We specialize the above ideas to polynomials of a single complex variable.
For $p \in \mathbb{C}[\zeta]$ denote $p^*(\zeta) = \overline{p(\bar\zeta)}$, that is, the polynomial obtained from $p$ by
conjugating the coefficients. Assume $n = \deg p$ and define the complex polynomial
in two variables:
$$-i\,\frac{p(u)p^*(v) - p^*(u)p(v)}{u - v} = \sum_{k,l=1}^n c_{kl}\, u^{k-1} v^{l-1}.$$
$$H_p(z,\bar z) = \sum_{k,l=1}^n c_{kl}\, z_k \bar z_l. \tag{9.3}$$
Theorem 9.17 (Hermite). Let $H_p$ be the hermitian form (9.3) associated with
a polynomial $p \in \mathbb{C}[\zeta]$ of degree $n$. Denote by $n_\mp(H_p)$ the number of negative,
respectively, positive, squares in the decomposition of $H_p$. Then the polynomial $p$
has $n_+(H_p)$ roots in the upper half-plane $\Im\zeta > 0$, $n_-(H_p)$ roots in the lower half-plane, and $n - n_-(H_p) - n_+(H_p)$ common roots between $p$ and $p^*$, that is, real roots
or complex conjugated roots.
In particular we derive from here a stability criterion widely used in mechanics
and engineering (compare with the similar criteria due to Routh and Hurwitz [15]).
Corollary 9.18. Assume that the hermitian form $H_p$ is positive definite. Then the
polynomial $p$ has all roots contained in the upper half-plane.
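Hermite's root count can be tried out numerically. The sketch below builds the coefficient matrix of the bivariate polynomial $-i\,(p(u)p^*(v) - p^*(u)p(v))/(u-v)$ by exact division of coefficient arrays and reads off its inertia. It assumes NumPy; the function names `hermite_matrix` and `inertia` and the tolerance are illustrative choices, not part of the text.

```python
import numpy as np

def hermite_matrix(p_desc):
    """Coefficient matrix of -i (p(u)p*(v) - p*(u)p(v)) / (u - v).

    p_desc: complex coefficients of p, highest degree first (NumPy convention).
    Entry [k, l] multiplies u^k v^l (0-based indices).
    """
    p = np.asarray(p_desc, dtype=complex)[::-1]   # ascending powers
    ps = p.conj()                                  # p*: conjugated coefficients
    n = len(p) - 1
    num = np.outer(p, ps) - np.outer(ps, p)        # numerator coefficients
    q = np.zeros((n, n), dtype=complex)            # quotient by (u - v)
    # (u - v) * Q = num  means  num[k, l] = Q[k-1, l] - Q[k, l-1]
    for k in range(n, 0, -1):
        for l in range(n):
            carry = q[k, l - 1] if (k < n and l > 0) else 0.0
            q[k - 1, l] = num[k, l] + carry
    # The matrix -i Q is real symmetric; drop the rounding-level imaginary part.
    return (-1j * q).real

def inertia(p_desc, tol=1e-9):
    """Return (number of positive, number of negative) eigenvalues."""
    eig = np.linalg.eigvalsh(hermite_matrix(p_desc))
    scale = max(1.0, float(np.abs(eig).max()))
    return int((eig > tol * scale).sum()), int((eig < -tol * scale).sum())

# Two roots in the upper half-plane, one in the lower half-plane.
roots = [1j, 2 + 3j, -1 - 1j]
n_plus, n_minus = inertia(np.poly(roots))
print(n_plus, n_minus)
```

For a polynomial with real or conjugate-paired roots the matrix becomes singular, so the zero-eigenvalue threshold `tol` matters in practice.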
The proof of Hermite's theorem relies on a product formula (well known today
in the context of Bezoutian computations). The key identity is, assuming $p = p_1p_2$:
$$\frac{p(u)p^*(v) - p^*(u)p(v)}{u-v} = p_2(u)p_2^*(v)\,\frac{p_1(u)p_1^*(v) - p_1^*(u)p_1(v)}{u-v} + p_1^*(u)p_1(v)\,\frac{p_2(u)p_2^*(v) - p_2^*(u)p_2(v)}{u-v}.$$
A similar product rule is inherited by the form $H_p$, allowing us to use the inertia
theorem and induction on the degree in order to prove Hermite's theorem.
9.4.1
The specific denominator $u - v$ and form of the conjugate $p^*$ in the Hermite theorem
are related to the Schwarz reflection with respect to the boundary of the domain
of root separation. In the case of the upper half-plane the reflection is $\zeta \mapsto \bar\zeta$.
When repeating the procedure for the unit disk, with Schwarz reflection $\zeta \mapsto 1/\bar\zeta$,
one arrives at a similar conclusion. The computations were detailed by Schur (and
independently by several other authors); see [27]. Specifically, let $p \in \mathbb{C}[\zeta]$ be a polynomial of degree $n$ and define $p^\sharp(\zeta) = \zeta^n\,\overline{p(1/\bar\zeta)}$ as the polynomial with conjugated
coefficients, arranged in reversed order. Consider the bivariate polynomial
$$\frac{p^\sharp(u)\,(p^\sharp)^*(v) - p(u)p^*(v)}{1 - uv} = \sum_{k,l=1}^n a_{kl}\, u^{k-1} v^{l-1}.$$
Let $S_p$ be the hermitian form with coefficients $(a_{kl})$. In complete analogy with
Hermite's theorem we state the following well-known result.
n
#
( aj ), p () =
j=1
n
#
(1 aj ),
j=1
(9.4)
and it is a rational n-fold covering of the disk onto the disk and its boundary onto
the boundary.
9.4.2
Eigenvalue Separation
A too well charted and traveled area of control theory deals with stability criteria
for linear systems of differential equations. In its turn, via a Laplace transform, this
heavily relies on root separation criteria as presented above. We discuss below an
instance of Hermite theory as transgressed and distilled by engineers.
Let $A, B$ be complex $n \times n$ matrices, with $A = A^*$ self-adjoint. We consider
the (spectrahedral) region in the complex plane
G = {z C; A + zB + (zB) 0}.
We assume that G is nonempty and does not coincide with the full complex plane.
Theorem 9.20. An $n \times n$ matrix $M$ has all its eigenvalues in the region $G$ if and
only if there exists a positive definite matrix $X$ such that
A M + B (XM ) + (XM ) B 0.
The most important examples are given by the following choices: the halfplane n = 1, A$= 0, B%= 1 and the disk centered at zero, of radius r, corresponding
0
01
to n = 2, A = r
0 r , B = ( 0 0 ) .
For the proof of the theorem and more details see [7].
9.4.3
Exercises
of the polynomial
i
is positive definite.
Exercise 9.23. Find a hermitian form whose positivity certifies that a polynomial
has all roots contained in a given ellipse.
Exercise 9.24. Prove the eigenvalue separation theorem in the case of a disk or a
half-plane.
9.5
Schur's Algorithm
Returning to Schur's theorem discussed in the last section, notice that the Blaschke
product (9.4) produces a positive semidefinite kernel
$$K(u,v) = \frac{1 - m(u)\overline{m(v)}}{1 - u\bar v}.$$
is positive semidefinite (as the polynomial p has all its roots contained in the unit
disk), and in addition the function p does not vanish in the disk.
It was Schur who recognized in the above positivity a characterization of all
power series
$$f(z) = a_0 + a_1 z + a_2 z^2 + \cdots \tag{9.5}$$
which map the disk into the disk. By different means, the same question was
studied by Carathéodory, Fejér, and Toeplitz; again see [27] for more details. We
focus below on Schur's approach, as it leads to a basic algorithmic way of verifying
when $f(z)$ maps the disk into the disk. We call, in short, $f$ a contractive analytic
function in the disk.
Assume that the analytic function (9.5) satisfies $|f(z)| \le 1$ whenever $|z| < 1$.
In particular $|a_0| = |f(0)| \le 1$. If $|a_0| = 1$, then the function $f(z) = a_0$ is a constant
by the maximum principle. Assume that $|a_0| < 1$. Then a Möbius transform applied
to $f$ yields a new function from the disk to its closure, which in addition vanishes
at $z = 0$, whence
$$z f_1(z) = \frac{f(z) - \gamma_0}{1 - \bar\gamma_0 f(z)},$$
where $\gamma_0 = f(0)$ by definition. By virtue of Schwarz's lemma, the factor $f_1(z)$ satisfies
$|f_1(z)| \le 1$ for all $|z| < 1$. By inverting the transform we find
$$f(z) = \frac{z f_1(z) + \gamma_0}{1 + \bar\gamma_0 z f_1(z)}.$$
$$f_{k-1}(z) = \frac{z f_k(z) + \gamma_{k-1}}{1 + \bar\gamma_{k-1} z f_k(z)}. \tag{9.6}$$
In this way we have associated with the finite section of the sequence of coefficients of $f(z)$ another sequence of the same length, called the Schur parameters:
$$(a_0, \ldots, a_n) \mapsto (\gamma_0, \gamma_1, \ldots, \gamma_n).$$
The transformation is real analytic, as one can easily prove by induction. The main
result is:
Theorem 9.25 (Schur). Let $n$ be a positive integer and let $a_0, a_1, \ldots, a_n$ be
complex numbers. There exists a power series
$$f(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_n z^n + O(z^{n+1})$$
mapping the open disk into the closed disk if and only if the Schur parameters
$\gamma_0, \gamma_1, \ldots, \gamma_n$ are of modulus less than or equal to one. If $k$ is the first index with
$|\gamma_k| = 1$, then there exists only one continuation of $a_0 + a_1 z + \cdots + a_k z^k$ into such
a function $f$, and this is a Blaschke product of degree $k$.
Moreover, the recursion formula (9.6) labels all possible extensions of the
polynomial $a_0 + a_1 z + a_2 z^2 + \cdots + a_n z^n$ to a contractive function as in the statement.
The recursion formula and the representation of the function $f(z)$ by a chain of
simple multiplication and division operations is a perfect analogue of the continued
fraction algorithm in number theory.
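The recursion can be carried out directly on truncated power series. The following is a minimal sketch, assuming NumPy and assuming the parameters are defined by $\gamma_k = f_k(0)$ with $z f_{k+1} = (f_k - \gamma_k)/(1 - \bar\gamma_k f_k)$; the function name `schur_parameters` and the stopping tolerance are illustrative.

```python
import numpy as np

def schur_parameters(a, tol=1e-12):
    """Schur parameters computed from the leading Taylor coefficients a_0..a_n."""
    f = np.array(a, dtype=complex)
    gammas = []
    while True:
        g = f[0]                      # gamma_k = f_k(0)
        gammas.append(g)
        if abs(g) >= 1 - tol or len(f) == 1:
            break                     # extremal parameter reached, or data exhausted
        num = f.copy()
        num[0] -= g                   # f_k - gamma_k (zero constant term)
        den = -np.conj(g) * f
        den[0] += 1                   # 1 - conj(gamma_k) f_k
        q = np.zeros_like(f)          # truncated power series num / den
        for k in range(len(f)):
            acc = sum(den[j] * q[k - j] for j in range(1, k + 1))
            q[k] = (num[k] - acc) / den[0]
        f = q[1:]                     # divide by z to obtain f_{k+1}
    return gammas

# Taylor coefficients of the Blaschke factor (z - 1/2) / (1 - z/2).
print(schur_parameters([-0.5, 0.75, 0.375]))
```

Each step consumes one coefficient, so $n+1$ coefficients yield at most $n+1$ parameters; the loop stops early once a parameter of modulus one is met, in line with the uniqueness statement of the theorem.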
One step further, Schur made the connection with the counting zeros form,
by proving that $a_0 + a_1 z + a_2 z^2 + \cdots + a_n z^n$ can be continued to a function $f(z)$
which maps the disk into its closure if and only if the Toeplitz matrix
$$T = \begin{pmatrix} a_0 & a_1 & \cdots & a_n \\ 0 & a_0 & \cdots & a_{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_0 \end{pmatrix}$$
is contractive. In order to prove this fact we start with the observation that Schur's
algorithm as presented above implies that every analytic function $f(z)$ mapping the
disk into the disk can be uniformly approximated on compact subsets of the open
disk by Blaschke products. Consequently, the kernel
$$\frac{1 - f(z)\overline{f(w)}}{1 - z\bar w}$$
is positive semidefinite. We can dilate the argument of the function to $f(rz)$, $r < 1$,
and assume that $f$ is analytic in a neighborhood of the closed unit disk.
(a0 + a1 z + + an z n )(p0 + p1 z 1 + + pn z n )
|z|=1
|w|=1
dz dw
2iz 2iw
$$= \|v\|^2 - \|Tv\|^2,$$
where $v = (p_0, p_1, \ldots, p_n) \in \mathbb{C}^{n+1}$ and $T$ is the above Toeplitz matrix.
The reader can consult the monograph [13] for further details and many unexpected applications of the Schur parameters.
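Schur's Toeplitz criterion is simple to test on given coefficients. The sketch below, assuming NumPy, builds the upper triangular Toeplitz matrix and compares its largest singular value with one; the names `schur_toeplitz` and `is_contractive` and the tolerance are illustrative.

```python
import numpy as np

def schur_toeplitz(a):
    """Upper triangular Toeplitz matrix with first row a_0, ..., a_n."""
    n = len(a)
    t = np.zeros((n, n), dtype=complex)
    for shift, coeff in enumerate(a):
        t += coeff * np.eye(n, k=shift)   # a_shift on the shift-th superdiagonal
    return t

def is_contractive(a, tol=1e-12):
    """True when the operator norm of the Toeplitz matrix is at most 1."""
    return bool(np.linalg.norm(schur_toeplitz(a), 2) <= 1 + tol)

print(is_contractive([0, 0.5, 0]))   # leading coefficients of f(z) = z / 2
print(is_contractive([0.5, 1.0]))    # violates |a_1| <= 1 - |a_0|^2
```

In borderline cases, where the norm equals one exactly, the outcome depends on the chosen tolerance.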
9.5.1
Exercises
Exercise 9.26. Prove that the only power series (9.5) associated with an extremal
Schur parameter $|\gamma_n| = 1$ is a Blaschke product of degree $n$.
Exercise 9.27. Find all continuations to a contractive power series of a degree 2
polynomial $a_0 + a_1 z + a_2 z^2$. Describe explicitly the conditions on the coefficients
$a_0, a_1, a_2$ that such a continuation exists, and if so, that it is unique.
Exercise 9.28. Let f (z) be an analytic function mapping the disk into the disk.
Prove that f can be approximated uniformly on compact subsets of the open disk
by Blaschke products.
9.6
Riesz-Herglotz Theorem
The structure of contractive analytic functions in the disk revealed in the previous section can be related by a linear fractional transform to that of nonnegative
harmonic functions in the disk. We briefly describe this new point of view.
Let $h(z)$, $|h(z)| \le 1$, be an analytic function in the disk $|z| < 1$. Leaving the
case of a constant function aside, we can assume that $|h(z)| < 1$ in the disk, and
define the function $f(z) = \frac{1+h(z)}{1-h(z)}$, so that $\Re f(z) \ge 0$ for all $|z| < 1$. Let $f_r(z) =
f(rz)$, $0 < r < 1$, so that the functions $f_r$ are defined in a neighborhood of the closed
disk and $\lim_{r\to 1} f_r = f$ uniformly on compact subsets of $D = \{z \in \mathbb{C};\ |z| < 1\}$.
A direct application of Cauchy's formula yields
$$f_r(w) = \int_{-\pi}^{\pi} \frac{e^{i\theta} + w}{e^{i\theta} - w}\, \Re f_r(e^{i\theta})\,\frac{d\theta}{2\pi} + i\,\Im f(0).$$
Remark that the measures $d\mu_r = \Re f_r(e^{i\theta})\frac{d\theta}{2\pi}$ are nonnegative and of uniform mass
equal to $\Re f(0)$; hence they form a compact set in the weak-* topology of measures
on the unit torus. By passing to a limit point we obtain a positive measure $\mu$ with
the property
$$f(w) = \int_{-\pi}^{\pi} \frac{e^{i\theta} + w}{e^{i\theta} - w}\, d\mu(\theta) + i\,\Im f(0). \tag{9.7}$$
Since the trigonometric polynomials are dense in the space of continuous functions
on the torus, we infer that the measure $\mu$ is unique with the above property.
Formula (9.7) is known as the Riesz-Herglotz representation of the nonnegative
harmonic functions in the disk. Since $D$ is simply connected, for any harmonic
function $u : D \to \mathbb{R}$ there exists an analytic function $f : D \to \mathbb{C}$ such that $u = \Re f$.
Putting together these observations we have proved the equivalence between the first
two statements in the next theorem.
Theorem 9.29. Let $f : D \to \mathbb{C}$ be an analytic function. The following assertions
are equivalent:
(a) $\Re f \ge 0$;
(b) there exists a positive measure $\mu$ on $\partial D$, such that (9.7) holds;
(c) the kernel $\frac{f(z) + \overline{f(w)}}{1 - z\bar w}$ is positive semidefinite on $D \times D$.
Proof. (a) $\Rightarrow$ (b) was proved before. If (b) holds true, then
$$\frac{f(z) + \overline{f(w)}}{1 - z\bar w} = 2\int \frac{d\mu(\theta)}{(e^{i\theta} - z)(e^{-i\theta} - \bar w)},$$
whence (c) is true. Finally, (c) $\Rightarrow$ (a) because a positive semidefinite kernel has
nonnegative values on the diagonal.
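When $f$ is analytic across the closed disk, the representing measure is simply $\Re f(e^{i\theta})\,d\theta/2\pi$, and the integral formula can be checked by quadrature. The following is a minimal sketch, assuming NumPy; the test function $f(z) = 1/(2-z) + 0.3i$, the point $w$, and the name `herglotz_rhs` are illustrative assumptions.

```python
import numpy as np

def herglotz_rhs(f, w, n=512):
    """Right-hand side of the Riesz-Herglotz formula with d mu = Re f d(theta)/(2 pi)."""
    theta = 2 * np.pi * np.arange(n) / n
    e = np.exp(1j * theta)
    kernel = (e + w) / (e - w)
    # Periodic trapezoid rule: the mean of the samples approximates
    # the integral against d(theta) / (2 pi).
    integral = np.mean(kernel * f(e).real)
    return integral + 1j * complex(f(0)).imag

f = lambda z: 1 / (2 - z) + 0.3j   # positive real part on the closed unit disk
w = 0.3 + 0.2j
print(herglotz_rhs(f, w), f(w))
```

The trapezoid rule converges geometrically here because the integrand is analytic in a neighborhood of the circle; for boundary data that are merely continuous more sample points would be needed.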
The above positivity result has a classical counterpart in the case $f$ is a nonnegative polynomial on the boundary of the disk. Specifically, we have the following
Riesz-Fejér theorem.
Theorem 9.30. Let $p(z,\bar z)$ be a polynomial with complex coefficients which is
nonnegative on the unit torus $\mathbb{T}$. Then there exists a polynomial $q(z) \in \mathbb{C}[z]$ with
the property
$$p(z,\bar z) = |q(z)|^2, \quad z \in \mathbb{T}.$$
d
cj eij .
d
i
poles of the rational function P are symmetric with respect to the torus, whence
&
'
#
#
1
d
2
z P (z) = cz
(z j )
(z k ) z
,
k
j
k
|ei j |2
# |ei k |2
9.6.1
|k |2
To remain in the spirit of this volume, and returning to Schur's theorem and the
Riesz-Herglotz integral representation, we are in the position of stating the following
direct optimization corollary of our computations.
Proposition 9.31. Let $h(z)$ be a bounded analytic function in the disk. Then
$$M = \|h\|_{\infty,D} = \sup_{z \in D} |h(z)|$$
is the smallest nonnegative number $M$ with the property that the kernel
$$\frac{M^2 - h(z)\overline{h(w)}}{1 - z\bar w}$$
is positive semidefinite.
For the proof we simply substitute $f(z) = \frac{M + h(z)}{M - h(z)}$ in Theorem 9.29 and note
that
$$f(z) + \overline{f(w)} = 2\,\frac{M^2 - h(z)\overline{h(w)}}{(M - h(z))(M - \overline{h(w)})}.$$
9.6.2
Theorem 9.29 has a fourth equivalent statement which brings into focus very naturally Hilbert space representations of all bounded analytic functions in the disk. We
start with the positive kernel appearing in Proposition 9.31. According to Proposition 9.10 there exists a Hilbert space $H$ and an analytic function $F : D \to H$,
such that
$$\frac{M^2 - h(z)\overline{h(w)}}{1 - z\bar w} = \langle F(z), F(w)\rangle, \quad z, w \in D,$$
or equivalently
$$M^2 + \langle zF(z), wF(w)\rangle = h(z)\overline{h(w)} + \langle F(z), F(w)\rangle.$$
d , b
c
A
'
C
C
:
H
H
9.6.3
Exercises
Exercise 9.33. Assume that the analytic function $f$ maps the disk into the right
half-plane. Under which conditions does the kernel $\frac{f(z) + \overline{f(w)}}{1 - z\bar w}$ have finite rank?
Exercise 9.34. The set $H$ of all functions $f$ satisfying the conditions in Theorem
9.29 is a closed convex cone, as a subset of $\mathcal{O}(D)$, the Fréchet space of all analytic
functions in the disk. Find the extremal rays of $H$.
Exercise 9.35. Let $\Omega$ be a simply connected domain and let $\phi : \Omega \to D$ be a
conformal mapping. Let $f \in \mathcal{O}(\Omega)$ be bounded. Find $\|f\|_{\infty,\Omega}$ via a positive definite
optimization of a hermitian form involving $f$ and $\phi$.
Exercise 9.36. Derive a Hilbert space realization of all analytic functions mapping the disk into the right half-plane.
Exercise 9.37. [17] Let A Md (C) be a matrix with cyclic vector and minimal
monic polynomial Pd (z). Prove that there are polynomials Pk (z), of exact degree
deg Pk = k, 0 k < d, such that
|P (z)|2
(A z)1
2 = |Pd1 (z)|2 + |Pd2 (z)|2 + + |P0 (z)|2 .
Conversely, every sum of hermitian squares of polynomials in exact decreasing order
comes as above from a cyclic matrix.
9.7
We are ready at this point to prove a new inequality involving functions and operators.
Theorem 9.38 (von Neumann). Let $T : H \to H$ be a linear contractive
($\|T\| \le 1$) operator acting on a complex Hilbert space $H$ and let $f$ be an analytic
function defined in a neighborhood of the closed unit disk. Then
$$\|f(T)\| \le \|f\|_{\infty,D}. \tag{9.8}$$
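For matrices and polynomials the inequality (9.8) can be observed numerically. The sketch below, assuming NumPy, evaluates a polynomial at the nilpotent Jordan block of size 3 (a contraction of norm one) and compares the operator norm with a sampled supremum on the unit circle; the helper names `poly_of_matrix` and `sup_on_circle` are illustrative.

```python
import numpy as np

def poly_of_matrix(coeffs, t):
    """Evaluate sum_k coeffs[k] T^k by Horner's rule (coeffs in ascending order)."""
    result = np.zeros_like(t, dtype=complex)
    for c in reversed(coeffs):
        result = result @ t + c * np.eye(t.shape[0])
    return result

def sup_on_circle(coeffs, samples=4096):
    """Maximum of |p| over equally spaced points of the unit circle."""
    z = np.exp(2j * np.pi * np.arange(samples) / samples)
    return float(np.abs(np.polynomial.polynomial.polyval(z, coeffs)).max())

J = np.eye(3, k=1)            # nilpotent Jordan block, operator norm 1
p = [1.0, 1.0, 1.0]           # p(z) = 1 + z + z^2
lhs = float(np.linalg.norm(poly_of_matrix(p, J), 2))
rhs = sup_on_circle(p)
print(lhs, rhs)
```

Sampling the circle only bounds the supremum from below, so for a rigorous comparison the sample count should be large relative to the degree of the polynomial.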
Let $f(z)$ be an analytic function with positive real part defined in a neighborhood of the closed unit disk. By the Riesz-Herglotz formula
$$f(z) + \overline{f(w)} = 2\int \frac{1}{e^{i\theta} - z}\,(1 - \bar w z)\,\frac{1}{e^{-i\theta} - \bar w}\, d\mu(\theta),$$
where $\mu$ is a positive measure on the unit circle.
Expand everything into a series and replace $z$ in the above identity by $T$ and
$\bar w$ by $T^*$, ensuring that the mixed terms contain $T$ to the left of $T^*$. The result is
$$\Re f(T) = \int (e^{i\theta} - T)^{-1}(I - TT^*)(e^{-i\theta} - T^*)^{-1}\, d\mu(\theta) \ge 0,$$
9.7.1
Among the many applications of von Neumann's inequality we sketch below the
construction of the spectral measure of a unitary operator. The reader will easily
adapt afterward the proof to the case of bounded self-adjoint operators.
Let $H$ be a complex Hilbert space and let $U : H \to H$ be a unitary operator,
that is $U^*U = UU^* = I$. Then $zI - U$ is invertible for all $z \in \mathbb{C}$, $|z| \ne 1$. In other
terms, the spectrum of $U$ is contained in the unit circle $\partial D$.
Let $p(z,\bar z)$ be a polynomial satisfying $p(z,\bar z) \ge 0$ whenever $|z| = 1$. By means
of the identity $z\bar z = 1$ along $\partial D$ we can replace all mixed terms $z^m\bar z^n$ by a linear
combination of pure terms $z^k$ or $\bar z^\ell$, modulo $1 - |z|^2$. Consequently we can write
$$p(z,\bar z) = p_1(z) + \overline{p_1(z)} + (1 - |z|^2)\,p_2(z,\bar z),$$
where $p_1(z)$ is a polynomial which depends only on $z$. According to the von Neumann inequality we obtain
$$p(U,U^*) = 2\Re p_1(U) \ge 0,$$
since $\Re p_1(z) \ge 0$ on the circle.
Thus, the polynomial functional calculus $\Phi : \mathbb{C}[z,\bar z] \to L(H)$, defined by
$\Phi(p(z,\bar z)) = p(U,U^*)$, is linear, multiplicative, unital, and positive:
$$(\text{for all } z \in \partial D,\ p(z,\bar z) \ge 0) \ \Rightarrow\ p(U,U^*) \ge 0,$$
or equivalently,
$$\|p(U,U^*)\| \le \|p\|_{\infty,\partial D}, \quad p \in \mathbb{C}[z,\bar z].$$
Since every continuous function on the circle is a uniform limit of trigonometric
polynomials, we can extend $\Phi$ by continuity to a continuous algebra homomorphism
$$\Phi : C(\partial D) \to L(H), \quad \Phi(z) = U,$$
9.7.2
Exercises
Exercise 9.40. Let $J_n$ be the (nilpotent) Jordan block of size $n \times n$. Prove that
$\|J_n\| = 1$ and translate the matrix inequality $\|p(J_n)\| \le \|p\|_{\infty,D}$ into numerical
inequalities referring to an arbitrary polynomial $p(z)$.
Exercise 9.41. Let $U_k$, $1 \le k \le n$, be a finite system of commuting unitary
matrices of size $d \times d$. Prove that there exists a unitary matrix $U$ and polynomials
$p_k(z)$ such that $U_k = p_k(U)$ for all $k$, $1 \le k \le n$.
Exercise 9.42. Let $U \in L(H)$ be a unitary operator. Prove that the bicommutant
$(U)''$ of $U$ is equal to the range of the Borel functional calculus described in Theorem
9.39. The commutant of a set of operators $S \subset L(H)$ is $S' = \{T \in L(H);\ TX =
XT,\ X \in S\}$.
9.8
One of the classical applications of the realization Theorem 9.32 has to do with the
bounded analytic interpolation of discrete data in the disk. Contrary to the free
polynomial interpolation, the data are in this case bound by a series of positivity
conditions. The precise statement follows.
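Theorem 9.43 (Nevanlinna-Pick). Let $\{a_i\}$, $\{c_i\}$, $i \in I$, be subsets of $D$, so
that $\{a_i\}$ does not have accumulation points, but the index set may be infinite. There
exists a contractive analytic function $f$ in the disk satisfying
$$f(a_i) = c_i, \quad i \in I,$$
if and only if the kernel
$$\frac{1 - c_i\bar c_j}{1 - a_i\bar a_j}, \quad i, j \in I,$$
is positive semidefinite.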
Proof. One implication follows from Theorem 9.29. In order to prove the converse,
assume that the kernel in the statement is positive semidefinite. Then there exists
a Hilbert space $H$ and a function $h : I \to H$ with the property
$$\frac{1 - c_i\bar c_j}{1 - a_i\bar a_j} = \langle h(i), h(j)\rangle, \quad i, j \in I.$$
Starting from here we argue as in the proof of Theorem 9.32, namely,
$$1 + \langle a_i h(i), a_j h(j)\rangle = c_i\bar c_j + \langle h(i), h(j)\rangle$$
implies the existence of a contractive block matrix operator on $\mathbb{C} \oplus H$ satisfying
$$\begin{pmatrix} d & \langle\cdot, b\rangle \\ c & A \end{pmatrix} \begin{pmatrix} 1 \\ a_i h(i) \end{pmatrix} = \begin{pmatrix} c_i \\ h(i) \end{pmatrix}.$$
From here we infer by eliminating h(i):
ci = d + (I ai A)1 c, b.
Hence the contractive analytic function
f (z) = d + (I zA)1 c, b
interpolates the given data.
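For finitely many nodes the positivity condition of the theorem is a finite matrix test. The following is a minimal sketch, assuming NumPy; the names `pick_matrix` and `is_interpolable`, the tolerance, and the sample data are illustrative assumptions.

```python
import numpy as np

def pick_matrix(a, c):
    """Matrix with entries (1 - c_i conj(c_j)) / (1 - a_i conj(a_j))."""
    a = np.asarray(a, dtype=complex)
    c = np.asarray(c, dtype=complex)
    return (1 - np.outer(c, c.conj())) / (1 - np.outer(a, a.conj()))

def is_interpolable(a, c, tol=1e-10):
    """True when the Pick matrix is positive semidefinite up to tol."""
    return bool(np.linalg.eigvalsh(pick_matrix(a, c)).min() >= -tol)

# f(z) = z / 2 interpolates 0 -> 0 and 1/2 -> 1/4.
print(is_interpolable([0, 0.5], [0, 0.25]))
# No contractive f has f(0) = 0 and f(1/2) = 0.8 (Schwarz lemma).
print(is_interpolable([0, 0.5], [0, 0.8]))
```

The second data set fails because any contractive function vanishing at the origin satisfies $|f(z)| \le |z|$, which the negative determinant of the $2 \times 2$ Pick matrix detects.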
A similar result is known for (higher multiplicity) Hermite interpolation, that
is, by prescribing the values of finitely many derivatives of $f$ at every point:
(k)
9.8.1
Exercises
Exercise 9.45. Let $a_1 = 0, a_2, a_3$ be three distinct points in the unit disk and
choose $c_1 = 0, c_2, c_3$ also in the disk. Write the $3 \times 3$ conditions that there exists a
contractive analytic function in the disk interpolating these data. Find when this
function is unique.
Exercise 9.46. Prove that in the case of finitely many data (the set $I$ in the
statement is finite), there always exists a rational contractive interpolant. Estimate
its degree.
9.9
The Riesz-Herglotz theory on the unit circle has an obvious parallel on the line. A few
details are worth a closer look, as they provide the background of perturbation
theory of self-adjoint operators. We avoid below the complications related to unbounded symmetric operators or even general Hilbert space theory, focusing only
on finite-dimensional computations. The reader can greatly benefit by filling these
gaps by reading the relevant sections contained in the monograph by Gohberg and
Krein [16].
Start with a self-adjoint matrix $A = A^* \in L(\mathbb{C}^d)$. We can arrange the eigenvalues in nondecreasing order:
$$\lambda_1(A) \le \lambda_2(A) \le \cdots \le \lambda_d(A).$$
Consider a rank 1 self-adjoint operator $\xi\langle\cdot,\xi\rangle$, acting on $\mathbb{C}^d$, where $\xi \in \mathbb{C}^d$ is a vector.
An immediate corollary of the min-max principle (see Exercises 9.2.4, exercise 2)
shows that the perturbed matrix $B = A + \xi\langle\cdot,\xi\rangle$ has eigenvalues interlaced to
those of $A$:
$$\lambda_1(A) \le \lambda_1(B) \le \lambda_2(A) \le \cdots \le \lambda_d(A) \le \lambda_d(B).$$
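The interlacing of eigenvalues under a rank 1 positive perturbation can be observed on a random example. The sketch below assumes NumPy; the dimension, the seed, and the variable names are illustrative. It also checks that the total displacement of the eigenvalues equals $\mathrm{Tr}(B - A) = \|\xi\|^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6
g = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
A = (g + g.conj().T) / 2                    # random self-adjoint matrix
xi = rng.standard_normal(d) + 1j * rng.standard_normal(d)
B = A + np.outer(xi, xi.conj())             # rank 1 perturbation xi <., xi>

lam_A = np.linalg.eigvalsh(A)               # eigenvalues in ascending order
lam_B = np.linalg.eigvalsh(B)

lower_ok = bool(np.all(lam_A <= lam_B + 1e-10))          # lambda_j(A) <= lambda_j(B)
upper_ok = bool(np.all(lam_B[:-1] <= lam_A[1:] + 1e-10)) # lambda_j(B) <= lambda_{j+1}(A)
total_shift = float(np.sum(lam_B - lam_A))               # equals ||xi||^2
print(lower_ok, upper_ok, total_shift)
```

The small additive tolerance only guards against rounding in the eigenvalue solver; in exact arithmetic the inequalities hold as stated.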
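Let $\sigma = \sum_{j=1}^d \chi_{[\lambda_j(A),\lambda_j(B)]}$ be the characteristic function of the union of spectral
displacement intervals between the two sets of eigenvalues. Then, for every $z \notin \mathbb{R}$
we obtain
$$\det[(B - zI)(A - zI)^{-1}] = \prod_{j=1}^d \frac{\lambda_j(B) - z}{\lambda_j(A) - z} = \exp\sum_{j=1}^d \log\frac{\lambda_j(B) - z}{\lambda_j(A) - z} = \exp\int_{\mathbb{R}} \frac{\sigma(t)\,dt}{t - z}.$$
On the other hand, by simply expanding the vector $\xi$ in the orthonormal basis
which diagonalizes $A$ we find
$$\det[(B - zI)(A - zI)^{-1}] = \det[I + (A - zI)^{-1}\xi\langle\cdot,\xi\rangle] = 1 + \langle(A - zI)^{-1}\xi,\xi\rangle = 1 + \sum_{j=1}^d \frac{c_j}{\lambda_j(A) - z},$$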
One step further, we can put together the above observations in the form of
equivalent representations of the same object.
Proposition 9.47. The following classes are equivalent:
(a) rational functions R(z) C(z) satisfying R() = 1 and
0 < ,R(z),(z) < , z
/ R;
(b) finite atomic positive measures on the real line;
(c) characteristic functions (t) of bounded semialgebraic subsets of the real
line;
z
R
R tz
= det[(A + , zI)(A zI)1 ] = 1 + (A zI)1 , .
Proof. A bounded semialgebraic subset of the real line is simply a finite union of
intervals. We prefer this fancy terminology due to higher-dimensional analogues;
see [17]. Since in all formulas the operator $A$ or its powers appear against the
vector $\xi$, it is natural to assume that this vector is cyclic with respect to $A$; that is,
$\xi, A\xi, \ldots, A^{d-1}\xi$ is a linear basis of $\mathbb{C}^d$.
To see that (a) $\Rightarrow$ (b) we remark that both zeros and poles of $R$ must be
real, interlaced (by the argument principle), and that the residues at every pole
are positive. Then (b) $\Rightarrow$ (d) by considering the multiplier $A = M_t$ on the space
$L^2(\mu)$, and (d) $\Rightarrow$ (c), (d) $\Rightarrow$ (b) by the computations preceding the statement.
The implication (c) $\Rightarrow$ (b) follows by direct integration and exponentiation. Finally
(d) $\Rightarrow$ (a) is a straightforward computation
$$\frac{\langle(A - zI)^{-1}\xi,\xi\rangle - \langle(A - \bar zI)^{-1}\xi,\xi\rangle}{z - \bar z} = \langle(A - zI)^{-1}(A - \bar zI)^{-1}\xi,\xi\rangle > 0$$
for all $z \notin \mathbb{R}$.
To be in line with the theme of this chapter, we can add to the above equivalences the positivity (as a kernel) condition
E
D
R(z)R(z)
> 0, z
/ R.
(e) R is rational, R() = 1, and
zz
In this way the relation to the positivity theory in the disk exposed in the
previous sections becomes more transparent.
The function $\sigma_{AB} = \sigma$ appearing in the statement is known as the phase
shift or the spectral shift of the perturbation $A \mapsto B = A + \xi\langle\cdot,\xi\rangle$. The name is
justified by the following remarkable trace formula:
$$\mathrm{Tr}(f(B) - f(A)) = \int_{\mathbb{R}} f'\,\sigma_{AB}\,dt, \quad f \in \mathbb{C}[z].$$
Indeed,
$$\mathrm{Tr}(f(B) - f(A)) = \sum_{j=1}^d \big(f(\lambda_j(B)) - f(\lambda_j(A))\big) = \sum_{j=1}^d \int_{\lambda_j(A)}^{\lambda_j(B)} f'(t)\,dt.$$
By defining step by step, via rank 1 additive perturbations, the spectral shift
of a pair of self-adjoint matrices according to the rule
$$\sigma_{AB} + \sigma_{BC} = \sigma_{AC},$$
we are led to the crucial observation
$$\int_{\mathbb{R}} |\sigma_{AB}|\,dt \le \mathrm{Tr}|A - B|.$$
This enables us to take limits and obtain the following well-known theorem.
Theorem 9.48 (Lifshitz-Krein). Let $A, B \in L(H)$ be bounded self-adjoint
operators acting on a complex Hilbert space $H$ and assume that $A - B$ is trace-class. Then there exists a function $\sigma \in L^1(\mathbb{R}, dt)$ with compact support, such that
$$\mathrm{Tr}(f(B) - f(A)) = \int_{\mathbb{R}} f'\,\sigma\,dt, \quad f \in C_0^1(\mathbb{R}),$$
and
$$\int_{\mathbb{R}} |\sigma|\,dt \le \mathrm{Tr}|A - B|.$$
The reader can consult for details the original article [25] and the monograph [23].
9.9.1
Exercises
9.10
where the standard multiindex notation is used: $z^\alpha = z_1^{\alpha_1}\cdots z_d^{\alpha_d}$. Note that the
matrix of coefficients $(c_{\alpha\beta})$ is unambiguously determined by $f$. A diagonalization
of this matrix yields a decomposition
$$f(z,\bar z) = \|F_1(z)\|^2 - \|F_2(z)\|^2,$$
where $F_j : \mathbb{C}^d \to \mathbb{C}^{n_j}$ are homogeneous (of degree $n$), vector-valued polynomial
functions.
It is important to remark from the very beginning that, even if $f(z,\bar z) > 0$ for
all $z \ne 0$, the form $f$ may not be a sum of hermitian squares. The following simple
example in two variables $(z,w)$ singles out where the obstruction lies:
$$f_0 = |z|^4 + |w|^4 - c|zw|^2$$
is everywhere positive on $\mathbb{C}^2 \setminus \{0\}$ as soon as $c < 2$. Now define
$$f_N = (|z|^2 + |w|^2)^N (|z|^4 + |w|^4 - c|zw|^2).$$
The matrix associated with $f_N$ is diagonal with entries containing binomial coefficients of the form
$$\binom{N}{p} + \binom{N}{p+2} - c\binom{N}{p+1}.$$
After elementary calculations, the condition that all these coefficients are positive is
$$N + 1 > \frac{2c}{2 - c},$$
||
z1 1 ...zd d
f (z, z) =
a z z
||=||=n
|| u w
f (z, w)u(w)dG.
Cd
, then
Ef (u)(z) =
a !u z
and
Ef (u), vH =
a !!u v ,
where $v(z) = \sum_{|\alpha|=n} v_\alpha z^\alpha$. If the matrix of coefficients $(a_{\alpha\beta})$ is hermitian, then $E_f$ turns
out to be a symmetric operator acting on the space of homogeneous polynomials of
degree $n$:
$$\langle E_f(u), v\rangle_H = \langle u, E_f(v)\rangle_H.$$
Note also that $E_f \ge 0$ as a linear operator if and only if the matrix $(a_{\alpha\beta})$ is positive
semidefinite, if and only if the form $f$ is a sum of hermitian squares.
2N
Let N be a positive integer, and denote by fN (z, z) = z
N ! f (z, z), a form
of bidegree (N + n, N + n). Consider two homogeneous polynomials u, v of degree
N + n each. Note that
zz
z
2N
=
.
N!
!
||=N
Then
EfN u, vH =
,,
( + )!( + )!
u+ v+
!
a D u, D vH
=
,
a z D u, v
.
H
Assume that $f(z,\bar z) > 0$ for all $z \ne 0$. From this point the proof becomes
more technical and we merely mention the main idea: the differential operator
$$T_f = \sum_{|\alpha|=|\beta|} a_{\alpha\beta}\, z^\alpha D^\beta$$
is elliptic and symmetric on the space of polynomials,
with positive principal symbol equal to the form $f$ (up to a constant). Hence, with
a proper choice of Sobolev norms, $T_f$ is Fredholm. In particular, when restricted
to polynomials, $T_f$ possesses a finite number of negative eigenvalues. But $T_f$ maps
the space of homogeneous polynomials of degree $N + n$ into itself, and it coincides
there with the operator with integral kernel $f_N$. Thus, for $N$ sufficiently large, $E_{f_N}$
is a positive operator; that is, the form $f_N$ is a sum of hermitian squares.
Proof 2. The second proof of Quillen's theorem is due to Catlin and D'Angelo [5]
and closely follows the same idea of compact perturbations of integral operators
(only that it was published thirty years after Quillen). Start again with the $(n,n)$
homogeneous form $f(z,\bar z) = \sum_{|\alpha|=|\beta|=n} a_{\alpha\beta}\, z^\alpha \bar z^\beta$, and assume that it is positive
on the unit sphere in $\mathbb{C}^d$. By homogeneity, this is the same condition as before:
$f(z,\bar z) > 0$ for all $z \ne 0$. Let $B$ denote the unit ball in $\mathbb{C}^d$ and consider the Bergman
space $A^2(B)$ with reproducing kernel $K_B$ (see [5, Section 3.4]). The operator
$$S_f : A^2(B) \to A^2(B), \quad (S_f u)(z) = \int_B K_B(z,w)\, f(z,\bar w)\, u(w)\, d\lambda_{2d}(w)$$
||=||=n a z w
=
cN fN (z, w),
KB (z, w)f (z, w) =
|B|(1 z w)d+1
N =0
2N
,
f (w, w)|u(w)|2 d2d (w) 0.
=
B
Hence $S_f$ has only finitely many negative eigenvalues. Finally, we infer that for
large enough $N$, the restriction of the operator $S_f$ to the space of homogeneous
polynomials of degree $N$ is positive, that is, the form $f_N$ is a sum of hermitian
squares.
9.10.1
Pólya's Theorem
2N
F (z, z) =
Assume |x1 |N f (x) =
|| a x . Then
z
|| a z z ; that
is, the coecient matrix associated with F is diagonal, and hence it has positive
entries by Quillens theorem. In conclusion we obtain the following theorem.
Theorem 9.51 (Pólya). Let $f(x)$ be a homogeneous polynomial with real coefficients in $d$ variables. If $f(x) > 0$ for all $x = (x_1, \ldots, x_d) \in [0,\infty)^d \setminus \{0\}$, then there
exists an integer $N \ge 0$ with the property that the form $(x_1 + \cdots + x_d)^N f(x)$ has
positive coefficients.
An important addition to Pólya's theorem is that one can estimate the degree
$N$ from the degree of $f$ and its distance to zero on the standard simplex; see [28].
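For a binary form the smallest exponent in Pólya's theorem can be found by brute force. The sketch below uses exact rational arithmetic from the Python standard library so that vanishing coefficients are detected reliably; the function names `polya_coefficients` and `smallest_polya_exponent`, the search bound, and the sample form $x^2 - \tfrac{17}{10}xy + y^2$ are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def polya_coefficients(f_coeffs, N):
    """Coefficients of (x + y)^N * f(x, y) for a binary form f.

    f_coeffs[j] is the coefficient of x^(m-j) y^j in f; the output uses the
    same indexing for the product, which has degree m + N.
    """
    out = [Fraction(0)] * (len(f_coeffs) + N)
    for k in range(N + 1):
        for j, c in enumerate(f_coeffs):
            out[k + j] += comb(N, k) * c
    return out

def smallest_polya_exponent(f_coeffs, max_N=1000):
    """Smallest N for which all coefficients are strictly positive, or None."""
    for N in range(max_N + 1):
        if all(c > 0 for c in polya_coefficients(f_coeffs, N)):
            return N
    return None

# x^2 - (17/10) x y + y^2 is positive on the closed quadrant minus the origin.
f = [Fraction(1), Fraction(-17, 10), Fraction(1)]
print(smallest_polya_exponent(f))
```

With $x = |z|^2$ and $y = |w|^2$ these are the same binomial expressions that appear on the diagonal of the matrix of the form $f_N$ in the two-variable hermitian example, here with $c = 17/10$.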
9.10.2
Exercises
Exercise 9.52. Prove, using the geometry of the zero set, that for every $N \ge 1$
the two complex variable form $(|z|^2 + |w|^2)^N (|z|^2 - |w|^2)^2$ is not a sum of squares
of hermitian forms.
Exercise 9.53. Prove that the zero set of a sum of hermitian squares is a complex
algebraic variety.
Exercise 9.54. Show that $x^2$ cannot be represented as a sum of hermitian squares
in the variable $z = x + iy$.
9.11
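Proof. Let $p(z,\bar z) = \sum_{\alpha,\beta} p_{\alpha\beta}\, z^\alpha \bar z^\beta$ be the polarization of $p(x,y)$ and assume that
$p$ has degree $n$ both in $z$ and $\bar z$, but it is not necessarily homogeneous. One can
assume that $n$ is even by passing from $p$ to $\|z\|^2 p$. Next we add a new complex
variable $z_{d+1} = u + iv$, $z' = (z, z_{d+1})$, and homogenize $p$:
$$P(z', \bar z') = \sum_{\alpha,\beta} p_{\alpha\beta}\, z^\alpha z_{d+1}^{\,n-|\alpha|}\, \bar z^\beta \bar z_{d+1}^{\,n-|\beta|}.$$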
and the proof is complete, noting that the ideal of the variety $\|z\|^2 = 1$ is radical.
Let $A$ be an $\mathbb{R}$-algebra, and let $S$ be a subsemiring of $A$ with $\mathbb{R}_+ \subset S$. Recall
that $S$ is said to be Archimedean (in $A$) if $\mathbb{R} + S = A$, that is, if for every $f \in A$ there
exists a real number $c$ such that $c + f \in S$. If $A$ is generated by $x_1, \ldots, x_n$, then $S$
is Archimedean if and only if there exist $c_i \in \mathbb{R}$ with $c_i \pm x_i \in S$ ($i = 1, \ldots, n$). See
[28, Definition 5.4.1], [32, references there].
Definition 9.56. Let $I$ be an ideal in $\mathbb{R}[x,y]$. We say that $\Sigma_h$ is Archimedean
modulo $I$ if the semiring $\Sigma_h + I$ is Archimedean in $\mathbb{R}[x,y]$ or, equivalently, if the
semiring $(\Sigma_h + I)/I$ is Archimedean in $\mathbb{R}[x,y]/I$.
By (a particular case of) the representation theorem [28, Theorem 5.4.4], we
have the following theorem.
Theorem 9.57. Let $I$ be an ideal in $\mathbb{R}[x,y]$. The following conditions on $I$ are
equivalent:
(i) The set $V_{\mathbb{R}}(I)$ is compact and every $f \in \mathbb{R}[x,y]$ with $f > 0$ on $V_{\mathbb{R}}(I)$ lies in
$\Sigma_h + I$;
(ii) $\Sigma_h$ is Archimedean modulo $I$.
(The representation theorem, in the version for semirings, asserts that (ii)
implies (i). The opposite implication is obvious.)
We observe the following simple characterization of these ideals.
Proposition 9.58. Let $I$ be an ideal in $\mathbb{R}[x,y]$. Then $\Sigma_h$ is Archimedean modulo $I$
if and only if $I$ contains a polynomial of the form
f = c + ||z|| +
2
r
|qk (z)|2
k=1
and
(1 c) 2yj = |zj i|2 +
|zk |2 (c + ||z||2 )
k=j
This gives plenty of examples of ideals $I$ such that every polynomial strictly
positive on $V_{\mathbb{R}}(I)$ is a hermitian sum of squares modulo $I$. In particular we have
obtained in this way an algebraic proof (the third one) and explanation of Quillen's
phenomenon.
Proposition 9.59. On a real hypersurface of $\mathbb{C}^d$ of equation
$$\|z\|^2 + \sum_{k=1}^r |q_k(z)|^2 = M,$$
where $q_k \in \mathbb{C}[z]$ and $M > 0$, every positive polynomial is a sum of hermitian squares.
9.11.1
Exercises
Exercise 9.60. Let F (z, z) = k |qk (z)|2 be a sum of hermitian squares. Prove
that the polarization of F satises CauchySchwarz inequality |F (, )|2
F (, )F (, ).
Exercise 9.61. Let $P_1, \dots, P_r \in \mathbb{C}[z]$ be polynomials in a single complex variable
and let $a_1, \dots, a_r$ be real numbers. Define the function
h(z, z) =
r
j=1
r
a2j .
j=1
Prove that
h(1, 1) = h(1, 1) = h(1, 1) = 0,
and deduce that h is not Archimedean modulo the ideal (h).
9.12
Multivariable Miscellanea
9.12.1
Among all aspects of the theory of bounded analytic functions, results of Nevanlinna–Pick
interpolation type have received by far the most attention. A pioneer in this
area is Jim Agler. His book with McCarthy [1] illustrates well the intricate nature
of interpolation and realization theories in higher dimensions.
One of the starting points is the observation that an analytic function f(z) is
fit for Nevanlinna–Pick interpolation in the unit ball $\mathbb{B}$ only if the kernel
$$\frac{1 - f(z)\overline{f(w)}}{1 - \langle z, w \rangle}, \qquad z, w \in \mathbb{B},$$
$$\|p(T_1, T_2)\| \le \|p\|_{\infty, \mathbb{D}^2}$$
is true.
A celebrated example of Varopoulos shows that such an inequality fails for
three commuting contractions; see [2]. One preferred way of avoiding the complications related to the difference between the Schur–Agler class and all contractive
analytic functions is to turn to functions of free, noncommuting variables.
9.12.2
Taking Artin's solution to Hilbert's 17th problem as a model, there have been a few recent attempts to characterize quotients of sums of hermitian squares. The definitive
result is due to Varolin [34], but before stating it we consider a few low-dimensional
cases and examples.
Proposition 9.63 (D'Angelo). A nontrivial real-valued polynomial of a single
complex variable $P(z, \bar{z})$ can be represented as
$$P(z, \bar{z}) = \frac{\sum_j |p_j(z)|^2}{\sum_k |q_k(z)|^2},$$
with finitely many $p_j, q_k \in \mathbb{C}[z]$ if and only if there are complex numbers $a_\lambda$, positive
or negative integers $n_\lambda$, and a polynomial $Q(z, \bar{z})$, such that
$$P(z, \bar{z}) = \prod_\lambda |z - a_\lambda|^{2 n_\lambda}\, Q(z, \bar{z}), \qquad z \in \mathbb{C},$$
$$Q(z, \bar{z}) > 0, \qquad z \in \mathbb{C},$$
and $2 \deg_z Q = \deg Q$.
The proof [9] is accessible as an exercise to the reader, with the only indication
that if a quotient of sums of hermitian squares vanishes at the point z = a, then the
lowest-degree term of its Taylor series in $z - a$ and $\bar{z} - \bar{a}$ has the form $|z - a|^{2m}$.
j=1
|pj (z)|2 +
N
j+1
|pj (z)|2
P (z, z)
< .
The proof uses algebraic geometry techniques and a refined estimate of the
Bergman kernel of a carefully chosen metric in the ambient space.
9.12.3
z
2N
P (z)
2 , there exists Q as in the statement, so that
$$\|z\|^2 = \|P(z)\|^2 + \|Q(z)\|^2.$$
Since the map $P \oplus Q$ is nonconstant, the maximum principle implies that it is proper
(that is, by definition, it pulls back compact subsets of the open unit ball in $\mathbb{C}^{m+n}$
into compact subsets of the open unit ball in $\mathbb{C}^d$).
To illustrate the complexity of the classification of proper analytic maps between balls, we reproduce below from the work of D'Angelo (see, for instance, [10])
a low-degree and low-dimensional analysis.
The main point is the following question: given N, is there a polynomial or
rational function g from $\mathbb{C}^2$ to $\mathbb{C}^N$ such that $|g(z)|^2 = 1 - |\lambda z_1 z_2|^2$ on the unit
sphere? Here is the result.
(a) If $|\lambda|^2 \ge 4$, then for all N, the answer is no.
(b) If N = 1, then the answer is yes only when $\lambda = 0$.
(c) If N = 2, the answer is yes precisely when one of the following holds:
$\lambda = 0$, $|\lambda|^2 = 1$, $|\lambda|^2 = 2$, $|\lambda|^2 = 3$.
(d) For each $\lambda$ with $|\lambda|^2 < 4$, there is a smallest $N_\lambda$ for which the answer is
yes. The limit as $|\lambda|$ tends to 2 of $N_\lambda$ is infinity.
Idea of proof. We are seeking a holomorphic polynomial mapping g such that
$$|g_1(z)|^2 + \cdots + |g_N(z)|^2 + |\lambda|^2 |z_1 z_2|^2 = 1$$
on the unit sphere.
The components of g and the additional term $\lambda z_1 z_2$ define a holomorphic
mapping from the n ball to the N + 1 ball which maps the sphere to the sphere.
Such a map is either constant or proper. The maximum of $|z_1 z_2|^2$ on the sphere
is $\tfrac14$, attained when $|z_1|^2 = |z_2|^2 = \tfrac12$. Hence $|\lambda|^2 \le 4$ must hold if the question has a positive
answer. We claim that $|\lambda|^2 = 4$ cannot hold either. Suppose $|\lambda|^2 = 4$ and g exists.
Then we would have
$$|g(z)|^2 + 4|z_1|^2 |z_2|^2 = 1 = (|z_1|^2 + |z_2|^2)^2$$
on the sphere, and hence
$$|g(z)|^2 = (|z_1|^2 - |z_2|^2)^2$$
on the sphere. No such g exists.
The only proper mappings from the 2-ball to itself are automorphisms and
hence linear fractional transformations. Therefore the term $\lambda z_1 z_2$ can arise only if
$\lambda = 0$. When $\lambda = 0$ we may of course choose g(z) to be $(z_1, z_2)$.
The next statement follows from Faran's classification of the proper holomorphic rational mappings from $\mathbb{B}_2$ to $\mathbb{B}_3$ [12]. We say that two maps g and h are
spherically equivalent if there are automorphisms u, v of the domain and target
balls such that $h = v g u$. If g existed, then there would be a proper polynomial
mapping h from $\mathbb{B}_2$ to $\mathbb{B}_3$ with the monomial $\lambda z_1 z_2$ as a component. It follows from
Faran's classification that h would have to be spherically equivalent to one of the
four mappings
h(z1 , z2 ) = (z1 , z2 , 0),
h(z1 , z2 ) = (z1 , z1 z2 , z22 ),
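$$f(z) = \left( z_1^7,\; z_2^7,\; \sqrt{\tfrac{7}{2}}\, z_1 z_2,\; \sqrt{\tfrac{7}{2}}\, z_1^5 z_2,\; \sqrt{\tfrac{7}{2}}\, z_1 z_2^5 \right),$$
$$f(z) = \left( z_1^5,\; z_2^5,\; \sqrt{\tfrac{10}{3}}\, z_1 z_2,\; \sqrt{\tfrac{5}{3}}\, z_1^4 z_2,\; \sqrt{\tfrac{5}{3}}\, z_1 z_2^4 \right).$$
9.12.4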
Exercises
Exercise 9.66. Find the Hilbert space realization of functions in the Schur–Agler
class.
Exercise 9.67. Show that the polynomial $(|zw|^2 - |u|^2)^2 + |z|^8$ is not a quotient
of sums of squares.
Exercise 9.68. The polynomial $1 + \lambda |z|^2 + |z|^4$ is a quotient of sums of squares
for $\lambda > -2$, but for $\lambda = -2$ it is not.
Exercise 9.69. The polynomial $z^2 + \bar{z}^2 + 2|z|^2$ is not a quotient of sums of squares.
Open problem. A classification of polynomial or rational proper maps between
balls is still unknown, even for maps defined on $\mathbb{B}_3$.
9.13
Among the many possible ramifications of the positivity of hermitian forms discussed in the preceding section, the case of so-called hereditary polynomials in a
free $*$-algebra stands apart, first for its simplicity, and second for the applications
to optimization problems outlined in other chapters of the present book. We confine
ourselves to reporting a couple of significant results in this direction, recently proved
in [20].
Let $\mathcal{A}$ denote the free $\mathbb{R}$-algebra with generators $\{x_1, \dots, x_d, x_1^*, \dots, x_d^*\}$ and
$\mathbb{R}$-linear involution satisfying
$$(fg)^* = g^* f^*, \quad (x_k)^* = x_k^*, \quad (x_k^*)^* = x_k, \qquad 1 \le k \le d, \quad f, g \in \mathcal{A}.$$
An element $f \in \mathcal{A}$ is called analytic if it belongs to the algebra generated by
$x_1, \dots, x_d$, and it is called hereditary if all monomials in the decomposition of f
have $x_k^*$ to the left of $x_j$ for all j, k. For instance $x_1^* x_3 + x_2^* x_2$ is hereditary, while
$x_1 x_3^* + x_2^* x_2$ is not.
We will state a generic Positivstellensatz and Nullstellensatz, quite different
from and simpler than the results we have seen in the commutative case. To this aim,
let $p_1, \dots, p_m$ be analytic elements of the free $*$-algebra and let
$$(p) = \{ r_1 p_1 + \cdots + r_m p_m ;\ r_1, \dots, r_m \in \mathcal{A} \}$$
denote the left ideal generated by them. Also, let
$$\operatorname{sym}(p) = \Big\{ \sum_j (r_j^* q_j + q_j^* r_j) ;\ r_j \in \mathcal{A},\ q_j \in (p) \Big\}$$
be the associated symmetrized ideal.
The following result holds.
Theorem 9.70. Let $p_1, \dots, p_m \in \mathcal{A}$ be analytic elements. If a symmetric hereditary
$q \in \mathcal{A}$ satisfies
$$\langle q(X)v, v \rangle \ge 0$$
for all pairs (X, v) of finite matrices and vectors satisfying $p_j(X)v = 0$, $1 \le j \le m$,
then
$$q = \sum_{k=1}^{n} f_k^* f_k + g,$$
finite matrices and vectors better separate points and directions than the mere point
evaluations and derivations of the commutative polynomial algebra.
9.14
Further Reading
The selection of topics related to hermitian positivity included in the present chapter
is far from complete. While we have tried to make the text self-contained and
illustrative for many theoretical ramications, we did not touch the vast array of
applications, classical and modern. We indicate below a few links to applied areas
with the hope that the interested reader will pursue some of these threads.
To start with the most recent publications, one can consult the monograph
[3], where the essential role played by hermitian sums of squares in signal processing, the prediction theory of stochastic processes and quantum information, is
well explained. Then, for matrix completion problems, a subject of high interest
nowadays, having its origin in the Schur parameter analysis, see the monograph [8].
Matrix completion problems are frequently invoked nowadays in image analysis,
remote sensing, information theory, codification, and so on. The monograph
by Foias and Frazho [13] contains an interesting application of completion problems
and Schur parameters to the study of the wave propagation in layered media.
The early discovery of the shift of spectral lines is at the origin of the
perturbation theory of hermitian forms. Together with scattering theory,
another foundational theme of quantum mechanics, perturbation of spectra remains
a hot theme of research, with recent spectacular applications to solid state physics
and submolecular chemistry. The old writings of the founders, such as Friedrichs
[14] and Krein [25, 26], remain relevant and inspiring. We must remark here on the
imperative appearance of complex numbers and hermitian forms in the mathematical formulations of quantum mechanics. The textbook [30] and its three additional
volumes are filled with the formalism of hermitian forms, admittedly in infinitely many
variables.
The stability of motion of classical mechanical systems (for instance, oscillations of an elastic medium or fluid flows) naturally leads to the problem of enclosing
the spectrum of a hermitian or dissipative operator (aka generator of a semigroup
or hamiltonian) into a prescribed region of the complex plane. The classical root
separation results presented in the first part of this chapter have immediate consequences to the stability of dynamical systems; see, for instance, [15, 26].
Finally, Hilbert space realization of contractive analytic functions and bounded
analytic interpolation theorems are at the heart of moment problems and the control
theory of systems of differential equations. Each subject is a big enterprise in itself.
See again the most recent publications [2, 3] and track their bibliographies back to
century-old sources.
Bibliography
[1] J. Agler and J. McCarthy. Pick Interpolation and Hilbert Function Spaces,
Grad. Stud. Math. American Mathematical Society, Providence, RI, 2002.
Appendix A
Background Material
The appendix consists of four parts: matrices and quadratic forms, convex optimization, convex geometry, and algebraic geometry. The material in this appendix
is mostly standard and as such is presented for the convenience of the reader in a
compact form.
A.1
We present here a few basic facts about linear algebra, symmetric matrices, and
quadratic forms. There are many excellent references on the topic, including [11]
and [15], among others.
A matrix $A \in \mathbb{R}^{n \times n}$ is symmetric if $a_{ij} = a_{ji}$ for $i, j = 1, \dots, n$. The set
of symmetric matrices is denoted as $S^n$ and is a real vector space of dimension
$\binom{n+1}{2} = \frac{1}{2}(n+1)n$. Real quadratic forms can always be represented in terms of
symmetric matrices, i.e., $q(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j = x^T A x$, where $a_{ij} = a_{ji}$. We
often identify a symmetric matrix with the corresponding quadratic form.
The characteristic polynomial of a matrix $A \in S^n$ is
$$p_A(\lambda) := \det(\lambda I - A) = \lambda^n + \sum_{k=0}^{n-1} p_k \lambda^k = \prod_{k=1}^{n} (\lambda - \lambda_k),$$
where $\lambda_k$ are the eigenvalues of A. Given a subset
$S \subseteq \{1, \dots, n\}$, let $A_S$ be the submatrix of A whose rows and columns are indexed
by S. The principal minor of A corresponding to the subset S is the determinant
of $A_S$. If S has the form $\{1, 2, \dots, k\}$, then the corresponding minor is called a
leading principal minor. It can be shown that the coefficient $p_k$ of the characteristic
polynomial is equal (up to sign) to the sum of all the principal minors of size $n - k$,
i.e., $p_k = (-1)^{n-k} \sum_{S : |S| = n-k} \det A_S$. Notice that, in particular, $p_{n-1} = -\operatorname{Tr} A$
and $p_0 = (-1)^n \det A$.
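The relation between the coefficients $p_k$ and the principal minors can be checked on a small example. The following is a minimal sketch using exact integer arithmetic; the helper names (`det`, `charpoly_coeffs`) and the test matrix are illustrative choices, not taken from the text.

```python
from itertools import combinations

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def charpoly_coeffs(A):
    """Return [p_0, ..., p_{n-1}], with p_k = (-1)^(n-k) times the sum of
    all principal minors of size n-k."""
    n = len(A)
    coeffs = []
    for k in range(n):
        minors = sum(det([[A[i][j] for j in S] for i in S])
                     for S in combinations(range(n), n - k))
        coeffs.append((-1) ** (n - k) * minors)
    return coeffs

def eval_charpoly(A, lam):
    """Evaluate lam^n + sum_k p_k lam^k using the coefficients above."""
    n = len(A)
    return lam ** n + sum(p * lam ** k for k, p in enumerate(charpoly_coeffs(A)))

def det_lambda_minus(A, lam):
    """Evaluate det(lam * I - A) directly, for comparison."""
    n = len(A)
    return det([[(lam if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)])

A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
p = charpoly_coeffs(A)  # p[2] is -Tr A and p[0] is (-1)^3 det A
```

Comparing `eval_charpoly(A, lam)` with `det_lambda_minus(A, lam)` at a few integer values of `lam` gives an independent check of the formula.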
A.1.1
If the quadratic form $x^T A x$ takes only nonnegative values, we say that the matrix
A is positive semidefinite. Similarly, if it takes only positive values (except at the
origin, where it necessarily vanishes), then A is positive definite. There are several
equivalent conditions for a matrix to be positive (semi)definite:
Proposition A.1. Let $A \in S^n$ be a symmetric matrix. The following statements
are equivalent:
1. The matrix A is positive semidefinite ($A \succeq 0$).
2. For all $x \in \mathbb{R}^n$, $x^T A x \ge 0$.
3. All eigenvalues of A are nonnegative.
4. All $2^n - 1$ principal minors of A are nonnegative.
5. The coefficients of $p_A(\lambda)$ weakly alternate in sign, i.e., $(-1)^{n-k} p_k \ge 0$ for
$k = 0, \dots, n - 1$.
6. There exists a factorization $A = BB^T$, where $B \in \mathbb{R}^{n \times r}$ and r is the rank of A.
For the definite case, there are similar characterizations:
Proposition A.2. Let $A \in S^n$ be a symmetric matrix. The following statements
are equivalent:
1. The matrix A is positive definite ($A \succ 0$).
2. For all nonzero $x \in \mathbb{R}^n$, $x^T A x > 0$.
3. All eigenvalues of A are strictly positive.
4. All n leading principal minors of A are strictly positive.
5. The coefficients of $p_A(\lambda)$ alternate in sign, i.e., $(-1)^{n-k} p_k > 0$ for
$k = 0, \dots, n - 1$.
6. There exists a factorization $A = BB^T$, with B square and nonsingular.
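The two propositions differ in which minors must be examined: all $2^n - 1$ principal minors in the semidefinite case, but only the n leading ones in the definite case. The sketch below illustrates why the leading minors alone do not suffice for semidefiniteness. The function names and the example matrices are hypothetical, chosen only for illustration.

```python
from itertools import combinations

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def principal_minors(A):
    """All 2^n - 1 principal minors det(A_S), S a nonempty subset of indices."""
    n = len(A)
    return [det([[A[i][j] for j in S] for i in S])
            for size in range(1, n + 1)
            for S in combinations(range(n), size)]

def leading_principal_minors(A):
    """The n minors det(A_S) with S = {1, ..., k}."""
    return [det([row[:k] for row in A[:k]]) for k in range(1, len(A) + 1)]

def is_psd(A):
    # Condition 4 of Proposition A.1: every principal minor is nonnegative.
    return all(m >= 0 for m in principal_minors(A))

def is_pd(A):
    # Condition 4 of Proposition A.2: every leading principal minor is positive.
    return all(m > 0 for m in leading_principal_minors(A))

A = [[0, 0], [0, -1]]   # leading minors are 0 and 0, yet x^T A x = -x_2^2
B = [[2, -1], [-1, 2]]  # positive definite
C = [[1, 1], [1, 1]]    # positive semidefinite but singular
```

For the matrix `A`, both leading principal minors vanish, so a test based only on them would wrongly accept it; the principal minor indexed by S = {2} equals -1 and exposes the failure.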
The set of positive semidefinite matrices is denoted as $S^n_+$, and its interior (the
set of positive definite matrices) as $S^n_{++}$. The set $S^n_+$ is invariant under nonsingular
congruence transformations; i.e., if T is nonsingular, $A \succeq 0 \Leftrightarrow T^T A T \succeq 0$. The
same statement holds for its interior, i.e., $A \succ 0 \Leftrightarrow T^T A T \succ 0$. For additional facts
about the geometry of the set of positive semidefinite matrices, see Section A.3.5.
A.1.2
Matrix Factorizations
For a symmetric matrix A, there are several matrix factorizations that can be used
to determine or certify properties of A; see, e.g., [11] for theoretical background and
[9] for computational aspects. Among the most important matrix factorizations, we
have the following.
Eigenvalue decomposition. Since A is symmetric, the eigenspaces corresponding to distinct eigenvalues are mutually orthogonal, and thus one can choose
an orthonormal basis of eigenvectors. As a consequence, the matrix A is
diagonalizable and there is always a decomposition
$$A = V \Lambda V^T, \qquad \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n),$$
A.1.3
of positive eigenvalues minus the number of negative eigenvalues, i.e., the integer
$n_+ - n_-$.
Notice that, with the notation above, the rank of A is equal to $n_+ + n_-$.
A symmetric positive definite $n \times n$ matrix has inertia $(0, 0, n)$, while a positive
semidefinite one has $(0, k, n - k)$ for some $k \ge 0$.
The inertia is an important invariant of a quadratic form, since it holds that
$\mathcal{I}(A) = \mathcal{I}(T A T^T)$, where T is nonsingular. This invariance of the inertia of a matrix
under congruence transformations is known as Sylvester's law of inertia; see, for
instance, [11]. This invariance makes it possible to efficiently compute the inertia
of a matrix from its $LDL^T$ decomposition, since in this case $\mathcal{I}(A) = \mathcal{I}(D)$, and the
inertia of a diagonal matrix is trivial to compute.
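As an illustration of computing the inertia from an $LDL^T$ factorization, here is a minimal sketch in exact rational arithmetic. It assumes that no zero pivot is met before the last elimination step (so no pivoting or 2-by-2 blocks are needed), which is a simplification and not the general algorithm. The function names and matrices are hypothetical.

```python
from fractions import Fraction

def ldlt_diagonal(A):
    """Diagonal of D in A = L D L^T via symmetric elimination.
    Assumes every pivot except possibly the last is nonzero (no pivoting)."""
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    d = []
    for k in range(n):
        pivot = M[k][k]
        if pivot == 0 and k < n - 1:
            raise ValueError("zero pivot: this sketch does no pivoting")
        d.append(pivot)
        for i in range(k + 1, n):
            factor = M[i][k] / pivot
            for j in range(k + 1, n):
                M[i][j] -= factor * M[k][j]  # Schur complement update
    return d

def inertia(A):
    """Triple (n_-, n_0, n_+), read off from the signs of the diagonal of D."""
    d = ldlt_diagonal(A)
    return (sum(x < 0 for x in d), sum(x == 0 for x in d), sum(x > 0 for x in d))

def congruence(T, A):
    """Return T A T^T."""
    n = len(A)
    TA = [[sum(T[i][k] * A[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return [[sum(TA[i][k] * T[j][k] for k in range(n)) for j in range(n)] for i in range(n)]

A = [[2, 1, 0], [1, -3, 1], [0, 1, 1]]
T = [[1, 2, 0], [0, 1, 1], [1, 0, 3]]  # nonsingular (determinant 5)
B = congruence(T, A)
```

Here `inertia(A)` and `inertia(B)` agree, as Sylvester's law of inertia predicts, even though the diagonal entries of D differ between the two factorizations.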
A.1.4
Schur Complements
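$$\begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \succ 0 \;\iff\; C \succ 0,\ \ A - B C^{-1} B^T \succ 0 \;\iff\; A \succ 0,\ \ C - B^T A^{-1} B \succ 0.$$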
A.2
Convex Optimization
In this section we describe the basic elements of optimization theory, with an emphasis on convexity. For additional background, complete statements, and proofs,
we refer the reader to the works [2, 3, 5].
A.2.1
for all $0 \le \lambda \le 1$, $x, y \in \mathbb{R}^n$.
for all $x, y \in \mathbb{R}^n$.
for all $x \in \mathbb{R}^n$,
A.2.2
Minimax Theorem
$$\max_{t \in T} \min_{s \in S} f(s, t) \le \min_{s \in S} \max_{t \in T} f(s, t). \tag{A.1}$$
If the maxima or minima in (A.1) are not attained, the inequality is still true by
replacing max and min with sup and inf, respectively.
It is of interest to understand situations under which (A.1) holds with equality.
The following is a well-known condition for this.
Theorem A.6 (minimax theorem). Let $S \subset \mathbb{R}^n$ and $T \subset \mathbb{R}^m$ be compact convex
sets, and $f : S \times T \to \mathbb{R}$ be a continuous function that is convex in its first argument
and concave in the second. Then
$$\max_{t \in T} \min_{s \in S} f(s, t) = \min_{s \in S} \max_{t \in T} f(s, t).$$
A special case of this theorem, used in game theory to prove the existence of
equilibria for zero-sum games, is when S and T are standard unit simplices and the
function f (s, t) is a bilinear form.
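The zero-sum game case can be illustrated on the "matching pennies" payoff matrix, where f(s, t) = s^T M t. The sketch below is only a grid search, not a general solver: it relies on the fact that a bilinear function attains its inner optimum at a vertex of the simplex, and on the assumption that the grid happens to contain the optimal mixed strategy (1/2, 1/2). All names are illustrative.

```python
from fractions import Fraction

M = [[1, -1], [-1, 1]]  # payoff matrix; f(s, t) = s^T M t on two unit simplices

def column_payoffs(s):
    """The vector s^T M: values of f(s, e_j) at the pure strategies e_j for t."""
    return [sum(s[i] * M[i][j] for i in range(2)) for j in range(2)]

def row_payoffs(t):
    """The vector M t: values of f(e_i, t) at the pure strategies e_i for s."""
    return [sum(M[i][j] * t[j] for j in range(2)) for i in range(2)]

N = 10  # grid resolution; even, so that the point 1/2 lies on the grid
grid = [(Fraction(a, N), 1 - Fraction(a, N)) for a in range(N + 1)]

# Mixed strategies (convex sets S and T): both sides of the minimax identity.
max_min = max(min(row_payoffs(t)) for t in grid)
min_max = min(max(column_payoffs(s)) for s in grid)

# Pure strategies only (non-convex sets): only the inequality (A.1) survives.
pure_max_min = max(min(M[i][j] for i in range(2)) for j in range(2))
pure_min_max = min(max(M[i][j] for j in range(2)) for i in range(2))
```

With mixed strategies the two values coincide, while restricting to the two pure strategies leaves a strict gap, which shows that the convexity of S and T is doing real work in the theorem.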
A.2.3
Lagrangian Duality
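$$\begin{array}{ll} \underset{x \in \mathbb{R}^n}{\text{minimize}} & f(x) \\ \text{subject to} & g_i(x) \le 0, \quad i = 1, \dots, m, \\ & h_j(x) = 0, \quad j = 1, \dots, p, \end{array} \tag{A.2}$$
and let $u^*$ be its optimal value. Define the Lagrangian associated with the optimization problem (A.2) as
$$L : \mathbb{R}^n \times \mathbb{R}^m_+ \times \mathbb{R}^p \to \mathbb{R}, \qquad (x, \lambda, \mu) \mapsto f(x) + \sum_{i=1}^{m} \lambda_i g_i(x) + \sum_{j=1}^{p} \mu_j h_j(x).$$
The Lagrange dual function is $\phi(\lambda, \mu) = \min_{x \in \mathbb{R}^n} L(x, \lambda, \mu)$, and the dual problem has optimal value
$$v^* = \max_{\mu \in \mathbb{R}^p \text{ and } \lambda \ge 0} \phi(\lambda, \mu).$$
Applying the minimax inequality (A.1), we see that this is a lower bound on the
value of the original optimization problem:
$$v^* \le \min_{x \in \mathbb{R}^n} \ \max_{\mu \in \mathbb{R}^p \text{ and } \lambda \ge 0} L(x, \lambda, \mu) = u^*.$$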
If the functions f, $g_i$ are convex and $h_j$ are affine, then the Lagrangian is convex
in x and concave in $(\lambda, \mu)$. To ensure strong duality (i.e., equality in the expression
above), compactness or other constraint qualifications are needed. An often used
condition is the Slater constraint qualification: there exists a strictly feasible point,
i.e., a point $z \in \mathbb{R}^n$ such that $g_i(z) < 0$ for all $i = 1, \dots, m$ and $h_j(z) = 0$ for all
$j = 1, \dots, p$. Under this condition, strong duality always holds.
Theorem A.7. Consider the optimization problem (A.2), where f, $g_i$ are convex
and $h_j$ are affine. Assume Slater's constraint qualification holds. Then the optimal
value of the primal is the same as the optimal value of the dual, i.e., $v^* = u^*$.
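A one-variable instance may help fix ideas. The problem below is a hypothetical example, not taken from the text; the name $\phi$ for the dual function is likewise a notational assumption. The point x = 2 is strictly feasible, so Slater's condition holds and the primal and dual values should agree.

```latex
\begin{aligned}
&\text{minimize } x^2 \quad \text{subject to } 1 - x \le 0,
  \qquad u^* = 1 \ \text{(attained at } x = 1\text{)},\\
&L(x, \lambda) = x^2 + \lambda (1 - x), \qquad \lambda \ge 0,\\
&\phi(\lambda) = \min_{x \in \mathbb{R}} L(x, \lambda) = \lambda - \tfrac{\lambda^2}{4}
  \qquad (\text{minimizer } x = \lambda / 2),\\
&v^* = \max_{\lambda \ge 0} \phi(\lambda) = \phi(2) = 1 = u^*.
\end{aligned}
```

For every $\lambda \ge 0$ the value $\phi(\lambda)$ is a lower bound on $u^*$, and the bound is tight exactly at $\lambda = 2$.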
A.2.4
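$$\nabla f(x^*) + \sum_{i=1}^{m} \lambda_i \nabla g_i(x^*) + \sum_{j=1}^{p} \mu_j \nabla h_j(x^*) = 0,$$
$$\begin{array}{lll}
\text{Primal feasibility:} & g_i(x^*) \le 0 & \text{for } i = 1, \dots, m, \\
 & h_j(x^*) = 0 & \text{for } j = 1, \dots, p, \\
\text{Dual feasibility:} & \lambda_i \ge 0 & \text{for } i = 1, \dots, m, \\
\text{Complementary slackness:} & \lambda_i g_i(x^*) = 0 & \text{for } i = 1, \dots, m.
\end{array} \tag{A.3}$$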
Under certain constraint qualifications (e.g., the ones in the theorem below), the
KKT conditions are necessary for local optimality.
Theorem A.8. Assume any of the following constraint qualifications holds:
• The gradients of the constraints $\{\nabla g_1(x^*), \dots, \nabla g_m(x^*), \nabla h_1(x^*), \dots, \nabla h_p(x^*)\}$
are linearly independent.
• There exists a strictly feasible point (Slater constraint qualification), i.e.,
a point $z \in \mathbb{R}^n$ such that $g_i(z) < 0$ for all $i = 1, \dots, m$ and $h_j(z) = 0$
for all $j = 1, \dots, p$.
• All constraints $g_i(x)$, $h_j(x)$ are affine functions.
Then, at every local minimum $x^*$ of (A.2) the KKT conditions (A.3) hold.
On the other hand, for convex optimization problems, i.e., if all functions f, $g_i$
are convex and $h_j$ are affine, then the KKT conditions are sufficient for local (and
thus global) optimality:
Theorem A.9. Let (A.2) be a convex optimization problem and $x^*$ be a point that
satisfies the KKT conditions (A.3). Then $x^*$ is a global minimum.
A.3
Convex Geometry
We give a summary of standard properties of convex sets and the cone of positive
semidefinite matrices. We refer the reader to [2, 13, 14] for more background and
proofs.
A.3.1
Basic Facts
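The conic hull, cone(S), of S is the set of all conic combinations of the points in S:
$$\operatorname{cone}(S) = \Big\{ x \in \mathbb{R}^n \;\Big|\; x = \lambda_1 y_1 + \cdots + \lambda_k y_k \text{ for some } y_1, \dots, y_k \in S,\ \lambda_i \ge 0 \Big\}.$$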
A.3.2
Cone Decomposition
A.3.3
Separation Theorems
An important property of a convex set is that we can certify when a point is not
in the set. This is usually done via a separation theorem. Let H be an affine
hyperplane in $\mathbb{R}^n$. Then H divides $\mathbb{R}^n$ into two half spaces. We will use $H_+$ and
$H_-$ to denote the open half spaces and $\overline{H}_+$ and $\overline{H}_-$ to denote the closed half spaces.
We say that H separates two sets $K_1$ and $K_2$ if $K_1$ and $K_2$ belong to different closed
half spaces $\overline{H}_+$ and $\overline{H}_-$. We say that H strictly separates $K_1$ and $K_2$ if they belong
to different open half spaces $H_+$ and $H_-$.
Equivalently we can think of H as the zero set of an affine linear functional
$\ell : \mathbb{R}^n \to \mathbb{R}$. Then $\ell$ separates $K_1$ and $K_2$ if $\ell(x) \ge 0$ for all $x \in K_1$ and $\ell(x) \le 0$
for all $x \in K_2$. Similarly $\ell$ strictly separates $K_1$ and $K_2$ if $\ell(x) > 0$ for all $x \in K_1$
and $\ell(x) < 0$ for all $x \in K_2$.
Now we state our most general separation theorem.
Theorem A.16. Let $K_1$ and $K_2$ be convex subsets of $\mathbb{R}^n$ such that $K_1 \cap K_2 = \emptyset$.
Then there exists an affine hyperplane H that separates $K_1$ and $K_2$.
We observe that it follows from Theorem A.16 that every face of a convex set
K is contained in an exposed face of K.
We will often be interested in strict separation, in which case we need to make
further assumptions on K1 and K2 .
Theorem A.17. Let $K_1$ and $K_2$ be disjoint convex subsets of $\mathbb{R}^n$ and suppose that
$K_1$ is compact and $K_2$ is closed. Then there exists an affine hyperplane H strictly
separating $K_1$ and $K_2$.
Theorem A.17 is often applied when $K_1$ is a single point. Separation theorems
lead to certificates of not belonging to a convex set. Combined with notions of
polarity explained below this leads to theorems of the alternative.
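The single-point case of strict separation can be made concrete. The sketch below takes the closed unit disk in the plane as the closed convex set and a point p outside it as the compact set, and builds an affine functional that is positive at p and negative on the disk. The choice of set, the function name, and the midpoint offset are all illustrative assumptions.

```python
import math

def separating_functional(p):
    """Affine functional l(x) = <u, x> - c with l(p) > 0 and l(x) < 0 on the
    closed unit disk, for a point p with |p| > 1."""
    norm = math.hypot(p[0], p[1])
    if norm <= 1:
        raise ValueError("p lies in the disk, so no certificate exists")
    u = (p[0] / norm, p[1] / norm)  # unit normal pointing toward p
    c = (1 + norm) / 2              # offset halfway between the disk and p
    return lambda x: u[0] * x[0] + u[1] * x[1] - c

p = (3.0, 4.0)
ell = separating_functional(p)

# Sample the boundary circle; the maximum of a linear functional over the disk
# is attained on the boundary, so these samples probe the worst case.
boundary = [(math.cos(2 * math.pi * k / 360), math.sin(2 * math.pi * k / 360))
            for k in range(360)]
worst = max(ell(x) for x in boundary)
```

The functional `ell` is a certificate that p does not belong to the disk: anyone can verify the claim by checking the sign of `ell` at p and bounding it on the set, without searching the set itself.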
We need to adjust Theorems A.16 and A.17 to the setting of cones, since, for
example, all cones contain the origin and are never disjoint. Also, any hyperplane
separating two cones $C_1$ and $C_2$ must be linear. We will say that a linear functional
$\ell : \mathbb{R}^n \to \mathbb{R}$ separates $C_1$ and $C_2$ if $\ell(x) \ge 0$ for all $x \in C_1$ and $\ell(x) \le 0$ for all
$x \in C_2$. Similarly $\ell$ strictly separates $C_1$ and $C_2$ if $\ell(x) > 0$ for all nonzero $x \in C_1$
and $\ell(x) < 0$ for all nonzero $x \in C_2$. Then we have the following theorem.
Theorem A.18. Let $C_1$ and $C_2$ be pointed closed convex cones in $\mathbb{R}^n$ such that
$C_1 \cap C_2 = \{0\}$. Then there exists a linear functional $\ell : \mathbb{R}^n \to \mathbb{R}$ strictly separating
$C_1$ and $C_2$.
A.3.4
We can view a compact convex set K as the convex hull of its extreme points,
but we can also view it as being cut out by linear inequalities. The set of affine
linear inequalities defining K is a convex object itself, and this leads to very fruitful
notions of polarity and duality in convex geometry.
Let $\langle \cdot, \cdot \rangle$ be an inner product on $\mathbb{R}^n$. Let $K \subset \mathbb{R}^n$ be a convex body with
origin in its interior. Define the polar body $K^\circ$ as follows:
$$K^\circ = \{ x \in \mathbb{R}^n \mid \langle x, y \rangle \le 1 \text{ for all } y \in K \}.$$
The polar body encodes all the affine linear defining inequalities of K. It is easy to
see that $K^\circ$ is also a convex body with origin in its interior. Moreover $x \in K^\circ$ is
then $K_2^\circ \subseteq K_1^\circ$.
A.3.5
and
$$(C^*)^* = C.$$
Let $S^n_+$ denote the cone of positive semidefinite $n \times n$ matrices. It is easy to show
that $S^n_+$ is a closed, pointed cone and it is full dimensional in $S^n$. We define an
inner product on $S^n$ as follows: $\langle A, B \rangle = \operatorname{Tr}(AB)$. It is not hard to show that the
cone $S^n_+$ is self-dual.
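Both directions of self-duality can be seen on small integer matrices. In the sketch below, positive semidefinite matrices are produced as Gram matrices $GG^T$, for which the trace inner product is a sum of squares and hence nonnegative. Conversely, a symmetric matrix with a negative eigenvalue has a negative inner product with a suitable rank-one positive semidefinite matrix. The matrices and helper names are illustrative choices.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def gram(G):
    """G G^T, which is always positive semidefinite."""
    return matmul(G, transpose(G))

G = [[1, 2], [0, 1]]
H = [[2, -1], [1, 3]]
A, B = gram(G), gram(H)

inner_psd = trace(matmul(A, B))  # <A, B> = Tr(AB)
# Same number written as a sum of squares: the squared Frobenius norm of G^T H.
frobenius_sq = sum(x * x for row in matmul(transpose(G), H) for x in row)

M = [[1, 2], [2, 1]]             # symmetric, with eigenvalue -1 at v = (1, -1)
v = [[1], [-1]]
W = gram(v)                      # rank-one PSD matrix v v^T
inner_witness = trace(matmul(M, W))  # equals v^T M v
```

The identity Tr(AB) = ||G^T H||_F^2 explains why the inner product of two positive semidefinite matrices is never negative, while `inner_witness` shows how a matrix outside the cone is detected by an element of the cone.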
A.3.6
Dimensional Inequalities
This bound is sharp in general, but it was improved by Barvinok in the case
where the intersection $\mathcal{A} \cap S^n_+$ is bounded [1].
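such that $\operatorname{codim} \mathcal{A} = \binom{r+2}{2}$ and the intersection $S^n_+ \cap \mathcal{A}$ is nonempty
and bounded. Then there is a matrix $X \in S^n_+ \cap \mathcal{A}$ such that $\operatorname{rank} X \le r$.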
A.4
There are excellent books for the basics of commutative algebra, algebraic geometry,
and real algebraic geometry used in this book. For polynomials, ideals, Gröbner
bases, and basic algebraic geometry we refer the reader to [7], an introduction to
these topics at the undergraduate level. For basic real algebraic geometry concepts
such as semialgebraic sets and the Tarski–Seidenberg quantifier elimination, see [12].
What we provide below is a brief tour through some of the algebraic themes that
arise in this book with the goal of giving the absolute newcomer a quick grasp of
the concepts. For a more serious appreciation of these topics, the reader is referred
to the above-mentioned books.
A.4.1
where $c_a \in k$. A monomial $x^a$ is in the support of f if $c_a \ne 0$ in the expression
$f = \sum_a c_a x^a$. The degree of $f = \sum_a c_a x^a$ is the maximum $L_1$-norm of the vectors a
that appear as exponents of monomials in the support of f. The usual fields considered in this book are the set of real numbers denoted as $\mathbb{R}$ and the set of complex
numbers denoted as $\mathbb{C}$. In what follows, we assume that the field k is either $\mathbb{C}$ or $\mathbb{R}$.
The polynomial ring $k[x] := k[x_1, \dots, x_n]$ is the set of all polynomials in $x_1, \dots, x_n$
with coefficients in k. It is endowed with the two binary operations of addition and
multiplication of pairs of polynomials.
Groups, rings, and fields are basic objects in abstract algebra that satisfy an
increasing list of properties. See, for instance, [8] for definitions and examples.
A binary operation $*$ on a set S is
• associative if $(f * g) * h = f * (g * h)$ for all $f, g, h \in S$, and
• commutative if $f * g = g * f$ for all $f, g \in S$.
The pair $(S, *)$
• has an identity if there exists an element $e \in S$ such that $f * e = e * f = f$ for
all $f \in S$, and
• has inverses if for each $f \in S$, there exists an element $f^{-1} \in S$ such that
$f * f^{-1} = f^{-1} * f = e$.
Definition A.27.
• A set G with a binary operation $*$ is a group if $*$ is associative and $(G, *)$ has
an identity and inverses. If in addition, $*$ is commutative in G, then G is
called a commutative group.
A.4.2
Polynomial Ideals, Gröbner Bases, and Quotient Rings
Definition A.28.
1. A subset $I \subseteq k[x]$ is an ideal if it satisfies the following properties:
• $0 \in I$.
• If $f, g \in I$, then $f + g \in I$.
• If $f \in I$ and $h \in k[x]$, then $hf \in I$.
2. The ideal generated by $f_1, \dots, f_t \in k[x]$ is the set
$$I = \Big\{ \sum_{i=1}^{t} h_i f_i : h_i \in k[x] \Big\},$$
denoted as $\langle f_1, \dots, f_t \rangle$.
Check that $\langle f_1, \dots, f_t \rangle$ is an ideal in k[x]. A simple example of an ideal in
the polynomial ring $\mathbb{R}[x_1, x_2]$ is the set of all polynomials that evaluate to 0 on the
point (0, 0). This ideal consists of all polynomials of the form $x_1 f + x_2 g$, where
$f, g \in k[x]$, and hence equals $\langle x_1, x_2 \rangle$. An ideal $I \subseteq k[x]$ is finitely generated if it
is generated by a finite set of polynomials in k[x]. A generating set of an ideal I
is called a basis of I. An ideal can have bases of different cardinalities and, unlike
a vector space basis, an ideal basis is just a generating set with no independence
requirements.
The initial term $\operatorname{in}_\succ(f)$ of a polynomial $f = \sum_a c_a x^a \in k[x]$ with respect to $\succ$
is that monomial $c_a x^a$ with $c_a \ne 0$ such that $x^a \succ x^b$ for all other monomials
$x^b$ in the support of f. The monomial $x^a$ is called the initial monomial of f.
The initial ideal $\operatorname{in}_\succ(I)$ is the ideal generated by the initial monomials of all
polynomials in I.
basis of an ideal given a term order. This algorithm underlies the Gröbner basis functionality in modern computer algebra packages such as Macaulay2, SINGULAR,
Maple, Mathematica, etc.
Example A.35. An example of a reduced Gröbner basis with respect to the total
degree ordering was given in Chapter 7. Consider the ideal
$$I = \langle x^4 - y^2 - z^2,\ x^4 + x^2 + y^2 - 1 \rangle.$$
Using Macaulay2 [10] one can calculate a total degree reduced Gröbner basis of I
as follows:
Macaulay2, version 1.3
i1 : R = QQ[x,y,z];
i2 : I = ideal(x^4-y^2-z^2, x^4+x^2+y^2-1);
i3 : G = gens gb I
o3 = | x2+2y2+z2-1 4y4+4y2z2+z4-5y2-3z2+1 |
which says that a total degree Gröbner basis consists of the two polynomials
$$x^2 + 2y^2 + z^2 - 1 \quad \text{and} \quad 4y^4 + 4y^2 z^2 + z^4 - 5y^2 - 3z^2 + 1.$$
The reduced Gröbner basis of I would have the property that no initial term of an
element is divisible by the initial term of another element and that all initial terms
have unit coefficients. Hence the reduced Gröbner basis of I is
$$\Big\{ x^2 + 2y^2 + z^2 - 1,\ \ y^4 + y^2 z^2 + \tfrac{1}{4}(z^4 - 5y^2 - 3z^2 + 1) \Big\}.$$
In particular, the initial ideal of I with respect to this term order is $\langle x^2, y^4 \rangle$. Check
that both elements in the Gröbner basis lie in the ideal I.
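One way to carry out this check is to write each Gröbner basis element explicitly as a polynomial combination of the two generators. The labels $f_1$ and $f_2$ for the generators are introduced here only for convenience.

```latex
\begin{aligned}
f_1 &= x^4 - y^2 - z^2, \qquad f_2 = x^4 + x^2 + y^2 - 1,\\
x^2 + 2y^2 + z^2 - 1 &= f_2 - f_1,\\
4y^4 + 4y^2 z^2 + z^4 - 5y^2 - 3z^2 + 1 &= f_1 - (x^2 - 2y^2 - z^2 + 1)(f_2 - f_1).
\end{aligned}
```

The last identity follows from the difference of squares $(x^2 - w)(x^2 + w) = x^4 - w^2$ with $w = 2y^2 + z^2 - 1$, which gives $f_1 - (x^4 - w^2) = w^2 - y^2 - z^2$; expanding $w^2$ yields the stated polynomial.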
Gröbner bases enable a multitude of computations with ideals such as checking whether a polynomial lies in an ideal (ideal membership), checking whether an
ideal equals the whole ring, finding all roots of a system of polynomial equations,
finding the intersection of two ideals, etc. Ideal membership relies on a multivariate
division algorithm that computes the remainder (called a normal form) of a polynomial f with respect to a Gröbner basis. A polynomial f lies in I if and only if
the normal form of f (with respect to any reduced Gröbner basis of I) is zero. This
in turn relies on the fact that the normal form of a polynomial with respect to a
reduced Gröbner basis of I is unique.
Example A.36. The normal form of the monomial $x^2 y$ with respect to the Gröbner
basis in Example A.35 is obtained by successively dividing out the initial monomial
$\operatorname{in}_\succ(g)$ of an element $g := \operatorname{in}_\succ(g) - g'$ in the reduced Gröbner basis from $x^2 y$ and
multiplying with $g'$. Let $g_1 := x^2 + 2y^2 + z^2 - 1$ and $g_2 := y^4 + y^2 z^2 + \tfrac{1}{4}(z^4 - 5y^2 - 3z^2 + 1)$.
Then $x^2 y$ can be divided by $g_1$ to give $-2y^3 - yz^2 + y$. The resulting initial
term $-2y^3$ cannot be divided by either $\operatorname{in}_\succ(g_1)$ or $\operatorname{in}_\succ(g_2)$, which implies that the
normal form of $x^2 y$ is $-2y^3 - yz^2 + y$.
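The single division step in this example can be written out explicitly. The display below is a sketch of that step under the convention $g = \operatorname{in}_\succ(g) - g'$ used above.

```latex
\begin{aligned}
g_1 &= x^2 + 2y^2 + z^2 - 1, \qquad \operatorname{in}_\succ(g_1) = x^2,
  \qquad g_1' = -2y^2 - z^2 + 1,\\
x^2 y &= y \cdot \operatorname{in}_\succ(g_1)
  \;\longrightarrow\; y \cdot g_1' = -2y^3 - yz^2 + y,\\
x^2 y &= y \cdot g_1 + \left(-2y^3 - yz^2 + y\right).
\end{aligned}
```

The last line exhibits $x^2 y$ as a multiple of $g_1$ plus a remainder, none of whose monomials is divisible by $x^2$ or $y^4$, so the division stops there.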
Given an ideal I in a polynomial ring k[x], one can compute the quotient
ring k[x]/I which consists of all equivalence classes of polynomials in k[x] mod the
ideal I. Given two polynomials $f, g \in k[x]$, f is equivalent to g mod I if $f - g \in I$.
This is denoted as $f \cong g \bmod I$, and the equivalence class of f mod I is denoted as
f + I. This notion is a generalization of the familiar modular arithmetic in the ring
of integers, where we say that $z, z' \in \mathbb{Z}$ are equivalent mod a fixed integer p if $z - z'$
is an integer multiple of p. In this case the ideal I (in the ring of integers $\mathbb{Z}$) is the
ideal generated by p, namely the set consisting of all integer multiples of p. If $f'$ is
the normal form of a polynomial f with respect to a reduced Gröbner basis of an
ideal I in k[x], then $f - f' \in I$ and hence $f \cong f' \bmod I$. Since the normal form of a
polynomial with respect to a reduced Gröbner basis is unique, if $f - g \in I$, then the
normal form of $f - g$ is zero, which implies that both f and g have the same normal
form. Hence every equivalence class of polynomials mod I can be represented by
the unique normal form of all the elements in that class with respect to a fixed
reduced Gröbner basis of I.
Example A.37. In Example A.35, the equivalence class of $x^2 y$ mod I consists of
all polynomials $g \in \mathbb{Q}[x, y, z]$ such that $x^2 y - g \in I$. In other words, $x^2 y + I$ is the
set of all $g \in \mathbb{Q}[x, y, z]$ with normal form $-2y^3 - yz^2 + y$ with respect to the reduced
Gröbner basis
$$\Big\{ g_1 := x^2 + 2y^2 + z^2 - 1,\ \ g_2 := y^4 + y^2 z^2 + \tfrac{1}{4}(z^4 - 5y^2 - 3z^2 + 1) \Big\}.$$
The quotient ring k[x]/I is a k-vector space. Addition in the ring is defined
as $(f + I) + (g + I) = (f + g) + I$ and scalar multiplication as $\lambda(f + I) = \lambda f + I$ for
all $\lambda \in k$. A primary use of Gröbner bases is that they provide a vector space basis
for k[x]/I in the following sense. Fix a term order $\succ$ on k[x] and consider the initial
ideal $\operatorname{in}_\succ(I)$ of the ideal I. Recall that this initial ideal is generated by monomials.
The monomials in k[x] that do not lie in $\operatorname{in}_\succ(I)$ are called the standard monomials
of $\operatorname{in}_\succ(I)$. The equivalence classes m + I as m varies over the standard monomials
of $\operatorname{in}_\succ(I)$ form a vector space basis of k[x]/I. Buchberger's algorithm for Gröbner
bases was motivated by the quest to find vector space bases for k[x]/I. It is easy
to see why the equivalence classes of standard monomials provide a vector space
basis for k[x]/I. We saw earlier that once a term order is fixed, every equivalence
class f + I has a unique representative $f' + I$, where $f'$ is the normal form of f
with respect to the reduced Gröbner basis $G_\succ$ of I corresponding to $\succ$. Note that
$f'$ cannot be divided by $\operatorname{in}_\succ(g)$ for any $g \in G_\succ$ and hence all its monomials are
standard with respect to $\operatorname{in}_\succ(I)$. This shows that the elements m + I span k[x]/I.
If a collection of them is linearly dependent, then there exist standard monomials
$m_1, \dots, m_t$ and scalars $\lambda_1, \dots, \lambda_t$, not all zero, such that $\sum \lambda_i (m_i + I) = 0 + I$, or equivalently,
$\sum \lambda_i m_i \in I$. However, if $\sum \lambda_i m_i \in I$, then its normal form with respect to $G_\succ$ is
zero, which implies that some $m_i$ is divisible by some $\operatorname{in}_\succ(g)$ for $g \in G_\succ$, which is a
contradiction.
Example A.38. The vector space $\mathbb{Q}[x, y, z]/I$ for the ideal in Example A.35 has
infinite dimension. The initial ideal of the total degree order used in this example
is $\operatorname{in}_\succ(I) = \langle x^2, y^4 \rangle$. Hence the standard monomials of this initial ideal are all
monomials in x, y, z that are divisible by neither $x^2$ nor $y^4$. There are infinitely many
such monomials since all powers of z are standard. Regardless, an infinite basis of
$\mathbb{Q}[x, y, z]/I$ consists of m + I as m varies over the standard monomials of $\operatorname{in}_\succ(I)$.
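The standard monomials in this example are easy to enumerate degree by degree: they are the monomials $x^a y^b z^c$ with a < 2 and b < 4. The short sketch below counts them by total degree. It takes the initial ideal generated by $x^2$ and $y^4$ from the example as given, and the function name is an illustrative choice.

```python
from itertools import product

def standard_monomials(degree):
    """Exponent triples (a, b, c) of x^a y^b z^c with total degree `degree`
    that are divisible by neither x^2 nor y^4, i.e., a < 2 and b < 4."""
    return [(a, b, degree - a - b)
            for a, b in product(range(2), range(4))
            if a + b <= degree]

# Number of standard monomials in each total degree 0, 1, ..., 7.
counts = [len(standard_monomials(d)) for d in range(8)]
```

The count grows for small degrees and then stays constant, since every admissible pair (a, b) can be completed by a power of z once the degree is at least 4. In particular the number of standard monomials of degree at most d grows without bound, in line with the infinite dimension of the quotient.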
A.4.3
Algebraic Varieties
An ideal I is radical if $I = \sqrt{I}$.
The radical $\sqrt{I}$ is an ideal and both I and $\sqrt{I}$ have the same affine variety.
Further, the vanishing ideal $\mathcal{I}(\mathcal{V}_k(I))$ is a radical ideal. The following theorem shows
that when k is an algebraically closed field, there is a bijection between radical ideals
in k[x] and affine varieties in $k^n$.
Theorem A.44 (Hilbert's strong Nullstellensatz). If k is an algebraically
closed field, then $\mathcal{I}(\mathcal{V}_k(I)) = \sqrt{I}$.
The following example points out the importance of k being algebraically
closed in the above Nullstellensatz.
Example A.45. The ideal $I = \langle x^2 + y^2 \rangle \subset \mathbb{C}[x, y]$ is radical. Its affine variety
in $\mathbb{R}^2$ is $\{(0, 0)\}$, whose vanishing ideal is $J = \langle x, y \rangle$ and $J \neq I$. On the other
hand, the affine variety of $I$ in $\mathbb{C}^2$ consists of the two lines $x = \pm iy$ whose vanishing
ideal is $I$.
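The two complex lines in Example A.45 can be read off from a factorization that is available over $\mathbb{C}$ but not over $\mathbb{R}$. The display below is a short worked step added for illustration:

```latex
\[
  x^2 + y^2 = (x + iy)(x - iy) \quad \text{in } \mathbb{C}[x, y],
  \qquad\text{so}\qquad
  V_{\mathbb{C}}(x^2 + y^2) = \{x = iy\} \cup \{x = -iy\}.
\]
```

Over $\mathbb{R}$ the polynomial $x^2 + y^2$ does not factor into linear forms, and the only real point on either line is the origin.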
There is a strong Nullstellensatz for projective varieties as well that has the
same statement. However, the weak Nullstellensatz, which characterizes empty
varieties, is stated differently for affine and projective varieties. We refer
the reader to [7, Chapter 8] for details.
Theorem A.46 (Hilbert's weak Nullstellensatz). Let $k$ be an algebraically
closed field.
1. If $I$ is an ideal in $k[x]$, then its affine variety $V_k(I) \subseteq k^n$ is empty if and only
if $I = k[x]$.
2. If $I$ is a homogeneous ideal in $k[x]$, then its projective variety in $\mathbb{P}^{n-1}_k$ is empty
if and only if for each $i = 1, \dots, n$, there is a monomial $x_i^{m_i} \in I$, where $m_i$ is
some nonnegative integer.
To end this subsection, we briefly discuss the notions of dimension, degree, and
singular points of an algebraic variety. These notions are too subtle to be explained
correctly here and we refer the reader to [7, Chapter 9]. Dimension and degree of
a variety can be computed from an algebraic entity called the Hilbert polynomial of
the vanishing ideal of the variety. A key feature of Gröbner basis theory is that an
ideal $I$ and all its initial ideals have the same Hilbert polynomial and the polynomial
has a combinatorial expression that can be computed from the standard monomials
of any of its initial ideals. Intuitively, the dimension of an ideal is the dimension of
the largest component of its affine variety. For instance, we expect a hypersurface in
$k^n$ to have dimension $n - 1$ since it is constrained by a single polynomial. However,
when the field is not algebraically closed, this intuition can be wrong. For instance,
$V_{\mathbb{R}}(x^2 + y^2) = \{(0, 0)\}$ is a zero-dimensional variety in $\mathbb{R}^2$ while $V_{\mathbb{C}}(x^2 + y^2)$ is a
one-dimensional variety in $\mathbb{C}^2$.
The degree of a variety is also defined from the Hilbert polynomial of the
vanishing ideal. Intuitively we expect that slicing an $r$-dimensional variety in $k^n$
with a generic plane of dimension $n - r$ through the origin would create finitely
many intersections. The number of intersection points should be constant if the
plane is generic enough and is intuitively the degree of the variety. For instance,
the parabola defined by $y - x^2$ has two points of intersection with a generic line
through the origin, which says that its degree is two, while the cubic curve $y = x^3$ cuts
out a variety of degree three.
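As a quick check of these two examples, one can substitute a line through the origin into each equation. The parametrization $y = tx$ with slope $t \neq 0$ is our own notation for a generic such line, not the text's:

```latex
\begin{align*}
  y = x^2,\ y = tx &\implies x\,(x - t) = 0 &&\implies x \in \{0,\ t\},\\
  y = x^3,\ y = tx &\implies x\,(x^2 - t) = 0 &&\implies x \in \{0,\ \sqrt{t},\ -\sqrt{t}\}.
\end{align*}
```

Over $\mathbb{C}$ with $t \neq 0$ these are two and three distinct points, matching the degrees two and three. Over $\mathbb{R}$ the cubic meets the line in three points only when $t > 0$, which is another instance of the count depending on the field being algebraically closed.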
A nonsingular (also called regular or smooth) point $p$ on a variety $W$ is a
point where the tangent space to $W$ at $p$ has the same dimension as the component
of $W$ containing $p$ and hence serves as a reasonable linear approximation to this
component near $p$. For a polynomial $f \in k[x]$, let
$\nabla(f) := \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right)$ be the
gradient of $f$ and $\nabla(f)(p) \in k^n$ be the evaluation of $\nabla(f)$ at $p \in k^n$. Since the
structure of a variety is unchanged by translation, we may assume without loss of
generality that $p = 0$. If $I(W) = \langle f_1, \dots, f_s \rangle$, then the tangent space of $W$ at $p$
is the null space of the matrix $J(0)$ whose rows are $\nabla(f_1)(0), \dots, \nabla(f_s)(0)$. The
matrix $J$ whose rows are the polynomials $\nabla(f_1), \dots, \nabla(f_s)$ is called the Jacobian
matrix of $f_1, \dots, f_s$. Thus the rank of $J(0)$ determines whether $0$ is singular on $W$
or not. In particular, $0$ is a singular point on a hypersurface $V_k(f)$ if and only if
$\nabla(f)(0) = 0$.
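The gradient test for hypersurfaces is easy to try on small examples. Below is a minimal sketch in plain Python, under the assumption that $f$ generates the vanishing ideal of its hypersurface. The dictionary representation of polynomials, the helper names, and the two sample curves (the nodal cubic $y^2 - x^3 - x^2$ and the parabola $y - x^2$) are our own choices for illustration.

```python
# A polynomial in n variables is a dict mapping exponent tuples to coefficients,
# e.g. y^2 - x^3 - x^2 in (x, y) is {(0, 2): 1, (3, 0): -1, (2, 0): -1}.


def partial(poly, i):
    """Partial derivative of poly with respect to the i-th variable."""
    out = {}
    for exps, coef in poly.items():
        if exps[i] > 0:
            new = list(exps)
            new[i] -= 1
            key = tuple(new)
            out[key] = out.get(key, 0) + coef * exps[i]
    return out


def evaluate(poly, point):
    """Evaluate poly at the given point."""
    total = 0
    for exps, coef in poly.items():
        term = coef
        for x, e in zip(point, exps):
            term *= x ** e
        total += term
    return total


def gradient_at(poly, point):
    """Gradient of poly evaluated at point, returned as a list."""
    return [evaluate(partial(poly, i), point) for i in range(len(point))]


def is_singular_on_hypersurface(poly, point):
    """True if point lies on V(poly) and the gradient of poly vanishes there."""
    on_variety = evaluate(poly, point) == 0
    return on_variety and all(g == 0 for g in gradient_at(poly, point))


nodal_cubic = {(0, 2): 1, (3, 0): -1, (2, 0): -1}  # y^2 - x^3 - x^2
parabola = {(0, 1): 1, (2, 0): -1}                 # y - x^2
```

With integer inputs the arithmetic is exact: the origin is flagged as singular on the nodal cubic, where both partial derivatives vanish, but not on the parabola, whose gradient at the origin is $(0, 1)$.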
A.4.4
A good deal of the algebraic geometry that appears in this book is over $\mathbb{R}$, which
is not an algebraically closed field. As a result, many of the theorems that apply
over $\mathbb{C}$ do not work in this setting, making the study of real varieties and their
ideals trickier than that of their complex counterparts. A good introduction to the
real algebraic geometry background needed in this book is [12]. We define a few of
the key concepts and results.
Definition A.47. A set $S \subseteq \mathbb{R}^n$ defined as $S = \{x \in \mathbb{R}^n : f_i(x) \triangleright_i 0,\ i =
1, \dots, t\}$, where, for each $i$, $\triangleright_i$ is one of $\geq, >, =, \neq$, and $f_i(x) \in \mathbb{R}[x]$, is called
a basic semialgebraic set. A basic closed semialgebraic set is a set of the form
$S = \{x \in \mathbb{R}^n : f_1(x) \geq 0, \dots, f_t(x) \geq 0\}$.
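Definition A.47 translates directly into a membership test. The sketch below is illustrative only: the encoding of a constraint as a pair (function, relation string), the name `in_basic_semialgebraic_set`, and the two sample sets are our own assumptions, and the comparisons are exact only for inputs where the arithmetic is exact (such as integers).

```python
import operator

# Each constraint is a pair (f, rel), read as "f(x) rel 0",
# with rel one of the four relations allowed in Definition A.47.
RELATIONS = {
    ">=": operator.ge,
    ">": operator.gt,
    "==": operator.eq,
    "!=": operator.ne,
}


def in_basic_semialgebraic_set(constraints, x):
    """Return True if the point x satisfies every constraint f(x) rel 0."""
    return all(RELATIONS[rel](f(x), 0) for f, rel in constraints)


# Basic closed semialgebraic set: the closed unit disk, 1 - x^2 - y^2 >= 0.
closed_disk = [(lambda p: 1 - p[0] ** 2 - p[1] ** 2, ">=")]

# Basic semialgebraic set that is not closed: the disk with the origin removed.
punctured_disk = closed_disk + [(lambda p: p[0] ** 2 + p[1] ** 2, "!=")]
```

For example, `in_basic_semialgebraic_set(closed_disk, (1, 0))` holds because the boundary belongs to the closed disk, while the origin is excluded from `punctured_disk` by the $\neq$ constraint.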
$\sqrt[\mathbb{R}]{I} := \{f \in \mathbb{R}[x] : -f^{2m} \in \Sigma + I \text{ for some nonnegative integer } m\}$.
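Recall that $\sqrt{I} \subseteq \mathbb{R}[x]$, the radical ideal of $I$ in $\mathbb{R}[x]$, is the largest ideal that
vanishes on the complex variety $V_{\mathbb{C}}(I)$. Therefore, since $V_{\mathbb{R}}(I) \subseteq V_{\mathbb{C}}(I)$, we have
that $I \subseteq \sqrt{I} \subseteq \sqrt[\mathbb{R}]{I}$.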
The Positivstellensatz also gives a simple solution to Hilbert's 17th problem,
which asked whether every nonnegative polynomial in $\mathbb{R}[x]$ can be written as a sum
of squares of rational functions in $x$. This was answered in the affirmative by Artin
in 1927. The two-variable case was shown by Hilbert in 1893.
Bibliography
[1] A. Barvinok. A remark on the rank of positive semidefinite matrices subject
to affine constraints. Discrete Comput. Geom., 25:23-31, 2001.
[2] A. Barvinok. A Course in Convexity, Grad. Stud. Math. 54. American Mathematical Society, Providence, RI, 2002.
[3] D. P. Bertsekas, A. Nedić, and A. E. Ozdaglar. Convex Analysis and Optimization. Athena Scientific, Belmont, MA, 2003.
[4] J. Bochnak, M. Coste, and M.-F. Roy. Real Algebraic Geometry. Springer,
Berlin, 1998.
[5] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University
Press, Cambridge, UK, 2004.
[6] R. W. Cottle. Manifestations of the Schur complement. Linear Algebra Appl.,
8:189-211, 1974.
[7] D. Cox, J. Little, and D. O'Shea. Ideals, Varieties and Algorithms. Springer-Verlag, New York, 1992.
[8] D. S. Dummit and R. M. Foote. Abstract Algebra. Prentice Hall Inc., Englewood
Cliffs, NJ, 1991.
[9] G. H. Golub and C. F. Van Loan. Matrix Computations, 3rd edition. Johns
Hopkins University Press, 1996.
[10] D. R. Grayson and M. E. Stillman. Macaulay 2, a software system for research
in algebraic geometry. Available at http://www.math.uiuc.edu/Macaulay2/.
[11] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press,
Cambridge, UK, 1995.
Index
A-discriminant, 217, 224
adjoint, 209
affine linear pencil, 353
algebraic boundary, 205, 207, 211, 224, 226
algebraic degree, 220
semidefinite programming, 236
algebraic interior, 255
algebraic set, 294
analytic center, 239
analytic polynomial, 357
Ando theorem, 439
Archimedean property, 115, 277
Archimedean semiring, 437
atoms, 39
Bergman kernel, 414
biduality, 209, 211, 217, 243
binary optimization, 28
bitangent line, 225
Blaschke product, 419
border vector, 371, 374
border vector-middle matrix, 371
bounded degree representation, 277
Putinar-Prestel, 278
Schmüdgen, 278
calibrated geometry, 323
Cayley's cubic surface, 232
Cayley-Bacharach relations, 174
characteristic polynomial, 447
characteristic vector, 330
Chebyshev inequality, 139
Cholesky, see decomposition
CHSY lemma, 380
clamped second fundamental form, 388
clamped tangent plane, 388, 390
closed loop system, 343
convex sum of squares, 271, 274
copositive matrix, see matrix
corner point, 265
correlation matrix, 209, 232, 234
curvature, 264
nonnegative, 264
positive, 264
cyclic forms, 134
cyclohexatope, 238
decomposition
$LDL^T$, 449
Cholesky, 449
eigenvalue, 449
defining polynomial, 255
dehomogenization, 211, 215
density matrix, 140
dimension free, 342, 344
directional derivative, 362, 365, 367
discriminant, 217
dissipative system, 344
domain of regularity, 358
dual cone; see cone/dual
dual optimization problem, 213
dual variety, 215, 216, 221
dual vector space, 209
duality, 203
projective, 207
semidefinite programming, 22
strong, 22
ellipsoid, 252
elliptope, 15, 232
epigraph, 451
Euclidean distance matrix, 37
face, 210
dual, 210
exposed, 210, 261
proper, 211
facet, 211
Farkas lemma, 111
Fock space, 363, 386
free
analysis, 341
convex algebraic geometry, 341
convexity, 342, 348
positivity, 341
probability, 341, 342, 348
real algebraic geometry, 341
semialgebraic set, 351
variables, 356
full rank point, 388
genus, 329
geometric theorem proving, 142
Gröbner basis, 94, 205, 216, 297
Gram matrix, 61, 379, 387
graph
perfect, 333
Petersen, 35, 337
triangle-free, 335
Grassmannian, 323
Grothendieck constant, 32
Hadamard product, 414
Hermite matrix, 49
Hermite theorem, 416
hermitian linear transform, 409
hermitian structure, 408
Hessian, 355, 376, 387
modified, 392
relaxed, 393
hierarchy of relaxations, 113, 297
Hilbert space factorization, 413
Hilbert space realization, 423
Hilbert's theorem, 162, 325
homogeneous linear pencil, 353
homogenization, 211
hyperbolic, 256
hyperboloid, 261
hyperplane rounding, 30
ideal, 107, 295
congruent mod, 297
initial, 297
Plücker, 323
principal, 323
real radical, 305
Stanley-Reisner, 334
vanishing, 305
independent set, 34
inequality
linear matrix; see linear matrix
inequality
inertia, 49,
inertia, law of, 410
inertia of a matrix, 449
inner product, 408
apolar, 67
Bombieri, 67
Fischer, 67
input space, 343
interpolation
analytic, 35
intervals, 86
involution, 352
irredundant, 265
Jacobian matrix, 215
$K_3$-cover subgraph problem, 337
k-ellipse, 17, 254
Karush-Kuhn-Tucker condition, 452
equations, 214
general form, 214
SDP, 206, 234
k-sos mod I, 296
Lagrange multiplier, 213
Lagrangian, 213, 452
Lasserre's method, 296
LDL decomposition, 359
leading principal minor, 447
Lifshitz-Krein theorem, 431
lift-and-project methods, 330
lifting vector, 261
line test, 257
linear matrix inequality, 7, 204, 252, 346, 396
monic, 252
linear pencil, 251, 258, 353, 396
affine, 353
homogeneous, 353
monic, 353, 396, 398
symmetric, 353
linear programming, 4, 293
linear system, 342, 343
LMI; see linear matrix inequality
localization, 252, 283
localizing matrix, 273
$N$th order, 273
Lovász theta function, 34
Lyapunov function, 25, 136
Markov inequality, 139
MATLAB, 300
matrix
completely positive, 131
copositive, 131, 270
Euclidean distance, 37
pseudo-moment, 315
reduced moment, 306
shifted reduced moment, 313
sum of squares, 87
matrix convex, 354
matrix factorizations, 448
matrix inequality, 346
matrix positive, 354
matrix-valued noncommutative polynomials, 352
maximum cut problem, 28, 335
middle matrix, 371, 374, 376
signature, 378
min-max principle, 411
Minkowski sum, 283
Minkowski-Weyl theorem, 5
moment curve, 123
moment matrix, 176, 272
moment spaces, 123
moments, 120, 251, 271
monomial basis, 67
Motzkin form, 162
natural map, 384
NCAlgebra, 366
NCSOStools, 366, 369
NCvars, 369
Nevanlinna-Pick theorem, 35, 427
Newton identities, 49, 50
Newton polytope, 91, 162
noncommutative
basic open semialgebraic set, 351
basic semialgebraic set, 382
convex, 354, 396
polynomial, 349, 352
positive, 354
rational expressions, 358
spectrahedron, 396
nonnegative polynomials
algebraic boundary, 172
boundary structure, 170
cone of, 161
dual cone, 168
exposed faces, 170
on a variety, 296
volume of, 187
nonsingular point, 265
norm
$L_p$, 212
atomic, 39
dual, 212
Frobenius, 15
nuclear, 15, 39
operator, 15
normal form, 297
normal space, 326
Nullstellensatz, 110
odd cycle, 337
odd wheel, 338
Ono inequality, 143
optimal value function, 207, 220, 224, 235
optimality conditions, 452
output space, 343
partial order, 7
Pataki inequalities, 236
phase shift, 430
polyhedron, 4
polynomial, 349, 352
analytic, 357
concave, 399
convex, 350, 354-356, 362, 368, 377, 396
evaluation, 353, 365
irreducible, 389
linear, 295
linear dependence, 387
noncommutative, 349, 352
positive, 354, 356, 362, 369
symmetric, 350, 352, 354, 356
trigonometric, 63
univariate, 86
vanishing, 354
polynomial identity, 354, 362-364
polynomial matrix inequality, 282
polynomial optimization
univariate, 76, 77
polynomial optimization, 76, 213
polytope, 4, 211
2-level, 318
k-level, 330
compressed, 319
triangle-free subgraph, 335
positive curvature, 387, 389, 391
positive definite kernel, 411
positive semidefinite, 204, 410, 448
positivity set, 396
Positivstellensatz, 112, 347, 348, 397
Schmüdgen, 115, 273, 321
Putinar, 115, 273
preorder, 107
truncated, 264
principal minor, 447
probability bounds, 139
projective space, 215
projective toric variety, 217
proper analytic maps, 440
protrusion, 205
Pólya theorem, 434
quadratic module, 107
truncated, 264
Quadratische Positivstellensatz, 372, 380, 386, 391
quantum
entanglement, 140
phenomena, 342
quasi-concave, 265
strictly, 265
Quillen theorem, 432
R<x>k , 352
R<x, x >, 356
rank minimization, 39
rational expressions, 358
equivalent, 359
rational function, 77, 359, 365
Bergman, 359
convex, 361, 368
linear dependence, 384
matrix, 359
noncommutative, 359
positive, 361
rational sos decompositions, 69
real Nullstellensatz, 305
real zero, 256
redundant, 265
regular point, 215
Riccati
matrix inequality, 345
polynomial, 357
Riesz-Fejér theorem, 422
Riesz-Herglotz theorem, 421
rigidly convex, 257
root separation, 417
S-lemma, 80
S-procedure, 80
Schönberg matrix, 238
Schur algorithm, 419
Schur complement, 253, 345
Schur theorem, 414
Schur inequality, 142
Schur-Agler class, 438
SDPT3, 41
second fundamental form, 387
clamped, 388
SeDuMi, 41, 301
semialgebraic set, 211, 220, 294, 350, 382
basic closed semialgebraic, 255, 265
basic open semialgebraic, 350
basic semialgebraic, 233
convex, 396
free, 351
noncommutative, 351
semidefinite programming, 7, 233, 293
abstract definition, 235
semidefinite relaxation
Putinar, 273
Schmüdgen, 274
semidefinite representation, 251, 294
separation theorem, 325
shadow area, 323
signal flow diagram, 343
signature of a matrix, 449
simplicial complex, 334
singular
locus, 215
point, 215, 265, 327
Slater's condition, 14, 275
sos-matrix, 87
spectrahedron, 8, 9, 15, 205, 231, 252, 396, 399
lifting, 261
projected, 9, 261, 294
spectral theorem, 409, 427
spectraplex, 15
stability number, 331
stable set, 34, 331
polytope, 331
problem, 331
standard monomials, 297
state space, 343
Steiner's quartic surface, 232
storage function, 345
sum of largest eigenvalues, 262
sum of largest singular values, 262
sum of squares, 57, 296, 342, 347, 355, 369, 387, 396, 397
convexity, 90
mod ideal, 296, 298
on quotient rings, 94
program, 73
sums of squares cone
algebraic boundary, 172
cone of, 161
dual cone, 176
semidefinite representation, 177
volume of, 192
symmetric affine linear pencil, 353
symmetric polynomial, 350, 352, 356
symmetric variables, 352
tangent plane
clamped, 388
tensor product, 353
Kronecker, 353
theta
body, 243, 297, 303
body of a graph, 332
number of a graph, 333
Toeplitz matrix, 420
trace, 7
triangle-free subgraph problem, 335
tritangent plane, 227
Trott curve, 225
truncated moment vector, 272
TV screen, 351, 366, 399
unitary transform, 409
valid constraint, 107
variables
classes, 357
free, 356
mixed, 357
variety, 388
compact, 320
noncommutative, 388
real, 295
real algebraic, 294
Varolin theorem, 440
Veronese surface, 224
von Neumann inequality, 425
YALMIP, 300
Zariski closure, 211
zero set, 388; see variety