100% found this document useful (1 vote)
348 views

Semidefinite Programming & Algebraic Geometry

This document is a table of contents for a book on convex algebraic geometry. It lists 9 chapters that discuss various topics in this area such as semidefinite optimization, polynomial optimization, sums of squares representations, convex hulls of algebraic sets, and free convexity. It also includes sections on background material and a list of contributors.
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
100% found this document useful (1 vote)
348 views

Semidefinite Programming & Algebraic Geometry

This document is a table of contents for a book on convex algebraic geometry. It lists 9 chapters that discuss various topics in this area such as semidefinite optimization, polynomial optimization, sums of squares representations, convex hulls of algebraic sets, and free convexity. It also includes sections on background material and a list of contributors.
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 492

i

main
2012/11/1
page v
i

Contents
List of Contributors

ix

List of Figures

xi

Preface

xv

List of Notation

xvii

What is Convex Algebraic Geometry?


Grigoriy Blekherman, Pablo A. Parrilo, and
Rekha R. Thomas

Semidenite Optimization
Pablo A. Parrilo
2.1
From Linear to Semidenite Optimization
2.2
Applications of Semidenite Optimization
2.3
Algorithms and Software . . . . . . . . . .
Bibliography . . . . . . . . . . . . . . . . . . . . .

.
.
.
.

.
.
.
.

Polynomial Optimization, Sums of Squares,


and Applications
Pablo A. Parrilo
3.1
Nonnegative Polynomials and Sums of Squares
3.2
Applications of Sum of Squares Programs . . .
3.3
Special Cases and Structure Exploitation . . .
3.4
Infeasibility Certicates . . . . . . . . . . . . .
3.5
Duality and Sums of Squares . . . . . . . . . .
3.6
Further Sum of Squares Applications . . . . .
3.7
Software Implementations . . . . . . . . . . .
Bibliography . . . . . . . . . . . . . . . . . . . . . . .

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

3
25
41
43

47
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

48
76
86
106
117
131
148
149

Nonnegative Polynomials and Sums of Squares


159
Grigoriy Blekherman
4.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
4.2
A Deeper Look . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
v

i
i

vi

Contents
4.3
The Hypercube Example . . . . . . . . . . . .
4.4
Symmetries, Dual Cones, and Facial Structure
4.5
Generalizing the Hypercube Example . . . . .
4.6
Dual Cone of n,2d . . . . . . . . . . . . . . .
4.7
Ranks of Extreme Rays of 3,6 and 4,4 . . .
4.8
Extracting Finite Point Sets . . . . . . . . . .
4.9
Volumes . . . . . . . . . . . . . . . . . . . . .
4.10
Convex Forms . . . . . . . . . . . . . . . . . .
Bibliography . . . . . . . . . . . . . . . . . . . . . . .

main
2012/11/1
page vi
i

Dualities
Philipp Rostalski and Bernd Sturmfels
5.1
Introduction . . . . . . . . . . . . . . . . . .
5.2
Ingredients . . . . . . . . . . . . . . . . . . .
5.3
The Optimal Value Function . . . . . . . . .
5.4
An Algebraic View of Convex Hulls . . . . .
5.5
Spectrahedra and Semidenite Programming
5.6
Projected Spectrahedra . . . . . . . . . . . .
Bibliography . . . . . . . . . . . . . . . . . . . . . .
Semidenite Representability
Jiawang Nie
6.1
Introduction . . . . . . . . . . . . . . . . .
6.2
Spectrahedra . . . . . . . . . . . . . . . . .
6.3
Projected Spectrahedra . . . . . . . . . . .
6.4
Constructing Semidenite Representations
Bibliography . . . . . . . . . . . . . . . . . . . . .
Convex Hulls of Algebraic Sets
Jo
ao Gouveia and Rekha R. Thomas
7.1
Introduction . . . . . . . . . . . . .
7.2
The Method . . . . . . . . . . . . .
7.3
Convergence of Theta Bodies . . . .
7.4
Combinatorial Optimization . . . .
Bibliography . . . . . . . . . . . . . . . . .

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

163
167
172
176
182
184
185
195
200
203

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

203
209
219
224
231
239
247
251

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

251
252
261
271
289
293

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

Free Convexity
J. William Helton, Igor Klep, and Scott McCullough
8.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . .
8.2
Basics of Noncommutative Polynomials and Their Convexity
8.3
Computer Algebra Support . . . . . . . . . . . . . . . . . . .
8.4
A Gram-like Representation . . . . . . . . . . . . . . . . . .
8.5
Der QuadratischePositivstellensatz . . . . . . . . . . . . . .
8.6
Noncommutative Varieties with Positive Curvature Have
Degree 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8.7
Convex Semialgebraic Noncommutative Sets . . . . . . . . .

.
.
.
.
.

.
.
.
.
.

293
295
317
330
338
341

.
.
.
.
.

.
.
.
.
.

341
349
366
370
380

. . 387
. . 396

i
i

Contents

main
2012/11/1
page vii
i

vii

8.8
From Free Real Algebraic Geometry to the Real World . . . . . 400
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
9

Sums of Hermitian Squares: Old and New


Mihai Putinar
9.1
Introduction . . . . . . . . . . . . . . . . . .
9.2
Hermitian Forms and Sums of Squares . . .
9.3
Positive Denite Kernels . . . . . . . . . . .
9.4
Origins of Hermitian Forms . . . . . . . . .
9.5
Schurs Algorithm . . . . . . . . . . . . . . .
9.6
RieszHerglotz Theorem . . . . . . . . . . .
9.7
von Neumanns Inequality . . . . . . . . . .
9.8
Bounded Analytic Interpolation . . . . . . .
9.9
Perturbations of Self-Adjoint Matrices . . .
9.10
Positive Forms in Several Complex Variables
9.11
Semirings of Hermitian Squares . . . . . . .
9.12
Multivariable Miscellanea . . . . . . . . . . .
9.13
Hermitian Squares in the Free -Algebra . .
9.14
Further Reading . . . . . . . . . . . . . . . .
Bibliography . . . . . . . . . . . . . . . . . . . . . .
Background Material
Grigoriy Blekherman, Pablo A. Parrilo, and
Rekha R. Thomas
A.1
Matrices and Quadratic Forms . . . . . . . .
A.2
Convex Optimization . . . . . . . . . . . . .
A.3
Convex Geometry . . . . . . . . . . . . . . .
A.4
Algebra of Polynomials and Ideals . . . . . .
Bibliography . . . . . . . . . . . . . . . . . . . . . .

Index

407
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

407
408
411
416
419
421
425
427
429
431
435
438
443
444
444
447

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

447
450
453
459
468
471

i
i

main
2012/11/1
page viii
i

main
2012/11/1
page ix
i

List of Contributors
Grigoriy Blekherman
Georgia Institute of Technology

Pablo A. Parrilo
Massachusetts Institute of Technology

Jo
ao Gouveia
University of Coimbra

Mihai Putinar
University of California, Santa
Barbara
Philipp Rostalski
University of Frankfurt and
Dr
agerwerk AG & Co. KGaA, L
ubeck

William Helton
University of California, San Diego
Igor Klep
The University of Auckland

Bernd Sturmfels
University of California, Berkeley

Scott McCullough
University of Florida
Jiawang Nie
University of California, San Diego

Rekha Thomas
University of Washington

ix

i
i

main
2012/11/1
page x
i

main
2012/11/1
page xi
i

List of Figures
2.1
2.2
2.3
2.4
2.5
2.6
2.7
2.8
3.1
3.2
3.3
3.4
3.5
3.6
3.7
3.8
3.9
3.10
3.11
3.12
3.13

5.1
5.2

Feasible sets of the primal and dual LP problems (2.1) and (2.2). .
The shaded set is a spectrahedron, with a semidenite representation given by (2.4). . . . . . . . . . . . . . . . . . . . . . . . . . . .
A projected spectrahedron dened by (2.6). . . . . . . . . . . . . .
A spectrahedron and its projection. . . . . . . . . . . . . . . . . .
Feasible set of the primal SDP problem (2.7). . . . . . . . . . . . .
Unit balls of the spectral norm and the nuclear norm, for the space
of 2 2 symmetric matrices. . . . . . . . . . . . . . . . . . . . . .
A 3-ellipse, a 4-ellipse, and a 5-ellipse, each with its foci. . . . . .
Petersen graph. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
The discriminant Disx (p). . . . . . . . . . . . . . . . . . . . . . . .
The zero set of the discriminant of the polynomial x4 + 4ax3 +
6bx2 + 4cx + 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
A three-dimensional convex set. . . . . . . . . . . . . . . . . . . .
Relationships between set classes. . . . . . . . . . . . . . . . . . .
Convex hulls of the graphs of cubic polynomials on an interval. . .
Projection of a rounded solution. . . . . . . . . . . . . . . . . . . .
The boundary of the domain of stability is dened by f(a, b) = 0.
Newton polytope of the polynomial 5 xy x2 y 2 + 3y 2 + x4 . . .
2
The polynomials p = 10 x2 y and (3 y6 )2 + 35
36 y take exactly
2
2
the same values on the unit circle x + y = 1. . . . . . . . . . . .
Set of valid moments (1 , 2 , 3 ) of a probability measure supported on [1, 1]. . . . . . . . . . . . . . . . . . . . . . . . . . . . .

of valid moments onto (, ) = (42 +


Projection of the set Pn,2d
24 , 22 ). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
The convex cone of 5 5 circulant copositive matrices (3.44) and
its inner sos approximation K0 . . . . . . . . . . . . . . . . . . . . .
Trajectories of the nonlinear dynamical system (3.46) and level sets
of a Lyapunov function found using sos techniques. . . . . . . . . .

5
8
10
10
12
17
18
35
53
54
57
58
69
71
82
92
96
124
129
135
138

The cube is dual to the octahedron. . . . . . . . . . . . . . . . . . 204


A three-dimensional spectrahedron P and its dual convex
body P . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
xi

i
i

xii

main
2012/11/1
page xii
i

List of Figures
5.3

5.4
5.5
5.6
5.7
5.8
5.9

5.10
5.11
6.1
6.2
6.3
6.4
6.5
6.6
6.7
7.1
7.2
7.3
7.4
7.5
7.6
7.7
7.8

7.9
7.10

7.11
7.12
7.13

The unit balls for the L4 -norm and the L4/3 -norm are dual. The
curve on the left has degree 4, while its dual curve on the right has
degree 12. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
The bicuspid curve in Example 5.25. . . . . . . . . . . . . . . . . .
A quartic curve in the plane can have up to 28 real bitangents. . .
The convex hull of the curve (cos(), cos(2), sin(3)) in R3 . . . .
The curve on the unit sphere discussed in Examples 5.37 and
5.61. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
The elliptope P = E3 and its dual convex body P . . . . . . . . .
The discriminant in Example 5.59 denes a curve in the (a, b)plane. The projected spectrahedron C is the set of points where
the ternary quartic fa,b is sos. The ranks of the corresponding sos
matrices Q are indicated. . . . . . . . . . . . . . . . . . . . . . . .
Convex hull as intersection of half spaces. . . . . . . . . . . . . . .
Convex hull of the curve in Figure 5.7 and its dual convex body.
The TV screen {(x1 , x2 ) : x41 + x42 1}. . . . . . . . . . . . . .
A line passing through (0.5, 0) intersects the curve x31 3x22 x1
(x21 + x22 )2 = 0 in only 2 real points. . . . . . . . . . . . . . . .
The shaded area is the union of T1 and T2 in Example 6.16. . .
The semialgebraic set of Example 6.18. . . . . . . . . . . . . .
Projected spectrahedron dened in Example 6.19. . . . . . . .
The convex set dened by x21 + x22 x41 + x21 x22 + x42 . . . . . . .
The convex set in Example 6.40. . . . . . . . . . . . . . . . . .

. .

. .
. .
. .
. .
. .
. .

The lemniscate x4 + y 4 + 2x2 y 2 x2 + y 2 = 0 with a bitangent. .


A bicorn curve. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
The spectrahedra {y RB2k : y0 = 1, MBk (y)  0} for k = 1, 2, 3
for I = (x + 1)x(x 1)2  and their projections to the y1 -axis. . .
The variety of Example 7.11 and its rst theta body. . . . . . . . .
The second theta body from Example 7.11. . . . . . . . . . . . . .
The second Lasserre relaxation for Example 7.11. . . . . . . . . . .
Sum of squares approximation to the half-lemniscate of Gerono. .
In the darker color we see TH2 (p), while in the lighter color we
see the strengthening Q as dened in Example 7.19. In black we
see the variety itself. . . . . . . . . . . . . . . . . . . . . . . . . . .
Lemniscate of Gerono. . . . . . . . . . . . . . . . . . . . . . . . . .
The top row contains all 0/1 three-dimensional 2-level polytopes
(up to ane equivalence). The bottom row contains all 0/1 threedimensional polytopes (up to ane equivalence) that are not
2-level. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Cusp and its convex hull. . . . . . . . . . . . . . . . . . . . . . . .
Strophoid curve and its convex hull. . . . . . . . . . . . . . . . . .
Scarabaeus curve and its third theta body. . . . . . . . . . . . . .

212
223
225
229
230
232

242
243
245
257
258
266
268
269
281
283
299
302
310
311
311
312
314

316
316

319
320
321
324

i
i

List of Figures
7.14

7.15
7.16
7.17
7.18
7.19

On the left we see the cardioid p(x) = 0 and its convex hull. On
the right we see the graph of p, its intersection with the plane z = 0
and the ellipsoidal region where the graph and the boundary of its
convex hull dier. . . . . . . . . . . . . . . . . . . . . . . . . . . .
Graph of the polynomial x x2 x3 + x4 , its convex hull, and
intersection with the x-axis. . . . . . . . . . . . . . . . . . . . . . .
TH2 (I), TH3 (I), TH4 (I), and TH5 (I): all contain the origin in
their interior. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
The curved eight variety and its convex hull. . . . . . . . . . . . .
Serpentine curve and the closure of its convex hull. . . . . . . . . .
5-wheel, partial 5-wheel, and Petersen graph. . . . . . . . . . . . .

main
2012/11/1
page xiii
i

xiii

325
326
327
328
329
337

i
i

main
2012/11/1
page xiv
i

main
2012/11/1
page xv
i

Preface
In the past decade there has been a surge of interest in algebraic approaches to
optimization problems dened in terms of multivariate polynomials. Fundamental
mathematical challenges that arise in this program include understanding the structure of nonnegative polynomials, the interplay between eciency and complexity
of dierent representations of algebraic sets, and the development of eective algorithms. Remarkably, and perhaps unexpectedly, convexity provides a new viewpoint
and a powerful framework for addressing these questions. This naturally brings us
to the intersection of algebraic geometry, optimization, and convex geometry, with
an emphasis on algorithms and computation. This emerging area has become known
as convex algebraic geometry.
Our aim is to provide an accessible and unifying introduction to the many
facets of this fast-growing interdisciplinary area. Each chapter addresses a fundamental aspect of convex algebraic geometry, ranging from the well-established
core mathematical theory to the forefront of current research and open questions.
Throughout we showcase the rich interactions between theory and applications.
This book is suitable as a textbook in a graduate course in mathematics and
engineering. The chapters make connections to several areas of pure and applied
mathematics and contain exercises at many levels, providing multiple entry points
for readers with varied backgrounds.
We thank the National Science Foundation for funding a Focused Research
Group grant (20082011) awarded to Bill Helton, Jiawang Nie, Pablo A. Parrilo,
Bernd Sturmfels, and Rekha R. Thomas. This award enabled a urry of research
activity in semidenite optimization and convex algebraic geometry. Several workshops and conferences were organized under this grants support. In particular this
book was inspired by the lectures at the workshop LMIPO organized by Bill Helton
and Jiawang Nie at the University of California, San Diego in March 2010.
We thank all our contributors for their hard work and perseverance through
multiple rounds of edits. We also thank Tom Liebling, Sara Murphy, and Ann
Manning Allen at SIAM for their support and patience with the production of this
book. Special thanks to our students and colleagues who read versions of this book
and sent us comments, in particular Chris Aholt, Hamza Fawzi, Fabiana Ferracina,
Alexander Fuchs, Chris Jordan-Squire, Frank Permenter, James Pfeier, Stefan

xv

i
i

xvi

main
2012/11/1
page xvi
i

Preface

Richter, Richard Robinson, Raman Sanyal, James Saunderson, Rainer Sinn, and
Thao Vuong.
Greg Blekherman1
Atlanta, GA

Pablo A. Parrilo2
Cambridge, MA

Rekha R. Thomas3
Seattle, WA

1 The work of Greg Blekherman was supported by a Sloan Fellowship, NSF grant DMS-0757212,
the Mittag-Leer Institute Sweden, and IPAM UCLA.
2 The work of Pablo A. Parrilo was supported by NSF grant DMS-0757207 and a Finmeccanica
Career Development Chair.
3 The work of Rekha R. Thomas was supported by NSF grants DMS-0757371 and DMS-1115293
and a Robert R. and Elaine F. Phelps Endowed Professorship.

i
i

main
2012/11/1
page xvii
i

List of Notation
Basics:
elds, rings
nonnegative integers
nonnegative orthant
positive orthant
standard simplex in Rn+
standard basis vectors

R, C, P, Q, Z
N
Rn+
Rn++

n := {x Rn+ :
xi = 1}
ei

Matrices:
m n matrices
matrix brackets
n n symmetric matrices
n n positive semidenite denite matrices
n n positive denite matrices
inner product in S n
matrix multiplication
trace
matrix transpose
determinant
rank
diagonal of a matrix M as a vector
diagonal matrix obtained from a matrix M
lower triangular matrix from matrix M
turning a vector v into a diagonal matrix
block diagonal matrix with blocks A, B etc
positive semidenite
positive denite
max/min eigen/singular value

Rmn
[]
Sn
n
S+
n
S++
A, B
AB
Tr
AT
det M
rank M
diag(M )
Diag(M )
Tril(M )
Diag(v)
BlockDiag(A, B, ...)
0
0
max , min

Geometry:
p-norm
ball with center u, radius r
vector space dual
orthogonal complement of vector space
dimension

u
p
B(u, r)
V
V
dim V
xvii

i
i

xviii

List of Notation

codimension
cone dual
polar dual of convex body
dual face to an exposed face
dual variety
interior of a set
boundary of set
algebraic boundary
closure of set
convex hull of set C
conical hull of set C
gauge function of a convex body K

codim V
C
P
F
X
int(C)
C
a C
cl(C) or C
conv(C)
cone(C)
GK (x)

Optimization:
optimal solution
semidenite program
kth theta body of ideal I
characteristic vector of a set S

u
SDP
THk (I)
S

Algebra:
ideal generated by
variety of ideal
vanishing ideal of a set
Jacobian
gradient
Hessian
singular locus
smooth points in a variety
polynomial ring in n variables
polynomials in n variables, degree at most d
if n clear
monomials of degree at most d
Nn (for exponents of monomials)
nonnegative polynomials in n variables, degree
at most 2d
if n is clear
sum of squares in n variables of degree at
most 2d
if n is clear
forms in n variables, degree equal to d
if n clear
monomials of degree d
nonnegative forms in n variables, degree 2d
if n is clear
sos forms in n variables of degree 2d
if n is clear

main
2012/11/1
page xviii
i

f1 , . . . , fm 
VR (I), VC (I)
I(S)
Jac( )

2
Sing( )
Xreg
R[x], C[x]
R[x]n,d
R[x]d
[x]d 
|| = i
Pn,2d
P2d
n,2d
2d
R[x]n,d
R[x]d
[x]d
Pn,2d
P2d
n,2d
2d

i
i

List of Notation
sos polynomials mod an ideal I
polynomials in R[x]n,d that are k-sos mod I
if n is clear
ane linear polynomials in above
Newton polytope of f
linear functionals on R[x]
linear functionals that are evaluations at v
quadratic forms on R[x]n,d
nonnegative quadratic forms in S n,d
preorder of g1 , . . . , gm /truncated
quadratic module of g1 , . . . , gm /truncated

main
2012/11/1
page xix
i

xix
(I)
kn,d (I)
kd (I)
k1 (I)
N (f )

v
S n,d
n,d
S+
preorder(g1 , . . . , gm ),
preorderk (g1 , . . . , gm )
qmodule(g1 , . . . , gm ),
qmodulek (g1 , . . . , gm )

i
i

main
2012/11/1
page xx
i

main
2012/11/1
page 1
i

Chapter 1

What is Convex
Algebraic Geometry?

Grigoriy Blekherman, Pablo A. Parrilo, and


Rekha R. Thomas

Convex algebraic geometry is an evolving subject area arising from a synthesis of


ideas and techniques from optimization, convex geometry, and algebraic geometry.
The central objects of study in this rapidly developing eld are convex sets with
algebraic structure. Such sets occur naturally, and have been analyzed independently, in convex geometry, real algebraic geometry, optimization, and analysis, but
only recently has a unied perspective that systematically takes advantage of the
interactions between algebra and convexity emerged. This viewpoint provides rich
connections across the mathematical sciences and novel tools for applied mathematics and engineering. This book presents the foundations of convex algebraic
geometry and provides an accessible entry point for students and researchers.
A fundamental class of algebraically dened convex sets arises from intersections of the cone of positive semidenite matrices with ane subspaces. These sets
are called spectrahedra and are automatically convex and endowed with rich algebraic structure. The problem of optimizing a linear function over a spectrahedron
is called semidenite programming. Such problems admit ecient algorithms, enable many applications, and have been studied extensively in the past few decades.
These basic concepts are introduced in Chapter 2.
The structure of nonnegative polynomials is a central theme in polynomial
optimization and real algebraic geometry. A classical question is the existence of
a representation that makes the nonnegativity of a polynomial apparent. Such
representations naturally involve sums of squares and provide certicates for nonnegativity. In addition to classical existence questions, convex algebraic geometry
is concerned with constructive aspects and ecient computation. Semidenite optimization is the algorithmic engine behind the eective computation of sums of
1

i
i

main
2012/11/1
page 2
i

Chapter 1. What is Convex Algebraic Geometry?

squares certicates. Chapter 3 provides a gentle introduction to these techniques.


The underlying geometric aspects of nonnegative and sum of squares polynomials
are then analyzed in detail in Chapter 4.
Chapter 5 presents a unied viewpoint of duality, a powerful and recurring
theme across algebraic geometry, convexity, and optimization. As such, it naturally
plays a central role in convex algebraic geometry. The philosophy of this chapter is
that the dierent notions of duality become nearly identical when applied to convex
sets with algebraic structure.
A natural question in optimization is to determine what problems can be modeled as semidenite programs, which translates into the problem of representing or
eciently approximating convex sets as spectrahedra or their projections. These
questions are addressed in Chapter 6. A particularly nice yet important and challenging class of sets to represent and approximate are convex hulls of real varieties.
This is the subject of Chapter 7. Sums of squares provide a universal approach
to the above representability questions, although a full picture, particularly with
regard to eciency issues, is still elusive.
Sums of squares are also prominent in noncommutative and Hermitian contexts. Nonnegativity is a much more rigid property in the noncommutative setting,
and thus some parts of the classical commutative theory become more elegant and
structured. Chapter 8 oers a friendly tour through noncommutative convexity and
nonnegativity. The Hermitian case is motivated by fundamental questions in operator theory and complex analysis, and analytic considerations oer new insights
and methods. This is the topic of Chapter 9. Both of these areas have deep roots
in classical mathematics and strong connections to engineering applications.
Besides these central themes, convex algebraic geometry oers fertile ground
for synergies with other areas such as representation theory, computational complexity, combinatorics, harmonic analysis, and probability theory. These interactions
provide exciting opportunities for theoretical developments, computational methods, and practical applications, as can be witnessed by the growing literature.
The dierent chapters in this book are interwoven by many recurring themes
and common ideas. However they can also be read independently by a reader who
is interested in a specic topic. Chapters 2 and 3 introduce the reader to the core
ideas and techniques in the book. The following chapters delve deeper into their
own topics while also presenting applications and links to the rest of the book.

i
i

main
2012/11/1
page 3
i

Chapter 2

Semidenite
Optimization

Pablo A. Parrilo

In this chapter we introduce one of the core theoretical and computational techniques in convex algebraic geometry, namely, semidenite optimization. We begin
by reviewing linear programming and proceed to dene and discuss semidenite programs from the algebraic, geometric, and computational perspectives. We dene
spectrahedra as the feasible sets of semidenite programs, study their properties,
and discuss numerous examples. Despite the many parallels, the duality theory
of semidenite optimization is more complicated than in the case of linear programming, and we elaborate on the similarities and dierences. We also showcase
a number of applications of semidenite optimization in several areas of applied
mathematics and engineering and give a short discussion of algorithmic and software aspects. For the convenience of the reader, we present additional background
material on convex geometry and optimization in Appendix A.

2.1

From Linear to Semidenite Optimization

Semidenite optimization is a branch of convex optimization that is of great theoretical and practical interest. Informally, the main idea is to generalize linear
programming and the associated feasible sets (polyhedra) to the case where the decision variables are symmetric matrices, and the inequalities are to be understood
as matrices being positive semidenite. Formal denitions and examples will be
presented shortly in Subsection 2.1.2, preceded by a review of the familiar case of
linear programming. A few selected standard references for linear programming and
their applications are the books [5, 12, 29, 42].

i
i

2.1.1

main
2012/11/1
page 4
i

Chapter 2. Semidenite Optimization

Linear Programming

Linear programming is the problem of minimizing a linear function subject to linear constraints. A linear programming problem (LP) in standard form is usually
written as
minimize
cT x
(LP-P)
subject to
Ax = b,
x 0,
where A Rmn , b Rm , and we are minimizing over the decision variable x Rn .
The inequality x 0 is interpreted componentwise, i.e., xi 0 for i = 1, . . . , n.
Geometrically, an LP problem has a nice and natural interpretation. Its feasible set is the intersection of an ane subspace (dened by the equations Ax = b),
and the nonnegative orthant. Since it is the intersection of two convex sets, the feasible set of (LP-P) is always convex. In general, a set dened by nitely many linear
inequalities or equations is called a polyhedron, and it is always convex. Thus, linear
programming corresponds exactly to the minimization of a linear function over a
polyhedron. If a polyhedron is bounded, it is called a polytope.
Perhaps one of the most remarkable and useful features of linear programming
is that to every LP problem we can associate a corresponding dual problem. This
is another LP problem (its dual LP), which for the case of (LP-P) is
maximize bT y

(LP-D)

subject to AT y c.

Notice that here we are again optimizing a linear function over a polyhedron. As
we will see, there are very natural and direct algebraic relationships between the
primal problem (LP-P) and its dual problem (LP-D).
Remark 2.1. In practice, LP problems may not naturally present themselves in
the form (LP-P), where all the decision variables are nonnegative and only equality
constraints are present, or the form (LP-D), where there are no sign restrictions
on the variables and only inequalities appear. However, they can always be put in
either form, by introducing additional slack variables and/or splitting variables if
necessary. The details can be found in any textbook on linear programming.
Example 2.2. Consider the following LP problem:
minimize x1 8x2

subject to

x1 + 3x2 + x3
4x1 x2 + x4

x1 , x2 , x3 , x4

= 4,
= 6,
0.

(2.1)

The feasible region is a two-dimensional polyhedron. Its projection into the (x1 , x2 )plane is drawn in Figure 2.1. Notice that the optimal solution is achieved at a vertex,
namely, x = (2, 2, 0, 0), with optimal cost p = 14.

i
i

2.1. From Linear to Semidenite Optimization

main
2012/11/1
page 5
i

5
y2

x2

7

6

5

4

3

2

y1

1

2.0

1

1.5

2

1.0

3

0.5

4

0.5

1.0

1.5

2.0

x1

5

Figure 2.1. Feasible sets of the primal and dual LP problems (2.1) and (2.2).
The corresponding dual LP is

maximize 4y1 + 6y2

subject to

y1 + 4y2

3y1 y2
y1

y2

1,
8,
0,
0.

(2.2)

The dual feasible set (y1 , y2 ) is presented in the same gure, with optimal solution
31
5
y  = ( 11
, 11
) and optimal cost d = 14. For this example we have
p = d = 14,
and thus the optimal values of the primal and dual problems are the same.
Even in this simple example, we can observe many of the important features
of linear programming. The following facts are well known.
Geometry of the feasible set: The feasible sets of linear programs are polyhedra. The geometry of polyhedra is quite well understood. In particular, the
Minkowski-Weyl theorem (e.g., Appendix A, [5], or [48, Section 1.1]) states
that every polyhedron P is nitely generated, i.e., it can be written as
P = conv(u1 , . . . , ur ) + cone(v1 , . . . , vs ),
where ui , vi are the vertices and extreme rays of P , respectively, and the
convex hull and conical hull are dened by

 r

r



conv(u1 , . . . , ur ) =
i ui
i = 1, i 0, i = 1, . . . , r

i=1

i=1

and
cone(v1 , . . . , vs ) =

s

i=1




i vi i 0,

i = 1, . . . , s .

i
i

main
2012/11/1
page 6
i

Chapter 2. Semidenite Optimization

Rational solutions: Unless the problem is unbounded, the optimal solution of


a linear programming problem is always achieved at extreme points of the
feasible set. Since these correspond to vertices of a polyhedron, the solution
can be characterized in terms of a system of linear equations, corresponding
to the equations and inequalities that are active at the optimal point. Thus, if
the problem description (i.e., the matrices A, b, c) is given by rational numbers,
there are always extreme points that are rational and achieve the optimal cost.
Weak duality: For any feasible solutions x, y of (LP-P) and (LP-D), respectively,
it always holds that
cT x bT y = xT c (Ax)T y = xT (c AT y) 0,

(2.3)

where the last inequality follows from the feasibility conditions x 0 and
AT y c. Thus, from any feasible dual solution one can obtain a lower bound
on the value of the primal. Conversely, primal feasible solutions give upper
bounds on the value of the dual.
Strong duality: If both primal and dual problems are feasible, then they achieve
exactly the same optimal value, and there exist optimal feasible solutions
x , y  such that cT x = bT y  . This is a consequence of the separation theorems for convex sets; see, e.g., Section A.3.3 in Appendix A.
Complementary slackness: Strong duality, combined with (2.3), implies that at
optimality we must have
xi (c AT y  )i = 0,

i = 1, . . . , n.

In other words, there is a correspondence between primal variables and dual


inequalities that says that whenever a primal variable is nonzero, the corresponding dual inequality must be tight.
In the linear programming case, these properties are well known and relatively
easy to prove. Interestingly, as we will see in the next section, some of these properties will break down as soon as we leave linear programming and go to the more
general case of semidenite programming. These technical aspects will cause some
minor diculties, although with the right assumptions in place, the resulting theory
will closely parallel the linear programming case.
Exercise 2.3. Consider a nite set of points S = {a1 , a2 , . . . , an } in Rd , where
n > d. Prove using linear programming duality that exactly one of the following
statements must hold:
The origin is in the convex hull of S.
There exists a hyperplane passing through the origin, such that all points ai
are strictly on one side of the hyperplane.

i
i

2.1. From Linear to Semidenite Optimization

main
2012/11/1
page 7
i

Exercise 2.4. Consider the set of n n matrices with nonnegative entries that
have all row and column sums equal to 1 (i.e., the doubly stochastic matrices).
1. Write explicitly the equations and inequalities describing this set for n =
2, 3, 4.
2. Compute (using CDD, lrs, or other software; see Section 2.3.2) all the extreme
points of these polytopes.
3. How many extreme points did you nd? What is the structure of the extreme
points? Can you conjecture what happens for arbitrary values of n?
4. Google BirkhoVon Neumann theorem, and check your guess.

2.1.2

Semidenite Programming

Semidenite programming is a broad generalization of linear programming, where


the decision variables are symmetric matrices. A semidenite programming problem
(SDP) corresponds to the optimization of a linear function subject to linear matrix inequality (LMI) constraints. Semidenite programs are convex optimization
problems and have very appealing numerical properties (e.g., [7, 44, 45]).
Our notation is as follows: the set of real symmetric n n matrices is denoted
by S n . A matrix A S n is positive semidenite if xT Ax 0 for all x Rn
and is positive denite if xT Ax > 0 for all nonzero x Rn . Equivalently, A is
positive semidenite if its eigenvalues i (A) satisfy i (A) 0, i = 1, . . . , n, and is
positive denite if i (A) > 0, i = 1, . . . , n. The set of n n positive semidenite
n
n
matrices is denoted S+
, and the set of positive denite matrices is denoted S++
.
n
is a proper cone (i.e., closed, convex, pointed, and solid).
As we will prove soon, S+
We use the inequality signs and  to denote the partial order induced by
n
(usually called the L
owner partial order); i.e., we write A  B if and only
S+
if A B
is positive semidenite. For a square matrix A, its trace is dened as
Tr(A) = i Aii . See Section A.1 for further characterizations and general properties
of positive semidenite matrices.
Spectrahedra. Recall that a polyhedron is a set dened by nitely many linear
inequalities and that feasible sets of LPs are polyhedra. Similarly, we dene spectrahedra as sets dened by nitely many LMIs. These sets will correspond exactly
to feasible sets of semidenite programming problems.
Denition 2.5. A linear matrix inequality (LMI) has the form
A0 +

m


Ai xi  0,

i=1

where Ai S n are given symmetric matrices.

i
i

main
2012/11/1
page 8
i

Chapter 2. Semidenite Optimization


y
3
2
1

-6

-5

-4

-3

-2

-1

-1
-2
-3

Figure 2.2. The shaded set is a spectrahedron, with a semidenite representation given by (2.4).
Denition 2.6. A set S Rm is a spectrahedron if it has the form


m

m
S = (x1 , . . . , xm ) R : A0 +
Ai xi  0 ,
i=1

for some given symmetric matrices A0 , A1 , . . . , Am S n .


Geometrically, a spectrahedron is dened by intersecting the positive semidefinite cone and an ane subspace (the span of A1 , . . . , Am , translated to A0 ). Spectrahedra are closed convex sets, since a matrix
inequality is equivalent to innitely
m
many scalar inequalities of the form v T (A0 + i=1 Ai xi )v 0, one for each value
n
of v R . Since it is always possible to bundle several matrix inequalities into a
single LMI (by choosing the matrices Ai to be block-diagonal), there is no loss of
generality in dening spectrahedra in terms of a single matrix inequality. In particular, this shows that polyhedra are a particular case of spectrahedra, corresponding
to all matrices Ai being diagonal.
Recall that the positive semideniteness of a matrix can be characterized in
terms of scalar inequalities on the coecients of its characteristic polynomial or its
principal minors (see Proposition A.1). Thus, one can obtain an explicit description
of a spectrahedron in terms of a nite collection of unquantied scalar polynomial
inequalities in the variables xi . In other words, spectrahedra are basic semialgebraic
sets, that are convex.
Example 2.7 (elliptic curve). Consider the spectrahedron in R2 given by

x+1
0
y

2
x 1  0 .
(x, y) R2 : A(x, y) := 0
(2.4)

y
x 1
2
This set is shown in Figure 2.2. To obtain scalar inequalities dening the set, let
pA (t) = det(tI A(x, y)) = t3 + p2 t2 + p1 t + p0 be the characteristic polynomial of

i
i

2.1. From Linear to Semidenite Optimization

main
2012/11/1
page 9
i

A(x, y). Positive semideniteness of A(x, y) is then equivalent to the conditions


p2 = x + 5 0,
p1 = x2 + 2x y 2 + 7 0,
p0 = 3 + x x3 3x2 2y 2 0.
It can be seen that this spectrahedron corresponds to the oval of the elliptic curve
3 + x x3 3x2 2y 2 = 0. Notice that the boundary of the set is given by the
determinant of the matrix inequality (why?), and the role of the other inequalities
is to cut down and isolate the relevant component.
As dened above, a spectrahedron S is a closed convex subset of the ane
standard usage, we will also use spectrahedron to denote
space Rm . Following
m
n
the set {A0 + i=1 Ai xi | x Rm } S+
. Notice that this is a convex set of matrices
instead of a subset of Rm , but if the matrices Ai are linearly independent, these
two convex sets are anely equivalent.
Projected spectrahedra. Also of interest are the linear projections of spectrahedra, which we will call projected spectrahedra:
Denition 2.8. A set S Rm is a projected spectrahedron if it has the form

p
m



Ai xi +
Bj y j  0 ,
S = (x1 , . . . , xm ) Rm : (y1 , . . . , yp ) Rp , A0 +

i=1

j=1

(2.5)
where A0 , A1 , . . . , Am , B1 , . . . , Bp are given symmetric matrices.
As the name indicates, geometrically this corresponds to a spectrahedron in
Rm+p that is projected under the linear map : Rm+p Rm , (x, y)  x. Since
spectrahedra are semialgebraic sets, by the TarskiSeidenberg theorem (Section
A.4.4 in Appendix A) projected spectrahedra are also semialgebraic. Thus, they
can be dened in terms of nite unions of sets dened by polynomial inequalities involving only the variables xi , although in practice it is not always easy or convenient
to do so.
Example 2.9. Consider the projected spectrahedron in R2 given by




z + y 2z x
2
(x, y) R : z R,
 0, z 1 .
2z x z y

(2.6)

This set is shown in Figure 2.3. It corresponds to the projection on R2 of the


spectrahedron in R3 dened by the intersection of a quadratic cone and a halfspace
(see Figure 2.4).
For any xed value of z, the set described by the 2 2 matrix inequality is a
disk of radius z centered at (2z, 0). Thus, this spectrahedron is the convex hull of
the disk of unit radius centered at (2, 0) and the origin.

i
i

10

main
2012/11/1
page 10
i

Chapter 2. Semidenite Optimization


y

1.0

0.5

0.5

1.0

1.5

2.0

2.5

3.0

0.5

1.0

Figure 2.3. A projected spectrahedron dened by (2.6).


y

1
1.5

1.0
z
0.5

0.0
0

2
x

Figure 2.4. A spectrahedron and its projection.


As we will see later in much more detail in Chapters 3 and 6, there are simple
examples of projected spectrahedra that are not spectrahedra (in fact, the set in
Example 2.9 is one such case). This is in strong contrast with the case of polyhedra,
for which we know (e.g., via FourierMotzkin elimination) that the linear projection
of a polyhedron is always a polyhedron. Thus, this is a key distinguishing feature
of semidenite programming, since by adding additional slack or lifting variables,
we can signicantly expand the expressibility of our class of sets.
Projected spectrahedra are very important for optimization. Indeed, by including the additional lifting variables yi , we will see that it is possible to reduce
a linear optimization problem over a projected spectrahedron to the solution of
a standard semidenite program. Furthermore, projected spectrahedra have very
high expressive power, in the sense that many convex sets of interest can be represented in this form. Although in general it may be hard to explicitly represent
projected spectrahedra in terms of their dening inequalities in their ambient space

i
i

2.1. From Linear to Semidenite Optimization

main
2012/11/1
page 11
i

11

(see Section 5.6 in Chapter 5), having a representation of the form (2.5) will often
be enough for optimization purposes.
Exercise 2.10. Both spectrahedra and projected spectrahedra are convex sets.
Show that spectrahedra are always closed sets. What about projected spectrahedra?
Primal SDP formulation. Semidenite programs are linear optimization problems over spectrahedra. An SDP problem in standard primal form is written as
minimize
subject to

C, X
Ai , X = bi ,

i = 1, . . . , m,

(SDP-P)

X  0,

where C, Ai S n , and X, Y  := Tr(X T Y ) = ij Xij Yij . The matrix X S n is
the variable over which the minimization is performed. The inequality in the third
line means that the matrix X must be positive semidenite. Notice the strong
formal similarities to the LP formulation (LP-P). As we will see in Section 2.1.4,
this formal analogy can be pushed even further to conic optimization problems.
Let us make a few quick comments before presenting examples of semidenite
programs. The set of feasible solutions of (SDP-P), i.e., the set of matrices X that
satisfy the constraints, is a spectrahedron, and thus it is always convex. This follows
directly from the fact that the feasible set is the intersection of an ane subspace
n
, both of which are convex sets. However,
and the positive semidenite cone S+
unlike the linear programming case, in general the set of feasible solutions will not
be polyhedral.
Example 2.11. Consider the semidenite optimization problem
minimize
subject to

2x11 + 2x12


x11 + x22 = 1,

x11 x12
 0.
x12 x22

Clearly, this has the form (SDP-P), with m = 1 and






2 1
1 0
C=
,
A1 =
,
1 0
0 1

(2.7)

b1 = 1.

The constraints are satised if and only if x11 (1 x11 ) x212 , and thus the
feasible set is a closed disk, which is not polyhedral. Figure 2.5 shows the feasible
set, parametrized by the variables (x11 , x12 ). The optimal solution is equal to


2 2
1

4
2 2

X =
,
1
2+ 2
2
4
2
with optimal cost 1

2, which is clearly not rational.

i
i

12

main
2012/11/1
page 12
i

Chapter 2. Semidenite Optimization


X12

0.6

0.4

0.2

0.2

0.2

0.4

0.6

0.8

1.0

1.2

X11

0.2

0.4

0.6

Figure 2.5. Feasible set of the primal SDP problem (2.7).


As we have seen from this simple example, SDP problems with rational data do
not necessarily have rational optimal solutions. Since the solutions are nevertheless
algebraic numbers, a natural question is to analyze their algebraic degree, i.e., the
minimum degree of a polynomial with integer coecients needed to specify the
solution. The algebraic degree of semidenite programming is studied in Chapter 5,
Section 5.5.
In the particular case when C = 0 in (SDP-P), the problem reduces to whether
or not the constraints can be satised for some matrix X. This is referred to as
a feasibility problem. As described later, the algebraic nature and convexity of
semidenite programming has made it possible to develop sophisticated and reliable
analytical and numerical methods to solve them.
Duality. A very important feature of semidenite programming, from both the theoretical and applied viewpoints, is the associated duality theory. For every semidefinite program of the form (SDP-P) (usually called the primal problem), there is
another associated SDP, called the dual problem, that can be stated as
maximize bT y
m

subject to
Ai yi C,

(SDP-D)

i=1

where b = (b1 , . . . , bm ), and y = (y1 , . . . , ym ) are the dual decision variables.


As in the linear programming case, the key relationship between the primal
and the dual problems is that feasible solutions of one problem can be used to bound
the values of the other. Indeed, let X and y be any two feasible solutions of the
primal and dual problems, respectively. We then have the following inequality:


m
m


T
yi Ai , X = C
Ai yi , X 0,
(2.8)
C, X b y = C, X
i=1

i=1

i
i

2.1. From Linear to Semidenite Optimization

main
2012/11/1
page 13
i

13

where the last inequality follows from the fact that the inner product of two positive
semidenite matrices is nonnegative. From (SDP-P) and (SDP-D) we can see that
the left-hand side of (2.8) is the dierence between the primal and dual objective
functions. The inequality in (2.8) tells us that the value of the primal objective
function evaluated at any feasible matrix X is always greater than or equal to the
dual objective function at any dual feasible y. This is known as weak duality. Thus,
we can use any X for which (SDP-P) is feasible to compute an upper bound for
the value of bT y in (SDP-D), and we can also use any feasible y of (SDP-D) to
compute a lower bound for the value of C, X in (SDP-P). Furthermore, in the
case of feasibility problems (i.e., C = 0), the dual problem can be used to certify
nonexistence of solutions to the primal problem. This property will be crucial in
our later developments.
If X and Y are positive semidenite matrices, then X, Y  = 0 if and only if
XY = Y X = 0 (e.g., Corollary A.24). Thus, the expression (2.8) allows us to give
a simple sucient characterization of optimality.
Lemma 2.12 (optimality conditions for SDP). Assume (X, y) are primal and
dual feasible solutions of (SDP-P) and (SDP-D), respectively, that satisfy the complementary slackness condition


m

Ai yi X = 0
(2.9)
C
i=1

(and thus achieve the same cost C, X = bT y). Then, (X, y) are primal and dual
optimal solutions of the SDP problem.
In general, the converse statement may require some additional assumptions, to be
discussed shortly.
Example 2.13. Here we continue Example 2.11. The SDP dual to (2.7) is
maximize y
subject to

2y
1


1
 0.
y

The optimal solution is y  = 1 2, with optimal cost 1 2. Notice that


in this example, the optimal values of the primal and dual problems are equal.
Furthermore, complementary slackness holds:





  22
m
1

2
1
+
2
1
4


2

C
Ai yi X =
= 0.
2+ 2
1
1
21
4
i=1
2 2
As opposed to the linear programming case, strong duality may fail in general
semidenite programming. We present below a simple example (from [36]), for
which both the primal and dual problems are feasible, but their optimal values are

i
i

14

main
2012/11/1
page 14
i

Chapter 2. Semidenite Optimization

dierent (i.e., there is a nonzero nite duality gap). Further examples and a detailed
discussion will be presented in Section 2.1.5.
Example 2.14. Let 0, and consider the primal-dual pair
minimize
subject to

X11
X22 = 0,

maximize

X11 + 2X23 = 1,
X  0,

subject to

y2

y2
0
0

0
y1
y2

y2 0
0
0

0 0
0 0 .
0 0

For a primal feasible point, X being positive semidenite and X22 = 0 imply
X23 = 0, and thus X11 = 1. The primal optimal cost p is then equal to (and is
achieved). On the dual side, the vanishing of the (3, 3) entry implies that y2 must
be zero, and thus d = 0. The duality gap p d is then equal to .
The example above (and others like it), are somewhat pathological. We will
see in Section 2.1.5 that under relatively mild conditions, usually called constraint
qualications, strong duality will also hold in semidenite programming. The simplest and most useful case corresponds to the so-called Slater conditions, where the
primal and/or dual problems are required to be strictly feasible. On the primal side,
this means that there exists X 0 that
satises the linear constraints, and on the
dual side, there exists y such that C i Ai yi 0 (notice that the inequalities are
strict). In this case, the situation is as nice as in the linear programming case.
Theorem 2.15. Assume that both the primal (SDP-P) and dual (SDP-D) semidefinite programs are strictly feasible. Then, both problems have optimal solutions, and
the corresponding optimal costs are equal; i.e., there is no duality gap.
This statement will reappear, in a more general setting, in Section 2.1.5. For
many problems (for instance, the ones discussed in the next section), these assumptions hold and are relatively straightforward to verify. In full generality, however,
they may be restrictive, and thus we investigate in Section 2.1.5 the geometric reasons why strong duality may fail in semidenite optimization, as well as possible
workarounds.
Exercise 2.16. Consider the following SDP problem:


x 1
minimize x
subject to
 0.
1 y
1. Draw the feasible set. Is it convex?
2. Is the primal strictly feasible? Is the dual strictly feasible?
3. What can you say about strong duality? Are the results consistent with
Theorem 2.15?

i
i

2.1. From Linear to Semidenite Optimization

main
2012/11/1
page 15
i

15

Exercise 2.17. Do the assumptions of Theorem 2.15 hold for Example 2.14?

2.1.3

Spectrahedra and Their Properties

Before proceeding further, we present several interesting examples of sets that are
expressible in terms of semidenite programming. We will revisit several of these
throughout the dierent chapters in this book.
Spectraplex: The spectraplex or free spectrahedron On is the set of n n positive
semidenite matrices of trace one, i.e.,
On = {X S n |

X  0,

Tr X = 1} .

n
The hyperplane Tr X=1 intersects S+
on a compact set and thus denes a base
for this cone. The extreme points of On are exactly the rank one matrices of the
form X = xxT , where x Rn and
x
= 1. The two-dimensional spectraplex O2
is anely isomorphic to the unit disk in the plane and has already appeared in
Example 2.11.

Elliptope and dual elliptope: Let En be the set of positive semidenite matrices
with unit diagonal, i.e.,
En = {X S n |

X  0,

Xii = 1,

i = 1, . . . , n} .

The convex set En is contained in a subspace of S n of codimension n, dened by the


constraints Xii = 1. It is often useful to consider it instead as a full-dimensional
n
n
convex body in R( 2 ) . For this, dene an orthogonal projection : S n R( 2 ) that
projects a matrix X onto its o-diagonal entries Xij for i < j.
The elliptope En is dened as En = (En ) and is a full-dimensional compact
n
convex set in R( 2 ) . As we will see in Section 2.2.2, this set is of great importance
when studying semidenite relaxations of combinatorial problems. Many geometric
aspects of elliptopes have been extensively studied, e.g., in [26].
The elliptope En is a convex body containing the origin in its interior. Thus,
n
we can dene its polar dual set En = {y R( 2 ) : y T x 1 x En }, known as the
dual elliptope. It follows from the expressions above that En is a (scaled) projection
of the spectraplex onto the o-diagonal entries:
En = 2(On ).

(2.10)

For nice pictures of the 33 elliptope and its dual body, see Figure 5.8 in Chapter 5.
Operator and nuclear norms: Let A Rn1 n2 be a matrix. The spectral or
operator norm of A is given by its maximum norm gain, i.e.,

A
=

max

vRn2 ,v=1

Av
= 1 (A),

where 1 (A) is the largest singular value of A.

i
i

16

main
2012/11/1
page 16
i

Chapter 2. Semidenite Optimization


The nuclear norm of a matrix is equal to the sum of its singular values, i.e.,

A
:=

r


i (A),

(2.11)

i=1

where r is the rank of A. The nuclear norm is alternatively known by several


other names including the Schatten 1-norm, the Ky Fan r-norm, and the trace class
norm. As we will see in Section 2.2.6, the nuclear norm is particularly useful in
optimization problems involving ranks of matrices.
The operator norm and the nuclear norm are dual norms in the sense that
their unit balls are convex bodies that are polar duals, i.e.,
{A Rn1 n2 :
A
1} = {B Rn1 n2 :
B
1}.
Therefore, any two matrices A and B satisfy
A, B
A

B
.
Furthermore, the following inequalities hold for any matrix A of rank at most r:

A

A
F
A

r
A
F r
A
,

(2.12)


1
1
where
A
F is the Frobenius norm, dened as
A
F := (TrAT A) 2 = ( ij a2ij ) 2 .
Both the operator norm and the nuclear norm have nice characterizations in
terms of semidenite programming. In particular, the operator norm
A
is the
optimal solution of the primal-dual pair of semidenite programs

maximize
subject to

Tr 2AT X12


X11 X12
Tr
= 1,
T
X12
X22
X  0,

minimize
subject to

t

tIn1
AT


(2.13)
A
 0.
tIn2

To see the exact correspondence between the standard form (SDP-P)-(SDP-D) and
this formulation, notice that we can take m = 1, X is a block (n1 + n2 ) (n1 + n2 )
matrix, A1 is the (n1 + n2 ) (n1 + n2 ) identity matrix, b1 = 1, and the cost matrix
0 A
C is the block matrix ( A
). Notice that we have the factor of 2 here because
T
0
T
Tr CX = Tr 2A X12 , and we have maximize in (2.13) instead of minimize
in (SDP-P) due to change of sign in the objective function.
Similarly (or dually), the nuclear norm
A
corresponds to the optimal
value of the primal-dual pair

i
i

2.1. From Linear to Semidenite Optimization

Tr AT Y


In1 Y
 0,
Y T In2

maximize
subject to

main
2012/11/1
page 17
i

17

minimize
subject to

1
(TrW1 + TrW2 )
2

W1 A
 0.
AT W2

(2.14)

Since the operator norm and the nuclear norm are dual norms, their unit balls
are dual polar convex bodies. In Figure 2.6 we illustrate these convex sets for the
case of a 2 2 symmetric matrix given by



x y
A=
.
y z

(2.15)

1.0
1.0

0.5
0.5

z 0.0
1.0

1.0

z 0.0

0.5

0.5

0.5

0.5
0.0 y

0.0 y
1.0

1.0
1.0

0.5
0.5

1.0

0.5
0.0

0.0
x

0.5

1.0

0.5
1.0

0.5

1.0
1.0

Figure 2.6. Unit balls of the spectral norm and the nuclear norm, for the
space of 2 2 symmetric matrices.
k-ellipse: We consider a class of planar convex sets dened by the algebraic curves
known as k-ellipses [33]. Recall that the standard ellipse in R2 is dened as the
locus of points with the sum of distances to two xed points (the foci) a xed
constant. Extending this denition to k foci, one can dene the k-ellipse as the
algebraic curve in R2 consisting of all points whose sum of distances from k given
points is a xed number. More formally, x a positive real number d, and x k
distinct points (u1 , v1 ), (u2 , v2 ), . . . , (uk , vk ) in R2 . The k-ellipse with foci (ui , vi )
and radius d is the following curve in the plane:




2
2
(x ui ) + (y vi ) = d .
(x, y) R

2

(2.16)

i=1

In Figure 2.7, we present a few k-ellipses with dierent numbers of foci. In contrast
to the classical circle (corresponding to k = 1) and ellipse (k = 2), a k-ellipse does
not necessarily contain all the foci in its interior. We dene the closed convex set
Ck to be the region whose boundary is the k-ellipse, and it is a sublevel set of the

i
i

18

main
2012/11/1
page 18
i

Chapter 2. Semidenite Optimization

Figure 2.7. A 3-ellipse, a 4-ellipse, and a 5-ellipse, each with its foci.

convex function
(x, y) 

k 

(x ui )2 + (y vi )2 .

(2.17)

i=1

In order for Ck to be nonempty, it is necessary and sucient that the radius d be


greater than or equal to the global minimum d of the convex function (2.17).
The set Ck is a projected spectrahedron, since it admits a semidenite representation. This can be easily obtained by adding slack variables di and rewriting
the function (2.17) in terms of 2 2 matrices. The region Ck is given by the points
(x, y) for which there exist (d1 , . . . , dk ) satisfying
k

i=1


di d,

di + x ui
y vi


y vi
 0,
di x + ui

i = 1, . . . , k.

To see this, notice that the 2 2 matrix above is positive semidenite if and only
if (x ui )2 + (y vi )2 d2i and di 0.
In a less obvious fashion, the k-ellipse can also be represented without additional slack variables, so it is also a spectrahedron. However, in this case the size
of the matrices is much bigger. Below we present a concrete statement; see [33] for
a sharper result and an explicit construction of this representation.
Theorem 2.18. The convex set Ck whose boundary is the k-ellipse of foci (ui , vi )
and radius d is dened by the LMI
x Ak + y Bk + Ck  0,

(2.18)

where Ak , Bk , Ck are symmetric 2k 2k matrices. The entries of Ak and Bk


are integer numbers, and the entries of Ck are linear forms in the parameters
d, u1 , v1 , . . . , uk , vk .
For illustration, we present the case k = 3 of the theorem. A spectrahedral representation of the 3-ellipse is obtained by requiring the following $8 \times 8$ matrix to be positive semidefinite:
\[
\begin{bmatrix}
d{+}3x{-}u_1{-}u_2{-}u_3 & y{-}v_1 & y{-}v_2 & 0 & y{-}v_3 & 0 & 0 & 0 \\
y{-}v_1 & d{+}x{+}u_1{-}u_2{-}u_3 & 0 & y{-}v_2 & 0 & y{-}v_3 & 0 & 0 \\
y{-}v_2 & 0 & d{+}x{-}u_1{+}u_2{-}u_3 & y{-}v_1 & 0 & 0 & y{-}v_3 & 0 \\
0 & y{-}v_2 & y{-}v_1 & d{-}x{+}u_1{+}u_2{-}u_3 & 0 & 0 & 0 & y{-}v_3 \\
y{-}v_3 & 0 & 0 & 0 & d{+}x{-}u_1{-}u_2{+}u_3 & y{-}v_1 & y{-}v_2 & 0 \\
0 & y{-}v_3 & 0 & 0 & y{-}v_1 & d{-}x{+}u_1{-}u_2{+}u_3 & 0 & y{-}v_2 \\
0 & 0 & y{-}v_3 & 0 & y{-}v_2 & 0 & d{-}x{-}u_1{+}u_2{+}u_3 & y{-}v_1 \\
0 & 0 & 0 & y{-}v_3 & 0 & y{-}v_2 & y{-}v_1 & d{-}3x{+}u_1{+}u_2{+}u_3
\end{bmatrix} \succeq 0.
\]
Exercise 2.19. Prove the relation (2.10) between the elliptope and the spectraplex.

Exercise 2.20. Show that the two semidefinite programs in (2.14) are indeed a primal-dual pair.

Exercise 2.21. Prove the correctness of the semidefinite characterizations of the operator and nuclear norms given in (2.13) and (2.14).

Exercise 2.22. Show that for the symmetric matrix in (2.15), the inequalities that define the boundary of the unit balls of the operator and nuclear norms shown in Figure 2.6 are
\[
y^2 + (x + z) - xz \le 1, \qquad y^2 - (x + z) - xz \le 1
\]
and
\[
(x - z)^2 + 4y^2 \le 1, \qquad x + z \le 1, \qquad -(x + z) \le 1,
\]
respectively.

Exercise 2.23. Analyze the structure of the convex sets in Figure 2.6. What are the matrices associated with the flat facets (or the vertices)? How can you interpret the rotational symmetries of these convex bodies?

2.1.4 Conic Programming

The strong formal similarities between linear programming and semidefinite programming (equations (LP-P)-(LP-D) vs. (SDP-P)-(SDP-D)) suggest that a more general formulation, encompassing both cases, may be possible. Indeed, a general class of optimization problems that unifies linear and semidefinite optimization (as well as a few other additional cases) is conic programming. We describe the conic framework next, explaining first the key idea, followed by the mathematical formulation.

\[
\begin{array}{lll}
\text{(LP)} &
\begin{array}{ll} \text{minimize} & c^T x \\ \text{subject to} & Ax = b, \;\; x \ge 0 \end{array} &
\begin{array}{ll} \text{maximize} & b^T y \\ \text{subject to} & A^T y \le c \end{array} \\[2ex]
\text{(SDP)} &
\begin{array}{ll} \text{minimize} & \langle C, X \rangle \\ \text{subject to} & \langle A_i, X \rangle = b_i, \;\; X \succeq 0 \end{array} &
\begin{array}{ll} \text{maximize} & b^T y \\ \text{subject to} & \sum_i A_i y_i \preceq C \end{array} \\[2ex]
\text{(CP)} &
\begin{array}{ll} \text{minimize} & \langle c, x \rangle_S \\ \text{subject to} & Ax = b, \;\; x \in K \end{array} &
\begin{array}{ll} \text{maximize} & \langle y, b \rangle_T \\ \text{subject to} & c - A^* y \in K^* \end{array}
\end{array}
\]

Table 2.1. Primal-dual formulations of linear programming (LP), semidefinite programming (SDP), and general conic programming (CP).
The starting point is the geometric interpretation of linear and semidefinite programming. The feasible set of an LP problem in standard form (LP-P) is the intersection of an affine subspace (described by the equations Ax = b) and the nonnegative orthant $\mathbb{R}^n_+$. Similarly, the feasible set of a semidefinite program (SDP-P) is the intersection of an affine subspace (described by $\langle A_i, X \rangle = b_i$) with the set of positive semidefinite matrices $\mathcal{S}^n_+$. Since both $\mathbb{R}^n_+$ and $\mathcal{S}^n_+$ are closed convex cones (in fact, they are proper cones; see below), one can define a general class of optimization problems where the feasible set is the intersection of a proper cone and an affine subspace. This is exactly what conic optimization will do!

We present a formal description next. We will be a bit more careful than usual here in the definition of the respective spaces and mappings. It does not make much of a difference if we are working in $\mathbb{R}^n$ (since we can identify a space and its dual through the inner product), but it is good hygiene to keep these distinctions in mind, and it will prove useful when dealing with more complicated spaces. We consider two real vector spaces, S and T, and a linear mapping $A : S \to T$. Recall that every real vector space has an associated dual space, which is the vector space of real-valued linear functionals. We denote these dual spaces by $S^*$ and $T^*$, respectively, and the pairing between an element of a vector space and one of the dual as $\langle \cdot, \cdot \rangle$ (i.e., $f(x) = \langle f, x \rangle$). Recall that the adjoint mapping of A is the unique linear map $A^* : T^* \to S^*$ defined by
\[
\langle A^* y, x \rangle_S = \langle y, Ax \rangle_T \qquad \forall x \in S, \; y \in T^*.
\]
Notice here that the brackets on the left-hand side of the equation represent the pairing in S, and those on the right-hand side correspond to the pairing in T.

A cone $K \subseteq S$ is pointed if $K \cap (-K) = \{0\}$ and is solid if it is full-dimensional (i.e., $\dim K = \dim S$). A cone that is convex, closed, pointed, and solid is called a proper cone. Given a cone K, its dual cone is $K^* := \{z \in S^* : \langle z, x \rangle_S \ge 0 \;\; \forall x \in K\}$. The dual of a proper cone is also a proper cone; see Exercise 2.24. An element x is in the interior of the proper cone K if and only if $\langle x, z \rangle > 0$ for all $z \in K^*$, $z \ne 0$.
Standard conic programs. Given a linear map $A : S \to T$ and a proper cone $K \subseteq S$, we define the primal-dual pair of (conic) optimization problems
\[
\begin{array}{ll} \text{minimize} & \langle c, x \rangle_S \\ \text{subject to} & Ax = b, \\ & x \in K, \end{array}
\qquad
\begin{array}{ll} \text{maximize} & \langle y, b \rangle_T \\ \text{subject to} & c - A^* y \in K^*, \end{array}
\]
where $b \in T$, $c \in S^*$. Notice that exactly the same proof presented earlier works here to show weak duality:
\[
\langle c, x \rangle_S - \langle y, b \rangle_T
= \langle c, x \rangle_S - \langle y, Ax \rangle_T
= \langle c, x \rangle_S - \langle A^* y, x \rangle_S
= \langle c - A^* y, x \rangle_S \ge 0. \tag{2.19}
\]
In the usual cases (e.g., LP and SDP), all vector spaces are finite-dimensional and thus isomorphic to their duals. The specific correspondence between these is given through whatever inner product we use.

Among the classes of problems that can be interpreted as particular cases of the general conic formulation we have linear programs, second-order cone programs (SOCP), and semidefinite programs, when we take the cone K to be the nonnegative orthant $\mathbb{R}^n_+$, the second-order cone $L^n_+$ (Exercise 2.25), or the positive semidefinite cone $\mathcal{S}^n_+$, respectively. Two other important cases are when K is the hyperbolicity cone associated with a given hyperbolic polynomial [22, 40] and the cone $\Sigma_{n,2d}$ of multivariate polynomials that are sums of squares. We discuss this latter example in much more detail in Chapter 3.

Despite the formal similarities, there are a number of differences between linear programming and general conic programming. We have already seen in (2.19) that weak duality always holds for conic programming. However, recall from Example 2.14 that in semidefinite programming (and thus, in general conic programming) there may be a nonzero duality gap. In the next section, we explore the geometric reasons for the possible failure of strong duality in conic programming.
Exercise 2.24. Let $K \subseteq S$ be a proper cone. Show that its dual cone $K^* \subseteq S^*$ is also a proper cone, and $K^{**} = K$.

Exercise 2.25. The second-order (or Lorentz) cone is defined as
\[
L^n_+ = \left\{ (x_0, x_1, \ldots, x_n) \in \mathbb{R}^{n+1} \;:\; \Bigl( \sum_{i=1}^{n} x_i^2 \Bigr)^{\frac{1}{2}} \le x_0 \right\}.
\]
Show that $L^n_+$ is a proper cone and is isomorphic to its dual cone.
Exercise 2.26. Classify the following statements as true or false. A proof or counterexample is required.

Let $A : \mathbb{R}^n \to \mathbb{R}^m$ be a linear mapping and $K \subseteq \mathbb{R}^n$ a cone.
1. If K is convex, then A(K) is convex.
2. If K is solid, then A(K) is solid.
3. If K is pointed, then A(K) is pointed.
4. If K is closed, then A(K) is closed.

Do the answers change if A is injective and/or surjective? How?

2.1.5 Strong Duality

As we have indicated earlier, strong duality in semidefinite programming is a bit more delicate than in the linear programming case. Most of the time (and particularly, in applications) this will not be a source of too many difficulties. However, it is important to understand the geometry behind this, as well as what conditions we can impose to ensure that strong duality will hold.

As we showed in (2.19), weak duality always holds in conic programming (and thus, also for semidefinite programming (2.8)). However, it is possible to have finite duality gaps (as in Example 2.14), or other anomalies, as the following simple example illustrates.
Example 2.27. Consider the primal-dual SDP pair
\[
\begin{array}{ll} \text{minimize} & x_{11} \\ \text{subject to} & 2x_{12} = 1, \\ & \begin{bmatrix} x_{11} & x_{12} \\ x_{12} & x_{22} \end{bmatrix} \succeq 0, \end{array}
\qquad
\begin{array}{ll} \text{maximize} & y \\ \text{subject to} & \begin{bmatrix} 0 & y \\ y & 0 \end{bmatrix} \preceq \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}. \end{array}
\]
For the dual problem, y = 0 provides an optimal solution, with optimal value $d^\star = 0$. On the primal side, however, we cannot have $x_{11} = 0$, since this would violate the positive semidefiniteness constraint. However, by choosing $x_{11} = \varepsilon$, $x_{22} = 1/\varepsilon$, we obtain a feasible cost $\varepsilon$ that is arbitrarily small but always strictly positive; the primal infimum $p^\star = 0$ is therefore not attained.

The example above shows that, in contrast with the case of linear programming, in semidefinite or conic programming optimal solutions may not be attained, even if there is zero duality gap.
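The phenomenon is easy to verify numerically; the following plain-numpy sketch checks feasibility of the family used above (recall $x_{12} = 1/2$ from the equality constraint):

```python
import numpy as np

# Numerical illustration of Example 2.27: along the feasible family
# x11 = eps, x12 = 1/2, x22 = 1/eps, the cost eps tends to the
# infimum 0, but no feasible point attains it.
for eps in [1e-1, 1e-3, 1e-6]:
    X = np.array([[eps, 0.5], [0.5, 1.0 / eps]])
    print(eps, np.linalg.eigvalsh(X).min() >= 0)   # feasible, cost = eps
```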

There are several geometric interpretations of what causes the failure of strong duality for general conic problems. Perhaps the most natural one is based on the fact that the image of a proper cone under a linear map may not be closed, and thus it is not necessarily a proper cone. This fact may seem a bit surprising (or perhaps wrong!) the first time one encounters it, but after a while it becomes quite reasonable. (If this is the first time you have heard about this, we strongly encourage you to stop reading and think of a counterexample right now! Or, see Exercise 2.30.)
Strong duality and infeasibility certificates. To better understand strong duality, we begin with a simple geometric interpretation in the conic setting, in terms of the separating hyperplane theorem. Recall that this theorem (see Section A.3.3 in Appendix A for several versions of this important result) establishes that if we have two disjoint convex sets, where one of them is closed and the other compact, there always exists a hyperplane that separates the two sets. For simplicity, we concentrate only on the case of conic feasibility, i.e., where we are interested in deciding the existence of a solution x to the equations
\[
Ax = b, \qquad x \in K, \tag{2.20}
\]
where as before K is a proper cone in the vector space S. We want to understand when this problem is feasible and how to certify its infeasibility whenever there are no solutions.

To do this, consider the image A(K) of the cone under the linear mapping. Notice that feasibility of (2.20) is equivalent to the point b being contained in A(K). We have now two convex sets in T, namely, A(K) and the singleton {b}, and we want to know whether these sets intersect or not. If these sets satisfy certain properties (for instance, closedness and compactness), then we could go on to apply the (strict) separating hyperplane theorem and produce a linear functional y that will be positive on one set and negative on the other. In particular, nonnegativity of y on A(K) implies
\[
\langle y, Ax \rangle \ge 0 \;\; \forall x \in K
\quad \Longleftrightarrow \quad
\langle A^* y, x \rangle \ge 0 \;\; \forall x \in K
\quad \Longleftrightarrow \quad
A^* y \in K^*.
\]
Thus, if (2.20) is infeasible, and provided the hypotheses of the separating hyperplane theorem apply, there exists a (suitably normalized) linear functional y which satisfies
\[
\langle y, b \rangle = -1, \qquad A^* y \in K^*. \tag{2.21}
\]
This yields a certificate of the infeasibility of the conic system (2.20).


When can we actually do this? The set {b} is certainly compact, so a natural condition is that A(K) be a closed set. However, as we have mentioned, the image of a proper cone is not necessarily closed, so we cannot automatically conclude this. However, under certain conditions, we can ensure that this set will be closed. Well-known sufficient conditions for this are the following.

Theorem 2.28. Let $K \subseteq S$ be a proper cone and $A : S \to T$ be a linear map. The following two conditions are equivalent:
(i) $K \cap \ker A = \{0\}$.
(ii) There exists $y \in T^*$ such that $A^* y \in \operatorname{int}(K^*)$.
Furthermore, if these conditions hold, then A(K) is a closed cone.

The first condition, while intuitive, has the drawback that it is not directly verifiable. The second condition is often more convenient, since it can be certified by exhibiting such a y, and can be interpreted as the range of $A^*$ properly intersecting $K^*$.

Proof. The equivalence of (i) and (ii) follows from Exercise 2.32, taking $L = \ker A$, and thus $L^\perp = \operatorname{range} A^*$.

Assume now that (ii) holds, and define $C = \{x \in K : \langle A^* y, x \rangle = 1\}$. We claim that the set C is compact. Indeed, C is closed (being the intersection of two closed sets), and it is also bounded, since if there is a sequence $x_k \in C$ with $\|x_k\|$ going to infinity, then defining $z = \lim_{k \to \infty} x_k / \|x_k\|$ (passing to a subsequence if necessary) gives an element of K (by closedness of K), for which $\langle A^* y, z \rangle = \lim_{k \to \infty} \langle A^* y, x_k \rangle / \|x_k\| = \lim_{k \to \infty} 1/\|x_k\| = 0$, contradicting $A^* y \in \operatorname{int}(K^*)$.

The set A(C) is also compact (being the linear image of a compact set) and does not include the origin, since for all $x \in C$ we have $\langle y, Ax \rangle = \langle A^* y, x \rangle = 1$. Thus, since $A(K) = \operatorname{cone}(A(C))$, it follows from Exercise 4.17 in Chapter 4 that A(K) is closed.
To recap, having strictly feasible solutions in $(\operatorname{range} A^*) \cap \operatorname{int} K^*$ is a natural condition for the existence of infeasibility certificates of the form (2.21).

For the case of a general conic optimization problem (not just feasibility), similar conditions can be used to ensure that there will be no duality gap between the primal and dual conic programs. The basic idea is to reduce the optimization problem to a pure feasibility question by adjoining a new inequality corresponding to the cost function. In this case, imposing a Slater-type condition will guarantee that optimal solutions for both problems are achieved, with no gap (compare with the semidefinite programming case, Theorem 2.15).

Theorem 2.29. Consider a conic optimization problem (CP), where both the primal and dual problems are strictly feasible. Then both problems have nonempty, compact sets of optimal solutions, and there is no duality gap.

Besides Theorem 2.28, many other conditions are known that ensure the closedness of A(K). In particular, when K is polyhedral this image is always closed, with no interior-point requirements needed. This corresponds to the case of linear programming and is the reason why strong duality always holds in the LP case.

In Section 3.4.2 of Chapter 3 we will explore in much more detail general infeasibility certificates for different kinds of systems of equations and inequalities.


Exercise 2.30. Consider the set $K = \{(x, y, z) : y^2 \le xz, \; z \ge 0\}$. Show that K is a proper cone. Show that its projection onto the (x, y) plane is not a proper cone.

Exercise 2.31. Let $K_1, K_2$ be closed convex cones. Show, via a counterexample, that the Minkowski sum $K_1 + K_2$ does not have to be closed.

Exercise 2.32. Let $L \subseteq S$ be a subspace, and $K \subseteq S$ be a proper cone. Show that the following two propositions are equivalent:
(i) $L \cap K = \{0\}$.
(ii) There exists $z \in L^\perp \cap \operatorname{int}(K^*)$.
Hint: For the difficult direction (i) $\Rightarrow$ (ii), argue by contradiction, and use homogeneity and the separation theorem for convex sets.
Although, as we have seen, standard duality may fail in semidefinite (or conic) programming, it is nevertheless possible to formulate a more complicated semidefinite dual program (called the Extended Lagrange–Slater Dual in [36]) for which strong duality always holds, regardless of interior-point assumptions. For details, as well as a comparison with the more general minimal cone approach, we refer the reader to [36, 37].

2.2 Applications of Semidefinite Optimization

There have been many applications of semidefinite optimization in a variety of areas of applied mathematics and engineering. We present here just a few, to give a flavor of what is possible; many others will follow in other chapters. The subsections corresponding to the different examples presented here can be read independently and are not essential for the remainder of the developments in the book.

2.2.1 Lyapunov Stability and Control of Dynamical Systems

One of the earliest and most important applications of semidefinite optimization is in the context of dynamical systems and control theory. The main reason is that it is possible to characterize dynamical properties (e.g., stability) in terms of algebraic statements such as the feasibility of specific systems of inequalities. We describe below a relatively simple example of these ideas that captures many of the features of more complicated problems.

Stability of linear systems. Consider a linear difference equation given by
\[
x[k+1] = A\, x[k], \qquad x[0] = x_0. \tag{2.22}
\]
This kind of linear recurrence equation is a simple example of a discrete-time dynamical system, where the state x[k] evolves over time, starting from an initial condition $x_0$. The difference equation (2.22), or its continuous-time analogue (the linear differential equation $\frac{d}{dt} x(t) = A x(t)$), is often used to model the time evolution of quantities such as temperature of objects, size of a population, voltage of electrical circuits, and concentration of chemical mixtures.
A natural and important question about (2.22) concerns the long-term behavior of the state. In particular, as $k \to \infty$, under what conditions can we guarantee that the state x[k] remains bounded, or converges to zero? It is well known (and easy to prove; see Exercise 2.35) that x[k] converges to zero for all initial conditions $x_0$ if and only if the spectral radius of the matrix A is smaller than one, i.e., all the eigenvalues $\lambda_i$ satisfy $|\lambda_i(A)| < 1$ for $i = 1, \ldots, n$. In this case we say that the system (2.22), or the matrix A, is stable (or Schur stable, if the discrete-time aspect is not clear from the context).

While this spectral characterization is very useful, an alternative viewpoint is sometimes even more convenient. The basic idea is to consider a generalization and abstraction of the notion of energy, usually known as a Lyapunov function. These are functions of the state x[k], with the property that they decrease monotonically along trajectories of the system (2.22). It turns out that for linear systems there is a simple characterization of stability in terms of a quadratic Lyapunov function $V(x[k]) = x[k]^T P x[k]$. Notice first that the monotonicity condition $V(x[k+1]) < V(x[k])$ (for all nonzero states x[k]) can be equivalently expressed in terms of the matrix inequality $A^T P A - P \prec 0$. We then have the following result.

Theorem 2.33. Given a matrix $A \in \mathbb{R}^{n \times n}$, the following conditions are equivalent:
1. All eigenvalues of A are inside the unit circle; i.e., $|\lambda_i(A)| < 1$ for $i = 1, \ldots, n$.
2. There exists a matrix $P \in \mathcal{S}^n$ such that
\[
P \succ 0, \qquad A^T P A - P \prec 0.
\]

Proof. (2) $\Rightarrow$ (1): Let $Av = \lambda v$, where $v \ne 0$. Then
\[
0 > v^* (A^T P A - P) v = (|\lambda|^2 - 1) \underbrace{v^* P v}_{>0},
\]
and therefore $|\lambda| < 1$.

(1) $\Rightarrow$ (2): Let $P := \sum_{k=0}^{\infty} (A^k)^T A^k$. The sum converges by the eigenvalue assumption. Then
\[
A^T P A - P = \sum_{k=1}^{\infty} (A^k)^T A^k - \sum_{k=0}^{\infty} (A^k)^T A^k = -I \prec 0.
\]

Thus, the characterization given above enables the study of the stability properties of the linear difference equation (2.22) in terms of a semidefinite programming problem, whose feasible solutions correspond to Lyapunov functions; a small numerical illustration follows below. In Section 3.6.2 we will explore extensions of these ideas to more complicated dynamics, not necessarily linear.
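For instance, a minimal sketch assuming scipy, whose discrete Lyapunov solver returns exactly the series $P = \sum_k (A^k)^T A^k$ used in the proof (the matrix A is test data):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Sketch of Theorem 2.33 in the stable direction (assumes scipy):
# solving A^T P A - P = -I gives a Lyapunov certificate.
A = np.array([[0.5, 1.0], [0.0, -0.3]])       # spectral radius < 1
P = solve_discrete_lyapunov(A.T, np.eye(2))   # solves A^T P A - P = -I
print(np.linalg.eigvalsh(P).min() > 0)                   # P > 0
print(np.linalg.eigvalsh(A.T @ P @ A - P).max() < 0)     # strict decrease
```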
Control design. Consider now the case of a linear system where there is a control input u[k]:
\[
x[k+1] = A\, x[k] + B\, u[k], \qquad x[0] = x_0, \tag{2.23}
\]

where $B \in \mathbb{R}^{n \times m}$. The idea here is that by properly choosing the control input $u[k] \in \mathbb{R}^m$ at each time instant, we may be able (under certain conditions) to affect or steer the behavior of x[k] toward some desired goal. We are interested in the case where the matrix A is not stable, but we can use linear state feedback to set $u[k] = Kx[k]$ for some fixed matrix K (to be chosen appropriately). It is easy to see that after this substitution, the system is described by (2.22), where the matrix A is replaced by $A(K) = A + BK$. Thus, our goal is stabilization; i.e., we want to find a matrix K such that $A + BK$ is stable (all eigenvalues have absolute value smaller than one).

Although this problem seems (and is!) fairly complicated due to the nonlinear dependence of the eigenvalues of $A + BK$ on the unknown matrix K, it turns out that it can be nicely solved using semidefinite optimization and the Lyapunov characterization given earlier. Indeed, we can use Schur complements (see Appendix A) to rewrite the condition
\[
(A + BK)^T P (A + BK) - P \prec 0, \qquad P \succ 0,
\]
as
\[
\begin{bmatrix} P & (A + BK)^T P \\ P (A + BK) & P \end{bmatrix} \succ 0.
\]
Although nicer, this condition is not quite an SDP yet, since it is bilinear in (P, K) (and, thus, not jointly convex). However, defining $Q := P^{-1}$, and left- and right-multiplying the equation above with the matrix $\operatorname{BlockDiag}(Q, Q)$, we obtain
\[
\begin{bmatrix} Q & Q(A + BK)^T \\ (A + BK)Q & Q \end{bmatrix} \succ 0.
\]
Notice that this expression contains both Q and KQ, but there is no single appearance of the variable K. Thus, we can define a new variable $Y := KQ$, to obtain
\[
\begin{bmatrix} Q & QA^T + Y^T B^T \\ AQ + BY & Q \end{bmatrix} \succ 0. \tag{2.24}
\]
This problem is now linear in the new variables (Q, Y). In fact, it is a semidefinite programming problem! After solving it, we can recover the controller K via $K = Y Q^{-1}$. We summarize our discussion in the following result.
Theorem 2.34. Given two matrices A and B, there exists a matrix K such that $A + BK$ is stable if and only if the spectrahedron described by (2.24) is nonempty, i.e., there exist matrices (Q, Y) satisfying this (strict) linear matrix inequality. Hence our control design problem is equivalent to solving a semidefinite programming feasibility problem.


Semidefinite programming techniques have become quite central in the analysis and design of control systems. The example above describes only the tip of the iceberg in terms of the many design problems that can be attacked with these techniques; we refer the reader to the works [6, 47] and the references therein.

We remark that the formulas in this example (e.g., (2.24)) do not explicitly depend on the dimensions of the matrices A, B, K, Y, Q. Hence, these kinds of problems are sometimes called dimension-free. This dimension-free feature applies to many classical problems in linear systems and has strong implications. Linear control theory problems can often be reduced to polynomials in matrix variables, where the feasible set is defined by these polynomials being positive semidefinite. Analyzing this situation requires a theory of inequalities for free noncommutative polynomials, extending classical real algebraic geometry for commutative polynomials. The convexity aspects of this new area, noncommutative real algebraic geometry, are the subject of Chapter 8.
Exercise 2.35. Show that for the linear difference equation (2.22), the state x[k] converges to zero for all initial conditions $x_0$ if and only if $|\lambda_i(A)| < 1$ for $i = 1, \ldots, n$. Hint: show that $x[k] = A^k x_0$, and consider first the case where the matrix A is diagonalizable.

Exercise 2.36. The system (2.23) has a nonstabilizable mode if the matrix A has a left eigenvector w such that $w^T A = \lambda w^T$, $w^T B = 0$, and $|\lambda| \ge 1$. Show that if this is the case, then the SDP (2.24) cannot be feasible. Interpret this statement in terms of the eigenvalues of $A + BK$. What does this say about the dual SDP?

2.2.2 Binary Quadratic Optimization

Binary (or Boolean) quadratic optimization is a classical combinatorial optimization problem. In the version we consider, we want to minimize a quadratic function, where the decision variables can take only the values ±1. In other words, we are minimizing an (indefinite) quadratic form over the vertices of an n-dimensional hypercube. The problem is formally expressed as
\[
\begin{array}{ll} \text{minimize} & x^T Q x \\ \text{subject to} & x_i \in \{-1, 1\}, \end{array} \tag{2.25}
\]
where $Q \in \mathcal{S}^n$. There are many well-known problems that can be naturally written in the form above. Among these, we mention the maximum cut (MAXCUT) problem, 0-1 knapsack, etc.

Notice that the Boolean constraints can be modeled using quadratic equations, i.e.,
\[
x_i \in \{-1, 1\} \quad \Longleftrightarrow \quad x_i^2 = 1.
\]
These n quadratic equations define a finite set, with an exponential number of elements, namely, all the n-tuples with entries in $\{-1, 1\}$. There are exactly $2^n$ points in this set, so a direct enumeration approach to (2.25) is computationally prohibitive when n is large (already for n = 30 we have $2^n \approx 10^9$).


We write the equivalent polynomial formulation
\[
\begin{array}{ll} \text{minimize} & x^T Q x \\ \text{subject to} & x_i^2 = 1, \end{array} \tag{2.26}
\]
and we denote the optimal value and optimal solution of this problem as $f^\star$ and $x^\star$, respectively. It is well known that the decision version of this problem is NP-complete (e.g., [18]). Notice that this is true even if the objective function is convex (i.e., the matrix Q is positive definite), since we can always assume $Q \succeq 0$ by adding to it a large constant multiple of the identity (this only shifts the objective by a constant).

Computing good solutions to the binary optimization problem (2.26) is a quite difficult task, so it is of interest to produce accurate bounds on its optimal value. As in all minimization problems, upper bounds can be directly obtained from feasible points. In other words, if $x_0 \in \mathbb{R}^n$ has entries equal to ±1, it always holds that $f^\star \le x_0^T Q x_0$ (of course, for a poorly chosen $x_0$, this upper bound may be very loose).
To prove lower bounds, we need a different technique. There are several approaches to doing this, but many of them will turn out to be exactly equivalent in the end. In particular, we can provide a lower bound in terms of the following primal-dual pair of semidefinite programming problems:
\[
\begin{array}{ll} \text{minimize} & \operatorname{Tr} QX \\ \text{subject to} & X_{ii} = 1, \\ & X \succeq 0, \end{array}
\qquad
\begin{array}{ll} \text{maximize} & \operatorname{Tr} \Lambda \\ \text{subject to} & Q \succeq \Lambda, \\ & \Lambda \text{ diagonal}. \end{array} \tag{2.27}
\]
These semidefinite programs can be interpreted in a number of ways. For instance, it is clear that the optimal solution $X^\star$ of the primal formulation in (2.27) yields a lower bound, since for every x in (2.26), the matrix $X = xx^T$ gives a feasible solution of (2.27) with the same cost: $\operatorname{Tr} QX = \operatorname{Tr} Qxx^T = x^T Q x$. Similarly, for every feasible solution $\Lambda = \operatorname{Diag}(\lambda_1, \ldots, \lambda_n)$ of the dual SDP, we have
\[
x^T Q x \ge x^T \Lambda x = \sum_{i=1}^{n} \lambda_i x_i^2 = \operatorname{Tr} \Lambda,
\]
thus yielding a lower bound on (2.26).


In certain cases, these SDP-based bounds are provably good. Well-known cases are when $-Q$ is diagonally dominant or positive semidefinite or has a bipartite structure, in which case results due to Goemans–Williamson [20], Nesterov [31], or Grothendieck/Krivine [30, 2, 25], respectively, have shown that there is at most a small constant factor between the true solutions and the SDP relaxations. We discuss these bounds next.
Rounding. As described, the optimal value of the SDP relaxation (2.27) provides a lower bound on the optimal value of the binary minimization problem (2.26). Two natural questions arise:

1. Feasible solutions: can we use the SDP relaxations to provide feasible points that yield good (or optimal) values of the objective?
2. Approximation guarantees: is it possible to quantify the quality of the bounds obtained by SDP?

By suitably rounding the optimal solution of the SDP relaxation, both questions can be answered in the affirmative. The basic idea is to produce a binary vector x from the SDP solution matrix X, using the following hyperplane rounding method [20]:

- Factorize the SDP solution X as $X = V^T V$, where $V = [v_1 \; \ldots \; v_n] \in \mathbb{R}^{r \times n}$ and r is the rank of X.
- Since $X_{ij} = v_i^T v_j$ and $X_{ii} = 1$, this factorization gives n vectors $v_i$ on the unit sphere in $\mathbb{R}^r$. Thus, instead of assigning either +1 or −1 to each variable, so far we have assigned to each $x_i$ a point on the unit sphere in $\mathbb{R}^r$.
- Now, choose a uniformly distributed random hyperplane in $\mathbb{R}^r$ (passing through the origin), and assign to each variable $x_i$ either a +1 or a −1, depending on which side of the hyperplane the point $v_i$ lies.
Since the last step involves a random choice, this is a randomized rounding method.
By a simple geometric argument, it is possible to quantify the expected value of the
objective function.
Lemma 2.37. Let $x = \operatorname{sign}(V^T r)$, where $X = V^T V$ and r is a standard random Gaussian vector. Then $\mathbb{E}[x_i x_j] = \frac{2}{\pi} \arcsin X_{ij}$.

By linearity of expectation, we have the following relationship between the lower bound given by the optimal value of the SDP, the true optimal value $f^\star$, and the expected value of the rounded solution x:
\[
\operatorname{Tr} QX^\star \;\le\; f^\star \;\le\; \mathbb{E}[x^T Q x] \;=\; \frac{2}{\pi} \operatorname{Tr} Q \arcsin[X^\star]. \tag{2.28}
\]
The notation $\arcsin[\cdot]$ indicates that the arcsine function is applied componentwise, i.e., $(\arcsin[X])_{ij} = \arcsin X_{ij}$.
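The whole pipeline of relaxation plus rounding is short in code. A sketch assuming the cvxpy package for the SDP (2.27), with random test data of our choosing:

```python
import cvxpy as cp
import numpy as np

# Sketch (assumes cvxpy): solve the relaxation (2.27) and round with a
# random Gaussian hyperplane, for a small random instance of (2.26).
np.random.seed(1)
n = 8
Q = np.random.randn(n, n)
Q = (Q + Q.T) / 2

X = cp.Variable((n, n), symmetric=True)
prob = cp.Problem(cp.Minimize(cp.trace(Q @ X)),
                  [cp.diag(X) == 1, X >> 0])
prob.solve()

# Factor X = V^T V and apply hyperplane rounding.
w, U = np.linalg.eigh(X.value)
V = np.diag(np.sqrt(np.clip(w, 0, None))) @ U.T   # so that V^T V ~ X
x = np.sign(V.T @ np.random.randn(n))
print(prob.value, x @ Q @ x)   # SDP lower bound <= cost of rounded point
```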
Exercise 2.38. Prove Lemma 2.37, and verify that it implements the hyperplane
rounding scheme.
Approximation ratios. In many problems, we want to understand how far these upper and lower bounds are from each other. Depending on the specific assumptions on the cost function, the hyperplane rounding method (or slight variations) will give solutions with different guaranteed approximation ratios. Since the approximation algorithms literature often considers maximization problems (instead of the minimization version (2.26)), in this section we use
\[
\begin{array}{ll} \text{maximize} & x^T A x \\ \text{subject to} & x_i^2 = 1 \end{array} \tag{2.29}
\]
and state below our assumptions in terms of the matrix A (or, equivalently, the matrix Q in the minimization formulation (2.25)).

We describe next three well-known cases where constant approximation ratios can be obtained.
Diagonally dominant: A symmetric matrix A is diagonally dominant if $a_{ii} \ge \sum_{j \ne i} |a_{ij}|$ for all i. This is an important case that corresponds, for instance, to the MAXCUT problem, where the cost function to be maximized is the Laplacian of a graph (V, E), given by $\frac{1}{4} \sum_{(i,j) \in E} (x_i - x_j)^2$. Every diagonally dominant quadratic form can be written as a nonnegative linear combination of terms of the form $x_i^2$ and $(x_i \pm x_j)^2$ [4]. Thus, to analyze the performance of hyperplane rounding when A is diagonally dominant, it is enough to consider the inequality
\[
\mathbb{E}[(x_i - x_j)^2 / 2] = \mathbb{E}[1 - x_i x_j] = 1 - \frac{2}{\pi} \arcsin X_{ij} \;\ge\; \alpha_{GW} (1 - X_{ij}),
\]
where $\alpha_{GW} = \min_{t \in [-1,1]} \bigl(1 - \frac{2}{\pi} \arcsin t\bigr)/(1 - t) \approx 0.878$. Combining this with (2.28), and taking into account the change of signs (since $A = -Q$), it follows that
\[
\alpha_{GW} \operatorname{Tr} AX^\star \;\le\; \mathbb{E}[x^T A x] \;\le\; f^\star \;\le\; \operatorname{Tr} AX^\star;
\]
i.e., the vector x obtained by randomly rounding the SDP solution matrix X is at most 13% suboptimal in expectation. This analysis is due to Goemans and Williamson [20] and yields the best currently known approximation ratio for the MAXCUT problem.
Positive semidefinite: Nesterov [31] first analyzed the case of maximizing a convex quadratic function, i.e., when the matrix A is positive semidefinite. Notice that here we do not have any information on the sign of the individual entries $a_{ij}$, and thus a global analysis is needed instead of the term-by-term analysis of the previous case. The key idea is to use the following result.

Lemma 2.39. Let $f : \mathbb{R} \to \mathbb{R}$ be a function whose Taylor expansion has only nonnegative coefficients. Given a symmetric matrix X, define a matrix Y as $Y_{ij} = f(X_{ij})$ (equivalently, $Y = f[X]$). Then $X \succeq 0$ implies $Y \succeq 0$.

This lemma is a rather direct consequence of the Schur product theorem; see Exercise 2.42. Since the scalar function $f(t) = \arcsin(t) - t$ has only nonnegative Taylor coefficients, if $X \succeq 0$, we have $\arcsin[X] \succeq X$, and thus
\[
\mathbb{E}[x^T A x] = \frac{2}{\pi} \operatorname{Tr} A \arcsin[X] \ge \frac{2}{\pi} \operatorname{Tr} AX.
\]
Thus, in this case we have
\[
\frac{2}{\pi} \operatorname{Tr} AX^\star \;\le\; \mathbb{E}[x^T A x] \;\le\; f^\star \;\le\; \operatorname{Tr} AX^\star.
\]
Notice that $2/\pi \approx 0.636$, so the approximation ratio in this case is slightly worse than for the diagonally dominant case.
Bipartite: This case corresponds to the cost function being bilinear and has been analyzed in [2, 30]. We assume that the matrix A has the structure
\[
A = \frac{1}{2} \begin{bmatrix} 0 & S \\ S^T & 0 \end{bmatrix}.
\]
Letting x = [p; q], an equivalent formulation is in terms of the bilinear optimization problem
\[
\text{maximize} \quad p^T S q,
\]
where $S \in \mathbb{R}^{n \times m}$ and p, q range over $\{+1, -1\}^n$ and $\{+1, -1\}^m$, respectively.

This problem has a long history in operator theory and functional analysis and was first analyzed (in a quite different form) by Grothendieck. For this class of problems, it follows from his results that a constant ratio approximation is possible. In fact, the worst-case ratio (over all instances) between the values of the semidefinite relaxation and the bilinear binary optimization problem is called the Grothendieck constant and is usually denoted $K_G$:
\[
K_G := \sup_A \frac{\operatorname{Tr} AX^\star}{f^\star},
\]
where $X^\star$ is, as before, the optimal solution of the SDP relaxation. The exact value of this constant is unknown at this time. The argument below is essentially due to Krivine [25] and provides an upper bound on the Grothendieck constant.
Since there are no assumptions about the sign of the entries of the matrix S, we cannot directly apply the techniques discussed earlier to prove a bound on the quality of hyperplane rounding. The basic strategy in Krivine's approach is the following: instead of using hyperplane rounding directly on the solution X of the SDP relaxation, we will first apply a particular componentwise transformation to obtain a matrix Y, and then apply hyperplane rounding to Y. The reason is that this will considerably simplify the computation of the expected value of the objective function.

To do this, we use a block version of Lemma 2.39.


Lemma 2.40. Let $f, g : \mathbb{R} \to \mathbb{R}$ be functions such that both $f + g$ and $f - g$ have nonnegative Taylor coefficients. Let
\[
X = \begin{bmatrix} X_{11} & X_{12} \\ X_{12}^T & X_{22} \end{bmatrix}, \qquad
Y = \begin{bmatrix} f(X_{11}) & g(X_{12}) \\ g(X_{12}^T) & f(X_{22}) \end{bmatrix}. \tag{2.30}
\]
Then $X \succeq 0$ implies $Y \succeq 0$.

The result now follows from a clever choice of f and g. Let
\[
f(t) = \sinh(c_K \pi t / 2), \qquad g(t) = \sin(c_K \pi t / 2),
\]
where the constant $c_K = \frac{2}{\pi} \sinh^{-1}(1) = \frac{2}{\pi} \log(1 + \sqrt{2}) \approx 0.5611$ is chosen so that $f(1) = 1$. Since
\[
\sinh(t) = \sum_{k=0}^{\infty} \frac{t^{2k+1}}{(2k+1)!}, \qquad
\sin(t) = \sum_{k=0}^{\infty} (-1)^k \frac{t^{2k+1}}{(2k+1)!},
\]
both $f + g$ and $f - g$ have nonnegative Taylor expansions.

Let $X^\star$ be the optimal solution of the SDP relaxation, and define Y as in (2.30). Notice that the matrix Y satisfies $Y \succeq 0$ and $Y_{ii} = 1$. We can therefore apply hyperplane rounding to it to obtain a vector y. Computing the expected value of this solution, we have
\[
\mathbb{E}[y^T A y] = \frac{2}{\pi} \operatorname{Tr} A \arcsin[Y] = \frac{2}{\pi} \operatorname{Tr} S \bigl( c_K \pi X_{12}^\star / 2 \bigr) = c_K \operatorname{Tr} S X_{12}^\star,
\]
and therefore this gives us a randomized algorithm with expected value $c_K$ times the value of the SDP relaxation. Notice that no inequalities are used in the analysis, so the expected cost of the solution y for this rounding scheme is exactly equal to $c_K$ times the optimal value of the SDP:
\[
c_K \operatorname{Tr} S X_{12}^\star = \mathbb{E}[y^T A y] \;\le\; f^\star \;\le\; \operatorname{Tr} S X_{12}^\star.
\]
Since $\operatorname{Tr} S X_{12}^\star / f^\star \le 1/c_K$, this analysis gives an upper bound for the Grothendieck constant: $K_G \le 1/c_K \approx 1.7822$. It has been recently shown that this rounding method (and thus, the value $1/c_K$) is not the best possible one [8], but the exact approximation ratio is not currently known.

Exercise 2.41. Show that the optimal values of the primal and dual semidefinite programs in (2.27) are equal, i.e., there is no duality gap.

Exercise 2.42. The entrywise product $A \circ B$ of two matrices is given by $(A \circ B)_{ij} = A_{ij} B_{ij}$. This product is also known as the Hadamard or Schur product. The Schur product theorem says that if two matrices A, B are positive semidefinite, so is their entrywise product $A \circ B$.



1. Prove the Schur product theorem. (Hint: What happens if one of the matrices
is rank one?)
2. Prove Lemmas 2.39 and 2.40.

2.2.3 Stable Sets and the Theta Function

Given an undirected graph G = (V, E), a stable set (or independent set) is a subset of the set of vertices V with the property that the induced subgraph has no edges. In other words, none of the selected vertices are adjacent to each other. The stability number of a graph, usually denoted by $\alpha(G)$, is the cardinality of the largest stable set. Computing the stability number of a graph is NP-hard.

There are many interesting applications of the stable set problem. In particular, it can be used to provide upper bounds on the Shannon capacity of a graph [28], a problem that appears in coding theory (when computing the zero-error capacity of a noisy channel [43]). In fact, this was one of the first appearances of semidefinite programming.

In many problems, it is of interest to compute upper bounds on $\alpha(G)$. The Lovász theta function of the graph G is denoted by $\vartheta(G)$ and is defined as the solution of the primal-dual SDP pair
\[
\begin{array}{ll} \text{maximize} & \operatorname{Tr} JX \\ \text{subject to} & \operatorname{Tr} X = 1, \\ & X_{ij} = 0, \quad (i,j) \in E, \\ & X \succeq 0, \end{array}
\qquad
\begin{array}{ll} \text{minimize} & t \\ \text{subject to} & Y \preceq tI, \\ & Y_{ii} = 1, \quad i \in V, \\ & Y_{ij} = 1, \quad (i,j) \notin E, \end{array} \tag{2.31}
\]
where J is the matrix with all entries equal to one.


The theta function is an upper bound on the stability number, i.e.,
\[
\alpha(G) \le \vartheta(G).
\]
The inequality is easy to prove. Consider the indicator vector $\chi(S)$ of any stable set S, and define the matrix $X := \frac{1}{|S|} \chi(S) \chi(S)^T$. It is easy to see that this X is a feasible solution of the primal SDP in (2.31), and it achieves an objective value equal to |S|. As a consequence, the inequality above directly follows.
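The primal SDP in (2.31) can be solved directly with a few lines; a sketch assuming the cvxpy package, shown here for the 5-cycle, where it returns Lovász's well-known value $\vartheta(C_5) = \sqrt{5}$:

```python
import cvxpy as cp

# Sketch (assumes cvxpy): the primal SDP in (2.31) for the 5-cycle C5.
n = 5
edges = [(i, (i + 1) % n) for i in range(n)]
X = cp.Variable((n, n), symmetric=True)
constraints = [cp.trace(X) == 1, X >> 0]
constraints += [X[i, j] == 0 for (i, j) in edges]
prob = cp.Problem(cp.Maximize(cp.sum(X)), constraints)  # Tr JX = sum of entries
prob.solve()
print(prob.value)   # ~ 2.236, i.e., sqrt(5)
```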
For a class of graphs known as perfect graphs,¹ the upper bound given by the theta function is exact; i.e., it is equal to the stability number. Many classes of graphs, such as bipartite, chordal, and comparability graphs, are perfect. Thus, for these graphs one can compute in polynomial time the size of the largest stable set (and a maximum stable set) by solving the SDPs (2.31). Interestingly, at this time no polynomial-time combinatorial methods (not based on semidefinite programming) are known to compute this quantity for all perfect graphs. Further material on the theta function of a graph and its applications in combinatorial optimization can be found in Lovász's original paper [28] or in the references [19, 21].

¹A graph is perfect if, for every induced subgraph, the chromatic number is equal to the size of the largest clique.

Figure 2.8. Petersen graph.
Exercise 2.43. Consider the graph in Figure 2.8, known as the Petersen graph. Compute the semidefinite programming upper bound on the size of its largest stable subset (i.e., the Lovász theta function). Is this bound tight? Can you find a stable set that achieves this value?

Exercise 2.44. The chromatic number $\chi(G)$ of a graph G is the minimum number of colors needed to color all vertices, in such a way that adjacent vertices receive distinct colors. Show that the inequality
\[
\vartheta(G) \le \chi(\overline{G})
\]
holds, where $\overline{G}$ is the complement of the graph G. Hint: Given a coloring of $\overline{G}$, construct a feasible solution of the dual SDP in (2.31).

2.2.4 Bounded Analytic Interpolation

In many applications, one tries to find a function in a given function class that takes specific values at prescribed points. These kinds of questions are known as interpolation problems. A classical and important class of interpolation problems involves bounded analytic functions. The mathematical background for these problems is reviewed and developed further in Chapter 9. Good general references include [3] for the theoretical aspects, and [24, 47] for specific applications of interpolation in systems and control theory.

We discuss here two specific problems related to this area. The first is the computation of the $H^\infty$-norm of an analytic function, and the second is the classical Nevanlinna–Pick interpolation problem. Additional connections between analytic interpolation and convex optimization can be found in [6].
Norms of rational analytic functions. Let $\mathbb{D}$ be the complex open unit disk $\mathbb{D} = \{z \in \mathbb{C} : |z| < 1\}$. Consider a scalar rational function of a complex variable z given by
\[
f(z) = c^T (z^{-1} I - A)^{-1} b + d, \tag{2.32}
\]
where $A \in \mathbb{R}^{n \times n}$, $b, c \in \mathbb{R}^{n \times 1}$, and $d \in \mathbb{R}$. We assume that all the eigenvalues of A are in $\mathbb{D}$: $|\lambda_i(A)| < 1$ (i.e., A is Schur stable). It follows that $z^{-1} I - A$ is nonsingular on $|z| \le 1$, and thus f(z) is analytic² on the domain $\mathbb{D}$.

The question of interest is to compute the $H^\infty$-norm of the function f(z), i.e., its maximum absolute value on the unit disk:
\[
\|f\|_\infty = \sup_{z \in \mathbb{D}} |f(z)|. \tag{2.33}
\]

It can be shown, by using the maximum principle in complex analysis, that it is enough to compute the supremum of f(z) on the boundary of the domain, i.e., the unit circle |z| = 1. A fairly complete characterization of this question is available. It is known in the literature under several names, such as the Kalman–Yakubovich–Popov lemma [38], the bounded real lemma, or (as a special case of) the structured singular value theory [34], among others. The statement, presented below, characterizes this norm in terms of the solution of a semidefinite programming problem.
Theorem 2.45. Consider a function f(z) as in (2.32), with $|\lambda_i(A)| < 1$. Then $\|f\|_\infty < \gamma$ if and only if the semidefinite program
\[
\begin{bmatrix} A & b \\ c^T & d \end{bmatrix}^T
\begin{bmatrix} P & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} A & b \\ c^T & d \end{bmatrix}
\prec
\begin{bmatrix} P & 0 \\ 0 & \gamma^2 \end{bmatrix}, \qquad P \succ 0, \tag{2.34}
\]
is feasible, where the decision variable is the matrix $P \in \mathcal{S}^n$.

A full proof can be found, for instance, in [3, 47]. We present here only the easy direction, i.e., showing that if (2.34) holds, then we have $\|f\|_\infty < \gamma$. For this, let $v = (z^{-1} I - A)^{-1} b$, and multiply the first inequality in (2.34) on the left and right by $[v^* \;\; 1]$ and its conjugate transpose, respectively. From the identity
\[
\begin{bmatrix} A & b \\ c^T & d \end{bmatrix} \begin{bmatrix} v \\ 1 \end{bmatrix}
= \begin{bmatrix} z^{-1} v \\ f(z) \end{bmatrix},
\]
we have that
\[
(|z^{-1}|^2 - 1)(v^* P v) + (|f(z)|^2 - \gamma^2) < 0,
\]
and thus the conclusion directly follows. The converse direction takes a bit more work; see Chapter 9. There are extensions of this result to the matrix case, i.e., where f(z) is matrix-valued.
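Since (2.34) is linear in P and in $\gamma^2$, the norm itself can be computed by minimizing $t = \gamma^2$ subject to the nonstrict version of the LMI. A sketch assuming the cvxpy package, with arbitrary test data (A, b, c, d) of our choosing:

```python
import cvxpy as cp
import numpy as np

# Sketch (assumes cvxpy): minimize t = gamma^2 subject to the nonstrict
# version of (2.34); (A, b, c, d) are test data with A Schur stable.
A = np.array([[0.5, 0.2], [0.0, -0.4]])
b = np.array([[1.0], [1.0]])
c = np.array([[1.0], [0.0]])
d = 0.1
n = A.shape[0]

F = np.block([[A, b], [c.T, np.array([[d]])]])
P = cp.Variable((n, n), symmetric=True)
t = cp.Variable()                       # stands in for gamma^2
G = cp.bmat([[P, np.zeros((n, 1))],
             [np.zeros((1, n)), np.ones((1, 1))]])
H = cp.bmat([[P, np.zeros((n, 1))],
             [np.zeros((1, n)), cp.reshape(t, (1, 1))]])
prob = cp.Problem(cp.Minimize(t), [H - F.T @ G @ F >> 0, P >> 0])
prob.solve()
print(np.sqrt(prob.value))              # ~ the H-infinity norm of f
```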
Exercise 2.46. Use the given formulation to compute the $H^\infty$-norm of the analytic function $f(z) = \frac{z}{z^3 + z^2 - 2z + 3}$. How can you compute, from the semidefinite formulation, a value of z at which the maximum is achieved?
²We remark that the notation used here is slightly different from the usual notation in systems and control theory, where z is used instead of $z^{-1}$ in (2.32). The reason is that for interpolation, it is more natural to use functions that are analytic on $\mathbb{D}$ (poles outside the unit circle) than functions that are analytic outside $\mathbb{D}$. To avoid distracting technical issues of controllability and/or observability, we use strict inequalities throughout.


Exercise 2.47. Formulate a similar statement for the matrix case. Do the same
formulas work?
Nevanlinna–Pick interpolation. Consider now the following problem. We want to find an analytic function on $\mathbb{D}$ satisfying the interpolation constraints
\[
f(a_k) = c_k \qquad \text{for } k = 1, \ldots, m, \tag{2.35}
\]
where $a_k \in \mathbb{D}$. When does there exist an analytic function, satisfying the interpolation conditions, whose absolute value is bounded by 1 on the unit disk?

Clearly, a necessary condition is that the interpolated values $c_k$ must satisfy $|c_k| \le 1$ for all k. However, due to the analyticity constraint, this is not sufficient. Consider, for instance, the case m = 2 and the constraints f(0) = 0 and f(1/2) = c. In this case, a necessary condition is $|c| \le 1/2$, which is stronger than the obvious condition $|c| \le 1$. To see this, notice that, due to the first interpolation constraint, f(z) must have the form f(z) = z g(z), where g(z) = f(z)/z is also analytic on $\mathbb{D}$ and bounded by one (by the maximum modulus theorem, since |f(z)| = |g(z)| on the unit circle). Thus, we must have $1 \ge |g(1/2)| = 2|c|$, and thus $|c| \le 1/2$.

Necessary and sufficient conditions for the interpolation problem to be feasible are given by the Nevanlinna–Pick theorem; see Chapter 9. The formulation below is convenient from the optimization viewpoint.
Theorem 2.48. There exists a function f(z) analytic on $\mathbb{D}$, satisfying the norm bound $\|f\|_\infty \le \gamma$ and the interpolation constraints (2.35), if and only if
\[
\begin{bmatrix} \gamma Z & C \\ C^* & \gamma Z^{-1} \end{bmatrix} \succeq 0, \tag{2.36}
\]
where $Z_{jk} = \frac{1}{1 - \bar{a}_j a_k}$ and $C = \operatorname{Diag}(c_1, \ldots, c_m)$.

Using Schur complements, it can be easily seen that this formulation is equivalent to the more usual characterization where the $m \times m$ Pick matrix P given by
\[
P_{jk} = \frac{\gamma^2 - c_j \bar{c}_k}{1 - \bar{a}_j a_k}
\]
is required to be positive semidefinite (e.g., Section 9.8). The advantage of condition (2.36) is that it is linear in the interpolation values $c_k$. This allows its use in a variety of system identification problems; see, for instance, [11, 35]. The Nevanlinna–Pick interpolation problem has many important applications in systems and control theory; see, for instance, [14], [47], and the references therein.
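For the two-point example above (f(0) = 0, f(1/2) = c, with $\gamma = 1$), the Pick condition can be checked in a few lines of plain numpy:

```python
import numpy as np

# Check of the Pick matrix condition on the example f(0) = 0, f(1/2) = c
# with gamma = 1; feasibility holds exactly when |c| <= 1/2.
def pick_feasible(a, c, gamma=1.0):
    a, c = np.asarray(a), np.asarray(c)
    P = (gamma**2 - np.outer(c, c.conj())) / (1 - np.outer(a.conj(), a))
    return np.linalg.eigvalsh(P).min() >= -1e-9

print(pick_feasible([0.0, 0.5], [0.0, 0.49]))  # True
print(pick_feasible([0.0, 0.5], [0.0, 0.51]))  # False
```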

2.2.5 Euclidean Distance Matrices

Assume we are given a list of pairwise distances between a finite number of points. Under what conditions can the points be embedded in some finite-dimensional space, with those distances realized as the Euclidean metric between the embedded points? This problem appears in a large number of applications, including distance geometry, computational chemistry, sensor network localization, and machine learning.

Concretely, assume we have a list of distances $d_{ij}$ for $1 \le i < j \le n$. We would like to find points $x_i \in \mathbb{R}^k$ (for some value of k) such that $\|x_i - x_j\| = d_{ij}$ for all i, j. What are necessary and sufficient conditions for such an embedding to exist? In 1935, Schoenberg [41] gave an exact characterization in terms of the semidefiniteness of the matrix of squared distances.
Theorem 2.49. The distances $d_{ij}$ can be embedded in a Euclidean space if and only if the $n \times n$ matrix
\[
D := \begin{bmatrix}
0 & d_{12}^2 & d_{13}^2 & \ldots & d_{1n}^2 \\
d_{12}^2 & 0 & d_{23}^2 & \ldots & d_{2n}^2 \\
d_{13}^2 & d_{23}^2 & 0 & \ldots & d_{3n}^2 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
d_{1n}^2 & d_{2n}^2 & d_{3n}^2 & \ldots & 0
\end{bmatrix}
\]
is negative semidefinite on the subspace orthogonal to the vector $e := (1, 1, \ldots, 1)$.


Proof. We show only the necessity of the condition. Assume an embedding exists, i.e., there are points $x_i \in \mathbb{R}^k$ such that $d_{ij} = \|x_i - x_j\|$. Consider now the Gram matrix G of inner products
\[
G := \begin{bmatrix}
\langle x_1, x_1 \rangle & \langle x_1, x_2 \rangle & \ldots & \langle x_1, x_n \rangle \\
\langle x_2, x_1 \rangle & \langle x_2, x_2 \rangle & \ldots & \langle x_2, x_n \rangle \\
\vdots & \vdots & \ddots & \vdots \\
\langle x_n, x_1 \rangle & \langle x_n, x_2 \rangle & \ldots & \langle x_n, x_n \rangle
\end{bmatrix}
= [x_1, \ldots, x_n]^T [x_1, \ldots, x_n],
\]
which is positive semidefinite by construction. Since $D_{ij} = \|x_i - x_j\|^2 = \langle x_i, x_i \rangle + \langle x_j, x_j \rangle - 2 \langle x_i, x_j \rangle$, we have
\[
D = \operatorname{diag}(G)\, e^T + e\, \operatorname{diag}(G)^T - 2G,
\]
where $\operatorname{diag}(G)$ is the column vector of diagonal entries of G, from which the result directly follows.

Notice that the dimension of the embedding is given by the rank k of the Gram matrix G. For more on this and related embedding problems, good starting points are Schoenberg's original paper [41] as well as the book [15].
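Both directions of the theorem are easy to exercise numerically. The following plain-numpy sketch builds D from random points, checks Schoenberg's condition, and recovers an embedding from the centered Gram matrix (the classical multidimensional scaling construction):

```python
import numpy as np

# Sketch: squared-distance matrix of random points, Schoenberg's test,
# and recovery of an embedding via classical multidimensional scaling.
np.random.seed(2)
n, k = 6, 3
pts = np.random.randn(n, k)
D = np.square(np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2))

J = np.eye(n) - np.ones((n, n)) / n   # projector onto the space orthogonal to e
print(np.linalg.eigvalsh(J @ D @ J).max() <= 1e-9)   # D neg. semidef. there

G = -0.5 * J @ D @ J                  # a centered Gram matrix
w, U = np.linalg.eigh(G)
X = U[:, -k:] @ np.diag(np.sqrt(np.clip(w[-k:], 0, None)))
print(np.allclose(np.linalg.norm(X[:, None] - X[None, :], axis=2)**2, D))
```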
Exercise 2.50. Consider the Euclidean distance matrix characterization in Theorem 2.49. Show that it implies the triangle inequality $d_{ik} \le d_{ij} + d_{jk}$ for all triples $(x_i, x_j, x_k)$ of points. Is the converse true?

2.2.6 Rank Minimization and Nuclear Norm

An interesting class of optimization problems appearing in many application domains is rank minimization problems. These have the form
\[
\begin{array}{ll} \text{minimize} & \operatorname{rank} X \\ \text{subject to} & X \in \mathcal{C}, \end{array} \tag{2.37}
\]
where the matrix $X \in \mathbb{R}^{m \times n}$ is the decision variable, and $\mathcal{C}$ is a given convex constraint set. Notice that the cost function is integer-valued, and thus (unless the problem is trivial) these optimization problems are not convex.
Rank minimization questions arise in many different areas, since notions such as order, complexity, and dimensionality can often be expressed by means of the rank of an appropriate matrix. For example, a low-rank matrix could correspond to a low-degree statistical model for a random process (e.g., factor analysis), a low-order realization of a linear dynamical system, or a low-dimensional embedding of data in Euclidean space (as in Section 2.2.5). If the set of models that satisfy the desired constraints is convex, then choosing the simplest one in a given family can be formulated as a rank minimization problem of the form (2.37).

In general, rank minimization problems can be quite difficult to solve, both in theory and practice. However, several researchers have proposed heuristic techniques to obtain good approximate solutions. A particularly interesting method is the nuclear norm heuristic, originally proposed in [17, 16]. In this method, instead of directly solving the problem (2.37), one solves
\[
\begin{array}{ll} \text{minimize} & \|X\|_* \\ \text{subject to} & X \in \mathcal{C}, \end{array} \tag{2.38}
\]
where $\|\cdot\|_*$ is the nuclear norm defined earlier in (2.11). In other words, the difficult objective function (rank) is replaced by a nicer cost function (nuclear norm) which is convex, and thus the resulting problem is convex.

Under certain conditions on the set $\mathcal{C}$, it has been shown that the solution of the problem (2.38) coincides with the lowest-rank solution, i.e., the true solution of (2.37). For example, a typical formulation (see, e.g., [39] for a specific statement) would establish that if the set $\mathcal{C}$ is a subspace of dimension O(n log n), uniformly chosen according to a natural rotation-invariant probability measure, then the nuclear norm heuristic succeeds with high probability.
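A typical instance of (2.38) is low-rank matrix completion. The following is a sketch assuming the cvxpy package (whose normNuc atom expresses the nuclear norm), with synthetic data of our choosing:

```python
import cvxpy as cp
import numpy as np

# Sketch (assumes cvxpy): nuclear norm minimization (2.38) for matrix
# completion; the ground truth and observation mask are synthetic.
np.random.seed(3)
m, n, r = 8, 8, 2
M = np.random.randn(m, r) @ np.random.randn(r, n)   # rank-2 ground truth
mask = (np.random.rand(m, n) < 0.7).astype(float)   # observed entries

X = cp.Variable((m, n))
prob = cp.Problem(cp.Minimize(cp.normNuc(X)),
                  [cp.multiply(mask, X - M) == 0])
prob.solve()
print(np.linalg.matrix_rank(X.value, tol=1e-6))     # often recovers rank 2
```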
Atomic norms. An interesting generalization of these methods is obtained by considering more general atomic norms [10]. Consider a set $\mathcal{A}$ of atoms $v_i$ in some vector space V (the set $\mathcal{A}$ can be finite or infinite). Given an element $a \in V$, we are interested in the smallest decomposition of a in terms of the elements $v_i$, i.e., the one that solves
\[
\begin{array}{ll} \text{minimize} & \sum_i |\lambda_i| \\ \text{subject to} & a = \sum_i \lambda_i v_i. \end{array} \tag{2.39}
\]
We can then define the atomic norm $\|a\|_{\mathcal{A}}$ as the optimal value of this optimization problem. If the set of atoms is finite, this is a linear programming problem. In most situations of interest, however, the set $\mathcal{A}$ is either infinite or exponentially large, in which case an LP formulation is impractical. In certain cases, however, we can still compute this norm efficiently. For instance, in the case where the set of atoms $\mathcal{A}$ consists of the rank one matrices $uv^T$ with $\|u\| = \|v\| = 1$, this norm is exactly the matrix nuclear norm defined earlier.
For many problems, however, we would like to consider more general sets of atoms. A particularly interesting case is when the atoms are the rank one matrices with ±1 entries. In other words, the atoms are given by $\mathcal{A} = \{vw^T \in \mathbb{R}^{m \times n} : v \in \mathbb{R}^m, \; v_i^2 = 1, \; w \in \mathbb{R}^n, \; w_i^2 = 1\}$. In this case, the norm (2.39) is in general NP-hard to compute. However, a nice computable approximation is available, known as the $\gamma_2$- or max-norm. This norm is defined as $\|A\|_{\gamma_2} := \max_{\|u\|=1, \|v\|=1} \|A \circ uv^T\|_*$, where $\circ$ is the entrywise product, and can be computed as the optimal value of the primal-dual pair of semidefinite programs
\[
\begin{array}{ll}
\text{maximize} & \operatorname{Tr} A^T Y \\
\text{subject to} & \begin{bmatrix} \operatorname{Diag}(p) & Y \\ Y^T & \operatorname{Diag}(q) \end{bmatrix} \succeq 0, \\[1ex]
& \displaystyle\sum_{i=1}^{m} p_i + \sum_{i=1}^{n} q_i = 2,
\end{array}
\qquad
\begin{array}{ll}
\text{minimize} & t \\
\text{subject to} & \begin{bmatrix} V & A \\ A^T & W \end{bmatrix} \succeq 0, \\[1ex]
& V_{ii} = t, \quad W_{ii} = t.
\end{array} \tag{2.40}
\]

It can be easily seen that (2.40) gives a lower bound on the optimal value of (2.39), i.e., $\|A\|_{\gamma_2} \le \|A\|_{\mathcal{A}}$. Indeed, if $A = \sum_i \lambda_i v_i w_i^T$, where the $v_i$ and $w_i$ are ±1 vectors, then choosing $V = \sum_i |\lambda_i| v_i v_i^T$, $W = \sum_i |\lambda_i| w_i w_i^T$, and $t = \sum_i |\lambda_i|$ gives a feasible solution for the right-hand side of (2.40). As discussed in Exercise 2.53, the $\gamma_2$-norm actually yields a constant approximation ratio to the atomic norm for this specific set of atoms. The $\gamma_2$-norm is of great importance in a number of applications, including communication complexity; see, e.g., [27].
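The minimization side of (2.40) is immediate to state in a modeling language. A sketch assuming the cvxpy package, with a random test matrix:

```python
import cvxpy as cp
import numpy as np

# Sketch (assumes cvxpy): the gamma_2 norm of A via the minimization
# side of (2.40); A is a random test matrix.
np.random.seed(4)
m, n = 3, 4
A = np.random.randn(m, n)

t = cp.Variable()
V = cp.Variable((m, m), symmetric=True)
W = cp.Variable((n, n), symmetric=True)
M = cp.bmat([[V, A], [A.T, W]])
prob = cp.Problem(cp.Minimize(t),
                  [M >> 0, cp.diag(V) == t, cp.diag(W) == t])
prob.solve()
print(prob.value)   # the gamma_2 (max-) norm of A
```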
Exercise 2.51. Check that the expression (2.39) correctly defines a matrix norm by verifying homogeneity and the triangle inequality. What properties are needed on the atom set $\mathcal{A}$ to ensure that the norm is well defined and nonzero at every nontrivial point?
Exercise 2.52. Let the set of atoms $\mathcal{A}$ be the rank one matrices of the form $vw^T$, where $\|v\| = \|w\| = 1$. Show that the corresponding atomic norm is the standard nuclear norm (sum of singular values).
Exercise 2.53. Using the results in Section 2.2.2, show that in the case where the atoms are the rank one matrices with ±1 entries, the following inequality holds:
\[
\|A\|_{\gamma_2} \le \|A\|_{\mathcal{A}} \le K_G \|A\|_{\gamma_2},
\]
where $K_G$ is the Grothendieck constant.


Exercise 2.54. Based on the previous exercise, explain the geometric relationship between the unit ball of the $\gamma_2$-norm in $\mathbb{R}^{m \times n}$ and the elliptope $\mathcal{E}_{m+n}$ defined earlier in Section 2.1.3.

2.3 Algorithms and Software

2.3.1 Algorithms

In this section we describe a few algorithmic and complexity aspects of the numerical solution of semidefinite optimization problems. For a complete treatment, we refer the reader to articles and monographs such as [13, 32, 44, 45].

Semidefinite programs are convex optimization problems and, as such, can be solved using general convex optimization techniques. Under natural assumptions (e.g., to rule out doubly exponentially small solutions), semidefinite optimization is solvable in polynomial time, in the sense that $\epsilon$-suboptimal, weakly feasible solutions can be computed in time polynomial in $\log \frac{1}{\epsilon}$. This follows, for instance, from general results about the ellipsoid method [21].

Despite these nice theoretical results, the ellipsoid method is often too slow in practice. Since SDP is a generalization of linear programming, it is natural that some of the most effective practical methods for SDP have been inspired by state-of-the-art techniques from LP. This has led to the development of interior-point methods [1, 32] for SDP. The basic idea of interior-point methods is to consider the optimality conditions of Lemma 2.12 and to perturb the complementarity slackness condition to $(C - \sum_i A_i y_i) X = \mu I$. As $\mu$ varies, these equations implicitly define a curve $(X_\mu, y_\mu)$ called the central path, and to solve the original problem we need to compute $(X_\mu, y_\mu)$ as $\mu \to 0$. These equations are relatively easy to solve for large $\mu$, and by carefully decreasing the value of $\mu$, it is possible to use Newton's method to efficiently track solutions as $\mu$ decreases to zero. There are several different versions of these methods (depending on the exact form of the equations to which Newton's method is applied), although they all share fairly similar features. In particular, primal-dual interior-point methods of this kind are among the most efficient known methods for small- and medium-scale SDP problems.

Besides interior-point methods, there are several alternative techniques for solving SDPs that are sometimes preferable to pure primal-dual methods due to speed or memory efficiency issues. Examples of these are techniques based on low-rank factorizations [9], spectral bundle methods [23], or augmented Lagrangian methods for large-scale problems [46], among others.

2.3.2 Software

There are a number of useful software packages for polyhedral computations, linear and semidefinite programming, and algebraic visualization. We present below a partial annotated selection. A few good up-to-date web resources for general information about semidefinite programming include Christoph Helmberg's SDP page www-user.tu-chemnitz.de/~helmberg/semidef.html and the SDPA website sdpa.sourceforge.net.


Polyhedral computations. The first class of software packages we discuss is polyhedral manipulation codes and libraries. Almost all of them allow us to convert an inequality representation of a polyhedron (usually called an H-representation) into vertices/extreme rays (V-representation), and vice versa, as well as much more complicated operations between polyhedra.

cdd, by Komei Fukuda. www.ifor.math.ethz.ch/~fukuda/cdd_home

lrs, by David Avis. cgm.cs.mcgill.ca/~avis/C/lrs.html

polymake, by Ewgenij Gawrilow and Michael Joswig (main authors). polymake.org

PORTA, by Thomas Christof and Andreas Löbel. typo.zib.de/opt-long_projects/Software/Porta
Linear programming. For formulating and solving linear programs, many codes are available, ranging from academic implementations suitable for relatively small problems to industrial-scale solvers. The following is a necessarily partial list:

GLPK, the GNU Linear Programming Kit. www.gnu.org/s/glpk. This is an open-source package for solving large-scale linear programming problems, using either simplex or interior-point methods. GLPK can also solve integer programming problems and can be used as a callable C library.

CLP, an LP solver, part of the COIN-OR (COmputational INfrastructure for Operations Research) suite of open source software. www.coin-or.org

CPLEX. Perhaps the best-known commercial solver, now being developed and marketed by IBM.

Semidefinite programming. Although SDP is much more recent than linear programming, fortunately many good software packages are already available. Among the most well known are the following:

CSDP, originally by Brian Borchers, now a COIN-OR project. projects.coin-or.org/Csdp

SDPA, by the research group of Masakazu Kojima. sdpa.sourceforge.net. Several versions of the SDPA solver are available, including parallel and variable-precision floating-point arithmetic, in MATLAB and C++ versions.

SDPT3, by Kim-Chuan Toh, Reha Tütüncü, and Michael Todd. www.math.nus.edu.sg/~mattohkc/sdpt3.html. SDPT3 is a MATLAB package for linear, quadratic, and semidefinite programming. It can also handle determinant maximization problems, as well as problems with complex data.

SeDuMi, originally by Jos Sturm, currently maintained by the optimization group at Lehigh University (sedumi.ie.lehigh.edu), is a widely used MATLAB package for linear, quadratic, second-order conic, and semidefinite optimization, and any combination of these.

An easy and convenient way to try out many of these packages, without installing them on a local machine, is through the NEOS Optimization Server (neos-server.org), currently hosted by the University of Wisconsin-Madison.

Parsers. In practice, specifying a semidefinite programming problem by explicitly defining the matrices $A_i$, C, and b in (SDP-P) can be cumbersome and error-prone. A much more convenient and reliable way is to use a natural description of the variables and inequalities and to automatically translate these into standard form using a parser or modeling language. Two well-known and convenient modeling environments for semidefinite programming are the following:

CVX, by Michael Grant and Stephen Boyd. cvxr.com/cvx. CVX is a MATLAB-based disciplined convex programming software. It is particularly well suited to conic optimization, including semidefinite and geometric programming.

YALMIP, by Johan Löfberg. yalmip.org. YALMIP is a MATLAB-based parser and solver for the modeling and solution of convex and nonconvex optimization problems.

Bibliography
[1] F. Alizadeh. Interior point methods in semidenite programming with applications to combinatorial optimization. SIAM J. Optim., 5(1):1351, 1995.
[2] N. Alon and A. Naor. Approximating the cut-norm via Grothendiecks inequality. In Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of
Computing, ACM, New York, 2004, pp. 7280.
[3] J.A. Ball, I. Gohberg, and L. Rodman. Interpolation of Rational Matrix Functions. Birkhauser, Basel, 1990.
[4] G.P. Barker and D. Carlson. Cones of diagonally dominant matrices. Pacic
J. Math., 57(1):1532, 1975.
[5] D. Bertsimas and J. N. Tsitsiklis. Introduction to Linear Optimization. Athena
Scientic, Cambridge, MA, 1997.
[6] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan. Linear Matrix Inequalities in System and Control Theory, Studies in Applied Mathematics 15. SIAM,
Philadelphia, 1994.
[7] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University
Press, Cambridge, UK, 2004.

i
i

44

main
2012/11/1
page 44
i

Chapter 2. Semidenite Optimization

[8] M. Braverman, K. Makarychev, Y. Makarychev, and A. Naor. The Grothendieck constant is strictly smaller than Krivines bound. In the IEEE
52nd Annual Symposium on Foundations of Computer Science (FOCS), IEEE,
Washington, DC, 2011, pp. 453462.
[9] S. Burer and R. D.C. Monteiro. A nonlinear programming algorithm for solving
semidenite programs via low-rank factorization. Mathematical Programming,
95(2):329357, 2003.
[10] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A.S. Willsky. The convex geometry of linear inverse problems. Foundations of Computational Mathematics,
12:805849, 2012.
[11] J. Chen, C.N. Nett, and M.K.H. Fan. Worst case system identication in H :
Validation of a priori information, essentially optimal algorithms, and error
bounds. IEEE Transactions on Automatic Control, 40(7):12601265, 1995.
[12] V. Chvatal. Linear Programming. W.H. Freeman, New York, 1983.
[13] E. de Klerk. Aspects of Semidenite Programming: Interior Point Algorithms
and Selected Applications, Applied Optimization 65. Kluwer Academic Publishers, Dordrecht, The Netherlands, 2002.
[14] P. Delsarte, Y. Genin, and Y. Kamp. On the role of the NevanlinnaPick
problem in circuit and system theory. International Journal of Circuit Theory
and Applications, 9(2):177187, 1981.
[15] M. M. Deza and M. Laurent. Geometry of Cuts and Metrics, Algorithms and
Combinatorics 15. Springer-Verlag, Berlin, 1997.
[16] M. Fazel. Matrix Rank Minimization with Applications. Ph.D. thesis, Stanford
University, Stanford, CA, 2002.
[17] M. Fazel, H. Hindi, and S.P. Boyd. A rank minimization heuristic with application to minimum order system approximation. In Proceedings of the American
Control Conference, volume 6, IEEE, Washington, DC, 2001, pp. 47344739.
[18] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the
Theory of NP-Completeness. W. H. Freeman, New York, 1979.
[19] M. X. Goemans. Semidenite programming in combinatorial optimization.
Math. Programming, 79(13):143161, 1997.
[20] M. X. Goemans and D. P. Williamson. Improved approximation algorithms
for maximum cut and satisability problems using semidenite programming.
Journal of the ACM, 42(6):11151145, 1995.
[21] M. Grotschel, L. Lov
asz, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization, 2nd ed., Algorithms and Combinatorics 2. Springer-Verlag,
Berlin, 1993.

i
i

Bibliography

main
2012/11/1
page 45
i

45

[22] O. G
uler. Hyperbolic polynomials and interior point methods for convex programming. Math. Oper. Res., 22(2):350377, 1997.
[23] C. Helmberg and F. Rendl. A spectral bundle method for semidenite programming. SIAM Journal on Optimization, 10(3):673696, 2000.
[24] J.W. Helton. Operator Theory, Analytic Functions, Matrices, and Electrical
Engineering. CBMS Regional Conference Series in Mathematics 68. AMS,
Providence, RI, 1987.
[25] J.L. Krivine. Constantes de Grothendieck et fonctions de type positif sur les
spheres. Adv. Math, 31:1630, 1979.
[26] M. Laurent and S. Poljak. On a positive semidenite relaxation of the cut
polytope. Linear Algebra and Its Applications, 223:439461, 1995.
[27] T. Lee and A. Shraibman. Lower bounds in communication complexity. Foundations and Trends in Theoretical Computer Science, 3(4), 2009.
[28] L. Lovasz. On the Shannon capacity of a graph. IEEE Transactions on Information Theory, 25(1):17, 1979.
[29] J. Matousek and B. Gartner. Understanding and Using Linear Programming.
Springer-Verlag, New York, 2007.
[30] A. Megretski. Relaxations of quadratic programs in operator theory and system
analysis. In Systems, Approximation, Singular Integral Operators, and Related
Topics (Bordeaux, 2000), Oper. Theory Adv. Appl. 129. Birkh
auser, Basel,
2001, pp. 365392.
[31] Y. Nesterov. Semidenite relaxation and nonconvex quadratic optimization.
Optimization Methods and Software, 9:141160, 1998.
[32] Y. E. Nesterov and A. Nemirovski. Interior Point Polynomial Methods in Convex Programming, Studies in Applied Mathematics 13. SIAM, Philadelphia,
1994.
[33] J. Nie, P. A. Parrilo, and B. Sturmfels. Semidenite representation of the
k-ellipse. IMA Volumes in Mathematics and Its Applications, 146:117132,
2008.
[34] A. Packard and J. C. Doyle. The complex structured singular value. Automatica
J. IFAC, 29(1):71109, 1993.
[35] P. A. Parrilo, M. Sznaier, R.S. Sanchez Pe
na, and T. Inanc. Mixed
time/frequency-domain based robust identication. Automatica J. IFAC,
34(11):13751389, 1998.
[36] M. V. Ramana. An exact duality theory for semidenite programming and its
complexity implications. Math. Programming, 77(2, Ser. B):129162, 1997.

i
i

46

main
2012/11/1
page 46
i

Chapter 2. Semidenite Optimization

[37] M. V. Ramana, L. Tuncel, and H. Wolkowicz. Strong duality for semidenite


programming. SIAM J. Optim., 7(3):641662, 1997.
[38] A. Rantzer. On the Kalman-Yakubovich-Popov lemma. Systems & Control
Letters, 28:710, 1996.
[39] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-rank solutions
of linear matrix equations via nuclear norm minimization. SIAM Review,
52(3):471501, 2010.
[40] J. Renegar. Hyperbolic programs, and their derivative relaxations. Found.
Comput. Math., 6(1):5979, 2006.
[41] I. J. Schoenberg. Remarks to Maurice Frechets article Sur la denition
axiomatique dune classe despace distancies vectoriellement applicable sur
lespace de Hilbert. Ann. of Math. (2), 36(3):724732, 1935.
[42] A. Schrijver. Theory of Linear and Integer Programming. Wiley, New York,
1986.
[43] C. Shannon. The zero error capacity of a noisy channel. IRE Transactions on
Information Theory, 2(3):819, 1956.
[44] M. Todd. Semidenite optimization. Acta Numerica, 10:515560, 2001.
[45] L. Vandenberghe and S. Boyd. Semidenite programming. SIAM Review,
38(1):4995, 1996.
[46] X.Y. Zhao, D. Sun, and K.C. Toh. A Newton-CG augmented Lagrangian
method for semidenite programming. SIAM Journal on Optimization,
20:17371765, 2010.
[47] K. Zhou, K. Glover, and J. C. Doyle. Robust and Optimal Control. Prentice
Hall, Englewood Clis, NJ, 1995.
[48] G. M. Ziegler. Lectures on Polytopes, Graduate Texts in Mathematics 152.
Springer-Verlag, New York, 1995.

i
i

main
2012/11/1
page 47
i

Chapter 3

Polynomial
Optimization,
Sums of Squares, and
Applications
Pablo A. Parrilo

We begin the study of one of the main themes of the book, namely, the relationships
between nonnegative polynomials, sums of squares, and semidenite programming.
The two key ideas around which this chapter is structured are
sum of squares decompositions of polynomials can be computed using
semidenite programming,
and
the search for infeasibility certicates for real polynomial systems is a
convex problem. Given an upper bound on the degree of the certicates,
they can be found by solving a sum of squares program.
In the rest of this chapter, we dene and explain the basic concepts needed to make
these assertions precise. For this, in Section 3.1 we introduce nonnegative polynomials, sum of squares decompositions, and the notion of sum of squares programs,
followed by a few simple but important applications in Section 3.2. In Section 3.3
we explore how the presence of additional algebraic structure, such as symmetries
or sparsity, enables more ecient computations. We then explain how these results
can be used to provide infeasibility certicates for systems of polynomial inequalities and the important implications for polynomial optimization (Section 3.4). Section 3.5 explores the dual side, including geometric and probabilistic interpretations.
Finally, in Section 3.6, we present additional applications of the methods in diverse
areas of applied mathematics and engineering, concluding with a short discussion
of current software implementations.
47

i
i

48

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications

3.1

Nonnegative Polynomials and Sums of


Squares

3.1.1

main
2012/11/1
page 48
i

Nonnegative Polynomials

We consider polynomials in n variables, with real coecients. A multivariate polynomial p(x1 , . . . , xn ) is nonnegative if it takes only nonnegative values, i.e.,
p(x1 , . . . , xn ) 0

for all (x1 , . . . , xn ) Rn .

(3.1)

The characterization of nonnegativity of multivariate polynomials is a ubiquitous


question throughout mathematics, with many rich and surprising connections.
From the algorithmic and computational viewpoints, perhaps the immediate
rst questions that one can ask include the following:
Decision question. Given a polynomial p(x), how do we decide if it is nonnegative?
Certication. Is it possible to certify nonnegativity eciently? In other words,
imagine you are trying to convince a friend that p(x) is actually nonnegative,
or that it is not. Is there a more ecient way of doing this than having them
run an algorithm themselves?
Complexity. What computational resources are needed to decide polynomial nonnegativity?
Structural questions. What is the structure of the set of nonnegative polynomials?
Before proceeding to answer these questions in the general case, it makes sense
to consider rst a few simple special cases.
Univariate polynomials. A good starting point is the case of polynomials in a
single variable, i.e., when n = 1:
p(x) = pd xd + pd1 xd1 + + p1 x + p0 .

(3.2)

We normally assume that the leading coecient pd is not zero, and occasionally we
will normalize it to pd = 1, in which case we say that p(x) is monic. The roots are
the values of x at which p(x) vanishes. By the fundamental theorem of algebra,
there is a unique factorization
p(x) = pd

d
#

(x xi ),

(3.3)

i=1

where the (complex) roots xi may have multiplicities, i.e., they are not necessarily
all distinct.
How do we decide if p is nonnegative? Clearly, an obvious necessary condition
is that the degree of p(x) be even. Otherwise, if the degree is odd, then either as
x or as x , the polynomial p(x) will become negative.
In some simple cases, it is possible to give direct characterizations.

i
i

3.1. Nonnegative Polynomials and Sums of Squares

main
2012/11/1
page 49
i

49

Example 3.1. Let p(x) = x2 + p1 x + p0 be a monic quadratic polynomial. What


conditions must p1 , p0 satisfy for p(x) to be nonnegative? Since p(x) denes a
convex function that achieves its minimum, it is enough to verify the nonnegativity
condition only for its minimum value. Solving for the minimizer of p(x) by setting
its derivative to zero (i.e., 2x + p1 = 0), we obtain x = p1 /2, p(x ) = p0 p21 /4,
and thus we have
{(p0 , p1 ) : p(x) 0 x R} = {(p0 , p1 ) : 4p0 p21 0}.
Thus, in the special case of polynomials of degree 2, we were able to write an explicit
inequality condition in the coecients of p(x) to ensure its nonnegativity.
What can we say in the general (univariate) case? Reasoning directly in
terms of coecients does not seem too promising. However, it can be easily seen
that nonnegativity imposes strong restrictions on the roots of p(x). Assume the
leading coecient of p(x) is positive. If p(x) 0, then either p(x) has no real roots,
or, if it has real roots, they must have even multiplicity (why?). However, since in
general the roots are nonelementary functions of the coecients of the polynomial,
this approach does not directly yield a good characterization (we will, however, use
this insight later in Section 3.1.3).
There are several explicit algorithms for deciding nonnegativity of univariate
polynomials. These methods will not require the computation of the roots and
may in fact be implemented in exact rational arithmetic. A classical formulation is
based on Sturm sequences; see, e.g., [19]. We describe an alternative technique
instead, known as the Hermite or trace form method; its justication is developed in
Exercise 3.7. Consider a monic univariate polynomial (3.2) and dene its associated
Hermite matrix as the following d d symmetric Hankel matrix:

s1 sd1
s0
d
s1

s2
sd

,
s
=
xkj ,
(3.4)
H1 (p) = .
..
.. . .
k

..
.
.
.
sd1

sd

j=1

s2d2

where, as before, xj are the roots of p(x). The quantities sk are known as the power
sums and, remarkably, can be obtained directly from the coecients of p(x) using
the Newton identities, with no root computation needed; see Exercise 3.5. When
p(x) is monic, the sk are polynomials of degree k in the coecients of p(x).
It turns out that we can count the real roots of p(x) by analyzing the inertia
of its Hermite matrix (see Appendix A for background material on matrix inertia).
The following theorems make this connection precise.
Theorem 3.2. The rank of the Hermite matrix H1 (p) is equal to the number of
distinct (complex) roots. Its signature is equal to the number of distinct real roots.
Theorem 3.3. Let p(x) be a monic univariate polynomial of degree 2d. Then, the
following are equivalent:
1. The polynomial p(x) is strictly positive.

i
i

50

main
2012/11/1
page 50
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications


2. The polynomial p(x) has no real roots.
3. The inertia of the Hermite matrix is I(H1 (p)) = (k, 2d k, k) for some 1
k d.

Recall that the inertia of a matrix can be computed eciently, in polynomial


time, by diagonalization with a congruence transformation (e.g, via the LDLT
decomposition; see Appendix A), so a decision method for strict positivity based
on this theorem can be eectively implemented.
Example 3.4. Consider again the quadratic univariate polynomial p(x) = x2 +
p1 x + p0 . The power sums are s0 = 2, s1 = p1 , and s2 = p21 2p0 . The Hermite
matrix is then


2
p1
H1 (p) =
.
p1 p21 2p0
Let = det H1 (p) = p21 4p0 . The inertia of the Hermite matrix is

(0, 0, 2) if > 0,
(0, 1, 1) if = 0,
I(H1 (p)) =

(1, 0, 1) if < 0,
and thus p is strictly positive if and only if p21 4p0 < 0.
Exercise 3.5. Let p(x) be a monic univariate polynomial as in (3.2). Show that
the power sums sk satisfy the recursive equations:
s0 = d,

sk =

k


(1)j1 pj skj ,

k = 1, 2, . . . .

j=1

These equations are known as the Newton identities.


Exercise 3.6. Show that the determinant of the matrix H1 (p) is (up to a constant)
equal to the discriminant [32] of p(x). Hint: Express det H1 (p) in terms of the roots
of p(x).
Exercise 3.7. Given a univariate polynomial p(x) of degree d, dene the Hermite
quadratic form or trace form H1 (p) : R[x]d  R as
H1 (p)[f ] =

d


f (xi )2 ,

i=1

where x1 , . . . , xd are the roots of p(x).


1. Find a matrix representation of the quadratic form H1 (p).
2. When is H1 (p) singular?

i
i

3.1. Nonnegative Polynomials and Sums of Squares

main
2012/11/1
page 51
i

51

3. Find a factorization of the Hermite matrix in terms of the Vandermonde


matrix of the roots. If necessary, assume that roots xi are all distinct, and
describe the required modications for the general case.
4. Prove Theorem 3.2.
Exercise 3.8. Can you nd a criterion for polynomial nonnegativity (not strict
positivity) based solely on the inertia of the Hermite matrix? Describe your proposed criterion in detail, or explain why additional information may be necessary.
Hint: Consider the polynomials (x + 1)x2 (x 1)3 and (x + 1)2 x2 (x 1)2 .
Exercise 3.9. Find necessary and sucient conditions for the quartic polynomial
p(x) = x4 + p1 x + p0 to be positive for all real values of x. Plot the number of real
roots as a function of the parameters (p0 , p1 ).
Multivariate polynomials. Now we move on to the multivariate case. Let Pn,2d
be the set of nonnegative polynomials in n variables of degree less than or equal
to 2d, i.e.,
Pn,2d = {p R[x]n,2d : p(x) 0 x Rn }.
%
$
coecients, and noticing that the
By identifying a polynomial with its N := n+d
d
constraints p(x) 0 are ane in the coecients of p for every xed x, it follows
directly that Pn,2d is a convex set in R[x]n,2d RN . Furthermore, the following
is true.
Theorem 3.10. The set of nonnegative polynomials Pn,2d is a proper cone (i.e.,
closed, convex, pointed, and solid) in R[x]n,2d RN .
Example 3.11. Consider the case of polynomials of degree 2d = 2, i.e., quadratic
polynomials in n variables. Every such polynomial can be represented as
p(x) =

1 T
x Ax + 2bT x + c,
2

where A S n is a symmetric matrix. It can be shown (Exercise 3.16) that p(x) 0


for all x Rn if and only if


A b
 0.
bT c
n+1
.
Thus, in this case, the set Pn,2 is isomorphic to the positive semidenite cone S+
Notice that for the particular case of a univariate quadratic polynomial p(x) =
p2 x2 + p1 x + p0 , this reduces to the condition


p2
p1 /2
 0.
p1 /2 p0

i
i

52

main
2012/11/1
page 52
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications

This agrees with Example 3.1, which corresponds to the monic case where
p2 = 1.
As we will shortly see, although always convex, the cone of nonnegative polynomials
has a fairly complicated geometry in the general case. In Chapter 4, further features
of this set will be studied in detail.
Exercise 3.12. Prove Theorem 3.10.
Except for special situations like the quadratic case of Example 3.11, it will
not be easy to eciently obtain explicit descriptions of Pn,2d . The reason is that the
algebraic and combinatorial structure of the set of nonnegative polynomials can be
extremely complicated, even though it is a convex set. As a consequence, obtaining
general explicit inequalities (e.g., on the coecients) that dene when a polynomial
is nonnegative can be a very complex, or even hopeless, task.
To understand this situation in more detail, we discuss the algebraic and
geometric situation with the help of a few examples, followed by a discussion of the
computational complexity aspects.
Pn,2d is semialgebraic but is not basic semialgebraic. Recall that in Example 3.1 we provided explicit inequalities for the set P1,2 of univariate quadratics.
Since this description did not include quantiers or logical operations (e.g., set
unions, implications), we obtained a basic semialgebraic set (see Section A.4.4 in
Appendix A). As we will see, such convenient descriptions are not possible in general, since the set of nonnegative polynomials is not basic semialgebraic for 2d 4.
To see why this is the case, consider the following example, describing a particular ane section of P1,4 .
Example 3.13. Let p(x) be the quartic univariate polynomial p(x) = x4 +2ax2 +b.
For what values of a, b is p(x) nonnegative? Since the leading term x4 has even
degree and is strictly positive, p(x) is strictly positive if and only if it has no real
roots. The discriminant1 of p(x) is equal to Disx (p) = 256 b (a2 b)2 . For the
number of real roots to change, the discriminant must vanish, and thus the zero
set of the discriminant partitions the set of parameters (a, b) into regions where
the number of real roots is constant. The subset of (a, b) R2 for which p(x) is
positive corresponds to the case of no real roots, with its closure being the region
of nonnegativity. Notice that (as expected) this subset is convex and is shown in
Figure 3.1.
As the example illustrates, in the univariate case it is easy to see that if p(x)
lies on the boundary of the set P1,2d , then it must have a real root, of multiplicity at
least two. Indeed, if there is no real root, then p(x) is in the strict interior of P1,2d
(small enough perturbations will not create a root), and if it has a simple real root
it clearly cannot be nonnegative. Thus, on the boundary of P1,2d , the discriminant
1 The discriminant Dis (p) of a univariate polynomial p(x) is a polynomial in the coecients of
x
p that vanishes if and only if p has a multiple root. It is dened as the resultant between p(x) and
its derivative p (x); see [32] or [120] for an introduction to polynomial resultants and discriminants.

i
i

3.1. Nonnegative Polynomials and Sums of Squares

main
2012/11/1
page 53
i

53

b
2
1.5

1
0.5

-2

-1.5

-1

-0.5
-0.5

0.5

1.5

Figure 3.1. The discriminant Disx (p) partitions the parameter space (a, b)
into regions where the number of real roots is constant. The numbers indicate how
many real roots the polynomial x4 + 2ax2 + b has whenever (a, b) are in the corresponding region. The shaded set corresponds to the polynomial being nonnegative.

Disx (p) must necessarily vanish. However, it turns out that the discriminant does
not vanish only on the boundary, but it may also vanish at points inside the set;
see Figure 3.1. The algebraic reason is that pairs of complex roots may coincide,
which will cause the discriminant to vanish, even though this does not directly aect
nonnegativity of p.
This situation can create some serious diculties. For instance, even though
we have a perfectly valid analytic expression for the boundary of the set, we cannot
get a good sense of how far we are from the boundary by looking at the absolute
value of the discriminant (this would be very useful for numerical optimization over
Pn,2d ). A more algebraic way of describing the situation is that Pn,2d is a convex set
with the complicating feature that the Zariski closure of the boundary intersects
the interior of the set.
In general, these sets are not very convenient to work with since we cannot
describe them in terms of unquantied inequalities.
Lemma 3.14. The set discussed in Example 3.13 and presented in Figure 3.1 is
not basic semialgebraic.
The fact that Pn,2d is not basic semialgebraic (for 2d 4) means that there
is no description of Pn,2d in terms of a nite collection of polynomial inequalities
{g1 (p ) 0, . . . , gm (p ) 0} in the coecients p . In other words, any characterization of the set Pn,2d using polynomial inequalities must necessarily include logical
operations between sets (e.g., unions, complements) or other similar complications.
Things can be even more complicated than what Figure 3.1 suggests in the
sense that (as opposed to what may be inferred from this gure) in higher dimensions
it is impossible to remove the undesired component (i.e., the discriminant does
not factor, as it did in this example). Consider the case of a quartic polynomial of

i
i

54

main
2012/11/1
page 54
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications

Figure 3.2. The zero set of the discriminant of the polynomial x4 + 4ax3 +
6bx + 4cx + 1. The convex set inside the bowl corresponds to the region of
nonnegativity. There is an additional one-dimensional component inside the set.
2

the form p(x) = x4 + 4ax3 + 6bx2 + 4cx + 1. Its discriminant (up to a nonessential
numerical factor) is the irreducible polynomial
1 27a4 64c3 a3 + 108bca3 54b3 a2 + 36b2 c2 a2 6c2 a2 + 54ba2
+ 108bc3a 180b2ca 12ca + 81b4 27c4 18b2 54b3 c2 + 54bc2 .
The zero set of this discriminant, shown in Figure 3.2, is an algebraic surface that
denes the boundary of a three-dimensional convex set, corresponding to the values
of (a, b, c) for which p(x) is nonnegative. It can be shown that this convex set is the
convex hull of two parabolas, dened parametrically as
'
&
'
&
2t2 1
2t2 + 1
,t ,
t  t,
, t ,
t  t,
3
3
respectively, and that the surface is singular along these parabolas (these correspond
to the cases when the polynomial factors as p(x) = (x2 + 2tx 1)2 ).
From the numerical optimization viewpoint, the presence of extraneous components of the discriminant in the interior of the feasible set is also an important
roadblock for the availability of easily computable barrier functions for these sets
(even in the univariate case). Indeed, every polynomial that vanishes on the boundary of the set P1,2d must necessarily contain the discriminant as a factor. This is a
striking dierence from the case of the nonnegative orthant or the positive semidefinite cone, where the standard barriers are given (up to a logarithm) by products
of the linear constraints or a determinant (which are polynomials). A possible

i
i

3.1. Nonnegative Polynomials and Sums of Squares

main
2012/11/1
page 55
i

55

solution to this problem is to produce nonpolynomial barrier functions, either by


partial minimization from a higher-dimensional barrier (i.e., projection) or other
constructions such as the universal barrier function introduced by Nesterov and
Nemirovski [84].
Remark 3.15. In principle, explicit conditions describing the set Pn,2d can be
obtained via quantier elimination techniques, such as Tarski-Seidenberg, cylindrical
algebraic decomposition (CAD), and related algorithms; see, e.g., [13, 26]. To do
this, consider the quantied formula
x1 x2 xn p(x1 , . . . , xn ) 0,
and eliminate the quantied variables (x1 , . . . , xn ) to obtain a description of Pn,2d as
a semialgebraic set in terms of the coecients of p only. Notice that this shows that
the nonnegativity problem is decidable. Although extremely powerful from the theoretical viewpoint, these methods often run into serious practical diculties, given
their doubly exponential dependence on the number of variables (modern versions
reduce this to doubly exponential in the number of quantier alternations). In practice, they can only be used for problems of fairly small size. A high-quality implementation of these methods is the software QEPCAD [23].
Exercise 3.16. Prove the characterization of nonnegativity of quadratic polynomials given in Example 3.11.
Exercise 3.17. In this exercise, we consider sets that are semialgebraic but not
basic semialgebraic. For much more about this, see [7, 8].
1. Consider the set S = R2 \ T , where T is the open nonnegative orthant T =
{(x, y) R2 : x > 0, y > 0}. Write S as a union of basic semialgebraic sets.
Show that S is not basic semialgebraic.
2. Prove Lemma 3.14.
Exercise 3.18. Recall that an extreme point v of a convex set S is exposed if there
exists a supporting hyperplane H of S such that {v} = S H. Show that the closed
convex set in Figure 3.1 has an extreme point that is not exposed.
Exercise 3.19. Explore the geometry of the convex set in Figure 3.2. In particular,
analyze the swallowtail singularities of the discriminant variety at the points
(1, 1, 1) or (1, 1, 1) and the one-dimensional component that joins them.
Computational complexity. A dierent but related viewpoint on why the set
of nonnegative polynomials is dicult to characterize is based on computational
complexity arguments; see, e.g., [47] for an introduction to computational complexity. The goal here is to quantify the computational resources (e.g., time, memory)
required to decide membership in Pn,2d and, in particular, to understand how these
resources scale as a function of the problem input size.

i
i

56

main
2012/11/1
page 56
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications

Recall the situation of quadratic polynomials discussed in Example 3.11, where


nonnegativity of a quadratic polynomial was shown to be equivalent to the positive semideniteness of a symmetric matrix. Thus, for this case, polynomial nonnegativity (equivalently, membership in Pn,2 ) can be decided in polynomial time
using, for instance, Gaussian elimination, or LDLT , or Cholesky matrix decompositions. Similarly, in the univariate case, there are algorithms that will decide, in
time polynomial in the input size (i.e., the bit-length of the coecients), whether
a univariate polynomial is nonnegative. This can be done, for instance, with minor
variations of the Hermite matrix method described earlier (which, as described, applied only to strict positivity); see [19] for a complete treatment and other related
methods.
Unfortunately, the situation is drastically dierent for multivariate polynomials of degree four or higher. When 2d 4 it is known that deciding polynomial
nonnegativity is an NP-hard problem (for xed degree, as a function of the number
of variables). Essentially, this means that unless the complexity-theoretic statement P = NP holds (which is generally considered very unlikely), there cannot be
a polynomial-time algorithm that can decide whether a polynomial is nonnegative.
This includes, of course, the possibility of writing a small list of explicit conditions
on the coecients.
Exercise 3.20. Give a reduction from any known NP-hard problem (e.g., satisability, independent set, binary integer programming, etc.) to nonnegativity of
multivariate quartic polynomials.
A way out: Describing sets as projections. As we have seen, even in the
univariate case, the set of nonnegative polynomials Pn,2d has fairly complicated
features, such as not being basic semialgebraic. However, it turns out that at least
in some cases one can provide nicer representations.
To do this, we will represent (or approximate) these sets as a projection from
a higher dimensional space, where the object upstairs will have nicer properties, and all complicating features will be a consequence of the projection. As an
example, recall the set discussed in Example 3.13 and Figure 3.1. This is a twodimensional set, describing a particular section of the set of univariate nonnegative
quartics. Although, as we showed, this set is not basic semialgebraic, it is however
the projection of the convex basic semialgebraic set
{(a, b, t) R3 :

b (a t)2 ,

t 0}.

In Figure 3.3 we present a plot of this three-dimensional convex set and its projection
onto the plane (a, b) that gives exactly the set of Figure 3.1.
As we shall see in detail in the next section, this idea will allow us to exactly
represent the set P1,2d of univariate nonnegative polynomials as the projection of
a nice spectrahedral set. Furthermore, the same techniques will make it possible
to obtain good approximations for the set Pn,2d of multivariate nonnegative polynomials. The techniques will be based on the connection between sums of squares
polynomials and semidenite programming.

i
i

3.1. Nonnegative Polynomials and Sums of Squares

main
2012/11/1
page 57
i

57

5
4
t

3
2
1
6

0
4

4
2

0
0

Figure 3.3. A three-dimensional convex set, described by a quadratic and a


linear inequality, whose projection on the (a, b) plane is equal to the set in Figure 3.1.
Exercise 3.21. Recall that the Minkowski sum of two sets S1 , S2 Rn is the
set S1 + S2 := {s1 + s2 : s1 S1 , s2 S2 }. Consider a set S R2 , given by the
Minkowski sum of a disk and a line segment. Show that S is not basic semialgebraic.
Give a representation of S as a projection of a convex semialgebraic set in R3 .
Exercise 3.22. Consider the set
{(x, y, z) R3 : xyz 1,

x 0,

y 0,

z 0}.

1. Is it convex?
2. Is it a spectrahedron?
3. Is it a projected spectrahedron?
Hint: If you need help with item 2, try the real zero condition in Chapter 6.
Exercise 3.23. Prove the validity of the set containment relationships described
in Figure 3.4, and give counterexamples for all noninclusions.

3.1.2

Sums of Squares

A multivariate polynomial p(x1 , . . . , xn ) is a sum of squares (sos) if it can be written


as the sum of squares of some other polynomials. Formally, we have the following.
Denition 3.24. A polynomial p(x) R[x]n,2d is a sum of squares (sos) if there
exist q1 , . . . , qm R[x]n,d such that
p(x) =

m


qk2 (x).

(3.5)

k=1

i
i

58

main
2012/11/1
page 58
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications


  





  



  



 

Figure 3.4. Relationships between set classes.


We will use n,2d for the set of sos polynomials in n variables of degree less
than or equal to 2d. If a polynomial p(x) is a sum of squares, then it obviously
satises p(x) 0 for all x Rn . Thus, an sos condition is a sucient condition for
global nonnegativity, i.e., n,2d Pn,2d .
In general, sos decompositions are not unique.
Example 3.25. The polynomial p(x1 , x2 ) = x21 x1 x22 + x42 + 1 is a sum of squares.
Among innitely many others, it has the decompositions
3
1
(x1 x22 )2 + (x1 + x22 )2 + 12
4
4
1
2 2
1
23
2 2
(9x1 16x22 )2 + x21 .
= (3 x2 ) + x2 +
9
3
288
32

p(x1 , x2 ) =

It quickly follows from its denition that the set n,2d of sos polynomials is
invariant under nonnegative scalings and convex combinations; i.e., it is a convex
cone. In fact, more is true, as follows.
Theorem 3.26. The set of sos polynomials n,2d is a proper cone (i.e., closed,
convex, pointed, and solid) in R[x]n,2d RN .
One of the central questions in convex algebraic geometry is to understand the
relationships between the two cones Pn,2d and n,2d . In the remainder of this chapter, as well as in Chapter 4, we analyze this problem from the algebraic, geometric,
and computational viewpoints.
Exercise 3.27. Consider the sum of squares representation (3.5). Show that if
p(x) has degree 2d, then the polynomials qi necessarily have degree less than or
equal to d, by considering the coecients corresponding to the highest order terms.
Exercise 3.28. Using nitely many squares in Denition 3.24 may seem restrictive
at rst. Show using Caratheodorys theorem
A.10 in Appendix A) that
%
$ (Theorem
.
in Denition 3.24 we can always take m n+d
d

i
i

3.1. Nonnegative Polynomials and Sums of Squares

main
2012/11/1
page 59
i

59

When is nonnegativity equal to sum of squares? Since sum of squares implies


nonnegativity (i.e., n,2d Pn,2d ), a natural question is to understand under what
conditions the converse holds, i.e., when a nonnegative polynomial can be expressed
as a sum of squares. We will study many aspects of this question extensively in this
book, particularly in Chapter 4.
More than a century ago, David Hilbert showed that equality between the
set of nonnegative polynomials Pn,2d and sos polynomials n,2d holds only in the
following three cases:
Univariate polynomials (i.e., n = 1).
Quadratic polynomials (2d = 2).
Bivariate quartics (n = 2, 2d = 4).
For all other cases, there always exist nonnegative polynomials that are not sums
of squares. Perhaps the most celebrated example is the bivariate sextic (n = 2,
2d = 6) due to Motzkin, given by (in dehomogenized form)
M (x, y) = x4 y 2 + x2 y 4 + 1 3x2 y 2 .

(3.6)

This polynomial is nonnegative but is not a sum of squares. Nonnegativity of


M (x, y) follows from the arithmetic-geometric inequality applied to (x4 y 2 , x2 y 4 , 1)
(or, alternatively, from the identity (3.19)) and the fact that it is not a sum of
squares from Exercise 3.97.
The rst two cases (univariate and quadratic) of Hilberts classication are
relatively straightforward and are discussed in Exercises 3.30 and 3.32, respectively.
In Chapter 4 the more subtle remaining case will be proved, along with an in-depth
study of the structure of these sets.
Another immediate question is related to the algorithmic aspects of sos polynomials. Given a polynomial, how can we decide if it is a sum of squares? Equivalently, how can we decide membership in the cone n,2d ? We answer these questions
in the next section, where we describe the connections between sos conditions on
polynomials and semidenite programming.
Exercise 3.29. Let p(x), q(x) be sos polynomials.
1. Show that the sum p(x) + q(x) and the product p(x)q(x) are also sums of
squares.
2. Furthermore, show that if both p(x) and q(x) are each the sum of two squares,
then so is their product p(x)q(x).
Hint: Recall complex multiplication. For w, z C, |w|2 |z|2 = |wz|2 holds.
Consider the real and imaginary parts of this expression.
Exercise 3.30. In this exercise, we show that univariate nonnegative polynomials
are sums of squares, and, in fact, two squares suce.

i
i

60

main
2012/11/1
page 60
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications


1. Show that if p(x) = p2d x2d + +p1 x+p0 is nonnegative, it has a factorization
of the form
#
#
p(x) = p2d
(x rj )nj
[(x zk )(x zk )]mk ,
j

where rj and (zk , zk ) are the real and complex roots of p(x), p2d > 0, and the
multiplicities nj of the real roots are even.
2. Show that if z is a complex number, the quadratic polynomial (x z)(x z )
is a sum of two squares.
3. Use Exercise 3.29 to conclude that p(x) is itself a sum of two squares.
Exercise 3.31. Using the previous exercise, compute a decomposition of p(x) =
x4 + 2x3 + 6x2 22x + 13 as a sum of two squares.
Exercise 3.32. Let p(x1 , . . . , xn ) be a quadratic polynomial (i.e., 2d = 2). Show
that if p(x1 , . . . , xn ) is nonnegative, then it is a sum of squares.
Hint: Recall Example 3.11 and matrix factorizations.

3.1.3

Univariate Polynomials

In this section we explain in detail the computation of sos decompositions of univariate polynomials, with a full discussion of the multivariate case in the next section.
The main reason for starting with the univariate case is that it is notationally
simpler, and it is fairly similar to the general case.
Consider a univariate polynomial p(x) of degree 2d:
p(x) = p2d x2d + p2d1 x2d1 + + p1 x + p0 .

(3.7)

Assume that p(x) is a sum of squares; i.e., it can be written as in (3.5). Notice that
the degree of the polynomials qk must be at most equal to d, since the coecient of
the highest term of each qk2 is positive, and thus there cannot be any cancellation
in the highest power of x (cf. Exercise 3.27). Then, we can write

1
q1 (x)
x
q2 (x)

(3.8)
.. = V .. ,
.
.
qm (x)
xd
where V Rm(d+1) , and its kth row contains the coecients of the polynomial
qk . For future reference, let [x]d be the vector of monomials on the right-hand side
of (3.8), and dene the matrix Q := V T V . We then have
p(x) =

m


qk2 (x) = (V [x]d )T (V [x]d ) = [x]Td V T V [x]d = [x]Td Q[x]d .

k=1

i
i

3.1. Nonnegative Polynomials and Sums of Squares

main
2012/11/1
page 61
i

61

This immediately suggests the following characterization of sos polynomials.


Lemma 3.33. Let p(x) be a univariate polynomial of degree 2d. Then, p(x) is a
sum of squares if and only if there exists a symmetric matrix Q S d+1 that satises
p(x) = [x]Td Q[x]d ,

Q  0.

(3.9)

The matrix Q is usually called the Gram matrix of the sos representation. One
direction of the lemma follows directly from noticing that the matrix Q = V T V constructed above is positive semidenite. For the other direction, assume there exists a
positive semidenite matrix Q for which (3.9) holds. Then, by factorizing Q = V T V
(e.g., via Cholesky or square root factorization), we obtain an sos decomposition
of p(x).
Although perhaps not immediately obvious at rst, the condition in (3.9) is a
semidenite program! Indeed, notice that the constraint p(x) = [x]Td Q[x]d is ane
in the matrix Q, and thus the set of possible Gram matrices Q is given exactly by
the intersection of an ane subspace and the cone of positive semidenite matrices.
To obtain explicit equations for this semidenite program, we index the rows
and columns of Q by {0, . . . , d} as

d 
2d
d




[x]Td Q[x]d =
Qij xi+j =
Qij xk .
i=0 j=0

k=0

i+j=k

Thus, for this expression to be equal to p(x), it must be the case that

pk =
Qij ,
k = 0, . . . , 2d.

(3.10)

i+j=k

This is a system of 2d + 1 linear equations between the entries of Q and the coecients of p(x). Thus, since Q is simultaneously constrained to be positive semidenite, and to belong to the ane subspace dened by these equations, an sos condition
is exactly equivalent to a semidenite programming problem. We have shown, then,
the following.
2d
Lemma 3.34. A univariate polynomial p(x) = k=0 pk xk is a sum of squares if
and only if there exists a positive semidenite matrix Q S d+1 satisfying (3.10).
This is a semidenite programming problem.
Recall that in the univariate case, nonnegativity and sum of squares are equivalent conditions. Thus, Lemma 3.34 completely characterizes the set of univariate
nonnegative polynomials and shows that the set P1,2d = 1,2d is a projected spectrahedron.
Example 3.35. Consider the univariate polynomial
p(x) = x4 + 4x3 + 6x2 + 4x + 5,

i
i

62

main
2012/11/1
page 62
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications

for which we want to nd an sos decomposition. Proceeding as described earlier,


we consider the expression

T
1
q00
p(x) = x q01
q02
x2

q01
q11
q12

q02
1
q12 x
q22
x2

= q22 x4 + 2q12 x3 + (q11 + 2q02 )x2 + 2q01 x + q00 .


Matching coecients, we obtain the following linear equality constraints:
x4 : 1 = q22 ,
x3 : 4 = 2q12 ,
x2 : 6 = q11 + 2q02 ,
x : 4 = 2q01 ,
1 : 5 = q00 .
We need to nd a positive semidenite matrix Q that satises these linear equations
(i.e., solve a semidenite program). In this case, the semidenite program is feasible,
and we can obtain a solution given by

5
Q= 2
0

2 0
6 2 = V T V,
2 1

0 2 1
2 0 ,
V = 2
3 0 0

which yields the sos decomposition


p(x) = (x2 + 2x)2 + 2(1 + x)2 + 3.
In certain special cases, it may be possible to construct sos representations of a
xed polynomial without necessarily having to solve semidenite programs. In
Exercise 3.36 we explore the case of univariate polynomials; see Exercise 3.84 for
an extension of these results to a more complicated situation. However, as we
discuss in Section 3.1.7, the reason why the SDP reformulation is crucial is because
it will allow us to search for sos polynomials, even in the presence of additional
convex constraints.
Exercise 3.36. Consider the following algorithm, which computes an sos decomposition of a monic univariate polynomial, using linear algebra techniques, in a
numerically stable way.
Algorithm 3.1. SOS decomposition of a univariate polynomial.
Input: A monic univariate polynomial p(x) = x2d + + p1 x + p0 .
Output: An sos decomposition of p(x).

i
i

3.1. Nonnegative Polynomials and Sums of Squares


1:

63

Form the companion matrix Cp , dened by

0 0
1 0

Cp := 0 1
.. ..
. .
0 0
2:

main
2012/11/1
page 63
i

..
.

0
0
0
..
.

p0
p1
p2
..
.

1 p2d1

Find a complex Schur decomposition of the companion matrix, i.e.,


Cp = U U =

U11
U21

U12
U22



11
0

12
22



U11
U21

U12
U22


,

where U is unitary, is upper triangular, and the spectra of 11 , 22 are


complex conjugates of each other.
1
3: Let q := vU12 , where v is the rst row of U22 . Let qr and qi be the real and
imaginary parts of q, respectively.
4: Dene

1

 
 x
q1 (x)
qr 1
=
.
q2 (x)
qi 0 ...
xd
5:

return sos decomposition p(x) = q12 (x) + q22 (x).

1. Implement this algorithm, and test it in a few examples.


2. If p(x) is not nonnegative, where does the algorithm fail?
3. Prove that the algorithm is correct, i.e., it always produces a valid sos decomposition.
Hint: What properties does the complex polynomial q(x) = q1 (x) + iq2 (x) have?
Exercise 3.37. The results presented in this section for standard univariate
polynomials can be easily extended to real trigonometric polynomials, i.e., expressions of the form
d

(ak cos k + bk sin k) .
p() = a0 +
k=1

This is a trigonometric polynomial of degree d.


1. Show that if p() 0 for all [, ] and d is even, then there is an sos decomposition p() = q12 () + q22 (), where q1 , q2 are trigonometric polynomials.
What is the corresponding statement for the case when d is odd?

i
i

64

main
2012/11/1
page 64
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications


2. Give a semidenite programming formulation to decide if a trigonometric
polynomial is nonnegative. The formulation should be in terms of a (d + 1)
(d + 1) real symmetric matrix, where d is the degree of the polynomial. It
may be helpful to consider separately the case where d is odd or even.
3. Find an sos decomposition of the polynomial
p() = 4 sin + sin 2 3 cos 2.
4. Find an sos decomposition of the polynomial
p() = 5 sin + sin 2 3 cos 3.

3.1.4

Multivariate Polynomials

The general multivariate case is quite similar to the univariate case discussed in
the previous section. The main dierences are the need of multi-index notation for
monomials, and the fact that sos will only be a sucient condition for nonnegativity.
The number
Consider a polynomial p(x
$ 1 , . .%. , xn ) of degree 2d
in n variables.

.
We
let
p(x)
=
p
x
,
where
are tuples
of coecients of p is equal to n+2d

2d
of exponents {(1 , . . . , n ) : 1 + + n 2d, i 0 i = 1, . .$. , n}.
%
monoLet [x]d := [1, x1 , . . . , xn , x21 , x1 x2 , . . . , xdn ]T be the vector of all n+d
d
mials in x1 , . . . , xn of degree less than or equal to d, and consider the equation
p(x) = [x]Td Q [x]d ,

(3.11)

% $n+d%
$
symmetric matrix. Proceeding exactly as in the previous
where Q is an n+d
d d
$
%
section, and indexing the matrix Q by the n+d
monomials in n variables of ded
gree d (or, more precisely, the associated exponent tuples), we obtain the following
conditions:

p =
Q ,
Q  0.
(3.12)
+=

$n+2d%

This is a system of 2d linear equations, one for each coecient of p(x). As


before, these equations are ane conditions relating the entries of Q and the coecients of p(x). Thus, we can decide membership in, or optimize over, the set of sos
polynomials by solving an SDP problem.
Example 3.38. We want to determine whether the bivariate quartic polynomial
p(x, y) = 2x4 + 5y 4 x2 y 2 + 2x3 y + 2x + 2
is a sum of squares. Since this polynomial has degree 2d = 4, the vector [x]d contains
all monomials of degree less than or equal to 2, i.e., [x]d = [1, x, y, x2 , xy, y 2 ]T . Writing the expression (3.11) for a generic matrix Q (which, for consistency with (3.12),

i
i

3.1. Nonnegative Polynomials and Sums of Squares

main
2012/11/1
page 65
i

65

though perhaps at the expense


T

q00,00
1
x q00,10

y q00,01

p(x, y) = 2

x q00,20
xy q00,11
q00,02
y2

of clarity, we index with exponent tuples), we have

q00,10 q00,01 q00,20 q00,11 q00,02


1

q10,10 q10,01 q10,20 q10,11 q10,02


x
y
q10,01 q01,01 q01,20 q01,11 q01,02
2 .

q10,20 q01,20 q20,20 q20,11 q20,02


x
q10,11 q01,11 q20,11 q11,11 q11,02 xy
q10,02 q01,02 q20,02 q11,02 q02,02
y2
$ %
= 15
Expanding the right-hand side, and matching coecients, we obtain 2+4
4
linear equations, one per each possible coecient of p(x, y). For instance, the equations corresponding to the monomials x4 , x2 y 2 , and y 2 are
x4 :
2 2

x y :
2

y :

2 = q20,20 ,
1 = q00,22 + 2 q01,21 + q11,11 ,
0 = 2 q00,02 + q01,01 .

Again, nding a positive semidenite matrix Q subject to these 15 linear equations


is an SDP problem. Solving it, we obtain a feasible solution:

6 3 0 2 0 2
3 4 0
0 0
0

1
0 0 4
0 0
0
.

Q=
6 3 4
3 2 0 0

0 0 0
3 5
0
2 0 0 4 0 15
Any factorization of this positive semidenite matrix will give an explicit sos decomposition of p(x, y), for instance,
p(x, y) =

4 2 1349 4
1
1
y +
y + (4x + 3)2 + (3x2 + 5xy)2 +
3
705
12
15
1
(21x2 + 20y 2 + 10)2
+
315
1
(328y 2 235)2 .
+
59220

We summarize the contents of this section in the following theorem, describing the
direct relation between positive semidenite matrices and an sos condition.


p x in n variables and de(n+d)


gree 2d is a sum of squares if and only if there exists Q S+ d satisfying (3.12).
As a consequence, membership in n,2d can be decided via semidenite programming.

Theorem 3.39. A multivariate polynomial p(x) =

$n+d%The matrix size of the semidenite program appearing in Theorem 3.39 is


grows polynomially in the number of variables n for xed degree d.
d ,$ which
% $n+d%
Since n+d
=
d
n , it also grows polynomially in d for xed n.

i
i

66

main
2012/11/1
page 66
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications

Corollary 3.40. The cone n,2d of sos polynomials is a projected spectrahedron of


$
%
dimension n+2d
2d .
The connections between sos conditions, the Gram matrix representation, and
convexity can be traced back to the work of Shor [113], as well as Reznick and
collaborators [106, 30]. The links with semidenite programming were made explicit
in [89, 91] and were also explored independently by Nesterov [83] and Lasserre [72].
These results will be of crucial importance in the remainder of the chapter.
Notice in particular the striking constrast with the case of nonnegative polynomials:
while membership in Pn,2d is an NP-hard problem for 2d 4 (and thus, practically
infeasible for most problems of interest), membership in n,2d can be reduced to a
polynomially sized SDP problem.

3.1.5

Computational Formulations

A nice and useful coordinate-free interpretation of our earlier discussion (and in


particular, of (3.11)) is that writing a polynomial of degree 2d as a sum of squares
is equivalent to expressing it as a quadratic form on the vector space of polynomials
R[x]n,d . Although this coordinate-free viewpoint is very advantageous for theoretical work, when solving these problems in practice it is necessary to express the
corresponding semidenite programs in a specic set of coordinates. The choice of
basis, although irrelevant from the mathematical (or exact arithmetic) viewpoint,
may have signicant consequences for the numerical conditioning of the resulting
optimization problem.
When writing down the semidenite programs associated to the sos decomposition of a polynomial, as we did in the previous section, there is an implicit
choice of bases for two vector spaces: one for the space of polynomials R[x]n,d , and
one for the dual space R[x]n,2d . Indeed, in our formulation, the polynomial p(x)
was expressed as a quadratic form on the vector space R[x]n,d , represented by the
matrix Q with respect to the monomial basis [x]n,d ; see (3.11). Similarly, the constraints (3.12) correspond to the coecients of p(x) = [x]Td Q[x]d with respect to the
monomial basis [x]n,2d of R[x]n,2d . While these choices are perhaps canonical,
there are several alternative bases that can be used instead, and these can have
very dierent algebraic and numerical properties.
Based on this discussion, we can write the following more general SDP formulation for sums of squares.
Theorem 3.41. Let p(x) be a polynomial in n variables and degree 2d. Choose
bases {v1 , . . . , vs } and {w1 , . . . , wt } of R[x]n,d and R[x]n,2d , respectively (and thus,
%
$
%
$
and t = n+2d
s = n+d
d
2d ). Then, p(x) is a sum of squares if and only if there
n+d
exists a positive semidenite matrix Q S ( d ) satisfying the ane constraints:
p(x), wk  =

s


Qij vi vj , wk ,

k = 1, . . . , t.

i,j=1

i
i

3.1. Nonnegative Polynomials and Sums of Squares

main
2012/11/1
page 67
i

67

From the exact arithmetic viewpoint, this statement is of course completely


equivalent to Theorem 3.39 and simply corresponds to a change of basis in both primal and dual variables of the corresponding semidenite program. However, when
solving the corresponding semidenite programs in oating-point arithmetic, there
may be very signicant dierences in the numerical stability of the corresponding
formulations. When choosing a particular basis {vi } for the space of polynomials,
it will often be convenient to pick {wj } as the corresponding dual basis, i.e., so it
satises vi , wj  = ij . However, this need not always be the case, and there may
be advantages (numerical or otherwise) in not doing so.
In what follows, we discuss four specic bases for the space of polynomials
R[x]n,d , briey mentioning some of their relative advantages and disadvantages.
Monomial basis. This basis is given by the monomials
Bm = {x },
where = (1 , . . . , n ), with || d. This is perhaps the most usual choice,
and as we did in Section 3.1.4, much of the literature in this area implicitly or explicitly uses this basis. While convenient from the notational and
implementation viewpoints, it can have very poor numerical properties.
Scaled monomials. A small modication of the monomial basis is given by the
scaled monomial basis. The main motivation for this is to achieve certain
natural and appealing invariance properties, as explained below. This basis
is dened as
& ' 1

d 2
x
,
Bs =

$ %
$ d %
where d denotes the multinomial coecient 1 ,...,
= 1 !2d!!...n ! .
n
The rationale behind this choice is the following: consider the inner product
between polynomials given by




 & d '1

p(x), q(x) :=
p x ,
q x
p q .
=

This inner product is known under many dierent names, such as the apolar, Fischer, Calder
on, or Bombieri inner product. Its dening property is
the direct relationship between powers of linear forms and point evaluations.
Indeed, if p is a homogeneous polynomial of degree d, we have
 & d '1 & d '
T
d
p(x), (v x)  =
p
v = p(v).

As a consequence, this inner product satises the invariance property


p(Ax), q(x) = p(x), q(AT x),
where A is an n n matrix.

i
i

68

main
2012/11/1
page 68
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications


The scaled monomial basis is simply an orthonormal basis with respect to this
invariant inner product.

Orthogonal polynomials. Similar to the previous case, assume that there is a


naturally dened inner product in the space of polynomials. In this case, a
natural choice is to pick an orthonormal basis with respect to this inner product. Many well-known families of polynomials (e.g., Chebyshev, Lagrange,
Gegenbauer, etc.) fall into this class.
As a concrete illustration, consider the case of an inner product that is induced
by integration against a strictly positive measure. For instance, in the case
of univariate polynomials, for certain problems it may be natural to have an
inner product dened by the Gaussian measure, i.e.,
,
x2
1
p(x)q(x)e 2 dx.
p(x), q(x) =
2
For this example, such an orthonormal family would be the well-known Hermite polynomials.
Orthogonal polynomials generally enjoy much nicer numerical stability properties than the monomial basis. This is particularly true whenever the underlying measure is chosen in an appropriate way, consistent with the problem
to be solved.
Lagrange interpolation. Yet another choice is given by Lagrange interpolating
polynomials with respect to a given xed set of nodes. For simplicity, we
discuss here the univariate case only, although the discussion extends naturally
to the multivariate case.
Fix d + 1 distinct points x0 , . . . , xd in R. It is well known that the Lagrange
interpolating polynomials
i (x) :=

# x xk
,
xi xk

i = 0, . . . , d,

k =i

form a basis of R[x]1,d . Also of interest is that the corresponding dual basis of
the dual space R[x]1,d is then given by the point evaluations xi that satisfy
xi (p) = p(xi ).
This choice is particularly appealing in the case where the polynomial is presented in terms of its values at a given set of points, instead of an explicit
description in terms of coecients. This approach also has some convenient
numerical properties related to the use of interior-point methods in the solution of the corresponding semidenite programs; see [75] for more details.
Exercise 3.42. Consider a univariate cubic polynomial p(x) on the interval [a, b],
for which we want to describe the convex hull of its graph, i.e., the set
%
$
S = conv {(t, p(t)) R2 : t [a, b]} .

i
i

3.1. Nonnegative Polynomials and Sums of Squares

main
2012/11/1
page 69
i

69

Figure 3.5. Convex hulls of the graphs of cubic polynomials on an interval.


See Figure 3.5 for a few examples. We provide below a description of the set S as
a projected spectrahedron. Dene the interpolation points
x1 = a,

1
x2 = a + (b a),
4

3
x3 = a + (b a),
4

x4 = b.

Consider the spectrahedron in the variables (x, y, 1 , 2 , 3 , 4 ) R6 dened by






1 2
0
32
+ 3
 0,
2 4
0
124




0
1 2
33
+ 2
 0,
0
121
2 4
4

i=1

i = 1,

4


i xi = x,

i=1

4


i p(xi ) = y.

i=1

The set S is then given by the projection of this spectrahedron onto the variables
(x, y). Notice that in this description, the explicit expression of the polynomial p(x)
is never used, but instead only the interpolation values p(xi ) appear.
1. Prove the validity of this description using an sos formulation based on Lagrange interpolation.
2. Generalize this representation to univariate polynomials of any degree.

3.1.6

Rational Sos Decompositions

We have seen in previous sections how to compute sos decompositions using semidefinite programming. These convex optimization problems are usually solved numerically, using oating-point arithmetic. Although oating-point techniques in
principle allow for numerical approximations of arbitrary precision, the computed
solutions will typically not be exact. This may mean, for instance, that the equation
p(x) = [x]Td Q[x]d is only approximately satised, or that the matrix Q may have
very small negative eigenvalues.
In many applications, particularly those arising from problems in pure mathematics, it is desirable or necessary to obtain exact solutions. Examples of this are

i
i

70

main
2012/11/1
page 70
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications

the use of sos methods for geometric theorem proving (e.g., Section 3.6.5) for establishing the validity of certain algebraic inequalities between matrices [68], or a case
of the monotone column permanent (MCP) conjecture [64]. A remarkable recent
application is the work in [10], where sos methods were used to prove new upper
bounds on kissing numbers, a well-known problem in sphere packings. A common
element in all these works is the use of exact algebraic identities obtained from
inspection of a numerically computed solution as the basic ingredients in a rigorous
proof.
In this section, we show that under a strict feasibility assumption, we can obtain a rational sos representation from an approximate solution to the semidenite
program of Theorem 3.39. The basic idea is to round and project the numerically
obtained Gram matrix onto the feasible subspace. We quantify the relation between the numerical error in the subspace and semidenite constraints, versus the
rounding tolerance, that will guarantee that the rounded and projected solution
remains feasible. For a full exposition of these ideas, as well as alternative approaches and improvements, we refer the reader to [98], [60], [65], and the references
therein.
To obtain rational sos decompositions, it is enough to focus on rational Gram
matrices. This follows from the LDLT decomposition; see Exercise 3.46.

Theorem 3.43. There exists a rational sos decomposition, i.e., p(x) = i pi (x)2 ,
where pi (x) Q[x], if and only if there is a Gram matrix with rational entries.
The approach we will use to obtain rational sums of squares is to take advantage of interior point solvers computational eciency: we rst compute an approximate numerical solution, and in a second step we round this numerical solution to
an exact rational one. We have the following standing assumption.
Assumption. There exists a positive denite Gram matrix Q for p(x).
This assumption is equivalent to the polynomial p(x) being in the interior of
the cone of sums of squares. The method described here could fail in general for
sums of squares that are not strictly positive: if there is an x such that p(x ) = 0,
it follows from the identity p(x ) = [x ]Td Q[x ]d that the monomial vector [x ]d
is in the kernel of Q. Hence Q cannot be positive denite. Nevertheless, this
assumption is reasonable for many problems of interest. Furthermore, very recent
work of Scheiderer [108] shows that this assumption (or a similar one) is required
by giving a construction of sos polynomials with rational coecients for which no
rational decompositions exist.
We assume the sos problem is posed as a semidenite problem in primal form,
as described in Section 3.1.4. After solving the SDP problem in general the numerical solution Q will not exactly satisfy (3.11). For an exact representation of
the original polynomial p(x), we have to nd a rational approximation to Q which
satises the equality constraints. The simplest procedure is to compute a ratio either by naive rounding or more sophisticated techniques
nal approximation Q,
is then projected onto the
like continued fractions. This rational approximation Q
subspace dened by the equations. Since this subspace is dened by rational data

i
i

3.1. Nonnegative Polynomials and Sums of Squares

main
2012/11/1
page 71
i

71

( Q )

( Q )

PSD

L

Figure 3.6. Projection of a rounded solution. The matrix Q is the numerical solution of the SDP problem, and the orthogonal projections of the matrices Q
onto the subspace L are denoted by (Q) and (Q),
respectively. The shaded
and Q
cone PSD represents the cone of positive semidenite matrices.

(the coecients of p(x)), an orthogonal projection onto this subspace will yield
see Exercise 3.47.
a rational matrix (Q);
Now we obtain conditions to ensure that the truncated and projected matrix
remains positive semidenite. For this, we will estimate the rounding toler(Q)
ance needed. Assuming strict feasibility of the numerical solution Q returned by the
SDP solver, we quantify how far it is from the boundary of the positive semidenite
cone and the ane subspace through two parameters and . The parameter > 0
will satisfy Q  I and is a lower bound on the minimum eigenvalue of Q. The parameter quanties the distance of Q to the subspace, and thus
Q (Q)
F ,
where

F denotes the Frobenius norm. The matrix Q will be approximated by
such that
Q Q

F , where is the desired tolerance.


a rational matrix Q
Figure 3.6 depicts the whole situation.
Theorem
3.44. Let , , and be dened as above. Assume < , and choose

of the rounded matrix Q


onto
2 2 . Then, the orthogonal projection (Q)
the ane subspace L is rational and positive semidenite, and thus it is a valid
rational sos decomposition.
Hence if the SDP problem is strictly feasible, and the numerical solution Q
satises < , it is in principle always possible to compute a valid rational solution
by using suciently many digits for the approximated solution. The allowed rounding tolerance depends on the minimum eigenvalue of the positive denite matrix
Q and its distance from the ane space L. Under the strict feasibility assumption,
there always exists a solution with suciently small such that the inequality above
can be fullled (in particular, we can just take = 0, although using larger values
of , if possible, will yield solutions with smaller denominators).

i
i

main
2012/11/1
page 72

72

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications

As described in [96], these ideas have been implemented in the software package SOS.m2 for the computer algebra system Macaulay 2 [54]. This package can be
used to compute rational sos decompositions and is available for download at [97].
Similar concepts have been recently implemented by Harrison in the open source
theorem prover HOL Light [60].
In SOS.m2, the main function is getSOS, which tries to compute a rational sos
decomposition for a given polynomial. In the following example we demonstrate how
to use the getSOS command for computing an sos decomposition of a polynomial
of degree 4 with 4 variables.
Example 3.45. Consider the polynomial
p(x, y, z, w) = 2x4 + x2 y 2 + y 4 4x2 z 4xyz 2y 2 w + y 2 2yz + 8z 2 2zw + 2w2 .
We rst load the SOS package and dene p(x, y, z, w):
i1 : loadPackage "SOS";
i2 : P = QQ[x,y,z,w];
i3 : p = 2*x^4 + x^2*y^2 + y^4 - 4*x^2*z - 4*x*y*z - 2*y^2*w +
y^2 - 2*y*z + 8*z^2 - 2*z*w + 2*w^2;

If successful, the
 function getSOS returns a weighted sos representation such that
p(x, y, z, w) = i di gi (x, y, z, w)2 . Otherwise an error message is displayed.
i4 : (g,d) = getSOS p
... omitted output ...
1 2
1
1
1
2 2
2
8 2
1
o8 = ({- -*x - -*x*y - -*y + z - -*w, - --*x - --*x*y - --*y - --*y + w,
4
4
8
8
15
15
15
15
---------------------------------------------------------------------2
4
4 2
2
18 2
20
81 2
2
x - --*x*y - --*y - --*y, x*y - --*y - --*y, - ---*y + y, y },
11
11
11
59
59
205
---------------------------------------------------------------------15 22 59 41
66
{8, --, --, --, --, ----})
8 15 55 59 1025

Hence p(x, y, z, w) may be written as


2
2


1
1
8 2
1
1
15
2
1
2
xy
y
y+w
p(x, y, z, w) =8 x2 xy y + z w
+
x2
4
4
8
8
8
15
15
15
15
2
2


4
22
4
2
59
20
18
xy
y2
y
y2
y
+
+
x2
xy
15
11
11
11
55
59
59
2

41
66 4
81 2
+
+

y +y
y .
59
205
1025

Correctness of the obtained decomposition may be veried with the function sumSOS,
which expands a weighted sum of squares decomposition:

i
i

3.1. Nonnegative Polynomials and Sums of Squares

main
2012/11/1
page 73
i

73

i5 : sumSOS (g,d) - p
o5 = 0

Exercise 3.46. Prove Theorem 3.43. Use the LDLT decomposition (see Appendix A, Section A.1.2).
Exercise 3.47. Consider the ane subspace in Rn dened by the equations Ax = b,
and a point x0 Rn . Show that the orthogonal projection of x0 onto the subspace
is given by
(x0 ) = A+ b + (I A+ A)x0 ,
where A+ is the MoorePenrose pseudoinverse of A. If the rows of A are linearly
independent, we have A+ = AT (AAT )1 , and thus this formula can be written as
(x0 ) = x0 AT (AAT )1 (Ax0 b).
Show that if the matrices A and b are rational, and x0 is a rational point, then so
is (x0 ). Prove these facts, and show how to use them to convert an approximate
Gram matrix into a rational Gram matrix.
Exercise 3.48. Prove Theorem 3.44.

3.1.7

Sum of Squares Programs

We have described in previous sections how to check whether a given, xed multivariate polynomial is a sum of squares. These results can be nicely generalized to
dene a natural class of convex optimization problems which we will call sum of
squares (sos) programs.
Recall that the main objects of interest in semidenite programming are
quadratic forms that are positive semidenite.
When attempting to generalize this to homogeneous polynomials of higher degree,
a diculty appears: deciding nonnegativity for quartic or higher degree forms is
NP-hard. Therefore, a computationally tractable replacement is the following:
even degree polynomials that are sums of squares.
Sum of squares programs can then be dened as conic optimization problems,
where the feasible set is given by the intersection of an ane family of polynomials
and the proper cone n,2d of sos polynomials. As in the case of pure semidenite
programming, there are several possible equivalent descriptions. We choose below
a free variables formulation to highlight the analogy with the standard SDP dual
form (SDP-D) discussed in Chapter 2.
Denition 3.49. An sos optimization problem or sos program is a convex optimization problem of the form
maximizey
subject to

b1 y1 + + bm ym
pi (x; y) are sos in R[x],

i = 1, . . . , k,

(3.13)

i
i

74

main
2012/11/1
page 74
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications

where pi (x; y) := ci (x) + ai1 (x)y1 + + aim (x)ym , and the ci , aij are given multivariate polynomials in R[x].
Notice that the pi (x; y) are arbitrary polynomial expressions that are ane in
the parameters y1 , . . . , ym (the decision variables). Also, note that the variables x
are dummy variables, in the sense that we are not optimizing over them, but they
are the indeterminates of the underlying polynomials. Sum of squares programs are
very useful, since they directly operate with polynomials as their basic data type,
thus providing a quite natural modelling formulation for many problems. We will
discuss several examples later in this chapter, including Lyapunov functions for
nonlinear systems [89, 87], probability inequalities [16], and convex relaxations for
nonconvex optimization [89, 72].
Example 3.50. Consider the following simple sos program:
maximizey
subject to

y1 + y2
x4 + y1 x + (2 + y2 )
(y1 y2 + 1) x2 + y2 x + 1

is sos,
is sos.

The constraints involve two univariate polynomials (in x), whose coecients are
ane functions of the parameters (or decision variables) (y1 , y2 ). Notice that the
feasible set (i.e., the set of y1 , y2 for which both polynomials are sos) is necessarily
convex, since it is dened by the intersection of an ane subspace and the sos
cone.
Interestingly enough, despite their apparently greater generality, sos programs
are in fact equivalent to SDPs. To see this, notice that, on the one hand, by choosing
the polynomials ci (x), aij (x) to be quadratic forms, we recover the standard SDP
formulation. On the other hand, it is possible to exactly embed every sos program
into a larger semidenite program. Indeed, the constraints requiring pi (x; y) to be
sos in R[x] are equivalent to the existence of matrices Qi  0 satisfying
pi (x; y) = [x]Td Qi [x]d ,

i = 1, . . . , k.

Expanding and matching coecients as before, we obtain linear equations between


the coecients of pi (x; y) and the entries of Qi . Since the coecients of pi (x; y)
are ane in y, the equations above reduce to linear equations between the decision
variables yi and the entries of the matrices Qi . Thus, the sos program (3.13) is
equivalent to a (larger) SDP in the variables (y1 , . . . , ym , Q1 , . . . , Qk ).
Example 3.51. Consider again the sos program of Example 3.50. Using the Gram
matrix reformulation described in earlier sections, the sos constraints are equivalent to

T
q00 q01 q02
1
1
x4 + y1 x + (2 + y2 ) = x q01 q11 q12 x ,
q02 q12 q22
x2
x2
 T 
 
1
r00 r01 1
(y1 y2 + 1)x2 + y2 x + 1 =
,
x
r01 r11 x

i
i

3.1. Nonnegative Polynomials and Sums of Squares

main
2012/11/1
page 75
i

75

where the matrices Q, R are positive semidenite. Expanding and equating the
left- and right-hand sides, we obtain ane equations between the decision variables
y1 , y2 and the entries of the matrices Q, R. For instance, for the rst constraint we
obtain
1 = q22 ,
x4 :
x3 :
2

x :
x:
1:

0 = 2q12 ,
0 = q11 + 2q02 ,
y1 = 2q01 ,
2 + y2 = q00 ,

while for the second we obtain


x2 :
x:
1:

y1 y2 + 1 = r11 ,
y2 = 2r01 ,
1 = r00 .

Putting together these linear equations with the conditions Q  0 and R  0 yields
a standard semidenite program.
As we see, the conversion process from an sos program to a standard semidefinite program is fully algorithmic (and somewhat messy and cumbersome if done
by hand!). For these reasons, it has been implemented in several parsers/solvers
such as SOSTOOLS [101], YALMIP [74], and SPOT [78]. Furthermore, it is quite
useful from both theoretical and practical viewpoints to abstract out the fact
that (under the hood) sos programs are solved via semidenite programming and
instead just think of them as a tractable class of convex optimization problems that
we can freely use for modeling and implementation. In fact, from the next chapter
on, we will rarely mention semidenite programming, and all our formulations will
be given directly in terms of sos programs.
Although sos programs and semidenite programming are equivalent in the
sense described earlier, the rich algebraic structure of sos programs makes possible
a much deeper understanding of their special properties. This also enables customized, more ecient algorithms for their numerical solution [50, 75, 107]. As
illustrated in later sections, there are numerous questions in a number of application domains, as well as foundational issues in nonconvex optimization that have
simple and natural formulations as sos programs.
Exercise 3.52. Plot the feasible set of the sos program of Example 3.50. Find the
corresponding optimal solution (y1 , y2 ) as well as explicit sos decompositions of the
constraint polynomials at optimality.
Exercise 3.53. Show that sos programs can be written as conic optimization
problems in terms of the cone n,2d of sos polynomials. Write the corresponding
dual conic program.

i
i

76

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications

3.2

Applications of Sum of Squares Programs

main
2012/11/1
page 76
i

In this section we elaborate on several natural extensions of the basic sos methods
discussed so far. In combination with the more advanced techniques presented later,
these will serve as building blocks for more complex, domain-specic applications
developed in Section 3.6.

3.2.1

Unconstrained Polynomial Optimization

Our rst application is the global optimization of a univariate polynomial p(x).


Although this is a relatively simple task that could be handled with a variety of
alternative methods, it nicely illustrates many of the features of much more complicated problems. In this section, we consider only the unconstrained case (i.e.,
minimization over the whole real line); the constrained case will be considered later.
Rather than directly computing a minimizer x for which p(x ) is as small as
possible, we instead focus on the alternative viewpoint of obtaining a good (or the
best possible) lower bound on its optimal value. It is easy to see that a number
is a global lower bound of a polynomial p(x) if and only if the polynomial p(x)
is nonnegative, i.e.,
p(x)

x R

p(x) 0

x R.

Notice that the polynomial p(x) has coecients that depend anely on . This
suggests considering the optimization problem
maximize

subject to

p(x) is nonnegative.

(OPT-NN)

Clearly, this is a convex problem, since the feasible set is dened by an innite
number of linear inequalities (one for each value of x). Its optimal solution p is
equal to the global minimum of the polynomial, p(x ).
Consider now instead the following optimization problem, where the nonnegativity condition has been replaced by an sos constraint:
maximize

subject to

p(x) is sos.

(OPT-SOS)

The key distinction between the problems (OPT-NN) and (OPT-SOS) is the replacement of nonnegativity by an sos condition. However, since in the univariate
case nonnegativity is equivalent to sum of squares, these two optimization problems are, in fact, equivalent. Furthermore, (OPT-SOS) has exactly the form of
an sos program, and it is thus equivalent to a standard semidenite program; see
Exercise 3.54 for its explicit formulation.
As a consequence, we can obtain the value of the global minimum of a univariate polynomial by solving
m an sos program. Notice also that at optimality we
have 0 = p(x ) p = k=1 qk2 (x ) and thus all the qk simultaneously vanish at
x , which in principle gives a way of computing the minimizer x . As we shall see
later, a better alternative is to obtain the solution x directly from the dual SDP
problem by using complementary slackness.

i
i

3.2. Applications of Sum of Squares Programs

main
2012/11/1
page 77
i

77

Even though p(x) may be highly nonconvex, the proposed convex formulation nevertheless eectively computes its global minimum. This will extend, with
suitable modications, to the general multivariate case.
2d
k
Exercise 3.54. Let p(x) =
k=0 ck x . Give an explicit SDP formulation to
compute the value of the global minimum of p(x). Apply your formulation to the
polynomial p(x) = x4 20x2 + x.

3.2.2

Rational Functions

What happens if we want to minimize a univariate rational function instead of a


polynomial? Consider a rational function given as a ratio of polynomials p(x)/q(x),
where q(x) is strictly positive. From the equivalence
p(x)

q(x)

p(x) q(x) 0,

it follows that one can nd the global minimum of the rational function by solving
maximize

subject to

p(x) q(x) is sos.

The constrained case (i.e., minimization over a nite or semi-innite interval) is


very similar and can be formulated using the results in Section 3.3.1. The details
are left to the exercises.
Exercise 3.55. Compute numerically the global minimum and the global maximum of the rational function (x3 8x + 1)/(x4 + x2 + 12).
Exercise 3.56. Why did we assume that the denominator q(x) is strictly positive?
Is this restriction necessary?

3.2.3

Multivariate Optimization

Consider now the case of unconstrained polynomial optimization of a multivariate


polynomial p(x1 , . . . , xn ). As in the univariate case discussed in Section 3.2.1, we
can write the following formulation for the global minimum of p(x1 , . . . , xn ):
maximize

subject to

p(x1 , . . . , xn ) is nonnegative.

(MOPT-NN)

Despite being convex (why?), this formulation is in general intractable, since the
constraint set involves the set of nonnegative polynomials. As in the univariate
case, this suggests considering its sos alternative:
maximize

subject to

p(x1 , . . . , xn ) is sos.

(MOPT-SOS)

Let p be the optimal value of (MOPT-NN) (i.e., the global minimum2 of the
polynomial p(x1 , . . . , xn )) and psos be the optimal value of (MOPT-SOS). It should
2 Unlike in the univariate case, a multivariate polynomial that is bounded below need not
achieve its global minimum (as an example, consider the polynomial x2 + (1 xy)2 ). Therefore,
to make things fully rigorous one should consider here the supremum rather than the maximum.

i
i

78

main
2012/11/1
page 78
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications

be clear that one can compute psos eciently by solving the corresponding sos
program (e.g., using an SDP solver).
Recall that for the general multivariate case, nonnegativity and sum of squares
are no longer equivalent. Thus, since the feasible set of the second problem is a
(possibly strict) subset of the feasible set of the rst problem, we have the inequality
psos p ,
and thus the sos technique is (in principle) only guaranteed to produce a lower bound
on the value of the global minimum of p. Notice that, on computational complexity
grounds, this is to be expected, since multivariate polynomial optimization is NPhard, while semidenite programming is polynomial-time (to any given accuracy).
Interestingly, there is strong experimental evidence that shows that, at least
for relatively small problems, we very often have p = psos ; see, e.g., [94]. The
reasons for this phenomenon are not yet completely understood, except in particular
cases. As explained in Chapter 4, perhaps the opposite trend should be expected
for large enough dimension. Nevertheless, as we shall see shortly in Section 3.2.6,
even in those situations where psos < p , we will be able to produce stronger sos
conditions that will improve upon the plain sos lower bound psos .
Exercise 3.57. Find the value of psos for the trivariate polynomial
p(x, y, z) = x4 + y 4 + z 4 4xyz + 2x + 3y + 4z.
Is the computed value of psos equal to the global minimum p ?
Exercise 3.58. Find a bivariate polynomial p(x, y) for which psos < p .
Exercise 3.59. Assume that p(x) is bounded below. Is psos necessarily nite?
Prove or disprove with a counterexample.

3.2.4

Nonnegativity on Sets and Constrained Optimization

An sos representation is an obvious certicate of the nonnegativity of a polynomial


p(x1 , . . . , xn ) over the whole space Rn . What if we only care about p(x) being
nonnegative on a given subset S Rn , as in the case of constrained optimization?
Are there similarly simple and natural sucient conditions for nonnegativity that we
can write in this case? We present below an answer to these questions. We remark
up-front, however, that in this section we are concerned only with the suciency of
our conditions, and we postpone all possible concerns about the converse direction
to Section 3.4.
The set S could be specied in very dierent forms (e.g., using only equations,
or only inequalities, or a combination of both). As a consequence, the proposed
conditions for nonnegativity of p(x) on S that we discuss below will naturally depend
on how the set S is presented.

i
i

3.2. Applications of Sum of Squares Programs

main
2012/11/1
page 79
i

79

Equations. For simplicity, let us assume rst that the set S is described by a set
of polynomial equations, i.e., that it is a real algebraic variety of the form
S = {x Rn : f1 (x) = 0, . . . , fm (x) = 0}.
Recalling the formal similarity with weak duality and Lagrange multipliers, it is
natural to write a condition of the following type:
p(x) +

m


i (x)fi (x)

is sos,

(3.14)

i=1

where i (x) are arbitrary polynomials. Notice that this condition does what we
want, since it obviously implies that p(x) is nonnegative on the set S. Indeed,
if (3.14) holds, by evaluating this expression at any point x0 S, we immediately
conclude that p(x0 ) 0. Notice also that the expression (3.14) is ane in the
unknown polynomials i (x), and once the set of allowable multipliers i (x) has
been xed (e.g., by restricting their degrees), this condition has the form of an sos
program.
In more algebraic terms, condition (3.14) considers the polynomial ideal I generated by the constraints fi (x). If p(x) is congruent with a sum of squares modulo
the ideal I, then this obviously certies nonnegativity of p(x). We elaborate more
on this algebraic viewpoint in Section 3.3.5 and Chapter 7.
Inequalities. If the set S is described using polynomial inequalities (as opposed to
equations), we can do something very similar. Assume the set S has a description:
S = {x Rn : g1 (x) 0, . . . , gm (x) 0}.
Similar to the previous subsection, and again inspired by weak duality, one can now
consider expressions of the type
p(x) = s0 (x) +

m


si (x)gi (x),

(3.15)

i=1

where s0 (x) and si (x) are sos polynomials. Indeed, this serves as a self-evident
certicate of nonnegativity of p(x) on the set S, since evaluating such a representation at any point x0 S will directly prove p(x0 ) 0. In addition, notice that we
can consider more powerful expressions by allowing nite products of constraints of
the form
p(x) = s0 (x) +

m

i=1

si (x)gi (x) +

m


sij (x)gi (x)gj (x) + ,

(3.16)

ij

where as before the polynomials s0 (x), si (x), sij (x), . . . are sums of squares. Again,
once the structure of these polynomials has been xed (e.g., by restricting their
degrees), the conditions boil down to sos programs. Any representation of the
type (3.16) serves as an obvious certicate of nonnegativity of p(x) on S.

i
i

80

main
2012/11/1
page 80
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications

Remark 3.60. In principle, one could perhaps think of using nonnegative polynomials instead of sum of squares for the si (x) in the previous expressions, since
evaluating them at candidate points x0 would certainly show nonnegativity of p(x) on
the set S. Notice, however, that in this case one would have to rely on a promise
that the polynomials si indeed have the stated property. The reason why sums of
squares are of relevance is that their (unconstrained) positivity is certied by the
sos decomposition itself, and thus they serve as a bona de mathematical proof of
nonnegativity of p(x) on S.
Under certain assumptions, converse results or representation theorems will
ensure that whenever p(x) is nonnegative on a given set S, a certicate of a specied
form must exist. We emphasize, however, that in most practical applications of
sos techniques only the easy direction is actually used, in the sense that once
an sos certicate has eectively been computed, it transparently proves the desired
property (e.g., polynomial nonnegativity, etc.).
S-procedure. In the particular case when the gi (x) are quadratic forms, and
the si (x) are nonnegative scalars, the sucient condition (3.15) is known as the
S-procedure in the mathematical optimization and control literature. Under suitable
assumptions, this condition is lossless; i.e., it exactly characterizes nonnegativity of
a quadratic form on a quadratically constrained set.
Lemma 3.61 (S-lemma). Let p(x) and g1 (x) be quadratic forms, and assume that
the set S has an interior point (i.e., there exists an x0 Rn such that g1 (x0 ) > 0).
In this case, if p(x) is nonnegative on S, it has a representation as in (3.16), i.e.,
p(x) = s0 (x) + s1 g1 (x),
where s0 (x) is a positive semidenite quadratic form, and s1 is a nonnegative constant.
For more about the S-procedure, the S-lemma, and their many applications,
see the books [21, 15] or the survey [99].
Exercise 3.62. Let p(x) = x4 3x2 + 1. Give an sos certicate of the nonnegativity of p(x) on the set S = {x R : x3 4x = 1}.
Exercise 3.63. Allowing products of constraints (as in (3.16) as opposed to (3.15))
sometimes makes possible the existence of much more concise nonnegativity certicates (or even makes possible their existence). Consider, for instance, the polynomial p(x, y) = xy, which is obviously nonnegative on the compact set S = {(x, y)
R2 : x 0, y 0, x + y 1}.
1. Show that no nonnegativity certicate of the form (3.15) exists.
2. Give a nonnegativity certicate of the form (3.16).

i
i

3.2. Applications of Sum of Squares Programs

main
2012/11/1
page 81
i

81

Exercise 3.64. Assume that the set S is described using both equations and
inequalities; i.e., it has the form
S = {x Rn : f1 (x) = 0, . . . , fk (x) = 0, g1 (x) 0, . . . , gm (x) 0}.
What conditions would you propose to use to certify nonnegativity of a polynomial
p(x) on S?

3.2.5

Bounding the Distance to a Variety

The following problem is of interest in many applications: given a real algebraic


variety V and a point x0 that is not on V , we want to lower bound the distance
from x0 to V . This distance can be measured according to dierent metrics, but
for simplicity we consider here only the case of the squared Euclidean norm

2 .
A common engineering motivation for this problem occurs, for instance, when the
point x0 represents the nominal behavior of a system, while the variety V corresponds to an undesired operating region. In this situation, we want to quantify
how large the perturbations to x0 can be, while guaranteeing that the undesired
region described by V cannot be reached.
There are numerous important instances of this situation that appear mostly
in robust optimization [14] and robust control [125] problems. For instance, a typical
formulation in the robust control literature is the case where the point x0 represents
the parameter values of a feedback control system (given, e.g., by dierential or
dierence equations), and the variety V is described by a determinantal condition
that ensures that the system is stable. More complicated situations may require the
undesirable set to be a semialgebraic set (instead of an algebraic variety), but the
underlying techniques are essentially the same.
Let the real variety V be dened by polynomials f1 (x), . . . , fm (x), i.e., V =
{x Rn : f1 (x) = 0, . . . , fm (x) = 0}. As we will see, such safe regions can be
computed by considering the constrained polynomial optimization problem:
minimize
x x0
2

subject to

fi (x) = 0, i = 1, . . . , m.

The true minimum value d of this problem yields the distance from x0 to the
variety V , and thus any valid lower bound on d will give a guaranteed neighborhood
of x0 that does not intersect the variety. Based on the same arguments as in the
previous section, it should be clear that one can compute lower bounds on d and
safe neighborhoods by considering sos problems of the form
maximize

subject to

(
x x0
) +
2

m


i (x)fi (x)

is sos.

(3.17)

j=1

Any feasible solution sos of this problem gives a ball B = {x Rn :


x x0
2 <
sos } and a certicate that B that does not intersect the variety V . Indeed, evaluating the constraint in (3.17) at any point x V , we directly obtain
x x0
2 sos .
Example 3.65. Consider a linear dierence equation
x[k + 1] = Ax[k].

i
i

82

main
2012/11/1
page 82
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications


3

0
1
2
3
4

2

Figure 3.7. The boundary of the domain of stability is dened by f(a, b) =


0. Also shown is the computed certied stable region of the form a2 + b2 < 1 .
Recall (e.g., from Section 2.2.1) that this linear dierence equation is stable (i.e.,
solutions converge to the origin as k for all initial conditions x[0]) if and only
if all eigenvalues of A are inside the unit disk.
Now let A be the matrix

1+b
0
a
1
a
2 b + a 1 ,
A=
3
0
b
2
whose characteristic polynomial is
det(tI A) = [27t3 + (45 9a)t2 + (24 + 9a + 3ab 3b2 )t
+ (4 2a b 2ab + a2 b + 3b2 )]/27.
When the parameters (a, b) vanish, i.e., for (a, b) = (0, 0), the eigenvalues of A
are (1/3, 2/3, 2/3), and thus the dierence equation is clearly stable. We want to
determine how large a perturbation in (a, b) can be (measured in the Euclidean
norm) for the dierence equation to remain stable.
To apply the methods described in this section, we can consider the algebraic
variety dened by the Zariski closure of the boundary of the region of stability.
Clearly, A is on the boundary of stability if and only if some eigenvalue i lies on
the unit circle, i.e., satises i i = 1. We can easily characterize this condition
algebraically. For instance, one can consider the polynomial
f (a, b) := det(A A I),

i
i

3.2. Applications of Sum of Squares Programs

main
2012/11/1
page 83
i

83

since the eigenvalues of the Kronecker product A A are the products i j , and
because A is real its eigenvalues appear in complex conjugate pairs. For our example, after removing constants and multiplicities from the factors, this yields the
polynomial
f(a, b) = (2 2a b + ab + a2 b)(100 20a b 5ab + a2 b + 6b2 )
(245 + 133a 14a2 37b + 2ab + 27a2 b + 5a3 b + 31b2 + 19ab2
+ 2a2 b2 4a3 b2 + a4 b2 6b3 12ab3 + 6a2 b3 + 9b4 ).

(3.18)

This polynomial denes the variety of interest, and it can be seen that it factors
into three components. This factorization is structural and corresponds to the conditions of the matrix A having eigenvalues at 1, at 1, or on the remainder of the
unit circle. (As an aside, a more ecient alternative is to directly compute a factorized representation using the bialternate matrix product instead of the Kronecker
product, since this removes multiplicities associated with the pairs i j and j i ;
see, e.g., [57].)
We can now compute, using (3.17), the size of a neighborhood of (a, b)
that is guaranteed not to intersect this variety. Notice that, for our example, since
the variety is dened by a single polynomial that factors, it is possible (and more
ecient) to consider each factor separately. In this case, for each of the three factors
in (3.18), we obtain values
1 0.8875,

2 9.0696,

3 2.1974.

Of these three, 1 denes the smallest neighborhood, and thus it yields a region
a2 + b2 < 0.8875 where the linear dierence equation is certied to be stable. This
neighborhood and the corresponding varieties are presented in Figure 3.7.
Remark. In the robust control literature, there are several methods that can
partially exploit the determinantal structure of these kinds of problems. The notion
of structured singular value and associated convex bounds are particularly relevant;
see e.g. [18, 43, 125] and the references therein.
Remark 3.66. Notice that in the optimization problem (3.17) the unknown multipliers i (x) are otherwise unconstrained. We will see in Section 3.3.5 and Chapter 7
that it is possible to exploit this structure for more ecient computation by computing sums of squares on the quotient ring R[x]/I(V ).

3.2.6

What If Simple Sums of Squares Are Not Enough?

In many of the applications described earlier, we replaced the set of nonnegative


polynomials Pn,2d , which is computationally intractable, with its tractable equivalent, the sos polynomials n,2d . In certain cases (e.g., univariate, quadratic) these
two sets coincide, but in general n,2d is a strictly smaller subset (quantitative
estimates of the dierence between these sets will be presented in Chapter 4).
What do we do in the cases where the set of nonnegative polynomials is no
longer equal to sum of squares, and a simple sos approximation is not powerful

i
i

84

main
2012/11/1
page 84
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications

enough to obtain the desired results? As outlined below, it is possible to produce


stronger, more rened approximations to the set of nonnegative polynomials that
strictly improve over what is achievable by simple sums of squares.
The power of multipliers. As a preview, and a hint at the techniques that will be
considered later, let us show how to prove nonnegativity of a particular polynomial
which is not a sum of squares. Recall that the Motzkin polynomial was dened as
M (x, y) = x4 y 2 + x2 y 4 + 1 3x2 y 2
and is a nonnegative polynomial that is not a sum of squares.
Despite M (x, y) not being a sum of squares, we can try multiplying it by another polynomial which is known to be positive and then check whether the resulting
product is a sum of squares. Clearly, if this is the case, we have succeeded in proving nonnegativity of the original polynomial (why?). For instance, for our example,
consider multiplying M (x, y) by the obviously positive factor q(x) := (x2 + y 2 ). In
this case, the product will be a sum of squares, and in fact we have the explicit sos
decomposition
(x2 + y 2 ) M (x, y) = y 2 (1 x2 )2 + x2 (1 y 2 )2 + x2 y 2 (x2 + y 2 2)2 ,

(3.19)

which clearly certies that M (x, y) 0, despite the fact that M (x, y) itself is not a
sum of squares.
We will discuss a far-reaching generalization of this basic idea in Section 3.4,
where we explain how to approximate any semialgebraic problem (including of
course the simple case of a single polynomial being nonnegative) by sos techniques.
However, let us elaborate at this point on a number of interesting connections.
Sums of squares of rational functions. A simple explanation of why a multiplier
q(x) makes possible more powerful
nonnegativity certicates can be obtained by
considering the case where q(x) = i qi (x)2 is a sum of squares. In this case, we
can reinterpret an sos certicate for the product as
q(x) p(x) =


j

s2j (x)

p(x) =

  & sj (x)qi (x) '2


i

q(x)

In other words, we now obtain a representation of the polynomial p(x) as a sum


of squares of rational functions (instead of a sum of squares of polynomials). It
was conjectured by Hilbert (in fact, this is exactly the statement of the celebrated
Hilberts 17th problem; see, e.g., [106]) and later proved by Artin that every nonnegative polynomial has a representation as a sum of squares of rational functions.
Searching over multipliers. In the Motzkin example presented earlier, we produced the multiplier q(x) = x2 + y 2 in an ad hoc fashion. Notice, however, that if
p(x) is a xed polynomial for which we are trying to prove nonnegativity, we can
systematically search for a multiplier q(x) by solving a modied convex optimization problem (assuming a xed bound on the degree of q(x)). Indeed, the problem

i
i

3.2. Applications of Sum of Squares Programs

main
2012/11/1
page 85
i

85

of nding a polynomial q(x) such that


q(x) is sos,

q(x) p(x) is sos

is clearly ane in the unknown polynomial q(x) and thus can be reduced to an sos
program (and solved via semidenite programming).
Uniform denominators and P
olyas theorem. Artins solution to Hilberts
17th problem ensures that for every nonnegative polynomial there is a decomposition as a sum of squares of rational functions, or alternatively, a suitable multiplier
always exists. In many situations, it is convenient or necessary to restrict the structure of the possible multipliers (we will see examples of this later when discussing
copositive matrices in Section 3.6.1). Recall that a form is a homogeneous polynomial, i.e., one for which all monomials have the same degree. A well-known theorem
by P
olya about forms that are positive on the nonnegative orthant states precisely
a case where this situation holds.
Theorem 3.67 
([59, Section 2.24]). Given a form f (x1 , x2 , . . . , xn ) strictly positive for xi 0, i xi > 0, then f can be expressed as
g
f= ,
h
where g and h are forms with positive coecients. In particular, we can choose
h = (x1 + x2 + + xn )r
for a suitable r.
As we see, a representation of this kind gives an obvious certicate of the
nonnegativity of f on the nonnegative orthant. To see the relationship with sums
of squares, notice that if f is positive on the nonnegative orthant, then we can
write f-(x1 , . . . , xn ) := f (x21 , . . . , x2n ) = g(x21 , . . . , x2n )/(x21 + + x2n )r , and thus
P
olyas theorem yields a representation of the positive even form f- as a sum of
squares of rational functions, with a denominator of a xed form. P
olyas theorem
was generalized by Reznick [105], who showed that for any strictly
positive form

(not necessarily even), after multiplying by a suitable factor ( i x2i )r it becomes
a sum of squares (for r large enough). Furthermore, he also provided quantitative
estimates for the exponent r.
Exercise 3.68. Let q(x)p(x) and q(x) be sums of squares, where the multiplier
q(x) is not the zero polynomial. Show that p(x) is nonnegative.
Exercise 3.69. Consider the quartic form in four variables
p(w, x, y, z) := w4 + x2 y 2 + x2 z 2 + y 2 z 2 4wxyz.
1. Show that p(w, x, y, z) is not a sum of squares.
2. Find a multiplier q(w, x, y, z) such that q(w, x, y, z) p(w, x, y, z) is a sum of
squares.

i
i

86

main
2012/11/1
page 86
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications

Exercise 3.70. The conditions for a Polya-type nonnegativity certicate can be


fairly stringent. Consider the quadratic form f (x, y) = (x y)2 + xy, which is
obviously positive on the nonnegative orthant for all > 0. Estimate how large the
exponent r must be, as a function of , for the polynomial (x + y)r f (x, y) to have
only positive coecients.

3.3

Special Cases and Structure Exploitation

In Section 3.1.4 we introduced a general characterization of sums of squares in terms


of its standard SDP formulation. In many applications, the polynomials under
consideration have further structure that can be characterized algebraically in a
variety of ways. In this section we analyze dierent situations that often appear in
practice and the consequent theoretical and computational simplications.

3.3.1

Univariate Intervals

For univariate polynomials, we have seen how to exactly characterize global nonnegativity (i.e., for x (, )) in terms of semidenite programming. But what
if we are interested in polynomials that are nonnegative only on an interval (either
nite or semi-innite)? As explained below, we can use very similar ideas and two
classical characterizations usually associated to the names PolyaSzeg
o, Fekete, or
MarkovLukacs. The basic results are the following.
Theorem 3.71. A univariate polynomial p(x) is nonnegative on [0, ) if and only
if it can be written as
p(x) = s(x) + x t(x),
where s(x), t(x) are sums of squares. If deg(p) = 2d, then we have deg(s) 2d,
deg(t) 2d 2, while if deg(p) = 2d + 1, then deg(s) 2d, deg(t) 2d.
A similar result holds for closed nite intervals.
Theorem 3.72. Let a < b. Then the univariate polynomial p(x) is nonnegative on
[a, b] if and only if it can be written as

p(x) = s(x) + (x a) (b x) t(x)
if deg(p) is even,
p(x) = (x a) s(x) + (b x) t(x)
if deg(p) is odd,
where s(x), t(x) are sums of squares. In the rst case, we have deg(p) = 2d, and
deg(s) 2d, deg(t) 2d 2. In the second, deg(p) = 2d + 1, and deg(s) 2d,
deg(t) 2d.
Notice the similarity to the conditions discussed in Section 3.2.4 and the fact
that these representations obviously certify that p(x) 0 on the corresponding
set. From the existence of these sos representations, it also follows directly that

i
i

3.3. Special Cases and Structure Exploitation

main
2012/11/1
page 87
i

87

nonnegative polynomials on any interval (nite or semi-innite) can be exactly


characterized using small sos programs.
As we will see later, these sos characterizations, suitably dualized, can be
used to give a complete characterization of the set of valid moments of probability measures with support on univariate intervals. We will discuss the details in
Section 3.5.3, followed by an application to game theory in Section 3.6.6.
Exercise 3.73. Prove Theorem 3.71. Hint: p(x) is nonnegative on [0, ) if and
only if q(t) := p(t2 ) is a nonnegative polynomial.
Exercise 3.74. Let p(x) be a univariate polynomial of degree d that satises
|p(x)| 1 for x [1, 1]. How large can its leading coecient be?
1. Give an sos formulation for this problem, and solve it numerically for d =
2, 3, 4, 5.
2. What is the largest value of d for which you can numerically solve this problem
(using the monomial basis) in a reliable way? Experiment using dierent
polynomial bases, as explained in Section 3.1.5.
3. Can you guess what the general solution is as a function of d? Can you give
an exact characterization of the optimal polynomial?
Exercise 3.75. Give an sos formulation to the problem of minimizing a univariate
rational function p(x)/q(x) on the interval [a, b]. What condition is needed on the
denominator q(x), if any?

3.3.2

Sum of Squares Matrices

The notions of positive semideniteness and sums of squares of scalar polynomials can be naturally extended to polynomial matrices, i.e., matrices with entries
in R[x1 , . . . , xn ]. Sum of squares matrices are of interest in many situations, including the characterization of sos convexity (Section 3.3.3) and representations for
symmetry-invariant polynomials (Section 3.3.6).
We say that a symmetric polynomial matrix P (x) R[x]mm is positive
semidenite if P (x)  0 for all x Rn (i.e., it is pointwise positive semidenite).
The denition of an sos matrix is as follows [69, 48, 109].
Denition 3.76. A symmetric polynomial matrix P (x) R[x]mm , x Rn , is
an sos matrix if there exists a polynomial matrix M (x) R[x]sm for some s N,
such that P (x) = M T (x)M (x).
When m = 1, i.e., for scalar polynomials, this corresponds to the standard sos
notion. Also, when P is a constant matrix, then the condition simply states that P
is positive semidenite. Thus, sos matrices are a common generalization of positive
semidenite (constant) matrices and sos polynomials.

i
i

88

main
2012/11/1
page 88
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications

Example 3.77. Consider the polynomial matrix


 2

x 2x + 2 x
P (x) =
.
x
x2
This is an sos matrix since it admits the factorization

T 

1
x
1
x
P (x) =
.
x1 0
x1 0
Since an m m matrix is simply a representation of an m-variate quadratic
form, we can always interpret an sos matrix in terms of a polynomial with m
additional variables. The following result makes this precise.
Lemma 3.78. Let P (x) R[x]mm be a symmetric polynomial matrix, with x
Rn . Let p(x, y) := y T P (x)y be the associated scalar polynomial in m + n variables
[x; y], where y = [y1 , . . . , ym ].
1. The matrix P (x) is positive semidenite if and only if p(x, y) is nonnegative.
2. The matrix P (x) is an sos matrix if and only if p(x, y) is a sum of squares
(in R[x; y]).
Example 3.79. Here we continue Example 3.77. The matrix P (x) is an sos matrix
since the scalar polynomial y T P (x)y has the sos decomposition
y T P (x)y = (y1 + xy2 )2 + (x 1)2 y12 .
Notice that Lemma 3.78 allows us to easily decide whether a given polynomial matrix is an sos matrix using the same semidenite programming techniques
already described in Section 3.1.4. While these results establish that sos matrices are not a completely new concept (since they are fully equivalent to scalar sos
polynomials), the main advantage is that they allow for a more concise notation,
since they appear naturally in many contexts (e.g., sos-convexity in Section 3.3.3,
or symmetry reduction in Section 3.3.6).
When are positive semidenite matrices sums of squares? A celebrated
result about sos matrices that has been rediscovered many times is the fact that in
the univariate case, the sos condition is also necessary.
Theorem 3.80. Let P (x) R[x]mm be a symmetric polynomial matrix, where
the variable x is scalar (i.e., x R). Then the matrix P (x) is positive semidenite
if and only if it is an sos matrix.
For a proof and historical details, see, e.g., [28], [9], and the references therein.
Notice that this is a simultaneous generalization of two of the classical Hilbert cases
where nonnegativity is equal to sum of squares (scalar polynomials and quadratic
forms). For more details about univariate polynomial matrices, references to the

i
i

3.3. Special Cases and Structure Exploitation

main
2012/11/1
page 89
i

89

literature, as well as an ecient eigenvalue-based method for nding their sos decomposition, we refer the reader to [9].
In the multivariate case, however, not all positive polynomial matrices are
sums of squares. A well-known counterexample is due to Choi [27], who constructed
a positive semidenite biquadratic form that is not a sum of squares of bilinear
forms. His counterexample can be rewritten as the polynomial matrix
2
x1 + 2x22

C(x) =
x1 x2

x1 x3

x1 x2
x22

x1 x3

2x23

x2 x3

x2 x3
,

x23 + 2x21

(3.20)

which is positive semidenite for all (x1 , x2 , x3 ) R3 but is not an sos matrix.
Exercise 3.81. Prove Lemma 3.78.
Exercise 3.82. Let P (x) be an sos matrix. Show that all principal minors of P (x)
are scalar sos polynomials. (Hint: Use the CauchyBinet matrix identity.)
Exercise 3.83. Show that the Choi matrix (3.20) is positive semidenite for all
real values of (x1 , x2 , x3 ) but is not an sos matrix.
Exercise 3.84. Modify the algorithm given in Exercise 3.36 so that it will compute
a decomposition of a univariate sos matrix P (x).
Exercise 3.85. Certain optimization problems include constraints that are naturally expressed in matrix form. For instance, a set S could be dened as

1
S = (x1 , x2 , x3 ) R3 : G(x) := x1

x2

x1
1
x3

x2

x3  0

(notice that this corresponds to the 3-dimensional elliptope discussed in Section


2.1.3). While these descriptions could be scalarized and rewritten in terms of
scalar polynomial inequalities (e.g., by considering minors, or coecients of the
characteristic polynomial of G(x)), it is often much more convenient to preserve
their structure and keep them in matrix form.
Consider a scalar polynomial p(x), for which we want to show that it is nonnegative on the set S.
1. Show that a sucient condition for nonnegativity of p on the set S is the
existence of a scalar sos polynomial s0 (x) and an sos matrix S1 (x), such that
p(x) = s0 (x) + S1 (x), G(x).

i
i

90

main
2012/11/1
page 90
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications


2. Explain how to compute s0 (x) and S1 (x) via sos programs and semidenite
programming.
3. Give an sos certicate of nonnegativity of p(x) := 4 (x41 + x42 + x43 ) on the
set S.

3.3.3

Sum of Squares Convexity

The notion of sos-convexity is a tractable algebraic replacement for convexity of


a polynomial function. Informally, the (dicult to verify) requirement of positive
semideniteness of the Hessian matrix is replaced with a tractable condition, the
existence of an sos decomposition. Besides its computational implications, sosconvexity is an appealing concept since it bridges the geometric and algebraic aspects of convexity. Indeed, while the usual denition of convexity is concerned only
with the geometry of the epigraph, in sos-convexity this geometric property (or
the nonnegativity of the Hessian) must be certied through a simple algebraic
identity, namely, an sos factorization of the Hessian.
Recall that a multivariate polynomial p(x) := p(x1 , . . . , xn ) is convex if and
only if its Hessian is positive semidenite for all x Rn . This is a pointwise
condition that the Hessian must satisfy at every point x. The notion of sos-convexity
requires instead a global algebraic certicate for this property.
Denition 3.86. A polynomial p(x) is sos-convex if its Hessian H(x) is an sos
matrix, i.e., if it factors as H(x) = M (x)T M (x), where M (x) is a polynomial
matrix.
Clearly, an sos-convex polynomial is convex, since the Hessian being an sos
matrix implies it is positive semidenite everywhere. Is the converse true? In other
words, is every convex polynomial necessarily sos-convex?
Recall (e.g., from the Choi example in the previous section) that not every
positive semidenite polynomial matrix is an sos matrix. However, this does not
necessarily serve as a counterexample, since due to the fact that partial derivatives
commute, the Hessian matrix of a polynomial has strong ane dependencies among
Hij (x)
ik (x)
= Hx
. As a consequence, the set of
the dierent entries, of the form x
j
k
valid Hessians is a lower-dimensional subspace of the space of symmetric polynomial matrices. Thus, due to this special structure, it is perhaps conceivable that
convexity and sos-convexity of polynomials could still be equivalent.
The following counterexample from [5] shows that this is not the case.
Theorem 3.87. The trivariate form of degree 8 given by
p(x)

32x81 + 118x61 x22 + 40x61 x23 + 25x41 x42 43x41 x22 x23 35x41 x43 + 3x21 x42 x23
16x21 x22 x43 + 24x21 x63 + 16x82 + 44x62 x23 + 70x42 x43 + 60x22 x63 + 30x83

is convex but is not sos-convex.

i
i

3.3. Special Cases and Structure Exploitation

main
2012/11/1
page 91
i

91

The work [4] presents a complete classication of the cases for which convexity
and sos-convexity coincide. This description is in a certain sense the analogue to
Hilberts classication of nonnegativity described in Section 3.1.2.
Another motivation and justication for studying sos-convexity is its computational tractability. Deciding convexity of a multivariate polynomial is an NP-hard
problem [3], while it follows from our earlier discussions that sos-convexity can be
checked using semidenite programming. Sos-convexity will appear prominently in
the characterization of semidenite representability of convex sets; see Section 6.4.3
in Chapter 6 for details. For more results and background material on sos-convexity,
we refer the reader to [5, 4].
Exercise 3.88. Show that the Choi matrix (3.20) is not the Hessian of any polynomial.
Exercise 3.89. Prove Theorem 3.87. Hint: To show that p(x) is not sos-convex,
analyze the (1, 1) entry of the Hessian.
Exercise 3.90. In this exercise, we explore the use of sos-convexity for the problem of tting a polynomial to data, under a convexity restriction (e.g., [76]).
Consider a nite set of data {xi , fi } for i = 1, . . . , N , where xi D Rn and
fi R. We want to t these data points with a polynomial function p(x) of degree

2
d, making the least-squares tting error N
i=1 (p(xi ) fi ) as small as possible.
1. Give an sos formulation for this problem, in the case where p(x) is required
to be a globally convex polynomial. Explain whether the formulation solves
this problem exactly.
2. How would you modify your formulation if we only require that p(x) be convex
on the domain D of interest?
3. Generate data points where xi D := [1, 1] [1, 1], and numerically solve
your formulation for those two cases (p(x) is convex everywhere, or is only
convex on the domain D).

3.3.4

Sparsity and Newton Polytopes

Many of the polynomial systems that appear in practice are far from being generic
but rather present a number of structural features that, when properly exploited,
allow for much more ecient computational techniques. This is quite similar to the
situation in numerical linear algebra, where there is a big dierence in performance
between algorithms that take into account matrix sparsity and those that do not.
For matrices, the notion of sparsity is often relatively straightforward and relates
mostly to the number of nonzero coecients. In computational algebra, however,
there exists a much more rened notion of sparsity that refers not only to the
number of zero coecients of a polynomial, but also to the underlying combinatorial
structure of the nonzero coecients.

i
i

92

main
2012/11/1
page 92
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications

y degree

2
x degree

Figure 3.8. Newton polytope of the polynomial 5 xy x2 y 2 + 3y 2 + x4 .


Sparsity for multivariate polynomials is usually characterized in terms of their
Newton polytope, dened below.

Denition 3.91. Consider a multivariate polynomial p(x1 , . . . , xn ) = c x .
The Newton polytope of p, denoted by N (p), is the convex hull of the set of exponents , considered as vectors in Rn .
Thus, the Newton polytope of a polynomial always has integer extreme points,
given by a subset of the exponents of the polynomial.
Example 3.92. Consider the polynomial p(x, y) = 5 xy x2 y 2 + 3y 2 + x4 . Its
Newton polytope N (p), displayed in Figure 3.8, is the convex hull of the points
(0, 0), (1, 1), (2, 2), (0, 2), (4, 0).
Example 3.93. Consider the polynomial p(x, y) = 1 x2 + xy + 4y 4 . Its Newton
polytope N (p) is the triangle in R2 with vertices {(0, 0), (2, 0), (0, 4)}.
Newton polytopes are an essential tool when considering polynomial arithmetic because of the following fundamental identity:
N (g h) = N (g) + N (h),

(3.21)

where + denotes the Minkowski addition of polytopes.


The Newton polytope allows us to introduce a notion of sparsity for a polynomial, related to the size of its Newton polytope. Sparsity (in this algebraic sense)
allows a notable reduction in the computational cost of checking sum of squares
conditions of multivariate polynomials. The reason is the following theorem due to
Reznick.
Theorem 3.94 ([104, Theorem 1]). If p(x) =

2
i qi (x) ,

then N (qi ) 12 N (p).

This theorem allows us, without loss of generality, to restrict the set of monomials appearing in the sos representation (3.12) to those in the Newton polytope
of p, scaled by a factor of 12 . This reduces the size of the corresponding matrix Q,
thus simplifying the semidenite program to be solved.

i
i

3.3. Special Cases and Structure Exploitation

main
2012/11/1
page 93
i

93

Example 3.95. Consider the following polynomial:


p(w, x, y, z) := (w4 + 1)(x4 + 1)(y 4 + 1)(z 4 + 1) + 2w + 3x + 4y + 5z,
for which we want to compute an sos decomposition. The polynomial p has degree
2d = 16 and four independent variables (n = 4). A naive direct approach, along the
lines described in Section 3.1.4, would require a matrix Q indexed
$ by% all monomials
= 495.
in (w, x, y, z) of degree less than or equal to d = 8, i.e., of size n+d
d
However, its Newton polytope N (p) is easily seen to be the four-dimensional
hypercube with opposite vertices at (0, 0, 0, 0) and (4, 4, 4, 4). Therefore, by Theorem 3.94, the polynomials qi in the sos decomposition of p must have support
in 12 N (p), which is the hypercube with vertices at (0, 0, 0, 0) and (2, 2, 2, 2). This
scaled polytope contains 34 = 81 distinct monomials, and as a consequence a full
sos decomposition can be computed by solving a much smaller semidenite
program.
For a discussion of additional techniques for exploiting sparsity in the context
of sum of squares, we refer the reader to [70, 124] and the references therein.
Exercise 3.96. Prove identity (3.21).
Exercise 3.97. Consider the Motzkin polynomial (3.6), and compute its Newton
polytope. Which monomials could appear in a (hypothetical) sos decomposition of
M (x, y)? Show, by considering the coecient of x2 y 2 , that this leads to a contradiction, and thus that M (x, y) is not a sum of squares.
Exercise 3.98. Facial reduction [20] is a technique by which a conic programming feasibility problem x K L that is feasible, but not strictly feasible, is
replaced with a simpler problem that satises strict feasibility. The key idea is that
if the subspace L does not properly intersect the cone K, one may restrict attention
to a smaller face of K instead (ideally, of minimal possible dimension). For the
positive semidenite cone, faces are themselves isomorphic to smaller dimensional
positive semidenite cones, and thus this procedure yields smaller, but equivalent,
semidenite programs.
Explain how to interpret the Newton polytope technique described above in
terms of facial reduction.

3.3.5

Equations, Ideals, and Quotient Rings

Sum of squares decompositions give sucient conditions for global nonnegativity


of a polynomial. However, as discussed in Section 3.2.4, often we are interested in
deciding or proving nonnegativity only on certain regions of Rn . In this section
we consider the case where the set of interest is dened using equality constraints
only; i.e., it is an algebraic variety. The more general case of polynomial inequalities
(i.e., basic semialgebraic sets) will be discussed in Section 3.4. As we will see, when
explicit equality constraints are present in the problem, notable simplications in
the formulation of the corresponding semidenite programs are possible.

i
i

94

main
2012/11/1
page 94
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications

For concreteness, consider the problem of verifying the nonnegativity of a


polynomial p(x) on a set dened by equality constraints: {x Rn : fi (x) = 0, i =
1, . . . , m} (i.e., an algebraic variety). Let I = f1 , . . . , fm  be the ideal generated
by the equality constraints, and dene the quotient ring R[x]/I as the set of equivalence classes for congruence modulo the ideal I. Then, provided computations
can be eectively performed in this quotient ring, very compact SDP formulations
will be possible. This will be usually the case when Grobner bases for the ideal
are either available or easy to compute. The rst case usually occurs in combinatorial optimization problems, and the latter when the ideal is generated by a few
constraints.
We explain the details next. We want to write sos-like sucient conditions for
the polynomial p(x) to be nonnegative on the variety V (I). As mentioned earlier,
the condition

i (x)fi (x) is a sum of squares in R[x]
(3.22)
p(x) +
i

is a self-evident certicate of nonnegativity that clearly guarantees this. To see


this, notice that evaluating this expression on any point x0 of V (I) gives p(x0 )
(since fi (x0 ) = 0), and this is a nonnegative value (since the expression is a sum of
squares). By passing to the quotient ring (equivalently, considering (3.22) modulo
the ideal I), we can rewrite this as
f (x) is a sum of squares in R[x]/I.

(3.23)

Both expressions are sucient conditions for the nonnegativity of p on the variety
dened by fi (x) = 0. As we will see, we can use this to give a more ecient version
of the SDP formulation of sum of squares.
Sum of squares on quotient rings. We describe next a natural modication
of the standard sos methods that will allow us to compute sos decompositions on
quotient rings. This can be done by using essentially the same SDP techniques
as in the standard case. Since we will need to do eective computations on the
quotient, we assume that a Gr
obner basis G = {b1 , . . . , bk } of the polynomial ideal
I is available; see Appendix A and [32] for an introduction to computational algebra
and Gr
obner basis methods.
The method will be basically the same as in the standard case explained in
Section 3.1.4 (expressing the polynomial as a quadratic form on a vector of monomials and writing linear equations to obtain a semidenite program), but with two
main dierences:
Instead of indexing the rows and columns of the matrix Q in the semidenite
program by the usual monomials, we use standard monomials corresponding
to the Grobner basis G of the ideal I. These are the monomials that are not
obner basis.
divisible by any leading term of the polynomials bi in the Gr
When equating the left- and right-hand sides to form linear equations dening
the subspace of valid Gram matrices, all operations are performed in the
quotient ring; i.e., we rewrite the terms in normal form after multiplication.

i
i

3.3. Special Cases and Structure Exploitation

main
2012/11/1
page 95
i

95

Rather than giving a formal description, it is more transparent to explain the


methodology via a simple example.
Example 3.99. Consider the problem of deciding if the polynomial p := 10x2 y
is nonnegative on the variety dened by f := x2 + y 2 1 = 0 (the unit circle). We
will check whether p is a sum of squares in R[x, y]/I, where I is the ideal I = f .
Since the ideal I is principal (generated by a single polynomial), we already have a
Gr
obner basis, which is simply G = {f }. We use a graded lexicographic monomial
ordering, where x y. The corresponding set of standard monomials is then
B = {1, x, y, x2 , xy, x3 , x2 y, . . .}.
To formulate the corresponding semidenite program, we pick a partial basis
of the quotient ring (i.e., a subset of monomials in B). In this example, we take
only {1, x, y}, and, as before, we write p as a quadratic form in these monomials:
T
q11
1
10 x2 y = x q12
q13
y

q12
q22
q23

q13
1
q23 x
q33
y

= q11 + q22 x2 + q33 y 2 + 2q12 x + 2q13 y + 2q23 xy


(q11 + q33 ) + (q22 q33 )x2 + 2q12 x + 2q13 y + 2q23 xy

mod I,

where, in the last line, we used reduction modulo the ideal to rewrite some terms
as linear combinations of standard monomials only (e.g., the term q33 y 2 is replaced
by q33 q33 x2 ). Matching coecients between left and right, we obtain the linear
equations
1 : 10 = q11 + q33 ,
x:
0 = 2q12 ,
y : 1 = 2q13 ,
x2 : 1 = q22 q33 ,
xy :

0 = 2q23

that dene the subspace. Thus, we obtain again a simple semidenite program.
Solving it, we have



9 0 12
3 0 16
1
T

L=
Q = 0 0 0 = L L,
,
35
2 0 0
12 0 1
6
and therefore
.
y /2 35 2
10 x2 y 3
y
+
6
36

mod I,

which shows that p is indeed a sum of squares on R[x, y]/I. A simple geometric
interpretation is shown in Figure 3.9. As expected, by the condition above, p coincides with an sos polynomial on the variety, and thus it is obviously nonnegative
on that set.

i
i

96

main
2012/11/1
page 96
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications

12
10
1
8

0
1

0
1

x
1

2
Figure 3.9. The polynomials p = 10 x2 y and (3 y6 )2 + 35
36 y take
exactly the same values on the unit circle x2 + y 2 = 1. Thus, p is nonnegative on
the circle.

Remark 3.100. Despite the similarities between the standard case of sum of
squares on the polynomial ring R[x] versus the quotient ring R[x]/I, there are a
few important dierences. A key distinction is related
to computational complexity

issues. Consider an sos decomposition p(x) = i qi (x)2 . When working on R[x],
we can always bound a priori the degree of the polynomials qi in terms of the degree
of p (namely, deg(qi ) 12 deg(p)). This is not true when working on a quotient
ring, since monomials can wrap around when computing normal forms. This is
the reason why when working on R[x]/I we typically have some freedom in choosing
a nite set of standard monomials to index the matrix Q (unless it is feasible to
include all of them).
In fact, since for the ideal I = x21 1, . . . , x2n 1 every polynomial nonnegative
on V (I) is a sum of squares on R[x]/I (Exercise 3.105), it directly follows that,
in the general case, deciding whether a polynomial is sum of squares modulo I is
NP-hard.
Even though in the worst case computing a Grobner basis for I may be
troublesome, for many practical problems they are often directly available or relatively easy to compute. A typical example is the case of combinatorial optimization
problems, where the equations dening the Boolean ideal x21 1, . . . , x2n 1 are
already a Gr
obner basis. Another frequent situation is when the ideal is dened by
a single constraint, in which case the dening equation is again obviously a Gr
obner
basis of the corresponding ideal.
SDP dimensions and Hilbert series. Another advantage of the idealtheoretic formulation is the ease with which structural results can be obtained
through basic algebraic notions. For instance, consider the following question: what
are the matrix dimensions of the semidenite programs for sum of squares modulo an

i
i

3.3. Special Cases and Structure Exploitation

main
2012/11/1
page 97
i

97

ideal? Recall that in the standard sos case (over R[x], for a polynomial of degree
2d), the matrices are indexed by all monomials of degree less than or equal to d and
%
$n+d% d $n+k1%
$
= k=0
, where each
thus have size n+d
d . This can be rewritten as
d
k
term in the sum corresponds to the number of monomials of total degree k. How
can we generalize this?
For quotient rings, there is a nice way of counting the dimensions of the
dierent homogeneous components, known as the Hilbert series; see, e.g., [33]. The
Hilbert series H(I, t) is the generating function (a formal power series) of the Hilbert
function HI (k), which gives the dimension of the degree k the homogeneous part of
the quotient ring, i.e.,


HI (k) tk ,
H(I, t) =
k=0

where HI (k) = dim(R[x]/I R[x]k ). If I is a monomial ideal, HI (k) counts the


number of standard monomials of total degree k. If I is an ideal, and in (I) is
its initial ideal with respect to a graded monomial ordering, then both have the
same Hilbert series. The Hilbert series can be computed from a Gr
obner basis of
the ideal I, and, as a consequence, this allows us to determine the size of the corresponding semidenite program, given a bound on the total degree of the standard
monomials we will be considering.
For instance, the standard case we just discussed corresponds to the trivial
n

ideal
= {0}.
% k The Hilbert series for R[x]/I = R[x] is H(I, t) = 1/(1 t) =
 I$n+k1
t , which corresponds exactly to the dimensions computer earlier.
k=0
k
Example 3.101. Consider the ideal I = x2 + y 2 1 of Example 3.99. Its Hilbert
series is
1+t
H(I, t) =
= 1 + 2t + 2t2 + 2t3 + 2t4 ,
1t
which counts the number of standard monomials of each degree. The terms of the
series allow us to determine, given a bound on the total degree of the monomials
to be considered, what size the corresponding semidenite program will be. For
instance, since in Example 3.99 we used only monomials of degree less than or
equal to 1, the size of the corresponding semidenite program is 1 + 2 = 3.
In Exercise 3.106 we discuss another natural and important example, namely,
the Boolean ideal x21 1, . . . , x2n 1. These ideas will appear again in Chapter 7,
when computing semidenite representations of convex hulls of algebraic varieties.
Exercise 3.102. Prove formally that the expressions (3.22) and (3.23) are equivalent.
Exercise 3.103. Consider the polynomial f (x, y, z) := 1 + xy + yz + xz, and the
variety V (I), where I = x2 1, y 2 1, z 2 1. Notice that V (I) is nite.
1. Show, by explicit enumeration, that f is nonnegative on V (I).
2. Write f as a sum of squares on R[x]/I.

i
i

98

main
2012/11/1
page 98
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications

Exercise 3.104. Consider the buttery curve in R2 , dened by the equation


x6 + y 6 = x2 .
Give an sos certicate that the real locus of this curve is contained in a disk of
radius 5/4. Is this the best possible constant?
Exercise 3.105. Consider R[x1 , . . . , xn ] and the ideal I = x21 1, . . . , x2n 1.
We will show that every polynomial that is nonnegative on V (I) Rn is a sum of
squares modulo R[x]/I.
1. Show that V (I) corresponds to all the points {1, 1}n (i.e., the 2n vertices of
the unit hypercube). Thus, a polynomial p(x) is nonnegative on V (I) if and
only if evaluates to a nonnegative number on all these vertices.
.
/
0
+xi
.
2. Let v = (v1 , . . . , vn ) V (I). Dene the polynomial v (x) = ni=1 vi2v
i
Show that v (v) = 1, and v (w) = 0 for all w V (I), with w = v.

3. Assume that p(x) is nonnegative on V (I). Find an explicit sos decomposition


for p(x) on R[x]/I using the fact that, for all x V (I), we have

p(x) =
p(v) v (x).
vV (I)

4. Extend this result to all radical zero-dimensional ideals [90].


Exercise 3.106. Consider R[x1 , . . . , xn ] and the ideal I = x21 1, . . . , x2n 1.
Show that the standard monomials are the square-free monomials and thus are in
that the
bijection with the 2n subsets of {1, . . . , n}. Show 
$ %Hilbert series (actually
n
a polynomial in this case) is H(I, t) = (1 + t)n = k=0 nk tk . What does this say
about the sizes of the corresponding semidenite programs when looking at sums
of squares modulo I?

3.3.6

Symmetries

Another useful property that can be exploited in the sos context is symmetry. Symmetric problems arise very frequently in applications for a variety of reasons. Sometimes symmetry reects the underlying structure of existing physical systems (e.g.,
time-invariance, conservation laws), while in some other cases it arises as a result
of the chosen mathematical abstraction. Symmetry reduction techniques have been
explored in many contexts, with areas such as crystallography, dynamical systems
[53], and geometric mechanics [77] being prominent examples.
In optimization, as we shall see, symmetry interacts in a very interesting way
with convexity, particularly in the case of semidenite programming. In general,
there are many potential advantages in exploiting symmetries:
Problem size. The rst immediate advantage is the reduction in problem size,
as the new instance can have a signicantly smaller number of variables and
constraints.

i
i

3.3. Special Cases and Structure Exploitation

main
2012/11/1
page 99
i

99

Degeneracy removal. In symmetric SDP problems, there are repeated eigenvalues of high multiplicity that are dicult to handle numerically. These can be
removed by a proper handling of the symmetry.
Conditioning and reliability. Symmetry-aware methodologies have in general
much better numerical conditioning, and the resulting smaller size instances
are usually less prone to numerical errors.
An in-depth discussion of symmetries in sum of squares and semidenite programming requires some elements of group representation theory and invariant theory. In this section, we present and isolate the key ideas, referring to the literature
for the full technical details; see, e.g., [48, 123]. We consider the simple situation
where we want to compute an sos decomposition of a single polynomial, and the
underlying symmetry group is nite; the extensions to more general cases are relatively straightforward. The main message is that the presence of symmetry in sos
problems can be exploited at three levels of increasing sophistication: (a) convexity,
(b) semidenite programming, and (c) sum of squares.
The set-up is as follows: we consider a polynomial p(x1 , . . . , xn ) that is invariant under the action of a nite group G. A formal denition is given below in (3.24),
but the idea is that the polynomial in unchanged under certain transformations of
the variables. We will use the following as a running example.
Example 3.107. Consider the (nonconvex) quartic trivariate polynomial
p(x, y, z) = x4 + y 4 + z 4 4xyz + x + y + z.
This polynomial is invariant under all permutations of {x, y, z} (the full symmetric
group S3 ). The global minimum of p is p 2.1129 and is achieved at the orbit
of global minimizers:
(0.988, 1.102, 1.102) , (1.102, 0.988, 1.102) , (1.102, 1.102, 0.988).
For this polynomial, it holds that psos = p .
Recall that a linear representation of a group G is a homomorphism : G
GL(Rn ) (i.e., (st) = (s)(t) s, t G), where GL(Rn ) is the group of invertible
n n real matrices. The assumption that p is invariant under the group action
means that
p((g)x) = p(x)

g G.

(3.24)

Convexity. In general, when minimizing a symmetric function, one cannot always


expect that minimizers will also be symmetric (Example 3.107 is a case where
this clearly fails). There is, however, an important situation where optimization
problems invariant under the action of a group are guaranteed to have solutions that
are themselves invariant. As we show below, this is the case for convex problems,
where there is no loss of generality in restricting to symmetric solutions.

i
i

100

main
2012/11/1
page 100
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications

Consider the problem of minimizing a convex function f (x) over a convex


set S, where both the objective function f and the constraint set S are invariant
under the group action. This means that
f ((g)x) = f (x)

g G

and
x S (g)x S

g G,

respectively. When these properties hold (symmetry + convexity), then we can


always restrict the solution to the xed-point subspace (or subspace of symmetric
solutions) dened by
F := {x Rn : (g)x = x,

g G}.

To see why the statement is true, consider any feasible solution x0 S, and dene
the group average
1 
x
-0 =
(g)x0
|G|
gG

that expresses x
-0 as a convex combination of the images of x0 under the group
action. By construction, x
-0 F . Since S is convex and
-0 S,
 invariant, we have x
1
and convexity and invariance of f yield f (x0 ) |G|
f
((g)x
)
=
f
(x
).
0
0
gG
Thus, without loss of generality, for invariant convex problems we can restrict the search for optimal solutions to a potentially much smaller subset S F
(of course, this is most useful whenever the dimension of the subspace F is small).
In other words, for convex problems, no symmetry-breaking is ever necessary.
Example 3.108. The entropy of a probability vector (p1 , . . . , pn ) with
pi 0, is dened as
n

pi log pi ,
H(p) :=

n
i=1

pi = 1,

i=1

where (by continuity) 0 log 0 is dened as 0. The (negative) entropy H(p) is a


convex function of p that is clearly symmetric with respect to arbitrary permutations of the pi . Consider the problem of nding the vector p with largest possible
entropy; i.e., we want to minimize
nthe convex symmetric function H(p) over the
convex symmetric set S = {p : i=1 pi = 1, pi 0}. For this problem, the xedpoint subspace F is one-dimensional, of the form (t, t, . . . , t), and thus it follows
with no calculation that the entropy maximizing vector is given by the uniform
distribution ( n1 , . . . , n1 ).
Semidenite programs, being convex optimization problems, naturally t into
the class discussed above, and thus for invariant SDP problems we will always be
able to restrict solutions to their xed-point subspace. Furthermore, as we shall see
next, there is often additional structure to be exploited.

i
i

3.3. Special Cases and Structure Exploitation

main
2012/11/1
page 101
i

101

Semidenite programming. An invariant semidenite program is a semidenite


program whose objective and feasible sets are invariant under the action of a group.
As we have just seen, in this case we can always restrict solutions to the xedpoint subspace F of the group action. Remarkably, this subspace will have a very
convenient description.
For most semidenite programs (in particular, those arising from sos decompositions), the group acts on the decision variables in a specic way, where group
elements g act on a symmetric matrix by conjugation, i.e., X  (g)T X(g). Writing the equations for F , and using the fact that (g) is an orthogonal matrix, we
obtain
F = {X : X(g) = (g)X

g G};

(3.25)

i.e., X must commute with all matrices in the representation of G. In this case,
using Schurs lemma of representation theory, one can show that in the appropriate symmetry-adapted basis, the xed-point subspace will have a block-diagonal
structure.
Example 3.109. Consider an invariant semidenite program where the matrices
in the xed-point subspace have the structure

a b b
X = b c d .
b d c
Notice that these matrices are invariant under simultaneous permutation of the last
two rows and columns. We now show that these matrices can be put into a more
convenient form. By pre- and postmultiplying by the orthogonal matrix

1 0
0
1
T = 0 ,
= ,
2
0
we obtain

2b
0
a

T T XT = 2b c + d
0 ,
0
0
cd

and the matrix becomes block diagonal.


The calculation of a symmetry-adapted basis (i.e., the matrix T in the example above) is fully algorithmic; the details are representation-theoretic (and thus
omitted here) but can be found in the literature in [111, 45, 48]. What is important
is that this step simplies the description of F by replacing a big matrix with a collection of smaller ones (the specic dimensions will of course depend on the problem
data). As a consequence, the original SDP problem is reduced to a collection of
smaller coupled matrix constraints, with each block corresponding to an isotypic
component, and cardinality equal to the number of irreducible representations of
the group that appear nontrivially. This allows for a notable reduction in both the
number of decision variables and the size of the semidenite programs to be solved.

i
i

102

main
2012/11/1
page 102
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications

Example 3.110. Consider our running example, Example 3.107. Since p(x, y, z)
has n = 3 variables, degree 2d
$ = %4, and
$5% a full Newton polytope, its standard sos
formulation is indexed by all n+d
=
d
2 = 10 monomials of degree 2, i.e.,

T
1
x

y

z
2
x

p(x, y, z) =
y2
2
z

yz

xz
xy

q00
q01

q02

q03

q04

q05

q06

q07

q08
q09

q01
q11
q12
q13
q14
q15
q16
q17
q18
q19

q02
q12
q22
q23
q24
q25
q26
q27
q28
q28

q03
q13
q23
q33
q34
q35
q36
q37
q38
q39

q04
q14
q24
q34
q44
q45
q46
q47
q48
q49

q05
q15
q25
q35
q45
q55
q56
q57
q58
q59

q06
q16
q26
q36
q46
q56
q66
q67
q68
q69

q07
q17
q27
q37
q47
q57
q67
q77
q78
q79

q08
q18
q28
q38
q48
q58
q68
q78
q88
q89


1
q09

q19
x

q29
y

q39
z2

q49 x

2 ,
q59
y2

q69
z

q79
yz

xz
q89
q99
xy

where the matrix Q above will be constrained to be positive semidenite. Recall that
p is invariant under all permutations of the variables (the full symmetric group S3 ).
Thus, we can constrain the matrix Q to be in the xed-point subspace, i.e., it
should satisfy Q = (g)T Q(g), where g G and : G GL(R10 ) is the induced
representation on the vector of monomials that arises from permuting the variables
(x, y, z). Solving the equations (3.25) that dene the xed-point subspace, we nd
that the matrices there have the structure

r0 r1 r1 r1 r2 r2 r2 r3 r3 r3
r1 r4 r5 r5 r6 r7 r7 r8 r9 r9

r1 r5 r4 r5 r7 r6 r7 r9 r8 r9

r1 r5 r5 r4 r7 r7 r6 r9 r9 r8

r2 r6 r7 r7 r10 r11 r11 r12 r13 r13


.

(3.26)
Q=

r2 r7 r6 r7 r11 r10 r11 r13 r12 r13


r2 r7 r7 r6 r11 r11 r10 r13 r13 r12

r3 r8 r9 r9 r12 r13 r13 r14 r15 r15

r3 r9 r8 r9 r13 r12 r13 r15 r14 r15


r3 r9 r9 r8 r13 r13 r12 r15 r15 r14
$ %
Notice that the xed-point subspace is 16-dimensional, as opposed to the 11
2 = 55
degrees of freedom in the original matrix.
We can now, however, give a nicer description of this subspace. Consider the
coordinate transformation (a symmetry-adapted basis) of the form X  T T XT ,
where the orthogonal matrix T is given by


T = BlockDiag(1, R, R, R) ,
R = ,

where = 1/ 3, = (3 3)/6, = (3+ 3)/6, and is the permutation matrix


satisfying T [x0 , x1 , x2 , x3 , x4 , x5 , x6 , x7 , x8 , x9 ] = [x0 , x1 , x4 , x7 , x2 , x5 , x8 , x3 , x6 , x9 ].

i
i

3.3. Special Cases and Structure Exploitation

main
2012/11/1
page 103
i

103

It can be veried that under this tranformation, the matrix in (3.26) now takes
the form
- = BlockDiag(Q1 , Q2 , Q2 ),
T T QT
where

3r1
3r2
3r3
r0
3r1 r4 + 2r5 r6 + 2r7
r8 + 2r9

Q1 =
3r2 r6 + 2r7 r10 + 2r11 r12 + 2r13 ,

3r3 r8 + 2r9 r12 + 2r13 r14 + 2r15

r8 r9
r4 r5 r6 r7
Q2 = r6 r7 r10 r11 r12 r13 .
r8 r9 r12 r13 r14 r15

Notice that the 10 10 matrix has split into three blocks, one of size 4 4 and two
identical blocks of size 3 3. Also, all$ entries
% $ % are otherwise linearly independent
(in fact, we have the dimension count 52 + 42 = 10 + 6 = 16, the number of free
parameters in (3.26)).
-  0, this implies that instead of solving
-  0 if and only if T T QT
Since Q
an SDP problem with a positivity constraint on a 10 10 matrix, we have now a
4 4 matrix and a 3 3 matrix instead (clearly, we need only one copy of the two
identical 3 3 blocks), which is a lot simpler.
As we can see, exploiting symmetry can allow for a signicant reduction in the
computational cost. Depending on how much symmetry the problem has, the gains
can be very signicant and may enable the solution of problems that are otherwise
practically impossible to solve.
Sums of squares. We showed in the previous section how to simplify and decompose a specic semidenite program, corresponding to the sos decomposition of a
given polynomial. We can use similar techniques to simultaneously decompose the
semidenite programs associated to sos decompositions of all polynomials invariant
under a given symmetry group. In other words, if before we were using a symmetryadapted basis to split a xed vector of monomials into isotypic components, now
we will instead simultaneously decompose the whole polynomial ring.
The results we present can be expressed in a very appealing form using a few
basic concepts of invariant theory. Given a nite group G acting on (x1 , . . . , xn ),
recall that the invariant ring is the set of invariant polynomials R[x]G := {p
R[x] : p((g)x) = p(x) g G}, with the natural operations. For simplicity, we will
restrict ourselves to the simple situation where the invariant ring R[x]G is isomorphic
to a polynomial ring.3 In this case, we have R[x]G  1 , . . . , n , where 1 , . . . , n
are algebraically independent invariant polynomials.
3 In general, the invariant ring is a nitely generated algebra but is not necessarily isomorphic to
a polynomial ring; i.e., there may not exist a set of algebraically independent generators; see, e.g.,
[119, 38]. A simple example of this situation is the cyclic group C3 acting on R[x, y, z] by cyclically
permuting the indeterminates. In this case, a minimal set of generators for the invariant ring R[x]G
is {s1 , s2 , s3 , s4 } := {x + y + z, xy + yz + zx, xyz, x2 y + y 2 z + z 2 y}. However, these are algebraically
dependent since they satisfy the relation 9s23 + 3s3 s4 + s24 6s1 s2 s3 s1 s2 s4 + s32 + s31 s3 = 0.

i
i

104

main
2012/11/1
page 104
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications

Example 3.111. Consider R[x1 , . . . , xn ] and the symmetric group Sn acting by


permutation of the variables in the natural way. It is well known that in this
case the invariant ring R[x]G is isomorphic to a polynomial ring. There are several
natural sets of generators for the invariant ring of symmetric polynomials, including
the elementary symmetric functions
e1 = x1 + x2 + + xn ,
e2 = x1 x2 + x1 x3 + + xn1 xn
..
.
en = x1 x2 xn
and the power sums
p1 = x1 + x2 + + xn ,
p2 = x21 + x22 + + x2n
..
.
pn = xn1 + xn2 + + xnn .
Because the invariant ring is generated by {1 , . . . , n }, it is possible to rewrite
every invariant polynomial f (x) in terms of the generators i to yield a new polynomial f(). This can be done algorithmically, e.g., using Grobner bases, although
more ecient techniques like SAGBI bases can also be used [119].
Example 3.112. Consider the trivariate polynomial of our running example, Example 3.107. We can rewrite p(x, y, z) in terms of the elementary symmetric functions e1 = x + y + z, e2 = xy + yz + xz, and e3 = xyz as
p(e1 , e2 , e3 ) = e41 4e21 e2 + 2e22 + 4e1 e3 4e3 + e1 .
Rewriting an invariant polynomial f (x) in terms of invariants to obtain f() is
very convenient, since it usually leads to simpler representations. But how does this
help us in deciding if f (x) is a sum of squares? In general, if an invariant polynomial
is a sum of squares, it may not be a sum of squares of invariant polynomials (see
Exercise 3.115), so requiring f() to be a sum of squares in R[] would be a very
weak condition. The answer is given in the next theorem.
Theorem 3.113. Let f (x1 , . . . , xn ) be an sos polynomial that is invariant under
the action of a nite group G, and let {1 , . . . , n } be generators of the corresponding
invariant ring. Then f() = f (x) has a representation of the form
f() =

Si (), i (),

where i R[]ri ri are symmetric matrices that depend only on the group action
and Si R[]ri ri are sos matrices.

i
i

3.3. Special Cases and Structure Exploitation

main
2012/11/1
page 105
i

105

The structure of this representation is very appealing. Given a group G, the


matrices i can be precomputed, since they depend only on how the group acts on
the polynomial ring. Then, every invariant sos polynomial can be written as a sum
of pairings between coecients Si () (which are sos matrices) and the matrices i .
Since the Si () are sos matrices that are subject to ane constraints (equality in
the expression above), this is easily reducible to semidenite programming (which
should not be surprising, since this is just the symmetry-reduced version of the
original formulation).
The sizes ri of the matrices i in Theorem 3.113 correspond to the rank of
the ith module of equivariants as a free module over the ring of invariants, and the
number of terms corresponds to the number of irreducible representations of the
group that appear nontrivially in the isotypic decomposition of the polynomial
ring. The dimensions of the corresponding semidenite programs can be determined
explicitly using the generating functions known as the Molien (or HilbertPoincare )
series in a similar way as the Hilbert series for ideals discussed in Section 3.3.5. The
details are omitted here but can be found in [48].
Example 3.114. For the symmetric group S3 , the invariant ring R[x]G is generated by the elementary symmetric functions e1 , e2 , e3 . The corresponding matrices
i can be computed to be
1 = 1,
2 = e21 e22 4e32 4e31 e3 + 18e1 e2 e3 27e23 ,


2e21 6e2
e1 e2 + 9e3
3 =
.
e1 e2 + 9e3 2e22 6e1 e3
Thus, every S3 -invariant sos polynomial can be written in the form
f(e1 , e2 , e3 ) = s1 1 + s2 2 + S3 , 3 ,
where s1 , s2 are scalar sos polynomials and S3 is a 2 2 sos matrix.
Recall that the global minimum of our polynomial p(e1 , e2 , e3 ) is p = psos
2.112913 (an algebraic number of degree 6). We use the representation above to
2113
provide a rational certicate that psos 1000
by choosing
s1 =

2113
1000

s2 = 0,

S3 =

+ e1 +

79
282

79
47 e2

79 2
141 e1

74
304 2
1279 e1 + 693 e1
749
92 + 1636
e1

1120
11511 e1 e2

29 +

749
1636 e1
3469
4908

148 3
1279 e1

1439 2
2454 e2

85469 2
188958 e1 e2

85 4
693 e1 ,


.

It is easy to check that s1 , s2 , and S3 are indeed sums of squares and that they
satisfy p + 2113
1000 = s1 + S3 , 3  and therefore serve as a valid algebraic certicate
for the lower bound 2.113.

i
i

106

main
2012/11/1
page 106
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications


General case

Equality constraints

Symmetries

polynomial ring R[x]

quotient ring R[x]/I

invariant ring R[x]G

monomials (deg k)

standard monomials

isotypic components

Hilbert series

Molien series

Finite convergence
on zero dimensional ideals

Block diagonalization

1
(1t)n


$

n+k1
k

tk

k=0

Table 3.1. Algebraic structures and sos properties.

In Table 3.1 we present a summary and comparison of the dierent techniques


to exploit algebraic structure in sos programs.
Exercise 3.115. Let p(x) be an sos polynomial that is invariant under the action
of
Show that, in general, there may not exist an sos decomposition p(x) =
 a group.
2
q
(x)
,
where
all the qi (x) are invariant polynomials.
i
i
Exercise 3.116. An undirected graph G = (V, E) is vertex transitive if its automorphism group Aut(G) acts transitively on the set of vertices V . Consider the
standard semidenite relaxation for MAXCUT, discussed in Section 2.2.2.
1. Explain how to simplify the MAXCUT semidenite relaxation in the case of
vertex-transitive graphs.
2. Apply your results to the k-cycle graph. What are the values of the optimal
cut and the corresponding SDP upper bound?
Exercise 3.117. Consider the following sextic form, known as the Robinson form:
R(x, y, z) = x6 + y 6 + z 6 x4 y 2 y 4 x2 x4 z 2 y 4 z 2 x2 z 4 y 2 z 4 + 3x2 y 2 z 2 .
1. Show that R(x, y, z) is invariant under S3 but is not a sum of squares.
2. Rewrite (x2 +y 2 +z 2 )R(x, y, z) in terms of the elementary symmetric functions
e1 , e2 , e3 , and give an sos representation as in Theorem 3.113.

3.4

Infeasibility Certicates

At several points in this chapter, we have given sos-based sucient conditions for
dierent problems (e.g., nonnegativity of polynomials over sets in Section 3.2.4). We

i
i

3.4. Infeasibility Certicates

main
2012/11/1
page 107
i

107

now study in more detail the structure of these certicates, as well as the question
of when converse results hold, i.e., how to use sos techniques to certify properties
of systems of equations and inequalities over the real numbers. As we shall see,
sos techniques are very powerful in the sense that they can always provide proofs
of infeasibility for general basic semialgebraic sets. The key role of sum of squares
in these infeasibility certicates is developed in Section 3.4.2, where we introduce
the Positivstellensatz, highlighting the similarities to and dierences from other
well-known algebraic infeasibility certicates.

3.4.1

Valid Constraints: Ideals and Preorders

The feasible set S of an optimization problem is usually described by a nite number of polynomial equations and/or inequalities. However, at least in principle,
one could write many other constraints that are equally valid on the set S. For
instance, for a linear programming problem, we could consider nonnegative linear combinations of the given inequalities. Recall that this issue appeared already
in Section 3.2.4, when considering polynomial nonnegativity over a set, and we
described there two techniques (for equations and inequalities, respectively) of producing further valid constraints. We would like to understand the set of all possible
valid constraints and, in particular, how to algorithmically generate them. To do
so, we revisit those constructions next and formalize their properties in terms of
two important algebraic objects: ideals and preorders.
For the case of a set described by equations fi (x) = 0, we were able to produce
further polynomials vanishing on the set S by considering linear combinations with
polynomial coecients. The set of all polynomials generated this way is a polynomial ideal. We restate the familiar denition here, for easy comparison with the
new concepts introduced later.
Denition 3.118. Given multivariate polynomials {f1 , . . . , fm }, the ideal generated by the fi is
f1 , . . . , fm  := {f : f = t1 f1 + + tm fm ,

ti R[x]} .

Similarly, for a set described by inequalities gi (x) 0, one can generate new
valid inequality constraints by multiplying the gi (x) against sos polynomials, or
by taking conic combinations of valid constraints. This is formalized through the
notion of quadratic module.
Denition 3.119. Given multivariate polynomials {g1 , . . . , gm }, the quadratic
module generated by the gi is the set
qmodule(g1 , . . . , gm ) := {g : g = s0 + s1 g1 + + sm gm },
where s0 , s1 , . . . , sm R[x] are sums of squares.

i
i

108

main
2012/11/1
page 108
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications

However, as noted earlier, we can also generate further valid constraints by


taking products of existing valid constraints, which suggests considering the preorder
generated by the polynomials gi (x).
Denition 3.120. Given multivariate polynomials {g1 , . . . , gm }, the preorder generated by the gi is the set

preorder(g1 , . . . , gm ) :=

g : g = s0 +

si g i

{i}


{i,j}

sij gi gj +


{i,j,k}

sijk gi gj gk +

where each term in the sum is a square-free product of the polynomials gi , with a
coecient s R[x] that is a sum of squares. The sum is nite, with a total of
2m terms, corresponding to all subsets of {g1 , . . . , gm }.
Clearly qmodule(g1 , . . . , gm ) preorder(g1 , . . . , gm ), so, in principle, the
latter yields a possibly larger set of valid constraints. By construction, ideals,
quadratic modules, and preorders contain only valid constraints, which are logical
consequences of the given equations and inequalities. Indeed, every polynomial
in the ideal f1 , . . . , fm  vanishes on the solution set of fi (x) = 0. Similarly, every element of preorder(g1 , . . . , gm ) is clearly nonnegative on the feasible set of
gi (x) 0.
A natural question arises: Can all valid constraints be generated this way?
Unless further assumptions are made, ideals and preorders (and thus, quadratic
modules) may not necessarily contain all valid constraints; see Exercise 3.121. Remarkably, however, they will be powerful enough to always detect and certify the
possible infeasibility (i.e., emptiness) of the corresponding feasible set; the Positivstellensatz (Theorem 3.127) formalizes this statement.
The notions of ideal, preorder, and quadratic module as used above are standard in real algebraic geometry; see, for instance, [19] (the preorders are sometimes
also referred to as a cones). Notice that, as geometric objects, ideals are ane sets,
and quadratic modules and preorders are closed under convex combinations and
nonnegative scalings (i.e., they are actually cones in the convex geometry sense).
These convexity properties, coupled with the relationships between semidenite
programming and sums of squares, will be key for our developments in the next
section.
Exercise 3.121. In general, ideals and preorders may not contain all valid constraints. In this exercise, we illustrate a few cases where things may go wrong.
1. Let S = {x R : x2 = 0}. Show that the polynomial x vanishes on the
feasible set but is not in the ideal x2 .

i
i

3.4. Infeasibility Certicates

main
2012/11/1
page 109
i

109

2. Let S = {(x, y) R2 : x2 + y 2 = 0}. Show that the polynomial x vanishes


on the feasible set but is not in the ideal x2 + y 2 .
3. Let S = {x R : x3 0}. Show that the polynomial x is nonnegative on the
feasible set but is not in preorder(x) (and thus, is not in qmodule(x)).
4. Let S = {x R : x 0, y 0}. Show that the polynomial xy is nonnegative
on the feasible set but is not in qmodule(x, y) (but it is in preorder(x, y)).
These examples fail for a variety of reasons that are related to either multiplicities, real versus complex solutions, or impossibility of degree cancellations. As we
shall see, using suitable modications to take into account the dierences between
C and R, and/or additional assumptions, all these diculties can be avoided.

3.4.2

Certicates of Infeasibility

A central theme throughout convex optimization is the concept of infeasibility certicates, or, equivalently, theorems of the alternative. The key links relating algebraic techniques and optimization will be the facts that infeasibility of a given
polynomial system can always be certied through a particular algebraic identity,
and that this identity itself can be found via convex optimization.
Let us start by considering the following question: If a system of equations
does not have solutions, how can we prove this fact? In particular, what kind of
evidence could we show to a third party to convince them that the given equations
are indeed unsolvable?
Remark 3.122. Notice the asymmetry between this question (proving or certifying
nonexistence of solutions) versus providing evidence that the equations truly have
solutions. The latter could be certied (at least in principle) by producing a candidate
point x0 that satises all equations (nding such a point x0 could be very hard, but
that is not the issue here). In complexity-theoretic terms, this is essentially the
distinction between the NP and co-NP complexity classes (over either the Turing or
the real computation model).
Fortunately, for problems with algebraic structure, there are quite natural
ways of providing infeasibility certicates. These are formal algebraic identities that
give irrefutable evidence about the inexistence of solutions. We briey recall and
illustrate several well-known special cases before proceeding to the general case of
polynomial systems over the reals. Table 3.2 contains a summary of the infeasibility
certicates to be discussed and the associated computational techniques.
Linear equations. We consider rst linear systems of equations over either the
real or the complex numbers (in fact, any eld will do). It is a well-known result
from linear algebra that if a set of linear equations Ax = b is infeasible, there exists a
linear combination of the given equations such that the left-hand side is identically
zero, but the right-hand side does not vanish (and thus, infeasibility is evident).
Such a linear combination can be found, for instance, by Gaussian elimination.
This result is also known as the Fredholm alternative.

i
i

110

main
2012/11/1
page 110
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications


Degree \ eld
Linear
Polynomial

Complex
Range/kernel
Linear algebra
Nullstellensatz
Bounded degree: Linear algebra
Grobner bases

Real
Farkas lemma
Linear programming
Positivstellensatz
Bounded degree: SDP

Table 3.2. Infeasibility certicates and associated computational techniques.


Theorem 3.123 (Range/kernel). Consider the linear system Ax = b. Then,
Ax = b

is infeasible

s.t. AT = 0, bT = 1.
Notice that one direction of the theorem (existence of a suitable implies
infeasibility) is obvious: premultiply the equations with T to obtain
Ax = b

T Ax = T b

0 = 1,

which is clearly a contradiction. Thus, if a vector satises the conditions in


the second half of the theorem, it provides an easily checkable certicate of the
infeasibility of the system Ax = b. Notice that in this particular case, not only is it
easy to verify that a given vector is a valid certicate, but one can also eciently
nd such a (e.g., by Gaussian elimination).
Polynomial systems over C. For systems of polynomial equations over an algebraically closed eld, infeasibility is characterized through one of the central results
in algebraic geometry.
Theorem 3.124 (Hilberts Nullstellensatz).
mials in complex variables z1 , . . . , zn . Then,
fi (z) = 0

(i = 1, . . . , m)

Let fi (z), . . . , fm (z) be polyno-

is infeasible in Cn

1 f1 , . . . , fm .
Again, the easy direction is almost trivial. If 1 is in the ideal generated
by the fi , there exist polynomials t1 (z), . . . , tm (z) such that
t1 (z)f1 (z) + + tm (z)fm (z) = 1.
Evaluating this expression at any candidate solution of the polynomial system, we
obtain a contradiction, since the left-hand side vanishes, while the right-hand side
does not. The polynomials ti prove infeasibility of the given equations and constitute
a Nullstellensatz refutation for the polynomial system. Their eective computation
can be accomplished in a variety of ways. This could be done, for instance, via

i
i

3.4. Infeasibility Certicates

main
2012/11/1
page 111
i

111

Gr
obner basis techniques, or, if a bound on the degree of the polynomials ti is
assumed a priori, via straightforward (but possibly inecient) linear algebra.
At this point, we should mention an important complexity-theoretic distinction between this case and the simpler case of linear equations discussed earlier.
Since deciding feasibility of polynomial equations includes propositional satisability (which is NP-hard) as a special case, it would be unreasonable to expect that
short certicates of infeasibility always exist. Thus, in general one should not
expect to always be able to produce certicates ti (z) of small degree for every infeasible system. In fact, explicit systems of equations are known whose Nullstellensatz
refutations necessarily have large degree; see Exercise 3.135, as well as [24, 55, 36]
and the references therein.
Remark 3.125. The two results discussed above deal only with equations (either
linear equations over any eld, or polynomial equations over the complex numbers).
Working with inequalities, or trying to distinguish between real versus complex
solutions, will bring additional algebraic challenges. As we will see, to do this one
needs to take into account special properties of the reals (mainly, the fact that R is
an ordered eld) that are not true for the complex numbers.
Linear inequalities. For systems of linear inequalities, strong LP duality provides ecient certicates of infeasibility. These are essentially an algebraic interpretation of the separation theorem for polyhedral sets and are usually presented
in terms of theorems of the alternative such as the celebrated Farkas lemma.
Theorem 3.126 (Farkas lemma).

Ax + b = 0,
Cx + d 0

0, s.t.

is infeasible

AT + C T
bT + dT

= 0,
= 1.

As in the previous cases, the easy direction is straightforward. It is equivalent to the weak duality of linear programming and follows from direct syntactic manipulations (premultiply the rst equation by T and the second equation
by T , and add to obtain a contradiction). The dicult converse direction is
equivalent to strong duality, which always holds for linear programming problems.
A suitable certicate pair (, ) can be obtained by solving the corresponding LP,
which can be done in polynomial time using the ellipsoid algorithm or interior-point
methods.
These classical results can be generalized and unied to handle the case of
systems of polynomial equations and inequalities over the real numbers. This will
yield a simultaneous generalization of Farkas lemma (to allow for polynomial inequalities), as well as the possibility of distinguishing between real and complex
solutions (unlike the Nullstellensatz).

i
i

112

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications

3.4.3

The Positivstellensatz

main
2012/11/1
page 112
i

Consider a general system of polynomial equations and inequalities for which one
wants to show that it has no solutions over the real numbers. How do we certify
its infeasibility? As we describe next, a very natural class of algebraic certicates
exists for this case, under no assumptions whatsoever. This result is known as
the Positivstellensatz and is one of the cornerstones of real algebraic geometry. It
essentially appears in this form in [19] and is due to Stengle [114].
Theorem 3.127 (Positivstellensatz).

fi (x) = 0 (i = 1, . . . , m),
gi (x) 0 (i = 1, . . . , p)

is infeasible in Rn

F (x) + G(x) = 1,
F (x) f1 , . . . , fm ,
F (x), G(x) R[x] s.t.

G(x) preorder(g1 , . . . , gp ).

(3.27)

The theorem states that for every infeasible system of polynomial equations
and inequalities, there exists a simple algebraic identity that directly certies the
inexistence of real solutions. The certicate has a very simple form: a polynomial F (x) from the ideal generated by the equality constraints and a polynomial
G(x) from the preorder generated by the equations that add up to the polynomial
1. The easy direction is immediate: by construction, evaluating F (x) + G(x)
at any feasible point should produce a nonnegative number. However, since this
expression is identically equal to the polynomial 1, we arrive at a contradiction.
Remarkably, the Positivstellensatz holds under no assumptions whatsoever on the
polynomials.
Naturally, we are concerned with the eective computation of these certicates. Recall that for the cases of Theorems 3.1233.126, the corresponding refutations can be obtained using either linear algebra, linear programming, or Gr
obner
bases techniques. For the Positivstellensatz, we have established that ideals and
preorders are convex cones in the space of polynomials. As a consequence, the
conditions in Theorem 3.127 for a certicate to exist are convex, regardless of any
convexity property of the original problem. Furthermore, the same property holds
if we consider only bounded-degree sections, i.e., the intersection with the subspace
of polynomials of degree less than or equal to a given number D. In this case,
the conditions in the Positivstellensatz have exactly the form of an sos program.
This implies that we can nd bounded-degree certicates by solving semidenite
programs.
Theorem 3.128. Consider a system of polynomial equations and inequalities that
has no real solutions. The search for bounded-degree Positivstellensatz infeasibility
certicates is an sos program and thus is solvable via semidenite programming.
If the degree bound is suciently large, infeasibility certicates F (x), G(x) for the
original system will be obtained from the corresponding sos program.

i
i

3.4. Infeasibility Certicates

main
2012/11/1
page 113
i

113

Since infeasibility certicates are naturally ordered by their degree, this gives
rise to a natural hierarchy of semidenite relaxations for semialgebraic problems,
indexed by certicate degree [89, 91]. The Positivstellensatz guarantees that this
hierarchy is complete in the sense that, for every infeasible system, a suitable refutation will eventually be found.
Example 3.129. Consider the following polynomial system:
f1 := x21 + x22 1 = 0,
g1 := 3x2 x31 2 0,
g2 := x1 8x32 0.
We will prove that it has no solutions (x1 , x2 ) R2 . By the Positivstellensatz, the
system is infeasible if and only if there exist polynomials t1 , s0 , s1 , s2 , s12 R[x1 , x2 ]
that satisfy
f t + s0 + s1 g1 + s2 g2 + s12 g1 g2 = 1,
 1 ! "1

!
"

ideal f1 

(3.28)

preorder(g1 ,g2 )

where s0 , s1 , s2 , and s12 are sos polynomials.


We will look for solutions where all the terms on the left-hand side have
degree bounded by D. For each degree bound D, this is an sos program and thus
is solvable via semidenite programming. For instance, for D = 4 we nd the
certicate (written in fully explicit sos form)
t1 = 3x21 + x1 3x22 + 6x2 2,
'2
'2
&
&
5 2 387
52
11
1
5
2
2
x +
x1 +
s0 =
x1 x2
x1 x1 x2 x1 + x2
43 1
44
129
5
22
11
$
$
%
%2
1
3
2
x21 + 2x1 x2 + x22 + 5x2 +
2 x21 x22 x2 ,
+
20
4
s1 = 3,
s2 = 1,
s12 = 0.
The resulting identity (3.28) thus certies the inconsistency of the system {f1 = 0,
g1 0, g2 0}.
In the worst case, of course, the degree of the infeasibility certicates F (x),
G(x) could be large (this is to be expected due to the NP-hardness of polynomial
infeasibility). In fact, as in the Nullstellensatz case, there are explicit counterexamples where large degree refutations are necessary [55]. Nevertheless, for many
problems of practical interest, it is often possible to prove infeasibility using relatively low-degree certicates. There is signicant numerical evidence that this is
the case, as indicated by the large number of practical applications where sos techniques have provided solutions of very high quality. An outstanding open research
question is to understand classes of polynomial systems that can be solved, either
in an exact or approximate fashion, using certicates of low degree.

i
i

114

main
2012/11/1
page 114
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications

To summarize our discussions, there is a direct path connecting general polynomial optimization problems to semidenite programming, via Positivstellensatz
infeasibility certicates. Pictorially, we have the following:
Polynomial systems

Positivstellensatz certicates

Sum of squares programs

Semidenite programming.
Even though so far we have discussed only feasibility problems, there are obvious
straightforward connections with optimization questions, which we make more concrete in the next section. As we did earlier in the case of unconstrained optimization,
by considering the emptiness of the sublevel sets of the objective function, sequences
of converging bounds indexed by certicate degree can be directly constructed.
Exercise 3.130. Consider a single quadratic polynomial equation ax2 + bx +
c = 0. What conditions must (a, b, c) satisfy for this equation to have no real
solutions? Assuming this condition holds, give a Positivstellensatz certicate of the
nonexistence of real solutions.
Exercise 3.131. Explain how Theorem 3.127 simplies in the following cases:
1. There are no equality constraints.
2. There are no inequality constraints. Is this case equivalent to Hilberts Nullstellensatz? Explain why or why not.
Exercise 3.132. Consider the polynomial system {x + y 3 = 2, x2 + y 2 = 1}.
1. Is it feasible over C? How many solutions are there?
2. Is it feasible over R? If not, give a Positivstellensatz-based infeasibility certicate of this fact.
Exercise 3.133. Assume that in the statement of the Positivstellensatz, we replace
preorder(g1 , . . . , gp ) with the (potentially smaller) set qmodule(g1 , . . . , gp ). Is the
result still true? Prove, or disprove via a counterexample.
Exercise 3.134. Prove, using the Positivstellensatz, that every nonnegative polynomial is a sum of squares of rational functions. (Hint: A polynomial f (x) satises
f (x) 0 for all x Rn if and only if the set {(x, y) Rn R : f (x) 0, yf (x) = 1}
is empty.)

i
i

3.4. Infeasibility Certicates

main
2012/11/1
page 115
i

115

Exercise 3.135. In this exercise we compare the relative power of Nullstellensatz


and Positivstellensatz
based proofs in the context of a specic example. Consider
n
the set of equations { i=1 xi = 1, x2i = 0 for i = 1, . . . , n}.
1. Show that the given equations are infeasible (either over C or R).
2. Give a short Positivstellensatz proof of infeasibility (degree 2 should be enough).
3. Show that every Nullstellensatz proof of infeasibility must have degree greater
than or equal to n.

3.4.4

Positivity on Compact Sets

In many problems, such as constrained optimization, it is of interest to obtain


explicit certicates of positivity of a polynomial over a set. In what follows, S is a
basic closed semialgebraic set dened as
S = {x Rn : g1 (x) 0, . . . , gm (x) 0}.

(3.29)

Using the Positivstellensatz, it can be easily shown (Exercise 3.139) that if


a polynomial p(x) is strictly positive on the set S, then it has a representation of
the form
p(x) =

1 + q1 (x)
,
q2 (x)

q1 (x), q2 (x) preorder(g1 , . . . , gm ),

(3.30)

which obviously certies its strict positivity on S.


Under further assumptions on the set S, this representation can be simplied. The following result, due to Schm
udgen, provides a denominator-free representation for positive polynomials on compact sets.
Theorem 3.136 ([110]). Let S be a compact set, dened as in (3.29). If a
polynomial p(x) is strictly positive on S, then p(x) is in preorder(g1 , . . . , gm ).
Adding an additional assumption (not just compactness of the set S, but an
algebraic certicate of its compactness), even more is true. It is convenient to
introduce the following Archimedean property.
Denition 3.137. A quadratic
module is Archimedean if there exists N N such

that the polynomial N i x2i is in the quadratic module.
Notice that 
if qmodule(g1 , . . . , gm ) is Archimedean, then the set S is contained in the ball i x2i N , and thus it is necessarily compact. The following theorem by Putinar gives a representation of positive polynomials for the Archimedean
case.
Theorem 3.138 ([102]). Let S be a compact set, dened as in (3.29). Furthermore, assume that qmodule(g1 , . . . , gm ) is Archimedean. If a polynomial p(x) is
strictly positive on S, then p(x) is in qmodule(g1 , . . . , gm ).

i
i

116

main
2012/11/1
page 116
i

Chapter 3. Polynomial Optimization, Sums of Squares, and Applications

As we can see, these representations are simpler in the sense that the conditions involve fewer sos multipliers (recall that the preorder contains terms corresponding to squarefree products between inequalities). Notice, however, that these results say nothing about the degrees of the corresponding sos polynomials. It may be possible, at least in certain cases, that the degrees appearing in simpler representations are much larger than those of more complicated ones; see, e.g., [115]. We explore some of these issues in the exercises.
Hierarchies of relaxations. All the sos conditions that we have discussed, including Positivstellensatz certificates (Theorem 3.127) and the representation theorems of Schmüdgen (Theorem 3.136) and Putinar (Theorem 3.138), depend on the degree of the sos multipliers. Thus, each of these theorems gives rise to a corresponding hierarchy of sos relaxations, obtained by increasing the corresponding certificate degree. For instance, when minimizing a polynomial $p(x)$ over a set $S$ of the form (3.29), we can consider as before Positivstellensatz certificates of the form

$$p(x) - \gamma = \frac{1 + q_1(x)}{q_2(x)},$$

where $q_1, q_2 \in \operatorname{preorder}(g_1, \dots, g_m)$, or Schmüdgen/Putinar representations

$$p(x) - \gamma \in \operatorname{preorder}(g_1, \dots, g_m), \qquad p(x) - \gamma \in \operatorname{qmodule}(g_1, \dots, g_m),$$

respectively, depending on what form of certificate is desired (or what assumptions the set $S$ satisfies). For any given maximum degree of the sos polynomials appearing on the right-hand side, one can maximize over $\gamma$, which can be done via sos programs (possibly combined with bisection). Each of these alternatives will thus produce a monotone sequence of lower bounds converging to the optimal value (provided the assumptions are satisfied, for the case of Schmüdgen and Putinar representations). For the Positivstellensatz, this was presented in [89, 91], and the case of Putinar-type certificates was analyzed by Lasserre in [72] from the dual viewpoint of moment sequences.
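To make the lowest level of the hierarchy concrete, here is a minimal sketch (our own illustration, not from the text; it assumes the Python package cvxpy) for the toy problem of minimizing $p(x) = x$ over $S = \{x : 1 - x^2 \geq 0\}$, using the degree-2 Putinar-type certificate $x - \gamma = s_0(x) + \sigma(1 - x^2)$ with $s_0$ sos and $\sigma \geq 0$ a constant:

import cvxpy as cp

# Degree-2 Putinar certificate for min x on S = [-1, 1]:
#   x - gamma = s0(x) + sigma * (1 - x^2),
# where s0(x) = [1, x] Q [1, x]^T with Q PSD, and sigma >= 0 is a scalar.
gamma = cp.Variable()
sigma = cp.Variable(nonneg=True)
Q = cp.Variable((2, 2), PSD=True)

constraints = [
    Q[0, 0] + sigma == -gamma,  # constant coefficient
    2 * Q[0, 1] == 1,           # coefficient of x
    Q[1, 1] - sigma == 0,       # coefficient of x^2
]

cp.Problem(cp.Maximize(gamma), constraints).solve()
print(gamma.value)  # approximately -1, the true minimum

Here the first level is already exact: the identity $x + 1 = \tfrac{1}{2}(1+x)^2 + \tfrac{1}{2}(1-x^2)$ certifies $\gamma = -1$. Raising the degrees of $s_0$ and the multiplier gives the higher levels of the hierarchy.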
Exercise 3.139. Consider a set S as in (3.29). Show, using the Positivstellensatz,
that a polynomial p(x) is strictly positive on S if and only if it has a representation
of the form (3.30).
Exercise 3.140. Consider the problem of finding a representation certifying the nonnegativity of $p(x) := 1 - x^2$ over the set $S = \{x : (1 - x^2)^3 \geq 0\}$. Notice that the feasible set $S$ is the interval $[-1, 1]$ and that for this example the preorder and the quadratic module coincide. Let $\varepsilon \geq 0$. Stengle proved in [115] that no representation of the form

$$p(x) + \varepsilon = s_0(x) + s_1(x)(1 - x^2)^3 \qquad (3.31)$$

exists when $\varepsilon = 0$, where $s_0(x), s_1(x)$ are sums of squares. He also showed that as $\varepsilon \to 0$, the degrees of $s_0, s_1$ necessarily have to go to infinity, and provided the bounds $c_1 \varepsilon^{-1/2} \leq \deg(s_0) \leq c_2 \, \varepsilon^{-1/2} \log \frac{1}{\varepsilon}$ for some constants $c_1, c_2$.

1. Give a Positivstellensatz certificate of the form (3.30) for strict positivity of $p(x) + \varepsilon$ on $S$. Does the certificate degree depend on $\varepsilon$?
2. Verify that the expressions below give the best representation of the form (3.31). Let the degree of $s_0(x)$ be equal to $4N$. Then, the optimal solution that minimizes $\varepsilon$ is

$$\varepsilon_N = \frac{1}{(2N+2)^2 - 1}, \qquad s_0(x) = q_0(x)^2, \qquad s_1(x) = q_1(x)^2,$$

where

$$q_0(x) = \sqrt{2}(N+1)\, {}_2F_1\!\left(-N,\, N+2;\, \tfrac{1}{2};\, x^2\right), \qquad q_1(x) = \frac{1}{\sqrt{N}}\, x\, {}_2F_1\!\left(-N-1,\, N+1;\, \tfrac{3}{2};\, x^2\right),$$

and ${}_2F_1(a, b; c; x)$ is the standard Gauss hypergeometric function [1, Chapter 15].
Exercise 3.141. Recall the set $S$ from Exercise 3.63:

$$S = \{(x, y) \in \mathbb{R}^2 : x \geq 0, \; y \geq 0, \; x + y \leq 1\}.$$

The polynomial $p(x, y) = xy + \varepsilon$ (for $\varepsilon > 0$) is strictly positive on $S$. Analyze experimentally the smallest values of $\varepsilon$, provable using the positivity certificates of Theorems 3.136 and 3.138, as a function of certificate degree. Compare this against the Positivstellensatz certificates (3.30).

3.5 Duality and Sums of Squares

The sets of nonnegative and sos polynomials, being convex cones, have a rich duality structure. In this section we introduce their duals $P^*_{n,2d}$ and $\Sigma^*_{n,2d}$ and explain their natural interpretations. We do this from both a coordinate-free viewpoint that emphasizes the geometric aspects as well as a probabilistic interpretation with strong links to the classical truncated moment problem and applications.

3.5.1 Dual Cones of Polynomials

Recall that the sets of nonnegative polynomials $P_{n,2d}$ and sums of squares $\Sigma_{n,2d}$ are proper cones in $\mathbb{R}[x]_{n,2d}$. It then follows that the corresponding duals $P^*_{n,2d}$ and $\Sigma^*_{n,2d}$ are also proper cones (in the dual vector space $\mathbb{R}[x]^*_{n,2d}$) and that the reverse containment holds:

$$\Sigma_{n,2d} \subseteq P_{n,2d} \quad \Longrightarrow \quad P^*_{n,2d} \subseteq \Sigma^*_{n,2d}.$$


What is the interpretation of these dual cones? Are there natural objects associated with them?

The dual space. Let us consider first the dual space to polynomials, $\mathbb{R}[x]^*_{n,2d}$. The elements of this vector space are linear functionals on polynomials, i.e., linear maps of the form $\ell : \mathbb{R}[x]_{n,2d} \to \mathbb{R}$, that take a polynomial and return a real number. There are many such functionals, and they can superficially look quite different. For instance, some examples of such linear maps are

- evaluation of $p$ at a point $x_0 \in \mathbb{R}^n$: $p \mapsto p(x_0)$;
- integration of $p$ over a subset $S \subseteq \mathbb{R}^n$: $p \mapsto \int_S p(x)\,dx$;
- evaluation of derivatives of $p$ at a point $x_0 \in \mathbb{R}^n$: $p \mapsto \frac{\partial p}{\partial x_i \cdots \partial x_k}(x_0)$;
- extraction of coefficients: $p \mapsto \operatorname{coeff}(p, x^\alpha)$;
- contraction with a differential operator $q \in \mathbb{R}[\partial_1, \dots, \partial_n]_{n,2d}$: $p \mapsto q \circ p$.

A distinguished class of linear functionals are the point evaluations (our first example above): to any $v \in \mathbb{R}^n$, we can associate $\ell_v \in \mathbb{R}[x]^*_{n,2d}$, with $\ell_v : p \mapsto p(v)$. Naturally, we can generate additional linear functionals by taking linear combinations of point evaluations, i.e., maps of the form $p \mapsto \sum_i \lambda_i \ell_{v_i}(p) = \sum_i \lambda_i p(v_i)$ for $\lambda_i \in \mathbb{R}$ and $v_i \in \mathbb{R}^n$. It turns out that all linear functionals can be obtained this way; this is equivalent to the existence of dense multivariate polynomial interpolation schemes (Exercise 3.142).

Dual cone of nonnegative polynomials. What about the dual cone $P^*_{n,2d} = \{\ell \in \mathbb{R}[x]^*_{n,2d} : \ell(p) \geq 0 \ \forall p \in P_{n,2d}\}$? Clearly, it contains all the point evaluations $\ell_v$ (since for any nonnegative polynomial $p$, we have $\ell_v(p) = p(v) \geq 0$), as well as their conic combinations $\sum_i \lambda_i \ell_{v_i}$, with $\lambda_i \geq 0$ and $v_i \in \mathbb{R}^n$. It can be shown that almost all elements of $P^*_{n,2d}$ have this form, in the sense that this dual cone is the closure of the convex hull of the point evaluations. The need for a closure condition arises because we are working in an affine setting, i.e., with polynomials instead of forms; see Exercise 3.143 for an illustration of why the closure is required. In the homogeneous case, as will be explained in Chapter 4, or when working on a compact set, the situation is nicer, and the convex hull of point evaluations is automatically closed. We discuss a probabilistic interpretation in Section 3.5.2 and revisit this geometric characterization in Section 3.5.4.

Dual cone of sums of squares. For the cone $\Sigma^*_{n,2d}$ (dual of sums of squares), the situation is a bit simpler. Since the cone $\Sigma_{n,2d}$ is generated by the squares, we have almost by definition the description $\Sigma^*_{n,2d} = \{\ell \in \mathbb{R}[x]^*_{n,2d} : \ell(q^2) \geq 0 \ \forall q \in \mathbb{R}[x]_{n,d}\}$. This directly gives a characterization of $\Sigma^*_{n,2d}$ as a spectrahedron; see Exercise 3.144. However, in this case the geometric interpretation is less clear, since in general $\Sigma^*_{n,2d} \supsetneq P^*_{n,2d}$, and thus this cone has extreme rays that do not necessarily correspond to point evaluations.

We remark that from Hilbert's classification of the cases when $P_{n,2d}$ and $\Sigma_{n,2d}$ coincide (Section 3.1.2), one directly obtains the corresponding equalities between $P^*_{n,2d}$ and $\Sigma^*_{n,2d}$ for the same values of $n$ and $d$.

The coordinate-free viewpoint described above is mathematically natural and notationally simple, and it is analyzed in more detail in Chapter 4. It is also of relevance when doing numerical computations, since, as we have discussed already in Section 3.1.5, it is often essential to use vector space bases with good numerical properties. Nevertheless, given its many applications, it is also important to understand the alternative viewpoint where one identifies the dual space $\mathbb{R}[x]^*_{n,2d}$ with truncated moment sequences of probability distributions. This corresponds to a specific choice of coordinates for the space of polynomials (namely, the monomial basis), and moments constitute the associated dual basis for the dual space $\mathbb{R}[x]^*_{n,2d}$. This viewpoint is further explored in the remainder of the section.
Exercise 3.142. As described earlier, every linear functional on $\mathbb{R}[x]_{n,2d}$ is a linear combination of point evaluations, i.e., for every $\ell \in \mathbb{R}[x]^*_{n,2d}$, there exist $\lambda_1, \dots, \lambda_k \in \mathbb{R}$ and $v_1, \dots, v_k \in \mathbb{R}^n$ such that $\ell(p) = \sum_{i=1}^{k} \lambda_i p(v_i)$.

1. Prove this statement for the univariate case ($n = 1$). Hint: Use the nonsingularity of the Vandermonde matrix for suitably chosen points.

2. Extend your proof to the general multivariate case.
Exercise 3.143. Consider the vector space of univariate quadratic polynomials $\mathbb{R}[x]_{1,2} \simeq \mathbb{R}^3$.

1. Express the linear functional $(p_2 x^2 + p_1 x + p_0) \mapsto \int_2^3 p(x)\,dx$ as a (finite) linear combination of point evaluations.

2. Express the linear functional $(p_2 x^2 + p_1 x + p_0) \mapsto p_2$ as a linear combination of point evaluations.

3. Show that this linear functional is in $P^*_{1,2}$ but cannot be written as a conic combination of point evaluations.

4. Give a geometric interpretation of the statement above.


Exercise 3.144.

1. Show that $\Sigma^*_{n,2d}$ is a spectrahedron.

2. Show that $\Sigma_{n,2d}$ is a projected spectrahedron but is not a spectrahedron.

3. Is $P^*_{n,2d}$ or $\Sigma^*_{n,2d}$ basic semialgebraic?

Exercise 3.145. Find an extreme point of $\Sigma^*_{2,4}$ that is not a conic combination of point evaluations. Hint: Think about the Motzkin polynomial. How would you prove that it is not a sum of squares?

3.5.2 Probability and Moments

A particular, but important, interpretation of the dual cone $P^*_{n,2d}$ is in terms of truncated moment sequences of probability distributions. The basic idea, discussed below in more detail, is the following: consider the standard monomial basis for $P_{n,2d}$, and let $p = \sum_{|\alpha| \leq 2d} c_\alpha x^\alpha$ be a nonnegative polynomial and $\mu$ be a nonnegative measure. Then $\int p \, d\mu = \sum_\alpha c_\alpha \mu_\alpha \geq 0$, where $\mu_\alpha := \int x^\alpha \, d\mu$. Conversely, given a set of numbers $\mu_\alpha$, if $\sum_\alpha c_\alpha \mu_\alpha \geq 0$ for all nonnegative $p$, then the linear functional $\ell(p) := \sum_\alpha c_\alpha \mu_\alpha$ is in $P^*_{n,2d}$, and thus it is (up to closure) a conic combination of point evaluations. We can interpret this as a nonnegative measure $\mu$, which will satisfy $\mu_\alpha = \ell(x^\alpha) = \int x^\alpha \, d\mu$. Thus, we can identify (again, up to closure) the dual cone $P^*_{n,2d}$ with the set of moments $\mu_\alpha$ for which a nonnegative measure matching those moments exists. The following geometric interpretation may be helpful: on compact sets (or in the homogeneous case), by the Riesz representation theorem the duals of the nonnegative continuous functions are the nonnegative measures. Since the set of polynomials is a subspace, $P_{n,2d}$ is a section of the cone of nonnegative continuous functions, and thus its dual $P^*_{n,2d}$ must be a projection of the cone of measures. In the chosen basis, this projection is the moment map $\mu \mapsto \int x^\alpha \, d\mu$ that takes a measure into its moments.

In what follows, we explain and elaborate upon this interpretation. For simplicity, we start with the univariate case.

Valid sequences of moments. Consider a real-valued random variable $X$, or, equivalently, a nonnegative measure $\mu$ supported on $\mathbb{R}$, where $\mathbb{P}(X \in E) = \mu(E)$ for all events $E$. The moments of $X$ (or of $\mu$) are defined as the expectations of the pure powers, i.e.,

$$\mu_k := E[X^k] = \int x^k \, d\mu(x). \qquad (3.32)$$

In particular, for a random variable $X$ we have $\mu_0 = 1$ (normalization) and $\mu_1 = E[X]$ (mean or expected value).

A natural question to consider is the following: what constraints, if any, should the moments $\mu_k$ satisfy? In particular, is it true that for any set of numbers $(\mu_0, \mu_1, \dots, \mu_k)$ there always exists a nonnegative measure having exactly these moments? This is the classical truncated moment problem; see, e.g., [6, 112].

It should be apparent that this is not always the case and that some conditions on the $\mu_k$ are required. For instance, consider (3.32) for an even value of $k$. Since the measure $\mu$ is nonnegative, it is clear that in this case we must have $\mu_k \geq 0$. However, this condition is clearly not enough, and further restrictions should hold. A simple one can be derived by recalling the relationship between the variance of a random variable and its first and second moments, i.e., $\operatorname{var}(X) = E[(X - E[X])^2] = E[X^2] - E[X]^2 = \mu_2 - \mu_1^2$. Since the variance is always nonnegative, the inequality $\mu_2 - \mu_1^2 \geq 0$ must always hold.
How can we systematically derive conditions of this kind? The previous inequality can be obtained by noticing that, for all $a_0, a_1$,

$$0 \leq E[(a_0 + a_1 X)^2] = a_0^2 + 2a_0 a_1 E[X] + a_1^2 E[X^2] = \begin{bmatrix} a_0 \\ a_1 \end{bmatrix}^{\!T} \begin{bmatrix} \mu_0 & \mu_1 \\ \mu_1 & \mu_2 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \end{bmatrix},$$

which implies that the $2 \times 2$ matrix above must be positive semidefinite. Interestingly, this is equivalent to the inequality obtained earlier.

The same procedure can be repeated for higher-order moments. Let $\mu = (\mu_0, \mu_1, \dots, \mu_{2d})$ be given. By considering the expectation of the square of a generic polynomial

$$0 \leq E[(a_0 + a_1 X + \cdots + a_d X^d)^2],$$

we have that the higher-order moments of a random variable must satisfy

$$H(\mu) := \begin{bmatrix} \mu_0 & \mu_1 & \mu_2 & \cdots & \mu_d \\ \mu_1 & \mu_2 & \mu_3 & \cdots & \mu_{d+1} \\ \mu_2 & \mu_3 & \mu_4 & \cdots & \mu_{d+2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \mu_d & \mu_{d+1} & \mu_{d+2} & \cdots & \mu_{2d} \end{bmatrix} \succeq 0. \qquad (3.33)$$

Notice that $H(\mu)$ is a Hankel matrix, and the diagonal elements correspond to the even-order moments, which should obviously be nonnegative.
As we will see below, this condition is almost necessary and sufficient in the univariate case, in the sense that it characterizes the set of valid moments up to closure.

Theorem 3.146. Let $\mu = (\mu_0, \mu_1, \dots, \mu_{2d})$ be given, where $\mu_0 = 1$. If $\mu$ is a valid set of moments, then the associated Hankel matrix $H(\mu)$ is positive semidefinite. Conversely, if $H(\mu)$ is (strictly) positive definite, then $\mu$ is valid; i.e., there exists a nonnegative measure with this set of moments.

The derivation given earlier shows the necessity of the semidefiniteness condition. Sufficiency will follow from the explicit construction of Section 3.5.5.
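Computationally, the necessary condition of Theorem 3.146 is a one-line check. The sketch below (our own illustration, not from the text; the helper name hankel_psd is ours) builds $H(\mu)$ from a moment vector and tests positive semidefiniteness:

import numpy as np
from scipy.linalg import hankel

def hankel_psd(mu, tol=1e-9):
    # Necessary condition (3.33): H(mu) >= 0 for mu = (mu_0, ..., mu_2d).
    mu = np.asarray(mu, dtype=float)
    d = (len(mu) - 1) // 2
    H = hankel(mu[:d + 1], mu[d:])  # H[i, j] = mu_{i+j}
    return np.linalg.eigvalsh(H).min() >= -tol

print(hankel_psd([1, 0, 1]))    # True: moments of a fair coin flip on {-1, +1}
print(hankel_psd([1, 0, -1]))   # False: mu_2 = E[X^2] cannot be negative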
Remark 3.147. For the case of measures supported on the real line, the semidefinite condition in (3.33) characterizes the closure of the set of moments, but not necessarily the whole set. As an example, consider $\mu = (1, 0, 0, 0, 1)$, corresponding to the Hankel matrix

$$H(\mu) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

Although this matrix is positive semidefinite, there is no nonnegative measure corresponding to those moments (notice that $\mu_2 = 0$). However, the parametrized atomic measure given by

$$\mu_\varepsilon = \frac{\varepsilon^4}{2}\,\delta\!\left(x + \frac{1}{\varepsilon}\right) + \frac{\varepsilon^4}{2}\,\delta\!\left(x - \frac{1}{\varepsilon}\right) + (1 - \varepsilon^4)\,\delta(x)$$

has as its first five moments $(1, 0, \varepsilon^2, 0, 1)$, and thus as $\varepsilon \to 0$ they converge to those given above.
As the remark above illustrates, the fact that the semidefinite description is correct only up to closure is a consequence of considering measures supported on the whole real line, which is not compact. For the case of compact intervals, the situation will be nicer, as we will see in the next section.

As we move on to the general multivariate case, however, much more serious difficulties will appear (essentially, once again, the difference between polynomial nonnegativity versus sums of squares). We will discuss this situation in Section 3.5.6.

3.5.3 Nonnegative Measures on Intervals

We are now interested in deriving conditions for $\mu = (\mu_0, \mu_1, \dots, \mu_d)$ to be valid moments of the distribution of a random variable supported on a compact interval of the real line. For simplicity, we concentrate on the case of the interval $[-1, 1]$.
Clearly, the necessary condition described in the previous section (positive semidefiniteness of the Hankel matrix $H(\mu)$) should hold. However, additional conditions may be required to ensure the measure is supported in $[-1, 1]$. Recall how the necessity of the condition $H(\mu) \succeq 0$ was derived: by considering a nonnegative polynomial $p(x)$, and computing $E[p(X)] \geq 0$, which gives a linear condition on the moments. Thus, in order to generate additional valid inequalities that $\mu$ must satisfy, we need to have access to nonnegative polynomials on the domain of interest (the support set of the measure).

Fortunately, we have already discussed in Section 3.3.1 a full sos characterization of the set of polynomials nonnegative on intervals. As shown below, by dualizing these conditions, we can similarly obtain a complete characterization for valid moments of a $[-1, 1]$ measure. As in the case of polynomial nonnegativity, depending on whether the index of the largest moment is even or odd, we can write two slightly different (but equivalent) characterizations.
Odd case. Consider the polynomials

$$(1+x)\left(\sum_{i=0}^{d} a_i x^i\right)^{\!2}, \qquad (1-x)\left(\sum_{i=0}^{d} a_i x^i\right)^{\!2}, \qquad (3.34)$$

which are obviously nonnegative for $x \in [-1, 1]$. As before, by computing the expectation of these polynomials, we obtain necessary conditions in terms of the quadratic form (in the coefficients $a_i$):

$$0 \leq E\left[(1-X)\left(\sum_{i=0}^{d} a_i X^i\right)^{\!2}\,\right] = \sum_{j=0}^{d}\sum_{k=0}^{d} (\mu_{j+k} - \mu_{j+k+1})\, a_j a_k.$$

Since the polynomials of the form (3.34) generate all nonnegative polynomials on $[-1, 1]$, and this interval is compact, these conditions give a full characterization. We formalize this in the next result.

Lemma 3.148. There exists a nonnegative finite measure supported in $[-1, 1]$ with moments $(\mu_0, \mu_1, \dots, \mu_{2d+1})$ if and only if

$$\begin{bmatrix}
\mu_0 + \mu_1 & \mu_1 + \mu_2 & \cdots & \mu_d + \mu_{d+1} \\
\mu_1 + \mu_2 & \mu_2 + \mu_3 & \cdots & \mu_{d+1} + \mu_{d+2} \\
\vdots & \vdots & \ddots & \vdots \\
\mu_d + \mu_{d+1} & \mu_{d+1} + \mu_{d+2} & \cdots & \mu_{2d} + \mu_{2d+1}
\end{bmatrix} \succeq 0,
\qquad
\begin{bmatrix}
\mu_0 - \mu_1 & \cdots & \mu_d - \mu_{d+1} \\
\vdots & \ddots & \vdots \\
\mu_d - \mu_{d+1} & \cdots & \mu_{2d} - \mu_{2d+1}
\end{bmatrix} \succeq 0. \qquad (3.35)$$

Even case. Consider now instead the polynomials

$$\left(\sum_{i=0}^{d} a_i x^i\right)^{\!2}, \qquad (1 - x^2)\left(\sum_{i=0}^{d-1} a_i x^i\right)^{\!2},$$

which are again obviously nonnegative on $[-1, 1]$. This yields the following lemma.
Lemma 3.149. There exists a nonnegative finite measure supported in $[-1, 1]$ with moments $(\mu_0, \mu_1, \dots, \mu_{2d})$ if and only if

$$\begin{bmatrix}
\mu_0 & \mu_1 & \cdots & \mu_d \\
\mu_1 & \mu_2 & \cdots & \mu_{d+1} \\
\vdots & \vdots & \ddots & \vdots \\
\mu_d & \mu_{d+1} & \cdots & \mu_{2d}
\end{bmatrix} \succeq 0,
\qquad
\begin{bmatrix}
\mu_0 - \mu_2 & \cdots & \mu_{d-1} - \mu_{d+1} \\
\vdots & \ddots & \vdots \\
\mu_{d-1} - \mu_{d+1} & \cdots & \mu_{2d-2} - \mu_{2d}
\end{bmatrix} \succeq 0. \qquad (3.36)$$

In both cases, if the measure is normalized (i.e., if it is a probability measure), then additionally the zeroth moment must satisfy $\mu_0 = 1$.
Exercise 3.150. Show that the condition (3.35) implies positive semidefiniteness of the Hankel matrix $H(\mu_0, \mu_1, \dots, \mu_{2d})$.

Exercise 3.151. Show that the two given descriptions (odd and even cases) are equivalent, in the sense that if the highest-order moment is otherwise unconstrained, the projection of the feasible set of one description is exactly given by the other.

3.5.4 Moment Spaces and the Moment Curve

An appealing geometric interpretation of the set of valid moments described in the previous section is in terms of the so-called moment curve. This is the parametric curve in $\mathbb{R}^{d+1}$ given by $t \mapsto (1, t, t^2, \dots, t^d)$. The convex hull of this curve is known as the moment space and corresponds exactly to the set of valid moments; see [66] for background and many more details on this geometric viewpoint.

The reason for this correspondence is simple to understand. Every point on the curve can be associated to a Dirac measure (i.e., one where all the probability is concentrated on a single point). Indeed, for a measure of the form $\delta(x - c)$ (all mass is concentrated at $x = c$), we have $\mu_k = E[X^k] = \int \delta(x - c)\, x^k \, dx = c^k$, and thus the corresponding set of moments is $(1, c, c^2, \dots, c^d)$. Any other measure can be interpreted as a nonnegative combination of these Dirac measures. Since the moment map that takes a measure into its set of moments is linear, these probabilistic nonnegative linear combinations can be interpreted geometrically as convex combinations of points, yielding the convex hull of the curve. Thus, every finite measure on the interval gives a point in the convex hull.

[Figure 3.10. Set of valid moments $(\mu_1, \mu_2, \mu_3)$ of a probability measure supported on $[-1, 1]$. This is the convex hull of the moment curve $(t, t^2, t^3)$ for $-1 \leq t \leq 1$. An explicit semidefinite representation is given in (3.37).]
In Figure 3.10 we present an illustration of the moment space for the case of support $[-1, 1]$ and $d = 3$. Notice that, in this case, by Lemma 3.148, we have the semidefinite characterization

$$\begin{bmatrix} \mu_0 + \mu_1 & \mu_1 + \mu_2 \\ \mu_1 + \mu_2 & \mu_2 + \mu_3 \end{bmatrix} \succeq 0, \qquad \begin{bmatrix} \mu_0 - \mu_1 & \mu_1 - \mu_2 \\ \mu_1 - \mu_2 & \mu_2 - \mu_3 \end{bmatrix} \succeq 0, \qquad \mu_0 = 1. \qquad (3.37)$$

Since both semidefinite constraints are given by $2 \times 2$ matrices, the moment space is the intersection of two circular cones.
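Numerically, membership in this moment space is just two $2 \times 2$ eigenvalue checks; the following sketch (ours, not from the text; the helper name in_moment_space is ours) implements (3.37):

import numpy as np

def in_moment_space(m1, m2, m3):
    # Conditions (3.37) for moments of a probability measure on [-1, 1] (mu_0 = 1).
    M_plus = np.array([[1 + m1, m1 + m2], [m1 + m2, m2 + m3]])
    M_minus = np.array([[1 - m1, m1 - m2], [m1 - m2, m2 - m3]])
    return (np.linalg.eigvalsh(M_plus).min() >= 0
            and np.linalg.eigvalsh(M_minus).min() >= 0)

print(in_moment_space(0.0, 1.0, 0.0))  # True: X uniform on {-1, +1}
print(in_moment_space(0.9, 0.0, 0.0))  # False: violates mu_2 >= mu_1^2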

Exercise 3.152. Explain Remark 3.147 from this geometric perspective. What can you say about the closedness of the convex hull of the moment curve in $\mathbb{R}^d$? Show that if we consider closed intervals (i.e., $t \in [a, b]$), then the corresponding convex hull is compact. What happens in the unconstrained case, i.e., when $t \in (-\infty, \infty)$?

3.5.5 Constructing a Measure

We have given necessary conditions for the existence of a univariate nonnegative measure with given moments. Under the right assumptions (e.g., compactness of support, or strict positivity of the Hankel matrix), these conditions were also sufficient. We describe next a classical algorithm to effectively obtain this measure.
In general, given a set of moments, there may be many measures that exactly match these moments (equivalently, the moment map that takes a measure into a finite set of its moments is not injective). Over the years, researchers have developed a number of techniques to produce specific choices of measures matching a given set of moments (e.g., those that are simple according to specific criteria, or that have large entropy, etc.). We review next a classical method for producing an atomic measure matching a given set of moments; see, e.g., [112, 39]. This technique (or essentially similar ones) is known under a variety of names, such as Prony's method, or the Vandermonde decomposition of a Hankel matrix. Other variations of this method are commonly used in areas such as signal processing, e.g., Pisarenko's harmonic decomposition method, where one is interested in producing a superposition of sinusoids with a given covariance matrix.
Consider the set of moments $\mu = (\mu_0, \mu_1, \dots, \mu_{2d-1})$ for which we want to find an associated nonnegative measure supported on the real line. We assume that the associated Hankel matrix $H(\mu)$ is positive definite. In this method, the resulting measure will be discrete (a sum of $d$ atoms) and will have the form $\sum_{i=1}^{d} w_i \, \delta(x - x_i)$. To obtain the weights $w_i$ and atom locations $x_i$, consider the linear system

$$\begin{bmatrix} \mu_0 & \mu_1 & \cdots & \mu_{d-1} \\ \mu_1 & \mu_2 & \cdots & \mu_d \\ \vdots & \vdots & \ddots & \vdots \\ \mu_{d-1} & \mu_d & \cdots & \mu_{2d-2} \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_{d-1} \end{bmatrix} = -\begin{bmatrix} \mu_d \\ \mu_{d+1} \\ \vdots \\ \mu_{2d-1} \end{bmatrix}. \qquad (3.38)$$

The Hankel matrix on the left-hand side of this equation is $H(\mu)$, and thus the linear system in (3.38) has a unique solution if the matrix is positive definite. In this case, we let $x_i$ be the roots of the univariate polynomial

$$x^d + c_{d-1} x^{d-1} + \cdots + c_1 x + c_0 = 0,$$

which are all real and distinct (why?). We can then obtain the corresponding weights $w_i$ by solving the nonsingular Vandermonde system given by

$$\sum_{i=1}^{d} w_i x_i^j = \mu_j \qquad (0 \leq j \leq d - 1).$$

In Exercise 3.155 we will prove that this method actually works (i.e., the atoms
xi are real and distinct, the weights wi are nonnegative, and the moments are the
correct ones).
Example 3.153. Consider the problem of finding a nonnegative measure whose first six moments are given by $(1, 1, 2, 1, 6, 1)$. The solution of the linear system (3.38) yields the polynomial

$$x^3 - 4x^2 - 9x + 16 = 0,$$

whose roots are $-2.4265$, $1.2816$, and $5.1449$. The corresponding weights are $0.0772$, $0.9216$, and $0.0012$, respectively. It can be easily verified that the found measure indeed satisfies the desired constraints.
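The following sketch (our own illustration, not from the text; the helper name prony is ours) carries out the procedure end to end on the data of Example 3.153:

import numpy as np
from scipy.linalg import hankel

def prony(mu):
    # Recover sum_i w_i * delta(x - x_i) from mu = (mu_0, ..., mu_{2d-1}),
    # assuming the Hankel matrix H(mu) is positive definite.
    mu = np.asarray(mu, dtype=float)
    d = len(mu) // 2
    H = hankel(mu[:d], mu[d - 1:2 * d - 1])  # H[i, j] = mu_{i+j}
    c = np.linalg.solve(H, -mu[d:])          # linear system (3.38)
    x = np.real_if_close(np.roots(np.concatenate(([1.0], c[::-1]))))
    w = np.linalg.solve(np.vander(x, increasing=True).T, mu[:d])
    return x, w

x, w = prony([1, 1, 2, 1, 6, 1])
print(np.sort(x))        # approx. [-2.4265, 1.2816, 5.1449]
print(w[np.argsort(x)])  # approx. [ 0.0772, 0.9216, 0.0012]

As noted in Remark 3.154 below, working through the monomial coefficients $c_i$ is convenient but not the numerically soundest route.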
Remark 3.154. The measure recovery method described above always works correctly, provided the computations are done in exact arithmetic. In most practical applications, it is necessary or convenient to use floating-point computations. Furthermore, in many settings the moment information may be noisy, and therefore the matrices may contain some (hopefully small) perturbations from their nominal values. For these reasons, it is of interest to understand sensitivity issues, both at the level of what is intrinsic to the problem (conditioning) and at the level of the specific algorithm used (numerical stability).

When using floating-point arithmetic, this technique may run into numerical difficulties. On the conditioning side, it is well known that from the numerical viewpoint, the monomial basis (with respect to which we are taking moments) is a bad basis for the space of polynomials. On the numerical stability side, the algorithm above does a number of inefficient calculations, such as explicitly computing the coefficients $c_i$ of the polynomial corresponding to the support of the measure. Better approaches involve, for instance, directly computing the nodes $x_i$ as the generalized eigenvalues of a matrix pencil; see, e.g., [51, 52].
Exercise 3.155. Prove that the algorithm described above always produces a valid measure, provided the initial matrix of moments is positive definite. Hint: Show that if $p(x)$ is a polynomial that vanishes at the points $x_i$, then $E[p(X)^2] = 0$. From this, using the assumed positive definiteness of the Hankel matrix, determine what equations $p(x)$ must satisfy. What is the relation between this matrix and the Hermite form?
Exercise 3.156.

1. Find a discrete measure having the same first eight moments as a standard Gaussian distribution of zero mean and unit variance.

2. What does the previous result imply if we are interested in computing integrals of the type

$$\int \frac{1}{\sqrt{2\pi}}\, p(x)\, e^{-\frac{x^2}{2}}\, dx,$$

where p(x) is a polynomial of degree less than eight? What would you do if
p(x) is an arbitrary (smooth) function?
3. Use these ideas to give an approximate numerical value of the definite integral

$$\int_{-\infty}^{\infty} \cos(2x+1)\, e^{-2x^2}\, dx.$$

How does your approximation compare with the exact value $\sqrt{\pi/2}\; e^{-1/2} \cos(1)$?

Note. In the general case where we are matching $2d$ moments of a standard Gaussian, it can be shown that the support of these discrete measures will be given by the $d$ zeros of $H_d(x/\sqrt{2})$, where $H_d$ is the standard Hermite polynomial of degree $d$. These numerical techniques are called Gaussian quadrature; see, e.g., [116, 49] for details.
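As a check on part 3, the integral can be computed by Gauss-Hermite quadrature after the substitution $u = \sqrt{2}\,x$, which brings the weight to the standard form $e^{-u^2}$. A minimal sketch (ours, not from the text):

import numpy as np

# Gauss-Hermite rule: sum_i w_i f(u_i) matches the integral of f(u) e^{-u^2}
# exactly for polynomials f of degree < 2n.
nodes, weights = np.polynomial.hermite.hermgauss(10)

# With u = sqrt(2) x:  int cos(2x+1) e^{-2x^2} dx
#                    = (1/sqrt(2)) int cos(sqrt(2) u + 1) e^{-u^2} du
approx = np.sum(weights * np.cos(np.sqrt(2) * nodes + 1)) / np.sqrt(2)
exact = np.sqrt(np.pi / 2) * np.exp(-0.5) * np.cos(1)
print(approx, exact)  # the two values agree to many digits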
Exercise 3.157. What is the geometric interpretation of the atomic measure produced by the algorithm described in this section? Explain your answer in terms of Figure 3.10 and the set of moments $\mu = \left(1, \tfrac{1}{5}, \tfrac{1}{2}, \tfrac{1}{7}\right)$.

3.5.6 Moments in Several Variables

The same questions we have considered so far in this section for the univariate case can be formulated for nonnegative measures in several variables. Concretely, given a set of numbers $\mu_\alpha$, with $\alpha \in \mathbb{N}^n$ and $|\alpha| \leq 2d$, does there exist a nonnegative measure in $\mathbb{R}^n$ matching these moments? By our earlier discussions, this is essentially the membership problem for the cone $P^*_{n,2d}$.
Unfortunately, except for a few special situations (e.g., the univariate case and the others that follow from Hilbert's classification), there is no easy answer or an efficient polynomial-time algorithm for this question. This mirrors (in fact, dualizes) the case of polynomial nonnegativity. Recall that the cone of valid moments of nonnegative measures is (up to closure) the dual of the cone of nonnegative polynomials $P_{n,2d}$. It is known that the complexity of the weak membership problems for a convex cone and its dual are equivalent [56], and, as a consequence, deciding membership in $P^*_{n,2d}$ will also be NP-hard. Thus, the computational intractability of nonnegative polynomials implies (and is equivalent to) the intractability of valid multivariate moment sequences.
Remark 3.158. As in the case of polynomial nonnegativity noted in Remark 3.15, the characterization of truncated moment sequences can be reformulated (and, in principle, solved) using decision algebra methods such as quantifier elimination. Indeed, both polynomial nonnegativity and conic convex duality are expressible in first-order logic, and thus (for any fixed dimension and degree) elimination of quantifiers will yield a semialgebraic description of the valid moment sequences, in terms of the variables $\mu_\alpha$ only. While theoretically useful (since, for instance, this shows decidability of the problem), this approach is practically infeasible except for very small instances.


Fortunately, we can use the sos methods developed in earlier sections; recall that these yield the (dual) sos outer bound $P^*_{n,2d} \subseteq \Sigma^*_{n,2d}$. Furthermore, we can produce tighter outer approximations to the set $P^*_{n,2d}$ that improve upon the straightforward outer bound $\Sigma^*_{n,2d}$ while still being computationally tractable. To do this, we simply dualize the hierarchies of inner approximations to the set of nonnegative polynomials that we obtained via sos methods. Each variation of the sos methods that we have seen (Positivstellensatz, Pólya/Reznick theorem, Schmüdgen, and Putinar representations) can be used to produce a matching sequence of dual approximations to the corresponding dual cone. For concreteness, we illustrate this discussion with two specific examples.
Polynomial multipliers and rational moments. Recall from Section 3.2.6 that a way of producing stronger sos conditions in the multivariate case was to multiply the given polynomial $p(x)$ by a fixed sos factor $q(x)$. What does this construction correspond to on the dual side?

A dual interpretation of this method is in terms of rational moments, i.e., expectations of rational functions

$$\lambda_\alpha = E\left[X^\alpha / q(X)\right].$$

Indeed, one can easily write necessary conditions that these rational moments should satisfy, of the form

$$E\left[p(X)^2 / q(X)\right] \geq 0, \qquad (3.39)$$

which, as before (after parametrizing polynomials $p(x)$ up to a given degree), give spectrahedral conditions on the rational moments $\lambda_\alpha$. Furthermore, the standard moments $\mu_\alpha = E[X^\alpha]$ are given by a linear transformation of the rational moments $\lambda_\alpha$, since if $q(x) = \sum_\beta c_\beta x^\beta$, then

$$\mu_\alpha = E[X^\alpha] = E[q(X)(X^\alpha / q(X))] = \sum_\beta c_\beta E[X^{\alpha+\beta}/q(X)] = \sum_\beta c_\beta \lambda_{\alpha+\beta}.$$

Notice that this yields the normalization condition $E[1] = \sum_\beta c_\beta \lambda_\beta = 1$. These expressions give a refined outer approximation to the set of valid moments as an affine projection of a spectrahedral set (i.e., we are approximating the set of moments with projected spectrahedra). Under suitable conditions on the polynomial $q(x)$ (e.g., those in Pólya's theorem), this method will produce a complete hierarchy of spectrahedral approximations to the set of valid moments.
Example 3.159. In this example we compute a particular projection of the set of moments $P^*_{n,2d}$. We consider bivariate probability distributions ($n = 2$) and moments up to sixth order ($2d = 6$). We are interested in the projection of the set of valid moments onto the two-dimensional plane $(\alpha, \beta)$ given by $\alpha = \mu_{42} + \mu_{24} = E[x^4y^2] + E[x^2y^4]$ and $\beta = \mu_{22} = E[x^2y^2]$. The simple sos approximation $\Sigma^*_{2,6} \supseteq P^*_{2,6}$ in this case yields the trivial orthant outer bound $\alpha \geq 0$, $\beta \geq 0$.

We can produce tighter bounds by considering the multiplier-based relaxations described earlier. Let us describe the geometry first. For this, define the Motzkin-like family of polynomials $M_t(x, y) = t^3 x^4 y^2 + t^3 x^2 y^4 + 1 - 3t^2 x^2 y^2$ (for $t = 1$, this is the standard Motzkin polynomial). It can be shown (e.g., via the arithmetic-geometric inequality or Exercise 3.160) that $M_t(x, y)$ is nonnegative for $t \geq 0$. Therefore, we have the parametrized family of linear inequalities

$$0 \leq E[M_t(X, Y)] = t^3 \alpha + 1 - 3t^2 \beta$$

for all $t \geq 0$. Simplifying this expression, we obtain $\alpha \geq 2\beta^{3/2}$, $\beta \geq 0$. These inequalities exactly define the projection of the set of valid moments onto $(\alpha, \beta)$; see Figure 3.11 and Exercise 3.160.

[Figure 3.11. Projection of the set $P^*_{n,2d}$ of valid moments onto $(\alpha, \beta) = (\mu_{42} + \mu_{24}, \mu_{22})$. The outer approximation $\alpha \geq 0$, $\beta \geq 0$ corresponds to the plain sos bound $\Sigma^*_{n,2d}$. The inner region is obtained using a polynomial multiplier $q(x, y) = x^2 + y^2$ and gives the exact projection.]
Let us see how the rational moments interpretation described earlier gives a description of this set as a projected spectrahedron. We choose $q(x, y) = x^2 + y^2$ (as the Pólya-Reznick theorem would suggest) and define rational moments $\lambda_{jk} = E[X^j Y^k/(X^2 + Y^2)]$. Parametrizing a generic polynomial $p(x, y) = a_{10}x + a_{01}y + \cdots + a_{13}xy^3 + a_{04}y^4$, we write the inequality (3.39), i.e.,

$$E[p(X, Y)^2/(X^2 + Y^2)] \geq 0 \qquad \forall p,$$

which is a quadratic form in the coefficients $a_{ij}$. Expressing this in matrix form, one obtains a $14 \times 14$ matrix⁴ whose entries are the rational moments $\lambda_{jk}$. We also have the normalization condition $E[1] = \lambda_{20} + \lambda_{02} = 1$. Since $(\alpha, \beta) = (\mu_{42} + \mu_{24}, \mu_{22})$, the desired projection is then given by $\lambda_{jk} \mapsto (\lambda_{62} + 2\lambda_{44} + \lambda_{26},\ \lambda_{42} + \lambda_{24})$.

⁴In this specific case, the problem can be much simplified by exploiting the sparsity and symmetry present in the problem. For simplicity, the details are omitted.
Moments on compact sets. Consider a basic closed semialgebraic set $S = \{x \in \mathbb{R}^n : g_1(x) \geq 0, \dots, g_m(x) \geq 0\}$. We want to describe (or approximate) the set of valid moments of nonnegative measures supported on $S$.

As before, we can easily write necessary conditions that the moments should satisfy by computing expectations of polynomials that are obviously nonnegative
on $S$. Since squares are certainly nonnegative, and so are the products of squares with the defining polynomials $g_i$, we can consider the expressions

$$E[p(X)^2] \geq 0, \quad E[g_1(X)p(X)^2] \geq 0, \quad \dots, \quad E[g_m(X)p(X)^2] \geq 0, \qquad (3.40)$$

where $p(x)$ are arbitrary polynomials. Exactly as in the univariate case, imposing this condition for all $p(x)$ up to a certain degree, these yield quadratic forms in the coefficients of $p(x)$ that depend linearly on the moments $\mu_\alpha$. Thus, the conditions (3.40) give a family of spectrahedral approximations of the set of moments of $S$-supported nonnegative measures. By increasing the degree of the polynomial $p(X)$, tighter approximations are obtained. Under the right assumptions (essentially, if we can approximate the set of nonnegative polynomials), this dual hierarchy will approximate the set of moments arbitrarily well. For instance, recall from Section 3.4.4 that this will be the case if $\operatorname{qmodule}(g_1, \dots, g_m)$ satisfies the Archimedean property of Definition 3.137 (and thus, $S$ is compact), as was done in [72]. Notice that these approximations can be strengthened by including products of the form $E[g_i(X) \cdots g_k(X) p(X)^2] \geq 0$, which correspond to the distinction between preorders and quadratic modules, or, equivalently, Schmüdgen versus Putinar representations.
Constructing multivariate measures. In the univariate case, we have discussed in Section 3.5.5 how to produce an atomic measure matching a given finite set of moments using Prony's method. This is possible because in that case there is a full characterization of the moment space. In the multivariate case, as we have seen, even the decision question ("Are these valid moments?") is NP-hard, and thus, in general, unless further assumptions are satisfied, no such efficient procedure is available.

Given a truncated moment sequence (or, equivalently, a functional $\ell \in \mathbb{R}[x]^*_{n,2d}$), the positivity condition $\ell(p^2) \geq 0$ is of course necessary for the existence of a nonnegative measure. A well-known case where it is possible to construct such a measure is whenever the flat extension property [34] holds. This is a condition on the given moment sequence that requires the rank of the quadratic form $p \mapsto \ell(p^2)$ to remain the same when considering polynomials $p$ of degree $d$ or $d+1$, for some value of $d$. Whenever this condition holds, a natural generalization of the method described in the univariate case can be applied to obtain an atomic measure matching the given moment sequence. The basic idea of this construction is sketched below and appears in a number of related forms in the literature (e.g., the Gelfand-Neimark-Segal construction, the Stickelberger/Stetter-Möller/eigenvalue method for polynomial equations [32, 121], etc.). Under the flat extension assumption, one can define finite-dimensional commuting multiplication operators (i.e., matrices) associated to each of the variables $x_i$. To do this, one considers the linear maps $M_{x_i} : f \mapsto x_i f$, where $M_{x_i} : \mathbb{R}[x]_{n,d}/S \to \mathbb{R}[x]_{n,d}/S$ and $S$ is the subspace $\{p \in \mathbb{R}[x]_{n,d} : \ell(p^2) = 0\}$. By construction, these matrices pairwise commute, and they can be simultaneously diagonalized. From their diagonal representation, one can directly read the components of the support of the measure and then obtain the corresponding weights. For a full exposition of the procedure, we refer the reader to [63, 73].

Exercise 3.160. Consider again Example 3.159.

1. Show that $(x^2 + y^2) \cdot M_t(x, y)$ is a sum of squares when $t \geq 0$.

2. Show, by producing a family of suitable probability distributions, that the inequalities $\alpha \geq 2\beta^{3/2}$, $\beta \geq 0$ fully characterize the projection of the set of valid moments $P^*_{n,2d}$ onto the plane $(\alpha, \beta)$.

3. Write the explicit form of the corresponding semidefinite program, and verify that it indeed gives the projection of $P^*_{n,2d}$ onto the plane $(\alpha, \beta)$.

Exercise 3.161. Show that, by allowing the degree of $p(X)$ to grow, the conditions (3.40) can approximate arbitrarily closely the set of valid moments of nonnegative measures supported on $S$. Notice that this statement is essentially the dual of Putinar's representation theorem (Theorem 3.138).

3.6 Further Sum of Squares Applications

In this section we present several applications from different domains of applied mathematics and engineering where sos techniques have provided new solutions and insights. In each case we present the core mathematical ideas, attempting to reduce as much as possible the use of domain-specific jargon. The main point we want to illustrate is the power and versatility of polynomial optimization and convex optimization in addressing many apparently diverse questions, using virtually the same mathematical and computational machinery. We refer the reader to the cited literature for in-depth discussions of each specific topic.

3.6.1 Copositive Matrices

A symmetric matrix $M \in \mathcal{S}^n$ is copositive if, for all $x \in \mathbb{R}^n$,

$$x \geq 0 \quad \Longrightarrow \quad x^T M x \geq 0.$$

Equivalently, the associated quadratic form is nonnegative on the closed nonnegative orthant. If $x^T M x$ takes only positive values on the closed orthant (except the origin), then $M$ will be strictly copositive. We will denote the set of $n \times n$ copositive matrices as $\mathcal{C}_n$.
Copositive matrices are of importance in a number of applications. We briefly describe two of them.

Example 3.162. Consider the problem of obtaining a lower bound on the optimal solution of a linearly constrained quadratic optimization problem [103]:

$$f^\star = \min_{Ax \geq 0, \; x^T x = 1} x^T Q x.$$

If there exists a feasible solution $C$ to the linear matrix inequality

$$Q - A^T C A \succeq \gamma I,$$

where the matrix $C$ is copositive, then by multiplying the inequality above by $x^T$ on the left and $x$ on the right, it immediately follows that $f^\star \geq \gamma$. Thus, having good convex conditions for copositivity would allow for enhanced bounds for this type of problem.
Example 3.163 ([35]). This is an important special case of the problem just described. It corresponds to the computation of the stability number $\alpha(G)$ of a graph $G$ (recall Section 2.2.3 of Chapter 2). From a result of Motzkin and Straus [80], it is known that $\alpha(G)$ can be obtained as

$$\frac{1}{\alpha(G)} = \min_{x_i \geq 0, \; \sum_i x_i = 1} x^T (A + I) x,$$

where $A$ is the adjacency matrix of the graph $G$. This result implies that given a graph $G$ with adjacency matrix $A$, the matrix $\alpha(G)(I + A) - ee^T$ is copositive. In [35], de Klerk and Pasechnik show how to use this result and the semidefinite approximations presented in this section to obtain guaranteed approximations to the stability number that can improve upon the bound provided by the Lovász theta function.
The set of copositive matrices $\mathcal{C}_n$ is a closed convex cone (in fact, it is a proper cone; see Exercise 3.167). However, in general it is very difficult to decide if a given matrix belongs to this cone. In the literature there are explicit necessary and sufficient conditions for a given matrix to be copositive, usually expressed in terms of principal minors; see, e.g., [122, 31] and the references therein. Unfortunately, it has been shown that checking copositivity of a matrix is a co-NP-complete problem [81], so this implies that in the worst case, these tests can take an exponential number of operations (unless P = NP). This motivates the need for developing efficient sufficient conditions to guarantee copositivity.
It should be clear that the situation looks very similar to the case of nonnegative polynomials studied in earlier sections. In fact, it is exactly the same, since, as we will see, we can identify the set of copositive matrices with a particular section of the cone of nonnegative polynomials. Not surprisingly, we will be able to use sos and SDP techniques to provide tractable approximations of the cone $\mathcal{C}_n$.

An apparent distinction between the copositivity question and the problems studied earlier is the presence of nonnegativity constraints on the variables $x_i$. Thus, to establish the links with sos techniques we will need a way of handling the nonnegativity constraints. There are different ways of doing this, but a straightforward one is to define new variables $z_i$ and to let $x_i = z_i^2$. Then, to decide copositivity of $M$, we can equivalently study the global nonnegativity of the quartic form given by

$$P(z) := x^T M x = \sum_{i,j} m_{ij} z_i^2 z_j^2.$$

It is easy to verify that $M$ is copositive if and only if the form $P(z)$ is nonnegative, i.e., $P(z) \geq 0$ for all $z \in \mathbb{R}^n$. This shows that we can indeed identify the cone $\mathcal{C}_n$ of copositive matrices with a particular slice of the cone of nonnegative quartic forms $P_{n,4}$.
How can we produce good approximations to $\mathcal{C}_n$? Based on the characterization given earlier, it should be clear that an obvious sufficient condition for $M$ to be copositive is that $P(z)$ be a sum of squares. Due to the special structure of the polynomial, this condition can be interpreted directly in terms of the matrix $M$.

Lemma 3.164. The form $P(z)$ is a sum of squares if and only if $M$ can be written as the sum of a positive semidefinite matrix and a nonnegative matrix, i.e.,

$$M = P + N, \qquad P \succeq 0, \qquad N_{ij} \geq 0 \;\text{ for } i \neq j$$

(without loss of generality, we can take $N_{ii} = 0$). If this holds, then $M$ is copositive.
The condition in Lemma 3.164 is only sufficient for copositivity. A well-known example showing this is the matrix

$$H = \begin{bmatrix} 1 & -1 & 1 & 1 & -1 \\ -1 & 1 & -1 & 1 & 1 \\ 1 & -1 & 1 & -1 & 1 \\ 1 & 1 & -1 & 1 & -1 \\ -1 & 1 & 1 & -1 & 1 \end{bmatrix}. \qquad (3.41)$$

This matrix, originally introduced by A. Horn, is copositive even though it does not satisfy the P + N condition of Lemma 3.164.
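Testing the P + N condition is itself a small semidefinite feasibility problem. The following sketch (our own illustration, not from the text; it assumes the Python package cvxpy) confirms that the Horn matrix admits no such decomposition:

import numpy as np
import cvxpy as cp

H = np.array([[ 1, -1,  1,  1, -1],
              [-1,  1, -1,  1,  1],
              [ 1, -1,  1, -1,  1],
              [ 1,  1, -1,  1, -1],
              [-1,  1,  1, -1,  1]])

# Feasibility of H = P + N with P PSD and N entrywise nonnegative,
# i.e., membership in the cone K_0 of Lemma 3.164.
P = cp.Variable((5, 5), PSD=True)
N = cp.Variable((5, 5), symmetric=True)
prob = cp.Problem(cp.Minimize(0), [P + N == H, N >= 0])
prob.solve()
print(prob.status)  # 'infeasible': the Horn matrix is not in K_0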
This motivates the definition of a natural hierarchy of approximations to the copositive cone [89, 35]. Consider the family of $2(r+2)$-forms given by

$$P_r(z) = \left(\sum_{i=1}^{n} z_i^2\right)^{\!r} P(z), \qquad (3.42)$$

and define the cones $\mathcal{K}_r = \{M \in \mathcal{S}^n : P_r(z) \text{ is sos}\}$ (for simplicity, we omit the dependence on $n$). It is easy to see that if $P_r$ is a sum of squares, then $P_{r+1}$ is also a sum of squares. The converse proposition, however, does not necessarily hold; i.e., $P_{r+1}$ could be a sum of squares even if $P_r$ is not. Additionally, if $P_r(z)$ is nonnegative, then so is $P(z)$. Thus, by testing whether $P_r(z)$ is a sum of squares, we can guarantee the nonnegativity of $P(z)$ and, as a consequence, the copositivity of $M$. This yields the hierarchy of inclusions

$$\mathcal{S}^n_+ + \mathcal{N}^n = \mathcal{K}_0 \subseteq \mathcal{K}_1 \subseteq \cdots \subseteq \mathcal{K}_r \subseteq \cdots \subseteq \mathcal{C}_n, \qquad (3.43)$$

where $\mathcal{N}^n$ denotes the symmetric matrices with nonnegative entries and (abusing notation) the first equality expresses the statement of Lemma 3.164. The containment between these cones is in general strict. For instance, the Horn matrix presented in (3.41) is not in $\mathcal{K}_0$, but it is in $\mathcal{K}_1$; see Exercise 3.170.

Clearly, this hierarchy gives computable conditions that are at least as powerful as the P + N test of Lemma 3.164. But how conservative is this procedure? Does it approximate the copositive cone $\mathcal{C}_n$ to arbitrary precision? It follows from our discussion of Pólya's theorem in Section 3.2.6 that for any strictly copositive matrix $M$, there is a finite $r$ for which $M \in \mathcal{K}_r$. However, the minimum $r$ cannot be chosen as a constant (uniformly over all strictly copositive matrices). In general, the known lower bounds for $r$ usually involve a condition number for the form $P(z)$: the minimum $r$ grows as the form tends to degeneracy (nontrivial solutions). This is consistent with the computational complexity results mentioned earlier: if the value of $r$ were uniformly bounded above, then we could always produce a polynomial-time certificate for copositivity (namely, an sos decomposition of $P_r(z)$), contradicting NP ≠ co-NP.
Circulant copositive matrices. In general, particularly in high dimensions, the geometry of the copositive cone is very complicated. As such, it is often useful to consider low-dimensional sections, where we can gain some intuition and understanding. A nice case, which we analyze next, is the case of circulant (or cyclic) matrices.

An $n \times n$ matrix is circulant if its $(i, j)$ entry depends only on $|i - j| \bmod n$. We denote the subspace of $n \times n$ circulant matrices by $\mathcal{O}_n$. For the case of $n = 5$, we provide below a complete characterization of the circulant copositive matrices and the associated relaxations. A general $5 \times 5$ circulant matrix has the form

$$M(a, b, c) = \begin{bmatrix} a & b & c & c & b \\ b & a & b & c & c \\ c & b & a & b & c \\ c & c & b & a & b \\ b & c & c & b & a \end{bmatrix}. \qquad (3.44)$$

For circulant matrices, the second relaxation $\mathcal{K}_1$ will be enough to recognize copositivity, i.e., $\mathcal{C}_5 \cap \mathcal{O}_5 = \mathcal{K}_1 \cap \mathcal{O}_5$. Notice that if $a = 0$, then all the other elements must be nonnegative. For later reference, we define the constant

$$\eta = (1 + \sqrt{5})/4 \approx 0.809.$$
Theorem 3.165. Consider a circulant matrix $M = M(a, b, c)$ as in (3.44). Then the following hold.

1. The matrix $M$ is in $\mathcal{K}_0$ if and only if

$$a \geq 0, \qquad \eta a + b \geq 0, \qquad \eta a + c \geq 0, \qquad a + 2b + 2c \geq 0.$$

2. The matrix $M$ is in $\mathcal{K}_1$ if and only if

$$a \geq 0, \qquad a + b \geq 0, \qquad a + c \geq 0, \qquad a + 2b + 2c \geq 0,$$

and, in addition, if $b < 0$, then $ac \geq 2b^2 - a^2$, and if $c < 0$, then $ab \geq 2c^2 - a^2$.

3. Furthermore, if $M$ is copositive, then it is in $\mathcal{K}_1$.

[Figure 3.12. The convex cone of $5 \times 5$ circulant copositive matrices (3.44) and its inner sos approximation $\mathcal{K}_0$. This plot corresponds to a compact section of the cone where $a + b + c = 1$. This cone is not polyhedral, as parts of the boundary are described by quadratic inequalities; see Theorem 3.165.]
Notice that, for this example, the set $\mathcal{K}_0 \cap \mathcal{O}_5$ is basic semialgebraic (in fact, polyhedral), but $\mathcal{K}_1 \cap \mathcal{O}_5 = \mathcal{C}_5 \cap \mathcal{O}_5$ is not basic semialgebraic. These sets are presented in Figure 3.12. Notice that the Horn matrix $H = M(1, -1, 1)$ presented in (3.41) corresponds to the extreme point at $b = -1$, $c = 1$.
For general matrices (even $5 \times 5$!), the situation is not nearly as nice as the slice described above may lead us to believe. In fact, the following is true.

Theorem 3.166. Consider the set $\mathcal{C}_5$ of copositive $5 \times 5$ matrices. There is no finite value of $r$ for which $\mathcal{C}_5 = \mathcal{K}_r$.

In fact, it is not yet known whether the set of $5 \times 5$ copositive matrices $\mathcal{C}_5$ is a projected spectrahedron.
Exercise 3.167. Show that the set of copositive matrices $\mathcal{C}_n$ is a proper cone (i.e., closed, convex, pointed, and solid).

Exercise 3.168. A matrix $A \in \mathcal{S}^n$ is completely positive if $A = VV^T$ for some nonnegative matrix $V \in \mathbb{R}^{n \times k}_+$, i.e.,

$$A = \sum_{i=1}^{k} v_i v_i^T,$$

where $v_i \in \mathbb{R}^n_+$ are the columns of $V$, and hence nonnegative vectors.

1. Show that the set $\mathcal{B}_n$ of completely positive matrices is a proper cone.

2. Show that $\mathcal{B}_n$ and $\mathcal{C}_n$ are dual cones.

3. Give explicit examples of matrices in the interior of the cones $\mathcal{C}_n$ and $\mathcal{B}_n$.
Exercise 3.169. Prove Lemma 3.164.

Exercise 3.170. Prove that the Horn matrix (3.41) is copositive by finding an sos certificate of the nonnegativity of $x^T H x$ on the nonnegative orthant.
Exercise 3.171. An alternative (and perhaps more natural) interpretation of the approximations (3.43) can be obtained by rewriting the sos certificates directly in terms of the variables $x_i$. In this case, we have that $M \in \mathcal{K}_0$ if and only if

$$x^T M x = x^T P x + \sum_{i \neq j} n_{ij}\, x_i x_j,$$

where $P \succeq 0$ and $n_{ij} \geq 0$ (P + N decomposition). Similarly, $M \in \mathcal{K}_1$ if and only if

$$\left(\sum_{i=1}^{n} x_i\right)(x^T M x) = \sum_{i=1}^{n} x_i\, (x^T Q_i x) + \sum_{i \neq j \neq k} \lambda_{ijk}\, x_i x_j x_k,$$

where $Q_i \succeq 0$ and $\lambda_{ijk} \geq 0$.

Prove the correctness of these statements, and explain why these representations directly prove copositivity of $M$. What is the relationship between these expressions and Schmüdgen-type certificates of nonnegativity?
Exercise 3.172. Explain how to use the semidefinite relaxations $\mathcal{K}_r$ of the copositive cone $\mathcal{C}_n$ to give outer approximations to the cone $\mathcal{B}_n$ of completely positive matrices. In particular, provide explicit SDP characterizations of the first two levels of the hierarchy.

3.6.2 Lyapunov Functions

As we have seen, reformulating conditions for a polynomial to be a sum of squares in terms of semidefinite programming is very useful, since we can use the sos property as a convenient sufficient condition for polynomial nonnegativity. In the context of dynamical systems and control theory, there has been much work applying the sos approach to the problem of finding Lyapunov functions for nonlinear systems [89, 87].

The basic framework of Lyapunov functions was introduced in Section 2.2.1 of the previous chapter for the case of linear systems. The main difference is that now we will allow our system of differential equations to be nonlinear. This approach
i
makes possible searching over affinely parametrized polynomial or rational Lyapunov functions for systems with dynamics of the form

$$\dot{x}_i(t) = f_i(x(t)) \qquad \text{for } i = 1, \dots, n, \qquad (3.45)$$

where the functions $f_i$ are polynomials or rational functions. Recall that, for a system to be globally asymptotically stable, it is sufficient to prove the existence of a Lyapunov function that satisfies

$$V(x) > 0, \qquad \dot{V}(x) = \left(\frac{\partial V}{\partial x}\right)^{\!T} f(x) < 0$$

for all $x \in \mathbb{R}^n \setminus \{0\}$, where without loss of generality we have assumed that the dynamical system (3.45) has an equilibrium at the origin (see, e.g., [67]).

As mentioned earlier, we will consider candidate Lyapunov functions that are polynomials (or rational functions). Since polynomial nonnegativity is computationally hard, we will instead impose that the candidate Lyapunov function $V(x)$ and its Lie derivative $\dot{V}(x)$ both satisfy the (possibly stronger) conditions:⁵

$$V(x) \text{ is sos}, \qquad -\dot{V}(x) = -\left(\frac{\partial V}{\partial x}\right)^{\!T} f(x) \text{ is sos}.$$
Parametrizing a candidate Lyapunov function (e.g., by considering all possible polynomials of degree less than or equal to $2d$), the conditions given above can be expressed as sos constraints in terms of the coefficients of the Lyapunov function. Since both conditions are affine in the coefficients of $V(x)$, using the techniques described earlier in this chapter, these can be easily transformed into a standard semidefinite optimization formulation.

As an example, consider the following nonlinear dynamical system that corresponds to the Moore-Greitzer model of a jet engine with stabilizing feedback operating in the no-stall mode (see, e.g., [71]). The dynamic equations take the form

$$\dot{x} = -y - \frac{3}{2}x^2 - \frac{1}{2}x^3, \qquad \dot{y} = 3x - y. \qquad (3.46)$$

Using SOSTOOLS [101], we easily find a Lyapunov function that is a polynomial of degree 6. The trajectories of the nonlinear system, and the level sets of the found Lyapunov function, are shown in Figure 3.13. Notice that, as expected, $V(x)$ monotonically decreases along trajectories, and thus all trajectories move from larger to smaller level sets of the Lyapunov function for all possible initial conditions.

Similar approaches have been developed for much more complicated problems in systems and control theory. Among others, these include finding Lyapunov functionals for nonpolynomial, time-delayed, stochastic, uncertain, or hybrid systems; see, e.g., [87, 88, 100, 44] and the references therein.

⁵The strict positivity requirement can be easily handled, either by including a strictly positive term, or by relying on the fact that SDP solvers usually compute strictly feasible solutions.

[Figure 3.13. Trajectories of the nonlinear dynamical system (3.46) and level sets of a Lyapunov function found using sos techniques.]
Exercise 3.173. Consider the polynomial dynamical system

$$\dot{x} = -x + (1 + x)y, \qquad \dot{y} = -(1 + x)x.$$

Find a polynomial Lyapunov function of degree 4 that proves global asymptotic stability.

Exercise 3.174. Consider the polynomial dynamical system

$$\dot{x} = -x + xy, \qquad \dot{y} = -y.$$

1. Show that this system is globally asymptotically stable by considering the (nonpolynomial) Lyapunov function $V(x, y) = \ln(1 + x^2) + y^2$.

2. Using SOSTOOLS (or other software), try to find a polynomial Lyapunov function. Explain your success or failure.
Remark 3.175. The example in the previous exercise is from [2]. Although polynomial Lyapunov functions may fail to exist if global asymptotic stability is desired, it is known that locally exponentially stable polynomial vector fields always have polynomial Lyapunov functions on compact sets; see, e.g., [95]. A suitable modification of the method explained in this section can be used to establish stability for a given compact set of initial conditions.

3.6.3 Probability Bounds

Two of the most useful results in basic probability theory are the classic Markov and Chebyshev inequalities. Markov's inequality states that if $X$ is a nonnegative scalar random variable, then, for all $a > 0$,

$$\mathbb{P}(X \geq a) \leq \frac{E[X]}{a}. \qquad (3.47)$$

Similarly, Chebyshev's inequality says that for any random variable $X$ with mean $\mu$ and variance $\sigma^2$, we have

$$\mathbb{P}(|X - \mu| \geq a) \leq \frac{\sigma^2}{a^2}. \qquad (3.48)$$

In fact, Chebyshev's inequality is just Markov's applied to the nonnegative random variable $(X - \mu)^2$.

Both inequalities can be interpreted as producing bounds on the probability of certain events, given partial information about the random variable $X$ expressed in terms of its moments (only the first moment in Markov's, and the first and second moments in Chebyshev's).

In this section we describe an important application of polynomial inequalities in probability theory, namely, a technique to generalize Chebyshev-type inequalities to the case of more general events and moment information. For simplicity, we consider only the univariate case; the extensions to the multivariate case are quite straightforward. We refer the reader to [16] for background, extensions, applications, and more details.
The statement of the problem is the following: let $X$ be a scalar random variable with an unknown probability distribution supported on the set $\Omega \subseteq \mathbb{R}$, and for which we know its first $d+1$ moments $(\mu_0, \dots, \mu_d)$, where $\mu_k = E[X^k]$. The goal is to find bounds on the probability of an event $S \subseteq \Omega$; i.e., we want to bound $\mathbb{P}(X \in S)$. For simplicity, we assume $S$ and $\Omega$ are given intervals.

As we shall see next, we can obtain bounds on this probability via convex optimization. Let $p(x) := \sum_{k=0}^{d} c_k x^k$ be a univariate polynomial, and consider the following optimization problem in the decision variables $c_k$:

$$\text{minimize } E[p(X)] \quad \text{subject to} \quad \begin{cases} p(x) \geq 1 & \forall x \in S, \\ p(x) \geq 0 & \forall x \in \Omega, \end{cases} \qquad (3.49)$$

or, equivalently,

$$\text{minimize } \sum_{k=0}^{d} c_k \mu_k \quad \text{subject to} \quad \begin{cases} \sum_{k=0}^{d} c_k x^k \geq 1 & \forall x \in S, \\ \sum_{k=0}^{d} c_k x^k \geq 0 & \forall x \in \Omega. \end{cases} \qquad (3.50)$$

Notice that when $\Omega$ and $S$ are (unions of) univariate intervals, it follows from the characterizations given in Section 3.3.1 that this is an sos optimization program of the form discussed in Section 3.1.7.


We claim that any feasible solution of (3.49) gives a valid upper bound on
P(X S). To see this, notice if 1S (x) is the indicator function of the set S (i.e., it
is equal to 1 if x S and 0 otherwise); the constraints in (3.49) imply the inequality
1S (x) p(x) for all x . It then follows that
,
,
P(X S) =
1S (x) dP(x)
p(x) dP(x) = E[p(X)].

In simpler terms, these bounds work by approximating (from above, in the case
of upper bounds) the indicator function of the event S by a polynomial. Since we
know the moments of X, we can compute in closed form the expectation of this
polynomial. By optimizing over the coecients ck , we nd the best polynomial
approximation of the indicator function and thus the best upper bound provable by
this method.
Essentially the same techniques apply to much more complicated situations (e.g., the multivariate case, partial moment information, martingale inequalities, etc.). For a detailed treatment, see [16, 17] and the references therein.
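To make the reduction concrete, here is a minimal sketch of (3.50) for degree d = 2, written in Python with the cvxpy package (an assumption; any SDP modeling tool would do). It uses the data of Exercise 3.177 below, Ω = [0, 5], S = [4, 5], and moments (μ_0, μ_1, μ_2) = (1, 1, 3/2), and encodes nonnegativity of a quadratic on an interval [a, b] exactly via the classical representation p(x) = s(x) + (x − a)(b − x) t with s(x) sos and t ≥ 0, matching coefficients by hand.

    import cvxpy as cp
    import numpy as np

    mu = np.array([1.0, 1.0, 1.5])      # moments E[X^0], E[X^1], E[X^2]
    c = cp.Variable(3)                  # p(x) = c0 + c1 x + c2 x^2

    Q = cp.Variable((2, 2), PSD=True)   # Gram matrix of the sos part on S
    t = cp.Variable(nonneg=True)
    R = cp.Variable((2, 2), PSD=True)   # Gram matrix of the sos part on Omega
    u = cp.Variable(nonneg=True)

    constraints = [
        # p(x) - 1 = s0(x) + (x - 4)(5 - x) t   on S = [4, 5]
        c[0] - 1 == Q[0, 0] - 20 * t,   # constant term
        c[1] == 2 * Q[0, 1] + 9 * t,    # coefficient of x
        c[2] == Q[1, 1] - t,            # coefficient of x^2
        # p(x) = s1(x) + x (5 - x) u    on Omega = [0, 5]
        c[0] == R[0, 0],
        c[1] == 2 * R[0, 1] + 5 * u,
        c[2] == R[1, 1] - u,
    ]

    prob = cp.Problem(cp.Minimize(mu @ c), constraints)
    prob.solve()
    print("upper bound on P(X in S):", prob.value)

The optimal value is a valid upper bound on P(X ∈ S) for every distribution on [0, 5] with the given first two moments, and the optimal coefficients c recover the best degree-2 polynomial majorant of the indicator function.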
Exercise 3.176. Show that the Markov and Chebyshev bounds can be interpreted as closed-form solutions of (3.49) for specific sets Ω and S. What are the corresponding optimal polynomials p(x)?
Exercise 3.177. Assume that Ω = [0, 5], S = [4, 5], and the mean and variance of the random variable X are equal to 1 and 1/2, respectively. Give upper and lower bounds on P(X ∈ S). Are these bounds tight? Can you find the worst-case distributions?

3.6.4 Quantum Separability and Entanglement

The state of a finite-dimensional quantum system can be described in terms of a positive semidefinite Hermitian matrix, called the density matrix. A question of interest in quantum information theory is whether a given quantum state can be explained classically (i.e., purely in terms of probability theory) or whether the full power of quantum mechanics is needed. In what follows, we explain the core mathematical issues behind this question; see [85] for a detailed treatment of quantum information theory. For simplicity, we consider real symmetric matrices (as opposed to complex Hermitian ones) and use standard mathematical notation instead of the Dirac formulation used in physics.
Consider a symmetric, positive semidefinite matrix ρ with trace equal to one. We will refer to ρ as a density matrix. An important property of a bipartite quantum state is whether or not it is separable, which means that it can be written as a convex combination of tensor products of rank-one matrices, i.e.,

    ρ = ∑_i p_i (x_i x_i^T) ⊗ (y_i y_i^T),   p_i ≥ 0,   ∑_i p_i = 1.

Here x_i ∈ R^{n1}, y_i ∈ R^{n2}, and ρ ∈ S_+^{n1 n2}. By construction, the set of separable states is a convex set. If the state ρ is not separable, then it is said to be entangled.

The physical interpretation of a separable state corresponds to a probabilistic superposition (with probabilities given by the p_i), where one subsystem is in state x_i and the other subsystem is in state y_i. If no such decomposition is possible, then it is not possible to think of the two subsystems as being independent (even though they may be physically separated), and thus actions or measurements on one subsystem may affect the other (i.e., they are entangled).
The quantum separability or quantum entanglement question is the following: given the density matrix ρ of a quantum state, how do we decide whether ρ is entangled or not? If it is entangled (or separable), how can we certify this property? It has been shown by Gurvits that in general this is an NP-hard question [58].
As we shall see, quantum entanglement is intimately related to polynomial nonnegativity. A natural mathematical object to study in this context is the set of positive maps. These are the linear operators Λ : S^{n1} → S^{n2} that satisfy X ⪰ 0 ⇒ Λ(X) ⪰ 0; i.e., they map positive semidefinite matrices to positive semidefinite matrices. Notice that to any such Λ, we can associate a unique observable W ∈ S^{n1 n2} that satisfies y^T Λ(xx^T) y = (x ⊗ y)^T W (x ⊗ y). Furthermore, if Λ is a positive map, then the pairing between the observable W and any separable state ρ will always give a nonnegative number, since

    ⟨W, ρ⟩ = Tr(Wρ) = ∑_i p_i Tr( W (x_i ⊗ y_i)(x_i ⊗ y_i)^T )
           = ∑_i p_i (x_i ⊗ y_i)^T W (x_i ⊗ y_i) = ∑_i p_i y_i^T Λ(x_i x_i^T) y_i ≥ 0.

In other words, every positive map yields a separating hyperplane for the convex set of separable states. It can further be shown that every valid inequality corresponds to a positive map, so this yields, in fact, a complete characterization (and thus the sets of separable states and positive maps are dual to each other). For this reason, the observables W associated to positive maps are called entanglement witnesses.

The set of positive maps (and thus, of entanglement witnesses) can be exactly characterized in terms of multivariate polynomial nonnegativity, since a linear map Λ : S^{n1} → S^{n2} is positive if and only if the biquadratic form in n1 + n2 variables p(x, y) = y^T Λ(xx^T) y is nonnegative for all x, y (why?). Replacing nonnegativity with sos-based conditions, we can obtain a family of efficiently computable criteria that certify entanglement.
Concretely, given a state ρ for which we want to determine whether it is entangled, the first such test corresponds to the optimization problem of finding an entanglement witness W (or linear map Λ) such that

    ⟨W, ρ⟩ < 0,   y^T Λ(xx^T) y is sos.    (3.51)

Interestingly, this corresponds to the well-known positive partial transpose (PPT) separability criterion. The advantage of sos techniques is that stronger tests can be naturally derived by considering higher-order sos conditions. In particular, we have the parametrized family of tests

    ⟨W, ρ⟩ < 0,   (x^T x)^k · y^T Λ(xx^T) y is sos    (3.52)

for k ≥ 0, which obviously generalize (3.51) (corresponding to the case k = 0). It should be clear that these sos programs can be numerically solved using semidefinite programming. It can also be shown [40, 41] that this hierarchy is complete, in the sense that every entangled state is eventually certified by some value of k.

For more background and details about quantum entanglement and the separability problem, see [40, 41] and the references therein. It has recently been shown [22] that the sos-based algorithm described above can be used to provide a quasipolynomial-time algorithm for the quantum separability problem.
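Since the k = 0 test is equivalent to the PPT criterion, it can also be checked directly: compute the partial transpose of ρ and look for a negative eigenvalue. Below is a minimal numpy sketch; the two-qubit state used (a maximally entangled state mixed with white noise) is an assumption chosen purely for illustration.

    import numpy as np

    def partial_transpose(rho, d1, d2):
        # Transpose the second subsystem of a (d1*d2) x (d1*d2) matrix:
        # rho_{(i,j),(k,l)} -> rho_{(i,l),(k,j)}.
        r = rho.reshape(d1, d2, d1, d2)
        return r.transpose(0, 3, 2, 1).reshape(d1 * d2, d1 * d2)

    psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)   # (|00> + |11>)/sqrt(2)
    p = 0.5
    rho = p * np.outer(psi, psi) + (1 - p) * np.eye(4) / 4

    lam_min = np.linalg.eigvalsh(partial_transpose(rho, 2, 2)).min()
    print(lam_min)   # negative, so rho fails the PPT test and is entangled

For this family of states the partial transpose has a negative eigenvalue exactly when p > 1/3, so the printed value (−1/8 for p = 0.5) certifies entanglement.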
Exercise 3.178. Consider linear maps between symmetric matrices of the form Λ : S^{n1} → S^{n2}.

1. Show that any linear map of the form A ↦ ∑_i P_i^T A P_i, where P_i ∈ R^{n1 × n2}, is positive. These maps are known as decomposable maps.

2. Consider the polynomial defined by p(x, y) := y^T Λ(xx^T) y. Show that Λ is a positive map if and only if p(x, y) is nonnegative, and that Λ is a decomposable map if and only if p(x, y) is a sum of squares.

3. Show that the linear map ΛC : S^3 → S^3 (due to M.-D. Choi) given by

    ΛC : A ↦ [ 2a11 + a22      0               0
               0               2a22 + a33      0
               0               0               2a33 + a11 ]  −  A

is a positive map but is not decomposable.

4. Explain the relationship between this linear map and the Choi matrix discussed earlier in (3.20).

3.6.5 Geometric Theorem Proving

Many geometric statements can be reinterpreted, after a suitable coordinatization, in terms of algebraic inequalities. This opens up the possibility of proving theorems about geometric objects by characterizing the desired properties in terms of algebraic inequalities and then proving these inequalities through sos certificates. We give two concrete examples in what follows. The main value of these simple examples is to illustrate how the process of proving algebraic or geometric inequalities can be made fully algorithmic, and how the power of convex optimization can be brought to bear on these questions.
Schur's inequality. This is a classical inequality due to Schur stating that for nonnegative variables x, y, z we have

    S(x, y, z) := x^k (x − y)(x − z) + y^k (y − z)(y − x) + z^k (z − x)(z − y) ≥ 0,    (3.53)

where k ≥ 0 is an integer.

We give next a simple sos proof of this inequality for the case k = 1, easily obtainable via semidefinite programming. Define

    S1 = (1/2) v^T M v,   where v = (x² + yz, y² + xz, z² + xy)^T and M = [  2  −1  −1
                                                                            −1   2  −1
                                                                            −1  −1   2 ],

    S2 = yz(y − z)² + xz(x − z)² + xy(x − y)².

Since the matrix M in the expression above is positive semidefinite, it is clear that both S1 and S2 are nonnegative when x, y, z are nonnegative. We then have the easy-to-verify identity

    (x + y + z) S(x, y, z) = S1 + S2,

which clearly proves (3.53) for k = 1. Schur's inequality is closely related to the Robinson form, one of the first explicit examples of non-sos positive definite forms; see [29] and [106] for background and more details.
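The identity can be checked mechanically; here is a quick symbolic verification, assuming the sympy package is available.

    import sympy as sp

    x, y, z = sp.symbols('x y z')
    S = x*(x - y)*(x - z) + y*(y - z)*(y - x) + z*(z - x)*(z - y)
    v = sp.Matrix([x**2 + y*z, y**2 + x*z, z**2 + x*y])
    M = sp.Matrix([[2, -1, -1], [-1, 2, -1], [-1, -1, 2]])
    S1 = sp.Rational(1, 2) * (v.T * M * v)[0]
    S2 = y*z*(y - z)**2 + x*z*(x - z)**2 + x*y*(x - y)**2

    print(sp.expand((x + y + z)*S - S1 - S2))   # prints 0: the identity holds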
Ono's inequality. We present next an sos proof of a geometric inequality due to Ono. This example originally appeared in [117] as a benchmark problem for geometric theorem proving.

Consider a triangle with sides of length a, b, c, and denote its area by K. In 1914, T. Ono [86, 79] conjectured that the inequality

    (4K)⁶ ≥ 27 (a² + b² − c²)² (b² + c² − a²)² (c² + a² − b²)²    (3.54)

holds for all triangles. The statement was subsequently shown to be false in general [11] but proved to hold whenever the triangle in question is acute (all angles less than or equal to π/2) [12]. Using sos techniques, we will obtain a very concise proof.
For this, we can express the premise that the triangle be acute as the three polynomial inequalities

    t1 := a² + b² − c² ≥ 0,   t2 := b² + c² − a² ≥ 0,   t3 := c² + a² − b² ≥ 0.    (3.55)

It is well known (Heron's formula) that we can rewrite the square of the area K as a polynomial in a, b, c:

    K² = s(s − a)(s − b)(s − c),   s = (a + b + c)/2.

The question therefore reduces to verifying that (3.54) holds whenever the inequalities (3.55) are satisfied. A simple proof of Ono's inequality can then be found using the Positivstellensatz and sos methods: define the sos polynomial

    s(x, y, z) := (x⁴ + x²y² − 2y⁴ − 2x²z² + y²z² + z⁴)² + 15 (x − z)² (x + z)² (z² + x² − y²)².

We then have

    (4K)⁶ − 27 t1² t2² t3² = s(a, b, c) t1 t2 + s(c, a, b) t1 t3 + s(b, c, a) t2 t3,    (3.56)

therefore proving the inequality.
Another, more complicated, application of these techniques is given in [93]. In that paper, the subadditivity of a geometric quantity for triangles, expressible in terms of its side lengths and an angle, is proved via sos methods. The problem can be reduced to proving the nonnegativity of an explicit polynomial (3.57) in four variables on the unit box, where each variable lies between 0 and 1; we refer to [93] for the expression. As shown in [93], it is possible to obtain a concise certificate of its nonnegativity using sos methods.
As generalizations of the well-understood methods of semidefinite programming, sos techniques have proved remarkably powerful in the treatment of geometric problems. A nice example of this is the recent work of Bachoc and Vallentin [10], where the authors have developed improved bounds on kissing numbers. This is the classical question of how many identical n-dimensional nonoverlapping spheres can be simultaneously tangent to a given central sphere of the same radius. It is easy to see that in the plane this number is six (six disks, surrounding a central disk, in a hexagonal pattern), but determining this number in higher dimensions is a very difficult problem. By combining techniques from harmonic analysis and semidefinite programming (related to the symmetry reduction techniques discussed in Section 3.3.6), the authors of [10] have extended the classical association scheme approach to spherical codes (see, e.g., [37] and the references therein) to obtain the best available upper bounds on kissing numbers.
Sum of squares techniques can also be nicely interfaced with other, more general, methods for automated theorem proving. We refer the reader to the work of Harrison [60] for a discussion of these ideas, some of which have been implemented in the theorem prover HOL Light [61].

An interesting open research question is whether these algebraic proofs or sos certificates can be given natural geometric interpretations. For instance, as a concrete question, is there any intrinsic geometric meaning to the polynomial identity proof given in (3.56)?
Exercise 3.179 (Weitzenböck's inequality). Consider a triangle with side lengths equal to a, b, c and area equal to K. Give an sos proof of the inequality

    a² + b² + c² ≥ 4√3 K,

and show that 4√3 is the best possible constant.

Exercise 3.180 (Pedoe's inequality). Consider two triangles with side lengths equal to (a1, b1, c1) and (a2, b2, c2) and areas K1, K2, respectively. Give an sos proof of the inequality

    a1²(b2² + c2² − a2²) + b1²(c2² + a2² − b2²) + c1²(a2² + b2² − c2²) ≥ 16 K1 K2.

Is 16 the best possible constant? What happens if one of the triangles is equilateral?
Exercise 3.181. Prove that the polynomial (3.57) of [93] is nonnegative when all four variables lie in [0, 1]. Find an sos certificate of this fact.

3.6.6 Polynomial Games

The mathematical theory of games was developed to model and analyze strategic interactions among multiple decision makers with possibly conflicting objectives. Game theory has been successfully used in many domains, including economics, engineering, and biology. Standard modern references include [46, 82]. In this section we present an application of sos methods in game theory, initially described in [92].
We consider two-player zero-sum games where the payoffs are polynomial functions. This class of polynomial games was originally introduced and studied by Dresher, Karlin, and Shapley in 1950 [42]. In the basic setup there are two players (which we will denote as Player 1 and Player 2), who simultaneously and independently choose actions parametrized by real numbers x, y, respectively, in the interval [−1, 1]. The payoff associated with these choices is given by a polynomial function

    P(x, y) = ∑_{i=0}^n ∑_{j=0}^m p_ij x^i y^j    (3.58)

that assigns payments from Player 2 to Player 1. Thus, Player 1 wants to choose his strategy x to maximize P(x, y), while Player 2 tries to make this expression as small as possible. Players are allowed, and often it is in their interest, to choose their actions randomly according to specific probability distributions; these are called mixed strategies (the game of rock-paper-scissors is a simple example of this situation).

The solution concept of interest is called Nash equilibrium. This corresponds to a choice of strategies for both players for which no player has an incentive to deviate, assuming the other player keeps their strategy fixed. It is well known that for zero-sum games, this notion reduces to the simpler minimax or saddle-point equilibrium; see (3.60).
Example 3.182. Consider a polynomial game on [−1, 1] × [−1, 1] with payoff function given by P(x, y) = (x − y)². Since Player 2 wants to minimize her payoff, she should try to guess the number chosen by Player 1. Conversely, the first player should try to make his number as difficult to guess as possible (in the sense defined by P(x, y)). It is easy to see in this case that the optimal strategy for Player 1 is to randomize between x = −1 and x = 1 with equal probability, while the optimal strategy for Player 2 is to always choose y = 0. Assuming the other player keeps their strategy fixed, no player has an incentive to deviate from these strategies, and thus this yields an equilibrium, with the corresponding value of the game being equal to 1.
The question of interest is the following: given a game described by its payoff function P(x, y), how do we efficiently compute its equilibrium solution, i.e., the optimal strategies both players should use?
Recall that players can randomize over their choices, so their strategies will be described by probability measures μ and ν, respectively, supported on [−1, 1]. When considering mixed strategies, and similarly to the finite case, we need to consider the expressions

    max_μ min_ν E_{μ×ν}[P(x, y)]   and   min_ν max_μ E_{μ×ν}[P(x, y)],

where E_{μ×ν}[·] denotes the expectation under the product measure. We can rewrite these as bilinear expressions

    max_μ min_ν ∑_{i=0}^n ∑_{j=0}^m p_ij μ_i ν_j,   min_ν max_μ ∑_{i=0}^n ∑_{j=0}^m p_ij μ_i ν_j,    (3.59)

where μ_i, ν_j are the moments of the measures μ, ν, i.e.,

    μ_i := ∫_{−1}^{1} x^i dμ,   ν_j := ∫_{−1}^{1} y^j dν.
Recall from Section 3.5.4 that the moment spaces (i.e., the image of the probability measures under the moment map given above) are compact convex sets in R^{n+1} and R^{m+1}. Since the objective function in the problems (3.59) is bilinear, and the feasible sets are convex and compact, the minimax theorem (Theorem A.6 in Appendix A) can be used to show that these two quantities are equal. As a consequence, there exist measures μ*, ν* that satisfy the saddle-point condition

    ∑_{i=0}^n ∑_{j=0}^m p_ij μ_i ν*_j ≤ ∑_{i=0}^n ∑_{j=0}^m p_ij μ*_i ν*_j ≤ ∑_{i=0}^n ∑_{j=0}^m p_ij μ*_i ν_j    (3.60)

for all moment vectors μ, ν of probability measures on [−1, 1].
The key fact here is that, due to the separable structure of the payoffs, the optimal strategies can be characterized only in terms of their first m (or n) moments. Higher moments are irrelevant, at least in terms of the payoffs of the players.

From the previous discussion, we have the following result, essentially contained in [42].
Theorem 3.183. Consider the two-player zero-sum game on [−1, 1] × [−1, 1] with payoffs given by (3.58). Then the value of the game is well defined, and there exist optimal mixed strategies μ*, ν* satisfying a saddle-point condition. Furthermore, without loss of generality, the optimal measures can be taken to be discrete, with at most min(n, m) + 1 atoms.

The derivation and computation of the mixed strategies and the value of the game can be done as follows. We first characterize security strategies that provide a minimum guaranteed payoff. We can then invoke convex duality to prove that this actually yields the unique value of the game. Proceeding along these lines, by analogy to the finite case, a security strategy of Player 2 can be computed by solving

    minimize_{ν, γ}  γ   s.t.   γ ≥ E_ν[P(x, y)]  ∀x ∈ [−1, 1],
                                ∫_{−1}^{1} dν(y) = 1.    (3.61)

Indeed, if Player 2 plays the mixed strategy obtained from the solution of this problem, the best that Player 1 can do is to choose a value of x that maximizes E_ν[P(x, y)], thus limiting his gain (and Player 2's loss) to γ.
Since P(x, y) is a polynomial, this expectation can be equivalently written in terms of the first m moments of the measure ν, i.e.,

    E_ν[P(x, y)] = ∫_{−1}^{1} P(x, y) dν(y) = ∑_{i=0}^n ( ∑_{j=0}^m p_ij ν_j ) x^i.

Notice that this is a univariate polynomial in the action x of Player 1, with coefficients that depend affinely on the moments ν_j of the mixed strategy of Player 2.
Consider now the problem (3.61), but instead of writing it in terms of the decision variable ν (which is a probability measure), let us use instead the moments {ν_j}_{j=0}^m. The problem is then reduced to the minimization of the safety level γ, subject to the following conditions:

• The univariate polynomial γ − ∑_{i=0}^n ( ∑_{j=0}^m p_ij ν_j ) x^i is nonnegative on [−1, 1].

• The sequence {ν_j}_{j=0}^m is a valid moment sequence for a probability measure supported in [−1, 1].
We can rewrite this in a more compact form, as the optimization problem

    minimize_{γ, ν}  γ   s.t.   γ − ∑_{i=0}^n ∑_{j=0}^m p_ij ν_j x^i ∈ P_{1,n},
                                (ν_0, ..., ν_m) ∈ M_m,    (3.62)

where P_{1,n} is the set of univariate polynomials of degree n nonnegative on [−1, 1], and M_m is the set of first m + 1 moments of a probability measure with support on the same interval.

By the characterizations provided in earlier sections, it is clear that both of these conditions can be rewritten in terms of semidefinite programming and thus efficiently solved. Furthermore, using the procedure described in Section 3.5.5, the corresponding optimal mixed strategies can be obtained.
Example 3.184. Consider the guessing game discussed in Example 3.182. In this case, the decision variables (ν_0, ν_1, ν_2) are the moments of the mixed strategy of Player 2. To compute the optimal strategies, we must then solve (3.62), i.e.,

    minimize  γ
    s.t.      γ − (ν_0 x² − 2ν_1 x + ν_2) = s_0(x) + s_1 (1 − x²),
              [ ν_0  ν_1 ]
              [ ν_1  ν_2 ]  ⪰ 0,
              ν_0 − ν_2 ≥ 0,
              ν_0 = 1,

with s_0(x) sos and s_1 ≥ 0, where we have used the sos/semidefinite characterizations of univariate polynomials (Section 3.3.1) and moment constraints (Section 3.5.3) for the interval [−1, 1]. The optimal solution of this problem is γ = 1, (ν_0, ν_1, ν_2) = (1, 0, 0), s_0(x) = 0, and s_1 = 1. From this, the optimal strategies δ(y) for Player 2 and (1/2) δ(x − 1) + (1/2) δ(x + 1) for Player 1 directly follow.
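This small SDP is easy to set up directly; the following is a minimal sketch in Python with cvxpy (an assumption; any SDP front end would work), matching the coefficients of the polynomial identity by hand.

    import cvxpy as cp

    nu = cp.Variable(3)                 # moments (nu0, nu1, nu2) of Player 2's strategy
    gam = cp.Variable()                 # security level gamma
    Q = cp.Variable((2, 2), PSD=True)   # Gram matrix of s0(x)
    s1 = cp.Variable(nonneg=True)

    constraints = [
        # gamma - (nu0 x^2 - 2 nu1 x + nu2) = s0(x) + s1 (1 - x^2)
        gam - nu[2] == Q[0, 0] + s1,    # constant term
        2 * nu[1] == 2 * Q[0, 1],       # coefficient of x
        -nu[0] == Q[1, 1] - s1,         # coefficient of x^2
        # (nu0, nu1, nu2) is a moment sequence on [-1, 1]
        cp.bmat([[nu[0], nu[1]], [nu[1], nu[2]]]) >> 0,
        nu[0] - nu[2] >= 0,
        nu[0] == 1,
    ]
    cp.Problem(cp.Minimize(gam), constraints).solve()
    print(gam.value, nu.value)   # ~1 and ~(1, 0, 0): value 1, Player 2 plays y = 0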
Exercise 3.185. Consider a two-player game on [−1, 1] × [−1, 1] with payoff function given by

    P(x, y) = 5xy − 2x² − 2xy² − y.

Notice this function is neither convex nor concave. Formulate and solve the corresponding optimization problem to find the optimal solution of this game. Verify that the optimal strategies correspond to Player 1 always choosing x = 0.2, and Player 2 choosing y = 1 with probability 0.78 and y = −1 with probability 0.22.

3.7 Software Implementations

Despite the many advances in theoretical and modeling aspects of SDP and sos methods, much of their impact in applications has undoubtedly been a direct consequence of the efforts of many researchers in producing and making available good quality software implementations. In this section we give pointers to and briefly discuss some of the current computational tools for effectively formulating and solving SDP and sos programs.

Most SDP solvers (e.g., those described in Section 2.3.2) usually take as input either text files containing a problem description or directly the matrices (A_i, b, C) corresponding to the standard primal/dual formulation. This is often inconvenient at the initial modeling and solution stages. A more flexible approach is to formulate the problem using a more natural description, closer to its mathematical formulation, that can later be automatically translated to fit the requirements of each solver. For generic optimization problems, this has indeed been the approach of much of the operations research community, which has developed some well-known standard file formats, such as MPS, and optimization modeling languages like AMPL and GAMS. An important remark to keep in mind, much more critical in the SDP case than for linear optimization, is the extent to which the problem structure can be signaled to the solver.
For sos programs, as we have seen, the conversion process to an SDP formulation is algorithmic, and there are parsers that partially or fully automate this conversion task and can be used from within a problem-solving environment such as MATLAB. The software SOSTOOLS [101] is a free, third-party MATLAB toolbox for formulating and solving general sos programs. The related software GloptiPoly [62] is oriented toward global optimization problems and the associated moment problems. In their current versions, both use the SDP solver SeDuMi [118] for numerical computations. Other possibilities include YALMIP [74], a very complete modeling language for convex and nonconvex optimization that includes several sos/moment features, as well as the more specialized toolbox SPOT [78], oriented toward problems in systems and control theory. An interesting new addition to this area is the MATLAB toolbox NCSOStools [25], which specializes in sums of squares in noncommuting variables, a topic that will be discussed extensively in Chapter 8. Any of these parsers can make formulating and solving sos programs a much simpler and more enjoyable task than manual, error-prone methods.

Bibliography

[1] M. Abramowitz and I. A. Stegun, eds. Handbook of Mathematical Functions. Dover, New York, 1964.

[2] A. A. Ahmadi, M. Krstic, and P. A. Parrilo. A globally asymptotically stable polynomial vector field with no polynomial Lyapunov function. In Proceedings of the 50th IEEE Conference on Decision and Control, IEEE, Washington, DC, 2011.

[3] A. A. Ahmadi, A. Olshevsky, P. A. Parrilo, and J. N. Tsitsiklis. NP-hardness of deciding convexity of quartic polynomials and related problems. Mathematical Programming, 124, 2011.

[4] A. A. Ahmadi and P. A. Parrilo. A complete characterization of the gap between convexity and sos-convexity. Mathematical Programming, to appear. arXiv:1111.4587, 2011.

[5] A. A. Ahmadi and P. A. Parrilo. A convex polynomial that is not sos-convex. Mathematical Programming, 135:275–292, 2012.

[6] N. I. Akhiezer. The Classical Moment Problem. Hafner Publishing Company, New York, 1965.

[7] C. Andradas. Characterization and description of basic semialgebraic sets. In Algorithmic and Quantitative Real Algebraic Geometry (Piscataway, NJ, 2001), DIMACS Ser. Discrete Math. Theoret. Comput. Sci. 60, Amer. Math. Soc., Providence, RI, 2003, pp. 1–12.

[8] C. Andradas and J. M. Ruiz. Ubiquity of Łojasiewicz's example of a nonbasic semialgebraic set. The Michigan Mathematical Journal, 41:465–472, 1994.

[9] E. M. Aylward, S. M. Itani, and P. A. Parrilo. Explicit SOS decomposition of univariate polynomial matrices and the Kalman-Yakubovich-Popov lemma. In Proceedings of the 46th IEEE Conference on Decision and Control, IEEE, Washington, DC, 2007.

[10] C. Bachoc and F. Vallentin. New upper bounds for kissing numbers from semidefinite programming. J. Amer. Math. Soc., 21:909–924, 2008.

[11] F. Balitrand. Problem 4417. Intermed. Math., 22:66, 1915.

[12] F. Balitrand. Problem 4417. Intermed. Math., 23:86–87, 1916.

[13] S. Basu, R. Pollack, and M.-F. Roy. Algorithms in Real Algebraic Geometry, Algorithms and Computation in Mathematics 10, Springer-Verlag, Berlin, 2003.

[14] A. Ben-Tal, L. El Ghaoui, and A. S. Nemirovski. Robust Optimization. Princeton University Press, Princeton, NJ, 2009.

[15] A. Ben-Tal and A. Nemirovski. Lectures on Modern Convex Optimization. MPS/SIAM Series on Optimization 2. SIAM, Philadelphia, 2001.

[16] D. Bertsimas and I. Popescu. Optimal inequalities in probability theory: A convex optimization approach. SIAM J. Optim., 15:780–804, 2005.

[17] D. Bertsimas and J. Sethuraman. Moment problems and semidefinite optimization. In Handbook of Semidefinite Programming, H. Wolkowicz, R. Saigal, and L. Vandenberghe, eds., Springer, New York, 2000, pp. 469–509.

[18] S. P. Bhattacharyya, H. Chapellat, and L. H. Keel. Robust Control: The Parametric Approach. Prentice-Hall, Englewood Cliffs, NJ, 1995.

[19] J. Bochnak, M. Coste, and M.-F. Roy. Real Algebraic Geometry. Springer, New York, 1998.

[20] J. M. Borwein and H. Wolkowicz. Facial reduction for a cone-convex programming problem. J. Austral. Math. Soc. Ser. A, 30:369–380, 1980.

[21] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan. Linear Matrix Inequalities in System and Control Theory, Stud. Appl. Math. 15. SIAM, Philadelphia, 1994.

[22] F. G. S. L. Brandão, M. Christandl, and J. Yard. A quasipolynomial-time algorithm for the quantum separability problem. In Proceedings of the 43rd Annual ACM Symposium on Theory of Computing, ACM, New York, 2011, pp. 343–352.

[23] C. W. Brown. QEPCAD: Quantifier Elimination by Partial Cylindrical Algebraic Decomposition, 2003. Available from www.cs.usna.edu/~qepcad/B/QEPCAD.html.

[24] S. R. Buss and T. Pitassi. Good degree bounds on Nullstellensatz refutations of the induction principle. J. Comp. System Sci., 57:162–171, 1998.

[25] K. Cafuta, I. Klep, and J. Povh. NCSOStools: Computer Algebra System for Symbolic and Numerical Computation with Noncommutative Polynomials, 2011. Available at ncsostools.fis.unm.si.

[26] B. F. Caviness and J. R. Johnson, eds. Quantifier Elimination and Cylindrical Algebraic Decomposition, Texts and Monographs in Symbolic Computation, Springer-Verlag, Vienna, 1998.

[27] M. D. Choi. Positive semidefinite biquadratic forms. Linear Algebra Appl., 12:95–100, 1975.

[28] M. D. Choi, T. Y. Lam, and B. Reznick. Real zeros of positive semidefinite forms. I. Math. Z., 171:1–26, 1980.

[29] M. D. Choi, T. Y. Lam, and B. Reznick. Positive sextics and Schur's inequalities. J. Algebra, 141:36–77, 1991.

[30] M. D. Choi, T. Y. Lam, and B. Reznick. Sums of squares of real polynomials. Proceedings of Symposia in Pure Mathematics, 58:103–126, 1995.

[31] R. W. Cottle, J. S. Pang, and R. E. Stone. The Linear Complementarity Problem. Academic Press, New York, 1992.

[32] D. A. Cox, J. B. Little, and D. O'Shea. Ideals, Varieties, and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra. Springer, New York, 1997.

[33] D. A. Cox, J. B. Little, and D. O'Shea. Using Algebraic Geometry, Grad. Texts in Math. 185. Springer-Verlag, New York, 1998.

[34] R. E. Curto and L. A. Fialkow. Solution of the Truncated Complex Moment Problem for Flat Data. Mem. Amer. Math. Soc. 568. AMS, Providence, RI, 1996.

[35] E. de Klerk and D. V. Pasechnik. Approximating the stability number of a graph via copositive programming. SIAM J. Optim., 12:875–892, 2002.

[36] J. A. De Loera, J. Lee, S. Margulies, and S. Onn. Expressing combinatorial problems by systems of polynomial equations and Hilbert's Nullstellensatz. Combin. Probab. Comput., 18:551, 2009.

[37] P. Delsarte and V. I. Levenshtein. Association schemes and coding theory. IEEE Trans. Inform. Theory, 44:2477–2504, 1998.

[38] H. Derksen and G. Kemper. Computational Invariant Theory, Encyclopaedia Math. Sci. 130. Springer, Berlin, 2002.

[39] L. Devroye. Nonuniform Random Variate Generation. Springer-Verlag, New York, 1986.

[40] A. C. Doherty, P. A. Parrilo, and F. M. Spedalieri. Distinguishing separable and entangled states. Phys. Rev. Lett., 88:187904, 2002.

[41] A. C. Doherty, P. A. Parrilo, and F. M. Spedalieri. Complete family of separability criteria. Phys. Rev. A, 69:022308, 2004.

[42] M. Dresher, S. Karlin, and L. S. Shapley. Polynomial games. In Contributions to the Theory of Games, Ann. Math. Stud. 24, Princeton University Press, Princeton, NJ, 1950, pp. 161–180.

[43] G. E. Dullerud and F. Paganini. A Course in Robust Control Theory: A Convex Approach. Springer-Verlag, New York, 1999.

[44] C. Ebenbauer and F. Allgöwer. Analysis and design of polynomial control systems using dissipation inequalities and sum of squares. Comput. Chem. Engrg., 30:1590–1602, 2006.

[45] A. Fässler and E. Stiefel. Group Theoretical Methods and Their Applications. Birkhäuser, Basel, 1992.

[46] D. Fudenberg and J. Tirole. Game Theory. MIT Press, Cambridge, MA, 1991.

[47] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, New York, 1979.

[48] K. Gatermann and P. A. Parrilo. Symmetry groups, semidefinite programs, and sums of squares. J. Pure Appl. Algebra, 192:95–128, 2004.

[49] W. Gautschi. Orthogonal Polynomials: Computation and Approximation. Oxford University Press, Oxford, UK, 2004.

[50] Y. Genin, Y. Hachez, Yu. Nesterov, and P. Van Dooren. Optimization problems over positive pseudopolynomial matrices. SIAM J. Matrix Anal. Appl., 25:57–79, 2003.

[51] M. Giesbrecht, G. Labahn, and W. Lee. Symbolic-numeric sparse interpolation of multivariate polynomials. J. Symbol. Comput., 44:943–959, 2009.

[52] G. H. Golub and G. Meurant. Matrices, Moments and Quadrature with Applications. Princeton University Press, Princeton, NJ, 2009.

[53] M. Golubitsky, I. Stewart, and D. G. Schaeffer. Singularities and Groups in Bifurcation Theory II, Appl. Math. Sci. 69. Springer, New York, 1988.

[54] D. R. Grayson and M. E. Stillman. Macaulay 2: A Software System for Research in Algebraic Geometry. Available at http://www.math.uiuc.edu/Macaulay2.

[55] D. Grigoriev and N. Vorobjov. Complexity of null- and Positivstellensatz proofs. Ann. Pure Appl. Logic, 113:153–160, 2002.

[56] M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization, 2nd ed., Algorithms and Combinatorics 2. Springer-Verlag, Berlin, 1993.

[57] J. Guckenheimer, M. Myers, and B. Sturmfels. Computing Hopf bifurcations I. SIAM J. Numer. Anal., 34:1–21, 1997.

[58] L. Gurvits. Classical deterministic complexity of Edmonds' problem and quantum entanglement. In STOC '03: Proceedings of the 35th Annual ACM Symposium on Theory of Computing, ACM, New York, 2003.

[59] G. H. Hardy, J. E. Littlewood, and G. Pólya. Inequalities. Cambridge University Press, Cambridge, UK, 1967.

[60] J. Harrison. Verifying nonlinear real formulas via sums of squares. In Proceedings of the 20th International Conference on Theorem Proving in Higher Order Logics, Springer-Verlag, New York, 2007, pp. 102–118.

[61] J. Harrison. The HOL Light Theorem Prover, 2011. Available at www.cl.cam.ac.uk/~jrh13/hol-light.

[62] D. Henrion and J.-B. Lasserre. GloptiPoly: Global Optimization over Polynomials with MATLAB and SeDuMi. Available from http://www.laas.fr/~henrion/software/gloptipoly/.

[63] D. Henrion and J.-B. Lasserre. Detecting global optimality and extracting solutions in GloptiPoly. In Positive Polynomials in Control, D. Henrion and A. Garulli, eds., Lecture Notes in Control and Inform. Sci. 312, Springer, New York, 2005, p. 581.

[64] E. L. Kaltofen, Z. Yang, and L. Zhi. A proof of the monotone column permanent (MCP) conjecture for dimension 4 via sums-of-squares of rational functions. In Proceedings of the 2009 Conference on Symbolic Numeric Computation, ACM, New York, 2009, pp. 65–70.

[65] E. L. Kaltofen, B. Li, Z. Yang, and L. Zhi. Exact certification in global polynomial optimization via sums-of-squares of rational functions with rational coefficients. J. Symbol. Comput., 2011.

[66] S. Karlin and L. Shapley. Geometry of Moment Spaces, Mem. Amer. Math. Soc. 12. AMS, Providence, RI, 1953.

[67] H. Khalil. Nonlinear Systems. Macmillan Publishing Company, New York, 1992.

[68] I. Klep and M. Schweighofer. Sums of hermitian squares and the BMV conjecture. J. Statist. Phys., 133:739–760, 2008.

[69] M. Kojima. Sums of Squares Relaxations of Polynomial Semidefinite Programs. Research report B-397, Dept. of Mathematical and Computing Sciences, Tokyo Institute of Technology, Tokyo, 2003.
[70] M. Kojima, S. Kim, and H. Waki. Sparsity in sums of squares of polynomials. Math. Program., 103:45–62, 2005.

[71] M. Krstic, I. Kanellakopoulos, and P. V. Kokotovic. Nonlinear and Adaptive Control Design. John Wiley & Sons, New York, 1995.

[72] J. B. Lasserre. Global optimization with polynomials and the problem of moments. SIAM J. Optim., 11:796–817, 2001.

[73] M. Laurent. Sums of squares, moment matrices and optimization over polynomials. In Emerging Applications of Algebraic Geometry, M. Putinar and S. Sullivant, eds., IMA Vol. Math. Appl. 149, Springer, New York, 2009, pp. 157–270.

[74] J. Löfberg. YALMIP: A toolbox for modeling and optimization in MATLAB. In Proceedings of the CACSD Conference, Taipei, Taiwan, 2004. Available at yalmip.org.

[75] J. Löfberg and P. A. Parrilo. From coefficients to samples: A new approach to SOS optimization. In Proceedings of the 43rd IEEE Conference on Decision and Control, IEEE, Washington, DC, 2004.

[76] A. Magnani, S. Lall, and S. Boyd. Tractable fitting with convex polynomials via sum-of-squares. In Proceedings of the 44th IEEE Conference on Decision and Control and the 2005 European Control Conference (CDC-ECC '05), IEEE, Washington, DC, 2005, pp. 1672–1677.

[77] J. E. Marsden and T. Ratiu. Introduction to Mechanics and Symmetry, 2nd ed., Texts Appl. Math. 17. Springer-Verlag, New York, 1999.

[78] A. Megretski. SPOT: Systems Polynomial Optimization Tools, 2010. MATLAB toolbox, available from web.mit.edu/ameg/www.

[79] D. S. Mitrinović, J. E. Pečarić, and V. Volenec. Recent Advances in Geometric Inequalities, Math. Appl. 28. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1989.

[80] T. S. Motzkin and E. G. Straus. Maxima for graphs and a new proof of a theorem of Turán. Canad. J. Math., 17:533–540, 1965.

[81] K. G. Murty and S. N. Kabadi. Some NP-complete problems in quadratic and nonlinear programming. Math. Program., 39:117–129, 1987.

[82] R. B. Myerson. Game Theory: Analysis of Conflict. Harvard University Press, Cambridge, MA, 1991.

[83] Y. Nesterov. Squared functional systems and optimization problems. In High Performance Optimization, J. B. G. Frenk, C. Roos, T. Terlaky, and S. Zhang, eds., Kluwer Academic Publishers, Dordrecht, The Netherlands, 2000, pp. 405–440.

[84] Y. E. Nesterov and A. Nemirovski. Interior Point Polynomial Methods in Convex Programming, Stud. Appl. Math. 13. SIAM, Philadelphia, 1994.

[85] M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, Cambridge, UK, 2000.

[86] T. Ono. Problem 4417. Intermed. Math., 21:146, 1914.

[87] A. Papachristodoulou and S. Prajna. On the construction of Lyapunov functions using the sum of squares decomposition. In Proceedings of the 41st IEEE Conference on Decision and Control, IEEE, Washington, DC, 2002.

[88] A. Papachristodoulou and S. Prajna. Robust stability analysis of nonlinear hybrid systems. IEEE Trans. Automat. Control, 54:1035–1041, 2009.

[89] P. A. Parrilo. Structured Semidefinite Programs and Semialgebraic Geometry Methods in Robustness and Optimization. Ph.D. thesis, California Institute of Technology, 2000. Available at resolver.caltech.edu/CaltechETD:etd-05062004-055516.

[90] P. A. Parrilo. An Explicit Construction of Distinguished Representations of Polynomials Nonnegative over Finite Sets. Technical Report AUT02-02, IfA, ETH Zürich, 2002. Available from www.mit.edu/~parrilo.

[91] P. A. Parrilo. Semidefinite programming relaxations for semialgebraic problems. Math. Program. Ser. B, 96:293–320, 2003.

[92] P. A. Parrilo. Polynomial games and sum of squares optimization. In Proceedings of the 45th IEEE Conference on Decision and Control, IEEE, Washington, DC, 2006.

[93] P. A. Parrilo and R. Peretz. An inequality for circle packings proved by semidefinite programming. Discrete Comput. Geom., 31:357–367, 2004.

[94] P. A. Parrilo and B. Sturmfels. Minimizing polynomial functions. In Algorithmic and Quantitative Real Algebraic Geometry, S. Basu and L. González-Vega, eds., DIMACS Ser. Discrete Math. Theoret. Comput. Sci. 60, AMS, Providence, RI, 2003. Available from arXiv:math.OC/0103170.

[95] M. M. Peet. Exponentially stable nonlinear systems have polynomial Lyapunov functions on bounded regions. IEEE Trans. Automat. Control, 54:979–987, 2009.

[96] H. Peyrl and P. A. Parrilo. A Macaulay2 package for computing sum of squares decompositions of polynomials with rational coefficients. In Proceedings of the 2007 International Workshop on Symbolic-Numeric Computation, ACM, New York, 2007, pp. 207–208.

[97] H. Peyrl and P. A. Parrilo. SOS.m2: A Sum of Squares Package for Macaulay 2, 2007. Available from www.control.ee.ethz.ch/~hpeyrl/index.php.
[98] H. Peyrl and P. A. Parrilo. Computing sum of squares decompositions with rational coefficients. Theoret. Comput. Sci., 409:269–281, 2008.

[99] I. Pólik and T. Terlaky. A survey of the S-lemma. SIAM Rev., 49:371–418, 2007.

[100] S. Prajna, A. Jadbabaie, and G. J. Pappas. Stochastic safety verification using barrier certificates. In Proceedings of the 43rd IEEE Conference on Decision and Control, IEEE, Washington, DC, 2004.

[101] S. Prajna, A. Papachristodoulou, and P. A. Parrilo. SOSTOOLS: Sum of squares optimization toolbox for MATLAB, 2002–2005. Available from www.cds.caltech.edu/sostools and www.mit.edu/~parrilo/sostools.

[102] M. Putinar. Positive polynomials on compact semi-algebraic sets. Indiana Univ. Math. J., 42:969–984, 1993.

[103] A. J. Quist, E. de Klerk, C. Roos, and T. Terlaky. Copositive relaxation for general quadratic programming. Optim. Methods Software, 9:185–208, 1998.

[104] B. Reznick. Extremal PSD forms with few terms. Duke Math. J., 45:363–374, 1978.

[105] B. Reznick. Uniform denominators in Hilbert's seventeenth problem. Math. Z., 220:75–97, 1995.

[106] B. Reznick. Some concrete aspects of Hilbert's 17th problem. In Real Algebraic Geometry and Ordered Structures, Contemp. Math. 253, AMS, Providence, RI, 2000, pp. 251–272.

[107] T. Roh and L. Vandenberghe. Discrete transforms, semidefinite programming, and sum-of-squares representations of nonnegative polynomials. SIAM J. Optim., 16:939–964, 2006.

[108] C. Scheiderer. Descending the Ground Field in Sums of Squares Representations, 2012. Preprint, available at arXiv:1209.2976.

[109] C. W. Scherer and C. W. J. Hol. Matrix sum-of-squares relaxations for robust semi-definite programs. Math. Program. Ser. B, 107:189–211, 2006.

[110] K. Schmüdgen. The K-moment problem for compact semialgebraic sets. Math. Ann., 289:203–206, 1991.

[111] J.-P. Serre. Linear Representations of Finite Groups. Springer-Verlag, New York, 1977.

[112] J. A. Shohat and J. D. Tamarkin. The Problem of Moments. Amer. Math. Soc. Math. Surveys 2. AMS, Providence, RI, 1943.

[113] N. Z. Shor. Class of global minimum bounds of polynomial functions. Cybernet., 23:731–734, 1987. (Russian orig.: Kibernetika, No. 6, 1987, 9–11.)

[114] G. Stengle. A Nullstellensatz and a Positivstellensatz in semialgebraic geometry. Math. Ann., 207:87–97, 1974.

[115] G. Stengle. Complexity estimates for the Schmüdgen Positivstellensatz. J. Complexity, 12:167–174, 1996.

[116] J. Stoer and R. Bulirsch. Introduction to Numerical Analysis, Texts Appl. Math. 12. Springer-Verlag, New York, 2002.

[117] A. Strzebonski. Solving algebraic inequalities. The Mathematica Journal, 7:525–541, 2000.

[118] J. Sturm, O. Romanko, and I. Pólik. SeDuMi version 1.3, 2010. MATLAB toolbox, available from sedumi.ie.lehigh.edu.

[119] B. Sturmfels. Algorithms in Invariant Theory, Texts Monogr. Symbol. Comput. 1. Springer, Wien, 1993.

[120] B. Sturmfels. Introduction to resultants. In Applications of Computational Algebraic Geometry (San Diego, CA, 1997), Proc. Sympos. Appl. Math. 53, AMS, Providence, RI, 1998, pp. 25–39.

[121] B. Sturmfels. Solving Systems of Polynomial Equations. AMS, Providence, RI, 2002.

[122] H. Väliaho. Criteria for copositive matrices. Linear Algebra Appl., 81:19–34, 1986.

[123] F. Vallentin. Symmetry in semidefinite programs. Linear Algebra Appl., 430:360–369, 2009.

[124] H. Waki, S. Kim, M. Kojima, and M. Muramatsu. Sums of squares and semidefinite program relaxations for polynomial optimization problems with structured sparsity. SIAM J. Optim., 17:218–242, 2006.

[125] K. Zhou, K. Glover, and J. C. Doyle. Robust and Optimal Control. Prentice Hall, Englewood Cliffs, NJ, 1995.
Chapter 4

Nonnegative Polynomials and Sums of Squares

Grigoriy Blekherman

A central question, for both practical and theoretical reasons, is how to efficiently test whether a polynomial p is nonnegative. We reformulate this problem in the following way: given a nonnegative polynomial p, how do we efficiently find a representation of p so that the nonnegativity of p is apparent from this representation? In other words, how do we efficiently represent p as an obviously nonnegative polynomial? Some polynomials are obviously nonnegative: if we can write p as a sum of squares of polynomials, then it is clear that p is nonnegative just from this presentation. Very importantly, if p is a sum of squares, then its sum of squares representation can be efficiently computed via semidefinite programming. This connection was described in detail in Chapter 3. As we will see, the set of sums of squares is a projected spectrahedron, while the set of nonnegative polynomials is far more challenging computationally. The main question for this chapter is: what is the relationship between nonnegative polynomials and sums of squares?

4.1 Introduction

Our story begins in 1885, when twenty-three-year-old David Hilbert was one of the examiners in the Ph.D. defense of twenty-one-year-old Hermann Minkowski. During the examination Minkowski claimed that there exist nonnegative polynomials that are not sums of squares. Although he did not provide an example or a proof, his argument must have been convincing, as he defended successfully.

Three years later Hilbert published a paper in which he classified all of the (few) cases, in terms of degree and number of variables, in which nonnegative polynomials are the same as sums of squares. In all other cases Hilbert showed that there exist nonnegative polynomials that are not sums of squares. Interestingly,
Hilbert did not provide an explicit example of such polynomials. The first explicit example was found only seventy years later and is due to Theodore Motzkin. In fact, Motzkin was not aware of what he had constructed: Olga Taussky-Todd, who was present during the seminar in which Motzkin described his construction, later notified him that he had found the first example of a nonnegative polynomial that is not a sum of squares [22].
We examine the relationship between nonnegativity and sums of squares in two different fundamental ways. We first consider the structures that prevent sums of squares from capturing all nonnegative polynomials, and show that equality occurs precisely when these structures are not present. We then examine in detail the smallest cases where there exist nonnegative polynomials that are not sums of squares and show that the inequalities separating nonnegative polynomials from sums of squares have a simple and elegant structure. Second, we look at the quantitative relationship between nonnegative polynomials and sums of squares. Here we show that when the degree is fixed and the number of variables grows, there are significantly more nonnegative polynomials than sums of squares. We also apply these ideas to studying the relationship between sums of squares and convex polynomials. While the techniques we develop for the two approaches are quite different in nature, the unifying theme is that we examine the sets of nonnegative polynomials and sums of squares geometrically. Algebraic geometry is at the forefront of our examination of the fundamental differences between nonnegative polynomials and sums of squares, while convex geometry and analysis are used to examine the quantitative relationship.
The chapter is structured as follows: after discussing Hilbert's theorem and Motzkin's example in Section 4.2, we begin a detailed examination of the underlying causes of the differences between nonnegative polynomials and sums of squares in Section 4.3. Along the way we will see that nonnegative polynomials and sums of squares form fascinating convex sets. Section 4.4 is devoted to the examination of these objects from the point of view of convex algebraic geometry. We note that many basic questions remain open.

The fundamental reasons for the existence of nonnegative polynomials that are not sums of squares come from Cayley–Bacharach theory in classical algebraic geometry, and, in fact, Hilbert's original proof of his theorem already used some of these ideas. We begin developing the necessary techniques in Section 4.5. Duality from convex geometry and its interplay with commutative algebra will play a central role in our investigation. Section 4.6 develops the duality ideas and presents a unified proof of the equality cases of Hilbert's theorem. Sections 4.7 and 4.8 investigate the smallest cases in which there exist nonnegative polynomials that are not sums of squares. We show that this situation fundamentally arises from the existence of Cayley–Bacharach relations and present some consequences.
We proceed by examining the quantitative relationship between nonnegative polynomials and sums of squares in Section 4.9. This is done by establishing bounds on the volume of the sets of nonnegative polynomials and sums of squares, and analytic aspects of convex geometry come to the fore in this examination. We will explain that if the degree is fixed and the number of variables is allowed to grow, then there are significantly more nonnegative polynomials than sums of squares [5].

This happens despite the difficulty of constructing explicit examples of nonnegative polynomials that are not sums of squares, and numerical evidence that sums of squares approximate nonnegative polynomials well if the degree and number of variables are small [19]. The question of precisely when nonnegative polynomials begin to significantly overtake sums of squares is currently poorly understood.

Section 4.10 presents an application of the volume ideas to showing that there exist homogeneous polynomials that are convex functions but are not sums of squares. There is no known explicit example of such a polynomial, and this is the only known method of showing their existence.

4.2 A Deeper Look

We first reduce the study of nonnegative polynomials and sums of squares to the case of homogeneous polynomials, which are also called forms. A polynomial p(x_1, ..., x_n) of degree d can be made homogeneous by introducing an extra variable x_{n+1} and multiplying every monomial in p by a power of x_{n+1}, so that all monomials have the same degree. More formally, let p̄ be the homogenization of p:

    p̄ = x_{n+1}^d · p( x_1/x_{n+1}, ..., x_n/x_{n+1} ).
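As a small illustration, sympy's Poly.homogenize implements exactly this operation (a sketch; the particular polynomial chosen is an arbitrary assumption).

    import sympy as sp

    x, y, z = sp.symbols('x y z')
    p = sp.Poly(x**2*y + 3*x + 1, x, y)   # degree 3, nonhomogeneous
    p_bar = p.homogenize(z)               # pad each monomial with powers of z
    print(p_bar.as_expr())                # x**2*y + 3*x*z**2 + z**3

Setting z = 1 in p̄ recovers p, which is the dehomogenization discussed next.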

Exercise 4.1. Let p be a nonnegative polynomial. Show that p̄ is a nonnegative form. Also show that if p is a sum of squares, then p̄ is a sum of squares as well.
Given a form p we can dehomogenize it by setting x_{n+1} = 1. Dehomogenization clearly preserves nonnegativity and sums of squares. Therefore the study of nonnegative polynomials and sums of squares in n variables is equivalent to studying forms in n + 1 variables. From now on we restrict ourselves to the case of forms.
Let R[x]_d be the vector space of real forms in n variables of degree d. In order to be nonnegative a form must have even degree, and therefore our forms will have even degree 2d. Inside R[x]_{2d} sit two closed convex cones: the cone of nonnegative polynomials,

    P_{n,2d} = { p ∈ R[x]_{2d} | p(x) ≥ 0 for all x ∈ Rⁿ },

and the cone of sums of squares,

    Σ_{n,2d} = { p ∈ R[x]_{2d} | p = ∑_i q_i² for some q_i ∈ R[x]_d }.
Exercise 4.2. Show that P_{n,2d} and Σ_{n,2d} are closed, full-dimensional convex cones in R[x]_{2d}. (Hint: Consider Exercise 4.17.)
We now come to the first major theorem concerning nonnegative polynomials and sums of squares.

4.2.1 Hilbert's Theorem

The first fundamental result about the relationship between P_{n,2d} and Σ_{n,2d} was shown by Hilbert in 1888.

Theorem 4.3. Nonnegative forms are the same as sums of squares, P_{n,2d} = Σ_{n,2d}, in the following three cases: n = 2 (univariate nonhomogeneous case), 2d = 2 (quadratic forms), and n = 3, 2d = 4 (ternary quartics). In all other cases there exist nonnegative forms that are not sums of squares.
The proof of the three equality cases in Hilbert's theorem usually proceeds by treating each of the three cases separately. For example, it is a simple exercise to show that P_{n,2} = Σ_{n,2}.

Exercise 4.4. Deduce that P_{n,2} = Σ_{n,2} from the diagonalization of symmetric matrices.
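The idea behind Exercise 4.4 can be seen numerically: every positive semidefinite quadratic form x^T A x is a weighted sum of squares of the linear forms given by the eigenvectors of A. A minimal numpy sketch (random data, for illustration only):

    import numpy as np

    rng = np.random.default_rng(1)
    B = rng.standard_normal((3, 3))
    A = B @ B.T                   # a PSD matrix: a nonnegative quadratic form
    lam, V = np.linalg.eigh(A)    # A = V diag(lam) V^T with lam >= 0

    # x^T A x = sum_i lam_i (v_i . x)^2, an explicit sum of squares
    x = rng.standard_normal(3)
    lhs = x @ A @ x
    rhs = sum(l * (v @ x)**2 for l, v in zip(lam, V.T))
    print(np.isclose(lhs, rhs))   # True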
We adopt a different approach: we begin by examining the structures that allow the existence of nonnegative forms that are not sums of squares. In Section 4.6.1 we show that the three cases of Hilbert's theorem are the only cases in which these structures do not exist. This provides a unified proof of the three equality cases of Hilbert's theorem, which are usually treated separately.

4.2.2 Motzkin's Example

The first explicit example of a nonnegative form that is not a sum of squares is due to Motzkin:

    M(x, y, z) = x⁴y² + x²y⁴ + z⁶ − 3x²y²z².

The form M can be seen to be nonnegative by an application of the arithmetic mean-geometric mean inequality to the three terms x⁴y², x²y⁴, z⁶, whose geometric mean is exactly x²y²z². Why is M not a sum of squares?
In the following exercises we develop a general method for showing that a form is not a sum of squares, based on the monomials that occur in the form. This method can also be applied to reduce the size of the semidefinite program that computes the sum of squares decomposition, as explained in Chapter 3. These ideas are originally due to Choi, Lam, and Reznick [22].
Exercise 4.5. For a polynomial p define its Newton polytope N(p) to be the convex hull of the vectors of exponents of monomials that occur in p. For example, if p = x_1 x_2² + x_2² + x_1 x_2 x_3, then N(p) = conv({(1, 2, 0), (0, 2, 0), (1, 1, 1)}), which is a triangle in R³. Show that if p = ∑ q_i², then

    N(q_i) ⊆ ½ N(p).

Exercise 4.6. Calculate the Newton polytope of the Motzkin form and use Exercise 4.5 to show that the Motzkin form is not a sum of squares.

For much more on explicit examples of nonnegative polynomials that are not sums of squares, see [22].

4.2.3 Quantitative Relationship

While Hilbert's theorem completely settles all cases of equality between P_{n,2d} and Σ_{n,2d}, it does not shed light on whether these cones are close to each other, even if the cone of nonnegative polynomials is strictly larger. Due to the difficulty of constructing explicit examples, and numerical evidence for a small number of variables and degrees, it is tempting to assume that Σ_{n,2d} approximates P_{n,2d} fairly well.

However, it was shown in [5] that if the degree 2d is fixed and at least 4, then as the number of variables n grows, there are significantly more nonnegative forms than sums of squares. We will make this statement precise and present a proof in Section 4.9. The main idea is that, although the cones themselves are unbounded, we can slice both cones with the same hyperplane, so that the section of each cone is compact. We then derive separate bounds on the volume of each section.

For now we would like to note that the bounds guarantee that the difference between P_{n,2d} and Σ_{n,2d} is large only for a very large number of variables n. Whether this is an artifact of the techniques used to derive the bounds is unclear. As we will see, for a small number of variables the distinction between P_{n,2d} and Σ_{n,2d} is quite delicate, and it is not known at what point P_{n,2d} becomes much larger than Σ_{n,2d}.
We now begin a systematic examination of the differences between nonnegative forms and sums of squares. It is actually possible to see that there exist nonnegative forms that are not sums of squares by considering the values of forms on finitely many points. The following example will illustrate this idea and explain some of the major themes in our investigation.

4.3 The Hypercube Example

According to Hilbert's theorem the smallest cases where P_{n,2d} and Σ_{n,2d} differ are forms in 3 variables of degree 6, and forms in 4 variables of degree 4. We take a close look at an explicit example for the case of forms in 4 variables of degree 4. Let S = {s_1, ..., s_8} be the following set of 8 points in R⁴:

    S = {(1, ±1, ±1, ±1)}.
We will see that there is a difference between nonnegative forms and sums of squares by simply looking at the values that nonnegative polynomials and sums of squares take on S. Accordingly, let us define a projection π from R[x]_{4,4} to R⁸ given by evaluation on S:

    π(f) = (f(s_1), ..., f(s_8)) for f ∈ R[x]_{4,4}.

We will explicitly describe the images of P_{4,4} and Σ_{4,4} under this projection. Let P′ = π(P_{4,4}) and Σ′ = π(Σ_{4,4}).

As they are images of convex cones under a linear map, it is clear that both P′ and Σ′ are convex cones in R⁸. Although both P′ and Σ′ will turn out to be closed, projections of closed convex cones do not have to be closed in general.

Exercise 4.7. Construct a closed convex cone C in R³ and a linear map π : R³ → R² such that π(C) is not closed.

4.3.1 Values of Nonnegative Forms

We first look at the values on S that are achievable by nonnegative forms. Let R⁸₊ be the nonnegative orthant of R⁸:

    R⁸₊ = {(x_1, ..., x_8) | x_i ≥ 0 for i = 1, ..., 8}.

Since we are evaluating nonnegative polynomials, it is clear that P′ ⊆ R⁸₊. We claim that, in fact, P′ = R⁸₊. In other words, any 8-tuple of nonnegative numbers can be attained on S by a globally nonnegative form. By convexity of P′ it suffices to show that all the standard basis vectors e_i are in P′. Moreover, substitutions x_i ↦ −x_i permute the set S, and therefore it is enough to show that e_i ∈ P′ for some i.
Exercise 4.8. Let p ∈ R[x]_{4,4} be the following symmetric form, where the middle sum runs over the monomials x_i² x_j x_k with i, j, k distinct (each counted once):

    p = ∑_{i=1}^4 x_i⁴ + 2 ∑_{i≠j≠k} x_i² x_j x_k + 4 x_1 x_2 x_3 x_4.

Show that p is nonnegative, and check that p vanishes on exactly 7 points in S. Conclude that P′ = R⁸₊.
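A quick numerical check of the vanishing pattern in Exercise 4.8 (a sketch only; proving nonnegativity still requires an argument):

    import itertools
    import numpy as np

    def p(x):
        quartic = np.sum(x**4)
        # each monomial x_i^2 x_j x_k with i, j, k distinct, counted once
        mixed = sum(x[i]**2 * x[j] * x[k]
                    for i in range(4)
                    for j in range(4) for k in range(j + 1, 4)
                    if i != j and i != k)
        return quartic + 2 * mixed + 4 * np.prod(x)

    S = [np.array((1,) + e) for e in itertools.product((1, -1), repeat=3)]
    print([int(p(s)) for s in S])   # one positive value, seven zeros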
We have seen that all combinations of nonnegative values on S are realizable as values of a nonnegative form. We now look at why some values in R⁸₊ are not attainable by sums of squares. In the end we will completely describe the projection Σ′.

4.3.2 Values of Sums of Squares

In order to analyze the values of sums of squares, we need to take a look at the values of the forms that we are squaring. The values of quadratic forms on S are not linearly independent. Here is the unique (up to a constant multiple) linear relation between the values on the points s_i that all quadratic forms f in 4 variables satisfy:

    ∑_{s_i has an even number of −1s} f(s_i) = ∑_{s_i has an odd number of −1s} f(s_i).    (4.1)

Exercise 4.9. Verify that relation (4.1) holds for all quadratic forms f ∈ R[x]_{4,2} and that it is unique up to a constant multiple.

i
i

4.3. The Hypercube Example

main
2012/11/1
page 165
i

165

We are now ready to see how the relation (4.1) prevents sums of squares from
attaining all values in R8+ .
Proposition 4.10 (Hilberts original insight). Let ei be the ith standard basis
/  for all i.
vector in R8 . Then ei
Proof. Since we did not attach a specic labeling to the points of S it will suce

to show that e1
/ 
= (4,4 ). Suppose that there exists p 4,4 such that
(p) = e1 . Write p = j qj2 for some qj R[x]4,2 . The form p vanishes on s2 , . . . , s8 ,
 2
and it has value 1 on s1 . Since p =
j qj it follows that each qj vanishes on
s2 , . . . , s8 . Each qj is a quadratic form in 4 variables, and therefore each qj satises
relation (4.1). From this relation it follows that qj (s1 ) = 0 for all j. Therefore
p(s1 ) = 0, which is a contradiction.
Hilberts original proof did not use an explicit example to show that the vectors ei can be realized as values of a nonnegative form, which we did in Exercise
4.8. Instead he provided a recipe for constructing such a form, and proved that
the construction works. We largely followed Hilberts recipe to construct our counterexample. For more information on Hilberts construction see [23].

4.3.3

Complete Description of 

We can do better than just describing some points that are not in  . Our next goal
is to completely describe  and, in particular, we will see how far the points ei are
from being the values of a sum of squares.
We use to also denote the same evaluation projection on quadratic forms in
4 variables:
(f ) = (f (s1 ), . . . , f (s8 )) for f R[x]4,2 .
Let L be the projection of the entire vector space of quadratic forms:
L = (R[x]4,2 ).
Using relation (4.1) and Exercise 4.9 we see that L is a hyperplane in R8 . Let
C be the set of points that are coordinatewise squares of points in L:
C = {(v12 , . . . , v82 ) | v = (v1 , . . . , v8 ) L}.
We rst show the following description of  .
Lemma 4.11.  is equal to the convex hull of C:
 = conv(C).
Proof. Let v = (v1 , . . . , v8 ) L. Then there exists a quadratic form f R[x]4,2
such that f (si ) = vi for i = 1, . . . , 8. It follows that for the square of f we have
f 2 (si ) = vi2 . In other words,
(f 2 ) = (v12 , . . . , v82 ),

where v = (v1 , . . . , v8 ) = (f ).

Therefore we see that C  and by convexity of  it follows that conv(C)  .

i
i

166

main
2012/11/1
page 166
i

Chapter 4. Nonnegative Polynomials and Sums of Squares

To prove the other inclusion, suppose that p =


for all i and therefore  conv(C).

2
i qi

4,4 . Then (qi2 ) C

Let Tm be the subset of the nonnegative orthant Rm


+ dened by the following
m inequalities:


m




Tm = (x1 , . . . , xm ) Rm
xi 2 xk for all k .

+
i=1


We will show that = T8 . We begin with a lemma on the structure on Tm .


Moreover, Tm is the convex
Lemma 4.12. The set Tm is a closed convex cone.
m

hull of the points x = (x1 , . . . , xm ) Rm


+ , where
i=1 xi = 2 xk for some k.
Proof. The set Tm is dened as a subset of Rm by the following 2m inequalities:

xk 0 and x1 + + xm 2 xk for all k. Therefore it is clear that Tm is a


closed set.
1/2
-norm of x:
For x = (x1 , . . . , xm ) Rm
+ let
x
1/2 denote the L

x
1/2 = ( x1 + + xm )2 .
We can restate the inequalities of Tm as xk 0 and
x
1/2 4xk for all k. Now
suppose that x, y Tm and let z = x + (1 )y for some 0 1. It is clear that
zk 0 for all k. It is known by the Minkowski inequality [11, p. 30] that L1/2 -norm
is a concave function:
x + (1 )y
1/2
x
1/2 + (1 )
y
1/2 . Therefore

z
1/2
x
1/2 + (1 )
y
1/2 4xk + 4(1 )yk = 4zk for all k.
Thus Tm is a convex cone.
To show that Tm is the convex hull of the points where
x
1/2 = 4xk for some
k we proceed by induction. The base case m = 2 is simple since T2 is just a ray
spanned by the point (1, 1). For the induction step we observe that any convex set
is the convex hull of its boundary. For any point on the boundary of Tm one of
the dening 2m inequalities must be sharp. If a point x is on the boundary of Tm
and xi = 0 for all i, then the inequalities xi 0 are not sharp at x; therefore the
inequality
x
1/2 4xk must be sharp for some k, and we are done.
If xi = 0 for some i, then the point x lies in the set Tm1 in the subspace
spanned by the m 1 standard basis vectors excluding ei , and we are done by
induction.
Exercise 4.13. Show that the cone T4 R4 can be transformed via a nonsingular
linear transformation into the dual cone of 3 3 positive semidenite matrices with
equal diagonal elements:

x1 x2 x3
4
(x1 , x2 , x3 , x4 ) R such that x2 x1 x4  0.
x3 x4 x1
If we restrict x1 to being 1 then we obtain the elliptope E3 , which we have already
seen in Chapter 2.

i
i

4.4. Symmetries, Dual Cones, and Facial Structure

main
2012/11/1
page 167
i

167

We are now ready to completely describe  .


Theorem 4.14.  = T8 .
Proof. We rewrite the relation (4.1) in the form
8


ai f (si ) = 0 for f R[x]4,2 ,

(4.2)

i=1

and let a = (a1 , . . . , a8 ) be the vector of coecients, with ai = 1. It follows that


L = (R[x]4,2 ) is the hyperplane in R8 perpendicular to a.
Since T8 is a convex cone, to show the inclusion  T8 it suces by Lemma
4.11 to show that C T8 . Let v = (v1 , . . . , v8 ) L and t = (v12 , . . . , v82 ) C.
By the relation (4.2) we have a1 v1 + + a8 v8 = 0 with ai = 1. Without loss
of generality, we may assume that v1 has the maximal absolute value among vi .
Multiplying the relation (4.2) by 1, if necessary,
we can make
= 1. Thenwe
a1
have v1 = a2 v2 + + a8 v8 . We can now write t1 = t2 t3
t8
with
the
exact
signs
depending
on
a
and
signs
of
v
.
Therefore
we
see
that
2
t1
i
i

t
+

+
t
.
Since
v
has
the
largest
absolute
value
among
v
,
it
follows
that
8
i
1

1
2 tk t1 + + t8 for all 1 k 8. Hence we see that  T8 .
To show the reverse inclusion T8  we use Lemma 4.12. It suces to

show that all points x T8 with 2 xk = x1 + + x8 for some k, are also


in . Without loss of generality we may assume that k = 1 and we have x1 =

x2 + + x8 . Let y = (y1 , . . . , y8 ) with y1 = x1 /a1 and yi = xi /ai for


2 i 8. It follows that a1 y1 + + a8 y8 = 0. Therefore y (R[x]4,2 ) and
y = (q) for some quadratic form q. Then (q 2 ) = x and we are done.
We can use Exercise 4.8 and Theorem 4.14 to visualize the discrepancy between
P  and  . Lets take a slice of both cones with the hyperplane H given by x1 +
+ x8 = 1. Recall that by Exercise 4.8 we have P  = R+
8 . Therefore the slice of
P  with H is the standard simplex. The slice of T8 with H is the standard simplex
with cut o corners. It was Hilberts observation that the standard basis vectors
ei are not in  , and Theorem 4.14 tells us exactly how much is cut o around the
corners.
We now take a short break from comparing Pn,2d and n,2d to consider some
convexity properties of these cones, such as boundary, facial structure, symmetries,
and dual cones.

4.4
4.4.1

Symmetries, Dual Cones, and Facial


Structure
Symmetries of Pn,2d and n,2d

The cones Pn,2d and n,2d have a lot of built-in symmetries coming from linear
changes of coordinates. Suppose that A GLn (R) is a nonsingular linear transformation of Rn .

i
i

168

main
2012/11/1
page 168
i

Chapter 4. Nonnegative Polynomials and Sums of Squares

Exercise 4.15. Show that if p(x) R[x]2d is a nonnegative form, then p(Ax) is
also a nonnegative form in R[x]2d . Similarly, if p(x) is a sum of squares, then p(Ax)
is also a sum of squares.
In more formal terms, a nonsingular linear transformation A of Rn induces a
nonsingular transformation A of R[x]2d , which maps p(x) R[x]2d to p(A1 (x)).
We say that the group GLn (R) acts on R[x]2d . It follows from Exercise 4.15 that
both cones Pn,2d and n,2d are invariant under this action. In other words, Pn,2d
and n,2d are invariant under nonsingular linear changes of coordinates.
Exercise 4.16. Show that, up to a constant multiple, r2d = (x21 + + x2n )d is the
only form in R[x]2d that is xed under all orthogonal changes of coordinates; i.e.,
it is the only form in R[x]2d that satises
p(x) = p(Ax) for all A On ,
where On is the group of orthogonal transformations of Rn .
We note that even if a linear transformation A of Rn is singular, it still induces
a linear transformation A in the same way. However the linear map A will also
be singular. The map A still sends Pn,2d and n,2d into themselves, but it will
no longer preserve the cones. Closed convex cones in R[x]2d that are mapped into
themselves under any linear change of coordinates are called blenders [24].

4.4.2

Dual Cone of Pn,2d

Let K be a convex cone in a real vector space V . Let V be the dual vector space
of linear functionals on V . The dual cone K is dened as the set of all linear
functionals in V that are nonnegative on K:
K = { V | (x) 0

for all x K} .

Many general aspects of duality will be discussed in Chapter 5. We examine


the specic cases of cones of nonnegative polynomials and sums of squares.
Lets consider the dual space R[x]2d of linear functionals on R[x]2d . We rst
observe that the dual cone of Pn,2d is conceptually simple. For v Rn , let v be
the linear functional in R[x]2d given by evaluation at v:
v (f ) = f (v) for

f R[x]2d .

By homogeneity of forms we know that nonnegativity on the unit sphere is


equivalent to global nonnegativity. Therefore it is natural to think that the func
. Before we show that this is
tionals v with v Sn1 generate the dual cone Pn,2d
in fact the case we need a useful exercise from convexity.
Exercise 4.17. Let K Rn be a compact convex set with the origin not in K.
Show that the conical hull of K, cone(K), is closed. Construct an explicit example
that shows that the condition 0
/ K is necessary.

i
i

4.4. Symmetries, Dual Cones, and Facial Structure

main
2012/11/1
page 169
i

169

of the cone of nonnegative forms is the conical


Lemma 4.18. The dual cone Pn,2d
hull of linear functionals v with v on the unit sphere:

$
%

Pn,2d
= cone v | v Sn1 .
Proof. Let Ln,2d R[x]2d be the conical hull of functionals v with v Sn1 .
The dual cone Ln,2d is the set of all forms p R[x]2d such that
v (p) = p(v) 0 for all v Sn1 .
Therefore we see that Ln,2d = Pn,2d . Using biduality we see that the dual

cone Pn,2d
is equal to the closure of Ln,2d :

= (Ln,2d ) = Ln,2d.
Pn,2d

We now just need to show that the cone Ln,2d is closed and then Ln,2d =
Ln,2d. Consider the set C of all linear functionals v with v Sn1 . The set C is
given by a continuous embedding of the unit sphere Sn1 into R[x]2d , and therefore
C is compact. If we can show that the convex hull of C does not contain the origin,
then we are done by applying Exercise 4.17.
Let r2d = (x21 + + x2n )d be the
 form in R[x]2d that is constantly 1 on
the unit sphere.
Suppose
that
m
=
cv v conv(C). Then it follows that

cv = 1, and therefore m cannot be the zero functional in R[x]2d . It
m(r2d ) =
follows that conv(C) is a compact convex set with 0
/ C and we are done.
Exercise 4.19. Use the apolar inner product from Chapter 3 to identify R[x]2d

with the dual space R[x]2d . Show that the dual cone Pn,2d
is identied with the
cone of sums of 2dth powers of linear forms:

4


qi2d
p R[x]2d p =

with

5
qi R[x]n,1 .

Remark 4.20. The map that sends a point v Rn to the form (v1 x1 + +vn xn )2d
is called the 2dth Veronese embedding and its image is called the Veronese variety. It

follows from Lemma 4.18 that the cone Pn,2d


is the conical hull of the 2dth Veronese
variety. For more information and for connections to orbitopes we refer to [25].
By applying spherical symmetries to functionals v we obtain the following

crucial corollary, which describes the extreme rays of Pn,2d


.

for all v Sn1 ,


Corollary 4.21. The functional v spans an extreme ray of Pn,2d

and the functionals v form the complete set of extreme rays of Pn,2d .

have a very nice parametrization by points


The extreme rays of the cone Pn,2d

is a very complex object from the computational


v Sn1 . However, the cone Pn,2d

i
i

170

main
2012/11/1
page 170
i

Chapter 4. Nonnegative Polynomials and Sums of Squares

and convex geometry point of view. For example, given a linear functional

is known as the truncated


R[x]2d , determining whether it belongs to the cone Pn,2d
moment problem in real analysis. Despite a long history, there are very few explicit
and computationally feasible criteria for testing membership in R[x]2d . For more
on this approach see [15].
Decomposing a given linear functional in R[x]2d as a linear combination of
the functionals v , or equivalently by Exercise 4.19, decomposing a given form in
R[x]2d as a linear combination of forms v 2d is known as the symmetric tensor
decomposition problem. Again, despite a long history, many aspects of symmetric
tensor decomposition remain unknown. For more information we refer to [14, 21].

4.4.3

Boundary of the Cone of Nonnegative Polynomials

The boundary and the interior of the cone of nonnegative forms Pn,2d are easy to

describe given our knowledge of the dual cone Pn,2d


.
Exercise 4.22. Show that the interior of Pn,2d consists of forms that are strictly
positive on Rn \ {0} and the boundary of Pn,2d consists of forms with a nontrivial
zero.
We note that the situation is slightly dierent in the nonhomogeneous case.
Let f (x) = x2 + 1 be a univariate polynomial, and let P be the cone of nonnegative
univariate polynomials of degree at most 4. Clearly f P and f is strictly positive
on R. However, f lies on the boundary of P . Consider g = f x4 . For any > 0
the polynomial g will not be nonnegative. Therefore f is not in the interior of P ,
and it lies on the boundary of P .
The explanation for this phenomenon is that even though f is strictly positive
on R, when viewed as a polynomial of degree 4, f has a zero at innity. The
growth of f (x) as x goes to innity is only of order 2, and therefore we cannot
subtract a nonnegative polynomial of degree 4 from f and have the dierence remain
nonnegative. The easiest way to see the zero at innity is to homogenize f with an
extra variable y: f = x2 y 2 + y 4 .
Note that if we set y = 1 in f we just recover f . However, f is not a strictly
positive form on R2 \ {0}, since f has a nontrivial zero which comes from setting
y = 0. In general, for a polynomial f in n variables of degree d, let fd be the degree
d component of f consisting of all terms of degree exactly d. Zeroes at innity of
f correspond to zeroes of fd . This can be seen by homogenizing f with an extra
variable. When we set this variable equal to 0 we obtain fd .

4.4.4

Exposed Faces of Pn,2d

Exposed faces of Pn,2d are conceptually easy to understand due to our knowledge of

in Corollary 4.21. Maximal (by inclusion)


the extreme rays of the dual cone Pn,2d
faces of Pn,2d come from the vanishing of one extreme ray of the dual cone. Therefore it follows that maximal faces F (v) of Pn,2d consist of all nonnegative forms

i
i

4.4. Symmetries, Dual Cones, and Facial Structure

main
2012/11/1
page 171
i

171

that have a single common zero v Sn1 :


F (v) = {p Pn,2d | v (p) = p(v) = 0}.
We observe that a zero of a nonnegative form p is a local minimum. Therefore,
if p(v) = 0, this implies that the gradient of p at v is zero as well, p(v) = 0. In
other words, p must have a double zero at v.
Exercise 4.23 (Eulers relation). Show that for p R[x]d and all v Rn the
following relation holds:
p(v), v = d p(v).
From the above exercise it follows that for forms p R[x]2d the vanishing of
the gradient at v, p(v) = 0, forces the form p to vanish at v as well, p(v) = 0.
Therefore, for a nonnegative form p Pn,2d a single zero forces p to satisfy n linear conditions coming from p(v) = 0. It follows that the face F (v) has codimension
at least n.
Exercise 4.24. Show that the maximal faces F (v) of Pn,2d have codimension exactly n in R[x]2d .
All smaller exposed faces F (v1 , . . . , vk ) come from the vanishing of several

. The face F (v1 , . . . , vk ) has the form


extreme rays v1 , . . . , vk of Pn,2d
F (v1 , . . . , vk ) = {p Pn,2d | p(v1 ) = = p(vk ) = 0, vi Sn1 }.
Therefore F (v1 , . . . , vk ) consists of all nonnegative forms with zeroes at prescribed points v1 , . . . , vk Sn1 . It is natural to expect that every additional zero
increases the codimension of the exposed face by n so that codim F (v1 , . . . , vk ) = kn.
However, this intuition fails if the number of zeroes k is suciently large. In particular if we prescribe enough zeroes, it is not even clear when the face F (v1 , . . . , vk )
is nonempty. The question of the dimension of F (v1 , . . . , vk ) is quite complicated
[6] and it is related to the celebrated AlexanderHirschowitz theorem [17].
Exposed extreme rays of Pn,2d are also conceptually simple: a nonnegative
form p Pn,2d is an exposed extreme ray of Pn,2d if and only if the variety dened
by p is maximal among all varieties dened by nonnegative polynomials.
Exercise 4.25. Show that p Pn,2d is an exposed extreme ray of Pn,2d if and
only if for all q Pn,2d with V (p) V (q) it follows that q = p for some R.

4.4.5

Nonexposed Faces of Pn,2d

The cone Pn,2d has many nonexposed faces. If a form p has a zero at a point
v R, then it must have a double zero at v. Exposed faces of Pn,2d capture double
zeroes on any set of points v1 , . . . , vk , but exposed faces fail to capture zeroes of
higher order.

i
i

172

main
2012/11/1
page 172
i

Chapter 4. Nonnegative Polynomials and Sums of Squares

Exercise 4.26. Show that x2d


1 is an extreme ray of Pn,2d . Use Exercise 4.25 to
conclude that x2d
is
not
exposed.
1
More generally, the following construction explains the origins of nonexposed
faces of Pn,2d . Consider a maximal face F (v) of Pn,2d . We can construct an
exposed subface of F (v) by considering nonnegative forms with zeroes at v and w
for some w Sn1 . We can also build nonexposed subfaces of F (v) by considering
nonnegative forms that are more singular at v.
Let p F (v), so that p is a nonnegative form and p(v) = 0. Since 0 is the
global minimum of p and p(v) = 0, it follows that the Hessian 2 p(v) must be a
positive semidenite matrix. Let Fw (v) be the set of all nonnegative forms p with
zero at v whose Hessian at v is positive semidenite and w lies in the kernel of
2 p(v):
Fw (v) = {p F (v) | 2 p(v) w = 0}.
Exercise 4.27. Show that Fw (v) is a face of Pn,2d . Use the characterization of
exposed faces of Pn,2d to show that Fw (v) is not an exposed face of Pn,2d .

4.4.6

Algebraic Boundaries

The boundaries of the cones Pn,2d and n,2d are hypersurfaces in R[x]2d . Suppose
that we would like to describe these hypersurfaces by polynomial equations. This
leads to the notion of algebraic boundary of the cones Pn,2d and n,2d , which is
obtained by taking the Zariski closure of the boundary hypersurfaces. As explained
in Chapter 5, the algebraic boundary of Pn,2d is cut out by a single polynomial, the
discriminant. The algebraic boundary of the cone of sums of squares is signicantly
more complicated.
Exercise 4.28. Show that the hypersurface cut out by the discriminant is a component of the algebraic boundary of n,2d .
The above exercise shows that the algebraic boundary of Pn,2d is included in
the algebraic boundary of n,2d . This seems counterintuitive, but it occurs because
we passed to the Zariski closures of the actual boundaries. We will see below that
for 3,6 and 4,4 the algebraic boundary of the cone of sums of squares has one
more component, which is described in Exercise 4.51.

4.5

Generalizing the Hypercube Example

We completely described the values of nonnegative forms and sums of squares on


the specic set S of 1 vectors in R4 and we have seen, just from the evaluation
on S, that there exist nonnegative forms in R[x]4,4 that are not sums of squares.
However, these descriptions are limited to the specic set S. We now extend the arguments of Section 4.3 to work in far greater generality. We begin by
explaining how the set S was chosen in the rst place.

i
i

4.5. Generalizing the Hypercube Example

4.5.1

main
2012/11/1
page 173
i

173

Hypercube Example Revisited

Let qi be the three quadratic forms


q1 = x21 x22 , q2 = x21 x23 , q3 = x21 x24 ,
and let V be the set of common zeroes of qi :
V = {x R4 | qi (x) = 0 for i = 1, 2, 3}.
Viewed projectively V consists of eight points in the real projective space RP3 .
Viewed anely V consists of eight lines, each line spanned by a point in S. We can
extend much of what was proved about the values of nonnegative polynomials to
zero-dimensional intersections in RPn1 .

4.5.2

Zero-Dimensional Intersections

Let V be a set of nitely many points in RPn1 :


V = {
s1 , . . . , sk }.
Suppose that V is the complete set of real projective zeroes of some forms q1 , . . . , qm
of degree d:
V = {x RPn1 | q1 (x) = = qm (x) = 0}.
For each si V let si be an ane representative of si lying on the line
spanned by si . Now let S = {s1 , . . . , sk }, be the set of ane representatives corresponding to the common zeroes of qi .
Lets consider the values of nonnegative forms of degree 2d on S. Let S :
R[x]2d Rk be the evaluation projection:
S (f ) = (f (s1 ), . . . , f (sk )) for f R[x]2d .
Let H be the image of R[x]2d and let P  be the image of Pn,2d under S :
H = S (R[x]2d ),

P  = S (Pn,2d ).

We have an additional complication that H does not have to equal Rk . We know,


however, that P  must lie in H, and since we are evaluating nonnegative forms
it follows that P  lies inside the nonnegative orthant of Rk : P  Rk+ . Therefore it
follows that P  lies inside the intersection of H and Rk+ :
P  H Rk+ .
The following theorem shows that this inclusion is almost an equality.
Theorem 4.29. Let Rk++ be the positive orthant of Rk . The intersection of H with
the positive orthant is contained in P  :
H Rk++ P  .

i
i

174

main
2012/11/1
page 174
i

Chapter 4. Nonnegative Polynomials and Sums of Squares

Before proving Theorem 4.29 we make some remarks. As we know from Exercise 4.7 we cannot simply conclude that P  = H Rk+ using a closure argument,
since a projection of a closed cone does not have to be closed. We now show that
this occurs for evaluation projections as well.
Exercise 4.30. Let S R5 be the set of 16 points S = {1, 1, 1, 1, 1}. Show
that S can be dened as a common zero set of four quadratic forms in R[x]5,2 , and
use Theorem 4.29 to show that R16
++ S (P5,4 ). Show that the standard basis
vectors ei R16 are not in the image S (P5,4 ). In other words, the vectors ei are
not realized as values on S of a nonnegative form of degree 4 in 5 variables, but all
strictly positive points in R16
++ are realized.
Proof of Theorem 4.29. Let v = (v1 , . . . , vk ) H Rk++ . Since v H there
2
, where qi are
exists a form f R[x]2d such that f (si ) = vi . Let g = q12 + + qm
the forms dening V . We claim that for large enough R the form f$ =% f + g
will be nonnegative, and since each qi is zero on S we will also have S f = v.
By homogeneity of f it suces to show that it is nonnegative on the unit
sphere Sn1 . Furthermore, we may assume that the evaluation points si lie on the
unit sphere. Since we are dealing with forms, evaluation on the points outside of
the unit sphere amounts to rescaling of the values on Sn1 .
Let B (S) be the open epsilon neighborhood of S in the unit sphere Sn1 .
Since f (si ) > 0 for all i, it follows that for suciently small the form f is strictly
positive on B (S):
f (x) > 0 for all x B (S).
The complement of B (S) in Sn1 is compact, and therefore we can let m1 be the
minimum of g and m2 be the minimum of f on Sn1 \ B (S). If m2 0, then f
itself is nonnegative and we are done. Therefore, we may assume m2 < 0. We also
note that since g vanishes on S only, it follows that m1 is strictly positive.
2

Now let m
m1 . The form f = f + g is positive on B (S). By construction
of B (S) we also see that the minimum of f on the complement of B (S) is at
least 0. Therefore f is nonnegative on the unit sphere Sn1 , and we are done.
We proved in Theorem 4.29 that any set of strictly positive values on the nite
set S, coming from real zeroes of forms of degree d, can be achieved by a globally
nonnegative form of degree 2d. We now look at the values that sums of squares can
take on such sets S.

4.5.3

Values of Sums of Squares

We recall from Section 4.3 that the reason that sums of squares could not achieve all
the possible nonnegative values on the hypercube was that the values of quadratic
forms on the hypercube satised a linear relation. The points of the hypercube
come from common zeroes of the quadratic forms, as we have seen in Section 4.5.1.
There is a general theory in algebraic geometry on the number of relations
that values of forms of certain degree have to satisfy on nite sets of points. These

i
i

4.5. Generalizing the Hypercube Example

main
2012/11/1
page 175
i

175

relations are known as CayleyBacharach relations. For more details we refer the
reader to [10].
At rst glance it is surprising that there should be any linear relation at all.
If the points were chosen generically then the values of forms of degree d on these
points would be linearly independent, at least until we have as many points as the
dimension of the vector space of forms of degree d. However, our choice of points
is not generic; point sets that come from common zeroes are special.
For the cases R[x]4,4 and R[x]3,6 it is easy to establish the existence of the
linear relation by simple dimension counting. We explain the case of R[x]4,4 .
Since common zeroes of real forms do not have to be real, for this section
we will work with complex forms. Suppose that q1 , q2 , q3 C[x]4,2 are complex
quadratic forms in 4 variables. As before let V be the complete set of projective
zeroes of some forms q1 , q2 , q3 :
V = {
x CP3 | q1 (
x) = q2 (
x) = q3 (
x) = 0}.
Three quadratic forms in C[x]4,2 are expected to generically have 23 = 8 common
zeroes. Suppose that this is the case and let V = {
s1 , . . . , s8 }.
For each si V let si be an ane representative of si lying on the line
corresponding to si . Let S = {s1 , . . . , s8 }, be the set of ane representatives
corresponding to the common zeroes of qi . Dene S : C[x]4,2 C8 to be the
evaluation projection.
Lemma 4.31. The values of quadratic forms in C[x]4,2 satisfy a linear relation on
the points of S. In other words there exist 1 , . . . 8 C such that
1 f (s1 ) + + 8 f (s8 ) = 0 for all f C[x]4,2 .

(4.3)

Proof. The dimension of C[x]4,2 is 10. Note that the kernel of S contains the
three forms qi , since each qi evaluates to 0 on S. Therefore the dimension of the
kernel of S is at least 3. It follows that the image of S has dimension at most
10 3 = 7. Since the image of S lies inside C8 , it follows that there exists a linear
functional that vanishes on the image of S . This linear functional gives us the
desired linear relation.
Remark 4.32. It is possible to show in the above proof that the dimension of the
kernel of S is exactly 3 and therefore the linear relation (4.3) is unique. Furthermore, it can be shown that each i = 0, or, in other words, the unique linear relation
has to involve all of the points of S.
Exercise 4.33. Suppose that q1 , q2 C[x]3,3 are two cubic forms intersecting in
32 = 9 points in CP2 . Let S be the set of ane representatives of the common
zeroes of q1 and q2 . Use the argument of Lemma 4.31 to show that the values of
cubic forms on S satisfy a linear relation.
Exercise 4.34. The Robinson form
R(x, y, z) = x6 + y 6 + z 6 (x4 y 2 + x2 y 4 + x4 z 2 + x2 z 4 + y 4 z 2 + y 2 z 4 ) + 3x2 y 2 z 2

i
i

176

main
2012/11/1
page 176
i

Chapter 4. Nonnegative Polynomials and Sums of Squares

is an explicit example of a nonnegative polynomial that is not a sums of squares.


Let q1 = x(x + z)(x z) and q2 = y(y + z)(y z). Calculate the 9 common zeroes
of q1 and q2 . Show that R(x, y, z) vanishes on 8 of the 9 zeroes. Use Exercise 4.33
to show that R(x, y, z) is not a sum of squares.
We have examined in detail what happens to values of nonnegative forms
and sums of squares on nite sets of points coming from common zeroes of forms.
However, this still seems to be a very special construction. We now move to show
that the dierence in values on such sets is in fact the fundamental reason that
there exists nonnegative polynomials that are not sums of squares.

4.6

Dual Cone of n,2d

in Corollary
We gave a simple description of the extreme rays of the dual cone Pn,2d
4.21. The description of the extreme rays of the dual cone n,2d is signicantly more
complicated. We will see that evaluation on the special nite point sets we described
in Section 4.5 will naturally lead to extreme rays of n,2d .
We rst describe the connection between n,2d and the cone of positive
semidenite matrices that lies at the heart of semidenite programming approaches
to polynomial optimization. To every linear functional R[x]2d we can associate
a quadratic form Q dened on R[x]d by setting

Q (f ) = (f 2 )

for all

f R[x]d .

The cone n,2d can be thought of as a section of the cone of positive semidefinite quadratic forms. We now show how this description arises.
Lemma 4.35. Let be a linear functional in R[x]2d . Then n,2d if and only
if the quadratic form Q is positive semidenite.
Proof. Suppose that n,2d . Then (f 2 ) 0 for all f R[x]d . Therefore
Q (f ) 0 for all f R[x]d and Q is positive semidenite.
Now
that Q is positive
semidenite. Then (f 2 ) 0 for all f R[x]d .
suppose
2
Let g = fi n,2d . Then (g) = (fi2 ) 0 and n,2d .
An Aside: The Monomial Basis and Moment Matrices
Suppose that we x the monomial basis for R[x]d . Given a linear functional
R[x]2d we can write an explicit matrix M ( ) for the quadratic form Q using
the monomial basis of R[x]d . The matrix M ( ) is known as the moment matrix or
generalized Hankel matrix. The entries of M ( ) are indexed by monomials x , x
R[x]d . The entry M ( ), is given by evaluating on x x = x+ :
M ( ), = (x+ ).
For example, consider the linear functional v : R[x]2,4 R given by evaluation on v = (1, 2). The monomial basis of R[x]2,2 is given by x2 , xy, y 2 and the

i
i

4.6. Dual Cone of n,2d

main
2012/11/1
page 177
i

177

moment matrix of M ( v ) reads as

1 2
M ( v ) = 2 4
4 8

4
8 .
16

The rank of the quadratic form Q is the same as the rank of its moment matrix
M ( ), and Q being nonnegative is equivalent to having a positive semidenite
moment matrix M ( ). However, the moment approach is tied to the specic choice
of the monomial basis. Below we prefer to keep a basis independent approach
with emphasis on the underlying geometry, but we note that the results are readily
translatable into the terminology of moments.

Let S n,d be the vector space of real quadratic forms on R[x]d . We can view
the dual space R[x]2d as a subspace of S n,d by identifying the linear functional
n,d
R[x]2d with its quadratic form Q . Let S+
be the cone of positive semidenite
n,d
forms in S :

4
5

n,d
S+
= Q S n,d Q(f ) 0 for all f R[x]d .
We can restate Lemma 4.35 as follows.
Corollary 4.36. The cone n,2d is the section of the cone of positive semidenite
n,d
with the subspace R[x]2d :
matrices S+
n,d
R[x]2d .
n,2d = S+

Note that this shows that the cone n,2d is a spectrahedron.


The following exercise establishes the connection between the cone of sums of
n,d
squares n,2d and the cone of positive semidenite matrices S+
. This allows us
to formulate sums of squares questions in terms of semidenite programming.
Exercise 4.37. Use the result of Corollary 4.36 to show that the cone n,2d is a
n,d
projection of the cone S+
of positive semidenite matrices on R[x]d . Use the
monomial basis of R[x]d to describe this projection explicitly. Conclude that the
cone n,2d is a projected spectrahedron. (Hint: See Chapter 5 for a general discussion
of the relationship between duality and projections.)
Remark 4.38. In order to apply the result of Exercise 4.37 to actual computation
we need to work with an explicit basis of R[x]d . See Chapter 3 for the discussion
of possible basis choices and their impact on computational performance. We note
that the size of the positive $semidenite
matrices we work with is the dimension
%
.
Therefore
the size of the underlying positive
of R[x]d , which is equal to n+d1
d
semidenite matrices increases rather rapidly as a function of n and d. This is one
of the main computational limitations of semidenite programming approaches to
polynomial optimization.

i
i

178

main
2012/11/1
page 178
i

Chapter 4. Nonnegative Polynomials and Sums of Squares

We would like to see what separates sums of squares from nonnegative forms.
The extreme rays of n,2d cut out the cone of sums of squares. Therefore we would

like to nd extreme rays of n,2d that are not in the dual cone Pn,2d
, since these
are the functionals that distinguish the cone of sums of squares from the cone of
nonnegative forms.
Formally the dual cone n,2d is dened as the cone of linear functionals nonnegative on n,2d , which is equivalent to being nonnegative on squares. One way
of constructing linear functionals nonnegative on squares is to consider point evaluation functionals v with v Rn that send p R[x]2d to p(v). However, as we
have seen in Corollary 4.21, point evaluation functionals are precisely the extreme

rays of Pn,2d
. Therefore, these linear functionals are not helpful in distinguishing

. Our goal now is to nd a new way of constructing funcbetween n,2d and Pn,2d
tionals nonnegative on squares and also to understand why such functionals do not
exist when n,2d = Pn,2d .
We showed in Corollary 4.36 that the cone n,2d is a spectrahedron. We
now prove a general lemma about spectrahedra that states that extreme rays of a
spectrahedron are quadratic forms with maximal kernel [20]. The examination of
the kernels of extreme rays of n,2d will provide a crucial tool for our understanding
of n,2d .
Let S be the vector space of quadratic forms on a real vector space V . Let
S+ be the cone of psd forms in S.
Lemma 4.39. Let L be a linear subspace of S and let K be the section of S+
with L:
K = S+ L.
Suppose that a quadratic form Q spans an extreme ray of K. Then the kernel of Q
is maximal for all quadratic forms in L: if P L and ker Q ker P then P = Q
for some R.
Proof. Suppose not, so that there exists an extreme ray Q of K and a quadratic
form P L such that ker Q ker P and P = Q. Since ker Q ker P it follows
that all eigenvectors of both Q and P corresponding to nonzero eigenvalues lie in
the orthogonal complement (ker Q) of ker Q. Furthermore, Q is positive denite
on (ker Q) .
It follows that Q and P can be simultaneously diagonalized to matrices Q
and P  with the additional property that whenever the diagonal entry Qii is 0 the
corresponding entry Pii is also 0. Therefore, for suciently small R we have
that Q + P and Q P are positive semidenite and therefore Q + P, Q P K.
Then Q is not an extreme ray of K, which is a contradiction.
We now apply Lemma 4.39 to the case n,2d . This gives us a crucial tool for
studying extreme rays of n,2d .
Corollary 4.40. Suppose that Q spans an extreme ray of n,2d . Then either
rank Q = 1 or the forms in the kernel of Q have no common zeroes, real or complex.

i
i

4.6. Dual Cone of n,2d

main
2012/11/1
page 179
i

179

Proof. Let W R[x]d be the kernel of Q and suppose that the forms in W
have a common real zero v = 0. Let R[x]2d be the linear functional given
by evaluation at v: (f ) = f (v) for all f R[x]2d . Then Q is a rank 1 positive
semidenite quadratic form and ker Q ker Q . By Lemma 4.39 it follows that
Q = Q and thus Q has rank 1.
Now suppose that the forms in W have a common complex zero z = 0. Let
R[x]2d be the linear functional given by taking the real part of the value at z:
(f ) = Re f (z) for all f R[x]2d . It is easy to check that the kernel of Q includes
all forms that vanish at z and therefore W ker Q . Therefore by applying Lemma
4.39 we again see that Q = Q . However, we claim that Q is not a positive
semidenite form.
The quadratic form Q is given by Q (f ) = Re f 2 (z) for f R[x]d . However,
there exist f R[x]d such that f (z) is purely imaginary and therefore Q (f ) < 0.
The corollary now follows.
Corollary 4.40 shows that extreme rays of n,2d are of two types: either they
are rank 1 quadratic forms or they have a kernel with no common zeroes. We now
deal with the rank 1 extreme rays of n,2d . For v Rn let v be the linear functional
in R[x]2d given by evaluation at v,
v (f ) = f (v) for f R[x]2d ,
and let Qv be the quadratic form associated to v : Qv (f ) = f 2 (v). In this case we
say that Qv (or v ) corresponds to point evaluation. Recall that the inequalities
v 0 are the dening inequalities of the cone of nonnegative forms Pn,2d . The
following lemma shows that all rank 1 forms in R[x]2d correspond to point evaluations. Since we are interested in the inequalities that are valid on n,2d but not
valid on Pn,2d it allows us to disregard rank 1 extreme rays of n,2d and focus on
the case of a kernel with no common zeros.
Lemma 4.41. Suppose that Q is a rank 1 quadratic form in R[x]2d . Then Q = Qv
for some v Rn and R.
Proof. Let Q be a rank 1 form in R[x]2d . Then Q(f ) = s2 (f ) for some linear
functional s R[x]d . Therefore it suces to show that if Q = s2 (f ) for some
s R[x]d , then Q = Qv for some v Rn .
Since Q R[x]2d we know that Q is dened by Q(f ) = (f 2 ) for a linear
functional R[x]2d and therefore (f 2 ) = s2 (f ) for all f R[x]d . We have Q(f +
g) = ((f +g)2 ) = (f 2 )+2 (f g)+ (g 2 ) = (s(f )+s(g))2 = s2 (f )+2s(f )s(g)+s2 (g)
and it follows that (f g) = s(f )s(g) for all f, g R[x]d .
n

1
Let x denote the monomial x
1 xn . If we take monomials x , x , x , x

in R[x]d such that x x = x x , then we must have s(x )s(x ) = s(x )s(x ).
Suppose that s(xdi ) = 0 for all i. Then we see that
s(xd1
xj )2 = s(xdi )s(xd2
x2j ) = 0,
i
i
and continuing in similar fashion we have s(x ) = 0 for all monomials in R[x]d .
Then is the zero functional and Q does not have rank one which is a contradiction.

i
i

180

main
2012/11/1
page 180
i

Chapter 4. Nonnegative Polynomials and Sums of Squares

We may assume without loss of generality that s(xd1 ) = 0. Since we are interested in (f 2 ) = s2 (f ) we can work with s if necessary, and thus we may assume
that s(xd1 ) > 0. Let si = s(xd1
xi ) for 1 i n. We will express s(x ) in
1

xi xj ) = (xd1
xi )(xd1
xj ) we have
terms of si for all x R[x]d . Since (xd1 )(xd2
1
1
1
d2
s(x1 xi xj ) = si sj /s1 . Continuing in this fashion we nd that
n
1
s(x
1 xn ) =

n
2
s
2 sn
.
d11
s1

Now let v Rn be the following vector:


1/d

(d1)/d

v = (s1 , s1

(d1)/d

s2 , . . . , s1

sn ).

Let sv be the linear functional on R[x]d dened by evaluating a form at v: sv (f ) =


xi ) = si and
f (v). Then we have sv (xd1
1
/d(d1)(d1 )/d

2
n
n
1
1
sv (x
1 xn ) = s2 sn s1

n
2
s
2 sn
1
sd1
1

Since s agrees with sv on monomials it follows that s = sv and thus (f 2 ) =


s (f ) = f (v)2 = f 2 (v). Therefore indeed corresponds to point evaluation and we
are done.
2

Suppose that Q spans an extreme ray of n,2d that does not correspond to
point evaluation. Let W be the kernel of Q . Then by Corollary 4.40 and Lemma
4.41 we know that the forms in W have no common zeroes real or complex. This
condition gives us a lot of dimensional information about W and places strong
restrictions on the linear functionals . As we will see, for the three equality cases
of Hilberts theorem the dimensional restrictions on W will allow us to derive nonexistence of the extreme rays of n,2d with kernel W , thus proving the equality
between nonnegative forms and sums of squares.
Let W be a linear subspace of R[x]d and dene W 2 to be the degree 2d part
of the ideal generated by W :
W 2 = W 2d .
We use VC (W ) to denote the set of common zeroes (real and complex) of forms
in W .
We next show that there is a strong relation between the linear functional
and the kernel W of the quadratic form Q . Namely, we show that vanishes on
2
all of W :
2
(p) = 0 for all p W .
(4.4)
2

We will write the condition (4.4) as (W ) = 0 for short. We also now show
that W is the maximal subspace among all W such that (W 2 ) = 0.
Lemma 4.42. Let Q be a quadratic form in n,2d and let W R[x]d be the
kernel of Q . Then p W if and only if (pq) = 0 for all q R[x]d .

i
i

4.6. Dual Cone of n,2d

main
2012/11/1
page 181
i

181

Proof. In order to investigate W , we need to dene the associated bilinear form B :


B (p, q) =

Q (p + q) Q (p) Q (q)
2

for p, q R[x]d .

By denition of Q we have Q (p) = (p2 ). Therefore it follows that


B (p, q) = (pq).
A form p R[x]d is in the kernel of Q if and only if B (p, q) = 0 for all q R[x]d .
Since B (p, q) = (pq), the lemma follows.
We note that VC (W ) = implies that the dimension of W is at least n
and we can nd forms p1 , . . . , pn W that have no common zeroes. We need a
dimensional lemma from algebraic geometry which we will use without proof.
Lemma 4.43. Suppose that p1 , . . . , pn R[x]d are forms such that VC (p1 , . . . , pn ) =
and let I = p1 , . . . , pn  be the ideal generated by the forms pi . Then
dim I2d

& '
n
= n dim R[x]d
.
2

Remark 4.44. The forms p1 , . . . , pn R[x]d such that VC (p1 , . . . , pn ) = form


a complete intersection. The dimensional information of the ideal p1 , . . . , pn  is
well understood via the Koszul complex. The statement of Lemma 4.43 is an easy
consequence of the powerful techniques developed for complete intersections [12].

4.6.1

Equality Cases of Hilberts Theorem

We have obtained enough information on the dual cone n,2d to give a unied proof
of the equality cases of Hilberts theorem.
Proof of equality cases in Hilberts theorem. Suppose that n,2d = Pn,2d .
Then there exists an extreme ray of n,2d that does not come from point evaluation.
Let be such an extreme ray and let W be the kernel of Q . By Lemma 4.41 it
follows that rank Q > 1, and therefore by Corollary 4.40 we see that VC (W ) = .
Therefore dim W n and we can nd forms p1 , . . . , pn W such that
VC (p1 , . . . , pn ) = . Let I = p1 , . . . , pn  be the ideal generated by pi . It follows
$ %
2
that W includes I2d and dim I2d = n dim R[x]d n2 by Lemma 4.43. Therefore
we see that
& '
n
2
.
dim W n dim R[x]d
2
However, by (4.4) we must also have
2

dim W

dim R[x]2d 1,

i
i

182

main
2012/11/1
page 182
i

Chapter 4. Nonnegative Polynomials and Sums of Squares


2

since a nontrivial linear functional R[x]2d vanishes on W . We now go case by


case and derive a contradiction from these dimensional facts in each of the equality
cases.
2
Suppose that n = 2. Then dim R[x]2,d = d + 1 and thus dim W 2(d +
1) 1 = 2d + 1 = dim R[x]2,2d , which is a contradiction.
$ %
2
Suppose that 2d = 2. Then dim R[x]n,1 = n and dim W n2 n2 =
$n+1%
= dim R[x]n,2 , leading to the same contradiction.
2
2
Finally
$3% suppose that n = 3 and 2d = 4. Then dim R[x]3,2 = 6 and dim W
6 3 2 = 15 = dim R[x]3,4 , which again leads to the same dimensional contradiction.
We now turn our attention to the structure of extreme rays of n,2d in the
smallest cases where there exist nonnegative polynomials that are not sums of
squares: 3 variables, degree 6, and 4 variables, degree 4.

4.7

Ranks of Extreme Rays of 3,6 and 4,4

We rst examine, in the cases (3, 6) and (4, 4), the structure of linear functionals
R[x]2d with a given kernel W such that VC (W ) = .
Proposition 4.45. Let W be a three-dimensional subspace of R[x]3,3 such that
VC (W ) = . Then dim W 2 = 27 and there exists a unique quadratic form Q
R[x]3,6 containing W in its kernel. Furthermore ker Q = W .
Before we prove Proposition 4.45 we note that the unique form Q with kernel
W need not be positive semidenite. The investigation of positive deniteness of
Q will lead us to evaluation on nite point sets in the next section.
Proof of Proposition 4.45. By applying Lemma 4.43 we see that
dim W 2 = 3 dim R[x]3,3 3 = 27.
Since dim R[x]3,6 = 28 it follows that W 2 is a hyperplane in R[x]3,6 and
therefore there is a unique linear functional vanishing on W . By Lemma 4.42 it
follows that Q is the unique (up to a constant multiple) quadratic form with W in
its kernel.
We leave the part that the dimension of the kernel of Q cannot be more
than 3 as an exercise.
There is also the corresponding proposition for the case (4, 4) with the same
proof.
Proposition 4.46. Let W be a four-dimensional subspace of R[x]4,2 such that
VC (W ) = . Then dim W 2 = 34 and there exists a unique quadratic form Q
R[x]4,4 containing W in its kernel. Furthermore ker Q = W.

i
i

4.7. Ranks of Extreme Rays of 3,6 and 4,4

main
2012/11/1
page 183
i

183

We obtain the following interesting corollaries.


Corollary 4.47. Suppose that spans an extreme ray of 3,6 and does not
correspond to point evaluation. Then rank Q = 7. Conversely, suppose that Q is
3,3
a psd form of rank 7 in S+
and let W be the kernel of Q . If VC (W ) = , then
Q spans an extreme ray of 3,6 .
Proof. Suppose that spans an extreme ray of 3,6 and does not correspond
to point evaluation. Let W be the kernel of Q . We know that V (W ) = and
dim W 3. We can then nd a three-dimensional subspace W of W such that
V (W ) = . Applying Proposition 4.45 we see that there exists a unique quadratic
form Q containing W in its kernel. Then it must happen that Q is a scalar multiple
of Q, and since ker Q = W we see that the kernel of Q has dimension 3 and thus
Q has rank 7.
Conversely suppose that Q is a positive semidenite form of rank 7 and
VC (W ) = . Then by Proposition 4.45 Q is the unique quadratic form in R[x]3,6
with kernel W . Suppose that Q = Q1 + Q2 with Q1 , Q2 3,6 . Then Q1 and
Q2 are positive semidenite forms by Lemma 4.35 and therefore ker Q ker Qi .
Then Q1 and Q2 are scalar multiples of Q and therefore Q spans an extreme ray
of 3,6 .
The above corollary has a couple of interesting consequences. If the quadratic
form Q is in 3,6 and its rank is at most 6, then it must be a convex combination of
rank 1 forms in 3,6 , which we know are point evaluations. Restated in measure and
moment language, this says that if a positive semidenite moment matrix in R[x]3,6
has rank at most 6, then the linear functional can be written as a combination of
point evaluations, and therefore the linear functional has a representing measure.
However, there are rank 7 positive semidenite moment matrices that do not admit
a representing measure.
Another consequence can be stated in optimization terms. Suppose that we
would like to optimize a linear functional over a compact base of the 3,6 . Then
the point where the optimum is achieved will have rank 1 or rank 7.
Corollary 4.48. Suppose that p 3,6 lies on the boundary of the cone of sums
of squares and p is a strictly positive form. Then p is a sum of exactly 3 squares.
Proof. Let p be as above. Since p lies in the boundary of 3,6 there exists an
extreme ray of the dual cone 3,6 such that (p) = 0. Now suppose that p =

fi2 for some fi R[x]3,3 . It follows that Q (fi ) = 0 for all i, and since Q is a
positive semidenite quadratic form, we see that all fi lie in the kernel W of Q .
By Corollary 4.47 we know that dim W = 3 and therefore p is a sum of squares
of forms coming from a three-dimensional subspace of R[x]3,3 . It follows that p
is a sum of at most 3 squares. Since any two ternary cubics have a common real
zero and p is strictly positive, it follows that p cannot be a sum of two or fewer
squares.

i
i

184

main
2012/11/1
page 184
i

Chapter 4. Nonnegative Polynomials and Sums of Squares

The equivalent corollaries hold for the case (4, 4), although the proof of Corollary 4.50 requires slightly more work, while the proof of Corollary 4.49 is exactly
the same. For complete details see [7].
Corollary 4.49. Suppose that spans an extreme ray of 4,4 and does not
correspond to point evaluation. Then rank Q = 6. Conversely, suppose that Q is
4,2
a positive semidenite form of rank 6 in S+
and let W be the kernel of Q . If
VC (W ) = , then Q spans an extreme ray of 4,4 .
Corollary 4.50. Suppose that p 4,4 lies on the boundary of the cone of sums
of squares and p is a strictly positive form. Then p is a sum of exactly 4 squares.
Corollaries 4.48 and 4.50 were used to study the algebraic boundary of the
cones 3,6 and 4,4 in [8].
Exercise 4.51. Show that all forms in R[x]3,6 that can be written as linear combinations of squares of 3 cubics form an irreducible hypersurface in R[x]3,6 . Similarly,
show that all forms in R[x]4,4 that are linear combinations of squares of 4 quadratics also form an irreducible hypersurface in R[x]4,4 . (Hint: Use Terracinis lemma.)
Use Corollaries 4.48 and 4.50 to show that the algebraic boundary of 3,6 and 4,4
has a single component in addition to the discriminant hypersurface.
It was shown in [8] that despite their simple denition the hypersurfaces of
Exercise 4.51 have very high degree: 83200 in the case (3, 6) and 38475 in the
case (4, 4). This shows that the boundary of the cone of sums of squares is quite
complicated from the algebraic point of view.

4.8

Extracting Finite Point Sets

We have established in the previous section that the interesting extreme rays of
3,6 have rank 7 and those of 4,4 have rank 6. Lets consider the case of 4 variables
of degree 4. We have shown that a four-dimensional subspace W leads to a unique
form Q of rank 6 such that the kernel of Q contains W . However, the form Q does
not have to lie in 4,4 , since the form Q is not necessarily positive semidenite.
In order to examine positive semideniteness of Q we reduce the problem to
looking at an evaluation on nite point sets.
Exercise 4.52. Let W be a subspace of R[x]d such that VC (W ) = . Show that
there exist forms q1 , . . . , qn1 W that intersect in dn1 projective points in CPn1 :
s1 , . . . , sdn1 | si CPn1 }.
VC (q1 , . . . , qn1 ) = {
We apply this result to our case of W R[x]4,4 and obtain forms q1 , q2 , q3
W intersecting in 23 = 8 projective points si CP3 . We can take their ane
representatives s1 , . . . , s8 Cn . Unfortunately, even though the forms qi W are
real, their points of intersection may be complex.

i
i

4.9. Volumes

main
2012/11/1
page 185
i

185

However, as was shown in [7], the fact that the form Q is positive semidenite
restricts the number of complex zeroes. Since complex zeroes of real forms come
in conjugate pairs, the fewest number of complex zeroes that the forms qi may
have is 2.
Theorem 4.53. Suppose that R[x]4,4 is an extreme ray of 4,4 that does not
correspond to point evaluation and let W be the kernel of Q . Let q1 , q2 , q3 W
be any three forms intersecting in 23 = 8 projective points in CP3 . Then the forms
qi have at most 2 common complex zeroes. Conversely, given q1 , q2 , q3 R[x]4,2
intersecting in 8 points with at most 2 of them complex, there exists an extreme ray
of 4,4 whose kernel contains q1 , q2 , q3 .
There is an equivalent theorem for the case (3, 6).
Theorem 4.54. Suppose that R[x]3,6 is an extreme ray of 3,6 that does not
correspond to point evaluation and let W be the kernel of Q . Let q1 , q2 W be
any two forms intersecting in 32 = 9 projective points in CP2 . Then the forms
qi have at most 2 common complex zeroes. Conversely, given q1 , q2 , q3 R[x]3,3
intersecting in 9 points with at most 2 of them complex, there exists an extreme ray
of 3,6 whose kernel contains q1 , q2 .
It is possible to apply the CayleyBacharach machinery explained in Section
4.5 to completely describe the structure of the extreme rays of n,2d for the cases
(4, 4) and (3, 6) using the coecients of the unique CayleyBacharach relation that
exists on the points of intersection of the forms qi .
We have now come full circle, from using a nite point set to establish that
there exist nonnegative forms that are not sums of squares in Section 4.3 to showing
that these sets underlie all linear inequalities that separate n,2d from Pn,2d .

4.9

Volumes

We now switch gears completely and turn to the question of the quantitative relationship between Pn,2d and n,2d . Our goal is to compare the relative sizes of
the cones Pn,2d and n,2d . While the cones themselves are unbounded objects, we
can take a section of each cone with the same hyperplane so that both sections are
compact.
Let Ln,2d be an ane hyperplane in R[x]2d consisting of all forms with integral (average) 1 on the unit sphere Sn1 in Rn :


,

p d = 1 ,
Ln,2d = p R[x]2d
Sn1

n,2d
where is the rotation invariant probability measure on Sn1 . Let Pn,2d and
be the sections of Pn,2d and n,2d with Ln,2d:
Pn,2d = Pn,2d Ln,2d

and

n,2d = n,2d Ln,2d.

i
i

186

main
2012/11/1
page 186
i

Chapter 4. Nonnegative Polynomials and Sums of Squares

Let r2d = (x21 + + x2n )d be the form in R[x]2d that is constantly 1 on the
n,2d lie in the ane hyperplane Ln,2d of
unit sphere. Convex bodies Pn,2d and
forms of integral 1 on the unit sphere. We now translate them to lie in the linear
hyperplane Mn,2d of forms of integral 0 on the unit sphere by subtracting r2d :
Pn,2d = Pn,2d r2d = {p R[x]2d | p + r2d Pn,2d }
and
n,2d =
n,2d r2d = {p R[x]2d | p + r2d
n,2d }.

n,2d will be done separately.


The estimation of the volumes of Pn,2d and
Before proceeding we make a short note on the proper way to measure the size of
a convex set. Let K Rn be a convex body. Suppose that we expand K by a
constant factor . Then the volume changes as follows:
Vol(K) = n Vol K.
We would like to think of K and K as similar in size, but if the ambient
dimension n grows, then K is signicantly larger in volume. Therefore the proper
measure of volume that takes care of the dimensional eects is
1

(Vol K) n .

4.9.1

Volume of Nonnegative Forms

Let Mn,2d be the linear hyperplane of forms of integral 0 on the unit sphere:


,

p d = 0 .
Mn,2d = p R[x]2d
Sn1

n,2d live inside Mn,2d, so our calculations will


Both convex bodies Pn,2d and
involve the unit sphere and the unit ball in Mn,2d.
We equip R[x]2d with the L2 inner product:
,
p, q =
pq d.
Sn1

We note that with this metric we have


,

p
2 = p, p =

Sn1

p2 d =
p
22 .

We also let ||p|| denote the L -norm of p:


||p|| = max |p(x)|.
xSn1

Let N be the dimension of Mn,2d . Since Mn,2d is a hyperplane in R[x]2d we


%
$
1. Let SN 1 and B N denote the unit
know that N = dim R[x]2d 1 = n+2d1
2d
sphere and the unit ball in Mn,2d with respect to the L2 inner product.

i
i

4.9. Volumes

main
2012/11/1
page 187
i

187

Our goal is to show the following estimate on the volume of Pn,2d .


Theorem 4.55.

Vol Pn,2d
Vol B N

1/N

1

n1/2 .
2 4d + 2

We rst develop a general way of estimating the volume of a convex set,


starting from simply writing out the integral for the volume in polar coordinates.
We refer to [11] for the relevant analytic inequalities.
Exercise 4.56. Let K Rn be a convex body with the origin in its interior and
let K be the characteristic function of K: K (x) = 1 if x K and K (x) = 0
otherwise. The volume of K is given by the following integral:
,
Vol K =
Rn

K d,

where is the Lebesgue measure.


Let GK be the gauge of K. Rewrite the above integral in polar coordinates
to show that
,
Vol K
=
Gn
K d,
Vol B n
Sn1
where B n and Sn1 are the unit ball and the unit sphere in Rn and is the rotation
invariant probability measure on Sn1 .
Exercise 4.57. Use Exercise 4.56 and H
olders inequality to show that
&

Vol K
Vol B n

'1/n

Sn1

G1
K d.

Exercise 4.58. Use Exercise 4.57 and Jensens inequality to show that
&

Vol K
Vol B n

&,

'1/n

Sn1

'1
GK d

Now we apply the results of Exercises 4.564.58 to the case of Pn,2d .


Lemma 4.59.

Vol Pn,2d
Vol B N

1/N

&,

SN 1

'1
||p|| dp

Proof. We observe that Pn,2d consists of all forms of integral 1 on Sn1 whose
minimum on Sn1 is at least 0. Therefore Pn,2d consists of all forms of integral 0

i
i

188

main
2012/11/1
page 188
i

Chapter 4. Nonnegative Polynomials and Sums of Squares

on Sn1 whose minimum on the unit sphere is at least 1:




,

Pn,2d = p R[x]2d
p d = 0 and min p(x) 1 .
xSn1

Sn1

It follows that the gauge of Pn,2d is given by minSn1 :


GPn,2d (p) = min p(x).

(4.5)

xSn1

Using Exercise 4.58 we can bound the volume of Pn,2d from below:


Vol Pn,2d
Vol B N

1/N

&,

'1
SN 1

min(p) dp

Since minxSn1 p(x) is bounded above by ||p|| we obtain




Vol Pn,2d
Vol B N

1/N

&,

SN 1

'1
||p|| dp

as desired.
From Lemma 4.59 we see that in order to obtain a lower bound on the volume
of Pn,2d we need to nd an upper bound on the average L -norm of forms in SN 1 :
,
||p|| dp .
SN 1

It is easy to see that the L -norm of any polynomial is bounded from below by any
of its L2k -norms:
||p||
p
2k
for all k. Finding upper bounds on the L -norm of forms in R[x]2d in terms of
their L2k -norms is signicantly more challenging.
Exercise 4.60. It was shown by Barvinok in [3] that the following inequality holds
for all p R[x]2d and all k:
&
||p||

'1
2kd + n 1 2k

p
2k .
2kd

Show that for k = n we have

||p|| 2 2d + 1
p
2n
for all p R[x]2d .

i
i

4.9. Volumes

main
2012/11/1
page 189
i

189

Remark 4.61. It is possible to obtain slightly better bounds for our purposes by
using k = n log(2d + 1) in the above inequality. See [4] for details.
We use Barvinoks inequality to convert the problem of bounding the average
L -norm on SN 1 into bounding the average L2n -norm. In order for this to be
useful we need lower bounds on the average L2k -norms. We will show the following
bound.
Lemma 4.62.

,
SN 1

p
2k dp

2k.

Before we proceed with the proof we need some preliminary results.


Exercise 4.63. Let denote the gamma function. Show that for k N
%
$ % $
,
n2 k + 12
2k
%
$
%.
$
x1 d =
12 k + n2
Sn1

(4.6)

Now let : Rn R be a linear form given by (x) = x,  for some vector Rn .
Use (4.6) to show that
%
$n% $
,
1
2k
2k $ 2 % $ k + 2 %
.
(4.7)
(x) dx =

12 k + n2
Sn1
In order to apply the result of Exercise 4.63 we will need to know the L2 -norm
of a special form in Mn,2d .
Lemma 4.64. Let v Sn1 be a unit vector and let v Mn,2d be the form such
that
p, v  = p(v)
Then

for all

v
= dim Mn,2d =

6&
'
n + 2d 1
1.
2d

Proof. Consider the following average:


,
,
p2 (v) dp =
SN 1

p Mn,2d.

SN 1

p, v 2 dp .

On one hand it is the average of a quadratic form on the unit sphere and by Exercise
4.63 we have
,

v
2
p2 (v) dp =
.
dim Mn,2d
SN 1

i
i

190

main
2012/11/1
page 190
i

Chapter 4. Nonnegative Polynomials and Sums of Squares

On the other hand, by symmetry, this average is independent of the choice of


v Sn1 . Therefore we may introduce an extra average over the unit sphere:
,

SN 1

p2 (v) dp =

,
SN 1

Sn1

p2 (v) dp dv .

Now we switch the order of integration:


,

SN 1

We observe that

Sn1

p2 (v) dp =

SN 1

Sn1

p2 (v) dv dp .

p2 (v) dv = 1 for all p SN 1 and therefore


,
p2 (v) dp = 1.

SN 1

The lemma now follows.


We are now ready to estimate the average L2k -norm on SN 1 .
Proof of Lemma 4.62.

SN 1

&,

p
2k dp =

SN 1

Sn1

p2k (x) dx

1
' 2k

By applying the H
older inequality we can move the exponent
&,

,
SN 1

p
2k dp

dp .
1
2k

1
' 2k

,
2k

SN 1

Sn1

outside and obtain

p (x) dx dp

Now we exchange the order of integration:


&,

,
SN 1

p
2k dp

Consider the inner integral

1
' 2k

,
2k

Sn1

SN 1

p (x) dp dx

,
SN 1

p2k (x) dp .

(4.8)

By rotational invariance it does not depend on the choice of the point x Sn1 .
Therefore the outer integral over Sn1 is redundant and we obtain
&,

,
SN 1

p
2k dp

2k

SN 1

p (v) dp

1
' 2k

for any v Sn1 .

(4.9)

i
i

4.9. Volumes

main
2012/11/1
page 191
i

191

We can rewrite this as


,

&,

SN 1

p
2k dp

p, v 

2k

SN 1

1
' 2k

dp

Now we see that the integral in (4.8) is actually just the average of the 2kth
power of a linear form and we can apply Exercise 4.63 to see that
%
$N % $
,
1
2k
2k 2 k + 2
%.
$ % $
p, v  dp =
v

12 k + N2
SN 1
By Lemma 4.64 we know that
v
2 = dim Mn,2d = N.
Putting it all together with (4.9) we see that
,
SN 1

p
2k dp N

1
 $ % $
%  2k
N2 k + 12
%
$ % $
.
12 k + N2

We now use the following two estimates to nish the proof:




1
7
$ %  2k
N2
2
%
$

N
k + N2

and

1
 $
%  2k

k + 12
$1%
k.
2

Weremark that asymptotically the second estimate is an overestimate by a factor


of e.

Proof of Theorem 4.55. We rst use Lemma 4.59 to see that




Vol Pn,2d
Vol B N

1/N

&,

SN 1

'1
||p|| dp

By Exercise 4.60 we know that for all p R[x]2d

||p|| 2 2d + 1
p
2n .
Therefore we see that

1/N
&,
'1
Vol Pn,2d
1

p
2n dp
.
Vol B N
2 2d + 1
SN 1
Now we can apply Lemma 4.62 with k = n and obtain


Vol Pn,2d
Vol B N

1/N

1

n1/2
2 4d + 2

as desired.

i
i

192

4.9.2

main
2012/11/1
page 192
i

Chapter 4. Nonnegative Polynomials and Sums of Squares

Volume of Sums of Squares

We now turn our attention to the cone of sums of squares n,2d . Although it will
be somewhat obscured by our presentation, the main reason for our ability to derive
n,2d comes from the fact that the dual cone
bounds on the volume of
n,2d is a
section of the cone of positive semidenite matrices.
We have just seen how to derive lower bounds on the volume of the cone of
nonnegative forms. These bounds, of course, apply to quadratic forms, and they
can be extended to work for sections of the cone. This gives us a lower bound on
the volume of the dual cone, which can be turned around into an upper bound on
n,2d is therefore
n,2d . The approach to bounding the volume of
the volume of
very similar to what we did for nonnegative forms. In fact, the technique in the
proofs of the main bounds in Lemma 4.70 and Lemma 4.62 is nearly identical.
n,2d is
Let D be the dimension of R[x]d . Our main result on the volume of
as follows.
Theorem 4.65.

n,2d
Vol
Vol B N

1/N

Remark 4.66. Recall that


&
'
n + 2d 1
N=
1
2d

7
2

and

4d+1

6D
.
N

&
'
n+d1
D=
.
d

n,2d is of the
Therefore, for xed degree d our upper bound on the volume of
d/2
. In Theorem 4.55 we proved a lower bound on the volume of Pn,2d that
order n
is of the order n1/2 . Therefore, when the total degree 2d is at least 4, the lower
bound on the volume of Pn,2d is asymptotically much larger than the upper bound
n,2d . Thus we see that if the degree 2d is xed and at least 4,
on the volume of
there are signicantly more nonnegative forms than sums of squares.
It is possible to show that the bounds of Theorems 4.55 and 4.65 are asymptotically tight for the case of xed degree 2d. See [5] for more details.
In Exercises 4.564.58 we showed how to bound the volume of a convex body
K from below using the average of its gauge over the unit sphere Sn1 . As we
explained above, we are now dealing with the dual situation, and we need a related
dual inequality that bounds the volume of K from above by the average gauge of
its dual body K .
Exercise 4.67. Let K Rn be a convex body with 0 in its interior and let K be
the dual convex body dened as
K = {x Rn | x, y 1

for all

y K}.

Show that the gauge of K is given by the following formula:


GK (x) = max x, y.
yK

i
i

4.9. Volumes

main
2012/11/1
page 193
i

193

The following is known as Urysohns inequality [26].


Lemma 4.68.
&

Vol K
Vol B n

'1/n

Sn1

GK (x) dx .

.
In order to apply Lemma 4.68 we need a description of the gauge of
n,2d
Let SD1 be the unit sphere in R[x]d with respect to the L2 inner product.
:
Lemma 4.69. We have the following description of the gauge of
n,2d
G

n,2d

(p) = max p, q 2 .


qSD1

Proof. By Exercise 4.67 the gauge of


n,2d is given by
G

n,2d

(p) = max p, q.


n,2d
q

We observe that the maximal inner product maxq n,2d p, q always occurs at an
n,2d are all squares, and therefore
n,2d . Extreme points of
extreme point of

extreme point of n,2d are translates of squares and have the form
,
q 2 r2d with q R[x]d and
q 2 d = 1.
Sn1

The condition Sn1 q 2 d = 1 corresponds exactly to q lying in the unit sphere of


R[x]d . Since forms p Mn,2d have integral zero on the unit sphere Sn1 , it follows
that
p, r2d  = 0

for all

p Mn,2d.

n,2d we see that


Combining with the description of the extreme points of
G

n,2d

(p) = max p, q 2 .


qSD1

Given a form p R[x]2d we dene the associated quadratic form Qp on R[x]d :


Qp (q) = p, q 2 

for

q R[x]d .

n,2d is given by the maximum of Qp on


By Lemma 4.69 we see that the gauge of
D1
the unit sphere S
in R[x]d :
G n,2d (p) = max Qp (q).
qSD1

Since Qp is a quadratic form on R[x]d , its L -norm is the maximal value it


takes on the unit sphere SD1 :
||Qp || = max |Qp (q)|.
qSD1

i
i

194

main
2012/11/1
page 194
i

Chapter 4. Nonnegative Polynomials and Sums of Squares


Applying Lemma 4.68 we see that


n,2d
Vol
Vol B N

1/N

||Qp || dp .

SN 1

Now we can apply Barvinoks inequality to bound ||Qp || by high L2k -norms.
Using Exercise 4.60 with k = D we see that

||Qp || 2 3
Qp
2D .
Therefore we obtain


n,2d
Vol
Vol B N

1/N

,
2 3

SN 1

Qp
2D dp .

The proof is now nished with the following estimate, which proceeds in nearly
the same way as the proof of Lemma 4.62.
Lemma 4.70.

,
SN 1

Qp
2D dp 2

4d

2D
.
N

Proof. We rst write out the integral we would like to estimate:


,
SN 1

&,

Qp
2D dp =

'1/2D
p, q 

2 2D

SN 1

SD1

dq

dp .

Using the Holder inequality we move the exponent 1/2D outside:


&,

,
SN 1

Qp
2D dp

'1/2D

,
p, q 

2 2D

SN 1

SD1

dq dp

Next we interchange the order of integration:


&,

,
SN 1

Qp
2D dp

'1/2D

,
p, q 

2 2D

SD1

SN 1

dp dq

(4.10)

Consider the inner integral


,
SN 1

p, q 2 2D dp .

(4.11)

We apply Exercise 4.63 to see that


,
p, q 

2 2D

SN 1

%
$N % $
D + 12
2
%.
$ % $
12 D + N2

2 2D

dp
q

i
i

4.10. Convex Forms

main
2012/11/1
page 195
i

195

The reason that we have an inequality, instead of equality, above is that q 2


does not lie in the hyperplane Mn,2d , and for equality we should use the norm of
the projection of q 2 into Mn,2d . We now observe that

q 2
=
q
24 .
Since q lies in the unit sphere of SD1 it follows that
q
= 1. By a result of
Duoandikoetxea in [9] we know that

q
4 42d
q
.
Putting it all together we get
,
p, q 

2 2D

SN 1

dp 4

%
$N % $
D + 12
2
$ % $
%.
12 D + N2

4dD

We note that this estimate is independent of q and therefore the outer integral in (4.10) is redundant and we obtain
 $ % $
% 1/2D
N
1

D
+
2%
$ 2% $

Qp
2D dp 42d
.
12 D + N2
SN 1

As in the proof of Lemma 4.62 we use the estimates




1
7
$ %  2D
N2
2
%
$

N
N
D+ 2

1
 $
%  2D

D + 12
$1%
D.
2

and

Therefore we have
7

,
SN 1

4.10

Qp
2D 2

4d

2D
.
N

Convex Forms

There is another very interesting convex cone inside R[x]2d , the cone of convex
forms Cn,2d . A form p R[x]2d is called convex if p is a convex function on Rn :
&
'
x+y
p(x) + p(y)
p

for all x, y Rn .
2
2
It is an easy exercise to show that Cn,2d is contained in the cone of nonnegative
forms.
Exercise 4.71. Show that if a form p R[x]2d is convex, then p is nonnegative.
Show that x21 x22 P2,4 is not convex.

i
i

196

main
2012/11/1
page 196
i

Chapter 4. Nonnegative Polynomials and Sums of Squares

The relationship between convex forms and sums of squares is signicantly


harder to understand. An equivalent denition of convexity is that a form p R[x]2d
is convex if and only if its Hessian 2 p is a positive semidenite matrix on all of
Rn . We can associate with p its Hessian form Hp , which is a form in 2n variables,
with old variables x = (x1 , . . . , xn ) and new variables y = (y1 , . . . , yn ). The Hessian
form Hp (x, y) is given by
$
%
Hp (x, y) = y T 2 p(x) y.
We note that Hp is a bihomogeneous form; it is quadratic in y and of degree
2d 2 in x. A form p is convex if and only if its Hessian form Hp is nonnegative
on R2n .
A form p R[x]2d is called sos-convex if Hp is a sum of squares. Sos-convexity
is a more restrictive condition than being a sum of squares.
Exercise 4.72. Let p R[x]2d be an sos-convex form. Show that p is a sum of
squares.
An explicit example of a convex form that is not sos-convex was constructed
in [1]. We will explain below that there exist convex forms that are not sums of
squares. In fact, we will show using volume arguments that asymptotically there
are signicantly more convex forms than sums of squares. However, it is still an
open question to nd an explicit example of a convex form that is not a sum of
squares.

4.10.1

Volumes of Convex Forms

As before we can take a compact section of Cn,2d with the hyperplane Ln,2d of
forms of integral 1 on Sn1 :
Cn,2d = Cn,2d Ln,2d .
We also let Cn,2d be Cn,2d translated by subtracting r2d :
Cn,2d = Cn,2d r2d .
The convex body Cn,2d lies in the hyperplane Mn,2d of forms of average 0 on
the unit sphere Sn1 . We will show the following estimate on the volume of Cn,2d
that, together with Theorems 4.55 and 4.65, implies that if the degree 2d is xed
and the number of variables grows then there are signicantly more convex forms
than sums of squares. This is the only currently known method of establishing
existence of convex forms that are not sums of squares.
Theorem 4.73.
&

Vol Cn,2d
Vol Pn,2d

'1/N

1
.
2(2d 1)

i
i

4.10. Convex Forms

main
2012/11/1
page 197
i

197

Remark 4.74. From Exercise 4.71 it follows that Cn,2d Pn,2d . Therefore the
estimate of Theorem 4.73 is asymptotically tight for the case of xed degree 2d.
Our rst goal is to show that if a form p R[x]2d is suciently close to being
constant on the unit sphere, then p must be convex.
Theorem 4.75. Let p be a form in R[x]2d . If for all v Sn1
1

1
1
p(v) 1 +
,
2d 1
2d 1

then p is convex.
For a point Sn1 we can think of as a direction. We will use
p
= p, 

to denote the derivative of p in the direction . A function f : Rn R is convex if


and only if for all v Rn and all Sn1 we have
2f
(v) 0.
2
Since we are working with forms it suces to restrict our attention to v Sn1 .
We use |p| to denote the length of the gradient of p. We will need the following
theorem of Kellogg [13].
Theorem 4.76. Let p be a form in R[x]d . For all v Sn1
|p(v)| d ||p|| .
Theorem 4.76 implies that for any v Sn1


p
(v) d ||p|| .


This follows since
p
= p,  |p| || = |p|

by applying the CauchySchwarz inequality.


We extend Theorem 4.76 to cover the case of higher derivatives, which is
necessary since convexity is a condition on second derivatives:
Lemma 4.77. Let p be a form in R[x]d . For any v and 1 , . . . k Sn1




kp
d!


(v)
1 k (d k)! ||p|| .

i
i

198

main
2012/11/1
page 198
i

Chapter 4. Nonnegative Polynomials and Sums of Squares

Proof. We proceed by induction on the order of partial derivatives k. The base case
k = 1 is covered by Theorem 4.76. Now we need to show the induction step. We
assume that the statement holds for all derivatives of order at most k and consider




k+1 p


(v)
1 k+1
for some 1 , . . . k+1 Sn1 .
Let
q=

p
.
1

Using the base case we see that


||q|| d ||p|| .

(4.12)

Also, we know that q is a form in n variables of degree d 1. Therefore by the


induction assumption




(d 1)!
kq


(v)
(4.13)
2 k+1 (d k 1)! ||q|| .
Putting together (4.12) and (4.13), the lemma follows.
We are now ready to prove Theorem 4.75, which provides a sucient condition
for a form to be convex.
Proof of Theorem 4.75. Let p be as in the statement of the theorem, and let
q = p r2d . By the assumptions of the theorem it follows that, for all v Sn1 ,

1
1
q(v)
.
2d 1
2d 1

In other words
||q||

1
.
2d 1

Then by Lemma 4.77 we know that for any v and Sn1


2

q


2 (v) 2d.
In particular, it follows that
2q
(v) 2d
2
for all v and Sn1 .

i
i

4.10. Convex Forms

main
2012/11/1
page 199
i

199

It is easy to check that


2 r2d
(v) = 2d + 4d(d 1)v, 2 2d.
2
Since we know that p = q + r2d it follows that for all v and Sn1
2p
(v) 0,
2
and therefore p is convex.
We need one more result from convexity to help us with the volume bounds
(see [16]).
Exercise 4.78. Let K be a convex body in Rn . The barycenter of K is dened to
be a vector b = (b1 , . . . , bn ) K given by
,
xi K d,
bi =
Rn

where K is the characteristic function of K and is the Lebesgue measure. Let


K  be the reection of K through the barycenter b: K  = b (K b). Show that
&

Vol K K 
Vol K

' n1

1
.
2

Exercise 4.79. The set Pn,2d is a convex body in the hyperplane Mn,2d of all
forms of integral 0 on the unit sphere. Use invariance of Pn,2d under orthogonal
changes of coordinates to show that 0 is the barycenter of Pn,2d . Let Pn,2d be
the reection of Pn,2d through the origin. Show that Pn,2d Pn,2d consists of all
forms in Mn,2d whose values on the unit are between 1 and 1, i.e., the forms with
L -norm at most 1:
Pn,2d Pn,2d = {p Mn,2d | ||p|| 1} .
Proof of Theorem 4.73. Let Kn,2d be the set of forms that take values only
1
1
and 1 + 2d1
on the unit sphere:
between 1 2d1

Kn,2d =




1
1
p(v) 1 +
for all v Sn1 .
p R[x]2d 1
2d 1
2d 1

n,2d be the section of


We note that Kn,2d is a compact convex set. We let K
Kn,2d with Ln,2d ,
n,2d = Kn,2d Ln,2d,
K

i
i

200

main
2012/11/1
page 200
i

Chapter 4. Nonnegative Polynomials and Sums of Squares

n,2d be the translated section:


and let K
n,2d = K
n,2d r2d .
K
n,2d consists of all the forms in Mn,2d that take values between
It follows that K
1
1
1
2d1
and 2d1
on the unit sphere, so forms with L -norm at most 2d1
:



1
n,2d = p Mn,2d ||p||
K
.

2d 1
By Exercise 4.79 it follows that
1 .
2d 1

/
n,2d.
Pn,2d Pn,2d K

Using Exercise 4.78 we see that




Vol Pn,2d Pn,2d


Vol Pn,2d

1/N

1
.
2

Therefore it follows that




n,2d
Vol K
Vol Pn,2d

1/N

1
.
2(2d 1)

n,2d is contained in Cn,2d ,


On the other hand, by Theorem 4.75 we know that K
and the theorem follows.

Bibliography
[1] A. A. Ahmadi and P. A. Parrilo. A convex polynomial that is not sos-convex.
Math. Program. Ser. A, 135:275292, 2012.
[2] A. Barvinok. A Course in Convexity. American Mathematical Society, Providence, RI, 2002.
[3] A. Barvinok. Estimating L norms by L2k norms for functions on orbits.
Found. Comput. Math., 2:393412, 2002.
[4] A. Barvinok and G. Blekherman. Convex geometry of orbits. In Combinatorial and Computational Geometry, Math. Sci. Res. Inst. Publ. 52, Cambridge
University Press, Cambridge, UK, 2005, pp. 5177.
[5] G. Blekherman. There are signicantly more nonnegative polynomials than
sums of squares. Israel J. Math., 183:355380, 2006.
[6] G. Blekherman. Dimensional dierences between nonnegative polynomials and
sums of squares. Submitted for publication, arXiv:0907.1339.

i
i

Bibliography

main
2012/11/1
page 201
i

201

[7] G. Blekherman. Nonnegative polynomials and sums of squares. J. Amer. Math.


Soc., 25:617635, 2012.
[8] G. Blekherman, J. Hauenstein, J. C. Ottem, K. Ranestad, and B. Sturmfels.
Algebraic boundaries of Hilberts SOS cones. To appear in Compositio Mathematica. arXiv:1107.1846.
[9] J. Duoandikoetxea. Reverse Holder inequalities for spherical harmonics. Proc.
Amer. Math. Soc., 101:487491, 1987.
[10] D. Eisenbud, M. Green, and J. Harris. Cayley-Bacharach theorems and conjectures. Bull. Amer. Math. Soc., 33:295324, 1996.
[11] G. Hardy, E. Littlewood, and G. Polya. Inequalities. Cambridge University
Press, Cambridge, UK, 1988.
[12] J. Harris. Algebraic Geometry. A First Course, Grad. Texts in Math. 133.
Springer-Verlag, New York, 1995.
[13] O. D. Kellogg. On bounded polynomials in several variables. Math. Z., 27:55
64, 1928.
[14] J. M. Landsberg and Z. Teitler. On the ranks and border ranks of symmetric
tensors. Found. Comput. Math., 10:339366, 2010.
[15] J. B. Lasserre, Moments, Positive Polynomials and Their Applications. Imperial College Press, London, 2009.
[16] V. D. Milman and A. Pajor. Entropy and asymptotic geometry of nonsymmetric convex bodies. Adv. Math., 152:314335, 2000.
[17] R. Miranda. Linear systems of plane curves. Notices Amer. Math. Soc., 46:192
202, 1999.
[18] P. A. Parrilo. Semidenite programming relaxations for semialgebraic problems. Math. Program. Ser. B, 96:293320, 2000/01.
[19] P. A. Parrilo and B. Sturmfels. Minimizing polynomial functions. In Algorithmic and Quantitative Aspects of Real Algebraic Geometry, DIMACS Ser.
Discrete Math. Theoret. Comput. Sci. 60, American Mathmatical Society,
Providence, RI, 2003, pp. 83100.
[20] M. Ramana and A. J. Goldman. Some geometric results in semidenite programming. J. Global Optim., 7:3350, 1995.
[21] B. Reznick, Sums of Even Powers of Real Linear Forms, Mem. Amer. Math.
Soc. 96, American Mathematical Society, Providence, RI, 1992.
[22] B. Reznick. Some concrete aspects of Hilberts 17th problem. In Real Algebraic
Geometry and Ordered Structures, Contemp. Math. 253. American Mathematical Society, Providence, RI, 2000, pp. 251272.

i
i

202

main
2012/11/1
page 202
i

Chapter 4. Nonnegative Polynomials and Sums of Squares

[23] B. Reznick. On Hilberts construction of positive polynomials. Submitted for


publication, arXiv:0707.2156.
[24] B. Reznick. Blenders. In Notions of Positivity and the Geometry of Polynomials, P. Branden, M. Passare, and M. Putinar, eds., Trends in Math. Birkhauser,
Basel, 2011, pp. 345373.
[25] R. Sanyal, F. Sottile, and B. Sturmfels. Orbitopes. Mathematika, 57:275314,
2011.
[26] R. Schneider. Convex Bodies: The Brunn-Minkowski Theory. Cambridge University Press, Cambridge, UK, 1993.

i
i

main
2012/11/1
page 203
i

Chapter 5

Dualities

Philipp Rostalski and Bernd Sturmfels

Dualities are ubiquitous in mathematics and its applications. This chapter compares
several notions of duality that are central to the connections between convexity,
optimization, and algebraic geometry developed in this book. It is meant as a rst
introduction and is intended for a diverse audience ranging from graduate students
in mathematics to practitioners of optimization who are based in engineering.

5.1

Introduction

Convex algebraic geometry concerns the interplay between optimization theory and
real algebraic geometry. Its objects of study include convex semialgebraic sets that
arise in semidenite programming and from sums of squares. This chapter compares
three notions of duality that are relevant in these contexts: duality of convex bodies,
duality of projective varieties, and the KarushKuhnTucker conditions derived
from Lagrange duality. We show that the optimal value of a polynomial program is
an algebraic function whose minimal polynomial is expressed by the hypersurface
projectively dual to the constraint set. We give an introduction to the algebraic
geometry in the boundary of the convex hull of a compact variety. Our focus lies
on making the polynomials that vanish on that boundary explicit, in contrast to
the representation of convex bodies as projected spectrahedra. We also explore the
geometric underpinnings of semidenite programming duality.
Duality for vector spaces lies at the heart of linear algebra and functional
analysis. Duality in convex geometry is essentially an involution on the set of
Philipp Rostalski was supported by the Alexander-von-Humboldt Foundation through a
Feodor Lynen postdoctoral fellowship.
Bernd Sturmfels was supported by NSF grants DMS-0757207 and DMS-0968882.

203

i
i

204

main
2012/11/1
page 204
i

Chapter 5. Dualities

Figure 5.1. The cube is dual to the octahedron.


convex bodies: for instance, it maps the cube to the octahedron and vice versa
(Figure 5.1). Duality in optimization, known as Lagrange duality, plays a key role
in designing ecient algorithms for the solution of various optimization problems.
In projective geometry, points are dual to hyperplanes, and this leads to a natural
notion of projective duality for algebraic varieties. Our aim here is to explore these
dualities and their interconnections in the context of polynomial optimization and
semidenite programming. Toward the end of the introduction, we shall discuss the
context and organization of this chapter. At this point, however, we jump right in
and present a concrete three-dimensional example that illustrates our perspective
on these topics.

5.1.1

How to Dualize a Pillow

We consider the following symmetric matrix with

1 x
x 1

Q(x, y, z) =
0 y
x 0

three indeterminate entries:

0 x
y 0
.
(5.1)
1 z
z 1

This symmetric 44 matrix species a three-dimensional compact convex body


;
:
(5.2)
P = (x, y, z) R3 | Q(x, y, z)  0 .
The notation  0 means that the matrix is positive semidenite, i.e., all four
eigenvalues are nonnegative real numbers. Such a linear matrix inequality always

i
i

5.1. Introduction

main
2012/11/1
page 205
i

205

Figure 5.2. A three-dimensional spectrahedron P and its dual convex body P .


denes a closed convex set (as in (5.2)) which is referred to as a spectrahedron.
Positive semidenite matrices and spectrahedra appear in all chapters of this book.
Our spectrahedron P looks like a pillow. It is shown on the left in Figure 5.2.
The algebraic boundary of P is the surface specied by the determinant
det(Q(x, y, z))

x2 (y z)2 2x2 y 2 z 2 + 1

0.

At this point we pause to emphasize that Subsection 5.1.1 is intended to be a


rst welcome to our readers. The objects of study are introduced here informally,
by way of one concrete example in three dimensions, which may guide the reader
through the following sections. Precise denitions of the general concepts, such as
algebraic boundary,algebraic degree, etc., will be furnished in the later sections.
The interior of the spectrahedron P in (5.1) represents all matrices Q(x, y, z)
whose four eigenvalues are positive. At all smooth points on the boundary of P ,
precisely one eigenvalue vanishes, and the rank of the matrix Q(x, y, z) drops from
4 to 3. However, the rank drops further to 2 at the four singular points
1
1
1
1
(x, y, z) = (1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 1, 1).
2
2
2
2

(5.3)

We nd these from a Gr
obner basis of the ideal of 3 3 minors of Q(x, y, z):
: 2
;
2x 1, 2z 2 1, y + z .
The linear polynomial y + z in this Gr
obner basis denes the symmetry plane of the
pillow P . The four singular points form a square in that plane. Its edges are also
edges of P . All other faces of P are exposed points. These come in two families,
sometimes called protrusions, one above the plane y + z = 0 and one below it.
The protrusions are drawn in two dierent colors on the left in Figure 5.2.
Note that the surface P is smooth along the four edges that separate the two
protrusions. To be more precise, the four points (5.3) are the only singular points
in P . All points in the relative interiors of the four edges are nonsingular in P .

i
i

206

main
2012/11/1
page 206
i

Chapter 5. Dualities
Like all convex bodies, our pillow P has an associated dual convex body
:
;
(5.4)
P = (a, b, c) R3 | ax + by + cz 1 for all (x, y, z) P ,

consisting of all linear forms that evaluate to at most one on the convex body P .
The dual pillow P is shown on the right in Figure 5.2. Note the association
of faces under duality. The pillow P has four one-dimensional faces, four singular
zero-dimensional faces, and two smooth families of zero-dimensional faces. The
corresponding dual faces of P have dimensions 0, 2, and 0, respectively.
Semidenite programming was introduced in Chapter 2 as the computational
problem of optimizing a linear function over a spectrahedron. For our pillow P ,
this optimization problem takes the form
p (a, b, c) =

max

(x,y,z)R3

subject to

ax + by + cz
Q(x, y, z)  0.

(5.5)

We regard this as a parametric optimization problem: we are interested in the


optimal value and optimal solution of (5.5) as a function of (a, b, c) R3 . This
function can be expressed in terms of the dual body P as follows:
d (a, b, c) =

min
R

subject to

1
(a, b, c) P .

(5.6)

We distinguish this formulation from the duality in semidenite programming.


The dual to (5.5) is the following program with 7 decision variables:
d (a, b, c) =

min

u1 + u4 + u6 + u7

2u2
2u1
2u2
2u4
subject to
2u3
b
2u2 a 2u5
uR7

2u3
b
2u6
c

2u2 a
2u5
 0.
c
2u7

(5.7)

The derivation of such a dual formulation will be explained in Section 5.5. Since
(5.5) and (5.7) are both strictly feasible, strong duality holds [5, Subsection 5.2.3];
i.e., the two programs attain the same optimal value: p (a, b, c) = d (a, b, c). Hence,
problem (5.7) can be derived from (5.6), as we shall see in Section 5.5.
We write M (u; a, b, c) for the 44 matrix in (5.7). The following equations and
inequalities, known as the KarushKuhnTucker conditions (KKT), are necessary
and sucient for any pair of optimal solutions:
Q(x, y, z) M (u; a, b, c) = 0,

(complementary slackness)

Q(x, y, z)  0,
M (u; a, b, c)  0.
We relax the inequality constraints and consider the system of equations
= ax + by + cz

and Q(x, y, z) M (u; a, b, c) = 0.

i
i

5.1. Introduction

main
2012/11/1
page 207
i

207

This is a system of 17 equations. Using computer algebra, we eliminate the 10


unknowns x, y, z, u1 , . . . , u7 . The result is a polynomial in a, b, c, and . Its factors,
shown in (5.8)(5.9), express the optimal value  in terms of a, b, c.
At the optimal solution, the product of the two 44 matrices Q(x, y, z) and
M (u; a, b, c) is zero, and hence the pair (rank(Q), rank(M )) equals either (3, 1) or
(2, 2). In the former case the optimal value  is one of the two solutions of
(b2 + 2bc + c2 ) 2 a2 b2 a2 c2 b4 2b2 c2 2bc3 c4 2b3 c = 0.

(5.8)

In the latter case it comes from the four corners of the pillow, and it satises
(2 2 a2 + 2ab b2 + 2bc c2 2ac)
(2 2 a2 2ab b2 + 2bc c2 + 2ac)

0.

(5.9)

These two equations describe the algebraic boundary of the dual body P . Namely,
after setting = 1, the irreducible polynomial in (5.8) describes the quartic surface
that makes up the curved part of the boundary of P , as seen in Figure 5.2. In
addition, there are four planes spanned by at two-dimensional faces of P . The
product of the four corresponding ane linear forms is the expression (5.9). Indeed, each of the two quadrics in (5.9) factors into two linear factors. These two
characterize the planes spanned by opposite 2-faces of P .
The two equations (5.8) and (5.9) also oer a rst glimpse of the concept
of projective duality in algebraic geometry, dened precisely in Subsection 5.2.4.
Namely, consider the surface in projective space P3 dened by det(Q(x, y, z)) = 0
after replacing the ones along the diagonals by a homogenization variable. Then
(5.8) is its dual surface in the dual projective space (P3 ) . The surface (5.9) in (P3 )
is dual to the zero-dimensional variety in P3 cut out by the 33 minors of Q(x, y, z).
The optimal value function of the optimization problem (5.5) is represented,
in the sense of Section 5.3, by the algebraic surfaces dual to the boundary of P
and its singular locus. We have seen two dierent ways of dualizing (5.5): the dual
optimization problem (5.7) and the optimization problem (5.6) on P . These two
formulations are related as follows. If we regard (5.7) as specifying a 10-dimensional
spectrahedron, then the dual pillow P is a projection of that spectrahedron:
:
;
P = (a, b, c) R3 | u R7 : M (u; a, b, c)  0 and u1 + u4 + u6 + u7 = 1 .
Linear projections of spectrahedra, so-called projected spectrahedra, were introduced
in Chapter 2. They are at the heart of several parts of this book, most notably,
Chapters 6 and 7. The dual of a spectrahedron is generally not a spectrahedron,
but it is always a projected spectrahedron. We shall see this in Theorem 5.57.

5.1.2

Context and Outline

Duality is a central concept in convexity and convex optimization, and numerous authors have written about their connections and their interplay with other notions of
duality and polarity. Relevant references include Barvinoks textbook [1, Section 4]
and the survey by Luenberger [24]. The latter focuses on dualities used in engineering, such as duality of vector spaces, polytopes, graphs, and control systems. The

i
i

208

main
2012/11/1
page 208
i

Chapter 5. Dualities

objective of this chapter is to revisit the theme of duality in the context of convex
algebraic geometry and semidenite optimization. In algebraic geometry, there is
a natural notion of projective duality, which associates to every algebraic variety a
dual variety. One of our main goals is to explore the meaning of projective duality
for optimization theory. It is precisely this deeper connection with algebra which
distinguishes this chapter from other treatments of duality in convex optimization.
Our presentation is organized as follows. In Section 5.2 we cover preliminaries
needed for the rest of the chapter. Here the various dualities are carefully dened
and their basic properties are illustrated by means of examples. In Section 5.3
we derive the result that the optimal value function of a polynomial program is
represented by the dening equation of the hypersurface projectively dual to the
manifold describing the boundary of all feasible solutions. This highlights the important fact that the duality best known to algebraic geometers arises very naturally
in convex optimization. Section 5.4 concerns the convex hull of a compact algebraic
variety in Rn . We discuss work of Ranestad and Sturmfels [31, 32] on the hypersurfaces in the boundary of such a convex body, and we present several examples
and applications.
In Section 5.5 we focus on semidenite programming (SDP), and we oer a
concise geometric introduction to SDP duality. This leads us to the concept of
algebraic degree of SDP [12, 27] or, more geometrically, to projective duality for
varieties dened by rank constraints on symmetric matrices of linear forms.
A projected spectrahedron is the image of a spectrahedron under a linear projection. Its dual body is a linear section of the dual body to the spectrahedron. In
Section 5.6 we examine this situation in the context of sums-of-squares programming, and we discuss linear families of nonnegative polynomials. The gures in
this chapter were made with the software package Bermeja [34], which specializes
in computations in convex algebraic geometry.
We now come to the rst round of exercises in this chapter. They are meant for
our readers to get their hands dirty right away. The problems can be approached
from rst principles. No knowledge of any general algorithms or theorems is needed.
The use of both numerical software and computer algebra tools is encouraged.
Exercises
Exercise 5.1. Maximize the function 2x + 3y + 7y over the spectrahedron P given
in (5.2). Express the optimal solution in exact arithmetic. Locate the cost function
on the right in Figure 5.2 and locate the optimal solution on the left.
Exercise 5.2. Compute the projections of the spectrahedron P into the (x, y)plane and into the (y, z)-plane. Determine polynomials f (x, y) and g(y, z) that
vanish on the boundaries of these two planar convex bodies.
Exercise 5.3. Project P into a random plane and compute the irreducible polynomial of degree eight in two variables that vanishes on the boundary of image.
Exercise 5.4. Does there exist a projected spectrahedron that is not a spectrahedron?

i
i

5.2. Ingredients

main
2012/11/1
page 209
i

209

Exercise 5.5. A correlation matrix is a positive semidenite real symmetric n n


matrix whose n diagonal entries are all equal to 1.
(a) Maximize the sum of the o-diagonal entries over correlation matrices with
n = 3. Solve this optimization problem also for n = 4.
(b) Minimize the sum of the o-diagonal entries over correlation matrices with
n = 3. Solve this optimization problem also for n = 4.
(c) Does there exist a correlation matrix, of any size n, whose determinant is
larger than 1? Find a proof or counterexample.

5.2

Ingredients

In this section we review the mathematical preliminaries needed for the rest of the
chapter, we give precise denitions, and we x more of the notation. We begin
with the notion of duality for vector spaces and cones therein; then we move on to
convex bodies, polytopes, Lagrange duality in optimization, the KKT conditions,
and projective duality in algebraic geometry, and we conclude with discriminants.

5.2.1

Vector Spaces and Cones

We x an ordered eld K. The primary example is the eld of real numbers,


K = R, but we also allow other elds, such as the rational numbers K = Q
or the real Puiseux series K = R{{ }}. The examples in this chapter have their
algebraic representation over the rationals Q, but we consider the corresponding
geometric objects over the reals R. However, special geometric features naturally

lead to intermediate elds, e.g., the singular points in (5.3) live over the eld Q( 2).
Puiseux series come in handy when one needs a deformation parameter to deal
with degeneracies. This is standard for algorithms in real algebraic geometry [2].
Fix a nite-dimensional vector space V over an ordered eld K. The dual
vector space is the set V = Hom(V, K) of all linear forms on V . Let V and W
be vector spaces and : V W a linear map. The adjoint : W V is the
linear map dened by (w ) = w V for every w W . If we x bases of
both V and W , then is represented by a matrix A. The adjoint is represented,
relative to the dual bases for W and V , by the transpose AT of the matrix A.
A subset C V is a cone if it is closed under multiplication with positive
scalars. A cone C need not be convex, but its dual cone
C = { l V | for all x C : l(x) 0 }

(5.10)

is always closed and convex in V . If C is a convex cone, then the second dual
(C ) is the closure of C. Thus, if C is a closed convex cone in V , then
(C ) = C.

(5.11)

This important relationship is referred to as biduality.

i
i

210

main
2012/11/1
page 210
i

Chapter 5. Dualities

Every linear subspace L V is also a closed convex cone. The dual of L,


when viewed as a cone, is the orthogonal complement of L, viewed as a subspace:
L = L = { l V | for all x L : l(x) = 0 } .
The adjoint to the inclusion L V is the projection L : V V /L . Given any
(convex) cone C V , the intersection C L is a (convex) cone in L. Its dual cone
(C L) is the projection of the cone C into V /L . More precisely,
(C L) = C + L

in V .

(5.12)

Now, it makes sense to consider this convex set modulo L . We can thus identify
(C L) = L (C )

in V /L .

(5.13)

This formula expresses the fact that projection and intersection are dual operations.
Example 5.6. It is necessary to take the closure of L (C ) in (5.12) and (5.13)
because projections of closed convex cones need not be closed. The following simple
example is derived from [18, Example 3.5, p. 196]. Consider the closed convex cone
;
:
C =
(u, x, y, z) R4 : u 0, u + x 0, y 0, z 0, and (u + x)y z 2 ,
and x the hyperplane L = {(0, x, y, z) : x, y, z R} $ R3 . Then L is the
projection from R4 to R3 given by dropping the u-coordinate. We claim that the
image L (C ) is not closed. To see this, we note that for every > 0 the vector
(1/ , 0, , 1) lies in C , and hence (0, , 1) lies in L (C ). On the other hand, (0, 0, 1)
does not lie in L (C ) because z = 1 implies (u + x)y 1 and hence y > 0.
The results summarized above are fundamental in convex analysis. For proofs
and details we refer to the textbook by Rockafellar [33, Section 16]. The space
V /L is the space Hom(L, K) of linear functionals on L. In applications one often
identies this space with L itself, by means of an inner product on the ambient space
V . The linear map L then becomes the orthogonal projection from V onto L, and
(5.13) is the closure of the image of C under that orthogonal projection.
A subset F C of a convex set C is a face if F is itself convex and contains
any line segment L C whose relative interior intersects F . We say that F is an
exposed face if there exists a linear functional l that attains its minimum over C
precisely at F . Clearly, every exposed face of C is a face, but the converse does not
hold. For instance, the edges of the triangle on the top in Figure 5.6 are nonexposed
faces of the three-dimensional convex body shown there.
An exposed face F of a cone C determines a face of the dual cone C via
F  = { l C | l attains its minimum over C at F } .
The dimensions of the faces F of C and F  of C satisfy the inequality
dim(F ) + dim(F  ) dim(V ).

(5.14)

If C is a polyhedral cone, then C is also polyhedral. In that case, the number of


faces F and F  is nite and equality holds in (5.14). On the other hand, most
convex cones considered in this chapter are not polyhedral; they have innitely

i
i

5.2. Ingredients

main
2012/11/1
page 211
i

211

many faces, and the inequality in (5.14)


 is usually strict. For instance, the secondorder cone C = { (x, y, z) R3 : x2 + y 2 z} is self-dual, each proper face F
of C is one-dimensional, and the formula (5.14) says that 1 + 1 3.

5.2.2

Convex Bodies and Their Algebraic Boundary

A convex body in V is a full-dimensional convex set that is closed and bounded. If C


is a cone and int(C ), then the hyperplane {(x) = z} intersects C for all z 0
and yields a convex body. In this manner, every pointed r-dimensional cone gives
rise to an (r1)-dimensional convex body by xing z = 1. The convex body forms
the base of the cone. The cone can be recovered from its base up to a linear isomorphism. These transformations, known as homogenization and dehomogenization
with homogenization variable z, respect faces and algebraic boundaries. They allow
us to go back and forth between convex bodies and cones in the next higher dimension. For instance, the three-dimensional body P in (5.2) is the base of the cone in
R4 we get by multiplying the constants 1 on the diagonal in (5.1) with a new variable.
Let P be a full-dimensional convex body in V and assume that 0 int(P ).
Dehomogenizing the denition for cones, we obtain the dual convex body
P = { V | for all x P : (x) 1 } .

(5.15)

This is derived from (5.10) using the identication l(x) = z (x) for z = 1. We
note that the dual of a convex body (as opposed to the dual of a cone) is not an
intrinsic construction, but it depends on the position of P relative to the origin.
Just as in the case of convex cones, if P is closed, then biduality holds:
(P ) = P.
The denition (5.15) makes sense for arbitrary subsets P of V . That is, P need
not be convex or closed. A standard fact from convex analysis [33, Corollary 12.1.1
and Section 14] says that the double dual is the closure of the convex hull with the
origin:
(P ) = conv(P 0).
All convex bodies discussed in this chapter are semialgebraic, that is, they can
be described by Boolean combinations of polynomial inequalities. We note that if P
is semialgebraic then its dual body P is also semialgebraic. This is a consequence
of Tarskis theorem on quantier elimination in real algebraic geometry [2, 4].
The algebraic boundary of a semialgebraic convex body P , denoted a P , is the
smallest complex algebraic variety that contains the boundary P . In geometric
language, a P is the Zariski closure of P . It is identied with the squarefree
polynomial fP that vanishes on P . Namely, a P = VC (fP ) is the zero set of the
polynomial fP . Note that fP is unique up to a multiplicative constant. Thus a P
is the smallest complex algebraic hypersurface which contains the boundary P .
A polytope is the convex hull of a nite subset of V . If P is a polytope, then
so is its dual P [37]. The boundary of P consists of nitely many facets F . These
are the faces F = v  dual to the vertices v of P . The algebraic boundary a P is
the arrangement of hyperplanes spanned by the facets of P . Its dening polynomial
fP is the product of the linear polynomials v, x 1.

i
i

212

main
2012/11/1
page 212
i

Chapter 5. Dualities

Example 5.7. A polytope known to everyone is the three-dimensional cube


$
%
P = conv {(1, 1, 1)} = {1 x, y, z 1}.
Figure 5.1 illustrates the familiar fact that its dual polytope is the octahedron
$
%
P = {1 a b c 1} = conv {e1 , e2 , e3 } .
Here ei denotes the ith unit vector. The eight vertices of P correspond to the facets
of P , and the six facets of P correspond to the vertices of P . The algebraic
boundary of the cube is described by a degree 6 polynomial
$
%
a P = VC (x2 1)(y 2 1)(z 2 1) .
The algebraic boundary of the octahedron is given by a degree 8 polynomial
.#
/
#
(1 a b c) (a b c + 1) .
a P = VC
Note that P and P are the unit balls for the norms L and L1 on R3 .

Recall that the Lp -norm on Rn is dened by
x
p = ( ni=1 |xi |p )1/p for
x Rn . The dual norm to the Lp -norm is the Lq -norm for 1p + 1q = 1, that is,

y
q = sup{y, x | x Rn ,
x
p 1}.
Geometrically, the unit balls for these norms are dual as convex bodies.
Example 5.8. Consider the case n = 2 and p = 4. Here the unit ball equals
P = { (x, y) R2 : x4 + y 4 1 }.
This planar convex set is shown in Figure 5.3. The ordinary boundary P of this
convex set is the real curve dened by the quartic polynomial x4 + y 4 = 1. In this
example, the ordinary boundary coincides with the algebraic boundary a P .

Figure 5.3. The unit balls for the L4 -norm and the L4/3 -norm are dual.
The curve on the left has degree 4, while its dual curve on the right has degree 12.

i
i

5.2. Ingredients

main
2012/11/1
page 213
i

213

The dual body is the unit ball for the L4/3 -norm on R2 :
P = {(a, b) R2 : |a|4/3 + |b|4/3 1} .
The algebraic boundary of P is an irreducible algebraic curve of degree 12,
$
%
(5.16)
a P = V a12 +3a8 b4 +3a4 b8 +b12 3a8 +21a4 b4 3b8 +3a4 +3b4 1 ,
which again coincides precisely with the (geometric) boundary P . This dual
polynomial is easily produced by the following one-line program in the computer
algebra system Macaulay2 due to Grayson and Stillman [13]:
R = QQ[x,y,a,b]; eliminate({x,y},ideal(x^4+y^4-1,x^3-a,y^3-b))

In Subsection 5.2.4 we shall introduce the general algebraic framework for performing such duality computations, not just for curves, but for arbitrary varieties.

5.2.3

Lagrange Duality in Optimization

We now come to a standard concept of duality in optimization theory. The treatment here is more general than duality in convex optimization, which was presented
in Chapter 2. Let us consider the following general nonlinear polynomial optimization problem:
minimize
f (x)
n
xR

subject to gi (x) 0, i = 1, . . . , m,
hj (x) = 0, j = 1, . . . , p.

(5.17)

Here the g1 , . . . , gm , h1 , . . . , hp and f are polynomials in R[x1 , . . . , xn ]. The Lagrangian associated with the optimization problem (5.17) is the function
p
L : Rn R m
+ R
(x, , )

Rn , 
p


f (x) + m
i=1 i gi (x) +
j=1 j hj (x).

The scalars i R+ and j R are the Lagrange multipliers for the constraints
gi (x) 0 and hj (x) = 0. The Lagrangian L(x, , ) can be interpreted as an augmented cost function with penalty terms for the constraints. For more information
on the above formulation see [5, Section 5.1].
One can show that the problem (5.17) is equivalent to nding
u = minn
xR

max

Rp and 0

L(x, , ).

The key observation here is that any positive evaluation of one of the polynomials
gi (x), or any nonzero evaluation of one of the polynomials hj (x), would render the
inner optimization problem unbounded.
The dual optimization problem to (5.17) is obtained by exchanging the order
of the two nested optimization subproblems in the above formulation:
v =

max

min


Rp and 0 xRn

L(x, , ) .
!
"
(,)

i
i

214

main
2012/11/1
page 214
i

Chapter 5. Dualities

The function (, ) is known as the Lagrange dual function to our problem. This
function is always concave, so the dual is always a convex optimization problem.
It follows from the denition of the dual function that (, ) u for all , .
Hence the optimal values satisfy the inequality
v  u .
If equality occurs, v  = u , then we say that strong duality holds. A necessary
condition for strong duality is i gi (x ) = 0 for all i = 1, . . . , m, where (x ,  ,  )
denote a primal and dual optimizer. We see this by evaluating the Lagrangian at
an optimizer and taking into account the fact that hj (x) = 0 for all feasible x.
Collecting all inequality and equality constraints in the primal and dual optimization problems yields the following optimality conditions.
Theorem 5.9 (KKT conditions). Let (x ,  ,  ) be primal and dual optimal
solutions with u = v  (strong duality). Then


x f

x

m


i



x gi

i=1

x

p




j x hj

j=1

x

= 0,

gi (x ) 0

for i = 1, . . . , m,

0
hj (x ) = 0

for i = 1, . . . , m,
for j = 1, . . . , p,

i gi (x ) = 0

for i = 1, . . . , m.

i


Complementary slackness:

(5.18)

For a derivation of this theorem see [5, Subsection 5.5.2]. Several comments
on the KKT conditions are in order. First, we note that complementary slackness
amounts to a case distinction between active (gi = 0) and inactive inequalities
(gi < 0). For any index i with gi (x ) = 0 we need i = 0, so the corresponding
inequality does not play a role in the gradient condition. On the other hand, if
gi (x ) = 0, then this can be treated as an equality constraint.
From an algebraic point of view, it is natural to relax the inequalities and to
focus on the KKT equations. These are the polynomial equations in (5.18):


x f

x

m

i=1



i x gi

x

p




j x hj

j=1

x

= 0,

h1 (x) = = hp (x) = 1 g1 (x) = = m gm (x) = 0.

(5.19)

If we wish to solve our optimization problem exactly, then we must compute the
algebraic variety in Rn Rm Rp that is dened by these equations.
In what follows we explore Lagrange duality and the KKT conditions in two
special cases, namely in optimizing a linear function over an algebraic variety (Section 5.3) and in semidenite programming (Section 5.5).

5.2.4

Projective Varieties and Their Duality

In algebraic geometry, it is customary to work over an algebraically closed eld,


such as the complex numbers C. All our varieties will be dened over a subeld K

i
i

5.2. Ingredients

main
2012/11/1
page 215
i

215

of the real numbers R, and their points have coordinates in C. It is also customary
to work in projective space Pn rather than ane space Cn , i.e., we work with
equivalence classes x x for all C\{0}, x Cn+1 \{0}. Points (x0 : x1 :
: xn ) in projective space Pn are lines through the origin in Cn+1 , and the usual
ane coordinates are obtained by dehomogenization with respect to x0 (i.e., setting
x0 = 1). All points with x0 = 0 are then considered as points at innity. We refer
to [8, Chapter 8] for an elementary introduction to projective algebraic geometry.
Let I = h1 , . . . , hp  be a homogeneous ideal in the ring K[x0 , x1 , . . . , xn ] of
polynomials in n + 1 unknowns with coecients in K. We write X = VC (I) for
its variety in the projective space Pn over C. The singular locus Sing(X) is a
proper subvariety of X. It is dened inside X
of the c c minors
%
$ by the vanishing
of the p(n+1) Jacobian matrix Jac(X) = hi /xj , where c = codim(X). See
[8, Section 9.6] for background on singularities and dimension. While the matrix
Jac(X) depends on our choice of ideal generators hi , the singular locus of X is
independent of that choice. Points in Sing(X) are called singular points of X. We
write Xreg = X\Sing(X) for the set of regular points in X. We say that the
projective variety X is smooth if Sing(X) = or, equivalently, if X = Xreg .
n
The dual projective space (Pn ) parametrizes:hyperplanes
n in P . A ;point
n
n
(u0 : u1 : : un ) (P ) represents the hyperplane x P | i=0 ui xi = 0 . We
say that u is tangent to X at a regular point x Xreg if x lies in that hyperplane
and its representing vector (u0 , u1 , . . . , un ) lies in the row space of the Jacobian
matrix Jac(X) at the point x.
We dene the conormal variety CN(X) of X to be the closure of the set
:
;
(x, u) Pn (Pn ) | x Xreg and u is tangent to X at x .
The projection of CN(X) onto the second factor is denoted X and is called the
dual variety. More precisely, the dual variety X is the closure of the set
:

;
u (Pn ) | the hyperplane u is tangent to X at some regular point .

In our denitions of conormal variety and dual variety, the word closure can mean
either Zariski closure or the classical strong closure over the complex numbers. Both
will lead to the same complex projective variety in the situations considered here.
Proposition 5.10. The conormal variety CN(X) has dimension n 1.
Proof sketch. We may assume that X is irreducible. Let c = codim(X). There
are nc degrees of freedom in picking a point x in Xreg . Once the regular point x
is xed, the possible tangent vectors u to X at x form a linear space of dimension
c1. Hence the dimension of CN(X) is (nc) + (c1) = n1.
Since the dual variety X is a linear projection of the conormal variety CN(X),
Proposition 5.10 implies that the dimension of X is at most n 1. We expect X
to have dimension n 1. In other words, regardless of the dimension of X, the dual
variety X is typically a hypersurface in the dual projective space (Pn ) . We shall
see many examples of such dual hypersurfaces throughout this chapter.

i
i

216

main
2012/11/1
page 216
i

Chapter 5. Dualities

To compute the dual X of a given projective variety X, we set up a system


of polynomial equations, and we eliminate some of the variables. This can be done
using Grobner bases [8, 13]. We rst illustrate this for a familiar example.
Example 5.11 (Example 5.8 continued). Fix coordinates (x : y : z) on P2 and
consider the ideal I = x4 + y 4 z 4 . Then X = V (I) is the projective version of
the quartic curve in Example 5.8. The dual curve X is the projective version of the
curve a P in (5.16). Hence, X is a curve of degree 12 in (P2 ) .
The equations used to compute X algebraically consist of the given quartic
4
x + y 4 z 4 together with the 2 2 minors of the augmented Jacobian matrix
&
Jac =

a
4x3

b
4y 3

'
c
.
4z 3

We write J  for the ideal generated by these four polynomials in Q[x, y, z, a, b, c].
We then replace J  by its saturation
J = J  : x, y, z .

(5.20)

This has the eect of removing an extraneous component of VC (J  ) that corresponds


to the origin (0, 0, 0) in (x, y, z)-space. We now eliminate the three unknowns x, y, z
from J, that is, we compute J Q[a, b, c]. This elimination ideal is the principal
ideal generated by the homogenization of the degree 12 polynomial in (5.16).
The steps we described in Example 5.11 to compute the degree 12 curve dual
to the given quartic can be extended to arbitrary instances. The role of the ideal
x, y, z in (5.20) is then played by the equations dening the singular locus of X.
This results in the following general algorithm for dualizing projective varieties.
Algorithm 5.1. Computing the dual variety X .
Require: The input is the homogeneous ideal I of a projective variety X = V (I).
Ensure: The output is the ideal Idual representing the dual variety X = V (Idual ).
1: Determine the codimension c of the variety X in Pn .
2: Generate the augmented Jacobian matrix
&
'
u0 u1 un
Jac(X) =
Jac(X)
Compute J  = I +  (c + 1) (c + 1) minors of Jac(X)  K[x, u].
4: Remove the singular locus by computing the saturation ideal
3:

J :=
5:
6:

%
J  :  c c minors of Jac(X)  .

Compute the desired ideal Idual = J K[u] by elimination.


return Dual variety X = V (Idual ).

i
i

5.2. Ingredients

main
2012/11/1
page 217
i

217

The steps in this algorithm can be executed either using exact arithmetic in a
computer algebra system, such as Macaulay2, or using oating point arithmetic in
the framework of numerical algebraic geometry. Such a numerical implementation
in the software Bertini [3] is currently being developed by Jonathan Hauenstein.
Remark 5.12. The ideal J in step 3 above is bihomogeneous in x and u, respectively. Its zero set in Pn (Pn ) is the conormal variety CN(X).
Theorem 5.13 (Biduality, [11, Theorem 1.1]).
variety X Pn satises

Every irreducible projective

(X ) = X.
Proof sketch. The main step in proving this important theorem is that the conormal variety is self-dual, in the sense that CN(X) = CN(X ). In this identity, the
roles of x Pn and u (Pn ) are swapped. It implies (X ) = X. A proof for the
self-duality of the conormal variety is found in [11, Subsection I.1.3].
Example 5.14. Suppose that X Pn is a general smooth hypersurface of degree d.
Then X is a hypersurface of degree d(d 1)n1 in (Pn ) . A concrete instance for
d = 4 and n = 2 was seen in Examples 5.8 and 5.11. When X is a hypersurface
that is not smooth, then the dual variety X is either a hypersurface of degree less
than d(d 1)n1 , or X is a variety of codimension at least 2.
Example 5.15. Let X be the variety of symmetric m m matrices of rank at
most r. Then X is the variety of symmetric m m matrices of rank at most m r
[11, Subsection I.1.4]. Here the conormal variety CN(X) consists of pairs of symmetric matrices A and B such that A B = 0. This conormal variety will be important
for our discussion of duality in semidenite programming in Section 5.5.
An important class of examples, arising from toric geometry, is featured in the
book by Gelfand, Kapranov, and Zelevinsky [11]. A projective toric variety XA in
Pn is specied by an integer matrix A of format r (n+1) and rank r with columns
a0 , a1 , . . . , an and whose row space
; We dene XA
: contains the vector (1, 1, . . . , 1).
as the closure in Pn of the set (ta0 : ta1 : : tan ) | t (C\{0})r .

The dual variety XA


is called the A-discriminant. It is usually a hypersurface,
in which case we identify the A-discriminant with the irreducible polynomial A

that vanishes on XA
. The A-discriminant is indeed a discriminant in the sense that
its vanishing characterizes Laurent polynomials
p(t) =

n


cj t1 1j t2 2j tar rj

j=0

with the property that the hypersurface {p(t) = 0} has a singular point in (C\{0})r .
In other words, we can dene (and compute) the A-discriminant as the unique

i
i

218

main
2012/11/1
page 218
i

Chapter 5. Dualities

irreducible polynomial A that vanishes on the hypersurface




p
p

r
= =
=0 .
XA = c (P ) | t (C\{0}) with p(t) =
t1
tr
Example 5.16. Let r = 2, n = 4, and x the matrix
&
'
4 3 2 1 0
A =
.
0 1 2 3 4
The associated toric variety is the rational normal curve
;
:
XA = (t41 : t31 t2 : t21 t22 : t1 t32 : t42 ) P4 | (t1 : t2 ) P1
= V (x0 x2 x21 , x0 x3 x1 x2 , x0 x4 x22 , x1 x3 x22 , x1 x4 x2 x3 , x2 x4 x23 ).

A hyperplane { 4j=0 cj xj = 0} is tangent to XA if and only if the binary form
p(t1 , t2 )

c0 t42 + c1 t1 t32 + c2 t21 t22 + c3 t31 t2 + c4 t41

has a linear factor of multiplicity 2. This is

c0 c1
c2
0 c0
c1

0
0
c0

1
c
2c
3c
A =
det
1
2
3

c4
0 c1 2c2

0
0
c1
0
0
0

controlled by the A-discriminant

c3
c4
0
0
c2
c3
c4
0

c1
c2
c3
c4

4c4 0
0
0
(5.21)
,
3c3 4c4 0
0

2c2 3c3 4c4 0


c1 2c2 3c3 4c4

given here in the form of the determinant of a Sylvester matrix, see [9, Section 3].

The sextic hypersurface XA


= V (A ) is the dual variety of the curve XA .
Exercises
Exercise 5.17. Let P be a convex body in R3 obtained by intersecting a ball and
a cube, where neither of these bodies contains the other. Describe the dual convex
body P . Can you draw pictures of P and P ?
Exercise 5.18. Determine the irreducible polynomial that vanishes on the L4/3 unit sphere in R3 . In other words, extend Example 5.8 from n = 2 to n = 3.
Exercise 5.19. Let X be the variety of symmetric m m matrices of rank at
most r. Determine the dimension of X and describe the singular locus Sing(X).
Exercise 5.20. Find an example of a surface X in P3 whose dual variety X is a
curve.
Exercise 5.21. Study the equation X Y = 0 when X and Y are unknown symmetric 44 matrices. This constraint translates into 16 bilinear equations in the

i
i

5.3. The Optimal Value Function

main
2012/11/1
page 219
i

219

20 unknown matrix entries. Decompose the algebraic variety dened by these 16


equations into its irreducible components. What is the dimension of each component? How do you know that it is irreducible?

5.3

The Optimal Value Function

A fundamental question concerning any optimization problem is how the output


depends on the input. The optimal solution and the optimal value of the problem
are functions of the parameters, and it is important to understand the nature of
these functions. For instance, for a linear programming problem,
maximize w, x subject to A x = b and x 0,

(5.22)

the optimal solution depends in a convex and piecewise linear manner on the cost
vector w and the right hand side b, and it is a piecewise rational function of the
entries of the matrix A. The area of mathematics which studies these functions
is geometric combinatorics, specically the theory of matroids for the dependence
on A, and the theory of regular polyhedral subdivisions for the dependence on w
and b. Exercise 5.30 at the end of this section asks for a further exploration.
If we replace (5.22) with the corresponding integer programming problem, where
the coordinates of x are required to be integers, then the dependence on w and
b becomes more subtle and nite Abelian groups enter the picture. The optimal
value function of an integer program has a certain arithmetic behavior, in addition
to the polyhedral structures which govern the parametric versions of the linear
programming problem.
For a second example, consider the following basic question in game theory:
Given a game, compute its Nash equilibria.

(5.23)

If there are only two players and one is interested in fully mixed Nash equilibria,
then this is a linear problem and in fact closely related to linear programming. On
the other hand, if the number of players is more than two, then the problem (5.23) is
universal in the sense of real algebraic geometry: Datta [10] showed that every real
algebraic variety is isomorphic to the set of Nash equilibria of some three-person
game. A corollary of her construction is that, if the Nash equilibria are discrete,
then their coordinates can be arbitrary algebraic functions of the given input data.
Our third motivating example concerns maximum likelihood estimation in
statistical models for discrete data. Here the optimization problem is as follows:
maximize p1 ()u1 p2 ()u2 pn ()un subject to ,

(5.24)

where is an open subset of Rm , the pi () are polynomial functions that sum to

one, and the ui are positive integers (these are the data). The optimal solution ,
which is the maximum likelihood estimator, depends algebraically on the data:
1 , . . . , un ).
(u1 , . . . , un )  (u

(5.25)

Catanese et al. [7] give a formula for the degree of this algebraic function under
certain hypotheses on the polynomials pi () which specify the statistical model.

i
i

i main

2012/11/10
page 220

220

Chapter 5. Dualities

In this section we study this issue for the polynomial optimization problem
(5.17). We shall assume throughout that the cost function f (x) is linear and that
there are no inequality constraints gi (x). The purpose of these restrictions is to
simplify the presentation and focus on the key ideas. Also, this is compatible with
Chapter 7, which oers an algebraic method for the important problem of computing
lower bounds on the optimal value function. Our analysis can be extended to the
general problem (5.17), and we discuss this briey at the end of this section.
To be precise, we consider the problem of optimizing a linear cost function
over a compact real algebraic variety X in Rn . This is written formally as follows:
c0 = min c, x
x

subject to

x X = {v Rn | h1 (v) = = hp (v) = 0} .

(5.26)

Here h1 , h2 , . . . , hp are xed polynomials in n unknowns x1 , . . . , xn . The expression


c, x = c1 x1 + +cn xn is a linear form whose coecients c1 , . . . , cn are unspecied
parameters. Our aim is to compute the optimal value function c0 . Thus, we regard
the optimal value c0 as a function Rn R of the parameters c1 , . . . , cn . We seek
to derive an exact symbolic representation of this algebraic function.
The hypothesis that X be compact has been included to ensure that the
optimal value function c0 is well-dened on all of Rn . Again, also this hypothesis
can be relaxed. We assume compactness here just for convenience.
Our problem is equivalent to that of describing the dual convex body P of
the convex hull P = conv(X), assuming that the latter contains the origin in its
interior. Indeed, P is precisely the set of points (c1 , . . . , cn ) at which the value of
the function c0 is less than or equal to 1. Hence the optimal value function of P
computes the gauge of the dual body P . A small instance of this was seen in (5.6).
Since our convex body P is a semialgebraic set, Tarskis theorem on quantier elimination in real algebraic geometry [2, 4] ensures that the dual body P is
also semialgebraic. This implies that the optimal value function c0 is an algebraic
function, i.e., there exists a polynomial (c0 , c1 , . . . , cn ) in n + 1 variables such that
(c0 , c1 , . . . , cn )

0.

(5.27)

Our aim is to compute such a polynomial of least possible degree. The input
consists of the polynomials h1 , . . . , hp that cut out the variety X. The degree of
in the unknown c0 is called the algebraic degree of the optimization problem
(5.17). This number is an intrinsic algebraic complexity measure for the problem of
optimizing a linear function over X. For instance, if c1 , . . . , cn are rational numbers,
then the algebraic degree indicates the degree of the eld extension K over Q that
contains the coordinates of the optimal solution.
We illustrate our discussion by computing the optimal value function and its
algebraic degree for the trigonometric space curve featured in [31, Section 1].
Example 5.22. Let X be the curve in R3 with parametric representation
$
%
cos(), sin(2), cos(3) .
(x1 , x2 , x3 ) =
In terms of equations, our curve can be written as X = V (h1 , h2 ), where
h1 = x21 x22 x1 x3

and h2 = x3 4x31 + 3x1 .

i
i

5.3. The Optimal Value Function

main
2012/11/1
page 221
i

221

The optimal value function for maximizing c1 x1 +c2 x2 +c3 x3 over X is given by
= (11664c43 ) c60 + (864c31 c33 + 1512c21 c22 c23 19440c21 c43
+576c1 c42 c3 1296c1 c22 c33 + 64c62 25272c22 c43 34992c63 ) c40
6 2
+ (16c1 c3 + 8c51 c22 c3 1152c51 c33 1920c41 c22 c23 + 8208c41 c43 724c31 c42 c3 + 144c31 c22 c33
+c41 c42 17280c31 c53 80c21 c62 2802c21 c42 c23 3456c21 c22 c43 + 3888c21 c63 1120c1 c62 c3
+540c1 c42 c33 + 55080c1 c22 c53 128c82 208c62 c23 +15417c42 c43 +15552c22 c63 +34992c83 ) c20
+ (16c81 c23 8c71 c22 c3 + 256c71 c33 c61 c42 + 328c61 c22 c23 1600c61 c43 + 114c51 c42 c3
2856c51 c22 c33 + 4608c51 c53 + 12c41 c62 1959c41 c42 c23 + 9192c41 c22 c43 4320c41 c63
528c31 c62 c3 + 7644c31 c42 c33 7704c31 c22 c53 6912c31 c73 48c21 c82 + 3592c21 c62 c23
4863c21 c42 c43 13608c21 c22 c63 + 15552c21 c83 + 800c1 c82 c3 400c1 c62 c33 10350c1 c42 c53
8 2
6 4
4 6
2 8
10
+16200c1 c22 c73 + 64c10
2 + 80c2 c3 1460c2 c3 + 135c2 c3 + 9720c2 c3 11664c3 ).

The optimal value function c0 is the algebraic function of c1 , c2 , c3 obtained by solving = 0 for the unknown c0 . Since c0 has degree 6 in , we see that the algebraic
degree of this optimization problem is 6. Note that there are no odd powers of c0
in . Thus, is a cubic polynomial in c20 , and this implies that we can write the
optimal value function c0 as an expression in radicals in c1 , c2 , c3 .
We now come to the main result in this section. It will explain what the
polynomial means and how it was computed in the previous example. For the
sake of simplicity, we shall rst assume that the given variety X is smooth, i.e.
X = Xreg , where the set Xreg denotes all regular points on X.
Theorem 5.23. Let X (Pn ) be the dual variety to the projective closure of a
real ane variety X in Rn . If X is irreducible, smooth, and compact in Rn , then X
is an irreducible hypersurface, and its dening polynomial equals (c0 , c1 , . . . , cn )
where represents the optimal value function as in (5.27) of the optimization problem (5.26). In particular, the algebraic degree of (5.26) is the degree in c0 of the
irreducible polynomial that vanishes on the dual hypersurface X .
Here the change of sign in the coordinate c0 is needed because the equation
c0 = c1 x1 + + cn xn for the objective function value in Rn becomes the homogenized equation (c0 )x0 + c1 x1 + + cn xn = 0 when we pass to Pn .
Proof. Since X is compact, for every cost vector c there exists an optimal solution
x . Our assumption that X is smooth ensures that x is a regular point of X, and
c lies in the span of the gradient vectors x hi x for i = 1, . . . , p. In other words,
the KKT conditions are necessary at the point x :
c =

p



i x hi x ,

i=1


hi (x ) = 0

for i = 1, 2, . . . , p.

The scalars 1 , . . . , p express c as a vector in the orthogonal complement of the


tangent space of X at x . In other words, the hyperplane {x Rn : c, x = c0 }
contains the tangent space of X at x . This means that the pair
$ 
%
x , (c0 : c1 : : cn )

i
i

222

main
2012/11/1
page 222
i

Chapter 5. Dualities

lies in the conormal variety CN(X) Pn (Pn ) of the projective closure of X. By


projection onto the second factor, we see that (c0 : c1 : : cn ) lies in the dual
variety X .
Our argument shows that the boundary of the dual body P is a subset of X .
Since that boundary is a semialgebraic set of dimension n 1, we conclude that
X is a hypersurface. If we write its dening equation as (c0 , c1 , . . . , cn ) = 0,
then the polynomial satises (5.27), and the statement about the algebraic degree
follows as well.
Theorem 5.23 tells us that the minimal polynomial which represents the
desired optimal value function c0 can be computed using Algorithm 5.1.
The KKT condition for the optimization problem (5.26) involves three sets of
variables, two of which are dual variables, to be carefully distinguished:
1. Primal variables x1 , . . . , xn to describe the set X of feasible solutions.
2. (Lagrange) dual variables 1 , . . . , p to parametrize the linear space of all
hyperplanes that are tangent to X at a xed point x .
3. (Projective) dual variables c0 , c1 , . . . , cn for the space of all hyperplanes. These
are coordinates for the dual variety X and the dual body P .
We can compute the equation that denes the dual hypersurface X by eliminating the rst two groups of variables x = (x1 , . . . , xn ) and = (1 , . . . , p ) from the
following system of polynomial equations:
c0 = c, x and h1 (x) = = hp (x) = 0 and c = 1 x h1 + + p x hp .
Example 5.24 (Example 5.8 continued). We consider (5.26) with n = 2, p = 1,
and h1 = x41 + x42 1. The KKT equations for maximizing the function
c0 = c1 x1 + c2 x2

(5.28)

over the TV screen curve X = V (h1 ) are


c1 = 1 4x31 ,

c2 = 1 4x32 ,

x41 + x42 = 1.

(5.29)

We eliminate the three unknowns x1 , x2 , 1 from the system of four polynomial


equations in (5.28) and (5.29). The result is the polynomial (c0 , c1 , c2 ) of degree
12 which expresses the optimal value c0 as an algebraic function of c1 and c2 . We
note that (1, c1 , c2 ) is precisely the polynomial in (5.16).
It is natural to ask what happens with Theorem 5.23 when X fails to be
smooth or compact or if there are additional inequality constraints. Let us rst
consider the case when X is no longer smooth, but still compact. Now, Xreg is a
proper (open, dense) subset of X. The optimal value function c0 for the problem
(5.26) is still perfectly well-dened on all of Rn , and it is still an algebraic function
of c1 , . . . , cn . However, the polynomial that represents c0 may now have more
factors than just the equation of the dual variety X .

i
i

5.3. The Optimal Value Function

main
2012/11/1
page 223
i

223

Example 5.25. Let n = 2 and p = 1 as in Example 5.24, but now we consider a


singular quartic. The bicuspid curve, shown in Figure 5.4, is dened by
h1 = (x21 1)(x1 1)2 + (x22 1)2 = 0.
The algebraic degree of optimizing a linear function c1 x1 + c2 x2 over X = V (h1 )
equals 8. The optimal value function c0 = c0 (c1 , c2 ) is represented by
% $
% $
$
=
c0 c1 + c2 c0 c1 c2 16c60 48(c21 + c22 )c40
%
+(24c21 c22 + 21c42 + 64c41 )c20 + (54c1 c42 +32c51 )c0 + 8c41 c22 3c21 c42 +11c62 .
The rst two linear factors correspond to the singular points of the bicuspid curve X,
and the larger factor of degree six represents the dual curve X .

Figure 5.4. The bicuspid curve in Example 5.25.


This example shows that, when X has singularities, it does not suce to just
dualize the variety X but we must also dualize the singular locus of X. This process
is recursive, and we must also consider the singular locus of the singular locus etc.
We believe that, in order to characterize the value function , it always suces to
dualize all irreducible varieties occurring in a Whitney stratication of X but this
has not been worked out yet. In our view, this topic requires more research, both
on the theoretical side and on the computational side.
The following result is valid for any variety X in Rn .
Corollary 5.26. If the dual variety of X is a hypersurface then its dening polynomial contributes a factor to the value function of the problem (5.26).
This result can be extended to an arbitrary optimization problem of the form
(5.17). We obtain a similar characterization of the optimal value c0 as a semialgebraic function of c1 , c2 , . . . , cn by eliminating all primal variables x1 , . . . , xn and
all dual (optimization) variables 1 , . . . , m , 1 , . . . , p from the KKT equations.

i
i

224

main
2012/11/1
page 224
i

Chapter 5. Dualities

Again, the optimal value function is represented by a unique square-free polynomial (c0 , c1 , . . . , cn ), and each factor of this polynomial is the dual hypersurface
Y of some variety Y that is obtained from X by setting gi (x) = 0 for some of
the inequality constraints, by recursively passing to singular loci. In Section 5.5 we
shall explore this for semidenite programming.
We close this section with a simple example involving A-discriminants.
Example 5.27. Consider the calculus exercise of minimizing a polynomial
q(t)

c 1 t + c 2 t2 + c 3 t3 + c 4 t4

of degree four over the real line R. Equivalently, we wish to minimize


c0 = c1 x1 + c2 x2 + c3 x3 + c4 x4
over the rational normal curve XA {x0 = 1} = V (x21 x2 , x31 x3 , x41 x4 ),
seen in Example 5.16. The optimal value function c0 is given by the equation
A (c0 , c1 , c2 , c3 , c4 ) = 0, where A is the discriminant in (5.21). Hence the algebraic degree of this optimization problem is equal to three.
Exercises
:
;
Exercise 5.28. Consider the plane curve Y = (sin(2), cos(3) : R obtained
from Example 5.22 by projection onto the last two coordinates. Determine the
optimal value function for maximizing a linear function over Y .
Exercise 5.29. Maximize 2x + 3y + 7z subject to x4 + y 4 + z 4 = 1. Can you
express the optimal solution and the optimal value in terms of radicals?
Exercise 5.30. What is the algebraic degree of nding the global minimum of a
polynomial function of degree 4 in two variables?
Exercise 5.31. Characterize the optimal value functions arising in linear programming.
Exercise 5.32. Let X denote the Veronese surface in ve-dimensional projective
space P5 that has the parametric representation (1 : x : y : x2 : xy : y 2 ). Compute the conormal variety CN(X) and the dual variety X . Verify the biduality
theorem (X ) = X for this example.

5.4

An Algebraic View of Convex Hulls

The problem of optimizing arbitrary linear functions over a given subset of Rn ,


discussed in the previous section, leads naturally to the geometric question of how to
represent the convex hull of that subset. In this section we explore this question from
the perspective of algebraic geometry. To be precise, we shall study the algebraic
boundary a P of the convex hull P = conv(X) of a compact real algebraic variety
X in Rn . Biduality of projective varieties (Theorem 5.13) will play an important

i
i

5.4. An Algebraic View of Convex Hulls

main
2012/11/1
page 225
i

225

role in understanding the structure of a P . The results to be presented are drawn


from [31, 32]. In Section 5.6 we shall briey discuss the alternative representation
of P as a projected spectrahedron, a topic much further elaborated in Chapter 7.
We begin with the seemingly easy example of a plane quartic curve.
Example 5.33. We consider the following smooth compact plane curve:
:
;
X = (x, y) R2 | 144x4 + 144y 4 225(x2 + y 2 ) + 350x2 y 2 + 81 = 0 . (5.30)
This curve is known as the Trott curve. It was rst constructed by Michael
Trott in [36], and is illustrated above in Figure 5.5. A classical result of algebraic
geometry states that a general quartic curve in the complex projective plane P2 has
28 bitangent lines, and the Trott curve X is an instance where all 28 lines are real
and have a coordinatization in terms of radicals over Q. Four of the 28 bitangents
form edges of conv(X). These special bitangents are


48050 + 434 9889


2
= 1.2177 . . . .
{(x, y) R | x y = }, where =
248
The boundary of conv(X) alternates between these four edges and pieces of the
curve X. The eight transition points have the oating point coordinates
( 0.37655 . . . , 0.84122 . . .) , ( 0.84122 . . . , 0.37655 . . .).
These coordinates lie in the eld Q() and we invite the reader to write them in the
form q1 + q2 , where qi Q. The Q-Zariski closure of the 4 edge lines of conv(X)
is a curve Y of degree 8. Its equation has two irreducible factors:
(992x4 3968x3y+5952x2y 2 3968xy 3+992y 41550x2 +3100xy1550y 2+117),
(992x4 +3968x3y+5952x2y 2 +3968xy 3+992y 41550x2 3100xy1550y 2+117).

Figure 5.5. A quartic curve in the plane can have up to 28 real bitangents.

i
i

226

main
2012/11/1
page 226
i

Chapter 5. Dualities

Each reduces over R to four parallel lines (cf. Figure 5.5), two of which contribute
to the boundary. The point of this example is to stress the role of the (arithmetic
of) bitangents in any exact description of the convex hull of a plane curve.
We now present a general formula for the algebraic boundary of the convex hull
of a compact variety X in Rn . The key observation is that the algebraic boundary
of P = conv(X) will consist of dierent types of components, resulting from planes
that are simultaneously tangent at k dierent points of X, for various values of the
integer k. For the Trott curve X in Example 5.33, the relevant integers were k = 1
and k = 2, and we demonstrated that the algebraic boundary of its convex hull P
is a reducible curve of degree 12:
a (P ) = X Y.

(5.31)

In the following denitions we regard X as a complex projective variety in Pn .


Let X [k] be the variety in the dual projective space (Pn ) which is the closure of
the set of all hyperplanes that are tangent to X at k regular points which span a
(k 1)-plane in Pn . This denition makes sense for k = 1, 2, . . . , n. Note that X [1]
coincides with the dual variety X , and X [2] parametrizes all hyperplanes that are
tangent to X at two distinct points. Typically, X [2] is an irreducible component of
the singular locus of X = X [1] . We have the following nested chain of projective
varieties in the dual space:
X [n] X [n1] X [2] X [1] (Pn ) .
We now dualize each of the varieties in this chain. The resulting varieties (X [k] )
live in the primal projective space Pn . For k = 1 we return to our original variety,
i.e., we have (X [1] ) = X by biduality (Theorem 5.13). In the following result we
assume that X is smooth as a complex variety in Pn , and we require one technical
hypothesis concerning tangency of hyperplanes.
Theorem 5.34 ([32, Theorem 1.1]). Let X be a smooth and compact real algebraic variety that anely spans Rn , and such that only nitely many hyperplanes are
tangent to X at innitely many points. The algebraic boundary a P of its convex
hull, P = conv(X), can be computed by biduality as follows:
a P

n
<

(X [k] ) .

(5.32)

k=1

Since a P is pure of codimension one, in the union we need only indices k


having the property that (X [k] ) is a hypersurface in Pn . As argued in [32], this
leads to the following lower bound on the relevant values to be considered:
=
>
n
k
.
(5.33)
dim(X) + 1
The formula (5.32) computes the algebraic boundary a P in the following sense. For
each relevant k we check whether (X [k] ) is a hypersurface, and if so, we determine

i
i

i main

2012/11/10
page 227

5.4. An Algebraic View of Convex Hulls

227

its irreducible components (over the eld K of interest). For each component we
then check, usually by means of numerical computations, whether it meets the
boundary P in a regular point. The irreducible hypersurfaces which survive this
test are precisely the components of a X.
Example 5.35. When X is a plane curve in R2 , (5.32) says that
a P X (X [2] ) .

(5.34)

Here X [2] is the set of points in (P2 ) that are dual to the bitangent lines of X, and
(X [2] ) is the union of those lines in P2 . If we work over K = Q and the curve X
is general enough then we expect equality to hold in (5.34). For special curves the
inclusion can be strict. This happens for the Trott curve (5.30) since Y is a proper
subset of (X [2] ) . Namely, Y consists of two of the six Q-components of (X [2] ) .
However, a small perturbation of the coecients in (5.30) leads to a curve X with
equality in (5.34), as the relevant Galois group acts transitively on the 28 points
in X [2] for general quartics X. See [28] for more details. We conclude that the
algebraic boundary of X over Q is a reducible curve of degree 32 = 28 + 4.
If we are given the variety X in terms of equations or in parametric form,
then we can compute equations for X [k] by an elimination process similar to the
computation of the dual variety X in Algorithm 5.1. However, expressing the
tangency condition at k dierent points requires a larger number of additional
variables (which need to be eliminated afterwards) and thus the computations are
quite involved. The subsequent step of dualizing X [k] to get the right-hand side of
(5.32) is even more forbidding. The resulting hypersurfaces (X [k] ) tend to have
high degree and their dening polynomials are very large when n 3.
The article [31] oers a detailed study of the case when X is a space curve in
R3 . Here the lower bound (5.33) tells us that a X (X [2] ) (X [3] ) . The surface
(X [2] ) is the edge surface of the curve X, and (X [3] ) is the union of all tritangent
planes of X. The following example illustrates these objects.
Example 5.36. We consider the trigonometric curve X in R3 parametrized by
x = cos(), y = cos(2), z = sin(3). This is an algebraic curve of degree six. Its
implicit representation equals X = V (h1 , h2 ), where
h1 = 2x2 y 1 and h2 = 4y 3 + 2z 2 3y 1.
The edge surface (X [2] ) has three irreducible components. Two of the components are the quadric V (h1 ) and the cubic V (h2 ). The third and most interesting
component of (X [2] ) is the surface of degree 16 with equation h3 =
419904x14 y 2 + 664848x12 y 4 419904x10 y 6 + 132192x8 y 8 20736x6 y 10 + 1296x4 y 12
46656x14 z 2 + 373248x12 y 2 z 2 69984x10 y 4 z 2 22464x8 y 6 z 2 +4320x6 y 8 z 2 +31104x12 z 4
+ 5184x10 y 2 z 4 + 4752x8 y 4 z 4 + 1728x10 z 6 + 699840x14 y 46656x12 y 3 902016x10 y 5
+694656x8 y 7 209088x6 y 9 1150848x10 y 3 z 2 +279936x8 y 5 z 2 +17280x6 y 7 z 2 4032x4 y 9 z 2
98496x10 yz 4 + 27072x4 y 11 1152x2 y 13 419904x12 yz 2 25920x8 y 3 z 4 4608x6 y 5 z 4

i
i

228

main
2012/11/1
page 228
i

Chapter 5. Dualities

1728x8 yz 6 291600x14 169128x12 y 2 256608x10 y 4 + 956880x8 y 6 618192x6 y 8


+ 148824x4 y 10 13120x2 y 12 + 256y 14 + 392688x12 z 2 + 671976x10 y 2 z 2 + 1454976x8 y 4 z 2
292608x6 y 6 z 2 4272x4 y 8 z 2 + 1016x2 y 10 z 2 116208x10 z 4 +135432x8 y 2 z 4 +18144x6 y 4 z 4
+ 1264x4 y 6 z 4 5616x8 z 6 + 504x6 y 2 z 6 1108080x12 y + 925344x10 y 3 + 215136x8 y 5
672192x6 y 7 + 331920x4 y 9 54240x2 y 11 + 2304y 13 +273456x10 yz 2 +282528x8 y 3 z 2
1185408x6 y 5 z 2 + 149376x4 y 7 z 2 368x2 y 9 z 2 32y 11 z 2 +273456x8 yz 4 67104x6 y 3 z 4
4704x4 y 5 z 4 64x2 y 7 z 4 + 4752x6 yz 6 32x4 y 3 z 6 + 747225x12 + 636660x10 y 2
908010x8 y 4 65340x6 y 6 + 291465x4 y 8 101712x2 y 10 + 8256y 12 818100x10 z 2
1405836x8 y 2 z 2 905634x6 y 4 z 2 + 583824x4 y 6 z 2 39318x2 y 8 z 2 + 368y 10 z 2 +193806x8 z 4
282996x6 y 2 z 4 + 15450x4 y 4 z 4 + 716x2 y 6 z 4 + y 8 z 4 + 6876x6 z 6 1140x4 y 2 z 6 + 2x2 y 4 z 6
+ x4 z 8 + 507384x10 y 809568x8 y 3 + 569592x6 y 5 27216x4 y 7 71648x2 y 9 + 13952y 11
+ 555768x8 yz 2 + 869040x6 y 3 z 2 + 688512x4 y 5 z 2 154128x2 y 7 z 2 +4416y 9 z 2 343224x6 yz 4
+ 127360x4 y 3 z 4 1656x2 y 5 z 4 64y 7 z 4 4536x4 yz 6 +48x2 y 3 z 6 775170x10 191808x8 y 2
+ 599022x6 y 4 245700x4 y 6 + 31608x2 y 8 + 7872y 10 + 765072x8 z 2 + 589788x6 y 2 z 2
66066x4 y 4 z 2 234252x2 y 6 z 2 + 16632y 8 z 2 173196x6 z 4 + 248928x4 y 2 z 4 26158x2 y 4 z 4
32y 6 z 4 3904x4 z 6 + 804x2 y 2 z 6 + 2y 4 z 6 2x2 z 8 + 5832x8 y + 98280x6 y 3 219456x4 y 5
+ 72072x2 y 7 8064y 9 724032x6 yz 2 515760x4 y 3 z 2 99672x2 y 5 z 2 + 29976y 7 z 2
+ 225048x4 yz 4 76216x2 y 3 z 4 + 1912y 5 z 4 + 1696x2 yz 6 32y 3 z 6 + 411345x8 66096x6 y 2
62532x4 y 4 +29388x2 y 6 11856y 8 365346x6 z 2 +19812x4 y 2 z 2 +104922x2 y 4 z 2 +24636y 6 z 2
+ 85090x4 z 4 104580x2 y 2 z 4 +8282y 4 z 4 +1014x2 z 6 144y 2 z 6 + z 8 39744x6 y+61992x4 y 3
+ 2304x2 y 5 + 576y 7 + 305328x4 yz 2 + 86640x2 y 3 z 2 + 960y 5 z 2 73480x2 yz 4 + 16024y 3 z 4
200yz 6 114966x6 + 24120x4 y 2 5958x2 y 4 + 6192y 6 + 85494x4 z 2 39696x2 y 2 z 2
11970y 4 z 2 21610x2 z 4 + 16780y 2 z 4 94z 6 3672x4 y 11024x2 y 3 + 272y 5
46904x2 yz 2 4632y 3 z 2 + 9368yz 4 + 15246x4 84x2 y 2 1908y 4 6892x2 z 2
+ 2204y 2 z 2 + 2215z 4 + 3216x2 y + 168y 3 + 904yz 2 664x2 + 292y 2 282z 2 96y + 9.

The boundary of P = conv(X) contains patches from all three


surfaces V (h1 ),
V(h2 ), and V (h3 ). There are also two triangles, with vertices at ( 3/2, 1/2, 1),
( 3/2, 1/2, 1), and (0, 1, 1). They span two of the tritangent planes of X,
namely, z = 1 and z = 1. The union of all tritangent planes equals (X [3] ) . Only
one triangle is visible in Figure 5.6. It is colored yellow. The curved blue patch
adjacent to one of the edges of the triangle is given by the cubic h2 , while the other
two edges of the triangle lie in the degree 16 surface V (h3 ). The curve X has two
singular points at (x, y, z) = (1/2, 1/2, 0). Around these two singular points, the
boundary is given by four alternating patches from the quadric V (h1 ) highlighted
in red and the degree 16 surface V (h3 ) in green. We conclude that the edge surface
(X [2] ) = V (h1 h2 h3 ) is reducible of degree 21 = 2 + 3 + 16, and the algebraic
boundary a (P ) is a reducible surface of degree 23 = 2 + 21.
In our next example we examine the convex hull of space curves of degree four
that are obtained as the intersection of two quadratic surfaces in R3 .

i
i

5.4. An Algebraic View of Convex Hulls

main
2012/11/1
page 229
i

229

Figure 5.6. The convex hull of the curve (cos(), cos(2), sin(3)) in R3 .

Example 5.37. Let X = V (h1 , h2 ) be the intersection of two quadratic surfaces in


3-space. We assume that X has no singularities in P3 . Then X is a curve of genus
one. According to recent work of Scheiderer [35], the convex body P = conv(X)
can be represented exactly using Lasserre relaxations, a topic we shall return to
when discussing projected spectrahedron in Section 5.6. If we are willing to work
over R, then P is in fact a spectrahedron, as shown in [31, Example 2.3]. We here
derive that representation for a concrete example.
Lazard et al. [23, Section 8.2] examine the curve X cut out by the two quadrics
h1 = x2 + y 2 + z 2 1 and h2 = 19x2 + 22y 2 + 21z 2 20.
Figure 5.7 shows the two components of X on the unit sphere V (h1 ).
The dual variety X is a surface of degree 8 in (P3 ) . The singular locus of

X contains the curve X [2] which is the union of four quadratic curves. The duals
of these four plane curves are the singular quadratic surfaces dened by
h3 = x2 2y 2 z 2 , h4 = 2x2 y 2 1, h5 = 3y 2 + 2z 2 1, h6 = 3x2 + z 2 2.
The edge surface of X is the union of these four quadrics:
(X [2] ) = V (h3 ) V (h4 ) V (h5 ) V (h6 ).
The algebraic boundary of P consists of the last two among these quadrics:
a P = V (h5 ) V (h6 ).

i
i

230

main
2012/11/1
page 230
i

Chapter 5. Dualities

Figure 5.7. The curve on the unit sphere discussed in Examples 5.37 and 5.61.
These two quadrics are convex. From this we derive a representation of P as a
spectrahedron by applying Schur complements to the quadrics h5 and h6 :





3
(x, y, z) R

1+
3y
2z

0
0

2z

1 3y
0
0

0
0
2+z
3x

0
 0 .

3x

2z

An extension of this example is suggested in Exercise 5.42 below.


Exercises
Exercise 5.38. Give an example of a compact algebraic curve of degree six in the
plane R2 whose convex hull has more than 8 straight edges in its boundary. It is an
interesting problem to determine the maximal number (d) of edges in the convex
hull of any curve of degree d in R2 . For instance, (6) 9.
Exercise 5.39. If X is a surface in 3-space, then its algebraic boundary consists
of three surfaces (X [1] ) , (X [2] ) , and (X [3] ) . Describe the geometric meaning of
these surfaces. Show that all three of them are needed for some X.
Exercise 5.40. Let P be the convex hull of the union of two circles in threedimensional space, where the rst circle is dened by x2 + y 2 = 5/4 and z = 0, and
the second circle is dened by x2 + z 2 = 1 and y = 0. Compute the irreducible
polynomial in x, y, z that vanishes on the boundary of P .
Exercise 5.41. Describe an algorithm for computing the variety X [k] from the
equations of X. Apply your algorithm to the curve X = V (h1 , h2 ) in Example 5.22.

i
i

5.5. Spectrahedra and Semidenite Programming

main
2012/11/1
page 231
i

231

Exercise 5.42. Intersect the unit sphere in 3-space with a general quadratic
surface. Show that the convex hull of the resulting curve is a spectrahedron.

5.5

Spectrahedra and Semidenite Programming

Spectrahedra and semidenite programming (SDP) have already surfaced numerous


times throughout this book. In this section we take a systematic look at these topics
from the point of view of duality. We write S n for the space of real symmetric
n+1
n
nn-matrices and S+
for the cone of positive semidenite matrices in S n $ R( 2 ) .
This cone is self-dual with respect to the inner product U, V  = Tr(U V ).
n
A spectrahedron is the intersection of the cone S+
with an ane subspace
K = C + Span(A1 , A2 , . . . , Am ) .

!
"
W

Here C, A1 , . . . , Am are symmetric n n matrices, and we assume that W is a linear


subspace of dimension m in S n . Recall from Chapter 2 that we may also think of
a spectrahedron as a set in the Euclidean space as follows:



m


m
n
P =
xR C
xi Ai  0
$ K S+
.
(5.35)
i=1

We shall assume that C is positive denite or, equivalently, that 0 int(P ). The
dual body to our spectrahedron is written in the coordinates on Rm as
P = { y Rm | y, x 1 for all x P } .
%
$
We can express P as a projection of the n+1
2 -dimensional spectrahedron
n
Q = { U S+
| U, C 1 }.

(5.36)

While Q is not literally a spectrahedron when regarded as a set of nn matrices, we


will identify it with the spectrahedron consisting of all symmetric positive semidef = ( U 0 ) that satisfy the equation U, C + x = 1.
inite (n + 1) (n + 1) matrices U
0 x
To write P as a projection of the spectrahedron Q, we consider the linear
map
$n+1%dual to the inclusion of the linearn subspace W = Span(A1 , A2 , . . . , Am ) in the
2 -dimensional real vector space S :
W : S n S n /W $ Rm
$
%
U  U, A1 , U, A2 , . . . , U, Am  .
Remark 5.43. The convex body P dual to the spectrahedron P is anely isomorphic to the closure of the image of the spectrahedron Q in (5.36) under the linear
map W , i.e., P $ W (Q).
This result in Remark 5.43 is due to Ramana and Goldman [30]. In summary,
while the dual to a spectrahedron is generally not a spectrahedron, it is always a
projected spectrahedron. We shall return to this issue in Theorem 5.57.

i
i

232

main
2012/11/1
page 232
i

Chapter 5. Dualities

Figure 5.8. The elliptope P = E3 and its dual convex body P .


Example 5.44. The elliptope En is the spectrahedron consisting of all correlation
matrices of size n; see [20]. These are the positive semidenite symmetric nn
matrices whose diagonal entries are 1. We consider the case n = 3:

1 x y


E3 = (x, y, z) R3 x 1 z  0 .
(5.37)

y z 1
This spectrahedron of dimension m = 3 is shown on the left in Figure 5.8. The
algebraic boundary of E3 is the cubic surface X dened by the vanishing of the 3 3
determinant in (5.37). That surface has four isolated singular points
Xsing = {(1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 1, 1)}.
The six edges of the tetrahedron conv(Xsing ) are edges of the elliptope E3 . The dual
body, shown on the right of Figure 5.8, is the projected spectrahedron


u a
b


c  0 .
E3 = (a, b, c) R3 u, v R : a v
(5.38)

b c 2uv
The algebraic boundary of E3 can be computed by the following method. We form
the ideal generated by the determinant in (5.38) and its derivatives with respect to
u and v, and we eliminate u, v. This results in the polynomial
(a2 b2 + b2 c2 + a2 c2 + 2abc)(a + b + c 1)(a b c 1)(a b + c + 1)(a + b c + 1).
The rst factor is the equation of Steiners quartic surface X , which is dual to
Cayleys cubic surface X = a E3 . The four linear factors represent the arrangement
(Xsing ) of the four planes dual to the four singular points.

i
i

5.5. Spectrahedra and Semidenite Programming

main
2012/11/1
page 233
i

233

Thus the algebraic boundary of the dual body E3 is the reducible surface
a E3 = X (Xsing )

(P3 ) .

(5.39)

We note that E3 is not a spectrahedron as it fails to be a basic semialgebraic set;


i.e., it cannot be described by a conjunction of polynomial inequalities gi 0. Since
the algebraic boundary of E3 is uniquely dened by (the irreducible polynomial)
equation (a, b, c) = 0, such a description would contain the inequality (a, b, c) 0.
This is not possible since the Steiner surface has a regular point in the interior of
the dual convex body E3 .
Semidenite programming (SDP) is the branch of convex optimization that is
concerned with maximizing a linear function b over a spectrahedron:
p := max b, x subject to x P.
x

(5.40)

Here P is as in (5.35). As the semideniteness of a matrix is equivalent to the simultaneous nonnegativity of its principal minors, SDP is an instance of the polynomial
optimization problem (5.17). Lagrange duality theory applies here by [5, Section 5].
We shall derive the optimization problem dual to (5.40) from
d := minimize subject to

1
b P .

(5.41)

Since we assumed 0 int(P ), strong duality holds and we have p = d .


The fact that P is a projected spectrahedron implies that the dual optimization problem is again a semidenite optimization problem. In light of Remark 5.43,
the condition 1 b P can be expressed as follows:
U : U  0 , C, U  1 and bi = Ai , U  for i = 1, 2, . . . , m.
Since the optimal value of (5.41) is attained at the boundary of P , we can here
replace the condition C, U  1 with C, U  = 1. Indeed, assume that C, U   =
< 1 at the optimum, then we could scale U  by 1 and the optimal cost 
by the factor and obtain a feasible solution with a smaller cost function value,
a contradiction.
This is in fact what was done to obtain (5.38). If we now set Y = U , then
(5.41) translates into
d := minimize C, Y 
Y

subject to

Ai , Y  = bi for i = 1, . . . , m
and Y

(5.42)

n
S+
.

We recall that W = Span(A1 , A2 , . . . , Am ) and we x any matrix B S n with


Ai , B = bi for i = 1, . . . , m. Then (5.42) can be written as follows:
n
d := minimize C, Y  subject to Y (B + W ) S+
.
Y

(5.43)

i
i

234

main
2012/11/1
page 234
i

Chapter 5. Dualities

The following reformulation of (5.40) highlights the symmetry between the primal
and dual formulations of our SDP problem:
n
p := max B, C X subject to X (C + W) S+
X

(5.44)

Then the following variant of the KKT conditions holds.


Theorem 5.45 ([5, Section 5.9.2]). If both the primal problem (5.44) and its
dual (5.43) are strictly feasible, then the KKT conditions take the following form:
n
,
X (C + W) S+
n
Y (B + W ) S+
,
X Y = 0 (complementary slackness).

These conditions characterize all the pairs (X, Y ) of optimal solutions.


This theorem can be related to the general optimality conditions (5.18) by
regarding the entries of Y S n as the
m(Lagrangian) dual variables to the positive semidenite constraint X = C i=1 xi Ai  0. The three KKT conditions
in Theorem 5.45 are both necessary and sucient for optimality. This holds because SDP is a convex problem and every local optimum is also a global optimal
solution.
In order to study algebraic and geometric properties of SDP, we will relax the
n
and focus only on the KKT equations
conic inequalities X, Y S+
X C + W , Y B + W , and X Y = 0.

(5.45)

Given the data B, C, and W, our problem is to solve the polynomial equations
(5.45). The theorem ensures that, among its solutions (X, Y ), there is precisely one
pair of positive semidenite matrices. That pair is the one desired in SDP.
Example 5.46. Consider the problem of minimizing a linear function Y  C, Y 
over the set of all correlation matrices Y , that is, over the elliptope En of Example
5.44. Here m = n, B is the identity matrix, C is any symmetric matrix, W is
the space of all diagonal matrices, and W consists of matrices with zero diagonal.
n
This problem is dual to maximizing the trace of C X over all matrices X S+

such that C X is diagonal. Equivalently, we seek to nd the minimum trace t of
any positive semidenite matrix that agrees with C in its o-diagonal entries.
For n = 4, the KKT equations (5.45) can be written in the form

x1 c12 c13 c14


1 y12 y13 y14
c12 x2 c23 c24 y12 1 y23 y24

XY =
(5.46)
c13 c23 x3 c34 y13 y23 1 y34 = 0.
c14 c24 c34 x4
y14 y24 y34 1
This is a system of 16 quadratic equations in 10 unknowns. For general values of
the 6 parameters cij , these equations have 14 solutions. Eight of these solutions

i
i

5.5. Spectrahedra and Semidenite Programming

main
2012/11/1
page 235
i

235

have rank(X) = 3 and rank(Y ) = 1 and they are dened over Q(cij ). The other
six solutions form an irreducible variety over Q(cij ) and they satisfy rank(X) =
rank(Y ) = 2. This case distinction reects the boundary structure of the dual body
to the six-dimensional elliptope E4 :
a E4

{rank(Y ) 2} {rank(Y ) = 1} .

(5.47)

Indeed, the boundary of E4 is the quartic hypersurface {rank(Y ) 3}, its singular
locus is the degree 10 threefold {rank(Y ) 2}, and, nally, the singular locus of
that threefold consists of eight matrices of rank 1:
;
:
{rank(Y ) = 1} = (u1 , u2 , u3 , u4 )T (u1 , u2 , u3 , u4 ) : ui {1, +1} .
The last two strata are dual to the hypersurfaces in (5.47). The second component
in (5.47) consists of eight hyperplanes, while the rst component is irreducible of
degree 18. The corresponding projective hypersurface is dened by an irreducible
homogeneous polynomial of degree 18 in seven unknowns c12 , c13 , c14 , c23 , c24 , c34 , t .
That polynomial has degree 6 in the special unknown t . Hence, the algebraic degree
of our SDP, i.e., the degree of the optimal value function, is 6 when rank(Y ) = 2.
We note that {rank(Y ) 3} does not appear as a component in the union
(5.47) since it is not a hypersurface. Nevertheless, it is still a subset of a E4 .
In algebraic geometry, it is natural to regard the matrix pairs (X, Y ) as points
in the product of projective spaces P(S n ) P(S n ) . This has the advantage that
solutions of (5.45) are invariant under scaling, i.e., whenever (X, Y ) is a solution,
then so is (X, Y ) for any nonzero , R. In that setting, there are no worries
about complications due to solutions at innity.
For the algebraic formulation we assume that, without loss of generality,
b1 = 1,

b2 = 0,

b3 = 0, . . . , bm = 0.

This means that A1 , X = 1 plays the role of the homogenizing variable. Our SDP
instance is specied by two linear subspaces of symmetric matrices:
L = Span(A2 , A3 , . . . , Am ) U = Span(C, A1 , A2 , . . . , Am ) S n .
Note that we have the following identications:
RC + W = U

and RB + W = RB + (L A
1 ) = L .

With the linear spaces L U, we write the homogeneous KKT equations as


X U, Y L and X Y = 0.

(5.48)

Here is an abstract denition of SDP that might appeal to some of our algebraically inclined readers: Given two nested linear subspaces L U S n with
dim(U/L) = 2, locate the unique semidenite point in the variety (5.48).

i
i

236

main
2012/11/1
page 236
i

Chapter 5. Dualities

For instance, in Example 5.46 the space L consists of traceless diagonal matrices and U/L is spanned by the unit matrix B and one o-diagonal matrix C. We
seek to solve the matrix equation X Y = 0 where the diagonal entries of X are
constant and the o-diagonal entries of Y are proportional to C.
The formulation (5.48) suggests that we study the variety {XY = 0} for pairs
of symmetric matrices X and Y . In [27, Equation (3.9)] it was shown that this
variety has the following decomposition into irreducible components:
{XY = 0} =

n1
<

{XY = 0}r

P(S n ) P(S n ) .

r=1

Here {XY = 0}r denotes the subvariety consisting of pairs (X, Y ) where rank(X)
r and rank(Y ) nr. This is irreducible because, by Example 5.15, it is the conormal variety of the variety of symmetric matrices of rank r. See also Exercise 5.19
at the end of Section 5.2.
The KKT equations describe sections of these conormal varieties:
$
%
{XY = 0}r P(U) P(L ) .
(5.49)
All solutions of a semidenite optimization problem (and thus also the boundary of
a spectrahedron and its dual) can be characterized by rank conditions. The main
result in [27] describes the case when the section in (5.49) is generic:
Theorem 5.47 ([27, Theorem 7]). For generic subspaces L U S n with
dim(L) = m 1 and dim(U) = m + 1, the variety (5.49) is empty unless
&
'
&
' &
'
nr+1
r+1
n+1
m and

m.
(5.50)
2
2
2
In that case, the variety (5.49) is reduced, nonempty, and zero-dimensional and at
each point the rank of X and Y is r and n r, respectively (strict complementarity).
The cardinality of this variety depends only on m, n, and r.
The generic choice of nested subspaces L U corresponds to the assumption
that our matrices A1 , A2 , . . . , Am , B, C lie in a certain dense open subset in the space
of all SDP instances. The inequalities (5.50) are known as Patakis inequalities.
If m and n are xed, then they give a lower bound and an upper bound for the
possible ranks r of the optimal matrix of a generic SDP instance. The variety
(5.49) represents all complex solutions of the KKT equations for such a generic
SDP instance. Its cardinality, denoted (m, n, r), is known as the algebraic degree
of SDP.
Corollary 5.48. Consider the variety of symmetric nn matrices of rank r that
lie in the generic m-dimensional linear subspace P(U) of P(S n ). Its dual variety is
a hypersurface if and only if Patakis inequalities (5.50) hold, and the degree of that
hypersurface is (m, n, r), the algebraic degree of SDP.

i
i

5.5. Spectrahedra and Semidenite Programming

main
2012/11/1
page 237
i

237

Proof. The genericity of U ensures that {XY = 0}r ( P(U) P(U) ) is the
conormal variety of the given variety. We obtain its dual by projection onto the
second factor P(U) = P(S n /U ). The degree of the dual hypersurface is found by
intersecting with a generic line. The line we take is P(L /U ). That intersection
corresponds to the second factor P(L ) in (5.49).
We note that the symmetry in the equations (5.48) implies the duality
&&
'
'
$
%
n+1
m, n, r
=
m, n, n r ,
2
rst shown in [27, Proposition 9]. See also [27, Table 2]. Bothmer and Ranestad
[12] derived an explicit combinatorial formula for the algebraic degree of SDP. Their
result implies that (m, n, r) is a polynomial of degree m in n when n r is xed.
For example, in addition to [27, Theorem 11], we have
(6, n, n 2)

%
1$
11n6 81n5 + 185n4 75n3 196n2 + 156n .
72

The algebraic degree of SDP is important because it represents a universal


upper bound on the intrinsic algebraic complexity of optimizing a linear function
over any m-dimensional spectrahedron of nn matrices. The algebraic degree can
be much smaller for families of instances involving special matrices Ai , B, or C.
Example 5.49. Fix n = 4 and m = 6 = dim(E4 ). Patakis inequalities (5.50) state
that the rank of the optimal matrix is r = 1 or r = 2, and this was indeed observed
in Example 5.46. For r = 2 we had found the algebraic degree six when solving
(5.46). However, here B is the identity matrix and A1 , A2 , A3 , A4 are diagonal.
When these are replaced by generic symmetric matrices, then the algebraic degree
jumps from six to (6, 4, 2) = 30.
We now state a result that elucidates the decompositions in (5.39) and (5.47).
Theorem 5.50. If the matrices A1 , . . . , Am and C in the denition (5.35) of the
spectrahedron P are suciently generic, then the algebraic boundary of the dual
body P is the following union of dual hypersurfaces:
<
a P

{X L | rank(X) r} .
(5.51)
r as in (5.50)

Proof. Let Y be any irreducible component of a P (Pm ) . Then Y P is a


semialgebraic subset of codimension 1 in P . We consider a general point in that
set. The corresponding hyperplane H in the primal Rm supports the spectrahedron P at a unique point Z. Then r = rank(Z) satises Patakis inequalities, by
Theorem 5.47. Moreover, the genericity in our choices of A1 , . . . , Am , C, H ensure
that Z is a regular point in {X L | rank(X) r}. Bertinis theorem ensures that
this determinantal variety is irreducible and that its singular locus consists only
of matrices of rank < r. This implies that {X L | rank(X) r} is the Zariski

i
i

238

main
2012/11/1
page 238
i

Chapter 5. Dualities

closure of {X P | rank(X) = r} and hence also of a neighborhood of Z in that


rank stratum. Likewise, Y is the Zariski closure in (Pm ) of Y P . An open
dense subset of points in Y P corresponds to hyperplanes that support P at a
rank r matrix. We conclude Y = {X L | rank(X) r}. Biduality completes the
proof.
Theorem 5.50 is similar to Theorem 5.34 in that it characterizes the algebraic
boundary in terms of dual hypersurfaces. Just as in Section 5.4, we can apply this
result to compute a P . For each rank r in the Pataki range (5.50), we need to
check whether the corresponding dual hypersurface meets the boundary of P . The
indices r which survive this test determine a P .
When the data that specify the spectrahedron P are not generic but special
then the computation of a P is more subtle and we know of no formula as simple
as (5.51). This issue certainly deserves further research.
We close this section with an interesting three-dimensional example.
Example 5.51. The cyclohexatope is a spectrahedron with m = 3 and n = 5 that
arises in the study of chemical conformations
[14]. Consider the following Sch
onberg

matrix for the pairwise distances Dij among six carbon atoms:

2D12
D12 +D13 D23

D12 +D14 D24

D12 +D15 D25


D12 +D16 D26

D12 +D13 D23


2D13
D13 +D14 D34
D13 +D15 D35
D13 +D16 D36

D12 +D14 D24


D13 +D14 D34
2D14
D14 +D15 D45
D14 +D16 D46

D12 +D15 D25


D13 +D15 D35
D14 +D15 D45
2D15
D15 +D56 D56

D12 +D16 D26


D13 +D16 D36

D14 +D16 D46


.
D15 +D56 D56
2D16

The Dij are the squared distances among six points in R3 if and only if this matrix
is positive semidenite of rank 3. The points represent the carbon atoms in
cyclohexane C6 H12 if and only if Di,i+1 = 1 and Di,i+2 = 8/3 for all indices i,
understood cyclically. The three diagonal distances x = D14 , y = D25 , and
z = D36 are unknowns, so, for cyclohexane conformations, the above Sch
onberg
matrix equals

2
8/3
x 5/3 11/3 y
2/3
8/3
2
5/3 + x
8/3
11/3 z

5/3
5/3
+
x
16/3
x
+
5/3
x 5/3
C6 (x, y, z) =

.
11/3 y
8/3
x + 5/3
2y
8/3
2/3
11/3 z x 5/3
8/3
16/3
The cyclohexatope Cyc6 is the spectrahedron in R3 dened by C6 (x, y, z)  0. Its
algebraic boundary decomposes as a Cyc6 = V (f ) V (g), where
f
g

=
=

27xyz 75x 75y 75z 250


and
3xy + 3xz + 3yz 22x 22y 22z + 121.

The conformation space of cyclohexane is the real algebraic variety


:
;
(x, y, z) Cyc6 | rank(C6 (x, y, z)) 3
= V (f, g) V (g)sing .

i
i

5.6. Projected Spectrahedra

main
2012/11/1
page 239
i

239

The rst component is the closed curve of all chair $conformations.


The second
%
11 11
.
These
are well,
,
component is the boat conformation point (x, y, z) = 11
3
3
3
known to chemists [14]. Remarkably, the cyclohexatope coincides with the convex
hull of these two components. This spectrahedron is another example of a convex
hull of a space curve, now with an isolated point. SDP over the cyclohexatope
means computing the conformation which minimizes a linear function in the squared
distances Dij .
Exercises
Exercise 5.52. Maximize the sum of the o-diagonal entries over all positive
semidenite 44 matrices with trace 1. Formulate this as a pair of primal and dual
problems and solve the KKT equations. For both primal and dual, determine the
set of all optimal solutions, and verify Theorem 5.45.
Exercise 5.53. A result in classical algebraic geometry states that every smooth
cubic surface contains 27 lines and is obtained by blowing up P2 at six points. Are
these statements still true for Cayleys cubic surface X = a E3 as in Example 5.44?
Exercise 5.54. Determine the positive integer (5, 7, 5). Explain in your own
words what this number means for SDP on 77 matrices.
Exercise 5.55. Compute the right-hand side of (5.51) for the spectrahedron P in
(5.2).
Exercise 5.56. The analytic center of a spectrahedron is the symmetric matrix in
its interior that maximizes the determinant function. Compute the analytic center
of the three-dimensional spectrahedron

x
z+1 x+y+z
z+1
y
x y  0.
x+y+z xy 1xy
Determine the values x , y  , and z  for the optimal matrix as oating point numbers.
Make sure that you have at least twenty accurate digits. If this is possible, write
x , y  , and z  in terms of radicals over Q.

5.6

Projected Spectrahedra

A projected spectrahedron is the image of a spectrahedron under a linear map. The


class of projected spectrahedra is much larger than the class of spectrahedra. In
fact, it has even been conjectured that every convex basic semialgebraic set in Rn
is a projected spectrahedron [17]. See Chapter 6 for a detailed discussion.
Our point of departure is the result that the convex body dual to a projected
spectrahedron is again a projected spectrahedron [15, Proposition 3.3].
Theorem 5.57. The class of projected spectrahedra is closed under duality.

i
i

240

main
2012/11/1
page 240
i

Chapter 5. Dualities

Proof (Construction). A projected spectrahedron can be written in the form

p
m




xi Ai +
y j Bj  0 .
P =
x Rm y Rp with C +

i=1

j=1

An expression for the dual body P is obtained by the following variant of the
construction in Remark 5.43. We consider the same linear map as before:
n
: S+
Rm , U  (A1 , U , . . . , Am , U ).

We apply this linear map to the spectrahedron


:
;
n
Q = U S+
| C, U  1 and B1 , U  = = Bp , U  = 0 .
The closure of the projected spectrahedron (Q) equals the dual convex body P .
This closure is itself a projected spectrahedron, e.g., by using the extended
LagrangeSlater dual formulation proposed by Ramana [29].
We now consider the following problem: Given a real variety X Rn , nd a
representation of its convex hull conv(X) as a projected spectrahedron. A systematic approach to computing such representations was introduced by Lasserre [21]
and further developed by Gouveia et al. [16]. It is based on the relaxation of
nonnegative polynomial functions on X as sums of squares in the coordinate ring
R[X]. This approach is known as moment relaxation (also Lasserre relaxation; see
Chapter 7) in light of the duality between positive polynomials and moments of
measures.
We shall begin by exploring these ideas for homogeneous polynomials of even
degree 2d that
are nonnegative
on Rn . These form a cone in a real vector space
%
$2d+n1
. Inside that cone lies the smaller sos cone of polynomials p
of dimension
2d
that are sums of squares of polynomials of degree d:
p

q12 + q22 + + qr2 .

(5.52)

By Hilberts theorem [25, Theorem 1.2.6], this inclusion of convex cones is strict
unless (n, 2d) equals (1, 2d) or (n, 2) or (2, 4). The sos cone is easily seen to be a
projected spectrahedron. Indeed, consider an unknown symmetric matrix Q S N
and write p = v T Qv, where v is the vector of all N monomials of degree d. The
matrix Q is positive semidenite if it has a Cholesky factorization Q = C T C. The
resulting identity p = (Cv)T (Cv) can be rewritten as (5.52). Hence the sos cone is
N
under the linear map Q  v T Qv.
the image of S+
The boundaries of our two cones and their duals have been described in detail
already in Chapter 4, and here we want only to briey make some connections to
our previous discussion about dualities. In the work of Nie [26] the structure of
these boundaries was approached by computations with discriminants, encountered
at the end of Section 5.2.4.
Proposition 5.58 (Theorem 4.1 in [26]). The algebraic boundary of the cone
of homogeneous polynomials p of degree 2d that are nonnegative on Rn is given

i
i

5.6. Projected Spectrahedra

main
2012/11/1
page 241
i

241

by the discriminant of a polynomial p whose coecients are indeterminates. This


discriminant is the irreducible hypersurface dual to the Veronese embedding
2d1
Pn1  PN 1 , (x1 : : xn )  (x2d
x2 : : x2d
1 : x1
n ).

The degree of this discriminant is n(2d 1)n1 .


Proof. The discriminant
of p vanishes if and only if there exists x Pn1 with


p(x) = 0 and p x = 0. If p is in the boundary of the cone of positive polynomials
then such a real point x exists. For the degree formula, see [11].
Results similar to Proposition 5.58 hold when we restrict ourselves to polynomials p that lie in linear subspaces. This is why the A-discriminants A from
Section 5.2.4 are relevant. We show this for a two-dimensional family of polynomials.
Example 5.59. Consider the two-dimensional family of ternary quartics
fa,b (x, y, z)

x4 + y 4 + ax3 z + ay 2 z 2 + by 3 z + bx2 z 2 + (a + b)z 4 .

Here a and b are parameters. Such a polynomial is nonnegative on R3 if and only if


it is a sum of squares, by Hilberts theorem. This condition denes a closed convex
region C in the (a, b)-plane R2 . It is nonempty because (0, 0) C. Its boundary
a C is derived from the A-discriminant A , where

4 0 3 0 0 2 0
A = 0 4 0 2 3 0 0 .
(5.53)
0 0 1 2 1 2 4
This A-discriminant is an irreducible homogeneous polynomial of degree 24 in the
seven coecients. What we are interested in here is the specialized discriminant
which is obtained from A by substituting the vector of coecients (1, 1, a, a, b, b, a+
b) corresponding to our polynomial fa,b . The specialized discriminant is an inhomogeneous polynomial of degree 24 in the two unknowns a and b, and it is no longer
irreducible. A computation reveals that it is the product of four irreducible factors
whose degrees are 1, 5, 5, and 13.
The linear factor equals a + b. The two factors of degree 5 are
256a2 27a5 +512ab+144a3 b27a4 b+256b2 128ab2 +144a2 b2 128b3 4a2 b3 +16b4 ,
256a2 128a3 +16a4 +512ab128a2 b+256b2 +144a2 b2 4a3 b2 +144ab3 27ab4 27b5 .

Finally, the factor of degree 13 in the specialized discriminant equals


2916a11 b2 + 19683a9 b4 + 19683a8 b5 + 2916a7 b6 + 2916a6 b7 + 19683a5 b8
+19683a4 b9 + 2916a2 b11 11664a12 104976a10 b2 136080a9 b3 27216a8 b4
225504a7 b5 419904a6 b6 225504a5 b7 27216a4 b8 136080a3 b9
104976a2 b10 11664b12 + 93312a11 + 217728a10 b + 76032a9 b2
+1133568a8 b3 + 1976832a7 b4 + 891648a6 b5 + 891648a5 b6 + 1976832a4 b7
+1133568a3 b8 + 76032a2 b9 + 217728ab10 + 93312b11 241920a10
1368576a9 b 2674944a8 b2 1511424a7 b3 4729600a6 b4 9369088a5 b5
4729600a4 b6 1511424a3 b7 2674944a2 b8 1368576ab9 241920b10

i
i

242

main
2012/11/1
page 242
i

Chapter 5. Dualities
+663552a9 + 2949120a8 b + 10539008a7 b2 + 17727488a6 b3 + 9981952a5 b4
+9981952a4 b5 + 17727488a3 b6 + 10539008a2 b7 + 2949120ab8 + 663552b9
2719744a8 8847360a7 b 14974976a6 b2 36503552a5 b3 56360960a4 b4
36503552a3 b5 14974976a2 b6 8847360ab7 2719744b8 + 4587520a7
+25821184a6 b + 52035584a5 b2 + 50724864a4 b3 +50724864a3 b4 +52035584a2 b5
+25821184ab6 + 4587520b7 6291456a6 31457280a5 b 94371840a4 b2
138412032a3 b3 94371840a2 b4 31457280ab5 6291456b6 + 16777216a5
+50331648a4 b + 67108864a3 b2 + 67108864a2 b3 + 50331648ab4 + 16777216b5
16777216a4 67108864a3 b 100663296a2 b2 67108864ab3 16777216b4 .

The relevant pieces of these four curves in the (a, b)-plane are depicted in Figure 5.9.
The line a + b = 0 is seen in the lower left, the degree 13 curve is the swallowtail
in the upper right, and the two quintic curves form the upper-left and lower-right
boundaries of the enclosed convex region C.

rank 4
rank 5

rank 6

rank 5
rank 3
Figure 5.9. The discriminant in Example 5.59 denes a curve in the (a, b)plane. The projected spectrahedron C is the set of points where the ternary quartic
fa,b is sos. The ranks of the corresponding sos matrices Q are indicated.
For each (a, b) C, the ternary quartic fa,b has an sos representation
fa,b (x, y, z) = (x2 , xy, y 2 , xz, yz, z 2) Q (x2 , xy, y 2 , xz, yz, z 2)T ,

(5.54)

where Q is a positive semidenite 66 matrix. This identity gives 15 independent


linear constraints which, together with Q  0, dene an eight-dimensional spectrahedron in the (21 + 2)-dimensional space of parameters (Q, a, b). The projection of
this spectrahedron onto the (a, b)-plane is our convex region C. This proves that
C is a projected spectrahedron. If (a, b) lies in the interior of C, then the ber of
the projection is a six-dimensional spectrahedron. If (a, b) lies in the boundary C,

i
i

5.6. Projected Spectrahedra

main
2012/11/1
page 243
i

243

then the ber consists of a single point. The ranks of these unique matrices are
indicated in Figure 5.9. Notice that C has three singular points, at which the rank
drops from 5 to 4 and 3, respectively.
We now turn our attention to the question of approximating the convex hull
of a variety by a nested family of projected spectrahedra. Let I be an ideal in
R[x1 , . . . , xn ] and VR (I) the variety it denes in Rn . Consider the set of anelinear polynomials that are nonnegative on VR (I):
P1 (I)

{ f R[x1 , . . . , xn ]1 | f (x) 0 for all x VR (I)}.

In light of the biduality theorem for convex sets (cf. Section 5.2.2), we can characterize the (closure of) the convex hull of our variety as follows:
conv(VR (I))

{x Rn | f (x) 0 for all f P1 (I)}.

The geometry behind this formula is shown in Figure 5.10.

Figure 5.10. Convex hull as intersection of half spaces.


The hard constraint that f (x) be nonnegative on VR (I) can now be relaxed
to the (hopefully easier) constraint that f (x) be a sum of squares in the coordinate
ring R[x1 , . . . , xn ]/I; see [16]. Introducing a parameter d that indicates the degree
of the polynomials allowed in that sos representation, we consider the following set
of ane linear polynomials:
;
:
d1 (I) = f | f q12 qr2 I for some qi R[x1 , . . . , xn ]d .
The following chain of inclusions holds:
11 (I) 21 (I) 31 (I) P1 (I).

(5.55)

We now dualize the situation by considering the subsets of Rn where the various f
are nonnegative. The dth theta body of the ideal I is the set
:
;
THd (I) = x Rn | f (x) 0 for all f d1 (I) .

i
i

244

main
2012/11/1
page 244
i

Chapter 5. Dualities

The following reverse chain of inclusions holds among subsets in Rn :


TH1 (I) TH2 (I) TH3 (I) conv(VR (I)).

(5.56)

This chain of outer approximations can fail to converge in general, but there are
various convergence results when the geometry is nice. For instance, if the real
variety VR (I) is compact then Schm
udgens Positivstellensatz [35, Section 3] ensures
asymptotic convergence. When VR (I) is a nite set, so that conv(VR (I)) is a polytope, then nite convergence follows from [19], that is, d : THd (I) = conv(VR (I)).
More information on theta bodies and related constructions is given in Chapter 7.
The main point we wish to record here is the following:
Theorem 5.60 ([16, 22]). Each theta body THd (I) is a projected spectrahedron.
Proof. We may assume, without loss of generality, that the origin 0 lies in the
interior of conv(VR (I)). Then d1 (I) is the cone over the convex set dual to THd (I).
Since the class of projected spectrahedra is closed under duality, and under intersection with ane hyperplanes, it suces to show that d1 (I) is a projected
spectrahedron. But this follows from the formula f q12 qr2 I by an
argument similar to that given after (5.52).
In this chapter we have seen two rather dierent representations of the convex hull of a real variety, namely, the characterization of the algebraic boundary
in Section 5.4, and the representation as a theta body suggested above. The relationship between these two is not yet well understood. A specic question is how
to eciently compute the algebraic boundary of a projected spectrahedron. This
leads to problems in elimination theory that seem to be particularly challenging for
current computer algebra systems.
We conclude by revisiting one of the examples we had seen in Section 5.4.
Example 5.61 (Example 5.37 continued). We revisit the curve X = V (h1 , h2 )
with
h1 = x2 + y 2 + z 2 1,
h2 = 19x2 + 21y 2 + 22z 2 20.
Scheiderer [35] proved that nite convergence holds in (5.56) whenever I denes
a curve of genus 1, such as X. We will show that d = 1 suces in our example;
i.e., we will show that TH1 (I) = conv(X) for the ideal I = h1 , h2 .
We are interested in ane linear forms f that admit a representation

qi2 .
(5.57)
f = 1 + ux + vy + wz = 1 h1 + 2 h2 +
i

Here 1 and 2 are real parameters. Moreover, we want f to lie in 11 (I), so we


require deg qi = 1 for all i. The sum of squares can be written as

4
qi2 = (1, x, y, z) Q (1, x, y, z)T ,
where Q S+
.
i

i
i

5.6. Projected Spectrahedra

main
2012/11/1
page 245
i

245

After matching coecients in (5.57), we obtain the projected spectrahedron


11 (I) = (u, v, w) R3 1 , 2 :

1 + 1 + 202
u
v
w

19
0
0
1
2

0 .
0
v
0
1 212

1 222

Dual to this is the theta body TH1 (I) = 11 (I) . It has the representation

1
x
y
z

x 2 1 u 4
u
u
1
2
3
3
3

.

0
TH1 (I) = (x, y, z) R u1 , u2 , u3 , u4 :
2
1

y
u1

3 3 u4 u3

z
u2
u3
u4
To show that TH1 (I) = conv(X), we use the general approach outlined in Remark
5.62 below. We consider the ideal generated by this 44 determinant and its derivatives with respect to u1 , u2 , u3 , u4 , we saturate by the ideal of 33 minors, and then
we eliminate u1 , u2 , u3 , u4 . The result is the principal ideal h4 h5 h6 , with hi as in
Example 5.37. This computation reveals that the algebraic boundary of conv(X)
consists of quadrics, and we can conclude that TH1 (I) = conv(X).

Figure 5.11. Convex hull of the curve in Figure 5.7 and its dual convex body.
Pictures of our convex body and its dual are shown in Figure 5.11. Diagrams
such as these can be drawn fairly easily for any projected spectrahedron in R3 . To
be precise, the matrix representation of TH1 (I) and 11 (I) given above can be
used to rapidly sample the boundaries of these convex bodies, by maximizing many
linear functions via SDP.
Remark 5.62. It would be desirable to develop a practical algorithm for computing the algebraic boundary of a projected spectrahedron. After a linear change of

i
i

246

main
2012/11/1
page 246
i

Chapter 5. Dualities

coordinates, we may assume that the given spectrahedron is represented by a symmetric matrix whose entries are linear forms in some unknowns, and our task is to
eliminate a subset of these unknowns. To do this, we consider the ideal generated by
the determinant and its partial derivative with respect to the unknowns to be eliminated. The variety of this ideal contains the ramication locus of the projection,
but it also contains the singular locus of the determinantal hypersurface. The main
diculty in the computation is that we need to remove that singular locus before we
eliminate the unknowns. Frequently, like in the previous example, the singular locus
is given by the vanishing of the comaximal minors. However, this need not always
be the case. A concrete example is discussed below in Example 5.63. Thus, one
issue is how to best represent the singular locus of the algebraic boundary of a spectrahedron, in order to perform the saturation step. Once we have the correct ideal
for the ramication locus, then we can compute the branch locus by elimination,
and the result will be the desired hypersurface.
Example 5.63. Consider the surface in 3-space dened by

x
y + z
det
x
y

y+z
1
y
1

x
y
z
x

y
1
= 0.
x
1

Its singular locus is the line x y = z = 0. This does not coincide, in this example,
with the variety dened by the vanishing of the (comaximal) 3 3 minors which
consist only of the two points (0, 0, 0) and (1, 1, 0).

Exercises
Exercise 5.64. Find an explicit symmetric 66 matrix Q, with entries that are
linear in a and b, that satises the identify (5.54). Is your matrix Q unique?
Exercise 5.65. The polynomial p(x) = 1+x+x2 +x3 +x4 +x5 +x6 is nonnegative
on the real line. What is its minimum value? Write p(x) as a sum of squares. The set
of all sums of squares representations of p(x) is a three-dimensional spectrahedron.
Draw a picture of this spectrahedron. Determine all possible representations of p(x)
as a sum of two squares.
Exercise 5.66. Let C denote the convex set of all points (u, v) R2 such that
fu,v (x) = x4 + ux2 + vx + 1 is a sum of squares. Draw a picture of C, express C
as a projected spectrahedron, and compute a polynomial g(u, v) that vanishes on
the boundary of C.
Exercise 5.67. Let I = h1 , where h1 = (x21 1)(x1 1)2 +(x22 1)2 is the bicuspid
curve in Example 5.25. Compute and draw the second theta body TH2 (I).

i
i

Bibliography

main
2012/11/1
page 247
i

247

Exercise 5.68. The A-discriminant A of the 3 7 matrix in (5.53) is a homogeneous polynomial of degree 24 in seven indeterminates. Can you compute A
explicitly? How many monomials appear in the expansion of A ?
Notes. This chapter grew out of the notes for three lectures given by Bernd Sturmfels on March 2224, 2010, at the spring school on Linear Matrix Inequalities and
Polynomial Optimization (LMIPO) at UC San Diego. Later that spring, Bernd
Sturmfels lectured on convex algebraic geometry at the Universit`a de Roma 3. This
led to the publication of a rst version of the material in this chapter under the
title Dualities in Convex Algebraic Geometry in Rendiconti di Matematica, Serie
VII, 30:285327, 2010.

Bibliography
[1] A. I. Barvinok. A Course in Convexity, Grad. Stud. in Math. 54. American
Mathematical Society, Providence, RI, 2002.
[2] S. Basu, R. Pollack, and M.-F. Roy. Algorithms in Real Algebraic Geometry.
Springer, Berlin, 2006.
[3] D. Bates, J. Hauenstein, A. Sommese, and C. Wampler. Bertini: Software for Numerical Algebraic Geometry. Available at http://www.nd.edu/
sommese/bertini.
[4] J. Bochnak, M. Coste, and M.-F. Roy. Geometrie Algebraique Reelle, Ergebn.
Math. Grenzgeb. 12. Springer, Berlin, 1987.
[5] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University
Press, Cambridge, UK, 2004.
[6] S. Boyd and L. Vandenberghe. Semidenite programming. SIAM Rev., 38:49
95, 1996.
[7] F. Catanese, S. Hosten, A. Khetan, and B. Sturmfels. The maximum likelihood
degree. Amer. J. Math., 128:671697, 2006.
[8] D. Cox, J. Little, and D. OShea. Ideals, Varieties and Algorithms, 3rd edition,
Undergrad. Texts Math. Springer, New York, 2007.
[9] D. Cox, J. Little, and D. OShea. Using Algebraic Geometry, 2nd edition, Grad.
Texts in Math. Springer, New York, 2005.
[10] R. Datta. Universality of Nash equilibria. Math. Oper. Res., 28:424432, 2003.
[11] I. Gelfand, M. Kapranov, and A. Zelevinsky: Discriminants, Resultants and
Multidimensional Determinants. Birkhauser, Boston, 1994.
[12] H.-C. Graf von Bothmer and K. Ranestad. A general formula for the algebraic
degree in semidenite programming. Bull. Lond. Math. Soc., 41:193197, 2009.

i
i

248

main
2012/11/1
page 248
i

Chapter 5. Dualities

[13] D. Grayson and M. Stillman: Macaulay2, a software system for research in


algebraic geometry. Available at http://www.math.uiuc.edu/Macaulay2/.
[14] N. Go and H. A. Scheraga. Ring closure in chain molecules with Cn ,I, and S2n
symmetry. Macromolecules, 6:273281, 1973.
[15] J. Gouveia and T. Netzer. Positive polynomials and projections of spectrahedra. SIAM J. Optim., 21:960976, 2011.
[16] J. Gouveia, P. A. Parrilo, and R. R. Thomas. Theta bodies for polynomial
ideals. SIAM J. Optim., 20:20972118, 2010.
[17] J. W. Helton and J. Nie. Semidenite representation of convex sets. Math.
Program. Ser. A, 122:2164, 2010.
[18] M. Hestenes. Optimization Theory: The Finite Dimensional Case. Wiley &
Sons, New York, 1975.
[19] M. Laurent, J. B. Lasserre, and P. Rostalski. Semidenite characterization
and computation of zero-dimensional real radical ideals. Found. Comp. Math.,
8:607647, 2008.
[20] M. Laurent and S. Poljak. On the facial structure of the set of correlation
matrices. SIAM J. Matrix Anal. Appl., 17:530547, 1996.
[21] J. B. Lasserre. Global optimization with polynomials and the problem of moments. SIAM J. Optim., 11:796817, 2001.
[22] J. B. Lasserre. Moments, Positive Polynomials and Their Applications. Imperial College Press, London, 2010.
[23] S. Lazard, L. M. Pe
naranda, and S. Petitjean. Intersecting quadrics: an ecient
and exact implementation. Comput. Geom., 35:7499, 2006.
[24] D. G. Luenberger. A double look at duality. IEEE Trans. Automat. Control,
73:14741482, 1992.
[25] M. Marshall. Positive Polynomials and Sums of Squares. American Mathematical Society, Providence, RI, 2008.
[26] J. Nie. Discriminants and nonnegative polynomials. J. Symbolic Comput.,
47:167191, 2012.
[27] J. Nie, K. Ranestad, and B. Sturmfels. The algebraic degree of semidenite
programming. Math. Program., 122:379405, 2010.
[28] D. Plaumann, B. Sturmfels, and C. Vinzant. Quartic curves and their bitangents. J. Symbolic Comput., 46:712733, 2011.
[29] M. Ramana. An exact duality theory for semidenite programming and its
complexity implications. Math. Program., 77:129162, 1997.

i
i

Bibliography

main
2012/11/1
page 249
i

249

[30] M. Ramana and A. J. Goldman. Some geometric results in semidenite programming. J. Global Optim., 7:3350, 1995.
[31] K. Ranestad and B. Sturmfels. On the convex hull of a space curve. Adv. Geom.,
12:157178, 2012.
[32] K. Ranestad and B. Sturmfels. The convex hull of a variety. In P. Br
anden,
M. Passare, and M. Putinar, editors, Notions of Positivity and the Geometry
of Polynomials. Trends Math. Springer-Verlag, Basel, 2011, pp. 331344.
[33] R. T. Rockafeller. Convex Analysis. Princeton University Press, Princeton, NJ,
1970.
[34] P. Rostalski. Bermeja, Software for Convex Algebraic Geometry. Available at
http://math.berkeley.edu/philipp/cagwiki.
[35] C. Scheiderer. Convex hulls of curves of genus one. Adv. Math., 228:26062622,
2011.
[36] M. Trott. Applying GroebnerBasis to three problems in geometry. Mathematica
in Education and Research, 6:1528, 1997.
[37] G. Ziegler. Lectures on Polytopes. Grad. Texts in Math. Springer, New York,
1995.

i
i

main
2012/11/1
page 250
i

main
2012/11/1
page 251
i

Chapter 6

Semidenite
Representability

Jiawang Nie

It is natural to ask which convex optimization problems can be formulated as


semidenite programs. If such a formulation exists, how can we nd it? The
answer to these questions is equivalent to nding an exact representation of a convex set as a spectrahedron or projected spectrahedron. Whenever this can be done,
we say that the convex set has a semidenite representation or it is semidenite
representable.

6.1

Introduction

We begin by examining the question of when a convex set S is a spectrahedron. Since


a spectrahedron is dened by a linear matrix inequality, the points on the boundary
of S must satisfy a polynomial equation given by the determinant of its linear pencil.
Therefore, only convex sets whose boundaries have a polynomial description can be
spectrahedra. In fact, being a spectrahedron is even more restrictive and we will
examine some of these restrictions in this chapter. In particular, we will present
a complete characterization of two-dimensional spectrahedra due to Helton and
Vinnikov. In higher dimensions a full characterization of which convex sets are
spectrahedra is unknown.
The class of projected spectrahedra is considered next. We will provide some
natural necessary conditions for a set to be a projected spectrahedron. Deriving sucient conditions brings us to explicit construction methods for semidenite
representations. A general technique for constructing such representations and approximations of a convex set S given by polynomial equations and inequalities is
to use moments. The basic idea is that we introduce an independent variable for
Jiawang

Nie was supported by NSF grants DMS-0757212 and DMS-0844775.

251

i
i

252

main
2012/11/1
page 252
i

Chapter 6. Semidenite Representability

every monomial, so that the dening inequalities of S become linear inequalities in


the new variables. We then consider a set consisting of points satisfying the dening inequalities of S in the new variables, and some extra positive semideniteness
conditions coming from moment matrices. The moment approach is equivalent, via
duality, to showing that every linear polynomial nonnegative on S has a weighted
sum-of-squares representation with uniform degree bounds. Therefore, the sum-ofsquares theory will naturally appear in studying semidenite representability. We
will examine in detail the power of the moment approach to provide exact representations of convex sets. In particular, under some local boundary conditions we
obtain exact semidenite representations.
Another approach for constructing semidenite representations is called localization. If we can divide a convex set S into several parts and nd a semidenite
representation for each piece, then these representations can be glued together to
provide a semidenite representation for S. The main tool for this approach is
building a single semidenite representation for the convex hull of the union of
several projected spectrahedra.
Sucient conditions will follow from combining localization with the moment
approach. While at this time we do not have a full understanding of semidenite
representability of convex semialgebraic sets, the necessary and sucient conditions
derived in this chapter are reasonably close to each other.

6.2

Spectrahedra

Recall from Chapter 2 that a set S Rn is called a spectrahedron if it can be


described by a linear matrix inequality as
S = {x Rn : A0 + x1 A1 + + xn An  0}.

(6.1)

Here, each Ai is a constant symmetric matrix, and if the origin is in the interior of
S, then A0 can be chosen to be positive denite. Furthermore, if A0 0, we can
apply a congruence transformation to the matrices A1 , . . . , An and make A0 = I.
For instance, if A0 = BB T with B nonsingular, then S can be described by
I + x1 B 1 A1 B T + + xn B 1 An B T  0.
When A0 = I, the linear matrix inequality in (6.1) is said to be monic and the origin
is in the interior of S. Conversely, if S dened by (6.1) has nonempty interior, we
may assume A0 is positive denite by translating an interior point to the origin. The
expression A0 + x1 A1 + + xn An is called a symmetric linear matrix polynomial
or a linear pencil.

6.2.1

Examples of Spectrahedra

We begin by giving examples of spectrahedra that naturally arise in optimization.


Ellipsoids. An ellipsoid E is a set in Rn that can be described as
E = {x Rn : (x c)T E 1 (x c) 1}

i
i

6.2. Spectrahedra

main
2012/11/1
page 253
i

253

n
and a vector c Rn . The
for a symmetric positive denite matrix E S++
vector c is called the center of E, and E is called the shape matrix of E. An
ellipsoid E is a spectrahedron because a point x is in E if and only if it satises
the linear matrix inequality

E
(x c)T

 
E
xc
=
1
cT

 

n
c
0
xi T
+
1
ei
i=1


ei
 0.
0

We can use Schur complement to verify that the above linear matrix inequality
describes E. Ellipsoids have wide applications in optimization [3, 7, 8, 32].
Second order cones. The set {(x, t) Rn R+ :
x
2 t} is called the
second order cone (also Lorentz cone or ice cream cone). We have already
seen this cone in Chapter 2. It is a spectrahedron, because it is dened by
the linear matrix inequality


tIn
xT

 

n
x
0
xi T
=
t
ei
i=1



I
ei
+t n
0
0


0
 0.
1

Second order cones also have wide applications in optimization (cf. [2]).
Convex quadratic sets. More general convex sets than ellipsoids and second order cones are dened by quadratic inequalities. Let Q := {x Rn :
q(x) 0} be a nonempty set, with
q(x) := xT Bx + bT x + c
being a quadratic function. Here B is a symmetric matrix. It is interesting
to note that the set Q is convex if and only if it is a spectrahedron. We leave
this as an exercise to the readers.
Matrices with bounded eigenvalues or singular values. Denote by
min () and max (), respectively, the minimum and maximum eigenvalues of
a symmetric matrix. Let X Rnn . If X is symmetric, then max (X) t if
and only if
tI X  0
and min (X) t if and only if
X tI  0.
If X is not symmetric, then its maximum singular value max (X) t if and
only if


tI X
 0.
X T tI
These linear matrix inequalities all dene spectrahedra in the space of (X, t).

i
i

i main

2012/11/10
page 254

254

Chapter 6. Semidenite Representability

Fractional linear-quadratic inequalities [3]. Fractional linear-quadratic


inequalities can be used to dene interesting convex sets. Let



2

n
Bx + f
2
T
T
c x + d, a x + b > 0
F = xR T
a x+b
be a nonempty set dened by a, b, c, d, f, B. Note that the denominator aT x+b
is positive on F . The closure of F is a spectrahedron since by Schur complement it can be described by the linear matrix inequality
 T

(a x + b)I Bx + f
 0.
(Bx + f )T cT x + d
Quadratic matrix inequalities [3]. Let V be a symmetric positive denite
matrix and L(X) : Rmn S k be a linear operator. Consider the following
quadratic matrix inequality on matrix pairs (X, Y ) with Y symmetric:
XV 1 X T + L(X) Y.
By Schur complement, it is equivalent to the linear matrix inequality


V
XT
0
X Y L(X)
which denes a spectrahedron in the space of (X, Y ).
Matrix cubes parameterized by eigenvalues [30]. Consider the linear
matrix polynomials B1 (x), . . . , Bm (x). Let

m



A
+
t
A

0
whenever

0
k k

n
k=1

(x, d) R R
,
C =

min (Bk (x)) tk max (Bk (x))


for k = 1, 2, . . . , m
where every Ak is a constant symmetric matrix. The set C is a spectrahedron
(cf. [30]), because there exists a symmetric linear matrix polynomial L(x, d)
in (x, d) such that
C = { (x, d) Rn R : L(x, d)  0 } .
The construction of L(x, d) is given in [30].
A special case of matrix cubes is the k-ellipse, which consists of all points
in the plane that have a constant sum of distances to a set of given foci (cf.
[31]). We have already encountered the k-ellipse in Section 2.1.3. For instance,
the 3-ellipse with foci (0, 0), (1, 0), (0, 1) and radius d = 5 is dened by the
equation
?
?
?
x21 + x22 + (x1 1)2 + x22 + x21 + (x2 1)2 = 5.

i
i

6.2. Spectrahedra

main
2012/11/1
page 255
i

255

The region surrounded by this 3-ellipse is convex and can be described by the
linear matrix inequality:

6 3x1
x2

x2 1

x2

0
0

x2
6 x1
0
x2 1
0
x2
0
0

x2 1
0
6 x1
x2
0
0
x2
0

0
x2 1
x2
6 + x1
0
0
0
x2

x2
0
0
0
4 x1
x2
x2 1
0

0
x2
0
0
x2
4 + x1
0
x2 1

0
0
x2
0
x2 1
0
4 + x1
x2

0
0

x2
 0.
0

x2 1

x2
4 + 3x1

Therefore, this convex region is a spectrahedron. A dening polynomial for


this 3-ellipse is given by the determinant of the above matrix.

6.2.2

Spectrahedra and Algebraic Interiors

Let S be a spectrahedron dened as in (6.1), and pI (x) denote the principal minor
of the linear pencil
A(x) := A0 + A1 x1 + + An xn ,
whose rows and columns are indexed by a nonempty set I {1, 2, . . . , m}, where
m is the size of the matrices Ai . Then, a point x S if and only if all the principal
minors are nonnegative at x:
pI (x) 0 for all I {1, 2, . . . , n}.
Therefore, S is a basic closed semialgebraic set (dened by nitely many weak
polynomial inequalities). The boundary of S lies on the determinantal hypersurface
det A(x) = 0.
If A0 0 (the origin is in the interior of S), then S is the closure of the connected
component of the set
{x : det A(x) > 0}
containing the origin.
The above observation leads to the denition of algebraic interior, which was
introduced by Helton and Vinnikov [17]. A subset T of Rn is an algebraic interior
if it equals the closure of a connected component of the set {x : p(x) > 0} for
some polynomial p. The polynomial p is called a dening polynomial of T . The
dening polynomial of an algebraic interior is not unique. However, the one of
the smallest degree is unique up to a positive constant factor, and divides all the
dening polynomials of T . Its degree is called the degree of T .
Example 6.1. Consider the spectrahedron dened by

1 x1 x2
x1 1 x3  0.
x2 x3 1

i
i

256

main
2012/11/1
page 256
i

Chapter 6. Semidenite Representability

It is the elliptope E3 , which we have previously seen in Chapter 2 and Chapter 5,


and an algebraic interior dened by the cubic polynomial inequality
p{1,2,3} (x) := 2x1 x2 x3 x21 x22 x23 + 1 > 0.
This spectrahedron is a basic closed semialgebraic set dened by the four polynomial
inequalities:
p{1,2,3} (x) 0,

p{1,2} = 1 x21 0,

p{1,3} = 1 x22 0,

p{2,3} = 1 x23 0.

A picture of this elliptope is shown in Chapter 5, Figure 5.8.


A spectrahedron dened by a monic linear pencil is convex and an algebraic
interior. We will now consider the converse of this statement. Suppose a set S Rn
is convex and equals the closure of a connected component of the set
{x : p(x) > 0}
for some polynomial p. Does it follow that S is a spectrahedron? As we will see, a
spectrahedron satises a stronger condition called rigid convexity.

6.2.3

Rigid Convexity

Suppose S is a spectrahedron dened by a monic linear pencil A(x). Then S is


an algebraic interior with dening polynomial p(x) = det A(x). Given an arbitrary
real direction 0 = w Rn , consider the line x(t) := tw passing through 0. Note
that

wi Ai .
p(x(t)) = det(I + tW ), W =
i

Since W is symmetric, the equation p(x(t)) = 0 has only real roots. This is an
important property satised by spectrahedra.
A polynomial p R[x] is called real zero with respect to a point u with
p(u) > 0 if for every 0 = w Rn the univariate polynomial p(u + tw) R[t] has
only real zeros. If u = 0, we simply say that p is real zero. Real zero polynomials are
nonhomogeneous versions of hyperbolic polynomials. A homogeneous polynomial
h(x) is hyperbolic with respect to a direction u Rn with h(u) > 0 if for every
0 = w Rn the univariate polynomial h(u + tw) R[t] has only real zeros. If a
form h(x) is hyperbolic with respect to u = (1, u2 , . . . , un ), then the dehomogenized
polynomial h(1, x2 , . . . , xn ) is real zero with respect to (u2 , . . . , un ).
Example 6.2. (i) The cubic polynomial from Example 6.1,
2x1 x2 x3 x21 x22 x23 + 1,
is real zero, because it is the determinant of a monic linear pencil.
(ii) The polynomial p(x) = 1 (x41 + x42 ) is not real zero [17]. For every 0 =
(w1 , w2 ) R2 , the univariate polynomial in t
/.
/
.
p(tw) = 1 t2 (w14 + w24 )1/2 1 + t2 (w14 + w24 )1/2
has two nonreal zeros. The origin lies in the interior of {x : p(x) > 0}.

i
i

6.2. Spectrahedra

main
2012/11/1
page 257
i

257

1
0.8
0.6
0.4
0.2
0
2
4
6
8

0.5

Figure 6.1. The TV screen {(x1 , x2 ) : x41 + x42 1}.


Suppose an algebraic interior S Rn is dened by a polynomial p. Then S
is called rigidly convex if p is real zero with respect to an interior point u of S. If
so, we say S passes the line test with respect to u; i.e., every real line passing
through u intersects the hypersurface p(x) = 0 at only real points. The properties
of rigidly convex sets are summarized in the following theorem due to Helton and
Vinnikov.
Theorem 6.3 ([17]). Suppose S is an algebraic interior.
(i) If S passes the line test with respect to a point u int(S), then it must be
convex.
(ii) If S is rigidly convex with respect to a point u int(S), then S is rigidly
convex with respect to every point v int(S).
Not all convex algebraic interiors are rigidly convex. As we saw above, the
TV screen (see Figure 6.1)
{(x1 , x2 ) : 1 x41 x42 0}
does not pass the line test and hence is not rigidly convex (cf. Example 6.2). Here
is another such example.
Example 6.4 ([17]). Consider the polynomial
p(x) = x31 3x22 x1 (x21 + x22 )2 .
The inequality p(x) > 0 denes three bounded convex components shown in Figure 6.2. Let S be the closure of the component lying in the half space x1 0. It is
an algebraic interior of degree 4 and is shaded in Figure 6.2. The point u = (0.5, 0)
lies in the interior of S. Figure 6.2 shows a line passing through u and intersecting
the curve p(x) = 0 in only two real points. Thus, S is not rigidly convex.

i
i

258

main
2012/11/1
page 258
i

Chapter 6. Semidenite Representability

1.5

0.5

(x21

0.2

0.4

0.6

0.8

Figure 6.2. A line passing through (0.5, 0) intersects the curve x31 3x22 x1
= 0 in only 2 real points.

x22 )2

The relationship between spectrahedra and rigid convexity is described by the


following fundamental result of Helton and Vinnikov. It completely characterizes
two-dimensional spectrahedra.
Theorem 6.5 ([17]). If an algebraic interior S Rn is a spectrahedron, then S is
rigidly convex. When n = 2, the converse is also true, and S can be represented by
a monic linear matrix inequality whose size equals the degree of its boundary S.
The rst statement in Theorem 6.5 has been shown at the beginning of this
subsection: the determinant of a monic linear pencil must be real zero. In the
two-dimensional case (n = 2), the converse statement is established by showing
that every real zero bivariate polynomial of degree d is the determinant of a monic
linear pencil of size d d. Finding a spectrahedral representation of a rigidly convex
algebraic interior S is equivalent to nding a representation of a dening polynomial
of S as the determinant of a monic linear pencil. This naturally leads to studying
determinantal representations of polynomials.

6.2.4

Symmetric Determinantal Representations

Given a polynomial p R[x], we say it has a symmetric determinantal representation if there exists a linear pencil
L(x) := L0 + x1 L1 + + xn Ln
such that p = det L(x) and every Li is symmetric. If L0 0, we say that p
admits a monic symmetric determinantal representation . An important result due

i
i

6.2. Spectrahedra

main
2012/11/1
page 259
i

259

to Helton, McCullough, and Vinnikov is that every polynomial p can be expressed


as the determinant of a linear pencil (not necessarily monic).
Theorem 6.6 ([16]). Every polynomial p (with p(0) = 0) admits a symmetric
determinantal representation of the form
p(x) = c det (L0 + L1 x1 + + Ln xn ),

(6.2)

where L0 is a signature matrix (L0 is diagonal and L20 = I) and c is a nonzero


constant.
Clearly, if deg(p) = d, the size of matrices Lj should be at least d. When
n > 2, typically the size of Lj has to be larger than d. This can be shown by a
dimension comparison. Suppose Lj has dimension N N and L0 is diagonal. The
%
$
dimension of the space of degree d polynomials is n+d
d , while the dimension of
the space of pencils L(x) is N + nN (N + 1)/2. For any xed n > 2, the former
dimension grows signicantly faster than the latter if N = O(d). So we should
expect N > O(d) when n > 2.
Example 6.7. (i) The polynomial 1 + x21 + x22 has the symmetric determinantal
representation

1
0 x1
1 x2 .
1 + x21 + x22 = det 0
x1 x2 1
This linear pencil is clearly not monic.
(ii) Consider the following bivariate quartic polynomial:
1 + x21 + x22 + 4x21 x2 4x1 x22 + x41 2x31 x2 2x1 x32 x21 x22 + x42 .
It is the determinant of the following linear

1 x1 x2
x1 1 x1

x2 x1 1
x2 x2 x1

pencil which is also not monic:

x2
x2
.
x1
1

In the context of semidenite representations it is natural to ask whether a


real zero polynomial admits a monic symmetric determinantal representation. For
the general case n > 2, a counterexample was found by Br
anden [9]. He further
showed that there are real zero polynomials p for which there is no power k > 0
such that pk admits a monic symmetric determinantal representation. Simpler
counterexamples were found by Netzer and Thom [26]. For instance, for every
n 4, the simple quadratic polynomial (1 + x1 )2 x22 x2n does not admit a
monic symmetric determinantal representation (cf. [26, Example 3.5]).
It follows from Theorem 6.3 that for n = 2, a degree d real zero polynomial
always has a monic symmetric determinantal representation of size d d. The proof

i
i

260

main
2012/11/1
page 260
i

Chapter 6. Semidenite Representability

uses complexication of projective algebraic curves and the constructions are mostly
theoretical. Computational aspects of these constructions are discussed in [35].
When S Rn (n > 2) is an algebraic interior that is rigidly convex, its minimum degree dening polynomial p might not admit a monic symmetric determinantal representation. However, this does not exclude the possibility of a multiple
of p having a monic symmetric determinantal representation. If this is true, then S
would be a spectrahedron. Indeed, Helton and Vinnikov [17] conjectured that every
rigidly convex algebraic interior of Rn is a spectrahedron.

6.2.5

Exercises

Exercise 6.8. Let C = {x Rn : f (x) 0} be a nonempty convex set dened by a


smooth function f : Rn R. Suppose u lies on the boundary of C and f (u) = 0.
Show that the following is true:
(i) The Hessian 2 f (u) is positive semidenite in the tangent space of C at u, i.e.,
v T 2 f (u)v 0 for all v f (u) := {w : f (u)T w = 0}.
(ii) The set C belongs to the half space f (u)T (x u) 0.
Exercise 6.9. Let Q = {x Rn : q1 (x) 0, . . . , qm (x) 0} be a nonempty set
with each qi being a quadratic polynomial. Show that Q is convex if and only if it
is a spectrahedron.
Exercise 6.10. Decide whether the following polynomials are real zero or not with
respect to the vector (1, . . . , 1) of all ones:
(a) x1 xn 1/2;
(b) x1 xn ;
(c) x1 xn (1/x1 + + 1/xn );
(d) (n + 1)xpn+1 xp1 xpn , (p > 1 is an integer).
Exercise 6.11. Find a smallest size symmetric determinantal representation for
the following polynomials:
(a) 1 x21 x22 x23 ;
(b) 1 + x31 + x32 ;
(c) 1 x41 x42 ;
(d) 1 + x61 + x62 .
Exercise 6.12. Consider the 3-ellipse with foci (0, 0), (1, 0), (0, 1) and radius 3:
?
?
?
x21 + x22 + (x1 + 1)2 + x22 + x21 + (x2 + 1)2 = 3.

i
i

6.3. Projected Spectrahedra

main
2012/11/1
page 261
i

261

Represent the convex region surrounded by this 3-ellipse by a linear matrix inequality in variables x1 and x2 only. What is the polynomial of smallest degree (up to a
constant factor) vanishing on this 3-ellipse?
Exercise 6.13. Suppose S is a spectrahedron. Show that every face of S is
exposed. (A face F of S is called exposed if either F = S or there exists a supporting
hyperplane H of S such that H S = F .)

6.3

Projected Spectrahedra

A set S Rn is called a projected spectrahedron if there exists a spectrahedron


P Rn+k such that

4
5

S = x Rn (x, y) P for some y Rk .
(6.3)
In the above, y is called a lifting vector and P a lifting spectrahedron of S. Using
the linear matrix inequality dening P , we can write S as


n
k




S = x Rn A0 +
(6.4)
xi Ai +
yj Bj  0 for some y Rk .


i=1
j=1
Projected spectrahedra are a much larger class of convex sets than spectrahedra,
with signicantly greater modeling power. Unlike in the case of spectrahedra where
rigid convexity is a natural requirement, no nontrivial obstructions to being a projected spectrahedron are known. In the remainder of this chapter we discuss representability of convex sets as projected spectrahedra.

6.3.1

Examples of Projected Spectrahedra

We now give several examples of projected spectrahedra, many of which are important in applications.
The TV screen {(x1 , x2 ) : 1 x41 x42 0} of Example 6.2 is a projected
spectrahedron since it admits the semidenite representation
 
 
'
&
y2
1 x1
1 x2
1 + y1
,
,
 0.
BlockDiag
y2
1 y1
x1 y1
x2 y2
It has two lifting variables, and we have seen that the TV screen is not a
spectrahedron.
:
;
The three-dimensional hyperboloid H = x R3+ : x1 x2 x3 1 is a projected
spectrahedron, since it admits the semidenite representation
&
 
 
'
x1 y1
1
y
x y2
BlockDiag
, 1
, 3
 0.
1 y2
y1 x2
y2 1

i
i

262

main
2012/11/1
page 262
i

Chapter 6. Semidenite Representability


There are two lifting variables. The hyperboloid H is not a spectrahedron,
because its dening polynomial x1 x2 x3 1 is not real zero with respect to
(1, 1, 2), an interior point of H.
For any rational r [0, 1/m] the set
r
H(m, r) := {(x, t) Rm
+ R : t (x1 xm ) }

(6.5)

is a projected spectrahedron (cf. [3, Section 3.3]). As we will see below, the
sets H(m, r) are useful in constructing semidenite representations for convex
sets.
Sums of largest eigenvalues [3]. In optimization one often needs to minimize the sum of k largest eigenvalues over an ane subspace of symmetric
matrices. This optimization problem is convex and can be formulated as a
semidenite program. For X S n , let i (X) be the ith largest eigenvalue
of X. Dene sk (X) := 1 (X) + + k (X) to be the sum of k largest
eigenvalues of X. Denote the set
:
;
Skn := (X, t) S n R : sk (X) t .
Note that sk (X) t if and only if there exists (Z, ) S n R such that [3,
Section 4.2]
t k Tr(Z) 0,
(6.6)
Z  0,
Z X + In  0.
It can be checked that (6.6) implies sk (X) t. Conversely, if sk (X) t,
then we can nd a pair (Z, ) S n R satisfying (6.6). To see this, we may
assume that X is diagonal (up to an orthogonal transformation) and choose
= k (X),

Z = Diag(1 (X) , . . . , k1 (X) , 0, . . . , 0).

Hence, (6.6) is a semidenite representation of Skn , and Skn is a projected


spectrahedron.
A semidenite representation similar to (6.6) can be constructed for the set
of all pairs (X, t) S n R satisfying
nk+1 (X) + + n (X) t
by using the relation ni (X) = i+1 (X). This means that maximizing
the sum of k smallest eigenvalues over an ane subspace of symmetric matrices
can also be formulated as a semidenite program.
Sums of largest singular values [3]. Another frequently encountered optimization problem is to minimize the sum of k largest singular values of
matrices in an ane subspace. This problem can also be formulated as a
semidenite program in a similar way. For X Rmn , denote by i (X) the

i
i

6.3. Projected Spectrahedra

main
2012/11/1
page 263
i

263

ith largest singular value of X. Note that


&
0
i (X) = i
XT

X
0

'
.

A semidenite representation as in (6.6) can be similarly constructed for the


set of all pairs (X, t) satisfying
1 (X) + + k (X) t.
Powers of determinants [3]. In many applications, such as matrix completion problems, one often needs to maximize the determinant of a positive
semidenite matrix over an ane subspace. This problem can also be formulated as a semidenite program. For a rational number r [0, 1/n], the set
:
;
n
Drn := (X, t) S+
R : (det X)r t
is a projected spectrahedron, because (X, t) Drn if and only if there exists a
lower triangular matrix L Rnn satisfying


X
L
 0, (diag(L), t) H(n, r).
LT Diag(L)
Here H(n, r) is as in (6.5). This was shown in [3, Section 4.2].
Sums of squares polynomials. In Chapter 3, we have seen that sos polynomials are very useful in global optimization of polynomial functions. Recall
that n,2d is the set of sos polynomials of degree 2d in n variables. We already
know that a polynomial f n,2d if and only if there exists a Gram matrix
X  0 such that
f (x) = [x]Td X[x]d , X  0,
where [x]d denotes the column vector of monomials of degree at most d:
3T
2
[x]d = 1 x1 x21 x1 x2 xdn .
Note that the Gram matrix X is usually not unique for a given f . The
above implies n,2d is a projected spectrahedron, which we have also seen in
Chapter 4.
For instance, the set 1,4 of univariate quartic sos polynomials is the set
4



5
i
(f0 , f1 , f2 , f3 , f4 ) R
fi x 0 x R .

i=0

It admits the following semidenite representation with one lifting variable :

1
1
f0
2 f1
3 f2
1
1
1 f1
 0.
2
3 f2 + 2
2 f3
1
1
f4
3 f2
2 f3
Truncated quadratic modules and preordering. In constrained polynomial optimization, weighted sos polynomials are very useful in representing

i
i

264

main
2012/11/1
page 264
i

Chapter 6. Semidenite Representability


polynomials that are nonnegative on a set. For a tuple of polynomials g :=
(g1 , . . . , gm ), its kth order truncated quadratic module is dened as

 m


deg(i gi ) 2k for all i
,
(6.7)
i gi
qmodulek (g) =
0 , . . . , m are sos

i=0

and its kth order truncated preorder is dened as



deg(
g
)

2k,

preorderk (g) =
g
.

is sos for every


{0,1}m

(6.8)

m
and g0 = 1. The set of all sos
In the above, we denote g := g11 gm
polynomials with a xed degree is a projected spectrahedron, as shown in
the preceding example. Therefore, both qmodulek (g) and preorderk (g) are
projected spectrahedra.

For instance, in the case of two variables (n = 2), qmodule1 (1 x21 x22 )
admits the semidenite representation with one lifting variable :




a
b
T
T

 0 , 0 .
a + 2b x + x Cx
C + I2
bT

6.3.2

Necessary Conditions

The geometry of the boundary is very important in investigating semidenite representability of convex sets. The notion of curvature plays a crucial role.
Let f be a polynomial in R[x]. Consider its real variety
VR (f ) = {x Rn : f (x) = 0}
and a point u VR (f ). We say f is nonsingular at u if f (u) = 0. If f is nonsingular
at u VR (f ), we say VR (f ) has positive curvature at u if for either s = 1 or s = 1
s v T 2 f (u)v > 0 for all 0 = v f (u) .

(6.9)

Here f (u) denotes the orthogonal complement of the subspace spanned by


f (u). When VR (f ) has positive curvature at u and s = 1 in (6.9) (respectively,
s = 1), the intersection {f (x) 0} B(u, ) (respectively, {f (x) 0} B(u, )) is
convex for a small > 0. The denition of positive curvature of a nonsingular hypersurface Z is independent of the choice of its dening functions (cf. [13, Section 3]).
Geometrically, when f is nonsingular at a point u VR (f ), the variety VR (f ) has
positive curvature at u if and only if there exists a neighborhood O of u such that
VR (f ) O is the graph of a strictly convex function (here strict convexity means
the Hessian is positive denite). For a subset V VR (f ), we say VR (f ) has positive
curvature on V if f (x) is nonsingular everywhere on V and VR (f ) has positive curvature at every u V . When > is replaced by in (6.9), we similarly say VR (f )
has nonnegative curvature at u. We refer to [39] for more properties of curvature.
Example 6.14. Consider the TV screen 1 x41 x42 0. Note that
$
%
v T 2 (1 x41 x42 ) v = 12(x21 v12 + x22 v22 ) 0 for all v R2 .

i
i

6.3. Projected Spectrahedra

main
2012/11/1
page 265
i

265

Its boundary has zero curvature on four points (1, 0), (0, 1) and has positive
curvature everywhere else.
A polynomial function f (x) is said to be strictly quasi-concave at u if the
condition (6.9) holds for s = 1. For a subset V Rn , we say f (x) is strictly
quasi-concave on V if f (x) is strictly quasi-concave on every point of V . When > is
replaced by in (6.9) for s = 1, we can similarly dene f (x) to be quasi-concave.
Similarly, quasi-convexity and strict quasi-convexity are dened by requiring s = 1
in (6.9). Our denitions of quasi-convexity and quasi-concavity are slightly less
demanding than the ones in the existing literature (e.g., [8, Section 3.4.3]).
Example 6.15. Consider the two-dimensional hyperboloid
H := {x R2+ : x1 x2 1 0}.
We see that
$
%
v T 2 (x1 x2 1) v = 2v1 v2 > 0
whenever 0 = v x and x1 x2 = 1. Hence the boundary H has positive curvature.
The dening polynomial is not convex anywhere, but it is strictly quasi-concave on
the boundary of H.
Now we present some necessary conditions for a set to be a projected spectrahedron. We are interested in closed semialgebraic sets:
S=

m
<

Tk ,

k
Tk = {x Rn : g1k (x) 0, . . . , gm
(x) 0}.
k

k=1

Each gik is a polynomial and the sets Tk are called basic closed semialgebraic. Denote
by Tk the boundary of Tk in the standard Euclidean topology. For any u Tk ,
the active set Ik (u) := {1 i mk : gik (u) = 0} is nonempty.
The description of a semialgebraic set by polynomials is usually not unique,
and its boundary might have singularities. We say u is a nonsingular point of Tk
if |Ik (u)| = 1 and gik (u) = 0 for i Ik (u); otherwise, we say u is a singular
point of Tk . A point u on Tk is called a corner point of Tk if |Ik (u)| > 1. For
u S and i Ik (u) = , we say gik is irredundant at u with respect to S (or just
irredundant at u if the set S is clear from the context) if there exists a sequence
of nonsingular points {uN } V (gik ) S of Tk such that uN u; otherwise, we
say gik is redundant at u. We say gik is at u if gik (u) = 0. Geometrically, when
gik is nonsingular at u S, gik being redundant at u means that the inequality
gik (x) 0 is not necessary for describing S in a small neighborhood of u.
Example 6.16. Consider the convex set that is drawn in the shaded area of Figure 6.3. It is the union of the following two basic closed semialgebraic sets:
T1 = {g11 (x) := x2 0, g21 (x) := 1 x2 0, g31 (x) := x42 x61 0},
T2 = {g12 (x) := x1 0, g22 (x) := 1 x2 0, g32 (x) := 10x32 x51 0}.

i
i

266

main
2012/11/1
page 266
i

Chapter 6. Semidenite Representability

1.5

0.5

0.5

1.5

Figure 6.3. The shaded area is the union of T1 and T2 in Example 6.16.
The corner points of T1 are (1, 1), (0, 0), (1, 1). The polynomial g31 is irredundant at (1, 1) and (0, 0) but redundant at (1, 1). The polynomials g31 in
nonsingular at (1, 1) but singular at (0, 0). The
polynomial g11 is redundant at

5
(0, 0). The corner points of T2
are (0, 0), (0, 1), ( 10, 1). The polynomial g32 is ir5
redundant at both (0, 0) and ( 10, 1). It is nonsingular at ( 5 10, 1) but singular
at (0, 0). The polynomial g12 is redundant at (0, 1) and (0, 0). Both g21 and g22 are
irredundant on the section x2 = 1 of the boundary.
Now we present necessary conditions for semidenite representability.
Theorem 6.17 ([13]). Let S Rn be a projected spectrahedron. Then S is convex
and has the following additional properties:
(a) The interior int(S) of S is a nite union of basic open semialgebraic sets, i.e.,
int(S) =

m
<

Tk ,

k
Tk = {x Rn : g1k (x) > 0, . . . , gm
(x) > 0}.
k

k=1

(b) The closure S of S is a nite union of basic closed semialgebraic sets:


S=

m
<

Tk ,

k
Tk = {x Rn : g1k (x) 0, . . . , gm
(x) 0}.
k

k=1

(The polynomials gik may be dierent from those in (a).)


(c) For each u S and i Ik (u) = , if gik from (b) is irredundant and nonsingular at u, then gik is quasi-concave at u.

i
i

6.3. Projected Spectrahedra

main
2012/11/1
page 267
i

267

Theorem 6.17 says that a projected spectrahedron must be convex and semialgebraic, and its boundary must have nonnegative curvature at smooth points. In
particular, the rst two parts establish the necessary algebraic structure of projected
spectrahedra, while nonnegativity of curvature follows from convexity. In other
words, convexity and being semialgebraic are necessary conditions for semidenite
representability. It is not clear whether they are also sucient. Indeed, it was
conjectured in [13] that every convex semialgebraic set in Rn is semidenite representable.
Proof of Theorem 6.17. The convexity of S is obvious. Parts (a) and (b)
immediately follow from the TarskiSeidenberg quantier elimination [6].
(c) Let u S Tk . Note that S is a convex set and has the same boundary
as S. (If a set is not closed, then its boundary is dened to be the boundary of its
closure.)
First, consider the case that u is a smooth point. Since S is convex, S has a
supporting hyperplane u + w = {u + x : wT x = 0}. S lies on one side of u + w
and so does Tk , since Tk is contained in S. Since u is a smooth point, Ik (u) = {i}
has cardinality one. For some > 0 suciently small, we have
Tk B(u, ) = {x Rn : gik (x) 0, 2
x u
2 > 0}.
Note u + w is also a supporting hyperplane of Tk passing through u. So, the
gradient gik (u) must be parallel to w, i.e., gik (u) = ki w for some nonzero scalar

v is not
ki = 0. Thus, for all 0 = v w and > 0 small enough, the point u + v
in the interior of Tk B(u, ), which implies
&
'

gik u +
v 0 for all 0 = v w = gik (u) .

By the second order Taylor expansion, we have

v T 2 gik (u)v 0 for all 0 = v gik (u) ;


that is, gik is quasi-concave at u.
Second, consider the case that u S is a corner point. By assumption that
gik is irredundant and nonsingular at u, there exists a sequence of smooth points
{uN } Z(gik ) S such that uN u and gik (u) = 0.
So gik (uN ) = 0 for N suciently large. From the above, we know that

v T 2 gik (uN )v 0 for all 0 = v gik (uN ) .

Note that the subspace gik (uN )


where

equals the range space of the matrix R(uN )

$
k
k
T
R(v) := In
gik (v)
2
2 gi (v)gi (v) .
So the quasi-concavity of gik at uN is equivalent to
R(uN )T 2 gik (uN )R(uN )  0.

i
i

268

main
2012/11/1
page 268
i

Chapter 6. Semidenite Representability

Since gik (u) = 0, we have R(uN ) R(u). Therefore, letting N , we get


R(u)T 2 gik (u)R(u)  0,
which implies

v T 2 gik (u)v 0 for all 0 = v gik (u) ;


that is, gik is quasi-concave at u.
In part (c) of Theorem 6.17, the condition that gik is irredundant cannot be
dropped. We show this in the following example.
Example 6.18. Consider the basic closed semialgebraic set



3
2
2
2

2 g1 (x) := x1 x1 x2 (x1 x1 ) 0
xR
.
g2 (x) := x21 + x22 1 0
It is shaded in Figure 6.4. The point u = (1, 0) lies on the boundary of the set.
The real variety of g1 is not connected and has two components, so the inequality
g2 (x) 0 cannot be dropped in the description of this semialgebraic set. The
polynomial g2 is redundant at u, and it is not quasi-concave at u. Indeed, g2 is
strictly convex since its Hessian is always positive denite.
Given a semidenite representation of a projected spectrahedron S, nding
polynomials gik as in Theorem 6.17 is generally very dicult. However, for some

2.5
2
1.5
1
0.5
0
5

Figure 6.4. The semialgebraic set of Example 6.18.

i
i

6.3. Projected Spectrahedra

main
2012/11/1
page 269
i

269

simple cases, they could be obtained by eliminating lifting variables. Techniques


for doing so were presented in Chapter 5, and we also refer the readers to Tarski
Seidenberg quantier elimination [6]. We show how to do this in the following
example.
Example 6.19. Let S be the projected spectrahedron dened by

1
0
x1
0
1
x2 y  0.
x1 x2 y
y
Its picture is shown in the shaded area of Figure 6.5. The above linear matrix
inequality is equivalent to
f (x, y) := x21 + (x2 y)2 y 0,
where f (x, y) is the determinant of the dening linear pencil. If a point x lies on the
boundary of S, then there exists y such that f (x, y) = 0 and y is a local maximizer
of the function y  f (x, y), which implies
fy = 2x2 + 2y 1 = 0.
Eliminating y from f (x, y) = fy (x, y) = 0 gives the equation
g(x) := 1 + 4(x2 x21 ) = 0.
On the other hand, for every x satisfying g(x) 0, the equation f (x, y) = 0 has
a real solution y and the pair (x, y) satises f (x, y) 0. Therefore, we get an
equivalent description for S as
S = {(x1 , x2 ) : 1 + 4(x2 x21 ) 0}.

25

20

15

10

Figure 6.5. Projected spectrahedron dened in Example 6.19.

i
i

270

main
2012/11/1
page 270
i

Chapter 6. Semidenite Representability

The dening polynomial g(x) is concave. The boundary of S has positive curvature
everywhere.

6.3.3

Exercises

Exercise 6.20. Are the following sets projected spectrahedra?


;
:
(a) X S n : 0 X 3 + 2X 2 2X In .
(b) {X Rmn :
X
p,q 1} (p, q 1 being integers).
;
:
2
2
S+
: XY + Y X  I2 .
(c) (X, Y ) S+
;
:
(d) (X, Y ) S 2 S 2 : X 4 + Y 4 I2 .
If yes, nd a semidenite representation for it; if no, give reasons.
Exercise 6.21. Describe the following
terms of polynomials in (x1 , x2 ).

1 x1 y
x1
(a) : x1 1 x2  0; (b) : y1
y x2 1
1

projected spectrahedra in (x1 , x2 )-space in


y1
1
y2

1
1
y2  0; (c) : y1
x2
y2

y1
y2
x1

y2
x1  0.
x2

Verify your description by drawing the above projected spectrahedra in


Matlab.
Exercise 6.22. For each integer m n 1, nd a semidenite representation for
the hyperboloid:
{(x, t) Rn+ R+ : x1 xn tm }.
Exercise 6.23. If a convex cone K Rn is a projected spectrahedron, show that
its dual cone K is also a projected spectrahedron.
Exercise 6.24. Let P be the convex cone in the space R5 :
5
4
P = (f0 , f1 , f2 , f3 , f4 ) : f0 + f1 x + f2 x2 + f3 x3 + f4 x4 0 for all x [1, 1] .
Find a semidenite representation for P and its dual cone P .
Exercise 6.25. Let Q be the convex cone:
5
4
Q = (A, b, c) S n Rn R : xT Ax + 2bT x + c 0 for all
x
2 1 .
Find a semidenite representation for Q and its dual cone Q .
Exercise 6.26. A symmetric matrix A S n is called copositive if xT Ax 0 for
all x Rn+ . Find a semidenite representation for the cone C3 of all 3 3 copositive
matrices and its dual cone C3 . Repeat this for the 4 4 case.

i
i

6.4. Constructing Semidenite Representations

main
2012/11/1
page 271
i

271

Exercise 6.27. Is the convex set (n > 1)


4
5
(A, B, C) (S n )3 : A + 2xB + x2 C  0 for all x R
a projected spectrahedron? If yes, nd a semidenite representation for it; if no,
give reasons.

6.4

Constructing Semidenite Representations

A general approach for constructing explicit semidenite representations is to use


moments. This was originally proposed in [19, 33]. We rst describe a basic moment
construction of a possible semidenite representation, and then present tighter moment constructions. The basic construction produces a semidenite representation
when the convex set is sos-convex. The tighter constructions produce semidenite
representations for convex sets whose boundaries have positive curvature. In many
applications, a convex set is dened by rational function inequalities, or polynomial
matrix inequalities. In these cases semidenite representations can also be constructed by using moments in a similar way. We describe these constructions and
the conditions under which they work.

6.4.1

A Basic Moment Construction

To illustrate the basic idea of moment constructions, we begin with a simple example
of a one-dimensional convex set dened by a single quartic inequality
a0 + a1 x + a2 x2 + a3 x3 + a4 x4 0.
We introduce a new variable yi for each monomial xi and convert the dening
quartic inequality to the following system:
a0 y0 + a1 y1 + a2 y2 + a3 y3 + a4 y4 0,

y0
y1
y2
The matrix

y1
y2
y3


1
y2
y3 = x
y4
x2

1
x
x2

x
x2
x3

x
x2
x3

x2
x3 .
x4

x2
x3
x4

is always positive semidenite. Therefore we can relax the above system to

1 x y2
a0 + a1 x + a2 y2 + a3 y3 + a4 y4 0, x y2 y3  0,
y2 y3 y4
which yields a projected spectrahedron with lifting variables y2 , y3 , y4 .

i
i

272

main
2012/11/1
page 272
i

Chapter 6. Semidenite Representability

This construction can be readily applied in higher-dimensional cases. Let S


be a convex basic closed semialgebraic set given by
S = {x Rn : g1 (x) 0, . . . , gm (x) 0}.

(6.10)

d = max{'deg(g1 )/2(, . . . , 'deg(gm )/2(}.

(6.11)

Let

Write every gi as
gi (x) =

g(i) x .

||2d

If we let y = x for every , then S is equal to the set


4
5
x Rn : Lg1 (y) 0, . . . , Lgm (y) 0, y = x for all || 2d .
The linear functionals Lgi (y) are linearizations of the polynomials gi :

Lgi (y) =
g(i) y .
||2d

The vector y is called a truncated moment vector, indexed by Nn with || 2d.


Now we dene a linear pencil Md (y) by substituting y = x into [x]d [x]Td and call
Md (y) a moment matrix of order d in n variables. Since the matrix inequality
[x]d [x]Td  0
holds, we also get that Md (y)  0. Thus, the set S dened in (6.10) can be
equivalently described as




Lg1 (y) 0, . . . , Lgm (y) 0,
n
.
S = xR
Md (y)  0, y = x for all || 2d
Note that y0 = x0 = 1. As before, we can obtain a relaxation of S by dropping the
constraints y = x with || > 1 and get the projected spectrahedron

Lg1 (y) 0, . . . , Lgm (y) 0,


y0 = 1, Md (y)  0,
R = x Rn
.
(6.12)


x1 = ye1 , . . . , xn = yen
The lifting variables in R are y , where || 2.
Example 6.28. Consider the set S = {(x1 , x2 ) R2 : 1 x41 x42 x21 x22 0}.
The construction (6.12) gives a semidenite relaxation R of S dened by

1
x1 x2 y20 y11 y02
x1 y20 y11 y30 y21 y12

x2 y11 y02 y21 y12 y03


 0.

1 y40 y04 y22 0,

y20 y30 y21 y40 y31 y22


y11 y21 y12 y31 y22 y13
y02 y12 y03 y22 y13 y04
There are 12 lifting variables yij , and the equality R = S holds for this example, as
will be shown in Section 6.4.3.

i
i

6.4. Constructing Semidenite Representations

main
2012/11/1
page 273
i

273

Example 6.29 ([14]). Consider the set


S = {x Rn : 1 (xd )T Bxd 0},
2
3T
where B is a symmetric matrix and xd := xd1 xd2 xdn . The basic moment
relaxation R is



n

1 i,j=1 Bij yd(ei +ej ) 0,
x Rn
.
y0 = 1, Md(y)  0, x1 = ye , . . . , xn = ye
1

When B is positive semidenite with nonnegative entries and d is even, the equality
S = R holds, which will be shown in Section 6.4.3.

6.4.2

Tighter moment constructions

In general, the semidenite relaxation R given by (6.12) does not equal S, except in
the special case of sos-convex sets (dened in the next subsection). Hence tighter
constructions by using higher order moments are necessary. We describe two basic types of rened moment constructions: Putinar and Schm
udgen semidenite
relaxations.
To describe them, we need to dene localizing matrices. Let p be a polynomial
with deg(p) 2N . Write

)
A(N
(k = 'deg(p)/2();
p(x)[x]N k [x]TN k =
x
||2N
(N )

then dene a linear pencil Lp (y) by linearizing y = x as before,



)
)
L(N
A(N
p (y) =
y .
||2N
(N )

The pencil Lp (y) is called the Nth order localizing matrix of p. If p is nonnegative on S, then for every x S we have
)
L(N
p (y)  0

if every y = x .

(d)

Note that g0 = 1 and Lg0 = Md (y) as before. Since all g0 , g1 , . . . , gm are nonnegative on S, for every N the set S is contained in the projected spectrahedron



)
L(N
n
gi (y)  0, i = 0, 1, . . . , m

SN = x R
.
(6.13)
y0 = 1, x1 = ye1 , . . . , xn = yen
The set SN is called a Putinar semidenite relaxation of S.
The product of polynomials from any subset of g1 , . . . , gm is also nonnegative
m
on S. For every {0, 1}m, dene g := g11 gm
. Each g is nonnegative on S.
So every x S satises
y0 = 1,

)
L(N
g (y)  0

for all {0, 1}m

if every y = x .

i
i

274

main
2012/11/1
page 274
i

Chapter 6. Semidenite Representability

This implies that for every N the set S is contained in the projected spectrahedron
(N )


m

n Lg (y)  0 for all {0, 1} ,

SN = x R
.
(6.14)
y0 = 1, x1 = ye1 , . . . , xn = yen
udgen semidenite relaxation of S. Clearly, for every N ,
The set SN is called a Schm
SN SN because (6.14) has extra conditions in addition to those in (6.13). We
have the nesting relation
S1

S1

SN

SN

S
 .
S

Later we will see that both SN and SN are equal to S for N large enough, under
some general conditions. Typically, it is very dicult to get explicit bounds on N
for which SN = S or SN = S. In some special cases, such bounds can be estimated,
e.g., in [29, Section 3].
Example 6.30. Consider the convex set S dened by
g1 (x) := x2 x31 0,
The relaxation S3 is given

y01 y30
(3)
Lg1 (y) = y11 y40
y02 y31

y01 + y30
(3)
Lg2 (y) = y11 + y40
y02 + y31

by

g2 (x) := x2 + x31 0.

y11 y40
y21 y50
y12 y41

y02 y31
y12 y41  0,
y03 y32

y11 + y40
y21 + y50
y12 + y41

y02 + y31
y12 + y41  0,
y03 + y32

x1 = y10 ,

x2 = y01 ,

y00 = 1,

M3 (y)  0.

In addition to the above, the relaxation S3 has the extra inequality


L(3)
g12 (y) = y02 y60 0.
Higher order relaxations SN and SN can be constructed in a similar way.

6.4.3

Sos-convex Sets

A symmetric matrix polynomial P R[x]rr is a sum of squares if there exists a


matrix polynomial W such that P (x) = W (x)T W (x). A polynomial f is called
sos-convex if the matrix polynomial given by its Hessian 2 f is a sum of squares.
Similarly, f is called sos-concave if f is sos-convex. If for the set S dened in
(6.10) every gi is sos-concave, then we say that S is sos-convex.
Theorem 6.31 ([12]). Let S be dened as in (6.10) with nonempty interior. If
every dening polynomial gi is sos-concave, then the projected spectrahedron R given
by (6.12) is a semidenite representation of S.

i
i

6.4. Constructing Semidenite Representations

main
2012/11/1
page 275
i

275

In the rest of this subsection we present the proof of this result. It gives a
general framework for proving that moment relaxations provide semidenite representations. A typical approach for proving equality of two convex sets is to use
duality theory via separating hyperplanes. Let S be as in Theorem 6.31. Suppose
aT x + b = 0 (a = 0) is a supporting hyperplane of S, then
aT x + b 0 for all x S,

aT u + b = 0 for some u S.

The point u is a minimizer of aT x+b over S and belongs to the boundary S. Since S
has nonempty interior, there exists a point v Rn such that every gi (v) > 0 (Slaters
condition) and every gi is concave. So, the rst order optimality condition holds
at u (cf. [5, Proposition 5.3.5]); i.e., there exist Lagrange multipliers 1 0, . . . ,
m 0 such that
a = 1 g1 (u) + + m gm (u),

i gi (u) = 0 (1 i m).

Clearly, the Lagrangian


L(x) := aT x + b 1 g1 (x) m gm (x)

(6.15)

is convex and nonnegative everywhere and vanishes at u, and the gradient


L(u) = 0. Interestingly, the Lagrangian L(x) is sos if every gi is sos-concave.
This is the key point in proving Theorem 6.31.
Lemma 6.32 ([12]). If a symmetric matrix polynomial P R[x]rr is sos, then
for any u Rn , the symmetric matrix polynomial
, 1, t
P (u + s(x u)) ds dt
0

is sos. In case of r = 1, the above integral is an sos polynomial.


Proof. This is left as an exercise.
Lemma 6.33 ([12]). Let p be a polynomial such that p(u) = 0 and p(u) = 0 for
some u Rn . If its Hessian 2 p is sos, then p is also sos.
Proof. Let q(t) = p(u + t(x u)) be a univariate polynomial in t. Then
q  (t) = (x u)T 2 p(u + t(x u))(x u).
Thus, p(x) = q(1) has the expansion
'
&, 1 , t
(x u)T
2 p(u + s(x u)) ds dt (x u).
0

Since 2 p(x) is sos, the double integral above is sos by Lemma 6.32. Thus p(x) is
also sos.

i
i

276

main
2012/11/1
page 276
i

Chapter 6. Semidenite Representability


Using the above lemmas we now prove Theorem 6.31.

Proof. We have already seen that S R. If S = R, there must exist y satisfying


/ S. Since S is closed, there exists a supporting
(6.12) and x
= (
ye1 , . . . , yen )
hyperplane aT x + b = 0 of S separating x
strictly from S:
aT x + b 0 for all x S,

aT x
+ b < 0,

for some u S.

aT u + b = 0

The point u minimizes aT x + b over S. Since int(S) = and each gi is concave, the
rst order optimality condition holds (cf. [5]) and there must exist (1 , . . . , m ) 0
such that the Lagrangian L(x) in (6.15) is a convex nonnegative polynomial satisfying L(u) = 0 and L(u) = 0. Furthermore, its Hessian
2 L(x) =

m


i (2 gi (x))

i=1

is sos, and Lemma 6.33 implies L(x) is sos. The degree of L(x) is at most 2d. So
there exists a symmetric matrix W  0 such that
aT x + b =

m


i gi (x) + [x]Td W [x]d .

i=1

The above is an identity in x. Replacing each x by y results in


aT x + b =

m


$
%
i Lgi (
y ) + T r W Md (
y ) 0,

i=1

which contradicts the previous assertion that aT x


+ b < 0. Thus S = R.
Example 6.34. Consider the set in Example 6.28. The dening polynomial there is
sos-concave, because the Hessian 2 (1+x41 +x42 +x21 x22 ) has the sos decomposition


x
4 1
x2



x1
x2

T
+2


2x1

2
x1

+2


x2

2
2x2

By Theorem 6.31, the projected spectrahedron R given by (6.12) is a semidenite


representation for S.
Example 6.35. The set in Example 6.29 is sos-convex because the Hessian of
(xd )T Bxd has the decomposition
Diag(xd1 ) W Diag(xd1 ) + Diag(a1 (x), . . . , an (x)),
where W and each ai (x) are given as
W = 2d2 B + 2d(d 1)Diag(B),

ai (x) = 2d(d 1)

Bij xd2
xdj .
i

j =i

i
i

6.4. Constructing Semidenite Representations

main
2012/11/1
page 277
i

277

If B  0 and d 1, then W  0 and must be sos; if each Bij 0 and d > 0 is even,
then all ai (x) are sos. Therefore, when B  0, every Bij 0 and d > 0 is even, the
form (xd )T Bxd is sos-convex, and by Theorem 6.31 the projected spectrahedron R
given by (6.12) is a semidenite representation for S.
Sos-convexity is a very strong condition, and not all convex polynomials are
sos-convex. An explicit example is given in [1]. More generally, a nonnegative
convex polynomial need not be a sum of squares (cf. Chapter 4). Generally, the
projected spectrahedron R given by (6.12) does not equal S if gi are not sos-concave.
On the other hand, sos-convexity can be veried by semidenite programming.
A polynomial f is sos-convex if and only if its Hessian 2 f is sos. This can be
checked numerically by solving a single SDP feasibility problem, and therefore, sosconvexity is a favorable condition in practice.

6.4.4

Strictly Convex Sets

When S is not sos-convex, the basic moment relaxation R given by (6.12) might
not be a semidenite representation of S. The projected spectrahedra SN in (6.13)
and SN in (6.14) are better candidates for a semidenite representation of S. We
now examine weaker conditions than sos-convexity that guarantee that SN = S (or
SN = S) for some nite N .
A sucient condition for SN = S or SN = S is the bounded degree representation (BDR) introduced by Lasserre in [19]. BDR is typically very dicult to check.
More easily checkable conditions are strict convexity and strict quasi-convexity. We
now discuss these cases.
Bounded Degree Representation Condition
A general approach for showing that a moment relaxation produces a semidenite
representation is given in the proof of Theorem 6.31. The key point is to prove a
weighted sos representation with uniform degree bounds for all linear functionals
nonnegative on S. If a linear functional aT x + b is positive on S, then Putinars
Positivstellensatz [37] says that
aT x + b = 0 + 1 g1 + + m gm ,

(6.16)

where each i is an sos polynomial. To make sure that (6.16) holds, we require that
the presentation of S satises the archimedean condition: there exist sos polynomials
s0 , s1 , . . . , sm and a number M > 0 such that
M
x
22 = s0 + s1 g1 + + sm gm .
The archimedean condition implies that S is compact, but the reverse is not necessarily true. However, the presentation of any compact set S can be strengthened to satisfy the archimedean condition by adding a redundant ball constraint
M
x
22 0 for a suciently large M . Generally, the degrees of the polynomials
i in (6.16) go to innity as the minimum value of aT x + b on S tends to zero.

i
i

278

main
2012/11/1
page 278
i

Chapter 6. Semidenite Representability

Moment relaxations present the dual side of sum of squares relaxations. If we


have SN = S for some N then any linear functional positive on S is also positive on SN . Linear functionals positive on SN have a weighted sum of squares
representation with bounded degree 2N :
aT x + b > 0 on S

aT x + b = 0 + 1 g1 + + m gm

with i sos and deg(i gi ) 2N for all i. If almost all positive linear functionals
on S have such a representation, then we say that the presentation of S admits a
PutinarPrestel bounded degree representation (PP-BDR) of order N (cf. [19]).
For the Schm
udgen moment relaxation SN in (6.14), to guarantee that SN = S
for some order N , we need a Schm
udgen bounded degree representation (S-BDR) of
order N (cf. [19]): for almost every pair (a, b) Rn R
aT x + b > 0 on S

aT x + b =

g ,

{0,1}m

with every being sos and deg( g ) 2N .


Theorem 6.36 ([19]). Suppose S is dened by (6.10) and is compact.
(a) If PP-BDR of order N holds for S, then conv(S) equals SN .
(b) If S-BDR of order N holds for S, then conv(S) equals SN .
Proof. We sketch the proof given by Lasserre in [19].
(a) We have already seen in the construction of (6.13) that S SN , regardless
of whether S is convex. Therefore, conv(S) SN . Now we prove the reverse
containment by contradiction. Suppose there exists a y satisfying the linear matrix
/ conv(S). Since conv(S) is closed,
inequalities in (6.13) and x
= (
ye1 , . . . , yen )
there exists a hyperplane strictly separating x
from conv(S): there exist a nonzero
a Rn and b R such that
aT x + b > 0 for all x S,

aT x + b < 0.

Since conv(S) is compact, we can choose the above (a, b) generically. Since PP-BDR
of order N holds for the presentation of S, there exist sos polynomials 0 , . . . , m
such that (6.16) is true and deg(i gi ) 2N . For each i, we can nd a symmetric
Wi  0 such that i (x) = [x]TN di Wi [x]N di with di = 'deg(gi )/2(. Replacing each
monomial x by y , we get
/
.
/
.
)
)
aT x
+ b = T r L(N
y )W0 + + T r L(N
y )Wm 0,
g0 (
gm (
which contradicts the previous assertion that aT x
+b < 0. Therefore, conv(S) = SN .
Part (b) is proved in almost exactly the same way.

i
i

6.4. Constructing Semidenite Representations

main
2012/11/1
page 279
i

279

Generally, it is quite dicult to check PP-BDR for a given semialgebraic set.


In the denition of PP-BDR, we require that almost every linear polynomial aT x+b
positive on S admits a representation like (6.16) with uniform degree bounds on the
sos polynomials i . This is almost impossible to check in practice. Indeed, to check
PP-BDR, one often needs to prove that the BDR holds for every linear polynomial
aT x + b that is nonnegative on S. Interestingly, this stronger version of PP-BDR
is satised under some general conditions (cf. [12]). The situation for S-BDR is
similar.
Theorem 6.37 ([12]). Suppose S = {x Rn : g1 (x) 0, . . . , gm (x) 0} is
compact and convex and has nonempty interior. Assume each gi (x) is concave on
S and let Si := {x : gi (x) 0}.
(i) Suppose for each i, either 2 gi (x) is sos or 2 gi (u) 0 for all u
Si S. Then there exists N > 0 such that if aT x + b is nonnegative on S,
then

g
aT x + b =
{0,1}m

for some sos polynomials satisfying deg( g ) 2N . Thus, the S-BDR of


order N holds for S.
(ii) If, in addition, S satises the archimedean condition, then the PP-BDR of
some order N  holds for S.
A detailed proof of Theorem 6.37 is given in [12]. We sketch the basic ideas
behind the proof. Suppose aT x + b is nonnegative on S and aT u + b = 0 for some
point u S. Under some general assumptions, the KKT conditions hold at u, i.e.,
there exist Lagrange multipliers 1 0, . . . , m 0 such that
a=

m


i gi (u),

1 g1 (u) = = m gm (u) = 0.

i=1

Let L(x) be the Lagrangian dened in (6.15). Note that L(u) = 0 and L(u) = 0.
By Taylor expansion

, 1, t
m


L(x) = (x u)T
i
2 gi (u + s(x u))ds dt (x u).
i=1

0
0
!
"

Hi (x,u)

For any xed u, Hi (x, u) is a matrix polynomial in x. If each Hi (x, u) is an sos


matrix in x, then L(x) must be sos since each i 0. This further implies that
aT x+b has the desired Putinar- or Schm
udgen-type representation. Conditions such
as sos-convexity or strict convexity ensure that each matrix polynomial Hi (x, u)
admits a Putinar- or Schm
udgen-type weighted sos representation with uniform
degree bounds that are independent of u, and hence the PP-BDR or S-BDR holds.

i
i

280

main
2012/11/1
page 280
i

Chapter 6. Semidenite Representability

We close by noting that if the set S is convex, then Theorem 6.37 gives concrete
conditions under which SN and SN give semidenite representations of S.
Convex sets with positively curved boundaries
When a semialgebraic set S is convex its dening polynomials are not necessarily
concave. For instance, the hyperboloid {x R2+ : x1 x2 1 0} is convex, while its
dening polynomial is neither concave nor convex. However, because of convexity,
the boundary of S must have nonnegative curvature at smooth points (see Theorem 6.17). Therefore, the dening polynomials are quasi-concave at smooth points.
This observation leads to weaker conditions, such as strict quasi-concavity of the
dening polynomials.
Theorem 6.38 ([15]). Assume that the set S dened in (6.10) is compact and
convex and has nonempty interior. If each gi (x) is either sos-concave or strictly
quasi-concave on S, then SN equals S for N suciently large. If, in addition, the
archimedean condition holds, then SN equals S for N suciently large.
The proof of Theorem 6.38 is based on Theorem 6.37. The basic idea is that
we are able to nd a dierent set of strictly concave dening polynomials for S by
using strict quasi-concavity. When gi (x) is strictly quasi-concave on S, we can nd
a polynomial hi (x) positive on S such that pi (x) = gi (x)hi (x) is strictly concave
on S. We refer to [15] for the details of the proof but provide an example below.
Consider the set


1
S = x R2 : g1 (x) := x1 x2 1 0, g2 (x) := (x1 1)2 (x2 1)2 0 .
9
The set S is compact and convex. The polynomial g1 is strictly quasi-concave, but
not concave. However, the set S can also be equivalently described as




p1 (x) := (x1 x2 1)(3 x1 x2 ) 0

2
S= xR
,
g2 (x) := 19 (x1 1)2 (x2 1)2 0
where p1 (x) is strictly concave on S.
For a convex basic closed semialgebraic set S, the Putinar moment relaxation
produces a semidenite representation of S only if all faces of S are exposed (cf.
[25]). There are further dierent conditions under which SN or SN gives semidenite
representations of S (cf. [20, 28]).

6.4.5

Generalizations

In many applications convex sets are naturally dened by rational function inequalities or polynomial matrix inequalities. In these cases semidenite representations
can also be constructed by using moments. We show some examples without going
into the details. Further results on these topics can be found in [28, 29].

i
i

6.4. Constructing Semidenite Representations

main
2012/11/1
page 281
i

281

1
0.8
0.6
0.4

x2

0.2
0
2
4
6
8

0
x1

0.5

Figure 6.6. The convex set dened by x21 + x22 x41 + x21 x22 + x42 .

First, we consider the case of a convex set dened by a rational function


inequality, f (x) 0. Of course, one could describe this set by polynomial inequalities. However, by doing so, one might lose some nice properties of f (e.g.,
concavity of the dening inequalities could be lost). As we have seen in the preceding subsections, convexity plays a crucial role in constructing valid semidenite
representations. When f is rational, to construct a semidenite relaxation, we need
moments with fractional weights. The general constructions are described in [28].
Here is an example.
Example 6.39 ([28]). Consider the two-dimensional set S dened by the polynomial inequality x21 + x22 x41 x21 x22 x42 0. Its dening polynomial is not concave
in R2 (it is actually convex near the origin). This set is drawn in Figure 6.6. Clearly,
S can also be described by the rational inequality
f (x) = 1

x41 + x21 x22 + x42


0.
x21 + x22

Interestingly, the rational function f (x) is concave everywhere. It satises a socalled rst order sos-concavity condition, and the set S admits the following semidefinite representation (cf. [28]):

y = (yij ), z = (zij ), s.t.



2
1 y20 + z04
xR

L1 (x, y, z) + L2 (x, y, z)  0

i
i

282

main
2012/11/1
page 282
i

Chapter 6. Semidenite Representability

where L1 , L2 are linear pencils dened as

0 0
0
1
0
0
0 1
0
x1
x2
0

0 0
0
x
0
0
2
,

L1 (x, y, z) =

1
x
x
y

y
y
y
1
2
20
02
11
02

0 x2 0
y11
y02 0
0
0
0 0
0
y02

z00
z10
z01 z02
z11
z02
z10 z02
z
z
z
z12
11
12
03

z01
z
z
z
z
z03
11
02
03
12
L2 (x, y, z) =
z02 z12 z03
z
z
z
04
13
04

z11 z03
z12 z13 z04
z13
z02
z12
z03 z04
z13
z04

The lifting variables yij correspond to regular moments, while zij correspond to
moments with the weight (x21 + x22 )1 , i.e., the integrals of type
,
xi xj
d(x)
2
x1 + x22
with respect to some positive measure on Rn . The details of constructing L1 , L2
are described in [28].
Now we consider the case of a convex set dened by a polynomial matrix inequality. A semidenite relaxation as in (6.12) can be constructed by using moments.
Under a matrix sos-convexity condition, this construction gives a semidenite representation of the convex set (cf. [29]).
Example 6.40 ([29]). Consider the set S dened by the polynomial matrix inequality:

2 x21 2x23
1 + x1 x2
x1 x3
1 + x1 x2
2 x22 2x21
1 + x2 x3  0.
x1 x3
1 + x2 x3
2 x23 2x22
The above quadratic matrix polynomial is matrix sos-concave (cf. [29]). A picture of
this set is drawn in Figure 6.7. As in (6.12), a basic moment semidenite relaxation
of S is

2 y200 2y002 1 + y110


y101

1 + y110 2 y020 2y200 1 + y011


0


y
1
+
y
2

2y

101
011
002
020


y
s.t.
ijk
3

xR
.
1
x1
x2
x3

x1 y200 y110 y101



x2 y110 y020 y011  0


x3 y101 y011 y002
Indeed, the above is a semidenite representation of S, as shown in [29]. Therefore,
S is a projected spectrahedron.

i
i

6.4. Constructing Semidenite Representations

main
2012/11/1
page 283
i

283

0.5

1
0.5

1
0.5

Figure 6.7. The convex set in Example 6.40.

6.4.6

Convex Hulls of Unions

Suppose we can divide a convex set S into several parts and nd a semidenite representation for each piece. Then a natural question is whether these representations
can be glued together to provide a semidenite representation of S. This brings us
to the main question of this section: Is the convex hull of a union of projected
spectrahedra a projected spectrahedron? If so, how can we construct a semidenite
representation of it? Interestingly, there exist positive answers to these questions.
A simple implementation of the above idea is to cover the compact set by
nitely many balls. If the intersection of each ball with the convex set is a projected spectrahedron, then we can glue them together to get a uniform semidenite
representation for the whole convex set. This approach is called localization. The
necessary tool is building a single semidenite representation for the convex hull of
several projected spectrahedra. Since balls (ellipsoids) are spectrahedra, the question of semidenite representability of a convex set reduces to the representability
of the intersections of balls with the boundary of the set. Thus we can focus on
local properties of the boundary.
Let W1 , . . . , Wm Rn be convex sets. Their Minkowski sum is the convex set
dened as

 m


W1 + + Wm :=
xk xk Wk , k = 1, . . . , m .

k=1

If each Wk is a projected spectrahedron described by


Nk
n
.
/


(k)
(k) (k)
xi Bi +
yj Cj  0,
Lk x, y (k) := A(k) +
i=1

(6.17)

j=1

i
i

284

main
2012/11/1
page 284
i

Chapter 6. Semidenite Representability

then a semidenite representation for the Minkowski sum W1 + + Wm is



 m


.
/
.
/
.
/

(k)
(k) (k)
(1) (1)
(m) (m)
 0 for pairs x , y
,..., x ,y
.
x Lk x , y

k=1

Lemma 6.41. If W1 , . . . , Wm are nonempty convex sets, then



m
<
<
Wk =
(1 W1 + + m Wm ) ,
conv
k=1

(6.18)

where m = { Rm
+ : 1 + + m = 1} is the standard simplex.
Proof. The proof follows readily from the denitions of convex hull and Minkowski
sum. See, for instance, [13].
Using Lemma 6.41, we can get a single semidenite representation for the
convex hull conv(m
k=1 Wk ) from those of the individual Wk .
Theorem 6.42 ([13]). Let W1 , . . . , Wm be nonempty projected spectrahedra dened
in (6.17), and W := conv(m
k=1 Wk ) be the convex hull of their union. Dene



k , u(k) (k = 1, . . . , m)


(k)
1 , . . . , m 0, 1 + + m = 1,
.
(6.19)
C :=
x


k=1

k A(k) + n x(k) B (k) + Nk u(k) C (k)  0


i
j
j=1 j
i=1 i
Then W C and C = W . If, in addition, every Wk is bounded, then C = W .
Proof. This is left as an exercise.
Example 6.43. Let W1 , W2 be the spectrahedra dened by




x1
2 + x1 x2 + 1
x2 1
 0,
 0.
x2 + 1 x1
x2 1 2 x1
They are unit balls centered at (1, 1). The convex hull of their union has the
semidenite representation




(1)
(1)

+
x
x
+


1
1
1
2

0


(1)
(1)


x
+

x
1

2
1




(1)
(2)
(2)
(2)
x=x +x
.
x
x

2
1
2


(2)
(2)  0

x2 2 22 x1


1 + 2 = 1, 1 , 2 0

i
i

6.4. Constructing Semidenite Representations

main
2012/11/1
page 285
i

285

Setting x(2) = (u1 , u2 ), we get a projected spectrahedron with three lifting variables:


21 + x1 u1 x2 u2 + 1

0

x2 u2 + 1

u1 x1




2
u1
u2 + 1 1
.
xR
0


u
+

1
2

2
1
1
1


1 0, 1 1 0
When some Wk are unbounded, C and the convex hull W may not be equal,
but they have the same closure and interior. Note that both C and W are not
necessarily closed even when all Wi are.
Example 6.44 ([13]). (i) Consider the following spectrahedra:




x1 1
2
W1 = x R :
 0 , W2 = {0}.
1 x2
Their convex hull is
conv(W1 W2 ) = {x R2+ : x1 = x2 = 0 or x1 x2 > 0}.
However, the set C in (6.19) is


x
x R2 : 0 1 1, 1
1



1
 0 = R2+ .
x2

So, C = conv(W1 W2 ), but they have same interior. Both W1 and W2 are closed
while conv(W1 W2 ) is not.
(ii) Consider the projected spectrahedra


x1
2
W1 = x R : u 0,
1 + x2



1 + x2
0
1+u

and W2 = {0}. We have


conv(W1 W2 ) = {x R2 : x1 > 0, or x1 = 0 and 1 x2 0},
conv(W1 W2 ) = {x R2 : x1 0},
and hence, conv(W1 W2 ) is not closed. However, one can verify that the projected
spectrahedron C is equal to conv(W1 W2 ).
As seen above, the equality C = W is possible even if W1 , . . . , Wm are not
all bounded. In particular, we always have C = W if every Wk is homogeneous
(i.e., A(k) = 0 in the semidenite representation of Wk ). This fact is implied by
Lemma 6.41.

i
i

286

main
2012/11/1
page 286
i

Chapter 6. Semidenite Representability

Example 6.45 ([13]). Consider the two spectrahedra in R2 dened by






x1 1
x1 1
 0,
 0.
1
x2
1 x2
The convex hull of their union is the open half space {x : x2 > 0}, which is precisely
equal to the projected spectrahedron C. The description of C can be simplied to





u1 x1
u1
1
1 1

 0,
0

u
1

u
1
2
2
1
2
x R2
.

2

u R , 0 1 1
This is a semidenite representation with three lifting variables.
Putting all of the above together we obtain the following result.
Theorem 6.46 ([13]). Let S Rn be a compact convex set. Then S is a projected
spectrahedron if and only if for every u S there exists > 0 such that the
intersection S B(u, ) is a projected spectrahedron.
Proof. The only if part is trivial, because the closed ball
B(u, ) = {x :
x u
2 }
is a spectrahedron. For the if part, suppose for every u S and some u > 0,
the set S B(u, u ) is a projected spectrahedron. Note that {B(u, u ) : u S} is
an open cover for the compact set S. So there are a nite number of balls, say,
B(u1 , 1 ), . . . , B(uL , L ), to cover S. Note that
 L

 L

<
<
S = conv(S) = conv
(S B(uk , k )) conv
(S B(uk , k )) S.
k=1

k=1

The sets S B(uk , k ) are all bounded. By Theorem 6.42, we see that

 L
<
S B(uk , k )
S = conv
k=1

is a projected spectrahedron.

6.4.7

Sucient Conditions for Semidenite Representability

We now have all the tools to present a sucient condition for a compact convex
semialgebraic set S Rn to be a projected spectrahedron. The condition essentially
requires that the boundary of S has positive curvature.
Theorem 6.47 ([13]). Suppose S Rn is a compact convex set dened by
S

m
<

k
Tk := {x Rn : g1k (x) 0, . . . , gm
(x) 0},
k

k=1

i
i

6.4. Constructing Semidenite Representations

main
2012/11/1
page 287
i

287

where gik are polynomials. If for every u S and every gik satisfying gik (u) = 0,
Tk has interior near u (i.e., for any > 0, the ball B(u, ) intersects the interior
of Tk ) and gik (x) is strictly quasi-concave at u, then S is a projected spectrahedron.
Theorem 6.47 is proved by applying Theorem 6.46. It is enough to show that
for every u S, there exists a ball B(u, ) so that S B(u, ) is a projected
spectrahedron. Note that S B(u, ) is a nite union of intersections Tk B(u, ).
By Theorem 6.38, every Tk B(u, ) is a projected spectrahedron, under the assumption of strict quasi-concavity of the dening polynomials. A complete proof of
this result can be found in Theorem 4.5 of Helton and Nie [13].
In Theorem 6.47, if the set S is not convex, but the other conditions are satised, then we can conclude that the convex hull of S is a projected spectrahedron.
Here we give some remarks on the conditions in Theorems 6.31, 6.37, and 6.47.
Theorem 6.31 assumes that all gi are sos-concave, which is the strongest assumption, but its conclusion is also the strongest: (6.12) is an explicit representation of S
as a projected spectrahedron. Theorem 6.37 assumes that gi are either sos-concave
or strictly quasi-concave, which is weaker than Theorem 6.31, and its conclusion
is also weaker: SN or SN provides a representation of S as a projected spectrahedron for some large enough N . Theorem 6.47 assumes the weakest condition, but
its conclusion is also the weakest: there exists a semidenite representation of S
(an explicit description is typically quite complicated).
By comparing Theorems 6.17 and 6.47, we can see that the presented necessary
and sucient conditions for semidenite representability are not too far away from
each other. The dierence between them is nonnegative versus positive curvature
and singularity versus nonsingularity.

6.4.8

Exercises

Exercise 6.48. Prove Lemma 6.32.


Exercise 6.49. Show that a polynomial f (x) is sos-convex if and only if the
following polynomial in (x, y) is sos:
f (x) f (y) f (y)T (x y).
Exercise 6.50. Show that a univariate polynomial f (x) is sos-convex if and only
if it is convex everywhere. Show that this is also true if f (x) is a bivariate quartic
polynomial.
Exercise 6.51. Let f (x) be a cubic polynomial that is concave over Rn+ and
S = {x Rn : x1 0, . . . , xn 0, f (x) 0}.
Show that the equality SN = S holds for some order N , where SN is given by (6.13).
What is the smallest value of such an N ?

i
i

288

main
2012/11/1
page 288
i

Chapter 6. Semidenite Representability

Exercise 6.52. Consider the following convex set


S = {x R2 : x1 0, x2 0, x51 + x52 1}.
Find a semidenite representation for S with the smallest number of lifting variables.
Exercise 6.53. Consider the following basic closed semialgebraic set:
S = {x R2 : x21 x22 (x21 + x22 )2 0, x1 0}.
Does there exist an order N > 0 such that SN = S, where SN is given by (6.13)?
If so, what is the smallest N making the equality occur? If no, give reasons. How
about SN given by (6.14)?
Exercise 6.54. For each of the following cases, nd a semidenite representation
with the smallest number of lifting variables for the convex hull of the union of the
given convex sets:
(a) Two balls B(1, 1), B(1, 1/2) in Rn (1 is the vector of all ones).
(b) Three pairwise touching balls in R2 :
(x1 + 1)2 + x22 1,

(x1 1)2 + x22 1,

x21 + (x2 2)2 ( 5 1)2 .

(c) Two elliptopes in R3 :

1
x1
x2

x1
1
x3

x2
x3  0,
1

1
x1 1 x2 1
x1 1
1
x3 1  0.
x2 1 x3 1
1

(d) The semidenite cone and nonnegative orthant embed in R3 :




x1
x2

x2
 0,
x3


x1
x2 0.
x3

Exercise 6.55. Let P be the set of univariate quadratic polynomials that are either
nonnegative on [1, 0] or nonnegative on [0, 1]. Find a semidenite representation
for the convex hull of P with the smallest number of lifting variables.
Exercise 6.56. Prove Theorem 6.42. (Hint: use Lemma 6.41.)
Exercise 6.57. Let T be a compact nonconvex set in Rn . Its convex boundary is
dened as c T := T conv(T ). Show that conv(c T ) = conv(T ). Is this also
true if T is not compact?

i
i

Bibliography

main
2012/11/1
page 289
i

289

Bibliography
[1] A. A. Ahmadi and P. A. Parrilo. A convex polynomial that is not sos-convex.
Math. Program., 135:275292, 2012.
[2] F. Alizadeh and D. Goldfarb. Second-order cone programming. Math. Program., 95:351, 2003.
[3] A. Ben-Tal and A. Nemirovski. Lectures on Modern Convex Optimization:
Analysis, Algorithms, and Engineering Applications, MPS/SIAM Ser. Optim.
SIAM, Philadelphia, 2001.
[4] G. Blekherman. Convex forms that are not sums of squares. Preprint, 2009.
http://arxiv.org/abs/0910.0656.
[5] D. P. Bertsekas. Convex Optimization Theory. Athena Scientic, Belmont, MA,
2009.
[6] J. Bochnak, M. Coste, and M.-F. Roy. Real Algebraic Geometry, Springer,
Berlin, 1998.
[7] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan. Linear Matrix Inequalities in System and Control Theory, SIAM Stud. Appl. Math. 15, SIAM,
Philadelphia, 1994.
[8] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University
Press, Cambridge, UK, 2004.
[9] P. Br
anden. Obstructions to determinantal representability. Adv. Math.,
226:12021212, 2011.
[10] J. B. Conway. A Course in Functional Analysis. Grad. Texts in Math. Springer,
Berlin, 1985.
[11] D. Cox, J. Little, and D. OShea. Ideals, Varieties, and Algorithms. An Introduction to Computational Algebraic Geometry and Commutative Algebra, 3rd
edition, Undergrad. Texts in Math. Springer, New York, 2007.
[12] J. W. Helton and J. Nie. Semidenite representation of convex sets. Math.
Program., 122:2164, 2010.
[13] J. W. Helton and J. Nie. Sucient and necessary conditions for semidenite
representability of convex hulls and sets. SIAM J. Optim., 20:759791, 2009.
[14] J. W. Helton and J. Nie. Structured semidenite representation of some convex
sets. Proceedings of 47th IEEE Conference on Decision and Control (CDC),
Cancun, Mexico, Dec. 911, 2008, pp. 47974800.
[15] J. W. Helton and J. Nie. Semidenite representation of convex sets and convex
hulls. In M. Anjos and J. Lasserre, editors, Handbook on Semidenite, Cone
and Polynomial Optimization: Theory, Algorithms, Software and Applications,
to appear.

i
i

290

main
2012/11/1
page 290
i

Chapter 6. Semidenite Representability

[16] J. W. Helton, S. McCullough, and V. Vinnikov. Noncommutative convexity


arises from linear matrix inequalities. J. Funct. Anal., 240:105191, 2006.
[17] J. W. Helton and V. Vinnikov. Linear matrix inequality representation of sets.
Comm. Pure Appl. Math., 60:654674, 2007.
[18] J. Lasserre. Global optimization with polynomials and the problem of moments.
SIAM J. Optim., 11:796817, 2001.
[19] J. Lasserre. Convex sets with semidenite representation. Math. Program.,
120:457477, 2009.
[20] J. Lasserre. Convexity in semialgebraic geometry and polynomial optimization.
SIAM J. Optim., 19:19952014, 2009.
[21] P. Lax. Dierential equations, dierence equations and matrix theory. Comm.
Pure Appl. Math., 6:175194, 1958.
[22] M. Marshall. Representation of non-negative polynomials, degree bounds and
applications to optimization. Canad. J. Math., 61:205221, 2009.
[23] Y. Nesterov and A. Nemirovski. Interior-Point Polynomial Algorithms in Convex Programming, SIAM Stud. Appl. Math. 13. SIAM, Philadelphia, 1994.
[24] A. Nemirovskii. Advances in convex optimization: conic programming. Plenary
Lecture, International Congress of Mathematicians (ICM), Madrid, Spain,
2006.
[25] T. Netzer, D. Plaumann, and M. Schweighofer. Exposed faces of semidenitely
representable sets. SIAM J. Optim., 20:19441955, 2010.
[26] T. Netzer and A. Thom. Polynomials with and without determinantal representations. Linear Algebra Appl., 437:15791595, 2012.
[27] J. Nie and M. Schweighofer. On the complexity of Putinars Positivstellensatz.
J. Complexity, 23:135150, 2007.
[28] J. Nie. First order conditions for semidenite representations of convex sets
dened by rational or singular polynomials. Math. Program. Ser. A, 131:136,
2012.
[29] J. Nie. Polynomial matrix inequality and semidenite representation. Math.
Oper. Res., 36:398415, 2011.
[30] J. Nie and B. Sturmfels. Matrix cubes parametrized by eigenvalues. SIAM J.
Matrix Anal. Appl., 31:755766, 2009.
[31] J. Nie, P. A. Parrilo, and B. Sturmfels. Semidenite representation of the
k-ellipse. In A. Dickenstein, F.-O. Schreyer, and A. Sommese, editors, Algorithms in Algebraic Geometry. Springer, New York, 2008, pp. 117132.

i
i

Bibliography

main
2012/11/1
page 291
i

291

[32] J. Nie and J. Demmel. Minimum ellipsoid bounds for solutions of polynomial
systems via sum of squares. J. Global Optim., 33:511525, 2005.
[33] P. A. Parrilo. Exact semidenite representation for genus zero curves. Talk at
the Ban Workshop Positive Polynomials and Optimization, Ban, Canada,
October 812, 2006.
[34] P. A. Parrilo and B. Sturmfels. Minimizing polynomial functions. In S. Basu
and L. Gonzalez-Vega, editors, Proceedings of the DIMACS Workshop on Algorithmic and Quantitative Aspects of Real Algebraic Geometry in Mathematics
and Computer Science (March 2001), American Mathematical Society, Providence, RI, 2003, pp. 83100.
[35] D. Plaumann, B. Strumfels, and C. Vinzant. Computing linear matrix representations of Helton-Vinnikov curves. In H. Dym, M. de Oliveira, and M. Putinar,
editors, Mathematical Methods in Systems, Optimization, and Control, Oper.
Theory Adv. Appl., Birkhauser, Basel, 2011.
[36] S. Prajna, A. Papachristodoulou, P. Seiler, and P. Parrilo. SOSTOOLS Users
Guide. Website: http://www.mit.edu/parrilo/sosTOOLS/.
[37] M. Putinar. Positive polynomials on compact semi-algebraic sets, Indiana Univ.
Math. J., 42:969984, 1993.
[38] K. Schm
udgen. The K-moment problem for compact semialgebraic sets. Math.
Ann., 289:203206, 1991.
[39] M. Spivak. A Comprehensive Introduction to Dierential Geometry. Vol. II,
2nd edition. Publish or Perish, Inc., Wilmington, DE, 1979.
[40] H. Wolkowicz, R. Saigal, and L. Vandenberghe, editors. Handbook of Semidefinite Programming. Kluwer, Amsterdam, 2000.

i
i

main
2012/11/1
page 292
i

main
2012/11/1
page 293
i

Chapter 7

Spectrahedral
Approximations of
Convex Hulls of
Algebraic Sets
Jo
ao Gouveia and Rekha R. Thomas

This chapter describes a method for nding spectrahedral approximations of the


convex hull of a real algebraic variety (the set of real solutions to a nite system
of polynomial equations). The procedure creates a nested sequence of convex approximations of the convex hull of the variety. Computations can be done modulo
the ideal generated by the polynomials which has several advantages. We examine
conditions under which the sequence of approximations converges to the closure of
the convex hull of the real variety, either asymptotically or in nitely many steps,
with special attention to the case in which the very rst approximation yields a
semidenite representation of the convex hull. These methods allow optimization,
or approximation of the optimal value, of a linear function over a real algebraic
variety via semidenite programming.

7.1

Introduction

A central problem in optimization is to nd the maximum (or minimum) value of


a linear function over a set S in Rn . For example, in a linear program
maximize {c, x : Ax b}
with c Rn , A Rmn , and b Rm , the set S = {x Rn : Ax b} is a
polyhedron, while in a semidenite program,


n

Ai xi  0
maximize c, x : A0 +
i=1
Jo
ao

Gouveia was partially supported by NSF grant DMS-0757371 and by Fundac


ao para a
Ci
encia e Technologia.

293

i
i

294

main
2012/11/1
page 294
i

Chapter 7. Convex Hulls of Algebraic Sets

with c Rn and symmetric


n matrices A0 , A1 , . . . , An , the feasible region is the set
S = {x Rn : A0 + i=1 Ai xi  0} which is a spectrahedron. In both cases,
S is a convex semialgebraic set as it is convex and can be dened by a nite list
of polynomial inequalities. A real algebraic variety, which is the set of all real
solutions to a nite list of polynomial equations, is a special case of a semialgebraic
set. Optimizing a linear function over any set S Rn , in particular, a real algebraic
variety, is equivalent to optimizing the linear function over the closure of conv(S),
the convex hull of S. In this chapter we describe a method to construct semidenite
approximations of the closure of the convex hull of a real algebraic variety.
Representing the convex hull of a real algebraic variety is a multifaceted problem that arises in many contexts in both theory and practice. In Chapter 5 we
saw a method using dual projective varieties for explicitly nding the polynomials
that describe the boundary of the convex hull of a real variety. These bounding
polynomials use the same variables as those describing the variety and can be highly
complicated. Their computation boils down to eliminating variables from a larger
polynomial system and can be challenging in practice, although they can be computed using existing computer algebra packages in examples with a small number of
variables. If one is allowed to use more variables than those describing the variety,
then there is more freedom in nding representations and approximations and the
key idea then is to express the convex hull implicitly as the projection of a higherdimensional object. This approach is more exible than the former and has the
potential to yield a representation of a complicated set as the projection of a simple
set in higher dimensions. The method we will describe adopts this philosophy for
nding approximations and representations of the convex hull of a real algebraic
variety.
We present a procedure for nding a sequence of approximations of the convex
hull of a real algebraic variety (sometimes just called an algebraic set) in the form of
projected spectrahedra. While the convex hull of a real algebraic variety is a convex
semialgebraic set, recall from Chapter 6 that it is not known which convex semialgebraic sets are projected spectrahedra. Regardless, we will develop an automatic
method that nds semidenite representations (as projected spectrahedra) for a
sequence of outer approximations of conv(S), when S is an algebraic set. In many
cases, these approximations will converge to conv(S). If our procedure yields an
exact representation of conv(S) as a projected spectrahedron, then as a by product
we can optimize a linear function over S by solving a semidenite program. In the
nice cases where the representation uses spectrahedra of small size (relative to the
size of S), semidenite programming becomes an ecient method for optimizing a
linear function over S. In fact, there are several families of algebraic sets where this
spectrahedral approach yields polynomial time algorithms for linear optimization.
Similarly, the spectrahedral approach can, in some cases, yield ecient algorithms
for nding good approximations of the optimal value of a linear function over S.
While we will see many examples of real algebraic varieties (and their dening
ideals) for which our method yields an exact representation of its convex hull in a
few iterations of our procedure, many open questions remain. For instance, there
is no complete understanding of when the method is guaranteed to converge to the
convex hull of the variety in nitely many steps of the procedure. Even in the
cases where nite convergence is guaranteed, good upper bounds on the number of

i
i

7.2. The Method

main
2012/11/1
page 295
i

295

iterations required by the procedure are lacking. The work presented in this chapter
was inspired by a question posed by Lov
asz in [19] that asked for a characterization
of ideals for which the rst approximation in our hierarchy will yield a semidenite
representation of the convex hull of the variety of the ideal. In Section 7.3 we answer
this question for nite varieties. The case of innite varieties is far less understood.
We identify conditions that prevent nite convergence of these approximations to
the closure of the convex hull of the variety. However, again a full characterization
is missing. Thus, the material in this chapter oers both advances in spectrahedral
representations of algebraic sets as well as many avenues for further research.
This chapter is organized as follows. In Section 7.2 we explain the procedure for nding spectrahedral approximations of the convex hull of an algebraic
set. These techniques were developed in [8], coauthored with Parrilo. One of the
key theorems needed in this section (Theorem 7.6) was strengthened in this presentation with the help of Greg Blekherman. We illustrate the method with various
examples and explain the underlying computations. In Section 7.3 we discuss the
situations in which this method converges, either asymptotically or nitely, to an
exact semidenite representation of the convex hull of the variety. The most useful
scenario is when the rst approximation yields an exact semidenite representation
of the convex hull of the variety. We characterize all nite varieties for which this
happens. We conclude in Section 7.4 with examples from combinatorial optimization where the underlying varieties are all nite. The methods we describe have
algorithmic impact on certain classes of combinatorial optimization problems and
the algebra becomes endowed with rich combinatorics in these cases.

7.2

The Method

Let f1 , . . . , fm R[x1 , . . . , xn ] =: R[x] be polynomials and


VR (f1 , . . . , fm ) := {x Rn : f1 (x) = f2 (x) = = fm (x) = 0}
be their set of real zeros. We are interested in representing conv(VR (f1 , . . . , fm )),
the convex hull of VR (f1 , . . . , fm ) in Rn as projected spectrahedra.
Recall that the ideal generated by f1 , . . . , fm in R[x] is the set
m


gi fi : gi R[x], m N R[x].
I = f1 , . . . , fm  =
i=1

The real variety of I is the set VR (I) := {x Rn : h(x) = 0 for all h I} of


real zeros of all polynomials in I. Note
that if s VR (f1 , . . . , fm ), then s VR (I)
m
since fi (s) = 0 implies that h(s) =
i=1 gi (s)fi (s) = 0 for all h I. Conversely, if s VR (I), then for all i = 1, . . . , m, fi (s) = 0 since fi I. Therefore,
VR (f1 , . . . , fm ) = VR (I), and our goal can be viewed more generally as wanting to
nd semidenite representations of the convex hull of the real variety of an ideal in
R[x], or approximations of it.
For any set S Rn , the closure of conv(S) is exactly the intersection of all
closed half spaces {x Rn : l(x) 0} as l varies over all linear polynomials that
are nonnegative on S. Throughout this chapter, linear polynomials include ane

i
i

296

main
2012/11/1
page 296
i

Chapter 7. Convex Hulls of Algebraic Sets

linear polynomials (those with a constant term). In particular, given an ideal I,


@
cl(conv(VR (I))) =
{x : l(x) 0}.
l linear, l|VR (I)0

It is not so clear how to work with this description. Even for a single linear polynomial l, checking whether l(x) is nonnegative on VR (I) is a dicult task. A natural
idea is to relax the condition l|VR (I) 0 to something easier to check, at the risk
of losing some of the l(x) in the above intersection, and obtaining a superset of
cl(conv(VR (I))). As seen already in Chapters 3 and 4, the classical method to
certify the nonnegativity of a polynomial on all of Rn is to write it as a sum of
squares (sos) of other polynomials. In our case, we just need to certify that l(x) is
nonnegative on VR (I), a subset of Rn .
Let denote the set of all sos polynomials in R[x], R[x]k the set of all 
polyh2j ,
nomials in R[x] of degree at most k, and 2k the set of all sos polynomials
where hj R[x]k . Nonnegativity of l(x) on VR (I) is guaranteed if
l(x) = (x) +

m


gi (x)fi (x)

(7.1)

i=1

for (x) and gi R[x], since then for all s VR (I), l(s) = (s) 0. In
Chapter 3 we saw that semidenite programming can be used to check whether a
polynomial is sos. In (7.1) we need to nd both (x) and the polynomials gi to
write l(x) as sos mod I. Therefore, to check (7.1) in practice, we impose degree
restrictions and proceed in one of two possible ways.
(i) In the rst method, we ask that 2k and gi fi R[x]2k for a xed positive
integer k and, if so, say that l(x) is k-sos mod {f1 , . . . , fm }. This is the basic
idea that underlies Lasserres moment method for approximating the convex
hull of a semialgebraic set described in Chapter 6.
(ii) In the second method, we ask only that 2k for a xed positive integer k
which reduces (7.1) to l(x) = (x)+h(x) where h(x) I. If this is the case, we
say that l(x) is k-sos mod I. This method is more natural if one is interested
in the geometry of VR (I) and conv(VR (I)) as it removes the dependence of the
method on the choice of a particular generating set of I. The only issue is if
the computation can be done in practice at the level of the ideal I and not
the input f1 , . . . , fm .
Both methods yield a hierarchy of convex relaxations of conv(VR (I)) obtained
as the intersection of all half spaces {x : l(x) 0} as l(x) ranges over the linear
polynomials that are k-sos in the sense of the method. Since if l(x) is k-sos mod
{f1 , . . . , fm } then it is also k-sos mod I, method (ii) yields a relaxation that is no
worse than that from method (i) for each value of k. On the other hand, method
(ii) requires the knowledge of a basis of R[x]/I as we will see below, which for some
problems may be hard to compute in practice. To see the computational dierences
that can occur between the two methods, consult Remark 7.14.

i
i

7.2. The Method

main
2012/11/1
page 297
i

297

In this chapter we focus on method (ii). The kth iteration of (ii) yields a
closed convex set, called the kth theta body of I, dened as
THk (I) := {x Rn : l(x) 0 for all l linear and k-sos mod I}.
Clearly VR (I), and hence cl(conv(VR (I))), is contained in THk (I) for all k. Thus the
theta bodies of I form a hierarchy of closed convex approximations of conv(VR (I))
as follows:
TH1 (I) TH2 (I) THk (I) THk+1 (I) cl(conv(VR (I))).
An immediate question is when this hierarchy converges to cl(conv(VR (I))) either
nitely or asymptotically. Finite convergence allows an exact representation of
cl(conv(VR (I))) as a theta body which would be extremely useful if we can represent
and optimize over a theta body eciently. We will show in Section 7.2.2 that each
THk (I) is the closure of a projected spectrahedron. This enables optimization
over a real variety using semidenite programming. In Section 7.4, we will learn
the motivation for the name theta bodies. We begin with some background on
working modulo a polynomial ideal.

7.2.1

Sum of Squares Modulo an Ideal

Let I R[x] be an ideal and VR (I) be its real variety. For two polynomials f, g
R[x], if f g I, then f (s) = g(s) for all s VR (I). If f g I, then f and g
are said to be congruent mod I, written as f g mod I. Congruence mod I is an
equivalence relation on R[x]. The equivalence class of f is denoted as f + I, and the
set of equivalence classes is denoted as R[x]/I. The set R[x]/I is both an R-vector
space and a ring over R where addition, scalar multiplication, and multiplication
are dened as follows. Given f, g R[x] and R, (f + I) + (g + I) = (f + g) + I,
(f + I) = f + I, and (f + I)(g + I) = f g + I. We will denote vector space bases
of R[x]/I by B in this chapter. By the degree of an equivalence class f + I, we mean
the smallest degree of an element in the class. With this denition, we may assume
that the elements of B are listed in order of increasing degree. Further, for each
k N, the set Bk of all elements in B of degree at most k is then well-dened.
Computations in R[x]/I can be done via Gr
obner bases of I. Recall that if
G is any reduced Gr
obner basis of I, then a polynomial h lies in I if and only
if the normal form of h with respect to G is zero. Therefore, f g mod I if
and only if the normal form of f g with respect to G is zero, or equivalently,
f and g have the same normal form with respect to G. This provides an algorithm
to check whether two polynomials are congruent mod I. The unique normal form
of all polynomials in the same equivalence class serves as a canonical representative
for this class given G. If M is the initial ideal of I corresponding to the reduced
Gr
obner basis G, then recall that the standard monomials of M form an R-vector
space basis for R[x]/I. Therefore, the normal form of a polynomial with respect
to G can be written as an R-linear combination of the standard monomials of the
initial ideal M . The vector space R[x]/I has many other bases, some of which may
be better suited for computations than the standard monomial bases coming from

i
i

298

main
2012/11/1
page 298
i

Chapter 7. Convex Hulls of Algebraic Sets

an initial ideal of I. See Chapter 3 for a discussion of alternative bases of R[x] and
hence R[x]/I. In this chapter we will use only a standard monomial basis of R[x]/I.
A quick tour of the algebraic notions needed in this chapter can be found in the
appendix. For a thorough introduction to the theory of Grobner bases and related
notions, we refer the reader to [6].
We now come to sum of squares polynomials modulo an ideal I, and the question of how to check whether a
polynomial f R[x] is k-sos mod I. A polynomial
f R[x] is sos mod I if f
h2j mod I for some hj R[x], and k-sos mod I
if hj R[x]k for all j. Hence, the equivalence classes of polynomials that are sos
mod I (respectively, k-sos mod I) are precisely those in
/I := { + I : }
(respectively, 2k /I). It is worthwhile to note that many polynomials that are not
sos in R[x] can become sos mod an ideal I. For instance, the univariate linear
polynomial x is congruent to x2 mod the ideal x x2  R[x].
Let [x]k denote the vector of all monomials in R[x]k in a xed order, say degree
lexicographic. Recall from Chapter 3 that a polynomial f 2k if and only if there
exists a positive semidenite matrix A, denoted A  0, such that f = [x]Tk A[x]k .
The matrix A can be solved for using semidenite programming
and a Cholesky
 2
hj for f , where hj (x)
factorization of it as A = V T V yields an sos expression
is the inner product of the jth row of V and the vector of monomials [x]k . This
method can be adapted to check whether f is k-sos mod I as follows. The vector
[x]k can be replaced by the vector of monomials from Bk , denoted as [x]Bk , since
R[x]k /I is spanned by Bk . Since the size of Bk is no larger than the size of a
basis of R[x]k , this can decrease the size of the unknown matrix A considerably,
making the nal SDP much smaller than before. Setting up A as a symmetric
matrix of indeterminates Aij and multiplying out [x]TBk A[x]Bk , we get a polynomial
obner
g R[x]2k . Let the normal forms of f and g with respect to a reduced Gr
basis G of I be f  and g  , respectively. Then since f f  and g g  mod I and f 
and g  are fully reduced with respect to G, we have that f g mod I if and only if
f  = g  . Therefore, to check if f is k-sos mod I, we equate the coecients of f  and
g  for like monomials and check whether the resulting linear system in the Aij s has
a solution with A  0.
Example 7.1. Consider the polynomial f (x, y) = x4 + y 4 + 2x2 y 2 x2 + y 2 and
the principal ideal I = f  R[x, y]. The real variety VR (I), which is the set of real
zeros of f , is a Bernoulli lemniscate (shown in Figure 7.1) with foci ( 12 , 0).
It is easy to check that the horizontal line y = 18 is a bitangent to VR (I) and
that l(x, y) := y + 18 is nonnegative on VR (I). Since f has degree 4 and l has
degree 1, l cannot be 1-sos mod I but has a chance to be 2-sos mod I. We apply
the method described above to verify this.
The set {f } is a reduced Grobner basis of I with respect to every term order.
The initial ideal of I under the total degree order with ties broken lexicographically
with x > y, is generated by x4 . Hence a basis B for R[x, y]/I is given by the innite
set of standard monomials of x4  R[x, y] which are all the monomials in x and y

i
i

7.2. The Method

main
2012/11/1
page 299
i

299

Figure 7.1. The lemniscate x4 + y 4 + 2x2 y 2 x2 + y 2 = 0 with a bitangent.


that are not divisible by x4 . In particular, B1 = {1, x, y}, B2 = {1, x, y, x2 , xy, y 2 },
and [x]B2 = (1 x y x2 xy y 2 ).
The general 2-sos polynomial mod I is therefore of the form

g=

1
x
y
x2
xy
y2

a11

a12

a13

a14

a15

a12

a22

a23

a24

a25

a13

a23

a33

a34

a35

a14

a24

a34

a44

a45

a15

a25

a35

a45

a55

a16

a26

a36

a46

a56

a16

a26

a36

a46

a56

a66

1
x
y
x2
xy
y2

where A = (aij )  0. Multiplying out the above expression we get that


g := a11 + 2a12 x + 2a13 y + (2a14 + a22 )x2 + (2a23 + 2a15 )xy + (2a16 + a33 )y 2
+ 2a24 x3 + (2a34 + 2a25 )x2 y + (2a26 + 2a35 )xy 2 + 2a36 y 3 + a44 x4 + 2a45 x3 y
+ (a55 + 2a46 )x2 y 2 + 2a56 xy 3 + a66 y 4 .
We now reduce g by the Grobner basis {f }, which means replacing every
occurrence of x4 with
y 4 2x2 y 2 + x2 y 2 ,
and obtain the normal form of g, which is
g  := a11 + 2a12 x + 2a13 y + (2a14 + a22 + a44 )x2 + (2a23 + 2a15 )xy + (2a16 + a33
a44 )y 2 + 2a24 x3 + (2a34 + 2a25 )x2 y + (2a26 + 2a35 )xy 2 + 2a36 y 3 + 2a45 x3 y
+ (a55 + 2a46 2a44 )x2 y 2 + 2a56 xy 3 + (a66 a44 )y 4 .
Since l(x, y) = y + 18 is already reduced with respect to {f }, if l is 2-sos
mod I, then l = g  , and hence to verify this, we need to check whether there exists
A  0 such that a11 = 18 , 2a13 = 1, and all other coecients of g  equal zero.
Writing out all the linear conditions, we need to check whether there exists a positive

i
i

300

main
2012/11/1
page 300
i

Chapter 7. Convex Hulls of Algebraic Sets

semidenite matrix of the form


1

0
8
0
a
22

1 a15
2
a14
0

a15 a25
a16 a26

21
a15
a33
a25
a26
0

a14
0
a25
a44
0
a46

a15
a25
a26
0
a55
0

a16
a26
0
a46
0
a44

that satises the conditions


2a14 + a22 + a44 = 0,
Check that the matrix

A=

2a16 + a33 a44 = 0,

23/2
0
1/2
23/2
0
23/2

0
0
0
0
0
0

1/2 23/2
0
0
21/2
0
0
21/2
0
0
0
21/2

a55 + 2a46 2a44 = 0.

0
0
0
0
0
0

23/2
0
0
21/2
0
21/2

is positive semidenite and satises the conditions given above. This matrix A
factors as A = V T V with


25/4 0
0 21/4 0 21/4
V =
,
25/4 0 21/4
0 0
0
and hence,
&

1
y
8

'

&
'2
%2
1
1 $
mod I.
2x2 + 2y 2 1 + 2 y
4 2
8

In general, nding exact sos expressions, as above, is dicult. This particular sos
decomposition was found by Bruce Reznick using a series of tricks. He showed that
( 18 y) + 12 ((x2 + y 2 )2 (x2 y 2 ))
/2
$ 2
%2 .
1
2
1
2x
+
2y

1
+
2
y

.
= 4
2
8
In practice, one can use an SDP solver to nd A. Using MATLAB, to do this
computation in YALMIP [17] we input the following code:
sdpvar a14 a15 a16 a22 a25 a26 a33 a44 a46 a55
A=[ 1/sqrt(8) 0
-1/2 a14 a15 a16;
0
a22 -a15 0
a25 a26;
-1/2
-a15 a33 -a25 -a26 0 ;
a14
0
-a25 a44 0
a46;
a15
a25 -a26 0
a55 0 ;
a16
a26 0
a46 0
a44];

i
i

main
2012/11/1
page 301

7.2. The Method

301

l1=2*a14 + a22 + a44;


l2=2*a16 + a33 - a44;
l3=a55 + 2*a46 -2*a44;
solvesdp([A>0,l1==0,l2==0,l3==0],0);

We ran this code with SeDuMi 1.1 as the underlying SDP solver in YALMIP. The
matrix can now be recovered by simply typing double(A) and we obtain

0.3536
0.0000 0.5000 0.4052 0.0000 0.1985
0.0000
0.1034
0.0000
0.0000 0.2924 0.0000

0.5000 0.0000
1.1041
0.2924
0.0000
0.0000

,
A=
0.2924
0.7071
0.0000
0.2936
0.4052 0.0000

0.0000 0.2924 0.0000


0.0000
0.8270
0.0000
0.1985 0.0000
0.0000
0.2936
0.0000
0.7071
in which the entries are shown up to four digits of precision. After factorizing A as
V T V we obtain the sos decomposition:
%2
$
0.5946427499 0.8408409925 y 0.6814175403 x2 0.3338138740 y 2
2
+ (0.3215587038 x 0.9093207446 xy)
%2
$
+ 0.6301479392 y 0.4452348146 x2 0.4454261796 y 2
$
%2
+ 0.2110357686 x2 0.6263671431 y 2
+ 0.0001357833655 x2y 2
+ 0.004928018144 y4,
which simplies to
+
+

0.3536000000 y
0.707(x4 + 2x2 y 2 + y 4 x2 + y 2 )
1011 (8.089965190 x2y 3.247827064 y 3).

This provides fairly strong computational evidence that l = 18 y is 2-sos mod I


even though it is not an exact 2-sos representation of l mod I.
The above approach becomes cumbersome as we search for higher and higher
degree sums of squares modulo an ideal. Luckily there are ways of using the existing
software to simplify our input. In our example, checking whether l is 2-sos modulo
I is the same as checking if there exists some R such that l(x, y) + f (x, y) is
sos, which can be done via YALMIP with the following commands:
sdpvar x y lambda
f=x^4+y^4+2*x^2*y^2-x^2+y^2;
l=1/sqrt(8)-y;
F=sos(l+lambda*f);
solvesos(F,0,[],lambda);
sdisplay(sosd(F))

The last command will actually display a list of polynomials whose squares
sum up to (approximately) l(x, y) + f (x, y). In our example, the following output
is obtained

i
i

302

main
2012/11/1
page 302
i

Chapter 7. Convex Hulls of Algebraic Sets


-0.5919274724+0.8880*y+0.6222*x^2+0.3571*y^2
-0.03240303655-0.5699*y+0.4037*x^2+0.6602*y^2
-0.3036*x+0.8587*x*y
-0.0461010126-0.1559*y+0.3963*x^2-0.3792*y^2
9.2958e-05*x+3.2868e-05*x*y
3.789017278e-05+1.3396e-05*y+1.4209e-05*x^2+4.7355e-06*y^2

which should be interpreted as saying that l(x, y) is the sum of squares of the
polynomials shown on each line. Note that the last two polynomials in the list
above again point to the fact that the software only provided reasonable evidence
that l(x, y) is 2-sos mod I.
The above computations also give a glimpse into the intertwining of algebraic
and numerical methods that is prevalent in convex algebraic geometry. The question
of whether a polynomial is a sum of squares modulo an ideal is purely algebraic.
However, the search for an sos expression is done via semidenite programming
which is solved using numerical methods. The answer provided by these numerical
solvers is often not exact. Massaging the numerical information into a certiable
answer can sometimes be an art.
Example 7.2. Consider the polynomial g(x, y) := y 2 (1 x2 ) (x2 + 2y 1)2 and
the ideal I = g(x, y) dening the bicorn curve shown in Figure 7.2. It is clear
that y 0 over the curve. Instead of checking if y is k-sos mod I for some k (which
is never the case as we will see in the next section), it is in general more useful to
search for the smallest such that y + is k-sos mod I. That way, if y is not sos
mod I, we will at least obtain a valid inequality y + 0 on VR (I) which will then
be valid for THk (I). In general, y + is k-sos mod I if there exists some polynomial
h(x, y) of degree 2k 4 such that (y + ) + h(x, y)g(x, y) is sos. As before, this can
be checked easily using YALMIP.
k=2;
sdpvar x y mu
[h,c]=polynomial([x y],2*k-4);
g=y^2*(1-x^2)-(x^2+2*y-1)^2;
F=sos(y+mu-h*g);
solvesos(F,mu,[],[mu;c]);

Figure 7.2. A bicorn curve.

i
i

7.2. The Method

main
2012/11/1
page 303
i

303

By successively setting k to be 2, 3, and 4, we get that the minimum value of


(recovered using double(mu)) is 0.1776, 0.0370, and 0.0161, respectively. So while
is approaching 0, it seems that y is at least not 4-sos mod I.

7.2.2

Theta Bodies

We now come back to theta bodies of the ideal I and their representations. Recall
that the kth theta body of I is
THk (I) := {x Rn : l(x) 0 for all l linear and k-sos mod I}.
Given any polynomial, it is possible to check whether it is k-sos mod I using Grobner
bases and semidenite programming as seen in Section 7.2.1. The bottleneck in using the denition of THk (I) in practice is that it requires knowledge of all the linear
polynomials (innitely many) that are k-sos mod I. To overcome this diculty we
will now derive an alternative description of THk (I) as a projected spectrahedron
(up to closure) which enables computations via semidenite programming.
We may assume that there are no linear polynomials in the ideal I since
otherwise, some variable xi is congruent to a linear combination of other variables
mod I, and we may work in a smaller polynomial ring. Therefore, R[x]1 /I
= R[x]1
and {1 + I, x1 + I, . . . , xn + I} can be completed to a basis B of R[x]/I. Recall
the denition of degree of f + I. We will assume that each element in a basis
B = {fi +I} of R[x]/I is represented by a polynomial whose degree equals the degree
of its equivalence class, and that B is ordered so that deg(fi + I) deg(fi+1 + I).
Further, Bk denotes the ordered subset of B of degree at most k.
Denition 7.3. Let I R[x] be an ideal. A basis B = {f0 + I, f1 + I, . . .} of R[x]/I
is a -basis if it has the following properties:
1. B1 = {1 + I, x1 + I, . . . , xn + I}.
2. If deg(fi + I), deg(fj + I) k, then fi fj + I is in the R-span of B2k .
Our goal will be to rst express the kth theta body THk (I) as the closure
of a certain set of linear functionals on the k-sos polynomials mod I. This will be
achieved in Theorem 7.6. In the case where I contains the polynomials x2i xi
for all i = 1, . . . , n, the closure can be removed (Theorem 7.8). Such ideals appear
in combinatorial optimization and hence this result will have an important role in
Section 7.4. After this, we use a -basis of the quotient ring R[x]/I to turn the
description of THk (I) in Theorem 7.6 to an explicit semidenite representation.
This allows concrete computations and examples. We proceed toward Theorem 7.6.
In what follows, we identify a linear polynomial + a, x R[x]1 with the
vector (, a) Rn+1 . Let k1 (I) := {f + I : f R[x]1 , f k-sos mod I}. Then k1 (I)
is a cone in the vector space R[x]1 /I
= R[x]1 , and its dual cone k1 (I) lives in

n+1
. Thus,
(R[x]1 /I) = R[x]1 = R
k1 (I) = {(t, x) R Rn : t + a, x 0 for all (, a) k1 (I)}.

i
i

304

main
2012/11/1
page 304
i

Chapter 7. Convex Hulls of Algebraic Sets

Consider the hyperplane H := {(1, x) : x Rn } in Rn+1 . We may think of H also


as H = {L (R[x]1 /I) : L(1 + I) = 1}. It then follows immediately that
{1} THk (I) = k1 (I) H.

(7.2)

Lemma 7.4. The hyperplane H intersects the relative interior of k1 (I) .


Proof. A sucient condition for a hyperplane L to intersect the relative interior of
a closed convex cone P is that cl(cone(relint(P L))) = P . If L does not intersect
the relative interior of P , then P L is contained in some proper face F of P
(possibly the empty face). Therefore, cl(cone(relint(P L))) is also contained in
this face which is a proper subset of P .
By (7.2), C := {(, x) : 0, x relint(THk (I))} is the cone over the
relative interior of k1 (I) H. We will show that cl(C) = k1 (I) . Let (, a) k1 (I)
and x relint(THk (I)). Then since x THk (I), 0 + a, x = (, a), (1, x)
which implies that 0 (, a), (, x) for all 0. Hence C k1 (I) , and since
k1 (I) is closed, cl(C) k1 (I) .
Suppose k1 (I)  cl(C). Then there exists (t, x) k1 (I) \cl(C). Since the
constant polynomial 1 lies in k1 (I) and (t, x) k1 (I) , t 0. Also, since cl(C)
is closed and there exists (s, y) C with s > 0, we can nd a small enough > 0
such that (t, x) + (s, y) k1 (I) \cl(C), and the rst coordinate of (t, x) + (s, y)
is positive. Scaling this element, we may assume that there is an element (1, x)
k1 (I) \cl(C). Since (1, x) k1 (I) , + a, x 0 for all (, a) k1 (I), which
implies that x THk (I) and hence (1, x) cl(C), which is a contradiction.
We will also need the following lemma which can be proved using standard
tools of convex geometry.
Lemma 7.5. Let P be a closed convex cone and Q be a convex subcone of P such
that cl(Q) = P . Then relint(P ) Q, and for any ane hyperplane H passing
through the relative interior of P , P H = cl(Q H).
We now examine the cone k1 (I) more closely. Let k (I) denote the set of
all f + I such that f is k-sos mod I. Then k (I) = 2k /I is a cone in R[x]2k /I,
and k1 (I) = k (I) R[x]1 /I. Therefore, the dual cone of k1 (I) in (R[x]/I) is the
closure of the projection of k (I) into (R[x]1 /I) as explained in Section 2.1 of
Chapter 5. Hence we may identify k1 (I) with the closure of the set
Sk (I) := {(L(1 + I), L(x1 + I), . . . , L(xn + I)) : L k (I) }.
:
;
Further, dene Qk (I) := (L(x1 + I), . . . , L(xn + I)) : L k (I) , L(1 + I) = 1 .
We will see shortly that Qk (I) is a projected spectrahedron, but rst we establish
the connection between THk (I) and Qk (I).
Theorem 7.6. THk (I) = cl(Qk (I)).

i
i

7.2. The Method

main
2012/11/1
page 305
i

305

Proof. Since {1} Qk (I) = Sk (I) H, we have {1} cl(Qk (I)) = cl(Sk (I) H).
Since cl(Sk (I)) = k1 (I) , it follows from (7.2) that {1} THk (I) = cl(Sk (I)) H.
Therefore, the theorem will follow if we can show that
cl(Sk (I)) H = cl(Sk (I) H).
By Lemma 7.5, this equality holds if H intersects Sk (I) in its relative interior. Again, by Lemma 7.5, relint(k1 (I) ) Sk (I). Lemma 7.4 showed that H
intersects the relative interior of k1 (I) and hence the relative interior of Sk (I).
We now focus on an important situation where the closure is not needed in
Theorem 7.6. In many cases in practice, we are interested in nding the convex hull
of a set S Rn that may not be presented as the real variety of an ideal. However,
the approximation THk (I) of conv(S) is dened with respect to an ideal I whose
real variety is S. In this case, the canonical choice for such an ideal is the vanishing
ideal of S, denoted as I(S), which consists of all polynomials in R[x] that vanish
on S. The real radical of an ideal I R[x] is the ideal
4
5


R
I = f R[x] : f 2m +
gi2 I, m N, gi R[x] ,

and the ideal I is said to be real radical if I = R I. The real Nullstellensatz [21]
states that I is real radical if and only if I = I(VR (I)). This is the analogue of
Hilberts Nullstellensatz for real algebraic varieties. Computing any ideal I such that
VR (I) = S might be hard, and in general, computing I(S), given S, might also be
hard. However, in many cases of practical interest, I(S) is available. A large source
of such examples is combinatorial optimization, where S is usually a nite set of
0/1 points for which a generating set for I(S) can be computed using combinatorial
arguments. We will see several such examples in Section 7.4. If S is a subset of
{0, 1}n and I = I(S), then Theorem 7.6 can be improved to Theorem 7.8. We rst
prove a lemma.
Lemma 7.7. Let J be any ideal that contains x2i xi for all i = 1, . . . , n. Then
1 + J is in the relative interior of k (J) = {f + J : f is k-sos mod J.
Proof. Let I := x2i xi for all i = 1, . . . , n. We will rst show that 1 + I is in
the relative interior of k (I) R[x]2k /I. The cone k (J) is a projection of k (I)
since I J, and hence, if 1 + I relint(k (I)), then 1 + J relint(k (J)). 1 + I
is in the relative interior of k (I), which is a cone in the vector space R[x]2k /I.
We will show that for any polynomial p R[x]2k , (1 + p) + I k (I) for
some > 0. Since we are working modulo I, we may assume that every monomial
in p is square-free. Further, since every monomial is a square modulo I, it suces
to show that (1 q) + I k (I) for any square-free monomial q of degree at most
2k and some > 0. Write q = q1 q2 for some square-free monomials q1 , q2 of degree
at most k. Now note that
(1 q2 )2 = 1 2q2 + q22 1 q2 mod I, and
(1 q1 + q2 ) = 1 + q12 + q22 2q1 + 2q2 2q1 q2 1 q1 + 3q2 2q1 q2 mod I.
2

i
i

306

main
2012/11/1
page 306
i

Chapter 7. Convex Hulls of Algebraic Sets

Therefore, (1 q1 + q2 )2 + 3(1 q2 )2 + q12 4 2q1 q2 = 4 2q mod I. Since


q1 , q2 R[x]k , it follows that (4 2q) + I k (I), which implies that (1 q2 ) + I
k (I).
Theorem 7.8. If S {0, 1}n and I = I(S), then THk (I) = Qk (I).
Proof. Since S {0, 1}n, its vanishing ideal I = I(S) contains x2i xi for all
i = 1, . . . , n, and so by Lemma 7.7, 1 + I is in the relative interior of k (I).
Hence, k1 (I) = Sk (I). (No closure operation is needed by [24, Corollary 16.4.2].)
Therefore,
{1} THk (I) = k1 (I) H = Sk (I) H = {1} Qk (I),
and the result follows.
We have thus far seen that the kth theta body THk (I) is the closure of Qk (I).
However, this description is still abstract and in order to work with theta bodies
in practice, we now give an explicit (coordinate based) description of Qk (I) using
a basis of R[x]/I which will make it transparent that Qk (I) is the projection of
a spectrahedron. This involves the theory of moments and moment matrices as
explained below.
Fix a -basis B = {fi + I} of R[x]/I and dene [x]Bk to be the column vector
formed by all the elements of Bk in order. Then [x]Bk [x]TBk is a square matrix
indexed by Bk and its (i, j)-entry is equal to fi fj + I. By hypothesis, the entries of
[x]Bk [x]TBk lie in the R-span of B2k . Let { li,j } be the unique set of real numbers

such that fi fj + I = fl +IB2k li,j (fl + I).
Denition 7.9. Let I, B, and { li,j } be as above. Let y be a real vector indexed
by B2k with y0 = 1, where y0 is the rst entry of y, indexed by the basis element
1 + I. The kth reduced moment matrix
 MBk (y) of I is the real matrix indexed by
Bk whose (i, j)-entry is [MBk (y)]i,j = fl +IB2k li,j yl .
We now give examples of reduced moment matrices. For simplicity, we often
write f for f + I. Also, in this chapter we consider only monomial bases of R[x]/I
obner basis
(i.e., fi is a monomial for all fi + I B) which we can obtain via Gr
theory. In this case, [x]Bk is a vector of monomials and we identify the vector [x]Bk
with the vector of monomials that represent the elements of Bk . The method is to
compute a reduced Gr
obner basis of I and take B to be the equivalence classes of
the standard monomials of the corresponding initial ideal. If the reduced Gr
obner
basis is with respect to a total degree ordering, then the second condition in the
denition of a -basis is satised by B.
Example 7.10. Consider the ideal I generated by f := (x + 1)x(x 1)2 . Clearly,
VR (I) = {1, 0, 1} with a double root at 1, and conv(VR (I)) = [1, 1]. The polynomial f = x4 x3 x2 + x is the unique element in every reduced Gr
obner basis
of I with x4  as initial ideal. The standard monomials of this initial ideal are

i
i

7.2. The Method

main
2012/11/1
page 307
i

307

1, x, x2 , x3 , and hence B = {1 + I, x + I, x2 + I, x3 + I} is a -basis for R[x]/I.


The biggest reduced moment matrix we could construct is MB3 (y), whose rows and
columns are indexed by B3 = B.
We have [x]B3 = (1 x x2 x3 ) and

1 x x2 x3
x x2 x3 x4

[x]B3 [x]TB3 =
x2 x3 x4 x5 ,
x3 x4 x5 x6
which is entrywise equivalent mod I to

1
x
x2
2
x
x
x3
2
3
3
x
x
x + x2 x
3
3
2
x x +x x
2x3 x

x3
x3 + x2 x
.

2x3 x
2x3 + x2 2x

We now linearize using y = (1, y1 , y2 , y3 ) and obtain

1
y1

MB3 (y) =
y2
y3

y1
y2
y3
y3 + y2 y1

y2
y3
y3 + y2 y1
2y3 y1

y3
y3 + y2 y1
.

2y3 y1
2y3 + y2 2y1

The reduced moment matrices MB1 (y) and MB2 (y) are the upper left 2 2
and 3 3 principal submatrices of MB3 (y).
Example 7.11. Consider the ideal I = x4 y 2 z 2 , x4 + x2 + y 2 1. Using a
computer algebra package such as Macaulay2 [10] one can calculate a total degree
reduced Gr
obner basis of I as follows:
Macaulay2, version 1.3
i1
i2
i3
o3

:
:
:
=

R
I
G
|

= QQ[x,y,z,Weights => {1,1,1}];


= ideal(x^4-y^2-z^2, x^4+x^2+y^2-1);
= gens gb I
x2+2y2+z2-1 4y4+4y2z2+z4-5y2-3z2+1 |

which says that this Grobner basis consists of the two polynomials
x2 + 2y 2 + z 2 1 and 4y 4 + 4y 2 z 2 + z 4 5y 2 3z 2 + 1.
A basis for the quotient ring R[x, y, z]/I is given by the standard monomials of the
initial ideal x2 , y 4 , which gives the following partial bases:
B1
B2
B3
B4

= {1, x, y, z},
= B1 {xy, y 2 , xz, yz, z 2},
= B2 {xy 2 , y 3 , xyz, y 2 z, xz 2 , yz 2 , z 3 },
= B3 {xy 3 , xy 2 z, y 3 z, xyz 2 , y 2 z 2 , xz 3 , yz 3 , z 4 }.

i
i

308

main
2012/11/1
page 308
i

Chapter 7. Convex Hulls of Algebraic Sets

Linearizing the elements of B4 , we get the following table:


1
1

x
y1

y
y2

z
y3

y2
y5

xy
y4
xy 3
y16

xz
y6

xy 2 z
y17

yz
y7

y3z
y18

z2
y8
xyz 2
y19

xy 2
y9
y2z 2
y20

y3
y10

y2z
y12

xyz
y11

xz 3
y21

yz 3
y22

xz 2
y13

yz 2
y14

z3
y15

z4
y23 .

We can now calculate various reduced moment matrices. For instance,

MB2 (y) =

y1
T1

y2
y4
y5

y3
y6
y7
y8

y4
T2
y9
y11
T4

y5
y9
y10
y12
y16
T6

y6
T3
y11
y13
T5
y17
T7

y7
y11
y12
y14
y17
y18
y19
y20

y8
y13
y14
y15
y19
y20
y21
y22
y23

where we have lled in only the upper triangular region. The unknowns T1 , T2 , . . .
stand for the following expressions:
T1
T2
T3
T4
T5
T6
T7

= 2y5 y8 + 1,
= 2y10 y14 + y2 ,
= 2y12 y15 + y3 ,
= y20 + y223 3y25 3y28 + 12 ,
= 2y18 y22 + 1,
= y20 y423 + 5y45 + 3y48 14 ,
= 2y20 y23 + y8 .

The Ti s can be calculated using Macaulay2 by rst nding the normal form of the
needed monomial with respect to the Gr
obner basis that was calculated and then
linearizing using the yi s. For instance, T2 is the linearization of the normal form
of x2 y, which by the calculation below, is 2y 3 yz 2 + y.
i6 : x^2*y%G
3
2
o6 = - 2y - y*z + y

The reduced moment matrix MBk (y) can also be dened in terms of linear
functionals on R[x]2k /I. For a vector y = (yb ) RB2k , dene Ly (R[x]2k /I)
as Ly (b) := yb for all b B2k . Then every L (R[x]2k /I) is equal to Ly for
y = (L(b) : b B2k ) RB2k . If y RB2k , let y0 := y1+I , yi := yxi +I for i = 1, . . . , n.
Further, let Rn be the projection map that sends y RB2k to (y1 , . . . , yn ) Rn .

i
i

7.2. The Method

main
2012/11/1
page 309
i

309

Lemma 7.12.
1. For a vector y RB2k with y0 = 1, the entry of MBk (y) indexed by bi , bj Bk
is Ly (bi bj ).
2. MBk (y)  0 Ly (f 2 + I) 0 for all f + I R[x]k /I.
Proof. The rst part follows from the denition of MBk (y) and Ly . For f + I

R[x]k /I, let f be the unique vector in RBk such that f +I = bi Bk fi bi . Therefore,

f 2 + I = bi ,bj Bk fi fj (bi bj ) which implies that
Ly (f 2 + I) =

fi fj Ly (bi bj ) = fT MBk (y)f.

bi ,bj Bk

Therefore, MBk (y)  0 Ly (f 2 + I) 0 for all f + I R[x]k /I.


Putting all this together, we obtain the following specic semidenite representation of Qk (I), and hence THk (I) up to closure. We will use this explicit
coordinate based description of THk (I) in the the calculations below.
Theorem 7.13. The kth theta body of I, THk (I), is the closure of
;
:
Qk (I) = Rn y RB2k : MBk (y)  0, y0 = 1 .
Proof. Recall that Qk (I) is the set


L(g + I) 0 for all g + I 2k /I,
(L(x1 + I), . . . , L(xn + I)) :
.
L(1 + I) = 1
Equivalently, Qk (I) is the set


L(f 2 + I) 0 for all f + I R[x]k /I,
(L(b) : b B1 \{1 + I}) :
.
L(1 + I) = 1
By Lemma 7.12 (2), it then follows that
:
;
Qk (I) = Rn y RB2k : MBk (y)  0, y0 = 1 =: QBk (I).
When working with a specic basis B, we use QBk (I) instead of Qk (I) to make
the choice of basis clear. In the examples that follow, please bear in mind that this
abuse of notation is simply to keep track of which -basis of R[x]/I was used in
the explicit semidenite representation of Qk (I). The proof of Theorem 7.13 shows
that any -basis of R[x]/I can be used to coordinatize Qk (I).
Example 7.10 continued. We write down QBk (I) for k = 1, 2, 3 for the ideal
I = (x + 1)x(x 1)2  from Example 7.10. Using the matrix MB3 (y) (with y0 = 1)
that was already computed we see that
QB1 (I) = {y1 : (y1 , y2 ) R2 s.t. y2 y12 },

i
i

310

main
2012/11/1
page 310
i

Chapter 7. Convex Hulls of Algebraic Sets

Figure 7.3. The spectrahedra {y RB2k : y0 = 1, MBk (y)  0} for


k = 1, 2, 3 for I = (x + 1)x(x 1)2  and their projections to the y1 -axis.
which is the projection onto the y1 -axis of the convex hull of the parabola y2 = y12 .
Therefore, QB1 (I) = R and hence TH1 (I) = R, which is a trivial relaxation of
conv(VR (I)) = [1, 1].
The body QB2 (I) = {y1 : y R3 s.t. MB2 (y)  0}. We know the exact form
of the moment matrices so we can use YALMIP to nd cl(QB2 (I)), by minimizing
x and x over that body.
sdpvar y1 y2 y3
M=[1 y1 y2;
y1 y2 y3;
y2 y3 y3+y2-y1];
solvesdp(M>0,y1);
double(y1)
solvesdp(M>0,-y1);
double(y1)

We then get cl(QB2 (I)) [1.0000, 1.0417], and we will later see that it is actually
exactly [1, 25
24 ].
To nish, we compute QB3 (I) = {y1 : y R3 s.t. MB3 (y)  0}. This is the
projection onto the y1 -coordinate of the spectrahedron in R3 described by all the

i
i

7.2. The Method

main
2012/11/1
page 311
i

311

Figure 7.4. The variety of Example 7.11 and its rst theta body.

Figure 7.5. The second theta body from Example 7.11.


inequalities obtained from the condition MB3 (y)  0. This body is the convex hull
of the moment vectors (x, x2 , x3 ) evaluated at x = 1, 0, 1, which is the triangle
with vertices (1, 1, 1), (0, 0, 0), (1, 1, 1). Projecting onto the y1 -coordinate, we get
cl(QB3 (I)) = [1, 1]. See Figure 7.3 for QBi (I), i = 1, 2, 3, and their spectrahedral
preimages.
Example 7.11 continued. We now draw a few theta bodies of the ideal
I = x4 y 2 z 2 , x4 + x2 + y 2 1
from Example 7.11, where we calculated the second reduced moment matrix MB2 (y).
This allows us to write down QB1 (I) and QB2 (I).
From the Grobner basis of I that we computed, we see that the polynomial
x2 + 2y 2 + z 2 1 is in I. We will see in Example 7.36 that the rst theta body of I
is the ellipsoid {(x, y, z) R3 : x2 + 2y 2 + z 2 1}. This ellipsoid along with VR (I)
(the two black rings) is shown in Figure 7.4. The second theta body is shown in
Figure 7.5 and it appears to equal conv(VR (I)).
Remark 7.14. This example shows the dierence between Lasserres method to
convexify VR (I) and the reduced moment method that underlies theta bodies. Recall
that in step k of Lasserres method, the relaxation of conv(VR (I)) that is computed is the common intersection of all half spaces l(x) 0 containing VR (I) and

i
i

312

main
2012/11/1
page 312
i

Chapter 7. Convex Hulls of Algebraic Sets

Figure 7.6. The second Lasserre relaxation for Example 7.11.


m
l(x) = (x) + i=1 gi (x)fi (x), where (x) is a k-sos polynomial and gi (x)fi (x)
R[x]2k . Using the software package Bermeja [25] we can draw the second relaxation
in Lasserres method which is shown in Figure 7.6.
Now that we have seen several examples of theta bodies of ideals, we give a
few comments and examples to point out some of the subtleties involved. We start
with an example to show that QBk (I) may not be closed, which emphasizes the
need to take its closure to get THk (I).
Example 7.15. Consider the principal ideal I = x21 x2 1 R[x1 , x2 ]. Then
conv(VR (I)) = {(s1 , s2 ) R2 : s2 > 0}, which is not a closed set. Any linear
polynomial that isnonnegative
over VR (I) is of the form x2 + , where , 0.
Since x2 + ( A
x1 x2 )2 + ( )2 mod I, TH2 (I) = cl(conv(VR (I))).
The set B = kN {xk1 + I, xk2 + I, x1 xk2 + I} is a -basis for R[x1 , x2 ]/I for
which
B4 = {1, x1 , x2 , x21 , x1 x2 , x22 , x1 x22 , x31 , x32 , x1 x32 , x41 , x42 } + I.
The reduced moment matrix MB2 (y) for y = (1, y1 , . . . , y11 ) RB4 is

1
x1
x2
x21
x1 x2
x22

x1

x2

x21

x1 x2 x22

1
y1
y2
y3
y4
y5

y1
y3
y4
y6
1
y7

y2
y4
y5
1
y7
y8

y3
y6
1
y9
y1
y2

y4
1
y7
y1
y2
y10

y5
y7
y8
y2
y10
y11

If MB2 (y)  0, then the principal minor indexed by x1 and x1 x2 implies that
y2 y3 1, and so in particular, y2 = 0 for all y QB2 (I). However, since QB2 (I)
conv(VR (I)) = {(s1 , s2 ) R2 : s2 > 0}, it must be that QB2 (I) = conv(VR (I)),
which shows that QB2 (I) is not closed.
We will see in the next section that when S is a nite set of points in Rn ,
the ideal I = I(S) of all polynomials that vanish on S, has the property that

i
i

7.2. The Method

main
2012/11/1
page 313
i

313

THl (I) = conv(VR (I)) = conv(S) for a nite l that depends on I. However, since
conv(S) QBl (I) THl (I), we also get that QBl (I) is closed. Even in this case,
QBk (I) may not be closed for some k < l.
Example 7.16. Consider the nite set of points S = {(t, 1/t2 ) : t = 1, . . . , 7}
lying on the curve x21 x2 = 1. Then
I(S) = x21 x2 1, (x21 1)(x21 4)(x21 9)(x21 16)(x21 25)(x21 36)(x21 49).
This is a zero-dimensional ideal, and a basis for R[x1 , x2 ]/I(S) is given by
B = {1, x1 , x2 , x21 , x1 x2 , x22 , x1 x22 , x31 , x32 , x1 x32 , x41 , x42 , x51 , x1 x42 } + I.
In particular, B4 is the same as the B4 in Example 7.15 and the initial ideal of I(S)
whose standard monomials are the monomials in B is generated by {x21 x2 , x52 , x61 }.
Therefore, MB2 (I(S)) and QB2 (I(S)) agree with those in Example 7.15, which implies that QB2 (I(S)) is not closed.
Another natural question is whether the theta bodies of dierent ideals with
the same real variety can have drastically dierent behaviors, especially with respect
to convergence
anideal I and its
to the convex hull of the variety. For instance,

real radical R I have the same real variety and I R I, THk ( R I) THk (I)
for all k.
Theorem 7.17. Fix
an ideal I. Then there exists a function : N N such that
TH (k) (I) THk ( R I) for all k.
We refer the reader to [9, Section 2.2] for a proof. The main message to take
away from this result is that whether or not the theta body hierarchy of an ideal
converges to cl(conv(VR (I))) is determined by the real variety of I. In particular,
whether the theta body sequence of anideal converges to cl(conv(VR (I))) in nitely
many steps, or not, is determined by R I.

7.2.3

Possible Extensions

The focus of this chapter is on polynomial equations, and sums of squares relaxations. However, all this theory can potentially be adapted to work in some more
complicated cases. In this section we give examples of some constructions that give
a avor of possible extensions. Similar constructions were also seen in Chapter 6,
and we refer to [22] for a more systematic study of the types of techniques we will
see below (in a slightly dierent setting).
Example 7.18. The theta body sequence can be modied to deal with polynomial inequalities, using Lasserres ideas. Given an ideal I and some polynomials g1 , . . . , gt , we might want to nd the convex hull of the semialgebraic set
S = {x VR (I) : g1 (x) 0, . . . , gt (x) 0}. To do this we use shifted reduced
moment matrices in addition to the reduced moment matrices of I.

i
i

314

main
2012/11/1
page 314
i

Chapter 7. Convex Hulls of Algebraic Sets

Figure 7.7. Sum of squares approximation to the half-lemniscate of Gerono.


Recall that to obtain the kth reduced moment matrix MBk (y) of I, we would
take the matrix [x]Bk [x]TBk , write it in terms of a basis B of R[x]/I, and linearize
using the new variables y with y0 = 1. To dene the shifted reduced moment matrix
MBk (g y) (with respect to g), we take the matrix g(x)[x]Bk [x]TBk and do precisely
as before.
C
B
Consider for example the ideal I = x4 x2 + y 2 of the lemniscate of Gerono,
together with the inequality x 0. The semialgebraic set S in this case is the right
half-lemniscate shown in Figure 7.7. The second reduced moment matrix of I is
given by

1
x
y
w20
w11 w02
x w0 w1
w30
w21 w12

2
1

y w11 w02
w21
w12 w03

w 0 w 0 w 1 w 0 w 2 w 1 w 2 ,
2
3
2
2
0
3
2
1

w1 w21 w12
w31
w22 w13
w02 w12 w03
w22
w13 w04
where wij is the linearization of xi y j . The combinatorial moment matrix shifted by
x and truncated at k = 1 is

x w20 w11
0

w2 w30 w21 .
w11 w21 w12
If we force both matrices to be positive semidenite and project over the x, y coordinates, we get an approximation of the convex hull of the right half of the lemniscate,
as shown in Figure 7.7. By increasing the truncation parameter of the reduced moment matrix and the shifted moment matrix we get better approximations to the
convex hull.
Note that in this example we are essentially searching for certicates of nonnegativity of the form l(x, y) 0 (x, y) + x1 (x, y) mod I, where 0 and 1 are
2-sos and 1-sos, respectively.

i
i

7.2. The Method

main
2012/11/1
page 315
i

315

Example 7.19. Consider the teardrop curve given by p(x, y) := x4 x3 + y 2 = 0.


We will see in Corollary 7.45 that the singularity at the origin will prevent the theta
bodies of p from converging in a nite number of steps to the convex hull of the
curve. We can, however, get rid of that problem by strengthening the hierarchy in
a simple way. Recall that the second theta body in this case will be obtained as the
closure of the set of all points (x, y) R2 for which there exists a positive denite
matrix of the form

1
x
y
w20
w11 w02
x w20 w11
w30
w21 w12

y w1 w2
w21
w12 w03

1
0
,
0
w2 w30 w21 w30 w02 w31 w22

1
w1 w21 w12
w31
w22 w13
w02 w12 w03
w22
w13 w04
where wij is a variable that linearizes the monomial xi y j , and so the rows and
columns are indexed by {1, x, y, x2 , xy, y 2 }. One can in this case strengthen the
condition by adding a new row and column to the matrix, indexed not by a monomial
1
. We then use the same strategy as
but by the fraction xy that we linearize as w1
before, of linearizing all resulting products modulo the relation x4 = x3 y 2 (which
2
2
allows us to get rid of w4,0 ) and the relations yx = x2 x3 and xy 2 = x x2 (which
eliminates two more variables). This new pseudomoment matrix is given by

1
1
x
y
w20
w11 w02
w1

x
w20
w11
w30
w21 w12
y

1
2
1
2
3
0
0
y
w1
w0
w2
w1 w0 w2 w3

0
0
w21
w30 w02 w31 w22
w11
M (x, y, w) =
.
w2 w3

w1 w1
2
1
2
3
2
w
w
w
w
w

1
2
1
3
2
1
0

2
3

w0 w12
w03
w22
w13 w04
w1
1
3
w1
y w20 w30
w11
w02 w1
x w20
Since the original moment matrix is a submatrix of M (x, y, w), the body Q =
{(x, y) : w s.t. M (x, y, w)  0} must be contained in TH2 (p), and a simple
numeric computation seems to show that Q actually matches the convex hull of the
real variety VR (p), as we can see in Figure 7.8. In this gure we see a comparison
of the second theta body and Q, drawn numerically using YALMIP. The fact that
Q seems to be exact is related to the fact that we can now use the term xy to get
sos certicates. For example, x = x2 + ( xy )2 modulo the new identities that we
introduced.
B C
Exercise 7.20. Let I = x2 .
1. Show that x is not k-sos mod I for any k.
2. Show that for any > 0, the polynomial x + is 1-sos mod I.
3. Describe TH1 (I).

i
i

316

main
2012/11/1
page 316
i

Chapter 7. Convex Hulls of Algebraic Sets

Figure 7.8. In the darker color we see TH2 (p), while in the lighter color
we see the strengthening Q as dened in Example 7.19. In black we see the variety
itself.

Figure 7.9. Lemniscate of Gerono.

Exercise 7.21. Using YALMIP or


nd the smallest such that
B other software,
C
x + is 2-sos modulo the ideal I = x4 x3 + y 2 . What about 3-sos? What about
4-sos?
Exercise 7.22. The lemniscate of Gerono is given by the equation x4 x2 + y 2 = 0
shown in Figure 7.9. Using YALMIP give an approximate 2-sos decomposition of
x + 1 modulo the equation of the curve. Can you nd an exact one?
Exercise 7.23. Using reduced moment matrices, give semidenite descriptions of
the following bodies:
1. QB2 (I) for the ideal of the lemniscate of Gerono.
B
C
2. QB1 (I) and QB2 (I) where I = y 2 x 1, x2 y 1 .
3. QB1 (I) where I is the vanishing ideal of the vertices of the 0/1 cube in R3 .

i
i

7.3. Convergence of Theta Bodies

main
2012/11/1
page 317
i

317

Exercise 7.24. Let I be the vanishing ideal of a nite set of points in Rn .


1. Prove that p(x) is nonnegative over VR (I) if and only if it is a sum of squares
modulo the ideal I.
2. Using the above fact, prove that for B, a -basis of R[x]/I, the spectrahedron
{y RB : MB (y)  0, y0 = 1} is the simplex whose vertices are the vectors
(fi (s) : fi + I B) as s varies over the nitely many points in VR (I).

7.3

Convergence of Theta Bodies

One of the main questions after dening a sequence of approximations to a convex


set is if they actually approximate the set, and further, if some approximation in
the sequence is guaranteed to coincide with the set. In this section we examine
conditions under which the sequence of theta bodies of an ideal I converges, either
nitely or asymptotically, to conv(VR (I)).
Denition 7.25. Let I R[x] be an ideal.
1. The theta body sequence of I converges to cl(conv(VR (I))) if

THk (I) = cl(conv(VR (I))).

k=1

2. For a nite integer k, the ideal I is THk -exact if THk (I) = cl(conv(VR (I))).
3. If I is THk -exact for a nite integer k, then we say that the theta body sequence of I converges to cl(conv(VR (I))) in nitely many steps. If the theta
body sequence of I converges to cl(conv(VR (I))) but there is no nite k for
which I is THk -exact, then we say that the theta body sequence of I converges
asymptotically to cl(conv(VR (I))).
We will see in Section 7.3.1 that if VR (I) is nite, then there is always some
nite k for which I is THk -exact. However, tight bounds on k for which I is THk exact are not known in general. The best scenario is when I is TH1 -exact. We
characterize nite varieties whose real radical ideal is TH1 -exact. Recall from the
discussion following Theorem 7.17 that there is no loss of generality in passing to
the real radical of I in discussing convergence.
When VR (I) is innite, much less is understood about the convergence of
the theta body sequence of I. In Section 7.3.2 we explain what we know about
this case. The best general result is that when VR (I) is compact, the theta body
sequence is guaranteed to converge to cl(conv(VR (I))) asymptotically. However,
nite convergence, and even convergence in the rst step are sometimes possible for
innite varieties, although no characterization is known in either case. We show that
certain singularities can prevent nite convergence when the variety is compact.

i
i

318

7.3.1

main
2012/11/1
page 318
i

Chapter 7. Convex Hulls of Algebraic Sets

Finite Real Varieties

Theorem 7.26. Let I be an ideal such that VR (I) is nite; then there exists some
k such that THk (I) = conv(VR (I)).
Proof. First notethat by Theorem 7.17 we just need to prove the existence of
such a k for J = R I. Let VR (I) := {P1 , . . . , Pm } Rn and, for each Pi , let qi
be a polynomial such that qi (Pi ) = 1 and qi (Pj ) = 0 for j = i. Then given any
polynomial f (x) that is nonnegative on VR (I) we have that
'2
m &?

f (x)
f (Pj )qj (x)
j=1

vanishes at all Pi , and hence it belongs to J, and f is sos modulo J. So all nonnegative polynomials on VR (J) are sos modulo J, which in particular implies that
each of them is nonnegative over some THk (J). Since the convex hull of VR (I)
is a polytope, it is cut out by a nite number of linear inequalities. Pick k large
enough for all these linear inequalities to be valid on THk (J) simultaneously. Then
conv(VR (I)) = THk (J).
Clearly, Theorem 7.26 implies that when VC (I) is nite, the ideal I is THk exact for some nite k. When the ideal I is also radical, nite convergence of
its theta body sequence to the convex hull of the variety was proved by Parrilo
(see Theorem 2.4 in [16]). Having established nite convergence of the theta body
sequence of I when VR (I) is nite, one can ask the more ambitious question of when
such an I is TH1 -exact. This is the most useful and computationally practical case
of nite convergence. If the ideal dening a nite set of points is always assumed to
be the vanishing ideal of the variety (and hence real radical), we can give a complete
geometric characterization of when they are TH1 -exact. We will need the following
fact about real radical ideals.
Lemma 7.27 ([8]). If I R[x] is a real radical ideal, then a linear inequality
l(x) 0 is valid for THk (I) if and only if l(x) is k-sos modulo I.
In order to characterize real radical ideals with nite real varieties, we need a
new denition.
Denition 7.28. Given a polytope P , we say that P is 2-level if for each facet F
of P and its ane span HF , all vertices of P are either in F or in a unique translate
of HF .
Example 7.29. In R3 , up to ane equivalence there are ve three-dimensional
2-level polytopes, shown in the upper part of Figure 7.10. It is easy to see that a
2-level polytope must be anely equivalent to a 0/1-polytope. In the bottom of
Figure 7.10 we show the three remaining 0/1-polytopes (up to ane equivalence)
with a face that fails to verify the 2-level condition highlighted.

i
i

7.3. Convergence of Theta Bodies

main
2012/11/1
page 319
i

319

Figure 7.10. The top row contains all 0/1 three-dimensional 2-level polytopes (up to ane equivalence). The bottom row contains all 0/1 three-dimensional
polytopes (up to ane equivalence) that are not 2-level.

Theorem 7.30. Let I be real radical with S := VR (I) nite. Then I is TH1 -exact
if and only if S is the set of vertices of a 2-level polytope.
Proof. Assume without loss of generality that S spans the entire space and let
f1 (x) 0, . . . , fm (x) 0 be a minimal list of linear inequalities describing P :=
conv(S), i.e., each fi corresponds to a facet Fi of P and is zero on that facet. By
Lemma 7.27, I is TH1 -exact if and only if all fi are 1-sos mod I, since every ane
linear polynomial that is nonnegative on S is a nonnegative linear combination of
the fi s.

If I is TH1 -exact, for each i = 1, . . . , m, we have fi (x) (hk (x))2 mod I,
where all hk are linear. But since fi vanishes on S Fi so must all hk and
therefore, since they are linear, they must vanish on the ane space generated
by Fi . This means that they are actually just scalar multiples of fi and we have
fi (x) (fi (x))2 mod I, for some nonnegative . In particular, all points P S
must satisfy either fi (P ) = 0 or fi (P ) = 1/ proving the 2-level condition.
Suppose now that P is 2-level. Then for each fi , all points P S must satisfy
fi (P ) = 0 or fi (P ) = i , for some xed i > 0. But then fi (fi i ) vanishes on
S, and therefore belongs to I. This implies fi (1/i )fi2 mod I and fi is 1-sos
modulo I.
Theorem 7.30 will turn out to be very useful in the context of combinatorial
optimization as we will see in the next section. Polytopes with integer vertices
that are 2-level are called compressed polytopes in the literature [34, 35] and play an
important role in other research areas. Being 2-level is a highly restrictive condition
that immediately gives us much information on the polytope. Since all the vertices
of a 2-level polytope in Rn can be assumed to be 0/1 vectors, it is clear that they
have at most 2n vertices. It was shown in [8] that they also have at most 2n facets
which is not obvious. There are many innite families of 2-level polytopes such as
simplices, hypercubes, cross polytopes, and hypersimplices.

i
i

320

main
2012/11/1
page 320
i

Chapter 7. Convex Hulls of Algebraic Sets

Figure 7.11. Cusp and its convex hull.

7.3.2

Innite Real Varieties

We begin by showing that unlike for nite varieties, the theta body approximations
can fail drastically when VR (I) is innite. The following simple example is adapted
from Example 1.3.2 in [21].
B
C
Example 7.31. Consider the ideal I = x2 y 3 dening the cusp in Figure 7.11.
The closure of the convex hull of this curve is the upper half-plane, so the only linear
0. Suppose
inequalities valid on the curve are of the form l (x, y) = y + , where 
there exists some l with an sos certicate modulo I, then l (x, y)
pi (x, y)2
mod I for some polynomials pi . Note that any polynomial p has a unique standard
form of the type a(y) + xb(y) modulo this ideal, which we can obtain by reducing all
multiples of x2 , using the fact that x2 y 3 mod I. Two polynomials are the same
modulo the ideal if they have the same standard form. Since l (x, y) is already in
this form, we can simply reduce the right-hand side in the congruence relation to its
standard form too. Suppose each pi = ai (y) + xbi (y). Then it is easy to check that


pi (x, y)2

(ai (y)2 + y 3 bi (y)2 ) +

(2xai (y)bi (y))

mod I.

Since the right-hand side is in standard form, to be congruent to l it must be the


same as l . Looking at the maximum degree of y in the rst sum on the right, we
see that it is smaller than two only if the ai s are all constants and the bi s are all
zero, since the highest degree terms cannot all cancel. In particular we get y + is a
constant, which is clearly a contradiction. This proves that THk (I) = R2 for all k,
and the theta bodies are completely ineective in approximating conv(VR (I)). In
fact, the same proof would work for any curve of the form x2 p(y) where p has
odd degree.
However, despite the existence of badly behavedvarieties such as the one
presented above, there is a large, very interesting class of innite real varieties
where such behavior never occurs, namely, compact varieties.
Theorem 7.32. Let I be an ideal such that VR (I) is compact. Then the theta body
sequence of I converges to the convex hull of the variety VR (I) in the sense that

THk (I) = conv(VR (I)).

k=1

i
i

7.3. Convergence of Theta Bodies

main
2012/11/1
page 321
i

321

Figure 7.12. Strophoid curve and its convex hull.

This is an immediate consequence of Schmudgens Positivstellensatz (see Chapter 3). To see the connection, just consider any set of generators {g1 , . . . , gt } for I
and the semialgebraic set S = {x Rn : g1 0, . . . gt 0} = VR (I). When applied to S, Schmudgens Positivstellensatz guarantees that every linear polynomial
that is strictly positive over VR (I) is sos modulo I.
Example 7.33. The existence of varieties as in Example 7.31 does not imply that
for all unbounded varieties we have problems with the theta body sequence. Consider the strophoid curve given by p(x, y) := (1 y)x2 (1 + y)y 2 = 0, shown in
Figure 7.12. The closure of the convex hull of this variety is the band B dened by
1 y 1. We claim that TH2 (I) = B. To show this it is enough to prove that
both 1 y and 1 + y are 2-sos modulo I, which is true since
&
'2
%2 1
1 2
1
1$
1
2
y y 2 + (xy x) + (y 1)p(x, y).
1y = 1 y y
+
2
2
4
2
2
In what follows we concentrate our eorts on the compact case, where asymptotic convergence of the theta body sequence is guaranteed. The next natural
question when VR (I) is innite but compact is whether we can understand when
the theta body sequence converges in nitely many steps to cl(conv(VR (I))). Finite convergence would prove that conv(VR (I)) is the projection of a spectrahedron,
which is an important feature of a convex semialgebraic set as seen in Chapter 6.
There is no complete understanding of this situation, but in the remainder of this
section, we discuss the known results.
TH1 -exactness. We begin by discussing the strongest scenario within nite convergence, namely TH1 -exactness of an ideal. In spite of the strength of this property,
there are surprisingly many interesting examples of such ideals with innite real varieties. We begin by taking a general look at the notion of TH1 -exactness for all
ideals. Roughly speaking, TH1 -exact ideals are those whose quadratic elements are
enough to describe their convex geometry, a statement that will be made precise
shortly. We start with a small lemma concerning convex quadrics.
Lemma 7.34. If p R[x] is a convex quadric polynomial, then p is TH1 -exact.

i
i

322

main
2012/11/1
page 322
i

Chapter 7. Convex Hulls of Algebraic Sets

Proof. This result will follow from Proposition 7.41, where we will show that the
rst theta body of any quadric is simply the convex hull of its graph intersected
with the x-plane. This intersection is precisely conv(p) if p is convex.
We now give an alternative characterization of TH1 (I) for any ideal I.
Proposition 7.35. For any ideal I R[x], TH1 (I) equals the intersection of
conv(VR (p)) as p varies over all convex quadrics in I.
Proof. The inclusion TH1 (I) conv(VR (p)) for all convex quadrics p I is
easy, since a linear inequality is valid over the second set if and only if it is 1-sos
modulo p, which immediately implies that it is 1-sos modulo I and therefore valid
on TH1 (I). For the second inclusion note that if l(x) is 1-sos mod I, then
l(x) = (x) + g(x),
where is a sum of squares and g is a quadric in I. But note that 2 g =
2  0 which implies g is a convex quadric in I, and l(x) is 1-sos modulo g.
Therefore, l(x) 0 is valid on conv(VR (g)) and hence also valid on the intersection
of conv(VR (p)) as p varies over all convex quadrics in I.
B
C
Example 7.36. Consider the ideal I = x4 y 2 z 2 , x4 + x2 + y 2 1 that we
introduced in Example 7.11. This is the intersection of two quartic surfaces in R3 .
The Gr
obner basis computation we did then shows that there exists a single quadric
in this ideal (up to scalar multiplication), which is the polynomial 1+x2 +2y 2 +z 2 .
Therefore, TH1 (I) equals the ellipsoid {(x, y, z) R3 : x2 + 2y 2 + z 2 1}, as seen
in Figure 7.4.
Proposition 7.35 can sometimes be used to prove TH1 -exactness.
Example 7.37. Consider the ideal I = x2 + y 2 + z 2 4, (x 1)2 + y 2 1, from
Example 7.47. Note that the quadratic polynomials p1 = (x 1)2 + y 2 1 and
p2 = 2x + z 2 4 belong to I. Write I1 = p1  and I2 = p2 . Then we claim that
conv(VR (I)) = conv(VR (I1 )) conv(VR (I2 )),
and therefore I is TH1 -exact. To see this note that the variety VR (I) can be written as


{(x, 1 (x 1)2 , 4 2x) : 0 x 2}.


In particular for each xed x we get four points, and the rectangle they form must
be contained in the convex hull of VR (I). This means


{(x, y, z) R3 : |y| 1 (x 1)2 , |z| 4 2x, 0 x 2} conv(VR (I)),


but it is clear that this set can be rewritten as
{(x, y, z) R3 : y 2 1 (x 1)2 , z 2 4 2x} = conv(VR (I1 )) conv(VR (I2 )),
which contains conv(VR (I)), so we get the intended equality.

i
i

7.3. Convergence of Theta Bodies

main
2012/11/1
page 323
i

323

An important open question concerning TH1 -exactness of varieties comes from


oriented Grassmannians and illustrates that the TH1 relaxation can be surprisingly
powerful. For the purposes of this discussion, we dene the oriented Grassmannian
n
Gk,n to be the set of all oriented k-subspaces of Rn , embedded in R(k ) by taking
Pl
ucker coordinates, i.e., by picking an oriented basis of the space, writing the
vectors as an n k matrix, and taking all k k minors and scaling them by a
n
positive scalar to a point on the sphere S (k )1 .
The ideal Ik,n , generated by all the quadratic relations among the k k minors
of an n k matrix, is called the Pl
ucker ideal. BThe Grassmann
variety is then
C
the compact real variety of the ideal I = In,k + 1
x
2 , so it makes sense to
approximate it with theta bodies. It is unknown whether all Grassmann varieties
are TH1 -exact, in fact even the G3,6 case is unknown, but numerical simulations
seem to say it is, at least for the relatively small examples for which numerical
computations are doable. Unpublished work by Sanyal and Rostalski [26] makes
connections between TH1 -exactness of these ideals and some classical open questions
of Harvey and Lawson on calibrated geometries [12].
Exactness in one step for principal ideals. Principal ideals are the simplest
ideals with innite varieties. However, even in this case, TH1 -exactness is not to be
expected. In fact, if p has degree d and 2k < d, THk (p) is the full ambient space Rn ,
since any k-sos linear inequality would verify l(x) = (x) + g(x) with degree of the
sums of squares less than or equal to 2k. But the degree of g I must be at
least d so there would be no cancellation of the highest degree and the sum could
never be a linear polynomial. An interesting question in this case is whether and
when the rst meaningful theta body would equal conv(VR (p)) when I = p. We
will focus on the following problem: given a polynomial p of degree 2k, decide if p
is THk -exact. In this generality there is a simple necessary criterion, but we have
to introduce a few denitions in order to state it.
Denition 7.38. Consider a polynomial p R[x1 , . . . , xn ] and dene p = x0
p(x1 , . . . , xn ) R[x0 , x1 , . . . , xn ]. Consider the convex set C = conv(VR (
p)), which
is simply the convex hull of the graph of p, and dene the shadow area of p, denoted
by sh(p), as the intersection of C with the plane x0 = 0.
This shadow area clearly contains conv(VR (p)) since it is convex and contains
the variety. However we can easily establish a more interesting inclusion.
Proposition 7.39. For p R[x] of degree 2k, sh(p) THk (p). In particular
if sh(p) strictly contains the closure of the convex hull of VR (p), then p is not
THk -exact.
Proof. Let l(x) be k-sos modulo p, i.e., l(x) = (x) + p(x) where is a sum
of squares of degree at most 2k and R. Then l(x) p(x) = (x) implies
l(x) p(x) 0 everywhere and therefore l(x0 , x) := l(x) x0 is valid over
p) and hence over its convex hull too. But by intersecting with x0 = 0 we
VR (

i
i

324

main
2012/11/1
page 324
i

Chapter 7. Convex Hulls of Algebraic Sets

Figure 7.13. Scarabaeus curve and its third theta body.


get that l(x) 0 must be valid on sh(p). From the denition of THk (I) it follows
immediately that sh(p) THk (I) as intended.
Despite the simplicity of the criterion, it is a handy tool to prove that a principal ideal is not exact at the rst step, without relying on numerical approximations.
Example 7.40. Consider the scarabaeus curve given by
p(x, y) := (x2 + y 2 )(x2 + y 2 + 4x)2 (x2 y 2 )2 = 0.
A simple numerical computation with an SDP solver shows us that TH3 (p) does
not match the convex hull of the curve, as can be seen in Figure 7.13. To provide
a short exact proof, one just has to point out that p(4, 0) = 256 and p(1, 0) = 24,
and since the point ( 47 , 0, 0) lies in the segment between (4, 0, 256) and (1, 0, 24),
the point = ( 47 , 0) must be contained in sh(p) and therefore in TH3 (p). It
is, however, easy to
calculate that the maximum value that x attains on the
curve is (50 + 11 22)/27 0.06, which implies that the convex hull must not
contain .
In some very special cases we can actually say a bit more about the rst
meaningful theta body.
Proposition 7.41. Let p be a polynomial in n variables and degree 2d. Then
1. if n = 1, sh(p) = THd (p);
2. if d = 1, sh(p) = TH1 (p);
3. if n = 2 and d = 2, sh(p) = TH2 (p).
Proof. We just have to prove that in these cases sh(p) THd (p). To do this
let l(x) > 0 be a valid linear inequality over sh(p). This means that the line
p)). By the
L = {(x0 , x) : x0 = 0, l(x) = 0} does not intersect C = conv(VR (

i
i

7.3. Convergence of Theta Bodies

main
2012/11/1
page 325
i

325

Figure 7.14. On the left we see the cardioid p(x) = 0 and its convex hull.
On the right we see the graph of p, its intersection with the plane z = 0 and the
ellipsoidal region where the graph and the boundary of its convex hull dier.

separation theorem for convex sets we can therefore take a hyperplane H that
strictly separates L and C. Since H does not touch the graph of p, it depends
on x0 , and since it does not touch L, it must be parallel to it. Therefore we have
a hyperplane of the form l (x0 , x) := x0 + (l(x) ) = 0, with = 0, > 0.
Since p(x0 , x) = x0 p(x), this means that (x) := p(x) + (l(x) ) is always
nonnegative or always nonpositive. Without loss of generality assume it is always
nonnegative (which implies > 0). Since the degree and number of variables of
this polynomial fall under Hilberts result (see Chapter 4), (x) is a sum of squares.
Hence, l(x) = (x)/ + p(x)/ is d-sos modulo the ideal, which implies that
l(x) 0 is valid over THd (p), proving the inclusion.
Example 7.42. We use the above result to prove TH2 -exactness of the following
principal ideal. Consider
p(x, y) = (x2 + y 2 + 2x)2 4(x2 + y 2 )
dening a cardioid, and the function

q(x, y) =

p(x, y)

if (x + 1)2 + y 2 3,

if (x + 1)2 + y 2 < 3.

8x 4

One can check that q is smooth and convex by noticing that p(x, y) = ((x+1)2 +y 2
3)2 +8x4 and by looking at its Hessian. Furthermore, the convex hull of the graph
of p is just the region above the graph of q. Therefore sh(p) = {(x, y) : q(x, y) 0},
and we can see in Figure 7.14 that sh(p) is the convex hull of the cardioid.
Even for one-variable polynomials this result is interesting.

i
i

326

main
2012/11/1
page 326
i

Chapter 7. Convex Hulls of Algebraic Sets

Figure 7.15. Graph of the polynomial x x2 x3 + x4 , its convex hull,


and intersection with the x-axis.
Example 7.43. Consider the polynomial p(x) = x x2 x3 + x4 . In Figure 7.15
we can see that this polynomial is not TH2 -exact, and why that happens. The
double root at x = 1 forces the convex hull of the graph to include some points
to the right of x = 1. In fact one can compute precisely the double
3 that
2 tangent
denes the boundary of the convex hull and show that TH2 (p) = 1, 25
24 .
Singularities and convergence. We now return to the more general question
of nite convergence of the theta body sequence for an ideal with an innite real
variety. There is no complete understanding of the obstructions to nite convergence, but we now show that if VR (I) has certain types of singularities, then nite
convergence is not possible.
Given an ideal I and a point P on the real variety of I, we dene the normal
space NP (I) to be the linear space {f (P ) : f I}.
Proposition 7.44. Let l(x) be an ane polynomial such that l(P ) = 0 for some
P in VR (I). If l  NP (I), then l is not a sum of squares modulo I.
Proof. Suppose l is a sum of squares. Then
l(x) = (x) + g(x)

(7.3)

for some sum of squares and some polynomial g I. By evaluating at P we


get that (P ) = 0, which immediately implies (P ) = 0. By dierentiating (7.3)
we get
l = (x) + g(x),

(7.4)

and by evaluating at P we get that l = g(P ) NP (I).


If I is real radical we can say even more.
Corollary 7.45. If I is real radical and l(x) 0 is a linear inequality valid on
 NP (I), then I is not
VR (I) with l(P ) = 0 at a point P VR (I) such that l
THk -exact for any k.

i
i

7.3. Convergence of Theta Bodies

main
2012/11/1
page 327
i

327

Figure 7.16. TH2 (I), TH3 (I), TH4 (I), and TH5 (I): all contain the origin
in their interior.
Proof. This follows from the previous proposition and Lemma 7.27.
Example 7.46. Let p(x, y) = (x2 + y 2 )2 (x + 5y)x2 and I = p. This ideal
denes a bifolium with a singularity at the origin, which implies N(0,0) (I) = {(0, 0)}.
Furthermore the linear inequality x + 5y 0 is valid on the variety and holds
with equality at the origin. Since (1, 5)  N(0,0) (I) we immediately have that this
inequality does not hold for any theta body relaxation of this ideal. In Figure 7.16
we can see THk (I) for k = 2, 3, 4, 5, and see that in fact the inequality does not
hold for any of them.
Corollary 7.45 essentially tells us that certain singularities of the ideal I that
are in the boundary of the convex hull of VR (I) aect the convergence of the theta
bodies of I. For a point P VR (I), the expected dimension of the normal space
NP (I) is the codimension of VR (I). A reasonable notion of a singularity of I is a
point P VR (I) for which NP (I) has smaller dimension than expected. The next
example will show that just the existence of singularities of I on the boundary of
conv(VR (I)) is not enough for Corollary 7.45 to apply.
Example 7.47. Consider the variety VR (I) in R3 dened by the ideal
I = x2 + y 2 + z 2 4, (x 1)2 + y 2 1.
As seen in Figure 7.17, this variety looks like a curved gure-eight and has a
singularity at the point p = (2, 0, 0), which belongs to the boundary of conv(VR (I)).
This happens since NP (I) = R{(1, 0, 0)} has dimension one, smaller than the codimension of the variety, which is two. However, (2, 0, 0) does not cause problems
for the convergence of theta bodies since the only linear polynomial that is zero at
p and nonnegative on VR (I) is the polynomial 2 x, whose gradient is in NP (I).
Indeed, the rst theta body of I already equals conv(VR (I)), as we will see in
Example 7.37.

i
i

328

main
2012/11/1
page 328
i

Chapter 7. Convex Hulls of Algebraic Sets

Figure 7.17. The curved eight variety and its convex hull.
A better, more rened, way of looking at singularities was introduced by
Omar and Osserman in [23]. They introduce a stronger notion of nonnegativity
over varieties that yields a stronger necessary condition for nite convergence of the
theta body hierarchy. As a byproduct they prove the following result.
Theorem 7.48. Let f (x) be a polynomial such that there exists some positive
integer n and an R-algebra homomorphism : R[x]/I R[]/ n  for which
(f ) = a0 + a1 + + an1 n1 . If the rst nonzero (leading) coecient ai
is negative, then f is not a sum of squares modulo I.
Proof. Just note that homomorphisms send sums of squares to sums of squares, and
sums of squares in R[]/ n  always have their leading coecient nonnegative.
Again this immediately gives us a new criterion.
Corollary 7.49. Let I be a real radical ideal and l(x) 0 a linear inequality valid
on VR (I). If there exists an R-algebra homomorphism : R[x]/I R[]/ n  for
which (l) has negative leading coecient, then I is not THk -exact for any k.
This corollary is much stronger than Corollary 7.45, and examples showing the difference are presented in [23]. In our next example we just show that we can recover Corollary 7.45 from Corollary 7.49 for the variety in Example 7.46 but, in fact, we can do so for any variety just by considering maps to R[ε]/⟨ε²⟩.

Example 7.50. Let p(x, y) = (x² + y²)² − (x + 5y)x² and I = ⟨p⟩ as in Example 7.46. Then the map φ : R[x, y]/I → R[ε]/⟨ε²⟩ defined by φ(x) = φ(y) = −ε is well defined, since φ(p) = 0. However, φ(x + 5y) = −6ε has a negative leading coefficient despite x + 5y ≥ 0 being valid on the variety. Hence, ⟨p⟩ is not TH_k-exact for any k.

One should keep in mind that singularities are not necessarily the only things that prevent finite convergence of the theta body sequence to cl(conv(V_R(I))). For compact smooth curves and surfaces, Scheiderer proved that nonnegativity and sums of squares modulo the ideal are equivalent [28, 29]. However, even in these cases, it is an open question if one can bound the degree needed to represent every nonnegative affine polynomial as a sum of squares modulo the ideal. Thus there might be examples of smooth curves and surfaces with no finite convergence of the theta body hierarchy to conv(V_R(I)). The only case where we know a little more is when the genus of the curve is one.
Proposition 7.51 (Theorem 2.1 of [30]). If V_R(I) is a smooth curve of genus 1 with at least one nonreal point at infinity, then I is TH_k-exact for some k.
Genus zero curves can be rationally parametrized, which allows semidefinite representations of their convex hulls by means of sums of squares, as seen in [13]. However such constructions do not automatically translate to finite convergence of the theta body sequence to the convex hull of the curve, even in the smooth case.
For varieties of dimension greater than two, there always exist nonnegative polynomials that are not sums of squares modulo any ideal that defines them, even in the smooth compact case, as seen in [27]. It is therefore very natural to expect examples of smooth compact varieties with no finite convergence of the theta body hierarchy, but we do not know a concrete example at this point.
Figure 7.18. Serpentine curve and the closure of its convex hull.

Exercise 7.52. Consider the serpentine curve given by p := y(x² + 1) − x = 0, depicted in Figure 7.18. The closure of its convex hull is the band cut out by the inequalities −1/2 ≤ y ≤ 1/2. Show that the ideal I = ⟨p⟩ is TH_2-exact by giving an exact expression of 1 − 2y and 1 + 2y as 2-sos polynomials modulo I.
Exercise 7.53. Using Proposition 7.35 show that the first theta body of the vanishing ideal of the points {(0, 0), (1, 0), (0, 1), (2, 2)} is cut out by precisely two polynomial inequalities, and write them explicitly.
Exercise 7.54. Consider the ideal I = ⟨y² − x⁵, z − x³⟩. The inequality z ≥ 0 is valid on the variety V_R(I).
1. Can we use Proposition 7.44 to prove that z is not k-sos modulo I for any k?
2. Use Theorem 7.48 to prove that z is not k-sos modulo I for any k.

Exercise 7.55. Similarly to our definition of 2-level polytope, we can define a k-level polytope to be one where given a facet F, and the affine plane H_F that it spans, all vertices of the polytope are contained either in H_F or in one of k − 1 parallel translates of H_F. Prove that if S is the set of vertices of a (k + 1)-level polytope then the vanishing ideal of S, I(S), is TH_k-exact.
Exercise 7.56. Consider the univariate quartic polynomial p(x) = x⁴ − 3x³ + 3x² − 3x + 2, which has two real roots, 1 and 2. Compute TH_2(⟨p⟩) exactly. Is the ideal TH_2-exact?
Exercise 7.57. Consider the bifolium given by p(x, y) := (x² + y²)² − yx² = 0. This curve has a singularity at the origin, which is also on the boundary of its convex hull and satisfies the conditions of Corollary 7.45, and hence we know that its theta body hierarchy does not converge. Using the same ideas as in Example 7.19, add to the second moment matrix of I = ⟨p⟩ a row and a column indexed by yx². Plot the resulting approximation and compare it with the convex hull of the curve.

7.4 Combinatorial Optimization

In this final section, we focus on combinatorial optimization, where a typical problem involves optimizing a linear function over all combinatorial objects of a certain kind. Many of these problems are modeled using graphs and can sometimes be studied combinatorially. However, a more systematic approach is to model these problems as integer or linear programs, which puts an emphasis on the underlying geometry. These models work as follows. The combinatorial objects of interest are typically defined as subsets of the ground set [n] := {1, 2, ..., n}, and the object T ⊆ [n] is recorded via its characteristic vector χ^T ∈ {0,1}^n defined as χ^T_i = 1 if i ∈ T and χ^T_i = 0 otherwise. This creates a simple bijection between the objects and certain elements of {0,1}^n. Then, for a vector c ∈ R^n, maximizing Σ_{i∈T} c_i over all the objects {T} is equivalent to maximizing Σ c_i x_i over the characteristic vectors {χ^T}, which in turn is equivalent to maximizing Σ c_i x_i over conv({χ^T}), which is a 0/1 polytope by construction. (Recall that a 0/1 polytope in R^n is the convex hull of vectors in {0,1}^n.) In principle this is a linear program, but the difficulty is that no description of conv({χ^T}) is usually known, and one resorts to relaxations of conv({χ^T}) over which Σ c_i x_i is maximized to obtain an upper bound on the value of max{⟨c, x⟩ : x ∈ conv({χ^T})}.
The theory of integer programming offers general methods to construct polyhedral relaxations of conv({χ^T}) by first finding a polytope whose integer points are precisely {χ^T}. See [31, Chapter 23] for linear programming–based methods. Polyhedral relaxations can sometimes be found using combinatorial arguments that depend explicitly on the structure of the problem. Automatic methods for constructing relaxations have also come about from lift-and-project methods that find a sequence of polyhedral or spectrahedral relaxations of conv({χ^T}). Some examples of lift-and-project methods, besides the theta body method described in this chapter, can be found in [2, 14, 20, 33] (see also [15]). Theta bodies construct

relaxations of conv(V_R(I)) for an ideal I. In the special case of the combinatorial optimization model described above, the starting point is the finite set {χ^T}, which is a finite algebraic variety, and we typically take its vanishing ideal as the ideal whose theta bodies are to be computed. As we saw in Section 7.3.1, these real radical ideals are always TH_k-exact for some finite k. We take a closer look at some combinatorial optimization problems whose theta bodies have been explored.

7.4.1 Stable Sets in a Graph

An example that is at the heart of the history of theta bodies is the maximum stable set problem in an undirected graph G = ([n], E) with vertex set [n] and edge set E. A stable set in G is a set U ⊆ [n] such that for all i, j ∈ U, {i, j} ∉ E. The maximum stable set problem seeks the stable set of largest cardinality in G, the size of which is the stability number, α(G), of G.

The maximum stable set problem can be modeled as follows. For each stable set U ⊆ [n], let χ^U ∈ {0,1}^n be its characteristic vector defined as χ^U_i = 1 if i ∈ U and χ^U_i = 0 otherwise. Let S_G ⊆ {0,1}^n be the set of characteristic vectors of all stable sets in G. Then STAB(G) := conv(S_G) is called the stable set polytope of G, and the maximum stable set problem is, in theory, the linear program max{Σ_{i=1}^n x_i : x ∈ STAB(G)} with optimal value α(G). However, STAB(G) is not known a priori, and so one resorts to relaxations of it over which to optimize Σ_{i=1}^n x_i.
Polyhedral relaxations of STAB(G) can be constructed from combinatorial arguments. For instance, a well-known relaxation is the polytope

FRAC(G) := {x ∈ R^n : x_i + x_j ≤ 1 for all {i,j} ∈ E, x_i ≥ 0 for all i ∈ [n]},

where the constraint x_i + x_j ≤ 1 for {i,j} ∈ E comes from the fact that both endpoints of an edge cannot be in a stable set. It can be checked that STAB(G) is exactly the convex hull of the integer points in FRAC(G). The polytope FRAC(G) and several tighter polyhedral relaxations of STAB(G) have been studied extensively in the literature; see [11, Chapter 9].
Since the set S_G is an algebraic variety, the theta bodies of its vanishing ideal offer convex relaxations of STAB(G). This vanishing ideal is

I_G := ⟨x_i² − x_i for all i ∈ [n], x_i x_j for all {i,j} ∈ E⟩ ⊆ R[x1, ..., xn].

For U ⊆ [n], let x^U := Π_{i∈U} x_i. From the generators of I_G it follows that if f ∈ R[x], then f ≡ g mod I_G where g is in the R-span of the set of monomials {x^U : U is a stable set in G}. In particular,

B := {x^U + I_G : U stable set in G}

is a θ-basis of R[x]/I_G (containing 1 + I_G, x1 + I_G, ..., xn + I_G). This implies that B_k = {x^U + I_G : U stable set in G, |U| ≤ k}, and for x^{U_i} + I_G, x^{U_j} + I_G ∈ B_k, their product is x^{U_i ∪ U_j} + I_G, which is 0 + I_G if U_i ∪ U_j is not a stable set in G. This product formula allows us to compute M_{B_k}(y), where we index the element

x^U + I_G ∈ B_k by the set U. Since S_G ⊆ {0,1}^n and I_G is the vanishing ideal of S_G, by Theorem 7.8, we have that

TH_k(I_G) = { y ∈ R^n : ∃ M ⪰ 0, M ∈ R^{|B_k|×|B_k|} such that
                M_∅∅ = 1,
                M_{∅{i}} = M_{{i}∅} = M_{{i}{i}} = y_i,
                M_{UU'} = 0 if U ∪ U' is not stable in G,
                M_{UU'} = M_{WW'} if U ∪ U' = W ∪ W' }.
In particular, indexing the one-element stable sets by the vertices of G,

TH_1(I_G) = { y ∈ R^n : ∃ M ⪰ 0, M ∈ R^{(n+1)×(n+1)} such that
                M_00 = 1,
                M_0i = M_i0 = M_ii = y_i for all i ∈ [n],
                M_ij = 0 for all {i,j} ∈ E }.


Example 7.58. Let G = ([5], {{1,2}, {2,3}, {3,4}, {4,5}, {1,5}}) be a 5-cycle. The vanishing ideal of the characteristic vectors of stable sets in G is

I_G = ⟨x1x2, x2x3, x3x4, x4x5, x1x5, x_i² − x_i for all i = 1, ..., 5⟩,

and a θ-basis for R[x]/I_G is given by

B = {1, x1, x2, x3, x4, x5, x1x3, x1x4, x2x4, x2x5, x3x5} + I_G.

Let y ∈ R¹⁰ be the vector of variables whose coordinates are indexed by B in the given order and with y0 = 1. Then

TH_1(I_G) = { y ∈ R⁵ : ∃ y6, ..., y10 s.t. M_{B_1}(y) ⪰ 0 },

where

M_{B_1}(y) =

  [ 1    y1   y2   y3   y4   y5  ]
  [ y1   y1   0    y6   y7   0   ]
  [ y2   0    y2   0    y8   y9  ]
  [ y3   y6   0    y3   0    y10 ]
  [ y4   y7   y8   0    y4   0   ]
  [ y5   0    y9   y10  0    y5  ]

Note that x_i ≡ x_i² and 1 − x_i ≡ (1 − x_i)² mod I_G for any graph G, so TH_1(I_G) is always contained in the [0,1] cube.
The first example of an SDP relaxation of a combinatorial optimization problem was the theta body of a graph G = ([n], E) constructed by Lovász in [18] while studying the Shannon capacity of graphs. The theta body of G, denoted as TH(G), is a relaxation of STAB(G) that was originally defined as the intersection of the infinitely many half spaces that arise from the orthonormal representations of G. Several equivalent definitions can be found in [18] and [11, Chapter 9]. However, none of them points to an obvious generalization of the construction to other discrete

optimization problems. In [20], Lovász and Schrijver observe that TH(G) can be formulated via semidefinite programming exactly as the formulation for TH_1(I_G) shown above. This is still specialized to the stable set problem. Then in [19], Lovász observes that, in fact, TH(G) is cut out by all linear polynomials that are 1-sos mod the ideal I_G. For the stable set problem, this fact can be proven without all the machinery introduced in this chapter. This connection leads naturally to the definition of TH_k(I_G) for any positive integer k and more generally TH_k(I) for any ideal I ⊆ R[x] and any k. Problem 8.3 in [19] (roughly) asks to characterize all ideals I ⊆ R[x] such that cl(conv(V_R(I))) equals TH_1(I) or, more generally, TH_k(I). It was this problem that motivated us to study theta bodies in general and develop the methods in this chapter.
Example 7.59. Let us return to Example 7.58. When Lovász introduced the theta body of a graph G, he also introduced the concept of the theta number of a graph, ϑ(G) (cf. Chapter 2). This is just the number

max { Σ_{i=1}^n x_i : x ∈ TH(G) = TH_1(I_G) },

which is an upper bound (and approximation) for the stability number α(G) of a graph. We can now easily compute ϑ(C5), the theta number of the 5-cycle, numerically using YALMIP, since we have the precise structure of the reduced moment matrix.
y = sdpvar(1,10);
M = [1    y(1)  y(2)  y(3)  y(4)  y(5) ;
     y(1) y(1)  0     y(6)  y(7)  0    ;
     y(2) 0     y(2)  0     y(8)  y(9) ;
     y(3) y(6)  0     y(3)  0     y(10);
     y(4) y(7)  y(8)  0     y(4)  0    ;
     y(5) 0     y(9)  y(10) 0     y(5) ];
obj = y(1)+y(2)+y(3)+y(4)+y(5);
solvesdp(M >= 0, -obj);
double(obj)

This will return the answer ϑ(C5) ≈ 2.2361. Note that α(C5) = 2, so we do get an upper approximation as expected, but it is clear that I_{C5} is not TH_1-exact.
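The same computation can be set up for any graph directly from the description of TH_1(I_G) above. The following is a hedged sketch of ours (not from the text; the helper name theta1 and its edge-list input format are our own choices): it imposes the equalities and the zero pattern on a full symmetric matrix variable, leaving the entries indexed by non-edges free.

function val = theta1(n, edges)
% Hypothetical helper (ours): bound alpha(G) by maximizing sum(y) over
% TH_1(I_G), following the moment matrix description in the text.
% n: number of vertices; edges: m-by-2 list of edges of G.
  y = sdpvar(1, n);
  M = sdpvar(n+1, n+1);            % symmetric by default in YALMIP
  F = [M >= 0, M(1,1) == 1];
  for i = 1:n                      % M_{0i} = M_{i0} = M_{ii} = y_i
    F = [F, M(1,i+1) == y(i), M(i+1,i+1) == y(i)];
  end
  for k = 1:size(edges,1)          % M_{ij} = 0 for every edge {i,j}
    F = [F, M(edges(k,1)+1, edges(k,2)+1) == 0];
  end
  solvesdp(F, -sum(y));
  val = double(sum(y));
end

For the 5-cycle, theta1(5, [1 2; 2 3; 3 4; 4 5; 1 5]) should again return roughly 2.2361.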
A particular reason for Lovász's interest in [19, Problem 8.3] was due to the fact that STAB(G) = TH(G) if and only if G is a perfect graph [11, Corollary 9.3.27]. Recall that a graph is perfect if and only if it has no induced odd cycle of length at least five or its complement [4]. Since TH(G) = TH_1(I_G) for all graphs G, it follows that I_G is TH_1-exact if and only if G is perfect. The pentagon in Example 7.58 is not perfect, which justifies our observation that its ideal I_G is not TH_1-exact. Chvátal and Fulkerson had shown that STAB(G) = QSTAB(G) if and only if G is a perfect graph, where

QSTAB(G) := { x ∈ R^n : x_i ≥ 0 for all i ∈ [n], Σ_{i∈K} x_i ≤ 1 for all cliques K in G }.


A clique in G is a complete subgraph in G. Since every edge in G is a clique, FRAC(G) ⊇ QSTAB(G) ⊇ STAB(G) in general. A hexagon is perfect, in which case FRAC(G) = QSTAB(G) since the only cliques in G are its edges. Therefore, for the hexagon, STAB(G) = TH(G) = TH_1(I_G) = QSTAB(G) = FRAC(G). Since I_G is TH_1-exact if and only if G is perfect, by Theorem 7.30, we also have that STAB(G) is 2-level if and only if G is perfect.

The above discussion leads naturally to the question of which graphs G have the property that I_G is TH_2-exact, or more generally, TH_k-exact. These problems are open at the moment, although isolated examples of TH_k-exact ideals are known for specific values of k > 1. In practice it is quite difficult to find examples of graphs G for which I_G is not TH_2-exact, although such graphs have to exist unless P = NP. Recent results of Au and Tunçel prove that if G is the line graph of the complete graph on 2n + 1 vertices, then the smallest k for which I_G is TH_k-exact grows linearly with n [1].

7.4.2 A General Framework

The stable set problem and many others in combinatorial optimization can be modeled as arising from a simplicial complex. A simplicial complex or independence system, Δ, with vertex set [n], is a collection of subsets of [n], called the faces of Δ, such that whenever S ∈ Δ and T ⊆ S, then T ∈ Δ. The Stanley–Reisner ideal of Δ is the ideal J_Δ generated by the square-free monomials x_{i1} x_{i2} ⋯ x_{ik} such that {i1, i2, ..., ik} ⊆ [n] is not a face of Δ. If I_Δ := J_Δ + ⟨x_i² − x_i : i ∈ [n]⟩, then V_R(I_Δ) = {s ∈ {0,1}^n : support(s) ∈ Δ}. The support of a vector v ∈ R^n is the set {i ∈ [n] : v_i ≠ 0}. Further, for T ⊆ [n], if x^T := Π_{i∈T} x_i, then B_Δ := {x^T + I_Δ : T ∈ Δ} is a θ-basis of R[x]/I_Δ. This implies that the kth theta body of I_Δ is

TH_k(I_Δ) = π_{R^n} { y ∈ R^{B_{2k}} : M_{B_k}(y) ⪰ 0, y_0 = 1 }.
Since B_Δ is in bijection with the faces of Δ and x_i² − x_i ∈ I_Δ for all i ∈ [n], the theta body can be written explicitly as

TH_k(I_Δ) = { y ∈ R^n : ∃ M ⪰ 0, M ∈ R^{|B_k|×|B_k|} such that
                M_∅∅ = 1,
                M_{∅{i}} = M_{{i}∅} = M_{{i}{i}} = y_i,
                M_{UU'} = 0 if U ∪ U' ∉ Δ,
                M_{UU'} = M_{WW'} if U ∪ U' = W ∪ W' }.
If the dimension of Δ is d − 1 (i.e., the largest faces in Δ have size d), then I_Δ is TH_d-exact, since all elements of B_Δ have degree at most d and hence the last possible theta body TH_d(I_Δ) must coincide with conv(V_R(I_Δ)) as V_R(I_Δ) is finite. However, in many examples, I_Δ could be TH_k-exact for a k much smaller than d.

In the case of the stable set problem on G = ([n], E), Δ is the set of all stable sets in G. This is a simplicial complex with vertex set [n] whose nonfaces are the sets T ⊆ [n] containing a pair i, j ∈ [n] such that {i,j} ∈ E. Hence the minimal nonfaces (by set inclusion) are precisely the edges of G, and so J_Δ = ⟨x_i x_j : {i,j} ∈ E⟩.

Then I_Δ = J_Δ + ⟨x_i² − x_i : i ∈ [n]⟩, which is precisely the ideal I_G from Section 7.4.1, and the remaining facts about the θ-basis B used in Section 7.4.1 and the structure of the theta bodies of I_G follow from the general setup described above.

An example from combinatorial optimization that does not follow the simplicial complex framework is the maximum cut problem of finding the largest size cut in a graph. Recall that a cut in G is the collection of edges that go between the two parts of a partition of the vertices of G. Note that a subset of a cut is not necessarily a cut, and hence the set of cuts in a graph does not form a simplicial complex. In [7] the theta body hierarchy for the maximum cut problem, and more generally for binary matroids, is studied. In this case, a θ-basis for the ideal in question is not obvious as in the simplicial complex model.

7.4.3 Triangle-free Subgraphs in a Graph

We finish the chapter with a second example from combinatorial optimization that fits the simplicial complex model. A subgraph H of a graph G = ([n], E) is triangle-free if it does not contain a triangle (K3, the complete graph on 3 vertices). Given weights on the edges of G, the triangle-free subgraph problem in G asks for a triangle-free subgraph of G of maximum weight. If all the edge weights are one, then the problem seeks a triangle-free subgraph in G with the largest number of edges. The triangle-free subgraph problem is known to be NP-hard [36] and is relevant in various contexts within optimization.
The integer programming formulation of the triangle-free subgraph problem optimizes the linear function Σ_{e∈E} w_e x_e, where w_e is the weight on edge e ∈ E, over the characteristic vectors {χ^H : H is triangle-free in G}. This is equivalent to maximizing Σ_{e∈E} w_e x_e over

P_tf(G) := conv{χ^H : H is triangle-free in G},

the triangle-free subgraph polytope of G. Note that P_tf(G) is a full-dimensional 0/1 polytope in R^E. The triangle-free subgraph polytope of a graph has been studied by various authors (see, for instance, [3, 5]), and a number of facet defining inequalities of the polytope are known, although a full inequality description is not known or expected.
Taking Δ to be the simplicial complex on E consisting of all triangle-free subgraphs in G, and I_tf(G) := I_Δ, we have that

V_R(I_tf(G)) = {χ^H : H is triangle-free in G}.

Hence the theta bodies of I_tf(G) provide convex relaxations of the triangle-free subgraph polytope P_tf(G). From the general framework in Section 7.4.2, B = {x^H + I_tf(G) : H triangle-free in G} is a θ-basis of R[x]/I_tf(G). Therefore, the rows and columns of M_{B_k}(y) are indexed by the triangle-free subgraphs in G with at most k edges. For ease of exposition, let us denote the entry of M_{B_k}(y) corresponding to the row indexed by x^{H1} and the column indexed by x^{H2} by M_{B_k}(y)_{H1 H2}, let H1 ∪ H2 denote the subgraph of G whose edge set is the union of the edge sets of H1 and H2,


and let y_H denote the entry of y ∈ R^B corresponding to the basis element x^H + I_tf(G). Then

TH_k(I_tf(G)) = { y ∈ R^E : ∃ M ⪰ 0, M ∈ R^{|B_k|×|B_k|} such that
                   M_∅∅ = 1,
                   M_{H1 H2} = 0 if H1 ∪ H2 has a triangle,
                   M_{H1 H2} = y_{H1 ∪ H2} otherwise }.
Since all subgraphs of G with at most two edges are triangle-free, and B_1 = {1 + I_tf(G)} ∪ {x_e + I_tf(G) : e ∈ E}, TH_1(I_tf(G)) is exactly the same as the first theta body of the ideal ⟨x_e² − x_e : e ∈ E⟩, which is TH_1-exact by Theorem 7.30. Hence TH_1(I_tf(G)) = [0,1]^E, and I_tf(G) is TH_1-exact if and only if every subgraph of G is triangle-free, or equivalently, G is triangle-free.

For graphs G that contain triangles, the second theta body of I_tf(G) is more interesting, as triples and quadruples of edges in G can contain triangles, which forces some of the entries in M_{B_2}(y) to be zero.
Example 7.60. Suppose G = K3 with edges labeled 1, 2, 3. Then P_tf(G) is the convex hull of all 0/1 vectors in R³ except (1,1,1), which is the first polytope shown in the second row of polytopes in Figure 7.10. This polytope is TH_2-exact since

B_2 = {1, x1, x2, x3, x1x2, x1x3, x2x3} + I_tf(G) = B.
Denoting by y ∈ R^{B_2}, with first entry one, the vector y = (1, y1, y2, y3, y12, y13, y23), we have that

M_{B_2}(y) =

  [ 1    y1   y2   y3   y12  y13  y23 ]
  [ y1   y1   y12  y13  y12  y13  0   ]
  [ y2   y12  y2   y23  y12  0    y23 ]
  [ y3   y13  y23  y3   0    y13  y23 ]
  [ y12  y12  y12  0    y12  0    0   ]
  [ y13  y13  0    y13  0    y13  0   ]
  [ y23  0    y23  y23  0    0    y23 ]

Hence the triangle-free subgraph polytope of K3 has the spectrahedral description

P_tf(G) = { (y1, y2, y3) : ∃ y12, y13, y23 such that M_{B_2}(y) ⪰ 0 }.
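As a quick numerical sanity check (a sketch of ours, not part of the text), one can maximize y1 + y2 + y3 over this spectrahedron in YALMIP; since (1,1,1) is cut off, the optimal value should be 2.

y = sdpvar(1,6);    % y = (y1, y2, y3, y12, y13, y23)
M = [1    y(1) y(2) y(3) y(4) y(5) y(6);
     y(1) y(1) y(4) y(5) y(4) y(5) 0   ;
     y(2) y(4) y(2) y(6) y(4) 0    y(6);
     y(3) y(5) y(6) y(3) 0    y(5) y(6);
     y(4) y(4) y(4) 0    y(4) 0    0   ;
     y(5) y(5) 0    y(5) 0    y(5) 0   ;
     y(6) 0    y(6) y(6) 0    0    y(6)];
solvesdp(M >= 0, -(y(1)+y(2)+y(3)));
double(y(1)+y(2)+y(3))    % approximately 2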
Several families of facet inequalities for the triangle-free subgraph polytope of a graph can be found in the literature, and a complete facet description of P_tf(G) for an arbitrary graph is unknown. An easy class of facets of P_tf(G) comes from the obvious fact that in any triangle in G at most two edges can be in a triangle-free subgraph. Mathematically, if a, b, c ∈ E induce a triangle in G, then 2 − x_a − x_b − x_c ≥ 0 is a valid inequality for P_tf(G). We now show that this inequality is valid for TH_2(I_tf(G)). First check that

(1 − x_c − x_a x_b) ≡ (1 − x_c − x_a x_b)² mod I_tf(G)

and also

(1 − x_a − x_b + x_a x_b) ≡ (1 − x_a − x_b + x_a x_b)² mod I_tf(G).

This implies that 2 − x_a − x_b − x_c = (1 − x_a − x_b + x_a x_b) + (1 − x_c − x_a x_b) is 2-sos mod I_tf(G), and hence 2 − x_a − x_b − x_c ≥ 0 is valid for TH_2(I_tf(G)).

Figure 7.19. 5-wheel, partial 5-wheel, and Petersen graph.
Exercise 7.61. We saw in Example 7.59 how to compute ϑ(G) numerically for a graph G. Find ϑ(G) for the graphs in Figure 7.19.
1. G a 5-wheel;
2. G the 5-wheel with two missing nonconsecutive rays;
3. G the Petersen graph.
Exercise 7.62. Compute the value of ϑ(G) for the 5-cycle exactly. (Hint: take advantage of the symmetries of the graph.)
Exercise 7.63. Prove that for any graph G, TH_1(I_G) ⊆ QSTAB(G). Note that it is enough to prove that x_i and 1 − Σ_{i∈C} x_i are 1-sos mod I_G for all vertices i and all cliques C.
Exercise 7.64. It is known that the stable set polytope of C_{2k+1}, the odd cycle of 2k + 1 nodes, is defined by the inequalities x_i ≥ 0 for all i ∈ [2k+1], x_i + x_j ≤ 1 for all {i,j} ∈ E, which by the previous exercise are 1-sos mod I_G, and the single odd cycle inequality Σ x_i ≤ k [32, Corollary 65.12a].
1. Show that C5 is TH_2-exact.
2. Show that C_{2k+1} is TH_2-exact for all k.
Exercise 7.65. In Exercise 7.55 we have shown that the vanishing ideal of the set of vertices of a (k + 1)-level polytope is TH_k-exact. We have also seen in Theorem 7.30 that the reverse implication is true for k = 1: if a real radical ideal is TH_1-exact, then its variety must be the set of vertices of a 2-level polytope. Using what we know of the theta body approximations to the stable set polytope, show that the reverse implication (TH_k-exact implies (k + 1)-level) fails for k ≥ 2.
Exercise 7.66. The triangle-free subgraph problem is closely related to another important problem in combinatorial optimization, the K3-cover subgraph problem.

A subgraph of G is said to be a K3-cover if it contains at least one edge of every triangle of G. What is the relation between a maximum triangle-free subgraph and a minimum K3-cover? How is that reflected in the polytopes underlying those combinatorial problems?
Exercise 7.67. A (2k + 1)-odd wheel is the graph on 2k + 2 vertices with 2k + 1 of the vertices forming a (2k + 1)-cycle and the last vertex connected to each of the vertices of the cycle. Such a wheel yields the inequality Σ_{e∈E_W} x_e ≤ 3k + 1 that is valid for the triangle-free subgraph polytope of G. For example, an induced 5-wheel in a graph gives the inequality

x12 + x23 + x34 + x45 + x15 + x16 + x26 + x36 + x46 + x56 ≤ 7,

which is valid for the triangle-free subgraph polytope of the graph.
1. Use YALMIP to see that the 5-wheel and 7-wheel inequalities appear to be 2-sos mod I_tf(G), where G is the corresponding wheel.
2. Can you express them exactly as 2-sos modulo the ideals?
3. Can you prove that all odd wheel inequalities are 2-sos modulo their ideals?
Exercise 7.68. Another version of the triangle-free subgraph problem is vertex-based. Given a subset of nodes of G, we say it is triangle-free if its induced subgraph is triangle-free. This also falls into the simplicial complex model, so we know how to construct reduced moment matrices. Using the first theta body, compute an approximation for the maximum triangle-free subset of nodes of the 4-wheel.

Bibliography
[1] Y. H. Au and L. Tunçel. Complexity analyses of Bienstock–Zuckerberg and Lasserre relaxations on the matching and stable set polytopes. In Integer Programming and Combinatorial Optimization, Lecture Notes in Comput. Sci. 6655, Springer, Heidelberg, 2011, pp. 14–26.
[2] E. Balas, S. Ceria, and G. Cornuéjols. A lift-and-project cutting plane algorithm for mixed 0-1 programs. Math. Program., 58:295–324, 1993.
[3] F. Bendali, A. R. Mahjoub, and J. Mailfert. Composition of graphs and the triangle-free subgraph polytope. J. Comb. Optim., 6:359–381, 2002.
[4] M. Chudnovsky, N. Robertson, P. Seymour, and R. Thomas. The strong perfect graph theorem. Ann. of Math. (2), 164:51–229, 2006.
[5] M. Conforti, D. G. Corneil, and A. R. Mahjoub. K_i-covers. I. Complexity and polytopes. Discrete Math., 58:121–142, 1986.
[6] D. Cox, J. Little, and D. O'Shea. Ideals, Varieties and Algorithms. Springer-Verlag, New York, 1992.

[7] J. Gouveia, M. Laurent, P. A. Parrilo, and R. R. Thomas. A new hierarchy of semidefinite programming relaxations for cycles in binary matroids and cuts in graphs. Math. Program., Ser. A, 2010, to appear.
[8] J. Gouveia, P. A. Parrilo, and R. R. Thomas. Theta bodies for polynomial ideals. SIAM J. Optim., 20:2097–2118, 2010.
[9] J. Gouveia and R. R. Thomas. Convex hulls of algebraic sets. In M. Anjos and J.-B. Lasserre, editors, Handbook of Semidefinite, Cone and Polynomial Optimization: Theory, Algorithms, Software and Applications, to appear.
[10] D. Grayson and M. Stillman. Macaulay 2, a software system for research in algebraic geometry. Available at http://www.math.uiuc.edu/Macaulay2.
[11] M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization, 2nd edition, Algorithms Combin. Springer-Verlag, Berlin, 1993.
[12] R. Harvey and H. B. Lawson, Jr. Calibrated geometries. Acta Math., 148:47–157, 1982.
[13] D. Henrion. Semidefinite representation of convex hulls of rational varieties. LAAS-CNRS Research Report 09001, 2009.
[14] J. B. Lasserre. Global optimization with polynomials and the problem of moments. SIAM J. Optim., 11:796–817, 2001.
[15] M. Laurent. A comparison of the Sherali–Adams, Lovász–Schrijver, and Lasserre relaxations for 0-1 programming. Math. Oper. Res., 28:470–496, 2003.
[16] M. Laurent. Sums of squares, moment matrices and optimization over polynomials. In Emerging Applications of Algebraic Geometry, IMA Vol. Math. Appl. 149. Springer, Berlin, 2009.
[17] J. Löfberg. YALMIP: A toolbox for modeling and optimization in MATLAB. In Proceedings of the CACSD Conference, Taipei, Taiwan, 2004.
[18] L. Lovász. On the Shannon capacity of a graph. IEEE Trans. Inform. Theory, 25:1–7, 1979.
[19] L. Lovász. Semidefinite programs and combinatorial optimization. In Recent Advances in Algorithms and Combinatorics, CMS Books Math./Ouvrages Math. SMC 11. Springer, New York, 2003, pp. 137–194.
[20] L. Lovász and A. Schrijver. Cones of matrices and set-functions and 0-1 optimization. SIAM J. Optim., 1:166–190, 1991.
[21] M. Marshall. Positive Polynomials and Sums of Squares, Math. Surveys Monogr. 146. American Mathematical Society, Providence, RI, 2008.

i
[22] J. Nie. First order conditions for semidefinite representations of convex sets defined by rational or singular polynomials. Math. Program., 131:1–36, 2012.
[23] M. Omar and B. Osserman. Strong nonnegativity and sums of squares on real varieties. arXiv:1101.0826.
[24] R. T. Rockafellar. Convex Analysis, Princeton Landmarks in Mathematics and Physics. Princeton University Press, Princeton, NJ, 1996.
[25] P. Rostalski. Bermeja, Software for Convex Algebraic Geometry. Available at http://math.berkeley.edu/~philipp/Software/Software.
[26] R. Sanyal. Orbitopes and theta bodies. Talk at IPAM Workshop on Convex Optimization and Algebraic Geometry, slides available at http://math.berkeley.edu/~bernd/raman.pdf, 2010.
[27] C. Scheiderer. Sums of squares of regular functions on real algebraic varieties. Trans. Amer. Math. Soc., 352:1039–1069, 2000.
[28] C. Scheiderer. Sums of squares on real algebraic curves. Math. Z., 245:725–760, 2003.
[29] C. Scheiderer. Sums of squares on real algebraic surfaces. Manuscripta Math., 119:395–410, 2006.
[30] C. Scheiderer. Convex hulls of curves of genus one. Adv. Math., 228:2606–2622, 2011.
[31] A. Schrijver. Theory of Linear and Integer Programming, Wiley-Interscience Series in Discrete Mathematics and Optimization. Wiley, New York, 1986.
[32] A. Schrijver. Combinatorial Optimization. Polyhedra and Efficiency. Vol. B, Algorithms Combin. 24. Springer-Verlag, Berlin, 2003.
[33] H. D. Sherali and W. P. Adams. A hierarchy of relaxations between the continuous and convex hull representations for zero-one programming problems. SIAM J. Discrete Math., 3:411–430, 1990.
[34] R. P. Stanley. Decompositions of rational convex polytopes. Ann. Discrete Math., 6:333–342, 1980.
[35] S. Sullivant. Compressed polytopes and statistical disclosure limitation. Tohoku Math. J. (2), 58:433–445, 2006.
[36] M. Yannakakis. Edge-deletion problems. SIAM J. Comput., 10:297–309, 1981.

Chapter 8

Free Convex Algebraic Geometry

J. William Helton, Igor Klep, and Scott McCullough

A new development is the extension of the algebraic certificates of real algebraic geometry to noncommutative polynomials, thereby giving a theory of noncommutative polynomial inequalities. Here we shall focus on convexity aspects of noncommutative real algebraic geometry, and we shall see this leads to a very rigid structure. Our subject pertains to optimization problems where the unknowns are matrices.

8.1 Introduction

This chapter is a tutorial on techniques and results in free convex algebraic geometry and free positivity. As such it also serves as a point of entry into the larger field of free real algebraic geometry and makes contact with noncommutative real algebraic geometry [27, 30, 32, 33, 38, 47, 48, 53, 59, 62, 63], free analysis and free probability (lying at the origins of free analysis; cf. [64]), and free analytic function theory and free harmonic analysis [28, 29, 34, 54, 60, 69, 70, 46].

The term free here refers to the central role played by algebras of noncommuting polynomials R<x> in free (freely noncommuting) variables x = (x1, ..., xg). A striking difference between the free and classical settings is the following Positivstellensatz.
J. William Helton was partially supported by NSF grants DMS-0700758, DMS-0757212, and DMS-1160802 and by the Ford Motor Company.
Igor Klep was supported by the Faculty Research Development Fund (FRDF) of The University of Auckland (project 3701119) and was partially supported by the Slovenian Research Agency (program P1-0222).
Scott McCullough was supported by NSF grant DMS-1101137.


Theorem 8.1 (Helton [27]). A nonnegative (suitably defined) free polynomial is a sum of squares.
The subject of free real algebraic geometry flows in two branches. One, free positivity, is an analogue of classical real algebraic geometry, a theory of polynomial inequalities embodied in Positivstellensätze. As is the case with the sum of squares result above (Theorem 8.1), generally free Positivstellensätze have cleaner statements than do their commutative counterparts; see, e.g., [53, 27, 39, 33] for a sample. Free convexity, the second branch of free real algebraic geometry, arose in an effort to unify a torrent of ad hoc techniques which came on the linear systems engineering scene in the mid 1990s. We will soon give a quick sketch of the engineering motivation, based on the slightly more complete sketch given in the survey article [13]. Mathematically, much as in the commutative case, free convexity is connected with free positivity through the second derivative: a free polynomial is convex if and only if its Hessian is positive.

The tutorial proper starts with Section 8.2. In the remainder of this introduction, motivation for the study of free positivity and convexity arising in linear systems engineering, quantum phenomena, and other subjects such as free probability is provided, as are some suggestions for further reading.

8.1.1 Motivation

While the theory is both mathematically pleasing and natural, much of the excitement of free convexity and positivity stems from its applications. Indeed, the fact that a large class of linear systems engineering problems naturally lead to free inequalities provided the main force behind the development of the subject. In this motivational section, we describe in some detail the linear systems point of view. We also give a brief introduction to other applications.

Linear systems engineering
The layout of a linear systems problem is typically specified by a signal flow diagram. Signals go into boxes and other signals come out. The boxes in a linear system contain constant coefficient linear differential equations which are specified entirely by matrices (the coefficients of the differential equations). Often many boxes appear and many signals transmit between them. In a typical problem some boxes are given, and some we get to design subject to the condition that the L²-norm of various signals must compare in a prescribed way; e.g., the input to the system has L²-norm bigger than the output. The signal flow diagram itself and corresponding problems do not specify the size of matrices involved. So ideally any algorithms derived apply to matrices of all sizes. Hence the problems are called dimension free.

An empirical observation is that system problems of this type convert to inequalities on polynomials in matrices, the form of the polynomials being determined entirely by the signal flow layout (and independent of the matrices involved). Thus the systems problem naturally leads to free polynomials and free positivity conditions.

For yet a more detailed discussion of this example, see [13, Section 4.1]. Those who read Chapter 2 saw a basic example of this in Section 2.2.1. Next we give more
of an idea of how the correspondence between linear systems and noncommutative polynomials occurs. This is done primarily with an example.

Linear systems
A linear system F is given by the constant coefficient linear differential equations

dx/dt = Ax + Bu,
y = Cx,

with the vector
x(t) at each time t being in the vector space X called the state space,
u(t) at each time t being in the vector space U called the input space,
y(t) at each time t being in the vector space Y called the output space,
and A, B, C being linear maps on the corresponding vector spaces.
Connecting linear systems

Systems can be connected in incredibly complicated configurations. We describe a simple connection and this goes a long way toward illustrating the general idea. Given two linear systems F, G, we describe the formulas for connecting them in feedback.

One basic feedback connection is described by a diagram (a feedback loop: the external input u and the feedback signal v are combined into e = u − v, which drives F; the output y of F is both the system output and the input w = y to G, whose output is v), called a signal flow diagram. Here u is a signal going into the closed loop system and y is the signal coming out. The signal flow diagram is equivalent to a collection of equations. The systems F and G themselves are, respectively, given by the linear differential equations

dx/dt = Ax + Be,        dξ/dt = Qξ + Rw,
y = Cx,                 v = Sξ.

The feedback connection is described algebraically by

w = y   and   e = u − v.

Putting these relations together gives that the closed loop system is described by differential equations

dx/dt = Ax − BSξ + Bu,
dξ/dt = Qξ + Ry = Qξ + RCx,
y = Cx,

which is conveniently described in matrix form as

d/dt [x; ξ] = [A, −BS; RC, Q] [x; ξ] + [B; 0] u,
y = [C, 0] [x; ξ],        (8.1)

where the state space of the closed loop system is the direct sum X ⊕ Ξ of the state spaces X of F and Ξ of G. From (8.1), the coefficients of the ODE are (block) matrices whose entries are (in this case simple) polynomials in the matrices A, B, C, Q, R, S.
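For concreteness, the block matrices in (8.1) are mechanical to form for any compatible data; a minimal MATLAB sketch of ours (the numerical values are arbitrary placeholders) might read:

% Hedged sketch (not from the text): closed-loop coefficients of (8.1)
A = -eye(2); B = [1; 0]; C = [1 1];        % placeholder data for F
Q = -eye(3); R = [1; 0; 0]; S = [0 0 1];   % placeholder data for G
Acl = [A, -B*S; R*C, Q];                   % closed-loop state matrix
Bcl = [B; zeros(size(Q,1), size(B,2))];    % closed-loop input matrix
Ccl = [C, zeros(size(C,1), size(Q,2))];    % closed-loop output matrix

Whatever the sizes of A, B, C, Q, R, S, the same three formulas apply.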
This illustrates the moral of the general story:

System connections produce a new system whose coefficients are matrices with entries which are noncommutative polynomials (or at worst rational expressions) in the coefficient matrices of the component systems.

Complicated signal flow diagrams give complicated matrices of noncommutative polynomials or rationals. Note that in what was said the dimensions of vector spaces and matrices A, B, C, Q, R, S never entered explicitly; the algebraic form of (8.1) is completely determined by the flow diagram. Thus, such linear systems lead to dimension free problems.

Next we turn to how noncommutative inequalities arise. The main constraint producing them can be thought of as energy dissipation, a special case of which are the Lyapunov functions already seen in Section 2.2.1.
Energy dissipation

We have a system F and want a condition which checks whether

∫_0^∞ |u|² dt ≥ ∫_0^∞ |Fu|² dt,   x(0) = 0,

holds for all input functions u ∈ L²[0, ∞], where Fu = y ∈ L²[0, ∞] in the above notation. If this holds F is called a dissipative system.

The energy dissipative condition is formulated in the language of analysis, but it converts to algebra (or at least an algebraic inequality) because of the following construction, which assumes the existence of a potential energy-like function V on the state space. A function V which satisfies V ≥ 0, V(0) = 0, and

V(x(t1)) + ∫_{t1}^{t2} |u(t)|² dt ≥ V(x(t2)) + ∫_{t1}^{t2} |y(t)|² dt

for all input functions u and initial states x(t1) is called a storage function. The displayed inequality is interpreted physically as

potential energy now + energy in ≥ potential energy then + energy out.
Assuming enough smoothness of V, we can differentiate this integral condition and use (d/dt)x(t1) = Ax(t1) + Bu(t1) to obtain a differential inequality

0 ≥ ∇V(x)(Ax + Bu) + |Cx|² − |u|²        (8.2)

on what is called the reachable set (which we do not need to define here).

In the case of linear systems, V can be chosen to be a quadratic. So it has the form V(x) = ⟨Ex, x⟩ with E ⪰ 0 and ∇V(x) = 2Ex.
Theorem 8.2. The linear system A, B, C is dissipative if inequality (8.2) holds for all u ∈ U, x ∈ X. Conversely, if A, B, C is reachable,¹ then dissipativity implies that inequality (8.2) holds for all u ∈ U, x ∈ X.
In the linear case, we may substitute ∇V(x) = 2Ex in (8.2) to obtain

0 ≥ 2(Ex)ᵀ(Ax + Bu) + |Cx|² − |u|²

for all u, x. Then maximize in u (the maximum over u is attained at u = BᵀEx) to get

0 ≥ xᵀ[EA + AᵀE + EBBᵀE + CᵀC]x.

Thus the classical Riccati matrix inequality

0 ⪰ EA + AᵀE + EBBᵀE + CᵀC   with   E ⪰ 0        (8.3)

ensures dissipativity of the system and, it turns out, is also implied by dissipativity when the system is reachable.

It is inequality (8.3), applied in many many contexts, which leads to positive semidefinite inequalities throughout all of linear systems theory.
As an aside we return to the very special case of dissipativity, namely Lyapunov stability, described in Section 2.2.1. Our discussion starts with the miracle of inequality (8.3): when B = 0 it becomes the Lyapunov inequality. However, this is merely magic (no miracle whatsoever); the trick being that if the input u is identically zero, then dissipativity implies stability. The converse is less intuitive, but true: stability of ẋ = Ax implies the existence of a virtual potential energy V(x) = ⟨Ex, x⟩ and output C making the virtual system dissipative.
Schur complements and linear matrix inequalities

Using Schur complements, the Riccati inequality of (8.3) is equivalent to the inequality

L(E) := [ EA + AᵀE + CᵀC   EB ]
        [ BᵀE               −I ]  ⪯ 0.

¹A mild technical condition.

Here A, B, C describe the system and E is an unknown matrix. If the system is reachable, then A, B, C is dissipative if and only if L(E) ⪯ 0 and E ⪰ 0.

The key feature in this reformulation of the Riccati inequality is that L(E) is linear in E, so the inequality L(E) ⪯ 0 is a linear matrix inequality in E.
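To make this concrete, here is a hedged YALMIP sketch of ours (not from the text; the system matrices are placeholder data chosen so that the system is dissipative) which searches for a storage matrix E:

% Hedged sketch: solve the LMI L(E) <= 0, E >= 0 for sample data.
A = [-2 0; 0 -3]; B = [1; 1]; C = [1 0];   % placeholder system data
E = sdpvar(2,2);                           % symmetric matrix variable
LE = [E*A + A'*E + C'*C, E*B; B'*E, -eye(1)];
solvesdp([LE <= 0, E >= 0], trace(E));     % any feasible E is a certificate
double(E)

The objective trace(E) is immaterial here; feasibility alone certifies dissipativity.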
Putting it together

We have shown two ingredients of linear system theory, connection laws (algebraic) and dissipation (inequalities), but have yet to put them together. It is in fact a very mechanical procedure. After going through the procedure one sees that the problem a software toolbox designer faces is this:

(GRAIL) Given a symmetric matrix of noncommutative polynomials

p(a, x) = [ p_ij(a, x) ]_{i,j=1}^k

and a tuple of matrices A, provide an algorithm for finding X making p(A, X) ⪰ 0 or, better yet, as large as possible.

Algorithms for doing this are based on numerical optimization or a close relative, so even if they find a local solution there is no guarantee that it is global. If p is convex in X, then these problems disappear.
Thus, systems problems described by signal flow diagrams produce a mess of matrix inequalities with some matrices known and some unknown and the constraints that some polynomials are positive semidefinite. The inequalities can get very complicated as one might guess, since signal flow diagrams get complicated. These considerations thus naturally lead to the emerging subject of free real algebraic geometry, the study of noncommutative (free) polynomial inequalities, and free semialgebraic sets. Indeed, much of what is known about this very new subject is touched on in this chapter.

The engineer would like for these polynomial inequalities to be convex in the unknowns. Convexity guarantees that local optima are global optima (finding global optima is often of paramount importance) and facilitates numerics.

Hence the major issues in linear systems theory are as follows:
1. Which problems convert to a convex matrix inequality? How does one do the conversion?
2. Find numerics which will solve large convex problems. How do you use special structure, such as most unknowns are matrices and the formulas are all built of noncommutative rational functions?
3. Are convex matrix inequalities more general than linear matrix inequalities?

The mathematics here can be motivated by the problem of writing a toolbox for engineers to use in designing linear systems. What goes in such toolboxes are algebraic formulas with matrices A, B, C unspecified and reliable numerics for solving them when a user does specify A, B, C as matrices. A user who designs a
controller for a helicopter puts in the mathematical systems model for his helicopter and puts in matrices; for example, A is a particular 8 × 8 real matrix, etc. Another user who designs a satellite controller might have a 50-dimensional state space and of course would pick completely different A, B, C. Essentially any matrices of any compatible dimensions can occur. Any claim we make about our formulas must be valid regardless of the size of the matrices plugged in.

The toolbox designer faces two completely different tasks. One is manipulation of algebraic inequalities; the other is numerical solutions. Often the first is far more daunting since the numerics is handled by some standard package (although for numerics problem size is a demon). Thus there is a great need for algebraic theory. Most of this chapter bears on questions like (3) above, where the unknowns are matrices. The first two questions will not be addressed. Here we treat (3) when there are no a variables. When there are a variables, see [26, 1]. Thus we shall consider polynomials p(x) in free noncommutative variables x and focus on their convexity on free semialgebraic sets.

What are the implications of our study for engineering? Herein you will see strong results on free convexity, but what do they say to an engineer? We foreshadow the forthcoming answer by saying it is fairly negative, but postpone further disclosure till the final page of these writings, not so much to promote suspense but for the conclusion to arrive after you have absorbed the theory.
Quantum phenomena

Free Positivstellensätze (algebraic certificates for positivity), of which Theorem 8.1 is the grandfather, have physical applications. Applications to quantum physics are explained by Pironio, Navascués, and Acín [59], who also consider computational aspects related to noncommutative sums of squares. How this pertains to operator algebras is discussed by Klep and Schweighofer in [47]. The important Bessis–Moussa–Villani conjecture (BMV) from quantum statistical mechanics is tackled in [48, 7]. Doherty et al. [12] employ noncommutative positivity and the Positivstellensatz [37] of the first and the third author to consider the quantum moment problem and multiprover games.

A particularly elegant recent development, independent of the line of history containing the work in this chapter, was initiated by Effros. The classic perspective transformation carries a function on R^n to a function on R^{n+1}. It is used for various purposes, one being in algebraic geometry to produce blowups of singularities, thereby removing them. It has the property that convex functions map to convex functions. What about convex functions on free variables? This question was asked by Effros and settled affirmatively in [18] for natural cases as a way to show that quantum relative entropy is convex. Subsequently, [19] showed that the perspective transformation in free variables always maps convex functions to convex functions.
Miscellaneous applications

A number of other scientific disciplines use free analysis, though less systematically than in free real algebraic geometry.

Free probability. Voiculescu developed it to attack one of the purest of mathematical questions regarding von Neumann algebras. From the outset (about 20 years ago) it was elegant and it came to have great depth. Subsequently, it was discovered to bear forcefully and effectively on random matrices. The area is vast, so we do not dive in but refer the reader to an introduction [64, 71].

Nonlinear engineering systems. A classical technique in nonlinear systems theory developed by Fliess is based on manipulation of power series with noncommutative variables (the Chen series). The area has a new impetus coming from the problem of data compression, so now is a time when these correspondences are being worked out; cf. [21, 22, 52].

8.1.2 Further Reading

We pause here to offer some suggestions for further reading. For further engineering motivation we recommend the paper [65] or the longer version [66] for related new directions. Descriptions of Positivstellensätze are in the surveys [31, 13, 43, 63], with the first three also briskly touring free convexity. The survey article [40] is aimed at engineers.

Noncommutative is a broad term, encompassing essentially all algebras. In between the extremes of commutative and free lie many important topics, such as Lie algebras, Hopf algebras, quantum groups, C*-algebras, von Neumann algebras, etc. For instance, there are elegant noncommutative real algebraic geometry results for the Weyl algebra [62]; cf. [63].
8.1.3 Guide to the Chapter

The goal of this tutorial is to introduce the reader to the main results and techniques used to study free convexity. Fortunately, the subject is new and the techniques not too numerous, so that one can quickly become an expert.

The basics of free, or noncommutative, polynomials and their evaluations are developed in Section 8.2. The key notions are positivity and convexity for free polynomials. The principal fact is that the second directional derivative (in direction h) of a free convex polynomial is a positive quadratic polynomial in h (just like in the commutative case). Free quadratic (in h) polynomials have a Gram-type representation which thus figures prominently in studying convexity. The nuts and bolts of this Gram representation and some of its consequences, including Theorem 8.1, are the subjects of Sections 8.4 and 8.5, respectively.

The Gram representation techniques actually require only a small amount of convexity, and thus there is a theory of geometry on free varieties having signed (e.g., positive) curvature. Some details are in Section 8.6.

A couple of free real algebraic geometry results which have a heavy convexity component are described in the last section, Section 8.7. The first is an optimal free convex Positivstellensatz which generalizes Theorem 8.1. The second says that free convex semialgebraic sets are free spectrahedra, giving another example of the much more rigid structure in the free setting.

Section 8.3 introduces software which handles free noncommutative computations. You may find it useful in your free studies.

In what follows, mildly incorrectly but in keeping with the usage in the literature, the terms noncommutative and free are used synonymously.

8.2 Basics of Noncommutative Polynomials and Their Convexity

This section treats the basics of polynomials in noncommutative variables, noncommutative differential calculus, and noncommutative inequalities. There is also a brief introduction to noncommutative rational functions and inequalities.

8.2.1 Noncommutative Polynomials

Before turning to the formalities, we give, by examples, an informal introduction to noncommutative polynomials.

A noncommutative polynomial p is a polynomial in a finite set x = (x1, ..., xg) of relation free variables. A canonical example, in the case of two variables x = (x1, x2), is the commutator

c(x1, x2) = x1x2 − x2x1.        (8.4)

It is precisely the fact that x1 and x2 do not commute that makes c nonzero.

While a commutative polynomial q ∈ R[t1, t2] is naturally evaluated at points t ∈ R², noncommutative polynomials are naturally evaluated on tuples of square matrices. For instance, with

X1 = [0 1; 1 0],   X2 = [1 0; 0 0],

and X = (X1, X2), one finds

c(X) = [0 −1; 1 0].
Importantly, c can be evaluated on any pair (X, Y) of symmetric matrices of the same size. (Later in the section we will also consider evaluations involving not necessarily symmetric matrices.) Note that if X and Y are n × n, then c(X, Y) is itself an n × n matrix. In the case of c(x, y) = xy − yx, the matrix c(X, Y) = 0 if and only if X and Y commute. In particular, c is zero on R² (2-tuples of 1 × 1 matrices).

For another example, if d(x1, x2) = 1 + x1x2x1, then with X1 and X2 as above, we find

d(X) = I2 + X1X2X1 = [1 0; 0 2].
Note that although X is a tuple of symmetric matrices, it need not be the case that p(X) is symmetric. Indeed, the matrix c(X) above is not. In the present context, we say that p is symmetric if p(X) is symmetric whenever X = (X1, ..., Xg) is a tuple of symmetric matrices. Another more algebraic definition of symmetric for noncommutative polynomials appears in Section 8.2.2.
Noncommutative convexity for polynomials

Many standard notions for polynomials, and even functions, on R^g extend to the noncommutative setting, though often with unexpected ramifications. For example, the commutative polynomial q ∈ R[t1, t2] is convex if, given s, t ∈ R²,

(q(s) + q(t))/2 ≥ q((s + t)/2).

There is a natural ordering on symmetric n × n matrices defined by X ⪰ Y if the symmetric matrix X − Y is positive semidefinite, i.e., if its eigenvalues are all nonnegative. Similarly, X ≻ Y if X − Y is positive definite, i.e., all its eigenvalues are positive. This order yields a canonical notion of convex noncommutative polynomial. Namely, a symmetric polynomial p is convex if for each n and each pair of g-tuples of n × n symmetric matrices X = (X1, ..., Xg) and Y = (Y1, ..., Yg), we have

(p(X) + p(Y))/2 ⪰ p((X + Y)/2).

Equivalently,

(p(X) + p(Y))/2 − p((X + Y)/2) ⪰ 0.        (8.5)

Even in one variable, convexity for a noncommutative polynomial is a serious constraint. For instance, consider the polynomial x⁴. It is symmetric, but with

X = [4 2; 2 2]   and   Y = [2 0; 0 0]

it follows that

(X⁴ + Y⁴)/2 − ((1/2)X + (1/2)Y)⁴ = [164 120; 120 84]

is not positive semidefinite. Thus x⁴ is not convex.
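A quick numerical confirmation of this computation (our sketch, not from the text):

% Verify the 2x2 counterexample: D should have a negative eigenvalue.
X = [4 2; 2 2]; Y = [2 0; 0 0];
D = (X^4 + Y^4)/2 - ((X + Y)/2)^4;   % equals [164 120; 120 84]
eig(D)                               % one eigenvalue is negative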


Noncommutative polynomial inequalities and convexity

The study of polynomial inequalities, real algebraic geometry or semialgebraic geometry, has a noncommutative version. A basic open semialgebraic set is a subset of R^g defined by a list of polynomial inequalities; i.e., a set S is a basic open semialgebraic set if

S = {t ∈ R^g : p1(t) > 0, ..., pk(t) > 0}

for some polynomials p1, ..., pk ∈ R[t1, ..., tg].

Figure: the TV screen, ncTV(1) = {(t1, t2) ∈ R² : 1 − t1⁴ − t2⁴ > 0}.


Because noncommutative polynomials are evaluated on tuples of matrices, a noncommutative (free) basic open semialgebraic set is a sequence. For positive integers n, let (S^{n×n})^g denote the set of g-tuples of n × n symmetric matrices. Given symmetric noncommutative polynomials p1, ..., pk, let

P(n) = {X ∈ (S^{n×n})^g : p1(X) ≻ 0, ..., pk(X) ≻ 0}.

The sequence P = (P(n)) is then a noncommutative (free) basic open semialgebraic set. The sequence

ncTV(n) = {X ∈ (S^{n×n})² : I_n − X1⁴ − X2⁴ ≻ 0}

is an entertaining example. When n = 1, ncTV(1) is a subset of R² often called the TV screen. Numerically it can be verified, though it is rather tricky to do so (see Exercise 8.23), that the set ncTV(2) is not a convex set. An analytic proof that ncTV(n) is not a convex set for some n can be found in [15]. It also follows by combining results in [38] and [44]. For properties of the classical commutative TV screen, see Chapters 5 and 6 of this book.
Example 8.3. Let p_ε := ε² − Σ_{j=1}^g x_j². Then the ε-neighborhood of 0,

N_ε := ∪_{n∈N} {X ∈ (S^{n×n})^g : p_ε(X) ≻ 0},

is an important example of a noncommutative basic open semialgebraic set.
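For instance (a sketch of ours, with arbitrary sample matrices), membership of a tuple X in N_ε(n) is just a matrix positivity test:

% Hedged sketch: check whether X = (X1, X2) lies in the eps-neighborhood of 0.
X1 = [0 0.1; 0.1 0]; X2 = [0.2 0; 0 -0.1]; epsilon = 0.5;
P = epsilon^2*eye(2) - X1^2 - X2^2;
all(eig(P) > 0)    % logical 1 exactly when p_eps(X) is positive definite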

8.2.2 Noncommutative Polynomials: The Formalities

We now take up the formalities of noncommutative polynomials, their evaluations, convexity, and positivity.
Let x = (x1, ..., xg) denote a g-tuple of free noncommuting variables and let R&lt;x&gt; denote the associative R-algebra freely generated by x, i.e., the elements of R&lt;x&gt; are polynomials in the noncommuting variables x with coefficients in R. Its elements are called (noncommutative) polynomials. An element of the form aw, where 0 ≠ a ∈ R and w is a word in the variables x, is called a monomial and a its coefficient. Hence words are monomials whose coefficient is 1. Note that the empty word ∅ plays the role of the multiplicative identity for R&lt;x&gt;.

There is a natural involution * on R&lt;x&gt; that reverses words. For example, (2 − 3x1²x2x3)* = 2 − 3x3x2x1². A polynomial p is a symmetric polynomial if p* = p. Later we will see that this notion of symmetric is equivalent to that in the previous subsection. For now we note that of

c(x) = x1x2 − x2x1,   j(x) = x1x2 + x2x1,

j is symmetric, but c is not. Indeed, c* = −c. Because xj* = xj we refer to the variables as symmetric variables. Occasionally we emphasize this point by writing R&lt;x = x*&gt; for R&lt;x&gt;.

The degree of a noncommutative polynomial p, denoted deg(p), is the length of the longest word appearing in p. For instance the polynomials c and j above both have degree 2 and the degree of

r(x) = 1 − 3x1x2 − 3x2x1 − 2x1²x2⁴x1²

is 8. Let R&lt;x&gt;_k denote the polynomials of degree at most k.
Noncommutative matrix polynomials

Given positive integers d, d′, let R^{d×d′}&lt;x&gt; denote the d × d′ matrices with entries from R&lt;x&gt;. Thus elements of R^{d×d′}&lt;x&gt; are matrix-valued noncommutative polynomials. The involution on R&lt;x&gt; naturally extends to a mapping * : R^{d×d′}&lt;x&gt; → R^{d′×d}&lt;x&gt;. In particular, if

P = [p_{i,j}]_{i,j=1}^{d,d′} ∈ R^{d×d′}&lt;x&gt;,

then

P* = [p*_{j,i}]_{i,j=1}^{d′,d} ∈ R^{d′×d}&lt;x&gt;.

In the case that d = d′, such a P is symmetric if P* = P.
Linear pencils

Given a positive integer n, let S^{n×n} denote the real symmetric n × n matrices. For A0, A1, ..., Ag ∈ S^{d×d}, the expression

L(x) = A0 + Σ_{j=1}^g Aj xj ∈ S^{d×d}&lt;x&gt;        (8.6)
in the noncommuting variables x is a symmetric affine linear pencil. In other words, these are precisely the symmetric degree one matrix-valued noncommutative polynomials. If A0 = I, then L is monic. If A0 = 0, then L is a linear pencil. The homogeneous linear part Σ_{j=1}^g Aj xj of a linear pencil L as in (8.6) will be denoted by L^{(1)}.
Example 8.4. Let

0 1 0
1 0 0
A1 =
0 0 0
0 0 0

0
0
,
0
0

Then

0
0
A2 =
0
0

0
0
1
0

1

x1
I+
Aj xj =
0
0

0
1
0
0

x1
1
x2
0

0
0
,
0
0

0
x2
1
x3

0
0
A3 =
0
0

0
0
0
0

0
0
0
1

0
0
.
1
0

0
0

x3
1

is the corresponding monic ane linear pencil.


Polynomial evaluations


If p Rdd <x> is a noncommutative polynomial and X (Snn )g , the evalu


ation p(X) Rdnd n is dened by simply replacing xi by Xi . Throughout we
use lowercase letters for variables and the corresponding capital letter for matrices
substituted for that variable.
2 23
Example 8.5. Suppose p(x) = Ax1 x2 where A = 4
3 0 . That is,


4x1 x2 2x1 x2
.
p(x) =
3x1 x2
0
Thus p R22 <x> and one example of an evaluation is
&
 
'
&

'
&
0 1
1 0
0 1 1 0
0
p
,
=A
= A
1 0
0 1
1 0 0 1
1

0 4 0 2
4 0 2 0

=
0 3 0 0 .
3 0 0 0

'
1
0

Similarly, if p is a constant matrix-valued noncommutative polynomial, p(x) =


A, and X (Snn )g , then p(X) = A In . Here we have taken advantage of the
usual tensor (or Kronecker) product of matrices. Given an  matrix A = (Ai,j )
and an n n matrix B, by denition, A B is the n n block matrix
3
2
A B = Ai,j B ,

i
i

354

main
2012/11/1
page 354
i

Chapter 8. Free Convexity

with  matrix entries. We have reserved the tensor product notation for the
tensor product of matrices and have eschewed the strong temptation of using A x
in place of Ax when x is one of the variables.
Proposition 8.6. Suppose p R<x>. In increasing levels of generality,
1. if p(X) = 0 for all n and all X (Snn )g , then p = 0;
2. if there is a nonempty noncommutative basic open semialgebraic set O such that
p(X) = 0 on O (meaning for every n and X O(n), p(X) = 0), then p = 0;
3. there is an N, depending only upon the degree of p, so that for any n N if
there is an open subset O (Snn )g with p(X) = 0 for all X O, then p = 0.
Proof. See Exercises 8.28, 8.31, and 8.34.
Exercise 8.7. Use Proposition 8.6 to prove the following statement.
Proposition 8.8. Suppose p R<x>. Show p(X) is symmetric for every n and
every X (Snn )g if and only if p = p.

8.2.3

Noncommutative Convexity Revisited and


Noncommutative Positivity

Now we return with a bit more detail to our main theme, convexity. A symmetric
polynomial p is matrix convex if, for each positive integer n, each pair of g-tuples
X = (X1 , . . . , Xg ) and Y = (Y1 , . . . , Yg ) in (Snn )g , and each 0 t 1,
$
%
tp(X) + (1 t)p(Y ) p tX + (1 t)Y  0,
where, for an n n matrix A Rnn , the notation A  0 means A is positive
semidenite. Synonyms for matrix convex include both noncommutative convex
and simply convex.
Exercise 8.9. Show that the denition here of (matrix) convex is equivalent to
that given in (8.5) in the informal introduction to noncommutative polynomials.
As we have already seen in the informal introduction to noncommutative
polynomials, even in one variable, convexity in the noncommutative setting differs from convexity in the commutative case because here Y need not commute
with X. Thus, although the polynomial x4 is a convex function of one real variable,
it is not matrix convex. On the other hand, to verify that x2 is a matrix convex
polynomial, observe that
tX 2 + (1 t)Y 2 (tX + (1 t)Y )2
= t(1 t)(X 2 XY Y X + Y 2 ) = t(1 t)(X Y )2  0.
A polynomial p R<x> is matrix positive, synonymously noncommutative
positive or simply positive, if p(X)  0 for all tuples X = (X1 , . . . , Xg ) (Snn )g .

i
i

8.2. Basics of Noncommutative Polynomials and Their Convexity

main
2012/11/1
page 355
i

355

A polynomial p is a sum of squares if there exists k N and polynomials h1 , . . . , hk


such that
k

p=
hj hj .
j=1

Because, for a matrix A, the matrix A A is positive semidenite, if p is a sum of


squares, then p is positive. Though we will not discuss its proof in this chapter, we
mention that, in contrast with the commutative case, the converse is true [27, 53].
Theorem 8.10. If p R<x> is positive, then p is a sum of squares.
As for convexity, note that p(x) is convex if and only if the polynomial q(x, y)
in 2g noncommutative variables given by
.x + y /
%
1$
q(x, y) = p(x) + p(y) p
2
2
is positive.

8.2.4

Directional Derivatives Versus Noncommutative


Convexity and Positivity

Matrix convexity can be formulated in terms of positivity of the Hessian, just as in


the case of a real variable. Thus we take a few moments to develop a very useful
noncommutative calculus.
Given a polynomial p R<x>, the th directional derivative of p in the
direction h is

d p(x + th)
( )
.
p (x)[h] :=

dt
t=0

Thus p

( )

(x)[h] is the polynomial that evaluates to



d p(X + tH)
for every choice of X, H (Snn )g .

dt
t=0

We let p (x)[h] denote the rst derivative, and the Hessian, denoted p (x)[h] of
p(x), is the second directional derivative of p in the direction h.
Equivalently, the Hessian of p(x) can also be dened as the part of the polynomial
$
%
r(x)[h] := 2 p(x + h) p(x)
in
R<x>[h] := R < x1 , . . . , xg , h1 , . . . , hg >
that is homogeneous of degree two in h.
If p = 0, that is, if p = p(x) is a noncommutative polynomial of degree two
or more, then the polynomial p (x)[h] in the 2g variables x1 , . . . , xg , h1 . . . , hg is
homogeneous of degree 2 in h and has degree equal to the degree of p.

i
i

356

main
2012/11/1
page 356
i

Chapter 8. Free Convexity

Example 8.11.
(1) The Hessian of the polynomial p = x21 x2 is
p (x)[h] = 2(h21 x2 + h1 x1 h2 + x1 h1 h2 ).
(2) The Hessian of the polynomial f (x) = x4 (just one variable) is
f  (x)[h] = 2(h2 x2 + hxhx + hx2 h + xhxh + xh2 x + x2 h2 ).
Noncommutative convexity is neatly described in terms of the Hessian.
Lemma 8.12. p R<x> is noncommutative convex if and only if p (x)[h] is
noncommutative positive.
Proof. See Exercise 8.26.

8.2.5

Symmetric, Free, Mixed, and Classes of Variables

To this point, our variables x have been symmetric in the sense that, under the
involution, xj = xj . The corresponding polynomials, elements of R<x> are then
the noncommutative analogue of polynomials in real variables, with evaluations
at tuples in Snn . In various applications and settings it is natural to consider
noncommutative polynomials in other types of variables.
Free variables
The noncommutative analogue of polynomials in complex variables is obtained by
allowing evaluations on tuples X of not necessarily symmetric matrices. In this case,
the involution must be interpreted dierently, and the variables are called free.
In this setting, given the noncommutative variables x = (x1 , . . . , xg ), let x =

(x1 , . . . , xg ) denote another collection of noncommutative variables. On the ring
R<x, x > dene the involution  by requiring xj  xj ; xj  xj ;  reverses the
order of words; and linearity. For instance, for
q(x) = 1 + x1 x2 x2 x1 R<x, x >,
we have

q  (x) = 1 + x2 x1 x1 x2 .

Elements of R<x, x > are polynomials in free variables, and in this setting the
variables themselves are free.
A polynomial p R<x, x > is symmetric provided p = p. In particular, q
above is not symmetric, but
p = 1 + x1 x2 + x2 x1

(8.7)

is.

i
i

8.2. Basics of Noncommutative Polynomials and Their Convexity

main
2012/11/1
page 357
i

357

A polynomial p R<x, x > is analytic if there are no transposes, i.e., if p is


a polynomial in x alone.
Elements of R<x, x > are naturally evaluated on tuples X = (X1 , . . . , Xg )
g
(R ) . For instance, if p is the polynomial in (8.7) and X = (X1 , X2 ) (R22 )2 ,
where


0 0
= X2 ,
X1 =
1 0
then


p(X) =


3 0
.
0 1

The space Rdd <x, x > is dened by analogy with Rdd <x>, and evaluation

of elements in Rdd <x, x > at a tuple X (R )g is dened in the obvious way.
Exercise 8.13. State and prove analogues of Propositions 8.6 and 8.8 for R<x, x >
and evaluations from (R )g .
Mixed variables
At times it is desirable to mix free and symmetric variables. We wont introduce
notation for this situation, as it will generally be understood from the context. Here
are some examples:
Example 8.14.
3
p(x) = x1 x1 + x2 + x1 x2 x1 ,
4

x2 = x2 ;

ric(a1 , a2 , x) = a1 x + xa1 xa2 a2 x,

(8.8)

x = x .

In the rst case x1 is free, but x2 is symmetric; and in the second a1 and a2 are
free, but x is symmetric. Two additional remarks are in order about the second
polynomial. First, it is a Riccati polynomial ubiquitous in control theory. Second,
we have separated the variables into two classes of variables, the a variables and the
x variable(s); thus p R<a, x = x >. In applications, the a variables can be chosen
to represent known (system parameters), while the x variables are unknown(s). Of
course, it could be that some of the a variables are symmetric and some free and
ditto for the x variables.
Example 8.15. Various directional derivatives of p in (8.8) are
3
3
Dx1 p(x)[h1 ] = h1 x1 +x1 h1 + h1 x2 x1 + x1 x2 h1 ,
4
4

3
Dx2 p(x)[h2 ] = h2 + x1 h2 x1 ,
4

3
3
3
Dx p(x)[h] = h1 x1 + x1 h1 + h2 + h1 x2 x1 + x1 x2 h1 + x1 h2 x1 ,
4
4
4
Continuing with the variable class warfare, consider the following matrixvalued example.

i
i

358

main
2012/11/1
page 358
i

Chapter 8. Free Convexity

Example 8.16. Let




a x + xa1
L(a1 , a2 , x) = 1
xa2


a2 x
.
1

We consider L R22 <a, x = x >; i.e., the a variables are free, and the x-variables
symmetric. Note that L is linear in x if we consider a1 , a2 xed. Of course, if a1 , a2 ,
and x are all scalars, then using Schur complements tells us there is a close relation
between L in this example and the Riccati of the previous example.

8.2.6

Noncommutative Rational Functions

While it is possible to dene noncommutative functions [67, 64, 69, 70, 60, 61, 46,
28, 29], in this section we content ourselves with a relatively informal discussion of
noncommutative rational functions [10, 11, 41, 45].
Rational functions, a gentle introduction
Noncommutative rational expressions are obtained by allowing inverses of polynomials. An example is the discrete time algebraic Riccati equation
r(a, x) = a1 xa1 (a1 xa2 )a1 (a3 + a2 xa2 )1 (a2 xa1 ) + a4 ,

x = x .

It is a rational expression in the free variables a and the symmetric variable x, as


is r1 . An example, in free variables, which arises in operator theory is
s(x) = x (1 xx )1 .

(8.9)

Thus, we dene (scalar) noncommutative rational expressions for free noncommutative variables x by starting with noncommutative polynomials and then
applying successive arithmetic operationsaddition, multiplication, and inversion.
We emphasize that an expression includes the order in which it is composed, and
no two distinct expressions are identied, e.g., (x1 ) + (x1 ), (1) + (((x1 )1 )(x1 )),
and 0 are dierent noncommutative rational expressions.
Evaluation on polynomials naturally extends to rational expressions. If r is a
rational expression in free variables and X (R )g , then r(X) is denedin the
obvious wayas long as any inverses appearing actually exist. Indeed, our main
interest is in the evaluation of a rational expression. For instance, for the polynomial
s above in one free variable, s(X) is dened as long as I XX  is invertible and
in this case,
s(X) = X  (I XX )1 .
Generally, a noncommutative rational expression r can be evaluated on a g-tuple X
of n n matrices in its domain of regularity, dom r, which is dened as the set of
all g-tuples of square matrices of all sizes such that all the inverses involved in the
calculation of r(X) exist. For example, if r = (x1 x2 x2 x1 )1 , then dom r = {X =
(X1 , X2 ) : det(X1 X2 X2 X1 ) = 0}. We assume that dom r = . In other words,

i
i

8.2. Basics of Noncommutative Polynomials and Their Convexity

main
2012/11/1
page 359
i

359

when forming noncommutative rational expressions we never invert an expression


that is nowhere invertible.
Two rational expressions r1 and r2 are equivalent if r1 (X) = r2 (X) at any X
where both are dened. For instance, for the rational expression t in one free
variable,
t(x) = (1 x x)1 x ,
and s from (8.9), it is an exercise to check that s(X) is dened if and only if t(X)
is and moreover in this case s(X) = t(X). Thus s and t are equivalent rational
expressions. We call an equivalence class of rational expressions a rational function.
The set of all rational functions will be denoted by R<x
( >.
)
Here is an interesting example of a noncommutative rational function with
nested inverses. It is taken from [2, Theorem 6.3].
Example 8.17. Consider two free variables x, y. For any r R<x,
( y>
) let
$
% $
%1
W (r) := c x, c(x, r)2 c x, c(x, r)1
R<x,
( y >.
)
(8.10)
Recall that c denotes the commutator (8.4). Bergmans noncommutative rational
function is given by
. $
. $
$
%
%1 /
%1 /
b := W (y) W c(x, y) W c x, c(x, y)
W c x, c(x, c(x, y))
R<x,
( y >.
)

(8.11)

Exercise 8.18.
$ Consider the
% function W from (8.10). Let R, X be n n matrices
and assume c X, c(X, R)1 exists and is invertible. Prove the following:
(1) If n = 2, then W (R) = 0.
(2) If n = 3, then W (R) = det(c(X, R)).
Exercise 8.19. Consider Bergmans rational function (8.11).
(1) Show that on a dense set of 2 2 matrices (X, Y ), b(X, Y ) = 0.
(2) Prove that on a dense set of 3 3 matrices (X, Y ), b(X, Y ) = 1.
The moral of Exercise 8.19 is that, unlike in the case of polynomial identities,
a noncommutative rational function that vanishes on (a dense set of) 3 3 matrices
need not vanish on (a dense set of) 2 2 matrices.
Matrices of rational functions; LDL
One of the main ways noncommutative rational functions occur in systems engineering is in the manipulation of matrices of polynomials. Extremely important is
the LDL decomposition. Consider the 2 2 matrix with noncommutative entries


a b
,
M=
b c

i
i

360

main
2012/11/1
page 360
i

Chapter 8. Free Convexity

where a = a . The entries themselves could be noncommutative polynomials or


even rational functions. If a is not zero, then M has the following decomposition:




I
0 a
0
I a1 b
M = LDL =
.
ba1 I 0 c ba1 b 0
I
Note that this formula holds in the case that c is itself a (square) matrix noncommutative rational function and b (and thus b ) are vector-valued noncommutative
rational functions. On the other hand, if both a = c = 0, then M is the block
matrix


0 b
M= 
.
b
0
If M is a k k matrix, then iterating this procedure produces a decomposition
of a permutation M  of M of the form M  = LDL , where D and L have
the form

0
0
0
0 0
d1 0

.. . .
.
. 0
0

0 0

0 dk
0

0 0

0 0
D=
(8.12)

0 . . . 0 Dk+1

.
.
.
.
..
..
..
..
0 0

0 0
0
D 0
0 0
0

0 E
and

L=

0
..
.

0 0
1 0
I2

0
0
0

0
0
0
..
.

0
I2

0
,

0
Ia

(8.13)

where dj are symmetric rational functions, and the Dj are nonzero 2 2 matrices
of the form


0 bj
Dj = 
.
bj 0
E is a square 0 matrix (possibly of size 0 0 and thus absent), and I2 is the 2 2
identity and the s represent possibly nonzero rational expressions (in some cases
matrices of rational functions), some of the 0s are zero matrices (of the appropriate
sizes), and a is the dimension of the space that E acts upon. The permutation
is necessary in cases where the procedure hits a 0 on the diagonal, necessitating a
permutation to bring a nonzero diagonal entry into the pivot position.

i
i

8.2. Basics of Noncommutative Polynomials and Their Convexity

main
2012/11/1
page 361
i

361

Theorem 8.20. Suppose M (x) R<x


( >
) is symmetric, and M  = LDL
where L, D are matrices with noncommutative rational entries as in (8.13)
and (8.12) and L, respectively. If n is a positive integer and X (Snn )g is in the
domains of both L and D, then M (X) is positive semidenite if and only if D(X)
is positive semidenite.
Proof. The proof is an easy exercise based on the fact that a square block lower
triangular matrix whose diagonal blocks are invertible is itself invertible. In this
case, L(X) is block lower triangular, with the n n identity In as each diagonal
entry. Thus M (X) and D(X) are congruent and thus have the same number of
negative eigenvalues.
Remark 8.21. Note that if D has any 2 2 blocks Dj , then D(X)  0 if and
only if each Dj (X) = 0. Thus, if D has any 2 2 blocks, generically D(X), and
hence M (X), is not positive semidenite. (Recall that we assume, without loss of
generality, that Dj are not zero.)
More on rational functions
The matrix positivity and convexity properties of noncommutative rational functions go just like those for polynomials. One only tests a rational function r on
matrices X in its domain of regularity. The denition of directional derivatives
goes as before and it is easy to compute them formally. There are issues of equivalences which we avoid here, instead referring the reader to [10, 45] or our treatment
in [41].
We emphasize that proving the assertions above takes considerable eort, because of dealing with the equivalence relation. In practice one works with rational
expressions, and calculations with noncommutative rational expressions themselves
are straightforward. For instance, computing the derivative of a symmetric noncommutative rational function r leads to an expression of the form

 k

a (x)hb (x) ,
Dr(x)[h] = symmetrize
=1

where a , b are noncommutative rational functions of x, and the symmetrization of



a (not necessarily symmetric) rational expression s is s+s
2 .

8.2.7

Exercises

Section 8.3 gives a very brief introduction on noncommutative computer algebra


and some might enjoy playing with computer algebra in working some of these
exercises.
Dene for use in later exercises the noncommutative polynomials
p = x21 x22 x1 x2 x1 x2 x2 x1 x2 x1 x22 x21 ,
q = x1 x2 x3 + x2 x3 x1 + x3 x1 x2 x1 x3 x2 x2 x1 x3 x3 x2 x1 ,
s = x1 x3 x2 x2 x3 x1 .

i
i

362

main
2012/11/1
page 362
i

Chapter 8. Free Convexity

Exercise 8.22.
(a) What is the derivative with respect to x1 in direction h1 of q and s?
(b) Concerning the formal derivative with respect to x1 in direction h1 ,
1
(i) show the derivative of r(x1 ) = x1 1 is x1
1 h1 x1 ;

(ii) what is the derivative of u(x1 , x2 ) = x2 (1 + 2x1 )1 ?


Exercise 8.23. Consider the polynomials p, q, s and rational functions r, u from
above.
(a) Evaluate the polynomials p, q, s on some matrices of size 1 1, 2 2, and 3 3.
(b) Redo part (a) for the rational functions r, u.
Try to use Mathematica or MATLAB.
Exercise 8.24. Show that c = x1 x2 x2 x1 is not symmetric by nding n and
X = (X1 , X2 ) such that c(X) is not a symmetric matrix.
Exercise 8.25. Consider the following polynomials in two and three variables,
respectively:
h1 = c2 = (x1 x2 )2 x1 x22 x1 x2 x21 x2 + (x2 x1 )2 ,
h2 = h1 x3 x3 h1 .
(a) Compute h1 (X1 , X2 ) and h2 (X1 , X2 , X3 ) for several choices of 2 2 matrices
Xj . What do you nd? Can you formulate and prove a statement?
(b) What happens if you plug in 3 3 matrices into h1 and h2 ?
Exercise 8.26. Prove that a symmetric noncommutative polynomial p is matrix
convex if and only if the Hessian p (x)[h] is matrix positive by completing the
following exercise.
Fix n, suppose is a positive linear functional on Snn , and consider
f = p : (Snn )g R.
(a) Show f is convex if and only if

d2 f (X+tH)
dt2

0 at t = 0 for all X, H (Snn )g .

Given v Rn , consider the linear functional (M ) := v  M v and let fv = p.


(b) Geometric: Fix n. Show, each fv satises the convexity inequality if and only
if p satises the convexity inequality on (Snn )g .
(c) Analytic: Show, for each v Rn , fv (X)[H] 0 for every X, H (Snn )g if
and only if p (X)[H]  0 for every X, H (Snn )g .

i
i

8.2. Basics of Noncommutative Polynomials and Their Convexity


Exercise 8.27. For n N let
sn =

main
2012/11/1
page 363
i

363

sign( )x (1) x (n)

Symn

be a polynomial of degree n in n variables. Here Symn denotes the symmetric group


on n elements.
(a) Prove that s4 is a polynomial identity for 2 2 matrices. That is, for any choice
of 2 2 matrices X1 , . . . , X4 , we have
s4 (X1 , . . . , X4 ) = 0.
(b) Fix d N. Prove that there exists a nonzero polynomial p vanishing on all
tuples of d d matrices.
Several of the next exercises use a version of the shift operators on Fock space.
With g xed, the corresponding Fock space, F = Fg , is the Hilbert space obtained
from R<x> by declaring the words to be an orthonormal basis; i.e., if v, w are
words, then
v, w = v,w ,
where v,w = 1 if v = w and is 0 otherwise. Thus Fg is the closure of R<x> in this
inner product. For each j, the operator Sj on Fg densely dened by Sj p = xj p, for
p R<x> is an isometry (preserves the inner product) and hence extends to an
isometry on all of Fg . Of course, Sj acts on an innite-dimensional Hilbert space
and thus is not a matrix.
Exercise 8.28. Given a natural number k, note that R<x>k is a nite dimensional
(and hence closed) subspace of F = Fg . The dimension of R<x>k is
(k) =

k


gj .

(8.14)

j=0

Let V : R<x>k F denote the inclusion and


Tj = V  Sj V.
Thus Tj does act on a nite-dimensional space, and T = (T1 , . . . , Tg ) (Rnn )g for
n = (k).
(a) Show that if v is a word of length at most k 1, then
Tj v = xj v,
and Tj v = 0 if the length of v is k.
(b) Determine Tj.

i
i

364

main
2012/11/1
page 364
i

Chapter 8. Free Convexity

(c) Show that if p is a nonzero polynomial of degree at most k and Yj = Tj + Tj ,


then p(Y ) = 0.
(d) Conclude that if, for every n and X (Snn )g , p(X) = 0, then p is 0.
Exercise 8.28 shows there are no noncommutative polynomials vanishing on
all tuples of (symmetric) matrices of all sizes. The next exercise will lead the reader
through an alternative proof inspired by standard methods of polynomial identities.
Exercise 8.29. Let p R<x>n be an analytic polynomial that vanishes on
(Rnn )g (same xed n). Write p = p0 + p1 + + pn , where pj is the homogeneous
part of p of degree j.
(a) Show that pj also vanishes on (Rnn )g .
(b) A polynomial q is called multilinear if it is homogeneous of degree one with
respect to all of its variables. Equivalently, each of its monomials contains all
variables exactly once, i.e.,

X(1) X(n) .
q=
Sn

Using the staircase matrices E11 , E12 , E22 , E23 , . . . , En1 n , Enn show that a
nonzero multilinear polynomial q of degree n cannot vanish on all nn matrices.
(c) By (a) we may assume p is homogeneous. By induction on the biggest degree
a variable in p can have, prove that p = 0. Hint: What are the degrees of the
variables appearing in
1 , x2 , . . . , xg ) p(x1 , x2 , . . . , xg ) p(
x1 , x2 , . . . , xg )?
p(x1 + x
Exercise 8.30. Redo Exercise 8.29 for a polynomial
(a) p R<x, x >, not necessarily analytic, vanishing on all tuples of matrices;
(b) p R<x> vanishing on all tuples of symmetric matrices.
Exercise 8.31. Show that if p R<x> vanishes on a nonempty basic open
semialgebraic set, then p = 0.
Exercise 8.32. Suppose p R<x>, n is a positive integer, and O (Snn )g
is an open set. Show that if p(X) = 0 for each X O, then P (X) = 0 for each
X (Snn )g . Hint: Given X0 O and X (Snn )g , consider the matrix valued
polynomial,
q(t) = p(X0 + tX).
Exercise 8.33. Suppose r R<x
( >
) is a rational function and there is a nonempty
noncommutative basic open semialgebraic set O dom(r) with r|O = 0. Show that
r = 0.

i
i

8.2. Basics of Noncommutative Polynomials and Their Convexity

main
2012/11/1
page 365
i

365

Exercise 8.34. Prove item (3) of Proposition 8.6. You may wish to use Exercises
8.32 and 8.28.
Exercise 8.35. Prove the following proposition.
Proposition 8.36. If : R<x> Rnn is an involution preserving homomorphism, then there is an X (Snn )g such that (p) = p(X); i.e., all nite
dimensional representations of R<x> are evaluations.
Exercise 8.37. Do the algebra to show
x (1 xx )1 = (1 x x)1 x .
(This is a key fact used in the model theory for contractions [55].)
Exercise 8.38. Give an example of symmetric 2 2 matrices X, Y such that
X  Y  0 but X 2  Y 2 .
This failure of a basic order property of R for Snn is closely related to the
rigid nature of positivity and convexity in the noncommutative setting.
Exercise 8.39. Antiderivatives.
(a) Is q(x)[h] = xh + hx the derivative of any noncommutative polynomial p? If so,
what is p?
(b) Is q(x)[h] = hhx + hxh + xhh the second derivative of any noncommutative
polynomial p? If so, what is p?
(c) Describe in general which polynomials q(x)[h] are the derivative of some noncommutative polynomial p(x).
(d) Check you answer against the theory in [23].
Exercise 8.40. (Requires background in algebra) Show that R<x
( >
) is a division
ring; i.e., the noncommutative rational functions form a ring in which every nonzero
element is invertible.
Exercise 8.41. In this exercise we will establish that it is possible to embed the
free algebra R<x1 , . . . , xg > into R<x, y> for any g N.
(a) Show that the subalgebra of R<x, y> generated by xy n , n N0 , is free.
(b) Ditto for the subalgebra generated by
x1 = x,

x2 = c(x1 , y),

x3 = c(x2 , y),

...,

xn = c(xn1 , y), . . . .

Here, as before, c is the commutator, c(a, b) = ab ba.

i
i

366

main
2012/11/1
page 366
i

Chapter 8. Free Convexity

A comprehensive study of free algebras and noncommutative rational functions


from an algebraic viewpoint is developed in [10, 11].
Exercise 8.42. As a hard exercise, numerically verify that the set
ncTV(2) = {X (S22 )2 : 1 X14 X24 0}
is not convex. That is, nd X = (X1 , X2 ) and Y = (Y1 , Y2 ), where X1 , X2 , Y1 , Y2
are 2 2 symmetric matrices such that both
1 X14 X24 0 and 1 Y14 Y24 0
but
&
1

X1 + Y1
2

&

'4

X2 + Y2
2

'4
 0.

You may wish to write a numerical search routine.

8.3

Computer Algebra Support

There are several computer algebra packages available to ease the rst contact with
free convexity and positivity. In this section we briey describe two of them:
(1) NCAlgebra running under Mathematica;
(2) NCSOStools running under MATLAB.
The former is more universal in that it implements manipulation with noncommutative variables, including noncommutative rationals, and several algorithms pertaining to convexity. The latter is focused on noncommutative positivity and numerics.

8.3.1

NCAlgebra

NCAlgebra [42] runs under Mathematica and gives it the capability of manipulating
noncommuting algebraic expressions. An important part of the package (which we
shall not go into here) is NCGB, which computes noncommutative Groebner bases
and has extensive sorting and display features as well as algorithms for automatically
discarding redundant polynomials.
We recommend that the user have a look at the Mathematica notebook
NCBasicCommandsDemo available from the NCAlgebra website
http://math.ucsd.edu/ncalg/
for the basic commands and their usage in NCAlgebra. Here is a sample.
The basic ingredients are (symbolic) variables, which can be either noncommutative or commutative. At present, single-letter lowercase variables are noncommutative by default and all others are commutative by default. To change this one
can employ

i
i

8.3. Computer Algebra Support

main
2012/11/1
page 367
i

367

NCAlgebra Command: SetNonCommutative[listOfVariables] to make all the


variables appearing in listOfVariables noncommutative. The converse is given by
NCAlgebra Command: SetCommutative.
Example 8.43. Here is a sample session in Mathematica running NCAlgebra.
In[1]:= a ** b - b ** a
Out[1]= a ** b - b ** a
In[2]:= A ** B - B ** A
Out[2]= 0
In[3]:= A ** b - b ** a
Out[3]= A b - b ** a
In[4]:= CommuteEverything[a ** b - b ** a]
Out[4]= 0
In[5]:= SetNonCommutative[A, B]
Out[5]= {False, False}
In[6]:= A ** B - B ** A
Out[6]= A ** B - B ** A
In[7]:= SetNonCommutative[A];SetCommutative[B]
Out[7]= {True}
In[8]:= A ** B - B ** A
Out[8]= 0

Slightly more advanced is the NCAlgebra command to generate the directional derivative of a polynomial p(x, y) with respect to x, which is denoted by
Dx p(x, y)[h]:
NCAlgebra Command: DirectionalD[Function p, x, h], and is abbreviated
NCAlgebra Command: DirD.
Example 8.44. Consider
a = x ** x ** y - y ** x ** y

Then
DirD[a, x, h] = (h ** x + x ** h) ** y - y ** h ** y

or in expanded form,
NCExpand[DirD[a, x, h]] = h ** x ** y + x ** h ** y - y ** h ** y

i
i

368

main
2012/11/1
page 368
i

Chapter 8. Free Convexity

Note that we have used


NCAlgebra Command: NCExpand[Function p] to expand a noncommutative expression. The command comes with a convenient abbreviation
NCAlgebra Command: NCE.
NCAlgebra is capable of much more. For instance, is a given noncommutative function convex? You type in a function of noncommutative variables; the
command
NCAlgebra Command: NCConvexityRegion[Func, ListOfVariables] tells you
where the (symbolic) Function is convex in the Variables. The algorithm comes
from the paper of Camino et al. [9].
NCAlgebra Command: {L, D, U, P }:=NCLDUDecomposition[Matrix]. Computes
the LDU decomposition of matrix and returns the result as a 4-tuple. The last
entry is a permutation matrix which reveals which pivots were used. If matrix is
symmetric, then U = L .
The NCAlgebra website comes with extensive documentation. A more advanced notebook with a hands-on demonstration of applied capabilities of the package is DemoBRL.nb; it derives the bounded real lemma for a linear system.
Exercise 8.45. For the polynomials and rational functions dened at the beginning
of Section 8.2.7, use NCAlgebra to calculate
(a) p**q and NCExpand[p**q],
(b) NCCollect[p**q, x1],
(c) D[p,x1,h1] and D[u,x1,h1].
Warning
The Mathematica substitute commands /., /> and /:> are not reliable in
NCAlgebra, so a user should use NCAlgebras Substitute command.
Example 8.46. Here is an example of unsatisfactory behavior of the built-in Mathematica function.
In[1]:= (x ** a ** b) /. {a ** b -> c}
Out[1]= x ** a ** b

On the other hand, NCAlgebra performs as desired:


In[2]:= Substitute[x ** a ** b, a ** b -> c]
Out[2]= x ** c

i
i

8.3. Computer Algebra Support

8.3.2

main
2012/11/1
page 369
i

369

NCSOStools

A reader mainly interested in positivity of noncommutative polynomials might be


better served by NCSOStools [8]. NCSOStools is an open source MATLAB toolbox for
(a) basic symbolic computation with polynomials in noncommuting variables;
(b) constructing and solving sum of hermitian squares (with commutators) programs for polynomials in noncommuting variables.
It is normally used in combination with standard SDP software to solve these constructed linear matrix inequalities.
The NCSOStools website http://ncsostools.s.unm.si contains documentation
and a demo notebook NCSOStoolsdemo to give the user a gentle introduction to its
features.
Example 8.47. Although it has some ability to manipulate symbolic expressions, MATLAB cannot handle noncommuting variables. They are implemented in
NCSOStools.
NCSOStools Command: NCvars x introduces a noncommuting variable x into the
workspace.
NCSOStools is well equipped to work with commutators and sums of (hermitian) squares. Recall: a commutator is an expression of the form f g gf .
Exercise 8.48. Use NCSOStools to check whether the polynomial x2 yx + yx3
2xyx2 is a sum of commutators. (Hint: Try the NCisCycEq command.) If so, can
you nd such an expression?
Let us demonstrate an example with sums of squares.
Example 8.49. Consider
f = 5 + x^2 - 2*x^3 + x^4 + 2*x*y + x*y*x*y - x*y^2 + x*y^2*x
-2*y + 2*y*x + y*x^2*y - 2*y*x*y + y*x*y*x - 3*y^2 - y^2*x + y^4

Is f matrix positive? By Theorem 8.10 it suces to check whether f is a sum of


squares. This is easily done using
NCSOStools Command: NCsos(f ), which checks if the polynomial f is a sum of
squares. Running NCsos(f ) tells us that f is indeed a sum of squares. What
NCSOStools does is transform this question into a semidenite program and then
calls a solver. NCsos comes with several options. Its full command line is
[IsSohs,X,base,sohs,g,SDP_data,L] = NCsos(f,params)

i
i

370

main
2012/11/1
page 370
i

Chapter 8. Free Convexity

The meaning of the output is as follows:


IsSohs equals 1 if the polynomial f is a sum of hermitian squares and 0 otherwise;
X is the Gram matrix solution of the corresponding semidenite program returned
by the solver;
base is a list of words which appear in the sums of Hermitian squares decomposition;
sohs is the sums of hermitian squares decomposition of f ;

g is the NCpoly representing i mi mi ;
SDP_data is a structure holding all the data used in the SDP solver;
L is the operator representing the dual optimization problem (i.e., the dual feasible
SDP matrix).
Exercise 8.50. Use NCSOStools to compute the smallest eigenvalue f (X, Y ) can
attain for a pair of symmetric matrices (X, Y ). Can you also nd a minimizer pair
(X, Y )?
Exercise 8.51. Let f = y 2 + (xy 1) (xy 1). Show the following.
(a) f (X, Y ) is always positive semidenite.
(b) For each > 0 there is a pair of symmetric matrices (X, Y ) so that the smallest
eigenvalue of f (X, Y ) is .
(c) Can f (X, Y ) be singular?
The moral of Example 8.51 is that even if a noncommutative polynomial is
bounded from below, it need not attain its minimum.
Exercise 8.52. Redo the Exercise 8.51 for f (x) = x x + (xx 1) (xx 1).

8.4

A Gram-like Representation

The next two sections are devoted to a powerful representation of quadratic functions q in noncommutative variables which takes a strong form when q is matrix
positive; we call it a QuadratischePositivstellensatz. Ultimately we shall apply this
to q(x)[h] = p (x)[h] and show that if p is matrix convex (i.e., q is matrix positive),
then p has degree 2. We begin by illustrating our grand scheme with examples.

8.4.1

Illustrating the Ideas

Example 8.53. The (symmetric) polynomial p(x) = x1 x2 x1 + x2 x1 x2 (in symmetric variables) has Hessian q(x)[h] = p (x)[h], which is homogeneous quadratic in h
and is
q(x)[h] = 2h1 h2 x1 + 2h1 x2 h1 + 2h2 h1 x2 + 2h2 x1 h2 + 2x1 h2 h1 + 2x2 h1 h2 .

i
i

8.4. A Gram-like Representation

main
2012/11/1
page 371
i

371

We can write q in the form

2
q(x)[h] = h1

h2

x2 h1

2x2
3 0
x1 h2
0
2

0
2x1
2
0

0
2
0
0

2
h1

0
h2 .
0 h1 x2
0 h2 x1

The representation of q displayed above is of the form


q(x)[h] = V (x)[h] Z(x)V (x)[h],
where Z is called the middle matrix and V the border vector. The middle matrix does
not contain h. The border vector is linear in h with h always on the left. In Section
8.4.2 we dene this border vectormiddle matrix (BV-MM) representation generally
for noncommutative polynomials q(x)[h] which are homogeneous of degree two in
the h variables. Note that the entries of the border vector are distinct monomials.
Example 8.54. Let p = x2 x1 x2 x1 + x1 x2 x1 x2 . Then
q = p = 2h1 h2 x1 x2 +2h1 x2 h1 x2 +2h1 x2 x1 h2 +2h2 h1 x2 x1 +2h2 x1 h2 x1 +2h2 x1 x2 h1
+ 2x1 h2 h1 x2 + 2x1 h2 x1 h2 + 2x1 x2 h1 h2 + 2x2 h1 h2 x1 + 2x2 h1 x2 h1 + 2x2 x1 h2 h1 .
The BV-MM representation for q is
q = [h1 h2 x2 h1 x1 h2 x1 x2 h1 x2 x1 h2 ]

h1
0
2x2 x1 2x2
0
0 2

2x1 x2

0
0
2x1 2 0

h2
2x1
h1 x2
0
0
2
0
0
.

2x2
2
0
0 0

h2 x1
0
2
0
0
0 0 h1 x2 x1
2
0
0
0
0 0 h2 x1 x2
Example 8.55. In the one variable with h1 = h1 we abbreviate h1 to h. Fix some
noncommutative variables not necessarily symmetric w := (a, b, d, e) and consider
q(w)[h] := hah + e hbh + hb he + e hdhe,
which is a quadratic function of h. It can be written in the BV-MM form
 

2
3 a b h
q(w)[h] = h e h
.
b d he

(8.15)

(8.16)

The representation is unique.


Observe (8.16) contrasts strongly with the commutative case wherein (8.15)
takes the form
q(w)[h] = h(a + e b + b e + e de)h.

i
i

372

main
2012/11/1
page 372
i

Chapter 8. Free Convexity

Example 8.56. The Hessian of p(x) = x4 is


q(x)[h] := p (x)[h] = 2(x2 h2 + xh2 x + h2 x2 )
+ 2(xhxh + hxhx)

(8.17)

+ hx h,
a polynomial that is homogeneous of degree 2
h that can be expressed as
2
x
2
3
q(x)[h] = 2 h xh x2 h x
1

in x and homogeneous of degree 2 in


x
1
0

1
h
0 hx .
0 hx2

Notice that the contribution of the main antidiagonal of the middle matrix for
q in Example 8.56 (all 1s) corresponds to the right-hand side of rst line of (8.17).
Indeed, each antidiagonal corresponds to a line of (8.17).
Exercise 8.57. In Example 8.56, for which symmetric matrices X is Z(X) positive
semidenite?
Exercise 8.58. What is the middle matrix Z(x) for p(x) = x3 ? For which symmetric matrices X is Z(X) positive semidenite?
Exercise 8.59. Compute middle matrix representations using NCAlgebra. The
command is
{lt, mq, rt} =NCMatrixOfQuadratic[q, {h, k}]
In the output mq is the middle matrix, rt is the border vector, and lt is (rt) . For
examples, see NCConvexityRegionDemo.nb in the NC/DEMOS directory.
The positivity of q vs. positivity of the middle matrix
In this section we let q(x)[h] denote a polynomial which is homogeneous of degree
two in h, but which is not necessarily the Hessian of a noncommutative polynomial.
While we have focused on Hessians, such a q will still have a BV-MM representation. So what good is this representation? After all one expects that q could have
wonderful properties, such as positivity, which are not shared by its middle matrix.
No, the striking thing is that positivity of q implies positivity of the middle matrix.
Roughly we shall prove what we call the QuadratischePositivstellensatz, which is
essentially Theorem 3.1 of [9].
Theorem 8.60. If the polynomial 2 q(x)[h] is homogeneous quadratic in h, then q
is matrix positive if and only if its middle matrix Z is matrix positive.
2 This

theorem is true (but not proved here) for q which are noncommutative rational in x.

i
i

8.4. A Gram-like Representation

main
2012/11/1
page 373
i

373

More generally, suppose O is a nonempty noncommutative basic open semialgebraic set. If q(X)[H] is positive semidenite for all n N, X O(n), and
H (Snn )g , then Z(X)  0 for all X O.
We emphasize that, in the theorem, the convention that the terms of the
border vector are distinct is in force.
To foreshadow Section 8.5 and to give an idea of the proof of Theorem 8.60,
we illustrate it on an example in one variable. This time we use a free rather than
symmetric variable since proofs are a bit easier.
Consider the noncommutative quadratic function q given by
q(w)[h] := h bh + e h ch + h c he + e h ahe,

(8.18)

where w = (a, b, c, e). The border vector V (w)[h] and the coecient matrix Z(w)
with noncommutative entries are
 


h
b c
V (w)[h] =
and
Z(w) =
;
he
c a
that is, q has the form
2
q(w)[h] = V (w)[h] Z(w)V (w)[h] = h

e  h


3 b
c

c
a




h
.
he

Now, if in (8.18) the elements a, b, c, e, h are replaced by matrices in Rnn ,


then the noncommutative quadratic function q(w)[h] becomes a matrix-valued function q(W )[H]. The matrix-valued function q[H] is matrix positive if and only if
v  q(W )[H]v 0 for all vectors v Rn and all H Rnn , or, equivalently, the
following inequality must hold:


2  
3
Hv
v H
v E  H  Z
0.
(8.19)
HEv
Let
2
y  := v  H 

3
v E  H  .

Then (8.19) is equivalent to y  Z y 0. Now it suces to prove that all vectors


of the form y sweep R2n . This will be completely analyzed in full generality in
Section 8.5.1, but next we give the proof for our simple situation.
Suppose for a given v, with n 2, the vectors v and Ev are linearly independent. Let y = [ vv12 ] be any vector in R2n ; then we can choose H Rnn with the
property that v1 = Hv and v2 = HEv. It is clear that



Hv
Rv :=
: H Rnn
(8.20)
HEv
is all R2n as required.

i
i

374

main
2012/11/1
page 374
i

Chapter 8. Free Convexity

Thus we are nished unless for all v the vectors v and Ev are linearly dependent. That is for all v, 1 (v)v + 2 (v)Ev = 0 for nonzero 1 (v) and 2 (v). Note
2 (v) = 0, unless v = 0. Set (v) := 12 (v)
(v) ; then the linear dependence becomes
(v)v + Ev = 0 for all v. It turns out that this does not happen unless E = I
for some R. This is a baby case of Theorem 8.92 which comes later and is a
subject unto itself.
To nish the proof pick a v which makes Rv equal all of R2n . Then v  q(W )[H]v
0 implies that Z  0 by (8.19).

8.4.2

Details of the Middle Matrix Representation

The following representation for symmetric noncommutative polynomials q(x)[h]


that are of degree in x and homogeneous of degree 2 in h is exploited extensively
in this subject:

Z01
Z0, 1 Z0
Z00
V0

Z10

Z11
Z1, 1
0
V1

2 



 3
.
.
.
.
.
.
..
.. ..
..
q(x)[h] = V0 V1 V 1 V ..
,
..

Z 1,0 Z 2,1

0
0
V 1
Z 0
0

0
0
V
(8.21)
where the following hold:
1. The degree d of q(x)[h] is d = + 2.
2. Vj = Vj (x)[h], j = 0, . . . , , is a vector of height g j+1 whose entries are
monomials of degree j in the x variables and degree 1 in the h variables.
The h always appears to the left. In particular, V (x)[h] is a vector of height
g( ), where as in (8.14),
( ) = 1 + g + + g .
3. Zij = Zij (x) is a matrix of size g i+1 g j+1 whose entries are polynomials in
the noncommuting variables x1 , . . . , xg of degree (i + j). In particular,
Zi, i = Zi, i (x) is a constant matrix for i = 0, . . . , .

= Zji .
4. Zij

Usually the entries of the vectors Vj are ordered lexicographically.


We note that the vector of monomials, V (x)[h], might contain monomials
that are not required in the representation of the noncommutative quadratic q.
Therefore, we can omit all monomials from the border vector that are not required.
This gives us a minimal length border vector and prevents extraneous zeros from
occurring in the middle matrix. The matrix Z in the representation (8.21) will be
referred to as the middle matrix of the polynomial q(x)[h], and the vectors Vj =
Vj (x)[h] with monomials as entries will be referred to as border vectors. It is easy

i
i

8.4. A Gram-like Representation

main
2012/11/1
page 375
i

375

to check that a minimal length border vector contains distinct monomials, and once
the ordering of entries of V is set, the middle matrix for a given q is unique; see
Lemma 8.62 below.
Example 8.61. Returning to Example 8.54, we have for the middle matrix representation of q that
 




h1
h2 x1
h1 x2 x1
V0 =
,
V1 =
,
V2 =
,
h2
h1 x2
h2 x1 x2
and, for instance,

0
Z00 =
2x1 x2


2x2 x1
,
0

Z01

2x2
=
0


0
,
2x1

Z02


0 2
=
.
2 0

Note that generically for a polynomial q in two variables the Vj have additional
terms. For instance, usually V1 is the column

h1 x1
h1 x2

h2 x1 .
h2 x2
Likewise generically V2 has eight terms. As for the Zij , Z01 , for instance, is generically 2 4.
Lemma 8.62. The entries in the middle matrix Z(x) are uniquely determined by
the polynomial q(x)[h] and the border vector V (x)[h].
Proof. Note every monomial in q(x)[h] has the form
m L h i mM h j mR .
Dene
Rj := {hj m : mL hi mM hj m is a term in q(x)[h]}.
Given the representation V  ZV for q, let EV denote the monomials in V . Then it
is clear that each monomial in EV must occur in some term of q, so it appears in
Rj for some j. Conversely, each term hj m in Rj corresponds to at least one term
mL hi mM hj m of q, so it must be in EV .
Exercise 8.63. Consider (8.21) and prove the degree bound on the Zij in (3).
Hint: Read Example 8.64 rst.
Example 8.64. If p(x) is a symmetric polynomial of degree d = 4 in g noncommuting variables, then the middle matrix Z(x) in the representation of the Hessian
p (x)[h] is

Z00 (x) Z01 (x) Z02 (x)


0 ,
Z(x) = Z10 (x) Z11 (x)
Z20 (x)
0
0

i
i

376

main
2012/11/1
page 376
i

Chapter 8. Free Convexity

where the block entries Zij = Zij (x) have the following structure:
Z00
Z01
Z02

is a g g matrix with noncommutative polynomial entries of degree 2,


is a g g 2 matrix with with noncommutative polynomial entries of
degree 1,
is a g g 3 matrix with constant entries.

All of these are proved merely by keeping track of the degrees. For example, the
contribution of Z02 to p is V0 Z02 V2 , whose degree is
deg(V0 ) + deg(Z02 ) + deg(V2 ) = 1 + deg(Z02 ) + 3 4,
so deg(Z02 ) = 0.

8.4.3

The Middle Matrix of p

The middle matrix Z(x) of the Hessian p (x)[h] of a noncommutative symmetric
polynomial p(x) plays a key role. These middle matrices have a very rigid structure
similar to that in Example 8.56. We illustrate with an example and then with
exercises.
Example 8.65. As a warm-up we rst illustrate that Z02 (X) = 0 if and only if
Z11 (X) = 0 for Example 8.54. To this end, observe that the contribution of the
middle matrixs extreme outer diagonal element Z02 to q is as follows:
  


1
h
0 2 h1 x2 x1
V0 (x)[h] Z02 (x)V2 (x)[h] = 1
= 2h1 h2 x1 x2 + 2h2 h1 x2 x1 .
h2
2 0 h2 x1 x2
2
Substitute hj  xj and get 2x1 x2 x1 x2 + 2x2 x1 x2 x1 , which is 2p(x). That is,
p(x) =

1
V0 (x)[x] Z02 (x)V2 (x)[x],
2

where Vk (x)[h] is the homogeneous, in x, of degree k part of the border vector V .


Obviously, Z02 = 0 implies p = 0.
Exercise 8.66. Show p(x) can also be obtained from Z11 in a similar fashion, i.e.,
p(x) =

1
V1 (x)[x] Z11 (x)V1 (x)[x].
2

Exercise 8.67. Suppose p is homogeneous of degree d and its Hessian q has the
BV-MM representation q(x)[h] = V (x)[h] Z(x)V (x)[h].
(a) Show
p=

1
V0 (x)[x] Z0 V (x)[x]
2

with = d 2. Prove this formula for d = 2, d = 4.

i
i

8.4. A Gram-like Representation

main
2012/11/1
page 377
i

377

(b) Show that likewise


p=

1
V1 (x)[x] Z1, 1 (x)V 1 (x)[x].
2

Do not cheat and look this up in [14], but do compare with Exercise 8.63.
Exercise 8.68. Let Z denote the middle matrix for the Hessian of a noncommutative polynomial p. Show, if i + j = i + j  , then Zij = 0 if and only if Zi j  = 0.

8.4.4

Positivity of the Middle Matrix and the Demise of


Noncommutative Convexity

This section focuses on positivity of the middle matrix of a Hessian.


Why should we focus on the case where Z(x) is positive semidenite? In
[35] it was shown that a polynomial p R<x> is matrix convex if and only if its
Hessian p (x)[h] is positive (see Exercise 8.26). Moreover, if Z(x) is positive, then
the degree of p(x) is at most two [36]. The proof of this degree constraint given
in Proposition 8.70 below using the more manageable bookkeeping scheme in this
chapter begins with the following exercise.
Exercise 8.69. Show that

A
B

B
0

is positive semidenite if and only if A  0 and B = 0. More rened versions of


this fact appear as exercises later; see Exercise 8.76.
As we shall see, we need not require our favorite functions be positive everywhere. It is possible to work locally, namely, on an open set.
Proposition 8.70. Let p = p(x) be a symmetric polynomial of degree d in g
noncommutative variables and let Z(x) denote the middle matrix in the BV-MM
representation of the Hessian p (x)[h]. If Z(X)  0 for all X in some nonempty
noncommutative basic open semialgebraic set O, then d is at most 2.
Proof. Arguing by contradiction, suppose that d 3; then p (x)[h] is of degree
= d 2 1 in x and its middle matrix is of the form

Z00 Z0

.. .
.
Z = ...
..
.
Z 0

A
B


B
,
0

Therefore, Z(X) is of the form



Z(X) =

i
i

378

main
2012/11/1
page 378
i

Chapter 8. Free Convexity

3
2
where A = A and B  = Z0 (X) 0 0 . From Exercise 8.67, pd , the homogeneous degree d part of p, can be reconstructed from Z0 . Now there is an X O
such that pd (X) is nonzero, as otherwise pd vanishes on a basic open semialgebraic
set and is equal to 0. It follows that there is an X O such that Z0 (X) is not
zero. Hence B(X) is not zero which implies, by Exercise 8.69, the contradiction
that Z(X) is not positive semidenite.
We have now reached our goal of showing that convex polynomials have degree 2.
Theorem 8.71. If p R<x> is a symmetric polynomial which is convex on a
nonempty noncommutative basic open semialgebraic set O, then it has degree at
most 2.
There is a version of the theorem for free variables, i.e., with p R<x, x >.
Proof. The convexity of p on O is equivalent to p (X)[H] being positive semidefinite for all X in O; see Exercise 8.26. By the QuadratischePositivstellensatz the
middle matrix Z(x) for p (x)[h] is positive on O; that is, Z(X)  0 for all X O.
Proposition 8.70 implies degree p is at most 2.

8.4.5

The Signature of the Middle Matrix

This section introduces the notion of the signature (Z(x)) of Z(x), the middle
matrix of a Hessian, or more generally a polynomial q(x)[h] which is homogeneous
of degree 2 in h.
The signature of a symmetric matrix M is a triple of integers
$
%
(M ), 0 (M ), + (M ) ,
where (M ) is the number of negative eigenvalues (counted with multiplicity);
+ (M ) is the number of positive eigenvalues; and 0 (M ) is the dimension of the
null space of M .
Lemma 8.72. A noncommutative symmetric polynomial q(x)[h] homogeneous of
degree 2 in h has middle matrix Z of the form in (8.21), and Z being positive
semidenite implies Z is of the form

Z00
Z10

.
.
.

Z  ,0
2
0

..
.

Z01
Z11
..
.
Z  ,1
2
0
..
.

.
..

.
..

Z0,  
2
Z1,  
2
..
.
Z  ,  
2
2
0
.
..

0
0
..
.
0
0
.
..

.
.
.

. .
..
.
..

i
i

8.4. A Gram-like Representation

main
2012/11/1
page 379
i

379

This lemma follows immediately from a much more general lemma.


Lemma 8.73. If

A
E = B 
C
is a real symmetric matrix, then

B
D
0

C
0
0

(E) (D) + rank C.


This can be proved using the LDL decomposition which we shall not do here
but suggest the reader apply the LDL hammer to the following simpler exercise.

8.4.6

Exercises

Exercise 8.74. True or false? If pd is homogeneous of degree d and we let Z denote


the middle matrix of the Hessian p (x)[h], then for each k d 2 the degree of
Zi,ki is independent of i.
Exercise 8.75. Redo Exercise 8.26 for convexity on a noncommutative basic open
semialgebraic set.
Exercise 8.76. If F = [ CA C0 ], then (F ) rank C. (If you cannot do the general
case, assume A is invertible.)
Exercise 8.77. If p(x) is a symmetric polynomial of degree d = 2 in g noncommuting variables, then the middle matrix Z(x) in the representation of the Hessian
p (x)[h] is equal to the g g constant matrix Z00 . Substituting X (Snn )g for x
gives
(Z(X)) (Z00 ).
(d)

Exercise 8.78. Let f R<x>2d and let V <x>d


all words in x of degree d. Prove

be a vector consisting of

(a) there is a matrix G R(d)(d) with f = V  GV (any such G is called a Gram


matrix for f );
(b) if f is symmetric, then there is a symmetric Gram matrix for f .
Exercise 8.79. Find all Gram matrices for
(a) f = x41 + x21 x2 x1 x22 + x2 x21 x22 x1 + x21 x22 + 2x1 x2 + 4;
(b) f = c(x1 , x2 )2 .
Exercise 8.80. Show that if f R<x> is homogeneous of degree 2d, then it has
a unique Gram matrix G R(d)(d) .

i
i

380

main
2012/11/1
page 380
i

Chapter 8. Free Convexity

8.4.7

A Glimpse of History

There is a theory of operator monotone and operator convex functions which overlaps with the matrix convex functions considered here in the case of one variable.
However, the points of view are substantially dierent, diverging markedly in several
variables. Lowner introduced a class of real analytic functions in one real variable
called matrix monotone functions, which we shall not dene here. L
owner gave
integral representations and these have developed substantially over the years. The
contact with convexity came when L
owners student Kraus [49] introduced matrix
convex functions f in one variable. Such a function f on [0, ) R can be represented as f (t) = tg(t) with g matrix monotone, so the representations for g produce
representations for f . Hansen has extensive in-depth work on matrix convex and
monotone functions whose denition in several variables is dierent than the one
we use here; see [25] or [24]. All of this gives a beautiful integral representation
characterizing matrix convex functions using techniques very dierent from ours.
An excellent treatment of the one-variable case is [3, Chapter 5]. Interestingly, to
the best of our knowledge, the one-variable version of Theorem 8.71 [36] does not
seem to be explicit in this classical literature. However, it is an immediate consequence of the results of [25], where (not necessarily polynomial) operator convex
functions on an interval are described. This and the papers of Hansen and [56, 68]
are some of the more recent references in this line of convexity history orthogonal
to ours.

8.5

Der QuadratischePositivstellensatz

In this section we present the proof of the QuadratischePositivstellensatz (Theorem


8.60) which is based on the fact that local linear dependence of noncommutative
rationals (or noncommutative polynomials) implies global linear dependence, a fact
itself based on the forthcoming CHSY lemma [9].

8.5.1

The CaminoHeltonSkeltonYe (CHSY) Lemma

At the root of the CHSY lemma [9] is the following linear algebra fact.
Lemma 8.81. Fix n > d. If {z1 , . . . , zd } is a linearly independent set in Rn , then
the codimension of

Hz1

Hz
2

nn
Rnd
.. : H S

Hzd
is

d(d1)
.
2

It is especially important that this codimension is independent of n.

The following exercise is a variant of Lemma 8.81 which is easier to prove.


Thus we suggest attempting it before launching into the proof of the lemma.

i
i

8.5. Der QuadratischePositivstellensatz

main
2012/11/1
page 381
i

381

Exercise 8.82. Prove that if {z1 , . . . , zd } is a linearly independent set in Rn , then

Hz1

Hz2

nn
= Rnd .
:
H

R
..

Hzd
Hint: It proceeds like the proof of (8.20).
Proof of Lemma 8.81. Consider the mapping : Snn Rnd given by

Hz1
Hz2

H  . .
..
Hzd
Since the span of {z1 , . . . , zd } has dimension d, it follows that the kernel of has
, and hence the range has dimension n(n+1)
. To
dimension = (nd)(nd+1)
2
2
see this assertion, it suces to assume that the span of {z1 , . . . , zd } is the span of
{e1 , . . . , ed } Rn (the rst d standard basis vectors in Rn ). In this case (since H
is symmetric) Hzj = 0 for all j if and only if


0 0
,
H=
0 H
where H  is a symmetric matrix of size (n d) (n d); in other words, this is the
kernel of .
From this we deduce that the codimension of the range of is
/ d(d 1)
. n(n + 1)
=
,
nd
2
2
concluding the proof.
Next is a straightforward extension of Lemma 8.81.
Lemma 8.83 ([9]). If n > d and {z1 , . . . , zd } is a linearly independent subset of
Rn , then the codimension of

Hj z 1

Hj z 2

g
nn g
j=1 . : H = (H1 , . . . , Hg ) (S
Rgnd
)
.

Hj z d
is g d(d1)
and is independent of n.
2
Proof. See Exercise 8.94.

i
i

382

main
2012/11/1
page 382
i

Chapter 8. Free Convexity


Finally, the form in which we generally apply the lemma is the following.

Lemma 8.84. Let v Rn , X (Snn )g . If the set {m(X)v : m <x>d } is


linearly independent, then the codimension of
{V (X)[H]v : H (Snn )g }
is g (1)
, where = (d) =
2

d
j=0

V =

g j and where
g
F

Hi m

i=1 m<x>d

is the border vector associated with <x>d . Again, this codimension is independent
of n as it depends only upon the number of variables g and the degree d of the
polynomial.
Proof. Let zm = m(X)v for m <x>d . There are at most of these. Now apply
the previous lemma.

8.5.2

Linear Dependence of Symbolic Functions

The main result in this section, Theorem 8.92, says roughly that if each evaluation
of a set G1 , . . . G of rational functions produces linearly dependent matrices, then
they satisfy a universal linear dependence relation. We begin with a clean and easily
stated consequence of Theorem 8.92.
In Section 8.2.1 we dened noncommutative basic open semialgebraic sets.
Here we dene a noncommutative basic semialgebraic set. Given matrix-valued
symmetric noncommutative polynomials and , let

(n) = {X (Snn )g : (X) 0}


D+

and
D(n) = {X (Snn )g : (X)  0}.
Then D is a noncommutative basic semialgebraic set if there exists 1 , . . . , k and
1 , . . . , k such that D = (D(n))nN , where

D(n) =

@
j

D+j (n)

Dj (n) .

Theorem 8.85. Suppose G1 , . . . , G are rational expressions and D is a nonempty


noncommutative basic semialgebraic set on which each Gj is dened. If, for each
X D(n) and vector v Rn , the set {Gj (X)v : j = 1, 2, . . . , } is linearly dependent, then the set {Gj (X) : j = 1, 2, . . . , } is linearly dependent on D; i.e., there

i
i

8.5. Der QuadratischePositivstellensatz

main
2012/11/1
page 383
i

383

exists a nonzero R such that


0=

for all X D.

j Gj (X)

j=1

If, in addition, D contains an -neighborhood of 0 for some > 0, then there exists
a nonzero R such that
0=

j Gj .

j=1

Corollary 8.86. Suppose G1 , . . . , G are rational expressions. If, for each n N,


X (Snn )g , and vector v Rn , the set {Gj (X)v : j = 1, 2, . . . , } is linearly
dependent, then the set {Gj : j = 1, 2, . . . , } is linearly dependent; i.e., there exists
a nonzero R such that



j Gj = 0.

j=1

Corollary 8.87. Suppose G1 , . . . , G are rational expressions. If, for each n N


and X (Snn )g , the set {Gj (X) : j = 1, 2, . . . , } is linearly dependent, then the
set {Gj : j = 1, 2, . . . , } is linearly dependent.
The point is that the j are independent of X. Before proving Theorem 8.85
we shall introduce some terminology pursuant to our more general result.
Direct Sums
We present some denitions about direct sums and sets which respect direct sums,
since they are important tools.
Denition 8.88. Our denition of the direct sum is the usual one. Given pairs
(X1 , v1 ) and (X2 , v2 ), where Xj are nj nj matrices and vj Rnj ,
(X1 , v1 ) (X2 , v2 ) = (X1 X2 , v1 v2 ),
where


X1
X1 X2 :=
0


0
,
X2

 
v
v1 v2 := 1 .
v2

We extend this denition to terms, (X1 , v1 ), . . . , (X , v ) in the expected way.


In the denition below, we consider a set B, which is the sequence
B := (B(n)),

i
i

384

main
2012/11/1
page 384
i

Chapter 8. Free Convexity

where each B(n) is a set whose members are pairs (X, v), where X is in (Snn )g
and v Rn .
Denition 8.89. The set B is said to respect direct sums if (X j , v j ) with X j
(Snj nj )g and v j Rnj for j = 1, . . . , being contained in the set B(nj ) implies
that the direct sum
(X 1 . . . X , v 1 . . . v ) = (j=1 X j , j=1 v j )

is also contained in B( nj ).
Denition 8.90. By a natural map G on B, we mean a sequence of functions
G(n) : B(n) Rn , which respects direct sums in the sense that, if (X j , v j ) B(nj )
for j = 1, 2, . . . , , then



G
nj (X j , v j ) = 1 G(nj )(X j , v j ).
1

Typically we omit the argument n, writing G(X) instead of G(n)(X).


Examples of sets which respect direct sums and of natural maps are provided
by the following example.
Example 8.91. Let be a rational expression.
(1) The set B = {(X, v) : X D (Snn )g , v Rn , n N} respects direct
sums.
(2) If G is a matrix-valued noncommutative rational expression whose domain contains D , then G determines a natural map on B() by G(n)(X, v) = G(X)v.
In particular, every noncommutative polynomial determines a natural map on
every noncommutative basic semialgebraic set B.
Main result on linear dependence
Theorem 8.92. Suppose B is a set which respects direct sums and G1 , . . . , G
are natural maps on B. If for each (X, v) B the set {G1 (X, v), . . . , G (X, v)} is
linearly dependent, then there exists a nonzero R so that
0=

j Gj (X, v)

j=1

for every (X, v) B. We emphasize that is independent of (X, v).


Before proving Theorem 8.92, we use it to prove an important earlier theorem.
Proof of Theorem 8.85. Let B be given by
B(n) = {(X, v) : X D (Snn )g and v Rn }.

i
i

8.5. Der QuadratischePositivstellensatz

main
2012/11/1
page 385
i

385

Let Gj denote the natural maps, Gj (X, v) = Gj (X)v. Then B and G1 , . . . , G


satisfy the hypothesis of Theorem 8.92 and so the rst conclusion of Theorem 8.85
follows.
The last conclusion follows because a noncommutative rational function r
vanishing on a noncommutative basic open semialgebraic set is 0 on all dom(r) and
hence is zero; cf. Exercise 8.33.
Proof of Theorem 8.92
We start with a nitary version of Theorem 8.92.
Lemma 8.93. Let B and Gi be as in Theorem 8.92. If R is a nite subset of B,
then there exists a nonzero (R) R such that



(R)j Gj (X)v = 0

j=1

for every (X, v) R.


Proof. The proof relies on taking direct sums of matrices. Write the set R as
;
:
R = (X 1 , v 1 ), . . . , (X , v ) ,
where each (X i , v i ) B. Since B respects direct sums,
(X, v) = (=1 X , =1 v ) B.
Hence, there exists a nonzero (R) R such that
0=

(R)j Gj (X, v).

j=1

Since each Gj respects direct sums, the desired conclusion follows.


Proof of Theorem 8.92. The proof is essentially a compactness argument, based
on Lemma 8.93. Let B denote the unit sphere in R .
To (X, v) B associate the set


j Gj (X, v) = 0 .
(X,v) = B : G(X)v =

Since (X, v) B, the hypothesis on B says (X,v) is nonempty. It is evident that


(X,v) is a closed subset of B and is thus compact.
Let := {(X,v) : (X, v) B}. Any nite subcollection from has the form
{(X,v) : (X, v) R} for some nite subset R of B, and so by Lemma 8.93 has a
nonempty intersection. In other words, has the nite intersection property. The
compactness of B implies that there is a B which is in every (X,v) . This is the
desired conclusion of the theorem.

i
i

386

8.5.3

main
2012/11/1
page 386
i

Chapter 8. Free Convexity

Proof of the QuadratischePositivstellensatz

We are now ready to give the proof of Theorem 8.60. Accordingly, let O be a given
basic open semialgebraic set. Suppose
q(x)[h] = V (x)[h] Z(x) V (x)[h],

(8.22)

where V is the border vector and Z is the middle matrix; cf. (8.21). Clearly, if Z is
matrix-positive on O, then q(X)[H] is positive semidenite for each n, X O(n),
and H (Snn )g .
The converse is less trivial and requires the CHSY lemma plus our main result on linear dependence of noncommutative rational functions. Let denote the
degree of q(x)[h] in the variable x. In particular, the border vector in the representation of q(x)[h] itself has degree in x. Recall from Exercise 8.28.
g )
= (X
1 , . . . , X
Suppose that for some s and g-tuple of symmetric matrices X
is not positive semidenite. By Lemma 8.84 and Theorem
O(s), the matrix Z(X)
8.85, there is a t, a Y O(t), and a vector so that {m(Y ) : m <x> } is linearly
Y and = 0 Rs+t . Then Z(X) is not positive
independent. Let X = X
semidenite and {m(X) : m <x> } is linearly independent.
+ 1, where is given in Lemma 8.84, and let n = (s + t)N .
Let N = g (1)
2
Consider W = X IN = (X1 IN , . . . , Xg IN ) and vector = e, for any
nonzero vector e RN +1 . The set {m(W ) : m <x> } is linearly independent,
and thus by Lemma 8.84, the codimension of M = {V (W )[H] : H (Snn )g } is at
most N 1. On the other hand, because Z(X) has a negative eigenvalue, the matrix
Z(W ) has an eigenspace E, corresponding to a negative eigenvalue, of dimension at
least N . It follows that E M is nonempty; i.e., there is an H (Snn )g such that
V (W )[H] E. In particular, this together with (8.22) implies
q(W )[H],  = Z(W )V (W )[H], V (W ) < 0,
and thus, q(W )[H] is not positive semidenite.

8.5.4

Exercises

Exercise 8.94. Prove Lemma 8.83.


Exercise 8.95. Let A Rnn be given. Show that if the rank of A is r, then the
matrices A, A2 , . . . , Ar+1 are linearly dependent.
In the next exercise employ the Fock space (see Section 8.2.7) to prove a
strengthening of Corollary 8.86 for noncommutative polynomials.
Exercise 8.96. Suppose p1 , . . . , p R<x>k are noncommutative polynomials.
Show that if the set of vectors
{p1 (X)v, . . . , p (X)v}

(8.23)

i
i

8.6. Noncommutative Varieties with Positive Curvature Have Degree 2

main
2012/11/1
page 387
i

387

is linearly dependent for every (X, v) (S )g R , where = (k) = dim R<x>k ,


then {p1 , . . . , p } is linearly dependent.
Exercise 8.97. Redo Exercise 8.96 under the assumption that the vectors (8.23)
are linearly dependent for all (X, v) O R , where O (S )g is a nonempty
open set.
For a more algebraic view of the linear dependence of noncommutative polynomials we refer to [6].
Exercise 8.98. Prove that f R<x> is a sum of squares if and only if it has
a positive semidenite Gram matrix. Are then all of f s Gram matrices positive
semidenite?

8.6

Noncommutative Varieties with Positive


Curvature Have Degree 2

This section looks at noncommutative varieties and their geometric properties. We


see a very strong rigidity when they have positive curvature which generalizes what
we have already seen about convex polynomials (their graph is a positively curved
variety) having degree 2.
In the classical setting of a surface dened by the zero set
(p) = {x Rg : p(x) = 0}
of a polynomial p = p(x1 , . . . , xg ) in g commuting variables, the second fundamental
form at a smooth point x0 of (p) is the quadratic form
h  (Hess p)(x0 )h, h,

(8.24)

where Hess p is the Hessian of p, and h Rg is in the tangent space to the surface
(p) at x0 ; i.e., p(x0 ) h = 0.3
We shall show that in the noncommutative setting the zero set V(p) of a
noncommutative polynomial p (subject to appropriate irreducibility constraints)
having positive curvature (even in a small neighborhood) implies that p is convex
and thus, p has degree at most twoand V(p) has positive curvature everywhere;
see Theorem 8.103 for the precise statements.
In fact there is a natural notion of the signature C (V(p)) of a variety V(p)
and the bound
deg(p) 2C (V(p)) + 2
3 The choice of the minus sign in (8.24) is somewhat arbitrary. Classically the sign of the
second fundamental form is associated with the choice of a smoothly varying vector that is normal
to (p). The zero set (p) has positive curvature at x0 if the second fundamental form is either
positive semidenite or negative semidenite at x0 . For example, if we dene (p) using a concave
function p, then the second fundamental form is negative semidenite, while for the same set (p)
the second fundamental form is positive semidenite.

i
i

388

main
2012/11/1
page 388
i

Chapter 8. Free Convexity

on the degree of p in terms of the signature C (V(p)) was obtained in [16]. The
convention that C+ (V(p)) = 0 corresponds to positive curvature, since in our examples, dening functions p are typically concave or quasiconcave. One could consider
characterizing p for which C (V(p)) satises a less restrictive hypothesis than being
equal to zero, and this has been done to some extent in [14]; however, this higher
level of generality is beyond our focus here. Since our goal is to present the basic
ideas, we stick to positive curvature.

8.6.1

Noncommutative Varieties and Their Curvature

We next dene a number of basic geometric objects associated to the noncommutative variety determined by a noncommutative polynomial p.
Varieties, tangent planes, and the second fundamental form
The variety (zero set) of a p R<x> is
V(p) :=

<

Vn (p),

n1

where
:
;
Vn (p) := (X, v) (Snn )g Rn : p(X)v = 0 .
The clamped tangent plane to V(p) at (X, v) Vn (p) is
Tp (X, v) := {H (Snn )g : p (X)[H]v = 0}.
The clamped second fundamental form for V(p) at (X, v) Vn (p) is the quadratic
form
Tp (X, v) R,

H  p (X)[H]v, v.

Note that
{X (Snn )g : (X, v) V(p) for some v = 0} = {X (Snn )g : det(p(X)) = 0}
is a variety in (Snn )g and typically has a true (commutative) tangent plane at
many points X, which of course has codimension one, whereas the clamped tangent
plane at a typical point (X, v) Vn (p) has codimension on the order of n and is
contained inside the true tangent plane.
Full rank points
The point (X, v) V(p) is a full rank point of p if the mapping
(Snn )g Rn ,

H  p (X)[H]v

is onto. The full rank condition is a nonsingularity condition which amounts to a


smoothness hypothesis. Such conditions play a major role in real algebraic geometry; see [5, Section 3.3].

i
i

8.6. Noncommutative Varieties with Positive Curvature Have Degree 2

main
2012/11/1
page 389
i

389

As an example, consider the classical real algebraic geometry case of n = 1


(and thus X Rg ) with the commutative polynomial p (which can be taken to
be the commutative collapse of the polynomial p). In this case, a full rank point
(X, 1) Rg R is a point at which the gradient of p does not vanish. Thus, X is a
nonsingular point for the zero variety of p.
Some perspective for n > 1 is obtained by counting dimensions. If (X, v)
(Snn )g Rn , then H  p (X)[H]v is a linear map from the g(n2 +n)/2-dimensional
space (Snn )g into the n-dimensional space Rn . Therefore, the codimension of the
kernel of this map is no bigger than n. This codimension is n if and only if (X, v)
is a full rank point, and in this case the clamped tangent plane has codimension n.
Positive curvature
As noted earlier, a notion of positive (really nonnegative) curvature can be dened
in terms of the clamped second fundamental form.
The variety V(p) has positive curvature at (X, v) V(p) if the clamped second
fundamental form is nonnegative at (X, v), i.e., if
p (X)[H]v, v 0

for every H Tp (X, v) .

Irreducibility: The minimum degree dening polynomial condition


While there is no tradition of what is an eective notion of irreducibility for noncommutative polynomials, there is a notion of minimal degree noncommutative
polynomial which is appropriate for the present context. In the commutative case
p) if there
the polynomial p on Rg is a minimal degree dening polynomial for (
does not exist a polynomial q of lower degree such that (
p) = (q). This is a key
feature of irreducible polynomials.
Denition 8.99. A symmetric noncommutative polynomial p is a minimum degree
dening polynomial for a nonempty set D V(p) if whenever q = 0 is another (not
necessarily symmetric) noncommutative polynomial such that q(X)v = 0 for each
(X, v) D, then
deg(q) deg(p).
Note this contrasts with [15], where minimal degree meant a slightly weaker inequality
holds.
The reader who is so inclined can simply choose D = V(p) or D equal to the
full rank points of V(p).
Now we give an example to illustrate these ideas.

8.6.2

A Very Simple Example

In the following example, the null space


T = Tp (X, v) = {H (Snn )g : p (X)[H]v = 0}

i
i

390

main
2012/11/1
page 390
i

Chapter 8. Free Convexity

is computed for certain choices of p, X, and v. Recall that if p(X)v = 0, then the
subspace T is the clamped tangent plane introduced in Subsection 8.6.1.
Example 8.100. Let X Snn , v Rn , v = 0, and let p(x) = xk for some integer
k 1. Suppose that (X, v) V(p), that is, X k v = 0. Then, since
X k v = 0 Xv = 0

when X Snn ,

it follows that p is a minimum degree dening polynomial for V(p) if and only if
k = 1.
It is readily checked that
(X, v) V(p) = p (X)[H]v = X k1 Hv
and hence that X is a full rank point for p if and only if X is invertible.
Now suppose k 2. Then
p (X)[H]v, v = 2HX k2 Hv, v.
Therefore, if k > 2,
(X, v) V(p) and p (X)[H]v = 0 = XHv = 0, and so
p (X)[H]v, v = 0.
To count the dimension of T we can suppose without loss of generality that


2
3
0 0
X=
and v = 1 0 0 ,
0 Y
where Y S(n1)(n1) is invertible. Then, for the simple case under consideration,
T = {H Snn : h21 , . . . , hn1 = 0},
where hij denotes the ij entry of H. Thus,
dim T =

n2 + n
(n 1),
2

i.e., codim T = n 1.
Remark 8.101. We remark that
X k v = 0 and p (X)[H]v, v = 0 = p (X)[H]v = 0 if k = 2t 4,
as follows easily from the formula
p (X)[H]v, v = 2X t1 Hv, X t1 Hv.
Exercise 8.102. Let A Snn and let U be a maximal strictly negative subspace of Rn with respect to the quadratic form Au, u. Prove that there exists a
complementary subspace V of U in Rn such that Av, v 0 for every v V.

i
i

8.6. Noncommutative Varieties with Positive Curvature Have Degree 2

8.6.3

main
2012/11/1
page 391
i

391

Main Result: Positive Curvature and the Degree of p

Theorem 8.103. Let p be a symmetric noncommutative polynomial in g symmetric


variables, let O be a noncommutative basic open semialgebraic set, and let R denote
the full rank points of p in V(p) O. If
1. R is nonempty,
2. V(p) has positive curvature at each point of R, and
3. p is a minimum degree dening polynomial for R,
then deg(p) is at most 2 and p is concave.

8.6.4

Ideas and Proofs

Our aim is to give the idea behind the proof of Theorem 8.103 under much stronger
hypotheses. We saw earlier the positivity of a quadratic on a noncommutative basic
open set O imparts positivity to its middle matrix there. The following shows this
happens for thin sets (noncommutative varieties) too. Thus, the following theorem
generalizes the QuadratischePositivstellensatz, Theorem 8.60.
Theorem 8.104. Let p, O, R be as in Theorem 8.103. Let q(x)[h] be a polynomial
which is quadratic in h having middle matrix representation q = V  ZV for which
deg(V ) deg(p). If
v  q(X)[H]v 0

for all

(X, v) R and all H,

(8.25)

then Z(X) is positive semidenite for all X with (X, v) R.


Proof. The proof of this theorem follows the proof of the QuadratischePositivstellensatz, modied to take into account the set R.
Suppose for each (X, v) R there is a linear combination G(X,v) (x) of the
words {m(x) : deg(m) < deg(p)} with G(X,v) (X)v = 0 for all (X, v) R. Then by
Theorem 8.92 (note that R is closed under direct sums), there is a linear combination
G R<x>deg(p)1 with G(X)v = 0. However, this is absurd by the minimality
of p. Hence there is a (Y, v) R such that {m(Y )v : deg(m) < deg(p)} is linearly
independent.
g ) there is a
= (X
1, . . . , X
Assume for some g-tuple of symmetric matrices X
v) R, and the matrix Z(X)
is not positive semidenite.
vector v such that (X,
Y and = v v. Then (X, ) R( ) for some ; the matrix Z(X) is
Let X = X
not positive semidenite; and {m(X) : deg(m) < deg(p)} is linearly independent.
+ 1, where is given in Lemma 8.84, and let n = N .
Let N = g (1)
2
Consider W = X IN = (X1 IN , . . . , Xg IN ) and vector = e, where
e RN is the vector with each entry equal to 1. Then (W, ) R(n), and the set
{m(W ) : m <x> } is linearly independent; thus by Lemma 8.84, the codimension of M = {V (W )[H] : H (Snn )g } is at most N 1. On the other hand,
because Z(X) has a negative eigenvalue, the matrix Z(W ) has an eigenspace E,

i
i

392

main
2012/11/1
page 392
i

Chapter 8. Free Convexity

corresponding to a negative eigenvalue, of dimension at least N . It follows that


E M is nonempty; i.e., there is an H (Snn )g such that V (W )[H] E. In
particular,
q(W )[H],  = Z(W )V (W )[H], V (W ) < 0,
and thus, q(W )[H] is not positive semidenite.
The modied Hessian
Our main tool for analyzing the curvature of noncommutative varieties is a variant
of the Hessian for symmetric noncommutative polynomials p. The curvature of V(p)
is dened in terms of Hess (p) compressed to tangent planes, for each dimension n.
This compression of the Hessian is awkward to work with directly, and so we associate to it a quadratic polynomial q(x)[h] carrying all of the information of p
compressed to the tangent plane, but having the key property (8.25). We shall call
this q we construct the relaxed Hessian. The rst step in constructing the relaxed
Hessian is to consider the simpler modied Hessian
p,0 (x)[h] := p (x)[h] + p (x)[h] p (x)[h],
which captures the conceptual idea. Suppose X (Snn )g and v Rn . We say
that the modied Hessian is negative at (X, v) if there is a 0 < 0, so that for all
0 ,
0 p,0 (X)[H]v, v
nn g
) Rn , we
for all H (Snn )g . Given a subset R = (R(n))
n=1 , with R(n) (S
say that the modied Hessian is negative on R if it is negative at each (X, v) S.
Now we turn to motivation.

Example 8.105. The classical n = 1 case. Suppose that p is strictly smoothly


quasi-concave, meaning that all superlevel sets of p are strictly convex with strictly
positively curved smooth boundary. Suppose that the gradient p (written as a
row vector) never vanishes on Rg . Then G = p(p) is strictly positive at each
point X in Rg . Fix such an X; the modied Hessian can be decomposed as a block
matrix subordinate to the tangent plane to the level set at X, denoted TX , and to
its orthogonal complement (the gradient direction):
TX {p : R}.
In this decomposition the modied Hessian has the form


A
B
R=
.
B  D + G
Here, in the case of = 0, R is the Hessian and the second fundamental form is A or
A, depending on convention and the rather arbitrary choice of inward or outward
normal to . If we select our normal direction to be p, then A is the classical

i
i

8.6. Noncommutative Varieties with Positive Curvature Have Degree 2

main
2012/11/1
page 393
i

393

second fundamental form as is consistent with the choice of sign in our denition
in Subsection 8.6.1. (All this concern with the sign is unimportant to the content
of this chapter and can be ignored by the reader.)
Next, in view of the presumed strict positive curvature of each level set ,
the matrix A at each point of is negative denite but the Hessian could have a
negative eigenvalue. However, by standard Schur complement arguments, R will be
negative denite if
D + G B  A1 B 0
on this region. Thus, strict convexity assumptions on the sublevel sets of p make
the modied Hessian negative denite for negative enough . One can make
this negative deniteness uniform in X in various neighborhoods under modest
assumptions.
Very unfortunately in the noncommutative case, Remark 6.8 [17] implies that
if n is large enough, then the second fundamental form will have a nonzero null
space, thus strict negative deniteness of the A part of the modied Hessian is
impossible.
Our trick for dealing with the likely reality that A is only positive semidenite
and obtaining a negative denite R is to add another negative term, say I, with
arbitrarily small < 0. After adding such , the argument based on choosing
large succeeds as before. This term plus the term produces the relaxed Hessian, to be introduced next, and proper selection of these terms makes it negative
denite.
The relaxed Hessian
Recall Let Vk (x)[h] denotes the vector of polynomials with entries hj w(x), where
w <x> runs through the set of g k words of length k, j = 1, . . . , g. Although the
order of the entries is xed in some of our earlier applications (see e.g. [16, (2.3)])
it is irrelevant for the moment. Thus, Vk = Vk (x)[h] is a vector of height g k+1 , and
the vectors
V (x)[h] = col(V0 , . . . , Vd2 ) and VG (x)[h] = col(V0 , . . . , Vd1 )
are vectors of height g(d 2) and g(d 1), respectively. Note that
VG (x)[h] VG (x)[h] =

g


w(x) h2j w(x).

j=1 deg(w)d1

The relaxed Hessian of the symmetric noncommutative polynomial p of degree


d is dened to be
p, (x)[h] := p,0 (x)[h] + VG (x)[h] VG (x)[h] R<x>[h].

i
i

394

main
2012/11/1
page 394
i

Chapter 8. Free Convexity

Suppose X (Snn )g and v Rn . We say that the relaxed Hessian is negative at


(X, v) if for each < 0 there is a < 0, so that for all ,
0 p, (X)[H]v, v
nn g
for all H (Snn )g . Given an R = (R(n))
) Rn , we
n=1 , with R(n) (S
say that the relaxed Hessian is positive (respectively, negative) on R if it is positive
(respectively, negative) at each (X, v) S.
The following theorem provides a link between the signature of the clamped
second fundamental form with that of the relaxed Hessian.

Theorem 8.106. Suppose p is a symmetric noncommutative polynomial of degree


d in g symmetric variables and (X, v) (Snn )g Rn . If V(p) has positive curvature
at (X, v) Vn (p), i.e., if
p (X)[H]v, v 0

for every H Tp (X, v),

then for every < 0 there exists a < 0 such that for all ,
p, (X)[H]v, v 0

for every H (Snn )g ;

i.e., the relaxed Hessian of p is negative at (X, v).


We leave the proof of Theorem 8.106 to the reader.
The basic idea of the proof of Theorem 8.103 is to obtain a negative relaxed
Hessian q from Theorem 8.106 and then apply Theorem 8.104. We begin with the
following lemma.
Lemma 8.107. Suppose R and T are operators on a nite-dimensional Hilbert
space H = K L. Suppose further that, with respect to this decomposition of H,
the operator R = CC  for
 


r
T 0
C=
: L K L and T = 0
.
c
0 0
If c is invertible and if for every > 0 there is a > 0 such that for all > ,
T + I + R  0,
then T  0.
Proof. Write

T + I + rr
T + I + R = 0
cr


rc
.
+ cc

From Schur complements it follows that


T0 + I + r( 2 c ( + cc )1 c)r  0.

i
i

8.6. Noncommutative Varieties with Positive Curvature Have Degree 2

main
2012/11/1
page 395
i

395

Now
r( 2 c ( + cc )1 c)r = rc ((cc )1 ( + cc )1 )cr
= rc (cc )1 ( + (cc ))1 cr
r(cc )1 r .
Hence,
T0 + I + r(cc )1 r  0.
Since the above inequality holds for all > 0, it follows that T0  0.
We now have enough machinery developed to prove Theorem 8.103.
Proof of Theorem 8.103. Fix , > 0 and consider q(x)[h] = p, (x)[h]. We
are led to investigate the middle matrix Z , of q(x)[h], whose border vector V (x)[h]
includes all monomials of the form hj m, where m is a word in x only of length at
most d 1; here d is the degree of p. Indeed,
Z , = Z + I + W,
where Z is the middle matrix for p (x)[h] and W is the middle matrix for
p (x)[h] p (x)[h]. With an appropriate choice of ordering for the border vector V ,
we have W = CC  , where


w(x)
C(x) =
c
for a nonzero vector c, and at the same time,
 0,0
Z (x)
Z(x) =
0


0
.
0

By the curvature hypothesis at a given X with (X, v) R, Theorem 8.106


implies for every > 0 there is an > 0 such that if >
q(X)[H]v, v 0

for all (X, v) R and all H.

Hence, by Theorem 8.104, the middle matrix, Z , (X) for q(x)[h] is positive semidefinite. We are in the setting of Lemma 8.107, from which we obtain Z 0,0 (X)  0. If
this held for X in a noncommutative basic open semialgebraic set, then Theorem
8.71 forces p to have degree no greater than 2. The proof of that theorem applies
easily here to nish this proof.

8.6.5

Exercises

Exercise 8.108. Compute the BV-MM representation for the relaxed Hessian of
x3 and x4 .

i
i

396

8.7

main
2012/11/1
page 396
i

Chapter 8. Free Convexity

Convex Semialgebraic Noncommutative Sets

In this section we will give a brief overview of convex semialgebraic noncommutative sets and positivity of noncommutative polynomials on them. We shall see that
their structure is much more rigid than that of their commutative counterparts.
For example, roughly speaking, each convex semialgebraic noncommutative set is a
spectrahedron, i.e., a solution set of a linear matrix inequality (LMI) (cf. Section
8.7.1 below). Similarly, every noncommutative polynomial nonnegative on a spectrahedron admits a sum of squares representation with weights and optimal degree
bounds (see Section 8.7.2 for details and precise statements).

8.7.1

Noncommutative Spectrahedra

Let L be an ane linear pencil. Then the solution set of the LMI L(x) 0 is
DL =

<:
;
X (Snn )g : L(X) 0
nN

and is called a noncommutative spectrahedron. The set DL is convex in the sense


that each
:
;
DL (n) := X (Snn )g : L(X) 0
is convex. It is also a noncommutative basic open semialgebraic set as dened in
Section 8.2.1 above. The main theorem of this section is the converse, a result which
has implications for both semidenite programming and systems engineering.
Most of the time we will focus on monic linear pencils. An ane linear pencil
L is called monic if L(0) = I, i.e., L(x) = I + A1 x1 + + Ag xg . Since we are
mostly interested in the set DL , there is no harm in reducing to this case whenever
DL = ; see Exercise 8.111.
Let p R <x> be a given symmetric noncommutative valued matrix
polynomial. Assuming that p(0) 0, the positivity set Dp (n) of a noncommutative
symmetric polynomial p in dimension n is the component of 0 of the set
{X (Snn )g : p(X) 0}.
The positivity set, Dp , is the sequence of sets (Dp (n))nN . The noncommutative set
Dp is called convex if, for each n, Dp (n) is convex.
Theorem 8.109 (HeltonMcCullough [38]). Fix p, a symmetric matrix
of polynomials in noncommuting variables. Assume
1. p(0) is positive denite;
2. Dp is bounded; and
3. Dp is convex.

i
i

8.7. Convex Semialgebraic Noncommutative Sets

main
2012/11/1
page 397
i

397

Then there is a monic linear pencil L such that


DL = Dp .
Here we shall conne ourselves to a few words about the techniques involved
in the proof, and refer the reader to [38] for the full proof. Since we are dealing with
matrix convex sets, it is not surprising that the starting point for our analysis is
the matricial version of the HahnBanach separation theorem of Eros and Winkler
[20], which (itself a part of the theory of operator spaces and completely positive
maps [4, 57, 58]) says that given a point x not inside a matrix convex set there is a
(nite) LMI which separates x from the set. For a general matrix convex set C, the
conclusion is then that there is a collection, likely innite, of LMIs which cut out C.
In the case C is matrix convex and also semialgebraic, the challenge is to prove
that there is actually a nite collection of LMIs which dene C. The techniques
used to meet this challenge have little relation to the methods of noncommutative
calculus and positivity in the previous sections. Indeed a basic tool (of independent
interest) is a degree bounded type of free Zariski closure of a single point (X, v)
(Snn )g Rn ,
<
Zd (X, v) := {(Y, w) (Smm )g Rm : q(Y )w = 0 if q(X)v = 0, q R<x>d }.
m

Chief among a pleasant list of natural properties is the fact that there is an (X, v)
with X Dp and p(X)v = 0 for which Zd (X, v) contains all pairs (Y, w) such that
Y Dp and p(Y )w = 0. Combining this with the ErosWinkler theorem and
battling degeneracies is a bit tricky, but separation prevails in the end. See [38] for
the details.
An unexpected consequence of Theorem 8.109 is that projections of noncommutative semialgebraic sets may not be semialgebraic; see Exercise 8.112. For perspective, in the commutative case of a basic open semialgebraic subset C of Rg , there
is a stringent condition, called the line test (see Chapter 6 for more details), which,
in addition to convexity, is necessary for C to be a spectrahedron. In two dimensions
the line test is necessary and sucient [44], a result used by LewisParriloRamana
[51] to settle a 1958 conjecture of Peter Lax on hyperbolic polynomials.
In summary, if a (commutative) bounded basic open semialgebraic convex set
is a spectrahedron, then it must pass the highly restrictive line test; whereas a
noncommutative basic open semialgebraic set is a spectrahedron if and only if it is
convex.

8.7.2

Noncommutative Positivstellens
atze under Convexity
Assumptions

An algebraic certicate for positivity of a polynomial p on a semialgebraic set S is


a Positivstellensatz. The familiar fact that a polynomial p in one variable which is
positive on R is a sum of squares is an example.
The theory of Positivstellens
atzea pillar of the eld of real algebraic
geometryunderlies the main approach currently used for global optimization of

i
i

398

main
2012/11/1
page 398
i

Chapter 8. Free Convexity

polynomials. See [50] or Chapter 3 by Parrilo for a beautiful treatment of this,


and other, applications of commutative real algebraic geometry. Further, because
convexity of a polynomial p on a set S is equivalent to positivity of the Hessian
of p on S, this theory also provides a link between convexity and semialgebraic
geometry. Indeed, this link in the noncommutative setting ultimately leads to the
conclusion that a matrix convex noncommutative polynomial has degree at most 2;
cf. Subsection 8.4.4.
In this section we give a result of opposite type. We present a noncommutative
Positivstellensatz for a polynomial to be nonnegative on a convex semialgebraic
noncommutative set (i.e., on a spectrahedron). Again, this result is cleaner and
more rigid than the commutative counterparts (cf. Theorem 8.10).
Theorem 8.110 ([33]). Suppose L is a monic linear pencil. Then a noncommutative polynomial p is positive semidenite on DL if and only if it has a weighted
sum of squares representation with optimal degree bounds. Namely,
p = s s +

nite


fj Lfj ,

(8.26)

where s, fj are vectors of noncommutative polynomials of degree no greater than


deg(p)
2 .
The main ingredient of the proof is an analysis of rank preserving extensions
of truncated noncommutative Hankel matrices; see [33] for details. We point out
that with L = 1, Theorem 8.110 recovers Theorem 8.10.
Theorem 8.110 contrasts sharply with the commutative setting, where the
degrees of s, fj are vastly greater than deg(p) and assuming only p nonnegative
yields a clean Positivstellensatz so seldom that the cases are noteworthy.

8.7.3

Exercises

Exercise 8.111. Suppose L is an ane linear pencil such that 0 DL (1). Show
with DL = D .
that there is a monic linear pencil L
L
Exercise 8.112. Chapters 5 and 6 discuss sets D Rg which have a semidenite
representation as a strict generalization of a spectrahedron. For instance, consider
the TV screen (cf. Section 8.2.1)
ncTV(1) = {X R2 : 1 X14 X24 > 0} R2 .
Given a positive real number, choose 4 = 1 + 22 and let

1 0
y1

y2
L0 = 0 1
y1 y2 1 2(y1 + y2 )


and
Lj =

1
xj


xj
,
+ yj

j = 1, 2.

(8.27)

(8.28)

i
i

8.7. Convex Semialgebraic Noncommutative Sets

main
2012/11/1
page 399
i

399

Note that the Lj are not monic, but because Lj (0) 0, they can be normalized to
be monic without altering the solution sets of Lj (X) 0; cf. Exercise 8.111. Let
L = L0 L1 L2 .
It is readily veried that ncTV(1) is the projection onto the rst two (the x)
coordinates of the set DL (1); i.e.,
ncTV(1) = {X R2 : Y R2 L(X, Y ) 0}.
1. Show that ncTV(1) is not a spectrahedron. (Hint: How often is LTV (tX, tY ) for
t R singular?)
2. Show that ncTV is not the projection of the noncommutative spectrahedron DL .
3. Show that ncTV is not the projection of any noncommutative spectrahedron.
4. Is ncTV(2) a projection of a spectrahedron? (Feel free to use the results about
ncTV and LMI representable sets (spectrahedra), stated without proofs, from
Sections 8.2.1 and 8.7.1.)
Exercise 8.113. If q is a symmetric concave matrix-valued polynomial with
q(0) = I, then there exists a linear pencil L and a matrix-valued linear polynomial
such that
q = I L  .
Exercise 8.114. Consider the monic linear pencil


1 x
M (x) =
.
x 1
1. Determine DM .
2. Show that 1 + x is positive semidenite on DM .
3. Construct a representation for 1 + x of the form (8.26).
Exercise 8.115. Consider the univariate ane linear pencil


1 x
L(x) =
.
x 0
1. Determine DL .
2. Show that x is positive semidenite on DL .
3. Does x admit a representation of the form (8.26)?
Exercise 8.116. Let L be an ane linear pencil. Prove that
1. DL is bounded if and only if DL (1) is bounded;
2. DL = if and only if DL (1) = .

i
i

400

main
2012/11/1
page 400
i

Chapter 8. Free Convexity

Exercise 8.117. Let L = I + A1 x1 + + Ag xg be a monic linear pencil and


assume that DL (1) is bounded. Show that I, A1 , . . . , Ag are linearly independent.
Exercise 8.118. Let

0
(x1 , x2 ) = I + 1
0

1 0
0 0
0 0 x1 + 0 0
0 0
1 0

1
1
0 x2 = x1
0
x2

x1
1
0

x2
0
1

and




1 0
0
(x1 , x2 ) = I +
x +
0 1 1
1



1
1 + x1
x =
x2
0 2

x2
1 x1

be ane linear pencils. Show


1. D (1) = D (1).
2. D (2)  D (2).
3. Is D D ? What about D D ?
Exercise 8.119. Let L = A1 x1 + + Ag xg Sdd <x> be a (homogeneous)
linear pencil. Then the following are equivalent:
(i) DL (1) = ;
(ii) If u1 , . . . , um Rd with

8.8

m
i=1

ui L(x)ui = 0, then u1 = = um = 0.

From Free Real Algebraic Geometry to the


Real World

Now that you have gone through the mathematics we return to its implications. In
the linear systems engineering problems you have seen both in Section 8.1.1 and in
Section 2.2.1, the conclusion was that the problem was equivalent to solving an LMI.
Indeed this is what one sees throughout the literature. Thousands of engineering
papers have a dimension free problem and it converts (often by serious cleverness)
to an LMI in the best of cases, or more likely there is some approximate solution
which is an LMI.
While engineers would be satised with convexity, what they actually do get
is an LMI. One would hope that there is a rich world of convex situations not
equivalent to an LMI. Then there would be a variety of methods waiting to be
discovered for dealing with them. Alas what we have shown here is compelling
evidence that any convex dimension free problem is equivalent to an LMI. Thus
there is no rich world of convexity beyond what is already known and no armada
of techniques beyond those for producing LMIs which we already see all around us.

i
i

Bibliography

main
2012/11/1
page 401
i

401

Bibliography
[1] S. Balasubramanian and S. McCullough. Quasi-convex free polynomials. To
appear in Proc. Amer. Math. Soc. http://arxiv.org/abs/1208.3582.
[2] G. M. Bergman. Rational relations and rational identities in division rings I.
J. Algebra, 43:252266, 1976.
[3] R. Bhatia. Matrix Analysis. Springer-Verlag, Berlin, 1997.
[4] D. P. Blecher and C. Le Merdy. Operator Algebras and Their ModulesAn
Operator Space Approach, Oxford Science Publications, Oxford, UK, 2004.
[5] J. Bochnak, M. Coste, and M. F. Roy. Real Algebraic Geometry. SpringerVerlag, Berlin, 1998.
[6] M. Bresar and I. Klep. A local-global principle for linear dependence of noncommutative polynomials. Israel J. Math., to appear.
[7] K. Cafuta, I. Klep, and J. Povh. A note on the nonexistence of sum of squares
certicates for the Bessis-Moussa-Villani conjecture. J. Math. Phys., 51:083521,
2010.
[8] K. Cafuta, I. Klep, and J. Povh. NCSOStools: a computer algebra system
for symbolic and numerical computation with noncommutative polynomials.
Optim. Methods Softw., 26:363380, 2011.
[9] J. F. Camino, J. W. Helton, R. E. Skelton, and J. Ye. Matrix inequalities:
A symbolic procedure to determine convexity automatically. Integral Equations
Operator Theory, 46:399454, 2003.
[10] P. M. Cohn. Skew Fields. Theory of General Division Rings. Cambridge University Press, Cambridge, UK, 1995.
[11] P. M. Cohn. Free Ideal Rings and Localization in General Rings. Cambridge
University Press, Cambridge, UK, 2006.
[12] A. C. Doherty, Y.-C. Liang, B. Toner, and S. Wehner. The quantum moment
problem and bounds on entangled multi-prover games. In Twenty-Third Annual
IEEE Conference on Computational Complexity, 2008, pp. 199210.
[13] M. de Oliviera, J. W. Helton, S. McCullough, and M. Putinar. Engineering systems and free semi-algebraic geometry. In Emerging Applications of Algebraic
Geometry, IMA Vol. Math. Appl. 149. Springer-Verlag, Berlin, 2009, pp. 1762.
[14] H. Dym, J. M. Greene, J. W. Helton, and S. McCullough. Classication of all
noncommutative polynomials whose Hessian has negative signature one and a
noncommutative second fundamental form. J. Anal. Math., 108:1959, 2009.
[15] H. Dym, J. W. Helton, and S. McCullough. Irreducible noncommutative dening polynomials for convex sets have degree four or less. Indiana Univ. Math. J.,
56:11891232, 2007.

i
i

402

main
2012/11/1
page 402
i

Chapter 8. Free Convexity

[16] H. Dym, J. W. Helton, and S. McCullough. The Hessian of a non-commutative


polynomial has numerous negative eigenvalues. J. Anal. Math., 102:2976,
2007.
[17] H. Dym, J. W. Helton, and S. McCullough. Noncommutative varieties with
curvature having bounded signature, Illinois J. Math., to appear.
[18] E. G. Eros. A matrix convexity approach to some celebrated quantum
inequalities. Proc. Natl. Acad. Sci. USA, 106:10061008, 2009.
[19] A. Ebadiana, I. Nikoufarb, and M. E. Gordjic. Perspectives of matrix convex
functions. Proc. Natl. Acad. Sci. USA, 108:73137314, 2011.
[20] E. G. Eros and S. Winkler. Matrix convexity: Operator analogues of the
bipolar and Hahn-Banach theorems. J. Funct. Anal., 144:117152, 1997.
[21] W.S. Gray and Y. Li. Generating series for interconnected analytic nonlinear
systems. SIAM J. Control Optim., 44:646672, 2005.
[22] W.S. Gray and M. Thitsa. A unied approach to generating series of mixed
cascades of analytic nonlinear input-output systems. Internat. J. Control,
85:17371754, 2012.
[23] J. M. Greene, J. W. Helton, and V. Vinnikov. Noncommutative plurisubharmonic polynomials, Part I: Global assumptions. J. Funct. Anal., 261:3390
3417, 2011.
[24] F. Hansen. Operator convex functions of several variables. Publ. Res. Inst.
Math. Sci., 33:443463, 1997.
[25] F. Hansen and J. Tomiyama. Dierential analysis of matrix convex functions.
Linear Algebra Appl., 420:102116, 2007.
[26] D. M. Hay, J. W. Helton, A. Lim, and S. McCullough. Non-commutative
partial matrix convexity. Indiana Univ. Math. J., 57:28152842, 2008.
[27] J. W. Helton. Positive noncommutative polynomials are sums of squares.
Ann. of Math. (2), 156:675694, 2002.
[28] J. W. Helton, I. Klep, and S. McCullough. Analytic mappings between
noncommutative pencil balls. J. Math. Anal. Appl., 376:407428, 2011.
[29] J. W. Helton, I. Klep, and S. McCullough. Proper analytic free maps. J.
Funct. Anal., 260:14761490, 2011.
[30] J. W. Helton, I. Klep, and S. McCullough. Relaxing LMI domination matricially. In 49th IEEE Conference on Decision and Control, 2010, pp. 33313336.
[31] J. W. Helton, I. Klep, and S. McCullough. Convexity and semidenite programming in dimension-free matrix unknowns. In M. Anjos and J. B. Lasserre,
editors, Handbook of Semidenite, Cone and Polynomial Optimization.
Springer-Verlag, Berlin, 2012, pp. 377405.

i
i

Bibliography

main
2012/11/1
page 403
i

403

[32] J. W. Helton, I. Klep, and S. McCullough. The matricial relaxation of a linear


matrix inequality. Preprint, http://arxiv.org/abs/1003.0908. To appear in
Math. Program.
[33] J. W. Helton, I. Klep, and S. McCullough. The convex Positivstellensatz in a
free algebra. Adv. Math., 231:516534, 2012.
[34] J. W. Helton, I. Klep, S. McCullough, and N. Slinglend. Noncommutative ball
maps. J. Funct. Anal., 257:4787, 2009.
[35] J. W. Helton and O. Merino. Sucient conditions for optimization of matrix functions. In 37th IEEE Conference on Decision and Control, 1998,
pp. 33613365.
[36] J. W. Helton and S. McCullough. Convex noncommutative polynomials have
degree two or less. SIAM J. Matrix Anal. Appl., 25:11241139, 2004.
[37] J. W. Helton and S. McCullough. A Positivstellensatz for noncommutative
polynomials. Trans. Amer. Math. Soc., 356:37213737, 2004.
[38] J. W. Helton and S. McCullough. Every free basic convex semialgebraic set
has an LMI representation. Ann. of Math., 176:9791013, 2012.
[39] J. W. Helton, S. McCullough, and M. Putinar. A non-commutative Positivstellensatz on isometries. J. Reine Angew. Math., 568:7180, 2004.
[40] J. W. Helton, S. McCullough, M. Putinar, and V. Vinnikov. Convex matrix
inequalities versus linear matrix inequalities. IEEE Trans. Automat. Control,
54:952964, 2009.
[41] J. W. Helton, S. McCullough, and V. Vinnikov. Noncommutative convexity
arises from linear matrix inequalities. J. Funct. Anal., 240:105191, 2006.
[42] J. W. Helton, M. de Oliveira, R. L. Miller, and M. Stankus. NCAlgebra:
A Mathematica package for doing non commuting algebra, available from
http://www.math.ucsd.edu/ncalg/.
[43] J. W. Helton and M. Putinar. Positive polynomials in scalar and matrix variables, the spectral theorem and optimization, In Operator Theory, Structured
Matrices, and Dilations, Theta Ser. Adv. Math. 7. American Mathematical
Society, Providence, RI, 2007, pp. 229306.
[44] J. W. Helton and V. Vinnikov. Linear matrix inequality representation of sets.
Comm. Pure Appl. Math. 60:654674, 2007.
[45] D. Kalyuzhnyi-Verbovetski and V. Vinnikov. Singularities of rational functions and minimal factorizations: The noncommutative and the commutative
setting. Linear Algebra Appl., 430:869889, 2009.
[46] D. Kalyuzhnyi-Verbovetski and V. Vinnikov. Foundations of noncommutative
function theory, in preparation.

i
i

404

main
2012/11/1
page 404
i

Chapter 8. Free Convexity

[47] I. Klep and M. Schweighofer. Connes embedding conjecture and sums of


Hermitian squares. Adv. Math., 217:18161837, 2008.
[48] I. Klep and M. Schweighofer. Sums of Hermitian squares and the BMV
conjecture. J. Stat. Phys., 133:739760, 2008.

[49] F. Kraus. Uber


konvexe matrixfunktionen. Math. Z., 41:1842, 1936.
[50] J. B. Lasserre. Moments, Positive Polynomials and Their Applications.
Imperial College Press, London, 2010.
[51] A. S. Lewis, P. A. Parrilo, and M. V. Ramana. The Lax conjecture is true.
Proc. Amer. Math. Soc., 133:24952499, 2005.
[52] T. Lyons, M. Caruana, and T. Levy. Dierential equations drive by rough

paths. In Ecole
dEte de Probabilites de Saint-Flour XXXIV, Lecture Notes in
Math. 1908, Springer-Verlag, Berlin, 2004.
[53] S. McCullough. Factorization of operator-valued polynomials in several
noncommuting variables. Linear Algebra Appl., 326:193203, 2001.
[54] P. S. Muhly and B. Solel. Progress in noncommutative function theory. Sci.
China Ser. A, 54:22752294, 2011.
[55] B. Sz.-Nagy, C. Foias, H. Bercovici, and L. Kerchy. Harmonic Analysis of
Operators on Hilbert Space. Springer-Verlag, New York, 2010.
[56] H. Osaka, S. Silvestrov, and J. Tomiyama. Monotone operator functions, gaps
and power moment problem. Math. Scand., 100:161183, 2007.
[57] V. Paulsen. Completely Bounded Maps and Operator Algebras. Cambridge
University Press, Cambridge, UK, 2002.
[58] G. Pisier. Introduction to Operator Space Theory. Cambridge University Press,
Cambridge, UK, 2003.
[59] S. Pironio, M. Navascues, and A. Acn. Convergent relaxations of polynomial
optimization problems with noncommuting variables. SIAM J. Optim.,
20:21572180, 2010.
[60] G. Popescu. Free holomorphic functions on the unit ball of B(H)n . J. Funct.
Anal., 241:268333, 2006.
[61] G. Popescu. Free holomorphic automorphisms of the unit ball of B(H)n .
J. Reine Angew. Math., 638:119168, 2010.
[62] K. Schm
udgen. A strict Positivstellensatz for the Weyl algebra. Math. Ann.,
331:779794, 2005.
[63] K. Schm
udgen. Noncommutative real algebraic geometrysome basic concepts and rst ideas. In Emerging Applications of Algebraic Geometry, IMA
Vol. Math. Appl. 149. Springer-Verlag, Berlin, 2009, pp. 325350.

i
i

Bibliography

main
2012/11/1
page 405
i

405

[64] D. Shlyakhtenko and D.-V. Voiculescu. Free analysis workshop summary,


American Institute of Mathematics,
http://www.aimath.org/pastworkshops/freeanalysis.html.
[65] R. E. Skelton and T. Iwasaki. Eye on education. Increased roles of linear
algebra in control education. IEEE Control Syst. Mag., 15:7690, 1995.
[66] R. E. Skelton, T. Iwasaki, and K. M. Grigoriadis. A Unied Algebraic
Approach to Linear Control Design. Taylor & Francis, London, 1997.
[67] J. L. Taylor. Functions of several noncommuting variables. Bull. Amer. Math.
Soc., 79:134, 1973.
[68] M. Uchiyama. Operator monotone functions and operator inequalities. Sugaku
Expositions, 18:3952, 2005.
[69] D.-V. Voiculescu. Free analysis questions I: Duality transform for the coalgebra
of X:B . Int. Math. Res. Not., 16:793822, 2004.
[70] D.-V. Voiculescu. Free analysis questions II: The Grassmannian completion and
the series expansions at the origin. J. Reine Angew. Math., 645:155236, 2010.
[71] D.-V. Voiculescu, K. J. Dykema, and A. Nica. Free Random Variables. A
Noncommutative Probability Approach to Free Products with Applications to
Random Matrices, Operator Algebras and Harmonic Analysis on Free Groups.
American Mathematical Society, Providence, RI, 1992.

i
i

main
2012/11/1
page 406
i

main
2012/11/1
page 407
i

Chapter 9

Sums of Hermitian
Squares: Old and New

Mihai Putinar

This nal chapter marks a departure from the main framework of the book by
putting emphasis on hermitian forms over the complex eld rather than symmetric
forms over the real eld. The passage is both natural and necessary. To give a
simple motivation: polynomial or rational functions with real coecients, so much
praised in the preceding chapters, may very well have complex roots or complex
poles. Taking them into account greatly simplies computations and conceptual
thinking, as we all remember from elementary algebra. A second important observation goes back to the dictionary between elementary functions and matrices: by
writing in
complex coordinates a real valued polynomial (in any number of variables)
the hermitian matrix (c ), while a simp(z, z) = c z z uniquely
 determines
ilar decomposition q(x) =
x+ , with real coecients , so much needed
for semidenite programming, has a clear ambiguity. The appearance at this late
stage of the book of imaginary ghosts related to the basic entities encountered so
far should not discourage the truly real and very applied reader.

9.1

Introduction

A question arises from the very beginning: how much of the vast theory of hermitian
forms (in a nite or innite number of variables) should the student or practitioner
in applied areas of real algebra, functional analysis, algebraic geometry, or optimization theory know? Due to the depth and wide ramications of hermitian forms
(over the complex eld) versus forms over real elds, the answer is: quite a lot! The
good news is that the material, old and new, either is well known, circulating in part
as folklore, or is accessible, due to a century and a half of continuous development
Mihai

Putinar was supported by NSF grant DMS-1001071.

407

i
i

408

main
2012/11/1
page 408
i

Chapter 9. Sums of Hermitian Squares: Old and New

of hermitian forms, in all their impersonations. Without aiming at completeness,


we touch below several basic aspects of the theory of hermitian forms. The historical
and bibliographical notes, supplemented by the suggested problems will guide the
reader though this eld and will hopefully whet the appetite for a thorough study
of some specic subtopics. A glimpse at the table of contents (of this chapter) will
give an indication of what we aim at below: root separation of polynomials, the
structure of stable polynomials, eective computation of bounds for analytic functions, Hilbert space realization of analytic functions, hermitian positivity in several
complex variables, and a brief return to real algebra. The identication of a positive
denite hermitian form with a Hilbert space structure cannot be underestimated,
especially for the emerging domain of convex algebraic geometry whose frontiers
are delimited in this book. In other words: it is not an accident that Hilbert spaces
pop up unexpectedly in convex algebraic geometry.
It is important to state from the very beginning that a major source of the
theory of hermitian forms is omitted by our survey: the study of linear integral
equations as they appear in problems of mathematical physics, such as the stationary values of the energy functional in potential theory, vibrations of strings and
membranes, elasticity theory, dissipation of heat, and so on. Major gures in this
eld were Riemann, Hilbert, and Poincare and their contemporaries. Hilbert has
collected six of his groundbreaking articles on integral equations in a booklet [21].
The modern reader can nd them actual, accessible, and full of ideas. In particular,
Hilbert regards the whole area of integral equations as a chapter of the theory of
hermitian forms of innitely many variables. His point of view has persisted through
the rst half of the twentieth century, as one can also see from the German Mathematical Encyclopedia article by Hellinger and Toeplitz [19]. Even today (quantum)
mathematical physicists prefer to work with hermitian forms rather than with linear
unbounded operators, and the distinction is not only cosmetic.

9.2

Hermitian Forms and Sums of Squares

We start by recalling a few well-known facts about canonical forms of matrices and
positive denite kernels. Let C be the complex eld and denote by Md (C) the
algebra of d d matrices over C, regarded as linear transforms of the space Cd . We
endow Cd with its hermitian structure, that is, the inner product
z, w = z w = z1 w 1 + + zd wd ,
where z = (z1 , . . . , zd ), w = (w1 , . . . , wd ) Cd . We put as usual
z
2 = z, z. The
adjoint of a linear transform A L(Cd ) is dened by the identity
Az, w = z, A w.
Let e1 , . . . , ed denote the canonical orthonormal basis of Cd . When representing
A = (ajk )dj,k=1 and z = z1 e1 + + zd ed as a column vector, we have
(Az)j = Az, ej  =

d


ajk zk ,

k=1

i
i

9.2. Hermitian Forms and Sums of Squares

main
2012/11/1
page 409
i

409

whence A is represented by the transpose complex conjugate matrix (akj )dj,k=1 .


The linear transform A is called self-adjoint or hermitian if A = A . A linear
transform U L(Cd ) is called unitary if U U = U U = I, that is, U is isometric:
U z, U w = z, w, z, w Cd .

9.2.1

The Spectral Theorem

Theorem 9.1. Let A = A be a hermitian matrix. There exists a unitary matrix


U and a diagonal matrix D with real entries, such that
A = U DU .
The elements on the diagonal of D are determined by A, up to a permutation, as they coincide, multiplicity included, with the eigenvalues of A, that is the
roots of the characteristic polynomial det(I A). For proofs see Chapter IX in
Gantmachers monograph [15], or your favorite linear algebra textbook.
There are two other ways to look at the spectral theorem. One of them involves
the quadratic form on Cd :
qA (z) = Az, z, z Cd .
Note that qA (z) is a bihomogeneous polynomial of degree (1, 1) in the variables z,
respectively, z, where the latter denotes complex conjugation entry by entry. Conversely, we have the following lemma.
Lemma 9.2. Every homogeneous polynomial P (z, z) of bidegree (1, 1) which has
real values for z Cd is of the form qA (z) for a unique self-adjoint matrix A.
Proof. Write
P (z, z) =

d


cjk zj z k = Cz, z,

j,k=1

where C is the matrix of its coecients. If P (z, z) R for all z Cd , we infer


Cz, z = z, Cz, z Cd .
But this identity can be polarized, that is,
Cz, w = z, Cw, z, w Cd ,
which implies C = C . This operation also implies the uniqueness of the matrix C.
To explain the polarization operation it is sucient to contemplate the identity
4u, v = u + v, u + v u v, u v + iu + iv, u + iv iu iv, u iv,

where u, v Cd and i = 1.

i
i

410

9.2.2

main
2012/11/1
page 410
i

Chapter 9. Sums of Hermitian Squares: Old and New

The Law of Inertia

The spectral theorem asserts that the quadratic form qA (z) can be written as a
weighted sum of squares of complex linear forms:
qA (z) =

d


j |wj |2 ,

j=1

where
wj =

d


ujk zk , 1 j d,

k=1

is a new orthonormal system of coordinates in Cd and j are the eigenvalues of A.


Now look at the level set
E = {z Cd ; qA (z) = 1}.
In the new system of coordinates E has the equation
1 |w1 |2 + 2 |w2 |2 + + d |wd |2 = 1.
Thus the reciprocals of the eigenvalues, when nonzero, represent the semiaxes of
this real quadratic hypersurface E in R2d = Cd . The reader is invited to question
what happens with E if one eigenvalue is zero.
In short, the quadratic form qA can be written as
qA (z) =

n

j=1

|Pj (z)|2

r


|Pj (z)|2 ,

(9.1)

j=n+1

where Pj (z) are linear, homogeneous polynomials. Is this decomposition unique, or


are at least the number of positive, respectively, negative squares unique? The
answer to these important questions was given a long time ago by Jacobi and
Sylvester. First observe that we should avoid obvious cancellations, such as 0 =
2
2
, or denoting by a single complex variable 0 = | + 1|2 + | 1|2
|P
(z)|2 |P
(z)|
2
| 2| | 2| .
Theorem 9.3. Let qA (z) be a hermitian form on Cd . In any decomposition (9.1)
with linearly independent complex linear forms P1 , . . . , Pr , the number of positive
or negative squares (n, respectively, r n) is independent of the decomposition.
For a proof and two classical methods (going back to Lagrange and Jacobi)
of how to compute eectively the sums of hermitian squares decompositions, see
Chapter X in [15]. To understand the intrinsic character of these numbers, simply
note that r is the rank of the matrix A, while n is the maximal dimension of a
vector subspace V of Cd on which qA (z), z V, z = 0, has only positive values. The
dierence n (r n) or sometimes the pair (n, r n) is called the signature of the
hermitian form qA (z).
The quadratic form qA (z) is called positive semidenite, respectively, positive
denite, if qA (z) 0 for all z, respectively, qA (z) > 0 for z = 0, in other terms the

i
i

9.3. Positive Denite Kernels

main
2012/11/1
page 411
i

411

eigenvalues of the hermitian matrix A are nonnegative, respectively, positive. The


terminology carries over to the matrix A.

9.2.3

Min-max Principle

Let A = A be a hermitian matrix with associated quadratic form qA . Since the


eigenvalues of A are real, we can arrange them in decreasing order:
1 (A) 2 (A) d (A).
The spectral decomposition and the interpretation of these numbers as reciprocals
of the principal axes of the quadric qA (z) = 1 lead to the following important
variational principle, stated as below by Courant and Fischer; see, for instance, [22,
Section 4.2].
Theorem 9.4. The eigenvalues of the hermitian matrix A satisfy
k (A) =

min

dim V =dk+1

max

zV \{0}

qA (z)
, 1 k d.

z
2

For this, and other, reasons, the numbers k (A) are also known as the characteristic values of the form qA (z).

9.2.4

Exercises

Exercise 9.5. A matrix A is called symmetric if it coincides with its transpose:


A = AT . Does the spectral theorem hold true for symmetric matrices over an
arbitrary eld? What about symmetric matrices over a real closed eld?
Exercise 9.6. Let A = A Md (C) be a hermitian matrix and let V Cd be a
vector subspace of dimension d 1. Prove, using the min-max principle, that the
restriction to V of the quadratic form qA has characteristic values interlaced with
those of qA .
Exercise 9.7. Let qA , qB be two hermitian forms. Try to dene the relative
characteristic numbers of A with respect to B via the Rayleigh quotient qA /qB .
Relate these values to the zeros of the determinant of the linear pencil of matrices
A B.

9.3

Positive Denite Kernels

9.3.1

Hilbert Space Factorization

Let X be a set and let K : X X C be a map. We call K a positive semidenite


kernel if, for every nite subset I X, the matrix (K(i, j))i,jI is hermitian and
positive semidenite, or equivalently
K(i, j) = K(j, i)

i
i

412
and

main
2012/11/1
page 412
i

Chapter 9. Sums of Hermitian Squares: Old and New

K(i, j)ci cj 0

i,jI

for all complex numbers ci C, i I. The kernel K is positive denite if the


matrix (K(i, j))i,jI is (strictly) positive denite for all nite subsets I X. The
following result, going back at least one century to Mercer and rediscovered by
Aronsajn, respectively, Kolmogorov, gives a set theoretic analogue of the sums of
squares decomposition of a Hermitian form; see, for instance, [24].
Theorem 9.8. Let K : X X C be a positive semidenite kernel. Then there
exists a complex Hilbert space H and a map F : X H such that
K(x, y) = F (x), F (y), x, y X.
Proof. Although tautological in its nature, the proof of this factorization theorem
is quite important for its wide range of applications. We construct the Hilbert space
as follows: let F (X) denote the set of all nitely supported functions f, g : X C,
and dene the inner product

K(x, y)f (x)g(y).
f, g =
xX

Denote the vectors of zero norm by N = {f F (X); f, f  = 0}. Note that by the
classical CauchySchwarz inequality we infer
|f, g|2 f, f g, g.
Thus N is a vector subspace of F (X) and the quotient F (X)/N carries a nondegenerate inner product induced on equivalence classes by f, g. The Hilbert space
completion H then contains F (X)/N as a dense subspace and the map F : X H
dened by the class of characteristic function F (x)(y) = 0 if x = y and F (x)(x) = 1
induces then the factorization in the statement.
Note that in general the Hilbert space constructed in the proof is nonseparable.
A uniqueness of the factorization can be immediately derived from the same proof.
Corollary 9.9. Assume that the positive semidenite kernel L : X X C
admits two factorizations L(x, y) = F (x), F (y)H = G(x), G(y)K , where H, K
are Hilbert spaces and the maps F : X H, G : X K both have dense
ranges. Then there exists a unitary transformation U : H K with the property
U F = G.

9.3.2

Positivity and Analyticity

In practice the positive denite kernel K satises some smoothness conditions on


an appropriate supporting set X, and consequently the factorization takes place in
a separable Hilbert space. We state only one possible result in this direction.

i
i

9.3. Positive Denite Kernels

main
2012/11/1
page 413
i

413

Proposition 9.10. Let Cd be an open set and let K : C be a


positive semidenite kernel which is analytic in the rst variable and antianalytic in
the second. Then there exists a separable, complex Hilbert space H and an analytic
map F : H, such that
K(z, w) = F (z), F (w),

z, w .

Proof. By its very construction, the factorization proved in Theorem 9.8 has
the property that the scalar function z  F (z), y is analytic for every vector y
belonging to a dense subspace of H. By taking limits of sequences of the form
F (z), yn  we nd that the map z  F (z), u is analytic for every u H. Hence
F (z) is analytic, due to the equivalence between weak and strong analyticity of
Hilbert space valued maps; see, for instance, [30].
To prove that the space H is separable, simply note that the vectors F (),
G, span H as soon as the countable set G is everywhere dense in .
When expanding in a Taylor, or Fourier, series we will encounter later the
natural question of whether the matrix of coecients of a kernel reects its positivity
as a map, as dened at the beginning of this section. For instance, take to be a
polydisk (that is, a product of disks) in Cd centered at z = 0, and assume that the
map K : C is analytic/antianalytic. Then a power series expansion
K(z, w) =

c, z w

(9.2)

,Nd

is convergent in . Under these conditions, we note the following simple but


essential observation:
Proposition 9.11. The kernel (9.2) is positive semidenite in a polydisk of convergence if and only if the innite matrix (c, ),Nd is positive semidenite.
Proof. Remark that, for > 0 suciently small,
,
c, =

,
...

K(z, w)z
|zj |=|wk |=

'
d &
#
dzj dwj
.
2izj 2iwj
j=1

A Riemann sum approximation of the integral proves then that (c, ),Nd is
a positive semidenite discrete kernel. Conversely, assuming that (c, ),Nd is
positive semidefninite, the convergence of the power series expansion implies the
positivity of K.
Exactly as in the case of hermitian forms, an analytic/antianalytic kernel
K(z, w) is determined by its values on the diagonal K(z, z).

i
i

414

9.3.3

main
2012/11/1
page 414
i

Chapter 9. Sums of Hermitian Squares: Old and New

Hadamards Product

Besides the natural operations which preserve positivity of kernels, their pointwise
product stands out:
Theorem 9.12 (Schur). Let Kj : X X C, j = 1, 2, be two positive
semidenite kernels. Then K(x, y) = K1 (x, y)K2 (x, y), x, y X, is also positive
semidenite.
For the proof see [15].
To give a single, illustrative application of Schurs theorem, consider an open
set Cd and a positive denite kernel K : C which is analytic/
antianalytic in the sense discussed above. Assume that there exists a positive constant M such that K(z, z) < M for all z . Then the new kernel
1
, z, w ,
M 2 K(z, w)
has the same properties (i.e., analyticity and positivity). Indeed, by virtue of the
CauchySchwarz inequality, |K(z, w)| < M for all z, w . Then Neumann series
decomposition and Schurs theorem lead to the desired conclusion:
'
&
1
K(z, w) K(z, w)2
2
=M
+
+ ... .
1+
M 2 K(z, w)
M2
M4

9.3.4

Bergmans Kernel

A classical construction in the geometry of complex varieties relies on a positive


denite kernel for constructing invariants to biholomorphisms. We briey recall the
construction in the particular case of a bounded domain of Cd . Let A2 () denote
the Bergman space of all analytic functions f : C which are square summable
with respect to the Lebesgue volume measure d2d :
,
|f (z)|2 d2d (z) < .

f
22, =

It is easy to see that A () is complete with respect to this norm and that the
evaluation functional f  f (a) is continuous for every a (for the proof use
the mean value theorem on a polydisk centered at z = a and fully contained in ).
Thus, according to Rieszs representation theorem (see [30]), there exists a unique
element ka A2 () which represents this functional:
f (a) = f, ka ,

f A2 ().

The positive denite kernel K (z, w) = kw , kz , also known as the Bergman kernel
of the domain , consequently satises the reproducing property
,
K (z, w)f (w)d2d (w), z , f A2 ().
f (z) =

Moreover, this property characterizes K .

i
i

9.3. Positive Denite Kernels

main
2012/11/1
page 415
i

415

Assume that : 1 2 is a biholomorphic map between bounded domains


of Cd . Then the change of variables in the above identity and the uniqueness of the
reproducing kernel yield
K2 ((z), (w))

(z)
(w) = K1 (z, w),
z
w

where
z (z) denotes the complex Jacobian. Since K1 (z, z) > 0 for all z 1 we
infer that the dierential form
d

log K (z, z)
dzj dz k
zj z k

j,k=1

is invariant under biholomorphic mappings. See [33, 35] for details.


To put this invariant to work, let us consider the unit ball B and the unit
polydisk in Cd . A power series argument leads to the closed forms
KB (z, w) =

1
1
, K (z, w) =
,
|B|(1 z, w)d+1
||(1 z1 w 1 )2 . . . (1 zd wd )2

where |A| denotes the volume of the set A. One can prove via these invariants that
the ball and the polydisk are not biholomorphically equivalent as soon as d 2;
see [33].

9.3.5

Exercises

Exercise 9.13. Let (X, ) be a compact space endowed with a Borel probability
measure, and let
,
K(x, y)f (y)d(y),
TK : L2 (X, ) L2 (X, ), (TK f )(x) =
X

be a linear bounded integral operator with kernel K : X X C. Relate the


positive deniteness of the kernel K to the positivity of TK :
TK f, f 2, 0, f L2 (X, ).
Exercise 9.14. Let Cd be an open set, and let H : C be an
analytic/antianalytic function. Prove that H(z, z) = 0, z implies H = 0.
Exercise 9.15. Denote by B Cd the unit ball. For which values of the parameter
R is the kernel (1 z, w) positive denite in B?
Exercise 9.16. Let Cd be a bounded domain, and let (hn )
n=0 denote an
orthonormal basis of Bergmans space A2 (). Prove that
K (z, w) =

hn (z)hn (w),

n=0

where the series converges uniformly on compact subsets of .

i
i

416

9.4

main
2012/11/1
page 416
i

Chapter 9. Sums of Hermitian Squares: Old and New

Origins of Hermitian Forms

In an inspired and undeservedly forgotten work of his early career, Hermite has
developed an algebraic method for counting the number of solutions of systems of
polynomial equations which are contained in a prescribed basic semialgebraic subset
of Rn or Cn . He was aiming at bypassing, via purely algebraic methods, Cauchys
residue integral method for counting roots of complex polynomials, analogous with
the widely circulated (at that time) algebraic algorithm developed by Sturm for
counting real zeros of polynomials. For this very reason Hermite introduced and
studied what we call today hermitian forms. For complete mathematical details
and ample historical comments see the (also forgotten) little book by Krein and
Naimark [27].
We illustrate below Hermites ideas in a couple of typical examples. For simplicity we expose Hermites idea in two variables, the transition to a larger number
of variables being straightforward. Suppose that two polynomials P1 , P2 R[x, y] of
degrees n1 (respectively, n2 ) possess exactly n = n1 n2 common roots V (P1 , P2 ) =
{(aj , bj ), 1 j n}, complex or real. Fix rational real functions , 1 , . . . , n so
that does not vanish on V (P1 , P2 ),
det((j (ak , bk ))nj,k=1 ) = 0,
and consider the hermitian form on Cn :
n

H(z, z) =
(aj , bj )|z1 1 (aj , bj ) + z2 2 (aj , bj ) + + zn n (aj , bj )|2 .
j=1

Since the sum is symmetric in the variables (aj , bj ) the hermitian form H depends
only on the coecients of the polynomials P1 , P2 . Denote the number of roots in
dierent sectors as follows:
Nc (P1 , P2 ) = #(V (P1 , P2 ) \ R2 ),
N+ (P1 , P2 ) = #(V (P1 , P2 ) {(x, y) R2 ; (x, y) > 0}),
N (P1 , P2 ) = #(V (P1 , P2 ) {(x, y) R2 ; (x, y) < 0}).
By the inertia theorem we infer, following Hermite, that
n (H) = Nc (P1 , P2 ) + N (P1 , P2 ), n+ (H) = Nc (P1 , P2 ) + N+ (P1 , P2 ),
where (n (H), n+ (H)) is the signature of the form H. Although it is dicult in
general to eliminate the variables (aj , bj ) in the form H, counting the number of
real common zeros of some given polynomials contained in a rectangle leads to an
elegant closed form, as pointed out by Hermite; see [27] for details.
We specialize the above ideas to polynomials of a single complex variable.
For p C[] denote p () = p(), that is, the polynomial obtained from p by
conjugating the coecients. Assume n = deg p and dene the complex polynomial
in two variables:
n

p(u)p (v) p (u)p(v)
ckl uk1 v l1 .
=
i
uv
k,l=1

i
i

9.4. Origins of Hermitian Forms

main
2012/11/1
page 417
i

417

By denition, the coecients satisfy the reality condition ckl = clk , 1 k, l n.


Let z = (z1 , . . . , zn ) Cn and dene the hermitian form
Hp (z, z) =

n


ckl zk zl .

(9.3)

k,l=1

Theorem 9.17 (Hermite). Let Hp be the hermitian form (9.3) associated with
a polynomial p C[] of degree n. Denote by n (Hp ) the number of negative,
respectively, positive, squares in the decomposition of Hp . Then the polynomial p
has n+ (Hp ) roots in the upper half-plane , > 0, n (Hp ) roots in the lower halfplane, and n n (Hp ) n+ (Hp ) common roots between p and p , that is, real roots
or complex conjugated roots.
In particular we derive from here a stability criterion widely used in mechanics
and engineering (compare with the similar criteria due to Routh and Hurwitz [15]).
Corollary 9.18. Assume that the hermitian form Hp is positive denite. Then the
polynomial p has all roots contained in the upper half-plane.
The proof of Hermites theorem relies on a product formula (well known today
in the context of Bezoutian computations). The key identity is, assuming p = p1 p2 :
p(u)p (v) p (u)p(v)
=
uv
p1 (u)p1 (v) p1 (u)p1 (v)
p2 (u)p2 (v) p2 (u)p2 (v)
+ p1 (u)p1 (v)
.
uv
uv
A similar product rule is inherited by the form Hp , allowing us to use the inertia
theorem and induction on the degree in order to prove Hermites theorem.
p2 (u)p2 (v)

9.4.1

Root Separation in the Unit Disk

The specic denominator uv and form of the conjugate p in the Hermite theorem
are related to the Schwarz reection with respect to the boundary of the domain
of root separation. In the case of the upper half-plane the reection is  .
1
When repeating the procedure for the unit disk, with Schwarz reection 
one arrives at a similar conclusion. The computations were detailed by Schur (and
independently by several other authors); see [27]. Specically, let p C[] be a polynomial of degree n and dene p () = n p ( 1 ) as the polynomial with conjugated
coecients, arranged in reversed order. Consider the bivariate polynomial
n

p (u)p (v) p(u)p (v)
=
akl uk1 v l1 .
1 uv
k,l=1

Let Sp be the hermitian form with coecients (akl ). In complete analogy with
Hermites theorem we state the following well-known result.

i
i

418

main
2012/11/1
page 418
i

Chapter 9. Sums of Hermitian Squares: Old and New

Theorem 9.19 (Schur). Let p C[] be a complex polynomial of degree n with


associated form Sp and signature n (Sp ). Then p has n+ (Sp ) roots in the open unit
disk, n (Sp ) roots in the complement of the closed unit disk, and nn+ (Sp )n (Sp )
roots of modulus one, or conjugated with respect to the unit circle.
Again, a criterion for all roots to be in the unit disk is that the form Sp is
positive denite. To complete the picture we remark that in the case
p() =

n
#

( aj ), p () =

j=1

n
#

(1 aj ),

j=1

we obtain a nite Blaschke product as quotient,


n
#
p
aj
m() =  () =
,
p
1
aj
j=1

(9.4)

and it is a rational n-fold covering of the disk onto the disk and its boundary onto
the boundary.

9.4.2

Eigenvalue Separation

A too well charted and traveled area of control theory deals with stability criteria
for linear systems of dierential equations. In its turn, via a Laplace transform, this
heavily relies on root separation criteria as presented above. We discuss below an
instance of Hermite theory as transgressed and distilled by engineers.
Let A, B be complex n n matrices, with A = A self-adjoint. We consider
the (spectrahedral) region in the complex plane
G = {z C; A + zB + (zB) 0}.
We assume that G is nonempty and does not coincide with the full complex plane.
Theorem 9.20. An n n matrix M has all its eigenvalues in the region G if and
only if there exists a positive denite matrix X such that
A M + B (XM ) + (XM ) B 0.
The most important examples are given by the following choices: the halfplane n = 1, A$= 0, B%= 1 and the disk centered at zero, of radius r, corresponding
0
01
to n = 2, A = r
0 r , B = ( 0 0 ) .
For the proof of the theorem and more details see [7].

9.4.3

Exercises

Exercise 9.21. Let p() = ( )( ) be a monic polynomial of degree 2.


Compute the associated forms Hp , Sp and verify Hermite and Schur theorems, respectively.
Exercise 9.22. Let p R[] and x an angle (0, ). Prove that the polynomial
p has all roots in the wedge < arg < if and only if the matrix of coecients

i
i

9.5. Schurs Algorithm

main
2012/11/1
page 419
i

419

of the polynomial
i

f (ei u)f (ei v) f (ei v)f (ei u)


uv

is positive denite.
Exercise 9.23. Find a hermitian form whose positivity certies that a polynomial
has all roots contained in a given ellipse.
Exercise 9.24. Prove the eigenvalue separation theorem in the case of a disk or a
half-plane.

9.5

Schurs Algorithm

Returning to Schurs theorem discussed in the last section, notice that the Blaschke
product (9.4) produces a positive semidenite kernel
1 m(u)m(v)
.
1 uv

K(u, v) =

Indeed, we have already seen that the kernel


p (u)p (v)K(u, v) =

p (u)p (v) p(u)p (v)


1 uv

is positive semidenite (as the polynomial p has all its roots contained in the unit
disk), and in addition the function p does not vanish in the disk.
It was Schur who recognized in the above positivity a characterization of all
power series
f (z) = a0 + a1 z + a2 z 2 +

(9.5)

which map the disk into the disk. By dierent means, the same question was
studied by Caratheodory, Fejer, and Toeplitz; again see [27] for more details. We
focus below on Schurs approach, as it leads to a basic algorithmic way of verifying
when f (z) maps the disk into the disk. We call, in short, f a contractive analytic
function in the disk.
Assume that the analytic function (9.5) satises |f (z)| 1 whenever |z| < 1.
In particular |a0 | = |f (0)| 1. If |a0 | = 1, then the function f (z) = a0 is a constant
obius transform applied
by the maximum principle. Assume that |a0 | < 1. Then a M
to f yields a new function from the disk to its closure, which in addition vanishes
at z = 0, whence
zf1 (z) =

f (z) 0
,
1 0 f (z)

where 0 = f (0) by denition. By virtue of Schwarz lemma, the factor f1 (z) satises
|f1 (z)| 1 for all |z| < 1. By inverting the transform we nd
f (z) =

zf1 (z) + 0
.
1 + 0 zf1 (z)

i
i

420

main
2012/11/1
page 420
i

Chapter 9. Sums of Hermitian Squares: Old and New

Let 1 = f1 (0) and continue by induction. If |1 | = 1, stop: f1 (z) = 1 is a


constant. If |1 | < 1, continue and dene successively
fk1 (z) =

zfk (z) + k1
.
1 + k1 zfk (z)

(9.6)

In this way we have associated with the nite section of the sequence of coefcients of f (z) another sequence of the same length, called the Schur parameters:
(a0 , . . . , an )  (0 , 1 , . . . , n ).
The transformation is real analytic, as one can easily prove by induction. The main
result is:
Theorem 9.25 (Schur). Let n be a positive integer and let a0 , a1 , . . . , an be
complex numbers. There exists a power series
f (z) = a0 + a1 z + a2 z 2 + + an z n + O(z n+1 )
mapping the open disk into the closed disk if and only if the Schur parameters
0 , 1 , . . . , n are of modulus less than or equal to one. If k is the rst index with
|k | = 1, then there exists only one continuation of a0 + a1 z + + ak z k into such
a function f , and this is a Blaschke product of degree k.
Moreover, the recursion formula (9.6) labels all possible extensions of the
polynomial a0 +a1 z +a2 z 2 + +an z n to a contractive function as in the statement.
The recursion formula and the representation of the function f (z) by a chain of
simple multiplication and division operations is a perfect analogue of the continued
fraction algorithm in number theory.
One step further, Schur made the connection with the counting zeros form,
by proving that a0 + a1 z + a2 z 2 + + an z n can be continued to a function f (z)
which maps the disk into its closure if and only if the Toeplitz matrix

a0 a1 . . .
an
0 a0 . . . an1

T = .
..
..
..
.
.
0 0 ...
a0
is contractive. In order to prove this fact we start with the observation that Schurs
algorithm as presented above implies that every analytic function f (z) mapping the
disk into the disk can be uniformly approximated on compact subsets of the open
disk by Blaschke products. Consequently, the kernel
1 f (z)f (w)
,
1 zw

|z|, |w| < 1,

is positive semidenite. We can dilate the argument of the function to f (rz), r < 1,
and assume that f is analytic in a neighborhood of the closed unit disk.

i
i

9.6. RieszHerglotz Theorem

main
2012/11/1
page 421
i

421

Let p(z) = p0 + p1 z + + pn z n be a polynomial of degree less than or equal


to n. Computing by Cauchys formula,
,
,
1 f (z)f (w)
dz dw
0
p(z)p(w)
1 zw
2iz 2iw
|z|=1 |w|=1
,
dz
=
|p(z)|2
2iz
|z|=1
,
,

(a0 + a1 z + + an z n )(p0 + p1 z 1 + + pn z n )
|z|=1

|w|=1

(a0 + a1 w1 + + an wn )(p0 + p1 w + + pn wn )(1 + zw + + z n w n )

dz dw
2iz 2iw

=
v
2
T v
2 ,
where v = (p0 , p1 , . . . , pn ) Cn and T is the above Toeplitz matrix.
The reader can consult the monograph [13] for further details and many unexpected applications of the Schur parameters.

9.5.1

Exercises

Exercise 9.26. Prove that the only power series (9.5) associated with an extremal
Schur parameter |n | = 1 is a Blaschke product of degree n.
Exercise 9.27. Find all continuations to a contractive power series of a degree 2
polynomial a0 + a1 z + a2 z 2 . Describe explicitly the conditions on the coecients
a0 , a1 , a2 that such a continuation exists, and if so, that it is unique.
Exercise 9.28. Let f (z) be an analytic function mapping the disk into the disk.
Prove that f can be approximated uniformly on compact subsets of the open disk
by Blaschke products.

9.6

RieszHerglotz Theorem

The structure of contractive analytic functions in the disk revealed in the previous section can be related by a linear fractional transform to that of nonnegative
harmonic functions in the disk. We briey describe this new point of view.
Let h(z), |h(z)| 1, be an analytic function in the disk |z| < 1. Leaving the
case of a constant function aside, we can assume that |h(z)| < 1 in the disk, and
1+h(z)
dene the function f (z) = 1h(z)
, so that -f (z) 0 for all |z| < 1. Let fr (z) =
f (rz), 0 < r < 1, so that the functions fr are dened in a neighborhood of the closed
disk and limr1 fr = f uniformly on compact subsets of D = {z C; |z| < 1}.
A direct application of Cauchys formula yields
, i
e + w -fr (ei )d
+ i,f (0).
fr (w) =
i
2
e w

i
i

422

main
2012/11/1
page 422
i

Chapter 9. Sums of Hermitian Squares: Old and New

)d
are nonnegative and of uniform mass
Remark that the measures r = fr (e
2
equal to -f (0); hence they form a compact set in the weak- topology of measures
on the unit torus. By passing to a limit point we obtain a positive measure with
the property
, i
e +w
d() + i,f (0).
(9.7)
f (w) =
i
e w
i

Since the trigonometric polynomials are dense in the space of continuous functions
on the torus, we infer that the measure is unique with the above property.
Formula (9.7) is known as the RieszHerglotz representation of the nonnegative
harmonic functions in the disk. Since D is simply connected, for any harmonic
function u : D R there exists an analytic function f : D C such that u = -f .
Putting together these observations we have proved the equivalence between the rst
two statements in the next theorem.
Theorem 9.29. Let f : D C be an analytic function. The following assertions
are equivalent:
(a) -f 0;
(b) there exists a positive measure on D, such that (9.7) holds;
(c) the kernel

f (z)+f (w)
1zw

is positive semidenite on D D.

Proof. (a) (b) was proved before. If (b) holds true, then
,
d()
f (z) + f (w)
=2
,
i z)(ei w)
1 zw
(e

whence (c) is true. Finally, (c) (a) because a positive semidenite kernel has
nonnegative values on the diagonal.
The above positivity result has a classical counterpart in the case f is a nonnegative polynomial on the boundary of the disk. Specically, we have the following
RieszFejer theorem.
Theorem 9.30. Let p(z, z) be a polynomial with complex coecients which is
nonnegative on the unit torus T. Then there exists a polynomial q(z) C[z] with
the property
p(z, z) = |q(z)|2 ,

z T.

Proof. For z T write z = ei and decompose


p(ei , ei ) =

d


cj eij .

d
i

The assumption p(e , e ) 0 for all [0, 2] implies cj = cj , 0 j d.


d
Consider the Laurent series P (z) = d cj z j and note that P (z) = P (1/z), rst
for z T and then for all z C, by analytic continuation. Thus, the zeros and
i

i
i

9.6. RieszHerglotz Theorem

main
2012/11/1
page 423
i

423

poles of the rational function P are symmetric with respect to the torus, whence
&
'
#
#
1
d

2
z P (z) = cz
(z j )
(z k ) z
,
k
j
k

where c = 0 is a constant, |j | = 1, and 0 < |k | < 1. By returning to the


parametrization of the torus z = ei we infer
p(ei , ei ) = |p(ei , ei )| = |eid P (ei )|2 = |c|

|ei j |2

# |ei k |2

9.6.1

|k |2

Bounded Analytic Optimization

To remain in the spirit of this volume, and returning to Schurs theorem and the
RieszHerglotz integral representation, we are in the position of stating the following
direct optimization corollary of our computations.
Proposition 9.31. Let h(z) be a bounded analytic function in the disk. Then
M =
h
,D = sup |h(z)|
zD

is the smallest nonnegative number M with the property that the kernel
M 2 h(z)h(w)
1 zw
is positive semidenite.
For the proof we simply substitute f (z) =
that
f (z) + f (w) = 2

9.6.2

M+h(z)
Mh(z)

in Theorem 9.29 and remark

M 2 h(z)h(w)
(M h(z))(M h(w))

Hilbert Space Realizations

Theorem 9.29 has a fourth equivalent statement which brings into focus very naturally Hilbert space representations of all bounded analytic functions in the disk. We
start with the positive kernel appearing in Proposition 9.31. According to Proposition 9.10 there exists a Hilbert space H and an analytic function F : D H,
such that
M 2 h(z)h(w)
= F (z), F (w), z, w D,
1 zw
or equivalently
M 2 + zF (z), wF (w) = h(z)h(w) + F (z), F (w).

i
i

424

main
2012/11/1
page 424
i

Chapter 9. Sums of Hermitian Squares: Old and New

Passing to the Hilbert space C H, the latter identity becomes


H&
' &
'I H&
' &
'I
M
M
h(z)
h(w)
,
=
,
.
zF (z)
wF (w)
F (z)
F (w)
In other terms we have two Hilbert space factorizations of the same positive
denite kernel. According to the uniqueness stated in $Corollary
% 9.9, there exists a
unitary map V between the linear span of the vectors zFM(z) into the linear span
%
$ M % $ h(z) %
$
of the vectors Fh(z)
(z) such that V zF (z) = F (z) . Extend V to the whole space
C H to a linear contractive map
&
'
d , b
V =
,
c
A
where d C, b, c H, A : H H. In particular |d| 1,
A
1. Therefore
dM + zF (z), b = h(z), M c + zAF (z) = F (z)
for all z D. By eliminating F (z) we obtain
h(z) = M [d + z(I zA)1 c, b], z D.
We have thus proved half of the following Hilbert space realization theorem,
well known for its wide applications in control theory.
Theorem 9.32. An analytic function h maps the disk into the disk if and only if
it can be written as h(z) = d + z(I zA)1 c, b, where
&

d , b
c
A

'

C
C
:
H
H

is a contractive linear operator and H is an auxiliary Hilbert space.


Proof. In order to prove the suciency of the above representation, dene F (z) =
(I zA)1 c, and remark that, due to the contractivity of the 2 2 block matrix
J&
'J
'J J&
J J h(z) J
J
1
J
JJ
J
J zF (z) J J F (z) J , z D.
Thus
1 + |z|2
F (z)
2 |h(z)|2 +
F (z)
2 |h(z)|2 + |z|2
F (z)
2 ,
which implies |h(z)| 1 for all z D.
Remember that originally V was a unitary map between two subspaces of
C H and V was a linear contractive extension of it. With a little extra care
one can adapt the construction of the realization so that V is unitary. Again, the
monograph [13] is an invaluable source of information on these topics.

i
i

9.7. von Neumanns Inequality

9.6.3

main
2012/11/1
page 425
i

425

Exercises

Exercise 9.33. Assume that the analytic function f maps the disk into the right
(w)
half-plane. Under which conditions has the kernel f (z)+f
nite rank?
1zw
Exercise 9.34. The set H of all functions f satisfying the conditions in Theorem
9.29 is a closed convex cone, as a subset of O(D), the Frechet space of all analytic
functions in the disk. Find the extremal rays of H.
Exercise 9.35. Let be a simply connected domain and let : D be a
conformal mapping. Let f O() be bounded. Find
f
, via a positive denite
optimization of a hermitian form involving f and .
Exercise 9.36. Derive a Hilbert space realization of all analytic functions mapping the disk into the right half-plane.
Exercise 9.37. [17] Let A Md (C) be a matrix with cyclic vector and minimal
monic polynomial Pd (z). Prove that there are polynomials Pk (z), of exact degree
deg Pk = k, 0 k < d, such that
|P (z)|2
(A z)1
2 = |Pd1 (z)|2 + |Pd2 (z)|2 + + |P0 (z)|2 .
Conversely, every sum of hermitian squares of polynomials in exact decreasing order
comes as above from a cyclic matrix.

9.7

von Neumanns Inequality

We are ready at this point to prove a new inequality involving functions and operators.
Theorem 9.38 (von Neumann). Let T : H H be a linear contractive
(
T
1) operator acting on a complex Hilbert space H and let f be an analytic
function dened in a neighborhood of the closed unit disk. Then

f (T )

f
,D .

(9.8)

By f (T ) we mean the analytic functional calculus obtained by replacing the


variables in the power series expansion of f by the operator T .
Proof. The statement is equivalent to
(for all z D, -f (z) 0) -f (T ) 0.
Indeed, if a linear operator S satises
S
< 1, then -[(I + S)(I S)1 ] > 0, and
vice versa, due to the identity
(I + S)(I S)1 + (I S )1 (I + S ) = 2(I S )1 (I S S)(I S)1 .

i
i

426

main
2012/11/1
page 426
i

Chapter 9. Sums of Hermitian Squares: Old and New

Let f (z) be an analytic function with positive real part dened in a neighborhood of the closed unit disk. By the RieszHerglotz formula
,
1
1
(1 wz) i
d(),
f (z) + f (w) = 2
i
w)
(e z)
(e
where is a positive measure on the unit circle.
Expand everything into a series and replace z in the above identity by T and
w by T , assuring that in the mixed terms contain T to the left of T . The result is
,
(ei T )1 (1 T T )(ei T )1 d() 0,
-f (T ) =

and the proof is complete.


Originally, von Neumann proved the above inequality using Schurs algorithm
and a rational approximation. See [2] for further details and generalizations.

9.7.1

The Spectral Theorem

Among the many applications of von Neumanns inequality we sketch below the
construction of the spectral measure of a unitary operator. The reader will easily
adapt afterward the proof to the case of bounded self-adjoint operators.
Let H be a complex Hilbert space and let U : H H be a unitary operator,
that is U U = U U = I. Then zI U is invertible for all z C, |z| = 1. In other
terms, the spectrum of U is contained in the unit circle D.
Let p(z, z) be a polynomial satisfying p(z, z) 0 whenever |z| = 1. By means
of the identity zz = 1 along D we can replace all mixed terms z m z n by a linear
combination of pure terms z k or z , modulo 1 |z|2 . Consequently we can write
p(z, z) = p1 (z) + p1 (z) + (1 |z|2 )p2 (z, z),
where p1 (z) is a polynomial which depends only on z. According to the von Neumann inequality we obtain
p(U, U ) = 2-p1 (U ) 0,
since -p1 (z) 0 on the circle.
Thus, the polynomial functional calculus : C[z, z] L(H), dened by
(p(z, z)) = p(U, U ) is linear, multiplicative, unital, and positive:
(for all z D, p(z, z) 0) p(U, U ) 0,
or equivalently,

p(U, U )

p
,D , C[z, z].
Since every continuous function on the circle is a uniform limit of trigonometric
polynomials, we can extend by continuity to a continuous algebra homomorphism
: C(D) L(H), (z) = U,

i
i

9.8. Bounded Analytic Interpolation

main
2012/11/1
page 427
i

427

which is in addition compatible with the involutions


(f ) = (f ) , f C(D),
and the order structures
f 0 (f ) 0.
The positivity of the functional calculus allows a further extension dened on
all bounded Borel functions on the circle, simply repeating the construction of the
Lebesgue integral in this setting. The key observation being that limn (fn )x
exists for all x H whenever fn is a monotonic and uniformly bounded sequence
of measurable functions; see, for instance, [30].
In conclusion we obtain the full spectral theorem for unitary operators.
Theorem 9.39. Let U L(H) be a unitary operator and denote by B(D) the
space of all bounded Borel measurable functions on the unit circle. There exists a
positive, unital algebra homomorphism : B(D) L(H) with the properties
(a) (z) = U ;
(b)
(f )
sup|z|=1 |f (z)|;
(c) for all x H and every monotonic, pointwise convergent sequence fn f
in B(D), (f )x = limn (fn )x.

9.7.2

Exercises

Exercise 9.40. Let Jn be the (nilpotent) Jordan block of size n n. Prove that

Jn
= 1 and translate the matrix inequality
p(Jn )

p
,D into numerical
inequalities referring to an arbitrary polynomial p(z).
Exercise 9.41. Let Uk , 1 k n be a nite system of commuting unitary
matrices of size d d. Prove that there exists a unitary matrix U and polynomials
pk (z) such that Uk = pk (U ) for all k, 1 k n.
Exercise 9.42. Let U L(H) be a unitary operator. Prove that the bicommutant
(U  ) of U is equal to the range of the Borel functional calculus described in Theorem
9.39. The commutant of a set of operators S L(H) is S  = {T L(H); T X =
XT, X S}.

9.8

Bounded Analytic Interpolation

One of the classical applications of the realization Theorem 9.32 has to do with the
bounded analytic interpolation of discrete data in the disk. Contrary to the free
polynomial interpolation, the data are in this case bound by a series of positivity
conditions. The precise statement follows.
Theorem 9.43 (NevanlinnaPick). Let {ai }, {ci }, i I, be subsets of D, so
that ai does not have accumulation points, but the index set may be innite. There

i
i

428

main
2012/11/1
page 428
i

Chapter 9. Sums of Hermitian Squares: Old and New

exists an analytic function f : D D interpolating the data


f (ai ) = ci ,
if and only if the kernel

i I,

1 ci cj
, i, j I,
1 ai aj

is positive semidenite.
Proof. One implication follows from Theorem 9.29. In order to prove the converse,
assume that the kernel in the statement is positive semidenite. Then there exists
a Hilbert space and a function h : I H with the property
1 ci cj
= h(i), h(j), i, j I.
1 ai aj
Starting from here we argue as in the proof of Theorem 9.32, namely,
1 + ai h(i), aj h(j) = ci cj + h(i), h(j)
implies the existence of a contractive block matrix operator on C H satisfying
&
' &
' &
'
d , b
1
ci
:
=
.
c
A
ai h(i)
h(i)
From here we infer by eliminating h(i):
ci = d + (I ai A)1 c, b.
Hence the contractive analytic function
f (z) = d + (I zA)1 c, b
interpolates the given data.
A similar result is known for (higher multiplicity) Hermite interpolation, that
is, by prescribing the values of nitely many derivatives of f at every point:
(k)

f (k) (ai ) = ci , 0 k K(i), i I.


The reader can try to nd and prove without too many additional complications
the right statement.
Full details and complements on NevanlinnaPick interpolation can be found
in [2, 13].

9.8.1

Exercises

Exercise 9.44. Prove and state a NevanlinnaPick theorem for matrix-valued


functions.

i
i

9.9. Perturbations of Self-Adjoint Matrices

main
2012/11/1
page 429
i

429

Exercise 9.45. Let a1 = 0, a2 , a3 be three distinct points in the unit disk and
choose c1 = 0, c2 , c3 also in the disk. Write the 3 3 conditions that there exists a
contractive analytic function in the disk interpolating these data. Find when this
function is unique.
Exercise 9.46. Prove that in the case of nitely many data (the set I in the
statement is nite), there always exists a rational contractive interpolant. Estimate
its degree.

9.9

Perturbations of Self-Adjoint Matrices

RieszHerglotzs theory on the unit circle has an obvious parallel on the line. A few
details are worth a closer look, as they provide the background of perturbation
theory of self-adjoint operators. We avoid below the complications related to unbounded symmetric operators or even general Hilbert space theory, focusing only
on nite-dimensional computations. The reader can greatly benet by lling these
gaps by reading the relevant sections contained in the monograph by Gohberg and
Krein [16].
Start with a self-adjoint matrix A = A L(Cd ). We can arrange the eigenvalues in nondecreasing order:
1 (A) 2 (A) d (A).
Consider a rank 1 self-adjoint operator ,  acting on Cd , where Cd is a vector.
An immediate corollary of the min-max principle (see Exercises 9.2.4, exercise 2)
shows that the perturbed matrix B = A + ,  has eigenvalues interlaced to
those of A:
1 (A) 1 (B) 2 (A) d (A) d (B).
d
Let = j=1 [j (A),j (B)] be the characteristic function of the union of spectral
displacement intervals between the two sets of eigenvalues. Then, for every z
/R
we obtain
d
#
j (B) z
det[(B zI)(A zI)1 ] =
(A) z
j=1 j

,
d

(t)dt
j (B) z
.
log
= exp
= exp
j (A) z
R tz
j=1
On the other hand, by simply expanding the vector in the orthonormal basis
which diagonalizes A we nd
det[(B zI)(A zI)1 ] = det[I + (A zI)1 , ]
= 1 + (A zI)1 ,  =

d

j=1

cj
,
j (A) z

where cj 0 for all j, 1 j d.

i
i

430

main
2012/11/1
page 430
i

Chapter 9. Sums of Hermitian Squares: Old and New

One step further, we can put together the above observations in the form of
equivalent representations of the same object.
Proposition 9.47. The following classes are equivalent:
(a) rational functions R(z) C(z) satisfying R() = 1 and
0 < ,R(z),(z) < , z
/ R;
(b) nite atomic positive measures on the real line;
(c) characteristic functions (t) of bounded semialgebraic subsets of the real
line;

(d) unitary equivalence classes of pairs (A, ), where A = A L(Cd ) and


is a cyclic vector for A.
The equivalence is given by the formulas
,
,
d(t)
(t)dt
= exp
R(z) = 1 +
t

z
R
R tz
= det[(A + ,  zI)(A zI)1 ] = 1 + (A zI)1 , .

Proof. A bounded semialgebraic subset of the real line is simply a nite union of
intervals. We prefer this fancy terminology due to higher-dimensional analogues;
see [17]. Since in all formulas the operator A or its powers appear against the
vector , it is natural to assume that this vector is cyclic with respect to A; that is,
, A, . . . , Ad1 is a linear basis of Cd .
To see that (a) (b) we remark that both zeros and poles of R must be
real, interlaced (by the argument principle), and that the residues at every pole
are positive. Then (b) (d) by considering the multiplier A = Mt on the space
L2 (), and (d) (c), (d) (b) by the computations preceding the statement.
The implication (c) (b) follows by direct integration and exponentiation. Finally
(d) (a) is a straightforward computation
(A zI)1 ,  (A zI)1 , 
= (A zI)1 (A zI)1 ,  > 0
zz
for all z
/ R.
To be in line with the theme of this chapter, we can add to the above equivalences the positivity (as a kernel) condition
E
D
R(z)R(z)
> 0, z
/ R.
(e) R is rational, R() = 1, and
zz
In this way the relation to the positivity theory in the disk exposed in the
previous sections becomes more transparent.
The function AB = appearing in the statement is known as the phase
shift or the spectral shift of the perturbation A B = A + , . The name is
justied by the following remarkable trace formula:
,
Tr(f (B) f (A)) =
f  AB dt, f C[z].
R

i
i

9.10. Positive Forms in Several Complex Variables

main
2012/11/1
page 431
i

431

Indeed,
Tr(f (B) f (A)) =

d


(f (j (B)) f (j (A))) =

j=1

d ,

j=1

j (B)

f  (t)dt.

j (A)

By dening step by step via rank 1 additive perturbations to the spectral shift
of a pair of self-adjoint matrices according to the rule
AB + BC = AC ,
we are led to the crucial observation
,
|AB |dt Tr|A B|.
R

This enables us to take limits and obtain the following well-known theorem.
Theorem 9.48 (LifshitzKrein). Let A, B L(H) be bounded self-adjoint
operators acting on a complex Hilbert space H and assume that A B is traceclass. Then there exists a function L1 (R, dt) with compact support, such that
,
Tr(f (B) f (A)) =
f  dt
R

for every function f

C01 (R),

and
,
||dt Tr|A B|.
R

The reader can consult for details the original article [25] and the monograph [23].

9.9.1

Exercises

Exercise 9.49. Proposition 9.47 has a word-by-word counterpart for analytic


functions mapping the upper half-plane into itself, with an arbitrary positive
measure, compactly supported on the line and the function L1 (R, dt) also of
compact support. The analytically inclined reader will nd pleasure in proving the
complete equivalence between the four corresponding statements. Details can be
found in [16].

9.10

Positive Forms in Several Complex Variables

Let z Cd denote the d-tuple of complex variables. We focus below on hermitian


bihomogeneous forms

f (z, z) =
c z z ,
||=||=n

i
i

432

main
2012/11/1
page 432
i

Chapter 9. Sums of Hermitian Squares: Old and New

where the standard multiindex notation is used: z = z11 zdd . Note that the
matrix of coecients (c ) is unambiguously determined by f . A diagonalization
of this matrix yields a decomposition
f (z, z) =
F1 (z)
2
F2 (z)
2 ,
where Fj : C d Cnj are homogeneous (of degree n), vector-valued polynomial
functions.
It is important to remark from the very beginning that, even if f (z, z) > 0 for
all z = 0, the form f may not be a sum of hermitian squares. The following simple
example in two variables (z, w) singles out where the obstruction lies:
f0 = |z|4 + |w|4 c|zw|2
is everywhere positive on C2 \ {0} as soon as c < 2. Now dene
fN = (|z|2 + |w|2 )N (|z|4 + |w|4 c|zw|2 ).
The matrix associated with fN is diagonal with entries containing binomial coecients of the form
&
' &
'
&
'
N
N
N
+
c
.
p
p+2
p+1
After elementary calculations, the condition that all these coecients are positive is
N +1>

2c
,
2c

showing that N has to increase to innity to ensure that fN is a sum of squares,


assuming that c tends to the value 2.
In analogy with Artins positive solution to Hilberts 17th problem (see [4, 28]),
the following result casts well the same phenomenon in the complex domain.
Theorem 9.50 (Quillen). Let f (z, z) be a bihomogeneous form on Cd . If f (z, z) >
0 for all z = 0, then there exists a positive integer N , such that
z
2N f (z, z) is a
sum of hermitian squares.
In complex dimension one we know a much simpler factorization, oered by
RieszFejer Theorem 9.30. We will discuss three proofs of Quillens theorem. The
third one, which is purely algebraic, is the most accessible for the nonanalyst, but
the other two have intrinsic value, as we shall see. For more details and examples
the reader can consult the works of dAngelo and his collaborators [5, 6, 9, 10, 11].
Proof 1. Quillens original proof [31] is based on computations in the Bargmann
Fock space (a natural environment for quantum physicists). Specically, we endow
2
Cd with the Gaussian measure dG = n ez d2d (z) and dene H as the Hilbert
space completion in L2 (Cd , dG) of the complex polynomials. One computes without
diculty
z , z H = , !, , Nd ,

i
i

9.10. Positive Forms in Several Complex Variables

main
2012/11/1
page 433
i

433

where , is Kroneckers symbol. Moreover, the BargmannFock space carries the


remarkable adjunction identity
D u, vH = u, z vH , u, v C[z],
where D =

||

z1 1 ...zd d

. Fix a bihomogeneous form




f (z, z) =

a z z

||=||=n

and consider the integral operator


,
Ef : C[z] C[z], Ef (u)(z) =
If u(w) =

|| u w

f (z, w)u(w)dG.
Cd

, then
Ef (u)(z) =

a !u z

and
Ef (u), vH =

a !!u v ,


where v(z) = || v z . If the matrix of coecients a is hermitian, then Ef turns
out to be a symmetric operator acting on the space of homogeneous polynomials of
degree n.
Ef (u), vH = u, Ef (v)H .
Note also that Ef  0 is a linear operator if and only if the matrix (a, ) is positive
semidenite, if and only if the form f is a sum of hermitian squares.
2N
Let N be a positive integer, and denote by fN (z, z) = z
N ! f (z, z), a form
of bidegree (N + n, N + n). Consider two homogeneous polynomials u, v of degree
N + n each. Note that
 zz

z
2N
=
.
N!
!
||=N

Then
EfN u, vH =

,,

( + )!( + )!
u+ v+
!

a D u, D vH


=


,


a z D u, v

.
H

i
i

434

main
2012/11/1
page 434
i

Chapter 9. Sums of Hermitian Squares: Old and New

Assume that f (z, z) > 0 for all z = 0. From this point the proof becomes
more technical
and we merely mention the main idea: the dierential operator


Tf =
a
is elliptic and symmetric on the space of polynomials,
||=|| z D
with positive principal symbol equal to the form f (up to a constant). Hence, on
a proper choice of Sobolev norms, Tf is Fredholm. In particular, when restricted
on polynomials Tf possesses a nite number of negative eigenvalues. But Tf maps
the space of homogeneous polynomials of degree N + n into itself, and it coincides
there with the operator with integral kernel fN . Thus, for N suciently large, EfN
is a positive operator; that is, the form fN is a sum of hermitian squares.
Proof 2. The second proof of Quillens theorem is due to Catlin and dAngelo [5]
and closely follows the same idea of compact perturbations of integral operators
(only that it was published thirty
Start again with the (n, n) years after Quillen).

a
z
z
,
and
assume that it is positive
homogeneous form f (z, z) =

||=||=n
d
on the unit sphere in C . By homogeneity, this is the same condition as before:
f (z, z) > 0 for all z = 0. Let B denote the unit ball in Cd and consider the Bergman
space A2 (B) with reproducing kernel KB (see [5, Section 3.4]). The operator
,
2
2
Sf : A (B) A (B), (Sf u)(z) =
KB (z, w)f (z, w)u(w)d2d (w)
B

is bounded, and it maps homogeneous polynomials of degree N into homogeneous


polynomials of degree N , as one can easily see from the orthogonality of the monomials z , Nd , and a power series expansion of the integral kernel:




||=||=n a z w
=
cN fN (z, w),
KB (z, w)f (z, w) =
|B|(1 z w)d+1
N =0

2N

where cN > 0 are constants and the forms fN (z, z) = z


N ! f (z, z) are the same as
in the previous proof.
Note that Sf is a self-adjoint operator which is a compact perturbation of the
positive operator Sf with integral kernel KB (z, w)f (w, w):
,

Sf u, u =
KB (z, w)f (w, w)u(w)u(z)d2d (z)d2d (w)
B

,
f (w, w)|u(w)|2 d2d (w) 0.

=
B

Hence Sf has only nitely many negative eigenvalues. Finally, we infer that for
large enough N , the restriction of the operator Sf on the space of homogeneous
polynomials of degree N is positive, that is, the form fN is a sum of hermitian
squares.

9.10.1

P
olyas Theorem

A well-known interplay between the complex coordinates in Cd and their moduli,


seen as coordinates on Rd+ , leads to a new proof of a classical theorem of Polya

i
i

9.11. Semirings of Hermitian Squares

main
2012/11/1
page 435
i

435

referring to positive polynomials in an octant. To be more precise, let f (x) R[x]


be a homogeneous polynomial in d variables x = (x1 , . . . , xd ) so that the substitution
xj = |zj |2 produces a bihomogeneous form F (z, z) = f (|z1 |2 , . . . , |zd |2 ). Assume that
'
&

xj > 0 f (x) > 0.
|x|1 :=
Then F (z, z) > 0, z = 0, by homogeneity. Thus, in view of Theorem 9.50, there
exists N 0 such that
z
2N F (z, z) is a sum of hermitian squares. This observation
can be carried back to f (x) 
in the following form.


2N

F (z, z) =
Assume |x1 |N f (x) =
|| a x . Then
z

|| a z z ; that
is, the coecient matrix associated with F is diagonal, and hence it has positive
entries by Quillens theorem. In conclusion we obtain the following theorem.
Theorem 9.51 (P
olya). Let f (x) be a homogeneous polynomial with real coecients in d variables. If f (x) > 0 for all x = (x1 , . . . , xd ) [0, )d \ {0}, then there
exists an integer N 0 with the property that the form (x1 + + xd )N f (x) has
positive coecients.
An important addition to Polyas theorem is that one can estimate the degree
N from the degree of f and its distance to zero on the standard simplex; see [28].

9.10.2

Exercises

Exercise 9.52. Prove, using the geometry of the zero set, that for every N 1
the two complex variable form (|z|2 + |w|2 )N (|z|2 |w|2 )2 is not a sum of squares
of hermitian forms.
Exercise 9.53. Prove that the zero set of a sum of hermitian squares is a complex
algebraic variety.
Exercise 9.54. Show that x2 cannot be represented as a sum of hermitian squares
in the variable z = x + iy.

9.11

Semirings of Hermitian Squares

The decomposition of a polynomial into a sum of hermitian squares can be treated


with purely algebraic methods, in the spirit of the classical real algebraic geometry
[4, 28]. The specic feature of the convex hull of hermitian squares is that it is
closed under additions and multiplications, but it does not contain all real squares.
We explain below this dierence and indicate how one can include the hermitian
squares into the existing theory.
To this aim we identify real ane space of even dimension R2d with complex
ane space Cd , with coordinates (x, y) = (x1 , . . . , xd , y1 , . . . , yd ) R2d (respectively, z = (z1 , . . . , zd ) Cd ), so that zk = xk + iyk (1 k d). The Euclidean


norm is denoted as before:
z
2 = dk=1 |zk |2 = dk=1 (x2k + yk2 ).

i
i

436

main
2012/11/1
page 436
i

Chapter 9. Sums of Hermitian Squares: Old and New

When we polarize the variables z and z we identify a real polynomial of 2d


variables f (x, y) with a complex polynomial of 2d variables:
f(x + iy, x iy) = f (x, y), x, y Rd .
The R-homomorphisms R[x, y] R correspond in this way either to points (, )
R2d or to couples ( + i, i) Cd , with the associated evaluation map
(, )  f (, ) = f( + i, i).
We will carry this isomorphism throughout the section, without making it always
explicit.
Let I R[x, y] be an ideal, and let
X := VR (I) = { = + i Cd : for all f I, f (, ) = f(, ) = 0}
be the real zero set of I. The elements of the quotient algebra A = R[x, y]/I can
be considered as real polynomial functions on X. Let A2 denote the convex cone
of sums of squares in A.
Let h denote the convex cone of sums of hermitian squares |p(z)|2 in R[x, y],
where p C[z]. Clearly, h is contained in R[x, y]2 , and it is easy to see that this
inclusion is proper. Given A = R[x, y]/I as above, we write h A := (h + I)/I
for the cone of all sums of hermitian squares restricted to X. There are nontrivial
examples of ideals I for which h A contains every function in A which is strictly
positive on X. One of them is furnished by Quillens theorem.
Proposition 9.55. Let p(x, y) be a positive polynomial on the unit sphere in R2d .
Then there exists an integer n 1 and polynomials qj C[z], 1 j n, h
R[x, y], such that
n

|qj (z)|2 + (1
z
2 )h(x, y).
p(x, y) =
j=1


Proof. Let p(z, z) = , p z z be the polarization of p(x, y) and assume that
p has degree n both in z and z, but it is not necessarily homogeneous. One can
assume that n is even by passing from p to
z
2p. Next we add a new complex
variable zd+1 = u + iv, z  = (z, zd+1 ) and homogenize p:

n||
n||
P (z  , z  ) =
p z zd+1 z z d+1 .
,

The bihomogeneous polynomial P is positive on the set {z  Cd+1 ;


z
= 1,
|zd+1 | = 1}. (Prove!) Therefore, by homogeneity there exists a positive constant
C > 0 with the property
(
z
2 = |zd+1 |2 ) P (z  , z  ) > C
z 
2m ,
whence
P (z  , z  ) + C(
z
2 |zd+1 |2 )m > 0 for all z  = 0.
By applying Theorem 9.50 we nd a positive integer N such that
z 
2N P (z  , z  ) 2h .

i
i

9.11. Semirings of Hermitian Squares

main
2012/11/1
page 437
i

437

Taking zd+1 = 1 we nd polynomials Qk C[z] such that



|Qk (z)|2 ,
(
z
= 1) p(z, z) =
k

and the proof is complete, noting that the ideal of the variety
z
2 = 1 is radical.
Let A be an R-algebra, and let S be a subsemiring of A with R+ S. Recall
that S is said to be Archimedean (in A) if R+S = A, that is, if for every f A there
exists a real number c such that c f S. If A is generated by x1 , . . . , xn , then S
is Archimedean if and only if there exist ci R with ci xi S (i = 1, . . . , n). See
[28, Denition 5.4.1], [32, references there].
Denition 9.56. Let I be an ideal in R[x, y]. We say that h is Archimedean
modulo I if the semiring h + I is Archimedean in R[x, y] or, equivalently, if the
semiring h = (h + I)/I is Archimedean in R[x, y]/I.
By (a particular case of) the representation theorem [28, Theorem 5.4.4], we
have the following theorem.
Theorem 9.57. Let I be an ideal in R[x, y]. The following conditions on I are
equivalent:
(i) The set VR (I) is compact and every f R[x, y] with f > 0 on VR (I) lies in
h + I;
(ii) h is Archimedean modulo I.
(The representation theorem, in the version for semirings, asserts that (ii)
implies (i). The opposite implication is obvious.)
We observe the following simple characterization of these ideals.
Proposition 9.58. Let I be an ideal in R[x, y]. Then h is Archimedean modulo I
if and only if I contains a polynomial of the form
f = c + ||z|| +
2

r


|qk (z)|2

k=1

with c R and qk (z) C[z] (k = 1, . . . , r).


Proof. When h is Archimedean modulo I, there exists c R with c ||z||2
h + I, which implies the above condition. Conversely, if (c + ||z||2 ) h + I,
then also

|zk |2 (c + ||z||2 )
(1 c) 2xj = |zj 1|2 +
k =j

and
(1 c) 2yj = |zj i|2 +

|zk |2 (c + ||z||2 )

k =j

lie in h + I, for j = 1, . . . , d. This implies that h is Archimedean modulo I.

i
i

438

main
2012/11/1
page 438
i

Chapter 9. Sums of Hermitian Squares: Old and New

This gives plenty of examples of ideals I such that every polynomial strictly
positive on VR (I) is a hermitian sum of squares modulo I. In particular we have
obtained in this way an algebraic proof (the third one) and explanation of Quillens
phenomenon.
Proposition 9.59. On a real hypersurface of Cd of equation

z
2 +

r


|qk (z)|2 = M,

k=1

where qk C[z] and M > 0, every positive polynomial is a sum of hermitian squares.

9.11.1

Exercises


Exercise 9.60. Let F (z, z) = k |qk (z)|2 be a sum of hermitian squares. Prove
that the polarization of F satises CauchySchwarz inequality |F (, )|2
F (, )F (, ).
Exercise 9.61. Let P1 , . . . , Pr C[z] be polynomials in a single complex variable
and let a1 , . . . , ar be real numbers. Dene the function
h(z, z) =

r


|(z 1)Pj (z) + aj |


2

j=1

r


a2j .

j=1

Prove that
h(1, 1) = h(1, 1) = h(1, 1) = 0,
and deduce that h is not Archimedean modulo the ideal (h).

9.12

Multivariable Miscellanea

We comment below on a few recent advances pertaining to the theory of hermitian


forms of several variables.

9.12.1

The SchurAgler Class

Among all aspects of the theory of bounded analytic functions, results of Nevanlinna
Pick interpolation type have received by far the most attention. A pioneer on these
topics is Jim Agler. His book with McCarthy [1] well illustrates the intricate nature
of interpolation and realization theories in higher dimensions.
One of the starting points is the observation that an analytic function f (z) is
t for NevanlinnaPick interpolation in the unit ball B only if the kernel
1 f (z)f (w)
, z, w B,
1 z, w

i
i

9.12. Multivariable Miscellanea

main
2012/11/1
page 439
i

439

is positive semidenite. Then an operator realization as in Theorem 9.32 holds


true. These functions form the SchurAgler class. Also, it is within the same
class of functions that a von Neumann inequality remains valid. The main line of
attack for all proofs is the interpretation of the positivity of the above kernel as the
boundedness of the multiplier by the function f on the space of analytic functions
1
, the so-called Drury space of the ball.
in the disk with reproducing kernel 1 z,w
The polydisk in two dimensions is exceptional in this context, due to the
following observation.
Theorem 9.62 (Ando). Let (T1 , T2 ) be two commuting contractive operators on
a Hilbert space. Then for every polynomial p(z1 , z2 ), von Neumanns inequality

p(T1 , T2 )

p
,D2
is true.
A celebrated example of Varopoulos shows that such an inequality fails for
three commuting contractions; see [2]. One preferred way of avoiding the complications related to the dierence between the SchurAgler class and all contractive
analytic functions is to turn to functions of free, noncommuting variables.

9.12.2

Quotients of Sums of Hermitian Squares

Having as an example Artins solution to Hilbert 17th problem, there were a few recent attempts to characterize quotients of sums of hermitian squares. The denitive
result is due to Varolin [34], but before stating it we consider a few low-dimensional
cases and examples.
Proposition 9.63 (dAngelo). A nontrivial real valued polynomial of a single
complex variable P (z, z) can be represented as

2
j |pj (z)|

P (z, z) =
,
2
k |qk (z)|
with nitely many pj , qk C[z] if and only if there are complex numbers a , positive
or negative integers n , and a polynomial Q(z, z), such that
#
P (z, z) =
|z a |2n Q(z, z), z C,

Q(z, z) > 0, z C,
and 2 degz Q = deg Q.
The proof [9] is accessible as an exercise to the reader, with the only indication
that if a quotient of sums of hermitian squares vanishes at the point z = a, then its
Taylor series in z a and z a has the lower degree term of the form |z a|2m .

i
i

440

main
2012/11/1
page 440
i

Chapter 9. Sums of Hermitian Squares: Old and New

A second obstruction for a polynomial P (z, z) to be a quotient of sums of


squares is that its zero set be nite.
Using these observations one can analyze the polynomial
P (z, z) = 1 + bz 2 + bz 2 c|z|2 + |z|4
and obtain the following conclusions [9, 10]:
The hermitian form associated with P is positive semidenite if and only if
c 2|b| 2,
P is a quotient of sums of squares if and only if c > 2|b| 2, or b = 0, c > 2,
or |b| = 1 and c = 0.
The main result, proved even in a more general context than stated below
(for sections of a holomorphic vector bundle on a projective manifold, see [34]) is
putting the above observations in their natural higher-dimensional context.
Theorem 9.64 (Varolin). Let P (z, z) be a hermitian, bihomogeneous polyno

2
mial depending on z Cd . Let P (z, z) = nj=1 |pj (z)|2 N
j+1 |pj (z)| be a decomposition into squares, with linearly independent entries pj . Let V (P ) = {z
Cd ; P (z, z) = 0} be the zero set of P .
Then P is a quotient of square norms if and only if
n
sup
z V
/ (P )

j=1

|pj (z)|2 +

N
j+1

|pj (z)|2

P (z, z)

< .

The proof uses algebraic geometry techniques and a rened estimate of the
Bergman kernel of a carefully chosen metric in the ambient space.

9.12.3

Geometry of Proper Analytic Maps

Spectacular applications of sums of hermitian squares were recently obtained in the


study of proper analytic maps between balls of unequal dimensions; see [5]. We
conne ourselves to touch a couple of elementary aspects of this area.
The rst observation can be immediately derived from Quillens theorem.
Theorem 9.65 (CatlindAngelo). Let P : Cd Cn be a homogeneous polynomial map. If
P (z)
< 1 for
z
= 1, then there exists a polynomial map
Q : Cd Cm , such that P Q is a proper analytic map between the unit balls of
Cd and Cm+n .
Proof. According to Proposition 9.55 applied to the bihomogeneous polynomial

z
2N
P (z)
2 , there exists Q as in the statement, so that

z
2 =
P (z)
2 +
Q(z)
2 .

i
i

9.12. Multivariable Miscellanea

main
2012/11/1
page 441
i

441

Since the map $P \oplus Q$ is nonconstant, the maximum principle implies that it is proper (that is, by definition, it pulls back compact subsets of the open unit ball in $\mathbb{C}^{m+n}$ to compact subsets of the open unit ball in $\mathbb{C}^d$).

To illustrate the complexity of the classification of proper analytic maps between balls, we reproduce below, from the work of d'Angelo (see, for instance, [10]), a low-degree and low-dimensional analysis. The main point is the following question: given $N$, is there a polynomial or rational function $g$ from $\mathbb{C}^2$ to $\mathbb{C}^N$ such that $|g(z)|^2 = 1 - |\lambda z_1 z_2|^2$ on the unit sphere? Here is the result.
(a) If $|\lambda|^2 \ge 4$, then for all $N$ the answer is no.

(b) If $N = 1$, then the answer is yes only when $\lambda = 0$.

(c) If $N = 2$, the answer is yes precisely when one of the following holds: $\lambda = 0$, $|\lambda|^2 = 1$, $|\lambda|^2 = 2$, $|\lambda|^2 = 3$.

(d) For each $\lambda$ with $|\lambda|^2 < 4$, there is a smallest $N$ for which the answer is yes. This smallest $N$ tends to infinity as $|\lambda|$ tends to 2.
Idea of proof. We are seeking a holomorphic polynomial mapping $g$ such that
$$|g_1(z)|^2 + \cdots + |g_N(z)|^2 + |\lambda|^2 |z_1 z_2|^2 = 1$$
on the unit sphere. The components of $g$ and the additional term $\lambda z_1 z_2$ define a holomorphic mapping from the 2-ball to the $(N+1)$-ball which maps the sphere to the sphere. Such a map is either constant or proper. The maximum of $|z_1 z_2|^2$ on the sphere is $\frac14$, attained when $|z_1|^2 = |z_2|^2 = \frac12$. Hence $|\lambda|^2 \le 4$ must hold if the question has a positive answer. We claim that $|\lambda|^2 = 4$ cannot hold either. Suppose $|\lambda|^2 = 4$ and $g$ exists. Then we would have
$$|g(z)|^2 + 4|z_1|^2 |z_2|^2 = 1 = (|z_1|^2 + |z_2|^2)^2$$
on the sphere, and hence
$$|g(z)|^2 = (|z_1|^2 - |z_2|^2)^2$$
on the sphere. No such $g$ exists.

The only proper mappings from the 2-ball to itself are automorphisms and hence linear fractional transformations. Therefore the term $\lambda z_1 z_2$ can arise only if $\lambda = 0$. When $\lambda = 0$ we may of course choose $g(z)$ to be $(z_1, z_2)$.
The next statement follows from Faran's classification of the proper holomorphic rational mappings from $B_2$ to $B_3$ [12]. We say that two maps $g$ and $h$ are spherically equivalent if there are automorphisms $u$, $v$ of the domain and target balls such that $h = v \circ g \circ u$. If $g$ existed, then there would be a proper polynomial mapping $h$ from $B_2$ to $B_3$ with the monomial $\lambda z_1 z_2$ as a component. It follows from Faran's classification that $h$ would have to be spherically equivalent to one of the four mappings
$$h(z_1, z_2) = (z_1, z_2, 0), \qquad h(z_1, z_2) = (z_1, z_1 z_2, z_2^2),$$
$$h(z_1, z_2) = (z_1^2, \sqrt{2}\, z_1 z_2, z_2^2), \qquad h(z_1, z_2) = (z_1^3, \sqrt{3}\, z_1 z_2, z_2^3).$$
These four mappings provide the four possible values for $|\lambda|$.
Increasing the number of squares, if we allow a target dimension one larger, one can obtain a one-parameter family of maps:
$$f(z) = \left(z_1,\; z_2^2,\; \cos(t)\, z_1 z_2,\; \sin(t)\, z_1 z_2^2,\; \sin(t)\, z_1^2 z_2\right).$$
From this formula we see that we can recover all values of $|\lambda|$ up to unity, but not beyond.
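The identity $|f(z)|^2 = 1$ on the sphere can be checked directly: with $a = |z_1|^2$, $b = |z_2|^2$ and $a + b = 1$, the squared norms sum to $a + b^2 + ab\cos^2 t + ab(a+b)\sin^2 t = a + b^2 + ab = 1$. A numerical sanity check (a minimal sketch using NumPy; the random test point and tolerance are illustrative, not from the text):

import numpy as np

# Verify that f_t maps the unit sphere of C^2 into the unit sphere of C^5.
rng = np.random.default_rng(1)
t = 0.7
z = rng.standard_normal(2) + 1j * rng.standard_normal(2)
z /= np.linalg.norm(z)                      # now |z1|^2 + |z2|^2 = 1
z1, z2 = z
f = np.array([z1, z2**2, np.cos(t) * z1 * z2,
              np.sin(t) * z1 * z2**2, np.sin(t) * z1**2 * z2])
assert np.isclose(np.linalg.norm(f), 1.0)   # |f(z)| = 1 on the sphere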
If $N = 4$, for example, the answer is yes for $0 \le |\lambda|^2 \le 2$ and for the following additional values of $|\lambda|^2$:
$$\frac72, \quad \frac{10}{3}, \quad \frac83, \quad \frac52.$$
Explicit maps where the constants $\sqrt{7/2}$ and $\sqrt{10/3}$ arise as coefficients of $z_1 z_2$ are
$$f(z) = \left(z_1^7,\; z_2^7,\; \sqrt{\tfrac72}\, z_1 z_2,\; \sqrt{\tfrac72}\, z_1^5 z_2,\; \sqrt{\tfrac72}\, z_1 z_2^5\right),$$
$$f(z) = \left(z_1^5,\; z_2^5,\; \sqrt{\tfrac{10}{3}}\, z_1 z_2,\; \sqrt{\tfrac53}\, z_1^4 z_2,\; \sqrt{\tfrac53}\, z_1 z_2^4\right).$$
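Both maps can be checked on the sphere: for the first, with $a = |z_1|^2$, $b = |z_2|^2$ and $a + b = 1$, one verifies $a^7 + b^7 + \frac72 ab (1 + a^4 + b^4) = 1$. A numerical confirmation (a sketch using NumPy; the random test point is illustrative):

import numpy as np

# f(z) = (z1^7, z2^7, sqrt(7/2) z1 z2, sqrt(7/2) z1^5 z2, sqrt(7/2) z1 z2^5)
rng = np.random.default_rng(2)
z = rng.standard_normal(2) + 1j * rng.standard_normal(2)
z /= np.linalg.norm(z)                      # point on the unit sphere of C^2
z1, z2 = z
c = np.sqrt(7 / 2)
f = np.array([z1**7, z2**7, c * z1 * z2, c * z1**5 * z2, c * z1 * z2**5])
assert np.isclose(np.linalg.norm(f), 1.0)   # the sphere maps to the sphere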
9.12.4 Exercises

Exercise 9.66. Find the Hilbert space realization of functions in the Schur–Agler class.

Exercise 9.67. Show that the polynomial $(|zw|^2 - |u|^2)^2 + |z|^8$ is not a quotient of sums of squares.

Exercise 9.68. The polynomial $1 + \lambda|z|^2 + |z|^4$ is a quotient of sums of squares for $\lambda > -2$, but for $\lambda = -2$ it is not.

Exercise 9.69. The polynomial $z^2 + \bar z^2 + 2|z|^2$ is not a quotient of sums of squares.

Open problem. A classification of polynomial or rational proper maps between balls is still unknown, even for maps defined on $B_3$.

9.13 Hermitian Squares in the Free ∗-Algebra

Among the many possible ramifications of the positivity of hermitian forms discussed in the preceding section, the case of so-called hereditary polynomials in a free ∗-algebra stands apart, first for its simplicity, and second for the applications to optimization problems outlined in other chapters of the present book. We confine ourselves to reporting a couple of significant results in this direction, recently proved in [20].

Let $A$ denote the free $\mathbb{R}$-algebra with generators $\{x_1, \ldots, x_d, x_1^*, \ldots, x_d^*\}$ and $\mathbb{R}$-linear involution $*$ satisfying
$$(fg)^* = g^* f^*, \quad (x_k)^* = x_k^*, \quad (x_k^*)^* = x_k, \quad 1 \le k \le d, \quad f, g \in A.$$
An element $f \in A$ is called analytic if it belongs to the algebra generated by $x_1, \ldots, x_d$, and it is called hereditary if all monomials in the decomposition of $f$ have $x_k^*$ to the left of $x_j$ for all $j, k$. For instance $x_1^* x_3 + x_2^* x_2$ is hereditary, while $x_1 x_3^* + x_2^* x_2$ is not.
We will state a generic Positivstellensatz and Nullstellensatz, quite different from and simpler than the results we have seen in the commutative case. To this aim, let $p_1, \ldots, p_m$ be analytic elements of the free ∗-algebra and let
$$I(p) = \{r_1 p_1 + \cdots + r_m p_m \;;\; r_1, \ldots, r_m \in A\}$$
denote the left ideal generated by them. Also, let
$$\mathrm{sym}(p) = \left\{ \sum_j \left(r_j^* q_j + q_j^* r_j\right) \;;\; r_j \in A,\ q_j \in I(p) \right\}$$
be the associated symmetrized ideal.
The following result holds.

Theorem 9.70. Let $p_1, \ldots, p_m \in A$ be analytic elements. If a symmetric hereditary $q \in A$ satisfies
$$\langle q(X)v, v \rangle \ge 0$$
for all pairs $(X, v)$ of finite matrices and vectors satisfying $p_j(X)v = 0$, $1 \le j \le m$, then
$$q = \sum_{k=1}^{n} f_k^* f_k + g,$$
where $g \in \mathrm{sym}(p)$ and every $f_k$ is analytic.

If instead $\langle q(X)v, v \rangle = 0$ for all pairs $(X, v)$ satisfying $p_j(X)v = 0$ for all $j$, then $q \in \mathrm{sym}(p)$.
The proof consists of a standard separation-of-convex-sets argument and a Gelfand–Naimark–Segal construction. For details see [20]. A heuristic explanation of why such a strong result holds is that representations of the free ∗-algebra on finite matrices and vectors separate points and directions better than the mere point evaluations and derivations of the commutative polynomial algebra.

9.14 Further Reading

The selection of topics related to hermitian positivity included in the present chapter is far from complete. While we have tried to make the text self-contained and illustrative of many theoretical ramifications, we did not touch the vast array of applications, classical and modern. We indicate below a few links to applied areas with the hope that the interested reader will pursue some of these threads.

To start with the most recent publications, one can consult the monograph [3], where the essential role played by hermitian sums of squares in signal processing, the prediction theory of stochastic processes, and quantum information is well explained. Then, for matrix completion problems, a subject of high interest nowadays, having its origin in the Schur parameter analysis, see the monograph [8]. Matrix completion problems are frequently invoked nowadays in image analysis, remote sensing, information theory, codification, and so on. The monograph by Foias and Frazho [13] contains an interesting application of completion problems and Schur parameters to the study of wave propagation in layered media.

The early discovery of the shift of spectral lines is at the origin of the perturbation theory of hermitian forms. Together with scattering theory, another foundational theme of quantum mechanics, perturbation of spectra remains a hot research topic, with recent spectacular applications to solid state physics and submolecular chemistry. The old writings of the founders, such as Friedrichs [14] and Krein [25, 26], remain timely and inspiring. We must remark here on the imperative appearance of complex numbers and hermitian forms in the mathematical formulations of quantum mechanics. The textbook [30] and its three additional volumes are filled with the formalism of hermitian forms, albeit in infinitely many variables.

The stability of motion of classical mechanical systems (for instance, oscillations of an elastic medium or fluid flows) naturally leads to the problem of enclosing the spectrum of a hermitian or dissipative operator (aka the generator of a semigroup, or hamiltonian) in a prescribed region of the complex plane. The classical root separation results presented in the first part of this chapter have immediate consequences for the stability of dynamical systems; see, for instance, [15, 26].

Finally, Hilbert space realizations of contractive analytic functions and bounded analytic interpolation theorems are at the heart of moment problems and the control theory of systems of differential equations. Each subject is a big enterprise in itself. See again the most recent publications [2, 3] and track their bibliographies back to century-old sources.

Bibliography

[1] J. Agler and J. McCarthy. Pick Interpolation and Hilbert Function Spaces, Grad. Stud. Math. American Mathematical Society, Providence, RI, 2002.

[2] K. J. Åström and R. M. Murray. Feedback Systems. Princeton University Press, Princeton, NJ, 2008.

[3] M. Bakonyi and H. Woerdeman. Matrix Completion, Moments, and Sums of Hermitian Squares. Princeton University Press, Princeton, NJ, 2011.

[4] J. Bochnak, M. Coste, and M.-F. Roy. Real Algebraic Geometry, Ergeb. Math. Grenzgeb. (3) 36. Springer, Berlin, 1998.

[5] D. W. Catlin and J. P. d'Angelo. A stabilization theorem for Hermitian forms and applications to holomorphic mappings. Math. Res. Lett., 3:149–166, 1996.

[6] D. W. Catlin and J. P. d'Angelo. An isometric imbedding theorem for holomorphic bundles. Math. Res. Lett., 6:43–60, 1999.

[7] M. Chilali, P. Gahinet, and P. Apkarian. Robust pole placement in LMI regions. IEEE Trans. Automat. Control, 44:2257–2270, 1999.

[8] T. Constantinescu. Schur Parameters, Factorization and Dilation Problems, Oper. Theory Adv. Appl. 82. Birkhäuser, Basel, 1996.

[9] J. P. d'Angelo. Complex variables analogues of Hilbert's seventeenth problem. Int. J. Math., 16:609–627, 2005.

[10] J. P. d'Angelo and M. Putinar. Polynomial optimization on odd-dimensional spheres. In Emerging Applications of Algebraic Geometry, IMA Vol. Math. Appl. 149. Springer, New York, 2009, pp. 1–15.

[11] J. P. d'Angelo and D. Varolin. Positivity conditions for Hermitian symmetric functions. Asian J. Math., 7:1–18, 2003.

[12] J. Faran. Maps from the two-ball to the three-ball. Invent. Math., 68:441–475, 1982.

[13] C. Foias and A. Frazho. The Commutant Lifting Approach to Interpolation Problems. Birkhäuser, Basel, 1989.

[14] K. O. Friedrichs. Perturbation of Spectra in Hilbert Space. American Mathematical Society, Providence, RI, 1965.

[15] F. R. Gantmacher. The Theory of Matrices. Chelsea, New York, 1959.

[16] I. C. Gohberg and M. G. Krein. Introduction to the Theory of Linear Nonselfadjoint Operators in Hilbert Space, Transl. Math. Monogr. 18. American Mathematical Society, Providence, RI, 1969.

[17] B. Gustafsson and M. Putinar. Linear analysis of quadrature domains. II. Israel J. Math., 119:187–216, 2000.

[18] P. R. Halmos. Normal dilations and extensions of operators. Summa Bras. Math., 2:125–134, 1950.

[19] E. Hellinger and O. Toeplitz. Integralgleichungen und Gleichungen mit unendlichvielen Unbekannten. Reprint from the Encyclopaedia Math. Sci. Chelsea, New York, 1953.

[20] J. W. Helton, S. A. McCullough, and M. Putinar. Non-negative hereditary polynomials in a free ∗-algebra. Math. Z., 250:515–522, 2005.

[21] D. Hilbert. Grundzüge einer Allgemeinen Theorie der Linearen Integralgleichungen. Chelsea, New York, 1953.

[22] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge, UK, 1985.

[23] T. Kato. Perturbation Theory of Linear Operators. Springer, Berlin, 1966.

[24] A. N. Kolmogorov. Selected Works of A. N. Kolmogorov: Vol. 2, Probability Theory and Mathematical Statistics. Kluwer, Norwell, MA, 1991.

[25] M. G. Krein. On the trace formula in perturbation theory (in Russian). Mat. Sb., 33:597–626, 1953.

[26] M. G. Krein. Topics in Differential Equations and Integral Equations and Operator Theory, Oper. Theory Adv. Appl. 7. Birkhäuser, Basel, 1983.

[27] M. G. Krein and M. A. Naimark. The method of symmetric and Hermitian forms in the theory of the separation of the roots of algebraic equations. Linear Multilinear Algebra, 10:265–308, 1981.

[28] A. Prestel and Ch. N. Delzell. Positive Polynomials. Springer, Berlin, 2001.

[29] M. Putinar. Sur la complexification du problème des moments. C. R. Acad. Sci. Paris Sér. I Math., 314:743–745, 1992.

[30] M. Reed and B. Simon. Methods of Modern Mathematical Physics, Vol. 1: Functional Analysis. Academic Press, San Diego, 1972.

[31] D. G. Quillen. On the representation of hermitian forms as sums of squares. Invent. Math., 5:237–242, 1968.

[32] C. Scheiderer. Positivity and sums of squares: A guide to recent results. In Emerging Applications of Algebraic Geometry, IMA Vol. Math. Appl. 149. Springer, New York, 2009, pp. 271–324.

[33] B. V. Shabat. Introduction to Complex Analysis, Part II: Functions of Several Variables, Transl. Math. Monogr. 110. American Mathematical Society, Providence, RI, 1991.

[34] D. Varolin. Geometry of hermitian algebraic functions. Quotients of squared norms. Amer. J. Math., 130:291–315, 2008.

[35] A. Weil. Introduction à l'Étude des Variétés Kählériennes. Hermann et Cie, Paris, 1958.

Appendix A

Background Material

Grigoriy Blekherman, Pablo A. Parrilo, and Rekha R. Thomas

The appendix consists of four parts: matrices and quadratic forms, convex optimization, convex geometry, and algebraic geometry. The material in this appendix is mostly standard and as such is presented for the convenience of the reader in a compact form.

A.1 Matrices and Quadratic Forms

We present here a few basic facts about linear algebra, symmetric matrices, and quadratic forms. There are many excellent references on the topic, including [11] and [15], among others.
A matrix $A \in \mathbb{R}^{n \times n}$ is symmetric if $a_{ij} = a_{ji}$ for $i, j = 1, \ldots, n$. The set of symmetric matrices is denoted by $S^n$ and is a real vector space of dimension $\binom{n+1}{2} = \frac12 (n+1)n$. Real quadratic forms can always be represented in terms of symmetric matrices, i.e., $q(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j = x^T A x$, where $a_{ij} = a_{ji}$. We often identify a symmetric matrix with the corresponding quadratic form.

The characteristic polynomial of a matrix $A \in S^n$ is $p_A(\lambda) := \det(\lambda I - A) = \lambda^n + \sum_{k=0}^{n-1} p_k \lambda^k = \prod_{k=1}^{n} (\lambda - \lambda_k)$, where the $\lambda_k$ are the eigenvalues of $A$. Given a subset $S \subseteq \{1, \ldots, n\}$, let $A_S$ be the submatrix of $A$ whose rows and columns are indexed by $S$. The principal minor of $A$ corresponding to the subset $S$ is the determinant of $A_S$. If $S$ has the form $\{1, 2, \ldots, k\}$, then the corresponding minor is called a leading principal minor. It can be shown that the coefficient $p_k$ of the characteristic polynomial is equal (up to sign) to the sum of all the principal minors of size $n - k$, i.e., $p_k = (-1)^{n-k} \sum_{S : |S| = n-k} \det A_S$. Notice that, in particular, $p_{n-1} = -\mathrm{Tr}\, A$ and $p_0 = (-1)^n \det A$.
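These coefficient identities are easy to verify numerically (a minimal sketch using NumPy; the test matrix is an arbitrary illustration, not from the text):

import numpy as np
from itertools import combinations

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 4.]])
n = A.shape[0]
coeffs = np.poly(A)    # coefficients of det(tI - A), highest degree first

# p_k = (-1)^(n-k) * (sum of principal minors of size n-k); p_k is coeffs[n-k].
for k in range(n):
    minors = sum(np.linalg.det(A[np.ix_(S, S)])
                 for S in combinations(range(n), n - k))
    assert np.isclose(coeffs[n - k], (-1) ** (n - k) * minors)

assert np.isclose(coeffs[1], -np.trace(A))                    # p_{n-1} = -Tr A
assert np.isclose(coeffs[-1], (-1) ** n * np.linalg.det(A))   # p_0 = (-1)^n det A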
A.1.1 Positive Semidefinite Matrices

If the quadratic form $x^T A x$ takes only nonnegative values, we say that the matrix $A$ is positive semidefinite. Similarly, if it takes only positive values (except at the origin, where it necessarily vanishes), then $A$ is positive definite. There are several equivalent conditions for a matrix to be positive (semi)definite:

Proposition A.1. Let $A \in S^n$ be a symmetric matrix. The following statements are equivalent:
1. The matrix $A$ is positive semidefinite ($A \succeq 0$).
2. For all $x \in \mathbb{R}^n$, $x^T A x \ge 0$.
3. All eigenvalues of $A$ are nonnegative.
4. All $2^n - 1$ principal minors of $A$ are nonnegative.
5. The coefficients of $p_A(\lambda)$ weakly alternate in sign, i.e., $(-1)^{n-k} p_k \ge 0$ for $k = 0, \ldots, n-1$.
6. There exists a factorization $A = BB^T$, where $B \in \mathbb{R}^{n \times r}$ and $r$ is the rank of $A$.

For the definite case, there are similar characterizations:

Proposition A.2. Let $A \in S^n$ be a symmetric matrix. The following statements are equivalent:
1. The matrix $A$ is positive definite ($A \succ 0$).
2. For all nonzero $x \in \mathbb{R}^n$, $x^T A x > 0$.
3. All eigenvalues of $A$ are strictly positive.
4. All $n$ leading principal minors of $A$ are strictly positive.
5. The coefficients of $p_A(\lambda)$ alternate in sign, i.e., $(-1)^{n-k} p_k > 0$ for $k = 0, \ldots, n-1$.
6. There exists a factorization $A = BB^T$, with $B$ square and nonsingular.

The set of positive semidefinite matrices is denoted by $S^n_+$, and its interior (the set of positive definite matrices) by $S^n_{++}$. The set $S^n_+$ is invariant under nonsingular congruence transformations; i.e., if $T$ is nonsingular, then $A \succeq 0 \Leftrightarrow T^T A T \succeq 0$. The same statement holds for its interior, i.e., $A \succ 0 \Leftrightarrow T^T A T \succ 0$. For additional facts about the geometry of the set of positive semidefinite matrices, see Section A.3.5.
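The eigenvalue and factorization criteria, and the congruence invariance, are easy to exercise computationally (a minimal sketch using NumPy; the helper is_psd and the tolerance are illustrative assumptions):

import numpy as np

def is_psd(A, tol=1e-10):
    # Criterion 3 of Proposition A.1: all eigenvalues nonnegative.
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

B = np.random.randn(4, 3)
A = B @ B.T                  # criterion 6: A = B B^T is PSD (here of rank <= 3)
assert is_psd(A)

T = np.random.randn(4, 4)    # generically nonsingular
assert is_psd(T.T @ A @ T)   # congruence preserves positive semidefiniteness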

A.1.2 Matrix Factorizations

For a symmetric matrix $A$, there are several matrix factorizations that can be used to determine or certify properties of $A$; see, e.g., [11] for theoretical background and [9] for computational aspects. Among the most important matrix factorizations, we have the following.

Eigenvalue decomposition. Since $A$ is symmetric, the eigenspaces corresponding to distinct eigenvalues are mutually orthogonal, and thus one can choose an orthonormal basis of eigenvectors. As a consequence, the matrix $A$ is diagonalizable and there is always a decomposition
$$A = V \Lambda V^T, \qquad \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n),$$
where the matrix $V$ is orthogonal ($VV^T = V^T V = I$). If $A$ is positive semidefinite, we have $\lambda_i \ge 0$ for $i = 1, \ldots, n$.

Cholesky decomposition. A positive semidefinite matrix $A$ can be decomposed as
$$A = LL^T,$$
where $L$ is a lower triangular matrix (i.e., $L_{ij} = 0$ for $j > i$). This is known as the Cholesky decomposition of the matrix $A$ and can be obtained by solving the identity above column by column (or row by row). The Cholesky decomposition can be computed in $O(n^3)$ operations (in the dense case; faster if the matrix is sparse). Notice that, as opposed to eigenvalue methods, no iterative methods are required. This decomposition plays an important role in numerical algorithms for semidefinite programming.

$LDL^T$ decomposition. This is a decomposition of the form
$$A = LDL^T,$$
where the matrix $D$ is diagonal with nonnegative entries, and $L$ is lower triangular with normalized diagonal entries $L_{ii} = 1$. It should be clear that this can be directly obtained from the Cholesky decomposition, by suitably normalizing its diagonal entries. The importance of the $LDL^T$ decomposition is that, in contrast to the other two factorizations discussed above, it is a rational decomposition; i.e., if the matrix $A$ is rational then all numbers that appear in the decomposition are rational (and also polynomially sized).
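Both factorizations are available in standard numerical libraries (a sketch using NumPy and SciPy; scipy.linalg.ldl is assumed available, as in recent SciPy versions, and the test matrix is an arbitrary positive definite illustration):

import numpy as np
from scipy.linalg import ldl

A = np.array([[4., 2., 0.],
              [2., 5., 3.],
              [0., 3., 6.]])    # positive definite

L = np.linalg.cholesky(A)       # lower triangular, A = L L^T
assert np.allclose(L @ L.T, A)

lo, D, perm = ldl(A)            # A = lo D lo^T with D (block) diagonal
assert np.allclose(lo @ D @ lo.T, A)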
Two distinct factorizations of the same positive semidefinite matrix can always be related through a suitable orthogonal transformation. The following result makes this precise.

Theorem A.3. Let $A \in S^n$ be a positive semidefinite symmetric matrix, with $A = FF^T$ and $A = GG^T$, where $F, G \in \mathbb{R}^{n \times p}$. Then there exists a matrix $U \in \mathbb{R}^{p \times p}$ such that $F = GU$ and $U$ is orthogonal (i.e., such that $UU^T = I$ and $U^T U = I$).

A.1.3 Inertia and Signature

Definition A.4. Consider a symmetric matrix $A$. The inertia of $A$, denoted $I(A)$, is the integer triple $(n_-, n_0, n_+)$, where $n_-, n_0, n_+$ are the numbers of negative, zero, and positive eigenvalues, respectively. The signature of $A$ is equal to the number of positive eigenvalues minus the number of negative eigenvalues, i.e., the integer $n_+ - n_-$.

Notice that, with the notation above, the rank of $A$ is equal to $n_+ + n_-$. A symmetric positive definite $n \times n$ matrix has inertia $(0, 0, n)$, while a positive semidefinite one has inertia $(0, k, n-k)$ for some $k \ge 0$.

The inertia is an important invariant of a quadratic form, since it holds that $I(A) = I(TAT^T)$, where $T$ is nonsingular. This invariance of the inertia of a matrix under congruence transformations is known as Sylvester's law of inertia; see, for instance, [11]. This invariance makes it possible to efficiently compute the inertia of a matrix from its $LDL^T$ decomposition, since in this case $I(A) = I(D)$, and the inertia of a diagonal matrix is trivial to compute.
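A small numerical illustration of Sylvester's law (a sketch using NumPy; the tolerance and the diagonal test matrix are illustrative assumptions):

import numpy as np

def inertia(A, tol=1e-10):
    # Count negative, zero, and positive eigenvalues of a symmetric matrix.
    ev = np.linalg.eigvalsh(A)
    return (int((ev < -tol).sum()), int((np.abs(ev) <= tol).sum()),
            int((ev > tol).sum()))

A = np.diag([3., -1., 0., 2.])              # inertia (1, 1, 2) by inspection
T = np.random.default_rng(0).standard_normal((4, 4))  # generically nonsingular
assert inertia(A) == (1, 1, 2)
assert inertia(T @ A @ T.T) == inertia(A)   # invariance under congruence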

A.1.4 Schur Complements

Given a block-partitioned matrix
$$\begin{bmatrix} A & B \\ B^T & C \end{bmatrix},$$
where $A$ is square and invertible, the Schur complement of $A$ is the matrix $C - B^T A^{-1} B$. Similarly, if $C$ is square and invertible, its Schur complement is the matrix $A - BC^{-1}B^T$. Schur complements appear in many areas, including among others convex optimization (partial minimization of quadratic functions), probability and statistics (conditioning and marginalization of multivariate Gaussians), and algorithms (block matrix inversion). For several applications and generalizations, see, for instance, the classical survey [6].
Many of the properties of the Schur complement follow quite directly from the two factorizations:
$$\begin{bmatrix} A & B \\ B^T & C \end{bmatrix}
= \begin{bmatrix} I & 0 \\ B^T A^{-1} & I \end{bmatrix}
\begin{bmatrix} A & 0 \\ 0 & C - B^T A^{-1} B \end{bmatrix}
\begin{bmatrix} I & A^{-1} B \\ 0 & I \end{bmatrix}$$
$$= \begin{bmatrix} I & BC^{-1} \\ 0 & I \end{bmatrix}
\begin{bmatrix} A - BC^{-1}B^T & 0 \\ 0 & C \end{bmatrix}
\begin{bmatrix} I & 0 \\ C^{-1}B^T & I \end{bmatrix}.$$
Since the factorizations above are congruence transformations, this implies that the following conditions are equivalent:
$$\begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \succ 0
\;\Longleftrightarrow\;
A \succ 0 \text{ and } C - B^T A^{-1} B \succ 0
\;\Longleftrightarrow\;
C \succ 0 \text{ and } A - BC^{-1}B^T \succ 0.$$
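The equivalence is easy to test numerically (a minimal sketch using NumPy; the particular blocks are an arbitrary illustration):

import numpy as np

def is_pd(M):
    return bool(np.all(np.linalg.eigvalsh(M) > 0))

A = np.array([[4., 1.], [1., 3.]])
B = np.array([[1., 0.], [2., 1.]])
C = np.array([[5., 2.], [2., 4.]])
M = np.block([[A, B], [B.T, C]])

S_A = C - B.T @ np.linalg.inv(A) @ B    # Schur complement of A
S_C = A - B @ np.linalg.inv(C) @ B.T    # Schur complement of C
assert is_pd(M) == (is_pd(A) and is_pd(S_A)) == (is_pd(C) and is_pd(S_C))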

A.2 Convex Optimization

In this section we describe the basic elements of optimization theory, with an emphasis on convexity. For additional background, complete statements, and proofs, we refer the reader to the works [2, 3, 5].

A.2.1 Convexity and Hessians

A set $S \subseteq \mathbb{R}^n$ is a convex set if $x, y \in S$ implies $\lambda x + (1-\lambda) y \in S$ for all $0 \le \lambda \le 1$. A function $f : \mathbb{R}^n \to \mathbb{R}$ is a convex function if $f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y)$ for all $0 \le \lambda \le 1$ and $x, y \in \mathbb{R}^n$. A function $f$ is convex if and only if its epigraph $\{(x, t) \in \mathbb{R}^n \times \mathbb{R} : f(x) \le t\}$ is a convex set. A function $f$ is concave if $-f$ is convex. When a function is differentiable there are several equivalent characterizations of convexity, in terms of the gradient $\nabla f(x)$ or the Hessian $\nabla^2 f(x)$:

Lemma A.5. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a twice differentiable function. The following propositions are equivalent:

(i) The function $f$ is convex, i.e.,
$$f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y) \quad \text{for all } 0 \le \lambda \le 1,\ x, y \in \mathbb{R}^n.$$

(ii) The first-order convexity condition holds:
$$f(y) \ge f(x) + (\nabla f(x))^T (y - x) \quad \text{for all } x, y \in \mathbb{R}^n.$$

(iii) The second-order convexity condition holds:
$$\nabla^2 f(x) \succeq 0 \quad \text{for all } x \in \mathbb{R}^n,$$
i.e., the Hessian is positive semidefinite everywhere.

A.2.2 Minimax Theorem

Given a function $f : S \times T \to \mathbb{R}$, the following inequality always holds:
$$\max_{t \in T} \min_{s \in S} f(s, t) \;\le\; \min_{s \in S} \max_{t \in T} f(s, t). \tag{A.1}$$
If the maxima or minima in (A.1) are not attained, the inequality is still true after replacing max and min with sup and inf, respectively.

It is of interest to understand situations under which (A.1) holds with equality. The following is a well-known condition for this.

Theorem A.6 (minimax theorem). Let $S \subseteq \mathbb{R}^n$ and $T \subseteq \mathbb{R}^m$ be compact convex sets, and let $f : S \times T \to \mathbb{R}$ be a continuous function that is convex in its first argument and concave in the second. Then
$$\max_{t \in T} \min_{s \in S} f(s, t) = \min_{s \in S} \max_{t \in T} f(s, t).$$
A special case of this theorem, used in game theory to prove the existence of equilibria for zero-sum games, is when $S$ and $T$ are standard unit simplices and the function $f(s, t)$ is a bilinear form.
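The inequality (A.1) can be observed on a grid (a minimal sketch using NumPy; the saddle function $f(s,t) = s^2 - t^2 + st$ on $[-1,1]^2$ is an illustrative choice satisfying the hypotheses of Theorem A.6):

import numpy as np

s = np.linspace(-1, 1, 201)
t = np.linspace(-1, 1, 201)
S, T = np.meshgrid(s, t, indexing="ij")   # F[i, j] = f(s_i, t_j)
F = S**2 - T**2 + S * T                   # convex in s, concave in t

maxmin = F.min(axis=0).max()              # max_t min_s f(s, t)
minmax = F.max(axis=1).min()              # min_s max_t f(s, t)
assert maxmin <= minmax + 1e-12           # (A.1); equality here, both equal 0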

A.2.3 Lagrangian Duality
Consider a nonlinear optimization problem:
$$\begin{array}{ll}
\underset{x \in \mathbb{R}^n}{\text{minimize}} & f(x) \\
\text{subject to} & g_i(x) \le 0, \quad i = 1, \ldots, m, \\
& h_j(x) = 0, \quad j = 1, \ldots, p,
\end{array} \tag{A.2}$$
and let $u^\star$ be its optimal value. Define the Lagrangian associated with the optimization problem (A.2) as
$$L : \mathbb{R}^n \times \mathbb{R}^m_+ \times \mathbb{R}^p \to \mathbb{R}, \qquad
(x, \lambda, \mu) \mapsto f(x) + \sum_{i=1}^{m} \lambda_i g_i(x) + \sum_{j=1}^{p} \mu_j h_j(x).$$
The Lagrange dual function is defined as
$$\varphi(\lambda, \mu) := \min_{x \in \mathbb{R}^n} L(x, \lambda, \mu).$$
Maximizing this function over the dual variables $(\lambda, \mu)$ yields
$$v^\star := \max_{\mu \in \mathbb{R}^p,\ \lambda \ge 0} \varphi(\lambda, \mu).$$
Applying the minimax inequality (A.1), we see that this is a lower bound on the value of the original optimization problem:
$$v^\star \le \min_{x \in \mathbb{R}^n} \max_{\mu \in \mathbb{R}^p,\ \lambda \ge 0} L(x, \lambda, \mu) = u^\star.$$
If the functions $f, g_i$ are convex and the $h_j$ are affine, then the Lagrangian is convex in $x$ and concave in $(\lambda, \mu)$. To ensure strong duality (i.e., equality in the expression above), compactness or other constraint qualifications are needed. An often used condition is the Slater constraint qualification: there exists a strictly feasible point, i.e., a point $z \in \mathbb{R}^n$ such that $g_i(z) < 0$ for all $i = 1, \ldots, m$ and $h_j(z) = 0$ for all $j = 1, \ldots, p$. Under this condition, strong duality always holds.

Theorem A.7. Consider the optimization problem (A.2), where $f, g_i$ are convex and the $h_j$ are affine. Assume Slater's constraint qualification holds. Then the optimal value of the primal is the same as the optimal value of the dual, i.e., $v^\star = u^\star$.
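A one-dimensional illustration (a sketch; the specific problem is an illustrative assumption, not from the text): minimize $(x-2)^2$ subject to $x - 1 \le 0$, with optimal value $u^\star = 1$ at $x^\star = 1$. The Lagrangian $(x-2)^2 + \lambda(x-1)$ is minimized at $x = 2 - \lambda/2$, giving the dual function $\varphi(\lambda) = \lambda - \lambda^2/4$.

import numpy as np

lam = np.linspace(0.0, 4.0, 4001)
phi = lam - lam**2 / 4            # Lagrange dual function of the example
v_star = phi.max()                # maximized at lam = 2
assert np.isclose(v_star, 1.0)    # strong duality: v* = u* (Slater holds at z = 0)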

A.2.4 KKT Optimality Conditions

Consider the nonlinear optimization problem in (A.2). The Karush–Kuhn–Tucker (KKT) optimality conditions are
$$\begin{array}{ll}
\text{Stationarity:} & \nabla f(x^\star) + \displaystyle\sum_{i=1}^{m} \lambda_i \nabla g_i(x^\star) + \sum_{j=1}^{p} \mu_j \nabla h_j(x^\star) = 0, \\
\text{Primal feasibility:} & g_i(x^\star) \le 0 \quad \text{for } i = 1, \ldots, m, \\
& h_j(x^\star) = 0 \quad \text{for } j = 1, \ldots, p, \\
\text{Dual feasibility:} & \lambda_i \ge 0 \quad \text{for } i = 1, \ldots, m, \\
\text{Complementary slackness:} & \lambda_i g_i(x^\star) = 0 \quad \text{for } i = 1, \ldots, m.
\end{array} \tag{A.3}$$

Under certain constraint qualifications (e.g., the ones in the theorem below), the KKT conditions are necessary for local optimality.

Theorem A.8. Assume any of the following constraint qualifications holds:
- The gradients of the constraints $\{\nabla g_1(x^\star), \ldots, \nabla g_m(x^\star), \nabla h_1(x^\star), \ldots, \nabla h_p(x^\star)\}$ are linearly independent.
- There exists a strictly feasible point (Slater constraint qualification), i.e., a point $z \in \mathbb{R}^n$ such that $g_i(z) < 0$ for all $i = 1, \ldots, m$ and $h_j(z) = 0$ for all $j = 1, \ldots, p$.
- All constraints $g_i(x)$, $h_j(x)$ are affine functions.

Then, at every local minimum $x^\star$ of (A.2) the KKT conditions (A.3) hold.

On the other hand, for convex optimization problems, i.e., if all functions $f, g_i$ are convex and the $h_j$ are affine, the KKT conditions are sufficient for local (and thus global) optimality:

Theorem A.9. Let (A.2) be a convex optimization problem and let $x^\star$ be a point that satisfies the KKT conditions (A.3). Then $x^\star$ is a global minimum.
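For the example used above (minimize $(x-2)^2$ subject to $x - 1 \le 0$), the KKT conditions can be checked directly at $x^\star = 1$, $\lambda^\star = 2$ (a sketch; the example problem is an illustrative assumption):

# f(x) = (x - 2)^2, g(x) = x - 1; grad f(x) = 2(x - 2), grad g(x) = 1.
x_star, lam_star = 1.0, 2.0
assert 2 * (x_star - 2) + lam_star * 1.0 == 0    # stationarity
assert x_star - 1 <= 0                           # primal feasibility
assert lam_star >= 0                             # dual feasibility
assert lam_star * (x_star - 1) == 0              # complementary slackness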

A.3 Convex Geometry

We give a summary of standard properties of convex sets and the cone of positive semidefinite matrices. We refer the reader to [2, 13, 14] for more background and proofs.

A.3.1 Basic Facts

Recall that a subset $K$ of $\mathbb{R}^n$ is called convex if for all $x, y \in K$ we have $\lambda x + (1-\lambda) y \in K$ for all real $0 \le \lambda \le 1$.

For vectors $x_1, \ldots, x_k \in \mathbb{R}^n$ a linear combination $\lambda_1 x_1 + \cdots + \lambda_k x_k$ is called a convex combination if $\lambda_i \ge 0$ for $1 \le i \le k$ and $\lambda_1 + \cdots + \lambda_k = 1$. A linear combination is called a conic combination if we require only that $\lambda_i \ge 0$ for $1 \le i \le k$. Equivalently, a subset $K$ of $\mathbb{R}^n$ is convex if it is closed under taking convex combinations, and $K$ is a convex cone if it is closed under taking conic combinations. Equivalently, a convex cone is a convex set that is also closed under multiplication by nonnegative scalars.

Let $S \subseteq \mathbb{R}^n$ be an arbitrary subset. The convex hull, $\mathrm{conv}(S)$, of $S$ is the smallest convex set containing $S$. Equivalently, $\mathrm{conv}(S)$ is the set of all convex combinations of points in $S$:
$$\mathrm{conv}(S) = \left\{ x \in \mathbb{R}^n \;\middle|\; x = \lambda_1 y_1 + \cdots + \lambda_k y_k \text{ for some } y_1, \ldots, y_k \in S,\ \lambda_i \ge 0,\ \lambda_1 + \cdots + \lambda_k = 1 \right\}.$$

454

main
2012/11/1
page 454
i

Appendix A. Background Material

The conic hull, $\mathrm{cone}(S)$, of $S$ is the set of all conic combinations of the points in $S$:
$$\mathrm{cone}(S) = \left\{ x \in \mathbb{R}^n \;\middle|\; x = \lambda_1 y_1 + \cdots + \lambda_k y_k \text{ for some } y_1, \ldots, y_k \in S,\ \lambda_i \ge 0 \right\}.$$
The set $\mathrm{cone}(S)$ is the smallest convex cone containing $S$.

A priori it is not clear how large, in terms of the number of points, the convex combinations of points in $S$ need to be to write down a point in $\mathrm{conv}(S)$. Carathéodory's theorem provides an upper bound.

Theorem A.10. Let $S$ be a subset of $\mathbb{R}^n$. Then any point in the convex hull of $S$ can be written as a convex combination of at most $n + 1$ points in $S$.

A set defined by finitely many linear inequalities is called a polyhedron. The convex hull of finitely many points in $\mathbb{R}^n$ is called a polytope, and the conic hull of finitely many points in $\mathbb{R}^n$ is called a polyhedral cone. We then have the following theorem.

Theorem A.11. A bounded polyhedron is a polytope.
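Carathéodory-sized representations can be found with linear programming (a sketch using SciPy's linprog; the random points are illustrative, and it is assumed the solver returns a basic optimal solution, as simplex-type methods do, so that at most $n + 1$ weights are nonzero):

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
Y = rng.standard_normal((2, 6))          # six points in R^2, as columns
x = Y @ np.full(6, 1 / 6)                # a point in their convex hull

# LP over the weights: Y lam = x, sum(lam) = 1, lam >= 0. A basic optimal
# solution has at most n + 1 = 3 nonzero weights, matching Theorem A.10.
A_eq = np.vstack([Y, np.ones(6)])
b_eq = np.append(x, 1.0)
res = linprog(c=rng.standard_normal(6), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
assert res.success
print("support size:", int((res.x > 1e-9).sum()))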
Convex sets possess a well-defined notion of dimension. Let $K \subseteq \mathbb{R}^n$ be a convex set and let $A(K)$ be its affine hull, i.e., the affine linear subspace spanned by $K$. Then $K$ has nonempty interior as a subset of $A(K)$, and $\dim K = \dim A(K)$. The interior of $K$ as a subset of $A(K)$ is called the relative interior of $K$, and the boundary of $K$ as a subset of $A(K)$ is called the relative boundary of $K$. A compact convex set that is full dimensional in $\mathbb{R}^n$ is called a convex body.

Let $K \subseteq \mathbb{R}^n$ be a closed convex set. A subset $F \subseteq K$ is called a face of $K$ if for all $x, y \in K$ and any $0 < \lambda < 1$, if we have $\lambda x + (1-\lambda) y \in F$, then $x, y \in F$. A face $F$ is called a proper face of $K$ if $F$ is a nonempty proper subset of $K$. It is easy to see that a proper face $F$ does not contain any points in the relative interior of $K$, and therefore it is a subset of the relative boundary of $K$. The intersection of any two faces of $K$ is a face of $K$.

A face $F$ of $K$ is called exposed if there exists an affine hyperplane $H$ in $\mathbb{R}^n$ such that $F = K \cap H$. The hyperplane $H$ divides $\mathbb{R}^n$ into two half spaces, and it follows that $K \setminus F$ must lie in one of the two open half spaces. Equivalently, $F$ is an exposed face of $K$ if there exists an affine linear functional $\ell : \mathbb{R}^n \to \mathbb{R}$ such that $\ell(x) \ge 0$ for all $x \in K$ and $\ell(s) = 0$ for all $s \in F$.

A point $x \in K$ is called an extreme point of $K$ if $\{x\}$ is a face of $K$; i.e., if $x = \lambda y_1 + (1-\lambda) y_2$ for some $y_1, y_2 \in K$ and $0 < \lambda < 1$, then $y_1 = y_2 = x$. A point $x$ in a convex cone $C$ is said to span an extreme ray of $C$ if the ray spanned by $x$ is a face of $C$; i.e., if $x = \lambda_1 y_1 + \lambda_2 y_2$ for some $y_1, y_2 \in C$ and $\lambda_1, \lambda_2 \ge 0$, then $y_1$ and $y_2$ lie on the ray spanned by $x$. The following is the finite-dimensional Krein–Milman theorem. It is also known as Minkowski's theorem.

Theorem A.12. Let $K \subseteq \mathbb{R}^n$ be a compact convex set. Then $K$ is the convex hull of its extreme points.

i
i

A.3. Convex Geometry

main
2012/11/1
page 455
i

455

Faces of a compact convex set $K$ ordered by inclusion form a partially ordered set $\mathcal{F}(K)$. The poset $\mathcal{F}(K)$ is a lattice, where the meet operation is intersection, $F_1 \wedge F_2 = F_1 \cap F_2$, and the join operation is the intersection of all faces containing $F_1$ and $F_2$: $F_1 \vee F_2 = \bigcap_{F \supseteq F_1, F_2} F$.

It follows from the Krein–Milman theorem that the minimal proper faces in $\mathcal{F}(K)$ are the extreme points of $K$, and it will follow from separation theorems presented below that the maximal proper faces in $\mathcal{F}(K)$ are exposed. We note that the maximal proper faces of $\mathcal{F}(K)$ do not have to have the same dimension. In particular, for $K \subseteq \mathbb{R}^n$, the dimension of all the maximal proper faces can be strictly smaller than $n - 1$. This is the case for the cone of positive semidefinite matrices $S^n_+$, as we will see below.

A.3.2 Cone Decomposition

Let $K_1, K_2$ be convex subsets of $\mathbb{R}^n$. Define $K_1 + K_2$ as the set of all sums of points from $K_1$ and $K_2$:
$$K_1 + K_2 = \{x \in \mathbb{R}^n \mid x = x_1 + x_2,\ x_1 \in K_1,\ x_2 \in K_2\}.$$
This operation is called Minkowski addition, and the set $K_1 + K_2$ is also convex.

A closed convex cone $C \subseteq \mathbb{R}^n$ is called pointed if $C$ does not contain straight lines. A cone that is closed, full-dimensional in $\mathbb{R}^n$, and pointed is called a proper cone. The following theorem shows that a nonpointed cone can always be decomposed into a pointed cone and a subspace.

Theorem A.13. Let $C \subseteq \mathbb{R}^n$ be a closed convex cone. Then $C$ is the Minkowski sum of a pointed cone $P$ and a linear subspace $L$:
$$C = P + L.$$
This allows us to concentrate on pointed convex cones. Now we formulate the analogue of the Krein–Milman theorem for pointed cones.

Theorem A.14. Let $C$ be a closed pointed cone in $\mathbb{R}^n$. Then $C$ is the conic hull of the points spanning its extreme rays.

There is also a decomposition theorem for polyhedra, called the Minkowski–Weyl theorem.

Theorem A.15. Every polyhedron is the Minkowski sum of a polytope and a polyhedral cone.

A.3.3 Separation Theorems

An important property of a convex set is that we can certify when a point is not in the set. This is usually done via a separation theorem. Let $H$ be an affine hyperplane in $\mathbb{R}^n$. Then $H$ divides $\mathbb{R}^n$ into two half spaces. We will use $H_+$ and $H_-$ to denote the open half spaces and $\bar H_+$ and $\bar H_-$ to denote the closed half spaces. We say that $H$ separates two sets $K_1$ and $K_2$ if $K_1$ and $K_2$ belong to different closed half spaces $\bar H_+$ and $\bar H_-$. We say that $H$ strictly separates $K_1$ and $K_2$ if they belong to different open half spaces $H_+$ and $H_-$.

Equivalently, we can think of $H$ as the zero set of an affine linear functional $\ell : \mathbb{R}^n \to \mathbb{R}$. Then $\ell$ separates $K_1$ and $K_2$ if $\ell(x) \ge 0$ for all $x \in K_1$ and $\ell(x) \le 0$ for all $x \in K_2$. Similarly, $\ell$ strictly separates $K_1$ and $K_2$ if $\ell(x) > 0$ for all $x \in K_1$ and $\ell(x) < 0$ for all $x \in K_2$.

Now we state our most general separation theorem.

Theorem A.16. Let $K_1$ and $K_2$ be convex subsets of $\mathbb{R}^n$ such that $K_1 \cap K_2 = \emptyset$. Then there exists an affine hyperplane $H$ that separates $K_1$ and $K_2$.

We observe that it follows from Theorem A.16 that every face of a convex set $K$ is contained in an exposed face of $K$.

We will often be interested in strict separation, in which case we need to make further assumptions on $K_1$ and $K_2$.

Theorem A.17. Let $K_1$ and $K_2$ be disjoint convex subsets of $\mathbb{R}^n$ and suppose that $K_1$ is compact and $K_2$ is closed. Then there exists an affine hyperplane $H$ strictly separating $K_1$ and $K_2$.

Theorem A.17 is often applied when $K_1$ is a single point. Separation theorems lead to certificates of not belonging to a convex set. Combined with the notions of polarity explained below, this leads to theorems of the alternative.

We need to adjust Theorems A.16 and A.17 to the setting of cones since, for example, all cones contain the origin and are never disjoint. Also, any hyperplane separating two cones $C_1$ and $C_2$ must be linear. We will say that a linear functional $\ell : \mathbb{R}^n \to \mathbb{R}$ separates $C_1$ and $C_2$ if $\ell(x) \ge 0$ for all $x \in C_1$ and $\ell(x) \le 0$ for all $x \in C_2$. Similarly, $\ell$ strictly separates $C_1$ and $C_2$ if $\ell(x) > 0$ for all nonzero $x \in C_1$ and $\ell(x) < 0$ for all nonzero $x \in C_2$. Then we have the following theorem.

Theorem A.18. Let $C_1$ and $C_2$ be pointed closed convex cones in $\mathbb{R}^n$ such that $C_1 \cap C_2 = \{0\}$. Then there exists a linear functional $\ell : \mathbb{R}^n \to \mathbb{R}$ strictly separating $C_1$ and $C_2$.

A.3.4 Polarity and Duality

We can view a compact convex set $K$ as the convex hull of its extreme points, but we can also view it as being cut out by linear inequalities. The set of affine linear inequalities defining $K$ is a convex object itself, and this leads to very fruitful notions of polarity and duality in convex geometry.

Let $\langle \cdot, \cdot \rangle$ be an inner product on $\mathbb{R}^n$. Let $K \subseteq \mathbb{R}^n$ be a convex body with the origin in its interior. Define the polar body $K^\circ$ as follows:
$$K^\circ = \{x \in \mathbb{R}^n \mid \langle x, y \rangle \le 1 \text{ for all } y \in K\}.$$
The polar body encodes all the affine linear defining inequalities of $K$. It is easy to see that $K^\circ$ is also a convex body with the origin in its interior. Moreover, $x \in K^\circ$ is on the boundary of $K^\circ$ if and only if $\langle x, y \rangle = 1$ for some $y \in K$. Polarity reverses inclusion: if $K_1$ and $K_2$ are convex bodies and $K_1 \subseteq K_2$, then $K_2^\circ \subseteq K_1^\circ$.

First we observe that polarity is an involution on convex bodies with the origin in the interior.

Theorem A.19 (biduality theorem). Let $K$ be a convex body with the origin in its interior. Then
$$(K^\circ)^\circ = K.$$
We now note that extreme points of $K^\circ$ define maximal proper faces of $K$ (and vice versa): given $x \in K^\circ$, let $F_x$ be the face of $K$ defined by $F_x = \{y \in K \mid \langle x, y \rangle = 1\}$. More generally, if $F$ is a face of $K$, we can define the corresponding exposed face $\widehat F$ of the polar $K^\circ$ by $\widehat F = \{y \in K^\circ \mid \langle x, y \rangle = 1 \text{ for all } x \in F\}$. We observe that $\widehat{\widehat F}$ is equal to $F$ if and only if $F$ is exposed.

We can similarly define the notion of a polar cone. Let $C \subseteq \mathbb{R}^n$ be a convex cone. We note that if for some $x \in \mathbb{R}^n$ we have $\langle x, y \rangle \le 1$ for all $y \in C$, then it follows that $\langle x, y \rangle \le 0$ for all $y \in C$. Accordingly, we define the polar cone $C^\circ$ as
$$C^\circ = \{x \in \mathbb{R}^n \mid \langle x, y \rangle \le 0 \text{ for all } y \in C\}.$$
The dual cone $C^*$ is defined as the negative of the polar cone:
$$C^* = \{x \in \mathbb{R}^n \mid \langle x, y \rangle \ge 0 \text{ for all } y \in C\}.$$
We note that the dual cone is often defined as a subset of the dual space $(\mathbb{R}^n)^*$:
$$C^* = \{\ell \in (\mathbb{R}^n)^* \mid \ell(y) \ge 0 \text{ for all } y \in C\}.$$
Here we used an explicit identification of the dual space $(\mathbb{R}^n)^*$ with $\mathbb{R}^n$ via the inner product $\langle \cdot, \cdot \rangle$. We can similarly state a biduality theorem for cones.

Theorem A.20. Let $C$ be a closed convex cone in $\mathbb{R}^n$. Then
$$(C^\circ)^\circ = C \quad \text{and} \quad (C^*)^* = C.$$

A.3.5 Cone of Positive Semidefinite Matrices

Let $S^n_+$ denote the cone of positive semidefinite $n \times n$ matrices. It is easy to show that $S^n_+$ is a closed, pointed cone and that it is full dimensional in $S^n$. We define an inner product on $S^n$ as follows: $\langle A, B \rangle = \mathrm{Tr}(AB)$. It is not hard to show that the cone $S^n_+$ is self-dual.

Proposition A.21. With the inner product $\langle A, B \rangle = \mathrm{Tr}(AB)$ for $A, B \in S^n$ we have
$$\left(S^n_+\right)^* = S^n_+.$$

From the diagonalization of symmetric matrices we see that any positive semidefinite matrix $A \in S^n_+$ can be written as a sum of rank 1 positive semidefinite matrices. Thus we see that the extreme rays of $S^n_+$ are spanned by the rank 1 positive semidefinite matrices. Now let $V$ be a linear subspace of $\mathbb{R}^n$. Let $F_V$ be the set of all positive semidefinite matrices $A$ such that $V \subseteq \ker A$. It is easy to show that $F_V$ is a face of $S^n_+$. It also happens that all faces of $S^n_+$ have this form.

Theorem A.22. Let $F$ be a face of $S^n_+$; then $F = F_V$ for some subspace $V$ of $\mathbb{R}^n$.

Using the diagonalization of symmetric matrices again, it follows that the face $F_V$ is isomorphic to the cone of positive semidefinite matrices of dimension $n - \mathrm{codim}\, V$. Therefore faces of $S^n_+$ have dimension $\binom{k+1}{2}$ for some $0 \le k \le n$. We note that if $V$ and $W$ are linear subspaces of $\mathbb{R}^n$ and $V \subseteq W$, then $F_W \subseteq F_V$. From this we obtain the following description of the face lattice $\mathcal{F}(S^n_+)$.

Corollary A.23. The face lattice $\mathcal{F}(S^n_+)$ is isomorphic to the lattice of linear subspaces of $\mathbb{R}^n$ ordered by reverse inclusion.

We also have the following important corollary.

Corollary A.24. Let $A, B$ be positive semidefinite matrices. Then $\langle A, B \rangle = 0$ if and only if $AB = 0$.

Proof. Suppose that $AB = 0$. Then $\langle A, B \rangle = \mathrm{Tr}(AB) = 0$. Now suppose that $\langle A, B \rangle = 0$. We can write $B = \sum_{i=1}^{k} R_i$, where the $R_i$ are positive semidefinite rank 1 matrices. Then we have $\langle A, B \rangle = \sum_{i=1}^{k} \langle A, R_i \rangle = 0$. Since the cone $S^n_+$ is self-dual, we know that $\langle A, R_i \rangle \ge 0$, and therefore $\langle A, R_i \rangle = 0$ for all $i$. Since the matrices $R_i$ have rank 1 we can write $R_i = v_i v_i^T$ for some vectors $v_i \in \mathbb{R}^n$. Therefore we see that $\langle A, R_i \rangle = v_i^T A v_i = 0$, and since $A \succeq 0$ we see that $v_i$ is in the kernel of $A$. Therefore $A R_i = A v_i v_i^T = 0$ for all $i$. Thus we have $AB = 0$.
A.3.6 Dimensional Inequalities

It is often of great interest to find a low rank positive semidefinite matrix given some linear conditions on the entries of the matrix. While the existence of a positive semidefinite matrix subject to linear constraints can be decided via semidefinite programming, the existence of a solution of low rank is a nonconvex problem and thus quite challenging. It is therefore of interest to find some theoretical guarantees on the existence of low rank solutions, given that a positive semidefinite solution exists. We state the following bounds, discovered and rediscovered independently by several authors [1].

Theorem A.25. Let $\mathcal{A}$ be an affine subspace of $S^n$ such that the intersection $\mathcal{A} \cap S^n_+$ is nonempty and $\mathrm{codim}\, \mathcal{A} \le \binom{r+2}{2} - 1$ for some nonnegative integer $r$. Then there is a matrix $X \in S^n_+ \cap \mathcal{A}$ such that $\mathrm{rank}\, X \le r$.

This bound is sharp in general, but it was improved by Barvinok in the case where the intersection $\mathcal{A} \cap S^n_+$ is bounded [1].

Theorem A.26. Suppose that $r \ge 0$ and $n \ge r + 2$. Let $\mathcal{A}$ be an affine subspace of $S^n$ such that $\mathrm{codim}\, \mathcal{A} = \binom{r+2}{2}$. Suppose that the intersection $\mathcal{A} \cap S^n_+$ is nonempty and bounded. Then there is a matrix $X \in S^n_+ \cap \mathcal{A}$ such that $\mathrm{rank}\, X \le r$.

A.4 Algebra of Polynomials and Ideals

There are excellent books for the basics of commutative algebra, algebraic geometry, and real algebraic geometry used in this book. For polynomials, ideals, Gröbner bases, and basic algebraic geometry we refer the reader to [7], an introduction to these topics at the undergraduate level. For basic real algebraic geometry concepts such as semialgebraic sets and the Tarski–Seidenberg quantifier elimination, see [12]. What we provide below is a brief tour through some of the algebraic themes that arise in this book with the goal of giving the absolute newcomer a quick grasp of the concepts. For a more serious appreciation of these topics, the reader is referred to the above-mentioned books.

A.4.1 Monomials, Polynomials, and the Polynomial Ring

A monomial in the $n$ variables $x_1, \ldots, x_n$ (abbreviated as $x$) is a product $x^a := x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}$, where $a = (a_1, \ldots, a_n) \in \mathbb{N}^n$. A polynomial in $x_1, \ldots, x_n$ with coefficients in a field $k$ is a finite linear combination of the form $f := \sum_a c_a x^a$, where $c_a \in k$. A monomial $x^a$ is in the support of $f$ if $c_a \neq 0$ in the expression $f = \sum_a c_a x^a$. The degree of $f = \sum_a c_a x^a$ is the maximum $L_1$-norm of the vectors $a$ that appear as exponents of monomials in the support of $f$. The usual fields considered in this book are the set of real numbers, denoted $\mathbb{R}$, and the set of complex numbers, denoted $\mathbb{C}$. In what follows, we assume that the field $k$ is either $\mathbb{C}$ or $\mathbb{R}$. The polynomial ring $k[x] := k[x_1, \ldots, x_n]$ is the set of all polynomials in $x_1, \ldots, x_n$ with coefficients in $k$. It is endowed with the two binary operations of addition and multiplication of pairs of polynomials.

Groups, rings, and fields are basic objects in abstract algebra that satisfy an increasing list of properties. See, for instance, [8] for definitions and examples. A binary operation $*$ on a set $S$ is
- associative if $(f * g) * h = f * (g * h)$ for all $f, g, h \in S$, and
- commutative if $f * g = g * f$ for all $f, g \in S$.

The pair $(S, *)$
- has an identity if there exists an element $e \in S$ such that $f * e = e * f = f$ for all $f \in S$, and
- has inverses if for each $f \in S$, there exists an element $f^{-1} \in S$ such that $f * f^{-1} = f^{-1} * f = e$.

Definition A.27.
- A set $G$ with a binary operation $*$ is a group if $*$ is associative and $(G, *)$ has an identity and inverses. If, in addition, $*$ is commutative in $G$, then $G$ is called a commutative group.

- A set $R$ with two binary operations $+$ (addition) and $\cdot$ (multiplication) is a ring if $(R, +)$ is a commutative group, and $\cdot$ is associative and distributes over $+$ in the sense that $f \cdot (g + h) = f \cdot g + f \cdot h$ for all $f, g, h \in R$. If $(R, \cdot)$ has an identity and/or $\cdot$ is commutative, then $R$ is a ring with identity and/or a commutative ring.
- A field is a ring $(F, +, \cdot)$ in which $(F \setminus \{0\}, \cdot)$ is also a commutative group with identity.

The set of integers under addition and multiplication, $(\mathbb{Z}, +, \cdot)$, forms a commutative ring with identity: $(\mathbb{Z}, +)$ is a commutative group (with 0 as its additive identity, and for each $z \in \mathbb{Z}$, $-z$ is the additive inverse of $z$), and 1 is the multiplicative identity in $(\mathbb{Z}, +, \cdot)$. No element $z \in \mathbb{Z}$, $z \neq \pm 1$, has a multiplicative inverse. On the other hand, $(\mathbb{R}, +, \cdot)$ and $(\mathbb{C}, +, \cdot)$ are fields. The set of $n \times n$ matrices under matrix addition and multiplication forms a noncommutative ring with identity.

The polynomial ring $k[x]$ is a commutative ring with identity under addition and multiplication of pairs of polynomials. The empty monomial $x_1^0 \cdots x_n^0 = 1 \in k[x]$, and hence $k$ is a subset of $k[x]$, called the set of scalars in $k[x]$. It is customary to denote $f \cdot g$ as just $fg$ when the multiplication operation is clear. The ring $k\langle x \rangle$ denotes the free ring where the variables $x_1, \ldots, x_n$ do not commute; i.e., the relation $x_i x_j = x_j x_i$ is not assumed. The free ring (and also $k[x]$) is an example of an algebra, which is a ring that is also a vector space over its field of scalars. Hence it is often called the free algebra in $n$ variables over $k$. This noncommutative ring plays a central role in Chapter 8.

A.4.2 Polynomial Ideals, Gröbner Bases, and Quotient Rings

Definition A.28.
1. A subset $I \subseteq k[x]$ is an ideal if it satisfies the following properties:
   - $0 \in I$.
   - If $f, g \in I$, then $f + g \in I$.
   - If $f \in I$ and $h \in k[x]$, then $hf \in I$.
2. The ideal generated by $f_1, \ldots, f_t \in k[x]$ is the set $I = \left\{ \sum_{i=1}^{t} h_i f_i : h_i \in k[x] \right\}$, denoted as $\langle f_1, \ldots, f_t \rangle$.

Check that $\langle f_1, \ldots, f_t \rangle$ is an ideal in $k[x]$. A simple example of an ideal in the polynomial ring $\mathbb{R}[x_1, x_2]$ is the set of all polynomials that evaluate to 0 at the point $(0, 0)$. This ideal consists of all polynomials of the form $x_1 f + x_2 g$, where $f, g \in k[x]$, and hence equals $\langle x_1, x_2 \rangle$. An ideal $I \subseteq k[x]$ is finitely generated if it is generated by a finite set of polynomials in $k[x]$. A generating set of an ideal $I$ is called a basis of $I$. An ideal can have bases of different cardinalities and, unlike a vector space basis, an ideal basis is just a generating set with no independence requirements.

Theorem A.29 (Hilbert's basis theorem). If $k$ is a field, then every ideal in $k[x]$ is finitely generated (has a finite basis).

Here are two important types of ideals.

Definition A.30. A polynomial $f = \sum_a c_a x^a$ is homogeneous if all monomials in its support have the same degree. An ideal $I \subseteq k[x]$ is homogeneous if it is generated by homogeneous polynomials.

Definition A.31. An ideal $I$ is principal if it is generated by a single polynomial.

Gröbner bases are special bases for a polynomial ideal. They enable many algorithms in computational algebraic geometry.

Definition A.32. A term order $\succ$ on $k[x]$ is a total ordering on the monomials in $k[x]$ such that
- $x^a \succ 1$ for all $a \neq 0$, and
- if $x^a \succ x^b$, then $x^a x^c \succ x^b x^c$ for all monomials $x^c$.

A common example of a term order is the lexicographic/dictionary order with $x_1 \succ x_2 \succ \cdots \succ x_n$, defined as $x^a \succ x^b$ if and only if the leftmost nonzero entry in $a - b$ is positive. Note that there are $n!$ lexicographic term orders on $k[x]$. A term order needed in Chapter 7 is the total degree order, which first sorts monomials by degree and then breaks ties using a fixed term order such as the above lexicographic order. More precisely, $x^a \succ x^b$ if and only if either $\deg(x^a) > \deg(x^b)$, or $\deg(x^a) = \deg(x^b)$ and $x^a$ is lexicographically larger than $x^b$.
Definition A.33. Fix a term order $\succ$ on $k[x]$.
- The initial term $\mathrm{in}_\succ(f)$ of a polynomial $f = \sum_a c_a x^a \in k[x]$ with respect to $\succ$ is the term $c_a x^a$ with $c_a \neq 0$ such that $x^a \succ x^b$ for all other monomials $x^b$ in the support of $f$. The monomial $x^a$ is called the initial monomial of $f$.
- The initial ideal $\mathrm{in}_\succ(I)$ is the ideal generated by the initial monomials of all polynomials in $I$.

By Hilbert's basis theorem, the initial ideal $\mathrm{in}_\succ(I)$ is finitely generated. In fact, it has a unique set of minimal generators that are all monomials.

Definition A.34. A Gröbner basis $G_\succ$ of a polynomial ideal $I \subseteq k[x]$ with respect to the term order $\succ$ is a finite set of polynomials $g_1, \ldots, g_t \in I$ such that $\langle \mathrm{in}_\succ(g_1), \ldots, \mathrm{in}_\succ(g_t) \rangle = \mathrm{in}_\succ(I)$.

Each term order gives rise to a reduced Gröbner basis of $I$, which is unique. The Gröbner bases returned by a computer algebra package such as Macaulay2 are usually reduced. In the 1960s Buchberger provided an algorithm to find a Gröbner basis of an ideal given a term order. This algorithm underlies the Gröbner basis functionality in modern computer algebra packages such as Macaulay2, SINGULAR, Maple, Mathematica, etc.
Example A.35. An example of a reduced Gröbner basis with respect to the total degree ordering was given in Chapter 7. Consider the ideal
$$I = \langle x^4 - y^2 - z^2,\; x^4 + x^2 + y^2 - 1 \rangle.$$
Using Macaulay2 [10] one can calculate a total degree reduced Gröbner basis of $I$ as follows:

Macaulay2, version 1.3

i1 : R = QQ[x,y,z,Weights => {1,1,1}];

i2 : I = ideal(x^4-y^2-z^2, x^4+x^2+y^2-1);

i3 : G = gens gb I

o3 = | x2+2y2+z2-1 4y4+4y2z2+z4-5y2-3z2+1 |

which says that a total degree Gröbner basis consists of the two polynomials
$$x^2 + 2y^2 + z^2 - 1 \quad \text{and} \quad 4y^4 + 4y^2 z^2 + z^4 - 5y^2 - 3z^2 + 1.$$
The reduced Gröbner basis of $I$ would have the property that no initial term of an element is divisible by the initial term of another element and that all initial terms have unit coefficients. Hence the reduced Gröbner basis of $I$ is
$$\left\{ x^2 + 2y^2 + z^2 - 1,\;\; y^4 + y^2 z^2 + \tfrac14 \left(z^4 - 5y^2 - 3z^2 + 1\right) \right\}.$$
In particular, the initial ideal of $I$ with respect to this term order is $\langle x^2, y^4 \rangle$. Check that both elements in the Gröbner basis lie in the ideal $I$.
Gröbner bases enable a multitude of computations with ideals, such as checking whether a polynomial lies in an ideal (ideal membership), checking whether an ideal equals the whole ring, finding all roots of a system of polynomial equations, finding the intersection of two ideals, etc. Ideal membership relies on a multivariate division algorithm that computes the remainder (called a normal form) of a polynomial $f$ with respect to a Gröbner basis. A polynomial $f$ lies in $I$ if and only if the normal form of $f$ (with respect to any reduced Gröbner basis of $I$) is zero. This in turn relies on the fact that the normal form of a polynomial with respect to a reduced Gröbner basis of $I$ is unique.

Example A.36. The normal form of the monomial $x^2 y$ with respect to the Gröbner basis in Example A.35 is obtained by successively dividing out the initial monomial $\mathrm{in}_\succ(g)$ of an element $g := \mathrm{in}_\succ(g) - g'$ in the reduced Gröbner basis from $x^2 y$ and multiplying with $g'$. Let $g_1 := x^2 + 2y^2 + z^2 - 1$ and $g_2 := y^4 + y^2 z^2 + \frac14 (z^4 - 5y^2 - 3z^2 + 1)$. Then $x^2 y$ can be divided by $g_1$ to give $-2y^3 - yz^2 + y$. The resulting initial term $-2y^3$ cannot be divided by either $\mathrm{in}_\succ(g_1)$ or $\mathrm{in}_\succ(g_2)$, which implies that the normal form of $x^2 y$ is $-2y^3 - yz^2 + y$.
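The same computation can be reproduced in other systems (a sketch using SymPy's groebner and reduced functions, whose availability and exact output normalization are assumptions; "grlex" is SymPy's name for the total degree order used above):

from sympy import symbols, groebner, reduced

x, y, z = symbols("x y z")
F = [x**4 - y**2 - z**2, x**4 + x**2 + y**2 - 1]

G = groebner(F, x, y, z, order="grlex")   # reduced Groebner basis over QQ
_, r = reduced(x**2 * y, list(G.exprs), x, y, z, order="grlex")
print(G.exprs)   # expect x^2 + 2y^2 + z^2 - 1 and the monic quartic in y, z
print(r)         # expect -2*y**3 - y*z**2 + y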

Given an ideal $I$ in a polynomial ring $k[x]$, one can compute the quotient ring $k[x]/I$, which consists of all equivalence classes of polynomials in $k[x]$ mod the ideal $I$. Given two polynomials $f, g \in k[x]$, $f$ is equivalent to $g$ mod $I$ if $f - g \in I$. This is denoted as $f \equiv g \bmod I$, and the equivalence class of $f$ mod $I$ is denoted as $f + I$. This notion is a generalization of the familiar modular arithmetic in the ring of integers, where we say that $z, z' \in \mathbb{Z}$ are equivalent mod a fixed integer $p$ if $z - z'$ is an integer multiple of $p$. In this case the ideal $I$ (in the ring of integers $\mathbb{Z}$) is the ideal generated by $p$, namely the set consisting of all integer multiples of $p$. If $f'$ is the normal form of a polynomial $f$ with respect to a reduced Gröbner basis of an ideal $I$ in $k[x]$, then $f - f' \in I$ and hence $f \equiv f' \bmod I$. Since the normal form of a polynomial with respect to a reduced Gröbner basis is unique, if $f - g \in I$, then the normal form of $f - g$ is zero, which implies that both $f$ and $g$ have the same normal form. Hence every equivalence class of polynomials mod $I$ can be represented by the unique normal form of all the elements in that class with respect to a fixed reduced Gröbner basis of $I$.

Example A.37. In Example A.35, the equivalence class of $x^2 y$ mod $I$ consists of all polynomials $g \in \mathbb{Q}[x, y, z]$ such that $x^2 y - g \in I$. In other words, $x^2 y + I$ is the set of all $g \in \mathbb{Q}[x, y, z]$ with normal form $-2y^3 - yz^2 + y$ with respect to the reduced Gröbner basis
$$\left\{ g_1 := x^2 + 2y^2 + z^2 - 1,\;\; g_2 := y^4 + y^2 z^2 + \tfrac14 \left(z^4 - 5y^2 - 3z^2 + 1\right) \right\}.$$

The quotient ring $k[x]/I$ is a $k$-vector space. Addition in the ring is defined as $(f + I) + (g + I) = (f + g) + I$ and scalar multiplication as $\lambda(f + I) = \lambda f + I$ for all $\lambda \in k$. A primary use of Gröbner bases is that they provide a vector space basis for $k[x]/I$ in the following sense. Fix a term order $\succ$ on $k[x]$ and consider the initial ideal $\mathrm{in}_\succ(I)$ of the ideal $I$. Recall that this initial ideal is generated by monomials. The monomials in $k[x]$ that do not lie in $\mathrm{in}_\succ(I)$ are called the standard monomials of $\mathrm{in}_\succ(I)$. The equivalence classes $m + I$, as $m$ varies over the standard monomials of $\mathrm{in}_\succ(I)$, form a vector space basis of $k[x]/I$. Buchberger's algorithm for Gröbner bases was motivated by the quest to find vector space bases for $k[x]/I$. It is easy to see why the equivalence classes of standard monomials provide a vector space basis for $k[x]/I$. We saw earlier that once a term order $\succ$ is fixed, every equivalence class $f + I$ has a unique representative $f' + I$, where $f'$ is the normal form of $f$ with respect to the reduced Gröbner basis $G_\succ$ of $I$ corresponding to $\succ$. Note that $f'$ cannot be divided by $\mathrm{in}_\succ(g)$ for any $g \in G_\succ$, and hence all its monomials are standard with respect to $\mathrm{in}_\succ(I)$. This shows that the elements $m + I$ span $k[x]/I$. If a collection of them were linearly dependent, then there would exist standard monomials $m_1, \ldots, m_t$ and scalars $\lambda_1, \ldots, \lambda_t$ such that $\sum_{i} \lambda_i (m_i + I) = 0 + I$, or equivalently, $\sum_{i=1}^{t} \lambda_i m_i \in I$. However, if $\sum_{i=1}^{t} \lambda_i m_i \in I$, then its normal form with respect to $G_\succ$ is zero, which implies that some $m_i$ is divisible by some $\mathrm{in}_\succ(g)$ for $g \in G_\succ$, which is a contradiction.
Example A.38. The vector space $\mathbb{Q}[x, y, z]/I$ for the ideal in Example A.35 has infinite dimension. The initial ideal for the total degree order used in this example is $\mathrm{in}_\succ(I) = \langle x^2, y^4 \rangle$. Hence the standard monomials of this initial ideal are all monomials in $x, y, z$ that are not divisible by $x^2$ or $y^4$. There are infinitely many such monomials, since all powers of $z$ are standard. Regardless, an infinite basis of $\mathbb{Q}[x, y, z]/I$ consists of $m + I$ as $m$ varies over the standard monomials of $\mathrm{in}_\succ(I)$.

A.4.3 Algebraic Varieties

Definition A.39. Given an ideal $I = \langle f_1, \ldots, f_t \rangle \subseteq k[x]$, its affine variety in $k^n$ is the set $V_k(I) := \{p \in k^n : f(p) = 0 \text{ for all } f \in I\}$.

It can be checked that $V_k(I) = \{p \in k^n : f_1(p) = \cdots = f_t(p) = 0\}$, and it is hence the set of solutions (zeros) in $k^n$ of the system of polynomial equations $f_1(x) = \cdots = f_t(x) = 0$. The affine variety of a principal ideal $\langle f \rangle \subseteq k[x]$ is called a hypersurface in $k^n$ and denoted simply as $V_k(f)$.

Example A.40. The affine variety of the ideal $\langle x_i^2 - x_i \text{ for all } i = 1, \ldots, n \rangle \subseteq \mathbb{R}[x_1, \ldots, x_n]$ is the set of all 0/1 vectors in $\mathbb{R}^n$.

If $f$ is a homogeneous polynomial in $k[x]$, then for every $p \in k^n$ such that $f(p) = 0$, we also have that $f(\lambda p) = 0$ for all $\lambda \in k$. Hence, solutions of $I$ come in lines through the origin, and it makes sense to declare all points on a line through the origin in $k^n$ as being equivalent. This leads to projective geometry, where we replace $k^n$ with the projective space $\mathbb{P}^{n-1}_k$, whose points are in bijection with the distinct lines through the origin in $k^n$. The homogeneous coordinates of the point in $\mathbb{P}^{n-1}_k$ corresponding to the line spanned by $(x_1, \ldots, x_n)$ are denoted as $(x_1 : \cdots : x_n)$ to indicate that they are unique only up to scalar multiplication. For details on the construction of projective spaces, see Chapter 8 in [7]. If a polynomial $p$ is not homogeneous, then it is not true that $p(x) = 0$ implies $p(\lambda x) = 0$ for all $\lambda \neq 0$.

Definition A.41. The projective variety of a homogeneous ideal $I \subseteq k[x]$ is $\{p \in \mathbb{P}^{n-1}_k : f(p) = 0 \text{ for all } f \in I\}$.

We do not introduce any notation for projective varieties here, as we will not discuss them in this appendix. Chapter 8 in [7] gives an introduction to projective varieties and their relationship to affine varieties. As for affine varieties and their ideals, Gröbner bases play an important role in computations involving projective varieties and their (homogeneous) ideals.

Example A.42. The homogeneous principal ideal $I = \langle yz - x^2 \rangle \subseteq \mathbb{C}[x, y, z]$ contains all lines spanned by the points $(t, t^2, 1)$, $t \in \mathbb{C}$, in its affine variety in $\mathbb{C}^3$. Its projective variety is $\{(t : t^2 : 1) : t \in \mathbb{C}\} \cup \{(0 : u : 0) : u \in \mathbb{C}\} \subseteq \mathbb{P}^2_{\mathbb{C}}$.

A field $k$ is algebraically closed if every polynomial in $k[x]$ has all its roots in $k^n$. The field $\mathbb{C}$ is algebraically closed while $\mathbb{R}$ is not. Every ideal $I \subseteq k[x]$ has an affine variety $V_k(I) \subseteq k^n$, although different ideals can have the same affine variety. For instance, both $\langle x, y \rangle$ and $\langle x^2, y^2 \rangle$ in $\mathbb{C}[x, y]$ have the affine variety $\{(0, 0)\}$ in $\mathbb{C}^2$.

Given a variety W k n , its vanishing ideal,


I(W ) := {f k[x] : f (p) = 0 for all p W }
is the set of all polynomials in k[x] that vanish on W . Check that I I(Vk (I)) and
that Vk (I(Vk (I))) = Vk (I). The ideal I(Vk (I)) is the largest ideal with the ane
variety Vk (I). This vanishing ideal has the important property that if f m belongs
to it, then so does f since f m (p) = 0 for all p Vk (I) implies that f (p) = 0 for all
p Vk (I).
Denition A.43. The radical of an ideal I k[x] is

I := {f k[x] : f m I for some positive integer m}.

An ideal I is radical if I = I.

The radical I is an ideal and both I and I have the same ane variety.
Further, the vanishing ideal I(Vk (I)) is a radical ideal. The following theorem shows
that when k is an algebraically closed eld, there is a bijection between radical ideals
in k[x] and ane varieties in k n .
Theorem A.44 (Hilbertsstrong Nullstellensatz). If k is an algebraically
closed eld, then I(Vk (I)) = I.
The following example points out the importance of k being algebraically
closed in the above Nullstellensatz.
Example A.45. The ideal I = x2 + y 2  C[x, y] is radical. Its ane variety
in R2 is {(0, 0)}, whose vanishing ideal is J = x, y and J = I. On the other
hand, the ane variety of I in C2 consists of the two lines x = iy whose vanishing
ideal is I.
There is a strong Nullstellensatz for projective varieties as well that has the same statement. However, there is also a weak Nullstellensatz characterizing empty varieties, whose statements differ for affine and projective varieties. We refer the reader to [7, Chapter 8] for details.
Theorem A.46 (Hilbert's weak Nullstellensatz). Let $k$ be an algebraically closed field.

1. If $I$ is an ideal in $k[x]$, then its affine variety $V_k(I) \subseteq k^n$ is empty if and only if $I = k[x]$.

2. If $I$ is a homogeneous ideal in $k[x]$, then its projective variety in $\mathbb{P}^{n-1}_k$ is empty if and only if for each $i = 1, \dots, n$ there is a monomial $x_i^{m_i} \in I$, where $m_i$ is some nonnegative integer.
To end this subsection, we briefly discuss the notions of dimension, degree, and singular points of an algebraic variety. These notions are too subtle to be explained
correctly here, and we refer the reader to [7, Chapter 9]. The dimension and degree of a variety can be computed from an algebraic entity called the Hilbert polynomial of the vanishing ideal of the variety. A key feature of Gröbner basis theory is that an ideal $I$ and all its initial ideals have the same Hilbert polynomial, and this polynomial has a combinatorial expression that can be computed from the standard monomials of any of its initial ideals. Intuitively, the dimension of an ideal is the dimension of the largest component of its affine variety. For instance, we expect a hypersurface in $k^n$ to have dimension $n - 1$ since it is constrained by a single polynomial. However, when the field is not algebraically closed, this intuition can be wrong. For instance, $V_{\mathbb{R}}(\langle x^2 + y^2 \rangle) = \{(0, 0)\}$ is a zero-dimensional variety in $\mathbb{R}^2$ while $V_{\mathbb{C}}(\langle x^2 + y^2 \rangle)$ is a one-dimensional variety in $\mathbb{C}^2$.
The degree of a variety is also defined from the Hilbert polynomial of the vanishing ideal. Intuitively, we expect that slicing an $r$-dimensional variety in $k^n$ with a generic plane of dimension $n - r$ through the origin creates finitely many intersections. The number of intersection points is constant if the plane is generic enough, and this number is intuitively the degree of the variety. For instance, the parabola defined by $y - x^2$ has two points of intersection with a generic line through the origin, saying that its degree is two, while the cubic curve $y = x^3$ cuts out a variety of degree three.
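Both counts can be reproduced symbolically; the following sympy sketch (assuming sympy) intersects each curve with the generic line $y = tx$ through the origin.

```python
# Sketch (assumes sympy): count intersections with the generic line y = t*x.
import sympy as sp

x, y, t = sp.symbols('x y t')
print(sp.solve([y - x**2, y - t*x], [x, y]))  # 2 points: degree two
print(sp.solve([y - x**3, y - t*x], [x, y]))  # 3 points: degree three
```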
A nonsingular (also called regular or smooth) point $p$ on a variety $W$ is a point where the tangent space to $W$ at $p$ has the same dimension as the component of $W$ containing $p$, and hence serves as a reasonable linear approximation to this component near $p$. For a polynomial $f \in k[x]$, let $\nabla f := \left(\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}\right)$ be the gradient of $f$ and $\nabla f(p) \in k^n$ be the evaluation of $\nabla f$ at $p \in k^n$. Since the structure of a variety is unchanged by translation, we may assume without loss of generality that $p = 0$. If $I(W) = \langle f_1, \dots, f_s \rangle$, then the tangent space of $W$ at $p$ is the null space of the matrix $J(0)$ whose rows are $\nabla f_1(0), \dots, \nabla f_s(0)$. The matrix $J$ whose rows are the polynomials $\nabla f_1, \dots, \nabla f_s$ is called the Jacobian matrix of $f_1, \dots, f_s$. Thus the rank of $J(0)$ determines whether $0$ is singular on $W$ or not. In particular, $0$ is a singular point on a hypersurface $V_k(f)$ if and only if $\nabla f(0) = 0$.
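For a concrete hypersurface, consider the nodal cubic $f = y^2 - x^3 - x^2$ (our choice, for illustration): its gradient vanishes at the origin but not at the smooth point $(-1, 0)$. A sympy sketch (assuming sympy) computes the $1 \times 2$ Jacobian and evaluates it at both points.

```python
# Sketch (assumes sympy): Jacobian test for singularity on V(f),
# with f = y^2 - x^3 - x^2 (the nodal cubic, chosen for illustration).
import sympy as sp

x, y = sp.symbols('x y')
f = y**2 - x**3 - x**2
J = sp.Matrix([f]).jacobian([x, y])   # [-3x^2 - 2x, 2y]

print(J.subs({x: 0, y: 0}))    # Matrix([[0, 0]])  -> origin is singular
print(J.subs({x: -1, y: 0}))   # Matrix([[-1, 0]]) -> (-1, 0) is nonsingular
```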

A.4.4 Real Algebraic Geometry

A good deal of the algebraic geometry that appears in this book is over $\mathbb{R}$, which is not an algebraically closed field. As a result, many of the theorems that apply over $\mathbb{C}$ do not work in this setting, making the study of real varieties and their ideals trickier than that of their complex counterparts. A good introduction to the real algebraic geometry background needed in this book is [12]. We define a few of the key concepts and results.
Definition A.47. A set $S \subseteq \mathbb{R}^n$ defined as $S = \{x \in \mathbb{R}^n : f_i(x) \bowtie_i 0,\ i = 1, \dots, t\}$, where, for each $i$, $\bowtie_i$ is one of $\geq$, $>$, $=$, or $\neq$, and $f_i(x) \in \mathbb{R}[x]$, is called a basic semialgebraic set. A basic closed semialgebraic set is a set of the form $S = \{x \in \mathbb{R}^n : f_1(x) \geq 0, \dots, f_t(x) \geq 0\}$.
Every basic semialgebraic set can be expressed with polynomial inequalities of the form $f(x) \geq 0$ together with a single inequation $g(x) \neq 0$: an equation $f = 0$ becomes the pair $f \geq 0$ and $-f \geq 0$, a strict inequality $f > 0$ becomes $f \geq 0$ together with $f \neq 0$, and finitely many inequations $g_i \neq 0$ combine into the single condition $\prod_i g_i(x) \neq 0$. In this book we only encounter basic semialgebraic sets in which $\bowtie$ is either $>$, $\geq$, or $=$. Note that every real algebraic variety is a basic closed semialgebraic set.
Definition A.48. A finite union of basic semialgebraic sets in $\mathbb{R}^n$ is called a semialgebraic set, and a finite union of basic closed semialgebraic sets is a closed semialgebraic set.
Semialgebraic sets are closed under finite unions, finite intersections, and complementation. The following theorem shows that semialgebraic sets are also closed under projections, a fact that is used several times in this book. For more details see [12] and [4].
Theorem A.49 (Tarski–Seidenberg theorem). Let $S \subseteq \mathbb{R}^{k+n}$ be a semialgebraic set and $\pi : \mathbb{R}^{k+n} \to \mathbb{R}^n$ be the projection map that sends $(y, x) \mapsto x$. Then $\pi(S)$ is a semialgebraic set in $\mathbb{R}^n$.
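For example, project the disk $S = \{(x, y) : x^2 + y^2 \leq 1\}$ onto the $x$-axis: a point $x$ has a preimage exactly when $y^2 \leq 1 - x^2$ is solvable, i.e., when $1 - x^2 \geq 0$, so $\pi(S) = [-1, 1]$ is again semialgebraic. A sympy sketch (assuming sympy; the disk is our illustrative choice) solves the eliminated inequality.

```python
# Sketch (assumes sympy): the projection of the unit disk is described by
# the polynomial inequality 1 - x^2 >= 0 in the remaining variable.
import sympy as sp

x = sp.symbols('x', real=True)
print(sp.solve(1 - x**2 >= 0, x))  # (-1 <= x) & (x <= 1)
```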
Recall that $\Sigma$ denotes the set of sums of squares of polynomials in $\mathbb{R}[x]$.
Definition A.50. The preorder associated with a finite set of polynomials $f_1, \dots, f_t \in \mathbb{R}[x]$ is the set

$$\operatorname{preorder}(f_1, \dots, f_t) := \Big\{ \sum_{\delta} s_\delta\, f_1^{\delta_1} \cdots f_t^{\delta_t} : \delta = (\delta_1, \dots, \delta_t) \in \{0, 1\}^t \text{ and } s_\delta \in \Sigma \Big\}.$$
All the polynomials in $\operatorname{preorder}(f_1, \dots, f_t)$ are nonnegative on the basic closed semialgebraic set $S = \{x \in \mathbb{R}^n : f_1(x) \geq 0, \dots, f_t(x) \geq 0\}$.
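For instance, with $t = 1$ and $f_1 = 1 - x^2$, the element $p = s_0 + s_1 f_1$ with the sums of squares $s_0 = (x - 1)^2$ and $s_1 = x^2$ (squares chosen here just for illustration) lies in the preorder, hence is nonnegative on $S = [-1, 1]$; a sympy spot check (assuming sympy) follows.

```python
# Sketch (assumes sympy): a preorder element for f1 = 1 - x^2 evaluated
# at a few sample points of S = [-1, 1]; all values are nonnegative.
import sympy as sp

x = sp.symbols('x')
p = (x - 1)**2 + x**2 * (1 - x**2)   # s0 + s1*f1 with s0, s1 sums of squares
samples = [-1, -sp.Rational(1, 2), 0, sp.Rational(1, 2), 1]
print([p.subs(x, v) for v in samples])  # [4, 39/16, 1, 7/16, 0]
```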
Definition A.51. The real radical of an ideal $I = \langle f_1, \dots, f_t \rangle$ is the ideal

$$\sqrt[\mathbb{R}]{I} := \{f \in \mathbb{R}[x] : f^{2m} + \sigma \in I \text{ for some } \sigma \in \Sigma \text{ and some nonnegative integer } m\}.$$

The ideal $I$ is said to be real radical if $I = \sqrt[\mathbb{R}]{I}$.


We conclude with the Positivstellensatz and the real Nullstellensatz, which play, for semialgebraic sets and real varieties, the role that Hilbert's Nullstellensatz plays for complex varieties.
Theorem A.52 (Positivstellensatz). Let $f_1, \dots, f_t \in \mathbb{R}[x]$, let $S = \{x \in \mathbb{R}^n : f_1(x) \geq 0, \dots, f_t(x) \geq 0\}$, and let $T$ be the preorder associated to $f_1, \dots, f_t$. For a polynomial $f \in \mathbb{R}[x]$:

1. $f > 0$ on $S$ if and only if there exist $p, q \in T$ such that $pf = 1 + q$;

2. $f \geq 0$ on $S$ if and only if there exist an integer $m \geq 0$ and $p, q \in T$ such that $pf = f^{2m} + q$;
3. $f = 0$ on $S$ if and only if there exists an integer $m \geq 0$ such that $-f^{2m} \in T$;

4. $S = \emptyset$ if and only if $-1 \in T$.
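An explicit certificate for item 4 (a toy example of ours, verified below with sympy under the assumption that it is available): for $f_1 = -x^2 - 1$ the set $S$ is empty, and indeed $-1 = s_0 + s_1 f_1$ with the sums of squares $s_0 = x^2$ and $s_1 = 1$, so $-1 \in T$.

```python
# Sketch (assumes sympy): verify the Positivstellensatz certificate
# -1 = x^2 + 1*(-x^2 - 1) for the empty set {x : -x^2 - 1 >= 0}.
import sympy as sp

x = sp.symbols('x')
s0, s1, f1 = x**2, sp.Integer(1), -x**2 - 1
print(sp.expand(s0 + s1*f1))  # -1, hence -1 lies in the preorder T
```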
Corollary A.53 (Real Nullstellensatz). If $I$ is an ideal in $\mathbb{R}[x]$, then its real radical ideal $\sqrt[\mathbb{R}]{I}$ is the largest ideal that vanishes on $V_{\mathbb{R}}(I)$.

Recall that for $I \subseteq \mathbb{R}[x]$, the radical ideal $\sqrt{I}$ of $I$ in $\mathbb{R}[x]$ is the largest ideal that vanishes on the complex variety $V_{\mathbb{C}}(I)$. Therefore, since $V_{\mathbb{R}}(I) \subseteq V_{\mathbb{C}}(I)$, we have that $I \subseteq \sqrt{I} \subseteq \sqrt[\mathbb{R}]{I}$.
The Positivstellensatz also gives a simple solution to Hilbert's 17th problem, which asked whether every nonnegative polynomial in $\mathbb{R}[x]$ can be written as a sum of squares of rational functions in $x$. This was answered in the affirmative by Artin in 1927. The two-variable case was settled by Hilbert in 1893.

Bibliography
[1] A. Barvinok. A remark on the rank of positive semidefinite matrices subject to affine constraints. Discrete Comput. Geom., 25:23–31, 2001.
[2] A. Barvinok. A Course in Convexity, Grad. Stud. Math. 54. American Mathematical Society, Providence, RI, 2002.
[3] D. P. Bertsekas, A. Nedic, and A. E. Ozdaglar. Convex Analysis and Optimization. Athena Scientific, Belmont, MA, 2003.
[4] J. Bochnak, M. Coste, and M.-F. Roy. Real Algebraic Geometry. Springer, Berlin, 1998.
[5] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, UK, 2004.
[6] R. W. Cottle. Manifestations of the Schur complement. Linear Algebra Appl., 8:189–211, 1974.
[7] D. Cox, J. Little, and D. O'Shea. Ideals, Varieties and Algorithms. Springer-Verlag, New York, 1992.
[8] D. S. Dummit and R. M. Foote. Abstract Algebra. Prentice Hall, Englewood Cliffs, NJ, 1991.
[9] G. H. Golub and C. F. Van Loan. Matrix Computations, 3rd edition. Johns Hopkins University Press, 1996.
[10] D. R. Grayson and M. E. Stillman. Macaulay 2, a software system for research in algebraic geometry. Available at http://www.math.uiuc.edu/Macaulay2/.
[11] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge, UK, 1995.
[12] M. Marshall. Positive Polynomials and Sums of Squares. Math. Surveys Monogr. 146, American Mathematical Society, Providence, RI, 2008.
[13] R. T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, NJ, 1970.
[14] R. Schneider. Convex Bodies: The Brunn–Minkowski Theory. Cambridge University Press, Cambridge, UK, 1993.
[15] G. Strang. Introduction to Linear Algebra, 4th edition. Wellesley-Cambridge Press, 2009.
Index
A-discriminant, 217, 224
adjoint, 209
affine linear pencil, 353
algebraic boundary, 205, 207, 211, 224, 226
algebraic degree, 220
semidefinite programming, 236
algebraic interior, 255
algebraic set, 294
analytic center, 239
analytic polynomial, 357
Ando theorem, 439
Archimedean property, 115, 277
Archimedean semiring, 437
atoms, 39
Bergman kernel, 414
biduality, 209, 211, 217, 243
binary optimization, 28
bitangent line, 225
Blaschke product, 419
border vector, 371, 374
border vector–middle matrix, 371
bounded degree representation, 277
Putinar–Prestel, 278
Schmüdgen, 278
calibrated geometry, 323
Cayley's cubic surface, 232
Cayley–Bacharach relations, 174
characteristic polynomial, 447
characteristic vector, 330
Chebyshev inequality, 139
Cholesky, see decomposition
CHSY lemma, 380
clamped second fundamental form, 388
clamped tangent plane, 388, 390
closed loop system, 343

combinatorial optimization, 330


commutative collapse, 389
commutator, 349, 362, 365, 369
complementary slackness, 13, 206, 214, 234
completely positive matrix, see matrix
complex symmetric linear transform, 411
concave function, 451
cone, 209
dual, 117, 209
Lorenz, 253; see second-order
pointed, 21
proper, 7, 21
second-order, 22, 211, 253
semidefinite, 231
solid, 21
sums of squares; see sums of squares cone
congruence transformation, 448
conic programming, 19
conical hull, 5
conormal variety, 215
convex body, 211
dual, 211
convex boundary, 288
convex forms
cone of, 195
volume of, 196
convex function, 451
convex hull, 5
convex optimization problem, 453
convex polynomial, 350, 354–356, 377, 398
convex quadrics, 321
convex set, 451
convex sum of squares, 271, 274
copositive matrix, see matrix
corner point, 265
correlation matrix, 209, 232, 234
curvature, 264
nonnegative, 264
positive, 264
cyclic forms, 134
cyclohexatope, 238
decomposition
LDL^T, 449
Cholesky, 449
eigenvalue, 449
dening polynomial, 255
dehomogenization, 211, 215
density matrix, 140
dimension free, 342, 344
directional derivative, 362, 365, 367
discriminant, 217
dissipative system, 344
domain of regularity, 358
dual cone; see cone/dual
dual optimization problem, 213
dual variety, 215, 216, 221
dual vector space, 209
duality, 203
projective, 207
semidenite programming, 22
strong, 22
ellipsoid, 252
elliptope, 15, 232
epigraph, 451
Euclidean distance matrix, 37
face, 210
dual, 210
exposed, 210, 261
proper, 211
facet, 211
Farkas lemma, 111
Fock space, 363, 386
free
analysis, 341
convex algebraic geometry, 341
convexity, 342, 348
positivity, 341
probability, 341, 342, 348
real algebraic geometry, 341
semialgebraic set, 351
variables, 356
full rank point, 388
genus, 329
geometric theorem proving, 142
Gröbner basis, 94, 205, 216, 297
Gram matrix, 61, 379, 387
graph
perfect, 333
Petersen, 35, 337
triangle-free, 335
Grassmannian, 323
Grothendieck constant, 32
Hadamard product, 414
Hermite matrix, 49
Hermite theorem, 416
hermitian linear transform, 409
hermitian structure, 408
Hessian, 355, 376, 387
modied, 392
relaxed, 393
hierarchy of relaxations, 113, 297
Hilbert space factorization, 413
Hilbert space realization, 423
Hilbert's theorem, 162, 325
homogeneous linear pencil, 353
homogenization, 211
hyperbolic, 256
hyperboloid, 261
hyperplane rounding, 30
ideal, 107, 295
congruent mod, 297
initial, 297
Plücker, 323
principal, 323
real radical, 305
StanleyReisner, 334
vanishing, 305
independent set, 34
inequality
linear matrix; see linear matrix inequality
inertia, 49
inertia, law of, 410
inertia of a matrix, 449
inner product, 408
apolar, 67
Bombieri, 67
Fischer, 67
input space, 343
interpolation
analytic, 35
intervals, 86
involution, 352
irredundant, 265
Jacobian matrix, 215
K3-cover subgraph problem, 337
k-ellipse, 17, 254
Karush–Kuhn–Tucker condition, 452
equations, 214
general form, 214
SDP, 206, 234
k-sos mod I, 296
Lagrange multiplier, 213
Lagrangian, 213, 452
Lasserres method, 296
LDL decomposition, 359
leading principal minor, 447
Lifshitz–Krein theorem, 431
lift-and-project methods, 330
lifting vector, 261
line test, 257
linear matrix inequality, 7, 204, 252, 346, 396
monic, 252
linear pencil, 251, 258, 353, 396
ane, 353
homogeneous, 353
monic, 353, 396, 398
symmetric, 353
linear programming, 4, 293
linear system, 342, 343
LMI; see linear matrix inequality
localization, 252, 283
localizing matrix, 273
N th order, 273
Lovász theta function, 34
Lyapunov function, 25, 136
Markov inequality, 139
MATLAB, 300
matrix
completely positive, 131
copositive, 131, 270
Euclidean distance, 37
pseudo-moment, 315
reduced moment, 306
shifted reduced moment, 313
sum of squares, 87
matrix convex, 354
matrix factorizations, 448
matrix inequality, 346
matrix positive, 354
matrix-valued noncommutative polynomials, 352
maximum cut problem, 28, 335
middle matrix, 371, 374, 376
signature, 378
min-max principle, 411
Minkowski sum, 283
Minkowski–Weyl theorem, 5
moment curve, 123
moment matrix, 176, 272
moment spaces, 123
moments, 120, 251, 271
monomial basis, 67
Motzkin form, 162
natural map, 384
NCAlgebra, 366
NCSOStools, 366, 369
NCvars, 369
Nevanlinna–Pick theorem, 35, 427
Newton identities, 49, 50
Newton polytope, 91, 162
noncommutative
basic open semialgebraic set, 351
basic semialgebraic set, 382
convex, 354, 396
polynomial, 349, 352
positive, 354
rational expressions, 358
spectrahedron, 396
nonnegative polynomials
algebraic boundary, 172
boundary structure, 170
cone of, 161
dual cone, 168
exposed faces, 170
on a variety, 296
volume of, 187
nonsingular point, 265
norm
Lp , 212
atomic, 39
dual, 212
Frobenius, 15
nuclear, 15, 39
operator, 15
normal form, 297
normal space, 326
Nullstellensatz, 110
odd cycle, 337
odd wheel, 338
Ono inequality, 143
optimal value function, 207, 220, 224,
235
optimality conditions, 452
output space, 343
partial order, 7
Pataki inequalities, 236
phase shift, 430
polyhedron, 4
polynomial, 349, 352
analytic, 357
concave, 399
convex, 350, 354–356, 362, 368, 377, 396
evaluation, 353, 365
irreducible, 389
linear, 295
linear dependence, 387
noncommutative, 349, 352
positive, 354, 356, 362, 369
symmetric, 350, 352, 354, 356
trigonometric, 63
univariate, 86
vanishing, 354
polynomial identity, 354, 362–364
polynomial matrix inequality, 282
polynomial optimization, 76, 213
univariate, 76, 77
polytope, 4, 211
2-level, 318
k-level, 330
compressed, 319
triangle-free subgraph, 335
positive curvature, 387, 389, 391
positive denite kernel, 411
positive semidenite, 204, 410, 448
positivity set, 396
Positivstellensatz, 112, 347, 348, 397
Schmüdgen, 115, 273, 321
Putinar, 115, 273
preorder, 107
truncated, 264
principal minor, 447
probability bounds, 139
projective space, 215
projective toric variety, 217
proper analytic maps, 440
protrusion, 205
Pólya theorem, 434
quadratic module, 107
truncated, 264
Quadratische Positivstellensatz, 372, 380, 386, 391
quantum
entanglement, 140
phenomena, 342
quasi-concave, 265
strictly, 265
Quillen theorem, 432
R⟨x⟩_k, 352
R⟨x, x*⟩, 356
rank minimization, 39
rational expressions, 358
equivalent, 359
rational function, 77, 359, 365
Bergman, 359
convex, 361, 368
linear dependence, 384
matrix, 359
noncommutative, 359
positive, 361
rational sos decompositions, 69
real Nullstellensatz, 305
real zero, 256
redundant, 265
regular point, 215
Riccati
matrix inequality, 345
polynomial, 357
Riesz–Fejér theorem, 422
Riesz–Herglotz theorem, 421
rigidly convex, 257
root separation, 417
S-lemma, 80
S-procedure, 80
Schönberg matrix, 238
Schur algorithm, 419
Schur complement, 253, 345
Schur theorem, 414
Schur inequality, 142
Schur–Agler class, 438
SDPT3, 41
second fundamental form, 387
clamped, 388
SeDuMi, 41, 301
semialgebraic set, 211, 220, 294, 350, 382
basic closed semialgebraic, 255, 265
basic open semialgebraic, 350
basic semialgebraic, 233
convex, 396
free, 351
noncommutative, 351
semidefinite programming, 7, 233, 293
abstract definition, 235
semidefinite relaxation
Putinar, 273
Schmüdgen, 274
semidefinite representation, 251, 294
separation theorem, 325
shadow area, 323
signal flow diagram, 343
signature of a matrix, 449
simplicial complex, 334
singular
locus, 215
point, 215, 265, 327
Slater's condition, 14, 275
sos-matrix, 87
spectrahedron, 8, 9, 15, 205, 231, 252, 396, 399
lifting, 261
projected, 9, 261, 294
spectral theorem, 409, 427
spectraplex, 15
stability number, 331
stable set, 34, 331
polytope, 331
problem, 331
standard monomials, 297
state space, 343
Steiner's quartic surface, 232
storage function, 345
sum of largest eigenvalues, 262
sum of largest singular values, 262
sum of squares, 57, 296, 342, 347, 355, 369, 387, 396, 397
convexity, 90
mod ideal, 296, 298
on quotient rings, 94
program, 73
sums of squares cone
algebraic boundary, 172
cone of, 161
dual cone, 176
semidenite representation, 177
volume of, 192
symmetric affine linear pencil, 353
symmetric polynomial, 350, 352, 356
symmetric variables, 352
tangent plane
clamped, 388
tensor product, 353
Kronecker, 353
theta
body, 243, 297, 303
body of a graph, 332
number of a graph, 333
Toeplitz matrix, 420
trace, 7
triangle-free subgraph problem, 335
tritangent plane, 227
Trott curve, 225
truncated moment vector, 272
TV screen, 351, 366, 399
unitary transform, 409
valid constraint, 107
variables
classes, 357
free, 356
mixed, 357
variety, 388
compact, 320
noncommutative, 388
real, 295
real algebraic, 294
Varolin theorem, 440
Veronese surface, 224
von Neumann inequality, 425
YALMIP, 300
Zariski closure, 211
zero set, 388; see variety