
Practical Linear Algebra



Practical Linear Algebra
A Geometry Toolbox
Third Edition

Gerald Farin
Dianne Hansford
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2014 by Taylor & Francis Group, LLC


CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works


Version Date: 20130524

International Standard Book Number-13: 978-1-4665-7958-3 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but
the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to
trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained.
If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical,
or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without
written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright
Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a
variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to
infringe.

Visit the Taylor & Francis Web site at


http://www.taylorandfrancis.com

and the CRC Press Web site at


http://www.crcpress.com
With gratitude to
Dr. James Slack and Norman Banemann.
Contents

Preface xiii

Chapter 1 Descartes’ Discovery 1


1.1 Local and Global Coordinates: 2D . . . . . . . . . . 2
1.2 Going from Global to Local . . . . . . . . . . . . . 6
1.3 Local and Global Coordinates: 3D . . . . . . . . . . 8
1.4 Stepping Outside the Box . . . . . . . . . . . . . . . 9
1.5 Application: Creating Coordinates . . . . . . . . . . 10
1.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . 12

Chapter 2 Here and There: Points and Vectors in 2D 15


2.1 Points and Vectors . . . . . . . . . . . . . . . . . . 16
2.2 What’s the Difference? . . . . . . . . . . . . . . . . 18
2.3 Vector Fields . . . . . . . . . . . . . . . . . . . . . . 19
2.4 Length of a Vector . . . . . . . . . . . . . . . . . . . 20
2.5 Combining Points . . . . . . . . . . . . . . . . . . . 23
2.6 Independence . . . . . . . . . . . . . . . . . . . . . 26
2.7 Dot Product . . . . . . . . . . . . . . . . . . . . . . 26
2.8 Orthogonal Projections . . . . . . . . . . . . . . . . 31
2.9 Inequalities . . . . . . . . . . . . . . . . . . . . . . . 32
2.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . 33

Chapter 3 Lining Up: 2D Lines 37


3.1 Defining a Line . . . . . . . . . . . . . . . . . . . . 38
3.2 Parametric Equation of a Line . . . . . . . . . . . . 39
3.3 Implicit Equation of a Line . . . . . . . . . . . . . . 40
3.4 Explicit Equation of a Line . . . . . . . . . . . . . . 44


3.5 Converting Between Parametric and Implicit Equations . . . 44
3.5.1 Parametric to Implicit . . . . . . . . . . . . . . 45
3.5.2 Implicit to Parametric . . . . . . . . . . . . . . 45
3.6 Distance of a Point to a Line . . . . . . . . . . . . . 47
3.6.1 Starting with an Implicit Line . . . . . . . . . . 48
3.6.2 Starting with a Parametric Line . . . . . . . . 50
3.7 The Foot of a Point . . . . . . . . . . . . . . . . . . 51
3.8 A Meeting Place: Computing Intersections . . . . . 52
3.8.1 Parametric and Implicit . . . . . . . . . . . . . 53
3.8.2 Both Parametric . . . . . . . . . . . . . . . . . 55
3.8.3 Both Implicit . . . . . . . . . . . . . . . . . . . 57
3.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . 58

Chapter 4 Changing Shapes: Linear Maps in 2D 61


4.1 Skew Target Boxes . . . . . . . . . . . . . . . . . . 62
4.2 The Matrix Form . . . . . . . . . . . . . . . . . . . 63
4.3 Linear Spaces . . . . . . . . . . . . . . . . . . . . . 66
4.4 Scalings . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.5 Reflections . . . . . . . . . . . . . . . . . . . . . . . 71
4.6 Rotations . . . . . . . . . . . . . . . . . . . . . . . . 73
4.7 Shears . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.8 Projections . . . . . . . . . . . . . . . . . . . . . . . 77
4.9 Areas and Linear Maps: Determinants . . . . . . . 80
4.10 Composing Linear Maps . . . . . . . . . . . . . . . 83
4.11 More on Matrix Multiplication . . . . . . . . . . . . 87
4.12 Matrix Arithmetic Rules . . . . . . . . . . . . . . . 89
4.13 Exercises . . . . . . . . . . . . . . . . . . . . . . . . 91

Chapter 5 2 × 2 Linear Systems 95


5.1 Skew Target Boxes Revisited . . . . . . . . . . . . . 96
5.2 The Matrix Form . . . . . . . . . . . . . . . . . . . 97
5.3 A Direct Approach: Cramer’s Rule . . . . . . . . . 98
5.4 Gauss Elimination . . . . . . . . . . . . . . . . . . . 99
5.5 Pivoting . . . . . . . . . . . . . . . . . . . . . . . . 101
5.6 Unsolvable Systems . . . . . . . . . . . . . . . . . . 104
5.7 Underdetermined Systems . . . . . . . . . . . . . . 104
5.8 Homogeneous Systems . . . . . . . . . . . . . . . . 105
5.9 Undoing Maps: Inverse Matrices . . . . . . . . . . . 107
5.10 Defining a Map . . . . . . . . . . . . . . . . . . . . 113
5.11 A Dual View . . . . . . . . . . . . . . . . . . . . . . 114
5.12 Exercises . . . . . . . . . . . . . . . . . . . . . . . . 116

Chapter 6 Moving Things Around: Affine Maps in 2D 119


6.1 Coordinate Transformations . . . . . . . . . . . . . 120
6.2 Affine and Linear Maps . . . . . . . . . . . . . . . . 122
6.3 Translations . . . . . . . . . . . . . . . . . . . . . . 124
6.4 More General Affine Maps . . . . . . . . . . . . . . 124
6.5 Mapping Triangles to Triangles . . . . . . . . . . . 126
6.6 Composing Affine Maps . . . . . . . . . . . . . . . 128
6.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . 132

Chapter 7 Eigen Things 135


7.1 Fixed Directions . . . . . . . . . . . . . . . . . . . . 136
7.2 Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . 137
7.3 Eigenvectors . . . . . . . . . . . . . . . . . . . . . . 139
7.4 Striving for More Generality . . . . . . . . . . . . . 142
7.5 The Geometry of Symmetric Matrices . . . . . . . . 145
7.6 Quadratic Forms . . . . . . . . . . . . . . . . . . . . 149
7.7 Repeating Maps . . . . . . . . . . . . . . . . . . . . 153
7.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . 155

Chapter 8 3D Geometry 157


8.1 From 2D to 3D . . . . . . . . . . . . . . . . . . . . 158
8.2 Cross Product . . . . . . . . . . . . . . . . . . . . . 160
8.3 Lines . . . . . . . . . . . . . . . . . . . . . . . . . . 164
8.4 Planes . . . . . . . . . . . . . . . . . . . . . . . . . 165
8.5 Scalar Triple Product . . . . . . . . . . . . . . . . . 169
8.6 Application: Lighting and Shading . . . . . . . . . 171
8.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . 174

Chapter 9 Linear Maps in 3D 177


9.1 Matrices and Linear Maps . . . . . . . . . . . . . . 178
9.2 Linear Spaces . . . . . . . . . . . . . . . . . . . . . 180
9.3 Scalings . . . . . . . . . . . . . . . . . . . . . . . . . 181
9.4 Reflections . . . . . . . . . . . . . . . . . . . . . . . 183
9.5 Shears . . . . . . . . . . . . . . . . . . . . . . . . . 184
9.6 Rotations . . . . . . . . . . . . . . . . . . . . . . . . 186
9.7 Projections . . . . . . . . . . . . . . . . . . . . . . . 190
9.8 Volumes and Linear Maps: Determinants . . . . . . 193
9.9 Combining Linear Maps . . . . . . . . . . . . . . . 196
9.10 Inverse Matrices . . . . . . . . . . . . . . . . . . . . 199
9.11 More on Matrices . . . . . . . . . . . . . . . . . . . 200
9.12 Exercises . . . . . . . . . . . . . . . . . . . . . . . . 202

Chapter 10 Affine Maps in 3D 207


10.1 Affine Maps . . . . . . . . . . . . . . . . . . . . . . 208
10.2 Translations . . . . . . . . . . . . . . . . . . . . . . 209
10.3 Mapping Tetrahedra . . . . . . . . . . . . . . . . . 209
10.4 Parallel Projections . . . . . . . . . . . . . . . . . . 212
10.5 Homogeneous Coordinates and Perspective Maps . 217
10.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . 222

Chapter 11 Interactions in 3D 225


11.1 Distance Between a Point and a Plane . . . . . . . 226
11.2 Distance Between Two Lines . . . . . . . . . . . . . 227
11.3 Lines and Planes: Intersections . . . . . . . . . . . . 229
11.4 Intersecting a Triangle and a Line . . . . . . . . . . 231
11.5 Reflections . . . . . . . . . . . . . . . . . . . . . . . 232
11.6 Intersecting Three Planes . . . . . . . . . . . . . . . 233
11.7 Intersecting Two Planes . . . . . . . . . . . . . . . 235
11.8 Creating Orthonormal Coordinate Systems . . . . . 235
11.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . 238

Chapter 12 Gauss for Linear Systems 243


12.1 The Problem . . . . . . . . . . . . . . . . . . . . . . 244
12.2 The Solution via Gauss Elimination . . . . . . . . . 247
12.3 Homogeneous Linear Systems . . . . . . . . . . . . 255
12.4 Inverse Matrices . . . . . . . . . . . . . . . . . . . . 257
12.5 LU Decomposition . . . . . . . . . . . . . . . . . . . 260
12.6 Determinants . . . . . . . . . . . . . . . . . . . . . 264
12.7 Least Squares . . . . . . . . . . . . . . . . . . . . . 267
12.8 Application: Fitting Data to a Femoral Head . . . . 271
12.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . 273

Chapter 13 Alternative System Solvers 277


13.1 The Householder Method . . . . . . . . . . . . . . . 278
13.2 Vector Norms . . . . . . . . . . . . . . . . . . . . . 285
13.3 Matrix Norms . . . . . . . . . . . . . . . . . . . . . 288
13.4 The Condition Number . . . . . . . . . . . . . . . . 291
13.5 Vector Sequences . . . . . . . . . . . . . . . . . . . 293
13.6 Iterative System Solvers: Gauss-Jacobi and Gauss-Seidel . . . 295
13.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . 299

Chapter 14 General Linear Spaces 303


14.1 Basic Properties of Linear Spaces . . . . . . . . . . 304
14.2 Linear Maps . . . . . . . . . . . . . . . . . . . . . . 307
14.3 Inner Products . . . . . . . . . . . . . . . . . . . . . 310
14.4 Gram-Schmidt Orthonormalization . . . . . . . . . 314
14.5 A Gallery of Spaces . . . . . . . . . . . . . . . . . . 316
14.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . 318

Chapter 15 Eigen Things Revisited 323


15.1 The Basics Revisited . . . . . . . . . . . . . . . . . 324
15.2 The Power Method . . . . . . . . . . . . . . . . . . 331
15.3 Application: Google Eigenvector . . . . . . . . . . . 334
15.4 Eigenfunctions . . . . . . . . . . . . . . . . . . . . . 337
15.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . 339

Chapter 16 The Singular Value Decomposition 343


16.1 The Geometry of the 2 × 2 Case . . . . . . . . . . . 344
16.2 The General Case . . . . . . . . . . . . . . . . . . . 348
16.3 SVD Steps . . . . . . . . . . . . . . . . . . . . . . . 352
16.4 Singular Values and Volumes . . . . . . . . . . . . . 353
16.5 The Pseudoinverse . . . . . . . . . . . . . . . . . . . 354
16.6 Least Squares . . . . . . . . . . . . . . . . . . . . . 355
16.7 Application: Image Compression . . . . . . . . . . . 359
16.8 Principal Components Analysis . . . . . . . . . . . 360
16.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . 365

Chapter 17 Breaking It Up: Triangles 367


17.1 Barycentric Coordinates . . . . . . . . . . . . . . . 368
17.2 Affine Invariance . . . . . . . . . . . . . . . . . . . . 370
17.3 Some Special Points . . . . . . . . . . . . . . . . . . 371
17.4 2D Triangulations . . . . . . . . . . . . . . . . . . . 374
17.5 A Data Structure . . . . . . . . . . . . . . . . . . . 375
17.6 Application: Point Location . . . . . . . . . . . . . 377
17.7 3D Triangulations . . . . . . . . . . . . . . . . . . . 378
17.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . 379

Chapter 18 Putting Lines Together: Polylines and Polygons 381


18.1 Polylines . . . . . . . . . . . . . . . . . . . . . . . . 382
18.2 Polygons . . . . . . . . . . . . . . . . . . . . . . . . 382
18.3 Convexity . . . . . . . . . . . . . . . . . . . . . . . 384
18.4 Types of Polygons . . . . . . . . . . . . . . . . . . . 385
18.5 Unusual Polygons . . . . . . . . . . . . . . . . . . . 386

18.6 Turning Angles and Winding Numbers . . . . . . . 387


18.7 Area . . . . . . . . . . . . . . . . . . . . . . . . . . 389
18.8 Application: Planarity Test . . . . . . . . . . . . . . 393
18.9 Application: Inside or Outside? . . . . . . . . . . . 394
18.9.1 Even-Odd Rule . . . . . . . . . . . . . . . . . . 394
18.9.2 Nonzero Winding Number . . . . . . . . . . . . 395
18.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . 396

Chapter 19 Conics 399


19.1 The General Conic . . . . . . . . . . . . . . . . . . 400
19.2 Analyzing Conics . . . . . . . . . . . . . . . . . . . 405
19.3 General Conic to Standard Position . . . . . . . . . 406
19.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . 409

Chapter 20 Curves 411


20.1 Parametric Curves . . . . . . . . . . . . . . . . . . . 412
20.2 Properties of Bézier Curves . . . . . . . . . . . . . . 417
20.3 The Matrix Form . . . . . . . . . . . . . . . . . . . 418
20.4 Derivatives . . . . . . . . . . . . . . . . . . . . . . . 420
20.5 Composite Curves . . . . . . . . . . . . . . . . . . . 422
20.6 The Geometry of Planar Curves . . . . . . . . . . . 422
20.7 Moving along a Curve . . . . . . . . . . . . . . . . . 425
20.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . 427

Appendix A Glossary 429

Appendix B Selected Exercise Solutions 443

Bibliography 487

Index 489
Preface

Just about everyone has watched animated movies, such as Toy Story
or Shrek, or is familiar with the latest three-dimensional computer
games. Enjoying 3D entertainment sounds like more fun than study-
ing a linear algebra book. But it is because of linear algebra that
those movies and games can be brought to a TV or computer screen.
When you see a character move on the screen, it’s animated using
some equation straight out of this book. In this sense, linear algebra
is a driving force of our new digital world: it is powering the software
behind modern visual entertainment and communication.
But this is not a book on entertainment. We start with the funda-
mentals of linear algebra and proceed to various applications. So it
doesn’t become too dry, we replaced mathematical proofs with mo-
tivations, examples, or graphics. For a beginning student, this will
result in a deeper level of understanding than standard theorem-proof
approaches. The book covers all of undergraduate-level linear alge-
bra in the classical sense—except it is not delivered in a classical way.
Since it relies heavily on examples and pointers to applications, we
chose the title Practical Linear Algebra, or PLA for short.
The subtitle of this book is A Geometry Toolbox; this is meant
to emphasize that we approach linear algebra in a geometric and
algorithmic way. Our goal is to bring the material of this book to
a broader audience, motivated in a large part by our observations
of how little engineers and scientists (non-math majors) retain from
classical linear algebra classes. Thus, we set out to fill a void in the
linear algebra textbook market. We feel that we have achieved this,
presenting the material in an intuitive, geometric manner that will
lend itself to retention of the ideas and methods.


Review of Contents
As stated previously, one clear motivation we had for writing PLA
was to present the material so that the reader would retain the in-
formation. In our experience, approaching the material first in two
and then in three dimensions lends itself to visualizing and then to
understanding. Incorporating many illustrations, Chapters 1–7 in-
troduce the fundamentals of linear algebra in a 2D setting. These
same concepts are revisited in Chapters 8–11 in a 3D setting. The
3D world lends itself to concepts that do not exist in 2D, and these
are explored there too.
Higher dimensions, necessary for many real-life applications and the
development of abstract thought, are visited in Chapters 12–16. The
focus of these chapters includes linear system solvers (Gauss elim-
ination, LU decomposition, the Householder method, and iterative
methods), determinants, inverse matrices, revisiting “eigen things,”
linear spaces, inner products, and the Gram-Schmidt process. Singu-
lar value decomposition, the pseudoinverse, and principal components
analysis are new additions.
Conics, discussed in Chapter 19, are a fundamental geometric en-
tity, and since their development provides a wonderful application
for affine maps, “eigen things,” and symmetric matrices, they really
shouldn’t be missed. Triangles in Chapter 17 and polygons in Chap-
ter 18 are discussed because they are fundamental geometric entities
and are important in generating computer images.
Several of the chapters have an “Application” section, giving a real-
world use of the tools developed thus far. We have made an effort to
choose applications that many readers will enjoy by staying away from
in-depth domain-specific language. Chapter 20 may be viewed as an
application chapter as a whole. Various linear algebra ingredients are
applied to the techniques of curve design and analysis.
The illustrations in the book come in two forms: figures and sketches.
The figures are computer generated and tend to be complex. The
sketches are hand-drawn and illustrate the core of a concept. Both are
great teaching and learning tools! We made all of them available on
the book’s website http://www.farinhansford.com/books/pla/. Many
of the figures were generated using PostScript, an easy-to-use geomet-
ric language, or Mathematica.
At the end of each chapter, we have included a list of topics, What
You Should Know (WYSK), marked by the icon on the left. This list
is intended to encapsulate the main points of each chapter. It is not
uncommon for a topic to appear in more than one chapter. We have

made an effort to revisit some key ideas more than once. Repetition
is useful for retention!
Exercises are listed at the end of each chapter. Solutions to selected
exercises are given in Appendix B. All solutions are available to
instructors and instructions for accessing these may be found on the
book’s website.
Appendix A provides an extensive glossary that can serve as a
review tool. We give brief definitions without equations so as to
offer a presentation different from that in the text. Also notable
is the robust index, which we hope will be very helpful, particularly
since we revisit topics throughout the text.

Classroom Use
PLA is meant to be used at the undergraduate level. It serves as an
introduction to linear algebra for engineers or computer scientists, as
well as a general introduction to geometry. It is also an ideal prepara-
tion for computer graphics and geometric modeling. We would argue
that it is also a perfect linear algebra entry point for mathematics
majors.
As a one-semester course, we recommend choosing a subset of the
material that meets the needs of the students. In the table below,
LA refers to an introductory linear algebra course and CG refers to
a course tailored to those planning to work in computer graphics or
geometric modeling.

Chapter LA CG
1 Descartes’ Discovery • •
2 Here and There: Points and Vectors in 2D • •
3 Lining Up: 2D Lines •
4 Changing Shapes: Linear Maps in 2D • •
5 2×2 Linear Systems • •
6 Moving Things Around: Affine Maps in 2D • •
7 Eigen Things •
8 3D Geometry • •
9 Linear Maps in 3D • •
10 Affine Maps in 3D • •
11 Interactions in 3D •

Chapter LA CG
12 Gauss for Linear Systems • •
13 Alternative System Solvers •
14 General Linear Spaces •
15 Eigen Things Revisited •
16 The Singular Value Decomposition •
17 Breaking It Up: Triangles •
18 Putting Lines Together: Polylines and Polygons •
19 Conics •
20 Curves •

Website
Practical Linear Algebra, A Geometry Toolbox has a website:
http://www.farinhansford.com/books/pla/

This website provides:

• teaching materials,
• additional material,

• the PostScript files illustrated in the book,

• Mathematica code,

• errata,

• and more!

Gerald Farin
Dianne Hansford
Arizona State University
March, 2013
1
Descartes’ Discovery

Figure 1.1.
Local and global coordinate systems: the treasure’s local coordinates with respect to
the boat do not change as the boat moves. However, the treasure’s global coordinates,
defined relative to the lake, do change as the boat moves.

There is a collection of old German tales that take place sometime in


the 17th century, and they are about an alleged town called Schilda,


whose inhabitants were not known for their intelligence. Here is one
story [12]:

An army was approaching Schilda and would likely conquer


it. The town council, in charge of the town treasure, had to hide
it from the invaders. What better way than to sink it in the
nearby town lake? So the town council members board the town
boat, head for the middle of the lake, and sink the treasure.
The town treasurer gets out his pocket knife and cuts a deep
notch in the boat’s rim, right where the treasure went down.
Why would he do that, the other council members wonder? “So
that we will remember where we sunk the treasure, otherwise
we’ll never find it later!” replies the treasurer. Everyone is duly
impressed at such planning genius!
Eventually, the war is over and the town council embarks on
the town boat again, this time to reclaim the treasure from the
lake. Once out on the lake, the treasurer’s plan suddenly does
not seem so smart anymore. No matter where they went, the
notch in the boat’s rim told them they had found the treasure!

The French philosopher René Descartes (1596–1650) would have


known better: he invented the theory of coordinate systems. The
treasurer recorded the sinking of the treasure accurately by marking
it on the boat. That is, he recorded the treasure’s position relative
to a local coordinate system. But by neglecting the boat’s position
relative to the lake, the global coordinate system, he lost it all! (See
Figure 1.1.) The remainder of this chapter is about the interplay of
local and global coordinate systems.

1.1 Local and Global Coordinates: 2D


This book is written using the LaTeX typesetting system (see [9] or
[13]), which converts every page to be output to a page description
language called PostScript (see [1]). It tells a laser printer where to
position all the characters and symbols that go on a particular page.
For the first page of this chapter, there is a PostScript command that
positions the letter D in the chapter heading.
In order to do this, one needs a two-dimensional, or 2D, coordinate
system. Its origin is simply the lower left corner of the page, and the
x- and y-axes are formed by the horizontal and vertical paper edges
meeting there. Once we are given this coordinate system, we can
position objects in it, such as our letter D.

The D, on the other hand, was designed by font designers who


obviously did not know about its position on this page or of its actual
size. They used their own coordinate system, and in it, the letter D
is described by a set of points, each having coordinates relative to D’s
coordinate system, as shown in Sketch 1.1.
We call this system a local coordinate system, as opposed to the
global coordinate system, which is used for the whole page. Positioning
letters on a page thus requires mastering the interplay of the global
and local systems.
Sketch 1.1.
A local coordinate system.

Following Sketch 1.2, let’s make things more formal: Let (x1 , x2 ) be
coordinates in a global coordinate system, called the [e1 , e2 ]-system.
The boldface notation will be explained in the next chapter. You may
be used to calling coordinates (x, y); however, the (x1 , x2 ) notation
will streamline the material in this book, and it also makes writing
programs easier. Let (u1 , u2 ) be coordinates in a local system called
the [d1 , d2 ]-system. Let an object in the local system be enclosed by
a box with lower left corner (0, 0) and upper right corner (1, 1). This
means that the object “lives” in the unit square of the local system,
i.e., a square of edge length one, and with its lower left corner at the
origin. Restricting ourselves to the unit square for the local system
makes this first chapter easy—we will later relax this restriction.
We wish to position our object into the global system so that it
fits into a box with lower left corner (min1 , min2 ) and upper right
corner (max1 , max2 ) called the target box (drawn with heavy lines in
Sketch 1.2). This is accomplished by assigning to coordinates (u1 , u2 )
in the local system the corresponding target coordinates (x1 , x2 ) in
the global system. This correspondence is characterized by preserving
each coordinate value with respect to their extents. The local coor-
dinates are also known as parameters. In terms of formulas, these
parameters are written as quotients,

(u1 − 0)/(1 − 0) = (x1 − min1)/(max1 − min1),
(u2 − 0)/(1 − 0) = (x2 − min2)/(max2 − min2).

Sketch 1.2.
Global and local systems.

Thus, the corresponding formulas for x1 and x2 are quite simple:

x1 = (1 − u1 )min1 + u1 max1 , (1.1)


x2 = (1 − u2 )min2 + u2 max2 . (1.2)
We say that the coordinates (u1 , u2 ) are mapped to the coordinates
(x1 , x2 ). Sketch 1.3 illustrates how the letter D is mapped. This
concept of a parameter is reintroduced in Section 2.5.
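
To make the mapping concrete, here is a minimal Python sketch of Equations (1.1)
and (1.2); the function name and the sample target box are our own choices, not
part of the text.

    def local_to_global(u1, u2, min1, min2, max1, max2):
        # Equations (1.1) and (1.2): map local unit-square coordinates
        # (u1, u2) to global coordinates (x1, x2) in the target box.
        x1 = (1 - u1) * min1 + u1 * max1
        x2 = (1 - u2) * min2 + u2 * max2
        return x1, x2

    # The corners of the unit square go to the corners of the target box:
    print(local_to_global(0, 0, 1, 3, 3, 5))   # (1, 3), the lower left corner
    print(local_to_global(1, 1, 1, 3, 3, 5))   # (3, 5), the upper right corner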

Let’s check that this actually works: the coordinates (u1 , u2 ) =


(0, 0) in the local system must go to the coordinates (x1 , x2 ) =
(min1 , min2 ) in the global system. We obtain

x1 = (1 − 0) · min1 + 0 · max1 = min1 ,


x2 = (1 − 0) · min2 + 0 · max2 = min2 .

Similarly, the coordinates (u1 , u2 ) = (1, 0) in the local system must


go to the coordinates (x1 , x2 ) = (max1 , min2 ) in the global system.
We obtain

x1 = (1 − 1) · min1 + 1 · max1 = max1 ,


x2 = (1 − 0) · min2 + 0 · max2 = min2 .

Example 1.1

Sketch 1.3.
Local and global D.

Let the target box be given by

(min1 , min2 ) = (1, 3) and (max1 , max2 ) = (3, 5),

see Sketch 1.4. The coordinates (1/2, 1/2) can be thought of as
the “midpoint” of the local unit square. Let’s look at the result
of the mapping:

x1 = (1 − 1/2) · 1 + (1/2) · 3 = 2,
x2 = (1 − 1/2) · 3 + (1/2) · 5 = 4.
This is the “midpoint” of the target box. You see here how the
geometry in the unit square is replicated in the target box.

Sketch 1.4.
Map local unit square to a target
box.
A different way of writing (1.1) and (1.2) is as follows: Define
Δ1 = max1 − min1 and Δ2 = max2 − min2 . Now we have

x1 = min1 + u1 Δ1 , (1.3)
x2 = min2 + u2 Δ2 . (1.4)

Figure 1.2.
Target boxes: the letter D is mapped several times. Left: centered in the unit square.
Right: not centered.

A note of caution: if the target box is not a square, then the object
from the local system will be distorted. We see this in the following
example, illustrated by Sketch 1.5. The target box is given by

(min1 , min2 ) = (−1, 1) and (max1 , max2 ) = (2, 2).

You can see how the local object is stretched in the e1 -direction by
being put into the global system. Check for yourself that the corners
of the unit square (local) still get mapped to the corners of the target
box (global).
In general, if Δ1 > 1, then the object will be stretched in the e1 -
direction, and it will be shrunk if 0 < Δ1 < 1. The case of max1
smaller than min1 is not often encountered: it would result in a re-
versal of the object in the e1 -direction. The same applies, of course,
to the e2 -direction if max2 is smaller than min2 . An example of sev-
eral boxes containing the letter D is shown in Figure 1.2. Just for
fun, we have included one target box with max1 smaller than min1 !
Another characterization of the change of shape of the object may
be made by looking at the change in aspect ratio, which is the ratio of
the width to the height, Δ1 /Δ2 , for the target box. This is also writ-
ten as Δ1 : Δ2 . The aspect ratio in the local system is one. Revisiting

Example 1.1, the aspect ratio of the target box is one, therefore there
is no distortion of the letter D, although it is stretched uniformly in
both coordinates. In Sketch 1.5, a target box is given that has aspect
ratio 3, therefore the letter D is distorted.
Aspect ratios are encountered many times in everyday life. Televi-
sions and computer screens have recently changed from nearly square
4 : 3 to 16 : 9. Sketch 1.5 illustrates the kind of distortion that occurs
when an old format program is stretched to fill a new format screen.
(Normally a better solution is to not stretch the image and allow
for vertical black bars on either side of the image.) All international

(ISO A series) paper, regardless of size, has an aspect ratio of 1 : √2.
Golden rectangles, formed based on the golden ratio φ = (1 + √5)/2
with an aspect ratio of 1 : φ, provide a pleasing and functional shape,
and found their way into art and architecture. Credit cards have an
aspect ratio of 8 : 5, but to fit into your wallet and card readers the
size is important as well.
Sketch 1.5.
A distortion.

This principle, by the way, acts strictly on a “don’t need to know”
basis: we do not need to know the relationship between the local and
global systems. In many cases (as in the typesetting example), there
actually isn’t a known correspondence at the time the object in the
local system is created. Of course, one must know where the actual
object is located in the local unit square. If it is not nicely centered,
we might have the situation shown in Figure 1.2 (right).
You experience this “unit square to target box” mapping whenever
you use a computer. When you open a window, you might want to
view a particular image in it. The image is stored in a local coordinate
system; if it is stored with extents (0, 0) and (1, 1), then it utilizes
normalized coordinates. The target box is now given by the extents
of your window, which are given in terms of screen coordinates and the
image is mapped to it using (1.1) and (1.2). Screen coordinates are
typically given in terms of pixels;1 a typical computer screen would
have about 1440 × 900 pixels, which has an aspect ratio of 8 : 5
or 1.6.

1.2 Going from Global to Local


When discussing global and local systems in 2D, we used a target box
to position (and possibly distort) the unit square in a local [d1 , d2 ]-
system. For given coordinates (u1 , u2 ), we could find coordinates
(x1 , x2 ) in the global system using (1.1) and (1.2), or (1.3) and (1.4).
1 The term is short for “picture element.”

How about the inverse problem: given coordinates (x1 , x2 ) in the


global system, what are its local (u1 , u2 ) coordinates? The answer is
relatively easy: compute u1 from (1.3), and u2 from (1.4), resulting in
u1 = (x1 − min1)/Δ1, (1.5)
u2 = (x2 − min2)/Δ2. (1.6)
Applications for this process arise any time you use a mouse to
communicate with a computer. Suppose several icons are displayed
in a window. When you click on one of them, how does your computer
actually know which one? The answer: it uses Equations (1.5) and
(1.6) to determine its position.

Example 1.2

Let a window on a computer screen have screen coordinates


(min1 , min2 ) = (120, 300) and (max1 , max2 ) = (600, 820).
The window is filled with 21 icons, arranged in a 7 × 3 pattern (see
Figure 1.3). A mouse click returns screen coordinates (200, 709).
Which icon was clicked? The computations that take place are as
follows:
u1 = (200 − 120)/480 ≈ 0.17,
u2 = (709 − 300)/520 ≈ 0.79,
according to (1.5) and (1.6).
The u1 -partition of normalized coordinates is
0, 0.33, 0.67, 1.
The value 0.17 for u1 is between 0.0 and 0.33, so an icon in the first
column was picked. The u2 -partition of normalized coordinates is
0, 0.14, 0.29, 0.43, 0.57, 0.71, 0.86, 1.
The value 0.79 for u2 is between 0.71 and 0.86, so the “Display” icon
in the second row of the first column was picked.
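
Here is a minimal Python sketch of this icon lookup; the function name and the
row/column indexing (counted from the window’s lower left corner) are our own
assumptions.

    def global_to_local(x1, x2, min1, min2, max1, max2):
        # Equations (1.5) and (1.6): global coordinates back to local ones.
        u1 = (x1 - min1) / (max1 - min1)
        u2 = (x2 - min2) / (max2 - min2)
        return u1, u2

    # Example 1.2: a click at (200, 709) in a window with extents
    # (120, 300) and (600, 820), filled with a 7 x 3 grid of icons.
    u1, u2 = global_to_local(200, 709, 120, 300, 600, 820)
    column = int(u1 * 3)   # 0.17 falls in the first of the 3 columns
    row = int(u2 * 7)      # 0.79 falls in the sixth of the 7 rows from the bottom
    print(round(u1, 2), round(u2, 2), column, row)   # 0.17 0.79 0 5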

Figure 1.3.
Selecting an icon: global to local coordinates.

1.3 Local and Global Coordinates: 3D

These days, almost all engineering objects are designed using a Com-
puter-Aided Design (CAD) system. Every object is defined in a coor-
dinate system, and usually many individual objects need to be inte-
grated into one coordinate system. Take designing a large commercial
airplane, for example. It is defined in a three-dimensional (or 3D) co-
ordinate system with its origin at the frontmost part of the plane,
the e1 -axis pointing toward the rear, the e2 -axis pointing to the right
(that is, if you’re sitting in the plane), and the e3 -axis is pointing
upward. See Sketch 1.6.
Sketch 1.6.
Airplane coordinates.

Before the plane is built, it undergoes intense computer simula-
tion in order to find its optimal shape. As an example, consider the
engines: these may vary in size, and their exact locations under the
wings need to be specified. An engine is defined in a local coordinate
system, and it is then moved to its proper location. This process
will have to be repeated for all engines. Another example would
be the seats in the plane: the manufacturer would design just one—
then multiple copies of it are put at the right locations in the plane’s
design.

Following Sketch 1.7, and making things more formal again, we


are given a local 3D coordinate system, called the [d1 , d2 , d3 ]-system,
with coordinates (u1 , u2 , u3 ). We assume that the object under
consideration is located inside the unit cube, i.e., all of its defining
points satisfy
0 ≤ u1 , u2 , u3 ≤ 1.
This cube is to be mapped onto a 3D target box in the global [e1 , e2 , e3 ]-
system. Let the target box be given by its lower corner (min1 , min2 ,
min3 ) and its upper corner (max1 , max2 , max3 ). How do we map co-
ordinates (u1 , u2 , u3 ) from the local unit cube into the corresponding
target coordinates (x1 , x2 , x3 ) in the target box? Exactly as in the
2D case, with just one more equation:
x1 = (1 − u1 )min1 + u1 max1 , (1.7)
x2 = (1 − u2 )min2 + u2 max2 , (1.8)
x3 = (1 − u3 )min3 + u3 max3 . (1.9)
As an easy exercise, check that the corners of the unit cube are
mapped to the corners of the target box.
Sketch 1.7.
Global and local 3D systems.

The analog to (1.3) and (1.4) is given by the rather obvious
x1 = min1 + u1 Δ1 , (1.10)
x2 = min2 + u2 Δ2 , (1.11)
x3 = min3 + u3 Δ3 . (1.12)
As in the 2D case, if the target box is not a cube, object distortions
will result—this may be desired or not.
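
Since the 3D map applies the same formula in each coordinate, a small Python
sketch can treat the coordinates as a tuple; the helper name and the sample
target box below are our own.

    def map_to_target_box(u, mins, maxs):
        # Equations (1.7)-(1.9): x_i = (1 - u_i) * min_i + u_i * max_i.
        return tuple((1 - ui) * lo + ui * hi
                     for ui, lo, hi in zip(u, mins, maxs))

    # Check the "easy exercise": unit-cube corners map to target-box corners.
    mins, maxs = (1, 1, 1), (3, 5, 2)
    print(map_to_target_box((0, 0, 0), mins, maxs))   # (1, 1, 1)
    print(map_to_target_box((1, 0, 1), mins, maxs))   # (3, 1, 2)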

1.4 Stepping Outside the Box


We have restricted all objects to be within the unit square or cube; as
a consequence, their images were inside the respective target boxes.
This notion helps with an initial understanding, but it is not at all
essential. Let’s look at a 2D example, with the target box given by
(min1 , min2 ) = (1, 1) and (max1 , max2 ) = (2, 3),
and illustrated in Sketch 1.8.
The coordinates (u1 , u2 ) = (2, 3/2) are not inside the [d1 , d2 ]-
system unit square. Yet we can map it using (1.1) and (1.2):
x1 = −min1 + 2max1 = 3,
x2 = −(1/2)min2 + (3/2)max2 = 4.

Sketch 1.8.
A 2D coordinate outside the target box.

Since the initial coordinates (u1 , u2 ) were not inside the unit square,
the mapped coordinates (x1 , x2 ) are not inside the target box. The
notion of mapping a square to a target box is a useful concept for
mentally visualizing what is happening—but it is not actually a re-
striction to the coordinates that we can map!

Example 1.3

Without much belaboring, it is clear the same holds for 3D. An ex-
ample should suffice: the target box is given by

(min1 , min2 , min3 ) = (0, 1, 0) and


(max1 , max2 , max3 ) = (0.5, 2, 1),

and we want to map the coordinates

(u1 , u2 , u3 ) = (1.5, 1.5, 0.5).

The result, illustrated by Sketch 1.9, is computed using (1.7)–(1.9): it is

(x1 , x2 , x3 ) = (0.75, 2.5, 0.5).

Sketch 1.9.
A 3D coordinate outside the 3D target box.

1.5 Application: Creating Coordinates


Suppose you have an interesting real object, like a model of a cat.
A friend of yours in Hollywood would like to use this cat in her latest
hi-tech animated movie. Such movies use only mathematical descrip-
tions of objects—everything must have coordinates! You might recall
the movie Toy Story. It is a computer-animated movie, meaning
that the characters and objects in every scene have a mathematical
representation.
So how do you give your cat model coordinates? This is done with
a CMM, or coordinate measuring machine; see Figure 1.4. The CMM
is essentially an arm that is able to record the position of its tip by
keeping track of the angles of its joints.
Your cat model is placed on a table and somehow fixed so it does not
move during digitizing. You let the CMM’s arm touch three points
on the table; they will be converted to the origin and the e1 - and
e2 -coordinate axes of a 3D coordinate system. The e3 -axis (vertical
to the table) is computed automatically.2 Now when you touch your
2 Just how we convert the three points on the table to the three axes is covered

in Section 11.8.

Figure 1.4.
Creating coordinates: a cat is turned into math. (Microscribe-3D from Immersion
Corporation, http://www.immersion.com.)

cat model with the tip of the CMM’s arm, it will associate three
coordinates with that position and record them. You repeat this
for several hundred points, and you have your cat in the box! This
process is called digitizing. In the end, the cat has been “discretized,”
or turned into a finite number of coordinate triples. This set of points
is called a point cloud.
Someone else will now have to build a mathematical model of your
cat.3 Next, the mathematical model will have to be put into scenes
of the movie—but all that’s needed for that are 3D coordinate trans-
formations! (See Chapters 9 and 10.)

• unit square
• 2D and 3D local coordinates
• 2D and 3D global coordinates
• coordinate transformation
• parameter
• aspect ratio
• normalized coordinates
• digitizing
• point cloud

3 This type of work is called geometric modeling or computer-aided geometric

design, see [7] and Chapter 20.



1.6 Exercises
1. Let the coordinates of triangle vertices in the local [d1 , d2 ]-system unit
square be given by

(u1 , u2 ) = (0.1, 0.1), (v1 , v2 ) = (0.9, 0.2), (w1 , w2 ) = (0.4, 0.7).

(a) If the [d1 , d2 ]-system unit square is mapped to the target box with

(min1 , min2 ) = (1, 2) and (max1 , max2 ) = (3, 3),

where are the coordinates of the triangle vertices mapped?


(b) What local coordinates correspond to (x1 , x2 ) = (2, 2) in the
[e1 , e2 ]-system?

2. Given local coordinates (2, 2) and (−1, −1), find the global coordinates
with respect to the target box with

(min1 , min2 ) = (1, 1) and (max1 , max2 ) = (7, 3).

Make a sketch of the local and global systems. Connect the coordinates
in each system with a line and compare.
3. Let the [d1 , d2 , d3 ]-system unit cube be mapped to the 3D target box
with

(min1 , min2 , min3 ) = (1, 1, 1) and (Δ1 , Δ2 , Δ3 ) = (1, 2, 4).

Where will the coordinates (u1 , u2 , u3 ) = (0.5, 0, 0.7) be mapped?


4. Let the coordinates of triangle vertices in the local [d1 , d2 , d3 ]-system
unit cube be given by

(u1 , u2 , u3 ) = (0.1, 0.1, 0), (v1 , v2 , v3 ) = (0.9, 0.2, 1),


(w1 , w2 , w3 ) = (0.4, 0.7, 0).

If the [d1 , d2 , d3 ]-system unit cube is mapped to the target box with

(min1 , min2 , min3 ) = (1, 2, 4) and (max1 , max2 , max3 ) = (3, 3, 8),

where are the coordinates of the triangle vertices mapped? Hint: See
Exercise 1a.
5. Suppose we are given a global frame defined by (min1 , min2 , min3 ) =
(0, 0, 3) and (max1 , max2 , max3 ) = (4, 4, 4). For the coordinates (1, 1, 3)
and (0, 0, 0) in this frame, what are the corresponding coordinates in
the [d1 , d2 , d3 ]-system?
6. Assume you have an image in a local frame of 20 mm². If you enlarge
the frame and the image such that the new frame covers 40 mm², by
how much does the image size change?

7. Suppose that local frame coordinates (v1 , v2 ) = (1/2, 1) are mapped


to global frame coordinates (5, 2) and similarly, (w1 , w2 ) = (1, 1) are
mapped to (8, 8). In the local frame, (u1 , u2 ) = (3/4, 1/2) lies at the
midpoint of two local coordinate sets. What are its coordinates in the
global frame?
8. The size of a TV set is specified by its monitor’s diagonal. In order to
determine the width and height of this TV, we must know the relevant
aspect ratio. What are the width and height dimensions of a 32 stan-
dard TV with aspect ratio 4 : 3? What are the dimensions of a 32 HD
TV with aspect ratio 16 : 9? (This is an easy one to find on the web,
but the point is to calculate it yourself!)
9. Suppose we are given coordinates (1/2, 1/2) in the [d1 , d2 ]-system. What
are the corresponding coordinates in the global frame with aspect ratio
4 : 3, (min1 , min2 ) = (0, 0), and Δ2 = 2?
10. In some implementations of the computer graphics viewing pipeline,
normalized device coordinates, or NDC, are defined as the cube with ex-
tents (−1, −1, −1) and (1, 1, 1). The next step in the pipeline maps the
(u1 , u2 ) coordinates from NDC (u3 is ignored) to the viewport, the area
of the screen where the image will appear. Give equations for (x1 , x2 )
in the viewport defined by extents (min1 , min2 ) and (max1 , max2 ) that
correspond to (u1 , u2 ) in NDC.
2
Here and There: Points
and Vectors in 2D

Figure 2.1.
Hurricane Katrina: the hurricane is shown here approaching south Louisiana. (Image
courtesy of NOAA, katrina.noaa.gov.)

In 2005 Hurricane Katrina caused flooding and deaths as it made its


way from the Bahamas to south Florida as a category 1 hurricane.
Over the warm waters of the Gulf, it grew into a category 5 hurricane,


and even though at landfall in southeast Louisiana it had weakened


to a category 3 hurricane, the storm surges and destruction it cre-
ated rate it as the most expensive hurricane to date, causing more
than $45 billion of damage. Sadly it was also one of the deadliest,
particularly for residents of New Orleans. In the hurricane image
(Figure 2.1), air is moving rapidly, spiraling in a counterclockwise
fashion. What isn’t so clear from this image is that the air moves
faster as it approaches the eye of the hurricane. This air movement
is best described by points and vectors: at any location (point), air
moves in a certain direction and with a certain speed (velocity vector).
This hurricane image is a good example of how helpful 2D geome-
try can be in a 3D world. Of course a hurricane is a 3D phenomenon;
however, by analyzing 2D slices, or cross sections, we can develop a
very informative analysis. Many other applications call for 2D ge-
ometry only. The purpose of this chapter is to define the two most
fundamental tools we need to work in a 2D world: points and vectors.

2.1 Points and Vectors


Sketch 2.1.
Points and their coordinates.

The most basic geometric entity is the point. A point is a reference
to a location. Sketch 2.1 illustrates examples of points. In the text,
boldface lowercase letters represent points, e.g.,
 
p = (p1, p2)^T. (2.1)
The location of p is p1 -units along the e1 -axis and p2 -units along
the e2 -axis. Thus a point’s coordinates, p1 and p2 , are dependent upon
the location of the coordinate origin. We use the boldface notation
so there is a noticeable difference between a point p and a one-dimensional (1D)
number, or scalar, p. To clearly identify p as a point, the notation
p ∈ E2 is used. This means that a 2D point “lives” in 2D Euclidean
space E2 .
Now let’s move away from our reference point. Following Sketch 2.2,
suppose the reference point is p, and when moving along a straight
path, our target point is q. The directions from p would be to follow
the vector v. Our notation for a vector is the same as for a point:
boldface lowercase letters. To get to q we say,
q = p + v. (2.2)

Sketch 2.2.
Two points and a vector.
To calculate this, add each component separately; that is,
       
(q1, q2)^T = (p1, p2)^T + (v1, v2)^T = (p1 + v1, p2 + v2)^T.

For example, in Sketch 2.2, we have


     
(4, 3)^T = (2, 2)^T + (2, 1)^T.

The components of v, v1 and v2 , indicate how many units to move


along the e1 - and e2 -axis, respectively. This means that v can be
defined as
v = q − p. (2.3)

This defines a vector as a difference of two points, which describes a


direction and a distance, or a displacement. Examples of vectors are
illustrated in Sketch 2.3.
Sketch 2.3.
Vectors and their components.

How to determine a vector’s length is covered in Section 2.4. Above
we described this length as a distance. Alternatively, this length can
be described as speed: then we have a velocity vector.1 Yet another
interpretation is that the length represents acceleration: then we have
a force vector.
A vector has a tail and a head. As in Sketch 2.2, the tail is typically
displayed positioned at a point, or bound to a point in order to indicate
the geometric significance of the vector. However, unlike a point, a
vector does not define a position. Two vectors are equal if they have
the same component values, just as points are equal if they have the
same coordinate values. Thus, considering a vector as a difference of
two points, there are any number of vectors with the same direction
and length. See Sketch 2.4 for an illustration.
Sketch 2.4.
Instances of one vector.

A special vector worth mentioning is the zero vector,

0 = (0, 0)^T.

This vector has no direction or length. Other somewhat special vec-


tors include

e1 = (1, 0)^T and e2 = (0, 1)^T.
In the sketches, these vectors are not always drawn true to length to
prevent them from obscuring the main idea.
To clearly identify v as a vector, we write v ∈ R2 . This means that
a 2D vector “lives” in a 2D linear space R2 . (Other names for R2 are
real or vector spaces.)
1 This is what we’ll use to continue the Hurricane Katrina example.
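
As a small Python illustration of (2.2) and (2.3), using the numbers from the
example of Sketch 2.2; the helper names are our own.

    def add_vector(p, v):
        # Point + vector = point, Equation (2.2).
        return (p[0] + v[0], p[1] + v[1])

    def subtract_points(q, p):
        # Point - point = vector, Equation (2.3).
        return (q[0] - p[0], q[1] - p[1])

    # The numbers from Sketch 2.2: starting at p, the vector v leads to q.
    p, v = (2, 2), (2, 1)
    q = add_vector(p, v)
    print(q, subtract_points(q, p))   # (4, 3) (2, 1)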

2.2 What’s the Difference?


When writing a point or a vector we use boldface lowercase letters;
when programming we use the same data structure, e.g., arrays. This
makes it appear that points and vectors can be treated in the same
manner. Not so!
Points and vectors are different geometric entities. This is reiter-
ated by saying they live in different spaces, E2 and R2 . As shown
in Sketch 2.5, for convenience and clarity elements of Euclidean and
linear spaces are typically displayed together.
The primary reason for differentiating between points and vectors is
to achieve geometric constructions that are coordinate independent.
Such constructions are manipulations applied to geometric objects
that produce the same result regardless of the location of the coordi-
nate origin (for example, the midpoint of two points). This idea be-
comes clearer by analyzing some fundamental manipulations of points
and vectors. In what follows, let’s use p, q ∈ E2 and v, w ∈ R2 .

Sketch 2.5.
Euclidean and linear spaces illustrated separately and together.

Coordinate Independent Operations:

• Subtracting a point from another (p − q) yields a vector, as de-
picted in Sketch 2.2 and Equation (2.3).

• Adding or subtracting two vectors yields another vector. See


Sketch 2.6, which illustrates the parallelogram rule: the vectors
v − w and v + w are the diagonals of the parallelogram defined by
v and w. This is a coordinate independent operation since vectors
are defined as a difference of points.

• Multiplying by a scalar s is called scaling. Scaling a vector is a


well-defined operation. The result sv adjusts the length by the
scaling factor. The direction is unchanged if s > 0 and reversed
for s < 0. If s = 0 then the result is the zero vector. Sketch 2.7
illustrates some examples of scaling a vector.

Sketch 2.6.
Parallelogram rule.

• Adding a vector to a point (p + v) yields another point, as in
Sketch 2.2 and Equation (2.2).

Any coordinate independent combination of two or more points


and/or vectors can be grouped to fall into one or more of the items
above. See the Exercises for examples.

Coordinate Dependent Operations:

• Scaling a point (sp) is not a well-defined operation because it is


not coordinate independent. Sketch 2.8 illustrates that the result
of scaling the solid black point by one-half with respect to two
different coordinate systems results in two different points.
• Adding two points (p+q) is not a well-defined operation because it
is not coordinate independent. As depicted in Sketch 2.9, the result
of adding the two solid black points is dependent on the coordinate
origin. (The parallelogram rule is used here to construct the results
of the additions.)

Sketch 2.7.
Scaling a vector.

Some special combinations of points are allowed; they are defined


in Section 2.5.

2.3 Vector Fields

Sketch 2.8.
Scaling of points is ambiguous.

Figure 2.2.
Vector field: simulating hurricane air velocity. Lighter gray indicates greater velocity.

A good way to visualize the interplay between points and vectors


is through the example of vector fields. In general, we speak of a
vector field if every point in a given region is assigned a vector. We
have already encountered an example of this in Figure 2.1: Hurricane
Katrina! Recall that at each location (point) we could describe the air
velocity (vector). Our previous image did not actually tell us anything
about the air speed, although we could presume something about the
direction. This is where a vector field is helpful. Shown in Figure 2.2
is a vector field simulating Hurricane Katrina. By plotting all the

vectors the same length and using gray scale or varying shades of gray
to indicate speed, the vector field can be more informative than the
photograph. (Visualization of a vector field requires discretizing it: a
finite number of point and vector pairs are selected from a continuous
field or from sampled measurements.)
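
A tiny Python sketch of such a discretization: the swirling field below is made
up for illustration (it is not the Katrina data), and the grid and speed scaling
are arbitrary choices.

    import math

    def swirl(p1, p2):
        # Assign to the point (p1, p2) a counterclockwise vector whose
        # speed grows toward the center (0, 0) -- a made-up field.
        r = math.hypot(p1, p2)
        if r < 1e-6:               # stay above a zero divide tolerance
            return (0.0, 0.0)
        speed = 1.0 / r
        return (-p2 / r * speed, p1 / r * speed)

    # Discretize: keep a finite number of (point, vector) pairs on a grid.
    field = [((p1, p2), swirl(p1, p2))
             for p1 in range(-2, 3) for p2 in range(-2, 3)]
    print(field[0])   # approximately ((-2, -2), (0.25, -0.25))
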
Other important applications of vector fields arise in the areas of
automotive and aerospace design: before a car or an airplane is built,
it undergoes extensive aerodynamic simulations. In these simulations,
the vectors that characterize the flow around an object are computed
from complex differential equations. In Figure 2.3 we have another
example of a vector field.

Sketch 2.9.
Addition of points is ambiguous.

Figure 2.3.
Vector field: every sampled point has an associated vector. Lighter gray indicates
greater vector length.

2.4 Length of a Vector


As mentioned in Section 2.1, the length of a vector can represent
distance, velocity, or acceleration. We need a method for finding the
length of a vector, or the magnitude. As illustrated in Sketch 2.10,
a vector defines the displacement necessary (with respect to the e1 -
and e2 -axis) to get from a point at the tail of the vector to a point at
the head of the vector.
Sketch 2.10.
Length of a vector.

In Sketch 2.10 we have formed a right triangle. The square of the
length of the hypotenuse of a right triangle is well known from the
Pythagorean theorem. Denote the length of a vector v as ||v||. Then

||v||^2 = v1^2 + v2^2.

Therefore, the magnitude of v is



||v|| = √(v1^2 + v2^2). (2.4)

This is also called the Euclidean norm. Notice that if we scale the
vector by an amount k then

||kv|| = |k| ||v||. (2.5)

A normalized vector w has unit length, that is

||w|| = 1.

Normalized vectors are also known as unit vectors. To normalize a
vector simply means to scale a vector so that it has unit length. If w
is to be our unit length version of v then

w = v/||v||.

Each component of v is divided by the scalar value ||v||. This scalar
value is always nonnegative, which means that its value is zero or
greater. It can be zero! You must check the value before dividing to
be sure it is greater than your zero divide tolerance. The zero divide
tolerance is the absolute value of the smallest number by which you
can divide confidently. (When we refer to checking that a value is
greater than this number, it means to check the absolute value.)
In Figures 2.2 and 2.3, we display vectors of varying magnitudes.
But instead of plotting them using different lengths, their magnitude
is indicated by gray scales.

Example 2.1
Start with

v = (5, 0)^T.

Applying (2.4), ||v|| = √(5^2 + 0^2) = 5. Then the normalized version of
v is defined as

w = (5/5, 0/5)^T = (1, 0)^T.

Clearly ||w|| = 1, so this is a normalized vector. Since we have only


scaled v by a positive amount, the direction of w is the same as v.
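
A minimal Python sketch of Equation (2.4) and of normalization with a check
against a zero divide tolerance; the tolerance value and the function names are
our own assumptions.

    import math

    ZERO_TOL = 1e-12   # an assumed zero divide tolerance

    def length(v):
        # Equation (2.4): the Euclidean norm of a 2D vector.
        return math.sqrt(v[0] ** 2 + v[1] ** 2)

    def normalize(v):
        # Scale v to unit length, but only if ||v|| exceeds the tolerance.
        n = length(v)
        if n <= ZERO_TOL:
            raise ValueError("vector too short to normalize")
        return (v[0] / n, v[1] / n)

    # Example 2.1: v = (5, 0) has length 5 and normalizes to (1, 0).
    print(length((5, 0)), normalize((5, 0)))   # 5.0 (1.0, 0.0)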

Figure 2.4.
Unit vectors: they define a circle.

There are infinitely many unit vectors. Imagine drawing them all,
emanating from the origin. The figure that you will get is a circle of
radius one! See Figure 2.4.
To find the distance between two points we simply form a vector
defined by the two points, e.g., v = q − p, and apply (2.4).

Example 2.2

Let

q = (−1, 2)^T and p = (1, 0)^T.

Then

q − p = (−2, 2)^T

and

||q − p|| = √((−2)^2 + 2^2) = √8 ≈ 2.83.
Sketch 2.11 illustrates this example.
Sketch 2.11.
Distance between two points.
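
The same machinery gives the distance between two points; a short Python check
of Example 2.2 (the variable names are ours).

    import math

    # Example 2.2: the distance between q and p is the length of q - p.
    q, p = (-1, 2), (1, 0)
    v = (q[0] - p[0], q[1] - p[1])           # the vector q - p = (-2, 2)
    print(math.sqrt(v[0] ** 2 + v[1] ** 2))  # sqrt(8), about 2.83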

2.5 Combining Points


Seemingly contrary to Section 2.2, there actually is a way to com-
bine two points such that we get a (meaningful) third one. Take the
example of the midpoint r of two points p and q; more specifically,
take

p = (1, 6)^T, r = (2, 3)^T, q = (3, 0)^T,
as shown in Sketch 2.12.
Sketch 2.12.
The midpoint of two points.

Let’s start with the known coordinate independent operation of
adding a vector to a point. Define r by adding an appropriately
scaled version of the vector v = q − p to the point p:

r = p + (1/2)v,
(2, 3)^T = (1, 6)^T + (1/2)(2, −6)^T.
= + .
3 6 2 −6

Expanding, this shows that r can also be defined as


r = (1/2)p + (1/2)q,
(2, 3)^T = (1/2)(1, 6)^T + (1/2)(3, 0)^T.
This is a legal expression for a combination of points.
There is nothing magical about the factor 1/2, however. Adding
a (scaled) vector to a point is a well-defined, coordinate independent
operation that yields another point. Any point of the form

r = p + tv (2.6)

is on the line through p and q. Again, we may rewrite this as

r = p + t(q − p)

and then
r = (1 − t)p + tq. (2.7)
Sketch 2.13 gives an example with t = 1/3.
Sketch 2.13.
Barycentric combinations: t = 1/3.

The scalar values (1 − t) and t are coefficients. A weighted sum
of points where the coefficients sum to one is called a barycentric
combination. In this special case, where one point r is being expressed
in terms of two others, p and q, the coefficients 1 − t and t are called
the barycentric coordinates of r.

A barycentric combination allows us to construct r anywhere on
the line defined by p and q. This is why (2.7) is also called linear
interpolation. If we would like to restrict r's position to the line
segment between p and q, then we allow only convex combinations:
t must satisfy 0 ≤ t ≤ 1. To define points outside of the line segment
between p and q, we need values of t < 0 or t > 1.
The position of r is said to be in the ratio of t : (1 − t) or t/(1 − t).
In physics, r is known as the center of gravity of two points p and q
with weights 1 − t and t, respectively. From a constructive approach,
the ratio is formed from the quotient

      ratio = ||r − p|| / ||q − r||.

Some examples are illustrated in Sketch 2.14.

Sketch 2.14.
Examples of ratios.
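To make linear interpolation concrete in code (our addition, not the book's), here is a minimal Python sketch of (2.7); the function name lerp is our own choice.

def lerp(p, q, t):
    """Barycentric combination r = (1 - t) p + t q of two 2D points."""
    return ((1 - t) * p[0] + t * q[0],
            (1 - t) * p[1] + t * q[1])

# t in [0, 1] gives a convex combination (a point on the segment between p and q);
# t < 0 or t > 1 produces points outside the segment.
p, q = (1.0, 6.0), (3.0, 0.0)
print(lerp(p, q, 0.5))     # the midpoint (2.0, 3.0) from Sketch 2.12
print(lerp(p, q, 1.0/3))   # the point with parameter t = 1/3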

Example 2.3

Suppose we have three collinear points, p, q, and r as illustrated in
Sketch 2.15. The points have the following locations:

      p = [2],   r = [6.5],   q = [8].
          [4]        [ 7 ]        [8]

What are the barycentric coordinates of r with respect to p and q?
To answer this, recall the relationship between the ratio and the
barycentric coordinates. The barycentric coordinates t and (1 − t)
define r as

      [6.5] = (1 − t) [2] + t [8].
      [ 7 ]           [4]     [8]

The ratio indicates the location of r relative to p and q in terms of
relative distances. Suppose the ratio is s1 : s2. If we scale s1 and
s2 such that they sum to one, then s1 and s2 are the barycentric
coordinates t and (1 − t), respectively. By calculating the distances
between points:

      l1 = ||r − p|| ≈ 5.4,
      l2 = ||q − r|| ≈ 1.8,
      l3 = l1 + l2 ≈ 7.2,

we find that

      t = l1/l3 = 0.75   and   (1 − t) = l2/l3 = 0.25.

Sketch 2.15.
Barycentric coordinates in relation to lengths.

These are the barycentric coordinates. Let's verify this:

      [6.5] = 0.25 × [2] + 0.75 × [8].
      [ 7 ]          [4]          [8]
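Here is a small Python sketch of this construction (ours, not from the book); math.dist and the function name are our own choices, and the point r is assumed to lie on the line through p and q.

import math

def barycentric_coordinates(p, q, r):
    """Return (1 - t, t) so that r = (1 - t) p + t q, computed from distances."""
    l1 = math.dist(r, p)      # ||r - p||
    l2 = math.dist(q, r)      # ||q - r||
    t = l1 / (l1 + l2)
    return 1.0 - t, t

# Example 2.3: p = (2, 4), q = (8, 8), r = (6.5, 7) gives approximately (0.25, 0.75).
print(barycentric_coordinates((2.0, 4.0), (8.0, 8.0), (6.5, 7.0)))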

The barycentric coordinate t is also called a parameter. (See Sec-
tion 3.2 for more details.) This parameter is defined by the quotient

      t = ||r − p|| / ||q − p||.
We have seen how useful this quotient can be in Section 1.1 for the
construction of a point in the global system that corresponded to a
point with parameter t in the local system.
We can create barycentric combinations with more than two points.
Let’s look at three points p, q, and r, which are not collinear. Any
point s can be formed from
s = r + t1 (p − r) + t2 (q − r).
This is a coordinate independent operation of point + vector + vector.
Expanding and regrouping, we can also define s as
      s = t1 p + t2 q + (1 − t1 − t2) r          (2.8)
        = t1 p + t2 q + t3 r.
Thus, the point s is defined by a barycentric combination with
coefficients t1 , t2 , and t3 = 1 − t1 − t2 with respect to p, q, and r,
respectively. This is another special case where the barycentric combi-
nation coefficients correspond to barycentric coordinates. Sketch 2.16
illustrates this. We will encounter barycentric coordinates in more
detail in Chapter 17.
We can also combine points so that the result is a vector. For this,
we need the coefficients to sum to zero. We encountered a simple case
of this in (2.3). Suppose we have the equation
      e = r − 2p + q,   r, p, q ∈ E2.

Does e have a geometric meaning? Looking at the sum of the coeffi-
cients, 1 − 2 + 1 = 0, we would conclude by the rule above that e is
a vector. How to see this? By rewriting the equation as

      e = (r − p) + (q − p),

it is clear that e is a vector formed from (vector + vector).

Sketch 2.16.
A barycentric combination of three points.

2.6 Independence
Two vectors v and w describe a parallelogram, as shown in Sketch 2.6.
It may happen that this parallelogram has zero area; then the two
vectors are parallel. In this case, we have a relationship of the form
v = cw. If two vectors are parallel, then we call them linearly depen-
dent. Otherwise, we say that they are linearly independent.
Two linearly independent vectors may be used to write any other
vector u as a linear combination:
u = rv + sw.
How to find r and s is described in Chapter 5. Two linearly inde-
pendent vectors in 2D are also called a basis for R2 . If v and w
are linearly dependent, then you cannot write all vectors as a linear
combination of them, as the following example shows.

Example 2.4

Let

      v = [1]   and   w = [2].
          [2]             [4]

If we tried to write the vector

      u = [1]
          [0]

as u = rv + sw, then this would lead to

      1 = r + 2s,          (2.9)
      0 = 2r + 4s.          (2.10)

If we multiply the first equation by a factor of 2, the two right-hand
sides will be equal. Equating the new left-hand sides now results in
the expression 2 = 0. This shows that u cannot be written as a linear
combination of v and w. (See Sketch 2.17.)
Sketch 2.17.
Dependent vectors.

2.7 Dot Product


Given two vectors v and w, we might ask:
• Are they the same vector?

• Are they perpendicular to each other?


• What angle do they form?
The dot product is the tool to resolve these questions. Assume that v
and w are not the zero vector.
To motivate the dot product, let’s start with the Pythagorean the-
orem and Sketch 2.18. There, we see two perpendicular vectors v and
w; we conclude
      ||v − w||² = ||v||² + ||w||².          (2.11)

Sketch 2.18.
Perpendicular vectors.

Writing the components in (2.11) explicitly,

      (v1 − w1)² + (v2 − w2)² = (v1² + v2²) + (w1² + w2²),

and then expanding, bringing all terms to the left-hand side of the
equation yields

      (v1² − 2v1w1 + w1²) + (v2² − 2v2w2 + w2²) − (v1² + v2²) − (w1² + w2²) = 0,

which reduces to

      v1 w1 + v2 w2 = 0.          (2.12)
We find that perpendicular vectors have the property that the sum
of the products of their components is zero. The short-hand vector
notation for (2.12) is
v · w = 0. (2.13)
This result has an immediate application: a vector w perpendicular
to a given vector v can be formed as
 
      w = [−v2]
          [ v1]

(switching components and negating the sign of one). Then v · w
becomes v1(−v2) + v2 v1 = 0.
If we take two arbitrary vectors v and w, then v · w will in general
not be zero. But we can compute it anyway, and define

s = v · w = v1 w1 + v2 w2 (2.14)

to be the dot product of v and w. Notice that the dot product returns
a scalar s, which is why it is also called a scalar product. (Mathemati-
cians have yet another name for the dot product—an inner product.
See Section 14.3 for more on these.) From (2.14) it is clear that

v · w = w · v.

This is called the symmetry property. Other properties of the dot


product are given in the Exercises.
In order to understand the geometric meaning of the dot product
of two vectors, let’s construct a triangle from two vectors v and w as
illustrated in Sketch 2.19.
From trigonometry, we know that the height h of the triangle can
be expressed as

      h = ||w|| sin(θ).

Squaring both sides results in

      h² = ||w||² sin²(θ).

Sketch 2.19.
Geometry of the dot product.

Using the identity

      sin²(θ) + cos²(θ) = 1,

we have

      h² = ||w||²(1 − cos²(θ)).          (2.15)

We can also express the height h with respect to the other right
triangle in Sketch 2.19 and by using the Pythagorean theorem:

      h² = ||v − w||² − (||v|| − ||w|| cos θ)².          (2.16)

Equating (2.15) and (2.16) and simplifying, we have the expression

      ||v − w||² = ||v||² + ||w||² − 2 ||v|| ||w|| cos θ.          (2.17)

We have just proved the Law of Cosines, which generalizes the Pythag-
orean theorem by correcting it for triangles with an opposing angle
different from 90°.
We can formulate another expression for ||v − w||² by explicitly
writing out

      ||v − w||² = (v − w) · (v − w)
                 = ||v||² − 2 v · w + ||w||².          (2.18)

By equating (2.17) and (2.18) we find that

      v · w = ||v|| ||w|| cos θ.          (2.19)

Here is another expression for the dot product—it is a very useful one!
Rearranging (2.19), the cosine of the angle between the two vectors
can be determined as

      cos θ = (v · w) / (||v|| ||w||).          (2.20)

Figure 2.5.
Cosine function: its values at θ = 0°, θ = 90°, and θ = 180° are important to remember.

By examining a plot of the cosine function in Figure 2.5, some sense


can be made of (2.20).
First we consider the special case of perpendicular vectors. Re-
call the dot product was zero, which makes cos(90◦ ) = 0, just as it
should be.
If v has the same (or opposite) direction as w, that is v = kw,
then (2.20) becomes
      cos θ = (kw · w) / (||kw|| ||w||).

Using (2.5), we have

      cos θ = k ||w||² / (|k| ||w|| ||w||) = ±1.

Again, examining Figure 2.5, we see this corresponds to either θ = 0◦


or θ = 180◦ , for vectors of the same or opposite direction, respectively.
The cosine values from (2.20) range between ±1; this corresponds to
angles between 0◦ and 180◦ (or 0 and π radians). Thus, the smaller
angle between the two vectors is measured. This is clear from the
derivation: the angle θ enclosed by completing the triangle defined
by the two vectors must be less than 180◦ . Three types of angles can
be formed:

• right: cos(θ) = 0 → v · w = 0;

• acute: cos(θ) > 0 → v · w > 0;

• obtuse: cos(θ) < 0 → v · w < 0.

These are illustrated in counterclockwise order from twelve o'clock in
Sketch 2.20.

Sketch 2.20.
Three types of angles.

If the actual angle θ needs to be calculated, then the arccosine


function has to be invoked: let
      s = (v · w) / (||v|| ||w||),
then θ = acos(s) where acos is short for arccosine. One word of
warning: in some math libraries, if s > 1 or s < −1 then an error
occurs and a nonusable result (NaN—Not a Number) is returned.
Thus, if s is calculated, it is best to check that its value is within
the appropriate range. It is not uncommon that an intended value
of s = 1.0 is actually something like s = 1.0000001 due to round-
off. Thus, the arccosine function should be used with caution. In
many instances, as in comparing angles, the cosine of the angle is
all you need! Additionally, computing the cosine or sine is 40 times
more expensive than a multiplication, meaning that a cosine operation
might take 200 cycles (operations) and a multiplication might take 5
cycles. Arccosine and arcsine are yet more expensive.
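The warning above translates directly into code. A minimal Python sketch (ours, not the book's; the function names and the clamping strategy are our own choices):

import math

def cos_angle(v, w):
    """Cosine of the angle between two nonzero 2D vectors, as in (2.20)."""
    return (v[0]*w[0] + v[1]*w[1]) / (math.hypot(v[0], v[1]) * math.hypot(w[0], w[1]))

def angle(v, w):
    """Angle between v and w in radians, clamping against round-off such as 1.0000001."""
    s = max(-1.0, min(1.0, cos_angle(v, w)))
    return math.acos(s)

# Example 2.5 below: the angle between (2, 1) and (-1, 0) is about 153.4 degrees.
print(math.degrees(angle((2.0, 1.0), (-1.0, 0.0))))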

Example 2.5

Let's calculate the angle between the two vectors illustrated in
Sketch 2.21, forming an obtuse angle:

      v = [2]   and   w = [−1].
          [1]             [ 0]

Calculate the length of each vector:

      ||v|| = √(2² + 1²) = √5,
      ||w|| = √((−1)² + 0²) = 1.

The cosine of the angle between the vectors is calculated using (2.20)
as

      cos(θ) = ((2 × −1) + (1 × 0)) / (√5 × 1) = −2/√5 ≈ −0.8944.

Then

      arccos(−0.8944) ≈ 153.4°.

To convert an angle given in degrees to radians, multiply by π/180°.
(Recall that π ≈ 3.14159 radians.) This means that

      2.677 radians ≈ 153.4° × π/180°.

Sketch 2.21.
The angle between two vectors.

2.8 Orthogonal Projections


Sketch 2.19 illustrates that the projection of the vector w onto v
creates a footprint of length b = ||w|| cos(θ). This we derive from basic
trigonometry: cos(θ) = b/hypotenuse. The orthogonal projection of
w onto v is then the vector
      u = (||w|| cos(θ)) (v/||v||) = ((v · w)/||v||²) v.          (2.21)
Sometimes this projection is expressed as

u = projV1 w,

where V1 is the set of all 2D vectors kv and it is referred to as a one-


dimensional subspace of R2 . Therefore, u is the best approximation to
w in the subspace V1 . This concept of closest or best approximation
will be needed for several problems, such as finding the point at the
end of the footprint in Section 3.7 and for least squares approxima-
tions in Section 12.7. We will revisit subspaces with more rigor in
Chapter 14.
Using the orthogonal projection, it is easy to decompose the 2D
vector w into a sum of two perpendicular vectors, namely u and u⊥
(a vector perpendicular to u), such that

w = u + u⊥ . (2.22)

Another way to state this: we have resolved w into components with


respect to two other vectors. Already having found the vector u, we
now set
      u⊥ = w − ((v · w)/||v||²) v.
This can also be written as

u⊥ = w − projV1 w,

and thus u⊥ is the component of w orthogonal to the space of u.
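A short Python sketch of this decomposition (our addition, not from the book; the function name is ours and v is assumed to be nonzero):

def decompose(w, v):
    """Split the 2D vector w into u + u_perp with respect to a nonzero vector v."""
    factor = (v[0]*w[0] + v[1]*w[1]) / (v[0]**2 + v[1]**2)   # (v . w) / ||v||^2
    u = (factor * v[0], factor * v[1])                       # orthogonal projection onto v
    u_perp = (w[0] - u[0], w[1] - u[1])                      # component orthogonal to v
    return u, u_perp

# Sanity check: u + u_perp recovers w, and u_perp . v = 0.
u, u_perp = decompose((3.0, 2.0), (1.0, -1.0))
print(u, u_perp)                            # (0.5, -0.5) and (2.5, 2.5)
print(u_perp[0]*1.0 + u_perp[1]*(-1.0))     # 0.0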


See the Exercises of Chapter 8 for a 3D version of this decompo-
sition. Orthogonal projections and vector decomposition are at the
core of constructing the Gram-Schmidt orthonormal coordinate frame
in Section 11.8 for 3D and in Section 14.4 for higher dimensions. An
application that uses this frame is discussed in Section 20.7.
The ability to decompose a vector into its component parts is key to
Fourier analysis, quantum mechanics, digital audio, and video record-
ing.

2.9 Inequalities

Here are two important inequalities when dealing with vector lengths.
Let’s start with the expression from (2.19), i.e.,

      v · w = ||v|| ||w|| cos θ.

Squaring both sides gives

      (v · w)² = ||v||² ||w||² cos² θ.

Noting that 0 ≤ cos² θ ≤ 1, we conclude that

      (v · w)² ≤ ||v||² ||w||².          (2.23)

This is the Cauchy-Schwarz inequality. Equality holds if and only


if v and w are linearly dependent. This inequality is fundamental
in the study of more general vector spaces, which are presented in
Chapter 14.
Suppose we would like to find an inequality that describes the re-
lationship between the length of two vectors v and w and the length
of their sum v + w. In other words, how does the length of the third
side of a triangle relate to the lengths of the other two? Let’s begin
with expanding ||v + w||²:

      ||v + w||² = (v + w) · (v + w)
                 = v · v + 2 v · w + w · w
                 ≤ v · v + 2 |v · w| + w · w
                 ≤ v · v + 2 ||v|| ||w|| + w · w          (2.24)
                 = ||v||² + 2 ||v|| ||w|| + ||w||²
                 = (||v|| + ||w||)².

Taking square roots gives

      ||v + w|| ≤ ||v|| + ||w||,

which is known as the triangle inequality. It states the intuitively
obvious fact that the sum of any two edge lengths in a triangle is
never smaller than the length of the third edge; see Sketch 2.22 for
an illustration.

Sketch 2.22.
The triangle inequality.

• point versus vector
• coordinates versus components
• E2 versus R2
• coordinate independent
• vector length
• unit vector
• zero divide tolerance
• Pythagorean theorem
• distance between two points
• parallelogram rule
• scaling
• ratio
• barycentric combination
• linear interpolation
• convex combination
• barycentric coordinates
• linearly dependent vectors
• linear combination
• basis for R2
• dot product
• Law of Cosines
• perpendicular vectors
• angle between vectors
• orthogonal projection
• vector decomposition
• Cauchy-Schwarz inequality
• triangle inequality

2.10 Exercises
1. Illustrate the parallelogram rule applied to the vectors
   
      v = [−2]   and   w = [2].
          [ 1]             [1]

2. The parallelogram rule states that adding or subtracting two vectors, v


and w, yields another vector. Why is it called the parallelogram rule?
3. Define your own p, q ∈ E2 and v, w ∈ R2 . Determine which of the
following expressions are geometrically meaningful. Illustrate those
that are.
      (a) p + q                    (b) (1/2)p + (1/2)q
      (c) p + v                    (d) 3p + v
      (e) v + w                    (f) 2v + (1/2)w
      (g) v − 2w                   (h) (3/2)p − (1/2)q

4. Suppose we are given p, q ∈ E2 and v, w ∈ R2 . Do the following
   operations result in a point or a vector?

      (a) p − q                    (b) (1/2)p + (1/2)q
      (c) p + v                    (d) 3v
      (e) v + w                    (f) p + (1/2)w

5. What barycentric combination of the points p and q results in the


midpoint of the line through these two points?
6. Illustrate a point with barycentric coordinates (1/2, 1/4, 1/4) with re-
spect to three other points.
7. Consider two points. Form the set of all convex combinations of these
points. What is the geometry of this set?
8. Consider three noncollinear points. Form the set of all convex combi-
nations of these points. What is the geometry of this set?
9. What is the length of the vector

      v = [−4] ?
          [−3]

10. What is the magnitude of the vector

      v = [ 3] ?
          [−3]

11. If a vector v is length 10, then what is the length of the vector −2v?

12. Find the distance between the points

      p = [3]   and   q = [−2].
          [3]             [−3]

13. Find the distance between the points

      p = [−3]   and   q = [−2].
          [−3]             [−3]

14. What is the length of the unit vector u?

15. Normalize the vector

      v = [−4].
          [−3]

16. Normalize the vector

      v = [4].
          [2]

17. Given points

      p = [1],   q = [7],   r = [3],
          [1]        [7]        [3]

    what are the barycentric coordinates of r with respect to p and q?

18. Given points

      p = [1],   q = [7],   r = [5],
          [1]        [7]        [5]

    what are the barycentric coordinates of r with respect to p and q?
19. If v = 4w, are v and w linearly independent?
20. If v = 4w, what is the area of the parallelogram spanned by v and w?
21. Do the vectors

      v1 = [1]   and   v2 = [3]
           [4]              [0]

    form a basis for R2?

22. What linear combination allows us to express u with respect to v1 and
    v2, where

      u = [6],   v1 = [1],   v2 = [4]?
          [4]         [0]         [4]

23. Show that the dot product has the following properties for vectors
u, v, w ∈ R2 .
      u · v = v · u                                        symmetric
      v · (sw) = s(v · w)                                  homogeneous
      (v + w) · u = v · u + w · u                          distributive
      v · v > 0 if v ≠ 0,   and   v · v = 0 if v = 0       positive

24. What is v · w where

      v = [5]   and   w = [0]?
          [4]             [1]

    What is the scalar product of w and v?

25. Compute the angle (in degrees) formed by the vectors

      v = [5]   and   w = [ 3].
          [5]             [−3]

26. Compute the cosine of the angle formed by the vectors

      v = [5]   and   w = [0].
          [5]             [4]

Is the angle less than or greater than 90◦ ?


27. Are the following angles acute, obtuse, or right?

cos θ1 = −0.7 cos θ2 = 0 cos θ3 = 0.7

28. Given the vectors

      v = [ 1]   and   w = [3],
          [−1]             [2]

    find the orthogonal projection u of w onto v. Decompose w into
    components u and u⊥.

29. For

      v = [1/√2]   and   w = [1]
          [1/√2]             [3]

    find

      u = projV1 w,

    where V1 is the set of all 2D vectors kv, and find

      u⊥ = w − projV1 w.

30. Given vectors v and w, is it possible for (v · w)² to be greater than
    ||v||² ||w||²?

31. Given vectors v and w, under what conditions is (v · w)² equal to
    ||v||² ||w||²? Give an example.

32. Given vectors v and w, can ||v + w|| be longer than ||v|| + ||w||?

33. Given vectors v and w, under what conditions is ||v + w|| = ||v|| + ||w||?
    Give an example.
3
Lining Up: 2D Lines

Figure 3.1.
Moiré patterns: overlaying two sets of lines at an angle results in an interesting pattern.

“Real” objects are three-dimensional, or 3D. So why should we con-


sider 2D objects, such as the 2D lines in this chapter? Because they
really are the building blocks for geometric constructions and play a
key role in many applications. We’ll look at various representations


for lines, where each is suited for particular applications. Once we can
represent a line, we can perform intersections and determine distances
from a line.
Figure 3.1 shows how interesting playing with lines can be. Two
sets of parallel lines are overlaid and the resulting interference pattern
is called a Moiré pattern. Such patterns are used in optics for checking
the properties of lenses.

3.1 Defining a Line


As illustrated in Sketch 3.1, two elements of 2D geometry define a
line:

• two points;

• a point and a vector parallel to the line;

• a point and a vector perpendicular to the line.

Sketch 3.1.
Elements to define a line.

The unit vector that is perpendicular (or orthogonal) to a line is
referred to as the normal to the line. Figure 3.2 shows two families
of lines: one family of lines shares a common point and the other
family of lines shares the same normal. Just as there are different
ways to specify a line geometrically, there are different mathematical
representations: parametric, implicit, and explicit. Each representa-
tion will be examined and the advantages of each will be explained.
Additionally, we will explore how to convert from one form to another.

Figure 3.2.
Families of lines: one family shares a common point and the other shares a common
normal.

3.2 Parametric Equation of a Line


The parametric equation of a line l(t) has the form

l(t) = p + tv, (3.1)

where p ∈ E2 and v ∈ R2 . The scalar value t is the parameter. (See


Sketch 3.2.) Evaluating (3.1) for a specific parameter t = t̂, generates
a point on the line.
We encountered (3.1) in Section 2.5 in the context of barycentric
coordinates. Interpreting v as a difference of points, v = q − p, this
equation was reformulated as

l(t) = (1 − t)p + tq. (3.2)

A parametric line can be written in the form of either (3.1) or (3.2).


The latter is typically referred to as linear interpolation.
One way to interpret the parameter t is as time; at time t = 0 we
will be at point p and at time t = 1 we will be at point q. Sketch 3.2
illustrates that as t varies between zero and one, t ∈ [0, 1], points are
generated on the line between p and q. Recall from Section 2.5 that
these values of t constitute a convex combination, which is a special
case of a barycentric combination. If the parameter is a negative
number, that is t < 0, the direction of v reverses, generating points
on the line "behind" p. The case t > 1 is similar: this scales v so
that it is elongated, which generates points "past" q. In the context
of linear interpolation, when t < 0 or t > 1, it is called extrapolation.

Sketch 3.2.
Parametric form of a line.
The parametric form is very handy for computing points on a line.
For example, to compute ten equally spaced points on the line seg-
ment between p and q, simply define ten values of t ∈ [0, 1] as

t = i/9, i = 0, . . . , 9.

(Be sure this is a floating point calculation when programming!)


Equally spaced parameter values correspond to equally spaced points.

Example 3.1

Compute five points on the line defined by the points


   
      p = [1]   and   q = [6].
          [2]             [4]

Define v = q − p, then the line is defined as

      l(t) = [1] + t [5].
             [2]     [2]

Generate five t-values as

      t = i/4,   i = 0, . . . , 4.

Plug each t-value into the formulation for l(t):

      i = 0,  t = 0:     l(0)   = [1]
                                  [2];

      i = 1,  t = 1/4:   l(1/4) = [9/4]
                                  [5/2];

      i = 2,  t = 2/4:   l(2/4) = [7/2]
                                  [ 3 ];

      i = 3,  t = 3/4:   l(3/4) = [19/4]
                                  [ 7/2];

      i = 4,  t = 1:     l(1)   = [6]
                                  [4].

Plot these values for yourself to verify them.
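The same computation as a Python sketch (our addition, not the book's; the function name is ours and n ≥ 2 is assumed):

def line_points(p, q, n):
    """n equally spaced points on the segment from p to q, using l(t) = p + t(q - p)."""
    v = (q[0] - p[0], q[1] - p[1])
    pts = []
    for i in range(n):
        t = i / (n - 1)                 # floating point division, as cautioned above
        pts.append((p[0] + t * v[0], p[1] + t * v[1]))
    return pts

# Example 3.1: five points between p = (1, 2) and q = (6, 4).
for point in line_points((1.0, 2.0), (6.0, 4.0), 5):
    print(point)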

As you can see, the position of the point p and the direction and
length of the vector v determine which points on the line are gener-
ated as we increment through t ∈ [0, 1]. This particular artifact of
the parametric equation of a line is called the parametrization. The
parametrization is related to the speed at which a point traverses
the line. We may affect this speed by scaling v: the larger the scale
factor, the faster the point’s motion!

3.3 Implicit Equation of a Line


Another way to represent the same line is to use the implicit equation
of a line. For this representation, we start with a point p, and as
illustrated in Sketch 3.3, construct a vector a that is perpendicular
to the line.
For any point x on the line, it holds that

a · (x − p) = 0. (3.3)

This says that a and the vector (x − p) are perpendicular. If a has


unit length, it is called the normal to the line, and then (3.3) is the
point normal form of a line. Expanding this equation, we get
      a1 x1 + a2 x2 + (−a1 p1 − a2 p2) = 0.

Commonly, this is written as

      a x1 + b x2 + c = 0,          (3.4)

where

      a = a1,          (3.5)
      b = a2,          (3.6)
      c = −a1 p1 − a2 p2.          (3.7)

Sketch 3.3.
Implicit form of a line.
Equation (3.4) is called the implicit equation of the line.

Example 3.2

Following Sketch 3.4, suppose we know two points,


   
      p = [2]   and   q = [6],
          [2]             [4]

on the line. To construct the coefficients a, b, and c in (3.4), first
form the vector

      v = q − p = [4].
                  [2]

Now construct a vector a that is perpendicular to v:

      a = [−v2] = [−2].          (3.8)
          [ v1]   [ 4]

Note, equally as well, we could have chosen a to be

      [ 2].
      [−4]

Sketch 3.4.
Implicit construction.
The coefficients a and b in (3.5) and (3.6) are now defined as a = −2
and b = 4. With p as defined above, solve for c as in (3.7). In this
example,
c = 2 × 2 − 4 × 2 = −4.
The implicit equation of the line is complete:
−2x1 + 4x2 − 4 = 0.
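As a computational companion to this construction (our own sketch, not the book's code; the function names are our choices), the coefficients of (3.4) and the signed distance of Section 3.6 can be computed as follows.

import math

def implicit_from_points(p, q):
    """Coefficients (a, b, c) of a x1 + b x2 + c = 0 through p and q, as in (3.5)-(3.7)."""
    v = (q[0] - p[0], q[1] - p[1])
    a, b = -v[1], v[0]              # a vector perpendicular to v
    c = -(a * p[0] + b * p[1])
    return a, b, c

def signed_distance(a, b, c, x):
    """Signed distance of the point x to the line; the sign tells the side, see (3.9)."""
    return (a * x[0] + b * x[1] + c) / math.hypot(a, b)

# Example 3.2: the line through (2, 2) and (6, 4) is -2 x1 + 4 x2 - 4 = 0.
coeffs = implicit_from_points((2.0, 2.0), (6.0, 4.0))
print(coeffs)                                   # (-2.0, 4.0, -4.0)
print(signed_distance(*coeffs, (0.0, 3.0)))     # about 1.79, as in Example 3.3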

The implicit form is very useful for deciding if an arbitrary point lies
on the line. To test if a point x is on the line, just plug its coordinates
into (3.4). If the value f of the left-hand side of this equation,

f = ax1 + bx2 + c,

is zero then the point is on the line.


A numerical caveat is needed here. Checking equality with floating
point numbers should never be done. Instead, a tolerance ε around
zero must be used. What is a meaningful tolerance in this situation?
We’ll see in Section 3.6 that
      d = f / ||a||          (3.9)

reflects the true distance of x to the line. Now the tolerance has a
physical meaning, which makes it much easier to specify. Sketch 3.3
illustrates the physical relationship of this tolerance to the line.
The sign of d indicates on which side of the line the point lies. This
sign is dependent upon the definition of a. (Remember, there were
two possible orientations.) Positive d corresponds to the point on the
side of the line to which a points.

Example 3.3

Let’s continue with our example for the line

−2x1 + 4x2 − 4 = 0,

as illustrated in Sketch 3.4. We want to test if the point


 
      x = [0]
          [1]

lies on the line. First, calculate

      ||a|| = √((−2)² + 4²) = √20.

The distance is

      d = (−2 × 0 + 4 × 1 − 4)/√20 = 0/√20 = 0,

which indicates the point is on the line.



Test the point

      x = [0].
          [3]

For this point,

      d = (−2 × 0 + 4 × 3 − 4)/√20 = 8/√20 ≈ 1.79.

Checking Sketch 3.3, this is a positive number, indicating that it is
on the same side of the line as the direction of a. Check for yourself
that d does indeed reflect the actual distance of this point to the line.
Test the point

      x = [0].
          [0]

Calculating the distance for this point, we get

      d = (−2 × 0 + 4 × 0 − 4)/√20 = −4/√20 ≈ −0.894.

Checking Sketch 3.3, this is a negative number, indicating it is on the


opposite side of the line as the direction of a.

From Example 3.3, we see that if we want to know the distance


of many points to the line, it is more economical to represent the
implicit line equation with each coefficient divided by a,

ax1 + bx2 + c
= 0,
a

which we know as the point normal form. The point normal form of
the line from the example above is
2 4 4
− √ x1 + √ x2 − √ = 0.
20 20 20
Examining (3.4) you might notice that a horizontal line takes the
form
bx2 + c = 0.
This line intersects the e2 -axis at −c/b. A vertical line takes the form

ax1 + c = 0.

This line intersects the e1 -axis at −c/a. Using the implicit form, these
lines are in no need of special handling.

3.4 Explicit Equation of a Line

The explicit equation of a line is the third possible representation.


The explicit form is closely related to the implicit form in (3.4). It
expresses x2 as a function of x1 : rearranging the implicit equation
we have
      x2 = −(a/b) x1 − c/b.
A more typical way of writing this is

x2 = âx1 + b̂,

where â = −a/b and b̂ = −c/b.


The coefficients have geometric meaning: â is the slope of the line
and b̂ is the e2 -intercept. Sketch 3.5 illustrates the geometry of the
coefficients for the line

x2 = 1/3x1 + 1.

The slope measures the steepness of the line as a ratio of the change
in x2 to a change in x1 : “rise/run,” or more precisely tan(θ). The
e2 -intercept indicates that the line passes through (0, b̂).
Sketch 3.5.
A line in explicit form.

Immediately, a drawback of the explicit form is apparent. If the
"run" is zero then the (vertical) line has infinite slope. This makes
life very difficult when programming! When we study transformations
(e.g., changing the orientation of some geometry) in Chapter 6, we
will see that infinite slopes actually arise often.
The primary popularity of the explicit form comes from the study
of calculus. Additionally, in computer graphics, this form is popular
when pixel calculation is necessary. Examples are Bresenham’s line
drawing algorithm and scan line polygon fill algorithms (see [10]).

3.5 Converting Between Parametric and


Implicit Equations

As we have discussed, there are advantages to both the parametric


and implicit representations of a line. Depending on the geometric
algorithm, it may be convenient to use one form rather than the other.
We’ll ignore the explicit form, since as we said, it isn’t very useful for
general 2D geometry.

3.5.1 Parametric to Implicit


Given: The line l in parametric form,

l : l(t) = p + tv.

Find: The coefficients a, b, c that define the implicit equation of the


line
l : ax1 + bx2 + c = 0.

Solution: First form a vector a that is perpendicular to the vector v.


Choose

      a = [−v2].
          [ v1]
This determines the coefficients a and b, as in (3.5) and (3.6), respec-
tively. Simply let a = a1 and b = a2 . Finally, solve for the coefficient
c as in (3.7). Taking p from l(t) and a, form

c = −(a1 p1 + a2 p2 ).

We stepped through a numerical example of this in the derivation


of the implicit form in Section 3.3, and it is illustrated in Sketch 3.4.
In this example, l(t) is given as
   
      l(t) = [2] + t [4].
             [2]     [2]

3.5.2 Implicit to Parametric


Given: The line l in implicit form,

l : ax1 + bx2 + c = 0.

Find: The line l in parametric form,

l : l(t) = p + tv.

Solution: Recognize that we need one point on the line and a vec-
tor parallel to the line. The vector is easy: simply form a vector
perpendicular to a of the implicit line. For example, we could set
 
      v = [ b].
          [−a]

Next, find a point on the line. Two candidate points are the inter-
sections with the e1 - or e2 -axis,
   
      [−c/a]   or   [  0 ],
      [  0 ]        [−c/b]
respectively. For numerical stability, let’s choose the intersection clos-
est to the origin. Thus, we choose the former if |a| > |b|, and the latter
otherwise.

Example 3.4

Revisit the numerical example from the implicit form derivation in


Section 3.3; it is illustrated in Sketch 3.4. The implicit equation of
the line is
−2x1 + 4x2 − 4 = 0.
We want to find a parametric equation of this line,
l : l(t) = p + tv.
First form

      v = [4].
          [2]

Now determine which is greater in absolute value, a or b. Since
|−2| < |4|, we choose

      p = [ 0 ] = [0].
          [4/4]   [1]

The parametric equation is

      l(t) = [0] + t [4].
             [1]     [2]
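The conversion in this example follows a simple recipe, sketched below in Python (our addition, not the book's; the function name is ours, and a valid line with a and b not both zero is assumed).

def parametric_from_implicit(a, b, c):
    """A point p and direction v for the line a x1 + b x2 + c = 0 (Section 3.5.2)."""
    v = (b, -a)                       # perpendicular to the normal direction (a, b)
    if abs(a) > abs(b):
        p = (-c / a, 0.0)             # intersection with the e1-axis
    else:
        p = (0.0, -c / b)             # intersection with the e2-axis
    return p, v

# Example 3.4: -2 x1 + 4 x2 - 4 = 0 gives p = (0, 1) and v = (4, 2).
print(parametric_from_implicit(-2.0, 4.0, -4.0))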

The implicit and parametric forms both allow an infinite number


of representations for the same line. In fact, in the example we just
finished, the loop
parametric → implicit → parametric
produced two different parametric forms. We started with
   
      l(t) = [2] + t [4],
             [2]     [2]

and ended with

      l(t) = [0] + t [4].
             [1]     [2]

We could have just as easily generated the line

      l(t) = [0] + t [−4],
             [1]     [−2]

if v was formed with the rule

      v = [−b].
          [ a]

Sketch 3.6 illustrates the first and third parametric representations of


this line.
These three parametric forms represent the same line! However, the
manner in which the lines will be traced will differ. This is referred
to as the parametrization of the line. We already encountered this
concept in Section 3.2.

Sketch 3.6.
Two parametric representations for the same line.

3.6 Distance of a Point to a Line


If you are given a point r and a line l, how far is that point from the
line? For example, as in Figure 3.3 (left), suppose a robot will travel
along the line and the points represent objects in the room. The
robot needs a certain clearance as it moves, so we must check that
no point is closer than a given tolerance; it is therefore necessary to
check the distance of each point to the line. In Section 2.8 on orthogonal

Figure 3.3.
Left: robot path application where clearance around the robot’s path must be mea-
sured. Right: measuring the perpendicular distance of each point to the line.

projections, we learned that the smallest distance d(r, l) of a point


to a line is the orthogonal or perpendicular distance. This distance is
illustrated in Figure 3.3 (right).

3.6.1 Starting with an Implicit Line


Suppose our problem is formulated as follows:

Given: A line l in implicit form defined by (3.3) or (3.4) and a point r.

Find: d(r, l), or d for brevity.

Solution: The implicit line is given by coefficients a, b, and c, and thus


also the vector

      a = [a].
          [b]

The distance is then

      d = (a r1 + b r2 + c) / ||a||,

or in vector notation

      d = a · (r − p) / ||a||,
where p is a point on the line.
Let’s investigate why this is so. Recall that the implicit equation
of a line was derived through use of the dot product

a · (x − p) = 0,

as in (3.3); a line is given by a point p and a vector a normal to the


line. Any point x on the line will satisfy this equality.
Sketch 3.7.
Distance point to line.

As in Sketch 3.7, we now consider a point r that is clearly not on
the line. As a result, the equality will not be satisfied; however, let's
assign a value v to the left-hand side:

v = a · (r − p).

To simplify, define w = r − p, as in Sketch 3.7. Recall the definition


of the dot product in (2.19) as

      v · w = ||v|| ||w|| cos θ.

Thus, the expression for v becomes

      v = a · w = ||a|| ||w|| cos(θ).          (3.10)



The right triangle in Sketch 3.7 allows for an expression for cos(θ) as
      cos(θ) = d / ||w||.

Substituting this into (3.10), we have

      v = ||a|| d.

This indicates that the actual distance of r to the line is

      d = v/||a|| = a · (r − p)/||a|| = (a r1 + b r2 + c)/||a||.          (3.11)

If many points will be checked against a line, it is advantageous
to store the line in point normal form. This means that ||a|| = 1,
eliminating the division in (3.11).

Example 3.5

Start with the line and the point


 
      l : 4x1 + 2x2 − 8 = 0;      r = [5].
                                      [3]

Find the distance from r to the line. (Draw your own sketch for this
example, similar to Sketch 3.7.)
First, calculate

      ||a|| = √(4² + 2²) = 2√5.

Then the distance is

      d(r, l) = (4 × 5 + 2 × 3 − 8) / (2√5) = 9/√5 ≈ 4.02.

As another exercise, let's rewrite the line in point normal form with
coefficients

      â = 4/(2√5) = 2/√5,
      b̂ = 2/(2√5) = 1/√5,
      ĉ = c/||a|| = −8/(2√5) = −4/√5,

thus making the point normal form of the line

      (2/√5) x1 + (1/√5) x2 − 4/√5 = 0.

3.6.2 Starting with a Parametric Line


Alternatively, suppose our problem is formulated as follows:

Given: A line l in parametric form, defined by a point p and a vector


v, and a point r.

Find: d(r, l), or d for brevity. Again, this is illustrated in Sketch 3.7.
Solution: Form the vector w = r − p. Use the relationship
      d = ||w|| sin(α).

Later in Section 8.2, we will see how to express sin(α) directly in
terms of v and w; for now, we express it in terms of the cosine:

      sin(α) = √(1 − cos²(α)),

and as before

      cos(α) = (v · w) / (||v|| ||w||).
Thus, we have defined the distance d.

Example 3.6

We’ll use the same line as in the previous example, but now it will be
given in parametric form as
   
      l(t) = [0] + t [ 2].
             [4]     [−4]

We'll also use the same point

      r = [5].
          [3]

Add any new vectors for this example to the sketch you drew for the
previous example.
First create the vector

      w = [5] − [0] = [ 5].
          [3]   [4]   [−1]

Next calculate ||w|| = √26 and ||v|| = √20. Compute

      cos(α) = ((2 × 5) + (−4 × −1)) / (√26 × √20) ≈ 0.614.

Thus, the distance to the line becomes


      d(r, l) ≈ √26 · √(1 − 0.614²) ≈ 4.02,

which rightly produces the same result as the previous example.

3.7 The Foot of a Point


Section 3.6 detailed how to calculate the distance of a point from a
line. A new question arises: which point on the line is closest to the
point? This point will be called the foot of the given point.
If you are given a line in implicit form, it is best to convert it to
parametric form for this problem. This illustrates how the implicit
form is handy for testing if a point is on the line; however, it is not
as handy for finding points on the line.
The problem at hand is thus:

Given: A line l in parametric form, defined by a point p and a vector


v, and another point r.

Find: The point q on the line that is closest to r. (See Sketch 3.8.)

Solution: The point q can be defined as

q = p + tv, (3.12)

so our problem is solved once we have found the scalar factor t. From
Sketch 3.8, we see that
      cos(θ) = t ||v|| / ||w||,

where w = r − p. Using

      cos(θ) = (v · w) / (||v|| ||w||),

we find

      t = (v · w) / ||v||².

Sketch 3.8.
Closest point q on line to point r.
Example 3.7
Given: The parametric line l defined as
   
      l(t) = [0] + t [0],
             [1]     [2]

and point

      r = [3].
          [4]

Find: The point q on l that is closest to r. This example is easy
enough to find the answer by simply drawing a sketch, but let's go
through the steps.

Solution: Define the vector

      w = r − p = [3] − [0] = [3].
                  [4]   [1]   [3]

Compute v · w = 6 and ||v|| = 2. Thus, t = 3/2 and

      q = [0].
          [4]

Try this example with r = [ 2].
                          [−1]
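A Python sketch of this computation (ours, not the book's; the function name is our choice and v is assumed to be nonzero):

def foot_of_point(p, v, r):
    """Closest point on the line l(t) = p + t v to the point r, via t = (v . w) / ||v||^2."""
    w = (r[0] - p[0], r[1] - p[1])
    t = (v[0]*w[0] + v[1]*w[1]) / (v[0]**2 + v[1]**2)
    return (p[0] + t * v[0], p[1] + t * v[1])

# Example 3.7: the foot of r = (3, 4) on l(t) = (0, 1) + t (0, 2) is (0, 4).
print(foot_of_point((0.0, 1.0), (0.0, 2.0), (3.0, 4.0)))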

3.8 A Meeting Place: Computing Intersections


Finding a point in common between two lines is done many times
over in a CAD or graphics package. Take for example Figure 3.4: the
top part of the figure shows a great number of intersecting lines. In
order to color some of the areas, as in the bottom part of the figure,
it is necessary to know the intersection points. Intersection problems
arise in many other applications. The first question to ask is what
type of information do you want:

• Do you want to know merely whether the lines intersect?

• Do you want to know the point at which they intersect?

• Do you want a parameter value on one or both lines for the inter-
section point?

The particular question(s) you want to answer along with the line
representation(s) will determine the best method for solving the in-
tersection problem.

Figure 3.4.
Intersecting lines: the top figure may be drawn without knowing where the shown lines
intersect. By finding line/line intersections (bottom), it is possible to color areas—
creating an artistic image!

3.8.1 Parametric and Implicit

We then want to solve the following:

Given: Two lines l1 and l2 :


l1 : l1 (t) = p + tv
l2 : ax1 + bx2 + c = 0.
Find: The intersection point i. See Sketch 3.9 for an illustration.¹
Solution: We will approach the problem by finding the specific pa-
rameter t̂ with respect to l1 of the intersection point.
This intersection point, when inserted into the equation of l2, will
cause the left-hand side to evaluate to zero:

      a[p1 + t̂ v1] + b[p2 + t̂ v2] + c = 0.

Sketch 3.9.
Parametric and implicit line intersection.

¹In Section 3.3, we studied the conversion from the geometric elements of a
point q and perpendicular vector a to the implicit line coefficients.

This is one equation and one unknown! Just solve for t̂,
      t̂ = (−c − a p1 − b p2) / (a v1 + b v2),          (3.13)
then i = l(t̂).
But wait—we must check if the denominator of (3.13) is zero be-
fore carrying out this calculation. Besides causing havoc numerically,
what else does a zero denominator infer? The denominator
denom = av1 + bv2
can be rewritten as
denom = a · v.
We know from Section 2.7 that a zero dot product implies that two vectors
are perpendicular. Since a is perpendicular to the line l2 in implicit
form, the lines are parallel if
a · v = 0.
Of course, we always check for equality within a tolerance. A phys-
ically meaningful tolerance is best. Thus, it is better to check the
quantity
      cos(θ) = (a · v) / (||a|| ||v||);          (3.14)
the tolerance will be the cosine of an angle. It usually suffices to
use a tolerance between cos(0.1◦ ) and cos(0.5◦ ). Angle tolerances are
particularly nice to have because they are dimension independent.
Note that we do not need to use the actual angle, just the cosine of
the angle.
If the test in (3.14) indicates the lines are parallel, then we might
want to determine if the lines are identical. By simply plugging in
the coordinates of p into the equation of l2 , and computing, we get
      d = (a p1 + b p2 + c) / ||a||.
If d is equal to zero (within tolerance), then the lines are identical.

Example 3.8

Given: Two lines l1 and l2 ,


   
      l1 : l1(t) = [0] + t [−2]
                   [3]     [−1]

      l2 : 2x1 + x2 − 8 = 0.

Find: The intersection point i. Create your own sketch and try to
predict what the answer should be.
Solution: Find the parameter t̂ for l1 as given in (3.13). First check
the denominator:

      denom = 2 × (−2) + 1 × (−1) = −5.

This is not zero, so we proceed to find

      t̂ = (8 − 2 × 0 − 1 × 3)/(−5) = −1.

Plug this parameter value into l1 to find the intersection point:

      l1(−1) = [0] + (−1) [−2] = [2].
               [3]        [−1]   [4]
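The same computation as a Python sketch (our addition, not the book's; the function name and the angle tolerance of half a degree are our own choices).

import math

def intersect_parametric_implicit(p, v, a, b, c, tol=math.sin(math.radians(0.5))):
    """Intersection of l1(t) = p + t v with the implicit line a x1 + b x2 + c = 0, via (3.13).

    Returns None when the lines are treated as parallel. The parallel test compares
    |cos(theta)| from (3.14) against a small threshold; sin(0.5 degrees) is our choice.
    """
    denom = a * v[0] + b * v[1]                              # the dot product a . v
    cos_theta = denom / (math.hypot(a, b) * math.hypot(v[0], v[1]))
    if abs(cos_theta) <= tol:
        return None                                          # parallel (possibly identical)
    t = (-c - a * p[0] - b * p[1]) / denom
    return (p[0] + t * v[0], p[1] + t * v[1])

# Example 3.8: l1(t) = (0, 3) + t (-2, -1) meets 2 x1 + x2 - 8 = 0 at (2, 4).
print(intersect_parametric_implicit((0.0, 3.0), (-2.0, -1.0), 2.0, 1.0, -8.0))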

3.8.2 Both Parametric


Another method for finding the intersection of two lines arises by
using the parametric form for both, illustrated in Sketch 3.10.
Sketch 3.10.
Intersection of lines in parametric form.

Given: Two lines in parametric form,

      l1 : l1(t) = p + tv
      l2 : l2(s) = q + sw.

Note that we use two different parameters, t and s, here. This is


because the lines are independent of each other.

Find: The intersection point i.


Solution: We need two parameter values t̂ and ŝ such that

p + t̂v = q + ŝw.

This may be rewritten as

t̂v − ŝw = q − p. (3.15)

We have two equations (one for each coordinate) and two unknowns
t̂ and ŝ. To solve for the unknowns, we could formulate an expres-
sion for t̂ using the first equation, and substitute this expression into
the second equation. This then generates a solution for ŝ. Use this
solution in the expression for t̂, and solve for t̂. (Equations like this
are treated systematically in Chapter 5.) Once we have t̂ and ŝ, the
intersection point is found by inserting one of these values into its
respective parametric line equation.
If the vectors v and w are linearly dependent, as discussed in Sec-
tion 2.6, then it will not be possible to find a unique t̂ and ŝ. The
lines are parallel and possibly identical.

Example 3.9

Given: Two lines l1 and l2 ,


   
      l1 : l1(t) = [0] + t [−2]
                   [3]     [−1]

      l2 : l2(s) = [4] + s [−1].
                   [0]     [ 2]

Find: The intersection point i. This means that we need to find t̂ and
ŝ such that l1(t̂) = l2(ŝ).² Again, create your own sketch and try to
predict the answer.
Solution: Set up the two equations with two unknowns as in (3.15):

      t̂ [−2] − ŝ [−1] = [4] − [0].
        [−1]     [ 2]   [0]   [3]

Solve these equations, resulting in t̂ = −1 and ŝ = 2. Plug these
values into the line equations to verify the same intersection point is
produced for each:

      l1(−1) = [0] + (−1) [−2] = [2]
               [3]        [−1]   [4]

      l2(2) = [4] + 2 [−1] = [2].
              [0]     [ 2]   [4]

2 Really we only need t̂ or ŝ to find the intersection point.
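For this case, a small Python sketch (ours, not the book's; the function name and the tolerance eps are our own choices) solves the 2 × 2 system (3.15) with Cramer's rule.

def intersect_two_parametric(p, v, q, w, eps=1e-12):
    """Solve t v - s w = q - p, as in (3.15), with Cramer's rule on the 2x2 system.

    Returns None if v and w are (nearly) linearly dependent, i.e., the lines are
    parallel or identical.
    """
    det = -v[0] * w[1] + w[0] * v[1]          # determinant of the matrix with columns v and -w
    if abs(det) <= eps:
        return None
    r = (q[0] - p[0], q[1] - p[1])
    t = (-r[0] * w[1] + w[0] * r[1]) / det    # Cramer's rule for t
    return (p[0] + t * v[0], p[1] + t * v[1])

# Example 3.9: l1(t) = (0, 3) + t (-2, -1) and l2(s) = (4, 0) + s (-1, 2) meet at (2, 4).
print(intersect_two_parametric((0.0, 3.0), (-2.0, -1.0), (4.0, 0.0), (-1.0, 2.0)))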



3.8.3 Both Implicit


And yet a third method:
Given: Two lines in implicit form,
l1 : ax1 + bx2 + c = 0,
l2 : āx1 + b̄x2 + c̄ = 0.
As illustrated in Sketch 3.11, each line is geometrically given in terms
of a point and a vector perpendicular to the line.

Find: The intersection point

      i = x̂ = [x̂1]
              [x̂2]

that simultaneously satisfies l1 and l2.

Solution: We have two equations

      a x̂1 + b x̂2 = −c,          (3.16)
      ā x̂1 + b̄ x̂2 = −c̄          (3.17)

with two unknowns, x̂1 and x̂2. Equations like this are solved in
Chapter 5, but this is simple enough to solve without those methods.
If the lines are parallel then it will not be possible to find x̂. This
means that a and ā are linearly dependent.

Sketch 3.11.
Intersection of two lines in implicit form.

Example 3.10

Given: Two lines l1 and l2 ,


l1 : x1 − 2x2 + 6 = 0
l2 : 2x1 + x2 − 8 = 0.
Find: The intersection point x̂ as above. Create your own sketch and
try to predict the answer.

Solution: Reformulate the equations for l1 and l2 as in (3.16) and


(3.17). You will find that
 
      x̂ = [2].
          [4]
Plug this point into the equations for l1 and l2 to verify.

• parametric form of a line
• linear interpolation
• point normal form
• implicit form of a line
• explicit form of a line
• equation of a line through two points
• equation of a line defined by a point and a vector parallel to the line
• equation of a line defined by a point and a vector perpendicular to the line
• distance of a point to a line
• line form conversions
• foot of a point
• intersection of lines

3.9 Exercises
1. Give the parametric form of the line l(t) defined by points
   
      p = [0]   and   q = [4],
          [1]             [2]

such that l(0) = p and l(1) = q.


2. Define the line l(t) in the linear interpolation form that interpolates to
    points

      p = [2]   and   q = [−1],
          [3]             [−1]
such that l(0) = p and l(1) = q.
3. For the line in Exercise 2, is the point l(2) formed from a convex com-
bination?
4. Using the line in Exercise 1, construct five equally spaced points on the
line segment l(t) where t ∈ [0, 1]. The first point should be p and the
last point should be q. What are the parameter values for these points?
5. Find the equation for a line in implicit form that passes through the
points
   
      p = [−2]   and   q = [ 0].
          [ 0]             [−1]

6. Test if the following points lie on the line defined in Exercise 5:


       
      r0 = [0],   r1 = [−4],   r2 = [5],   r3 = [−3].
           [0]         [ 1]         [1]         [−1]

7. For the points in Exercise 6, if a point does not lie on the line, calculate
the distance from the line.
8. What is the point normal form of the line in Exercise 5?
9. What is the implicit equation of the horizontal line through x2 = −1/2?
What is the implicit equation of the vertical line through x1 = 1/2?
10. What is the explicit equation of the line 6x1 + 3x2 + 3 = 0?
11. What is the slope and e2 -intercept of the line x2 = 4x1 − 1?
12. What is the explicit equation of the horizontal line through x2 = −1/2?
What is the explicit equation of the vertical line through x1 = 1/2?
13. What is the explicit equation of the line with zero slope with e2 -
intercept 3? What is the explicit equation of the line with slope 2
that passes through the origin?
14. What is the implicit equation of the line
   
      l(t) = [1] + t [−2]?
             [1]     [ 3]

15. What is the implicit equation of the line

      l(t) = [0] + t [0]?
             [1]     [1]

16. What is the implicit equation of the line

      l(t) = (1 − t) [−1] + t [2]?
                     [ 0]     [2]

17. What is the parametric form of the line 3x1 + 2x2 + 1 = 0?


18. What is the parametric form of the line 5x1 + 1 = 0?
19. Given the line l(t) = p + tv and point r defined as
     
      l(t) = [0] + t [1]   and   r = [5],
             [0]     [1]             [0]

    find the distance of the point to the line.

20. Given the line l(t) = p + tv and point r defined as

      l(t) = [−2] + t [2]   and   r = [0],
             [ 0]     [2]             [0]

find the distance of the point to the line.


21. Find the foot of the point r in Exercise 19.
22. Find the foot of the point r in Exercise 20.

23. Given two lines: l1 defined by points


   
      [1]   and   [0],
      [0]         [3]

    and l2 defined by points

      [−1]   and   [−4],
      [ 6]         [ 1]

find the intersection point using each of the three methods in Sec-
tion 3.8.
24. Find the intersection of the lines
   
      l1 : l1(t) = [ 0] + t [1]   and
                   [−1]     [1]

      l2 : −x1 + x2 + 1 = 0.

25. Find the intersection of the lines

      l1 : −x1 + x2 + 1 = 0   and

      l2 : l(t) = [2] + t [2].
                  [2]     [2]

26. Find the closest point on the line l(t) = p + tv, where
   
      l(t) = [2] + t [2]
             [2]     [2]

    to the point

      r = [0].
          [2]
27. The line l(t) passes through the points
   
      p = [0]   and   q = [4].
          [1]             [2]

Now define a line m(t) that is perpendicular to l and that passes


through the midpoint of p and q.
4
Changing Shapes:
Linear Maps in 2D

Figure 4.1.
Linear maps in 2D: an interesting geometric figure constructed by applying 2D linear
maps to a square.

Geometry always has two parts to it: one part is the description of
the objects that can be generated; the other investigates how these


objects can be changed (or transformed). Any object formed by sev-


eral vectors may be mapped to an arbitrarily bizarre curved or dis-
torted object—here, we are interested in those maps that map 2D vec-
tors to 2D vectors and are “benign” in some well-defined sense. All
these maps may be described using the tools of matrix operations,
or linear maps. An interesting pattern is generated from a simple
square in Figure 4.1 by such “benign” 2D linear maps—rotations and
scalings.
Matrices were first introduced by H. Grassmann in 1844. They
became the basis of linear algebra. Most of their properties can be
studied by just considering the humble 2 × 2 case, which corresponds
to 2D linear maps.

4.1 Skew Target Boxes

In Section 1.1, we saw how to map an object from a unit square to a


rectangular target box. We will now look at the part of that mapping
that is a linear map.
First, our unit square will be defined by vectors e1 and e2 . Thus,
a vector v in this [e1 , e2 ]-system is defined as

v = v1 e1 + v2 e2 . (4.1)

If we focus on mapping vectors to vectors, then we will limit the target


box to having a lower-left corner at the origin. In Chapter 6 we will
reintroduce the idea of a generally positioned target box. Instead of
specifying two extreme points for a rectangular target box, we will
describe a parallelogram target box by two vectors a1 , a2 , defining an
[a1, a2]-system. A vector v is now mapped to a vector v′ by

      v′ = v1 a1 + v2 a2,          (4.2)

as illustrated by Sketch 4.1. This simply states that we duplicate the
[e1, e2]-geometry in the [a1, a2]-system: The linear map transforms
e1, e2, v to a1, a2, v′, respectively. The components of v′ are in the
context of the [e1, e2]-system. However, the components of v′ with
respect to the [a1, a2]-system are the components of v. Reviewing
a definition from Section 2.6, we recall that (4.2) is called a linear
combination.

Sketch 4.1.
A skew target box defined by a1 and a2.

Example 4.1

Let’s look at an example of the action of the map from the linear
combination in (4.2). Let the origin and
   
      a1 = [2],   a2 = [−2]
           [1]         [ 4]

define a new [a1, a2]-coordinate system, and let

      v = [1/2]
          [ 1 ]

be a vector in the [e1, e2]-system. Applying the components of v in
a linear combination of a1 and a2 results in

      v′ = (1/2) × [2] + 1 × [−2] = [−1 ].          (4.3)
                   [1]       [ 4]   [9/2]

Thus v′ has components

      [1/2]
      [ 1 ]

with respect to the [a1, a2]-system; with respect to the [e1, e2]-system,
it has coordinates

      [−1 ].
      [9/2]

See Sketch 4.1 for an illustration.

4.2 The Matrix Form


The components of a subscripted vector will be written with a double
subscript as

      a1 = [a1,1].
           [a2,1]

The vector component index precedes the vector subscript.
The components for the vector v′ in the [e1, e2]-system from Ex-
ample 4.1 are expressed as

      [−1 ] = (1/2) × [2] + 1 × [−2].          (4.4)
      [9/2]           [1]       [ 4]

This is strictly an equation between vectors. It invites a more concise
formulation using matrix notation:
    
      [−1 ] = [2  −2] [1/2].          (4.5)
      [9/2]   [1   4] [ 1 ]
The 2×2 array in this equation is called a matrix. It has two columns,
corresponding to the vectors a1 and a2 . It also has two rows, namely
the first row with entries 2, −2 and the second one with 1, 4.
In general, an equation like this one has the form

      v′ = [a1,1  a1,2] [v1],          (4.6)
           [a2,1  a2,2] [v2]

or,

      v′ = Av,          (4.7)

where A is the 2 × 2 matrix. The vector v′ is called the image of
v, and thus v is the preimage. The linear map is described by the
matrix A—we may think of A as being the map's coordinates. We
will also refer to the linear map itself by A. Then v′ is in the range
of the map and v is in the domain.
The elements a1,1 and a2,2 form the diagonal of the matrix. The
product Av has two components, each of which is obtained as a dot
product between the corresponding row of the matrix and v. In full
generality, we have

      Av = v1 a1 + v2 a2 = [v1 a1,1 + v2 a1,2].
                           [v1 a2,1 + v2 a2,2]

For example,

      [0  −1] [−1] = [−4].
      [2   4] [ 4]   [14]

In other words, Av is equivalent to forming the linear combination
v′ of the columns of A. All such combinations, that is all such v′,
form the column space of A.
Another note on notation. Coordinate systems, such as the [e1 , e2 ]-
system, can be interpreted as a matrix with columns e1 and e2 . Thus,
 
      [e1, e2] ≡ [1  0],
                 [0  1]
and this is called the 2 × 2 identity matrix.
There is a neat way to write the matrix-times-vector algebra that
facilitates manual computation. As explained above, every entry in
the resulting vector is a dot product of the input vector and a row of

the matrix. Let’s arrange this as follows:


                        2
                       1/2
      2    −2           3
      1     4           4
Each entry of the resulting vector is now at the intersection of
the corresponding matrix row and the input vector, which is written
as a column. As you multiply and then add the terms in your dot
product, this scheme guides you to the correct position in the result
automatically! Here we multiplied a 2 × 2 matrix by a 2 × 1 vector.
Note that the interior dimensions (both 2) must be identical and the
outer dimensions, 2 and 1, indicate the resulting vector or matrix size,
2 × 1. Sometimes it is convenient to think of the vector v as a 2 × 1
matrix: it is a matrix with two rows and one column.
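For the record, here is the same multiplication as a small Python sketch (ours, not from the book; the function name is our choice).

def mat_vec(A, v):
    """Product A v for a 2x2 matrix A = [[a11, a12], [a21, a22]] and a 2D vector v.

    Each entry of the result is the dot product of the corresponding row with v,
    which equals the linear combination v1*a1 + v2*a2 of the columns of A.
    """
    return (A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1])

# The scheme above: [[2, -2], [1, 4]] applied to (2, 1/2) gives (3, 4).
print(mat_vec([[2.0, -2.0], [1.0, 4.0]], (2.0, 0.5)))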
One fundamental matrix operation is matrix addition. Two matri-
ces A and B may be added by adding corresponding elements:
     
      [a1,1  a1,2] + [b1,1  b1,2] = [a1,1 + b1,1    a1,2 + b1,2].          (4.8)
      [a2,1  a2,2]   [b2,1  b2,2]   [a2,1 + b2,1    a2,2 + b2,2]

Notice that the matrices must be of the same dimensions; this is


not true for matrix multiplication, which we will demonstrate in Sec-
tion 4.10.
Using matrix addition, we may write

Av + Bv = (A + B)v.

This works because of the very simple definition of matrix addition.


This is also called the distributive law.
Forming the transpose matrix is another fundamental matrix oper-
ation. It is denoted by Aᵀ and is formed by interchanging the rows
and columns of A: the first row of Aᵀ is A's first column, and the
second row of Aᵀ is A's second column. For example, if
   
      A = [1  −2],   then   Aᵀ = [ 1  3].
          [3   5]                [−2  5]

Since we may think of a vector v as a matrix, we should be able


to find v’s transpose. Not very hard: it is a vector with one row and
two columns,
 
      v = [−1]   and   vᵀ = [−1  4].
          [ 4]

It is straightforward to confirm that

      [A + B]ᵀ = Aᵀ + Bᵀ.          (4.9)

Two more identities are

      [Aᵀ]ᵀ = A   and   [cA]ᵀ = cAᵀ.          (4.10)

A symmetric matrix is a special matrix that we will encounter many
times. A matrix A is symmetric if A = Aᵀ, for example

      [5  8].
      [8  1]

There are no restrictions on the diagonal elements, but all other el-
ements are equal to the element about the diagonal with reversed
indices. For a 2 × 2 matrix, this means that a2,1 = a1,2 .
With matrix notation, we can now continue the discussion of in-
dependence from Section 2.6. The columns of a matrix define an
[a1 , a2 ]-system. If the vectors a1 and a2 are linearly independent
then the matrix is said to have full rank, or for the 2 × 2 case, the
matrix has rank 2. If a1 and a2 are linearly dependent then the ma-
trix has rank 1. These two statements may be summarized as: the
rank of a 2 × 2 matrix equals the number of linearly independent
column (row) vectors. Matrices that do not have full rank are called
rank deficient or singular. We will encounter an important example
of a rank deficient matrix, a projection matrix, in Section 4.8. The
only matrix with rank zero is the zero matrix, a matrix with all zero
entries. The 2 × 2 zero matrix is
 
0 0
.
0 0

The importance of the rank equivalence of column and row vectors


will come to light in later chapters when we deal with n × n matrices.
For now, we can observe that this fact means that the ranks of A and
Aᵀ are equal.

4.3 Linear Spaces


2D linear maps act on vectors in 2D linear spaces, also known as
2D vector spaces. Recall from Section 2.1 that the set of all ordered
pairs, or 2D vectors v is called R2 . In Section 2.8, we encountered
the concept of a subspace of R2 in finding the orthogonal projection

of a vector w onto a vector v. (In Chapter 14, we will look at more


general linear spaces.)
The standard operations in a linear space are addition and scalar
multiplication, which are encapsulated for vectors in the linear com-
bination in (4.2). This is called the linearity property. Additionally,
linear maps, or matrices, are characterized by preservation of linear
combinations. This statement can be encapsulated as follows

A(au + bv) = aAu + bAv. (4.11)

Let’s break this statement down into the two basic elements: scalar
multiplication and addition. For the sake of concreteness, we shall use
the example
     
      A = [−1   1/2],   u = [1],   v = [−1].
          [ 0  −1/2]        [2]        [ 4]

We may multiply all elements of a matrix by one factor; we then
say that we have multiplied the matrix by that factor. Using our
example, we may multiply the matrix A by a factor, say 2:

      2 × [−1   1/2] = [−2   1].
          [ 0  −1/2]   [ 0  −1]

Sketch 4.2.
Matrices preserve scalings.
When we say that matrices preserve multiplication by scalar factors
we mean that if we scale a vector by a factor c, then its image will
also be scaled by c:
A(cu) = cAu.

Example 4.2

Here are the computations that go along with Sketch 4.2:

        
      [−1   1/2] (2 × [1]) = 2 × ([−1   1/2] [1]) = [ 0].
      [ 0  −1/2]      [2]        [ 0  −1/2] [2]    [−2]

Matrices also preserve sums:

A(u + v) = Au + Av.
This is also called the distributive law. Sketch 4.3 illustrates this
property (with a different set of A, u, v).

Sketch 4.3.
Matrices preserve sums.

Now an example to demonstrate that matrices preserve linear com-


binations, as expressed in (4.11).

Example 4.3

     
      A(3u + 2v) = [−1   1/2] (3 [1] + 2 [−1])
                   [ 0  −1/2]   [2]     [ 4]

                 = [−1   1/2] [ 1] = [ 6].
                   [ 0  −1/2] [14]   [−7]

      3Au + 2Av = 3 [−1   1/2] [1] + 2 [−1   1/2] [−1]
                    [ 0  −1/2] [2]     [ 0  −1/2] [ 4]

                = [ 0] + [ 6] = [ 6].
                  [−3]   [−4]   [−7]

Sketch 4.4 illustrates this example.

Sketch 4.4.
Matrices preserve linear combinations.
Preservation of linear combinations is a key property of matrices—
we will make substantial use of it throughout this book.
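A quick numerical check of (4.11), using the data of Example 4.3 (the code is our addition, not the book's).

def mat_vec(A, v):
    return (A[0][0]*v[0] + A[0][1]*v[1], A[1][0]*v[0] + A[1][1]*v[1])

A = [[-1.0, 0.5], [0.0, -0.5]]
u, v = (1.0, 2.0), (-1.0, 4.0)

lhs = mat_vec(A, (3*u[0] + 2*v[0], 3*u[1] + 2*v[1]))   # A(3u + 2v)
Au, Av = mat_vec(A, u), mat_vec(A, v)
rhs = (3*Au[0] + 2*Av[0], 3*Au[1] + 2*Av[1])           # 3Au + 2Av
print(lhs, rhs)   # both print (6.0, -7.0)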

4.4 Scalings
Consider the linear map given by
   
      v′ = [1/2    0 ] v = [v1/2].          (4.12)
           [ 0    1/2]     [v2/2]

This map will shorten v since v′ = (1/2)v. Its effect is illustrated in


Figure 4.2. That figure—and more to follow—has two parts. The
left part is a Phoenix whose feathers form rays that correspond to a
sampling of unit vectors. The right part shows what happens if we
map the Phoenix, and in turn the unit vectors, using the matrix from
(4.12). In this figure we have drawn the e1 and e2 vectors,
   
      e1 = [1]   and   e2 = [0],
           [0]              [1]

but in future figures we will not. Notice the positioning of these


vectors relative to the Phoenix. Now in the right half, e1 and e2 have

Figure 4.2.
Scaling: a uniform scaling.

been mapped to the vectors a1 and a2 . These are the column vectors
of the matrix in (4.12),
   
\[ a_1 = \begin{bmatrix} 1/2 \\ 0 \end{bmatrix} \quad\text{and}\quad a_2 = \begin{bmatrix} 0 \\ 1/2 \end{bmatrix}. \]

The Phoenix’s shape provides a sense of orientation. In this exam-


ple, the linear map did not change orientation, but more complicated
maps will.
Next, consider
\[ v' = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} v. \]
Now, v' will be “enlarged.”
In general, a scaling is defined by the operation
\[ v' = \begin{bmatrix} s_{1,1} & 0 \\ 0 & s_{2,2} \end{bmatrix} v, \qquad (4.13) \]

thus allowing for nonuniform scalings in the e1 - and e2 -direction.


Figure 4.3 gives an example for s1,1 = 1/2 and s2,2 = 2.
A scaling affects the area of the object that is scaled. If we scale
an object by s1,1 in the e1 -direction, then its area will be changed
by a factor s1,1 . Similarly, it will change by a factor of s2,2 when we

Figure 4.3.
Scaling: a nonuniform scaling.

apply that scaling to the e2 -direction. The total effect is thus a factor
of s1,1 s2,2 .
You can see this from Figure 4.2 by mentally constructing the
square spanned by e1 and e2 and comparing its area to the rect-
angle spanned by the image vectors. It is also interesting to note
that, in Figure 4.3, the scaling factors result in no change of area,
although a distortion did occur.
The distortion of the circular Phoenix that we see in Figure 4.3
is actually well-defined—it is an ellipse! In fact, all 2 × 2 matrices
will map circles to ellipses. (In higher dimensions, we will speak of
ellipsoids.) We will refer to this ellipse that characterizes the action
of the matrix as the action ellipse.1 In Figure 4.2, the action ellipse
is a scaled circle, which is a special case of an ellipse. In Chapter 16,
we will relate the shape of the ellipse to the linear map.

1 We will study ellipses in Chapter 19. An ellipse is symmetric about two axes
that intersect at the center of the ellipse. The longer axis is called the major axis
and the shorter axis is called the minor axis. The semi-major and semi-minor
axes are one-half their respective axes.
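As a quick numerical illustration of a nonuniform scaling and its effect on areas, here is a small Python/NumPy sketch (our addition, not from the text), using the factors of Figure 4.3:

    import numpy as np

    # Nonuniform scaling with s11 = 1/2 and s22 = 2, as in Figure 4.3
    S = np.array([[0.5, 0.0],
                  [0.0, 2.0]])
    v = np.array([1.0, 1.0])
    print(S @ v)             # [0.5 2. ]: each component is scaled separately
    print(np.linalg.det(S))  # 1.0: this particular scaling leaves areas unchanged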

Figure 4.4.
Reflections: a reflection about the e1 -axis.

4.5 Reflections

Consider the scaling
\[ v' = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} v. \qquad (4.14) \]
We may rewrite this as
\[ \begin{bmatrix} v_1' \\ v_2' \end{bmatrix} = \begin{bmatrix} v_1 \\ -v_2 \end{bmatrix}. \]

The effect of this map is apparently a change in sign of the second


component of v, as shown in Figure 4.4. Geometrically, this means
that the input vector v is reflected about the e1-axis, or the line
x2 = 0.
Obviously, reflections like the one in Figure 4.4 are just a special
case of scalings—previously we simply had not given much thought
to negative scaling factors. However, a reflection takes a more general
form, and it results in the mirror image of the vectors. Mathemati-
cally, a reflection maps each vector about a line through the origin.
The most common reflections are those about the coordinate axes,
with one such example illustrated in Figure 4.4, and about the lines
x1 = x2 and x1 = −x2 . The reflection about the line x1 = x2 is

Figure 4.5.
Reflections: a reflection about the line x1 = x2 .

achieved by the matrix
\[ v' = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} v = \begin{bmatrix} v_2 \\ v_1 \end{bmatrix}. \]
Its effect is shown in Figure 4.5; that is, the components of the input
vector are interchanged.
By inspection of the figures in this section, it appears that reflec-
tions do not change areas. But be careful—they do change the sign
of the area due to a change in orientation. If we rotate e1 into e2 , we
move in a counterclockwise direction. Now, rotate
   
\[ a_1 = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \quad\text{into}\quad a_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \]
and notice that we move in a clockwise direction. This change in
orientation is reflected in the sign of the area. We will examine this
in detail in Section 4.9.
The matrix
\[ v' = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} v, \qquad (4.15) \]
as seen in Figure 4.6, appears to be a reflection, but it is really a
rotation of 180◦ . (Rotations are covered in Section 4.6.) If we rotate
a1 into a2 we move in a counterclockwise direction, confirming that
this is not a reflection.
Notice that all reflections result in an action ellipse that is a circle.
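The orientation statements above can be confirmed with determinants. A small Python/NumPy sketch (added for illustration; the matrices are the ones discussed in this section):

    import numpy as np

    R_e1   = np.array([[1.0, 0.0], [0.0, -1.0]])   # reflect about the e1-axis
    R_swap = np.array([[0.0, 1.0], [1.0, 0.0]])    # reflect about the line x1 = x2
    R_180  = np.array([[-1.0, 0.0], [0.0, -1.0]])  # matrix (4.15)

    # Reflections flip orientation (determinant -1); the matrix (4.15) has
    # determinant +1, one more way to see it is really a rotation by 180 degrees.
    for M in (R_e1, R_swap, R_180):
        print(np.linalg.det(M))    # -1.0, -1.0, 1.0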

Figure 4.6.
Reflections: a reflection about both axes is also a rotation of 180◦ .

4.6 Rotations
The notion of rotating a vector around the origin is intuitively clear,
but a corresponding matrix takes a few moments to construct. To
keep it easy at the beginning, let us rotate the unit vector
\[ e_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \]
by α degrees counterclockwise, resulting in a new (rotated) vector
\[ e_1' = \begin{bmatrix} \cos\alpha \\ \sin\alpha \end{bmatrix}. \]

Notice that cos²α + sin²α = 1, thus this is a rotation. Consult Sketch 4.5 to convince yourself of this fact.

Sketch 4.5. Rotating a unit vector.

Thus, we need to find a matrix R that achieves
\[ \begin{bmatrix} \cos\alpha \\ \sin\alpha \end{bmatrix} = \begin{bmatrix} r_{1,1} & r_{1,2} \\ r_{2,1} & r_{2,2} \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix}. \]

Additionally, we know that e2 will rotate to
\[ e_2' = \begin{bmatrix} -\sin\alpha \\ \cos\alpha \end{bmatrix}. \]

Figure 4.7.
Rotations: a rotation by 45◦ .

This leads to the correct rotation matrix; it is given by
\[ R = \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix}. \qquad (4.16) \]

But let’s verify that we have already found the solution to the general
rotation problem.
Let v be an arbitrary vector. We claim that the matrix R from (4.16) will rotate it by α degrees to a new vector v'. If this is so, then we must have
\[ v \cdot v' = \|v\|^2 \cos\alpha \]
according to the rules of dot products (see Section 2.7). Here, we made use of the fact that a rotation does not change the length of a vector, i.e., ‖v‖ = ‖v'‖ and hence ‖v‖‖v'‖ = ‖v‖².
Since
\[ v' = \begin{bmatrix} v_1\cos\alpha - v_2\sin\alpha \\ v_1\sin\alpha + v_2\cos\alpha \end{bmatrix}, \]
the dot product v · v' is given by
\[ v \cdot v' = v_1^2\cos\alpha - v_1 v_2\sin\alpha + v_1 v_2\sin\alpha + v_2^2\cos\alpha = (v_1^2 + v_2^2)\cos\alpha = \|v\|^2\cos\alpha, \]

and all is shown! See Figure 4.7 for an illustration. There, α = 45◦ ,

and the rotation matrix is thus given by
\[ R = \begin{bmatrix} \sqrt{2}/2 & -\sqrt{2}/2 \\ \sqrt{2}/2 & \sqrt{2}/2 \end{bmatrix}. \]
Rotations are in a special class of transformations; these are called
rigid body motions. (See Section 5.9 for more details on these special
matrices.) The action ellipse of a rotation is a circle. Finally, it should
come without saying that rotations do not change areas.
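A short Python/NumPy sketch (our addition; the helper name rotation is ours) that builds the matrix (4.16) and confirms that a rotation changes neither lengths nor areas:

    import numpy as np

    def rotation(alpha_degrees):
        # Rotation matrix (4.16) for a counterclockwise rotation
        a = np.radians(alpha_degrees)
        return np.array([[np.cos(a), -np.sin(a)],
                         [np.sin(a),  np.cos(a)]])

    R = rotation(45.0)
    v = np.array([3.0, 1.0])
    print(np.linalg.norm(v), np.linalg.norm(R @ v))  # equal lengths
    print(np.linalg.det(R))                          # 1 (up to round-off)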

4.7 Shears
What map takes a rectangle to a parallelogram? Pictorially, one such
map is shown in Sketch 4.6.
Sketch 4.6. A special shear.

In this example, we have a map:
\[ v = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \longrightarrow v' = \begin{bmatrix} d_1 \\ 1 \end{bmatrix}. \]
In matrix form, this is realized by
\[ \begin{bmatrix} d_1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & d_1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix}. \qquad (4.17) \]
Verify! The 2 × 2 matrix in this equation is called a shear matrix. It
is the kind of matrix that is used when you generate italic fonts from
standard ones.
A shear matrix may be applied to arbitrary vectors. If v is an input
vector, then a shear maps it to v :
    
\[ v' = \begin{bmatrix} 1 & d_1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} v_1 + d_1 v_2 \\ v_2 \end{bmatrix}, \]
as illustrated in Figure 4.8. Clearly, the circular Phoenix is mapped
to an elliptical one.
We have so far restricted ourselves to shears along the e1 -axis; we
may also shear along the e2 -axis. Then we would have
    
\[ v' = \begin{bmatrix} 1 & 0 \\ d_2 & 1 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} v_1 \\ d_2 v_1 + v_2 \end{bmatrix}, \]
as illustrated in Figure 4.9.
Since it will be needed later, we look at the following. What is the
shear that achieves
   
\[ v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} \longrightarrow v' = \begin{bmatrix} v_1 \\ 0 \end{bmatrix}\,? \]

Figure 4.8.
Shears: shearing parallel to the e1 -axis.

Figure 4.9.
Shears: shearing parallel to the e2 -axis.

It is obviously a shear parallel to the e2-axis and is given by the map
\[ v' = \begin{bmatrix} v_1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -v_2/v_1 & 1 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}. \qquad (4.18) \]
Shears do not change areas. In Sketch 4.6, we see that the rectangle
and its image, a parallelogram, have the same area: both have the
same base and the same height.
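Here is a minimal Python/NumPy sketch (an illustration we added) of the shear (4.18) that maps a given vector onto the e1-axis; its determinant confirms that areas are preserved:

    import numpy as np

    v = np.array([3.0, 2.0])

    # Shear parallel to the e2-axis that maps v onto the e1-axis, as in (4.18)
    S = np.array([[1.0,          0.0],
                  [-v[1] / v[0], 1.0]])
    print(S @ v)             # [3. 0.]: the second component is sheared away
    print(np.linalg.det(S))  # 1.0: shears do not change areas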

4.8 Projections
Projections—parallel projections, for our purposes—act like sunlight
casting shadows. Parallel projections are characterized by the fact
that all vectors are projected in a parallel direction. In 2D, all vec-
tors are projected onto a line. If the angle of incidence with the line
is ninety degrees then it is an orthogonal projection, otherwise it is
an oblique projection. In linear algebra, orthogonal projections are
very important, as we have already seen in Section 2.8, they give us
a best approximation in a particular subspace. Oblique projections
are important to applications in fields such as computer graphics and
architecture. On the other hand, in a perspective projection, the pro-
jection direction is not constant. These are not linear maps, however;
they are introduced in Section 10.5.
Let’s look at a simple 2D orthogonal projection. Take any vector
v and “flatten it out” onto the e1 -axis. This simply means: set the
v2 -coordinate of the vector to zero. For example, if we project the
vector
\[ v = \begin{bmatrix} 3 \\ 1 \end{bmatrix} \]
onto the e1-axis, it becomes
\[ v' = \begin{bmatrix} 3 \\ 0 \end{bmatrix}, \]
as shown in Sketch 4.7.

Sketch 4.7. An orthogonal, parallel projection.

What matrix achieves this map? That's easy:
\[ \begin{bmatrix} 3 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 3 \\ 1 \end{bmatrix}. \]

This matrix will not only project the vector
\[ \begin{bmatrix} 3 \\ 1 \end{bmatrix} \]
onto the e1-axis, but in fact every vector! This is so since
\[ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} v_1 \\ 0 \end{bmatrix}. \]

While this is a somewhat trivial example of a projection, we see that


this projection does indeed feature a main property of a projection:
it reduces dimensionality. In this example, every vector from 2D

Figure 4.10.
Projections: all vectors are “flattened out” onto the e1 -axis.

space is mapped into 1D space, namely onto the e1 -axis. Figure 4.10
illustrates this property and that the action ellipse of a projection is
a straight line segment that is covered twice.
To construct a 2D orthogonal projection matrix, first choose a unit
vector u to define a line onto which to project. The matrix is de-
fined by a1 and a2 , or in other words, the projections of e1 and e2 ,
respectively onto u. From (2.21), we have
\[ a_1 = \frac{u \cdot e_1}{\|u\|^2}\,u = u_1 u, \qquad a_2 = \frac{u \cdot e_2}{\|u\|^2}\,u = u_2 u, \]
thus
\[ A = \begin{bmatrix} u_1 u & u_2 u \end{bmatrix} \qquad (4.19) \]
\[ \phantom{A\ } = uu^T. \qquad (4.20) \]
Forming a matrix as in (4.20), from the product of a vector and
its transpose, results in a dyadic matrix. Clearly the columns of A
are linearly dependent and thus the matrix has rank one. This map
reduces dimensionality, and as far as areas are concerned, projections
take a lean approach: whatever an area was before application of the
map, it is zero afterward.
Figure 4.11 shows the effect of (4.20) on the e1 and e2 axes. On
the left side, the vector u = [cos 30◦ sin 30◦ ]T and thin lines show the

projection of e1 (black) and e2 (dark gray) onto u. On the right side,


many u vectors are illustrated: every 10◦ , forming 36 arrows or ui for
i = 1, 36. The black circle of arrows is formed by the projection of e1
onto each ui . The gray circle of arrows is formed by the projection
of e2 onto each ui .
In addition to reducing dimensionality, a projection matrix is also
idempotent: A = AA. Geometrically, this means that once a vector
has been projected onto a line, application of the same projection will
leave the result unchanged.

Example 4.4

Let the projection line be defined by
\[ u = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}. \]
Then (4.20) defines the projection matrix to be
\[ A = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix}. \]
This u vector, corresponding to a 45◦ rotation of e1 , is absent from
the right part of Figure 4.11, so find where it belongs. In this case,
the projection of e1 and e2 are identical.

Figure 4.11.
Projections: e1 and e2 vectors orthogonally projected onto u results in a1 (black) and
a2 (dark gray), respectively. Left: vector u = [cos 30◦ sin 30◦ ]T . Right: vectors ui for
i = 1, 36 are at 10◦ increments.

Try sketching the projection of a few vectors yourself to get a feel


for how this projection works. In particular, try
\[ v = \begin{bmatrix} 1 \\ 2 \end{bmatrix}. \]
Sketch v' = Av. Now compute v'' = Av'. Surprised?
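If you would rather let the computer do this experiment, the following Python/NumPy sketch (our addition) builds the dyadic matrix uu^T of Example 4.4 and applies it twice:

    import numpy as np

    u = np.array([1.0, 1.0]) / np.sqrt(2.0)  # unit vector of Example 4.4
    A = np.outer(u, u)                       # dyadic matrix uu^T

    v  = np.array([1.0, 2.0])
    v1 = A @ v      # project once
    v2 = A @ v1     # project again
    print(v1, v2)   # both are [1.5 1.5]: the projection is idempotent
    print(np.linalg.matrix_rank(A))  # 1: the projection reduces dimensionality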

Let’s revisit the orthogonal projection discussion from Section 2.8


by examining the action of A in (4.20) on a vector x,

Ax = uuT x = (u · x)u.

We see this is the same result as (2.21). Suppose the projection of x


onto u is y, then x = y + y⊥ , and we then have

Ax = uuT y + uuT y⊥ ,

and since u^T y = ‖y‖ and u^T y⊥ = 0,
\[ Ax = \|y\|\,u. \]

Projections will be revisited many times, and some examples in-


clude: homogeneous linear systems in Section 5.8, 3D projections
in Sections 9.7 and 10.4, creating orthonormal coordinate frames in
Sections 11.8, 14.4, and 20.7, and least squares approximation in Sec-
tion 12.7.

4.9 Areas and Linear Maps: Determinants


As you might have noticed, we discussed one particular aspect of
linear maps for each type: how areas are changed. We will now
discuss this aspect for an arbitrary 2D linear map. Such a map takes
the two vectors [e1 , e2 ] to the two vectors [a1 , a2 ]. The area of the
square spanned by [e1 , e2 ] is 1, that is area(e1 , e2 ) = 1. If we knew
the area of the parallelogram spanned by [a1 , a2 ], then we could say
how the linear map affects areas.
Sketch 4.8. Area formed by a1 and a2.

How do we find the area P of a parallelogram spanned by two vectors a1 and a2? Referring to Sketch 4.8, let us first determine the

area T of the triangle formed by a1 and a2 . We see that

T = a1,1 a2,2 − T1 − T2 − T3 .

We then observe that
\[ T_1 = \tfrac{1}{2}a_{1,1}a_{2,1}, \qquad (4.21) \]
\[ T_2 = \tfrac{1}{2}(a_{1,1} - a_{1,2})(a_{2,2} - a_{2,1}), \qquad (4.22) \]
\[ T_3 = \tfrac{1}{2}a_{1,2}a_{2,2}. \qquad (4.23) \]
Working out the algebra, we arrive at
\[ T = \tfrac{1}{2}a_{1,1}a_{2,2} - \tfrac{1}{2}a_{1,2}a_{2,1}. \]
Our aim was not really T , but the parallelogram area P . Clearly (see
Sketch 4.9),
P = 2T,
and we have our desired area.
Sketch 4.9. Parallelogram and triangles.

It is customary to use the term determinant for the (signed) area of the parallelogram spanned by [a1, a2]. Since the two vectors a1 and a2 form the columns of the matrix A, we also speak of the determinant of the matrix A, and denote it by det A or |A|:
\[ |A| = \begin{vmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{vmatrix} = a_{1,1}a_{2,2} - a_{1,2}a_{2,1}. \qquad (4.24) \]

Since A maps a square with area one onto a parallelogram with


area |A|, the determinant of a matrix characterizes it as follows:

• If |A| = 1, then the linear map does not change areas.

• If 0 ≤ |A| < 1, then the linear map shrinks areas.

• If |A| = 0, then the matrix is rank deficient.

• If |A| > 1, then the linear map expands areas.

• If |A| < 0, then the linear map changes the orientation of objects.
(We’ll look at this closer after Example 4.5.) Areas may still con-
tract or expand depending on the magnitude of the determinant.

Example 4.5

We will look at a few examples. Let


 
1 5
A= ,
0 1

then |A| = (1)(1) − (5)(0) = 1. Since A represents a shear, we see


again that those maps do not change areas.
For another example, let
\[ A = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \]

then |A| = (1)(−1) − (0)(0) = −1. This matrix corresponds to a


reflection, and it leaves areas unchanged, except for a sign change.
Finally, let
\[ A = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix}, \]
then |A| = (0.5)(0.5) − (0.5)(0.5) = 0. This matrix corresponds to the
projection from Section 4.8. In that example, we saw that projections
collapse any object onto a straight line, i.e., to an object with zero
area.

Sketch 4.10. Resulting area after scaling one column of A.

There are some rules for working with determinants: If A = [a1, a2], then
\[ |ca_1, a_2| = c|a_1, a_2| = c|A|. \]

In other words, if one of the columns of A is scaled by a factor c,


then A’s determinant is also scaled by c. Verify that this is true from
the definition of the determinant of A. Sketch 4.10 illustrates this for
the example, c = 2. If both columns of A are scaled by c, then the
determinant is scaled by c²:
\[ |ca_1, ca_2| = c^2|a_1, a_2| = c^2|A|. \]

Sketch 4.11 illustrates this for the example c = 1/2.


Sketch 4.11. Resulting area after scaling both columns of A.

If |A| is positive and c is negative, then replacing a1 by ca1 will cause c|A|, the area formed by ca1 and a2, to become negative. The notion of a negative area is very useful computationally. Two 2D vectors whose determinant is positive are called right-handed. The

standard example is the pair of vectors e1 and e2 . Two 2D vectors


whose determinant is negative are called left-handed.2 Sketch 4.12
shows a pair of right-handed vectors (top) and a pair of left-handed
ones (bottom). Our definition of positive and negative area is not
totally arbitrary: the triangle formed by vectors a1 and a2 has area ½ sin(α)‖a1‖ ‖a2‖. Here, the angle α indicates how much we have to rotate a1 in order to line up with a2. If we interchange the two vectors, the sign of α and hence the sign of sin(α) also changes!

Sketch 4.12. Right-handed and left-handed vectors.

There is also an area sign change when we interchange the columns of A:
\[ |a_1, a_2| = -|a_2, a_1|. \qquad (4.25) \]
This fact is easily verified using the definition of a determinant:

|a2 , a1 | = a1,2 a2,1 − a2,2 a1,1 .
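The determinant rules of this section are easy to test numerically. An illustrative Python/NumPy sketch (det2 is our own helper, not the book's notation):

    import numpy as np

    def det2(a):
        # Signed area (4.24) of the parallelogram spanned by the columns of a
        return a[0, 0]*a[1, 1] - a[0, 1]*a[1, 0]

    A = np.array([[1.0, 5.0],
                  [0.0, 1.0]])        # the shear of Example 4.5
    print(det2(A), np.linalg.det(A))  # 1.0 1.0
    print(det2(2.0 * A))              # 4.0: scaling both columns scales the area by c^2
    print(det2(A[:, ::-1]))           # -1.0: interchanging the columns flips the sign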

4.10 Composing Linear Maps


Suppose you have mapped a vector v to v' using a matrix A. Next, you want to map v' to v'' using a matrix B. We start out with
\[ v' = \begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} a_{1,1}v_1 + a_{1,2}v_2 \\ a_{2,1}v_1 + a_{2,2}v_2 \end{bmatrix}. \]
Next, we have
\[ v'' = \begin{bmatrix} b_{1,1} & b_{1,2} \\ b_{2,1} & b_{2,2} \end{bmatrix}\begin{bmatrix} a_{1,1}v_1 + a_{1,2}v_2 \\ a_{2,1}v_1 + a_{2,2}v_2 \end{bmatrix} = \begin{bmatrix} b_{1,1}(a_{1,1}v_1 + a_{1,2}v_2) + b_{1,2}(a_{2,1}v_1 + a_{2,2}v_2) \\ b_{2,1}(a_{1,1}v_1 + a_{1,2}v_2) + b_{2,2}(a_{2,1}v_1 + a_{2,2}v_2) \end{bmatrix}. \]
Collecting the terms in v1 and v2, we get
\[ v'' = \begin{bmatrix} b_{1,1}a_{1,1} + b_{1,2}a_{2,1} & b_{1,1}a_{1,2} + b_{1,2}a_{2,2} \\ b_{2,1}a_{1,1} + b_{2,2}a_{2,1} & b_{2,1}a_{1,2} + b_{2,2}a_{2,2} \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}. \]

The matrix that we have created here, let’s call it C, is called the
product matrix of B and A:
BA = C.
2 The reason for this terminology will become apparent when we revisit these

definitions for the 3D case (see Section 8.2).



In more detail,
\[ \begin{bmatrix} b_{1,1} & b_{1,2} \\ b_{2,1} & b_{2,2} \end{bmatrix}\begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{bmatrix} = \begin{bmatrix} b_{1,1}a_{1,1} + b_{1,2}a_{2,1} & b_{1,1}a_{1,2} + b_{1,2}a_{2,2} \\ b_{2,1}a_{1,1} + b_{2,2}a_{2,1} & b_{2,1}a_{1,2} + b_{2,2}a_{2,2} \end{bmatrix}. \qquad (4.26) \]

This looks messy, but a simple rule puts order into chaos: the element
ci,j is computed as the dot product of B’s ith row and A’s jth column.
We can use this product to describe the composite map:

v'' = Bv' = B[Av] = BAv.

Example 4.6

Let
\[ v = \begin{bmatrix} 2 \\ -1 \end{bmatrix}, \quad A = \begin{bmatrix} -1 & 2 \\ 0 & 3 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & -2 \\ -3 & 1 \end{bmatrix}. \]
Then
\[ v' = \begin{bmatrix} -1 & 2 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} 2 \\ -1 \end{bmatrix} = \begin{bmatrix} -4 \\ -3 \end{bmatrix} \]
and
\[ v'' = \begin{bmatrix} 0 & -2 \\ -3 & 1 \end{bmatrix}\begin{bmatrix} -4 \\ -3 \end{bmatrix} = \begin{bmatrix} 6 \\ 9 \end{bmatrix}. \]
We can also compute v'' using the matrix product BA:
\[ C = BA = \begin{bmatrix} 0 & -2 \\ -3 & 1 \end{bmatrix}\begin{bmatrix} -1 & 2 \\ 0 & 3 \end{bmatrix} = \begin{bmatrix} 0 & -6 \\ 3 & -3 \end{bmatrix}. \]
Verify for yourself that v'' = Cv.
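The same computation in Python/NumPy (an illustrative sketch, not part of the text):

    import numpy as np

    # Matrices and vector of Example 4.6
    A = np.array([[-1.0, 2.0], [ 0.0, 3.0]])
    B = np.array([[ 0.0, -2.0], [-3.0, 1.0]])
    v = np.array([2.0, -1.0])

    stepwise  = B @ (A @ v)      # map with A first, then with B
    composite = (B @ A) @ v      # one map, the product matrix C = BA
    print(stepwise, composite)   # both print [6. 9.]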

Analogous to the matrix/vector product from Section 4.2, there


is a neat way to arrange two matrices when forming their product
for manual computation (yes, that is still encountered!). Using the
matrices of Example 4.6, and highlighting the computation of c2,1 ,
we write
              -1   2
               0   3
     0  -2
    -3   1     3

Figure 4.12.
Linear map composition is order dependent. Top: rotate by –120◦ , then reflect about
the (rotated) e1 -axis. Bottom: reflect, then rotate.

You see how c2,1 is at the intersection of column one of the “top”
matrix and row two of the “left” matrix.
The complete multiplication scheme is then arranged like this

              -1   2
               0   3
     0  -2     0  -6
    -3   1     3  -3

While we use the term “product” for BA, it is very important to


realize that this kind of product differs significantly from products of
real numbers: it is not commutative. That is, in general

AB ≠ BA.

Matrix products correspond to linear map compositions—since the


products are not commutative, it follows that it matters in which
order we carry out linear maps. Linear map composition is order
dependent. Figure 4.12 gives an example.

Example 4.7

Let us take two very simple matrices and demonstrate that the prod-
uct is not commutative. This example is illustrated in Figure 4.12. A
rotates by −120◦, and B reflects about the e1 -axis:
   
\[ A = \begin{bmatrix} -0.5 & 0.866 \\ -0.866 & -0.5 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}. \]
We first form AB (reflect and then rotate),
\[ AB = \begin{bmatrix} -0.5 & 0.866 \\ -0.866 & -0.5 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} = \begin{bmatrix} -0.5 & -0.866 \\ -0.866 & 0.5 \end{bmatrix}. \]
Next, we form BA (rotate and then reflect),
\[ BA = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} -0.5 & 0.866 \\ -0.866 & -0.5 \end{bmatrix} = \begin{bmatrix} -0.5 & 0.866 \\ 0.866 & 0.5 \end{bmatrix}. \]

Clearly, these are not the same!

Of course, some maps do commute; for example, the 2D rotations.


It does not matter if we rotate by α first and then by β or the other
way around. In either case, we have rotated by α + β. In terms of
matrices,
  
\[ \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix}\begin{bmatrix} \cos\beta & -\sin\beta \\ \sin\beta & \cos\beta \end{bmatrix} = \begin{bmatrix} \cos\alpha\cos\beta - \sin\alpha\sin\beta & -\cos\alpha\sin\beta - \sin\alpha\cos\beta \\ \sin\alpha\cos\beta + \cos\alpha\sin\beta & -\sin\alpha\sin\beta + \cos\alpha\cos\beta \end{bmatrix}. \]

Check for yourself that the other alternative gives the same result!
By referring to a trigonometry reference, we see that this product
matrix can be written as
 
\[ \begin{bmatrix} \cos(\alpha+\beta) & -\sin(\alpha+\beta) \\ \sin(\alpha+\beta) & \cos(\alpha+\beta) \end{bmatrix}, \]
which corresponds to a rotation by α+β. As we will see in Section 9.9,
rotations in 3D do not commute.
What about the rank of a composite map?

rank(AB) ≤ min{rank(A), rank(B)}. (4.27)



This says that matrix multiplication does not increase rank. You
should try a few rank 1 and 2 matrices to convince yourself of this
fact.
One more example on composing linear maps, which we have seen
in Section 4.8, is that projections are idempotent. If A is a projection
matrix, then this means
Av = AAv
for any vector v. Written out using only matrices, this becomes
\[ A = AA \quad\text{or}\quad A = A^2. \qquad (4.28) \]
Verify this property for the projection matrix
\[ \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix}. \]
Excluding the identity matrix, only rank deficient matrices are idem-
potent.

4.11 More on Matrix Multiplication


Matrix multiplication is not limited to the product of 2 × 2 matrices.
In fact, when we multiply a matrix by a vector, we follow the rules
of matrix multiplication! If v' = Av, then the first component of v'
is the dot product of A's first row and v; the second component of v'
is the dot product of A's second row and v.
In Section 4.2, we introduced the transpose AT of a matrix A and
a vector v. Usually, we write u · v for the dot product of u and v,
but sometimes considering the vectors as matrices is useful as well;
that is, we can form the product as

\[ u^T v = u \cdot v. \qquad (4.29) \]
For examples in this section, let
\[ u = \begin{bmatrix} 3 \\ 4 \end{bmatrix} \quad\text{and}\quad v = \begin{bmatrix} -3 \\ 6 \end{bmatrix}. \]

Then it is straightforward to show that the left- and right-hand sides


of (4.29) are equal to 15.
If we have a product uT v, what is [uT v]T ? This has an easy answer:

(uT v)T = vT u,

as an example will clarify.



Example 4.8
 
\[ [u^T v]^T = \left(\begin{bmatrix} 3 & 4 \end{bmatrix}\begin{bmatrix} -3 \\ 6 \end{bmatrix}\right)^T = [15]^T = 15, \]
\[ v^T u = \begin{bmatrix} -3 & 6 \end{bmatrix}\begin{bmatrix} 3 \\ 4 \end{bmatrix} = [15] = 15. \]
The results are the same.

We saw that addition of matrices is straightforward under transpo-


sition; matrix multiplication is not that straightforward. We have
(AB)T = B T AT . (4.30)
To see why this is true, recall that each element of a product matrix
is obtained as a dot product. In the matrix products below, we show
the calculation of one element of the left- and right-hand sides of
(4.30) to demonstrate that the dot products for this one element are
identical. Let transpose matrix elements be referred to as b^T_{i,j}.
\[ (AB)^T = \left(\begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{bmatrix}\begin{bmatrix} b_{1,1} & b_{1,2} \\ b_{2,1} & b_{2,2} \end{bmatrix}\right)^T = \begin{bmatrix} c_{1,1} & c_{1,2} \\ c_{2,1} & c_{2,2} \end{bmatrix}, \]
\[ B^T A^T = \begin{bmatrix} b^T_{1,1} & b^T_{1,2} \\ b^T_{2,1} & b^T_{2,2} \end{bmatrix}\begin{bmatrix} a^T_{1,1} & a^T_{1,2} \\ a^T_{2,1} & a^T_{2,2} \end{bmatrix} = \begin{bmatrix} c_{1,1} & c_{1,2} \\ c_{2,1} & c_{2,2} \end{bmatrix}. \]
Since b_{i,j} = b^T_{j,i}, we see the identical dot product is calculated to form c_{1,2}.
What is the determinant of a product matrix? If C = AB denotes
a matrix product, then
|AB| = |A||B|, (4.31)
which tells us that B scales objects by |B|, A scales objects by |A|, and
the composition of the maps scales by the product of the individual
scales.

Example 4.9

As a simple example, take two scalings


   
\[ A = \begin{bmatrix} 1/2 & 0 \\ 0 & 1/2 \end{bmatrix}, \qquad B = \begin{bmatrix} 4 & 0 \\ 0 & 4 \end{bmatrix}. \]
We have |A| = 1/4 and |B| = 16. Thus, A scales down, and B scales up, but the effect of B's scaling is greater than that of A's. The product
\[ AB = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \]

thus scales up: |AB| = |A||B| = 4.
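Both (4.30) and (4.31) can be spot-checked numerically; here is a small Python/NumPy sketch (our addition) using the scalings of Example 4.9:

    import numpy as np

    A = np.array([[0.5, 0.0], [0.0, 0.5]])
    B = np.array([[4.0, 0.0], [0.0, 4.0]])

    # Determinants multiply under composition, as in (4.31)
    print(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))  # 4.0 4.0

    # The transpose reverses the order of a product, as in (4.30)
    print(np.allclose((A @ B).T, B.T @ A.T))  # True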

Just as for real numbers, we can define exponents for matrices:

\[ A^r = \underbrace{A \cdot \ldots \cdot A}_{r\ \text{times}}. \]

Here are some rules:
\[ A^{r+s} = A^r A^s, \qquad A^{rs} = (A^r)^s, \qquad A^0 = I. \]

For now, assume r and s are positive integers. See Sections 5.9
and 9.10 for a discussion of A−1 , the inverse matrix.

4.12 Matrix Arithmetic Rules

We encountered some of these rules throughout this chapter in terms


of matrix and vector multiplication: a vector is simply a special ma-
trix. The focus of this chapter is on 2 × 2 and 2 × 1 matrices, but
these rules apply to matrices of any size. In the rules that follow, let
a, b be scalars.
Importantly, however, the matrix sizes must be compatible for the
operations to be performed. Specifically, matrix addition requires
the matrices to have the same dimensions and matrix multiplication
requires the “inside” dimensions to be equal: suppose A’s dimensions
are m × r and B’s are r × n, then the product C = AB is permissible
since the dimension r is shared, and the resulting matrix C will have
the “outer” dimensions, reading left to right: m × n.

Commutative Law for Addition


A+B =B+A

Associative Law for Addition


A + (B + C) = (A + B) + C

No Commutative Law for Multiplication


AB ≠ BA

Associative Law for Multiplication


A(BC) = (AB)C

Distributive Law
A(B + C) = AB + AC

Distributive Law
(B + C)A = BA + CA

Rules involving scalars:

a(B + C) = aB + aC
(a + b)C = aC + bC
(ab)C = a(bC)
a(BC) = (aB)C = B(aC)

Rules involving the transpose:

(A + B)T = AT + B T
(bA)T = bAT
(AB)T = B T AT
(A^T)^T = A

Chapter 9 will introduce 3 × 3 matrices and Chapter 12 will intro-


duce n × n matrices.

• linear combination
• matrix form
• preimage and image
• domain and range
• column space
• identity matrix
• matrix addition
• distributive law
• transpose matrix
• symmetric matrix
• rank of a matrix
• rank deficient
• singular matrix
• linear space or vector space
• subspace
• linearity property
• scalings
• action ellipse
• reflections
• rotations
• rigid body motions
• shears
• projections
• parallel projection
• oblique projection
• dyadic matrix
• idempotent map
• determinant
• signed area
• matrix multiplication
• composite map
• noncommutative property of matrix multiplication
• transpose of a product or sum of matrices
• rules of matrix arithmetic

4.13 Exercises
For the following exercises, let
     
\[ A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & -1 \\ -1 & 1/2 \end{bmatrix}, \quad v = \begin{bmatrix} 2 \\ 3 \end{bmatrix}. \]
1. What linear combination of
   
\[ c_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad\text{and}\quad c_2 = \begin{bmatrix} 0 \\ 2 \end{bmatrix} \]
results in
\[ w = \begin{bmatrix} 2 \\ 1 \end{bmatrix}? \]
Write the result in matrix form.
2. Suppose w = w1 c1 + w2 c2 . Express this in matrix form.
3. Is the vector v in the column space of A?
4. Construct a matrix C such that the vector
 
\[ w = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \]
is not in its column space.

5. Are either A or B symmetric matrices?


6. What is the transpose of A, B, and v?
7. What is the transpose of the 2 × 2 identity matrix I?
8. Compute AT + B T and [A + B]T .
9. What is the rank of B?
10. What is the rank of the matrix C = [c 3c]?
11. What is the rank of the zero matrix?
12. For the matrix A, vectors v and u = [1 0]T , and scalars a = 4 (applied
to u) and b = 2 (applied to v), demonstrate the linearity property of
linear maps.
13. Describe geometrically the effect of A and B. (You may do this analyt-
ically or by using software to illustrate the action of the matrices.)
14. Compute Av and Bv.
15. Construct the matrix S that maps the vector w to 3w.
16. What scaling matrix will result in an action ellipse with major axis twice
the length of the minor axis?
17. Construct the matrix that reflects about the e2 -axis.
18. What is the shear matrix that maps v onto the e2 -axis?
19. What type of linear map is the matrix
 
\[ \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}? \]

20. Are either A or B a rigid body motion?


21. Is the matrix A idempotent?
22. Construct an orthogonal projection onto
 √ 
\[ u = \begin{bmatrix} 2/\sqrt{5} \\ 1/\sqrt{5} \end{bmatrix}. \]

23. For an arbitrary unit vector u, show that P = uuT is idempotent.


24. Suppose we apply each of the following linear maps to the vertices of
a unit square: scaling, reflection, rotation, shear, projection. For each
map, state if there is a change in area and the reason.
25. What is the determinant of A?
26. What is the determinant of B?
27. What is the determinant of 4B?
28. Compute A + B. Show that Av + Bv = (A + B)v.
29. Compute ABv and BAv.

30. Compute B T A.
31. What is A2 ?
32. Let M and N be 2 × 2 matrices and each is rank one. What can you
say about the rank of M + N ?
33. Let two square matrices M and N each have rank one. What can you
say about the rank of M N ?
34. Find matrices C and D, both having rank greater than zero, such that
the product CD has rank zero.
5
2 × 2 Linear Systems

Figure 5.1.
Intersection of lines: two families of lines are shown; the intersections of corresponding
line pairs are marked by black boxes. For each intersection, a 2 × 2 linear system has
to be solved.

Just about anybody can solve two equations in two unknowns by


somehow manipulating the equations. In this chapter, we will develop
a systematic way for finding the solution, simply by checking the


underlying geometry. This approach will later enable us to solve


much larger systems of equations in Chapters 12 and 13. Figure 5.1
illustrates many instances of intersecting two lines: a problem that
can be formulated as a 2 × 2 linear system.

5.1 Skew Target Boxes Revisited


In our standard [e1 , e2 ]-coordinate system, suppose we are given two
vectors a1 and a2 . In Section 4.1, we showed how these vectors de-
fine a skew target box with its lower-left corner at the origin. As
illustrated in Sketch 5.1, suppose we also are given a vector b with
respect to the [e1 , e2 ]-system. Now the question arises, what are the
components of b with respect to the [a1 , a2 ]-system? In other words,
we want to find a vector u with components u1 and u2 , satisfying
u1 a1 + u2 a2 = b. (5.1)

Sketch 5.1.
Geometry of a 2 × 2 system.
Example 5.1

Before we proceed further, let’s look at an example. Following Sketch 5.1,


let
\[ a_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \quad a_2 = \begin{bmatrix} 4 \\ 6 \end{bmatrix}, \quad b = \begin{bmatrix} 4 \\ 4 \end{bmatrix}. \]
Upon examining the sketch, we see that
\[ 1\times\begin{bmatrix} 2 \\ 1 \end{bmatrix} + \frac{1}{2}\times\begin{bmatrix} 4 \\ 6 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \end{bmatrix}. \]
In the [a1 , a2 ]-system, b has components (1, 1/2). In the [e1 , e2 ]-
system, it has components (4, 4).

What we have here are really two equations in the two unknowns
u1 and u2 , which we see by expanding the vector equations into
2u1 + 4u2 = 4
(5.2)
u1 + 6u2 = 4.
And as we saw in Example 5.1, these two equations in two unknowns
have the solution u1 = 1 and u2 = 1/2, as is seen by inserting these
values for u1 and u2 into the equations.

Being able to solve two simultaneous sets of equations allows us to


switch back and forth between different coordinate systems. The rest
of this chapter is dedicated to a detailed discussion of how to solve
these equations.

5.2 The Matrix Form


The two equations in (5.2) are also called a linear system. It can be
written more compactly if we use matrix notation:
    
\[ \begin{bmatrix} 2 & 4 \\ 1 & 6 \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \end{bmatrix}. \qquad (5.3) \]

In general, a 2 × 2 linear system looks like this:


    
\[ \begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}. \qquad (5.4) \]

Equation (5.4) is shorthand notation for the equations

a1,1 u1 + a1,2 u2 = b1 , (5.5)


a2,1 u1 + a2,2 u2 = b2 . (5.6)

We sometimes write it even shorter, using a matrix A:

Au = b, (5.7)

where
\[ A = \begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{bmatrix}, \quad u = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}, \quad b = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}. \]
Both u and b represent vectors, not points! (See Sketch 5.1 for an
illustration.) The vector u is called the solution of the linear system.
While the savings of this notation is not completely obvious in the
2 × 2 case, it will save a lot of work for more complicated cases with
more equations and unknowns.
The columns of the matrix A correspond to the vectors a1 and
a2 . We could then rewrite our linear system as (5.1). Geometrically,
we are trying to express the given vector b as a linear combination
of the given vectors a1 and a2 ; we need to determine the factors u1
and u2 . If we are able to find at least one solution, then the linear
system is called consistent, otherwise it is called inconsistent. Three
possibilities for our solution space exist.

1. There is exactly one solution vector u. In this case, |A| ≠ 0, thus
the matrix has full rank and is nonsingular.
2. There is no solution, or in other words, the system is inconsistent.
(See Section 5.6 for a geometric description.)
3. There are infinitely many solutions. (See Sections 5.7 and 5.8 for
examples.)

5.3 A Direct Approach: Cramer’s Rule


Sketch 5.2 offers a direct solution to our linear system. By inspecting
the areas of the parallelograms in the sketch, we see that
\[ u_1 = \frac{\mathrm{area}(b, a_2)}{\mathrm{area}(a_1, a_2)}, \qquad u_2 = \frac{\mathrm{area}(a_1, b)}{\mathrm{area}(a_1, a_2)}. \]
An easy way to see how these ratios of areas correspond to u1 and
u2 is to shear the parallelograms formed by b, a2 and b, a1 onto the
a1 and a2 axes, respectively. (Shears preserve areas.) The area of a
parallelogram is given by the determinant of the two vectors spanning it. Recall from Section 4.9 that this is a signed area. This method of solving for the solution of a linear system is called Cramer's rule.

Sketch 5.2. Cramer's rule.

Example 5.2
Applying Cramer’s rule to the linear system in (5.3), we get

\[ u_1 = \frac{\begin{vmatrix} 4 & 4 \\ 4 & 6 \end{vmatrix}}{\begin{vmatrix} 2 & 4 \\ 1 & 6 \end{vmatrix}} = \frac{8}{8}, \qquad u_2 = \frac{\begin{vmatrix} 2 & 4 \\ 1 & 4 \end{vmatrix}}{\begin{vmatrix} 2 & 4 \\ 1 & 6 \end{vmatrix}} = \frac{4}{8}. \]
Examining the determinant in the numerator, notice that b replaces
a1 in the solution for u1 and then b replaces a2 in the solution for u2 .

Notice that if the area spanned by a1 and a2 is zero, that is, the
vectors are multiples of each other, then Cramer’s rule will not result
in a solution. (See Sections 5.6–5.8 for more information on this
situation.)
Cramer’s rule is primarily of theoretical importance. For larger
systems, Cramer’s rule is both expensive and numerically unstable.
Hence, we now study a more effective method.
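Before moving on, note that a 2 × 2 version of Cramer's rule takes only a few lines of code. The following Python/NumPy sketch is our own illustration (the helper name cramer_2x2 is hypothetical); it reproduces Example 5.2:

    import numpy as np

    def cramer_2x2(A, b):
        # Solve Au = b by Cramer's rule; assumes the determinant is nonzero
        d  = A[0, 0]*A[1, 1] - A[0, 1]*A[1, 0]
        u1 = (b[0]*A[1, 1] - A[0, 1]*b[1]) / d   # b replaces column 1
        u2 = (A[0, 0]*b[1] - b[0]*A[1, 0]) / d   # b replaces column 2
        return np.array([u1, u2])

    A = np.array([[2.0, 4.0], [1.0, 6.0]])
    b = np.array([4.0, 4.0])
    print(cramer_2x2(A, b))   # [1.  0.5]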

5.4 Gauss Elimination


Let’s consider a special 2 × 2 linear system:
 
\[ \begin{bmatrix} a_{1,1} & a_{1,2} \\ 0 & a_{2,2} \end{bmatrix} u = b. \qquad (5.8) \]

This situation is shown in Sketch 5.3. This matrix is called upper


triangular because all elements below the diagonal are zero, forming
a triangle of numbers above the diagonal.
Sketch 5.3. A special linear system.

We can solve this system without much work. Examining Equation (5.8), we see it is possible to solve for
\[ u_2 = b_2/a_{2,2}. \]

With u2 in hand, we can solve the first equation from (5.5) for
\[ u_1 = \frac{1}{a_{1,1}}(b_1 - u_2 a_{1,2}). \]

This technique of solving the equations from the bottom up is called


back substitution.
Notice that the process of back substitution requires divisions.
Therefore, if the diagonal elements, a1,1 or a2,2 , equal zero then the
algorithm will fail. This type of failure indicates that the columns
of A are not linearly independent. (See Sections 5.6–5.8 for more
information on this situation.) Because of the central role that the
diagonal elements play in Gauss elimination, they are called pivots.
In general, we will not be so lucky to encounter an upper triangular
system as in (5.8). But any linear system in which A is nonsingular
may be transformed to this simple form, as we shall see by reexamining
the system in (5.3). We write it as
     
\[ u_1\begin{bmatrix} 2 \\ 1 \end{bmatrix} + u_2\begin{bmatrix} 4 \\ 6 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \end{bmatrix}. \]

This situation is shown in Sketch 5.4. Clearly, a1 is not on the e1 -axis


as we would like, but we can apply a stepwise procedure so that it
will become just that. This systematic, stepwise procedure is called
forward elimination. The process of forward elimination followed by
back substitution is called Gauss elimination.1
Recall one key fact from Chapter 4: linear maps do not change
linear combinations. That means if we apply the same linear map to
all vectors in our system, then the factors u1 and u2 won’t change. If
the map is given by a matrix S, then
      
Sketch 5.4. The geometry of a linear system.

\[ S\left(u_1\begin{bmatrix} 2 \\ 1 \end{bmatrix} + u_2\begin{bmatrix} 4 \\ 6 \end{bmatrix}\right) = S\begin{bmatrix} 4 \\ 4 \end{bmatrix}. \]
In order to get a1 to line up with the e1 -axis, we will employ a
shear parallel to the e2 -axis, such that
   
\[ \begin{bmatrix} 2 \\ 1 \end{bmatrix} \text{ is mapped to } \begin{bmatrix} 2 \\ 0 \end{bmatrix}. \]
That shear (see Section 4.7) is given by the matrix
\[ S_1 = \begin{bmatrix} 1 & 0 \\ -1/2 & 1 \end{bmatrix}. \]

We apply S1 to all vectors involved in our system:


         
\[ \begin{bmatrix} 1 & 0 \\ -1/2 & 1 \end{bmatrix}\begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 \\ -1/2 & 1 \end{bmatrix}\begin{bmatrix} 4 \\ 6 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 \\ -1/2 & 1 \end{bmatrix}\begin{bmatrix} 4 \\ 4 \end{bmatrix} = \begin{bmatrix} 4 \\ 2 \end{bmatrix}. \]
The effect of this map is shown in Sketch 5.5.

Sketch 5.5. Shearing the vectors in a linear system.

Our transformed system now reads
\[ \begin{bmatrix} 2 & 4 \\ 0 & 4 \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} 4 \\ 2 \end{bmatrix}. \]

Now we can employ back substitution to find

\[ u_2 = 2/4 = 1/2, \qquad u_1 = \frac{1}{2}\left(4 - 4\times\frac{1}{2}\right) = 1. \]
1 Gauss elimination and forward elimination are often used interchangeably.

For 2 × 2 linear systems there is only one matrix entry to zero in


the forward elimination procedure. We will restate the procedure in
a more algorithmic way in Chapter 12 when there is more work to do.

Example 5.3
We will look at one more example of forward elimination and back
substitution. Let a linear system be given by
    
\[ \begin{bmatrix} -1 & 4 \\ 2 & 2 \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 2 \end{bmatrix}. \]
The shear that takes a1 to the e1-axis is given by
\[ S_1 = \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix}, \]
and it transforms the system to
\[ \begin{bmatrix} -1 & 4 \\ 0 & 10 \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 2 \end{bmatrix}. \]

Draw your own sketch to understand the geometry.


Using back substitution, the solution is now easily found as u1 =
8/10 and u2 = 2/10.
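For the 2 × 2 case, forward elimination and back substitution fit in a handful of lines. A minimal Python/NumPy sketch (our addition; it omits pivoting and assumes a nonzero pivot a11):

    import numpy as np

    def solve_2x2(A, b):
        A = A.astype(float).copy()
        b = b.astype(float).copy()
        factor = A[1, 0] / A[0, 0]   # the shear that zeroes a21
        A[1] -= factor * A[0]
        b[1] -= factor * b[0]
        u2 = b[1] / A[1, 1]                    # back substitution
        u1 = (b[0] - A[0, 1] * u2) / A[0, 0]
        return np.array([u1, u2])

    A = np.array([[-1.0, 4.0], [2.0, 2.0]])
    b = np.array([0.0, 2.0])
    print(solve_2x2(A, b))   # [0.8 0.2], matching Example 5.3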

5.5 Pivoting
Consider the system
    
\[ \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \]
illustrated in Sketch 5.6.

Sketch 5.6. A linear system that needs pivoting.

Our standard approach, shearing a1 onto the e1-axis, will not work here; there is no shear that takes
\[ \begin{bmatrix} 0 \\ 1 \end{bmatrix} \]
onto the e1-axis. However, there is no problem if we simply exchange the two equations! Then we have
\[ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \]

and thus u1 = u2 = 1. So we cannot blindly apply a shear to a1 ;


we must first check that one exists. If it does not—i.e., if a1,1 = 0—
exchange the equations.
As a rule of thumb, if a method fails because some number equals
zero, then it will work poorly if that number is small. It is thus
advisable to exchange the two equations anytime we have |a1,1 | <
|a2,1 |. The absolute value is used here since we are interested in
the magnitude of the involved numbers, not their sign. The process
of exchanging equations (rows) so that the pivot is the largest in
absolute value is called row pivoting or partial pivoting, and it is used
to improve numerical stability. In Section 5.8, a special type of linear
system is introduced that sometimes needs another type of pivoting.
However, since row pivoting is the most common, we’ll refer to this
simply as “pivoting.”

Example 5.4
Let’s study an example taken from [11]:
    
\[ \begin{bmatrix} 0.0001 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}. \]

If we shear a1 onto the e1 -axis, thus applying one forward elimination


step, the new system reads
    
\[ \begin{bmatrix} 0.0001 & 1 \\ 0 & -9999 \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} 1 \\ -9998 \end{bmatrix}. \]
Performing back substitution, we find the solution is
\[ u_t = \begin{bmatrix} 1.0001 \\ 0.9998\bar{9} \end{bmatrix}, \]

which we will call the “true” solution. Note the magnitude of changes
in a2 and b relative to a1 . This is the type of behavior that causes
numerical problems. It can often be dealt with by using a larger
number of digits.
Suppose we have a machine that stores only three digits, although
it calculates with six digits. Due to round-off, the system above would
be stored as
    
\[ \begin{bmatrix} 0.0001 & 1 \\ 0 & -10000 \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} 1 \\ -10000 \end{bmatrix}, \]

which would result in a “round-off” solution of
\[ u_r = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \]
which is not very close to the true solution u_t, as
\[ \|u_t - u_r\| = 1.0001. \]

Pivoting is a tool to dampen the effects of round-off. Now employ pivoting by exchanging the rows, yielding the system
\[ \begin{bmatrix} 1 & 1 \\ 0.0001 & 1 \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}. \]
Shear a1 onto the e1-axis, and the new system reads
\[ \begin{bmatrix} 1 & 1 \\ 0 & 0.9999 \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 0.9998 \end{bmatrix}, \]
which results in the “pivoting solution”
\[ u_p = \begin{bmatrix} 1 \\ 1 \end{bmatrix}. \]

Notice that the vectors of the linear systems are all within the same
range. Even with the three-digit machine, this system will allow us to
compute a result that is closer to the true solution because the effects
of round-off have been minimized. Now the error is

\[ \|u_t - u_p\| = 0.00014. \]

Numerical strategies are the primary topic of numerical analysis,


but they cannot be ignored in the study of linear algebra. Because
this is an important real-world topic, we will revisit it. In Section 12.2
we will present Gauss elimination with pivoting integrated into the
algorithm. In Section 13.4 we will introduce the condition number of
a matrix, which is a measure for closeness to being singular. Chap-
ters 13 and 16 will introduce other methods for solving linear systems
that are better to use when numerical issues are a concern.

5.6 Unsolvable Systems


Consider the situation shown in Sketch 5.7. The two vectors a1 and
a2 are multiples of each other. In other words, they are linearly
dependent.
The corresponding linear system is
    
\[ \begin{bmatrix} 2 & 1 \\ 4 & 2 \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}. \]

Sketch 5.7. An unsolvable linear system.

It is obvious from the sketch that we have a problem here, but let's just blindly apply forward elimination; apply a shear such that a1 is mapped to the e1-axis. The resulting system is
\[ \begin{bmatrix} 2 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}. \]

But the last equation reads 0 = −1, and now we really are in trouble!
This means that our system is inconsistent, and therefore does not
have a solution.
It is possible however, to find an approximate solution. This is done
in the context of least squares methods, see Section 12.7.

5.7 Underdetermined Systems


Consider the system
    
\[ \begin{bmatrix} 2 & 1 \\ 4 & 2 \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} 3 \\ 6 \end{bmatrix}, \]
shown in Sketch 5.8.

Sketch 5.8. An underdetermined linear system.

We shear a1 onto the e1-axis, and obtain
\[ \begin{bmatrix} 2 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \end{bmatrix}. \]

Now the last equation reads 0 = 0—true, but a bit trivial! In reality,
our system is just one equation written down twice in slightly different
forms. This is also clear from the sketch: b may be written as a
multiple of either a1 or a2 , thus the system is underdetermined. This
type of system is consistent because at least one solution exists. We
can find a solution by setting u2 = 1, and then back substitution
results in u1 = 1.

5.8 Homogeneous Systems


A system of the form
Au = 0, (5.9)
i.e., one where the right-hand side consists of the zero vector, is called
homogeneous. One obvious solution is the zero vector itself; this is
called the trivial solution and is usually of little interest. If it has
a solution u that is not the zero vector, then clearly all multiples
cu are also solutions: we multiply both sides of the equations by a
common factor c. In other words, the system has an infinite number
of solutions.
Not all homogeneous systems do have a nontrivial solution, how-
ever. Equation (5.9) may be read as follows: What vector u, when
mapped by A, has the zero vector as its image? The only 2 × 2 maps
capable of achieving this have rank 1. They are characterized by the
fact that their two columns a1 and a2 are parallel, or linearly depen-
dent. If the system has only the trivial solution, then A is invertible.

Example 5.5

An example, illustrated in Sketch 5.9, should help. Let our homogeneous system be
\[ \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} u = \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \]

Sketch 5.9. Homogeneous system with nontrivial solution.
Clearly, a2 = 2a1 ; the matrix A maps all vectors onto the line de-
fined by a1 and the origin. In this example, any vector u that is
perpendicular to a1 will be projected to the zero vector:

A[cu] = 0.

After one step of forward elimination, we have


   
\[ \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix} u = \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \]
Any u2 solves the last equation. So let’s pick u2 = 1. Back substitu-
tion then gives u1 = −2, therefore
 
\[ u = \begin{bmatrix} -2 \\ 1 \end{bmatrix} \]
is a solution to the system; so is any multiple of it. Also check that
a1 · u = 0, so they are in fact perpendicular.
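A short Python/NumPy check of Example 5.5 (added for illustration only):

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [2.0, 4.0]])
    u = np.array([-2.0, 1.0])        # the kernel vector found above

    print(A @ u)                     # [0. 0.]: u and every multiple cu solve Au = 0
    print(np.linalg.matrix_rank(A))  # 1: only a rank deficient map has a nontrivial kernel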

All vectors u that satisfy a homogeneous system make up the kernel


or null space of the matrix.

Example 5.6

We now consider an example of a homogeneous system that has only


the trivial solution:
\[ \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix} u = \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \]
The two columns of A are linearly independent; therefore, A does not reduce dimensionality. Then it cannot map any nonzero vector u to the zero vector!
This is clear after one step of forward elimination,
\[ \begin{bmatrix} 1 & 2 \\ 0 & -3 \end{bmatrix} u = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \]
and back substitution results in
\[ u = \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \]

In general, we may state that a homogeneous system has nontrivial


solutions (and therefore, infinitely many solutions) only if the columns
of the matrix are linearly dependent.
In some situations, row pivoting will not prepare the linear system
for back substitution, necessitating column pivoting. When columns
are exchanged, the corresponding unknowns must be exchanged as
well.

Example 5.7

This next linear system might seem like a silly one to pose; however,
systems of this type do arise in Section 7.3 in the context of finding
eigenvectors:
\[ \begin{bmatrix} 0 & 1/2 \\ 0 & 0 \end{bmatrix} u = 0. \]

In order to apply back substitution to this system, column pivoting


is necessary, thus the system becomes
  
\[ \begin{bmatrix} 1/2 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} u_2 \\ u_1 \end{bmatrix} = 0. \]

Now we set u1 = 1 and proceed with back substitution to find that


u2 = 0. All vectors of the form
 
\[ u = c\begin{bmatrix} 1 \\ 0 \end{bmatrix} \]

are solutions.

5.9 Undoing Maps: Inverse Matrices


In this section, we will see how to undo a linear map. Reconsider the
linear system
Au = b.
The matrix A maps u to b. Now that we know u, what is the matrix
B that maps b back to u,

u = Bb? (5.10)

Defining B—the inverse map—is the purpose of this section.


In solving the original linear system, we applied shears to the col-
umn vectors of A and to b. After the first shear, we had

S1 Au = S1 b.

This demonstrated how shears can be used to zero elements of the


matrix. Let’s return to the example linear system in (5.3). After
applying S1 the system became
    
\[ \begin{bmatrix} 2 & 4 \\ 0 & 4 \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} 4 \\ 2 \end{bmatrix}. \]
Let's use another shear to zero the upper right element. Geometrically, this corresponds to constructing a shear that will map the new a2 to the e2-axis. It is given by the matrix
\[ S_2 = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}. \]

Applying it to all vectors gives the new system


    
\[ \begin{bmatrix} 2 & 0 \\ 0 & 4 \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \end{bmatrix}. \]

After the second shear, our linear system has been changed to

S2 S1 Au = S2 S1 b.

Next, apply a nonuniform scaling S3 in the e1 and e2 directions


that will map the latest a1 and a2 onto the vectors e1 and e2 . For
our current example,
 
\[ S_3 = \begin{bmatrix} 1/2 & 0 \\ 0 & 1/4 \end{bmatrix}. \]

The new system becomes


    
\[ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1/2 \end{bmatrix}, \]

which corresponds to

S3 S2 S1 Au = S3 S2 S1 b.

This is a very special system. First of all, to solve for u is now


trivial because A has been transformed into the unit matrix or identity
matrix I,
\[ I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \qquad (5.11) \]
This process of transforming A until it becomes the identity is the-
oretically equivalent to the back substitution process of Section 5.4.
However, back substitution uses fewer operations and thus is the
method of choice for solving linear systems.
Yet we have now found the matrix B in (5.10)! The two shears and
scaling transformed A into the identity matrix I:

S3 S2 S1 A = I; (5.12)

thus, the solution of the system is

u = S3 S2 S1 b. (5.13)

This leads to the inverse matrix A−1 of a matrix A:

A−1 = S3 S2 S1 . (5.14)

The matrix A−1 undoes the effect of the matrix A: the vector u was
mapped to b by A, and b is mapped back to u by A−1 . Thus, we
can now write (5.13) as
u = A−1 b.
If this transformation result can be achieved, then A is called invert-
ible. At the end of this section and in Sections 5.6 and 5.7, we discuss
cases in which A−1 does not exist.
If we combine (5.12) and (5.14), we immediately get

A−1 A = I. (5.15)

This makes intuitive sense, since the actions of a map and its inverse
should cancel out, i.e., not change anything—that is what I does!
Figures 5.2 and 5.3 illustrate this. Then by the definition of the
inverse,
AA−1 = I.
If A−1 exits, then it is unique.
The inverse of the identity is the identity

I −1 = I.

The inverse of a scaling is given by:
\[ \begin{bmatrix} s & 0 \\ 0 & t \end{bmatrix}^{-1} = \begin{bmatrix} 1/s & 0 \\ 0 & 1/t \end{bmatrix}. \]

Multiply this out to convince yourself.


Figure 5.2 shows the effects of a matrix and its inverse for the
scaling
\[ \begin{bmatrix} 1 & 0 \\ 0 & 0.5 \end{bmatrix}. \]

Figure 5.3 shows the effects of a matrix and its inverse for the shear
 
\[ \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}. \]
We consider the inverse of a rotation as follows: if Rα rotates by α
degrees counterclockwise, then R−α rotates by α degrees clockwise,
or
\[ R_{-\alpha} = R_\alpha^{-1} = R_\alpha^T, \]
as we can see from the definition of a rotation matrix (4.16).

Figure 5.2.
Inverse matrices: illustrating scaling and its inverse, and that AA–1 = A–1 A = I. Top:
the original Phoenix, the result of applying a scale, then the result of the inverse scale.
Bottom: the original Phoenix, the result of applying the inverse scale, then the result
of the original scale.

Figure 5.3.
Inverse matrices: illustrating a shear and its inverse, and that AA–1 = A–1 A = I. Top:
the original Phoenix, the result of applying a shear, then the result of the inverse shear.
Bottom: the original Phoenix, the result of applying the inverse shear, then the result
of the original shear.

The rotation matrix is an example of an orthogonal matrix. An


orthogonal matrix A is characterized by the fact that
A−1 = AT .
The column vectors a1 and a2 of an orthogonal matrix satisfy ‖a1‖ = 1,
‖a2‖ = 1 and a1 · a2 = 0. In words, the column vectors are orthonor-
mal. The row vectors are orthonormal as well. Those transformations
that are described by orthogonal matrices are called rigid body mo-
tions. The determinant of an orthogonal matrix is ±1.
We add without proof two fairly obvious identities that involve the inverse:
\[ (A^{-1})^{-1} = A, \qquad (5.16) \]
which should be obvious from Figures 5.2 and 5.3, and
\[ (A^{-1})^T = (A^T)^{-1}. \qquad (5.17) \]
Figure 5.4 illustrates this for
\[ A = \begin{bmatrix} 1 & 0 \\ 1 & 0.5 \end{bmatrix}. \]

Figure 5.4.
Inverse matrices: the top illustrates I, A^{-1}, (A^{-1})^T and the bottom illustrates I, A^T, (A^T)^{-1}.

Given a matrix A, how do we compute its inverse? Let us start


with
AA−1 = I. (5.18)
If we denote the two (unknown) columns of A−1 by a1 and a2 , and
those of I by e1 and e2 , then (5.18) may be written as
   
\[ A\begin{bmatrix} a_1 & a_2 \end{bmatrix} = \begin{bmatrix} e_1 & e_2 \end{bmatrix}. \]
This is really short for two linear systems
Aa1 = e1 and Aa2 = e2 .
Both systems have the same matrix A and can thus be solved si-
multaneously. All we have to do is to apply the familiar shears and
scale—those that transform A to I—to both e1 and e2 .

Example 5.8

Let’s revisit Example 5.3 with


 
\[ A = \begin{bmatrix} -1 & 4 \\ 2 & 2 \end{bmatrix}. \]
Our two simultaneous systems are:
\[ \begin{bmatrix} -1 & 4 \\ 2 & 2 \end{bmatrix}\begin{bmatrix} a_1 & a_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \]
The first shear takes this to
\[ \begin{bmatrix} -1 & 4 \\ 0 & 10 \end{bmatrix}\begin{bmatrix} a_1 & a_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix}. \]
The second shear yields
\[ \begin{bmatrix} -1 & 0 \\ 0 & 10 \end{bmatrix}\begin{bmatrix} a_1 & a_2 \end{bmatrix} = \begin{bmatrix} 2/10 & -4/10 \\ 2 & 1 \end{bmatrix}. \]
Finally the scaling produces
\[ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} a_1 & a_2 \end{bmatrix} = \begin{bmatrix} -2/10 & 4/10 \\ 2/10 & 1/10 \end{bmatrix}. \]
Thus the inverse matrix
\[ A^{-1} = \begin{bmatrix} -2/10 & 4/10 \\ 2/10 & 1/10 \end{bmatrix}. \]
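In practice one lets a library carry out these shears and scalings. An illustrative Python/NumPy sketch (not part of the text) that confirms the result of Example 5.8:

    import numpy as np

    A = np.array([[-1.0, 4.0],
                  [ 2.0, 2.0]])
    A_inv = np.linalg.inv(A)
    print(A_inv)                              # [[-0.2  0.4]
                                              #  [ 0.2  0.1]]
    print(np.allclose(A @ A_inv, np.eye(2)))  # True: the inverse undoes A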

It can be the case that a matrix A does not have an inverse. For
example, the matrix  
2 1
4 2
is not invertible because the columns are linearly dependent (and
therefore the determinant is zero). A noninvertible matrix is also
referred to as singular. If we try to compute the inverse by setting up
two simultaneous systems,
   
2 1   1 0
a1 a2 = ,
4 2 0 1

then the first shear produces


   
2 1   1 0
a1 a2 = .
0 0 −2 1

At this point it is clear that we will not be able to construct linear


maps to achieve the identity matrix on the left side of the equation.
Examples of singular matrices were introduced in Sections 5.6–5.8.

5.10 Defining a Map


Matrices map vectors to vectors. If we know the result of such a map,
namely that two vectors v1 and v2 were mapped to v1' and v2', can
we find the matrix that did it?
Suppose some matrix A was responsible for the map. We would
then have the two equations

Av1 = v1' and Av2 = v2'.

Combining them, we can write

A[v1, v2] = [v1', v2'],

or, even shorter,


\[ AV = V'. \qquad (5.19) \]
To define A, we simply find V^{-1}, then
\[ A = V'\,V^{-1}. \]

Of course v1 and v2 must be linearly independent for V^{-1} to exist.
If the vi and vi' are each linearly independent, then A represents a
change of basis.

Example 5.9

Let's find the linear map A that maps the basis V formed by vectors
\[ v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad\text{and}\quad v_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \]
to the basis V' formed by vectors
\[ v_1' = \begin{bmatrix} -1 \\ -1 \end{bmatrix} \quad\text{and}\quad v_2' = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \]
as illustrated in Sketch 5.10.
First, we find V^{-1} following the steps in Example 5.8, resulting in
\[ V^{-1} = \begin{bmatrix} 1/2 & 1/2 \\ -1/2 & 1/2 \end{bmatrix}. \]
The change of basis linear map is
\[ A = V'V^{-1} = \begin{bmatrix} -1 & 1 \\ -1 & -1 \end{bmatrix}\begin{bmatrix} 1/2 & 1/2 \\ -1/2 & 1/2 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}. \]
Check that v_i is mapped to v_i'. If we have any vector v in the V basis, this map will return the coordinates of its corresponding vector v' in V'. Sketch 5.10 illustrates that v = [0 1]^T is mapped to v' = [0 −1]^T.
Sketch 5.10.
Change of basis.
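The same construction in Python/NumPy (an illustrative sketch we added; column_stack assembles V and V' from their column vectors):

    import numpy as np

    # Bases of Example 5.9
    V       = np.column_stack(([1.0, 1.0], [-1.0, 1.0]))     # columns v1, v2
    V_prime = np.column_stack(([-1.0, -1.0], [1.0, -1.0]))   # columns v1', v2'

    A = V_prime @ np.linalg.inv(V)   # the map with A vi = vi'
    print(A)                         # [[-1.  0.], [ 0. -1.]]
    print(A @ np.array([0.0, 1.0]))  # [ 0. -1.], as in Sketch 5.10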

In Section 6.5 we’ll revisit this topic with an application.

5.11 A Dual View


Let’s take a moment to recognize a dual view of linear systems. A
coordinate system or linear combination approach (5.1) represents
what we might call the “column view.” If instead we focus on the row
equations (5.5–5.6) we take a “row view.” A great example of this can
be found by revisiting two line intersection scenarios in Examples 3.9
and 3.10. In the former, we are intersecting two lines in parametric
form, and the problem statement takes the column view by asking
what linear combination of the column vectors results in the right-
hand side. In the latter, we are intersecting two lines in implicit form,


Figure 5.5.
Linear system classification: Three linear systems interpreted as line intersection
problems. Left to right: unique solution, inconsistent, underdetermined.

and the problem statement takes the row view by asking what u1 and
u2 satisfy both line equations. Depending on the problem at hand,
we can choose the view that best suits our given information.
We took a column view in our approach to presenting 2 × 2 linear
systems, but equally valid would be a row view. Let’s look at the
key examples from this chapter as if they were posed as implicit line
intersection problems. Figure 5.5 illustrates linear systems from Ex-
ample 5.1 (unique solution), the example in Section 5.6 (inconsistent
linear system), and the example in Section 5.7 (underdetermined sys-
tem). Importantly, the column and row views of the systems result
in the same classification of the solution sets.
Figure 5.6 illustrates two types of homogeneous systems from ex-
amples in Section 5.8. Since the right-hand side of each line equation
is zero, the lines will pass through the origin. This guarantees the
trivial solution for both intersection problems. The system with non-
trivial solutions is depicted on the right as two identical lines.

Figure 5.6.
Homogeneous linear system classification: Two homogeneous linear systems inter-
preted as line intersection problems. Left to right: trivial solution only, nontrivial
solutions.

• linear system
• solution spaces
• consistent linear system
• Cramer's rule
• upper triangular
• Gauss elimination
• forward elimination
• back substitution
• linear combination
• inverse matrix
• orthogonal matrix
• orthonormal
• rigid body motion
• inconsistent system of equations
• underdetermined system of equations
• homogeneous system
• kernel
• null space
• row pivoting
• column pivoting
• complete pivoting
• change of basis
• column and row views of linear systems

5.12 Exercises
1. Using the matrix form, write down the linear system to express
 
\[ \begin{bmatrix} 6 \\ 3 \end{bmatrix} \]
in terms of the local coordinate system defined by the origin,
\[ a_1 = \begin{bmatrix} 2 \\ -3 \end{bmatrix}, \quad\text{and}\quad a_2 = \begin{bmatrix} 6 \\ 0 \end{bmatrix}. \]

2. Is the following linear system consistent? Why?


   
\[ \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix} u = \begin{bmatrix} 0 \\ 4 \end{bmatrix}. \]

3. What are the three possibilities for the solution space of a linear system
Au = b?
4. Use Cramer’s rule to solve the system in Exercise 1.
5. Use Cramer’s rule to solve the system
    
\[ \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 8 \\ 2 \end{bmatrix}. \]

6. Give an example of an upper triangular matrix.


7. Use Gauss elimination to solve the system in Exercise 1.
8. Use Gauss elimination to solve the system
    
\[ \begin{bmatrix} 4 & -2 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -2 \\ 1 \end{bmatrix}. \]

9. Use Gauss elimination with pivoting to solve the system


    
\[ \begin{bmatrix} 0 & 4 \\ 2 & 2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 8 \\ 6 \end{bmatrix}. \]

10. Resolve the system in Exercise 1 with Gauss elimination with pivoting.
11. Give an example by means of a sketch of an unsolvable system. Do the
same for an underdetermined system.
12. Under what conditions can a nontrivial solution to a homogeneous sys-
tem be found?
13. Does the following homogeneous system have a nontrivial solution?
    
\[ \begin{bmatrix} 2 & 2 \\ 0 & 4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \]

14. What is the kernel of the matrix


 
\[ C = \begin{bmatrix} 2 & 6 \\ 4 & 12 \end{bmatrix}? \]

15. What is the null space of the matrix


 
\[ C = \begin{bmatrix} 2 & 4 \\ 1 & 2 \end{bmatrix}? \]

16. Find the inverse of the matrix in Exercise 1.


17. What is the inverse of the matrix
 
\[ \begin{bmatrix} 10 & 0 \\ 0 & 0.5 \end{bmatrix}? \]

18. What is the inverse of the matrix


 
\[ \begin{bmatrix} \cos 30^\circ & -\sin 30^\circ \\ \sin 30^\circ & \cos 30^\circ \end{bmatrix}? \]

19. What type of matrix has the property that A−1 = AT ? Give an example.
20. What is the inverse of
\[ \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}? \]
21. Define the matrix A that maps
       
\[ \begin{bmatrix} 1 \\ 0 \end{bmatrix} \to \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 1 \\ 1 \end{bmatrix} \to \begin{bmatrix} 1 \\ -1 \end{bmatrix}. \]

22. Define the matrix A that maps


       
\[ \begin{bmatrix} 2 \\ 0 \end{bmatrix} \to \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 0 \\ 4 \end{bmatrix} \to \begin{bmatrix} -1 \\ 1 \end{bmatrix}. \]
6
Moving Things Around:
Affine Maps in 2D

Figure 6.1.
Moving things around: affine maps in 2D applied to an old and familiar video game
character.

Imagine playing a video game. As you press a button, figures and


objects on the screen start moving around; they shift their positions,


they rotate, they zoom in or out. As you see this kind of motion,
the game software must carry out quite a few transformations. In
Figure 6.1, they have been applied to a familiar face in gaming. These
computations are implementations of affine maps, the subject of this
chapter.

6.1 Coordinate Transformations


In Section 4.1 the focus was on constructing a linear map that takes
a vector v in the [e1 , e2 ]-system,

v = v1 e1 + v2 e2 ,

to a vector v′ in the [a1, a2]-system

   v′ = v1 a1 + v2 a2.

Recall that this latter system describes a parallelogram (skew) target


box with lower-left corner at the origin. In this chapter, we want to
construct this skew target box anywhere, and we want to map points
rather than vectors, as illustrated by Sketch 6.1.
Sketch 6.1. A skew target box.

Now we will describe a skew target box by a point p and two vectors a1, a2. A point x is mapped to a point x′ by

   x′ = p + x1 a1 + x2 a2,   (6.1)
      = p + Ax,              (6.2)

as illustrated by Sketch 6.1. This simply states that we duplicate the


[e1, e2]-geometry in the [a1, a2]-system: x′ has the same coordinates in the new system as x did in the old one. Technically, the linear map A in (6.2) is applied to the vector x − o, so it should be written as

   x′ = p + A(x − o),   (6.3)

where o is the origin of x’s coordinate system. In most cases, we will


have the familiar

   o = [0 0]T,
and then we will simply drop the “−o” part, as in (6.2).
Affine maps are the basic tools to move and orient objects. All
are of the form given in (6.3) and thus have two parts: a translation,
given by p, and a linear map, given by A.
Let’s try representing the coordinate transformation of Section 1.1
as an affine map. The point u lives in the [e1 , e2 ]-system, and we

wish to find x in the [a1 , a2 ]-system. Recall that the extents of


the target box define Δ1 = max1 − min1 and Δ2 = max2 − min2, so we set

   p = [min1 min2]T,   a1 = [Δ1 0]T,   a2 = [0 Δ2]T.

The affine map is defined as

   [x1]   [min1]   [Δ1  0 ] [u1]
   [x2] = [min2] + [0   Δ2] [u2],

and we recover (1.3) and (1.4).


Of course we are not restricted to target boxes that are parallel to
the [e1 , e2 ]-coordinate axes. Let’s look at such an example.

Example 6.1

Let

   p = [2 2]T,   a1 = [2 1]T,   a2 = [−2 4]T

define a new coordinate system, and let

   x = [2 1/2]T

be a point in the [e1, e2]-system. In the new coordinate system, the [a1, a2]-system, the coordinates of x define a new point x′. What is this point with respect to the [e1, e2]-system? The solution:

   x′ = [2] + [2 −2] [ 2 ] = [5]
        [2]   [1  4] [1/2]   [6].   (6.4)

Thus, x′ has coordinates

   [2 1/2]T

with respect to the [a1, a2]-system; with respect to the [e1, e2]-system, it has coordinates

   [5 6]T.
(See Sketch 6.2 for an illustration.)
Sketch 6.2.
Mapping a point and a vector.
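If you want to check this computation numerically, here is a minimal Python/NumPy sketch of x′ = p + Ax for Example 6.1; the code and variable names are our own illustration, not a prescribed implementation.

import numpy as np

p  = np.array([2.0, 2.0])
a1 = np.array([2.0, 1.0])
a2 = np.array([-2.0, 4.0])
A  = np.column_stack([a1, a2])   # columns of A are a1 and a2

x = np.array([2.0, 0.5])         # coordinates in the [a1, a2]-system
x_prime = p + A @ x
print(x_prime)                   # [5. 6.], as in (6.4)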

And now an example of a skew target box. Let’s revisit Example 5.1
from Section 5.1, and add an affine aspect to it by translating our
target box.

Example 6.2

Sketch 6.3 illustrates the given geometry,

   p = [2 2]T,   a1 = [2 1]T,   a2 = [4 6]T,

and the point

   r = [6 6]T

with respect to the [e1, e2]-system.

Sketch 6.3. A new coordinate system.

We may ask, what are the coordinates of r with respect to the [a1, a2]-system? This was the topic of Chapter 5; we simply set up the linear system, Au = (r − p), or

   [2 4] [u1] = [4]
   [1 6] [u2]   [4].

Using Cramer's rule or Gauss elimination from Chapter 5, we find that u = [1 1/2]T.
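The same example can be checked by letting a linear system solver do the work of Cramer's rule or Gauss elimination; the following NumPy sketch (our own, with assumed variable names) recovers u.

import numpy as np

p = np.array([2.0, 2.0])
A = np.column_stack([[2.0, 1.0], [4.0, 6.0]])   # columns a1 and a2
r = np.array([6.0, 6.0])

u = np.linalg.solve(A, r - p)    # solve Au = r - p
print(u)                         # [1.  0.5]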

6.2 Affine and Linear Maps


A map of the form v′ = Av is called a linear map because it preserves
linear combinations of vectors. This idea is expressed in (4.11) and
illustrated in Sketch 4.4. A very fundamental property of linear maps
has to do with ratios, which are defined in Section 2.5. What happens
to the ratio of three collinear points when we map them by an affine
map? The answer to this question is fairly fundamental to all of
geometry, and it is: nothing. In other words, affine maps leave ratios
unchanged, or invariant. To see this, let

p2 = (1 − t)p1 + tp3

and let an affine map be defined by

   x′ = Ax + p.

We now have

   p2′ = A((1 − t)p1 + tp3) + p
       = (1 − t)Ap1 + tAp3 + [(1 − t) + t]p
       = (1 − t)[Ap1 + p] + t[Ap3 + p]
       = (1 − t)p1′ + tp3′.

The step from the first to the second equation may seem a bit con-
trived; yet it is the one that makes crucial use of the fact that we are
combining points using barycentric combinations: (1 − t) + t = 1.
The last equation shows that the linear (1 − t), t relationship
among three points is not changed by affine maps—meaning that
their ratio is invariant, as is illustrated in Sketch 6.4. In particular,
the midpoint of two points will be mapped to the midpoint of the
image points.
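This invariance is easy to test numerically. The sketch below is our own; the matrix, translation, points, and the parameter t are arbitrary choices, and the check confirms that the barycentric factor is preserved.

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])          # a sample linear part (our choice)
p = np.array([3.0, 1.0])            # a sample translation (our choice)

p1 = np.array([0.0, 0.0])
p3 = np.array([4.0, 2.0])
t  = 0.25
p2 = (1 - t) * p1 + t * p3          # a collinear point with parameter t

image = lambda x: A @ x + p
q1, q2, q3 = image(p1), image(p2), image(p3)

# the image of p2 is the same combination of the image points
print(np.allclose(q2, (1 - t) * q1 + t * q3))   # True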
Sketch 6.4. Ratios are invariant under affine maps.

The other basic property of affine maps is this: they map parallel lines to parallel lines. If two lines do not intersect before they are mapped, then they will not intersect afterward either. Conversely,
two lines that intersect before the map will also do so afterward.
Figure 6.2 shows how two families of parallel lines are mapped to two
families of parallel lines. The two families intersect before and after
the affine map. The map uses the matrix
 
   A = [1 2]
       [2 1].

Figure 6.2.
Affine maps: parallel lines are mapped to parallel lines.

Figure 6.3.
Translations: points on a circle are translated by a fixed amount, and a line connects
corresponding points.

6.3 Translations
If an object is moved without changing its orientation, then it is trans-
lated. See Figure 6.3 in which points on a circle have been translated
by a fixed amount.
How is this action covered by the general affine map in (6.3)? Recall
the identity matrix from Section 5.9, which has no effect whatsoever
on any vector: we always have

Ix = x,

which you should be able to verify without effort.


A translation is thus written in the context of (6.3) as

   x′ = p + Ix.

One property of translations is that they do not change areas; the


two circles in Figure 6.3 have the same area. A translation causes a
rigid body motion. Recall that rotations are also of this type.

6.4 More General Affine Maps


In this section, we present two very common geometric problems that
demand affine maps. It is one thing to say “every affine map is of the
form Ax + p,” but it is not always clear what A and p should be for

a given problem. Sometimes a more constructive approach is called


for, as is the case with Problem 2 in this section.
Problem 1: Let r be some point around which you would like to rotate
some other point x by α degrees, as shown in Sketch 6.5. Let x be
the rotated point.
Rotations have been defined only around the origin, not around
arbitrary points. Hence, we translate our given geometry (the two
points r and x) such that r moves to the origin. This is easy:

   r̄ = r − r = 0,   x̄ = x − r.

Sketch 6.5. Rotating a point about another point.

Now we rotate the vector x̄ around the origin by α degrees:

   x̄′ = Ax̄.

The matrix A would be taken directly from (4.16). Finally, we translate x̄′ back to the center r of rotation:

   x′ = Ax̄ + r.

Let’s reformulate this in terms of the given information. This is


achieved by replacing x̄ by its definition:

   x′ = A(x − r) + r.   (6.5)

Example 6.3

Sketch 6.6. Rotate x 90° around r.

Let

   r = [2 1]T,   x = [3 0]T,

and α = 90°. We obtain

   x′ = [0 −1] [ 1] + [2] = [3]
        [1  0] [−1]   [1]   [2].

See Sketch 6.6 for an illustration.
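Equation (6.5) translates directly into code. The following small Python sketch is ours (the function name rotate_about is an assumption); it reproduces Example 6.3.

import numpy as np

def rotate_about(x, r, alpha_deg):
    """Rotate point x about point r by alpha degrees: x' = A(x - r) + r."""
    a = np.radians(alpha_deg)
    A = np.array([[np.cos(a), -np.sin(a)],
                  [np.sin(a),  np.cos(a)]])
    return A @ (x - r) + r

r = np.array([2.0, 1.0])
x = np.array([3.0, 0.0])
print(rotate_about(x, r, 90.0))    # [3. 2.]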

Problem 2: Let l be a line and x be a point. You want to reflect x


across l, with result x′, as shown in Sketch 6.7. This problem could be solved using the following affine maps. Find the intersection r of l with the e1-axis. Find the cosine of the angle between l and e1. Rotate x around r such that l is mapped to the e1-axis. Reflect the rotated x across the e1-axis, and finally undo the rotation. Complicated!

Sketch 6.7. Reflect a point across a line.
It is much easier to employ the “foot of a point” algorithm that
finds the closest point p on a line l to a point x, which was developed
in Section 3.7. Then p must be the midpoint of x and x′:

   p = (1/2)x + (1/2)x′,

from which we conclude

   x′ = 2p − x.   (6.6)

While this does not have the standard affine map form, it is equivalent
to it, yet computationally much less complex.
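As a sketch of this foot-of-a-point approach (the line and point below are our own example, and the closest-point formula is the standard projection from Section 3.7), equation (6.6) becomes:

import numpy as np

def reflect_across_line(x, q, v):
    """Reflect x across the line l(t) = q + t v via the foot point p."""
    t = np.dot(x - q, v) / np.dot(v, v)   # parameter of the closest point
    p = q + t * v                         # foot of x on the line
    return 2.0 * p - x                    # equation (6.6)

x = np.array([3.0, 0.0])
q = np.array([0.0, 0.0])
v = np.array([1.0, 1.0])
print(reflect_across_line(x, q, v))       # [0. 3.]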

6.5 Mapping Triangles to Triangles


Affine maps may be viewed as combinations of linear maps and trans-
lations. Another flavor of affine maps is described in this section; it
draws from concepts in Chapter 5.
This other flavor arises like this: given a (source) triangle T with vertices a1, a2, a3, and a (target) triangle T′ with vertices a1′, a2′, a3′, what affine map takes T to T′? More precisely, if x is a point inside T, it will be mapped to a point x′ inside T′: how do we find x′? For starters, see Sketch 6.8.

Sketch 6.8. Two triangles define an affine map.

Our desired affine map will be of the form

   x′ = A[x − a1] + a1′,

thus we need to find the matrix A. (We have chosen a1 and a1′ arbitrarily as the origins in the two coordinate systems.) We define

   v2 = a2 − a1,   v3 = a3 − a1,

and

   v2′ = a2′ − a1′,   v3′ = a3′ − a1′.

We know

   Av2 = v2′,
   Av3 = v3′.

These two vector equations may be combined into one matrix equation:

   A [v2 v3] = [v2′ v3′],

which we abbreviate as

   AV = V′.

We multiply both sides of this equation by V's inverse V−1 and obtain A as

   A = V′V−1.
This is the matrix we derived in Section 5.10, “Defining a Map.”

Example 6.4

Triangle T is defined by the vertices

   a1 = [0 1]T,   a2 = [−1 −1]T,   a3 = [1 −1]T,

and triangle T′ is defined by the vertices

   a1′ = [0 1]T,   a2′ = [1 3]T,   a3′ = [−1 3]T.

The matrices V and V′ are then defined as

   V = [−1  1]        V′ = [1 −1]
       [−2 −2],            [2  2].

The inverse of the matrix V is

   V−1 = [−1/2 −1/4]
         [ 1/2 −1/4],

thus the linear map A is defined as

   A = [−1  0]
       [ 0 −1].

Do you recognize the map?
Let's try a sample point

   x = [0 −1/3]T

in T. This point is mapped to

   x′ = [−1  0] ([ 0  ] − [0]) + [0] = [ 0 ]
        [ 0 −1] ([−1/3]   [1])   [1]   [7/3]

in T′.

Note that V ’s inverse V −1 might not exist; this is the case when
v2 and v3 are linearly dependent and thus |V | = 0.
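The construction A = V′V−1 and the map x′ = A[x − a1] + a1′ take only a few lines of code; the NumPy sketch below is our own (the names b1, b2, b3 stand in for a1′, a2′, a3′) and reproduces Example 6.4.

import numpy as np

# vertices of T and T' from Example 6.4 (b1, b2, b3 play the role of a1', a2', a3')
a1, a2, a3 = np.array([0., 1.]), np.array([-1., -1.]), np.array([1., -1.])
b1, b2, b3 = np.array([0., 1.]), np.array([1., 3.]),   np.array([-1., 3.])

V  = np.column_stack([a2 - a1, a3 - a1])
Vp = np.column_stack([b2 - b1, b3 - b1])
A  = Vp @ np.linalg.inv(V)               # A = V' V^(-1)

x = np.array([0.0, -1.0 / 3.0])
x_prime = A @ (x - a1) + b1
print(A)          # [[-1.  0.]
                  #  [ 0. -1.]]
print(x_prime)    # [0.  2.3333...], i.e., [0 7/3]^T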

6.6 Composing Affine Maps


Linear maps are an important theoretical tool, but ultimately we are
interested in affine maps; they map objects that are defined by points
to other such objects.
If an affine map is given by

   x′ = A(x − o) + p,

nothing keeps us from applying it twice, resulting in x′′:

   x′′ = A(x′ − o) + p.
This may be repeated several times—for interesting choices of A and
p, interesting images will result.

Example 6.5

For Figure 6.4 (left), the affine map is defined by


    
   A = [cos 45°  −sin 45°] [0.5  0.0]
       [sin 45°   cos 45°] [0.0  0.5]     and     p = [0 0]T.   (6.7)

In Figure 6.4 (right), a translation was added:

   p = [0.2 0]T.   (6.8)

Figure 6.4.
Composing affine maps: The affine maps from (6.7) and (6.8) are applied iteratively,
resulting in the left and right images, respectively. The starting object is a set of points
on the unit circle centered at the origin. Successive iterations are lighter gray. The
same linear map is used for both images; however, a translation has been added to
create the right image.

For both images, the linear map is a composition of a scale and a


rotation. In the left image, each successive iteration is applied to
geometry that is centered about the origin. In the right image, the
translation steps away from the origin, thus the rotation action moves
the geometry in the e2 -direction even though the translation is strictly
in e1 .
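Iterating an affine map as in Figure 6.4 is a simple loop; the sketch below is our own (100 circle points and five iterations are arbitrary choices) and applies the map of (6.7) and (6.8).

import numpy as np

a = np.radians(45.0)
R = np.array([[np.cos(a), -np.sin(a)],
              [np.sin(a),  np.cos(a)]])
A = R @ np.diag([0.5, 0.5])              # the linear part of (6.7)
p = np.array([0.2, 0.0])                 # the translation of (6.8)

theta  = np.linspace(0.0, 2.0 * np.pi, 100)
points = np.column_stack([np.cos(theta), np.sin(theta)])   # unit circle

for _ in range(5):                       # repeat x' = Ax + p
    points = points @ A.T + p
print(points[0])                         # first circle point after five maps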

Example 6.6

For Figure 6.5 (left), the affine map is


     
   A = [cos 90°  −sin 90°] [0.5  0.0] [1  1]
       [sin 90°   cos 90°] [0.0  0.5] [0  1]     and     p = [0 0]T.   (6.9)

In Figure 6.5 (right), a translation was added:

   p = [2 0]T.   (6.10)

For both images, the linear map is a composition of a shear, scale,


and rotation. In the left image, each successive iteration is applied
to geometry that is centered about the origin. In the right image,

Figure 6.5.
Composing affine maps: The affine maps from (6.9) and (6.10) are applied iteratively,
resulting in the left and right images, respectively. The starting object is a set of points
on the unit circle centered at the origin. Successive iterations are lighter gray. The
same linear map is used for both images; however, a translation has been added to
create the right image.

the translation steps away from the origin, thus the rotation action
moves the geometry in the e2 -direction even though the translation
is strictly in e1 . Same idea as in Example 6.5, but a very different
affine map! One interesting artifact of this map: We expect a circle
to be mapped to an ellipse, but by iteratively applying the shear and
rotation, the ellipse is stretched just right to morph back to a circle!

Rotations can also be made more interesting. In Figure 6.6, you


see the letter S rotated several times around the origin, which is near
the lower left of the letter.
Adding scaling and rotation results in Figure 6.7. The basic affine
map for this case is given by

   x′ = S[Rx + p]

where R rotates by −20°, S scales nonuniformly, and p translates:

   R = [cos(−20°)  −sin(−20°)]      S = [1.25  0  ]      p = [5]
       [sin(−20°)   cos(−20°)],         [0     1.1],         [5].

We finish this chapter with Figure 6.8 by the Dutch artist, M.C.
Escher [5], who in a very unique way mixed complex geometric issues
with a unique style. The figure plays with reflections, which are affine
maps.

Figure 6.6. Rotations: the letter S is rotated several times; the origin is at the lower left of the letter.
Figure 6.7. Rotations: the letter S is rotated several times; scalings and translations are also applied.

Figure 6.8.
M.C. Escher: Magic Mirror (1949).

Figure 6.8 is itself a 2D object, and so may be subjected to affine


maps. Figure 6.9 gives an example. The matrix used here is
 
   A = [ 0.7   0.35]
       [−0.14  0.49].   (6.11)

No translation is applied.

Figure 6.9.
M.C. Escher: Magic Mirror (1949); affine map applied.

• linear map
• affine map
• translation
• identity matrix
• barycentric combination
• invariant ratios
• rigid body motion
• rotate a point about another point
• reflect a point about a line
• three points mapped to three points
• mapping triangles to triangles

6.7 Exercises
For Exercises 1 and 2 let an affine map be defined by

   A = [2 1]        p = [2]
       [1 2]   and      [2].

1. Let

      r = [0 1]T,   s = [1 3/2]T,

   and q = (1/3)r + (2/3)s. Compute r′, s′, q′; e.g., r′ = Ar + p. Show that q′ = (1/3)r′ + (2/3)s′.

2. Let

      t = [0 1]T   and   m = [2 1]T.

   Compute t′ and m′. Sketch the lines defined by t, m and t′, m′. Do the same for r and s from Exercise 1. What does this illustrate?

3. Map the three collinear points

      x1 = [0 0]T,   x2 = [1 1]T,   x3 = [2 2]T,

   to points xi′ by the affine map Ax + p, where

      A = [1 2]        p = [−4]
          [0 1]   and      [ 0].

   What is the ratio of the xi? What is the ratio of the xi′?

4. Rotate the point x = [−2 −2]T by 90° around the point r = [−2 2]T. Define the matrix and point for this affine map.

5. Rotate the points

      p0 = [1 0]T,   p1 = [2 0]T,   p2 = [3 0]T,   p3 = [4 0]T

   by 45° around the point p0. Define the affine map needed here in terms of a matrix and a point. Hint: Note that the points are evenly spaced so some economy in calculation is possible.

6. Reflect the point x = [0 2]T about the line l(t) = p + tv, l(t) = [0 0]T + t[1 2]T.

7. Reflect the points

      p0 = [2 1]T,   p1 = [1 1]T,   p2 = [0 1]T,   p3 = [−1 1]T,   p4 = [−2 1]T

   about the line l(t) = p0 + tv, where v = [−2 2]T. Hint: Note that the points are evenly spaced so some economy in calculation is possible.

8. Given a triangle T with vertices

      a1 = [0 0]T,   a2 = [1 0]T,   a3 = [0 1]T,

   and T′ with vertices

      a1′ = [1 0]T,   a2′ = [0 1]T,   a3′ = [0 0]T,

   what is the affine map that maps T to T′? What are the coordinates of the point x′ corresponding to x = [1/2 1/2]T?

9. Given a triangle T with vertices

      a1 = [2 0]T,   a2 = [0 1]T,   a3 = [−2 0]T,

   and T′ with vertices

      a1′ = [2 0]T,   a2′ = [0 −1]T,   a3′ = [−2 0]T,

   suppose that the triangle T has been mapped to T′ via an affine map. What are the coordinates of the point x′ corresponding to x = [0 0]T?

10. Construct the affine map that maps any point x with respect to triangle T to x′ with respect to triangle T′ using the vertices in Exercise 9.

11. Let's revisit the coordinate transformation from Exercise 10 in Chapter 1. Construct the affine map which takes a 2D point x in NDC coordinates to the 2D point x′ in a viewport. Recall that the extents of the NDC system are defined by the lower-left and upper-right points

      ln = [−1 −1]T   and   un = [1 1]T,

    respectively. Suppose we want to map to a viewport with extents

      lv = [10 10]T   and   uv = [30 20]T.

    After constructing the affine map, find the points in the viewport associated with the NDC points

      x1 = [−1 −1]T,   x2 = [1 1]T,   x3 = [−1/2 1/2]T.

12. Affine maps transform parallel lines to parallel lines. Do affine maps transform perpendicular lines to perpendicular lines?

13. Which affine maps are rigid body motions?

14. The solution to the problem of reflecting a point across a line is given by (6.6). Why is this a valid combination of points?
7
Eigen Things

Figure 7.1.
The Tacoma Narrows Bridge: a view from the approach shortly before collapsing.

A linear map is described by a matrix, but that does not say much
about its geometric properties. When you look at the 2D linear map
figures from Chapter 4, you see that they all map a circle, formed
from the wings of the Phoenix, to some ellipse—called the action
ellipse, thereby stretching and rotating the circle. This stretching


Figure 7.2.
The Tacoma Narrows Bridge: a view from shore shortly before collapsing.

and rotating is the geometry of a linear map; it is captured by its


eigenvectors and eigenvalues, the subject of this chapter.
Eigenvalues and eigenvectors play an important role in the analysis
of mechanical structures. If a bridge starts to sway because of strong
winds, then this may be described in terms of certain eigenvalues
associated with the bridge’s mathematical model. Figures 7.1 and 7.2
show how the Tacoma Narrows Bridge swayed violently during mere
42-mile-per-hour winds on November 7, 1940. It collapsed seconds
later. Today, a careful eigenvalue analysis is carried out before any
bridge is built. But bridge design is complex and this problem can
still occur; the Millennium Bridge in London was swaying enough to
cause seasickness in some visitors when it opened in June 2000.
The essentials of all eigentheory are already present in the humble
2D case, the subject of this chapter. A discussion of the higher-
dimensional case is given in Section 15.1.

7.1 Fixed Directions


Consider Figure 4.2. You see that the e1 -axis is mapped to itself; so
is the e2 -axis. This means that any vector of the form ce1 or de2 is
mapped to some multiple of itself. Similarly, in Figure 4.8, you see
that all vectors of the form ce1 are mapped to multiples of each other.

The directions defined by those vectors are called fixed directions,


for the reason that those directions are not changed by the map.
All vectors in the fixed directions change only in length. The fixed
directions need not be the coordinate axes.
If a matrix A takes a (nonzero) vector r to a multiple of itself, then
this may be written as
Ar = λr (7.1)
with some real number λ.1 The value of λ will determine if r will
expand, contract, or reverse direction. From now on, we will disregard
the “trivial solution” r = 0 from our considerations.
We next observe that A will treat any multiple of r in this way as
well. Given a matrix A, one might then ask which vectors it treats
in this special way. It turns out that there are at most two directions
(in 2D), and when we study symmetric matrices (e.g., a scaling) in
Section 7.5, we’ll see that they are orthogonal to each other. These
special vectors are called the eigenvectors of A, from the German
word eigen, meaning special or proper. An eigenvector is mapped
to a multiple of itself, and the corresponding factor λ is called its
eigenvalue. The eigenvalues and eigenvectors of a matrix are the keys
to understanding its geometry.

7.2 Eigenvalues
We now develop a way to find the eigenvalues of a 2 × 2 matrix A.
First, we rewrite (7.1) as

Ar = λIr,

with I being the identity matrix. We may change this to

[A − λI]r = 0. (7.2)

This means that the matrix [A − λI] maps a nonzero vector r to the
zero vector; [A − λI] must be a rank deficient matrix. Then [A − λI]’s
determinant vanishes:

p(λ) = det[A − λI] = 0. (7.3)

As you will see, (7.3) is a quadratic equation in λ, called the character-


istic equation of A. And p(λ) is called the characteristic polynomial.
1 This is the Greek letter lambda.

Figure 7.3.
Action of a matrix: behavior of the matrix from Example 7.1.

Example 7.1

Before we proceed further, an example. Let


 
   A = [2 1]
       [1 2].

Its action is shown in Figure 7.3.


So let’s write out (7.3). It is

   p(λ) = | 2 − λ     1   |
          |   1     2 − λ | = 0.

We expand the determinant and gather terms of λ to form the char-


acteristic equation

p(λ) = λ2 − 4λ + 3 = 0.

This quadratic equation2 has the roots

λ1 = 3, λ2 = 1.

2 Recall that the quadratic equation aλ² + bλ + c = 0 has the solutions λ1 = (−b + √(b² − 4ac))/(2a) and λ2 = (−b − √(b² − 4ac))/(2a).

Thus, the eigenvalues of a 2 × 2 matrix are nothing but the zeroes


of the quadratic equation

p(λ) = (λ − λ1 )(λ − λ2 ) = 0. (7.4)

Commonly, eigenvalues are ordered in decreasing absolute value order:


|λ1 | ≥ |λ2 |. The eigenvalue λ1 is called the dominant eigenvalue.
From (7.3), we see that the characteristic polynomial evaluated at
λ = 0 results in the determinant of A: p(0) = det[A]. Furthermore,
(7.4) shows that the determinant is the product of the eigenvalues:

|A| = λ1 · λ2 .

Check this in Example 7.1. This makes intuitive sense if we consider


the determinant as a measure of the change in area of the unit square
as it is mapped by A to a parallelogram. The eigenvalues indicate a
scaling of certain fixed directions defined by A.
Now let’s look at how to compute these fixed directions, or eigen-
vectors.

7.3 Eigenvectors
Continuing with Example 7.1, we would still like to know the corre-
sponding eigenvectors. We know that one of them will be mapped to
three times itself, the other one to itself. Let’s call the corresponding
eigenvectors r1 and r2 . The eigenvector r1 satisfies
 
   [2 − 3    1  ] r1 = 0,
   [  1    2 − 3]

or

   [−1  1] r1 = 0.
   [ 1 −1]
This is a homogeneous system. (Section 5.8 introduced these systems
and a technique to solve them with Gauss elimination.) Such sys-
tems have either none or infinitely many solutions. In our case, since
the matrix has rank 1, there are infinitely many solutions. Forward
elimination results in  
   [−1 1] r1 = 0.
   [ 0 0]
Assign r2,1 = 1, then back substitution results in r1,1 = 1. Any vector
of the form  
   r1 = c [1 1]T

Figure 7.4.
Eigenvectors: the action of the matrix from Example 7.1 and its eigenvectors, scaled
by their corresponding eigenvalues.

will do. And indeed, Figure 7.4 indicates that

   [1 1]T

is stretched by a factor of three, that is Ar1 = 3r1. Of course

   [−1 −1]T

is also stretched by a factor of three.
Next, we determine r2. We get the linear system

   [1 1] r2 = 0.
   [1 1]

Again, we have a homogeneous system with infinitely many solutions. They are all of the form

   r2 = c [−1 1]T.

Now recheck Figure 7.4; you see that the vector

   [1 −1]T

is not stretched, and indeed it is mapped to itself!


Typically, eigenvectors are normalized to achieve a degree of unique-


ness, and we then have
   
   r1 = (1/√2) [1 1]T,   r2 = (1/√2) [−1 1]T.

Let us return to the general 2 × 2 case and review the ideas thus
far. The fixed directions r of a map A that satisfy Ar = λr are key to
understanding the action of the map. The expression det[A − λI] = 0
is a quadratic polynomial in λ, and its zeroes λ1 and λ2 are A’s eigen-
values. To find the corresponding eigenvectors, we set up the linear
systems [A − λ1 I]r1 = 0 and [A − λ2 I]r2 = 0. Both are homoge-
neous linear systems with infinitely many solutions, corresponding to
the eigenvectors r1 and r2 , which are in the null space of the matrix
[A − λI].
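In practice the eigenvalues and eigenvectors are rarely computed by hand; a library routine does both steps at once. Here is a small NumPy sketch (ours) for the matrix of Example 7.1; note that the ordering of the returned eigenvalues is up to the library.

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lambdas, R = np.linalg.eig(A)   # eigenvalues and eigenvector columns
print(lambdas)                  # 3 and 1 (order may vary)

for lam, r in zip(lambdas, R.T):
    print(np.allclose(A @ r, lam * r))   # True for each eigenpair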

Example 7.2

Let’s look at another matrix, namely


 
   A = [1 2]
       [0 2].

The characteristic equation, (1 − λ)(2 − λ) = 0 results in eigenvalues


λ1 = 1 and λ2 = 2, and the corresponding homogeneous systems
   
   [0 2] r1 = 0   and   [−1 2] r2 = 0.
   [0 1]                [ 0 0]

The system for r1 deserves some special attention. In order to apply


back substitution to this system, column pivoting is necessary, thus
the system becomes
  
   [2 0] [r2,1] = 0.
   [1 0] [r1,1]

Notice that when column vectors are exchanged, the solution vector
components are exchanged as well. One more forward elimination
step leads to
  
   [2 0] [r2,1] = 0.
   [0 0] [r1,1]

Assign r1,1 = 1, and back substitution results in r2,1 = 0.



We use the same assignment and back substitution strategy to solve


for r2 . In summary, the two homogeneous systems result in (normal-
ized) eigenvectors
   
   r1 = [1 0]T   and   r2 = (1/√5) [2 1]T,
which unlike the eigenvectors of Example 7.1, are not orthogonal.
(The matrix in Example 7.1 is symmetric.) We can confirm that
Ar1 = r1 and Ar2 = 2r2 .

The eigenvector, r1 corresponding to the dominant eigenvalue λ1


is called the dominant eigenvector.

7.4 Striving for More Generality


In our examples so far, we only encountered quadratic polynomials
with two zeroes. Life is not always that simple: As you might re-
call from calculus, there are either no, one, or two real zeroes of a
quadratic polynomial,3 as illustrated in Figure 7.5.

Figure 7.5.
Quadratic polynomials: from left to right, no zero, one zero, two real zeroes.

If there are no zeroes, then the corresponding matrix A has no


fixed directions. We know one example—rotations. They rotate every
vector, leaving no direction unchanged. Let’s look at a rotation by
−90°, given by

   [ 0 1]
   [−1 0].
Its characteristic equation is

   | −λ   1 |
   | −1  −λ | = 0
3 Actually, every quadratic polynomial has two zeroes, but they may be complex numbers.

or
λ2 + 1 = 0.
This has no real solutions, as expected.
A quadratic equation may also have one double root; then there
is only one fixed direction. A shear in the e1 -direction provides an
example—it maps all vectors in the e1 -direction to themselves. An
example is

   A = [1 1/2]
       [0  1 ].
The action of this shear is illustrated in Figure 4.8. You clearly see
that the e1 -axis is not changed.
The characteristic equation for A is

   | 1 − λ   1/2  |
   |   0    1 − λ | = 0

or

   (1 − λ)² = 0.
It has the double root λ1 = λ2 = 1. For the corresponding eigenvec-
tor, we have to solve

   [0 1/2] r = 0.
   [0  0 ]

In order to apply back substitution to this system, column pivoting is necessary, thus the system becomes

   [1/2 0] [r2] = 0.
   [ 0  0] [r1]

Now we set r1 = 1 and proceed with back substitution to find that


r2 = 0. Thus

   r = [1 0]T
and any multiple of it are solutions. This is quite as expected; those
vectors line up along the e1 -direction. The shear A has only one fixed
direction.
Is there an easy way to decide if a matrix has real eigenvalues or
not? In general, no. But there is one important special case: all
symmetric matrices have real eigenvalues. In Section 7.5 we’ll take a
closer look at symmetric matrices.
The last special case to be covered is that of a zero eigenvalue.

Example 7.3

Take the projection matrix from Figure 4.11. It was given by


 
   A = [0.5 0.5]
       [0.5 0.5].
The characteristic equation is

λ(λ − 1) = 0,

resulting in λ1 = 1 and λ2 = 0. The eigenvector corresponding to λ2


is found by solving

   [0.5 0.5] r1 = [0]
   [0.5 0.5]      [0],
a homogeneous linear system. Forward elimination leads to
   
   [0.5 0.5] r1 = [0]
   [0.0 0.0]      [0].
Assign r2,1 = 1, then back substitution results in r1,1 = −1. The
homogeneous system’s solutions are of the form
 
   r1 = c [−1 1]T.

Since this matrix maps nonzero vectors (multiples of r1 ) to the zero


vector, it reduces dimensionality, and thus has rank one. Note that
the eigenvector corresponding to the zero eigenvalue defines the kernel
or null space of the matrix.

There is more to learn from the projection matrix in Example 7.3.


From Section 4.8 we know that such rank one matrices are idempo-
tent, i.e., A2 = A. One eigenvalue is zero; let λ be the nonzero one,
with corresponding eigenvector r. Multiply both sides of (7.1) by A,
then

   A²r = λAr
   λr = λ²r,

and hence λ = 1. Thus, a 2D projection matrix always has eigenvalues


0 and 1. As a general statement, we may say that a 2 × 2 matrix with
one zero eigenvalue has rank one.

7.5 The Geometry of Symmetric Matrices


Symmetric matrices, those for which A = AT , arise often in practical
problems, and two important examples are addressed in this book:
conics in Chapter 19 and least squares approximation in Section 12.7.
However, many more practical examples exist, coming from fields
such as classical mechanics, elasticity theory, quantum mechanics,
and thermodynamics.
One nice thing about real symmetric matrices: their eigenvalues
are real. And another nice thing: they have an interesting geometric
interpretation, as we will see below. As a result, symmetric matrices
have become the workhorse for a plethora of numerical algorithms.
We know the two basic equations for eigenvalues and eigenvectors
of a symmetric 2 × 2 matrix A:

Ar1 = λ1 r1 , (7.5)
Ar2 = λ2 r2 . (7.6)

Since A is symmetric, we may use (7.5) to write the following:

   (Ar1)T = (λ1 r1)T
   r1T AT = r1T λ1
   r1T A = λ1 r1T

and then after multiplying both sides by r2 ,

   r1T A r2 = λ1 r1T r2.   (7.7)

Multiply both sides of (7.6) by r1T

   r1T A r2 = λ2 r1T r2   (7.8)

and equating (7.7) and (7.8)

   λ1 r1T r2 = λ2 r1T r2

or

   (λ1 − λ2) r1T r2 = 0.

If λ1 ≠ λ2 (the standard case), then we conclude that r1T r2 = 0;


in other words, A’s two eigenvectors are orthogonal. Check this in
Example 7.1 and the continuation of this example in Section 7.3.
We may condense (7.5) and (7.6) into one matrix equation,
   
   [Ar1  Ar2] = [λ1r1  λ2r2].   (7.9)

If we define (using the capital Greek Lambda: Λ)


   R = [r1  r2]   and   Λ = [λ1  0 ]
                             [0   λ2],

then (7.9) becomes


AR = RΛ. (7.10)

Example 7.4

For Example 7.1, AR = RΛ becomes


   [2 1] [1/√2  −1/√2]   [1/√2  −1/√2] [3 0]
   [1 2] [1/√2   1/√2] = [1/√2   1/√2] [0 1].

Verify this identity.
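The identity AR = RΛ, or equivalently A = RΛRT, is also easy to confirm numerically; the sketch below (ours) uses NumPy's routine for symmetric matrices, which returns an orthogonal R.

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lambdas, R = np.linalg.eigh(A)      # for symmetric A: real eigenvalues, orthonormal R
Lam = np.diag(lambdas)

print(np.allclose(R.T @ R, np.eye(2)))    # True: R^T R = I, cf. (7.11)
print(np.allclose(R @ Lam @ R.T, A))      # True: A = R Lambda R^T, cf. (7.12)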

Assume the eigenvectors are normalized so r1T r1 = 1 and r2T r2 = 1, and since they are orthogonal, r1T r2 = r2T r1 = 0; thus r1 and r2 are
orthonormal. These four equations may also be written in matrix
form
RT R = I, (7.11)
with I the identity matrix. Thus,

R−1 = RT

and R is an orthogonal matrix. Now (7.10) becomes

A = RΛRT , (7.12)

and it is called the eigendecomposition of A. Because it is possible


to transform A to the diagonal matrix Λ = R−1 AR, A is said to be
diagonalizable. Matrix decomposition is a fundamental tool in linear
algebra for giving insight into the action of a matrix and for building
stable and efficient methods to solve linear systems.
What does decomposition (7.12) mean geometrically? Since R is
an orthogonal matrix, it is a rotation, a reflection, or a combination
of the two. Recall that these linear maps preserve lengths and angles.
Its inverse, RT , is the same type of linear map as R, but a reversal
of the action of R. The diagonal matrix Λ is a scaling along each of

Figure 7.6.
Eigendecomposition: the action of the symmetric matrix A from Example 7.1. Top: I,
A. Bottom: I, RT (rotate –45◦ ), ΛRT (scale), RΛRT (rotate 45◦ ).

the coordinate axes. In Example 7.4, R is a rotation of 45◦ and the


eigenvalues are λ1 = 3 and λ2 = 1. This decomposition of A into
rotation, scale, rotation is illustrated in Figure 7.6. (RT is applied
first, so R is actually reversing the action of RT .)
Recall that there was a degree of freedom in selecting the eigenvectors, so r2 from above could have been r2 = [1/√2  −1/√2]T,
for example. This selection would result in R being a rotation and
reflection. Verify this statement. It is always possible to choose the
eigenvectors so that R is simply a rotation.
Another way to look at the action of the map A on a vector x is to
write

   Ax = RΛRT x

      = [r1  r2] Λ [r1T] x
                   [r2T]

      = [r1  r2] [λ1 r1T x]
                 [λ2 r2T x]

      = λ1 r1 r1T x + λ2 r2 r2T x.   (7.13)

Each matrix rk rkT is a projection onto rk, as introduced in Section 4.8.


As a result, the action of A can be interpreted as a linear combination
of projections onto the orthogonal eigenvectors.

Example 7.5

Let’s look at the action of the matrix A in Example 7.1 in terms of


(7.13), and specifically we will apply the map to x = [2 1/2]T .
See Example 7.4 for the eigenvalues and eigenvectors for A. The
projection matrices are

   P1 = r1 r1T = [1/2  1/2]        P2 = r2 r2T = [ 1/2  −1/2]
                 [1/2  1/2]   and                [−1/2   1/2].

The action of the map is then

   Ax = 3P1x + P2x = [15/4] + [ 3/4] = [9/2]
                     [15/4]   [−3/4]   [ 3 ].

This example is illustrated in Figure 7.7.

Figure 7.7.
Eigendecomposition: the action of the matrix from Example 7.5 interpreted as a linear
combination of projections. The vector x is projected onto each eigenvector, r1 and
r2 , and scaled by the eigenvalues (λ1 = 3, λ2 = 1). The action of A is a sum of scaled
projection vectors.

We will revisit this projection idea in the section of Principal Com-


ponents Analysis in Section 16.8; an application will be introduced
there.

7.6 Quadratic Forms


Recall from calculus the concept of a bivariate function: this is a
function f with two arguments x, y; we will use the notation v1 , v2
for the arguments. Then we write f as

f (v1 , v2 ) or f (v).

In this section, we deal with very special bivariate functions that


are defined in terms of 2 × 2 symmetric matrices C:

f (v) = vT Cv. (7.14)

Such functions are called quadratic forms because all terms are quadratic,
as we see by expanding (7.14)

   f(v) = c1,1 v1² + 2 c2,1 v1v2 + c2,2 v2².

The graph of a quadratic form is a 3D point set [v1 , v2 , f (v1 , v2 )]T ,


forming a quadratic surface. Figure 7.8 illustrates three types of
quadratic surfaces: ellipsoid, paraboloid, and hyperboloid. The cor-
responding matrices and quadratic forms are
     
   C1 = [2  0 ]      C2 = [2 0]      C3 = [−2  0 ]
        [0  0.5],          [0 0],          [ 0  0.5],   (7.15)

   f1(v) = 2v1² + 0.5v2²,   f2(v) = 2v1²,   f3(v) = −2v1² + 0.5v2²,   (7.16)

respectively. Accompanying each quadratic surface is a contour plot,


which is created from planar slices of the surface at several f (v) =
constant values. Each curve in a planar slice is called a contour, and
between contours, the area is colored to indicate a range of function
values. Each quadratic form type has distinct contours: the (bowl-
shaped) ellipsoid contains ellipses, the (taco-shaped) paraboloid con-
tains straight lines, and the (saddle-shaped) hyperboloid contains
hyperbolas.4
4 These are conic sections, to be covered in Chapter 19.


Figure 7.8.
Quadratic forms: ellipsoid, paraboloid, hyperboloid evaluated over the unit circle. The
[e1 , e2 ]-axes are displayed. A contour plot for each quadratic form communicates
additional shape information. Color map extents: min f (v) colored black and max f (v)
colored white.

Some more properties of our sample matrices: the determinant and


the eigenvalues, which are easily found by inspection,

|C1 | = 1 λ1 = 2, λ2 = 0.5
|C2 | = 0 λ1 = 2, λ2 = 0
|C3 | = −1 λ1 = −2, λ2 = 0.5.

A matrix is called positive definite if

f (v) = vT Av > 0 (7.17)

for any nonzero vector v ∈ R2 . This means that the quadratic form
is positive everywhere except for v = 0. An example is illustrated in
the left part of Figure 7.8. It is an ellipsoid: the matrix is in (7.15)
and the function is in (7.16). (It is hard to see exactly, but the middle
function, the paraboloid, has a line touching the zero plane.)
Positive definite symmetric matrices are a special class of matrices
that arise in a number of applications, and their well-behaved nature
lends them to numerically stable and efficient algorithms.

Geometrically we can get a handle on the positive definite condi-


tion (7.17) by first considering only unit vectors. Then, this condition
states that the angle between v and Av is between −90◦ and 90◦ , in-
dicating that A is somehow constrained in its action on v. It isn’t
sufficient to only consider unit vectors, though. Therefore, for a gen-
eral matrix, (7.17) is a difficult condition to verify.
However, for two special matrices, (7.17) is easy to verify. Suppose
A is not necessarily symmetric; then AT A and AAT are symmetric
and positive definite. We show this for the former matrix product:

vT AT Av = (Av)T (Av) = yT y > 0.


These matrices will be at the heart of an important decomposition
called the singular value decomposition (SVD), the topic of Chap-
ter 16.
The determinant of a positive definite 2 × 2 matrix is always posi-
tive, and therefore this matrix is always nonsingular. Of course these
concepts apply to n × n matrices, however there are additional re-
quirements on the determinant in order for a matrix to be positive
definite. This is discussed in more detail in Chapters 12 and 15.
Let’s examine quadratic forms where we restrict C to be positive
definite, and in particular C = AT A. If we concentrate on all v for
which (7.14) yields the value 1, we get

vT Cv = 1.
This is a contour of f .
Let’s look at this contour for C1 :

   2v1² + 0.5v2² = 1;

it is the equation of an ellipse. By setting v1 = 0 and solving for v2, we can identify the e2-axis extents of the ellipse: ±1/√0.5. Similarly by setting v2 = 0 we find the e1-axis extents: ±1/√2. By definition, the major axis is the
longest of the ellipse’s two axes, so in this case, it is in the e2 -direction.
As noted above, the eigenvalues for C1 are λ1 = 2 and λ2 = 0.5
and the corresponding eigenvectors are r1 = [1 0]T and r2 = [0 1]T .
(The eigenvectors of a symmetric matrix are orthogonal.) Thus we
have that the minor axis corresponds to the dominant eigenvector.
Examining the contour plot in Figure 7.8 (left), we see that the ellipses
do indeed have the major axis corresponding to r2 and minor axis
corresponding to r1 . Interpreting the contour plot as a terrain map,
we see that the minor axis (dominant eigenvector direction) indicates
steeper ascent.
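Since the eigenvalues of a symmetric matrix are real, they give a convenient numerical test of definiteness; the sketch below (ours) classifies C1, C2, C3 from (7.15) by the signs of their eigenvalues.

import numpy as np

C1 = np.array([[ 2.0, 0.0], [0.0, 0.5]])
C2 = np.array([[ 2.0, 0.0], [0.0, 0.0]])
C3 = np.array([[-2.0, 0.0], [0.0, 0.5]])

for C in (C1, C2, C3):
    lambdas = np.linalg.eigvalsh(C)          # real eigenvalues of a symmetric matrix
    print(lambdas, "positive definite:", bool(np.all(lambdas > 0)))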

Figure 7.9.
Quadratic form contour: a planar slice of the quadratic form from Example 7.6 defines
an ellipse. The quadratic form has been scaled in e3 to make the contour easier to
see.

Example 7.6

Here is an example for which the major and minor axes are not aligned
with the coordinate axes:
   
   A = [2  0.5]      thus      C4 = ATA = [4   1  ]
       [0   1 ],                          [1  1.25],
and this quadratic form is illustrated in Figure 7.9.
The eigendecomposition, C4 = RΛRT , is defined by
   
   R = [−0.95   0.30]        Λ = [4.3   0  ]
       [−0.30  −0.95]   and      [ 0   0.92],
where the eigenvectors are the columns of R.
The ellipse defined by f4 = vT C4 v = 1 is

   4v1² + 2v1v2 + 1.25v2² = 1;

it is illustrated in Figure 7.9. The major and minor axis lengths are easiest to determine by using the eigendecomposition to perform a coordinate transformation. The ellipse can be expressed as

   vT RΛRT v = 1
   v̂T Λ v̂ = 1
   λ1 v̂1² + λ2 v̂2² = 1,

aligning the ellipse with the coordinate axes. Thus the minor axis has length 1/√λ1 = 1/√4.3 = 0.48 on the e1 axis and the major axis has length 1/√λ2 = 1/√0.92 = 1.04 on the e2 axis.

Figure 7.10.
Repetitions: a symmetric matrix is applied several times. One eigenvalue is greater
than one, causing stretching in one direction. One eigenvalue is less than one, causing
compaction in the opposing direction.

Figure 7.11.
Repetitions: a matrix is applied several times. The eigenvalues are not real, therefore
the Phoenixes do not line-up along fixed directions.

7.7 Repeating Maps


When we studied matrices, we saw that they always map the unit
circle to an ellipse. Nothing keeps us from now mapping the ellipse
again using the same map. We can then repeat again, and so on.
Figures 7.10 and 7.11 show two such examples.
Figure 7.10 corresponds to the matrix
 
   A = [1    0.3]
       [0.3  1  ].

Being symmetric, it has two real eigenvalues and orthogonal eigen-


vectors. As the map is repeated several times, the resulting ellipses

become more and more stretched: they are elongated in the direction
r1 by λ1 = 1.3 and compacted in the direction of r2 by a factor of
λ2 = 0.7, with
   r1 = [1/√2  1/√2]T,   r2 = [−1/√2  1/√2]T.

To get some more insight into this phenomenon, consider applying A


(now a generic matrix) twice to r1 . We get

   AAr1 = Aλ1r1 = λ1²r1.

In general,

   Aⁿr1 = λ1ⁿ r1.   (7.18)

The same holds for r2 and λ2 , of course. So you see that once a
matrix has real eigenvectors, they play a more and more prominent
role as the matrix is applied repeatedly.
By contrast, the matrix corresponding to Figure 7.11 is given by
 
   A = [0.7  0.3]
       [−1    1 ].

As you should verify for yourself, this matrix does not have real eigen-
values. In that sense, it is related to a rotation matrix. If you study
Figure 7.11, you will notice a rotational component as we progress—
the figures do not line up along any (real) fixed directions.
In Section 15.2 on the power method, we will apply this idea of
repeating a map in order to find the eigenvectors.
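A tiny numerical experiment already shows this dominance; the loop below is our own sketch (the start vector and iteration count are arbitrary choices). It repeatedly applies the matrix of Figure 7.10 and renormalizes, and the result settles into the dominant eigenvector direction.

import numpy as np

A = np.array([[1.0, 0.3],
              [0.3, 1.0]])          # the symmetric matrix of Figure 7.10

x = np.array([1.0, 0.0])            # arbitrary start vector
for _ in range(25):
    x = A @ x
    x = x / np.linalg.norm(x)       # renormalize so the numbers stay readable

print(x)                            # approximately [0.707 0.707], the direction of r1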

• fixed direction
• eigenvalue
• eigenvector
• characteristic equation
• dominant eigenvalue
• dominant eigenvector
• homogeneous system
• kernel or null space
• orthogonal matrix
• repeated linear map
• eigentheory of a symmetric matrix
• matrix with real eigenvalues
• diagonalizable matrix
• eigendecomposition
• quadratic form
• contour plot
• positive definite matrix

7.8 Exercises
1. For each of the following matrices, describe the action of the linear map and find the eigenvalues and eigenvectors:

      A = [1 s]      B = [s 0]      C = [cos θ  −sin θ]
          [0 1],         [0 s],         [sin θ   cos θ].

   Since we have not been working with complex numbers, if an eigenvalue is complex, do not compute the eigenvector.

2. Find the eigenvalues and eigenvectors of the matrix

      A = [ 1 −2]
          [−2  1].

   Which is the dominant eigenvalue?

3. Find the eigenvalues and eigenvectors of

      A = [ 1 −2]
          [−1  0].

   Which is the dominant eigenvalue?

4. If a 2 × 2 matrix has eigenvalues λ1 = 4 and λ2 = 2, and we apply this map to the vertices of the unit square (area 1), what is the area of the resulting parallelogram?

5. For 2 × 2 matrices A and I, what vectors are in the null space of A − λI?

6. For the matrix in Example 7.1, identify the four sets of eigenvectors that may be constructed (by using the degree of freedom available in choosing the direction). Consider each set as column vectors of a matrix R, and sketch the action of the matrix. Which sets involve a reflection?

7. Are the eigenvalues of the matrix

      A = [a b]
          [b c]

   real or complex? Why?

8. Let u be a unit length 2D vector and P = uuT. What are the eigenvalues of P?

9. Show that a 2 × 2 matrix A and its transpose share the same eigenvalues.

10. What is the eigendecomposition of

      A = [3 0]
          [0 4] ?

    Use the convention that the dominant eigenvalue is λ1.

11. Express the action of the map

      A = [2 0]
          [0 4]

    on the vector x = [1 1]T in terms of projections formed from the eigendecomposition.

12. Consider the following quadratic forms fi(v) = vT Ci v:

      C1 = [3 1]      C2 = [3 4]      C3 = [3 3]
           [1 3],          [4 1],          [3 3].

    Give the eigenvalues and classification (ellipsoid, paraboloid, hyperboloid) of each.

13. For Ci in Exercise 12, give the equation fi(v) for each.

14. For C1 in Exercise 12, what are the semi-major and semi-minor axis lengths for the ellipse vT C1 v = 1? What are the (normalized) axis directions?

15. For the quadratic form, f(v) = vT Cv = 2v1² + 4v1v2 + 3v2², what is the matrix C?

16. Is f(v) = 2v1² − 4v1v2 + 2v2 + 3v2² a quadratic form? Why?

17. Show that

      xT AAT x > 0

    for nonsingular A.

18. Are the eigenvalues of the matrix

      A = [a    a/2]
          [a/2  a/2]

    real or complex? Are any of the eigenvalues negative? Why?

19. If all eigenvalues of a matrix have absolute value less than one, what will happen as you keep repeating the map?

20. For the matrix A in Exercise 2, what are the eigenvalues and eigenvectors for A2 and A3?
8
3D Geometry

Figure 8.1.
3D objects: Guggenheim Museum in Bilbao, Spain. Designed by Frank Gehry.

This chapter introduces the essential building blocks of 3D geometry


by first extending the 2D tools from Chapters 2 and 3 to 3D. But
beyond that, we will also encounter some concepts that are truly 3D,
i.e., those that do not have 2D counterparts. With the geometry
presented in this chapter, we will be ready to create and analyze


simple 3D objects, which then could be utilized to create forms such


as in Figure 8.1.

8.1 From 2D to 3D
Moving from 2D to 3D geometry requires a coordinate system with
one more dimension. Sketch 8.1 illustrates the [e1 , e2 , e3 ]-system that
consists of the vectors
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 0 0
e1 = ⎣0⎦ , e2 = ⎣1⎦ , and e3 = ⎣0⎦ .
0 0 1

Thus, a vector in 3D is given as


   v = [v1 v2 v3]T.   (8.1)

The three components of v indicate the displacement along each axis


in the [e1 , e2 , e3 ]-system. This is illustrated in Sketch 8.1. A 3D
vector v is said to live in 3D space, or R3 , that is v ∈ R3 .
Sketch 8.1. The [e1, e2, e3]-axes, a point, and a vector.

A point is a reference to a location. Points in 3D are given as

   p = [p1 p2 p3]T.   (8.2)

The coordinates indicate the point’s location in the [e1 , e2 , e3 ]-system,


as illustrated in Sketch 8.1. A point p is said to live in Euclidean 3D-
space, or E3 , that is p ∈ E3 .
Let’s look briefly at some basic 3D vector properties, as we did for
2D vectors. First of all, the 3D zero vector:
   0 = [0 0 0]T.

Sketch 8.2. Length of a 3D vector.

Sketch 8.2 illustrates a 3D vector v along with its components. Notice the two right triangles. Applying the Pythagorean theorem twice, the length or Euclidean norm of v, denoted as ‖v‖, is

   ‖v‖ = √(v1² + v2² + v3²).   (8.3)

The length or magnitude of a 3D vector can be interpreted as distance,


speed, or force.
Scaling a vector by an amount k yields ‖kv‖ = |k|‖v‖. Also, a normalized vector has unit length, ‖v‖ = 1.

Example 8.1

We will get some practice working with 3D vectors. The first task is to normalize the vector

   v = [1 2 3]T.

First calculate the length of v as

   ‖v‖ = √(1² + 2² + 3²) = √14,

then the normalized vector w is

   w = v/‖v‖ = (1/√14) [1 2 3]T ≈ [0.27 0.53 0.80]T.

Check for yourself that ‖w‖ = 1.
Scale v by k = 2:

   2v = [2 4 6]T.

Now calculate

   ‖2v‖ = √(2² + 4² + 6²) = 2√14.

Thus we verified that ‖2v‖ = 2‖v‖.
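These computations are one-liners with a numerical library; the following Python sketch (ours) repeats Example 8.1.

import numpy as np

v = np.array([1.0, 2.0, 3.0])
length = np.linalg.norm(v)          # sqrt(1^2 + 2^2 + 3^2) = sqrt(14)
w = v / length                      # normalized vector

print(length)                       # 3.7416...
print(w)                            # [0.267 0.534 0.801]
print(np.isclose(np.linalg.norm(2 * v), 2 * length))   # True: ||kv|| = |k| ||v||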

There are infinitely many 3D unit vectors. In Sketch 8.3 a few of


these are drawn emanating from the origin. The sketch is a sphere of
radius one.
Sketch 8.3. All 3D unit vectors define a sphere.

All the rules for combining points and vectors in 2D from Section 2.2 carry over to 3D. The dot product of two 3D vectors, v and w, becomes

   v · w = v1w1 + v2w2 + v3w3.

The cosine of the angle θ between the two vectors can be determined as

   cos θ = (v · w) / (‖v‖ ‖w‖).   (8.4)

8.2 Cross Product


The dot product is a type of multiplication for two vectors that reveals
geometric information, namely the angle between them. However,
this does not reveal information about their orientation in relation to
R3 . Two vectors define a plane. It would be useful to have yet another
vector in order to create a 3D coordinate system that is embedded in
the [e1 , e2 , e3 ]-system. This is the purpose of another form of vector
multiplication called the cross product.
In other words, the cross product of v and w, written as
u = v ∧ w,
produces the vector u, which satisfies the following:
1. The vector u is perpendicular to v and w, that is
u · v = 0 and u · w = 0.
2. The orientation of the vector u follows the right-hand rule. This
means that if you curl the fingers of your right hand from v to w,
your thumb will point in the direction of u. See Sketch 8.4.
3. The magnitude of u is the area of the parallelogram defined by v
and w.
Because the cross product produces a vector, it is also called a vector
product.
Sketch 8.4. Characteristics of the cross product.

Items 1 and 2 determine the direction of u and item 3 determines the length of u. The cross product is defined as

   v ∧ w = [v2w3 − w2v3]
           [v3w1 − w3v1]
           [v1w2 − w1v2].   (8.5)
Item 3 ensures that the cross product of orthogonal and unit length
vectors v and w results in a vector u such that u, v, w are orthonor-
mal. In other words, u is unit length and perpendicular to v and w.
Notice that each component of (8.5) is a 2 × 2 determinant. For
the ith component, omit the ith component of v and w and negate
the middle determinant:
   (v ∧ w)1 =   | v2  w2 |
                | v3  w3 |,

   (v ∧ w)2 = − | v1  w1 |
                | v3  w3 |,

   (v ∧ w)3 =   | v1  w1 |
                | v2  w2 |.

Example 8.2

Compute the cross product of


   v = [1 0 2]T   and   w = [0 3 4]T.

The cross product is

   u = v ∧ w = [0 × 4 − 3 × 2]   [−6]
               [2 × 0 − 4 × 1] = [−4]
               [1 × 3 − 0 × 0]   [ 3].   (8.6)
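The same computation, together with the orthogonality and area properties of the cross product, can be checked with a short NumPy sketch (ours):

import numpy as np

v = np.array([1.0, 0.0, 2.0])
w = np.array([0.0, 3.0, 4.0])

u = np.cross(v, w)
print(u)                             # [-6. -4.  3.], as in (8.6)
print(np.dot(u, v), np.dot(u, w))    # 0.0 0.0: u is perpendicular to v and w
print(np.linalg.norm(u))             # area of the parallelogram spanned by v and w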

Section 4.9 described why the 2 × 2 determinant, formed from two


2D vectors, is equal to the area P of the parallelogram defined by
these two vectors. The analogous result for two vectors in 3D is

   P = ‖v ∧ w‖.   (8.7)

Recall that P is also defined by measuring a height and side length


of the parallelogram, as illustrated in Sketch 8.5. The height h is

   h = ‖w‖ sin θ,

Sketch 8.5. Area of a parallelogram.

and the side length is ‖v‖, which makes

   P = ‖v‖‖w‖ sin θ.   (8.8)

Equating (8.7) and (8.8) results in

   ‖v ∧ w‖ = ‖v‖‖w‖ sin θ.   (8.9)

Example 8.3

Compute the area of the parallelogram formed by

   v = [2 2 0]T   and   w = [0 0 1]T.

Set up the cross product

   v ∧ w = [2 −2 0]T.

Then the area is

   P = ‖v ∧ w‖ = 2√2.

Since the parallelogram is a rectangle, the area is the product of the edge lengths, so this is the correct result. Verifying (8.9),

   P = 2√2 sin 90° = 2√2.

In order to derive another useful expression in terms of the cross


product, square both sides of (8.9). Thus, we have

   ‖v ∧ w‖² = ‖v‖²‖w‖² sin²θ
            = ‖v‖²‖w‖² (1 − cos²θ)
            = ‖v‖²‖w‖² − ‖v‖²‖w‖² cos²θ
            = ‖v‖²‖w‖² − (v · w)².   (8.10)

The last line is referred to as Lagrange’s identity. We now have an


expression for the area of a parallelogram in terms of a dot product.
To get a better feeling for the behavior of the cross product, let’s
look at some of its properties.

• Parallel vectors result in the zero vector: v ∧ cv = 0.

• Homogeneous: cv ∧ w = c(v ∧ w).

• Antisymmetric: v ∧ w = −(w ∧ v).

• Nonassociative: u ∧ (v ∧ w) ≠ (u ∧ v) ∧ w, in general.

• Distributive: u ∧ (v + w) = u ∧ v + u ∧ w.
• Right-hand rule:

e 1 ∧ e2 = e3 ,
e2 ∧ e3 = e1 ,
e3 ∧ e1 = e2 .

• Orthogonality:

v · (v ∧ w) = 0 : v∧w is orthogonal to v.
w · (v ∧ w) = 0 : v∧w is orthogonal to w.

Example 8.4

Let’s test these properties of the cross product with


⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 2 0
u = ⎣1⎦ , v = ⎣0⎦ , w = ⎣3⎦ .
1 0 0

Make your own sketches and don’t forget the right-hand rule to guess
the resulting vector direction.

• Parallel vectors:
⎡ ⎤
0×0−0×0
v ∧ 3v = ⎣0 × 6 − 0 × 2⎦ = 0.
2×0−6×0

• Homogeneous:
⎡ ⎤ ⎡ ⎤
0×0−3×0 0
4v ∧ w = ⎣0 × 0 − 0 × 8⎦ = ⎣ 0 ⎦ ,
8×3−0×0 24

and ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
0×0−3×0 0 0
4(v ∧ w) = 4 ⎣0 × 0 − 0 × 2⎦ = 4 ⎣0⎦ = ⎣ 0 ⎦ .
2×3−0×0 6 24

• Antisymmetric:
⎡ ⎤ ⎛⎡ ⎤⎞
0 0
v ∧ w = ⎣0⎦ and − (w ∧ v) = − ⎝⎣ 0⎦⎠ .
6 −6

• Nonassociative:
⎡ ⎤ ⎡ ⎤
1×6−0×1 6
u ∧ (v ∧ w) = ⎣1 × 0 − 6 × 1⎦ = ⎣−6⎦ ,
1×0−0×1 0

which is not the same as


⎡ ⎤ ⎡ ⎤ ⎡ ⎤
0 0 6
(u ∧ v) ∧ w = ⎣ 2⎦ ∧ ⎣3⎦ = ⎣0⎦ .
−2 0 0
164 8. 3D Geometry

• Distributive:
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 2 −3
u ∧ (v + w) = ⎣1 ⎦ ∧ ⎣3⎦ = ⎣ 2⎦ ,
1 0 1
which is equal to
⎡⎤ ⎡ ⎤ ⎡ ⎤
0 −3 −3
(u ∧ v) + (u ∧ w) = ⎣ 2⎦ + ⎣ 0⎦ = ⎣ 2⎦ .
−2 3 1

The cross product is an invaluable tool for engineering. One rea-


son: it facilitates the construction of a coordinate independent frame
of reference. We present two applications of this concept. In Sec-
tion 8.6, outward normal vectors to a 3D triangulation, constructed
using cross products, are used for rendering. In Section 20.7, a coor-
dinate independent frame is used to move an object in space along a
specified curve.

8.3 Lines
Specifying a line with 3D geometry differs a bit from 2D. In terms of
points and vectors, two pieces of information define a line; however,
we are restricted to specifying
• two points or
• a point and a vector parallel to the line.
The 2D geometry item (from Section 3.1), which specifies only
• a point and a vector perpendicular to the line,
no longer works. It isn’t specific enough. (See Sketch 8.6.) In other
words, an entire family of lines satisfies this specification; this family
lies in a plane. (More on planes in Section 8.4.) As a consequence,
the concept of a normal to a 3D line does not exist.
Sketch 8.6. Let’s look at the mathematical representations of a 3D line. Clearly,
Point and perpendicular don’t from the discussion above, there cannot be an implicit form.
define a line.
The parametric form of a 3D line does not differ from the 2D line
except for the fact that the given information lives in 3D. A line l(t)
has the form
l(t) = p + tv, (8.11)

where p ∈ E3 and v ∈ R3 . Points are generated on the line as the


parameter t varies.
In 2D, two lines either intersect or they are parallel. In 3D this is
not the case; a third possibility is that the lines are skew. Sketch 8.7
illustrates skew lines using a cube as a reference frame.
Because lines in 3D can be skew, two lines might not intersect.
Revisiting the problem of the intersection of two lines given in para-
metric form from Section 3.8, we can see the algebraic truth in this
statement. Now the two lines are
   l1:  l1(t) = p + tv
   l2:  l2(s) = q + sw

Sketch 8.7. Skew lines.

where p, q ∈ E3 and v, w ∈ R3. To find the intersection point, we

solve for t or s. Repeating (3.15), we have the linear system

t̂v − ŝw = q − p.

However, now there are three equations and still only two unknowns.
Thus, the system is overdetermined. No solution exists when the lines
are skew. But we can find a best approximation, the least squares
solution, and that is the topic of Section 12.7. In many applications
it is important to know the closest point on a line to another line.
This problem is solved in Section 11.2.
We still have the concepts of perpendicular and parallel lines in 3D.

8.4 Planes
While exploring the possibility of a 3D implicit line, we encountered a
plane. We’ll essentially repeat that here, however, with a little change
in notation. Suppose we are given a point p and a vector n bound to
p. The locus of all points x that satisfy the equation

n · (x − p) = 0 (8.12)

defines the implicit form of a plane. This is illustrated in Sketch 8.8.


The vector n is called the normal to the plane if ‖n‖ = 1. If this is the case, then (8.12) is called the point normal plane equation.

Sketch 8.8. Point normal plane equation.

Expanding (8.12), we have

   n1x1 + n2x2 + n3x3 − (n1p1 + n2p2 + n3p3) = 0.

Typically, this is written as

Ax1 + Bx2 + Cx3 + D = 0, (8.13)



where

A = n1
B = n2
C = n3
D = −(n1 p1 + n2 p2 + n3 p3 ).

Example 8.5

Compute the implicit form of the plane through the point

   p = [4 0 0]T

that is perpendicular to the vector

   n = [1 1 1]T.

All we need to compute is D:

D = −(1 × 4 + 1 × 0 + 1 × 0) = −4.

Thus, the plane equation is

x1 + x2 + x3 − 4 = 0.

Sketch 8.9. Origin to plane distance D.

Similar to a 2D implicit line, if the coefficients A, B, C correspond to the normal to the plane, then |D| describes the distance of the plane to the origin. Notice in Sketch 8.8 that this is the perpendicular distance. A 2D cross section of the geometry is illustrated in Sketch 8.9. We equate two definitions of the cosine of an angle,1

   cos(θ) = D / ‖p‖   and   cos(θ) = (n · p) / (‖n‖ ‖p‖),

and remember that the normal is unit length, to find that D = n · p.


1 The equations that follow should refer to p − 0 rather than simply p, but we omit the origin for simplicity.



The point normal form reflects the (perpendicular) distance of a


point from a plane. This situation is illustrated in Sketch 8.10. The
distance d of an arbitrary point x̂ from the point normal form of the
plane is
d = Ax̂1 + B x̂2 + C x̂3 + D.
The reason for this is precisely the same as for the implicit line in
Section 3.3. See Section 11.1 for more on this topic.
Sketch 8.10. Point to plane distance.

Suppose we would like to find the distance of many points to a given plane. Then it is computationally more efficient to have the plane in point normal form. The new coefficients will be A′, B′, C′, D′. We normalize n in (8.12), then we may divide the implicit equation by
this factor,
n · (x − p)
= 0,
n
n·x n·p
− = 0,
n n
resulting in
A B C D
A = , B = , C = , D = .
n n n n

Example 8.6

Let’s continue with the plane from the previous example,


x1 + x2 + x3 − 4 = 0.
Clearly it is not in point normal form because the length of the vector

    ⎡1⎤
n = ⎢1⎥
    ⎣1⎦

is not equal to one. In fact, ‖n‖ = √3.
The new coefficients of the plane equation are

A′ = B′ = C′ = 1/√3,   D′ = −4/√3,

resulting in the point normal plane equation,

(1/√3) x1 + (1/√3) x2 + (1/√3) x3 − 4/√3 = 0.

Determine the distance d of the point


⎡ ⎤
4
q = ⎣4⎦
4

from the plane:

d = (1/√3) × 4 + (1/√3) × 4 + (1/√3) × 4 − 4/√3 = 8/√3 ≈ 4.6.

Notice that d > 0; this is because the point q is on the same side
of the plane as the normal direction. The distance of the origin to
the plane is d = D′ = −4/√3, which is negative because it is on
the opposite side of the plane to which the normal points. This is
analogous to the 2D implicit line.
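As a quick numerical cross-check of this example, here is a minimal sketch in Python with NumPy (the language and library are an assumption; they are not part of this text's toolbox):

import numpy as np

# Plane through p with normal direction n (not yet unit length),
# written as A x1 + B x2 + C x3 + D = 0.
n = np.array([1.0, 1.0, 1.0])
p = np.array([4.0, 0.0, 0.0])
D = -n.dot(p)                          # D = -(n1 p1 + n2 p2 + n3 p3) = -4

# Divide by ||n|| to obtain the point normal form A', B', C', D'.
coeffs = np.append(n, D) / np.linalg.norm(n)

# Signed distance of a point from the plane in point normal form.
q = np.array([4.0, 4.0, 4.0])
d = coeffs[:3].dot(q) + coeffs[3]
print(d)                               # 8/sqrt(3), approximately 4.62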

The implicit plane equation is wonderful for determining if a point


is in a plane; however, it is not so useful for creating points in a plane.
For this we have the parametric form of a plane.
The given information for defining a parametric representation of
a plane usually comes in one of two ways:

• three points, or

• a point and two vectors.

Sketch 8.11.
Parametric plane.

If we start with the first scenario, we choose three points p, q, r, then


choose one of these points and form two vectors v and w bound to
that point as shown in Sketch 8.11:

v = q−p and w = r − p. (8.14)

Why not just specify one point and a vector in the plane, analogous
to the implicit form of a plane? Sketch 8.12 illustrates that this is
not enough information to uniquely define a plane. Many planes fit
that data.
Two vectors bound to a point are the data we’ll use to define a
plane P in parametric form as

P(s, t) = p + sv + tw.                                    (8.15)

Sketch 8.12.
Family of planes through a point and vector.

The two independent parameters, s and t, determine a point P(s, t)


in the plane.2 Notice that (8.15) can be rewritten as

P(s, t) = p + s(q − p) + t(r − p)
        = (1 − s − t)p + sq + tr.                         (8.16)

As described in Section 17.1, (1 − s − t, s, t) are the barycentric co-


ordinates of a point P(s, t) with respect to the triangle with vertices
p, q, and r, respectively.
Another method for specifying a plane, as illustrated in Sketch 8.13,
is as the bisector of two points. This is how a plane is defined
in Euclidean geometry—the locus of points equidistant from two
points. The line between two given points defines the normal to the
plane, and the midpoint of this line segment defines a point in the
plane. With this information it is most natural to express the plane
in implicit form.

Sketch 8.13.
A plane defined as the bisector of two points.
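A small computational sketch of the two plane forms, in Python with NumPy (an assumption of these examples; the three points are placeholders chosen for illustration):

import numpy as np

# Hypothetical input: three noncollinear points defining a plane.
p = np.array([1.0, 0.0, 0.0])
q = np.array([0.0, 1.0, 0.0])
r = np.array([0.0, 0.0, 1.0])

# Two vectors bound to p, as in (8.14).
v = q - p
w = r - p

# Implicit form: normal from the cross product, then n . (x - p) = 0.
n = np.cross(v, w)
n = n / np.linalg.norm(n)              # unit normal for the point normal form
D = -n.dot(p)

# Parametric form (8.15): a point in the plane for parameters s, t.
def plane_point(s, t):
    return p + s * v + t * w

x = plane_point(0.25, 0.25)
print(n.dot(x) + D)                    # 0 up to round-off: x lies in the plane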

8.5 Scalar Triple Product


In Section 8.2 we encountered the area P of the parallelogram formed
by vectors v and w measured as

P = ‖v ∧ w‖.

The next natural question is how do we measure the volume of the
parallelepiped, or skew box, formed by three vectors. See Sketch 8.14.
The volume is a product of a face area and the corresponding height
of the skew box. As illustrated in the sketch, after choosing v and w
to form the face, the height is ‖u‖ cos θ. Thus, the volume V is

V = ‖u‖‖v ∧ w‖ cos θ.

Sketch 8.14.
Scalar triple product for the volume.

Substitute a dot product for cos θ, then

V = u · (v ∧ w). (8.17)

This is called the scalar triple product, and it is a number representing


a signed volume.
The sign reveals something about the orientation of the three vec-
tors. If cos θ > 0, resulting in a positive volume, then u is on the
2 This is a slight deviation in notation: an uppercase boldface letter rather than

a lowercase one denoting a point.



same side of the plane formed by v and w as v ∧ w. If cos θ < 0,


resulting in a negative volume, then u is on the opposite side of the
plane as v ∧ w. If cos θ = 0 resulting in zero volume, then u lies in
this plane—the vectors are coplanar.
From the discussion above and the antisymmetry of the cross prod-
uct, we see that the scalar triple product is invariant under cyclic
permutations. This means that the we get the same volume for the
following:

V = u · (v ∧ w)
= w · (u ∧ v) (8.18)
= v · (w ∧ u).

Example 8.7

Let’s compute the volume for a parallelepiped defined by


⎡ ⎤ ⎡ ⎤ ⎡ ⎤
2 0 3
v = ⎣0⎦ , w = ⎣1⎦ , u = ⎣3⎦ .
0 0 3

First we compute the cross product,


⎡ ⎤
0
y = v ∧ w = ⎣0⎦ ,
2

then the volume V = u · y = 6. Notice that if u3 = −3, then


V = −6, demonstrating that the sign of the scalar triple product
reveals information about the orientation of the given vectors.
We’ll see in Section 9.5 that the parallelepiped given here is simply
a rectangular box with dimensions 2 × 1 × 3 that has been sheared.
Since shears preserve areas and volumes, this also confirms that the
parallelepiped has volume 6.
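The scalar triple product of Example 8.7 is one line of code; a minimal sketch in Python with NumPy (an assumption of these examples):

import numpy as np

# Vectors from Example 8.7.
v = np.array([2.0, 0.0, 0.0])
w = np.array([0.0, 1.0, 0.0])
u = np.array([3.0, 3.0, 3.0])

# Signed volume of the parallelepiped: u . (v x w), see (8.17).
volume = u.dot(np.cross(v, w))
print(volume)                                           # 6.0

# Cyclic permutations (8.18) give the same signed volume.
print(w.dot(np.cross(u, v)), v.dot(np.cross(w, u)))     # 6.0 6.0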

In Section 4.9 we introduced the 2 × 2 determinant as a tool to


calculate area. The scalar triple product is really just a fancy name
for a 3 × 3 determinant, but we’ll get to that in Section 9.8.

Figure 8.2.
Hedgehog plot: the normal of each facet is drawn at the centroid.

8.6 Application: Lighting and Shading


Let’s look at an application of a handful of the tools that we have
developed so far: lighting and shading for computer graphics. One
of the most basic elements needed to calculate the lighting of a 3D
object (model) is the normal, which was introduced in Section 8.4.
Although a lighted model might look smooth, it is represented simply
with vertices and planar facets, most often triangles. The normal of
each planar facet is used in conjunction with the light source location
and our eye location to calculate the lighting (color) of each vertex,
and one such method is called the Phong illumination model; details of
the method may be found in graphics texts such as [10, 14]. Figure 8.2
illustrates normals drawn emanating from the centroid of the facets.
Determining the color of a facet is called shading. A nice example is
illustrated in Figure 8.3.
We calculate the normal by using the cross product from Section 8.2.
Suppose we have a triangle defined by points p, q, r. Then we form
two vectors, v and w as defined in (8.14) from these points. The
normal n is

n = (v ∧ w) / ‖v ∧ w‖.

The normal is by convention considered to be of unit length. Why

Figure 8.3.
Flat shading: the normal to each planar facet is used to calculate the illumination of
each facet.

use v ∧ w instead of w ∧ v? Triangle vertices imply an orientation.


From this we follow the right-hand rule to determine the normal di-
rection. It is important to have a rule, as just described, so the facets
for a model are consistently defined. In turn, the lighting will be
consistently calculated at all points on the model.
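The facet normal computation just described is a few lines of code; here is a minimal sketch in Python with NumPy (an assumption; the triangle vertices are placeholders):

import numpy as np

# Hypothetical triangle vertices p, q, r; their ordering fixes the
# normal direction via the right-hand rule.
p = np.array([0.0, 0.0, 0.0])
q = np.array([1.0, 0.0, 0.0])
r = np.array([0.0, 1.0, 0.0])

# Facet normal: cross product of the edge vectors (8.14), then normalize.
v = q - p
w = r - p
n = np.cross(v, w)
n = n / np.linalg.norm(n)
print(n)                   # [0. 0. 1.] for this counterclockwise ordering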
Figure 8.3 illustrates flat shading, which is the fastest and most
rough-looking shading method. Each facet is given one color, so a
lighting calculation is done at one point, say the centroid, based on
the facet’s normal. Figure 8.4 illustrates Gouraud or smooth shading,
which produces smoother-looking models but involves more calcu-
lation than flat shading. At each vertex of a triangle, the lighting
is calculated. Each of these lighting vectors, ip , iq , ir , each vector
indicating red, green, and blue components of light, are then interpo-
lated across the triangle. For instance, a point x in the triangle with
barycentric coordinates (u, v, w),

x = up + vq + wr,

will be assigned color

ix = uip + viq + wir ,

which is a handy application of barycentric coordinates!
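The interpolation itself is just a barycentric combination of the three vertex colors; a minimal sketch in Python with NumPy (an assumption; the colors and weights are placeholders):

import numpy as np

# Vertex colors (RGB components in [0, 1]) at p, q, r.
i_p = np.array([1.0, 0.0, 0.0])
i_q = np.array([0.0, 1.0, 0.0])
i_r = np.array([0.0, 0.0, 1.0])

# Barycentric coordinates (u, v, w) of a point x in the triangle.
u, v, w = 1/3, 1/3, 1/3            # the centroid, for instance

# Gouraud shading interpolates the vertex colors with the same weights.
i_x = u * i_p + v * i_q + w * i_r
print(i_x)                         # [0.333... 0.333... 0.333...]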



Figure 8.4.
Smooth shading: a normal at each vertex is used to calculate the illumination over
each facet. Left: zoomed-in and displayed with triangles. Right: the smooth shaded
bugle.

Still unanswered in the smooth shading method: what normals do


we assign the vertices in order to achieve a smooth shaded model? If
we used the same normal at each vertex, for small enough triangles,
we would wind up with (expensive) flat shading. The answer: a
vertex normal is calculated as the average of the triangle normals for
the triangles in the star of the vertex. (See Section 17.4 for a review
of triangulations.) Better normals can be generated by weighting the
contribution of each triangle normal based on the area of the triangle.
The direction of the normal n relative to our eye’s position can be
used to eliminate facets from the rendering pipeline. For a closed
surface, such as that in Figure 8.3, the “back-facing” facets will
be obscured by the “front-facing” facets. If the centroid of a triangle
is c and the eye’s position is e, then form the vector

v = (e − c)/‖e − c‖.

If
n·v <0

then the triangle is back-facing, and we need not render it. This
process is called culling. A great savings in rendering time can be
achieved with culling.
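The culling test is a single dot product per facet; a minimal sketch in Python with NumPy (an assumption; the facet and eye data are placeholders):

import numpy as np

def is_back_facing(n, c, e):
    """n: unit facet normal, c: facet centroid, e: eye position."""
    v = e - c
    v = v / np.linalg.norm(v)
    return n.dot(v) < 0            # back-facing facets need not be rendered

# Hypothetical facet and eye positions.
n = np.array([0.0, 0.0, 1.0])
c = np.array([0.0, 0.0, 0.0])
print(is_back_facing(n, c, np.array([0.0, 0.0,  5.0])))   # False: front-facing
print(is_back_facing(n, c, np.array([0.0, 0.0, -5.0])))   # True: back-facing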
Planar facet normals play an important role in computer graphics,
as demonstrated in this section. For more advanced applications,
consult a graphics text such as [14].

• 3D vector
• 3D point
• vector length
• unit vector
• dot product
• cross product
• right-hand rule
• orthonormal
• area
• Lagrange’s identity
• 3D line
• implicit form of a plane
• parametric form of a plane
• normal
• point normal plane equation
• point-plane distance
• plane-origin distance
• barycentric coordinates
• scalar triple product
• volume
• cyclic permutations of vectors
• triangle normal
• back-facing triangle
• lighting model
• flat and Gouraud shading
• vertex normal
• culling

8.7 Exercises
For the following exercises, use the following points and vectors:
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
0 1 4 1 1 0
p = ⎣0⎦ , q = ⎣1⎦ , r = ⎣2⎦ , v = ⎣0⎦ , w = ⎣1⎦ , u = ⎣0⎦ .
1 1 4 0 1 1

1. Normalize the vector r. What is the length of the vector 2r?


2. Find the angle between the vectors v and w.
3. Compute v ∧ w.
4. Compute the area of the parallelogram formed by vectors v and w.
5. What is the sine of the angle between v and w?
6. Find three vectors so that their cross product is associative.
7. Are the lines
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
0 0 1 −1
l1 (t) = ⎣0⎦ + t ⎣0⎦ and l2 (s) = ⎣1⎦ + s ⎣−1⎦
1 1 1 1

skew?
8. Form the point normal plane equation for a plane through point p and
with normal direction r.

9. Form the point normal plane equation for the plane defined by points
p, q, and r.
10. Form a parametric plane equation for the plane defined by points p, q,
and r.
11. Form an equation of the plane that bisects the points p and q.
12. Given the line l defined by point q and vector v, what is the length of
the projection of vector w bound to q onto l?
13. Given the line l defined by point q and vector v, what is the (perpen-
dicular) distance of the point q + w (where w is a vector) to the line
l?
14. What is w ∧ 6w?
15. For the plane in Exercise 8, what is the distance of this plane to the
origin?
16. For the plane in Exercise 8, what is the distance of the point q to this
plane?
17. Find the volume of the parallelepiped defined by vectors v, w, and u.
18. Decompose w into
w = u1 + u2 ,
where u1 and u2 are perpendicular. Additionally, find u3 to complete
an orthogonal frame. Hint: Orthogonal projections are the topic of
Section 2.8.
19. Given the triangle formed by points p, q, r, and colors
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 0 1
ip = ⎣0⎦ , iq = ⎣1⎦ , ir = ⎣0⎦ ,
0 0 0

what color ic would be assigned to the centroid of this triangle using


Gouraud shading? Note: the color vectors are scaled between [0, 1].
The colors black, white, red, green, blue are represented by
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
0 1 1 0 0
⎣0⎦ , ⎣1⎦ , ⎣0⎦ , ⎣1⎦ , ⎣0⎦ ,
0 1 0 0 1

respectively.
9
Linear Maps in 3D

Figure 9.1.
Flight simulator: 3D linear maps are necessary to create the twists and turns in a flight
simulator. (Image is from the NASA website http://www.nasa.gov.)

The flight simulator is an important part in the training of airplane


pilots. It has a real cockpit, but what you see outside the windows is


computer imagery. As you take a right turn, the terrain below changes
accordingly; as you dive downwards, it comes closer to you. When
you change the (simulated) position of your plane, the simulation
software must recompute a new view of the terrain, clouds, or other
aircraft. This is done through the application of 3D affine and linear
maps.1 Figure 9.1 shows an image that was generated by an actual
flight simulator. For each frame of the simulated scene, complex 3D
computations are necessary, most of them consisting of the types of
maps discussed in this section.

9.1 Matrices and Linear Maps

Sketch 9.1.
A vector in the [e1 , e2 , e3 ]-coordinate system.
The general concept of a linear map in 3D is the same as that for
a 2D map. Let v be a vector in the standard [e1 , e2 , e3 ]-coordinate
system, i.e.,
v = v1 e1 + v2 e2 + v3 e3 .
(See Sketch 9.1 for an illustration.)
Let another coordinate system, the [a1 , a2 , a3 ]-coordinate system,
be given by the origin o and three vectors a1 , a2 , a3 . What vec-
tor v in the [a1 , a2 , a3 ]-system corresponds to v in the [e1 , e2 , e3 ]-
system? Simply the vector with the same coordinates relative to the
[a1 , a2 , a3 ]-system. Thus,

v = v1 a1 + v2 a2 + v3 a3 . (9.1)

This is illustrated by Sketch 9.2 and the following example.

Example 9.1

Let

    ⎡1⎤        ⎡2⎤        ⎡0⎤        ⎡ 0 ⎤
v = ⎢1⎥ , a1 = ⎢0⎥ , a2 = ⎢1⎥ , a3 = ⎢ 0 ⎥ .
    ⎣2⎦        ⎣1⎦        ⎣0⎦        ⎣1/2⎦

Then

         ⎡2⎤       ⎡0⎤       ⎡ 0 ⎤   ⎡2⎤
v′ = 1 · ⎢0⎥ + 1 · ⎢1⎥ + 2 · ⎢ 0 ⎥ = ⎢1⎥ .
         ⎣1⎦       ⎣0⎦       ⎣1/2⎦   ⎣2⎦

Sketch 9.2.
The matrix A maps v in the [e1 , e2 , e3 ]-coordinate system to
the vector v′ in the [a1 , a2 , a3 ]-coordinate system.

1 Actually, perspective maps are also needed here. They will be discussed in

Section 10.5.

You should recall that we had the same configuration earlier for
the 2D case—(9.1) corresponds directly to (4.2) of Section 4.1. In
Section 4.2, we then introduced the matrix form. That is now an
easy project for this chapter—nothing changes except the matrices
will be 3 × 3 instead of 2 × 2. In 3D, a matrix equation looks like this:

v′ = Av,                                                  (9.2)

i.e., just the same as for the 2D case. Written out in detail, there is
a difference:

⎡v′1⎤   ⎡a1,1 a1,2 a1,3⎤ ⎡v1⎤
⎢v′2⎥ = ⎢a2,1 a2,2 a2,3⎥ ⎢v2⎥ .                           (9.3)
⎣v′3⎦   ⎣a3,1 a3,2 a3,3⎦ ⎣v3⎦
All matrix properties from Chapter 4 carry over almost verbatim.

Example 9.2

Returning to our example, it is quite easy to condense it into a matrix


equation: ⎡ ⎤⎡ ⎤ ⎡ ⎤
2 0 0 1 2
⎣0 1 0 ⎦ ⎣1⎦ = ⎣1⎦ .
1 0 1/2 2 2

Again, if we multiply a matrix A by a vector v, the ith component


of the result vector is obtained as the dot product of the ith row of
A and v.
The matrix A represents a linear map. Given the vector v in the
[e1 , e2 , e3 ]-system, there is a vector v′ in the [a1 , a2 , a3 ]-system such
that v′ has the same components in the [a1 , a2 , a3 ]-system as did v
in the [e1 , e2 , e3 ]-system. The matrix A finds the components of v′
relative to the [e1 , e2 , e3 ]-system.
With the 2 × 2 matrices of Section 4.2, we introduced the transpose
AT of a matrix A. We will need this for 3 × 3 matrices, and it is
obtained by interchanging rows and columns, i.e.,
⎡ ⎤T ⎡ ⎤
2 3 −4 2 3 −1
⎣ 3 9 −4⎦ = ⎣ 3 9 −9⎦ .
−1 −9 4 −4 −4 4
The boldface row of A has become the boldface column of AT . As a
concise formula,
aTi,j = aj,i .
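Example 9.2 and the transpose are easy to reproduce numerically; a minimal sketch in Python with NumPy (an assumption of these examples):

import numpy as np

# The matrix of Example 9.2: its columns are a1, a2, a3.
A = np.array([[2.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 0.5]])
v = np.array([1.0, 1.0, 2.0])

v_prime = A @ v                    # v' = A v, as in (9.2)
print(v_prime)                     # [2. 1. 2.]

# The transpose interchanges rows and columns: (A^T)_{i,j} = a_{j,i}.
print(A.T)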

9.2 Linear Spaces


The set of all 3D vectors is referred to as a 3D linear space or vector
space, and it is denoted as R3 . We associate with R3 the operation
of forming linear combinations. This means that if v and w are two
vectors in this linear space, then any vector

u = sv + tw (9.4)

is also in this space. The vector u is then said to be a linear combi-


nation of v and w. This is also called the linearity property. Notice
that the linear combination (9.4) combines scalar multiplication and
vector addition. These are the key operations necessary for a linear
space.
Select two vectors v and w and consider all vectors u that may be
expressed as (9.4) with arbitrary scalars s, t. Clearly, all vectors u
form a subset of all 3D vectors. But beyond that, they form a linear
space themselves—a 2D space. For if two vectors u1 and u2 are in
this space, then they can be written as

u 1 = s 1 v + t1 w and u2 = s2 v + t2 w,

and thus any linear combination of them can be written as

αu1 + βu2 = (αs1 + βs2 )v + (αt1 + βt2 )w,

which is again in the same space. We call the set of all vectors of
the form (9.4) a subspace of the linear space of all 3D vectors. The
term subspace is justified since not all 3D vectors are in it. Take for
instance the vector n = v ∧ w, which is perpendicular to both v and
w. There is no way to write this vector as a linear combination of v
and w!
We say our subspace has dimension 2 since it is generated, or
spanned, by two vectors. These vectors have to be noncollinear; oth-
erwise, they just define a line, or a 1D (1-dimensional) subspace. (In
Section 2.8, we needed the concept of a subspace in order to find the
orthogonal projection of w onto v. Thus the projection lived in the
one-dimensional subspace formed by v.)
If two vectors are collinear, then they are also called linearly de-
pendent. If v and w are linearly dependent, then v = sw. Con-
versely, if they are not collinear, they are called linearly independent.
If v1 , v2 , v3 are linearly independent, then we will not have a solution
set s1 , s2 for
v3 = s1 v1 + s2 v2 ,

and the only way to express the zero vector,

0 = s1 v1 + s2 v2 + s3 v3

is if s1 = s2 = s3 = 0. Three linearly independent vectors in R3 span


the entire space and the vectors are said to form a basis for R3 .
Given two linearly independent vectors v and w, how do we decide
if another vector u is in the subspace spanned by v and w? Simple:
check the volume of the parallelepiped formed by the three vectors,
which is equivalent to calculating the scalar triple product (8.17) and
checking if it is zero (within a round-off tolerance). In Section 9.8,
we’ll introduce the 3 × 3 determinant, which is a matrix-oriented
calculation of volume.
We’ll revisit this topic in a more abstract setting for n-dimensional
vectors in Chapter 14.
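The subspace test just described (a zero scalar triple product, up to round-off) is easy to code; a minimal sketch in Python with NumPy (an assumption; the function name and tolerance are mine):

import numpy as np

def in_span(u, v, w, tol=1e-10):
    """Is u in the 2D subspace spanned by linearly independent v and w?
    Equivalent to a (near) zero scalar triple product."""
    return abs(u.dot(np.cross(v, w))) < tol

v = np.array([1.0, 0.0, 0.0])
w = np.array([1.0, 1.0, 0.0])
print(in_span(np.array([3.0, -2.0, 0.0]), v, w))   # True:  u = 5v - 2w
print(in_span(np.array([0.0,  0.0, 1.0]), v, w))   # False: u points out of the plane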

9.3 Scalings
A scaling is a linear map that enlarges or reduces vectors:
⎡ ⎤
s1,1 0 0
v = ⎣ 0 s2,2 0 ⎦ v. (9.5)
0 0 s3,3

If all scale factors si,i are larger than one, then all vectors are enlarged,
as is done in Figure 9.2. If all si,i are positive yet less than one, all
vectors are shrunk.

Example 9.3

In this example,
⎡ ⎤ ⎡ ⎤
s1,1 0 0 1/3 0 0
⎣ 0 s2,2 0 ⎦ = ⎣ 0 1 0⎦ ,
0 0 s3,3 0 0 3

we shrink in the e1 -direction, leave the e2 -direction unchanged, and


stretch the e3 -direction. See Figure 9.3.

Figure 9.2.
Scalings in 3D: the large torus is scaled by 1/3 in each coordinate to form the small
torus.

Figure 9.3.
Nonuniform scalings in 3D: the “standard" torus is scaled by 1/3, 1, 3 in the e1 -, e2 -,
e3 -directions,respectively.

Negative numbers for the si,i will cause a flip in addition to a scale.
So, for instance ⎡ ⎤
−2 0 0
⎣ 0 1 0⎦
0 0 −1

will stretch and reverse in the e1 -direction, leave the e2 -direction


unchanged, and will reverse in the e3 -direction.
How do scalings affect volumes? If we map the unit cube, given by
the three vectors e1 , e2 , e3 with a scaling, we get a rectangular box.
Its side lengths are s1,1 in the e1 -direction, s2,2 in the e2 -direction,
and s3,3 in the e3 -direction. Hence, its volume is given by s1,1 s2,2 s3,3 .
A scaling thus changes the volume of an object by a factor that equals
the product of the diagonal elements of the scaling matrix.2
For 2 × 2 matrices in Chapter 4, we developed a geometric under-
standing of the map through illustrations of the action ellipse. For
example, a nonuniform scaling is illustrated in Figure 4.3. In 3D, the
same idea works as well. Now we can examine what happens to 3D
unit vectors forming a sphere. They are mapped to an ellipsoid—the
action ellipsoid! The action ellipsoid corresponding to Figure 9.2 is
simply a sphere that is smaller than the unit sphere. The action ellip-
soid corresponding to Figure 9.3 has its major axis in the e3 -direction
and its minor axis in the e1 -direction. In Chapter 16, we will relate
the shape of the ellipsoid to the linear map.

9.4 Reflections
If we reflect a vector about the e2 , e3 -plane, then its first component
should change in sign:
⎡ ⎤ ⎡ ⎤
v1 −v1
⎣v2 ⎦ −→ ⎣ v2 ⎦ ,
v3 v3

as shown in Sketch 9.3.


This reflection is achieved by a scaling matrix:

⎡−v1⎤   ⎡−1 0 0⎤ ⎡v1⎤
⎢ v2 ⎥ = ⎢ 0 1 0⎥ ⎢v2⎥ .
⎣ v3 ⎦   ⎣ 0 0 1⎦ ⎣v3⎦

Sketch 9.3.
Reflection of a vector about the e2 , e3 -plane.
2 We have shown this only for the unit cube, but it is true for any other object

as well.

The following is also a reflection, as Sketch 9.4 shows:


⎡ ⎤ ⎡ ⎤
v1 v3
⎣v2 ⎦ −→ ⎣v2 ⎦ .
v3 v1

It interchanges the first and third component of a vector, and is thus a


reflection about the plane x1 = x3 . (This is an implicit plane equation,
as discussed in Section 8.4.)
This map is achieved by the following matrix equation:
⎡ ⎤ ⎡ ⎤⎡ ⎤
v3 0 0 1 v1
⎣v2 ⎦ = ⎣0 1 0⎦ ⎣v2 ⎦ .
v1 1 0 0 v3

Sketch 9.4.
Reflection of a vector about the x1 = x3 plane.

In Section 11.5, we develop a more general reflection matrix, called
the Householder matrix. Instead of reflecting about a coordinate
plane, with this matrix, we can reflect about a given (unit) normal.
This matrix is central to the Householder method for solving a linear
system in Section 13.1.
By their very nature, reflections do not change volumes—but they
do change their signs. See Section 9.8 for more details.

9.5 Shears
What map takes a cube to the parallelepiped (skew box) of Sketch 9.5?
The answer: a shear. Shears in 3D are more complicated than the 2D
shears from Section 4.7 because there are so many more directions to
shear. Let’s look at some of the shears more commonly used.
Consider the shear that maps e1 and e2 to themselves, and that
also maps e3 to ⎡ ⎤
a
a3 = ⎣ b ⎦ .
1
The shear matrix S1 that accomplishes the desired task is easily found:
     ⎡1 0 a⎤
S1 = ⎢0 1 b⎥ .
     ⎣0 0 1⎦

Sketch 9.5.
A 3D shear parallel to the e1 , e2 -plane.

It is illustrated in Sketch 9.5 with a = 1 and b = 1, and in Figure 9.4.
Thus this map shears parallel to the [e1 , e2 ]-plane. Suppose we apply

Figure 9.4.
Shears in 3D: a paraboloid is sheared in the e1 - and e2 -directions. The e3 -direction
runs through the center of the left paraboloid.

this shear to a vector v resulting in

             ⎡v1 + av3⎤
v′ = S1 v =  ⎢v2 + bv3⎥ .
             ⎣   v3   ⎦

An ai,j element is a factor by which the jth component of v affects
the ith component of v′.
What shear maps e2 and e3 to themselves, and also maps
⎡ ⎤ ⎡ ⎤
a a
⎣ b ⎦ to ⎣0⎦?
c 0
This shear is given by the matrix

     ⎡  1   0 0⎤
S2 = ⎢−b/a  1 0⎥ .                                        (9.6)
     ⎣−c/a  0 1⎦

One quick check gives:

⎡  1   0 0⎤ ⎡a⎤   ⎡a⎤
⎢−b/a  1 0⎥ ⎢b⎥ = ⎢0⎥ ;
⎣−c/a  0 1⎦ ⎣c⎦   ⎣0⎦
thus our map does what it was meant to do: it shears parallel to the
[e2 , e3 ]-plane. (This is the shear of the Gauss elimination step that
we will encounter in Section 12.2.)
Although it is possible to shear in any direction, it is more com-
mon to shear parallel to a coordinate axis or coordinate plane. Try
constructing a matrix for a shear parallel to the [e1 , e3 ]-plane.

The shear matrix ⎡ ⎤


1 a b
⎣0 1 0⎦
0 0 1
shears parallel to the e1 -axis. Matrices for the other axes follow sim-
ilarly.
How does a shear affect volume? For a geometric feeling, notice
the simple shear S1 from above. It maps the unit cube to a skew box
with the same base and the same height—thus it does not change
volume! All shears are volume preserving. After reading Section 9.8,
revisit these shear matrices and check the volumes for yourself.
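A quick numerical confirmation that shears preserve volume (a minimal sketch in Python with NumPy, an assumption; the values a, b, c are placeholders):

import numpy as np

# The shear S1 with a = 1, b = 1 (parallel to the [e1, e2]-plane).
S1 = np.array([[1.0, 0.0, 1.0],
               [0.0, 1.0, 1.0],
               [0.0, 0.0, 1.0]])

# The shear S2 of (9.6) for a hypothetical vector (a, b, c) = (2, 4, 6).
a, b, c = 2.0, 4.0, 6.0
S2 = np.array([[1.0,    0.0, 0.0],
               [-b / a, 1.0, 0.0],
               [-c / a, 0.0, 1.0]])

# Shears preserve volume: both determinants are 1 (up to round-off).
print(np.linalg.det(S1), np.linalg.det(S2))      # 1.0 1.0
print(S2 @ np.array([a, b, c]))                  # [2. 0. 0.]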

9.6 Rotations
Suppose you want to rotate a vector v around the e3 -axis by 90◦ to
a vector v . Sketch 9.6 illustrates such a rotation:
⎡ ⎤ ⎡ ⎤
2 0
v = ⎣0⎦ → v = ⎣2⎦ .
1 1
A rotation around e3 by different angles would result in different vec-
tors, but they all will have one thing in common: their third compo-
nents will not be changed by the rotation. Thus, if we rotate a vector
around e3 , the rotation action will change only its first and second
components. This suggests another look at the 2D rotation matrices
from Section 4.6. Our desired rotation matrix R3 looks much like the
one from (4.16):

     ⎡cos α  −sin α  0⎤
R3 = ⎢sin α   cos α  0⎥ .                                 (9.7)
     ⎣  0       0    1⎦

Sketch 9.6.
Rotation example.
Figure 9.5 illustrates the letter L rotated through several angles about
the e3 -axis.

Example 9.4

Let us verify that R3 performs as promised with α = 90◦ :


⎡ ⎤⎡ ⎤ ⎡ ⎤
0 −1 0 2 0
⎣1 0 0⎦ ⎣0⎦ = ⎣2⎦ ,
0 0 1 1 1
so it works!

Figure 9.5.
Rotations in 3D: the letter L rotated about the e3 -axis.

Similarly, we may rotate around the e2 -axis; the corresponding


matrix is ⎡ ⎤
cos α 0 sin α
R2 = ⎣ 0 1 0 ⎦. (9.8)
− sin α 0 cos α
Notice the pattern here. The rotation matrix for a rotation about the
ei -axis is characterized by the ith row being eT i and the ith column
being ei . For completeness, the last rotation matrix about the e1 -axis:
⎡ ⎤
1 0 0
R1 = ⎣ 0 cos α − sin α ⎦ . (9.9)
0 sin α cos α
The direction of rotation by a positive angle follows the right-hand
rule: curl your fingers with the rotation, and your thumb points in
the direction of the rotation axis.
If you examine the column vectors of a rotation matrix, you will see
that each one is a unit length vector and they are orthogonal to each
other. Thus, the column vectors form an orthonormal set of vectors,
and a rotation matrix is an orthogonal matrix. (These properties hold
for the row vectors of the matrix too.) As a result, we have that

RT R = I
RT = R−1

Additionally, if R rotates by θ, then R−1 rotates by −θ.



Figure 9.6.
Rotations in 3D: the letter L is rotated about axes that are not the coordinate axes. On
the right the point on the L that touches the rotation axes does not move.

How about a rotation by α degrees around an arbitrary vector
a? The principle is illustrated in Sketch 9.7. The derivation of the
following matrix is more tedious than called for here, so we just give
the result:

    ⎡a1² + C(1 − a1²)      a1 a2 (1 − C) − a3 S   a1 a3 (1 − C) + a2 S⎤
R = ⎢a1 a2 (1 − C) + a3 S  a2² + C(1 − a2²)       a2 a3 (1 − C) − a1 S⎥ ,   (9.10)
    ⎣a1 a3 (1 − C) − a2 S  a2 a3 (1 − C) + a1 S   a3² + C(1 − a3²)    ⎦

where we have set C = cos α and S = sin α. It is necessary that ‖a‖ = 1
in order for the rotation to take place without scaling. Figure 9.6
illustrates two examples of rotations about an arbitrary axis.

Sketch 9.7.
Rotation about an arbitrary vector.

Example 9.5

With a complicated result such as (9.10), a sanity check is not a bad


idea. So let α = 90◦ ,
⎡ ⎤ ⎡ ⎤
0 1
a = ⎣0⎦ and v = ⎣0⎦ .
1 0
This means that we want to rotate v around a, or the e3 -axis, by 90◦

as shown in Sketch 9.8. In advance, we know what R should be. In


(9.10), C = 0 and S = 1, and we calculate
⎡ ⎤
0 −1 0
R = ⎣1 0 0⎦ ,
0 0 1
which is the expected matrix. We obtain

     ⎡0⎤
v′ = ⎢1⎥ .
     ⎣0⎦

Sketch 9.8.
A simple example of a rotation about a vector.
With some confidence that (9.10) works, let’s try a more compli-
cated example.

Example 9.6

Let α = 90◦,

    ⎡1/√3⎤           ⎡1⎤
a = ⎢1/√3⎥  and  v = ⎢0⎥ .
    ⎣1/√3⎦           ⎣0⎦

With C = 0 and S = 1 in (9.10), we calculate

    ⎡1/3          1/3 − 1/√3   1/3 + 1/√3⎤
R = ⎢1/3 + 1/√3   1/3          1/3 − 1/√3⎥ .
    ⎣1/3 − 1/√3   1/3 + 1/√3   1/3       ⎦

We obtain

     ⎡1/3       ⎤
v′ = ⎢1/3 + 1/√3⎥ .
     ⎣1/3 − 1/√3⎦

Convince yourself that ‖v′‖ = ‖v‖.
Continue this example with the vector

    ⎡1⎤
v = ⎢1⎥ .
    ⎣1⎦

Surprised by the result?
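Equation (9.10) is also easy to evaluate by machine; here is a minimal sketch in Python with NumPy reproducing Example 9.6 (the language and the function name rotation_about_axis are my assumptions):

import numpy as np

def rotation_about_axis(a, alpha):
    """Rotation matrix (9.10) about the unit vector a by angle alpha (radians)."""
    C, S = np.cos(alpha), np.sin(alpha)
    a1, a2, a3 = a
    return np.array([
        [a1*a1 + C*(1 - a1*a1), a1*a2*(1 - C) - a3*S,  a1*a3*(1 - C) + a2*S],
        [a1*a2*(1 - C) + a3*S,  a2*a2 + C*(1 - a2*a2), a2*a3*(1 - C) - a1*S],
        [a1*a3*(1 - C) - a2*S,  a2*a3*(1 - C) + a1*S,  a3*a3 + C*(1 - a3*a3)]])

# Example 9.6: rotate v = e1 by 90 degrees about the diagonal direction.
a = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)
R = rotation_about_axis(a, np.pi / 2)
v = np.array([1.0, 0.0, 0.0])
v_prime = R @ v
print(v_prime)                        # [1/3, 1/3 + 1/sqrt(3), 1/3 - 1/sqrt(3)]
print(np.linalg.norm(v_prime), np.linalg.norm(v))   # both 1: length is preserved

# The vector (1, 1, 1) lies on the rotation axis, so it is left unchanged.
print(R @ np.array([1.0, 1.0, 1.0]))  # [1. 1. 1.]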

It should be intuitively clear that rotations do not change volumes.


Recall from 2D that rotations are rigid body motions.

9.7 Projections
Projections that are linear maps are parallel projections. There are
two categories. If the projection direction is perpendicular to the
projection plane then it is an orthogonal projection, otherwise it is
an oblique projection. Two examples are illustrated in Figure 9.7, in
which one of the key properties of projections is apparent: flattening.
The orthogonal and oblique projection matrices that produced this
figure are

⎡1 0 0⎤     ⎡1 0 1/√2⎤
⎢0 1 0⎥ and ⎢0 1 1/√2⎥ ,
⎣0 0 0⎦     ⎣0 0  0  ⎦
respectively.
Projections are essential in computer graphics to view 3D geometry
on a 2D screen. A parallel projection is a linear map, as opposed to a
perspective projection, which is not. A parallel projection preserves
relative dimensions of an object, thus it is used in drafting to produce
accurate views of a design.
Recall from 2D, Section 4.8, that a projection reduces dimension-
ality and it is an idempotent map. It flattens geometry because a
projection matrix P is rank deficient; in 3D this means that a vector

Figure 9.7.
Projections in 3D: on the left is an orthogonal projection, and on the right is an oblique
projection of 45◦ .

is projected into a subspace, which can be a (2D) plane or (1D) line.


The idempotent property leaves a vector in the subspace of the map
unchanged by the map, P v = P 2 v. Let’s see how to construct an
orthogonal projection in 3D.
First we choose the subspace U into which we would like to project.
If we want to project onto a line (1D subspace), specify a unit vector
u1 . If we want to project into a plane (2D subspace), specify two
orthonormal vectors u1 , u2 . Now form a matrix Ak from the vectors
defining the k-dimensional subspace U :

A1 = [u1]   or   A2 = [u1  u2] .

The projection matrix Pk is then defined as

Pk = Ak Ak^T .

It follows that P1 is very similar to the projection matrix from Sec-
tion 4.8 except the projection line is in 3D,

P1 = A1 A1^T = [u1,1 u1   u2,1 u1   u3,1 u1] .

Projection into a plane takes the form,

                        ⎡u1^T⎤
P2 = A2 A2^T = [u1  u2] ⎣u2^T⎦ .                          (9.11)

Expanding this, we see the columns of P2 are linear combinations of
u1 and u2 ,

P2 = [u1,1 u1 + u1,2 u2   u2,1 u1 + u2,2 u2   u3,1 u1 + u3,2 u2] .

The action of P1 and P2 is thus

P1 v = (u · v)u, (9.12)
P2 v = (u1 · v)u1 + (u2 · v)u2 . (9.13)

An application of these projections is demonstrated in the Gram-


Schmidt orthonormal coordinate frame construction in Section 11.8.

Example 9.7

Let’s construct an orthogonal projection P2 into the [e1 , e2 ]-plane.


Although this example is easy enough to write down the matrix

directly, let’s construct it with (9.11),

               ⎡e1^T⎤   ⎡1 0 0⎤
P2 = [e1  e2]  ⎣e2^T⎦ = ⎢0 1 0⎥ .
                        ⎣0 0 0⎦

The action achieved by this linear map is

⎡v1⎤   ⎡1 0 0⎤ ⎡v1⎤
⎢v2⎥ = ⎢0 1 0⎥ ⎢v2⎥ .
⎣0 ⎦   ⎣0 0 0⎦ ⎣v3⎦

See Sketch 9.9 and Figure 9.7 (left).

The example above is very simple, and we can immediately see that
the projection direction is d = [0 0 ±1]^T . This vector satisfies the
equation

P2 d = 0,

and we see that the projection direction is in the kernel of the map.
The idempotent property for P2 is easily understood by noticing
that A2^T A2 is simply the 2 × 2 identity matrix,

P2² = A2 A2^T A2 A2^T

               ⎡u1^T⎤            ⎡u1^T⎤
   = [u1  u2]  ⎣u2^T⎦  [u1  u2]  ⎣u2^T⎦

                 ⎡u1^T⎤
   = [u1  u2] I  ⎣u2^T⎦

   = P2 .

Sketch 9.9.
Projection example.

In addition to being idempotent, orthogonal projection matrices are


symmetric. The action of the map is P v and this vector is orthogonal
to v − P v, thus

0 = (P v)T (v − P v)
= vT (P T − P T P )v,

from which we conclude that P = P T .
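These properties are easy to check numerically; a minimal sketch in Python with NumPy (an assumption; the orthonormal vectors below are placeholders):

import numpy as np

# Two orthonormal vectors spanning the projection plane.
u1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
u2 = np.array([0.0, 0.0, 1.0])

A2 = np.column_stack([u1, u2])     # 3 x 2 matrix of the subspace
P2 = A2 @ A2.T                     # projection matrix (9.11)

v = np.array([1.0, 2.0, 3.0])
print(P2 @ v)                      # (u1 . v) u1 + (u2 . v) u2, see (9.13)
print(np.allclose(P2 @ P2, P2))    # True: idempotent
print(np.allclose(P2, P2.T))       # True: symmetric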



We will examine oblique projections in the context of affine maps


in Section 10.4. Finally, we note that a projection has a significant
effect on the volume of an object. Since everything is flat after a
projection, it has zero 3D volume.

9.8 Volumes and Linear Maps: Determinants


Most linear maps change volumes; some don’t. Since this is an impor-
tant aspect of the action of a map, this section will discuss the effect
of a linear map on volume. The unit cube in the [e1 , e2 , e3 ]-system
has volume one. A linear map A will change that volume to that of
the skew box spanned by the images of e1 , e2 , e3 , i.e., by the volume
spanned by the vectors a1 , a2 , a3 —the column vectors of A. What is
the volume spanned by a1 , a2 , a3 ?
First, let’s look at what we have done so far with areas and volumes.
Recall the 2 × 2 determinant from Section 4.9. Through Sketch 4.8,
the area of a 2D parallelogram was shown to be equivalent to a de-
terminant. In fact, in Section 8.2 it was shown that the cross product
can be used to calculate this area for a parallelogram embedded in
3D. With a very geometric approach, the scalar triple product of Sec-
tion 8.5 gives us the means to calculate the volume of a parallelepiped
by simply using a “base area times height” calculation. Let’s revisit
that formula and look at it from the perspective of linear maps.
So, using linear maps, we want to illustrate that the volume of
the parallelepiped, or skew box, simply reduces to a 3D determinant
calculation. Proceeding directly with a sketch in the 3D case would be
difficult to follow. For 3D, let’s augment the determinant idea with
the tools from Section 5.4. There we demonstrated how shears—
area-preserving linear maps—can be used to transform a matrix to
upper triangular. These are the forward elimination steps of Gauss
elimination.
First, let’s introduce a 3 × 3 determinant of a matrix A. It is easily
remembered as an alternating sum of 2 × 2 determinants:

|A| = a1,1 |a2,2 a2,3| − a2,1 |a1,2 a1,3| + a3,1 |a1,2 a1,3|
           |a3,2 a3,3|        |a3,2 a3,3|        |a2,2 a2,3| .      (9.14)
The representation in (9.14) is called the cofactor expansion. Each
(signed) 2 × 2 determinant is the cofactor of the ai,j it is paired with
in the sum. The sign comes from the factor (−1)i+j . For example,
the cofactor of a2,1 is

(−1)2+1 |a1,2 a1,3|
        |a3,2 a3,3| .

The cofactor is also written as (−1)i+j Mi,j where Mi,j is called the
minor of ai,j . As a result, (9.14) is also known as expansion by minors.
We’ll look into this method more in Section 12.6.
If (9.14) is expanded, then an interesting form for writing the de-
terminant arises. The formula is nearly impossible to remember, but
the following trick is not. Copy the first two columns after the last
column. Next, form the product of the three “diagonals” and add
them. Then, form the product of the three “antidiagonals” and sub-
tract them. The three “plus” products may be written as:
a1,1 a1,2 a1,3  
 a2,2 a2,3 a2,1 
  a3,3 a3,1 a3,2

and the three “minus” products as:


  a1,3 a1,1 a1,2
 a2,2 a2,3 a2,1  .
a3,1 a3,2 a3,3  

The complete formula for the 3 × 3 determinant is

|A| = a1,1 a2,2 a3,3 + a1,2 a2,3 a3,1 + a1,3 a2,1 a3,2
− a3,1 a2,2 a1,3 − a3,2 a2,3 a1,1 − a3,3 a2,1 a1,2 .

Example 9.8

What is the volume spanned by the three vectors


⎡ ⎤ ⎡ ⎤ ⎡ ⎤
4 −1 0.1
a1 = ⎣0⎦ , a2 = ⎣ 4⎦ , a3 = ⎣−0.1⎦?
0 4 0.1

All we have to do is to compute

det[a1 , a2 , a3 ] = 4 |4 −0.1|
                       |4  0.1|
                   = 4(4 × 0.1 − (−0.1) × 4) = 3.2.

(Here we used an alternative notation, det, for the determinant.) In


this computation, we did not write down zero terms.
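The cofactor expansion (9.14) translates directly into code; a minimal sketch in Python with NumPy (an assumption; the helper names det3 and det2 are mine):

import numpy as np

def det3(A):
    """3 x 3 determinant via the cofactor expansion (9.14), first column."""
    def det2(a, b, c, d):              # 2 x 2 determinant ad - bc
        return a * d - b * c
    return ( A[0, 0] * det2(A[1, 1], A[1, 2], A[2, 1], A[2, 2])
           - A[1, 0] * det2(A[0, 1], A[0, 2], A[2, 1], A[2, 2])
           + A[2, 0] * det2(A[0, 1], A[0, 2], A[1, 1], A[1, 2]))

# The three vectors of Example 9.8 as columns.
A = np.array([[4.0, -1.0,  0.1],
              [0.0,  4.0, -0.1],
              [0.0,  4.0,  0.1]])
print(det3(A), np.linalg.det(A))       # both 3.2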

As we have seen in Section 9.5, a 3D shear preserves volume. There-


fore, we can apply a series of shears to the matrix A, resulting in a
new matrix ⎡ ⎤
ã1,1 ã1,2 ã1,3
à = ⎣ 0 ã2,2 ã2,3 ⎦ .
0 0 ã3,3
The determinant of à is
|Ã| = ã1,1 ã2,2 ã3,3 , (9.15)

with of course, |A| = |Ã|.


Let’s continue with Example 9.8. One simple row operation, row3 =
row3 − row2 , will achieve the upper triangular matrix
⎡ ⎤
4 −1 0.1
à = ⎣0 4 −0.1⎦ ,
0 0 0.2

and we can determine that |Ã| = |A|.


For 3 × 3 matrices, we don’t actually calculate the volume of three
vectors by proceeding with the forward elimination steps, or shears.3
We would just directly calculate the 3 × 3 determinant from (9.14).
What is interesting about this development is now we can illustrate,
as in Sketch 9.10, how the determinant defines the volume of the skew
box. The first two column vectors of à lie in the [e1 , e2 ]-plane. Their
determinant defines the area of the parallelogram that they span; this
determinant is ã1,1 ã2,2 . The height of the skew box is simply the e3
component of ã3 . Thus, we have an easy to visualize interpretation
of the 3 × 3 determinant. And, from a slightly different perspective,
we have revisited the geometric development of the determinant as
the scalar triple product (Section 8.5).
Let’s conclude this section with some rules for determinants. Sup-
pose we have two 3 × 3 matrices, A and B. The column vectors of A
are a1 , a2 , a3 .

Sketch 9.10.
Determinant and volume in 3D.

• The determinant of the transpose matrix equals that of the matrix:
|A| = |AT |. This property allows us to interchange the terms “row”
or “column” when working with determinants.

• Exchanging two columns, creating a noncyclic permutation, changes
the sign of the determinant: |a2 a1 a3| = −|A|.
3 However, we will use forward elimination for n × n systems. The sign of

the determinant in (9.15) needs to be adjusted if pivoting is included in forward


elimination. See Section 12.6 for details.

• Multiplying one column by a scalar c results in the determinant
being multiplied by c: |ca1 a2 a3| = c|A|.

• As an extension of the previous item: |cA| = c3 |A|.

• If A has a row of zeroes then |A| = 0.

• If A has two identical rows then |A| = 0.

• The sum of determinants is not the determinant of the sum: in general,
|A| + |B| ≠ |A + B|.

• The product of determinants is the determinant of the product:


|AB| = |A||B|.

• Multiples of rows can be added together without changing the


determinant. For example, the shears of Gauss elimination do not
change the determinant, as we observed in the simple example
above.
The determinant of the shear matrix S2 in (9.6) is one, thus
|S2 A| = |S2 ||A| = |A|.

• A being invertible is equivalent to |A| ≠ 0 (see Section 9.10).

• If A is invertible then
1
|A−1 | = .
|A|

9.9 Combining Linear Maps


If we apply a linear map A to a vector v and then apply a map B to
the result, we may write this as

v′ = BAv.

Matrix multiplication is defined just as in the 2D case; the element
ci,j of the product matrix C = BA is obtained as the dot product of
the ith row of B with the jth column of A. A handy way to write
the matrices so as to keep the dot products in order is

      A
B     C .

Instead of a complicated formula, an example should suffice.

             1  5 −4
            −1 −2  0
             2  3 −4
 0  0 −1    −2 −3  4
 1 −2  0     3  9 −4
−2  1  1    −1 −9  4

We have computed the dot product of the boldface row of B and


column of A to produce the boldface entry of C. In this example, B
and A are 3 × 3 matrices, and thus the result is another 3 × 3 matrix.
In the example in Section 9.1, a 3 × 3 matrix A is multiplied by a
3 × 1 matrix (vector) v resulting in a 3 × 1 matrix or vector. Thus
two matrices need not be the same size in order to multiply them.
There is a rule, however! Suppose we are to multiply two matrices A
and B together as AB. The sizes of A and B are

m × n and n × p, (9.16)

respectively. The resulting matrix will be of size m×p—the “outside”


dimensions in (9.16). In order to form AB, it is necessary that the
“inside” dimensions, both n here, be equal. The matrix multiplication
scheme from Section 4.2 simplifies hand-calculations by illustrating
the resulting dimensions.
As in the 2D case, matrix multiplication does not commute! That
is, AB ≠ BA in most cases. An interesting difference between 2D
and 3D is the fact that in 2D, rotations did commute; however, in 3D
they do not. For example, in 2D, rotating first by α and then by β
is no different from doing it the other way around. In 3D, that is not
the case. Let’s look at an example to illustrate this point.

Example 9.9

Let’s look at a rotation by −90◦ around the e1 -axis with matrix R1


and a rotation by −90◦ around the e3 -axis with matrix R3 :
⎡ ⎤ ⎡ ⎤
1 0 0 0 1 0
R1 = ⎣0 0 1⎦ and R3 = ⎣−1 0 0⎦ .
0 −1 0 0 0 1

Figure 9.8.
Combining 3D rotations: left and right, the original L is labeled I for identity matrix.
On the left, R1 is applied and then R3 , the result is labeled R3 R1 . On the right, R3 is
applied and then R1 , the result is labeled R1 R3 . This shows that 3D rotations do not
commute.

Figure 9.8 illustrates what the algebra tells us:


⎡ ⎤ ⎡ ⎤
0 0 1 0 1 0
R3 R1 = ⎣−1 0 0⎦ is not equal to R1 R3 = ⎣0 0 1⎦ .
0 −1 0 1 0 0

Also helpful for understanding what is happening, is to track the


transformation of a point p on L. Form the vector v = p − o, and
let’s track ⎡ ⎤
0
v = ⎣0⎦ .
1
In Figure 9.8 on the left, observe the transformation of v:
⎡ ⎤ ⎡ ⎤
0 1
R1 v = ⎣1⎦ and R3 R1 v = ⎣0⎦ .
0 0

Now, on the right, observe the transformation of v:


⎡ ⎤ ⎡ ⎤
0 0
R3 v = ⎣0⎦ and R1 R3 v = ⎣1⎦ .
1 0

So it does matter which rotation we perform first!
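Example 9.9 is easy to replay numerically; a minimal sketch in Python with NumPy (an assumption of these examples):

import numpy as np

# Rotations by -90 degrees about e1 and about e3, as in Example 9.9.
R1 = np.array([[1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0],
               [0.0, -1.0, 0.0]])
R3 = np.array([[ 0.0, 1.0, 0.0],
               [-1.0, 0.0, 0.0],
               [ 0.0, 0.0, 1.0]])

v = np.array([0.0, 0.0, 1.0])
print(R3 @ (R1 @ v))                   # [1. 0. 0.]: rotate about e1 first, then e3
print(R1 @ (R3 @ v))                   # [0. 1. 0.]: rotate about e3 first, then e1
print(np.allclose(R3 @ R1, R1 @ R3))   # False: 3D rotations do not commute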



9.10 Inverse Matrices


In Section 5.9, we saw how inverse matrices undo linear maps. A
linear map A takes a vector v to its image v′. The inverse map, A−1 ,
will take v′ back to v, i.e., A−1 v′ = v or A−1 Av = v. Thus, the
combined action of A−1 A has no effect on any vector v, which we can
write as

A−1 A = I,                                                (9.17)
where I is the 3 × 3 identity matrix. If we applied A−1 to v first, and
then applied A, there would not be any action either; in other words,
AA−1 = I, (9.18)
too.
A matrix is not always invertible. For example, the projections
from Section 9.7 are rank deficient, and therefore not invertible. This
is apparent from Sketch 9.9: once we flatten the vectors ai to ai in
2D, there isn’t enough information available in the ai to return them
to 3D.
As we discovered in Section 9.6 on rotations, orthogonal matrices,
which are constructed from a set of orthonormal vectors, possess the
nice property RT = R−1 . Forming the reverse rotation is simple and
requires no computation; this provides for a huge savings in computer
graphics where rotating objects is a common operation.
Scaling also has an inverse, which is simple to compute. If
⎡ ⎤ ⎡ ⎤
s1,1 0 0 1/s1,1 0 0
S=⎣ 0 s2,2 0 ⎦ , then S −1 = ⎣ 0 1/s2,2 0 ⎦.
0 0 s3,3 0 0 1/s3,3
Here are more rules for matrices. These involve calculating with in-
verse matrices.

A−n = (A−1 )n = A−1 · · · A−1   (n times)

(A−1 )−1 = A

(kA)−1 = (1/k) A−1

(AB)−1 = B −1 A−1

See Section 12.4 for details on calculating A−1 .
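A few of these rules, checked numerically; a minimal sketch in Python with NumPy (an assumption; the particular rotation and scaling are placeholders):

import numpy as np

# A rotation about e3 by 30 degrees: orthogonal, so R^T = R^(-1).
t = np.radians(30.0)
R = np.array([[np.cos(t), -np.sin(t), 0.0],
              [np.sin(t),  np.cos(t), 0.0],
              [0.0,        0.0,       1.0]])
print(np.allclose(R.T, np.linalg.inv(R)))            # True

# A scaling is inverted by taking reciprocals of the diagonal.
S = np.diag([2.0, 0.25, 4.0])
print(np.linalg.inv(S))                              # diag(1/2, 4, 1/4)

# (AB)^(-1) = B^(-1) A^(-1) for invertible A and B.
A, B = R, S
print(np.allclose(np.linalg.inv(A @ B),
                  np.linalg.inv(B) @ np.linalg.inv(A)))   # True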



9.11 More on Matrices

A handful of matrix properties are explained and illustrated in Chap-


ter 4. Here we restate them so they are conveniently together. These
properties hold for n × n matrices (the topic of Chapter 12):

• preserve scalings: A(cv) = cAv


• preserve summations: A(u + v) = Au + Av

• preserve linear combinations: A(au + bv) = aAu + bAv

• distributive law: Av + Bv = (A + B)v

• commutative law for addition: A + B = B + A


• no commutative law for multiplication: in general, AB ≠ BA.

• associative law for addition: A + (B + C) = (A + B) + C


• associative law for multiplication: A(BC) = (AB)C

• distributive law: A(B + C) = AB + AC


(B + C)A = BA + CA

Scalar laws:
· a(B + C) = aB + aC
· (a + b)C = aC + bC
· (ab)C = a(bC)
· a(BC) = (aB)C = B(aC)

Laws involving determinants:


• |A| = |AT |

• |AB| = |A| · |B|

• |A| + |B| ≠ |A + B| in general

• |cA| = c^n |A|

Laws involving exponents:
· A^r = A · · · A   (r times)
· A^(r+s) = A^r A^s
· A^(rs) = (A^r )^s
· A^0 = I

Laws involving the transpose:
· [A + B]T = AT + B T
· [AT ]T = A
· [cA]T = cAT
· [AB]T = B T AT

• 3D linear map
• transpose matrix
• linear space
• vector space
• subspace
• linearity property
• linear combination
• linearly independent
• linearly dependent
• scale
• action ellipsoid
• rotation
• rigid body motions
• shear
• reflection
• projection
• idempotent
• orthographic projection
• oblique projection
• determinant
• volume
• scalar triple product
• cofactor expansion
• expansion by minors
• inverse matrix
• multiply matrices
• noncommutative property of matrix multiplication
• rules of matrix arithmetic

9.12 Exercises
1. Let v = 3a1 + 2a2 + a3 , where
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 2 0
a1 = ⎣1⎦ , a2 = ⎣0⎦ , a3 = ⎣3⎦ .
1 0 0

What is v ? Write this equation in matrix form.


2. What is the transpose of the matrix
⎡ ⎤
1 5 −4
A = ⎣−1 −2 0⎦?
2 3 −4

3. Given a 2D linear subspace formed by vectors w and v, is u an element


of that subspace?
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 0
v = ⎣0⎦ , w = ⎣1⎦ , u = ⎣0⎦ .
0 1 1

4. Given ⎡ ⎤ ⎡⎤ ⎡ ⎤
3 1 7
v = ⎣ 2 ⎦, w = ⎣−1⎦ , u = ⎣3⎦ ,
−1 2 0
is u in the subspace defined by v and w?
5. Is the vector u = v ∧ w in the subspace defined by v and w?
6. Let V1 be the one-dimensional subspace defined by
⎡ ⎤
1
v = ⎣1⎦ .
0

What vector w in V1 is closest to


⎡ ⎤
1
w = ⎣0⎦?
1

7. The vectors v and w form a 2D subspace of R3 . Are they linearly


dependent?
8. What is the kernel of the matrix formed from linearly independent vec-
tors v1 , v2 , v3 ?

9. Describe the linear map given by the matrix


⎡ ⎤
1 0 0
⎣0 0 −1⎦
0 1 0

by stating if it is volume preserving and stating the action of the map.


Hint: Examine where the ei -axes are mapped.
10. What matrix scales by 2 in the e1 -direction, scales by 1/4 in the e2 -
direction, and reverses direction and scales by 4 in the e3 -direction?
Map the unit cube with this matrix. What is the volume of the resulting
parallelepiped?
11. What is the matrix that reflects a vector about the plane x1 = x2 ? Map
the unit cube with this matrix. What is the volume of the resulting
parallelepiped?
12. What is the shear matrix that maps
⎡ ⎤ ⎡ ⎤
a 0
⎣ b ⎦ to ⎣0⎦?
c c

Map the unit cube with this matrix. What is the volume of the resulting
parallelepiped?
13. What matrix rotates around the e1 -axis by α degrees?
⎡ ⎤
−1
14. What matrix rotates by 45 around the vector ⎣ 0⎦?

−1
15. Construct the orthogonal projection matrix P that projects onto the
line spanned by ⎡ √ ⎤
1/√3
u = ⎣1/√3⎦
1/ 3
and what is the action of the map, v = P v? What is the action of the
map on the following two vectors:
⎡ ⎤ ⎡ ⎤
1 0
v1 = ⎣1⎦ and v2 = ⎣0⎦?
1 1

What is the rank of this matrix? What is the determinant?


16. Construct the projection matrix P that projects into the plane spanned
by ⎡ √ ⎤ ⎡ ⎤
1/√2 0
u1 = ⎣1/ 2⎦ and u2 = ⎣0⎦ .
0 1

What is the action of the map, v = P v? What is the action of the


map on the following vectors:
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 1
v1 = ⎣1⎦ , v2 = ⎣0⎦ , v3 = ⎣−1⎦?
1 0 0
What is the rank of this matrix? What is the determinant?
17. Given the projection matrix in Exercise 16, what is the projection di-
rection?
18. Given the projection matrix
⎡ ⎤
1 0 −1
A = ⎣0 1 0⎦ ,
0 0 0
what is the projection direction? What type of projection is it?
19. What is the cofactor expansion of
⎡ ⎤
1 2 3
A = ⎣2 0 0⎦?
1 0 1
What is |A|?
20. For scalar values c1 , c2 , c3 and a matrix A = [a1 a2 a3 ], what is the
determinant of A1 = [c1 a1 a2 a3 ], A2 = [c1 a1 c2 a2 a3 ], and A3 =
[c1 a1 c2 a2 c3 a3 ]?
21. The matrix ⎡ ⎤
1 2 3
A = ⎣2 0 0⎦
1 1 1
is invertible and |A| = 2. What is |A−1 |?
22. Compute ⎡ ⎤⎡ ⎤
0 0 1 1 5 −4
⎣ 1 −2 0⎦ ⎣−1 −2 0⎦ .
−2 1 1 2 3 −4
23. What is AB and BA given
⎡ ⎤ ⎡ ⎤
1 2 3 0 1 0
A = ⎣2 0 0⎦ and B = ⎣1 1 1⎦?
1 1 1 1 0 0

24. What is AB and BA given


⎡ ⎤ ⎡ ⎤
1 2 3 1 1
A = ⎣2 0 0⎦ and B = ⎣2 2⎦?
1 1 1 0 1

25. Find the inverse for each of the following matrices:

    rotation:                scale:             projection:
    ⎡ 1/√2  0  1/√2⎤         ⎡1/2  0   0⎤       ⎡1 0 −1⎤
    ⎢  0    1   0  ⎥ ,       ⎢ 0  1/4  0⎥ ,     ⎢0 1  0⎥ .
    ⎣−1/√2  0  1/√2⎦         ⎣ 0   0   2⎦       ⎣0 0  0⎦

26. For what type of matrix is A−1 = AT ?


27. What is the inverse of the matrix
⎡ ⎤
5 8
A = ⎣2 2⎦?
3 4

28. If ⎡ ⎤ ⎡ ⎤
0 1 0 0 0 1
B = ⎣1 1 1⎦ and B −1 =⎣ 1 0 0 ⎦,
1 0 0 −1 1 −1
what is (3B)−1 ?
29. For matrix A in Exercise 2, what is (AT )T ?
10
Affine Maps in 3D

Figure 10.1.
Affine maps in 3D: fighter jets twisting and turning through 3D space.

Affine maps in 3D are a primary tool for modeling and computer


graphics. Figure 10.1 illustrates the use of various affine maps. This
chapter goes a little further than just affine maps by introducing
projective maps—the maps used to create realistic 3D images.


10.1 Affine Maps


Linear maps relate vectors to vectors. Affine maps relate points to
points. A 3D affine map is written just as a 2D one, namely as

x′ = p + A(x − o),                                        (10.1)

where x, o, p, x′ are 3D points and A is a 3 × 3 matrix. In general, we
will assume that the origin of x’s coordinate system has three zero
coordinates, and drop the o term:

x′ = p + Ax.                                              (10.2)

Sketch 10.1.
An affine map in 3D.
Sketch 10.1 gives an example. Recall, the column vectors of A are
the vectors a1 , a2 , a3 . The point p tells us where to move the origin
of the [e1 , e2 , e3 ]-system; again, the real action of an affine map is
captured by the matrix. Thus, by studying matrix actions, or linear
maps, we will learn more about affine maps.
We now list some of the important properties of 3D affine maps.
They are straightforward generalizations of the 2D cases, and so we
just give a brief listing.

1. Affine maps leave ratios invariant (see Sketch 10.2).


2. Affine maps take parallel planes to parallel planes (see Figure 10.2).

Sketch 10.2.
Affine maps leave ratios
invariant. This map is a rigid
body motion.

Figure 10.2.
Affine map property: parallel planes get mapped to parallel planes via an affine map.

3. Affine maps take intersecting planes to intersecting planes. In


particular, the intersection line of the mapped planes is the map
of the original intersection line.
4. Affine maps leave barycentric combinations invariant. If

x = c1 p1 + c2 p2 + c3 p3 + c4 p4 ,

where c1 + c2 + c3 + c4 = 1, then after an affine map we have

x′ = c1 p′1 + c2 p′2 + c3 p′3 + c4 p′4 .

For example, the centroid of a tetrahedron will be mapped to the
centroid of the mapped tetrahedron (see Sketch 10.3).

Sketch 10.3.
The centroid is mapped to the centroid.

Most 3D maps do not offer much over their 2D counterparts—but
some do. We will go through all of them in detail now.

10.2 Translations
A translation is simply (10.2) with A = I, the 3 × 3 identity matrix:
⎡ ⎤
1 0 0
I = ⎣0 1 0⎦ ,
0 0 1
that is
x = p + Ix.
Thus, the new [a1 , a2 , a3 ]-system has its coordinate axes parallel to
the [e1 , e2 , e3 ]-system. The term Ix = x needs to be interpreted as
a vector in the [e1 , e2 , e3 ]-system for this to make sense. Figure 10.3
shows an example of repeated 3D translations.
Just as in 2D, a translation is a rigid body motion. The volume of
an object is not changed.

10.3 Mapping Tetrahedra


A 3D affine map is determined by four point pairs pi → p′i for i =
1, 2, 3, 4. In other words, an affine map is determined by a tetrahedron
and its image. What is the image of an arbitrary point x under this
affine map?
Affine maps leave barycentric combinations unchanged. This will
be the key to finding x′, the image of x. If we can write x in the form

x = u1 p1 + u2 p2 + u3 p3 + u4 p4 ,                       (10.3)

Figure 10.3.
Translations in 3D: three translated teapots.

then we know that the image has the same relationship with the p′i :

x′ = u1 p′1 + u2 p′2 + u3 p′3 + u4 p′4 .                  (10.4)

So all we need to do is find the ui ! These are called the barycentric


coordinates of x with respect to the pi , quite in analogy to the triangle
case (see Section 6.5).
We observe that (10.3) is short for three individual coordinate equa-
tions. Together with the barycentric combination condition

u1 + u2 + u3 + u4 = 1,

we have four equations for the four unknowns u1 , . . . , u4 , which we


can solve by consulting Chapter 12.

Example 10.1

Let the original tetrahedron be given by the four points pi


⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
0 1 0 0
⎣0⎦ , ⎣0⎦ , ⎣1⎦ , ⎣0⎦ .
0 0 0 1

Let’s assume we want to map this tetrahedron to the four points pi
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
0 −1 0 0
⎣0⎦ , ⎣ 0⎦ , ⎣−1⎦ , ⎣ 0⎦ .
0 0 0 −1
This is a pretty straightforward map if you consult Sketch 10.4.
Let’s see where the point
⎡ ⎤
1
x = ⎣1⎦
1 Sketch 10.4.
An example tetrahedron map.
ends up. First, we find that
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 0 1 0 0
⎣1⎦ = −2 ⎣0⎦ + ⎣0⎦ + ⎣1⎦ + ⎣0⎦ ,
1 0 0 0 1
i.e., the barycentric coordinates of x with respect to the original pi are
(−2, 1, 1, 1). Note how they sum to one. Now it is simple to compute
the image of x; compute x using the same barycentric coordinates
with respect to the pi :
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
0 −1 0 0 −1
x = −2 ⎣0⎦ + ⎣ 0⎦ + ⎣−1⎦ + ⎣ 0⎦ = ⎣−1⎦ .
0 0 0 −1 −1

A different approach would be to find the 3 × 3 matrix A and


point p that describe the affine map. Construct a coordinate system
from the pi tetrahedron. One way to do this is to choose p1 as the
origin1 and the three axes are defined as pi − p1 for i = 2, 3, 4. The
coordinate system of the p′i tetrahedron must be based on the same
indices. Once we have defined A and p then we will be able to map
x by this map:

x′ = A[x − p1 ] + p′1 .

Thus, the point p = p′1 . In order to determine A, let’s write down
some known relationships. Referring to Sketch 10.5, we know

A[p2 − p1 ] = p′2 − p′1 ,
A[p3 − p1 ] = p′3 − p′1 ,
A[p4 − p1 ] = p′4 − p′1 ,

Sketch 10.5.
The relationship between tetrahedra.
1 Any of the four p would do, so for the sake of concreteness, we choose the
i
first one.

which may be written in matrix form as

A [p2 − p1   p3 − p1   p4 − p1 ] = [p′2 − p′1   p′3 − p′1   p′4 − p′1 ] .        (10.5)

Thus,

A = [p′2 − p′1   p′3 − p′1   p′4 − p′1 ] [p2 − p1   p3 − p1   p4 − p1 ]−1 ,      (10.6)

and A is defined.

Example 10.2

Revisiting Example 10.1, we now want to construct the matrix A. By
selecting p1 as the origin for the pi tetrahedron coordinate system
there is no translation; p1 is the origin in the [e1 , e2 , e3 ]-system and
p1 = p′1 . We now compute A. (A is the product matrix in the bottom
right position):

              1  0  0
              0  1  0
              0  0  1
−1  0  0     −1  0  0
 0 −1  0      0 −1  0
 0  0 −1      0  0 −1
In order to compute x , we have
⎡ ⎤⎡ ⎤ ⎡ ⎤
−1 0 0 1 −1
x = ⎣ 0 −1 0⎦ ⎣1⎦ = ⎣−1⎦ .
0 0 −1 1 −1
This is the same result as in Example 10.1.
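The construction of A via (10.6) is also a two-line computation by machine; a minimal sketch in Python with NumPy (an assumption; the variable names V and Vp are mine):

import numpy as np

# Tetrahedron vertices of Example 10.1 and their images (primed points).
p  = [np.array(x, dtype=float) for x in
      ([0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1])]
pp = [np.array(x, dtype=float) for x in
      ([0, 0, 0], [-1, 0, 0], [0, -1, 0], [0, 0, -1])]

# Build A from (10.6): image edge vectors times the inverse of the
# original edge vectors.
V  = np.column_stack([p[i]  - p[0]  for i in (1, 2, 3)])
Vp = np.column_stack([pp[i] - pp[0] for i in (1, 2, 3)])
A = Vp @ np.linalg.inv(V)

# Apply the affine map x' = A (x - p1) + p1'.
x = np.array([1.0, 1.0, 1.0])
x_prime = A @ (x - p[0]) + pp[0]
print(x_prime)                         # [-1. -1. -1.], as in Example 10.1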

10.4 Parallel Projections


We looked at orthogonal parallel projections as basic linear maps in
Sections 4.8 and 9.7. Everything we draw is a projection of necessity—
paper is 2D, after all, whereas most interesting objects are 3D. Fig-
ure 10.4 gives an example. Here we will look at projections in the
context of 3D affine maps that map 3D points onto a plane.

Figure 10.4.
Projections in 3D: a 3D helix is projected into two different 2D planes.

As illustrated in Sketch 10.6, a parallel projection is defined by


a direction of projection d and a projection plane P . A point x is
projected into P , and is represented as xp in the sketch. This infor-
mation in turn defines a projection angle θ between d and the line
joining the perpendicular projection point xo in P . This angle is used
to categorize parallel projections as orthogonal or oblique.
Orthogonal (also called orthographic) projections are special; their
projection direction is perpendicular to the plane. There are special
names for many particular projection angles; see a computer graphics
text such as [10] for more details.
Let x be the 3D point to be projected, let v indicate the projection
direction, and the projection plane is defined by point q and normal
n, as illustrated in Sketch 10.7. For some point x in the plane, the
plane equation is [x − q] · n = 0. The intersection point is on the line
defined by x and v and it is given by x′ = x + tv. We need to find
t, and this is achieved by inserting the line equation into the plane
equation and solving for t,

Sketch 10.6.
Oblique and orthographic parallel projections.

[x + tv − q] · n = 0,
[x − q] · n + t v · n = 0,

t = ([q − x] · n) / (v · n).

The intersection point x′ is now computed as

x′ = x + (([q − x] · n) / (v · n)) v.                     (10.7)

Sketch 10.7.
Projecting a point on a plane.

How do we write (10.7) as an affine map in the form Ax + p?
Without much effort, we find

x′ = x − ((n · x)/(v · n)) v + ((q · n)/(v · n)) v.

We know that we may write dot products in matrix form (see Sec-
tion 4.11):

x′ = x − ((nT x)/(v · n)) v + ((q · n)/(v · n)) v.

Next, we observe that

(nT x) v = v (nT x).

Since matrix multiplication is associative (see Section 4.12), we also
have

v (nT x) = (vnT) x.

Notice that vnT is a 3 × 3 matrix. Now we can write

x′ = (I − vnT/(v · n)) x + ((q · n)/(v · n)) v,           (10.8)

where I is the 3 × 3 identity matrix. This is of the form x′ = Ax + p
and hence is an affine map.2
Let’s check the properties of (10.8). The projection matrix A,
formed from v and n has rank two and thus reduces dimensionality,
as designed. From the derivation of projection, it is intuitively clear
2 Technically, we should replace x with x − o to have a vector and replace αv

with αv + o to have a point, where α = (q · n)/(v · n).


10.4. Parallel Projections 215

that once x has been mapped into the projection plane, to x , it will
remain there. We can also show the map is idempotent algebraically,
 
    A² = ( I − vnT/(v · n) ) ( I − vnT/(v · n) )
       = I² − 2 vnT/(v · n) + ( vnT/(v · n) )²
       = A − vnT/(v · n) + ( vnT/(v · n) )².

Expanding the squared term, we find that

    (vnT vnT)/(v · n)² = vnT/(v · n),

and thus A² = A. We can also show that repeating the affine map is idempotent as well:

    A(Ax + p) + p = A²x + Ap + p
                  = Ax + Ap + p.

Let α = (q · n)/(v · n), and examining the middle term,

    Ap = ( I − vnT/(v · n) ) αv
       = αv − ((nTv)/(v · n)) αv
       = 0.

Therefore, A(Ax + p) + p = Ax + p, and we have shown that indeed,


the affine map is idempotent.

Example 10.3

Suppose we are given the projection plane x1 + x2 + x3 − 1 = 0, a


point x (not in the plane), and a direction v given by
    x = [3, 2, 4]T   and   v = [0, 0, −1]T.

If we project x along v onto the plane, what is x'? Sketch 10.8 illustrates this geometry. First, we need the plane's normal direction. Calling it n, we have

    n = [1, 1, 1]T.

Now, choose a point q in the plane. Let's choose

    q = [1, 0, 0]T

for simplicity. Now we are ready to calculate the quantities in (10.8):

    v · n = −1,

    vnT = ⎡ 0   0   0⎤
          ⎢ 0   0   0⎥ ,
          ⎣−1  −1  −1⎦

    ((q · n)/(v · n)) v = [0, 0, 1]T.

Putting all the pieces together:

    x' = ⎛    ⎡0  0  0⎤⎞ ⎡3⎤   ⎡0⎤   ⎡ 3⎤
         ⎜I − ⎢0  0  0⎥⎟ ⎢2⎥ + ⎢0⎥ = ⎢ 2⎥ .
         ⎝    ⎣1  1  1⎦⎠ ⎣4⎦   ⎣1⎦   ⎣−4⎦

Just to double-check, enter x' into the plane equation:

    3 + 2 − 4 − 1 = 0,

and we see that

    ⎡ 3⎤   ⎡3⎤     ⎡ 0⎤
    ⎢ 2⎥ = ⎢2⎥ + 8 ⎢ 0⎥ ,
    ⎣−4⎦   ⎣4⎦     ⎣−1⎦

which together verify that this is the correct point. Sketch 10.8 should convince you that this is indeed the correct answer.
Sketch 10.8.
A projection example.
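For readers who want to check such computations numerically, here is a minimal NumPy sketch of (10.7) and of the affine form (10.8); the function names are ours, and the data are those of Example 10.3.

import numpy as np

def project_point(x, v, q, n):
    # Parallel projection of x along v onto the plane through q with normal n, Equation (10.7).
    t = np.dot(q - x, n) / np.dot(v, n)
    return x + t * v

def projection_affine_map(v, q, n):
    # The same projection written as an affine map x' = A x + p, Equation (10.8).
    A = np.eye(3) - np.outer(v, n) / np.dot(v, n)
    p = (np.dot(q, n) / np.dot(v, n)) * v
    return A, p

x = np.array([3.0, 2.0, 4.0])
v = np.array([0.0, 0.0, -1.0])
q = np.array([1.0, 0.0, 0.0])
n = np.array([1.0, 1.0, 1.0])

A, p = projection_affine_map(v, q, n)
print(project_point(x, v, q, n))   # [ 3.  2. -4.]
print(A @ x + p)                   # the same point
print(A @ (A @ x + p) + p)         # applying the affine map again changes nothing (idempotent)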

Which of the two possibilities, (10.7) or the affine map (10.8) should
you use? Clearly, (10.7) is more straightforward and less involved. Yet
in some computer graphics or CAD system environments, it may be
desirable to have all maps in a unified format, i.e., Ax + p. We’ll
revisit this unified format idea in Section 10.5.

10.5 Homogeneous Coordinates and Perspective Maps


There is a way to condense the form x' = Ax + p of an affine map into just one matrix multiplication

    x' = M x.     (10.9)

This is achieved by setting

    M = ⎡a1,1  a1,2  a1,3  p1⎤
        ⎢a2,1  a2,2  a2,3  p2⎥
        ⎢a3,1  a3,2  a3,3  p3⎥
        ⎣  0     0     0    1⎦

and

    x = [x1, x2, x3, 1]T,   x' = [x'1, x'2, x'3, 1]T.
The 4D point x is called the homogeneous form of the affine point
x. You should verify for yourself that (10.9) is indeed the same affine
map as before.
The homogeneous representation of a vector v must have the form

    v = [v1, v2, v3, 0]T.

This form allows us to apply the linear map to the vector,

    v' = M v.

By having a zero fourth component, we disregard the translation,


which we know has no effect on vectors. Recall that a vector is defined
as the difference of two points.
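As a small illustration (the helper name and the particular A and p are ours), the homogeneous form can be assembled and applied in NumPy as follows; note how the zero fourth component makes the translation drop out for vectors.

import numpy as np

def homogeneous_matrix(A, p):
    # Pack the affine map x' = A x + p into one 4x4 matrix, as in (10.9).
    M = np.eye(4)
    M[:3, :3] = A
    M[:3, 3] = p
    return M

A = np.diag([2.0, 2.0, 2.0])             # some linear part
p = np.array([1.0, 0.0, 0.0])            # some translation
M = homogeneous_matrix(A, p)

point  = np.array([1.0, 2.0, 3.0, 1.0])  # fourth coordinate 1: a point
vector = np.array([1.0, 2.0, 3.0, 0.0])  # fourth coordinate 0: a vector

print(M @ point)    # [3. 4. 6. 1.]  ->  A x + p
print(M @ vector)   # [2. 4. 6. 0.]  ->  the translation is ignored, as expected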
This method of condensing transformation information into one matrix is implemented in the popular computer graphics Application Programmer's Interface (API), OpenGL [15]. It is very convenient and efficient to have all this information (plus more, as we will see), in one data structure.
The homogeneous form is more general than just adding a fourth
coordinate x4 = 1 to a point. If, perhaps as the result of some com-
putation, the fourth coordinate does not equal one, one gets from
the homogeneous point x to its affine counterpart x by dividing
through by x4 . Thus, one affine point has infinitely many homo-
geneous representations!

Example 10.4

This example shows two homogeneous representations of one affine


point. (The symbol ≈ should be read “corresponds to.”)
    [1, −1, 3]T ≈ [10, −10, 30, 10]T ≈ [−2, 2, −6, −2]T.

Using the homogeneous matrix form of (10.9), the matrix M for the point-into-a-plane projection from (10.8) becomes

    M = ⎡ (v · n) I − vnT    (q · n)v ⎤
        ⎣   0    0    0        v · n  ⎦ .

Here, the element m4,4 = v · n. Thus, x4 = v · n, and we will have


to divide x’s coordinates by x4 in order to obtain the corresponding
affine point.
Sketch 10.9.
Perspective projection.

A simple change in our equations will lead us from parallel projections onto a plane to perspective projections. Instead of using a constant direction v for all projections, now the direction depends on the point x. More precisely, let it be the line from x to the origin of our coordinate system. Then, as shown in Sketch 10.9, v = −x, and (10.7) becomes

    x' = x + (([q − x] · n)/(x · n)) x,

which quickly simplifies to

    x' = ((q · n)/(x · n)) x.     (10.10)
In homogeneous form, this is described by the following matrix M:

    M = ⎡ (q · n) I       0   ⎤
        ⎣  0   0   0    x · n ⎦ .

Perspective projections are not affine maps anymore! To see this, a


simple example will suffice.

Example 10.5

Take the plane x3 = 1; let

    q = [0, 0, 1]T

be a point on the plane. Now q · n = 1 and x · n = x3, resulting in the map

    x' = (1/x3) x.

Take the three points

    x1 = [2, 0, 4]T,   x2 = [3, −1, 3]T,   x3 = [4, −2, 2]T.

This example is illustrated in Sketch 10.9. Note that x2 = (1/2)x1 + (1/2)x3, i.e., x2 is the midpoint of x1 and x3. Their images are

    x'1 = [1/2, 0, 1]T,   x'2 = [1, −1/3, 1]T,   x'3 = [2, −1, 1]T.

The perspective map destroyed the midpoint relation! Now,

    x'2 = (2/3) x'1 + (1/3) x'3.
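A quick numerical check of this example (NumPy; the helper name is ours):

import numpy as np

def perspective_onto_x3_eq_1(x):
    # Perspective projection through the origin onto the plane x3 = 1, Equation (10.10) with q.n = 1.
    return x / x[2]

x1 = np.array([2.0, 0.0, 4.0])
x2 = np.array([3.0, -1.0, 3.0])
x3 = np.array([4.0, -2.0, 2.0])

print(np.allclose(x2, 0.5 * x1 + 0.5 * x3))              # True: x2 is the midpoint
p1, p2, p3 = map(perspective_onto_x3_eq_1, (x1, x2, x3))
print(np.allclose(p2, 0.5 * p1 + 0.5 * p3))              # False: the midpoint relation is destroyed
print(np.allclose(p2, 2/3 * p1 + 1/3 * p3))              # True: the ratio has changed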

Figure 10.5.
Parallel projection: a 3D helix and two orthographic projections on the left and bottom walls of the bounding cube—not visible due to the orthographic projection used for the whole scene.

Figure 10.6.
Perspective projection: a 3D helix and two orthographic projections on the left and bottom walls of the bounding cube—visible due to the perspective projection used for the whole scene.

Thus, the ratio of three points is changed by perspective maps. As


a consequence, two parallel lines will not be mapped to parallel lines.
Because of this effect, perspective maps are a good model for how
we perceive 3D space around us. Parallel lines do seemingly intersect
in a distance, and are thus not perceived as being parallel! Figure
10.5 is a parallel projection and Figure 10.6 illustrates the same ge-
ometry with a perspective projection. Notice in the perspective im-
age, the sides of the bounding cube that move into the page are no
longer parallel.
As we saw above, m4,4 allows us to specify perspective projections.
The other elements of the bottom row of M are used for projective
maps, a more general mapping than a perspective projection. The
topic of this chapter is affine maps, so we’ll leave a detailed discussion
of these elements to another source: A mathematical treatment of this
map is supplied by [6] and a computer graphics treatment is supplied
by [14]. In short, these entries are used in computer graphics for
mapping a viewing volume3 in the shape of a frustum to one in the
shape of a cube, while preserving the perspective projection effect.
3 The dimension and shape of the viewing volume defines what will be displayed

and how it will be displayed (orthographic or perspective).



Figure 10.7.
Perspective maps: an experiment by A. Dürer.

Algorithms in the graphics pipeline are very much simplified by only


dealing with geometry known to be in a cube.
The study of perspective goes back to the fourteenth century—
before that, artists simply could not draw realistic 3D images. One of
the foremost researchers in the area of perspective maps was
A. Dürer.4 See Figure 10.7 for one of his experiments.
4 From The Complete Woodcuts of Albrecht Dürer, edited by W. Durth, Dover

Publications Inc., New York, 1963.



• affine map
• translation
• affine map properties
• barycentric combination
• invariant ratios
• barycentric coordinates
• centroid
• mapping four points to four points
• parallel projection
• orthogonal projection
• oblique projection
• line and plane intersection
• idempotent
• dyadic matrix
• homogeneous coordinates
• perspective projection
• rank

10.6 Exercises
We’ll use four points
    x1 = [1, 0, 0]T,   x2 = [0, 1, 0]T,   x3 = [0, 0, −1]T,   x4 = [0, 0, 1]T,

four points
    y1 = [−1, 0, 0]T,   y2 = [0, −1, 0]T,   y3 = [0, 0, −1]T,   y4 = [0, 0, 1]T,

and also the plane through q with normal n:

    q = [1, 0, 0]T,   n = (1/5) [3, 0, 4]T.

1. What are the two parts of an affine map?


2. An affine map xi → yi ; i = 1, 2, 3, 4 is uniquely defined. What is it?
3. What is the image of

    p = [1, 1, 1]T

under the map from Exercise 2? Use two ways to compute it.
4. What are the geometric properties of the affine map from Exercises 2
and 3?
5. An affine map yi → xi ; i = 1, 2, 3, 4 is uniquely defined. What is it?

6. Using a direction

    v = (1/4) [2, 0, 2]T,

what are the images of the xi when projected in this direction onto the plane defined at the beginning of the exercises?
7. Using the same v as in Exercise 6, what are the images of the yi ?
8. What are the images of the xi when projected onto the plane by a
perspective projection through the origin?
9. What are the images of the yi when projected onto the plane by a
perspective projection through the origin?
10. Compute the centroid c of the xi and then the centroid c of their
perspective images (from Exercise 8). Is c the image of c under the
perspective map?
11. We claimed that (10.8) reduces to (10.10). This necessitates that

    ( I − vnT/(n · v) ) x = 0.

Show that this is indeed true.


12. What is the affine map that rotates the point q (defined above) 90° about the line defined as

    l(t) = [1, 1, 0]T + t [1, 0, 0]T?

Hint: This is a simple construction, and does not require (9.10).


13. Suppose we have the unit cube with “lower-left” and “upper-right” points

    l = [0, 0, 0]T   and   u = [1, 1, 1]T,

respectively. What is the affine map that scales this cube uniformly by two, rotates it −45° around the e2-axis, and then positions it so that l is mapped to

    l' = [−2, −2, −2]T?

What is u'?

14. Suppose we have a diamond-shaped geometric figure defined by the following vertices,

    v1 = [1, 2, 2]T,   v2 = [2, 1, 2]T,   v3 = [3, 2, 2]T,   v4 = [2, 3, 2]T,

that is positioned in the x3 = 2 plane. We want to rotate the diamond about its centroid c (with a positive angle) so it is positioned in the x1 = c1 plane. What is the affine map that achieves this and what are the mapped vertices v'i?
11
Interactions in 3D

Figure 11.1.
Ray tracing: 3D intersections are a key element of rendering a ray-traced image.
(Courtesy of Ben Steinberg, Arizona State University.)

The tools of points, lines, and planes are our most basic 3D geometry
building blocks. But in order to build real objects, we must be able
to compute with these building blocks. For example, if we are given


a plane and a straight line, we would be interested in the intersection


of those two objects. This chapter outlines the basic algorithms for
these types of problems. The ray traced image of Figure 11.1 was
generated by using the tools developed in this chapter.1

11.1 Distance Between a Point and a Plane


Let a plane be given by its implicit form n · x + c = 0. If we also have
a point p, what is its distance d to the plane, and what is its closest
point q on the plane? See Sketch 11.1 for the geometry. Notice how
close this problem is to the foot of a point from Section 3.7.
Sketch 11.1.
A point and a plane.

Clearly, the vector p − q must be perpendicular to the plane, i.e., parallel to the plane's normal n. Thus, p can be written as

p = q + tn;

if we find t, our problem is solved. This is easy, since q must also


satisfy the plane equation:

n · [p − tn] + c = 0.

Thus,

    t = (c + n · p)/(n · n).     (11.1)

It is good practice to assure that n is normalized, i.e., n · n = 1, and then

    t = c + n · p.     (11.2)
Note that t = 0 is equivalent to n · p + c = 0; in that case, p is on
the plane to begin with!

Example 11.1

Consider the plane

    x1 + x2 + x3 − 1 = 0

and the point

    p = [2, 2, 3]T,

as shown in Sketch 11.2.

Sketch 11.2.
Example point and a plane.
1 See Section 11.3 for a description of the technique used to generate this image.

According to (11.1), we find t = 2. Thus,

    q = [2, 2, 3]T − 2 [1, 1, 1]T = [0, 0, 1]T.

The vector p − q is given by

    p − q = tn.

Thus, the length of p − q, or the distance of p to the plane, is given by ‖tn‖. If n is normalized, then ‖p − q‖ = t; this means that we simply insert p into the plane equation and obtain for the distance d:

    d = c + n · p.

It is also clear that if t > 0, then n points toward p, and away from it if t < 0 (see Sketch 11.3), where the plane is drawn “edge on.” Compare with the almost identical Sketch 3.3.

Again, a numerical caveat: If a point is very close to a plane, it becomes very hard numerically to decide which side it is on.

Sketch 11.3.
Points around a plane.
The material in this section should look familiar. We have de-
veloped similar equations for lines in Sections 3.3 and 3.6, and the
distance of a point to a plane was introduced in Section 8.4.
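A minimal NumPy sketch of (11.2) (the function name is ours); it assumes a normalized plane equation and reproduces Example 11.1 once the plane is rescaled accordingly.

import numpy as np

def point_plane_distance(p, n, c):
    # Plane in implicit form n.x + c = 0 with n normalized; returns signed distance t and closest point q.
    t = c + np.dot(n, p)        # Equation (11.2)
    q = p - t * n
    return t, q

# Example 11.1: plane x1 + x2 + x3 - 1 = 0, point p = (2, 2, 3).
n = np.array([1.0, 1.0, 1.0])
c = -1.0
scale = np.linalg.norm(n)
d, q = point_plane_distance(np.array([2.0, 2.0, 3.0]), n / scale, c / scale)
print(d, q)                      # d = 2*sqrt(3) (about 3.46), q = (0, 0, 1)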

11.2 Distance Between Two Lines


Two 3D lines typically do not meet—such lines are called skew. It
might be of interest to know how close they are to meeting; in other
words, what is the distance between the lines? See Sketch 11.4 for an
illustration.
Let the two lines l1 and l2 be given by

x1 (s1 ) = p1 + s1 v1 , and
x2 (s2 ) = p2 + s2 v2 ,

respectively. Let x1 be the point on l1 closest to l2 , also let x2 be the


point on l2 closest to l1 . It should be clear that the vector x2 − x1 is
perpendicular to both l1 and l2 . Thus

    [x2 − x1] · v1 = 0,
    [x2 − x1] · v2 = 0,

or

    [p2 − p1] · v1 = s1 v1 · v1 − s2 v1 · v2,
    [p2 − p1] · v2 = s1 v1 · v2 − s2 v2 · v2.

These are two equations in the two unknowns s1 and s2 , and are thus
readily solved using the methods from Chapter 5.

Example 11.2

Let l1 be given by

    x1(s1) = [0, 0, 0]T + s1 [1, 0, 0]T.

This means, of course, that l1 is the e1-axis. For l2, we assume

    x2(s2) = [0, 1, 1]T + s2 [0, 1, 0]T.

This line is parallel to the e2-axis; both lines are shown in Sketch 11.4. Our linear system becomes

    ⎡1   0⎤ ⎡s1⎤   ⎡0⎤
    ⎣0  −1⎦ ⎣s2⎦ = ⎣1⎦

with solution s1 = 0 and s2 = −1. Inserting these values, we have

    x1(0) = [0, 0, 0]T   and   x2(−1) = [0, 0, 1]T.

These are the two points of closest proximity; the distance between the lines is ‖x1(0) − x2(−1)‖ = 1.

Sketch 11.4.
Distance between two lines.
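The same computation in NumPy (a sketch; the function name is ours), setting up the 2 × 2 system directly from the two perpendicularity conditions:

import numpy as np

def closest_points(p1, v1, p2, v2):
    # Solve for s1, s2 such that x2(s2) - x1(s1) is perpendicular to both directions.
    A = np.array([[np.dot(v1, v1), -np.dot(v1, v2)],
                  [np.dot(v1, v2), -np.dot(v2, v2)]])
    b = np.array([np.dot(p2 - p1, v1), np.dot(p2 - p1, v2)])
    s1, s2 = np.linalg.solve(A, b)
    return p1 + s1 * v1, p2 + s2 * v2

# Example 11.2: l1 is the e1-axis, l2 is parallel to the e2-axis.
x1, x2 = closest_points(np.zeros(3), np.array([1.0, 0.0, 0.0]),
                        np.array([0.0, 1.0, 1.0]), np.array([0.0, 1.0, 0.0]))
print(x1, x2, np.linalg.norm(x1 - x2))   # (0,0,0), (0,0,1), distance 1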

Two 3D lines intersect if the two points x1 and x2 are identical.


However, floating point calculations can introduce round-off error, so
it is necessary to accept closeness within a tolerance: ‖x1 − x2‖² < tolerance.

A condition for two 3D lines to intersect is found from the obser-


vation that the three vectors v1 , v2 , p2 − p1 must be coplanar, or
linearly dependent. This would lead to the determinant condition
(see Section 9.8)
det[v1 , v2 , p2 − p1 ] = 0.
From a numerical viewpoint, it is safer to compare the distance
between the points x1 and x2 ; in the field of computer-aided design,
one usually has known tolerances (e.g., 0.001 mm) for distances. It is
much harder to come up with a meaningful tolerance for a determi-
nant since it describes a volume.

11.3 Lines and Planes: Intersections


One of the basic techniques in computer graphics is called ray tracing.
Figure 11.1 illustrates this technique. A scene is given as an assem-
bly of planes (usually restricted to triangles). A computer-generated
image needs to compute proper lighting; this is done by tracing light
rays through the scene. The ray intersects a plane, it is reflected, then it intersects the next plane, etc. Sketch 11.5 gives an example.

Sketch 11.5.
A ray is traced through a scene.

The basic problem to be solved is this: Given a plane P and a line l, what is their intersection point x? It is most convenient to represent the plane by assuming that we know a point q on it as well as its normal vector n (see Sketch 11.6). Then the unknown point x, being on P, must satisfy
[x − q] · n = 0. (11.3)
By definition, the intersection point is also on the line (the ray, in
computer graphics jargon), given by a point p and a vector v:

x = p + tv. (11.4)

At this point, we do not know the correct value for t; once we have
it, our problem is solved.
The solution is obtained by substituting the expression for x from (11.4) into (11.3):

    [p + tv − q] · n = 0.

Sketch 11.6.
A line and a plane.

Thus,

    [p − q] · n + tv · n = 0

and

    t = ([q − p] · n)/(v · n).     (11.5)

The intersection point x is now computed as

    x = p + (([q − p] · n)/(v · n)) v.     (11.6)
(This equation should look familiar as we encountered this problem
when examining projections in Section 10.4, but we include it here to
keep this chapter self-contained.)

Example 11.3

Take the plane

    x1 + x2 + x3 − 1 = 0

and the line

    p(t) = [1, 1, 2]T + t [0, 0, 1]T

as shown in Sketch 11.7.

Sketch 11.7.
Intersecting a line and a plane.

We need a point q on the plane; set x1 = x2 = 0 and solve for x3, resulting in x3 = 1. This amounts to intersecting the plane with the e3-axis. From (11.5), we find t = −3 and then (11.6) gives the intersection point as

    x = [1, 1, 2]T − 3 [0, 0, 1]T = [1, 1, −1]T.

It never hurts to carry out a sanity check: Verify that this x does indeed satisfy the plane equation.

A word of caution: In (11.5), we happily divide by the dot product


v · n—but that better not be zero!2 If it is, then the ray “grazes” the
plane, i.e., it is parallel to it, and no intersection exists.
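A short NumPy version of (11.5) and (11.6) (the function name is ours), with a tolerance guard for the grazing case; it reproduces Example 11.3.

import numpy as np

def intersect_line_plane(p, v, q, n, eps=1e-9):
    # Line p + t v against the plane through q with normal n, Equations (11.5) and (11.6).
    denom = np.dot(v, n)
    if abs(denom) < eps:          # the ray "grazes" the plane: no intersection
        return None
    t = np.dot(q - p, n) / denom
    return p + t * v

# Example 11.3: plane x1 + x2 + x3 - 1 = 0, so q = (0, 0, 1) and n = (1, 1, 1).
x = intersect_line_plane(np.array([1.0, 1.0, 2.0]), np.array([0.0, 0.0, 1.0]),
                         np.array([0.0, 0.0, 1.0]), np.array([1.0, 1.0, 1.0]))
print(x)   # [ 1.  1. -1.]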
The same problem—intersecting a plane with a line—may be solved
if the plane is given in parametric form (see (8.15)). Then the un-
known intersection point x must satisfy

x = q + u1 r1 + u2 r2 .
2 Keep in mind that real numbers are rarely equal to zero. A tolerance needs

to be used; 0.001 should work if both n and v are normalized. However, such
tolerances are driven by applications.

Since we know that x is also on the line l, we may set

p + tv = q + u1 r1 + u2 r2 .

This equation is short for three individual equations, one for each
coordinate. We thus have three equations in three unknowns u1 , u2 , t,
    [r1  r2  −v] [u1, u2, t]T = p − q,     (11.7)

and solve them with Gauss elimination. (The basic idea of this
method was presented in Chapter 5, and 3 × 3 and larger systems
are covered in Chapter 12.)

11.4 Intersecting a Triangle and a Line


A plane is, by definition, an unbounded object. In many applications,
planes are parts of objects; one is interested in only a small part of
a plane. For example, the six faces of a cube are bounded planes, so
are the four faces of a tetrahedron.
We will now examine the case of a 3D triangle as an example of a
bounded plane. If we intersect a 3D triangle with a line (a ray), then
we are not interested in an intersection point outside the triangle—
only an interior one will count.
Let the triangle be given by three points p1, p2, p3 and the line by a point p and a direction v (see Sketch 11.8). The plane may be written in parametric form as

    x(u1, u2) = p1 + u1(p2 − p1) + u2(p3 − p1).

Sketch 11.8.
Intersecting a triangle and a line.

We thus arrive at

p + tv = p1 + u1 (p2 − p1 ) + u2 (p3 − p1 ),

a linear system in the unknowns t, u1 , u2 . The solution is inside


the triangle if both u1 and u2 are between zero and one and their
sum is less than or equal to one. This is so since we may view
(u1 , u2 , 1 − u1 − u2 ) as barycentric coordinates of the triangle defined
by p2 , p3 , p1 , respectively. These are positive exactly for points in-
side the triangle. See Section 17.1 for an in-depth look at barycentric
coordinates in a triangle.
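A NumPy sketch of this linear system (the helper function and the sample triangle are our own, chosen only for illustration):

import numpy as np

def intersect_line_triangle(p, v, p1, p2, p3):
    # Solve p + t v = p1 + u1 (p2 - p1) + u2 (p3 - p1) for (u1, u2, t).
    A = np.column_stack((p2 - p1, p3 - p1, -v))
    u1, u2, t = np.linalg.solve(A, p - p1)
    inside = (0 <= u1 <= 1) and (0 <= u2 <= 1) and (u1 + u2 <= 1)
    return p + t * v if inside else None

p1, p2, p3 = np.array([0.0, 0, 0]), np.array([1.0, 0, 0]), np.array([0.0, 1, 0])
print(intersect_line_triangle(np.array([0.25, 0.25, 1.0]), np.array([0.0, 0, -1]), p1, p2, p3))
# [0.25 0.25 0.  ]  -- inside the triangle, so the intersection point is returned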

11.5 Reflections
The next problem is that of line or plane reflection. Given a point x on a plane P and an “incoming” direction v, what is the reflected or “outgoing” direction v'? See Sketch 11.9, where we look at the plane P “edge on.” We assume that v, v', and n are of unit length.

From physics, you might recall that the angle between v and the plane normal n must equal that of v' and n, except for a sign change. We conveniently record this fact using a dot product:

    −v · n = v' · n.     (11.8)

Sketch 11.9.
A reflection.

The normal vector n is thus the angle bisector of v and v'. From inspection of Sketch 11.9, we also infer the symmetry property

    cn = v' − v     (11.9)

for some real number c. This means that some multiple of the normal vector may be written as the sum v' + (−v).
We now solve (11.9) for v' and insert into (11.8):

    −v · n = [cn + v] · n,

and solve for c:

    c = −2v · n.     (11.10)

Here, we made use of the fact that n is a unit vector and thus n · n = 1. The reflected vector v' is now given by using our value for c in (11.9):

    v' = v − [2v · n]n.     (11.11)
In the special case of v being perpendicular to the plane, i.e., v = −n, we obtain v' = −v as expected. Also note that the point of reflection does not enter the equations at all.

We may rewrite (11.11) as

    v' = v − 2[vTn]n,

which in turn may be reformulated to

    v' = v − 2[nnT]v.     (11.12)

You see this after multiplying out all products involved. Note that nnT is an orthogonal projection matrix, just as we encountered in

Section 4.8. It is a symmetric matrix and an example of a dyadic


matrix; this type of matrix has rank one. Sketch 11.10 illustrates
the action of (11.12).
Now we are in a position to formulate a reflection as a linear map. It is of the form v' = Hv with

    H = I − 2nnT.     (11.13)

The matrix H is known as a Householder matrix. We’ll look at


this matrix again in Section 13.1 in the context of solving a linear
system.
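A tiny NumPy illustration (the data are ours) of the Householder matrix (11.13) and the equivalent form (11.11):

import numpy as np

n = np.array([0.0, 0.0, 1.0])               # unit plane normal
v = np.array([1.0, 0.0, -1.0])
v = v / np.linalg.norm(v)                   # unit "incoming" direction

H = np.eye(3) - 2.0 * np.outer(n, n)        # Householder matrix, Equation (11.13)
print(H @ v)                                # [0.707..., 0, 0.707...]: the reflected direction
print(v - 2.0 * np.dot(v, n) * n)           # Equation (11.11) gives the same vector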
Sketch 11.10.
Reflection.

In computer graphics, this reflection problem is an integral part of calculating the lighting (color) at each vertex. A brief introduction to the lighting model may be found in Section 8.6. The light is positioned somewhere on v, and after the light hits the point x, the reflected light travels in the direction of v'. We use a dot product to measure the
cosine of the angle between the direction of our eye (e − x) and the
reflection vector. The smaller the angle, the more reflected light hits
our eye. This type of lighting is called specular reflection; it is the
element of light that is dependent on our position and it produces a
highlight—a shiny spot.

11.6 Intersecting Three Planes

Suppose we are given three planes with implicit equations

n1 · x + c1 = 0,
n2 · x + c2 = 0,
n3 · x + c3 = 0.

Where do they intersect? The answer is some point x that lies on each of the planes. See Sketch 11.11 for an illustration. The solution is surprisingly simple; just condense the three plane equations into matrix form:

    ⎡n1T⎤ ⎡x1⎤   ⎡−c1⎤
    ⎢n2T⎥ ⎢x2⎥ = ⎢−c2⎥ .     (11.14)
    ⎣n3T⎦ ⎣x3⎦   ⎣−c3⎦

Sketch 11.11.
Intersecting three planes.

We have three equations in the three unknowns x1 , x2 , x3 .



Example 11.4

This example is shown in Sketch 11.12. The equations of the planes in that sketch are

    x1 + x3 = 1,   x3 = 1,   x2 = 2.

The linear system is

    ⎡1  0  1⎤ ⎡x1⎤   ⎡1⎤
    ⎢0  0  1⎥ ⎢x2⎥ = ⎢1⎥ .
    ⎣0  1  0⎦ ⎣x3⎦   ⎣2⎦

Sketch 11.12.
Intersecting three planes example.

Solving it by Gauss elimination (see Chapter 12), we obtain

    [x1, x2, x3]T = [0, 2, 1]T.
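The same computation in NumPy (variable names ours), stacking the normal vectors as rows as in (11.14):

import numpy as np

# Example 11.4: planes x1 + x3 = 1, x3 = 1, x2 = 2, written as n.x + c = 0.
N = np.array([[1.0, 0.0, 1.0],     # rows are the normal vectors n1, n2, n3
              [0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
c = np.array([-1.0, -1.0, -2.0])

x = np.linalg.solve(N, -c)         # Equation (11.14)
print(x)                           # [0. 2. 1.]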

While simple to solve, the three-planes problem does not always


have a solution. Two lines in 2D do not intersect if they are parallel, or
in other words, their normal vectors are parallel or linearly dependent.
The situation is analogous in 3D. If the normal vectors n1 , n2 , n3
are linearly dependent, then there is no solution to the intersection
problem.

Example 11.5

The normal vectors are

    n1 = [1, 0, 0]T,   n2 = [1, 0, 1]T,   n3 = [0, 0, 1]T.

Since n2 = n1 + n3, they are indeed linearly dependent, and thus the planes defined by them do not intersect in one point (see Sketch 11.13).

Sketch 11.13.
Three planes intersecting in one line.

11.7 Intersecting Two Planes


Odd as it may seem, intersecting two planes is harder than intersect-
ing three of them. The problem is this: Two planes are given in their
implicit form
n · x + c = 0,
m · x + d = 0.
Find their intersection, which is a line. We would like the solution to
be of the form
x(t) = p + tv.
This situation is depicted in Sketch 11.14.
The direction vector v of this line is easily found; since it lies in each of the planes, it must be perpendicular to both their normal vectors:

    v = n ∧ m.

Sketch 11.14.
Intersecting two planes.
We still need a point p on the line.
To this end, we come up with an auxiliary plane that intersects
both given planes. The intersection point is clearly on the desired
line. Define the third plane by
v · x = 0.
This plane passes through the origin and has normal vector v; that
is, it is perpendicular to the desired line (see Sketch 11.15).
We now solve the three-plane intersection problem for the two given
planes and the auxiliary plane for the missing point p, and our line
is determined.
In the case c = d = 0, both given planes pass through the origin,
and it can serve as the point p.
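A NumPy sketch of this construction (the function name and the example planes are ours): the direction comes from the cross product, the point from the auxiliary three-plane system.

import numpy as np

def intersect_two_planes(n, c, m, d):
    # Planes n.x + c = 0 and m.x + d = 0; returns a point p and direction v of their intersection line.
    v = np.cross(n, m)
    # Auxiliary plane v.x = 0 through the origin; solve the three-plane system as in (11.14).
    N = np.vstack((n, m, v))
    p = np.linalg.solve(N, np.array([-c, -d, 0.0]))
    return p, v

# Hypothetical example: the planes x1 = 1 and x2 = 2.
p, v = intersect_two_planes(np.array([1.0, 0, 0]), -1.0, np.array([0.0, 1, 0]), -2.0)
print(p, v)   # p = (1, 2, 0), v = (0, 0, 1)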

11.8 Creating Orthonormal Coordinate Systems


Often times when working in 3D, life is made easier by creating a local coordinate system. We have seen one example of this already: digitizing. In Section 1.5 we alluded to a coordinate frame as a means to “capture” a cat. Let's look at that example as a motivation for creating an orthonormal coordinate system with the Gram-Schmidt method.

Sketch 11.15.
The auxiliary plane (shaded).
The digitizer needs an orthonormal coordinate frame in order to
store coordinates for the cat. The digitizer comes with its own frame—
the [e1 , e2 , e3 ]-coordinate frame, stored internally with respect to its

base and arm, but as a user, you may choose your own frame of refer-
ence that we will call the [b1 , b2 , b3 ]-coordinate frame. For example,
when using this digitized model, it might be convenient to have the
b1 axis generally aligned with the head to tail direction. Placing the
origin of your coordinate frame close to the cat is also advantageous
for numerical stability with respect to round-off error and accuracy.
(The digitizer has its highest accuracy within a small radius from its
base.) An orthonormal frame facilitates data capture and the trans-
formation between coordinate frames. It is highly unlikely that you
can manually create an orthonormal frame, so the digitizer will do it
for you, but it needs some help.
With the digitizing arm, let’s choose the point p to be the origin
of our input coordinate frame, the point q will be along the b1 axis,
and the point r will be close to the b2 axis. From these three points,
we form two vectors,

v1 = q − p and v2 = r − p.

A simple cross product will supply us with a vector normal to the


table: v3 = v1 ∧ v2 . Now we can state the problem: given three
linearly independent vectors v1 , v2 , v3 , find a close orthonormal set
of vectors b1 , b2 , b3 .
Orthogonal projections and orthogonal components, as described in Section 2.8, are the foundation of this method. We will refer to the subspace formed by vi as Vi and the subspace formed by v1, v2 as V12. As we build an orthogonal frame, we normalize all the vectors. A notational shorthand that is helpful in making equations easier to read: if we normalize a vector w, we will write w/‖·‖. Normalizing v1:

    b1 = v1/‖·‖.     (11.15)
Next, create b2 from the component of v2 that is orthogonal to the subspace V1, which means we normalize (v2 − projV1 v2):

    b2 = (v2 − (v2 · b1)b1)/‖·‖.     (11.16)

As the last step, we create b3 from the component of v3 that is orthogonal to the subspace V12, which means we normalize (v3 − projV12 v3). This is done by separating the projection into the sum of a projection onto V1 and onto V2:

    b3 = (v3 − (v3 · b1)b1 − (v3 · b2)b2)/‖·‖.     (11.17)

Example 11.6

As illustrated in Sketch 11.16, we are given

    v1 = [0, −2, 0]T,   v2 = [2, −1, 0]T,   v3 = [2, −0.5, 2]T,

and in this example, unlike the scenario in the digitizing application, v3 is not the result of the cross product of v1 and v2, so the example can better illustrate each projection. Normalizing v1,

    b1 = v1/‖·‖ = [0, −1, 0]T.

The projection of v2 into the subspace V1 is

    u = projV1 v2 = (v2 · b1)b1 = [0, −1, 0]T,

thus b2 is the component of v2 that is orthogonal to u, normalized:

    b2 = (v2 − u)/‖·‖ = [1, 0, 0]T.

The projection of v3 into the subspace V12 is computed as

    w = projV12 v3 = projV1 v3 + projV2 v3
      = (v3 · b1)b1 + (v3 · b2)b2
      = [2, −0.5, 0]T.

Sketch 11.16.
3D Gram-Schmidt orthonormalization.

Then b3 is the component of v3 orthogonal to w, normalized:

    b3 = (v3 − w)/‖·‖ = [0, 0, 1]T.

As you might have observed, in 3D the Gram-Schmidt method


is more work in terms of multiplications and additions than simply
applying the cross product repeatedly. For example, if we form b3 =
b1 ∧ b2 , we get a normalized vector for free, and thus we save a
square root, which is an expensive operation in terms of operating
system cycles. The real advantage of the Gram-Schmidt method is
for dimensions higher than three, where we don’t have cross products.
However, understanding the process in 3D makes the n-dimensional
formulas easier to follow. A much more general version is discussed
in Section 14.4.
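For completeness, a NumPy sketch of (11.15)–(11.17) (the function name is ours), reproducing Example 11.6:

import numpy as np

def gram_schmidt_3d(v1, v2, v3):
    # Equations (11.15)-(11.17): orthonormalize three linearly independent vectors.
    b1 = v1 / np.linalg.norm(v1)
    u2 = v2 - np.dot(v2, b1) * b1                          # component of v2 orthogonal to V1
    b2 = u2 / np.linalg.norm(u2)
    u3 = v3 - np.dot(v3, b1) * b1 - np.dot(v3, b2) * b2    # component of v3 orthogonal to V12
    b3 = u3 / np.linalg.norm(u3)
    return b1, b2, b3

b1, b2, b3 = gram_schmidt_3d(np.array([0.0, -2.0, 0.0]),
                             np.array([2.0, -1.0, 0.0]),
                             np.array([2.0, -0.5, 2.0]))
print(b1, b2, b3)   # [0,-1,0], [1,0,0], [0,0,1]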

• distance between a point and plane
• distance between two lines
• plane and line intersection
• triangle and line intersection
• reflection vector
• Householder matrix
• intersection of three planes
• intersection of two planes
• Gram-Schmidt orthonormalization

11.9 Exercises
For some of the exercises that follow, we will refer to the following two
planes and line. P1 goes through a point p and has normal vector n:
    p = [1, 2, 0]T,   n = [−1, 0, 0]T.

The plane P2 is given by its implicit form

    x1 + 2x2 − 2x3 − 1 = 0.

The line l goes through the point q and has direction v,

    q = [−1, 2, 0]T,   v = [0, 1, 0]T.

1. Find the distance of the point

    p = [0, 0, 2]T

to the plane P2 . Is this point on the same side as the normal direction?
2. Given the plane (1/√3)x1 + (1/√3)x2 + (1/√3)x3 + (1/√3) = 0 and point p = [1 1 4]T, what is the distance of p to the plane? What is the closest point to p in the plane?
3. Revisit Example 11.2, but set the point defining the line l2 to be

    p2 = [0, 0, 1]T.

The lines have not changed; how do you obtain the (unchanged) solutions x1 and x2?
4. Given two lines, xi(si) = pi + si vi, i = 1, 2, where

    x1(s1) = [1, 1, 0]T + s1 [−1, −1, 1]T   and   x2(s2) = [1, 1, 1]T + s2 [−1, −1, −1]T,

find the two points of closest proximity between the lines. What are the parameter values s1 and s2 for these two points?
5. Find the intersection of P1 with the line l.
6. Find the intersection of P2 with the line l.
7. Let a be an arbitrary vector. It may be projected along a direction v
onto the plane P that passes through the origin with normal vector n.
What is its image a ?
8. Does the ray p + tv with

    p = [−1, −1, 0]T   and   v = [1, 1, 1]T

intersect the triangle with vertices

    [3, 0, 0]T,   [0, 2, 1]T,   [2, 2, 3]T?

Solve this by setting up a linear system. Hint: t = 3.



9. Does the ray p + tv with

    p = [2, 2, 2]T   and   v = [−1, −1, −2]T

intersect the triangle with vertices

    [0, 0, 1]T,   [1, 0, 0]T,   [0, 1, 0]T?

Hint: To solve the linear system, first express u and v in terms of t in order to solve for t first.
10. Given the point p in the plane P1, what is the reflected direction of the vector

    v = [1/3, 2/3, −2/3]T?
11. Given the ray defined by q + tv, where

    q = [0, 0, 1]T   and   v = [0, 0, −1]T,

find its reflection in the plane P2 by defining the intersection point p in the plane and the reflection vector v'.
12. Find the intersection of the three planes:

x1 + x2 = 1, x1 = 1, x3 = 4.

13. Find the intersection of the three planes

    P1(u, v) = [0, 0, 0]T + u [1, 0, 0]T + v [0, 1, 0]T,
    P2(u, v) = [1, 1, 0]T + u [−1, 0, 0]T + v [0, 0, 1]T,
    P3(u, v) = [0, 0, 0]T + u [1, 1, 1]T + v [0, 0, 1]T.

14. Find the intersection of the planes:

x1 = 1 and x3 = 4.

15. Find the intersection of the planes:

x1 − x2 = 0 and x1 + x2 = 2.

16. Given vectors

    v1 = [1, 0, 0]T,   v2 = [1, 1, 0]T,   v3 = [−1, −1, 1]T,
carry out the Gram-Schmidt orthonormalization.


17. Given the vectors vi in Exercise 16, what are the cross products that
will produce the orthonormal vectors bi ?
12
Gauss for Linear Systems

Figure 12.1.
Linear systems: the triangulation on the right was obtained from the left one by solving
a linear system.

In Chapter 5, we studied linear systems of two equations in two un-


knowns. A whole chapter for such a humble task seems like a bit of
overkill—its main purpose was really to lay the groundwork for this
chapter.
Linear systems arise in virtually every area of science and engineer-
ing—some are as big as 1,000,000 equations in as many unknowns.
Such huge systems require more sophisticated treatment than the
methods introduced here. They will allow you to solve systems with
several thousand equations without a problem.


Figure 12.1 illustrates the use of linear systems in the field of data
smoothing. The left triangulation looks somewhat “rough”; after
setting up an appropriate linear system, we compute the “smoother”
triangulation on the right, in which the triangles are closer to being
equilateral.
This chapter explains the basic ideas underlying linear systems.
Readers eager for hands-on experience should get access to software
such as Mathematica or MATLAB. Readers with advanced program-
ming knowledge can download linear system solvers from the web.
The most prominent collection of routines is LAPACK.

12.1 The Problem


A linear system is a set of equations like this:
3u1 − 2u2 − 10u3 + u4 = 0
u1 − u3 = 4
u1 + u2 − 2u3 + 3u4 = 1
u2 + 2u4 = −4.
The unknowns are the numbers u1 , u2 , u3 , u4 . There are as many
equations as there are unknowns, four in this example. We rewrite
this 4 × 4 linear system in matrix form:
    ⎡3  −2  −10  1⎤ ⎡u1⎤   ⎡ 0⎤
    ⎢1   0   −1  0⎥ ⎢u2⎥ = ⎢ 4⎥ .
    ⎢1   1   −2  3⎥ ⎢u3⎥   ⎢ 1⎥
    ⎣0   1    0  2⎦ ⎣u4⎦   ⎣−4⎦
A general n × n linear system looks like this:
a1,1 u1 + a1,2 u2 + . . . + a1,n un = b1
a2,1 u1 + a2,2 u2 + . . . + a2,n un = b2
..
.
an,1 u1 + an,2 u2 + . . . + an,n un = bn .
In matrix form, it becomes

    ⎡a1,1  a1,2  . . .  a1,n⎤ ⎡u1⎤   ⎡b1⎤
    ⎢a2,1  a2,2  . . .  a2,n⎥ ⎢u2⎥   ⎢b2⎥
    ⎢ ...                ...⎥ ⎢...⎥ = ⎢...⎥ ,     (12.1)
    ⎣an,1  an,2  . . .  an,n⎦ ⎣un⎦   ⎣bn⎦

 
    [a1  a2  . . .  an] u = b,

or even shorter

    Au = b.
The coefficient matrix A has n rows and n columns. For example, the
first row is
a1,1 , a1,2 , . . . , a1,n ,
and the second column is
a1,2
a2,2
..
.
an,2 .

Sketch 12.1.
A solvable 3×3 system.

Equation (12.1) is a compact way of writing n equations for the n unknowns u1, . . ., un. In the 2 × 2 case, such systems had nice geometric interpretations; in the general case, that interpretation needs n-dimensional linear spaces, and is not very intuitive. Still, the methods that we developed for the 2 × 2 case can be gainfully employed here!
Some underlying principles with a geometric interpretation are best
explained for the example n = 3. We are given a vector b and we try
to write it as a linear combination of vectors a1 , a2 , a3 ,
 
    [a1  a2  a3] u = b.

If the ai are truly 3D, i.e., if they form a tetrahedron, then a unique
solution may be found (see Sketch 12.1). But if the three ai all lie in
a plane (i.e., if the volume formed by them is zero), then you cannot
write b as a linear combination of them, unless it is itself in that 2D
plane. In this case, you cannot expect uniqueness for your answer.
Sketch 12.2 covers these cases. In general, a linear system is uniquely
solvable if the ai have a nonzero n-dimensional volume. If they do
not, they span a k-dimensional subspace (with k < n)—nonunique
solutions exist only if b is itself in that subspace. A linear system is
called consistent if at least one solution exists.
For 2D and 3D, we encountered many problems that lent themselves to constructing the linear system in terms of a linear combination of column vectors, the ai. However, in Section 5.11 we looked at how a linear system can be interpreted as a problem using equations built row by row. In n dimensions, this commonly occurs. An example follows.

Sketch 12.2.
Top: no solution, bottom: nonunique solution.

Example 12.1

Suppose that at five time instances, say ti = 0, 0.25, 0.5, 0.75, 1 sec-
onds, we have associated observation data, p(ti ) = 0, 1, 0.5, 0.5, 0. We
would like to fit a polynomial to these data so we can estimate values
in between the observations. This is called polynomial interpolation.
Five points require a degree four polynomial,

    p(t) = c0 + c1 t + c2 t² + c3 t³ + c4 t⁴,

which has five coefficients ci. Immediately we see that we can write down five equations,

    p(ti) = c0 + c1 ti + c2 ti² + c3 ti³ + c4 ti⁴,   i = 0, . . ., 4,

or in matrix form,

    ⎡1  t0  t0²  t0³  t0⁴⎤ ⎡c0⎤   ⎡p(t0)⎤
    ⎢1  t1  t1²  t1³  t1⁴⎥ ⎢c1⎥   ⎢p(t1)⎥
    ⎢        ...         ⎥ ⎢...⎥ = ⎢ ... ⎥ .
    ⎣1  t4  t4²  t4³  t4⁴⎦ ⎣c4⎦   ⎣p(t4)⎦

Figure 12.2 illustrates the result, p(t) = 12.667t − 50t² + 69.33t³ − 32t⁴, for the given data.
Figure 12.2.
Polynomial interpolation: a degree four polynomial fit to five data points.
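The same fit in NumPy (a sketch; variable names ours): build the coefficient matrix row by row and solve for the ci.

import numpy as np

t = np.array([0.0, 0.25, 0.5, 0.75, 1.0])    # time instances
p = np.array([0.0, 1.0, 0.5, 0.5, 0.0])      # observed data p(t_i)

A = np.vander(t, 5, increasing=True)         # rows are [1, t_i, t_i^2, t_i^3, t_i^4]
c = np.linalg.solve(A, p)                    # coefficients c_0, ..., c_4
print(np.round(c, 3))                        # [  0.    12.667 -50.    69.333 -32.  ]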

12.2 The Solution via Gauss Elimination


The key to success in the 2 × 2 case was the application of a shear
(forward elimination) so that the matrix A was transformed to upper
triangular, meaning all entries below the diagonal are zero. Then it
was possible to apply back substitution to solve for the unknowns. A
shear was constructed to map the first column vector of the matrix
onto the e1 -axis. Revisiting an example from Chapter 5, we set
    
    ⎡2  4⎤ ⎡u1⎤   ⎡4⎤
    ⎣1  6⎦ ⎣u2⎦ = ⎣4⎦ .     (12.2)

The shear used was

    S1 = ⎡  1    0⎤
         ⎣−1/2   1⎦ ,

which when applied to the system as

    S1 Au = S1 b

produced the system

    ⎡2  4⎤ ⎡u1⎤   ⎡4⎤
    ⎣0  4⎦ ⎣u2⎦ = ⎣2⎦ .

Algebraically, what this shear did was to change the rows of the system in the following manner:

    row1 ← row1   and   row2 ← row2 − (1/2) row1.

Each of these constitutes an elementary row operation. Back substitution came next, with

    u2 = (1/4) × 2 = 1/2

and then

    u1 = (1/2)(4 − 4u2) = 1.

The divisions in the back substitution equations are actually scalings, thus they could be rewritten in terms of a scale matrix:

    S2 = ⎡1/2   0 ⎤
         ⎣ 0   1/4⎦ ,

and then the system would be transformed via

    S2 S1 Au = S2 S1 b.

The corresponding upper triangular matrix and system is

    ⎡1  2⎤ ⎡u1⎤   ⎡ 2 ⎤
    ⎣0  1⎦ ⎣u2⎦ = ⎣1/2⎦ .
Check for yourself that we get the same result.
Thus, we see the geometric steps for solving a linear system have
methodical algebraic interpretations. We will be following this alge-
braic approach for the rest of the chapter. For general linear systems,
the matrices, such as S1 and S2 above, are not actually constructed
due to speed and storage expense. Notice that the shear to zero one
element in the matrix, changed the elements in only one row, thus it
is unnecessary to manipulate the other rows. This is an important
observation for large systems.
In the general case, just as in the 2 × 2 case, pivoting will be used.
Recall for the 2 × 2 case this meant that the equations were reordered
such that the (pivot) matrix element a1,1 is the largest one in the first
column. A row exchange can be represented in terms of a permutation
matrix. Suppose the 2 × 2 system in (12.2) was instead,
    
    ⎡1  6⎤ ⎡u1⎤   ⎡4⎤
    ⎣2  4⎦ ⎣u2⎦ = ⎣4⎦ ,

requiring pivoting as the first step. The permutation matrix that will exchange the two rows is

    P1 = ⎡0  1⎤
         ⎣1  0⎦ ,
which is the identity matrix with the rows (columns) exchanged. After
this row exchange, the steps for solving P1 Au = P1 b are the same as
for the system in (12.2): S2 S1 P1 Au = S2 S1 P1 b.

Example 12.2

Let’s step through the necessary row exchanges and shears for a 3 × 3
linear system. The goal is to get it in upper triangular form so we
may use back substitution to solve for the unknowns. The system is
    ⎡2  −2   0⎤ ⎡u1⎤   ⎡ 4⎤
    ⎢4   0  −2⎥ ⎢u2⎥ = ⎢−2⎥ .
    ⎣4   2  −4⎦ ⎣u3⎦   ⎣ 0⎦

The matrix element a1,1 is not the largest in the first column, so we
choose the 4 in the second row to be the pivot element and we reorder
the rows:
    ⎡4   0  −2⎤ ⎡u1⎤   ⎡−2⎤
    ⎢2  −2   0⎥ ⎢u2⎥ = ⎢ 4⎥ .
    ⎣4   2  −4⎦ ⎣u3⎦   ⎣ 0⎦

The permutation matrix that achieves this row exchange is

    P1 = ⎡0  1  0⎤
         ⎢1  0  0⎥ .
         ⎣0  0  1⎦

(The subscript 1 indicates that this matrix is designed to achieve the appropriate exchange for the first column.)

To zero entries in the first column apply:

    row2 ← row2 − (1/2) row1
    row3 ← row3 − row1,

and the system becomes

    ⎡4   0  −2⎤ ⎡u1⎤   ⎡−2⎤
    ⎢0  −2   1⎥ ⎢u2⎥ = ⎢ 5⎥ .
    ⎣0   2  −2⎦ ⎣u3⎦   ⎣ 2⎦

The shear matrix that achieves this is

    G1 = ⎡  1   0  0⎤
         ⎢−1/2  1  0⎥ .
         ⎣ −1   0  1⎦

Now the first column consists of only zeroes except for a1,1, meaning that it is lined up with the e1-axis.

Now work on the second column vector. First, check if pivoting is necessary; this means checking that a2,2 is the largest in absolute value of all values in the second column that are below the diagonal. No pivoting is necessary. (We could say that the permutation matrix P2 = I.) To zero the last element in this vector apply

    row3 ← row3 + row2,

which produces

    ⎡4   0  −2⎤ ⎡u1⎤   ⎡−2⎤
    ⎢0  −2   1⎥ ⎢u2⎥ = ⎢ 5⎥ .
    ⎣0   0  −1⎦ ⎣u3⎦   ⎣ 7⎦

The shear matrix that achieves this is

    G2 = ⎡1  0  0⎤
         ⎢0  1  0⎥ .
         ⎣0  1  1⎦
By chance, the second column is aligned with e2 because a1,2 = 0. If
this extra zero had not appeared, then the last operation would have
mapped this 3D vector into the [e1 , e2 ]-plane.
Now that we have the matrix in upper triangular form, we are ready for back substitution:

    u3 = (1/(−1)) (7)
    u2 = (1/(−2)) (5 − u3)
    u1 = (1/4) (−2 + 2u3).

This implicitly incorporates a scaling matrix. We obtain the solution

    [u1, u2, u3]T = [−4, −6, −7]T.
It is usually a good idea to insert the solution into the original equations:

    ⎡2  −2   0⎤ ⎡−4⎤   ⎡ 4⎤
    ⎢4   0  −2⎥ ⎢−6⎥ = ⎢−2⎥ .
    ⎣4   2  −4⎦ ⎣−7⎦   ⎣ 0⎦
It works!

The example above illustrates each of the elementary row opera-


tions that take place during Gauss elimination:
• Pivoting results in the exchange of two rows.
• Shears result in adding a multiple of one row to another.
• Scaling results in multiplying a row by a scalar.
Gauss elimination for solving a linear system consists of two basic
steps: forward elimination (pivoting and shears) and then back sub-
stitution (scaling). Here is the algorithm for solving a general n × n
system of linear equations.

Gauss Elimination with Pivoting

Given: An n × n coefficient matrix A and a n × 1 right-hand side b


describing a linear system

Au = b,

which is short for the more detailed (12.1).

Find: The unknowns u1 , . . . , un .

Algorithm:

Initialize the n × n matrix G = I.


For j = 1, . . . , n − 1 (j counts columns)

Pivoting step:
Find the element with the largest absolute value in column j
from aj,j to an,j ; this is element ar,j .
If r > j, exchange equations r and j.

If aj,j = 0, the system is not solvable.

Forward elimination step for column j:


For i = j + 1, . . . , n (elements below diagonal of column j)
Construct the multiplier gi,j = ai,j /aj,j
ai,j = 0
For k = j + 1, . . . , n (each element in row i after
column j)
ai,k = ai,k − gi,j aj,k
bi = bi − gi,j bj

At this point, all elements below the diagonal have been set to
zero. The matrix is now in upper triangular form.

Back substitution:
un = bn /an,n
For j = n − 1, . . . , 1
uj = (1/aj,j) [bj − aj,j+1 uj+1 − . . . − aj,n un].
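A compact NumPy rendering of this algorithm (the function name is ours), kept close to the pseudocode above rather than tuned for speed; it reproduces Example 12.2.

import numpy as np

def gauss_solve(A, b):
    # Gauss elimination with partial pivoting, followed by back substitution.
    A = A.astype(float)
    b = b.astype(float)
    n = len(b)
    for j in range(n - 1):                        # loop over columns
        r = j + np.argmax(np.abs(A[j:, j]))       # pivoting: row with largest |a_ij|
        if r != j:
            A[[j, r]] = A[[r, j]]
            b[[j, r]] = b[[r, j]]
        if A[j, j] == 0.0:
            raise ValueError("the system is not solvable")
        for i in range(j + 1, n):                 # forward elimination for column j
            g = A[i, j] / A[j, j]
            A[i, j:] -= g * A[j, j:]
            b[i] -= g * b[j]
    u = np.zeros(n)                               # back substitution
    for j in range(n - 1, -1, -1):
        u[j] = (b[j] - A[j, j + 1:] @ u[j + 1:]) / A[j, j]
    return u

A = np.array([[2, -2, 0], [4, 0, -2], [4, 2, -4]])   # Example 12.2
b = np.array([4, -2, 0])
print(gauss_solve(A, b))   # [-4. -6. -7.]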

In a programming environment, it can be convenient to form an


augmented matrix, which is the matrix A augmented with the vector
b. Here is the idea for a 3 × 3 linear system:
    ⎡a1,1  a1,2  a1,3  b1⎤
    ⎢a2,1  a2,2  a2,3  b2⎥ .
    ⎣a3,1  a3,2  a3,3  b3⎦
Then the k steps would run to n + 1, and there would be no need for
the extra line for the bi element.
As demonstrated in Example 12.2, the operations in the elimination
step above may be written in matrix form. If A is the current matrix,
then at step j, we first check if a row exchange is necessary. This
may be achieved with a permutation matrix, Pj . If no pivoting is
necessary, Pj = I. To produce zeroes under aj,j the matrix product
Gj A is formed, where
    Gj = ⎡1                          ⎤
         ⎢    ...                    ⎥
         ⎢         1                 ⎥
         ⎢            1              ⎥
         ⎢       −gj+1,j  1          ⎥     (12.3)
         ⎢         ...       ...     ⎥
         ⎣       −gn,j          1    ⎦ .
The elements −gi,j of Gj are the multipliers. The matrix Gj is called
a Gauss matrix. All entries except for the diagonal and the entries
−gi,j below the diagonal of the jth column are zero. We could store
all Pj and Gj in one matrix
G = Gn−1 Pn−1 · . . . · G2 · P2 · G1 · P1 . (12.4)
If no pivoting is required, then it is possible to store the gi,j in the zero
elements of A rather than explicitly setting the ai,j element equal to
zero. Regardless, it is more efficient with regard to speed and storage
not to multiply A and b by the permutation and Gauss matrices
because this would result in many unnecessary calculations.
To summarize, Gauss elimination with pivoting transforms the lin-
ear system Au = b into the system
GAu = Gb,
which has the same solution as the original system. The matrix GA
is upper triangular, and it is referred to as U . The diagonal elements
of U are the pivots.

Example 12.3

We look at another example, taken from [2]. Let the system be


given by
    ⎡2  2  0⎤ ⎡u1⎤   ⎡6⎤
    ⎢1  1  2⎥ ⎢u2⎥ = ⎢9⎥ .
    ⎣2  1  1⎦ ⎣u3⎦   ⎣7⎦
We start the algorithm with j = 1, and observe that no element in
column 1 exceeds a1,1 in absolute value, so no pivoting is necessary
at this step, thus P1 = I. Proceed with the elimination step for row
2 by constructing the multiplier

g2,1 = a2,1 /a1,1 = 1/2.

Change row 2 as follows:


1
row2 ← row2 − row1 .
2
Remember, this includes changing the element b2 . Similarly for row 3,

g3,1 = a3,1 /a1,1 = 2/2 = 1

then
row3 ← row3 − row1 .
Step j = 1 is complete and the linear system is now
⎡ ⎤⎡ ⎤ ⎡ ⎤
2 2 0 u1 6
⎣0 0 2⎦ ⎣u2 ⎦ = ⎣6⎦ .
0 −1 1 u3 1

The Gauss matrix for j = 1,

    G1 = ⎡  1   0  0⎤
         ⎢−1/2  1  0⎥ .
         ⎣ −1   0  1⎦

Next is column 2, so j = 2. Observe that a2,2 = 0, whereas a3,2 = −1. We exchange equations 2 and 3 and the system becomes

    ⎡2   2  0⎤ ⎡u1⎤   ⎡6⎤
    ⎢0  −1  1⎥ ⎢u2⎥ = ⎢1⎥ .     (12.5)
    ⎣0   0  2⎦ ⎣u3⎦   ⎣6⎦

The permutation matrix for this exchange is

    P2 = ⎡1  0  0⎤
         ⎢0  0  1⎥ .
         ⎣0  1  0⎦
If blindly following the algorithm above, we would proceed with the elimination for row 3 by forming the multiplier

    g3,2 = a3,2/a2,2 = 0/(−1) = 0.
Then operate on the third row

row3 ← row3 − 0 × row2 ,


which doesn’t change the row at all. (So we will record G2 = I.) Due
to numerical instabilities, g3,2 might not be exactly zero. Without
putting a special check for a zero multiplier, this unnecessary work
takes place. Tolerances are very important here.
Apply back substitution by first solving for the last unknown:
u3 = 3.

Start the back substitution loop with j = 2:

    u2 = (1/(−1)) [1 − u3] = 2,

and finally

    u1 = (1/2) [6 − 2u2] = 1.

It's a good idea to check the solution:

    ⎡2  2  0⎤ ⎡1⎤   ⎡6⎤
    ⎢1  1  2⎥ ⎢2⎥ = ⎢9⎥ .
    ⎣2  1  1⎦ ⎣3⎦   ⎣7⎦
The final matrix G = G2 P2 G1 P1 is

    G = ⎡  1   0  0⎤
        ⎢ −1   0  1⎥ .
        ⎣−1/2  1  0⎦

Check that GAu = Gb results in the linear system in (12.5).



Just before back substitution, we could scale to achieve ones along the diagonal of the matrix. Let's do precisely that to the linear system in Example 12.3. Multiply both sides of (12.5) by

    ⎡1/2   0    0 ⎤
    ⎢ 0   −1    0 ⎥ .
    ⎣ 0    0   1/2⎦

This transforms the linear system to

    ⎡1  1   0⎤ ⎡u1⎤   ⎡ 3⎤
    ⎢0  1  −1⎥ ⎢u2⎥ = ⎢−1⎥ .
    ⎣0  0   1⎦ ⎣u3⎦   ⎣ 3⎦

This upper triangular matrix with rank = n is said to be in row


echelon form. If the matrix is rank deficient, rank < n, then the
rows with all zeroes should be the last rows. Some definitions of row
echelon do not require ones along the diagonal, as we have here; it is
more efficient to do the scaling as part of back substitution.
Gauss elimination requires O(n3 ) operations.1 Thus this algorithm
is suitable for a system with thousands of equations, but not for a
system with millions of equations. When the system is very large,
often times many of the matrix elements are zero—a sparse linear
system. Iterative methods, which are introduced in Section 13.6, are
a better approach in this case.

12.3 Homogeneous Linear Systems

Let’s revisit the topic of Section 5.8, homogeneous linear systems,


which take the form
Au = 0.

The trivial solution is always an option, but of little interest. How


do we use Gauss elimination to find a nontrivial solution if it exists?
Once we have one nontrivial solution, all multiples cu are solutions
as well. The answer: slightly modify the back substitution step. An
example will make this clear.

1 Read this loosely as: an estimated number of n3 operations.



Example 12.4

Start with the homogeneous system

    ⎡1  2  3⎤     ⎡0⎤
    ⎢1  2  3⎥ u = ⎢0⎥ .
    ⎣1  2  3⎦     ⎣0⎦

The matrix clearly has rank one. First perform forward elimination, arriving at

    ⎡1  2  3⎤     ⎡0⎤
    ⎢0  0  0⎥ u = ⎢0⎥ .
    ⎣0  0  0⎦     ⎣0⎦

For each zero row of the transformed system, set the corresponding ui, the free variables, to one: u3 = 1 and u2 = 1. Back substituting these into the first equation gives u1 = −5 for the pivot variable. Thus a final solution is

    u = [−5, 1, 1]T,

and all vectors cu are solutions as well.


Since the 3 × 3 matrix is rank one, it has a two dimensional null
space. The number of free variables is equal to the dimension of the
null space. We can systematically construct two vectors u1 , u2 that
span the null space by setting one of the free variables to one and the
other to zero, resulting in
    u1 = [−3, 0, 1]T   and   u2 = [−2, 1, 0]T.

All linear combinations of elements of the null space are also in the
null space, for example, u = 1u1 + 1u2 .
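A brief NumPy check of these null space vectors (we simply verify the hand computation above):

import numpy as np

A = np.array([[1.0, 2.0, 3.0]] * 3)       # the rank-one matrix of Example 12.4
u1 = np.array([-3.0, 0.0, 1.0])
u2 = np.array([-2.0, 1.0, 0.0])

print(A @ u1, A @ u2)                     # both products are the zero vector
print(A @ (1.0 * u1 + 1.0 * u2))          # linear combinations stay in the null space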

Column pivoting might be required to ready the matrix for back


substitution.

Example 12.5

The homogeneous system

    ⎡0  6  3⎤     ⎡0⎤
    ⎢0  0  2⎥ u = ⎢0⎥
    ⎣0  0  0⎦     ⎣0⎦

comes from an eigenvector problem similar to those in Section 7.3.


(More on eigenvectors in higher dimensions in Chapter 15.)
The linear system in its existing form leads us to 0u3 = 0 and
2u3 = 0. To remedy this, we proceed with column exchanges:
    ⎡6  3  0⎤ ⎡u2⎤   ⎡0⎤
    ⎢0  2  0⎥ ⎢u3⎥ = ⎢0⎥ ,
    ⎣0  0  0⎦ ⎣u1⎦   ⎣0⎦

where column 1 was exchanged with column 2 and then column 2 was
exchanged with column 3. Each exchange requires that the associated
unknowns are exchanged as well. Set the free variable: u1 = 1, then
back substitution results in u3 = 0 and u2 = 0. All vectors
    c [1, 0, 0]T

satisfy the original homogeneous system.

12.4 Inverse Matrices


The inverse of a square matrix A is the matrix that “undoes” A’s
action, i.e., the combined action of A and A−1 is the identity:

AA−1 = I. (12.6)

We introduced the essentials of inverse matrices in Section 5.9 and


reviewed properties of the inverse in Section 9.11. In this section, we
introduce the inverse for n × n matrices and discuss inverses in the
context of solving linear systems, Gauss elimination, and LU decom-
position (covered in more detail in Section 12.5).

Example 12.6

The following scheme shows a matrix A multiplied by its inverse A−1 .


The matrix A is on the left, A−1 is on top, and the result of the
multiplication, the identity, is on the lower right:
1 0 −1
3 1 −3
1 2 −2
.
−4 2 −1 1 0 0
−3 1 0 0 1 0
−5 2 −1 0 0 1

How do we find the inverse of a matrix? In much the same way as


we did in the 2 × 2 case in Section 5.9, we write
   
A a1 . . . an = e1 . . . en . (12.7)

Here, the matrices are n × n, and the vectors ai as well as the ei are
vectors with n components. The vector ei has all zero entries except
for its ith component; it equals 1.
We may now interpret (12.7) as n linear systems:

Aa1 = e1 , ..., Aan = en . (12.8)

In Example 5.8, we applied shears and a scaling to transform A


into the identity matrix, and at the same time, the right-hand side
(e1 and e2 ) was transformed into A−1 . Now, with Gauss elimination
as per Section 12.2, we apply forward elimination to A and to each
of the ei . Then with back substitution, we solve for each of the ai
that form A−1 . However, as we will learn in Section 12.5, a more
economical solution is found with LU decomposition. This method is
tailored to solving multiple systems of equations that share the same
matrix A, but have different right-hand sides.
Inverse matrices are primarily a theoretical concept. They suggest
to solve a linear system Av = b by computing A−1 and then to set
v = A−1 b. Don’t do that! It is a very expensive way to solve a lin-
ear system; simple Gauss elimination or LU decomposition is much
cheaper. (Explicitly forming the inverse requires forward elimination,
n back substitution algorithms, and then a matrix-vector multiplica-
tion. On the other hand, Gauss elimination requires forward elimina-
tion and just one back substitution algorithm.)
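As a small illustration of this advice (using the system of Example 12.3), compare solving directly with forming the inverse first:

import numpy as np

A = np.array([[2.0, 2.0, 0.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 1.0]])
b = np.array([6.0, 9.0, 7.0])

v_direct  = np.linalg.solve(A, b)     # preferred: one forward elimination, one back substitution
v_via_inv = np.linalg.inv(A) @ b      # works, but solves n systems just to build the inverse
print(v_direct, v_via_inv)            # both give [1. 2. 3.]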

The inverse of a matrix A exists only if the matrix is square (n × n)


and the action of A does not reduce dimensionality, as in a projection.
This means that all columns of A must be linearly independent. There
is a simple way to see if a matrix A is invertible; just perform Gauss
elimination for the first of the linear systems in (12.8). If you are
able to transform A to upper triangular with all nonzero diagonal
elements, then A is invertible. Otherwise, it is said to be singular.
The term “nonzero” is to be taken with a grain of salt: real numbers
are (almost) never zero, and tolerances must be employed.
Again we encounter the concept of matrix rank. An invertible ma-
trix is said to have rank n or full rank. If a matrix reduces dimension-
ality by k, then it has rank n − k. The n × n identity matrix has rank
n; the zero matrix has rank 0. An example of a matrix that does not
have full rank is a projection. Review the structure of a 2D orthog-
onal projection in Section 4.8 and a 3D projection in Section 9.7 to
confirm this statement about the rank.

Example 12.7

We apply forward elimination to three 4 × 4 matrices to achieve row


echelon form:
    M1 = ⎡1  3  −3  0⎤
         ⎢0  3   3  1⎥
         ⎢0  0   0  0⎥ ,
         ⎣0  0   0  0⎦

    M2 = ⎡1  3  −3  0⎤
         ⎢0  3   3  1⎥
         ⎢0  0  −1  0⎥ ,
         ⎣0  0   0  0⎦

    M3 = ⎡1  3  −3  0⎤
         ⎢0  3   3  1⎥
         ⎢0  0  −1  0⎥ .
         ⎣0  0   0  2⎦

M1 has rank 2, M2 has rank 3, and M3 has rank 4 or full rank.



Example 12.8

Let us compute the inverse of the n × n matrix Gj as defined in (12.3):

    Gj⁻¹ = ⎡1                        ⎤
           ⎢    ...                  ⎥
           ⎢         1               ⎥
           ⎢            1            ⎥
           ⎢       gj+1,j  1         ⎥
           ⎢         ...      ...    ⎥
           ⎣       gn,j         1    ⎦ .

That’s simple! To make some geometric sense of this, you should realize that Gj is a shear, and so is Gj⁻¹, and it “undoes” Gj.

Here is another interesting property of the inverse of a matrix. Suppose k ≠ 0 and kA is an invertible matrix; then

    (kA)⁻¹ = (1/k) A⁻¹.

And yet another: If two matrices, A and B, are invertible, then the product AB is invertible, too.

12.5 LU Decomposition
Gauss elimination has two major parts: transforming the system to
upper triangular form with forward elimination and back substitution.
The creation of the upper triangular matrix may be written in terms
of matrix multiplications using Gauss matrices Gj . For now, assume
that no pivoting is necessary. If we denote the final upper triangular
matrix by U , then we have

Gn−1 · . . . · G1 · A = U. (12.9)

It follows that

    A = G1⁻¹ · . . . · Gn−1⁻¹ U.

The neat thing about the product G1⁻¹ · . . . · Gn−1⁻¹ is that it is a lower triangular matrix with elements gi,j below the diagonal and zeroes

above the diagonal:

    G1⁻¹ · . . . · Gn−1⁻¹ = ⎡1                         ⎤
                            ⎢g2,1   1                  ⎥
                            ⎢ ...        ...           ⎥ .
                            ⎣gn,1   ···  gn,n−1    1   ⎦
We denote this product by L (for lower triangular). Thus,

A = LU, (12.10)
which is known as the LU decomposition of A. It is also called the
triangular factorization of A. Every invertible matrix A has such a
decomposition, although it may be necessary to employ pivoting.
Denote the elements of L by li,j (keeping in mind that li,i = 1) and
those of U by ui,j . A simple 3 × 3 example will help illustrate the
idea.
                          u1,1  u1,2  u1,3
                           0    u2,2  u2,3
                           0     0    u3,3

    1     0     0         a1,1  a1,2  a1,3
    l2,1  1     0         a2,1  a2,2  a2,3
    l3,1  l3,2  1         a3,1  a3,2  a3,3
In this scheme, we are given the ai,j and we want the li,j and ui,j .
This is systematically achieved using the following.
Observe that elements of A below the diagonal may be rewritten as
\[ a_{i,j} = l_{i,1}u_{1,j} + \ldots + l_{i,j-1}u_{j-1,j} + l_{i,j}u_{j,j}; \quad j < i. \]
For the elements of A that are on or above the diagonal, we get
\[ a_{i,j} = l_{i,1}u_{1,j} + \ldots + l_{i,i-1}u_{i-1,j} + l_{i,i}u_{i,j}; \quad j \ge i. \]
This leads to
\[ l_{i,j} = \frac{1}{u_{j,j}}\bigl(a_{i,j} - l_{i,1}u_{1,j} - \ldots - l_{i,j-1}u_{j-1,j}\bigr); \quad j < i \qquad (12.11) \]
and
\[ u_{i,j} = a_{i,j} - l_{i,1}u_{1,j} - \ldots - l_{i,i-1}u_{i-1,j}; \quad j \ge i. \qquad (12.12) \]
If A has a decomposition A = LU , then the system can be writ-
ten as
LU u = b. (12.13)

The matrix vector product U u results in a vector; call this y. Reex-


amining (12.13), it becomes a two-step problem. First solve

Ly = b, (12.14)

then solve
U u = y. (12.15)
If U u = y, then LU u = Ly = b. The two systems in (12.14) and
(12.15) are triangular and easy to solve. Forward substitution is ap-
plied to the matrix L. (See Exercise 21 and its solution for an algo-
rithm.) Back substitution is applied to the matrix U . An algorithm
is provided in Section 12.2.
A more direct method for forming L and U is achieved with (12.11)
and (12.12), rather than through Gauss elimination. This then is the
method of LU decomposition.

LU Decomposition

Given: A coefficient matrix A and a right-hand side b describing a


linear system
Au = b.

Find: The unknowns u1 , . . . , un .

Algorithm:

Initialize L as the identity matrix and U as the zero matrix.


Calculate the nonzero elements of L and U :
For k = 1, . . . , n
uk,k = ak,k − lk,1 u1,k − . . . − lk,k−1 uk−1,k
For i = k + 1, . . . , n
li,k = (1/uk,k) [ai,k − li,1 u1,k − . . . − li,k−1 uk−1,k ]
For j = k + 1, . . . , n
uk,j = ak,j − lk,1 u1,j − . . . − lk,k−1 uk−1,j

Using forward substitution solve Ly = b.


Using back substitution solve U u = y.

The uk,k term must not be zero; we had a similar situation with
Gauss elimination. This situation either requires pivoting or the ma-
trix might be singular.

The construction of the LU decomposition takes advantage of the


triangular structure of L and U combined with a particular compu-
tation order. The matrix L is being filled column by column and the
matrix U is being filled row by row.
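The following Python sketch mirrors the algorithm above (no pivoting; the function names lu_decompose and lu_solve are ours). It assumes NumPy and that no diagonal entry uk,k becomes zero.

import numpy as np

def lu_decompose(A):
    # Fill U row by row and L column by column, as described above.
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)             # L starts as the identity matrix
    U = np.zeros((n, n))      # U starts as the zero matrix
    for k in range(n):
        U[k, k:] = A[k, k:] - L[k, :k] @ U[:k, k:]                     # (12.12)
        L[k+1:, k] = (A[k+1:, k] - L[k+1:, :k] @ U[:k, k]) / U[k, k]   # (12.11)
    return L, U

def lu_solve(L, U, b):
    # Forward substitution for L y = b, then back substitution for U u = y.
    n = len(b)
    y = np.zeros(n)
    u = np.zeros(n)
    for i in range(n):
        y[i] = b[i] - L[i, :i] @ y[:i]
    for i in reversed(range(n)):
        u[i] = (y[i] - U[i, i+1:] @ u[i+1:]) / U[i, i]
    return u

For the matrix of Example 12.9 below, lu_decompose reproduces the L and U shown there, and lu_solve returns (0, 1/2, 0).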

Example 12.9

Let’s use LU decomposition to solve the linear system


\[ A\,u = \begin{bmatrix} 2 & 2 & 4 \\ -1 & 2 & -3 \\ 1 & 2 & 2 \end{bmatrix} u = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}. \]

The first step is to decompose A. Following the steps in the algorithm


above, we calculate the matrix entries:

k = 1:  u1,1 = a1,1 = 2,
        l2,1 = a2,1 / u1,1 = −1/2,
        l3,1 = a3,1 / u1,1 = 1/2,
        u1,2 = a1,2 = 2,
        u1,3 = a1,3 = 4,

k = 2:  u2,2 = a2,2 − l2,1 u1,2 = 2 + 1 = 3,
        l3,2 = (1/u2,2)[a3,2 − l3,1 u1,2] = (1/3)[2 − 1] = 1/3,
        u2,3 = a2,3 − l2,1 u1,3 = −3 + 2 = −1,

k = 3:  u3,3 = a3,3 − l3,1 u1,3 − l3,2 u2,3 = 2 − 2 + 1/3 = 1/3.
Check that this produces valid entries for L and U :

\[ \begin{bmatrix} 1 & 0 & 0 \\ -1/2 & 1 & 0 \\ 1/2 & 1/3 & 1 \end{bmatrix}
\begin{bmatrix} 2 & 2 & 4 \\ 0 & 3 & -1 \\ 0 & 0 & 1/3 \end{bmatrix}
= \begin{bmatrix} 2 & 2 & 4 \\ -1 & 2 & -3 \\ 1 & 2 & 2 \end{bmatrix}. \]

Next, we solve Ly = b with forward substitution—solving for y1 ,


then y2 , and then y3 —and find that
\[ y = \begin{bmatrix} 1 \\ 3/2 \\ 0 \end{bmatrix}. \]

The last step is to solve U u = y with back substitution—as we did


in Gauss elimination,
\[ u = \begin{bmatrix} 0 \\ 1/2 \\ 0 \end{bmatrix}. \]
It is simple to check that this solution is correct since clearly, the
column vector a2 is a multiple of b, and that is reflected in u.

Suppose A is nonsingular, but in need of pivoting. Then a per-


mutation matrix P is used to exchange (possibly multiple) rows so
it is possible to create the LU decomposition. The system is now
P Au = P b and we find P A = LU .
Finally, the major benefit of the LU decomposition: speed. For
cases in which we have to solve multiple linear systems with the same
coefficient matrix, LU decomposition is a big timesaver. We perform
it once, and then perform the forward and backward substitutions
(12.14) and (12.15) for each right-hand side. This is significantly
less work than performing a complete Gauss elimination every time!
Finding the inverse of a matrix, as described in (12.8), is an example
of a problem that requires solving multiple linear systems with the
same coefficient matrix.

12.6 Determinants
With the introduction of the scalar triple product, Section 8.5 pro-
vided a geometric derivation of 3 × 3 determinants; they measure vol-
ume. And then in Section 9.8 we learned more about determinants
from the perspective of linear maps. Let’s revisit that approach for
n × n determinants.
When we apply forward elimination to A, transforming it to upper
triangular form U , we apply a sequence of shears and row exchanges.
Shears do not change the volumes. As we learned in Section 9.8, a row
exchange will change the sign of the determinant. Thus the column

vectors of U span the same volume as did those of A, however the


sign might change. This volume is given by the signed product of the
diagonal entries of U and is called the determinant of A:

det A = (−1)k (u1,1 × . . . × un,n ), (12.16)

where k is the number of row exchanges. In general, this is the best


(and most stable) method for finding the determinant (but also see
the method in Section 16.4).

Example 12.10

Let’s revisit Example 12.3 to illustrate how to calculate the determi-


nant with the upper triangular form, and how row exchanges influence
the sign of the determinant.
Use the technique of cofactor expansion, as defined by (9.14) to
find the determinant of the given 3 × 3 matrix A:

\[ \det A = 2\begin{vmatrix} 1 & 2 \\ 1 & 1 \end{vmatrix} - 2\begin{vmatrix} 1 & 2 \\ 2 & 1 \end{vmatrix} = 2(-1) - 2(-3) = 4. \]

Now, apply (12.16) to the upper triangular form U (12.5) from the example, and notice that we did one row exchange, k = 1:

\[ \det A = (-1)^1 [2 \times (-1) \times 2] = 4. \]

So the shears of Gauss elimination have not changed the absolute


value of the determinant, and by modifying the sign based on the
number of row exchanges, we can determine det A from det U .

The technique of cofactor expansion that was used for the 3 × 3


matrix in Example 12.10 may be generalized to n×n matrices. Choose
any column or row of the matrix, for example entries a1,j as above,
and then

det A = a1,1 C1,1 + a1,2 C1,2 + . . . + a1,n C1,n ,

where each cofactor is defined as

Ci,j = (−1)^{i+j} Mi,j ,



and the Mi,j are called the minors; each is the determinant of the
matrix with the ith row and jth column removed. The Mi,j are
(n − 1) ×(n − 1) determinants, and they are computed by yet another
cofactor expansion. This process is repeated until we have 2 × 2
determinants. This technique is also known as expansion by minors.

Example 12.11

Let’s look at repeated application of cofactor expansion to find the


determinant. Suppose we are given the following matrix,
\[ A = \begin{bmatrix} 2 & 2 & 0 & 4 \\ 0 & -1 & 1 & 3 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 5 \end{bmatrix}. \]
We may choose any row or column from which to form the cofactors,
so in this example, we will have less work to do if we choose the first
column. The cofactor expansion is

\[ \det A = 2\begin{vmatrix} -1 & 1 & 3 \\ 0 & 2 & 0 \\ 0 & 0 & 5 \end{vmatrix} = 2(-1)\begin{vmatrix} 2 & 0 \\ 0 & 5 \end{vmatrix} = 2(-1)(10) = -20. \]

Since the matrix is in upper triangular form, we could use (12.16)


and immediately see that this is in fact the correct determinant.

Cofactor expansion is more a theoretical tool than a computational


one. This method of calculating the determinant plays an important
theoretical role in the analysis of linear systems, and there are ad-
vanced theorems involving cofactor expansion and the inverse of a
matrix. Computationally, Gauss elimination and the calculation of
det U is superior.
In our first encounter with solving linear systems via Cramer’s rule
in Section 5.3, we learned that the solution to a linear system may be
found by simply forming quotients of areas. Now with our knowledge
of n × n determinants, let’s revisit Cramer’s rule. If Au = b is an
n × n linear system such that det A = 0, then the system has the
following unique solution:
\[ u_1 = \frac{\det A_1}{\det A}, \quad u_2 = \frac{\det A_2}{\det A}, \quad \ldots, \quad u_n = \frac{\det A_n}{\det A}, \qquad (12.17) \]

where Ai is the matrix obtained by replacing the entries in the ith


column by b. Cramer’s rule is an important theoretical tool; however,
use it only for 2 × 2 or 3 × 3 linear systems.
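For completeness, a minimal Python sketch of Cramer's rule (our own helper, assuming NumPy's determinant routine; only sensible for very small systems):

import numpy as np

def cramer_solve(A, b):
    # Solve Au = b by Cramer's rule (12.17).
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    u = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b          # replace column i by the right-hand side
        u[i] = np.linalg.det(Ai) / d
    return u

Applied to the system of Example 12.3, this returns (1, 2, 3), as computed by hand in Example 12.12 below.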

Example 12.12

Let’s solve the linear system from Example 12.3 using Cramer’s rule.
Following (12.17), we have

\[ u_1 = \frac{\begin{vmatrix} 6 & 2 & 0 \\ 9 & 1 & 2 \\ 7 & 1 & 1 \end{vmatrix}}{\begin{vmatrix} 2 & 2 & 0 \\ 1 & 1 & 2 \\ 2 & 1 & 1 \end{vmatrix}}, \qquad
u_2 = \frac{\begin{vmatrix} 2 & 6 & 0 \\ 1 & 9 & 2 \\ 2 & 7 & 1 \end{vmatrix}}{\begin{vmatrix} 2 & 2 & 0 \\ 1 & 1 & 2 \\ 2 & 1 & 1 \end{vmatrix}}, \qquad
u_3 = \frac{\begin{vmatrix} 2 & 2 & 6 \\ 1 & 1 & 9 \\ 2 & 1 & 7 \end{vmatrix}}{\begin{vmatrix} 2 & 2 & 0 \\ 1 & 1 & 2 \\ 2 & 1 & 1 \end{vmatrix}}. \]

We have computed the determinant of the coefficient matrix A in


Example 12.10, det A = 4. With the application of cofactor expansion
for each numerator, we find that
u1 = 4/4 = 1,   u2 = 8/4 = 2,   u3 = 12/4 = 3,
which is identical to the solution found with Gauss elimination.

The determinant of a positive definite matrix is always positive, and


therefore the matrix is always nonsingular. The upper-left submatrices
of an n × n matrix A are
 
\[ A_1 = \begin{bmatrix} a_{1,1} \end{bmatrix}, \quad A_2 = \begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{bmatrix}, \quad \ldots, \quad A_n = A. \]

If A is positive definite, then the determinants of all Ai are positive.


Rules for working with determinants are given in Section 9.8.

12.7 Least Squares


When presented with large amounts of data, we often look for meth-
ods to create a simpler view or synopsis of the data. For example,
Figure 12.3 is a graph of AIG’s monthly average stock price over
twelve years. We see a lot of activity in the price, but there is a clear

Figure 12.3.
Least squares: fitting a straight line to stock price data for AIG from 2000 to 2013.

declining trend. A mathematical tool to capture this, which works


when the trend is not as clear as it is here, is linear least squares
approximation. The line illustrated in Figure 12.3 is the “best fit”
line or best approximating line.
Linear least squares approximation is also useful when analyzing ex-
perimental data, which can be “noisy,” either from the data capture
or observation method or from round-off from computations that gen-
erated the data. We might want to make summary statements about
the data, estimate values where data is missing, or predict future
values.
As a concrete (simple) example, suppose our experimental data are
temperature (Celsius) over time (seconds):

    time          0   10   20   30   40   50   60
    temperature  30   25   40   40   30    5   25

which are plotted in Figure 12.4. We want to establish a simple linear


relationship between the variables,

temperature = a × time + b,

Figure 12.4.
Least squares: a linear approximation to experimental data of time and temperature
pairs.

Writing down all relationships between knowns and unknowns, we


obtain linear equations of the form
\[ \begin{bmatrix} 0 & 1 \\ 10 & 1 \\ 20 & 1 \\ 30 & 1 \\ 40 & 1 \\ 50 & 1 \\ 60 & 1 \end{bmatrix}
\begin{bmatrix} a \\ b \end{bmatrix} =
\begin{bmatrix} 30 \\ 25 \\ 40 \\ 40 \\ 30 \\ 5 \\ 25 \end{bmatrix}. \qquad (12.18) \]

We write the system as


 
\[ Au = b, \quad \text{where} \quad u = \begin{bmatrix} a \\ b \end{bmatrix}. \]

This system of seven equations in two unknowns is overdetermined


and in general it will not have solutions; it is inconsistent. After all,
it is not very likely that b lives in the subspace V formed by the
columns of A. (As an analogy, consider the likelihood of a randomly
selected 3D vector living in the [e1 , e2 ]-plane.) But there is a recipe
for finding an approximate solution.
Denoting by b∥ a vector in V, the system

Au = b∥ (12.19)

is solvable (consistent), but it is still overdetermined since we have


seven equations in two unknowns. Recall from Section 2.8 that we
can write b as the sum of its orthogonal projection into V and the
component of b orthogonal to V,
b = b∥ + b⊥. (12.20)

Also recall that b∥ is closest to b and in V. Sketch 12.3 (least squares) illustrates this idea in 3D.
Since b⊥ is orthogonal to V, we can use matrix notation to formalize this relationship,

a1ᵀ b⊥ = 0 and a2ᵀ b⊥ = 0,

which is equivalent to

Aᵀ b⊥ = 0.

Based on (12.20), we can substitute b − b∥ for b⊥,

Aᵀ(b − b∥) = 0
Aᵀ(b − Au) = 0
Aᵀb − AᵀAu = 0.
Rearranging this last equation, we have the normal equations
AT Au = AT b. (12.21)
This is a linear system with a square matrix AT A! Even more, that
matrix is symmetric. The solution to the new system (12.21), when
it has one, is the one that minimizes the error
‖Au − b‖²,
and for this reason, it is called the least squares solution of the original system. Recall that b∥ is closest to b in V and since we solved (12.19), we have in effect minimized ‖b − b∥‖.
It seems pretty amazing that by simply multiplying both sides by
AT , we have a “best” solution to the original problem!

Example 12.13

Returning to the system in (12.18), we form the normal equations,


    
\[ \begin{bmatrix} 9100 & 210 \\ 210 & 7 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 5200 \\ 195 \end{bmatrix}. \]
(Notice that the matrix is symmetric.)

The least squares solution is the solution of this linear system,


   
\[ \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} -0.23 \\ 34.8 \end{bmatrix}, \]

which corresponds to the line x2 = −0.23x1 + 34.8. Figure 12.4


illustrates this line with negative slope and e2 intercept of 34.8.
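A short Python sketch of this computation (assuming NumPy; the variable names are ours) sets up the matrix of (12.18), forms the normal equations, and solves them:

import numpy as np

# Time/temperature data from the example above.
t    = np.array([0, 10, 20, 30, 40, 50, 60], dtype=float)
temp = np.array([30, 25, 40, 40, 30, 5, 25], dtype=float)

A = np.column_stack([t, np.ones_like(t)])   # the 7 x 2 matrix of (12.18)
N = A.T @ A                                  # normal equations matrix A^T A
rhs = A.T @ temp                             # A^T b
a, b = np.linalg.solve(N, rhs)               # least squares line: temp = a*t + b
print(a, b)                                  # roughly -0.23 and 34.8

Up to round-off, the printed values match the slope and intercept above.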

Imagine a scenario where our data capture method failed due to


some environmental condition. We might want to remove data points
if they seem outside the norm. These are called outliers. Point six in
Figure 12.4 looks to be an outlier. The least squares approximating
line provides a means for determining that this point is something of
an exception.
Linear least squares approximation can also serve as a method for
data compression.
Numerical problems can creep into the normal equations of the
linear system (12.21). This is particularly so when the n × m matrix
A has many more equations than unknowns, n ≫ m. In Section 13.1,
we will examine the Householder method for finding the least squares
solution to the linear system Au = b directly, without forming the
normal equations. Example 12.13 will be revisited in Example 13.3.
And yet another look at the least squares solution is possible with
the singular value decomposition of A in Section 16.6.

12.8 Application: Fitting Data to a Femoral Head


In prosthetic surgery, a common task is that of hip bone replacement.
This involves removing an existing femoral head and replacing it by a
transplant, consisting of a new head and a shaft for attaching it into
the existing femur. The transplant is typically made from titanium
or ceramic; the part that is critical for perfect fit and thus function is
the spherical head as shown in Figure 12.5. Data points are collected
from the existing femoral head by means of MRI or PET scans, then
a spherical fit is obtained, and finally the transplant is manufactured.
The fitting process is explained next.
We are given a set of 3D vectors v1 , . . . , vL that are of approxi-
mately equal length, ρ1 , . . . , ρL . We would like to find a sphere (cen-
tered at the origin) with radius r that closely fits the vi .

Figure 12.5.
Femur transplant: left, a titanium femoral head with shaft. Right, an example of a
sphere fit. Black points are “in front," gray points are occluded.

If all vi were on that sphere, we would have

r = ρ1 (12.22)
⋮ (12.23)
r = ρL. (12.24)

This is a very overdetermined linear system—L equations in only one


unknown, r!
In matrix form:
\[ \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} \begin{bmatrix} r \end{bmatrix} = \begin{bmatrix} \rho_1 \\ \vdots \\ \rho_L \end{bmatrix}. \]
Be sure to verify that the matrix dimensions work out!
Multiplying both sides by [1 . . . 1] gives

Lr = ρ1 + . . . + ρL

with the final result


\[ r = \frac{\rho_1 + \ldots + \rho_L}{L}. \]
Thus the least squares solution is simply the average of the given
radii—just as our intuition would have suggested in the first place.
Things are not that simple when it comes to more unknowns; see Section 16.6.

• n × n linear system
• coefficient matrix
• consistent system
• subspace
• solvable system
• unsolvable system
• Gauss elimination
• upper triangular matrix
• forward elimination
• back substitution
• elementary row operation
• permutation matrix
• row echelon form
• pivoting
• Gauss matrix
• multiplier
• augmented matrix
• singular matrix
• matrix rank
• full rank
• rank deficient
• homogeneous linear system
• inverse matrix
• LU decomposition
• factorization
• forward substitution
• lower triangular matrix
• determinant
• cofactor expansion
• expansion by minors
• Cramer's rule
• overdetermined system
• least squares solution
• normal equations

12.9 Exercises
1. Does the linear system
⎡ ⎤ ⎡ ⎤
1 2 0 1
⎣0 0 0⎦ u = ⎣2⎦
1 2 1 3
have a unique solution? Is it consistent?
2. Does the linear system
⎡ ⎤ ⎡ ⎤
1 1 5 3
⎣1 −1 1⎦ u = ⎣3⎦
1 2 7 3
have a unique solution? Is it consistent?
3. Examine the linear system in Example 12.1. What restriction on the ti
is required to guarantee a unique solution?
4. Solve the linear system Av = b where
\[ A = \begin{bmatrix} 1 & 0 & -1 & 2 \\ 0 & 0 & 1 & -2 \\ 2 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix}, \quad \text{and} \quad b = \begin{bmatrix} -1 \\ 2 \\ 1 \\ -3 \end{bmatrix}. \]
Show all the steps from the Gauss elimination algorithm.

5. Solve the linear system Av = b where


⎡ ⎤ ⎡ ⎤
0 0 1 −1
A = ⎣1 0 0⎦ , and b = ⎣ 0⎦ .
1 1 1 −1

Show all the steps from the Gauss elimination algorithm.


6. Solve the linear system Av = b where
⎡ ⎤ ⎡ ⎤
4 2 1 7
A = ⎣2 2 0⎦ , and b = ⎣2⎦ .
4 2 3 9

7. Transform the following linear system to row echelon form.


⎡ ⎤ ⎡ ⎤
3 2 0 1
⎣3 1 2⎦ u = ⎣1⎦ .
0 2 0 1

8. What is the rank of the following matrix?


⎡ ⎤
3 2 0 1
⎢0 0 0 1⎥
⎢ ⎥
⎣0 1 2 0⎦
0 0 0 1

9. What is the permutation matrix that will exchange rows 3 and 4 in a


5 × 5 matrix?
10. What is the permutation matrix that will exchange rows 2 and 4 in a
4 × 4 matrix?
11. What is the matrix G as defined in (12.4) for Example 12.2?
12. What is the matrix G as defined in (12.4) for Exercise 6?
13. Solve the linear system
⎡ ⎤ ⎡ ⎤
4 1 2 0
⎣2 1 1⎦ u = ⎣0⎦
2 1 1 0

with Gauss elimination with pivoting.


14. Solve the linear system
⎡ ⎤ ⎡ ⎤
3 6 1 0
⎣6 12 2⎦ u = ⎣0⎦
9 18 3 0

with Gauss elimination with pivoting.


15. Find the inverse of the matrix from Exercise 5.

16. Find the inverse of ⎡ ⎤


3 2 1
⎣0 2 1⎦ .
0 2 1
17. Find the inverse of ⎡ ⎤
cos θ 0 − sin θ
⎣ 0 1 0 ⎦.
sin θ 0 cos θ
18. Find the inverse of
\[ \begin{bmatrix} 5 & 0 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 & 0 \\ 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}. \]
19. Find the inverse of ⎡ ⎤
3 0
⎢2 1⎥
⎢ ⎥.
⎣1 4⎦
3 2
20. Calculate the LU decomposition of the matrix
⎡ ⎤
3 0 1
A = ⎣1 2 0⎦ .
1 1 1

21. Write a forward substitution algorithm for solving the lower triangular
system (12.14).
22. Use the LU decomposition of A from Exercise 20 to solve the linear
system Au = b, where ⎡ ⎤
4
b = ⎣0⎦ .
4
23. Calculate the determinant of
⎡ ⎤
3 0 1
A = ⎣1 2 0⎦ .
1 1 1

24. What is the rank of the matrix in Exercise 23?


25. Calculate the determinant of the matrix
⎡ ⎤
2 4 3 6
⎢1 0 0 0⎥
A=⎢ ⎣2 1 0 1⎦

1 1 2 0
using expansion by minors. Show all steps.

26. Apply Cramer’s rule to solve the following linear system:


⎡ ⎤ ⎡ ⎤
3 0 1 8
⎣1 2 0⎦ u = ⎣6⎦ .
1 1 1 6

Hint: Reuse your work from Exercise 23.


27. Apply Cramer’s rule to solve the following linear system:
⎡ ⎤ ⎡ ⎤
3 0 1 6
⎣0 2 0⎦ u = ⎣4⎦ .
0 2 1 7

28. Set up and solve the linear system for solving the intersection of the
three planes,
x1 + x3 = 1, x3 = 1, x2 = 2.
29. Find the intersection of the plane
\[ x(u_1, u_2) = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} + u_1 \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} + u_2 \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} \]

and the line

\[ p(t) = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} + t \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \]
by setting up the problem as a linear system and solving it.
30. Let five points be given by
         
\[ p_1 = \begin{bmatrix} -2 \\ 2 \end{bmatrix}, \; p_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \; p_3 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \; p_4 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \; p_5 = \begin{bmatrix} 2 \\ 2 \end{bmatrix}. \]

Find the linear least squares approximation.


31. Let five points be given by
         
\[ p_1 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \; p_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \; p_3 = \begin{bmatrix} 2 \\ 2 \end{bmatrix}, \; p_4 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \; p_5 = \begin{bmatrix} 2 \\ -2 \end{bmatrix}. \]

Find the linear least squares approximation.


32. Let two points be given by
   
\[ p_1 = \begin{bmatrix} -4 \\ 1 \end{bmatrix}, \quad p_2 = \begin{bmatrix} 4 \\ 3 \end{bmatrix}. \]

Find the linear least squares approximation.


13
Alternative System Solvers

Figure 13.1.
A sparse matrix: all nonzero entries are marked.

We have encountered Gauss elimination methods for solving linear


systems. These work well for moderately sized systems (up to a
few thousand equations), in particular if they do not pose numerical


problems. Ill-conditioned problems are more efficiently attacked using


the Householder method below. Huge linear systems (up to a million
equations) are more successfully solved with iterative methods. Of-
ten times, these linear systems are defined by a sparse matrix, one
that has many zero entries, as is illustrated in Figure 13.1. All these
alternative methods are the topics of this chapter.

13.1 The Householder Method


Let’s revisit the problem of solving the linear system Au = b, where
A is an n × n matrix. We may think of A as n column vectors, each
with n elements,
[a1 . . . an ]u = b.
In Section 12.2 we examined the classical method of Gauss elimina-
tion, or the process of applying shears Gi to the vectors a1 . . . an and
b in order to convert A to upper triangular form, or
Gn−1 . . . G1 Au = Gn−1 . . . G1 b,
and then we were able to solve for u with back substitution. Each
Gi is a shear matrix, constructed to transform the ith column vector
Gi−1 . . . G1 ai to a vector with zeroes below the diagonal element, ai,i .
As it turns out, Gauss elimination is not the most robust method
for solving a linear system. A more numerically stable method may
be found by replacing shears with reflections. This is the Householder
method.
The Householder method applied to a linear system takes the same
form as Gauss elimination. A series of reflections Hi is constructed
and applied to the system,
Hn−1 . . . H1 Au = Hn−1 . . . H1 b,
where each Hi transforms the column vector Hi−1 . . . H1 ai to a vec-
tor with zeroes below the diagonal element. Let’s examine how we
construct a Householder transformation Hi .
A simple 2 × 2 matrix
\[ \begin{bmatrix} 1 & -2 \\ 1 & 0 \end{bmatrix} \]
will help to illustrate the construction of a Householder transformation. The column vectors of this matrix are illustrated in Sketch 13.1 (Householder reflection of the vector a1 to ‖a1‖e1). The first transformation, H1 A, reflects a1 onto the e1 axis to the vector a1′ = ‖a1‖e1, or
\[ \begin{bmatrix} 1 \\ 1 \end{bmatrix} \;\rightarrow\; \begin{bmatrix} \sqrt{2} \\ 0 \end{bmatrix}. \]

We will reflect about the line L1 illustrated in Sketch 13.1, so we


must construct a normal n1 to this line:
\[ n_1 = \frac{a_1 - \|a_1\| e_1}{\|\,a_1 - \|a_1\| e_1\,\|}, \]
which is simply the normalized difference between the original vector and the target vector (after the reflection). The implicit equation of the line L1 is
\[ n_1^\mathsf{T} x = 0, \]
and n1ᵀa1 is the distance of the point (o + a1) to L1. Therefore, the reflection constitutes moving twice the distance to the line in the direction of the normal, so
\[ a_1' = a_1 - \bigl(2 n_1^\mathsf{T} a_1\bigr)\, n_1. \]
(Notice that 2n1ᵀa1 is a scalar.) To write this reflection in matrix form, we rearrange the terms,
\[ a_1' = a_1 - 2 n_1 n_1^\mathsf{T} a_1 = \bigl(I - 2 n_1 n_1^\mathsf{T}\bigr) a_1. \]
Notice that 2n1n1ᵀ is a dyadic matrix, as introduced in Section 11.5. We now have the matrix H1 defining one Householder transformation:
\[ H_1 = I - 2 n_1 n_1^\mathsf{T}. \]

This is precisely the reflection we constructed in (11.13)!

Example 13.1

The Householder matrix H1 for our 2 × 2 example is formed with
\[ n_1 = \begin{bmatrix} -0.382 \\ 0.923 \end{bmatrix}, \]
then
\[ H_1 = I - 2\begin{bmatrix} 0.146 & -0.353 \\ -0.353 & 0.853 \end{bmatrix} = \begin{bmatrix} 0.707 & 0.707 \\ 0.707 & -0.707 \end{bmatrix}. \]
The transformed matrix is formed from the column vectors
\[ H_1 a_1 = \begin{bmatrix} \sqrt{2} \\ 0 \end{bmatrix} \quad \text{and} \quad H_1 a_2 = \begin{bmatrix} -\sqrt{2} \\ -\sqrt{2} \end{bmatrix}. \]

The 2 × 2 example serves only to illustrate the underlying geometry


of a reflection matrix. The construction of a general Householder
transformation Hi is a little more complicated. Suppose now that we
have the following matrix
\[ A = \begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} & a_{1,4} \\ 0 & a_{2,2} & a_{2,3} & a_{2,4} \\ 0 & 0 & a_{3,3} & a_{3,4} \\ 0 & 0 & a_{4,3} & a_{4,4} \end{bmatrix}. \]

We need to construct H3 to zero the element a4,3 while preserving


A’s upper triangular nature. In other words, we want to preserve the
elimination done by previous transformations, H1 and H2 . To achieve
this, construct
\[ \bar{a}_3 = \begin{bmatrix} 0 \\ 0 \\ a_{3,3} \\ a_{4,3} \end{bmatrix}, \]

and then construct H3 so that

\[ H_3 \bar{a}_3 = \gamma e_3 = \begin{bmatrix} 0 \\ 0 \\ \gamma \\ 0 \end{bmatrix}, \]

where γ = ±‖ā3‖. The matrix H3 has been designed so that H3 a3


will modify only elements a3,3 and a4,3 and the length of a3 will be
preserved.
That is the idea, so let’s develop it for n × n matrices. Suppose we
have
\[ \bar{a}_i = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ a_{i,i} \\ \vdots \\ a_{n,i} \end{bmatrix}. \]

We want to construct the Householder matrix Hi for the following transformation

\[ H_i \bar{a}_i = \gamma e_i = \begin{bmatrix} 0 \\ \vdots \\ \gamma \\ \vdots \\ 0 \end{bmatrix}, \]

where γ = ±‖āi‖. Just as we developed for the 2 × 2 example,

\[ H_i = I - 2 n_i n_i^\mathsf{T}, \qquad (13.1) \]

but we make a slight modification of the normal,

\[ n_i = \frac{\bar{a}_i - \gamma e_i}{\|\bar{a}_i - \gamma e_i\|}. \]
If āi is nearly parallel to ei then numerical problems can creep into our
construction due to loss of significant digits in subtraction of nearly
equal numbers. Therefore, it is better to reflect onto the direction of
the ei -axis that represents the largest reflection.
Let Ni = ni niᵀ and note that it is symmetric and idempotent. Properties of Hi include being:
• symmetric: Hi = Hiᵀ, since Ni is symmetric;
• involutary: Hi Hi = I and thus Hi = Hi⁻¹;
• unitary (orthogonal): Hiᵀ Hi = I, and thus Hi v has the same length as v.
Implementation of Householder transformations doesn’t involve ex-
plicit construction and multiplication by the Householder matrix in
(13.1). A numerically and computationally more efficient algorithm is
easy to achieve since we know quite a bit about how each Hi acts on
the column vectors. So the following describes some of the variables
in the algorithm that aid optimization.
Some optimization of the algorithm can be done by rewriting n; let
\[ v_i = \bar{a}_i - \gamma e_i, \quad \text{where} \quad \gamma = \begin{cases} -\operatorname{sign}(a_{i,i})\,\|\bar{a}_i\| & \text{if } a_{i,i} \neq 0, \\ -\|\bar{a}_i\| & \text{otherwise.} \end{cases} \]
Then we have that
\[ 2 n n^\mathsf{T} = \frac{v v^\mathsf{T}}{\tfrac{1}{2} v^\mathsf{T} v} = \frac{v v^\mathsf{T}}{\alpha}, \]
thus
\[ \alpha = \gamma^2 - a_{i,i}\,\gamma. \]
When Hi is applied to column vector c,
\[ H_i c = \left(I - \frac{v v^\mathsf{T}}{\alpha}\right) c = c - s v. \]
In the Householder algorithm that follows, as we work on the jth
column vector, we call
\[ \hat{a}_k = \begin{bmatrix} a_{j,k} \\ \vdots \\ a_{n,k} \end{bmatrix} \]
to indicate that only elements j, . . . , n of the kth column vector ak
(with k ≥ j) are involved in a calculation. Thus application of Hj
results in changes in the sub-block of A with aj,j at the upper-left
corner. Hence, the vector aj and Hj aj coincide in the first j − 1
components.
For a more detailed discussion, see [2] or [11].

The Householder Method


Algorithm:
Input:
An n × m matrix A, where n ≥ m and rank of A is m;
n vector b, augmented to A as the (m + 1)st column.
Output:
Upper triangular matrix HA written over A;
Hb written over b in the augmented (m + 1)st
column of A.
(H = Hn−1 . . . H1 )

If n = m then p = n − 1; else p = m (p is last column to


transform)
For j = 1, 2, . . . , p
    a = âj · âj
    γ = −sign(aj,j) √a
    α = a − aj,j γ
    Temporarily set aj,j = aj,j − γ
    For k = j + 1, . . . , m + 1
        s = (1/α) (âj · âk)
        âk = âk − s âj
    Set âj = [γ 0 . . . 0]ᵀ
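Here is a Python transcription of the algorithm above (the function name is ours; NumPy assumed). It overwrites a working copy of the augmented matrix, just as the pseudocode does:

import numpy as np

def householder_triangularize(A, b):
    # Reduce [A | b] to upper triangular form with Householder reflections.
    # A is n x m with n >= m and full column rank.
    A = np.column_stack([np.array(A, dtype=float), np.array(b, dtype=float)])
    n, m1 = A.shape
    m = m1 - 1                       # last column holds the right-hand side
    p = n - 1 if n == m else m       # last column to transform
    for j in range(p):
        a = A[j:, j] @ A[j:, j]
        gamma = -np.sign(A[j, j]) * np.sqrt(a) if A[j, j] != 0 else -np.sqrt(a)
        alpha = a - A[j, j] * gamma
        A[j, j] -= gamma             # temporarily store a_jj - gamma
        for k in range(j + 1, m1):
            s = (A[j:, j] @ A[j:, k]) / alpha
            A[j:, k] -= s * A[j:, j]
        A[j:, j] = 0.0
        A[j, j] = gamma              # set the transformed column to (gamma, 0, ..., 0)
    return A[:, :m], A[:, m]

Back substitution on the first m rows of the result then gives the (least squares) solution; for the data of Example 13.3 below this reproduces a ≈ −0.23 and b ≈ 34.82.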

Example 13.2

Let’s apply the Householder algorithm to the linear system


\[ \begin{bmatrix} 1 & 1 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix} u = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}. \]

For j = 1 in the algorithm, we calculate the following values: γ = −√2, α = 2 + √2, and then we temporarily set
\[ \hat{a}_1 = \begin{bmatrix} 1 + \sqrt{2} \\ 1 \\ 0 \end{bmatrix}. \]
For k = 2, s = √2/(2 + √2). This results in
\[ \hat{a}_2 = \begin{bmatrix} 0 \\ -\sqrt{2} \\ 0 \end{bmatrix}. \]
For k = 3, s = 0 and â3 remains unchanged. For k = 4, s = −√2/2 and then the right-hand side vector becomes
\[ \hat{a}_4 = \begin{bmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \\ 1 \end{bmatrix}. \]
Now we set â1, and the reflection H1 results in the linear system
\[ \begin{bmatrix} -\sqrt{2} & 0 & 0 \\ 0 & -\sqrt{2} & 0 \\ 0 & 0 & 1 \end{bmatrix} u = \begin{bmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \\ 1 \end{bmatrix}. \]
Although not explicitly computed,
\[ n_1 = \frac{1}{\left\| \begin{bmatrix} 1+\sqrt{2} \\ 1 \\ 0 \end{bmatrix} \right\|} \begin{bmatrix} 1+\sqrt{2} \\ 1 \\ 0 \end{bmatrix} \]
and the Householder matrix
\[ H_1 = \begin{bmatrix} -\sqrt{2}/2 & -\sqrt{2}/2 & 0 \\ -\sqrt{2}/2 & \sqrt{2}/2 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \]

Notice that a3 was not affected because it is in the plane about


which we were reflecting, and this is a result of the involutary property
of the Householder matrix. The length of each column vector was not
changed as a result of the orthogonal property. Since the matrix
is upper triangular, we may now use back substitution to find the
solution vector
\[ u = \begin{bmatrix} -1/2 \\ -1/2 \\ 1 \end{bmatrix}. \]

The Householder algorithm is the method of choice if one is dealing


with ill-conditioned systems. An example: for some data sets, the
least squares solution to an overdetermined linear system can produce
unreliable results because of the nature of AT A. In Section 13.4, we
will examine this point in more detail. The Householder algorithm
above is easy to use for such overdetermined problems: The input
matrix A is of dimension n × m where n ≥ m. The following example
illustrates that the Householder method will result in the least squares
solution to an overdetermined system.

Example 13.3

Let’s revisit the least squares line-fitting problem from Example 12.13.
See that example for a problem description, and see Figure 12.4 for
an illustration. The overdetermined linear system for this problem is
\[ \begin{bmatrix} 0 & 1 \\ 10 & 1 \\ 20 & 1 \\ 30 & 1 \\ 40 & 1 \\ 50 & 1 \\ 60 & 1 \end{bmatrix}
\begin{bmatrix} a \\ b \end{bmatrix} =
\begin{bmatrix} 30 \\ 25 \\ 40 \\ 40 \\ 30 \\ 5 \\ 25 \end{bmatrix}. \]
After the first Householder reflection (j = 1), the linear system becomes
\[ \begin{bmatrix} -95.39 & -2.20 \\ 0 & 0.66 \\ 0 & 0.33 \\ 0 & -0.0068 \\ 0 & -0.34 \\ 0 & -0.68 \\ 0 & -1.01 \end{bmatrix}
\begin{bmatrix} a \\ b \end{bmatrix} =
\begin{bmatrix} -54.51 \\ 16.14 \\ 22.28 \\ 13.45 \\ -5.44 \\ -39.29 \\ -28.15 \end{bmatrix}. \]
For the second Householder reflection (j = 2), the linear system becomes
\[ \begin{bmatrix} -95.39 & -2.20 \\ 0 & -1.47 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} a \\ b \end{bmatrix} =
\begin{bmatrix} -54.51 \\ -51.10 \\ 11.91 \\ 13.64 \\ 5.36 \\ -17.91 \\ 3.81 \end{bmatrix}. \]
We can now solve the system with back substitution, starting with the first nonzero row, and the solution is
\[ \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} -0.23 \\ 34.82 \end{bmatrix}. \]

Excluding numerical round-off, this is the same solution found using


the normal equations in Example 12.13.

13.2 Vector Norms


Finding the magnitude or length of a 3D vector is fundamental to
many geometric operations. It is also fundamental for n-dimensional
vectors, even though such vectors might no longer have geometric
meaning. For example, in Section 13.6, we examine iterative methods
for solving linear systems, and vector length is key for monitoring
improvements in the solution.
The magnitude or length of an n-dimensional vector v is typically measured by
\[ \|v\|_2 = \sqrt{v_1^2 + \ldots + v_n^2}. \qquad (13.2) \]
The (nonnegative) scalar ‖v‖₂ is also referred to as the Euclidean norm because in R3 it is Euclidean length. Since this is the “usual” way to measure length, the subscript 2 often is omitted.
More generally, (13.2) is a vector norm. Other vector norms are
conceivable, for instance the 1-norm

\[ \|v\|_1 = |v_1| + |v_2| + \ldots + |v_n| \qquad (13.3) \]

or the ∞-norm

\[ \|v\|_\infty = \max_i |v_i|. \qquad (13.4) \]

Figure 13.2.
Vector norms: outline of the unit vectors for the 2-norm is a circle, ∞-norm is a square,
1-norm is a diamond.

We can gain an understanding of these norms by studying the familiar


case n = 2. Recall that a unit vector is one whose norm equals one.
For the 2-norm ‖v‖₂, all unit vectors form a circle of radius one. With
a bit of thinking, we can see that for the 1-norm all unit vectors form a
diamond, while in the ∞-norm, all unit vectors form a square of edge
length two. This geometric interpretation is shown in Figure 13.2.
All three of these norms may be viewed as members of a whole
family of norms, referred to as p-norms. The p-norm of v is given by

\[ \|v\|_p = \bigl(|v_1|^p + |v_2|^p + \ldots + |v_n|^p\bigr)^{1/p}. \]

It is easy to see that the 1- and 2-norm fit this mold. That p → ∞


gives the ∞-norm requires a little more thought; give it a try!
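A quick numerical way to experiment with this (a Python sketch assuming NumPy; the helper p_norm is our name) is to compute the p-norm for increasing p and watch it approach the largest absolute component:

import numpy as np

def p_norm(v, p):
    # The p-norm; p = 1 and p = 2 give the 1- and 2-norms.
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

v = np.array([1.0, 0.0, -2.0])              # the vector of Example 13.4 below
print(p_norm(v, 1), p_norm(v, 2))           # 3.0 and about 2.236
print(p_norm(v, 10), p_norm(v, 100))        # both close to the max norm, 2.0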

Example 13.4

Let
\[ v = \begin{bmatrix} 1 \\ 0 \\ -2 \end{bmatrix}. \]
Then
\[ \|v\|_1 = 3, \quad \|v\|_2 = \sqrt{5} \approx 2.24, \quad \|v\|_\infty = 2. \]

This example and Figure 13.2 demonstrate that

\[ \|v\|_1 \ge \|v\|_2 \ge \|v\|_\infty; \qquad (13.5) \]

this is also true in n dimensions.


All vector norms share some basic properties:

\[ \|v\| \ge 0, \qquad (13.6) \]
\[ \|v\| = 0 \text{ if and only if } v = 0, \qquad (13.7) \]
\[ \|cv\| = |c|\,\|v\| \text{ for } c \in R, \qquad (13.8) \]
\[ \|v + w\| \le \|v\| + \|w\|. \qquad (13.9) \]

The last of these is the familiar triangle inequality.


By this time, we have developed a familiarity with the 2-norm, and
these properties are easily verified. Recall the steps for the triangle
inequality were demonstrated in (2.24).
Let’s show that the vector norm properties hold for the ∞-norm.
Examining (13.4), for each v in Rn , by definition, maxi |vi | ≥ 0. If
maxi |vi | = 0 then vi = 0 for each i = 1, . . . , n and v = 0. Conversely,
if v = 0 then maxi |vi | = 0. Thus we have established that the
first two properties hold. The third property is easily shown by the
properties of the absolute value function:

\[ \|cv\|_\infty = \max_i |c v_i| = |c| \max_i |v_i| = |c|\,\|v\|_\infty. \]

For the triangle inequality:

\[ \|v + w\|_\infty = \max_i |v_i + w_i| \le \max_i \{|v_i| + |w_i|\} \le \max_i |v_i| + \max_i |w_i| = \|v\|_\infty + \|w\|_\infty. \]

From a 2D geometric perspective, consider the 1-norm: starting at


the origin, move v1 units along the e1 -axis and then move v2 units
along the e2 -axis. The sum of these travels determines v’s length.
This is analogous to driving in a city with rectilinear streets, and
thus this norm is also called the Manhattan or taxicab norm.
The ∞-norm is also called the max-norm. Our standard measure of
length, the 2-norm, can take more computing power than we might be
willing to spend on a proximity problem, where we want to determine
whether two points are closer than a given tolerance. Problems of this

sort are often repeated many times and against many points. Instead,
the max norm might be more suitable from a computing point of view
and sufficient given the relationship in (13.5). This method of using
an inexpensive measure to exclude many possibilities is called trivial
reject in some disciplines such as computer graphics.

13.3 Matrix Norms


A vector norm measures the magnitude of a vector; does it also make
sense to talk about the magnitude of a matrix?
Instead of exploring the general case right away, we will first give
some 2D insight. Let A be a 2 × 2 matrix. It maps the unit circle
to an ellipse—the action ellipse. In Figure 13.3 we see points on the
unit circle at the end of unit vectors vi and their images Avi , forming
this action ellipse. We can learn more about this ellipse by looking
at AT A. It is symmetric and positive definite, and thus has real and
positive eigenvalues, which are λ1 and λ2 , in decreasing value. We
then define
\[ \sigma_i = \sqrt{\lambda_i}, \qquad (13.10) \]
to be the singular values of A. (In Chapter 16, we will explore singular
values in more detail.)
The singular value σ1 is the length of the action ellipse’s semi-major
axis and σ2 is the length of the action ellipse’s semi-minor axis. If A

Figure 13.3.
Action of a 2 × 2 matrix: points on the unit circle (black) are mapped to points on an ellipse (gray). This is the action ellipse. The vector shown is the semi-major axis of the ellipse, which has length σ1.

is a symmetric positive definite matrix, then its singular values are


equal to its eigenvalues.
How much does A distort the unit circle? This is measured by its 2-norm, ‖A‖₂. If we find the largest ‖Avi‖₂, then we have an indication of how much A distorts. Assuming that we have k unit vectors vi, we can compute
\[ \|A\|_2 \approx \max_i \|A v_i\|_2. \qquad (13.11) \]
As we increase k, (13.11) gives a better and better approximation to ‖A‖₂. We then have
\[ \|A\|_2 = \max_{\|v\|_2 = 1} \|A v\|_2. \qquad (13.12) \]

It should be clear by now that this development is not restricted


to 2 × 2 matrices, but holds for n × n ones as well. From now on,
we will discuss this general case, where the matrix 2-norm is more
complicated to compute than its companion vector norm. It is given
by
A2 = σ1 ,
where σ1 is A’s largest singular value.
Recall that the inverse matrix A−1 “undoes” the action of A. Let
the singular values of A−1 be called σ̂i , then
1 1
σ̂1 = , ..., σ̂n =
σn σ1
and
1
A−1 2 =.
σn
Singular values are frequently used in the numerical analysis of
matrix operations. If you do not have access to software for singu-
lar values, then (13.11) will give a decent approximation. Singular
values are typically computed using a method called singular value
decomposition, or SVD, and they are the focus of Chapter 16.
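As a sketch of the approximation (13.11) in Python (NumPy assumed; two_norm_estimate is our name, and the test matrix is an arbitrary choice), compared against a library value of the 2-norm:

import numpy as np

def two_norm_estimate(A, k=1000):
    # Sample k random unit vectors and keep the largest image length, as in (13.11).
    best = 0.0
    for _ in range(k):
        v = np.random.randn(A.shape[1])
        v /= np.linalg.norm(v)               # a random unit vector
        best = max(best, np.linalg.norm(A @ v))
    return best

A = np.array([[1.0, 2.0], [3.0, 4.0]])       # an arbitrary test matrix
print(two_norm_estimate(A))                   # slightly below sigma_1
print(np.linalg.norm(A, 2))                   # sigma_1, about 5.465

The estimate can only approach the true 2-norm from below, since every sampled unit vector gives a lower bound.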
Analogous to vector norms, there are other matrix norms. We give
two important examples:

‖A‖₁: maximum absolute column sum,

‖A‖∞: maximum absolute row sum.

Because the notation for matrix and vector norms is identical, it is


important to note what is enclosed in the norm symbols—a vector or
a matrix.

Another norm that is sometimes used is the Frobenius norm, which


is given by
\[ \|A\|_F = \sqrt{\sigma_1^2 + \ldots + \sigma_n^2}, \qquad (13.13) \]
where the σi are A’s singular values.
The following is not too obvious. The Euclidean norm: add up the squares of all matrix elements, and take the square root,
\[ \|A\|_F = \sqrt{a_{1,1}^2 + a_{1,2}^2 + \ldots + a_{n,n}^2}, \qquad (13.14) \]
is identical to the Frobenius norm.
A matrix A maps all unit vectors to an ellipsoid whose semi-axis
lengths are given by σ1 , . . . , σn . Then the Frobenius norm gives the
total distortion caused by A.

Example 13.5
Let’s examine matrix norms for
\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 4 & 5 \\ 5 & 6 & -7 \end{bmatrix}. \]
Its singular values are given by 10.5, 7.97, 0.334, resulting in
\[ \|A\|_2 = \max\{10.5, 7.97, 0.334\} = 10.5, \]
\[ \|A\|_1 = \max\{9, 12, 15\} = 15, \]
\[ \|A\|_\infty = \max\{6, 12, 18\} = 18, \]
\[ \|A\|_F = \sqrt{1^2 + 3^2 + \ldots + (-7)^2} = \sqrt{10.5^2 + 7.97^2 + 0.334^2} = 13.2. \]

From the examples above, we see that matrix norms are real-valued
functions of the linear space defined over all n × n matrices.1 Matrix
norms satisfy conditions very similar to the vector norm conditions
(13.6)–(13.9):
\[ \|A\| > 0 \text{ for } A \neq Z, \qquad (13.15) \]
\[ \|A\| = 0 \text{ for } A = Z, \qquad (13.16) \]
\[ \|cA\| = |c|\,\|A\|, \quad c \in R, \qquad (13.17) \]
\[ \|A + B\| \le \|A\| + \|B\|, \qquad (13.18) \]
\[ \|AB\| \le \|A\|\,\|B\|, \qquad (13.19) \]
Z being the zero matrix.
1 More on this type of linear space in Chapter 14.

How to choose a matrix norm? Computational expense and prop-


erties of the norm are the deciders. For example, the Frobenius and
2-norms are invariant with respect to orthogonal transformations.
Many numerical methods texts provide a wealth of information on
matrix norms and their relationships.

13.4 The Condition Number


We would like to know how sensitive the solution to Au = b is to
changes in A and b. For this problem, the matrix 2-norm helps us
define the condition of the map.
In most of our figures describing linear maps, we have mapped
a circle, formed by many unit vectors vi , to an ellipse, formed by
vectors Avi . This ellipse is evidently closely related to the geometry
of the map, and indeed its semi-major and semi-minor axis lengths
correspond to A’s singular values. For n = 2, the eigenvalues of
AT A are λ1 and λ2 , in decreasing value. The singular values of A
are defined in (13.10). If σ1 is very large and σ2 is very small, then
the ellipse will be very elongated, as illustrated by the example in
Figure 13.4. The ratio

\[ \kappa(A) = \|A\|_2\,\|A^{-1}\|_2 = \sigma_1 / \sigma_n \]

is called the condition number of A. In Figure 13.4, we picked the


(symmetric, positive definite) matrix
 
\[ A = \begin{bmatrix} 1.5 & 0 \\ 0 & 0.05 \end{bmatrix}, \]

which has condition number κ(A) = 1.5/0.05 = 30.


Since AT A is symmetric and positive definite, κ(A) ≥ 1. If a matrix
has a condition number close to one, it is called well-conditioned. If
the condition number is 1 (such as for the identity matrix), then no
distortion happens. The larger the κ(A), the more A distorts, and if
κ(A) is “large,” the matrix is called ill-conditioned.

Figure 13.4.
Condition number: The action of A with κ(A) = 30.

Example 13.6

Let
\[ A = \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix}, \]
meaning that A is an α degree rotation. Clearly, AT A = I, where
I is the identity matrix. Thus, σ1 = σ2 = 1. Hence the condition
number of a rotation matrix is 1. Since a rotation does not distort,
this is quite intuitive.

Example 13.7

Now let
\[ A = \begin{bmatrix} 100 & 0 \\ 0 & 0.01 \end{bmatrix}, \]
a matrix that scales by 100 in the e1 -direction and by 0.01 in the e2 -
direction. This matrix is severely distorting! We quickly find σ1 = 100
and σ2 = 0.01 and thus the condition number is 100/0.01 = 10, 000.
The fact that A distorts is clearly reflected in the magnitude of its
condition number.

Back to the initial problem: solving Au = b. We will always have


round-off error to deal with, but we want to avoid creating a poorly
designed linear system, which would mean a matrix A with a “large”
condition number. The definition of large is subjective and problem-
specific. A guideline: κ(A) ≈ 10k can result in a loss of k digits of
accuracy. If the condition number of a matrix is large, then irrespec-
tive of round-off, the solution cannot be depended upon. Practically
speaking, a large condition number means that the solution to the
linear system is numerically very sensitive to small changes in A or
b. Alternatively, we can say that we can confidently calculate the
inverse of a well-conditioned matrix.
The condition number is a better measure of singularity than the
determinant because it is a scale and size n invariant measure. If s is
a scalar, then κ(sA) = κ(A).
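A quick numerical check of these statements (Python with NumPy assumed) using the scaling matrix of Example 13.7:

import numpy as np

A = np.array([[100.0, 0.0], [0.0, 0.01]])
sigma = np.linalg.svd(A, compute_uv=False)   # singular values, largest first
print(sigma[0] / sigma[-1])                  # 10000.0
print(np.linalg.cond(A))                     # the library's condition number, same value
print(np.linalg.cond(0.5 * A))               # scaling A does not change kappa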

Example 13.8

Let n = 100. Form the identity matrix I and J = 0.1I.

det I = 1,   κ(I) = 1,   det J = 10⁻¹⁰⁰,   κ(J) = 1.

The determinant of J is small, indicating that there is a problem


with this matrix. However, in solving a linear system with such a
coefficient matrix, the scale of J poses no problem.

In Section 12.7, we examined overdetermined linear systems Au =


b in m equations and n unknowns, where m > n. (Thus A is an m×n
matrix.) Our approach to this problem depended on the approxima-
tion method of least squares, and we solved the system AT Au = AT b.
The condition number for this new matrix is κ(ATA) = κ(A)², so if
A has a high condition number to start with, this process will create
an ill-posed problem. In this situation, a method such as the House-
holder method (see Section 13.1), is preferred. We will revisit this
topic in more detail in Section 16.6.
An advanced linear algebra or numerical analysis text will provide
an in-depth study of the condition number and error analysis.

13.5 Vector Sequences


You are familiar with sequences of real numbers such as
1, 1/2, 1/4, 1/8, . . .
or
1, 2, 4, 8, . . .
The first of these has the limit 0, whereas the second one does not
have a limit. One way of saying a sequence of real numbers ai has a
limit a is that beyond some index i, all ai differ from the limit a by
an arbitrarily small amount ε.
Vector sequences in Rn ,

v(0) , v(1) , v(2) , . . . ,

are not all that different. A vector sequence has a limit if each com-
ponent has a limit.

Example 13.9

Let a vector sequence be given by


\[ v^{(i)} = \begin{bmatrix} 1/i \\ 1/i^2 \\ 1/i^3 \end{bmatrix}. \]

This sequence has the limit

\[ v = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \]

Now take the sequence

\[ v^{(i)} = \begin{bmatrix} i \\ 1/i^2 \\ 1/i^3 \end{bmatrix}. \]

It does not have a limit: even though the last two components each
have a limit, the first component diverges.

We say that a vector sequence converges to v with respect to a


norm if for any tolerance ε > 0, there exists an integer m such that
\[ \|v^{(i)} - v\| < \varepsilon \quad \text{for all } i > m. \qquad (13.20) \]
In other words, from some index i on, the distance of any v^{(i)} from v is smaller than an arbitrarily small amount ε. See Figure 13.5 for an
example. If the sequence converges with respect to one norm, it will
converge with respect to all norms. Our focus will be on the (usual)
Euclidean or 2-norm, and the subscript will be omitted.
Vector sequences are key to iterative methods, such as the two
methods for solving Au = b in Section 13.6 and the power method
for finding the dominant eigenvalue in Section 15.2. In practical ap-
plications, the limit vector v is not known. For some special prob-
lems, we can say whether a limit exists; however, we will not know it
a priori. So we will modify our theoretical convergence measure
(13.20) to examine the distance between iterations. This can take

Figure 13.5.
Vector sequences: a sequence that converges.

on different forms depending on the problem at hand. In general, it


will lead to testing the condition

\[ \|v^{(i)} - v^{(i+1)}\| < \varepsilon, \]
which measures the change from one iteration to the next. In the case of an iterative solution to a linear system, u^{(i)}, we will test for the condition that ‖Au^{(i)} − b‖ < ε, which indicates that this iteration has provided an acceptable solution.

13.6 Iterative System Solvers:


Gauss-Jacobi and Gauss-Seidel

In applications such as finite element methods (FEM ) in the context


of the solution of fluid flow problems, scientists are faced with linear
systems with many thousands of equations. Gauss elimination would
work, but would be far too slow. Typically, huge linear systems have
one advantage: the coefficient matrix is sparse, meaning it has only
very few (such as ten) nonzero entries per row. Thus, a 100,000 ×
100,000 system would only have 1,000,000 nonzero entries, compared
to 10,000,000,000 matrix elements! In these cases, one does not store
the whole matrix, but only its nonzero entries, together with their i, j
location. An example of a sparse matrix is shown in Figure 13.1. The
solution to such systems is typically obtained by iterative methods,
which we will discuss next.

Example 13.10

Let a system2 be given by


\[ \begin{bmatrix} 4 & 1 & 0 \\ 2 & 5 & 1 \\ -1 & 2 & 4 \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix}. \]

An iterative method starts from a guess for the solution and then
refines it until it is the solution. Let’s take
\[ u^{(1)} = \begin{bmatrix} u_1^{(1)} \\ u_2^{(1)} \\ u_3^{(1)} \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \]

for our first guess, and note that it clearly is not the solution to our system: Au^{(1)} ≠ b.
A better guess ought to be obtained by using the current guess and solving the first equation for a new u1^(2), the second for a new u2^(2), and so on. This gives us
\[ 4u_1^{(2)} + 1 = 1, \]
\[ 2 + 5u_2^{(2)} + 1 = 0, \]
\[ -1 + 2 + 4u_3^{(2)} = 3, \]
and thus
\[ u^{(2)} = \begin{bmatrix} 0 \\ -0.6 \\ 0.5 \end{bmatrix}. \]
The next iteration becomes
\[ 4u_1^{(3)} - 0.6 = 1, \]
\[ 5u_2^{(3)} + 0.5 = 0, \]
\[ -1.2 + 4u_3^{(3)} = 3, \]
and thus
\[ u^{(3)} = \begin{bmatrix} 0.4 \\ -0.1 \\ 1.05 \end{bmatrix}. \]
2 This example was taken from Johnson and Riess [11].
After a few more iterations, we will be close enough to the true solution
\[ u = \begin{bmatrix} 0.333 \\ -0.333 \\ 1.0 \end{bmatrix}. \]
Try one more iteration for yourself.

This iterative method is known as Gauss-Jacobi iteration. Let us


now formulate this process for the general case. We are given a linear
system with n equations and n unknowns ui , written in matrix form
as
Au = b.
Let us also assume that we have an initial guess u(1) for the solution
vector u.
We now define two matrices D and R as follows: D is the diagonal
matrix whose diagonal elements are those of A, and R is the matrix
obtained from A by setting all its diagonal elements to zero. Clearly
then
A=D+R
and our linear system becomes
\[ Du + Ru = b \]
or
\[ u = D^{-1}[b - Ru]. \]
In the spirit of our previous development, we now write this as
\[ u^{(k+1)} = D^{-1}\bigl[b - R u^{(k)}\bigr], \]

meaning that we attempt to compute a new estimate u(k+1) from


an existing one u(k) . Note that D must not contain zeroes on the
diagonal; this can be achieved by row or column interchanges.

Example 13.11

With this new framework, let us reconsider our last example. We


have
\[ A = \begin{bmatrix} 4 & 1 & 0 \\ 2 & 5 & 1 \\ -1 & 2 & 4 \end{bmatrix}, \quad
R = \begin{bmatrix} 0 & 1 & 0 \\ 2 & 0 & 1 \\ -1 & 2 & 0 \end{bmatrix}, \quad
D^{-1} = \begin{bmatrix} 0.25 & 0 & 0 \\ 0 & 0.2 & 0 \\ 0 & 0 & 0.25 \end{bmatrix}. \]
Then
\[ u^{(2)} = \begin{bmatrix} 0.25 & 0 & 0 \\ 0 & 0.2 & 0 \\ 0 & 0 & 0.25 \end{bmatrix}
\left( \begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix} - \begin{bmatrix} 0 & 1 & 0 \\ 2 & 0 & 1 \\ -1 & 2 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right)
= \begin{bmatrix} 0 \\ -0.6 \\ 0.5 \end{bmatrix}. \]

Will the Gauss-Jacobi method succeed, i.e., will the sequence of


vectors u(k) converge? The answer is: sometimes yes, and sometimes
no. It will always succeed if A is diagonally dominant, which means
that if, for every row, the absolute value of its diagonal element is
larger than the sum of the absolute values of its remaining elements.
In this case, it will succeed no matter what our initial guess u(1) was.
Strictly diagonally dominant matrices are nonsingular. Many prac-
tical problems, for example, finite element ones, result in diagonally
dominant systems.
In a practical setting, how do we determine if convergence is taking
place? Ideally, we would like u(k) = u, the true solution, after a
number of iterations. Equality will most likely not happen, but the
length of the residual vector

\[ \|A u^{(k)} - b\| \]

should become small (i.e., less than some preset tolerance). Thus, we
check the length of the residual vector after each iteration, and stop
once it is smaller than our tolerance.
A modification of the Gauss-Jacobi method is known as Gauss-Seidel iteration. When we compute u^(k+1) in the Gauss-Jacobi method, we can observe the following: the second element, u2^(k+1), is computed using u1^(k), u3^(k), . . . , un^(k). We had just computed u1^(k+1). It stands to reason that using it instead of u1^(k) would be advantageous. This idea gives rise to the Gauss-Seidel method: as soon as a new element ui^(k+1) is computed, the estimate vector u^(k+1) is updated.
In summary, Gauss-Jacobi updates the new estimate vector once
all of its elements are computed, Gauss-Seidel updates as soon as a
new element is computed. Typically, Gauss-Seidel iteration converges
faster than Gauss-Jacobi iteration.
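The two iterations are only a few lines each in Python (NumPy assumed; the function names are ours). Both stop once the residual vector is small enough, as discussed above:

import numpy as np

def gauss_jacobi(A, b, u0, tol=1e-6, max_iter=100):
    # Gauss-Jacobi: update the whole estimate vector at once.
    D_inv = 1.0 / np.diag(A)
    R = A - np.diag(np.diag(A))
    u = np.array(u0, dtype=float)
    for _ in range(max_iter):
        u = D_inv * (b - R @ u)
        if np.linalg.norm(A @ u - b) < tol:
            break
    return u

def gauss_seidel(A, b, u0, tol=1e-6, max_iter=100):
    # Gauss-Seidel: each new element is used immediately.
    n = len(b)
    u = np.array(u0, dtype=float)
    for _ in range(max_iter):
        for i in range(n):
            s = A[i, :i] @ u[:i] + A[i, i+1:] @ u[i+1:]
            u[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(A @ u - b) < tol:
            break
    return u

A = np.array([[4.0, 1, 0], [2, 5, 1], [-1, 2, 4]])   # the system of Example 13.10
b = np.array([1.0, 0, 3])
print(gauss_jacobi(A, b, [1, 1, 1]))    # approaches (0.333, -0.333, 1.0)
print(gauss_seidel(A, b, [1, 1, 1]))    # typically reaches the tolerance sooner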

• reflection matrix
• Householder method
• overdetermined system
• symmetric matrix
• involutary matrix
• orthogonal matrix
• unitary matrix
• vector norm
• vector norm properties
• Euclidean norm
• L2 norm
• ∞ norm
• max norm
• Manhattan norm
• matrix norm
• matrix norm properties
• Frobenius norm
• action ellipse axes
• singular values
• condition number
• well-conditioned matrix
• ill-conditioned matrix
• vector sequence
• convergence
• iterative method
• sparse matrix
• Gauss-Jacobi method
• Gauss-Seidel method
• residual vector

13.7 Exercises
1. Use the Householder method to solve the following linear system
⎡ ⎤ ⎡ ⎤
1 1.1 1.1 1
⎣1 0.9 0.9⎦ u = ⎣ 1 ⎦ .
0 −0.1 0.2 0.3

Notice that the columns are almost linearly dependent.


2. Show that the Householder matrix is involutary.
3. What is the Euclidean norm of
⎡ ⎤
1
⎢1⎥
v=⎢ ⎥
⎣1⎦ .
1

4. Examining the 1- and 2-norms defined in Section 13.2, how would you
define a 3-norm?
5. Show that the 1-norm ‖v‖₁ satisfies the properties (13.6)–(13.9) of a vector
norm.
6. Define a new vector norm to be max{2|v1 |, |v2 |, . . . , |vn |}. Show that
this is indeed a vector norm. For vectors in R2 , sketch the outline of all
unit vectors with respect to this norm.

7. Let ⎡ ⎤
1 0 1
A = ⎣0 1 2⎦ .
0 0 1
What is A’s 2-norm?
8. Let four unit vectors be given by
       
1 0 −1 0
v1 = , v2 = , v3 = , v4 = .
0 1 0 −1

Using the matrix  


1 1
A= ,
0 1
compute the four image vectors Avi . Use these to find an approximation
to A2 .
9. Is the determinant of a matrix a norm?
10. Why is 1 the smallest possible condition number for a nonsingular ma-
trix?
11. What is the condition number of the matrix
 
0.7 0.3
A=
−1 1

that generated Figure 7.11?


12. What can you say about the condition number of a rotation matrix?
13. What is the condition number of the matrix
 
1 0
?
0 0

What type of matrix is it, and is it invertible?


14. Is the condition number of a matrix a norm?
15. Define a vector sequence by
\[ u^{i} = \begin{bmatrix} 0 & 0 & 1/i \\ 0 & 1 & 0 \\ -1/i & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}; \quad i = 1, 2, \ldots. \]

Does it have a limit? If so, what is it?


16. Let a vector sequence be given by
   
\[ u^{(1)} = \begin{bmatrix} 1 \\ -1 \end{bmatrix} \quad \text{and} \quad u^{(i+1)} = \begin{bmatrix} 1/2 & 1/4 \\ 0 & 3/4 \end{bmatrix} u^{(i)}. \]

Will this sequence converge? If so, to what vector?



17. Let a linear system be given by


⎡ ⎤ ⎡ ⎤
4 0 −1 2
A = ⎣2 8 2⎦ and b = ⎣−2⎦ .
1 0 2 0

Carry out three iterations of the Gauss-Jacobi iteration starting with


an initial guess ⎡ ⎤
0
u = ⎣0⎦ .
(1)

0
18. Let a linear system be given by
⎡ ⎤ ⎡ ⎤
4 0 1 0
A = ⎣2 −8 2⎦ and b = ⎣2⎦ .
1 0 2 0

Carry out three iterations of the Gauss-Jacobi iteration starting with


an initial guess ⎡ ⎤
0
u = ⎣0⎦ .
(1)

0
19. Carry out three Gauss-Jacobi iterations for the linear system
⎡ ⎤ ⎡ ⎤
4 1 0 1 0
⎢1 4 1 0⎥ ⎢1⎥
A=⎣ ⎢ ⎥ and b = ⎣ ⎥ ⎢ ,
0 1 4 1⎦ 1⎦
1 0 1 4 0

starting with the initial guess


⎡ ⎤
1
⎢1⎥
u(1) =⎢ ⎥
⎣1⎦ .
1

20. Carry out three iterations of Gauss-Seidel for Example 13.10. Which
method, Gauss-Jacobi or Gauss-Seidel, is converging to the solution
faster? Why?
14
General Linear Spaces

Figure 14.1.
General linear spaces: all cubic polynomials over the interval [0, 1] form a linear
space. Some elements of this space are shown.

In Sections 4.3 and 9.2, we had a first look at the concept of linear
spaces, also called vector spaces, by examining the properties of the
spaces of all 2D and 3D vectors. In this chapter, we will provide a
framework for linear spaces that are not only of dimension two or
three, but of possibly much higher dimension. These spaces tend to
be somewhat abstract, but they are a powerful concept in dealing
with many real-life problems, such as car crash simulations, weather


forecasts, or computer games. The linear space of cubic polynomials


over [0, 1] is important for many applications. Figure 14.1 provides
an artistic illustration of some of the elements of this linear space.
Hence the term “general” in the chapter title refers to the dimension
and abstraction that we will study.

14.1 Basic Properties of Linear Spaces


We denote a linear space of dimension n by Ln . The elements of Ln
are vectors, denoted (as before) by boldface letters such as u or v.
We need two operations to be defined on the elements of Ln : addition
and multiplication by a scalar such as s or t. With these operations
in place, the defining property for a linear space is that any linear
combination of vectors,
w = su + tv, (14.1)
results in a vector in the same space. This is called the linearity
property. Note that both s and t may be zero, asserting that every
linear space has a zero vector in it. This equation is familiar to us by
now, as we first encountered it in Section 2.6, and linear combinations
have been central to working in the linear spaces of R2 and R3 .1
But in this chapter, we want to generalize linear spaces; we want to
include new kinds of vectors. For example, our linear space could be
the quadratic polynomials or all 3×3 matrices. Because the objects in
the linear space are not always in the format of what we traditionally
call a vector, we choose to call these linear spaces, which focuses on the
linearity property rather than vector spaces. Both terms are accepted.
Thus, we will expand our notation for objects of a linear space—we
will not always use boldface vector notation. In this section, we will
look at just a few examples of linear spaces, more examples of linear
spaces will appear in Section 14.5.

Example 14.1
Let’s start with a very familiar example of a linear space: R2. Suppose
\[ u = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad \text{and} \quad v = \begin{bmatrix} -2 \\ 3 \end{bmatrix} \]
are elements of this space; we know that
\[ w = 2u + v = \begin{bmatrix} 0 \\ 5 \end{bmatrix} \]
is also in R2.

1 Note that we will not always use the L notation, but rather the standard

name for the space when one exists.



Example 14.2

Let the linear space M2×2 be the set of all 2 × 2 matrices. We know
that the linearity property (14.1) holds because of the rules of matrix
arithmetic from Section 4.12.

Example 14.3

Let’s define V2 to be all vectors w in R2 that satisfy w₂ ≥ 0. For example, e1 and e2 live in V2. Is this a linear space?
It is not: we can form a linear combination as in (14.1) and produce
\[ v = 0 \cdot e_1 + (-1) \cdot e_2 = \begin{bmatrix} 0 \\ -1 \end{bmatrix}, \]
which is not in V2.

In a linear space Ln , let’s look at a set of r vectors, where 1 ≤


r ≤ n. A set of vectors v1 , . . . , vr is called linearly independent if it
is impossible to express one of them as a linear combination of the
others. For example, the equation

v1 = s2 v2 + s3 v3 + . . . + sr vr

will not have a solution set s2 , . . . , sr in case the vectors v1 , . . . , vr are


linearly independent. As a simple consequence, the zero vector can
be expressed only in a trivial manner in terms of linearly independent
vectors, namely if
0 = s1 v1 + . . . + sr vr
then s1 = . . . = sr = 0. If the zero vector can be expressed as a
nontrivial combination of r vectors, then we say these vectors are
linearly dependent.
If the vectors v1 , . . . , vr are linearly independent, then the set of all
vectors that may be expressed as a linear combination of them forms
a subspace of Ln of dimension r. We also say this subspace is spanned
by v1 , . . . , vr . If this subspace equals the whole space Ln , then we
call v1 , . . . , vn a basis for Ln . If Ln is a linear space of dimension n,
then any n + 1 vectors in it are linearly dependent.

Example 14.4

Let’s consider one very familiar linear space: R3 . A commonly used


basis for R3 are the linearly independent vectors e1 , e2 , e3 . A linear
combination of these basis vectors, for example,
\[ v = \begin{bmatrix} 3 \\ 4 \\ 7 \end{bmatrix} = 3\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + 4\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + 7\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \]
is also in R3 . Then the four vectors v, e1 , e2 , e3 are linearly dependent.
Any one of the four vectors forms a one-dimensional subspace of
R3 . Since v and each of the ei are linearly independent, any two
vectors here form a two-dimensional subspace of R3 .

Example 14.5

In a 4D space R4 , let three vectors be given by


⎡ ⎤ ⎡ ⎤ ⎡ ⎤
−1 5 3
⎢ 0⎥ ⎢ 0⎥ ⎢ 0⎥
v1 = ⎢
⎣ 0⎦ ,
⎥ v2 = ⎢ ⎥
⎣−3⎦ , v3 = ⎢ ⎥
⎣−3⎦ .
1 1 0
These three vectors are linearly dependent since
v2 = v1 + 2v3 or 0 = v1 − v2 + 2v3 .
Our set {v1 , v2 , v3 } contains only two linearly independent vectors,
hence any two of them span a subspace of R4 of dimension two.

Example 14.6

In a 3D space R3 , let four vectors be given by


⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
−1 1 1 0
v1 = ⎣ 0⎦ , v2 = ⎣2⎦ , v3 = ⎣ 2⎦ , v4 = ⎣ 0⎦ .
0 0 −3 −3
These four vectors are linearly dependent since
v3 = −v1 + 2v2 + v4 .
However, any set of three of these vectors is a basis for R3 .
14.2 Linear Maps


The linear map A that transforms the linear space Ln to the linear
space Lm , written as A : Ln → Lm , preserves linear relationships.
Let three preimage vectors v1 , v2 , v3 in Ln be mapped to three image
vectors Av1 , Av2 , Av3 in Lm . If there is a linear relationship among
the preimages, vi , then the same relationship will hold for the images:

v1 = αv2 + βv3 implies Av1 = αAv2 + βAv3 . (14.2)

Maps that do not have this property are called nonlinear maps and
are much harder to deal with.
Linear maps are conveniently written in terms of matrices. A vector
v in Ln is mapped to v̂ in Lm,

    v̂ = Av,

where A has m rows and n columns. The matrix A describes the map
from the [e1, . . . , en]-system to the [a1, . . . , an]-system, where the ai
are in Lm. This means that v̂ is a linear combination of the ai,

    v̂ = v1 a1 + v2 a2 + . . . + vn an,

and therefore it is in the column space of A.

Example 14.7

Suppose we have a map A : R2 → R3, defined by

    A = ⎡1  0⎤
        ⎢0  1⎥ .
        ⎣2  2⎦

And suppose we have three vectors in R2,

    v1 = [1, 0]T,  v2 = [0, 1]T,  v3 = [2, 1]T,

which are mapped to

    v̂1 = [1, 0, 2]T,  v̂2 = [0, 1, 2]T,  v̂3 = [2, 1, 6]T,

in R3 by A.
The vi are linearly dependent since v3 = 2v1 + v2. This same
linear combination holds for the v̂i, asserting that the map preserves
linear relationships.
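
To see (14.2) numerically, the following sketch (ours, assuming NumPy)
applies the matrix A of Example 14.7 to the preimages and confirms that the
relationship v3 = 2v1 + v2 carries over to the images:

    import numpy as np

    A = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [2.0, 2.0]])        # the 3 x 2 matrix of Example 14.7

    v1 = np.array([1.0, 0.0])
    v2 = np.array([0.0, 1.0])
    v3 = 2 * v1 + v2                  # linear relationship among preimages

    # Map all three vectors and test the same relationship among the images.
    w1, w2, w3 = A @ v1, A @ v2, A @ v3
    print(np.allclose(w3, 2 * w1 + w2))   # True: the relationship is preserved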

The matrix A has a certain rank k—how can we infer this rank from
the matrix? First of all, a matrix of size m × n can be at most of rank
k = min{m, n}. This is called full rank. In other words, a linear map
can never increase dimension. It is possible to map Ln to a higher-
dimensional space Lm . However, the images of Ln ’s n basis vectors
will span a subspace of Lm of dimension at most n. Example 14.7
demonstrates this idea: the matrix A has rank 2, thus the v̂i live
in a dimension 2 subspace of R3 . A matrix with rank less than this
min{m, n} is called rank deficient. We perform forward elimination
(possibly with row exchanges) until the matrix is in upper triangular
form. If after forward elimination there are k nonzero rows, then the
rank of A is k. This is equivalent to our definition in Section 4.2
that the rank is equal to the number of linearly independent column
vectors. Figure 14.2 gives an illustration of some possible scenarios.

Figure 14.2.
The three types of matrices: from left to right, m < n, m = n, m > n. Examples of full
rank matrices are on the top row, and examples of rank deficient matrices are on the
bottom row. In each, gray indicates nonzero entries and white indicates zero entries
after forward elimination was performed.
Example 14.8

Let us determine the rank of the matrix

    ⎡ 1  3  4⎤
    ⎢ 0  1  2⎥
    ⎢ 1  2  2⎥ .
    ⎣−1  1  1⎦

We perform forward elimination to obtain

    ⎡1  3   4⎤
    ⎢0  1   2⎥
    ⎢0  0  −3⎥ .
    ⎣0  0   0⎦

There is one row of zeroes, and we conclude that the matrix has
rank 3, which is full rank since min{4, 3} = 3.
Next, let us take the matrix

    ⎡1  3  4⎤
    ⎢0  1  2⎥
    ⎢1  2  2⎥ .
    ⎣0  1  2⎦

Forward elimination yields

    ⎡1  3  4⎤
    ⎢0  1  2⎥
    ⎢0  0  0⎥ ,
    ⎣0  0  0⎦

and we conclude that this matrix has rank 2, which is rank deficient.
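
In practice the elimination of Example 14.8 would be delegated to a numerical
routine that counts how many directions survive, up to a tolerance. A sketch
of such a check (ours, assuming NumPy) for the two matrices above:

    import numpy as np

    A1 = np.array([[1, 3, 4],
                   [0, 1, 2],
                   [1, 2, 2],
                   [-1, 1, 1]], dtype=float)

    A2 = np.array([[1, 3, 4],
                   [0, 1, 2],
                   [1, 2, 2],
                   [0, 1, 2]], dtype=float)

    # matrix_rank counts the singular values above a tolerance, which matches
    # the number of nonzero rows after forward elimination.
    print(np.linalg.matrix_rank(A1))   # 3 -> full rank, since min{4,3} = 3
    print(np.linalg.matrix_rank(A2))   # 2 -> rank deficient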

Let’s review some other features of linear maps that we have en-
countered in earlier chapters. A square, n × n matrix A of rank n is
invertible, which means that there is a matrix that undoes A’s action.
This is the inverse matrix, denoted by A−1 . See Section 12.4 on how
to compute the inverse.
If a matrix is invertible, then it does not reduce dimension and its
determinant is nonzero. The determinant of a square matrix measures
the volume of the n-dimensional parallelepiped, which is defined by
its column vectors. The determinant of a matrix is computed by
subjecting its column vectors to a sequence of shears until it is of
upper triangular form (forward elimination). The value is then the
product of the diagonal elements. (If pivoting takes place, the row
exchanges need to be documented in order to determine the correct
sign of the determinant.)

14.3 Inner Products


Let u, v, w be elements of a linear space Ln and let α and β be scalars.
A map from Ln to the reals R is called an inner product if it assigns
a real number ⟨v, w⟩ to the pair v, w such that:

    ⟨v, w⟩ = ⟨w, v⟩                          (symmetry),      (14.3)
    ⟨αv, w⟩ = α⟨v, w⟩                        (homogeneity),   (14.4)
    ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩             (additivity),    (14.5)
    ⟨v, v⟩ ≥ 0 for all v and
    ⟨v, v⟩ = 0 if and only if v = 0          (positivity).    (14.6)

The homogeneity and additivity properties can be nicely combined
into

    ⟨αu + βv, w⟩ = α⟨u, w⟩ + β⟨v, w⟩.
A linear space with an inner product is called an inner product space.
We can consider the inner product to be a generalization of the dot
product. For the dot product in Rn , we write

    ⟨v, w⟩ = v · w = v1 w1 + v2 w2 + . . . + vn wn .

From our experience with 2D and 3D, we can easily show that the
dot product satisfies the inner product properties.

Example 14.9

Suppose u, v, w live in R2. Let's define the following “test” inner
product in R2,

    ⟨v, w⟩ = 4v1 w1 + 2v2 w2 ,                    (14.7)

which is an odd variation of the standard dot product. To get a feel
for it, consider the three unit vectors, e1, e2, and r = [1/√2, 1/√2]T,
then

    ⟨e1 , e2 ⟩ = 4(1)(0) + 2(0)(1) = 0,

Figure 14.3.
Inner product: comparing the dot product (black) with the test inner product from
Example 14.9 (gray). For each plot, the unit vector r is rotated in the range [0, 2π]. Left:
the inner products e1 · r and ⟨e1, r⟩. Middle: the graph of length for the inner products,
√(r · r) and √⟨r, r⟩. Right: distance for the inner products, √((e1 − r) · (e1 − r)) and
√⟨(e1 − r), (e1 − r)⟩.

which is the same result as the dot product, but

    ⟨e1 , r⟩ = 4(1)(1/√2) + 2(0)(1/√2) = 4/√2

differs from e1 · r = 1/√2. In Figure 14.3 (left), this test inner product
and the dot product are graphed for r that is rotated in the range
[0, 2π]. Notice that the graph of the dot product is the graph of cos(θ).
We will now show that this definition satisfies the inner product
properties.

    Symmetry:     ⟨v, w⟩ = 4v1 w1 + 2v2 w2
                         = 4w1 v1 + 2w2 v2
                         = ⟨w, v⟩.

    Homogeneity:  ⟨αv, w⟩ = 4(αv1)w1 + 2(αv2)w2
                          = α(4v1 w1 + 2v2 w2)
                          = α⟨v, w⟩.

    Additivity:   ⟨u + v, w⟩ = 4(u1 + v1)w1 + 2(u2 + v2)w2
                             = (4u1 w1 + 2u2 w2) + (4v1 w1 + 2v2 w2)
                             = ⟨u, w⟩ + ⟨v, w⟩.

    Positivity:   ⟨v, v⟩ = 4v1² + 2v2² ≥ 0,
                  ⟨v, v⟩ = 0 if and only if v = 0.
We can question the usefulness of this inner product, but it does
satisfy the necessary properties.

Inner product spaces offer the concept of length,

    ‖v‖ = √⟨v, v⟩.

This is also called the 2-norm or Euclidean norm and is denoted as
‖v‖2. Since the 2-norm is the most commonly used norm, the sub-
script is typically omitted. (More vector norms were introduced in
Section 13.2.) Then the distance between two vectors,

    dist(u, v) = √⟨u − v, u − v⟩ = ‖u − v‖.

For vectors in Rn and the dot product, we have the Euclidean norm

    ‖v‖ = √(v1² + v2² + . . . + vn²)

and

    dist(u, v) = √((u1 − v1)² + (u2 − v2)² + . . . + (un − vn)²).

Example 14.10

Let's get a feel for the norm and distance concept for the test inner
product in (14.7). We have

    ‖e1‖ = √⟨e1, e1⟩ = √(4(1)² + 2(0)²) = √4 = 2,
    dist(e1, e2) = √(4(1 − 0)² + 2(0 − 1)²) = √6.

The dot product produces ‖e1‖ = 1 and dist(e1, e2) = √2.
Figure 14.3 illustrates the difference between the dot product and
the test inner product. Again, r is a unit vector, rotated in the range
[0, 2π]. The middle plot shows that unit length vectors with respect
to the dot product are not unit length with respect to the test inner
product. The right plot shows that the distance between two vectors
differs too.
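
The test inner product (14.7) and the quantities of Example 14.10 are easy to
reproduce in code. The sketch below is our illustration (assuming NumPy); any
function with the four inner product properties could be passed in its place:

    import numpy as np

    def test_inner(v, w):
        # the "test" inner product of (14.7): <v, w> = 4 v1 w1 + 2 v2 w2
        return 4.0 * v[0] * w[0] + 2.0 * v[1] * w[1]

    def norm(v, inner):
        return np.sqrt(inner(v, v))

    def dist(u, v, inner):
        return norm(u - v, inner)

    e1 = np.array([1.0, 0.0])
    e2 = np.array([0.0, 1.0])

    print(norm(e1, test_inner))       # 2.0, i.e., sqrt(4)
    print(dist(e1, e2, test_inner))   # sqrt(6), about 2.449
    print(norm(e1, np.dot))           # 1.0 with the dot product
    print(dist(e1, e2, np.dot))       # sqrt(2), about 1.414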
Orthogonality is important to establish as well. If two vectors v
and w in a linear space Ln satisfy

    ⟨v, w⟩ = 0,

then they are called orthogonal. If v1, . . . , vn form a basis for Ln and
all vi are mutually orthogonal, ⟨vi, vj⟩ = 0 for i ≠ j, then the vi
are said to form an orthogonal basis. If in addition they are also of
unit length, ‖vi‖ = 1, they form an orthonormal basis. Any basis of
a linear space may be transformed to an orthonormal basis by the
Gram-Schmidt process, described in Section 14.4.
The Cauchy-Schwartz inequality was introduced in (2.23) for R2,
and we repeat it here in the context of inner product spaces,

    ⟨v, w⟩² ≤ ⟨v, v⟩⟨w, w⟩.

Equality holds if and only if v and w are linearly dependent.
If we restate the Cauchy-Schwartz inequality

    ⟨v, w⟩² ≤ ‖v‖²‖w‖²

and rearrange,

    ( ⟨v, w⟩ / (‖v‖‖w‖) )² ≤ 1,

we obtain

    −1 ≤ ⟨v, w⟩ / (‖v‖‖w‖) ≤ 1.

Now we can express the angle θ between v and w by

    cos θ = ⟨v, w⟩ / (‖v‖‖w‖).

The properties defining an inner product (14.3)–(14.6) suggest

    ‖v‖ ≥ 0,
    ‖v‖ = 0 if and only if v = 0,
    ‖αv‖ = |α|‖v‖.

A fourth property is the triangle inequality,

    ‖v + w‖ ≤ ‖v‖ + ‖w‖,

which we derived from the Cauchy-Schwartz inequality in Section 2.9.


See Sketch 2.22 for an illustration of this with respect to a triangle
formed from two vectors in R2 . See the exercises for the generalized
Pythagorean theorem.
The tools associated with the inner product are key for orthogonal
decomposition and best approximation in a linear space. Recall that
these concepts were introduced in Section 2.8 and have served as the
building blocks for the 3D Gram-Schmidt method for construction
of an orthonormal coordinate frame (Section 11.8) and least squares
approximation (Section 12.7).
We finish with the general definition of a projection. Let u1 , . . . , uk
span a subspace Lk of L. If v is a vector not in that space, then

    P v = ⟨v, u1⟩u1 + . . . + ⟨v, uk⟩uk

is v’s orthogonal projection into Lk .

14.4 Gram-Schmidt Orthonormalization


In Section 11.8 we built an orthonormal coordinate frame in R3 using
the Gram-Schmidt method. Every inner product space has an or-
thonormal basis. The key elements of its construction are projections
and vector decomposition.
Let b1 , . . . , br be a set of orthonormal vectors, forming the basis of
an r-dimensional subspace Sr of Ln , where n > r. We want to find
br+1 orthogonal to the given bi . Let u be an arbitrary vector in Ln ,
but not in Sr . Define a vector û by

    û = projSr u = ⟨u, b1⟩b1 + . . . + ⟨u, br⟩br .

This vector is u's orthogonal projection into Sr. To see this, we check
that the difference vector u − û is orthogonal to each of the bi. We
first check that it is orthogonal to b1 and observe

    ⟨u − û, b1⟩ = ⟨u, b1⟩ − ⟨u, b1⟩⟨b1, b1⟩ − . . . − ⟨u, br⟩⟨b1, br⟩.

All terms ⟨b1, b2⟩, ⟨b1, b3⟩, etc. vanish since the bi are mutually
orthogonal and ⟨b1, b1⟩ = 1. Thus, ⟨u − û, b1⟩ = 0. In the same
manner, we show that u − û is orthogonal to the remaining bi. Thus

    br+1 = (u − projSr u) / ‖ · ‖ ,
and the set b1 , . . . , br+1 forms an orthonormal basis for the subspace
Sr+1 of Ln . We may repeat this process until we have found an
orthonormal basis for all of Ln .
Sketch 14.1 provides an illustration in which S2 is depicted as R2.
This process is known as Gram-Schmidt orthonormalization. Given
any basis v1, . . . , vn of Ln, we can find an orthonormal basis by set-
ting b1 = v1 / ‖ · ‖ and continuing to construct, one by one, vectors
b2, . . . , bn using the above procedure. For instance,

    b2 = (v2 − projS1 v2) / ‖ · ‖ = (v2 − ⟨v2, b1⟩b1) / ‖ · ‖ ,
    b3 = (v3 − projS2 v3) / ‖ · ‖ = (v3 − ⟨v3, b1⟩b1 − ⟨v3, b2⟩b2) / ‖ · ‖ .

Consult Section 11.8 for an example in R3.

Sketch 14.1.
Gram-Schmidt orthonormalization.
Example 14.11

Suppose we are given the following vectors in R4,

    v1 = [1, 0, 0, 0]T,  v2 = [1, 1, 1, 1]T,  v3 = [1, 1, 0, 0]T,  v4 = [0, 0, 1, 0]T,

and we wish to form an orthonormal basis, b1, b2, b3, b4.
The Gram-Schmidt method, as defined in the displayed equations
above, produces

    b1 = [1, 0, 0, 0]T,  b2 = [0, 1/√3, 1/√3, 1/√3]T,  b3 = [0, 2/√6, −1/√6, −1/√6]T.

The final vector, b4, is defined as

    b4 = (v4 − ⟨v4, b1⟩b1 − ⟨v4, b2⟩b2 − ⟨v4, b3⟩b3) / ‖ · ‖ = [0, 0, 1/√2, −1/√2]T.

Knowing that the bi are normalized and checking that bi · bj = 0,
we can be confident that this is an orthonormal basis. Another tool
we have is the determinant, which will be one,

    det [b1  b2  b3  b4] = 1.
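
The construction above translates almost line by line into code. The following
sketch (ours, assuming NumPy and the dot product as the inner product)
reproduces the orthonormal basis of Example 14.11:

    import numpy as np

    def gram_schmidt(vectors, inner=np.dot):
        """Return an orthonormal basis for the span of the given vectors."""
        basis = []
        for v in vectors:
            u = v.astype(float)
            for b in basis:
                u = u - inner(v, b) * b      # subtract the projection onto b
            u = u / np.sqrt(inner(u, u))     # normalize (assumes independence)
            basis.append(u)
        return basis

    v1 = np.array([1, 0, 0, 0])
    v2 = np.array([1, 1, 1, 1])
    v3 = np.array([1, 1, 0, 0])
    v4 = np.array([0, 0, 1, 0])

    for b in gram_schmidt([v1, v2, v3, v4]):
        print(np.round(b, 4))
    # b2 = [0, 0.5774, 0.5774, 0.5774] = [0, 1/sqrt(3), ...], and so on,
    # matching Example 14.11.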
14.5 A Gallery of Spaces


In this section, we highlight some special linear spaces—but there are
many more!
For a first example, consider all polynomials of a fixed degree n.
These are functions of the form
p(t) = a0 + a1 t + a2 t2 + . . . + an tn
where t is the independent variable of p(t). Thus we can construct a
linear space Pn whose elements are all polynomials of a fixed degree.
Addition in this space is addition of polynomials, i.e., coefficient by
coefficient; multiplication is multiplication of a polynomial by a real
number. It is easy to check that these polynomials have the linearity
property (14.1). For example, if p(t) = 3 − 2t + 3t2 and q(t) =
−1+t+2t2 , then 2p(t)+3q(t) = 3−t+12t2 is yet another polynomial
of the same degree.
This example also serves to introduce a not-so-obvious linear map:
the operation of forming derivatives! The derivative p′ of a degree n
polynomial p is a polynomial of degree n − 1, given by

    p′(t) = a1 + 2a2 t + . . . + n an tn−1.

The linear map of forming derivatives thus maps the space of all
degree n polynomials into that of all degree n − 1 polynomials. The
rank of this map is n, the dimension of the space of degree n − 1
polynomials.

Example 14.12

Let us consider the two cubic polynomials

    p(t) = 3 − t + 2t² + 3t³ and q(t) = 1 + t − t³

in the linear space of cubic polynomials, P3. Let

    r(t) = 2p(t) − q(t) = 5 − 3t + 4t² + 7t³.

Now

    r′(t) = −3 + 8t + 21t²,
    p′(t) = −1 + 4t + 9t²,
    q′(t) = 1 − 3t².

It is now trivial to check that r′(t) = 2p′(t) − q′(t), thus asserting the
linearity of the derivative map.
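
Since a polynomial in P3 is determined by its coefficients (a0, a1, a2, a3),
the derivative map can be written as a matrix acting on coefficient vectors.
The sketch below (our illustration, assuming NumPy) encodes Example 14.12
this way:

    import numpy as np

    # Derivative map on cubic polynomials, acting on coefficient vectors
    # (a0, a1, a2, a3) of p(t) = a0 + a1 t + a2 t^2 + a3 t^3.
    D = np.array([[0, 1, 0, 0],
                  [0, 0, 2, 0],
                  [0, 0, 0, 3]], dtype=float)   # 3 x 4: maps P3 into P2

    p = np.array([3, -1, 2, 3], dtype=float)    # p(t) = 3 - t + 2t^2 + 3t^3
    q = np.array([1, 1, 0, -1], dtype=float)    # q(t) = 1 + t - t^3
    r = 2 * p - q                               # r(t) = 5 - 3t + 4t^2 + 7t^3

    # Linearity of the derivative map: (2p - q)' = 2p' - q'
    print(D @ r)                   # [-3.  8. 21.]
    print(2 * (D @ p) - D @ q)     # the same coefficients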
For a second example, a linear space is given by the set of all real-
valued continuous functions over the interval [0, 1]. This space is
typically named C[0, 1]. Clearly the linearity condition is met: if f
and g are elements of C[0, 1], then αf + βg is also in C[0, 1]. Here
we have an example of a linear space that is infinite-dimensional,
meaning that no finite set of functions forms a basis for C[0, 1].
For a third example, consider the set of all 3 × 3 matrices. They
form a linear space; this space consists of matrices. In this space,
linear combinations are formed using standard matrix addition and
multiplication with a scalar as summarized in Section 9.11.
And, finally, a more abstract example. The set of all linear maps
from a linear space Ln into the reals forms a linear space itself and
it is called the dual space L∗n of Ln . As indicated by the notation, its
dimension equals that of Ln . The linear maps in L∗n are known as
linear functionals.
For an example, let a fixed vector v and a variable vector u be
in Ln. The linear functionals defined by Φv(u) = ⟨u, v⟩ are in L∗n.
Then, for any basis b1, . . . , bn of Ln, we can define linear functionals

    Φbi(u) = ⟨u, bi⟩ for i = 1, . . . , n.

These functionals form a basis for L∗n.

Example 14.13

In R2, consider the fixed vector

    v = [1, −2]T.

Then Φv(u) = ⟨u, v⟩ = u1 − 2u2 for all vectors u, where ⟨·, ·⟩ is the
dot product.

Example 14.14

Pick e1 , e2 for a basis in R2 . The associated linear functionals are


Φe1 (u) = u1 , Φe2 (u) = u2 .
Any linear functional Φ can now be defined as
Φ(u) = r1 Φe1 (u) + r2 Φe2 (u),
where r1 and r2 are scalars.
• linear space
• vector space
• dimension
• linear combination
• linearity property
• linearly independent
• subspace
• span
• linear map
• image
• preimage
• domain
• range
• rank
• full rank
• rank deficient
• inverse
• determinant
• inner product
• inner product space
• distance in an inner product space
• length in an inner product space
• orthogonal
• Gram-Schmidt method
• projection
• basis
• orthonormal
• orthogonal decomposition
• best approximation
• dual space
• linear functional

14.6 Exercises
1. Given elements of R4 ,
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 0 3
⎢2⎥ ⎢0⎥ ⎢1⎥
u=⎢ ⎥
⎣0⎦ , v=⎢ ⎥
⎣2⎦ , w=⎢ ⎥
⎣2⎦ ,
4 7 0

is r = 3u + 6v + 2w also in R4 ?
2. Given matrices that are elements of M3×3 ,
⎡ ⎤ ⎡ ⎤
1 0 2 3 0 0
A = ⎣2 0 1⎦ and B = ⎣0 3 1⎦ ,
1 1 3 4 1 7

is C = 4A + B an element of M3×3 ?
3. Does the set of all polynomials with an = 1 form a linear space?
4. Does the set of all 3D vectors with nonnegative components form a
subspace of R3 ?
14.6. Exercises 319

5. Are ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 0
⎢0⎥ ⎢1⎥ ⎢1⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥
u=⎢ ⎥
⎢1⎥ , v=⎢ ⎥
⎢1⎥ , w=⎢ ⎥
⎢0⎥
⎣0⎦ ⎣1⎦ ⎣1⎦
1 1 0
linearly independent?
6. Are ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 3 2
u = ⎣2⎦ , v = ⎣6⎦ , w = ⎣2⎦
1 1 1
linearly independent?
7. Is the vector ⎡ ⎤
2
⎢3⎥
⎢ ⎥
r=⎢ ⎥
⎢2⎥
⎣3⎦
2
in the subspace defined by u, v, w defined in Exercise 5?
8. Is the vector ⎡ ⎤
2
⎢3⎥
⎢ ⎥
r=⎢ ⎥
⎢2⎥
⎣3⎦
2
in the subspace defined by u, v, w defined in Exercise 6?
9. What is the dimension of the linear space formed by all n × n matrices?
10. Suppose we are given a linear map A : R4 → R2 , preimage vectors
vi , and corresponding image vectors wi . What are the dimensions
of the matrix A? The following linear relationship exists among the
preimages,
v4 = 3v1 + 6v2 + 9v3 .
What relationship holds for w4 with respect to the wi ?
11. What is the rank of the matrix
⎡ ⎤
1 2 0
⎢−1 −2 1⎥
⎢ ⎥?
⎣ 0 0 1⎦
2 4 −1

12. What is the rank of the matrix


⎡ ⎤
1 2 0 0 0
⎣−1 0 0 −2 1⎦?
0 0 1 0 1
320 14. General Linear Spaces

13. Given the vectors


 √   √ 
1/√10 −3/√ 10
w= and r= ,
3/ 10 1/ 10

find w, r
, w, r, and dist(w, r) with respect to the dot product
and then for the test inner product in (14.7).
14. For v, w in R3, does

    ⟨v, w⟩ = v1²w1² + v2²w2² + v3²w3²

satisfy the requirements of an inner product?

15. For v, w in R3, does

    ⟨v, w⟩ = 4v1 w1 + v2 w2 + 2v3 w3

satisfy the requirements of an inner product?

16. In the space of all 3 × 3 matrices, is

    ⟨A, B⟩ = a1,1 b1,1 + a1,2 b1,2 + a1,3 b1,3 + a2,1 b2,1 + . . . + a3,3 b3,3

an inner product?

17. Let p(t) = p0 + p1 t + p2 t² and q(t) = q0 + q1 t + q2 t² be two quadratic
polynomials. Define

    ⟨p, q⟩ = p0 q0 + p1 q1 + p2 q2 .

Is this an inner product for the space of all quadratic polynomials?


18. Show that the Pythagorean theorem

||v + w||2 = ||v||2 + ||w||2

holds for orthogonal vectors v and w in an inner product space.


19. Given the following vectors in R3 ,
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 0 −1
v1 = ⎣1⎦ , v2 = ⎣1⎦ , v3 = ⎣−1⎦ ,
0 0 −1

form an orthonormal basis, b1 , b2 , b3 .


20. Given the following vectors in R4 ,
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
0 1 1 1
⎢0⎥ ⎢1⎥ ⎢1⎥ ⎢0⎥
v1 = ⎢ ⎥ ⎢ ⎥
⎣1⎦ , v2 = ⎣1⎦ , v3 = ⎢ ⎥
⎣0⎦ , v4 = ⎢ ⎥
⎣0⎦ ,
0 1 0 0

form an orthonormal basis, b1 = v1 , b2 , b3 , b4 .


14.6. Exercises 321

21. Find a basis for the linear space formed by all 2 × 2 matrices.
22. Does the set of all monotonically increasing functions over [0, 1] form
a linear space?
23. Let L be a linear space. Is the map Φ(u) = ‖u‖ an element of the dual
space L∗ ?
24. Show the linearity of the derivative map on the linear space of quadratic
polynomials P2 .
This page intentionally left blank
15
Eigen Things Revisited

Figure 15.1.
Google matrix: part of the connectivity matrix for Wikipedia pages in 2009, which is
used to find the webpage ranking. (Source: Wikipedia, Google matrix.)

Chapter 7 “Eigen Things” introduced the basics of eigenvalues and


eigenvectors in terms of 2×2 matrices. But also for any n×n matrix A,


we may ask if it has fixed directions and what are the corresponding
eigenvalues.
In this chapter we go a little further and examine the power method
for finding the eigenvector that corresponds to the dominant eigen-
value. This method is paired with an application section describing
how a search engine might rank webpages based on this special eigen-
vector, given a fun, slang name—the Google eigenvector.
We explore “Eigen Things” of function spaces that are even more
general than those in the gallery in Section 14.5.
“Eigen Things” characterize a map by revealing its action and ge-
ometry. This is key to understanding the behavior of any system.
A great example of this interplay is provided by the collapse of the
Tacoma Narrows Bridge in Figures 7.1 and 7.2. But “Eigen Things”
are important in many other areas: characterizing harmonics of musi-
cal instruments, moderating movement of fuel in a ship, and analysis
of large data sets, such as the Google matrix in Figure 15.1.

15.1 The Basics Revisited


If an n × n matrix A has fixed directions, then there are vectors r
that are mapped to multiples λ of themselves by A. Such vectors are
characterized by
Ar = λr
or
[A − λI]r = 0. (15.1)
Since r = 0 trivially satisfies this equation, we will not consider it from
now on. In (15.1), we see that the matrix [A − λI] maps a nonzero
vector r to the zero vector. Thus, its determinant must vanish (see
Section 4.9):
det[A − λI] = 0. (15.2)
The term det[A − λI] is called the characteristic polynomial of A. It
is a polynomial of degree n in λ, and its zeroes are A’s eigenvalues.

Example 15.1

Let

    A = ⎡1  1  0  0⎤
        ⎢0  3  1  0⎥
        ⎢0  0  4  1⎥ .
        ⎣0  0  0  2⎦
We find the degree four characteristic polynomial p(λ):

    p(λ) = det[A − λI] = | 1−λ    1      0      0   |
                         |  0    3−λ     1      0   |
                         |  0     0     4−λ     1   |
                         |  0     0      0     2−λ  | ,
resulting in
p(λ) = (1 − λ)(3 − λ)(4 − λ)(2 − λ).
The zeroes of this polynomial are found by solving p(λ) = 0. In our
slightly contrived example, we find λ1 = 4, λ2 = 3, λ3 = 2, λ4 = 1.
Convention orders the eigenvalues in decreasing order. Recall that the
largest eigenvalue in absolute value is called the dominant eigenvalue.
Observe that this matrix is upper triangular. Forming p(λ) with
expansion by minors (see Section 9.8), it is clear that the eigenvalues
for this type of matrix are on the main diagonal.
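
For a matrix that is not conveniently triangular, one normally hands the
problem to a numerical eigenvalue routine. As a quick check of Example 15.1,
here is a sketch (ours, assuming NumPy):

    import numpy as np

    A = np.array([[1, 1, 0, 0],
                  [0, 3, 1, 0],
                  [0, 0, 4, 1],
                  [0, 0, 0, 2]], dtype=float)

    eigenvalues, eigenvectors = np.linalg.eig(A)   # columns of 'eigenvectors' are the r_i
    print(np.sort(eigenvalues)[::-1])              # [4. 3. 2. 1.], the diagonal entries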

We learned that Gauss elimination or LU decomposition allows us


to transform a matrix into upper triangular form. However, elemen-
tary row operations change the eigenvalues, so this is not a means for
finding eigenvalues. Instead, diagonalization is used when possible,
and this is the topic of Chapter 16.

Example 15.2

Here is a simple example showing that elementary row operations
change the eigenvalues. Start with the matrix

    A = ⎡2  2⎤
        ⎣1  2⎦

that has det A = 2 and eigenvalues λ1 = 2 + √2 and λ2 = 2 − √2.
After one step of forward elimination, we have an upper triangular
matrix

    A′ = ⎡2  2⎤
         ⎣0  1⎦ .

The determinant is invariant under forward elimination, det A′ = 2;
however, the eigenvalues are not: A′ has eigenvalues λ1 = 2 and
λ2 = 1.
The bad news is that one is not always dealing with upper trian-
gular matrices like the one in Example 15.1. A general n × n matrix
has a degree n characteristic polynomial

p(λ) = det[A − λI] = (λ1 − λ)(λ2 − λ) · . . . · (λn − λ), (15.3)

and the eigenvalues are the zeroes of this polynomial. Finding the
zeroes of an nth degree polynomial is a nontrivial numerical task.
In fact, for n ≥ 5 there is no general algebraic formula for the zeroes,
unlike the quadratic formula for n = 2, so the factorization in (15.3)
cannot, in general, be found exactly. An iterative method for finding
the dominant eigenvalue is described in Section 15.2.
For 2 × 2 matrices, in (7.3) and (7.4), we observed that the charac-
teristic polynomial easily reveals that the determinant is the product
of the eigenvalues. For n × n matrices, we have the same situation.
Consider λ = 0 in (15.3), then p(0) = det A = λ1 λ2 · . . . · λn .
Needless to say, not all eigenvalues of a matrix are real in general.
But the important class of symmetric matrices always does have real
eigenvalues.
Two more properties of eigenvalues:
• The matrices A and AT have the same eigenvalues.
• Suppose A is invertible and has eigenvalues λi , then A−1 has eigen-
values 1/λi .
Having found the λi , we can now solve homogeneous linear systems

[A − λi I]ri = 0

in order to find the eigenvectors ri for i = 1, . . . , n. The ri are in the null


space of [A − λi I]. These are homogeneous systems, and thus have
no unique solution. Oftentimes, one will normalize all eigenvectors in
order to eliminate this ambiguity.

Example 15.3

Now we find the eigenvectors for the matrix in Example 15.1. Starting
with λ1 = 4, the corresponding homogeneous linear system is

    ⎡−3   1   0   0⎤
    ⎢ 0  −1   1   0⎥ r1 = 0,
    ⎢ 0   0   0   1⎥
    ⎣ 0   0   0  −2⎦

and we solve it using the homogeneous linear system techniques from
Section 12.3. Repeating for all eigenvalues, we find eigenvectors

    r1 = [1/3, 1, 1, 0]T,  r2 = [1/2, 1, 0, 0]T,
    r3 = [1/2, 1/2, −1/2, 1]T,  r4 = [1, 0, 0, 0]T.

For practice working with homogeneous systems, work out the details.
Check that each eigenvector satisfies Ari = λi ri.

If some of the eigenvalues are multiple zeroes of the characteristic


polynomial, for example, λ1 = λ2 = λ, then we have two identical
homogeneous systems [A − λI]r = 0. Each has the same solution
vector r (instead of distinct solution vectors r1 , r2 for distinct eigen-
values). We thus see that repeated eigenvalues reduce the number of
eigenvectors.

Example 15.4

Let

    A = ⎡1  2  3⎤
        ⎢0  2  0⎥ .
        ⎣0  0  2⎦

This matrix has eigenvalues λi = 2, 2, 1. Finding the eigenvector
corresponding to λ1 = λ2 = 2, we get two identical homogeneous
systems

    ⎡−1  2  3⎤
    ⎢ 0  0  0⎥ r1 = 0.
    ⎣ 0  0  0⎦

We set r3,1 = r2,1 = 1, and back substitution gives r1,1 = 5. The
homogeneous system corresponding to λ3 = 1 is

    ⎡0  2  3⎤
    ⎢0  1  0⎥ r3 = 0.
    ⎣0  0  1⎦

Thus the two fixed directions for A are

    r1 = [5, 1, 1]T and r3 = [1, 0, 0]T.

Check that each eigenvector satisfies Ari = λi ri.
Example 15.5

Let a rotation matrix be given by

    A = ⎡c  −s  0⎤
        ⎢s   c  0⎥
        ⎣0   0  1⎦

with c = cos α and s = sin α. It rotates around the e3 -axis by α


degrees. We should thus expect that e3 is an eigenvector—and indeed,
one easily verifies that Ae3 = e3 . Thus, the eigenvalue corresponding
to e3 is 1.

Symmetric matrices are special again. Not only do they have real
eigenvalues, but their eigenvectors are orthogonal. This can be shown
in exactly the same way as we did for the 2D case in Section 7.5.
Recall that in this case, A is said to be diagonalizable because it is
possible to transform A to the diagonal matrix Λ = R−1 AR, where
the columns of R are A’s eigenvectors and Λ is a diagonal matrix of
A’s eigenvalues.

Example 15.6

The symmetric matrix

    S = ⎡3  0  1⎤
        ⎢0  3  0⎥
        ⎣1  0  3⎦

has eigenvalues λ1 = 4, λ2 = 3, λ3 = 2 and corresponding eigenvectors

    r1 = [1, 0, 1]T,  r2 = [0, 1, 0]T,  r3 = [−1, 0, 1]T.

The eigenvalues are the diagonal elements of

    Λ = ⎡4  0  0⎤
        ⎢0  3  0⎥ ,
        ⎣0  0  2⎦

and the ri are normalized for the columns of

    R = ⎡1/√2  0  −1/√2⎤
        ⎢  0   1     0  ⎥ ;
        ⎣1/√2  0   1/√2 ⎦

then S is said to be diagonalizable since S = RΛRT. This is the
eigendecomposition of S.
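
The eigendecomposition of Example 15.6 can be verified numerically. The
sketch below (ours) assumes NumPy; eigh is its routine for symmetric matrices
and returns orthonormal eigenvectors:

    import numpy as np

    S = np.array([[3.0, 0.0, 1.0],
                  [0.0, 3.0, 0.0],
                  [1.0, 0.0, 3.0]])

    lam, R = np.linalg.eigh(S)     # eigenvalues ascending, orthonormal columns
    Lam = np.diag(lam)

    print(lam)                                  # [2. 3. 4.]
    print(np.allclose(S, R @ Lam @ R.T))        # True: S = R Lambda R^T
    print(np.allclose(R.T @ R, np.eye(3)))      # True: R is orthogonal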

Projection matrices have eigenvalues that are one or zero. A zero


eigenvalue indicates that the determinant of the matrix is zero, thus a
projection matrix is singular. When the eigenvalue is one, the eigen-
vector is projected to itself, and when the eigenvalue is zero, the
eigenvector is projected to the zero vector. If λ1 = . . . = λk = 1, then
the column space is made up of the eigenvectors and its dimension is
k, and the null space of the matrix is dimension n − k.

Example 15.7

Define a 3 × 3 projection matrix P = uuT, where

    u = [1/√2, 0, 1/√2]T,  then  P = ⎡1/2  0  1/2⎤
                                     ⎢ 0   0   0 ⎥ .
                                     ⎣1/2  0  1/2⎦

By design, we know that this matrix is rank one and singular. The
characteristic polynomial is p(λ) = λ²(1 − λ), from which we conclude
that λ1 = 1 and λ2 = λ3 = 0. The eigenvector corresponding to λ1 is

    r1 = [1, 0, 1]T,

and it spans the column space of P. The zero eigenvalue leads to the
system

    ⎡1/2  0  1/2⎤
    ⎢ 0   0   0 ⎥ r = 0.
    ⎣ 0   0   0 ⎦

To find one eigenvector associated with this eigenvalue, we can simply
assign r3 = r2 = 1, and back substitution results in r1 = −1.
Alternatively, we can find two vectors that span the two-dimensional
null space of P,

    r2 = [−1, 0, 1]T,  r3 = [0, 1, 0]T.

They are orthogonal to r1. All linear combinations of elements of
the null space are also in the null space, and thus r = 1r2 + 1r3.
(Normally, the eigenvectors are normalized, but for simplicity they
are not here.)

The trace of A is defined as


tr(A) = λ1 + λ2 + . . . + λn
= a1,1 + a2,2 + . . . + an,n .
Immediately it is evident that the trace is a quick and easy way to
learn a bit about the eigenvalues without computing them directly.
For example, if we have a real symmetric matrix with nonnegative
eigenvalues, and the trace is zero, then the eigenvalues must all be
zero. For 2 × 2 matrices, we have
    det[A − λI] = λ² − λ(a1,1 + a2,2) + (a1,1 a2,2 − a1,2 a2,1)
                = λ² − λ tr(A) + det A,

thus

    λ1,2 = ( tr(A) ± √( tr(A)² − 4 det A ) ) / 2.        (15.4)

Example 15.8

Let  
1 −2
A= ,
0 −2
then simply by observation, we see that λ1 = −2 and λ2 = 1. Let’s
compare this to the eigenvalues from (15.4). The trace is the sum of
the diagonal elements, tr(A) = −1 and det A = −2. Then
−1 ± 3
λ1,2 = ,
2
resulting in the correct eigenvalues.
15.2. The Power Method 331

In Section 7.6 we introduced 2-dimensional quadratic forms, and


they were easy to visualize. This idea applies to n-dimensions as
well. The vector v lives in Rn . For an n × n symmetric, positive
definite matrix C, the quadratic form (7.14) becomes
f (v) = vT Cv
= c1,1 v12 + 2c1,2 v1 v2 + . . . + cn,n vn2 ,

where each vi2 term is paired with diagonal element ci,i and each vi vj
term is paired with 2ci,j due to the symmetry in C. Just as before, all
terms are quadratic. Now, the contour f (v) = 1 is an n-dimensional
ellipsoid. The semi-minor axis corresponds to the dominant eigenvec-
tor r1 and its length is 1/√λ1, and the semi-major axis corresponds
to rn and its length is 1/√λn.
A real matrix is positive definite if
f (v) = vT Av > 0 (15.5)

for any nonzero vector v in Rn . This means that the quadratic form
is positive everywhere except for v = 0. This is the same condition
we encountered in (7.17).

15.2 The Power Method


Let A be a symmetric n × n matrix.1 Further, let the eigenvalues of
A be ordered such that |λ1 | ≥ |λ2 | ≥ . . . ≥ |λn |. Then λ1 is called
the dominant eigenvalue of A. To simplify the notation to come,
refer to this dominant eigenvalue as λ, and let r be its corresponding
eigenvector. In Section 7.7, we considered repeated applications of a
matrix; we restricted ourselves to the 2D case. We encountered an
equation of the form
Ai r = λi r, (15.6)
which clearly holds for matrices of arbitrary size. This equation may
be interpreted as follows: if A has an eigenvalue λ, then Ai has an
eigenvalue λi with the same corresponding eigenvector r.
This property may be used to find the dominant eigenvalue and
eigenvector. Consider the vector sequence
r(i+1) = Ar(i) ; i = 1, 2, . . . (15.7)

where r(1) is an arbitrary (nonzero) vector.


1 The method discussed in this section may be extended to nonsymmetric ma-

trices, but since those eigenvalues may be complex, we will avoid them here.
332 15. Eigen Things Revisited

After a sufficiently large i, the r(i) will begin to line up with r,


as illustrated in the two leftmost examples in Figure 15.2. Here is
how to utilize that fact for finding λ: for sufficiently large i, we will
approximately have
r(i+1) = λr(i) .
This means that all components of r(i+1) and r(i) are (approximately)
related by
(i+1)
rj
(i)
= λ for j = 1, . . . , n. (15.8)
rj
In the algorithm to follow, rather than checking each ratio, we will
use the ∞-norm to define λ upon each iteration.

Algorithm:

Initialization:
    Estimate dominant eigenvector r(1) ≠ 0.
    Find j where |rj(1)| = ‖r(1)‖∞ and set r(1) = r(1)/rj(1).
    Set λ(1) = 0.
    Set tolerance ϵ.
    Set maximum number of iterations m.
For k = 2, . . . , m,
    y = Ar(k−1).
    Find j where |yj| = ‖y‖∞.
    λ(k) = yj.
    If yj = 0 then output: “eigenvalue zero; select new r(1)
        and restart”; exit.
    r(k) = y/yj.
    If |λ(k) − λ(k−1)| < ϵ then output: λ(k) and r(k); exit.
    If k = m output: “maximum iterations exceeded.”
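
A direct transcription of the algorithm into code might look as follows. This
is our sketch (assuming NumPy), not production software; a robust
implementation would add the safeguards discussed in the remarks below:

    import numpy as np

    def power_method(A, r, tol=5.0e-4, max_iter=100):
        """Estimate the dominant eigenvalue/eigenvector of A; r is the initial guess."""
        j = np.argmax(np.abs(r))          # index of the infinity-norm element
        r = r / r[j]
        lam_old = 0.0
        for k in range(2, max_iter + 1):
            y = A @ r
            j = np.argmax(np.abs(y))
            lam = y[j]
            if y[j] == 0.0:
                raise ValueError("eigenvalue zero; select a new starting vector")
            r = y / y[j]
            if abs(lam - lam_old) < tol:
                return lam, r
            lam_old = lam
        raise RuntimeError("maximum iterations exceeded")

    A1 = np.array([[2.0, 1.0], [1.0, 2.0]])
    lam, r = power_method(A1, np.array([1.5, -0.1]))
    print(lam)     # close to 3, the dominant eigenvalue of A1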

Some remarks on using this method:

• If |λ| is either “large” or “close” to zero, the r(k) will either become
unbounded or approach zero in length, respectively. This has the
potential to cause numerical problems. It is prudent, therefore, to
scale the r(k) , so in the algorithm above, at each step, the eigenvec-
tor is scaled by its element with the largest absolute value—with
respect to the ∞-norm.

• Convergence seems impossible if r(1) is perpendicular to r, the


eigenvector of λ. Theoretically, no convergence will kick in, but
15.2. The Power Method 333

for once numerical round-off is on our side: after a few iterations,


r(k) will not be perpendicular to r and we will converge—if slowly!
• Very slow convergence will also be observed if |λ1 | ≈ |λ2 |.
• The power method as described here is limited to symmetric matri-
ces with one dominant eigenvalue. It may be generalized to more
cases, but for the purpose of this exposition, we decided to outline
the principle rather than to cover all details.

Example 15.9
Figure 15.2 illustrates three cases, A1 , A2 , A3 , from left to right. The
three matrices and their eigenvalues are as follows:
 
2 1
A1 = , λ1 = 3, λ2 = 1,
1 2
 
2 0.1
A2 = , λ1 = 2.1, λ2 = 1.9,
0.1 2
 
2 −0.1
A3 = , λ1 = 2 + 0.1i, λ2 = 2 − 0.1i,
0.1 2

and for all of them, we used


 
1.5
r(1) = .
−0.1

In all three examples, the vectors r(i) were scaled relative to the ∞-
norm, thus r(1) is scaled to
 
(1) 1
r = .
−0.066667
An  tolerance of 5.0 × 10−4 was used for each matrix.

Figure 15.2.
The power method: three examples whose matrices are given in Example 15.9. The
longest black vector is the initial (guess) eigenvector. Successive iterations are in
lighter shades of gray. Each iteration is scaled with respect to the ∞-norm.
334 15. Eigen Things Revisited

The first matrix, A1 is symmetric and the dominant eigenvalue


is reasonably separated from λ2 , hence the rapid convergence in 11
iterations. The estimate for the dominant eigenvalue: λ = 2.99998.
The second matrix, A2 , is also symmetric, however λ1 is close in
value to λ2 , hence convergence is slower, needing 41 iterations. The
estimate for the dominant eigenvalue: λ = 2.09549.
The third matrix, a rotation matrix that is not symmetric, has
complex eigenvalues, hence no convergence. By following the change
in gray scale, we can follow the path of the iterative eigenvector es-
timate. The wide variation in eigenvectors makes clear the outline of
the ∞-norm. (Consult Figure 13.2 for an illustration of unit vectors
with respect to the ∞-norm.)

A more realistic numerical example is presented next with the


Google eigenvector.

15.3 Application: Google Eigenvector


We now study a linear algebra aspect of search engines. While many
search engine techniques are highly proprietary, they all share the
basic idea of ranking webpages. The concept was first introduced by
S. Brin and L. Page in 1998, and forms the basis of the search engine
Google. Here we will show how ranking webpages is essentially a
straightforward eigenvector problem.
The web (at some frozen point in time) consists of N webpages,
most of them pointing to (having links to) other webpages. A page
that is pointed to very often would be considered important, whereas
a page with none or only very few other pages pointing to it would be
considered not important. How can we rank all webpages according
to how important they are? In the sequel, we assume all webpages
are ordered in some fashion (such as lexicographic) so we can assign
a number, such as i, to each page.
First, two definitions: if page i points to page j, then we say this
is an outlink for page i, whereas if page j points to page i, then this
is an inlink for page i. A page is not supposed to link to itself. We
can represent this connectivity structure of the web by an N × N
adjacency matrix C: Each outlink for page i is recorded by setting
cj,i = 1. If page i does not have an outlink to page j, then cj,i = 0.
15.3. Application: Google Eigenvector 335


Figure 15.3.
Directed graph: represents the webpage connectivity defined by C in (15.9).

An example matrix is the following:

    C = ⎡0  1  1  1⎤
        ⎢0  0  1  0⎥                     (15.9)
        ⎢1  1  0  1⎥
        ⎣0  0  1  0⎦ .

In this example, page 1 has one outlink since c3,1 = 1 and three
inlinks since c1,2 = c1,3 = c1,4 = 1. Thus, the ith column describes
the outlinks of page i and the ith row describes the inlinks of page
i. This connectivity structure is illustrated by the directed graph of
Figure 15.3.
The ranking ri of any page i is entirely defined by C. Here are
some rules with increasing sophistication:
1. The ranking ri should grow with the number of page i’s inlinks.
2. The ranking ri should be weighted by the ranking of each of page
i’s inlinks.
3. Let page i have an inlink from page j. Then the more outlinks
page j has, the less it should contribute to ri .
Let’s elaborate on these rules. Rule 1 says that a page that is
pointed to very often deserves high ranking. But rule 2 says that if
all those inlinks to page i prove to be low-ranked, then their sheer
number is mitigated by their low rankings. Conversely, if they are
mostly high-ranked, then they should boost page i’s ranking. Rule 3
implies that if page j has only one outlink and it points to page i, then
page i should be “honored” for such trust from page j. Conversely,
if page j points to a large number of pages, page i among them, this
does not give page i much pedigree.
336 15. Eigen Things Revisited

Although not realistic, assume for now that each page has at least
one outlink and at least one inlink so that the matrix C is structured
nicely. Let oi represent the total number of outlinks of page i. This
is simply the sum of all elements of the ith column of C. The more
outlinks page i has, the lower its contribution to page j’s ranking it
will have. Thus we scale every element of column i by 1/oi . The
resulting matrix D with
cj,i
dj,i =
oi
is called the Google matrix. Note that all columns of D have nonneg-
ative entries and sum to one. Matrices with that property (or with
respect to the rows) are called stochastic.2
In our example above, we have

    D = ⎡0  1/2  1/3  1/2⎤
        ⎢0   0   1/3   0 ⎥
        ⎢1  1/2   0   1/2⎥ .
        ⎣0   0   1/3   0 ⎦

We note that finding ri involves knowing the ranking of all pages,


including ri . This seems like an ill-posed circular problem, but a little
more analysis leads to the following matrix problem. Using the vector
r = [r1 , . . . , rN ]T , one can find

r = Dr. (15.10)

This states that we are looking for the eigenvector of D corresponding


to the eigenvalue 1. But how do we know that D does indeed have
an eigenvalue 1? The answer is that all stochastic matrices have that
property. One can even show that 1 is D’s largest eigenvalue. This
vector r is called a stationary vector.
To find r, we simply employ the power method from Section 15.2.
This method needs an initial guess for r, and setting all ri = 1, which
corresponds to equal ranking for all pages, is not too bad for that.
As the iterations converge, the solution is found. The entries of r are
real, since they correspond to a real eigenvalue.
The vector r now contains the ranking of every page, called page
rank by Google. If Google retrieves a set of pages all containing a link
to a term you are searching for, it presents them to you in decreasing
order of the pages’ ranking.
2 More precisely, matrices for which the columns sum to one are called left

stochastic. If the rows sum to one, the matrix is right stochastic and if the rows
and columns sum to one, the matrix is doubly stochastic.
15.4. Eigenfunctions 337

Back to our 4 × 4 example from above: D has an eigenvalue 1 and


the corresponding eigenvector

r = [0.67, 0.33, 1, 0.33]T,

which was calculated with the power method algorithm in Section 15.2.
Notice that r3 = 1 is the largest component, therefore page 3 has the
highest ranking. Even though pages 1 and 3 have the same number
of inlinks, the solitary outlink from page 1 to page 3 gives page 3 the
edge in the ranking.
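
For the small four-page example, building D from C and iterating as in
Section 15.2 reproduces the ranking. The sketch below is ours (assuming
NumPy):

    import numpy as np

    C = np.array([[0, 1, 1, 1],
                  [0, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)   # adjacency matrix (15.9)

    D = C / C.sum(axis=0)     # scale each column by its number of outlinks

    # Power iteration for the eigenvalue 1: start with equal ranking for all pages.
    r = np.ones(4)
    for _ in range(100):
        r = D @ r
        r = r / np.max(np.abs(r))    # scale by the infinity norm

    print(np.round(r, 2))   # [0.67 0.33 1.   0.33]: page 3 has the highest ranking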
In the real world, in 2013, there were approximately 50 billion web-
pages, making the corresponding Google matrix the largest matrix ever
put to practical use. Luckily,
it contains mostly zeroes and thus is extremely sparse. Without tak-
ing advantage of that, Google (and other search engines) could not
function. Figure 15.1 illustrates a portion of a Google matrix for ap-
proximately 3 million pages. We gave three simple rules for building
D, but in the real world, many more rules are needed. For example,
webpages with no inlinks or outlinks must be considered. We would
want to modify D to ensure that the ranking r has only nonnegative
components. In order for the power method to converge, other modi-
fications of D are required as well, but that topic falls into numerical
analysis.

15.4 Eigenfunctions
An eigenvalue λ of a matrix A is typically thought of as a solution of
the matrix equation
Ar = λr.
In Section 14.5, we encountered more general spaces than those
formed by finite-dimensional vectors: those are spaces formed by poly-
nomials. Now, we will even go beyond that: we will explore the space
of all real-valued functions. Do eigenvalues and eigenvectors have
meaning there? Let’s see.
Let f be a function, meaning that y = f (x) assigns the output
value y to an input value x, and we assume both x and y are real
numbers. We also assume that f is smooth, or differentiable. An
example might be f (x) = sin(x).
The set of all such functions f forms a linear space as observed in
Section 14.5.
We can define linear maps L for elements of this function space. For
example, setting Lf = 2f is such a map, albeit a bit trivial. A more
interesting linear map is that of taking derivatives: Df = f  . Thus,
338 15. Eigen Things Revisited

to any function f , the map D assigns another function, namely the


derivative of f . For example, let f (x) = sin(x). Then Df (x) = cos(x).
How can we marry the concept of eigenvalues and linear maps such
as D? First of all, D will not have eigenvectors, since our linear space
consists of functions, not vectors. So we will talk of eigenfunctions
instead. A function f is an eigenfunction of D (or any other linear
map) if it is mapped to a multiple of itself:

Df = λf.

The scalar multiple λ is, as usual, referred to as an eigenvalue of f .


Note that D may have many eigenfunctions, each corresponding to a
different λ.
Since we know that D means taking derivatives, this becomes

f  = λf. (15.11)

Any function f satisfying (15.11) is thus an eigenfunction of the


derivative map D.
Now you have to recall your calculus: the function f (x) = ex sat-
isfies
f  (x) = ex ,
which may be written as

Df = f = 1 × f.

Hence 1 is an eigenvalue of the derivative map D. More generally, all


functions f (x) = eλx satisfy (for λ = 0):

f  (x) = λeλx ,

which may be written as


Df = λf.
Hence all real numbers λ = 0 are eigenvalues of D, and the corre-
sponding eigenfunctions are eλx . We see that our map D has infinitely
many eigenfunctions!
Let's look at another example: the map is the second derivative,
Lf = f ′′. A set of eigenfunctions for this map is cos(kx) for k =
1, 2, . . . since

    d²cos(kx)/dx² = −k d sin(kx)/dx = −k² cos(kx),        (15.12)
15.5. Exercises 339

and the eigenvalues are −k 2 . Can you find another set of eigenfunc-
tions?
This may seem a bit abstract, but eigenfunctions actually have
many uses, for example in differential equations and mathematical
physics. In engineering mathematics, orthogonal functions are key for
applications such as data fitting and vibration analysis. Some well-
known sets of orthogonal functions arise as the result of the solution
to a Sturm-Liouville equation such as

    y′′(x) + λy(x) = 0 such that y(0) = 0 and y(π) = 0.        (15.13)

This is a linear second order differential equation with boundary con-


ditions, and it defines a boundary value problem. Unknown are the
functions y(x) that satisfy this equation. Solving this boundary value
problem is out of the scope of this book, so we simply report that
y(x) = sin(ax) for a = 1, 2, . . .. These are eigenfunctions of (15.13)
and the corresponding eigenvalues are λ = a2 .

• eigenvalue
• eigenvector
• characteristic polynomial
• eigenvalues and eigenvectors of a symmetric matrix
• dominant eigenvalue
• eigendecomposition
• trace
• quadratic form
• positive definite matrix
• power method
• max-norm
• adjacency matrix
• directed graph
• stochastic matrix
• stationary vector
• eigenfunction

15.5 Exercises
1. What are the eigenvalues and eigenvectors for
 
2 1
A= ?
1 2

2. What are the eigenvalues and eigenvectors for


⎡ ⎤
1 1 1
A = ⎣0 1 0⎦?
1 0 1
340 15. Eigen Things Revisited

3. Find the eigenvalues of the matrix


⎡ ⎤
4 2 −3 2
⎢0 3 1 3⎥
⎢ ⎥.
⎣0 0 2 1⎦
0 0 0 1

4. If ⎡ ⎤
1
r = ⎣1⎦
1
is an eigenvector of ⎡ ⎤
0 1 1
A = ⎣1 1 0⎦ ,
1 0 1
what is the corresponding eigenvalue?
5. The matrices A and AT have the same eigenvalues. Why?
6. If A has eigenvalues 4, 2, 0, what is the rank of A? What is the deter-
minant?
7. Suppose a matrix A has a zero eigenvalue. Will forward elimination
change this eigenvalue?
8. Let a rotation matrix be given by
⎡ ⎤
cos α 0 − sin α
R=⎣ 0 1 0⎦ ,
sin α 0 cos α

which rotates around the e2 -axis by −α degrees. What is one eigenvalue


and eigenvector?
9. For the matrix  
1 −2
A=
−2 1
show that the determinant equals the product of the eigenvalues and
the trace equals the sum of the eigenvalues.
10. What is the dominant eigenvalue in Exercise 9?
11. Compute the eigenvalues of A and A2 where
⎡ ⎤
3 0 1
A = ⎣0 1 0⎦ .
0 0 3

12. Let A be the matrix ⎡ ⎤


4 0 1
A = ⎣0 1 2⎦ .
1 2 2
15.5. Exercises 341

Starting with the vector



r(1) = 1 1 1

carry out three steps of the power method with (15.7), and use r(3) and
r(4) in (15.8) to estimate A’s dominant eigenvalue. If you are able to
program, then try implementing the power method algorithm.
13. Let A be the matrix
⎡ ⎤
−8 0 8
A=⎣ 0 1 −2⎦ .
8 −2 0
Starting with the vector

r(1) = 1 1 1

carry out three steps of the power method with (15.7), and use r(3) and
r(4) in (15.8) to estimate A’s dominant eigenvalue. If you are able to
program, then try implementing the power method algorithm.
14. Of the following matrices, which one(s) are stochastic matrices?
⎡ ⎤ ⎡ ⎤
0 1 0 0 1 1/3 1/4 0
⎢2 0 0 1/2⎥ ⎢0 0 1/4 0⎥
A=⎢ ⎣−1 0 0 1/2⎦ ,
⎥ B=⎢ ⎣0 1/3 1/4 1⎦ ,

0 0 1 0 0 1/3 1/4 0
⎡ ⎤
⎡ ⎤ 1/2 0 0 1/2
1 0 0 ⎢ 0 1/2 1/2 0 ⎥
C = ⎣0 0 0⎦ , D=⎢ ⎣1/3 1/3 1/3
⎥.
0 ⎦
0 0 1
1/2 0 0 1/2

15. The directed graph in Figure 15.4 describes inlinks and outlinks to web-
pages. What is the corresponding adjacency matrix C and stochastic
(Google) matrix D?


Figure 15.4.
Graph showing the connectivity defined by C.
342 15. Eigen Things Revisited

16. For the adjacency matrix


⎡ ⎤
0 0 0 1 0
⎢1 0 0 0 1⎥
⎢ ⎥
C=⎢
⎢0 1 0 1 1⎥
⎥,
⎣1 0 0 0 1⎦
1 0 1 0 0

draw the corresponding directed graph that describes these inlinks and
outlinks to webpages. What is the corresponding stochastic matrix D?
17. The Google matrix in Exercise 16 has dominant eigenvalue 1 and cor-
responding eigenvector r = [1/5, 2/5, 14/15, 2/5, 1]T . Which page has
the highest ranking? Based on the criteria for page ranking described
in Section 15.3, explain why this is so.
18. Find the eigenfunctions and eigenvalues for Lf (x) = xf  .
19. For the map Lf = f  , a set of eigenfunctions is given in (15.12). Find
another set of eigenfunctions.
16
The Singular Value
Decomposition

Figure 16.1.
Image compression: a method that uses the SVD. Far left: original image; second from
left: highest compression; third from left: moderate compression; far right: method
recovers original image. See Section 16.7 for details.

Matrix decomposition is a fundamental tool in linear algebra for un-


derstanding the action of a matrix, establishing its suitability to solve
a problem, and for solving linear systems more efficiently and effec-
tively. We have encountered an important decomposition already, the
eigendecomposition for symmetric matrices (see Section 7.5). The
topic of this chapter, the singular value decomposition (SVD), is a
tool for more general, even nonsquare matrices. Figure 16.1 demon-
strates one application of SVD, image compression.
This chapter allows us to revisit several themes from past chapters:
eigenvalues and eigenvectors, the condition number, the least squares
solution to an overdetermined system, and more! It provides a good
review of some important ideas in linear algebra.

343
344 16. The Singular Value Decomposition

16.1 The Geometry of the 2 × 2 Case


Let A be a 2 × 2 matrix, nonsingular for now. Let v1 and v2 be
two unit vectors that are perpendicular to each other, thus they are
orthonormal. This means that V = [v1 v2 ] is an orthogonal matrix:
V −1 = V T . In general, A will not map two orthonormal vectors v1 , v2
to two orthonormal image vectors u1 , u2 . However, it is possible for
A to map two particular vectors v1 and v2 to two orthogonal image
vectors σ1 u1 and σ2 u2 . The two vectors u1 and u2 are assumed to
be of unit length, i.e., they are orthonormal.
We formalize this as
AV = U Σ (16.1)
with an orthogonal matrix U = [u1 u2] and a diagonal matrix

    Σ = ⎡σ1   0⎤
        ⎣ 0  σ2⎦ .

If Avi = σi ui, then each ui is parallel to Avi, and A preserves the
orthogonality of the vi.
We now conclude from (16.1) that

A = U ΣV T . (16.2)

This is the singular value decomposition, SVD for short, of A. The


diagonal elements of Σ are called the singular values of A. Let’s now
find out how to determine U, Σ, V .
In Section 7.6, we established that symmetric positive definite ma-
trices, such as AT A, have real and positive eigenvalues and their eigen-
vectors are orthogonal. Considering the SVD in (16.2), we can write

AT A = (U ΣV T )T (U ΣV T )
= V ΣT U T U ΣV T
= V ΣT ΣV T
= V Λ V T , (16.3)

where

    Λ = ⎡λ1   0⎤ = ΣTΣ = ⎡σ1²   0 ⎤
        ⎣ 0  λ2⎦          ⎣ 0   σ2²⎦ .
Equation (16.3) states the following: The symmetric positive defi-
nite matrix AT A has eigenvalues that are the diagonal entries of Λ
and eigenvectors as columns of V , which are called the right singular
vectors of A. This is the eigendecomposition of AT A.
16.1. The Geometry of the 2 × 2 Case 345

The symmetric positive definite matrix AAT also has an eigende-


composition,

AAT = (U ΣV T )(U ΣV T )T
= U ΣV T V ΣT U T
= U ΣΣT U T
= U Λ U T , (16.4)

observing that ΣT Σ = ΣΣT . Equation (16.4) states that the symmet-


ric positive definite matrix AAT has eigenvalues that are the diagonal
entries of Λ and eigenvectors as the columns of U , and they are called
the left singular vectors of A.
It should be no surprise that Λ appears in both (16.3) and (16.4)
since a matrix and its transpose share the same (nonzero) eigenvalues.
Now we understand a bit more about the elements of the SVD
in (16.2): the singular values, σi , of A are the square roots of the
eigenvalues of AT A and AAT , that is

    σi = √λi .

The columns of V are the eigenvectors of AT A and the columns of


U are the eigenvectors of AAT . Observe from (16.1) that we can
compute ui = Avi / · .

Example 16.1

Let’s start with a very simple example:


 
3 0
A= ,
0 1

a symmetric, positive definite matrix that scales in the e1 -direction.


Then  
T T 9 0
AA = A A = ,
0 1
and we can easily calculate the eigenvalues of AT A as λ1 = 9 and
λ2 = 1. This means that σ1 = 3 and σ2 = 1. The eigenvectors of
AAT and AT A are identical and happen to be the columns of the
identity matrix for this simple example,
 
1 0
U =V = .
0 1
346 16. The Singular Value Decomposition

Figure 16.2.
Action of a map: the unit circle is mapped to the action ellipse with semi-major axis
length σ 1 and semi-minor axis length σ 2 . Left: ellipse from matrix in Example 16.1;
middle: circle; right: ellipse from Example 16.2.

Now we can form the SVD of A, A = U ΣV T :


     
3 0 1 0 3 0 1 0
= .
0 1 0 1 0 1 0 1

This says that the action of A on a vector x, that is Ax, simply


amounts to a scaling in the e1 -direction, which we observed to begin
with!
Notice that the eigenvalues of A are identical to the singular values.
In fact, because this matrix is positive definite, the SVD is identical
to the eigendecomposition.

Throughout Chapter 5 we examined the action of a 2 × 2 matrix


using an illustration of the circular Phoenix mapping to an elliptical
Phoenix. This action ellipse can now be described more precisely:
the semi-major axis has length σ1 and the semi-minor axis has length
σ2 . Figure 16.2 illustrates this point for Examples 16.1 and 16.2. In
that figure, you see: the semi-axes, the map of [1 0]T (thick point),
and the map of [0 1]T (thin point).

Example 16.2

This example is a little more interesting, as the matrix is now a shear,


 
1 2
A= .
0 1

We compute
   
1 2 5 2
AT A = , AAT = ,
2 5 2 1
16.1. The Geometry of the 2 × 2 Case 347

and we observe that these two matrices are no longer identical, but
they are both symmetric. As they are 2 × 2 matrices, we can easily
calculate the eigenvalues as λ1 = 5.82 and λ2 = 0.17. (Remember:
the nonzero eigenvalues are the same for a matrix and its transpose.)
These eigenvalues result in singular values σ1 = 2.41 and σ2 = 0.41.
The eigenvectors of AT A are the orthonormal column vectors of
 
0.38 −0.92
V =
0.92 0.38

and the eigenvectors of AAT are the orthonormal column vectors of


 
0.92 −0.38
U= .
0.38 0.92

Now we can form the SVD of A, A = U ΣV T:

    ⎡1  2⎤ = ⎡0.92  −0.38⎤ ⎡2.41   0  ⎤ ⎡ 0.38  0.92⎤
    ⎣0  1⎦   ⎣0.38   0.92⎦ ⎣ 0    0.41⎦ ⎣−0.92  0.38⎦ .

Figure 16.3 will help us break down the action of A in terms of the
SVD. It is now clear that V and U are rotation or reflection matrices
and Σ scales, deforming the circle into an ellipse.
Notice that the eigenvalues of A are λ1 = λ2 = 1, making the point
that, in general, the singular values are not the eigenvalues!
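
Numerically, the decomposition of the shear matrix is one call to a standard
SVD routine. The sketch below (ours, assuming NumPy) also confirms that the
singular values are not the eigenvalues:

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [0.0, 1.0]])          # the shear of Example 16.2

    U, sigma, Vt = np.linalg.svd(A)     # A = U diag(sigma) Vt
    print(np.round(sigma, 2))                        # [2.41 0.41], the singular values
    print(np.round(np.linalg.eigvals(A), 2))         # [1. 1.], the eigenvalues
    print(np.allclose(A, U @ np.diag(sigma) @ Vt))   # True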

Figure 16.3.
SVD breakdown: shear matrix A from Example 16.2. Clockwise from top left: Initial
point set forming a circle with two reference points; V T x rotates clockwise 67.5◦ ; ΣV T x
stretches in e1 and shrinks in e2 ; U ΣV T x rotates counterclockwise 22.5◦ , illustrating
the action of A.
348 16. The Singular Value Decomposition

Now we come full circle and look at what we have solved in terms
of our original question that was encapsulated by (16.1): What or-
thonormal vectors vi are mapped to orthogonal vectors σi ui ? The
SVD provides a solution to this question by providing V , U , and Σ.
Furthermore, note that for this nonsingular case, the columns of V
form a basis for the row space of A and the columns of U form a basis
for the column space of A.
It should be clear that the SVD is not limited to invertible 2 × 2
matrices, so let’s look at the SVD more generally.

16.2 The General Case


The SVD development of the previous section assumed that A was
square and invertible. This had the effect that AT A had nonzero
eigenvalues and well-defined eigenvectors. However, everything still
works if A is neither square nor invertible. Just a few more aspects
of the decomposition come into play.
For the general case, A will be a rectangular matrix with m rows
and n columns mapping Rn to Rm . As a result of this freedom in
the dimensions, the matrices of the SVD (16.2) must be modified.
Illustrated in Figure 16.4 is each scenario, m < n, m = n, or m > n.
The matrix dimensions are as follows: A is m × n, U is m × m, Σ is
m × n, and V T is n × n. The matrix Λ in (16.3) is n × n and in (16.4)
is m × m, but they still hold the same nonzero eigenvalues because
the rank of a matrix cannot exceed min{m, n}.

Figure 16.4.
SVD matrix dimensions: an overview of the SVD of an m × n matrix A. Top: m > n;
middle: m = n; bottom: m < n.
16.2. The General Case 349

Again we ask, what orthonormal vectors vi are mapped by A to


orthogonal vectors σi ui , where the ui are orthonormal? This is en-
capsulated in (16.1). In the general case as well, the matrices U and
V form bases, however, as we are considering rectangular and sin-
gular matrices, the rank r of A plays a part in the interpretation of
the SVD. The following are the main SVD properties, for a detailed
exposition, see Strang [16].

• The matrix Σ has nonzero singular values, σ1, . . . , σr, and all other entries are zero.

• The first r columns of U form an orthonormal basis for the column space of A.

• The last m − r columns of U form an orthonormal basis for the null space of A^T.

• The first r columns of V form an orthonormal basis for the row space of A.

• The last n − r columns of V form an orthonormal basis for the null space of A.

Examples will make this clearer.

Example 16.3

Let A be given by
\[
A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \\ 0 & 1 \end{bmatrix}.
\]

The first step is to form $A^TA$ and $AA^T$ and find their eigenvalues and (normalized) eigenvectors, which make up the columns of the orthogonal matrices $V$ and $U$:
\[
A^TA = \begin{bmatrix} 1 & 0 \\ 0 & 5 \end{bmatrix}, \quad
\begin{matrix} \lambda_1 = 5, \\ \lambda_2 = 1, \end{matrix} \quad
V = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix};
\]
\[
AA^T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 4 & 2 \\ 0 & 2 & 1 \end{bmatrix}, \quad
\begin{matrix} \lambda_1 = 5, \\ \lambda_2 = 1, \\ \lambda_3 = 0, \end{matrix} \quad
U = \begin{bmatrix} 0 & 1 & 0 \\ 0.89 & 0 & -0.44 \\ 0.44 & 0 & 0.89 \end{bmatrix}.
\]

Figure 16.5.
SVD of a 3 × 2 matrix A: see Example 16.3. Clockwise from top left: Initial point set
forming a circle with one reference point; V T x reflects; ΣV T x stretches in e1 ; U ΣV T x
rotates counterclockwise 26.5◦ , illustrating the action of A.

The rank of A is 2, thus there are two singular values, and
\[
\Sigma = \begin{bmatrix} 2.23 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}.
\]

The SVD of A, $A = U\Sigma V^T$:
\[
\begin{bmatrix} 1 & 0 \\ 0 & 2 \\ 0 & 1 \end{bmatrix} =
\begin{bmatrix} 0 & 1 & 0 \\ 0.89 & 0 & -0.44 \\ 0.44 & 0 & 0.89 \end{bmatrix}
\begin{bmatrix} 2.23 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.
\]

Figure 16.5 illustrates the elements of the SVD and the action of A.
Because m > n, u3 is in the null space of AT , that is AT u3 = 0.
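A quick numerical companion to Example 16.3 (a sketch assuming NumPy; individual columns of U and V may come back with flipped signs):

import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [0.0, 1.0]])

# full_matrices=True returns the 3 x 3 matrix U and 2 x 2 V^T of the text.
U, sigma, Vt = np.linalg.svd(A, full_matrices=True)

print(sigma)                      # approx [2.236, 1.0]
u3 = U[:, 2]                      # column belonging to the "missing" singular value
print(np.allclose(A.T @ u3, 0))   # True: u3 spans the null space of A^T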

Example 16.4

For a matrix of different dimensions, we pick
\[
A = \begin{bmatrix} -0.8 & 0 & 0.8 \\ 1 & 1.5 & -0.3 \end{bmatrix}.
\]
The first step is to form $A^TA$ and $AA^T$ and find their eigenvalues and (normalized) eigenvectors, which make up the columns of the orthogonal matrices $V$ and $U$:
\[
A^TA = \begin{bmatrix} 1.64 & 1.5 & -0.94 \\ 1.5 & 2.25 & -0.45 \\ -0.94 & -0.45 & 0.73 \end{bmatrix}, \quad
\begin{matrix} \lambda_1 = 3.77, \\ \lambda_2 = 0.84, \\ \lambda_3 = 0, \end{matrix} \quad
V = \begin{bmatrix} -0.63 & 0.38 & 0.67 \\ -0.71 & -0.62 & -0.31 \\ 0.30 & -0.68 & 0.67 \end{bmatrix};
\]
\[
AA^T = \begin{bmatrix} 1.28 & -1.04 \\ -1.04 & 3.34 \end{bmatrix}, \quad
\begin{matrix} \lambda_1 = 3.77, \\ \lambda_2 = 0.84, \end{matrix} \quad
U = \begin{bmatrix} 0.39 & -0.92 \\ -0.92 & -0.39 \end{bmatrix}.
\]

The matrix A is rank 2, thus there are two singular values, and
\[
\Sigma = \begin{bmatrix} 1.94 & 0 & 0 \\ 0 & 0.92 & 0 \end{bmatrix}.
\]

The SVD of A, $A = U\Sigma V^T$:
\[
\begin{bmatrix} -0.8 & 0 & 0.8 \\ 1 & 1.5 & -0.3 \end{bmatrix} =
\begin{bmatrix} 0.39 & -0.92 \\ -0.92 & -0.39 \end{bmatrix}
\begin{bmatrix} 1.94 & 0 & 0 \\ 0 & 0.92 & 0 \end{bmatrix}
\begin{bmatrix} -0.63 & -0.71 & 0.30 \\ 0.38 & -0.62 & -0.68 \\ 0.67 & -0.31 & 0.67 \end{bmatrix}.
\]

Figure 16.6 illustrates the elements of the SVD and the action of A.
Because m < n, v3 is in the null space of A, that is, Av3 = 0.

Figure 16.6.
The SVD of a 2 × 3 matrix A: see Example 16.4. Clockwise from top left: Initial point
set forming a circle with one reference point; V T x; ΣV T x; U ΣV T x, illustrating the
action of A.

Example 16.5

Let's look at a fairly simple example, a rank deficient matrix,
\[
A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix},
\]

which is a projection into the [e1, e2]-plane. Because A is symmetric and idempotent, $A = A^TA = AA^T$. It is easy to see that $A = U\Sigma V^T$ with
\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]
Since the rank is two, the first two columns of U and V form an
orthonormal basis for the column and row spaces of A, respectively.
The e3 vector is projected to the zero vector by A, and thus this
vector spans the null space of A and AT .

Generalizing the 2 × 2 case, the σi are the lengths of the semi-axes


of an ellipsoid. As before, the semi-major axis is length σ1 and the
length of the semi-minor axis is equal to the smallest singular value.
The SVD is an important tool for dealing with rank deficient ma-
trices.

16.3 SVD Steps


This section is titled “Steps” rather than “Algorithm” to emphasize
that the description here is simply a review of our introduction to the
SVD. A robust algorithm that is efficient in terms of computing time
and storage will be found in an advanced numerical methods text.
Let’s summarize the steps for finding the SVD of a matrix A.
Input: an m × n matrix A.
Output: U, V, Σ such that A = U ΣV T .

Find the eigenvalues λ1, . . . , λn of A^T A.
Order the λi so that λ1 ≥ λ2 ≥ . . . ≥ λn.
Suppose λ1, . . . , λr > 0, then the rank of A is r.
Create an m × n diagonal matrix Σ with σi,i = √λi, i = 1, . . . , r.

Find the corresponding (normalized) eigenvectors vi of AT A.


Create an n × n matrix V with column vectors vi .
Find the (normalized) eigenvectors ui of AAT .
Create an m × m matrix U with column vectors ui .

You have now found the singular value decomposition of A, which is A = UΣV^T.
A note on U: instead of finding the columns as the eigenvectors of AA^T, one can compute ui, i = 1, . . . , r as ui = Avi/‖Avi‖. If m > n then the remaining ui are found from the null space of A^T.
The only “hard” task in this is finding the λi . But there are several
highly efficient algorithms for this task, taking advantage of the fact
that AT A is symmetric. Many of these algorithms will return the
corresponding eigenvectors as well.
As we discussed in the context of least squares in Section 13.1,
forming AT A can result in an ill-posed problem because the condition
number of this matrix is the square of the condition number of A.
Thus, numerical methods will avoid direct computation of this matrix
by employing a method such as Householder.
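As an illustration only (not a robust algorithm), the steps above can be coded directly, assuming NumPy; the helper name svd_by_steps is made up for this sketch:

import numpy as np

def svd_by_steps(A, tol=1e-12):
    # Illustrative SVD following the steps of Section 16.3.
    m, n = A.shape
    lam, V = np.linalg.eigh(A.T @ A)      # eigenpairs of the symmetric matrix A^T A
    order = np.argsort(lam)[::-1]         # sort eigenvalues in decreasing order
    lam, V = lam[order], V[:, order]
    sigma = np.sqrt(np.clip(lam, 0.0, None))
    r = int(np.sum(sigma > tol))          # numerical rank of A
    Sigma = np.zeros((m, n))
    Sigma[:r, :r] = np.diag(sigma[:r])
    U = np.zeros((m, m))
    for i in range(r):                    # u_i = A v_i / ||A v_i||
        u = A @ V[:, i]
        U[:, i] = u / np.linalg.norm(u)
    if r < m:
        # Complete U to an orthogonal matrix; the trailing columns of Q
        # span the null space of A^T.
        Q, _ = np.linalg.qr(np.hstack([U[:, :r], np.eye(m)]))
        U[:, r:] = Q[:, r:]
    return U, Sigma, V

A = np.array([[1.0, 0.0], [0.0, 2.0], [0.0, 1.0]])
U, S, V = svd_by_steps(A)
print(np.allclose(U @ S @ V.T, A))        # True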

16.4 Singular Values and Volumes


As a practical application, we will use the SVD to compute the de-
terminant of a matrix A. We observe that in (16.2), the matrices U
and V , being orthogonal, have determinants equal to ±1. Thus,

| det A| = det Σ = σ1 · . . . · σn . (16.5)

If a 2D triangle has area ϕ, it will have area ±σ1 σ2 ϕ after being


transformed by a 2D linear map with singular values σ1 , σ2 . Similarly,
if a 3D object has volume ϕ, it will have volume ±σ1 σ2 σ3 ϕ after being
transformed by a linear map with singular values σ1 , σ2 , σ3 .
Of course one can compute determinants without using singular
values. Recall from Section 15.1, the characteristic polynomial of A,

p(λ) = det[A − λI] = (λ1 − λ)(λ2 − λ) · . . . · (λn − λ).

Evaluating p at λ = 0, we have

det A = λ1 · . . . · λn , (16.6)

where the λi are A’s eigenvalues.
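A small numerical check of (16.5) and (16.6), assuming NumPy:

import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0]])                 # shear matrix from Example 16.2

sigma = np.linalg.svd(A, compute_uv=False)
lam = np.linalg.eigvals(A)

print(np.prod(sigma))                      # |det A| = sigma_1 * sigma_2 = 1.0
print(np.prod(lam).real)                   # det A = lambda_1 * lambda_2 = 1.0
print(np.linalg.det(A))                    # 1.0, for comparison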



16.5 The Pseudoinverse


The inverse of a matrix, introduced in Sections 5.9 and 12.4, is mainly
a theoretical tool for analyzing the solution to a linear system. Addi-
tionally, the inverse is limited to square, nonsingular matrices. What
might the inverse of a more general type of matrix be? The answer
is in the form of a generalized inverse, or the so-called pseudoinverse,
and it is denoted as A† (“A dagger”). The SVD is a very nice tool
for finding the pseudoinverse, and it is suited for practical use as we
shall see in Section 16.6.
Let’s start with a special matrix, an m × n diagonal matrix Σ with
diagonal elements σi . The pseudoinverse, Σ† , has diagonal elements
σi† given by
\[
\sigma_i^\dagger = \begin{cases} 1/\sigma_i & \text{if } \sigma_i > 0, \\ 0 & \text{else,} \end{cases}
\]

and its dimension is n × m. If the rank of Σ is r, then the product Σ†Σ contains the r × r identity matrix in its upper-left block, and all other elements are zero.
We can use this very simple expression for the pseudoinverse of Σ
to express the pseudoinverse for a general m × n matrix A using its
SVD,
A† = (U ΣV T )−1 = V Σ† U T , (16.7)

recalling that U and V are orthogonal matrices.


If A is square and invertible, then A† = A−1 . Otherwise, A† still
has some properties of an inverse:

A† AA† = A† , (16.8)
AA† A = A. (16.9)
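The construction (16.7) is easy to code. The sketch below (assuming NumPy; the helper name pseudoinverse is made up) builds A† from the SVD, compares it with np.linalg.pinv, and checks (16.8) and (16.9):

import numpy as np

def pseudoinverse(A, tol=1e-12):
    # A^dagger = V Sigma^dagger U^T, built from the SVD as in (16.7).
    U, sigma, Vt = np.linalg.svd(A, full_matrices=True)
    m, n = A.shape
    Sigma_dag = np.zeros((n, m))
    for i, s in enumerate(sigma):
        if s > tol:
            Sigma_dag[i, i] = 1.0 / s
    return Vt.T @ Sigma_dag @ U.T

A = np.array([[1.0, 0.0], [0.0, 2.0], [0.0, 1.0]])   # matrix of Example 16.3
A_dag = pseudoinverse(A)

print(np.allclose(A_dag, np.linalg.pinv(A)))         # True
print(np.allclose(A_dag @ A @ A_dag, A_dag))         # property (16.8)
print(np.allclose(A @ A_dag @ A, A))                 # property (16.9)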

Example 16.6

Let's find the pseudoinverse of the matrix from Example 16.3,
\[
A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \\ 0 & 1 \end{bmatrix}.
\]

We find
\[
\Sigma^\dagger = \begin{bmatrix} 1/2.23 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix},
\]

then (16.7) results in
\[
A^\dagger =
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} 1/2.23 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 0 & 0.89 & 0.44 \\ 1 & 0 & 0 \\ 0 & -0.44 & 0.89 \end{bmatrix} =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2/5 & 1/5 \end{bmatrix}.
\]

And we check that (16.8) holds:
\[
A^\dagger =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2/5 & 1/5 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & 2 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2/5 & 1/5 \end{bmatrix},
\]

and also (16.9):
\[
A =
\begin{bmatrix} 1 & 0 \\ 0 & 2 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2/5 & 1/5 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & 2 \\ 0 & 1 \end{bmatrix}.
\]

Example 16.7

The matrix from Example 16.1 is square and nonsingular, therefore the pseudoinverse is equal to the inverse. Just by visual inspection:
\[
A = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}, \qquad
A^{-1} = \begin{bmatrix} 1/3 & 0 \\ 0 & 1 \end{bmatrix},
\]

and the pseudoinverse is
\[
A^\dagger =
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1/3 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} =
\begin{bmatrix} 1/3 & 0 \\ 0 & 1 \end{bmatrix}.
\]

This generalization of the inverse of a matrix is often called the Moore-Penrose generalized inverse. Least squares approximation is a primary application for this pseudoinverse.

16.6 Least Squares


The pseudoinverse allows for a concise approximate solution, specif-
ically the least squares solution, to an overdetermined linear sys-
tem. (A detailed introduction to least squares may be found in Sec-
tion 12.7.)

In an overdetermined linear system

Ax = b,

A is a rectangular matrix with dimension m × n, m > n. This means


that the linear system has m equations in n unknowns and it is in-
consistent because it is unlikely that b lives in the subspace V defined
by the columns of A. The least squares solution finds the orthogonal
projection of b into V, which we will call b′. Thus the solution to Ax = b′ produces the vector closest to b that lives in V. This leads
us to the normal equations

AT Ax = AT b, (16.10)

whose solution minimizes

‖Ax − b‖. (16.11)

In our introduction to condition numbers in Section 13.4, we dis-


cussed that this system can be ill-posed because the condition number
of AT A is the square of that of A. To avoid forming AT A, in Sec-
tion 13.1 we proposed the Householder algorithm as a method to find
the least squares solution.
As announced at the onset of this section, the SVD and pseudoin-
verse provide a numerically stable method for finding the least squares
solution to an overdetermined linear system. We find an approximate
solution rather easily:
x = A† b. (16.12)
But why is this the least squares solution?
Again, we want to find x to minimize (16.11). Let’s frame the
linear system in terms of the SVD of A and take advantage of the
orthogonality of U ,

Ax − b = U ΣV T x − b
= U ΣV T x − U U T b
= U (Σy − z).

This new framing of the problem exposes that
\[
\|Ax - b\| = \|\Sigma y - z\|,
\]

and leaves us with an easier diagonal least squares problem to solve.


The steps involved are as follows.

1. Compute the SVD A = U ΣV T .

2. Compute the m × 1 vector z = U T b.

3. Compute the n × 1 vector y = Σ† z.


This is the least squares solution to the m × n problem Σy = z.
The least squares solution requires minimizing v = Σy − z, which
has the simple form:
\[
v = \begin{bmatrix} \sigma_1 y_1 - z_1 \\ \sigma_2 y_2 - z_2 \\ \vdots \\ \sigma_r y_r - z_r \\ -z_{r+1} \\ \vdots \\ -z_m \end{bmatrix},
\]

where the rank of Σ is r. It is easy to see that the y that will minimize ‖v‖ is yi = zi/σi for i = 1, . . . , r, hence y = Σ†z.

4. Compute the n × 1 solution vector x = V y.

To summarize, the least squares solution is reduced to a diagonal least


squares problem (Σy = z), which requires only simple matrix-vector
multiplications. The calculations in reverse order include

x=Vy
x = V (Σ† z)
x = V Σ† (U T b).

We have rediscovered the pseudoinverse, and we have come back to


(16.12), while verifying that this is indeed the least squares solution.

Example 16.8

Let’s revisit the least squares problem that we solved using the normal
equations in Example 12.13 and the Householder method in Example 13.3. The 7 × 2 overdetermined linear system, Ax = b, is


\[
\begin{bmatrix} 0 & 1 \\ 10 & 1 \\ 20 & 1 \\ 30 & 1 \\ 40 & 1 \\ 50 & 1 \\ 60 & 1 \end{bmatrix} x =
\begin{bmatrix} 30 \\ 25 \\ 40 \\ 40 \\ 30 \\ 5 \\ 25 \end{bmatrix}.
\]

The best fit line coefficients are found in four steps.


1. Compute the SVD, A = U ΣV T . (The matrix dimensions are as
follows: U is 7 × 7, Σ is 7 × 2, V is 2 × 2.)
\[
\Sigma = \begin{bmatrix} 95.42 & 0 \\ 0 & 1.47 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}, \qquad
\Sigma^\dagger = \begin{bmatrix} 0.01 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0.68 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}.
\]

2. Compute
\[
z = U^T b = \begin{bmatrix} 54.5 \\ 51.1 \\ 3.2 \\ -15.6 \\ 9.6 \\ 15.2 \\ 10.8 \end{bmatrix}.
\]

3. Compute
\[
y = \Sigma^\dagger z = \begin{bmatrix} 0.57 \\ 34.8 \end{bmatrix}.
\]

4. Compute
\[
x = Vy = \begin{bmatrix} -0.23 \\ 34.8 \end{bmatrix},
\]
resulting in the same best fit line, x2 = −0.23x1 + 34.8, as we found via the normal equations and the Householder method.
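The four steps can be replayed numerically; a sketch assuming NumPy, with np.linalg.lstsq as a cross-check:

import numpy as np

A = np.array([[0., 1.], [10., 1.], [20., 1.], [30., 1.],
              [40., 1.], [50., 1.], [60., 1.]])
b = np.array([30., 25., 40., 40., 30., 5., 25.])

U, sigma, Vt = np.linalg.svd(A, full_matrices=True)   # step 1
z = U.T @ b                                           # step 2
y = np.zeros(A.shape[1])
y[:len(sigma)] = z[:len(sigma)] / sigma               # step 3: y = Sigma^dagger z
x = Vt.T @ y                                          # step 4

print(x)                                              # approx [-0.23, 34.8]
print(np.linalg.lstsq(A, b, rcond=None)[0])           # the same least squares solution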

The normal equations (16.10) give a best approximation

x = (AT A)−1 AT b (16.13)

to the original problem Ax = b. This approximation was developed by considering a new right-hand side vector b′ in the subspace of A, called V. If we substitute the expression for x in (16.13) into Ax = b′, we have an expression for b′ in relation to b,
\[
b' = A(A^TA)^{-1}A^Tb = AA^\dagger b = \operatorname{proj}_{\mathcal{V}} b.
\]

Geometrically, we can see that AA† is a projection because the goal


is to project b into the subspace V. A projection must be idempotent
as well, and a property of the pseudoinverse in (16.8) ensures this.
This application of the SVD gave us the opportunity to bring to-
gether several important linear algebra topics!

16.7 Application: Image Compression


Suppose the m × n matrix A has k = min(m, n) singular values σi
and as before, σ1 ≥ σ2 ≥ . . . ≥ σk . Using the SVD, it is possible to
write A as a sum of k rank one matrices:

A = σ1 u1 v1T + σ2 u2 v2T + . . . + σk uk vkT . (16.14)

This is analogous to what we did in (7.13) for the eigendecomposition.


We can use (16.14) for image compression. An image is comprised
of a grid of colored pixels.1 Figure 16.7 (left) is a very simple example;
it has only 4 × 4 pixels. Each grayscale is associated with a number,
thus this grid can be thought of as a matrix. The singular values for
this matrix are σi = 7.1, 3.8, 1.3, 0.3. Let’s refer to the images from
left to right as I0 , I1 , I2 , I3 . The original image is I0 . The matrix

A1 = σ1 u1 v1T

results in image I1 and the matrix

A2 = σ1 u1 v1T + σ2 u2 v2T

results in image I2 . Notice that the original image is nearly replicated


incorporating only half the singular values. This is due to the fact
1 We use grayscales here.

Figure 16.7.
Image compression: a method that uses SVD. The input matrix has singular values
σ i = 7.1, 3.8, 1.3, 0.3. Far left: original image; from left to right: recovering the image
by adding projection terms.

that σ1 and σ2 are large in comparison to σ3 and σ4 . Image I3 is


created from A3 = A2 + σ3 u3 v3T . Image I4 , which is not displayed, is
identical to I0 .
The change in an image by adding only those Ii corresponding
to small σi can be hardly noticeable. Thus omitting images Ik cor-
responding to small σk amounts to compressing the original image;
there is no severe quality loss for small σk . Furthermore, if some σi
are zero, compression can clearly be achieved. This is the case in
Figure 16.1, an 8 × 8 matrix with σi = 6.2, 1.7, 1.49, 0, . . . , 0. The
figure illustrates images corresponding to each nonzero σi . The last
image is identical to the input, making it clear that the five remaining
σi = 0 are unimportant to image quality.
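The rank-one build-up of (16.14) takes only a few lines; a sketch assuming NumPy, with a small made-up grayscale matrix standing in for the image of Figure 16.7:

import numpy as np

# A hypothetical 4 x 4 grayscale "image" (not the one in the figure).
I0 = np.array([[0.9, 0.9, 0.1, 0.1],
               [0.9, 0.8, 0.2, 0.1],
               [0.1, 0.2, 0.8, 0.9],
               [0.1, 0.1, 0.9, 0.9]])

U, sigma, Vt = np.linalg.svd(I0)

approx = np.zeros_like(I0)
for k, s in enumerate(sigma, start=1):
    # Add the next rank-one term sigma_k u_k v_k^T, as in (16.14).
    approx += s * np.outer(U[:, k - 1], Vt[k - 1, :])
    print(f"rank-{k} approximation, error {np.linalg.norm(I0 - approx):.4f}")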

16.8 Principal Components Analysis


Figure 16.8 illustrates a 2D scatter plot: each circle represents a co-
ordinate pair (point) in the [e1 , e2 ]-system. For example, we might
want to determine if gross domestic product (GDP) and the poverty
rate (PR) for countries in the World Trade Organization are related.
We would then record [GDP PR]T as coordinate pairs. How might
we reveal trends in this data set?
Let our data set be given as x1 , . . . , xn , a set of 2D points such
that x1 + . . . + xn = 0. This condition simply means that the origin
is the centroid of the points. The data set in Figure 16.8 has been
translated to produce Figure 16.9 (left). Let d be a unit vector. Then
the projection of xi onto a line through the origin containing d is a
vector with (signed) length xi · d. Let l(d) be the sum of the squares
of all these lengths:

l(d) = [x1 · d]2 + . . . + [xn · d]2 .



Figure 16.8.
Scatter plot: data pairs recorded in Cartesian coordinates.

Imagine rotating d around the origin. For each position of d, we


compute the value l(d). For the longest line in the left part of Fig-
ure 16.9, the value of l(d) is large in comparison to d orthogonal to
it, demonstrating that the value of l(d) indicates the variation in the
point set from the line generated by d. Higher variation results in
larger l(d).
Let's arrange the data xi in a matrix
\[
X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{bmatrix}.
\]

Then we rewrite l(d) as
\[
l(d) = \|Xd\|^2 = (Xd)\cdot(Xd) = d^TX^TXd. \qquad (16.15)
\]

We further abbreviate C = X T X and note that C is a symmetric,


positive definite 2 × 2 matrix. Hence (16.15) describes a quadratic
form just as we discussed in Section 7.6.
For which d is l(d) maximal? The answer is simple: for d being the
eigenvector corresponding to C’s largest eigenvalue. Similarly, l(d) is
minimal for d being the eigenvector corresponding to C’s smallest

Figure 16.9.
PCA: Analysis of a data set. Left: given data with centroid translated to the origin. Thick
lines are coincident with the eigenvectors scaled by their corresponding eigenvalue.
Right: points, eigenvector lines, and quadratic form over the unit circle.

eigenvalue. Recall that these eigenvectors form the major and minor
axis of the action ellipse of C, and the thick lines in Figure 16.9 (left)
are precisely these axes. In the right part of Figure 16.9, the quadratic
form for the data set is shown along with the data and action ellipse
axes. We see that the dominant eigenvector corresponds to highest
variance in the data and this is reflected in the quadratic form as
well. If λ1 = λ2 , then there is no preferred direction in the point
set and the quadratic form is spherical. We are guaranteed that the
eigenvectors will be orthogonal because C is a symmetric matrix.
Looking more closely at C, we see its very simple form,

\[
\begin{aligned}
c_{1,1} &= x_{1,1}^2 + x_{2,1}^2 + \ldots + x_{n,1}^2, \\
c_{1,2} = c_{2,1} &= x_{1,1}x_{1,2} + x_{2,1}x_{2,2} + \ldots + x_{n,1}x_{n,2}, \\
c_{2,2} &= x_{1,2}^2 + x_{2,2}^2 + \ldots + x_{n,2}^2.
\end{aligned}
\]

If each element of C is divided by n, then this is called the covariance


matrix. This matrix is a summary of the variation in each coordi-
nate and between coordinates. Dividing by n will result in scaled
eigenvalues; the eigenvectors will not change.

Figure 16.10.
PCA data transformations: three possible data transformations based on PCA analy-
sis. Top: data points transformed to the principal components coordinate system. This
set appears in all images. Middle: data compression by keeping dominant eigenvector
component. Bottom: data compression by keeping the nondominant eigenvector.

The eigenvectors provide a convenient local coordinate frame for


the data set. This isn’t a new idea: it is exactly the principle of
the eigendecomposition. This frame is commonly called the princi-
pal axes. Now we can construct an orthogonal transformation of the
data, expressing them in terms of a new, more meaningful coordinate
system. Let the matrix V = [v1 v2 ] hold the normalized eigenvec-
tors as column vectors, where v1 is the dominant eigenvector. The
orthogonal transformation of the data X that aligns v1 with e1 and
v2 with e2 is simply
X̂ = XV.

This results in
\[
\hat{x}_i = \begin{bmatrix} x_i \cdot v_1 \\ x_i \cdot v_2 \end{bmatrix}.
\]
Figure 16.10 (top) illustrates the result of this transformation. Revisit
Example 7.6: this is precisely the transformation we used to align
the contour ellipse to the coordinate axes. (The transformation is
written a bit differently here to accommodate the point set organized
as transposed vectors.)
In summary: the data coordinates are now in terms of the trend
lines, defined by the eigenvectors of the covariance matrix, and the
coordinates directly measure the distance from each trend line. The
greatest variance corresponds to the first coordinate in this principal
components coordinate system. This leads us to the name of this
method: principal components analysis (PCA).
So far, PCA has worked with all components of the given data.
However, it can also be used for data compression by reducing di-
mensionality. Instead of constructing V to hold all eigenvectors, we
may use only the most significant, so suppose V = v1 . This transfor-
mation produces the middle image shown in Figure 16.10. If instead,
V = v2 , then the result is the bottom image, and for clarity, we chose
to display these points on the e2 -axis, but this is arbitrary. Compar-
ing these results: there is greater spread of the data in the middle
image, which corresponds to a trend line with higher variance.
Here we focused on 2D data, but the real power of PCA comes with
higher dimensional data for which it is very difficult to visualize and
understand relationships between dimensions. PCA makes it possible
to identify insignificant dimensions and eliminate them.
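A compact PCA sketch in the spirit of this section, assuming NumPy; the data values are hypothetical:

import numpy as np

# Hypothetical 2D data set, one point per row.
X = np.array([[2.0, 1.1], [3.1, 2.0], [4.2, 2.9],
              [5.0, 4.1], [6.1, 4.8], [7.2, 6.1]])

X = X - X.mean(axis=0)           # translate the centroid to the origin

C = X.T @ X                      # divide by n to obtain the covariance matrix
lam, V = np.linalg.eigh(C)       # eigenvalues ascending, eigenvectors as columns
order = np.argsort(lam)[::-1]    # dominant eigenvector first
lam, V = lam[order], V[:, order]

X_hat = X @ V                    # coordinates in the principal components frame
print("eigenvalues:", lam)

# Dimensionality reduction: keep only the dominant direction.
X_compressed = X @ V[:, :1]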

• singular value decomposition (SVD)
• singular values
• right singular vector
• left singular vector
• SVD matrix dimensions
• SVD column, row, and null spaces
• SVD steps
• volume in terms of singular values
• matrix decomposition
• action ellipse axes length
• pseudoinverse
• generalized inverse
• least squares solution via the pseudoinverse
• quadratic form
• contour ellipse
• principal components analysis (PCA)
• covariance matrix
• eigendecomposition

16.9 Exercises
1. Find the SVD for
\[
A = \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix}.
\]
2. What is the eigendecomposition of matrix A in Exercise 1.
3. For what type of matrix are the eigenvalues the same as the singular
values?
4. The action of a 2 × 2 linear map can be described by the mapping of
the unit circle to an ellipse. Figure 4.3 illustrates such an ellipse. What
are the lengths of the semi-axes? What are the singular values of the
corresponding matrix,
\[
A = \begin{bmatrix} 1/2 & 0 \\ 0 & 2 \end{bmatrix}?
\]
5. Find the SVD for the matrix
\[
A = \begin{bmatrix} 0 & -2 \\ 1 & 0 \\ 0 & 0 \end{bmatrix}.
\]

6. Let
\[
A = \begin{bmatrix} -1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & -2 \end{bmatrix},
\]
and C = AT A. Is one of the eigenvalues of C negative?
7. For the matrix
\[
A = \begin{bmatrix} -2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},
\]
show that both (16.5) and (16.6) yield the same result for the absolute
value of the determinant of A.
8. For the matrix
\[
A = \begin{bmatrix} 2 & 0 \\ 2 & 0.5 \end{bmatrix},
\]
show that both (16.5) and (16.6) yield the same result for the determi-
nant of A.
9. For the matrix
\[
A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \end{bmatrix},
\]
show that both (16.5) and (16.6) yield the same result for the absolute
value of the determinant of A.
10. What is the pseudoinverse of the matrix from Exercise 5?

11. What is the pseudoinverse of the matrix
\[
\begin{bmatrix} 0 & 1 \\ 3 & 0 \\ 0 & 0 \end{bmatrix}?
\]
12. What is the pseudoinverse of the matrix
\[
A = \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix}?
\]
Note that this matrix is actually a vector.
13. What is the pseudoinverse of
\[
A = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}?
\]
14. What is the least squares solution to the linear system Av = b given by:
\[
\begin{bmatrix} 1 & 0 \\ 0 & 2 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} =
\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}.
\]
Use the pseudoinverse and the enumerated steps in Section 16.6. The
SVD of A may be found in Example 16.3.
15. What is the least squares solution to the linear system Av = b given by:
\[
\begin{bmatrix} 4 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} =
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.
\]
Use the pseudoinverse and the enumerated steps in Section 16.6.
16. What is the least squares solution to the linear system Av = b given by:
\[
\begin{bmatrix} 3 & 0 & 0 \\ 0 & 1 & 2 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} =
\begin{bmatrix} 3 \\ 3 \\ 1 \end{bmatrix}.
\]
Use the pseudoinverse and the enumerated steps in Section 16.6.
17. For the following data set X, apply PCA using all eigenvectors. Give
the covariance matrix and the final components X̂ in the principal com-
ponents coordinate system.
\[
X = \begin{bmatrix} -2 & 2 \\ -1 & -1 \\ 0 & 0 \\ 1 & 1 \\ 2 & 2 \\ -1 & 0 \\ 1 & 0 \end{bmatrix}
\]
Make a sketch of the data; this will help with finding the solution.
18. For the data set in Exercise 17, apply PCA using the dominant eigen-
vector only.
17
Breaking It Up: Triangles

Figure 17.1.
2D finite element method: refinement of a triangulation based on stress and strain
calculations. (Source: J. Shewchuk, http://www.cs.cmu.edu/∼ quake/triangle.html.)

Triangles are as old as geometry. They were of interest to the an-


cient Greeks, and in fact the roots of trigonometry can be found in


their study. Triangles also became an indispensable tool in computer


graphics and advanced disciplines such as finite element analysis. In
graphics, objects are broken down into triangular facets for display
purposes; in the finite element method (FEM), 2D shapes are broken
down into triangles in order to facilitate complicated algorithms. Fig-
ure 17.1 illustrates a refinement procedure based on stress and strain
calculations. For both applications, reducing the geometry to linear
or piecewise planar makes computations more tractable.

17.1 Barycentric Coordinates

A triangle T is given by three points, its vertices, p1 , p2 , and p3 .


The vertices may live in 2D or 3D. Three points define a plane, thus
a triangle is a 2D element. We use the convention of labeling the pi
in a counterclockwise sense. The edge, or side, opposite point pi is
labeled si . (See Sketch 17.1.)
Sketch 17.1.
Vertices and edges of a triangle.

When we study the properties of this triangle, it is more convenient to work in terms of a local coordinate system that is closely tied to the
triangle. This type of coordinate system was invented by F. Moebius
in 1827 and is known as barycentric coordinates.
Let p be an arbitrary point inside T . Our aim is to write it as a
combination of the vertices pi in a form such as

p = up1 + vp2 + wp3 . (17.1)

We know one thing already: the right-hand side of this equation is a


combination of points, and so the coefficients must sum to one:

u + v + w = 1. (17.2)

Otherwise, we would not have a barycentric combination! (See


Sketch 17.2.
Barycentric coordinates.

We can simply write (17.1) and (17.2) as a linear system
\[
\begin{bmatrix} p_1 & p_2 & p_3 \\ 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} u \\ v \\ w \end{bmatrix} =
\begin{bmatrix} p \\ 1 \end{bmatrix}.
\]

Using Cramer’s rule, the solution to this 3 × 3 linear system is formed



as ratios of determinants or areas, thus


area(p, p2 , p3 )
u= , (17.3)
area(p1 , p2 , p3 )
area(p, p3 , p1 )
v= , (17.4)
area(p1 , p2 , p3 )
area(p, p1 , p2 )
w= . (17.5)
area(p1 , p2 , p3 )
Recall that for linear interpolation in Section 2.5, barycentric coor-
dinates on a line segment were defined in terms of ratios of lengths.
Here, for a triangle, we have the analogous ratios of areas. To review
determinants and areas, see Sections 8.2, 8.5, and 9.8.
Let’s see why this works. First, we observe that (u, v, w) do indeed
sum to one. Next, let p = p2 . Now (17.3)–(17.5) tell us that v = 1
and u = w = 0, just as expected. One more check: if p is on the edge
s1 , say, then u = 0, again as expected. Try to show for yourself that
the remaining vertices and edges work the same way.
We call (u, v, w) barycentric coordinates and denote them by bold-
face: u = (u, v, w). Although they are not independent of each other
(e.g., we may set w = 1 − u − v), they behave much like “normal”
coordinates: if p is given, then we can find u from (17.3)–(17.5). If
u is given, then we can find p from (17.1).
The three vertices of the triangle have barycentric coordinates

p1 ≅ (1, 0, 0),
p2 ≅ (0, 1, 0),
p3 ≅ (0, 0, 1).

The ≅ symbol will be used to indicate the barycentric coordinates of
a point. These and several other examples are shown in Sketch 17.3.
Sketch 17.3.
Examples of barycentric coordinates.

As you see, even points outside of T can be given barycentric coordinates! This works since the areas involved in (17.3)–(17.5) are signed. So points inside T have positive barycentric coordinates, and those outside have mixed signs.1
This observation is the basis for one of the most frequent uses of barycentric coordinates: the triangle inclusion test. If a triangle T and a point p are given, how do we determine if p is inside T or not? We simply compute p's barycentric coordinates and check their signs. If they are all of the same sign, inside—else, outside. Theoretically, one or two of the barycentric coordinates could be zero, indicating that p is on one of the edges. In "real" situations, you are not likely to encounter values that are exactly equal to zero; be sure not to test for a barycentric coordinate to be equal to zero! Instead, use a tolerance ε, and flag a point as being on an edge if one of its barycentric coordinates is less than ε in absolute value. A good value for ε? Obviously, this is application dependent, but something like 1.0E−6 should work for most cases.

1 This assumes the triangle to be oriented counterclockwise. If it is oriented clockwise, then the points inside have all negative barycentric coordinates, and the outside ones still have mixed signs.
Finally, Sketch 17.4 shows how we may think of the whole plane
as being covered by a grid of coordinate lines. Note that the plane is
divided into seven regions by the (extended) edges of T .
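As a small Python sketch (not from the book), the barycentric coordinates (17.3)–(17.5) and the triangle inclusion test look like this; signed_area is a 2 × 2 determinant:

def signed_area(a, b, c):
    # Signed area of triangle a, b, c; positive for counterclockwise order.
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))

def barycentric(p, p1, p2, p3):
    # Barycentric coordinates (u, v, w) of p with respect to triangle p1, p2, p3.
    total = signed_area(p1, p2, p3)
    u = signed_area(p, p2, p3) / total      # (17.3)
    v = signed_area(p, p3, p1) / total      # (17.4)
    w = signed_area(p, p1, p2) / total      # (17.5)
    return u, v, w

def inside(p, p1, p2, p3, eps=1e-6):
    # Triangle inclusion test with tolerance eps.
    u, v, w = barycentric(p, p1, p2, p3)
    return u >= -eps and v >= -eps and w >= -eps

# The triangle with vertices (0,0), (1,0), (0,1), also used in Example 17.1 below.
p1, p2, p3 = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
print(barycentric((1.0 / 3, 1.0 / 3), p1, p2, p3))   # approx (1/3, 1/3, 1/3)
print(inside((1.0, 1.0), p1, p2, p3))                # False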

Example 17.1

Let's work with a simple example that is easy to sketch. Suppose the three triangle vertices are given by
\[
p_1 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad
p_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad
p_3 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.
\]
Sketch 17.4.
Barycentric coordinates coordinate lines.

The points q, r, s with barycentric coordinates
\[
q \cong \left(0, \tfrac{1}{2}, \tfrac{1}{2}\right), \quad
r \cong (-1, 1, 1), \quad
s \cong \left(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right)
\]
have the following coordinates in the plane:
\[
q = 0 \times p_1 + \tfrac{1}{2} \times p_2 + \tfrac{1}{2} \times p_3 = \begin{bmatrix} 1/2 \\ 1/2 \end{bmatrix},
\]
\[
r = -1 \times p_1 + 1 \times p_2 + 1 \times p_3 = \begin{bmatrix} 1 \\ 1 \end{bmatrix},
\]
\[
s = \tfrac{1}{3} \times p_1 + \tfrac{1}{3} \times p_2 + \tfrac{1}{3} \times p_3 = \begin{bmatrix} 1/3 \\ 1/3 \end{bmatrix}.
\]

17.2 Affine Invariance


In this short section, we will discuss the statement: barycentric coor-
dinates are affinely invariant.

Let T̂ be an affine image of T , having vertices pˆ1 , pˆ2 , pˆ3 . Let p be


a point with barycentric coordinates u relative to T . We may apply
the affine map to p also, and then we ask: What are the barycentric
coordinates of p̂ with respect to T̂ ?
While at first sight this looks like a daunting task, simple geometry
yields the answer quickly. Note that in (17.3)–(17.5), we employ ratios
of areas. These are, as introduced in Section 6.2, unchanged by affine
maps. So while the individual areas in (17.3)–(17.5) do change, their
quotients do not. Thus, p̂ also has barycentric coordinates u with
respect to T̂ .
This fact, namely that affine maps do not change barycentric co-
ordinates, is what is meant by the statement at the beginning of this
section. (See Sketch 17.5 for an illustration.)
Sketch 17.5.
Affine invariance of barycentric
coordinates.
Example 17.2

Let’s revisit the simple triangle in Example 17.1 and look at the
affine invariance of barycentric coordinates. Suppose we apply a 90◦
rotation,  
0 −1
R=
1 0
to the triangle vertices, resulting in p̂i = Rpi . Apply this rotation to
s from Example 17.1,
 
−1/3
ŝ = Rs = .
1/3

Due to the affine invariance of barycentric coordinates, we could have found the coordinates of ŝ as
\[
\hat{s} = \tfrac{1}{3} \times \hat{p}_1 + \tfrac{1}{3} \times \hat{p}_2 + \tfrac{1}{3} \times \hat{p}_3 = \begin{bmatrix} -1/3 \\ 1/3 \end{bmatrix}.
\]

17.3 Some Special Points


In classical geometry, many special points relative to a triangle have
been discovered, but for our purposes, just three will do: the centroid,
the incenter, and the circumcenter. They are used for a multitude of
geometric computations.

The centroid c of a triangle is given by the intersection of the three


medians. (A median is the connection of an edge midpoint to the
opposite vertex.) Its barycentric coordinates (see Sketch 17.6) are
given by 
∼ 1 1 1
c= , , . (17.6)
3 3 3

We verify this by writing


 
1 1 1 1 2 1 1
, , = (0, 1, 0) + , 0, ,
Sketch 17.6. 3 3 3 3 3 2 2
The centroid.  
thus asserting that 13 , 13 , 13 lies on the median associated with p2 . In
the same way, we show that it is also on the remaining two medians.
We also observe that a triangle and its centroid are related in an
affinely invariant way.
The incenter i of a triangle is the intersection of the three angle
bisectors (see Sketch 17.7). There is a circle, called the incircle, that
has i as its center and touches all three triangle edges. Let si be the
length of the triangle edge opposite vertex pi . Let r be the radius of
the incircle—there is a formula for it, but we won’t need it here.
If the barycentric coordinates of i are (i1 , i2 , i3 ), then we see that

area(i, p2 , p3 )
i1 = .
area(p1 , p2 , p3 )
Sketch 17.7.
The incenter. This may be rewritten as
rs1
i1 = ,
rs1 + rs2 + rs3
using the “1/2 base times height” rule for triangle areas.
Simplifying, we obtain
s1 s2 s3
i1 = , i2 = , i3 = ,
c c c
where c = s1 + s2 + s3 is the circumference of T . A triangle is not
affinely related to its incenter—affine maps change the barycentric
coordinates of i.
Sketch 17.8.
The circumcenter.

The circumcenter cc of a triangle is the center of the circle through its vertices. It is obtained as the intersection of the edge bisectors. (See Sketch 17.8.) Notice that the circumcenter might not be inside the triangle. This circle is called the circumcircle and we will refer to its radius as R.

The barycentric coordinates (cc1, cc2, cc3) of the circumcenter are
\[
cc_1 = \frac{d_1(d_2 + d_3)}{D}, \quad
cc_2 = \frac{d_2(d_1 + d_3)}{D}, \quad
cc_3 = \frac{d_3(d_1 + d_2)}{D},
\]
where
\[
\begin{aligned}
d_1 &= (p_2 - p_1) \cdot (p_3 - p_1), \\
d_2 &= (p_1 - p_2) \cdot (p_3 - p_2), \\
d_3 &= (p_1 - p_3) \cdot (p_2 - p_3), \\
D &= 2(d_1 d_2 + d_2 d_3 + d_3 d_1).
\end{aligned}
\]
Furthermore,
\[
R = \frac{1}{2}\sqrt{\frac{(d_1 + d_2)(d_2 + d_3)(d_3 + d_1)}{D/2}}.
\]

These formulas are due to [8]. Confirming our observation in


Sketch 17.8 that the circumcircle might be outside of the triangle,
note that some of the cci may be negative. If T has an angle close
to 180◦, then the corresponding cci will be very negative, leading to
serious numerical problems. As a result, the circumcenter will be far
away from the vertices, and thus not be of practical use. As with the
incenter, affine maps of the triangle change the barycentric coordi-
nates of the circumcenter.

Example 17.3

Yet again, let's visit the simple triangle in Example 17.1. Be sure to make a sketch to check the results of this example. Let's compute the incenter. The lengths of the edges of the triangle are s1 = √2, s2 = 1, and s3 = 1. The circumference of the triangle is c = 2 + √2. The barycentric coordinates of the incenter are then
\[
i \cong \left(\frac{\sqrt{2}}{2+\sqrt{2}}, \frac{1}{2+\sqrt{2}}, \frac{1}{2+\sqrt{2}}\right).
\]
(These barycentric coordinates are approximately (0.41, 0.29, 0.29).) The coordinates of the incenter are
\[
i = 0.41 \times p_1 + 0.29 \times p_2 + 0.29 \times p_3 = \begin{bmatrix} 0.29 \\ 0.29 \end{bmatrix}.
\]
The circumcircle's circumcenter is easily calculated, too. First compute d1 = 0, d2 = 1, d3 = 1, and D = 2. Then the barycentric coordinates of the circumcenter are cc ≅ (0, 1/2, 1/2). This is the midpoint of the "diagonal" edge of the triangle. Now the radius of the circumcircle is easily computed with the equation above, R = √2/2.
Now the radius of√the circumcircle is easily computed with the
equation above, R = 2/2.

More interesting constructions based on triangles may be found in


Coxeter [3].

17.4 2D Triangulations
The study of one triangle is the realm of classical geometry; in mod-
ern applications, one often encounters millions of triangles. Typically,
they are connected in some well-defined way; the most basic one
being the 2D triangulation. Triangulations have been used in sur-
veying for centuries; more modern applications rely on satellite data,
which are collected in triangulations called TINs (triangular irregular
networks).
Sketch 17.9.
Examples of illegal triangulations.

Here is the formal definition of a 2D triangulation. A triangulation of a set of 2D points $\{p_i\}_{i=1}^N$ is a connected set of triangles meeting the following criteria:
1. The vertices of the triangles consist of the given points.
2. The interiors of any two triangles do not intersect.
3. If two triangles are not disjoint, then they share a vertex or have
coinciding edges.
4. The union of all triangles equals the convex hull of the pi .
These rules sound abstract, but some examples will shed light on
them. Figure 17.2 shows a triangulation that satisfies the 2D triangu-
lation definition. Evident from this example: the number of triangles
surrounding a vertex, or valence, varies from vertex to vertex. These
triangles make up the star of a vertex. In contrast, Sketch 17.9 shows
three illegal triangulations, violating the above rules. The top exam-
ple involves overlapping triangles. In the middle example, the bound-
ary of the triangulation is not the convex hull of the point set. (A lot
more on convex hulls may be found in [4]; also see Section 18.3.) The
bottom example violates condition 3.
Sketch 17.10.
Nonuniqueness of triangulations.

Figure 17.2.
Triangulation: a valid triangulation of the convex hull.

If we are given a point set, is there a unique triangulation? Certainly not, as Sketch 17.10 shows. Among the many possible triangulations, there is one that is most commonly agreed to be the "best." This is the Delaunay triangulation. Describing the details of


this method is beyond the scope of this text, however a wealth of
information can be found on the Web.

17.5 A Data Structure


What is the best data structure for storing a triangulation? The
factors that determine the best structure include storage requirements
and accessibility. Let’s build the “best” structure based on the point
set and triangulation illustrated in Sketch 17.11.
Sketch 17.11.
A sample triangulation.

In order to minimize storage, it is an accepted practice to store each point only once. Since these are floating point values, they take
up the most space. Thus, a basic triangulation structure would be
a listing of the point set followed by the triangulation information.
This constitutes pointers into the point set, indicating which points
are joined to form a triangle. Store the triangles in a counterclock-
wise orientation! This is the data structure for the triangulation in
Sketch 17.11:

5 (number of points)
0.0 0.0 (point #1)
1.0 0.0
0.0 1.0
0.25 0.3
0.5 0.3
5 (number of triangles)
1 2 5 (first triangle - connects points #1,2,5)
2 3 5
4 5 3
1 5 4
1 4 3
We can improve this structure. We will encounter applications
that require knowledge of the connectivity of the triangulation, as
described in Section 17.6. To facilitate this, it is not uncommon to
also see the neighbor information of the triangulation stored. This
means that for each triangle, the indices of the triangles surrounding it
are stored. For example, in Sketch 17.11, triangle 1 defined by points
1, 2, 5 is surrounded by triangles 2, 4, −1. The neighboring triangles
are listed corresponding to the point across from the shared edge.
Triangle −1 indicates that there is not a neighboring triangle across
this edge. Immediately, we see that this gives us a fast method for
determining the boundary of the triangulation. Listing the neighbor
information after each triangle, the final data structure is as follows.
5 (number of points)
0.0 0.0 (point #1)
1.0 0.0
0.0 1.0
0.25 0.3
0.5 0.3
5 (number of triangles)
1 2 5 2 4 -1 (first triangle and neighbors)
2 3 5 3 1 -1
4 5 3 2 5 4
1 5 4 3 5 1
1 4 3 3 -1 4
This is but one of many possible data structures for a triangula-
tion. Based on the needs of particular applications, researchers have
developed a variety of structures to optimize searches. One such
structure that has proved to be popular is called the winged-edge
data structure [14].

17.6 Application: Point Location


Given a triangulation of points pi , assume we are given a point p that
has not been used in building the triangulation. Question: In which
triangle is p, if any? The easiest way is to compute p’s barycentric
coordinates with respect to all triangles; if all of them are positive
with respect to some triangle, then that is the desired one; else, p is
in none of the triangles.
While simple, this algorithm is expensive. In the worst case, every
triangle has to be considered; on average, half of all triangles have to
be considered. A much more efficient algorithm may be based upon
the following observation. Suppose p is not in a particular triangle
T . Then at least one of its barycentric coordinates with respect to
T must be negative; let’s assume it is u. We then know that p has
no chance of being inside T ’s two neighbors along edges s2 or s3 (see
Sketch 17.12).
So a likely candidate to check is the neighbor along s1 —recall that
we have stored the neighboring information in a data structure. In
this way—always searching in the direction of the currently most
negative barycentric coordinate—we create a path from a starting triangle to the one that actually contains p.

Sketch 17.12.
Point location triangle search.

Point Location Algorithm

Input: Triangulation and neighbor information, plus one point p.

Output: Triangle that p is in.

Step 0: Set the “current triangle” to be the first triangle in the


triangulation.

Step 1: Perform the triangle inclusion test (see Section 17.1) for p
and the current triangle. If all barycentric coordinates are positive,
output the current triangle. If the barycentric coordinates are mixed
in sign, then determine the barycentric coordinate of p with respect
to the current triangle that has the most negative value. Set the
current triangle to be the corresponding neighbor and repeat Step 1.

Notes:

• Try improving the speed of this algorithm by not completing the di-
vision for determining the barycentric coordinates in (17.3)–(17.5).
This division does not change the sign. Keep in mind the test for
which triangle to move to changes.

• Suppose the algorithm is to be executed for more than one point.


Consider using the triangle that was output from the previous run
as input, rather than always using the first triangle. Many times
a data set has some coherence, and the output for the next run
might be the same triangle, or one very near to the triangle from
the previous run.

17.7 3D Triangulations
In computer applications, one often encounters millions of triangles,
connected in some well-defined way, describing a geometric object. In
particular, shading algorithms require this type of structure.
The rules for 3D triangulations are the same as for 2D. Additionally,
the data structure is the same, except that now each point has three
instead of two coordinates.
Figure 17.3 shows a 3D surface that is composed of triangles. An-
other example is provided in Figure 8.4. Shading requires a 3D unit
vector, called a normal, to be associated with each triangle or vertex.
A normal is perpendicular to an object’s surface at a particular point.
This normal is used to calculate how light is reflected, and in turn
the illumination of the object. (See [14] for details on such illumi-

Figure 17.3.
3D triangulated surface: a wireframe and shaded renderings superimposed.
17.8. Exercises 379

nation methods.) We investigated just how to calculate normals in


Section 8.6.

• barycentric coordinates
• triangle inclusion test
• affine invariance of barycentric coordinates
• centroid, barycenter
• incenter
• circumcenter
• 2D triangulation criteria
• star
• valence
• triangulation data structure
• point location algorithm
• 3D triangulation criteria
• 3D triangulation data structure
• normal

17.8 Exercises
Let a triangle T1 be given by the vertices
\[
p_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad
p_2 = \begin{bmatrix} 2 \\ 2 \end{bmatrix}, \quad
p_3 = \begin{bmatrix} -1 \\ 2 \end{bmatrix}.
\]
Let a triangle T2 be given by the vertices
\[
q_1 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad
q_2 = \begin{bmatrix} 0 \\ -1 \end{bmatrix}, \quad
q_3 = \begin{bmatrix} -1 \\ 0 \end{bmatrix}.
\]

1. Using T1:
(a) What are the barycentric coordinates of $p = \begin{bmatrix} 0 \\ 1.5 \end{bmatrix}$?
(b) What are the barycentric coordinates of $p = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$?
(c) Find the triangle's incenter.
(d) Find the triangle's circumcenter.
(e) Find the centroid of the triangle.
2. Using T2:
(a) What are the barycentric coordinates of $p = \begin{bmatrix} 0 \\ 1.5 \end{bmatrix}$?
(b) What are the barycentric coordinates of $p = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$?
(c) Find the triangle's incenter.
(d) Find the triangle's circumcenter.
(e) Find the centroid of the triangle.
3. What are the areas of T1 and T2 ?
4. Let an affine map be given by
\[
\hat{x} = \begin{bmatrix} 1 & 2 \\ -1 & 2 \end{bmatrix} x + \begin{bmatrix} -1 \\ 0 \end{bmatrix}.
\]
What are the areas of the mapped triangles T̂1 and T̂2? Compare the ratios
\[
\frac{T_1}{T_2} \quad\text{and}\quad \frac{\hat{T}_1}{\hat{T}_2}.
\]

5. What is the unit normal to the triangle T1 ?


6. What is the unit normal to the triangle with vertices
\[
\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad
\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad
\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}?
\]
18
Putting Lines Together:
Polylines and Polygons

Figure 18.1.
Polygon: straight line segments forming a bird shape.

Figure 18.1 shows a polygon. It is the outline of a shape, drawn with


straight line segments. Since such shapes are all a printer or plot-
ter can draw, just about every computer-generated drawing consists
of polygons. If we add an “eye” to the bird-shaped polygon from


Figure 18.2.
Mixing maps: a pattern is created by composing rotations and translations.

Figure 18.1, and if we apply a sequence of rotations and translations


to it, then we arrive at Figure 18.2. It turns out copies of our special
bird polygon can cover the whole plane! This technique is also present
in the Escher illustration in Figure 6.8.

18.1 Polylines

Straight line segments, called edges, connecting an ordered set of ver-


tices constitute a polyline. The first and last vertices are not neces-
sarily connected. Some 2D examples are illustrated in Sketch 18.1,
however, a polyline can be 3D, too. Since the vertices are ordered,
the edges are oriented and can be thought of as vectors. Let’s call
them edge vectors.
Sketch 18.1.
2D polyline examples.

Polylines are a primary output primitive, and thus they are included in graphics standards. One example of a graphics standard is
the GKS (Graphical Kernel System); this is a specification of what
belongs in a graphics package. The development of PostScript was
based on GKS. Polylines have many uses in computer graphics and
modeling. Whether in 2D or 3D, they are typically used to outline
a shape. The power of polylines to reveal a shape is illustrated in
Figure 18.3 in the display of a 3D surface. The surface is evaluated in
an organized fashion so that the points can be logically connected as
polylines, giving the observer a feeling of the “flow” of the surface. In
modeling, a polyline is often used to approximate a complex curve or
data, which in turn makes analysis easier and less costly. An example
of this is illustrated in Figure 12.3.

Figure 18.3.
Polylines: the display of a 3D surface. Two different directions for the polyline sets
give different impressions of the surface shape.

18.2 Polygons
When the first and last vertices of a polyline are connected, it is called
a polygon. Normally, a polygon is thought to enclose an area. For
this reason, unless a remark is made, we will consider planar polygons
only. Just as with polylines, polygons constitute an ordered set of
vertices and we will continue to use the term edge vectors. Thus, a
polygon with n edges is given by an ordered set of 2D points
p 1 , p2 , . . . , p n
and has edge vectors vi = pi+1 − pi ; i = 1, . . . , n. Note that the edge
vectors sum to the zero vector.
If you look at the edge vectors carefully, you’ll discover that vn =
pn+1 − pn , but there is no vertex pn+1 ! This apparent problem is
resolved by defining pn+1 = p1 , a convention called cyclic numbering.
We’ll use this convention throughout, and will not mention it every
time. We also add one more topological characterization of polygons:
the number of vertices equals the number of edges.
Since a polygon is closed, it divides the plane into two parts: a
finite part, the polygon’s interior, and an infinite part, the polygon’s
exterior.
As you traverse a polygon, you follow the path determined by the
vertices and edge vectors. Between vertices, you’ll move along straight
lines (the edges), but at the vertices, you’ll have to perform a rotation
before resuming another straight line path. The angle αi by which
you rotate at vertex pi is called the turning angle or exterior angle at pi. The interior angle is then given by π − αi (see Sketch 18.2).

Sketch 18.2.
Interior and exterior angles.

Polygons are used a lot! For instance, in Chapter 1 we discussed


the extents of geometry in a 2D coordinate system. Another name
for these extents is a minmax box (see Sketch 18.3). It is a special
polygon, namely a rectangle. Another type of polygon studied in
Chapter 17 is the triangle. This type of polygon is often used to define
a polygonal mesh of a 3D model. The triangles may then be filled with
color to produce a shaded image. A polygonal mesh from triangles
is illustrated in Figure 17.3, and one from rectangles is illustrated in
Figure 8.3.
Sketch 18.3.
A minmax box is a polygon.

18.3 Convexity
Polygons are commonly classified by their shape. There are many
ways of doing this. One important classification is as convex or non-
convex. The latter is also referred to as concave. Sketch 18.4 gives an
example of each. How do you describe the shape of a convex polygon?
As in Sketch 18.5, stick nails into the paper at the vertices. Now take
a rubberband and stretch it around the nails, then let go. If the rub-
berband shape follows the outline of the polygon, it is convex. The
points on the rubberband define the convex hull. Another definition:
take any two points in the polygon (including on the edges) and con-
nect them with a straight line. If the line never leaves the polygon,
then it is convex. This must work for all possible pairs of points!
Sketch 18.4.
Convex (left) and nonconvex (right) polygons.

The issue of convexity is important, because algorithms that involve polygons can be simplified if the polygons are known to be convex. This is true for algorithms for the problem of polygon clipping. This problem starts with two polygons, and the goal is to find the intersection of the polygon areas. Some examples are illustrated
in Sketch 18.6. The intersection area is defined in terms of one or
more polygons. If both polygons are convex, then the result will be
just one convex polygon. However, if even one polygon is not convex
then the result might be two or more, possibly disjoint or nonconvex,
polygons. Thus, nonconvex polygons need more record keeping in or-
der to properly define the intersection area(s). Not all algorithms are
designed for nonconvex polygons. See [10] for a detailed description
of clipping algorithms.
An n-sided convex polygon has a sum of interior angles I equal to
I = (n − 2)π. (18.1)
To see this, take one polygon vertex and form triangles with the other
vertices, as illustrated in Sketch 18.7. This forms n − 2 triangles. The sum of interior angles of a triangle is known to be π. Thus, we get the above result.

Sketch 18.5.
Rubberband test for convexity of a polygon.

The sum of the exterior angles of a convex polygon is easily found


with this result. Each interior and exterior angle sums to π. Suppose
the ith interior angle is αi radians, then the exterior angle is π − αi
radians. Sum over all angles, and the exterior angle sum E is
E = nπ − (n − 2)π = 2π. (18.2)
To test if an n-sided polygon is convex, we’ll use the barycenter
of the vertices p1 , . . . , pn . The barycenter b is a special barycentric
combination (see Sections 17.1 and 17.3):
\[
b = \frac{1}{n}(p_1 + \ldots + p_n).
\]
It is the center of gravity of the vertices. We need to construct the
implicit line equation for each edge vector in a consistent manner. If
the polygon is convex, then the point b will be on the “same” side
of every line. The implicit equation will result in all positive or all
negative values. (Sketch 3.3 illustrates this concept.) This will not
work for some unusual, nonsimple, polygons. (See Section 18.5 and
the Exercises.)
Another test for convexity is to check if there is a reentrant angle. This is an interior angle that is greater than π.

Sketch 18.6.
Polygon clipping.

18.4 Types of Polygons


There are a variety of special polygons. First, we introduce two terms
to help describe these polygons:
• equilateral means that all sides are of equal length, and
• equiangular means that all interior angles at the vertices are equal.
In the following illustrations, edges with the same number of tick
marks are of equal length and angles with the same number of arc
markings are equal.

Sketch 18.7.
Sum of interior angles using triangles.

A very special polygon is the regular polygon: it is equilateral and equiangular. Examples are illustrated in Sketch 18.8. A regular poly-
gon is also referred to as an n-gon, indicating it has n edges. We list
the names of the “classical” n-gons:
• a 3-gon is an equilateral triangle,
• a 4-gon is a square,
• a 5-gon is a regular pentagon,
• a 6-gon is a regular hexagon, and
• an 8-gon is a regular octagon.
Sketch 18.8.
Regular polygons.

Figure 18.4.
Circle approximation: using an n-gon to represent a circle.

An n-gon is commonly used to approximate a circle in computer


graphics, as illustrated in Figure 18.4.
A rhombus is equilateral but not equiangular, whereas a rectangle is
equiangular but not equilateral. These are illustrated in Sketch 18.9.

Sketch 18.9.
Rhombus and rectangle.

18.5 Unusual Polygons

Most often, applications deal with simple polygons, as opposed to nonsimple polygons. A nonsimple polygon, as illustrated in Sketch 18.10, is characterized by edges intersecting other than at the vertices. Topology is the reason nonsimple polygons can cause havoc
in some algorithms. For convex and nonconvex simple polygons, as
you traverse along the boundary of the polygon, the interior remains
on one side. This is not the case for nonsimple polygons. At the
mid-edge intersections, the interior switches sides. In more concrete
terms, recall how the implicit line equation could be used to deter-
mine if a point is on the line, and more generally, which side it is on.
Suppose you have developed an algorithm that associates the + side
of the line with being inside the polygon. This rule will work fine if
the polygon is simple, but not otherwise.
Sometimes nonsimple polygons can arise due to an error. The poly-
gon clipping algorithms, as discussed in Section 18.3, involve sorting
vertices to form the final polygon. If this sorting goes haywire, then
you could end up with a nonsimple polygon rather than a simple one as desired.

Sketch 18.10.
Nonsimple polygon.

Figure 18.5.
Trimmed surface: an application of polygons with holes. Left: trimmed surface. Right:
rectangular parametric domain with polygonal holes.

In applications, it is not uncommon to encounter polygons with


holes. Such a polygon is illustrated in Sketch 18.11. As you see, this
is actually more than one polygon. An example of this, illustrated in
Figure 18.5, is a special CAD/CAM (computer-aided manufacturing)
surface called a trimmed surface. The polygons define parts of the
material to be cut or punched out. This allows other parts to fit to
this one.
For trimmed surfaces and other CAD/CAM applications, a certain
convention is accepted in order to make more sense out of this multi-
polygon geometry. The polygons must be oriented a special way.
Sketch 18.11.
Polygon with holes.

The visible region, or the region that is not cut out, is to the "left."
As a result, the outer boundary is oriented counterclockwise and the
inner boundaries are oriented clockwise. More on the visible region
in Section 18.9.

18.6 Turning Angles and Winding Numbers


The turning angle of a polygon or polyline is essentially another name
for the exterior angle, which is illustrated in Sketch 18.12. Notice that
this sketch illustrates the turning angles for a convex and a nonconvex
polygon. Here the difference between a turning angle and an exterior
angle is illustrated. The turning angle has an orientation as well as
an angle measure. All turning angles for a convex polygon have the
same orientation, which is not the case for a nonconvex polygon. This
fact will allow us to easily differentiate the two types of polygons.
Here is an application of the turning angle. Suppose for now that a
given polygon is 2D and lives in the [e1 , e2 ]-plane. Its n vertices are
labeled
p 1 , p 2 , . . . pn .

We want to know if the polygon is convex. We only need to look at


the orientation of the turning angles, not the actual angles. First,
let’s embed the 2D vectors in 3D by adding a zero third coordinate,
for example: ⎡ ⎤
p1,1
p1 = ⎣p2,1 ⎦ .
0
Recall that the cross product of two vectors in the e1 , e2 -plane will
produce a vector that points “in” or “out,” that is in the +e3 or −e3
direction. Therefore, by taking the cross product of successive edge
vectors
ui = (pi+1 − pi ) ∧ (pi+2 − pi+1 ), (18.3)
we'll encounter ui of the form
\[
\begin{bmatrix} 0 \\ 0 \\ u_{3,i} \end{bmatrix}.
\]

Sketch 18.12.
Turning angles.

If the sign of the u3,i value is the same for all angles, then the polygon is convex. A mathematical way to describe this is by using the scalar
triple product (see Section 8.5). The turning angle orientation is
determined by the scalar

u3,i = e3 · ((pi+1 − pi ) ∧ (pi+2 − pi+1 )).

Notice that the sign is dependent upon the traversal direction of the
polygon, but only a change of sign is important. The determinant of
the 2D vectors would have worked just as well, but the 3D approach
is more useful for what follows.
If the polygon lies in an arbitrary plane with normal n, then the above convex/concave test changes only a bit. The cross product in (18.3) produces a vector u_i that has direction ±n. Now we need the dot product n · u_i to extract a signed scalar value.
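To make the test concrete, here is a minimal Python sketch (our own illustration, not the book's; the function name is_convex and the representation of vertices as coordinate pairs are assumptions). It computes the third component u_{3,i} of (18.3) for every vertex and reports the polygon as convex only if all nonzero values share one sign.

def is_convex(vertices):
    # vertices: the polygon's 2D vertices, in order, as (x, y) pairs.
    n = len(vertices)
    sign = 0
    for i in range(n):
        p0 = vertices[i]
        p1 = vertices[(i + 1) % n]
        p2 = vertices[(i + 2) % n]
        # u_{3,i}: the e3-component of (p_{i+1} - p_i) ^ (p_{i+2} - p_{i+1})
        u3 = (p1[0] - p0[0]) * (p2[1] - p1[1]) - (p1[1] - p0[1]) * (p2[0] - p1[0])
        if u3 != 0:
            if sign == 0:
                sign = 1 if u3 > 0 else -1
            elif (u3 > 0) != (sign > 0):
                return False
    return True

Only the sign of u_{3,i} is inspected, mirroring the remark that reversing the traversal direction flips all signs at once.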
If we actually computed the turning angle at each vertex, we could form an accumulated value called the total turning angle. Recall from (18.2) that the total turning angle for a convex polygon is 2π. For a polygon that is not known to be convex, assign a sign to each angle measurement using the scalar triple product as above. The sum E will then be used to compute the winding number of the polygon. The winding number W is

W = E / 2π.
Thus, for a convex polygon, the winding number is one. Sketch 18.13
illustrates a few examples. A non-self-intersecting polygon is essen-
tially one loop. A polygon can have more than one loop, with dif-
ferent orientations: clockwise versus counterclockwise. The winding
number gets decremented for each clockwise loop and incremented
for each counterclockwise loop, or vice versa depending on how you
assign signs to your angles.

18.7 Area
A simple method for calculating the area of a 2D polygon is to use the signed area of a triangle as in Section 4.9. First, triangulate the polygon. For example, choose one vertex of the polygon and form all triangles from it and successive pairs of vertices, as is illustrated in Sketch 18.14. The sum of the signed areas of the triangles results in the area of the polygon. For this method to work, we must form the triangles with a consistent orientation. For example, in Sketch 18.14, triangles (p1, p2, p3), (p1, p3, p4), and (p1, p4, p5) are all counterclockwise or right-handed, and therefore have positive area. More precisely, if we form v_i = p_i − p_1, then the area of the polygon in Sketch 18.14 is

A = 1/2 (det[v_2, v_3] + det[v_3, v_4] + det[v_4, v_5]).

Sketch 18.13. Winding numbers.
In general, if a polygon has n vertices, then this area calculation becomes

A = 1/2 (det[v_2, v_3] + . . . + det[v_{n−1}, v_n]).   (18.4)
The use of signed area makes this idea work for nonconvex polygons
as in Sketch 18.15. As illustrated in the sketch, the negative areas
cancel duplicate and extraneous areas.
Equation (18.4) takes an interesting form if its terms are expanded. We observe that the determinants that represent edges of triangles within the polygon cancel. So this leaves us with

A = 1/2 (det[p_1, p_2] + . . . + det[p_{n−1}, p_n] + det[p_n, p_1]).   (18.5)

Sketch 18.14. Area of a convex polygon.
Equation (18.5) seems to have lost all geometric meaning because it
involves the determinant of point pairs, but we can recapture geomet-
ric meaning if we consider each point to be pi − o.
Is (18.4) or (18.5) the preferred form? The amount of computation
for each equation is similar; however, there is one drawback of (18.5).
If the polygon is far from the origin then numerical problems can
occur because the vectors pi and pi+1 will be close to parallel. The
form in (18.4) essentially builds a local frame in which to compute the
area. For debugging and making sense of intermediate computations,
(18.4) is easier to work with. This is a nice example of how reducing
an equation to its “simplest” form is not always “optimal”!
An interesting observation is that (18.5) may be written as a generalized determinant. The coordinates of the vertices are

p_i = [p_{1,i}, p_{2,i}]^T.

The area is computed as follows,

A = 1/2 \begin{vmatrix} p_{1,1} & p_{1,2} & \cdots & p_{1,n} & p_{1,1} \\ p_{2,1} & p_{2,2} & \cdots & p_{2,n} & p_{2,1} \end{vmatrix},

which is computed by adding the products of all "downward" diagonals, and subtracting the products of all "upward" diagonals.

Sketch 18.15. Area of a nonconvex polygon.

Example 18.1

Let

p_1 = [0, 0]^T,  p_2 = [1, 0]^T,  p_3 = [1, 1]^T,  p_4 = [0, 1]^T.

We have

A = 1/2 \begin{vmatrix} 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \end{vmatrix} = 1/2 [0 + 1 + 1 + 0 − 0 − 0 − 0 − 0] = 1.

Since our polygon was a square, this is as expected.


But now take

p_1 = [0, 0]^T,  p_2 = [1, 1]^T,  p_3 = [0, 1]^T,  p_4 = [1, 0]^T.

This is a nonsimple polygon! Its area computes to

A = 1/2 \begin{vmatrix} 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 \end{vmatrix} = 1/2 [0 + 1 + 0 + 0 − 0 − 0 − 1 − 0] = 0.
Draw a sketch and convince yourself this is correct.
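As a sketch of how (18.4) might be coded, here is a short Python function (our own; the name polygon_area is an assumption). It forms the local-frame vectors v_i = p_i − p_1 and sums the 2 × 2 determinants; for the square above it returns 1, and for the nonsimple polygon it returns 0.

def polygon_area(vertices):
    # Signed area via (18.4): fan triangulation from the first vertex,
    # summing the determinants det[v_i, v_{i+1}].
    x1, y1 = vertices[0]
    twice_area = 0.0
    for i in range(1, len(vertices) - 1):
        ax, ay = vertices[i][0] - x1, vertices[i][1] - y1          # v_i
        bx, by = vertices[i + 1][0] - x1, vertices[i + 1][1] - y1  # v_{i+1}
        twice_area += ax * by - ay * bx
    return 0.5 * twice_area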
Planar polygons are sometimes specified by 3D points. We can adjust the area formula (18.4) accordingly. Recall that the cross product of two vectors results in a vector whose length equals the area of the parallelogram spanned by the two vectors. So we will replace each determinant by a cross product

u_i = v_i ∧ v_{i+1}   for i = 2, . . . , n − 1,   (18.6)

and as before, each v_i = p_i − p_1. Suppose the (unit) normal to the polygon is n. Notice that n and all u_i share the same direction, therefore ‖u_i‖ = u_i · n. Now we can rewrite (18.5) for 3D points,

A = 1/2 n · (u_2 + . . . + u_{n−1}),   (18.7)
with the ui defined in (18.6). Notice that (18.7) is a sum of scalar
triple products, which were introduced in Section 8.5.

Example 18.2

Take the four coplanar 3D points

p_1 = [0, 2, 0]^T,  p_2 = [2, 0, 0]^T,  p_3 = [2, 0, 3]^T,  p_4 = [0, 2, 3]^T.

Compute the area with (18.7), and note that the normal

n = [−1/√2, −1/√2, 0]^T.

First compute

v_2 = [2, −2, 0]^T,  v_3 = [2, −2, 3]^T,  v_4 = [0, 0, 3]^T,

and then compute the cross products:

u_2 = v_2 ∧ v_3 = [−6, −6, 0]^T   and   u_3 = v_3 ∧ v_4 = [−6, −6, 0]^T.
Then the area is

A = 1/2 n · (u_2 + u_3) = 6√2.

(The equality √2/2 = 1/√2 was used to eliminate the denominator.) You may also realize that our simple example polygon is just a rectangle, and so you have another way to check the area.

In Section 8.6, we looked at calculating normals to a polygonal mesh specifically for computer graphics lighting models. The results of this section are useful for this task as well. By removing the dot product with the normal, (18.7) provides us with a method of computing a good average normal to a nonplanar polygon:

n = (u_2 + u_3 + . . . + u_{n−1}) / ‖u_2 + u_3 + . . . + u_{n−1}‖.

This normal estimation method is a weighted average based on the areas of the triangles. To eliminate this weighting, normalize each u_i before summing them.

Example 18.3

What is an estimate normal to the nonplanar polygon

p_1 = [0, 0, 1]^T,  p_2 = [1, 0, 0]^T,  p_3 = [1, 1, 1]^T,  p_4 = [0, 1, 0]^T?

Calculate

u_2 = [1, −1, 1]^T,  u_3 = [−1, 1, 1]^T,

and the normal is

n = [0, 0, 1]^T.
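A rough Python sketch combining (18.6), (18.7), and the averaged-normal formula (our own helper, with assumed names; it expects the vertices as 3D coordinate lists):

import math

def normal_and_area(points):
    # Returns the area-weighted unit normal estimate and the area (18.7)
    # computed with that estimate.
    def cross(a, b):
        return [a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]]
    p1 = points[0]
    total = [0.0, 0.0, 0.0]                                   # u_2 + ... + u_{n-1}
    for i in range(1, len(points) - 1):
        v = [points[i][k] - p1[k] for k in range(3)]          # v_i
        w = [points[i + 1][k] - p1[k] for k in range(3)]      # v_{i+1}
        u = cross(v, w)                                       # u_i of (18.6)
        for k in range(3):
            total[k] += u[k]
    length = math.sqrt(sum(c * c for c in total))
    normal = [c / length for c in total]
    area = 0.5 * sum(normal[k] * total[k] for k in range(3))  # (18.7)
    return normal, area

On the rectangle of Example 18.2 this reproduces n = [−1/√2, −1/√2, 0]^T and A = 6√2; on the nonplanar polygon of Example 18.3 it returns the estimate [0, 0, 1]^T.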
18.8 Application: Planarity Test


Suppose someone sends you a CAD file that contains a polygon. For
your application, the polygon must be 2D; however, it is oriented
arbitrarily in 3D. How do you verify that the data points are coplanar?
There are many ways to solve this problem, although some solutions
have clear advantages over the others. Some considerations when
comparing algorithms include:

• numerical stability,

• speed,

• ability to define a meaningful tolerance,

• size of data set, and

• maintainability of the algorithm.

The order of importance is arguable.


Let’s look at three possible methods to solve this planarity test and
then compare them.

• Volume test: Choose the first polygon vertex as a base point. Form
vectors to the next three vertices. Use the scalar triple product to
calculate the volume spanned by these three vectors. If it is less
than a given tolerance, then the four points are coplanar. Continue
for all other sets.

• Plane test: Construct the plane through the first three vertices.
Check if all of the other vertices lie in this plane, within a given
tolerance.

• Average normal test: Find the centroid c of all points. Compute all normals n_i = [p_i − c] ∧ [p_{i+1} − c]. Check if all angles formed by two subsequent normals are below a given angle tolerance.

If we compare these three methods, we see that they employ different kinds of tolerances: for volumes, distances, and angles. Which of these is preferable must depend on the application at hand. Clearly the plane test is the fastest of the three; yet it has a problem if the first three vertices are close to being collinear.
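For illustration, a bare-bones Python sketch of the plane test only (our own construction; the distance tolerance tol is arbitrary). As noted above, it becomes unreliable when the first three vertices are nearly collinear, since the constructed normal then has almost zero length.

def plane_test(points, tol=1e-6):
    # Plane through the first three vertices; every other vertex must lie
    # within distance tol of that plane.
    def sub(a, b):
        return [a[k] - b[k] for k in range(3)]
    def cross(a, b):
        return [a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]]
    def dot(a, b):
        return sum(a[k] * b[k] for k in range(3))
    n = cross(sub(points[1], points[0]), sub(points[2], points[0]))
    length = dot(n, n) ** 0.5        # near zero for almost collinear first vertices
    n = [c / length for c in n]
    return all(abs(dot(n, sub(p, points[0]))) <= tol for p in points[3:])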
18.9 Application: Inside or Outside?


Another important concept for 2D polygons is the inside/outside test
or visibility test. The problem is this: Given a polygon in the [e1 , e2 ]-
plane and a point p, determine if the point lies inside the polygon.
One obvious application for this is polygon fill for raster device
software, e.g., as with PostScript. Each pixel must be checked to
see if it is in the polygon and should be colored. The inside/outside
test is also encountered in CAD with trimmed surfaces, which were
introduced in Section 18.5. With both applications it is not un-
common to have a polygon with one or more holes, as illustrated in
Sketch 18.11. In the PostScript fill application, nonsimple polygons
are not unusual either.1
We will present two similar algorithms, producing different results
in some special cases. However we choose to solve this problem, we
will want to incorporate a trivial reject test. This simply means that
if a point is “obviously” not in the polygon, then we output that
result immediately, i.e., with a minimal amount of calculation. In this
problem, trivial reject refers to constructing a minmax box around the
polygon. If a point lies outside of this minmax box then it may be
trivially rejected. As you see, this involves simple comparison of e1 -
and e2 -coordinates.

18.9.1 Even-Odd Rule


From a point p, construct a line in parametric form with vector r in
any direction. The parametric line is

l(t) = p + tr.

This is illustrated in Sketch 18.16. Count the number of intersections


this line has with the polygon edges for t ≥ 0 only. This is why the
vector is sometimes referred to as a ray. The number of intersections
will be odd if p is inside and even if p is outside. Figure 18.6 illustrates
the results of this rule with the polygon fill application.
It can happen that l(t) coincides with an edge of the polygon or passes through a vertex. Either a more elaborate counting scheme must be developed, or you can choose a different r. As a rule, it is better to not choose r parallel to the e1- or e2-axis because the polygons often have edges parallel to these axes.
Sketch 18.16. Even-odd rule.
1 As an example, take Figure 3.4. The lines inside the bounding rectangle are the edges of a nonsimple polygon.
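Below is a Python sketch of the even-odd rule (our own; the function name and the default ray direction r are assumptions, the direction chosen deliberately not parallel to a coordinate axis). It intersects the ray l(t) = p + t r with each edge and counts intersections with t ≥ 0; degenerate hits through a vertex or along an edge are not treated, in line with the advice to simply pick a different r.

def even_odd_inside(p, polygon, r=(0.7310, 0.6822)):
    crossings = 0
    n = len(polygon)
    for i in range(n):
        ax, ay = polygon[i]
        bx, by = polygon[(i + 1) % n]
        ex, ey = bx - ax, by - ay                 # edge vector
        denom = ex * r[1] - ey * r[0]
        if denom == 0:
            continue                               # ray parallel to this edge
        dx, dy = ax - p[0], ay - p[1]
        t = (ex * dy - ey * dx) / denom            # parameter on the ray
        s = (r[0] * dy - r[1] * dx) / denom        # parameter on the edge
        if t >= 0 and 0 <= s < 1:
            crossings += 1
    return crossings % 2 == 1                      # odd number of crossings: inside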


Figure 18.6. Even-odd rule: applied to polygon fill.
Figure 18.7. Nonzero winding rule: applied to polygon fill.
18.9.2 Nonzero Winding Number


In Section 18.6 the winding number was introduced. Here is another
use for it. This method proceeds similarly to the even-odd rule. Con-
struct a parametric line at a point p and intersect the polygon edges.
Again, consider only those intersections for t ≥ 0. The counting
method depends on the orientation of the polygon edges. Start with
a winding number of zero. Following Sketch 18.17, if a polygon edge
is oriented “right to left” then add one to the winding number. If a
polygon edge is oriented “left to right” then subtract one from the
winding number. If the final result is zero then the point is outside
the polygon. Figure 18.7 illustrates the results of this rule with the
same polygons used in the even-odd rule. As with the previous rule,
if you encounter edges head-on, then choose a different ray.
The differences in the algorithms are interesting. The PostScript language uses the nonzero winding number rule as the default. The authors (of the PostScript language) feel that this produces better results for the polygon fill application, but the even-odd rule is available with a special command. PostScript must deal with the most general (and crazy) polygons. In the trimmed surface application, the polygons must be simple and polygons cannot intersect; therefore, either algorithm is suitable.
Sketch 18.17. Nonzero winding number rule.
If you happen to know that you are dealing only with convex polygons, another inside/outside test is available. Check which side of the edges the point p is on. If it is on the same side for all edges, then p is inside the polygon. All you have to do is to compute all determinants of the form

det[p_i − p, p_{i+1} − p].

If they are all of the same sign, p is inside the polygon.
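This convex-only test is particularly compact in code; a Python sketch (ours, with an assumed function name) follows, returning True when all the determinants det[p_i − p, p_{i+1} − p] share one sign.

def inside_convex(p, polygon):
    sign = 0
    n = len(polygon)
    for i in range(n):
        ax, ay = polygon[i][0] - p[0], polygon[i][1] - p[1]
        bx, by = polygon[(i + 1) % n][0] - p[0], polygon[(i + 1) % n][1] - p[1]
        d = ax * by - ay * bx          # det[p_i - p, p_{i+1} - p]
        if d != 0:
            if sign == 0:
                sign = 1 if d > 0 else -1
            elif (d > 0) != (sign > 0):
                return False
    return True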

• polygon
• polyline
• cyclic numbering
• turning angle
• exterior angle
• interior angle
• polygonal mesh
• convex
• concave
• polygon clipping
• sum of interior angles
• sum of exterior angles
• reentrant angle
• equilateral polygon
• equiangular polygon
• regular polygon
• n-gon
• rhombus
• simple polygon
• trimmed surface
• visible region
• total turning angle
• winding number
• polygon area
• planarity test
• trivial reject
• inside/outside test
• scalar triple product

18.10 Exercises
1. What is the sum of the interior angles of a six-sided polygon? What is
the sum of the exterior angles?
2. What type of polygon is equiangular and equilateral?
3. Which polygon is equilateral but not equiangular?
4. Develop an algorithm that determines whether or not a polygon is sim-
ple.
5. Calculate the winding number of the polygon with the following vertices:

p_1 = [0, 0]^T,  p_2 = [−2, 0]^T,  p_3 = [−2, 2]^T,
p_4 = [0, 2]^T,  p_5 = [2, 2]^T,  p_6 = [2, −2]^T,
p_7 = [3, −2]^T,  p_8 = [3, −1]^T,  p_9 = [0, −1]^T.
6. Compute the area of the polygon with the following vertices:

p_1 = [−1, 0]^T,  p_2 = [0, 1]^T,  p_3 = [1, 0]^T,  p_4 = [1, 2]^T,  p_5 = [−1, 2]^T.

Use both methods from Section 18.7.


7. Give an example of a nonsimple polygon that will pass the test for
convexity, which uses the barycenter from Section 18.3.
8. Find an estimate normal to the nonplanar polygon

p_1 = [0, 0, 1]^T,  p_2 = [1, 0, 0]^T,  p_3 = [1, 1, 1]^T,  p_4 = [0, 1, 0]^T.

9. The following points are the vertices of a polygon that should lie in a plane:

p_1 = [2, 0, −2]^T,  p_2 = [3, 2, −4]^T,  p_3 = [2, 4, −4]^T,
p_4 = [0, 3, −1]^T,  p_5 = [0, 2, −1]^T,  p_6 = [0, 0, 0]^T.

However, one point lies outside this plane. Which point is the outlier?2
Which planarity test is the most suited to this problem?

2 This term is frequently used to refer to noisy, inaccurate data from a laser scanner.
19
Conics

Figure 19.1.
Conic sections: three types of curves formed by the intersection of a plane and a cone.
From left to right: ellipse, parabola, and hyperbola.

Take a flashlight and shine it straight onto a wall. You will see a
circle. Tilt the light, and the circle will turn into an ellipse. Tilt
further, and the ellipse will become more and more elongated, and
will become a parabola eventually. Tilt a little more, and you will have
a hyperbola—actually one branch of it. The beam of your flashlight
is a cone, and the image it generates on the wall is the intersection
of that cone with a plane (i.e., the wall). Thus, we have the name
conic section for curves that are the intersections of cones and planes.
Figure 19.1 illustrates this idea.
The three types, ellipses, parabolas, and hyperbolas, arise in many
situations and are the subject of this chapter. The basic tools for
handling them are nothing but the matrix theory developed earlier.

Before we delve into the theory of conic sections, we list some “real-
life” occurrences.

• The paths of the planets around the sun are ellipses.

• If you sharpen a pencil, you generate hyperbolas (see Sketch 19.1).

• If you water your lawn, the water leaving the hose traces a parabolic
arc.

Sketch 19.1. A pencil with hyperbolic arcs.

19.1 The General Conic
We know that all points x satisfying

x_1^2 + x_2^2 = r^2   (19.1)

are on a circle of radius r, centered at the origin. This type of equation is called an implicit equation. Similar to the implicit equation for a line, this type of equation is satisfied only for coordinate pairs that lie on the circle.
A little more generality will give us an ellipse:

λ_1 x_1^2 + λ_2 x_2^2 = c.   (19.2)

The positive factors λ_1 and λ_2 denote how much the ellipse deviates from a circle. For example, if λ_1 > λ_2, the ellipse is more elongated in the x_2-direction. See Sketch 19.2 for the example

(1/4) x_1^2 + (1/25) x_2^2 = 1.

Sketch 19.2. An ellipse with λ_1 = 1/4, λ_2 = 1/25, and c = 1.

An ellipse in the form of (19.2) is said to be in standard position because its minor and major axes are coincident with the coordinate axes and the center is at the origin. The ellipse is symmetric about the major and minor axes. The semi-major and semi-minor axes are one-half the respective major and minor axes. In standard position, the ellipse lives in the rectangle with

x_1 extents [−√(c/λ_1), √(c/λ_1)]   and   x_2 extents [−√(c/λ_2), √(c/λ_2)].

We will now rewrite (19.2) in matrix form:


  
  λ1 0 x1
x1 x2 − c = 0. (19.3)
0 λ2 x2
You will see the wisdom of this in a short while. This equation allows for significant compaction:

x^T D x − c = 0.   (19.4)

Example 19.1

Let's start with the ellipse 2x_1^2 + 4x_2^2 − 1 = 0. In matrix form, corresponding to (19.3), we have

\begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 0 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} - 1 = 0.   (19.5)

This ellipse in standard position is shown in Figure 19.2 (left). The major axis is on the e1-axis with extents [−1/√2, 1/√2] and the minor axis is on the e2-axis with extents [−1/2, 1/2].

Suppose we encounter an ellipse with center at the origin, but with minor and major axes not aligned with the coordinate axes, as in Figure 19.2 (middle). What is the equation of such an ellipse? Points x̂ on this ellipse are mapped from an ellipse in standard position via a rotation, x̂ = Rx. Using the fact that a rotation matrix is orthogonal, we replace x by R^T x̂ in (19.4), and the rotated conic takes the form

[R^T x̂]^T D [R^T x̂] − c = 0,

which becomes

x̂^T R D R^T x̂ − c = 0.   (19.6)

Figure 19.2. Conic section: an ellipse in three positions. From left to right: standard position as given in (19.5), with a 45° rotation, and with a 45° rotation and translation of [2, −1]^T.
An abbreviation of

A = R D R^T   (19.7)

shortens (19.6) to

x̂^T A x̂ − c = 0.   (19.8)

There are a couple of things about (19.8) that should look familiar. Note that A is a symmetric matrix. While studying the geometry of symmetric matrices in Section 7.5, we discovered that (19.7) is the eigendecomposition of A. The diagonal matrix D was called Λ there. One slight difference: convention is that the diagonal elements of Λ satisfy λ_{1,1} ≥ λ_{2,2}. The matrix D does not; that would result in all ellipses in standard position having major axis on the e2-axis, as is the case with the example in Sketch 19.2. The curve defined in (19.8) is a contour of a quadratic form as described in Section 7.6. In fact, the figures of conics in this chapter were created as contours of quadratic forms. See Figure 7.8.
Now suppose we encounter an ellipse as in Figure 19.2 (right), rotated and translated out of standard position. What is the equation of this ellipse? Points x̂ on this ellipse are mapped from an ellipse in standard position via a rotation and then a translation, x̂ = Rx + v. Again using the fact that a rotation matrix is orthogonal, we replace x by R^T(x̂ − v) in (19.4), and the rotated conic takes the form

[R^T(x̂ − v)]^T D [R^T(x̂ − v)] − c = 0,

or

[x̂^T − v^T] R D R^T [x̂ − v] − c = 0.

Again, shorten this with the definition of A in (19.7),

[x̂^T − v^T] A [x̂ − v] − c = 0.

Since x̂ is simply a variable, we may drop the "hat" notation. The symmetry of A results in equality of x^T A v and v^T A x, and we obtain

x^T A x − 2x^T A v + v^T A v − c = 0.   (19.9)

This denotes an ellipse in general position and it may be slightly abbreviated as

x^T A x − 2x^T b + d = 0,   (19.10)

with b = Av and d = v^T A v − c.
If we relabel d and the elements of A and b from (19.10), the equation takes the form

\begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} c_1 & \frac{1}{2} c_3 \\ \frac{1}{2} c_3 & c_2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} - 2 \begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} -\frac{1}{2} c_4 \\ -\frac{1}{2} c_5 \end{pmatrix} + c_6 = 0.   (19.11)

Expanding this, we arrive at a familiar equation of a conic,

c_1 x_1^2 + c_2 x_2^2 + c_3 x_1 x_2 + c_4 x_1 + c_5 x_2 + c_6 = 0.   (19.12)

In fact, many texts simply start out by using (19.12) as the initial
definition of a conic.

Example 19.2

Let's continue with the ellipse from Example 19.1. We have x^T D x − 1 = 0, where

D = \begin{pmatrix} 2 & 0 \\ 0 & 4 \end{pmatrix}.

This ellipse is illustrated in Figure 19.2 (left). Now we rotate by 45°, using the rotation matrix

R = \begin{pmatrix} s & -s \\ s & s \end{pmatrix}

with s = sin 45° = cos 45° = 1/√2. The matrix A = R D R^T becomes

A = \begin{pmatrix} 3 & -1 \\ -1 & 3 \end{pmatrix}.

This ellipse, x^T A x − 1 = 0, is illustrated in Figure 19.2 (middle). We could also write it as 3x_1^2 − 2x_1 x_2 + 3x_2^2 − 1 = 0.
If we now translate by a vector

v = [2, −1]^T,

then (19.10) gives the recipe for adding translation terms, and the ellipse is now

x^T \begin{pmatrix} 3 & -1 \\ -1 & 3 \end{pmatrix} x - 2 x^T \begin{pmatrix} 7 \\ -5 \end{pmatrix} + 18 = 0.
Expanding this equation,

3x_1^2 − 2x_1 x_2 + 3x_2^2 − 14x_1 + 10x_2 + 18 = 0.

This ellipse is illustrated in Figure 19.2 (right).

This was a lot of work just to find the general form of an ellipse!
However, as we shall see, a lot more has been achieved here; the
form (19.10) does not just represent ellipses, but any conic. To see
that, let’s examine the two remaining conic types: hyperbolas and
parabolas.

Example 19.3

Sketch 19.3 illustrates the conic

x_1 x_2 − 1 = 0,

or the more familiar form,

x_2 = 1/x_1,

which is a hyperbola. This may be written in matrix form as

\begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} 0 & \frac{1}{2} \\ \frac{1}{2} & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} - 1 = 0.

Sketch 19.3. A hyperbola.

Example 19.4

The parabola

x_1^2 − x_2 = 0,

or

x_2 = x_1^2,

is illustrated in Sketch 19.4. In matrix form, it is

\begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} -1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = 0.

Sketch 19.4. A parabola.
19.2 Analyzing Conics


If you are given the equation of a conic as in (19.12), how can you
tell which of the three basic types it is? The determinant of the 2 × 2
matrix A in (19.11) reveals the type:

• If det A > 0, then the conic is an ellipse.

• If det A = 0, then the conic is a parabola.

• If det A < 0, then the conic is a hyperbola.

If A is the zero matrix and either c4 or c5 are nonzero, then the conic
is degenerate and simply consists of a straight line.
Since A is a symmetric matrix, it has an eigendecomposition, A =
RDRT . The eigenvalues of A, which are the diagonal elements of D,
also characterize the conic type.

• Two nonzero entries of the same sign: ellipse.

• One nonzero entry: parabola.

• Two nonzero entries with opposite signs: hyperbola.

Of course these conditions are summarized by the determinant of D. Rotations do not change areas (determinants), thus checking A suffices and it is not necessary to find D.
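A tiny Python sketch of this classification (our own helper, with an assumed name; it works directly from the coefficients of (19.12) and ignores the degenerate case of A being the zero matrix):

def conic_type(c1, c2, c3):
    # det A for A = [[c1, c3/2], [c3/2, c2]] from (19.11)
    det_a = c1 * c2 - (c3 / 2.0) ** 2
    if det_a > 0:
        return "ellipse"
    if det_a < 0:
        return "hyperbola"
    return "parabola"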

Example 19.5

Let's check the type for the examples of the last section.
Example 19.2: we encountered the ellipse in this example in two forms, in standard position and rotated,

\begin{vmatrix} 2 & 0 \\ 0 & 4 \end{vmatrix} = \begin{vmatrix} 3 & -1 \\ -1 & 3 \end{vmatrix} = 8.

The determinant is positive, confirming that we have an ellipse.
Example 19.3:

\begin{vmatrix} 0 & \frac{1}{2} \\ \frac{1}{2} & 0 \end{vmatrix} = -\frac{1}{4},

confirming that we have a hyperbola. The characteristic equation for this matrix is (λ + 1/2)(λ − 1/2) = 0, thus the eigenvalues have opposite sign.
Example 19.4:

\begin{vmatrix} -1 & 0 \\ 0 & 0 \end{vmatrix} = 0,

confirming that we have a parabola. The characteristic equation for this matrix is λ(λ + 1) = 0, thus one eigenvalue is zero.

We have derived the general conic and folded this into a tool to
determine its type. What might not be obvious: we found that affine
maps, M x + v, where M is invertible, take a particular type of (non-
degenerate) conic to another one of the same type. The conic type is
determined by the sign of the determinant of A and it is unchanged
by affine maps.

19.3 General Conic to Standard Position


If we are given a general conic equation as in (19.12), how do we
find its equation in standard position? For an ellipse, this means
that the center is located at the origin and the major and minor axes
correspond with the coordinate axes. This leaves a degree of freedom:
the major axis can coincide with the e1 - or e2 -axis.
Let's work with the ellipse from the previous sections,

3x_1^2 − 2x_1 x_2 + 3x_2^2 − 14x_1 + 10x_2 + 18 = 0.

It is illustrated in Figure 19.2 (right). Upon converting this equation to the form in (19.11), we have

x^T \begin{pmatrix} 3 & -1 \\ -1 & 3 \end{pmatrix} x - 2 x^T \begin{pmatrix} 7 \\ -5 \end{pmatrix} + 18 = 0.

Breaking down this equation into the elements of (19.10), we have

A = \begin{pmatrix} 3 & -1 \\ -1 & 3 \end{pmatrix}   and   b = \begin{pmatrix} 7 \\ -5 \end{pmatrix}.

This means that the translation v may be found by solving the 2 × 2 linear system

Av = b,

which in this case is

v = [2, −1]^T.
This linear system may be solved if A has full rank. This is equivalent to A having two nonzero eigenvalues, and so the given conic is either an ellipse or a hyperbola.
Calculating c = v^T A v − d from (19.10) and removing the translation terms, the ellipse with center at the origin is

x^T \begin{pmatrix} 3 & -1 \\ -1 & 3 \end{pmatrix} x - 1 = 0.

The eigenvalues of this 2 × 2 matrix are the solutions of the characteristic equation

λ^2 − 6λ + 8 = 0,

and thus are λ_1 = 4 and λ_2 = 2, resulting in

D = \begin{pmatrix} 4 & 0 \\ 0 & 2 \end{pmatrix}.

The convention established for building decompositions (e.g., eigen or SVD) is to order the eigenvalues in decreasing value. Either this D or the matrix in (19.5) suffices to define this ellipse in standard position. This one,

x^T \begin{pmatrix} 4 & 0 \\ 0 & 2 \end{pmatrix} x - 1 = 0,

will result in the major axis aligned with the e2-axis.
If we want to find the rotation that resulted in the general conic, then we find the eigenvectors of A. The homogeneous linear systems introduce degrees of freedom in selection of the eigenvectors. Some choices correspond to reflection and rotation combinations. (We discussed this in Section 7.5.) However, any choice of R will define orthogonal semi-major and semi-minor axis directions. Continuing with our example, R corresponds to a −45° rotation,

R = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}.

Example 19.6

Given the conic section

x_1^2 + 2x_2^2 + 8x_1 x_2 − 4x_1 − 16x_2 + 3 = 0,

let's find its type and its equation in standard position.


Figure 19.3.
Conic section: a hyperbola in three positions. From left to right: standard position,
with rotation, with rotation and translation.

First let's write this conic in matrix form,

x^T \begin{pmatrix} 1 & 4 \\ 4 & 2 \end{pmatrix} x - 2 x^T \begin{pmatrix} 2 \\ 8 \end{pmatrix} + 3 = 0.

Since the determinant of the matrix is negative, we know this is a hyperbola. It is illustrated in Figure 19.3 (right).
We recover the translation v by solving the linear system

\begin{pmatrix} 1 & 4 \\ 4 & 2 \end{pmatrix} v = \begin{pmatrix} 2 \\ 8 \end{pmatrix},

resulting in

v = [2, 0]^T.

Calculate c = v^T A v − 3 = 1, then the conic without the translation is

x^T \begin{pmatrix} 1 & 4 \\ 4 & 2 \end{pmatrix} x - 1 = 0.

This hyperbola is illustrated in Figure 19.3 (middle).
The characteristic equation of the matrix is λ^2 − 3λ − 14 = 0; its roots are λ_1 = 5.53 and λ_2 = −2.53. The hyperbola in standard position is

x^T \begin{pmatrix} 5.53 & 0 \\ 0 & -2.53 \end{pmatrix} x - 1 = 0.
This hyperbola is illustrated in Figure 19.3 (left).
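The computations of this example are easily checked numerically. Here is a Python sketch using NumPy (a library assumption on our part; any 2 × 2 linear solver and symmetric eigenvalue routine would serve):

import numpy as np

A = np.array([[1.0, 4.0],
              [4.0, 2.0]])
b = np.array([2.0, 8.0])
d = 3.0

v = np.linalg.solve(A, b)            # translation, here [2, 0]
c = v @ A @ v - d                    # constant for the centered conic, here 1
lam = np.linalg.eigvalsh(A)          # approximately [-2.53, 5.53], ascending order
print(v, c, lam)

Note that eigvalsh reports the eigenvalues in ascending order, so they may need to be reordered to follow the decreasing-value convention mentioned in Section 19.3.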
• conic section
• implicit equation
• circle
• quadratic form
• ellipse
• minor axis
• semi-minor axis
• major axis
• semi-major axis
• center
• standard position
• conic type
• hyperbola
• parabola
• straight line
• eigenvalues
• eigenvectors
• eigendecomposition
• affine invariance

19.4 Exercises
1. What is the matrix form of a circle with radius r in standard position?
2. What is the equation of an ellipse that is centered at the origin and has
e1 -axis extents of [−5, 5] and e2 -axis extents of [−2, 2]?
3. What are the e1- and e2-axis extents of the ellipse 16x_1^2 + 4x_2^2 − 4 = 0?

4. What is the implicit equation of the ellipse 10x_1^2 + 2x_2^2 − 4 = 0 when it is rotated 90°?


5. What is the implicit equation of the hyperbola 10x_1^2 − 2x_2^2 − 4 = 0 when it is rotated 45°?


6. What is the implicit equation of the conic 10x_1^2 − 2x_2^2 − 4 = 0 rotated by 45° and translated by [2, 2]^T?


7. Expand the matrix form

x^T \begin{pmatrix} -5 & -1 \\ -1 & -6 \end{pmatrix} x - 2 x^T \begin{pmatrix} -1 \\ 0 \end{pmatrix} + 3 = 0

of the equation of a conic.


8. Expand the matrix form

x^T \begin{pmatrix} 1 & 3 \\ 3 & 3 \end{pmatrix} x - 2 x^T \begin{pmatrix} 3 \\ 3 \end{pmatrix} + 9 = 0

of the equation of a conic.


9. What is the eigendecomposition of

A = \begin{pmatrix} 4 & 2 \\ 2 & 4 \end{pmatrix}?

10. What is the eigendecomposition of

A = \begin{pmatrix} 5 & 0 \\ 0 & 5 \end{pmatrix}?

11. Let x_1^2 − 2x_1 x_2 − 4 = 0 be the equation of a conic section. What type is it?
12. Let x_1^2 + 2x_1 x_2 − 4 = 0 be the equation of a conic section. What type is it?
13. Let 2x_1^2 + x_2 − 5 = 0 be the equation of a conic section. What type is it?
14. Let a conic be given by

3x_1^2 + 2x_1 x_2 + 3x_2^2 + 10x_1 − 2x_2 + 10 = 0.

Write it in matrix form. What type of conic is it? What is the rotation and translation that took it out of standard position? Write it in matrix form in standard position.
15. What affine map takes the circle

(x_1 − 3)^2 + (x_2 + 1)^2 − 4 = 0

to the ellipse

2x_1^2 + 4x_2^2 − 1 = 0?
16. How many intersections does a straight line have with a conic? Given
a conic in the form (19.2) and a parametric form of a line l(t), what are
the t-values of the intersection points? Explain any singularities.
17. If the shear

\begin{pmatrix} 1 & 1 \\ 1/2 & 0 \end{pmatrix}

is applied to the conic of Example 19.1, what is the type of the resulting conic?
20
Curves

Figure 20.1.
Car design: curves are used to design cars such as the Ford Synergy 2010 concept
car. (Source http://www.ford.com.)

Earlier in this book, we mentioned that all letters that you see here
were designed by a font designer, and then put into a font library.
The font designer’s main tool is a cubic curve, also called a cubic
Bézier curve. Such curves are handy for font design, but they were
initially invented for car design. This happened in France in the early
1960s at Rénault and Citroën in Paris. These techniques are still in
use today, as illustrated in Figure 20.1. We will briefly outline this
kind of curve, and also apply previous linear algebra and geometric
concepts to the study of curves in general. This type of work is
called geometric modeling or computer-aided geometric design; see an

introductory text such as [7]. Please keep in mind: this chapter just
scratches the surface of the modeling field!

20.1 Parametric Curves


You will recall that one way to write a straight line is the parametric
form:
x(t) = (1 − t)a + tb.
If we interpret t as time, then this says at time t = 0 a moving
point is at a. It moves toward b, and reaches it at time t = 1. You
might have observed that the coefficients (1 − t) and t are linear, or
degree 1, polynomials, which explains another name for this: linear
interpolation. So we have a simple example of a parametric curve: a
curve that can be written as

x(t) = [f(t), g(t)]^T,

where f (t) and g(t) are functions of the parameter t. For the linear
interpolant above, f (t) = (1 − t)a1 + tb1 and g(t) = (1 − t)a2 + tb2 .
In general, f and g can be any functions, e.g., polynomial, trigono-
metric, or exponential. However, in this chapter we will be looking
at polynomial f and g.
Let us be a bit more ambitious now and study motion along curves,
i.e., paths that do not have to be straight. The simplest example is
that of driving a car along a road. At time t = 0, you start, you follow
the road, and at time t = 1, you have arrived somewhere. It does not
really matter what kind of units we use to measure time; the t = 0
and t = 1 may just be viewed as a normalization of an arbitrary time
interval.
We will now attack the problem of modeling curves, and we will
choose a particularly simple way of doing this, namely cubic Bézier
curves. We start with four points in 2D or 3D, b0 , b1 , b2 , and b3 ,
called Bézier (control) points. Connect them with straight lines as
shown in Sketch 20.1. The resulting polygon is called a Bézier (con-
trol) polygon.1
Sketch 20.1. A Bézier polygon.
1 Bézier polygons are not assumed to be closed as were the polygons of Section 18.2.

The four control points, b_0, b_1, b_2, b_3, define a cubic curve, and some examples are illustrated in Figure 20.2. To create these plots, we evaluate the cubic curve at many t-parameters that range between zero and one, t ∈ [0, 1]. If we evaluate at 50 points, then we would find points on the curve associated with

t = 0, 1/50, 2/50, . . . , 49/50, 1,

and then these points are connected by straight line segments to make the curve look smooth. The points are so close together that you cannot detect the line segments. In other words, we plot a discrete approximation of the curve.

Figure 20.2. Bézier curves: two examples that differ in the location of one control point, b_0 only.
Here is how you generate one point on a cubic Bézier curve. Pick a parameter value t between 0 and 1. Find the corresponding point on each polygon leg by linear interpolation:

b_0^1(t) = (1 − t)b_0 + tb_1,
b_1^1(t) = (1 − t)b_1 + tb_2,
b_2^1(t) = (1 − t)b_2 + tb_3.

These three points form a polygon themselves. Now repeat the linear interpolation process, and you get two points:

b_0^2 = (1 − t)b_0^1(t) + tb_1^1(t),
b_1^2 = (1 − t)b_1^1(t) + tb_2^1(t).

Repeat one more time,

b_0^3(t) = (1 − t)b_0^2(t) + tb_1^2(t),   (20.1)

and you have a point on the Bézier curve defined by b0 , b1 , b2 , b3


Sketch 20.2.
at the parameter value t. The recursive process of applying linear
The de Casteljau algorithm.
interpolation is called the de Casteljau algorithm, and the steps above
are shown in Sketch 20.2. Figure 20.3 illustrates all de Casteljau steps
414 20. Curves

Figure 20.3.
Bézier curves: all intermediate Bézier points generated by the de Casteljau algorithm
for 33 evaluations.

for 33 evaluations. The points bji are often called intermediate Bézier
points, and the following schematic is helpful in keeping track of how
each point is generated.

b_0
b_1   b_0^1
b_2   b_1^1   b_0^2
b_3   b_2^1   b_1^2   b_0^3
stage:    1      2      3

Except for the (input) Bézier polygon, each point in the schematic is
a function of t.

Example 20.1

A numerical counterpart to Sketch 20.3 follows. Let the polygon be given by

b_0 = [4, 4]^T,  b_1 = [0, 8]^T,  b_2 = [8, 8]^T,  b_3 = [8, 0]^T.

For simplicity, let t = 1/2. Linear interpolation is then nothing but finding midpoints, and we have the following intermediate Bézier points,

b_0^1 = 1/2 b_0 + 1/2 b_1 = [2, 6]^T,
b_1^1 = 1/2 b_1 + 1/2 b_2 = [4, 8]^T,
b_2^1 = 1/2 b_2 + 1/2 b_3 = [8, 4]^T.

Next,

b_0^2 = 1/2 b_0^1 + 1/2 b_1^1 = [3, 7]^T,
b_1^2 = 1/2 b_1^1 + 1/2 b_2^1 = [6, 6]^T,

and finally

b_0^3 = 1/2 b_0^2 + 1/2 b_1^2 = [9/2, 13/2]^T.

Sketch 20.3. Evaluation via the de Casteljau algorithm at t = 1/2.
This is the point on the curve corresponding to t = 1/2.
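The repeated midpoint (more generally, repeated linear interpolation) computation above is only a few lines of Python; the sketch below (our own, with assumed names) works for control polygons of any degree and dimension.

def de_casteljau(polygon, t):
    # Repeated linear interpolation of the control polygon until a single
    # point, b(t), remains.
    pts = [list(p) for p in polygon]
    while len(pts) > 1:
        pts = [[(1 - t) * a[k] + t * b[k] for k in range(len(a))]
               for a, b in zip(pts, pts[1:])]
    return pts[0]

de_casteljau([(4, 4), (0, 8), (8, 8), (8, 0)], 0.5) reproduces the point [9/2, 13/2]^T computed above.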

The equation for a cubic Bézier curve is found by expanding (20.1):

b_0^3 = (1 − t)b_0^2 + tb_1^2
      = (1 − t)[(1 − t)b_0^1 + tb_1^1] + t[(1 − t)b_1^1 + tb_2^1]
      = (1 − t)[(1 − t)[(1 − t)b_0 + tb_1] + t[(1 − t)b_1 + tb_2]]
        + t[(1 − t)[(1 − t)b_1 + tb_2] + t[(1 − t)b_2 + tb_3]].

After collecting terms with the same b_i, this becomes

b_0^3(t) = (1 − t)^3 b_0 + 3(1 − t)^2 t b_1 + 3(1 − t)t^2 b_2 + t^3 b_3.   (20.2)

This is the general form of a cubic Bézier curve. As t traces out values between 0 and 1, the point b_0^3(t) traces out a curve.
The polynomials in (20.2) are called the Bernstein basis functions:

B_0^3(t) = (1 − t)^3,
B_1^3(t) = 3(1 − t)^2 t,
B_2^3(t) = 3(1 − t)t^2,
B_3^3(t) = t^3,
Figure 20.4.
Bernstein polynomials: a plot of the four cubic polynomials for t ∈ [0, 1].

and they are illustrated in Figure 20.4. Now b_0^3(t) can be written as

b_0^3(t) = B_0^3(t)b_0 + B_1^3(t)b_1 + B_2^3(t)b_2 + B_3^3(t)b_3.   (20.3)

As we see, the Bernstein basis functions are cubic, or degree 3, polynomials. This set of polynomials is a bit different than the cubic monomials, 1, t, t^2, t^3, that we are used to from calculus. However, either set allows us to write all cubic polynomials. We say that the Bernstein (or monomial) polynomials form a basis for all cubic polynomials, hence the name basis function. See Section 14.1 for more on bases. We'll look at the relationship between these sets of polynomials in Section 20.3.
The original curve was given by the control polygon

b0 , b 1 , b 2 , b 3 .

Inspection of Sketch 20.3 suggests that a subset of the intermediate Bézier points form two cubic polygons that also mimic the curve's shape. The curve segment from b_0 to b_0^3(t) has the polygon

b_0, b_0^1, b_0^2, b_0^3,

and the other from b_0^3(t) to b_3 has the polygon

b_0^3, b_1^2, b_2^1, b_3.


This process of generating two Bézier curves from one is called subdivision.
From now on, we will also use the shorter b(t) instead of b_0^3(t).

20.2 Properties of Bézier Curves


From inspection of the examples, but also from (20.2), we see that
the curve passes through the first and last control points:

b(0) = b0 and b(1) = b3 . (20.4)

Another way to say this: the curve interpolates to b0 and b3 .


If we map the control polygon using an affine map, then the curve undergoes the same transformation, as shown in Figure 20.5. This is called affine invariance, and it is due to the fact that the cubic coefficients of the control points, the Bernstein polynomials, in (20.3) sum to one. This can be seen as follows:

(1 − t)^3 + 3(1 − t)^2 t + 3(1 − t)t^2 + t^3 = [(1 − t) + t]^3 = 1.

Thus, every point on the curve is a barycentric combination of the control points. Such relationships are not changed under affine maps, as per Section 6.2.
The curve, for t ∈ [0, 1], lies in the convex hull of the control
polygon—a fact called the convex hull property. This can be seen

Figure 20.5.
Affine invariance: as the control polygon rotates, so does the curve.
Figure 20.6.
Convex hull property: a Bézier curve lies in the convex hull of its control polygon. Left:
shaded area fills in the convex hull of the control polygon. Right: shaded area fills in
the minmax box of the control polygon. The convex hull lies inside the minmax box.

by observing in Figure 20.4 that the Bernstein polynomials in (20.2)


are nonnegative for t ∈ [0, 1]. It follows that every point on the curve
is a convex combination of the control points, and hence is inside their
convex hull. For a definition, see Section 17.4; for an illustration, see
Figure 20.6 (left). If we evaluate the curve for t-values outside of
[0, 1], this is called extrapolation. We can no longer predict the shape
of the curve, so this procedure is normally not recommended.
Clearly, the control polygon is inside its minmax box.2 Because of
the convex hull property, we also know that the curve is inside this
box—a property that has numerous applications. See Figure 20.6
(right) for an illustration.

20.3 The Matrix Form


As a preparation for what is to follow, let us rewrite (20.2) using the formalism of dot products. It then looks like this:

b(t) = \begin{pmatrix} b_0 & b_1 & b_2 & b_3 \end{pmatrix} \begin{pmatrix} (1-t)^3 \\ 3(1-t)^2 t \\ 3(1-t)t^2 \\ t^3 \end{pmatrix}.   (20.5)

2 Recall that the minmax box of a polygon is the smallest rectangle with edges parallel to the coordinate axes that contains the polygon.


Instead of the Bernstein polynomials as above, most people think of polynomials as combinations of the monomials; they are 1, t, t^2, t^3 for the cubic case. We can relate the Bernstein and monomial forms by rewriting our expression (20.2) as

b(t) = b_0 + 3t(b_1 − b_0) + 3t^2(b_2 − 2b_1 + b_0) + t^3(b_3 − 3b_2 + 3b_1 − b_0).   (20.6)

This allows a concise formulation using matrices:

b(t) = \begin{pmatrix} b_0 & b_1 & b_2 & b_3 \end{pmatrix} \begin{pmatrix} 1 & -3 & 3 & -1 \\ 0 & 3 & -6 & 3 \\ 0 & 0 & 3 & -3 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ t \\ t^2 \\ t^3 \end{pmatrix}.   (20.7)

This is the matrix form of a Bézier curve.
Equation (20.6) or (20.7) shows how to write a Bézier curve in monomial form. A curve in monomial form looks like this:

b(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3.

Geometrically, the four control "points" for the curve in monomial form are now a mix of points and vectors: a_0 = b_0 is a point, but a_1, a_2, a_3 are vectors. Using the dot product form, this becomes

b(t) = \begin{pmatrix} a_0 & a_1 & a_2 & a_3 \end{pmatrix} \begin{pmatrix} 1 \\ t \\ t^2 \\ t^3 \end{pmatrix}.

Thus, the monomial coefficients a_i are defined as

\begin{pmatrix} a_0 & a_1 & a_2 & a_3 \end{pmatrix} = \begin{pmatrix} b_0 & b_1 & b_2 & b_3 \end{pmatrix} \begin{pmatrix} 1 & -3 & 3 & -1 \\ 0 & 3 & -6 & 3 \\ 0 & 0 & 3 & -3 \\ 0 & 0 & 0 & 1 \end{pmatrix}.   (20.8)

How about the inverse process: If we are given a curve in monomial form, how can we write it as a Bézier curve? Simply rearrange (20.8) to solve for the b_i:

\begin{pmatrix} b_0 & b_1 & b_2 & b_3 \end{pmatrix} = \begin{pmatrix} a_0 & a_1 & a_2 & a_3 \end{pmatrix} \begin{pmatrix} 1 & -3 & 3 & -1 \\ 0 & 3 & -6 & 3 \\ 0 & 0 & 3 & -3 \\ 0 & 0 & 0 & 1 \end{pmatrix}^{-1}.
A matrix inversion is all that is needed here. Notice that the square
matrix in (20.8) is nonsingular, therefore we can conclude that any
cubic curve can be written in either the Bézier or the monomial
form.
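A brief NumPy sketch of the conversion (NumPy and the variable names are our assumptions; M is the 4 × 4 matrix of (20.8), and the control points of Example 20.1 are stored as the columns of a 2 × 4 array):

import numpy as np

M = np.array([[1, -3,  3, -1],
              [0,  3, -6,  3],
              [0,  0,  3, -3],
              [0,  0,  0,  1]], dtype=float)

B = np.array([[4.0, 0.0, 8.0, 8.0],     # b0, b1, b2, b3 as columns
              [4.0, 8.0, 8.0, 0.0]])

A = B @ M                      # monomial coefficients a0, a1, a2, a3 as columns
B_back = A @ np.linalg.inv(M)  # and back to the Bezier control points

Here a0 = b0 = [4, 4]^T and a1 = 3(b1 − b0) = [−12, 12]^T, in agreement with (20.6).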

20.4 Derivatives
Equation (20.2) consists of two (in 2D) or three (in 3D) cubic equations in t. We can take the derivative in each of the components:

db(t)/dt = −3(1 − t)^2 b_0 − 6(1 − t)t b_1 + 3(1 − t)^2 b_1 − 3t^2 b_2 + 6(1 − t)t b_2 + 3t^2 b_3.

Rearranging, and using the abbreviation ḃ(t) = db(t)/dt, we have

ḃ(t) = 3(1 − t)^2 [b_1 − b_0] + 6(1 − t)t [b_2 − b_1] + 3t^2 [b_3 − b_2].   (20.9)

As expected, the derivative of a degree three curve is one of degree two.3
One very nice feature of the de Casteljau algorithm is that the intermediate Bézier points generated by it allow us to express the derivative very simply. A simpler expression than (20.9) for the derivative is

ḃ(t) = 3[b_1^2(t) − b_0^2(t)].   (20.10)
Getting the derivative for free makes the de Casteljau algorithm more
efficient than evaluating (20.2) and (20.9) directly to get a point and
a derivative.
At the endpoints, the derivative formula is very simple. For t = 0,
we obtain
ḃ(0) = 3[b1 − b0 ],
and, similarly, for t = 1,

ḃ(1) = 3[b3 − b2 ].

In words, the control polygon is tangent to the curve at the curve’s


endpoints. This is not a surprising statement if you check Figure 20.2.

3 Note that the derivative curve does not have control points anymore, but

rather control vectors!


Example 20.2

Let us compute the derivative of the curve from Example 20.1 for t = 1/2. First, let's evaluate the direct equation (20.9). We obtain

ḃ(1/2) = 3 · 1/4 ([0, 8]^T − [4, 4]^T) + 6 · 1/4 ([8, 8]^T − [0, 8]^T) + 3 · 1/4 ([8, 0]^T − [8, 8]^T),

which yields

ḃ(1/2) = [9, −3]^T.

If instead, we used (20.10), and thus the intermediate control points calculated in Example 20.1, we get

ḃ(1/2) = 3 [b_1^2(1/2) − b_0^2(1/2)] = 3 ([6, 6]^T − [3, 7]^T) = [9, −3]^T,

which is the same answer but with less work! See Sketch 20.4 for an illustration.
Sketch 20.4. A derivative vector.
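In code, the point and the derivative of (20.10) fall out of a single de Casteljau pass. A cubic-only Python sketch (our own names) follows.

def cubic_point_and_derivative(b, t):
    # One de Casteljau pass for a cubic Bezier curve: returns b(t) and the
    # derivative 3*(b_1^2(t) - b_0^2(t)) from (20.10).
    def lerp(p, q):
        return [(1 - t) * pk + t * qk for pk, qk in zip(p, q)]
    stage1 = [lerp(b[i], b[i + 1]) for i in range(3)]
    stage2 = [lerp(stage1[i], stage1[i + 1]) for i in range(2)]
    point = lerp(stage2[0], stage2[1])
    deriv = [3 * (q - p) for p, q in zip(stage2[0], stage2[1])]
    return point, deriv

With the polygon of Example 20.1 and t = 1/2 it returns the point [9/2, 13/2]^T and the derivative [9, −3]^T.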

Note that the derivative of a curve is a vector. It is tangent to the


curve—apparent from our example, but nothing we want to prove
here. A convenient way to think about the derivative is by interpret-
ing it as a velocity vector. If you interpret the parameter t as time,
and you think of traversing the curve such that at time t you have
reached b(t), then the derivative measures your velocity. The larger
the magnitude of the tangent vector, the faster you move.
If we rotate the control polygon, the curve will follow, and so will
all of its derivative vectors. In calculus, a “horizontal tangent” has a
special meaning; it indicates an extreme value of a function. Not here:
the very notion of an extreme value is meaningless for parametric
curves since the term “horizontal tangent” depends on the curve’s
orientation and is not a property of the curve itself.
We may take the derivative of (20.9) with respect to t. We then
have the second derivative. It is given by

b̈(t) = −6(1 − t)[b1 − b0 ] − 6t[b2 − b1 ] + 6(1 − t)[b2 − b1 ] + 6t[b3 − b2 ]


and may be rearranged to

b̈(t) = 6(1 − t)[b2 − 2b1 + b0 ] + 6t[b3 − 2b2 + b1 ]. (20.11)

The de Casteljau algorithm supplies a simple way to write the second derivative, too:

b̈(t) = 6[b_2^1(t) − 2b_1^1(t) + b_0^1(t)].   (20.12)

Loosely speaking, we may interpret the second derivative b̈(t) as acceleration when traversing the curve.
The second derivative at b_0 (see Sketch 20.5) is particularly simple—it is given by

b̈(0) = 6[b_2 − 2b_1 + b_0].

Notice that this is a scaling of the difference of the vectors b_1 − b_0 and b_2 − b_1. Recalling the parallelogram rule, the second derivative at b_0 is easy to sketch. A similar equation holds for the second derivative at t = 1; now the points involved are b_1, b_2, b_3.
Sketch 20.5. A second derivative vector.

20.5 Composite Curves


A Bézier curve is a handsome tool, but one such curve would rarely suffice for describing much of any shape! For "real" shapes, we have to be able to line up many cubic Bézier curves. In order to define a smooth overall curve, these pieces must join smoothly.
This is easily achieved. Let b_0, b_1, b_2, b_3 and c_0, c_1, c_2, c_3 be the control polygons of two Bézier curves with a common point b_3 = c_0 (see Sketch 20.6). If the two curves are to have the same tangent vector direction at b_3 = c_0, then all that is required is

c_1 − c_0 = c[b_3 − b_2]   (20.13)

for some positive real number c, meaning that the three points b_2, b_3 = c_0, c_1 are collinear.
Sketch 20.6. Smoothly joining Bézier curves.
If we use this rule to piece curve segments together, we can design many 2D and 3D shapes. Figure 20.7 gives an example.

20.6 The Geometry of Planar Curves


The geometry of planar curves is centered around one concept: their
curvature. It is easily understood if you imagine driving a car along
a road. For simplicity, let’s assume you are driving with constant
Figure 20.7.
Composite Bézier curves: the letter D as a collection of cubic Bézier curves. Only
one Bézier polygon of many is shown.

speed. If the road does not curve, i.e., it is straight, you will not have
to turn your steering wheel. When the road does curve, you will have
to turn the steering wheel, and more so if the road curves rapidly.
The curviness of the road (our model of a curve) is thus proportional
to the turning of the steering wheel.
Returning to the more abstract concept of a curve, let us sample its
tangents at various points (see Sketch 20.7). Where the curve bends
sharply, i.e., where its curvature is high, successive tangents differ
from each other significantly. In areas where the curve is relatively
flat, or where its curvature is low, successive tangents are almost
identical. Curvature may thus be defined as rate of change of tan-
gents. (In terms of our car example, the rate of change of tangents is
proportional to the turning of the steering wheel.)
Since the tangent is determined by the curve's first derivative, its rate of change should be determined by the second derivative. This is indeed so, but the actual formula for curvature is a bit more complex than can be derived in the context of this book. We denote the curvature of the curve at b(t) by κ; it is given by

κ(t) = ‖ḃ ∧ b̈‖ / ‖ḃ‖^3.   (20.14)

Sketch 20.7. Tangents on a curve.
This formula holds for both 2D and 3D curves. In the 2D case, it may be rewritten as

κ(t) = det[ḃ, b̈] / ‖ḃ‖^3   (20.15)

with the use of a 2 × 2 determinant. Since determinants may be positive or negative, curvature in 2D is signed. A point where κ = 0 is called an inflection point: the 2D curvature changes sign here. In Figure 20.8, the inflection point is marked. In calculus, you learned that a curve has an inflection point if the second derivative vanishes. For parametric curves, the situation is different. An inflection point occurs when the first and second derivative vectors are parallel, or linearly dependent. This can lead to the curious effect of a cubic with two inflection points. It is illustrated in Figure 20.9.
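Once the derivative vectors are available, say from (20.9) and (20.11), the 2D curvature (20.15) is a one-liner. A small Python sketch (ours, with assumed names):

import math

def curvature_2d(bdot, bddot):
    # Signed 2D curvature (20.15): det[b', b''] / ||b'||^3.
    det = bdot[0] * bddot[1] - bdot[1] * bddot[0]
    speed = math.hypot(bdot[0], bdot[1])
    return det / speed ** 3

A sign change of the returned value between two parameter values signals an inflection point in between.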

Figure 20.8.
Inflection point: an inflection point, a point where the curvature changes sign, is
marked on the curve.

Figure 20.9.
Inflection point: a cubic with two inflection points.
Figure 20.10.
Curve motions: a letter is moved along a curve.

20.7 Moving along a Curve

Take a look at Figure 20.10. You will see the letter B sliding along
a curve. If the curve is given in Bézier form, how can that effect be
achieved? The answer can be seen in Sketch 20.8. If you want to
position an object, such as the letter B, at a point on a curve, all
you need to know is the point and the curve’s tangent there. If ḃ is
the tangent, then simply define n to be a vector perpendicular to it.4
Using the local coordinate system with origin b(t) and [ḃ, n]-axes,
you can position any object as in Section 4.1.
Sketch 20.8. Sliding along a curve.
The same story is far trickier in 3D! If you had a point on the curve and its tangent, the exact location of your object would not be fixed;
it could still rotate around the tangent. Yet there is a unique way to
position objects along a 3D curve. At every point on the curve, we
may define a local coordinate system as follows.
Let the point on the curve be b(t); we now want to set up a local
coordinate system defined by three vectors f1 , f2 , f3 . Following the
2D example, we set f1 to be in the tangent direction; f1 = ḃ(t). If
the curve does not have an inflection point at t, then ḃ(t) and b̈(t)
will not be collinear. This means that they span a plane, and that
plane’s normal is given by ḃ(t) ∧ b̈(t). See Sketch 20.9 for some visual
information. We make the plane’s normal one of our local coordinate
axes, namely f3 . The plane, by the way, has a name: it is called the
osculating plane at x(t). Since we have two coordinate axes, namely
f1 and f3 , it is not hard to come up with the remaining axis, we just
set f2 = f1 ∧ f3 . Thus, for every point on the curve (as long as it is not
an inflection point), there exists an orthogonal coordinate system. It
is customary to use coordinate axes of unit length, and then we have

f_1 = ḃ(t) / ‖ḃ(t)‖,   (20.16)

4 If ḃ = [ḃ_1, ḃ_2]^T then n = [−ḃ_2, ḃ_1]^T.
f_3 = (ḃ(t) ∧ b̈(t)) / ‖ḃ(t) ∧ b̈(t)‖,   (20.17)

f_2 = f_1 ∧ f_3.   (20.18)

Figure 20.11. Curve motions: a robot arm is moved along a curve. (Courtesy of M. Wagner, Arizona State University.)

This system with local origin b(t) and normalized axes f_1, f_2, f_3 is called the Frenet frame of the curve at b(t). Equipped with the tool of Frenet frames, we may now position objects along a 3D curve! See Figure 20.11. Since we are working in 3D, we use the cross product to form the orthogonal frame; however, we could have equally as well used the Gram-Schmidt process from Section 11.8.
Sketch 20.9. A Frenet frame.
Let us now work out exactly how to carry out our object-positioning plan. The object is given in some local coordinate system with axes u_1, u_2, u_3. Any point of the object has coordinates

u = [u_1, u_2, u_3]^T.
It is mapped to

x(t, u) = b(t) + u1 f1 + u2 f2 + u3 f3 .
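A Python sketch of the frame (20.16)-(20.18) and of the positioning map x(t, u) (our own helper functions with assumed names; the derivative vectors are assumed to be supplied, e.g., from the formulas of Section 20.4):

import math

def frenet_frame(bdot, bddot):
    # Frenet frame from the first and second derivative vectors; it breaks
    # down at an inflection point, where bdot and bddot are parallel.
    def cross(a, b):
        return [a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]]
    def normalize(v):
        length = math.sqrt(sum(c * c for c in v))
        return [c / length for c in v]
    f1 = normalize(bdot)                  # (20.16)
    f3 = normalize(cross(bdot, bddot))    # (20.17)
    f2 = cross(f1, f3)                    # (20.18)
    return f1, f2, f3

def position(b_t, frame, u):
    # x(t, u) = b(t) + u1*f1 + u2*f2 + u3*f3
    f1, f2, f3 = frame
    return [b_t[k] + u[0] * f1[k] + u[1] * f2[k] + u[2] * f3[k] for k in range(3)]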

A typical application is robot motion. Robots are used extensively


in automotive assembly lines; one job is to grab a part and move it
to its destination inside the car body. This movement happens along
well-defined curves. While the car part is being moved, it has to be
oriented into its correct position—exactly the process described in
this section!
• linear interpolation
• parametric curve
• de Casteljau algorithm
• cubic Bézier curve
• subdivision
• affine invariance
• convex hull property
• Bernstein polynomials
• basis function
• barycentric combination
• matrix form
• cubic monomial curve
• Bernstein and monomial conversion
• nonsingular
• first derivative
• second derivative
• parallelogram rule
• composite Bézier curves
• curvature
• inflection point
• Frenet frame
• osculating plane

20.8 Exercises
Let a cubic Bézier curve d(t) be given by the control polygon

d_0 = [0, 0]^T,  d_1 = [6, 3]^T,  d_2 = [3, 6]^T,  d_3 = [2, 4]^T.

Let a cubic Bézier curve b(t) be given by the control polygon

b_0 = [0, 0, 0]^T,  b_1 = [4, 0, 0]^T,  b_2 = [4, 4, 0]^T,  b_3 = [4, 4, 4]^T.

1. Sketch d(t) manually.


2. Using the de Casteljau algorithm, evaluate d(t) for t = 1/2.
3. Evaluate the first and second derivative of d(t) for t = 0. Add these
vectors to the sketch from Exercise 1.
4. For d(t), what is the control polygon for the curve defined from t = 0
to t = 1/2 and the curve defined over t = 1/2 to t = 1?
5. Rewrite d(t) in monomial form.
6. Find d(t)’s minmax box.
7. Find d(t)’s curvature at t = 0 and t = 1/2.
8. Find d(t)’s Frenet frame for t = 1/2.
9. Using the de Casteljau algorithm, evaluate d(t) for t = 2. This is
extrapolation.
10. Attach another curve c(t) at d(t), creating a composite curve with tan-
gent continuity. What are the constraints on c(t)’s polygon?
11. Sketch b(t) manually.
12. Using the de Casteljau algorithm, evaluate b(t) for t = 1/4.
13. Evaluate the first and second derivative of b(t) for t = 1/4. Add these
vectors to the sketch from Exercise 11.
14. For b(t), what is the control polygon for the curve defined from t = 0
to t = 1/4 and the curve defined over t = 1/4 to t = 1?
15. Rewrite b(t) in monomial form.
16. Find b(t)’s minmax box.
17. Find b(t)’s curvature at t = 0 and t = 1/2.
18. Find b(t)’s Frenet frame for t = 1.
19. Using the de Casteljau algorithm, evaluate b(t) for t = 2. This is
extrapolation.
20. Attach another curve c(t) at b0 , creating a composite curve with tangent
continuity. What are the constraints on c(t)’s polygon?
A
Glossary

In this Glossary, we give brief definitions of the major concepts in the book. We avoid equations here, so that we give a slightly different perspective compared to what you find in the text. For more information on any item, reference the Index for full descriptions in the text. Each of these items is in one or more WYSK sections. However, more terms may be found in WYSK.

Action ellipse The image of the unit circle under a linear map.

Adjacency matrix An n × n matrix that has ones or zeroes as ele-


ments, representing the connectivity between n nodes in a directed
graph.

Affine map A map that leaves linear relationships between points


unchanged. For instance, midpoints are mapped to midpoints.
An affine map is described by a linear map and a translation.

Affine space A set of points with the property that any barycentric
combination of two points is again in the space.

Aspect ratio The ratio of width to height of a rectangle.

Barycentric combination A weighted average of points where the


sum of the weights equals one.

Barycentric coordinates When a point is expressed as a barycentric


combination of other points, the coefficients in that combination
are called barycentric coordinates.

Basis For a linear space of dimension n, any set of n linearly in-


dependent vectors is a basis, meaning that every vector in the
space may be uniquely expressed as a linear combination of these
n basis vectors.
Basis transformation The linear map taking one set of n basis vectors
to another set of n basis vectors.
Bernstein polynomial A polynomial basis function. The set of degree
n Bernstein polynomials are used in the degree n Bézier curve
representation.
Best approximation The best approximation in a given subspace is
the one that minimizes distance. Orthogonal projections produce
the best approximation.
Bézier curve A polynomial curve representation, which is based on
the Bernstein basis and control points.
Centroid The center of mass, or the average of a set of points, with
all weights being equal and summing to one.
Characteristic equation Every eigenvalue problem may be stated
as finding the zeroes of a polynomial, giving the characteristic
equation.
Coefficient matrix The matrix in a linear system that holds the
coefficients of the variables in a set of linear equations.
Collinear A set of points is called collinear if they all lie on the same
straight line.
Column space The columns of a matrix form a set of vectors. Their
span is the column space.
Condition number A function measuring how sensitive a map is to
changes in its input. If small changes in the input cause large
changes in the output, then the condition number is large.
Conic section The intersection curve of a double cone with a plane.
A nondegenerate conic section is either an ellipse, a parabola, or
a hyperbola.
Consistent linear system A linear system with one or many solu-
tions.
Contour A level curve of a bivariate function.
Convex A point set is convex if the straight line segment through


any of its points is completely contained inside the set. Example:
all points on and inside of a sphere form a convex set; all points
on and inside of an hourglass do not.
Convex combination A linear combination of points or vectors with
nonnegative coefficients that sum to one.
Coordinates A vector in an n-dimensional linear space may be
uniquely written as a linear combination of a set of basis vectors.
The coefficients in that combination are the vector’s coordinates
with respect to that basis.
Coplanar A set of points is coplanar if all points lie on the same
plane.
Covariance matrix An n-dimensional matrix that summarizes the
variation in each coordinate and between coordinates of given n-
dimensional data.
Cramer’s rule A method for solving a linear system explicitly using
ratios of determinants.
Cross product The cross product of two linearly independent 3D
vectors results in a third vector that is perpendicular to them.
Cubic curve A degree three polynomial curve.
Curvature The rate of change of the tangent.
Curve The locus of a moving point.
de Casteljau algorithm A recursive process of linear interpolation
for evaluating a Bézier curve.
Decomposition Expression of an element of a linear space, such as a
vector or a matrix, in terms of other, often more fundamental, el-
ements of the linear space. Examples: orthogonal decomposition,
LU decomposition, eigendecomposition, or SVD.
Determinant A linear map takes a geometric object to another
geometric object. The ratio of their volumes is the map’s deter-
minant.
Diagonalizable matrix A matrix A is diagonalizable if there exists
a matrix R such that R−1 AR is a diagonal matrix. The diagonal
matrix holds the eigenvalues of A and the columns of R hold the
eigenvectors of A.

Dimension The number of linearly independent vectors needed to span a linear space.

Directed graph A set of nodes and edges with associated direction.

Domain A linear map maps from one space to another. The "from" space is the domain.

Dominant eigenvalue The eigenvalue of a matrix with the largest absolute value.

Dominant eigenvector The eigenvector of a matrix corresponding to the dominant eigenvalue.

Dot product Assigns a value to the product of two vectors that is equal to the product of the magnitudes of the vectors and the cosine of the angle between the vectors. Also called the scalar product.

Dual space Consider all linear maps from a linear space into the 1D linear space of scalars. All these maps form a linear space themselves, the dual space of the original space.

Dyadic matrix Symmetric matrix of rank one, obtained from multiplying a column vector by its transpose.

Eigendecomposition The decomposition of a matrix A into RΛR^T, where the columns of R hold the eigenvectors of A and Λ is a diagonal matrix holding the eigenvalues of A. This decomposition is a sequence of a rigid body motion, a scale, followed by another rigid body motion.

Eigenfunction A function that gets mapped to a multiple of itself by some linear map defined on a function space.

Eigenvalue If a linear map takes some vector to itself multiplied by some constant, then that constant is an eigenvalue of the map.

Eigenvector A vector whose direction is unchanged by a linear map.

Elementary row operation This type of operation on a linear system includes exchanging rows, adding a multiple of one row to another, and multiplying a row by a scalar. Gauss elimination uses these operations to transform a linear system into a simpler one.

Ellipse A bounded conic section. When written in implicit form utilizing the matrix form, its 2 × 2 matrix has a positive determinant and two positive eigenvalues.

Equiangular polygon All interior angles at the vertices of a polygon are equal.

Equilateral polygon All sides of a polygon are of equal length.

Explicit equation Procedure to directly produce the value of a function or map.

Exterior angle A polygon is composed of edges. An exterior angle is the angle between two consecutive edges, measured on the outside of the polygon. Also called the turning angle.

Foot of a point The point on a line or a plane that is closest to a given point.

Frenet frame A 3D orthonormal coordinate system that exists at any point on a differentiable curve. It is defined by the tangent, normal, and binormal at a given point on the curve.

Gauss elimination The process of forward elimination, which transforms a linear system into an equivalent linear system with an upper triangular coefficient matrix, followed by back substitution.

Gauss-Jacobi iteration Solving a linear system by successively improving an initial guess for the solution vector by updating all solution components simultaneously.

Gauss-Seidel iteration Solving a linear system by successively improving an initial guess for the solution vector by updating each component as a new value is computed.

Generalized inverse Another name for the pseudoinverse. The concept of matrix inverse based on the SVD, which can be defined for nonsquare and singular matrices as well as square and nonsingular matrices.

Gram-Schmidt method A method for creating an orthonormal coordinate system from orthogonal projections of a given set of linearly independent vectors.

Homogeneous coordinates Points in 2D affine space may be viewed as projections of points in 3D affine space, all being multiples of each other. The coordinates of any of these points are the homogeneous coordinates of the given point.
Homogeneous linear system A linear system whose right-hand side
consists of zeroes only.
Householder matrix A special reflection matrix, which is designed
to reflect a vector into a particular subspace while leaving some
components unchanged. This matrix is symmetric, involutary,
and orthogonal. It is key to the Householder method for solving
a linear system.
Householder method A method for solving an m × n linear sys-
tem based on reflection matrices (rather than shears as in Gauss
elimination). If the system is overdetermined, the Householder
method results in the least squares solution.
Hyperbola An unbounded conic section with two branches. When
written in implicit form utilizing the matrix form, its 2 × 2 ma-
trix has a negative determinant and a positive and a negative
eigenvalue.
Idempotent A map is idempotent if repeated applications of the
map yield the same result as only one application. Example:
projections.
Identity matrix A square matrix with entries 1 on the diagonal and
entries 0 elsewhere. This matrix maps every vector to itself.
Image The result of a map. The preimage is mapped to the image.
Implicit equation A definition of a multivariate function by specifying
a relationship among its arguments.
Incenter The center of a triangle’s incircle, which is tangent to the
three edges.
Inflection point A point on a 2D curve where the curvature is zero.
Inner product Given two elements of a linear space, their inner
product is a scalar. The dot product is an example of an in-
ner product. An inner product is formed from a vector product
rule that satisfies symmetry, homogeneity, additivity, and positiv-
ity requirements. If two vectors are orthogonal, then their inner
product is zero.

Interior angle A polygon is composed of edges. An interior angle is formed by two consecutive edges inside the polygon.

Inverse matrix A matrix maps a vector to another vector. The inverse matrix undoes this map.

Involutary matrix A matrix for which the inverse is equal to the original matrix.

Kernel The set of vectors being mapped to the zero vector by a linear map. Also called the null space.

Least squares approximation The best approximation to an overdetermined linear system.

Line Given two points in affine space, the set of all barycentric combinations is a line.

Line segment A line, but with all coefficients of the barycentric combinations being nonnegative.

Linear combination A weighted sum of vectors.

Linear functional A linear map taking the elements of a linear space to the reals. Linear functionals are said to form the dual space of this linear space.

Linear interpolation A weighted average of two points, where the weights sum to one and are linear functions of a parameter.

Linearity property Preservation of linear combinations, thus incorporating the standard operations in a linear space: addition and scalar multiplication.

Linearly independent A set of vectors is called linearly independent if none of its elements may be written as a linear combination of the remaining ones.

Linear map A map of a linear space to another linear space such that linear relationships between vectors are not changed by the map.

Linear space A set of vectors with the property that any linear
combination of any two vectors is also in the set. Also called a
vector space.

Linear system The equations resulting from writing a given vector as an unknown linear combination of a given set of vectors.

Length The magnitude of a vector.

Local coordinates A specific coordinate system used to define a geometric object. This object may then be placed in a global coordinate system.

Major axis An ellipse is symmetric about two axes that intersect at the center of the ellipse. The longer axis is the major axis.

Map The process of changing objects. Example: rotating and scaling an object. The object being mapped is called the preimage, the result of the map is called the image.

Matrix The coordinates of a linear map, written in a rectangular array of scalars.

Matrix norm A characterization of a matrix, similar to a vector norm, which measures the "size" of the matrix.

Minor axis An ellipse is symmetric about two axes that intersect at the center of the ellipse. The shorter axis is the minor axis.

Monomial curve A curve represented in terms of the monomial polynomials.

Moore-Penrose generalized inverse Another name for the pseudoinverse.

n-gon An n-sided equiangular and equilateral polygon. Also called regular polygon.

Nonlinear map A map that does not preserve linear relationships. Example: a perspective map.

Norm A function that assigns a length to a vector.

Normal A vector that is perpendicular to a line or a plane.

Normal equations A linear system is transformed into a new, square linear system, called the normal equations, which will result in the least squares solution to the original system.

Null space The set of vectors mapped to the zero vector by a linear
map. The zero vector is always in this set. Also called the kernel.

Orthogonality Two vectors are orthogonal in an inner product space if their inner product vanishes.

Orthogonal matrix A matrix that leaves angles and lengths unchanged. The columns are unit vectors and the inverse is simply the transpose. Example: 2D rotation matrix.

Orthonormal A set of unit vectors is called orthonormal if any two of them are perpendicular.

Osculating plane At a point on a curve, the first and second derivative vectors span the osculating plane. The osculating circle, which is related to the curvature at this point, lies in this plane.

Overdetermined linear system A linear system with more equations than unknowns.

Parabola An unbounded conic section with one branch. When written in implicit form utilizing the matrix form, its 2 × 2 matrix has a zero determinant and has one zero eigenvalue.

Parallel Two planes are parallel if they have no point in common. Two vectors are parallel if they are multiples of each other.

Parallel projection A projection in which all vectors are projected in the same direction. If this direction is perpendicular to the projection plane it is an orthogonal projection, otherwise it is an oblique projection.

Parallelogram rule Two vectors define a parallelogram. The sum of these two vectors results in the diagonal of the parallelogram.

Parametric equation Description of an object by making each point/vector a function of a set of real numbers (the parameters).

Permutation matrix A matrix that exchanges rows or columns in a second matrix. It is a square and orthogonal matrix that has one entry of 1 in each row and entries of 0 elsewhere. It is used to perform pivoting in Gauss elimination.

Pivot The leading coefficient (first from the left) in a row of a matrix. In Gauss elimination, the pivots are typically the diagonal elements, as they play a central role in the algorithm.

Pivoting The process of exchanging rows or columns so the pivot element is the largest element. Pivoting is used to improve numerical stability of Gauss elimination.
Plane Given three points in affine space, the set of all barycentric
combinations is a plane.
Point A location, i.e., an element of an affine space.
Point cloud A set of 3D points without any additional structure.
Point normal plane A special representation of an implicit plane
equation, formed by a point in the plane and the (unit) normal
to the plane.
Polygon The set of edges formed by connecting a set of points.
Polyline Vertices connected by edges.
Positive definite matrix If a quadratic form results in only nonnega-
tive values for any nonzero argument, then its matrix is positive
definite.
Principal components analysis (PCA) A method for analyzing data
by creating an orthogonal local coordinate frame—the principal
axes—based on the dimensions with highest variance. This allows
for an orthogonal transformation of the data to the coordinate
axes for easier analysis.
Projection An idempotent linear map that reduces the dimension of
a vector.
Pseudoinverse A generalized inverse that is based on the SVD. All
matrices have a pseudoinverse, even nonsquare and singular ones.
Pythagorean theorem The square of the length of the hypotenuse of
a right triangle is equal to the sum of the squares of the lengths
of the other two sides.
Quadratic form A bivariate quadratic polynomial without linear or
constant terms.
Range A linear map maps from one space to another. The “to”
space is the range.
Rank The dimension of the image space of a map. Also the number
of nonzero singular values.

Ratio A measure of how three collinear points are distributed. If one is the midpoint of the other two, the ratio is 1.
Reflection matrix A linear map that reflects a vector about a line
(2D) or plane (3D). It is an orthogonal and involutary map.
Residual vector The error incurred during the solution process of
a linear system.
Rhombus A parallelogram with all sides equal.
Rigid body motion An affine map that leaves distances and angles
unchanged, thus an object is not deformed. Examples: rotation
and translation.
Rotation matrix A linear map that rotates a vector through a par-
ticular angle around a fixed vector. It is an orthogonal matrix,
thus the inverse is equal to the transpose and the determinant is
one, making it a rigid body motion.
Row echelon form Describes the inverted “staircase” form of a
matrix, normally as a result of forward elimination and pivoting
in Gauss elimination. The leading coefficient in a row (pivot) is
to the right of the pivot in the previous row. Zero rows are the
last rows.
Row space The rows of a matrix form a set of vectors. Their span
is the row space.
Scalar triple product Computes the signed volume of three vec-
tors using the cross product and dot product. Same as a 3 × 3
determinant.
Scaling matrix A linear map with a diagonal matrix. If all diagonal
elements are equal, it is a uniform scaling, which will uniformly
shrink or enlarge a vector. If the diagonal elements are not equal,
it is a nonuniform scaling. The inverse is a scaling matrix with
inverted scaling elements.
Semi-major axis One half the major axis of an ellipse.
Semi-minor axis One half the minor axis of an ellipse.
Shear matrix A linear map that translates a vector component pro-
portionally to one or more other components. It has ones on the
diagonal. It is area preserving. The inverse is a shear matrix with
the off-diagonal elements negated.

Singular matrix A matrix with zero determinant, thus it is not invertible.

Singular value decomposition (SVD) The decomposition of a matrix into a sequence of a rigid body motion, a scale, followed by another rigid body motion, and the scale factors are the singular values.

Singular values The singular values of a matrix A are the square roots of the eigenvalues of A^T A.

Span For a given set of vectors, its span is the set (space) of all vectors that can be obtained as linear combinations of these vectors.

Star The triangles sharing one vertex in a triangulation.

Stationary vector An eigenvector with 1 as its associated eigenvalue.

Stochastic matrix A matrix for which the columns or rows sum to one.

Subspace A set of linearly independent vectors defines a linear space. The span of any subset of these vectors defines a subspace of that linear space.

Symmetric matrix A square matrix whose elements below the diagonal are mirror images of those above the diagonal.

Translation An affine map that changes point locations by a constant vector.

Transpose matrix The matrix whose rows are formed by the columns of a given matrix.

Triangle inequality The sum of any two edge lengths in a triangle is larger than the third one.

Triangulation Also called triangle mesh: a set of 2D or 3D points that is faceted into nonoverlapping triangles.

Trimmed surface A parametric surface with parts of its domain removed.

Trivial reject An inexpensive test in a process that eliminates unnecessary computations.

Turning angle A polygon is composed of edges. An exterior angle formed by two consecutive edges is a turning angle.

Underdetermined linear system A linear system with fewer equations than unknowns.

Unit vector A vector whose length is 1.

Upper triangular matrix A matrix with only zero entries below the diagonal.

Valence The number of triangles in the star of a vertex.

Vector An element of a linear space or, equivalently, the difference of two points in an affine space.

Vector norm A measure for the magnitude of a vector. The Euclidean norm is the standard measure of length; others, such as the p-norms, are used as well.

Vector space Linear space.


Zero vector A vector of zero length. Every linear space contains a
zero vector.
B
Selected Exercise Solutions

Chapter 1

1a. The triangle vertex with coordinates (0.1, 0.1) in the [d1, d2]-system is
mapped to

x1 = 0.9 × 1 + 0.1 × 3 = 1.2,


x2 = 0.9 × 2 + 0.1 × 3 = 2.1.

The triangle vertex (0.9, 0.2) in the [d1 , d2 ]-system is mapped to

x1 = 0.1 × 1 + 0.9 × 3 = 2.8,


x2 = 0.8 × 2 + 0.2 × 3 = 2.2.

The triangle vertex (0.4, 0.7) in the [d1 , d2 ]-system is mapped to

x1 = 0.6 × 1 + 0.4 × 3 = 1.8,


x2 = 0.3 × 2 + 0.7 × 3 = 2.7.

1b. The coordinates (2, 2) in the [e1, e2]-system are mapped to

u1 = (2 − 1)/(3 − 1) = 1/2,
u2 = (2 − 2)/(3 − 2) = 0.

3. The local coordinates (0.5, 0, 0.7) are mapped to

x1 = 1 + 0.5 × 1 = 1.5,
x2 = 1 + 0 × 2 = 1,
x3 = 1 + 0.7 × 4 = 3.8.


4. We have simply moved Exercise 1a to 3D, so the first two coordinates
are identical. Since the 3D triangle local coordinates are at the extents
of the local frame, they will be at the extents of the global frame.
Therefore, (u1, u2, u3) is mapped to (x1, x2, x3) = (1.2, 2.1, 4), (v1, v2, v3)
is mapped to (x1, x2, x3) = (2.8, 2.2, 8), and (w1, w2, w3) is mapped to
(x1, x2, x3) = (1.8, 2.7, 4).
7. The local coordinates (3/4, 1/2), which are at the midpoint of the two
given local coordinate sets, have global coordinates

(6 1/2, 5) = 1/2 (5, 2) + 1/2 (8, 8).

These coordinates are the midpoint between the given global coordinate
sets.
9. The aspect ratio tells us that

Δ1 = 2 × 4/3 = 8/3.

The global coordinates of (1/2, 1/2) are

x1 = 0 + 1/2 · 8/3 = 4/3,   x2 = 0 + 1/2 · 2 = 1.

10. Each coordinate will follow similarly, so let's work out the details for
x1. First, construct the ratio that defines the relationship between a
coordinate u1 in the NDC system and coordinate x1 in the viewport:

(u1 − (−1))/(1 − (−1)) = (x1 − min1)/(max1 − min1).

Solve for x1, and the equations for x2 follow similarly, namely

x1 = (u1 + 1) (max1 − min1)/2 + min1,
x2 = (u2 + 1) (max2 − min2)/2 + min2.
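The two viewport equations above are easy to check numerically. The following is a minimal Python/NumPy sketch (not from the book); the 640 × 480 viewport corners are made-up example values:

import numpy as np

def ndc_to_viewport(u, vp_min, vp_max):
    """Map NDC coordinates u in [-1,1]^2 to viewport coordinates."""
    u = np.asarray(u, dtype=float)
    vp_min = np.asarray(vp_min, dtype=float)
    vp_max = np.asarray(vp_max, dtype=float)
    # x_i = (u_i + 1) * (max_i - min_i)/2 + min_i
    return (u + 1.0) * (vp_max - vp_min) / 2.0 + vp_min

print(ndc_to_viewport([-1.0, -1.0], [0, 0], [640, 480]))  # [0. 0.]
print(ndc_to_viewport([0.0, 0.0], [0, 0], [640, 480]))    # [320. 240.]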

Chapter 2

2. The vectors v and w form adjacent sides of a parallelogram. The vectors
v+w and v−w form the diagonals of this parallelogram, and an example
is illustrated in Sketch 2.6.
4. The operations have the following results.
(a) vector
(b) point
(c) point
(d) vector

(e) vector
(f) point
5. The midpoint between p and q is

m = 1/2 p + 1/2 q.
8. A triangle.
9. The length of the vector v = [−4, −3]^T is

‖v‖ = √((−4)² + (−3)²) = 5.

13. The distance between p and q is 1.

14. A unit vector has length one.

15. The normalized vector is

v/‖v‖ = [−4/5, −3/5]^T.

17. The barycentric coordinates are (1 − t) and t such that r = (1 − t)p + tq.
We determine the location of r relative to p and q by calculating l1 =
‖r − p‖ = 2√2 and l2 = ‖q − r‖ = 4√2. The barycentric coordinates
must sum to one, so we need the total length l3 = l1 + l2 = 6√2. Then
the barycentric coordinates are t = l1/l3 = 1/3 and (1 − t) = 2/3.
Check that this is correct:

[3, 3]^T = 2/3 [1, 1]^T + 1/3 [7, 7]^T.

19. No, they are linearly dependent.


21. Yes, v1 and v2 form a basis for R2 since they are linearly independent.
24. The dot product, v · w = 5 × 0 + 4 × 1 = 4. Scalar product is another
name for dot product and the dot product has the symmetry property,
therefore w · v = 4.
   
25. The angle between the vectors [5, 5]^T and [3, −3]^T is 90° by inspection.
Sketch it and this will be clear. Additionally, notice 5 × 3 + 5 × (−3) = 0.
27. The angles fall into the following categories: θ1 is obtuse, θ2 is a right
angle, and θ3 is acute.
28. The orthogonal projection u of w onto v is determined by (2.21), or
specifically,

u = (([1, −1]^T · [3, 2]^T)/(√2)²) [1, −1]^T = [1/2, −1/2]^T.

Draw a sketch to verify. Therefore, the u⊥ that completes the orthogonal
decomposition of w is

u⊥ = w − u = [3, 2]^T − [1/2, −1/2]^T = [5/2, 5/2]^T.

Add u⊥ to your sketch to verify.
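As a numerical cross-check of this decomposition, here is a short NumPy sketch (not the book's code) using the vectors v = [1, −1]^T and w = [3, 2]^T read off the computation above:

import numpy as np

v = np.array([1.0, -1.0])
w = np.array([3.0, 2.0])

u = (v @ w) / (v @ v) * v      # orthogonal projection of w onto v
u_perp = w - u                 # completes w = u + u_perp

print(u)                            # [ 0.5 -0.5]
print(u_perp)                       # [2.5 2.5]
print(np.isclose(u @ u_perp, 0.0))  # True: the two parts are orthogonal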


31. Equality of the Cauchy-Schwartz inequality holds when v and w are
linearly dependent. A simple example:

v = [1, 0]^T   and   w = [3, 0]^T.

The two sides of Cauchy-Schwartz are then (v · w)² = 9 and ‖v‖²‖w‖² =
1² × 3² = 9.

32. No, the triangle inequality states that ‖v + w‖ ≤ ‖v‖ + ‖w‖.

Chapter 3

1. The line is defined by the equation l(t) = p + t(q − p), thus

l(t) = [0, 1]^T + t [4, 1]^T.

We can check that

l(0) = [0, 1]^T + 0 × [4, 1]^T = p,
l(1) = [0, 1]^T + 1 × [4, 1]^T = q.

3. The parameter value t = 2 is outside of [0, 1], therefore l(2) is not formed
from a convex combination.
5. First form the vector

q − p = [2, −1]^T,

then a is perpendicular to this vector, so let

a = [1, 2]^T.

Next, calculate c = −a1 p1 − a2 p2 = 2, which makes the equation of the
line x1 + 2x2 + 2 = 0.
Alternatively, we could have let

a = [−1, −2]^T.

Then the implicit equation of the line is −x1 − 2x2 − 2 = 0, which is
simply equivalent to multiplying the previous equation by −1.

6. The point r0 is not on the line x1 + 2x2 + 2 = 0, since 0 + 2 × 0 + 2 = 2 ≠ 0.
The point r1 is on the line since −4 + 2(1) + 2 = 0. The point r2 is not
on the line since 5 + 2(1) + 2 ≠ 0. The point r3 is not on the line since
−3 + 2(−1) + 2 ≠ 0.
10. The explicit equation of the line 6x1 + 3x2 + 3 = 0 is x2 = −2x1 − 1.
11. The slope is 4 and e2 -intercept is x2 = −1.
14. The implicit equation of the line is −3x1 − 2x2 + 5 = 0. (Check that
the given point, p, from the parametric form satisfies this equation.)
16. To form the implicit equation, we first find a vector parallel to the line,

v = [2, 2]^T − [−1, 0]^T = [3, 2]^T.

From v we select the components of a as a1 = −2 and a2 = 3, and
then c = −2. Therefore, the implicit equation is −2x1 + 3x2 − 2 = 0.
(Check that the two points forming the parametric form satisfy this
equation.)

17. The parametric form of the line is l(t) = p + tv. Construct a vector
that is perpendicular to a:

v = [2, −3]^T,

find one point on the line:

p = [−1/3, 0]^T,

and the definition is complete. (This selection of p and v follows the
steps outlined in Section 3.5.)
19. Let w = r − p, then

w = [5, 0]^T

and ‖w‖ = 5. The cosine of the angle between w and v is cos α = 1/√2.
The distance of r to the line is d = ‖w‖ sin α, where sin α = √(1 − cos²α),
thus d = 5/√2.

21. Let q be the foot of the point r and q = p + tv. Define w = r − p,
then

t = (v · w)/‖v‖² = 5/2.

The foot of the point is

q = [0, 0]^T + 5/2 [1, 1]^T = [5/2, 5/2]^T.

(The distance between r and the foot of r is 5/√2, confirming our
solution to Exercise 19.)
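The same foot-of-point computation, sketched in NumPy with the data used above (p = [0, 0]^T, v = [1, 1]^T; the point r = [5, 0]^T is inferred from w = r − p, so treat it as an assumption):

import numpy as np

p = np.array([0.0, 0.0])   # point on the line
v = np.array([1.0, 1.0])   # direction of the line
r = np.array([5.0, 0.0])   # point to project onto the line

w = r - p
t = (v @ w) / (v @ v)      # t = 5/2
q = p + t * v              # foot of the point: [2.5, 2.5]

print(t, q)
print(np.linalg.norm(r - q))   # 3.535... = 5/sqrt(2), as in Exercise 19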

24. The lines are identical!


27. The midpoint r is 1 : 1 with respect to the points p and q. This means
that it has parameter value t = 1/(1 + 1) = 1/2 with respect to the
line l(t) = p + t(q − p), and

r = [0, 1]^T + 1/2 [4, 1]^T = [2, 3/2]^T.

The vector v⊥ = [−1, 4]^T is perpendicular to q − p. The line m(t) is
then defined as

m(t) = [2, 3/2]^T + t [−1, 4]^T.

Chapter 4

1. The linear combination is w = 2c1 + (1/2)c2. In matrix form:

[1 0; 0 2] [2, 1/2]^T = [2, 1]^T.

3. Yes, v = Aw where w = [3, −2]^T.

6. The transposes of the given matrices are as follows:

A^T = [0 1; −1 0],   B^T = [1 −1; −1 1/2],   v^T = [2 3].

7. The transpose of the 2 × 2 identity matrix I is (again)

I = [1 0; 0 1].

8. From (4.9), we expect these two matrix sum-transpose results to be the
same:

A + B = [1 −2; 0 1/2],   [A + B]^T = [1 0; −2 1/2];

A^T = [0 1; −1 0],   B^T = [1 −1; −1 1/2],   A^T + B^T = [1 0; −2 1/2].

And indeed they are.

9. The rank, or the number of linearly independent column vectors of B,
is two. We cannot find a scalar α such that

[1, −1]^T = α [−1, 1/2]^T,

therefore, these vectors are linearly independent.


Selected Exercise Solutions 449

11. The zero matrix has rank zero.

14. The product Av:

[0 −1; 1 0] [2, 3]^T = [−3, 2]^T.

The product Bv:

[1 −1; −1 1/2] [2, 3]^T = [−1, −1/2]^T.

15. The matrix represents a uniform scaling,

S = [3 0; 0 3].

19. This one is a little tricky! It is a rotation; notice that the determinant
is one. See the discussion surrounding (4.15).

21. No. Simply check that the matrix multiplied by itself does not result
in the matrix again:

A² = [−1 0; 0 −1] ≠ A.

25. The determinant of A:

|A| = |0 −1; 1 0| = 0 × 0 − (−1) × 1 = 1.

26. The determinant of B:

|B| = |1 −1; −1 1/2| = 1 × 1/2 − (−1) × (−1) = −1/2.

28. The sum

A + B = [1 −2; 0 1/2].

The product (A + B)v:

[1 −2; 0 1/2] [2, 3]^T = [−4, 3/2]^T

and

Av + Bv = [−3, 2]^T + [−1, −1/2]^T = [−4, 3/2]^T.

31. The matrix A² = A · A equals

[−1 0; 0 −1].

Does this look familiar? Reflect on the discussion surrounding (4.15).
33. Matrix multiplication cannot increase the rank of the result, hence the
rank of M N is one or zero.

Chapter 5

1. The linear system takes the form

[2 6; −3 0] [x1, x2]^T = [6, 3]^T.

2. This system is inconsistent because there is no solution vector u that
will satisfy this linear system.

4. The linear system

[2 6; −3 0] [x1, x2]^T = [6, 3]^T

has the solution

x1 = |6 6; 3 0| / |2 6; −3 0| = −1,   x2 = |2 6; −3 3| / |2 6; −3 0| = 4/3.

7. The linear system

[2 6; −3 0] [x1, x2]^T = [6, 3]^T

is transformed to

[2 6; 0 9] [x1, x2]^T = [6, 12]^T

after one step of forward elimination. Now that the matrix is upper
triangular, back substitution is used to find

x2 = 12/9 = 4/3,
x1 = (6 − 6 × 4/3)/2 = −1.

Clearly, the same solution as obtained by Cramer's rule in Exercise 4.
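Both routes, Cramer's rule and elimination followed by back substitution, are easy to mirror in NumPy for this 2 × 2 system; a minimal sketch (not the book's code):

import numpy as np

A = np.array([[2.0, 6.0],
              [-3.0, 0.0]])
b = np.array([6.0, 3.0])

# Cramer's rule: replace one column of A by b and take determinant ratios.
detA = np.linalg.det(A)
x1 = np.linalg.det(np.column_stack((b, A[:, 1]))) / detA
x2 = np.linalg.det(np.column_stack((A[:, 0], b))) / detA
print(x1, x2)                 # -1.0  1.333...

# The same solution from the built-in solver (elimination under the hood).
print(np.linalg.solve(A, b))  # [-1.      1.3333]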

9. The Gauss elimination steps from Section 5.4 need to be modified
because a1,1 = 0. Therefore, we add a pivoting step, which means that we
exchange rows, resulting in

[2 2; 0 4] [x1, x2]^T = [6, 8]^T.

Now the matrix is in upper triangular form and we use back substitution
to find x2 = 2 and x1 = 1.

13. No, this system has only the trivial solution, x1 = x2 = 0.

14. We find the kernel of the matrix

C = [2 6; 4 12]

by first establishing the fact that c2 = 3c1. Therefore, all vectors of
the form v1 = −3v2 are a part of the kernel. For example, [−3, 1]^T is in
the kernel.
Alternatively, we could have set up a homogeneous system and used
Gauss elimination to find the kernel: starting with

[2 6; 4 12] v = [0, 0]^T,

perform one step of forward elimination,

[2 6; 0 0] v = [0, 0]^T,

then the back substitution steps begin with arbitrarily setting v2 = 1,
and then v1 = −3, which is the same result as above.
16. The inverse of the matrix A:

A⁻¹ = [0 −3/9; 1/6 1/9].

Check that AA⁻¹ = I.

19. An orthogonal matrix, for which the column vectors are orthogonal and
of unit length. A rotation matrix such as

[cos 30° −sin 30°; sin 30° cos 30°]

is an example.

21. Simply by inspection, we see that the map is a reflection about the
e1-axis, thus the matrix is

A = [1 0; 0 −1].

Let's find this result using (5.19), then

V = [1 1; 0 1],   V′ = [1 1; 0 −1].

We find

V⁻¹ = [1 −1; 0 1],

and A = V′V⁻¹ results in the reflection already given.

 
Chapter 6

1. The point q = [2/3, 4/3]^T. The transformed points:

r′ = [3, 4]^T,   s′ = [11/2, 6]^T,   q′ = [14/3, 16/3]^T.

The point q′ is in fact equal to (1/3)r′ + (2/3)s′.

3. The affine map takes the xi to

x1′ = [−4, 0]^T,   x2′ = [−1, 1]^T,   x3′ = [2, 2]^T.

The ratio of three collinear points is preserved by an affine map, thus
the ratio of the xi and the xi′ is 1 : 1.

4. In order to rotate a point x around another point r, construct the affine
map

x′ = A(x − r) + r.

In this exercise, we rotate 90°, x = [−2, −2]^T, and r = [−2, 2]^T. The matrix

A = [0 −1; 1 0],

thus x′ = [2, 2]^T. Be sure to draw a sketch!
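The rotate-about-a-point construction x′ = A(x − r) + r can be checked numerically; a small NumPy sketch (not the book's code) with the 90° rotation used here:

import numpy as np

def rotate_about_point(x, r, theta):
    """Rotate the 2D point x about the point r by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    A = np.array([[c, -s],
                  [s,  c]])
    return A @ (x - r) + r

x = np.array([-2.0, -2.0])
r = np.array([-2.0,  2.0])
print(rotate_about_point(x, r, np.pi / 2))   # [2. 2.] up to rounding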
7. The point p0 is on the line, therefore p0′ = p0. Since the five points are
uniformly spaced and collinear, we can apply the reflection affine map
to p4 and then use linear interpolation (another, simpler affine map) to
the other points. First, we want to find the foot of p4 on l(t),

q4 = p0 + tv.

We find t by projecting p4 onto the line. Let

w = p4 − p0 = [−4, 0]^T,

then

t = (v · w)/‖v‖² = 1,

and

q4 = [2, 1]^T + 1 · [−2, 2]^T = [0, 3]^T.

This point is the midpoint between p4 and p4′, thus

p4′ = 2q4 − p4 = [2, 5]^T.

Now we use linear interpolation to find the other points:

p1′ = 3/4 p0 + 1/4 p4′ = [2, 2]^T,   p2′ = 1/2 p0 + 1/2 p4′ = [2, 3]^T,   p3′ = 1/4 p0 + 3/4 p4′ = [2, 4]^T.

8. We want to define A in the affine map

x′ = A(x − a1) + a1′.

Let V = [v2 v3] and V′ = [v2′ v3′], where

v2 = a2 − a1 = [1, 0]^T,   v3 = a3 − a1 = [0, 1]^T,
v2′ = a2′ − a1′ = [−1, 1]^T,   v3′ = a3′ − a1′ = [−1, 0]^T.

The matrix A maps the vi to the vi′, thus AV = V′, and

A = V′V⁻¹ = [−1 −1; 1 0] [1 0; 0 1] = [−1 −1; 1 0].

Check that the ai are mapped to the ai′. The point x = [1/2, 1/2]^T, the
midpoint between a2 and a3, is mapped to x′ = [0, 1/2]^T, the midpoint
between a2′ and a3′.

9. Affine maps take collinear points to collinear points and preserve ratios,
thus

x′ = [0, 0]^T.

11. Mapping a point x in NDC coordinates to x′ in the viewport involves
the following steps.
1. Translate x by an amount that translates ln to the origin.
2. Scale the resulting x so that the sides of the NDC box are of unit
length.
3. Scale the resulting x so that the unit box is scaled to match the
viewport's dimensions.
4. Translate the resulting x by lv to align the scaled box with the
viewport.
The sides of the viewport have the lengths Δ1 = 20 and Δ2 = 10. We
can then express the affine map as

x′ = [Δ1/2 0; 0 Δ2/2] (x − [−1, −1]^T) + [10, 10]^T.

Applying this affine map to the NDC points in the question yields

x1′ = [10, 10]^T,   x2′ = [30, 20]^T,   x3′ = [15, 35/2]^T.

Of course we could easily "eyeball" x1′ and x2′, so they provide a good
check for our affine map. Be sure to make a sketch.
12. No, affine maps do not transform perpendicular lines to perpendicular
lines. A simple shear is a counterexample.

Chapter 7

2. Form the matrix A − λI. The characteristic equation is

λ² − 2λ − 3 = 0.

The eigenvalues are λ1 = 3 and λ2 = −1. The eigenvectors are

r1 = [−1/√2, 1/√2]^T   and   r2 = [1/√2, 1/√2]^T.

The dominant eigenvalue is λ1 = 3.

4. The area of the parallelogram is 8 because |A| = λ1 λ2 = 4 × 2 = 8.
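These two facts (eigenvalues from the characteristic polynomial, and |A| = λ1λ2 giving the area scale) can be confirmed with NumPy. The matrix below is a made-up symmetric example, not the one from the exercise, so it only illustrates the general relationships:

import numpy as np

A = np.array([[1.0, -2.0],
              [-2.0, 1.0]])

evals, evecs = np.linalg.eig(A)
print(evals)        # e.g. [ 3. -1.]
print(evecs)        # columns are normalized eigenvectors

# The determinant equals the product of the eigenvalues,
# which is the area scaling factor of the map.
print(np.isclose(np.linalg.det(A), np.prod(evals)))   # True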
6. The eigenvalues for the matrix in Example 7.1 are λ1 = 3 and λ2 = 1.
Let v = 1/√2. For λ1, we solve a homogeneous equation, and find that
we can choose either

r1 = [v, v]^T   or   r1 = [−v, −v]^T

as the eigenvector. For λ2, we can choose either

r2 = [v, −v]^T   or   r2 = [−v, v]^T

as the eigenvector. The matrix of eigenvectors is R = [r1 r2], so this
leads to the four possibilities for R:

R1 = [v v; v −v],   R2 = [v −v; v v],   R3 = [−v v; −v −v],   R4 = [−v −v; −v v],

where the subscripts have no intrinsic meaning; they simply allow us to
refer to each matrix.
It is easy to identify the rotation matrices by examining the diagonal
elements. Matrix R2 is a rotation by 45° and R3 is a rotation by 225°.
Matrix R1 is a reflection about e1 and a rotation by 45°,

R1 = [v −v; v v] [1 0; 0 −1].

Matrix R4 is a reflection about e2 and a rotation by 45°,

R4 = [v −v; v v] [−1 0; 0 1].

Another way to identify the matrices: the reflection-rotation is a symmetric
matrix and thus has real eigenvalues, whereas the rotation has
complex eigenvalues. Sketching the r1 and r2 combinations is also helpful
for identifying the action of the map. This shows us how e1 and e2
are mapped.
7. The matrix A is symmetric, therefore the eigenvalues will be real.
9. The eigenvalues of A are the roots of the characteristic polynomial

p(λ) = (a1,1 − λ)(a2,2 − λ) − a1,2 a2,1

and since multiplication is commutative, the characteristic polynomial


for AT is identical to A’s. Therefore the eigenvalues are identical.
11. The eigenvalues and corresponding eigenvectors for A are λ1 = 4,
r1 = [0, 1]^T, λ2 = 2, r2 = [1, 0]^T.
The projection matrices are

P1 = r1 r1^T = [0 0; 0 1]   and   P2 = r2 r2^T = [1 0; 0 0].

The action of the map on x is

Ax = 4P1 x + 2P2 x = [0, 4]^T + [2, 0]^T = [2, 4]^T.

Draw a sketch!
12. The quadratic form for C1 is an ellipsoid and λi = 4, 2. The quadratic
form for C2 is a hyperboloid and λi = 6.12, −2.12. The quadratic
form for C3 is a paraboloid and λi = 6, 0.

13. The quadratic forms for the given Ci are

f1(v) = 3v1² + 2v1 v2 + 3v2²,
f2(v) = 3v1² + 8v1 v2 + v2²,
f3(v) = 3v1² + 6v1 v2 + 3v2².

15. C = [2 2; 2 3].
16. No, it is not a quadratic form because there is a linear term 2v2.

18. The 2 × 2 matrix A is symmetric, therefore the eigenvalues will be real.
The eigenvalues will be positive since the matrix is positive definite,
which we conclude from the fact that the determinant is positive.

20. Repeated linear maps do not change the eigenvectors, as demonstrated
by (7.18), and the eigenvalues are easily computed. The eigenvalues
of A are λ1 = 3 and λ2 = −1; therefore, for A² we have λ1 = 9 and
λ2 = 1, and for A³ we have λ1 = 27 and λ2 = −1.

Chapter 8

1. For the given vector

r = [4, 2, 4]^T,

we calculate ‖r‖ = 6, then

r/‖r‖ = [2/3, 1/3, 2/3]^T.

‖2r‖ = 2‖r‖ = 12.

3. The cross product of the vectors v and w:

v ∧ w = [0, −1, 1]^T.

5. The sine of the angle between v and w:

sin θ = ‖v ∧ w‖/(‖v‖ ‖w‖) = √2/(1 × √3) = 0.82.

This means that θ = 54.7°. Draw a sketch to double-check this for
yourself.

7. Two 3D lines are skew if they do not intersect. They intersect if there
exists s and t such that l1 (t) = l2 (s), or
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
0 −1 1 0
t ⎣0⎦ − s ⎣−1⎦ = ⎣1⎦ − ⎣0⎦ .
1 1 1 1

Examining each coordinate, we find that s = 1 and t = 1 satisfy these


three equations, thus the lines intersect and are not skew.
8. The point normal form of the plane through p with normal direction r
is found by first defining the normal n by normalizing r:
⎡ ⎤
2/3
r
n= = ⎣1/3⎦ .
·
2/3

The point normal form of the plane is

n(x − p) = 0,

or
2 1 2 2
x1 + x2 + x3 − = 0.
3 3 3 3
10. A parametric form of the plane P through the points p, q, and r is

P(s, t) = p + s(q − p) + t(r − p)
        = (1 − s − t)p + sq + tr
        = (1 − s − t) [0, 0, 1]^T + s [1, 1, 1]^T + t [4, 2, 4]^T.

12. The length d of the projection of w bound to q is calculated by using
two definitions of cos(θ):

cos(θ) = d/‖w‖   and   cos(θ) = (v · w)/(‖v‖ ‖w‖).

Solve for d,

d = (v · w)/‖v‖ = ([1, 0, 0]^T · [1, 1, 1]^T)/‖[1, 0, 0]^T‖ = 1.

The projection length has nothing to do with q!



13. The distance h may be found using two definitions of sin(θ):

sin(θ) = h/‖w‖   and   sin(θ) = ‖v ∧ w‖/(‖v‖ ‖w‖).

Solve for h,

h = ‖v ∧ w‖/‖v‖ = ‖[0, −1, 1]^T‖/‖[1, 0, 0]^T‖ = √2.

14. The cross product of parallel vectors results in the zero vector.

17. The volume V formed by the vectors v, w, u can be computed as the
scalar triple product

V = v · (w ∧ u).

This is invariant under cyclic permutations, thus we can also compute
V as

V = u · (v ∧ w),

which allows us to reuse the cross product from Exercise 3. Thus,

V = [0, 0, 1]^T · [0, −1, 1]^T = 1.
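A sketch of the same scalar triple product in NumPy (not the book's code), with v = [1, 0, 0]^T and w = [1, 1, 1]^T as used in Exercises 12 and 13, and u = [0, 0, 1]^T:

import numpy as np

v = np.array([1.0, 0.0, 0.0])
w = np.array([1.0, 1.0, 1.0])
u = np.array([0.0, 0.0, 1.0])

# Scalar triple product: signed volume of the parallelepiped.
V1 = v @ np.cross(w, u)
V2 = u @ np.cross(v, w)                          # cyclic permutation
V3 = np.linalg.det(np.column_stack((v, w, u)))   # same as a 3 x 3 determinant

print(V1, V2, V3)   # all equal 1.0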

18. This solution is easy enough to determine without formulas, but let's
practice using the equations. First, project w onto v, forming

u1 = ((v · w)/‖v‖²) v = [1, 0, 0]^T.

Next form u2 so that w = u1 + u2:

u2 = w − u1 = [0, 1, 1]^T.

To complete the orthogonal frame, we compute

u3 = u1 ∧ u2 = [0, −1, 1]^T.

Draw a sketch to see what you have created. This frame is not
orthonormal, but that would be easy to do.

19. We use barycentric coordinates to determine the color at a point inside


the triangle given the colors at the vertices:
1 1 1
ic = ip + iq + ir
3 3 3
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 0 1
1⎣ ⎦ 1⎣ ⎦ 1⎣ ⎦
= 0 + 1 + 0
3 3 3
0 0 0
⎡ ⎤
2/3
= ⎣1/3⎦ ,
0
which is reddish-yellow.

Chapter 9

1. The equation in matrix form is

v = [7, 6, 3]^T = [1 2 0; 1 0 3; 1 0 0] [3, 2, 1]^T.

2. The transpose of the given matrix is

[1 −1 2; 5 −2 3; −4 0 −4].

3. The vector u is not an element of the subspace defined by w and v
because we cannot find scalars s and t such that u = sw + tv. This
linear combination requires that t = 0 and t = 1. (We could check if
the volume of the parallelepiped formed by the three vectors, V = u · (v ∧ w),
is zero. Here we find that V = 1, confirming the result.)

5. No, u is by definition orthogonal to v and w, therefore it is not possible
to write u as a linear combination of v and w.

7. No. If v and w were linearly dependent, then they would form a 1D
subspace of R³.

10. The scale matrix is

[2 0 0; 0 1/4 0; 0 0 −4].

This matrix changes the volume of a unit cube to be 2 × 1/4 × (−4) = −2.

12. The shear matrix is

[1 0 −a/c; 0 1 −b/c; 0 0 1].

Shears do not change volume, therefore the volume of the mapped unit
cube is still 1.

14. To rotate about the vector [−1, 0, −1]^T, first form the unit vector
a = [−1/√2, 0, −1/√2]^T. Then, following (9.10), the rotation matrix is

[(1 + √2/2)/2   1/2   (1 − √2/2)/2;
 −1/2   √2/2   1/2;
 (1 − √2/2)/2   −1/2   (1 + √2/2)/2].

The matrices for rotating about an arbitrary vector are difficult to
verify by inspection. One test is to check the vector about which we
rotated. You'll find that

[−1, 0, −1]^T → [−1, 0, −1]^T,

which is precisely correct.

16. The projection is defined as P = AA^T, where A = [u1 u2], which
results in

P = [1/2 1/2 0; 1/2 1/2 0; 0 0 1].

The action of P on a vector v is

v′ = [(v1 + v2)/2, (v1 + v2)/2, v3]^T.

The vectors v1, v2, v3 are mapped to

v1′ = [1, 1, 1]^T,   v2′ = [1/2, 1/2, 0]^T,   v3′ = [0, 0, 0]^T.

This is an orthogonal projection into the plane x1 − x2 = 0. Notice
that the e3 component keeps its value since that component already
lives in this plane.
The first two column vectors are identical, thus the matrix rank is 2
and the determinant is zero.

17. To find the projection direction, we solve the homogeneous system
P d = 0. This system gives us two equations to satisfy:

d1 + d2 = 0   and   d3 = 0,

which have infinitely many nontrivial solutions, d = [c, −c, 0]^T. All
vectors d are mapped to the zero vector. The vector v3 in the previous
exercise is part of the kernel, thus its image v3′ = 0.
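The rotation matrix in Exercise 14 comes from the book's formula (9.10). An equivalent way to build such a matrix is the standard axis-angle (Rodrigues) construction, sketched below for the same axis; the 45° angle is inferred from the matrix entries above, so treat it as an assumption:

import numpy as np

def axis_angle_rotation(axis, theta):
    """Rodrigues' formula: rotation by theta about the unit vector axis."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    K = np.array([[0, -a[2], a[1]],
                  [a[2], 0, -a[0]],
                  [-a[1], a[0], 0]])          # cross-product matrix
    return np.cos(theta) * np.eye(3) + np.sin(theta) * K \
           + (1 - np.cos(theta)) * np.outer(a, a)

R = axis_angle_rotation([-1.0, 0.0, -1.0], np.pi / 4)
print(np.round(R, 3))
print(R @ np.array([-1.0, 0.0, -1.0]))   # the axis itself is unchanged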

20. The matrices have the following determinants:

|A1| = c1 |A|,   |A2| = c1 c2 |A|,   |A3| = c1 c2 c3 |A|.

21. The determinant of the inverse is

|A⁻¹| = 1/|A| = 1/2.

22. The resulting matrix is

[2 3 −4; 3 9 −4; −1 −9 4].

23. The matrix products are as follows:

AB = [5 3 2; 0 2 0; 2 2 1]   and   BA = [2 0 0; 4 3 4; 1 2 3].

25. The inverse matrices are as follows:

rotation: [1/√2 0 −1/√2; 0 1 0; 1/√2 0 1/√2],
scale: [2 0 0; 0 4 0; 0 0 1/2],
projection: no inverse exists.

27. Only square matrices have an inverse, therefore this 3 × 2 matrix has
no inverse.

29. The matrix (A^T)^T is simply A.

Chapter 10

1. An affine map is comprised of a linear map and a translation.

2. Construct the matrices in (10.6). For this problem, use the notation
A = Y X⁻¹, where

Y = [y2 − y1   y3 − y1   y4 − y1] = [1 1 1; −1 0 0; 0 −1 1]

and

X = [x2 − x1   x3 − x1   x4 − x1] = [−1 −1 −1; 1 0 0; 0 −1 1].

The first task is to find X⁻¹:

X⁻¹ = [0 1 0; −1/2 −1/2 −1/2; −1/2 −1/2 1/2].

Always check that XX⁻¹ = I. Now A takes the form:

A = [−1 0 0; 0 −1 0; 0 0 1],

which isn't surprising at all if you sketched the tetrahedra formed by
the xi and yi.
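A NumPy sketch of this construction (not the book's code), using the difference matrices X and Y exactly as written above:

import numpy as np

Y = np.array([[1.0, 1.0, 1.0],
              [-1.0, 0.0, 0.0],
              [0.0, -1.0, 1.0]])
X = np.array([[-1.0, -1.0, -1.0],
              [1.0, 0.0, 0.0],
              [0.0, -1.0, 1.0]])

X_inv = np.linalg.inv(X)
print(X_inv)           # matches the X inverse given above
A = Y @ X_inv
print(np.round(A))     # diag(-1, -1, 1)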
3. The point p is mapped to
⎡ ⎤
−1
p = ⎣−1⎦ .


We can find this point using the matrix from the previous exercise, p′ =
Ap, or with barycentric coordinates. Clearly, the barycentric coordinates
for p with respect to the xi are (1, 1, −1, 0), thus

p′ = 1y1 + 1y2 − 1y3 + 0y4.

6. As always, draw a sketch when possible! The plane that is used here is
shown in Figure B.1. Since the plane is parallel to the e2 axis, only a
side view is shown. The plane is shown as a thick line.


Figure B.1.
The plane for these exercises.

Construct the parallel projection map defined in (10.8):

x′ = [8/14 0 −8/14; 0 1 0; −6/14 0 6/14] x + [6/14, 0, 6/14]^T.

The xi are mapped to

x1′ = [1, 0, 0]^T,   x2′ = [3/7, 1, 3/7]^T,   x3′ = [1, 0, 0]^T,   x4′ = [−2/14, 0, 12/14]^T.

Since x1 is already in the plane, x1′ = x1. The projection direction
v has no e2-component, thus this coordinate is unchanged in x2′.
Additionally, v projects equally into the e1- and e3-axes. Rationalize
the results for x3′ and x4′ yourself.
8. Construct the perspective projection vector equation defined in (10.10).
This will be different for each of the xi:

x1′ = (3/5)/(3/5) x1 = [1, 0, 0]^T,

x2′ = (3/5)/0 x2,

x3′ = (3/5)/(−4/5) x3 = [0, 0, 3/4]^T,

x4′ = (3/5)/(4/5) x4 = [0, 0, 3/4]^T.

There is no solution for x2′ because x2 is projected parallel to the plane.


12. First, translate so that a point p on the line is at the origin. Second,
rotate 90° about the e1-axis with the matrix R. Third, remove the
initial translation. Let's define

p = [1, 1, 0]^T,   R = [1 0 0; 0 cos(90°) −sin(90°); 0 sin(90°) cos(90°)],

and then the affine map is defined as

q′ = R(q − p) + p = [1, 1, −1]^T.

13. Let S be the scale matrix and R be the rotation matrix, then the affine
map is x′ = RSx + l′, or specifically

x′ = [2/√2 0 −2/√2; 0 2 0; 2/√2 0 2/√2] x + [−2, −2, −2]^T.

The point u is mapped to

u′ = [−2, 0, 0.83]^T.

Chapter 11

1. First we find the point normal form of the implicit plane as

(1/3)x1 + (2/3)x2 − (2/3)x3 − 1/3 = 0.

Substitute p into the plane equation to find the distance d to the plane,
d = −5/3 ≈ −1.67. Since the distance is negative, the point is on the
opposite side of the plane as the normal direction.

4. The point on each line that corresponds to the point of closest proximity
is found by setting up the linear system

[x2(s2) − x1(s1)] · v1 = 0,
[x2(s2) − x1(s1)] · v2 = 0.

For the given lines, this linear system is

[3 −1; 1 −3] [s1, s2]^T = [1, −1]^T,

and the solution is s1 = 1/2 and s2 = 1/2. This corresponds to closest
proximity points

x1(1/2) = [1/2, 1/2, 1/2]^T   and   x2(1/2) = [1/2, 1/2, 1/2]^T.

The lines intersect!

5. Make a sketch! You will find that the line is parallel to the plane. The
actual calculations would involve finding the parameter t on the line for
the intersection:

t = ((p − q) · n)/(v · n).

In this exercise, v · n = 0.
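The closest-proximity recipe of Exercise 4 (two orthogonality conditions, then a 2 × 2 solve) looks like this in NumPy; the two example lines below are made up, not the ones from the exercise:

import numpy as np

def closest_points(p1, v1, p2, v2):
    """Closest points of the lines x1(s1)=p1+s1*v1 and x2(s2)=p2+s2*v2."""
    # Require (x2(s2) - x1(s1)) to be orthogonal to both directions.
    M = np.array([[v1 @ v1, -(v1 @ v2)],
                  [v1 @ v2, -(v2 @ v2)]])
    rhs = np.array([(p2 - p1) @ v1, (p2 - p1) @ v2])
    s1, s2 = np.linalg.solve(M, rhs)
    return p1 + s1 * v1, p2 + s2 * v2

# Made-up example: two non-intersecting lines.
q1, q2 = closest_points(np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]),
                        np.array([0.0, 1.0, 1.0]), np.array([0.0, 0.0, 1.0]))
print(q1, q2)   # [0. 0. 0.] and [0. 1. 0.]: the lines are distance 1 apart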

7. The vector a is projected in the direction v, thus

a′ = a + tv,

for some unknown t. The vector a′ is in the plane P, thus a′ · n = 0.
Substitute the expression for a′ into the plane equation,

(a + tv) · n = 0,

from which we solve for t. Therefore, the vector a is projected to the
vector

a′ = a − ((a · n)/(v · n)) v.
9. The parametric form of the plane containing the triangle is given by

x(u, v) = [0, 0, 1]^T + u [1, 0, −1]^T + v [0, 1, −1]^T.

Thus the intersection problem reduces to finding u, v, t in

[2, 2, 2]^T + t [−1, −1, −2]^T = [0, 0, 1]^T + u [1, 0, −1]^T + v [0, 1, −1]^T.

We rewrite this as a linear system:

[1 0 1; 0 1 1; −1 −1 2] [u, v, t]^T = [2, 2, 1]^T.

Since we have yet to discuss Gauss elimination for 3 × 3 matrices, let's
use the hint to solve the system. The first two equations lead to u = 2 − t
and v = 2 − t. This allows us to use the third equation to find t = 5/4,
then we can easily find that

[u, v, t]^T = [3/4, 3/4, 5/4]^T.

Since 1 − u − v = −1/2, the intersection point is outside of the triangle.

10. The reflected direction is

v′ = [1/3, 1/3, −2/3]^T + (4/3) [0, 0, 1]^T = [1/3, 1/3, 2/3]^T.

12. With a sketch, you can find the intersection point without calculation,
but let’s practice calculating the point. The linear system is:
⎡ ⎤ ⎡ ⎤
1 1 0 1
⎣1 0 0⎦ x = ⎣1⎦ ,
0 0 1 4
⎡ ⎤
1
and the solution is ⎣0⎦.
4
14. With a sketch you can find the intersection line without calculation,
but let’s practice calculating the line. The planes x1 = 1 and x3 = 4
have normal vectors
⎡ ⎤ ⎡ ⎤
1 0
n1 = ⎣0⎦ and n2 = ⎣0⎦ ,
0 1

respectively. Form the vector


⎡ ⎤
0
v = n1 ∧ n2 = ⎣1⎦ ,
0

and then a third plane is defined by v · x = 0. Set up a linear system


to intersect these three planes:
⎡ ⎤ ⎡ ⎤
1 0 0 1
⎣0 0 1⎦ p = ⎣4⎦ ,
0 1 0 0

and the solution is ⎡ ⎤


1
⎣0⎦ .
4
The intersection of the given two planes is the line
⎡ ⎤ ⎡ ⎤
1 0
l(t) = ⎣0⎦ + t ⎣1⎦ .
4 0

16. Set

b1 = [1, 0, 0]^T,

and then calculate

b2 = [1, 1, 0]^T − (1) [1, 0, 0]^T = [0, 1, 0]^T,
b3 = [−1, −1, 1]^T − (−1) [1, 0, 0]^T − (−1) [0, 1, 0]^T = [0, 0, 1]^T.

Each vector is unit length, so we have an orthonormal frame.

Chapter 12

1. Refer to the system as Au = b. We do not expect a unique solution
since |A| = 0. In fact, the ai span the [e1, e3]-plane and b lives in R³,
therefore the system is inconsistent because no solution exists.

3. The ti must be unique. If ti = tj then two rows of the matrix would be
identical and result in a zero determinant.
4. The steps for transforming A to upper triangular are given along with
the augmented matrix after each step.
Exchange row1 and row3:

[2 0 0 1 | 1]
[0 0 1 −2 | 2]
[1 0 −1 2 | −1]
[1 1 1 1 | −3]

row3 ← row3 − 1/2 row1, row4 ← row4 − 1/2 row1:

[2 0 0 1 | 1]
[0 0 1 −2 | 2]
[0 0 −1 3/2 | −3/2]
[0 1 1 1/2 | −7/2]

Exchange row2 and row4:

[2 0 0 1 | 1]
[0 1 1 1/2 | −7/2]
[0 0 −1 3/2 | −3/2]
[0 0 1 −2 | 2]

row4 ← row4 + row3:

[2 0 0 1 | 1]
[0 1 1 1/2 | −7/2]
[0 0 −1 3/2 | −3/2]
[0 0 0 −1/2 | 1/2]

This is the upper triangular form of the coefficient matrix. Back
substitution results in the solution vector

v = [1, −3, 0, −1]^T.
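The same pivot-and-eliminate procedure can be written as a compact NumPy routine (a sketch, not the book's code); here it is applied to the system as it stands after the first row exchange above, which has the same solution:

import numpy as np

def solve_with_pivoting(A, b):
    """Forward elimination with partial pivoting, then back substitution."""
    A = A.astype(float)
    b = b.astype(float)
    n = len(b)
    for j in range(n - 1):
        p = j + np.argmax(np.abs(A[j:, j]))      # pivot row
        A[[j, p]], b[[j, p]] = A[[p, j]], b[[p, j]]
        for i in range(j + 1, n):
            f = A[i, j] / A[j, j]
            A[i, j:] -= f * A[j, j:]
            b[i] -= f * b[j]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):               # back substitution
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

A = np.array([[2, 0, 0, 1], [0, 0, 1, -2], [1, 0, -1, 2], [1, 1, 1, 1]])
b = np.array([1, 2, -1, -3])
print(solve_with_pivoting(A, b))    # [ 1. -3.  0. -1.]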

5. The steps for transforming A to upper triangular are given along with
the augmented matrix after each step.
Exchange row1 and row2:

[1 0 0 | 0]
[0 0 1 | −1]
[1 1 1 | −1]

row3 ← row3 − row1:

[1 0 0 | 0]
[0 0 1 | −1]
[0 1 1 | −1]

Exchange row2 and row3:

[1 0 0 | 0]
[0 1 1 | −1]
[0 0 1 | −1]

This is the upper triangular form of the coefficient matrix. Back
substitution results in the solution vector

v = [0, 0, −1]^T.

7. The linear system in row echelon form is

[1 2/3 0; 0 1 −2; 0 0 1] u = [1/3, 0, 1/4]^T.

9. The 5 × 5 permutation matrix that exchanges rows 3 and 4 is

[1 0 0 0 0; 0 1 0 0 0; 0 0 0 1 0; 0 0 1 0 0; 0 0 0 0 1].

Let's test the matrix: [v1, v2, v3, v4, v5]^T is mapped to [v1, v2, v4, v3, v5]^T.



11. For a 3 × 3 matrix, G = G2 P2 G1 P1, and in this example, there is no
pivoting for column 2, thus P2 = I. The other matrices are

P1 = [0 1 0; 1 0 0; 0 0 1],   G1 = [1 0 0; −1/2 1 0; −1 0 1],   G2 = [1 0 0; 0 1 0; 0 1 1],

resulting in

G = [0 1 0; 1 −1/2 0; 1 −3/2 1].

Let's check this result by computing

GA = [0 1 0; 1 −1/2 0; 1 −3/2 1] [2 −2 0; 4 0 −2; 4 2 −4] = [4 0 −2; 0 −2 1; 0 0 −1],

which is the final upper triangular matrix from the example.

13. For this homogeneous system, we apply forward elimination, resulting
in

[4 1 2; 0 1/2 0; 0 0 0] u = [0, 0, 0]^T.

Before executing back substitution, assign a nonzero value to u3, so
choose u3 = 1. Back substitution results in u2 = 0 and u1 = −1/2.

16. Simply by looking at the matrix, it is clear that the second and third
column vectors are linearly dependent, thus it is not invertible.

17. This is a rotation matrix, and the inverse is simply the transpose:

[cos θ 0 sin θ; 0 1 0; −sin θ 0 cos θ].

19. This matrix is not square, therefore it is not invertible.

21. The forward substitution algorithm for solving the lower triangular
linear system Ly = b is as follows.
Forward substitution:
y1 = b1
For j = 2, . . . , n,
yj = bj − y1 lj,1 − . . . − yj−1 lj,j−1.
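Written out in Python, that loop might look like the sketch below. It assumes, as the algorithm above does, that L is unit lower triangular (ones on the diagonal), as produced by Gauss elimination; the example matrix is made up:

import numpy as np

def forward_substitution(L, b):
    """Solve L y = b for a unit lower triangular L (ones on the diagonal)."""
    n = len(b)
    y = np.zeros(n)
    y[0] = b[0]
    for j in range(1, n):
        # y_j = b_j - l_{j,1} y_1 - ... - l_{j,j-1} y_{j-1}
        y[j] = b[j] - L[j, :j] @ y[:j]
    return y

L = np.array([[1.0, 0.0, 0.0],
              [2.0, 1.0, 0.0],
              [-1.0, 3.0, 1.0]])
b = np.array([1.0, 4.0, 2.0])
y = forward_substitution(L, b)
print(y, np.allclose(L @ y, b))   # [ 1.  2. -3.] True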

23. The determinant is equal to 5.


24. The rank is 3 since the determinant is nonzero.

26. The determinants that we need to calculate are

|3 0 1; 1 2 0; 1 1 1| = 5

and

|8 0 1; 6 2 0; 6 1 1| = 10,   |3 8 1; 1 6 0; 1 6 1| = 10,   |3 0 8; 1 2 6; 1 1 6| = 10.

The solution is [2, 2, 2]^T.
28. This intersection problem was introduced as Example 11.4 and it is
illustrated in Sketch 11.12. The linear system is

[1 0 1; 0 0 1; 0 1 0] [x1, x2, x3]^T = [1, 1, 2]^T.

Solving it by Gauss elimination needs very few steps. We simply
exchange rows two and three and then apply back substitution to find

[x1, x2, x3]^T = [0, 2, 1]^T.

30. Use the explicit form of the line, x2 = ax1 + b. The overdetermined
system is

[−2 1; −1 1; 0 1; 1 1; 2 1] [a, b]^T = [2, 1, 0, 1, 2]^T.

Following (12.21), we form the normal equations and the linear system
becomes

[10 0; 0 5] [a, b]^T = [0, 6]^T.

Thus the least squares line is x2 = 6/5. Sketch the data and the line
to convince yourself.

32. Use the explicit form of the line, x2 = ax1 + b. The linear system for
two points is not overdetermined, however let's apply the least squares
technique to see what happens. The system is

[−4 1; 4 1] [a, b]^T = [1, 3]^T.

The normal equations are

[32 0; 0 2] [a, b]^T = [8, 4]^T.

Thus the least squares line is x2 = (1/4)x1 + 2. Let's evaluate the line
at the given points:

x2 = (1/4)(−4) + 2 = 1,   x2 = (1/4)(4) + 2 = 3.

This is the interpolating line! This is what we expect since the solution
to the original linear system must be unique. Sketch the data and the
line to convince yourself.
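The normal-equations route of Exercise 30 is easy to reproduce in NumPy; a minimal sketch (not the book's code):

import numpy as np

# Overdetermined system for the line x2 = a*x1 + b through the five points.
A = np.array([[-2.0, 1.0], [-1.0, 1.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
rhs = np.array([2.0, 1.0, 0.0, 1.0, 2.0])

# Normal equations: (A^T A) x = A^T rhs
x = np.linalg.solve(A.T @ A, A.T @ rhs)
print(A.T @ A, A.T @ rhs)   # [[10. 0.] [0. 5.]]  [0. 6.]
print(x)                    # [0.  1.2]  ->  least squares line x2 = 6/5

# np.linalg.lstsq solves the same problem (more robustly, via the SVD).
print(np.linalg.lstsq(A, rhs, rcond=None)[0])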

Chapter 13

1. The linear system after running through the algorithm for j = 1 is

[−1.41 −1.41 −1.41; 0 −0.14 −0.14; 0 −0.1 0.2] u = [−1.41, 0, 0.3]^T.

The linear system for j = 2 is

[−1.41 −1.41 −1.41; 0 0.17 −0.02; 0 0 0.24] u = [−1.41, −0.19, 0.24]^T,

and from this one we can use back substitution to find the solution

u = [1, −1, 1]^T.

3. The Euclidean norm is also called the 2-norm, and for

v = [1, 1, 1, 1]^T,

‖v‖2 = √(1² + 1² + 1² + 1²) = 2.

6. The new vector norm is max{2|v1|, |v2|, . . . , |vn|}. First we show that the
norm satisfies the properties in (13.6)–(13.9). Because of the absolute
value function, max{2|v1|, |v2|, . . . , |vn|} ≥ 0 and max{2|v1|, |v2|, . . . ,
|vn|} = 0 only for v = 0. The third property is satisfied as follows:

‖cv‖ = max{2|cv1|, |cv2|, . . . , |cvn|}
     = |c| max{2|v1|, |v2|, . . . , |vn|}
     = |c| ‖v‖.

Finally, the triangle inequality is satisfied as follows:

‖v + w‖ = max{2|v1 + w1|, |v2 + w2|, . . . , |vn + wn|}
        ≤ max{2|v1| + 2|w1|, |v2| + |w2|, . . . , |vn| + |wn|}
        ≤ max{2|v1|, |v2|, . . . , |vn|} + max{2|w1|, |w2|, . . . , |wn|}
        = ‖v‖ + ‖w‖.

The outline of all unit vectors takes the form of a rectangle with lower
left vertex l and upper right vertex u:

l = [−1/2, −1]^T,   u = [1/2, 1]^T.

8. The image vectors Avi are given by

Av1 = [1, 0]^T,   Av2 = [1, 1]^T,   Av3 = [−1, 0]^T,   Av4 = [−1, −1]^T.

Their 2-norms are given by 1, √2, 1, √2, thus max ‖Avi‖ = √2 ≈ 1.41,
which is a reasonable guess for the true value 1.62. As we increase the
number of vi (to 10, say), we will get much closer to the true norm.

9. No, because for any real number c, det cA = cⁿ det A. This violates
(13.15).

10. One is the smallest possible condition number for a nonsingular matrix
A since

κ(A) = σ1/σn,

and σn ≤ σ1 by definition. (For a singular matrix, σn = 0.)

11. To find the condition number of A, we calculate its singular values by
first forming

A^T A = [1.49 −0.79; −0.79 1.09].

The characteristic equation |A^T A − λI| = 0 is

λ² − 2.58λ + 1 = 0,

and its roots are

λ1 = 2.10   and   λ2 = 0.475.

The singular values of A are σ1 = √2.10 = 1.45 and σ2 = √0.475 =
0.69. Thus the condition number of the matrix A is κ(A) = σ1/σn =
2.10.

13. With the given projection matrix A, we form

A^T A = [1 0; 0 0].

This symmetric matrix has eigenvalues λ1 = 1 and λ2 = 0, thus A
has singular values σ1 = 1 and σ2 = 0. The condition number κ(A) =
1/0 = ∞. This confirms what we already know from Section 5.9: a
projection matrix is not invertible.

15. For any i,

ui = [1/i, 1, −1/i]^T.

Hence there is a limit, namely [0, 1, 0]^T.

17. We have

D = [4 0 0; 0 8 0; 0 0 2]   and   R = [0 0 −1; 2 0 2; 1 0 0].

Hence

u(2) = [0.25 0 0; 0 0.125 0; 0 0 0.5] ([2, −2, 0]^T − R [0, 0, 0]^T) = [0.5, −0.25, 0]^T.

Next:

u(3) = [0.25 0 0; 0 0.125 0; 0 0 0.5] ([2, −2, 0]^T − R [0.5, −0.25, 0]^T) = [0.5, −0.375, −0.25]^T.

Similarly, we find

u(4) = [0.438, −0.313, −0.25]^T.

The true solution is

u = [0.444, −0.306, −0.222]^T.

19. The first three iterations yield the vectors

[−0.5, −0.25, −0.25, −0.5]^T,   [0.187, 0.437, 0.4375, 0.187]^T,   [−0.156, 0.093, 0.0937, −0.156]^T.

The actual solution is

[−0.041, 0.208, 0.208, −0.041]^T.
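The iteration in Exercise 17 is Gauss-Jacobi with the splitting A = D + R; a NumPy sketch (not the book's code) using the D, R, and right-hand side shown above:

import numpy as np

D = np.diag([4.0, 8.0, 2.0])
R = np.array([[0.0, 0.0, -1.0],
              [2.0, 0.0, 2.0],
              [1.0, 0.0, 0.0]])
b = np.array([2.0, -2.0, 0.0])

u = np.zeros(3)                       # start from the zero vector
D_inv = np.diag(1.0 / np.diag(D))
for k in range(25):
    u = D_inv @ (b - R @ u)           # Jacobi update: all components at once

print(u)                              # approaches [0.444 -0.306 -0.222]
print(np.linalg.solve(D + R, b))      # the true solution for comparison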

Chapter 14

1. Yes, r, a linear combination of u, v, w, is also in R⁴ because of the
linearity property.

2. Yes, C is an element of M3×3 because the rules of matrix arithmetic
are consistent with the linearity property.
are consistent with the linearity property.
4. No. If we multiply an element of that set by −1, we produce a vector
that is not in the set, a violation of the linearity condition.
5. No, v = u + w.
7. Yes, r is a linear combination of u, v, w, namely r = 2u + 3w, therefore
it is in the subspace defined by these vectors.
10. The matrix A is 2 × 4. Since the vi are mapped to the wi via a linear
map, we know that the same linear relationship holds among the image
vectors,
w4 = 3w1 + 6w2 + 9w3 .

11. Forward elimination produces the matrix

[1 2 0; 0 0 1; 0 0 0; 0 0 0],

from which we conclude the rank is 2.


14. This inner product satisfies the symmetry and positivity requirements,
however it fails homogeneity and additivity. Let's look at homogeneity:

⟨αv, w⟩ = (αv1)² w1² + (αv2)² w2² + (αv3)² w3²
        = α² ⟨v, w⟩
        ≠ α ⟨v, w⟩.

16. Yes, ⟨A, B⟩ satisfies the properties (14.3)–(14.6) of an inner product.
Symmetry: This is easily satisfied since products, ai,j bi,j, are commutative.
Positivity:

⟨A, A⟩ = a1,1² + a1,2² + . . . + a3,3² ≥ 0,

and equality is achieved if all ai,j = 0. If A is the zero matrix, all
ai,j = 0, then ⟨A, B⟩ = 0.
Homogeneity and Additivity: If C is also in this space, then

⟨αA + βB, C⟩ = (αa1,1 + βb1,1)c1,1 + . . . + (αa3,3 + βb3,3)c3,3
             = αa1,1 c1,1 + βb1,1 c1,1 + . . . + αa3,3 c3,3 + βb3,3 c3,3
             = α ⟨A, C⟩ + β ⟨B, C⟩.

17. Yes. We have to check the four defining properties (14.3)–(14.6). Each
is easily verified.
Symmetry: For instance p0 q0 = q0 p0, so this property is easily verified.
Positivity: ⟨p, p⟩ = p0² + p1² + p2² ≥ 0 and ⟨p, p⟩ = 0 if p0 = p1 = p2 = 0.
If p(t) = 0 for all t, then p0 = p1 = p2 = 0, and clearly then ⟨p, p⟩ = 0.
Homogeneity and Additivity: We can tackle these together,

⟨αr + βp, q⟩ = α ⟨r, q⟩ + β ⟨p, q⟩,

by first noting that if r is also a quadratic polynomial, then

αr + βp = αr0 + βp0 + (αr1 + βp1)t + (αr2 + βp2)t².

This means that

⟨αr + βp, q⟩ = (αr0 + βp0)q0 + (αr1 + βp1)q1 + (αr2 + βp2)q2
             = αr0 q0 + βp0 q0 + αr1 q1 + βp1 q1 + αr2 q2 + βp2 q2
             = α ⟨r, q⟩ + β ⟨p, q⟩.

19. The Gram-Schmidt method produces

b1 = [1/√2, 1/√2, 0]^T,   b2 = [−1/√2, 1/√2, 0]^T,   b3 = [0, 0, −1]^T.

Knowing that the bi are normalized and checking that bi · bj = 0, we
can be confident that this is an orthonormal basis. Another tool we
have is the determinant, which will be one,

|b1 b2 b3| = 1.

21. One such basis is given by the four matrices

[1 0; 0 0],   [0 1; 0 0],   [0 0; 1 0],   [0 0; 0 1].

23. No. The linearity conditions are violated. For example Φ(−u) = Φ(u),
contradicting the linearity condition, which would demand Φ(αu) =
αΦ(u) with α = −1.
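A compact Gram-Schmidt routine in NumPy (a sketch, not the book's code; the three input vectors below are made up, since the exercise's vectors are not restated here):

import numpy as np

def gram_schmidt(vectors):
    """Return an orthonormal basis built from the given independent vectors."""
    basis = []
    for v in vectors:
        w = v - sum((v @ b) * b for b in basis)   # remove existing components
        basis.append(w / np.linalg.norm(w))
    return np.array(basis)

V = [np.array([1.0, 1.0, 0.0]),
     np.array([0.0, 1.0, 0.0]),
     np.array([0.0, 1.0, 1.0])]
B = gram_schmidt(V)
print(np.round(B, 3))
print(np.allclose(B @ B.T, np.eye(3)))   # True: rows are orthonormal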

Chapter 15

1. First we find the eigenvalues by looking for the roots of the characteristic
equation

det[A − λI] = |2 − λ   1; 1   2 − λ| = 0,                    (B.1)

which is λ² − 4λ + 3 = 0 and when factored becomes (λ − 3)(λ − 1) = 0.
This tells us that the eigenvalues are λ1 = 3 and λ2 = 1.
The eigenvectors ri are found by inserting each λi into [A − λi I]r = 0.
We find that

r1 = [1, 1]^T   and   r2 = [−1, 1]^T.

3. Since this is an upper triangular matrix, the eigenvalues are on the


diagonal, thus they are 4, 3, 2, 1.
4. The eigenvalue is 2 because Ar = 2r.
6. The rank of A is two since one eigenvalue is zero. A matrix with a zero
eigenvalue is singular, thus the determinant must be zero.
8. The vector e2 is an eigenvector since Re2 = e2 and the corresponding
eigenvalue is 1.
10. The dominant eigenvalue is λ1 = 3.
12. We have
r^(2) = [5, 3, 5]^T,  r^(3) = [25, 13, 21]^T,  r^(4) = [121, 55, 93]^T.
The last two give the ratios
r_1^(4)/r_1^(3) = 4.84,  r_2^(4)/r_2^(3) = 4.23,  r_3^(4)/r_3^(3) = 4.43.
The true dominant eigenvalue is 4.65.
13. We have
r^(2) = [0, −1, 6]^T,  r^(3) = [48, −13, 2]^T,  r^(4) = [−368, −17, 410]^T.
The last two give the ratios
r_1^(4)/r_1^(3) = −7.66,  r_2^(4)/r_2^(3) = 1.31,  r_3^(4)/r_3^(3) = 205.
These ratios are not yet close to revealing the true dominant eigenvalue, −13.02.
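Exercises 12 and 13 both track iterates of the power method. A minimal sketch of that iteration follows; the matrix and start vector are placeholders, since the exercises' own data are not restated in these solutions:

    import numpy as np

    def power_iteration(A, r, steps):
        # repeatedly apply A; the component-wise ratios of successive
        # iterates approach the dominant eigenvalue
        for _ in range(steps):
            r = A @ r
        return r

    A = np.array([[2.0, 1.0],       # placeholder matrix, not the exercise's
                  [1.0, 2.0]])
    r1 = np.array([1.0, 0.0])       # placeholder start vector r(1)
    r3 = power_iteration(A, r1, 2)  # r(3)
    r4 = power_iteration(A, r1, 3)  # r(4)
    print(r4 / r3)                  # ratios tend toward the dominant eigenvalue (3 here)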
14. Stochastic: B, D. Not stochastic: A has a negative element. One
column of C does not sum to one.
16. Figure B.2 illustrates the directed graph.
[Figure B.2: Graph showing the connectivity defined by C.]
The corresponding stochastic matrix is
D = \begin{bmatrix} 0 & 0 & 0 & 1/2 & 0 \\ 1/3 & 0 & 0 & 0 & 1/3 \\ 0 & 1 & 0 & 1/2 & 1/3 \\ 1/3 & 0 & 0 & 0 & 1/3 \\ 1/3 & 0 & 1 & 0 & 0 \end{bmatrix}.

19. The map Lf = f'' has eigenfunctions sin(kx) for k = 1, 2, . . . since
d^2 sin(kx)/dx^2 = −k d cos(kx)/dx = −k^2 sin(kx),
and the corresponding eigenvalues are −k^2.
Chapter 16

1. First we form
A^T A = \begin{bmatrix} 1 & 0 \\ 0 & 16 \end{bmatrix},
and identify the eigenvalues as λ_i = 16, 1. The corresponding normalized eigenvectors are the columns of
V = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.
The singular values of A are σ_1 = √16 = 4 and σ_2 = √1 = 1, thus
Σ = \begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix}.
Next we form
AA^T = \begin{bmatrix} 1 & 0 \\ 0 & 16 \end{bmatrix},
and as this matrix is identical to A^T A, we can construct U = V without much work at all. The SVD of A is complete: A = UΣV^T.
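A numerical cross-check of the SVD; the exercise's matrix is not restated here, so A = diag(1, 4), one matrix consistent with the A^T A formed above, is used as a stand-in:

    import numpy as np

    A = np.diag([1.0, 4.0])        # assumed stand-in consistent with A^T A above
    U, sigma, Vt = np.linalg.svd(A)
    print(sigma)                                    # expect [4, 1]
    print(np.allclose(U @ np.diag(sigma) @ Vt, A))  # True: A = U Sigma V^T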
2. The eigendecomposition is identical to the SVD because the matrix is
symmetric and positive definite.
4. The semi-major axis length is 2 and the semi-minor axis length is 1/2.
This matrix is symmetric and positive definite, so the singular values are
identical to the eigenvalues, which are clearly λi = 2, 1/2. In general,
these lengths are the singular values.
7. Since A is a diagonal matrix, the eigenvalues are easily identified to be λ_i = −2, 1, 1, hence det A = −2 × 1 × 1 = −2. To find A's singular values we find the eigenvalues of
A^T A = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},
which are 4, 1, 1. Therefore, the singular values of A are σ_i = 2, 1, 1 and |det A| = 2 × 1 × 1 = 2. The two equations do indeed return the same value for |det A|.
9. Since the first and last column vectors are linearly dependent, this matrix has rank 2; thus we know one eigenvalue, λ_3 = 0, and det A = 0. Similarly, a rank-deficient matrix such as this is guaranteed to have σ_3 = 0, confirming that det A = 0.
10. The pseudoinverse is defined in (16.7) as A† = V Σ† U^T. The SVD of A = UΣV^T is comprised of
U = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},  Σ = \begin{bmatrix} 2 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix},  V = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.
To form the pseudoinverse of A, we find
Σ† = \begin{bmatrix} 1/2 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix},
then
A† = \begin{bmatrix} 0 & 1 & 0 \\ -0.5 & 0 & 0 \end{bmatrix}.
12. First we form A^T A = [4]; this is a 1 × 1 matrix. The eigenvalue is 4 and the corresponding eigenvector is 1, so V = [1]. Next we form
AA^T = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix},
which has eigenvalues λ_i = 4, 0, 0 and the corresponding eigenvectors form the columns of
U = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix};
note that the last two columns of U form a basis for the null space of A^T. The next step is to form
Σ† = \begin{bmatrix} 1/2 & 0 & 0 \end{bmatrix},
and then the pseudoinverse is
A† = V Σ† U^T = \begin{bmatrix} 1/2 & 0 & 0 \end{bmatrix}.
Check (16.8) and (16.9) to be sure this solution satisfies the properties of the pseudoinverse.
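NumPy's pinv reproduces this result; the column vector A = [2, 0, 0]^T below is a stand-in consistent with the A^T A and AA^T formed above:

    import numpy as np

    A = np.array([[2.0], [0.0], [0.0]])   # assumed stand-in consistent with A^T A = [4]
    A_dagger = np.linalg.pinv(A)
    print(A_dagger)                           # expect [[0.5, 0, 0]]
    print(np.allclose(A @ A_dagger @ A, A))   # Moore-Penrose property A A† A = A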
14. The least squares system is Ax = b. We proceed with the enumerated steps.
1. Compute the SVD of A:
U = \begin{bmatrix} 0 & 1 & 0 \\ 0.89 & 0 & -0.45 \\ 0.45 & 0 & 0.89 \end{bmatrix},  Σ = \begin{bmatrix} 2.24 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix},  V = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.
The pseudoinverse of Σ is
Σ† = \begin{bmatrix} 0.45 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}.
2. Compute
z = U^T b = [0.89, 0, −0.45]^T.
3. Compute
y = Σ† z = [0.4, 0]^T.
4. Compute the solution
x = V y = [0, 0.4]^T.
We can check this against the normal equation approach:
A^T A v = A^T b,
\begin{bmatrix} 1 & 0 \\ 0 & 5 \end{bmatrix} v = \begin{bmatrix} 0 \\ 2 \end{bmatrix},
and we can see that the solution is the same as that from the pseudoinverse.
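The same least squares solution can be reproduced numerically; A and b below are stand-ins reconstructed to be consistent with the SVD factors, A^T A, and A^T b shown above:

    import numpy as np

    A = np.array([[1.0, 0.0],
                  [0.0, 2.0],
                  [0.0, 1.0]])
    b = np.array([0.0, 1.0, 0.0])

    x_pinv = np.linalg.pinv(A) @ b                  # pseudoinverse (SVD) route
    x_normal = np.linalg.solve(A.T @ A, A.T @ b)    # normal equations route
    print(x_pinv, x_normal)                         # both approximately [0, 0.4]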
17. Form the covariance matrix by first creating
X^T X = \begin{bmatrix} 12 & 8 \\ 8 & 12 \end{bmatrix},
and then divide this matrix by the number of points, 7; thus the covariance matrix is
C = \begin{bmatrix} 1.71 & 1.14 \\ 1.14 & 1.71 \end{bmatrix}.
Find the (normalized) eigenvectors of C and make them the columns of V. Looking at a sketch of the points and from our experience so far, it is clear that
V = \begin{bmatrix} 0.707 & -0.707 \\ 0.707 & 0.707 \end{bmatrix}.
The data points transformed into the principal components coordinate system are
X̂ = XV = \begin{bmatrix} -2.82 & 0 \\ -1.41 & 0 \\ 0 & 0 \\ 1.41 & 0 \\ 2.82 & 0 \\ 0 & 1.41 \\ 0 & -1.41 \end{bmatrix}.
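A short PCA sketch; the seven (already centered) data points are reconstructed here to be consistent with X^T X and the transformed coordinates above, since the exercise's own point list is not repeated:

    import numpy as np

    X = np.array([[-2.0, -2.0], [-1.0, -1.0], [0.0, 0.0], [1.0, 1.0],
                  [2.0, 2.0], [-1.0, 1.0], [1.0, -1.0]])
    C = X.T @ X / len(X)               # covariance matrix
    eigenvalues, V = np.linalg.eigh(C)
    # eigh orders eigenvalues ascending, so column order and signs may
    # differ from the hand-chosen V in the text
    print(np.round(C, 2))
    print(np.round(X @ V, 2))          # principal components coordinates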

Chapter 17

1a. First of all, draw a sketch. Before calculating the barycentric coordinates (u, v, w) of the point [0, 1.5]^T, notice that this point is on the edge formed by p_1 and p_3. Thus, the barycentric coordinate v = 0.
The problem now is simply to find u and w such that
[0, 1.5]^T = u [1, 1]^T + w [−1, 2]^T
and u + w = 1. This is simple enough to see, without computing! The barycentric coordinates are (1/2, 0, 1/2).
1b. Add the point p = [0, 0]^T to the sketch from the previous exercise. Notice that p, p_1, and p_2 are collinear. Thus, we know that w = 0.
The problem now is to find u and v such that
[0, 0]^T = u [1, 1]^T + v [2, 2]^T
and u + v = 1. This is easy to see: u = 2 and v = −1. If this wasn't obvious, you would calculate
u = ‖p_2 − p‖ / ‖p_2 − p_1‖,
then v = 1 − u.
Thus, the barycentric coordinates of p are (2, −1, 0).
If you were to write a subroutine to calculate the barycentric coordinates, you would not proceed as we did here. Instead, you would calculate the area of the triangle and two of the three sub-triangle areas. The third barycentric coordinate, say w, can be calculated as 1 − u − v.
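Here is a sketch of that area-based subroutine, using the triangle p_1 = [1, 1]^T, p_2 = [2, 2]^T, p_3 = [−1, 2]^T that appears throughout this exercise:

    import numpy as np

    def signed_area(a, b, c):
        # signed area of the triangle (a, b, c)
        return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))

    def barycentric(p, p1, p2, p3):
        area = signed_area(p1, p2, p3)
        u = signed_area(p, p2, p3) / area     # sub-triangle opposite p1
        v = signed_area(p1, p, p3) / area     # sub-triangle opposite p2
        return u, v, 1.0 - u - v              # w from the partition of unity

    p1, p2, p3 = np.array([1.0, 1.0]), np.array([2.0, 2.0]), np.array([-1.0, 2.0])
    print(barycentric(np.array([0.0, 0.0]), p1, p2, p3))   # expect (2, -1, 0)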
1c. To calculate the barycentric coordinates (i_1, i_2, i_3) of the incenter, we need the lengths of the sides of the triangle:
s_1 = 3,  s_2 = √5,  s_3 = √2.
The sum of the lengths, or circumference c, is approximately c = 6.65. Thus the barycentric coordinates are
(3/6.65, √5/6.65, √2/6.65) = (0.45, 0.34, 0.21).
Always double-check that the barycentric coordinates sum to one. Additionally, check that these barycentric coordinates result in a point in the correct location:
0.45 [1, 1]^T + 0.34 [2, 2]^T + 0.21 [−1, 2]^T = [0.92, 1.55]^T.
Plot this point on your sketch, and this looks correct! Recall the incenter is the intersection of the three angle bisectors.
1d. Referring to the circumcenter equations from Section 17.3, first calculate the dot products
d_1 = [1, 1]^T · [−2, 1]^T = −1,
d_2 = [−1, −1]^T · [−3, 0]^T = 3,
d_3 = [2, −1]^T · [3, 0]^T = 6,
then D = 18. The barycentric coordinates (cc_1, cc_2, cc_3) of the circumcenter are
cc_1 = −1 × 9/18 = −1/2,
cc_2 = 3 × 5/18 = 5/6,
cc_3 = 6 × 2/18 = 2/3.
The circumcenter is
−(1/2) [1, 1]^T + (5/6) [2, 2]^T + (2/3) [−1, 2]^T = [0.5, 2.5]^T.
Plot this point on your sketch. Construct the perpendicular bisectors of each edge to verify.
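The same numbers can be reproduced in a few lines; the expressions for D and the cc_i below are inferred from the values in this solution rather than restated from Section 17.3:

    import numpy as np

    p1, p2, p3 = np.array([1.0, 1.0]), np.array([2.0, 2.0]), np.array([-1.0, 2.0])
    d1 = np.dot(p2 - p1, p3 - p1)
    d2 = np.dot(p3 - p2, p1 - p2)
    d3 = np.dot(p1 - p3, p2 - p3)
    D = 2.0 * (d1 * d2 + d2 * d3 + d3 * d1)
    cc = np.array([d1 * (d2 + d3), d2 * (d1 + d3), d3 * (d1 + d2)]) / D
    center = cc[0] * p1 + cc[1] * p2 + cc[2] * p3
    print(cc)        # expect (-1/2, 5/6, 2/3)
    print(center)    # expect (0.5, 2.5)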
1e. The centroid of the triangle is simple:
c = (1/3)([1, 1]^T + [2, 2]^T + [−1, 2]^T) = [2/3, 5/3]^T.
5. The unit normal is [0, 0, 1]^T. It is easy to determine since the triangle lives in the [e_1, e_2]-plane.
6. Form two vectors that span the triangle,
v = [0, 1, 0]^T and w = [−1, 0, 1]^T,
then the normal is formed as
n = (v ∧ w)/‖v ∧ w‖ = (1/√2) [1, 0, 1]^T.
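In code this is a single cross product (a NumPy sketch, added for reference):

    import numpy as np

    v = np.array([0.0, 1.0, 0.0])
    w = np.array([-1.0, 0.0, 1.0])
    n = np.cross(v, w)
    n /= np.linalg.norm(n)
    print(n)    # expect (1, 0, 1)/sqrt(2)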

Chapter 18

3. A rhombus is equilateral but not equiangular.
5. The winding number is 0.
6. The area is 3.
8. The estimated normal to the polygon, using the methods from Section 18.7, is
n = (1/√6) [1, 1, 2]^T.
9. The outlier is p_4; it should be
[0, 3, −3/2]^T.
Three ideas for planarity tests were given in Section 18.8. There is not
just one way to solve this problem, but one is not suitable for finding
the outlier. If we use the “average normal test,” which calculates the
centroid of all points, then it will lie outside the plane since it is cal-
culated using the outlier. Either the “volume test” or the “plane test”
could be used.
For so few points, we can practically determine the true plane and the
outlier. In “real world” applications, where we would have thousands of
points, the question above is ill-posed. We would be inclined to calculate
an average plane and then check if any points deviate significantly from
this plane. Can you think of a good method to construct such an average
plane?

Chapter 19

1. The matrix form for a circle in standard position is
x^T \begin{bmatrix} λ & 0 \\ 0 & λ \end{bmatrix} x − r^2 = 0.
2. The ellipse is
(1/25) x_1^2 + (1/4) x_2^2 = 1.
4. The equation of the ellipse is
2x_1^2 + 10x_2^2 − 4 = 0.
It is fairly clear that rotating by 90° simply exchanges λ_1 and λ_2; however, as practice, we could construct the rotation:
A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 10 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 0 \\ 0 & 10 \end{bmatrix}.
Thus the rotated ellipse is 10x_1^2 + 2x_2^2 − 4 = 0.
5. The implicit equation of the hyperbola is
4x_1^2 + 4x_2^2 + 12x_1x_2 − 4 = 0.
In matrix form, the hyperbola is
x^T \begin{bmatrix} 4 & 6 \\ 6 & 4 \end{bmatrix} x − 4 = 0.

7. The implicit equation of the conic in expanded form is
−5x_1^2 − 6x_2^2 − 2x_1x_2 + 2x_1 + 3 = 0.
9. The eigendecomposition is A = RDR^T, where
R = \begin{bmatrix} 1/√2 & −1/√2 \\ 1/√2 & 1/√2 \end{bmatrix} and D = \begin{bmatrix} 6 & 0 \\ 0 & 4 \end{bmatrix}.
11. A conic in matrix form is given by (19.10) and in this case,
A = \begin{bmatrix} 1 & -1 \\ -1 & 0 \end{bmatrix}.
Since det A < 0, the conic is a hyperbola.
13. A conic in matrix form is given by (19.10) and in this case,
A = \begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix}.
Since det A = 0, the conic is a parabola.
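The determinant test used in Exercises 11 and 13 is easy to package up; a small sketch (ignoring degenerate conics, and using a tolerance for the zero test):

    import numpy as np

    def classify_conic(A, tol=1e-12):
        d = np.linalg.det(A)
        if d > tol:
            return "ellipse"
        if d < -tol:
            return "hyperbola"
        return "parabola"

    print(classify_conic(np.array([[1.0, -1.0], [-1.0, 0.0]])))   # hyperbola (Exercise 11)
    print(classify_conic(np.array([[2.0, 0.0], [0.0, 0.0]])))     # parabola (Exercise 13)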



15. Write the circle in matrix form,
x^T \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} x − 2x^T \begin{bmatrix} 3 \\ -1 \end{bmatrix} + 6 = 0.
The translation is the solution to the linear system
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} v = \begin{bmatrix} 3 \\ -1 \end{bmatrix},
which is
v = [3, −1]^T.
To write the circle in standard position, we need c = v^T A v − 6 = 4. Therefore, the circle in standard form is
x^T \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} x − 4 = 0.
Divide this equation by 4 so we can easily see the linear map that takes the circle to the ellipse,
x^T \begin{bmatrix} 1/4 & 0 \\ 0 & 1/4 \end{bmatrix} x − 1 = 0.
Thus we need the scaling
\begin{bmatrix} 8 & 0 \\ 0 & 16 \end{bmatrix}.
17. An ellipse is the result because a conic’s type is invariant under affine
maps.

Chapter 20

1. The solution is shown in Figure B.3.
[Figure B.3: The curve for Exercise 1 in Chapter 20, with control points b_0, b_1, b_2, b_3.]
2. The evaluation point at t = 1/2 is calculated with the following de Casteljau scheme; each new entry is the midpoint of the entry to its left and the entry diagonally above it:

d_0 = [0, 0]^T
d_1 = [6, 3]^T   [3, 1.5]^T
d_2 = [3, 6]^T   [4.5, 4.5]^T   [3.75, 3]^T
d_3 = [2, 4]^T   [2.5, 5]^T   [3.5, 4.75]^T   [3.625, 3.875]^T
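A compact de Casteljau sketch reproduces the final point of this scheme and, with t = 2, the extrapolation of Exercise 9:

    import numpy as np

    def de_casteljau(points, t):
        points = [np.asarray(p, dtype=float) for p in points]
        while len(points) > 1:
            # one round of repeated linear interpolation
            points = [(1 - t) * p + t * q for p, q in zip(points, points[1:])]
        return points[0]

    d = [(0, 0), (6, 3), (3, 6), (2, 4)]
    print(de_casteljau(d, 0.5))   # expect (3.625, 3.875)
    print(de_casteljau(d, 2.0))   # extrapolation as in Exercise 9: expect (16, -22)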

3. The first derivative is
ḋ(0) = 3([6, 3]^T − [0, 0]^T) = 3 [6, 3]^T.
The second derivative is
d̈(0) = 3 × 2([3, 6]^T − 2[6, 3]^T + [0, 0]^T) = 6 [−9, 0]^T.
4. The de Casteljau scheme for the evaluation at t = 1/2 is given in the solution to Exercise 2. Thus the control polygon corresponding to 0 ≤ t ≤ 1/2 is given by the diagonal of the scheme:
[0, 0]^T, [3, 1.5]^T, [3.75, 3]^T, [3.625, 3.875]^T.
The polygon for 1/2 ≤ t ≤ 1 is given by the bottom row, reading from right to left:
[3.625, 3.875]^T, [3.5, 4.75]^T, [2.5, 5]^T, [2, 4]^T.
Try to sketch the algorithm into Figure B.3.
5. The monomial coefficients a_i are
a_0 = [0, 0]^T,  a_1 = [18, 9]^T,  a_2 = [−27, 0]^T,  a_3 = [11, −5]^T.
For additional insight, compare these vectors to the point and derivatives of the Bézier curve at t = 0.
6. The minmax box is given by two points, which are the “lower left” and “upper right” extents determined by the control polygon,
[0, 0]^T and [6, 6]^T.
7. First, for the curvature at t = 0 we find
ḋ(0) = [18, 9]^T,  d̈(0) = [−54, 0]^T,  |ḋ d̈| = 486.
Thus
κ(0) = |ḋ d̈| / ‖ḋ‖^3 = 486/8150.46 = 0.06. (B.2)
Next, for the curvature at t = 1/2 we find
ḋ(1/2) = [−0.75, 5.25]^T,  d̈(1/2) = [−21, −15]^T,  |ḋ d̈| = 121.5.
Thus
κ(1/2) = |ḋ d̈| / ‖ḋ‖^3 = 121.5/149.16 = 0.81. (B.3)
Examining your sketch of the curve, notice that the curve is flatter at t = 0, and this is reflected by these curvature values.
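These curvature values can be double-checked numerically with the cubic Bézier derivative formulas (a NumPy sketch added for reference):

    import numpy as np

    d = np.array([[0.0, 0.0], [6.0, 3.0], [3.0, 6.0], [2.0, 4.0]])

    def cubic_bezier_derivatives(d, t):
        v = 3 * ((1 - t)**2 * (d[1] - d[0])
                 + 2 * (1 - t) * t * (d[2] - d[1])
                 + t**2 * (d[3] - d[2]))
        a = 6 * ((1 - t) * (d[2] - 2 * d[1] + d[0])
                 + t * (d[3] - 2 * d[2] + d[1]))
        return v, a

    def curvature(d, t):
        v, a = cubic_bezier_derivatives(d, t)
        return abs(v[0] * a[1] - v[1] * a[0]) / np.linalg.norm(v)**3

    print(round(curvature(d, 0.0), 2), round(curvature(d, 0.5), 2))   # 0.06 and about 0.81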
8. The Frenet frame at t = 1/2 is
f_1 = [0.14, 0.99, 0]^T,  f_2 = [0, 0, 1]^T,  f_3 = [0.99, −0.14, 0]^T.
9. We will evaluate the curve at t = 2 using the de Casteljau algorithm. Let's use the triangular schematic to guide the evaluation.

d_0 = [0, 0]^T
d_1 = [6, 3]^T   [12, 6]^T
d_2 = [3, 6]^T   [0, 9]^T   [−12, 12]^T
d_3 = [2, 4]^T   [1, 2]^T   [2, −5]^T   [16, −22]^T

We have moved fairly far from the polygon!
10. To achieve tangent continuity, the curve c(t) must have
c_3 = d_0,
and c_2 must be on the line formed by d_0 and d_1:
c_2 = d_0 + c[d_0 − d_1]
for positive c. Let's choose c = 1, then
c_2 = [−6, −3]^T.
We are free to choose c_1 and c_0 anywhere!