
Applied Linear Algebra
and Optimization
using MATLAB®
License, Disclaimer of Liability, and Limited Warranty

By purchasing or using this book (the “Work”), you agree that this license grants
permission to use the contents contained herein, but does not give you the right
of ownership to any of the textual content in the book or ownership to any of
the information or products contained in it. This license does not permit up-
loading of the Work onto the Internet or on a network (of any kind) without the
written consent of the Publisher. Duplication or dissemination of any text, code,
simulations, images, etc. contained herein is limited to and subject to licensing
terms for the respective products, and permission must be obtained from the
Publisher or the owner of the content, etc., in order to reproduce or network
any portion of the textual material (in any media) that is contained in the Work.

Mercury Learning and Information (“MLI” or “the Publisher”) and any-


one involved in the creation, writing, or production of the accompanying algo-
rithms, code, or computer programs (“the software”), and any accompanying
Web site or software of the Work, cannot and do not warrant the performance
or results that might be obtained by using the contents of the Work. The au-
thor, developers, and the Publisher have used their best efforts to ensure the
accuracy and functionality of the textual material and/or programs contained
in this package; we, however, make no warranty of any kind, express or implied,
regarding the performance of these contents or programs. The Work is sold “as
is” without warranty (except for defective materials used in manufacturing the
book or due to faulty workmanship).

The author, developers, and the publisher of any accompanying content, and
anyone involved in the composition, production, and manufacturing of this work
will not be liable for damages of any kind arising out of the use of (or the inabil-
ity to use) the algorithms, source code, computer programs, or textual material
contained in this publication. This includes, but is not limited to, loss of revenue
or profit, or other incidental, physical, or consequential damages arising out of
the use of this Work.

The sole remedy in the event of a claim of any kind is expressly limited to
replacement of the book, and only at the discretion of the Publisher. The use of
“implied warranty” and certain “exclusions” vary from state to state, and might
not apply to the purchaser of this product.
Applied Linear Algebra
and Optimization
using MATLAB®

Rizwan Butt, PhD

Mercury Learning and Information


Dulles, Virginia
Copyright © 2011 by Mercury Learning and Information.
All rights reserved.

This publication, portions of it, or any accompanying software may not be reproduced
in any way, stored in a retrieval system of any type, or transmitted by any means,
media, electronic display or mechanical display, including, but not limited to,
photocopy, recording, Internet postings, or scanning, without prior permission in
writing from the publisher.

Publisher: David Pallai


Mercury Learning and Information
22841 Quicksilver Drive
Dulles, VA 20166
[email protected]
www.merclearning.com
1-800-758-3756

This book is printed on acid-free paper.

R. Butt, PhD. Applied Linear Algebra and Optimization using MATLAB®

ISBN: 978-1-936420-04-9

The publisher recognizes and respects all marks used by companies, manufacturers,
and developers as a means to distinguish their products. All brand names and
product names mentioned in this book are trademarks or service marks of their
respective companies. Any omission or misuse (of any kind) of service marks or
trademarks, etc. is not an attempt to infringe on the property of others.

Library of Congress Control Number: 2010941258


Our titles are available for adoption, license, or bulk purchase by institutions,
corporations, etc. For additional information, please contact the
Customer Service Dept. at 1-800-758-3756 (toll free).

The sole obligation of Mercury Learning and Information to the purchaser is to


replace the disc, based on defective materials or faulty workmanship, but not based on
the operation or functionality of the product.
Dedicated to
Muhammad Sarwar Khan,
The Greatest Friend in the World
Contents

Preface xv

Acknowledgments xix

1 Matrices and Linear Systems 1


1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Linear Systems in Matrix Notation . . . . . . . . . 7
1.2 Properties of Matrices and Determinants . . . . . . . . . 10
1.2.1 Introduction to Matrices . . . . . . . . . . . . . . . 10
1.2.2 Some Special Matrix Forms . . . . . . . . . . . . . 15
1.2.3 Solutions of Linear Systems of Equations . . . . . 30
1.2.4 The Determinant of a Matrix . . . . . . . . . . . . . 38
1.2.5 Homogeneous Linear Systems . . . . . . . . . . . 62
1.2.6 Matrix Inversion Method . . . . . . . . . . . . . . . 68
1.2.7 Elementary Matrices . . . . . . . . . . . . . . . . . 71
1.3 Numerical Methods for Linear Systems . . . . . . . . . . 74
1.4 Direct Methods for Linear Systems . . . . . . . . . . . . . 74
1.4.1 Cramer’s Rule . . . . . . . . . . . . . . . . . . . . 75
1.4.2 Gaussian Elimination Method . . . . . . . . . . . . 79
1.4.3 Pivoting Strategies . . . . . . . . . . . . . . . . . . 99
1.4.4 Gauss–Jordan Method . . . . . . . . . . . . . . . . 106
1.4.5 LU Decomposition Method . . . . . . . . . . . . . 111
1.4.6 Tridiagonal Systems of Linear Equations . . . . . . 157
1.5 Conditioning of Linear Systems . . . . . . . . . . . . . . . 161
1.5.1 Norms of Vectors and Matrices . . . . . . . . . . . 162
1.5.2 Errors in Solving Linear Systems . . . . . . . . . . 167

1.6 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . 180


1.6.1 Curve Fitting, Electric Networks, and Traffic Flow . 180
1.6.2 Heat Conduction . . . . . . . . . . . . . . . . . . . 189
1.6.3 Chemical Solutions and
Balancing Chemical Equations . . . . . . . . . . . 192
1.6.4 Manufacturing, Social, and Financial Issues . . . . 195
1.6.5 Allocation of Resources . . . . . . . . . . . . . . . 201
1.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
1.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 204

2 Iterative Methods for Linear Systems 243


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 243
2.2 Jacobi Iterative Method . . . . . . . . . . . . . . . . . . . 245
2.3 Gauss–Seidel Iterative Method . . . . . . . . . . . . . . . 252
2.4 Convergence Criteria . . . . . . . . . . . . . . . . . . . . 270
2.5 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . 280
2.6 Successive Over-Relaxation Method . . . . . . . . . . . . 294
2.7 Conjugate Gradient Method . . . . . . . . . . . . . . . . . 308
2.8 Iterative Refinement . . . . . . . . . . . . . . . . . . . . . 313
2.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
2.10 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 316

3 The Eigenvalue Problems 327


3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 327
3.2 Linear Algebra and Eigenvalues Problems . . . . . . . . . 348
3.3 Diagonalization of Matrices . . . . . . . . . . . . . . . . . 357
3.4 Basic Properties of Eigenvalue Problems . . . . . . . . . 374
3.5 Some Results of Eigenvalues Problems . . . . . . . . . . 393
3.6 Applications of Eigenvalue Problems . . . . . . . . . . . . 397
3.6.1 System of Differential Equations . . . . . . . . . . 397
3.6.2 Difference Equations . . . . . . . . . . . . . . . . . 405
3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 409
3.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 410

4 Numerical Computation of Eigenvalues 417


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 417

4.2 Vector Iterative Methods for Eigenvalues . . . . . . . . . . 418


4.2.1 Power Method . . . . . . . . . . . . . . . . . . . . 419
4.2.2 Inverse Power Method . . . . . . . . . . . . . . . . 429
4.2.3 Shifted Inverse Power Method . . . . . . . . . . . 433
4.3 Location of the Eigenvalues . . . . . . . . . . . . . . . . . 438
4.3.1 Gerschgorin Circles Theorem . . . . . . . . . . . . 438
4.3.2 Rayleigh Quotient . . . . . . . . . . . . . . . . . . 440
4.4 Intermediate Eigenvalues . . . . . . . . . . . . . . . . . . 442
4.5 Eigenvalues of Symmetric Matrices . . . . . . . . . . . . 446
4.5.1 Jacobi Method . . . . . . . . . . . . . . . . . . . . 448
4.5.2 Sturm Sequence Iteration . . . . . . . . . . . . . . 455
4.5.3 Given’s Method . . . . . . . . . . . . . . . . . . . . 460
4.5.4 Householder’s Method . . . . . . . . . . . . . . . . 465
4.6 Matrix Decomposition Methods . . . . . . . . . . . . . . . 473
4.6.1 QR Method . . . . . . . . . . . . . . . . . . . . . . 473
4.6.2 LR Method . . . . . . . . . . . . . . . . . . . . . . 479
4.6.3 Upper Hessenberg Form . . . . . . . . . . . . . . 482
4.6.4 Singular Value Decomposition . . . . . . . . . . . 491
4.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
4.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 500

5 Interpolation and Approximation 511


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 511
5.2 Polynomial Approximation . . . . . . . . . . . . . . . . . . 513
5.2.1 Lagrange Interpolating Polynomials . . . . . . . . 514
5.2.2 Newton’s General Interpolating Formula . . . . . . 530
5.2.3 Aitken’s Method . . . . . . . . . . . . . . . . . . . 552
5.2.4 Chebyshev Polynomials . . . . . . . . . . . . . . . 557
5.3 Least Squares Approximation . . . . . . . . . . . . . . . . 574
5.3.1 Linear Least Squares . . . . . . . . . . . . . . . . 575
5.3.2 Polynomial Least Squares . . . . . . . . . . . . . . 581
5.3.3 Nonlinear Least Squares . . . . . . . . . . . . . . 585
5.3.4 Least Squares Plane . . . . . . . . . . . . . . . . . 601
5.3.5 Trigonometric Least Squares Polynomial . . . . . 604
5.3.6 Least Squares Solution of an
Overdetermined System . . . . . . . . . . . . . . . 608

5.3.7 Least Squares Solution of an
Underdetermined System . . . . . . . . . . . . . . 613
5.3.8 The Pseudoinverse of a Matrix . . . . . . . . . . . 619
5.3.9 Least Squares with QR Decomposition . . . . . . 622
5.3.10 Least Squares with Singular
Value Decomposition . . . . . . . . . . . . . . . . 628
5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 634
5.5 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 635

6 Linear Programming 653


6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 653
6.2 General Formulation . . . . . . . . . . . . . . . . . . . . . 655
6.3 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . 656
6.4 Linear Programming Problems . . . . . . . . . . . . . . . 656
6.4.1 Formulation of Mathematical Model . . . . . . . . 657
6.4.2 Formulation of Mathematical Model . . . . . . . . 659
6.5 Graphical Solution of LP Models . . . . . . . . . . . . . . 660
6.5.1 Reversed Inequality Constraints . . . . . . . . . . 668
6.5.2 Equality Constraints . . . . . . . . . . . . . . . . . 668
6.5.3 Minimum Value of a Function . . . . . . . . . . . . 669
6.5.4 LP Problem in Canonical Form . . . . . . . . . . . 676
6.5.5 LP Problem in Standard Form . . . . . . . . . . . . 677
6.5.6 Some Important Definitions . . . . . . . . . . . . . 682
6.6 The Simplex Method . . . . . . . . . . . . . . . . . . . . . 683
6.6.1 Basic and Nonbasic Variables . . . . . . . . . . . . 683
6.6.2 The Simplex Algorithm . . . . . . . . . . . . . . . . 684
6.6.3 Simplex Method for Minimization Problem . . . . . 690
6.7 Unrestricted in Sign Variables . . . . . . . . . . . . . . . . 693
6.8 Finding a Feasible Basis . . . . . . . . . . . . . . . . . . . 695
6.8.1 By Trial and Error . . . . . . . . . . . . . . . . . . . 695
6.8.2 Use of Artificial Variables . . . . . . . . . . . . . . 696
6.9 Big M Simplex Method . . . . . . . . . . . . . . . . . . . . 697
6.10 Two-Phase Simplex Method . . . . . . . . . . . . . . . . . 701
6.11 Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 706
6.11.1 Comparison of Primal and Dual Problems . . . . . 708
6.11.2 Primal-Dual Problems in Standard Form . . . . . . 711

6.12 Sensitivity Analysis in
Linear Programming . . . . . . . . . . . . . . . . . . . . . 717
6.13 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 721
6.14 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 722

7 Nonlinear Programming 735


7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 735
7.2 Review of Differential Calculus . . . . . . . . . . . . . . . 736
7.2.1 Limits of Functions . . . . . . . . . . . . . . . . . . 736
7.2.2 Continuity of a Function . . . . . . . . . . . . . . . 738
7.2.3 Derivative of a Function . . . . . . . . . . . . . . . 739
7.2.4 Local Extrema of a Function . . . . . . . . . . . . 742
7.2.5 Directional Derivatives and the Gradient Vector . . 752
7.2.6 Hessian Matrix . . . . . . . . . . . . . . . . . . . . 757
7.2.7 Taylor’s Series Expansion . . . . . . . . . . . . . . 762
7.2.8 Quadratic Forms . . . . . . . . . . . . . . . . . . . 768
7.3 Nonlinear Equations and Systems . . . . . . . . . . . . . 774
7.3.1 Bisection Method . . . . . . . . . . . . . . . . . . . 775
7.3.2 Fixed-Point Method . . . . . . . . . . . . . . . . . 781
7.3.3 Newton’s Method . . . . . . . . . . . . . . . . . . . 786
7.3.4 System of Nonlinear Equations . . . . . . . . . . . 789
7.4 Convex and Concave Functions . . . . . . . . . . . . . . 802
7.5 Standard Form of a Nonlinear
Programming Problem . . . . . . . . . . . . . . . . . . . . 818
7.6 One-Dimensional Unconstrained
Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 819
7.6.1 Golden-Section Search . . . . . . . . . . . . . . . 819
7.6.2 Quadratic Interpolation . . . . . . . . . . . . . . . 825
7.6.3 Newton’s Method . . . . . . . . . . . . . . . . . . . 831
7.7 Multidimensional Unconstrained
Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 835
7.7.1 Gradient Methods . . . . . . . . . . . . . . . . . . 840
7.7.2 Newton’s Method . . . . . . . . . . . . . . . . . . . 850
7.8 Constrained Optimization . . . . . . . . . . . . . . . . . . 855
7.8.1 Lagrange Multipliers . . . . . . . . . . . . . . . . . 855
7.8.2 The Kuhn–Tucker Conditions . . . . . . . . . . . . 868

7.8.3 Karush–Kuhn–Tucker Conditions . . . . . . . . . . 870


7.9 Generalized Reduced-Gradient Method . . . . . . . . . . 881
7.10 Separable Programming . . . . . . . . . . . . . . . . . . . 890
7.11 Quadratic Programming . . . . . . . . . . . . . . . . . . . 895
7.12 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 900
7.13 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 901

Appendices 917

A Number Representations and Errors 917


A.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 917
A.2 Number Representations and the Base of Numbers . . . 918
A.2.1 Normalized Floating-Point Representations . . . . 921
A.2.2 Rounding and Chopping . . . . . . . . . . . . . . . 924
A.3 Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 925
A.4 Sources of Errors . . . . . . . . . . . . . . . . . . . . . . . 927
A.4.1 Human Errors . . . . . . . . . . . . . . . . . . . . . 927
A.4.2 Truncation Errors . . . . . . . . . . . . . . . . . . . 927
A.4.3 Round-off Errors . . . . . . . . . . . . . . . . . . . 928
A.5 Effect of Round-off Errors in
Arithmetic Operations . . . . . . . . . . . . . . . . . . . . 929
A.5.1 Round-off Errors in Addition and Subtraction . . . 929
A.5.2 Round-off Errors in Multiplication . . . . . . . . . . 931
A.5.3 Round-off Errors in Division . . . . . . . . . . . . . 933
A.5.4 Round-off Errors in Powers and Roots . . . . . . . 935
A.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 937
A.7 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 938

B Mathematical Preliminaries 941


B.1 The Vector Space . . . . . . . . . . . . . . . . . . . . . . 941
B.1.1 Vectors in Two Dimensions . . . . . . . . . . . . . 942
B.1.2 Vectors in Three Dimensions . . . . . . . . . . . . 947
B.1.3 Lines and Planes in Space . . . . . . . . . . . . . 964
B.2 Complex Numbers . . . . . . . . . . . . . . . . . . . . . . 976
B.2.1 Geometric Representation of Complex Numbers . 977
B.2.2 Operations on Complex Numbers . . . . . . . . . 978

B.2.3 Polar Forms of Complex Numbers . . . . . . . . . 980


B.2.4 Matrices with Complex Entries . . . . . . . . . . . 983
B.2.5 Solving Systems with Complex Entries . . . . . . . 984
B.2.6 Determinants of Complex Numbers . . . . . . . . 984
B.2.7 Complex Eigenvalues and Eigenvectors . . . . . . 985
B.3 Inner Product Spaces . . . . . . . . . . . . . . . . . . . . 986
B.3.1 Properties of Inner Products . . . . . . . . . . . . 987
B.3.2 Complex Inner Products . . . . . . . . . . . . . . . 990
B.4 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 992

C Introduction to MATLAB 1007


C.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 1007
C.2 Some Basic MATLAB Operations . . . . . . . . . . . . . . 1008
C.2.1 MATLAB Numbers and Numeric Formats . . . . . 1010
C.2.2 Arithmetic Operations . . . . . . . . . . . . . . . . 1012
C.2.3 MATLAB Mathematical Functions . . . . . . . . . . 1014
C.2.4 Scalar Variables . . . . . . . . . . . . . . . . . . . 1015
C.2.5 Vectors . . . . . . . . . . . . . . . . . . . . . . . . 1016
C.2.6 Matrices . . . . . . . . . . . . . . . . . . . . . . . . 1020
C.2.7 Creating Special Matrices . . . . . . . . . . . . . . 1024
C.2.8 Matrix Operations . . . . . . . . . . . . . . . . . . 1032
C.2.9 Strings and Printing . . . . . . . . . . . . . . . . . 1035
C.2.10 Solving Linear Systems . . . . . . . . . . . . . . . 1037
C.2.11 Graphing in MATLAB . . . . . . . . . . . . . . . . . 1044
C.3 Programming in MATLAB . . . . . . . . . . . . . . . . . . 1051
C.3.1 Statements for Control Flow . . . . . . . . . . . . . 1051
C.3.2 For Loop . . . . . . . . . . . . . . . . . . . . . . . 1052
C.3.3 While Loop . . . . . . . . . . . . . . . . . . . . . . 1052
C.3.4 Nested for Loops . . . . . . . . . . . . . . . . . . . 1053
C.3.5 Structure . . . . . . . . . . . . . . . . . . . . . . . 1054
C.4 Defining Functions . . . . . . . . . . . . . . . . . . . . . . 1056
C.5 MATLAB Built-in Functions . . . . . . . . . . . . . . . . . 1059
C.6 Symbolic Computation . . . . . . . . . . . . . . . . . . . . 1061
C.6.1 Some Important Symbolic Commands . . . . . . . 1064
C.6.2 Solving Equations Symbolically . . . . . . . . . . . 1069
C.6.3 Calculus . . . . . . . . . . . . . . . . . . . . . . . . 1071

C.6.4 Symbolic Ordinary Differential Equations . . . . . 1077


C.6.5 Linear Algebra . . . . . . . . . . . . . . . . . . . . 1079
C.6.6 Eigenvalues and Eigenvectors . . . . . . . . . . . 1080
C.6.7 Plotting Symbolic Expressions . . . . . . . . . . . 1081
C.7 Symbolic Math Toolbox Functions . . . . . . . . . . . . . 1083
C.8 Index of MATLAB Programs . . . . . . . . . . . . . . . . . 1086
C.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 1089
C.10 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 1090

D Answers to Selected Exercises 1097


D.0.1 Chapter 1 . . . . . . . . . . . . . . . . . . . . . . . 1097
D.0.2 Chapter 2 . . . . . . . . . . . . . . . . . . . . . . . 1107
D.0.3 Chapter 3 . . . . . . . . . . . . . . . . . . . . . . . 1108
D.0.4 Chapter 4 . . . . . . . . . . . . . . . . . . . . . . . 1111
D.0.5 Chapter 5 . . . . . . . . . . . . . . . . . . . . . . . 1115
D.0.6 Chapter 6 . . . . . . . . . . . . . . . . . . . . . . . 1118
D.0.7 Chapter 7 . . . . . . . . . . . . . . . . . . . . . . . 1120
D.0.8 Appendix A . . . . . . . . . . . . . . . . . . . . . . 1122
D.0.9 Appendix B . . . . . . . . . . . . . . . . . . . . . . 1123
D.0.10 Appendix C . . . . . . . . . . . . . . . . . . . . . . 1126

Bibliography 1129

Index 1145
Preface

This book presents an integrated approach to numerical linear algebra and


optimization theory based on a computer—in this case, using the software
package MATLAB. This book has evolved over many years from lecture
notes on Numerical Linear Algebra and Optimization Theory that accom-
pany both graduate and post-graduate courses in mathematics at the King
Saud University at Riyadh, Saudi Arabia. These courses deal with linear
equations, approximations, eigenvalue problems, and linear and nonlinear
optimization problems. We discuss several numerical methods for solving
both linear systems of equations and optimization problems. It is generally
accepted that linear algebra methods aid in finding the solution of linear
and nonlinear optimization problems.

The main approach used in this book is quite different from that of currently
available books, which are either too theoretical or too computational. The
approach adopted in this book lies between these two extremes. The
book fully exploits MATLAB’s symbolic, numerical, and graphical capabil-
ities to develop a thorough understanding of linear algebra and optimiza-
tion algorithms.

The book covers two distinct topics: linear algebra and optimization
theory. Linear algebra plays an important role in both applied and
theoretical mathematics, as well as in all of science and engineering,

computer science, probability and statistics, economics, numerical analysis,


and many other disciplines. Nowadays, a proper grounding in both calcu-
lus and linear algebra is an essential prerequisite for a successful career in
science, engineering, and mathematics. Linear algebra can be viewed as the
mathematical apparatus needed to solve potentially huge linear systems,
to understand their underlying structure, and to apply what is learned in
other contexts. The term linear is the key and, in fact, refers not just to
linear algebraic equations, but also to linear differential equations, linear
boundary value problems, linear iterative systems, and so on.

The other focus of this book is on optimization theory. This theory


is the study of the extremal values of a function; its maxima and min-
ima. The topics in this theory range from conditions for existence of a
unique extremal value to methods—both analytic and numeric—for find-
ing the extremal values, and for what values of the independent variables
the function attains its extremes. It is a branch of mathematics that en-
compasses many diverse areas of optimization and minimization. The more
modern term is operational research. It includes the calculus of variations,
control theory, convex optimization theory, decision theory, game theory,
linear and nonlinear programming, queuing systems, etc. In this book we
emphasize only linear and nonlinear programming problems.

A wide range of applications appears throughout the book. They have


been chosen and written to give the student a sense of the broad range of
applicability of linear algebra and optimization theory. These applications
include theoretical applications such as the use of linear algebra in dif-
ferential equations, difference equations, and least squares analysis.

When dealing with linear algebra or optimization theory, we often need


a computer. We believe that computers can improve the conceptual un-
derstanding of mathematics, not just enable the completion of complicated
calculations. We have chosen MATLAB as our standard package because
it is a widely used software for working with matrices. The surge of pop-
ularity in MATLAB is related to the increasing popularity of UNIX and
computer graphics. To what extent numerical computations will be pro-
grammed in MATLAB in the future is uncertain. A short introduction to

MATLAB is given in Appendix C, and the programs in the text serve as


further examples.

The topics are discussed in a simplified manner with a number of ex-


amples illustrating the different concepts and applications. Most of the
sections contain a fairly large number of exercises, some of which relate
to real-life problems. Chapter 1 covers the basic concepts of matrices and
determinants and describes the basic computational methods used to solve
nonhomogeneous linear equations. Direct methods, including Cramer’s
rule, the Gaussian elimination method and its variants, the Gauss–Jordan
method, and LU decomposition methods, are discussed. It also covers
the conditioning of linear systems. Many ill-conditioned problems are dis-
cussed. The chapter closes with the many interesting applications of linear
systems. In Chapter 2, we discuss iterative methods, including the Ja-
cobi method, the Gauss–Seidel method, the SOR iterative method, the
conjugate gradient method, and the residual corrector method. Chapter
3 covers the selected methods of computing matrix eigenvalues. The ap-
proach discussed here should help students understand the relationship of
eigenvalues to the roots of characteristic equations. We define eigenvalues
and eigenvectors and study several examples. We discuss the diagonaliza-
tion of matrices and the computation of powers of diagonalizable matrices.
Some interesting applications of the eigenvalues and eigenvectors of a ma-
trix are also discussed at the end of the chapter. In Chapter 4, various
numerical methods are discussed for the eigenvalues of matrices. Among
them are the power iterative methods, the Jacobi method, Given’s method,
the Householder method, the QR iteration method, the LR method, and
the singular value decomposition method. Chapter 5 describes the ap-
proximation of functions. In this chapter we also describe curve fitting of
experimental data based on least squares methods. We discuss linear, non-
linear, plane, and trigonometric function least squares approximations. We
use QR decomposition and singular value decomposition for the solution of
the least squares problem. In Chapter 6, we describe standard linear pro-
gramming formulations. The subject of linear programming, in general,
involves the development of algorithms and methodologies in optimization.
The field, developed by George Dantzig and his associates in 1947, is now
widely used in industry and has its foundation in linear algebra. In keep-

ing with the intent of this book, this chapter presents the mathematical
formulations of basic linear programming problems. In Chapter 7, we de-
scribe nonlinear programming formulations. We discuss many numerical
methods for solving unconstrained and constrained problems. In the be-
ginning of the chapter some of the basic mathematical concepts useful in
developing optimization theory are presented. For unconstrained optimiza-
tion problems we discuss the golden-section search method and quadratic
interpolation method, which depend on the initial guesses that bracket the
single optimum, and Newton’s method, which is based on the idea from
calculus that the minimum or maximum can be found by solving f'(x) = 0.
For the functions of several variables, we use the steepest descent method
and Newton’s method. For handling nonlinear optimization problems with
constraints, we discuss the generalized reduced-gradient method, Lagrange
multipliers, and KT conditions. At the end of the chapter, we also discuss
quadratic programming problems and the separable programming prob-
lems.

In each chapter, we discuss several examples to guide students step-


by-step through the most complex topics. Since the only real way to learn
mathematics is to use it, there is a list of exercises provided at the end
of each chapter. These exercises range from very easy to quite difficult.
This book is completely self-contained, with all the necessary mathematical
background given in it. Finally, this book provides balanced convergence
of the theory, application, and numerical computation of all the topics dis-
cussed.

Appendix A covers different kinds of errors that are preparatory sub-


jects for numerical computations. To explain the sources of these errors,
there is a brief discussion of Taylor’s series and how numbers are computed
and saved in computers. Appendix B consists of a brief introduction to
vectors in space and a review of complex numbers and how to do linear
algebra with them. It is also devoted to general inner product spaces and
to how different notations and processes generalize. In Appendix C, we dis-
cuss the basic commands for the software package MATLAB. In Appendix
D, we give answers to selected odd-numbered exercises.
Acknowledgments

I wish to express my gratitude to all those colleagues, friends, and as-


sociates of mine, without whose help this work would not have been possible. I am
grateful, especially, to Dr. Saleem, Dr. Zafar Ellahi, Dr. Esia Al-Said, and
Dr. Salah Hasan for reading earlier versions of the manuscript and for pro-
viding encouraging comments. I have written this book as the background
material for an interactive first course in linear algebra and optimization.
The encouragement and positive feedback that I have received during the
design and development of the book have given me the energy required to
complete the project.

I also want to express my heartfelt thanks to a special person who has


been very helpful to me in a great many ways over the course of my career:
Muhammad Sarwar Khan, of King Saud University, Riyadh, Saudi Arabia.

My sincere thanks are also due to the Deanship of the Scientific Re-
search Center, College of Science, King Saud University, Riyadh, KSA,
for financial support and for providing facilities throughout the research
project No. (Math/2008/05/B).

It has taken me five years to write this book and thanks must go to my
long-suffering family for my frequent unsocial behavior over these years. I
am profoundly grateful to my wife Saima, and our children Fatima, Usman,

Fouzan, and Rahmah, for their patience, encouragement, and understand-


ing throughout this project. Special thanks goes to my elder daughter,
Fatima, for creating all the figures in this project.

Dr. Rizwan Butt


Department of Mathematics,
College of Science
King Saud University
August, 2010
Chapter 1

Matrices and Linear Systems

1.1 Introduction
When engineering systems are modeled, the mathematical description is
frequently developed in terms of a set of algebraic simultaneous equations.
Sometimes these equations are nonlinear and sometimes linear. In this
chapter, we discuss systems of simultaneous linear equations and describe
the numerical methods for the approximate solutions of such systems. The
solution of a system of simultaneous linear algebraic equations is proba-
bly one of the most important topics in engineering computation. Prob-
lems involving simultaneous linear equations arise in the areas of elasticity,
electric-circuit analysis, heat transfer, vibrations, and so on. Also, the
numerical integration of some types of ordinary and partial differential
equations may be reduced to the solution of such a system of equations. It
has been estimated, for example, that about 75% of all scientific problems
require the solution of a system of linear equations at one stage or another.
It is therefore important to be able to solve linear problems efficiently and
accurately.

Definition 1.1 (Linear Equation)

It is an equation in which the highest exponent in a variable term is no


more than one. The graph of such an equation is a straight line. •

A linear equation in two variables x1 and x2 is an equation that can be


written in the form
a1 x1 + a2 x2 = b,
where a1 , a2 , and b are real numbers. Note that this is the equation of a
straight line in the plane. For example, the equations
$$
5x_1 + 2x_2 = 2, \qquad \frac{4}{5}x_1 + 2x_2 = 1, \qquad 2x_1 - 4x_2 = \pi
$$
are all linear equations in two variables.

A linear equation in n variables x1 , x2 , . . . , xn is an equation that can


be written as
a1 x1 + a2 x2 + · · · + an xn = b,
where a1 , a2 , . . . , an are real numbers called the coefficients of the unknown
variables x1 , x2 , . . . , xn , and the real number b, on the right-hand side of the
equation, is called the constant term of the equation.

Definition 1.2 (System of Linear Equations)

A system of linear equations (or linear system) is simply a finite set of


linear equations. •

For example,
4x1 − 2x2 = 5
3x1 + 2x2 = 4
is a system of two equations in two variables x1 and x2 , and

2x1 + x2 − 5x3 + 2x4 = 9


4x1 + 3x2 + 2x3 + 4x4 = 3
x1 + 2x2 + 3x3 + 2x4 = 11

is the system of three equations in the four variables x1 , x2 , x3 , and x4 .

In order to write a general system of m linear equations in the n vari-


ables x1 , . . . , xn , we have
$$
\begin{array}{ccccccccc}
a_{11}x_1 &+& a_{12}x_2 &+& \cdots &+& a_{1n}x_n &=& b_1\\
a_{21}x_1 &+& a_{22}x_2 &+& \cdots &+& a_{2n}x_n &=& b_2\\
\vdots & & \vdots & & & & \vdots & & \vdots\\
a_{m1}x_1 &+& a_{m2}x_2 &+& \cdots &+& a_{mn}x_n &=& b_m
\end{array} \qquad (1.1)
$$
or, in compact form, the system (1.1) can be written as
$$
\sum_{j=1}^{n} a_{ij}x_j = b_i, \qquad i = 1, 2, \ldots, m. \qquad (1.2)
$$

For such a system we seek all possible ordered sets of numbers c1 , . . . , cn


which satisfy all m equations when they are substituted for the variables
x1 , x2 , . . . , xn . Any such set {c1 , c2 , . . . , cn } is called a solution of the sys-
tem of linear equations (1.1) or (1.2).

There are three possible types of linear systems that arise in engineering
problems, and they are described as follows:
1. If there are more equations than unknown variables (m > n), then
the system is usually called overdetermined. Typically, an overdeter-
mined system has no solution. For example, the following system
4x1 = 8
3x1 + 9x2 = 13
3x2 = 9
has no solution.
2. If there are more unknown variables than the number of the equations
(n > m), then the system is usually called underdetermined. Typi-
cally, an underdetermined system has an infinite number of solutions.
For example, the system
x1 + 5x2 = 45
3x2 + 4x3 = 21

has infinitely many solutions.

3. If there are the same number of equations as unknown variables (m =


n), then the system is usually called a simultaneous system. It has
a unique solution if the system satisfies certain conditions (which we
will discuss below). For example, the system

2x1 + 4x2 + x3 = −11


−x1 + 3x2 − 2x3 = −16
2x1 − 3x2 + 5x3 = 21

has the unique solution x1 = 2, x2 = −4, x3 = 1.


Most engineering problems fall into this category. In this chapter, we
will solve simultaneous linear systems using many numerical meth-
ods.
A simultaneous system of linear equations is said to be linearly inde-
pendent if no equation in the system can be expressed as a linear
combination of the others. Under these circumstances a unique solu-
tion exists. For example, the system of linear equations

2x1 + x2 − x3 = 1
x1 − 2x2 + 3x3 = 4
x1 + x2 = 1

is linearly independent and therefore has the unique solution

x1 = 1, x2 = 0, x3 = 1.

However, the system

5x1 + x2 + x3 = 4
3x1 − x2 + x3 = −2
x1 + x2 = 3

does not have a unique solution since the equations are not linearly
independent; the first equation is equal to the second equation plus
twice the third equation.
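
Linear independence of the equations can also be checked numerically; the following brief sketch uses MATLAB's built-in rank function, where a full-rank coefficient matrix corresponds to a linearly independent system:

>> A1 = [2 1 -1; 1 -2 3; 1 1 0];     % coefficients of the first system
>> rank(A1)                          % rank 3 = number of unknowns: independent
ans =
     3
>> A2 = [5 1 1; 3 -1 1; 1 1 0];      % coefficients of the second system
>> rank(A2)                          % rank 2 < 3: the equations are dependent
ans =
     2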

Theorem 1.1 (Solution of a Linear System)

Every system of linear equations has either no solution, exactly one solu-
tion, or infinitely many solutions. •

For example, in the case of a system of two equations in two variables,


we can have these three possibilities for the solutions of the linear system.
First, the two lines (since the graph of a linear equation is a straight line)
may be parallel and distinct, and in this case, there is no solution to the
system because the two lines do not intersect each other at any point. For
example, consider the system

x1 + x2 = 1
2x1 + 2x2 = 3.

From the graphs (Figure 1.1(a)) of the given two equations we can see
that the lines are parallel, so the given system has no solution. It can be
proved algebraically simply by multiplying the first equation of the system
by 2 to get a system of the form

2x1 + 2x2 = 2
2x1 + 2x2 = 3,

which is not possible.

Second, the two lines may not be parallel, and they may meet at exactly
one point, so in this case the system has exactly one solution. For example,
consider the system
x1 − x2 = −1
3x1 − x2 = 3.
From the graphs (Figure 1.1(b)) of these two equations we can see that
the lines intersect at exactly one point, namely, (2, 3), and so the system
has exactly one solution, x1 = 2, x2 = 3. To show this algebraically, if we
substitute x2 = x1 + 1 in the second equation, we have 3x1 − x1 − 1 = 3,
or x1 = 2, and using this value of x in x2 = x1 + 1 gives x2 = 3.

Finally, the two lines may actually be the same line, and so in this case,
every point on the lines gives a solution to the system and therefore there
are infinitely many solutions. For example, consider the system

x1 + x2 = 1
2x1 + 2x2 = 2.

Figure 1.1: Three possible solutions of simultaneous systems.

Here, both equations have the same line for their graph (Figure 1.1(c)).
So this system has infinitely many solutions because any point on this line
gives a solution to this system, since any solution of the first equation is
also a solution of the second equation. For example, if we set x2 = 1 − x1 ,
we can choose x1 = 0, x2 = 1; x1 = 1, x2 = 0; and so on. •
Note that a system of equations with no solution is said to be an incon-
sistent system and if it has at least one solution, it is said to be a consistent
system.
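
The three possibilities can also be observed numerically. The brief sketch below applies MATLAB's built-in rank function and backslash operator (see Appendix C for an introduction to MATLAB) to the three example systems above; comparing the rank of the coefficient matrix with that of the augmented matrix distinguishes an inconsistent system from one with infinitely many solutions:

>> A = [1 1; 2 2]; b = [1; 3];       % parallel lines
>> [rank(A) rank([A b])]             % ranks differ: no solution
ans =
     1     2
>> A = [1 -1; 3 -1]; b = [-1; 3];    % intersecting lines
>> A\b                               % unique solution x1 = 2, x2 = 3
ans =
     2
     3
>> A = [1 1; 2 2]; b = [1; 2];       % coincident lines
>> [rank(A) rank([A b])]             % equal ranks, less than 2: infinitely many
ans =
     1     1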

1.1.1 Linear Systems in Matrix Notation


The general simultaneous system of n linear equations with n unknown
variables x1 , x2 , . . . , xn is
$$
\begin{array}{ccccccccc}
a_{11}x_1 &+& a_{12}x_2 &+& \cdots &+& a_{1n}x_n &=& b_1\\
a_{21}x_1 &+& a_{22}x_2 &+& \cdots &+& a_{2n}x_n &=& b_2\\
\vdots & & \vdots & & & & \vdots & & \vdots\\
a_{n1}x_1 &+& a_{n2}x_2 &+& \cdots &+& a_{nn}x_n &=& b_n.
\end{array} \qquad (1.3)
$$

The system of linear equations (1.3) can be written as the single matrix
equation
$$
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix}
=
\begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_n \end{bmatrix}. \qquad (1.4)
$$
If we compute the product of the two matrices on the left-hand side of
(1.4), we have
$$
\begin{bmatrix}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n\\
\vdots\\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n
\end{bmatrix}
=
\begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_n \end{bmatrix}. \qquad (1.5)
$$

But two matrices are equal if and only if their corresponding elements
are equal. Hence, the single matrix equation (1.4) is equivalent to the
system of linear equations (1.3). If we define
$$
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}, \quad
x = \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix}, \quad
b = \begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_n \end{bmatrix},
$$

the coefficient matrix, the column matrix of unknowns, and the column
matrix of constants, respectively, and then the system (1.3) can be written
very compactly as
Ax = b, (1.6)

which is called the matrix form of the system of linear equations (1.3). The
column matrices x and b are called vectors.

If the right-hand sides of the equal signs of (1.6) are not zero, then the
linear system (1.6) is called a nonhomogeneous system, and we will find
that all the equations must be independent to obtain a unique solution.
If the constants b of (1.6) are added to the coefficient matrix A as a
column of elements in the position shown below

$$
[A|b] = \left[
\begin{array}{cccc|c}
a_{11} & a_{12} & \cdots & a_{1n} & b_1\\
a_{21} & a_{22} & \cdots & a_{2n} & b_2\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{nn} & b_n
\end{array}
\right], \qquad (1.7)
$$

then the matrix [A|b] is called the augmented matrix of the system (1.6).
In many instances, it may be convenient to operate on the augmented ma-
trix instead of manipulating the equations. It is customary to put a bar
between the last two columns of the augmented matrix to remind us where
the last column came from. However, the bar is not absolutely necessary.
The coefficient and augmented matrices of a linear system will play key
roles in our methods of solving linear systems.

Using MATLAB commands we can define an augmented matrix as fol-


lows:

>> A = [1 2 3; 4 5 6; 7 8 9];
>> b = [10; 11; 12];
>> Aug = [A b]
Aug =
1 2 3 10
4 5 6 11
7 8 9 12

Also,

>> Aug = [A eye(3)]


Aug =
1 2 3 1 0 0
4 5 6 0 1 0
7 8 9 0 0 1
If all of the constant terms b1 , b2 , . . . , bn on the right-hand sides of the
equal signs of the linear system (1.6) are zero, then the system is called a
homogeneous system, and it can be written as

$$
\begin{array}{ccccccccc}
a_{11}x_1 &+& a_{12}x_2 &+& \cdots &+& a_{1n}x_n &=& 0\\
a_{21}x_1 &+& a_{22}x_2 &+& \cdots &+& a_{2n}x_n &=& 0\\
\vdots & & \vdots & & & & \vdots & & \vdots\\
a_{n1}x_1 &+& a_{n2}x_2 &+& \cdots &+& a_{nn}x_n &=& 0.
\end{array} \qquad (1.8)
$$

The system of linear equations (1.8) can be written as the single matrix
equation
$$
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix}
=
\begin{bmatrix} 0\\ 0\\ \vdots\\ 0 \end{bmatrix}. \qquad (1.9)
$$
It can also be written in more compact form as

Ax = 0, (1.10)

where
$$
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}, \quad
x = \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix}, \quad
0 = \begin{bmatrix} 0\\ 0\\ \vdots\\ 0 \end{bmatrix}.
$$

It can be seen by inspection of the homogeneous system (1.10) that


one of its solutions is x = 0; such a solution, in which all of the unknowns
are zero, is called the trivial solution or zero solution. For the general
nonhomogeneous linear system there are three possibilities: no solution,
one solution, or infinitely many solutions. For the general homogeneous

system, there are only two possibilities: either the zero solution is the only
solution, or there are infinitely many solutions (called nontrivial solutions).
Of course, it is usually nontrivial solutions that are of interest in physical
problems. A nontrivial solution to the homogeneous system can occur with
certain conditions on the coefficient matrix A, which we will discuss later.
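
A homogeneous system has nontrivial solutions exactly when its coefficient matrix is singular. As a brief sketch, MATLAB's built-in null function returns a basis for the null space of a matrix (the sign and scaling of the basis vector may vary); an empty result means that x = 0 is the only solution:

>> A = [1 2; 2 4];       % singular: the second row is twice the first
>> null(A)               % a nontrivial solution of Ax = 0
ans =
   -0.8944
    0.4472
>> B = [1 2; 3 4];       % nonsingular
>> isempty(null(B))      % 1 (true): only the trivial solution x = 0
ans =
     1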

1.2 Properties of Matrices and Determinants


To discuss the solutions of linear systems, it is necessary to introduce the
basic algebraic properties of matrices that make it possible to describe
linear systems in a concise way and make solving a system of n linear
equations easier.

1.2.1 Introduction to Matrices


A matrix can be described as a rectangular array of elements that can be
represented as follows:
$$
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}. \qquad (1.11)
$$

The numbers a11 , a12 , . . . , amn that make up the array are called the ele-
ments of the matrix. The first subscript for the element denotes the row
and the second denotes the column in which the element appears. The
elements of a matrix may take many forms. They could be all numbers
(real or complex), or variables, or functions, or integrals, or derivatives, or
even matrices themselves.
The order or size of a matrix is specified by the number of rows (m)
and columns (n); thus, the matrix A in (1.11) is of order m by n, usually
written as m × n.
A vector can be considered a special case of a matrix having only one
row or one column. A row vector containing n elements is a 1 × n matrix,
called a row matrix, and a column vector of n elements is an n × 1 matrix,
called a column matrix. A matrix of order 1 × 1 is called a scalar.

Definition 1.3 (Matrix Equality)

Two matrices A = (aij ) and B = (bij ) are equal if they are the same size
and the corresponding elements in A and B are equal, i.e.,

A = B, if and only if aij = bij

for i = 1, 2, . . . , m and j = 1, 2, . . . , n. For example, the matrices
$$
A = \begin{bmatrix} 1 & -1 & 2\\ 1 & 3 & 2\\ 2 & 4 & 3 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} 1 & -1 & z\\ 1 & 3 & 2\\ x & y & w \end{bmatrix}
$$

are equal, if and only if x = 2, y = 4, z = 2, and w = 3. •

Definition 1.4 (Addition of Matrices)

Let A = (aij ) and B = (bij ) both be m × n matrices, then the sum A +


B of two matrices of the same size is a new matrix C = (cij ), each of
whose elements is the sum of the two corresponding elements in the original
matrices, i.e.,

cij = aij + bij , for i = 1, 2, . . . , m, and j = 1, 2, . . . , n.

For example, let
$$
A = \begin{bmatrix} 1 & 2\\ 3 & 4 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} 4 & 1\\ 5 & 2 \end{bmatrix}.
$$
Then
$$
\begin{bmatrix} 1 & 2\\ 3 & 4 \end{bmatrix} +
\begin{bmatrix} 4 & 1\\ 5 & 2 \end{bmatrix} =
\begin{bmatrix} 5 & 3\\ 8 & 6 \end{bmatrix} = C.
$$

Using MATLAB commands and adding two matrices A and B of the


same size results in the answer C, another matrix of the same size:

>> A = [1 2; 3 4];
>> B = [4 1; 5 2];
>> C = A + B
C=
5 3
8 6

Definition 1.5 (Difference of Matrices)

Let A and B be m × n matrices; we write A + (−1)B as A − B, and


the difference of two matrices of the same size is a new matrix C, each of
whose elements is the difference of the two corresponding elements in the
original matrices. For example, let
$$
A = \begin{bmatrix} 1 & 2\\ 3 & 4 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} 4 & 1\\ 5 & 2 \end{bmatrix}.
$$
Then
$$
\begin{bmatrix} 1 & 2\\ 3 & 4 \end{bmatrix} -
\begin{bmatrix} 4 & 1\\ 5 & 2 \end{bmatrix} =
\begin{bmatrix} -3 & 1\\ -2 & 2 \end{bmatrix} = C.
$$
Note that (−1)B = −B is obtained by multiplying each entry of matrix B
by (−1), the scalar multiple of matrix B by −1. The matrix −B is called
the negative of the matrix B. •
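
In MATLAB, the difference of two matrices of the same size is formed with the − operator; a brief sketch mirroring the addition example above:

>> A = [1 2; 3 4];
>> B = [4 1; 5 2];
>> C = A - B
C =
    -3     1
    -2     2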

Definition 1.6 (Multiplication of Matrices)

The multiplication of two matrices is defined only when the number of


columns in the first matrix is equal to the number of rows in the second. If
an m × n matrix A is multiplied by an n × p matrix B, then the product
matrix C is an m × p matrix where each term is defined by
$$
c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}
$$

for each i = 1, 2, . . . , m and j = 1, 2, . . . , p. For example, let
$$
A = \begin{bmatrix} 1 & 2\\ 3 & 4 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} 4 & 1\\ 5 & 2 \end{bmatrix}.
$$

Then
$$
\begin{bmatrix} 1 & 2\\ 3 & 4 \end{bmatrix}
\begin{bmatrix} 4 & 1\\ 5 & 2 \end{bmatrix} =
\begin{bmatrix} 4+10 & 1+4\\ 12+20 & 3+8 \end{bmatrix} =
\begin{bmatrix} 14 & 5\\ 32 & 11 \end{bmatrix} = C.
$$

Note that even if AB is defined, the product BA may not be defined.


Moreover, a simple multiplication of two square matrices of the same size
will show that even if BA is defined, it need not be equal to AB, i.e., they
do not commute. For example, if
$$
A = \begin{bmatrix} 1 & 2\\ -1 & 3 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} 2 & 1\\ 0 & 1 \end{bmatrix},
$$
then
$$
AB = \begin{bmatrix} 2 & 3\\ -2 & 2 \end{bmatrix}
\quad \text{while} \quad
BA = \begin{bmatrix} 1 & 7\\ -1 & 3 \end{bmatrix}.
$$
Thus, AB ≠ BA. •

Using MATLAB commands, matrix multiplication has the standard


meaning as well. Multiplying two matrices A and B of size m × p and
p × n respectively, results in the answer C, another matrix of size m × n:

>> A = [1 2; −1 3];
>> B = [2 1; 0 1];
>> C = A ∗ B
C=
2 3
−2 2

MATLAB also has component-wise operations for multiplication, divi-


sion, and exponentiation. These three operations are a combination of a
period (.) and one of the operators ∗, /, and ˆ , which perform operations
on a pair of matrices (or vectors) with equal numbers of rows and columns.
For example, consider the two row vectors:

>> u = [1 2 3 4];
>> v = [5 3 0 2];
>> x = u.*v
x =
     5     6     0     8

>> y = u./v
Warning: Divide by zero.
y =
    0.2000    0.6667       Inf    2.0000
These operations apply to matrices as well as vectors:

>> A = [1 2 3; 4 5 6; 7 8 9];
>> B = [9 8 7; 6 5 4; 3 2 1];
>> C = A. ∗ B
C=
9 16 21
24 25 24
21 16 9

Note that A. ∗ B is not the same as A ∗ B.

The array exponentiation operator, .^ , raises the individual elements


of a matrix to a power:

>> A = [1 2 3; 4 5 6; 7 8 9];
>> D = A.^2
D=
1 4 9
16 25 36
49 64 81

>> E = A.^(1/2)
E =
1.0000 1.4142 1.7321
2.0000 2.2361 2.4495
2.6458 2.8284 3.0000
The syntax of array operators requires the correct placement of a typo-
graphically small symbol, a period, in what might be a complex formula.
Although MATLAB will catch syntax errors, it is still possible to make
computational mistakes with legal operations. For example, A.^2 and A^2
are both legal, but not at all equivalent, as the short example below shows.
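
A small check (an illustrative sketch) makes the difference concrete: A.^2 squares each element, while A^2 forms the matrix product A*A:

>> A = [1 2; 3 4];
>> A.^2              % element-by-element squares
ans =
     1     4
     9    16
>> A^2               % matrix product A*A
ans =
     7    10
    15    22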

In linear algebra, the addition and subtraction of matrices and vec-


tors are element-by-element operations. Thus, there are no special array
operators for addition and subtraction.

1.2.2 Some Special Matrix Forms


There are many special types of matrices encountered frequently in engi-
neering analysis. We discuss some of them in the following.

Definition 1.7 (Square Matrix)

A matrix A which has the same number of rows m and columns n, i.e.,
m = n, defined as

A = (aij ), for i = 1, 2, . . . , n, and j = 1, 2, . . . , n

is called a square matrix. For example, the matrices
$$
A = \begin{bmatrix} 1 & 2\\ -1 & 3 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} 2 & 1 & 2\\ 1 & 2 & 3\\ 0 & 1 & 5 \end{bmatrix}
$$

are square matrices because both have the same number of rows and columns.


Definition 1.8 (Null Matrix)

It is a matrix in which all elements are zero, i.e.,


A = (aij ) = 0, for i = 1, 2, . . . , n and j = 1, 2, . . . , n.
It is also called a zero matrix. It may be either rectangular or square. For
example, the matrices
$$
A = \begin{bmatrix} 0 & 0 & 0\\ 0 & 0 & 0 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} 0 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{bmatrix}
$$
are zero matrices. •
Definition 1.9 (Identity Matrix)

It is a square matrix in which the main diagonal elements are equal to 1.


It is defined as

$$
I = (a_{ij}) = \begin{cases} a_{ij} = 0, & \text{if } i \neq j,\\ a_{ij} = 1, & \text{if } i = j. \end{cases}
$$
An example of a 4 × 4 identity matrix may be written as
$$
I_4 = \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}.
$$
The identity matrix (also called a unit matrix) serves somewhat the same
purpose in matrix algebra as does the number one (unity) in scalar algebra.
It is called the identity matrix because multiplication of a matrix by it will
result in the same matrix. For a square matrix A of order n, it can be seen
that
In A = AIn = A.
Similarly, for a rectangular matrix B of order m × n, we have
Im B = BIn = B.
The multiplication of an identity matrix by itself results in the same identity
matrix. •

In MATLAB, identity matrices are created with the eye function, which
can take either one or two input arguments:

>> I = eye(n)
>> I = eye(m, n)

Definition 1.10 (Transpose Matrix)

The transpose of a matrix A is a new matrix formed by interchanging the


rows and columns of the original matrix. If the original matrix A is of
order m × n, then the transpose matrix, AT , will be of the order n × m,
i.e.,
if A = (aij ), for i = 1, 2, . . . , m and j = 1, 2, . . . , n,
then
AT = (aji ), for i = 1, 2, . . . , n and j = 1, 2, . . . , m.

The transpose of a matrix A can be found by using the following MAT-


LAB commands:

>> A = [1 2 3; 4 5 6; 7 8 9]
>> B = A'
B=
1.0000 4.0000 7.0000
2.0000 5.0000 8.0000
3.0000 6.0000 9.0000
Note that

1. $(A^T)^T = A$,

2. $(A_1 + A_2)^T = A_1^T + A_2^T$,

3. $(A_1 A_2)^T = A_2^T A_1^T$,

4. $(\alpha A)^T = \alpha A^T$, where $\alpha$ is a scalar (see the short MATLAB check below).
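
These identities are easy to spot-check numerically; the brief sketch below verifies property 3 for two arbitrarily chosen matrices:

>> A1 = [1 2; 3 4];
>> A2 = [0 1; 5 2];
>> P = (A1*A2)'          % transpose of the product
P =
    10    20
     5    11
>> Q = A2'*A1'           % product of transposes in reverse order
Q =
    10    20
     5    11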



Definition 1.11 (Inverse Matrix)

An n × n matrix A has an inverse or is invertible if there exists an n × n


matrix B such that

AB = BA = In .

Then the matrix B is called the inverse of A and is denoted by A−1 . For
example, let
$$
A = \begin{bmatrix} 2 & 3\\ 2 & 2 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} -1 & \tfrac{3}{2}\\ 1 & -1 \end{bmatrix}.
$$

Then we have

AB = BA = I2 ,

which means that B is an inverse of A. Note that the invertible matrix is


also called the nonsingular matrix. •

To find the inverse of a square matrix A using MATLAB commands we


do as follows:

>> A = [2 -1 0 0; -1 2 -1 0; 0 -1 2 -1; 0 0 -1 2]
>> Ainv = INVMAT(A)
Ainv =
0.8000 0.6000 0.4000 0.2000
0.6000 1.2000 0.8000 0.4000
0.4000 0.8000 1.2000 0.6000
0.2000 0.4000 0.6000 0.8000

Program 1.1
MATLAB m-file for Finding the Inverse of a Matrix
function [Ainv]=INVMAT(A)
% Gauss-Jordan elimination on the augmented matrix [A | I] to compute the inverse of A
[n,n]=size(A); I=zeros(n,n);
for i=1:n; I(i,i)=1; end
m(1:n,1:n)=A; m(1:n,n+1:2*n)=I;          % form the augmented matrix [A | I]
for i=1:n
    m(i,1:2*n)=m(i,1:2*n)/m(i,i);        % normalize the pivot row
    for k=1:n
        if i~=k
            m(k,1:2*n)=m(k,1:2*n)-m(k,i)*m(i,1:2*n);   % eliminate column i
        end
    end
end
Ainv=m(1:n,n+1:2*n);                     % right half holds the inverse

The MATLAB built-in function inv(A) can also be used to calculate


the inverse of a square matrix A, if A is invertible:

>> I = Ainv ∗ A;
>> f ormat short e
>> disp(I)
I=
1.0000e + 00 −1.1102e − 16 0 0
0 1.0000e + 00 0 0
0 0 1.0000e + 00 2.2204e − 16
0 0 0 1.0000e + 00
The values of I(1, 2) and I(3, 4) are very small, but nonzero, due to
round-off errors in the computation of Ainv and I. It is often preferable to
use rational numbers rather than decimal numbers. The function rats(x)
returns a rational approximation to x, or we can use the other MATLAB
command as follows:

>> f ormat rat

If the matrix A is not invertible, then the matrix A is called singular.

There are some well-known properties of the invertible matrix which


are defined as follows.

Theorem 1.2 If the matrix A is invertible, then:

1. It has exactly one inverse. If B and C are the inverses of A, then


B = C.

2. Its inverse matrix A−1 is also invertible and (A−1 )−1 = A.

3. Its product with another invertible matrix is invertible, and the in-
verse of the product is the product of the inverses in the reverse order.
If A and B are invertible matrices of the same size, then AB is in-
vertible and (AB)−1 = B −1 A−1 .

4. Its transpose matrix AT is invertible and (AT )−1 = (A−1 )T .

5. The matrix kA for any nonzero scalar k is invertible, i.e., (kA)−1 = (1/k)A−1 .

6. The power Ak for any k is also invertible, i.e., (Ak )−1 = (A−1 )k .

7. A 1 × 1 matrix is invertible when its single entry is nonzero. If A = (a), then
   A−1 = (1/a).

8. The formula for A−1 when n = 2 is
$$
A^{-1} = \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix}^{-1}
= \frac{1}{a_{11}a_{22} - a_{12}a_{21}}
\begin{bmatrix} a_{22} & -a_{12}\\ -a_{21} & a_{11} \end{bmatrix},
$$
provided that a11 a22 − a12 a21 ≠ 0. •
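
As with the transpose identities, these properties can be spot-checked numerically; the brief sketch below verifies property 3, (AB)−1 = B −1 A−1 , for two small invertible matrices using MATLAB's built-in inv function:

>> A = [2 3; 2 2];
>> B = [1 2; 3 4];
>> inv(A*B)              % inverse of the product
ans =
    3.0000   -4.0000
   -2.0000    2.7500
>> inv(B)*inv(A)         % product of the inverses in reverse order
ans =
    3.0000   -4.0000
   -2.0000    2.7500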

Definition 1.12 (Diagonal Matrix)

It is a square matrix having all elements equal to zero except those on the
main diagonal, i.e.,

$$
A = (a_{ij}) = \begin{cases} a_{ij} = 0, & \text{if } i \neq j,\\ a_{ij} \neq 0, & \text{if } i = j. \end{cases}
$$

Note that all diagonal matrices are invertible if all diagonal entries are
nonzero. •

The MATLAB function diag is used to either create a diagonal matrix


from a vector or to extract the diagonal entries of a matrix. If the input
argument of the diag function is a vector, MATLAB uses the vector to
create a diagonal matrix:

>> x = [2, 2, 2];


>> A = diag(x)
A=
2 0 0
0 2 0
0 0 2
The matrix A is called the scalar matrix because it has all the elements on
the main diagonal equal to the same scalar 2. Multiplication of a square
matrix and a scalar matrix is commutative, and the product is also a di-
agonal matrix.

If the input argument of the diag function is a matrix, the result is a


vector of the diagonal elements:

>> B = [2 − 4 1; 6 10 − 3; 0 5 8]
>> M = diag(B)
M=
2
10
8

Definition 1.13 (Upper-Triangular Matrix)

It is a square matrix which has zero elements below and to the left of the
main diagonal. The diagonal as well as the above diagonal elements can
take on any value, i.e.,

U = (uij ), where uij = 0, if i > j.

An example of such a matrix is



 
$$
U = \begin{bmatrix} 1 & 2 & 3\\ 0 & 4 & 5\\ 0 & 0 & 6 \end{bmatrix}.
$$

The upper-triangular matrix is called an upper-unit-triangular matrix if


the diagonal elements are equal to one. This type of matrix is used in solv-
ing linear algebraic equations by LU decomposition with Crout’s method.
Also, if the main diagonal elements of the upper-triangular matrix are zero,
then the matrix
$$
A = \begin{bmatrix} 0 & a_{12} & a_{13}\\ 0 & 0 & a_{23}\\ 0 & 0 & 0 \end{bmatrix}
$$
is called a strictly upper-triangular matrix. This type of matrix will be used
in solving linear systems by iterative methods. •

Using the MATLAB command triu(A) we can create an upper-triangular


matrix from a given matrix A as follows:

>> A = [1 2 3; 4 5 6; 7 8 9];
>> U = triu(A)
U =
     1     2     3
     0     5     6
     0     0     9
We can also create a strictly upper-triangular matrix, i.e., an upper-
triangular matrix with zero diagonals, from a given matrix A by using the
MATLAB built-in function triu(A,1) as follows:

>> A = [1 2 3; 4 5 6; 7 8 9];
>> U = triu(A,1)
U =
     0     2     3
     0     0     6
     0     0     0

Definition 1.14 (Lower-Triangular Matrix)

It is a square matrix which has zero elements above and to the right of the
main diagonal, and the rest of the elements can take on any value, i.e.,

L = (lij ), where lij = 0, if i < j.

An example of such a matrix is
$$
L = \begin{bmatrix} 2 & 0 & 0\\ 3 & 1 & 0\\ 4 & 5 & 3 \end{bmatrix}.
$$

The lower-triangular matrix is called a lower-unit-triangular matrix if the


diagonal elements are equal to one. This type of matrix is used in solving
linear algebraic equations by LU decomposition with Doolittle’s method.
Also, if the main diagonal elements of the lower-triangular matrix are zero,
then the matrix
$$
A = \begin{bmatrix} 0 & 0 & 0\\ a_{21} & 0 & 0\\ a_{31} & a_{32} & 0 \end{bmatrix}
$$
is called a strictly lower-triangular matrix. We will use this type of matrix
in solving the linear systems by using iterative methods. •

In a similar way, we can create a lower-triangular matrix and a strictly


lower-triangular matrix from a given matrix A by using the MATLAB
built-in functions tril(A) and tril(A,-1), respectively, as the short example below shows.
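
For instance, a brief sketch with the same matrix A used above:

>> A = [1 2 3; 4 5 6; 7 8 9];
>> L = tril(A)           % lower-triangular part of A
L =
     1     0     0
     4     5     0
     7     8     9
>> L1 = tril(A,-1)       % strictly lower-triangular part (zero diagonal)
L1 =
     0     0     0
     4     0     0
     7     8     0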

Note that all the triangular matrices (upper or lower) with nonzero
diagonal entries are invertible.

Definition 1.15 (Symmetric Matrix)

A symmetric matrix is one in which the elements aij of a matrix A in the


ith row and jth column are equal to the elements aji in the jth row and ith
column, which means that

AT = A, i.e., aij = aji , for i ≠ j.



Note that any diagonal matrix, including the identity, is symmetric. A


lower- or upper-triangular matrix is symmetric if and only if it is, in fact,
a diagonal matrix.

One way to generate a symmetric matrix is to multiply a matrix by its


transpose, since AT A is symmetric for any A. To generate a symmetric
matrix using MATLAB commands we do the following:

>> A = [1 : 4; 5 : 8; 9 : 12]
%A is not symmetric
>> B = A' * A
B=
107 122 137 152
122 140 158 176
137 158 179 200
152 176 200 224
>> C = A * A'
C=
30 70 110
70 174 278
110 278 446

Example 1.1 Find all the values of a, b, and c for which the following
matrix is symmetric:
 
4 a+b+c 0
A =  −1 3 b − c .
−a + 2b − 2c 1 b − 2c
Solution. If the given matrix is symmetric, then A = AT , i.e.,
 
4 a+b+c 0
A =  −1 3 b−c 
−a + 2b − 2c 1 b − 2c
 
4 −1 −a + 2b − 2c
= a+b+c 3 1  = AT ,
0 b − c b − 2c

which implies that

0 = −a + 2b − 2c
−1 = a + b + c
1 = b − c.

Solving the above system, we get

a = 2, b = −1, c = −2,

and using these values, we have the given matrix of the form
 
4 −1 0
A=  −1 3 1 .
0 1 3

Theorem 1.3 If A and B are symmetric matrices of the same size, and
if k is any scalar, then:

1. AT is also symmetric;

2. A + B and A − B are symmetric;

3. and kA is also symmetric.

Note that the product of symmetric matrices is not symmetric in gen-


eral, but the product is symmetric if and only if the matrices commute.
Also, note that if A is a square matrix, then the matrices A, AAT , and
AT A are either all nonsingular or all singular. •

If for a matrix A we have aij = −aji for i ≠ j and the main diagonal
elements are not all zero, then the matrix A is called a skew matrix. If
all the elements on the main diagonal of a skew matrix are zero, then the
matrix is called skew symmetric, i.e.,

A = −AT , with aij = −aji for i ≠ j, and aii = 0.



Any square matrix may be split into the sum of a symmetric and a skew
symmetric matrix. Thus,

A = (1/2)(A + AT ) + (1/2)(A − AT ),

where (1/2)(A + AT ) is a symmetric matrix and (1/2)(A − AT ) is a skew
symmetric matrix. The matrices

[ 1 2 3 ]      [  1  2  3 ]           [  0  2  3 ]
[ 2 4 5 ] ,    [ −2  4 −5 ] ,  and    [ −2  0  5 ]
[ 3 5 6 ]      [ −3  5  6 ]           [ −3 −5  0 ]

are examples of symmetric, skew, and skew symmetric matrices, respec-
tively. •
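
The splitting above is easy to verify numerically. The following short
MATLAB session is a minimal illustration (the matrix A is an arbitrary choice):

>> A = [1 2 3; 4 5 6; 7 8 10];    % any square matrix
>> S = (A + A')/2;                % symmetric part
>> K = (A - A')/2;                % skew symmetric part
>> isequal(A, S + K)
ans =
1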
Definition 1.16 (Partitioned Matrix)

A matrix A is said to be partitioned if horizontal and vertical lines have


been introduced, subdividing A into submatrices called blocks. Partitioning
allows A to be written as a matrix A whose entries are its blocks. A simple
example of a partitioned matrix may be an augmented matrix, which can
be partitioned in the form
B = [A|b].
It is frequently necessary to deal separately with various groups of el-
ements, or submatrices, within a large matrix. This situation can arise
when the size of a matrix becomes too large for convenient handling, and
it becomes necessary to work with only a portion of the matrix at any one
time. Also, there will be cases in which one part of a matrix will have
a physical significance that is different from the remainder, and it is in-
structive to isolate that portion and identify it by a special symbol. For
example, the following 4 × 5 matrix A has been partitioned into four blocks
of elements, each of which is itself a matrix:

      [ a11 a12 a13 | a14 a15 ]
      [ a21 a22 a23 | a24 a25 ]
A  =  [ a31 a32 a33 | a34 a35 ]
      [ ------------+---------]
      [ a41 a42 a43 | a44 a45 ]

The partitioning lines must always extend entirely through the matrix
as in the above example. If the submatrices of A are denoted by the symbols
A11 , A12 , A21 , and A22 so that
   
a11 a12 a13 a14 a15
A11 =  a21 a22 a23  , A12 =  a24 a25  ,
a31 a32 a33 a34 a35
 
A21 = a41 a42 a43 , A22 = a44 a45 ,
then the original matrix can be written in the form
 
A11 A12
A= .
A21 A22
A partitioned matrix may be transposed by appropriate transposition
and rearrangement of the submatrices. For example, it can be seen by
inspection that the transpose of the matrix A is

        [ (A11)T  (A21)T ]
AT  =   [                ] .
        [ (A12)T  (A22)T ]

Note that AT has been formed by transposing each submatrix of A and


then interchanging the submatrices on the secondary diagonal.

Partitioned matrices such as the one given above can be added, sub-
tracted, and multiplied provided that the partitioning is performed in an
appropriate manner. For the addition and subtraction of two matrices, it is
necessary that both matrices be partitioned in exactly the same way. Thus,
a partitioned matrix B of order 4 × 5 (compare with matrix A above) will
be conformable for addition with A only if it is partitioned as follows:

      [ b11 b12 b13 | b14 b15 ]
      [ b21 b22 b23 | b24 b25 ]
B  =  [ b31 b32 b33 | b34 b35 ]
      [ ------------+---------]
      [ b41 b42 b43 | b44 b45 ]

It can be expressed in the form


 
B11 B12
B= ,
B21 B22

in which B11 , B12 , B21 , and B22 represent the corresponding submatrices.
In order to add A and B and obtain a sum C, it is necessary according to
the rules for addition of matrices that the following represent the sum:
   
A11 + B11 A12 + B12 C11 C12
A+B = = = C.
A21 + B21 A22 + B22 C21 C22

Note that like A and B, the sum matrix C will also have the same par-
titions.

The conformability requirement for multiplication of partitioned matri-


ces is somewhat different from that for addition and subtraction. To show
the requirement, consider again the matrix A given previously and assume
that it is to be postmultiplied by a matrix D, which must have five rows but
may have any number of columns. Also assume that D is partitioned into
four submatrices as follows:
 
D11 D12
D= .
D21 D22

Then, when forming the product AD according to the usual rules for
matrix multiplication, the following result is obtained:

M = AD = [ A11  A12 ] [ D11  D12 ]
         [ A21  A22 ] [ D21  D22 ]

       = [ A11 D11 + A12 D21   A11 D12 + A12 D22 ]
         [ A21 D11 + A22 D21   A21 D12 + A22 D22 ]

       = [ M11  M12 ]
         [ M21  M22 ] .

Thus, the multiplication of the two partitioned matrices is possible if


the columns of the first partitioned matrix are partitioned in exactly the
same way as the rows of the second partitioned matrix. It does not matter
how the rows of the first partitioned matrix and the columns of the second
partitioned matrix are partitioned. •
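
Block multiplication can be mimicked in MATLAB by slicing a matrix into
submatrices with index ranges; the sizes used below are only an illustration
of the conformability requirement:

>> A = rand(4,5);  D = rand(5,3);
>> A11 = A(1:3,1:3); A12 = A(1:3,4:5);   % columns of A split as 3 + 2
>> A21 = A(4,1:3);   A22 = A(4,4:5);
>> D11 = D(1:3,:);   D21 = D(4:5,:);     % rows of D split as 3 + 2
>> M1  = A11*D11 + A12*D21;              % upper block row of AD
>> M2  = A21*D11 + A22*D21;              % lower block row of AD
>> norm([M1; M2] - A*D)                  % should be (numerically) zero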

Definition 1.17 (Band Matrix)

An n × n square matrix A is called a band matrix if there exists positive


integers p and q, with 1 < p and q < n, such that

aij = 0 for p≤j−i or q ≤ i − j.

The number p describes the number of diagonals above, including the


main diagonal on which the nonzero entries may lie. The number q de-
scribes the number of diagonals below, including the main diagonal on
which the nonzero entries may lie. The number p + q − 1 is called the
bandwidth of the matrix A, which tells us how many of the diagonals can
contain nonzero entries. For example, the matrix

      [ 1 2 3 0 ]
A  =  [ 2 3 4 5 ]
      [ 0 5 6 7 ]
      [ 0 0 7 8 ]

is banded with p = 3 and q = 2, and so the bandwidth is equal to 4. An


important property of the band matrix is called the tridiagonal matrix, in
this case, p = q = 2, i.e., all nonzero elements lie either on or directly above
or below the main diagonal. For this type of matrix, Gaussian elimination
is particularly simpler. In general, the nonzero elements of a tridiagonal
matrix lie in three bands: the superdiagonal, diagonal, and subdiagonal.
For example, the matrix

      [ 1 2           ]
      [ 2 3 1         ]
      [   3 2 1       ]
A  =  [     2 4 3     ]
      [       1 2 3   ]
      [         1 6 4 ]
      [           3 4 ]

is a tridiagonal matrix.

A matrix which is predominantly zero is called a sparse matrix. A band


matrix or a tridiagonal matrix is a sparse matrix, but the nonzero elements
of a sparse matrix are not necessarily near the diagonal. •
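
In MATLAB a tridiagonal matrix can be assembled from its three bands with
the diag function; the numerical entries below are arbitrary illustrations:

>> d  = [1 3 2 4];        % main diagonal
>> du = [2 1 1];          % superdiagonal
>> dl = [2 3 2];          % subdiagonal
>> T = diag(d) + diag(du,1) + diag(dl,-1)
T =
1 2 0 0
2 3 1 0
0 3 2 1
0 0 2 4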

Definition 1.18 (Permutation Matrix)

A permutation matrix P has only 0s and 1s and there is exactly one in each
row and column of P . For example, the following matrices are permutation
matrices:

      [ 1 0 0 ]        [ 0 1 0 0 ]
P  =  [ 0 0 1 ] ,  P = [ 1 0 0 0 ] .
      [ 0 1 0 ]        [ 0 0 1 0 ]
                       [ 0 0 0 1 ]

The product P A has the same rows as A but in a different order (permuted),
while AP is just A with the columns permuted. •
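
A quick MATLAB check of this behavior (the 3 × 3 permutation matrix above
and an arbitrary matrix A are used for illustration):

>> P = [1 0 0; 0 0 1; 0 1 0];
>> A = [1 2 3; 4 5 6; 7 8 9];
>> P*A          % rows 2 and 3 of A are interchanged
ans =
1 2 3
7 8 9
4 5 6
>> A*P          % columns 2 and 3 of A are interchanged
ans =
1 3 2
4 6 5
7 9 8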

1.2.3 Solutions of Linear Systems of Equations


Here we shall discuss the familiar technique called the method of elimina-
tion to find the solutions of linear systems. This method starts with the
augmented matrix of the given linear system and obtains a matrix of a
certain form. This new matrix represents a linear system that has exactly
the same solutions as the original system. In the following, we define
two well-known forms of a matrix.

Definition 1.19 (Row Echelon Form)

An m × n matrix A is said to be in row echelon form if it satisfies the


following properties:

1. Any rows consisting entirely of zeros are at the bottom.

2. The first entry from the left of a nonzero row is 1. This entry is
called the leading one of its row.

3. For each nonzero row, the leading one appears to the right and below
any leading ones in preceding rows.
Note that, in particular, in any column containing a leading one, all
entries below the leading one are zero. For example, the following matrices
are in row echelon form:

[ 1 2 1 ]   [ 1 0 2 ]   [ 1 2 3 4 ]   [ 0 1 0 1 ]
[ 0 1 3 ] , [ 0 1 2 ] , [ 0 0 1 2 ] , [ 0 0 1 0 ] .
[ 0 0 0 ]   [ 0 0 1 ]   [ 0 0 0 0 ]   [ 0 0 0 0 ]
                                      [ 0 0 0 0 ]

Observe that a matrix in row echelon form is actually the augmented


matrix of a linear system (i.e., the last column is the right-hand side of
the system Ax = b), and the system is quite easy to solve by backward
substitution. For example, writing the first above matrix in linear system
form, we have
x1 + 2x2 = 1
x2 = 3.
There is no need to involve the last equation, which is

0x1 + 0x2 = 0,

since it is satisfied for any choice of x1 and x2 . Thus, by using backward


substitution, we get

x2 = 3 and x1 = (1 − 2x2 ) = (1 − 2(3)) = −5.

So the unique solution of the linear system is [−5, 3]T .

Similarly, the linear system that corresponds to the second above matrix
is
x1 = 2
x2 = 2
0 = 1.
The third equation of this system shows that

0x1 + 0x2 = 1,

which is not possible for any choices of x1 and x2 . Hence, the system has
no solution.

Finally, the linear system that corresponds to the third above matrix is

x1 + 2x2 + 3x3 = 4
x3 = 2
0x1 + 0x2 + 0x3 = 0,

and by backward substitution (without using the third equation of the sys-
tem), we get

x3 = 2, and x1 = 4 − 2x2 − 3x3 = −2 − 2x2 .

By choosing an arbitrary value of x2 , we get a corresponding value of
x1 , which implies that we have infinitely many solutions for such a linear
system. •

If we add one more property in the above definition of row echelon


form, then we will get another well-known form of a matrix, called reduced
row echelon form, which we define as follows.

Definition 1.20 (Reduced Row Echelon Form)

An m × n matrix A is said to be in reduced row echelon form if it satisfies


the following properties:

1. Any rows consisting entirely of zeros are at the bottom.

2. The first entry from the left of a nonzero row is 1. This entry is
called the leading one of its row.

3. For each nonzero row, the leading one appears to the right and below
any leading ones in preceding rows.

4. If a column contains a leading one, then all other entries in that


column (above and below a leading one) are zeroes.

For example, the following matrices are in reduced row echelon form:

[ 1 0 1 ]   [ 1 0 0 2 ]   [ 1 4 5 0 ]   [ 1 1 0 0 0 ]
[ 0 1 2 ] , [ 0 1 0 4 ] , [ 0 0 0 1 ] , [ 0 0 1 0 2 ] ,
[ 0 0 0 ]   [ 0 0 1 6 ]   [ 0 0 0 0 ]   [ 0 0 0 1 1 ]

and the following matrices are not in reduced row echelon form:

[ 1 3 0 2 ]   [ 1 3 0 2 ]   [ 1 0 0 3 ]   [ 1 0 2 0 0 ]
[ 0 0 0 0 ] , [ 0 0 5 4 ] , [ 0 0 1 2 ] , [ 0 1 1 0 2 ] .
[ 0 0 1 4 ]   [ 0 0 0 1 ]   [ 0 1 0 6 ]   [ 0 2 0 1 1 ]

Note that a useful property of matrices in reduced row echelon form


is that if A is an n × n matrix in reduced row echelon form not equal to
identity matrix In , then A has a row consisting entirely of zeros. •

There are usually many sequences of row operations that can be used to
transform a given matrix to reduced row echelon form—they all, however,
lead to the same reduced row echelon form. In the following, we shall
discuss how to transform a given matrix in reduced row echelon form.

Definition 1.21 (Elementary Row Operations)


It is the procedure that can be used to transform a given matrix into row
echelon or reduced row echelon form. An elementary row operation on an
m × n matrix A is any of the following operations:

1. Interchanging two rows of a matrix A;

2. Multiplying a row of A by a nonzero constant;

3. Adding a multiple of a row of A to another row.

Observe that when a matrix is viewed as the augmented matrix of a


linear system, the elementary row operations are equivalent, respectively,
to interchanging two equations, multiplying an equation by a nonzero con-
stant, and adding a multiple of an equation to another equation. •
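
These operations are easy to carry out on the rows of a matrix in MATLAB;
the following small illustrative sketch applies the three operations in sequence
to the matrix of Example 1.2 below:

>> A = [0 0 1 3; 3 2 0 4; 4 4 8 12];
>> A([1 2],:) = A([2 1],:);          % interchange rows 1 and 2
>> A(3,:) = (1/4)*A(3,:);            % multiply row 3 by a nonzero constant
>> A(3,:) = A(3,:) + (-1)*A(2,:)     % add a multiple of row 2 to row 3
A =
3 2 0 4
0 0 1 3
1 1 1 0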

Example 1.2 Consider the matrix


 
0 0 1 3
A =  3 2 0 4 .
4 4 8 12

Interchanging rows 1 and 2 gives


 
3 2 0 4
R1 =  0 0 1 3  .
4 4 8 12

Multiplying the third row of A by 1/4, we get


 
0 0 1 3
R2 =  3 2 0 4 .
1 1 2 3

Adding (−2) times row 2 of A to row 3 of A gives


 
0 0 1 3
R3 =  3 2 0 4 .
−2 0 8 4

Observe that in obtaining R3 from A, row 2 did not change. •

Theorem 1.4 Every matrix can be brought to reduced row echelon form
by a series of elementary row operations. •

Example 1.3 Consider the matrix


 
1 −3 0 0 1
A =  2 −6 −1 1 1 .
3 −9 2 −1 5

Using the finite sequence of elementary row operations, we get the matrix
of the form  
1 −3 0 0 1
R1 =  0 0 1 −1 1  ,
0 0 0 1 0

which is in row echelon form. If we continue with the matrix R1 and make
all elements above the leading one equal to zero, we obtain
 
1 −3 0 0 1
R2 =  0 0 1 0 1 ,
0 0 0 1 0

which is the reduced row echelon form of the given matrix A. •

MATLAB has a function rref used to arrive directly at the reduced
row echelon form of a matrix. For example, using the above given matrix, we
do the following:

>> A = [1 − 3 0 0 1; 2 − 6 − 1 1 1; 3 − 9 2 − 1 5];
>> B = rref (A)
B=
1 −3 0 0 1
0 0 1 0 1
0 0 0 1 0

Definition 1.22 (Row Equivalent Matrix)

An m × n matrix A is said to be row equivalent to an m × n matrix B if B


can be obtained by applying a finite sequence of elementary row operations
to the matrix A. •

Example 1.4 Consider the matrix


 
1 3 6 5
A= 2 1 4 3 .
3 −4 3 4

If we add (−1) times row 1 of A to its third row, we get


 
1 3 6 5
R1 =  2 1 4 3 ,
2 −7 −3 −1

so R1 is row equivalent to A.

Interchanging row 2 and row 3 of the matrix R1 gives the matrix of the
form  
1 3 6 5
R2 =  2 −7 −3 −1  ,
2 1 4 3
so R2 is row equivalent to R1 .

Multiplying row 2 of R2 by (−2), we obtain


 
1 3 6 5
R3 =  −4 14 6 2  ,
2 1 4 3
so R3 is row equivalent to R2 .

It then follows that R3 is row equivalent to the given matrix A since


we obtained the matrix R3 by applying three successive elementary row
operations to A. •

Theorem 1.5
1. Every matrix is row equivalent to itself.
2. If a matrix A is row equivalent to a matrix B, then B is row equivalent
to A.
3. If a matrix A is row equivalent to a matrix B and B is row equivalent
to a matrix C, then A is row equivalent to C. •

Theorem 1.6 Every m × n matrix is row equivalent to a unique matrix


in reduced row echelon form. •

Example 1.5 Use elementary row operations on matrices to solve the lin-
ear system
− x2 + x3 = 1
x1 − x2 − x3 = 1
−x1 + 3x3 = −2.

Solution. The process begins with the augmented matrix form

        [  0  −1   1 |  1 ]
        [  1  −1  −1 |  1 ]
        [ −1   0   3 | −2 ] .

Interchanging the first and the second rows gives

        [  1  −1  −1 |  1 ]
        [  0  −1   1 |  1 ]
        [ −1   0   3 | −2 ] .

Adding (1) times row 1 of the above matrix to its third row, we get

        [  1  −1  −1 |  1 ]
        [  0  −1   1 |  1 ]
        [  0  −1   2 | −1 ] .

Now multiplying the second row by −1 gives

        [  1  −1  −1 |  1 ]
        [  0   1  −1 | −1 ]
        [  0  −1   2 | −1 ] .

Replace row 1 with the sum of itself and (1) times row 2, and then also
replace row 3 with the sum of itself and (1) times row 2, and we get the
matrix of the form

        [  1   0  −2 |  0 ]
        [  0   1  −1 | −1 ]
        [  0   0   1 | −2 ] .

Replace row 1 with the sum of itself and (2) times row 3, and then replace
row 2 with the sum of itself and (1) times row 3, and we get

        [  1   0   0 | −4 ]
        [  0   1   0 | −3 ]
        [  0   0   1 | −2 ] .

Now by writing in equation form and using backward substitution

x1 = −4
x2 = −3
x3 = −2,

and we get the solution [−4, −3, −2]T of the given linear system. •

1.2.4 The Determinant of a Matrix


The determinant is a certain kind of a function that associates a real num-
ber with a square matrix. We will denote the determinant of a square
matrix A by det(A) or |A|.

Definition 1.23 (Determinant of a Matrix)

Let A = (aij ) be an n × n square matrix, then a determinant of A is given


by:

1. det(A) = a11 , if n = 1.

2. det(A) = a11 a22 − a12 a21 , if n = 2. •

For example, if
   
4 2 6 3
A= and B = ,
−3 7 2 5

then

det(A) = (4)(7) − (−3)(2) = 34 and det(B) = (6)(5) − (3)(2) = 24.

Notice that the determinant of a 2 × 2 matrix is given by the difference


of the products of the two diagonals of a matrix. The determinant of a
3 × 3 matrix is defined in terms of the determinants of 2 × 2 matrices, and
the determinant of a 4 × 4 matrix is defined in terms of the determinants
of 3 × 3 matrices, and so on.

The MATLAB function det(A) calculates the determinant of the
square matrix A as:

>> A = [2 2; 6 7];
>> B = det(A)
B=
2.0000
Another way to find the determinants of only 2 × 2 and 3 × 3 matrices
can be found easily and quickly using diagonals (or direct evaluation). For
a 2 × 2 matrix, the determinant can be obtained by forming the product of
the entries on the line from left to right and subtracting from this number
the product of the entries on the line from right to left. For a matrix of
size 3 × 3, the diagonals of an array consisting of the matrix with the first
two columns added to the right are used. Then the determinant can be
obtained by forming the sum of the products of the entries on the lines
from left to right, and subtracting from this number the products of the
entries on the lines from right to left, as shown in Figure (1.2).

Thus, for a 2 × 2 matrix


|A| = a11 a22 − a12 a21
and for 3 × 3 matrix
|A| = a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − a13 a22 a31 − a11 a23 a32 − a12 a21 a33 .
(diagonal products from left to right) (diagonal products from right to left)

For example, the determinant of a 2 × 2 matrix can be computed as



12 5
|A| = = (12)(6) − (5)(−7) = 72 + 35 = 107,
−7 6
and the determinant of a 3 × 3 matrix can be obtained as

4 5 6

|A| = −3 8 2 = [(4)(8)(7) + (5)(2)(4) + (6)(−3)(9)]
4 9 7
− [(6)(8)(4) + (4)(2)(9) + (5)(−3)(7)]

= 102 − 159 = −57.
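
As a quick sanity check, the 3 × 3 value above can be reproduced in MATLAB
both by the diagonal (Sarrus) rule written out explicitly and by the built-in
det function:

>> A = [4 5 6; -3 8 2; 4 9 7];
>> d = A(1,1)*A(2,2)*A(3,3) + A(1,2)*A(2,3)*A(3,1) + A(1,3)*A(2,1)*A(3,2) ...
     - A(1,3)*A(2,2)*A(3,1) - A(1,1)*A(2,3)*A(3,2) - A(1,2)*A(2,1)*A(3,3)
d =
-57
>> det(A)
ans =
-57.0000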



Figure 1.2: Direct evaluation of 2 × 2 and 3 × 3 determinants.

For finding the determinants of the higher-order matrices, we will define


the following concepts called the minor and cofactor of matrices.

Definition 1.24 (Minor of a Matrix)

The minor Mij of an element aij of a matrix A of order n × n is the
determinant of the submatrix of order (n − 1) × (n − 1) obtained from
A by deleting the ith row and jth column (it is also called the ijth minor of A).
For example, let
 
2 3 −1
A= 5 3 2 ,
4 −2 4

then the minor M11 will be obtained by deleting the first row and the first
column of the given matrix A, i.e.,


3 2
M11 = = (3)(4) − (−2)(2) = 12 + 4 = 16.
−2 4

Similarly, we can find the other possible minors of the given matrix as
follows:

M12 = | 5  2 |  =  20 − 8   =  12
      | 4  4 |

M13 = | 5  3 |  =  −10 − 12 = −22
      | 4 −2 |

M21 = | 3 −1 |  =  12 − 2   =  10
      | −2 4 |

M22 = | 2 −1 |  =  8 + 4    =  12
      | 4  4 |

M23 = | 2  3 |  =  −4 − 12  = −16
      | 4 −2 |

M31 = | 3 −1 |  =  6 + 3    =  9
      | 3  2 |

M32 = | 2 −1 |  =  4 + 5    =  9
      | 5  2 |

M33 = | 2  3 |  =  6 − 15   = −9,
      | 5  3 |

which are the required minors of the given matrix. •


Definition 1.25 (Cofactor of a Matrix)

The cofactor Aij of all elements aij of a matrix A of order n × n is given


by
Aij = (−1)i+j Mij ,
where Mij is the minor of all elements aij of a matrix A. For example, the
cofactors Aij of all elements aij of the matrix
 
2 3 −1
A= 5 3 2 
4 −2 4

are computed as follows:

A11 = (−1)1+1 M11 = M11 = 16


A12 = (−1)1+2 M12 = −M12 = −12
A13 = (−1)1+3 M13 = M13 = −22
A21 = (−1)2+1 M21 = −M21 = −10
A22 = (−1)2+2 M22 = M22 = 12
A23 = (−1)2+3 M23 = −M23 = 16
A31 = (−1)3+1 M31 = M31 = 9
A32 = (−1)3+2 M32 = −M32 = −9
A33 = (−1)3+3 M33 = M33 = −9,

which are the required cofactors of the given matrix. •

To get the above results, we use the MATLAB command window as


follows:

>> A = [2 3 -1; 5 3 2; 4 -2 4];
>> CofA = cofactor(A, 1, 1);
>> CofA = cofactor(A, 1, 2);
>> CofA = cofactor(A, 1, 3);
>> CofA = cofactor(A, 2, 1);
>> CofA = cofactor(A, 2, 2);
>> CofA = cofactor(A, 2, 3);
>> CofA = cofactor(A, 3, 1);
>> CofA = cofactor(A, 3, 2);
>> CofA = cofactor(A, 3, 3);

Program 1.2
MATLAB m-file for Finding Minors and Cofactors
of a Matrix
function CofA = cofactor(A,i,j)
% Cofactor of the (i,j) element of a square matrix A
[m,n] = size(A);
if m ~= n, error('Matrix must be square'), end
A1 = A([1:i-1,i+1:n],[1:j-1,j+1:n]);   % delete row i and column j
Minor = det(A1);                       % the minor M_ij
CofA = (-1)^(i+j)*Minor;               % the cofactor A_ij

Definition 1.26 (Cofactor Expansion of a Determinant of


a Matrix)

Let A be a square matrix, then we define the determinant of A as the sum


of the products of the elements of the first row and their cofactors. If A is
a 3 × 3 matrix, then its determinant is defined as

det(A) = |A| = a11 A11 + a12 A12 + a13 A13 .

Similarly, in general, for an n × n matrix, we define it as

                n
det(A) = |A| =  Σ  aij Aij ,    n > 2,                       (1.12)
                1

where the summation is on i for any fixed value of the jth column (1 ≤ j ≤
n), or on j for any fixed value of the ith row (1 ≤ i ≤ n), and Aij is the
cofactor of element aij . •

Example 1.6 Find the minors and cofactors of the matrix A and use them
to evaluate the determinant of the matrix
 
3 1 −4
A= 2 5 6 .
1 4 8

Solution. The minors of A are calculated as follows:

M11 = | 5 6 |  =  40 − 24  =  16
      | 4 8 |

M12 = | 2 6 |  =  16 − 6   =  10
      | 1 8 |

M13 = | 2 5 |  =  8 − 5    =  3.
      | 1 4 |

From these values of the minors, we can calculate the cofactors of the
elements of the given matrix as follows:

A11 = (−1)1+1 M11 = M11 = 16


1+2
A12 = (−1) M12 = −M12 = −10
A13 = (−1)1+3 M13 = M13 = 3.

Now by using the cofactor expansion along the first row, we can find
the determinant of the matrix as follows:

det(A) = a11 A11 +a12 A12 +a13 A13 = (3)(16)+(1)(−10)+(−4)(3) = 26.

Note that in Example 1.6, we computed the determinant of the matrix


by using the cofactor expansion along the first row, but it can also be found
along the first column of the matrix.

To get the results of Example 1.6, we use the MATLAB Command


Window as follows:

>> A = [3 1 -4; 2 5 6; 1 4 8];
>> DetA = CofFexp(A);

Program 1.3
MATLAB m-file for Finding the Determinant of a
Matrix by Cofactor Expansion
function DetA = CofFexp(A)
% Determinant of A by cofactor expansion along the first row
[m,n] = size(A);
if m ~= n, error('Matrix must be square'), end
a = A(1,:); c = [ ];
for i=1:n
c1i = cofactor(A,1,i);   % cofactor of the (1,i) element
c = [c; c1i]; end
DetA = a*c;

Theorem 1.7 (The Laplace Expansion Theorem)

The determinant of an n × n matrix A = {aij }, when n ≥ 2, can be


computed as

                                                  n
det(A) = ai1 Ai1 + ai2 Ai2 + · · · + ain Ain  =   Σ  aij Aij ,
                                                 j=1

which is called the cofactor expansion along the ith row, and also as

                                                  n
det(A) = a1j A1j + a2j A2j + · · · + anj Anj  =   Σ  aij Aij ,
                                                 i=1

and is called the cofactor expansion along the jth column. This is called
the Laplace expansion theorem. •

Note that the cofactor and minor of an element aij differs only in sign,
i.e., Aij = ±Mij . A quick way for determining whether to use the + or −
is to use the fact that the sign relating Aij and Mij is in the ith row and
jth column of the checkerboard array

[ +  −  +  −  +  ··· ]
[ −  +  −  +  −  ··· ]
[ +  −  +  −  +  ··· ] .
[ −  +  −  +  −  ··· ]
[ :  :  :  :  :      ]

For example, A11 = M11 , A21 = −M21 , A12 = −M12 , A22 = M22 , and so on.

Definition 1.27 (Cofactor Matrix)

If A is any n × n matrix and Aij is the cofactor of aij , then the matrix
 
A11 A12 · · · A1n
 A21 A22 · · · A2n 
 
 .. .. .. 
 . . ··· . 
An1 An2 · · · Ann

is called the matrix of the cofactor from A. For example, the cofactor of
the matrix  
3 2 −1
A= 1 6 3 
2 −4 0
can be calculated as follows:

A11 = 12, A12 = 6, A13 = −16


A21 = 4, A22 = 2, A23 = 16
A31 = 12, A32 = −10, A33 = 16.

So that the matrix of the form


 
12 6 −16
 4 2 16 
12 −10 16

is the required cofactor matrix of the given matrix. •



Definition 1.28 (Adjoint of a Matrix)

If A is any n × n matrix and Aij is the cofactor of aij of A, then the


transpose of this matrix is called the adjoint of A and is denoted by Adj(A).
For example, the cofactor matrix of the matrix
 
3 2 −1
A= 1 6 3 
2 −4 0
is calculated as  
12 6 −16
 4 2 16  .
12 −10 16
So by taking its transpose, we get the matrix
 T  
12 6 −16 12 4 12
 4 2 16  =  6 2 −10  = Adj(A),
12 −10 16 −16 16 16
which is called the adjoint of the given matrix A. •
Example 1.7 Find the determinant of the following matrix using cofactor
expansion and show that det(A) = 0 when x = 4:
 
x+2 x 2
A= 1 x − 1 3 .
4 x+1 x
Solution. Using the cofactor expansion along the first row, we compute
the determinant of the given matrix as
|A| = a11 C11 + a12 C12 + a13 C13 ,
where

C11 = (−1)^(1+1) M11 = M11 = | x−1   3 |  =  x^2 − 4x − 3
                             | x+1   x |

C12 = (−1)^(1+2) M12 = −M12 = − | 1  3 |  =  −x + 12
                                | 4  x |

C13 = (−1)^(1+3) M13 = M13 = | 1  x−1 |  =  −3x + 5.
                             | 4  x+1 |

Thus,

|A| = (x + 2)[x2 − 4x − 3] + x[−x + 12] + 2[−3x + 5] = x3 − 3x2 − 5x + 4.

Now taking x = 4, we get

|A| = (4)3 − 3(4)2 − 5(4) + 4 = 64 − 48 − 20 + 4 = 0,

which is the required determinant of the matrix at x = 4. •

The following are special properties, which will be helpful in reducing the
amount of work involved in evaluating determinants.

Theorem 1.8 (Properties of the Determinant)

Let A be an n × n matrix:

1. The determinant of a matrix A is zero if any row or column is zero


or equal to a linear combination of other rows and columns.
For example, if  
3 1 0
A =  2 1 0 ,
4 3 0
then det(A) = 0.

2. A determinant of a matrix A is changed in sign if the two rows or


two columns are interchanged. For example, if
 
3 2
A= ,
4 5

then det(A) = 7, but for the matrix


 
4 5
B= ,
3 2

obtained from the matrix A by interchanging its rows, we have det(B) =


−7.

3. The determinant of a matrix A is equal to the determinant of its


transpose. For example, if
 
5 3
A= ,
4 4

then det(A) = 8, and for the matrix


 
5 4
B= ,
3 4

obtained from the matrix A by taking its transpose, we have

det(B) = 8 = det(A).

5. If the matrix B is obtained from the matrix A by multiplying every


element in one row or in one column by k, then the determinant of
the matrix B is equal to k times the determinant of A. For example,
if  
6 5
A= ,
3 4
then det(A) = 9, but for the matrix
 
12 10
B= ,
3 4

obtained from the matrix A by multiplying its first row by 2, we have

det(B) = 18 = 2(9) = 2 det(A).

6. If the matrix B is obtained from the matrix A by adding to a row (or


a column) a multiple of another row (or another column) of A, then
the determinant of the matrix B is equal to the determinant of A.
For example, if  
4 3
A= ,
5 4

then det(A) = 1, and for the matrix


 
4 3
B= ,
13 10
obtained from the matrix A by adding to its second row 2 times the
first row, we have
det(B) = 1 = det(A).

7. If two rows or two columns of a matrix A are identical, then the


determinant is zero. For example, if
 
2 3
A= ,
2 3
then det(A) = 0.
8. The determinant of a product of matrices is the product of the deter-
minants of all matrices. For example, if
   
3 4 5 1 2 3
A =  3 2 1 , B =  4 2 3 ,
2 1 6 1 3 5
then det(A) = −36 and det(B) = −3. Also,
 
24 29 46
AB =  12 13 20  ,
12 24 39
then det(AB) = 108. Thus,
det(A) det(B) = (−36)(−3) = 108 = det(AB).

9. The determinant of a triangular matrix (upper-triangular or lower-


triangular matrix) is equal to the product of all their main diagonal
elements. For example, if
 
3 4 5
A =  0 4 7 ,
0 0 5

then
det(A) = (3)(4)(5) = 60.

10. The determinant of an n × n matrix A times the scalar multiple k is


equal to k n times the determinant of the matrix A, i.e., det(kA) =
k n det(A). For example, if
 
3 4 5
A =  2 3 6 ,
1 0 5

then det(A) = 14, and for the matrix


 
6 8 10
B = 2A =  4 6 12  ,
2 0 10
obtained from the matrix A by multiplying by 2, we have

det(B) = 112 = 8(14) = 23 det(A).

11. The determinant of the kth power of a matrix A is equal to the kth
power of the determinant of the matrix A, i.e., det(Ak ) = (det(A))k .
For example, if  
2 −2 0
A= 2 3 −1  ,
1 0 1
then det(A) = 12, and for the matrix
 
−18 −30 12
B = A3 =  24 −3 −9  ,
3 −12 3
obtained by taking the cubic power of the matrix A, we have

det(B) = 1728 = (12)3 = (det(A))3 .

12. The determinant of a scalar matrix (1 × 1) is equal to the element


itself. For example, if A = (8), then det(A) = 8.

Example 1.8 Find all the values of α for which det(A) = 0, where
 
α−3 1 0
A= 0 α − 1 1 .
0 2 α

Solution. We find the determinant of the given matrix by using the co-
factor expansion along the first row, so we compute

|A| = a11 C11 + a12 C12 + a13 C13

    = (α − 3) | α−1  1 |  −  1 | 0  1 |  +  0 | 0  α−1 |
              |  2   α |       | 0  α |       | 0   2  |

    = (α − 3)[(α − 1)(α) − 2] − 1[0 − 0] + 0
    = (α − 3)[α^2 − α − 2]
    = (α − 3)(α + 1)(α − 2)

given det(A) = 0, which implies that

|A| = 0
(α − 3)(α + 1)(α − 2) = 0,

which gives
α = −1, α = 2, α = 3,
the required values of α for which det(A) = 0. •

Example 1.9 Find all the values of α such that

det [ 4α    α  ]  =  det [  3  −1    0  ]
    [  1   α+1 ]         [  0   α   −2  ] .
                         [ −1   3   α+1 ]

Solution. Since

det [ 4α    α  ]  =  4α(α + 1) − α,
    [  1   α+1 ]

which is equivalent to

det [ 4α    α  ]  =  4α^2 + 3α.
    [  1   α+1 ]

Also,

det [  3  −1    0  ]
    [  0   α   −2  ]  =  3[α(α + 1) + 6] − (−1)[(0)(α + 1) − 2] + 0[(0)(3) − (−1)(α)],
    [ −1   3   α+1 ]

which can be written as

det [  3  −1    0  ]
    [  0   α   −2  ]  =  3α^2 + 3α + 16.
    [ −1   3   α+1 ]

Given that the two determinants are equal, we get

4α^2 + 3α = 3α^2 + 3α + 16.

Simplifying this quadratic polynomial, we have

α^2 = 16    or    α^2 − 16 = 0,

which gives
α = −4 and α = 4,
the required values of α. •

Example 1.10 Find the determinant of the matrix


 
−5a −5b −5c
A =  2d − g 2e − h 2f − i  ,
2d 2e 2f

if  
a b c
det  d e f  = 4.
g h i

Solution. Using the property of the determinant, we get

|A| = (−5) | a       b       c      |
           | 2d − g  2e − h  2f − i |
           | 2d      2e      2f     | .

Subtracting the third row from the second row gives

|A| = (−5) | a    b    c  |
           | −g   −h   −i |
           | 2d   2e   2f | .

Interchanging the last two rows, we get

|A| = (−5)(−1) | a    b    c  |
               | 2d   2e   2f |
               | −g   −h   −i | ,

or

|A| = (−5)(−1)(2)(−1) | a  b  c |
                      | d  e  f | .
                      | g  h  i |

Since it is given that

| a  b  c |
| d  e  f |  =  4,
| g  h  i |

we have

|A| = (−5)(−1)(2)(−1)(4) = −40,
the required determinant of the given matrix. •

Elimination Method for Evaluating a Determinant

One can easily transform the given determinant into upper-triangular form
by using the following row operations:

1. Add a multiple of one row to another row, and this will not affect
the determinant.

2. Interchange two rows of the determinant, and this will be done by


multiplying the determinant by −1.
After transforming the given determinant into upper-triangular form,
then use the fact that the determinant of a triangular matrix is the product
of its diagonal elements.
Example 1.11 Find the following determinant:

| 3   6   9 |
| 6   2  −7 | .
| −3  1  −1 |

Solution. Multiplying row 1 of the determinant by 1/3 gives

    | 1   2   3 |
(3) | 6   2  −7 | .
    | −3  1  −1 |

Now to create the zeros below the main diagonal, column by column, we
do as follows:

Replace the second row of the determinant with the sum of itself and (−6)
times the first row of the determinant and then replace the third row of
the determinant with the sum of itself and (3) times the first row of the
determinant, which gives

    | 1    2    3  |
(3) | 0  −10  −25  | .
    | 0    7    8  |

Multiplying row 2 of the determinant by −1/10 gives

          | 1   2    3  |
(3)(−10)  | 0   1   5/2 | .
          | 0   7    8  |

Replacing the third row of the determinant with the sum of itself and (−7)
times the second row of the determinant, we obtain

          | 1   2    3    |
(3)(−10)  | 0   1   5/2   |  =  (3)(−10)(1)(1)(−19/2) = 285,
          | 0   0  −19/2  |

which is the required value of the given determinant. •
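
The same elimination idea can be delegated to MATLAB's LU factorization,
where the determinant is the product of the diagonal entries of U together with
the sign of the row interchanges recorded in P; the matrix below is the one
from Example 1.11:

>> A = [3 6 9; 6 2 -7; -3 1 -1];
>> [L,U,P] = lu(A);                 % P*A = L*U with unit lower-triangular L
>> d = det(P)*prod(diag(U))         % sign of the permutation times the pivots
d =
285.0000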

Theorem 1.9 If A is an invertible matrix, then:

1. det(A) ≠ 0.

2. det(A−1 ) = 1 / det(A).

3. A−1 = Adj(A) / det(A).

4. (adj(A))−1 = A / det(A) = adj(A−1 ).

5. det(adj(A)) = det(A)^(n−1) . •
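
These identities are easy to check numerically in MATLAB. The sketch below
uses the 3 × 3 matrix of Example 1.13 that follows, and builds the adjoint from
the relation A−1 = Adj(A)/det(A); both choices are only for illustration:

>> A = [1 2 -1; 2 -1 1; 1 2 2];
>> d = det(A)                    % property 1: nonzero determinant
d =
-15.0000
>> det(inv(A))                   % property 2: equals 1/det(A)
ans =
-0.0667
>> adjA = d*inv(A);              % adjoint recovered from property 3
>> det(adjA)                     % property 5: equals det(A)^(n-1) = 225
ans =
225.0000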

By using Theorem 1.9 we can find the inverse of a matrix by showing


that the determinant of a matrix is not equal to zero and by using the
adjoint and determinant of the given matrix A.

Example 1.12 For what values of α does the following matrix have an
inverse?  
1 0 α
A=  2 2 1 
0 2α 1
Solution. We find the determinant of the given matrix by using cofactor
expansion along the first row as follows:

|A| = a11 C11 + a12 C12 + a13 C13 ,

which is equal to

|A| = (1)C11 + (0)C12 + (α)C13 = C11 + αC13 .



Now we compute the values of C11 and C13 as follows:

C11 = (−1)^(1+1) M11 = M11 = | 2    1 |  =  2 − 2α
                             | 2α   1 |

C13 = (−1)^(1+3) M13 = M13 = | 2    2 |  =  4α.
                             | 0   2α |

Thus,

|A| = C11 + αC13 = 2 − 2α + 4α^2 .

From Theorem 1.9 we know that the matrix has an inverse if det(A) ≠ 0,
so we need

|A| = 4α^2 − 2α + 2 = 2(2α^2 − α + 1) ≠ 0.

The quadratic 2α^2 − α + 1 has a negative discriminant, (−1)^2 − 4(2)(1) = −7 < 0,
so it is never zero for a real α. Hence, the given matrix has an inverse for
every real value of α. •

Example 1.13 Use the adjoint method to compute the inverse of the fol-
lowing matrix:  
1 2 −1
A =  2 −1 1 .
1 2 2
Also, find the inverse and determinant of the adjoint matrix.

Solution. First, we compute the determinant of the given matrix as fol-


lows:
|A| = a11 C11 + a12 C12 + a13 C13 ,
which gives
|A| = (1)(−4) − (2)(3) + (−1)(5) = −15
Now we compute the nine cofactors as follows:
C11 = + | −1  1 |  = −4,    C12 = − | 2  1 |  = −3,    C13 = + | 2  −1 |  =  5,
        |  2  2 |                   | 1  2 |                   | 1   2 |

C21 = − | 2  −1 |  = −6,    C22 = + | 1  −1 |  =  3,    C23 = − | 1  2 |  =  0,
        | 2   2 |                   | 1   2 |                   | 1  2 |

C31 = + | 2  −1 |  =  1,    C32 = − | 1  −1 |  = −3,    C33 = + | 1   2 |  = −5.
        | −1  1 |                   | 2   1 |                   | 2  −1 |
Thus, the cofactor matrix has the form
 
−4 −3 5
 −6 3 0 ,
1 −3 −5
and the adjoint is the transpose of the cofactor matrix
 T  
−4 −3 5 −4 −6 1
adj(A) =  −6 3 0  =  −3 3 −3  .
1 −3 −5 5 0 −5

To get the adjoint of the matrix of Example 1.13, we use the MATLAB
Command Window as follows:

>> A = [1 2 − 1; 2 − 1 1; 1 2 2];
>> AdjA = Adjoint(A);

Program 1.4
MATLAB m-file for Finding the Adjoint of a Matrix
function AdjA = Adjoint(A)
% Adjoint (adjugate) of a square matrix A
[m,n] = size(A);
if m ~= n, error('Matrix must be square'), end
A1 = [ ];
for i = 1:n
for j = 1:n
A1 = [A1; cofactor(A,i,j)]; end; end
AdjA = reshape(A1,n,n);    % transpose of the cofactor matrix

Then by using Theorem 1.9 we can obtain the inverse of the matrix as
follows:

                              [ −4  −6   1 ]     [  4/15   2/5  −1/15 ]
A−1 = Adj(A)/det(A) = −(1/15) [ −3   3  −3 ]  =  [  1/5   −1/5   1/5  ] .
                              [  5   0  −5 ]     [ −1/3     0    1/3  ]

Using Theorem 1.9 we can compute the inverse of the adjoint matrix as

                        [ −1/15  −2/15   1/15 ]
(adj(A))−1 = A/det(A) = [ −2/15   1/15  −1/15 ] ,
                        [ −1/15  −2/15  −2/15 ]

and the determinant of the adjoint matrix as

det(adj(A)) = (det(A))^(3−1) = (−15)^2 = 225.

Now we consider the implementation of finding the inverse of the matrix

      [ 1  −1   1   2 ]
A  =  [ 1   0   1   3 ]
      [ 0   0   2   4 ]
      [ 1   1  −1   1 ]

by using the adjoint and the determinant of the matrix in the MATLAB
Command Window as:

>> A = [1 − 1 1 2; 1 0 1 3; 0 0 2 4; 1 1 − 1 1];

The cofactors Aij of elements of the given matrix A can also be found
directly by using the MATLAB Command Window as follows:

>> A11 = (−1)ˆ (1 + 1) ∗ det(A([2 : 4], [2 : 4]));


>> A12 = (−1)ˆ (1 + 2) ∗ det(A([2 : 4], [1, 3 : 4]));
>> A13 = (−1)ˆ (1 + 3) ∗ det(A([2 : 4], [1 : 2, 4]));
>> A14 = (−1)ˆ (1 + 4) ∗ det(A([2 : 4], [1 : 3]));
>> A21 = (−1)ˆ (2 + 1) ∗ det(A([1, 3 : 4], [2 : 4]));
>> A22 = (−1)ˆ (2 + 2) ∗ det(A([1, 3 : 4], [1, 3 : 4]));
>> A23 = (−1)ˆ (2 + 3) ∗ det(A([1, 3 : 4], [1 : 2, 4]));
>> A24 = (−1)ˆ (2 + 4) ∗ det(A([1, 3 : 4], [1 : 3]));
>> A31 = (−1)ˆ (3 + 1) ∗ det(A([1 : 2, 4], [2 : 4]));
>> A32 = (−1)ˆ (3 + 2) ∗ det(A([1 : 2, 4], [1, 3 : 4]));
>> A33 = (−1)ˆ (3 + 3) ∗ det(A([1 : 2, 4], [1 : 2, 4]));
>> A34 = (−1)ˆ (3 + 4) ∗ det(A([1 : 2, 4], [1 : 3]));
>> A41 = (−1)ˆ (4 + 1) ∗ det(A([1 : 3], [2 : 4]));
>> A42 = (−1)ˆ (4 + 2) ∗ det(A([1 : 3], [1, 3 : 4]));
>> A43 = (−1)ˆ (4 + 3) ∗ det(A([1 : 3], [1 : 2, 4]));
>> A44 = (−1)ˆ (4 + 4) ∗ det(A([1 : 3], [1 : 3]));

Now form the cofactor matrix B using the Aij s as follows:

>> B =
[A11 A12 A13 A14;
A21 A22 A23 A24;
A31 A32 A33 A34;
A41 A42 A43 A44]

which gives

B=
−2 −4 −4 2
6 6 8 −4
−3 −2 −3 2
−2 −2 −4 2

The adjoint matrix is the transpose of the cofactor matrix:



>> adjA = B'

−2 6 −3 −2
−4 6 −2 −2
−4 8 −3 −4
2 −4 2 2
The determinant of the matrix can be obtained as:

>> det(A)
ans =
2
The inverse of A is the adjoint matrix divided by the determinant of A.

>> invA = (1/det(A)) ∗ adjA;


invA =
−1 3 −1.5 −1
−2 3 −1 −1
−2 4 −1.5 −2
1 −2 1 1
Verify the results by finding A−1 directly using the MATLAB command:

>> inv(A)

Example 1.14 If det(A) = 3 and det(B) = 4, then show that

det(A2 B −1 AT B 3 ) = 432.

Solution. By using the properties of the determinant of the matrix, we


have
det(A^2 B^(−1) A^T B^3 ) = det(A^2 ) det(B^(−1) ) det(A^T ) det(B^3 ),

which can also be written as

det(A^2 B^(−1) A^T B^3 ) = (det(A))^2 (1/det(B)) (det(A)) (det(B))^3 .

Now using the given information, we get

det(A^2 B^(−1) A^T B^3 ) = (3)^2 (1/4) (3) (4)^3 = 3^3 · 4^2 = 432,
the required solution. •

1.2.5 Homogeneous Linear Systems


We have seen that every system of linear equations has either no solution,
a unique solution, or infinitely many solutions. However, there is another
type of system that always has at least one solution, i.e., either a unique
solution (called a zero solution or trivial solution) or infinitely many solu-
tions (called nontrivial solutions). Such a system is called a homogeneous
linear system.
Definition 1.29 A system of linear equations is said to be homogeneous
if all the constant terms are zero, i.e.,
Ax = b = 0. (1.13)
For example,
x1 + 2x2 − x3 = 0
2x1 − 3x2 + 3x3 = 0
is a homogeneous linear system. But
x1 + 2x2 − x3 = 0
2x1 − 3x2 + 3x3 = 1
is not a homogeneous linear system.

The general homogeneous system of m linear equations with n unknown


variables x1 , x2 , . . . , xn is

a11 x1 + a12 x2 + · · · + a1n xn = 0


a21 x1 + a22 x2 + · · · + a2n xn = 0
.. .. .. .. .. (1.14)
. . . . .
am1 x1 + am2 x2 + · · · + amn xn = 0.

The system of linear equations (1.14) can be written as the single matrix
equation
    
a11 a12 · · · a1n x1 0
 a21 a22 · · · a2n   x2   0 
..   ..  =  ..  . (1.15)
    
 .. .. ..
 . . . .  .   . 
am1 am2 · · · amn xn 0
If we compute the product of the two matrices on the left-hand side of
(1.15), we have

   
a11 x1 + a12 x2 + · · · + a1n xn 0
 a21 x1 + a22 x2 + · · · + a2n xn   0 
= . (1.16)
   
 .. .. .. .. ..
 . . . .   . 
am1 x1 + am2 x2 + · · · + amn xn 0

But the two matrices are equal if and only if their corresponding elements
are equal. Hence, the single matrix equation (1.15) is equivalent to the
system of the linear equations (1.14). If we define
     
a11 a12 · · · a1n x1 0
 a21 a22 · · · a2n   x2   0 
A =  .. ..  , x =  ..  , b =  ..  ,
     
.. ..
 . . . .   .   . 
am1 am2 · · · amn xn 0

the coefficient matrix, the column matrix of unknowns, and the column
matrix of constants, respectively, then the system (1.14) can be written
very compactly as
Ax = b, (1.17)
which is called the matrix form of the homogeneous system. •

Note that a homogeneous linear system has an augmented matrix of


the form

[A|0].

Theorem 1.10 Every homogeneous linear system Ax = 0 has either ex-


actly one solution or infinitely many solutions. •

Example 1.15 Solve the following homogeneous linear system:

x1 + x2 + 2x3 = 0
2x1 + 3x2 + 4x3 = 0
3x1 + 4x2 + 7x3 = 0.

Solution. Consider the augmented matrix form of the given system as
follows:

          [ 1  1  2 | 0 ]
[A|0]  =  [ 2  3  4 | 0 ]
          [ 3  4  7 | 0 ]

To convert it into reduced echelon form, we first do the elementary row
operations: row2 – (2)row1 and row3 – (3)row1, which gives

       ≡  [ 1  1  2 | 0 ]
          [ 0  1  0 | 0 ]
          [ 0  1  1 | 0 ]

Next, using the elementary row operations: row3 – row2 and row1 – row2,
we get

       ≡  [ 1  0  2 | 0 ]
          [ 0  1  0 | 0 ]
          [ 0  0  1 | 0 ]

Finally, using the elementary row operation: row1 – (2)row3, we obtain

       ≡  [ 1  0  0 | 0 ]
          [ 0  1  0 | 0 ]
          [ 0  0  1 | 0 ]

Thus,

x1 = 0, x2 = 0, x3 = 0

is the only trivial solution of the given system. •

Theorem 1.11 A homogeneous linear system Ax = 0 of m linear equa-


tions with n unknowns, where m < n, has infinitely many solutions. •

Example 1.16 Solve the homogeneous linear system

x1 + 2x2 + x3 = 0
2x1 − 3x2 + 4x3 = 0.

Solution. Consider the augmented matrix form of the given system as

[A|0]  =  [ 1   2  1 | 0 ]
          [ 2  −3  4 | 0 ] .

To convert it into reduced echelon form, we first do the elementary row
operation row2 – 2row1, and we get

       ∼  [ 1   2  1 | 0 ]
          [ 0  −7  2 | 0 ] .

Doing the elementary row operation: −(1/7) row2 gives

       ∼  [ 1   2    1   | 0 ]
          [ 0   1  −2/7  | 0 ] .

Finally, using the elementary row operation row1 – 2row2, we get

       ∼  [ 1   0   11/7 | 0 ]
          [ 0   1  −2/7  | 0 ] .

Writing it in the system of equations form, we have

x1 + 0x2 + (11/7)x3 = 0
0x1 + x2 − (2/7)x3 = 0,

and from it, we get

x1 = −(11/7)x3   and   x2 = (2/7)x3 .

Taking x3 = t, for t ∈ R and t ≠ 0, we get the nontrivial solution

[x1 , x2 , x3 ]T = [−(11/7)t, (2/7)t, t]T .
Thus, the given system has infinitely many solutions, and this is to be
expected because the given system has three unknowns and only two equa-
tions. •
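
In MATLAB the nontrivial solutions of a homogeneous system can be read off
from the rational-basis null space of the coefficient matrix; here the system of
Example 1.16 is used:

>> A = [1 2 1; 2 -3 4];
>> N = null(A, 'r')        % basis for the solution space in rational form
N =
-1.5714
0.2857
1.0000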

Example 1.17 For what values of α does the homogeneous linear system

(α − 2)x1 + x2 = 0
x1 + (α − 2)x2 = 0

have nontrivial solutions?

Solution. The augmented matrix form of the given system is

[A|0]  =  [ (α − 2)      1    | 0 ]
          [    1      (α − 2) | 0 ] .

By interchanging row1 and row2, we get

       ∼  [    1      (α − 2) | 0 ]
          [ (α − 2)      1    | 0 ] .

Doing the elementary row operation: row2 – (α − 2) row1 gives

       ∼  [ 1       (α − 2)     | 0 ]
          [ 0   1 − (α − 2)^2   | 0 ] .

Using backward substitution, we obtain

x1 + (α − 2)x2 = 0
0x1 + [1 − (α − 2)^2 ]x2 = 0.

Notice that if x2 = 0, then x1 = 0, and the given system has a trivial
solution, so let x2 ≠ 0. This implies that

1 − (α − 2)^2 = 0
1 − α^2 + 4α − 4 = 0
α^2 − 4α + 3 = 0
(α − 3)(α − 1) = 0,

which gives
α = 1 and α = 3.
Notice that for these values of α, the given set of equations are identical,
i.e.,
(for α = 1)
−x1 + x2 = 0
x1 − x2 = 0,
and (for α = 3)
x1 + x2 = 0
x1 + x2 = 0.
Thus, the given system has nontrivial solutions (infinitely many solu-
tions) for α = 1 and α = 3. •
The following basic theorems on the solvability of linear systems are
proved in linear algebra.

Theorem 1.12 A homogeneous system of n equations in n unknowns has


a solution other than the trivial solution if and only if the determinant of
the coefficients matrix A vanishes, i.e., matrix A is singular. •

Theorem 1.13 (Necessary and Sufficient Condition for a Unique


Solution)

A nonhomogeneous system of n equations in n unknowns has a unique


solution if and only if the determinant of a coefficients matrix A does not
vanish, i.e., A is nonsingular. •

1.2.6 Matrix Inversion Method


If matrix A is nonsingular, then the linear system (1.6) always has a unique
solution for each b since the inverse matrix A−1 exists, so the solution of
the linear system (1.6) can be formally expressed as

A−1 Ax = A−1 b
Ix = A−1 b

or
x = A−1 b. (1.18)
If A is a square invertible matrix, there exists a sequence of elementary
row operations that carry A to the identity matrix I of the same size, i.e.,
A −→ I. This same sequence of row operations carries I to A−1 , i.e.,
I −→ A−1 . This can also be written as

[A|I] −→ [I|A−1 ].
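
This reduction can be carried out in one MATLAB call by applying rref to the
matrix [A I]; the matrix below is the coefficient matrix of Example 1.18 that
follows:

>> A = [1 2 0; -2 1 2; -1 1 1];
>> R = rref([A eye(3)]);     % reduce [A | I] to [I | inv(A)]
>> Ainv = R(:, 4:6)
Ainv =
1 2 -4
0 -1 2
1 3 -5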

Example 1.18 Use the matrix inversion method to find the solution of
the following linear system:

x1 + 2x2 = 1
−2x1 + x2 + 2x3 = 1
−x1 + x2 + x3 = 1.

Solution. First, we compute the inverse of the given matrix as

      [  1  2  0 ]
A  =  [ −2  1  2 ]
      [ −1  1  1 ]

by reducing A to the identity matrix I by elementary row operations and
then applying the same sequence of operations to I to produce A−1 . Con-
sider the augmented matrix

          [  1  2  0 |  1  0  0 ]
[A|I]  =  [ −2  1  2 |  0  1  0 ]
          [ −1  1  1 |  0  0  1 ]

Multiply the first row by −2 and −1 and then, subtracting the results
from the second and third rows, respectively, we get

       ∼  [  1  2  0 |  1  0  0 ]
          [  0  5  2 |  2  1  0 ]
          [  0  3  1 |  1  0  1 ]

Multiplying the second row by 1/5, we get

       ∼  [  1   2    0  |  1    0    0 ]
          [  0   1   2/5 | 2/5  1/5   0 ]
          [  0   3    1  |  1    0    1 ]

Multiplying the second row by 2 and 3 and then subtracting the results
from the first and third rows, respectively, we get

       ∼  [  1   0  −4/5 |  1/5  −2/5   0 ]
          [  0   1   2/5 |  2/5   1/5   0 ]
          [  0   0  −1/5 | −1/5  −3/5   1 ]

After multiplying the third row by −5, we obtain

       ∼  [  1   0  −4/5 |  1/5  −2/5   0 ]
          [  0   1   2/5 |  2/5   1/5   0 ]
          [  0   0    1  |   1     3   −5 ]

Multiplying the third row by 2/5 and −4/5 and then subtracting the results
from the second and first rows, respectively, we get

       ∼  [  1   0   0 |  1   2  −4 ]
          [  0   1   0 |  0  −1   2 ]
          [  0   0   1 |  1   3  −5 ]

Thus, the inverse of the given matrix is

        [ 1   2  −4 ]
A−1  =  [ 0  −1   2 ] ,
        [ 1   3  −5 ]

and the unique solution of the system can be computed as

              [ 1   2  −4 ] [ 1 ]     [ −1 ]
x = A−1 b  =  [ 0  −1   2 ] [ 1 ]  =  [  1 ] ,
              [ 1   3  −5 ] [ 1 ]     [ −1 ]
i.e.,
x1 = −1, x2 = 1, x3 = −1,
the solution of the given system by the matrix inversion method. •

Thus, when the matrix inverse A−1 of the coefficient matrix A is com-
puted, the solution vector x of the system (1.6) is simply the product of
inverse matrix A−1 and the right-hand side vector b.

Using MATLAB commands, the linear system of equations defined by


the coefficient matrix A and the right-hand side vector b using the matrix
inverse method is solved with:

>> A = [1 2 0; −2 1 2; −1 1 1];
>> b = [1; 1; 1];
>> x = A \ b
x=
−1.0000
1.0000
−1.0000

Theorem 1.14 For an n × n matrix A, the following properties are equiv-


alent:

1. The inverse of matrix A exists, i.e., A is nonsingular.

2. The determinant of matrix A is nonzero.

3. The homogeneous system Ax = 0 has a trivial solution x = 0.

4. The nonhomogeneous system Ax = b has a unique solution. •

Not all matrices have inverses. Singular matrices don’t have inverses
and thus the corresponding systems of equations do not have unique solu-
tions. The inverse of a matrix can also be computed by using the following
numerical methods for linear systems: Gauss-elimination method, Gauss–
Jordan method, and LU decomposition method. But the best and simplest
method for finding the inverse of a matrix is to perform the Gauss–Jordan
method on the augmented matrix with an identity matrix of the same size.

1.2.7 Elementary Matrices


An n × n matrix E is called an elementary matrix if it can be obtained
from the n × n identity matrix In by a single elementary row operation.
For example, the first elementary matrix E1 is obtained by multiplying the
second row of the identity matrix by 6, i.e.,
   
1 0 0 1 0 0
I =  0 1 0  −→  0 6 0  = E1 .
0 0 1 0 0 1

The second elementary matrix E2 is obtained by multiplying the first


row of the identity matrix by −5 and adding it to the third row, i.e.,
   
1 0 0 1 0 0
I =  0 1 0  −→  0 1 0  = E2 .
0 0 1 −5 0 1
Similarly, the third elementary matrix E3 is obtained by interchanging
the second and third rows of the identity matrix, i.e.,

I = [ 1 0 0 ]        [ 1 0 0 ]
    [ 0 1 0 ]  −→    [ 0 0 1 ]  = E3 .
    [ 0 0 1 ]        [ 0 1 0 ]
Notice that elementary matrices are always square.
Theorem 1.15 To perform an elementary row operation on the m × n
matrix A, multiply A on the left by the corresponding elementary matrix.

Example 1.19 Let
 
1 2 3 −5
A= 3 3 2 1 .
4 1 −2 4
Find an elementary matrix E such that EA is the matrix that results
by adding 5 times the first row of A to the third row.

Solution. The matrix E must be 3 × 3 to conform to the product EA. So,


we get E by adding 5 times the first row to the third row. This gives
 
1 0 0
E =  0 1 0 ,
5 0 1
and the product EA is given as
    
1 0 0 1 2 3 −5 1 2 3 −5
EA =  0 1 0   3 3 2 1 = 3 3 2 1 .
5 0 1 4 1 −2 4 9 11 13 −21


Theorem 1.16 An elementary matrix is invertible, and the inverse is also


an elementary matrix. •
Example 1.20 Express the matrix
 
2 3
A=
1 1
as a product of elementary matrices.

Solution. We reduce A to identity matrix I and write the elementary


matrix at each stage, given
 
2 3
A= .
1 1
By interchanging the first and the second rows, we get

E1 A = [ 1  1 ] ,   where   E1 = [ 0  1 ] .
       [ 2  3 ]                  [ 1  0 ]

Multiplying the first row by 2 and subtracting the result from the
second row, we get

E2 (E1 A) = E2 E1 A = [ 1  1 ] ,   where   E2 = [  1  0 ] .
                      [ 0  1 ]                  [ −2  1 ]

Finally, by subtracting the second row from the first row, we get

E3 (E2 E1 A) = E3 E2 E1 A = [ 1  0 ] ,   where   E3 = [ 1  −1 ] .
                            [ 0  1 ]                  [ 0   1 ]

Hence,

E3 E2 E1 A = I,

and so

A = (E3 E2 E1 )−1 .

This means that

A = E1−1 E2−1 E3−1 = [ 0  1 ] [ 1  0 ] [ 1  1 ] .
                     [ 1  0 ] [ 2  1 ] [ 0  1 ]


Theorem 1.17 A square matrix A is invertible if and only if it is a product


of elementary matrices. •

Theorem 1.18 An n × n matrix A is invertible if and only if:

1. It is row equivalent to identity matrix In .

2. Its reduced row echelon form is identity matrix In .

3. It is expressible as a product of elementary matrices.

4. It has n pivots. •

In the following, we will discuss the direct methods for solving the linear
systems.

1.3 Numerical Methods for Linear Systems


To solve systems of linear equations using numerical methods, there are
two types of methods available. The first type of methods are called direct
methods or elimination methods. The other type of numerical methods are
called iterative methods. In this chapter we will discuss only the first type
of the numerical methods, and the other type of the numerical methods
will be discussed in Chapter 2. The first type of methods find the solution
in a finite number of steps. These methods are guaranteed to succeed and
are recommended for general use. Here, we will consider Cramer’s rule, the
Gaussian elimination method and its variants, the Gauss–Jordan method,
and LU decomposition (by Doolittle’s, Crout’s, and Cholesky methods).

1.4 Direct Methods for Linear Systems


This type of method refers to a procedure for computing a solution from a
form that is mathematically exact. We shall begin with a simple method
called Cramer’s rule with determinants. We shall then continue with the
Gaussian elimination method and its variants and methods involving tri-
angular, symmetric, and tridiagonal matrices.

1.4.1 Cramer’s Rule


This is our first direct method for solving linear systems by the use of
determinants. This method is one of the least efficient for solving a large
number of linear equations. It is, however, very useful for explaining some
problems inherent in the solution of linear equations.

Consider a system of two linear equations


a11 x1 + a12 x2 = b1
a21 x1 + a22 x2 = b2 ,
with the condition that a11 a22 − a12 a21 ≠ 0, i.e., the determinant of the
given matrix must not be equal to zero or the matrix must be nonsingular.
Solving the above system using systematic elimination by multiplying the
first equation of the system with a22 and the second equation by a12 and
subtracting gives

(a11 a22 − a12 a21 )x1 = a22 b1 − a12 b2 ,

and now solving for x1 gives

x1 = (a22 b1 − a12 b2) / (a11 a22 − a12 a21),

and putting the value of x1 in any equation of the given system, we have
x2 as

x2 = (a11 b2 − a21 b1) / (a11 a22 − a12 a21).

Then writing it in determinant form, we have

x1 = |A1| / |A|    and    x2 = |A2| / |A|,

where

|A1| = | b1  a12 | ,   |A2| = | a11  b1 | ,   and   |A| = | a11  a12 | .
       | b2  a22 |            | a21  b2 |                 | a21  a22 |
b2 a22 a21 b2 a21 a22

In a similar way, one can use Cramer’s rule for a set of n linear equations
as follows:

xi = |Ai| / |A| ,    i = 1, 2, 3, . . . , n,                    (1.19)
i.e., the solution for any one of the unknown xi in a set of simultaneous
equations is equal to the ratio of two determinants; the determinant in
the denominator is the determinant of the coefficient matrix A, while the
determinant in the numerator is the same determinant with the ith column
replaced by the elements from the right-hand sides of the equation.

Example 1.21 Solve the following system using Cramer’s rule:

5x1 + x3 + 2x4 = 3
x1 + x2 + 3x3 + x4 = 5
x1 + x2 + 2x4 = 1
x1 + x2 + x3 + x4 = −1.

Solution. Writing the given system in matrix form


    
5 0 1 2 x1 3
 1 1 3 1   x2   5 
  = 
 1 1 0 2   x3   1 
1 1 1 1 x4 −1

gives   
5 0 1 2 3
 1 1 3 1   5 
A=
 1
 and b=
 1 .

1 0 2 
1 1 1 1 −1
The determinant of the matrix A can be calculated by using cofactor
expansion as follows:

5 0 1 2

1 1 3 1
|A| =
1 1 0 2

1 1 1 1

= a11 c11 + a12 c12 + a13 c13 + a14 c14 = 5(2) + 0(−2) + 1(0) + 2(0) = 10 ≠ 0,

which shows that the given matrix A is nonsingular. Then the matrices
A1 , A2 , A3 , and A4 can be computed as
   
3 0 1 2 5 3 1 2
 5 1 3 1   1 5 3 1 
A1 =  1 1 0 2 
, A2 = 
 1
,
1 0 2 
−1 1 1 1 1 −1 1 1
   
5 0 3 2 5 0 1 3
 1 1 5 1   1 1 3 5 
A3 =  1 1
, A4 =  .
1 2   1 1 0 1 
1 1 −1 1 1 1 1 −1
The determinant of the matrices A1 , A2 , A3 , and A4 can be computed
as follows:

|A1 | = 3(2) + 0(18) + 1(−6) + 2(−10) = 6 + 0 − 6 − 20 = −20


|A2 | = 5(−18) + 3(−2) + 1(6) + 2(10) = −90 − 6 + 6 + 20 = −70
|A3 | = 5(6) + 0(−6) + 3(0) + 2(0) = 30 + 0 + 0 + 0 = 30
|A4 | = 5(10) + 0(−10) + 1(0) + 3(0) = 50 + 0 + 0 + 0 = 50.

Now applying Cramer's rule, we get

x1 = |A1| / |A| = −20/10 = −2
x2 = |A2| / |A| = −70/10 = −7
x3 = |A3| / |A| =  30/10 =  3
x4 = |A4| / |A| =  50/10 =  5,
which is the required solution of the given system. •

Thus Cramer’s rule is useful in hand calculations only if the determi-


nants can be evaluated easily, i.e., for n = 3 or n = 4. The solution of a
system of n linear equations by Cramer's rule will require about N = (n + 1) n^3 / 3
multiplications. Therefore, this rule is much less efficient for large values of
n and is almost never used for computational purposes. When the number
of equations is large (n > 4), other methods of solution are more desirable.

Use MATLAB commands to find the solution of the above linear sys-
tem by Cramer’s rule as follows:

>> A = [5 0 1 2; 1 1 3 1; 1 1 0 2; 1 1 1 1];
>> b = [3; 5; 1; −1];
>> A1 = [b A(:, [2 : 4])];
>> x1 = det(A1)/det(A);
>> A2 = [A(:, 1) b A(:, [3 : 4])];
>> x2 = det(A2)/det(A);
>> A3 = [A(:, [1 : 2]) b A(:, 4)];
>> x3 = det(A3)/det(A);
>> A4 = [A(:, [1 : 3]) b];
>> x4 = det(A4)/det(A);

Procedure 1.1 (Cramer’s Rule)


1. Form the coefficient matrix A and column matrix b.
2. Compute the determinant of A. If det A = 0, then the system does not
have a unique solution and Cramer's rule cannot be applied; otherwise, go to the next step.
3. Compute the determinant of the new matrix Ai by replacing the ith
matrix with the column vector b.
4. Repeat step 3 for i = 1, 2, . . . , n.
5. Solve for the unknown variables xi using
det(Ai )
xi = , for i = 1, 2, . . . , n.
det(A)

The m-file CRule.m and the following MATLAB commands can be used
to generate the solution of Example 1.21 as follows:

>> A = [5 0 1 2; 1 1 3 1; 1 1 0 2; 1 1 1 1];
>> b = [3; 5; 1; −1];
>> sol = CRule(A, b);

Program 1.5
MATLAB m-file for Cramer's Rule for a Linear System
function sol=CRule(A,b)
[m,n] = size(A);
if m ~= n, error('Matrix is not square.'); end
if det(A) == 0, error('Matrix is singular.'); end
for i = 1:n
B = A; B(:,i) = b;          % replace the ith column of A by b
sol(i) = det(B)/det(A);     % Cramer's rule (1.19)
end
sol = sol';

1.4.2 Gaussian Elimination Method


It is one of the most popular and widely used direct methods for solving
linear systems of algebraic equations. No method of solving linear sys-
tems requires fewer operations than the Gaussian procedure. The goal of
the Gaussian elimination method for solving linear systems is to convert
the original system into the equivalent upper-triangular system from which
each unknown is determined by backward substitution.

The Gaussian elimination procedure starts with forward elimination,


in which the first equation in the linear system is used to eliminate the
first variable from the rest of the (n − 1) equations. Then the new second
equation is used to eliminate the second variable from the rest of the (n−2)
equations, and so on. If (n − 1) such elimination is performed, and the re-
sulting system will be the triangular form. Once this forward elimination
is complete, we can determine whether the system is overdetermined or
underdetermined or has a unique solution. If it has a unique solution, then
backward substitution is used to solve the triangular system easily and one
can find the unknown variables involved in the system.

Now we shall describe the method in detail for a system of n linear


equations. Consider the following system of n linear equations:

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n &= b_1\\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \cdots + a_{2n}x_n &= b_2\\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + \cdots + a_{3n}x_n &= b_3 \qquad (1.20)\\
\vdots \qquad\qquad &\;\;\vdots\\
a_{n1}x_1 + a_{n2}x_2 + a_{n3}x_3 + \cdots + a_{nn}x_n &= b_n.
\end{aligned}
\]

Forward Elimination

Consider the first equation of the given system (1.20)

a11 x1 + a12 x2 + a13 x3 + · · · + a1n xn = b1 (1.21)

as the first pivotal equation with the first pivot element a11 . Then the first
equation times multiples mi1 = (ai1 /a11 ), i = 2, 3, . . . , n is subtracted from
the ith equation to eliminate the first variable x1 , producing an equivalent
system

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n &= b_1\\
a_{22}^{(1)}x_2 + a_{23}^{(1)}x_3 + \cdots + a_{2n}^{(1)}x_n &= b_2^{(1)}\\
a_{32}^{(1)}x_2 + a_{33}^{(1)}x_3 + \cdots + a_{3n}^{(1)}x_n &= b_3^{(1)} \qquad (1.22)\\
\vdots\qquad &\;\;\vdots\\
a_{n2}^{(1)}x_2 + a_{n3}^{(1)}x_3 + \cdots + a_{nn}^{(1)}x_n &= b_n^{(1)}.
\end{aligned}
\]

Now consider a second equation of the system (1.22), which is

\[
a_{22}^{(1)}x_2 + a_{23}^{(1)}x_3 + \cdots + a_{2n}^{(1)}x_n = b_2^{(1)}, \qquad (1.23)
\]
the second pivotal equation with the second pivot element $a_{22}^{(1)}$. Then the second equation times multiples $m_{i2} = a_{i2}^{(1)}/a_{22}^{(1)}$, $i = 3, \ldots, n$ is subtracted from the ith equation to eliminate the second variable $x_2$, producing

an equivalent system
\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n &= b_1\\
a_{22}^{(1)}x_2 + a_{23}^{(1)}x_3 + \cdots + a_{2n}^{(1)}x_n &= b_2^{(1)}\\
a_{33}^{(2)}x_3 + \cdots + a_{3n}^{(2)}x_n &= b_3^{(2)} \qquad (1.24)\\
\vdots\quad &\;\;\vdots\\
a_{n3}^{(2)}x_3 + \cdots + a_{nn}^{(2)}x_n &= b_n^{(2)}.
\end{aligned}
\]

Now consider a third equation of the system (1.24), which is


\[
a_{33}^{(2)}x_3 + \cdots + a_{3n}^{(2)}x_n = b_3^{(2)}, \qquad (1.25)
\]
the third pivotal equation with the third pivot element $a_{33}^{(2)}$. Then the third equation times multiples $m_{i3} = a_{i3}^{(2)}/a_{33}^{(2)}$, $i = 4, \ldots, n$ is subtracted from the ith equation to eliminate the third variable $x_3$. Similarly, after $(n-1)$ such steps, we reach the nth pivotal equation, which has only one unknown variable $x_n$, i.e.,
\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n &= b_1\\
a_{22}^{(1)}x_2 + a_{23}^{(1)}x_3 + \cdots + a_{2n}^{(1)}x_n &= b_2^{(1)}\\
a_{33}^{(2)}x_3 + \cdots + a_{3n}^{(2)}x_n &= b_3^{(2)} \qquad (1.26)\\
\ddots\qquad &\;\;\vdots\\
a_{nn}^{(n-1)}x_n &= b_n^{(n-1)},
\end{aligned}
\]
with the nth pivotal element $a_{nn}^{(n-1)}$. After getting the upper-triangular system, which is equivalent to the original system, the forward elimination is completed.
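The forward elimination just described translates almost directly into code. The following is a minimal MATLAB sketch of it (ours, not one of the book's m-files; the function name is an assumption, and it presumes that no pivot turns out to be zero):

function [U, c] = forward_elim(A, b)
% Reduce Ax = b to an upper-triangular system Ux = c by forward
% elimination, assuming every pivot U(k,k) is nonzero.
n = size(A,1); U = A; c = b;
for k = 1:n-1                       % kth pivotal equation
    for i = k+1:n
        m = U(i,k)/U(k,k);          % multiplier m_ik
        U(i,k:n) = U(i,k:n) - m*U(k,k:n);
        c(i) = c(i) - m*c(k);
    end
end
end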

Backward Substitution

After the triangular set of equations has been obtained, the last equation of
system (1.26) yields the value of xn directly. The value is then substituted
into the equation next to the last one of the system (1.26) to obtain a value
of xn−1 , which is, in turn, used along with the value of xn in the second

to the last equation to obtain a value of xn−2 , and so on. A mathematical


formula can be obtained for the backward substitution:
\[
\left.\begin{aligned}
x_n &= \frac{b_n^{(n-1)}}{a_{nn}^{(n-1)}}\\[4pt]
x_{n-1} &= \frac{1}{a_{n-1,n-1}^{(n-2)}}\Bigl(b_{n-1}^{(n-2)} - a_{n-1,n}^{(n-2)}x_n\Bigr)\\
&\;\;\vdots\\
x_1 &= \frac{1}{a_{11}}\Bigl(b_1 - \sum_{j=2}^{n} a_{1j}x_j\Bigr)
\end{aligned}\right\} \qquad (1.27)
\]
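Formula (1.27) can be coded directly. A minimal MATLAB sketch (ours, not one of the book's programs) for an upper-triangular system Ux = c is:

function x = back_sub(U, c)
% Solve the upper-triangular system Ux = c by backward substitution (1.27).
n = length(c); x = zeros(n,1);
x(n) = c(n)/U(n,n);
for i = n-1:-1:1
    x(i) = (c(i) - U(i,i+1:n)*x(i+1:n))/U(i,i);
end
end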

The Gaussian elimination can be carried out by writing only the co-
efficients and the right-hand side terms in a matrix form, the augmented
matrix form. Indeed, this is exactly what a computer program for Gaus-
sian elimination does. Even for hand calculations, the augmented matrix
form is more convenient than writing all sets of equations. The augmented
matrix is formed as follows:
\[
\left[\begin{array}{ccccc|c}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & b_1\\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} & b_2\\
a_{31} & a_{32} & a_{33} & \cdots & a_{3n} & b_3\\
\vdots & \vdots & \vdots & & \vdots & \vdots\\
a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} & b_n
\end{array}\right]. \qquad (1.28)
\]

The operations used in the Gaussian elimination method can now be


applied to the augmented matrix. Consequently, system (1.26) is now
written directly as
\[
\left[\begin{array}{ccccc|c}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & b_1\\
 & a_{22}^{(1)} & a_{23}^{(1)} & \cdots & a_{2n}^{(1)} & b_2^{(1)}\\
 & & a_{33}^{(2)} & \cdots & a_{3n}^{(2)} & b_3^{(2)}\\
 & & & \ddots & \vdots & \vdots\\
 & & & & a_{nn}^{(n-1)} & b_n^{(n-1)}
\end{array}\right], \qquad (1.29)
\]

from which the unknowns are determined as before by using backward


substitution. The number of multiplications and divisions for the Gaussian
elimination method for one b vector is approximately
\[
N = \frac{n^3}{3} + n^2 - \frac{n}{3}. \qquad (1.30)
\]

Simple Gaussian Elimination Method


First, we will solve the linear system using the simplest variation of the
Gaussian elimination method, called simple Gaussian elimination or Gaus-
sian elimination without pivoting. The basics of this variation is that all
possible diagonal elements (called pivot elements) should be nonzero. If
at any stage an element becomes zero, then interchange that row with
any row below with a nonzero element at that position. After getting the
upper-triangular matrix, we use backward substitution to get the solution
of the given linear system.
Example 1.22 Solve the following linear system using the simple Gaus-
sian elimination method:
x1 + 2x2 + x3 = 2
2x1 + 5x2 + 2x3 = 1
x1 + 3x2 + 4x3 = 5.

Solution. The process begins with the augmented matrix form
\[
\left[\begin{array}{ccc|c} 1 & 2 & 1 & 2\\ 2 & 5 & 3 & 1\\ 1 & 3 & 4 & 5 \end{array}\right].
\]

Since $a_{11} = 1 \neq 0$, we wish to eliminate the elements $a_{21}$ and $a_{31}$ by subtracting from the second and third rows the appropriate multiples of the first row. In this case, the multiples are given as
\[
m_{21} = \frac{2}{1} = 2 \quad\text{and}\quad m_{31} = \frac{1}{1} = 1.
\]
Hence,

\[
\left[\begin{array}{ccc|c} 1 & 2 & 1 & 2\\ 0 & 1 & 1 & -3\\ 0 & 1 & 3 & 3 \end{array}\right].
\]
Since $a_{22}^{(1)} = 1 \neq 0$, we eliminate the entry in the $a_{32}^{(1)}$ position by subtracting the multiple $m_{32} = \frac{1}{1} = 1$ of the second row from the third row to get
\[
\left[\begin{array}{ccc|c} 1 & 2 & 1 & 2\\ 0 & 1 & 1 & -3\\ 0 & 0 & 2 & 6 \end{array}\right].
\]
Obviously, the original set of equations has been transformed to an
upper-triangular form. Since all the diagonal elements of the resulting
upper-triangular matrix are nonzero, the coefficient matrix of the given
system is nonsingular, and the given system has a unique solution. Now
expressing the set in algebraic form yields
x1 + 2x2 + x3 = 2
x2 + x3 = −3
2x3 = 6.
Now using backward substitution, we get
2x3 = 6 gives x3 = 3
x2 = −x3 − 3 = −(3) − 3 = −6 gives x2 = −6
x1 = 2 − 2x2 − x3 = 2 − 2(−6) − 3 gives x1 = 11,
which is the required solution of the given system. •
The above results can be obtained using MATLAB commands as fol-
lows:

>> B = [1 2 1 2; 2 5 3 1; 1 3 4 5];
%B = [A|b] = Augmented matrix
>> x = WP(B);
>> disp(x)

Program 1.6
MATLAB m-file for Simple Gaussian Elimination Method
function x=WP(B)
% B is the augmented matrix [A|b]
[n,t]=size(B); U=B;
% forward elimination (no pivoting)
for k=1:n-1; for i=k:n-1; m=U(i+1,k)/U(k,k);
for j=1:t; U(i+1,j)=U(i+1,j)-m*U(k,j); end; end; end
% backward substitution
x(n,1)=U(n,t)/U(n,n);
for i=n-1:-1:1; s=0;
for k=n:-1:i+1; s = s + U(i,k)*x(k,1); end
x(i,1)=(U(i,t)-s)/U(i,i); end
end

In the simple description of Gaussian elimination without pivoting just


given, we used the kth equation to eliminate the variable xk from equations
k + 1, . . . , n during the kth step of the procedure. This is possible only if
at the beginning of the kth step, the coefficient $a_{kk}^{(k-1)}$ of xk in equation k is
not zero. Even though these coefficients are used as denominators both in
the multipliers mij and in the backward substitution equations, this does
not necessarily mean that the linear system is not solvable, but that the
procedure of the solution must be altered.

Example 1.23 Solve the following linear system using the simple Gaus-
sian elimination method:
x2 + x3 = 1
x1 + 2x2 + 2x3 = 1
2x1 + x2 + 2x3 = 3.

Solution. Write the given system in augmented matrix form:
\[
\left[\begin{array}{ccc|c} 0 & 1 & 1 & 1\\ 1 & 2 & 2 & 1\\ 2 & 1 & 2 & 3 \end{array}\right].
\]

To solve this system, the simple Gaussian elimination method will fail
immediately because the element in the first row on the leading diagonal,
the pivot, is zero. Thus, it is impossible to divide that row by the pivot

value. Clearly, this difficulty can be overcome by rearranging the order of


the rows; for example, making the first row the second gives
\[
\left[\begin{array}{ccc|c} 1 & 2 & 2 & 1\\ 0 & 1 & 1 & 1\\ 2 & 1 & 2 & 3 \end{array}\right].
\]
Now we use the usual elimination process. The first elimination step is to eliminate the element $a_{31} = 2$ from the third row by subtracting the multiple $m_{31} = \frac{2}{1} = 2$ of row 1 from row 3, which gives
\[
\left[\begin{array}{ccc|c} 1 & 2 & 2 & 1\\ 0 & 1 & 1 & 1\\ 0 & -3 & -2 & 1 \end{array}\right].
\]
We are finished with the first elimination step since the element $a_{21}$ is already eliminated from the second row. The second elimination step is to eliminate the element $a_{32}^{(1)} = -3$ from the third row by subtracting the multiple $m_{32} = \frac{-3}{1} = -3$ of row 2 from row 3, which gives
\[
\left[\begin{array}{ccc|c} 1 & 2 & 2 & 1\\ 0 & 1 & 1 & 1\\ 0 & 0 & 1 & 4 \end{array}\right].
\]
Obviously, the original set of equations has been transformed to an upper-
triangular form. Now expressing the set in algebraic form yields
x1 + 2x2 + 2x3 = 1
x2 + x3 = 1
x3 = 4.
Now using backward substitution, we get
x3 = 4

x2 = 1 − x3 = 1 − 4 = −3

x1 = 1 − 2x2 − 2x3 = 1 − 2(−3) − 2(4) = −1,



the solution of the given system. •

Example 1.24 Solve the following linear system using the simple Gaus-
sian elimination method:
x1 + x2 + x3 = 3
2x1 + 2x2 + 3x3 = 7
x1 + 2x2 + 3x3 = 6.

Solution. Write the given system in augmented matrix form:
\[
\left[\begin{array}{ccc|c} 1 & 1 & 1 & 3\\ 2 & 2 & 3 & 7\\ 1 & 2 & 3 & 6 \end{array}\right].
\]
The first elimination step is to eliminate the elements $a_{21} = 2$ and $a_{31} = 1$ from the second and third rows by subtracting the multiples $m_{21} = \frac{2}{1} = 2$ and $m_{31} = \frac{1}{1} = 1$ of row 1 from row 2 and row 3, respectively, which gives
\[
\left[\begin{array}{ccc|c} 1 & 1 & 1 & 3\\ 0 & 0 & 1 & 1\\ 0 & 1 & 2 & 3 \end{array}\right].
\]

We finished the first elimination step. To start the second elimination


step, since we know that the element $a_{22}^{(1)} = 0$, called the second pivot element, the simple Gaussian elimination cannot continue in its present form. Therefore, we interchange rows 2 and 3 to get
\[
\left[\begin{array}{ccc|c} 1 & 1 & 1 & 3\\ 0 & 1 & 2 & 3\\ 0 & 0 & 1 & 1 \end{array}\right].
\]
We have finished with the second elimination step since the element $a_{32}^{(1)}$ is already eliminated from the third row. Obviously, the original set of equa-
tions has been transformed to an upper-triangular form. Now expressing

the set in algebraic form yields


x1 + x2 + x3 = 3
x2 + 2x3 = 3
x3 = 1.
Now using backward substitution, we get
x3 = 1, x2 = 1, x1 = 1,
the solution of the system. •
Example 1.25 Using the simple Gaussian elimination method, find all
values of a and b for which the following linear system is consistent or
inconsistent:
2x1 − x2 + 3x3 = 1
4x1 + 2x2 + 2x3 = 2a
2x1 + x2 + x3 = b.
Solution. Write the given system in augmented matrix form:
\[
\left[\begin{array}{ccc|c} 2 & -1 & 3 & 1\\ 4 & 2 & 2 & 2a\\ 2 & 1 & 1 & b \end{array}\right],
\]
in which we wish to eliminate the elements $a_{21}$ and $a_{31}$ by subtracting from the second and third rows the appropriate multiples of the first row. In this case, the multiples are given as
\[
m_{21} = \frac{4}{2} = 2 \quad\text{and}\quad m_{31} = \frac{2}{2} = 1.
\]
Hence,
\[
\left[\begin{array}{ccc|c} 2 & -1 & 3 & 1\\ 0 & 4 & -4 & 2a-2\\ 0 & 2 & -2 & b-1 \end{array}\right].
\]
We have finished the first elimination step. The second elimination step is to eliminate the element $a_{32}^{(1)} = 2$ by subtracting the multiple $m_{32} = \frac{2}{4} = \frac{1}{2}$ of row 2 from row 3, which gives
\[
\left[\begin{array}{ccc|c} 2 & -1 & 3 & 1\\ 0 & 4 & -4 & 2a-2\\ 0 & 0 & 0 & b-a \end{array}\right].
\]

We finished the second column. So the third row of the equivalent upper-
triangular system is
0x1 + 0x2 + 0x3 = b − a. (1.31)
First, if b = a, then (1.31) places no constraint on the unknowns x1, x2, and x3, and the upper-triangular system represents only two nontrivial equations, namely,
\[
\begin{aligned}
2x_1 - x_2 + 3x_3 &= 1\\
4x_2 - 4x_3 &= 2a - 2
\end{aligned}
\]
in the three unknowns. As a result, one of the unknowns can be chosen arbitrarily, say $x_3 = x_3^*$; then $x_2^*$ and $x_1^*$ can be obtained by using backward substitution:
\[
x_2^* = \tfrac{1}{2}(a-1) + x_3^*; \qquad x_1^* = \tfrac{1}{2}\Bigl(1 + \tfrac{1}{2}(a-1) - 2x_3^*\Bigr).
\]
Hence,
\[
x^* = \Bigl[\tfrac{1}{2}\Bigl(1 + \tfrac{1}{2}(a-1) - 2x_3^*\Bigr),\; \tfrac{1}{2}(a-1) + x_3^*,\; x_3^*\Bigr]^T
\]
is a solution of the given system for any value of $x_3^*$ and any real value of a. Hence, the given linear system is consistent (infinitely many solutions).

Second, when b − a ≠ 0, (1.31) puts a restriction on the


unknowns x1 , x2 , and x3 that is impossible to satisfy. So the given system
cannot have any solutions and, therefore, is inconsistent. •
Example 1.26 Solve the following homogeneous linear system using the
simple Gaussian elimination method:
x1 + x2 − 2x3 = 0
2x1 + 4x2 − 3x3 = 0
3x1 + 7x2 − 5x3 = 0.
Solution. The process begins with the augmented matrix form
\[
\left[\begin{array}{ccc|c} 1 & 1 & -2 & 0\\ 2 & 4 & -3 & 0\\ 3 & 7 & -5 & 0 \end{array}\right].
\]
Using the following multiples,
\[
m_{21} = \frac{2}{1} = 2 \quad\text{and}\quad m_{31} = \frac{3}{1} = 3,
\]
finishes the first elimination step, and we get
\[
\left[\begin{array}{ccc|c} 1 & 1 & -2 & 0\\ 0 & 2 & 1 & 0\\ 0 & 4 & 1 & 0 \end{array}\right].
\]
Then subtracting the multiple $m_{32} = \frac{4}{2} = 2$ of the second row from the third row, we get
\[
\left[\begin{array}{ccc|c} 1 & 1 & -2 & 0\\ 0 & 2 & 1 & 0\\ 0 & 0 & -1 & 0 \end{array}\right].
\]
Obviously, the original set of equations has been transformed to an
upper-triangular form. Thus, the system has the unique solution [0, 0, 0]T ,
i.e., the system has only the trivial solution. •

Example 1.27 Find the value of k for which the following homogeneous
linear system has nontrivial solutions by using the simple Gaussian elimi-
nation method:
2x1 − 3x2 + 5x3 = 0
−2x1 + 6x2 − x3 = 0
4x1 − 9x2 + kx3 = 0.
Solution. The process begins with the augmented matrix form
\[
\left[\begin{array}{ccc|c} 2 & -3 & 5 & 0\\ -2 & 6 & -1 & 0\\ 4 & -9 & k & 0 \end{array}\right],
\]
and then using the following multiples,
\[
m_{21} = \frac{-2}{2} = -1 \quad\text{and}\quad m_{31} = \frac{4}{2} = 2,
\]
which gives
\[
\left[\begin{array}{ccc|c} 2 & -3 & 5 & 0\\ 0 & 3 & 4 & 0\\ 0 & -3 & k-10 & 0 \end{array}\right].
\]
Also, by using the multiple $m_{32} = \frac{-3}{3} = -1$, we get
\[
\left[\begin{array}{ccc|c} 2 & -3 & 5 & 0\\ 0 & 3 & 4 & 0\\ 0 & 0 & k-6 & 0 \end{array}\right].
\]

From the last row of the above system, we obtain

k − 6 = 0, which gives k = 6.

Also, solving the above underdetermined system

2x1 − 3x2 + 5x3 = 0


3x2 + 4x3 = 0

by taking x3 = 1, we have the nontrivial solutions

x∗ = α[−9/2, −4/3, 1]T , for α 6= 0.

Note that if we put x3 = 0, for example, we obtain the trivial solution


[0, 0, 0]T . •

Theorem 1.19 An upper-triangular matrix A is nonsingular if and only


if all its diagonal elements are not zero. •

Example 1.28 Use the simple Gaussian elimination method to find all
the values of α which make the following matrix singular:
 
1 −1 α
A= 2 2 1 .
0 α −1.5

Solution. Apply the forward elimination step of the simple Gaussian elim-
ination on the given matrix A and eliminate the element a21 by subtracting
from the second row the appropriate multiple of the first row. In this case,
the multiple is given as
 
1 −1 α
 0 4 1 − 2α  .
0 α −1.5
We finished the first elimination step. The second elimination step is to eliminate the element $a_{32}^{(1)} = \alpha$ by subtracting the multiple $m_{32} = \frac{\alpha}{4}$ of row 2 from row 3, which gives
\[
\begin{bmatrix} 1 & -1 & \alpha\\ 0 & 4 & 1-2\alpha\\ 0 & 0 & -1.5 - \dfrac{\alpha(1-2\alpha)}{4} \end{bmatrix}.
\]
To show that the given matrix is singular, we have to set the third diagonal element equal to zero (by Theorem 1.19), i.e.,
\[
-1.5 - \frac{\alpha(1-2\alpha)}{4} = 0.
\]
After simplifying, we obtain
2α2 − α − 6 = 0.
Solving the above quadratic equation, we get
3
α=− and α = 2,
2
which are the possible values of α, which make the given matrix singular.•

Example 1.29 Use the smallest positive integer value of α to find the
unique solution of the linear system Ax = [1, 6, −4]T by the simple Gaus-
sian elimination method, where
 
1 −1 α
A= 2 2 1 .
0 α −1.5

Solution. Since we know from Example 1.28 that the given matrix A is singular when $\alpha = -\frac{3}{2}$ and $\alpha = 2$, to find the unique solution we take the smallest positive integer value $\alpha = 1$ and consider the augmented matrix
\[
\left[\begin{array}{ccc|c} 1 & -1 & 1 & 1\\ 2 & 2 & 1 & 6\\ 0 & 1 & -1.5 & -4 \end{array}\right].
\]
Applying the forward elimination step of the simple Gaussian elimination on the given matrix A and eliminating the element $a_{21}$ by subtracting from the second row the appropriate multiple $m_{21} = 2$ of the first row gives
\[
\left[\begin{array}{ccc|c} 1 & -1 & 1 & 1\\ 0 & 4 & -1 & 4\\ 0 & 1 & -1.5 & -4 \end{array}\right].
\]
The second elimination step is to eliminate the element $a_{32}^{(1)} = 1$ by subtracting the multiple $m_{32} = \frac{1}{4}$ of row 2 from row 3, which gives
\[
\left[\begin{array}{ccc|c} 1 & -1 & 1 & 1\\ 0 & 4 & -1 & 4\\ 0 & 0 & -5/4 & -5 \end{array}\right].
\]

Now expressing the set in algebraic form yields

x1 − x2 + x3 = 1
4x2 − x3 = 4
−5/4x3 = −5.

Using backward substitution, we obtain

x3 = 4, x2 = 2, x1 = −1,

the unique solution of the given system. •



Note that the inverse of a nonsingular matrix A can be easily determined by using the simple Gaussian elimination method. Here, we have to consider the augmented matrix as a combination of the given matrix A and the identity matrix I (the same size as A). To find the inverse matrix B = A−1, we must solve n linear systems, in which the jth column of the matrix B is the solution of the linear system whose right-hand side is the jth column of the matrix I.
Example 1.30 Use the simple Gaussian elimination method to find the
inverse of the following matrix:
 
2 −1 3
A =  4 −1 6  .
2 −3 4

Solution. Suppose that the inverse A−1 = B of the given matrix exists
and let
    
2 −1 3 b11 b12 b13 1 0 0
AB =  4 −1 6   b21 b22 b23  =  0 1 0  = I.
2 −3 4 b31 b32 b33 0 0 1

Now to find the elements of the matrix B, we apply simple Gaussian


elimination on the augmented matrix:
\[
[A|I] = \left[\begin{array}{ccc|ccc} 2 & -1 & 3 & 1 & 0 & 0\\ 4 & -1 & 6 & 0 & 1 & 0\\ 2 & -3 & 4 & 0 & 0 & 1 \end{array}\right].
\]

Apply the forward elimination step of the simple Gaussian elimination on the given matrix A and eliminate the elements $a_{21} = 4$ and $a_{31} = 2$ by subtracting from the second and the third rows the appropriate multiples $m_{21} = \frac{4}{2} = 2$ and $m_{31} = \frac{2}{2} = 1$ of the first row. It gives
\[
\left[\begin{array}{ccc|ccc} 2 & -1 & 3 & 1 & 0 & 0\\ 0 & 1 & 0 & -2 & 1 & 0\\ 0 & -2 & 1 & -1 & 0 & 1 \end{array}\right].
\]

We finished the first elimination step. The second elimination step is to eliminate the element $a_{32}^{(1)} = -2$ by subtracting the multiple $m_{32} = \frac{-2}{1} = -2$ of row 2 from row 3, which gives
\[
\left[\begin{array}{ccc|ccc} 2 & -1 & 3 & 1 & 0 & 0\\ 0 & 1 & 0 & -2 & 1 & 0\\ 0 & 0 & 1 & -5 & 2 & 1 \end{array}\right].
\]

We solve the first system


    
2 −1 3 b11 1
 0 1 0   b21  =  −2 
0 0 1 b31 −5

by using backward substitution, and we get

2b11 − b21 + 3b31 = 1


b21 = −2
b31 = −5,

which gives
b11 = 7, b21 = −2, b31 = −5.
Similarly, the solution of the second linear system
    
2 −1 3 b12 0
 0 1 0   b22  =  1 
0 0 1 b32 2

can be obtained as follows:

2b12 − b22 + 3b32 = 0


b22 = 1
b32 = 2,

which gives
b12 = −5/2, b22 = 1, b32 = 2.

Finally, the solution of the third linear system


    
2 −1 3 b13 0
 0 1 0   b23  =  0 
0 0 1 b33 1

can be obtained as follows:


2b13 − b23 + 3b33 = 0
b23 = 0
b33 = 1,

and it gives
b13 = −3/2, b23 = 0, b33 = 1.
Hence, the elements of the inverse matrix B are
\[
B = A^{-1} = \begin{bmatrix} 7 & -\frac{5}{2} & -\frac{3}{2}\\ -2 & 1 & 0\\ -5 & 2 & 1 \end{bmatrix},
\]

which is the required inverse of the given matrix A. •

Procedure 1.2 (Gaussian Elimination Method)


1. Form the augmented matrix, B = [A|b].

2. Check that the first pivot element a11 ≠ 0, then move to the next step; otherwise, interchange rows so that a11 ≠ 0.

3. Multiply row one by the multiplier mi1 = ai1/a11 and subtract it from the ith row, for i = 2, 3, . . . , n.

4. Repeat steps 2 and 3 for the remaining pivot elements until the coefficient matrix A becomes an upper-triangular matrix U.

5. Use backward substitution to solve for xn from the nth equation, $x_n = b_n^{(n-1)}/a_{nn}^{(n-1)}$, and solve for the other (n − 1) unknown variables by using (1.27).

We now introduce the most important numerical quantity associated with


a matrix.

Definition 1.30 (Rank of a Matrix)

The rank of a matrix A is the number of pivots. An m × n matrix will,


in general, have a rank r, where r is an integer and r ≤ min{m, n}. If
r = min{m, n}, then the matrix is said to be full rank. If r < min{m, n},
then the matrix is said to be rank deficient. •

In principle, the rank of a matrix can be determined by using the Gaus-


sian elimination process in which the coefficient matrix A is reduced to
upper-triangular form U . After reducing the matrix to triangular form, we
find that the rank is the number of columns with nonzero values on the
diagonal of U . In practice, especially for large matrices, round-off errors
during the row operation may cause a loss of accuracy in this method of
rank computation.
Theorem 1.20 For a system of n equations with n unknowns written in
the form Ax = b, the solution x of a system exists and is unique for any
b, if and only if rank(A) = n. •
Conversely, if rank(A) < n for an n × n matrix A, then the system of
equations Ax = b may or may not be consistent. Such a system may not
have a solution, or the solution, if it exists, will not be unique.
Example 1.31 Find the rank of the following matrix:
 
1 2 4
A= 1 1 5 .
1 1 6

Solution. Apply the forward elimination step of simple Gaussian elimina-


tion on the given matrix A and eliminate the elements below the first pivot
(first diagonal element) to
 
1 2 4
 0 −1 1  .
0 −1 2

We finished the first elimination step. The second pivot is in the (2, 2)
position, but after eliminating the element below it, we find the triangular
form to be
\[
\begin{bmatrix} 1 & 2 & 4\\ 0 & -1 & 1\\ 0 & 0 & 1 \end{bmatrix}.
\]
Since the number of pivots is three, the rank of the given matrix is 3. Note
that the original matrix is nonsingular since the rank of the 3 × 3 matrix
is 3. •

In MATLAB, the built-in rank function can be used to estimate the


rank of a matrix:

>> A = [1 2 4; 1 1 5; 1 1 6];
>> rank(A)
ans =
3
Note that:
rank(AB) ≤ min(rank(A), rank(B))
rank(A + B) ≤ rank(A) + rank(B)
rank(AAT ) = rank(A) = rank(AT A)
Although the rank of a matrix is very useful to categorize the behavior
of matrices and systems of equations, the rank of a matrix is usually not
computed. •
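As an illustration of the remark above, the rank can be estimated by counting the nonzero pivots after elimination. The following rough MATLAB sketch is ours (the tolerance is an assumption; in floating point such a tolerance is unavoidable, which is exactly why the built-in rank function relies on the SVD instead):

A = [1 2 4; 1 1 5; 1 1 6];
U = A; n = size(A,1); tol = 1e-10;
for k = 1:n-1
    if abs(U(k,k)) > tol                          % skip a (near) zero pivot
        for i = k+1:n
            U(i,k:n) = U(i,k:n) - (U(i,k)/U(k,k))*U(k,k:n);
        end
    end
end
r = sum(abs(diag(U)) > tol)                       % number of nonzero pivots, here 3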

The use of nonzero pivots is sufficient for the theoretical correctness


of simple Gaussian elimination, but more care must be taken if one is to
obtain reliable results. For example, consider the linear system

0.000100x1 + x2 = 1
x1 + x2 = 2,

which has the exact solution x = [1.00010, 0.99990]T . Now we solve this
system by simple Gaussian elimination. The first elimination step is to
eliminate the first variable x1 from the second equation by subtracting

multiple m21 = 10000 of the first equation from the second equation, which
gives
0.000100x1 + x2 = 1
− 10000x2 = −10000.

Using backward substitution we get the solution x∗ = [0, 1]T . Thus, a


computational disaster has occurred. But if we interchange the equations,
we obtain
x1 + x2 = 2
0.000100x1 + x2 = 1.

Applying Gaussian elimination again, we get the solution x∗ = [1, 1]T .


This solution is as good as one would hope. So, we conclude from this
example that it is not enough just to avoid a zero pivot, one must also
avoid a relatively small one. Here we need some pivoting strategies to help
us overcome the difficulties faced during the process of simple Gaussian
elimination.
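The effect described above can be reproduced numerically by imitating three-significant-digit arithmetic. The sketch below is ours, not part of the text; the chop helper and the choice of three digits are assumptions made only to mimic limited-precision hand computation:

% keep only d significant digits of x (a crude rounding model)
chop = @(x,d) 10.^(floor(log10(abs(x)))-d+1).*round(x./10.^(floor(log10(abs(x)))-d+1));

a11 = 1.0e-4; a12 = 1; b1 = 1;      % 0.000100*x1 + x2 = 1
a21 = 1;      a22 = 1; b2 = 2;      %          x1 + x2 = 2

m21  = chop(a21/a11, 3);            % 10000
a22p = chop(a22 - m21*a12, 3);      % 1 - 10000 rounds to -10000: the 1 is lost
b2p  = chop(b2  - m21*b1 , 3);      % 2 - 10000 rounds to -10000: the 2 is lost
x2   = chop(b2p/a22p, 3);           % 1
x1   = (b1 - a12*x2)/a11            % 0, far from the true value 1.00010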

1.4.3 Pivoting Strategies


We know that simple Gaussian elimination is applied to a problem with
no pivotal elements that are zero, but the method does not work if the
first coefficient of the first equation or a diagonal coefficient becomes zero
in the process of the solution, because they are used as denominators in a
forward elimination.
Pivoting is used to change the sequential order of the equations for two
purposes; first to prevent diagonal coefficients from becoming zero, and sec-
ond, to make each diagonal coefficient larger in magnitude than any other
coefficient below it, i.e., to decrease the round-off errors. The equations are
not mathematically affected by changes in sequential order, but changing
the order makes the coefficient become nonzero. Even when all diagonal
coefficients are nonzero, the change of order increases the accuracy of the
computations.
There are two standard pivoting strategies used to handle these diffi-
culties easily. They are explained as follows.

Partial Pivoting
Here, we develop an implementation of Gaussian elimination that utilizes
the pivoting strategy discussed above. In using Gaussian elimination by
partial pivoting (or row pivoting), the basic approach is to use the largest
(in absolute value) element on or below the diagonal in the column of
current interest as the pivotal element for elimination in the rest of that
column.
One immediate effect of this will be to force all the multiples used to be
not greater than 1 in absolute value. This will inhibit the growth of error in
the rest of the elimination phase and in subsequent backward substitution.
At stage k of forward elimination, it is necessary, therefore, to be able to
identify the largest element from |akk |, |ak+1,k |, . . . , |ank |, where these aik s
are the elements in the current partially triangularized coefficient matrix. If
this maximum occurs in row p, then the pth and kth rows of the augmented
matrix are interchanged and the elimination proceeds as usual. In solving
n linear equations, a total of N = n(n+1)/2 coefficients must be examined.
Example 1.32 Solve the following linear system using Gaussian elimina-
tion with partial pivoting:
x1 + x2 + x3 = 1
2x1 + 3x2 + 4x3 = 3
4x1 + 9x2 + 16x3 = 11.
Solution. For the first elimination step, since 4 is the largest absolute
coefficient of the first variable x1 , the first row and the third row are inter-
changed, which gives us
4x1 + 9x2 + 16x3 = 11
2x1 + 3x2 + 4x3 = 3
x1 + x2 + x3 = 1.
Eliminate the first variable x1 from the second and third rows by subtracting the multiples $m_{21} = \frac{2}{4} = \frac{1}{2}$ and $m_{31} = \frac{1}{4}$ of row 1 from row 2 and row 3, respectively, which gives
\[
\begin{aligned}
4x_1 + 9x_2 + 16x_3 &= 11\\
-\tfrac{3}{2}x_2 - 4x_3 &= -\tfrac{5}{2}\\
-\tfrac{5}{4}x_2 - 3x_3 &= -\tfrac{7}{4}.
\end{aligned}
\]

For the second elimination step, $-\tfrac{3}{2}$ is the largest (in absolute value) coefficient of the second variable x2, so eliminate the second variable x2 from the third row by subtracting the multiple $m_{32} = \frac{5}{6}$ of row 2 from row 3, which gives

4x1 + 9x2 + 16x3 = 11


− 3/2x2 − 4x3 = −5/2
1/3x3 = 1/3.

Obviously, the original set of equations has been transformed to an equiva-


lent upper-triangular form. Now using backward substitution, we get

x1 = 1, x2 = −1, x3 = 1,

which is the required solution of the given linear system. •

The following MATLAB commands will give the same results we ob-
tained in Example 1.32 of the Gaussian elimination method with partial
pivoting:

>> B = [1 1 1 1; 2 3 4 3; 4 9 16 11];
>> x = PP(B);
>> disp(x)

Program 1.7
MATLAB m-file for Gaussian Elimination by Partial Pivoting
function x=PP(B)
% B is the augmented matrix [A|b]
[n,t] = size(B); U = B;
for M = 1:n-1
% find the row r with the largest pivot candidate in column M
mx = abs(U(M,M)); r = M;
for i = M+1:n
if mx < abs(U(i,M)); mx = abs(U(i,M)); r = i; end; end
% interchange rows M and r
rw = U(r,1:t); U(r,1:t) = U(M,1:t); U(M,1:t) = rw;
% eliminate the entries below the pivot
for k = M+1:n; m = U(k,M)/U(M,M);
for j = M:t; U(k,j) = U(k,j) - m*U(M,j); end; end; end
% backward substitution
x(n) = U(n,t)/U(n,n);
for i = n-1:-1:1; s = 0;
for k = n:-1:i+1; s = s + U(i,k)*x(k); end
x(i) = (U(i,t)-s)/U(i,i); end
end

Procedure 1.3 (Partial Pivoting)

1. Suppose we are about to work on the ith column of the matrix. Then
we search that portion of the ith column below and including the di-
agonal and find the element that has the largest absolute value. Let
p denote the index of the row that contains this element.

2. Interchange row i and p.

3. Proceed with elimination procedure 1.2.
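MATLAB's built-in lu function uses exactly this row-pivoting strategy, and the permutation matrix P records the interchanges. A quick check on the matrix of Example 1.32 (our commands, assuming the standard three-output lu interface):

A = [1 1 1; 2 3 4; 4 9 16];
b = [1; 3; 11];
[L, U, P] = lu(A);      % P*A = L*U, rows reordered so each pivot is largest
P                       % shows that row 3 of A (entry 4) supplies the first pivot
x = U\(L\(P*b))         % gives [1; -1; 1], as in Example 1.32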



Total Pivoting
In the case of total pivoting (or complete pivoting), we search for the largest
number (in absolute value) in the entire array instead of just in the first
column, and this number is the pivot. This means that we shall probably
need to interchange the columns as well as rows. When solving a system
of equations using complete pivoting, each row interchange is equivalent to
interchanging two equations, while each column interchange is equivalent
to interchanging the two unknowns.

At the kth step, interchange both the rows and columns of the matrix
so that the largest number in the remaining matrix is used as the pivot
i.e., after the pivoting
|akk | = max|aij |, for i = k, k + 1, . . . , n, j = k, k + 1, . . . , n.
There are times when the partial pivoting procedure is inadequate.
When some rows have coefficients that are very large in comparison to
those in other rows, partial pivoting may not give a correct solution.

Therefore, when in doubt, use total pivoting. No amount of pivot-


ing will remove inherent ill-conditioning (we will discuss this later in the
chapter) from a set of equations, but it helps to ensure that no further
ill-conditioning is introduced in the course of computation.
Example 1.33 Solve the following linear system using Gaussian elimina-
tion with total pivoting:
x1 + x2 + x3 = 1
2x1 + 3x2 + 4x3 = 3
4x1 + 9x2 + 16x3 = 11.
Solution. For the first elimination step, since 16 is the largest absolute
coefficient of variable x3 in the given system, the first row and the third
row are interchanged as well as the first column and third column, and we
get
16x3 + 9x2 + 4x1 = 11
4x3 + 3x2 + 2x1 = 3
x3 + x2 + x1 = 1.

Then eliminate the third variable x3 from the second and third rows by subtracting the multiples $m_{21} = \frac{4}{16} = \frac{1}{4}$ and $m_{31} = \frac{1}{16}$ of row 1 from rows 2 and 3, respectively, which gives
\[
\begin{aligned}
16x_3 + 9x_2 + 4x_1 &= 11\\
\tfrac{3}{4}x_2 + x_1 &= \tfrac{1}{4}\\
\tfrac{7}{16}x_2 + \tfrac{3}{4}x_1 &= \tfrac{5}{16}.
\end{aligned}
\]
For the second elimination step, 1 is the largest absolute coefficient of the first variable x1 in the second row and third column, so the second and third columns are interchanged, giving us
\[
\begin{aligned}
16x_3 + 4x_1 + 9x_2 &= 11\\
x_1 + \tfrac{3}{4}x_2 &= \tfrac{1}{4}\\
\tfrac{3}{4}x_1 + \tfrac{7}{16}x_2 &= \tfrac{5}{16}.
\end{aligned}
\]
Eliminate the first variable x1 from the third row by subtracting the multiple $m_{32} = \frac{3}{4}$ of row 2 from row 3, which gives
\[
\begin{aligned}
16x_3 + 4x_1 + 9x_2 &= 11\\
x_1 + \tfrac{3}{4}x_2 &= \tfrac{1}{4}\\
-\tfrac{1}{8}x_2 &= \tfrac{1}{8}.
\end{aligned}
\]

The original set of equations has been transformed to an equivalent upper-


triangular form. Now using backward substitution, we get

x1 = 1, x2 = −1, x3 = 1,

which is the required solution of the given linear system. •



Program 1.8
MATLAB m-file for the Gaussian Elimination by Total Pivoting
function x=TP(B)
% B is the augmented matrix [A|b]
[n,m] = size(B); U = B;
N = 1:n;      % N(j) = original unknown occupying column j
for M = 1:n-1
% locate the largest entry (in magnitude) of the remaining submatrix
mx = 0; r = M; c = M;
for i = M:n; for j = M:n
if mx < abs(U(i,j)); mx = abs(U(i,j)); r = i; c = j; end; end; end
% interchange rows M and r, and columns M and c
rw = U(r,1:m); U(r,1:m) = U(M,1:m); U(M,1:m) = rw;
cl = U(1:n,c); U(1:n,c) = U(1:n,M); U(1:n,M) = cl;
p = N(M); N(M) = N(c); N(c) = p;
% eliminate below the pivot
for k = M+1:n; e = U(k,M)/U(M,M);
for j = M:m; U(k,j) = U(k,j) - e*U(M,j); end; end; end
% backward substitution
x(n,1) = U(n,m)/U(n,n);
for i = n-1:-1:1; s = 0;
for k = n:-1:i+1; s = s + U(i,k)*x(k,1); end
x(i,1) = (U(i,m)-s)/U(i,i); end
% undo the column interchanges so x is in the original variable order
X(N,1) = x; x = X;
end

MATLAB can be used to get the same results we obtained in Exam-


ple 1.33 of the Gaussian elimination method with total pivoting with the
following command:

>> B = [1 1 1 1; 2 3 4 3; 4 9 16 11];
>> x = TP(B);
>> disp(x)

Total pivoting offers little advantage over partial pivoting and is significantly slower, requiring $N = \frac{n(n+1)(2n+1)}{6}$ elements to be examined in total. It is rarely used in practice because interchanging columns changes the order of the unknowns and, consequently, adds significant and usually unjustified complexity to the computer program. So, for good results, partial pivoting has proven to be a very reliable procedure.

1.4.4 Gauss–Jordan Method


This method is a modification of the Gaussian elimination method. The
Gauss–Jordan method is inefficient for practical calculation, but is often
useful for theoretical purposes. The basis of this method is to convert the
given matrix into a diagonal form. The forward elimination of the Gauss–
Jordan method is identical to the Gaussian elimination method. However,
Gauss–Jordan elimination uses backward elimination rather than backward
substitution. In the Gauss–Jordan method the forward elimination and
backward elimination need not be separated. This is possible because a
pivot element can be used to eliminate the coefficients not only below but
also above at the same time. If this approach is taken, the form of the
coefficients matrix becomes diagonal when elimination by the last pivot is
completed. The Gauss–Jordan method simply yields a transformation of
the augmented matrix of the form

[A|b] → [I|c],

where I is the identity matrix and c is the column matrix, which represents
the possible solution of the given linear system.
Example 1.34 Solve the following linear system using the Gauss–Jordan
method:
x1 + 2x2 = 3
−x1 − 2x3 = −5
−3x1 − 5x2 + x3 = −4.
Solution. Write the given system in the augmented matrix form
\[
\left[\begin{array}{ccc|c} 1 & 2 & 0 & 3\\ -1 & 0 & -2 & -5\\ -3 & -5 & 1 & -4 \end{array}\right].
\]
The first elimination step is to eliminate elements a21 = −1 and a31 = −3
by subtracting the multiples m21 = −1 and m31 = −3 of row 1 from rows

2 and 3, respectively, which gives
\[
\left[\begin{array}{ccc|c} 1 & 2 & 0 & 3\\ 0 & 2 & -2 & -2\\ 0 & 1 & 1 & 5 \end{array}\right].
\]

The second row is now divided by 2 to give
\[
\left[\begin{array}{ccc|c} 1 & 2 & 0 & 3\\ 0 & 1 & -1 & -1\\ 0 & 1 & 1 & 5 \end{array}\right].
\]
The second elimination step is to eliminate the elements in positions $a_{12}^{(1)} = 2$ and $a_{32}^{(1)} = 1$ by subtracting the multiples $m_{12} = 2$ and $m_{32} = 1$ of row 2 from rows 1 and 3, respectively, which gives
\[
\left[\begin{array}{ccc|c} 1 & 0 & 2 & 5\\ 0 & 1 & -1 & -1\\ 0 & 0 & 2 & 6 \end{array}\right].
\]

The third row is now divided by 2 to give
\[
\left[\begin{array}{ccc|c} 1 & 0 & 2 & 5\\ 0 & 1 & -1 & -1\\ 0 & 0 & 1 & 3 \end{array}\right].
\]
The third elimination step is to eliminate the elements in positions $a_{23}^{(1)} = -1$ and $a_{13} = 2$ by subtracting the multiples $m_{23} = -1$ and $m_{13} = 2$ of row 3 from rows 2 and 1, respectively, which gives
\[
\left[\begin{array}{ccc|c} 1 & 0 & 0 & -1\\ 0 & 1 & 0 & 2\\ 0 & 0 & 1 & 3 \end{array}\right].
\]

Obviously, the original set of equations has been transformed to a diagonal


form. Now expressing the set in algebraic form yields
x1 = −1
x2 = 2
x3 = 3,
which is the required solution of the given system. •
The above results can be obtained using MATLAB commands, as fol-
lows:

>> Ab = [1 2 0 3; -1 0 -2 -5; -3 -5 1 -4];   % augmented matrix [A|b]


>> GaussJ(Ab);

Program 1.9
MATLAB m-file for the Gauss–Jordan Method
function sol=GaussJ(Ab)
[m,n]=size(Ab);
for i=1:m
Ab(i, :) = Ab(i, :)/Ab(i, i);
for j=1:m
if j == i; continue; end
Ab(j,:) = Ab(j,:) - Ab(j,i)*Ab(i,:);
end; end; sol=Ab;

Procedure 1.4 (Gauss–Jordan Method)


1. Form the augmented matrix, [A|b].

2. Reduce the coefficient matrix A to unit upper-triangular form using


the Gaussian procedure.

3. Use the nth row to reduce the nth column to an equivalent identity
matrix column.

4. Repeat step 3 for n–1 through 2 to get the augmented matrix of the
form [I|c].

5. Solve for the unknown xi = ci , for i = 1, 2, . . . , n.


The number of multiplications and divisions required for the Gauss–Jordan
method is approximately
\[
N = \frac{n^3}{2} - n^2 - \frac{n}{2},
\]
which is approximately 50% larger than for the Gaussian elimination method.
Consequently, the Gaussian elimination method is preferred.
The Gauss–Jordan method is particularly well suited to compute the
inverse of a matrix through the transformation
[A|I] → [I|A−1 ].
Note if the inverse of the matrix can be found, then the solution of the
linear system can be computed easily from the product of matrix A−1 and
column matrix b, i.e.,
x = A−1 b. (1.32)
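MATLAB's built-in rref function carries out exactly this Gauss–Jordan reduction, so the transformation [A|I] → [I|A−1] can be checked directly. The commands below are our own sketch (using the matrix of Example 1.35, which follows):

A = [10 1 -5; -20 3 20; 5 3 5];
b = [1; 2; 6];
R = rref([A eye(3)]);     % Gauss-Jordan reduction of [A|I]
invA = R(:, 4:6);         % right-hand block is the inverse of A
x = invA*b                % x = A^(-1)*b, as in (1.32)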
Example 1.35 Apply the Gauss–Jordan method to find the inverse of the
following matrix:  
10 1 −5
A =  −20 3 20  .
5 3 5
Then solve the system with b = [1, 2, 6]T .

Solution. Consider the following augmented matrix:
\[
[A|I] = \left[\begin{array}{ccc|ccc} 10 & 1 & -5 & 1 & 0 & 0\\ -20 & 3 & 20 & 0 & 1 & 0\\ 5 & 3 & 5 & 0 & 0 & 1 \end{array}\right].
\]
Divide the first row by 10, which gives
\[
\left[\begin{array}{ccc|ccc} 1 & 0.1 & -0.5 & 0.1 & 0 & 0\\ -20 & 3 & 20 & 0 & 1 & 0\\ 5 & 3 & 5 & 0 & 0 & 1 \end{array}\right].
\]

The first elimination step is to eliminate the elements in positions $a_{21} = -20$ and $a_{31} = 5$ by subtracting the multiples $m_{21} = -20$ and $m_{31} = 5$ of row 1 from rows 2 and 3, respectively, which gives
\[
\left[\begin{array}{ccc|ccc} 1 & 0.1 & -0.5 & 0.1 & 0 & 0\\ 0 & 5 & 10 & 2 & 1 & 0\\ 0 & 2.5 & 7.5 & -0.5 & 0 & 1 \end{array}\right].
\]
Divide the second row by 5, which gives
\[
\left[\begin{array}{ccc|ccc} 1 & 0.1 & -0.5 & 0.1 & 0 & 0\\ 0 & 1 & 2 & 0.4 & 0.2 & 0\\ 0 & 2.5 & 7.5 & -0.5 & 0 & 1 \end{array}\right].
\]
The second elimination step is to eliminate the elements in positions $a_{12} = 0.1$ and $a_{32}^{(1)} = 2.5$ by subtracting the multiples $m_{12} = 0.1$ and $m_{32} = 2.5$ of row 2 from rows 1 and 3, respectively, which gives
\[
\left[\begin{array}{ccc|ccc} 1 & 0 & -0.7 & 0.06 & -0.02 & 0\\ 0 & 1 & 2 & 0.4 & 0.2 & 0\\ 0 & 0 & 2.5 & -1.5 & -0.5 & 1 \end{array}\right].
\]
Divide the third row by 2.5, which gives
\[
\left[\begin{array}{ccc|ccc} 1 & 0 & -0.7 & 0.06 & -0.02 & 0\\ 0 & 1 & 2 & 0.4 & 0.2 & 0\\ 0 & 0 & 1 & -0.6 & -0.2 & 0.4 \end{array}\right].
\]
The third elimination step is to eliminate the elements in positions $a_{23}^{(2)} = 2$ and $a_{13}^{(2)} = -0.7$ by subtracting the multiples $m_{23} = 2$ and $m_{13} = -0.7$ of row 3 from rows 2 and 1, respectively, which gives
\[
\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & -0.36 & -0.16 & 0.28\\ 0 & 1 & 0 & 1.6 & 0.6 & -0.8\\ 0 & 0 & 1 & -0.6 & -0.2 & 0.4 \end{array}\right] = [I|A^{-1}].
\]

Obviously, the original augmented matrix [A|I] has been transformed to the
augmented matrix of the form [I|A−1 ]. Hence, the solution of the linear
system can be obtained by the matrix multiplication (1.32) as
\[
\begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix} =
\begin{bmatrix} -0.36 & -0.16 & 0.28\\ 1.6 & 0.6 & -0.8\\ -0.6 & -0.2 & 0.4 \end{bmatrix}
\begin{bmatrix} 1\\ 2\\ 6 \end{bmatrix} =
\begin{bmatrix} 1\\ -2\\ 1.4 \end{bmatrix}.
\]

Hence, x∗ = [1, −2, 1.4]T is the solution of the given system. •

The above results can be obtained using MATLAB, as follows:

>> Ab = [10 1 -5 1 0 0; -20 3 20 0 1 0; 5 3 5 0 0 1];   % [A|I]
>> sol = GaussJ(Ab);        % sol = [I | inv(A)]
>> invA = sol(:, 4:6);
>> b = [1 2 6]';
>> x = invA*b;

1.4.5 LU Decomposition Method


This is another direct method to find the solution of a system of linear equa-
tions. LU decomposition (or the factorization method) is a modification
of the elimination method. Here we decompose or factorize the coefficient
matrix A into the product of two triangular matrices in the form

A = LU, (1.33)

where L is a lower-triangular matrix and U is the upper-triangular matrix.


Both are the same size as the coefficients matrix A. To solve a number
of linear equations sets in which the coefficients matrices are all identical
but the right-hand sides are different, then LU decomposition is more
efficient than the elimination method. Specifying the diagonal elements
of either L or U makes the factoring unique. The procedure based on
unity elements on the diagonal of matrix L is called Doolittle’s method (or
Gauss factorization), while the procedure based on unity elements on the
diagonal of matrix U is called Crout’s method. Another method, called the
Cholesky method, is based on the constraint that the diagonal elements of
L are equal to the diagonal elements of U , i.e., lii = uii , for i = 1, 2, . . . , n.

The general forms of L and U are written as
\[
L = \begin{bmatrix} l_{11} & 0 & \cdots & 0\\ l_{21} & l_{22} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ l_{n1} & l_{n2} & \cdots & l_{nn} \end{bmatrix}, \qquad
U = \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1n}\\ 0 & u_{22} & \cdots & u_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & u_{nn} \end{bmatrix}, \qquad (1.34)
\]

such that lij = 0 for i < j and uij = 0 for i > j.

Consider a linear system


Ax = b (1.35)
and let A be factored into the product of L and U , as shown by (1.34).
Then the linear system (1.35) becomes

LU x = b

or can be written as
Ly = b,
where
y = U x.
The unknown elements of matrix L and matrix U are computed by equating
corresponding elements in matrices A and LU in a systematic way. Once
the matrices L and U have been constructed, the solution of system (1.35)
can be computed in the following two steps:

1. Solve the system Ly = b.

By using forward elimination, we will find the components of the


unknown vector y by using the following steps:
\[
\left.\begin{aligned}
y_1 &= b_1,\\
y_i &= b_i - \sum_{j=1}^{i-1} l_{ij}y_j, \qquad i = 2, 3, \ldots, n
\end{aligned}\right\} \qquad (1.36)
\]

2. Solve the system U x = y.

By using backward substitution, we will find the components of the


unknown vector x by using the following steps:
\[
\left.\begin{aligned}
x_n &= \frac{y_n}{u_{nn}},\\
x_i &= \frac{1}{u_{ii}}\Bigl[y_i - \sum_{j=i+1}^{n} u_{ij}x_j\Bigr], \qquad i = n-1, n-2, \ldots, 1
\end{aligned}\right\} \qquad (1.37)
\]
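Once L and U are available, the two triangular systems above are solved cheaply. A minimal MATLAB sketch of this two-step solve (ours; it uses the built-in lu, which also applies row pivoting through the permutation matrix P):

A = [1 2 1; 2 5 3; 1 3 4];    % the matrix of Example 1.22
b = [2; 1; 5];
[L, U, P] = lu(A);            % P*A = L*U
y = L\(P*b);                  % step 1: forward substitution, Ly = Pb
x = U\y                       % step 2: backward substitution, Ux = y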

Thus, the relationship of the matrices L and U to the original matrix A is


given by the following theorem.
Theorem 1.21 If Gaussian elimination can be performed on the linear
system Ax = b without row interchanges, then the matrix A can be factored
into the product of a lower-triangular matrix L and an upper-triangular
matrix U , i.e.,
A = LU,
where the matrices L and U are the same size as A. •
Let us consider a nonsingular system Ax = b and with the help of the
simple Gauss elimination method we will convert the coefficient matrix A
into the upper-triangular matrix U by using elementary row operations. If
all the pivots are nonzero, then row interchanges are not necessary, and the
decomposition of the matrix A is possible. Consider the following matrix:
 
2 4 2
A= 4 9 7 .
−2 −2 5
To convert it into the upper-triangular matrix U , we first apply the follow-
ing row operations
Row2 − (2)Row1 and Row3 + Row1,
which gives
\[
\begin{bmatrix} 2 & 4 & 2\\ 0 & 1 & 3\\ 0 & 2 & 7 \end{bmatrix}.
\]

Once again, applying the row operation Row3 − (2)Row2, we get
\[
\begin{bmatrix} 2 & 4 & 2\\ 0 & 1 & 3\\ 0 & 0 & 1 \end{bmatrix} = U,
\]
which is the required upper-triangular matrix.

Now defining the three elementary matrices (each of them can be ob-
tained by adding a multiple of row i to row j) associated with these row
operations:
     
\[
E_1 = \begin{bmatrix} 1 & 0 & 0\\ -2 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}, \quad
E_2 = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 1 & 0 & 1 \end{bmatrix}, \quad
E_3 = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & -2 & 1 \end{bmatrix}.
\]
Then
\[
E_3E_2E_1 = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & -2 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0\\ -2 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0\\ -2 & 1 & 0\\ 5 & -2 & 1 \end{bmatrix}
\]
and
    
\[
E_3E_2E_1A = \begin{bmatrix} 1 & 0 & 0\\ -2 & 1 & 0\\ 5 & -2 & 1 \end{bmatrix}
\begin{bmatrix} 2 & 4 & 2\\ 4 & 9 & 7\\ -2 & -2 & 5 \end{bmatrix}
= \begin{bmatrix} 2 & 4 & 2\\ 0 & 1 & 3\\ 0 & 0 & 1 \end{bmatrix} = U.
\]
So
\[
A = E_1^{-1}E_2^{-1}E_3^{-1}U = LU,
\]
where
\[
E_1^{-1}E_2^{-1}E_3^{-1} = \begin{bmatrix} 1 & 0 & 0\\ 2 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ -1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 2 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0\\ 2 & 1 & 0\\ -1 & 2 & 1 \end{bmatrix} = L.
\]
Thus, A = LU is a product of a lower-triangular matrix L and an upper-
triangular matrix U . Naturally, this is called an LU decomposition of A.

Theorem 1.22 Let A be an n × n matrix that has an LU factorization,


i.e.,
A = LU.
If A has rank n (i.e., all pivots are nonzeros), then L and U are uniquely
determined by A. •

Now we will discuss all three possible variations of LU decomposition to


find the solution of the nonsingular linear system in the following.

Doolittle’s Method
In Doolittle’s method (called Gauss factorization), the upper-triangular
matrix U is obtained by forward elimination of the Gaussian elimination
method and the lower-triangular matrix L containing the multiples used in
the Gaussian elimination process as the elements below the diagonal with
unity elements on the main diagonal.

For the matrix A in Example 1.22, we can have the decomposition of


matrix A in the form
    
\[
\begin{bmatrix} 1 & 2 & 1\\ 2 & 5 & 3\\ 1 & 3 & 4 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0\\ 2 & 1 & 0\\ 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 1\\ 0 & 1 & 1\\ 0 & 0 & 2 \end{bmatrix},
\]

where the unknown elements of matrix L are the used multiples and the
matrix U is the same as we obtained in the forward elimination process.
Example 1.36 Construct the LU decomposition of the following matrix A
by using Gauss factorization (i.e., LU decomposition by Doolittle’s method).
Find the value(s) of α for which the following matrix is
 
1 −1 α
A =  −1 2 −α 
α 1 1

singular. Also, find the unique solution of the linear system Ax = [1, 1, 2]T
by using the smallest positive integer value of α.

Solution. Since we know that
\[
A = \begin{bmatrix} 1 & -1 & \alpha\\ -1 & 2 & -\alpha\\ \alpha & 1 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0\\ m_{21} & 1 & 0\\ m_{31} & m_{32} & 1 \end{bmatrix}
\begin{bmatrix} u_{11} & u_{12} & u_{13}\\ 0 & u_{22} & u_{23}\\ 0 & 0 & u_{33} \end{bmatrix} = LU,
\]

now we will use only the forward elimination step of the simple Gaussian
elimination method to convert the given matrix A into the upper-triangular
matrix U. Since $a_{11} = 1 \neq 0$, we wish to eliminate the elements $a_{21} = -1$ and $a_{31} = \alpha$ by subtracting from the second and third rows the appropriate multiples of the first row. In this case, the multiples are given as
\[
m_{21} = \frac{-1}{1} = -1 \quad\text{and}\quad m_{31} = \frac{\alpha}{1} = \alpha.
\]
Hence,
\[
\begin{bmatrix} 1 & -1 & \alpha\\ 0 & 1 & 0\\ 0 & 1+\alpha & 1-\alpha^2 \end{bmatrix}.
\]
Since $a_{22}^{(1)} = 1 \neq 0$, we eliminate the entry in the $a_{32}^{(1)} = 1+\alpha$ position by subtracting the multiple $m_{32} = \frac{1+\alpha}{1}$ of the second row from the third row to get
\[
\begin{bmatrix} 1 & -1 & \alpha\\ 0 & 1 & 0\\ 0 & 0 & 1-\alpha^2 \end{bmatrix}.
\]
Obviously, the original set of equations has been transformed to an upper-
triangular form. Thus,
\[
\begin{bmatrix} 1 & -1 & \alpha\\ -1 & 2 & -\alpha\\ \alpha & 1 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0\\ -1 & 1 & 0\\ \alpha & 1+\alpha & 1 \end{bmatrix}
\begin{bmatrix} 1 & -1 & \alpha\\ 0 & 1 & 0\\ 0 & 0 & 1-\alpha^2 \end{bmatrix},
\]

which is the required decomposition of A. The matrix will be singular, if


the third diagonal element 1 − α2 of the upper-triangular U is equal to zero
(Theorem 1.19), which gives α = ±1.

To find the unique solution of the given system we take α = 2, and it


gives
    
1 −1 2 1 0 0 1 −1 2
 −1 2 −2  =  −1 1 0  0 1 0 .
2 1 1 2 3 1 0 0 −3

Now solve the first system Ly = b for unknown vector y, i.e.,


    
1 0 0 y1 1
 −1 1 0   y2  =  1 .
2 3 1 y3 2

Performing forward substitution yields

y1 = 1 gives y1 = 1,
−y1 + y2 = 1 gives y2 = 2,
2y1 + 3y2 + y3 = 2 gives y3 = −6.

Then solve the second system U x = y for unknown vector x, i.e.,


    
1 −1 2 x1 1
 0 1 0   x2  =  2  .
0 0 −3 x3 −6

Performing backward substitution yields

x1 − x2 + 2x3 = 1 gives x1 = −1
x2 = 2 gives x2 = 2
− 3x3 = −6 gives x3 = 2,

which gives
x1 = −1
x2 = 2
x3 = 2,
the approximate solution of the given system. •

We can write a MATLAB m-file to factor a nonsingular matrix A into


a unit lower-triangular matrix L and an upper-triangular matrix U using
the lu_gauss function. The following MATLAB commands can be used
to reproduce the solution of the linear system of Example 1.34:

>> A = [1 2 0; -1 0 -2; -3 -5 1];
>> B = lu_gauss(A);
>> L = eye(size(B)) + tril(B, -1);
>> U = triu(B);
>> b = [3; -5; -4];
>> y = L \ b;
>> x = U \ y;

Program 1.10
MATLAB m-file for the LU Decomposition Method
function A = lu_gauss(A)
% LU factorization without pivoting: the multipliers are stored below
% the diagonal of A and the rows of U on and above the diagonal.
n = size(A,1);
for i=1:n-1; pivot = A(i,i);
for k=i+1:n; A(k,i) = A(k,i)/pivot;
for j=i+1:n; A(k,j) = A(k,j) - A(k,i)*A(i,j);
end; end; end

There is another way to find the values of the unknown elements of the
matrices L and U , which we describe in the following example.

Example 1.37 Construct the LU decomposition of the following matrix


using Doolittle’s method:
 
1 2 4
A =  1 3 3 .
2 2 2

Solution. Since
  
1 0 0 u11 u12 u13
A = LU =  l21 1 0   0 u22 u23  ,
l31 l32 1 0 0 u33

performing the multiplication on the right-hand side gives


   
1 2 4 u11 u12 u13
 1 3 3  =  l21 u11 l21 u12 + u22 l21 u13 + u23  .
2 2 2 l31 u11 l31 u12 + l32 u22 l31 u13 + l32 u23 + u33

Then equate elements of the first column to obtain

1 = u11 , u11 = 1
1 = l21 u11 , l21 = 1
2 = l31 u11 , l31 = 2.

Now equate elements of the second column to obtain

2 = u12 , u12 = 2
3 = l21 u12 + u22 , u22 = 3 − 2 = 1
2 = l31 u12 + l32 u22 , l32 = 2 − 4 = −2.

Finally, equate elements of the third column to obtain

4 = u13 , u13 = 4
3 = l21 u13 + u23 , u23 = 3 − 4 = −1
2 = l31 u13 + l32 u23 + u33 , u33 = 2 − 10 = −8.

Thus, we obtain
    
1 2 4 1 0 0 1 2 4
 1 3 3 = 1 1 0  0 1 1 ,
2 2 2 2 −2 1 0 0 −8

the factorization of the given matrix. •



The general formula for getting elements of L and U corresponding to the


coefficient matrix A for a set of n linear equations can be written as
\[
\left.\begin{aligned}
u_{ij} &= a_{ij} - \sum_{k=1}^{i-1} l_{ik}u_{kj}, & 2 \le i \le j,\\
l_{ij} &= \frac{1}{u_{jj}}\Bigl[a_{ij} - \sum_{k=1}^{j-1} l_{ik}u_{kj}\Bigr], & i > j \ge 2,\\
u_{1j} &= a_{1j}, & i = 1,\\
l_{i1} &= \frac{a_{i1}}{u_{11}} = \frac{a_{i1}}{a_{11}}, & j = 1.
\end{aligned}\right\} \qquad (1.38)
\]
Example 1.38 Solve the following linear system by LU decomposition us-


ing Doolittle’s method:
  
1 2 4 −2
A=  1 3 3  and b =  3 .
2 2 2 −6
Solution. The factorization of the coefficient matrix A has already been
constructed in Example 1.37 as
    
1 2 4 1 0 0 1 2 4
 1 3 3 = 1 1 0  0 1 1 .
2 2 2 2 −2 1 0 0 −8
Then solve the first system Ly = b for unknown vector y, i.e.,
    
1 0 0 y1 −2
 1 1 0   y2  =  3  .
2 −2 1 y3 −6
Performing forward substitution yields
y1 = −2 gives y1 = −2,
y1 + y2 = 3 gives y2 = 5,
2y1 − 2y2 + y3 = −6 gives y3 = 8.

Then solve the second system U x = y for unknown vector x, i.e.,

    
1 2 4 x1 −2
 0 1 1   x2  =  5  .
0 0 −8 x3 8

Performing backward substitution yields

x1 + 2x2 + 4x3 = −2 gives x1 = −6


x2 + x3 = 5 gives x2 = 4
− 8x3 = 8 gives x3 = −1,

which gives

x1 = −6
x2 = 4
x3 = −1,

the approximate solution of the given system. •

We can also write the MATLAB m-file called Doolittle.m to get the
solution of the linear system by LU decomposition by using Doolittle’s
method. In order to reproduce the above results using MATLAB com-
mands, we do the following:

>> A = [1 2 4; 1 3 3; 2 2 2];
>> b = [-2 3 -6];
>> sol = Doolittle(A, b);

Program 1.11
MATLAB m-file for using Doolittle’s Method
function sol = Doolittle(A,b)
n = size(A,1); u = A; l = zeros(n,n);
% forward elimination gives the rows of U
for i=1:n-1; if abs(u(i,i)) > 0
for i1=i+1:n; m = u(i1,i)/u(i,i);
for j=1:n; u(i1,j) = u(i1,j) - m*u(i,j); end; end; end; end
% entries of L from formula (1.38)
for i=1:n; l(i,1) = A(i,1)/u(1,1); end
for j=2:n; for i=2:n; s = 0;
for k=1:j-1; s = s + l(i,k)*u(k,j); end
l(i,j) = (A(i,j)-s)/u(j,j); end; end
% forward substitution Ly = b
y(1) = b(1)/l(1,1);
for k=2:n; s = b(k);
for i=1:k-1; s = s - l(k,i)*y(i); end
y(k) = s/l(k,k); end
% backward substitution Ux = y
x(n) = y(n)/u(n,n);
for k=n-1:-1:1; s = y(k);
for i=k+1:n; s = s - u(k,i)*x(i); end
x(k) = s/u(k,k); end
sol = x';

Procedure 1.5 (LU Decomposition by Doolittle’s Method)

1. Take the nonsingular matrix A.

2. If possible, decompose the matrix A = LU using (1.38).

3. Solve linear system Ly = b using (1.36).

4. Solve linear system U x = y using (1.37).

The LDV Factorization


There is some asymmetry in LU decomposition because the lower-triangular
matrix has 1s on its diagonal, while the upper-triangular matrix has a
nonunit diagonal. This is easily remedied by factoring the diagonal entries

out of the upper-triangular matrix as follows:
\[
\begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1n}\\ 0 & u_{22} & \cdots & u_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & u_{nn} \end{bmatrix}
= \begin{bmatrix} u_{11} & & \cdots & 0\\ 0 & u_{22} & & \vdots\\ \vdots & & \ddots & \\ 0 & \cdots & & u_{nn} \end{bmatrix}
\begin{bmatrix} 1 & u_{12}/u_{11} & \cdots & u_{1n}/u_{11}\\ 0 & 1 & \cdots & u_{2n}/u_{22}\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 1 \end{bmatrix}.
\]

Let D denote the diagonal matrix having the same diagonal elements as
the upper-triangular matrix U ; in other words, D contains the pivots on
its diagonal and zeros everywhere else. Let V be the redefining upper-
triangular matrix obtained from the original upper-triangular matrix U by
dividing each row by its pivot, so that V has all 1s on the diagonal. It
is easily seen that U = DV , which allows any LU decomposition to be
written as
A = LDV,
where L and V are lower- and upper-triangular matrices with 1s on both
of their diagonals. This is called the LDV factorization of A.
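In MATLAB the passage from LU to LDV is essentially a one-liner. A small sketch of this (our commands, with L and U taken from Example 1.39, which follows):

L = [1 0 0; 3 1 0; 2 0 1];        % from the LU factorization in Example 1.39
U = [1 2 -1; 0 -4 4; 0 0 3];
D = diag(diag(U));                % the pivots of U placed on a diagonal matrix
V = D\U;                          % each row of U divided by its pivot
A = L*D*V                         % reproduces the original matrix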

Example 1.39 Find the LDV factorization of the following matrix:


 
1 2 −1
A= 3 2 1 .
2 4 1

Solution. By using Doolittle’s method, the LU decomposition of A can be


obtained as
    
1 2 −1 1 0 0 1 2 −1
A= 3 2 1  =  3 1 0   0 −4 4  = LU.
2 4 1 2 0 1 0 0 3

Then the matrix D and the matrix V can be obtained as


    
1 2 −1 1 0 0 1 2 −1
U =  0 −4 4  =  0 −4 0   0 1 4  = DV.
0 0 3 0 0 3 0 0 1

Thus, the LDV factorization of the given matrix A is obtained as


   
1 0 0 1 0 0 1 2 −1
A = LDV =  3 1 0   0 −4 0   0 1 4 .
2 0 1 0 0 3 0 0 1

If a given matrix A is symmetric, then there is a connection between


the lower-triangular matrix L and the upper-triangular matrix U in the
LU decomposition. In the first elimination step, the elements in L's first column are obtained by dividing U's first row by its diagonal element. Similarly, during the second elimination step, $l_{32} = u_{23}/u_{22}$. In general, when a symmetric matrix is decomposed without pivoting, $l_{ij}$ is related to $u_{ji}$ through the identity
\[
l_{ij} = \frac{u_{ji}}{u_{jj}}.
\]
In other words, each column of a matrix L equals the corresponding row
of a matrix U divided by the diagonal element. It is uniquely determined
that the LDV decomposition of a symmetric matrix has the form LDLT ,
since A = LDV . Taking the transpose of it, we get

(A)T = (LDV )T = V T DT LT = V T DLT ,

(the diagonal matrix D is symmetric), and the uniqueness of the LDV


decomposition implies that

L=VT and V = LT .

Note that not every symmetric matrix has an LDLT factorization. How-
ever, if A = LDLT , then A must be symmetric because

(A)T = (LDLT )T = (LT )T DT LT = LDLT = A.
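For a numerical check of such factorizations, MATLAB provides the built-in ldl function for symmetric matrices. As a caveat, ldl pivots for stability and may return a block-diagonal D, so its output need not coincide with a hand computation such as the one in Example 1.40 below; the sketch (ours) only verifies that the product reassembles A:

A = [1 3 2; 3 4 1; 2 1 2];    % the symmetric matrix of Example 1.40
[L, D, P] = ldl(A);           % P'*A*P = L*D*L'
err = norm(P'*A*P - L*D*L')   % numerically zero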

Example 1.40 Find the LDLT factorization of the following symmetric


matrix:  
1 3 2
A =  3 4 1 .
2 1 2

Solution. By using Doolittle’s method, the LU decomposition of A can be


obtained as
    
1 3 2 1 0 0 1 3 2
A=  3 4 1  =  3 1 0   0 −5 −5  = LU.
2 1 2 2 1 1 0 0 3

Then the matrix D and the matrix V can be obtained as


    
1 3 2 1 0 0 1 3 2
U =  0 −5 −5  =  0 −5 0   0 1 1  = DV.
0 0 3 0 0 3 0 0 1

Note that    
1 3 2 1 0 0
V =  0 1 1  = LT =  3 1 0  .
0 0 1 2 1 1
Thus, we obtain

  
1 0 0 1 0 0 1 3 2
A = LDLT =  3 1 0   0 −5 0   0 1 1  ,
2 1 1 0 0 3 0 0 1

the LDLT factorization of the given matrix A. •

Crout’s Method
Crout’s method, in which matrix U has unity on the main diagonal, is
similar to Doolittle’s method in all other aspects. The L and U matrices
are obtained by expanding the matrix equation A = LU term by term to
determine the elements of the L and U matrices.

Example 1.41 Construct the LU decomposition of the following matrix


using Crout’s method:  
1 2 3
A =  6 5 4 .
2 5 6

Solution. Since
  
l11 0 0 1 u12 u13
A = LU =  l21 l22 0   0 1 u23  ,
l31 l32 l33 0 0 1
performing the multiplication on the right-hand side gives
   
1 2 3 l11 l11 u12 l11 u13
 6 5 4  =  l21 l21 u12 + l22 l21 u13 + l22 u23 .
2 5 6 l31 l31 u12 + l32 l31 u13 + l32 u23 + l33
Then equate elements of the first column to obtain
1 = l11
6 = l21
2 = l31 .
Then equate elements of the second column to obtain
2 = l11 u12 , u12 = 2

5 = l21 u12 + l22 , l22 = 5 − 12 = −7

5 = l31 u12 + l32 , l32 = 5 − 4 = 1.


Finally, equate elements of the third column to obtain
3 = l11 u13 , u13 = 3

4 = l21 u13 + l22 u23 , u23 = (4 − 18)/ − 7 = 2

6 = l31 u13 + l32 u23 + l33 , l33 = (6 − 6 − 2) = −2.


Thus, we get
    
1 2 3 1 0 0 1 2 3
 6 5 4  =  6 −7 0  0 1 2 ,
2 5 6 2 1 −2 0 0 1
the factorization of the given matrix. •

The general formula for getting the elements of L and U corresponding to
the coefficient matrix A for a set of n linear equations can be written as
$$\begin{aligned}
l_{ij} &= a_{ij} - \sum_{k=1}^{j-1} l_{ik}u_{kj}, & i \ge j,\ i = 1,2,\dots,n,\\
u_{ij} &= \frac{1}{l_{ii}}\Big[a_{ij} - \sum_{k=1}^{i-1} l_{ik}u_{kj}\Big], & i < j,\ j = 2,3,\dots,n,\\
l_{ij} &= a_{i1}, & j = 1,\\
u_{ij} &= \frac{a_{ij}}{a_{11}}, & i = 1.
\end{aligned} \qquad (1.39)$$
Example 1.42 Solve the following linear system by LU decomposition us-


ing Crout’s method:
   
1 2 3 1
A= 6 5 4  and b =  −1  .
2 5 6 5

Solution. The factorization of the coefficient matrix A has already been


constructed in Example (1.41) as
    
1 2 3 1 0 0 1 2 3
 6 5 4  =  6 −7 0  0 1 2 .
2 5 6 2 1 −2 0 0 1

Then solve the first system Ly = b for unknown vector y, i.e.,


    
1 0 0 y1 1
 6 −7 0   y2  =  −1  .
2 1 −2 y3 5

Performing forward substitution yields

y1 = 1 gives y1 = 1
6y1 − 7y2 = −1 gives y2 = 1
2y1 + y2 − 2y3 = 5 gives y3 = −1.

Then solve the second system U x = y for unknown vector x, i.e.,

    
1 2 3 x1 1
 0 1 2   x2  =  1  .
0 0 1 x3 −1

Performing backward substitution yields

x1 + 2x2 + 3x3 = 1 gives x1 = −2


x2 + 2x3 = 1 gives x2 = 3
x3 = −1 gives x3 = −1,

which gives the approximate solution x∗ = [−2, 3, −1]T . •

The above results can be reproduced by using MATLAB commands as


follows:

>> A = [1 2 3; 6 5 4; 2 5 6];
>> b = [1 − 1 5];
>> sol = Crout(A, b);

Program 1.12
MATLAB m-file for the Crout’s Method
function sol = Crout(A, b)
[n,n]=size(A); u=zeros(n,n); l=u;
for i=1:n; u(i,i)=1; end
l(1,1)=A(1,1);
for i=2:n
u(1,i)=A(1,i)/l(1,1); l(i,1)=A(i,1); end
for i=2:n; for j=2:n; s=0;
if i <= j; K=i-1;
else; K=j-1; end
for k=1:K; s = s + l(i, k) ∗ u(k, j); end
if j > i; u(i,j)=(A(i,j)-s)/l(i,i); else
l(i,j)=A(i,j)-s; end;end;end
y(1)=b(1)/l(1,1);
for k=2:n; sum=b(k);
for i=1:k-1; sum = sum − l(k, i) ∗ y(i); end
y(k)=sum/l(k,k); end
x(n)=y(n)/u(n,n);
for k=n-1:-1:1; sum=y(k);
for i=k+1:n; sum = sum − u(k, i) ∗ x(i); end
x(k)=sum/u(k,k); end; l; u; y; x;

Procedure 1.6 (LU Decomposition by Crout’s Method)

1. Take the nonsingular matrix A.

2. If possible, decompose the matrix A = LU using (1.39).

3. Solve linear system Ly = b using (1.36).

4. Solve linear system U x = y using (1.37).

Note that the factorization method is also used to invert matrices.


Their usefulness for this purpose is based on the fact that triangular ma-
trices are easily inverted. Once the factorization has been affected, the

inverse of a matrix A is found from the formula


A−1 = (LU )−1 = U −1 L−1 . (1.40)
Then
U A−1 = L−1 , where LL−1 = I.
A practical way of calculating the determinant is to use the forward
elimination process of Gaussian elimination or, alternatively, LU decom-
position. If no pivoting is used, calculation of the determinant using LU
decomposition is very easy, since by one of the properties of the determi-
nant
det(A) = det(LU ) = det(L) det(U ).
So when using LU decomposition by Doolittle's method,
$$\det(A) = \det(U) = \prod_{i=1}^{n} u_{ii} = u_{11}u_{22}\cdots u_{nn},$$
where det(L) = 1 because L is a lower-triangular matrix and all its diagonal
elements are unity. For LU decomposition by Crout's method,
$$\det(A) = \det(L) = \prod_{i=1}^{n} l_{ii} = l_{11}l_{22}\cdots l_{nn},$$
where det(U) = 1 because U is an upper-triangular matrix and all its diagonal
elements are unity.
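These product formulas can be checked numerically. The following short MATLAB sketch (illustrative only, not one of the text's programs) uses the built-in lu function; because lu applies partial pivoting, the sign of the permutation must be taken into account:

% determinant from the diagonal of U (sketch)
A = [1 -2 1; 1 -1 1; 1 1 2];      % the matrix of Example 1.43 below
[L, U, P] = lu(A);                 % P*A = L*U
dA = det(P)*prod(diag(U))          % det(P) is +1 or -1
det(A)                             % MATLAB's built-in value, for comparison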

Example 1.43 Find the determinant and inverse of the following matrix
using LU decomposition by Doolittle’s method:
 
1 −2 1
A =  1 −1 1  .
1 1 2
Solution. We know that
    
1 −2 1 1 0 0 u11 u12 u13
A =  1 −1 1  =  m21 1 0   0 u22 u23  = LU.
1 1 2 m31 m32 1 0 0 u33

Now we will use only the forward elimination step of the simple Gaussian
elimination method to convert the given matrix A into the upper-triangular
matrix U . Since a11 = 1 6= 0, we wish to eliminate the elements a21 = 1
and a31 = 1 by subtracting from the second and third rows the appropriate
multiples of the first row. In this case, the multiples are given as

m21 = 1, and m31 = 1.

Hence,  
1 −2 1
 0 1 0 .
0 3 1
(1) (1)
Since a22 = 1 6= 0, we eliminate the entry in the a32 = 3 position by
subtracting the multiple m32 = 3 of the second row from the third row to
get  
1 −2 1
 0 1 0 .
0 0 1
Obviously, the original set of equations has been transformed to an upper-
triangular form. Thus,
    
1 −2 1 1 0 0 1 −2 1
 1 −1 1  =  1 1 0  0 1 0 ,
1 1 2 1 3 1 0 0 1

which is the required decomposition of A.


Now we find the determinant of matrix A as

det(A) = det(U ) = u11 u22 u33 = (1)(1)(1) = 1.

To find the inverse of matrix A, first we will compute the inverse of the
lower-triangular matrix L−1 from
  0   
1 0 0 l11 0 0 1 0 0
LL−1 =  1 1 0   l21 0 0
l22 0 = 0 1 0 =I
0 0 0
1 3 1 l31 l32 l33 0 0 1

by using forward substitution.


To solve the first system
  0   
1 0 0 l11 1
0 
 1 1 0   l21 =  0 ,
0
1 3 1 l31 0

by using forward substitution, we get


0 0 0
l11 = 1, l21 = −1, l31 = 2.

Similarly, the solution of the second linear system


    
1 0 0 0 0
 1 0 
1 0   l22 =  1 
0
1 3 1 l32 0

can be obtained
0 0
l22 = 1, l32 = −3.
Finally, the solution of the third linear system
    
1 0 0 0 0
 1 1 0  0  =  0 
0
1 3 1 l33 1
0
gives l33 = 1.
Hence, the elements of the matrix L−1 are
 
1 0 0
L−1 =  −1 1 0 ,
2 −3 1

which is the required inverse of the lower-triangular matrix L.


To find the inverse of the given matrix A, we will solve the system
 0
a11 a012 a013
   
1 −2 1 1 0 0
U A−1 =  0 1 0   a021 a022 a023  =  −1 1 0  = L−1
0 0 0
0 0 1 a31 a32 a33 2 −3 1

by using backward substitution.


We solve the first system
  0   
1 −2 1 a11 1
 0 1 0   a021  =  −1 
0 0 1 a031 2

by using backward substitution, and we get

a011 = −3, a021 = −1, a031 = 2.

Similarly, the solution of the second linear system


  0   
1 −2 1 a12 0
 0 1 0   a022  =  1 
0 0 1 a032 −3

can be obtained as follows:

a012 = 5, a022 = 1, a032 = −3.

Finally, the solution of the third linear system


  0   
1 −2 1 a13 0
 0 1 0   a023  =  0 
0 0 1 a033 1

can be obtained as follows:

a013 = −1, a023 = 0, a033 = 1.

Hence, the elements of the inverse matrix A−1 are


 
−3 5 −1
A−1 =  −1 1 0 ,
2 −3 1

which is the required inverse of the given matrix A. •
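The column-by-column procedure used above can be carried out in MATLAB by factoring once and then performing one forward and one backward substitution per column of the identity matrix. The sketch below is illustrative only:

% inverse of A by reusing its LU factors (sketch)
A = [1 -2 1; 1 -1 1; 1 1 2];
n = size(A,1); I = eye(n); Ainv = zeros(n);
[L, U, P] = lu(A);                 % P*A = L*U
for j = 1:n
  y = L\(P*I(:,j));                % forward substitution
  Ainv(:,j) = U\y;                 % backward substitution
end
Ainv                                % matches the inverse found above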



For LU decomposition we have not used pivoting for the sake of sim-
plicity. However, pivoting is important for the same reason as in Gaussian
elimination. We know that pivoting in Gaussian elimination is equivalent
to interchanging the rows of the coefficients matrix together with the terms
on the right-hand side. This indicates that pivoting may be applied to LU
decomposition as long as the interchanging is applied to the left and right
terms in the same way. When performing pivoting in LU decomposition,
the changes in the order of the rows are recorded. The same reordering is
then applied to the right-hand side terms before starting the solution in
accordance with the forward elimination and backward substitution steps.

Indirect LU Decomposition

It is to be noted that a nonsingular matrix A sometimes cannot be directly


factored as A = LU . For example, the matrix in Example 1.24 is nonsin-
gular, but it cannot be factored into the product LU . Let us assume it has
a LU form and
   
2 2 −4 u11 u12 u13
 2 2 −1  =  l21 u11 l21 u12 + u22 l21 u13 + u23 .
3 2 −3 l31 u11 l31 u12 + l32 u22 l31 u13 + l32 u23 + u33

Then equate elements of the first column to obtain

2 = u11 gives u11 = 2


2 = l21 u11 gives l21 = 1
3 = l31 u11 gives l31 = 23 .

Then equate elements of the second column to obtain

2 = u12 gives u12 = 2


2 = l21 u12 + u22 gives u22 = 0
2 = l31 u12 + l32 u22 gives 0 = −1,

which is not possible because 0 6= −1, a contradiction. Hence, the matrix


A cannot be directly factored into the product of L and U . The indirect
factorization LU of A can be obtained by using the permutation matrix P

and replacing the matrix A by P A. For example, using the above matrix
A, we have
    
1 0 0 2 2 −4 2 2 −4
P A =  0 0 1   2 2 −1  =  3 2 −3  .
0 1 0 3 2 −3 2 2 −1

From this multiplication we see that rows 2 and 3 of the original matrix A
are interchanged, and the resulting matrix P A has a LU factorization and
we have
    
1 0 0 2 2 −4 2 2 −4
 1.5 1 0   0 −1 3  =  3 2 −3  .
1 0 1 0 0 3 2 2 −1

The following theorem is an extension of Theorem 1.21, which includes the


case when interchanged rows are required. Thus, LU factorization can be
used to find the solution to any linear system Ax = b with a nonsingular
matrix A.

Theorem 1.23 Let A be a square n × n matrix and assume that Gaussian


elimination can be performed successfully to solve the linear system Ax =
b, but that row interchanges are required. Then there exists a permutation
matrix P = pk , . . . , p2 , p1 (where p1 , p2 , . . . , pk are the elementary matrices
corresponding to the row interchanges used) so that the P A matrix has a
LU factorization, i.e.,
P A = LU, (1.41)
where P A is the matrix obtained from A by doing these interchanges to A.
Note that P = In if no interchanges are used. •

When pivoting is used in LU decomposition, its effects should be taken into


consideration. First, we recognize that LU decomposition with pivoting is
equivalent to performing two separate process:

1. Transform A to A0 by performing all shifting of rows.

2. Then decompose A0 to LU with no pivoting.



The former step can be expressed by

A0 = P A, equivalently A = P −1 A0 ,

where P is called a permutation matrix and represents the pivoting oper-


ation. The second process is

A0 = P A = LU

and so
A = P −1 LU = (P T L)U
since P −1 = P T . The determinant of A may now be written as

det(A) = det(P −1 ) det(L) det(U )

or
det(A) = β det(L) det(U ),
where β = det(P −1 ) equals −1 or +1 depending on whether the number
pivoting is odd or even, respectively. •

One can use the MATLAB built-in lu function to obtain the permuta-
tion matrix P so that the P A matrix has a LU decomposition:

>> A = [0 1 2; −1 4 2; 2 2 1];
>> [L, U, P ] = lu(A);
It will give us the permutation matrix P and the matrices L and U as
follows:  
0 0 1
P =  0 1 0 
1 0 0
and   
1 0 0 2 2 1
P A =  −0.5 1 0   0 5 2.5  = LU.
0 0.2 1 0 0 1.5
So
A = P −1 LU

or   
0 0.2 1 2 2 1
A = (P T L)U =  −0.5 1 0   0 5 2.5  .
1 0 0 0 0 1.5

Example 1.44 Consider the following matrix:


 
$$A = \begin{bmatrix} 0 & 3 & 3\\ 3 & 2 & 5\\ 6 & 2 & 4 \end{bmatrix},$$

then:
1. Show that A does not have LU factorization;

2. Use Gauss elimination by partial pivoting and find the permutation


matrix P as well as the LU factors such that P A = LU ;

3. Use the information in P, L, and U to solve the system Ax = [6, 4, 3]T .


Solution. (1) Using simple Gaussian elimination: since a11 = 0, it follows from
Theorem 1.21 that the LU decomposition of A is not possible.
(2) For applying Gauss elimination by partial pivoting, the interchanges
of the rows between row 1 and row 3 gives
 
6 2 4
 3 2 5 ,
0 3 3

and then using the multiple m21 = 3/6 = 1/2, we obtain
 
6 2 4
 0 1 3 .
0 3 3

Now interchanging row 2 and row 3 gives


 
6 2 4
 0 3 3 .
0 1 3

By using the multiple m32 = 1/3, we get
$$\begin{bmatrix} 6 & 2 & 4\\ 0 & 3 & 3\\ 0 & 0 & 2 \end{bmatrix}.$$

Note that during this elimination process two row interchanges were needed,
which means we got two elementary permutation matrices of the inter-
changes (from Theorem 1.23), which are
   
0 0 1 1 0 0
p1 =  0 1 0  and p2 =  0 0 1  .
1 0 0 0 1 0

Thus, the permutation matrix is


    
1 0 0 0 0 1 0 1 0
P = p2 p 1 =  0 0 1   0 1 0  =  0 0 1  .
0 1 0 1 0 0 1 0 0

If we do these interchanges to the given matrix A, the result is the matrix


P A, i.e.,
$$PA = \begin{bmatrix} 0 & 1 & 0\\ 0 & 0 & 1\\ 1 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 0 & 3 & 3\\ 3 & 2 & 5\\ 6 & 2 & 4 \end{bmatrix}
= \begin{bmatrix} 3 & 2 & 5\\ 6 & 2 & 4\\ 0 & 3 & 3 \end{bmatrix}.$$

Now apply LU decomposition to the matrix P A, and we will convert it to


the upper-triangular matrix U by using the possible multiples
m21 = 2, m31 = 0, m32 = -3/2,
as follows:
$$\begin{bmatrix} 3 & 2 & 5\\ 6 & 2 & 4\\ 0 & 3 & 3 \end{bmatrix}
\rightarrow
\begin{bmatrix} 3 & 2 & 5\\ 0 & -2 & -6\\ 0 & 3 & 3 \end{bmatrix}
\rightarrow
\begin{bmatrix} 3 & 2 & 5\\ 0 & -2 & -6\\ 0 & 0 & -6 \end{bmatrix} = U.$$

Thus, P A = LU , where
   
1 0 0 3 2 5
L= 2 1 0  and U =  0 −2 −6  .
0 −3/2 1 0 0 −6

(3) Solve the first system Ly = P b = [4, 3, 6]T for unknown vector y, i.e.,
    
1 0 0 y1 4
 2 1 0   y2  =  3  .
0 −3/2 1 y3 6

Performing forward substitution yields

y1 = 4 gives y1 = 4
2y1 + y2 = 3 gives y2 = −5
− 3/2y2 + y3 = 6 gives y3 = −1.5.

Then solve the second system U x = y for the unknown vector x, i.e.,
    
3 2 5 x1 4
 0 −2 −6   x2  =  −5  .
0 0 −6 x3 −1.5

Performing backward substitution yields

3x1 + 2x2 + 5x3 = 4 gives x1 = −0.25


− 2x2 − 6x3 = −5 gives x2 = 1.75
− 6x3 = −1.5 gives x3 = 0.25,

which gives the approximate solution x∗ = [−0.25, 1.75, 0.25]T . •

The major advantage of the LU decomposition methods is the efficiency


when multiple unknown b vectors must be considered. The number of mul-
tiplications and divisions required by the complete Gaussian elimination
3
method is N = ( n3 ) + n2 − ( n3 ). The forward substitution step required
to solve the system Ly = b requires N = n2 − ( n2 ) operations, and the
backward substitution step required to solve the system U x = y requires
N = n2 + ( n2 ) operations. Thus, the total number of multiplications and

divisions required by LU decomposition, after L and U matrices have been


determined, is N = 2n2 , which is much less work than required by the
Gaussian elimination method, especially for large systems. •
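This advantage can be seen in a small experiment. The sketch below (the test matrix and sizes are ours, chosen only for illustration) factors a matrix once and then reuses the factors for several right-hand sides:

% factor once, then each right-hand side costs only two substitutions (sketch)
n = 500; A = rand(n) + n*eye(n);   % a well-conditioned test matrix
B = rand(n, 20);                    % 20 different right-hand sides
[L, U, P] = lu(A);                  % O(n^3) work, done once
X = zeros(n, 20);
for k = 1:20
  X(:,k) = U\(L\(P*B(:,k)));        % O(n^2) work per right-hand side
end
norm(A*X - B)                       % residual check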

In the analysis of many physical systems, sets of linear equations arise


that have coefficient matrices that are both symmetric and positive-definite.
Now we factorize such a matrix A into the product of lower-triangular and
upper-triangular matrices which have these two properties. Before we do
the factorization, we define the following matrix.
Definition 1.31 (Positive-Definite Matrix)

The function
$$x^T A x = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}
\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix}$$
or
$$x^T A x = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}x_i x_j \qquad (1.42)$$
can be used to represent any quadratic polynomial in the variables x1, x2, . . . , xn
and is called a quadratic form. A matrix is said to be positive-definite if
its quadratic form is positive for all real nonzero vectors x, i.e.,
$$x^T A x > 0, \quad \text{for every } n\text{-dimensional column vector } x \neq 0.$$
Example 1.45 The matrix
 
$$A = \begin{bmatrix} 4 & -1 & 0\\ -1 & 4 & -1\\ 0 & -1 & 4 \end{bmatrix}$$
is positive-definite. Suppose x is any nonzero three-dimensional column
vector; then
$$x^T A x = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}
\begin{bmatrix} 4 & -1 & 0\\ -1 & 4 & -1\\ 0 & -1 & 4 \end{bmatrix}
\begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix}
= \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}
\begin{bmatrix} 4x_1 - x_2\\ -x_1 + 4x_2 - x_3\\ -x_2 + 4x_3 \end{bmatrix}.$$
Thus,
$$x^T A x = 4x_1^2 - 2x_1x_2 + 4x_2^2 - 2x_2x_3 + 4x_3^2.$$
After rearranging the terms, we have
$$x^T A x = 3x_1^2 + (x_1 - x_2)^2 + 2x_2^2 + (x_2 - x_3)^2 + 3x_3^2.$$
Hence,
$$3x_1^2 + (x_1 - x_2)^2 + 2x_2^2 + (x_2 - x_3)^2 + 3x_3^2 > 0,$$
unless x1 = x2 = x3 = 0. •
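For a symmetric matrix, positive-definiteness can also be probed numerically, since it is equivalent to all eigenvalues being positive. A brief MATLAB sketch (illustrative only):

% checking positive-definiteness of the matrix of Example 1.45 (sketch)
A = [4 -1 0; -1 4 -1; 0 -1 4];
lambda = eig(A)                 % all eigenvalues should be positive
x = randn(3,1);                 % a random nonzero vector
q = x'*A*x                      % the quadratic form; positive for such x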

Symmetric positive-definite matrices occur frequently in equations de-


rived by minimization or energy principles, and their properties can often
be utilized in numerical processes.

Theorem 1.24 If A is a positive-definite matrix, then:


1. A is nonsingular.

2. aii > 0, for each i = 1, 2, . . . , n.

Theorem 1.25 The symmetric matrix A is a positive-definite matrix, if


and only if Gaussian elimination without row interchange can be performed
on the linear system Ax = b, with all pivot elements positive. •

Theorem 1.26 A matrix A is positive-definite if the determinant of the


principal minors of A are positive.

The principal minors of a matrix A are the square submatrices lying in the
upper-left hand corner of A. An n × n matrix A has n of these principal
minors. For example, for the matrix
 
6 2 1
A =  2 3 2 ,
1 1 2

the determinant of its principal minors are

det(6) = 6 > 0,
 
6 2
det = 18 − 4 = 14 > 0,
2 3
 
6 2 1
det  2 3 2  = 19 > 0.
1 1 2

Thus, the matrix A is positive-definite. •

Theorem 1.27 If a symmetric matrix A is diagonally dominant, then it


must be positive-definite. •

For example, for the diagonally dominant matrix


 
4 −1 2
A =  −1 3 0 ,
2 0 5

the determinant of its principal minors are

det(4) = 4 > 0,
 
4 −1
det = 12 − 1 = 11 > 0,
−1 3
 
4 −1 2
det  −1 3 0  = 43 > 0.
2 0 5

Hence, (using Theorem 1.26) matrix A is positive-definite. •

Theorem 1.28 If a matrix A is nonsingular, then AT A is always positive-


definite. •

For example, for the matrix


 
1 1 1
A =  1 1 2 ,
1 2 1

we can have
 
3 4 4
AT A =  4 6 5  .
4 5 6

Then the determinant of its principal minors are

det(3) = 3 > 0,
 
3 4
det = 18 − 16 = 2 > 0,
4 6
 
3 4 4
det  4 6 5  = 1 > 0.
4 5 6

Thus, matrix A is positive-definite. •

Cholesky Method

The Cholesky method (or square root method) is of the same form as
Doolittle’s method and Crout’s method except it is limited to equations
involving symmetrical coefficient matrices. In the case of a symmetric
and positive-definite matrix A it is possible to construct an alternative
triangular factorization with a saved number of calculations compared with
previous factorizations. Here, we decompose the matrix A into the product
of LLT , i.e.,
A = LLT , (1.43)

where L is the lower-triangular matrix and LT is its transpose. The ele-


ments of L are computed by equating successive columns in the relation
    
a11 a12 · · · a1n l11 0 ··· 0 l11 l21 · · · ln1
 a21 a22 · · · a2n   l21 l22 · · · 0   0 l22 · · · ln2 
 
= ..  .
  
 .. .. .. .. . .. .. .. . .. ..
.   .. .   ..
   
 . . . . . . . . 
an1 an2 · · · ann ln1 ln2 · · · lnn 0 0 · · · lnn

After constructing the matrices L and LT , the solution of the system Ax =


b can be computed in the following two steps:
1. Solve Ly = b, for y.
(using forward substitution)

2. Solve LT x = y, for x.
(using backward substitution)
In this procedure, it is necessary to take the square root of the elements
on the main diagonal of the coefficient matrix. However, for a positive-
definite matrix the terms on its main diagonal are positive, so no difficulty
will arise when taking the square root of these terms.
Example 1.46 Construct the LU decomposition of the following matrix
using the Cholesky method:
 
1 1 2
A =  1 2 4 .
2 4 9

Solution. Since
  
l11 0 0 l11 l21 l31
A = LLT =  l21 l22 0   0 l22 l32  ,
l31 l32 l33 0 0 l33

performing the multiplication on the right-hand side gives


   2 
1 1 2 l11 l11 l21 l11 l31
 1 2 4  =  l11 l21 2 2
l21 + l22 l21 l31 + l22 l32  .
2 2 2
2 4 9 l11 l31 l31 l21 + l22 l32 l31 + l32 + l33

Then equate elements of the first column to obtain

1 = l11², which gives l11 = √1 = 1,
1 = l11 l21, which gives l21 = 1,
2 = l11 l31, which gives l31 = 2.
Note that l11 could also be −√1 = −1, so the matrix L is not (quite) unique.
Now equate elements of the second column to obtain

2 2
2 = l21 + l22 gives l22 = 1
4 = l31 l21 + l32 l22 gives l32 = 2.

Finally, equate elements of the third column to obtain

2 2 2
9 = l31 + l32 + l33 gives l33 = 1.

Thus, we obtain

    
1 1 2 1 0 0 1 1 2
 1 2 4  =  1 1 0  0 1 2 ,
2 4 9 2 2 1 0 0 1

the factorization of the given matrix. •
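The same factorization can be confirmed with MATLAB's built-in chol function, which returns an upper-triangular matrix R with A = R'R, so that R' plays the role of the matrix L constructed above:

% verifying the Cholesky factorization of Example 1.46 (sketch)
A = [1 1 2; 1 2 4; 2 4 9];
R = chol(A);          % chol reports an error if A is not positive-definite
L = R'                % reproduces the L obtained above
norm(A - L*L')        % ~0 up to round-off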



For a general n × n matrix, the elements of the lower-triangular matrix L
are constructed from
$$\begin{aligned}
l_{11} &= \sqrt{a_{11}},\\
l_{j1} &= \frac{a_{j1}}{l_{11}}, & j > 1,\\
l_{ii} &= \sqrt{a_{ii} - \sum_{k=1}^{i-1} l_{ik}^2}, & 1 < i < n,\\
l_{ji} &= \frac{1}{l_{ii}}\left[a_{ji} - \sum_{k=1}^{i-1} l_{jk}l_{ik}\right], & j > i > 1,\\
l_{nn} &= \sqrt{a_{nn} - \sum_{k=1}^{n-1} l_{nk}^2}.
\end{aligned} \qquad (1.44)$$

The method fails if ljj = 0 and the expression inside the square root is
negative, in which case all of the elements in column j are purely imaginary.
There is, however, a special class of matrices for which these problems don’t
occur.
The Cholesky method provides a convenient method for investigat-
ing the positive-definiteness of symmetric matrices. The formal definition
xT Ax > 0, for all x 6= 0, is not easy to verify in practice. However, it is
relatively straightforward to attempt the construct of a Cholesky decom-
position of a symmetric matrix.

Theorem 1.29 A matrix A is positive-definite, if and only if A can be


factored in the form A = LLT , where L is a lower-triangular matrix with
nonzero diagonal entries. •

Example 1.47 Show that the following matrix is positive-definite by using


the Cholesky method:  
9 3 6
A =  3 10 8  .
6 8 9

Solution. Since
  
l11 0 0 l11 l21 l31
A = LLT =  l21 l22 0   0 l22 l32  ,
l31 l32 l33 0 0 l33
performing the multiplication on the right-hand side gives
   2 
9 3 6 l11 l11 l21 l11 l31
 3 10 8  =  l11 l21 l21 2 2
+ l22 l21 l31 + l22 l32  .
2 2 2
6 8 9 l11 l31 l31 l21 + l22 l32 l31 + l32 + l33
Then equate elements of the first column to obtain
2

9 = l11 gives l11 = 9=3

3 = l11 l21 gives l21 = 1

6 = l11 l31 gives l31 = 2.


Now equate elements of the second column to obtain
2 2
10 = l21 + l22 gives l22 = 3
8 = l31 l21 + l32 l22 gives l32 = 2.
Finally, equate elements of the third column to obtain
2 2 2
9 = l31 + l32 + l33 gives l33 = 1.
Thus, the factorization obtained as
    
9 3 6 3 0 0 3 1 2
A =  3 10 8  =  1 3 0   0 3 2  = LLT ,
6 8 9 2 2 1 0 0 1
and it shows that the given matrix is positive-definite. •
If the symmetric coefficient matrix is not positive-definite, then the
terms on the main diagonal can be zero or negative. For example, the
symmetric coefficient matrix
 
1 1 2
A= 1 2 4 
2 4 8

is not positive-definite because the Cholesky decomposition of the matrix


has the form
    
1 1 2 1 0 0 1 1 2
A= 1 2 4 = 1 1 0   0 1 2  = LLT ,
2 4 8 2 2 0 0 0 0

which shows that one of the diagonal elements of L and LT is zero. •


Example 1.48 Solve the following linear system by LU decomposition us-
ing the Cholesky method:
   
1 1 2 2
A=  1 2 4  and b =  1 .
2 4 9 1

Solution. The factorization of the coefficient matrix A has already been


constructed in Example (1.46) as
    
1 1 2 1 0 0 1 1 2
 1 2 4 = 1 1 0  0 1 2 .
2 4 9 2 2 1 0 0 1

Then solve the first system Ly = b for unknown vector y, i.e.,


    
1 0 0 y1 2
 1 1 0   y2  =  1  .
2 2 1 y3 1

Performing forward substitution yields

y1 = 2, y1 = 2
y1 + y2 = 1, y2 = −1
2y1 + 2y2 + y3 = 1, y3 = −1.

Then solve the second system LT x = y for unknown vector x, i.e.,


    
1 1 2 x1 2
 0 1 2   x2  =  −1  .
0 0 1 x3 −1

Performing backward substitution yields

x1 + x2 + 2x3 = 2 gives x1 = 3
x2 + 2x3 = −1 gives x2 = 1
x3 = −1 gives x3 = −1,

which gives the approximate solution x∗ = [3, 1, −1]T . •

Now use the following MATLAB commands to obtain the above results:

>> A = [1 1 2; 1 2 4; 2 4 9];
>> b = [2 1 1];
>> sol = Cholesky(A, b);

Procedure 1.7 (LU Decomposition by the Cholesky Method)

1. Take the positive-definite matrix A.

2. If possible, decompose the matrix A = LLT using (1.44).

3. Solve linear system Ly = b using (1.36).

4. Solve linear system LT x = y using (1.37).

Example 1.49 Find the bounds on α for which the Cholesky factorization
of the following matrix with real elements
 
1 2 α
A =  2 8 2α 
α 2α 9

is possible.

Solution. Since
  
l11 0 0 l11 l21 l31
A = LLT =  l21 l22 0   0 l22 l32  ,
l31 l32 l33 0 0 l33

performing the multiplication on the right-hand side gives

   2 
1 2 α l11 l11 l21 l11 l31
2
 2 8 2α  =  l11 l21 l21 2
+ l22 l21 l31 + l22 l32  .
2 2 2
α 2α 9 l11 l31 l31 l21 + l22 l32 l31 + l32 + l33

Then equate elements of the first column to obtain

2

1 = l11 gives l11 = 1=1

2 = l11 l21 gives l21 = 2

α = l11 l31 gives l31 = α.


Note that l11 could be − 1 and so matrix L is not (quite) unique.
Now equate elements of the second column to obtain

2 2
8 = l21 + l22 gives l22 = 2
2α = l31 l21 + l32 l22 gives l32 = 0.

Finally, equate elements of the third column to obtain

9 = l31² + l32² + l33², which gives l33 = √(9 − α²),
which shows that the allowable values of α must satisfy 9 − α² > 0.


Thus, α is bounded by −3 < α < 3. •

Program 1.13
MATLAB m-file for the Cholesky Method
function sol = Cholesky(A, b)
[n,n]=size(A); l=zeros(n,n); u=l;
l(1,1)=(A(1,1))^0.5; u(1,1)=l(1,1);
for i=2:n; u(1,i)=A(1,i)/l(1,1);
l(i,1)=A(i,1)/u(1,1); end
for i=2:n; for j=2:n; s=0;
if i <= j; K=i-1; else; K=j-1; end
for k=1:K; s = s + l(i, k) ∗ u(k, j); end
if j > i; u(i,j)=(A(i,j)-s)/l(i,i);
elseif i == j
l(i,j)=(A(i,j)-s)^0.5; u(i,j)=l(i,j);
else; l(i,j)=(A(i,j)-s)/u(j,j); end; end; end
y(1)=b(1)/l(1,1);
for k=2:n; sum=b(k);
for i=1:k-1; sum = sum − l(k, i) ∗ y(i); end
y(k)=sum/l(k,k); end
x(n)=y(n)/u(n,n);
for k=n-1:-1:1; sum=y(k);
for i=k+1:n; sum = sum − u(k, i) ∗ x(i); end
x(k)=sum/u(k,k); end; l; u; y; x;

Example 1.50 Find the LU decomposition of the following matrix using


Doolittle’s, Crout’s, and the Cholesky methods:
 
4 −2 4
A =  −2 2 2 .
4 2 29

Solution. By using the simple Gauss elimination method, one can convert
the given matrix into the upper-triangular matrix
 
4 −2 4
U = 0 1 4 
0 0 9

with the help of the possible multiples

m21 = −0.5, m31 = 1, m32 = 4.

Thus, the LU decomposition of A using Doolittle’s method is


  
1 0 0 4 −2 4
A = LU =  −0.5 1 0   0 1 4 .
1 4 1 0 0 9
Rather than computing the next two factorizations directly, we can obtain
them from Doolittle’s factorization above. From Doolittle’s factorization
the LDV factorization of the given matrix A can be obtained as
   
1 0 0 4 0 0 1 −0.5 1
A = LDV =  −0.5 1 0   0 1 0   0 1 4 .
1 4 1 0 0 9 0 0 1

By putting L̂ = LD, i.e.,


    
1 0 0 4 0 0 4 0 0
L̂ = LD =  −0.5 1 0   0 1 0  =  −2 1 0  ,
1 4 1 0 0 9 4 4 9
we can obtain Crout’s factorization as follows:
  
4 0 0 1 −0.5 1
A = LDV = L̂V =  −2 1 0   0 1 4 .
4 4 9 0 0 1
Similarly, the Cholesky factorization is obtained by splitting diagonal ma-
1 1
trix D into the form D 2 D 2 in the LDV factorization and associating one
factor with L and the other with V . Thus,
1 1
A = LDV = (LD 2 )(D 2 V ) = L̂L̂T ,

where
    
1 0 0 2 0 0 2 0 0
L̂ = LD1/2 =  −0.5 1 0   0 1 0  =  −1 1 0 
1 4 1 0 0 3 2 4 3

and
    
2 0 0 1 −0.5 1 2 −1 2
L̂T = D1/2 V =  0 1 0   0 1 4 = 0 1 4 .
0 0 3 0 0 1 0 0 3

Thus, we obtain
  
2 0 0 2 −1 2
A = L̂L̂T =  −1 1 0   0 1 4 ,
2 4 3 0 0 3

the Cholesky factorization of the given matrix A. •

The factorization of primary interest is A = LU , where L is a unit lower-


triangular matrix and U is an upper-triangular matric. Henceforth, when
we refer to a LU decomposition, we mean one in which L is a unit lower-
triangular matrix.

Example 1.51 Show that the following matrix cannot be factored as A =


LDLT :  
1 2 1
A=  2 5 3 .
1 3 2
Solution. By using the simple Gauss elimination method we can use the
multipliers
m21 = 2, m31 = 1, m32 = 1,
and we can convert the given matrix into an upper-triangular matrix as
follows:    
1 2 1 1 2 1
 0 1 1  →  0 1 1  = U.
0 1 1 0 0 0
(2)
Since the element a33 = 0, the simple Gaussian elimination cannot con-
tinue in its present form and from Theorem 1.21, the decomposition of A
is not possible. Hence, A cannot be factored as A = LDLT . •

Since we know that not every matrix has a direct LU decomposition,


we define the following matrix which gives the sufficient condition for the
LU decomposition of the matrix. It also helps us with the convergence of
the iterative methods for solving linear systems.
Definition 1.32 (Strictly Diagonally Dominant Matrix)

A square matrix is said to be Strictly Diagonally Dominant (SDD) if the


absolute value of each element on the main diagonal is greater than the
sum of the absolute values of all the other elements in that row. Thus, a
SDD matrix is defined as
$$|a_{ii}| > \sum_{\substack{j=1\\ j\neq i}}^{n} |a_{ij}|, \qquad \text{for } i = 1, 2, \dots, n. \qquad (1.45)$$

Example 1.52 The matrix


 
7 3 1
A= 1 6 3 
−2 4 8
is SDD since
|7| > |3| + |1|, i.e., 7 > 4
|6| > |1| + |3|, i.e., 6 > 4
|8| > | − 2| + |4|, i.e., 8 > 6,
but the matrix  
6 −3 4
B= 3 7 3 
5 −4 10
is not SDD since
|6| > | − 3| + |4|, i.e., 6 > 7,

which is not true. •
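The test in (1.45) is easy to carry out row by row. A small MATLAB sketch (the variable names are ours) for the two matrices of Example 1.52:

% strict diagonal dominance test (sketch)
A = [7 3 1; 1 6 3; -2 4 8];
d = abs(diag(A));                         % |a_ii|
offdiag = sum(abs(A),2) - d;              % sum of |a_ij| for j ~= i
all(d > offdiag)                          % returns 1 (true): A is SDD
B = [6 -3 4; 3 7 3; 5 -4 10];
all(abs(diag(B)) > sum(abs(B),2) - abs(diag(B)))   % returns 0: B is not SDD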

An SDD matrix occurs naturally in a wide variety of practical applica-


tions, and when solving an SDD system by the Gauss elimination method,
partial pivoting is never required.

Theorem 1.30 If a matrix A is strictly diagonally dominant, then:

1. Matrix A is nonsingular.

2. Gaussian elimination without row interchange can be performed on


the linear system Ax = b.

3. Matrix A has LU factorization. •

Example 1.53 Solve the following linear system using the simple Gaus-
sian elimination method and also find the LU decomposition of the matrix
using Doolittle’s method and Crout’s method:

5x1 + x2 + x3 = 7
2x1 + 6x2 + x3 = 9
x1 + 2x2 + 9x3 = 12.

Solution. Start with the augmented matrix form


.
 
5 1 1 .. 7
 2 6 1 ... 9
 
,
 
.
1 2 9 .. 12

and since a11 = 5 ≠ 0, we can eliminate the elements a21 and a31 by
subtracting from the second and third rows the appropriate multiples of the
first row. In this case the multiples are
m21 = 2/5 = 0.4 and m31 = 1/5 = 0.2.
Hence,
..
 
5 1 1 . 7
 0 5.6 0.6 ... 6.2  .
 
 
..
0 1.8 8.8 . 10.6
(1) (1)
Since a22 = 5.6 6= 0, we eliminate the entry in the a32 position by sub-
1.8
tracting the multiple m32 = 5.6 = 0.32 of the second row from the third row
156 Applied Linear Algebra and Optimization using MATLAB

to get
.
 
5 1 ..
1 7
 0 5.6 0.6 ... 6.2  .
 
 
..
0 0 8.6 . 8.6
Obviously, the original set of equations has been transformed to an upper-
triangular form. All the diagonal elements of the obtaining upper-triangular
matrix are nonzero, which means that the coefficient matrix of the given
system is nonsingular, therefore, the given system has a unique solution.
Now expressing the set in algebraic form yields
5x1 + x2 + x3 = 7
5.6x2 + 0.6x3 = 6.2
8.6x3 = 8.6.
Now use backward substitution to get the solution of the system as
8.6x3 = 8.6 gives x3 = 1
5.6x2 = −0.6x3 + 6.2 = −0.6 + 6.2 = 5.6 gives x2 = 1
5x1 = 7 − x2 − x3 = 7 − 1 − 1 = 5 gives x1 = 1.
We know that when using LU decomposition by Doolittle’s method the un-
known elements of matrix L are the multiples used and the matrix U is the
same as we obtained in the forward elimination process of the simple Gauss
elimination. Thus, the LU decomposition of matrix A can be obtained by
using Doolittle’s method as follows:
   
$$A = \begin{bmatrix} 5 & 1 & 1\\ 2 & 6 & 1\\ 1 & 2 & 9 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0\\ 0.4 & 1 & 0\\ 0.2 & 0.32 & 1 \end{bmatrix}
\begin{bmatrix} 5 & 1 & 1\\ 0 & 5.6 & 0.6\\ 0 & 0 & 8.6 \end{bmatrix} = LU.$$
Similarly, the LU decomposition of matrix A can be obtained by using
Crout’s method as
   
$$A = \begin{bmatrix} 5 & 1 & 1\\ 2 & 6 & 1\\ 1 & 2 & 9 \end{bmatrix}
= \begin{bmatrix} 5 & 0 & 0\\ 2 & 5.6 & 0\\ 1 & 1.8 & 8.6 \end{bmatrix}
\begin{bmatrix} 1 & 0.2 & 0.2\\ 0 & 1 & 0.1\\ 0 & 0 & 1 \end{bmatrix} = LU.$$
Thus, the conditions of Theorem 1.30 are satisfied. •

1.4.6 Tridiagonal Systems of Linear Equations


The application of numerical methods to the solution of certain engineering
problems may in some cases result in a set of tridiagonal linear algebraic
equations. Heat conduction and fluid flow problems are some of the many
applications that generate such a system.
A tridiagonal system has a coefficient matrix T in which all elements are zero
except those on the main diagonal and on the two diagonals just above and
below it (usually called the superdiagonal and subdiagonal, respectively).
Such a matrix has the form
$$T = \begin{bmatrix}
\alpha_1 & c_1 & 0 & \cdots & 0\\
\beta_2 & \alpha_2 & c_2 & \ddots & 0\\
0 & \beta_3 & \alpha_3 & \ddots & 0\\
\vdots & \ddots & \ddots & \ddots & c_{n-1}\\
0 & 0 & 0 & \beta_n & \alpha_n
\end{bmatrix}. \qquad (1.46)$$

This type of matrix can be stored more economically than a fully populated
matrix. Obviously, one may use any one of the methods discussed in the
previous sections for solving the tridiagonal system
T x = b, (1.47)
but the linear system involving nonsingular matrices of the form T given
in (1.47) are also most easily solved by the LU decomposition method just
described for the general linear system. The tridiagonal matrix T can be
factored into a lower-bidiagonal factor L and an upper-bidiagonal factor U
having the following forms:
$$L = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0\\
l_2 & 1 & 0 & \ddots & 0\\
0 & l_3 & 1 & \ddots & 0\\
\vdots & \ddots & \ddots & \ddots & 0\\
0 & 0 & 0 & l_n & 1
\end{bmatrix}, \qquad
U = \begin{bmatrix}
u_1 & c_1 & 0 & \cdots & 0\\
0 & u_2 & c_2 & \ddots & 0\\
0 & 0 & u_3 & \ddots & 0\\
\vdots & \ddots & \ddots & \ddots & c_{n-1}\\
0 & 0 & 0 & 0 & u_n
\end{bmatrix}. \qquad (1.48)$$

The unknown elements li and ui of the matrices L and U, respectively, can be
computed as a special case of Doolittle's method using the LU decomposition
method:
$$u_1 = \alpha_1, \qquad l_i = \frac{\beta_i}{u_{i-1}}, \qquad u_i = \alpha_i - l_i c_{i-1}, \qquad i = 2, 3, \dots, n. \qquad (1.49)$$
After finding the values of li and ui, they are used along with the elements ci
to solve the tridiagonal system (1.47). First solve the bidiagonal system
$$Ly = b \qquad (1.50)$$
for y by forward substitution,
$$y_1 = b_1, \qquad y_i = b_i - l_i y_{i-1}, \quad i = 2, 3, \dots, n, \qquad (1.51)$$
and then solve the second bidiagonal system
$$Ux = y \qquad (1.52)$$
for x by backward substitution,
$$x_n = y_n/u_n, \qquad x_i = (y_i - c_i x_{i+1})/u_i, \quad i = n-1, \dots, 1. \qquad (1.53)$$

The entire process for solving the original system (1.47) requires 3n ad-
ditions, 3n multiplications, and 2n divisions. Thus, the total number of
multiplications and divisions is approximately 5n.
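The recurrences (1.49)–(1.53) translate directly into a short loop, often called the Thomas algorithm. The sketch below is illustrative only; the diagonals are stored as the vectors alpha, beta, and c (our names), with the unused entries beta(1) and c(n) set to zero, and it solves the same 4 × 4 system that is worked by hand in Example 1.54 below:

% tridiagonal solver based on (1.49)-(1.53) (sketch)
alpha = [1 2 3 4]; beta = [0 1 1 1]; c = [1 1 1 0]; b = [1 0 1 1];
n = length(alpha);
u = zeros(1,n); l = zeros(1,n); y = zeros(1,n); x = zeros(1,n);
u(1) = alpha(1); y(1) = b(1);
for i = 2:n
  l(i) = beta(i)/u(i-1);               % (1.49)
  u(i) = alpha(i) - l(i)*c(i-1);
  y(i) = b(i) - l(i)*y(i-1);           % (1.51), forward substitution
end
x(n) = y(n)/u(n);
for i = n-1:-1:1
  x(i) = (y(i) - c(i)*x(i+1))/u(i);    % (1.53), backward substitution
end
x                                       % gives [3 -2 1 0]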

Most large tridiagonal systems are strictly diagonally dominant (as defined
earlier), so pivoting is not necessary. When solving systems of equations
with a tridiagonal coefficient matrix T, iterative methods can sometimes
be used to one’s advantage. These methods are introduced in Chapter 2.

Example 1.54 Solve the following tridiagonal system of equations using


the LU decomposition method:

x1 + x2 = 1
x1 + 2x2 + x3 = 0
x2 + 3x3 + x4 = 1
x3 + 4x4 = 1.

Solution. Construct the factorization of tridiagonal matrix T as follows:


    
1 1 0 0 1 0 0 0 u1 1 0 0
 1 2 1 0   l2 1 0
  0   0 u2 1 0
  

 0 = .
1 3 1   0 l3 1 0   0 0 u3 1 
0 0 1 4 0 0 l4 1 0 0 0 u4

Then the elements of the L and U matrices can be computed by using (1.48)
as follows:
u1 = α1 = 1

β2 1
l2 = = =1
u1 1

u2 = α2 − l2 c1 = 2 − (1)1 = 1

b3 1
l3 = = =1
u2 1

u3 = α3 − l3 c2 = 3 − (1)1 = 2

b4 1
l4 = =
u3 2

1 7
u4 = α4 − l4 c3 = 4 − ( )1 = .
2 2
After finding the elements of the bidiagonal matrices L and U , we solve the

first system Ly = b as follows:

   

1 0 0 0 y1 1
 1 1 0 0   y2   0 
   =  .
 0 1 1 0   y3   1 
0 0 21 1 y4 1

Using forward substitution, we get

[y1 , y2 , y3 , y4 ]T = [1, −1, 2, 0]T .

Now we solve the second system U x = y as follows:

   

1 1 0 0 x1 1
 0 1 1 0   x2   −1 
=
  2 .
  
 0 0 2 1   x3
0 0 0 27 x4 0

Using backward substitution, we get

x∗ = [x1 , x2 , x3 , x4 ]T = [3, −2, 1, 0]T ,

which is the required solution of the given system. •

The above results can be obtained using MATLAB commands. We do


the following:

>> Tb = [1 1 0 0 1; 1 2 1 0 0; 0 1 3 1 1; 0 0 1 4 1]; % the augmented matrix [T | b]
>> TriDLU(Tb);

Program 1.14
MATLAB m-file for LU Decomposition for a Tridiagonal System
function sol=TriDLU(Tb)
% Tb is the augmented matrix [T | b] of the tridiagonal system.
[m,n]=size(Tb); L=eye(m); U=zeros(m);
U(1,1)=Tb(1,1);
for i=2:m
U(i-1,i)=Tb(i-1,i);
L(i,i-1)=Tb(i,i-1)/U(i-1,i-1);
U(i,i)=Tb(i,i)-L(i,i-1)*Tb(i-1,i); end
disp('The lower-triangular matrix'); disp(L);
disp('The upper-triangular matrix'); disp(U);
y=inv(L)*Tb(:,n); x=inv(U)*y; sol=x;

Procedure 1.8 (LU Decomposition by the Tridiagonal Method)

1. Take the tridiagonal matrix T .

2. Decompose the matrix T = LU using (1.49).

3. Solve linear system Ly = b using (1.51).

4. Solve linear system U x = y using (1.53).

1.5 Conditioning of Linear Systems


In solving the linear system numerically we have to see the problem condi-
tioning, algorithm stability, and cost. Earlier we discussed efficient elimi-
nation schemes to solve a linear system, and these schemes are stable when
pivoting is employed. But there are some ill-conditioned systems which are
tough to solve by any method. These types of linear systems are identified
in this chapter.

Here, we will present a parameter, the condition number, which quan-


titatively measures the conditioning of a linear system. The condition
number is greater than and equal to one and as a linear system becomes

more ill-conditioned, the condition number increases. After factoring a ma-


trix, the condition number can be estimated in roughly the same time it
takes to solve a few factored systems (LU )x = b. Hence, after factoring a
matrix, the extra computer time needed to estimate the condition number
is usually insignificant.

1.5.1 Norms of Vectors and Matrices


For solving linear systems, we discuss a method for quantitatively mea-
suring the distance between vectors in Rn , the set of all column vectors
with real components, to determine whether the sequence of vectors that
results from using a direct method converges to a solution of the system.
To define a distance in Rn , we use the notation of the norm of a vector.

Vector Norms

It is sometimes useful to have a scalar measure of the magnitude of a vector.


Such a measure is called a vector norm and for a vector x is written as kxk.

A vector norm on Rn is a function from Rn to R satisfying:

1. kxk > 0, for all x ∈ Rn ;

2. kxk = 0, if and only if x = 0;

3. kαxk = |α|kxk, for all α ∈ R, x ∈ Rn ;

4. kx + yk ≤ kxk + kyk, for all x, y ∈ Rn .

There are three norms in Rn that are most commonly used in applications,
called l1 -norm, l2 -norm, and l∞ -norm, and are defined for the given vectors

x = [x1, x2, . . . , xn]^T as
$$\|x\|_1 = \sum_{i=1}^{n} |x_i|, \qquad
\|x\|_2 = \left(\sum_{i=1}^{n} x_i^2\right)^{1/2}, \qquad
\|x\|_\infty = \max_{1\le i\le n} |x_i|.$$

The l1 -norm is called the absolute norm, the l2 -norm is frequently called
the Euclidean norm as it is just the formula for distance in ordinary three-
dimensional Euclidean space extended to dimension n, and finally, the
l∞ -norm is called the maximum norm or occasionally the uniform norm.
All these three norms are also called the natural norms.

Example 1.55 Compute the lp -norms (p = 1, 2, ∞) of the vector x =


[−5, 3, −2]T in R3 .

Solution. These lp -norms (p = 1, 2, ∞) of the given vector are:

kxk1 = |x1 | + |x2 | + |x3 | = | − 5| + |3| + | − 2| = 10,


h i1/2
kxk2 = (x21 + x22 + x23 )1/2 = (−5)2 + (3)2 + (−2)2 ≈ 6.16,

kxk∞ = max{|x1 |, |x2 |, |x3 |} = max{| − 5|, |3|, | − 2|} = 5.

In MATLAB, the built-in norm function computes the lp -norms of vec-


tors. If only one argument is passed to norm, the l2 -norm is returned and
for two arguments, the second one is used to specify the value of p:

>> x = [−5 3 − 2];


>> v = norm(x)
v = 6.16
>> x = [−5 3 − 2];
>> v = norm(x, 2)
v = 6.16
>> x = [−5 3 − 2];
>> v = norm(x, 1)
v = 10
>> x = [−5 3 − 2];
>> v = norm(x, inf )
v=5

The internal MATLAB constant inf is used to select the l∞ -norm.

Matrix Norms
A matrix norm is a measure of how well one matrix approximates another,
or, more accurately, of how well their difference approximates the zero ma-
trix. An iterative procedure for inverting a matrix produces a sequence
of approximate inverses. Since, in practice, such a process must be termi-
nated, it is desirable to have some measure of the error of an approximate
inverse.

So a matrix norm on the set of all n × n matrices is a real-valued


function, k.k, defined on this set, satisfying for all n × n matrices A and B
and all real numbers α as follows:

1. kAk > 0, A 6= 0;

2. kAk = 0, A = 0;

3. kIk = 1, I is the identity matrix;

4. kαAk = |α|kAk, for scalar α ∈ R;

5. kA + Bk ≤ kAk + kBk;

6. kABk ≤ kAkkBk;

7. kA − Bk ≥ kAk − kBk .

Several norms for matrices have been defined, and we shall use the following
three natural norms l1, l2, and l∞ for a square matrix of order n:
$$\|A\|_1 = \max_{j}\sum_{i=1}^{n} |a_{ij}| \ \ (\text{maximum column sum}), \qquad
\|A\|_2 = \max_{\|x\|_2=1}\|Ax\|_2 \ \ (\text{spectral norm}), \qquad
\|A\|_\infty = \max_{i}\sum_{j=1}^{n} |a_{ij}| \ \ (\text{maximum row sum}).$$

The l1 -norm and l∞ -norm are widely used because they are easy to cal-
culate. The matrix norm kAk2 that corresponds to the l2 -norm is related
to the eigenvalues of the matrix. It sometimes has special utility because
no other norm is smaller than this norm. It, therefore, provides the best
measure of the size of a matrix, but is also the most difficult to compute.
We will discuss this natural norm later in the chapter.

For an m × n matrix, we can paraphrase the Frobenius (or Euclidean)


norm (which is not a natural norm) and define it as
$$\|A\|_F = \left(\sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|^2\right)^{1/2}.$$
It can be shown that
$$\|A\|_F = \sqrt{\operatorname{tr}(A^T A)},$$
where tr(AT A) is the trace of a matrix AT A, i.e., the sum of the diagonal
entries of AT A. The Frobenius norm of a matrix is a good measure of the
magnitude of a matrix. Note that kAkF 6= kAk2 . For a diagonal matrix,
all norms have the same values.

Example 1.56 Compute the lp -norms (p = 1, ∞, F ) of the following ma-


trix:  
4 2 −1
A= 3 5 −2  .
1 −2 7
Solution. These norms are:
3
X
|ai1 | = |4| + |3| + |1| = 8,
i=1
3
X
|ai2 | = |2| + |5| + | − 2| = 9,
i=1
3
X
|ai3 | = | − 1| + | − 2| + |7| = 10,
i=1

so
kAk1 = max{8, 9, 10} = 10.
Also,
3
X
|a1j | = |4| + |2| + | − 1| = 7,
j=1
3
X
|a2j | = |3| + |5| + | − 2| = 10,
j=1
X3
|a3j | = |1| + | − 2| + |7| = 10,
j=1
so
kAk∞ = max{7, 10, 10} = 10.
Finally, we have

kAkF = (16 + 4 + 1 + 9 + 25 + 4 + 1 + 4 + 49)1/2 ≈ 10.6301,

the Frobenius norm of the given matrix. •

Like the lp -norms of vectors, in MATLAB the built-in norm function


can be used to compute the lp -norms of matrices. The l1 -norm of a matrix

can be computed as follows:

>> A = [4 2 − 1; 3 5 − 2; 1 − 2 − 7];
>> B = norm(A, 1)
B=
10
The l∞ -norm of a matrix A is:

>> A = [4 2 − 1; 3 5 − 2; 1 − 2 − 7];
>> B = norm(A, inf )
B=
10
Finally, the Frobenius norm of the matrix A is:

>> A = [4 2 − 1; 3 5 − 2; 1 − 2 − 7];
>> B = norm(A,0 f ro0 )
B=
10.6301

1.5.2 Errors in Solving Linear Systems


Any computed solution of a linear system must, because of round-off and
other errors, be considered an approximate solution. Here, we shall con-
sider the most natural method for determining the accuracy of a solution
of the linear system. One obvious way of estimating the accuracy of the
computed solution x∗ is to compute Ax∗ and to see how close Ax∗ comes
to b. Thus, if x∗ is an approximate solution of the given system Ax = b,
we compute a vector
r = b − Ax∗ , (1.54)
which is called the residual vector and which can be easily calculated. The
quantity
$$\frac{\|r\|}{\|b\|} = \frac{\|b - Ax^*\|}{\|b\|}$$
is called the relative residual. We use MATLAB as follows:

Program 1.15
MATLAB m-file for Finding the Residual Vector
function r=RES(A,b,x0)
[n,n]=size(A);
for i=1:n; R(i) = b(i);
for j=1:n
R(i)=R(i)-A(i,j)*x0(j);end
RES(i)=R(i); end
r=RES’

The smallness of the residual then provides a measure of the goodness


of the approximate solution x∗ . If every component of vector r vanishes,
then x∗ is the exact solution. If x∗ is a good approximation, then we would
expect each component of r to be small, at least in a relative sense. For
example, the linear system
x1 + 2x2 = 3
1.0001x1 + 2x2 = 3.0001

has the approximate solution x∗ = [3, 0]T . To see how good this solution
is, we compute the residual, r = [0, −0.0002]T .

We can conclude from the residual that the approximate solution is


correct to at most three decimal places. Also, the linear system
1.0000x1 + 0.9600x2 + 0.8400x3 + 0.6400x4 = 3.4400
0.9600x1 + 0.9214x2 + 0.4406x3 + 0.2222x4 = 2.5442
0.8400x1 + 0.4406x2 + 1.0000x3 + 0.3444x4 = 2.6250
0.6400x1 + 0.2222x2 + 0.3444x3 + 1.0000x4 = 2.2066

has the exact solution x = [1, 1, 1, 1]T and the approximate solution due to
Gaussian elimination without pivoting is

x∗ = [1.0000322, 0.99996948, 0.99998748, 1.0000113]T ,

and the residual is

r = [0.6 × 10−7 , 0.6 × 10−7 , −0.53 × 10−5 , −0.21 × 10−4 ]T .



The approximate solution due to Gaussian elimination with partial pivot-


ing is
x∗ = [0.9999997, 0.99999997, 0.99999996, 1.0000000]T ,
and the residual is

r = [0.3 × 10−7 , 0.3 × 10−7 , 0.6 × 10−7 , 0.1 × 10−8 ]T .

We found that all the elements of the residual for the second case (with
pivoting) are less than 0.6 × 10−7 , whereas for the first case (without piv-
oting) they are as large as 0.2 × 10−4 . Even without knowing the exact
solution, it is clear that the solution obtained in the second case is much
better than the first case. The residual provides a reasonable measure of
the accuracy of a solution in those cases where the error is primarily due
to the accumulation of round-off errors.

Intuitively it would seem reasonable to assume that when krk is small


for a given vector norm, then the error kx − x∗ k would be small as well.
In fact, this is true for some systems. However, there are systems of equa-
tions which do not satisfy this property. Such systems are said to be
ill-conditioned.

These are systems in which small changes in the coefficients of the


system lead to large changes in the solution. For example, consider the
linear system
x1 + x2 = 2
x1 + 1.01x2 = 2.01.
The exact solution is easily verified to be x1 = x2 = 1. On the other hand,
the system
x1 + x2 = 2
1.001x1 + x2 = 2.01
has the solution x1 = 10, x2 = −8. Thus, a change of 1% in the coefficients
has changed the solution by a factor of 10. If in the above given system, we
substitute x1 = 10, x2 = 8, we find that the residuals are r1 = 0, r2 = 0.09,
so this solution looks reasonable, although it is grossly in error. In practical
problems we can expect the coefficients in the system to be subject to small

errors, either because of round-off or because of physical measurement. If


the system is ill-conditioned, the resulting solution may be grossly in error.
Errors of this type, unlike those caused by round-off error accumulation,
cannot be avoided by careful programming.

We have seen that for ill-conditioned systems the residual is not neces-
sarily a good measure of the accuracy of a solution. How then can we tell
when a system is ill-conditioned? In the following we discuss some possible
indicators of ill-conditioned systems.

Definition 1.33 (Condition Number of a Matrix)

The number kAkkA−1 k is called the condition number of a nonsingular


matrix A and is denoted by K(A), i.e.,

cond(A) = K(A) = kAkkA−1 k. (1.55)

Note that the condition number K(A) for A depends on the matrix norm
used and can, for some matrices, vary considerably as the matrix norm is
changed. Since

1 = kIk = kAA−1 k ≤ kAkkA−1 k = K(A),

the condition number is always in the range 1 ≤ K(A) ≤ ∞ regardless of


any natural norm. The lower limit is attained for identity matrices and
K(A) = ∞ if A is singular. So the matrix A is well-behaved (or well-
conditioned) if K(A) is close to 1 and is increasingly ill-conditioned when
K(A) is significantly greater than 1, i.e., K(A) → ∞. •

The condition numbers provide bounds for the sensitivity of the solution
of a set of equations to changes in the coefficient matrix. Unfortunately,
the evaluation of any of the condition numbers of a matrix A is not a trivial
task since it is necessary first to obtain its inverse.

So if the condition number of a matrix is a very large number, then this


is one of the indicators of an ill-conditioned system. Another indicator of
ill-conditioning is when the pivots during the process of elimination suffer

a loss of one or more significant figures. Small changes in the right-hand


side terms of the system lead to large changes in the solution and give
another indicator of an ill-conditioned system. Also, when the elements of
the inverse of the coefficient matrix are large compared with the elements
of the coefficients matrix, this also shows an ill-conditioned system.

Example 1.57 Compute the condition number of the following matrix us-
ing the l∞ -norm:  
2 −1 0
A =  2 −4 −1  .
−1 0 2
Solution. The condition number of a matrix is defined as

K(A) = kAk∞ kA−1 k∞ .

First, we calculate the inverse of the given matrix, which is
$$A^{-1} = \frac{1}{13}\begin{bmatrix} 8 & -2 & -1\\ 3 & -4 & -2\\ 4 & -1 & 6 \end{bmatrix}.$$
Now we calculate the l∞-norm of both the matrices A and A⁻¹. Since the
l∞-norm of a matrix is the maximum of the absolute row sums, we have
$$\|A\|_\infty = \max\{|2| + |-1| + |0|,\ |2| + |-4| + |-1|,\ |-1| + |0| + |2|\} = 7$$
and
$$\|A^{-1}\|_\infty = \max\left\{\frac{8}{13} + \frac{2}{13} + \frac{1}{13},\ \frac{3}{13} + \frac{4}{13} + \frac{2}{13},\ \frac{4}{13} + \frac{1}{13} + \frac{6}{13}\right\} = \frac{11}{13}.$$

Therefore,
$$K(A) = \|A\|_\infty\,\|A^{-1}\|_\infty = (7)\left(\frac{11}{13}\right) \approx 5.9231.$$
Depending on the application, we might consider this number to be rea-
sonably small and conclude that the given matrix A is reasonably well-
conditioned. •
To get the above results using MATLAB commands, we do the follow-
ing:

>> A = [2 − 1 0; 2 − 4 − 1; −1 0 2];
>> Ainv = inv(A)
>> K(A) = norm(A, inf ) ∗ norm(Ainv, inf );
K(A) =
5.9231
Some matrices are notoriously ill-conditioned. For example, consider the
4 × 4 Hilbert matrix
$$H = \begin{bmatrix}
1 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4}\\
\frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5}\\
\frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \frac{1}{6}\\
\frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \frac{1}{7}
\end{bmatrix},$$
whose entries are defined by
$$a_{ij} = \frac{1}{i + j - 1}, \qquad \text{for } i, j = 1, 2, \dots, n.$$
The inverse of the matrix H can be obtained as
 
16 −120 240 −140
 −120 1200 −2700 1680 
H −1 = 
 240 −2700
.
6480 −4200 
−140 1680 −4200 2800

Then the condition number of the Hilbert matrix is

K(H) = kHk∞ kH −1 k∞ = (2.0833)(13620) ≈ 28375,

which is quite large. Note that the condition numbers of Hilbert matri-
ces increase rapidly as the sizes of the matrices increase. Therefore, large
Hilbert matrices are considered to be extremely ill-conditioned.
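The growth of K(H) with n is easy to observe with the built-in hilb and cond functions; the following lines are a small illustrative sketch:

% condition numbers of Hilbert matrices (sketch)
for n = 4:2:10
  fprintf('n = %2d   cond = %.3e\n', n, cond(hilb(n), inf));
end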

We might think that if the determinant of a matrix is close to zero, then


the matrix is ill-conditioned. However, this is false. Consider the matrix
 −7 
10 0
A= ,
0 10−7

for which det A = 10−14 ≈ 0. One can easily find the condition number of
the given matrix as

K(A) = kAk∞ kA−1 k∞ = (10−7 )(107 ) = 1.

The matrix A is therefore perfectly conditioned. Thus, a small determi-


nant is necessary but not sufficient for a matrix to be ill-conditioned.

The condition number of a matrix K(A) using the l2 -norm can be com-
puted by the built-in function cond command in MATLAB as follows:

>> A = [1 − 1 2; 3 1 − 1; 2 0 1];
>> K(A) = cond(A);
K(A) =
19.7982

Theorem 1.31 (Error in Linear Systems)

Suppose that x∗ is an approximation to the solution x of the linear system


Ax = b and A is a nonsingular matrix and r is the residual vector for x∗ .
Then for any natural norm, the error is

kx − x∗ k ≤ krkkA−1 k, (1.56)

and the relative error is


$$\frac{\|x - x^*\|}{\|x\|} \le K(A)\,\frac{\|r\|}{\|b\|}, \quad \text{provided } x \neq 0,\ b \neq 0. \qquad (1.57)$$

Proof. Since r = b − Ax∗ and A is nonsingular, then

Ax − Ax∗ = b − (b − r) = r,

which implies that


A(x − x∗ ) = r (1.58)
or
x − x∗ = A−1 r.
Taking the norm on both side gives

kx − x∗ k = kA−1 rk ≤ kA−1 kkrk.

Moreover, since b = Ax, then

$$\|b\| \le \|A\|\,\|x\|, \quad\text{or}\quad \|x\| \ge \frac{\|b\|}{\|A\|}.$$

Hence,
$$\frac{\|x - x^*\|}{\|x\|} \le \frac{\|A^{-1}\|\,\|r\|}{\|b\|/\|A\|} \le K(A)\,\frac{\|r\|}{\|b\|}.$$
The inequalities (1.56) and (1.57) imply that the quantities kA−1 k and
K(A) can be used to give an indication of the connection between the
residual vector and the accuracy of the approximation. If the quantity
K(A) ≈ 1, the relative error will be fairly close to the relative residual.
But if K(A) >> 1, then the relative error could be many times larger than
the relative residual.

Example 1.58 Consider the following linear system:

x1 + x2 − x3 = 1
x1 + 2x2 − 2x3 = 0
−2x1 + x2 + x3 = −1.

(a) Discuss the ill-conditioning of the given linear system.


(b) Let x∗ = [2.01, 1.01, 1.98]T be an approximate solution of the given sys-
tem, then find the residual vector r and its norm krk∞ .
(c) Estimate the relative error using (1.57).
(d) Use the simple Gaussian elimination method to find the approximate
error using (1.58).

Solution. (a) Given the matrix


 
1 1 −1
A =  1 2 −2  ,
−2 1 1
the inverse can be computed as
 
2 −1 0
A−1 =  1.5 −0.5 0.5  .
2.5 −1.5 0.5
Then the l∞ -norms of both matrices are
kAk∞ = 5 and kA−1 k∞ = 4.5.
Using the values of both matrices’ norms, we can find the value of the
condition number of A as
K(A) = kAk∞ k|A−1 k∞ = 22.5 >> 1,
which shows that the matrix is ill-conditioned. Thus, the given system is
ill-conditioned.

>> A = [1 1 − 1; 1 2 − 2; −2 1 1];
>> K(A) = norm(A, inf ) ∗ norm(inv(A), inf );
(b) The residual vector can be calculated as
r = b − Ax∗
    
1 1 1 −1 2.01
=  0  −  1 2 −2   1.01  .
−1 −2 1 1 1.98

After simplifying, we get


 
−0.04
r =  −0.07  ,
0.03

and it gives
krk∞ = 0.07.
>> A = [1 1 − 1; 1 2 − 2; −2 1 1];
>> b = [1 0 − 1]0 ;
>> x0 = [2.01 1.01 1.98]0 ;
>> r = RES(A, b, x0);
>> rnorm = norm(r, inf );
(c) From (1.57), we have

kx − x∗ k krk
≤ K(A) .
kxk kbk

By using parts (a) and (b) and the value kbk∞ = 1, we obtain

kx − x∗ k (0.07)
≤ (22.5) = 1.575.
kxk 1

>> RelErr = (K(A) ∗ rnorm)/norm(b, inf );

(d) Solve the linear system Ae = r, where


   
1 1 −1 −0.04
A =  1 2 −2  and r =  −0.07 
−2 1 1 0.03

and e = x − x∗ . Write the above system in the augmented matrix form


..
 
1 1 −1 . −0.04
 1 2 −2 ... −0.07  .
 
 
..
−2 1 1 . 0.03

After applying the forward elimination step of the simple Gauss elimination
method, we obtain
..
 
1 1 −1 . −0.04
 0 1 −1 ... −0.03  .
 
 
..
0 0 2 . 0.04
Now by using backward substitution, we obtain the solution

e∗ = [−0.01, −0.01, 0.02]T ,

which is the required approximation of the exact error. •

>> B = [1 1 − 1 − 0.04; 1 2 − 2 − 0.07; −2 1 1 0.03];


>> W P (B);

Conditioning
Let us consider the conditioning of the linear system

Ax = b. (1.59)

Case 1.1 Suppose that the right-hand side term b is replaced by b + δb,
where δb is an error in b. If x + δx is the solution corresponding to the
right-hand side b + δb, then we have

A(x + δx) = (b + δb), (1.60)

which implies that

Ax + Aδx = b + δb,
Aδx = δb.

Multiplying by A−1 , we get

δx = A−1 δb.

Taking the norm gives

kδxk = kA−1 δbk ≤ kA−1 kkδbk. (1.61)



Thus, the change kδxk in the solution is bounded by kA−1 k times the change
kδbk in the right-hand side.
The conditioning of the linear system is connected with the ratio between
the relative error kδxk/kxk and the relative change kδbk/kbk in the right-hand
side, which gives

$$\frac{\|\delta x\|/\|x\|}{\|\delta b\|/\|b\|} = \frac{\|A^{-1}\delta b\|/\|x\|}{\|\delta b\|/\|Ax\|} = \frac{\|Ax\|\,\|A^{-1}\delta b\|}{\|x\|\,\|\delta b\|} \le \|A\|\,\|A^{-1}\|,$$

which implies that


$$\frac{\|\delta x\|}{\|x\|} \le K(A)\,\frac{\|\delta b\|}{\|b\|}. \qquad (1.62)$$

Thus, the relative change in the solution is bounded by the condition num-
ber of the matrix times the relative change in the right-hand side. When
the product in the right-hand side is small, the relative change in the so-
lution is small.

Case 1.2 Suppose that the matrix A is replaced by A + δA, where δA is


the error in A, while the right-hand side term b is similar. If x + δx is the
solution corresponding to the matrix A + δA, then we have

(A + δA)(x + δx) = b, (1.63)

which implies that

Ax + Aδx + δA(x + δx) = b

or
Aδx = −δA(x + δx).
Multiplying by A−1 , we get

δx = −A−1 δA(x + δx).

Taking the norm gives

kδxk = k − A−1 δA(x + δx)k ≤ kA−1 kkδAk(kxk + kδxk)



or
kδxk(1 − kA−1 kkδAk) ≤ kA−1 kkδAkkxk,
which can be written as
$$\frac{\|\delta x\|}{\|x\|} \le \frac{\|A^{-1}\|\,\|\delta A\|}{1 - \|A^{-1}\|\,\|\delta A\|} = \frac{K(A)\,\|\delta A\|/\|A\|}{1 - \|A^{-1}\|\,\|\delta A\|}. \tag{1.64}$$
If the product kA−1 kkδAk is much smaller than 1, the denominator in
(1.64) is near 1. Consequently, when kA−1 kkδAk is much smaller than 1,
then (1.64) implies that the relative change in the solution is bounded by
the condition number of a matrix times the relative change in the coefficient
matrix.

Case 1.3 Suppose that there is a change in the coefficient matrix A and the
right-hand side term b together, and if x + δx is the solution corresponding
to the coefficient matrix A + δA and the right-hand side b + δb, then we
have
(A + δA)(x + δx) = (b + δb), (1.65)
which implies that
Ax + Aδx + δA x + δA δx = b + δb
or
(A + δA)δx = δb − δA x.
Multiplying by A−1 , we get
(I + A−1 δA)δx = A−1 (δb − δA x)
or
δx = (I + A−1 δA)−1 A−1 (δb − δA x). (1.66)
Since we know that if A is nonsingular and δA is the error in A, we obtain
kA−1 δAk ≤ kA−1 kkδAk < 1, (1.67)
it then follows that (see Fröberg 1969) the matrix (I+A−1 δA) is nonsingular
and
$$\|(I + A^{-1}\delta A)^{-1}\| \le \frac{1}{1 - \|A^{-1}\delta A\|} \le \frac{1}{1 - \|A^{-1}\|\,\|\delta A\|}. \tag{1.68}$$

Taking the norm of (1.66) and using (1.68) gives
$$\|\delta x\| \le \frac{\|A^{-1}\|}{1 - \|A^{-1}\|\,\|\delta A\|}\bigl[\|\delta b\| + \|x\|\,\|\delta A\|\bigr]$$
or
$$\frac{\|\delta x\|}{\|x\|} \le \frac{\|A^{-1}\|}{1 - \|A^{-1}\|\,\|\delta A\|}\left[\frac{\|\delta b\|}{\|x\|} + \|\delta A\|\right]. \tag{1.69}$$
Since we know that
$$\|x\| \ge \frac{\|b\|}{\|A\|}, \tag{1.70}$$
by using (1.70) in (1.69), we get
$$\frac{\|\delta x\|}{\|x\|} \le \frac{K(A)}{1 - K(A)\dfrac{\|\delta A\|}{\|A\|}}\left[\frac{\|\delta A\|}{\|A\|} + \frac{\|\delta b\|}{\|b\|}\right]. \tag{1.71}$$
The estimate (1.71) shows that small relative changes in A and b cause small relative changes in the solution x of the linear system (1.59) if the quantity
$$\frac{K(A)}{1 - K(A)\dfrac{\|\delta A\|}{\|A\|}} \tag{1.72}$$
is not too large. •
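The estimate (1.71) is easy to try out in MATLAB with only the built-in backslash, norm, and inv commands. The following sketch is an illustration only: the matrix A and the perturbations dA and db below are made up for the experiment and are not part of the text.

>> A = [1 1 -1; 1 2 -2; -2 1 1]; b = [1 0 -1]';   % any nonsingular system
>> dA = 0.001*[1 -2 1; 0 1 -1; 2 0 1];            % illustrative perturbation of A
>> db = 0.001*[1 -1 2]';                          % illustrative perturbation of b
>> x = A\b; dx = (A + dA)\(b + db) - x;           % exact and perturbed solutions
>> K = norm(A, inf)*norm(inv(A), inf);            % condition number K(A)
>> lhs = norm(dx, inf)/norm(x, inf);              % actual relative change in x
>> rhs = K/(1 - K*norm(dA, inf)/norm(A, inf))*(norm(dA, inf)/norm(A, inf) + norm(db, inf)/norm(b, inf));

Since K(A)kδAk/kAk is far below 1 here, the computed lhs stays below the bound rhs, as (1.71) guarantees.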

1.6 Applications
In this section we discuss applications of linear systems. Here, we will solve
or tackle a variety of real-life problems from several areas of science.

1.6.1 Curve Fitting, Electric Networks, and Traffic Flow


Curve Fitting

The following problem occurs in many different branches of science. A set


of data points
(x1 , y1 ), (x2 , y2 ), . . . , (xn , yn )

Figure 1.3: Fitting a graph to data points.

is given and it is necessary to find a polynomial whose graph passes through


the points. The points are often measurements in an experiment. The x-
coordinates are called base points. It can be shown that if the base points
are all distinct, then a unique polynomial of degree n − 1 (or less)

p(x) = a0 + a1 x + · · · + an−2 xn−2 + an−1 xn−1

can be fitted to the points (Figure 1.3).

The coefficients an−1 , an−2 , . . . , a1 , a0 of the appropriate polynomial can


be found by substituting the points into the polynomial equation and then
solving a system of linear equations. It is usual to write the polynomial in
terms of ascending powers of x for the purpose of finding these coefficients.
The columns of the matrix of coefficients of the system of equations then of-
ten follow a pattern. More will be discussed about this in the next chapter.

We now illustrate the procedure by fitting a polynomial of degree 2, a


parabola, to a set of three such data points.

Example 1.59 Determine the equation of the polynomial of degree 2 whose


graph passes through the points (1, 6), (2, 3), and (3, 2).

Solution. Observe that in this example we are given three points and we
want to find a polynomial of degree 2 (one less than the number of data
points). Let the polynomial be
p(x) = a0 + a1 x + a2 x2 .
We are given three points and shall use these three sets of information to
determine the three unknowns a0 , a1 , and a2 . Substituting
x = 1, y = 6; x = 2, y = 3; x = 3, y = 2,
in turn, into the polynomial leads to the following system of three linear
equations in a0 , a1 , and a2 :
a0 + a1 + a2 = 6
a0 + 2a1 + 4a2 = 3
a0 + 3a1 + 9a2 = 2.
Solve this system for a2 , a1 , and a0 using the Gauss elimination method:
$$\left[\begin{array}{ccc|c} 1 & 1 & 1 & 6 \\ 1 & 2 & 4 & 3 \\ 1 & 3 & 9 & 2 \end{array}\right] \approx \left[\begin{array}{ccc|c} 1 & 1 & 1 & 6 \\ 0 & 1 & 3 & -3 \\ 0 & 2 & 8 & -4 \end{array}\right] \approx \left[\begin{array}{ccc|c} 1 & 1 & 1 & 6 \\ 0 & 1 & 3 & -3 \\ 0 & 0 & 2 & 2 \end{array}\right].$$
Now use backward substitution to get the solution of the system (Fig-
ure 1.4),
2a2 = 2 gives a2 = 1

a1 + 3a2 = −3 gives a1 = −6

a0 + a1 + a2 = 6 gives a0 = 11.
Thus,
p(x) = 11 − 6x + x2
is the required polynomial. •
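The same coefficients can be checked quickly in MATLAB (a verification sketch, not part of the hand computation): the coefficient matrix has columns 1, x, x² evaluated at the base points, and the built-in polyfit command returns the same polynomial in descending powers.

>> x = [1 2 3]'; y = [6 3 2]';
>> a = [ones(3,1) x x.^2]\y      % returns a = [11 -6 1]'
>> p = polyfit(x, y, 2)          % returns p = [1 -6 11]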

Figure 1.4: Fitting a graph to data points of Example 1.59.

Electrical Network Analysis

Systems of linear equations are used to determine the currents through


various branches of electrical networks. The following two laws, which are
based on experimental verification in the laboratory, lead to the equations.

Theorem 1.32 (Kirchhoff's Laws)

1. Junctions: All the current flowing into a junction must flow out of
it.
2. Paths: The sum of the IR terms (where I denotes current and R
resistance) in any direction around a closed path is equal to the total voltage
in the path in that direction. •

Example 1.60 Consider the electric network in Figure 1.5. Let us deter-
mine the currents through each branch of this network.

Solution. The batteries are 8 volts and 16 volts. The resistances are 1
ohm, 4 ohms, and 2 ohms. The current entering each battery will be the

Figure 1.5: Electrical circuit.

same as that leaving it.

Let the currents in the various branches of the given circuit be I1 , I2 ,


and I3 . Kirchhoff ’s Laws refer to junctions and closed paths. There are
two junctions in these circuits, namely, the points B and D. There are
three closed paths, namely ABDA, CBDC, and ABCDA. Apply the laws to
the junctions and paths.

Junctions

Junction B : I1 + I2 = I3
Junction D : I3 = I1 + I2
These two equations result in a single linear equation

I1 + I2 − I3 = 0.

Paths

Path ABDA : 2I1 + 1I3 + 2I1 = 8
Path CBDC : 4I2 + 1I3 = 16

It is not necessary to look further at path ABCDA. We now have a system


of three linear equations in three unknowns, I1 , I2 , and I3 . Path ABCDA,
in fact, leads to an equation that is a combination of the last two equations;
there is no new information.

The problem thus reduces to solving the following system of three linear
equations in three variables I1 , I2 , and I3 :
I1 + I2 − I3 = 0
4I1 + I3 = 8
4I2 + I3 = 16.
Solve this system for I1 , I2 , and I3 using the Gauss elimination method:
$$\left[\begin{array}{ccc|c} 1 & 1 & -1 & 0 \\ 4 & 0 & 1 & 8 \\ 0 & 4 & 1 & 16 \end{array}\right] \approx \left[\begin{array}{ccc|c} 1 & 1 & -1 & 0 \\ 0 & -4 & 5 & 8 \\ 0 & 4 & 1 & 16 \end{array}\right] \approx \left[\begin{array}{ccc|c} 1 & 1 & -1 & 0 \\ 0 & -4 & 5 & 8 \\ 0 & 0 & 6 & 24 \end{array}\right].$$
Now use backward substitution to get the solution of the system:
6I3 = 24 gives I3 = 4

−4I2 + 5I3 = 8 gives I2 = 3

I1 + I2 − I3 = 0 gives I1 = 1.
Thus, the currents are I1 = 1, I2 = 3, and I3 = 4. The units are amps.
The solution is unique, as is to be expected in this physical situation. •
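As a quick MATLAB check of Example 1.60 (the variable names are ours, and only the built-in backslash operator is used), the three junction/path equations give the same currents:

>> A = [1 1 -1; 4 0 1; 0 4 1]; b = [0 8 16]';
>> I = A\b       % returns I = [1 3 4]', i.e., I1 = 1, I2 = 3, I3 = 4 amps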

Traffic Flow

Network analysis, as we saw in the previous discussion, plays an important


role in electrical engineering. In recent years, the concepts and tools of
network analysis have been found to be useful in many other fields, such
as information theory and the study of transportation systems. The fol-
lowing analysis of traffic flow through a road network during peak periods
illustrates how systems of linear equations with many solutions can arise
in practice.

Consider the typical road network in Figure 1.6. It represents an area


of downtown Jacksonville, Florida. The streets are all one-way with the
arrows indicating the direction of traffic flow. The flow of traffic in and out
of the network is measured in vehicles per hour (vph). The figures given
here are based on midweek peak traffic hours, 7 A.M. to 9 A.M. and 4
P.M. to 6 P.M. An increase of 2% in the overall flow should be allowed for
during Friday evening traffic flow. Let us construct a mathematical model
that can be used to analyze this network. Let the traffic flows along the

Figure 1.6: Downtown Jacksonville, Florida, USA.

various branches be x1 , . . . , x7 as shown in Figure 1.6.


Theorem 1.33 (Traffic Law)

All traffic entering a junction must leave that junction. •



This conservation of flow constraint (compare it to the first of Kirchhoff’s


Laws for electrical networks) leads to a system of linear equations:
Junction A : Traffic entering = 400 + 200
Traffic leaving = x1 + x5
Thus x1 + x5 = 600
Junction B : Traffic entering = x1 + x6
Traffic leaving = x2 + 100
Thus, x1 + x6 = x2 + 100.
Continuing thus for each junction and writing the resulting equations in
convenient form with variables on the left and constraints on the right, we
get the following system of linear equations:
Junction A : x1 +x5 = 600
Junction B : x1 −x2 +x6 = 100
Junction C : x2 −x7 = 500
Junction D : −x3 +x7 = 200
Junction E : −x3 +x4 +x6 = 800
Junction F : x4 +x5 = 600
The Gauss–Jordan elimination method is used to solve this system of equa-
tions. Observe that the augmented matrix contains many zeros. These
zeros greatly reduce the amount of computation involved. In practice,
networks are much larger than the one we have illustrated here, and the
systems of linear equations that describe them are thus much larger. The
systems are solved on a computer, however, the augmented matrices of all
such systems contain many zeros.

Solve this system for x1 , x2 , . . . , x7 using the Gauss–Jordan elimination method:
$$\left[\begin{array}{ccccccc|c}
1 & 0 & 0 & 0 & 1 & 0 & 0 & 600 \\
1 & -1 & 0 & 0 & 0 & 1 & 0 & 100 \\
0 & 1 & 0 & 0 & 0 & 0 & -1 & 500 \\
0 & 0 & -1 & 0 & 0 & 0 & 1 & 200 \\
0 & 0 & -1 & 1 & 0 & 1 & 0 & 800 \\
0 & 0 & 0 & 1 & 1 & 0 & 0 & 600
\end{array}\right] \approx \cdots \approx
\left[\begin{array}{ccccccc|c}
1 & 0 & 0 & 0 & 0 & 1 & -1 & 600 \\
0 & 1 & 0 & 0 & 0 & 0 & -1 & 500 \\
0 & 0 & 1 & 0 & 0 & 0 & -1 & -200 \\
0 & 0 & 0 & 1 & 0 & 1 & -1 & 600 \\
0 & 0 & 0 & 0 & 1 & -1 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}\right].$$
The system of equations that corresponds to this form is:

x1 +x6 −x7 = 600


x2 −x7 = 500
x3 −x7 = −200
x4 +x6 −x7 = 600
x5 −x6 +x7 = 0.

Expressing each leading variable in terms of the remaining variables, we


get
x1 = −x6 + x7 + 600
x2 = x7 + 500
x3 = x7 − 200
x4 = −x6 + x7 + 600
x5 = x6 − x7 .
As was perhaps to be expected, the system of equations has many solutions—
there are many traffic flows possible. One does have a certain amount of
choice at intersections.

Let us now use this mathematical model to arrive at information. Sup-


pose it becomes necessary to perform road work on the stretch of Adams
Street between Laura and Hogan. It is desirable to have as small a flow of
traffic as possible along this stretch of road. The flows can be controlled
along various branches by means of traffic lights at junctions. What is the
minimum flow possible along Adams that would not lead to traffic conges-
tion? What are the flows along the other branches when this is attained?
Our model will enable us to answer these questions.

Minimizing the flow along Adams corresponds to minimizing x7 . Since


all traffic flows must be greater than or equal to zero, the third equation
implies that the minimum value of x7 is 200, otherwise, x3 could become
negative. (A negative flow would be interpreted as traffic moving in the
opposite direction to the one permitted on a one-way street.) Thus, the
road work must allow for a flow of at least 200 cars per hour on the branch
CD in the peak period.

Let us now examine what the flows in the other branches will be when this minimum flow along Adams is attained. Setting x7 = 200 gives

x1 = −x6 + 800
x2 = 700
x3 = 0
x4 = −x6 + 800
x5 = x6 − 200.

Since x7 = 200 implies that x3 = 0 and vice-versa, we see that the minimum
flow in branch x7 can be attained by making x3 = 0; i.e., by closing branch
DE to traffic. •
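Readers who want to reproduce the reduced form above can use MATLAB's built-in rref command on the augmented matrix of the six junction equations (a verification sketch only; the free variables x6 and x7 correspond to the last two coefficient columns):

>> A = [1 0 0 0 1 0 0; 1 -1 0 0 0 1 0; 0 1 0 0 0 0 -1; 0 0 -1 0 0 0 1; 0 0 -1 1 0 1 0; 0 0 0 1 1 0 0];
>> b = [600 100 500 200 800 600]';
>> rref([A b])   % reproduces the reduced augmented matrix shown above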

1.6.2 Heat Conduction


Another typical application of linear systems is in heat-transfer problems
in physics and engineering.

Suppose we have a thin rectangular metal plate whose edges are kept at
fixed temperatures. As an example, let the left edge be 0o C , the right edge
2o C, and the top and bottom edges 1o C (Figure 1.7). We want to know
the temperature inside the plate. There are several ways of approaching
this kind of problem. The simplest approach of interest to us will be
the following type of approximation: we shall overlay our plate with finer
and finer grids, or meshes. The intersections of the mesh lines are called
mesh points. Mesh points are divided into boundary and interior points,
depending on whether they lie on the boundary or the interior of the plate.
We may consider these points as heat elements, such that each influences
its neighboring points. We need the temperature of the interior points,

Figure 1.7: Heat-transfer problem.

given the temperature of the boundary points. It is obvious that the finer
the grid, the better the approximation of the temperature distribution of
the plate. To compute the temperature of the interior points, we use the
following principle.
Theorem 1.34 (Mean Value Property for Heat Conduction)

The temperature at any interior point is the average of the temperatures of


its neighboring points. •
Suppose, for simplicity, we have only four interior points with unknown
temperatures x1 , x2 , x3 , x4 , and 12 boundary points (not named) with the
temperatures indicated in Figure 1.7.
Example 1.61 Compute the unknown temperatures x1 , x2 , x3 , x4 using Fig-
ure 1.7.

Solution. According to the mean value property, we have
$$x_1 = \frac{1}{4}(x_2 + x_3 + 1), \qquad
x_2 = \frac{1}{4}(x_1 + x_4 + 3), \qquad
x_3 = \frac{1}{4}(x_1 + x_4 + 1), \qquad
x_4 = \frac{1}{4}(x_2 + x_3 + 3).$$
The problem thus reduces to solving the following system of four linear
equations in four variables x1 , x2 , x3 , and x4 :
4x1 − x2 − x3 = 1
−x1 + 4x2 − x4 = 3
−x1 + 4x3 − x4 = 1
− x2 − x3 + 4x4 = 3.
Solve this system for x1 , x2 , x3 , and x4 using the Gauss elimination method:
$$\left[\begin{array}{cccc|c}
4 & -1 & -1 & 0 & 1 \\
-1 & 4 & 0 & -1 & 3 \\
-1 & 0 & 4 & -1 & 1 \\
0 & -1 & -1 & 4 & 3
\end{array}\right] \approx \cdots \approx
\left[\begin{array}{cccc|c}
4 & -1 & -1 & 0 & 1 \\
0 & \frac{15}{4} & -\frac{1}{4} & -1 & \frac{13}{4} \\
0 & 0 & \frac{56}{15} & -\frac{16}{15} & \frac{22}{15} \\
0 & 0 & 0 & \frac{24}{7} & \frac{30}{7}
\end{array}\right].$$
Now use backward substitution to get the solution of the system:
$$\begin{aligned}
\frac{24}{7}x_4 &= \frac{30}{7} &&\text{gives } x_4 = \frac{5}{4} \\
\frac{56}{15}x_3 - \frac{16}{15}x_4 &= \frac{22}{15} &&\text{gives } x_3 = \frac{3}{4} \\
\frac{15}{4}x_2 - \frac{1}{4}x_3 - x_4 &= \frac{13}{4} &&\text{gives } x_2 = \frac{5}{4} \\
4x_1 - x_2 - x_3 &= 1 &&\text{gives } x_1 = \frac{3}{4}.
\end{aligned}$$
Thus, the temperatures are x1 = 3/4, x2 = 5/4, x3 = 3/4, and x4 = 5/4. •
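A one-line MATLAB check of Example 1.61 (verification only):

>> A = [4 -1 -1 0; -1 4 0 -1; -1 0 4 -1; 0 -1 -1 4]; b = [1 3 1 3]';
>> x = A\b       % returns x = [0.75 1.25 0.75 1.25]'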

1.6.3 Chemical Solutions and


Balancing Chemical Equations
Example 1.62 (Chemical Solutions) It takes three different ingredi-
ents, A, B, and C, to produce a certain chemical substance. A, B, and
C have to be dissolved in water separately before they interact to form
the chemical. The solution containing A at 2.5g per cubic centimeter
(g/cm3 ) combined with the solution containing B at 4.2g/cm3 , combined
with the solution containing C at 5.6g/cm3 , makes 26.50g of the chemical.
If the proportions for A, B, C in these solutions are changed to 3.4, 4.7, and
2.8g/cm3 , respectively (while the volumes remain the same), then 22.86g of
the chemical is produced. Finally, if the proportions are changed to 3.7, 6.1,
and 3.7g/cm3 , respectively, then 29.12g of the chemical is produced. What
are the volumes in cubic centimeters of the solutions containing A, B, and
C?

Solution. Let x, y, z be the cubic centimeters of the corresponding volumes


of the solutions containing A, B, and C. Then 2.5x is the mass of A in
the first case, 4.2y is the mass of B, and 5.6z is the mass of C. Added
together, the three masses should be 26.50. So 2.5x + 4.2y + 5.6z = 26.50.
The same reasoning applies to the other two cases, and we get the system

2.5x + 4.2y + 5.6z = 26.50


3.4x + 4.7y + 2.8z = 22.86
3.6x + 6.1y + 3.7z = 29.12.

Solve this system for x, y, and z using the Gauss elimination method:
$$\left[\begin{array}{ccc|c} 2.5 & 4.2 & 5.6 & 26.50 \\ 3.4 & 4.7 & 2.8 & 22.86 \\ 3.6 & 6.1 & 3.7 & 29.12 \end{array}\right]
\approx \left[\begin{array}{ccc|c} 2.5 & 4.2 & 5.6 & 26.50 \\ 0 & -1.012 & -4.816 & -13.18 \\ 0 & 0.052 & -4.364 & -9.04 \end{array}\right]
\approx \left[\begin{array}{ccc|c} 2.5 & 4.2 & 5.6 & 26.50 \\ 0 & -1.012 & -4.816 & -13.18 \\ 0 & 0 & -4.612 & -9.717 \end{array}\right].$$
Now use backward substitution to get the solution of the system:

−4.612z = −9.717 gives z = 2.107


−1.012y − 4.816z = −13.18 gives y = 2.996
2.5x + 4.2y + 5.6z = 26.50 gives x = 0.847.

Hence, the volumes of the solutions containing A, B, and C are, respec-


tively, 0.847cm3 , 2.996cm3 , and 2.107cm3 . •
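The same volumes follow from the backslash operator in MATLAB (verification only):

>> A = [2.5 4.2 5.6; 3.4 4.7 2.8; 3.6 6.1 3.7]; b = [26.50 22.86 29.12]';
>> v = A\b       % returns approximately [0.847 2.996 2.107]'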

Balancing Chemical Equations

When a chemical reaction occurs, certain molecules (the reactants) com-


bine to form new molecules (the products). A balanced chemical equation
is an algebraic equation that gives the relative numbers of reactants and
products in the reaction and has the same number of atoms of each type
on the left- and right-hand sides. The equation is usually written with the
reactants on the left, the products on the right, and an arrow in between
to show the direction of the reaction.

For example, for the reaction in which hydrogen gas (H2 ) and oxygen
(O2 ) combine to form water (H2 O), a balanced chemical equation is

2H2 + O2 −→ 2H2 O,

indicating that two molecules of hydrogen combine with one molecule of


oxygen to form two molecules of water. Observe that the equation is bal-
anced, since there are four hydrogen atoms and two oxygen atoms on each
side. Note that there will never be a unique balanced equation for a reac-
tion, since any positive integer multiple of a balanced equation will also be
balanced. For example, 6H2 + 3O2 −→ 6H2 O is also balanced. Therefore,
we usually look for the simplest balanced equation for a given reaction.
Note that the process of balancing chemical equations really involves solv-
ing a homogeneous system of linear equations.

Example 1.63 (Balancing Chemical Equations) The combustion of


ammonia (N H3 ) in oxygen produces nitrogen (N2 ) and water. Find a bal-
anced chemical equation for this reaction.

Solution. Let w, x, y, and z denote the numbers of molecules of ammonia,


oxygen, nitrogen, and water, respectively, then we are seeking an equation
of the form
wN H3 + xO2 −→ yN2 + zH2 O.
Comparing the number of nitrogen, hydrogen, and oxygen atoms in the
reactants and products, we obtain three linear equations:

Nitrogen: w = 2y
Hydrogen: 3w = 2z
Oxygen: 2x = z.

Rewriting these equations in standard form gives us a homogeneous system


of three equations in four variables:

w − 2y = 0
3w − 2z = 0
2x − z = 0.

The augmented matrix form of the system is
$$\left[\begin{array}{cccc|c} 1 & 0 & -2 & 0 & 0 \\ 3 & 0 & 0 & -2 & 0 \\ 0 & 2 & 0 & -1 & 0 \end{array}\right].$$

Solve this system for w, x, y, and z using the Gauss elimination method with partial pivoting:
$$\left[\begin{array}{cccc|c} 3 & 0 & 0 & -2 & 0 \\ 0 & 0 & -2 & \frac{2}{3} & 0 \\ 0 & 2 & 0 & -1 & 0 \end{array}\right]
\approx \left[\begin{array}{cccc|c} 3 & 0 & 0 & -2 & 0 \\ 0 & 2 & 0 & -1 & 0 \\ 0 & 0 & -2 & \frac{2}{3} & 0 \end{array}\right].$$

Now use backward substitution to get the solution of the homogeneous system:
−2y + (2/3)z = 0 gives y = (1/3)z
2x − z = 0 gives x = (1/2)z
3w − 2z = 0 gives w = (2/3)z.
The smallest positive value of z that will produce integer values for all four variables is the least common denominator of the fractions 2/3, 1/2, and 1/3, namely 6, which gives

w = 4, x = 3, y = 2, z = 6.

Therefore,
4N H3 + 3O2 −→ 2N2 + 6H2 O
is the balanced chemical equation. •
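Because balancing a reaction is a homogeneous problem, MATLAB's built-in null command can produce the coefficient vector directly; with the 'r' option it returns a rational basis of the null space. This is an alternative check, not the method used in the text.

>> A = [1 0 -2 0; 3 0 0 -2; 0 2 0 -1];   % nitrogen, hydrogen, and oxygen balances
>> v = null(A, 'r')                      % returns v = [2/3 1/2 1/3 1]'
>> 6*v                                   % scaling by 6 gives [4 3 2 6]'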

1.6.4 Manufacturing, Social, and Financial Issues


Example 1.64 (Manufacturing) Sun Microsystems manufactures three
types of personal computers: The Cyclone, the Cyclops, and the Cycloid.
It takes 15 hours to assemble the Cyclone, 4 hours to test its hardware, and
5 hours to install its software. The hours required for the Cyclops are 12
hours to assemble, 4.5 hours to test, and 2.5 hours to install. The Cycloid,
being the lower end of the line, requires 10 hours to assemble, 3 hours to
test, and 2.5 hours to install. If the company’s factory can afford 1250
labor hours per month for assembling, 400 hours for testing, and 320 hours
for installation, how many PCs of each kind can be produced in a month?

Solution. Let x, y, z be the number of Cyclones, Cyclops, and Cycloids


produced each month. Then it takes 15x + 12y + 10z hours to assemble the
computers. Hence, 15x + 12y + 10z = 1250. Similarly, we get equations for
testing and installing. The resulting system is
15x + 12y + 10z = 1250
4x + 4.5y + 3z = 400
5x + 2.5y + 2.5z = 320.

Solve this system for x, y, and z using the Gauss elimination method:
$$\left[\begin{array}{ccc|c} 15 & 12 & 10 & 1250 \\ 4 & 4.5 & 3 & 400 \\ 5 & 2.5 & 2.5 & 320 \end{array}\right]
\approx \left[\begin{array}{ccc|c} 15 & 12 & 10 & 1250 \\ 0 & \frac{13}{10} & \frac{1}{3} & \frac{200}{3} \\ 0 & -\frac{3}{2} & -\frac{5}{6} & -\frac{290}{3} \end{array}\right]
\approx \left[\begin{array}{ccc|c} 15 & 12 & 10 & 1250 \\ 0 & \frac{13}{10} & \frac{1}{3} & \frac{200}{3} \\ 0 & 0 & -\frac{35}{78} & -\frac{770}{39} \end{array}\right].$$
Now use backward substitution to get the solution of the system:
$$\begin{aligned}
-\frac{35}{78}z &= -\frac{770}{39} &&\text{gives } z = 44 \\
\frac{13}{10}y + \frac{1}{3}z &= \frac{200}{3} &&\text{gives } y = 40 \\
15x + 12y + 10z &= 1250 &&\text{gives } x = 22.
\end{aligned}$$
(As a check, 15(22) + 12(40) + 10(44) = 1250 and 4(22) + 4.5(40) + 3(44) = 400.)


Hence, 22 Cyclones, 40 Cyclops, and 44 Cycloids can be manufactured monthly. •
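The production numbers are easily verified in MATLAB (verification only):

>> A = [15 12 10; 4 4.5 3; 5 2.5 2.5]; b = [1250 400 320]';
>> n = A\b       % returns n = [22 40 44]'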

Example 1.65 (Weather) The average of the temperature for the cities
of Jeddah, Makkah, and Riyadh was 50o C during a given summer day. The
temperature in Makkah was 5o C higher than the average of the temperatures
of the other two cities. The temperature in Riyadh was 5o C lower than the
average temperature of the other two cities. What was the temperature in
each of the cities?

Solution. Let x, y, z be the temperatures in Jeddah, Makkah, and Riyadh,


respectively. The average temperature of all three cities is (x + y + z)/3, which is 50o C. On the other hand, the temperature in Makkah exceeds the average temperature of Jeddah and Riyadh, (x + z)/2, by 5o C, so y = (x + z)/2 + 5. Likewise, we have z = (x + y)/2 − 5. So, the system becomes

$$\frac{x+y+z}{3} = 50, \qquad y = \frac{x+z}{2} + 5, \qquad z = \frac{x+y}{2} - 5.$$
Rewriting the above system in standard form, we get

x + y + z = 150
−x + 2y − z = 10
−x − y + 2z = −10.

Solve this system for x, y, and z using the Gauss elimination method:
$$\left[\begin{array}{ccc|c} 1 & 1 & 1 & 150 \\ -1 & 2 & -1 & 10 \\ -1 & -1 & 2 & -10 \end{array}\right]
\approx \left[\begin{array}{ccc|c} 1 & 1 & 1 & 150 \\ 0 & 3 & 0 & 160 \\ 0 & 0 & 3 & 140 \end{array}\right].$$

Now use backward substitution to get the solution of the system:

3z = 140 gives z = 46.667


3y = 160 gives y = 53.333
x + y + z = 150 gives x = 50.

Thus, the temperature in Jeddah was 50o C, and the temperatures in Makkah and Riyadh were approximately 53o C and 47o C, respectively. •
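A quick MATLAB check of Example 1.65 (verification only):

>> A = [1 1 1; -1 2 -1; -1 -1 2]; b = [150 10 -10]';
>> t = A\b       % returns approximately [50.000 53.333 46.667]'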

Example 1.66 (Foreign Currency Exchange) An international busi-


ness person needs, on the average, fixed amounts of Pakistani rupees, En-
glish pounds, and Saudi riyals during each of his business trips. He traveled
three times this year. The first time he exchanged a total of $26000 at the

following rates: the dollar was 60 rupees, 0.6 pounds, and 3.75 riyals. The
second time he exchanged a total of $25500 at these rates: the dollar was
65 rupees, 0.56 pounds, and 3.76 riyals. The third time he exchanged again
a total of $25500 at these rates: the dollar was 65 rupees, 0.6 pounds, and
3.75 riyals. How many rupees, pounds, and riyals did he buy each time?

Solution. Let x, y, z be the fixed amounts of rupees, pounds, and riyals he purchases each time. Then the first time he spent (1/60)x dollars to buy rupees, (1/0.6)y dollars to buy pounds, and (1/3.75)z dollars to buy riyals. Hence,
$$\frac{1}{60}x + \frac{1}{0.6}y + \frac{1}{3.75}z = 26000.$$

The same reasoning applies to the other two purchases, and we get the system
$$\begin{aligned}
\frac{1}{60}x + \frac{5}{3}y + \frac{4}{15}z &= 26000 \\
\frac{1}{65}x + \frac{25}{14}y + \frac{25}{94}z &= 25500 \\
\frac{1}{65}x + \frac{5}{3}y + \frac{4}{15}z &= 25500.
\end{aligned}$$

Solve this system for x, y, and z using the Gauss elimination method:
$$\left[\begin{array}{ccc|c} \frac{1}{60} & \frac{5}{3} & \frac{4}{15} & 26000 \\ \frac{1}{65} & \frac{25}{14} & \frac{25}{94} & 25500 \\ \frac{1}{65} & \frac{5}{3} & \frac{4}{15} & 25500 \end{array}\right]
\approx \left[\begin{array}{ccc|c} \frac{1}{60} & \frac{5}{3} & \frac{4}{15} & 26000 \\ 0 & \frac{45}{182} & \frac{121}{6110} & 1500 \\ 0 & \frac{5}{39} & \frac{4}{195} & 1500 \end{array}\right]
\approx \left[\begin{array}{ccc|c} \frac{1}{60} & \frac{5}{3} & \frac{4}{15} & 26000 \\ 0 & \frac{45}{182} & \frac{121}{6110} & 1500 \\ 0 & 0 & \frac{13}{1269} & \frac{6500}{9} \end{array}\right].$$
Now use backward substitution to get the solution of the system:
$$\begin{aligned}
\frac{13}{1269}z &= \frac{6500}{9} &&\text{gives } z = 70500 \\
\frac{45}{182}y + \frac{121}{6110}z &= 1500 &&\text{gives } y = 420 \\
\frac{1}{60}x + \frac{5}{3}y + \frac{4}{15}z &= 26000 &&\text{gives } x = 390000.
\end{aligned}$$
Therefore, each time he bought 390000 rupees, 420 pounds, and 70500 riyals
for his trips. •
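Entering the exact fractional rates in MATLAB gives the same amounts (verification only):

>> A = [1/60 5/3 4/15; 1/65 25/14 25/94; 1/65 5/3 4/15];
>> b = [26000 25500 25500]';
>> m = A\b       % returns approximately [390000 420 70500]'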
Example 1.67 (Inheritance) A father plans to distribute his estate, worth
SR234,000, between his four daughters as follows: 2/3 of the estate is to be
split equally among the daughters. For the rest, each daughter is to receive
SR3,000 for each year that remains until her 21st birthday. Given that
the daughters are all 3 years apart, how much would each receive from her
father’s estate? How old are the daughters now?

Solution. Let x, y, z, and w be the amounts of money that each daughter


will receive from the splitting of 1/3 of the estate, according to age, starting with the oldest one. Then x + y + z + w = (1/3)(234,000) = 78,000. On the
other hand, w − z = 3(3000), z − y = 3(3000), and y − x = 3(3000). The
problem thus reduces to solving the following system of four linear equations
in four variables x, y, z, and w:
x + y + z + w = 78, 000
− z + w = 9, 000
− y + z = 9, 000
−x + y = 9, 000.

Solve this system for x, y, z, and w using the Gauss elimination method with partial pivoting:
$$\left[\begin{array}{cccc|c} 1 & 1 & 1 & 1 & 78000 \\ 0 & 0 & -1 & 1 & 9000 \\ 0 & -1 & 1 & 0 & 9000 \\ -1 & 1 & 0 & 0 & 9000 \end{array}\right]
\approx \left[\begin{array}{cccc|c} 1 & 1 & 1 & 1 & 78000 \\ 0 & 0 & -1 & 1 & 9000 \\ 0 & -1 & 1 & 0 & 9000 \\ 0 & 2 & 1 & 1 & 87000 \end{array}\right]$$
and
$$\left[\begin{array}{cccc|c} 1 & 1 & 1 & 1 & 78000 \\ 0 & 2 & 1 & 1 & 87000 \\ 0 & 0 & \frac{3}{2} & \frac{1}{2} & 52500 \\ 0 & 0 & -1 & 1 & 9000 \end{array}\right]
\approx \left[\begin{array}{cccc|c} 1 & 1 & 1 & 1 & 78000 \\ 0 & 2 & 1 & 1 & 87000 \\ 0 & 0 & \frac{3}{2} & \frac{1}{2} & 52500 \\ 0 & 0 & 0 & \frac{4}{3} & 44000 \end{array}\right].$$
Now use backward substitution to get the solution of the system:
$$\begin{aligned}
\frac{4}{3}w &= 44000 &&\text{gives } w = 33000 \\
\frac{3}{2}z + \frac{1}{2}w &= 52500 &&\text{gives } z = 24000 \\
2y + z + w &= 87000 &&\text{gives } y = 15000 \\
x + y + z + w &= 78000 &&\text{gives } x = 6000.
\end{aligned}$$

One-quarter of two-thirds of the estate is worth (1/4)((2/3)(234,000)) = SR39,000.


So, the youngest daughter will receive (33, 000 + 39, 000) = SR72, 000, the
next one (24, 000 + 39, 000) = SR63, 000, the next one (15, 000 + 39, 000) =
SR54, 000, and the first one (6, 000 + 39, 000) = SR45, 000. The oldest
daughter will receive 6, 000 = 2(3, 000), so she is currently 21 − 2 = 19.
The second one is 16, the third one is 13, and the last one is 10 years old.
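A MATLAB check of the splitting of the one-third share (verification only):

>> A = [1 1 1 1; 0 0 -1 1; 0 -1 1 0; -1 1 0 0]; b = [78000 9000 9000 9000]';
>> s = A\b       % returns s = [6000 15000 24000 33000]'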


1.6.5 Allocation of Resources


A great many applications of systems of linear equations involve allocating
limited resources subject to a set of constraints.

Example 1.68 A dietitian is to arrange a special diet composed of four


foods A, B, C, and D. The diet is to include 72 units of calcium, 45 units
of iron, 42 units of vitamin A, and 60 units of vitamin B. The following
table shows the amount of calcium, iron, vitamin A, and vitamin B (in

Food Calcium Iron Vitamin A Vitamin B


A 18 6 6 6
B 9 6 12 9
C 9 9 6 9
D 12 12 9 18

units) per ounce in foods A, B, C, and D. Find, if possible, the amount of


foods A, B, C, and D that can be included in the special diet to conform to
the dietitian’s recommendations.

Solution. Let x, y, z, and w be the ounces of foods A, B, C, and D,


respectively. Then we have the system of equations

18x + 9y + 9z + 12w = 72
6x + 6y + 9z + 12w = 45
6x + 12y + 6z + 9w = 42
6x + 9y + 9z + 18w = 60.

Solve this system for x, y, z, and w using the Gauss elimination method:
$$\left[\begin{array}{cccc|c} 18 & 9 & 9 & 12 & 72 \\ 6 & 6 & 9 & 12 & 45 \\ 6 & 12 & 6 & 9 & 42 \\ 6 & 9 & 9 & 18 & 60 \end{array}\right]
\approx \left[\begin{array}{cccc|c} 18 & 9 & 9 & 12 & 72 \\ 0 & 3 & 6 & 8 & 21 \\ 0 & 9 & 3 & 5 & 18 \\ 0 & 6 & 6 & 14 & 36 \end{array}\right]$$
and
$$\left[\begin{array}{cccc|c} 18 & 9 & 9 & 12 & 72 \\ 0 & 3 & 6 & 8 & 21 \\ 0 & 0 & -15 & -19 & -45 \\ 0 & 0 & -6 & -2 & -6 \end{array}\right]
\approx \left[\begin{array}{cccc|c} 18 & 9 & 9 & 12 & 72 \\ 0 & 3 & 6 & 8 & 21 \\ 0 & 0 & -15 & -19 & -45 \\ 0 & 0 & 0 & \frac{28}{5} & 12 \end{array}\right].$$
Now use backward substitution to get the solution of the system:
$$\begin{aligned}
\frac{28}{5}w &= 12 &&\text{gives } w = \frac{15}{7} \\
-15z - 19w &= -45 &&\text{gives } z = \frac{2}{7} \\
3y + 6z + 8w &= 21 &&\text{gives } y = \frac{5}{7} \\
18x + 9y + 9z + 12w &= 72 &&\text{gives } x = \frac{29}{14}.
\end{aligned}$$
Thus, the amounts in ounces of foods A, B, C, and D are x = 29/14, y = 5/7, z = 2/7, and w = 15/7, respectively. •
, respectively. •

1.7 Summary
The basic methods for solving systems of linear algebraic equations were
discussed in this chapter. Since these methods use matrices and determi-
nants, the basic properties of matrices and determinants were presented.

Several direct solution methods were also discussed. Among them were
Cramer’s rule, Gaussian elimination and its variants, the Gauss–Jordan
method, and the LU decomposition method. Cramer’s rule is impracti-
cal for solving systems with more than three or four equations. Gaussian
elimination is the best choice for solving linear systems. For systems of
equations having a constant coefficients matrix but many right-hand side

vectors, LU decomposition is the method of choice. The LU decomposi-


tion method has been used for the solution of tridiagonal systems. Direct
methods are generally used when the number of equations is small, or most
of the coefficients of the equations are nonzero, or the system of equations
is not diagonally dominant, or the system of equations is ill-conditioned.
But these methods are generally impractical when a large number of equa-
tions must be solved simultaneously. In this chapter we also discussed
conditioning of linear systems by using a parameter called the condition
number. Many ill-conditioned systems were discussed. The coefficient ma-
trix A of an ill-conditioned system Ax = b has a large condition number.
The numerical solution to a linear system is less reliable when A has a
large condition number than when A has a small condition number. The
numerical solution x∗ of Ax = b is different from the exact solution x be-
cause of round-off errors in all stages of the solution process. The round-off
errors occur in the elimination or factorization of A and during backward
substitution to compute x∗ . The degree to which perturbation in A and
b affect the numerical solution is determined by the value of the condition
number K(A). A large value of K(A) indicates that A is close to being
singular. When K(A) is large, matrix A is said to be ill-conditioned and
small perturbations in A and b cause relatively large differences between x
and x∗ . If K(A) is small, any stable algorithm will return a solution with
small residual r, while if K(A) is large, then the return solution may have
large errors even though the residuals are small. The best way to deal with
ill-conditioning is to avoid it by reformulating the problem.

At the end of the chapter we discussed many applications of linear


systems. Fitting a polynomial of degree (n − 1) to n data points leads
to a system of linear equations that has a unique solution. The analysis
of electric networks and traffic flow give rise to systems that have unique
solutions and many solutions. The model for traffic flow is similar to that
of electric networks, but it has fewer restrictions, leading to more freedom
and thus many solutions in place of a unique solution. Applications to
heat conduction, chemical reactions, balancing equations, manufacturing,
social and financial issues, and allocation of resources were also covered.

1.8 Problems
1. Determine the matrix C given by the following expression

C = 2A − 3B,

if the matrices A and B are


   
2 −1 1 1 1 1
A =  −1 2 3 , B =  0 1 3 .
2 1 2 2 1 4

2. Find the product AB and BA for the matrices of Problem 1.

3. Show that the product AB of the following rectangular matrices is a


singular matrix:
 
6 −3  
2 −1 −2
A= 1 4 , B= .
3 −4 −1
−2 1

4. Let
     
1 2 3 1 1 2 1 0 1
A =  0 −1 2  , B =  −1 1 −1  , C =  0 1 2 .
2 0 2 1 0 2 2 0 1

(a) Compute AB and BA and show that AB 6= BA.


(b) Find (A + B) + C and A + (B + C).
(c) Show that (AB)T = B T AT .

5. Find a value of x and y such that AB T = C T , where


 
1 2 3
A =  4 2 0 , B = [1 x 1], C = [−2 − 2 y].
2 1 3

6. Find the values of a and b such that each of the following matrices is
symmetric:
   
1 3 5 −2 a + b 2
(a) A =  a + 2 5 6  , (b) B =  3 4 2a + b  ,
b+1 6 7 2 5 −3
   
1 4 a−b 1 a − 4b 2
(c) C =  4 2 a + 3b  , (d) D =  2 8 6 .
7 3 4 7 a − 7b 8

7. Which of the following matrices are skew symmetric?

(a)    
1 −5 0 −4
A= , B= ,
5 0 4 0
(b)    
1 9 1 6
C= , D= ,
−9 7 −6 2
(c)    
0 2 −2 3 −3 −3
E =  −2 0 4 , F = 3 3 −3  ,
2 −4 0 3 3 3
(d)    
1 −5 1 2 8 6
G= 5 1 4 , H =  −8 4 2 .
−1 −4 1 −6 −2 5

8. Determine whether each of the following matrices is in row echelon


form, reduced row echelon form, or neither:

(a)    
1 0 0 1 0 8
A =  0 1 0 , B =  0 1 2 ,
0 0 3 0 0 0

(b)    
1 2 3 0 1 2 0 0 1
C =  0 0 0 1 , D =  0 0 1 0 1 ,
0 0 0 1 0 0 0 1 0
(c)  
1 4 5 6  
 0 1 0 0 0 3
1 7 8 
E=
 0
, F =  0 0 1 0 4 ,
0 1 9 
0 0 0 1 5
0 0 0 0
(d)    
1 0 0 3 0 0 0 0 0
 0 1 0 4   0 0 1 2 4 
G=
 0
, H= .
0 0 5   0 0 0 1 0 
0 0 0 6 0 0 0 0 0
9. Find the row echelon form of each of the following matrices using
elementary row operations, and then solve the linear system:

(a)    
0 1 2 1
A =  2 3 4 , b =  −1  .
1 3 2 2
(b)    
1 2 3 1
A =  0 3 1 , b =  0 .
−1 4 5 −3
(c)    
0 −1 0 1
A= 3 0 1 , b =  3 .
0 1 1 2
(d)   

0 −1 2 4 2
 2 3 5 6   1 
A= , b=
 −1  .

 1 3 −2 4 
1 2 −1 3 2

10. Find the row echelon form of each of the following matrices using
elementary row operations, and then solve the linear system:

(a)
   
1 4 −2 5
A =  2 3 2 , b =  −3  .
6 4 1 4

(b)
   
2 2 7 3
A =  0 3 2 , b =  2 .
3 2 1 5

(c)
   
0 −1 0 1
A= 5 0 2 , b =  1 .
−1 1 4 1

(d)
   
1 1 2 4 11
 1 3 4 5   7 
A=
 1
, b=
 6 .

4 2 4 
2 2 −1 3 4

11. Find the reduced row echelon form of each of the following matrices
using elementary row operations, and then solve the linear system:

(a)
   
1 2 3 4
A =  −1 2 1  , b =  3 .
0 1 2 1

(b)
   
0 1 4 1
A =  2 1 −1  , b =  1 .
1 3 4 −1

(c)    
0 −1 3 2 6
 3 2 5 4   4 
A=
 −1
, b=
 4 .

3 1 2 
2 3 4 1 4
(d)    
1 2 −4 1 1
 −2 0 2 3   −1 
A= , b=
 2 .

 0 1 −1 2 
2 3 0 −1 4

12. Compute the determinant of each of the following matrices using


cofactor expansion along any row or column:
     
cos x sin x 1 x y z 2x 0 z
A= 0 3 cos x −3 sin x  , B =  0 x2 y  , C =  0 2y −z  .
0 2 sin x 2 cos x 0 y2 x z −z 2z

13. Compute the determinant of each of the following matrices using


cofactor expansion along any row or column:
     
3 7 6 11 −6 4 4 −8 11
A =  0 3 5  , B =  −16 8 6  , C =  10 1 4 .
7 4 3 5 7 12 7 10 8

14. Let    
1 1 1 0
A= , B= ,
0 1 1 1
then show that (AB)−1 = B −1 A−1 .

15. Evaluate the determinant of each of the following matrices using the
Gauss elimination method:
     
3 1 −1 4 1 6 17 46 7
A= 2 0 4  , B =  −3 6 4 , C =  20 49 8  .
1 −5 1 5 0 9 23 52 19

16. Evaluate the determinant of each of the following matrices using the
Gauss elimination method:
   
4 2 5 −1 4 −2 5 −3
 2 5 4 6 
, B =  1 8 12 7 

A=  4 5 1
,
3   1 4 3 6 
11 7 1 1 5 3 −3 6
   
13 22 −12 8 9 11 2 8
 15 10 33 4   15 1 3 12 
C=
 9 −12
, D=
 9 −12 5 17  .

5 7 
15 33 −19 26 13 17 21 15

17. Find all zeros (values of x such that f (x) = 0) of polynomial f (x) =
det(A), where
 
x−1 3 2
A= 3 x 1 .
2 1 x−2

18. Find all zeros (values of x such that f (x) = 0) of polynomial f (x) =
det(A), where
 
x 0 1
A =  2 1 3 .
0 x 2

19. Find all zeros (values of x such that f (x) = 0) of polynomial f (x) =
det(A), where
 
x −8 5 2
 −3 x 2 1 
A=  3
.
4 x 1 
3 6 −5 17

20. (a) The matrix


 
−x 1 0
A= 0 −x 1 
−c0 −c1 −c2

is called the companion matrix of the polynomial (−1)(c2 x2 +c1 x+c0 ).


Show that

−x 1 0

|A| = 0 −x 1 = (−1)(c2 x2 + c1 x + c0 ).
−c0 −c1 −c2

(b) The matrix  


1 1 1
 
 
 x1 x2 x3 
A= 
 
2 2 2
x1 x2 x3
is called the Vandermonde matrix. It is a square matrix and it is
famously ill-conditioned. Show that

1 1 1



|A| = x1 x2 x3 = (x1 − x2 )(x2 − x3 )(x3 − x1 ).


2 2 2
x x x
1 2 3

(c) A square matrix A is said to be a nilpotent matrix, if Ak = 0


for some positive integer k. Prove that if A is nilpotent, then the
determinant of A is zero.

(d) A square matrix A is said to be an idempotent matrix, if A2 = A.


Prove that if A is idempotent, then either det(A) = 1 or det(A) = 0.

(e) A square matrix A is said to be an involution matrix, if A2 = I.


Give an example of a 3 × 3 matrix that is an involution matrix.
21. Compute the adjoint of each matrix A, and find the inverse of it, if
it exists:
 
  1 2 −1
1 2
(a) A = , (b) A =  2 1 4 ,
−3 4
1 5 −8
 
1 1 0
(c) A =  1 0 1  .
0 1 1

22. Show that A(Adj A) = (Adj A)A = det(A)I3 , if


 
2 1 3
A =  −1 2 0 .
3 −2 1

23. Find the inverse and determinant of the adjoint matrix of each of the
following matrices:
     
4 1 5 3 4 −2 1 2 4
A =  5 6 3 , B =  2 5 4 , C =  1 4 0 .
5 4 4 7 −3 4 3 1 1

24. Find the inverse and determinant of the adjoint matrix of each of the
following matrices:
     
3 2 5 5 3 −2 1 2 3
A =  2 5 4 , B =  3 5 6 , C =  4 5 6 .
5 4 6 −2 6 5 7 8 8

25. Find the inverse of each of the following matrices using the determi-
nant:
 
    0 4 2 −4
0 1 5 2 4 −2  6 1 4 −3 
A =  3 1 2  , B =  −4 7 5 , C =   4
.
3 1 3 
2 3 4 5 −4 4
8 4 −3 2

26. Solve each of the following homogeneous linear systems:

(a)
x1 − 2x2 + x3 = 0
x1 + x2 + 3x3 = 0
2x1 + 3x2 − 5x3 = 0.

(b)
x1 − 5x2 + 3x3 = 0
2x1 + 3x2 + 2x3 = 0
x1 − 2x2 − 4x3 = 0.
(c)
3x1 + 4x2 − 2x3 = 0
2x1 − 5x2 − 4x3 = 0
3x1 − 2x2 + 3x3 = 0.
(d)
x1 + x2 + 3x3 − 2x4 = 0
x1 + 2x2 + 5x3 + x4 = 0
x1 − 3x2 + x3 + 2x4 = 0.

27. Find value(s) of α such that each of the following homogeneous linear
systems has a nontrivial solution:

(a)
2x1 − (1 − 3α)x2 = 0
x1 + αx2 = 0.
(b)
2x1 + 2αx2 − x3 = 0
x1 − 2x2 + x3 = 0
αx1 + 2x2 − 3x3 = 0.
(c)
x1 + 2x2 + 4x3 = 0
3x1 + 7x2 + αx3 = 0
3x1 + 3x2 + 15x3 = 0.
(d)
x1 + x2 + 2x3 − 3x4 = 0
x1 + 2x2 + x3 − 2x4 = 0
3x1 + x2 + αx3 + 3x4 = 0
3x1 + x2 + αx3 + 3x4 = 0
2x1 + 3x2 + x3 + αx4 = 0.

28. Using the matrices in Problem 15, solve the following systems using
the matrix inversion method:

(a) Ax = [1, 1, −3]T , (b) Bx = [2, 1, 3]T , (c) Cx = [1, 0, 1]T .

29. Solve the following systems using the matrix inversion method:

(a)
x1 + 3x2 − x3 = 4
5x1 − 2x2 − x3 = −2
2x1 + 2x2 + x3 = 9.
(b)
x1 + x2 + 3x3 = 2
5x1 + 3x2 + x3 = 3
2x1 + 3x2 + x3 = −1.
(c)
4x1 + x2 − 3x3 = −1
3x1 + 2x2 − 6x3 = −2
x1 − 5x2 + 3x3 = −3.
(d)
7x1 + 11x2 − 15x3 = 21
3x1 + 22x2 − 18x3 = 12
2x1 − 13x2 + 9x3 = 16.

30. Solve the following systems using the matrix inversion method:

(a)
3x1 − 2x2 − 4x3 = 7
5x1 − 2x2 − 3x3 = 8
7x1 + 4x2 + 2x3 = 9.
(b)
−3x1 + 4x2 + 3x3 = 11
5x1 + 3x2 + x3 = 12
x1 + x2 + 5x3 = 10.

(c)
x1 + 4x2 − 8x3 = 7
2x1 + 7x2 − 5x3 = −5
3x1 − 6x2 + 6x3 = 4.
(d)
17x1 + 18x2 − 19x3 = 10
43x1 + 22x2 − 14x3 = 11
25x1 − 33x2 + 21x3 = 12.

31. Solve the following systems using the matrix inversion method:

(a)
2x1 + 3x2 − 4x3 + 4x4 = 11
x1 + 3x2 − 4x3 + 2x4 = 12
4x1 + 3x2 + 2x3 + 3x4 = 14
3x1 − 4x2 + 5x3 + 6x4 = 15.
(b)
7x1 + 13x2 + 12x3 + 9x4 = 21
3x1 + 23x2 − 5x3 + 2x4 = 10
4x1 − 7x2 + 22x3 + 3x4 = 11
3x1 − 4x2 + 25x3 + 16x4 = 10.
(c)
12x1 + 6x2 + 5x3 − 2x4 = 21
11x1 + 13x2 + 7x3 + 2x4 = 22
14x1 + 9x2 + 2x3 − 6x4 = 23
7x1 − 24x2 − 7x3 + 8x4 = 24.
(d)
15x1 − 26x2 + 15x3 − 11x4 = 17
14x1 + 15x2 + 7x3 + 7x4 = 18
17x1 + 14x2 − 22x3 − 16x4 = 19
21x1 − 12x2 − 7x3 + 8x4 = 20.

32. In each case, factor the matrix as a product of elementary matrices:


     
1 1 3 2 1 1
(a) , (b) , (c) ,
3 1 1 2 −2 4
     
1 0 1 1 1 2 1 −3 5
(d)  0 2 1  , (e)  0 1 2  , (f )  −2 2 −4  .
2 2 3 1 2 3 4 7 9

33. Solve Problem 30 using Cramer’s rule.

34. Solve the following systems using Cramer’s rule:

(a)
3x1 + 4x2 + 5x3 = 1
3x1 + 2x2 + x3 = 2
4x1 + 3x2 + 5x3 = 3.
(b)
x1 − 4x2 + 2x3 = 4
−4x1 + 5x2 + 6x3 = 0
7x1 − 3x2 + 5x3 = 4.
(c)
6x1 + 7x2 + 8x3 = 1
−5x1 + 3x2 + 2x3 = 1
x1 + 2x2 + 3x3 = 1.
(d)
x1 + 3x2 − 4x3 + 5x4 = 2
6x1 − x2 + 6x3 + 3x4 = −3
2x1 + x2 + 3x3 + 2x4 = 4
x1 + 5x2 + 6x3 + 7x4 = 2.

35. Solve the following systems using Cramer’s rule:

(a)
2x1 − 2x2 + 8x3 = 1
5x1 + 6x2 + 5x3 = 2
7x1 + 7x2 + 9x3 = 3.
(b)
3x1 − 3x2 + 12x3 = 14
−4x1 + 5x2 + 16x3 = 18
x1 − 15x2 + 24x3 = 19.

(c)
9x1 − 11x2 + 12x3 = 3
−5x1 + 3x2 + 2x3 = 4
7x1 − 12x2 + 13x3 = 5.
(d)
11x1 + 3x2 − 13x3 + 15x4 = 22
26x1 − 5x2 + 6x3 + 13x4 = 23
22x1 + 6x2 + 13x3 + 12x4 = 24
17x1 − 25x2 + 16x3 + 27x4 = 25.

36. Use the simple Gaussian elimination method to show that the fol-
lowing system does not have a solution:

3x1 + x2 = 1.5
2x1 − x2 − x3 = 2
4x1 + 3x2 + x3 = 0.

37. Solve Problem 34 using the simple Gaussian elimination method.

38. Solve the following systems using the simple Gaussian elimination
method:

(a)
x1 − x2 = −2
−x1 + 2x2 − x3 = 5
4x1 − x2 + 4x3 = 1.
(b)
3x1 + x2 − x3 = 5
5x1 − 3x2 + 2x3 = 7
2x1 − x2 + x3 = 3.
(c)
3x1 + x2 + x3 = 2
2x1 + 2x2 + 4x3 = 3
4x1 + 9x2 + 16x3 = 1.

(d)
2x1 + x2 + x3 − x4 = 9
x1 + 9x2 + 8x3 + 4x4 = 11
−x1 + 3x2 + 5x3 + 2x4 = 10
5x1 + x2 + x4 = 12.

39. Solve the following systems using the simple Gaussian elimination
method:

(a)
2x1 + 5x2 − 4x3 = 3
2x1 + 2x2 − x3 = 1
3x1 + 2x2 − 3x3 = −5.
(b)
2x2 − x3 = 1
3x1 − x2 + 2x3 = 4
x1 + 3x2 − 5x3 = 1.
(c)
x1 + 2x2 = 3
−x1 − 2x3 = −5
−3x1 − 5x2 + x3 = −4.
(d)
3x1 + 2x2 + 4x3 − x4 = 2
x1 + 4x2 + 5x3 + x4 = 1
4x1 + 5x2 + 4x3 + 3x4 = 5
2x1 + 3x2 + 2x3 + 4x4 = 6.

40. For what values of a and b does the following linear system have no
solution or infinitely many solutions:

(a)
2x1 + x2 + x3 = 2
−2x1 + x2 + 3x3 = a
2x1 − x3 = b.

(b)
2x1 + 3x2 − x3 = 1
x1 − x2 + 3x3 = a
3x1 + 7x2 − 5x3 = b.
(c)
2x1 − x2 + 3x3 = 3
3x1 + x2 − 5x3 = a
−5x1 − 5x2 + 21x3 = b.
(d)

2x1 − x2 + 3x3 = 5
4x1 + 2x2 + bx3 = 6
−2x1 + ax2 + 3x3 = 4.

41. Find the value(s) of α so that each of the following linear systems
has a nontrivial solution:

(a)
2x1 + 2x2 + 3x3 = 1
3x1 + αx2 + 5x3 = 3
x1 + 7x2 + 3x3 = 2.
(b)
x1 + 2x2 + x3 = 2
x1 + 3x2 + 6x3 = 5
2x1 + 3x2 + αx3 = 6.
(c)
αx1 + x2 + x3 = 7
x1 + x2 − x3 = 2
x1 + x2 + αx3 = 1.
(d)
2x1 + αx2 + 3x3 = 9
3x1 − 4x2 − 5x3 = 11
4x1 + 5x2 + αx3 = 12.

42. Find the inverse of each of the following matrices by using the simple
Gauss elimination method:
     
3 3 3 5 3 2 1 2 3
A =  0 2 2 , B =  3 2 2 , C =  2 5 2 .
2 4 5 2 6 5 3 4 3

43. Find the inverse of each of the following matrices by using the simple
Gauss elimination method:
     
3 2 3 1 −3 2 5 2 3
A =  4 2 2 , B =  3 2 6 , C =  2 5 5 .
2 4 3 2 −6 5 3 2 4

44. Determine the rank of each of the following matrices:


     
3 1 −1 4 1 6 17 46 7
A=  2 0 4  , B =  −3 6 4  , C =  20 49 8  .
1 −5 1 5 0 9 23 52 9

45. Determine the rank of each matrix:


 
   1  2 3 4
2 −1 0 0.1 0.2 0.3  2 4 6 8 
A =  2 −1 1  , B =  0.4 0.5 0.6  , C = 
 3
.
5 7 9 
1 1 −1 0.7 0.8 0.91
4 6 8 10

46. Let A be an m × n matrix and B be an n × p matrix. Show that the


rank of AB is less than or equal to the rank of A.

47. Solve Problem 38 using Gaussian elimination with partial pivoting.

48. Solve the following linear systems using Gaussian elimination with
partial and without pivoting:

(a)
1.001x1 + 1.5x2 = 0
2x1 + 3x2 = 1.

(b)
x1 + 1.001x2 = 2.001
x1 + x2 = 2.
(c)
6.122x1 + 1500.5x2 = 1506.622
2000x1 + 3x2 = 2003.
49. The elements of matrix A, the Hilbert matrix, are defined by
aij = 1/(i + j − 1), for i, j = 1, 2, . . . , n.
Find the solution of the system Ax = b for n = 4 and b = [1, 2, 3, 4]T
using Gaussian elimination by partial pivoting.
50. Solve the following systems using the Gauss–Jordan method:

(a)
x1 + 4x2 + x3 = 1
2x1 + 4x2 + x3 = 9
3x1 + 5x2 − 2x3 = 11.
(b)
x1 + x 2 + x3 = 1
2x1 − x2 + 3x3 = 4
3x1 + 2x2 − 2x3 = −2.
(c)
2x1 + 3x2 + 6x3 + x4 = 2
x1 + x2 − 2x3 + 4x4 = 1
3x1 + 5x2 − 2x3 + 2x4 = 11
2x1 + 2x2 + 2x3 − 3x4 = 2.
51. The following sets of linear equations have a common coefficients ma-
trix but different right-side terms:

(a)
2x1 + 3x2 + 5x3 = 0
3x1 + x2 − 2x3 = −2
x1 + 3x2 + 4x3 = −3.

(b)
2x1 + 3x2 + 5x3 = 1
3x1 + x2 − 2x3 = 2
x1 + 3x2 + 4x3 = 4.
(c)
2x1 + 3x2 + 5x3 = −5
3x1 + x2 − 2x3 = 6
x1 + 3x2 + 4x3 = −1.
The coefficients and the three sets of right-side terms may be combined into an augmented matrix of the form
$$\left[\begin{array}{ccc|ccc} 2 & 3 & 5 & 0 & 1 & -5 \\ 3 & 1 & -2 & -2 & 2 & 6 \\ 1 & 3 & 4 & -3 & 4 & -1 \end{array}\right].$$
If we apply the Gauss–Jordan method to this augmented matrix form
and reduce the first three columns to the unity matrix form, the solu-
tion for the three problems are automatically obtained in the fourth,
fifth, and sixth columns when elimination is completed. Calculate
the solution in this way.
52. Calculate the inverse of each matrix using the Gauss–Jordan method:
 
    5 −2 0 0
3 −9 5 1 4 5  −2 5 −2 0 
(a)  0 5 1  , (b)  2 1 2  , (c)   0 −2
.
5 −2 
−1 6 3 8 1 1
0 0 −2 5
53. Find the inverse of the Hilbert matrix of size 4 × 4 using the Gauss–
Jordan method. Then solve the linear system Ax = [1, 2, 3, 4]T .
54. Find the LU decomposition of each matrix A using Doolittle’s method
and then solve the systems:
(a)    
2 −1 1 4
A=  −3 4 −1 ,  b=  5 .
1 −1 1 6

(b)   
7 6 5 2
A =  5 4 3 , b =  1 .
3 7 6 2
(c)    
2 2 2 0
A =  1 2 1 , b =  −4  .
3 3 4 1
(d)    
2 4 −6 −4
A= 1 5 3 , b =  10  .
1 3 2 5
(e)   
1 −1 0 2
A =  2 −1 1 , b =  4 .
2 −2 −1 3
(f )    
1 5 3 4
A =  2 4 6 , b =  11  .
1 3 2 5
55. Find the LU decomposition of each matrix A using Doolittle’s method,
and then solve the systems:

(a)
  
3 −2 1 1 3
 −3 7 4 −3   2 
A=
 2 −5 3
, b=
 1 .

4 
7 −3 2 4 2
(b)
   
2 −4 5 3 6
 3 5 −4 3   5 
A=
 1
, b=
 2 .

6 2 6 
7 2 5 1 4

(c)
   
2 2 3 −2 10
 10 2 13 11   14 
A=
 2
, b=
 11  .

5 4 6 
1 −4 −2 7 9

(d)
   
5 12 4 −11 44
 21 15 13 23   33 
A=
 31
, b=
 55  .

33 12 22 
−17 15 14 11 22

(e)
   
1 −1 10 8 −2
 12 −17 11 22   7 
A= , b=
 6 .

 22 31 13 −1 
8 24 13 9 5

(f )
 
  41
41 25 23 −18  1 
A =  2 13 −16 12  ,  15  .
b= 
11 13 9 7
13

56. Find the value(s) of α for which each of the following matrices A is singular, using Doolittle's method:

 
1 −1 2
(a) A =  −1 3 −1  .
α −2 3
 
1 5 7
(b) A =  4 4 α .
−2 α 9
 
2 −4 α
(c) A= 2 4 3 .
4 −2 5
 
2 α 1−α
(d) A= 2 5 −2  .
2 5 4
 
1 −1 3
(e) A= 3 2 3 .
4 α−2 7
 
1 5 α
(f ) A= 1 4 α − 2 .
1 −2 8

57. Find the determinant of each of the following matrices using LU


decomposition by Doolittle’s method:
   
2 3 −1 1 −2 2
(a) A =  1 2 1  , (b) A =  2 1 1 ,
2 1 −6 1 0 1
   
2 4 1 2 4 −6
(c) A =  3 3 2 ,
 (d) A =  1 5 3 ,
4 1 4 1 3 2
   
1 −1 0 1 5 3
(e) A =  2 −1 1  , (f ) A =  1 2 3 .
−2 2 1 1 3 2

58. Use the smallest positive integer to find the unique solution of each of
the linear systems of Problem 56 using LU decomposition by Doolit-
tle’s method:

(a) Ax = [2, 3, 2]T .


(b) Ax = [5, −6, 2]T .
(c) Ax = [11, 13, 10]T .
(d) Ax = [−8, 11, 8]T .
(e) Ax = [32, 23, 12]T .
(f ) Ax = [−11, 43, 22]T .

59. Find the LDV factorization of each of the following matrices:

(a)
   
3 4 3 4 −2 3
A =  2 3 3 , B= 5 2 −3  .
1 3 5 4 3 6
(b)
   
2 5 4 3 2 −6
A =  2 1 6 , B =  2 2 −5  .
3 2 7 3 4 7
(c)
   
1 −5 4 4 7 −6
A= 2 3 −4  , B= 5 5 −5  .
3 2 6 6 −4 9
(d)
 
  2 3 4 5
3 −1 4  3 1 2 4 
A=  2 2 −1 , B=
 3
.
1 1 1 
3 2 2
4 3 1 2

60. Find the LDLT factorization of each of the following matrices:

(a)  
2 3 4
A =  3 5 2 .
4 2 6
(b)  
3 −2 4
A =  −2 2 1 .
4 1 3
(c)  
2 1 −1
A= 1 3 2 .
−1 2 2
(d)  
1 −2 3 4
 −2 3 4 5 
A= .
 3 4 5 −6 
4 5 −6 7

61. Solve Problem 54 by LU decomposition using Crout’s method.

62. Find the determinant of each of the following matrices using LU


decomposition by Crout’s method:
   
2 2 −1 2 −1 1
(a) A =  1 2 1  , (b) A =  1 2 2 ,
2 1 −4 2 0 2
   
4 4 1 2 4 5
(c) A =  5 4 2  , (d) A =  3 5 3 ,
1 4 4 4 3 2
   
1 −1 2 1 5 3
(e) A =  2 −1 1  , (f ) A =  1 2 3 .
−2 2 4 1 3 4

63. Solve the following systems by LU decomposition using the Cholesky


method:

(a)
   
1 −1 1 2
A =  −1 5 −1  , b =  2 .
1 −1 10 2

(b)
   
10 2 1 7
A =  2 10 3  , b =  −4  .
1 3 10 3

(c)
   
4 2 3 1
A =  2 17 1  , b =  2 .
3 1 5 5

(d)
  
3 4 −6 0 4
 4 5 3 1   5 
A=
 −6
, b=
 2 .

3 3 1 
0 1 1 3 3

64. Solve the following systems by LU decomposition using the Cholesky


method:

(a)
   
5 −1 1 5
A =  −3 5 −2  , b =  7 .
2 −1 7 9

(b)
   
6 2 −3 5
A =  3 12 −4  , b =  2 .
6 3 13 4

(c)    
5 2 −5 3
A= 2 4 4 , b =  11  .
−3 −2 7 14
(d)   
1 4 −6 0 12
 2 2 3 3   13 
A=
 −3
, b=
 14  .

6 7 1 
0 2 −3 5 15

65. Solve the following tridiagonal systems using LU decomposition:

(a)    
3 −1 0 1
A =  −1 3 −1  , b =  1 .
0 −1 3 1
(b)   

2 3 0 0 6
 3 2 3 0   7 
A=
 0
, b=
 5 .

3 2 3 
0 0 3 2 3
(c)   
4 −1 0 0 1
 −1 4 −1 0   1 
A=
 0 −1
, b=
 1 .

4 −1 
0 0 −1 4 1
(d)   
2 3 0 0 1
 3 5 4 0   2 
A=
 0
, b=
 3 .

4 6 3 
0 0 3 4 4

66. Solve the following tridiagonal systems using LU decomposition:



(a)    
4 −2 0 5
A =  −2 5 −2  , b =  6 .
0 −2 6 7
(b)    
8 1 0 0 2
 1 8 1 0   2 
A=
 0
, b=
 2 .

1 8 1 
0 0 1 8 2
(c)   
5 −3 0 0 7
 −3 6 −2 0   −5 
A=
 0 −2
, b=
 4 .

7 −5 
0 0 −5 8 2
(d)    
2 −4 0 0 11
 −4 5 7 0   12 
A=
 0
, b=
 13  .

7 6 2 
0 0 2 8 14
67. Find kxk1 , kxk2 , and kxk∞ for the following vectors:

(a)
[2, −1, −6, 3]T .
(b)
[sin k, cos k, 3k ]T , for a fixed integer k.
68. Find k.k1 , k.k∞ and k.ke for the following matrices:
   
3 1 −1 4 1 6
A= 2 0 4  , B =  −3 6 4  ,
1 −5 1 5 0 9
 
  3 11 −5 2
17 46 7  6 8 −11 6 
C=  20 49 8  , D =   −4 −8
.
10 14 
23 52 9
13 14 −12 9

69. Consider the following matrices:


   
−11 7 −8 6 2 7
A= 5 9 6  , B =  −12 10 8  ,
6 3 7 3 −15 14
 
  2 1 −1 1
5 −6 4  1 3 5 2 
C =  −7 8 5 , D =   −2 −3
.
4 5 
3 −9 12
3 4 −2 4
Find k.k1 and k.k∞ for (a) A3 , (b) A2 + B 2 + C 2 + D2 , (c) BC,
(d) C 2 + D2 .
70. The n × n Hilbert matrix H (n) is defined by
$$H^{(n)}_{ij} = \frac{1}{i + j - 1}, \qquad 1 \le i, j \le n.$$
Find the l∞ -norm of the 10 × 10 Hilbert matrix.
71. Compute the condition numbers of the following matrices relative to k.k∞ :
$$(a)\ \begin{bmatrix} \frac{1}{3} & \frac{1}{2} & \frac{1}{5} \\ \frac{1}{2} & \frac{1}{5} & \frac{1}{3} \\ \frac{1}{5} & \frac{1}{3} & \frac{1}{2} \end{bmatrix}, \qquad
(b)\ \begin{bmatrix} 0.03 & 0.01 & -0.02 \\ 0.15 & 0.51 & -0.11 \\ 1.11 & 2.22 & 3.33 \end{bmatrix}, \qquad
(c)\ \begin{bmatrix} 1.11 & 1.98 & 2.01 \\ 1.01 & 1.05 & 2.05 \\ 0.85 & 0.45 & 1.25 \end{bmatrix}.$$
72. The following linear systems have x as the exact solution and x∗ is an approximate solution. Compute kx − x∗ k∞ and K(A) krk∞ /kAk∞ , where r = b − Ax∗ is the residual vector:

(a)
0.89x1 + 0.53x2 = 0.36
0.47x1 + 0.28x2 = 0.19

x = [1, −1]T
x∗ = [0.702, −0.500]T
(b)
0.986x1 + 0.579x2 = 0.235
0.409x1 + 0.237x2 = 0.107

x = [2, −3]T
x∗ = [2.110, −3.170]T
(c)
1.003x1 + 58.090x2 = 68.12
5.550x1 + 321.8x2 = 377.3

x = [10, 1]T
x∗ = [−10, 1]T

73. Discuss the ill-conditioning (stability) of the linear system

1.01x1 + 0.99x2 = 2
0.99x1 + 1.01x2 = 2.

If x∗ = [2, 0]T is an approximate solution of the system, then find the


residual vector r and estimate the relative error.

74. Show that if B is singular, then

1 kA − Bk
≤ .
K(A) kAk

75. Consider the following matrices:


   
0.06 0.01 0.02 0.1 0.2 0.12
A=  0.13 0.05 0.11  , B =  0.1 0.4 0.2  .
1.01 2.02 3.03 0.2 0.05 0.1

Using Problem 74, compute the approximation of the condition num-


ber of the matrix A relative to k.k∞ .

76. Let A and B be nonsingular n × n matrices. Show that

(a)
K(A) ≥ 1 and K(B) ≥ 1.
(b)
K(AB) ≤ K(A)K(B).

77. The exact solution of the linear system


x1 + x2 = 1
x1 + 1.01x2 = 2

is x = [−99, 100]T . Change the coefficient matrix slightly to


 
1 1
δA = ,
1 0.99
and consider the linear system
x 1 + x2 = 1
x1 + 0.99x2 = 2.
Compute the changed solution δx of the system. Is the matrix A
ill-conditioned?
78. Using Problem 77, compute the relative error and the relative resid-
ual.
79. The exact solution of the linear system
x1 + 3x2 = 4
1.0001x1 + 3x2 = 4.0001

is x = [1, 1]T . Change the right-hand vector b slightly to δb =


[4.0001, 4.0003]T , and consider the linear system
x1 + 3x2 = 4.0001
1.0001x1 + 3x2 = 4.0003.
Compute the changed solution δx of the system. Is the matrix A
ill-conditioned?

80. If kAk < 1, then show that the matrix (I − A) is nonsingular and
1
k(I − A)−1 k ≤ .
1 − kAk

81. The exact solution of the linear system


x1 + x2 = 3
x1 + 1.0005x2 = 3.0010

is x = [1, 2]T . Change the coefficient matrix and the right-hand


vector b slightly to
   
1 1 2.99
δA = and δb = ,
1 1.001 3.01
and consider the linear system
x1 + x2 = 2.99
x1 + 1.001x2 = 3.01
Compute the changed solution δx of the system. Is the matrix A
ill-conditioned?
82. Find the condition number of the following matrix:
$$A_n = \begin{pmatrix} 1 & 1 \\ 1 & 1 - \dfrac{1}{n} \end{pmatrix}.$$
Solve the linear system A4 x = [2, 2]T and compute the relative resid-
ual.
83. Determine equations of the polynomials of degree two whose graphs
pass through the given points.

(a) (1, 2), (2, 2), (3, 4).


(b) (1, 14), (2, 22), (3, 32).
(c) (1, 5), (2, 7), (3, 9).
(d) (−1, −1), (0, 1), (1, −3).
(e) (1, 8), (3, 26), (5, 60).

84. Find an equation of the polynomial of degree three whose graph


passes through the points (1, −3), (2, −1), (3, 9), (4, 33).

85. Determine the currents through the various branches of the electrical
network in Figure 1.8:

(a) When battery C is 9 volts.


(b) When battery C is 23 volts.

Figure 1.8: Electrical circuit.

Note how the current through the branch AB is reversed in (b). What
would the voltage of C have to be for no current to pass through AB?

86. Construct a mathematical model that describes the traffic flow in


the road network of Figure 1.9. All streets are one-way streets in the
directions indicated. The units are in vehicles per hour. Give two
distinct possible flows of traffic. What is the minimum possible flow
that can be expected along branch AB?

Figure 1.9: Traffic flow.



87. Figure 1.10 represents the traffic entering and leaving a “roundabout”
road junction. Such junctions are very common in Europe. Construct
a mathematical model that describes the flow of traffic along the vari-
ous branches. What is the minimum flow theoretically possible along
the branch BC? Is this flow ever likely to be realized in practice?

Figure 1.10: Traffic flow.



88. Find the temperatures at x1 , x2 , x3 , and x4 of the triangular metal


plate shown in Figure 1.11, given that the temperature of each inte-
rior point is the average of its four neighboring points.

Figure 1.11: Heat Conduction.



89. Find the temperatures at x1 , x2 , and x3 of the triangular metal plate


shown in Figure 1.12, given that the temperature of each interior
point is the average of its four neighboring points.

Figure 1.12: Heat conduction.

90. It takes three different ingredients, A, B, and C, to produce a cer-


tain chemical substance. A, B, and C have to be dissolved in water
separately before they interact to form the chemical. The solution
containing A at 2.2g/cm3 combined with the solution containing B
at 2.5g/cm3 , combined with the solution containing C at 4.6g/cm3 ,
makes 18.25g of the chemical. If the proportions for A, B, C in
these solutions are changed to 2.4, 3.5, and 5.8g/cm3 , respectively
(while the volumes remain the same), then 21.26g of the chemi-
cal is produced. Finally, if the proportions are changed to 1.7, 2.1,
and 3.9g/cm3 , respectively, then 15.32g of the chemical is produced.
What are the volumes in cubic centimeters of the solutions containing
A, B, and C?

91. Find a balanced chemical equation for each reaction:

(a) FeS2 + O2 −→ Fe2O3 + SO2 .

(b) CO2 + H2O −→ C6H12O6 + O2 (This reaction takes place when a
green plant converts carbon dioxide and water to glucose and oxygen
during photosynthesis.)

(c) C4H10 + O2 −→ CO2 + H2O (This reaction occurs when butane,
C4H10 , burns in the presence of oxygen to form carbon dioxide and
water.)

(d) C5H11OH + O2 −→ H2O + CO2 (This reaction represents the
combustion of amyl alcohol.)

92. Find a balanced chemical equation for each reaction:

(a) C7H6O2 + O2 −→ H2O + CO2 .
(b) HClO4 + P4O10 −→ H3PO4 + Cl2O7 .
(c) Na2CO3 + C + N2 −→ NaCN + CO.
(d) C2H2Cl4 + Ca(OH)2 −→ C2HCl3 + CaCl2 + H2O.

93. A manufacturing company produces three products, I, II, and III.
It uses three machines, A, B, and C, for 350, 150, and 100 hours, re-
spectively. Making one thousand items of type I requires 30, 10, and
5 hours on machines A, B, and C, respectively. Making one thousand
items of type II requires 20, 10, and 10 hours on machines A, B, and
C, respectively. Making one thousand items of type III requires
30, 30, and 5 hours on machines A, B, and C, respectively. Find the
number of items of each type of product that can be produced if the
machines are used at full capacity.

94. The average temperature of the cities of Jeddah, Makkah, and
Riyadh was 15°C during a given winter day. The temperature in
Makkah was 6°C higher than the average of the temperatures
of the other two cities. The temperature in Riyadh was 6°C lower
than the average temperature of the other two cities. What was the
temperature in each one of the cities?

95. An international business person needs, on the average, fixed amounts


of Japanese yen, French francs, and German marks during each of
his business trips. He traveled three times this year. The first time
he exchanged a total of $2,400 at the following rates: the dollar was
100 yen, 1.5 francs, and 1.2 marks. The second time he exchanged a
total of $2,350 at these rates: the dollar was 100 yen, 1.2 francs, and
1.5 marks. The third time he exchanged a total of $2,390 at these
rates: the dollar was 125 yen, 1.2 francs, and 1.2 marks. How many
yen, francs, and marks did he buy each time?

96. A father plans to distribute his estate, worth SR1,000,000, between
his four sons as follows: 3/4 of the estate is to be split equally among
the sons. For the rest, each son is to receive SR5,000 for each year
that remains until his 25th birthday. Given that the sons are all 4
years apart, how much would each receive from his father's estate?

97. A biologist has placed three strains of bacteria (denoted by I, II, and
III) in a test tube, where they will feed on three different food sources
(A, B, and C). Each day 2300 units of A, 800 units of B, and 1500
units of C are placed in the test tube, and each bacterium consumes
a certain number of units of each food per day, as shown in the given
table. How many bacteria of each strain can coexist in the test tube
and consume all the food?

    Food        Bacteria      Bacteria      Bacteria
                Strain I      Strain II     Strain III
    Food A         2              2             4
    Food B         1              2             0
    Food C         1              3             1

98. Al-karim hires three types of laborers, I, II, and III, and pays them
SR20, SR15, and SR10 per hour, respectively. If the total amount
paid is SR20,000 for a total of 300 hours of work, find the possible
number of hours put in by the three categories of workers if the
category III workers must put in the maximum amount of hours.
Chapter 2

Iterative Methods for Linear Systems

2.1 Introduction
The methods discussed in Chapter 1 for the solution of the system of linear
equations have been direct, which required a finite number of arithmetic
operations. The elimination methods for solving such systems usually
yield sufficiently accurate solutions for approximately 20 to 25 simulta-
neous equations, where most of the unknowns are present in all of the
equations. When the coefficients matrix is sparse (has many zeros), a con-
siderably large number of equations can be handled by the elimination
methods. But these methods are generally impractical when many hun-
dreds or thousands of equations must be solved simultaneously.

There are, however, several methods that can be used to solve large
numbers of simultaneous equations. These methods, called iterative meth-
ods, are methods by which an approximation to the solution of a system

of linear equations may be obtained. The iterative methods are used most
often for large, sparse systems of linear equations and they are efficient in
terms of computer storage and time requirements. Systems of this type
arise frequently in the numerical solutions of boundary value problems
and partial differential equations. Unlike the direct methods, the iterative
methods may not always yield a solution, even if the determinant of the
coefficients matrix is not zero.

The iterative methods to solve the system of linear equations

Ax = b (2.1)

start with an initial approximation x(0) to the solution x of the linear


system (2.1) and generate a sequence of vectors {x(k) }∞
k=0 that converges
to x. Most of these iterative methods involve a process that converts the
system (2.1) into an equivalent system of the form

x = Tx + c (2.2)

for some square matrix T and vector c. After the initial vector x(0) is se-
lected, the sequence of approximate solution vectors is generated by com-
puting
x(k+1) = T x(k) + c, for k = 0, 1, 2, . . . . (2.3)

The sequence is terminated when the error is sufficiently small, i.e.,

kx(k+1) − x(k) k < ε, for small positive ε. (2.4)

Among them, the most useful methods are the Jacobi method, the
Gauss–Seidel method, the Successive Over-Relaxation (SOR) method, and
the conjugate gradient method.

Before discussing these methods, it is convenient to introduce notations


for some matrices. The matrix A is written as

A = L + D + U, (2.5)

where L is strictly lower-triangular, U is strictly upper-triangular, and D
is the diagonal part of the coefficients matrix A, i.e.,

        [  0    0    0   ···   0 ]          [ 0   a12  a13  ···  a1n ]
        [ a21   0    0   ···   0 ]          [ 0    0   a23  ···  a2n ]
    L = [ a31  a32   0   ···   0 ],     U = [ 0    0    0   ···  a3n ],
        [  :    :    :         : ]          [ :    :    :         :  ]
        [ an1  an2  an3  ···   0 ]          [ 0    0    0   ···   0  ]

and

        [ a11   0    0   ···   0  ]
        [  0   a22   0   ···   0  ]
    D = [  0    0   a33  ···   0  ].
        [  :    :    :          : ]
        [  0    0    0   ···  ann ]
Then (2.1) can be written as
(L + D + U )x = b. (2.6)
Now we discuss our first iterative method to solve the linear system (2.6).
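The splitting (2.5) is easy to form in MATLAB with the built-in functions
tril, triu, and diag. The following commands are only a small illustrative
sketch (the matrix A here happens to be the coefficient matrix of Example 2.1
given later in this chapter):

>> A = [15 -1 -2 -3; -1 15 -2 -3; -1 -2 15 -3; -1 -2 -3 15];
>> D = diag(diag(A));     % diagonal part D
>> L = tril(A,-1);        % strictly lower-triangular part L
>> U = triu(A,1);         % strictly upper-triangular part U
>> norm(A - (L + D + U))  % returns 0, confirming the splitting (2.5)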

2.2 Jacobi Iterative Method


This is one of the easiest iterative methods to find the approximate solution
of the system of linear equations (2.1). To explain its procedure, consider
a system of three linear equations as follows:
a11 x1 + a12 x2 + a13 x3 = b1
a21 x1 + a22 x2 + a23 x3 = b2
a31 x1 + a32 x2 + a33 x3 = b3 .
The solution process starts by solving for the first variable x1 from the first
equation, the second variable x2 from the second equation and the third
variable x3 from the third equation, which gives
a11 x1 = b1 − a12 x2 − a13 x3
a22 x2 = b2 − a21 x1 − a23 x3
a33 x3 = b3 − a31 x1 − a32 x2

or in matrix form
Dx = b − (L + U )x.
Divide both sides of the above three equations by their diagonal elements,
a11 , a22 , and a33 , respectively, to get

    x1 = (1/a11) [ b1 − a12 x2 − a13 x3 ]
    x2 = (1/a22) [ b2 − a21 x1 − a23 x3 ]
    x3 = (1/a33) [ b3 − a31 x1 − a32 x2 ],

which can be written in the matrix form

x = D−1 [b − (L + U )x].
Let x(0) = [x1(0) , x2(0) , x3(0) ]T be an initial approximation to the exact solution x
of the linear system (2.1). Then define an iterative sequence

    x1(k+1) = (1/a11) [ b1 − a12 x2(k) − a13 x3(k) ]
    x2(k+1) = (1/a22) [ b2 − a21 x1(k) − a23 x3(k) ]                      (2.7)
    x3(k+1) = (1/a33) [ b3 − a31 x1(k) − a32 x2(k) ]

or in matrix form

x(k+1) = D−1 [b − (L + U )x(k) ], k = 0, 1, 2, . . . , (2.8)

where k is the number of iterative steps. Then the form (2.7) is called
the Jacobi formula for the system of three equations and (2.8) is called its
matrix form. For a general system of n linear equations, the Jacobi method

is defined by

    xi(k+1) = (1/aii) [ bi − Σj=1..i−1 aij xj(k) − Σj=i+1..n aij xj(k) ],           (2.9)
              i = 1, 2, . . . , n,   k = 0, 1, 2, . . . ,

provided that the diagonal elements aii ≠ 0, for each i = 1, 2, . . . , n. If
the diagonal elements equal zero, then reordering of the equations can be
performed so that no element in the diagonal position equals zero. The
matrix form of the Jacobi iterative method (2.9) can be written as

x(k+1) = c + TJ x(k) , k = 0, 1, 2, . . . (2.10)

or

    [ x1 ]         [ c1 ]   [   0    −t12  ···  −t1n ] [ x1 ]
    [ x2 ]         [ c2 ]   [ −t21     0   ···  −t2n ] [ x2 ]
    [ :  ]       = [ :  ] + [   :      :          :  ] [ :  ] ,                    (2.11)
    [ xn ](k+1)    [ cn ]   [ −tn1   −tn2  ···    0  ] [ xn ](k)
where the Jacobi iteration matrix TJ and vector c are defined as follows:

TJ = −D−1 (L + U ) and c = D−1 b, (2.12)

and their elements are defined by

    tij = aij / aii ,    i, j = 1, 2, . . . , n,  i ≠ j,
    tij = 0,             i = j,
    ci  = bi / aii ,     i = 1, 2, . . . , n.
The Jacobi iterative method is sometimes called the method of simultane-
ous iterations, because all values of xi are iterated simultaneously. That
(k+1) (k)
is, all values of xi depend only on the values of xi .

Note that the diagonal elements of the Jacobi iteration matrix TJ are
always zero. As usual with iterative methods, an initial approximation xi(0)
must be supplied. If we don't have knowledge of the exact solution, it is
conventional to start with xi(0) = 0, for all i. The iterations defined by
(2.9) are stopped when

    kx(k+1) − x(k) k < ε,                                                (2.13)

or by using the other possible stopping criterion

    kx(k+1) − x(k) k / kx(k+1) k < ε,                                    (2.14)

where ε is a preassigned small positive number. For this purpose, any
convenient norm can be used, the most common being the l∞ -norm.
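For readers who want to experiment with the matrix form (2.10)–(2.12)
directly, the short script below is only a minimal sketch and not the textbook
routine JacobiM of Program 2.1; the variable names, the tolerance eps0, and
the iteration cap of 100 are assumptions, while the data are those of
Example 2.1 below:

% Minimal matrix-form Jacobi sketch based on (2.10), (2.12), and (2.13)
A = [15 -1 -2 -3; -1 15 -2 -3; -1 -2 15 -3; -1 -2 -3 15];
b = [11; 22; 33; 44];
D = diag(diag(A)); L = tril(A,-1); U = triu(A,1);
TJ = -(D\(L + U));  c = D\b;       % Jacobi iteration matrix and vector (2.12)
x = zeros(4,1); eps0 = 1e-5;       % initial guess x(0) and tolerance
for k = 1:100
    xnew = TJ*x + c;               % one Jacobi step (2.10)
    if norm(xnew - x, inf) < eps0  % stopping test (2.13) in the l-infinity norm
        x = xnew; break;
    end
    x = xnew;
end
x'    % approximately [2.24374 2.93124 3.57830 4.18941]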

Example 2.1 Solve the following system of equations using the Jacobi it-
erative method with ε = 10−5 in the l∞ -norm:

15x1 − x2 − 2x3 − 3x4 = 11


−x1 + 15x2 − 2x3 − 3x4 = 22
−x1 − 2x2 + 15x3 − 3x4 = 33
−x1 − 2x2 − 3x3 + 15x4 = 44.

Start with the initial solution x(0) = [0, 0, 0, 0]T .

Solution. The Jacobi method for the given system is

    x1(k+1) = (1/15) [ 11 + x2(k) + 2x3(k) + 3x4(k) ]
    x2(k+1) = (1/15) [ 22 + x1(k) + 2x3(k) + 3x4(k) ]
    x3(k+1) = (1/15) [ 33 + x1(k) + 2x2(k) + 3x4(k) ]
    x4(k+1) = (1/15) [ 44 + x1(k) + 2x2(k) + 3x3(k) ],

Table 2.1: Solution of Example 2.1.


(k) (k) (k) (k)
k x1 x2 x3 x4
0 0.00000 0.00000 0.00000 0.00000
1 0.73333 1.46667 2.20000 2.93333
2 1.71111 2.39556 3.03111 3.61778
3 2.02074 2.70844 3.35704 3.97304
4 2.15611 2.84366 3.49045 4.10058
5 2.20842 2.89592 3.54300 4.15431
6 2.22966 2.91716 3.56421 4.17528
7 2.23810 2.92560 3.57266 4.18377
8 2.24148 2.92898 3.57604 4.18715
9 2.24283 2.93033 3.57739 4.18851
10 2.24338 2.93088 3.57793 4.18905
11 2.24359 2.93109 3.57815 4.18926
12 2.24368 2.93118 3.57824 4.18935
13 2.24371 2.93121 3.57827 4.18938
14 2.24373 2.93123 3.57829 4.18940
15 2.24373 2.93123 3.57829 4.18940

and starting with the initial approximation x1(0) = 0, x2(0) = 0, x3(0) = 0, x4(0) = 0,
then for k = 0, we obtain

    x1(1) = (1/15) [ 11 + x2(0) + 2x3(0) + 3x4(0) ] = 0.73333
    x2(1) = (1/15) [ 22 + x1(0) + 2x3(0) + 3x4(0) ] = 1.46667
    x3(1) = (1/15) [ 33 + x1(0) + 2x2(0) + 3x4(0) ] = 2.20000
    x4(1) = (1/15) [ 44 + x1(0) + 2x2(0) + 3x3(0) ] = 2.93333.
The first and subsequent iterations are listed in Table 2.1.

Note that the Jacobi method converged and after 15 iterations we ob-
tained the good approximation [2.24373, 2.93123, 3.57829, 4.18940]T of
the given system having the exact solution [2.24374, 2.93124, 3.57830,
4.18941]T . Ideally, the iterations should stop automatically when we ob-
tain the required accuracy using one of the stopping criteria mentioned in
(2.13) or (2.14). •

The above results can be obtained using MATLAB commands, as fol-


lows:

>> Ab = [15 -1 -2 -3 11; -1 15 -2 -3 22; -1 -2 15 -3 33; ...
         -1 -2 -3 15 44];
>> x = [0 0 0 0];
>> acc = 1e-05;
>> JacobiM(Ab, x, acc);

Example 2.2 Solve the following system of equations using the Jacobi it-
erative method:

−x1 + 15x2 − 2x3 − 3x4 = 22


15x1 − x2 − 2x3 − 3x4 = 11
−x1 − 2x2 + 15x3 − 3x4 = 33
−x1 − 2x2 − 3x3 + 15x4 = 44.

Start with the initial solution x(0) = [0, 0, 0, 0]T .

Solution. Results for this linear system are listed in Table 2.2. Note that
in this case the Jacobi method diverges rapidly. Although the given linear
system is the same as the linear system of Example 2.1, the first and second
equations are interchanged. From this example we conclude that the Jacobi
iterative method is not always convergent.

Table 2.2: Solution of Example 2.2.


(k) (k) (k) (k)
k x1 x2 x3 x4
0 0.000000 0.000000 0.000000 0.00000
1 –2.2000e+001 –1.1000e+001 2.2000e+000 2.9333e+000
2 –2.0020e+002 –3.5420e+002 –1.4667e-001 4.4000e-001
3 –5.3360e+003 –3.0150e+003 –5.8285e+001 –5.7669e+001
4 –4.4958e+004 –7.9762e+004 –7.6707e+002 –7.6646e+002
5 –1.1926e+006 –6.7054e+005 –1.3783e+004 –1.3783e+004
6 –9.9893e+006 –1.7820e+007 –1.7167e+005 –1.7167e+005
7 –2.6645e+008 –1.4898e+008 –3.0763e+006 –3.0763e+006
8 –2.2193e+009 –3.9813e+009 –3.8242e+007 –3.8242e+007
9 –5.9529e+010 –3.3099e+010 –6.8645e+008 –6.8645e+008
10 –4.9305e+011 –8.8950e+011 –8.5190e+009 –8.5190e+009

Program 2.1
MATLAB m-file for the Jacobi Iterative Method
function x=JacobiM(Ab,x,acc)        % Ab = [A b]
[n,t]=size(Ab); b=Ab(1:n,t); R=1; k=1;
d(1,1:n+1)=[0 x];                   % store the iterates row by row
while R > acc
    for i=1:n
        sum=0;
        for j=1:n
            if j ~= i
                sum = sum + Ab(i,j)*d(k,j+1);   % only the previous iterate is used
            end
        end
        x(1,i) = (1/Ab(i,i))*(b(i,1) - sum);
    end
    k=k+1; d(k,1:n+1)=[k-1 x];
    R=max(abs(d(k,2:n+1)-d(k-1,2:n+1)));        % l-infinity change, as in (2.13)
    if k > 10 & R > 100
        disp('Jacobi method diverges')
        break;
    end
end
x=d;

Procedure 2.1 (Jacobi Method)


1. Check that the coefficient matrix A is strictly diagonally dominant
(for guaranteed convergence).

2. Initialize the first approximation x(0) and the preassigned accuracy ε.

3. Compute the constant c = D−1 b, i.e., ci = bi /aii , for i = 1, 2, . . . , n.

4. Compute the Jacobi iteration matrix TJ = −D−1 (L + U ).

5. Solve for the approximate solutions x(k+1) = TJ x(k) + c, for k = 0, 1, . . ..

6. Repeat step 5 until kx(k+1) − x(k) k < ε.

2.3 Gauss–Seidel Iterative Method


This is one of the most popular and widely used iterative methods for
finding the approximate solution of the system of linear equations. This
iterative method is a modification of the Jacobi iterative method and gives
us good accuracy by using the most recently calculated values.

From the Jacobi iterative formula (2.9), it is seen that the new estimates
for solution x are computed from the old estimates and only when all
the new estimates have been determined are they then used in the right-
hand side of the equation to perform the next iteration. But the Gauss–
Seidel method is used to make use of the new estimates in the right-hand
side of the equation as soon as they become available. For example, the
Gauss–Seidel formula for the system of three equations can be defined as
an iterative sequence:

    x1(k+1) = (1/a11) [ b1 − a12 x2(k) − a13 x3(k) ]
    x2(k+1) = (1/a22) [ b2 − a21 x1(k+1) − a23 x3(k) ]                    (2.15)
    x3(k+1) = (1/a33) [ b3 − a31 x1(k+1) − a32 x2(k+1) ].

For a general system of n linear equations, the Gauss–Seidel iterative
method is defined as

    xi(k+1) = (1/aii) [ bi − Σj=1..i−1 aij xj(k+1) − Σj=i+1..n aij xj(k) ],          (2.16)
              i = 1, 2, . . . , n,   k = 0, 1, 2, . . .

and in matrix form, can be represented by

x(k+1) = (D + L)−1 [b − U x(k) ], for each k = 0, 1, 2, . . . . (2.17)

For the lower-triangular matrix (D + L) to be nonsingular, it is necessary
and sufficient that the diagonal elements aii ≠ 0, for each i = 1, 2, . . . , n.
By comparing (2.3) and (2.17), we obtain
By comparing (2.3) and (2.17), we obtain

TG = −(D + L)−1 U and c = (D + L)−1 b, (2.18)

which are called the Gauss–Seidel iteration matrix and the vector, respec-
tively.

The Gauss–Seidel iterative method is sometimes called the method of


successive iteration, because the most recent values of all xi are used in
the calculation.
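In the same spirit as the Jacobi sketch in Section 2.2, the matrix form
(2.17)–(2.18) can be tried directly in MATLAB. The lines below are only a
sketch (not the GaussSM routine of Program 2.2); eps0 and the iteration cap
are assumptions, and the data are those of Example 2.3 below:

% Minimal matrix-form Gauss-Seidel sketch based on (2.17) and (2.18)
A = [15 -1 -2 -3; -1 15 -2 -3; -1 -2 15 -3; -1 -2 -3 15];
b = [11; 22; 33; 44];
D = diag(diag(A)); L = tril(A,-1); U = triu(A,1);
TG = -((D + L)\U);  c = (D + L)\b; % Gauss-Seidel iteration matrix and vector (2.18)
x = zeros(4,1); eps0 = 1e-5;
for k = 1:100
    xnew = TG*x + c;               % one Gauss-Seidel step (2.17)
    if norm(xnew - x, inf) < eps0, x = xnew; break; end
    x = xnew;
end
x'    % approximately [2.24374 2.93124 3.57830 4.18941]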

Example 2.3 Solve the following system of equations using the Gauss–
Seidel iterative method, with ε = 10−5 in the l∞ -norm:

15x1 − x2 − 2x3 − 3x4 = 11


−x1 + 15x2 − 2x3 − 3x4 = 22
−x1 − 2x2 + 15x3 − 3x4 = 33
−x1 − 2x2 − 3x3 + 15x4 = 44.

Start with the initial solution x(0) = [0, 0, 0, 0]T .



Solution. The Gauss–Seidel method for the given system is

    x1(k+1) = (1/15) [ 11 + x2(k) + 2x3(k) + 3x4(k) ]
    x2(k+1) = (1/15) [ 22 + x1(k+1) + 2x3(k) + 3x4(k) ]
    x3(k+1) = (1/15) [ 33 + x1(k+1) + 2x2(k+1) + 3x4(k) ]
    x4(k+1) = (1/15) [ 44 + x1(k+1) + 2x2(k+1) + 3x3(k+1) ],

and starting with the initial approximation x1(0) = 0, x2(0) = 0, x3(0) = 0, x4(0) = 0,
then for k = 0, we obtain

    x1(1) = (1/15) [ 11 + x2(0) + 2x3(0) + 3x4(0) ] = 0.73333
    x2(1) = (1/15) [ 22 + x1(1) + 2x3(0) + 3x4(0) ] = 1.51556
    x3(1) = (1/15) [ 33 + x1(1) + 2x2(1) + 3x4(0) ] = 2.45096
    x4(1) = (1/15) [ 44 + x1(1) + 2x2(1) + 3x3(1) ] = 3.67449.
The first and subsequent iterations are listed in Table 2.3.

The above results can be obtained using MATLAB commands as fol-


lows:

>> Ab = [15 -1 -2 -3 11; -1 15 -2 -3 22; -1 -2 15 -3 33; ...
         -1 -2 -3 15 44];
>> x = [0 0 0 0];
>> acc = 1e-5;
>> GaussSM(Ab, x, acc);

Table 2.3: Solution of Example 2.3.


(k) (k) (k) (k)
k x1 x2 x3 x4
0 0.00000 0.00000 0.00000 0.00000
1 0.73333 1.51556 2.45096 3.67449
2 1.89606 2.65476 3.41527 4.09676
3 2.18504 2.88706 3.54996 4.17394
4 2.23392 2.92371 3.57354 4.18681
5 2.24208 2.92997 3.57749 4.18897
6 2.24346 2.93102 3.57816 4.18933
7 2.24369 2.93120 3.57827 4.18939
8 2.24373 2.93123 3.57829 4.18940
9 2.24374 2.93123 3.57830 4.18941

Note that the Gauss–Seidel method converged for the given system and re-
quired nine iterations to obtain the approximate solution [2.24374, 2.93123,
3.57830, 4.18941]T , which is equal to the exact solution [2.24374, 2.93124,
3.57830, 4.18941]T up to six significant digits, which is six iterations less
than required by the Jacobi method for the same linear system. •

Example 2.4 Solve the following system of equations using the Gauss–
Seidel iterative method, with ε = 10−5 in the l∞ -norm:

    −x1 + 15x2 − 2x3 − 3x4 = 22
    15x1 − x2 − 2x3 − 3x4 = 11
    −x1 − 2x2 + 15x3 − 3x4 = 33
    −x1 − 2x2 − 3x3 + 15x4 = 44.

Start with the initial solution x(0) = [0, 0, 0, 0]T .

Solution. Results for this linear system are listed in Table 2.4. Note that
in this case the Gauss–Seidel method diverges rapidly. Although the given
linear system is the same as the linear system of the previous Example 2.3,
the first and second equations are interchanged. From this example we
conclude that the Gauss–Seidel iterative method is not always convergent.•

Table 2.4: Solution of Example 2.4.


(k) (k) (k) (k)
k x1 x2 x3 x4
0 0.00000 0.00000 0.00000 0.0000
1 –2.2000e+001 –3.4100e+002 –4.4733e+001 –5.2947e+001
2 –4.8887e+003 –7.3093e+004 –1.0080e+004 –1.2085e+004
3 –1.0400e+006 –1.5544e+007 –2.1442e+006 –2.5707e+006
4 –2.2115e+008 –3.3053e+009 –4.5597e+008 –5.4665e+008
5 –4.7028e+010 –7.0287e+011 –9.6960e+010 –1.1624e+011
6 –1.0000e+013 –1.4946e+014 –2.0618e+013 –2.4719e+013
7 –2.1265e+015 –3.1783e+016 –4.3844e+015 –5.2564e+015
8 –4.5220e+017 –6.7585e+018 –9.3233e+017 –1.1177e+018
9 –9.6160e+019 –1.4372e+021 –1.9826e+020 –2.3769e+020

Program 2.2
MATLAB m-file for the Gauss–Seidel Iterative Method
function x=GaussSM(Ab,x,acc)        % Ab = [A b]
[n,t]=size(Ab); b=Ab(1:n,t); R=1; k=1;
d(1,1:n+1)=[0 x]; k=k+1;
while R > acc
    for i=1:n
        sum=0;
        for j=1:n
            if j <= i-1
                sum = sum + Ab(i,j)*d(k,j+1);     % newly updated values
            elseif j >= i+1
                sum = sum + Ab(i,j)*d(k-1,j+1);   % values from the previous step
            end
        end
        x(1,i) = (1/Ab(i,i))*(b(i,1) - sum);
        d(k,1)=k-1; d(k,i+1)=x(1,i);
    end
    R=max(abs(d(k,2:n+1)-d(k-1,2:n+1))); k=k+1;
    if R > 100 & k > 10
        disp('Gauss-Seidel method diverges')
        break;
    end
end
x=d;
Procedure 2.2 (Gauss–Seidel Method)
1. Check that the coefficient matrix A is strictly diagonally dominant
(for guaranteed convergence).
2. Initialize the first approximation x(0) ∈ R and the preassigned accuracy ε.

3. Compute the constant c = (D + L)−1 b.


4. Compute the Gauss–Seidel iteration matrix TG = −(D + L)−1 U .
5. Solve for the approximate solutions x(k+1) = TG x(k) + c, for k = 0, 1, . . . .

6. Repeat step 5 until kx(k+1) − x(k) k < ε.
From Example 2.1 and Example 2.3, we note that the solution by the
Gauss–Seidel method converges more quickly than the Jacobi method. In
general, we may state that if both the Jacobi method and the Gauss–
Seidel method converge, then the Gauss–Seidel method will con-
verge more quickly. This is generally the case but is not always true.
In fact, there are some linear systems for which the Jacobi method con-
verges but the Gauss–Seidel method does not, and others for which the
Gauss–Seidel method converges but the Jacobi method does not.
Example 2.5 Solve the following system of equations using the Jacobi and
Gauss–Seidel iterative methods, using ε = 10−5 in the l∞ -norm and taking
the initial solution x(0) = [0, 0, 0, 0]T :

    7x1 + 11x2 + x4 = 2
    2x1 + 5x2 + x3 + 3x4 = 2
    4x2 + 5x3 + 2x4 = 3
    x1 + 3x2 + 2x3 + 6x4 = 4.
Solution. First, we solve by the Jacobi method and for the given system,
the Jacobi formula is

    x1(k+1) = (1/7) [ 2 − 11x2(k) − x4(k) ]
    x2(k+1) = (1/5) [ 2 − 2x1(k) − x3(k) − 3x4(k) ]
    x3(k+1) = (1/5) [ 3 − 4x2(k) − 2x4(k) ]
    x4(k+1) = (1/6) [ 4 − x1(k) − 3x2(k) − 2x3(k) ],

Table 2.5: Solution by the Jacobi method.


(k) (k) (k) (k)
k x1 x2 x3 x4
0 0.00000 0.00000 0.00000 0.00000
1 0.28571 0.40000 0.60000 0.66667
2 –0.43810 –0.23429 0.01333 0.21905
3 0.62259 0.44114 0.69981 0.85238
4 –0.52928 –0.50043 –0.09387 0.10906
5 1.05652 0.56505 0.95672 1.03638
6 –0.75027 –0.83578 –0.26659 –0.11085
7 1.61492 0.81994 1.31296 1.29847
8 –1.18825 –1.28764 –0.57535 –0.45011
9 2.37345 1.26043 1.81015 1.70031
10 –1.93787 –1.93160 –1.08847 –0.96251

and starting with the initial approximation x1(0) = 0, x2(0) = 0, x3(0) = 0, x4(0) = 0,
then for k = 0, we obtain

    x1(1) = (1/7) [ 2 − 11x2(0) − x4(0) ] = 0.28571
    x2(1) = (1/5) [ 2 − 2x1(0) − x3(0) − 3x4(0) ] = 0.40000
    x3(1) = (1/5) [ 3 − 4x2(0) − 2x4(0) ] = 0.60000
    x4(1) = (1/6) [ 4 − x1(0) − 3x2(0) − 2x3(0) ] = 0.66667.

The first and subsequent iterations are listed in Table 2.5. Now we solve
the same system by the Gauss–Seidel method and for the given system, the

Gauss–Seidel formula is

    x1(k+1) = (1/7) [ 2 − 11x2(k) − x4(k) ]
    x2(k+1) = (1/5) [ 2 − 2x1(k+1) − x3(k) − 3x4(k) ]
    x3(k+1) = (1/5) [ 3 − 4x2(k+1) − 2x4(k) ]
    x4(k+1) = (1/6) [ 4 − x1(k+1) − 3x2(k+1) − 2x3(k+1) ],

and starting with the initial approximation x1(0) = 0, x2(0) = 0, x3(0) = 0, x4(0) = 0,
then for k = 0, we obtain

    x1(1) = (1/7) [ 2 − 11x2(0) − x4(0) ] = 0.28571
    x2(1) = (1/5) [ 2 − 2x1(1) − x3(0) − 3x4(0) ] = 0.28571
    x3(1) = (1/5) [ 3 − 4x2(1) − 2x4(0) ] = 0.37143
    x4(1) = (1/6) [ 4 − x1(1) − 3x2(1) − 2x3(1) ] = 0.35238.

Table 2.6: Solution by the Gauss–Seidel method.


(k) (k) (k) (k)
k x1 x2 x3 x4
0 0.00000 0.00000 0.00000 0.00000
1 0.28571 0.28571 0.37143 0.35238
2 –0.21361 0.19973 0.29927 0.50265
3 –0.09995 0.07854 0.33611 0.53202
4 0.08630 –0.02095 0.40395 0.52811
5 0.24319 –0.09493 0.46470 0.51870
6 0.36080 –0.14848 0.51130 0.51034
7 0.44613 –0.18692 0.54540 0.50397
8 0.50745 –0.21444 0.56996 0.49933
9 0.55136 –0.23413 0.58758 0.49598
10 0.58278 –0.24822 0.60018 0.49358
.. .. .. .. ..
. . . . .
25 0.66118 –0.28335 0.63163 0.48760
26 0.66132 –0.28342 0.63169 0.48759
27 0.66143 –0.28346 0.63174 0.48758
28 0.66150 –0.28350 0.63177 0.48758

The first and subsequent iterations are listed in Table 2.6. Note that the Ja-
cobi method diverged and the Gauss–Seidel method converged after 28 itera-
tions with the approximate solution [0.66150, −0.28350, 0.63177, 0.48758]T
of the given system, which has the exact solution [0.66169, −0.28358, 0.63184,
0.48756]T . •

Example 2.6 Solve the following system of equations using the Jacobi and
Gauss–Seidel iterative methods, using ε = 10−5 in the l∞ -norm and taking
the initial solution x(0) = [0, 0, 0, 0]T :
x1 + 2x2 − 2x3 = 1
x1 + x2 + x3 = 2
2x1 + 2x2 + x3 = 3
x1 + x2 + x3 + x 4 = 4.

Start with the initial solution x(0) = [0, 0, 0, 0]T .



Solution. First, we solve by the Jacobi method and for the given system,
the Jacobi formula is

    x1(k+1) = (1/1) [ 1 − 2x2(k) + 2x3(k) ]
    x2(k+1) = (1/1) [ 2 − x1(k) − x3(k) ]
    x3(k+1) = (1/1) [ 3 − 2x1(k) − 2x2(k) ]
    x4(k+1) = (1/1) [ 4 − x1(k) − x2(k) − x3(k) ],

and starting with the initial approximation x1(0) = 0, x2(0) = 0, x3(0) = 0, x4(0) = 0,
then for k = 0, we obtain

    x1(1) = (1/1) [ 1 − 2x2(0) + 2x3(0) ] = 1.0000
    x2(1) = (1/1) [ 2 − x1(0) − x3(0) ] = 2.0000
    x3(1) = (1/1) [ 3 − 2x1(0) − 2x2(0) ] = 3.0000
    x4(1) = (1/1) [ 4 − x1(0) − x2(0) − x3(0) ] = 4.0000.

The first and subsequent iterations are listed in Table 2.7. Now we solve
the same system by the Gauss–Seidel method and for the given system, the

Table 2.7: Solution by the Jacobi method.


(k) (k) (k) (k)
k x1 x2 x3 x4
0 0.0000 0.0000 0.0000 0.0000
1 1.0000 2.0000 3.0000 4.0000
2 3.0000 –2.0000 –3.0000 –2.0000
3 –1.0000 2.0000 1.0000 6.0000
4 –1.0000 2.0000 1.0000 2.0000
5 –1.0000 2.0000 1.0000 2.0000

Gauss–Seidel formula is

    x1(k+1) = (1/1) [ 1 − 2x2(k) + 2x3(k) ]
    x2(k+1) = (1/1) [ 2 − x1(k+1) − x3(k) ]
    x3(k+1) = (1/1) [ 3 − 2x1(k+1) − 2x2(k+1) ]
    x4(k+1) = (1/1) [ 4 − x1(k+1) − x2(k+1) − x3(k+1) ],

and starting with the initial approximation x1(0) = 0, x2(0) = 0, x3(0) = 0, x4(0) = 0,
then for k = 0, we obtain

    x1(1) = (1/1) [ 1 − 2x2(0) + 2x3(0) ] = 1.0000
    x2(1) = (1/1) [ 2 − x1(1) − x3(0) ] = 1.0000
    x3(1) = (1/1) [ 3 − 2x1(1) − 2x2(1) ] = −1.0000
    x4(1) = (1/1) [ 4 − x1(1) − x2(1) − x3(1) ] = 3.0000.
The first and subsequent iterations are listed in Table 2.8. Note that the

Table 2.8: Solution by the Gauss–Seidel method.


(k) (k) (k) (k)
k x1 x2 x3 x4
0 0.0000 0.0000 0.0000 0.0000
1 1.0000 1.0000 -1.0000 3.0000
2 -3.0000 6.0000 -3.0000 4.0000
3 -17.0000 22.0000 -7.0000 6.0000
4 -57.0000 66.0000 -15.0000 10.0000
5 -417.0000 450.0000 -63.0000 34.0000
7 -1025.0000 1090.0000 -127.0000 66.0000
8 -2433.0000 2562.0000 -255.0000 130.0000
9 -5633.0000 5890.0000 -511.0000 258.0000

Jacobi method converged quickly (only five iterations) but the Gauss–Seidel
method diverged for the given system. •

Example 2.7 Consider the system:

    6x1 + 2x2 = 1
    x1 + 7x2 − 2x3 = 2
    3x1 − 2x2 + 9x3 = −1.

(a) Find the matrix form of the iterative (Jacobi and Gauss–Seidel)
methods.
(b) If x(k) = [x1(k) , x2(k) , x3(k) ]T , then write the iterative forms of part (a) in
component form and find the exact solution of the given system.
(c) Find the formulas for the error e(k+1) in the (k + 1)th step.
(d) Find the second approximation of the error e(2) using part (c) if
x(0) = [0, 0, 0]T .

Solution. Since the given matrix A is

        [ 6   2   0 ]
    A = [ 1   7  −2 ],
        [ 3  −2   9 ]

we have

        [ 0   0  0 ]   [ 0  2   0 ]   [ 6  0  0 ]
    A = [ 1   0  0 ] + [ 0  0  −2 ] + [ 0  7  0 ] = L + U + D.
        [ 3  −2  0 ]   [ 0  0   0 ]   [ 0  0  9 ]

Jacobi Iterative Method

(a) Since the matrix form of the Jacobi iterative method can be written as

x(k+1) = TJ x(k) + c, k = 0, 1, 2, . . .

where
TJ = −D−1 (L + U ) and c = D−1 b

one can easily compute the Jacobi iteration matrix TJ and the vector c as
follows:
          [   0   −1/3    0  ]              [  1/6 ]
    TJ =  [ −1/7    0    2/7 ]   and   c =  [  2/7 ].
          [ −1/3   2/9    0  ]              [ −1/9 ]

Thus, the matrix form of the Jacobi iterative method is

              [   0   −1/3    0  ]        [  1/6 ]
    x(k+1) =  [ −1/7    0    2/7 ] x(k) + [  2/7 ],     k = 0, 1, 2, . . . .
              [ −1/3   2/9    0  ]        [ −1/9 ]

(b) Now by writing the above iterative matrix form in component form, we

have

    [ x1 ]   [   0   −1/3    0  ] [ x1 ]   [  1/6 ]
    [ x2 ] = [ −1/7    0    2/7 ] [ x2 ] + [  2/7 ],
    [ x3 ]   [ −1/3   2/9    0  ] [ x3 ]   [ −1/9 ]

and it is equivalent to

    x1 = −(1/3)x2 + 1/6
    x2 = −(1/7)x1 + (2/7)x3 + 2/7
    x3 = −(1/3)x1 + (2/9)x2 − 1/9.

Now solving for x1 , x2 , and x3 , we get

    [ x1 ]   [  1/12 ]
    [ x2 ] = [  1/4  ],
    [ x3 ]   [ −1/12 ]

which is the exact solution of the given system.
(c) Since the error in the (k + 1)th step is defined as

    e(k+1) = x − x(k+1) ,

we have

              [  1/12 ]   ( [   0   −1/3    0  ]        [  1/6 ] )
    e(k+1) =  [  1/4  ] − ( [ −1/7    0    2/7 ] x(k)  + [  2/7 ] ).
              [ −1/12 ]   ( [ −1/3   2/9    0  ]        [ −1/9 ] )

This can also be written as

              [  1/12 ]   [   0   −1/3    0  ] [  1/12 ]   [   0   −1/3    0  ]        [  1/6 ]
    e(k+1) =  [  1/4  ] − [ −1/7    0    2/7 ] [  1/4  ] + [ −1/7    0    2/7 ] e(k) − [  2/7 ]
              [ −1/12 ]   [ −1/3   2/9    0  ] [ −1/12 ]   [ −1/3   2/9    0  ]        [ −1/9 ]

or

              [   0   −1/3    0  ]
    e(k+1) =  [ −1/7    0    2/7 ] e(k)
              [ −1/3   2/9    0  ]

(because x(k) = x − e(k) and x = TJ x + c), which is the required error in the
(k + 1)th step.
(d) Now finding the first approximation of the error, we have to compute

the following:

             [   0   −1/3    0  ]
    e(1)  =  [ −1/7    0    2/7 ] e(0) ,
             [ −1/3   2/9    0  ]

where

    e(0) = x − x(0) .

Using x(0) = [0, 0, 0]T , we have

             [  1/12 ]   [ 0 ]   [  1/12 ]
    e(0)  =  [  1/4  ] − [ 0 ] = [  1/4  ].
             [ −1/12 ]   [ 0 ]   [ −1/12 ]

Thus,

             [   0   −1/3    0  ] [  1/12 ]   [ −1/12 ]
    e(1)  =  [ −1/7    0    2/7 ] [  1/4  ] = [ −1/28 ].
             [ −1/3   2/9    0  ] [ −1/12 ]   [  1/36 ]

Similarly, for the second approximation of the error, we have to compute
the following:

             [   0   −1/3    0  ]
    e(2)  =  [ −1/7    0    2/7 ] e(1)
             [ −1/3   2/9    0  ]

or

             [   0   −1/3    0  ] [ −1/12 ]   [ 1/84  ]
    e(2)  =  [ −1/7    0    2/7 ] [ −1/28 ] = [ 5/252 ],
             [ −1/3   2/9    0  ] [  1/36 ]   [ 5/252 ]

which is the required second approximation of the error.

Gauss–Seidel Iterative Method

(a) Now by using the Gauss–Seidel method, first we compute the Gauss–
Seidel iteration matrix TG and the vector c as follows:

1 1
   
 0 − 0 
 3 

 6 

   
 1 2   11 
TG = 
 0
 and c= .
 21 7 


 42 

   
 23 4   41 
0 −
189 63 378

Thus, the matrix form of the Gauss–Seidel iterative method is

1 1
   
 0 − 0 
 3 

 6 

   
 1 2  (k) 
 11 
x(k+1) =
 0 x + , k = 0, 1, 2.
 21 7 


 42 

   
 23 4   41 
0 −
189 63 378

(b) Writing the above iterative form in component form, we get

    [ x1 ]   [ 0   −1/3      0   ] [ x1 ]   [   1/6   ]
    [ x2 ] = [ 0    1/21    2/7  ] [ x2 ] + [  11/42  ],
    [ x3 ]   [ 0   23/189   4/63 ] [ x3 ]   [ −41/378 ]

and it is equivalent to

    x1 = −(1/3)x2 + 1/6
    x2 = (1/21)x2 + (2/7)x3 + 11/42
    x3 = (23/189)x2 + (4/63)x3 − 41/378.

Now solving for x1 , x2 , and x3 , we get

    [ x1 ]   [  1/12 ]
    [ x2 ] = [  1/4  ],
    [ x3 ]   [ −1/12 ]

which is the exact solution of the given system.

(c) The error in the (k + 1)th step can be easily computed as

              [ 0   −1/3      0   ]
    e(k+1) =  [ 0    1/21    2/7  ] e(k) .
              [ 0   23/189   4/63 ]

(d) The first and second approximations of the error can be calculated as
follows:

             [ 0   −1/3      0   ]
    e(1)  =  [ 0    1/21    2/7  ] e(0)  =  [ −1/12, −1/84, 19/756 ]T
             [ 0   23/189   4/63 ]

and

             [ 0   −1/3      0   ]
    e(2)  =  [ 0    1/21    2/7  ] e(1)  =  [ 1/252, 5/756, 1/6804 ]T ,
             [ 0   23/189   4/63 ]

which is the required second approximation of the error. •

2.4 Convergence Criteria


Since we noted that the Jacobi method and the Gauss–Seidel method do
not always converge to the solution of the given system of linear equations,
here, we need some conditions which make both methods converge. The
sufficient condition for the convergence of both methods is defined in the
following theorem.

Theorem 2.1 (Sufficient Condition for Convergence)

If a matrix A is strictly diagonally dominant, then for any choice of ini-


tial approximation x(0) ∈ R, both the Jacobi method and the Gauss–Seidel
method give the sequence {x(k) }∞
k=0 of approximations that converge to the
solution of a linear system. •

There is another sufficient condition for the convergence of both itera-


tive methods, which is defined in the following theorem.

Theorem 2.2 (Sufficient Condition for Convergence)

For any initial approximation x(0) ∈ R, the sequence {x(k) }∞k=0 of approxi-
mations defined by

    x(k+1) = T x(k) + c,    for each k ≥ 0, and c ≠ 0                    (2.19)

converges to the unique solution of x = T x + c if kT k < 1, for any natural
matrix norm, and the following error bounds hold:

    kx − x(k) k ≤ kT kk kx(0) − xk
                                                                         (2.20)
    kx − x(k) k ≤ ( kT kk / (1 − kT k) ) kx(1) − x(0) k.

Note that the smaller the value of kT k, the faster the convergence of
the iterative methods.
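The second bound in (2.20) can be used, before any iterations are carried
out, to estimate how many steps are needed for a prescribed accuracy. The
MATLAB lines below are a small sketch of that calculation; the numerical
values of kT k and kx(1) − x(0) k used here are the ones that appear in
Example 2.9 later in this section, and the variable names are assumptions:

>> normT = 0.46154;     % assumed value of ||T|| (6/13, as in Example 2.9)
>> dx1   = 0.86667;     % assumed value of ||x(1) - x(0)||
>> tol   = 1e-4;
>> k = ceil( log(tol*(1 - normT)/dx1) / log(normT) )
k =
    13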

Example 2.8 Show that for the nonhomogeneous linear system Ax = b,
with the matrix A

        [  5   0  −1 ]
    A = [ −1   3   0 ],
        [  0  −1   4 ]

the Gauss–Seidel iterative method converges faster than the Jacobi iterative
method.

Solution. Here we will show that the l∞ -norm of the Gauss–Seidel itera-
tion matrix TG is less than the l∞ -norm of the Jacobi iteration matrix TJ ,
i.e.,
kTG k < kTJ k.

The Jacobi iteration matrix TJ can be obtained from the given matrix A as

follows:

                              [ 5  0  0 ]−1 [  0   0  −1 ]   [  0    0   1/5 ]
    TJ = −D−1 (L + U ) =  −   [ 0  3  0 ]   [ −1   0   0 ] = [ 1/3   0    0  ].
                              [ 0  0  4 ]   [  0  −1   0 ]   [  0   1/4   0  ]

Then the l∞ -norm of the matrix TJ is

    kTJ k∞ = max { 1/5, 1/3, 1/4 } = 1/3 = 0.3333 < 1.

The Gauss–Seidel iteration matrix TG is defined as

                             [  5   0   0 ]−1 [ 0  0  −1 ]
    TG = −(D + L)−1 U =  −   [ −1   3   0 ]   [ 0  0   0 ],
                             [  0  −1   4 ]   [ 0  0   0 ]

and it gives

          [ 0  0  1/5  ]
    TG =  [ 0  0  1/15 ].
          [ 0  0  1/60 ]

Then the l∞ -norm of the matrix TG is

    kTG k∞ = max { 1/5, 1/15, 1/60 } = 1/5 = 0.2000 < 1,
which shows that the Gauss–Seidel method will converge faster than the
Jacobi method for the given linear system. •
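These two norms can also be checked quickly in MATLAB; the commands
below are a small verification sketch for the matrix of Example 2.8:

>> A = [5 0 -1; -1 3 0; 0 -1 4];
>> D = diag(diag(A)); L = tril(A,-1); U = triu(A,1);
>> norm(-(D\(L+U)), inf)    % ||TJ|| = 0.3333
>> norm(-((D+L)\U), inf)    % ||TG|| = 0.2000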
Note that the condition kTJ k∞ < 1 is equivalent to the condition that the
matrix A be strictly diagonally dominant.

For the Jacobi method for a general matrix A, the l∞ -norm of the Jacobi
iteration matrix is

    kTJ k∞ = max1≤i≤n Σj=1..n, j≠i |aij / aii | .

Thus, kTJ k∞ < 1 is equivalent to requiring

    Σj=1..n, j≠i |aij | < |aii |,    for each i = 1, 2, . . . , n,

i.e., the matrix A is strictly diagonally dominant.
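A quick way to test this sufficient condition in MATLAB is sketched below;
the function name isDiagDominant is only an assumed name used for
illustration, not a built-in routine:

function flag = isDiagDominant(A)
% Returns true if A is strictly diagonally dominant (row-wise), which
% guarantees convergence of the Jacobi and Gauss-Seidel methods.
d = abs(diag(A));              % |aii|
s = sum(abs(A),2) - d;         % sum of |aij| over j ~= i, for each row
flag = all(d > s);
end

For instance, saved as isDiagDominant.m, this test returns 1 for the
coefficient matrix of Example 2.1 and 0 when the first two rows are
interchanged as in Example 2.2.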


Example 2.9 Consider the following linear system of equations:
10x1 + 2x2 + x3 + x4 = 5
x1 + 12x2 + x3 + 2x4 = 9
2x1 + x2 + 13x3 + 3x4 = 1
x1 + 2x2 + x3 + 15x4 = 13.
(a) Show that both iterative methods (Jacobi and Gauss–Seidel) will
converge by using kT k∞ < 1.
(b) Find the second approximation x(2) when the initial solution is
x(0) = [0, 0, 0, 0]T .
(c) Compute the error bounds for your approximations.
(d) How many iterations are needed to get an accuracy within 10−4 ?

Solution. Since the given matrix A is

        [ 10   2   1   1 ]
    A = [  1  12   1   2 ]
        [  2   1  13   3 ]
        [  1   2   1  15 ],

from (2.5), we have

                    [ 0  0  0  0 ]   [ 10   0   0   0 ]   [ 0  2  1  1 ]
    A = L + D + U = [ 1  0  0  0 ] + [  0  12   0   0 ] + [ 0  0  1  2 ].
                    [ 2  1  0  0 ]   [  0   0  13   0 ]   [ 0  0  0  3 ]
                    [ 1  2  1  0 ]   [  0   0   0  15 ]   [ 0  0  0  0 ]

Jacobi Iterative Method

(a) Since the Jacobi iteration matrix is defined as

TJ = −D−1 (L + U ),

and computing the right-hand side, we get

         [    0    −2/10  −1/10  −1/10 ]
         [ −1/12      0   −1/12  −2/12 ]
    TJ = [ −2/13   −1/13     0   −3/13 ],
         [ −1/15   −2/15  −1/15     0  ]

then the l∞ -norm of the matrix TJ is

    kTJ k∞ = max { 4/10, 4/12, 6/13, 4/15 } = 6/13 = 0.46154 < 1.

Thus, the Jacobi method will converge for the given linear system.

(b) The Jacobi method for the given system is

    x1(k+1) = (1/10) [ 5 − 2x2(k) − x3(k) − x4(k) ]
    x2(k+1) = (1/12) [ 9 − x1(k) − x3(k) − 2x4(k) ]
    x3(k+1) = (1/13) [ 1 − 2x1(k) − x2(k) − 3x4(k) ]
    x4(k+1) = (1/15) [ 13 − x1(k) − 2x2(k) − x3(k) ].

Starting with an initial approximation x1(0) = 0, x2(0) = 0, x3(0) = 0, x4(0) = 0,
and for k = 0, 1, we obtain the first and the second approximations as
follows:

    x(1) = [0.5, 0.75, 0.07692, 0.86667]T ,
    x(2) = [0.25564, 0.62970, −0.12436, 0.72821]T .

(c) Using the error bound formula (2.20), we obtain

    kx − x(2) k ≤ ( (6/13)2 / (1 − 6/13) ) k[0.5, 0.75, 0.07692, 0.86667]T − [0, 0, 0, 0]T k

or

    kx − x(2) k ≤ (0.21302 / 0.53846)(0.86667) = 0.34286.

(d) To find the number of iterations, we use formula (2.20) as

    kx − x(k) k ≤ ( kTJ kk / (1 − kTJ k) ) kx(1) − x(0) k ≤ 10−4 .

It gives

    ( (6/13)k / (7/13) ) (0.86667) ≤ 10−4

or

    (6/13)k ≤ (7/13) × 10−4 / 0.86667,

which gives

    (0.46154)k ≤ 6.21 × 10−5 .

Taking ln on both sides, we obtain

    k ln(6/13) ≤ ln(6.21 × 10−5 )

or

    k(−0.77327) ≤ (−9.68621),

and it gives

    k ≥ 12.5263   or   k = 13,

which is the required number of iterations.

Gauss–Seidel Iterative Method

(a) Since the Gauss–Seidel iteration matrix is defined as

TG = −(D + L)−1 U,

and computing the right-hand side, we have

                             [    1/10        0        0      0   ]−1 [ 0  2  1  1 ]
                             [   −1/120      1/12      0      0   ]   [ 0  0  1  2 ]
    TG = −(D + L)−1 U =  −   [  −23/1560    −1/156    1/13    0   ]   [ 0  0  0  3 ],
                             [ −107/23400   −5/468   −1/195  1/15 ]   [ 0  0  0  0 ]

and it gives

          [ 0    −1/5       −1/10      −1/10    ]
          [ 0     1/60      −3/40      −19/120  ]
    TG =  [ 0    23/780     11/520    −317/1560 ],
          [ 0   107/11700  119/7800    136/3291 ]

then the l∞ -norm of the matrix TG is

    kTG k∞ = max {0.4, 0.25058, 0.2539, 0.0657} = 0.4 < 1.

Thus, the Gauss–Seidel method will converge for the given linear system.

(b) The Gauss–Seidel method for the given system is

    x1(k+1) = (1/10) [ 5 − 2x2(k) − x3(k) − x4(k) ]
    x2(k+1) = (1/12) [ 9 − x1(k+1) − x3(k) − 2x4(k) ]
    x3(k+1) = (1/13) [ 1 − 2x1(k+1) − x2(k+1) − 3x4(k) ]
    x4(k+1) = (1/15) [ 13 − x1(k+1) − 2x2(k+1) − x3(k+1) ].
(0) (0) (0) (0)
Starting with an initial approximation x1 = 0, x2 = 0, x3 = 0, x4 = 0,
and for k = 0, 1, we obtain the first and the second approximations as
follows:
x(1) = [0.5, 0.70833, −0.05449, 0.74252]T
x(2) = [0.28953, 0.66854, −0.07616, 0.76330]T .
(c) Using the error bound formula (2.20), we obtain

    kx − x(2) k ≤ ( (0.4)2 / (1 − 0.4) ) k[0.5, 0.70833, −0.05449, 0.74252]T − [0, 0, 0, 0]T k

or

    kx − x(2) k ≤ (0.16 / 0.6)(0.74252) = 0.19801.

(d) To find the number of iterations, we use formula (2.20) as

    kx − x(k) k ≤ ( kTG kk / (1 − kTG k) ) kx(1) − x(0) k ≤ 10−4 .

It gives

    ( (0.4)k / 0.6 ) (0.74252) ≤ 10−4

or

    (0.4)k ≤ 8.08 × 10−5 .

Taking ln on both sides, we obtain

k ln(0.4) ≤ ln(8.08 × 10−5 )

or
k(−0.91629) ≤ (−9.4235),

and it gives
k ≥ 10.28441 or k = 11,

which is the required number of iterations. •

Theorem 2.3 If A is a symmetric positive-definite matrix with positive
diagonal entries, then the Gauss–Seidel method converges to a unique so-
lution of the linear system Ax = b. •
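Symmetry and positive-definiteness are easy to verify in MATLAB before
appealing to Theorem 2.3; the commands below are a small sketch using the
coefficient matrix of Example 2.10:

>> A = [5 0 -1 0; 0 14 -1 -1; -1 -1 13 0; 0 -1 0 9];
>> isequal(A, A')      % symmetry check, returns 1
>> all(eig(A) > 0)     % positive-definiteness check, returns 1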

Example 2.10 Solve the following system of linear equations using Gauss–
Seidel iterative method, with ε = 10−5 in the l∞ -norm and taking the
initial solution x(0) = [0, 0, 0, 0]T :

5x1 − x3 = 1
14x2 − x3 − x4 = 1
−x1 − x2 + 13x3 = 4
− x2 + 9x4 = 3.

Solution. The matrix

        [  5   0  −1   0 ]
    A = [  0  14  −1  −1 ]
        [ −1  −1  13   0 ]
        [  0  −1   0   9 ]

of the given system is symmetric positive-definite with positive diagonal



entries, and the Gauss–Seidel formula for the system is

    x1(k+1) = (1/5) [ 1 + x3(k) ]
    x2(k+1) = (1/14) [ 1 + x3(k) + x4(k) ]
    x3(k+1) = (1/13) [ 4 + x1(k+1) + x2(k+1) ]
    x4(k+1) = (1/9) [ 3 + x2(k+1) ].

So starting with an initial approximation x1(0) = 0, x2(0) = 0, x3(0) = 0, x4(0) = 0,
and for k = 0, we get

    x1(1) = (1/5) [ 1 + x3(0) ] = 0.200000
    x2(1) = (1/14) [ 1 + x3(0) + x4(0) ] = 0.071429
    x3(1) = (1/13) [ 4 + x1(1) + x2(1) ] = 0.328571
    x4(1) = (1/9) [ 3 + x2(1) ] = 0.341270.
The first and subsequent iterations are listed in Table 2.9. •

Note that the Gauss–Seidel method converged very fast (only five it-
erations) and the approximate solution of the given system [0.267505,
0.120302, 0.337524, 0.346700]T is equal to the exact solution [0.267505,
0.120302, 0.337524, 0.346700]T up to six decimal places.

Table 2.9: Solution by the Gauss–Seidel method.


(k) (k) (k) (k)
k x1 x2 x3 x4
0 0.000000 0.000000 0.000000 0.000000
1 0.200000 0.071429 0.328571 0.341270
2 0.265714 0.119274 0.337307 0.346586
3 0.267461 0.120278 0.337518 0.346698
4 0.267504 0.120301 0.337524 0.346700
5 0.267505 0.120302 0.337524 0.346700

2.5 Eigenvalues and Eigenvectors


Here, we will briefly discuss the eigenvalues and eigenvectors of an n × n
matrix. We also show how they can be used to describe the solutions of
linear systems.
Definition 2.1 An n × n matrix A is said to have an eigenvalue λ if
there exists a nonzero vector x, called an eigenvector, such that
Ax = λx. (2.21)
Then the relation (2.21) represents the eigenvalue problem, and we refer
to (λ, x) as an eigenpair. •
The equivalent form of (2.21) is
(A − λI)x = 0, (2.22)
where I is an n × n identity matrix. The system of equations (2.22) has
the nontrivial solution x if, and only if, A − λI is singular or, equivalently,
det(A − λI) = |A − λI| = 0. (2.23)
The above relation (2.23) represents a polynomial equation in λ of
degree n which in principle could be used to obtain the eigenvalues of the
matrix A. This equation is called the characteristic equation of A. There
are n roots of (2.23), which we will denote by λ1 , λ2 , . . . , λn . For a given
eigenvalue λi , the corresponding eigenvector xi is not uniquely determined.
If x is an eigenvector, then so is αx, where α is any nonzero scalar.

Example 2.11 Find the eigenvalues and eigenvectors of the following ma-
trix:
        [ −6   0  0 ]
    A = [ 11  −3  0 ].
        [ −3   6  7 ]

Solution. To find the eigenvalues of the given matrix A by using (2.23),
we have

    | −6 − λ     0       0   |
    |   11    −3 − λ     0   | = 0,
    |  −3        6     7 − λ |
which gives a characteristic equation of the form
λ3 + 2λ2 − 45λ − 126 = 0.
It factorizes to
(−6 − λ)(−3 − λ)(7 − λ) = 0
and gives us the eigenvalues λ = −6, λ = −3, and λ = 7 of the given
matrix A. Note that the sum of these eigenvalues is −2, and this agrees
with the trace of A. After finding the eigenvalues of the matrix we turn to
the problem of finding eigenvectors. The eigenvectors of A corresponding
to the eigenvalues λ are the nonzero vectors x that satisfy (2.22). Equiv-
alently, the eigenvectors corresponding to λ are the nonzero vectors in the
solution space of (2.22). We call this solution space the eigenspace of A
corresponding to λ.

To find the eigenvectors of the given matrix A corresponding to each of
these eigenvalues, we substitute each of these three eigenvalues in (2.22).
When λ = −6, we have

    [  0  0   0 ] [ x1 ]   [ 0 ]
    [ 11  3   0 ] [ x2 ] = [ 0 ],
    [ −3  6  13 ] [ x3 ]   [ 0 ]

which implies that

    0x1 + 0x2 + 0x3 = 0
    11x1 + 3x2 + 0x3 = 0
    −3x1 + 6x2 + 13x3 = 0.

Solving this system, we get x1 = 39, x2 = −143, and x3 = 75. Hence, the
eigenvector x(1) corresponding to the first eigenvalue, λ1 = −6, is

    x(1) = α[39, −143, 75]T ,   where α ∈ R, α ≠ 0.

When λ = −3, we have

    [ −3  0   0 ] [ x1 ]   [ 0 ]
    [ 11  0   0 ] [ x2 ] = [ 0 ],
    [ −3  6  10 ] [ x3 ]   [ 0 ]

which implies that

    −3x1 + 0x2 + 0x3 = 0
    11x1 + 0x2 + 0x3 = 0
    −3x1 + 6x2 + 10x3 = 0,

which gives the solution x1 = 0, x2 = 5, and x3 = −3. Hence, the eigen-
vector x(2) corresponding to the second eigenvalue, λ2 = −3, is

    x(2) = α[0, 5, −3]T ,   where α ∈ R, α ≠ 0.

Finally, when λ = 7, we have

    [ −13    0   0 ] [ x1 ]   [ 0 ]
    [  11  −10   0 ] [ x2 ] = [ 0 ],
    [  −3    6   0 ] [ x3 ]   [ 0 ]

which implies that

    −13x1 + 0x2 + 0x3 = 0
    11x1 − 10x2 + 0x3 = 0
    −3x1 + 6x2 + 0x3 = 0,

which gives x1 = x2 = 0, and x3 = 1. Hence,

    x(3) = α[0, 0, 1]T ,   where α ∈ R, α ≠ 0

is the eigenvector x(3) corresponding to the third eigenvalue, λ3 = 7. •



The MATLAB command eig is the basic eigenvalue and eigenvector


routine. The command

>> D = eig(A);

returns a vector containing all the eigenvalues of the matrix A. If the


eigenvectors are also wanted, the syntax

>> [X, D] = eig(A);

will return a matrix X whose columns are eigenvectors of A corresponding


to the eigenvalues in the diagonal matrix D. To get the results of Exam-
ple 2.11, we use the MATLAB Command Window as follows:

>> A = [-6 0 0; 11 -3 0; -3 6 7];
>> P = poly(A);
>> PP = poly2sym(P);
>> [X, D] = eig(A);
>> eigenvalues = diag(D);

Definition 2.2 (Spectral Radius of a Matrix)

Let A be an n × n matrix. Then the spectral radius ρ(A) of a matrix A is


defined as
ρ(A) = max |λi |,
1≤i≤n

where λi are the eigenvalues of a matrix A. •

For example, the matrix

        [ 4  1  −3 ]
    A = [ 0  0   2 ]
        [ 0  0  −3 ]

has the characteristic equation of the form

det(A − λI) = −λ3 + λ2 + 12λ = 0,



which gives the eigenvalues λ = 4, 0, −3 of A. Hence, the spectral radius


of A is
ρ(A) = max{|4|, |0|, | − 3|} = 4.
The spectral radius of a matrix A may be found using MATLAB com-
mands as follows:

>> A = [4 1 -3; 0 0 2; 0 0 -3];
>> B = max(abs(eig(A)))
B =
     4

Example 2.12 For the matrix

    A = [ a  b ],
        [ c  d ]

if the eigenvalues of the Jacobi iteration matrix and the Gauss–Seidel iter-
ation matrix are λi and µi , respectively, then show that µmax = λ2max.

Solution. Decompose the given matrix into the following form:

    A = L + D + U = [ 0  0 ] + [ a  0 ] + [ 0  b ].
                    [ c  0 ]   [ 0  d ]   [ 0  0 ]
First, we define the Jacobi iteration matrix as
TJ = −D−1 (L + U ),
and computing the right-hand side, we get
 
1 
b

 a 0   0 −
 0 b  a 
TJ = −  = .

1  c 0

c
  
0 − 0
d d
To find the eigenvalues of the matrix TJ , we do as follows:
b

−λ −


a
det(TJ − λI) = =0
c

− −λ


d
Iterative Methods for Linear Systems 285

gives r r
cb cb
λ1 = − , λ2 =
ad ad
and r
cb
λmax = .
ad
Similarly, we can find the Gauss–Seidel iteration matrix as

TG = −(L + D)−1 U,

and computing the right-hand side, we get

    TG = [ 0    −b/a   ].
         [ 0   cb/(ad) ]

To find the eigenvalues of the matrix TG , we do as follows:

    det(TG − µI) = | −µ       −b/a     |  =  −µ ( cb/(ad) − µ )  =  0,
                   |  0    cb/(ad) − µ |

which gives

    µ1 = 0,    µ2 = cb/(ad),

and

    µmax = cb/(ad).

Thus,

    µmax = ( √(cb/(ad)) )2 = λ2max ,
which is the required result. •

The necessary and sufficient condition for the convergence of the Jacobi
iterative method and the Gauss–Seidel iterative method is defined in the
following theorem.

Theorem 2.4 (Necessary and Sufficient Condition for Convergence)

For any initial approximation x(0) ∈ R, the sequence {x(k) }∞k=0 of approxi-
mations defined by

    x(k+1) = T x(k) + c,    for each k ≥ 0, and c ≠ 0                    (2.24)
converges to the unique solution of x = T x + c, if and only if ρ(T ) < 1.

Note that the condition ρ(T ) < 1 is satisfied when kT k < 1 because
ρ(T ) ≤ kT k for any natural norm. •

No general results exist to help us choose between the Jacobi method or


the Gauss–Seidel method to solve an arbitrary linear system. However, the
following theorem is suitable for the special case.
Theorem 2.5 If aij ≤ 0, for each i ≠ j, and aii > 0, for each i =
1, 2, . . . , n, then one and only one of the following statements holds:
1. 0 ≤ ρ(TG ) < ρ(TJ ) < 1.
2. 1 < ρ(TJ ) < ρ(TG ).
3. ρ(TJ ) = ρ(TG ) = 0.
4. ρ(TJ ) = ρ(TG ) = 1. •

Example 2.13 Find the spectral radius of the Jacobi and the Gauss–Seidel
iteration matrices using each of the following matrices:

              [  2   0  −1 ]                 [  1  −1   1 ]
    (a)  A =  [ −1   3   0 ],      (b)  A =  [ −2   2  −1 ],
              [  0  −1   4 ]                 [  0   1   5 ]

              [  1   0   0 ]                 [  1   0  −1 ]
    (c)  A =  [ −1   2   0 ],      (d)  A =  [  1   1   0 ].
              [  0  −1   3 ]                 [  0   1   1 ]

Solution. (a) The Jacobi iteration matrix TJ for the given matrix A can
be obtained as

          [  0    0   1/2 ]
    TJ =  [ 1/3   0    0  ],
          [  0   1/4   0  ]

and the characteristic equation of the matrix TJ is

    det(TJ − λI) = −λ3 + 1/24 = 0.

Solving this cubic polynomial, the maximum eigenvalue (in absolute value) of TJ
is 329/949, i.e.,

    ρ(TJ ) = 329/949 = 0.3467.

Also, the Gauss–Seidel iteration matrix TG for the given matrix A is

          [ 0   0   1/2  ]
    TG =  [ 0   0   1/6  ]
          [ 0   0   1/24 ]

and has the characteristic equation of the form

    det(TG − λI) = −λ3 + (1/24)λ2 = 0.

Solving this cubic polynomial, we obtain the maximum eigenvalue of TG ,
1/24, i.e.,

    ρ(TG ) = 1/24 = 0.0417.

(b) The Jacobi iteration matrix TJ for the given matrix A is

          [ 0    1    −1  ]
    TJ =  [ 1    0    1/2 ],
          [ 0  −1/5    0  ]

with the characteristic equation of the form

    det(TJ − λI) = −λ3 + (9/10)λ + 1/5 = 0,

and it gives

    ρ(TJ ) = 1098/1051 = 1.0447.

The Gauss–Seidel iteration matrix TG is

          [ 0    1     −1   ]
    TG =  [ 0    1    −1/2  ],
          [ 0  −1/5   1/10  ]

with the characteristic equation of the form

    det(TG − λI) = −λ3 + (11/10)λ2 = 0,

and it gives

    ρ(TG ) = 11/10 = 1.1000.

Similarly, for the matrices of (c) and (d), we have

          [  0    0   0 ]             [ 0  0  0 ]
    TJ =  [ 1/2   0   0 ],      TG =  [ 0  0  0 ],
          [  0   1/3  0 ]             [ 0  0  0 ]

with

    ρ(TJ ) = 0.0000   and   ρ(TG ) = 0.0000,

and

          [  0   0  1 ]             [ 0  0   1 ]
    TJ =  [ −1   0  0 ],      TG =  [ 0  0  −1 ],
          [  0  −1  0 ]             [ 0  0   1 ]

with

    ρ(TJ ) = 1.0000   and   ρ(TG ) = 1.0000,

respectively. •

Definition 2.3 (Convergent Matrix)

An n × n matrix is called a convergent matrix if

lim (Ak )ij = 0, for each i, j = 1, 2, . . . , n.


k→∞

Example 2.14 Show that the matrix

    A = [ 1/3   0  ]
        [ 1/9  1/3 ]

is a convergent matrix.

Solution. By computing the powers of the given matrix, we obtain

    A2 = [ 1/9    0  ],    A3 = [ 1/27    0   ],    A4 = [ 1/81     0   ].
         [ 2/27  1/9 ]          [ 3/81   1/27 ]          [ 4/243   1/81 ]

Then in general, we have

    Ak = [  (1/3)k       0    ],
         [  k/3k+1    (1/3)k  ]

and it gives

    limk→∞ (1/3)k = 0    and    limk→∞ k/3k+1 = 0.

Hence, the given matrix A is convergent. •
Since the above matrix has the eigenvalue 1/3 of order two, its spectral
radius is 1/3. This shows the important relation existing between the spectral
radius of a matrix and the convergence of a matrix.
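This behavior is easy to observe numerically; the commands below are a
small sketch using the matrix of Example 2.14:

>> A = [1/3 0; 1/9 1/3];
>> max(abs(eig(A)))         % spectral radius 0.3333 < 1
>> norm(A^10), norm(A^50)   % powers of A approach the zero matrix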
Theorem 2.6 The following statements are equivalent:
1. A is a convergent matrix.
2. lim kAn k = 0, for all natural norms.
n→∞

3. ρ(A) < 1.
4. lim An x = 0, for every x. •
n→∞

Example 2.15 Show that the matrix

        [ 1  1  0  1 ]
    A = [ 1  1  1  0 ]
        [ 0  1  1  1 ]
        [ 1  0  1  1 ]

is not a convergent matrix.

Solution. First, we shall find the eigenvalues of the given matrix A by


computing the characteristic equation of the matrix as follows:
det(A − λI) = λ4 − 4λ3 + 2λ2 + 4λ − 3 = 0,
which factorizes to
(λ + 1)(λ − 3)(λ − 1)2 = 0
and gives the eigenvalues 3, 1, 1, and –1 of the given matrix A. Hence,
the spectral radius of A is
ρ(A) = max{|3|, |1|, |1|, | − 1|} = 3,
which shows that the given matrix is not convergent. •

We will discuss some very important results concerning the eigenvalue


problems. The proofs of all the results are beyond the scope of this text
and will be omitted. However, they are very easily understood and can be
used.

Theorem 2.7 If A is an n × n matrix, then


1. [ρ(AT A)]1/2 = kAk2 , and

2. ρ(A) ≤ kAk, for any natural norm k.k.

Example 2.16 Consider the matrix

        [ −2  1  2 ]
    A = [  1  0  0 ],
        [  0  1  0 ]

which gives a characteristic equation of the form

det(A − λI) = −λ3 − 2λ2 + λ + 2 = 0.

Solving this cubic equation, the eigenvalues of A are –2, –1, and 1. Thus
the spectral radius of A is

ρ(A) = max{| − 2|, | − 1|, |1|} = 2.

Also,

            [ −2  1  0 ] [ −2  1  2 ]   [  5  −2  −4 ]
    AT A =  [  1  0  1 ] [  1  0  0 ] = [ −2   2   2 ],
            [  2  0  0 ] [  0  1  0 ]   [ −4   2   4 ]

and a characteristic equation of AT A is

−λ3 + 11λ2 − 14λ + 4 = 0,

which gives the eigenvalues 0.4174, 1, and 9.5826. Therefore, the spectral
radius of AT A is 9.5826. Hence,

    kAk2 = √(ρ(AT A)) = √9.5826 ≈ 3.0956.

From this we conclude that

ρ(A) = 2 < 3.0956 ≈ kAk2 .

One can also show that


ρ(A) = 2 < 5 = kAk∞
ρ(A) = 2 < 3 = kAk1 ,
which satisfies Theorem 2.7. •

The spectral norm of a matrix A may be found using MATLAB com-


mands as follows:

>> A = [-2 1 2; 1 0 0; 0 1 0];
>> B = sqrt(max(eig(A'*A)))
B =
    3.0956

Theorem 2.8 If A is a symmetric matrix, then

    kAk2 = √(ρ(AT A)) = ρ(A).

Example 2.17 Consider a symmetric matrix

        [ 3   0  1 ]
    A = [ 0  −3  0 ],
        [ 1   0  3 ]

which has a characteristic equation of the form

    −λ3 + 3λ2 + 10λ − 24 = 0.

Solving this cubic equation, we have the eigenvalues 4, 2, and −3 of
the given matrix A. Therefore, the spectral radius of A is 4. Since A is
symmetric,

                 [ 10  0   6 ]
    AT A = A2 =  [  0  9   0 ].
                 [  6  0  10 ]

Since we know that the eigenvalues of A2 are the eigenvalues of A raised
to the power of 2, the eigenvalues of AT A are 16, 4, and 9, and its spectral
radius is ρ(AT A) = ρ(A2 ) = [ρ(A)]2 = 16. Hence,

    kAk2 = √(ρ(AT A)) = √16 = 4 = ρ(A),
which satisfies Theorem 2.8. •

Theorem 2.9 If A is a nonsingular matrix, then for any eigenvalue λ of A

    1 / kA−1 k2 ≤ |λ| ≤ kAk2 .

Note that this result is also true for any natural norm. •

Example 2.18 Consider the matrix


 
2 1
A= ,
3 2
and its inverse matrix is
 
−1 2 −1
A = .
−3 2
First, we find the eigenvalues of the matrix
 
T 13 8
A A= ,
8 5
which can be obtained by solving the characteristic equation

T
13 − λ 8
det(A A − λI) = = λ2 − 18λ + 1 = 0,
8 5−λ

which gives the eigenvalues 17.96 and 0.04. The spectral radius of AT A is
17.96. Hence, p √
kAk2 = ρ(AT A) = 17.96 ≈ 4.24.
Since

(A−1 )T (A−1 ) = [ 13 −8; −8 5 ],

its characteristic equation is

det[(A−1 )T (A−1 ) − λI] = (13 − λ)(5 − λ) − 64 = λ2 − 18λ + 1 = 0,

which gives the eigenvalues 17.94 and 0.06 of (A−1 )T (A−1 ), so its spectral radius is 17.94. Hence,

kA−1 k2 = √(ρ((A−1 )T (A−1 ))) = √17.94 ≈ 4.24.

Note that the eigenvalues of A are 3.73 and 0.27; therefore, its spectral radius is 3.73. Hence,

1/4.24 ≈ 0.24 ≤ |0.27| and |3.73| ≤ 4.24,

which satisfies Theorem 2.9. •
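
The bounds of Theorem 2.9 for this example can also be verified numerically; the commands below are one way to do so (a sketch using built-in MATLAB functions only):

>> A = [2 1; 3 2];
>> lam = abs(eig(A));          % magnitudes of the eigenvalues of A
>> lower = 1/norm(inv(A), 2);  % 1/||inv(A)||_2
>> upper = norm(A, 2);         % ||A||_2
>> [lower min(lam) max(lam) upper]   % lower <= |lambda| <= upper for every eigenvalue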

2.6 Successive Over-Relaxation Method


We have seen that the Gauss–Seidel method uses updated information
immediately and converges more quickly than the Jacobi method, but in
some large systems of equations the Gauss–Seidel method converges at a
very slow rate. Many techniques have been developed in order to improve
the convergence of the Gauss–Seidel method. Perhaps one of the simplest
and most widely used methods is Successive Over-Relaxation (SOR). A
useful modification to the Gauss–Seidel method is defined by the iterative
scheme:

x_i^(k+1) = (1 − ω) x_i^(k) + (ω/a_ii) [ b_i − Σ_{j=1}^{i−1} a_ij x_j^(k+1) − Σ_{j=i+1}^{n} a_ij x_j^(k) ],   (2.25)

for i = 1, 2, . . . , n and k = 1, 2, . . . ,

which can be written as

x_i^(k+1) = x_i^(k) + (ω/a_ii) [ b_i − Σ_{j=1}^{i−1} a_ij x_j^(k+1) − Σ_{j=i}^{n} a_ij x_j^(k) ],   (2.26)

for i = 1, 2, . . . , n and k = 1, 2, . . . .

The matrix form of the SOR method can be represented by

x(k+1) = (D + ωL)−1 [(1 − ω)D − ωU ]x(k) + ω(D + ωL)−1 b, (2.27)

which is equivalent to

x(k+1) = Tω x(k) + c, (2.28)

where

Tω = (D + ωL)−1 [(1 − ω)D − ωU ] and c = ω(D + ωL)−1 b (2.29)

are called the SOR iteration matrix and the SOR vector, respectively.

The quantity ω is called the relaxation factor. It can be formally proved


that convergence can be obtained for values of ω in the range 0 < ω < 2.
For ω = 1, the SOR method (2.25) is simply the Gauss–Seidel method. The
methods involving (2.25) are called relaxation methods. For the choices of
0 < ω < 1, the procedures are called under-relaxation methods and can be
used to obtain convergence of some systems that are not convergent by the
Gauss–Seidel method. For choices 1 < ω < 2, the procedures are called
over-relaxation methods, which can be used to accelerate the convergence
for systems that are convergent by the Gauss–Seidel method. The SOR
methods are particularly useful for solving linear systems that occur in the
numerical solutions of certain partial differential equations.
Example 2.19 Find the l∞ -norm of the SOR iteration matrix Tω , when
ω = 1.005, by using the following matrix:
 
5 −1
A= .
−1 10
Solution. Since the SOR iteration matrix is
Tω = (D + ωL)−1 [(1 − ω)D − ωU ],
where

L = [ 0 0; −1 0 ],   U = [ 0 −1; 0 0 ],   D = [ 5 0; 0 10 ],

then

Tω = ( [ 5 0; 0 10 ] + 1.005 [ 0 0; −1 0 ] )−1 ( (1 − 1.005) [ 5 0; 0 10 ] − 1.005 [ 0 −1; 0 0 ] ),

which is equal to

Tω = [ 5 0; −1.005 10 ]−1 [ −0.025 1.005; 0 −0.05 ].

Thus,

Tω = [ 0.2 0; 0.0201 0.1 ] [ −0.025 1.005; 0 −0.05 ]

or

Tω = [ −0.005 0.201; −0.0005 0.0152 ].

The l∞ -norm of the matrix Tω is

kTω k∞ = max{0.206, 0.0157} = 0.206.
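
The same result can be reproduced in the MATLAB Command Window by forming Tω directly from its definition (built-in functions only):

>> A = [5 -1; -1 10]; w = 1.005;
>> D = diag(diag(A)); L = tril(A, -1); U = triu(A, 1);
>> Tw = (D + w*L) \ ((1 - w)*D - w*U);   % SOR iteration matrix
>> norm(Tw, inf)                          % returns 0.206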

Example 2.20 Solve the following system of linear equations, taking an
initial approximation x(0) = [0, 0, 0, 0]T and with  = 10−4 in the l∞ -norm:
2x1 + 8x2 = 1
5x1 − x2 + x3 = 2
−x1 + x2 + 4x3 + x4 = 12
x2 + x3 + 5x4 = 12.
(a) Using the Gauss–Seidel method.
(b) Using the SOR method with ω = 0.33.

Solution. (a) The Gauss–Seidel method for the given system is


x1^(k+1) = (1/2)[ 1 − 8x2^(k) ]

x2^(k+1) = (1/(−1))[ 2 − 5x1^(k+1) − x3^(k) ]

x3^(k+1) = (1/4)[ 12 + x1^(k+1) − x2^(k+1) − x4^(k) ]

x4^(k+1) = (1/5)[ 12 − x2^(k+1) − x3^(k+1) ].

Table 2.10: Solution of Example 2.20 by the Gauss–Seidel method.


(k) (k) (k) (k)
k x1 x2 x3 x4
0 0.00000 0.00000 0.00000 0.00000
1 5.0000e–001 5.0000e–001 3.0000e+000 1.7000e+000
2 –1.5000e+000 –6.5000e+000 3.8250e+000 2.9350e+000
3 2.6500e+001 1.3432e+002 –2.4690e+001 –1.9527e+001
4 –5.3680e+002 –2.7107e+003 5.5135e+002 4.3427e+002
5 1.0843e+004 5.4766e+004 –1.1086e+004 –8.7335e+003
6 –2.1906e+005 –1.1064e+006 2.2402e+005 1.7648e+005
7 4.4256e+006 2.2352e+007 –4.5257e+006 –3.5653e+006
8 –8.9408e+007 –4.5157e+008 9.1431e+007 7.2027e+007
9 1.8063e+009 9.1227e+009 –1.8471e+009 –1.4551e+009

Starting with an initial approximation x(0) = [0, 0, 0, 0]T , and for k = 0,


we obtain

x1^(1) = (1/2)[ 1 − 8x2^(0) ] = 0.5

x2^(1) = (1/(−1))[ 2 − 5x1^(1) − x3^(0) ] = 0.5

x3^(1) = (1/4)[ 12 + x1^(1) − x2^(1) − x4^(0) ] = 3.0

x4^(1) = (1/5)[ 12 − x2^(1) − x3^(1) ] = 1.7.

The first and subsequent iterations are listed in Table 2.10.



(b) Now the SOR method for the given system is


x1^(k+1) = (1 − ω)x1^(k) + (ω/2)[ 1 − 8x2^(k) ]

x2^(k+1) = (1 − ω)x2^(k) + (ω/(−1))[ 2 − 5x1^(k+1) − x3^(k) ]

x3^(k+1) = (1 − ω)x3^(k) + (ω/4)[ 12 + x1^(k+1) − x2^(k+1) − x4^(k) ]

x4^(k+1) = (1 − ω)x4^(k) + (ω/5)[ 12 − x2^(k+1) − x3^(k+1) ].

Starting with an initial approximation x(0) = [0, 0, 0, 0]T , ω = 0.33, and for k = 0, we obtain

x1^(1) = (1 − ω)x1^(0) + (ω/2)[ 1 − 8x2^(0) ] = 0.16500

x2^(1) = (1 − ω)x2^(0) + (ω/(−1))[ 2 − 5x1^(1) − x3^(0) ] = −0.38775

x3^(1) = (1 − ω)x3^(0) + (ω/4)[ 12 + x1^(1) − x2^(1) − x4^(0) ] = 1.03560

x4^(1) = (1 − ω)x4^(0) + (ω/5)[ 12 − x2^(1) − x3^(1) ] = 0.74924.
The first and subsequent iterations are listed in Table 2.11. Note that the
Gauss–Seidel method diverged for the given system, but the SOR method
converged very slowly for the given system. •
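
Both computations can be reproduced with the SORM m-file listed as Program 2.3 later in this chapter (recall that ω = 1 reduces the SOR scheme to the Gauss–Seidel method); the following commands are a sketch of such a check:

>> Ab = [2 8 0 0 1; 5 -1 1 0 2; -1 1 4 1 12; 0 1 1 5 12];
>> x = [0 0 0 0]; acc = 1e-4;
>> SORM(Ab, x, 1.00, acc);   % omega = 1 (Gauss-Seidel): the iterates grow, cf. Table 2.10
>> SORM(Ab, x, 0.33, acc);   % omega = 0.33 (under-relaxation): slow convergence, cf. Table 2.11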

Example 2.21 Solve the following system of linear equations using the
SOR method, with  = 0.5 × 10−6 in the l∞ -norm:
2x1 + x2 = 4
x1 + 2x2 + x3 = 8
x2 + 2x3 + x4 = 12
x3 + 2x4 = 11.

Start with an initial approximation x(0) = [0, 0, 0, 0]T and take ω = 1.27.

Table 2.11: Solution of Example 2.20 by the SOR Method.


(k) (k) (k) (k)
k x1 x2 x3 x4
0 0.00000 0.00000 0.00000 0.00000
1 0.16500 –0.38775 1.03560 0.74924
2 0.66376 0.51715 1.63414 1.15201
3 –0.26301 –0.20820 1.98531 1.44656
4 0.02493 –0.10321 2.21139 1.62205
5 0.05030 0.08360 2.33506 1.71914
6 –0.19531 –0.15568 2.40939 1.79508
7 –0.05655 –0.06251 2.45669 1.83669
8 –0.09343 –0.04533 2.48049 1.86186
9 –0.14497 –0.11101 2.49552 1.88207
10 –0.09613 –0.06948 2.50453 1.89227
.. .. .. .. ..
. . . . .
21 –0.11932 –0.08436 2.51291 1.91401
22 –0.11939 –0.08427 2.51284 1.91410

Solution. For the given system, the SOR method with ω = 1.27 is

x1^(k+1) = (1 − ω)x1^(k) + (ω/2)[ 4 − x2^(k) ]

x2^(k+1) = (1 − ω)x2^(k) + (ω/2)[ 8 − x1^(k+1) − x3^(k) ]

x3^(k+1) = (1 − ω)x3^(k) + (ω/2)[ 12 − x2^(k+1) − x4^(k) ]

x4^(k+1) = (1 − ω)x4^(k) + (ω/2)[ 11 − x3^(k+1) ].

Starting with an initial approximation x(0) = [0, 0, 0, 0]T , and for k = 0,



Table 2.12: Solution of Example 2.21 by the SOR method.


(k) (k) (k) (k)
k x1 x2 x3 x4
0 0.000000 0.000000 0.000000 0.000000
1 2.540000 3.467100 5.418392 3.544321
2 –0.34741 0.923809 3.319772 3.919978
3 2.047182 1.422556 3.331152 3.811324
4 1.083938 1.892328 3.098770 3.988224
5 1.045709 1.937328 3.020607 3.990094
6 1.027456 1.986402 3.009361 3.996730
.. .. .. .. ..
. . . . .
15 0.999999 2.000000 3.000000 4.000000
16 1.000000 2.000000 3.000000 4.000000

we obtain
x1^(1) = (1 − 1.27)x1^(0) + (1.27/2)[ 4 − x2^(0) ] = 2.54

x2^(1) = (1 − 1.27)x2^(0) + (1.27/2)[ 8 − x1^(1) − x3^(0) ] = 3.4671

x3^(1) = (1 − 1.27)x3^(0) + (1.27/2)[ 12 − x2^(1) − x4^(0) ] = 5.418392

x4^(1) = (1 − 1.27)x4^(0) + (1.27/2)[ 11 − x3^(1) ] = 3.544321.
The first and subsequent iterations are listed in Table 2.12.

To get these results using MATLAB commands, we do the following:

>> Ab = [2 1 0 0 4; 1 2 1 0 8; 0 1 2 1 12; 0 0 1 2 11];


>> x = [0 0 0 0];
>> w = 1.27; acc = 0.5e − 6;
>> SORM (Ab, x, w, acc);

Table 2.13: Solution of Example 2.21 by the Gauss–Seidel method.


(k) (k) (k) (k)
k x1 x2 x3 x4
0 0.000000 0.000000 0.000000 0.000000
1 2.000000 3.000000 4.500000 3.250000
2 0.500000 1.500000 3.625000 3.687500
3 1.250000 1.562500 3.375000 3.812500
4 1.218750 1.703125 3.242188 3.878906
5 1.148438 1.804688 3.158203 3.920898
6 1.097656 1.872070 3.103516 3.948242
.. .. .. .. ..
. . . . .
35 1.000000 1.999999 3.000000 4.000000
36 1.000000 2.000000 3.000000 4.000000

We note that the SOR method converges and required 16 iterations


to obtain what is obviously the correct solution for the given system. If
we solve Example 2.21 using the Gauss–Seidel method, we find that this
method also converges, but very slowly because it needed 36 iterations to
obtain the correct solution, shown by Table 2.13, which is 20 iterations
more than required by the SOR method. Also, if we solve the same exam-
ple using the Jacobi method, we will find that it needs 73 iterations to get
the correct solution. Comparing the SOR method with the Gauss–Seidel
method, a large reduction in the number of iterations can be achieved,
given an efficient choice of ω.

In practice, ω should be chosen in the range 1 < ω < 2, but the precise
choice of ω is a major problem. Finding the optimum value for ω depends
on the particular problem (size of the system of equations and the nature
of the equations) and often requires careful work. A detailed study for
the optimization of ω can be found in Isaacson and Keller (1966). The
following theorems can be used in certain situations for the convergence of
the SOR method.
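
Since the iteration converges if and only if ρ(Tω ) < 1, one simple (if brute-force) way to study the effect of ω is to scan a range of values and compute ρ(Tω ) for each. The MATLAB sketch below does this for the matrix of Example 2.21, using built-in functions only; the minimizer it reports is close to the value ω = 1.27 used above:

A = [2 1 0 0; 1 2 1 0; 0 1 2 1; 0 0 1 2];
D = diag(diag(A)); L = tril(A, -1); U = triu(A, 1);
wvals = 0.1:0.05:1.9;                    % candidate relaxation factors
rho = zeros(size(wvals));
for i = 1:length(wvals)
    w = wvals(i);
    Tw = (D + w*L) \ ((1 - w)*D - w*U);  % SOR iteration matrix for this omega
    rho(i) = max(abs(eig(Tw)));          % its spectral radius
end
[rmin, imin] = min(rho);
fprintf('smallest rho(Tw) = %.4f at omega = %.2f\n', rmin, wvals(imin));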

Theorem 2.10 If all the diagonal elements of a matrix A are nonzero, i.e., aii 6= 0, for each i = 1, 2, . . . , n, then

ρ(Tω ) ≥ |ω − 1|.

This implies that the SOR method can converge only if 0 < ω < 2. •
Theorem 2.11 If A is a positive-definite matrix and 0 < ω < 2, then the SOR method converges for any choice of initial approximation vector x(0) ∈ Rn . •
Theorem 2.12 If A is a positive-definite and tridiagonal matrix, then

ρ(TG ) = [ρ(TJ )]2 < 1,

and the optimal choice of the relaxation factor ω for the SOR method is

ω = 2/(1 + √(1 − [ρ(TJ )]2 )),   (2.30)

where TG and TJ are the Gauss–Seidel iteration and the Jacobi iteration matrices, respectively. With this choice of the relaxation factor ω, the spectral radius of the SOR iteration matrix Tω is

ρ(Tω ) = ω − 1.
Example 2.22 Find the optimal choice for the relaxation factor ω for
using it in the SOR method for solving the linear system Ax = b, where
the coefficient matrix A is given as follows:
 
2 −1 0
A =  −1 2 −1  .
0 −1 2
Solution. Since the given matrix A is positive-definite and tridiagonal, we
can use Theorem 2.12 to find the optimal choice for ω. Using matrix A,
we can find the Jacobi iteration matrix TJ as follows:
         [  0   1/2   0  ]
    TJ = [ 1/2   0   1/2 ] .
         [  0   1/2   0  ]

Now to find the spectral radius of the Jacobi iteration matrix TJ , we use the characteristic equation

    det(TJ − λI) = −λ3 + λ/2 = 0,

which gives the eigenvalues of the matrix TJ as λ = 0, ±1/√2. Thus,

    ρ(TJ ) = 1/√2 = 0.707107,

and the optimal value of ω is

    ω = 2/(1 + √(1 − (0.707107)2 )) = 1.171573.

Also, note that the Gauss–Seidel iteration matrix TG has the form

         [ 0   1/2   0  ]
    TG = [ 0   1/4  1/2 ] ,
         [ 0   1/8  1/4 ]

and its characteristic equation is

    det(TG − λI) = −λ3 + λ2 /2 = 0.

Thus,

    ρ(TG ) = 1/2 = 0.50000 = (ρ(TJ ))2 ,

which agrees with Theorem 2.12. •

Note that the optimal value of ω can also be found by using (2.30) if the
eigenvalues of the Jacobi iteration matrix TJ are real and 0 < ρ(TJ ) < 1. •
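
Under the assumptions of Theorem 2.12, this computation is easy to automate. The small MATLAB function below is a sketch (the name optomega is our own choice, not one of the m-files used elsewhere in this chapter):

function w = optomega(A)
% Optimal SOR relaxation factor for a positive-definite,
% tridiagonal matrix A, using formula (2.30).
D = diag(diag(A));
TJ = -(D \ (A - D));            % Jacobi iteration matrix
rhoJ = max(abs(eig(TJ)));       % spectral radius of TJ
w = 2/(1 + sqrt(1 - rhoJ^2));

For the matrix of Example 2.22, the call optomega([2 -1 0; -1 2 -1; 0 -1 2]) returns 1.1716, in agreement with the value computed above.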

Example 2.23 Find the optimal choice for the relaxation factor ω by us-
ing the matrix
 
5 −1 −1 −1
 2 5 −1 0 
A=  −1
.
−1 5 −1 
−1 −1 −1 5
Solution. Using the given matrix A, we can find the Jacobi iteration matrix TJ as

         [  0    1/5   1/5   1/5 ]
    TJ = [ 2/5    0    1/5    0  ]
         [ 1/5   1/5    0    1/5 ] .
         [ 1/5   1/5   1/5    0  ]

Now to find the spectral radius of the Jacobi iteration matrix TJ , we use the characteristic equation

    det(TJ − λI) = 0,

and get the following polynomial equation:

    λ4 − (6/25)λ2 − (8/125)λ − 3/625 = (1/625)(5λ − 3)(5λ + 1)3 = 0.

Solving the above polynomial equation, we obtain

    λ = 3/5, −1/5, −1/5, −1/5,

which are the eigenvalues of the matrix TJ . From this we get

    ρ(TJ ) = 3/5 = 0.6,

the spectral radius of the matrix TJ .


Since the value of ρ(TJ ) is less than 1, we can use formula (2.30) and get

    ω = 2/(1 + √(1 − (0.6)2 )) = 1.1111,

the optimal value of ω. •
Since the rate of convergence of an iterative method depends on the
spectral radius of the matrix associated with the method, one way to choose
a method to accelerate convergence is to choose a method whose associated
matrix T has a minimal spectral radius.
Example 2.24 Compare the convergence of the Jacobi, Gauss–Seidel, and
SOR iterative methods for the system of linear equations Ax = b, where
the coefficient matrix A is given as
 
4 −1 0 0
 −1 4 −1 0 
A=  0 −1
.
4 −1 
0 0 −1 4
Solution. First, we compute the Jacobi iteration matrix by using
TJ = −D−1 (L + U ).
Since D = diag(4, 4, 4, 4) and

            [  0  −1   0   0 ]
    L + U = [ −1   0  −1   0 ]
            [  0  −1   0  −1 ] ,
            [  0   0  −1   0 ]

we obtain

                            [  0   1/4   0    0  ]
    TJ = −D−1 (L + U ) =    [ 1/4   0   1/4   0  ]
                            [  0   1/4   0   1/4 ] .
                            [  0    0   1/4   0  ]

To find the eigenvalues of the Jacobi iteration matrix TJ , we evaluate the determinant

    det(TJ − λI) = 0,

which gives the characteristic equation of the form

    λ4 − 0.1875λ2 + 1/256 = 0.

Solving this fourth-degree polynomial equation, we get the eigenvalues

λ = −0.4045, λ = −0.1545, λ = 0.1545, λ = 0.4045

of the matrix TJ . The spectral radius of the matrix TJ is

ρ(TJ ) = 0.4045 < 1,

which shows that the Jacobi method will converge for the given linear sys-
tem.

Since the given matrix is positive-definite and tridiagonal, by using The-


orem 2.12 we can compute the spectral radius of the Gauss–Seidel iteration
matrix with the help of the spectral radius of the Jacobi iteration matrix,
i.e.,
ρ(TG ) = [ρ(TJ )]2 = (0.4045)2 = 0.1636 < 1,
which shows that the Gauss–Seidel method will also converge, and faster,
than the Jacobi method. Also, from Theorem 2.12, we have

ρ(Tω ) = ω − 1.

Now to find the spectral radius of the SOR iteration matrix Tω , we have to calculate first the optimal value of ω by using

    ω = 2/(1 + √(1 − [ρ(TJ )]2 )).

So using ρ(TJ ) = 0.4045, we get

    ω = 2/(1 + √(1 − (0.4045)2 )) = 1.045.

Using this optimal value of ω, we can compute the spectral radius of the
SOR iteration matrix Tω as follows:

ρ(Tω ) = ω − 1 = 1.045 − 1 = 0.045 < 1.

Thus the SOR method will also converge for the given system, and faster
than the other two methods, because

ρ(Tω ) < ρ(TG ) < ρ(TJ ).
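
This ordering of the spectral radii can be checked numerically with a few MATLAB commands (built-in functions only):

>> A = [4 -1 0 0; -1 4 -1 0; 0 -1 4 -1; 0 0 -1 4];
>> D = diag(diag(A)); L = tril(A, -1); U = triu(A, 1);
>> rhoJ = max(abs(eig(-(D \ (L + U)))))      % Jacobi:       0.4045
>> rhoG = max(abs(eig(-((D + L) \ U))))      % Gauss-Seidel: 0.1636
>> w = 2/(1 + sqrt(1 - rhoJ^2));
>> rhoW = max(abs(eig((D + w*L) \ ((1 - w)*D - w*U))))   % SOR: 0.045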

Program 2.3
MATLAB m-file for the SOR Iterative Method
function sol=SORM(Ab,x,w,acc)  % Ab = [A b]
[n,t]=size(Ab); b=Ab(1:n,t); R=1; k=1;
d(1,1:n+1)=[0 x];
k=k+1;
while R > acc
  for i=1:n
    s = 0;                                          % accumulate off-diagonal terms
    for j=1:n
      if j <= i-1; s = s + Ab(i,j)*d(k,j+1);        % already-updated values
      elseif j >= i+1; s = s + Ab(i,j)*d(k-1,j+1);  % values from the previous sweep
      end
    end
    x(1,i) = (1-w)*d(k-1,i+1) + (w/Ab(i,i))*(b(i,1) - s);
    d(k,1) = k-1; d(k,i+1) = x(1,i);
  end
  R = max(abs(d(k,2:n+1) - d(k-1,2:n+1)));
  if R > 100 && k > 10; break; end                  % stop if the iterates are diverging
  k = k+1;
end
sol = d;

Procedure 2.3 (SOR Method)

1. Find or take ω in the interval (0, 2) (for guaranteed convergence).

2. Initialize the first approximation x(0) and preassigned accuracy .

3. Compute the constant vector c = ω(D + ωL)−1 b.

4. Compute the SOR iteration matrix Tω = (D+ωL)−1 [(1−ω)D−ωU ].


5. Compute the approximate solutions x(k+1) = Tω x(k) + c, for k = 0, 1, . . ..

6. Repeat step 5 until kx(k+1) − x(k) k < .
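
For completeness, a minimal matrix-form realization of Procedure 2.3 might look as follows (a sketch; the function name sor_matrix is our own choice):

function [x, k] = sor_matrix(A, b, w, x, tol, kmax)
% SOR iteration in matrix form: x = Tw*x + c (Procedure 2.3).
D = diag(diag(A)); L = tril(A, -1); U = triu(A, 1);
Tw = (D + w*L) \ ((1 - w)*D - w*U);   % SOR iteration matrix
c  = w * ((D + w*L) \ b);             % constant vector
for k = 1:kmax
    xnew = Tw*x + c;
    if norm(xnew - x, inf) < tol, x = xnew; return; end
    x = xnew;
end

For instance, [x, k] = sor_matrix([2 1 0 0; 1 2 1 0; 0 1 2 1; 0 0 1 2], [4 8 12 11]', 1.27, zeros(4,1), 0.5e-6, 100) reproduces the solution of Example 2.21.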

2.7 Conjugate Gradient Method


So far, we have discussed two broad classes of methods for solving linear
systems. The first, known as direct methods (Chapter 1), are based on
some version of Gaussian elimination or LU decomposition. Direct meth-
ods eventually obtain the exact solution but must be carried through to
completion before any useful information is obtained. The second class
contains the iterative methods discussed in the present chapter that lead to
closer and closer approximations to the solution, but almost never reach
the exact value.

Now we discuss a method, called the conjugate gradient method, which


was developed as long ago as 1952. It was originally developed as a direct
method designed to solve an n × n positive-definite linear system. As a
direct method it is generally inferior to Gaussian elimination with pivoting
since both methods require n major steps to determine a solution, and the
steps of the conjugate gradient method are more computationally expensive than those in Gaussian elimination. However, the conjugate gradient
method is very useful when employed as an iterative approximation method
for solving large sparse systems.

Actually, this method is rarely used as a primary method for solving


linear systems, rather, its more common applications arise in solving dif-
ferential equations and when other iterative methods converge very slowly.
We assume the coefficient matrix A of the linear system Ax = b is positive-definite, and we use orthogonality with respect to the inner product
< x, y > = xT Ay,
where x and y are n-dimensional vectors. Also, we have for each x and y,
< x, Ay >=< Ax, y > .
The conjugate gradient method is a variational approach in which we
seek the vector x∗ as a solution to the linear system Ax = b, if and only if
x∗ minimizes
E(x) =< x, Ax > −2 < x, b > . (2.31)
In addition, for any x and v 6= 0, the function E(x + tv) has its minimum when

t = < v, b − Ax > / < v, Av > .
The process is started by specifying an initial estimate x(0) at iteration
zero, and by computing the initial residual vector from
r(0) = b − Ax(0) .
We then obtain improved estimates x(k) from the iterative process
x(k) = x(k−1) + tk v(k) , (2.32)
where v(k) is a search direction expressed as a vector and the value of

tk = < v(k) , b − Ax(k−1) > / < v(k) , Av(k) >

is chosen to minimize the value of E(x(k) ).
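
In MATLAB, one update of this form takes only a few lines; the following sketch (built-in operations only) assumes A, b, the current iterate x, and a search direction v are already defined:

r = b - A*x;               % current residual
t = (v'*r) / (v'*(A*v));   % step length that minimizes E(x + t*v)
x = x + t*v;               % updated approximation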

In a related method, called the method of steepest descent, v(k) is chosen


as the residual vector
v(k) = r(k−1) = b − Ax(k−1) .

This method has merit for nonlinear systems and optimization prob-
lems, but it is not used for linear systems because of slow convergence. An
alternative approach uses a set of nonzero direction vectors {v(1) , . . . , v(n) }
that satisfy
< v(i) , Av(j) >= 0, if i 6= j.
This is called an A-orthogonality condition, and the set of vectors {v(1) , . . . , v(n) }
is said to be A-orthogonal.
In the conjugate gradient method, we use v(1) equal to r(0) only at the
beginning of the process. For all later iterations, we choose
v(k+1) = r(k) + ( kr(k) k2 / kr(k−1) k2 ) v(k)
to be conjugate to all previous direction vectors.

Note that the initial approximation x(0) can be chosen by the user,
with x(0) = 0 as the default. The number of iterations, m ≤ n, can be
chosen by the user in advance; alternatively, one can impose a stopping
criterion based on the size of the residual vector, kr(k) k, or, alternatively,
the distance between successive iterates, kx(k+1) − x(k) k. If the process is
carried on to the bitter end, i.e., m = n, then, in the absence of round-off
errors, the results will be the exact solution to the linear system. More
iterations than n may be required in practical applications because of the
introduction of round-off errors.
Example 2.25 The linear system
2x1 − x2 = 1
−x1 + 2x2 − x3 = 0
− x2 + x3 = 1

has the exact solution x = [2, 3, 4]T . Solve the system by the conjugate
gradient method.

Solution. Start with an initial approximation x(0) = [0, 0, 0]T and find the
residual vector as

r(0) = b − Ax(0) = b = [1, 0, 1]T .



The first conjugate direction is v(1) = r(0) = [1, 0, 1]T . Since kr(0) k2 = 2 and < v(1) , v(1) > = [v(1) ]T Av(1) = 3, we use (2.32) to obtain the updated approximation to the solution

x(1) = x(0) + ( kr(0) k2 / < v(1) , v(1) > ) v(1) = (2/3)[1, 0, 1]T = [2/3, 0, 2/3]T .

Now we compute the next residual vector as

r(1) = b − Ax(1) = [−1/3, 4/3, 1/3]T ,

and the conjugate direction as

v(2) = r(1) + ( kr(1) k2 / kr(0) k2 ) v(1) = [−1/3, 4/3, 1/3]T + (2/2)[1, 0, 1]T = [2/3, 4/3, 4/3]T ,

which satisfies the conjugacy condition < v(1) , v(2) > = [v(1) ]T Av(2) = 0.

Now we get the new approximation as

x(2) = x(1) + ( kr(1) k2 / < v(2) , v(2) > ) v(2) = [2/3, 0, 2/3]T + (9/4)[2/3, 4/3, 4/3]T = [13/6, 3, 11/3]T .

Since we are dealing with a 3 × 3 system, we will recover the exact solution by one more iteration of the method. The new residual vector is

r(2) = b − Ax(2) = [−1/3, −1/6, 1/3]T ,

and the final conjugate direction is

v(3) = r(2) + ( kr(2) k2 / kr(1) k2 ) v(2) = [−1/3, −1/6, 1/3]T + (1/8)[2/3, 4/3, 4/3]T = [−1/4, 0, 1/2]T ,

which, as one can check, is conjugate to both v(1) and v(2) . Thus, the solution is obtained from

x(3) = x(2) + ( kr(2) k2 / < v(3) , v(3) > ) v(3) = [13/6, 3, 11/3]T + (2/3)[−1/4, 0, 1/2]T = [2, 3, 4]T .
Since we applied the method n = 3 times, this is the actual solution. •

Note that in larger examples, one would not carry through the method
to the bitter end, since an approximation to the solution is typically ob-
tained with only a few iterations. The result can be a substantial saving
in computational time and effort required to produce an approximation to
the solution.
To get the above results using MATLAB commands, we do the follow-
ing:

>> A = [2 − 1 0; −1 2 − 1; 0 − 1 1];
>> b = [1 0 1]';
>> x0 = [0 0 0]';
>> acc = 0.5e − 6;
>> maxI = 3;
>> CON JG(A, b, x0, acc, maxI);

Program 2.4
MATLAB m-file for the Conjugate Gradient Method
function x=CONJG(A,b,x0,acc,maxI)
x = x0; r = b - A*x0; v = r;               % initial residual and search direction
alpha = r'*r; iter = 0; flag = 0;
normb = norm(b); if normb < eps; normb = 1; end
while (norm(r)/normb > acc)
  u = A*v; t = alpha/(u'*v); x = x + t*v;  % step length and solution update
  r = r - t*u; beta = r'*r;
  v = r + (beta/alpha)*v; alpha = beta;    % new conjugate direction
  iter = iter + 1; if (iter == maxI); flag = 1; break; end
end

Procedure 2.4 (Conjugate Gradient Method)

1. Initialize the first approximation x(0) = 0.

2. Compute r(0) and set v(1) equal to r(0) .

3. For iterations k equal to (1, 2, . . .) until convergence:


kr(k) k2 (k)
(a) Compute v(k+1) = r(k) + v .
kr(k−1) k2
kr(k) k2
(b) Compute x(k+1) = x(k) + v(k+1) .
[v(k+1) ]T Av(k+1)

2.8 Iterative Refinement


In those cases when the left-hand side coefficients aij of the system are
exact, but the system is ill-conditioned, an approximate solution can be
improved by an iterative technique called the method of residual correc-
tion. The procedure of the method is defined below.

Let x(1) be an approximate solution to the system

Ax = b (2.33)

and let y be a correction to x(1) so that the exact solution x satisfies

x = x(1) + y.

Then by substituting into (2.33), we find that y must satisfy

Ay = r = b − Ax(1) , (2.34)

where r is the residual. The system (2.34) can now be solved to give
correction y to the approximation x(1) . Thus, the new approximation

x(2) = x(1) + y

will be closer to the solution than x(1) . If necessary, we compute the new
residual
r = b − Ax(2)
and solve system (2.34) again to get new corrections. Normally, two or
three iterations are enough to get an exact solution. This iterative method
can be used to obtain an improved solution whenever an approximate so-
lution has been obtained by any means.

Example 2.26 The linear system

1.01x1 + 0.99x2 = 2
0.99x1 + 1.01x2 = 2

has the exact solution x = [1, 1]T . The approximate solution using the
Gaussian elimination method is x(1) = [1.01, 1.01]T and residual r(1) =
[−0.02, −0.02]T . Then the solution to the system

Ay = r(1) ,

using the simple Gaussian elimination method, is y(1) = [−0.01, −0.01]T .


So the new approximation is

x(2) = x(1) + y(1) = [1, 1]T ,



which is equal to the exact solution after just one iteration. •

For MATLAB commands for the above iterative method, the two m-
files RES.m and WP.m have been used, then the first iteration is easily
performed by the following sequence of MATLAB commands:

>> A = [1.01 0.99; 0.99 1.01];


>> b = [2 2]';
>> x0 = [1.01 1.01]';
>> r = RES(A, b, x0);
>> B = [A r];
>> y = W P (B);
>> x0 = x0 + y;
If needed, the last four commands can be repeated to generate the subse-
quent iterates.
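
If the m-files RES.m and WP.m are not available, the same refinement loop can be written with built-in MATLAB operations only; the following is a minimal sketch for this example:

A = [1.01 0.99; 0.99 1.01]; b = [2 2]';
x = [1.01 1.01]';            % approximate solution to be improved
for k = 1:3                  % two or three corrections are normally enough
    r = b - A*x;             % residual
    y = A \ r;               % correction from A*y = r
    x = x + y;               % improved approximation
end
x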

Procedure 2.5 (Iterative Refinement Method)


1. Find or use the given initial approximation x(1) ∈ Rn .

2. Compute the residual vector r(1) = b − Ax(1) .

3. Solve the linear system Ay = r(1) for the unknown y.

4. Set x(k+1) = x(k) + y, for k = 1, . . ..

5. Repeat steps 2 to 4, unless the best approximation is achieved.

2.9 Summary
Several iterative methods were discussed. Among them were the Jacobi
method, the Gauss–Seidel method, and the SOR method. All methods
converge if the coefficient matrix is strictly diagonally dominant. The SOR
is the best method of choice. Although the determination of the optimum
value of the relaxation factor ω is difficult, it is generally worthwhile if the
system of equations is to be solved many times for right-hand side vectors.
The need for estimating parameters is removed in the conjugate gradient

method, which, although more complicated to code, can rival the SOR
method in efficiency when dealing with large, sparse systems. Iterative
methods are generally used when the number of equations is large, and
the coefficient matrix is strictly diagonally dominant. At the end of this
chapter we discussed the residual corrector method, which improved the
approximate solution.

2.10 Problems
1. Find the Jacobi iteration matrix and its l∞ -norm for each of the
following matrices:
   
11 −3 2 7 1 1
(a)  4 10 3  , (b)  3 13 2  ,
−2 5 9 −4 3 14
   
11 −3 2 7 1 1
(c)  4 10 3  , (d)  3 13 2  ,
−2 5 9 −4 3 14
   
8 1 −1 0 7 1 −3 1
 2 13 −2 1   1 10 2 −3 
(e)  −1 3 15 2  ,
 (f ) 
 1 −5 25
.
4 
1 4 5 20 1 2 3 17
2. Find the Jacobi iteration matrix and its l2 -norm for each of the fol-
lowing matrices:
   
5 2 3 6 3 0
(a)  −3 6 4  , (b)  4 7 5  ,
2 5 8 −3 2 11
   
21 −13 6 9 0 11
(c)  5 15 2  , (d)  1 11 3  ,
11 2 19 −1 0 4
   
18 2 −3 2 5 4 3 0
 6 17 −2 1   2 10 3 3 
(e)  1 13 25 2 
, (f )  3 4 12 −3  .
 

−6 8 7 21 2 0 −1 7

3. Solve the following linear systems using the Jacobi method. Start
with initial approximation x(0) = 0 and iterate until kx(k+1) − x(k) k∞ ≤
10−5 for each system:

(a)
4x1 − x2 + x3 = 7
4x1 − 8x2 + x3 = −21
−2x1 + x2 + 5x3 = 15
(b)
3x1 + x2 + x3 = 5
2x1 + 6x2 + x3 = 9
x1 + x2 + 4x3 = 6
(c)
4x1 + 2x2 + x3 = 1
x1 + 7x2 + x3 = 4
x1 + x2 + 20x3 = 7
(d)
5x1 + 2x2 − x3 = 6
x1 + 6x2 − 3x3 = 4
2x1 + x2 + 4x3 = 7
(e)
6x1 − x2 + 3x3 = −2
3x2 + x3 = 1
−2x1 + x2 + 5x3 = 5
(f )
4x1 + x2 = −1
2x1 + 5x2 + x3 = 0
−x1 + 2x2 + 4x3 = 3
(g)
5x1 − x2 + x3 = 1
3x2 − x3 = −1
x1 + 2x2 + 4x3 = 2

(h)
9x1 + x2 + x3 = 10
2x1 + 10x2 + 3x3 = 19
3x1 + 4x2 + 11x3 = 0

4. Consider the following system of equations:

4x1 + 2x2 + x3 = 1
x1 + 7x2 + x3 = 4
x1 + x2 + 20x3 = 7.

(a) Show that the Jacobi method converges by using kTJ k∞ < 1.
(b) Compute the second approximation x(2) , starting with x(0) =
[0, 0, 0]T .
(c) Compute an error estimate kx − x(2) k∞ for your approximation.

5. If    
4 1 0 3
A= 1 3 −1  , b =  4 .
0 −1 4 5
Find the Jacobi iteration matrix TJ . If the first approximate solution
of the given linear system by the Jacobi method is [ 34 , 43 , 54 ]T , using
x(0) = [0, 0, 0]T , then estimate the number of iterations necessary to
obtain approximations accurate to within 10−6 .

6. Consider the linear system Ax = b, where


   
−5 1 0 4
A =  1 5 −1  , b =  −2  .
0 1 2 −5

Find the Jacobi iteration matrix TJ and show that kTJ k < 1. Use
the Jacobi method to find the first approximate solution x(1) of the
linear system by using x(0) = [0, 0, 0]T . Also, compute the error bound
kx − x(10) k. Compute the number of steps needed to get accuracy
within 10−5 .

7. Consider a linear system Ax = b, where


   
4 2 1 1
A=  1 7 1 ,  b=  4 .
1 1 20 7
Show that the Jacobi method converges for the given linear system.
If the first approximate solution of the system by the Jacobi method
is x(1) = [0.25, 0.57, 0.35]T , by using an initial approximation x(0) =
[0, 0, 0]T , compute an error bound kx − x(20) k.
8. Consider a linear system Ax = b, where
   
3 2 1
A= , b= .
2 50 1
Using the Jacobi method and the Gauss–Seidel method, which one
will converge faster and why? If an approximate solution of the
system is [0.1, 0.4]T , then find the upper bound for the relative error
in solving the given linear system.
9. Rearrange the following system such that convergence of the Gauss–
Seidel method is guaranteed. Then use x(0) = [0, 0, 0]T to find the
first approximation by the Gauss–Seidel method. Also, compute an
error bound kx − x(10) k.
   
1 2 4 2
A =  5 −1 1 , b =  1 .
0 3 −1 −1

10. Solve Problem 1 using the Gauss–Seidel method.


11. Consider the following system of equations:
4x1 + 2x2 + x3 = 11
−x1 + 2x2 = 3
2x1 + x2 + 4x3 = 16.

(a) Show that the Gauss–Seidel method converges by using


kTG k∞ < 1.

(b) Compute the second approximation x(2) , starting with x(0) =


[1, 1, 1]T .
(c) Compute an error estimate kx − x(2) k∞ for your approximation.
Consider the linear system Ax = b, where
   
−5 2 1 −3
A =  1 −10 1 , b =  27  .
1 1 −4 4

Find the Gauss–Seidel iteration matrix TG and show that kTG k < 1.
Use the Gauss–Seidel method to find the second approximate solution
x(1) of the linear system by using x(0) = [−0.5, −2.5, −1.5]T . Also,
compute the error bound.

12. Consider the following linear system Ax = b, where


   
−5 2 1 −3
A =  1 10 1 , b =  27  .
1 1 −4 4

Show that the Gauss–Seidel method converges for the given linear
system. If the first approximate solution of the given linear system
by the Gauss–Seidel method is x(1) = [0.6, −2.7, −1]T , by using the
initial approximation x(0) = [0, 0, 0]T , then compute the number of
steps needed to get accuracy within 10−4 . Also, compute an upper
bound for the relative error in solving the given linear system.

13. Consider the following system:

4x1 + x2 − 2x3 = 4
2x1 + 9x2 − 3x3 = 3
x1 − 2x2 + 8x3 = 2.

(a) Find the matrix form of both the iterative (Jacobi and
Gauss–Seidel) methods.
(k) (k) (k)
(b) If x(k) = [x1 , x2 , x3 ]T , then write the iterative forms of
part (a) in component form and find the exact solution of the
given system.

(c) Find the formulas for the error e(k+1) in the (k + 1)th step.
(d) Find the second approximation of the error e(2) using part (c),
if x(0) = [0, 0, 0]T .
14. Consider the following system:
16x1 − 3x2 + 2x3 = 11
x1 + 15x2 − 3x3 = 12
5x1 − 3x2 + 14x3 = 13.
(a) Find the matrix form of both the iterative (Jacobi and
Gauss–Seidel) methods.
(k) (k) (k)
(b) If x(k) = [x1 , x2 , x3 ]T , then write the iterative forms of
part (a) in component form and find the exact solution of the
given system.
(c) Find the formulas for the error e(k+1) in the (k + 1)th step.
(d) Find the second approximation of the error e(2) using part (c),
if x(0) = [0, 0, 0]T .
15. Which of the following matrices is convergent?
 
    1 0 −1 1
2 2 3 1 0 0  2 2 −1 1 
(a)  1 2 1 , (b)  −1 3 0 , (c)  .
 0 1 3 −2 
2 −2 1 3 2 −2
1 0 1 4

16. Find the eigenvalues and their associated eigenvectors of the matrix
 
2 −2 3
A= 0 3 −2  .
0 −1 2
Also, show that kAk2 > ρ(A).
17. Find the l2 -norm of each of the following matrices:
 
    1 3 2 1
2 1 3 1 2 0  2 2 −1 1 
(a)  1 2 4  , (b)  1 2 0  , (c)  .
 0 1 2 −2 
2 −2 1 1 2 −2
1 3 1 5

18. Find the l2 -norm of each of the following matrices:


 
    1 3 2 1
2 1 3 1 2 0  3 2 1 1 
(a)  1 2 −2  , (b)  2 2 −1  , (c)  .
 2 1 3 −2 
3 −2 1 0 −1 −2
1 1 −2 4

19. Find the eigenvalues µ of the matrix B = I − (1/4)A and show that µ = 1 − (1/4)λ, where λ are the eigenvalues of the following matrix:
matrix:  
4 −1 −1 −1
 −1 4 −1 −1 
A=  −1 −1
.
4 −1 
−1 −1 −1 4
20. Solve Problem 1 using the SOR method by taking ω = 1.02 for each
system.
21. Write the Jacobi, Gauss–Seidel, and SOR iteration matrices for the
following matrix:
 
4 −1 −1 −1
 −1 4 −1 −1 
A=  −1 −1
.
4 −1 
−1 −1 −1 4

22. Use the given parameter ω to solve each of the following linear sys-
tems by using the SOR method within accuracy 10−6 in the l∞ -norm,
starting with x(0) = 0 :

(a) ω = 1.323
6x1 − 2x2 + x3 = 9
2x1 + 7x2 + 3x3 = 11
−4x1 + 5x2 + 15x3 = 10
(b) ω = 1.201
2x1 − 3x2 + x3 = 6
2x1 + 16x2 + 5x3 = −5
2x1 + 3x2 + 8x3 = 3

(c) ω = 1.110
4x1 + 3x2 = 11
x1 + 7x2 − 3x3 = 13
2x1 + 7x2 + 20x3 = 20
(d) ω = 1.543
x1 − 2x2 = −3
−2x1 + 5x2 − x3 = 5
− x2 + 2x3 − 0.5x4 = 2
− 0.5x3 + 1.25x4 = 3.5

23. Consider the following linear system Ax = b, where


   
9 4 −1 11
A= 4 5 1 , b =  10  .
−1 1 5 3
Use the Gauss–Seidel and SOR methods (using the optimal value of
ω) to get the solution accurate to four significant digits, starting with
x(0) = [0, 0, 0]T .
24. Consider the following linear system with Ax = b, where
   
4 −2 0 0 1
 −2 3 2 0   2 
A= , b =  −1  .
 
 0 2 3 −1 
0 0 −1 4 −1
Use the Jacobi, Gauss–Seidel, and SOR methods (taking ω = 1.007)
to get the solution accurate to four significant digits, with x(0) =
[2.5, 5.5, −4.5, −0.5]T .
25. Find the optimal choice for ω and use it to solve the linear system
by the SOR method within accuracy 10−4 in the l∞ -norm, starting
with x(0) = 0. Also, find how many iterations are needed by using
the Gauss–Seidel and the Jacobi methods.
4x1 + 2x2 = 5
2x1 + 4x2 − 2x3 = 3
− x2 + 4x3 = 6.

26. Consider the following system:


4x1 − 2x2 − x3 = 1
−x1 + 4x2 − x4 = 2
−x1 + 4x3 − x4 = 0
− x2 − x3 + 4x4 = 1.

Using x(0) = 0, how many iterations are required to approximate the


solution to within five decimal places using the (a) Jacobi method,
(b) Gauss–Seidel method, and (c) SOR method (take ω = 1.1)?
27. Find the spectral radius of the Jacobi, the Gauss–Seidel, and the
SOR iteration matrices for each of the following matrices:

(a) ω = 1.25962  
2 1 0 0
 1 2 1 0 
A=
 0
.
1 2 1 
0 0 1 2
(b) ω = 1.15810
 
4 −1 0 0
 −1 4 −1 0 
A= .
 0 −1 4 −1 
0 0 −1 4

28. Perform only two steps of the conjugate gradient method for the fol-
lowing linear systems, starting with x(0) = 0 :

(a)
3x1 − x2 + x3 = 7
−x1 + 3x2 + 2x3 = 1
x1 + 2x2 + 5x3 = 5
(b)
3x1 − 2x2 + x3 = 5
−2x1 + 6x2 − x3 = 9
x1 − x2 + 4x3 = 6

(c)
4x1 − 2x2 + x3 = 1
−2x1 + 7x2 + x3 = 4
x1 + x2 + 20x3 = 1
(d)
5x1 − 3x2 − x3 = 6
−3x1 + 6x2 − 3x3 = 4
−x1 − 3x2 + 4x3 = 7
(e)
9x1 − 3x2 − x3 + 2x4 = 11
3x1 − 10x2 − 2x3 + x4 = 9
2x1 + 3x2 − 11x3 + 3x4 = 15
x1 − 3x2 − 2x3 + 12x4 = 8
29. Perform only two steps of the conjugate gradient method for the fol-
lowing linear systems, starting with x(0) = 0 :

(a)
6x1 + 2x2 + x3 = 1
2x1 + 3x2 − x3 = 0
x1 − x2 + 2x3 = −2
(b)
5x1 − 2x2 + x3 = 3
−2x1 + 4x2 − x3 = 2
x1 − x2 + 3x3 = 1
(c)
6x1 − x2 − x3 + 5x4 = 1
−x1 + 7x2 + x3 − x4 = 2
−x1 + x2 + 3x3 − 3x4 = 0
5x1 − x2 − 3x3 + 6x4 = −1
(d)
3x1 − 2x2 − x3 + 3x4 = 1
−2x1 + 7x2 + x3 − x4 = 0
−x1 + x2 + 3x3 − 3x4 = 0
3x1 − x2 − 3x3 + 6x4 = 0

30. Find the approximate solution of the linear system

x1 + 2x2 = 1
2x1 + 4.0001x2 = 1.9999

using simple Gaussian elimination, and then use the residual cor-
rection method (two iterations only) to improve the approximate
solution.

31. The following linear system has the exact solution x = [10, 1]T . Find
the approximate solution of the system

0.03x1 + 58.9x2 = 59.2


5.31x1 − 6.10x2 = 47.0

by using simple Gaussian elimination, and then use the residual cor-
rection method (one iteration only) to improve the approximate so-
lution.

32. The following linear system has the exact solution x = [1, 1]T . Find
the approximate solution of the system

x1 + 2x2 = 3
x1 + 2.01x2 = 3.01

by using simple Gaussian elimination, and then use the residual cor-
rection method (one iteration only) to improve the approximate so-
lution.
Chapter 3

The Eigenvalue Problems

3.1 Introduction
In this chapter we describe numerical methods for solving eigenvalue prob-
lems that arise in many branches of science and engineering and seem to
be a very fundamental part of the structure of the universe. Eigenvalue
problems are important in a less direct manner in numerical applications.
For example, discovering the condition factor in the solution of a set of
linear algebraic equations involves finding the ratio of the largest to the
smallest eigenvalue values of the underlying matrix. Also, the eigenvalue
problem is involved when establishing the stiffness of ordinary differential
equations problems. In solving eigenvalue problems, we are mainly con-
cerned with the task of finding the values of the parameter λ and vector
x, which satisfy a set of equations of the form

Ax = λx. (3.1)

The linear equation (3.1) represents the eigenvalue problem, where A is


an n × n coefficient matrix, also called the system matrix, x is an unknown
column vector, and λ is an unknown scalar. If the set of equations has a
zero on the right-hand side, then a very important special case arises. For
such a case, one solution of (3.1) for a real square matrix A is the trivial
solution, x = 0. However, there is a set of values for the parameter λ for
which nontrivial solutions for the vector x exist. These nontrivial solutions
are called eigenvectors, characteristic vectors, or latent vectors of a matrix
A, and the corresponding values of the parameter λ are called eigenvalues,
characteristic values, or latent roots of A. The set of all eigenvalues of A
is called the spectrum of A. Eigenvalues may be real or complex, distinct
or multiple. From (3.1), we deduce

Ax = λIx,

which gives
(A − λI)x = 0, (3.2)
where I is an n × n identity matrix. The matrix (A − λI) appears as
 
(a11 − λ) a12 ··· a1n
 a21 (a22 − λ) · · · a2n 
,
 
 .. .. . . .
.
 . . . . 
an1 an2 · · · (ann − λ)

and the result of the multiplication of (3.2) is a set of homogeneous equa-


tions of the form
(a11 − λ)x1 + a12 x2 + ··· + a1n xn = 0
a21 x1 + (a22 − λ)x2 + · · · + a2n xn = 0
.. .. ... .. .. (3.3)
. . . .
an1 x1 + an2 x2 + · · · + (ann − λ)xn = 0.

Then by using Cramer’s rule, we see that the determinant of the de-
nominator, namely, the determinant of the matrix of the system (3.3) must
vanish if there is to be a nontrivial solution, i.e., a solution other than
x = 0. Geometrically, Ax = λx says that under transformation by A,

eigenvectors experience only changes in magnitude or sign—the orienta-


tion of Ax in Rn is the same as that of x. The eigenvalue λ is simply the
amount of “stretch” or “shrink” to which the eigenvector x is subjected
when transformed by A (Figure 3.1).

Figure 3.1: The situation in R2 .

Definition 3.1 (Trace of a Matrix)


For an n × n matrix A = (aij ), we define the trace of A to be the sum of
the diagonal elements of A, i.e.,

trace(A) = a11 + a22 + · · · + ann .

For example, the trace of the matrix


 
7 3 1
A= 2 6 2 
1 4 3

is defined as
trace(A) = 7 + 6 + 3 = 16. •

Theorem 3.1 If A and B are square matrices of the same size, then:

1. trace (AT ) = trace (A).

2. trace (kA) = k trace (A).

3. trace (A + B) = trace (A) + trace (B).

4. trace (A − B) = trace (A) – trace (B).

5. trace (AB) = trace (BA). •

For example, consider the following matrices:


   
5 −4 2 9 3 −4
A= 1 4 3  and B =  1 5 2 .
3 2 7 3 −2 8

Then  
5 1 3
AT =  −4 4 2 
2 3 7
and
trace(AT ) = 16 = trace(A).
Also,  
20 −16 8
4A =  4 16 12 
12 8 28
and
trace(4A) = 64 = 4(16) = 4trace(A).

The sum of the above two matrices is defined as


 
14 −1 −2
A+B = 2 9 5 
6 0 15

and
trace(A + B) = 38 = 16 + 22 = trace(A) + trace(B).

Similarly, the difference of the above two matrices is defined as

 
−4 −7 6
A − B =  0 −1 1 ,
0 4 −1

and
trace(A − B) = −6 = 16 − 22 = trace(A) − trace(B).

Finally, the product of the above two matrices is defined as

 
47 −9 −12
AB =  22 17 28 
50 5 48

and
 
36 −32 −1
BA =  16 20 31  .
37 −4 56

Then
trace(AB) = 112 = trace(BA).

To get these results, we use the MATLAB Command Window as fol-


lows:

>> A = [5 − 4 2; 1 4 3; 3 2 7];
>> B = [9 3 − 4; 1 5 2; 3 − 2 8];
>> C = A + B;
>> D = A ∗ B;
>> trac(C);
>> trac(D);

Program 3.1
MATLAB m-file for Finding the Trace of a Matrix
function [trc]=trac(A)
n = max(size(A)); trc = 0;
for i = 1:n
  for k = 1:n
    if i == k, trc = trc + A(i,k); end   % add up the diagonal entries
  end
end

The diagonal entries of a matrix should not be confused with its eigenvalues. For a triangular matrix they are the same, but that is exceptional; normally, the pivots, the diagonal entries, and the eigenvalues are completely different.

The classical method of finding the eigenvalues of a matrix A is to


estimate the roots of a characteristic equation of the form

p(λ) = det(A − λI) = |A − λI| = 0. (3.4)

Then the eigenvectors are determined by setting one of the nonzero ele-
ments of x to unity and calculating the remaining elements by equating
coefficients in the relation (3.2).

Eigenvalues of 2 × 2 Matrices
Let λ1 and λ2 be the eigenvalues of a 2 × 2 matrix A, then a quadratic
polynomial p(λ) is defined as

p(λ) = (λ − λ1 )(λ − λ2 )
= λ2 − (λ1 + λ2 )λ + λ1 λ2 .

Note that

trace(A) = λ1 + λ2
det(A) = λ1 λ2 .

So
p(λ) = λ2 − trace(A)λ + det(A).
For example, if the given matrix is
 
5 4
A= ,
3 4
then  
5−λ 4
A − λI =
3 4−λ
and
p(λ) = (5 − λ)(4 − λ) − 12 = λ2 − 9λ + 8.
By solving the above quadratic polynomial, we get

λ1 = 8, λ2 = 1,

the possible eigenvalues of the given matrix.

Note that

trace(A) = λ1 + λ2 = 9,
det(A) = λ1 λ2 = 8,

which satisfies the above result. •
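
A quick confirmation of these relations in the MATLAB Command Window (built-in functions only):

>> A = [5 4; 3 4];
>> lam = eig(A)          % eigenvalues 8 and 1
>> [sum(lam) trace(A)]   % both equal 9
>> [prod(lam) det(A)]    % both equal 8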

The discriminant of a 2 × 2 matrix is defined as

D = [trace(A)]2 − 4 det(A).

For example, the discriminant of the matrix


 
8 7
A=
4 6
can be calculated as

D = [14]2 − 4(20) = 116,

where the trace of A is 14 and the determinant of A is 20. •



Theorem 3.2 If D is discriminant of a 2 × 2 matrix, then the following


statements hold:

1. The eigenvalues of A are real and distinct when D > 0.

2. The eigenvalues of A are a complex conjugate pair when D < 0.

3. The eigenvalues of A are real and equal when D = 0. •

For example, the eigenvalues of the matrix


 
3 2
A=
2 4

are real and distinct because

D = [7]2 − 4(8) = 17 > 0.

Also, the eigenvalues of the matrix


 
3 −5
A=
1 2

are a complex conjugate pair since

D = [5]2 − 4(11) = −19 < 0.

Finally, the matrix  


1 2
A=
0 1
has real and equal eigenvalues because

D = [2]2 − 4(1) = 0. •
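
The discriminant test is easy to apply programmatically; the following MATLAB sketch (the function name eigclass2 is our own choice) classifies the eigenvalues of a 2 × 2 matrix:

function eigclass2(A)
% Classify the eigenvalues of a 2-by-2 matrix A via its discriminant.
D = trace(A)^2 - 4*det(A);
if D > 0
    disp('real and distinct eigenvalues');
elseif D < 0
    disp('complex conjugate pair of eigenvalues');
else
    disp('real and equal eigenvalues');
end

For example, eigclass2([3 2; 2 4]) reports that the eigenvalues are real and distinct.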

Note that the eigenvectors of a 2 × 2 matrix A corresponding to each


of the eigenvalues of a matrix A can be found easily by substituting each
eigenvalue in (3.2).

Example 3.1 Find the eigenvalues and eigenvectors of the following ma-
trix:  
6 −3
A= .
−4 5
Solution. The eigenvalues of the given matrix are real and distinct because

D = [11]2 − 4(18) = 49 > 0.

Then  
6−λ −3
A − λI =
−4 5 − λ
and
p(λ) = (6 − λ)(5 − λ) − 12 = λ2 − 11λ + 18.
By solving the above quadratic polynomial, we get

λ1 = 9, λ2 = 2,

the possible eigenvalues of the given matrix.

Note that

trace(A) = λ1 + λ2 = 11
det(A) = λ1 λ2 = 18.

Now to find the eigenvectors of the given matrix A corresponding to


each of these eigenvalues, we substitute each of these two eigenvalues in
(3.2). When λ1 = 9, we have
    
−3 −3 x1 0
= ,
−4 −4 x2 0
which implies that

−3x1 − 3x2 = 0 gives x1 = −x2 ,

hence, the eigenvector x1 corresponding to the first eigenvalue, 9, by choos-


ing x2 = 1, is

x1 = α[−1, 1]T , where α ∈ R, α 6= 0.



When λ2 = 2, we have
    
4 −3 x1 0
= .
−4 3 x2 0

From it, we obtain


4x1 − 3x2 = 0 gives x1 = (3/4)x2 ,
and
−4x1 + 3x2 = 0 gives x1 = (3/4)x2 .
Thus, choosing x2 = 4, we obtain

x2 = α[3, 4]T , where α ∈ R, α 6= 0,

which is the second eigenvector x2 corresponding to the second eigenvalue,


2. •

To get the results of Example 3.1, we use the MATLAB Command


Window as follows:

>> A = [6 − 3; −4 5];
>> EigT wo(A);

Program 3.2
MATLAB m-file for Finding Eigenvalues of a 2 × 2 Matrix
function [Lambda,x] = EigTwo(A)
detA = A(1,1)*A(2,2) - A(1,2)*A(2,1);   % determinant of A
trA = A(1,1) + A(2,2);                  % trace of A
L1 = (trA + sqrt(trA^2 - 4*detA))/2;
L2 = (trA - sqrt(trA^2 - 4*detA))/2;
if A(1,2) ~= 0
x1 = [A(1,2); L1 - A(1,1)]; x2 = [A(1,2); L2 - A(1,1)];
elseif A(2,1) ~= 0
x1 = [L1 - A(2,2); A(2,1)]; x2 = [L2 - A(2,2); A(2,1)];
else x1 = [1; 0]; x2 = [0; 1]; end
disp(['L^2 - ' num2str(trA) '*L + ' num2str(detA) ' = 0'])
Lambda = [L1 L2]; x = [x1 x2];
L1, x1, L2, x2

For larger size matrices, there is no doubt that the eigenvalue problem
is computationally more difficult than the linear system Ax = b. With
a linear system, a finite number of elimination steps produces the exact
answer in a finite time. In the case of an eigenvalue, no such steps and no
such formula can exist. The characteristic polynomial of a 5 × 5 matrix is
a quintic, and it is proved there can be no algebraic form for the roots of
a fifth degree polynomial, although there are a few simple checks on the
eigenvalues, after they have been computed, and we mention here two of
them:

1. The sum of the n eigenvalues of a matrix A equals the sum of its n diagonal entries, i.e.,

λ1 + λ2 + · · · + λn = a11 + a22 + · · · + ann ,

which is the trace of A.

2. The product of the n eigenvalues of a matrix A equals the determinant of A, i.e.,

λ1 λ2 · · · λn = det(A) = |A|.
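
Both checks are easy to carry out in MATLAB once the eigenvalues have been computed; for instance, for the matrix used above to illustrate the trace:

>> A = [7 3 1; 2 6 2; 1 4 3];
>> lam = eig(A);
>> [sum(lam) trace(A)]   % sum of the eigenvalues equals trace(A) = 16
>> [prod(lam) det(A)]    % product of the eigenvalues equals det(A)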

It should be noted that the system matrix A of (3.1) may be real and
symmetric, or real and nonsymmetric, or complex with symmetric real and
skew symmetric imaginary parts. These different types of a matrix A are
explained as follows:

1. If the given matrix A is a real symmetric matrix, then the eigenval-


ues of A are real but not necessarily positive, and the corresponding
eigenvectors are also real. Also, if λi , xi and λj , xj satisfy the eigen-
value problem (3.1) and λi and λj are distinct, then

xTi xj = 0, i 6= j (3.5)

and
xTi Axj = 0, i 6= j. (3.6)
Equations (3.5) and (3.6) represent the orthogonality relationships.
Note that if i = j, then in general, xTi xi and xTi Axi are not zero. Re-
calling that xi includes an arbitrary scaling factor, then the product
xTi xi must also be arbitrary. However, if the arbitrary scaling factor
is adjusted so that
xTi xj = 1, (3.7)
then
xTi Axj = λi , (3.8)
and the eigenvectors are known to be normalized.
Sometimes the eigenvalues are not distinct and the eigenvectors asso-
ciated with these equal or repeated eigenvalues are not, of necessity,
orthogonal. If λi = λj and the other eigenvalues, λk , are distinct,
then
xTi xk = 0, k = 1, 2, · · · , n k 6= i, k 6= j (3.9)

and
xTj xk = 0, k = 1, 2, · · · , n, k 6= i, k 6= j. (3.10)
When λi = λj , the eigenvectors xi and xj are not unique and a linear
combination of them, i.e., axi + bxj , where a and b are arbitrary con-
stants, also satisfies the eigenvalue problems. One important result is
that a symmetric matrix of order n always has n distinct eigenvectors
even if some of the eigenvalues are repeated.
2. If a given A is a real nonsymmetric matrix, then a pair of related
eigenvalue problems can arise as follows:
Ax = λx (3.11)
and
AT y = βy. (3.12)
By taking the transpose of (3.12), we have
yT A = βyT . (3.13)

The vectors x and y are called the right-hand and left-hand vectors
of A, respectively. The eigenvalues of A and AT are identical, i.e.,
λi = βi , but the eigenvectors x and y will, in general, differ from
each other. The eigenvalues and eigenvectors of a nonsymmetric real
matrix are either real or pairs of complex conjugates. If λi , xi , yi , and
λj , xj , yj are solutions that satisfy the eigenvalue problems of (3.11)
and (3.12) and λi and λj are distinct, then
yjT xi = 0, i 6= j (3.14)
and
yjT Axi = 0, i 6= j. (3.15)
Equations (3.14) and (3.15) are called bi-orthogonal relationships.
Note that if, in these equations, i = j, then, in general, yiT xi and
yiT Axi are not zero. The eigenvectors xi and yi include arbitrary scal-
ing factors, and so the product of these vectors will also be arbitrary.
However, if the vectors are adjusted so that
yiT xi = 1, (3.16)

then
yiT Axi = λi . (3.17)

We can, in these circumstances, describe neither xi nor yi as normal-


ized; the vectors still include an arbitrary scaling factor, only their
product is uniquely chosen. If for a nonsymmetric matrix λi = λj
and the remaining eigenvalues, λk , are distinct, then

yjT xk = 0, yiT xk = 0 k = 1, 2, · · · , n k 6= i, k 6= j (3.18)

and

xTj xk = 0, ykT xj = 0 k = 1, 2, · · · , n k 6= i, k 6= j. (3.19)

For certain matrices with repeated eigenvalues, the eigenvectors may


also be repeated; consequently, for an nth-order matrix of this type
we may have less than n distinct eigenvectors. This type of matrix
is called deficient.

3. Let us consider the case when the given A is a complex matrix. The
properties of one particular complex matrix is an Hermitian matrix,
which is defined as
H = A + iB, (3.20)
where A and B are real matrices such that A = AT and B = −B T .
Hence, A is symmetric and B is skew symmetric with zero terms on
the leading diagonal. Thus, by definition of an Hermitian matrix,
H has a symmetric real part and a skew symmetric imaginary part,
making H equal to the transpose of its complex conjugate, denoted
by H ∗ . Consider now the eigenvalue problem

Hx = λx. (3.21)

If λi , xi are solutions of (3.21), then xi is complex but λi is real. Also,


if λi , xi , and λj , xj satisfy the eigenvalue problem (3.21) and λi and
λj are distinct, then

x∗i xj = 0, i 6= j (3.22)

and
x∗i Hxj = 0, i 6= j, (3.23)
where x∗i is the transpose of the complex conjugate of xi . As before,
xi includes an arbitrary scaling factor and the product x∗i xi must
also be arbitrary. However, if the arbitrary scaling factor is adjusted
so that
x∗i xi = 1, (3.24)
then
x∗i Hxi = λi , (3.25)
and the eigenvectors are then said to be normalized.

A large number of numerical techniques have been developed to solve


the eigenvalue problems. Before discussing all these numerical techniques,
we shall start with a hand calculation, mainly to reinforce the definition
and solve the following examples.

Example 3.2 Find the eigenvalues and eigenvectors of the following ma-
trix:  
3 0 1
A =  0 −3 0  .
1 0 3
Solution. First, we shall find the eigenvalues of the given matrix A. From
(3.2), we have     
3 0 1 x1 0
 0 −3 0   x2  =  0  .
1 0 3 x3 0
For nontrivial solutions, using (3.4), we get

3−λ 0 1


0 −3 − λ 0 = 0,

1 0 3−λ

which gives a characteristic equation of the form

λ3 − 3λ2 − 10λ + 24 = 0,

which factorizes to
(λ + 3)(λ − 2)(λ − 4) = 0,
which gives the eigenvalues 4, –3, and 2 of the given matrix A. One can
note that the sum of these eigenvalues is 3, and this agrees with the trace
of A. •

The characteristic equation of the given matrix can be obtained by us-


ing the following MATLAB commands:

>> syms lambd
>> A = [3 0 1; 0 -3 0; 1 0 3];
>> det(A - lambd*eye(3))
ans =
-24 + 10*lambd + 3*lambd^2 - lambd^3
>> factor(ans)
ans =
-(lambd - 2)*(lambd - 4)*(3 + lambd)

After finding the eigenvalues of the matrix A, we turn to the problem


of finding the corresponding eigenvectors. The eigenvectors of A corre-
sponding to the eigenvalues λ are the nonzero vectors x that satisfy (3.2).
Equivalently, the eigenvectors corresponding to λ are the nonzero vectors
in the solution space of (3.2). We call this solution space the eigenspace of
A corresponding to λ.

Now to find the eigenvectors of the given matrix A corresponding to


each of these eigenvalues, we substitute each of these three eigenvalues in
(3.2). When λ1 = 4, we have
    
−1 0 1 x1 0
 0 −7 0   x2  =  0  ,
1 0 −1 x3 0
which implies that
−x1 + 0x2 + x3 = 0 ⇒ x1 = x3
0x1 − 7x2 + 0x3 = 0 ⇒ x2 = 0
x1 + 0x2 − x3 = 0 ⇒ x1 = x3 .

Solving this system, we get x1 = x3 with x3 free (arbitrary), and x2 = 0. Hence, the eigenvector x1 corresponding to the first eigenvalue, 4, by choosing x3 = 1, is
x1 = α[1, 0, 1]T , where α ∈ R, α 6= 0.
When λ2 = −3, we have
    
6 0 1 x1 0
 0 0 0   x2  =  0  ,
1 0 6 x3 0
which implies that
6x1 + 0x2 + x3 = 0 ⇒ x1 = −(1/6)x3
0x1 + 0x2 + 0x3 = 0 ⇒ x2 is free
x1 + 0x2 + 6x3 = 0 ⇒ x1 = −6x3 ,
which together give x1 = x3 = 0 with x2 arbitrary. Hence, the eigen-
vector x2 corresponding to the second eigenvalue, –3, by choosing x2 = 1,
is
x2 = α[0, 1, 0]T , where α ∈ R, α 6= 0.
Finally, when λ3 = 2, we have
    
1 0 1 x1 0
 0 −5 0   x2  =  0  ,
1 0 1 x3 0
which implies that
x1 + 0x2 + x3 = 0 ⇒ x1 = −x3
0x1 − 5x2 + 0x3 = 0 ⇒ x2 = 0
x1 + 0x2 + x3 = 0 ⇒ x1 = −x3 ,
which gives x1 = −x3 with x3 free, and x2 = 0. Hence, by choosing x1 = 1, we
obtain
x3 = α[1, 0, −1]T , where α ∈ R, α 6= 0,
the third eigenvector x3 corresponding to the third eigenvalue, 2, of the
matrix. •

MATLAB can handle eigenvalues, eigenvectors, and the characteristic


polynomial. The built-poly function in MATLAB computes the charac-
teristic polynomial of the matrix:

>> A = [3 0 1; 0 -3 0; 1 0 3];
>> P = poly(A);
>> PP = poly2sym(P)
PP =
x^3 - 3*x^2 - 10*x + 24

The elements of vector P are arranged in decreasing power of x. To


solve the characteristic equation (in order to obtain the eigenvalues of A),
ask for the roots of P :

>> roots(P );

If all we require are the eigenvalues of A, we can use the MATLAB


command eig, which is the basic eigenvalue and eigenvector routine. The
command

>> d = eig(A);

returns a vector containing all the eigenvalues of a matrix A. If the eigen-


vectors are also wanted, the syntax

>> [X, D] = eig(A);

will return a matrix X whose columns are the eigenvectors of A corre-


sponding to the eigenvalues in the diagonal matrix D.

To get the results of Example 3.2, we use the MATLAB Command Win-
dow as follows:

>> A = [3 0 1; 0 -3 0; 1 0 3];
>> P = poly(A);
>> PP = poly2sym(P);
>> [X, D] = eig(A);
>> lambda = diag(D);
>> x1 = X(:, 1); x2 = X(:, 2); x3 = X(:, 3);
Example 3.3 Find the eigenvalues and eigenvectors of the following ma-
trix:  
1 2 2
A=  0 3 3 .
−1 1 1
Solution. From (3.2), we have
    
1 2 2 x1 0
 0 3 3   x2  =  0 .
−1 1 1 x3 0
For nontrivial solutions, using (3.4), we get

1−λ 2 2


0 3 − λ 3 = 0,

−1 1 1−λ
which gives a characteristic equation of the form
−λ3 + 5λ2 − 6λ = 0.
It factorizes to
λ(λ − 2)(λ − 3) = 0,
which gives the eigenvalues 0, 2, and 3 of the given matrix A. One can
note that the sum of these three eigenvalues is 5, and this agrees with the
trace of A.

To find the eigenvectors corresponding to each of these eigenvalues, we


substitute each of the three eigenvalues of A in (3.2). When λ = 0, we
have     
1 2 2 x1 0
 0 3 3   x2  =  0  .
−1 1 1 x3 0

The augmented matrix form of the system is


 
1 2 2 0
 0 3 3 0 ,
−1 1 1 0

which can be reduced to  


1 2 2 0
 0 1 1 0 .
0 0 0 0
Thus, the components of an eigenvector must satisfy the relation

x1 + 2x2 + 2x3 = 0
0x1 + x2 + x3 = 0
0x1 + 0x2 + 0x3 = 0.

This system has an infinite set of solutions. Arbitrarily, we choose


x2 = 1, then x3 can be equal to −1, whence x1 = 0. This gives solutions of
the first eigenvector of the form x1 = α[0, 1, −1]T , with α ∈ R and α 6= 0.
Thus, x1 = α[0, 1, −1]T is the most general eigenvector corresponding to
eigenvalue 0.

A similar procedure can be applied to the other two eigenvalues. The


result is that we have two other eigenvectors x2 = α[4, 3, −1]T and x3 =
α[1, 1, 0]T corresponding to the eigenvalues 2 and 3, respectively. •
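One possible way to check these results in the MATLAB Command Window is the following sketch (note that eig normalizes the eigenvectors, so they may differ from the hand-computed ones by a scalar multiple and appear in a different order):

>> A = [1 2 2; 0 3 3; -1 1 1];
>> [X, D] = eig(A);        % columns of X are eigenvectors of A
>> lambda = diag(D)        % contains the eigenvalues 0, 2, and 3
>> null(A)                 % a basis for the eigenspace of the eigenvalue 0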

Example 3.4 Find the eigenvalues and eigenvectors of the following ma-
trix:  
3 2 −1
A= 2 6 −2  .
−1 −2 3
Solution. From (3.2), we have
    
3 2 −1 x1 0
 2 6 −2   x2  =  0 .
−1 −2 3 x3 0

For nontrivial solutions, using (3.4), we get



3−λ 2 −1


2 6 − λ −2 = 0,

−1 −2 3 − λ
which gives a characteristic equation
λ3 − 12λ2 + 36λ − 32 = 0.
It factorizes to
(λ − 2)2 (λ − 8) = 0
and gives the eigenvalue 2 of multiplicity 2 and the eigenvalue 8 of multi-
plicity 1, and the sum of these three eigenvalues is 12, which agrees with
the trace of A. When λ = 2, we have
 
1 2 −1
(A − 2I) =  2 4 −2  .
−1 −2 1
and so from (3.2), we have
x1 + 2x2 − x3 = 0
2x1 + 4x2 − 2x3 = 0
−x1 − 2x2 + x3 = 0.
Let x2 = s and x3 = t, then the solution to this system is
       
x1 −2s + t −2 1
 x2  =  s  = s 1  + t 0 .
x3 t 0 1
So the two eigenvectors of A are x1 = α[−2, 1, 0]T and x2 = α[1, 0, 1]T ,
corresponding to the eigenvalue 2, with s, t ∈ R and s, t 6= 0.

Similarly, we can find the third eigenvector, x3 = α[0.5, 1, −0.5]T , of A


corresponding to the other eigenvalue, 8. •
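As a rough numerical check (a sketch, not part of the worked solution), the following commands confirm the repeated eigenvalue and the existence of three independent eigenvectors:

>> A = [3 2 -1; 2 6 -2; -1 -2 3];
>> [X, D] = eig(A);
>> diag(D)          % eigenvalues 2, 2, and 8
>> rank(X)          % 3, so A has three linearly independent eigenvectors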
Note that in all the above three examples and any other example, there
is always an infinite number of choices for each eigenvector. We arbitrarily
choose a simple one by setting one or more of the elements xi equal to a
convenient number. Here we have set one of the elements xi equal to 1. •

3.2 Linear Algebra and Eigenvalues Problems


The solutions of many physical problems require the calculation of the
eigenvalues and the corresponding eigenvectors of a matrix associated with
a linear system of equations. Since a matrix A of order n has n, not neces-
sarily distinct, eigenvalues, which are the roots of a characteristic equation
(3.4), theoretically, the eigenvalues of A can be obtained by finding the n
roots of a characteristic polynomial p(λ), and then the associated linear
system can be solved to determine the corresponding eigenvectors. The
polynomial p(λ) is difficult to obtain except for small values of n. So for a
large value of n, it is necessary to construct approximation techniques for
finding the eigenvalues of A.

Before discussing such approximation techniques for finding eigenvalues


and eigenvectors of a given matrix A, we need some definitions and results
from linear algebra.

Definition 3.2 (Real Vector Spaces)

A vector space consists of a nonempty set V of objects (called vectors) that


can be added, that can be multiplied by a real number (called a scalar),
and for which certain axioms hold. If u and v are two vectors in V , their
sum is expressed as u + v, and the scalar product of u by a real number
α is denoted by αu. These operations are called vector addition and scalar
multiplication, respectively, and the following axioms are assumed to hold:

Axioms for Vector Addition

1. If u and v are in V , then u + v is in V.

2. u + v = v + u, for all u and v in V.

3. u + (v + w) = (u + v) + w, for all u, v, and w in V.

4. There exists an element 0 in V , called a zero vector, such that


u + 0 = u.

5. For each u in V , there is an element −u (called the negative of u)


in V such that u + (−u) = 0.

Axioms for Scalar Multiplication

1. If u is in V , then αu is in V , for all α ∈ R.

2. α(u + v) = αu + αv, for all u, v ∈ V and α ∈ R.

3. (α + β)u = αu + βu, for all u ∈ V and α, β ∈ R.

4. α(βu) = (αβ)u, for all u ∈ V and α, β ∈ R.

5. 1u = u. •

For example, a real vector space is the space Rn consisting of column


vectors or n-tuples of real numbers u = (u1 , u2 , . . . , un )T . Vector addition
and scalar multiplication are done in the usual manner:
   
u + v = (u1 + v1 , u2 + v2 , . . . , un + vn )T ,     αu = (αu1 , αu2 , . . . , αun )T ,

whenever u = (u1 , u2 , . . . , un )T and v = (v1 , v2 , . . . , vn )T .
The zero vector is 0 = (0, 0, . . . , 0)T . The fact that vectors in Rn satisfy
all of the vector space axioms is an immediate consequence of the laws of
vector addition and scalar multiplication. •

The following theorem presents several useful properties common to all


vector spaces.

Theorem 3.3 If V is a vector space, then:



1. 0u = 0, for all u ∈ V.
2. α0 = 0, for all α ∈ R.
3. If αu = 0, then α = 0 or u = 0.
4. (−1)u = −u, for all u ∈ V. •

Definition 3.3 (Subspaces)

Let V be a vector space and W be a nonempty subset of V . If W is a vector


space with respect to the operations in V , then W is called a subspace of V.

For example, every vector space has at least two subspaces, itself and the
subspace{0} (called the zero subspace) consisting of only the zero vector. •

Theorem 3.4 A nonempty subset W ⊂ V of a vector space is a subspace,


if and only if:
1. For every u, v ∈ W , then the sum u + v ∈ W.
2. For every u ∈ W and every α ∈ R, then the scalar product αu ∈ W.

Definition 3.4 (Basis of Vector Space)

Let V be a vector space. A finite set S = {v1 , v2 , . . . , vn } of vectors in


V is a basis for V , if and only if any vector v in V can be written, in a
unique way, as a linear combination of the vectors in S, i.e., if and only if
any vector v has the form

v = k1 v1 + k2 v2 + · · · + kn vn , (3.26)

for one and only one set of real numbers k1 , k2 , . . . , kn . •

Definition 3.5 (Linearly Independent Vectors)

The vectors v1 , v2 , . . . , vn are said to be linearly independent, if whenever

k1 v1 + k2 v2 + · · · + kn vn = 0, (3.27)

then all of the coefficients k1 , k2 , . . . , kn must be equal to zero, i.e.,


k1 = k2 = · · · = kn = 0. (3.28)
If the vectors v1 , v2 , . . . , vn are not linearly independent, then we say
that they are linearly dependent. In other words, the vectors v1 , v2 , . . . , vn
are linearly dependent, if and only if there exist numbers k1 , k2 , . . . , kn , not
all zero, for which
k1 v1 + k2 v2 + · · · + kn vn = 0.
Sometimes we say that the set {v1 , v2 , . . . , vn } is linearly independent
(or linearly dependent) instead of saying that the vectors v1 , v2 , . . . , vn are
linearly independent (or linearly dependent). •

Example 3.5 Let us consider the vectors v1 = (1, 2) and v2 = (−1, 1) in


R2 . To show that the vectors are linearly independent, we write
k1 v1 + k2 v2 = 0
k1 (1, 2) + k2 (−1, 1) = (0, 0)
(k1 − k2 , 2k1 + k2 ) = (0, 0),
showing that
k1 − k2 = 0
2k1 + k2 = 0,
and the only solution to the system is a trivial solution, i.e., k1 = k2 = 0.
Thus, the vectors are linearly independent. •

The above results can be obtained using the MATLAB Command Win-
dow as follows:

>> v1 = [1 2]';
>> v2 = [-1 1]';
>> A = [v1 v2];
>> null(A)
Note that using the MATLAB command, we obtained
ans =
Empty matrix: 2 − by − 0,

which means that the only solution to the homogeneous system Ak = 0 is


the zero solution k = 0.

Example 3.6 Consider the following functions:

p1 (t) = t2 + t + 2
p2 (t) = 2t2 + t + 3
p3 (t) = 3t2 + 2t + 2.

Show that the set {p1 (t), p2 (t), p3 (t)} is linearly independent.

Solution. Suppose a linear combination of these given polynomials van-


ishes, i.e.,

k1 (t2 + t + 2) + k2 (2t2 + t + 3) + k3 (3t2 + 2t + 2) = 0.

By equating the coefficients of the t2 , t, and constant terms to zero, we get the following


linear system:

k1 + 2k2 + 3k3 = 0
k1 + k2 + 2k3 = 0
2k1 + 3k2 + 2k3 = 0.

Solving this homogeneous linear system


    
1 2 3 k1 0
 1 1 2   k2  =  0 ,
2 3 2 k3 0

we get     
1 0 0 k1 0
 0 1 0   k2  =  0  .
0 0 1 k3 0
Since the only solution to the above system is a trivial solution

k1 = k2 = k3 = 0,

the given functions are linearly independent. •
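The same conclusion can be reached numerically; the sketch below collects the polynomial coefficients column by column and checks that the homogeneous system has only the trivial solution:

>> A = [1 2 3; 1 1 2; 2 3 2];   % column j holds the coefficients of p_j(t)
>> null(A)                      % empty, so only the trivial solution exists
>> rank(A)                      % 3, confirming linear independence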



Example 3.7 Suppose that the set {v1 , v2 , v3 } is linearly independent in


a vector space V . Show that the set {v1 +v2 +v3 , v1 −v2 −v3 , v1 +v2 −v3 }
is also linearly independent.

Solution. Suppose a linear combination of these given vectors v1 + v2 +


v3 , v1 − v2 − v3 , and v1 + v2 − v3 vanishes, i.e.,

k1 (v1 + v2 + v3 ) + k2 (v1 − v2 − v3 ) + k3 (v1 + v2 − v3 ) = 0.

We must deduce that k1 = k2 = k3 = 0. By equating the coefficients of


v1 , v2 , and v3 , we obtain

(k1 + k2 + k3 )v1 + (k1 − k2 + k3 )v2 + (k1 − k2 − k3 )v3 = 0.

Since {v1 , v2 , v3 } is linearly independent, we have

k1 + k2 + k3 = 0
k1 − k2 + k3 = 0
k1 − k2 − k3 = 0.

Solving this linear system, we get the unique solution

k1 = 0, k2 = 0, k3 = 0,

which means that the set {v1 + v2 + v3 , v1 − v2 − v3 , v1 + v2 − v3 } is also


linearly independent. •

Theorem 3.5 If {v1 , v2 , . . . , vn } is a set of n linearly independent vectors


in Rn , then any vector x ∈ Rn can be written uniquely as

x = k1 v1 + k2 v2 + · · · + kn vn , (3.29)

for some collection of constants k1 , k2 , . . . , kn . •

Example 3.8 Consider the vectors v1 = (1, 2, 1), v2 = (1, 3, −2), and
v3 = (0, 1, −3) in R3 . If k1 , k2 , and k3 are numbers with

k1 v1 + k2 v2 + k3 v3 = 0,

this is equivalent to

k1 (1, 2, 1) + k2 (1, 3, −2) + k3 (0, 1, −3) = (0, 0, 0).

Thus, we have the system

k1 + k2 + 0k3 = 0
2k1 + 3k2 + k3 = 0
k1 − 2k2 − 3k3 = 0.

This system has infinitely many solutions, one of which is k1 = 1, k2 = −1,


and k3 = 1. So,
v1 − v2 + v3 = 0.
Thus, the vectors v1 , v2 , and v3 are linearly dependent. •

The above results can be obtained using the MATLAB Command Win-
dow as follows:

>> v1 = [1 2 1]';
>> v2 = [1 3 -2]';
>> v3 = [0 1 -3]';
>> A = [v1 v2 v3];
>> null(A)
By using this MATLAB command, the answer we obtained means that
there is a nonzero solution to the homogeneous system Ak = 0.
Example 3.9 Find the value of α for which the set {(1, −2), (4, −α)} is
linearly dependent.

Solution. Suppose a linear combination of these given vectors (1, −2) and
(4, −α) vanishes, i.e.,

k1 (1, −2) + k2 (4, −α) = 0.

It can be written in the linear system form as

k1 + 4k2 = 0
−2k1 − αk2 = 0

or, in matrix form,
[ 1 4; −2 −α ] [ k1 ; k2 ] = [ 0; 0 ].
By solving this system, we obtain
    
[ 1 4; 0 8 − α ] [ k1 ; k2 ] = [ 0; 0 ],

and it shows that the system has infinitely many solutions for α = 8. Thus,
the given set {(1, −2), (4, −α)} is linearly dependent for α = 8. •

Theorem 3.6 Let the set {v1 , v2 , . . . , vn } be linearly dependent in a vector


space V . Any set of vectors in V that contains these vectors will also be
linearly dependent. •

Note that any collection of n linearly independent vectors in Rn is a basis


for Rn .

Theorem 3.7 If A is an n×n matrix and λ1 , . . . , λn are distinct eigenval-


ues of A, with associated eigenvectors v1 , . . . , vn , then the set {v1 , . . . , vn }
is linearly independent. •

Definition 3.6 (Orthogonal Vectors)

A set of vectors {v1 , v2 , . . . , vn } is called orthogonal, if

viT vj = 0, for all i 6= j. (3.30)

If, in addition

viT vi = 1, for all i = 1, 2, . . . , n, (3.31)

then the set is called orthonormal. •

Theorem 3.8 An orthogonal set of vectors that does not contain the zero
vectors is linearly independent. •

The proof of this theorem is beyond the scope of this text and will
be omitted. However, the result is extremely important and can be easily
understood and used. We illustrate this result by considering the matrix
 
6 −2 2
A =  −2 5 0 ,
2 0 7

which has the eigenvalues 3, 6, and 9. The corresponding eigenvectors of


A are [2, 2, −1]T , [−1, 2, 2]T , and [2, −1, 2]T , and they form an orthogonal
set. To show that the vectors are linearly independent, we write

k1 v1 + k2 v2 + k3 v3 = 0,

then the equation

k1 (2, 2, −1) + k2 (−1, 2, 2) + k3 (2, −1, 2) = (0, 0, 0)

leads to the homogeneous system of three equations in three unknowns,


k1 , k2 , and k3 :
2k1 − k2 + 2k3 = 0
2k1 + 2k2 − k3 = 0
−k1 + 2k2 + 2k3 = 0.
Thus, the vectors will be linearly independent, if and only if the above
system has a trivial solution. By writing the above system as an augmented
matrix form and then row-reducing, we get:
   
2 −1 2 0 1 0 0 0
 2 2 −1 0  −→  0 1 0 0  ,
−1 2 2 0 0 0 1 0

which gives k1 = 0, k2 = 0, k3 = 0. Hence, the vectors are linearly


independent. •
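A quick numerical check of both orthogonality and independence (a sketch that uses the eigenvectors above as columns) is:

>> V = [2 -1 2; 2 2 -1; -1 2 2];   % columns are the eigenvectors of A
>> V' * V                          % diagonal, so the columns are mutually orthogonal
>> rank(V)                         % 3, so the columns are linearly independent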

Theorem 3.9 The determinant of a matrix is zero, if and only if the rows
(or columns) of the matrix form a linearly dependent set. •

3.3 Diagonalization of Matrices


Of special importance for the study of eigenvalues are diagonal matrices.
These will be denoted by
 
λ1 0 0 ··· 0
 0 λ2 0 ··· 0 
D=
 
.. .. .. . . .. 
 . . . . . 
0 0 0 · · · λn

and are called spectral matrices, i.e., all the diagonal elements of D are
the eigenvalues of A. This simple but useful result makes it desirable to
find ways to transform a general n × n matrix A into a diagonal matrix
having the same eigenvalues. Unfortunately, the elementary operations
that can be used to reduce A → D are not suitable, because the scale and
subtract operations alter eigenvalues. Here, what we needed are similarity
transformations. Similarity transformations occur frequently in the context
of relating coordinate systems.

Definition 3.7 (Similar Matrix)

Let A and B be square matrices of the same size. A matrix B is said to


be similar to A (i.e., A ≡ B), if there exists a nonsingular matrix Q such
that B = Q−1 AQ. The transformation of a matrix A into the matrix B in
this manner is called a similarity transformation. •

Example 3.10 Consider the following matrices A and Q, and Q is non-


singular. Use the similarity transformation Q−1 AQ to transform A into a
matrix B.
   
0 0 −2 −1 0 −2
A= 1 2 1 , Q= 0 1 1 .
1 0 3 1 0 1

Solution. Let
 −1   
−1 0 −2 0 0 −2 −1 0 −2
B = Q−1 AQ =  0 1 1   1 2 1  0 1 1 
1 0 1 1 0 3 1 0 1
   
1 0 2 0 0 −2 −1 0 −2
=  1 1 1  1 2 1  0 1 1 
−1 0 −1 1 0 3 1 0 1
 
2 0 0
=  0 2 0 .
0 0 1
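The same similarity transformation can be verified in the MATLAB Command Window (a sketch along the lines of the other examples in this chapter):

>> A = [0 0 -2; 1 2 1; 1 0 3];
>> Q = [-1 0 -2; 0 1 1; 1 0 1];
>> B = inv(Q) * A * Q          % returns the diagonal matrix diag(2, 2, 1)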

In Example 3.10, the matrix A is transformed into a diagonal matrix B.


Not every square matrix can be “diagonalized” in this manner. Here, we
will discuss conditions under which a matrix can be diagonalized and when
it can, ways of constructing an approximate transforming matrix Q. We
will find that eigenvalues and eigenvectors play a key role in this discussion.

Theorem 3.10 Let A, B, and C be n × n matrices:


1. A ≡ A.

2. If A ≡ B, then B ≡ A.

3. If A ≡ B and B ≡ C, then A ≡ C. •

Theorem 3.11 Let A and B be n × n matrices with A ≡ B, then:


1. det(A) = det(B).

2. A is invertible, if and only if B is invertible.

3. A and B have the same rank.

4. A and B have the same characteristic polynomial.

5. A and B have the same eigenvalues. •



Note that Theorem 3.11 gives necessary, but not sufficient, conditions for two
matrices to be similar. For example, for the matrices
   
1 0 1 1
A= and B = ,
0 1 0 1

then

det(A) = det(B)
rank A = 2 = rank B
(λ1 = 1, λ2 = 1) = (λ1 = 1, λ2 = 1).

But A is not similar to B, since

Q−1 AQ = Q−1 IQ = Q−1 Q = I 6= B

for any invertible matrix Q.

Theorem 3.12 Similar matrices have the same eigenvalues.

Proof. Let A and B be similar matrices. Hence, there exists a matrix Q


such that B = Q−1 AQ. The characteristic polynomial of B is |B − λI|.
Substituting for B and using the multiplicative properties of determinants,
we get

|B − λI| = |Q−1 AQ − λI| = |Q−1 (A − λI)Q|

= |Q−1 ||A − λI||Q| = |(A − λI)|Q−1 ||Q|

= |A − λI||Q−1 Q| = |A − λI||I|

= |A − λI|.

The characteristic polynomials of A and B are identical. This means that


their eigenvalues are the same. •

Definition 3.8 (Diagonalizable Matrix)

A square matrix A is called diagonalizable if there exists an invertible ma-


trix Q such that

D = Q−1 AQ (3.32)

is a diagonal matrix. Note that all the diagonal elements of D are the
eigenvalues of A, and an invertible matrix Q can be written as

 
Q = x1 |x2 | · · · |xn

and is called the modal matrix of A because its columns contain x1 , x2 , . . . , xn ,


which are the eigenvectors of A corresponding to the eigenvalues λ1 , . . . , λn .

Theorem 3.13 Any matrix having linearly independent eigenvectors cor-


responding to distinct and real eigenvalues is diagonalizable, i.e.,

Q−1 AQ = D,

where D is a diagonal matrix and Q is an invertible matrix.

Proof. Let λ1 , . . . , λn be the eigenvalues of a matrix A, with corresponding


linearly independent eigenvectors x1 , . . . , xn . Let Q be the matrix having
x1 , . . . , xn as column vectors, i.e.,

Q = (x1 · · · xn ).

Since Ax1 = λ1 x1 , . . . , Axn = λn xn , matrix multiplication in terms of



columns gives

AQ = (Ax1 · · · Axn )

= (λ1 x1 · · · λn xn )
 
λ1 0
= (x1 · · · xn ) 
 ... 

0 λn
 
λ1 0
= Q
 ... .

0 λn

Since the columns of Q are linearly independent, Q is invertible. Thus,


 
λ1 0
Q−1 AQ = 
 ..  = D.

.
0 λn

Therefore, if a square matrix A has n linearly independent eigenvectors,


these eigenvectors can be used as the columns of a matrix Q that diagonal-
izes A. The diagonal matrix has the eigenvalues of A as diagonal elements.

Note that the converse of the above theorem also exists, i.e., if A is
diagonalizable, then it has n linearly independent eigenvectors. •

Example 3.11 Consider the matrix


 
0 0 1
A =  3 7 −9  ,
0 2 −1

which has a characteristic equation

λ3 − 6λ2 + 11λ − 6 = 0,

and this cubic factorizes to give

(λ − 1)(λ − 2)(λ − 3) = 0.

The eigenvalues of A, therefore, are 1, 2, and 3, with a sum 6, which agrees


with the trace of A. Corresponding to these eigenvalues, the eigenvectors
of A are x1 = [1, 1, 1]T , x2 = [1, 3, 2]T , and x3 = [1, 6, 3]T . Thus, the
nonsingular matrix Q is given by
 
1 1 1
Q =  1 3 6 ,
1 2 3

and the inverse of this matrix is given by


 
3 1 −3
Q−1 =  −3 −2 5 .
1 1 −2

Thus,
   
3 1 −3 0 0 1 1 1 1
Q−1 AQ =  −3 −2 5   3 7 −9   1 3 6  ,
1 1 −2 0 2 −1 1 2 3

which implies that


 
1 0 0
Q−1 AQ =  0 2 0  = D.
0 0 3


The above results can be obtained using the MATLAB Command Win-
dow as follows:

>> A = [0 0 1; 3 7 − 9; 0 2 − 1];
>> P = poly(A);
>> P P = poly2sym(P );
>> [X, D] = eig(A);
>> eigenvalues = diag(D);
>> x1 = X(:, 1); x2 = X(:, 2); x3 = X(:, 3);
>> Q = [x1 x2 x3];
>> D = inv(Q) ∗ A ∗ Q;
It is possible for a matrix to have a full set of linearly independent eigenvectors
even though its eigenvalues are not all distinct, although there is no simple rule
that tells us in advance when this will happen. The following example shows the
situation that can arise.
Example 3.12 Consider the matrix
 
2 1 1
A =  2 3 2 ,
3 3 4
which has a characteristic equation
λ3 − 9λ2 + 15λ − 7 = 0,
and it can be easily factorized to give
(λ − 7)(λ − 1)2 = 0.
The eigenvalues of A are 7 of multiplicity one and 1 of multiplicity two. The
eigenvectors corresponding to these eigenvalues are x1 = [1, 2, 3]T , x2 =
[1, 0, −1]T , and x3 = [0, 1, −1]T . Thus, the nonsingular matrix Q is given
by  
1 1 0
Q= 2 0 1 ,
3 −1 −1
and the inverse of this matrix is

Q−1 = (1/6) [ 1 1 1; 5 −1 −1; −2 4 −2 ].

Thus,  
7 0 0
Q−1 AQ =  0 1 0  = D.
0 0 1
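These computations can be checked with a few MATLAB commands (a sketch that uses the eigenvectors found above as the columns of Q):

>> A = [2 1 1; 2 3 2; 3 3 4];
>> Q = [1 1 0; 2 0 1; 3 -1 -1];
>> D = inv(Q) * A * Q          % returns diag(7, 1, 1)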

Computing Powers of a Matrix

There are numerous problems in applied mathematics that require the


computation of higher powers of a square matrix. Now we shall show how
diagonalization can be used to simplify such computations for diagonaliz-
able matrices.

If A is a square matrix and Q is an invertible matrix, then


(Q−1 AQ)2 = Q−1 AQQ−1 AQ = Q−1 AIAQ = Q−1 A2 Q.
More generally, for any positive integer k, we have
(Q−1 AQ)k = Q−1 Ak Q.
It follows from this equation that if A is diagonalizable and Q−1 AQ = D
is a diagonal matrix, then
Q−1 Ak Q = (Q−1 AQ)k = Dk .
Solving this equation for Ak yields
Ak = QDk Q−1 . (3.33)
Therefore, in order to compute the kth power of A, all we need to do is
compute the kth power of a diagonal matrix D and then form the matrices
Q and Q−1 as indicated in (3.33). But taking the kth power of a diagonal
matrix is easy, for it simply amounts to taking the kth power of each of the
entries on the main diagonal.
Example 3.13 Consider the matrix
 
1 1 −4
A =  2 0 −4  ,
−1 1 −2

which has a characteristic equation

λ3 + λ2 − 4λ − 4 = 0

and factorizes to
(λ + 1)(λ + 2)(λ − 2) = 0.
It gives the eigenvalues −1, −2, and 2 of the given matrix A, with the correspond-
ing eigenvectors [1, 2, 1]T , [1, 1, 1]T , and [1, 1, 0]T . Then the factorization

A = QDQ−1

becomes
     
1 1 −4 1 1 1 −1 0 0 −1 1 0
 2 0 −4  =  2 1 1   0 −2 0   1 −1 1 ,
−1 1 −2 1 1 0 0 0 2 1 0 −1

and from (3.33), we have


   
Ak = [ 1 1 1; 2 1 1; 1 1 0 ] [ (−1)k 0 0; 0 (−2)k 0; 0 0 2k ] [ −1 1 0; 1 −1 1; 1 0 −1 ],

which implies that


 
−(−1)k + (−2)k + 2k (−1)k − (−2)k (−2)k − 2k
Ak =  −2(−1)k + (−2)k + 2k 2(−1)k − (−2)k (−2)k − 2k  .
−(−1)k + (−2)k (−1)k − (−2)k (−2)k

For this formula, we can easily compute any power of a given matrix A.
For example, if k = 10, then
 
2047 −1023 0
A10 =  2046 −1022 0 ,
1023 −1023 1024

the required 10th power of the matrix. •



The above results can be obtained using the MATLAB Command Win-
dow as follows:

>> A = [1 1 -4; 2 0 -4; -1 1 -2];
>> P = poly(A);
>> [X, D] = eig(A);
>> eigenvalues = diag(D);
>> x1 = X(:, 1); x2 = X(:, 2); x3 = X(:, 3);
>> Q = [x1 x2 x3];
>> A10 = Q * D^10 * inv(Q);

Example 3.14 Show that the following matrix A is not diagonalizable:


 
5 −3
A= .
3 −1
Solution. To compute the eigenvalues and corresponding eigenvectors of
the given matrix A, we have the characteristic equation of the form
|A − λI| = 0
λ2 − 4λ + 4 = 0,
which factorizes to
(λ − 2)(λ − 2) = 0
and gives repeated eigenvalues 2 and 2 of the given matrix A. To find the
corresponding eigenvectors, we solve (3.2) for λ = 2, and we get
  
3 −3 x1
= 0.
3 −3 x2
Solving the above homogeneous system gives 3x1 − 3x2 = 0, and we have
x1 = x2 = α. Thus, the eigenvectors are nonzero vectors of the form
 
1
α .
1
The eigenspace is a one-dimensional space. A is a 2 × 2 matrix, but it does
not have two linearly independent eigenvectors. Thus, A is not diagonaliz-
able. •
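MATLAB exhibits the same defect; in the following sketch, the eigenspace of the repeated eigenvalue 2 is one-dimensional, so no invertible matrix of eigenvectors exists:

>> A = [5 -3; 3 -1];
>> eig(A)                  % both eigenvalues equal 2
>> rank(A - 2*eye(2))      % 1, so the eigenspace for 2 is one-dimensional
>> null(A - 2*eye(2))      % a single eigenvector direction, a multiple of [1; 1]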

Definition 3.9 (Orthogonal Matrix)

It is a square matrix whose inverse can be determined by transposing it,


i.e.,
A−1 = AT .
Such matrices do occur in some engineering problems. The matrix used to
obtain rotation of coordinates about the origin of a Cartesian system is one
example of an orthogonal matrix. For example, consider the square matrix
 
0.6 −0.8
A= .
0.8 0.6
One can easily verify that the given matrix A is orthogonal because
 
A−1 = [ 0.6 0.8; −0.8 0.6 ] = AT .
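This can be confirmed numerically with a small sketch (the comparison holds up to rounding error):

>> A = [0.6 -0.8; 0.8 0.6];
>> A' * A            % the 2-by-2 identity matrix
>> inv(A) - A'       % zero matrix (up to rounding), so inv(A) equals A'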

Orthogonal Diagonalization

Let Q be an orthogonal matrix, i.e., Q−1 = QT . Thus, if such a ma-


trix is used in a similarity transformation, the transformation becomes
D = QT AQ. This type of similarity transformation is much easier to
calculate because its inverse is simply its transpose. There is therefore
considerable advantage in searching for situations where a reduction to a
diagonal matrix, using an orthogonal matrix, is possible.

Definition 3.10 (Orthogonally Diagonalizable Matrix)

A square matrix A is said to be orthogonally diagonalizable if there exists


an orthogonal matrix Q such that

D = Q−1 AQ = QT AQ

is a diagonal matrix. •

The following theorem tells us that the set of orthogonally diagonaliz-


able matrices is in fact the set of symmetric matrices.

Theorem 3.14 A square matrix A is orthogonally diagonalizable if and only if it
is a symmetric matrix.

Proof. Suppose that a matrix A is orthogonally diagonalizable, then there


exists an orthogonal matrix Q such that

D = QT AQ.

Therefore,
A = QDQT .
Taking its transpose gives

AT = (QDQT )T = (QT )T (QD)T = QDQT = A.

Thus, A is symmetric.
The converse of this theorem is also true, but it is beyond the scope of
this text and will be omitted. •

Symmetric Matrices

Now our next goal is to devise a procedure for orthogonally diagonalizing a


symmetric matrix, but before we can do so, we need an important theorem
about eigenvalues and eigenvectors of symmetric matrices.

Theorem 3.15 If A is a symmetric matrix, then:


(a) The eigenvalues of A are all real numbers.

(b) Eigenvectors from distinct eigenvalues are orthogonal. •

Theorem 3.16 The following conditions are equivalent for an n×n matrix
Q:
(a) Q is invertible and Q−1 = QT .

(b) The rows of Q are orthonormal.

(c) The columns of Q are orthonormal. •



Diagonalization of Symmetric Matrices

As a consequence of the preceding theorem we obtain the following proce-


dure for orthogonally diagonalizing a symmetric matrix:
1. Find a basis for each eigenspace of A.

2. Find an orthonormal basis for each eigenspace.

3. Form the matrix Q whose columns are these orthonormal vectors.

4. The matrix D = QT AQ will be a diagonal matrix.

Example 3.15 Consider the matrix


 
3 −1 0
A =  −1 2 −1  ,
0 −1 3
which has a characteristic equation

λ3 − 8λ2 + 19λ − 12 = 0,

and it gives the eigenvalues 1, 3, and 4 for the given matrix A. Correspond-
ing to these eigenvalues, the eigenvectors of A are x1 = [1, 2, 1]T , x2 =
[1, 0, −1]T , and x3 = [1, −1, 1]T , and they form an orthogonal set. Note
that the following vectors
u1 = x1 /kx1 k2 = (1/√6)[1, 2, 1]T

u2 = x2 /kx2 k2 = (1/√2)[1, 0, −1]T

u3 = x3 /kx3 k2 = (1/√3)[1, −1, 1]T

form an orthonormal set, since they inherit orthogonality from x1 , x2 , and


x3 , and in addition

ku1 k2 = ku2 k2 = ku3 k2 = 1.



Then an orthogonal matrix Q forms from an orthonormal set of vectors as

Q = [ 1/√6 1/√2 1/√3; 2/√6 0 −1/√3; 1/√6 −1/√2 1/√3 ]

and

QT AQ = [ 1/√6 2/√6 1/√6; 1/√2 0 −1/√2; 1/√3 −1/√3 1/√3 ] [ 3 −1 0; −1 2 −1; 0 −1 3 ] [ 1/√6 1/√2 1/√3; 2/√6 0 −1/√3; 1/√6 −1/√2 1/√3 ],

which implies that


 
1 0 0
QT AQ =  0 3 0  = D.
0 0 4

Note that the eigenvalues 1, 3, and 4 of the matrix A are real and its
eigenvectors form an orthonormal set, since they inherit orthogonally from
x1 , x2 , and x3 , which satisfy the preceding theorem. •

The results of Example 3.15 can be obtained using the MATLAB Com-
mand Window as follows:

>> A = [3 -1 0; -1 2 -1; 0 -1 3];
>> P = poly(A);
>> PP = poly2sym(P);
>> [X, D] = eig(A);
>> lambda = diag(D);
>> x1 = X(:, 1); x2 = X(:, 2); x3 = X(:, 3);
>> u1 = x1/norm(x1); u2 = x2/norm(x2); u3 = x3/norm(x3);
>> Q = [u1 u2 u3];
>> D = Q' * A * Q;

Theorem 3.17 (Principal Axis Theorem)

The following conditions are equivalent for an n × n matrix A:


(a) A has an orthonormal set of n eigenvectors.
(b) A is orthogonally diagonalizable.
(c) A is symmetric. •
In the following section we shall discuss some extremely important
properties of the eigenvalue problem. Before this, we will discuss some
special matrices, as follows.
Definition 3.11 (Conjugate of a Matrix)

If the entries of an n × n matrix A are complex numbers, we can write


A = (aij ) = (bij + icij ),
where bij and cij are real numbers. The conjugate of a matrix A is a matrix
Ā = (āij ) = (bij − icij ).
For example, the conjugate of
   
A = [ 2 π i; 3 7 0; 4 1−i 4 ]   is   Ā = [ 2 π −i; 3 7 0; 4 1+i 4 ].


Definition 3.12 (Hermitian Matrix)

It is a square matrix A = (aij ) that is equal to its conjugate transpose

A = A∗ = ĀT ,

i.e., whenever aij = āji . This is the complex analog of symmetry. For
example, the following matrix A is Hermitian if it has the form
 
a b + ic
A= ,
b − ic d

where a, b, c, and d are real. A Hermitian matrix may or may not be sym-
metric. For example, for the matrices

A = [ 1 2+4i 1−3i; 2−4i 3 8+6i; 1+3i 8−6i 5 ]   and   B = [ 1 2+4i 1−3i; 2+4i 3 8+6i; 1−3i 8+6i 5 ],

the matrix A is Hermitian but not symmetric, while the matrix B is symmetric
but not Hermitian.
Note that:
1. Every diagonal matrix is Hermitian, if and only if it is real.
2. The square matrix A is said to be a skew Hermitian when

A = −A∗ = −ĀT ,

i.e., whenever aij = −āji . This is the complex analog of skew symmetry.
For example, the following matrix A is skew Hermitian if it has the form
 
0 1+i
A= .
−1 + i i

Definition 3.13 (Unitary Matrix)

Let A = (aij ) be the square matrix, then if

AA∗ = A∗ A = In ,

where In is an n × n identity matrix, then A is called the unitary matrix.


For example, for any real number θ, the following matrix
 
cos θ − sin θ
A=
sin θ cos θ
is unitary. •
Note that:
1. The identity matrix is unitary.
2. The inverse of a unitary matrix is unitary.
3. A product of unitary matrices is unitary.
4. A real matrix A is unitary, if and only if AT = A−1 .
5. A square matrix A is unitarily similar to the square matrix B, if and
only if there is an unitary matrix Q the same size as A and B such that
A = QBQ−1 .
A square matrix A is unitarily diagonalizable, if and only if it is unitarily
similar to a diagonal matrix.
Theorem 3.18 A square matrix A is unitarily diagonalizable, if and only
if there is a unitary matrix Q of the same size whose columns are eigen-
vectors of A. •
Definition 3.14 (Normal Matrix)

A square matrix A is normal, if and only if it commutes with its conjugate


transpose, i.e.,
AA∗ = A∗ A.
For example, if a and b are real numbers, then the following matrix
 
a b
A=
−b a
is normal because
 
AA∗ = [ a2 + b2 0; 0 a2 + b2 ] = A∗ A.

However, its eigenvalues are complex: a ± ib. Note that all Hermitian, skew
Hermitian, and unitary matrices are normal matrices. •
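For a concrete check, the sketch below picks the particular values a = 2 and b = 3 (an arbitrary choice) and verifies both the normality condition and the complex eigenvalues:

>> a = 2; b = 3;
>> A = [a b; -b a];
>> norm(A*A' - A'*A)     % 0, so A is normal
>> eig(A)                % 2 + 3i and 2 - 3i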

3.4 Basic Properties of Eigenvalue Problems


1. A square matrix A is singular, if and only if at least one of its eigen-
values is zero. It can be easily proved, since for λ = 0, we have (3.4)
of the form
|A − λI| = |A| = 0.

Example 3.16 Consider the following matrix:


 
3 1 0
A =  2 −1 −1  .
4 3 1
Then the characteristic equation of A takes the form

−λ3 + 3λ2 = 0.

By solving this cubic equation, the eigenvalues of A are 0, 0, and 3.


Hence, the given matrix is singular because two of its eigenvalues are
zero. •

2. The eigenvalues of a matrix A and its transpose AT are identical.


It is well known that the determinant of a matrix A and of its AT
are the same. Therefore, they must have the same characteristic
equation and the same eigenvalues.

Example 3.17 Consider a matrix A and its transpose matrix AT as


   
1 1 0 1 3 2
A= 3 0 3  and AT =  1 0 −1  .
2 −1 3 0 3 3

The characteristic equations of A and AT are the same, which are

−λ3 + 4λ2 − 3λ = 0.

Solving this cubic polynomial equation, we have the eigenvalues 0, 1, 3


of both the matrices, A and its transpose AT . •

3. The eigenvalues of an inverse matrix A−1 , provided that A−1 exists,


are the inverses of the eigenvalues of A.
To prove this, let λ be an eigenvalue of A; then using (3.4) gives

|A − λI| = |A − λAA−1 | = |A(I − λA−1 )| = |λ||A|| (1/λ)I − A−1 | = 0.

Since the matrix A is nonsingular, |A| ≠ 0, and also λ ≠ 0. Hence,

|(1/λ)I − A−1 | = 0,

which shows that 1/λ is an eigenvalue of the matrix A−1 .

Example 3.18 Consider a matrix A and its inverse matrix A−1 as


A = [ 3 0 1; 0 −3 0; 1 0 3 ]   and   A−1 = [ 3/8 0 −1/8; 0 −1/3 0; −1/8 0 3/8 ].

Then a characteristic equation of A has the form

λ3 − 3λ2 − 10λ + 24 = 0,

which gives the eigenvalues 4, –3, and 2 of A. Also, the characteristic


equation of A−1 is
λ3 − (5/12)λ2 − (1/8)λ + 1/24 = 0,
and it gives the eigenvalues
1/4, −1/3, and 1/2,
which are the reciprocals of the eigenvalues 4, −3, and 2 of the matrix A. •

4. The eigenvalues of Ak (k is an integer) are eigenvalues of A raised to


the kth power.
To prove this (for k = 2), consider the characteristic equation of a matrix A,

|A − λI| = 0.

Since |A − λI| = 0, it follows that

|A2 − λ2 I| = |(A − λI)(A + λI)| = |A − λI||A + λI| = 0,

so λ2 is an eigenvalue of A2 ; the general case follows by induction.

Example 3.19 Consider the matrix


 
1 1 0
A= 3 0 3 ,
2 −1 3

which has the eigenvalues, 0, 1, and 3. Now


 
4 1 3
AA = A2 =  9 0 9 
5 −1 6

has the characteristic equation of the form

λ3 − 10λ2 + 9λ = 0.

Solving this cubic equation, the eigenvalues of A2 are 0, 1, and 9,
which are the squares of the eigenvalues 0, 1, and 3 of A. •

5. The eigenvalues of a diagonal matrix or a triangular (upper or lower)


matrix are their diagonal elements.

Example 3.20 Consider the following matrices:


   
1 0 0 2 2 3
A=  0 2 0 , B =  0 3 3 .
0 0 3 0 0 4

The characteristic equation of A is

λ3 − 6λ2 + 11λ − 6 = (λ − 1)(λ − 2)(λ − 3) = 0,

and it gives eigenvalues 1, 2, 3, which are the diagonal elements of the


given matrix A. Similarly, the characteristic equation of B is

λ3 − 9λ2 + 26λ − 24 = (λ − 2)(λ − 3)(λ − 4) = 0,

and it gives eigenvalues 2, 3, 4, which are the diagonal elements of the


given matrix B. Hence, the eigenvalues of a diagonal matrix A and
the upper-triangular matrix B are their diagonal elements. •

6. Every square matrix satisfies its own characteristic equation.

This is a well-known theorem called the Cayley–Hamilton Theorem.


If a characteristic equation of A is

λn + αn−1 λn−1 + αn−2 λn−2 + · · · + α1 λ + α0 = 0,

the matrix itself satisfies the same equation, namely,

An + αn−1 An−1 + αn−2 An−2 + · · · + α1 A + α0 I = 0. (3.34)

Multiplying each term in (3.34) by A−1 , when A−1 exists and thus
α0 6= 0, gives an important relationship for the inverse of a matrix:

An−1 + αn−1 An−2 + αn−2 An−3 + · · · + α1 I + α0 A−1 = 0

or

1 n−1
A−1 = − [A + αn−1 An−2 + αn−2 An−3 + · · · + α1 I].
α0

Program 3.3
MATLAB m-file for Using the Cayley–Hamilton Theorem
function [c, Ainv] = chim(A)
% CHIM  Coefficients of the characteristic polynomial of A by the
% Leverrier recursion, and the inverse of A from the Cayley-Hamilton
% theorem (when A is nonsingular).
n = size(A, 1);
c = 1;                            % leading coefficient of p(lambda)
Ak = A;                           % A_1 = A
for k = 1:n
    ck = -trace(Ak)/k;            % c_k = -(1/k) tr(A_k)
    c = [c, ck];
    if k < n
        Ak = A*(Ak + ck*eye(n));  % A_{k+1} = A(A_k + c_k I)
    end
end
% Cayley-Hamilton: A^n + c(2)A^(n-1) + ... + c(n+1)I = 0, so
% A^(-1) = -(A^(n-1) + c(2)A^(n-2) + ... + c(n)I)/c(n+1).
if c(n+1) ~= 0
    S = A^(n-1);
    for k = 2:n
        S = S + c(k)*A^(n-k);
    end
    Ainv = -S/c(n+1);
else
    Ainv = [];                    % A is singular
end

Example 3.21 Consider the square matrix


 
2 1 2
A= 0 2 3 ,
0 0 5
which has a characteristic equation of the form
p(λ) = λ3 − 9λ2 + 24λ − 20 = 0,

and one can write

p(A) = A3 − 9A2 + 24A − 20I = 0.

Then the inverse of A can be obtained as

A2 − 9A + 24I − 20A−1 = 0,

which gives
1 2
A−1 = [A − 9A + 24I].
20

Computing the right-hand side, we have


 
10 −5 −1
1 
A−1 = 0 10 −6  .
20
0 0 4

Similarly, one can also find the higher power of the given matrix A.
For example, one can compute the value of the matrix A5 by solving
the expression
A5 = 9A4 − 24A3 + 20A2 ,
and it gives  
32 80 3013
A5 =  0 32 3093  .
0 0 3125

To find the coefficients of a characteristic equation and the in-


verse of a matrix A by the Cayley–Hamilton theorem using MATLAB
commands we do as follows:

>> A = [2 1 2; 0 2 3; 0 0 5];
>> [c, Ainv] = chim(A);

7. The eigenvectors of A−1 are the same as the eigenvectors of A.


Let x be an eigenvector of A that satisfies the equation

Ax = λx,

then
(1/λ)A−1 Ax = A−1 x.
Hence,
A−1 x = (1/λ)x,
which shows that x is also an eigenvector of A−1 .

8. The eigenvectors of the matrix (kA) are identical to the eigenvectors


of A, for any scalar k.
Since the eigenvalues of (kA) are k times the eigenvalues of A, if
Ax = λx, then
(kA)x = (kλ)x.

9. A symmetric matrix A is positive-definite, if and only if all the eigen-


values of A are positive.

Example 3.22 Consider the matrix


 
2 0 1
A =  0 2 0 ,
1 0 2

which has the characteristic equation of the form

λ3 − 6λ2 + 11λ − 6 = 0,

and it gives the eigenvalues 3, 2, and 1 of A. Since all the eigenvalues


of the matrix A are positive, A is positive-definite. •
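In MATLAB this property can be checked either from the eigenvalues or with the Cholesky factorization (a sketch; the second output of chol is zero exactly when the matrix is positive-definite):

>> A = [2 0 1; 0 2 0; 1 0 2];
>> eig(A)              % 1, 2, 3: all positive
>> [R, p] = chol(A);
>> p                   % 0, so A is positive-definite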

10. For any n × n matrix A, we have



B0 = I
Ak = ABk−1
ck = −(1/k) tr(Ak )          k = 1, 2, . . . , n,
Bk = Ak + ck I

where tr(A) is a trace of a matrix A. Then a characteristic polynomial


of A is

p(λ) = |A − λI| = λn + c1 λn−1 + · · · + cn−1 λ + cn .

If cn ≠ 0, then the inverse of A can be obtained as

A−1 = −(1/cn ) Bn−1 .
This is called the Sourian–Frame Theorem.

Example 3.23 Find a characteristic polynomial of the matrix


 
5 −2 −4
A =  −2 2 2 ,
−4 2 5

and then find A−1 by using the Sourian–Frame theorem.

Solution. Since the given matrix is of size 3 × 3, the possible form


of a characteristic polynomial will be

p(λ) = λ3 + c1 λ2 + c2 λ + c3 .

The values of the coefficients c1 , c2 , and c3 of the characteristic poly-


nomial can be computed as

A1 = AB0 = AI = A,

so
1
c1 = − tr(A1 ) = −12.
1

Now  
−7 −2 −4
B1 = A1 + c1 I = A1 − 12I =  −2 −10 2 
−4 2 −7
and  
−15 2 −4
A2 = AB1 =  2 −12 −2  ,
4 −2 −15
so
1
c2 = − tr(A2 ) = 21.
2
Now  
6 2 4
B2 = A2 + c2 I = A2 + 21I =  2 9 −2 
4 −2 6
and  
10 0 0
A3 = AB2 =  0 10 0  ,
0 0 10
so
1
c3 = − tr(A3 ) = −10.
3
Thus,
p(λ) = λ3 − 12λ2 + 21λ − 10
and
A−1 = −(1/c3 )B2 = (1/10) [ 6 2 4; 2 9 −2; 4 −2 6 ]
is the inverse of the given matrix A. •

These results can be obtained by using the MATLAB Command


Window as follows:

>> A = [5 − 2 − 4; −2 2 2; −4 2 5];
>> [c, Ainv] = sourian(A);

Program 3.4
MATLAB m-file for Using the Sourian–Frame Theorem
function [c, Ainv] = sourian(A)
% SOURIAN  Coefficients of the characteristic polynomial of A by the
% Sourian-Frame recursion, and the inverse of A when it exists.
n = size(A, 1);
B = eye(n);                     % B_0 = I
Bnm1 = B;                       % will hold B_{n-1}
c = 1;
for k = 1:n
    Ak = A*B;                   % A_k = A B_{k-1}
    ck = -trace(Ak)/k;          % c_k = -(1/k) tr(A_k)
    c = [c, ck];
    B = Ak + ck*eye(n);         % B_k = A_k + c_k I
    if k == n-1
        Bnm1 = B;               % save B_{n-1} for the inverse formula
    end
end
if c(n+1) ~= 0
    Ainv = -Bnm1/c(n+1);        % A^(-1) = -(1/c_n) B_{n-1}
else
    Ainv = [];                  % A is singular
end

11. If a characteristic equation of an n × n matrix A is

λn + αn−1 λn−1 + αn−2 λn−2 + αn−3 λn−3 + · · · + α1 λ + α0 = 0,

the values of the coefficients of a characteristic polynomial are then


found from the following sequence of computations:

αn−1 = −tr(A)
αn−2 = −(1/2) [αn−1 tr(A) + tr(A2 )]
αn−3 = −(1/3) [αn−2 tr(A) + αn−1 tr(A2 ) + tr(A3 )]
...
α0 = −(1/n) [α1 tr(A) + α2 tr(A2 ) + · · · + tr(An )].
This formula is called Bocher’s formula, which can be used to find
the coefficients of a characteristic equation of a square matrix.

Example 3.24 Find a characteristic equation of the following ma-


trix by using Bocher’s formula:
 
1 1 0
A= 3 0 3 .
2 −1 3

Solution. Since the size of the given matrix is 3 × 3, we have to find


the coefficients α2 , α1 , α0 of the characteristic equation

λ3 + α2 λ2 + α1 λ + α0 = 0,

where
α2 = −tr(A)

1
α1 = − [α2 tr(A) + tr(A2 )]
2
1
α0 = − [α1 tr(A) + α2 tr(A2 ) + tr(A3 )].
3
In order to find the values of the above coefficients, we must compute
the powers of matrix A as follows:
   
4 1 3 13 1 12
2 3
A =  9 0 9  and A =  27 0 27  .
5 −1 6 14 −1 15

By using these matrices, we can find the coefficients of the charac-


teristic equation as

α2 = −(4) = −4

1
α1 = − [−4(4) + 10] = 3
2
1
α0 = − [3(4) + (−4)(10) + 28] = 0.
3

Hence, the characteristic equation of A is

λ3 − 4λ2 + 3λ = 0.

To find the coefficients of the characteristic equation by Bocher’s


theorem using MATLAB commands we do as follows:

>> A = [1 1 0; 3 0 3; 2 -1 3];
>> c = BOCH(A);

Program 3.5
MATLAB m-file for Using Bocher’s Theorem
function [c, Ainv] = BOCH(A)
% BOCH  Coefficients of the characteristic polynomial of A by Bocher's
% formula, and the inverse of A via the Cayley-Hamilton theorem.
n = size(A, 1);
a = zeros(1, n);                % a(i) holds alpha_{n-i}
T = zeros(1, n);                % T(p) = trace(A^p)
for p = 1:n
    T(p) = trace(A^p);
end
for i = 1:n
    s = 0;
    for k = 1:i-1
        s = s + a(k)*T(i-k);    % alpha_{n-k} tr(A^(i-k))
    end
    a(i) = -(s + T(i))/i;       % Bocher's formula
end
c = [1 a];                      % [1, alpha_{n-1}, ..., alpha_1, alpha_0]
if a(n) ~= 0                    % alpha_0 nonzero, so A is invertible
    S = A^(n-1);
    for i = 1:n-1
        S = S + a(i)*A^(n-1-i);
    end
    Ainv = -S/a(n);             % from the Cayley-Hamilton theorem
else
    Ainv = [];                  % A is singular
end

12. For an n × n matrix A with a characteristic equation

|A − λI| = λn + αn−1 λn−1 + αn−2 λn−2 + · · · + α1 λ + α0 = 0,



the unknown coefficients can be computed as


αn−1 = −tr(AD1 )
αn−2 = −(1/2) tr(AD2 )
αn−3 = −(1/3) tr(AD3 )
...
α0 = −(1/n) tr(ADn ),
where
D1 = I
D2 = AD1 + αn−1 I = A + αn−1 I
D3 = AD2 + αn−2 I = A2 + αn−1 A + αn−2 I
..
.
Dn = ADn−1 + α1 I = An−1 + αn−1 An−2 + · · · + α2 A + α1 I
and also
Dn+1 = ADn + α0 I = 0.
Then the determinant of A is
det(A) = |A| = (−1)n α0 ,
the adjoint of A is
adj(A) = (−1)n+1 Dn ,
and the inverse of A is
A−1 = −(1/α0 ) Dn .
Note that a singular matrix is indicated by α0 = 0.

This result is known as the Faddeev–Leverrier method. This


method, which is recursive, yields a characteristic equation for a
square matrix, the adjoint of a matrix, and its inverse (if it exists).
The determinant of a matrix, being the negative of the last coefficient
in the characteristic equation, is also computed.

Example 3.25 Find the characteristic equation, determinant, ad-


joint, and inverse of the following matrix using the Faddeev–Leverrier
method:  
2 2 −1
A =  −1 0 4 .
3 −1 −3

Solution. Since the given matrix is of order 3 × 3, the possible


characteristic equation will be of the form

|A − λI| = λ3 + α2 λ2 + α1 λ + α0 = 0.

The values of the unknown coefficients α2 , α1 , and α0 can be computed


as
α2 = −tr(AD1 ) = −tr(A) = 1
and  
3 2 −1
D2 = AD1 + α2 I = A + I =  −1 1 4 .
3 −1 −2
Also  
1 7 8
AD2 =  9 −6 −7  ,
1 8 −1
so
1 1
α1 = − tr(AD2 ) = − (−6) = 3.
2 2
Similarly, we have
 
4 7 8
D3 = AD2 + α1 I = AD2 + 3I =  9 −3 −7 
1 8 2

and  
25 0 0
AD3 =  0 25 0  ,
0 0 25

which gives
α0 = −(1/3) tr(AD3 ) = −(1/3)(75) = −25,
which shows that the given matrix is nonsingular. Hence, the char-
acteristic equation is

|A − λI| = λ3 + λ2 + 3λ − 25 = 0.

Thus, the determinant of A is

|A| = (−1)3 (−25) = 25,

and the adjoint of A is


 
4 7 8
adj(A) = (−1)4 (D3 ) = D3 =  9 −3 −7  .
1 8 2

Finally, the inverse of A is


 
A−1 = −(1/α0 )D3 = (1/25) [ 4 7 8; 9 −3 −7; 1 8 2 ].
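The built-in functions give the same characteristic polynomial, determinant, and inverse, which provides a quick check of the hand computation (a sketch):

>> A = [2 2 -1; -1 0 4; 3 -1 -3];
>> poly(A)       % coefficients 1, 1, 3, -25 of the characteristic polynomial
>> det(A)        % 25
>> inv(A)        % equals (1/25) times the adjoint D3 computed above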

13. All the eigenvalues of a Hermitian matrix are real.


To prove this, consider (3.1), which is

Ax = λx,

and it implies that


x∗ Ax = λx∗ x.

Since A is Hermitian, i.e., A = A∗ and x∗ Ax is a scalar,

x∗ Ax = x∗ A∗ x = (x∗ Ax)∗ .

Thus, the scalar is equal to its own conjugate and hence, real. There-
fore, λ is real.

Example 3.26 Consider the Hermitian matrix


 
2 −i 0
A =  i 2 0 ,
0 0 3

which has a characteristic equation

λ3 − 7λ2 + 15λ − 9 = 0,

and it gives the real eigenvalues 1, 3, and 3 for the given matrix A.•

14. A matrix that is unitarily similar to a Hermitian matrix is itself


Hermitian.
To prove this, assume that B ∗ = B and A = QBQ−1 , where Q−1 =
Q∗ . Then
A∗ = (QBQ−1 )∗
= (QBQ∗ )∗
= Q∗∗ B ∗ Q∗
= QBQ−1 = A.

This shows that matrix A is Hermitian.

15. If A is a Hermitian matrix with distinct eigenvalues, then it is unitarily
similar to a diagonal matrix, i.e., there exists a unitary matrix µ
such that
µT Aµ = D,
a diagonal matrix.

Example 3.27 Consider the matrix


 
11 2 8
A= 2 2 −10  ,
8 −10 5

which has the characteristic equation

λ3 − 18λ2 − 81λ + 1458 = 0.



It can be easily factorized to give


(λ + 9)(λ − 9)(λ − 18) = 0,
and the eigenvalues of A are, −9, 9, and 18. The eigenvectors cor-
responding to these eigenvalues are
x1 = [−1, 2, 2]T , x2 = [2, 2, −1]T , x3 = [2, −1, 2]T ,
and they form an orthogonal set. Note that the vectors
u1 = x1 /kx1 k2 = (1/3)[−1, 2, 2]T
u2 = x2 /kx2 k2 = (1/3)[2, 2, −1]T
u3 = x3 /kx3 k2 = (1/3)[2, −1, 2]T
form an orthonormal set, since they inherit orthogonality from x1 , x2 ,
and x3 , and in addition
ku1 k2 = ku2 k2 = ku3 k2 = 1.
Thus, the unitary matrix µ is given by
 
µ = (1/3) [ −1 2 2; 2 2 −1; 2 −1 2 ]
and
µT Aµ = (1/9) [ −1 2 2; 2 2 −1; 2 −1 2 ] [ 11 2 8; 2 2 −10; 8 −10 5 ] [ −1 2 2; 2 2 −1; 2 −1 2 ],
which implies that
 
−9 0 0
µT Aµ =  0 9 0  = D.
0 0 18


16. A matrix Q is unitary, if and only if its conjugate transpose is its


inverse:
Q∗ = Q−1 .

Note that a real matrix Q is unitary, if and only if QT = Q−1 .


For any square matrix A, there is a unitary matrix µ such that
µ−1 Aµ = T,
an upper-triangular matrix whose diagonal entries consist of the
eigenvalues of A. This is a well-known result called Schur's lemma.

Example 3.28 Consider the matrix


 
2 −1
A= ,
1 0
which has an eigenvalue 1 of multiplicity 2. The eigenvector cor-
responding to this eigenvalue is [1, 1]T . Thus, the first column of a
unitary matrix µ is [ √12 , √12 ]T , and the other column is orthogonal to
it, i.e., [ √12 , − √12 ]T . So
 1 1   1 1 
√ √  √ √ 
 2 2  2 −1  2 2 
−1
µ Aµ =  ,
   
 
 1 1  1 1
1 0
 
√ −√ √ −√
2 2 2 2
which gives  
−1 1 2
µ Aµ = = T.
0 1

17. A matrix that is unitarily similar to a normal matrix is itself normal.


Assume that BB ∗ = B ∗ B and A = QBQ−1 , where Q−1 = Q∗ . Then
A∗ = (QBQ−1 )∗
= (QBQ∗ )∗
= Q∗∗ B ∗ Q∗
= QB ∗ Q−1 .

So
AA∗ = (QBQ−1 )(QB ∗ Q−1 )
= (QBB ∗ Q−1 )
= (QB ∗ BQ−1 )
= (QB ∗ Q−1 QBQ−1 )
= (QB ∗ Q−1 )(QBQ−1 ) = A∗ A.
This shows that the matrix A is normal.
18. The value of the exponential of a matrix A can be calculated from
eA = Q(expΛ)Q−1 ,
where expΛ is a diagonal matrix whose elements are the exponential
of successive eigenvalues, and Q is a matrix of the eigenvectors of A.

Example 3.29 Consider the matrix


 
0.1 0.1
A= .
0.0 0.2
In order to find the value of the exponential of the given matrix A, we
have to find the eigenvalues of A and also the eigenvectors of A. The
eigenvalues of A are 0.1 and 0.2, and the corresponding eigenvectors
are [1, 0]T and [1, 1]T . Then the matrix Q is
 
1 1
Q= .
0 1
Its inverse can be found as
 
−1 1 −1
Q = .
0 1
Thus,
   
A −1 1 1 e0.1 0 1 −1
e = Q(expΛ)Q = ,
0 1 0 e0.2 0 1
which gives
eA = [ 1.105 0.116; 0.000 1.221 ].

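The same value can be obtained in MATLAB either from the eigen-decomposition or with the built-in matrix exponential expm (a sketch for comparison):

>> A = [0.1 0.1; 0.0 0.2];
>> [Q, L] = eig(A);
>> Q * diag(exp(diag(L))) / Q    % e^A built from the eigenvalue decomposition
>> expm(A)                       % MATLAB's matrix exponential gives the same matrix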

In the following section we discuss some very important results concerning


eigenvalue problems. The proofs of all the results are beyond the scope of
this text and will be omitted. However, they are very easily understood
and can be used. We shall discuss these results by using the different ma-
trices.

3.5 Some Results of Eigenvalues Problems


1. If A is a Hermitian matrix, then
kAk2 = [ρ(AH A)]1/2 = ρ(A).

Example 3.30 Consider the Hermitian matrix


 
0 ᾱ
A= .
α 0

Then the characteristic equation of A is

λ2 − αᾱ = λ2 − |α|2 = 0,

which implies that


λ = |α| = ρ(A).
Also,     
H 0 ᾱ 0 ᾱ αᾱ 0
A A= = ,
α 0 α 0 0 αᾱ
and the characteristic equation of AH A is

(αᾱ − λ)2 = 0,

which implies that

kAk2 = [ρ(AH A)]1/2 = |α| = ρ(A).



2. For an arbitrary nonsingular matrix A


1/ρ(A−1 ) ≥ mini ( |aii | − Σj≠i |aij | ).          (3.35)

Example 3.31 Consider the matrix


A = [ 0 √3; √3 2 ].

To satisfy the above relation (3.35), first, we compute the inverse of


the given matrix A, which is
 √ 
2 3
 3 3 
A−1 =  √ .
 
3
 
0
3
Now to find the eigenvalues of the above inverse matrix, we solve the
characteristic equation of the form as follows:

det(A−1 − λI) = det [ −2/3 − λ  √3/3; √3/3  −λ ] = λ2 + (2/3)λ − 1/3 = 0,
or equivalently 3λ2 + 2λ − 1 = 0,

which gives the eigenvalues −1 and 1/3 of A−1 . Hence, the spectral radius
of the matrix A−1 is

ρ(A−1 ) = max{| − 1|, |1/3|} = 1.
Thus,
1 > min{−1.7321, 0.2679},
which satisfies the relation (3.35). •

3. Let A be a symmetric matrix with k.k = k.k2 , then


cond A = (largest |λi |) / (smallest |λi |).

It is a well-known theorem, called the spectral radius theorem,


and it shows that for a symmetric matrix A, ill-conditioning corre-
sponds to A having eigenvalues of both large and small magnitude.
It is most commonly used to define the condition number of a ma-
trix. As we discussed in Chapter 2, a matrix is ill-conditioned if its
condition number is large. Strictly speaking, a matrix has many con-
dition numbers, and the word “large” is not itself well defined in this
context. To have an idea of what “large” means is to deal with the
Hilbert matrix. For example, in dealing with a 3 × 3 Hilbert matrix,
we have

>> A = hilb(3);
A=
1.0000 0.5000 0.3333
0.5000 0.3333 0.2500
0.3333 0.2500 0.2000

and one can find the condition number of this Hilbert matrix as

>> cond(A)
ans = .
524.0568

By adapting this result, we can easily confirm that the condition


numbers of Hilbert matrices increase rapidly as the sizes of the ma-
trices increase. Large Hilbert matrices are therefore considered to be
extremely ill-conditioned.

Example 3.32 Find the conditioning of the matrix


 √ 
0 3
A= √ .
3 2

Solution. The following



det(A − λI) = det [ −λ  √3; √3  2 − λ ] = 0
gives a characteristic equation
λ2 − 2λ − 3 = 0.
Solving the above equation gives the solutions 3 and −1, which are
the eigenvalues of matrix A. Thus, the largest eigenvalue of
A in absolute value is 3, and the smallest is 1. Hence, the condition number of
matrix A is
3
cond A = = 3.
1
Since 3 is of the order of magnitude of 1, A is well-conditioned. •
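The condition number can also be computed directly (a sketch; cond uses the 2-norm by default, which for a symmetric matrix equals the ratio of the extreme eigenvalue magnitudes):

>> A = [0 sqrt(3); sqrt(3) 2];
>> cond(A)                                % 3
>> max(abs(eig(A))) / min(abs(eig(A)))    % 3, the same ratio from the eigenvalues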

4. Let A be a nonsymmetric matrix A, with k.k = k.k2 , then


cond A = [ (largest eigenvalue |λi | of AT A) / (smallest eigenvalue |λi | of AT A) ]1/2 .
Example 3.33 Find the conditioning of the matrix
 
4 5
A= .
3 2

Solution. Since

det(AT A − λI) = det [ 25 − λ  26; 26  29 − λ ] = 0,
solving the above equation gives
λ2 − 54λ + 49 = 0.
The solutions 53.08 and 0.92 of the above equation are called the
eigenvalues of the matrix AT A. Thus, the conditioning of the given
matrix can be obtained as
cond A = (53.08/0.92)1/2 ≈ 7.6,
which shows that the given matrix A is not ill-conditioned. •

3.6 Applications of Eigenvalue Problems


Here, we will deal with two important applications of eigenvalues and eigen-
vectors. They are systems of differential equations and difference equations.
The techniques used in these applications are important in science and en-
gineering. One should master them and be able to use them whenever the
need arises. We first introduce the idea of a system of differential equations.

3.6.1 System of Differential Equations


Solving a variety of problems, particularly in science and engineering,
comes down to solving a differential equation or a system of differential
equations. Linear algebra is helpful in the formulation and solution of dif-
ferential equations. Here, we provide only a brief survey of the approach.

The general problem is to find differentiable functions f1 (t), f2 (t), . . . , fn (t)


that satisfy a system of equations of the form

f10 (t) = a11 f1 (t) + a12 f2 (t) + · · · + a1n fn (t)


f20 (t) = a21 f1 (t) + a22 f2 (t) + · · · + a2n fn (t)
.. .. .. .. .. (3.36)
. . . . .
0
fn (t) = an1 f1 (t) + an2 f2 (t) + · · · + ann fn (t),

where the aij are known constants. This is called a linear system of differ-
ential equations. To write (3.36) in matrix form, we have
     
f1 (t) f10 (t) a11 a12 · · · a1n
 f2 (t) 
0
 f20 (t)   a21 a22 · · · a2n 
f (t) =  , f (t) =  , A= ..  .
     
.. .. .. .. ..
 .   .   . . . . 
fn (t) fn0 (t) an1 an2 · · · ann

Then the system (3.36) can be written as

f 0 (t) = Af (t). (3.37)



With this notation, an n − vector function


 
f1 (t)
 f2 (t) 
f (t) =  ..
 

 . 
fn (t)
satisfying (3.37) is called a solution to the given system. It can be shown
that the set of all solutions to the linear system of differential equations
(3.36) is a subspace of the vector space of differentiable real-valued n−
vector functions. One can also easily verify that if f (1) (t), f (2) (t), . . . , f (n) (t)
are all solutions to (3.37), then
f (t) = c1 f (1) (t) + c2 f (2) (t) + · · · + cn f (n) (t) (3.38)
is also a solution to (3.37).

A set of vector functions {f (1) (t), f (2) (t), . . . , f (n) (t)} is said to be a fun-
damental system for (3.36) if every solution to (3.36) can be written in the
form (3.38). In this case, the right side of (3.38), where c1 , c2 , . . . , cn are
arbitrary constants, is said to be the general solution to (3.37).

If the general solution to (3.38) is known, then the initial-value problem


can be solved by setting t = 0 in (3.38) and determining the constants
c1 , c2 , . . . , cn so that
f0 = f (0) = c1 f (1) (0) + c2 f (2) (0) + · · · + cn f (n) (0), (3.39)
where f0 is a given vector, called an initial condition. It is easily seen that
this is actually an n × n linear system with unknowns c1 , c2 , . . . , cn . This
linear system can also be written as
Bc = f0 , (3.40)
where  
c1
 c2 
c= ,
 
..
 . 
cn

and B is the n × n matrix whose columns are f (1) (0), f (2) (0), . . . , f (n) (0),
respectively.

Note that, if f (1) (t), f (2) (t), . . . , f (n) (t) form a fundamental system for
(3.36), then B is nonsingular, so (3.40) always has a unique solution.

Example 3.34 The simplest system of the form (3.36) is the single equa-
tion
df
= αf, (3.41)
dt
where α is a constant. The general solution to (3.41) is

f (t) = ceαt . (3.42)

To get the particular solution to (3.41), we have to solve the initial-value


problem
df
= αf, f (0) = f0
dt
and set t = 0 in (3.42) and get c = f0 . Thus, the solution to the initial-
value problem is
f (t) = f0 eαt .

The system (3.37) is said to be diagonal if the matrix A is diagonal.


The system (3.36) can be rewritten as

f10 (t) = a11 f1 (t)


f20 (t) = a22 f2 (t)
.. (3.43)
.
fn0 (t) = ann fn (t).

The solution of the system (3.43) can be found easily as

f1 (t) = c1 ea11 t
f2 (t) = c2 ea22 t
.. .. (3.44)
. .
fn (t) = cn eann t ,

where c1 , c2 , . . . , cn are arbitrary constants. Writing (3.44) in vector form


yields
       
f (t) = (c1 ea11 t , c2 ea22 t , . . . , cn eann t )T
      = c1 (1, 0, . . . , 0)T ea11 t + c2 (0, 1, . . . , 0)T ea22 t + · · · + cn (0, 0, . . . , 1)T eann t .

This implies that the vector functions


     
f (1) (t) = (1, 0, . . . , 0)T ea11 t ,  f (2) (t) = (0, 1, . . . , 0)T ea22 t ,  . . . ,  f (n) (t) = (0, 0, . . . , 1)T eann t

form a fundamental system for the diagonal system (3.43).


Example 3.35 Find the general solution to the diagonal system
 0    
f1 (t) 2 0 0 f1
 f20 (t)  =  0 5 0   f2  .
f30 (t) 0 0 1 f3
Solution. The given system can be written as three equations of the form
f10 (t) = 2f1 (t)
f20 (t) = 5f2 (t)
f30 (t) = f3 (t).
Solving these equations, we get

f1 (t) = c1 e2t , f2 (t) = c2 e5t , f3 (t) = c3 et ,

where c1 , c2 , . . . , cn are arbitrary constants. Thus,


       
c1 e2t 1 0 0
f (t) =  c2 e 5t 
= c1 0 e + c2 1 e + c3 0  et
  2t   5t 
t
c3 e 0 0 1

is the general solution to the given system of differential equations, and the
functions
     
1 0 0
f (1) (t) =  0  e2t , f (2) (t) =  1  e5t , f (3) (t) =  0  et
0 0 1
form a fundamental system for the given diagonal system. •
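For a diagonal system the solution can also be evaluated numerically; the following sketch uses MATLAB's built-in expm (not used elsewhere in this section) with an arbitrary choice of the constants and of t:

>> A = diag([2 5 1]);
>> c = [1; 1; 1];              % arbitrary constants c1, c2, c3
>> t = 0.5;
>> expm(A*t) * c               % equals [c1*exp(2*t); c2*exp(5*t); c3*exp(t)]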
If the system (3.37) is not diagonal, then there is an extension of the
method discussed in the preceding example that yields the general solution
in the case where A is diagonalizable. Suppose that A is diagonalizable and
Q is a nonsingular matrix such that
D = Q−1 AQ, (3.45)
where D is diagonal. Then by multiplying Q−1 to the system (3.37), we
get
Q−1 f 0 (t) = Q−1 Af (t)
or
Q−1 f 0 (t) = Q−1 AQQ−1 f (t) = (Q−1 AQ)(Q−1 f (t)), (3.46)
−1
(since QQ = In ). Let
g(t) = Q−1 f (t), (3.47)
and by taking its derivative, we have
g0 (t) = Q−1 f 0 (t). (3.48)
Since Q−1 is a constant matrix, using (3.48) and (3.45), we can write (3.46)
as
g0 (t) = Dg(t). (3.49)
Then the system (3.49) is a diagonal system and can be solved by the method just discussed. Since D is a diagonal matrix whose diagonal elements are the eigenvalues λ_1, λ_2, \ldots, λ_n of A, it can be written as

D = \begin{bmatrix} λ_1 & 0 & \cdots & 0 \\ 0 & λ_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & λ_n \end{bmatrix}.

The columns of Q are linearly independent eigenvectors of A associated,


respectively, with λ_1, λ_2, \ldots, λ_n. Thus, the general solution to (3.49) is

g(t) = [c_1 e^{λ_1 t}, c_2 e^{λ_2 t}, \ldots, c_n e^{λ_n t}]^T = c_1 g^{(1)}(t) + c_2 g^{(2)}(t) + \cdots + c_n g^{(n)}(t),

where

g^{(1)}(t) = [1, 0, \ldots, 0]^T e^{λ_1 t},   g^{(2)}(t) = [0, 1, \ldots, 0]^T e^{λ_2 t},   \ldots,   g^{(n)}(t) = [0, 0, \ldots, 1]^T e^{λ_n t},        (3.50)
and c1 , c2 , . . . , cn are arbitrary constants. The system (3.47) can also be
written as
f (t) = Qg(t). (3.51)
So the general solution to the given system (3.37) is

f (t) = Qg(t) = c1 Qg(1) (t) + c2 Qg(2) (t) + · · · + cn Qg(n) (t). (3.52)

However, since the constant vectors in (3.50) are the columns of the
identity matrix and QIn = Q, (3.52) can be rewritten as

f (t) = c1 q1 eλ1 t + c2 q2 eλ2 t + · · · + cn qn eλn t , (3.53)

where q1 , q2 , . . . , qn are the columns of Q, and are therefore, the eigenvec-


tors of A associated with the eigenvalues λ1 , λ2 , . . . , λn , respectively.
Theorem 3.19 Consider a linear system

f 0 (t) = Af (t)

of differential equations, where A is an n × n diagonalizable matrix. Let


Q−1 AQ be diagonal, where Q is given in terms of its columns

Q = [q1 , q2 , . . . , qn ],

and q1 , q2 , . . . , qn are independent eigenvectors of A. If qi corresponds to


the eigenvalues λi for each i, then

{q1 eλ1 t , q2 eλ2 t , . . . , qn eλn t }

is a basis for the space of solutions of f 0 (t) = Af (t). •

Example 3.36 Find the general solution to the system

f10 = 3f1 − f2 − f3
0
f2 = −12f1 + 5f3
0
f3 = 4f1 − 2f2 − f3 .

Then find the solution to the initial-value problem determined by the


given initial conditions

f1 (0) = 1, f2 (0) = 4, f3 (0) = 3.

Solution. Writing the given system in (3.37) form, we obtain


 0    
f1 3 −1 −1 f1
f 0 =  f20  =  −12 0 5   f2  = Af .
0
f3 4 −2 −1 f3

The characteristic polynomial of A is

0 = f (λ) = det(A − λI) = −λ3 + 2λ2 + λ − 2

or
0 = f (λ) = −(λ + 1)(λ − 1)(λ − 2).
So the eigenvalues of A are λ1 = −1, λ2 = 1, and λ3 = 2, and the associated
eigenvectors are
 1     
3 1
 2   −1  ,  −1  ,
 1 ,
1 7 2
404 Applied Linear Algebra and Optimization using MATLAB

respectively. The general solution is then given by


 1     
3 1
f = c1  21  e−t + c2  −1  et + c3  −1  e2t ,
 

1 7 2

where c_1, c_2, c_3 are arbitrary constants.

Now we write our general solution in the form (3.51) as

f(t) = [f_1(t), f_2(t), f_3(t)]^T = \begin{bmatrix} 1/2 & 3 & 1 \\ 1 & −1 & −1 \\ 1 & 7 & 2 \end{bmatrix} \begin{bmatrix} c_1 e^{−t} \\ c_2 e^{t} \\ c_3 e^{2t} \end{bmatrix}.

Now taking t = 0, we obtain

f(0) = \begin{bmatrix} 1 \\ 4 \\ 3 \end{bmatrix} = \begin{bmatrix} 1/2 & 3 & 1 \\ 1 & −1 & −1 \\ 1 & 7 & 2 \end{bmatrix} \begin{bmatrix} c_1 e^{0} \\ c_2 e^{0} \\ c_3 e^{0} \end{bmatrix}

or

\begin{bmatrix} 1/2 & 3 & 1 \\ 1 & −1 & −1 \\ 1 & 7 & 2 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 4 \\ 3 \end{bmatrix}.

Solving this system for c1 , c2 , and c3 using the Gauss elimination method,
we obtain
c1 = 2, c2 = 1, c3 = −3.
Therefore, the solution to the initial-value problem is

f(t) = 2 [1/2, 1, 1]^T e^{−t} + [3, −1, 7]^T e^{t} − 3 [1, −1, 2]^T e^{2t}.
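The same computation can also be carried out numerically in MATLAB. The following short sketch is our own illustration, not one of the book's numbered programs, and the variable names are ours: it diagonalizes A with eig, solves Qc = f(0) for the constants, and evaluates f(t) at a chosen time.

% Sketch: solve f'(t) = A f(t), f(0) = f0, for a diagonalizable A.
A  = [3 -1 -1; -12 0 5; 4 -2 -1];
f0 = [1; 4; 3];
[Q, D] = eig(A);            % columns of Q are eigenvectors of A
c = Q\f0;                   % solve Q*c = f(0) for the constants
t = 1.0;                    % any time of interest
f = Q*(c.*exp(diag(D)*t))   % f(t) = c_1 q_1 e^(lambda_1 t) + ...

Note that eig scales the eigenvectors differently from the hand computation above, but the constants c adjust accordingly, so f(t) is the same.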



3.6.2 Difference Equations


It often happens that a problem can be solved by finding a sequence of num-
bers a0 , a1 , a2 , . . . , where the first few are known, and subsequent numbers
are given in terms of earlier values.

Let a0 , a1 , a2 , . . . be a sequence of real numbers. Such a sequence may


be defined by giving its nth term. For example, suppose

an = n3 + 2. (3.54)

Letting n = 0, 1, 2, . . . , we get the terms of the sequence (3.54) as

a0 = 2, a1 = 3, a2 = 10, . . . .

Furthermore, any specific term of the sequence (3.54) can be found.


For example, if we want a10 , then we let n = 10 in (3.54) and get

a10 = (10)3 + 2 = 1000 + 2 = 1002.

When sequences arise in applications, they are often initially defined by


a relationship between consecutive terms, with some initial terms known,
rather than by the nth term. For example, a sequence might be defined by the relationship

an = 2an−1 + 8an−2 , n = 2, 3, 4, . . . , (3.55)

where a0 = 1 and a1 = 2. Such an equation is called a difference equation


(or recurrence relation), and the given terms of the sequence are called
initial conditions. Further terms of the sequence can be found from the
difference equation and initial conditions. For example, letting n = 2, 3,
and 4, we obtain

a2 = 2a1 + 8a0 = 2(2) + 8(1) = 12


a3 = 2a2 + 8a1 = 2(12) + 8(2) = 40
a4 = 2a3 + 8a2 = 2(40) + 8(12) = 176.

Thus, the sequence is 1, 2, 12, 40, 176, . . . .



However, if we want to find a specific term such as the 20th term of


the sequence (3.55), this method of using the difference equation to first
find all the preceding terms is impractical. Here we need an expression for
the nth term of the sequence. The expression for the nth term is called the
solution to the difference equation. Now we discuss how the tools of linear
algebra can be used to solve this problem.

Consider the difference equation of the form

an = pan−1 + qan−2 , n = 2, 3, 4, . . . , (3.56)

where p and q are fixed numbers and a0 and a1 are the given initial con-
ditions. This equation is called a linear difference equation (because each
ai appears to the first power) and is of order 2 (because an is expressed
in terms of two preceding terms an−1 and an−2 ). To solve (3.56), let us
introduce a relation bn = an−1 . So we get the system

an = pan−1 + qbn−1
bn = an−1 , n = 2, 3, 4, . . . . (3.57)

To write (3.57) in matrix form, we obtain


    
an p q an−1
= , n = 2, 3, 4, . . . . (3.58)
bn 1 0 bn−1
Let    
an p q
Vn = and A = ,
bn 1 0
then
Vn = AVn−1 , n = 2, 3, 4, . . . . (3.59)
Thus,
Vn = AVn−1 = A2 Vn−2 = A3 Vn−3 = · · · = An−1 V1 , (3.60)
where

V_1 = \begin{bmatrix} a_1 \\ b_1 \end{bmatrix} = \begin{bmatrix} a_1 \\ a_0 \end{bmatrix}.
In most applications, a matrix A has distinct eigenvalues λ1 and λ2 , and
the corresponding linearly independent eigenvectors can be diagonalized

using a similarity transformation. Let Q be a matrix whose columns are


linearly independent eigenvectors of A and let

D = Q^{−1} A Q = \begin{bmatrix} λ_1 & 0 \\ 0 & λ_2 \end{bmatrix}.        (3.61)

Then

A^{n−1} = (Q D Q^{−1})^{n−1} = (Q D Q^{−1})(Q D Q^{−1}) \cdots (Q D Q^{−1}) = Q D^{n−1} Q^{−1}

or

A^{n−1} = Q \begin{bmatrix} λ_1 & 0 \\ 0 & λ_2 \end{bmatrix}^{n−1} Q^{−1} = Q \begin{bmatrix} (λ_1)^{n−1} & 0 \\ 0 & (λ_2)^{n−1} \end{bmatrix} Q^{−1}.

This gives

V_n = Q \begin{bmatrix} (λ_1)^{n−1} & 0 \\ 0 & (λ_2)^{n−1} \end{bmatrix} Q^{−1} V_1.        (3.62)
Example 3.37 Solve the difference equation

an = 2an−1 + 8an−2 , n = 2, 3, 4, . . .

using initial conditions a0 = 1 and a1 = 2. Use the solution to find a20 .

Solution. Construct the system

an = 2an−1 + 8bn−1
bn = an−1 , n = 2, 3, 4, . . . .

The matrix form of the system is

\begin{bmatrix} a_n \\ b_n \end{bmatrix} = \begin{bmatrix} 2 & 8 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} a_{n−1} \\ b_{n−1} \end{bmatrix},   n = 2, 3, 4, \ldots.

Since the matrix A is

A = \begin{bmatrix} 2 & 8 \\ 1 & 0 \end{bmatrix},

and its eigenvalues can be obtained by solving the characteristic polynomial

p(λ) = λ2 − 2λ − 8 = (λ − 4)(λ + 2) = 0,

which gives the eigenvalues of A

λ1 = 4 and λ2 = −2,

one can easily find the eigenvectors of A corresponding to the eigenvalues


λ_1 and λ_2 as

v_1 = \begin{bmatrix} 2 \\ 1/2 \end{bmatrix}   and   v_2 = \begin{bmatrix} −1 \\ 1/2 \end{bmatrix},
respectively. Consider

Q = \begin{bmatrix} 2 & −1 \\ 1/2 & 1/2 \end{bmatrix},
and its inverse can be obtained as

Q^{−1} = \begin{bmatrix} 1/3 & 2/3 \\ −1/3 & 4/3 \end{bmatrix}.
Then

\begin{bmatrix} a_n \\ b_n \end{bmatrix} = Q \begin{bmatrix} (λ_1)^{n−1} & 0 \\ 0 & (λ_2)^{n−1} \end{bmatrix} Q^{−1} \begin{bmatrix} a_1 \\ b_1 \end{bmatrix},
which is

\begin{bmatrix} a_n \\ b_n \end{bmatrix} = \begin{bmatrix} 2 & −1 \\ 1/2 & 1/2 \end{bmatrix} \begin{bmatrix} (4)^{n−1} & 0 \\ 0 & (−2)^{n−1} \end{bmatrix} \begin{bmatrix} 1/3 & 2/3 \\ −1/3 & 4/3 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \end{bmatrix}

(b1 = a0 = 1). After simplifying, we obtain


 
\begin{bmatrix} a_n \\ b_n \end{bmatrix} = \begin{bmatrix} \frac{2}{3}(4)^n + \frac{1}{3}(−2)^n \\ \frac{2}{3}(4)^{n−1} + \frac{1}{3}(−2)^{n−1} \end{bmatrix}.
Thus, the solution is

a_n = \frac{2}{3}(4)^n + \frac{1}{3}(−2)^n,   n = 2, 3, 4, \ldots,
which gives a0 = 1 and a1 = 2 by taking n = 0, 1 and it agrees with the
given initial conditions. Now taking n = 20, we get

a_{20} = \frac{2}{3}(4)^{20} + \frac{1}{3}(−2)^{20} = \frac{2}{3}(1099511627776) + \frac{1}{3}(1048576),
so
a20 = 733008101376. •
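The same diagonalization can be carried out in MATLAB. The lines below are a minimal sketch of formula (3.62), our own illustration rather than one of the book's numbered programs:

% Sketch: a_n = 2a_(n-1) + 8a_(n-2), a_0 = 1, a_1 = 2; evaluate a_20.
p = 2; q = 8; a0 = 1; a1 = 2;
A = [p q; 1 0];
[Q, D] = eig(A);            % D = diag(4, -2), up to ordering
n = 20;
Vn = Q*D^(n-1)/Q*[a1; a0];  % V_n = Q D^(n-1) Q^(-1) V_1, as in (3.62)
an = Vn(1)                  % returns 733008101376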

3.7 Summary
In this chapter we discussed the approximation of eigenvalues and eigen-
vectors. We discussed similar, unitary, and diagonalizable matrices. The
set of diagonalizable matrices includes matrices with n distinct eigenvalues
and symmetric matrices. Matrices that are not diagonalizable are some-
times referred to as defective matrices.

We discussed the Cayley–Hamilton theorem for finding the power and


inverse of a matrix. We also discussed the Sourian–Frame theorem, Bocher’s
theorem, and the Faddeev–Leverrier theorem for computing the coefficients
of the characteristic polynomial p(λ) of a matrix A. There are no restric-
tions on A. In theory, the eigenvalues of A can be obtained by factoring
p(λ) using polynomial-root-finding techniques. However, this approach is
practical only for relatively small values of n. The chapter closed with two
important applications.

3.8 Problems
1. Find the characteristic polynomial, eigenvalues, and eigenvectors of
each matrix:
     
(a) \begin{bmatrix} 3 & 2 & 1 \\ 2 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix},  (b) \begin{bmatrix} −2 & 1 & 1 \\ −6 & 1 & 3 \\ −12 & −2 & 8 \end{bmatrix},  (c) \begin{bmatrix} 2 & −1 & −1 \\ −1 & 2 & −1 \\ −1 & −1 & 2 \end{bmatrix},
(d) \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix},  (e) \begin{bmatrix} 3 & 2 & −2 \\ −3 & −1 & 3 \\ 1 & 2 & 0 \end{bmatrix},  (f) \begin{bmatrix} 4 & 3 & 2 & 1 \\ 3 & 3 & 2 & 1 \\ 2 & 2 & 2 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}.
2. Determine whether each of the given sets of vectors is linearly depen-
dent or independent:

(a) (−3, 4, 2), (7, −1, 3), and (1, 1, 8).


(b) (1, 0, 2), (2, 6, 4), and (1, 12, 2).
(c) (1, −2, 1, 1), (3, 0, 2, −2), (0, 4, −1, 1), and (5, 0, 3, −1).
(d) (3, −2, 4, 5), (0, 2, 3, −4), (0, 0, 2, 7), and (0, 0, 0, 4).
3. Determine whether each of the following sets {p1 , p2 , p3 } of functions
is linearly dependent or independent:

(a) p1 = 3x2 − 1, p2 = x2 + 2x − 1, p3 = x2 − 4x + 3.
(b) p1 = x2 + 5x + 12, p2 = 3x2 + 5x − 3, p3 = 4x2 − 3x + 7.
(c) p1 = 2x2 − 8x + 9, p2 = 6x2 + 13x − 22, p3 = 4x2 − 11x + 2.
(d) p1 = −2x2 + 3x, p2 = 7x2 − 5x − 10, p3 = −3x2 + 9x − 13.
4. For what values of k are the following vectors in R3 linearly indepen-
dent?
(a) (−1, 0, −1), (2, 1, 2), and (1, 1, k).
(b) (1, 2, 3), (2, −1, 4), and (3, k, 4).
(c) (2, k, 1), (1, 0, 1), and (0, 1, 3).
(d) (k, 1/2, 1/2), (1/2, k, 1/2), and (1/2, 1/2, k).
5. Show that the vectors (1, a, a^2), (1, b, b^2), and (1, c, c^2) are linearly independent if a ≠ b, a ≠ c, b ≠ c.

6. Determine whether each of the given matrices is diagonalizable:


     
(a) \begin{bmatrix} 1 & 1 & −4 \\ 2 & 0 & −4 \\ −1 & 1 & −2 \end{bmatrix},  (b) \begin{bmatrix} 1 & 1 & 0 \\ 3 & 0 & 3 \\ 2 & −1 & 3 \end{bmatrix},  (c) \begin{bmatrix} −46 & 6 & 18 \\ 0 & 2 & 0 \\ −120 & 15 & 47 \end{bmatrix},
(d) \begin{bmatrix} −5 & −25 & 6 \\ 10 & 12 & 9 \\ −3 & −9 & 4 \end{bmatrix},  (e) \begin{bmatrix} 10 & 11 & 3 \\ −3 & −4 & −3 \\ −8 & −8 & 11 \end{bmatrix},  (f) \begin{bmatrix} 1 & 0 & 0 & 2 \\ 0 & 1 & 3 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix}.

7. Find a 3 × 3 nondiagonal matrix whose eigenvalues are −2, −2, and 3, and whose associated eigenvectors are
     
1 0 1
 0 ,  1 ,  1 .
1 1 1

8. Find a nonsingular matrix Q such that Q−1 AQ is a diagonal matrix,


using Problem 1.

9. Find the formula for the kth power of each matrix considered in Prob-
lem 5, and then compute A5 .

10. Show that the following matrices are similar:


   
1 0 0 1 0 0
A= 0 1 0  and B =  1 1 0 .
1 0 1 0 0 1

11. Consider the diagonalizable matrix


 
A = \begin{bmatrix} 1 & 0 & 0 & 2 \\ 0 & 1 & 3 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix}.

Find a formula for the kth power of the matrix and compute A10 .

12. Prove that

(a) The matrix A is similar to itself.


(b) If A is similar to B, then B is also similar to A.
(c) If A is similar to B, and B is similar to C, then A is similar
to C.
(d) If A is similar to B, then det(A) = det(B).
(e) If A is similar to B, then A2 is similar to B 2 .
(f ) If A is noninvertible and B is similar to A, then B is
also noninvertible.

13. Find a diagonal matrix that is similar to the given matrix:


     
3 −1 −1 −5 0 0 −3 2 −3
(a)  −12 0 5  , (b)  −4 0 −4  , (c)  1 −2 −3  .
4 −2 −1 −7 −7 0 1 −1 3

14. Show that each of the given matrices is not diagonalizable:


     
10 11 3 3 3 3 2 0 0
(a)  −3 −4 −3 , (b)  2 2 2  , (c)  3 2 0  .
−8 −8 −1 1 1 1 0 0 5

15. Find the orthogonal transformations matrix Q to reduce the given


matrices to diagonal matrices:
     
1 0 0 −1 2 2 3 2 2
(a)  0 0 1  , (b)  2 −1 2  , (c)  2 2 0  ,
0 1 0 2 2 −1 2 0 4
     
8 −1 1 5 −2 −4 −2 3 0
(d)  −1 8 1  , (e)  −2 8 −2  , (f )  3 4 0  .
1 1 8 −4 −2 5 0 0 2

16. Find the characteristic polynomial and inverse of each of the matrices
considered in Problem 5 by using the Cayley–Hamilton theorem.

17. Use the Cayley–Hamilton theorem to compute the characteristic poly-


nomial, the powers A^3, A^4, and the inverse matrices A^{−1}, A^{−2} for each of the given matrices:

(a) \begin{bmatrix} 2 & 3 \\ −4 & 5 \end{bmatrix},  (b) \begin{bmatrix} 2 & −2 & 1 \\ −2 & 4 & 2 \\ 1 & 2 & 3 \end{bmatrix},  (c) \begin{bmatrix} 1 & 1 & 0 \\ −1 & 0 & 1 \\ −2 & 1 & 0 \end{bmatrix}.

18. Find the characteristic polynomial and inverse for each of the follow-
ing matrices by using the Sourian–Frame theorem:
     
(a) \begin{bmatrix} 5 & 0 & 0 \\ 2 & 1 & 2 \\ 0 & 1 & 1 \end{bmatrix},  (b) \begin{bmatrix} 2 & 2 & 1 \\ −2 & 1 & 2 \\ 1 & −2 & 2 \end{bmatrix},  (c) \begin{bmatrix} 1 & 1 & −1 \\ 3 & 1 & 0 \\ 1 & −2 & 1 \end{bmatrix},
(d) \begin{bmatrix} 1 & 2 & 1 \\ 1 & 2 & 0 \\ 1 & 4 & 2 \end{bmatrix},  (e) \begin{bmatrix} 4 & 0 & 0 & 2 \\ 0 & 3 & 0 & −1 \\ 1 & 0 & 2 & 2 \\ 0 & 0 & 0 & 2 \end{bmatrix},  (f) \begin{bmatrix} 1 & 0 & 2 & −1 & 4 \\ 5 & 3 & −1 & 0 & 1 \\ 8 & 5 & −3 & −1 & 4 \\ 6 & 2 & 0 & 0 & 1 \\ 0 & 1 & 4 & 2 & 0 \end{bmatrix}.

19. Find the characteristic polynomial and inverse of the following matrix
by using the Sourian–Frame theorem:
 
1 0 0 0
 −2 1 0 0 
A=
 5 −4 1
.
0 
−5 3 0 1

20. Find the characteristic polynomial and inverse of each of the given
matrices considered in Problem 11 by using the Sourian–Frame the-
orem.

21. Use Bocher’s formula to find the coefficients of the characteristic


equation of each of the matrices considered in Problem 1.

22. Find the characteristic equation, determinant, adjoint, and inverse


of each of the given matrices using the Faddeev–Leverrier method:
 
    1 −3 0 −2
3 5 0 5 5 5  3 −12 −2 −6 
(a)  4 −2 1  , (b)  2 10 1  , (c) 
 −2
.
10 2 5 
6 −3 4 6 3 −9
−1 6 1 3

23. Find the exponential of each of the matrices considered in Problem


1.

24. Find the general solution to the following system. Then find the
solution to the initial-value problem determined by the given initial
condition.
f_1' = −f_1 + 5f_2
f_2' = f_1 + 3f_2
f_1(0) = 1,  f_2(0) = −1.

25. Find the general solution to the following system. Then find the
solution to the initial-value problem determined by the given initial
condition.
f_1' = 9f_1 + 2f_2
f_2' = 3f_1 + 5f_2
f_1(0) = 2,  f_2(0) = 4.

26. Find the general solution to each of the following systems. Then find
the solution to the initial-value problem determined by the given ini-
tial condition.

(a)
f_1' = 2f_1 + f_2 + 2f_3
f_2' = 2f_1 + 2f_2 − 2f_3
f_3' = 3f_1 + f_2 + f_3
f_1(0) = 1,  f_2(0) = 1,  f_3(0) = 1.

(b)
f_1' = 3f_1 + 2f_2 + 3f_3
f_2' = −f_1 + f_2 + 4f_3
f_3' = f_1 + f_2 + 2f_3
f_1(0) = 4,  f_2(0) = 2,  f_3(0) = −1.

(c)
f_1' = 5f_1 + 8f_2 + 16f_3
f_2' = 4f_1 + f_2 + 8f_3
f_3' = −4f_1 − 4f_2 − 11f_3
f_1(0) = 1,  f_2(0) = 1,  f_3(0) = 1.

(d)
f_1' = 2f_1 − 3f_2 − 2f_3
f_2' = −2f_1 + 3f_3
f_3' = f_1 − 6f_2 + 9f_3
f_1(0) = 3,  f_2(0) = 2,  f_3(0) = 1.

27. Find the general solution to each of the following systems. Then find
the solution to the initial-value problem determined by the given ini-
tial condition.

(a)
f_1' = 5f_1
f_2' = 7f_2
f_3' = 6f_3
f_1(0) = 1,  f_2(0) = 2,  f_3(0) = 3.

(b)
f_1' = 3f_1 + 2f_3
f_2' = 4f_1 − 3f_2 + f_3
f_3' = 2f_1 + 2f_2 + 4f_3
f_1(0) = 3,  f_2(0) = 3,  f_3(0) = 3.

(c)
f_1' = 2f_1 + f_2 + 2f_3
f_2' = f_1 + 5f_2 + 5f_3
f_3' = 2f_1 + f_2 + f_3
f_1(0) = −1,  f_2(0) = 2,  f_3(0) = 5.

(d)
f_1' = f_1 + f_3
f_2' = 2f_1 + f_2 + 3f_3
f_3' = f_1 + 3f_2 + 5f_3
f_1(0) = 2,  f_2(0) = 1,  f_3(0) = 6.

28. Solve each of the following difference equations and use the solution
to find the given term.

(a) an = an−1 + 2an−2 , a0 = 1, a1 = 2. F ind a20 .


(b) an = 4an−1 + 5an−2 , a0 = 0, a1 = 1. F ind a15 .
(c) an = 3an−1 + 4an−2 , a0 = 1, a1 = −1. F ind a15 .
(d) an = an−1 + 12an−2 , a0 = 2, a1 = 3. F ind a12 .

29. Solve each of the following difference equations and use the solution
to find the given term.

(a) an = 3an−1 + 4an−2 , a0 = −1, a1 = 1. F ind a15 .


(b) an = an−1 + 2an−2 , a0 = 2, a1 = 3. F ind a20 .
(c) an = 6an−1 + 7an−2 , a0 = 2, a1 = −2. F ind a12 .
(d) an = 4an−1 + 5an−2 , a0 = 3, a1 = 2. F ind a18 .
Chapter 4

Numerical Computation of
Eigenvalues

4.1 Introduction
The importance of the eigenvalues of a square matrix in a broad range of
applications has been amply demonstrated in the previous chapters. How-
ever, finding the eigenvalues and associated eigenvectors is not such an easy
task. At this point, the only method we have for computing the eigenvalues
of a matrix is to solve the characteristic equation. However, there are sev-
eral problems with this method that render it impractical in all but small
examples. The first problem is that it depends on the computation of a de-
terminant, which is a very time-consuming process for large matrices. The
second problem is that the characteristic equation is a polynomial equa-
tion, and there are no formulas for solving polynomial equations of degrees
higher than 4 (polynomials of degrees 2, 3, and 4 can be solved using the
quadratic formula and its analogues). Thus, we are forced to approximate

eigenvalues in most practical problems. We are in need of a completely


new idea if we have any hope of designing efficient numerical techniques.
Unfortunately, techniques for approximating the roots of a polynomial are
quite sensitive to round-off error and are therefore unreliable. Here, we
will discuss a few of the most basic numerical techniques for computing
eigenvalues and eigenvectors.

One class of techniques, called iterative methods, can be used to find


some or all of the eigenvalues and eigenvectors of a given matrix. They
start with an arbitrary approximation to one of the eigenvectors and suc-
cessively improve this until the required accuracy is obtained. Among
them is the power method of inverse iterations, which is used to find all of
the eigenvectors of a matrix from known approximations to its eigenvalues.

The other class of techniques, which can only be applied to symmetric


matrices, include the Jacobi, Given’s, and Householder’s methods, which
reduce a given symmetric matrix to a special form whose eigenvalues are
readily computed. For general matrices (symmetric or nonsymmetric ma-
trices), the QR method and the LR method are the most widely used tech-
niques for solving eigenvalue problems. Most of these procedures make use
of a series of similarity transformations.

4.2 Vector Iterative Methods for Eigenvalues


So far we have discussed classical methods for evaluating the eigenvalues
and eigenvectors for different matrices. It is evident that these methods
become impractical as the matrices involved become large. Consequently,
iterative methods are used for that purpose, such as the power methods.
These methods are an easy means to compute eigenvalues and eigenvectors
of a given matrix.

The power methods include three versions. First, is the regular power
method or simple iterations based on the power of a matrix. Second, is
the inverse power method which is based on the inverse power of a matrix.
Third, is the shifted inverse power method in which the given matrix A is

replaced by (A − µI) for any given scalar µ. Following, we discuss all of


these methods in some detail.

4.2.1 Power Method


The basic power method can be used to compute the eigenvalue of the largest modulus and the corresponding eigenvector of a general matrix.
The eigenvalue of the largest magnitude is often called the dominant eigen-
value. The implication of the power method is that if we assume a vector
xk , then a new vector xk+1 can be calculated. The new vector is normal-
ized by factoring its largest coefficient. This coefficient is then taken as
a first approximation to the largest eigenvalue, and the resulting vector
represents the first approximation to the corresponding eigenvector. This
process is continued by substituting the new eigenvector and determining
a second approximation until the desired accuracy is achieved.

Consider an n × n matrix A, then the eigenvalues and eigenvectors


satisfy

Avi = λi vi , (4.1)
where λi is the ith eigenvalue and vi is the corresponding ith eigenvector of
A. The power method can be used on both symmetric and nonsymmetric
matrices. If A is a symmetric matrix, then all the eigenvalues are real. If
A is a nonsymmetric, then there is a possibility that there is not a single
real dominant eigenvalue but a complex conjugate pair. Under these con-
ditions the power method does not converge. We assume that the largest
eigenvalue is real and not repeated and that eigenvalues are numbered in
increasing order, i.e.,

|λ1 | > |λ2 | ≥ |λ3 | · · · ≥ |λn−1 | ≥ |λn |. (4.2)


The power method starts with an initial guess for the eigenvector x0 ,
which can be any nonzero vector. The power method is defined by the
iteration

xk+1 = Axk , for k = 0, 1, 2, . . . , (4.3)



and it gives

x_1 = A x_0
x_2 = A x_1 = A^2 x_0
x_3 = A x_2 = A^3 x_0
  ⋮

Thus,

x_k = A^k x_0,   for k = 1, 2, \ldots.

The vector x0 is an unknown linear combination of all the eigenvectors


of the system, provided they are linearly independent. Thus,

x0 = α1 v1 + α2 v2 + · · · + αn vn .

Let

x_1 = A x_0 = A(α_1 v_1 + α_2 v_2 + \cdots + α_n v_n)
    = α_1 A v_1 + α_2 A v_2 + \cdots + α_n A v_n
    = α_1 λ_1 v_1 + α_2 λ_2 v_2 + \cdots + α_n λ_n v_n,

since from the definition of an eigenvector, A v_i = λ_i v_i. Similarly,

x_2 = A x_1 = A(α_1 λ_1 v_1 + α_2 λ_2 v_2 + \cdots + α_n λ_n v_n)
    = α_1 λ_1 A v_1 + α_2 λ_2 A v_2 + \cdots + α_n λ_n A v_n
    = α_1 λ_1^2 v_1 + α_2 λ_2^2 v_2 + \cdots + α_n λ_n^2 v_n.

Continuing in this way, we get

x_k = α_1 λ_1^k v_1 + α_2 λ_2^k v_2 + \cdots + α_n λ_n^k v_n,        (4.4)

which may be written as

x_k = A^k x_0 = λ_1^k \left[ α_1 v_1 + α_2 \left(\frac{λ_2}{λ_1}\right)^k v_2 + α_3 \left(\frac{λ_3}{λ_1}\right)^k v_3 + \cdots + α_n \left(\frac{λ_n}{λ_1}\right)^k v_n \right].

All of the terms except the first in the above relation (4.4) converge to the zero vector as k → ∞, since |λ_1| > |λ_i| for i ≠ 1. Hence,

x_k ≈ (λ_1^k α_1) v_1,   for large k, provided that α_1 ≠ 0.

Since λ_1^k α_1 v_1 is a scalar multiple of v_1, x_k = A^k x_0 will approach an eigenvector for the dominant eigenvalue λ_1, i.e.,

A x_k ≈ λ_1 x_k,

so if x_k is scaled so that its dominant component is 1, then

(dominant component of A x_k) ≈ λ_1 × (dominant component of x_k) = λ_1 × 1 = λ_1.

The rate of convergence of the power method is primarily dependent on the distribution of the eigenvalues; the smaller the ratios |λ_i/λ_1| (for i = 2, 3, \ldots, n), the faster the convergence; in particular, this rate depends upon the ratio |λ_2/λ_1|. The number of iterations required to get a desired degree of convergence depends upon both the rate of convergence and how large λ_1 is compared with the other λ_i, the latter depending, in turn, on the choice of the initial approximation x_0.

Example 4.1 Find the first five iterations obtained by the power method
applied to the following matrix using the initial approximation x0 = [1, 1, 1]T :
 
1 1 2
A =  1 2 1 .
1 1 0

Solution. Starting with an initial vector x0 = [1, 1, 1]T , we have


    
1 1 2 1 4.0000
Ax0 =  1 2 1   1  =  4.0000  ,
1 1 0 1 2.0000

which gives  
1.0000
Ax0 = λ1 x1 = 4.0000  1.0000  .
0.5000

Similarly, the other possible iterations are as follows:


    
1 1 2 1.0000 3.0000
Ax1 =  1 2 1   1.0000  =  3.5000 
1 1 0 0.5000 2.0000
 
0.8571
= 3.5000  1.0000  = λ2 x2 .
0.5714
    
1 1 2 0.8571 3.0000
Ax2 =  1 2 1   1.0000  =  3.4286 
1 1 0 0.5714 1.8571
 
0.8750
= 3.4286  1.0000  = λ3 x3 .
0.5417
    
1 1 2 0.8750 2.9583
Ax3 =  1 2 1   1.0000  =  3.4167 
1 1 0 0.5417 1.8750
 
0.8659
= 3.4167  1.0000  = λ4 x4 .
0.5488
    
1 1 2 0.8659 2.9634
Ax4 =  1 2 1   1.0000  =  3.4146 
1 1 0 0.5488 1.8659
 
0.8679
= 3.4146  1.0000  = λ5 x5 .
0.5464
Since the eigenvalues of the given matrix A are 3.4142, 0.5858 and
−1.0000, the approximation of the dominant eigenvalue after the five itera-
tions is λ5 = 3.4146, and the corresponding eigenvector is [0.8679, 1.0000, 0.5464]T .


To get the above results using the MATLAB Command Window, we


do the following:

>> A = [1 1 2; 1 2 1; 1 1 0];
>> X = [1 1 1]';
>> maxI = 5;
>> sol = PM(A, X, maxI);

Program 4.1
MATLAB m-file for the Power method
function sol=PM(A, X, maxI)
% Power method: dominant eigenvalue of A from starting vector X.
[n, n] = size(A);
y1 = 0;                          % previous estimate of the eigenvalue
for k = 1:maxI
    X = A*X;                     % multiply by A
    y = max(X);                  % scaling factor (dominant component)
    X = X/y;                     % scale so the largest entry is 1
    if (abs(y - y1) <= 1e-6); break; end
    y1 = y;
end
sol = y;

The power method has the disadvantage that it is unknown at the out-
set whether or not a matrix has a single dominant eigenvalue. Nor is it
known how an initial vector x0 should be chosen to ensure that its repre-
sentation in terms of the eigenvectors of a matrix will contain a nonzero
contribution from the eigenvector associated with the dominant eigenvalue,
should it exist.

Note that the dominant eigenvalue λ of a matrix can also be obtained


from two successive iterations, by dividing the corresponding elements of
vectors xn and xn−1 .

Example 4.2 Find the dominant eigenvalue of the matrix


 
4 1 3
A= 9 0 9 .
5 −1 6

Solution. Let us consider an arbitrary vector x0 = [1, 0, 0]T , then


    
4 1 3 1 4
x1 = Ax0 =  9 0 9   0  =  9 
5 −1 6 0 5
    
4 1 3 4 40
x2 = Ax1 =  9 0 9   9  =  81 
5 −1 6 5 41
    
4 1 3 40 364
x3 = Ax2 =  9 0 9   81  =  729  .
5 −1 6 41 365
Then the dominant eigenvalue can be obtained from the componentwise ratios of x_3 and x_2:

λ_1 ≈ \frac{364}{40} ≈ \frac{729}{81} ≈ \frac{365}{41} ≈ 9.
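The successive-ratio estimate is easy to reproduce in MATLAB; the lines below are our own sketch, not one of the numbered programs. They run the unnormalized iteration a few more steps and print the componentwise ratios:

% Sketch: estimate the dominant eigenvalue from ratios of successive iterates.
A = [4 1 3; 9 0 9; 5 -1 6];
x = [1; 0; 0];
for k = 1:6
    x_old = x;
    x = A*x;                 % unnormalized power iteration
end
lambda_est = x./x_old        % componentwise ratios, all close to 9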

Power Method and Symmetric Matrices

The power method will converge if the given n × n matrix A has linearly
independent eigenvectors and a symmetric matrix satisfies this property.
Now we will discuss the power method for finding the dominant eigenvalue
of a symmetric matrix only.

Theorem 4.1 (Power Method with Euclidean Scaling)

Let A be a symmetric n × n matrix with a positive dominant eigenvalue


λ. If x0 is a unit vector in Rn that is not orthogonal to the eigenspace
corresponding to λ, then the normalized power sequence

x_0,   x_1 = \frac{A x_0}{\|A x_0\|},   x_2 = \frac{A x_1}{\|A x_1\|},   \ldots,   x_k = \frac{A x_{k−1}}{\|A x_{k−1}\|}, \ldots        (4.5)

converges to a unit dominant eigenvector, and the sequence of eigenvalue estimates

A x_1 · x_1,   A x_2 · x_2,   \ldots,   A x_k · x_k, \ldots        (4.6)

converges to λ. •


The basic steps for the power method with Euclidean scaling are:
1. Choose an arbitrary nonzero vector and normalize it to obtain a unit
vector x0 .

2. Compute Ax0 and normalize it to obtain the first approximation x1


to a dominant unit eigenvector. Compute Ax1 .x1 to obtain the first
approximation to the dominant eigenvalue.

3. Compute Ax1 and normalize it to obtain the second approximation


x2 to a dominant unit eigenvector. Compute Ax2 .x2 to obtain the
second approximation to the dominant eigenvalue.

4. Continuing in this way we will create a sequence of increasingly


closer approximations to the dominant eigenvalue and a correspond-
ing eigenvector.

Example 4.3 Apply the power method with Euclidean scaling to the ma-
trix  
2 2
A= ,
2 5
with x0 = [0, 1]T to get the first four approximations to the dominant unit
eigenvector and the dominant eigenvalue.

Solution. Starting with the unit vector x0 = [0, 1]T , we get the first ap-
proximation of the dominant unit eigenvector as follows:
        
Ax_0 = \begin{bmatrix} 2 & 2 \\ 2 & 5 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 5 \end{bmatrix},   x_1 = \frac{Ax_0}{\|Ax_0\|} = \frac{1}{\sqrt{29}} \begin{bmatrix} 2 \\ 5 \end{bmatrix} ≈ \begin{bmatrix} 0.3714 \\ 0.9285 \end{bmatrix}.

Similarly, for the second, third, and fourth approximations of the dominant
unit eigenvector, we find
    
Ax_1 = \begin{bmatrix} 2 & 2 \\ 2 & 5 \end{bmatrix} \begin{bmatrix} 0.3714 \\ 0.9285 \end{bmatrix} = \begin{bmatrix} 2.5997 \\ 5.3852 \end{bmatrix},   x_2 = \frac{Ax_1}{\|Ax_1\|} ≈ \frac{1}{5.9799} \begin{bmatrix} 2.5997 \\ 5.3852 \end{bmatrix} ≈ \begin{bmatrix} 0.4347 \\ 0.9006 \end{bmatrix}.
Ax_2 = \begin{bmatrix} 2 & 2 \\ 2 & 5 \end{bmatrix} \begin{bmatrix} 0.4347 \\ 0.9006 \end{bmatrix} = \begin{bmatrix} 2.6706 \\ 5.3723 \end{bmatrix},   x_3 = \frac{Ax_2}{\|Ax_2\|} ≈ \frac{1}{5.9994} \begin{bmatrix} 2.6706 \\ 5.3723 \end{bmatrix} ≈ \begin{bmatrix} 0.4451 \\ 0.8955 \end{bmatrix}.

Ax_3 = \begin{bmatrix} 2 & 2 \\ 2 & 5 \end{bmatrix} \begin{bmatrix} 0.4451 \\ 0.8955 \end{bmatrix} = \begin{bmatrix} 2.6812 \\ 5.3676 \end{bmatrix},   x_4 = \frac{Ax_3}{\|Ax_3\|} ≈ \frac{1}{6} \begin{bmatrix} 2.6812 \\ 5.3676 \end{bmatrix} ≈ \begin{bmatrix} 0.4469 \\ 0.8946 \end{bmatrix}.

Now we find the approximations of the dominant eigenvalue of the given matrix as follows:

λ_1 = \frac{Ax_1 · x_1}{x_1 · x_1} = \frac{(Ax_1)^T x_1}{x_1^T x_1} = (2.5997\ \ 5.3852) \begin{bmatrix} 0.3714 \\ 0.9285 \end{bmatrix} ≈ 5.9655,

λ_2 = \frac{Ax_2 · x_2}{x_2 · x_2} = \frac{(Ax_2)^T x_2}{x_2^T x_2} = (2.6706\ \ 5.3723) \begin{bmatrix} 0.4347 \\ 0.9006 \end{bmatrix} ≈ 5.9992,

λ_3 = \frac{Ax_3 · x_3}{x_3 · x_3} = \frac{(Ax_3)^T x_3}{x_3^T x_3} = (2.6812\ \ 5.3676) \begin{bmatrix} 0.4451 \\ 0.8955 \end{bmatrix} ≈ 6.0001,

λ_4 = \frac{Ax_4 · x_4}{x_4 · x_4} = \frac{(Ax_4)^T x_4}{x_4^T x_4} = (2.6830\ \ 5.3668) \begin{bmatrix} 0.4469 \\ 0.8946 \end{bmatrix} ≈ 6.0002.

These are the required approximations of the dominant eigenvalue of A.


Notice that the exact dominant eigenvalue of the given matrix is λ = 6,
with the corresponding dominant unit eigenvector x = [0.4472, 0.8945]T . •
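A minimal MATLAB sketch of the Euclidean-scaling iteration used in this example (our own illustration; the variable names are not from the text) is:

% Sketch: power method with Euclidean scaling for a symmetric matrix.
A = [2 2; 2 5];
x = [0; 1];                  % unit starting vector
for k = 1:4
    y = A*x;
    x = y/norm(y);           % normalize to a unit vector
    lambda = (A*x)'*x;       % eigenvalue estimate A x_k . x_k
end
lambda                       % approaches the dominant eigenvalue 6
x                            % approaches [0.4472; 0.8945]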

Now we will consider the power method using a symmetric matrix in


such a way that each iterate is scaled to make its largest entry a 1, rather
than being normalized.

Theorem 4.2 (Power Method with Maximum Entry Scaling)

Let A be a symmetric n × n matrix with a positive dominant eigenvalue


λ. If x0 is a nonzero vector in Rn that is not orthogonal to the eigenspace
corresponding to λ, then the normalized power sequence
x_0,   x_1 = \frac{A x_0}{\max(A x_0)},   x_2 = \frac{A x_1}{\max(A x_1)},   \ldots,   x_k = \frac{A x_{k−1}}{\max(A x_{k−1})}, \ldots        (4.7)

converges to an eigenvector corresponding to eigenvalue λ, and the sequence

\frac{A x_1 · x_1}{x_1 · x_1},   \frac{A x_2 · x_2}{x_2 · x_2},   \ldots,   \frac{A x_k · x_k}{x_k · x_k}, \ldots        (4.8)

converges to λ. •
In using the power method with maximum entry scaling, we have to do the following steps:

1. Choose an arbitrary nonzero vector x_0.

2. Compute Ax_0 and divide it by the factor max(Ax_0) to obtain the first approximation x_1 to a dominant eigenvector. Compute (Ax_1 · x_1)/(x_1 · x_1) to obtain the first approximation to the dominant eigenvalue.

3. Compute Ax_1 and divide it by the factor max(Ax_1) to obtain the second approximation x_2 to a dominant eigenvector. Compute (Ax_2 · x_2)/(x_2 · x_2) to obtain the second approximation to the dominant eigenvalue.

4. Continuing in this way, we create a sequence of increasingly closer approximations to the dominant eigenvalue and a corresponding eigenvector.

Example 4.4 Apply the power method with maximum entry scaling to the
matrix  
2 2
A= ,
2 5
with x0 = [0, 1]T , to get the first four approximations to the dominant
eigenvector and the dominant eigenvalue.

Solution. Starting with x0 = [0, 1]T , we get the first approximation of the
dominant eigenvector as follows:
        
Ax_0 = \begin{bmatrix} 2 & 2 \\ 2 & 5 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 5 \end{bmatrix},   x_1 = \frac{Ax_0}{\max(Ax_0)} = \frac{1}{5} \begin{bmatrix} 2 \\ 5 \end{bmatrix} = \begin{bmatrix} 0.4000 \\ 1.0000 \end{bmatrix}.

Similarly, for the second, third, and fourth approximations of the dominant eigenvector, we find

Ax_1 = \begin{bmatrix} 2 & 2 \\ 2 & 5 \end{bmatrix} \begin{bmatrix} 0.4000 \\ 1.0000 \end{bmatrix} = \begin{bmatrix} 2.8000 \\ 5.8000 \end{bmatrix},   x_2 = \frac{Ax_1}{\max(Ax_1)} ≈ \begin{bmatrix} 0.4828 \\ 1.0000 \end{bmatrix},

Ax_2 ≈ \begin{bmatrix} 2.9655 \\ 5.9655 \end{bmatrix},   x_3 = \frac{Ax_2}{\max(Ax_2)} ≈ \begin{bmatrix} 0.4971 \\ 1.0000 \end{bmatrix},

Ax_3 ≈ \begin{bmatrix} 2.9942 \\ 5.9942 \end{bmatrix},   x_4 = \frac{Ax_3}{\max(Ax_3)} ≈ \begin{bmatrix} 0.4995 \\ 1.0000 \end{bmatrix},

which are the required first four approximations of the dominant eigenvec-
tor.

Now we find the approximations of the dominant eigenvalue of the given


matrix as follows:

λ_1 = \frac{Ax_1 · x_1}{x_1 · x_1} = \frac{(Ax_1)^T x_1}{x_1^T x_1} ≈ \frac{6.9200}{1.1600} ≈ 5.9655,

λ_2 = \frac{Ax_2 · x_2}{x_2 · x_2} = \frac{(Ax_2)^T x_2}{x_2^T x_2} ≈ \frac{7.3972}{1.2331} ≈ 5.9989,

λ_3 = \frac{Ax_3 · x_3}{x_3 · x_3} = \frac{(Ax_3)^T x_3}{x_3^T x_3} ≈ \frac{7.4826}{1.2471} ≈ 6.0000,

λ_4 = \frac{Ax_4 · x_4}{x_4 · x_4} = \frac{(Ax_4)^T x_4}{x_4^T x_4} ≈ \frac{7.4970}{1.2495} ≈ 6.0000.

These are the required approximations of the dominant eigenvalue of A.


Notice that the exact dominant eigenvalue of the given matrix is λ = 6,
with the corresponding dominant eigenvector x = [0.5, 1]T . •

Notice that the main difference between the power method with Eu-
clidean scaling and the power method with maximum entry scaling is that
the Euclidean scaling gives a sequence that approaches a unit dominant
eigenvector, whereas maximum entry scaling gives a sequence that ap-
proaches a dominant eigenvector whose largest component is 1.

4.2.2 Inverse Power Method


The power method can be modified by replacing the given matrix A with
its inverse matrix A−1 , and this is called the inverse power method. Since
we know that the eigenvalues of A−1 are reciprocals of A, the power method
applied to A−1 will find the smallest eigenvalue of A. Thus, the smallest
(or least) value of the eigenvalue for A will become the maximum value for
A−1 . Of course, we must assume that the smallest eigenvalue of A is real
and not repeated, otherwise, the method does not work.

In this method the solution procedure is a little more involved than


finding the largest eigenvalue of the given matrix. Fortunately, it is just as
straight forward. Consider
Ax = λx, (4.9)
and multiplying by A−1 , we have

A−1 Ax = λA−1 x

or
1
A−1 x = x. (4.10)
λ
The solution procedure is initiated by starting with an initial guess for
the vector xi and improving the solution by getting a new vector xi+1 , and
so on until the vector xi is approximately equal to xi+1 .

Example 4.5 Use the inverse power method to find the first seven approxi-
mations of the least dominant eigenvalue and the corresponding eigenvector
of the following matrix using an initial approximation x0 = [0, 1, 2]T :
 
3 0 1
A =  0 −3 0  .
1 0 3

Solution. The inverse of the given matrix A is

A^{−1} = \begin{bmatrix} 3/8 & 0 & −1/8 \\ 0 & −1/3 & 0 \\ −1/8 & 0 & 3/8 \end{bmatrix}.

Starting with the given initial vector x0 = [0, 1, 2]T , we have

A^{−1} x_0 = \begin{bmatrix} 3/8 & 0 & −1/8 \\ 0 & −1/3 & 0 \\ −1/8 & 0 & 3/8 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} −0.2500 \\ −0.3333 \\ 0.7500 \end{bmatrix} = 0.7500 \begin{bmatrix} −0.3333 \\ −0.4444 \\ 1.0000 \end{bmatrix} = λ_1 x_1.

Similarly, the other possible iterations are as follows:

A^{−1} x_1 = [−0.2500, 0.1481, 0.4167]^T = 0.4167 [−0.6000, 0.3558, 1.0000]^T = λ_2 x_2,
A^{−1} x_2 = [−0.3500, −0.1185, 0.4500]^T = 0.4500 [−0.7778, −0.2634, 1.0000]^T = λ_3 x_3,
A^{−1} x_3 = [−0.4167, 0.0878, 0.4722]^T = 0.4722 [−0.8824, 0.1859, 1.0000]^T = λ_4 x_4,
A^{−1} x_4 = [−0.4559, −0.0620, 0.4853]^T = 0.4853 [−0.9394, −0.1277, 1.0000]^T = λ_5 x_5,
A^{−1} x_5 = [−0.4773, −0.0426, 0.4924]^T = 0.4924 [−0.9692, −0.0864, 1.0000]^T = λ_6 x_6,
A^{−1} x_6 = [−0.4885, −0.0288, 0.4962]^T = 0.4962 [−0.9845, −0.0581, 1.0000]^T = λ_7 x_7.

Since the eigenvalues of the given matrix A are −3.0000, 2.0000, and 4.0000, the dominant eigenvalue of A^{−1} after the seven iterations is λ_7 = 0.4962, which converges to 1/2. So the smallest (in magnitude) eigenvalue of the given matrix A is the reciprocal of 1/2, i.e., 2, and the corresponding eigenvector is [−0.9845, −0.0581, 1.0000]^T.

To get the above results using the MATLAB Command Window, we do:

>> A = [3 0 1; 0 -3 0; 1 0 3];
>> X = [0 1 2]';
>> maxI = 7;
>> sol = IPM(A, X, maxI);

Program 4.2
MATLAB m-file for using the Inverse Power method
function sol=IPM(A, X, maxI)
% Inverse power method: power method applied to B = inv(A).
[n, n] = size(A); B = inv(A);
y1 = 0;
for k = 1:maxI
    X = B*X;
    y = max(X);                  % dominant component of B*X
    X = X/y;
    if (abs(y - y1) <= 1e-6); break; end
    y1 = y;
end
sol = y;                         % dominant eigenvalue of inv(A)

4.2.3 Shifted Inverse Power Method


Another modification of the power method consists of replacing the given
matrix A with (A − µI), for any scalar µ, i.e.,

(A − µI)x = (λ − µ)x, (4.11)

and it follows that the eigenvalues of (A − µI) are the same as those of A
except that they have all been shifted by an amount µ. The eigenvectors
remain unaffected by the shift.

The shifted inverse power method is to apply the power method to the
system
1
(A − µI)−1 x = x. (4.12)
(λ − µ)
Thus the iteration of (A − µI)−1 leads to the largest value of (λ−µ)
1
, i.e.,
the smallest value of (λ − µ). The smallest value of (λ − µ) implies that
the value of λ will be the value closest to µ. Thus, by a suitable choice
434 Applied Linear Algebra and Optimization using MATLAB

of µ we have a procedure for finding the subdominant eigensolutions. So,


(A − µI)−1 has the same eigenvectors as A but with eigenvalues (λ−µ)
1
.

In practice, the inverse of (A−µI) is never actually computed, especially


if the given matrix A is a large sparse matrix. It is computationally more
efficient if (A − µI) is decomposed into the product of a lower-triangular
matrix L and an upper-triangular matrix U . If us is an initial vector for
the solution of (4.12), then

(A − µI)−1 us = vs (4.13)

and
vs
us+1 = . (4.14)
max(vs )
By rearranging (4.13), we obtain

us = (A − µI)vs
= LU vs .

Let
U vs = z, (4.15)
then
Lz = us . (4.16)
By using an initial value, we can find z from (4.16) by applying for-
ward substitution, and knowing z we can find vs from (4.15) by applying
backward substitution. The new estimate for the vector us+1 can then be
found from (4.14). The iteration is terminated when us+1 is sufficiently
close to us , and it can be easily shown when convergence is completed.
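The LU-based update just described can be written in a few MATLAB lines. The sketch below is our own illustration (it uses MATLAB's lu with partial pivoting and the data of Example 4.6 further below), not one of the book's numbered programs:

% Sketch: shifted inverse iteration using an LU factorization of (A - mu*I).
A = [4 2; 3 5];  mu = 6;
u = [1; 1];                      % initial vector u_s
[L, U, P] = lu(A - mu*eye(2));   % P*(A - mu*I) = L*U
for s = 1:10
    z = L\(P*u);                 % forward substitution, L z = P u_s
    v = U\z;                     % backward substitution, U v_s = z
    u = v/max(v);                % scale as in (4.14)
end
lambda_mu = 1/max(v) + mu        % eigenvalue of A nearest to mu, cf. (4.17)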

Let λ_µ be the eigenvalue of A nearest to µ; then

λ_µ = \frac{1}{\text{dominant eigenvalue of } (A − µI)^{−1}} + µ.        (4.17)

The shifted inverse power method uses the power method as a basis
but gives faster convergence. Convergence is to the eigenvalue λ that is

closest to µ, and if this eigenvalue is extremely close to µ, the rate of con-


vergence will be very rapid. Inverse iteration therefore provides a means of
determining an eigenvector of a matrix for which the corresponding eigen-
value has already been determined to moderate accuracy by an alternative
method, such as the QR method or the Strum sequence iteration, which
we will discuss later in this chapter.

When inverse iteration is used to determine eigenvectors corresponding


to known eigenvalues, the matrix to be inverted, even if it is symmetric,
will not normally be positive-definite, and if it is nonsymmetric, will not
normally be diagonally dominant. The computation of an eigenvector cor-
responding to a complex conjugate eigenvalue by inverse iteration is more
difficult than for a real eigenvalue.

Example 4.6 Use the shifted inverse power method to find the first five
approximations of the eigenvalue nearest µ = 6 of the following matrix
using the initial approximation x0 = [1, 1]T :

 
4 2
A= .
3 5

Solution. Consider
 
B = (A − 6I) = \begin{bmatrix} −2 & 2 \\ 3 & −1 \end{bmatrix}.

The inverse of B is

B^{−1} = (A − 6I)^{−1} = \begin{bmatrix} 1/4 & 1/2 \\ 3/4 & 1/2 \end{bmatrix}.

Now applying the power method, we obtain the following iterations:


 
1 1    
 4 2  1 0.7500
B −1 x0 =  =
    

 3 1 
1 1.2500
4 2
 
0.6000
= 1.2500 = λ1 x 1 .
1.0000

Similarly, the other approximations can be computed as


 
1 1    
 4 2  0.6000 0.6500
B −1 x1 =  =
  

 3 1 
1.0000 0.9500
4 2
 
0.6842
= 0.9500 = λ2 x2 .
1.0000
 
1 1    
 4 2  0.6842 0.6711
B −1 x2 =  =
  

 3 1 1.0000 1.0132

4 2
 
0.6623
= 1.0132 = λ3 x3 .
1.0000
 
1 1    
 4 2  0.6623 0.6656
B −1 x3 =  =
  

 3 1 
1.0000 0.9968
4 2
 
0.6678
= 0.9968 = λ4 x4 .
1.0000
Numerical Computation of Eigenvalues 437
 
1 1    
 4 2 
 0.6678 0.6669
B −1 x4 =  =

 
 3 1  1.0000 1.0008
4 2
 
0.6664
= 1.0008 = λ5 x5 .
1.0000

Thus, the fifth approximation of the dominant eigenvalue of B^{−1} = (A − 6I)^{−1} is λ_5 = 1.0008, and it converges to 1 with the corresponding eigenvector [0.6667, 1.0000]^T. Hence, the eigenvalue λ_µ of A nearest to µ = 6 is

λ_µ = \frac{1}{1} + 6 = 7.
To get the above results using the MATLAB Command Window, we
do the following:

>> A = [4 2; 3 5];
>> mu = 6;
>> X = [1 1]';
>> maxI = 5;
>> sol = SIPM(A, X, mu, maxI);

Program 4.3
MATLAB m-file for Using Shifted Inverse Power method
function sol=SIPM(A, X, mu, maxI)
% Shifted inverse power method: power method applied to inv(A - mu*I).
[n, n] = size(A); B = A - mu*eye(n); C = inv(B);
y1 = 0;
for k = 1:maxI
    X = C*X;
    y = max(X);
    X = X/y;
    if (abs(y - y1) <= 1e-6); break; end
    y1 = y;
end
sol = (1/y) + mu;                % eigenvalue of A nearest to mu

4.3 Location of the Eigenvalues


Here, we discuss two well-known theorems that are some of the more impor-
tant among the many theorems that deal with the location of eigenvalues
of both symmetric and nonsymmetric matrices, i.e., the location of zeros of
the characteristic polynomial. The eigenvalues of a nonsymmetric matrix
could, of course, be complex, in which case the theorems give us a means
of locating these numbers in the complex plane. The theorems can also be
used to estimate the magnitude of the largest and smallest eigenvalues and
thus to estimate the spectral radius ρ(A) of A and the condition number
of A. Such estimates can be used to generate initial approximations to be
used in iterative methods for determining eigenvalues.

4.3.1 Gerschgorin Circles Theorem


Let A be an n × n matrix, and let R_i denote the circle in the complex plane C with center a_{ii} and radius \sum_{j=1, j≠i}^{n} |a_{ij}|, i.e.,

R_i = \Big\{ z ∈ C : |z − a_{ii}| ≤ \sum_{j=1, j≠i}^{n} |a_{ij}| \Big\},   i = 1, 2, \ldots, n,        (4.18)

where the variable z is complex valued.


The eigenvalues of A are contained within R = ∪_{i=1}^{n} R_i, and the union of any k of these circles that do not intersect the remaining (n − k) must contain precisely k (counting multiplicity) of the eigenvalues. •

Example 4.7 Consider the matrix


 
10 1 1 2
 1 5 1 0 
A=  1 1 −5
,
0 
2 0 0 −10
which is symmetric and has only real eigenvalues. The Gerschgorin circles
associated with A are given by
R1 = {z ∈ C| |z − 10| ≤ 4}
R2 = {z ∈ C| |z − 5| ≤ 2}
R3 = {z ∈ C| |z + 5| ≤ 2}
R4 = {z ∈ C| |z + 10| ≤ 2}.
These circles are illustrated in Figure 4.1, and Gerschgorin’s theorem indi-
cates that the eigenvalues of A lie inside the circles.

Figure 4.1: Circles for Example 4.7.

The circles about −10 and −5 each must contain an eigenvalue. The other eigenvalues
must lie in the interval [3, 14]. By using the shifted inverse power method,
with ε = 0.000005, with initial approximations of 10, 5, −5, and −10, leads to approximations of

λ_1 = 10.4698,   λ_2 = 4.8803,   λ_3 = −5.1497,   λ_4 = −10.2004,
respectively. The number of iterations required ranges from 9 to 13. •

Example 4.8 Consider the matrix


 
1 2 −1
A= 2 7 0 ,
−1 0 −5
which is symmetric and so has only real eigenvalues. The Gerschgorin
circles are
C1 : |z − 1| ≤ 3
C2 : |z − 7| ≤ 2
C3 : |z + 5| ≤ 1.
These circles are illustrated in Figure 4.2, and Gerschgorin’s theorem in-
dicates that the eigenvalues of A lie inside the circles.

Figure 4.2: Circles for Example 4.8.

Then, by Gerschgorin's theorem, every eigenvalue of A must lie in one of the three intervals [−2, 4], [5, 9], and [−6, −4]. Since these three circles are mutually disjoint, each of them contains exactly one eigenvalue of A. •
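The circle centers and radii are straightforward to compute in MATLAB; the sketch below (our own illustration, not a numbered program) prints them for the matrix of Example 4.8:

% Sketch: Gerschgorin circle centers and radii.
A = [1 2 -1; 2 7 0; -1 0 -5];
centers = diag(A);
radii   = sum(abs(A), 2) - abs(centers);   % off-diagonal row sums
[centers radii]              % rows [1 3; 7 2; -5 1] give C1, C2, C3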

4.3.2 Rayleigh Quotient


The shifted inverse power method requires the input of an initial approx-
imation µ for the eigenvalue λ of a matrix A. It can be obtained by the

Rayleigh quotient as
xT Ax
µ= . (4.19)
xT x
The maximum eigenvalue λ1 can be obtained when x is the corresponding
vector, as in
xT Ax
λ1 = max T . (4.20)
x6=0 x x

In the case where λ1 is the dominant eigenvalue of a matrix A, and x is


the corresponding eigenvector, then the Rayleigh quotient is
xT Ax xT λ1 x λ1 (xT x)
= = = λ1 .
xT x xT x xT x
Thus, if xk converges to a dominant eigenvector x, then it seems reasonable
that
xk T Axk
xk T x k
converges to
xT Ax
= λ1 ,
xT x
which is the dominant eigenvalue.

Theorem 4.3 (Rayleigh Quotient Theorem)

If the eigenvalues of a real symmetric matrix A are

λ1 ≥ λ2 ≥ λ3 · · · ≥ λn , (4.21)

and if x is any nonzero vector, then

λ_n ≤ \frac{x^T A x}{x^T x} ≤ λ_1.        (4.22)

Example 4.9 Consider the symmetric matrix


 
A = \begin{bmatrix} 2 & −1 & 3 \\ −1 & 1 & −2 \\ 3 & −2 & 1 \end{bmatrix},

and the vector x as x = [1, 1, 1]^T. Then

x^T x = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = 3

and

x^T A x = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 2 & −1 & 3 \\ −1 & 1 & −2 \\ 3 & −2 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 4 \\ −2 \\ 2 \end{bmatrix} = 4.

Thus,

λ_3 ≤ \frac{4}{3} ≤ λ_1.
If µ is close to an eigenvalue λ1 , then convergence will be quite rapid.
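The Rayleigh quotient of Example 4.9 can be checked directly in MATLAB with a couple of lines (our own sketch, not a numbered program):

% Sketch: Rayleigh quotient x'Ax / x'x as an eigenvalue estimate.
A = [2 -1 3; -1 1 -2; 3 -2 1];
x = [1; 1; 1];
mu = (x'*A*x)/(x'*x)         % gives 4/3
sort(eig(A))'                % the exact eigenvalues bracket the quotient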

4.4 Intermediate Eigenvalues


Once the largest eigenvalue is determined, then there is a method to ob-
tain the approximations to the other possible eigenvalues of a matrix. This
method is called matrix deflation, and it is applicable to both symmetric and nonsymmetric coefficient matrices. The deflation method involves
forming a new matrix B whose eigenvalues are the same as those of A, ex-
cept that the dominant eigenvalue of A is replaced by the eigenvalue zero
in B.

It is evident that this process can be continued until all of the eigen-
values have been extracted. Although this method shows promise, it does
have a significant drawback, i.e., at each iteration performed in deflating

the original matrix, any errors in the computed eigenvalues and eigenvec-
tors will be passed on to the next eigenvectors. This could result in serious
inaccuracy, especially when dealing with large eigenvalue problems. This
is precisely why this method is generally used for small eigenvalue problems.

The following preliminary results are essential in using this technique.

Theorem 4.4 If a matrix A has eigenvalues λi corresponding to eigenvec-


tors xi , then Q−1 AQ has the same eigenvalues as A but with eigenvectors
Q−1 xi for any nonsingular matrix Q. •

Theorem 4.5 Let


 
B = \begin{bmatrix} λ_1 & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & c_{22} & c_{23} & \cdots & c_{2n} \\ 0 & c_{32} & c_{33} & \cdots & c_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & c_{n2} & c_{n3} & \cdots & c_{nn} \end{bmatrix},        (4.23)

and let C be an (n−1)×(n−1) matrix obtained by deleting the first row and
first column of a matrix B. The matrix B has eigenvalues λ1 together with
the (n−1) eigenvalues of C. Moreover, if (β2 , β3 , . . . , βn )T is an eigenvector
of C with eigenvalue µ ≠ λ_1, then the corresponding eigenvector of B is (β_1, β_2, \ldots, β_n)^T, with

β_1 = \frac{\sum_{j=2}^{n} a_{1j} β_j}{µ − λ_1}.        (4.24)
Note that eigenvectors xi of A can be recovered by premultiplication by
Q. •

Example 4.10 Consider the matrix


 
10 −6 −4
A =  −6 11 2 ,
−4 2 6

which has the dominant eigenvalue λ_1 = 18, with the corresponding eigenvector x_1 = [1, −1, −1/2]^T. Use the deflation method to find the other eigenvalues and eigenvectors of A.

Solution. The transformation matrix is given as

Q = \begin{bmatrix} 1 & 0 & 0 \\ −1 & 1 & 0 \\ −1/2 & 0 & 1 \end{bmatrix}.
Then

B = Q^{−1} A Q = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1/2 & 0 & 1 \end{bmatrix} \begin{bmatrix} 10 & −6 & −4 \\ −6 & 11 & 2 \\ −4 & 2 & 6 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ −1 & 1 & 0 \\ −1/2 & 0 & 1 \end{bmatrix}.
After simplifying, we get
 
18 −6 −4
B= 0 5 −2  .
0 −1 4

So the deflated matrix is


 
5 −2
C= .
−1 4

Now we can easily find the eigenvalues of C, which are 6 and 3, with the
corresponding eigenvectors [1, −1/2]^T and [1, 1]^T, respectively. Thus, the other
two eigenvalues of A are 6 and 3. Now we calculate the eigenvectors of A
corresponding to these two eigenvalues. First, we calculate the eigenvectors
of B corresponding to λ = 6 from the system

\begin{bmatrix} 18 & −6 & −4 \\ 0 & 5 & −2 \\ 0 & −1 & 4 \end{bmatrix} \begin{bmatrix} β_1 \\ 1 \\ −1/2 \end{bmatrix} = 6 \begin{bmatrix} β_1 \\ 1 \\ −1/2 \end{bmatrix}.

Then by solving the above system, we have

18β_1 − 4 = 6β_1,

which gives β_1 = 1/3. Similarly, we can find the value of β_1 corresponding to λ = 3 by using the system

\begin{bmatrix} 18 & −6 & −4 \\ 0 & 5 & −2 \\ 0 & −1 & 4 \end{bmatrix} \begin{bmatrix} β_1 \\ 1 \\ 1 \end{bmatrix} = 3 \begin{bmatrix} β_1 \\ 1 \\ 1 \end{bmatrix},

which gives β_1 = 2/3. Thus, the eigenvectors of B are v_1 = [1/3, 1, −1/2]^T and v_2 = [2/3, 1, 1]^T.

Now we find the eigenvectors of the original matrix A, which can be obtained by premultiplying the eigenvectors of B by the nonsingular matrix Q. First, the second eigenvector of A can be found as

x_2 = Q v_1 = \begin{bmatrix} 1 & 0 & 0 \\ −1 & 1 & 0 \\ −1/2 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1/3 \\ 1 \\ −1/2 \end{bmatrix} = \begin{bmatrix} 1/3 \\ 2/3 \\ −2/3 \end{bmatrix},

or, equivalently, x_2 = [1/2, 1, −1]^T. Similarly, the third eigenvector of the given matrix A can be computed as

x_3 = Q v_2 = \begin{bmatrix} 1 & 0 & 0 \\ −1 & 1 & 0 \\ −1/2 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2/3 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 2/3 \\ 1/3 \\ 2/3 \end{bmatrix},

or, equivalently, x_3 = [1, 1/2, 1]^T. •

Note that in this example the deflated matrix C is nonsymmetric even


though the original matrix A is symmetric. We deduce that the property
of symmetry is not preserved in the deflation process. Also, note that the
method of deflation fails whenever the first element of given vector x1 is
zero, since x1 cannot then be scaled so that this number is one.

The above results can be reproduced using MATLAB commands as


follows:

>> A = [10 -6 -4; -6 11 2; -4 2 6];
>> Lamda = 18;
>> XA = [1 -1 -0.5]';
>> [Lamda, X] = DEFLATED(A, Lamda, XA);

Program 4.4
MATLAB m-file for Using the Deflation method
function [Lamda,X]=DEFLATED(A, Lamda, XA)
% Deflation: given the dominant eigenvalue Lamda of A and its
% eigenvector XA, compute the remaining eigenvalues and eigenvectors.
[n, n] = size(A);
Q = eye(n); Q(:, 1) = XA(:, 1);      % transformation matrix
B = inv(Q)*A*Q;                      % B = Q^(-1)*A*Q
c = B(2:n, 2:n);                     % deflated (n-1) x (n-1) matrix
[xv, ev] = eig(c, 'nobalance');
for i = 1:n-1
    b = -(B(1, 2:n)*xv(:, i))/(Lamda - ev(i, i));   % beta_1 from (4.24)
    Xb = [b; xv(:, i)];              % eigenvector of B
    XA(:, i+1) = Q*Xb;               % recover an eigenvector of A
end
Lamda = [Lamda; diag(ev)];
X = XA;

4.5 Eigenvalues of Symmetric Matrices


In the previous section we discussed the power methods for finding indi-
vidual eigenvalues. The regular power method can be used to find the
distinct eigenvalue with the largest magnitude, i.e., the dominant eigen-
value, and the inverse power method can find the eigenvalue called the
smallest eigenvalue, and the shifted inverse power method can find the
subdominant eigenvalues. In this section we develop some methods to find
all eigenvalues of a given matrix. The basic approach is to find a sequence

of similarity transformations that transform the original matrix into a sim-


ple form. Clearly, the best form for the transformed matrix would be a
diagonal one, but this is not always possible, since some transformed ma-
trices would be tridiagonal. Furthermore, these techniques are generally
limited to symmetrical matrices with real coefficients.

Before we discuss these methods, we define some special matrices, which


are very useful in discussing these methods.

Definition 4.1 (Orthogonally Similar Matrix)

A matrix A is said to be orthogonally similar to a matrix B, if there is an


orthogonal matrix Q for which

A = QBQT . (4.25)

If A is symmetric and B = Q^{−1} A Q = Q^T A Q (with Q orthogonal), then

B^T = (Q^T A Q)^T = Q^T A^T Q = Q^T A Q = B.

Thus, similarity transformations on symmetric matrices that use orthogo-


nal matrices produce matrices which are again symmetric. •

Definition 4.2 (Rotation Matrix)

A rotation matrix Q is an orthogonal matrix that differs from the identity


matrix in, at most, four elements. These four elements at the vertices of
the rectangle have been replaced by cos θ, − sin θ, sin θ, and cos θ in the
positions pp, pq, qp, qq, respectively. For example, the matrix
 
1 0 0 0 0
 0 cos θ 0 − sin θ 0 
 
B=  0 0 1 0 0 
 (4.26)
 0 sin θ 0 cos θ 0 
0 0 0 0 1

is the rotation matrix, where p = 2 and q = 4. Note that a rotation matrix


is also an orthogonal matrix, i.e., B T B = I. •

4.5.1 Jacobi Method


This method can be used to find all the eigenvalues and eigenvectors of
a symmetric matrix by performing a series of similarity transformations.
The Jacobi method permits the transformation of a symmetric matrix into
a diagonal one having the same eigenvalues as the original matrix. This
can be done by eliminating off-diagonal elements in a systematic way. The
method requires an infinite number of iterations to produce the diagonal
form. This is because the reduction of a given element to zero in a matrix
will most likely introduce a nonzero element into a previous zero coeffi-
cient. Hence, the method can be viewed as an iterative procedure that can
approach a diagonal form using a finite number of steps. The implication
is that the off-diagonal coefficients will be close to zero rather than exactly
equal to zero.

Consider the eigenvalue problem

Av = λv, (4.27)

where A is a symmetric matrix of order n × n, and let the solution of


(4.27) give the eigenvalues λ1 , . . . , λn and the corresponding eigenvectors
v1 , . . . , vn of A. Since the eigenvectors of a symmetric matrix are orthog-
onal, i.e.,
vT = v−1 , (4.28)
by using (4.28), we can write (4.27) as

vT Av = λ. (4.29)

The basic procedure for the Jacobi method is as follows.

Assume that

A_1 = Q_1^T A Q_1
A_2 = Q_2^T A_1 Q_2 = Q_2^T Q_1^T A Q_1 Q_2
A_3 = Q_3^T A_2 Q_3 = Q_3^T Q_2^T Q_1^T A Q_1 Q_2 Q_3
  ⋮
A_k = Q_k^T \cdots Q_1^T A Q_1 \cdots Q_k.

We see that as k → ∞, then

Ak → λ and Q1 Q2 · · · Qk → v. (4.30)

The matrix Qi (i = 1, 2, . . . , k) is a rotation matrix that is constructed in


such a way that off-diagonal coefficients in matrix Ak are reduced to zero.
In other words, in a rotation matrix

Q_k = \begin{bmatrix} 1 & & & & \\ & \cos θ & & −\sin θ & \\ & & \ddots & & \\ & \sin θ & & \cos θ & \\ & & & & 1 \end{bmatrix},

the value of θ is selected in such a way that the apq coefficient in Ak is


reduced to zero, i.e.,
\tan 2θ = \frac{2 a_{pq}}{a_{pp} − a_{qq}}.        (4.31)
Theoretically, there are an infinite number of θ values corresponding to
the infinite matrices Ak . However, as θ approaches zero, a rotation matrix
Qk becomes an identity matrix and no further transformations are required.

There are three strategies for annihilating off-diagonals. The first is


called the serial method, which selects the elements in row order, i.e., in
the positions (1, 2), . . . , (1, n); (2, 3), . . . , (2, n); . . .; (n−1, n) in turn, which
is then repeated. The second method is called the natural method, which
searches through all of the off-diagonals and annihilates the elements of the
largest modulus at each stage. Although this method converges faster than
the serial method, it is not recommended for large values of n, since the
actual search procedure itself can be extremely time consuming. The third
method is known as the threshold serial method, in which the off-diagonals

are cycled in row order as in the serial method, omitting transformations


on any element whose magnitude is below some threshold value. This value
is usually decreased after each cycle. The advantage of this approach is
that zeros are only created in positions where it is worthwhile to do so,
without the need for a lengthy search. Here, we shall use only the natural
method for annihilating the off-diagonal elements.

Theorem 4.6 Consider a matrix A and a rotation matrix Q as


   
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{bmatrix}   and   Q = \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix}.
Then there exists θ such that:

1. QT Q = I,
2. QT AQ = D,

where I is an identity matrix and D is a diagonal matrix, and its diagonal


elements, λ1 and λ2 , are the eigenvalues of A.

Proof. To convert the given matrix A into a diagonal matrix D, we have


to make off-diagonal element a12 of A zero, i.e., p = 1 and q = 2. Consider
p11 = cos θ = p22 and p12 = −p21 = sin θ, then the matrix Q has the form
 
cos θ − sin θ
Q= .
sin θ cos θ
The corresponding matrix A1 can be constructed as

A1 = QT1 AQ1

or
\begin{bmatrix} a_{11}^* & a_{12}^* \\ a_{12}^* & a_{22}^* \end{bmatrix} = \begin{bmatrix} \cos θ & \sin θ \\ −\sin θ & \cos θ \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} \cos θ & −\sin θ \\ \sin θ & \cos θ \end{bmatrix}.
Since our task is to reduce a∗12 to zero, carrying out the multiplication on
the right-hand side and using matrix equality gives

a_{12}^* = 0 = −(\sin θ \cos θ)a_{11} + (\cos^2 θ)a_{12} − (\sin^2 θ)a_{12} + (\cos θ \sin θ)a_{22}.

Simplifying and rearranging gives


sin θ cos θ / (cos²θ − sin²θ) = a12 / (a11 − a22)

sin 2θ / (2 cos 2θ) = a12 / (a11 − a22)

sin 2θ / cos 2θ = 2a12 / (a11 − a22),

or more simply

tan 2θ = 2a12 / (a11 − a22),    a11 ≠ a22.
Note that if a11 = a22 , then θ = π/4. We found that for a
2 × 2 matrix, it required only one iteration to convert the given matrix A
to a diagonal matrix D.

Similarly, for a higher order matrix, a diagonal matrix D can be ob-


tained by a number of such multiplications, i.e.,

QTk QTk−1 · · · QT1 AQ1 · · · Qk−1 Qk = D.

The diagonal elements of D are all the eigenvalues λ of A and the


corresponding eigenvectors v of A can be obtained as

Q1 Q2 · · · Qk = v.

Example 4.11 Use the Jacobi method to find the eigenvalues and the
eigenvectors of the matrix
 
3.0 0.01 0.02
A =  0.01 2.0 0.1  .
0.02 0.1 1.0

Solution. The largest off-diagonal entry of the given matrix A is a23 = 0.1,
so we begin by reducing element a23 to zero. Since p = 2 and q = 3, the
first orthogonal transformation matrix has the form
 
1 0 0
Q1 =  0 c −s  .
0 s c

The values of c = cos θ and s = sin θ can be obtained as follows:


   
θ = (1/2) arctan( 2a23 / (a22 − a33) ) = (1/2) arctan( 2(0.1) / (2 − 1) ) ≈ 0.0987,

cos θ ≈ 0.9951 and sin θ ≈ 0.0985.

Then
   
1 0 0 1 0 0
Q1 =  0 0.9951 −0.0985  and QT1 =  0 0.9951 0.0985 
0 0.0985 0.9951 0 −0.0985 0.9951

and  
3.0 0.0119 0.0189
A1 = QT1 AQ1 =  0.0119 2.0099 0 .
0.0189 0 0.9901

Note that the rotation makes a32 and a23 zero, increasing slightly a21
and a12 , and decreasing the second dominant off-diagonal entries a13 and
a31 .

Now the largest off-diagonal element of the matrix A1 is a13 = 0.0189,


so to make this position zero, we consider the second orthogonal matrix of
the form
 
c 0 −s
Q2 =  0 1 0 ,
s 0 c

and the values of c and s can be obtained as follows:


   
θ = (1/2) arctan( 2a13 / (a11 − a33) ) = (1/2) arctan( 2(0.0189) / (3 − 0.9901) ) ≈ 0.0094,

cos θ ≈ 0.9999 and sin θ ≈ 0.0094.

Then
   
0.9999 0 −0.0094 0.9999 0 0.0094
Q2 =  0 1 0  and QT2 =  0 1 0 .
0.0094 0 0.9999 −0.0094 0 0.9999

Hence,
 
3.0002 0.0119 0
A2 = QT2 A1 Q2 = QT2 QT1 AQ1 Q2 =  0.0119 2.0099 −0.0001  .
0 −0.0001 0.9899

Similarly, to make off-diagonal element a12 = 0.0119 of the matrix A2


zero, we consider the third orthogonal matrix of the form
 
c −s 0
Q3 =  s c 0 
0 0 1

and
   
θ = (1/2) arctan( 2a12 / (a11 − a22) ) = (1/2) arctan( 2(0.0119) / (3.0002 − 2.0099) ) ≈ 0.0120,

cos θ ≈ 0.9999 and sin θ ≈ 0.0120.

Then  
0.9999 −0.0120 0
Q3 =  0.0120 0.9999 0  and
0 0 1

 
0.9999 0.0120 0
QT3 =  −0.0120 0.9999 0  .
0 0 1

Hence,
 
3.0003 0 −1.35E − 6
A3 = QT3 QT2 QT1 AQ1 Q2 Q3 =  0 2.00 −1.122E − 4  ,
−1.35E − 6 −1.122E − 4 0.9899

which gives the diagonal matrix D, and its diagonal elements converge
to 3, 2, and 1, which are the eigenvalues of the original matrix A. The
corresponding eigenvectors can be computed as follows:
 
0.9998 −0.0121 −0.0094
v = Q1 Q2 Q3 =  0.0111 0.9951 −0.0985  .
0.0106 0.0984 0.9951

To reproduce the above results by using the Jacobi method and the
MATLAB Command Window, we do the following:

>> A = [3 0.01 0.02; 0.01 2 0.1; 0.02 0.1 1];


>> sol = JOBM (A);

Program 4.5
MATLAB m-file for the Jacobi method
function sol=JOBM(A)
[n,n]=size(A); QQ=[ ];
for u = 1 : .5 ∗ n ∗ (n − 1); for i=1:n; for j=1:n
if (j > i); aa(i,j)=A(i,j); else; aa(i,j)=0;
end; end; end
aa=abs(aa); mm=max(aa); m=max(mm);
[i,j]=find(aa==m); i=i(1); j=j(1);
t = .5 ∗ atan(2 ∗ A(i, j)/(A(i, i) − A(j, j))); c=cos(t); s=sin(t);
for ii=1:n; for jj=1:n; Q(ii,jj)=0.0;
if (ii==jj); Q(ii,jj)=1.0; end; end; end
Q(i,i)=c; Q(i,j)=-s; Q(j,i)=s; Q(j,j)=c;
for i1=1:n; for j1=1:n;
QT(i1,j1)=Q(j1,i1); end; end
for i2=1:n; for j2=1:n; s=0;
for k = 1 : n; ss = QT (i2, k) ∗ A(k, j2);
s=s+ss; end; QTA(i2,j2)=s; end; end
for i3=1:n; for j3=1:n; s=0;
for k=1:n; ss = QT A(i3, k) ∗ Q(k, j3); s=s+ss; end
A(i3,j3)=s; end; end; QQ=[QQ,Q]; end; D=A
y=[]; for k=1:n; yy=A(k,k); y=[y;yy]; end; eigvals=y
x=Q; if (n > 2) % Compute eigenvectors
x(1:n,1:n)=QQ(1:n,1:n);
for c = n + 1 : n : n ∗ n; xx(1:n,1:n)=QQ(1:n,c:n+c-1);
for i=1:n; for j=1:n; s=0;
for k=1:n; ss = x(i, k) ∗ xx(k, j); s=s+ss; end
x1(i,j)=s; end; end; x=x1; end; end
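The computed eigenvalues and eigenvectors can also be checked against MATLAB's built-in eig function, for example:

>> A = [3 0.01 0.02; 0.01 2 0.1; 0.02 0.1 1];
>> [V, D] = eig(A);
>> diag(D)    % eigenvalues of A, close to 1, 2, and 3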

4.5.2 Sturm Sequence Iteration


When a symmetric matrix is tridiagonal, its eigenvalues can be computed
to any specified precision using a simple method called the Sturm sequence
iteration. In the following sections we
will discuss two methods that will convert the given symmetric matri-

ces into symmetrical tridiagonal forms by using similarity transformations.


The Sturm sequence iteration below can, therefore, be used in the cal-
culation of eigenvalues of any symmetric tridiagonal matrix. Consider a
symmetric tridiagonal matrix of order 4 × 4 as
 
a1 b 2 0 0
 b 2 a2 b 3 0 
A=  0 b 3 a3 b 4  ,

0 0 b 4 a4
and assume that bi ≠ 0, for each i = 2, 3, 4. Then one can define the
characteristic polynomial of a given matrix A as
f4 (λ) = det(A − λI), (4.32)
which is equivalent to

f4 (λ) = det [ a1 − λ   b2       0        0
              b2       a2 − λ   b3       0
              0        b3       a3 − λ   b4
              0        0        b4       a4 − λ ] .

We expand by minors in the last row as



f4 (λ) = (a4 − λ) det [ a1 − λ   b2       0
                        b2       a2 − λ   b3
                        0        b3       a3 − λ ]
         − b4 det [ a1 − λ   b2       0
                    b2       a2 − λ   0
                    0        b3       b4 ]
or
f4 (λ) = (a4 − λ)f3 (λ) − b24 f2 (λ). (4.33)
The recurrence relation (4.33) is true for a matrix of any order r × r, i.e.,
fr (λ) = (ar − λ)fr−1 (λ) − b2r fr−2 (λ), (4.34)
provided that we define f0 (λ) = 1 and evaluate f1 (λ) = a1 − λ.

The sequence {f0 , f1 , . . . , fr , . . .} is known as the Sturm sequence. So


starting with f0 (λ) = 1, we can eventually find a characteristic polynomial
of A by using
fn (λ) = 0. (4.35)
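As an illustrative sketch (the function name sturmseq and the argument layout are our own choices, not part of the text's programs), the Sturm sequence (4.34) can be evaluated at a given value of λ in MATLAB as follows, where a holds the diagonal entries a1 , . . . , an and b holds the off-diagonal entries with b(i) = bi (b(1) is unused):

function f = sturmseq(a, b, lambda)
% Evaluate the Sturm sequence f_0(lambda), ..., f_n(lambda) (sketch).
n = length(a);
f = zeros(n+1, 1);
f(1) = 1;                 % f_0(lambda) = 1
f(2) = a(1) - lambda;     % f_1(lambda) = a_1 - lambda
for r = 2:n
    f(r+1) = (a(r) - lambda)*f(r) - b(r)^2*f(r-1);   % recurrence (4.34)
end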

Example 4.12 Use the Sturm sequence iteration to find the eigenvalues
of the symmetric tridiagonal matrix
 
1 2 0 0
 2 4 1 0 
A= .
 0 1 5 −1 
0 0 −1 3

Solution. We compute the Sturm sequences as follows:

f0 (λ) = 1
f1 (λ) = (a1 − λ)
= 1 − λ.

The second sequence is

f2 (λ) = (a2 − λ)f1 (λ) − b22 f0 (λ)


= (4 − λ)(1 − λ) − 4(1)
= λ2 − 5λ.

and the third sequence is

f3 (λ) = (a3 − λ)f2 (λ) − b23 f1 (λ)


= (5 − λ)(λ2 − 5λ) − (1)2 (1 − λ)
= −λ3 + 10λ2 − 24λ − 1.

Finally, the fourth sequence is

f4 (λ) = (a4 − λ)f3 (λ) − b24 f2 (λ)


= (3 − λ)(−λ3 + 10λ2 − 24λ − 1) − (−1)2 (λ2 − 5λ)
= λ4 − 13λ3 + 53λ2 − 66λ − 3.

Thus,
f4 (λ) = λ4 − 13λ3 + 53λ2 − 66λ − 3 = 0.
Solving the above equation, we have the eigenvalues 6.11, 4.41, 2.54, and
−0.04 of the given symmetric tridiagonal matrix. •

To get the above results using the MATLAB Command Window, we


do the following:

>> A = [1 2 0 0; 2 4 1 0; 0 1 5 -1; 0 0 -1 3];
>> sol = SturmS(A);

Program 4.6
MATLAB m-file for the Sturm Sequence method
function sol=SturmS(A)
% This evaluates the eigenvalues of a tridiagonal symmetric matrix
[n,n] = size(A);
ff(1,:)=[1 0 0 0 0]; ff(2,:)=[A(1,1) -1 0 0 0];
for i=3:n+1; h=[A(i-1,i-1) -1];
ff(i,1) = h(1)*ff(i-1,1) - A(i-1,i-2)^2*ff(i-2,1);
for z=2:n+1
ff(i,z) = h(1)*ff(i-1,z) + h(2)*ff(i-1,z-1) - A(i-1,i-2)^2*ff(i-2,z); end; end
for i=1:n+1; y(i)=ff(n+1,n+2-i);end; eigval=roots(y)
Theorem 4.7 For any real number λ∗ , the number of agreements in signs
of successive terms of the Sturm sequence {f0 (λ∗ ), f1 (λ∗ ), . . . , fn (λ∗ )} is
equal to the number of eigenvalues of the tridiagonal matrix A greater than
λ∗ . The sign of a zero is taken to be opposite to that of the previous term.

Example 4.13 Find the number of eigenvalues of the matrix
 
3 −1 0
A =  −1 2 −1 
0 −1 3
lying in the interval (0, 4).

Solution. Since the given matrix is of size 3 × 3, we have to compute the


Sturm sequences f3 (0) and f3 (4). First, for λ∗ = 0, we have
f0 (0) = 1 and f1 (0) = (a1 − 0) = (3 − 0) = 3.

Also,
f2 (0) = (a2 − 0)f1 (0) − b22 f0 (0)

= (2)(3) − (−1)2 (1) = 5.


Finally, we have

f3 (0) = (a3 − 0)f2 (0) − b23 f1 (0)

= (3)(5) − (−1)2 (3) = 12,

which have signs + + ++, with three agreements. So all three eigenvalues
are greater than λ∗ = 0.

Similarly, we can calculate for λ∗ = 4. The Sturm sequences are

f0 (4) = 1 and f1 (4) = (a1 − 4) = (3 − 4) = −1.

Also,
f2 (4) = (a2 − 4)f1 (4) − b22 f0 (4)

= (2 − 4)(−1) − (−1)2 (1) = 1.


In the last, we have

f3 (4) = (a3 − 4)f2 (4) − b23 f1 (4)

= (3 − 4)(1) − (−1)2 (−1) = 0,

which have signs + − +−, with no agreements. So no eigenvalues are


greater than λ∗ = 4. Hence, all three eigenvalues lie in the interval (0, 4].
Furthermore, since f3 (0) ≠ 0 and f3 (4) = 0, we deduce that no eigen-
value is exactly equal to 0 but one eigenvalue is exactly equal to 4, because
f3 (λ∗ ) = det(A − λ∗ I), the characteristic polynomial of A. Therefore, there
are three eigenvalues in the half-open interval (0, 4] and two eigenvalues are
in the open interval (0, 4) . Since the given matrix A is positive-definite,
therefore, by a well-known result, all of the eigenvalues of A must be strictly
positive. Note that the eigenvalues of the given matrix A are 1, 3, and 4.•

Note that if the sign pattern is + + + − −, for a 4 × 4 matrix for λ = c,


then there are three eigenvalues greater than λ = c.

If the sign pattern is + − + − −, for a 4 × 4 matrix for λ = c, then


there is one eigenvalue greater than λ = c.

If the sign pattern is + − 0 + +, for a 4 × 4 matrix for λ = c, then there


are two eigenvalues greater than λ = c.

4.5.3 Given’s Method


This method is also based on similarity transformations of the same type as
those used for the Jacobi method. The zeros created are retained, and the
symmetric matrix is reduced to a symmetric tridiagonal matrix C rather
than a diagonal form using a finite number of orthogonal similarity trans-
formations. The eigenvalues of the original matrix A are the same as those
of the symmetric tridiagonal matrix C. Given’s method is generally prefer-
able to the Jacobi method in that it requires a finite number of iterations.

For Given’s method, the angle θ is chosen to create zeros, not in the
(p, q) and (q, p) positions as in the Jacobi method, but in the (p − 1, q)
and (q, p − 1) positions. This is because zeros can be created in row order
without destroying those previously obtained.

In the first stage of Given’s method we annihilate elements along the


first row (and by symmetry, down the first column) in the positions (1, 3),
. . ., (1, n) using the rotation matrices Q23 , . . . , Q2n in turn. Once a zero has
been created in positions (1, j), subsequent transformations use matrices
Qpq with p, q ≠ 1, j, and so zeros are not destroyed. In the second stage
we annihilate elements in the positions (2, 4), . . . , (2, n) using Q34 , . . . , Q3n .
Again, any zeros produced by these transformations are not destroyed as
subsequent zeros are created along the second row. Furthermore, zeros pre-
viously obtained in the first row are also preserved. The process continues
until a zero is created in the position (n − 2, n) using Qn−1n . The original
matrix can, therefore, be converted into a symmetric tridiagonal matrix C

in exactly

(n − 2) + (n − 3) + · · · + 1 = (1/2)(n − 1)(n − 2)
steps. This method also uses rotation matrices as the Jacobi method does,
but in the following form:

with cos θ in the (p, p) and (q, q) positions, sin θ in the (p, q) position, and
− sin θ in the (q, p) position, where

θ = − arctan( ap−1,q / ap−1,p ).
We can also find the values of cos θ and sin θ by using

cos θ = |ap−1,p| / R    and    sin θ = |ap−1,q| / R,

where

R = √( (ap−1,p)² + (ap−1,q)² ).
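A minimal MATLAB sketch of a single Given's rotation, written in the same spirit as the Jacobi sketch above (and assuming the indices p and q have already been chosen), is:

% One Given's rotation (sketch): annihilate A(p-1,q) and A(q,p-1).
theta = -atan(A(p-1,q)/A(p-1,p));
Q = eye(size(A,1));
Q(p,p) = cos(theta);  Q(q,q) = cos(theta);
Q(p,q) = sin(theta);  Q(q,p) = -sin(theta);
A = Q'*A*Q;           % zeros appear in the (p-1,q) and (q,p-1) positions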

Example 4.14 Use Given’s method to reduce the matrix


 
2 −1 1 4
 −1 3 1 2 
A= 
 1 1 5 −3 
4 2 −3 6

to a symmetric tridiagonal form and then find the eigenvalues of A.

Solution.
Step I. Create a zero in the (1, 3) position by using the first orthogonal
transformation matrix as
 
1 0 0 0
 0 c s 0 
Q23 = 
 0 −s c 0  .

0 0 0 1

To find the value of the cos θ and sin θ, we have


   
θ = − arctan( a13 / a12 ) = − arctan( 1 / (−1) ) ≈ 0.7854,

cos θ = 0.7071, and sin θ = 0.7071.


Then  
Q23 = [ 1.0   0        0        0
        0     0.7071   0.7071   0
        0    −0.7071   0.7071   0
        0     0        0        1.0 ]

and

QT23 = [ 1   0        0        0
         0   0.7071  −0.7071   0
         0   0.7071   0.7071   0
         0   0        0        1 ],
which gives
 
2.0 −1.4142 0 4.0
 −1.4142 3.0 −1.0 3.535 
A1 = QT23 AQ23 = .
 0 −1.0 5.0 −0.7071 
4.0 3.535 −0.7071 6.0
Note that because the matrix is symmetric, the lower part of A1 is the
same as the upper part.
Step II. Create a zero in the (1, 4) position by using the second orthogonal
transformation matrix as
 
1 0 0 0
 0 c 0 s 
Q24 =  0

0 1 0 
0 −s 0 c
and
   
θ = − arctan( a14 / a12 ) = − arctan( 4 / (−1.4142) ) ≈ 1.2310,

cos θ = 0.3333, and sin θ = 0.9428.



Then
   
1 0 0 0 1 0 0 0
 0 0.3333 0 0.9428   0 0.3333 0 −0.9428 
Q24 =
 0
 and QT24 = ,
0 1 0   0 0 1 0 
0 −0.9428 0 0.3333 0 0.9428 0 0.3333
which gives
 
2.0 −4.2426 0 0
 −4.2426 3.4444 0.3333 −3.6927 
A2 = QT24 A1 Q24 = .
 0 0.3333 5.0 −1.1785 
0 −3.6927 −1.1785 5.5556
Step III. Create a zero in the (2, 4) position by using the third orthogonal
transformation matrix as
 
1 0 0 0
 0 1 0 0 
Q34 =  0

0 c s 
0 0 −s c
and
   
θ = − arctan( a24 / a23 ) = − arctan( −3.6927 / 0.3333 ) ≈ 1.4809,

cos θ = 0.0899, and sin θ = 0.9960.


Then
   
1 0 0 0 1 0 0 0
 0 1 0 0   0 1 0 0 
Q34 =  and QT34 = ,
 0 0 0.0899 0.9960   0 0 0.0899 −0.9960 
0 0 −0.9960 0.0899 0 0 0.9960 0.0899
which gives
 
2.0 −4.2426 0 0
 −4.2426 3.4444 3.7077 0 
A3 = QT34 A2 Q34 =  = C.
 0 3.7077 5.7621 1.1097 
0 0 1.1097 4.7934

By using the Sturm sequence iteration, the eigenvalues of the symmet-


ric tridiagonal matrix C are 9.621, 5.204, 3.560, and −2.385, which are also
the eigenvalues of A. •

To get the above results using the MATLAB Command Window, we


do the following:

>> A = [2 -1 1 4; -1 3 1 2; 1 1 5 -3; 4 2 -3 6];
>> sol = Given(A);

Program 4.7
MATLAB m-file for Given’s method
function sol=Given(A)
[n, n] = size(A); t = 0; for i=1:n; for j=1:n
if i==j; Q(i,j)=1; else; Q(i,j)=0; end; end;end
for i=1:n-2; for j=i+2:n; t=t+1;
for f=1:n; for g=1:n
Q(f, t ∗ n + g) = Q(f, g); end; end;
theta=atan(A(i,j)/A(i,i+1));
Q(i+1, t∗n+i+1) = cos(theta); Q(i+1, t∗n+j) = −sin(theta);
Q(j, t ∗ n + i + 1) = sin(theta); Q(j, t ∗ n + j) = cos(theta);
for f=1:n; for g=1:n; sum=0; for l=1:n
sum = sum + A(f,l)*Q(l,t*n+g); end; aa(f,g)=sum; end; end
for f=1:n; for g=1:n; sum=0; for l=1:n
sum = sum + Q(l, t ∗ n + f ) ∗ aa(l, g); end
A(f,g)=sum; end; end; end; end T=A
% Solve the tridiagonal matrix using Sturm sequence method
ff(1,:)=[1 0 0 0 0]; ff(2,:)=[A(1,1) -1 0 0 0];
for i=3:n+1; h=[A(i-1,i-1) -1];
f f (i, 1) = h(1) ∗ f f (i − 1, 1) − A(i − 1, i − 2)ˆ 2∗f f (i − 2, 1);
for z=2:n+1
f f (i, z) = h(1)∗f f (i−1, z)+h(2)∗f f (i−1, z −1)−A(i−1, i−2)
ˆ 2∗f f (i − 2, z); end;end
for i=1:n+1; y(i) = f f (n + 1, n + 2 − i); end; eigval = roots(y)

4.5.4 Householder’s Method


This method is a variation of Given’s method and enables us to reduce a
symmetric matrix A to a symmetric tridiagonal matrix form C having the
same eigenvalues. It reduces a given matrix into a symmetric tridiagonal
form with about half as much computation as Given’s method requires.
This method is used to reduce a whole row and column (except for the
tridiagonal elements) to zero at a time. Note that the symmetric tridiag-
onal matrix form by Given’s method and Householder’s method may be
different, but the eigenvalues will be same.

Definition 4.3 (Householder Matrix)

A Householder matrix Hw is a matrix of the form


 
Hw = I − 2wwT = I − (2/(wT w)) wwT ,

where I is an n × n identity matrix and w is some n × 1 vector satisfying


wT w = w1² + w2² + · · · + wn² = 1,

i.e., the vector w has unit length. •

It is easy to verify that a Householder matrix Hw is symmetric, i.e.,


 
Hw = [ 1 − 2w1²    −2w1 w2    · · ·   −2w1 wn
       −2w1 w2     1 − 2w2²   · · ·   −2w2 wn
         ...          ...      ...      ...
       −2w1 wn     −2w2 wn    · · ·   1 − 2wn² ]  =  HwT

and is orthogonal, i.e.,

Hw2 = (I − 2wwT )(I − 2wwT )


= I − 4wwT + 4wwT wwT
= I − 4wwT + 4wwT
= I, (since wT w = 1)

Thus,
Hw = Hw−1 = HwT ,
which shows that Hw is symmetric. Note that the determinant of a House-
holder matrix Hw is always equal to −1.

Example 4.15 Consider a vector w = [1, 2]T , then


Hw = I − (2/5) wwT ,

so

     [ 1  0 ]        [ 1  2 ]   [  3/5   −4/5 ]
Hw = [ 0  1 ] − (2/5)[ 2  4 ] = [ −4/5   −3/5 ],
which shows that the given Householder matrix Hw is symmetric and or-
thogonal and the determinant of Hw is −1. •

A Householder matrix Hw corresponding to a given w may be gener-


ated using the MATLAB Command Window as follows:

>> w = [1 2]';
>> w = w/norm(w);
>> Hw = eye(2) - 2*w*w';
The basic steps of Householder's method for converting a symmetric matrix
into a symmetric tridiagonal matrix are as follows:

A1 = A

A2 = QT1 A1 Q1

A3 = QT2 A2 Q2
. . .
Ak+1 = QTk Ak Qk ,

where Qk matrices are the Householder transformation matrices and can


be constructed as
Qk = I − sk wk wkT
and
sk = 2 / (wkT wk ).
The coefficients of a vector wk are defined in terms of a matrix A as

wik = 0       for i = 1, 2, . . . , k,
wik = aik     for i = k + 2, k + 3, . . . , n,

and

wk+1,k = ak+1,k ± √( a²k+1,k + a²k+2,k + · · · + a²nk ).

The positive sign or negative sign of wk+1k can be taken depending on


the sign of a coefficient ak+1k of a given matrix A.
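For illustration, a single Householder step (step k) can be sketched in MATLAB as follows; this is only a sketch of the formulas above (assuming ak+1,k ≠ 0), with the sign of w(k+1) chosen to match the sign of ak+1,k:

% One Householder similarity transformation, step k (sketch).
n = size(A,1);
w = zeros(n,1);
w(k+2:n) = A(k+2:n, k);                                % w_ik = a_ik, i = k+2,...,n
w(k+1) = A(k+1,k) + sign(A(k+1,k))*norm(A(k+1:n, k));  % w_{k+1,k} = a_{k+1,k} +/- sqrt(...)
s = 2/(w'*w);                                          % s_k
Q = eye(n) - s*(w*w');                                 % Q_k = I - s_k w_k w_k'
A = Q'*A*Q;                                            % A_{k+1} = Q_k' A_k Q_k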

Householder’s method transforms a given n × n symmetric matrix to


a symmetric tridiagonal matrix in exactly (n −2) steps. Each step of
the method creates a zero in a complete row and column. The first step
annihilates elements in the positions (1, 3), (1, 4), . . . , (1, n) simultaneously.
Similarly, step r annihilates elements in the positions (r, r + 2), (r, r +
3), . . . , (r, n) simultaneously. Once a symmetric tridiagonal form has been
achieved, then the eigenvalues of a given matrix can be calculated by using
the Sturm sequence iteration. After calculating the eigenvalues, the shifted
inverse power method can be used to find the eigenvectors of a symmetric
tridiagonal matrix and then the eigenvectors of the original matrix A can
be found by premultiplying these eigenvectors (of a symmetric tridiagonal
matrix) by the product of successive transformation matrices.

Example 4.16 Reduce the matrix


 
30 6 5
A =  6 30 9 
5 9 30

to a symmetric tridiagonal form using Householder’s method.

Solution. Since the given matrix is of size 3 × 3, only one iteration is re-
quired in order to reduce the given symmetric matrix into symmetric tridi-
agonal form. Thus, for k = 1, we construct the elements of the vector w1
as follows:

w11 = 0,
w31 = a31 = 5,
w21 = a21 ± √( a²21 + a²31 ) = 6 ± √( 6² + 5² ) = 6 ± 7.81.

Since the given coefficient a21 is positive, the positive sign must be used for
w21 , i.e.,
w21 = 13.81.
Therefore, the vector w1 is now determined to be

w1 = [0, 13.81, 5]T

and
s1 = 2 / ( (0)² + (13.81)² + (5)² ) = 0.0093.
Thus, the first transformation matrix Q1 for the first iteration is
   
     [ 1  0  0 ]            [   0   ]
Q1 = [ 0  1  0 ] − 0.0093   [ 13.81 ] [ 0  13.81  5 ],
     [ 0  0  1 ]            [   5   ]

and it gives  
1 0 0
Q1 =  0 −0.7682 −0.6402  .
0 −0.6402 0.7682
Therefore,
 
30.0 −7.810 0
A2 = QT1 A1 Q1 =  −7.810 38.85 −1.622  ,
0 −1.622 21.15

which is the symmetric tridiagonal form. •



To get the above results using the MATLAB Command Window, we


do the following:

>> A = [30 6 5; 6 30 9; 5 9 30];


>> sol = HHHM (A);

Program 4.8
MATLAB m-file for Householder’s method
function sol=HHHM(A)
[n, n] = size(A); Q = eye(n); for k=1:n-2
alf a = sign(A(k+1, k))∗sqrt(A((k+1) : n, k)0 ∗A((k+1) : n, k));
w = zeros(n, 1);
w(k+1,1) = A(k+1,k) + alfa; w((k+2):n,1) = A((k+2):n,k);
P = eye(n) − 2 ∗ w ∗ w0 /(w0 ∗ w); Q = Q ∗ P ; A = P ∗ A ∗ P ; end
T=A % this is the tridiagonal matrix
% using Sturm sequence method
ff(1,:)=[1 0 0 0 0]; ff(2,:)=[A(1,1) -1 0 0 0];
for i=3:n+1
h = [A(i-1,i-1) -1]; ff(i,1) = h(1)*ff(i-1,1) - A(i-1,i-2)^2*ff(i-2,1);
for z=2:n+1
ff(i,z) = h(1)*ff(i-1,z) + h(2)*ff(i-1,z-1) - A(i-1,i-2)^2*ff(i-2,z); end; end
for i=1:n+1; y(i)=ff(n+1,n+2-i); end; alfa; Q; eigval=roots(y)

Example 4.17 Reduce the matrix


 
A = [ 7   1   2   1
      1   8   1  −1
      2   1   3   1
      1  −1   1   2 ]

to symmetric tridiagonal form using Householder’s method, and then find


the approximation of the eigenvalues of A using the Sturm sequence iteration.

Solution. Since the size of A is 4 × 4, we need two iterations to convert


the given symmetric matrix into symmetric tridiagonal form. For the first
iteration, we take k = 1, and we construct the elements of the vector w1
as follows:

w11 = 0,
w31 = a31 = 2,
w41 = a41 = 1,
w21 = a21 ± √( a²21 + a²31 + a²41 ) = 1 ± √( 1² + 2² + 1² ) = 1 ± √6.

Since the given coefficient a21 > 0, the positive sign must be used for w21 ,
and it gives
w21 = 1 + 2.4495 = 3.4495.
Thus, the vector w1 takes the form

w1 = [0, 3.4495, 2, 1]T

and
s1 = 2 / (w1T w1) = 2 / ( (0)² + (3.4495)² + (2)² + (1)² ) = 2 / 16.90 = 0.1183.

Thus, the first transformation matrix Q1 for the first iteration is


   
                     [ 1  0  0  0 ]            [   0    ]
Q1 = I − s1 w1 w1T = [ 0  1  0  0 ] − 0.1183   [ 3.4495 ] [ 0  3.4495  2  1 ],
                     [ 0  0  1  0 ]            [   2    ]
                     [ 0  0  0  1 ]            [   1    ]

and it gives
 
1.0000 0 0 0
 0 −0.4082 −0.8165 −0.4082 
Q1 =  .
 0 −0.8165 0.5266 −0.2367 
0 −0.4082 −0.2367 0.8816

Therefore,
 
A2 = QT1 A1 Q1 = [  7.0000  −2.4495   0.0000   0
                   −2.4495   4.6667   1.5700   1.1933
                    0.0000   1.5700   4.7816   2.9972
                    0        1.1933   2.9972   3.5518 ] .

Now for k = 2, we construct the elements of the vector w2 as follows:

w12 = 0,
w22 = 0,
w42 = a42 = 1.1933,
w32 = a32 ± √( (1.5700)² + (1.1933)² ) = 1.5700 ± √3.8889 = 1.5700 ± 1.9721.

Since the given coefficient a32 > 0, the positive sign must be used for w32 ,
and it gives
w32 = 1.5700 + 1.9721 = 3.5421.
Thus, the vector w2 takes the form

w2 = [0, 0, 3.5421, 1.1933]T

and
s2 = 2 / (w2T w2) = 2 / 13.9704 = 0.1432.
Thus, the second transformation matrix Q2 for the second iteration is
   
                     [ 1  0  0  0 ]            [   0    ]
Q2 = I − s2 w2 w2T = [ 0  1  0  0 ] − 0.1432   [   0    ] [ 0  0  3.5421  1.1933 ],
                     [ 0  0  1  0 ]            [ 3.5421 ]
                     [ 0  0  0  1 ]            [ 1.1933 ]

and it gives
 
1.0000 0 0 0
 0 −0.4082 0.8971 0.1690 
Q2 =  .
 0 −0.8165 −0.2760 −0.5071 
0 −0.4082 −0.3450 0.8452

Therefore,
 
A3 = QT2 A2 Q2 = [  7.0000  −2.4495   0.0000   0.0000
                   −2.4495   4.6667  −1.9720   0.0000
                    0.0000  −1.9720   7.2190  −0.2100
                    0.0000   0.0000  −0.2100   1.1143 ]  = T,
which is the symmetric tridiagonal form.

To find the eigenvalues of this symmetric tridiagonal matrix we use the


Sturm sequence iteration

f4 (λ) = (a4 − λ)f3 (λ) − b24 f2 (λ),

where
f3 (λ) = (a3 − λ)f2 (λ) − b23 f1 (λ)
and
f2 (λ) = (a2 − λ)f1 (λ) − b22 f0 (λ),
with
f1 (λ) = (a1 − λ) and f0 (λ) = 1.
Since

a1 = 7.0000, a2 = 4.6667, a3 = 7.2190, a4 = 1.1143

and
b2 = −2.4495, b3 = −1.9720, b4 = −0.2100.
Thus,

f4 (λ) = λ4 − 20λ3 + 128.0002λ2 − 284.0021λ + 183.0027 = 0,

and solving this characteristic equation, we get

λ1 = 9.2510, λ2 = 7.1342, λ3 = 2.5100, λ4 = 1.1047,

which are the eigenvalues of the symmetric tridiagonal matrix T and are
also the eigenvalues of the given matrix A. Once the eigenvalues of A are
obtained, then the corresponding eigenvectors of A can be obtained by using
the shifted inverse power method. •

4.6 Matrix Decomposition Methods


In the following we will discuss two matrix decomposition methods, called
the QR method and the LR method, which help us to find the eigenvalues
of a given general matrix.

4.6.1 QR Method
We know that the Jacobi, Given’s, and Householder’s methods are appli-
cable only to symmetric matrices for finding all the eigenvalues of a matrix
A. First, we describe the QR method, which can find all the eigenvalues of
a general matrix. In this method we decompose an arbitrary real matrix
A into a product QR, where Q is an orthogonal matrix and R is an upper-
triangular matrix with nonnegative diagonal elements. Note that when A
is nonsingular, this decomposition is unique.

Starting with A1 = A, the QR method iteratively computes similar


matrices Ai , i = 2, 3, . . ., in two stages:

(1) Factor Ai into Qi Ri , i.e., Ai = Qi Ri .

(2) Define Ai+1 = Ri Qi .

Note that from stage (1), we have

Ri = Qi−1 Ai ,

and using this, stage (2) can be written as

Ai+1 = Qi−1 Ai Qi = QiT Ai Qi ,

where all Ai are similar to A, and thus have the same eigenvalues. It turns
out that in the case where the eigenvalues of A all have different magnitude,

|λ1 | > |λ2 | > · · · > |λn |,

then the QR iterates Ai approach an upper-triangular matrix, and thus the


elements of the main diagonal approach the eigenvalues of a given matrix

A. When there are distinct eigenvalues of the same size, the iterates Ai
may not approach an upper-triangular matrix; however, they do approach
a matrix that is near enough to an upper-triangular matrix to allow us to
find the eigenvalues of A.
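Using MATLAB's built-in qr factorization, the two stages above can be sketched as a simple loop (the iteration limit and tolerance here are illustrative choices, not prescribed values):

% Basic (unshifted) QR iteration (sketch).
Ai = A;
for i = 1:200                                % illustrative iteration limit
    [Qi, Ri] = qr(Ai);                       % stage (1): A_i = Q_i R_i
    Ai = Ri*Qi;                              % stage (2): A_{i+1} = R_i Q_i
    if max(max(abs(tril(Ai,-1)))) < 1.0e-8   % stop when the lower part is small
        break
    end
end
eigvals = diag(Ai)                           % approximate eigenvalues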

If a given matrix A is symmetric and tridiagonal, since the QR transfor-


mation preserves symmetry, all subsequent matrices Ai will be symmetric
and, hence, tridiagonal. Thus, the combined method of first reducing a
symmetric matrix to a symmetric tridiagonal form by the Householder
transformations and then applying the QR method is probably the most
effective for evaluating all the eigenvalues of a symmetric matrix.

The simplest way of calculating the QR decomposition of an n × n


matrix A is to premultiply A by a series of rotation matrices, and the
values of p, q, and θ are chosen to annihilate one of the lower-triangular
elements. The value of θ, which is chosen to create zero in the (q, p)
position, is defined as

θ = − arctan( aqp / app ).
The first stage of the decomposition annihilates the element in posi-
tion (2,1) using the rotation matrix QT12 . The next two stages annihilate
elements in positions (3,1) and (3,2) using the rotation matrices QT13 and
QT23 , respectively. The process continues in this way, creating zeros in row
order, until the rotation matrix QTn−1n is used to annihilate the element
in the position (n, n − 1). The zeros created are retained in a similar way
as in Given’s method, and an upper-triangular matrix R is produced after
n(n − 1)/2 premultiplications, i.e.,

QTn−1n · · · QT13 QT12 A = R,

which can be rearranged as

A = (Q12 Q13 · · · Qn−1n )R = QR,

since QTpq = Qpq−1 .

Example 4.18 Find the first QR iteration for the matrix


 
1 4 3
A =  2 3 1 .
2 6 5
Solution. Step I. Create a zero in the (2, 1) position by using the first
orthogonal transformation matrix
 
c s 0
Q12 =  −s c 0 
0 0 1
and
 
θ = − arctan( a21 / a11 ) = − arctan(2) ≈ −1.1071,

cos θ ≈ 0.4472 and sin θ ≈ −0.8944.


Then
   
0.4472 −0.8944 0 0.4472 −0.8944 0
Q12 =  0.8944 0.4472 0  and QT12 =  0.8944 0.4472 0  ,
0 0 1 0 0 1
which gives  
2.2360 4.4720 2.2360
QT12 A =  0 −2.2360 −2.2360  .
2.0000 6.0000 5.0000
Step II. Create a zero in the (3, 1) position by using the second orthogonal
transformation matrix
 
c 0 s
Q13 =  0 1 0  ,
−s 0 c
with
   
θ = − arctan( a31 / a11 ) = − arctan( 2 / 2.2360 ) ≈ −0.7297,

cos θ ≈ 0.7453, and sin θ ≈ −0.6667.



Then
   
0.7453 0 −0.6667 0.7453 0 0.6667
Q13 = 0 1 0  and QT13 = 0 1 0 ,
0.6667 0 0.7453 −0.6667 0 0.7453
which gives
 
3.0001 7.3336 5.0002
QT13 (QT12 A) =  0 −2.2360 −2.2360  .
0.0001 1.4909 2.2363
Step III. Create a zero in the (3, 2) position by using the third orthogonal
transformation matrix
 
1 0 0
Q23 =  0 c s ,
0 −s c
with
   
θ = − arctan( a32 / a22 ) = − arctan( 1.4909 / (−2.2360) ) ≈ 0.5880,

cos θ ≈ 0.8320, and sin θ ≈ 0.5548.


Then
   
1 0 0 1 0 0
Q23 = 0 0.8320 0.5548  and QT23 =  0 0.8320 −0.5548  ,
0 −0.5548 0.8320 0 0.5548 0.8320
which gives
 
3 7.3333 5
R1 = QT23 (QT13 QT12 A) =  0 −2.6874 −3.1009  ,
0 0 0.6202
which is the required upper-triangular matrix R1 . The matrix Q1 can be
computed as
 
0.3333 −0.5788 −0.7442
Q1 = Q12 Q13 Q23 =  0.6667 0.7029 −0.2481  .
0.6667 −0.4134 0.6202

Hence, the original matrix A can be decomposed as

 
0.9999 3.9997 2.9997
A1 = Q1 R1 =  2.0001 3.0001 1.0000  ,
2.0001 6.0001 5.0001

and the new matrix can be obtained as

 
9.2222 1.3506 −0.9509
A2 = R1 Q1 =  −3.8589 −0.6068 −1.2564  ,
0.4134 −0.2564 0.3846

which is the required first QR iteration for the given matrix. •

Note that if we continue in the same way with the 21 iterations, the
new matrix A21 becomes the upper-triangular matrix

 
8.5826 −4.9070 −2.1450
A21 = R20 Q20 = 0 1 −1.1491  ,
0 0 −0.5825

and its diagonal elements are the eigenvalues, λ = 8.5826, 1, −0.5825, of


the given matrix A. Once the eigenvalues have been determined, the cor-
responding eigenvectors can be computed by the shifted inverse power
method.
To get the above results using the MATLAB Command Window, we
do the following:

>> A = [1 4 3; 2 3 1; 2 6 5];
>> sol = QRM (A);

Program 4.9
MATLAB m-file for the QR method
function sol=QRM(A)
[n,n]=size(A); M=0; for i=1:n; for j=1:n
if j < i; M=M+1;end; end;end;
for i=1:n; I(i,i)=1;end
dd=1; while dd > 0.0001; Q=I; Qs=I; kk=1;
for i=2:n; for j=1:i-1
t = −atan((A(i, j)/A(j, j))); Q(j, j) = cos(t);
Q(j, i) = sin(t); Q(i, j) = −sin(t); Q(i, i) = cos(t); Q;
A = Q0 ∗ A; Qs(:, :, kk) = Q; kk = kk + 1; Q = I; end; end;
Q = Qs(:, :, M ); f orc = M − 1 : −1 : 1
Q = Qs(:, :, c) ∗ Q; end; R = A; Q; A = R ∗ Q; k = 1;
for i=1:n; for j=1:n
if j < i; m(k) = A(i, j); k = k + 1; end; end; end;
m; dd = max(abs(m)); end; for i=1:n; eigvals(i)=A(i,i); end

Example 4.19 Find the first QR iteration for the matrix


 
5 −2
A= ,
−2 8

and if (Q1 R1 )x = b and R1 x = c, with c = QT1 b, then find the solution of


the linear system Ax = b, where b = [7, 8]T .

Solution. First, create a zero in the (2, 1) position with the help of the
orthogonal transformation matrix
 
c s
Q12 = ,
−s c

and then, to find the values of θ, c, and s, we calculate

θ = − arctan( a21 / a11 ) = − arctan(−0.4) = 0.3805,

cos θ ≈ 0.9285 and sin θ ≈ 0.3714.



So,  
0.9285 0.3714
Q1 = Q12 =
−0.3714 0.9285
and  
5.3853 −4.8282
R1 = QT12 A = .
0 6.6852
Since
    
0.9285 −0.3714 7 3.5283
c= QT1 b = = ,
0.3714 0.9285 8 10.0278

therefore, solving the system


    
5.3853 −4.8282 x1 3.5283
R1 x = = = c,
0 6.6852 x2 10.0278
we get    
x1 2.0000
= ,
x2 1.5000
which is the required solution of the given system. •

4.6.2 LR Method
Another method, which is very similar to the QR method, is Rutishauser’s
LR method. This method is based upon the decomposition of a matrix A
into the product of lower-triangular matrix L (with unit diagonal elements)
and upper-triangular matrix R. Starting with A1 = A, the LR method
iteratively computes similar matrices Ai , i = 2, 3, . . . , in two stages.
(1) Factor Ai into Li Ri , i.e., Ai = Li Ri .

(2) Define Ai+1 = Ri Li .


Each complete step is a similarity transformation because

Ai+1 = Ri Li = Li−1 Ai Li ,

and so all of the matrices Ai have the same eigenvalues. This triangular
decomposition-based method enables us to reduce a given nonsymmetric

matrix to an upper-triangular matrix whose diagonal elements are the pos-


sible eigenvalues of a given matrix A, in decreasing order of magnitude. The
rate at which the lower-triangular elements ajk^(i) of Ai converge to zero is of
order (λj /λk )^i , j > k.

This implies, in particular, that the order of convergence of the ele-


ments along the first subdiagonal is (λj+1 /λj )^i , and so convergence will be
slow whenever two or more real eigenvalues are close together. The situa-
tion is more complicated if any of the eigenvalues are complex.

Since we know that the triangular decomposition is not always possible,


we will use decomposition by partial pivoting, starting with
Pi Ai = Li Ri ,
where Pi represents the row permutations used in the decomposition. In
order to preserve eigenvalues it is necessary to calculate Ai+1 from
Ai+1 = (Ri Pi )Li .
It is easy to see that this is a similarity transformation because
Ai+1 = (Ri Pi )Li = Li−1 Pi Ai Pi Li ,

and Pi−1 = Pi .

The matrix Pi does not have to be computed explicitly; Ri Pi is just a


column permutation of Ri using interchanges corresponding to row inter-
changes used in the decomposition of Ai .
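With MATLAB's built-in lu factorization, which performs the triangular decomposition with partial pivoting, one hedged sketch of the LR iteration with pivoting is:

% LR iteration with partial pivoting (sketch).
Ai = A;
for i = 1:50                    % illustrative iteration limit
    [Li, Ri, Pi] = lu(Ai);      % P_i A_i = L_i R_i
    Ai = (Ri*Pi)*Li;            % A_{i+1} = (R_i P_i) L_i
end
eigvals = diag(Ai)              % approximate eigenvalues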
Example 4.20 Use the LR method to find the eigenvalues of the matrix
 
2 −2 3
A= 0 3 −2  .
0 −1 2
Solution. The exact eigenvalues of the given matrix A are λ = 1, 2, 4.
The first triangular decomposition of A = A1 produces
   
1.0000 0 0 2.0000 −2.0000 3.0000
L1 =  0 1.0000 0  , R1 =  0 3.0000 −2.0000  ,
0 −0.3333 1.0000 0 0 1.3333

and no rows are interchanged. Then


 
2.0000 −3.0000 3.0000
A2 = R1 L1 =  0 3.6667 −2.0000  .
0 −0.4444 1.3333
The second triangular decomposition of A2 produces
   
1.0000 0 0 2.0000 −3.0000 3.0000
L2 =  0 1.0000 0  , R2 =  0 3.6667 −2.0000  ,
0 −0.1212 1.0000 0 0 1.0909
and again, no rows are interchanged. Then
 
2.0000 −3.3636 3.0000
A3 = R2 L2 =  0 3.9091 −2.0000  .
0 −0.1322 1.0909
In a similar way, the next matrices in the sequence are
  
2 −3.3636 3.0000 1 0 0
A4 =  0 3.9091 −2.0000   0 1.0000 0 
0 0 1.0233 0 −0.0338 1
 
2 −3.4651 3.0000
= 0 3.9767 −2.0000 
0 −0.0346 1.0233
  
2 −3.4651 3.0000 1 0 0
A5 =  0 3.9767 −2.0000   0 1.0000 0 
0 0 1.0058 0 −0.0087 1
 
2 −3.4912 3.0000
= 0 3.9942 −2.0000 
0 −0.0088 1.0058
  
2 −3.4912 3.0000 1 0 0
A6 =  0 3.9942 −2.0000   0 1.0000 0 
0 0 1.0015 0 −0.0022 1

 
2 −3.4978 3.0000
= 0 3.9985 −2.0000 
0 −0.0022 1.0015
    
2 −3.4978 3 1 0 0 2 −3.4995 3.0000
A7 =  0 3.9985 −2   0 1.0000 0  =  0 3.9996 −2.0000 
0 0 1 0 −0.0005 1 0 −0.0005 1.0004
    
2 −3.4995 3 1 0 0 2 −3.4999 3.0000
A8 =  0 3.9996 −2   0 1.0000 0  =  0 3.9999 −2.0000 
0 0 1 0 −0.0001 1 0 −0.0001 1.0001
    
2 −3.4999 3 1 0 0 2 −3.5 3
A9 =  0 3.9999 −2   0 1 0  =  0 4 −2  .
0 0 1 0 0 1 0 0 1

4.6.3 Upper Hessenberg Form


In employing the QR method or the LR method to find the eigenvalues of
a nonsymmetric matrix A, it is preferable to first use similarity transforma-
tions to convert A to upper Hessenberg form; we then demonstrate the
usefulness of this form in the QR and LR methods.

Definition 4.4 A matrix A is in upper Hessenberg form if


aij = 0, for all i, j, such that i − j > 1. •

For example, in the following 4 × 4 matrix case, the nonzero elements


are  
A = [ 3  2  1  5
      4  6  7  3
      0  8  9  5
      0  0  7  8 ] .
Note that one way to characterize upper Hessenberg form is that it is
almost triangular. This is important, since the eigenvalues of the triangular
matrix are the diagonal elements. The upper Hessenberg form of a matrix
A can be achieved by a sequence of Householder transformations or the
Gaussian elimination procedure. Here, we will use the Gaussian elimination

procedure since it is about a factor of 2 more efficient than Householder’s


method. It is possible to construct matrices for which the Householder
reduction, being orthogonal, is stable and elimination is not, but such
matrices are extremely rare in practice.
A general n × n matrix A can be reduced to upper Hessenberg form in
exactly n − 2 steps.
Consider a 5 × 5 matrix

 
a11 a12 a13 a14 a15

 a21 a22 a23 a24 a25 

A=
 a31 a32 a33 a34 a35 .

 a41 a42 a43 a44 a45 
a51 a52 a53 a54 a55

The first step of reducing the given matrix A = A1 into upper Hes-
senberg form is to eliminate the elements in the (3, 1), (4, 1), and (5, 1)
positions. It can be done by subtracting multiples m31 = a31 /a21 , m41 = a41 /a21 ,
and m51 = a51 /a21 of row 2 from rows 3, 4, and 5, respectively, and considering
the matrix
 
1 0 0 0 0

 0 1 0 0 0 

M1 = 
 0 m31 1 0 0 .

 0 m41 0 1 0 
0 m51 0 0 1

Since we wish to carry out a similarity transformation to preserve eigen-


values, it is necessary to find the inverse matrix M1−1 and compute

A2 = M1−1 A1 M1 .

The right-hand side multiplication gives us


 (2) 
a11 a12 a13 a14 a15
 
 
 a (2)
 21 a22 a23 a24 a25 

 
 
(2) (2) (2) (2)
A2 =  0 a32 a33 a34 a35 ,
 
 
 

 0 (2) (2) (2) (2) 
 a42 a43 a44 a45 

 
(2) (2) (2) (2)
0 a52 a53 a54 a55
(2)
where aij denotes the new element in (i, j). In the second step, we elim-
inate the elements in the (4, 2) and (5, 2) positions. This can be done by
subtracting multiples m42 = a42^(2) /a32^(2) and m52 = a52^(2) /a32^(2) of row 3 from rows 4 and
5, respectively, and considering the matrix
 
1 0 0 0 0
 0 1 0 0 0 
 
M2 =   0 0 1 0 0 .

 0 0 m42 1 0 
0 0 m52 0 1
Hence,
 (2) (3) 
a11 a12 a13 a14 a15
 
 
 a (2) (3)
 21 a22 a23 a24 a25


 
 
A3 = M2−1 A2 M2 =  0 (2) (3) (2)
a32 a33 a34 a35
(2)
,
 
 
 

 0 (3) (3) (3) 
 0 a43 a44 a45 

 
(3) (3) (3)
0 0 a53 a54 a55
(3)
where aij denotes the new element in (i, j). In the third step, we elimi-
nate the elements in the (5, 3) position. This can be done by subtracting
multiples m53 = a53^(3) /a43^(3) of row 4 from row 5, and considering the matrix

 
1 0 0 0 0

 0 1 0 0 0 

M3 = 
 0 0 1 0 0 .

 0 0 0 1 0 
0 0 0 m53 1

Hence,
 (2) (3) (4) 
a11 a12 a13 a14 a15
 
 
 a (2) (3) (4)
 21 a22 a23 a24 a25


 
 
A4 = M3−1 A3 M3 =  0 (2) (3) (4)
a32 a33 a34 a35
(2)
,
 
 
 

 0 (3) (4) (4) 
 0 a43 a44 a45 

 
(4) (4)
0 0 0 a54 a55

which is in upper Hessenberg form.
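MATLAB also provides the built-in function hess, which reduces a square matrix to upper Hessenberg form using orthogonal similarity transformations; the resulting matrix generally differs from the elimination-based form constructed above, but it has the same eigenvalues:

>> H = hess(A);     % upper Hessenberg form, similar to A
>> eig(A), eig(H)   % the two lists of eigenvalues agree (possibly in a different order)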

Example 4.21 Use the Gaussian elimination method to convert the ma-
trix
 
A1 = [ 5  3  6  4  9
       4  6  5  3  4
       4  2  3  1  1
       2  4  6  3  3
       2  5  6  4  7 ]

into upper Hessenberg form.

Solution. In the first step, we eliminate the elements in the (3, 1), (4, 1)
and (5, 1) positions. It can be done by subtracting multiples m31 = 4/4 = 1,
m41 = 2/4 = 0.5, and m51 = 2/4 = 0.5 of row 2 from rows 3, 4, and 5,
respectively. The matrices M1 and M1−1 are as follows:


   
1 0 0 0 0 1 0 0 0 0
 0 1 0 0 0   0 1 0 0 0
  
−1
 
M1 =  0
 −1 1 0 0  and M1 = 

 0 1 1 0 0 .

 0 −0.5 0 1 0   0 0.5 0 1 0 
0 −0.5 0 0 1 0 0.5 0 0 1

Then the transformation is


 
5 15.50 6 4 9
 4 14.50 5 3 4 
M1−1 A1 M1
 
A2 = =
 0 −8.50 −2 −2 −3 
.
 0 5.75 3.50 1.50 1 
0 9.25 3.50 2.50 5

In the second step, we eliminate the elements in the (4, 2) and (5, 2)
positions. This can be done by subtracting multiples m42 = 5.75/(−8.50) = −0.6765
and m52 = 9.25/(−8.50) = −1.0882 of row 3 from rows 4 and 5, respectively. The
matrices M2 and M2−1 are as follows:
   
1 0 0 0 0 1 0 0 0 0
 0 1 0 0 0   0 1 0 0 0 
−1
   
M2 =   0 0 1 0 0  and M2 =  0 0 1 0 0 
.

 0 0 0.6765 1 0   0 0 −0.6765 1 0 
0 0 1.0882 0 1 0 0 −1.0882 0 1

Then the transformation is


 
5 15.50 −6.50 4 9
 4 14.50 −1.3824 3 4 
A3 = M2−1 A1 M2 = 
 
 0 −8.50 2.6176 −2 −3 .

 0 0 3.1678 0.1471 −1.0294 
0 0 −0.7837 0.3235 1.7353

In the last step, we eliminate the elements in the (5, 3) position. This
can be done by subtracting the multiple m53 = −0.7837/3.1678 = −0.2474 of row 4

from row 5. The matrices M3 and M3−1 are as follows:


   
1 0 0 0 0 1 0 0 0 0
 0 1 0 0 0   0 1 0 0 0
  
−1
 
 0 0 1 0 0 
M3 =  and M 3 =  0 0 1 0 0 .
  
 0 0 0 1 0   0 0 0 1 0 
0 0 0 0.2474 1 0 0 0 −0.2474 1

Then the transformation is


 
5 15.50 −6.50 1.7733 9
 4 14.50 −1.3824 2.0104 4 
 
A4 =  0 −8.50
 2.6176 −1.2578 −3 ,

 0 0 3.1678 0.4017 −1.0294 
0 0 0 −0.0064 1.4806

which is in required upper Hessenberg form. •

To get the above results using the MATLAB Command Window, we


do the following:

>> A = [5 3 6 4 9; 4 6 5 3 4; 4 2 3 1 1; 2 4 6 3 3; 2 5 6 4 7];
>> sol = hes(A);

Program 4.10
MATLAB m-file for the Upper Hessenberg Form
function sol=hes(A)
n = length(A(1, :)); for i = 1:n-1; m = eye(n);
[w j] = max(abs(A(i+1:n,i)));
if j > i + 1;
t = m(i + 1, :); m(i + 1, :) = m(j, :);
m(j, :) = t; A = m ∗ A ∗ m0 ; end;
m = eye(n); m(i + 2 : n, i + 1) = −A(i + 2 : n, i)/(A(i +
1, i));
mi = m; mi(i + 2 : n, i + 1) = −m(i + 2 : n, i + 1);
A = m ∗ A ∗ mi; mesh(abs(A)); end
Note that the above reduction fails if any aj+1,j^(j) = 0 and, as in Gaussian
elimination, is unstable whenever |mij | > 1. Row and column interchanges
are used to avoid these difficulties (i.e., Gaussian elimination with pivot-
ing). At step j, the elements below the diagonal in column j are examined.
If the element of the largest modulus occurs in row rj , say, then rows j + 1
and rj are interchanged. Here, we perform the transformation

Aj+1 = Mj−1 ( Ij+1,rj−1 Aj Ij+1,rj ) Mj ,

where Ij+1,rj denotes a matrix obtained from the identity matrix by inter-
changing rows j + 1 and rj , and the elements of Mj are all less than or
equal to one in modulus. Note that

Ij+1,rj−1 = Ij+1,rj .

Example 4.22 Use Gaussian elimination with pivoting to convert the ma-
trix  
3 2 1 −1
 1 4 2 1 
A=  2 2 3 −2 

5 1 2 3
into upper Hessenberg form.

Solution. The element of the largest modulus below the diagonal occurs
in the fourth row, so we need to interchange rows 2 and 3 and columns 2
and 3 to get
   
1 0 0 0 3 2 1 −1 1 0 0 0
 0 0 0 1  1 4 2 1  0 0 0 1 
 
A1 = I24 AI24 = 
 0 0 1 0   2 2 3 −2   0 0 1 0  ,


0 1 0 0 5 1 2 3 0 1 0 0

which gives  
3 −1 1 2
 5 3 2 1 
A1 = 
 2 −2 3 2  .

1 1 2 4

Now we eliminate the elements in the (3, 1) and (4, 1) positions. It can
be done by subtracting multiples m31 = 2/5 = 0.4 and m41 = 1/5 = 0.2 of row
2 from rows 3 and 4, respectively. Then the transformation
   
1 0 0 0 3 −1 1 2 1 0 0 0
 0 1 0 0 
A2 = M −1 A1 M =   5 3 2 1   0 1 0 0 
 

 0 −0.4 1 0   2 −2 3 2   0 0.4 1 0 
0 −0.2 0 1 1 1 2 4 0 0.2 0 1
gives  
3 −0.2 1 2
 5 4 2 1 
A2 =  .
 0 −2 2.2 1.6 
0 1.8 1.6 3.8
The element of the largest modulus below the diagonal in the second
column occurs in the third row, and so there is no need to interchange the
row and column. Now we eliminate the elements in the (4, 2) position.
This can be done by subtracting the multiple m42 = 1.8/(−2) = −0.9 of row 3 from
row 4. Then the transformation
   
1 0 0 0 3 −0.2 1 2 1 0 0 0
0 1 0 0 5 4 2 1  0 1 0 0 
A3 = M2−1 A2 M2 = 
    
 
 0 0 1 0   0 −2 2.2 1.6   0 0 1 0 
0 0 0.9 1 0 1.8 1.6 3.8 0 0 0.9 1
gives  
3 −0.2 −0.8 2
 5 4 1.1 1 
A3 = 
 0 −2
,
0.76 1.6 
0 0 −1.136 5.24
which is in upper Hessenberg form. •

Example 4.23 Convert the following matrix to upper Hessenberg form


and then apply the QR method to find its eigenvalues:
 
1 4 3
A =  2 3 1 .
2 6 5

Solution. Since the upper Hessenberg form of the given matrix is


 
1 7 3
H1 =  2 4 1  ,
0 7 4
then applying the QR method on the upper Hessenberg matrix H1 will result
in transformation matrices after iterations 1, 10, 14, and 19 as follows:
 
7.0000 −1.3460 2.0466
H2 = R1 Q1 =  −7.4297 1.8551 −5.5934 
−0.0000 −0.2268 0.1449
 
8.5826 4.4029 6.8555
H10 = R9 Q9 =  −0.0000 1.0069 −1.8962 
0.0000 0.0058 −0.5895
 
8.5826 4.4248 6.8413
H14 = R13 Q13 =  −0.0000 1.0008 −1.9013 
0.0000 0.0007 −0.5834
 
8.5826 4.4279 6.8393
H19 = R18 Q18 =  −0.0000 0.9999 −1.9020  .
−0.0000 −0.0000 −0.5825
In this case the QR method, applied to the upper Hessenberg form, converges in
19 iterations, which is faster than applying the QR method directly to the original
matrix A in Example 4.18. •
Note that the calculation of the QR decomposition is simplified if a
given matrix is converted to upper Hessenberg form. So instead of applying
the decomposition to the original matrix A = A1 , the original matrix
is first transformed to the Hessenberg form. When A1 = H1 is in the
upper Hessenberg form, all the subsequent Hi are also in the same form.
Unfortunately, although transformation to upper Hessenberg form reduces
the number of calculations at each step, the method may still prove to be
computationally inefficient if the number of steps required for convergence
is too large. Therefore, we use the more efficient process called the shifting
QR method. Here, the iterative procedure
Hi = Qi Ri
Hi+1 = Ri Qi

is changed to
Hi − µi I = Qi Ri
Hi+1 = Ri Qi + µi I. (4.36)
This change is called shift because subtracting µi I from Hi shifts the
eigenvalues of the right side by µi as well as the eigenvalues of Ri Qi . Adding
µi I in the second equation in (4.36) shifts the eigenvalues of Hi+1 back
to the original values. However, the shifts accelerate convergence of the
eigenvalues close to µi .
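As a hedged sketch, one common choice of shift is to take µi equal to the current (n, n) entry of Hi (this particular choice is illustrative and is not the only possibility); with it, the shifted iteration (4.36) can be written in MATLAB as:

% Shifted QR iteration on an upper Hessenberg matrix H (sketch).
n = size(H,1);
for i = 1:200                        % illustrative iteration limit
    mu = H(n,n);                     % illustrative shift
    [Qi, Ri] = qr(H - mu*eye(n));    % H_i - mu_i I = Q_i R_i
    H = Ri*Qi + mu*eye(n);           % H_{i+1} = R_i Q_i + mu_i I
end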

4.6.4 Singular Value Decomposition


We have considered two principal methods for the decomposition of the
matrix, QR decomposition and LR decomposition. There is another im-
portant method for matrix decomposition called Singular Value Decompo-
sition (SVD).

Here, we show that every rectangular real matrix A can be decomposed


into a product U DV T of two orthogonal matrices U and V and a general-
ized diagonal matrix D. The construction of U DV T is based on the fact
that for all real matrices A, a matrix AT A is symmetric, and therefore
there exists an orthogonal matrix Q and a diagonal matrix D for which
AT A = QDQT .
As we know, the diagonal entries of D are the eigenvalues of AT A. Now
we show that they are nonnegative in all cases and that their square roots,
called the singular values of A, can be used to construct U DV T .

Singular Values of a Matrix

For any m × n matrix A, an n × n matrix AT A is symmetric and hence


can be orthogonally diagonalized. Not only are the eigenvalues of AT A all
real, they are all nonnegative. To show this, let λ be an eigenvalue of AT A,
with the corresponding unit vector v. Then
0 ≤ ||Av||² = (Av) · (Av) = (Av)T Av = vT AT Av = vT λv = λ(v · v) = λ||v||² = λ.

It therefore makes sense to take (positive) square roots of these eigen-


values.

Definition 4.5 (Singular Values of a Matrix)

If A is an m × n matrix, the singular values of A are the square roots of


the eigenvalues of AT A and are denoted by σ1 , . . . , σn . It is conventional
to arrange the singular values so that σ1 ≥ σ2 ≥ · · · ≥ σn . •

Example 4.24 Find the singular values of


 
1 0 1
A= .
1 1 0

Solution. Since the singular values of A are the square roots of the eigen-
values of AT A, we compute
   
       [ 1  1 ]               [ 2  1  1 ]
AT A = [ 0  1 ] [ 1  0  1 ] = [ 1  1  0 ] .
       [ 1  0 ] [ 1  1  0 ]   [ 1  0  1 ]

The matrix AT A has eigenvalues λ1 = 3, λ2 = 1, and λ3 = 0. Consequently,
the singular values of A are σ1 = √3 = 1.7321, σ2 = √1 = 1, and
σ3 = √0 = 0. •

Note that the singular values of A are not the same as its eigenvalues,
but there is a connection between them if A is a symmetric matrix.
Theorem 4.8 If A = AT is a symmetric matrix, then its singular values
are the absolute values of its nonzero eigenvalues, i.e.,

σi = |λi | > 0. •

Theorem 4.9 The condition number of a nonsingular matrix is the ratio


between its largest singular value σ1 (or dominant singular value) and the
smallest singular value σn , i.e.,
K(A) = σ1 /σn . •
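In MATLAB this ratio can be obtained directly from the singular values, for example:

>> s = svd(A);         % singular values, sorted from largest to smallest
>> K = s(1)/s(end)     % condition number sigma_1/sigma_n (same value as cond(A))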

Singular Value Decomposition

The following are some of the properties that make singular value decom-
positions useful:

1. All real matrices have singular value decompositions.

2. A real square matrix is invertible, if and only if all its singular values
are nonzero.

3. For any m × n real rectangular matrix A, the number of nonzero


singular values of A is equal to the rank of A.

4. If A = U DV T is a singular value decomposition of an invertible


matrix A, then A−1 = V D−1 U T .

5. For positive-definite symmetric matrices, the orthogonal decomposi-


tion QDQT and the singular value decomposition U DV T coincide.

Theorem 4.10 (Singular Value Decomposition Theorem)

Every m × n matrix A can be factored into the product of an m × m matrix


U with orthonormal columns, so U T U = I, the m × n diagonal matrix
D = diag(σ1 , . . . , σr ) that has the singular values of A as its diagonal
entries, and an n × n matrix V with orthonormal rows, so V T V = I, i.e.,

A = U DV T = ( u1  u2  · · ·  ur  ur+1  · · ·  um ) diag(σ1 , σ2 , . . . , σr , 0, . . . , 0) ( v1  v2  · · ·  vr  vr+1  · · ·  vn )T .



Note that the columns of U , u1 , u2 , . . . , ur , are called left singular vec-


tors of A, and the columns of V , v1 , v2 , . . . , vr , are called right singular
vectors of A. The matrices U and V are not uniquely determined by A,
but a matrix D must contain the singular values, σ1 , σ2 , . . . , σr , of A.

To construct the orthogonal matrix V , we must find an orthonormal ba-


sis {v1 , v2 , . . . , vn } for Rn consisting of eigenvectors of an n × n symmetric
matrix AT A. Then
V = [v1 v2 · · · vn ],
is an orthogonal n × n matrix.

For the orthogonal matrix U , we first note that {Av1 , Av2 , . . . , Avn }
is an orthogonal set of vectors in Rm . To see this, suppose that vi is an
eigenvector of AT A corresponding to an eigenvalue λi , then, for i ≠ j, we
have

(Avi ).(Avj ) = (Avi )T Avj = vi T AT Avj = vi T λj vj = λj (vi .vj ) = 0,

since the eigenvectors vi are orthogonal. Now recall that the singular values
satisfy σi = kAvi k and that the first r of these are nonzero. Therefore, we
can normalize Av1 , . . . , Avr by setting
1
ui = Avi , for i = 1, 2, . . . , r.
σi
This guarantees that {u1 , u2 , . . . , ur } is an orthonormal set in Rm , but
if r < m, it will not be a basis for Rm . In this case, we extend the set
{u1 , u2 , . . . , ur } to an orthonormal basis {u1 , u2 , . . . , um } for Rm .

Example 4.25 Find the singular value decomposition of the following ma-
trix:  
1 0 1
A= .
1 1 0
Solution. We compute
   
       [ 1  1 ]               [ 2  1  1 ]
AT A = [ 0  1 ] [ 1  0  1 ] = [ 1  1  0 ] ,
       [ 1  0 ] [ 1  1  0 ]   [ 1  0  1 ]

and find that its eigenvalues are

λ1 = 3, λ2 = 1, λ3 = 0,

with the corresponding eigenvectors


     
2 0 −1
 1 ,  −1  ,  1 .
1 1 1

These vectors are orthogonal, so we normalize them to obtain

v1 = [ 2/√6, 1/√6, 1/√6 ]T ,   v2 = [ 0, −1/√2, 1/√2 ]T ,   v3 = [ −1/√3, 1/√3, 1/√3 ]T .

The singular values of A are


σ1 = √λ1 = √3,   σ2 = √λ2 = √1 = 1,   σ3 = √λ3 = √0 = 0.

Thus,

V = [ 2/√6    0       −1/√3
      1/√6   −1/√2     1/√3          and      D = [ √3  0  0
      1/√6    1/√2     1/√3 ]                       0   1  0 ] .

To find U , we compute

u1 = (1/σ1 ) Av1 = (1/√3) A [ 2/√6, 1/√6, 1/√6 ]T = [ 1/√2, 1/√2 ]T
and

u2 = (1/σ2 ) Av2 = A [ 0, −1/√2, 1/√2 ]T = [ 1/√2, −1/√2 ]T .
2
These vectors already form an orthonormal basis for R2 , so we have
 1 1 
√ √
 2 2 
U = .
 
 1 1 
√ −√
2 2
This yields the SVD

A = U DV T = [ 1/√2    1/√2    [ √3  0  0    [  2/√6    1/√6    1/√6
               1/√2   −1/√2 ]    0   1  0 ]     0      −1/√2    1/√2
                                               −1/√3    1/√3    1/√3 ] .


The MATLAB built-in function svd performs the SVD of a matrix.


Thus, to reproduce the above results using the MATLAB Command Win-
dow, we do the following:

>> A = [1 0 1; 1 1 0];
>> [U, D, V ] = svd(A);

The SVD occurs in many applications. For example, if we can compute


the SVD accurately, then we can solve a linear system very efficiently. Since
we know that the nonzero singular values of A are the square roots of the
nonzero eigenvalues of a matrix AAT , which are the same as the nonzero
eigenvalues of AT A, there are exactly r = rank(A) positive singular values.

Suppose that A is square and has full rank. Then if Ax = b, we have


U DV T x = b
U T U DV T x = U T b
DV T x = U T b
V T x = D−1 U T b
V V T x = V D−1 U T b
x = V D−1 U T b

(since U T U = I and V V T = I by orthogonality).
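For a square full-rank matrix A, this solution procedure can be sketched in MATLAB as:

% Solve Ax = b through the singular value decomposition (sketch).
[U, D, V] = svd(A);
x = V*(D\(U'*b));      % x = V D^(-1) U^T b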
Example 4.26 Find the solution of the linear system Ax = b using SVD,
where    
−4 −6 1
A= and b = .
3 −8 4
Solution. First we have to compute the SVD of A. For this we have to
compute
    
T −4 3 −4 −6 25 0
A A= = .
−6 −8 3 −8 0 100

The characteristic polynomial of AT A is


λ2 − 125λ + 2500 = (λ − 100)(λ − 25) = 0,

and it gives the eigenvalues of AT A:

λ1 = 100 and λ2 = 25.

Corresponding to the eigenvalues λ1 and λ2 , we can have the eigenvectors


   
0 1
and ,
1 0

respectively. These vectors are orthogonal, so we normalize them to obtain


   
0 1
v1 = and v2 = .
1 0

The singular values of A are


p √ p √
σ1 = λ1 = 100 = 10 and σ2 = λ2 = 25 = 5.

Thus,    
0 1 10 0
V = and D= .
1 0 0 5
To find U , we compute
    
1 1 −4 −6 0 −0.6
u1 = Av1 = =
σ1 10 3 −8 1 −0.8

and     
1 1 −4 −6 1 −0.8
u2 = Av2 = = .
σ2 5 3 −8 0 0.6
These vectors already form an orthonormal basis for R2 , so we have
 
−0.6 −0.8
U= .
−0.8 0.6

This yields the SVD


   
−0.6 −0.8 10 0 0 1
A= .
−0.8 0.6 0 5 1 0

Now to find the solution of the given linear system, we solve

x = V D−1 U T b

or
       
[ x1     [ 0  1 ] [ 0.1   0  ] [ −0.6  −0.8 ] [ 1     [  0.32
  x2 ]  = [ 1  0 ] [ 0    0.2 ] [ −0.8   0.6 ] [ 4 ] =   −0.38 ] .

So
x1 = 0.32 and x2 = −0.38,
which is the solution of the given linear system. •

4.7 Summary
We discussed many numerical methods for finding eigenvalues and eigen-
vectors. Many eigenvalue problems do not require computation of all of
the eigenvalues. The power method gives us a mechanism for computing
the dominant eigenvalue along with its associated eigenvector for an arbi-
trary matrix. The convergence rate of the power method is poor when the
two largest eigenvalues in magnitude are nearly equal. The technique of
shifting the matrix by an amount (−µI) can help us to overcome this dis-
advantage, and it can also be used to find intermediate eigenvalues by the
power method. Also, if a matrix A is symmetric, then the power method
gives faster convergence to the dominant eigenvalue and associated eigen-
vector. The inverse power method is used to estimate the least dominant
eigenvalue of a nonsingular matrix. The inverse power method is guar-
anteed to converge if a matrix A is diagonalizable with the single least
dominant nonzero eigenvalue. The inverse power method requires more
computational effort than the power method, because a linear algebraic
system must be solved at each iteration. The LU decomposition method
(Chapter 1) can be used to efficiently accomplish this task. We also dis-
cussed the deflation method to obtain other eigenvalues once the dominant
eigenvalue is known, and the Gerschgorin Circles theorem, which gives a
crude approximation of the location of the eigenvalues of a matrix.

A technique for symmetric matrices, which occurs frequently, is the


Jacobi method. It is an iterative method that uses orthogonal similarity
transformations based on plane rotations to reduce a matrix to a diagonal
form with diagonal elements as the eigenvalues of a matrix. The rotation
matrices are used at the same time to form a matrix whose columns contain
the eigenvectors of the matrix. The disadvantage of this method is that
it may take many rotations to converge to a diagonal form. The rate of
convergence of this method is increased by first preprocessing a matrix by
Given’s method and Householder transformations. These methods use the
orthogonal similarity transformations to convert a given symmetric matrix
to a symmetric tridiagonal matrix.

In the last section we discussed methods that depend on matrix de-


composition. Methods such as the QR method and the LR method can
be applied to a general matrix. To improve the computational efficiency of
these methods, instead of applying the decomposition to an original ma-
trix, an original matrix is first transformed to upper Hessenberg form. We
discussed the singular values of a matrix and the singular value decompo-
sition of a matrix in the last section of this chapter.

4.8 Problems
1. Find the first four iterations of the power method applied to each of
the following matrices:
 
2 3 1
(a)  1 4 −1  , start with x0 = [0, 1, 1]T .
3 1 2
 
5 4 6
(b)  2 2 −3  , start with x0 = [1, 1, 1]T .
3 1 1
 
1 1 1
(c)  −2 2 1  , start with x0 = [1, 1, 0]T .
5 1 1
 
(d)  3  0  0  2
     0  3  0 −1
     1  0  2  2
     0  0  4  2 ,    start with x0 = [1, 0, 0, 0]T .

2. Find the first four iterations of the power method with Euclidean
scaling applied to each of the following matrices:
 
2 1 2
(a)  1 4 −1  , start with x0 = [1, 0, 1]T .
2 −1 2
 
2 4 −1
(b)  4 2 −3  , start with x0 = [1, 0, 0]T .
−1 −3 5
 
3 2 4
(c)  2 3 −1  , start with x0 = [0, 1, 1]T .
4 −1 3
 
(d)  3  1  0  1
     1  4 −1  3
     0 −1  5  1
     1  3  1  2 ,    start with x0 = [1, 0, 1, 1]T .

3. Repeat Problem 2 using the power method with maximum entry


scaling.

4. Repeat Problem 1 using the inverse power method.

5. Find the first four iterations of the following matrices by using the
shifted inverse power method:
 
2 3 3
(a)  1 4 −1  , start with x0 = [0, 1, 1]T , µ = 4.5.
3 1 2
 
1 1 −1
(b)  2 1 −3  , start with x0 = [1, 1, 1]T , µ = 5.
2 −4 1

 
1 1 1
(c)  −2 2 1  , start with x0 = [1, 1, 0]T , µ = 4.
3 3 3
 
(d)  3  0  3  2
     1  3  0 −1
     1  0  2  2
     0  0  0  2 ,    start with x0 = [1, 0, 0, 0]T , µ = 3.5.
6. Find the dominant eigenvalue and corresponding eigenvector by using
the power method, with x(0) = [1, 1, 1]t (only four iterations):
 
3 0 1
A =  2 2 2 .
4 2 5
Also, solve by using the inverse power method by taking the initial
value of the eigenvalue by using the Rayleigh quotient theorem.
7. Find the dominant eigenvalue and corresponding eigenvector of the
matrix A by using the power method. Start with x(0) = [2, 1, 0, −1]T
and ε = 0.0001:

A =   3   1  −2   1
      1   8  −1   0
     −2  −1   3  −1
      1   0  −1   8 .

Also, use the shifted inverse power method with the same x(0) as given
above to find the eigenvalue nearest to µ, which can be calculated by
using the Rayleigh quotient.
8. Find the dominant eigenvalue and corresponding eigenvector by using
the power method, with u0 = [1, 1, 1]t (only four iterations):
 
3 0 1
A =  2 2 2 .
4 2 5
Also, solve by using the inverse power method by taking the initial
value of the eigenvalue by using the Rayleigh quotient.

9. Use the Gerschgorin Circles theorem to determine the bounds for the
eigenvalues of each of the given matrices:
     
3 2 1 1 1 1 2 −2 1
(a)  2 3 0  , (b)  1 1 0  , (c)  −2 1 1 .
1 0 3 1 0 1 1 1 2

10. Consider the matrix


 
1 1 −2
A =  −1 2 1 ,
0 1 −1
which has an eigenvalue 2 with eigenvector [1, 3, 1]T . Use the deflation
method to find the remaining eigenvalues and eigenvectors of A.
11. Consider the matrix
 
2 −3 6
A= 0 3 −4  ,
0 2 −3
which has an eigenvalue 2 with eigenvector [1, 0, 0]T . Use the deflation
method to find the remaining eigenvalues and eigenvectors of A.
12. Consider the matrix
 
8 −2 −3 1
 7 −1 −3 1 
A= ,
 6 −2 −1 1 
5 −2 −3 4
which has an eigenvalue 4 with eigenvector [1, 1, 1, 1]T . Use the de-
flation method to find the remaining eigenvalues and eigenvectors of
A.
13. Find the eigenvalues and corresponding eigenvectors of each of the
following matrices by using the Jacobi method:
     
5 3 7 2 −1.5 0 4 6 7
(a)  3 1 6  , (b)  −1.5 2 −0.5  , (c)  6 5 −3  ,
7 6 2 0 −0.5 2 7 −3 2

   
  4 4 4 1 2 −1 3 2
0.4 0.3 0.1  4 6 1 4 
  −1 3 1 −2 
(d)  0.3 0.5 0.2  , (e) 
 4 , (f )  .
1 6 4   3 1 4 1 
0.1 0.2 0.6
1 4 4 6 2 −2 1 −3
14. Use the Jacobi method to find all the eigenvalues and eigenvectors of
the matrix

A =   5    −2   −0.5   1.5
     −2     5    1.5  −0.5
     −0.5   1.5   5    −2
      1.5  −0.5  −2     5 .
15. Use the Sturm sequence iteration to find the number of eigenvalues
of the following matrices lying in the given intervals (a, b):
   
2 −1 0 5 −1 0
(a)  −1 2 −1  , (−1, 3) (b)  −1 2 2  , (0, 4).
0 −1 2 0 2 3

16. Use the Sturm sequence iteration to find the eigenvalues of the fol-
lowing matrices:
     
(a)  1  4  0      (b)  1  2  0      (c)  1  2  0
     4  1  4           2  2  1           2  1  2
     0  1  1 ,         0  4  4 ,         0  2  1 .

17. Find the eigenvalues and eigenvectors of the given symmetric matrix
A by using the Jacobi method:
 
A =  0.6532   0.2165   0.0031
     0.2165   0.4105   0.0052
     0.0031   0.0052   0.2132 .
Also, use Given’s method to tridiagonalize the above matrix.
18. Use Given’s method to convert the given matrix into tridiagonal form:
 
A =   2  −1   3   2
     −1   3   1  −2
      3   1   4   1
      2  −2   1  −3 .

Also find the characteristic equation by using the Sturm sequence


iteration.

19. Use Given’s method to convert each matrix considered in Problem 9


into tridiagonal form.

20. Use Given’s method to convert each matrix considered in Problem 5


into tridiagonal form and then use the Sturm sequence iteration to
find the eigenvalues of each matrix.

21. Use Householder’s method to convert each matrix considered in Prob-


lem 9 into tridiagonal form.

22. Use Householder’s method to convert each matrix into tridiagonal


form and then use the Sturm sequence iteration to find the eigenval-
ues of each matrix:
 
    4 −2 1 4
2 3 4 5 −2 1  −2 5 0 3 
(a)  3 4 5 , (b)
  −2 7 9  , (c)   1
.
0 6 2 
4 5 6 1 9 8
4 3 2 7

23. Use Householder’s method to place the following matrix in tridiago-


nal form:  
A =  7   1  2   1
     1   8  1  −1
     2   1  3   1
     1  −1  1   2 .
Also, find the characteristic equation.

24. Find the first four QR iterations for each of the given matrices:
     
1 0 2 2 −1 2 −21 −9 12
(a)  −2 1 1  , (b)  3 1 0  , (c)  0 6 0 .
−2 −5 1 0 2 1 −24 −8 15

25. Find the first 15 QR iterations for each of the matrices in Problem
9.

26. Use the QR method to find the eigenvalues of the matrix


 
2 −1 0
A =  −1 −1 −2  .
0 −2 3

27. Find the eigenvalues using the LR method for each of the given ma-
trices:
     
3 1 1 2 1 2 4 0 1
(a)  2 1 1  , (b)  3 1 0  , (c)  −2 1 0  .
1 1 1 1 2 1 −2 0 1

28. Find the eigenvalues using the LR method for each of the given ma-
trices:
     
1 2 4 3 3 3 15 13 20
(a)  5 1 1  , (b)  3 3 3  , (c)  −21 12 15  .
2 1 1 −3 −3 −3 −8 −8 11

29. Transform each of the given matrices into upper Hessenberg form:
     
1 6 4 5 4 3 2 5 2
(a)  5 1 3  , (b)  2 3 3  , (c)  11 6 7  ,
2 4 4 −3 −3 8 9 15 22
   
(d)  2  −1   4   2
     3   2   3   2
     1   2   2   2
     2  −3   4   4 ,

(e)   2   1  −2  −3
      2   2  −3   2
     −3  −3   4   5
      7   8   3   2 ,

(f)  9   2   1  −2
     2   1   1  −5
    −2   1   6  −2
    −2  −1   1  −3 .

30. Transform each of the given matrices into upper Hessenberg form
using Gaussian elimination with pivoting. Then use the QR method
and the LR method to find their eigenvalues:
 
    14 22 2 1
11 33 45 4 3 3  5 1 5 −2 
(a)  12 21 23 , (b)
  2 5 4  , (c)   6
.
1 6 1 
18 22 31 −3 2 1
7 −2 7 4

31. Find the singular values for each of the given matrices:
   
(a)  2  0  1
     0  2  0 ,

(b)   3  0   0
     −2  3  −2
      2  0   5 ,

(c)  1  0  1
     0  1  0
     0  1  2 ,

(d)  4  0  1
     0  1  0
     2  1  1 ,

(e)   2  0   1
     −4  6  −2
      2  0   7 ,

(f)  2  0  1  2
     0  1  1  3
     0  3  2  1
     1  0  3  1 .
32. Show that the singular values of the following matrices are the same
as the eigenvalues of the matrices:
     
4 2 1 3 0 1 2 0 0
(a)  2 8 0  , (b)  0 5 0  , (c)  0 6 0 .
1 0 8 1 0 5 0 0 7

33. Show that all singular values of an orthogonal matrix are 1.


34. Show that if A is a positive-definite matrix, then A has a singular
value decomposition of the form QDQT .
35. Find an SVD for each of the given matrices:
   
(a)   0  −4
     −6   0 ,

(b)  2  1  0
     1  3  0 ,

(c)   1  0
      1  2
     −1  3 ,

(d)  1  2  1
     1  2  1 .

36. Find an SVD for each of the given matrices:


   
(a)   0  −2
     −3   0 ,

(b)  2  0  0
     0  3  0 ,

(c)   1  0
      1  1
     −1  1 ,

(d)  1  1  1
     1  1  1 .

37. Find an SVD for each of the given matrices:


   
(a)   0  −2  1
     −3   0  2
      0   1  1 ,

(b)   2  −4  3
      6   6  3
     −4   2  4 ,

(c)   1  2  0
      1  1  1
     −1  1  2 ,

(d)  8  −3  7  1
     3  11  3  2
     1   2  5  2
     2   0  7  2 ,

(e)   2   1  −2  −13
     11  12  −3   12
      3  22  24   15
      7   8   3    2 ,

(f)  10   1  0  −2
      2   1  1  −5
      4   6  7  −2
      5  −1  1  −3 .

38. Find the solution of each of the following linear systems, Ax = b,


using singular value decomposition:

(a)
     
1 −3 x1 1
A= , x= , b= .
3 −5 x2 2

(b)
     
1 −1 x1 1.1
A= , x= , b= .
1 4 x2 0.5

(c)
   

3 −1 4 x1 1
A =  −1 0 1 , x =  x2  , b =  2 .
4 1 2 x3 3

(d)
     
4 3 2 x1 2.5
A =  1 2 −1  , x =  x2  , b =  1.5  .
1 3 2 x3 0.85

39. Find the solution each of the following linear systems, Ax = b, using
singular value decomposition:

(a)      
2 2 x1 1
A= , x= , b= .
1 3 x2 0.9
(b)      
1 0 x1 1
A= , x= , b= .
3 −2 x2 2
(c)
    
1 −1 0 x1 1
A= 2 0 1 , x =  x2  , b =  1 .
3 0 2 x3 1

(d)    

1 2 3 x1 1
A =  2 1 2 , x =  x2  , b =  0 .
1 1 1 x3 1
Chapter 5

Interpolation and
Approximation

5.1 Introduction
In this chapter we describe the numerical methods for the approximation
of functions other than elementary functions. The main purpose of these
techniques is to replace a complicated function with one that is simpler
and more manageable. We sometimes know the value of a function f (x) at
a set of points (say, x0 < x1 < x2 · · · < xn ), but we do not have an analytic
expression for f (x) that lets us calculate its value at an arbitrary point.
We will concentrate on techniques that may be adapted if, for example, we
have a table of values of functions that may have been obtained from some
physical measurement or some experiments or long numerical calculations
that cannot be cast into a simple functional form. The task now is to
estimate f (x) for an arbitrary point x by, in some sense, drawing a smooth


curve through (and perhaps beyond) the data points xi . If the desired
x is between the largest and smallest of the data points, then the problem
is called interpolation; and if x is outside that range, it is called extrapola-
tion. Here, we shall restrict our attention to interpolation. It is a rational
process generally used in estimating a missing functional value by taking
a weighted average of known functional values at neighboring data points.

The interpolation scheme must model a function, in between or be-


yond the known data point, by some plausible functional form. The form
should be sufficiently general to be able to approximate large classes of
functions that might arise in practice. The functional forms are polynomi-
als, trigonometric functions, rational functions, and exponential functions.
However, we shall restrict our attention to polynomials. The polynomial
functions are widely used in practice, since they are easy to determine,
evaluate, differentiate, and are integrable. Polynomial interpolation pro-
vides some mathematical tools that can be used in developing methods
for approximation theory, numerical differentiation, numerical integration,
and numerical solutions of ordinary differential equations and partial dif-
ferential equations. A set of data points we consider here may be equally or
unequally spaced in the independent variable x. Several procedures can be
used to fit approximating polynomials to either kind of data. For
example, Lagrange interpolatory polynomials, Newton's divided difference
interpolatory polynomials, and Aitken's interpolatory polynomials can be
used for unequally or equally spaced data, and procedures based on dif-
ferences can be used for equally spaced data, including Newton forward and
backward difference polynomials, Gauss forward and backward difference
polynomials, Bessel difference polynomials and Stirling difference polyno-
mials. These methods are quite easy to apply. But here, we discuss only the
Lagrange interpolation method, Newton’s divided differences interpolation
method, and Aitken’s interpolation method. We shall also discuss another
polynomial interpolation known as the Chebyshev polynomial. This type
of polynomial interpolates the given function over the interval [−1, 1].

The other approach to approximate a function is called the least squares


approximation. This approach is suitable if the given data points are exper-
imental data. We shall discuss linear, nonlinear, plane, and trigonometric

least squares approximation of a function. We shall also discuss the least


squares solution of overdetermined and underdetermined linear systems.
At the end of the chapter, we discuss least squares with QR decomposition
and singular value decomposition.

5.2 Polynomial Approximation


The general form of an nth-degree polynomial is

pn (x) = a0 + a1 x + a2 x2 + · · · + an xn , (5.1)

where n denotes the degree of the polynomial and a0 , a1 , . . . , an are con-


stant coefficients. Since there are (n+1) coefficients, (n+1) data points are
required to obtain a unique value for the coefficients. The important prop-
erty of polynomials that makes them suitable for approximating functions
is due to the following Weierstrass Approximation theorem.

Theorem 5.1 (Weierstrass Approximation Theorem)

If f (x) is a continuous function in the closed interval [a, b], then for every
ε > 0 there exists a polynomial pn (x), where the value of n depends on the
value of ε, such that for all x in [a, b],

|f (x) − pn (x)| < ε. (5.2)

Consequently, any continuous function can be approximated to any accu-


racy by a polynomial of high enough degree. •

Suppose we are given a set of (n + 1) data points relating dependent


variables f (x) to an independent variable x as follows:

x x0 x1 · · · xn
.
f (x) f (x0 ) f (x1 ) · · · f (xn )

Generally, the data points x0 , x1 , . . . , xn are arbitrary, and assume the in-
terval between the two adjacent points is not the same, and assume that
the data points are organized in such a way that x0 < x1 < x2 < · · · <
xn−1 < xn .

When the data points in a given functional relationship are not equally
spaced, the interpolation problem becomes more difficult to solve. The
basis for this assertion lies in the fact that the interpolating polynomial
coefficient will depend on the functional values as well as on the data
points given in the table.
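To illustrate this point, the (n + 1) coefficients of (5.1) can be obtained directly by solving the (n + 1) × (n + 1) Vandermonde system built from the data points, whether or not they are equally spaced. The following MATLAB sketch is illustrative only; the data values in it are hypothetical:

% Illustrative sketch: interpolating polynomial coefficients via the
% Vandermonde system (hypothetical, unequally spaced data).
x = [1.0 1.7 2.3 3.1];        % data points
y = [2.0 3.5 1.2 4.4];        % function values
V = vander(x);                % Vandermonde matrix (highest power first)
a = V \ y(:);                 % coefficients of the interpolating cubic
yq = polyval(a, 2.0);         % evaluate the polynomial at x = 2.0

In practice this direct approach can become poorly conditioned as n grows, which is one reason the Lagrange and Newton forms discussed next are preferred.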

5.2.1 Lagrange Interpolating Polynomials


This is one of the most popular and well-known interpolation methods
used to approximate the functions at an arbitrary point x. The Lagrange
interpolation method provides a direct approach for determining interpo-
lated values, regardless of the data points spacing, i.e., it can be fitted to
unequally spaced or equally spaced data. To discuss the Lagrange interpo-
lation method, we start with the simplest form of interpolation, i.e., linear
interpolation. The interpolated value is obtained from the equation of a
straight line that passes through two tabulated values, one on each side
of the required value. This straight line is a first-degree polynomial. The
problem of determining a polynomial of degree one that passes through
the distinct points (x0 , y0 ) and (x1 , y1 ) is the same as approximating the
function f (x) for which f (x0 ) = y0 and f (x1 ) = y1 by means of first degree
polynomial interpolation. Let us consider the construction of a linear poly-
nomial p1 (x) passing through two data points (x0 , f (x0 )) and (x1 , f (x1 )),
as shown in Figure 5.1. Let us consider a linear polynomial of the form
p1 (x) = a0 + a1 x. (5.3)
Since a polynomial of degree one has two coefficients, one might expect
to be able to choose two conditions that satisfy
p1 (xk ) = f (xk ); k = 0, 1.
When p1 (x) passes through point (x0 , f (x0 )), we have
p1 (x0 ) = a0 + a1 x0 = y0 = f (x0 ),

Figure 5.1: Linear Lagrange interpolation.

and if it passes through point (x1 , f (x1 )), we have

p1 (x1 ) = a0 + a1 x1 = y1 = f (x1 ).

Solving the last two equations gives the unique solution


a0 = (x0 y1 − x1 y0)/(x0 − x1);   a1 = (y1 − y0)/(x1 − x0). (5.4)

Putting these values in (5.3), we have


   
p1 (x) = [(x − x1)/(x0 − x1)] y0 + [(x − x0)/(x1 − x0)] y1 ,

which can also be written as

p1 (x) = L0 (x)f (x0 ) + L1 (x)f (x1 ), (5.5)

where
L0 (x) = (x − x1)/(x0 − x1)   and   L1 (x) = (x − x0)/(x1 − x0). (5.6)

Figure 5.2: General Lagrange interpolation.

Note that when x = x0 , then L0 (x0 ) = 1 and L1 (x0 ) = 0. Similarly,


when x = x1 , then L0 (x1 ) = 0 and L1 (x1 ) = 1. The polynomial (5.5) is
known as the linear Lagrange interpolating polynomial and (5.6) are the
Lagrange coefficient polynomials. To generalize the concept of linear inter-
polation, consider the construction of a polynomial pn (x) of degree at most
n that passes through (n + 1) distinct points (x0 , f (x0 )), . . . , (xn , f (xn ))
(Figure 5.2) and satisfies the interpolation conditions

pn (xk ) = f (xk ); k = 0, 1, 2, . . . , n. (5.7)

Assume that there exists polynomial Lk (x) (k = 0, 1, 2, . . . , n) of degree


n having the property

Lk (xj ) = 0, for k ≠ j,   and   Lk (xj ) = 1, for k = j, (5.8)

and

Σ_{k=0}^{n} Lk (x) = 1. (5.9)

The polynomial pn (x) is given by

pn (x) = L0 (x)f (x0 ) + L1 (x)f (x1 ) + · · · + Li−1 (x)f (xi−1 )
         + Li (x)f (xi ) + · · · + Ln (x)f (xn )
       = Σ_{k=0}^{n} Lk (x)f (xk ). (5.10)

It is clearly a polynomial of degree at most n and satisfies the conditions


(5.7) since

pn (xi ) = L0 (xi )f (x0 ) + L1 (xi )f (x1 ) + · · · + Li−1 (xi )f (xi−1 )


+ Li (xi )f (xi ) + · · · + Ln (xi )f (xn ),

which implies that


pn (xi ) = f (xi ).
It remains to be shown how the polynomial Li (x) can be constructed
so that it satisfies (5.8). If Li (x) is to satisfy (5.8), then it must contain a
factor

(x − x0 )(x − x1 ) · · · (x − xi−1 )(x − xi+1 ) · · · (x − xn ). (5.11)

Since this expression has exactly n terms and Li (x) is a polynomial of


degree n, we can deduce that

Li (x) = Ai (x − x0 )(x − x1 ) · · · (x − xi−1 )(x − xi+1 ) · · · (x − xn ), (5.12)

for some multiplicative constant Ai . Let x = xi , then the value of Ai is


chosen so that

Ai = 1/(xi − x0 )(xi − x1 ) · · · (xi − xi−1 )(xi − xi+1 ) · · · (xi − xn ), (5.13)

where none of the terms in the denominator can be zero, from the assump-
tion of distinct points. Hence,
Li (x) = ∏_{k=0}^{n} (x − xk)/(xi − xk),   i ≠ k. (5.14)

The interpolating polynomial can now be readily evaluated by substituting


(5.14) into (5.10) to give
f (x) ≈ pn (x) = Σ_{i=0}^{n} [ ∏_{k=0}^{n} (x − xk)/(xi − xk) ] f (xi ),   i ≠ k. (5.15)

This formula is called the Lagrange interpolation formula of degree n and


the terms in (5.14) are called the Lagrange coefficient polynomials.

To show the uniqueness of the interpolating polynomial pn (x), we sup-


pose that in addition to the polynomial pn (x) the interpolation problem
has another solution qn (x) of degree ≤ n, whose graph passes through
(xi , yi ), i = 0, 1, . . . , n. Then define
rn (x) = pn (x) − qn (x)
of a degree not greater than n. Since
rn (xi ) = pn (xi ) − qn (xi ) = f (xi ) − f (xi ) = 0,
the polynomial rn (x) vanishes at n + 1 points. But by using the following
well-known result from the theory of equations:

“If a polynomial of degree n vanishes at n + 1 distinct points, then the


polynomial is identically zero.”

Hence, rn (x) vanishes identically, or equivalently, pn (x) = qn (x).


Example 5.1 Let f (x) be defined at the three numbers −h, 0, h, where
h ≠ 0. Use the Lagrange interpolating polynomial to construct the polyno-
mial p(x), which interpolates f (x) at the given numbers. Then show that
this polynomial can be written in the following form:
p(x) = (1/(2h²)) [f (−h) − 2f (0) + f (h)]x² + (1/(2h)) [f (h) − f (−h)]x + f (0).
Solution. Given three distinct points x0 = −h, x1 = 0, and x2 = h and
using the quadratic Lagrange interpolating polynomial as
p2 (x) = [(x − x1)(x − x2)/((x0 − x1)(x0 − x2))] f (x0 ) + [(x − x0)(x − x2)/((x1 − x0)(x1 − x2))] f (x1 ) + [(x − x0)(x − x1)/((x2 − x0)(x2 − x1))] f (x2 ),

at these data points, we get

p2 (x) = [(x − 0)(x − h)/((−h − 0)(−h − h))] f (−h) + [(x + h)(x − h)/((0 + h)(0 − h))] f (0) + [(x + h)(x − 0)/((h + h)(h − 0))] f (h)

or

p2 (x) = [(x² − xh)/((−h)(−2h))] f (−h) + [(x² − h²)/((h)(−h))] f (0) + [(x² + xh)/((2h)(h))] f (h).

Separating the coefficients of x², x, and the constant term, we get

p2 (x) = [f (−h)/(2h²) + f (0)/(−h²) + f (h)/(2h²)] x² + [−h f (−h)/(2h²) + h f (h)/(2h²)] x + [−h² f (0)/(−h²)].

Simplifying, we obtain

p(x) = (1/(2h²)) [f (−h) − 2f (0) + f (h)]x² + (1/(2h)) [f (h) − f (−h)]x + f (0).

Example 5.2 Let p2 (x) be the quadratic Lagrange interpolating polyno-


mial for the data: (1, 2), (2, 3), (3, α). Find the value of α if the constant
term in p2 (x) is 5. Also, find the approximation of f (2.5).

Solution. Consider the quadratic Lagrange interpolating polynomial as

p2 (x) = L0 (x)f (x0 ) + L1 (x)f (x1 ) + L2 (x)f (x2 ),

and using the given data points, we get

p2 (x) = L0 (x)(2) + L1 (x)(3) + L2 (x)(α),



where the Lagrange coefficients can be calculated as follows:


L0 (x) = (x − x1)(x − x2) / ((x0 − x1)(x0 − x2)) = (x − 2)(x − 3) / ((1 − 2)(1 − 3)) = (1/2)(x² − 5x + 6),

L1 (x) = (x − x0)(x − x2) / ((x1 − x0)(x1 − x2)) = (x − 1)(x − 3) / ((2 − 1)(2 − 3)) = −(x² − 4x + 3),

L2 (x) = (x − x0)(x − x1) / ((x2 − x0)(x2 − x1)) = (x − 1)(x − 2) / ((3 − 1)(3 − 2)) = (1/2)(x² − 3x + 2).
Thus,
p2 (x) = (1/2)(x² − 5x + 6)(2) − (x² − 4x + 3)(3) + (1/2)(x² − 3x + 2)(α).
Separating the coefficients of x2 , x, and a constant term, we get
p2 (x) = (1 − 3 + α/2)x² + (−5 + 12 − 3α/2)x + (6 − 9 + α)

or

p2 (x) = (−2 + α/2)x² + (7 − 3α/2)x + (−3 + α).
Since the given value of the constant term is 5, using this, we get
(−3 + α) = 5, gives α = 8.
Now using this value of α and the interpolating point x = 2.5, we get

p2 (2.5) = (−2 + 8/2)(2.5)² + (7 − 24/2)(2.5) + (−3 + 8),
and it gives
f (2.5) ≈ p2 (2.5) = 12.50 − 12.50 + 5 = 5. •

Example 5.3 Let f (x) = x + 1/x, with points x0 = 1, x1 = 1.5, x2 = 2.5,


and x3 = 3. Find the quadratic Lagrange polynomial for the approximation
of f (2.7). Also, find the relative error.

Solution. Consider the quadratic Lagrange interpolating polynomial as


p2 (x) = L0 (x)f (x0 ) + L1 (x)f (x1 ) + L2 (x)f (x2 ),
where the Lagrange coefficients are as follows:
L0 (x) = (x − x1)(x − x2) / ((x0 − x1)(x0 − x2)),

L1 (x) = (x − x0)(x − x2) / ((x1 − x0)(x1 − x2)),

L2 (x) = (x − x0)(x − x1) / ((x2 − x0)(x2 − x1)).
Since the given interpolating point is x = 2.7, the best three points for the
quadratic polynomial should be
x0 = 1.5, x1 = 2.5, x2 = 3,
and the function values at these points are
f (x0 ) = 2.167, f (x1 ) = 2.9, f (x2 ) = 3.333.
So using these values, we have
p2 (x) = 2.167L0 (x) + 2.9L1 (x) + 3.333L2 (x),
where
L0 (x) = (x − 2.5)(x − 3) / ((1.5 − 2.5)(1.5 − 3)) = (1/1.5)(x² − 5.5x + 7.5),

L1 (x) = (x − 1.5)(x − 3) / ((2.5 − 1.5)(2.5 − 3)) = (1/(−0.5))(x² − 4.5x + 4.5),

L2 (x) = (x − 1.5)(x − 2.5) / ((3 − 1.5)(3 − 2.5)) = (1/0.75)(x² − 4x + 3.75).

Figure 5.3: Quadratic approximation of a function.

Thus,
p2 (x) = (2.167/1.5)(x² − 5.5x + 7.5) + (2.9/(−0.5))(x² − 4.5x + 4.5) + (3.333/0.75)(x² − 4x + 3.75),
and simplifying, we get
p2 (x) = 0.0889x2 + 0.3776x + 1.4003,
which is the required quadratic polynomial. At x = 2.7, we have
f (2.7) ≈ p2 (2.7) = 3.0679.
The relative error is
|f (2.7) − p2 (2.7)| / |f (2.7)| = |3.0704 − 3.0679| / |3.0704| = 0.0008. •
Note that the sum of the Lagrange coefficients is equal to 1 as it should
be:
L0 (2.7) + L1 (2.7) + L2 (2.7) = −0.0400 + 0.7200 + 0.3200 = 1.
Using MATLAB commands, the above results can be reproduced as
follows:

>> x = [1.5 2.5 3];


>> y = x + 1./x;
>> x0 = 2.7;
>> sol = lint(x, y, x0);

Program 5.1
MATLAB m-file for the Lagrange Interpolation Method
function fi = lint(x, y, x0)
% Evaluates the Lagrange interpolating polynomial through the data
% points (x(i), y(i)) at the single point x0.
dxi = x0 - x; m = length(x); L = zeros(size(y));
L(1) = prod(dxi(2:m)) / prod(x(1) - x(2:m));
L(m) = prod(dxi(1:m-1)) / prod(x(m) - x(1:m-1));
for j = 2:m-1
    num = prod(dxi(1:j-1)) * prod(dxi(j+1:m));
    den = prod(x(j) - x(1:j-1)) * prod(x(j) - x(j+1:m));
    L(j) = num / den;
end
fi = sum(y .* L);
Example 5.4 Using the cubic Lagrange interpolation formula
p(x) = α1 f (0) + α2 f (0.2) + α3 f (0.4) + α4 f (0.6),
for the approximation of f (0.5), show that
α1 + α2 + α3 + α4 = 1.
Solution. Consider the cubic Lagrange interpolating polynomial
p3 (x) = α1 f (x0 ) + α2 f (x1 ) + α3 f (x2 ) + α4 f (x3 ),
where the values of α1 , α2 , α3 , α4 can be defined as follows:
α1 = (x − x1)(x − x2)(x − x3) / ((x0 − x1)(x0 − x2)(x0 − x3)),

α2 = (x − x0)(x − x2)(x − x3) / ((x1 − x0)(x1 − x2)(x1 − x3)),

α3 = (x − x0)(x − x1)(x − x3) / ((x2 − x0)(x2 − x1)(x2 − x3)),

α4 = (x − x0)(x − x1)(x − x2) / ((x3 − x0)(x3 − x1)(x3 − x2)).

Using the given values as x0 = 0, x1 = 0.2, x2 = 0.4, x3 = 0.6, and the


interpolating point x = 0.5, we obtain
α1 = (0.5 − 0.2)(0.5 − 0.4)(0.5 − 0.6) / ((0 − 0.2)(0 − 0.4)(0 − 0.6)) = 0.0625,

α2 = (0.5 − 0)(0.5 − 0.4)(0.5 − 0.6) / ((0.2 − 0)(0.2 − 0.4)(0.2 − 0.6)) = −0.3125,

α3 = (0.5 − 0)(0.5 − 0.2)(0.5 − 0.6) / ((0.4 − 0)(0.4 − 0.2)(0.4 − 0.6)) = 0.9375,

α4 = (0.5 − 0)(0.5 − 0.2)(0.5 − 0.4) / ((0.6 − 0)(0.6 − 0.2)(0.6 − 0.4)) = 0.3125.
Thus,
α1 + α2 + α3 + α4 = 0.0625 − 0.3125 + 0.9375 + 0.3125 = 1. •
Error Formula

As with any numerical technique, it is important to obtain bounds for


the errors involved. Now we discuss the error term when the Lagrange
polynomial is used to approximate the continuous function f (x). It is
similar to the error term for the well-known Taylor polynomial, except that
the factor (x − x0 )n+1 is replaced with the product (x − x0 )(x − x1 ) · · · (x −
xn ). This is expected because interpolation is exact at each of the (n + 1)
data points xk , where we have
f (xk ) − pn (xk ) = yk − yk = 0, for k = 0, 1, 2, . . . , n. (5.16)
Theorem 5.2 (Error Formula of the Lagrange Polynomial)

If f (x) has (n + 1) derivatives on interval I, and if it is approximated by a


polynomial pn (x) passing through (n + 1) data points on I, then the error
En is given by
En = f (x) − pn (x) = [f (n+1) (η(x)) / (n + 1)!] (x − x0 )(x − x1 ) · · · (x − xn ),   η(x) ∈ I, (5.17)

where pn (x) is the Lagrange interpolating polynomial (5.10) and η(x) is an
unknown point in (x0 , xn ). •
The error formula (5.17) is an important theoretical result because
Lagrange polynomials are used extensively for deriving numerical differ-
entiation and integration methods. Error bounds for these techniques are
obtained from the Lagrange error formula.
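As a numerical illustration (not from the text), such a bound can also be estimated by sampling the (n + 1)st derivative on a fine grid. The sketch below does this for f(x) = 2 cos x with the two nodes x0 = 0 and x1 = π, the setting of Example 5.5 that follows:

% Illustrative numerical estimate of the linear Lagrange error bound
% for f(x) = 2*cos(x) with nodes x0 = 0 and x1 = pi.
f2 = @(x) -2*cos(x);             % second derivative of f(x) = 2*cos(x)
xs = linspace(0, pi, 1001);      % fine grid on [x0, x1]
M  = max(abs(f2(xs)));           % bound on |f''| over the interval
w  = abs((xs - 0).*(xs - pi));   % |(x - x0)(x - x1)| on the grid
bound = M/2 * max(w);            % |f - p1| <= (M/2) max|(x-x0)(x-x1)|
disp(bound)                      % approximately pi^2/4 = 2.4674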
Example 5.5 Find the linear Lagrange polynomial that passes through the
points (0, f (0)) and (π, f (π)) to approximate the function f (x) = 2 cos x.
Also, find a bound for the error in the linear interpolation of f (x).

Solution. Given x0 = 0 and x1 = π, then the linear Lagrange polynomial

p1 (x) = [(x − x1)/(x0 − x1)] f (x0 ) + [(x − x0)/(x1 − x0)] f (x1 )

interpolating f (x) at these points is

p1 (x) = [(x − π)/(0 − π)] f (0) + [(x − 0)/(π − 0)] f (π).

By using the function values at the given data points, we get

f (x) ≈ p1 (x) = [(x − π)/(0 − π)](2) + [(x − 0)/(π − 0)](−2) = 2 − 4x/π.
To compute a bound on the error in the linear interpolation of f (x), we use
the linear Lagrange error formula (5.17),

f (x) − p1 (x) = [(x − x0 )(x − x1 )/2!] f ′′(η(x)),

where η(x) is an unknown point between x0 = 0 and x1 = π. Hence,

|f (x) − p1 (x)| = [|(x − 0)(x − π)|/2] |f ′′(η(x))|.

The value of f ′′(η(x)) cannot be computed exactly because η(x) is not
known. But we can bound the error by computing the largest possible value
for |f ′′(η(x))|. The bound on |f ′′(x)| over [0, π] can be obtained as

M = max_{0≤x≤π} |f ′′(x)| = max_{0≤x≤π} |−2 cos x| = 2,

and so for |f ′′(η(x))| ≤ M , we have

|f (x) − p1 (x)| ≤ (2/2) |(x − 0)(x − π)| = |(x − 0)(x − π)|.
Since the function |(x − 0)(x − π)| attains its maximum in [0, π] at
x = (0 + π)/2, the maximum value is (π − 0)²/4.

This follows easily by noting that the function (x − 0)(x − π) is a


quadratic and has two roots 0 and π. Hence, its maximum value occurs
midway between these roots. Thus, for any x ∈ [0, π], we have

|f (x) − p1 (x)| ≤ (π − 0)²/4 = π²/4,
which is the required bound of error in the linear interpolation of f (x). •

Example 5.6 Use the best Lagrange interpolating polynomial to find the
approximation of f (1.5), if f (−2) = 2, f (−1) = 1.5, f (1) = 3.5, and
f (2) = 5. Estimate the error bound if the maximum value of |f (4) (x)| is
0.025 in the interval [−2, 2].

Solution. Since the given number of points, x0 = −2, x1 = −1, x2 =


1, x3 = 2, are four, the best Lagrange interpolating polynomial to find the
approximation of f (1.5) will be the cubic. The cubic Lagrange interpolating
polynomial for the approximation of the given function is

f (x) ≈ p3 (x) = L0 (x)f (x0 ) + L1 (x)f (x1 ) + L2 (x)f (x2 ) + L3 (x)f (x3 ),

and taking f (−2) = 2, f (−1) = 1.5, f (1) = 3.5, f (2) = 5, and the interpo-
lating point x = 1.5, we have

f (1.5) ≈ p3 (1.5) = L0 (1.5)f (−2)+L1 (1.5)f (−1)+L2 (1.5)f (1)+L3 (1.5)f (2)

or

f (1.5) ≈ p3 (1.5) = 2L0 (1.5) + 1.5L1 (1.5) + 3.5L2 (1.5) + 5L3 (1.5).

The Lagrange coefficients can be calculated as follows:


L0 (1.5) = (1.5 + 1)(1.5 − 1)(1.5 − 2) / ((−2 + 1)(−2 − 1)(−2 − 2)) = 0.0521,

L1 (1.5) = (1.5 + 2)(1.5 − 1)(1.5 − 2) / ((−1 + 2)(−1 − 1)(−1 − 2)) = −0.1458,

L2 (1.5) = (1.5 + 2)(1.5 + 1)(1.5 − 2) / ((1 + 2)(1 + 1)(1 − 2)) = 0.7292,

L3 (1.5) = (1.5 + 2)(1.5 + 1)(1.5 − 1) / ((2 + 2)(2 + 1)(2 − 1)) = 0.3646.
Putting these values of the Lagrange coefficients in the above equation, we
get
f (1.5) ≈ p3 (1.5) = 2(0.0521)+1.5(−0.1458)+3.5(0.7292)+5(0.3646) = 4.2607,
which is the required cubic interpolating polynomial approximation of the
function at the given point x = 1.5.

To compute an error bound for the approximation of the given function


in the interval [−2, 2], we use the following cubic error formula:
|f (x) − p3 (x)| = [|f (4) (η(x))| / 4!] |(x − x0 )(x − x1 )(x − x2 )(x − x3 )|.
Since |f (4) (η(x))| ≤ M = max_{−2≤x≤2} |f (4) (x)| = 0.025,

|f (1.5) − p3 (1.5)| ≤ (M/4!) |(1.5 + 2)(1.5 + 1)(1.5 − 1)(1.5 − 2)|,
which gives
|f (1.5) − p3 (1.5)| ≤ (0.025)(2.1875)/24 = 0.0023,
the desired error bound. •
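As a quick numerical check, the lint function of Program 5.1 given earlier reproduces this value:

% Checking Example 5.6 with the lint function of Program 5.1.
x = [-2 -1 1 2];
y = [ 2 1.5 3.5 5];
p = lint(x, y, 1.5);    % cubic Lagrange value at x = 1.5
disp(p)                 % approximately 4.2607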

Example 5.7 Determine the spacing h in a table of equally spaced values


of the function f (x) = ex between the smallest point a = 1 and the largest
point b = 2, so that interpolation with a second-degree polynomial in this
table will yield the desired accuracy.

Solution. Suppose that the given table contains the function values f (xi ),
for the points xi = 1 + ih, i = 0, 1, . . . , n, where n = (2 − 1)/h. If x ∈
[xi−1 , xi+1 ], then we approximate the function f (x) by the degree 2 polynomial
p2 (x), which interpolates f (x) at xi−1 , xi , xi+1 . Then the error formula
(5.17) for these data points becomes

|f (x) − p2 (x)| = |(x − xi−1 )(x − xi )(x − xi+1 )| |f ′′′(η(x))| / 3!,
where η(x) ∈ (xi−1 , xi+1 ). Since the point η(x) is unknown, we cannot
estimate f ′′′(η(x)), so we let

|f ′′′(η(x))| ≤ M = max_{1≤x≤2} |f ′′′(x)|.

Then

|f (x) − p2 (x)| ≤ (M/6) |(x − xi−1 )(x − xi )(x − xi+1 )|.

Since f (x) = eˣ and f ′′′(x) = eˣ,

|f ′′′(η(x))| ≤ M = e² = 7.3891.
Now to find the maximum value of |(x − xi−1 )(x − xi )(x − xi+1 )|, we
have
max_{x∈[xi−1 ,xi+1 ]} |(x − xi−1 )(x − xi )(x − xi+1 )| = max_{t∈[−h,h]} |(t − h) t (t + h)| = max_{t∈[−h,h]} |t(t² − h²)|,

using the linear change of variables t = x − xi . As we can see, the function
H(t) = t³ − th² vanishes at t = −h and t = h, so the maximum value of
|H(t)| on [−h, h] must occur at one of the extremes of H(t), which can be
found by solving the equation

H ′(t) = 3t² − h² = 0,   which gives   t = ±h/√3.

Hence,

max_{x∈[xi−1 ,xi+1 ]} |(x − xi−1 )(x − xi )(x − xi+1 )| = 2h³/(3√3).
Thus, for any x ∈ [1, 2], we have

|f (x) − p2 (x)| ≤ (2h³/(3√3)) e² / 6 = h³ e²/(9√3),

if p2 (x) is chosen as the polynomial of degree 2, which interpolates f (x) =
eˣ at the three tabular points nearest x. If we wish to obtain six decimal
place accuracy this way, we would have to choose h so that

h³ e²/(9√3) < 5 × 10⁻⁷,

which implies that h³ < 10.5483 × 10⁻⁷ and gives h = 0.01. •
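This choice of h can be verified numerically. The following sketch (illustrative, not from the text) evaluates the theoretical bound for h = 0.01 and compares it with the actual quadratic interpolation error on one subinterval, again using the lint function of Program 5.1:

% Illustrative check of Example 5.7 for f(x) = exp(x) with h = 0.01.
h = 0.01;
bound = h^3 * exp(2) / (9*sqrt(3));     % theoretical error bound
xi = [1.49 1.50 1.51];                  % three adjacent tabular points
yi = exp(xi);
xs = linspace(xi(1), xi(3), 201);
err = 0;
for k = 1:length(xs)
    err = max(err, abs(exp(xs(k)) - lint(xi, yi, xs(k))));
end
fprintf('bound = %.2e, observed error = %.2e\n', bound, err)

The observed error stays below the bound, which is itself below 5 × 10⁻⁷, so six decimal place accuracy is attained with this spacing.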
While the Lagrange interpolation formula is at the heart of polynomial
interpolation, it is not, by any stretch of the imagination, the most prac-
tical way to use it. Just consider for a moment that if we had to add an
additional data point in the previous Example 5.6, in order to find the
cubic polynomial p3 (x), we would have to repeat the whole process again
because we cannot use the solution of the quadratic polynomial p2 (x) in
the construction of the cubic polynomial p3 (x). Therefore, one can note
that the Lagrange method is not particularly efficient for large values of
n, the degree of the polynomial. When n is large and the data for x is
ordered, some improvement in efficiency can be obtained by considering
only the data pairs in the vicinity of the x values for which f (x) is sought.

One will be quickly convinced that there must be better techniques


available. In the following, we discuss some of the more practical ap-
proaches to polynomial interpolation. They are Newton’s, Aitken’s, and
Chebyshev’s interpolation formulas. In using the first two schemes, the
construction of the difference table plays an important role. It must be
noted that in using the Lagrange interpolation scheme there was no need
to construct a difference table.

5.2.2 Newton’s General Interpolating Formula


We noted in the previous section that for a small number of data points
one can easily use the Lagrange formula for the interpolating polynomial.
However, for a large number of data points there will be many multipli-
cations and, more significantly, whenever a new data point is added to an
existing set, the interpolating polynomial has to be completely recalcu-
lated. Here, we describe an efficient way of organizing the calculations to
overcome these disadvantages.

Let us consider the nth-degree polynomial pn (x) that agrees with the
function f (x) at the distinct numbers x0 , x1 , . . . , xn . The divided differ-
ences of f (x) with respect to x0 , x1 , . . . , xn are derived to express pn (x) in
the form

pn (x) = a0 + a1 (x − x0 ) + a2 (x − x0 )(x − x1 ) + · · ·
+ an (x − x0 )(x − x1 ) · · · (x − xn−1 ), (5.18)

for appropriate constants a0 , a1 , . . . , an .

Now determine the constants first by evaluating pn (x) at x0 , and we


have
pn (x0 ) = a0 = f (x0 ). (5.19)
Similarly, when pn (x) is evaluated at x1 , then

pn (x1 ) = a0 + a1 (x1 − x0 ) = f (x1 ),

which implies that


a1 = (f (x1 ) − f (x0 ))/(x1 − x0 ). (5.20)
Now we express the interpolating polynomial in terms of divided dif-
ferences.

Divided Differences

First, we define the zeroth divided difference at the point xi by


f [xi ] = f (xi ), (5.21)
which is simply the value of the function f (x) at xi .

The first-order or first divided difference at the points xi and xi+1 can
be defined by
f [xi , xi+1 ] = (f [xi+1 ] − f [xi ])/(xi+1 − xi ) = (f (xi+1 ) − f (xi ))/(xi+1 − xi ). (5.22)
In general, the nth divided difference f [xi , xi+1 , . . . , xi+n ] is defined by
f [xi , xi+1 , . . . , xi+n ] = (f [xi+1 , xi+2 , . . . , xi+n ] − f [xi , xi+1 , . . . , xi+n−1 ])/(xi+n − xi ). (5.23)
By using this definition, (5.19) and (5.20) can be written as
a0 = f [x0 ]; a1 = f [x0 , x1 ],
respectively. Similarly, one can have the values of other constants involved
in (5.18) such as
a2 = f [x0 , x1 , x2 ]
a3 = f [x0 , x1 , x2 , x3 ]
··· = ···
··· = ···
an = f [x0 , x1 , . . . , xn ].
Putting the values of these constants in (5.18), we get
pn (x) = f [x0 ] + f [x0 , x1 ](x − x0 ) + f [x0 , x1 , x2 ](x − x0 )(x − x1 )
+ · · · + f [x0 , x1 , . . . , xn ](x − x0 )(x − x1 ) · · · (x − xn−1 ), (5.24)
which can also be written as
pn (x) = f [x0 ] + Σ_{k=1}^{n} f [x0 , x1 , . . . , xk ](x − x0 )(x − x1 ) · · · (x − xk−1 ). (5.25)

This type of polynomial is known as Newton’s interpolatory divided dif-


ference polynomial. Table 5.1 shows the divided differences for a function
f (x). One can easily show that (5.25) is simply a rearrangement of the La-

Table 5.1: Divided difference table for a function y = f (x).


Zeroth First Second Third
Divided Divided Divided Divided
k xk Difference Difference Difference Difference
0 x0 f [x0 ]
1 x1 f [x1 ] f [x0 , x1 ]
2 x2 f [x2 ] f [x1 , x2 ] f [x0 , x1 , x2 ]
3 x3 f [x3 ] f [x2 , x3 ] f [x1 , x2 , x3 ] f [x0 , x1 , x2 , x3 ]

grange form defined by (5.10). For example, the Newton divided difference
interpolation polynomial of degree one is

p1 (x) = f [x0 ] + f [x0 , x1 ](x − x0 ),

which implies that


 
f (x1 ) − f (x0 )
p1 (x) = f (x0 ) + (x − x0 )
x1 − x 0

(x1 − x0 )f (x0 ) + (x − x0 )f (x1 ) − f (x0 )(x − x0 )


=
x1 − x0
   
x − x1 x − x0
= f (x0 ) + f (x1 ),
x0 − x1 x1 − x0

which is the Lagrange interpolating polynomial of degree one. Similarly,


one can show the equivalent for the nth-degree polynomial. •

Example 5.8 Construct the fourth divided differences table for the func-
tion f (x) = 4x⁴ + 3x³ + 2x² + 10 using the values x = 3, 4, 5, 6, 7, 8.

Solution. The results are listed in Table 5.2.

From the results in Table 5.2, one can note that the nth divided difference
for an nth-degree polynomial is always constant and the (n+1)th divided
difference is always zero. •

Using the following MATLAB commands one can construct Table 5.2:
>> x = [3 4 5 6 7 8];
>> y = 4*x.^4 + 3*x.^3 + 2*x.^2 + 10;
>> D = divdiff(x, y);

Table 5.2: Divided differences table for f (x) = 4x⁴ + 3x³ + 2x² + 10 at the given points.

Zeroth First Second Third Fourth Fifth


Divided Divided Divided Divided Divided Divided
k xk Difference Difference Difference Difference Difference Difference
0 3 433
1 4 1258 825
2 5 2935 1677 426
3 6 5914 2979 651 75
4 7 10741 4827 924 91 4
5 8 18058 7317 1245 107 4 0

Example 5.9 Write Newton’s interpolating polynomials in the form a +


bx + cx2 and show that a + b + c = 2 by using the following data points:
x 0 1 3
.
f (x) 1 2 3
Solution. First, we construct the divided differences table for the given
data points. The result of the divided differences is listed in Table 5.3.
Since Newton’s interpolating polynomial of degree 2 can be written as

p2 (x) = f [x0 ] + f [x0 , x1 ](x − x0 ) + f [x0 , x1 , x2 ](x − x0 )(x − x1 ),



Table 5.3: Divided differences table for Example 5.9.


 k   xk   Zeroth Divided   First Divided   Second Divided
          Difference       Difference      Difference
 0   0    1
 1   1    2                1
 2   3    3                1/2             −1/6

by using Table 5.3, we have


 
p2 (x) = 1 + (1)(x − 0) + (−1/6)(x − 0)(x − 1),
which gives
p2 (x) = (6 + 7x − x²)/6,
and from it, we have
a + b + c = 6/6 + 7/6 − 1/6 = 2.

Example 5.10 Show that Newton’s interpolating polynomial p2 (x) of de-


gree 2 satisfies the interpolation conditions

p2 (xi ) = f (xi ), i = 0, 1, 2.

Solution. Since Newton’s interpolating polynomial of degree 2 is

p2 (x) = f [x0 ] + f [x0 , x1 ](x − x0 ) + f [x0 , x1 , x2 ](x − x0 )(x − x1 ),

first, taking x = x0 , we have

p2 (x0 ) = f [x0 ] + 0 + 0 = f (x0 ).

Now taking x = x1 , we have


p2 (x1 ) = f [x0 ] + f [x0 , x1 ](x1 − x0 ) + 0 = f (x0 ) + [(f (x1 ) − f (x0 ))/(x1 − x0 )] (x1 − x0 ),

and it gives
p2 (x1 ) = f (x0 ) + f (x1 ) − f (x0 ) = f (x1 ).
Finally, taking x = x2 , we have

p2 (x2 ) = f [x0 ] + f [x0 , x1 ](x2 − x0 ) + f [x0 , x1 , x2 ](x2 − x0 )(x2 − x1 ),

which can be written as


p2 (x2 ) = f [x0 ] + f [x0 , x1 ](x2 − x0 ) + [(f [x1 , x2 ] − f [x0 , x1 ])/(x2 − x0 )] (x2 − x0 )(x2 − x1 ),
which gives

p2 (x2 ) = f [x0 ] + f [x0 , x1 ](x1 − x0 ) + f [x1 , x2 ](x2 − x1 ).

From (5.22), we have

p2 (x2 ) = f [x0 ] + [(f (x1 ) − f (x0 ))/(x1 − x0 )] (x1 − x0 ) + [(f (x2 ) − f (x1 ))/(x2 − x1 )] (x2 − x1 ),
which gives
p2 (x2 ) = f (x0 ) + f (x1 ) − f (x0 ) + f (x2 ) − f (x1 ) = f (x2 ). •

Program 5.2
MATLAB m-file for the Divided Differences
function D = divdiff(x, y)
% Construct the divided difference table: column j holds the
% (j-1)th divided differences of the data (x(i), y(i)).
m = length(x); D = zeros(m, m); D(:, 1) = y(:);
for j = 2:m
    for i = j:m
        D(i, j) = (D(i, j-1) - D(i-1, j-1)) / (x(i) - x(i-j+1));
    end
end

The main advantage of the Newton divided difference form over the La-
grange form is that polynomial pn (x) can be calculated from polynomial
pn−1 (x) by adding just one extra term, since it follows from (5.25) that

pn (x) = pn−1 (x) + f [x0 , x1 , . . . , xn ](x − x0 )(x − x1 ) · · · (x − xn−1 ). (5.26)
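A minimal MATLAB sketch of (5.26), reusing the divdiff function of Program 5.2 with the data of Example 5.11 that follows (f(x) = ln(x + 2) at x = 0, 1, 2, 3), might look as follows; it is illustrative rather than one of the book's listings:

% Illustrative use of (5.26): adding one term upgrades p2 to p3.
x  = [0 1 2 3];  y = log(x + 2);       % data of Example 5.11
xq = 1.5;                              % evaluation point
D  = divdiff(x, y);                    % divided difference table
p2 = D(1,1) + D(2,2)*(xq - x(1)) + D(3,3)*(xq - x(1))*(xq - x(2));
p3 = p2 + D(4,4)*(xq - x(1))*(xq - x(2))*(xq - x(3));
fprintf('p2(1.5) = %.4f, p3(1.5) = %.4f\n', p2, p3)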



Example 5.11 (a) Construct the divided difference table for the function
f (x) = ln(x + 2) in the interval 0 ≤ x ≤ 3 for the stepsize h = 1.
(b) Use Newtons’s divided difference interpolation formula to construct
the interpolating polynomials of degree 2 and degree 3 to approximate
ln(3.5).
(c) Compute error bounds for the approximations in part (b).

Solution. (a) The results of the divided differences are listed in Table 5.4.

(b) First, we construct the second degree polynomial p2 (x) by using the
quadratic Newton interpolation formula as follows:

p2 (x) = f [x0 ] + f [x0 , x1 ](x − x0 ) + f [x0 , x1 , x2 ](x − x0 )(x − x1 ),

then with the help of the divided differences Table 5.4, we get

p2 (x) = 0.6932 + 0.4055(x − 0) − 0.0589(x − 0)(x − 1),

which implies that

p2 (x) = −0.0568x2 + 0.4644x + 0.6932.

Then at x = 1.5, we have

p2 (1.5) = 1.2620,

with possible actual error

f (1.5) − p2 (1.5) = 1.2528 − 1.2620 = −0.0072.

Now to construct the cubic interpolatory polynomial p3 (x) that fits at


all four points, we only have to add one more term to the polynomial p2 (x):

p3 (x) = p2 (x) + f [x0 , x1 , x2 , x3 ](x − x0 )(x − x1 )(x − x2 )

or
p3 (x) = p2 (x) + 0.0089(x − 0)(x − 1)(x − 2)
p3 (x) = p2 (x) + 0.0089(x3 − 3x2 + 2x),

then at x = 1.5, we get


p3 (1.5) = p2 (1.5) + 0.0089((1.5)3 − 3(1.5)2 + 2(1.5))
p3 (1.5) = 1.2620 − 0.0033 = 1.2587,
with possible actual error
f (1.5) − p3 (1.5) = 1.2528 − 1.2587 = −0.0059.
We note that the estimated value of f (1.5) by the cubic interpolating poly-
nomial is closer to the exact solution than the quadratic polynomial.

Table 5.4: Divided differences table for Example 5.11.


Zeroth First Second Third
Divided Divided Divided Divided
k xk Difference Difference Difference Difference
0 0 0.6932
1 1 1.0986 0.4055
2 2 1.3863 0.2877 - 0.0589
3 3 1.6094 0.2232 - 0.0323 0.0089

(c) Now to compute the error bounds for the approximations in part (b),
we use the error formula (5.17). For the polynomial p2 (x), we have
|f (x) − p2 (x)| = [|f ′′′(η(x))| / 3!] |(x − x0 )(x − x1 )(x − x2 )|.

Since the third derivative of the given function is

f ′′′(x) = 2/(x + 2)³

and

|f ′′′(η(x))| = 2/(η(x) + 2)³,   for η(x) ∈ (0, 2),

then

M = max_{0≤x≤2} 2/(x + 2)³ = 0.25

and
|f (1.5) − p2 (1.5)| ≤ (0.25/6)(0.375) = 0.0156,
which is the required error bound for the approximation p2 (1.5).

Since the error bound for the cubic polynomial p3 (x) is

|f (x) − p3 (x)| = [|f (4) (η(x))| / 4!] |(x − x0 )(x − x1 )(x − x2 )(x − x3 )|,
taking the fourth derivative of the given function, we have
f (4) (x) = −6/(x + 2)⁴

and

|f (4) (η(x))| = 6/(η(x) + 2)⁴,   for η(x) ∈ (0, 3).

Since |f (4) (0)| = 0.375 and |f (4) (3)| = 0.0096,

|f (4) (η(x))| ≤ max_{0≤x≤3} 6/(x + 2)⁴ = 0.375

and
|f (1.5) − p3 (1.5)| ≤ (0.375/24)(0.5625) = 0.0088,
which is the required error bound for the approximation p3 (1.5). •

Note that in Example 5.11, we used the value of the quadratic poly-
nomial p2 (1.5) in calculating the cubic polynomial p3 (1.5). It was possible
because the initial value for both polynomials was the same as x0 = 0.
But the situation will be quite different if the initial point for both poly-
nomials is different. For example, if we have to find the approximate
value of ln(4.5), then the suitable data points for the quadratic polyno-
mial will be x0 = 1, x1 = 2, x2 = 3 and for the cubic polynomial will be
x0 = 0, x1 = 1, x2 = 2, x3 = 3. So for getting the best approximation of

ln(4.5) by the cubic polynomial p3 (2.5), we cannot use the value of the
quadratic polynomial p2 (2.5) in the cubic polynomial p3 (2.5). The best
way is to use the cubic polynomial form

p3 (2.5) = f [0] + f [0, 1](2.5 − 0) + f [0, 1, 2](2.5 − 0)(2.5 − 1)


+ f [0, 1, 2, 3](2.5 − 0)(2.5 − 1)(2.5 − 2),

which gives

p3 (2.5) = 0.6932 + 1.0137 − 0.2208 + 0.0166 = 1.5027.

Figure 5.4: Quadratic and cubic polynomial approximations of the function.

MATLAB commands can reproduce the results of Example 5.11 as fol-


lows:

>> x = [0 1 2 3];
>> y = log(x + 2);
>> x0 = 1.5;
>> Y = Ndivf(x, y, x0);

Program 5.3
MATLAB m-file for Newton's Divided Difference Interpolation Method
function Y = Ndivf(x, y, x0)
% Evaluates the Newton divided difference interpolating polynomial
% through (x(i), y(i)) at the point(s) x0, using nested multiplication.
m = length(x); D = zeros(m, m); D(:, 1) = y(:);
for j = 2:m
    for i = j:m
        D(i, j) = (D(i, j-1) - D(i-1, j-1)) / (x(i) - x(i-j+1));
    end
end
Y = D(m, m) * ones(size(x0));
for i = m-1:-1:1
    Y = D(i, i) + (x0 - x(i)) .* Y;
end

Example 5.12 Let x0 = 0.5, x1 = 0.7, x2 = 0.9, x3 = 1.1, x4 = 1.3, and


x5 = 1.5. Use Newton polynomial p5 (x) of degree five to approximate the
function f (x) = ex at x = 0.6, when p4 (0.6) = 1.9112. Also, compute an
error bound for your approximation.

Solution. Since the fifth-degree Newton polynomial p5 (x) is defined as

p5 (x) = p4 (x)+(x−x0 )(x−x1 )(x−x2 )(x−x3 )(x−x4 )f [x0 , x1 , x2 , x3 , x4 , x5 ],

and using the given data points, we have

p5 (0.6) = p4 (0.6) + (0.0010)f [0.5, 0.7, 0.9, 1.1, 1.3, 1.5].

Now we compute the fifth-order divided differences of the function as fol-


lows. Thus,

p5 (0.6) = 1.9112 + (0.0010)(0.0228) = 1.9112.

Since the error bound for the fifth-degree polynomial p5 (x) is

|f (x) − p5 (x)| = [|f (6) (η(x))| / 6!] |(x − x0 )(x − x1 )(x − x2 )(x − x3 )(x − x4 )(x − x5 )|,
taking the sixth derivative of the given function, we have

f (6) (x) = ex

and
|f (6) (η(x))| = e(η(x)) , for η(x) ∈ (0.5, 1.5).

Table 5.5: Divided differences for f (x) = ex at the given points.

Zeroth First Second Third Fourth Fifth


Divided Divided Divided Divided Divided Divided
k xk Difference Difference Difference Difference Difference Difference
0 0.5 1.6487
1 0.7 2.0138 1.8252
2 0.9 2.4596 2.2293 1.0102
3 1.1 3.0042 2.7228 1.2339 0.3728
4 1.3 3.6693 3.3257 1.5071 0.4553 0.1032
5 1.5 4.4817 4.0620 1.8408 0.4553 0.1260 0.0228

Since
|f (6) (0.5)| = 1.6487 and |f (6) (1.5)| = 4.4817,

|f (6) (η(x))| ≤ max_{0.5≤x≤1.5} |eˣ| = 4.4817.

Thus, we get
|f (0.6) − p5 (0.6)| ≤ (0.00095)(4.4817)/720 = 0.000006,
which is the required error bound for the approximation p5 (0.6). •
Example 5.13 Consider the points x0 = 0.5, x1 = 1.5, x2 = 2.5, x3 =
3.0, x4 = 4.5, and for a function f (x), the divided differences are
f [x2 ] = 73.8125, f [x1 , x2 ] = 59.5, f [x0 , x1 , x2 ] = 23.5,
f [x1 , x2 , x3 ] = 47.25, f [x0 , x1 , x2 , x3 , x4 ] = 1.
Using this information, construct the complete divided differences table for
the given data points.

Solution. Since we know the third divided difference is defined as


f [x0 , x1 , x2 , x3 ] = (f [x1 , x2 , x3 ] − f [x0 , x1 , x2 ])/(x3 − x0 ),

using the given data points, we get


f [x0 , x1 , x2 , x3 ] = (47.25 − 23.5)/(3.0 − 0.5) = 9.50.
Similarly, the other third divided difference f [x1 , x2 , x3 , x4 ] can be computed
by using the fourth divided difference formula as follows:
f [x0 , x1 , x2 , x3 , x4 ] = (f [x1 , x2 , x3 , x4 ] − f [x0 , x1 , x2 , x3 ])/(x4 − x0 )

1.0 = (f [x1 , x2 , x3 , x4 ] − 9.50)/(4.5 − 0.5)

f [x1 , x2 , x3 , x4 ] = 4.0 + 9.50 = 13.50.


Now finding the remaining second-order divided difference f [x2 , x3 , x4 ],
we use the third-order divided difference as follows:
f [x1 , x2 , x3 , x4 ] = (f [x2 , x3 , x4 ] − f [x1 , x2 , x3 ])/(x4 − x1 )

13.50 = (f [x2 , x3 , x4 ] − 47.25)/(4.5 − 1.5)

f [x2 , x3 , x4 ] = 40.50 + 47.25 = 87.75.


Finding the first-order divided difference f [x0 , x1 ], we use the second-
order divided difference as follows:
f [x0 , x1 , x2 ] = (f [x1 , x2 ] − f [x0 , x1 ])/(x2 − x0 )

23.50 = (59.50 − f [x0 , x1 ])/(2.5 − 0.5)

f [x0 , x1 ] = 59.50 − 47.0 = 12.50.


Similarly, the other two first-order divided differences f [x2 , x3 ] and f [x3 , x4 ]
can be calculated as follows:
f [x1 , x2 , x3 ] = (f [x2 , x3 ] − f [x1 , x2 ])/(x3 − x1 )

47.25 = (f [x2 , x3 ] − 59.50)/(3.0 − 1.5)

f [x2 , x3 ] = 70.8750 + 59.50 = 130.375

and

f [x2 , x3 , x4 ] = (f [x3 , x4 ] − f [x2 , x3 ])/(x4 − x2 )

87.75 = (f [x3 , x4 ] − 130.375)/(4.5 − 2.5)

f [x3 , x4 ] = 175.50 + 130.375 = 305.875.

Also, the remaining zeroth-order divided differences can be calculated as


follows:
f [x1 , x2 ] = (f [x2 ] − f [x1 ])/(x2 − x1 )

59.50 = (73.8125 − f [x1 ])/(2.5 − 1.5)

f [x1 ] = 73.8125 − 59.50 = 14.3125

and

f [x0 , x1 ] = (f [x1 ] − f [x0 ])/(x1 − x0 )

12.50 = (14.3125 − f [x0 ])/(1.5 − 0.5)

f [x0 ] = 14.3125 − 12.50 = 1.8125.

Finally,
f [x2 , x3 ] = (f [x3 ] − f [x2 ])/(x3 − x2 )

130.375 = (f [x3 ] − 73.8125)/(3.0 − 2.5)

f [x3 ] = 65.1875 + 73.8125 = 139.00

and

f [x3 , x4 ] = (f [x4 ] − f [x3 ])/(x4 − x3 )

305.875 = (f [x4 ] − 139.00)/(4.5 − 3.0)

f [x4 ] = 458.8125 + 139.00 = 597.8125,

which completes the divided differences table as shown in Table 5.6. •

Table 5.6: Complete divided differences table for the given points.
Zeroth First Second Third Fourth
Divided Divided Divided Divided Divided
k xk Difference Difference Difference Difference Difference
0 0.5 1.8125
1 1.5 14.3125 12.5000
2 2.5 73.8125 59.5000 23.5000
3 3.0 139.000 130.375 47.2500 9.5000
4 4.5 597.8125 305.875 87.7500 13.500 1.0000
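As a check, the divdiff function of Program 5.2 reproduces the whole of Table 5.6 from the zeroth-order column alone:

% Verifying Table 5.6 with divdiff (Program 5.2): column j of D holds
% the (j-1)th divided differences of the data.
x = [0.5 1.5 2.5 3.0 4.5];
y = [1.8125 14.3125 73.8125 139.00 597.8125];
D = divdiff(x, y)     % e.g., D(4,4) = 9.5000 and D(5,5) = 1.0000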

Example 5.14 If f (x) = p(x)q(x), then show that

f [x0 , x1 ] = p(x1 )q[x0 , x1 ] + q(x0 )p[x0 , x1 ].

Also, find the values of p[0, 1] and q[0, 1], when f [0, 1] = 4, f (1) = 5, p(1) =
q(0) = 2.

Solution. The first-order divided difference can be written as


f [x0 , x1 ] = (f (x1 ) − f (x0 ))/(x1 − x0 ).
Now using f (x1 ) = p(x1 )q(x1 ) and f (x0 ) = p(x0 )q(x0 ) in the above for-
mula, we have
f [x0 , x1 ] = (p(x1 )q(x1 ) − p(x0 )q(x0 ))/(x1 − x0 ).
Adding and subtracting the term p(x1 )q(x0 ), we obtain
f [x0 , x1 ] = (p(x1 )q(x1 ) − p(x1 )q(x0 ) + p(x1 )q(x0 ) − p(x0 )q(x0 ))/(x1 − x0 ),
which can be written as
f [x0 , x1 ] = p(x1 ) (q(x1 ) − q(x0 ))/(x1 − x0 ) + q(x0 ) (p(x1 ) − p(x0 ))/(x1 − x0 ).
Thus,
f [x0 , x1 ] = p(x1 )q[x0 , x1 ] + q(x0 )p[x0 , x1 ].
Given x0 = 0, x1 = 1, f (1) = 5, and f [0, 1] = 4, we obtain
f [0, 1] = (f (1) − f (0))/(1 − 0) = f (1) − f (0)
or
f [0, 1] = 4 = 5 − f (0), gives f (0) = 1.
Also,
f (1) = 5 = p(1)q(1) = 2q(1), gives q(1) = 5/2

and

f (0) = 1 = p(0)q(0) = 2p(0), gives p(0) = 1/2.

Hence,

p[0, 1] = (p(1) − p(0))/(1 − 0) = p(1) − p(0) = 2 − 1/2 = 3/2

and

q[0, 1] = (q(1) − q(0))/(1 − 0) = q(1) − q(0) = 5/2 − 2 = 1/2.


In the case of the Lagrange interpolating polynomial we derive an ex-


pression for the truncation error in the form given by (5.17), namely, that

Rn+1 (x) = [f (n+1) (η(x)) / (n + 1)!] Ln (x),

where Ln (x) = (x − x0 )(x − x1 ) · · · (x − xn ).

For Newton’s divided difference formula, we obtain, following the same


reasoning as above,

f (x) = f [x0 ] + f [x0 , x1 ](x − x0 ) + f [x0 , x1 , x2 ](x − x0 )(x − x1 ) + · · ·


+ f [x0 , x1 , . . . , xn ](x − x0 )(x − x1 ) · · · (x − xn−1 )
+ f [x0 , x1 , . . . , xn , x](x − x0 )(x − x1 ) · · · (x − xn−1 )(x − xn ),

which can also be written as

f (x) = pn (x) + f [x0 , x1 , . . . , xn , x](x − x0 )(x − x1 ) · · · (x − xn ) (5.27)

or
f (x) − pn (x) = Ln (x)f [x0 , x1 , . . . , xn , x]. (5.28)
Since the interpolation polynomial agreeing with f (x) at x0 , x1 , . . . , xn is
unique, it follows that these two error expressions must be equal.

Theorem 5.3 Let pn (x) be the polynomial of degree at most n that inter-
polates a function f (x) at a set of n + 1 distinct points x0 , x1 , . . . , xn . If x
is a point different from the points x0 , x1 , . . . , xn , then
f (x) − pn (x) = f [x0 , x1 , . . . , xn , x] ∏_{j=0}^{n} (x − xj ). (5.29)

One can easily show the relationship between the divided differences and
the derivative. From (5.23), we have

f [x0 , x1 ] = (f (x1 ) − f (x0 ))/(x1 − x0 ).

Now applying the Mean Value theorem to the above equation implies that
when the derivative f 0 exists, we have

f [x0 , x1 ] = f 0 (η(x))

for the unknown point η(x), which lies between x0 and x1 . The following
theorem generalizes this result.

Theorem 5.4 (Divided Differences and Derivatives)

Suppose that f ∈ C n [a, b] and x0 , x1 , . . . , xn are distinct numbers in [a, b].


Then there exists a point η(x) in the interval (a, b) spanned by x0 , . . . , xn
such that

f [x0 , x1 , . . . , xn ] = f (n) (η(x)) / n!. (5.30)

Example 5.15 Let f (x) = x ln x, and the points x0 = 1.1, x1 = 1.2, x2 =


1.3. Find the best approximate value for the unknown point η(x) by using
the relation (5.30).

Solution. Given f (x) = x ln x, then

f (1.1) = 1.1 ln(1.1) = 0.1048


f (1.2) = 1.2 ln(1.2) = 0.2188
f (1.3) = 1.3 ln(1.3) = 0.3411.

Since the relation (5.30) for the given data points is

f [x0 , x1 , x2 ] = f ′′(η(x)) / 2!, (5.31)
to compute the value of the left-hand side of the relation (5.31), we have
to find the values of the first-order divided differences

f [x0 , x1 ] = (f (x1 ) − f (x0 ))/(x1 − x0 ) = (0.2188 − 0.1048)/(1.2 − 1.1) = 1.1400

and
f [x1 , x2 ] = (f (x2 ) − f (x1 ))/(x2 − x1 ) = (0.3411 − 0.2188)/(1.3 − 1.2) = 1.2230.

Using these values, we can compute the second-order divided difference as

f [x0 , x1 , x2 ] = (f [x1 , x2 ] − f [x0 , x1 ])/(x2 − x0 ) = (1.2230 − 1.1400)/(1.3 − 1.1) = 0.4150.
Now we calculate the right-hand side of the relation (5.31) for the given
points, which gives us
f ′′(x0 )/2 = 1/(2x0 ) = 0.4546

f ′′(x1 )/2 = 1/(2x1 ) = 0.4167

f ′′(x2 )/2 = 1/(2x2 ) = 0.3846.
We note that the left-hand side of (5.31) is nearly equal to the right-
hand side when x1 = 1.2. Hence, the best approximate value of η(x) is
x1 = 1.2. •
Properties of Divided Differences

Now we discuss some of the properties of divided differences as follows:


1. If pn (x) is a polynomial of degree n, then the divided difference of
order n is always constant, and those of orders (n+1), (n+2), . . . are identically zero.
2. The divided difference is a symmetric function of its arguments.
Thus, if (t0 , . . . , tn ) is a permutation of (x0 , x1 , . . . , xn ), then
f [t0 , t1 , . . . , tn ] = f [x0 , x1 , . . . , xn ].
This can be verified easily, since the divided differences on both sides
of the above equation are the coefficients of xn in the polynomial of
degree at most n that interpolates f (x) at the n + 1 distinct points
t0 , t1 , . . . , tn and x0 , x1 , . . . , xn . These two polynomials are, of course,
the same.

3. The interpolating polynomial of degree n can be obtained by adding


a single term to the polynomial of degree (n − 1) expressed in the
Newton form:
pn (x) = pn−1 (x) + f [x0 , . . . , xn ] ∏_{j=0}^{n−1} (x − xj ).

4. The divided difference f [x0 , . . . , xn−1 ] is the coefficient of xn−1 in the


polynomial that interpolates (x0 , f0 ), (x1 , f1 ), . . . , (xn−1 , fn−1 ).

5. A sequence of divided differences may be constructed recursively from


the formula
f [x0 , . . . , xn ] = (f [x1 , . . . , xn ] − f [x0 , . . . , xn−1 ])/(xn − x0 ),

and the zeroth-order divided difference is defined by

f [xi ] = f (xi ), i = 0, 1, . . . , n.

6. Another useful property of divided differences can be obtained by us-


ing the definitions of the divided differences (5.23) and (5.24), which
can be extended to the case where some or all of the points xi are
coincident, provided that f (x) is sufficiently differentiable. For ex-
ample, define

f[x0, x0] = lim_{x1→x0} f[x0, x1] = lim_{x1→x0} (f(x1) − f(x0)) / (x1 − x0) = f′(x0).        (5.32)

For an arbitrary n ≥ 1, let all the points in Theorem 5.4 approach


x0 . This leads to the definition

f[x0, x0, . . . , x0] = f^(n)(x0) / n!,
where the left-hand side denotes the nth divided difference, for which
all points are x0 .
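A short MATLAB sketch of the recursion in property 5 (our own helper, not one of the book's numbered programs) builds the divided difference table column by column; column j holds the (j − 1)th-order differences:

function D = divdiff(x, y)
% Divided difference table: D(i,j) = f[x(i-j+1),...,x(i)].
n = length(x);
D = zeros(n, n);
D(:,1) = y(:);                               % zeroth-order: f[xi] = f(xi)
for j = 2:n
    for i = j:n
        D(i,j) = (D(i,j-1) - D(i-1,j-1)) / (x(i) - x(i-j+1));
    end
end

For the data of Example 5.15, the entry D(3,3) returns the second-order divided difference 0.4150 obtained above.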

Example 5.16 Let f (x) = x2 e2x + ln(x + 1) and x0 = 0, x1 = 1. Using


(5.30) and the above divided difference property 6, calculate the values of the divided differences f[1, 1, 0, 0] and f[0, 0, 0, 1, 1, 1].

Solution. Since

f[1, 1, 0, 0] = f[0, 0, 1, 1] = (f[0, 1, 1] − f[0, 0, 1]) / (1 − 0)
             = f[0, 1, 1] − f[0, 0, 1]
             = (f[1, 1] − f[0, 1]) / (1 − 0) − (f[0, 1] − f[0, 0]) / (1 − 0)
             = f[1, 1] − 2f[0, 1] + f[0, 0]
             = f[1, 1] − 2 (f[1] − f[0]) / (1 − 0) + f[0, 0]
             = f′(1) − 2f(1) + 2f(0) + f′(0),

the given function is

f (x) = x2 e2x + ln(x + 1),

and its first derivative can be calculated as

f′(x) = 2x e^(2x) + 2x² e^(2x) + 1/(x + 1),

so their values at the given points are

f (0) = 0 and f (1) = e2 + ln 2


f 0 (0) = 1 and f 0 (1) = 4e2 + 0.5.

Thus, we have

f [1, 1, 0, 0] = f 0 (1) − 2f (1) + 2f (0) + f 0 (0) = 4e2 + 0.5 − 2(e2 + ln 2) + 0 + 1,

and it gives
f [1, 1, 0, 0] = 2e2 − 2 ln 2 + 1.5 = 14.8918.

Also,

f[0, 0, 0, 1, 1, 1] = (f[0, 0, 1, 1, 1] − f[0, 0, 0, 1, 1]) / (1 − 0)
                    = f[0, 0, 1, 1, 1] − f[0, 0, 0, 1, 1]
                    = (f[0, 1, 1, 1] − f[0, 0, 1, 1]) / (1 − 0) − (f[0, 0, 1, 1] − f[0, 0, 0, 1]) / (1 − 0)
                    = f[0, 1, 1, 1] − 2f[0, 0, 1, 1] + f[0, 0, 0, 1]
                    = f[1, 1, 1] − 3f[0, 0, 1, 1] − f[0, 0, 0]
                    = f″(1)/2! − 3f[0, 0, 1, 1] − f″(0)/2!.
Since the second derivative of the given function is

f″(x) = e^(2x)(2 + 8x + 4x²) − 1/(x + 1)²,

and its values at the given points are

f″(0) = 2 − 1 = 1   and   f″(1) = 14e² − 0.25,

using these values and f[0, 0, 1, 1] = 14.8918, we get

f[0, 0, 0, 1, 1, 1] = (14e² − 0.25)/2! − 3(14.8918) − 1/2
                    = 51.60895 − 44.6754 − 0.5 = 6.43355,


which is the required value of the fifth-order divided difference of the func-
tion at the given points. •
There are many schemes for the efficient implementation of divided difference interpolation, such as Aitken's method, which is designed for easy evaluation of the polynomial, taking the points closest to the one of interest first and computing only those divided differences that
are actually necessary for the computation. The implementation is iterative
in nature; additional data points are included one at a time until successive
estimates pk (x) and pk+1 (x) of f (x) agree to some specified accuracy or
until all data has been used.

5.2.3 Aitken’s Method


This is an iterative interpolation process based on the repeated applica-
tion of a simple interpolation method. This elegant method may be used
to interpolate between both equal and unequal spaced data points. The
basis of this method is equivalent to generating a sequence of the Lagrange
polynomials, but it is a very efficient formulation. The method is used to
compute an interpolated value using successive, higher degree polynomials
until further increases in the degree of the polynomials give a negligible
improvement on the interpolated value.

Suppose we want to fit a polynomial function, for the purpose of inter-


polation, to the following data points:
x x0 x1 ··· xn
.
f (x) f (x0 ) f (x1 ) · · · f (xn )
In order to estimate the value of the function f (x) corresponding to any
given value of x, we consider the following expression:

P01(x) = (1/(x1 − x0)) det[ x − x0, f(x0); x − x1, f(x1) ] = [(x − x0)f(x1) − (x − x1)f(x0)] / (x1 − x0)

and

P02(x) = (1/(x2 − x0)) det[ x − x0, f(x0); x − x2, f(x2) ] = [(x − x0)f(x2) − (x − x2)f(x0)] / (x2 − x0).

In general,

P0m(x) = (1/(xm − x0)) det[ x − x0, f(x0); x − xm, f(xm) ]
       = [(x − x0)f(xm) − (x − xm)f(x0)] / (xm − x0).        (5.33)
It represents a first-degree polynomial and is equivalent to a linear inter-
polation using the data points (x0 , f (x0 )) and (xm , f (xm )). One can easily
verify that
P0m (x0 ) = f (x0 ) and P0m (xm ) = f (xm ). (5.34)

Similarly, the second-degree polynomials are generated as follows:



P012(x) = (1/(x2 − x1)) det[ x − x1, P01(x); x − x2, P02(x) ]
        = [(x − x1)P02(x) − (x − x2)P01(x)] / (x2 − x1)

and

P01m(x) = (1/(xm − x1)) det[ x − x1, P01(x); x − xm, P0m(x) ]
        = [(x − x1)P0m(x) − (x − xm)P01(x)] / (xm − x1),        (5.35)

where m can now take any value from 2 to n, and P01m denotes a polynomial of degree 2 that passes through the three points (x0, f(x0)), (x1, f(x1)), and (xm, f(xm)). By repeated use of this procedure, higher degree polynomials can be generated. In general, one can define this procedure as follows:

P012···n(x) = (1/(xn − x(n−1))) det[ x − x(n−1), P012···(n−1)(x); x − xn, P01···(n−2)n(x) ]
            = [(x − x(n−1))P01···(n−2)n(x) − (x − xn)P012···(n−1)(x)] / (xn − x(n−1)).        (5.36)
xn − xn−1
This is a polynomial of degree n and it fits all the data. Table 5.7 shows
the construction of P012···n (x). When using Aitken’s method in practice,
only the values of the polynomials for specified values of x are computed
and coefficients of the polynomials are not determined explicitly. Further-
more, if for a specified x, the stage is reached when the difference in value
between successive degree polynomials is negligible, then the procedure
can be terminated. It is an advantage of this method compared with the
Lagrange interpolation formula.

Table 5.7: Aitken’s scheme to approximate a function.


First Second Third Nth
k xk x − xk f (xk ) Order Order Order Order
0 x0 x − x0 f (x0 )
1 x1 x − x1 f (x1 ) P01 (x)
2 x2 x − x2 f (x2 ) P02 (x) P012 (x)
3 x3 x − x3 f (x3 ) P03 (x) P013 (x)
··· ··· ··· ··· ··· ··· ··· ··· ···
n xn x − xn f (xn ) P0n (x) P01n (x) P012n (x) · · · P012···n (x)

Example 5.17 Apply Aitken’s method to the approximate evaluation of


ln x at x = 4.5 from the following data points:
x 2 3 4 5
.
f (x) 0.6932 1.0986 1.3863 1.6094
Solution. To find the estimate value of ln(4.5), using the given data points,
we have to compute all the unknowns required in the given problem as
follows:

P01(x) = (1/(x1 − x0)) det[ x − x0, f(x0); x − x1, f(x1) ]

P01(4.5) = (1/(3 − 2)) det[ 4.5 − 2, 0.6932; 4.5 − 3, 1.0986 ]
         = (2.5)(1.0986) − (1.5)(0.6932) = 1.7067

and

P02(x) = (1/(x2 − x0)) det[ x − x0, f(x0); x − x2, f(x2) ]

P02(4.5) = (1/(4 − 2)) det[ 4.5 − 2, 0.6932; 4.5 − 4, 1.3863 ]
         = (1/2)[(2.5)(1.3863) − (0.5)(0.6932)] = 1.5596
2

and

P03(x) = (1/(x3 − x0)) det[ x − x0, f(x0); x − x3, f(x3) ]

P03(4.5) = (1/(5 − 2)) det[ 4.5 − 2, 0.6932; 4.5 − 5, 1.6094 ]
         = (1/3)[(2.5)(1.6094) − (−0.5)(0.6932)] = 1.4567.

Similarly, the values of the second-degree polynomials can be generated as follows:

P012(x) = (1/(x2 − x1)) det[ x − x1, P01(x); x − x2, P02(x) ]

P012(4.5) = (1/(4 − 3)) det[ 4.5 − 3, P01(4.5); 4.5 − 4, P02(4.5) ]
          = (1.5)(1.5596) − (0.5)(1.7067) = 1.4860

and

P013(x) = (1/(x3 − x1)) det[ x − x1, P01(x); x − x3, P03(x) ]

P013(4.5) = (1/(5 − 3)) det[ 4.5 − 3, P01(4.5); 4.5 − 5, P03(4.5) ]
          = (1/2)[(1.5)(1.4567) − (−0.5)(1.7067)] = 1.5193.

Finally, the value of the third-degree polynomial can be generated as follows:

P0123(x) = (1/(x3 − x2)) det[ x − x2, P012(x); x − x3, P013(x) ]

P0123(4.5) = (1/(5 − 4)) det[ 4.5 − 4, P012(4.5); 4.5 − 5, P013(4.5) ]
           = (0.5)(1.5193) − (−0.5)(1.4860) = 1.5027.



Table 5.8: Approximate solution for Example 5.17.


First- Second- Third-
k xk f (xk ) x − xk Order Order Order
0 2.0 0.6932 2.5
1 3.0 1.0986 1.5 1.7067
2 4.0 1.3863 0.5 1.5596 1.4860
3 5.0 1.6094 -0.5 1.4567 1.5193 1.5027

The results obtained are listed in Table 5.8. Note that the approximate value of ln(4.5) is P0123(4.5) = 1.5027, while its exact value is 1.5041. •
To get the above results using the MATLAB Command Window, we
do the following:

>> x = [2 3 4 5];
>> y = [0.6932 1.0986 1.3863 1.6094];
>> x0 = 4.5;
>> P = Aitken1(x, y, x0);

Program 5.4
MATLAB m-file for Aitken's Method
function P = Aitken1(x,y,x0)
% Builds the Aitken table (Table 5.7) column by column; the entry
% P(n,n) is the interpolated value at x0.
n = size(x,1);
if n == 1, n = size(x,2); end
for i = 1:n
    P(i,1) = y(i);                 % zeroth-order column: function values
end
for i = 2:n
    t = 0;
    for j = 2:i
        t = t + 1;
        P(i,j) = (P(i,j-1)*(x0-x(j-1)) - P(t,t)*(x0-x(i)))/(x(i)-x(j-1));
    end
end

5.2.4 Chebyshev Polynomials


Here, we discuss polynomial interpolation for f (x) over the interval [−1, 1]
based on the points

−1 ≤ x0 < x1 < x2 < . . . < xn ≤ 1.

This special type of polynomial is known as a Chebyshev polynomial.

Chebyshev polynomials are used in many parts of numerical analysis


and more generally in mathematics and physics. Basically, Chebyshev
polynomials are used to minimize approximation error. These polynomials
are of the form

Tn (x) = cos(n cos−1 (x)), for x ∈ [−1, 1]. (5.37)

The representation of (5.37) may not appear to be a polynomial, but we


will show it is a polynomial of degree n. To simplify the manipulation of
(5.37), we introduce

θ = cos−1 (x), or x = cos θ, 0 ≤ θ ≤ π. (5.38)

Then
Tn (x) = cos(nθ), for x ∈ [−1, 1]. (5.39)
For example, taking n = 0, then

T0 (x) = cos(0.θ) = cos(0) = 1,

and for n = 1, gives

T1 (x) = cos(1.θ) = cos(θ) = x.

Also, by taking n = 2, we have

T2 (x) = cos(2.θ) = cos(2θ),

and using the standard identity, cos(2θ) = 2 cos2 (θ) − 1, we get

T2 (x) = cos(2θ) = 2 cos2 (θ) − 1 = 2x2 − 1.



Figure 5.5: Graphs of T0 (x), T1 (x), T2 (x), T3 (x), T4 (x).

The graphs of the Chebyshev polynomials T0 (x), T1 (x), T2 (x), T3 (x),


and T4 (x) are given in Figure 5.5.
The first few Chebyshev polynomials are as follows:

T0 (x) = cos(0.θ) = 1
T1 (x) = cos(1.θ) = x
T2 (x) = cos(2.θ) = 2x2 − 1
T3 (x) = cos(3.θ) = 4x3 − 3x
T4 (x) = cos(4.θ) = 8x4 − 8x2 + 1
T5 (x) = cos(5.θ) = 16x5 − 20x3 + 5x
T6 (x) = cos(6.θ) = 32x6 − 48x4 + 18x2 − 1
T7 (x) = cos(7.θ) = 64x7 − 112x5 + 56x3 − 7x
T8 (x) = cos(8.θ) = 128x8 − 256x6 + 160x4 − 32x2 + 1
T9 (x) = cos(9.θ) = 256x9 − 576x7 + 432x5 − 120x3 + 9x
T10 (x) = cos(10.θ) = 512x10 − 1280x8 + 1120x6 − 400x4 + 50x2 − 1.

To get the coefficients of the above Chebyshev polynomials using the


MATLAB Command Window, we do the following:

>> n = 10;
>> T = CHEBP (n);
>> T =
512 0 − 1280 0 1120 0 − 400 0 50 0 − 1
Note that we got the coefficients of the Chebyshev polynomials in de-
scending order of powers.

Program 5.5
MATLAB m-file for Computing Chebyshev
Polynomials
function T = CHEBP(n)
% Returns the coefficients of T_n(x) in descending powers of x,
% built with the triple recursion T_{k+1} = 2x T_k - T_{k-1}.
x0 = 1; x1 = [1 0];
if n == 0
    T = x0;
elseif n == 1
    T = x1;
else
    for i = 2:n
        T = [2*x1 0] - [0 0 x0];   % 2x*T_k - T_{k-1}, padded to equal length
        x0 = x1; x1 = T;
    end
end

The higher order polynomials can be generated from the recursion relation
called the triple recursion relation. This relation can be easily constructed
with the help of the trigonometric addition formulas

cos(A ± B) = cos(A) cos(B) ∓ sin(A) sin(B).

For any n ≥ 1, apply these identities to get

Tn+1 (x) = cos((n + 1)θ) = cos(nθ + θ) = cos(nθ) cos(θ) − sin(nθ) sin(θ)


Tn−1 (x) = cos((n − 1)θ) = cos(nθ − θ) = cos(nθ) cos(θ) + sin(nθ) sin(θ).

By adding Tn+1 (x) and Tn−1 (x), we get

Tn+1 (x) + Tn−1 (x) = 2 cos(nθ) cos(θ) = 2xTn (x),

because cos(nθ) = Tn (x) and cos(θ) = x.



So the relation

Tn+1 (x) = 2xTn (x) − Tn−1 (x), n≥1 (5.40)

is called the triple recursion relation for the Chebyshev polynomials.
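A minimal sketch (our own function name, not one of the book's programs) of evaluating Tn(x) pointwise with the triple recursion (5.40):

function T = chebT(n, x)
% Evaluates the Chebyshev polynomial T_n at the points in x
% using T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x).
T0 = ones(size(x));  T1 = x;
if n == 0, T = T0; return; end
T = T1;
for k = 1:n-1
    T  = 2*x.*T1 - T0;
    T0 = T1;  T1 = T;
end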

Theorem 5.5 (Properties of Chebyshev Polynomials)

The functions Tn (x) satisfy the following properties:


1. Each Tn (x) is a polynomial of degree n.

2. Tn+1 (x) = 2xTn (x) − Tn−1 (x), n ≥ 1.

3. Tn (x) = 2n−1 xn + lower order terms.

4. When n = 2m, T2m (x) is an even function, i.e., T2m (−x) = T2m (x).

5. When n = 2m + 1, T2m+1 (x) is an odd function, i.e., T2m+1 (−x) =


−T2m+1 (x).

6. Tn(x) has n distinct zeros in [−1, 1]. The n + 1 zeros of Tn+1(x),

   xk = cos( (2k + 1)π / (2(n + 1)) ),   for k = 0, 1, . . . , n,

   are called the Chebyshev points; they are the interpolation points used below for interpolation by a polynomial of degree n.

7. |Tn (x)| ≤ 1, for −1 ≤ x ≤ 1.

8. Chebyshev polynomials have some unusual properties. They form an


orthogonal set. To show the orthogonality of the Chebyshev polyno-
mials, consider

∫_{−1}^{1} Tn(x)Tm(x)/√(1 − x²) dx = ∫_{−1}^{1} cos(n cos⁻¹x) cos(m cos⁻¹x)/√(1 − x²) dx.
Let θ = cos⁻¹x; then

dθ = −dx/√(1 − x²)

and when x = −1, θ = π, and when x = 1, θ = 0.

Suppose that n ≠ m. Since

cos nθ cos mθ = (1/2)[cos(n + m)θ + cos(n − m)θ],

we get

∫_{−1}^{1} Tn(x)Tm(x)/√(1 − x²) dx = (1/2)∫_0^π cos[(n + m)θ] dθ + (1/2)∫_0^π cos[(n − m)θ] dθ.

Evaluating the right-hand side, we obtain

∫_{−1}^{1} Tn(x)Tm(x)/√(1 − x²) dx = [ sin[(n + m)θ]/(2(n + m)) + sin[(n − m)θ]/(2(n − m)) ]_0^π

or

∫_{−1}^{1} Tn(x)Tm(x)/√(1 − x²) dx = 0.
Now when n = m (with n ≥ 1), we have

∫_{−1}^{1} Tn(x)Tn(x)/√(1 − x²) dx = (1/2)∫_0^π cos[(n + n)θ] dθ + (1/2)∫_0^π cos[(n − n)θ] dθ
                                   = (1/2)∫_0^π cos(2nθ) dθ + (1/2)∫_0^π cos(0) dθ
                                   = (1/2)[ sin(2nθ)/(2n) ]_0^π + (1/2)[θ]_0^π
                                   = 0 + π/2 = π/2,

for each n ≥ 1. •

In the following example, we shall find the Chebyshev points for the
linear, quadratic, and cubic interpolations for the given function.

Example 5.18 Let f(x) = x²e^x on the interval [−1, 1]. Then the Chebyshev points for linear interpolation (n = 1) are given by

x0 = cos( (2(0) + 1)π / (2(1 + 1)) ) = cos(π/4) = 0.7071
x1 = cos( (2(1) + 1)π / (2(1 + 1)) ) = cos(3π/4) = −0.7071.

Now using the linear Lagrange polynomial at these two Chebyshev points, we have

p1(x) = L0(x)f(x0) + L1(x)f(x1),

where

L0(x) = (x − x1)/(x0 − x1) = (x + 0.7071)/(0.7071 + 0.7071) = (x + 0.7071)/1.4142
L1(x) = (x − x0)/(x1 − x0) = (x − 0.7071)/(−0.7071 − 0.7071) = (x − 0.7071)/(−1.4142),

and the function values at the Chebyshev points are

f(0.7071) = (0.7071)² e^(0.7071) = 1.0140
f(−0.7071) = (−0.7071)² e^(−0.7071) = 0.2465.

Thus,

p1(x) = [(x + 0.7071)/1.4142](1.0140) + [(x − 0.7071)/(−1.4142)](0.2465),

which gives

p1(x) = 0.6302 + 0.5427x.
Now to find the quadratic Lagrange polynomial, we need to calculate three Chebyshev points as follows:

x0 = cos( (2(0) + 1)π / (2(2 + 1)) ) = cos(π/6) = 0.8660
x1 = cos( (2(1) + 1)π / (2(2 + 1)) ) = cos(3π/6) = 0.0
x2 = cos( (2(2) + 1)π / (2(2 + 1)) ) = cos(5π/6) = −0.8660.

For the quadratic polynomial, we have

p2(x) = L0(x)f(x0) + L1(x)f(x1) + L2(x)f(x2),

where

L0(x) = (x − 0.0)(x + 0.8660) / [(0.8660 − 0.0)(0.8660 + 0.8660)] = (x² + 0.8660x)/1.4999

L1(x) = (x − 0.8660)(x + 0.8660) / [(0.0 − 0.8660)(0.0 + 0.8660)] = (x² − 0.7500)/(−0.7500)

L2(x) = (x − 0.8660)(x − 0.0) / [(−0.8660 − 0.8660)(−0.8660 − 0.0)] = (x² − 0.8660x)/1.4999,

and the function values are

f(0.8660) = (0.8660)² e^(0.8660) = 1.7829
f(0.0) = (0.0)² e^(0.0) = 0.0
f(−0.8660) = (−0.8660)² e^(−0.8660) = 0.3155.

So,

p2(x) = 0.8472x + 1.3990x².
Similarly, for the cubic polynomial, we need the following:

x0 = cos( (2(0) + 1)π / (2(3 + 1)) ) = cos(π/8) = 0.9239
x1 = cos( (2(1) + 1)π / (2(3 + 1)) ) = cos(3π/8) = 0.3827
x2 = cos( (2(2) + 1)π / (2(3 + 1)) ) = cos(5π/8) = −0.3827
x3 = cos( (2(3) + 1)π / (2(3 + 1)) ) = cos(7π/8) = −0.9239

and

L0(x) = (x − 0.3827)(x + 0.3827)(x + 0.9239) / [(0.9239 − 0.3827)(0.9239 + 0.3827)(0.9239 + 0.9239)]
      = (x³ + 0.9239x² − 0.1464x − 0.1353)/1.3066

L1(x) = (x − 0.9239)(x + 0.3827)(x + 0.9239) / [(0.3827 − 0.9239)(0.3827 + 0.3827)(0.3827 + 0.9239)]
      = (x³ + 0.3827x² − 0.8536x − 0.3267)/(−0.5412)

L2(x) = (x − 0.9239)(x − 0.3827)(x + 0.9239) / [(−0.3827 − 0.9239)(−0.3827 − 0.3827)(−0.3827 + 0.9239)]
      = (x³ − 0.3827x² − 0.8536x + 0.3267)/0.5412

L3(x) = (x − 0.9239)(x − 0.3827)(x + 0.3827) / [(−0.9239 − 0.9239)(−0.9239 − 0.3827)(−0.9239 + 0.3827)]
      = (x³ − 0.9239x² − 0.1465x + 0.1354)/(−1.3067)
and

f(0.9239) = (0.9239)² e^(0.9239) = 2.1503
f(0.3827) = (0.3827)² e^(0.3827) = 0.2147
f(−0.3827) = (−0.3827)² e^(−0.3827) = 0.0999
f(−0.9239) = (−0.9239)² e^(−0.9239) = 0.3388.

Thus,

p3(x) = −0.0679 − 0.0220x + 1.5375x² + 1.1743x³. •
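The polynomials in this example can be checked quickly in the Command Window with the built-in polyfit, since interpolation at k + 1 points is a degree-k fit (a sketch of our own, shown for the cubic case):

>> n = 3;  k = 0:n;
>> xc = cos((2*k + 1)*pi/(2*(n + 1)));   % Chebyshev points
>> yc = xc.^2.*exp(xc);                  % f(x) = x^2 e^x
>> p3 = polyfit(xc, yc, n)               % coefficients, highest power first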
Note that because Tn (x) = cos(nθ), Chebyshev polynomials have a succes-
sion of maximums and minimums of alternating signs, each of magnitude
one. Also, | cos(nθ)| = 1, for nθ = 0, π, 2π, . . . , and because θ varies from
0 to π as x varies from 1 to −1, Tn (x) assumes its maximum magnitude of
unity (n + 1) times on the interval [−1, 1]. An important result of Chebyshev polynomials is the fact that, of all polynomials of degree n whose coefficient of xⁿ is unity, the polynomial (1/2^(n−1))Tn(x) has a smaller bound to its magnitude on the interval [−1, 1] than any other. Because the maximum magnitude of Tn(x) is one, the upper bound referred to is 1/2^(n−1).

Theorem 5.6 Let n ≥ 1 be an integer, and consider all possible monic polynomials (a polynomial whose highest-degree term has a coefficient of 1) of degree n. Then the degree n monic polynomial with the smallest maximum absolute value on [−1, 1] is Tn(x)/2^(n−1), and that maximum value is 1/2^(n−1). •

It is important to note that polynomial interpolation using equally


spaced data points, whether expressed in the Lagrange interpolation for-
mula or Newton’s interpolation formula, is most accurate in the middle
range of the interpolation domain, but the error of the interpolation in-
creases toward the edges. While the spacings determined by a Chebyshev
polynomial are largest at the center of the interpolation domain and de-
crease toward the edges, errors become more evenly distributed throughout
the domain and their magnitudes become less than with equally spaced
points. Since the error formula for Lagrange and Newton’s polynomials
satisfy
f (x) = pn (x) + En (x),
where
f (n+1)
En (x) = R(x) ,
(n + 1)!
and R(x) is the polynomial of degree (n + 1)

R(x) = (x − x0 )(x − x1 ) · · · (x − xn ),

using this relationship, we have



max f (n+1)
−1≤x≤1
|En (x)| ≤ |R(x)| .
(n + 1)!

Here, we are looking to get a minimum of max |R(x)|.


−1≤x≤1

The Russian mathematician Chebyshev studied how to minimize the upper bound for |En(x)|. One upper bound can be formed by taking the product of the maximum value of |R(x)| over all x in [−1, 1] and the maximum value of |f^(n+1)(x)/(n + 1)!| over all x in [−1, 1]. To minimize the factor max |R(x)|, Chebyshev found that x0, x1, . . . , xn should be chosen so that

R(x) = (1/2ⁿ) T(n+1)(x).
Theorem 5.7 Let f ∈ C^(n+1)([−1, 1]) be given, and let pn(x) be the nth degree polynomial interpolated to f using the Chebyshev points. Then

max_{−1≤x≤1} |f(x) − pn(x)| ≤ ( 1 / (2ⁿ(n + 1)!) ) max_{−1≤x≤1} |f^(n+1)|.        (5.41)

Note that

1/2ⁿ ≤ max_{−1≤x≤1} |(x − x0)(x − x1) · · · (x − xn)|,

for any choice of x0, x1, . . . , xn on the interval [−1, 1]. •

Example 5.19 Construct the Lagrange interpolating polynomials of de-


gree 2 on the interval [−1, 1] to f (x) = (x + 2)ex using equidistant and the
Chebyshev points.

Solution. First, we construct the polynomial with the use of the three
equidistant points
x0 = −1, x1 = 0, x2 = 1,
and their corresponding function values

f (−1) = e−1 , f (0) = 2, f (1) = 3e.

Then the Lagrange polynomial at the equidistant points is

p2(x) = [(x − 0)(x − 1.0) / ((−1.0 − 0)(−1.0 − 1.0))] (e^(−1))
      + [(x + 1.0)(x − 1.0) / ((0 + 1.0)(0 − 1.0))] (2)
      + [(x + 1.0)(x − 0.0) / ((1.0 + 1.0)(1.0 − 0))] (3e).

Simplifying this, we get

p2 (x) = 2.2614x2 + 3.8935x + 2,

the required polynomial at equidistant points.

Similarly, we can obtain the polynomial using the Chebyshev points


x0 = cos(π/6) = 0.8660
x1 = cos(3π/6) = 0.0
x2 = cos(5π/6) = −0.8660,
with their corresponding function values

f (x0 ) = 6.81384

f (x1 ) = 2.0

f (x2 ) = 0.4770,

as follows:

Q2(x) = [(x − 0)(x + 0.8660) / ((0.8660 − 0)(0.8660 + 0.8660))] (6.81384)
      + [(x − 0.8660)(x + 0.8660) / ((0 − 0.8660)(0 + 0.8660))] (2)
      + [(x − 0.8660)(x − 0) / ((−0.8660 − 0.8660)(−0.8660 − 0))] (0.4770).
Thus, the Lagrange polynomial at the Chebyshev points is

Q2 (x) = 2.1941x2 + 3.6587x + 2.



Note that the coefficients of p2(x) and Q2(x) are different because they
using both polynomials are

f (0.5) − p2 (0.5) = −0.3903

and
f (0.5) − Q2 (0.5) = −0.2561.

Changing Intervals: [a, b] to [−1, 1]

The Chebyshev polynomial of interpolation can be applied to any range


other than [−1, 1] by mapping [−1, 1] onto the range of interest. Writing
the range of interpolation as [a, b], the mapping is given by

x = [b(1 + z) + a(1 − z)]/2   or   z = (2x − a − b)/(b − a),
where a ≤ x ≤ b and −1 ≤ z ≤ 1.

The required Chebyshev points of Tn+1(z) on [−1, 1] are

zk = cos( (2n + 1 − 2k)π / (2(n + 1)) ),   for k = 0, 1, . . . , n,        (5.42)

and the interpolating points on [a, b] are obtained as

xk = [b(1 + zk) + a(1 − zk)]/2,   for k = 0, 1, . . . , n.        (5.43)
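A small helper that returns the Chebyshev interpolation points (5.42) and (5.43) mapped onto [a, b] (our own naming, not one of the book's programs):

function xk = chebpts(n, a, b)
% Chebyshev points z_k on [-1,1], Eq. (5.42), mapped onto [a,b], Eq. (5.43).
k  = 0:n;
zk = cos((2*n + 1 - 2*k)*pi/(2*(n + 1)));
xk = (b*(1 + zk) + a*(1 - zk))/2;

For instance, chebpts(2, 1, 3) should reproduce the three points used in Example 5.20 below.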
Theorem 5.8 (Lagrange–Chebyshev Approximation Polynomial)

Suppose that pn(x) is the Lagrange polynomial that is based on the Chebyshev points

xk = [b(1 + zk) + a(1 − zk)]/2,   for k = 0, 1, . . . , n.

If f ∈ C^(n+1)[a, b], then the error bound formula is

|f(x) − pn(x)| ≤ ( 2(b − a)^(n+1) / (4^(n+1)(n + 1)!) ) max_{a≤x≤b} |f^(n+1)(x)|.        (5.44)

Example 5.20 Find the three Chebyshev points in 1 ≤ x ≤ 3 and then


write the Lagrange interpolation to interpolate ln(x + 1). Also, compute an
error bound.

Solution. Given a = 1, b = 3, n = 2, and k = 0, 1, 2, the three Chebyshev points can be calculated as follows:

z0 = cos( (4 + 1 − 0)π/6 ) = cos(5π/6) = −0.86603
z1 = cos( (4 + 1 − 2)π/6 ) = cos(π/2) = 0.0
z2 = cos( (4 + 1 − 4)π/6 ) = cos(π/6) = 0.86603.
Now we compute the interpolating points on [1, 3] as follows:

x0 = [3(1 + z0) + 1(1 − z0)]/2 = [3(1 − 0.86603) + 1(1 + 0.86603)]/2 = 1.13398
x1 = [3(1 + z1) + 1(1 − z1)]/2 = [3(1 + 0.0) + 1(1 − 0.0)]/2 = 2.0
x2 = [3(1 + z2) + 1(1 − z2)]/2 = [3(1 + 0.86603) + 1(1 − 0.86603)]/2 = 2.86603.
Now we compute the function values at these interpolating points as:

f(x0) = ln(1 + x0) = ln(2.13398) = 0.75799
f(x1) = ln(1 + x1) = ln(3) = 1.09861
f(x2) = ln(1 + x2) = ln(3.86603) = 1.35224.

Thus, the Lagrange interpolating polynomial becomes

(x − 2)(x − 2.5)
p2 (x) = (0.75799)
(1.13398 − 2)(1.13398 − 2.5)
(x − 1.13398)(x − 2.5)
+ (1.09861)
(2 − 1.13398)(2 − 2.5)
(x − 1.13398)(x − 2)
+ (1.25276)
(2.5 − 1.13398)(2.5 − 2)

and simplifying it, we get

p2 (x) = −0.06223x2 + 0.58834x + 0.17086.

To compute the error bound, we use formula (5.44) as

|f(x) − p2(x)| ≤ ( 2(3 − 1)^(2+1) / (4^(2+1)(2 + 1)!) ) max_{1≤x≤3} |f^(3)(x)|,

then using f^(3)(x) = 2/(x + 1)³, we get

|f(x) − p2(x)| ≤ ( 2(3 − 1)^(2+1) / (4^(2+1)(2 + 1)!) ) max_{1≤x≤3} |2/(x + 1)³|

or

|f(x) − p2(x)| ≤ 0.01042,

which is the required error bound. •
which is the required error bound. •

Theorem 5.9 (Chebyshev Approximation Polynomial)

The Chebyshev approximation polynomial pn (x) of degree n for a function


f (x) over the interval [−1, 1] can be written as
f(x) ≈ pn(x) = Σ_{i=0}^{n} ai Ti(x),        (5.45)

where the coefficients of the polynomial can be calculated as

a0 = (1/(n + 1)) Σ_{j=0}^{n} f(xj) T0(xj)        (5.46)

ai = (2/(n + 1)) Σ_{j=0}^{n} f(xj) Ti(xj),   for i = 1, 2, . . . , n,        (5.47)

and the polynomial Ti is

Ti(xj) = cos( (2j + 1)iπ / (2n + 2) ).        (5.48)
Example 5.21 Construct the Chebyshev polynomial of degree 4 and La-
grange interpolating polynomial of degree 4 (using equidistant points) on
the interval [−1, 1] to approximate the function f (x) = (x + 2) ln(x + 2).

Solution. The Chebyshev polynomial of degree 4 to approximate the given


function can be written as

p4 (x) = a0 T0 (x) + a1 T1 (x) + a2 T2 (x) + a3 T3 (x) + a4 T4 (x).

First, we compute the coefficients a0, a1, a2, a3, and a4 by using (5.46) and (5.48), and the points xj = cos((2j + 1)π/10), for j = 0, 1, 2, 3, 4, as follows:
a0 = (1/5) Σ_{j=0}^{4} (xj + 2) ln(xj + 2) T0(xj) = 1.5156
a1 = (2/5) Σ_{j=0}^{4} (xj + 2) ln(xj + 2) T1(xj) = 1.6597
a2 = (2/5) Σ_{j=0}^{4} (xj + 2) ln(xj + 2) T2(xj) = 0.1308
a3 = (2/5) Σ_{j=0}^{4} (xj + 2) ln(xj + 2) T3(xj) = −0.0115

a4 = (2/5) Σ_{j=0}^{4} (xj + 2) ln(xj + 2) T4(xj) = 0.0015.

Using these values, we have

p4 (x) = 1.5156T0 (x)+1.6597T1 (x)+0.1308T2 (x)−0.0115T3 (x)+0.0015T4 (x).

Since we know that

T0 (x) = cos(0.θ) = 1
T1 (x) = cos(1.θ) = x
T2 (x) = cos(2.θ) = 2x2 − 1
T3 (x) = cos(3.θ) = 4x3 − 3x
T4 (x) = cos(4.θ) = 8x4 − 8x2 + 1,

we have

p4 (x) = 1.5156(1)+1.6597(x)+0.1308(2x2 −1)−0.0115(4x3 −3x)+0.0015(8x4 −8x2 +1)

or

p4(x) = 1.3863 + 1.6942x + 0.2496x² − 0.0460x³ + 0.0120x⁴,

which is the required Chebyshev approximation polynomial for the given


function.

Now we construct the Lagrange polynomial of degree 4 using the equidis-


tant points on the interval [−1, 1],

x0 = −1, x1 = −0.5, x2 = 0, x3 = 0.5, x4 = 1,

and the functional values at these points,

f (x0 ) = 0.0, f (x1 ) = 0.6082, f (x2 ) = 1.3863, f (x3 ) = 2.2907, f (x4 ) = 3.2958,

as follows:

p4 (x) = L0 (x)(0) + L1 (x)(0.6082) + L2 (x)(1.3863) + L3 (x)(2.2907) + L4 (x)(3.2958)


= 0.6082L1 (x) + 1.3863L2 (x) + 2.2907L3 (x) + 3.2958L4 (x).

The values of the unknown Lagrange coefficients are as follows:

L1(x) = (x + 1)(x − 0)(x − 0.5)(x − 1) / [(−0.5 + 1)(−0.5 − 0)(−0.5 − 0.5)(−0.5 − 1)] = (x⁴ − 0.5x³ − x² + 0.5x)/(−0.375)

L2(x) = (x + 1)(x + 0.5)(x − 0.5)(x − 1) / [(0 + 1)(0 + 0.5)(0 − 0.5)(0 − 1)] = (x⁴ − 1.25x² + 0.25)/0.25

L3(x) = (x + 1)(x + 0.5)(x − 0)(x − 1) / [(0.5 + 1)(0.5 + 0.5)(0.5 − 0)(0.5 − 1)] = (x⁴ + 0.5x³ − x² − 0.5x)/(−0.375)

L4(x) = (x + 1)(x + 0.5)(x − 0)(x − 0.5) / [(1 + 1)(1 + 0.5)(1 − 0)(1 − 0.5)] = (x⁴ + x³ − 0.25x² − 0.25x)/1.5.

Thus,

p4(x) = 1.3863 + 1.6940x + 0.2496x² − 0.0461x³ + 0.0120x⁴,

which is the Lagrange interpolating polynomial of degree 4 to approximate the given function. Since f(x) is very smooth on [−1, 1], this polynomial is nearly identical to the Chebyshev approximation obtained above. •

To get the coefficients of the above Chebyshev polynomial approximation, we use the MATLAB Command Window as follows:

>> n = 4; a = -1; b = 1;
>> y = '(x + 2).*log(x + 2)';
>> A = CHEBPA(y, n, a, b);
>> A =
    1.5156    1.6597    0.1308   -0.0115    0.0015

Program 5.6
MATLAB m-file for Computing Coefficients of Chebyshev
Polynomial Approximation
function A = CHEBPA(fn,n,a,b)
% Returns the Chebyshev coefficients (5.46)-(5.47) of the function given
% in the string fn, using the Chebyshev points mapped onto [a,b].
if nargin == 2, a = -1; b = 1; end
d = pi/(2*n + 2);  A = zeros(1, n+1);
for i = 1:n+1
    x(i) = cos((2*i - 1)*d);          % Chebyshev points on [-1,1]
end
x = (b - a)*x/2 + (a + b)/2;          % map onto [a,b]
y = eval(fn);                         % function values at the points
for i = 1:n+1
    z = (2*i - 1)*d;
    for j = 1:n+1
        A(j) = A(j) + y(i)*cos((j - 1)*z);
    end
end
A = 2*A/(n + 1);  A(1) = A(1)/2;

5.3 Least Squares Approximation


In fitting a curve to given data points, there are two basic approaches. One
is to have the graph of the approximating function pass exactly through
the given data points. The methods of polynomial interpolation approxi-
mation discussed in the previous sections have this special property. If the
data values are experimental then they may contain errors or have a limited
number of significant digits. In such cases, the polynomial interpolation
methods may yield unsatisfactory results. The second approach, which is
discussed here, is usually more satisfactory for experimental data and uses
an approximating function that graphs a smooth curve having the general
shape suggested by the data values but not, in general, passing exactly
through all of the data points. Such an approach is known as least squares
data fitting. The least squares method seeks to minimize the sum (over all
data points) of the squares of the differences between the function value
and the data value. The method is based on results from calculus that
demonstrate that a function, in this case, the total squared error, attains
a minimum value when its partial derivatives are zero.

The least squares method of evaluating empirical formulas has been


used for many years. In engineering, curve fitting plays an important role
in the analysis, interpretation, and correlation of experimental data with
mathematical models formulated from fundamental engineering principles.

5.3.1 Linear Least Squares


To introduce the idea of linear least squares approximation, consider the
experimental data shown in Figure 5.6.

Figure 5.6: Least squares approximation.

A Lagrange interpolation of a polynomial of degree 6 could easily be


constructed for this data. However, there is no justification for insisting
that the data points be reproduced exactly, and such an approximation
may well be very misleading since unwanted oscillations are likely. A more
satisfactory approach would be to find a straight line that passes close to
all seven points. One such possibility is shown in Figure 5.7. Here, we have
to decide what criterion is to be adopted for constructing such an approxi-
mation. The most common approach for this curve is known as linear least
squares data fitting. The linear least squares approach defines the correct
straight line as the one that minimizes the sum of the squares of the dis-
tances between the data points and the line. The least squares straight line
approximations are an extremely useful and common approximate fit. The

Figure 5.7: Least squares approximation.

solution to linear least squares approximation is an important application


of the solution of systems of linear equations and leads to other interesting
ideas of numerical linear algebra. The least squares approximation is not
restricted to a straight line. However, in order to motivate the general case
we consider this first. The straight line

p1 (x) = a + bx (5.49)

should be fitted through the given points (x1 , y1 ), . . . , (xn , yn ) so that the
sum of the squares of the distances of these points from the straight line
is minimum, where the distance is measured in the vertical direction (the
y-direction). Hence, it will suffice to minimize the function

E(a, b) = Σ_{j=1}^{n} (yj − a − bxj)².        (5.50)

The minimum of E occurs if the partial derivatives of E with respect


to a and b become zero. Note that {xj } and {yj } are constant in (5.50)
and unknown parameters a and b are variables. Now differentiate E with
respect to variable a by making the other variable b fixed and then put it

equal to zero, which gives

∂E/∂a = −2 Σ_{j=1}^{n} (yj − a − bxj) = 0.        (5.51)

Now hold variable a and differentiate E with respect to variable b and then put it equal to zero, and we obtain

∂E/∂b = −2 Σ_{j=1}^{n} xj (yj − a − bxj) = 0.        (5.52)

Equations (5.51) and (5.52) may be rewritten after dividing by −2 as follows:

Σ_{j=1}^{n} yj − Σ_{j=1}^{n} a − b Σ_{j=1}^{n} xj = 0
Σ_{j=1}^{n} xj yj − a Σ_{j=1}^{n} xj − b Σ_{j=1}^{n} xj² = 0,

which can be rearranged to form a 2 × 2 system that is known as the normal equations:

n a + b Σ_{j=1}^{n} xj = Σ_{j=1}^{n} yj
a Σ_{j=1}^{n} xj + b Σ_{j=1}^{n} xj² = Σ_{j=1}^{n} xj yj.

Now writing in matrix form, we have

[ n   S1 ] [ a ]   [ S2 ]
[ S1  S3 ] [ b ] = [ S4 ],        (5.53)

where

S1 = Σ xj,   S2 = Σ yj,   S3 = Σ xj²,   S4 = Σ xj yj.

Table 5.9: Find the coefficients of (5.53).


i xi yi x2i xi yi
1 1.0000 1.0000 1.0000 1.0000
2 2.0000 2.0000 4.0000 4.0000
3 3.0000 2.0000 9.0000 6.0000
4 4.0000 3.0000 16.0000 12.0000
n=4 S1=10 S2=8 S3=30 S4=23

In the foregoing equations the summation is over j from 1 to n.


The solution of the above system (5.53) can be obtained easily as

a = (S3·S2 − S1·S4) / (n·S3 − (S1)²)   and   b = (n·S4 − S1·S2) / (n·S3 − (S1)²).        (5.54)

The formula (5.53) reduces the problem of finding the parameters of a least squares linear fit to solving a simple 2 × 2 linear system.

We shall call a and b the least squares linear parameters for the data
and the linear guess function with parameters, i.e.,
p1 (x) = a + bx
will be called the least squares line (or regression line) for the data.
Example 5.22 Using the method of least squares, fit a straight line to the
four points (1, 1), (2, 2), (3, 2), and (4, 3).

Solution. The sums required for the normal equation (5.53) are easily
obtained using the values in Table 5.9. The linear system involving a and
b in (5.53) form is
    
[ 4   10 ] [ a ]   [  8 ]
[ 10  30 ] [ b ] = [ 23 ].
Then solving the above linear system using LU decomposition by the Cholesky
method discussed in Chapter 1, the solution of the linear system is
a = 0.5 and b = 0.6.

Thus, the least squares line is

p1 (x) = 0.5 + 0.6x.

Clearly, p1 (x) replaces the tabulated functional relationship given by


y = f (x). The original data along with the approximating polynomials are
shown graphically in Figure 5.8.

Use the MATLAB Command Window as follows:

>> x = [1 2 3 4];
>> y = [1 2 2 3];
>> [a, b] = linefit(x, y);

To plot Figure 5.8, one can use the MATLAB Command Window:

>> xfit = 0 : 0.1 : 5;
>> yfit = 0.6*xfit + 0.5;
>> plot(x, y, 'o', xfit, yfit, '-');

Figure 5.8: Least squares fit of four data points to a line.



Table 5.10: Error analysis of the linear fit.


i xi yi p1 (xi ) abs(yi − p1 (xi ))
1 1.0000 1.0000 1.1000 0.1000
2 2.0000 2.0000 1.7000 0.3000
3 3.0000 2.0000 2.3000 0.3000
4 4.0000 3.0000 2.9000 0.1000

Table 5.10 shows the error analysis of the straight line using least squares
approximation.
Hence, we have

E(a, b) = Σ_{i=1}^{4} (yi − p1(xi))² = 0.2000.
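The same coefficients can also be obtained by assembling the 2 × 2 system (5.53) directly in the Command Window and solving it with backslash (a quick sketch, not one of the book's programs):

>> x = [1 2 3 4];  y = [1 2 2 3];
>> n = length(x);
>> S1 = sum(x); S2 = sum(y); S3 = sum(x.*x); S4 = sum(x.*y);
>> ab = [n S1; S1 S3] \ [S2; S4]    % ab(1) = a = 0.5, ab(2) = b = 0.6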

Program 5.7
MATLAB m-file for the Linear Least Squares Fit
function [a,b] = linefit(x,y)
% Fits p1(x) = a + b*x by solving the normal equations (5.53)-(5.54).
n  = length(x);
S1 = sum(x);  S2 = sum(y);
S3 = sum(x.*x);  S4 = sum(x.*y);
a = (S3*S2 - S4*S1)/(n*S3 - S1^2);   % intercept
b = (n*S4 - S1*S2)/(n*S3 - S1^2);    % slope
for k = 1:n
    p1 = a + b*x(k);
    Error(k) = abs(p1 - y(k));
end
Error = sum(Error.*Error);

5.3.2 Polynomial Least Squares


In the previous section we discussed a procedure to derive the equation of
a straight line using least squares, which works very well if the measured
data are intrinsically linear. But in many cases, data from experimental
results are not linear. Therefore, now we show how to find the least squares
parabola, and the extension to a polynomial of higher degree is easily
made. The general problem of approximating a set of data {(xi , yi ), i =
0, 1, . . . , m} with a polynomial of degree n < m − 1 is

pn (x) = b0 + b1 x + b2 x2 + · · · + bn xn . (5.55)

Then the error E takes the form


E = Σ_{j=1}^{m} (yj − pn(xj))²
  = Σ_{j=1}^{m} yj² − 2 Σ_{j=1}^{m} pn(xj) yj + Σ_{j=1}^{m} (pn(xj))²
  = Σ_{j=1}^{m} yj² − 2 Σ_{j=1}^{m} ( Σ_{i=0}^{n} bi xj^i ) yj + Σ_{j=1}^{m} ( Σ_{i=0}^{n} bi xj^i )²
  = Σ_{j=1}^{m} yj² − 2 Σ_{i=0}^{n} bi ( Σ_{j=1}^{m} yj xj^i ) + Σ_{i=0}^{n} Σ_{k=0}^{n} bi bk ( Σ_{j=1}^{m} xj^(i+k) ).

As in linear least squares, for E to be minimized it is necessary that ∂E/∂bi = 0, for each i = 0, 1, 2, . . . , n. Thus, for each i,

0 = ∂E/∂bi = −2 Σ_{j=1}^{m} yj xj^i + 2 Σ_{k=0}^{n} bk Σ_{j=1}^{m} xj^(i+k).        (5.56)

This gives (n + 1) normal equations in the (n + 1) unknowns bi,

Σ_{k=0}^{n} bk Σ_{j=1}^{m} xj^(i+k) = Σ_{j=1}^{m} yj xj^i,   i = 0, 1, 2, . . . , n.        (5.57)

It is helpful to write the equations as follows:

b0 Σ xj⁰ + b1 Σ xj¹ + b2 Σ xj² + · · · + bn Σ xjⁿ = Σ yj xj⁰
b0 Σ xj¹ + b1 Σ xj² + b2 Σ xj³ + · · · + bn Σ xj^(n+1) = Σ yj xj¹
  ...
b0 Σ xjⁿ + b1 Σ xj^(n+1) + b2 Σ xj^(n+2) + · · · + bn Σ xj^(2n) = Σ yj xjⁿ,

where each sum runs over j = 1, . . . , m.

Note that the coefficients matrix of this system is symmetric and positive-
definite. Hence, the normal equations possess a unique solution.

Example 5.23 Find the least squares polynomial approximation of degree


2 to the following data:

xj 0 1 2 4 6
.
yj 3 1 0 1 4

Solution. The coefficients of the least squares polynomial approximation


of degree 2,

p2(x) = b0 + b1x + b2x²,

are the solution values b0, b1, and b2 of the linear system

b0 m + b1 Σ xj + b2 Σ xj² = Σ yj
b0 Σ xj + b1 Σ xj² + b2 Σ xj³ = Σ yj xj        (5.58)
b0 Σ xj² + b1 Σ xj³ + b2 Σ xj⁴ = Σ yj xj².

The sums required for the normal equation (5.58) are easily obtained using
the values in Table 5.11. The linear system involving unknown coefficients

Table 5.11: Find the coefficients of (5.58).


i xi yi x2i x3i x4i xi y i x2i yi
1 0.00 3.00 0.00 0.00 0.00 0.00 0.00
2 1.00 1.00 1.00 1.00 1.00 1.00 1.00
3 2.00 0.00 4.00 8.00 16.00 0.00 0.00
4 4.00 1.00 16.00 64.00 256.00 4.00 16.00
5 6.00 4.00 36.00 216.00 1296.00 24.00 144.00
m=5 13.00 9.00 57.00 289.00 1569.00 29.00 161.00

b0 , b1 , and b2 is
5b0 + 13b1 + 57b2 = 9
13b0 + 57b1 + 289b2 = 29
57b0 + 289b1 + 1569b2 = 161.
Then solving the above linear system, the solution of the linear system is

b0 = 2.8252, b1 = −2.0490, b2 = 0.3774.

Hence, the parabola equation becomes

p2 (x) = 2.8252 − 2.0490x + 0.3774x2 .

Use the MATLAB Command Window as follows:

>> x = [0 1 2 4 6];
>> y = [3 1 0 1 4];
>> n = 2;
>> C = polyfit(x, y, n);
Clearly, p2 (x) replaces the tabulated functional relationship given by y =
f (x). The original data along with the approximating polynomials are
shown graphically in Figure 5.9. To plot Figure 5.9 one can use the MAT-
LAB Command Window as follows:

>> xfit = -1 : 0.1 : 7;
>> yfit = 2.8252 - 2.0490.*xfit + 0.3774.*xfit.*xfit;
>> plot(x, y, 'o', xfit, yfit, '-');

Figure 5.9: Least squares fit of five data points to a parabola.

Program 5.8
MATLAB m-file for the Polynomial Least Squares Fit
function C = polyfit(x,y,n)
% Solves the normal equations (5.57) for a degree-n least squares fit.
% (Note: this file shadows MATLAB's built-in polyfit.)
m = length(x);
for i = 1:2*n+1
    a(i) = sum(x.^(i-1));            % sums of powers of x
end
for i = 1:n+1
    b(i) = sum(y.*x.^(i-1));         % right-hand side
end
for i = 1:n+1
    for j = 1:n+1
        A(i,j) = a(j+i-1);           % coefficient matrix
    end
end
C = A\b';                            % solve the linear system
for k = 1:m
    S = C(1);
    for i = 2:n+1
        S = S + C(i)*x(k)^(i-1);
    end
    p2(k) = S;  Error(k) = abs(y(k) - p2(k));
end
Error = sum(Error.*Error);

Table 5.12: Error analysis of the polynomial Fit.


i xi yi p2 (xi ) abs(yi − p2 (xi ))
1 0.0000 3.0000 2.8252 0.1748
2 1.0000 1.0000 1.1535 0.1535
3 2.0000 0.0000 0.2367 0.2367
4 4.0000 1.0000 0.6674 0.3326
5 6.0000 4.0000 4.1173 0.1173
13.000 9.0000 9.0001 1.0148

Table 5.12 shows the error analysis of the parabola using least squares ap-
proximation. Hence, the error associated with the least squares polynomial
approximation of degree 2 is
E(b0, b1, b2) = Σ_{i=1}^{5} (yi − p2(xi))² = 0.2345.
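Alternatively, the same parabola can be obtained without forming the normal equations explicitly: build the design matrix with columns 1, x, x² and apply MATLAB's backslash, which solves the least squares problem directly (a sketch, not one of the book's numbered programs):

>> x = [0 1 2 4 6]';  y = [3 1 0 1 4]';
>> V = [ones(size(x)) x x.^2];      % design matrix
>> b = V \ y                        % b = [b0; b1; b2], here [2.8252; -2.0490; 0.3774]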

5.3.3 Nonlinear Least Squares


Although polynomials are frequently used as the approximating function,
they are by no means the only possibilities. The most popular forms of
nonlinear curves are the exponential forms

y(x) = axb (5.59)

or
y(x) = aebx . (5.60)
We can develop the normal equations for these analogously to the pre-
vious development for least squares. The least squares error for (5.59) is
given by
Xn
E(a, b) = (yj − axbj )2 , (5.61)
j=1

with associated normal equations



∂E/∂a = −2 Σ_{j=1}^{n} (yj − a xj^b) xj^b = 0
∂E/∂b = −2 Σ_{j=1}^{n} (yj − a xj^b) (a xj^b ln xj) = 0.        (5.62)

Then the set of normal equations (5.62) represents the system of two
equations in the two unknowns a and b. Such nonlinear simultaneous
equations can be solved using Newton’s method for nonlinear systems. The
details of this method of nonlinear systems will be discussed in Chapter 7.
Example 5.24 Find the best-fit of the form y = axb by using the data
x 1 2 4 10
y 2.87 4.51 6.11 9.43
by Newton’s method, starting with the initial approximation (a0 , b0 ) = (2, 1)
and taking a desired accuracy within  = 10−5 .

Solution. The normal equations are

Σ_{j=1}^{4} yj xj^b − a Σ_{j=1}^{4} xj^(2b) = 0
Σ_{j=1}^{4} yj xj^b ln xj − a Σ_{j=1}^{4} xj^(2b) ln xj = 0.        (5.63)

By using the given data points, the nonlinear system (5.63) gives
2.87 − a(1 + 22b + 42b + 102b ) + 4.5(2b ) + 6.11(4b ) + 9.43(10b ) = 0
−a(0.69(22b ) + 1.39(42b ) + 2.30(102b )) + 3.12(2b ) + 8.47(4b ) + 21.72(10b ) = 0.
Let us consider the two functions
f1 (a, b) = 2.87 − a(1 + 22b + 42b + 102b ) + 4.5(2b ) + 6.11(4b ) + 9.43(10b )
f2 (a, b) = −a(0.69(22b ) + 1.39(42b ) + 2.30(102b )) + 3.12(2b ) + 8.47(4b )
+21.72(10b ),

and their derivatives with respect to the unknown variables a and b:

∂f1/∂a = −(1 + 2^(2b) + 4^(2b) + 10^(2b))
∂f1/∂b = −a(1.39(2^(2b)) + 2.77(4^(2b)) + 4.61(10^(2b))) + 3.12(2^b) + 8.47(4^b) + 21.72(10^b)
∂f2/∂a = −(0.69(2^(2b)) + 1.39(4^(2b)) + 2.30(10^(2b)))
∂f2/∂b = −a(0.96(2^(2b)) + 3.84(4^(2b)) + 10.61(10^(2b))) + 2.16(2^b) + 11.74(4^b) + 50.01(10^b).
Since Newton's formula for the system of two nonlinear equations is

[ a_(k+1) ]   [ a_k ]                      [ f1(a_k, b_k) ]
[ b_(k+1) ] = [ b_k ] − J^(−1)(a_k, b_k)   [ f2(a_k, b_k) ],

where

J = [ ∂f1/∂a   ∂f1/∂b ]
    [ ∂f2/∂a   ∂f2/∂b ],
let us start with the initial approximation (a0 , b0 ) = (2, 1), and the values
of the functions at this initial approximation are as follows:

f1 (2, 1) = −111.39
f2 (2, 1) = −253.216.

The Jacobian matrix J and its inverse J^(−1) at the given initial approximation can be calculated as

J(2, 1) = [ −121       −763.576  ]
          [ −255.248  −1700.534  ]

and

J^(−1)(2, 1) = [ −0.1565   0.0703 ]
               [  0.0235  −0.0111 ].

Substituting all these values into the above Newton's formula, we get the first approximation as

[ a1 ]   [ 2.0 ]   [ −0.1565   0.0703 ] [ −111.39  ]   [ 2.3615 ]
[ b1 ] = [ 1.0 ] − [  0.0235  −0.0111 ] [ −253.216 ] = [ 0.7968 ].

Similarly, the second iteration using (a1, b1) = (2.3615, 0.7968) gives

[ a2 ]   [ 2.3615 ]   [ −0.2323   0.1063 ] [ −35.4457 ]   [ 2.7444 ]
[ b2 ] = [ 0.7968 ] − [  0.0339  −0.0169 ] [ −81.1019 ] = [ 0.6282 ].

The first two and the further steps of the method are listed in Table 5.13, taking the desired accuracy within ε = 10⁻⁵.

Table 5.13: Solution of a system of two nonlinear equations.


n a-approx. b-approx.
an bn
00 2.00000 1.00000
01 2.36151 0.79684
02 2.74443 0.62824
03 2.99448 0.52535
04 3.07548 0.49095
05 3.08306 0.48754
06 3.08314 0.48751

Hence, the best nonlinear fit is

y(x) = 3.08314x0.48751 .
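As a cross-check of this example (our own sketch, not a book program), the sum of squares (5.61) can also be minimized directly with MATLAB's built-in fminsearch, starting from the same initial guess (2, 1):

>> x = [1 2 4 10];  y = [2.87 4.51 6.11 9.43];
>> E = @(p) sum((y - p(1)*x.^p(2)).^2);   % p(1) = a, p(2) = b
>> p = fminsearch(E, [2 1])               % should be close to a = 3.0831, b = 0.4875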

But remember that nonlinear simultaneous equations are more difficult


to solve than linear equations. Because of this difficulty, the exponential
forms are usually linearized by taking logarithms before determining the

required parameters. Therefore, taking logarithms of both sides of (5.59),


we get
ln y = ln a + b ln x,
which may be written as

Y = A + BX, (5.64)
with A = ln a, B = b, X = ln x, and Y = ln y. The values of A and B can
be chosen to minimize
E(A, B) = Σ_{j=1}^{n} (Yj − (A + BXj))²,        (5.65)

where Xj = ln xj and Yj = ln yj. After differentiating E with respect to A and B and then setting the results equal to zero, we get the normal equations in linear form as

nA + B Σ_{j=1}^{n} Xj = Σ_{j=1}^{n} Yj
A Σ_{j=1}^{n} Xj + B Σ_{j=1}^{n} Xj² = Σ_{j=1}^{n} Xj Yj.

Then writing the above equations in matrix form, we have

[ n   S1 ] [ A ]   [ S2 ]
[ S1  S3 ] [ B ] = [ S4 ],        (5.66)

where

S1 = Σ Xj,   S2 = Σ Yj,   S3 = Σ Xj²,   S4 = Σ Xj Yj.

Table 5.14: Find the coefficients of (5.67).


i Xi Yi Xi2 X i Yi
1 0.0000 1.0543 0.0000 0.0000
2 0.6932 1.5063 0.4805 1.0442
3 1.3863 1.8099 1.9218 2.5091
4 2.3026 2.2439 5.3020 5.1668
n=4 S1=4.3821 S2=6.6144 S3=7.7043 S4=8.7201

In the foregoing equations the summation is over j from 1 to n. The solution of the above system can be obtained easily as

A = (S3·S2 − S1·S4) / (n·S3 − (S1)²)
B = (n·S4 − S1·S2) / (n·S3 − (S1)²).        (5.67)

Now the data set may be transformed to (ln xj , ln yj ) and determining a


and b is a linear least squares problem. The values of unknowns a and b
can be deduced from the relations

a = eA and b = B. (5.68)

Thus, the nonlinear guess function with parameters a and b

y(x) = axb

will be called the nonlinear least squares approximation for the data.

Example 5.25 Find the best-fit of the form y = axb by using the following
data:
x 1 2 4 10
.
y 2.87 4.51 6.11 9.43
Solution. The sums required for the normal equation (5.66) are easily
obtained using the values in Table 5.14. The linear system involving A and

B in (5.66) form is
    
4 4.3821 A 6.6144
= .
4.3821 7.7043 B 8.7201
Then solving the above linear system, the solution of the linear system is
A = 1.0975 and B = 0.5076.
Using the values of A and B in (5.68), we have the values of the pa-
rameters a and b as
a = eA = 2.9969 and b = B = 0.5076.
Hence, the best nonlinear fit is
y(x) = 2.9969x0.5076 .

Use the MATLAB Command Window as follows:

>> x = [1 2 4 10];
>> y = [2.87 4.51 6.11 9.43];
>> [A, B] = exp1fit(x, y);

Program 5.9
MATLAB m-file for the Nonlinear Least Squares Fit
function [A,B] = exp1fit(x,y)   % least squares fit y = a*x^b
% Transform the data from (x,y) to (X,Y), X = log(x), Y = log(y),
% and fit the straight line Y = A + B*X.
n = length(x);  X = log(x);  Y = log(y);
S1 = sum(X);  S2 = sum(Y);  S3 = sum(X.*X);  S4 = sum(X.*Y);
B = (n*S4 - S1*S2)/(n*S3 - S1^2);
A = (S3*S2 - S4*S1)/(n*S3 - S1^2);
b = B;  a = exp(A);
for k = 1:n
    yk = a*x(k)^b;                 % fitted value
    Error(k) = abs(y(k) - yk);
end
Error = sum(Error.*Error);
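The same linearized fit can be reproduced in one line with the built-in polyfit applied to the transformed data (a quick check of Program 5.9, our own sketch):

>> x = [1 2 4 10];  y = [2.87 4.51 6.11 9.43];
>> c = polyfit(log(x), log(y), 1);   % c(1) = B = b, c(2) = A = ln(a)
>> a = exp(c(2)), b = c(1)           % should reproduce a = 2.9969, b = 0.5076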

Clearly, y(x) replaces the tabulated functional relationship given by y =


f (x). The original data along with the approximating polynomials are
shown graphically in Figure 5.10. To plot Figure 5.10, one can use the
MATLAB Command Window as follows:

>> xfit = 0 : 0.1 : 11;
>> yfit = 2.9969*xfit.^0.5076;
>> plot(x, y, 'o', xfit, yfit, '-');

Figure 5.10: Nonlinear least squares fit.

Table 5.15 shows the error analysis of the nonlinear least squares approxi-
mation.

Table 5.15: Error analysis of the nonlinear fit.


i xi yi y(xi ) abs(yi − y(xi ))
1 1.0000 2.870 2.9969 0.1269
2 2.000 4.510 4.2605 0.2495
3 4.000 6.110 6.0569 0.0531
4 10.000 9.430 9.6435 0.2135

Hence, the error associated with the nonlinear least squares approximation is

E(a, b) = Σ_{i=1}^{4} (yi − a xi^b)² = 0.1267.

Similarly, for the other nonlinear curve y(x) = ae^(bx), the least squares error is defined as

E(a, b) = Σ_{j=1}^{n} (yj − a e^(bxj))²,        (5.69)

which gives the associated normal equations as

∂E/∂a = −2 Σ_{j=1}^{n} (yj − a e^(bxj)) e^(bxj) = 0
∂E/∂b = −2 Σ_{j=1}^{n} (yj − a e^(bxj)) a xj e^(bxj) = 0.        (5.70)
Then the set of normal equations (5.70) represents the nonlinear simulta-
neous system.
Example 5.26 Find the best-fit of the form y = aebx by using the data
x 0 0.25 0.4 0.5
y 9.532 7.983 4.826 5.503
by Newton’s method, starting with the initial approximation (a0 , b0 ) = (8, 0)
and taking the desired accuracy within  = 10−5 .

Solution. The normal equations are

Σ_{j=1}^{4} yj e^(bxj) − a Σ_{j=1}^{4} e^(2bxj) = 0
Σ_{j=1}^{4} yj xj e^(bxj) − a Σ_{j=1}^{4} xj e^(2bxj) = 0.        (5.71)

By using the given data points, the nonlinear system (5.71) gives

9.53 − a(1 + e0.5b + e0.8b + eb ) + 7.98e0.25b + 4.83e0.4b + 5.503e0.5b = 0


−a(0.25e0.5b + 0.4e0.8b + 0.5eb ) + 1.996e0.25b + 1.93e0.4b + 2.752e0.5b = 0.

Let us consider the two functions

f1 (a, b) = 9.53 − a(1 + e0.5b + e0.8b + eb ) + 7.98e0.25b + 4.83e0.4b + 5.503e0.5b


f2 (a, b) = −a(0.25e0.5b + 0.4e0.8b + 0.5eb ) + 1.996e0.25b + 1.93e0.4b + 2.752e0.5b

and their derivatives with respect to the unknown variables a and b:

∂f1/∂a = −(1 + e^(0.5b) + e^(0.8b) + e^b)
∂f1/∂b = −a(0.5e^(0.5b) + 0.8e^(0.8b) + e^b) + 1.996e^(0.25b) + 1.93e^(0.4b) + 2.752e^(0.5b)
∂f2/∂a = −(0.25e^(0.5b) + 0.4e^(0.8b) + 0.5e^b)
∂f2/∂b = −a(0.125e^(0.5b) + 0.32e^(0.8b) + 0.5e^b) + 0.499e^(0.25b) + 0.772e^(0.4b) + 1.376e^(0.5b).
Since Newton's formula for the system of two nonlinear equations is

[ a_(k+1) ]   [ a_k ]                      [ f1(a_k, b_k) ]
[ b_(k+1) ] = [ b_k ] − J^(−1)(a_k, b_k)   [ f2(a_k, b_k) ],

where

J = [ ∂f1/∂a   ∂f1/∂b ]
    [ ∂f2/∂a   ∂f2/∂b ],
let us start with the initial approximation (a0, b0) = (8, 0), and the values of the functions at this initial approximation are as follows:

f1(8, 0) = −4.156
f2(8, 0) = −2.522.

The Jacobian matrix J and its inverse J^(−1) at the given initial approximation can be computed as

J(8, 0) = [ −4     −11.722 ]
          [ −1.15   −4.913 ]

and

J^(−1)(8, 0) = [ −0.7961   1.8993 ]
               [  0.1863  −0.6481 ].
Substituting all these values into the above Newton's formula, we get the first approximation as

[ a1 ]   [ 8.0 ]   [ −0.7961   1.8993 ] [ −4.156 ]   [  9.48168 ]
[ b1 ] = [ 0.0 ] − [  0.1863  −0.6481 ] [ −2.522 ] = [ −0.86015 ].

Similarly, the second iteration using (a1, b1) = (9.48168, −0.86015) gives

[ a2 ]   [  9.48168 ]   [ −0.87813   2.19437 ] [ −1.4546 ]   [  9.70881 ]
[ b2 ] = [ −0.86015 ] − [  0.20559  −0.92078 ] [ −0.6856 ] = [ −1.19239 ].

The first two and the further steps of the method are listed in Table 5.16, taking the desired accuracy within ε = 10⁻⁵.

Table 5.16: Solution of nonlinear system.


Iteration a-approximation b-approximation
n an bn
00 8.00000 0.00000
01 9.48168 -0.86015
02 9.70881 -1.19239
03 9.72991 -1.26193
04 9.73060 -1.26492
05 9.73060 -1.26492

Hence, the best nonlinear fit is

y(x) = 9.73060e−1.26492x .



Once again, to make this exponential form a linearized form, we take the
logarithms of both sides of (5.60), and we get

ln y = ln a + bx,

which may be written as


Y = A + BX, (5.72)
with A = ln a, B = b, X = x, and Y = ln y. The values of A and B can
be chosen to minimize
E(A, B) = Σ_{j=1}^{n} (Yj − (A + BXj))²,        (5.73)

where Xj = xj and Yj = ln yj. By solving the linear normal equations of the form

nA + B Σ_{j=1}^{n} Xj = Σ_{j=1}^{n} Yj
A Σ_{j=1}^{n} Xj + B Σ_{j=1}^{n} Xj² = Σ_{j=1}^{n} Xj Yj        (5.74)

to get the values of A and B, the data set may be transformed to (xj , ln yj )
and determining a and b is a linear least squares problem. The values of
unknowns a and b are deduced from the relations

a = eA and b = B. (5.75)

Thus, the nonlinear guess function with parameters a and b

y(x) = aebx

will be called the nonlinear least squares approximation for the data.

Example 5.27 Find the best-fit of the form y = aebx by using the following
data:
x 0 0.25 0.4 0.5
.
y 9.532 7.983 4.826 5.503

Solution. The sums required for the normal equation (5.74) are easily
obtained using the values in Table 5.17.

Table 5.17: Find the coefficients of (5.74).


i Xi Yi Xi2 Xi Yi
1 0.0000 2.2546 0.0000 0.0000
2 0.2500 2.0773 0.0625 0.5193
3 0.4000 1.5740 0.1600 0.6296
4 0.5000 1.7053 0.2500 0.8527
n=4 1.1500 7.6112 0.4725 2.0016

The linear system involving unknown coefficients A and B is

4A + 1.1500B = 7.6112
1.1500A + 0.4725B = 2.0016.

Then solving the above linear system, the solution of the linear system is

A = 2.2811 and B = −1.3157.

Using the values in (5.75), we have the values of the unknown parameters
a and b as
a = eA = 9.7874, and b = B = −1.3157.
Hence, the best nonlinear fit is

y(x) = 9.7874e^(−1.3157x).

Use the MATLAB Command Window as follows:

>> x = [0 0.25 0.4 0.5];
>> y = [9.532 7.983 4.826 5.503];
>> [A, B] = exp2fit(x, y);

Clearly, y(x) replaces the tabulated functional relationship given by y = f(x). The original data along with the approximating polynomials are shown graphically in Figure 5.11. To plot Figure 5.11, one can use the MATLAB Command Window as follows:

>> xfit = -0.1 : 0.1 : 0.6;
>> yfit = 9.7874*exp(-1.3157.*xfit);
>> plot(x, y, 'o', xfit, yfit, '-');

Figure 5.11: Nonlinear least squares fit.

Note that the values of a and b calculated for the linearized problem will not necessarily be the same as the values obtained for the original least squares problem. In this example, the nonlinear system becomes

9.532 − a + 7.983e^(0.25b) − ae^(0.5b) + 4.826e^(0.4b) − ae^(0.8b) + 5.503e^(0.5b) − ae^b = 0
1.996ae^(0.25b) − 0.25a²e^(0.5b) + 1.930ae^(0.4b) − 0.4a²e^(0.8b) + 2.752ae^(0.5b) − 0.5a²e^b = 0.



Table 5.18: Error analysis of the nonlinear fit.


i xi yi y(xi ) abs(yi − y(xi ))
1 0.000 9.532 9.7872 0.2552
2 0.250 7.983 7.0439 0.9391
3 0.400 4.826 5.7823 0.9563
4 0.500 5.503 5.0695 0.4335

Now Newton’s method for nonlinear systems can be applied to this system,
and we get the values of a and b as

a = 9.731 and b = −1.265.

Table 5.18 shows the error analysis of the nonlinear least squares approxi-
mation.
Hence, the error associated with the nonlinear least squares approximation
is
4
X
E(a, b) = (yi − aebxi )2 = 2.0496.
i=1

Table 5.19 shows the conversion of nonlinear forms into linear forms by
using a change of variables and constants.

Example 5.28 Find the best-fit of the form y = axe−bx by using the
change of variables to linearize the following data points:
x 1.5 2.5 4.0 5.5
.
y 3.0 4.3 6.5 7.0

Solution. Write the given form y = axe−bx into the form


y/x = a e^(−bx),

and taking the logarithms of both sides of the above equation, we get

ln(y/x) = ln a + (−bx),

Table 5.19: Conversion of nonlinear forms into linear forms.

No. Nonlinear Form Linear Form Change of Variables and Constants


y = f (x) Y = A + BX
1 y = ax + b/x2 Y = A + BX Y = y/x, X = 1/x, A = a, B = b
2 y = 1/(a + bx)2 Y = A + BX Y = (y)−1/2 , X = x, A = a, B = b
3 y = 1/(a + bx2 ) Y = A + BX Y = 1/y, X = x2 , A = a, B = b
4 y = 1/(2 + aebx ) Y = A + BX Y = ln(1/y − 2), X = x, A = ln(a), B = b
5 y = axe−bx Y = A + BX Y = ln(y/x), X = x, A = ln(a), B = −b
6 y = a + b/ ln x Y = A + BX Y = y, X = 1/ ln x, A = a, B = b
7 y = (a + bx)−3 Y = A + BX Y = 1/y 1/3 , X = x, A = a, B = b

Table 5.20: Find the coefficients of (5.67).


i xi yi Xi = xi Yi = ln(yi /xi ) Xi2 Xi Yi
1 1.50 3.0 1.50 0.6932 2.25 1.0398
2 2.50 4.3 2.50 0.5423 6.25 1.3558
3 4.00 6.5 4.00 0.4855 16.00 1.9420
4 5.50 7.0 5.50 0.2412 30.25 1.3266
n=4 S1=13.50 S2=1.9622 S3=54.750 S4=5.6642

which may be written as


Y = A + BX,

with

A = ln a,   B = −b,   X = x,   Y = ln(y/x).

Then the sums required for the normal equation (5.66) are easily obtained using the values in Table 5.20. The linear system involving A and B in the form (5.66) is

[ 4      13.50  ] [ A ]   [ 1.9622 ]
[ 13.50  54.750 ] [ B ] = [ 5.6642 ].
Then solving the above linear system, the solution of the linear system is
A = 0.8426 and B = −0.1043.

Using these values of A and B, we have the values of the parameters a and
b as
a = eA = 2.3224 and b = −B = 0.1043.

Hence,
y(x) = 2.3224e−0.1043x

is the best nonlinear fit. •
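A quick Command Window check of Example 5.28 (our own sketch): fit Y = ln(y/x) against X = x with the built-in polyfit and recover a and b from the relations above:

>> x = [1.5 2.5 4.0 5.5];  y = [3.0 4.3 6.5 7.0];
>> c = polyfit(x, log(y./x), 1);   % c(1) = B = -b, c(2) = A = ln(a)
>> a = exp(c(2)), b = -c(1)        % should be close to a = 2.3224, b = 0.1043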

Program 5.10
MATLAB m-file for the Nonlinear Least Squares Fit
function [A,B] = exp2fit(x,y)   % least squares fit y = a*exp(b*x)
% Transform the data from (x,y) to (x,Y), Y = log(y), and fit Y = A + B*x.
n = length(x);  Y = log(y);
S1 = sum(x);  S2 = sum(Y);  S3 = sum(x.*x);  S4 = sum(x.*Y);
B = (n*S4 - S1*S2)/(n*S3 - S1^2);
A = (S3*S2 - S4*S1)/(n*S3 - S1^2);
b = B;  a = exp(A);
for k = 1:n
    yk = a*exp(b*x(k));            % fitted value
    Error(k) = abs(y(k) - yk);
end
Error = sum(Error.*Error);

5.3.4 Least Squares Plane


Many problems arise in engineering and science where the dependent vari-
able is a function of two or more variables. For example, z = f (x, y) is a
two-variables function. Consider the least squares plane

z = ax + by + c, (5.76)

for the n points (x1 , y1 , z1 ), . . . , and (xn , yn , zn ) is obtained by minimizing


E(a, b, c) = Σ_{j=1}^{n} (zj − a xj − b yj − c)²,        (5.77)

and the function E(a, b, c) is minimized when

∂E/∂a = 2 Σ_{j=1}^{n} (zj − a xj − b yj − c)(−xj) = 0
∂E/∂b = 2 Σ_{j=1}^{n} (zj − a xj − b yj − c)(−yj) = 0
∂E/∂c = 2 Σ_{j=1}^{n} (zj − a xj − b yj − c)(−1) = 0.

Dividing by 2 and rearranging gives the normal equations

(Σ xj²) a + (Σ xj yj) b + (Σ xj) c = Σ zj xj
(Σ xj yj) a + (Σ yj²) b + (Σ yj) c = Σ zj yj        (5.78)
(Σ xj) a + (Σ yj) b + n c = Σ zj.

The above linear system can be solved for unknowns a, b, and c.

Example 5.29 Find the least squares plane z = ax + by + c by using the


following data:
xj 1 1 2 2 2
yj 1 2 1 2 3 .
zj 7 9 10 11 12

Solution. The sums required for the normal equation (5.78) are easily obtained using the values in Table 5.21.

The linear system (5.78) involving unknown coefficients a, b, and c is

14a + 15b + 8c = 82
15a + 19b + 9c = 93
8a + 9b + 5c = 49.
Then solving the above linear system, the solution of the linear system is

a = 2.400, b = 1.200, c = 3.800.



Table 5.21: Find the coefficients of (5.78).


i xi yi zi x2i xi y i yi2 xi zi yi zi
1 1.000 1.000 7.000 1.000 1.000 1.000 7.000 7.000
2 1.000 2.000 9.000 1.000 2.000 4.000 9.000 18.000
3 2.000 1.000 10.000 4.000 2.000 1.000 20.000 10.000
4 2.000 2.000 11.000 4.000 4.000 4.000 22.000 22.000
5 2.000 3.000 12.000 4.000 6.000 9.000 24.000 36.000
n=5 8.000 9.000 49.000 14.000 15.000 19.000 82.000 93.000

Table 5.22: Error Analysis of the Plane fit.


i xi yi zi z(xi , yi ) abs(zi − z)
1 1.0000 1.0000 7.0000 7.4000 0.4000
2 1.0000 2.0000 9.0000 8.6000 0.4000
3 2.0000 1.0000 10.0000 9.8000 0.2000
4 2.0000 2.0000 11.0000 11.0000 0.0000
5 2.0000 3.0000 12.0000 12.2000 0.2000

Hence, the least squares plane fit is

z = 2.400x + 1.200y + 3.800.

Use the MATLAB Command Window as follows:

>> x = [1 1 2 2 2];
>> y = [1 2 1 2 3];
>> z = [7 9 10 11 12];
>> Sol = planefit(x, y, z);
Table 5.22 shows the error analysis of the least squares plane approxima-
tion. Hence, the error associated with the least squares plane approxima-
tion is
4
X
E(a, b, c) = (zi − axi + byi + c)2 = 0.4000. •
i=1
604 Applied Linear Algebra and Optimization using MATLAB

Program 5.11
MATLAB m-file for the Least Squares Plane Fit
function Sol=planefit(x,y,z)
n = length(x); S1 = sum(x); S2 = sum(y); S3 = sum(z);
S4 = sum(x. ∗ x); S5 = sum(x. ∗ y); S6 = sum(y. ∗ y);
S7 = sum(z. ∗ x); S8 = sum(z. ∗ y);
A = [S4 S5 S1; S5 S6 S2; S1 S2 n]; B = [S7 S8 S3]0 ; C = A\B;
for k=1:n
z = C(1) ∗ x(k) + C(2) ∗ y(k) + C(3);
Error(k) = abs(z(k) − z); end
Error = sum(Error. ∗ Error);

5.3.5 Trigonometric Least Squares Polynomial


This is another popular form of the polynomial frequently used as the
approximating function. Since we know that a series of the form

m
a0 X
p(x) = + [ai cos(kx) + bi sin(kx)] (5.79)
2 i=1

is called a trigonometric polynomial of order m, here we shall approxi-


mate the given data points with the function (5.79) using the least squares
method. The least squares error for (5.79) is given by

n
X
E(a0 , a1 , . . . , an , b1 , b2 , . . . , bn ) = [yj − p(xj )]2 , (5.80)
j=1

and with the associated normal equations

∂E

= 0, for k = 0, 1, . . . , m 

∂ak 

, (5.81)
∂E 

= 0, for k = 1, 2, . . . , m 

∂bk
Interpolation and Approximation 605

gives
n m m n

X a0 X X X 
[ ai cos(kxj ) + bi sin(kxj )] = yj 

j=1
2 i=1 i=1 j=1








n m m n

a0 X
X X X 


[ ai cos(kxj ) + bi sin(kxj )] cos(kxj ) = cos(kxj )yj 
j=1
2 i=1 i=1 j=1
.




n m m n


X a0 X X X 

[ ai cos(kxj ) + bi sin(kxj )] sin(kxj ) = sin(kxj )yj 

2 i=1


j=1 i=1 j=1



for k = 1, 2, . . . , m

(5.82)
Then the set of these normal equations (5.83) represents the system of
(2m + 1) equations in (2m + 1) unknowns and can be solved using any
numerical method discussed in Chapter 2. Note that the derivation of the
coefficients ak and bk is usually called discrete Fourier analysis.

For m = 1, we can write the normal equations (5.83) in the form


n n n

a0 X X X 
+ a1 cos(xj ) + b1 sin(xj ) = yj 

2 j=1 j=1 j=1








n n n n


X X X X 
2
a0 cos(xj ) + a1 cos (xj ) + b1 cos(xj ) sin(xj ) = cos(xj )yj ,
j=1 j=1 j=1 j=1






n n n n


X X X X 
sin2 (xj )

a0 sin(xj ) + a1 cos(xj ) sin(xj ) + b1 = sin(xj )yj 



j=1 j=1 j=1 j=1
(5.83)
where j is the number of data points. By writing the above equations in
matrix form, we have
    
n/2 S1 S2 a0 S6
 S1 S3 S4   a1  =  S7  , (5.84)
S2 S4 S5 b1 S8
606 Applied Linear Algebra and Optimization using MATLAB

where
n
X n
X
S1 = cos(xj ), S2 = sin(xj )
j=1 j=1
n
X n
X
S3 = cos2 (xj ), S4 = cos(xj ) sin(xj )
j=1 j=1
n
X n
X
S5 = sin2 (xj ), S6 = yj
j=1 j=1
n
X n
X
S7 = cos(xj )yj , S8 = sin(xj )yj ,
j=1 j=1

which represents a linear system of three equations in three unknowns


a0 , a1 , and b1 . Note that the coefficients matrix of this system is symmet-
ric and positive-definite. Hence, the normal equations possess a unique
solution.

Example 5.30 Find the trigonometric least squares polynomial p1 (x) =


a0 + a1 cos x + b1 sin x that approximates the following data:

xj 0.1 0.2 0.3 0.4 0.5


.
yj 0.9 0.75 0.64 0.52 0.40

Solution. To find the trigonometric least squares polynomial

p1 (x) = a0 + a1 cos x + b1 sin x,

we have to solve the system


    
n/2 S1 S2 a0 S6
 S1 S3 S4   a1  =  S7  ,
S2 S4 S5 b1 S8

where
n
X n
X
S1 = cos(xj ) = 4.7291, S2 = sin(xj ) = 1.4629
j=1 j=1
Interpolation and Approximation 607
n
X n
X
S3 = cos2 (xj ) = 4.4817, S4 = cos(xj ) sin(xj ) = 1.3558
j=1 j=1

n
X n
X
2
S5 = sin (xj ) = 0.5183, S6 = yj = 3.2100
j=1 j=1

n
X n
X
S7 = cos(xj )yj = 3.0720, S8 = sin(xj )yj = 0.8223.
j=1 j=1

So,     
2.5 4.7291 1.4629 a0 3.2100
 4.7291 4.4817 1.3558   a1  =  3.0720  ,
1.4629 1.3558 0.5183 b1 0.8223
and using the Gauss elimination method, we get the values of unknown as

a0 = −0.0001, a1 = 0.9850, b1 = −0.9897.

Thus, we get the best trigonometric least squares polynomial

p(x) = −0.0001 + 0.9850 cos x − 0.9897 sin x,

which approximates the given data.

Note that C = cos(xj ) and S = sin(xj ). •

The original data along with the approximating polynomial are shown
graphically in Figure 5.12. To plot Figure 5.12, one can use the MATLAB
Command Window as follows:

>> x = [0.1 0.2 0.3 0.4 0.5];


>> y = [0.9 0.75 0.64 0.52 0.40];
>> xf it = −0.1 : 0.1 : 0.6;
>> yf it = −0.0001 + 0.9850. ∗ cos(xf it) − 0.9897. ∗ sin(xf it);
>> plot(x, y,0 o0 , xf it, yf it,0 −0 );
608 Applied Linear Algebra and Optimization using MATLAB

Table 5.23: Find the coefficients of (5.83).

j xj yj C S C2 S2 CS Cyj Syj
1 0.1 0.90 0.9950 0.0998 0.9900 0.0100 0.0993 0.8955 0.0899
2 0.2 0.75 0.9801 0.1987 0.9605 0.0395 0.1947 0.7350 0.1490
3 0.3 0.64 0.9553 0.2955 2.000 0.0873 0.2823 0.6114 0.1891
4 0.4 0.52 0.9211 0.3894 4.000 0.1516 0.3587 0.4790 0.2025
5 0.5 0.40 0.8776 0.4794 6.000 0.2298 0.4207 0.3510 0.1918
n=5 1.5 3.21 4.7291 1.4629 4.4817 0.5183 1.3558 3.0720 0.8223

Figure 5.12: Trigonometric least squares fit.

5.3.6 Least Squares Solution of an


Overdetermined System
In Chapter 2 we discussed methods for computing the solution x to a linear
system Ax = b when the coefficient matrix A is square (number of rows
and columns are equal). For square matrix A, the linear system usually has
a unique solution. Now we consider linear systems where the coefficient
matrix is rectangular (number of rows and columns are not equal). If A
has m rows and n columns, then x is a vector with n components and b
is a vector with m components. If the number of rows is greater than the
Interpolation and Approximation 609

number of columns (m > n), then the linear system is called an overdeter-
mined system. Typically, an overdetermined system has no solution. This
type of system generally arises when dealing with experimental data. It is
also common in optimization-related problems.

Consider the following overdetermined linear system of two equations


in one variable:

2x1 = 3
4x1 = 1. (5.85)

Now using Gauss elimination to solve this system, we obtain

0 = −5,

which is impossible and hence, the given system (5.85) is inconsistent.


Writing the given system in vector form, we get
   
2 2
x1 = . (5.86)
4 4

The left-hand side of (5.86) is [0, 0]T when x1 = 0, and is [2, 4]T when
x1 = 1. Note that as x1 takes on all possible values, the left-hand side of
(5.86) generates the line connecting the origin and the point (2, 4) (Fig-
ure 5.13). On the other hand, the right-hand side of (5.86) is the vector
[3, 1]T . Since the point (3, 1) does not lie on the line, the left-hand side
and the right-hand side of (5.86) are never equal. The given system (5.86)
is only consistent when the point corresponding to the right-hand side is
contained in the line corresponding to the left-hand side. Thus, the least
squares solution to (5.86) is the value of x1 for which the point on the line
is closest to the point (3, 1). In Figure 5.13, we see that the point (1, 2) on
the line is closest to (3, 1), which we got when x1 = 21 . So the least squares
solution to (5.85) is x1 = 12 . Now consider the following linear system of
three equations in two variables:

a11 x1 + a12 x2 = b1 
a21 x1 + a22 x2 = b2 . (5.87)
a31 x1 + a32 x2 = b3

610 Applied Linear Algebra and Optimization using MATLAB

Figure 5.13: Least squares solution to an overdetermined system.

Again, it is impossible to find a solution that can satisfy all of the equations
unless two of the three equations are dependent. That is, if only two out
of the three equations are unique, then a solution is possible. Otherwise,
our best hope is to find a solution that minimizes the error, i.e., the least
squares solution. Now, we discuss the method for finding the least squares
solution to the overdetermined system.

In the least squares method, x̂ is chosen so that the Euclidean norm of


residual r = b − Ax̂ is as small as possible. The residual corresponding to
system (5.87) is
 
b1 − a11 x1 − a12 x2
r =  b2 − a21 x1 − a22 x2  .
b3 − a31 x1 − a32 x2
The l2 -norm of the residual is the square root of the sum of each component
squared:
q
krk2 = r12 + r22 + r32 .

Since minimizing krk2 is equivalent to minimizing (krk2 )2 , the least squares


Interpolation and Approximation 611

solution to (5.87) is the values for x1 and x2 , which minimize the expression
(b1 − a11 x1 − a12 x2 )2 + (b2 − a21 x1 − a22 x2 )2 + (b3 − a31 x1 − a32 x2 )2 . (5.88)
Minimizing x1 and x2 is done by differentiating (5.88), with respect to x1
and x2 , and setting the derivatives to zero. Then solving for x1 and x2 , we
obtain the least squares solution x̂ = [x1 , x2 ]T to the system (5.87).

For a general overdetermined linear system Ax = b, the residual is


r = b − Ax̂ and the l2 -norm of the residual is the square root of rT r. The
least squares solution to the linear system minimizes
rT r = (krk2 )2 = (b − Ax̂)T (b − Ax̂). (5.89)
The above equation (5.89) attains minimum when the partial derivative
with respect to each of the variables x1 , x2 , . . . , xn is zero. Since
rT r = r12 + r22 + · · · + rm
2
, (5.90)
and the ith component of the residual r is
ri = bi − ai1 x1 − ai2 x2 − · · · − ain xn ,
the partial derivative of rT r with respect to xj is given by
∂ T
r r = −2r1 a1j − 2r2 a2j − · · · − 2rm amj . (5.91)
∂xj
From the right side of (5.91) we see that the partial derivative of rT r with
respect to xj is −2 times the product between the jth column of A and r.
Note that the jth column of A is the jth row of AT . Since the jth component
of AT r is equal to the jth column of A times r, the partial derivative of rT r
with respect to xj is the jth component of the vector −2AT r. The l2 -norm
of the residual is minimized at the point x where all the partial derivatives
vanish, i.e.,
∂ T ∂ T ∂ T
r r= r r = ··· = r r = 0. (5.92)
∂x1 ∂x2 ∂xn
Since each of these partial derivatives is −2 times the corresponding com-
ponent of AT r, we conclude that
AT r = 0. (5.93)
612 Applied Linear Algebra and Optimization using MATLAB

Replacing r by b − Ax gives
AT (b − Ax̂) = 0 (5.94)
or
AT Ax̂ = AT b, (5.95)
which is called the normal equation.

Any x̂ that minimizes the l2 -norm of the residual r = b − Ax̂ is a


solution to the normal equation (5.95). Conversely, any solution to the
normal equation (5.95) is a least squares solution to the overdetermined
linear system.

Example 5.31 Solve the following overdetermined linear system of three


equations in two unknowns:
2x1 + 5x2 + x3 = 1
3x1 − 4x2 + 2x3 = 3
4x1 + 3x2 + 3x3 = 5
5x1 − 2x2 + 4x3 = 7.
Solution. The matrix form of the given system is
   
2 5 1   1
 3 −4 2  x1  3 

 4
 =  5 .

3 3  x2
5 −2 4 7
Then using the normal equation (5.95), we obtain
   
  2 5 1    1
2 3 4 5   2 3 4 5  
 5 −4 3 −2   3 −4 2  x1 =  5 −4 3 −2   3  ,

 4 3 3  x2  5 
1 2 3 4 1 2 3 4
5 −2 4 7
which reduces the given system as
   
54 0 40   66
 0 54 −2  x1 =  −6  .
x2
40 −2 30 50
Interpolation and Approximation 613

Solving this simultaneous linear system, the values of unknowns are

x1 = −1.00, x2 = 0.00, x3 = 3.00,

and they are the least squares solution of the given overdetermined system.

Using the MATLAB Command Window, the above result can be re-
produced as follows:

>> A = [2 5 1; 3 − 4 2; 4 3 3; 5 − 2 4];
>> b = [1; 3; 5; 7];
>> x = overD(A, b);

Program 5.12
MATLAB m-file for the Overdetermined Linear System
function sol=overD(A,b)
x = (A0 ∗ A) \ (A0 ∗ b); % Solve the normal equations
sol=x;

5.3.7 Least Squares Solution of an


Underdetermined System
We consider again such linear systems where the coefficient matrix is rect-
angular (number of rows and columns are not equal). If A has m rows and
n columns, then x is a vector with n components and b is a vector with m
components. If the number of rows is smaller than the number of columns
(m < n), then the linear system is called an underdetermined system. Typ-
ically, an underdetermined system has infinitely many solutions. This type
of system generally arises in optimization theory and in economic modeling.

In general, the coefficient in row i and column j for the matrix AAT is
the dot product between row i and row j from A.
614 Applied Linear Algebra and Optimization using MATLAB

Notice that the coefficient matrix AAT is symmetric, so when forming


the matrix AAT , we just evaluate the coefficients that are on the diago-
nal or above the diagonal, whereas the coefficients below the diagonal are
determined from the symmetry property. Consider the equation

4x1 + 3x2 = 15. (5.96)

We want to find the least squares solution to (5.96). The set of all points
(x1 , x2 ) that satisfy (5.96) forms a line with slope − 43 and the distance from
1
the origin to the point (x1 , x2 ) is (x21 + x22 ) 2 .

Figure 5.14: Least squares solution of underdetermined system.

To find the least squares solution to (5.96), we choose the point (x1 , x2 )
that is as close to the origin as possible. The point (z1 , z2 ) in Figure 5.14,
which is closest to the origin, is the least squares solution to (5.96). We
see in Figure 5.14 that the vector from the origin to (z1 , z2 ) is orthogonal
to the line 4x1 + 3x2 = 15.

The collection of points forming this perpendicular have the form


 
4
t ,
3

where t is an arbitrary scalar. Since (z1 , z2 ) lies on this perpendicular,


there exists a value of t such that
   
z1 4
=t ,
z2 3
Interpolation and Approximation 615

and this implies that


z1 = 4t and z2 = 3t.
Since x1 = z1 and x2 = z2 must satisfy (5.96), we have

4z1 + 3z2 = 15. (5.97)

Substituting z1 = 4t and z2 = 3t into (5.97) gives

25t = 15 or t = 3/5 = 0.6.

Thus, the least squares solution to (5.96) is


     
z1 4 2.4
= 0.6 = .
z2 3 1.8

Now, let us consider a general underdetermined linear system Ax = b


and suppose that p is any solution to the linear system and q is any vector
for which
Aq = 0.
Since
A(p + q) = Ap + Aq = Ap = b,
we see that p + q is a solution to Ax = b, whenever Aq = 0. Conversely,
it is also true because, if Ax = b, then

x = p + (x − p) = p + q,

where q = x − p.

Since
A(x − p) = b − b = 0,
then x can be expressed as x = p + q and Az = 0.

The set of all q such that Aq = 0 is called the null space of A (kernel
of A) letting
N = {q : Aq = 0}.
616 Applied Linear Algebra and Optimization using MATLAB

Figure 5.15: Least squares solution of underdetermined system.

Any solutions x of the underdetermined linear system Ax = b are


sketched in Figure 5.15, for x = p + q and q ∈ N.

From Figure 5.15, the solution closest to the origin is perpendicular to


N.
In linear algebra, the set of vectors perpendicular to the null space of
A are linear combinations of the rows of A, so if
 
a11 a12 · · · a1n
 a21 a22 · · · a2n 
A= , m < n,
 
.. .. .. ..
 . . . . 
am1 am2 · · · amn

then
       
z1 a11 a21 am1
 z2   a12   a22   am2 
z=  = t1   + t2   + · · · + tm 
       
.. .. .. .. 
 .   .   .   . 
zn a1n a2n amn
Interpolation and Approximation 617

or
    
a11 t1 + a21 t2 + · · · + am1 tm a11 a21 · · · am1 t1
 a12 t1 + a22 t2 + · · · + am2 tm   a12 a22 · · · am2  t2 
s= = .
    
.. .. .. .. .. .. .. ..  ..
 . . . .   . . . .  . 
a1n t1 + a2n t2 + · · · + amn tm a1n a2n · · · amn tm
So,
z = AT t,
where  
t1
 t2 
t= .
 
..
 . 
tm
Substituting x = z = AT t into the linear system Ax = b, we have
AAT t = b, (5.98)
and solving this equation yields t, i.e.,
t = (AAT )−1 b,
while the least squares solution s to the underdetermined system is
z = AT t = AT (AAT )−1 b. (5.99)
Now solving the underdetermined equation (5.96),
 
 x1
4 3 = 15,
x2
we first use (5.98) as follows:
 
 4
4 3 t = 15,
3
15
which gives t = 25
= 0.6. Now using (5.99), we have
   
T 4 2.4
z=A t= (0.6) = ,
3 1.8
the required least squares solution of the given underdetermined equation.

618 Applied Linear Algebra and Optimization using MATLAB

Example 5.32 Solve the following underdetermined linear system of two


equations in three unknowns:

x1 + 2x2 − 3x3 = 42
5x1 − x2 + x3 = 54.

Solution. The matrix form of the given system is


 
  x1  
1 2 −3  42
x2  = .
5 −1 1 54
x3

Then using the normal equation (5.99)

AAT t = b,

we obtain
 
  1 5    
1 2 −3  2 −1  t1 42
= ,
5 −1 1 t2 54
−3 1

which reduces the given system to


    
14 0 t1 42
= .
0 27 t2 54

Solving the above linear system, the values of unknowns are

t1 = 3 t2 = 2.

Since the best least squares solution z to the given linear system is z = AT t,
i.e.,      
z1 1 5   13
 z2  =  2 −1  3 =  4  ,
2
z3 −3 1 −7
it is called the least squares solution of the given underdetermined system.

Interpolation and Approximation 619

Using the MATLAB Command Window, the above result can be re-
produced as:

>> A = [1 2 − 3; 5 − 1 1];
>> b = [42; 54];
>> x = underD(A, b);

Program 5.13
MATLAB m-file for the Underdetermined Linear System
function sol=underD(A,b)
t = (A ∗ A0 ) \ (b); % Solve the normal equations
x = A0 ∗ t; % Solve the normal equations
sol=x;

5.3.8 The Pseudoinverse of a Matrix


If A is an n × n matrix with linearly independent columns, then it is in-
vertible, and the unique solution to the linear system Ax = b is x = A−1 b.
If m > n and A is an m × n matrix with linearly independent columns,
then a system Ax = b has no exact solution, but the best approximation is
given by the unique least squares solution x̂ = (AT A)−1 AT b. The matrix
(AT A)−1 AT , therefore, plays the role of an inverse of A in this situation.

Definition 5.1 (Pseudoinverse of a Matrix)

If A is a matrix with linearly independent columns, then the pseudoinverse


of a matrix is the matrix A+ defined by

A+ = (AT A)−1 AT . (5.100)

For example, consider the matrix


 
1 2
A =  2 3 ,
3 4
620 Applied Linear Algebra and Optimization using MATLAB

then we have
 
  1 2  
1 2 3 14 20
AT A =  2 3 = ,
2 3 4 20 29
3 4

and its inverse will be of the form


 
29 10

 6 3 
(AT A)−1 =  .
 
 10 7 

3 3
Thus, the pseudoinverse of the matrix is
 
29 10

6 3 
 
 1 2 3

T −1 T +
(A A) A = A = 

7  2 3 4

 10

3 3
 
11 1 7
 − 6 −3 6 
= .
 
 4 1 2 

3 3 3

The pseudoinverse of a matrix can be obtained using the MATLAB


Command Window as follows:

>> A = [1 2; 2 3; 3 4];
>> pinv(A);
Note that if A is a square matrix, then A+ = A−1 and in such a case, the
least squares solution of a linear system Ax = b is the exact solution, since

x̂ = A+ b = A−1 b.
Interpolation and Approximation 621

Example 5.33 Find the pseudoinverse of the matrix of the following lin-
ear system, and then use it to compute the least squares solution of the
system:
x1 + 2x2 = 3
2x1 − 3x2 = 4.
Solution. The matrix form of the given system is
    
1 2 x1 3
= ,
2 −3 x2 4
and so     
T 1 2 1 2 5 −4
A A= = .
2 −3 2 −3 −4 13
The inverse of the matrix AT A can be computed as
 
13 4
 49 49 
T −1
(A A) =  .
 
 4 5 
49 49
The pseudoinverse of the matrix of the given system is
   
13 4 3 2
 49 49  
 7 7
 
 1 2

(AT A)−1 AT = A+ =  = .
 
5  2 −3

 4  2 1 

49 49 7 7
Now we compute the least squares solution of the system as
 
3 2
 7 7 
 
+  3
x̂ = A b =  ,

1  4

 2

7 7
which gives  
17
 7 
x̂ =  ,
 
 2 
7
622 Applied Linear Algebra and Optimization using MATLAB

and this is the least squares solution of the given system. •

The least squares solution to the linear system by the pseudoinverse


of a matrix can be obtained using the MATLAB Command Window as
follows:

>> A = [1 2; 2 − 3];
>> b = [3 4]0 ;
>> x = pinv(A) ∗ b;

Theorem 5.10 Let A be a matrix with linearly independent columns, then


A+ of A satisfies:

1. AA+ A = A.

2. A+ AA+ = A+ .

3. (AT )+ = (A+ )T .

Theorem 5.11 If A is a square matrix with linearly independent columns,


then:

1. A+ = A−1 .

2. (A+ )+ = A.

3. (AT )+ = (A+ )T .

5.3.9 Least Squares with QR Decomposition


The least squares solutions discussed previously suffer from a frequent prob-
lem. The matrix AT A of the normal equation is usually ill-conditioned,
therefore, a small numerical error in performing the Gauss elimination will
result in a large error in the least squares.
Interpolation and Approximation 623

Usually, the Gauss elimination for AT A of size n ≥ 5 does not yield


any good approximate solutions. It turns out that the QR decomposition
of A (discussed in Chapter 4) yields a more reliable way of computing the
least squares approximation of linear system Ax = b. The idea behind this
approach is that because orthogonal matrices preserve length, they should
preserve the length of the error as well.

Let A have linearly independent columns and let A = QR be a QR


decomposition. In this decomposition, we express a matrix as the product
of an orthogonal matrix Q and an upper triangular matrix R.

For x̂, a least squares solution of Ax = b, we have


AT Ax̂ = AT b
(QR)T (QR)x̂ = (QR)T b
RT QT QRx̂ = R T QT b
RT Rx̂ = R T QT b (because QT Q = I).
Since R is invertible, so is RT , and hence
Rx̂ = QT b,
or equivalently,
x̂ = R−1 QT b.
Since R is an upper triangular matrix, in practice it is easier to solve
Rx̂ = QT b directly (using backward substitution) than to invert R and
compute R−1 QT b.
Theorem 5.12 If A is an m×n matrix with linearly independent columns
and if A = QR is a QR decomposition, then the unique least squares
solutions x̂ of Ax = b is, theoretically, given by
x̂ = R−1 QT b, (5.101)
and it is usually computed by solving the system
Rx̂ = QT b. (5.102)

624 Applied Linear Algebra and Optimization using MATLAB

Example 5.34 A QR decomposition of A is given. Use it to find a least


squares solution of the linear system Ax = b, where
   
2 2 6 1
A= 1 4 −3  , b =  −1 
2 −4 9 4
and
2 1 2
 
− 

 3 3 3   
  3 0 9
 1 2 2 
Q= , R =  0 6 −6  .

 3 3 3 
 0 0 −3
 
 2 2 1 
− −
3 3 3
Solution. For the right-hand side of (5.102), we obtain
2 1 2
 
 3 3 −3 
    
  1 3
 1 2 2 

QT b =  3 3 − 3  −1
 =  −3  .



 4 0
 2 2 1 

3 3 3
Hence, (5.102) can be written as
    
3 0 9 x1 3
 0 6 −6   x2  =  −3 
0 0 −3 x3 0
or
3x1 + 9x3 = 3
6x2 − 6x3 = −3
− 3x3 = 0.
Now using backward substitution, we obtain
1
x̂ = [x1 , x2 , x3 ]T = [1, − , 0]T ,
2
which is called the least squares solution of the given system. •
Interpolation and Approximation 625

So we conclude that
Rx̂ = QT b
must be satisfied by the solution of AT Ax̂ = AT b, but because, in general,
R is not an even square, we cannot use multiplication by (RT )−1 to arrive
at this conclusion. In fact, it is not true, in general, that the solution of

Rx̂ = QT b

even exists; after all, Ax = b is equivalent to QRx = b, i.e., to Rx = QT b,


so Rx = QT b can have an actual solution x only if Ax = b does. However,
we are getting close to finding the least squares solution. Here, we need to
find a way to simplify the expression

RT Rx̂ = RT QT b. (5.103)

The matrix R is upper triangular, and because we have restricted ourselves


to the case where m ≥ n, we may write the m × n matrix R as
 
R1
R= , (5.104)
0

in partitioned (block) form, where R1 is an upper triangular n × n matrix


and 0 represents an (m − n) × n zero matrix. Since rank(R) = n, R1 is
nonsingular. Hence, every diagonal element of R1 must be nonzero. Now
we may rewrite
RT Rx̂ = RT QT b
as
 T    T
R1 R1 R1
x̂ = QT b
0 0 0
 
R1
QT b
 
R1T 0 T
x̂ = R1T 0T
0

R1T R1 x̂ = (QT b).



R1T 0T

Note that multiplying by the block 0T (an n × (m − n) zero matrix) on the


right-hand side simply means that the last (m − n) components of QT b do
626 Applied Linear Algebra and Optimization using MATLAB

not affect the computation. Since R1 is nonsingular, then we have


R1 x̂ = (R1T )−1 R1T 0T (QT b)


(QT b).

= In 0T
The left-hand side, R1 x̂, is (n × n) × (n × 1) −→ n × 1, and the right-hand
side is (n × (n + (m − n)) × (m × m) × (m × 1) −→ n × 1. If we define the
vector q to be equal to the first n components of QT b, then this becomes
R1 x̂ = q, (5.105)
which is a square linear system involving a nonsingular upper triangular
n×n matrix. So (5.105) is called the least squares solution of the overdeter-
mined system Ax = b, with QR decomposition by backward substitution,
where A = QR is the QR decomposition of A and q is essentially QT b.

Note that the last (m − n) columns of Q are not needed to solve the
least squares solution of the linear system with QR decomposition. The
block-matrix representation of Q corresponding to R (by (5.104)) is
Q = [Q1 , Q2 ],
where Q1 is the matrix composed of the first m columns of Q and Q2 is a
matrix composed of the remaining columns of Q. Note that only the first n
columns of Q are needed to create A using the coefficients in R, and we can
save effort and memory in the process of creating the QR decomposition.
The so-called short QR decomposition of A is
A = Q1 R1 . (5.106)
The only difference between the full QR decomposition and the short de-
composition is that the full QR decomposition contains the additional
(m − n) columns of Q.
Example 5.35 Find the least squares solution of the following linear sys-
tem Ax = b using QR decomposition, where
   
2 1   1.9
x1
A =  1 0 , x = , b =  0.9  .
x2
3 1 2.8
Interpolation and Approximation 627

Solution. First, we find the QR decomposition, and we will get


   
−0.5345 0.6172 −0.5774 −3.7417 −1.3363
Q =  −0.2673 −0.7715 −0.5774  and R =  0 0.4629 
−0.8018 −0.1543 0.5774 0 0

and
    
−0.5345 −0.2673 −0.8018 1.9 −3.5011
QT b =  0.6172 −0.7715 −0.1543   0.9  =  0.0463  ,
−0.5774 −0.5774 0.5774 2.8 0.0000

so that
   
−3.7417 −1.3363 −3.5011
R1 = and q= .
0 0.4629 0.0463

Hence, we must solve (5.105), i.e.,

R1 x̂ = q,

i.e.,     
−3.7417 −1.3363 x1 −3.5011
= .
0 0.4629 x2 0.0463
Using backward substitution, we obtain

x̂ = [x1 , x2 ]T = [0.9000, 0.1000]T ,

the least squares solution of the given system. •

The MATLAB built-in function qr returns the QR decomposition of a


matrix. There are two ways of calling qr, which are

>> [Q, R] = qr(A);


>> [Q1 , R1 ] = qr(A, 0);
where Q and Q1 are orthogonal matrices and R and R1 are upper triangular
matrices. The above first form returns the full QR decomposition (i.e., if
A is (m × n), then Q is (m × m) and R is (m × n)). The second form
628 Applied Linear Algebra and Optimization using MATLAB

returns the short QR decomposition, where Q1 and R1 are the matrices in


(5.106).
In Example 5.35, we apply the full QR decomposition of A using the
first form of the built-in function qr as

>> A = [2 1; 1 0; 3 1];
>> [Q, R] = qr(A);
The short QR decomposition of A can be obtained by using the second
form of the built-in function qr as

>> A = [2 1; 1 0; 3 1];
>> [Q1 , R1 ] = qr(A, 0);
Q1 =
−0.5345 0.6172
−0.2673 − 0.7715
−0.8018 − 0.1543
R1 =
−3.7417 − 1.3363
0 0.4629
As expected, Q1 and the first two columns of Q are identical, as are R1
and the first two rows of R. The short QR decomposition of A possesses
all the necessary information in the columns of Q1 and R1 to reconstruct
A.

5.3.10 Least Squares with Singular


Value Decomposition
One of the advantages of the Singular Value Decomposition (SVD) method
is that we can efficiently compute the least squares solution. Consider the
problem of finding the least squares solution of the overdetermined linear
system Ax = b. We discussed previously that the least squares solution of
Ax = b is the solution of AT Ax̂ = AT b, i.e., the solution of

(U DV T )T U DV T x̂ = (U DV T )T b
Interpolation and Approximation 629

V DT U T U DV T x̂ = V DT U T b
V DT DV T x̂ = V DT U T b
V T V DT DV T x̂ = V T V DT U T b
DT DV T x̂ = DT U T b
DV T x̂ = UT b
V T x̂ = D−1 U T b
x̂ = V D−1 U T b.

This is the same formal solution that we found for the linear system Ax = b
(see Chapter 6), but recall that A is no longer a square matrix.

Note that in exact arithmetic, the solution to a least squares prob-


lem via normal equations QR and SVD is exactly the same. The main
difference between these two approaches is the numerical stability of the
methods. To find the least squares solution of the overdetermined linear
system with SVD, we will find D1 as
 
D1
D=
0

in partitioned (block) form, where D1 is an n × n matrix and 0 represents


an (m − n) × n zero matrix. If we define the right-hand vector q to be
equal to the first n components of U T b, then the least squares solution of
the overdetermined linear system is to solve the system

D1 V T x̂ = q (5.107)

or
x̂ = V D1−1 q. (5.108)

Example 5.36 Find the least squares solution of the following linear sys-
tem Ax = b using SVD, where
   
1 1   1
x1
A=  0 1 , x=
 , b=  1 .
x2
1 0 1
630 Applied Linear Algebra and Optimization using MATLAB

Solution. First, we find the SVD of the given matrix. The first step is to
find the eigenvalues of the following matrix:
 
  1 1  
T 1 0 1  2 1
A A= 0 1 = .
1 1 0 1 2
1 0

The characteristic polynomial of AT A is

p(λ) = λ2 − 4λ + 3 = (λ − 3)(λ − 1) = 0,

which gives
λ1 = 3, λ2 = 1,
and the eigenvalues of AT A and the corresponding eigenvectors are
   
1 −1
and .
1 1
These vectors are orthogonal, so we normalize them to obtain
 √   √ 
2 2
 2   − 2 
v1 =  √  and v2 =  √  .
   
2 2
   
2 2
The singular values of A are
p √ p √
σ1 = λ1 = 3 and σ2 = λ2 = 1 = 1.

Thus,  √ √ 
2 2
 2 − 2   
0.7071 −0.7071
V = √ =
 
√  0.7071 0.7071
2 2
 
2 2
and  √   
3 0 1.7321 0
D= 0 1 = 0 1.0000  .
0 0 0 0
Interpolation and Approximation 631

To find U , we first compute


 √ 
6
 √   3 
2  
√ √
 
1 1  2
 
1 3  
6

u1 = Av1 = 0 1  √ = ,
   
σ1 3 6
1 0 6
   
 
2
 √ 
6
 
2
and similarly,
√0
 
 2 
1 
2

u2 = Av2 =  .
 
σ2  √ 
2
 

2
These are two of the three column vectors of U , and they already form an
orthonormal basis for R2 . Now to find the third column vector u3 of U , we
will look for a unit vector u3 that is orthogonal to
   
√ 2 √ 0
6u1 =  1  and 2u2 =  1  .
1 −1

To satisfy these two orthogonality conditions, the vector u3 must be a so-


lution of the homogeneous linear system
   
  x1 0
2 1 1 
x2  =  0 ,
0 1 −1
x3 0

which gives the general solution of the system


   
x1 −1
 x2  = α  1  , α ∈ R.
x3 1
632 Applied Linear Algebra and Optimization using MATLAB

By normalizing the vector on the right-hand side, we get

1
 
−√

 3 

 
 1 
u3 =  √  .
 
 3 
 
 
 1 
−√
3

So we have
 √ 
6 1
 3 0 −√ 
 3 
 √ √
   
 6
 0.8165 0.0000 −0.5774
2 1 
U =  6 √  =
 0.4082 0.7071 0.5774  .
 2 3  0.4082 −0.7071 0.5774
 
 √ √ 
 6 2 1 
− √
2 2 3

This yields the SVD


 √ 
6 1
0 −√ 
 3 3   √ √ 

 √ 2 2
 √ √
 
 6
 3 0 2 2
2 1 
 
A= − √  0 1  .

 √ √
 6 
 2 3 

0 0 2 2


 √ √
 −
 6 2 1 
 2 2

2 2 3

Hence,
   
1.7321 0 0.5774 0
D1 = and D1−1 = .
0 1.0000 0 1.0000
Interpolation and Approximation 633

Also,
    
0.8165 0.4082 0.4082 1 1.6330
U T b =  0.0000 0.7071 −0.7071   1  =  0.0000  ,
−0.5774 0.5774 0.5774 1 0.5774

and from it, we obtain  


1.6330
q= .
0.0000
Thus, we must solve (5.108), i.e.,

x̂ = V D1−1 q,

which gives
       
x1 0.7071 −0.7071 0.5774 0 1.6330 0.6667
= = ,
x2 0.7071 0.7071 0 1.0000 0.0000 0.6667

the least squares solution of the given system. •

Like QR decomposition, the MATLAB built-in function svd returns


the SVD of a matrix. There are two ways of calling svd:

>> [U, D, V ] = svd(A);


>> [U1 , D1 , V ] = svd(A, 0);
Here, A is any matrix, D is a diagonal matrix having singular values of
A in the diagonal, and U and V are orthogonal matrices. The first form
returns the full SVD and the second form returns the short SVD. The
second decomposition is useful when A is an m × n matrix with m > n.
The second form of SVD gives U1 , the first n columns of U , and square
(n × n) D1 . When m > n, the full SVD of A gives a D matrix with only
zeros in the last (m − n) rows. Note that there is no change in V in both
forms.

In Example 5.36, we apply the full SVD of A using the first form of
the built-in function svd:
634 Applied Linear Algebra and Optimization using MATLAB

>> A = [1 1; 0 1; 1 0];
>> [U, D, V ] = svd(A);

The short SVD of A can be obtained by using the second form of the
built-in function svd:

>> A = [1 1; 0 1; 1 0];
>> [U1 , D1 , V ] = svd(A, 0);

U1 =
0.8165 0.0000
0.4082 0.7071
0.4082 − 0.7071
D1 =
1.7321 0
0 1.0000
V =
0.7071 − 0.7071
0.7071 0.7071

As expected, U1 and the first two columns of U are identical, as are D1 and
the first two rows of D (no change in V in either form). The short SVD
of A possesses all the necessary information in the columns of U1 and D1
(with V also) to reconstruct A.

Note that when m and n are similar in size, SVD is significantly more
expensive to compute than QR decomposition. If m and n are equal, then
solving a least squares problem by SVD is about an order of magnitude
more costly than using QR decomposition. So for least squares problems
it is generally advisable to use QR decomposition. When a least squares
problem is known to be a difficult one, using SVD is probably justified.

5.4 Summary
In this chapter, we discussed the procedures for developing approximating
polynomials for discrete data. First, we discussed the Lagrange and New-
Interpolation and Approximation 635

ton divided differences polynomials and both yield the same interpolation
for a given set of n data pairs (x, f (x)). The pairs are not required to
be ordered, nor is the independent variable required to be equally spaced.
The dependent variable is approximated as a single-valued function. The
Lagrange polynomial works well for small data points. The Newton di-
vided differences polynomial is generally more efficient than the Lagrange
polynomial, and it can be adjusted easily for additional data. For effi-
cient implementation of divided difference interpolation, we used Aitken’s
method, which is designed for the easy evaluation of the polynomial. We
also discussed the Chebyshev polynomial interpolation of the function over
the interval [−1, 1]. These types of polynomials are used to minimize ap-
proximation error.

Procedures for developing least squares approximation for discrete data


were also discussed. Least squares approximations are useful for large sets
of data and sets of rough data. Least squares polynomial approximation
is straightforward for one independent variable and for two independent
variables. The least squares normal equations corresponding to polynomi-
als approximating functions are linear, which leads to very efficient solu-
tion procedures. For nonlinear approximating functions, the least squares
normal equations are nonlinear, which leads to complicated solution proce-
dures. We discussed the trigonometric least squares polynomial for approx-
imating the given function. We also discussed the least squares solutions to
overdetermined linear systems and the underdetermined linear systems. In
the last section, we discussed the least squares solutions of linear systems
with the pseudoinverse of matrices, QR decomposition, and SVD.

5.5 Problems
1. Use the Lagrange interpolation formula based on the points x0 =
0, x1 = 1 and x2 = 2 to find the equation of the quadratic polynomial
to approximate f (x) = x+1
x+5
at x = 1.5. Compute the absolute error.

2. Let f (x) = x2 sin( xπ


4
), where x is in radian. Find the quadratic
Lagrange interpolation polynomial by using the best of the points
x0 = 0, x1 = 1, x2 = 2, and x3 = 4 to find the approximation of the
636 Applied Linear Algebra and Optimization using MATLAB

function f (x) at x = 0.5 and x = 3.5. Compute the error bounds for
each case.

3. Use the quadratic Lagrange interpolation formula to show that A +


B = 1 − C, such that p2 (1.4) = Af (0) + Bf (1) + Cf (2).

4. Let f (x) = x + 2ln(x + 2). Use the quadratic Lagrange interpolation


formula based on the points x0 = 0, x1 = 1, x2 = 2, and x3 = 3 to
approximate f (0.5) and f (2.8). Also, compute the error bounds for
your approximations.

5. Let p2 (x) be the quadratic Lagrange interpolating polynomial for the


data: (0, 0), (1, α), (2, 3). Find the value of α, if the coefficient of x2
in p2 (x) is 12 .
2
6. Consider the function f (x) = ex ln(x + 1) and x = 0, 0.25, 0.5, 1.
Then use the suitable Lagrange interpolating polynomial to approxi-
mate f (0.75). Also, compute an error bound for your approximation.

7. Consider the following table having the data for f (x) = e3x cos 2x:

x 0.1 0.2 0.4 0.5


.
f (x) 1.32295 1.67828 2.31315 2.42147

Find the cubic Lagrange polynomial p3 (x) and use it to approximate


f (0.3). Also, estimate the actual error and the error bound for the
approximation.

8. Construct the divided differences table for the function f (x) = x4 +


4x3 + 2x2 + 11x + 21, for the values x = 1.5, 2.5, 3.5, 4.5, 5.5, 6.5.

9. Construct the divided differences table for the function


f (x) = (x + 2)ex−3 , for the values x = 2.1, 3.2, 4.3, 5.4, 6.5, 7.6.

10. Consider the following table:

x 0.5 1.5 2.5 3.5 4.5 5.5


.
f (x) 2.70 6.97 11.89 21.06 36.34 41.45
Interpolation and Approximation 637

(a) Construct the divided differences table for the tabulated function.
(b) Compute the Newton interpolating polynomials p2 (x) and p3 (x)
at x = 3.7.

11. Consider the following table of f (x) = x:
x 4 5 6 7 8
.
f (x) 2.0000 2.2361 2.4495 2.6458 2.8284
(a) Construct the divided differences table for the tabulated function.
(b) Find the Newton interpolating polynomials p3 (x) and p4 (x) at
x = 5.9.
(c) Compute error bounds for your approximations in part (b).
12. Let f (x) = ln(x + 3) sin x, with x0 = 0, x1 = 2, x2 = 2.5, x3 = 4, x4 =
4.5. Then:
(a) Construct the divided differences table for the given data points.
(b) Find the Newton divided difference polynomials p2 (x), p3 (x) and
p4 (x) at x = 2.4.
(c) Compute error bounds for your approximations in part (b).
(d) Compute the actual error.
13. Show that if x0 , x1 , and x2 are distinct, then
f [x0 , x1 , x2 ] = f [x1 , x2 , x0 ] = f [x2 , x0 , x1 ].

14. The divided differences form of the interpolating polynomial p3 (x) is


p3 (x) = f [x0 ] + (x − x0 )f [x0 , x1 ] + (x − x0 )(x − x1 )f [x0 , x1 , x2 , x0 ]
+ (x − x0 )(x − x1 )(x − x2 )f [x0 , x1 , x2 , x3 ].
By expressing these divided differences in terms of the function values
f (xi )(i = 0, 1, 2, 3), verify that p3 (x) does pass through the points
(xi , f (xi ))(i = 0, 1, 2, 3).
15. Let f (x) = x2 + ex and x0 = 0, x1 = 1. Use the divided differences
to find the value of the second divided difference f [x0 , x1 , x0 ].
2
16. Let f (x) = ln(x+2)ex and x0 = 0.5, x1 = 1.5. Use the divided differ-
ences to find the value of the third divided difference f [x0 , x1 , x0 , x1 ].
638 Applied Linear Algebra and Optimization using MATLAB

17. Use property 1 of the Chebyshev polynomial to construct T4 (x) using


T (3) and T2 (x).

18. Find the Chebyshev polynomial p3 (x) that approximates the function
f (x) = e2x+1 over [−1, 1].

19. Find the Lagrange–Chebyshev polynomial approximation p3 (x) of


f (x) = ln(x + 2) on [−1, 1]. Also, compute the error bound.

20. Find the four Chebyshev points in 2 ≤ x ≤ 4 and write the Lagrange
interpolation to interpolate ex (x + 2). Compute the error bound.

21. Apply Aitken’s method to approximate f (1.5) by using the following


data points:
x 1 2 3 4
.
f (x) 1 4 10 17

22. Consider the following table:

x 1.0 1.1 1.2 1.3 1.4 1.5


.
f (x) 0.8415 0.8912 0.9320 0.9636 0.9854 0.9975

Use Aitken’s method to find the estimated value of f (1.21).

23. Let f (x) = cos(x − 2)e−x , with points x0 = 0, x1 = 1, x2 = 2, x3 =


3, and x4 = 4. Use Aitken’s method to find the estimated value of
f (2.5).
xe x
24. (a) Let f (x) = x+1 , with points x0 = 0.2, x1 = 1.1, x2 = 2.3, and
x3 = 3.5. Use Aitken’s method to find the estimated value of
f (2.5).
(b) Let f (x) = x + x23+1 , with points x0 = 1.5, x1 = 2.5, x2 = 3.5,
and x3 = 4.5. Use Aitken’s method to find the estimated value
of f (3.9).

25. Find the least squares line fit y = ax + b for the following data:

(a) (−2, 1), (−1, 2), (0, 3), (1, 4).


(b) (1.5, 1.4), (2, 2.4), (3, 3.9), (4, 4.7).
Interpolation and Approximation 639

(c) (2, 0), (3, 4), (4, 10), (5, 16).


(d) (2, 2.6), (3, 3.4), (4, 4.9), (5, 5.4), (8, 4.6).
(e) (−4, 1.2), (−2, 2.8), (0, 6.2), (2, 7.8), (4, 13.2).

26. Repeat Problem 25 to find the least squares parabolic fit y = a +


bx + cx2 .

27. Find the least squares parabolic fit y = a + bx + cx2 for the following
data:

(a) (−1, 0), (0, −2), (0.5, −1), (1, 0).


(b) (−3, 15), (−1, 5), (1, 1), (3, 5).
(c) (−3, −1), (−1, 25), (1, 25), (3, 1).
(d) (0, 3), (1, 1), (2, 0), (4, 1), (6, 4).
(e) (−2, 10), (−1, 1), (0, 0), (1, 2), (2, 9).

28. Repeat Problem 27 to find the best fit of the form y = axb .

29. Find the best fit of the form y = aebx for the following data:

(a) (5, 5.7), (10, 7.5), (15, 8.9), (20, 9.9).


(b) (−1, 0.1), (1, 2.3), (2, 10), (3, 45).
(c) (3, 4), (4, 10), (5, 16), (6, 20).
(d) (−2, 1), (−1, 2), (0, 3), (1, 3), (2, 4).
(e) (−1, 6.62), (0, 3.94), (1, 2.17), (2, 1.35), (3, 0.89).

30. Use a change of variable to linearize each of the following data points:

(a) For the given data (1, 10), (3, 20), (5, 35), (7, 55), (9, 70), find the
4
least squares curve f (x) = (1+ce ax ) .

(b) For the given data (0.5, 2), (1.5, 5), (2.5, 6.5), (4, 7), (6.5, 11), find
1
the least squares curve f (x) = (a+bx 2) .
1 1 1 1
(c) For the given data (1.3, 4 ), (1.8, 9 ), (2.5, 16 ), (3.6, 25 ),
1 1
(4.2, 36 ), find the least squares curve f (x) = (a+bx)2 .
(d) For the given data (4, 5.6), (7, 7.2), (9, 11.5), (12, 15.5), (17, 18.7),
find the least squares curve f (x) = a lnlnx+b x
.
640 Applied Linear Algebra and Optimization using MATLAB

31. Find the least squares planes for the following data:

(a) (0, 1, 2), (1, 0, 3), (2, 1, 4), (0, 2, 1).


(b) (1, 7, 1), (2, 5, 6), (3, 1, −2), (2, 1, 0).
(c) (3, 1, −3), (2, 1, −1), (2, 2, 0), (1, 1, 1), (1, 2, 3).
(d) (5, 4, 3), (3, 7, 9), (3, 2, 3), (4, 4, 4), (5, 7, 8).

32. Find the plane z = ax + by + c that best fits the following data:

(a) (1, 2, 3), (1, −2, 1), (2, 1, 3), (2, 2, 1).
(b) (2, 4, −1), (2, 2, 5), (1, 3, 1), (7, 8, 2).
(c) (3, −1, 1), (2, 3, −2), (9, 6, −2), (7, 1, 2).
(d) (1, 2, 2), (3, 1, 6), (1, 2, 2), (2, 5, 1).

33. Find the trigonometric least squares polynomial fit y = a0 +a1 cos x+
b1 sin x for each of the following data:

(a) (1.5, 7.5), (2.5, 11.4), (3.5, 15.3), (4.5, 19.2), (5.5, 23.5).
(b) (0.2, 3.0), (0.4, 5.0), (0.6, 7.0), (0.8, 9.0), (1.0, 11.0).
(d) (−2.0, 1.5), (−1.0, 2.5), (0.0, 3.5), (1.0, 4.5), (2.0, 5.5).
(c) (1.1, 6.5), (1.3, 8.3), (1.5, 10.4), (1.7, 12.9), (1.9, 14.6).

34. Repeat Problem 25 to find the trigonometric least squares polynomial


fit y = a0 + a1 cos x + b1 sin x.

35. Find the least squares solution for each of the following overdeter-
mined systems:

(a)
3x1 + x2 = 1
2x1 + 5x2 = 2
x1 − 4x2 = 3
(b)
7x1 + 6x2 = 5
3x1 + 5x2 = 1
2x1 + 6x2 = 2
Interpolation and Approximation 641

(c)
2x1 − 3x2 + 4x3 = 13
x1 + 5x2 + 3x3 = 7
3x1 + 2x2 + x3 = 11
4x1 + x2 + 5x3 = 10
(d)
4x1 − 3x2 + 4x3 = 9
3x1 + 2x2 − 5x3 = 3
2x1 + 5x2 − 9x3 = 5
4x1 + 12x2 + 3x3 = 7

36. Find the least squares solution for each of the following overdeter-
mined systems:

(a)
12x1 − 9x2 = 7
11x1 + 21x2 = 13
17x1 − 22x2 = 24
(b)
x1 + 5x2 + 19x3 = 13
4x1 − 2x2 + 3x3 = 14
3x1 + x2 − x3 = 12
5x1 − 4x2 + 4x3 = 19
(c)
2x1 − 4x2 + 11x3 = 9
3x1 − 5x2 + 5x3 = 3
11x1 + 11x2 − 7x3 = 2
x1 − 8x2 + 3x3 = 7
(d)
2x1 + 5x2 + 2x3 + 2x4 = 12
x1 + 4x2 + 6x3 + x4 = 14
3x1 + 7x2 + 2x3 − 2x4 = 23
5x1 − 2x2 + 11x3 + 7x4 = 11
x1 + 4x2 − 7x3 + 13x4 = 19
642 Applied Linear Algebra and Optimization using MATLAB

37. Find the least squares solution for each of the following overdeter-
mined systems:

(a)
7x1 + 6x2 − 4x3 = −3
8x1 + 5x2 + 3x3 = −5
9x1 + 3x2 + 5x3 = 6
−3x1 + 2x2 + 6x3 = 7

(b)
x1 + 3x2 + 9x3 = 23
2x1 − 2x2 + 6x3 = 24
3x1 − 7x2 + 5x3 = 11
4x1 − 4x2 + 9x3 = 22

(c)
2x1 − 5x2 + 4x3 + 3x4 = 15
3x1 + 2x2 + x3 + 5x4 = 14
7x1 − 3x2 + 4x3 + 9x4 = 13
11x1 + 8x2 + 5x3 + 7x4 = 12
3x1 + x2 − 2x3 + 6x4 = 11

(d)
x1 + 7x2 + 7x3 − 3x4 = 5
3x1 − 2x2 − 5x3 + 2x4 = 6
3x1 + 3x2 − 5x3 − 2x4 = 7
x1 − 3x2 + 12x3 + 11x4 = 8
2x1 + 4x2 − 15x3 + 3x4 = 9

38. Find the least squares solution for each of the following overdeter-
mined systems:

(a)
2x1 − 9x2 = 5
4x1 + 8x2 = 1
12x1 + 13x2 = 2
Interpolation and Approximation 643

(b)
3x1 + 4x2 + 9x3 = 3
4x1 − 2x2 + 7x3 = 4
2x1 − 8x2 + 4x3 = 2
x1 − 4x2 + 4x3 = 9
(c)
2x1 − 24x2 + 9x3 = 3
11x1 − 15x2 + 14x3 = 1
13x1 + 21x2 − 6x3 = 2
x1 − 8x2 + 3x3 = 3
(d)
x1 + 5x2 + 2x3 − 2x4 = 1
3x1 − 2x2 + 6x3 + x4 = 1
3x1 + 7x2 − 5x3 − 2x4 = 2
5x1 − 2x2 + 12x3 + 11x4 = 1
x1 + 4x2 − 15x3 + 3x4 = 1

39. Find the least squares solution for each of the following underdeter-
mined systems:

(a)
2x1 + x2 + x3 = 4
x1 + 3x2 + 4x3 = 5
(b)
x1 − 3x2 + 4x3 = 2
−x1 + 2x2 + 5x3 = 11
(c)
x1 + 5x2 + 3x3 − 5x4 = 15
2x1 + 5x2 + 6x3 + x4 = 18
3x1 + 2x2 + 5x3 − 3x4 = 10
(d)
2x1 + 5x2 + 2x3 + x4 = −5
4x1 + 3x2 − 2x3 + 9x4 = 6
x1 + x2 + 3x3 + 8x4 = 12
644 Applied Linear Algebra and Optimization using MATLAB

40. Find the least squares solution for each of the following underdeter-
mined systems:

(a)
2x1 + 11x2 + 7x3 = 13
5x1 + 13x2 + 3x3 = 11
(b)
12x1 + 8x2 − 9x3 = 4
15x1 − 13x2 + 14x3 = 5
(c)
x1 + 5x2 + 4x3 + 3x4 = 22
7x1 + 15x2 + 6x3 + x4 = 15
−3x1 + 7x2 + 3x3 + 6x4 = 19
(d)
x1 + 5x2 + x3 + 13x4 = 3
2x1 + 15x2 + 12x3 + 9x4 = 8
x1 − 11x2 + 17x3 + 22x4 = −2
41. Find the least squares solution for each of the following underdeter-
mined systems:

(a)
3x1 − 9x2 + 5x3 = 21
4x1 + 17x2 + 15x3 = 23
(b)
9x1 − 5x2 − 8x3 = 14
7x1 − 3x2 + 4x3 = 11
(c)
2x1 − 6x2 + 7x3 + 9x4 = 9
5x1 + 11x2 + 9x3 − 4x4 = 8
2x1 + 5x2 + 6x3 + 8x4 = 7
(d)
3x1 + 5x2 + 7x3 + 3x4 = 33
2x1 + 21x2 + 2x3 + 29x4 = 18
5x1 − 31x2 + 19x3 + 12x4 = 22
Interpolation and Approximation 645

42. Find the least squares solution for each of the following underdeter-
mined systems:

(a)
3x1 + 23x2 + 14x3 = 51
4x1 − 37x2 + 35x3 = 13
(b)
5x1 − 16x2 − 18x3 = 44
2x1 − 23x2 + 34x3 = 51
(c)
2x1 + 7x2 − 9x3 + 5x4 = 19
−x1 + 11x2 + 18x3 − 24x4 = 27
−3x1 + 15x2 + 6x3 + 8x4 = 39
(d)
13x1 + 5x2 − 13x3 + 23x4 = 17
17x1 − 22x2 − 12x3 + 29x4 = 28
15x1 + 11x2 − 19x3 + 22x4 = 32

43. Find the pseudoinverse of each of the matrices:

  
1 3 1 −2
(a) A= 1 4  (b) A =  2 −3 
2 1 5 1
   
2 3 1 2
(c) A= 3 4  (d) A =  4 −2 
5 2 −1 1

44. Find the pseudoinverse of each of the matrices:

  
2 2 3 −1
(a) A= 3 0  (b) A= 2 3 
1 4 6 5
646 Applied Linear Algebra and Optimization using MATLAB

   
3 0 1 −1
 2 1   1 1 
(c) A= 
 1 −1  (d) A=
 2

3 
−1 2 3 5
45. Find the least squares solution for each of the following linear sys-
tems by using the pseudoinverse of the matrices:

(a)
2x1 + 5x2 = 4
x1 + 11x2 = 5
(b)
3x1 − 7x2 = 2
5x1 + 2x2 = 5
(c)
x1 + 3x2 + 2x3 = 1
2x1 − 7x2 + x3 = 1
5x1 + 3x2 + 5x3 = 1
(d)
2x1 + 7x2 + 12x3 = 3
x1 + 13x2 − 2x3 = 2
7x1 + 11x2 + 3x3 = 9
46. Find the least squares solution for each of the following linear sys-
tems by using the pseudoinverse of the matrices:

(a)
7x1 + 13x2 = 3
8x1 + 15x2 = 1
(b)
2x1 + 5x2 = 14
5x1 − 3x2 = 11
(c)
3x1 + 5x2 + 3x3 = 2
x1 + 5x2 + 6x3 = 5
−3x1 + 2x2 + 3x3 = 9
Interpolation and Approximation 647

(d)
x1 + 3x2 + 4x3 = 13
x1 − 6x2 + 17x3 = 9
4x1 − 15x2 + 9x3 = 2

47. A QR decomposition of A is given. Use it to find a least squares


solution of Ax = b, where
   
0 3 3
A =  0 4 , b =  0 ,
5 10 −4
   
0 −0.6000 −0.8000 5 −10
Q= 0 −0.8000 0.6000  , R =  0 −5  .
−1.0000 0 0 0 0

48. A QR decomposition of A is given. Use it to find a least squares


solution of Ax = b, where
   
1 0 1
A =  2 −1  , b =  1 ,
−1 1 1
   
−0.4082 0.7071 −0.5774 −2.4495 1.2247
Q =  −0.8165 −0.0000 0.5774  , R= 0 0.7071  .
0.4082 0.7071 0.5774 0 0

49. A QR decomposition of A is given. Use it to find a least squares


solution of Ax = b, where
   
2 1 2
A =  2 0 , b =  3 ,
1 1 −1
   
−0.6667 0.3333 −0.6667 −3.0000 −1.0000
Q =  −0.6667 −0.6667 0.3333  , R= 0 1.0000  .
−0.3333 0.6667 0.6667 0 0
648 Applied Linear Algebra and Optimization using MATLAB

50. A QR decomposition of A is given. Use it to find a least squares


solution of Ax = b, where
   
1 1 1
 −1 1   2 
A=  1
, b =  4 ,
 
1 
1 −1 −1
   
−0.5000 −0.5000 −0.7000 −0.1000 −2 0
 0.5000 −0.5000 −0.1000
 , R =  0 −2  .
0.7000   
Q=  −0.5000 −0.5000 0.7000 0.1000   0 0 
−0.5000 0.5000 −0.1000 0.7000 0 0

51. A QR decomposition of A is given. Use it to find a least squares


solution of Ax = b, where
   
1 2 1
A= , b= .
1 −1 1
   
0.7071 −0.7071 1.4142 0.7071
Q= , R= .
0.7071 0.7071 0.0000 −2.1213

52. A QR decomposition of A is given. Use it to find a least squares


solution of Ax = b, where
   
1 2 2
A= , b= ,
−1 3 1
   
0.7071 0.7071 1.4142 −0.7071
Q= , R= .
−0.7071 0.7071 0.0000 3.5356

53. A QR decomposition of A is given. Use it to find a least squares


solution of Ax = b, where
   
1 0 1
A= , b= ,
1 1 −1
   
0.7071 −0.7071 1.4142 0.7071
Q= , R= .
0.7071 0.7071 0.0000 0.7071
Interpolation and Approximation 649

54. A QR decomposition of A is given. Use it to find a least squares


solution of Ax = b, where
  
1 3 0 1
A =  0 −1 8  , b =  2 ,
1 2 4 3
   
0.7071 −0.4082 −0.5774 1.4142 3.5355 2.8284
Q= 0 0.8165 −0.5774  , R= 0 −1.2247 8.1650  .
0.7071 0.4082 0.5774 0 0 −2.3094
55. A QR decomposition of A is given. Use it to find a least squares
solution of Ax = b, where
  
1 0 2 1
A =  −1 2 0 , b =  1 ,
−1 −2 2 1
   
0.7071 −0.4082 −0.5774 1.4142 3.5355 2.8284
Q= 0 0.8165 −0.5774  , R =  0 −1.2247 8.1650  .
0.7071 0.4082 0.5774 0 0 −2.3094
56. Find the least squares solution of each of the following linear systems
Ax = b using QR decomposition:

(a)    
5 3   4.9
x1
A =  1 3 , x= , b =  0.8  .
x2
2 1 1.7
(b)
   
1 −1 4   1.1
 0 x1
2 1   0.2 
A=
 1
, x=  x2  ,  0.9  .
b= 
1 0 
x3
2 −1 1 1.7
(c)
   
1 −1 1   0.7
 −1 x1  −0.8 
4 2 
A=
 −2
, x=  x2  , b=
 −1.5  .

1 2 
x3
1 4 2 1.02
650 Applied Linear Algebra and Optimization using MATLAB

(d)
   
3 2 1   2.5
 1 x1
2 2   1.1 
A= , x =  x2  , b=
 0.8  .

 1 0 −1 
x3
2 1 −2 1.9

57. Find the least squares solution of each of the following linear systems
Ax = b using QR decomposition:

(a)    
2 1   1
x1
A =  1 1 , x= , b =  1 .
x2
2 2 1
(b)    
1 −1   0.1
x1
A= 0 2 , x= , b =  1.7  .
x2
1 1 0.9
(c)
   
3 −1   2.7
x1
A =  −1 4 , x= , b =  −0.8  .
x2
−2 1 −1.5
(d)    
4 2   3.5
x1
A =  1 2 , x= , b =  1.1  .
x2
1 0 0.8
58. Solve Problem 55 using singular value decomposition.
59. Find the least squares solution of each of the following linear systems
Ax = b using singular value decomposition:

(a)
   
−2 2   −1.8
x1
A =  −1 1  , x= , b =  −0.9  .
x2
2 2 1.9
Interpolation and Approximation 651

(b)    
1 −1   1.1
x1
A= 1 2 , x= , b =  0.7  .
x2
1 1 0.9
(c)
   
1 0   0.9
x1
A =  1 1 , x= , b =  0.85  .
x2
−1 1 −0.9

(d)
   
3 2   2.5
x1
A =  1 −1  , x= , b =  1.05  .
x2
1 3 0.85

60. Solve Problem 56 using singular value decomposition.


Chapter 6

Linear Programming

6.1 Introduction
In this chapter, we give an introduction to linear programming. Linear
Programming (LP) is a mathematical method for finding optimal solutions
to problems. It deals with the problem of optimizing (maximizing or min-
imizing) a linear function, subject to the constraints imposed by a system
of linear inequalities. It is widely used in industry and in government. His-
torically, linear programming was first developed and applied in 1947 by
George Dantzig, Marshall Wood, and their associates in the U.S. Air Force;
the early applications of LP were thus in the military field. However, the
emphasis in applications has now moved to the general industrial area. LP
today is concerned with the efficient use or allocation of limited resources
to meet desired objectives.

Before formally defining a LP problem, we define the concepts linear


function and linear inequality.
653
654 Applied Linear Algebra and Optimization using MATLAB

Definition 6.1 (Linear Function)

A function Z(x1 , x2 , . . . , xN ) of x1 , x2 , . . . , xN is a linear function, if and


only if for some set of constants c1 , c2 , . . . , cN ,

Z(x1 , x2 , . . . , xN ) = c1 x1 + c2 x2 + · · · + cN xN .

For example, Z(x1 , x2 ) = 30x1 + 50x2 is a linear function of x1 and x2 , but


Z(x1 , x2 ) = 2x21 x2 is not a linear function of x1 and x2 . •

Definition 6.2 (Linear Inequality)

For any function Z(x1 , x2 , . . . , xN ) and any real number b, the inequalities

Z(x1 , x2 , . . . , xN ) ≤ b

and
Z(x1 , x2 , . . . , xN ) ≥ b
are linear inequalities. For example, 3x1 + 2x2 ≤ 11 and 10x1 + 15x2 ≥ 17
are linear inequalities, but 2x21 x2 ≥ 3 is not a linear inequality. •

Definition 6.3 (Linear Programming Problem)

An LP problem is an optimization problem for which we do the following:

1. We attempt to maximize or to minimize a linear function of the de-


cision variables. The function that is to be maximized or minimized
is called the objective function.

2. The values of the decision variables must satisfy a set of constraints.


Each constraint must be a linear equation or linear inequality.

3. A sign restriction is associated with each variable. For any variable


xi , the sign restriction specifies that xi must be either a nonnegative
(xi ≥ 0) or unrestricted sign.

In the following, we discuss some LP problems involving linear functions,


inequality constraints, equality constraints, and sign restriction. •
Linear Programming 655

6.2 General Formulation


Let x1 , x2 , . . . , xN be N variables in an LP problem. The problem is to
find the values of the variables x1 , x2 , . . . , xN to maximize (or minimize) a
given linear function of the variables, subject to a given set of constraints
that are linear in the variables.

The general formulation for an LP problem is


maximize Z = c1 x 1 + c2 x 2 + · · · + cN x N , (6.1)
subject to the constraints
a11 x1 + a12 x2 + ··· + a1N xN ≤ b1
a21 x1 + a22 x2 + ··· + a2N xN ≤ b2
.. .. .. .. (6.2)
. . ··· . .
aM 1 x 1 + aM 2 x 2 + · · · + aM N x N ≤ b M
and
x1 ≥ 0, x2 ≥ 0, . . . , xN ≥ 0, (6.3)
where aij (i = 1, 2, . . . , M ; j = 1, 2, . . . , N ) are constants (called constraint
coefficients), bi (i = 1, 2, . . . , M ) are constants (called resources values), and
cj (j = 1, 2, . . . , N ) are constants (called cost coefficients). We call Z the
objective function.

In matrix form, the general formulation can be written as


maximize Z = cT x, (6.4)
subject to the constraints
Ax ≤ b (6.5)
and
x ≥ 0, (6.6)
where  
a11 a12 ··· a1N
 a21 a22 ··· a2N 
A= (6.7)
 
.. .. .. .. 
 . . . . 
aM 1 aM 2 · · · aM N
656 Applied Linear Algebra and Optimization using MATLAB

       
b1 c1 x1 0
 b2   c2   x2   0 
b= , c= , x= , 0= (6.8)
       
.. .. .. .. 
 .   .   .   . 
bM cN xN 0

and cT denotes the transpose of the vector c.

6.3 Terminology
The following terms are commonly used in LP :

• Decision variables: variables x1 , x2 , . . . , xN in (6.1).

• Objective function: function Z given by (6.1).

• Objective function coefficients: constants c1 , . . . , cN in (6.1).

• Constraints coefficients: constants aij in (6.2).

• Nonnegativity constraints: constraints given by (6.3).

• Feasible solution: set of x1 , x2 , . . . , xN values that satisfy all the con-


straints.

• Feasible region: collection of all feasible solutions.

• Optimal solution: feasible solution that gives an optimal value of the


objective function (i.e., the maximum value of Z in (6.1)).

6.4 Linear Programming Problems


Example 6.1 (Product-Mix Problem)

The Handy-Dandy Company wishes to schedule the production of a kitchen


appliance that requires two resources—labor and material. The company
Linear Programming 657

is considering three different models and its production engineering depart-


ment has furnished the following data: the supply of raw material is re-
stricted to 360 pounds per day. The daily availability of labor is 250 hours.
Formulate a linear programming model to determine the daily production
rate of the various models in order to maximize the total profit.

Model A Model B Model C


Labor (hours per unit) 7 3 6
Material (hours per unit) 6 7 8
Profit ($ per unit) 6 6 3

6.4.1 Formulation of Mathematical Model


Step I. Identify the Decision Variables: the unknown activities to be de-
termined are the daily rate of production for the three models.
Representing them by algebraic symbols:
xA : Daily production of model A
xB : Daily production of model B
xC : Daily production of model C

Step II. Identify the Constraints: in this problem, the constraints are
the limited availability of the two resources—labor and material. Model
A requires 7 hours of labor for each unit, and its production quantity is
xA . Hence, the labor requirement for model A alone will be 7xA hours
(assuming a linear relationship). Similarly, models B and C will require
3xB and 6xC hours, respectively. Thus, the total requirement of labor will
be 7xA + 3xB + 6xC , which should not exceed the available 250 hours. So
the labor constraint becomes
7xA + 3xB + 6xC ≤ 250.
Similarly, the raw material requirements will be 6xA pounds for model A,
7xB pounds for model B, and 8xC pounds for model C. Thus, the raw
material constraint is given by
6xA + 7xB + 8xC ≤ 360.

In addition, we restrict the variables xA , xB , xC to have only nonnegative


values, i.e.,
xA ≥ 0, xB ≥ 0, xC ≥ 0.
These are called the nonnegativity constraints, which the variables must
satisfy. Most practical linear programming problems will have this nonneg-
ative restriction on the decision variables. However, the general framework
of linear programming is not restricted to nonnegative values.

Step III. Identify the Objective: the objective is to maximize the total
profit for sales. Assuming that a perfect market exists for the product such
that all that is produced can be sold, the total profit from sales becomes

Z = 6xA + 6xB + 3xC .

Thus, the complete mathematical model for the product mix problem may
now be summarized as follows:

Find numbers xA , xB , xC , which will maximize

Z = 6xA + 6xB + 3xC ,


subject to the constraints

7xA + 3xB + 6xC ≤ 250


6xA + 7xB + 8xC ≤ 360

xA ≥ 0, xB ≥ 0, xC ≥ 0.
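For reference, this model can also be passed directly to the MATLAB Optimization Toolbox function linprog, which is discussed later in this chapter. The following is only a sketch (the variable names are illustrative, and linprog minimizes, so the profit vector is negated):

% Product-mix model of Example 6.1 (a sketch; variable names are illustrative)
f = [6; 6; 3];            % profit per unit of models A, B, C
A = [7 3 6;               % labor hours per unit (at most 250 hours per day)
     6 7 8];              % material pounds per unit (at most 360 pounds per day)
b = [250; 360];
lb = zeros(3, 1);         % nonnegativity of the production rates
% linprog minimizes, so the profit is maximized by minimizing -f'*x:
[x, Fval] = linprog(-f, A, b, [], [], lb);
maxProfit = -Fval;        % total daily profit at the optimal production rates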

Example 6.2 (An Inspection Problem)

A company has two grades of inspectors, 1 and 2, who are to be assigned
for a quality control inspection. It is required that at least 1800 pieces be
inspected per 8-hour day. Grade 1 inspectors can check pieces at the rate
of 25 per hour, with an accuracy of 98%. Grade 2 inspectors check at the
rate of 15 pieces per hour, with an accuracy of 95%. The wage rate of a
Grade 1 inspector is $4.00 per hour, and that of a Grade 2 inspector is $3.00 per
hour. Each time an error is made by an inspector, the cost to the company
is $2.00. The company has available for the inspection job eight Grade 1
inspectors and ten Grade 2 inspectors. The company wants to determine
the optimal assignment of inspectors, which will minimize the total cost of
the inspection.

6.4.2 Formulation of Mathematical Model


Let x1 and x2 denote the number of Grade 1 and Grade 2 inspectors as-
signed for inspection. Since the number of available inspectors in each grade
is limited, we have the following constraints:
x1 ≤ 8 (Grade 1)
x2 ≤ 10 (Grade 2).

The company requires at least 1800 pieces to be inspected daily. Thus, we


get
8(25)x1 + 8(15)x2 ≥ 1800
or
200x1 + 120x2 ≥ 1800,
which can also be written as

5x1 + 3x2 ≥ 45.

To develop the objective function, we note that the company incurs two
types of costs during inspections: wages paid to the inspector, and the cost
of the inspection error. The hourly cost of each Grade 1 inspector is

$4 + 2(25)(0.02) = $5 per hour.

Similarly, for each Grade 2 inspector the cost is

$3 + 2(15)(0.05) = $4.50 per hour.

Thus, the objective function is to minimize the daily cost of inspection


given by
Z = 8(5x1 + 4.5x2 ) = 40x1 + 36x2 .

The complete formulation of the linear programming problem thus becomes


minimize Z = 40x1 + 36x2 ,
subject to the constraints
x1 ≤ 8
x2 ≤ 10
5x1 + 3x2 ≥ 45
x1 ≥ 0, x2 ≥ 0.

6.5 Graphical Solution of LP Models


In the last section, two examples were presented to illustrate how practical
problems can be formulated mathematically as LP problems. The next
step after formulation is to solve the problem mathematically to obtain
the best possible solution. In this section, a graphical procedure to solve
LP problems involving only two variables is discussed. Though in practice
such small problems are usually not encountered, the graphical procedure
is presented to illustrate some of the basic concepts used in solving large
LP problems.

Example 6.3 A company manufactures two types of mobile phones, model


A and model B. It takes 5 hours and 2 hours to manufacture A and B,
respectively. The company has 900 hours available per week for the pro-
duction of mobile phones. The manufacturing cost of each model A is $8
and the manufacturing cost of a model B is $10. The total funds available
per week for production is $2800. The profit on each model A is $3, and
the profit on each model B is $2. How many of each type of mobile phone
should be manufactured weekly to obtain the maximum profit?

Solution. We first find the inequalities that describe the time and mon-
etary constraints. Let the company manufacture x1 of model A and x2 of
model B. Then the total manufacturing time is (5x1 + 2x2 ) hours. There
are 900 hours available. Therefore,
5x1 + 2x2 ≤ 900.

Now the cost of manufacturing x1 of model A at $8 each is $8x1 , and the


cost of manufacturing x2 of model B at $10 each is $10x2 . Thus, the total
production cost is (8x1 + 10x2 ). There is $2800 available for production of
mobile phones. Therefore,

8x1 + 10x2 ≤ 2800.

Furthermore, x1 and x2 represent the numbers of mobile phones manufactured. These numbers cannot be negative. Therefore, we get two more
constraints
x1 ≥ 0, x2 ≥ 0.
Next, we find a mathematical expression for profit. Since the weekly
profit on x1 mobile phones at $3 per mobile phone is $3x1 and the weekly
profit on x2 mobile phones at $2 per mobile phone is $2x2 , the total weekly
profit is $(3x1 + 2x2 ). Let the profit function Z be defined as

Z = 3x1 + 2x2 .

Thus, the mathematical model for the given LP problem with the profit
function and the system of linear inequalities may be written as

maximize Z = 3x1 + 2x2 ,

subject to the constraints

5x1 + 2x2 ≤ 900


8x1 + 10x2 ≤ 2800

x1 ≥ 0, x2 ≥ 0.
In this problem, we are interested in determining the values of the vari-
ables x1 and x2 that will satisfy all the restrictions and give the maximum
value of the objective function. As a first step in solving this problem, we
want to identify all possible values of x1 and x2 that are nonnegative and
satisfy the constraints. The solution of an LP problem is merely finding
the best feasible solution (optimal solution) in the feasible region (set of all
feasible solutions). In our example, an optimal solution is a feasible solu-
tion which maximizes the objective function 3x1 + 2x2 . The value of the
objective function corresponding to an optimal solution is called the optimal
value of the LP problem.

To represent the feasible region in a graph, every constraint is plotted,


and all values of x1 , x2 that will satisfy these constraints are identified.
The nonnegativity constraints imply that all feasible values of the two vari-
ables will be in the first quadrant. It can be shown that the graph of the
constraint 5x1 + 2x2 ≤ 900 consists of points on and below the straight
line 5x1 + 2x2 = 900. Similarly, the points that satisfy the inequality
8x1 + 10x2 ≤ 2800 are on and below the straight line 8x1 + 10x2 = 2800.

The feasible region is given by the shaded region ABCO as shown in


Figure 6.1. Obviously, there is an infinite number of feasible points in this

Figure 6.1: Feasible region of Example 6.3.

region. Our objective is to identify the feasible point with the largest value
of the objective function Z.

It has been proved that the maximum value of Z will occur at a vertex
of the feasible region, namely, at one of the points A, B, C, or O; if there
is more than one point at which the maximum occurs, it will be along one

edge of the region, such as AB or BC. Hence, we only have to examine


the value of the objective function Z = 3x1 + 2x2 at the vertices A, B, C,
and O. These vertices are found by determining the points of intersection
of the lines. We obtain:

A(0, 280) : ZA = 3(0) + 2(280) = 560


B(100, 200) : ZB = 3(100) + 2(200) = 700
C(180, 0) : ZC = 3(180) + 2(0) = 540
O(0, 0) : ZO = 3(0) + 2(0) = 0.

The maximum value of Z is 700 at B. Thus, the maximum value of


Z = 3x1 + 2x2 is 700, when x1 = 100 and x2 = 200. The interpretation
of these results is that the maximum weekly profit is $700 and this occurs
when 100 model A mobile phones and 200 model B mobile phones are
manufactured. •

In using the Optimization Toolbox, linprog solves LP problems:

min  Z'*x    subject to:  A*x <= b
 x                        Aeq*x == beq
                          lb <= x <= ub.

In solving LP problems using linprog, we use the following:

Syntax:

>> x = linprog(Z, A, b)

solves min Z'*x such that A*x <= b.

>> x = linprog(Z, A, b, Aeq, beq)

solves min Z'*x such that A*x <= b, Aeq*x == beq. If no inequalities


exist, then set A = [ ] and b = [ ].

>> x = linprog(Z, A, b, Aeq, beq, lb, ub)



defines a set of lower and upper bounds on the design variables, x, so that
the solution is always in the range lb <= x <= ub. If no equalities exist,
then set Aeq = [ ] and beq = [ ].

>> [x, Fval] = linprog(Z, A, b)

returns the value of the objective function at x: Fval = Z'*x.

>> [x, Fval, exitflag] = linprog(Z, A, b)

returns a value exitflag that describes the exit condition.

>> [x, Fval, exitflag, output] = linprog(Z, A, b)

returns a structure containing information about the optimization.

Input Parameters:
Z is the vector of objective function coefficients.
A is the matrix of inequality constraint coefficients.
b is the right-hand side in the inequality constraints.
Aeq is the matrix of equality constraint coefficients.
beq is the right-hand side in the equality constraints.
lb is the vector of lower bounds on the design variables; -Inf == unbounded below.
Empty lb ==> -Inf on all variables.
ub is the vector of upper bounds on the design variables; Inf == unbounded above.
Empty ub ==> Inf on all variables.
Output Parameters:
x is the vector of optimal design parameters.
Fval is the value of the objective function at x.
exitflag describes the exit condition of linprog. If exitflag is:

>0 then linprog converged with a solution x.


=0 then linprog reached the maximum number of iterations
without converging to a solution x.
<0 then the problem was infeasible or linprog failed. For example,

if exitflag = –2, then no feasible point was found.


if exitflag = –3, then the problem is unbounded.
if exitflag = –4, then the NaN value was encountered during execution
of the algorithm.
if exitflag = –5, then both the primal and dual problems are infeasible.

output structure, and the fields of the structure are:


the number of iterations taken,
the type of algorithm used,
the number of conjugate gradient iterations (if used).

Now to reproduce the above results (Example 6.3) using MATLAB, we


do the following:

First, enter the coefficients:

>> Z = [3 2];
>> A = [5 2; 8 10];
>> b = [900; 2800];
Second, evaluate linprog:

>> [x, Fval, exitflag, output] = linprog(-Z, A, b);

MATLAB’s answer is:


optimization terminated successfully.

Now evaluate x and Z as follows:

>> x =
100.0000
200.0000
and

>> Objective = -Fval
   700.0000

Note that the optimization functions in the toolbox minimize the objec-
tive function. To maximize a function Z, apply an optimization function
to minimize −Z. The resulting point where the maximum of Z occurs is
also the point where the minimum of −Z occurs.

We graph the regions specified by the constraints. Let’s put in the two
constraint inequalities:

>> X = 0:200;
>> Y1 = max((900 - 5.*X)./2, 0);
>> Y2 = max((2800 - 8.*X)./10, 0);
>> Ytop = min([Y1; Y2]);
>> area(X, Ytop);

Like the Optimization Toolbox function linprog, MATLAB’s Simulink Toolbox has a built-in function, simlp, that implements the solution of an LP
problem. We will now use simlp on the above problem as follows:

>> Z = [3 2];
>> A = [5 2; 8 10];
>> b = [900; 2800];
>> simlp(−Z, A, b);
>> ans
100.0000
200.0000

This is the same answer we obtained before. Note that we entered the
negative of the coefficient vector for the objective function Z because simlp
also searches for a minimum rather than a maximum.

Example 6.4 Find the maximum value of

Z = 2x1 + x2 ,

subject to the constraints


3x1 + 5x2 ≤ 20
4x1 + x2 ≤ 15
and
x1 ≥ 0, x2 ≥ 0.
Solution. The constraints are represented graphically by the shaded region
ACBO in Figure 6.2. The vertices of the feasible region are found by

Figure 6.2: Feasible region of Example 6.4.

determining the points of intersections of the lines. We get


A(0, 4) :          ZA = 2(0) + 4 = 4

B(15/4, 0) :       ZB = 2(15/4) + 0 = 15/2

C(55/17, 35/17) :  ZC = 2(55/17) + 35/17 = 145/17

O(0, 0) :          ZO = 0 + 0 = 0

Thus, the maximum value of the objective function Z is 145/17; namely, when
x1 = 55/17 and x2 = 35/17. •
To get the above results using MATLAB’s built-in function, simlp, we
do the following:

>> Z = [2 1];
>> A = [3 5; 4 1];
>> b = [20; 15];
>> simlp(−Z, A, b);
>> ans
3.2353
2.0588
In fact, we can compute

>> [x, y] = solve('3*x + 5*y = 20', '4*x + y = 15');
>> x
3.2353
>> y
2.0588
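The Optimization Toolbox function linprog gives the same optimum; a minimal sketch using the data of this example:

>> Z = [2 1];
>> A = [3 5; 4 1];
>> b = [20; 15];
>> lb = [0; 0];
>> [x, Fval] = linprog(-Z, A, b, [ ], [ ], lb);

Here x should be approximately (55/17, 35/17) = (3.2353, 2.0588) and -Fval approximately 145/17, in agreement with the results above.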

6.5.1 Reversed Inequality Constraints


For cases where some or all of the constraints contain inequalities with the
sign reversed (≥ rather than ≤), the ≥ signs can be converted to ≤ signs
by multiplying both sides of the constraints by −1. Thus, the constraint

ai1 x1 + ai2 x2 + · · · + aiN xN ≥ bi

is equivalent to

−ai1 x1 − ai2 x2 − · · · − aiN xN ≤ −bi .

6.5.2 Equality Constraints


For cases where some or all of the constraints contain equalities, the problem can be reformulated by expressing an equality as two inequalities with opposite signs. Thus, the constraint

ai1 x1 + ai2 x2 + · · · + aiN xN = bi

is equivalent to

ai1 x1 + ai2 x2 + · · · + aiN xN ≤ bi


ai1 x1 + ai2 x2 + · · · + aiN xN ≥ bi .
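In MATLAB this rewriting simply stacks the constraint row with its negation; note, however, that linprog also accepts equality constraints directly through its Aeq and beq arguments, so the rewriting is only needed when a pure ≤ form is required. A small sketch with an illustrative row ai and right-hand side bi:

% Rewriting one equality row  ai*x = bi  as two inequalities (a sketch)
ai = [1 2 3];            % illustrative coefficients of the ith constraint
bi = 6;                  % illustrative right-hand side
A_ineq = [ ai; -ai ];    %  ai*x <= bi   and   -ai*x <= -bi
b_ineq = [ bi; -bi ];
% Alternatively, pass the equality to linprog directly:
%   x = linprog(Z, A, b, ai, bi, lb, ub)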

6.5.3 Minimum Value of a Function


An LP problem that involves determining the minimum value of an objec-
tive function Z can be solved by looking for the maximum value of −Z,
the negative of Z, over the same feasible region. Thus, the problem

minimize Z = c1 x1 + c2 x2 + · · · + cN xN

is equivalent to

maximize − Z = (−c1 )x1 + (−c2 )x2 + · · · + (−cN )xN .

Theorem 6.1 (Minimum Value of a Function)

The minimum value of a function Z over a region S occurs at the point(s)


of the maximum value of −Z and is the negative of that maximum value.

Proof. Let Z have a minimum value ZA at the point A in the region.


Then if B is any other point in the region,

ZA ≤ ZB .

Multiply both sides of this inequality by −1 to get

−ZA ≥ −ZB .

This implies that A is a point of maximum value −Z. Furthermore, the


minimum value is ZA , the negative of the maximum value −ZA . The above
steps can be reversed, proving that the converse holds, and this verifies the
result. •

In summary, an LP problem in which

1. the objective function is to be minimized (rather than maximized)


or;

2. the constraints contain equalities (= rather than ≤) or;

3. the constraints contain inequalities with the sign reversed (≥ rather


than ≤);

can be reformulated in terms of the general solution formulation given by


(6.1), (6.2), and (6.3).

The following diet problem is an example of a general LP problem, in


which the objective function is to be minimized and the constraints contain
≥ signs.

Example 6.5 (Diet Problem)

The diet problem arises in the choice of foods for a healthy diet. The prob-
lem is to determine the foods in a diet that minimize the total cost per day,
subject to constraints that ensure minimum daily nutritional requirements.
Let

• M = number of nutrients

• N = number of types of food

• aij = number of units of nutrient i in food j (i = 1, 2, . . . , M ; j = 1, 2, . . . , N )

• bi = number of units of nutrient i required per day (i = 1, 2, . . . , M )

• cj = cost per unit of food j (j = 1, 2, . . . , N )

• xj = number of units of food j in the diet per day (j = 1, 2, . . . , N )

The objective is to find the values of the N variables x1 , x2 , . . . , xN to minimize
the total cost per day, Z. The LP formulation for the diet problem is

minimize Z = c1 x1 + c2 x2 + · · · + cN xN ,
subject to the constraints

a11 x1 + a12 x2 + · · · + a1N xN ≥ b1
a21 x1 + a22 x2 + · · · + a2N xN ≥ b2
  ...
aM1 x1 + aM2 x2 + · · · + aMN xN ≥ bM

and
x1 ≥ 0, x2 ≥ 0, . . . , xN ≥ 0,
where aij , bi , cj (i = 1, 2, . . . , M ; j = 1, 2, . . . , N ) are constants. •
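As an illustration only, the diet model can be handed to linprog after multiplying the ≥ constraints by −1; the numerical data below are purely hypothetical (two nutrients and three foods):

% Hypothetical diet data (a sketch): 2 nutrients (rows), 3 foods (columns)
a    = [2 3 1;               % units of nutrient 1 per unit of each food
        4 1 2];              % units of nutrient 2 per unit of each food
breq = [10; 8];              % minimum daily requirement of each nutrient
c    = [1.5; 2.0; 1.2];      % cost per unit of each food
lb   = zeros(3, 1);
% a*x >= breq is rewritten as -a*x <= -breq before calling linprog:
[x, Fval] = linprog(c, -a, -breq, [], [], lb);   % Fval is the minimum daily cost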

Example 6.6 Consider the inspection problem given by Example 6.2:

minimize Z = 40x1 + 36x2 ,

subject to the constraints

x1 ≤ 8
x2 ≤ 10
5x1 + 3x2 ≥ 45

x1 ≥ 0, x2 ≥ 0.
In this problem, we are interested in determining the values of the vari-
ables x1 and x2 that will satisfy all the restrictions and give the least value
of the objective function. As a first step in solving this problem, we want
to identify all possible values of x1 and x2 that are nonnegative and satisfy
the constraints. For example, a solution x1 = 8 and x2 = 10 is positive
and satisfies all the constraints. In our example, an optimal solution is a
feasible solution which minimizes the objective function 40x1 + 36x2 .

To represent the feasible region in a graph, every constraint is plotted,


and all values of x1 , x2 that will satisfy these constraints are identified. The

nonnegativity constraints imply that all feasible values of the two variables
will be in the first quadrant. The constraint 5x1 + 3x2 ≥ 45 requires that
any feasible solution (x1 , x2 ) to the problem should be on one side of the
straight line 5x1 + 3x2 = 45. The proper side is found by testing whether
the origin satisfies the constraint or not. The line 5x1 + 3x2 = 45 is first
plotted by taking two convenient points (for example, x1 = 0, x2 = 15 and
x1 = 9, x2 = 0).

Similarly, the constraints x1 ≤ 8 and x2 ≤ 10 are plotted. The feasible region is given by the region ACBO as shown in Figure 6.3. Obviously

Figure 6.3: Feasible region of Example 6.6.

there is an infinite number of feasible points in this region. Our objective is


to identify the feasible point with the lowest value of the objective function
Z.

Observe that the objective function, given by Z = 40x1 + 36x2 , represents a straight line if the value of Z is fixed a priori. Changing the value
of Z essentially translates the entire line to another straight line parallel to
itself. In order to determine an optimal solution, the objective function line
is drawn for a convenient value of Z such that it passes through one or more

points in the feasible region. Initially, Z is chosen as 600. By moving this


line closer to the origin, the value of Z is further decreased (Figure 6.3).
The only limitation on this decrease is that the straight line Z = 40x1 +36x2
contains at least one point in the feasible region ABC. This clearly occurs
at the corner point A given by x1 = 8, x2 = 5/3. This is the best feasible
point giving the lowest value of Z as 380. Hence, x1 = 8, x2 = 5/3 is an
optimal solution, and Z = 380 is the optimal value for the LP problem.

Thus, for the inspection problem the optimal utilization is achieved by


using eight Grade 1 inspectors and 1.67 Grade 2 inspectors. The fractional
value x2 = 5/3 suggests that one of the Grade 2 inspectors is utilized only
67% of the time. If this is not possible, the normal practice is to round off
the fractional values to get an optimal integer solution as x1 = 8, x2 = 2.
(In general, rounding off the fractional values will not produce an optimal
integer solution.) •
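The graphical result can be confirmed numerically with linprog; a minimal sketch (the ≥ constraint is multiplied by −1 to obtain the ≤ form required by the solver):

% Inspection problem of Examples 6.2 and 6.6 (a sketch)
Z = [40 36];
A = [ 1  0;                  % x1 <= 8
      0  1;                  % x2 <= 10
     -5 -3];                 % 5*x1 + 3*x2 >= 45  written as  -5*x1 - 3*x2 <= -45
b = [8; 10; -45];
lb = [0; 0];
[x, Fval] = linprog(Z, A, b, [], [], lb);
% x should be approximately [8; 5/3] and Fval approximately 380.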

Unique Optimal Solution

In Example 6.6, the solution x1 = 8, x2 = 5/3 is the only feasible point with
the lowest value of Z. In other words, the values of Z corresponding to the
other feasible solutions in Figure 6.3 exceed the optimal value of 380. Hence,
for this problem, the solution x1 = 8, x2 = 5/3 is the unique optimal solution.

Alternative Optimal Solutions

In some LP problems, there may exist more than one feasible solution such
that their objective values are equal to the optimal values of the linear
program. In such cases, all of these feasible solutions are optimal solu-
tions, and the LP problem is said to have alternative or multiple optimal
solutions. To illustrate this, consider the following LP problem.

Example 6.7 Find the maximum value of

Z = x1 + 2x2 ,

subject to the constraints

x1 + 2x2 ≤ 10
x1 + x2 ≥ 1
x2 ≤ 4

and
x1 ≥ 0, x2 ≥ 0.

Figure 6.4: Feasible region of Example 6.7.

Solution. The feasible region is shown in Figure 6.4. The objective func-
tion lines are drawn for Z = 2, 6, and 10. The optimal value for the LP
problem is 10, and the corresponding objective function line x1 + 2x2 = 10
coincides with side BC of the feasible region. Thus, the corner point fea-
sible solutions x1 = 10, x2 = 0(B), and x1 = 2, x2 = 4(C), and all other
points on the line BC are optimal solutions. •

Unbounded Solution

Some LP problems may not have an optimal solution. In other words,


it is possible to find better feasible solutions continuously improving the
objective function values. This would have been the case if the constraint
x1 + 2x2 ≤ 10 were not given in Example 6.7. In this case, moving farther
away from the origin increases the objective function x1 + 2x2 , and the maximum of Z would be +∞. When there exists no finite optimum, the LP
problem is said to have an unbounded solution.

It is inconceivable for a practical problem to have an unbounded solution, since this implies that one can make infinite profit from a finite
amount of resources. If such a solution is obtained in a practical problem
it generally means that one or more constraints have been omitted inad-
vertently during the initial formulation of the problem. These constraints
would have prevented the objective function from assuming infinite values.

Theorem 6.2 If there exists an optimal solution to an LP problem, then


at least one of the corner points of the feasible region will always qualify to
be an optimal solution. •

Notice that each feasible region we have discussed is such that the whole
of the segment of a straight line joining two points within the region lies
within that region. Such a region is called a convex set. A theorem states that
the feasible region in an LP problem is a convex set (see Figure 6.5).

In the following section, we will use an iterative procedure called the


simplex method for solving an LP problem based on Theorem 6.2. Even
though the feasible region of an LP problem contains an infinite number
of points, an optimal solution can be determined by merely examining the
finite number of corner points in the feasible region. Before we discuss the
simplex method, we discuss the canonical and standard forms of a linear
program.

Figure 6.5: Convex and nonconvex sets in R2 .

6.5.4 LP Problem in Canonical Form


The general LP problem can always be put in the following form, which is
referred to as the canonical form:
\[
\text{maximize } Z = \sum_{j=1}^{N} c_j x_j,
\]

subject to the constraints

\[
\sum_{j=1}^{N} a_{ij} x_j \le b_i, \qquad i = 1, 2, \ldots, M,
\]

\[
x_j \ge 0, \qquad j = 1, 2, \ldots, N.
\]
The characteristics of this form are:
1. All decision variables are nonnegative.
2. All the constraints are of the ≤ form.
3. The objective function is to maximize.

4. All the right-hand sides bi ≥ 0, i = 1, 2, . . . , M.

5. The matrix A contains M identity columns of an M × M identity


matrix I.

6. The objective function coefficients corresponding to those M identity


columns are zero.

Note that the variables corresponding to the M identity columns are called
basic variables and the remaining variables are called nonbasic variables.
The feasible solution obtained by setting the nonbasic variables equal to
zero and using the constraint equations to solve for the basic variables is
called the basic feasible solution.

6.5.5 LP Problem in Standard Form


The standard form of an LP problem with M constraints and N variables
can be represented as follows:

maximize (minimize) Z = c1 x1 + c2 x2 + · · · + cN xN , (6.9)


subject to the constraints

a11 x1 + a12 x2 + · · · + a1N xN = b1
a21 x1 + a22 x2 + · · · + a2N xN = b2
  ...                                              (6.10)
aM1 x1 + aM2 x2 + · · · + aMN xN = bM

and

x1 ≥ 0, x2 ≥ 0, . . . , xN ≥ 0
b1 ≥ 0, b2 ≥ 0, . . . , bM ≥ 0. (6.11)

The main features of the standard form are:

• The objective function is of the maximization or minimization type.

• All constraints are expressed as equations.



• All variables are restricted to be nonnegative.

• The right-hand side constant of each constraint is nonnegative.

In matrix-vector notation, the standard LP problem can be expressed as:

maximize (minimize) Z = cx (6.12)

subject to the constraints


Ax = b (6.13)
and

x≥0
b ≥ 0, (6.14)

where A is an M × N matrix, x is an N × 1 column vector, b is an M × 1
column vector, and c is a 1 × N row vector. In other words,
 
\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1N} \\
a_{21} & a_{22} & \cdots & a_{2N} \\
\vdots & \vdots &        & \vdots \\
a_{M1} & a_{M2} & \cdots & a_{MN}
\end{bmatrix}, \tag{6.15}
\]

\[
b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_M \end{bmatrix}, \quad
c = \begin{bmatrix} c_1 & c_2 & \cdots & c_N \end{bmatrix}, \quad
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}. \tag{6.16}
\]

In practice, A is called a coefficient matrix, x is the decision vector, b


is the requirement vector, and c is the profit (cost) vector of an LP problem.

Note that to convert an LP problem into standard form, each inequality


constraint must be replaced by an equation constraint by introducing new
variables that are slack variables or surplus variables. We illustrate this
procedure using the following problem.

Example 6.8 (Leather Limited)

Leather Limited manufactures two types of belts: the deluxe model and the
regular model. Each type requires 3 square yards of leather. A regular belt
requires 5 hours of skilled labor and a deluxe belt requires 4 hours. Each
week, 55 square yards of leather and 75 hours of skilled labor are available.
Each regular belt contributes $10 to profit and each deluxe belt, $15. For-
mulate the LP problem.

Solution. Let x1 be the number of deluxe belts and x2 be the regular belts
that are produced weekly. Then the appropriate LP problem takes the form:
maximize Z = 10x1 + 15x2 ,
subject to the constraints
3x1 + 3x2 ≤ 55
4x1 + 5x2 ≤ 75
x1 ≥ 0, x2 ≥ 0.
To convert the above inequality constraints to equality constraints, we de-
fine for each ≤ constraint a slack variable ui (ui = slack variable for ith
constraint), which is the amount of the resource unused in the ith con-
straint. Because 3x1 + 3x2 square yards of leather are being used, and 55
square yards are available, we define u1 by
u1 = 55 − 3x1 − 3x2 or 3x1 + 3x2 + u1 = 55.
Similarly, we define u2 by
u2 = 75 − 4x1 − 5x2 or 4x1 + 5x2 + u2 = 75.
Observe that a point (x1 , x2 ) satisfies the ith constraint, if and only if ui ≥
0. Thus, the converted LP problem
maximize Z = 10x1 + 15x2 ,
subject to the constraints
3x1 + 3x2 + u1 = 55
4x1 + 5x2 + u2 = 75

x1 ≥ 0, x2 ≥ 0, u1 ≥ 0, u2 ≥ 0
is in standard form.

In summary, if constraint i of an LP problem is a ≤ constraint, then


we convert it to an equality constraint by adding the slack variable ui to
the ith constraint and adding the sign restriction ui ≥ 0. •
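In matrix terms, adding one slack variable per ≤ row simply appends an identity block to the constraint matrix. A small sketch for the Leather Limited data (the variable names are illustrative):

% Standard form of Example 6.8 (a sketch)
A = [3 3; 4 5];           % coefficients of x1, x2 in the two <= constraints
b = [55; 75];
c = [10 15];              % profit coefficients
Astd = [A eye(2)];        % columns for x1, x2, u1, u2:   Astd*[x; u] = b
cstd = [c 0 0];           % the slack variables carry zero profit
% Every nonnegative solution of Astd*[x; u] = b corresponds to a feasible
% point of the original inequality system, and conversely.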

Now we illustrate how a ≥ constraint can be converted to an equality


constraint. Let us consider the diet problem discussed in Example 6.5. To
convert the ith ≥ constraint to an equality constraint, we define an excess
variable (surplus variable) vi (vi will always be the excess variable for the
ith constraint). We define vi to be the amount by which the ith constraint
is oversatisfied. Thus, for the diet problem,

v1 = a11 x1 + a12 x2 + · · · + a1N xN − b1

or
a11 x1 + a12 x2 + · · · + a1N xN − v1 = b1 .
We do the same for the other remaining ≥ constraints; the converted
standard form of the diet problem after adding the sign restrictions vi ≥
0(i = 1, 2, . . . , M ) may be written as

minimize Z = c1 x1 + c2 x2 + · · · + cN xN ,

subject to the constraints

a11 x1 + · · · + a1N xN − v1 = b1
a21 x1 + · · · + a2N xN − v2 = b2
  ...
aM1 x1 + · · · + aMN xN − vM = bM

and
xi ≥ 0, vi ≥ 0 (i = 1, 2, . . . , M ).
A point (x1 , x2 , . . . , xN ) satisfies the ith ≥ constraint, if and only if vi is
nonnegative.

In summary, if the ith constraint of an LP problem is a ≥ constraint,


then it can be converted to an equality constraint by subtracting an excess
variable vi from the ith constraint and adding the sign restriction vi ≥ 0.

If an LP problem has both ≤ and ≥ constraints, then simply apply the


procedures we have described to the individual constraints. For example,
the LP problem
maximize Z = 55x1 + 60x2 ,

subject to the constraints

x1 ≤ 30
x2 ≤ 45
15x1 + 25x2 ≤ 70
30x1 + 35x2 ≥ 90

x1 ≥ 0, x2 ≥ 0

can be easily transformed into standard form by adding slack variables


u1 , u2 , and u3 , respectively, to the first three constraints and subtracting
an excess variable v4 from the fourth constraint. Then we add the sign
restrictions
u1 ≥ 0, u2 ≥ 0, u3 ≥ 0, v4 ≥ 0.

This yields the following LP problem in standard form

maximize Z = 55x1 + 60x2 ,

subject to the constraints

x1 + u1 = 30
x2 + u2 = 45
15x1 + 25x2 + u3 = 70
30x1 + 35x2 − v4 = 90

x1 ≥ 0, x2 ≥ 0, u1 ≥ 0, u2 ≥ 0, u3 ≥ 0, v4 ≥ 0.

6.5.6 Some Important Definitions


Let us review the basic definitions using the standard form of an LP prob-
lem given by
maximize Z = cx,
subject to the constraints
Ax = b
x ≥ 0.
1. Feasible Solution. A feasible solution is a nonnegative vector x sat-
isfying the constraints Ax = b.
2. Feasible Region. The feasible region, denoted by S, is the set of all
feasible solutions. Mathematically,

S = {x | Ax = b, x ≥ 0}.

If the feasible set S is empty, then the linear program is said to be


infeasible.
3. Optimal Solution. An optimal solution is a vector x0 such that it is
feasible and its value of the objective function (cx0 ) is larger than
that of any other feasible solution. Mathematically, x0 is optimal, if
and only if x0 ∈ S and cx0 ≥ cx for all x ∈ S.
4. Optimal Value. The optimal value of a linear program is the value of
the objective function corresponding to the optimal solution. If Z 0
is the optimal value, then Z 0 = cx0 .
5. Alternate Optimum. When a linear program has more than one op-
timal solution, it is said to have an alternate optimal solution. In
this case, there exist more than one feasible solution having the same
optimal value (Z 0 ) for their objective functions.
6. Unique Optimum. The optimal solution of a linear program is said
to be unique when there exists no other optimal solution.
7. Unbounded Solution. When a linear program does not possess a finite
optimum (i.e., Zmax → ∞), it is said to have an unbounded solution.

6.6 The Simplex Method


The graphical method of solving an LP problem introduced in the last
section has its limitations. The method demonstrated for two variables
can be extended to LP problems involving three variables, but for prob-
lems involving more than two variables, the graphical approach becomes
impractical. Here, we introduce the other approach called the simplex
method, which is an algebraic method that can be used for any number
of variables. This method was developed by George B. Dantzig in 1947.

It can be used to solve maximization or minimization problems with any


standard constraints.

Before proceeding further with our discussion of the simplex algo-
rithm, we must define the concept of a basic solution to a linear system
(6.13).

6.6.1 Basic and Nonbasic Variables


Consider a linear system Ax = b of M linear equations in N variables
(assume N ≥ M ).

Definition 6.4 (Basic Solution)

A basic solution to Ax = b is obtained by setting N − M variables equal to


0 and solving for the values of the remaining M variables. This assumes
that setting the N − M variables equal to 0 yields unique values for the
remaining M variables or, equivalently, the columns for the remaining M
variables are linearly independent. •

To find a basic solution to Ax = b, we choose a set of N − M variables


(the nonbasic variables) and set each of these variables equal to 0. Then
we solve for the values of the remaining N − (N − M ) = M variables (the
basic variables) that satisfy Ax = b.

Definition 6.5 (Basic Feasible Solution)

Any basic solution to a linear system (6.13) in which all variables are
nonnegative is a basic feasible solution. •

The simplex method deals only with basic feasible solutions in the sense
that it moves from one basic solution to another. Each basic solution is
associated with an iteration. As a result, the maximum number of itera-
tions in the simplex method cannot exceed the number of basic solutions
of the standard form. We can thus conclude that the maximum number of
iterations cannot exceed
 
\[
\binom{N}{M} = \frac{N!}{(N - M)!\,M!}.
\]

The basic–nonbasic swap gives rise to two suggestive concepts: The en-
tering variable is a current nonbasic variable that will “enter” the set of
basic variables at the next iteration. The leaving variable is a current basic
variable that will “leave” the basic solution in the next iteration.
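For small systems the definitions can be explored by brute force: choose M columns of A, check that they are linearly independent, and solve for the corresponding basic variables. The sketch below enumerates all basic solutions of the standard form of Example 6.8 and flags the basic feasible ones.

% Enumerating the basic solutions of A*x = b (a brute-force sketch)
A = [3 3 1 0;                       % standard form of Example 6.8 (M = 2, N = 4)
     4 5 0 1];
b = [55; 75];
[M, N] = size(A);
combos = nchoosek(1:N, M);          % every choice of M candidate basic columns
for k = 1:size(combos, 1)
    cols = combos(k, :);
    B = A(:, cols);
    if abs(det(B)) < 1e-10          % columns are not linearly independent: skip
        continue
    end
    x = zeros(N, 1);
    x(cols) = B \ b;                % nonbasic variables remain at zero
    feasible = all(x >= -1e-10);    % basic feasible solution if all values are nonnegative
    fprintf('basis {%s}:  x = [%s],  feasible = %d\n', ...
            num2str(cols), num2str(x', ' %g'), feasible);
end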

Definition 6.6 (Adjacent Basic Feasible Solution)

For any LP problem with M constraints, two basic feasible solutions are
said to be adjacent if their sets of basic variables have M −1 basic variables
in common. In other words, an adjacent feasible solution differs from the
present basic feasible solution in exactly one basic variable. •

We now give a general description of how the simplex algorithm solves LP


problems.

6.6.2 The Simplex Algorithm


1. Set up the initial simplex tableau.

2. Locate the entry in the last row, other than the last element, that is
negative and largest in magnitude (if two or more entries share this
property, any one of these can be selected). If no such negative entry
exists (i.e., all these entries are nonnegative), the tableau is in final form.

3. Divide each positive element in the column defined by this negative


entry into the corresponding element of the last column.

4. Select the divisor that yields the smallest quotient. This element is
called the pivot element (if two or more elements share this property,
any one of these can be selected as the pivot).

5. Now use row operations to create a 1 in the pivot location and zeros


elsewhere in the pivot column.

6. Repeat steps 2-5 until all such negative elements have been eliminated
from the last row. The final matrix is called the final simplex tableau
and it leads to the optimal solution.
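The steps above translate almost line by line into MATLAB. The function below is only a teaching sketch (it is not part of the Optimization Toolbox): it assumes a maximization problem already in the canonical form, maximize Z = c'*x subject to A*x <= b, x >= 0 with b >= 0, and it omits anti-cycling safeguards and input checking.

function [x, zmax] = simplex_max(c, A, b)
% SIMPLEX_MAX  Tableau simplex sketch for:  max c'*x  s.t.  A*x <= b, x >= 0, b >= 0.
%   A is M-by-N, b is M-by-1 (nonnegative), c is N-by-1.
[m, n] = size(A);
T = [A, eye(m), b; -c', zeros(1, m), 0];   % initial tableau with slack columns
basis = n + (1:m);                         % the slack variables form the first basis
while true
    [minval, q] = min(T(end, 1:n+m));      % most negative entry of the last row
    if minval >= -1e-12
        break                              % no negative entry left: tableau is final
    end
    col = T(1:m, q);
    if all(col <= 1e-12)
        error('The problem is unbounded.');
    end
    ratios = inf(m, 1);
    pos = col > 1e-12;
    ratios(pos) = T(pos, end) ./ col(pos); % ratio test over the positive entries
    [minratio, p] = min(ratios);           % row of the pivot element
    T(p, :) = T(p, :) / T(p, q);           % create a 1 in the pivot location
    for i = [1:p-1, p+1:m+1]               % create zeros elsewhere in the pivot column
        T(i, :) = T(i, :) - T(i, q) * T(p, :);
    end
    basis(p) = q;                          % the entering variable replaces the leaving one
end
x = zeros(n + m, 1);
x(basis) = T(1:m, end);
x = x(1:n);                                % drop the slack values
zmax = T(end, end);                        % bottom-right entry is the maximum of Z

Saving this sketch as simplex_max.m makes it callable from the command line; it is used again after Example 6.9.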

Example 6.9 Determine the maximum value of the function

Z = 3x1 + 5x2 + 8x3 ,

subject to the constraints

x1 + x2 + x3 ≤ 100
3x1 + 2x2 + 4x3 ≤ 200
x1 + 2x2 ≤ 150

x1 ≥ 0, x2 ≥ 0, x3 ≥ 0.
Solution. Take the three slack variables u1 , u2 , and u3 , which must be
added to the given 3 constraints to get the standard constraints, which may
be written in the LP problem as

x1 + x2 + x3 + u 1 = 100
3x1 + 2x2 + 4x3 + u2 = 200
x1 + 2x2 + u3 = 150

x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, u1 ≥ 0, u2 ≥ 0, u3 ≥ 0.
The objective function Z = 3x1 + 5x2 + 8x3 is rewritten in the form

−3x1 − 5x2 − 8x3 + Z = 0.



Thus, the entire problem now becomes that of determining the solution to
the following system of equations:
x1 + x2 + x3 + u1 = 100
3x1 + 2x2 + 4x3 + u2 = 200
x1 + 2x2 + u3 = 150
−3x1 − 5x2 − 8x3 + Z = 0.
Since we know that the simplex algorithm starts with an initial basic
feasible solution, by inspection, we see that if we set nonbasic variables
x1 = x2 = x3 = 0, we can solve for the values of the basic variables
u1 , u2 , u3 . So the basic feasible solution for the basic variables is
u1 = 100, u2 = 200, u3 = 150, x1 = x2 = x3 = 0.
It is important to observe that each basic variable may be associated
with the row of the canonical form in which the basic variable has a coef-
ficient of 1. Thus, for the initial canonical form, u1 may be thought of as
the basic variable for row 1, as may u2 for row 2, and u3 for row 3. To
perform the simplex algorithm, we also need a basic (although not neces-
sarily nonnegative) variable for the last row. Since Z appears in the last
row with a coefficient of 1, and Z does not appear in any other row, we use
Z as its basic variable. With this convention, the basic feasible solution for
our initial canonical form has
basic variables {u1 , u2 , u3 , Z} and nonbasic variables {x1 , x2 , x3 }.
For this basic feasible solution
u1 = 100, u2 = 200, u3 = 150, Z = 0, x1 = x2 = x3 = 0
Note that a slack variable can be used as a basic variable for an equation
if the right-hand side of the constraint is nonnegative.

Thus, the simplex tableaus are as follows:


basis    x1    x2    x3    u1    u2    u3    Z    constants
u1        1     1     1     1     0     0    0        100
u2        3     2    [4]    0     1     0    0        200
u3        1     2     0     0     0     1    0        150
Z        -3    -5    -8     0     0     0    1          0

basis    x1    x2    x3    u1    u2    u3    Z    constants
u1        1     1     1     1     0     0    0        100
u2       3/4   1/2    1     0    1/4    0    0         50
u3        1     2     0     0     0     1    0        150
Z        -3    -5    -8     0     0     0    1          0

basis    x1    x2    x3    u1    u2    u3    Z    constants
u1       1/4   1/2    0     1   -1/4    0    0         50
x3       3/4   1/2    1     0    1/4    0    0         50
u3        1     2     0     0     0     1    0        150
Z         3    -1     0     0     2     0    1        400

basis    x1    x2    x3    u1    u2    u3    Z    constants
u1       1/4   1/2    0     1   -1/4    0    0         50
x3       3/4   1/2    1     0    1/4    0    0         50
u3        1    [2]    0     0     0     1    0        150
Z         3    -1     0     0     2     0    1        400

basis    x1    x2    x3    u1    u2    u3    Z    constants
u1        0     0     0     1   -1/4  -1/4   0       25/2
x3       1/2    0     1     0    1/4  -1/4   0       25/2
x2       1/2    1     0     0     0    1/2   0         75
Z        7/2    0     0     0     2    1/2   1        475

(Bracketed entries mark the pivot elements.)

Since all negative elements have been eliminated from the last row, the final
tableau gives the following system of equations:
u1 − (1/4)u2 − (1/4)u3 = 25/2

(1/2)x1 + x3 + (1/4)u2 − (1/4)u3 = 25/2

(1/2)x1 + x2 + (1/2)u3 = 75

(7/2)x1 + 2u2 + (1/2)u3 + Z = 475.

The constraints are

x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, u1 ≥ 0, u2 ≥ 0, u3 ≥ 0.

The final equation, under the constraints, implies that Z has a maximum
value of 475 when x1 = 0, u2 = 0, u3 = 0. On substituting these values back
into the equations, we get
x2 = 75,  x3 = 25/2,  u1 = 25/2.

Thus, Z = 3x1 + 5x2 + 8x3 has a maximum value of 475 at

x1 = 0,  x2 = 75,  x3 = 25/2.
Note that the element in the last row and the last column of the final tableau
will always correspond to the maximum value of the objective function Z. •
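As a check, the tableau computation can be reproduced with the simplex_max sketch given after the algorithm in Section 6.6.2 (or, equivalently, with linprog):

>> c = [3; 5; 8];
>> A = [1 1 1; 3 2 4; 1 2 0];
>> b = [100; 200; 150];
>> [x, zmax] = simplex_max(c, A, b)

This should return x ≈ (0, 75, 25/2) and zmax = 475, in agreement with the final tableau.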

In the following example, we illustrate the application of the simplex


method when there are many optimal solutions.

Example 6.10 Determine the maximum value of the function

Z = 8x1 + 2x2 ,

subject to the constraints

4x1 + x2 ≤ 32
4x1 + 3x2 ≤ 48

x1 ≥ 0, x2 ≥ 0.
Solution. Take the two slack variables u1 and u2 , which must be added
to the given 2 constraints to get the standard constraints, which may be
written in the LP problem as

4x1 + x2 + u1 = 32
4x1 + 3x2 + u2 = 48

x1 ≥ 0, x2 ≥ 0, u1 ≥ 0, u2 ≥ 0.
The objective function Z = 8x1 + 2x2 is rewritten in the form

−8x1 − 2x2 + Z = 0.

Thus, the entire problem now becomes that of determining the solution to
the following system of equations:

4x1 + x2 + u1 = 32
4x1 + 3x2 + u2 = 48
−8x1 − 2x2 + Z = 0.

The simplex tableaus are as follows:


basis    x1    x2    u1    u2    Z    constants
u1       [4]    1     1     0    0        32
u2        4     3     0     1    0        48
Z        -8    -2     0     0    1         0

basis    x1    x2    u1    u2    Z    constants
u1        1    1/4   1/4    0    0         8
u2        4     3     0     1    0        48
Z        -8    -2     0     0    1         0

basis    x1    x2    u1    u2    Z    constants
x1        1    1/4   1/4    0    0         8
u2        0     2    -1     1    0        16
Z         0     0     2     0    1        64

(The bracketed entry marks the pivot element.)

Since all negative elements have been eliminated from the last row, the
final tableau gives the following system of equations:
x1 + (1/4)x2 + (1/4)u1 = 8

2x2 − u1 + u2 = 16

2u1 + Z = 64,

with the constraints

x1 ≥ 0, x2 ≥ 0, u1 ≥ 0, u2 ≥ 0.

The last equation implies that Z has a maximum value of 64 when u1 = 0.


On substituting these values back into the equations, we get
x1 + (1/4)x2 = 8

2x2 + u2 = 16,

with the constraints

x1 ≥ 0, x2 ≥ 0, u2 ≥ 0.

Any point (x1 , x2 ) that satisfies these conditions is an optimal solution.


Thus, Z = 8x1 + 2x2 has a maximum value of 64. This is achieved at any
point on the line x1 + (1/4)x2 = 8 between (6, 8) and (8, 0). •

To use the simplex method, set 'LargeScale' to 'off' and 'Simplex' to 'on' in
options:

>> options = optimset('LargeScale', 'off', 'Simplex', 'on');

Then call the function linprog with the options input argument:

>> Z = [8 2];
>> A = [4 1; 4 3];
>> b = [32; 48];
>> lb = [0; 0]; ub = [20; 20];
>> [x, Fval, exitflag, output] = linprog(-Z, A, b, [ ], [ ], lb, ub, [ ], options);

6.6.3 Simplex Method for Minimization Problem


In the last two examples, we used the simplex method for finding the
maximum value of the objective function Z. In the following, we will
apply the method for the minimization problem.

Example 6.11 Determine the minimum value of the function


Z = −2x1 + x2 ,
subject to the constraints
2x1 + x2 ≤ 20
x1 − x2 ≤ 4
−x1 + x2 ≤ 5
x1 ≥ 0, x2 ≥ 0.
Solution. We can solve this LP problem using two different approaches.

First Approach: Put Z1 = −Z, then minimizing Z is equivalent to


maximizing Z1. For this first approach, find Z1max , then Zmin = −Z1max .
Let
Z1 = −Z = 2x1 − x2 ,
then the problem reduces to maximize Z1 = 2x1 − x2 under the same con-
straints. Introducing slack variables, we have
2x1 + x2 + u1 = 20
x1 − x2 + u2 = 4
−x1 + x2 + u3 = 5
−2x1 + x2 + Z1 = 0
x1 ≥ 0, x2 ≥ 0, u1 ≥ 0, u2 ≥ 0, u3 ≥ 0.
The simplex tableaus are as follows:
basis    x1    x2    u1    u2    u3    Z1    constants
u1        2     1     1     0     0     0        20
u2       [1]   -1     0     1     0     0         4
u3       -1     1     0     0     1     0         5
Z1       -2     1     0     0     0     1         0

basis    x1    x2    u1    u2    u3    Z1    constants
u1        0    [3]    1    -2     0     0        12
x1        1    -1     0     1     0     0         4
u3        0     0     0     1     1     0         9
Z1        0    -1     0     2     0     1         8

basis    x1    x2    u1    u2    u3    Z1    constants
x2        0     1    1/3  -2/3    0     0         4
x1        1     0    1/3   1/3    0     0         8
u3        0     0     0     1     1     0         9
Z1        0     0    1/3   4/3    0     1        12

(Bracketed entries mark the pivot elements.)

Thus, the final tableau gives the following system of equations:


x2 + (1/3)u1 − (2/3)u2 = 4

x1 + (1/3)u1 + (1/3)u2 = 8

u2 + u3 = 9

(1/3)u1 + (4/3)u2 + Z1 = 12,
with the constraints

x1 ≥ 0, x2 ≥ 0, u1 ≥ 0, u2 ≥ 0, u3 ≥ 0.

The last equation implies that Z1 has a maximum value of 12 when


u1 = u2 = 0. Thus, Z1 = 2x1 − x2 has a maximum value of 12 at x1 = 8
and x2 = 4. Since Z = −Z1 = −12, the minimum value of the objective
function Z = −2x1 + x2 is −12.

Second Approach: To decrease Z, we have to pick out the largest posi-


tive entry in the bottom row to find a pivotal column. Thus, the problem
now becomes that of determining the solution to the following system of
equations:

2x1 + x2 + u1 = 20
x1 − x2 + u2 = 4
−x1 + x2 + u3 = 5
2x1 − x2 + Z = 0.

The simplex tableaus are as follows:


basis    x1    x2    u1    u2    u3    Z    constants
u1        2     1     1     0     0    0        20
u2       [1]   -1     0     1     0    0         4
u3       -1     1     0     0     1    0         5
Z         2    -1     0     0     0    1         0

basis    x1    x2    u1    u2    u3    Z    constants
u1        0    [3]    1    -2     0    0        12
x1        1    -1     0     1     0    0         4
u3        0     0     0     1     1    0         9
Z         0     1     0    -2     0    1        -8

basis    x1    x2    u1    u2    u3    Z    constants
x2        0     1    1/3  -2/3    0    0         4
x1        1     0    1/3   1/3    0    0         8
u3        0     0     0     1     1    0         9
Z         0     0   -1/3  -4/3    0    1       -12

(Bracketed entries mark the pivot elements.)
Thus, Z = −2x1 + x2 has a minimum value of −12 at x1 = 8 and x2 = 4.•
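Since linprog minimizes by default, the problem can also be handed to it directly, without the sign change used in the first approach; a minimal sketch:

>> Z = [-2 1];
>> A = [2 1; 1 -1; -1 1];
>> b = [20; 4; 5];
>> lb = [0; 0];
>> [x, Fval] = linprog(Z, A, b, [ ], [ ], lb);

Here x should be approximately [8; 4] and Fval approximately −12, in agreement with both tableau computations.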

6.7 Unrestricted in Sign Variables


In solving an LP problem with the simplex method, we used the ratio test
to determine the row in which an entering variable becomes a basic vari-
able. Recall that the ratio test depends on the fact that any feasible point
requires all variables to be nonnegative. Thus, if some variables are allowed
to be unrestricted in sign, the ratio test and therefore the simplex method
are no longer valid. Here, we show how an LP problem with unrestricted in
sign variables can be transformed into an LP problem in which all variables
are required to be nonnegative.

For each unrestricted in sign variable xi , we begin by defining two new
variables x'i and x''i . Then substitute x'i − x''i for xi in each constraint and in
the objective function. Also, add the sign restrictions x'i ≥ 0 and x''i ≥ 0.
Now all the variables are nonnegative; therefore, we can use the simplex
method. Note that each basic feasible solution can have either x'i > 0 (and
x''i = 0), or x''i > 0 (and x'i = 0), or x'i = x''i = 0 (xi = 0).

Example 6.12 Consider the following LP problem:

maximize Z = 30x1 − 4x2 ,

subject to the constraints

5x1 − x2 ≤ 30
x1 ≤ 5

x1 ≥ 0, x2 unrestricted.
Solution. Since x2 is unrestricted in sign, we replace x2 by x'2 − x''2 in the
objective function, and in the first constraint we obtain

maximize Z = 30x1 − 4x'2 + 4x''2 ,

subject to the constraints

5x1 − x'2 + x''2 ≤ 30

x1 ≤ 5

x1 ≥ 0, x'2 ≥ 0, x''2 ≥ 0.
Now convert the problem into standard form by adding two slack vari-
ables, u1 and u2 , in the first and second constraints, respectively, and we
get
maximize Z = 30x1 − 4x'2 + 4x''2 ,

subject to the constraints

5x1 − x'2 + x''2 + u1 = 30

x1 + u2 = 5

x1 ≥ 0, x'2 ≥ 0, x''2 ≥ 0, u1 ≥ 0, u2 ≥ 0.
The simplex tableaus are as follows:
basis    x1    x'2   x''2   u1    u2    Z    constants
u1        5    -1     1     1     0    0        30
u2       [1]    0     0     0     1    0         5
Z       -30     4    -4     0     0    1         0

basis    x1    x'2   x''2   u1    u2    Z    constants
u1        0    -1    [1]    1    -5    0         5
x1        1     0     0     0     1    0         5
Z         0     4    -4     0    30    1       150

basis    x1    x'2   x''2   u1    u2    Z    constants
x''2      0    -1     1     1    -5    0         5
x1        1     0     0     0     1    0         5
Z         0     0     0     4    10    1       170

(Bracketed entries mark the pivot elements.)
We now have an optimal solution to the LP problem given by x1 = 5, x'2 = 0,
x''2 = 5, u1 = u2 = 0, and maximum Z = 170; that is, x2 = x'2 − x''2 = −5.

Note that the variables x'2 and x''2 will never both be basic variables in
the same tableau. •
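The substitution x2 = x'2 − x''2 is easy to reproduce with linprog; in the sketch below the three columns correspond to x1, x'2, and x''2 (newer MATLAB releases also allow a free variable directly by setting its lower bound to -Inf):

% Example 6.12: the free variable is split as x2 = x2p - x2pp (a sketch)
f  = [30; -4; 4];                % objective coefficients for x1, x2p, x2pp
A  = [5 -1 1;                    % 5*x1 - x2p + x2pp <= 30
      1  0 0];                   % x1 <= 5
b  = [30; 5];
lb = zeros(3, 1);
[y, Fval] = linprog(-f, A, b, [], [], lb);   % maximize by negating f
x1 = y(1);   x2 = y(2) - y(3);   % recover the original variables
maxZ = -Fval;                    % should be 170, with x1 = 5 and x2 = -5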

6.8 Finding a Feasible Basis


A major requirement of the simplex method is the availability of an initial
basic feasible solution in canonical form. Without it, the initial simplex
tableau cannot be found. There are two basic approaches to finding an
initial basic feasible solution.

6.8.1 By Trial and Error


Here, a basic variable is chosen arbitrarily for each constraint, and the
system is reduced to canonical form with respect to those basic variables. If
the resulting canonical system gives a basic feasible solution (i.e., the right-
hand side constants are nonnegative), then the initial tableau can be set up
to start the simplex method. It is also possible that during the canonical
reduction some of the right-hand side constants may become negative. In
that case, the basic solution obtained will be infeasible, and the simplex
method cannot be started. Of course, one can repeat the process by trying
a different set of basic variables for the canonical reduction and hope for
a basic feasible solution. Now it is clearly obvious that the trial and error
method is very inefficient and expensive. In addition, if a problem does
not possess a feasible solution, it will take a long time to realize this.

6.8.2 Use of Artificial Variables


This is a systematic way of getting a canonical form with a basic feasible
solution when none is available by inspection. First, an LP problem is
converted to standard form such that all the variables are nonnegative,
the constraints are equations, and all the right-hand side constants are
nonnegative. Then each constraint is examined for the existence of a basic
variable. If none is available, a new variable is added to act as the basic
variable in that constraint. In the end, all the constraints will have a
basic variable, and by definition we have a canonical system. Since the
right-hand side elements are nonnegative, an initial simplex tableau can
be formed readily. Of course, the additional variables have no meaning to
the original problem. These are merely added so that we will have a ready
canonical system to start the simplex method. Hence, these variables are
termed artificial variables as opposed to the real decision variables in the
problem. Eventually they will be forced to zero lest they unbalance the
equations. To illustrate the use of artificial variables, consider the following
LP problem:
Example 6.13 Consider the minimization problem

minimize Z = −3x1 + x2 + x3 ,

subject to the constraints

x1 − 2x2 + x3 ≤ 11
−4x1 + x2 + 2x3 ≥ 3
2x1 − x3 = −1
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0.
First, the problem is converted to the standard form as follows:

minimize Z = −3x1 + x2 + x3 ,

subject to the constraints

x1 − 2x2 + x3 + u1 = 11
−4x1 + x2 + 2x3 − v2 = 3
−2x1 + x3 = 1

x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, u1 ≥ 0, v2 ≥ 0.
The slack variable u1 in the first constraint equation is a basic variable.
Since there are no basic variables in the other constraint equations, we add
artificial variables, w3 and w4 , to the second and third constraint equations,
respectively. To retain the standard form, w3 and w4 will be restricted to
be nonnegative. Thus, we now have an artificial system given by:

x1 − 2x2 + x3 + u1 = 11
−4x1 + x2 + 2x3 − v2 + w3 = 3
−2x1 + x3 + w4 = 1

x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, u1 ≥ 0, v2 ≥ 0, w3 ≥ 0, w4 ≥ 0.
The artificial system has a basic feasible solution in canonical form given
by
x1 = x2 = x3 = 0, u1 = 11, v2 = 0, w3 = 3, w4 = 1.
But this is not a feasible solution to the original problem due to the presence
of the artificial variables w3 and w4 at positive values. •

On the other hand, it is easy to see that any basic feasible solution
to the artificial system in which the artificial variables (w3 and w4 in the
above example) are zero is automatically a basic feasible solution to the
original problem. Hence, the object is to reduce the artificial variables
to zero as soon as possible. This can be accomplished in two ways, and
each one gives rise to a variant of the simplex method, the Big M simplex
method and the Two-Phase simplex method.

6.9 Big M Simplex Method


In this approach, the artificial variables are assigned a very large cost in
the objective function. The simplex method, while trying to improve the
objective function, will find the artificial variables uneconomical to main-
tain as basic variables with positive values. Hence, they will be quickly
replaced in the basis by the real variables with smaller costs. For hand cal-
culations it is not necessary to assign a specific cost value to the artificial
variables. The general practice is to assign the letter M as the cost in a
minimization problem, and −M as the profit in a maximization problem,


with the assumption that M is a very large positive number.

The following steps describe the Big M simplex method:


1. Modify the constraints so that the right-hand side of each constraint
is nonnegative. This requires that each constraint with a negative
right-hand side be multiplied through by −1.
2. Convert each inequality constraint to standard form. This means
that if constraint i is a ≤ constraint, we add a slack variable ui , and
if i is a ≥ constraint, we subtract a surplus variable vi .
3. If (after step 1 has been completed) constraint i is a ≥ or = constraint,
add an artificial variable wi . Also, add the sign restriction wi ≥ 0.
4. Let M denote a very large positive number. If an LP problem is a
minimization problem, add (for each artificial variable) M wi to the
objective function. If an LP problem is a maximization problem, add
(for each artificial variable) −M wi to the objective function.
5. Since each artificial variable will be in the starting basis, all artificial
variables must be eliminated from the last row before beginning the
simplex method. This ensures that we begin with the canonical form.
In choosing the entering variable, remember that M is a very large
positive number. Now solve the transformed problem by the simplex
method. If all artificial variables are equal to zero in the optimal
solution, we have found the optimal solution to the original problem.
If any artificial variables are positive in the optimal solution, the
original problem is infeasible.
Example 6.14 To illustrate the Big M simplex method, let us consider
the standard form of Example 6.13:
minimize Z = −3x1 + x2 + x3 ,
subject to the constraints
x1 − 2x2 + x3 + u1 = 11
−4x1 + x2 + 2x3 − v2 = 3
−2x1 + x3 = 1

x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, u1 ≥ 0, v2 ≥ 0.
Solution. In order to derive the artificial variables to zero, a large cost
will be assigned to w3 and w4 so that the objective function becomes:

minimize Z = −3x1 + x2 + x3 + M w3 + M w4 ,

where M is a very large positive number. Thus, the LP problem with its
artificial variables becomes:

minimize Z = −3x1 + x2 + x3 + M w3 + M w4 ,

subject to the constraints

x1 − 2x2 + x3 + u1 = 11
−4x1 + x2 + 2x3 − v2 + w 3 = 3
−2x1 + x3 + w4 = 1

x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, u1 ≥ 0, v2 ≥ 0, w3 ≥ 0, w4 ≥ 0.
Note the reason behind the use of the artificial variables. We have three
equations and seven unknowns. Hence, the starting basic solution must
include 7−3 = 4 zero variables. If we put x1 , x2 , x3 , and v2 at the zero level,
we immediately obtain the solution u1 = 11, w3 = 3, and w4 = 1, which is
the required starting feasible solution. Having constructed a starting feasible
solution, we must “condition” the problem so that when we put it in tabular
form, the right-hand side column will render the starting solution directly.
This is done by using the constraint equations to substitute out w3 and w4
in the objective function. Thus,

w3 = 3 + 4x1 − x2 − 2x3 + v2
w4 = 1 + 2x1 − x3 .

The objective function thus becomes

Z = −3x1 + x2 + x3 + M (3 + 4x1 − x2 − 2x3 + v2 ) + M (1 + 2x1 − x3 )

or
Z = (−3 + 6M )x1 + (1 − M )x2 + (1 − 3M )x3 + M v2 + 4M,

and the Z-equation now appears in the tableau as


Z − (−3 + 6M )x1 − (1 − M )x2 − (1 − 3M )x3 − M v2 = 4M.
Now we can see that at the starting solution, given x1 = x2 = x3 = v2 = 0,
the value of Z is 4M , as it should be when u1 = 11, w3 = 3, and w4 = 1.

The sequence of tableaus leading to the optimum solution is shown in


the following:

basis    x1      x2      x3      u1    v2    w3      w4     Z    constants
u1        1      -2       1       1     0     0       0     0        11
w3       -4       1       2       0    -1     1       0     0         3
w4       -2       0      [1]      0     0     0       1     0         1
Z       3-6M    -1+M    -1+3M     0    -M     0       0     1        4M

basis    x1      x2      x3      u1    v2    w3      w4     Z    constants
u1        3      -2       0       1     0     0      -1     0        10
w3        0      [1]      0       0    -1     1      -2     0         1
x3       -2       0       1       0     0     0       1     0         1
Z         1     -1+M      0       0    -M     0     1-3M    1       M+1

basis    x1      x2      x3      u1    v2    w3      w4     Z    constants
u1       [3]      0       0       1    -2     2      -5     0        12
x2        0       1       0       0    -1     1      -2     0         1
x3       -2       0       1       0     0     0       1     0         1
Z         1       0       0       0    -1    1-M    -1-M    1         2

(Bracketed entries mark the pivot elements.)
Now both the artificial variables w3 and w4 have been reduced to zero. Thus,
Tableau 3 represents a basic feasible solution to the original problem. Of
course, this is not an optimal solution since x1 can reduce the objective
function further by replacing u1 in the basis.
basis    x1    x2    x3     u1     v2     w3      w4     Z    constants
x1        1     0     0    1/3   -2/3    2/3    -5/3     0         4
x2        0     1     0     0     -1      1      -2      0         1
x3        0     0     1    2/3   -4/3    4/3    -7/3     0         9
Z         0     0     0   -1/3   -1/3   1/3-M   2/3-M    1        -2

Tableau 4 is optimal, and the unique optimal solution is given by x1 = 4,
x2 = 1, x3 = 9, u1 = 0, v2 = 0, and the minimum Z = −2. •
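The Big M result can be verified independently with linprog, which accepts the ≥ and = constraints directly and therefore needs no artificial variables; a minimal sketch for the data of Example 6.13:

% Checking the optimum of Example 6.14 with linprog (a sketch)
c   = [-3; 1; 1];
A   = [ 1 -2  1;                 % x1 - 2*x2 + x3 <= 11
        4 -1 -2];                % -4*x1 + x2 + 2*x3 >= 3  written as  4*x1 - x2 - 2*x3 <= -3
b   = [11; -3];
Aeq = [2 0 -1];                  % 2*x1 - x3 = -1
beq = -1;
lb  = zeros(3, 1);
[x, Fval] = linprog(c, A, b, Aeq, beq, lb);
% x should be approximately [4; 1; 9] and Fval approximately -2.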

Note that an artificial variable is added merely to act as a basic variable


in a particular equation. Once it is replaced by a real (decision) variable,
there is no need to retain the artificial variable in the simplex tableaus.
In other words, we could have omitted the column corresponding to the
artificial variable w4 in Tableaus 2, 3, and 4. Similarly, the column corre-
sponding to w3 could have been dropped from Tableaus 3 and 4.

When the Big M simplex method terminates with an optimal tableau,


it is sometimes possible for one or more artificial variables to remain as
basic variables at positive values. This implies that the original problem is
infeasible, since no basic feasible solution is possible to the original system
if it includes even one artificial variable at a positive value. In other words,
the original problem without artificial variables does not have a feasible
solution. Infeasibility is due to the presence of inconsistent constraints in
the formulation of the problem. In economic terms, this means that the
resources of the system are not sufficient to meet the expected demands.

Also, note that for computer solutions, M has to be assigned a specific


value. Usually the largest value that can be represented in the computer
solution is assumed.

6.10 Two-Phase Simplex Method


A drawback of the Big M simplex method is that assigning a very large
value to the constant M can sometimes create computational problems in
a digital computer. The Two-Phase method is designed to alleviate this
difficulty. Although the artificial variables are added in the same manner
employed in the Big M simplex method, the use of the constant M is elimi-
nated by solving the problem in two phases (hence, the name “Two-Phase”
method). These two phases are outlined as follows:

Phase 1. This phase consists of finding an initial basic feasible solution to


the original problem. In other words, the removal of the artificial variables

is taken up first. For this an artificial objective function is created, which


is the sum of all the artificial variables. The artificial objective function is
then minimized using the simplex method. If the minimum value of the
artificial problem is zero, then all the artificial variables have been reduced
to zero, and we have a basic feasible solution to the original problem. Go
to Phase 2. Otherwise, if the minimum is positive, the problem has no
feasible solution. Stop.

Phase 2. The basic feasible solution found is optimized with respect to


the original objective function. In other words, the final tableau of Phase
1 becomes the initial tableau for Phase 2 after changing the objective func-
tion. The simplex method is once again applied to determine the optimal
solution.

The following steps describe the Two-Phase simplex method. Note that
steps 1 − 3 for the Two-Phase simplex method are similar to steps 1 − 3
for the Big M simplex method.

1. Modify the constraints so that the right-hand side of each constraint


is nonnegative. This requires that each constraint with a negative
right-hand side be multiplied through by −1.

2. Convert each inequality constraint to standard form. This means


that if constraint i is a ≤ constraint, we add a slack variable ui , and
if i is a ≥ constraint, we subtract a surplus variable vi .

3. If (after step 1 has been completed) constraint i is a ≥ or = constraint,


add an artificial variable wi . Also, add the sign restriction wi ≥ 0.

4. For now, ignore the original LP’s objective function. Instead solve
an LP problem whose objective function is minimize W = (sum of
all the artificial variables). This is called the Phase 1 LP problem.
The act of solving the Phase 1 LP problem will force the artificial
variables to be zero.

Note that:
If the optimal value of W is equal to zero, and no artificial variables

are in the optimal Phase 1 basis, then we drop all columns in the
optimal Phase 1 tableau that correspond to the artificial variables.
We now combine the original objective function with the constraints
from the optimal Phase 1 tableau. This yields the Phase 2 LP prob-
lem. The optimal solution to the Phase 2 LP problem is the optimal
solution to the original LP problem.
If the optimal value W is greater than zero, then the original LP
problem has no feasible solution.
If the optimal value of W is equal to zero and at least one artificial
variable is in the optimal Phase 1 basis, then we can find the optimal
solution to the original LP problem if, at the end of Phase 1, we drop
from the optimal Phase 1 tableau all nonbasic artificial variables and
any variable from the original problem that has a negative coefficient
in the last row of the optimal Phase 1 tableau.
Example 6.15 To illustrate the Two-Phase simplex method, let us con-
sider again the standard form of Example 6.13:
minimize Z = −3x1 + x2 + x3 ,
subject to the constraints
x1 − 2x2 + x3 + u1 = 11
−4x1 + x2 + 2x3 − v2 = 3
−2x1 + x3 = 1
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, u1 ≥ 0, v2 ≥ 0.
Solution.
Phase 1 Problem:
Since we need artificial variables w3 and w4 in the second and third equa-
tions, the Phase 1 problem reads as
minimize W = w3 + w4 ,
subject to the constraints
x1 − 2x2 + x3 + u1 = 11
−4x1 + x2 + 2x3 − v2 + w3 = 3
−2x1 + x3 + w4 = 1

x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, u1 ≥ 0, v2 ≥ 0, w3 ≥ 0, w4 ≥ 0.
Because w3 and w4 are in the starting solution, they must be substituted
out in the objective function as follows:

W = w3 + w4
= (3 + 4x1 − x2 − 2x3 + v2 ) + (1 + 2x1 − x3 )
= 4 + 6x1 − x2 − 3x3 + v2 ,

and the W equation now appears in the tableau as

W − 6x1 + x2 + 3x3 − v2 = 4.

The initial basic feasible solution for the Phase 1 problem is given below:
basis x1 x2 x3 u1 v2 w3 w4 W constants
u1 1 −2 1 1 0 0 0 0 11
w3 −4 1 2 0 −1 1 0 0 3
w4 −2 0 1l 0 0 0 1 0 1
W −6 1 3 0 −1 0 0 1 4

basis x1 x 2 x3 u1 v2 w3 w4 W constants
u1 3 −2 0 1 0 0 −1 0 10
w3 0 1l 0 0 −1 1 −2 0 1
x3 −2 0 1 0 0 0 1 0 1
W 0 −1 0 0 −1 0 −3 1 1

basis x1 x2 x3 u1 v2 w3 w4 W constants
u1 3 0 0 1 −2 2 −5 0 12
x2 0 1 0 0 −1 1 −2 0 1
x3 −2 0 1 0 0 0 1 0 1
W 0 0 0 0 0 −1 1 1 0

We now have an optimal solution to the Phase 1 LP problem, given by


x1 = 0, x2 = 1, x3 = 1, u1 = 12, v2 = 0, w3 = 0, w4 = 0, and minimum
W = 0. Since the artificial variables w3 = 0 and w4 = 0, Tableau 3 repre-
sents a basic feasible solution to the original problem.

Phase 2 Problem: The artificial variables have now served their purpose
and must be dispensed with in all subsequent computations. This means
that the equations of the optimum tableau in Phase 1 can be written as

3x1 + u1 − 2v2 = 12
x2 − v2 = 1
−2x1 + x3 = 1.

These equations are exactly equivalent to those in the standard form of the
original problem (before artificial variables are added). Thus, the original
problem can be written as

minimize Z = −3x1 + x2 + x3 ,

subject to the constraints

3x1 + u1 − 2v2 = 12
x2 − v2 = 1
−2x1 + x3 = 1

x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, u1 ≥ 0, v2 ≥ 0.
As we can see, the principal contribution of the Phase 1 computations is to
provide a ready starting solution to the original problem. Since the prob-
lem has three equations and five variables, by putting 5 − 3 = 2 variables,
namely, x1 = v2 = 0, we immediately obtain the starting basic feasible so-
lution u1 = 12, x2 = 1, and x3 = 1.

To solve the problem, we need to substitute the basic variables x1 , x2 , and


x3 in the objective function. This is accomplished by using the constraint
equations as follows:

Z = −3x1 + x2 + x3
  = −3(4 − (1/3)u1 + (2/3)v2) + (1 + v2) + (1 + 2(4 − (1/3)u1 + (2/3)v2))
  = −2 + (1/3)u1 + (1/3)v2.

Thus, the starting tableau for Phase 2 becomes:



basis   x1     x2     x3     u1      v2     Z    constants
u1       3      0      0     1      −2      0    12
x2       0      1      0     0      −1      0    1
x3      −2      0      1     0       0      0    1
Z        0      0      0    −1/3    −1/3    1   −2

basis   x1     x2     x3     u1      v2     Z    constants
x1       1      0      0     1/3    −2/3    0    4
x2       0      1      0     0      −1      0    1
x3       0      0      1     2/3    −4/3    0    9
Z        0      0      0    −1/3    −1/3    1   −2

An optimal solution has been reached, and it is given by x1 = 4, x2 =


1, x3 = 9, u1 = 0, v2 = 0, and minimum Z = −2. •

Comparing the Big M simplex method and the Two-Phase simplex method,
we observe the following:

• The basic approach to both methods is the same. Both add the
artificial variables to get the initial canonical system and then derive
them to zero as soon as possible.

• The sequence of tableaus and the basis changes are identical.

• The number of iterations is the same.

• The Big M simplex method solves the linear problem in one pass,
while the Two-Phase simplex method solves it in two stages as two
linear programs.
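Each tableau in these methods is obtained from the previous one by a Gauss–Jordan pivot on the chosen element. The following is a hedged MATLAB sketch of that single row operation (the helper name pivot is our own, not from the text):

function T = pivot(T, r, c)
% Gauss-Jordan pivot of simplex tableau T on row r, column c.
T(r, :) = T(r, :) / T(r, c);          % scale the pivot row so the pivot becomes 1
for i = 1:size(T, 1)
    if i ~= r
        T(i, :) = T(i, :) - T(i, c) * T(r, :);   % zero out column c in every other row
    end
end
end

Applying this operation repeatedly, with the entering column and leaving row chosen by the usual simplex rules, reproduces the tableau sequences shown above.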

6.11 Duality
From both the theoretical and practical points of view, the theory of duality
is one of the most important and interesting concepts in linear program-
ming. Each LP problem has a related LP problem called the dual problem.

The original LP problem is called the primal problem. For the primal prob-
lem defined by (6.1)–(6.3) above, the corresponding dual problem is to find
the values of the M variables y1 , y2 , . . . , yM to solve the following:

minimize V = b1 y1 + b2 y2 + · · · + bM yM,                    (6.17)

subject to the constraints

a11 y1 + a21 y2 + · · · + aM1 yM ≥ c1
a12 y1 + a22 y2 + · · · + aM2 yM ≥ c2
        ···                                                    (6.18)
a1N y1 + a2N y2 + · · · + aMN yM ≥ cN

and
y1 ≥ 0, y2 ≥ 0, . . . , yM ≥ 0. (6.19)
In matrix notation, the primal and the dual problems are formulated
as
Primal:  Maximize Z = cT x,  subject to the constraints Ax ≤ b, x ≥ 0.

Dual:    Minimize V = bT y,  subject to the constraints AT y ≥ c, y ≥ 0,

where A is the M × N matrix

A = [ a11  a12  · · ·  a1N
      a21  a22  · · ·  a2N
       ·     ·    · ·    ·
      aM1  aM2  · · ·  aMN ]

and

b = (b1, b2, . . . , bM)T,   c = (c1, c2, . . . , cN)T,   x = (x1, x2, . . . , xN)T,   y = (y1, y2, . . . , yM)T,

and cT denotes the transpose of the vector c.

The concept of a dual can be introduced with the help of the following
LP problem.

Example 6.16 Write the dual of the following linear problem:

Primal Problem:

maximize Z = x1 + 2x2 − 3x3 + 4x4 ,

subject to the following constraints

x1 + 2x2 + 2x3 − 3x4 ≤ 25


2x1 + x2 − 3x3 + 2x4 ≤ 15

x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, x4 ≥ 0.
The above linear problem has two constraints and four variables. The dual
of this primal problem is written as:

Dual Problem:
minimize V = 25y1 + 15y2 ,
subject to the constraints

y1 + 2y2 ≥ 1
2y1 + y2 ≥ 2
2y1 − 3y2 ≥ −3
−3y1 + 2y2 ≥ 4

y1 ≥ 0, y2 ≥ 0,
where y1 and y2 are called the dual variables. •

6.11.1 Comparison of Primal and Dual Problems


Comparing the primal and the dual problems, we observe the following
relationships:

1. The objective function coefficients of the primal problem have become


the right-hand side constants of the dual. Similarly, the right-hand
side constants of the primal have become the cost coefficients of the
dual.

2. The inequalities have been reversed in the constraints.

3. The objective function is changed from maximization in primal to


minimization in dual.

4. Each column in the primal corresponds to a constraint (row) in the


dual. Thus, the number of dual constraints is equal to the number
of primal variables.

5. Each constraint (row) in the primal corresponds to a column in the


dual. Hence, there is one dual variable for every primal constraint.

6. The dual of the dual is the primal problem.

In both the primal and the dual problems, the variables are nonnegative
and the constraints are inequalities. Such problems are called symmetric
dual linear programs.

Definition 6.7 (Symmetric Form)

A linear program is said to be in symmetric form, if all the variables are


restricted to be nonnegative, and all the constraints are inequalities (in a
maximization problem the inequalities must be in “less than or equal to”
form, while in a minimization problem they must be “greater than or equal
to”). •

The general rules for writing the dual of a linear program in symmetric
form are summarized below:

1. Define one (nonnegative) dual variable for each primal constraint.

2. Make the cost vector of the primal the right-hand side constants of
the dual.

3. Make the right-hand side vector of the primal the cost vector of the
dual.

4. The transpose of the coefficient matrix of the primal becomes the


constraint matrix of the dual.

5. Reverse the direction of the constraint inequalities.

6. Reverse the optimization direction, i.e., change minimizing to maxi-


mizing and vice versa.
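These rules are mechanical, so they are easy to express in MATLAB. The following is a hedged sketch for a symmetric-form maximization primal (the helper name symmetric_dual is ours, not from the text):

function [cd, Ad, bd] = symmetric_dual(c, A, b)
% Dual of the symmetric-form primal: maximize c'x subject to A x <= b, x >= 0.
% By rules 1-6 above, the dual is: minimize b'y subject to A' y >= c, y >= 0.
cd = b;    % primal right-hand side becomes the dual cost vector
Ad = A';   % the constraint matrix is transposed
bd = c;    % primal cost vector becomes the dual right-hand side
end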

Example 6.17 Write the following linear problem in symmetric form and
then find its dual:

minimize Z = 2x1 + 4x2 + 3x3 + 5x4 + 3x5 + 4x6 ,

subject to the constraints

x1 + x2 + x3 ≤ 300
x4 + x5 + x6 ≤ 600
x1 + x4 ≥ 200
x2 + x5 ≥ 300
x3 + x6 ≥ 400

x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, x4 ≥ 0, x5 ≥ 0, x6 ≥ 0.
Solution. For the above linear program (minimization) to be in symmetric
form, all the constraints must be in “greater than or equal to” form. Hence,
we multiply the first two constraints by −1, then we have the primal problem
as
minimize Z = 2x1 + 4x2 + 3x3 + 5x4 + 3x5 + 4x6 ,
subject to the constraints

−x1 − x2 − x3 ≥ −300
− x4 − x5 − x6 ≥ −600
x1 + x4 ≥ 200
x2 + x5 ≥ 300
x3 + x6 ≥ 400

x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, x4 ≥ 0, x5 ≥ 0, x6 ≥ 0.
The dual of the above primal problem becomes

maximize V = −300y1 − 600y2 + 200y3 + 300y4 + 400y5 ,

subject to the constraints

−y1 + y3 ≤ 2
−y1 + y4 ≤ 4
−y1 + y5 ≤ 3
− y2 + y3 ≤ 5
− y2 + y4 ≤ 3
− y2 + y5 ≤ 4

y1 ≥ 0, y2 ≥ 0, y3 ≥ 0, y4 ≥ 0, y5 ≥ 0.

6.11.2 Primal-Dual Problems in Standard Form


In most LP problems, the dual is defined for various forms of the primal
depending on the types of the constraints, the signs of the variables, and
the sense of optimization. Now we introduce a definition of the dual that
automatically accounts for all forms of the primal. It is based on the fact
that any LP problem must be put in the standard form before the model
is solved by the simplex method. Since all the primal-dual computations
are obtained directly from the simplex tableau, it is logical to define the
dual in a way that is consistent with the standard form of the primal.

Example 6.18 Write the standard form of the primal-dual problem of the
following linear problem:

maximize Z = 5x1 + 12x2 + 4x3 ,

subject to the constraints

x1 + 2x2 + x3 ≤ 10
2x1 − x2 + x3 = 8

x1 ≥ 0, x2 ≥ 0, x3 ≥ 0.
Solution. The given primal can be put in the standard primal as

maximize Z = 5x1 + 12x2 + 4x3 ,

subject to the constraints

x1 + 2x2 + x3 + u1 = 10
2x1 − x2 + x3 = 8

x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, u1 ≥ 0.
Notice that u1 is a slack in the first constraint. Now its dual form can
be written as
minimize V = 10y1 + 8y2 ,
subject to the constraints

y1 + 2y2 ≥ 5
2y1 − y2 ≥ 12
y1 + y2 ≥ 4

y1 ≥ 0, y2 unrestricted.

Example 6.19 Write the standard form of the primal-dual problem of the
following linear problem:

minimize Z = 5x1 − 2x2 ,

subject to the constraints

−x1 + x2 ≥ −3
2x1 + 3x2 ≤ 5

x1 ≥ 0, x2 ≥ 0.
Solution. The given primal can be put in the standard primal form as

minimize Z = 5x1 − 2x2 ,



subject to the constraints

x1 − x2 + u1 = 3
2x1 + 3x2 + u2 = 5

x1 ≥ 0, x2 ≥ 0, u1 ≥ 0, u2 ≥ 0.

Notice that u1 and u2 are slack in the first and second constraints. Their
dual form is
maximize V = 3y1 + 5y2 ,
subject to the constraints

y1 + 2y2 ≤ 5
−y1 + 3y2 ≤ −2
y1 ≤ 0
y2 ≤ 0

y1 , y2 unrestricted.

Theorem 6.3 (Duality Theorem)

If the primal problem has an optimal solution, then the dual problem also
has an optimal solution, and the optimal values of their objective functions
are equal, i.e.,
Maximize Z = Minimize V.

It can be shown that when a primal problem is solved by the simplex


method, the final tableau contains the optimal solution to the dual prob-
lem in the objective row under the columns of the slack variables, i.e., the
first dual variable is found in the objective row under the first slack vari-
able, the second is found under the second slack variable, and so on.

Example 6.20 Find the dual of the following linear problem:

maximize Z = 12x1 + 9x2 + 15x3 ,

subject to the constraints


2x1 + x2 + x3 ≤ 30
x1 + x2 + 3x3 ≤ 40
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0,
and then find its optimal solution.

Solution. The dual of this problem is

minimize V = 30y1 + 40y2 ,

subject to the constraints


2y1 + y2 ≥ 12
y1 + y2 ≥ 9
y1 + 3y2 ≥ 15
y1 ≥ 0, y2 ≥ 0.
Introducing the slack variables u1 and u2 in order to convert the given linear
problem to the standard form, we obtain

maximize Z = 12x1 + 9x2 + 15x3

subject to the following constraints


2x1 + x2 + x3 + u1 = 30
x1 + x2 + 3x3 + u2 = 40
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, u1 ≥ 0, u2 ≥ 0.
We now apply the simplex method and obtain the following tableaus:
basis x1 x2 x3 u1 u2 Z constants
u1 2 1 1 1 0 0 30
u2 1 1 3l 0 1 0 40
Z −12 −9 −15 0 0 1 0

basis   x1      x2      x3     u1      u2      Z    constants
u1      5/3     2/3     0      1      −1/3     0    50/3
x3      1/3     1/3     1      0       1/3     0    40/3
Z       −7      −4      0      0       5       1    200

basis   x1      x2      x3     u1      u2      Z    constants
x1      1       2/5     0      3/5    −1/5     0    10
x3      0       1/5     1     −1/5     2/5     0    10
Z       0      −6/5     0      21/5    18/5    1    270

basis   x1      x2      x3     u1      u2      Z    constants
x2      5/2     1       0      3/2    −1/2     0    25
x3     −1/2     0       1     −1/2     1/2     0    5
Z       3       0       0      6       3       1    300

Thus, the optimal solution to the given primal problem is

x1 = 0, x2 = 25, x3 = 5,

and the optimal value of the objective function Z is 300.

The optimal solution to the dual problem is found in the objective row
under the slack variables u1 and u2 columns as

y1 = 6 and y2 = 3.

Thus, the optimal value of the dual objective function is

V = 30(6) + 40(3) = 300,

which we expect from the Duality theorem 6.3. •
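As a hedged numerical check of this theorem for Example 6.20 (assuming the Optimization Toolbox's linprog; the snippet is ours, not from the text), both problems can be solved directly and their optimal values compared:

% Primal of Example 6.20: maximize 12x1 + 9x2 + 15x3 subject to Ax <= b, x >= 0.
c = [12; 9; 15];
A = [2 1 1; 1 1 3];
b = [30; 40];
[x, fp] = linprog(-c, A, b, [], [], zeros(3, 1));   % linprog minimizes, so pass -c
Zmax = -fp;                                         % expected: x = (0, 25, 5), Zmax = 300
% Dual: minimize 30y1 + 40y2 subject to A'y >= c, y >= 0.
[y, Vmin] = linprog(b, -A', -c, [], [], zeros(2, 1));  % expected: y = (6, 3), Vmin = 300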


In the following we give another important duality theorem, which gives
the relationship between the primal and dual solutions.

Theorem 6.4 (Weak Duality Theorem)

Consider the symmetric primal-dual linear problems:


Primal:  Maximize Z = cT x,  subject to the constraints Ax ≤ b, x ≥ 0.

Dual:    Minimize V = bT y,  subject to the constraints AT y ≥ c, y ≥ 0.
The value of the objective function of the minimization problem (dual) for
any feasible solution is always greater than or equal to that of the maxi-
mization problem (primal). •
Example 6.21 Consider the following LP problem:

Primal:

maximize Z = x1 + 2x2 + 3x3 + 4x4 ,


subject to the constraints
x1 + 2x2 + 2x3 + 3x4 ≤ 20
2x1 + x2 + 3x3 + 2x4 ≤ 20
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, x4 ≥ 0.
Its dual form is:

Dual:

minimize V = 20y1 + 20y2 ,


subject to the constraints
y1 + 2y2 ≥ 1
2y1 + y2 ≥ 2
2y1 + 3y2 ≥ 3
3y1 + 2y2 ≥ 4

y1 ≥ 0, y2 ≥ 0.
The feasible solution for the primal is x1 = x2 = x3 = x4 = 1, and
y1 = y2 = 1 is feasible for the dual. The value of the primal objective is

Z = cT x = 10,

and the value of the dual objective is

V = bT y = 40.

Note that
cT x < bT y,
which satisfies the Weak Duality Theorem 6.4. •

6.12 Sensitivity Analysis in


Linear Programming
Sensitivity analysis refers to the study of the changes in the optimal so-
lution and optimal value of objective function Z due to the input data
coefficients. The need for such an analysis arises in various circumstances.
Often management is not entirely sure about the values of the constants
and wants to know the effects of changes. There may be different kinds of
modifications:

1. Changes in the right-hand side constants bi .

2. Changes in the objective function coefficients ci .

3. Changes in the elements aij of the coefficient matrix A.

4. Introducing additional constraints or deleting some of the existing


constraints.

5. Adding or deleting decision variables.

We will discuss here only changes in the right-hand side constants bi , which
are the most common in sensitivity analysis.

Example 6.22 A small towel company makes two types of towels, stan-
dard and deluxe. Both types have to go through two processing departments,
cutting and sewing. Each standard towel needs 1 minute in the cutting de-
partment and 3 minutes in the sewing department. The total available time
in cutting is 160 minutes for a production run. Each deluxe towel needs
2 minutes in the cutting department and 2 minutes in the sewing depart-
ment. The total available time in sewing is 240 minutes for a production
run. The profit on each standard towel is $1.00, whereas the profit on each
deluxe towel is $1.50. Determine the number of towels of each type to pro-
duce to maximize profit.

Solution. Let x1 and x2 be the number of standard towels and deluxe


towels, respectively. Then the LP problem is

maximize Z = x1 + 1.5x2 ,

subject to the constraints


x1 + 2x2 ≤ 160 (cutting dept.)
3x1 + 2x2 ≤ 240 (sewing dept.)
x1 ≥ 0, x2 ≥ 0.
After converting the problem to the standard form and then applying the
simplex method, one can easily get the final tableau as follows:
basis   x1     x2     u1      u2      Z    constants
x2       0      1     3/4    −1/4     0    60
x1       1      0    −1/2     1/2     0    40
Z        0      0     5/8     1/8     1    130

The optimal solution is

x1 = 40, x2 = 60, u1 = u2 = 0, Zmax = 130.

Now let us ask a typical sensitivity analysis question:



Suppose we increase the maximum number of minutes at the cutting depart-


ment by 1 minute, i.e., if the maximum minutes at the cutting department
is 161 instead of 160, what would be the optimal solution?

Then the revised LP problem will be

maximize Z = x1 + 1.5x2 ,

subject to the constraints

x1 + 2x2 ≤ 161 (cutting dept.)


3x1 + 2x2 ≤ 240 (sewing dept.)

x1 ≥ 0, x2 ≥ 0.
Of course, we can again solve this revised problem using the simplex method.
However, since the modification is not drastic, we would wonder whether
there is an easy way to utilize the final tableau for the original problem in-
stead of going through all the iteration steps for the revised problem. There
is a way, and this way is the key idea of the sensitivity analysis.
1. Since the slack variable for the cutting department is u1 , then use the
u1 -column.

2. Modify the right most column (constants) using the u1 -column as


subsequently shown, giving the final tableau for the revised problem
as follows:

basis   x1     x2     u1      u2      Z    constants
x2       0      1     3/4    −1/4     0    60 + 1 × (3/4)
x1       1      0    −1/2     1/2     0    40 + 1 × (−1/2)
Z        0      0     5/8     1/8     1    130 + 1 × (5/8)

(in the last column, the first term is the original entry, the factor 1 is the one-unit (one-minute) increase, and the last factor is the corresponding u1-column entry), i.e.,

basis   x1     x2     u1      u2      Z    constants
x2       0      1     3/4    −1/4     0    60 3/4
x1       1      0    −1/2     1/2     0    39 1/2
Z        0      0     5/8     1/8     1    130 5/8

then the optimal solution for the revised problem is


x1 = 39 1/2,   x2 = 60 3/4,   u1 = u2 = 0,   Zmax = 130 5/8.
Let us try one more revised problem:

Assume that the maximum number of minutes at the sewing department


is reduced by 8, making the maximum minutes 240 − 8 = 232. The final
tableau for this revised problem will be given as follows:
basis   x1     x2     u1      u2      Z    constants
x2       0      1     3/4    −1/4     0    60 + (−8) × (−1/4) = 62
x1       1      0    −1/2     1/2     0    40 + (−8) × (1/2) = 36
Z        0      0     5/8     1/8     1    130 + (−8) × (1/8) = 129

then the optimal solution for the revised problem is

x1 = 36, x2 = 62, u1 = u2 = 0, Zmax = 129.

The bottom-row entry, 5/8, represents the net profit increase for a one-unit (minute) increase of the available time at the cutting department. It is called the shadow price at the cutting department. Similarly, the other bottom-row entry, 1/8, is called the shadow price at the sewing department.

In general, the shadow price for a constraint is defined as the change


in the optimal value of the objective function when one unit is increased in
the right-hand side of the constraint.

A negative entry in the bottom row represents the net profit increase when one unit of the variable in that column is introduced. For example, if a negative entry in the x1 column is −1/4, then introducing one unit of x1 will result in a $(1/4) = 25 cents net profit gain. Therefore, the bottom-row entry, 5/8, in the preceding tableau indicates that the net profit loss is $(5/8) when one unit of u1 is introduced, keeping the constraint, 160, the same.

Now, suppose the constraint at the cutting department is changed from


160 to 161. If this increment of 1 minute is credited to u1 as a slack, or
unused time at the cutting department, the total profit will remain the same
because the unused time will not contribute to a profit increase. However,
if this u1 = 1 is given up, or reduced (which is the opposite of introduced),
it will yield a net profit gain of $(5/8). •
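The shadow-price reasoning above can also be double-checked numerically; this is a hedged sketch using linprog (Optimization Toolbox assumed; not part of the original text):

% Example 6.22: maximize Z = x1 + 1.5*x2 subject to the cutting/sewing limits.
c = [1; 1.5];
A = [1 2; 3 2];
b = [160; 240];
[x, f0] = linprog(-c, A, b, [], [], zeros(2, 1));    % expected x = (40, 60), Z = 130
b1 = [161; 240];                                     % one extra cutting minute
[x1, f1] = linprog(-c, A, b1, [], [], zeros(2, 1));  % expected x = (39.5, 60.75)
dZ = -f1 - (-f0);                                    % should equal the shadow price 5/8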

6.13 Summary
In this chapter we gave a brief introduction to linear programming. Prob-
lems were described by systems of linear inequalities. One can see that
small systems can be solved in a graphical manner, but that large sys-
tems are solved using row operations on matrices by means of the simplex
method. For finding the basic feasible solution to artificial systems, we
discussed the Big M simplex method and the Two-Phase simplex method.

In this chapter we also discussed the concept of duality in linear pro-


gramming. Since the optimal primal solution can be obtained directly from
the optimal dual tableau (and vice-versa), it is advantageous computation-
ally to solve the dual when it has fewer constraints than the primal. Duality
provides an economic interpretation that sheds light on the unit worth or
shadow price of the different resources. It also explains the condition of
optimality by introducing the new economic definition of imputed costs for
each activity. We closed this chapter with a presentation of the important

technique of sensitivity analysis, which gives linear programming the dy-


namic characteristic of modifying the optimum solution to reflect changes
in the model.

6.14 Problems
1. The Oakwood Furniture Company has 12.5 units of wood on hand
from which to manufacture tables and chairs. Making a table uses
two units of wood and making a chair uses one unit. Oakwood’s
distributor will pay $20 for each table and $15 for each chair, but
they will not accept more than eight chairs, and they want at least
twice as many chairs as tables. How many tables and chairs should
the company produce to maximize its revenue? Formulate this as a
linear programming problem.

2. The Mighty Silver Ball Company manufactures three kinds of pinball


machines, each requiring a different manufacturing technique. The
Super Deluxe Machine requires 17 hours of labor, 8 hours of testing,
and yields a profit of $300. The Silver Ball Special requires 10 hours
of labor, 4 hours of testing, and yields a profit of $200. The Bumper
King requires 2 hours of labor, 2 hours of testing, and yields a profit
of $100. There are 1000 hours of labor and 500 hours of testing
available.
In addition, a marketing forecast has shown that the demand for the
Super Deluxe is no more that 50 machines, demand for the Silver
Ball Special is no more than 80, and demand for the Bumper King is
no more than 150. The manufacturer wants to determine the optimal
production schedule that will maximize the total profit. Formulate
this as a linear programming problem.

3. Consider a diet problem in which a college student is interested in


finding a minimum cost diet that provides at least 21 units of Vitamin
A and 12 units of Vitamin B from five foods with the following
properties:

Food 1 2 3 4 5
Vitamin A content 1 0 1 1 2
Vitamin B content 0 1 2 1 1
Cost per unit (cents) 20 20 31 11 12
Formulate this as a linear programming problem.
4. Consider a problem of scheduling the weekly production of a certain
item for the next 4 weeks. The production cost of the item is $10
for the first two weeks, and $15 for the last two weeks. The weekly
demands are 300, 700, 800, and 900 units, which must be met. The
plant can produce a maximum of 700 units each week. In addition,
the company can employ overtime during the second and third weeks.
This increases weekly production by an additional 200 units, but the
cost of production increases by $5 per unit. Excess production can be
stored at a cost of $3 an item per week. How should the production
be scheduled to minimize the total cost? Formulate this as a linear
programming problem.
5. An oil refinery can blend three grades of crude oil to produce regular
and super gasoline. Two possible blending processes are available.
For each production run the older process uses 5 units of crude A,
7 units of crude B, and 2 units of crude C to produce 9 units of
regular and 7 units of super gasoline. The newer process uses 3
units of crude A, 9 units of crude B, and 4 units of crude C to
produce 5 units of regular and 9 units of super gasoline for each
production run. Because of prior contract commitments, the refinery
must produce at least 500 units of regular gasoline and at least 300
units of super for the next month. It has available 1500 units of crude
A, 1900 units of crude B, and 1000 units of crude C. For each unit of
regular gasoline produced the refinery receives $6, and for each unit
of super it receives $9. Determine how to use the resources of crude
oil and the two blending processes to meet the contract commitments
and, at the same time, maximize revenue. Formulate this as a linear
programming problem.
6. A tailor has 80 square yards of cotton material and 120 square yards
of woolen material. A suit requires 2 square yards of cotton and 1

square yard of wool. A dress requires 1 square yard of cotton and 3


square yards of wool. How many of each garment should the tailor
make to maximize income if a suit and a dress each sell for $90? What
is the maximum income? Formulate this as a linear programming
problem.

7. A trucking firm ships the containers of two companies, A and B.


Each container from company A weighs 40 pounds and is 2 cubic
feet in volume. Each container from company B weighs 50 pounds
and is 3 cubic feet in volume. The trucking firm charges company A
$2.20 for each container shipped and charges company B $3.00 for
each container shipped. If one of the firm’s trucks cannot carry more
than 37, 000 pounds and cannot hold more than 2000 cubic feet, how
many containers from companies A and B should a truck carry to
maximize the shipping charges?

8. A company produces two types of cowboy hats. Each hat of the


first type requires twice as much labor time as does each hat of the
second type. If all hats are of the second type only, the company can
produce a total of 500 hats a day. The market limits daily sales of
the first and second types to 150 and 200 hats, respectively. Assume
that the profit per hat is $8 for type 1 and $5 for type 2. Determine
the number of hats of each type to produce to maximize profit.

9. A company manufactures two types of hand calculators, of model A


and model B. It takes 1 hour and 4 hours in labor time to manufac-
ture each A and B, respectively. The cost of manufacturing A is $30
and that of manufacturing B is $20. The company has 1600 hours of
labor time available and $18, 000 in running costs. The profit on each
A is $10 and on each B is $8. What should the production schedule
be to ensure maximum profit?

10. A clothing manufacturer has 10 square yards of cotton material, 10


square yards of wool material, and 6 square yards of silk material. A
pair of slacks requires 1 square yard of cotton, 2 square yards of wool,
and 1 square yard of silk. A skirt requires 2 square yards of cotton,
1 square yard of wool, and 1 square yard of silk. The net profit on

a pair of slacks is $3 and the net profit on a skirt is $4. How many
skirts and how many slacks should be made to maximize profit?

11. A manufacturer produces sacks of chicken feed from two ingredients,


A and B. Each sack is to contain at least 10 ounces of nutrient N1 , at
least 8 ounces of nutrient N2 , and at least 12 ounces of nutrient N3 .
Each pound of ingredient A contains 2 ounces of nutrient N1 , 2 ounces
of nutrient N2 , and 6 ounces of nutrient N3 . Each pound of ingredient
B contains 5 ounces of nutrient N1 , 3 ounces of nutrient N2 , and 4
ounces of nutrient N3 . If ingredient A costs 8 cents per pound and
ingredient B costs 9 cents per pound, how much of each ingredient
should the manufacturer use in each sack of feed to minimize the
cost?

12. The Apple Company has a contract with the government to supply
1200 microcomputers this year and 2500 next year. The company has
the production capacity to make 1400 microcomputers each year, and
it has already committed its production line for this level. Labor and
management have agreed that the production line can be used for at
most 80 overtime shifts each year, each shift costing the company an
additional $20, 000. In each overtime shift, 50 microcomputers can
be manufactured this year and used to meet next year’s demand, but
must be stored at a cost of $100 per unit. How should the production
be scheduled to minimize cost?

13. Solve each of the following linear programming problems using the
graphical method:

(a) Maximize: Z = 2x1 + x2


Subject to: 4x1 + x2 ≤ 36
4x1 + 3x2 ≤ 60
x1 ≥ 0, x2 ≥ 0

(b) Maximize: Z = 2x1 + x2


Subject to: 4x1 + x2 ≤ 16
x1 + x2 ≤ 7

x1 ≥ 0, x2 ≥ 0

(c) Maximize: Z = 4x1 + x2


Subject to: 2x1 + x2 ≤ 4
6x1 + x2 ≤ 8
x1 ≥ 0, x2 ≥ 0

(d) Minimize: Z = 2x1 − 7x2 − 3x3


Subject to: x1 + 2x2 + x3 ≤ 5
x1 + x3 ≤ 10
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0
14. Solve each of the following linear programming problems using the
graphical method:

(a) Maximize: Z = 6x1 − 2x2


Subject to: x1 − x2 ≤ 1
3x1 − x2 ≤ 6
x1 ≥ 0, x2 ≥ 0

(b) Maximize: Z = 4x1 + 4x2


Subject to: 2x1 + 7x2 ≤ 21
7x1 + 2x2 ≤ 49
x1 ≥ 0, x2 ≥ 0

(c) Maximize: Z = 3x1 + 2x2


Subject to: 2x1 + x2 ≤ 2
3x1 + 4x2 ≥ 12
x1 ≥ 0, x2 ≥ 0

(d) Maximize: Z = −2x1 − x2 + 4x3


Subject to: 3x1 − x2 + 2x3 ≤ 25
−x1 − x2 + 2x3 ≤ 20
−x1 − x2 + x3 ≤ 5
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0

15. Solve each of the following linear programming problems using the
graphical method:

(a) Minimize: Z = −3x1 − x2


Subject to: x1 + x2 ≤ 150
4x1 + x2 ≤ 450
x1 ≥ 0, x2 ≥ 0

(b) Minimize: Z = −2x1 + x2


Subject to: 2x1 + 3x2 ≤ 14
4x1 + 5x2 ≤ 16
x1 ≥ 0, x2 ≥ 0

(c) Minimize: Z = −2x1 + x2


Subject to: 2x1 + x2 ≤ 440
4x1 + x2 ≤ 680
x1 ≥ 0, x2 ≥ 0

(d) Maximize: Z = x1 − x2
Subject to: x1 + x2 ≤ 6
x1 − x2 ≥ 0
−x1 − x2 ≥ 3
x1 ≥ 0, x2 ≥ 0

16. Solve each of the following linear programming problems using the
graphical method:

(a) Minimize: Z = 3x1 + 5x2


Subject to: 3x1 + 2x2 ≥ 36
3x1 + 5x2 ≥ 45
x1 ≥ 0, x2 ≥ 0

(b) Minimize: Z = 3x1 − 8x2


Subject to: 2x1 − x2 ≤ 4

3x1 + 11x2 ≤ 33
3x1 + 4x2 ≥ 24
x1 ≥ 0, x2 ≥ 0

(c) Minimize: Z = 3x1 − 5x2


Subject to: 2x1 − x2 ≤ −2
4x1 − x2 ≥ 0
x2 ≤ 3
x1 ≥ 0, x2 ≥ 0

(d) Minimize: Z = −3x1 + 2x2


Subject to: 3x1 − x2 ≥ −5
−x1 + x2 ≥ 1
2x1 + 4x2 ≥ 12
x1 ≥ 0, x2 ≥ 0

17. Solve each of the following linear programming problems using the
simplex method:

(a) Maximize: Z = 3x1 + 2x2


Subject to: −x1 + 2x2 ≤ 4
3x1 + 2x2 ≤ 14
x1 − x2 ≤ 3
x1 ≥ 0, x2 ≥ 0

(b) Maximize: Z = x1 + 3x2


Subject to: 4x1 ≤ 5
x1 + 2x2 ≤ 10
x2 ≤ 4
x1 ≥ 0, x2 ≥ 0

(c) Maximize: Z = 10x1 + 5x2


Subject to: x1 + x2 ≤ 180
3x1 + 2x2 ≤ 480

x1 ≥ 0, x2 ≥ 0

(d) Maximize: Z = x1 − 4x2


Subject to: x1 + 2x2 ≤ 5
x1 + 6x2 ≤ 7
x1 ≥ 0, x2 ≥ 0

18. Solve Problem 13 using the simplex method.

19. Solve each of the following linear programming problems using the
simplex method:

(a) Maximize: Z = 2x1 + 4x2 + x3


Subject to: −x1 + 2x2 + 3x3 ≤ 6
−x1 + 4x2 + 5x3 ≤ 5
−x1 + 5x2 + 7x3 ≤ 7
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0

(b) Minimize: Z = 2x1 − x2 + 2x3


Subject to: −x1 + x2 + x3 ≤ 4
x1 + 2x2 + 3x3 ≤ 3
−x1 + x2 − x3 ≤ 6
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0

(c) Maximize: Z = 100x1 + 200x2 + 50x3


Subject to: 5x1 + 5x2 + 10x3 ≤ 1000
10x1 + 8x2 + 5x3 ≤ 2000
10x1 + 5x2 ≤ 500
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0

(d) Minimize: Z = 6x1 + 3x2 + 4x3


Subject to: x1 ≥ 30
2x2 ≤ 50
x3 ≥ 20

x1 + x2 + x3 ≥ 120
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0

20. Solve each of the following linear programming problems using the
simplex method:

(a) Minimize: Z = x1 + 2x2 + 3x3 + 4x4


Subject to: x1 + 2x2 + 2x3 + 3x4 ≤ 20
2x1 + x2 + 3x3 + 2x4 ≤ 20
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, x4 ≥ 0

(b) Maximize: Z = x1 + 2x2 + 4x3 − x4


Subject to: 5x1 + 4x3 + 6x4 ≤ 20
4x1 + 2x2 + 2x3 + 8x4 ≤ 40
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, x4 ≥ 0

(c) Minimize: Z = 3x1 + x2 + x3 + x4


Subject to: 2x1 + 2x2 + x3 + 2x4 ≤ 4
3x1 + x2 + 2x3 + 4x4 ≤ 6
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, x4 ≥ 0

(d) Minimize: Z = x1 + 2x2 − x3 + 3x4


Subject to: 2x1 + 4x2 + 5x3 + 6x4 ≤ 24
4x1 + 4x2 + 2x3 + 2x4 ≤ 4
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, x4 ≥ 0

21. Use Big M simplex method to solve each of the following linear pro-
gramming problems:

(a) Minimize: Z = 4x1 + 4x2 + x3


Subject to: x 1 + x2 + x3 ≤ 2
2x1 + x2 + 0x3 ≤ 3
2x1 + x2 + 3x3 ≥ 3
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0

(b) Minimize: Z = 2x1 + 3x2


Subject to: 0.5x1 + 0.25x2 ≤ 4
x1 + 3x2 ≥ 36
x1 + x2 = 10
x1 ≥ 0, x2 ≥ 0

(c) Minimize: Z = −3x1 − 2x2


Subject to: x1 + x2 = 10
x1 + 0x2 ≥ 4
x1 ≥ 0, x2 ≥ 0

(d) Minimize: Z = 2x1 + 3x2


Subject to: 0.5x1 + 0.25x2 ≤ 4
x1 + 3x2 ≥ 20
x1 ≥ 0, x2 ≥ 0

22. Use the Two-Phase simplex method to solve each of the following
linear programming problems:

(a) Minimize: Z = 2x1 + 3x2


Subject to: 0.5x1 + 0.25x2 ≤ 4
x1 + 3x2 ≥ 36
x1 + x2 = 10
x1 ≥ 0, x2 ≥ 0

(b) Maximize: Z = 2x1 + 3x2 − 5x3


Subject to: x1 + x2 + x3 = 7
2x1 − 5x2 + x3 ≥ 10
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0

(c) Minimize: Z = 4x1 + x2


Subject to: 3x1 + x2 = 3
4x1 + 3x2 − x3 = 6
4x1 + 2x2 + x4 = 4

x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, x4 ≥ 0

(d) Minimize: Z = 40x1 + 10x2 + 7x5 + 14x6


Subject to: x1 − x2 + 2x5 = 0
−2x1 + x2 − 2x5 = 0
x1 + x 3 + x5 − x6 = 3
2x2 + x3 + x4 + 2x5 + x6 = 4
xi ≥ 0, i = 1, . . . , 6

23. Write the duals of each of the following linear programming problems:

(a) Maximize: Z = −5x1 + 2x2


Subject to: −x1 + x2 ≤ −3
2x1 + 3x2 ≤ 5
x1 ≥ 0, x2 ≥ 0

(b) Maximize: Z = 5x1 + 6x2 + 4x3


Subject to: x1 + 4x2 + 6x3 ≤ 12
2x1 + x2 + 2x3 ≤ 11
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0

(c) Minimize: Z = 6x1 + 3x2


Subject to: 6x1 − 3x2 + x3 ≥ 2
3x1 + 4x2 + x3 ≥ 5
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0

(d) Minimize: Z = 2x1 + x2 + 2x3


Subject to: −x1 + x2 + x3 = 4
−x1 + x2 − x3 ≤ 6
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0

24. Write the duals of each of the following linear programming problems:

(a) Minimize: Z = 3x1 + 4x2 + 6x3


Subject to: x1 + x2 ≥ 10

x1 ≥ 0, x3 ≥ 0, x2 ≤ 0

(b) Maximize: Z = 5x1 + 2x2 + 3x3


Subject to: x1 + 5x2 + 2x3 = 30
x1 − x2 − 6x3 ≤ 40
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0

(c) Minimize: Z = x1 + 2x2 + 3x3 + 4x4


Subject to: 2x1 + 2x2 + 2x3 + 3x4 ≥ 30
2x1 + x2 + 3x3 + 2x4 ≥ 20
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, x4 ≥ 0

(d) Maximize: Z = x1 + x2
Subject to: 2x1 + x2 = 5
3x1 − x2 = 6
x1 ≥ 0, x2 ≥ 0

(e) Maximize: Z = 2x1 + 4x2


Subject to: 4x1 + 3x2 = 50
2x1 + 5x2 = 75
x1 ≥ 0, x2 ≥ 0

(f) Maximize: Z = x1 + 2x2 + 3x3
Subject to: 2x1 + x2 + x3 = 5
x1 + 3x2 + 4x3 = 3
x1 ≥ 0, x2 ≥ 0
Chapter 7

Nonlinear Programming

7.1 Introduction
In the previous chapter, we studied linear programming problems in some
detail. For such cases, our goal was to maximize or minimize a linear func-
tion subject to linear constraints. But in many interesting maximization
and minimization problems, the objective function may not be a linear
function, or some of the constraints may not be linear constraints. Such
an optimization problem is called a Nonlinear Programming (NLP) prob-
lem.

An NLP problem is characterized by terms or groups of terms that


involve intrinsically nonlinear functions. For example, the transcendental
functions such as sin(x), cos(x), or exponential functions and logarithmic
functions such as ex , ln(x + 1), etc. Nonlinearities also arise as a result of
interactions between two or more variables, such as x ln y, xy, x/y, and so on.


Remember that in studying linear programming solution techniques,


there was a basic underlying structure that was exploited in solving those
problems. This structure found that an optimal solution could be achieved
by solving linear systems of equations. It was also known that an optimal
solution would always be found at an extreme point of the feasible solution
space. Though in solving NLP problems, an optimal solution might be
found at an extreme point, or at a point of discontinuity, in addition,
algorithmic techniques might involve the solution of simultaneous linear
systems of equations, simultaneous nonlinear equations, or both. Before
formally defining an NLP problem, we begin with a review of material from
differential calculus, Taylor’s series approximations, and definitions of the
gradient vector and the Hessian matrix of functions of n variables, which
will be needed for our study of nonlinear programming. A discussion of
quadratic functions and convex functions and sets is also included.

7.2 Review of Differential Calculus


We begin with a review of material from differential calculus, which will
be needed for the discussion of nonlinear programming.

7.2.1 Limits of Functions


The concept of the limit of a function f is one of the fundamental ideas that
distinguishes calculus from algebra and trigonometry. One of the important
things to know about a function f is how its outputs will change when the
inputs change. If the inputs get closer and closer to some specific value a,
for example, will the outputs get closer to some specific value L? If they
do, we want to know that because it means we can control the outputs by
controlling the inputs.

Definition 7.1 (Limits)

The equation
lim_{x→a} f(x) = L

means that as x gets closer to a (real number), the value of f (x) gets
arbitrarily closer to L. Note that the limit of a function may or may not
exist. •

Example 7.1 Find the limit, if it exists:

lim_{x→3} (2x^3 − 6x^2 + x − 3)/(x − 3).
Solution. The domain of the given function

f(x) = (2x^3 − 6x^2 + x − 3)/(x − 3)
is all the real numbers except the number 3. To find the limit we shall
change the form of f (x) by factoring it as follows:

lim_{x→3} (2x^2 + 1)(x − 3)/(x − 3).
When we investigate the limit as x → 3, we assume that x ≠ 3. Hence,
x − 3 ≠ 0, and we can now cancel the factor x − 3. Thus,

lim_{x→3} (2x^2 + 1) = 2(3)^2 + 1 = 19,

and the given limit exists. •

Example 7.2 Find the limit, if it exists:



lim x − 3.
x→3

Solution. The given function is



f (x) = x − 3.

To find the limit of f (x), we have to find the one-side limits, i.e., the
right-hand limit lim_{x→3+} f(x),

and the left-hand limit lim_{x→3−} f(x).

First, we find
lim_{x→3+} √(x − 3).

If x > 3, then x − 3 > 0, and hence f(x) = √(x − 3) is a real number; i.e., f(x) is defined. Thus,
lim_{x→3+} √(x − 3) = √(3 − 3) = 0.

Now we find the left-hand limit



lim− x − 3.
x→3

But this limit does not exist because f(x) = √(x − 3) is not a real number
if x < 3. Thus, the limit of the function

lim_{x→3} √(x − 3)

does not exist because f (x) is not defined throughout an open interval con-
taining 3. •
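These limits can also be checked symbolically in MATLAB; this is a hedged sketch, assuming the Symbolic Math Toolbox is available (not part of the original text):

>> syms x
>> limit((2*x^3 - 6*x^2 + x - 3)/(x - 3), x, 3)   % Example 7.1: returns 19
>> limit(sqrt(x - 3), x, 3, 'right')              % Example 7.2: right-hand limit is 0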

The relationship between one-sided limits and limits is described in the


following theorem.
Theorem 7.1
lim f (x) = L, if and only if lim f (x) = L = lim− f (x).
x→a x→a+ x→a

7.2.2 Continuity of a Function


In mathematics and science, we use the word continuous to describe a
process that goes on without abrupt changes. In fact, our experience leads
us to assume that this is an essential feature of many natural processes.
The issue of continuity has become one of practical as well as theoretical
importance. As scientists, we need to know when continuity is called for,
what it is, and how to test for it.

Definition 7.2 (Continuity)

A function f (x) is continuous at a point a, if the following conditions are


satisfied:
(i) f (a) must be defined (real number).
(ii) lim f (x) must exist.
x→a

(iii) lim f (x) = f (a).


x→a

Note that if f (x) is not a continuous function at x = a, then we say that


f (x) is discontinuous (or has a discontinuity) at a. •

Example 7.3 Show that the function



f(x) = (16 − x^2)/(4 − x)  if x ≠ 4,   and   f(x) = 8  if x = 4,
is continuous at x = 4.

Solution. The given function is continuous at x = 4 because


(i) f (4) = 8 is a real number.
(ii) lim_{x→4} (16 − x^2)/(4 − x) = lim_{x→4} (4 − x)(4 + x)/(4 − x) = lim_{x→4} (x + 4) = 8 exists.

(iii) lim_{x→4} (16 − x^2)/(4 − x) = 8 = f(4). •

7.2.3 Derivative of a Function


Derivatives are the functions that measure the rates at which things change.
We use them to calculate velocities and accelerations, to predict the effect
of flight maneuvers on the heart, and to describe how sensitive formulas
are to errors in measurement.

Definition 7.3 (Differentiation)

The derivative of a function f (x) at an arbitrary point x can be denoted as


f 0 (x) and is defined as

f'(x) = lim_{h→0} [f(x + h) − f(x)]/h,   provided the limit exists.
The process of finding derivatives is called differentiation. •

If a limit does not exist, then the function is not differentiable. Re-
member that the derivative of f (x) at x = a, i.e., f 0 (a) is called the slope
of f (x). If f 0 (a) > 0, then f (x) is increasing at x = a, whereas if f 0 (a) < 0,
then f (x) is decreasing at x = a.

Basic Rules of Differentiation

Functions                          Derivatives of the Functions

f(x) = c                           f'(x) = 0
f(x) = x                           f'(x) = 1
f(x) = ax + b                      f'(x) = a
f(x) = x^n                         f'(x) = n x^(n−1)
f(x) = e^(bx)                      f'(x) = b e^(bx)
f(x) = a^x                         f'(x) = a^x ln a
f(x) = ln x                        f'(x) = 1/x
f(x) = [g(x)]^n                    f'(x) = n [g(x)]^(n−1) g'(x)
f(x) = f1(x) + f2(x)               f'(x) = f1'(x) + f2'(x)
f(x) = f1(x) f2(x)                 f'(x) = f1'(x) f2(x) + f1(x) f2'(x)
f(x) = f1(x)/f2(x)                 f'(x) = [f2(x) f1'(x) − f1(x) f2'(x)] / [f2(x)]^2

Higher Derivatives
Sometimes we have to find the derivatives of derivatives. For this we can
take sequential derivatives to form second derivatives, third derivatives,
and so on. As we have seen, if we differentiate a function f , we obtain

another function denoted f 0 . If f 0 has a derivative, it is denoted f 00 and is


called the second derivative of f . The third derivative of f , denoted f 000 , is
the derivative of the second derivative. In general, if n is a positive integer,
then f n denotes the nth derivative of f and is found by starting with f
and differentiating, successively, n times.

Example 7.4 Find the first three derivatives of the function

f (x) = 2x4 − 5x3 + x2 − 4x + 1.

Solution. By using the differentiation rule, we find the first derivative of


the given function as follows:

f 0 (x) = 8x3 − 15x2 + 2x − 4.

Similarly, we can find the second and the third derivatives of the function
as follows:

f 00 (x) = 24x2 − 30x + 2


f 000 (x) = 48x − 30,

which are the required first three derivatives of the function. •
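These derivatives can also be verified symbolically; this is a hedged sketch assuming the Symbolic Math Toolbox (not part of the original text):

>> syms x
>> f = 2*x^4 - 5*x^3 + x^2 - 4*x + 1;
>> diff(f, x)        % first derivative:  8*x^3 - 15*x^2 + 2*x - 4
>> diff(f, x, 2)     % second derivative: 24*x^2 - 30*x + 2
>> diff(f, x, 3)     % third derivative:  48*x - 30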

To plot the above function f (x) and its first three derivatives f 0 (x),
f 00 (x), f 000 (x), we use the following MATLAB commands:

>> x = -3:0.01:3;
>> y = 2*x.^4 - 5*x.^3 + x.^2 - 4*x + 1;
>> plot(x, y)
>> hold on
>> dy = 8*x.^3 - 15*x.^2 + 2*x - 4;
>> plot(x, dy)
>> ddy = 24*x.^2 - 30*x + 2;
>> plot(x, ddy)
>> dddy = 48*x - 30;
>> plot(x, dddy)

Figure 7.1: Higher-order derivatives of the function.

7.2.4 Local Extrema of a Function


One of the principal applications of derivatives is to find the local maxi-
mum and local minimum values (local extrema) of a function in an interval.
Points at which the first derivative of the function is zero (f 0 (x) = 0) are
called critical points of f (x). Although the condition f 0 (c) = 0 (c is called
the critical point of f (x)) is used to find extrema, it does not guarantee
that f (x) have a local extremum there. For example, f (x) = x3 ; f 0 (0) = 0,
but f (x) has no extreme value at x = 0.

Let f (x) be continuous on the open interval (a, b) and let f 0 (x) exist
and be continuous on (a, b). If f 0 (x) > 0 in (a, c) and f 0 (x) < 0 in (c, b),
then f (x) is concave downward at x = c. On the other hand, if f 0 (x) < 0
in (a, c) and f 0 (x) > 0 in (c, b), then f (x) is concave upward at x = c. The
type of concavity is related to the sign of the second derivative, and so
we have the second derivative test to determine if a critical point is local
extremum or not.

Theorem 7.2 (Second Derivative Test)

If f'(c) = 0 and f''(c) exists, then:

(i) if f''(c) < 0, then f(x) has a local maximum at x = c;

(ii) if f''(c) > 0, then f(x) has a local minimum at x = c;

(iii) if f''(c) = 0, then the test gives no information at x = c.

A point x = D is called an inflection point if f(x) is concave downward on one side of D and concave upward on the other side. Consequently, f''(x) = 0
at an inflection point. It is not necessary that f 0 (x) = 0 at an inflection
point. •

Example 7.5 Find the local extrema and inflection points of the function

f (x) = x3 − 2x2 + x + 1

over the entire x-axis.

Solution. The first derivative of the function is

f 0 (x) = 3x2 − 4x + 1,

and the equation

f 0 (x) = 3x2 − 4x + 1 = (3x − 1)(x − 1) = 0


shows that there are two critical points, x = 1/3 and x = 1. The second
derivative of the function is

f 00 (x) = 6x − 4.

The fact that f''(1/3) = −2 < 0 and f''(1) = 2 > 0 tells us that the critical point x = 1/3 is the local maximum (f(1/3) = 31/27) and x = 1 is the local minimum (f(1) = 1) of f(x). The inflection point is given by f''(x) = 0, i.e., at x = 2/3, which is the point (2/3, 29/27). •

Figure 7.2: Local extrema of the function.

Example 7.6 Find the maximum and minimum values of the function
f (x) = x3 − 6x2 − 15x + 1
on the closed interval [−2, 6].

Solution. First, we find the critical points of the function by differentiating


the function, which gives
f 0 (x) = 3x2 − 12x − 15 = 3(x + 1)(x − 5).
Since the derivative exists for every x, the only critical points are those
for which the derivative is zero—i.e., −1 and 5. As f (x) is continuous on
[−2, 6], it follows that the maximum and minimum values are among the
numbers f (−2), f (−1), f (5), and f (6). Calculating these values, we obtain
the following:
f (−2) = −1, f (−1) = 9, f (5) = −99, f (6) = −89.
Thus, the minimum value of f (x) on [−2, 6] is the smallest function
value f (5) = −99, and the maximum value is the largest value f (−1) = 9
on [−2, 6].

Figure 7.3: Absolute extrema of the function.

Note that the extrema of a function on the closed interval [a, b] are also called the absolute extrema of the function. •

The MATLAB command fminbnd can be used to find the minimum


of a function of a single variable within the interval [a, b]. It has the form:

x = fminbnd('function', a, b)

Note that the function can be entered as a string, as the name of a function
file, or as the name of an inline function, i.e.,

>> f = inline('x^3 - 6*x^2 - 15*x + 1');
>> a = -2; b = 6;
>> x = fminbnd(f, a, b)
x =
     5.0000

The value of the function at the minimum can be added to the output by
using the following command:

>> [x, fvalue] = fminbnd('x^3 - 6*x^2 - 15*x + 1', -2, 6)
x =
     5.0000
fvalue =
   -99.0000

Also, the fminbnd command can be used to find the maximum of a func-
tion, which can be done by multiplying the function by −1 and then finding
the minimum. For example:

>> [x, fvalue] = fminbnd('-x^3 + 6*x^2 + 15*x - 1', -2, 6)
x =
    -1.0000
fvalue =
    -9.0000

Note that the maximum of the function is at x = −1, and the value of the
function at this point is 9.

Definition 7.4 (Partial Differentiation)

Let f be a function of two variables. The first partial derivative of f with


respect to x1 and x2 are the functions fx1 and fx2 , such that

fx1(x1, x2) = lim_{h→0} [f(x1 + h, x2) − f(x1, x2)]/h
fx2(x1, x2) = lim_{h→0} [f(x1, x2 + h) − f(x1, x2)]/h.
In this definition, x1 and x2 are fixed (but arbitrary) and h is the only
variable; hence, we use the notation for limits of functions of one vari-
able instead of the (x1 , x2 ) → (a, b) notation introduced previously. We

can find partial derivatives without using limits, as follows. If we let


y = b and define a function g of one variable by g(x1 ) = f (x1 , x2 ), then
g 0 (x1 ) = fx1 (x1 , b) = fx1 (x1 , x2 ). Thus, to find fx1 (x1 , x2 ), we regard x2
as a constant and differentiate f (x1 , x2 ) with respect to x. Similarly, to
find fx2 (x1 , x2 ), we regard the variable x1 as a constant and differentiate
f (x1 , x2 ) with respect to x2 . •

Notations for Partial Derivatives


If z = f (x1 , x2 ), then the first partial derivative of a function is defined as

fx1 = ∂f/∂x1,    fx2 = ∂f/∂x2,
fx1(x1, x2) = ∂/∂x1 f(x1, x2) = ∂z/∂x1 = zx1,
fx2(x1, x2) = ∂/∂x2 f(x1, x2) = ∂z/∂x2 = zx2.

Second Partial Derivatives


If f is a function of two variables x1 and x2 , then fx1 and fx2 are also
functions of two variables, and we may consider their first partial deriva-
tives. These are called the second partial derivatives of f and are denoted
as follows:
∂/∂x1 (fx1) = (fx1)x1 = fx1x1 = ∂/∂x1 (∂f/∂x1) = ∂^2 f/∂x1^2
∂/∂x2 (fx1) = (fx1)x2 = fx1x2 = ∂/∂x2 (∂f/∂x1) = ∂^2 f/∂x2 ∂x1
∂/∂x1 (fx2) = (fx2)x1 = fx2x1 = ∂/∂x1 (∂f/∂x2) = ∂^2 f/∂x1 ∂x2
∂/∂x2 (fx2) = (fx2)x2 = fx2x2 = ∂/∂x2 (∂f/∂x2) = ∂^2 f/∂x2^2.

Theorem 7.3 Let f be a function of two variables x1 and x2 . If f, fx1 ,


fx2 , fx1 x2 , and fx2 x1 are continuous on an open region R, then

f x1 x2 = f x2 x1 ,

throughout R. •

Example 7.7 Find the first partial derivatives of the function


f(x1, x2) = x1^2 + √(x1^2 + x2^2 + 4),

and also compute the value of fx1 x2 (1, 2).

Solution. The first partial derivatives of the given function are as follows:

∂f x1
fx1 = = 2x1 + p 2
∂x1 x1 + x22 + 4
∂f x2
fx2 = =p 2 .
∂x2 x1 + x22 + 4

Similarly, the second derivative is

fx1x2(x1, x2) = ∂fx1/∂x2 = −x1 x2/(x1^2 + x2^2 + 4)^(3/2),

and its value at (1, 2) is

fx1x2(1, 2) = −(1)(2)/(1^2 + 2^2 + 4)^(3/2) = −0.0741.

To plot the above function f (x1 , x2 ) and its partial derivatives fx1 , fx2 ,
fx1 x2 , we use the following MATLAB commands

>> syms x1 x2
>> f = x1^2 + sqrt(x1^2 + x2^2 + 4);
>> fx1 = diff(f, x1); fx2 = diff(f, x2);
>> fx1x2 = diff(fx1, x2);
>> simplify(fx1x2)
>> subs(fx1x2, {x1, x2}, {1, 2})
>> ezmesh(f)
>> ezmesh(fx1)
>> ezmesh(fx1x2)

Figure 7.4: Partial derivatives of the function.

Example 7.8 Find the second partial derivatives of the function

f(x1, x2, x3) = 3x1^3 + 4x1^2 x2 + 2x2^3 − 2x2 x3^2 + 4x3^3.

Solution. The first partial derivatives of the given function are as follows:

fx1 = ∂f/∂x1 = 9x1^2 + 8x1 x2
fx2 = ∂f/∂x2 = 4x1^2 + 6x2^2 − 2x3^2
fx3 = ∂f/∂x3 = −4x2 x3 + 12x3^2.

Similarly, the second derivatives of the functions are as follows:


∂fx1 ∂ 2f
(fx1 )x1 = = = 18x1 + 8x2
∂x1 2 ∂x1 2
∂fx2 ∂ 2f
(fx2 )x2 = = = 12x2
∂x2 2 ∂x2 2
∂fx3 ∂ 2f
(fx3 )x3 = = = −4x2 + 24x3
∂x3 2 ∂x3 2
∂fx1 ∂ 2f ∂ 2f
(fx1 )x2 = = = 8x1 = = (fx2 )x1
∂x2 ∂x2 ∂x1 ∂x1 ∂x2
∂fx1 ∂ 2f ∂fx3 ∂ 2f
(fx1 )x3 = = =0= = = (fx3 )x1
∂x3 ∂x3 ∂x1 ∂x1 ∂x1 ∂x3
∂fx2 ∂ 2f ∂fx3 ∂ 2f
(fx2 )x3 = = = −4x3 = = = (fx3 )x2 .
∂x3 ∂x3 ∂x2 ∂x2 ∂x2 ∂x3

In the following we give a theorem that is analogous to the Second


Derivative Test for functions of one variable.
Theorem 7.4 (Second Partials Test)

Suppose that f (x1 , x2 ) has continuous partial derivatives in a neighborhood


of a point (x10 , x20 ) and that ∇f (x10 , x20 ) = 0. Let

D = D(x10, x20) = fx1x1(x10, x20) fx2x2(x10, x20) − [fx1x2(x10, x20)]^2.

Then:
(i) if D > 0 and fx1 x1 (x10 , x20 ) < 0, f (x10 , x20 ) is a local maximum
value;

(ii) if D > 0 and fx1 x1 (x10 , x20 ) > 0, f (x10 , x20 ) is a local minimum
value;

(iii) if D < 0, f (x10 , x20 ) is not an extreme value ((x10 , x20 ) is a saddle point) ;

(iv) if D = 0, the test is inconclusive. •

Example 7.9 Find the extrema, if any, of the function

f(x1, x2) = 4x1^3 + 2x2^2 − 12x1 + 8x2 + 2.

Solution. Since the first derivatives of the function with respect to x1 and
x2 are

fx1(x1, x2) = 12x1^2 − 12  and  fx2(x1, x2) = 4x2 + 8,

the critical points obtained by solving the simultaneous equations fx1 (x1 , x2 ) =
fx2 (x1 , x2 ) = 0, are (1, −2) and (−1, −2).

To find the critical points for the given function f (x1 , x2 ) using MAT-
LAB commands we do the following:

>> syms x1 x2
>> f = 4*x1^3 + 2*x2^2 - 12*x1 + 8*x2 + 2;
>> fx1 = diff(f, x1);
>> fx2 = diff(f, x2);
>> [x1c, x2c] = solve(fx1, fx2)
x1c =
 [  1]
 [ -1]
x2c =
 [ -2]
 [ -2]
Similarly, the second partial derivatives of the function are

fx1 x1 (x1 , x2 ) = 24x1 , fx2 x2 (x1 , x2 ) = 4, and fx1 x2 (x1 , x2 ) = 0.

Thus, at the critical point (1, −2), we get

D = fx1x1(1, −2) fx2x2(1, −2) − [fx1x2(1, −2)]^2 = 24(4) − 0 = 96 > 0.

Furthermore, fx1 x1 (1, −2) = 24 > 0 and so, by (ii), f (1, −2) = −14 is a
local minimum value of the given function f (x1 , x2 ).

Now testing the given function at the other critical point, (−1, −2), we find
that

D = fx1x1(−1, −2) fx2x2(−1, −2) − [fx1x2(−1, −2)]^2 = (−24)(4) − 0 = −96 < 0.

Thus, by (iii), (−1, −2) is a saddle point and f(−1, −2) is not an extremum.

To get the above results we use the following MATLAB commands:

>> syms x1 x2
>> f = 4 ∗ x1 ˆ 3 + 2 ∗ x2 ˆ 2 − 12 ∗ x1 + 8 ∗ x2 + 2;
>> f x1 = dif f (f, x1 );
>> f x1 x1 = dif f (f x1 , x1 );
>> f x2 = dif f (f, x2 );
>> f x2 x2 = dif f (f x2 , x2 );
>> f x1 x2 = dif f (f x1 , x2 );
>> D = f x1 x1 ∗ f x2 x2 − fx1 x2 ˆ 2;
>> [ax1 ax2 ] = solve(f x1 , f x2 );
>> T = [ax1 ax2 subs(D, x1 x2 , ax1 ax2 ) subs(f x1 x1 , x1 x2 , ax1 ax2 )];
>> double(T );
ans =
1 −2 96 24
−1 −2 −96 −24

To plot a symbolic expression f that contains two variables x1 and x2 , we


use the ezplot command as follows:

>> ezplot(f );

7.2.5 Directional Derivatives and the Gradient Vector


Here, we introduce a type of derivative, called a directional derivative, that
enables us to find the rate of change of a function of two or more variables
in any direction.

Figure 7.5: Local extrema of the function.

Definition 7.5 (Directional Derivatives)

Let z = f(x_1, x_2) be a function. The directional derivative of f(x_1, x_2) at the point (x_{10}, x_{20}) in the direction of a unit vector u = (u_1, u_2) is given by

D_u f(x_{10}, x_{20}) = \lim_{h→0} \frac{f(x_{10} + hu_1, x_{20} + hu_2) − f(x_{10}, x_{20})}{h},

provided the limit exists. •

Notice that this definition is similar to the definition of a partial derivative, except that in this case both variables may be changed. Further, one can observe that the directional derivative in the direction of the positive x_1-axis (i.e., in the direction of the unit vector u = (1, 0)) is

D_u f(x_{10}, x_{20}) = \lim_{h→0} \frac{f(x_{10} + h, x_{20}) − f(x_{10}, x_{20})}{h},

which is the partial derivative ∂f/∂x_1. Likewise, the directional derivative in the direction of the positive x_2-axis (i.e., in the direction of the unit vector u = (0, 1)) is

D_u f(x_{10}, x_{20}) = \lim_{h→0} \frac{f(x_{10}, x_{20} + h) − f(x_{10}, x_{20})}{h},

which is the partial derivative ∂f/∂x_2. So this means that any directional derivative can be calculated simply in terms of the first partial derivatives.

Theorem 7.5 If f(x_1, x_2) is a differentiable function of x_1 and x_2, then f(x_1, x_2) has a directional derivative in the direction of any unit vector u = (u_1, u_2), and it can be written as

D_u f(x_1, x_2) = \frac{∂f}{∂x_1} u_1 + \frac{∂f}{∂x_2} u_2.

Example 7.10 Find the directional derivative of f(x_1, x_2) = x_1^2 x_2 + 3x_2^2 at the point (1, 2) in the direction of the vector u = (3, −2).

Solution. Since we know that

D_u f(x_1, x_2) = \frac{∂f}{∂x_1} u_1 + \frac{∂f}{∂x_2} u_2,

we can easily compute the first partial derivatives of the given function as

f_{x_1} = ∂f/∂x_1 = 2x_1 x_2  and  f_{x_2} = ∂f/∂x_2 = x_1^2 + 6x_2,

and their values at the given point (1, 2) are

fx1 (1, 2) = 4 and fx2 (1, 2) = 13.

Thus, for the given unit vector u = (3, −2), and using Theorem 7.5, we
have
Du f (1, 2) = 4(3) + 13(−2) = 12 − 26 = −14,
which is the required solution. •

To get the results of Example 7.10, we use the following MATLAB commands:

>> syms x1 x2
>> u1 = 3; u2 = -2;
>> f = x1^2*x2 + 3*x2^2;
>> fx1 = diff(f, x1); fx2 = diff(f, x2);
>> duf = subs(fx1, {x1, x2}, {1, 2})*u1 + subs(fx2, {x1, x2}, {1, 2})*u2;

It is useful to combine the first partial derivatives of a function into a single


vector function called a gradient. We denote the gradient of a function f
by grad f or ∇f.

Definition 7.6 (The Gradient Vector)

Let z = f(x_1, x_2) be a function; then the gradient of f(x_1, x_2) is the vector function ∇f defined by

∇f(x_1, x_2) = \left( \frac{∂f}{∂x_1}, \frac{∂f}{∂x_2} \right) = \frac{∂f}{∂x_1} i + \frac{∂f}{∂x_2} j,

provided that both partial derivatives exist. •

Similarly, the vector of partial derivatives of a function f(x_1, x_2, \ldots, x_n) with respect to the point x = (x_1, x_2, \ldots, x_n) is defined as

∇f(x) = \begin{pmatrix} ∂f(x)/∂x_1 \\ ∂f(x)/∂x_2 \\ \vdots \\ ∂f(x)/∂x_n \end{pmatrix}.

One can easily compute the directional derivatives by using the following
theorem.

Theorem 7.6 If f(x_1, x_2) is a differentiable function of x_1 and x_2, and u is any unit vector, then

D_u f(x_1, x_2) = ∇f(x_1, x_2) · u.

Example 7.11 Find the gradient of the function f (x1 , x2 , x3 ) = x1 cos(x2 x3 )


at the point (2, 0, 2).

Solution. The gradient of the function is

∇f (x1 , x2 , x3 ) = (fx1 , fx2 , fx3 ) = (cos(x2 x3 ), −x1 x3 sin(x2 x3 ), −x1 x2 sin(x2 x3 )).

At the given point (2, 0, 2), we have

∇f (2, 0, 2) = (1, 0, 0).

The MATLAB command jacobian can be used to get the gradient of


the function as follows:

>> syms x1 x2 x3
>> f = x1*cos(x2*x3);
>> gradf = jacobian(f, [x1, x2, x3])
gradf =
[ cos(x2*x3), -x1*sin(x2*x3)*x3, -x1*sin(x2*x3)*x2]

Theorem 7.7 The point x̄ is a stationary point of f (x), if and only if

∇f (x̄) = 0.

Example 7.12 Consider the function

f(x_1, x_2) = x_1^2 + (x_2 − 1)^2 + 3;

then

∇f(x_1, x_2) = \begin{pmatrix} 2x_1 \\ 2(x_2 − 1) \end{pmatrix},

and at the stationary point x̄ = [0, 1]^T the gradient vector of the function vanishes:

∇f(0, 1) = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
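This stationary point can also be located symbolically; the following MATLAB lines are only an illustrative sketch (the names x1s and x2s are hypothetical):

>> syms x1 x2
>> f = x1^2 + (x2 - 1)^2 + 3;
>> g = jacobian(f, [x1, x2]);       % gradient of f as a row vector
>> [x1s, x2s] = solve(g(1), g(2))   % solve grad f = 0
x1s =
0
x2s =
1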

7.2.6 Hessian Matrix


If a function f is twice continuously differentiable, then there exists a matrix H of second partial derivatives, called the Hessian matrix, whose entries are given by

h_{ij} = \frac{∂^2 f}{∂x_i ∂x_j}, \quad i, j = 1, 2, \ldots, n.

For example, the Hessian matrix of size 2 × 2 can be written as

H(x_1, x_2) = ∇^2 f(x_1, x_2) = \begin{pmatrix} \frac{∂^2 f}{∂x_1^2} & \frac{∂^2 f}{∂x_1 ∂x_2} \\ \frac{∂^2 f}{∂x_2 ∂x_1} & \frac{∂^2 f}{∂x_2^2} \end{pmatrix}.

This matrix is formally referred to as the Hessian of f. Note that the Hessian matrix is square and symmetric.

Example 7.13 Find the Hessian matrix of the function

f(x_1, x_2) = x_1^2 + 2(x_2 − 1)^2 + 3x_1 x_2 + 4.

Solution. The first-order partial derivatives of the given function are

∂f/∂x_1 = 2x_1 + 3x_2, \qquad ∂f/∂x_2 = 4(x_2 − 1) + 3x_1,

and the second-order partial derivatives of the given function are

∂^2 f/∂x_1^2 = 2, \qquad ∂^2 f/∂x_2^2 = 4.

Also, the mixed partial derivatives are

∂^2 f/∂x_1 ∂x_2 = 3, \qquad ∂^2 f/∂x_2 ∂x_1 = 3.

Thus,

H(x_1, x_2) = \begin{pmatrix} 2 & 3 \\ 3 & 4 \end{pmatrix}
is the Hessian matrix of the given function. •

To get the above results we use the following MATLAB commands:

>> syms x1 x2
>> fun = x1^2 + 2*(x2 - 1)^2 + 3*x1*x2 + 4;
>> T = Hessian(fun, [x1, x2]);
>> double(T)
ans =
     2     3
     3     4

Program 7.1
MATLAB m-file for the Hessian Matrix
function H = Hessian(fun,Vars)
n = numel(Vars);
Hess = vpa(ones(n,n));
for j = 1:n
    for i = 1:n
        Hess(j,i) = diff(diff(fun,Vars(i),1),Vars(j),1);
    end
end
H = Hess;

For the n-dimensional case, the Hessian matrix H(x) is defined as follows:

H(x) = \begin{pmatrix}
\frac{∂^2 f(x)}{∂x_1^2} & \frac{∂^2 f(x)}{∂x_1 ∂x_2} & \cdots & \frac{∂^2 f(x)}{∂x_1 ∂x_n} \\
\frac{∂^2 f(x)}{∂x_2 ∂x_1} & \frac{∂^2 f(x)}{∂x_2^2} & \cdots & \frac{∂^2 f(x)}{∂x_2 ∂x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{∂^2 f(x)}{∂x_n ∂x_1} & \frac{∂^2 f(x)}{∂x_n ∂x_2} & \cdots & \frac{∂^2 f(x)}{∂x_n^2}
\end{pmatrix}.

Checking the sign of the second derivative when n = 1 corresponds to checking the definiteness of the Hessian matrix when n > 1. Let us consider a constant matrix H of size n × n and a nonzero n-dimensional vector z; then:

1. H is positive-definite, if and only if zT Hz > 0.

2. H is positive-semidefinite, if and only if zT Hz ≥ 0.

3. H is negative-definite, if and only if zT Hz < 0.

4. H is negative-semidefinite, if and only if zT Hz ≤ 0.

Note that if H is the zero matrix (so that it is both positive-semidefinite


and negative-semidefinite) or if the sign of zT Hz varies with the choice of
z, we shall say that H is indefinite.
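For a numeric Hessian these conditions are conveniently (though not uniquely) checked through the eigenvalues of H, since a symmetric matrix is positive-definite exactly when all of its eigenvalues are positive (compare Theorem 7.11 below). A minimal MATLAB sketch, assuming H is symmetric, is:

% Classify a symmetric matrix H by the signs of its eigenvalues
H = [4 0; 0 4];     % example matrix; replace with the Hessian of interest
lam = eig(H);       % eigenvalues of a symmetric matrix are real
if all(lam > 0)
    disp('H is positive-definite')
elseif all(lam >= 0)
    disp('H is positive-semidefinite')
elseif all(lam < 0)
    disp('H is negative-definite')
elseif all(lam <= 0)
    disp('H is negative-semidefinite')
else
    disp('H is indefinite')
end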

The relationships between the Hessian matrix definiteness and the clas-
sification of stationary points are discussed in the following two theorems.

Theorem 7.8 (Minima of a Function)

If x̄ is a stationary point of f (x), then the following conditions are satisfied:

1. H(x̄) is positive-definite implies that x̄ is a strict minimizing point.



2. H(x̄) is positive-semidefinite for all x̄, in some neighborhood of x̄,


implies that x̄ is a minimizing point.

3. x̄ is a minimizing point implies that H(x̄) is positive-semidefinite. •

Theorem 7.9 (Maxima of a Function)

If x̄ is a stationary point of f (x), then the following conditions are satisfied:


1. H(x̄) is negative-definite implies that x̄ is a strict maximizing point.

2. H(x̄) is negative-semidefinite for all x̄, in some neighborhood of x̄,


implies that x̄ is a maximizing point.

3. x̄ is a maximizing point implies that H(x̄) is negative-semidefinite.•

Since we know that the second derivative test for a function of one
variable gives no information when the second derivative of a function is
zero, similarly, if H(x̄) is indefinite or if H(x̄) is positive-semidefinite at
x̄ but not all points are in a neighborhood of x̄, then the function might
have a maximum, or a minimum, or neither at x̄.

Example 7.14 Consider the function

f(x_1, x_2) = 2x_1^2 + 2x_2^2 + 4;

then the Hessian matrix of the given function can be found as

H(x_1, x_2) = \begin{pmatrix} 4 & 0 \\ 0 & 4 \end{pmatrix}.

To check the definiteness of H, take

z^T H z = \begin{pmatrix} z_1 & z_2 \end{pmatrix} \begin{pmatrix} 4 & 0 \\ 0 & 4 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \end{pmatrix},

which gives

z^T H z = \begin{pmatrix} z_1 & z_2 \end{pmatrix} \begin{pmatrix} 4z_1 \\ 4z_2 \end{pmatrix} = 4z_1^2 + 4z_2^2.

Note that

z^T H z = 4(z_1^2 + z_2^2) > 0

for z ≠ 0, so the Hessian matrix is positive-definite and the stationary point x̄ = [0, 0]^T is a strict local minimizing point. •

Example 7.15 Consider the function

f(x_1, x_2) = (x_1 − 1)^2 − \frac{1}{2}x_2^2 + 1;

then the gradient vector of the given function is

∇f(x_1, x_2) = \begin{pmatrix} 2(x_1 − 1) \\ −x_2 \end{pmatrix}.

The stationary point can be found from

∇f(x_1, x_2) = \begin{pmatrix} 2(x_1 − 1) \\ −x_2 \end{pmatrix} = 0,

which gives

x̄ = [1, 0]^T.

The Hessian matrix for the given function is

H(x_1, x_2) = \begin{pmatrix} 2 & 0 \\ 0 & −1 \end{pmatrix}.

To check the definiteness of H, take

z^T H z = \begin{pmatrix} z_1 & z_2 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 0 & −1 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \end{pmatrix},

and it gives

z^T H z = \begin{pmatrix} z_1 & z_2 \end{pmatrix} \begin{pmatrix} 2z_1 \\ −z_2 \end{pmatrix} = 2z_1^2 − z_2^2.

The sign of z^T H z clearly depends on the particular values taken on by z_1 and z_2, so the Hessian matrix is indefinite and the stationary point x̄ cannot be classified on the basis of this test. •
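The indefiniteness of this Hessian can also be confirmed numerically from its eigenvalues (see the eigenvalue criterion of Theorem 7.11 below); a quick, illustrative MATLAB check is:

>> H = [2 0; 0 -1];
>> eig(H)
ans =
    -1
     2

Since the eigenvalues have mixed signs, H is indefinite.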

7.2.7 Taylor’s Series Expansion


Let f(x) be a function of a single variable x. If f(x) has continuous derivatives f'(x), f''(x), \ldots, then Taylor's series can be used to approximate this function about x_0 as follows:

f(x) = f(x_0) + (x − x_0)f'(x_0) + \frac{(x − x_0)^2}{2!}f''(x_0) + \frac{(x − x_0)^3}{3!}f'''(x_0) + \cdots.

A linear approximation of the function can be obtained by truncating the above series after the second term, i.e.,

f(x) ≈ T_1(x) = f(x_0) + (x − x_0)f'(x_0),        (7.1)

whereas the quadratic approximation of the function can be computed as

f(x) ≈ T_2(x) = f(x_0) + (x − x_0)f'(x_0) + \frac{(x − x_0)^2}{2!}f''(x_0).        (7.2)
Example 7.16 Find the cubic approximation of the function f (x) = ex cos x
expanded about x0 = 0.

Solution. The cubic approximation of the function about x_0 is

T_3(x) = f(x_0) + \frac{(x − x_0)}{1!}f'(x_0) + \frac{(x − x_0)^2}{2!}f''(x_0) + \frac{(x − x_0)^3}{3!}f'''(x_0).        (7.3)

Since f(x) = e^x \cos x, then f(x_0) = f(0) = 1, and calculating the derivatives required for the desired polynomial T_3(x), we get

f'(x) = e^x(\cos x − \sin x),      f'(x_0) = f'(0) = 1
f''(x) = −2e^x \sin x,             f''(x_0) = f''(0) = 0
f'''(x) = −2e^x(\cos x + \sin x),  f'''(x_0) = f'''(0) = −2.

Putting all these values in (7.3), we get

T_3(x) = 1 + x + 0 − 2\frac{x^3}{6}.

Thus,

f(x) ≈ T_3(x) = 1 + x − \frac{x^3}{3}

is the cubic approximation of the given function about x_0 = 0. •

The MATLAB command for a Taylor polynomial is taylor(f, n + 1, a),


where f is the function, n is the order of the polynomial, and a is the point
about which the expansion is made. We can use the following MATLAB
commands to get the above results:

>> syms x
>> f = inline('exp(x)*cos(x)');
>> taylor(f(x), 4, 0)
ans = 1 + x - 1/3*x^3

Now consider a function f(x, y) of two variables, all of whose partial derivatives are continuous. Then we can approximate this function about a given point (x_0, y_0) using Taylor's series as follows:

f(x, y) = f(x_0, y_0) + \left[ \frac{∂f(x_0, y_0)}{∂x}(x − x_0) + \frac{∂f(x_0, y_0)}{∂y}(y − y_0) \right]
        + \frac{1}{2}\left[ \frac{∂^2 f(x_0, y_0)}{∂x^2}(x − x_0)^2 + 2\frac{∂^2 f(x_0, y_0)}{∂x ∂y}(x − x_0)(y − y_0) + \frac{∂^2 f(x_0, y_0)}{∂y^2}(y − y_0)^2 \right] + \cdots.        (7.4)

Writing the above expression in more compact form by using matrix notation gives

f(x, y) = f(x_0, y_0) + \left( \frac{∂f(x_0, y_0)}{∂x} \;\; \frac{∂f(x_0, y_0)}{∂y} \right) \begin{pmatrix} x − x_0 \\ y − y_0 \end{pmatrix}
        + \frac{1}{2}\left( x − x_0 \;\; y − y_0 \right) \begin{pmatrix} \frac{∂^2 f(x_0, y_0)}{∂x^2} & \frac{∂^2 f(x_0, y_0)}{∂x ∂y} \\ \frac{∂^2 f(x_0, y_0)}{∂x ∂y} & \frac{∂^2 f(x_0, y_0)}{∂y^2} \end{pmatrix} \begin{pmatrix} x − x_0 \\ y − y_0 \end{pmatrix} + \cdots.

Denote the first derivative of f by ∇f and the second derivative by ∇^2 f. Then a 2 × 1 vector called the gradient vector is defined as

∇f(x_0, y_0) = \begin{pmatrix} ∂f(x_0, y_0)/∂x \\ ∂f(x_0, y_0)/∂y \end{pmatrix},

and a 2 × 2 matrix called the Hessian matrix is defined as

∇^2 f(x_0, y_0) = \begin{pmatrix} \frac{∂^2 f(x_0, y_0)}{∂x^2} & \frac{∂^2 f(x_0, y_0)}{∂x ∂y} \\ \frac{∂^2 f(x_0, y_0)}{∂y ∂x} & \frac{∂^2 f(x_0, y_0)}{∂y^2} \end{pmatrix}.

Also, note that

∇f(x_0, y_0)^T = \left( \frac{∂f(x_0, y_0)}{∂x} \;\; \frac{∂f(x_0, y_0)}{∂y} \right)

and

\begin{pmatrix} x − x_0 \\ y − y_0 \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix} − \begin{pmatrix} x_0 \\ y_0 \end{pmatrix} = (x − x_0) ≡ ∆x.

Thus, Taylor's series for a function of two variables can be written as

f(x, y) = f(x_0, y_0) + ∇f(x_0, y_0)^T ∆x + \frac{1}{2}∆x^T ∇^2 f(x_0, y_0)∆x + \cdots.        (7.5)
Similarly, for a function of n variables, x = [x_1, x_2, \ldots, x_n], an n × 1 gradient vector and an n × n Hessian matrix can be used in Taylor's series for n variables, defined as

∇f(x) = \begin{pmatrix} ∂f/∂x_1 \\ ∂f/∂x_2 \\ \vdots \\ ∂f/∂x_n \end{pmatrix}

and

∇^2 f(x) = \begin{pmatrix}
\frac{∂^2 f(x)}{∂x_1^2} & \frac{∂^2 f(x)}{∂x_1 ∂x_2} & \cdots & \frac{∂^2 f(x)}{∂x_1 ∂x_n} \\
\frac{∂^2 f(x)}{∂x_2 ∂x_1} & \frac{∂^2 f(x)}{∂x_2^2} & \cdots & \frac{∂^2 f(x)}{∂x_2 ∂x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{∂^2 f(x)}{∂x_n ∂x_1} & \frac{∂^2 f(x)}{∂x_n ∂x_2} & \cdots & \frac{∂^2 f(x)}{∂x_n^2}
\end{pmatrix}.
For a twice continuously differentiable function, the mixed second partial derivatives are symmetric, i.e.,

\frac{∂^2 f(x)}{∂x_i ∂x_j} = \frac{∂^2 f(x)}{∂x_j ∂x_i}, \quad i, j = 1, 2, \ldots, n,

which implies that the Hessian matrix is always symmetric. Thus, Taylor's series approximation for a function of several variables about a given point x^* can be written as

f(x) = f(x^*) + ∇f(x^*)^T ∆x + \frac{1}{2}∆x^T ∇^2 f(x^*)∆x + \cdots.
Example 7.17 Find the linear and quadratic approximations of the function

f(x_1, x_2) = x_1^2 + 5x_1 x_2^2 + 3x_2^2 + 4x_1^3 x_2^3

at the given point (a, b) = (1, 1) using Taylor's series formulas.

Solution. The first-order partial derivatives of the function are

∂f/∂x_1 = 2x_1 + 5x_2^2 + 12x_1^2 x_2^3

and

∂f/∂x_2 = 10x_1 x_2 + 6x_2 + 12x_1^3 x_2^2.

Thus, the gradient of the function is

∇f(x_1, x_2)^T = [\,2x_1 + 5x_2^2 + 12x_1^2 x_2^3 \quad 10x_1 x_2 + 6x_2 + 12x_1^3 x_2^2\,]

and

∆x = \begin{pmatrix} x_1 − a \\ x_2 − b \end{pmatrix}.

Linear Approximation Formula

The linear approximation formula for two variables is

f(x_1, x_2) ≈ T_1(x_1, x_2) = f(a, b) + ∇f(a, b)^T ∆x.

Given (a, b) = (1, 1), we get the following values:

f(a, b) = f(1, 1) = 13,
∇f(1, 1)^T = [\,19 \quad 28\,],
∆x = \begin{pmatrix} x_1 − 1 \\ x_2 − 1 \end{pmatrix}.

Putting these values in the above linear approximation formula, we get

f(x_1, x_2) ≈ T_1(x_1, x_2) = 13 + [\,19 \quad 28\,]\begin{pmatrix} x_1 − 1 \\ x_2 − 1 \end{pmatrix}

or

f(x_1, x_2) ≈ T_1(x_1, x_2) = 19x_1 + 28x_2 − 34,
which is the required linear approximation for the given function.

Quadratic Approximation Formula

The quadratic approximation formula for two variables is defined as

f(x_1, x_2) ≈ T_2(x_1, x_2) = f(a, b) + ∇f(a, b)^T ∆x + \frac{1}{2}∆x^T ∇^2 f(a, b)∆x.

Now we compute the second-order partial derivatives as follows:

∂^2 f/∂x_1^2 = 2 + 24x_1 x_2^3
∂^2 f/∂x_2^2 = 10x_1 + 6 + 24x_1^3 x_2
∂^2 f/∂x_1 ∂x_2 = 10x_2 + 36x_1^2 x_2^2
∂^2 f/∂x_2 ∂x_1 = 10x_2 + 36x_1^2 x_2^2,

and so

∇^2 f(x_1, x_2) = \begin{pmatrix} \frac{∂^2 f(x_1, x_2)}{∂x_1^2} & \frac{∂^2 f(x_1, x_2)}{∂x_1 ∂x_2} \\ \frac{∂^2 f(x_1, x_2)}{∂x_2 ∂x_1} & \frac{∂^2 f(x_1, x_2)}{∂x_2^2} \end{pmatrix} = \begin{pmatrix} 2 + 24x_1 x_2^3 & 10x_2 + 36x_1^2 x_2^2 \\ 10x_2 + 36x_1^2 x_2^2 & 10x_1 + 6 + 24x_1^3 x_2 \end{pmatrix}.

Thus,

∇^2 f(1, 1) = \begin{pmatrix} 26 & 46 \\ 46 & 40 \end{pmatrix}.

So using the quadratic approximation formula

f(x_1, x_2) ≈ T_2(x_1, x_2) = T_1(x_1, x_2) + \frac{1}{2}∆x^T ∇^2 f(a, b)∆x,

we get

f(x_1, x_2) ≈ T_2(x_1, x_2) = T_1(x_1, x_2) + \frac{1}{2}[\,(x_1 − 1) \quad (x_2 − 1)\,]\begin{pmatrix} 26 & 46 \\ 46 & 40 \end{pmatrix}\begin{pmatrix} x_1 − 1 \\ x_2 − 1 \end{pmatrix},

which gives

f(x_1, x_2) ≈ T_2(x_1, x_2) = 19x_1 + 28x_2 − 34
        + \frac{1}{2}\left(26x_1^2 + 40x_2^2 + 92x_1 x_2 − 144x_1 − 172x_2 + 158\right).

So

f(x_1, x_2) ≈ T_2(x_1, x_2) = 13x_1^2 + 20x_2^2 + 46x_1 x_2 − 53x_1 − 58x_2 + 45,

which is the required quadratic approximation of the function. •
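Both approximations can also be generated symbolically; the sketch below is illustrative (it reuses Program 7.1 for the Hessian, and the names g, H, dx, T1, T2 are hypothetical):

>> syms x1 x2
>> f = x1^2 + 5*x1*x2^2 + 3*x2^2 + 4*x1^3*x2^3;
>> g = jacobian(f, [x1, x2]);        % gradient (row vector)
>> H = Hessian(f, [x1, x2]);         % Program 7.1
>> a = 1; b = 1; dx = [x1 - a; x2 - b];
>> T1 = subs(f, {x1, x2}, {a, b}) + subs(g, {x1, x2}, {a, b})*dx;
>> T2 = T1 + (1/2)*dx.'*subs(H, {x1, x2}, {a, b})*dx;
>> expand(T1), expand(T2)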

7.2.8 Quadratic Forms


Quadratic forms play an important role in geometry. Given

x = (x1 , x2 , . . . , xn )T

and

A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ a_{31} & a_{32} & \cdots & a_{3n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix},

then the function

q(x) = x^T A x = \begin{pmatrix} x_1 & x_2 & \cdots & x_n \end{pmatrix} A \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j

can be used to represent any quadratic polynomial in the variables x_1, x_2, \ldots, x_n and is called a quadratic form. The matrix A of a quadratic form can always be forced to be symmetric because

x^T A x = x^T \left( \frac{A + A^T}{2} \right) x,

and the matrix (A + A^T)/2 is always symmetric. The symmetric matrix A associated with the quadratic form is called the matrix of the quadratic form.

Example 7.18 What is the quadratic form of the associated matrix

A = \begin{pmatrix} 2 & 2 & −1 \\ 2 & 6 & 0 \\ −1 & 0 & 4 \end{pmatrix}?

Solution. If

x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix},

then

x^T A x = \begin{pmatrix} x_1 & x_2 & x_3 \end{pmatrix} \begin{pmatrix} 2 & 2 & −1 \\ 2 & 6 & 0 \\ −1 & 0 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}

or

x^T A x = \begin{pmatrix} x_1 & x_2 & x_3 \end{pmatrix} \begin{pmatrix} 2x_1 + 2x_2 − x_3 \\ 2x_1 + 6x_2 \\ −x_1 + 4x_3 \end{pmatrix}.

Thus,

x^T A x = 2x_1^2 + 2x_1 x_2 − x_1 x_3 + 2x_1 x_2 + 6x_2^2 − x_1 x_3 + 4x_3^2

or

x^T A x = 2x_1^2 + 4x_1 x_2 − 2x_1 x_3 + 6x_2^2 + 4x_3^2.

After rearranging the terms, we have

x^T A x = (x_1 + 2x_2)^2 + 2x_2^2 + (x_1 − x_3)^2 + 3x_3^2.

Hence,

(x_1 + 2x_2)^2 + 2x_2^2 + (x_1 − x_3)^2 + 3x_3^2 > 0,

unless

x_1 = x_2 = x_3 = 0.


Example 7.19 Find the matrix associated with the quadratic form

q(x_1, x_2, x_3) = 4x_1^2 + 2x_2^2 + 2x_3^2 + 2x_1 x_2 − 4x_1 x_3 + 4x_2 x_3.

Solution. The coefficients of the squared terms x_i^2 go on the diagonal as a_{ii}, and the coefficient of each cross-product term x_i x_j is split equally between a_{ij} and a_{ji}, so that a_{ij} = a_{ji} is half of that coefficient. This gives

A = \begin{pmatrix} 4 & 1 & −2 \\ 1 & 2 & 2 \\ −2 & 2 & 2 \end{pmatrix}.

Thus,

x^T A x = \begin{pmatrix} x_1 & x_2 & x_3 \end{pmatrix} \begin{pmatrix} 4 & 1 & −2 \\ 1 & 2 & 2 \\ −2 & 2 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}.
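This construction can be verified with a quick symbolic check in MATLAB; the snippet below is only an illustrative sketch:

>> syms x1 x2 x3
>> A = [4 1 -2; 1 2 2; -2 2 2];
>> x = [x1; x2; x3];
>> expand(x.'*A*x)   % should reproduce the quadratic form q(x1, x2, x3) above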

The quadratic form is said to be:

1. positive-definite, if q(x) > 0 for all x ≠ 0;

2. positive-semidefinite, if q(x) ≥ 0 for all x and there exists x ≠ 0 such that q(x) = 0;

3. negative-definite, if −q(x) is positive-definite;

4. negative-semidefinite, if −q(x) is positive-semidefinite;

5. indefinite in all other cases.

It can be proved that the necessary and sufficient conditions for the real-
ization of the preceding cases are:

Theorem 7.10 Let q(x) = xT Ax, then:

1. q(x) is positive-definite (-semidefinite), if the values of the principal


minor determinants of A are positive (nonnegative). In this case, A
is said to be positive-definite (-semidefinite).

2. q(x) is negative-definite, if the value of the kth principal minor de-


terminant of A has the sign of (−1)k , k = 1, 2, . . . , n. In this case,
A is called negative-definite.

3. q(x) is negative-semidefinite, if the kth principal minor determinant


of A either is zero or has the sign of (−1)k , k = 1, 2, . . . , n. •

Theorem 7.11 Let A be an n × n symmetric matrix. The quadratic form


q(x) = xT Ax is:

1. positive-definite, if and only if all of the eigenvalues of A are positive;

2. positive-semidefinite, if and only if all of the eigenvalues of A are


nonnegative;

3. negative-definite, if and only if all of the eigenvalues of A are nega-


tive;

4. negative-semidefinite, if and only if all of the eigenvalues of A are


nonpositive;

5. indefinite, if and only if A has both positive and negative eigenvalues.


Example 7.20 Classify

q(x_1, x_2, x_3) = 7x_1^2 + 7x_2^2 + 7x_3^2 − 2x_1 x_2 − 2x_1 x_3 − 2x_2 x_3

as positive-definite, negative-definite, indefinite, or none of these.

Solution. The matrix of the quadratic form is

A = \begin{pmatrix} 7 & −1 & −1 \\ −1 & 7 & −1 \\ −1 & −1 & 7 \end{pmatrix}.

One can easily compute the eigenvalues of the above matrix, which are 8, 8, and 5. Since all of these eigenvalues are positive, q(x_1, x_2, x_3) is a positive-definite quadratic form. •
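The eigenvalues quoted in this example are easily reproduced in MATLAB (a short, illustrative check of the criterion in Theorem 7.11):

>> A = [7 -1 -1; -1 7 -1; -1 -1 7];
>> eig(A)
ans =
     5
     8
     8

All eigenvalues are positive, so A, and hence q, is positive-definite.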

Theorem 7.12 If A and B are n × n real matrices, with

B = \frac{1}{2}(A + A^T),

then the corresponding quadratic forms of A and B are identical, and B is symmetric.

Proof. Since x^T A x is a (1 × 1) matrix (a real number), we have

x^T A^T x = (x^T A x)^T

and

x^T B x = \frac{1}{2}x^T(A + A^T)x
        = \frac{1}{2}x^T A x + \frac{1}{2}x^T A^T x
        = \frac{1}{2}x^T A x + \frac{1}{2}(x^T A x)^T
        = \frac{1}{2}x^T A x + \frac{1}{2}x^T A x
        = x^T A x.

Also,

B^T = \frac{1}{2}(A + A^T)^T = \frac{1}{2}(A^T + A) = \frac{1}{2}(A + A^T) = B.
Note that the quadratic forms of A and B are the same but the matrices A and B are not, unless A is symmetric. For example, for the matrix

A = \begin{pmatrix} 4 & 4 \\ 2 & 6 \end{pmatrix},

we have

B = \frac{1}{2}\left[ \begin{pmatrix} 4 & 4 \\ 2 & 6 \end{pmatrix} + \begin{pmatrix} 4 & 2 \\ 4 & 6 \end{pmatrix} \right],

and it gives

B = \frac{1}{2}\begin{pmatrix} 8 & 6 \\ 6 & 12 \end{pmatrix} = \begin{pmatrix} 4 & 3 \\ 3 & 6 \end{pmatrix}.

Now for the symmetric matrix

A = \begin{pmatrix} 4 & 4 \\ 4 & 6 \end{pmatrix},

we have

B = \frac{1}{2}\left[ \begin{pmatrix} 4 & 4 \\ 4 & 6 \end{pmatrix} + \begin{pmatrix} 4 & 4 \\ 4 & 6 \end{pmatrix} \right],

which gives

B = \frac{1}{2}\begin{pmatrix} 8 & 8 \\ 8 & 12 \end{pmatrix} = \begin{pmatrix} 4 & 4 \\ 4 & 6 \end{pmatrix}.

Also, the quadratic forms

x^T A x = \begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} 4 & 4 \\ 2 & 6 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 4x_1^2 + 6x_1 x_2 + 6x_2^2

and

x^T B x = \begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} 4 & 3 \\ 3 & 6 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 4x_1^2 + 6x_1 x_2 + 6x_2^2

are the same. •

Example 7.21 Classify

q(x_1, x_2, x_3) = −9x_1^2 − 5.5x_2^2 − 9x_3^2 + 6x_1 x_2 + 6x_1 x_3 + 6x_2 x_3

as positive-definite, negative-definite, indefinite, or none of these.

Solution. The matrix of the quadratic form is

A = \begin{pmatrix} −9 & 3 & 3 \\ 3 & −5.5 & 3 \\ 3 & 3 & −9 \end{pmatrix}.

The eigenvalues of the above matrix A are −12, −10, and −1.5, and since all of these eigenvalues are negative, q(x_1, x_2, x_3) is a negative-definite quadratic form. •

7.3 Nonlinear Equations and Systems


Here we study one of the fundamental problems of numerical analysis,
namely, the numerical solution of nonlinear equations. Most equations, in
practice, are nonlinear and are rarely of a form that allows the roots to be
determined exactly. A nonlinear equation may be considered any one of
the following types:

1. An equation may be an algebraic equation (a polynomial equation of


degree n) expressible in the form:

an xn + an−1 xn−1 + · · · + a1 x + a0 = 0, an 6= 0, n > 1,

where an , an−1 , . . . , a1 , and a0 are constants. For example, the fol-


lowing equations are nonlinear:

x2 + 5x + 6 = 0; x3 = 2x + 1; x100 + x2 + 1 = 0.

2. The power of the unknown variable involved in the equation must


be difficult to manipulate. For example, the following equations are
nonlinear:
x^{-1} + 2x = 1; \quad \sqrt{x} + x = 0; \quad x^{2/3} + \frac{2}{x} + 4 = 0.

3. An equation may be a transcendental equation that involves trigono-


metric functions, exponential functions, and logarithmic functions.
For example, the following equations are nonlinear:

x = cos(x); ex + x − 10 = 0; x + ln x = 10.

Definition 7.7 (Root of an Equation)

Assume that f (x) is a continuous function. A number α for which f (α) = 0


is called a root of the equation f (x) = 0 or a zero of the function f (x). •

There may be many roots of the given nonlinear equation, but we will
seek the approximation of only one of its roots, which lies on the given

interval [a, b]. This root may be simple (not repeating) or multiple (repeat-
ing).

Now, we shall discuss the methods for nonlinear equations in a single


variable. The problem here can be written down simply as

f (x) = 0. (7.6)

We seek the values of x called the roots of (7.6) or the zeros of the function
f (x) such that (7.6) is true. The roots of (7.6) may be real or complex.
Here, we will look for the approximation of the real root of (7.6). There
are many methods that will give us information about the real roots of
(7.6). The methods we will discuss are all iterative methods. They are the
bisection method, fixed-point method, and Newton’s method.

7.3.1 Bisection Method


This is one of the simplest iterative techniques for determining the roots
of (7.6), and it needs two initial approximations to start. It is based on
the Intermediate Value theorem. This method is also called the interval-
halving method because the strategy is to bisect, or halve, the interval
from one endpoint of the interval to the other endpoint and then retain
the half-interval whose ends still bracket the root. It is also referred to
as a bracketing method or sometimes is called Bolzano’s method. The fact
that the function is required to change sign only once gives us a way to
determine which half of the interval to retain; we keep the half on which
f (x) changes signs or becomes zero. The basis for this method can be
easily illustrated by considering the function

y = f (x).

Our object is to find an x value for which y is zero. Using this method, we
begin by supposing f (x) is a continuous function defined on the interval
[a, b] and then by evaluating the function at two x values, say, a and b,
such that
f (a).f (b) < 0.

The implication is that one of the values is negative and the other is pos-
itive. These conditions can be easily satisfied by sketching the function
(Figure 7.6).

Figure 7.6: Bisection method.

Obviously, the function is negative at one endpoint a of the interval and


positive at the other endpoint b and is continuous on a ≤ x ≤ b. Therefore,
the root must lie between a and b (by the Intermediate Value theorem),
and a new approximation to the root α can be calculated as

c = \frac{a + b}{2},

and, in general,

c_n = \frac{a_n + b_n}{2}, \quad n ≥ 1.        (7.7)
The iterative formula (7.7) is known as the bisection method.

If f (c) ≈ 0, then c ≈ α is the desired root, and, if not, then there are
two possibilities. First, if f (a).f (c) < 0, then f (x) has a zero between
point a and point c. The process can then be repeated on the new interval
[a, c]. Second, if f (a).f (c) > 0, it follows that f (b).f (c) < 0 since it is
known that f (b) and f (c) have opposite signs. Hence, f (x) has a zero
between point c and point b and the process can be repeated with [c, b].
We see that after one step of the process, we have found either a zero or
a new bracketing interval which is precisely half the length of the original
one. The process continues until the desired accuracy is achieved. We use
the bisection process in the following example.

Example 7.22 Use the bisection method to find the approximation to the root of the equation

x^3 = 4x − 2

that is located on the interval [1.0, 2.0], accurate to within 10^{-4}.

Solution. Since the given function f(x) = x^3 − 4x + 2 is a cubic polynomial function and is continuous on [1.0, 2.0], starting with a_1 = 1.0 and b_1 = 2.0, we compute

a_1 = 1.0:  f(a_1) = −1.0
b_1 = 2.0:  f(b_1) = 2.0,

and since f(1.0)f(2.0) = −2 < 0, a root of f(x) = 0 lies on the interval [1.0, 2.0]. Using formula (7.7) (when n = 1), we get

c_1 = \frac{a_1 + b_1}{2} = \frac{1.0 + 2.0}{2} = 1.5;  f(c_1) = f(1.5) = −0.625.

Hence, the function changes sign on [c_1, b_1] = [1.5, 2.0]. To continue, we squeeze from the left and set a_2 = c_1 and b_2 = b_1. Then the midpoint is

c_2 = \frac{a_2 + b_2}{2} = 1.75;  f(c_2) = 0.359375.

Figure 7.7: Bisection method.

Continuing in this manner we obtain a sequence {ck } of approximation


shown by Table 7.1.

We see that the functional values approach zero as the number of it-
erations increases. We got the desired approximation to the root α =
1.6751309 of the given equation x3 = 4x − 2, i.e., c17 = 1.675133, which
was obtained after 17 iterations, with accuracy ε = 10^{-4}. •

To use MATLAB commands for the bisection method, first we define


a function m-file as fn.m for the equation as follows:

function y = fn(x)
y = x.^3 - 4*x + 2;

then use the single command:

>> s = bisect('fn', 1.0, 2.0, 1e-4)

s =
    1.675133

Table 7.1: Solution of x3 = 4x − 2 by the bisection method.


n Left Right Function Value
Endpoint an Midpoint cn Endpoint bn f (cn )
01 1.000000 1.500000 2.000000 –0.625000
02 1.500000 1.750000 2.000000 0.359375
03 1.500000 1.625000 1.750000 –0.208984
04 1.625000 1.687500 1.750000 0.055420
05 1.625000 1.656250 1.687500 –0.081635
06 1.656250 1.671875 1.687500 –0.014332
07 1.671875 1.679688 1.687500 0.020239
08 1.671875 1.675781 1.679688 0.002875
09 1.671875 1.673828 1.675781 –0.005748
10 1.673828 1.674805 1.675781 –0.001439
11 1.674805 1.675293 1.675781 0.000717
12 1.674805 1.675049 1.675293 –0.000361
13 1.674805 1.675171 1.675293 0.000177
14 1.674805 1.675110 1.675171 –0.000092
15 1.675110 1.675140 1.675171 0.000040
16 1.675110 1.675125 1.675140 –0.000025
17 1.675125 1.675133 1.675140 0.000009

Program 7.2
MATLAB m-file for the Bisection Method
function sol=bisect(fn,a,b,tol)
fa = feval(fn,a); fb = feval(fn,b);
if fa*fb > 0
    fprintf('Endpoints have same sign'); return
end
while abs(b-a) > tol
    c = (a+b)/2; fc = feval(fn,c);
    if fa*fc < 0, b = c; else a = c; end
end
sol = (a+b)/2;

Theorem 7.13 (Bisection Convergence and Error Theorem)

Let f(x) be a continuous function defined on the initial interval [a_0, b_0] = [a, b] and suppose f(a)f(b) < 0. Then the bisection method (7.7) generates a sequence \{c_n\}_{n=1}^{∞} approximating α ∈ (a, b), with the property

|α − c_n| ≤ \frac{b − a}{2^n}, \quad n ≥ 1.        (7.8)

Moreover, to obtain an accuracy of

|α − c_n| ≤ ε

(for ε = 10^{-k}), it suffices to take

n ≥ \frac{\ln\left(10^k (b − a)\right)}{\ln 2},        (7.9)

where k is a nonnegative integer. •

The above Theorem 7.13 gives us information about bounds for errors
in approximation and the number of bisections needed to obtain any given
accuracy.
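As a minimal worked application of (7.9), consider the data of Example 7.22, where [a, b] = [1.0, 2.0] and k = 4. Then

n ≥ \frac{\ln(10^4 (2 − 1))}{\ln 2} = \frac{4 \ln 10}{\ln 2} ≈ 13.3,

so at most 14 bisections are needed to guarantee |α − c_n| ≤ 10^{-4}.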

One drawback of the bisection method is that the convergence rate is rather slow. However, convergence is guaranteed. For this reason it is often used as a starting procedure for a more efficient method for finding the roots of nonlinear equations. The method may give a false root if f(x) is discontinuous on the given interval [a, b].

Procedure 7.1 (Bisection Method)

1. Establish an interval a ≤ x ≤ b such that f(a) and f(b) have opposite signs, i.e., f(a)f(b) < 0.

2. Choose an error tolerance (ε > 0) value for the function.

3. Compute a new approximation for the root:

c_n = \frac{a_n + b_n}{2}; \quad n = 1, 2, 3, \ldots.

4. Check the tolerance. If |f(c_n)| ≤ ε, then use c_n (n = 1, 2, 3, \ldots) for the desired root; otherwise, continue.

5. Check; if f (an ).f (cn ) < 0, then set bn = cn ; otherwise, set an = cn .

6. Go back to step 3 and repeat the process.

7.3.2 Fixed-Point Method


This is another iterative method used to solve the nonlinear equation (7.6),
and it needs one initial approximation to start. This is a very general
method for finding the root of (7.6), and it provides us with a theoretical
framework within which the convergence properties of subsequent methods
can be evaluated. The basic idea of this method, which is also called the
successive approximation method or function iteration, is to rearrange the
original equation
f (x) = 0 (7.10)
into an equivalent expression of the form

x = g(x). (7.11)

Any solution of (7.11) is called a fixed point for the iteration function g(x)
and, hence, a root of (7.10).

Definition 7.8 (Fixed Point)

A fixed point of a function g(x) is a real number α such that α = g(α). •

The task of solving (7.10) is therefore reduced to that of finding a point


satisfying the fixed-point condition (7.11). The fixed-point method essen-
tially solves two functions simultaneously; y = x and y = g(x). The point
of intersection of these two functions is the solution to x = g(x) and thus
to f (x) = 0 (Figure 7.8).

This method is conceptually very simple. Since g(x) is also nonlinear,


the solution must be obtained iteratively. An initial approximation to the
solution, say, x0 , must be determined. For choosing the best initial value

Figure 7.8: Fixed-point method.

x0 for using this iterative method, we have to find an interval [a, b] on


which the original function f (x) satisfies the sign property and then use
the midpoint (a + b)/2 as the initial approximation x_0. Then this initial value
as the initial approximation x0 . Then this initial value
x0 is substituted in the function g(x) to determine the next approximation
x1 , and so on.

Definition 7.9 (Fixed-Point Method)

The iteration defined in

xn+1 = g(xn ); n≥0 (7.12)

is called the fixed-point method or the fixed-point iteration. •

The value of the initial approximation x_0 is chosen arbitrarily, and the hope is that the sequence \{x_n\}_{n=0}^{∞} converges to a number α which will automatically satisfy (7.11). Moreover, since (7.11) is a rearrangement of (7.10), α is guaranteed to be a zero of f(x). In general, there are many different ways of rearranging (7.10) in the form (7.11). However, only some of these are likely to give rise to successful iterations; but sometimes we

don’t have successful iterations. To describe such behavior, we discuss the


following example.

Example 7.23 One of the possible rearrangements of the nonlinear equation x^3 = 4x − 2, which has a root on [1, 2], is

x_{n+1} = g(x_n) = \sqrt{\frac{4x_n − 2}{x_n}}; \quad n ≥ 0.

Use the fixed-point iteration formula (7.12) to compute the approximation of the root of the equation accurate to within 10^{-4}, starting with x_0 = 1.5.

Solution. Since x_0 = 1.5 is given, we have

x_1 = g(x_0) = \sqrt{\frac{4x_0 − 2}{x_0}} = \sqrt{\frac{4(1.5) − 2}{1.5}} = 1.632993.
This and the further iterates are shown in Table 7.2.

Table 7.2: Solution of Example 7.23.

n    x_{n+1} = g(x_n) = \sqrt{(4x_n − 2)/x_n}    f(x_n)
00 1.500000 –0.625000
01 1.632993 –0.177324
02 1.665910 –0.040313
03 1.673157 –0.008701
04 1.674710 –0.0018586
05 1.675041 –0.000397
06 1.675112 –0.000083
07 1.675127 –0.000017
08 1.675130 –0.000004

We note that the considered sequence converges, and it converged faster


than the bisection method. The desired approximation to the root of the

Figure 7.9: Fixed-point method.

given equation is x8 = 1.675130, which we obtained after 8 iterations, with


accuracy ε = 10^{-4}. •

Theorem 7.14 (Fixed-Point Theorem)

If g is continuously differentiable on the interval [a, b] and g(x) ∈ [a, b] for all x ∈ [a, b], then:

(a) g has at least one fixed point on the given interval [a, b].

Moreover, if the derivative g'(x) of the function g(x) exists on an interval [a, b], which contains the starting value x_0, with

|g'(x)| ≤ k < 1, \quad for all x ∈ [a, b],        (7.13)

then:

(b) The sequence (7.12) will converge to the attractive (unique) fixed point α in [a, b].

(c) The iteration (7.12) will converge to α for any initial approximation.

(d) We have the error estimate

|α − x_n| ≤ \frac{k^n}{1 − k}|x_1 − x_0|, \quad for all n ≥ 1.        (7.14)

(e) The limit holds:

\lim_{n→∞} \frac{α − x_{n+1}}{α − x_n} = g'(α).        (7.15)
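For the rearrangement used in Example 7.23, condition (7.13) can be checked numerically; the following MATLAB lines are only an illustrative sketch:

>> syms x
>> g = sqrt((4*x - 2)/x);
>> dg = diff(g, x);
>> xs = linspace(1, 2, 101);
>> max(abs(double(subs(dg, x, xs))))
ans =
    0.7071

Since |g'(x)| ≤ 0.7071 < 1 on [1, 2], Theorem 7.14 guarantees convergence of the iteration.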

MATLAB commands for the above given rearrangement x = g(x) of f(x) = x^3 − 4x + 2, using the initial approximation x_0 = 1.5, can be written as follows:

function y = fn(x)
y = sqrt((4*x - 2)/x);

>> x0 = 1.5; tol = 0.00001;
>> sol = fixpt('fn', x0, tol);

Program 7.3
MATLAB m-file for the Fixed-Point Method
function sol=fixpt(fn,x0,tol)
old = x0 + 1;
while abs(x0-old) > tol
    old = x0;
    x0 = feval(fn, old);
end
sol = x0;

Procedure 7.2 (Fixed-Point Method)

1. Choose an initial approximation x0 such that x0 ∈ [a, b].

2. Choose a convergence parameter ε > 0.

3. Compute the new approximation xnew by using the iterative formula


(7.12).

4. Check: if |x_{new} − x_0| < ε, then x_{new} is the desired approximate root; otherwise, set x_0 = x_{new} and go to step 3.

7.3.3 Newton’s Method


This is one of the most popular and powerful iterative methods for finding
roots of the nonlinear equation (7.6). It is also known as the method of
tangents because after estimating the actual root, the zero of the tangent to
the function at that point is determined. It always converges if the initial
approximation is sufficiently close to the exact solution. This method is
distinguished from the methods given in the previous sections, by the fact
that it requires the evaluation of both the function f (x) and the derivative
of the function f 0 (x) at arbitrary point x. Newton’s method consists of
geometrically expanding the tangent line at a current point xi until it
crosses zero, then setting the next guess xi+1 to the abscissa of that zero
crossing (Figure 7.10). This method is also called the Newton–Raphson
method.

Figure 7.10: Newton’s method.

There are many descriptions of Newton’s method. We shall derive the


method from the familiar Taylor’s series expansion of a function in the
neighborhood of a point.

Let f ∈ C 2 [a, b] and let xn be the nth approximation to the root α such

that f'(x_n) ≠ 0 and |α − x_n| is small. Consider the first Taylor polynomial for f(x) expanded about x_n, so we have

f(x) = f(x_n) + (x − x_n)f'(x_n) + \frac{(x − x_n)^2}{2}f''(η(x)),        (7.16)

where η(x) lies between x and x_n. Since f(α) = 0, then (7.16), with x = α, gives

f(α) = 0 = f(x_n) + (α − x_n)f'(x_n) + \frac{(α − x_n)^2}{2}f''(η(α)).

Since |α − x_n| is small, we neglect the term involving (α − x_n)^2, and so

0 ≈ f(x_n) + (α − x_n)f'(x_n).

Solving for α, we get

α ≈ x_n − \frac{f(x_n)}{f'(x_n)},        (7.17)

which should be a better approximation to α than x_n. We call this approximation x_{n+1}, and we get

x_{n+1} = x_n − \frac{f(x_n)}{f'(x_n)}, \quad f'(x_n) ≠ 0, \quad for all n ≥ 0.        (7.18)

The iterative method (7.18) is called Newton’s method. Usually New-


ton’s method converges well and quickly; its convergence cannot, however,
be guaranteed, and it may sometimes converge to a different root from
the one expected. In particular, there may be difficulties if the initial
approximation is not sufficiently close to the actual root. The most se-
rious problem of Newton’s method is that some functions are difficult to
differentiate analytically, and some functions cannot be differentiated an-
alytically at all. Newton’s method is not restricted to one dimension only.
The method readily generalizes to multiple dimensions. It should be noted
that this method is suitable for finding real as well as imaginary roots of
polynomials.

Example 7.24 Use Newton's method to find the root of the equation x^3 − 4x + 2 = 0 that is located on the interval [1.0, 2.0], accurate to 10^{-4}, taking an initial approximation x_0 = 1.5.

Solution. Given
f (x) = x3 − 4x + 2,
since Newton’s method requires that the value of the derivative of the func-
tion be found, the derivative of the function is

f 0 (x) = 3x2 − 4.

Now evaluating f(x) and f'(x) at the given approximation x_0 = 1.5 gives

x0 = 1.5, f (1.5) = −0.625, f 0 (1.5) = 2.750.

Using Newton’s iterative formula (7.18), we get


x_1 = x_0 − \frac{f(x_0)}{f'(x_0)} = 1.5 − \frac{(−0.625)}{2.75} = 1.727273.
f (x0 ) 2.75
Now evaluating f (x) and f 0 (x) at the new approximation x1 , we get

x1 = 1.727273, f (1.727273) = 0.244177, f 0 (1.727273) = 4.950413.

Using the iterative formula (7.18) again, we get the other new approxima-
tion as follows:
x_2 = x_1 − \frac{f(x_1)}{f'(x_1)} = 1.727273 − \frac{0.244177}{4.950413} = 1.677948.
f (x1 ) 4.950413
Thus, the successive iterates are shown in Table 7.3. Just after the third
iteration, the root is approximated to be x4 = 1.67513087056 and the
functional value is reduced to 4.05 × 10−10 . Since the exact solution is
1.67513087057, the actual error is 1 × 10−10 . We see that the convergence
is faster than the methods considered previously. •

To get the above results using MATLAB commands, first the function
x3 − 4x + 2 and its derivative 3x2 − 4 are saved in m-files called fn.m and
dfn.m, respectively, written as follows:

Table 7.3: Solution of x3 = 4x − 2 by Newton’s method.


n xn f (xn ) f 0 (xn ) Error α − xn
00 1.500000 -0.625000 2.750000 0.175131
01 1.727273 0.244177 4.950413 -0.052142
02 1.677948 0.012487 4.446529 -0.002817
03 1.675140 0.000040 4.418281 -0.000009
04 1.675131 4.05e-010 4.418190 -0.000000

function y = fn(x)
y = x.^3 - 4*x + 2;

and

function dy = dfn(x)
dy = 3*x.^2 - 4;

Then we do the following:

>> x0 = 1.5; tol = 0.00001;
>> sol = newton('fn', 'dfn', x0, tol);

Program 7.4
MATLAB m-file for Newton’s Method
function sol=newton(fn,dfn,x0,tol)
old = x0 + 1;
while abs(x0-old) > tol
    old = x0;
    x0 = old - feval(fn,old)/feval(dfn,old);
end
sol = x0;

7.3.4 System of Nonlinear Equations


A system of nonlinear algebraic equations may arise when one is dealing
with problems involving optimization and numerical integration (Gauss

quadratures). Generally, the system of equations may not be of the poly-


nomial variety. Therefore, a system of n equations in n unknowns is called
nonlinear, if one or more of the equations in the systems is/are nonlinear.

The numerical methods we discussed so far have been concerned with


finding a root of a nonlinear algebraic equation with one independent vari-
able. We now consider two numerical methods for solving systems of non-
linear algebraic equations in which each equation is a function of a specified
number of variables.
Consider the system of two nonlinear equations with the two variables

f1 (x, y) = 0 (7.19)

and
f2 (x, y) = 0. (7.20)
The problem can be stated as follows:
Given the continuous functions f1 (x, y) and f2 (x, y), find the values x = α
and y = β such that

f1 (α, β) = 0
f2 (α, β) = 0. (7.21)

The functions f1 (x, y) and f2 (x, y) may be algebraic equations, transcen-


dental, or any nonlinear relationship between the input x and y and the
output f1 (x, y) and f2 (x, y). The solutions to (7.19) and (7.20) are the
intersections of f1 (x, y) = f2 (x, y) = 0 (Figure 7.11). This problem is con-
siderably more complicated than the solution of a single nonlinear equation.
The one-point iterative method discussed above for the solution of a single
equation may be extended to the system. So to solve the system of non-
linear equations we have many methods to choose from, but we will use
Newton’s method.

Newton’s Method for the System


Consider the two nonlinear equations specified by the equations (7.19) and
(7.20). Suppose that (xn , yn ) is an approximation to a root (α, β); then

Figure 7.11: Nonlinear equation in two variables.

by using Taylor’s theorem for functions of two variables for f1 (x, y) and
f2 (x, y) expanding about (xn , yn ), we have

f_1(x, y) = f_1(x_n + (x − x_n), y_n + (y − y_n))
          = f_1(x_n, y_n) + (x − x_n)\frac{∂f_1(x_n, y_n)}{∂x} + (y − y_n)\frac{∂f_1(x_n, y_n)}{∂y} + \cdots

and

f_2(x, y) = f_2(x_n + (x − x_n), y_n + (y − y_n))
          = f_2(x_n, y_n) + (x − x_n)\frac{∂f_2(x_n, y_n)}{∂x} + (y − y_n)\frac{∂f_2(x_n, y_n)}{∂y} + \cdots.

Since f1 (α, β) = 0 and f2 (α, β) = 0, these equations, with x = α and


y = β, give

0 = f_1(x_n, y_n) + (α − x_n)\frac{∂f_1(x_n, y_n)}{∂x} + (β − y_n)\frac{∂f_1(x_n, y_n)}{∂y} + \cdots
0 = f_2(x_n, y_n) + (α − x_n)\frac{∂f_2(x_n, y_n)}{∂x} + (β − y_n)\frac{∂f_2(x_n, y_n)}{∂y} + \cdots.

Newton’s method has a condition that initial approximation (xn , yn ) should


be sufficiently close to the exact root (α, β), therefore, the higher order
terms may be neglected to obtain

∂f1 (xn , yn ) ∂f1 (xn , yn )


0 ≈ f1 (xn , yn ) + (α − xn ) + (β − yn )
∂x ∂y
∂f2 (xn , yn ) ∂f2 (xn , yn )
0 ≈ f2 (xn , yn ) + (α − xn ) + (β − yn ) .
∂x ∂y
(7.22)
We see that this represents a system of two linear algebraic equations
for α and β. Of course, since the higher order terms are omitted in the
derivation of these equations, their solution (α, β) is no longer an exact root
of (7.21) and (7.22). However, it will usually be a better approximation
than (xn , yn ), so replacing (α, β) by (xn+1 , yn+1 ) in (7.21) and (7.22) gives
the iterative scheme

∂f1 (xn , yn ) ∂f1 (xn , yn )


0 = f1 (xn , yn ) + (xn+1 − xn ) + (yn+1 − yn )
∂x ∂y
∂f2 (xn , yn ) ∂f2 (xn , yn )
0 = f2 (xn , yn ) + (xn+1 − xn ) + (yn+1 − yn ) .
∂x ∂y

Then writing in the matrix form, we have

\begin{pmatrix} \frac{∂f_1}{∂x} & \frac{∂f_1}{∂y} \\ \frac{∂f_2}{∂x} & \frac{∂f_2}{∂y} \end{pmatrix} \begin{pmatrix} x_{n+1} − x_n \\ y_{n+1} − y_n \end{pmatrix} = − \begin{pmatrix} f_1 \\ f_2 \end{pmatrix},        (7.23)

where f_1, f_2 and their partial derivatives f_{1x}, f_{1y}, f_{2x}, f_{2y} are evaluated at (x_n, y_n). Hence,

\begin{pmatrix} x_{n+1} \\ y_{n+1} \end{pmatrix} = \begin{pmatrix} x_n \\ y_n \end{pmatrix} − \begin{pmatrix} \frac{∂f_1}{∂x} & \frac{∂f_1}{∂y} \\ \frac{∂f_2}{∂x} & \frac{∂f_2}{∂y} \end{pmatrix}^{-1} \begin{pmatrix} f_1 \\ f_2 \end{pmatrix}.        (7.24)

We call the matrix

J = \begin{pmatrix} \frac{∂f_1}{∂x} & \frac{∂f_1}{∂y} \\ \frac{∂f_2}{∂x} & \frac{∂f_2}{∂y} \end{pmatrix}        (7.25)

the Jacobian matrix.

Note that (7.23) can be written in the simplified form

\begin{pmatrix} \frac{∂f_1}{∂x} & \frac{∂f_1}{∂y} \\ \frac{∂f_2}{∂x} & \frac{∂f_2}{∂y} \end{pmatrix} \begin{pmatrix} h \\ k \end{pmatrix} = − \begin{pmatrix} f_1 \\ f_2 \end{pmatrix},

where h and k can be evaluated as

h = \frac{−f_1\frac{∂f_2}{∂y} + f_2\frac{∂f_1}{∂y}}{\frac{∂f_1}{∂x}\frac{∂f_2}{∂y} − \frac{∂f_1}{∂y}\frac{∂f_2}{∂x}}, \qquad
k = \frac{f_1\frac{∂f_2}{∂x} − f_2\frac{∂f_1}{∂x}}{\frac{∂f_1}{∂x}\frac{∂f_2}{∂y} − \frac{∂f_1}{∂y}\frac{∂f_2}{∂x}},        (7.26)

where all functions are to be evaluated at (x, y). Newton's method for a pair of equations in two unknowns is therefore

\begin{pmatrix} x_{n+1} \\ y_{n+1} \end{pmatrix} = \begin{pmatrix} x_n \\ y_n \end{pmatrix} + \begin{pmatrix} h \\ k \end{pmatrix}, \quad n = 0, 1, 2, \ldots,        (7.27)

where (h, k) are given by (7.26) and evaluated at (x_n, y_n).

At a starting approximation (x0 , y0 ), the functions f1 , f1 x , f1 y , f2 , f2 x


and f2 y are evaluated. The linear equations are then solved for (x1 , y1 ) and
the whole process is repeated until convergence is obtained. Comparison

of (7.18) and (7.24) shows that the above procedure is indeed an extension
of Newton’s method in one variable, where division by f 0 is generalized to
premultiplication by J −1 .

Example 7.25 Solve the following system of two equations using Newton’s
method with accuracy  = 10−5 :

4x3 + y = 6
x2 y = 1.

Assume x0 = 1.0 and y0 = 0.5 as starting values.

Solution. Obviously, this system of nonlinear equations has an exact so-


lution of x = 1.088282 and y = 0.844340. Let us look at how Newton’s
method is used to approximate these roots. The first partial derivatives are
as follows:

f_1(x, y) = 4x^3 + y − 6, \quad f_{1x} = 12x^2, \quad f_{1y} = 1,
f_2(x, y) = x^2 y − 1, \quad f_{2x} = 2xy, \quad f_{2y} = x^2.

At the given initial approximations x_0 = 1.0 and y_0 = 0.5, we get

f_1(1.0, 0.5) = −1.5, \quad ∂f_1/∂x = f_{1x} = 12.0, \quad ∂f_1/∂y = f_{1y} = 1.0,
f_2(1.0, 0.5) = −0.5, \quad ∂f_2/∂x = f_{2x} = 1.0, \quad ∂f_2/∂y = f_{2y} = 1.0.
The Jacobian matrix J and its inverse J^{-1} at the given initial approximation can be calculated as

J = \begin{pmatrix} ∂f_1/∂x & ∂f_1/∂y \\ ∂f_2/∂x & ∂f_2/∂y \end{pmatrix} = \begin{pmatrix} 12.0 & 1.0 \\ 1.0 & 1.0 \end{pmatrix}

and

J^{-1} = \frac{1}{11.0}\begin{pmatrix} 1.0 & −1.0 \\ −1.0 & 12.0 \end{pmatrix}.

The Jacobian matrix can be found by using MATLAB commands as follows:

>> syms x y
>> fun = [4*x^3 + y - 6, x^2*y - 1];
>> var = [x, y];
>> R = jacobian(fun, var);

Figure 7.12: Graphical solution of the given nonlinear system.

Substituting all these values into (7.24), we get the first approximation as

\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = \begin{pmatrix} 1.0 \\ 0.5 \end{pmatrix} − \frac{1}{11.0}\begin{pmatrix} 1.0 & −1.0 \\ −1.0 & 12.0 \end{pmatrix}\begin{pmatrix} −1.5 \\ −0.5 \end{pmatrix} = \begin{pmatrix} 1.090909 \\ 0.909091 \end{pmatrix}.

Similarly, the second iteration gives

\begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \begin{pmatrix} 1.090909 \\ 0.909091 \end{pmatrix} − \frac{1}{15.012077}\begin{pmatrix} 1.190082 & −1.0 \\ −1.983471 & 14.280989 \end{pmatrix}\begin{pmatrix} 0.102178 \\ 0.081893 \end{pmatrix} = \begin{pmatrix} 1.088264 \\ 0.844686 \end{pmatrix}.

The first two and the further steps of the method are listed in Table 7.4.

Table 7.4: Solution of a system of two nonlinear equations.


n x-approx. y-approx. 1st. func. 2nd. func.
xn yn f1 (xn , yn ) f2 (xn , yn )
00 1.000000 0.500000 -1.50000 -0.500000
01 1.090909 0.909091 0.102178 0.081893
02 1.088264 0.844686 0.000091 0.000377
03 1.088282 0.844340 0.000001 0.000001

Note that a typical iteration of this method for this pair of equations
can be implemented in the MATLAB Command Window using:

>> f1 = 4*x0^3 + y0 - 6; f2 = x0^2*y0 - 1;
>> f1x = 12*x0^2; f1y = 1; f2x = 2*x0*y0; f2y = x0^2;
>> D = f1x*f2y - f1y*f2x;
>> h = (f2*f1y - f1*f2y)/D; k = (f1*f2x - f2*f1x)/D;
>> x0 = x0 + h; y0 = y0 + k;

Using the starting value (1.0, 0.5), the possible approximations are shown
in Table 7.4. •

We see that the values of both the functionals approach zero as the
number of iterations is increased. We got the desired approximations to
the roots after 3 iterations, with accuracy ε = 10^{-5}.

Newton’s method is fairly easy to implement for the case of two equa-
tions in two unknowns. We first need the function m-files for the equations
and the partial derivatives. For the equations in Example 7.25, we do the
following:

function f = fn2(v)
% Here f and v are vector quantities
x = v(1); y = v(2);
f(1) = 4*x.^3 + y - 6;
f(2) = x.^2*y - 1;

function J = dfn2(v)
% Jacobian matrix for fn2.m
x = v(1); y = v(2);
J(1,1) = 12*x.^2; J(1,2) = 1;
J(2,1) = 2*x*y;   J(2,2) = x.^2;

Then the following MATLAB commands can be used to generate the so-
lution of Example 7.25:
>> s = newton2('fn2', 'dfn2', [1.0, 0.5], 1e-5)
s =
    1.088282    0.844340
The m-file Newton2.m will need both the function and its partial deriva-
tives as well as a starting vector and a tolerance. The following code can
be used:

Program 7.5
MATLAB m-file for Newton’s Method for a Nonlinear System
function sol=newton2(fn2,dfn2,x0,tol)
old = x0 + 1;
while max(abs(x0-old)) > tol
    old = x0;
    f = feval(fn2, old); f1 = f(1); f2 = f(2);
    J = feval(dfn2, old);
    f1x = J(1,1); f1y = J(1,2); f2x = J(2,1); f2y = J(2,2);
    D = f1x*f2y - f1y*f2x;
    h = (f2*f1y - f1*f2y)/D; k = (f1*f2x - f2*f1x)/D;
    x0 = old + [h, k];
end
sol = x0;
Similarly, for a large system of equations it is convenient to use vector
notation. Consider the system
f (x) = 0,

where f = (f_1, f_2, \ldots, f_n)^T and x = (x_1, x_2, \ldots, x_n)^T. Denoting the nth iterate by x^{[n]} = (x_1^{[n]}, x_2^{[n]}, x_3^{[n]}, \ldots, x_n^{[n]})^T, Newton's method is defined by

x^{[n+1]} = x^{[n]} − \left[J(x^{[n]})\right]^{-1} f(x^{[n]}),        (7.28)

where the Jacobian matrix J is defined as

J = \begin{pmatrix} \frac{∂f_1}{∂x_1} & \frac{∂f_1}{∂x_2} & \cdots & \frac{∂f_1}{∂x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{∂f_n}{∂x_1} & \frac{∂f_n}{∂x_2} & \cdots & \frac{∂f_n}{∂x_n} \end{pmatrix}.

Since the iterative formula (7.28) involves the inverse of the Jacobian J, in practice we do not attempt to find this explicitly. Instead of using the form (7.28), we use the form

J(x^{[n]})Z^{[n]} = −f(x^{[n]}),        (7.29)

where Z^{[n]} = x^{[n+1]} − x^{[n]}. This represents a system of linear equations for Z^{[n]} and can be solved by any of the methods described in Chapter 3. Once Z^{[n]} has been found, the next iterate is calculated from

x^{[n+1]} = Z^{[n]} + x^{[n]}.        (7.30)
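A minimal MATLAB sketch of this general scheme, assuming user-supplied function handles F and JF for f(x) and the Jacobian J(x) (all names here are illustrative, not a routine from the text), solves (7.29) with the backslash operator instead of forming J^{-1}:

function x = newton_sys(F, JF, x, tol, maxit)
% Newton's method for a system: solve J(x)*Z = -F(x), then update x = x + Z.
for n = 1:maxit
    Z = -(JF(x) \ F(x));          % solve the linear system (7.29)
    x = x + Z;                    % update (7.30)
    if norm(Z) < tol, break; end  % stop when the correction is small
end
end

For the system of Example 7.25 one could call, for instance,
newton_sys(@(v) [4*v(1)^3+v(2)-6; v(1)^2*v(2)-1], @(v) [12*v(1)^2 1; 2*v(1)*v(2) v(1)^2], [1.0; 0.5], 1e-5, 20).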

There are two major disadvantages with this method:

1. The method may not converge unless the initial approximation is


a good one. Unfortunately, there are no general means by which
an initial solution can be obtained. One can assume such values
for which det(J) ≠ 0. This does not guarantee convergence, but it
does provide some guidance as to the appropriateness of one’s initial
approximation.

2. The method requires the user to provide the derivatives of each func-
tion with respect to each variable. Therefore, one must evaluate
the n functions and the n2 derivatives at each iteration. So solv-
ing systems of nonlinear equations is a difficult task. For systems
of nonlinear equations that have analytical partial derivatives, New-
ton’s method can be used; otherwise, multidimensional minimization
techniques should be used.

Procedure 7.3 (Newton’s Method for Two Nonlinear Equations)

1. Choose the initial guess for the roots of the system so that the deter-
minant of the Jacobian matrix is not zero.

2. Establish the tolerance (ε > 0).

3. Evaluate the Jacobian at initial approximations and then find the


inverse of the Jacobian.

4. Compute a new approximation to the roots by using the iterative for-


mula (7.30).

5. Check the tolerance limit. If ‖(x_n, y_n) − (x_{n-1}, y_{n-1})‖ ≤ ε, for n ≥ 0, then stop; otherwise, go back to step 3 and repeat the process.

Fixed-Point Method for a System


It is sometimes convenient to solve a system of nonlinear equations by an
iterative method that does not require the computation of partial deriva-
tives. An example of the use of a fixed-point iteration for finding the zero of
a nonlinear function of a single variable was discussed previously. Now we
extend this idea to systems. The conditions that guarantee a fixed point
for the vector function g(x) are similar for a fixed point of a function of a
single variable.

The fixed-point iteration formula

xn+1 = g(xn ), n≥0



can be modified to solve the two simultaneous nonlinear equations


f1 (x, y) = 0
f2 (x, y) = 0.
These two nonlinear equations can be expressed in an equivalent form
x = g1 (x, y)
y = g2 (x, y),
and the iterative method to generate the sequences {xn } and {yn } is defined
by the recurrence formulas
xn+1 = g1 (xn , yn ), n≥0
yn+1 = g2 (xn , yn ), n≥0 (7.31)
for the given starting values x0 and y0 .

The sufficient condition guaranteeing the convergence of the iterations defined by (7.31), i.e., the convergence of \{x_n\} to α and of \{y_n\} to β, where the numbers α and β are such that

α = g_1(α, β)
β = g_2(α, β),

is

\left|\frac{∂g_1}{∂x}\right| + \left|\frac{∂g_2}{∂x}\right| < 1, \qquad \left|\frac{∂g_1}{∂y}\right| + \left|\frac{∂g_2}{∂y}\right| < 1        (7.32)

or

\left(\frac{∂g_1}{∂x}\right)_n\left(\frac{∂g_2}{∂x}\right)_n − \left(\frac{∂g_1}{∂y}\right)_n\left(\frac{∂g_2}{∂y}\right)_n < 1.        (7.33)

Note that the fixed-point iteration may fail to converge even though the condition (7.32) is satisfied, unless the process is started with an initial guess (x_0, y_0) sufficiently close to (α, β).
Example 7.26 Solve the following system of two equations using fixed-point iteration, with accuracy ε = 10^{-5}:

4x^3 + y = 6
x^2 y = 1.

Assume x0 = 1.0 and y0 = 0.5 as starting values.

Solution. Given the two functions

f_1(x, y) = 4x^3 + y − 6
f_2(x, y) = x^2 y − 1,

let us consider the possible rearrangements of the given system of equations as follows:

x = g_1(x, y) = \left(\frac{6 − y}{4}\right)^{1/3}
y = g_2(x, y) = \frac{1}{x^2}.

Thus,

x_{n+1} = g_1(x_n, y_n) = \left(\frac{6 − y_n}{4}\right)^{1/3}; \quad n ≥ 0
y_{n+1} = g_2(x_n, y_n) = \frac{1}{x_n^2}; \quad n ≥ 0,

and using the given initial approximation (x_0 = 1.0, y_0 = 0.5), we get

x_1 = g_1(x_0, y_0) = \left(\frac{6 − y_0}{4}\right)^{1/3} = \left(\frac{6 − 0.5}{4}\right)^{1/3} = 1.111990
y_1 = g_2(x_0, y_0) = \frac{1}{x_0^2} = \frac{1}{1^2} = 1.
The first and the further iterations of the method, starting with the initial approximation (x_0, y_0) = (1.0, 0.5) with accuracy 10^{-5}, are listed in Table 7.5.

Similarly, for a large system of equations it is convenient to use vector notation as follows:

x^{(k+1)} = g(x^{(k)}), \quad k ≥ 0,

where g = (g_1, g_2, \ldots, g_n)^T and x = (x_1, x_2, \ldots, x_n)^T. •

Table 7.5: Solution of the given system by fixed-point iteration.

n    x_{n+1} = g_1(x_n, y_n) = ((6 − y_n)/4)^{1/3}    y_{n+1} = g_2(x_n, y_n) = 1/x_n^2    f_1(x_n, y_n)    f_2(x_n, y_n)
00 1.000000 0.500000 –1.50000 –0.500000
01 1.111990 1.000000 0.499999 0.236522
02 1.077217 0.808720 –0.191285 –0.061564
03 1.090782 0.861774 0.053047 0.025343
04 1.087054 0.840474 –0.021298 –0.006823
05 1.088554 0.846248 0.005775 0.002761
06 1.088148 0.843918 –0.002326 –0.000745
07 1.088312 0.844547 0.000634 0.000301
08 1.088267 0.844293 –0.000595 –0.000083
09 1.088285 0.844363 0.000066 0.000033
10 1.088280 0.844335 –0.000033 –0.000009
11 1.088282 0.844343 0.000004 0.000004
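The iteration (7.31) for this example takes only a few MATLAB lines; the loop below is an illustrative sketch, not a general-purpose routine:

% Fixed-point iteration (7.31) for Example 7.26
x = 1.0; y = 0.5; tol = 1e-5;
for n = 1:100
    xnew = ((6 - y)/4)^(1/3);   % x_{n+1} = g1(x_n, y_n)
    ynew = 1/x^2;               % y_{n+1} = g2(x_n, y_n)
    if max(abs([xnew - x, ynew - y])) < tol
        x = xnew; y = ynew; break
    end
    x = xnew; y = ynew;
end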

7.4 Convex and Concave Functions


Convex and concave functions play an extremely important role in the
study of nonlinear programming problems.

Definition 7.10 (Convex Set)

A set S is convex if x' ∈ S and x'' ∈ S imply that all points on the line segment joining x' and x'' are members of S. This ensures that

cx' + (1 − c)x'', \quad for \; 0 < c < 1,

will be a member of S. For example, a vector subspace is convex, a ball in a normed vector space is convex (apply the triangle inequality), a hyperplane is a convex set, and a half-space is a convex set.

Figure 7.13: Convex and nonconvex sets.

Note that the intersection of convex sets is a convex set, but the union
of convex sets is not necessarily a convex set. •

Definition 7.11 (Convex Function)

A function f(x_1, x_2, \ldots, x_n) that is defined for all points (x_1, x_2, \ldots, x_n) in a convex set S is called a convex function on S if, for any x' ∈ S and x'' ∈ S,

f(cx' + (1 − c)x'') ≤ cf(x') + (1 − c)f(x'')

holds for 0 ≤ c ≤ 1. For example, the functions f(x) = x^2 and f(x) = e^x are convex functions. •

Definition 7.12 (Concave Function)

A function f(x_1, x_2, \ldots, x_n) that is defined for all points (x_1, x_2, \ldots, x_n) in a convex set S is called a concave function on S if, for any x' ∈ S and x'' ∈ S,

f(cx' + (1 − c)x'') ≥ cf(x') + (1 − c)f(x'')

holds for 0 ≤ c ≤ 1. For example, the function f(x) = \sqrt{x} is a concave function. •

Figure 7.14: Convex and concave functions.

Let y = f(x) be a function of a single variable. From Figure 7.15, we find that a function f(x) is a convex function if and only if the straight line joining any two points on the curve y = f(x) is never below the curve y = f(x). From the above figure we have:

Point A = (x', f(x')),
Point B = (cx' + (1 − c)x'', f(cx' + (1 − c)x'')),
Point C = (cx' + (1 − c)x'', cf(x') + (1 − c)f(x'')),
Point D = (x'', f(x'')),

and from this, we get

f(cx' + (1 − c)x'') ≤ cf(x') + (1 − c)f(x''),

which implies that the function is convex.

A function f(x) is a concave function if and only if the straight line
joining any two points on the curve y = f(x) is never above the curve
y = f(x). From Figure 7.16, we have:

Point A = (x′, f(x′))
Point B = (cx′ + (1 − c)x″, cf(x′) + (1 − c)f(x″))

Figure 7.15: Convex function.

Point C = (cx′ + (1 − c)x″, f(cx′ + (1 − c)x″))
Point D = (x″, f(x″)),

which gives

f(cx′ + (1 − c)x″) ≥ cf(x′) + (1 − c)f(x″),

which means that the function is concave.

Example 7.27 Show that f (x) = x2 is a convex function.

Solution. A function f(x) is convex if

f(cx′ + (1 − c)x″) ≤ cf(x′) + (1 − c)f(x″).

Given f(x) = x², the left-hand side of the above inequality can be
written as

f(cx′ + (1 − c)x″) = (cx′ + (1 − c)x″)² = c²x′² + (1 − c)²x″² + 2c(1 − c)x′x″.

Figure 7.16: Concave function.

Also, the right-hand side of the inequality gives

cf(x′) + (1 − c)f(x″) = cx′² + (1 − c)x″².

So using these values in the above inequality, we get

c²x′² + (1 − c)²x″² + 2c(1 − c)x′x″ ≤ cx′² + (1 − c)x″²,

or, it can be written as

x′²(c² − c) + x″²[(1 − c)² − (1 − c)] + 2c(1 − c)x′x″ ≤ 0.

Thus,

(c² − c)[x′² + x″² − 2x′x″] ≤ 0

or

(c² − c)(x′ − x″)² ≤ 0.

For c = 0 and c = 1, this inequality holds with equality, and for c ∈
(0, 1), we have c² − c < 0.

Also,
(x0 − x00 )2 ≥ 0,
which also holds. Hence, the given f (x) is a convex function. •
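As a quick numerical illustration (not taken from the book), the defining inequality can be spot-checked in MATLAB; the test points x′, x″ and the weight c below are arbitrary assumptions.

f = @(x) x.^2;                          % the convex function of Example 7.27
xp = -1.3;  xpp = 2.4;  c = 0.35;       % arbitrary test values
lhs = f(c*xp + (1-c)*xpp);
rhs = c*f(xp) + (1-c)*f(xpp);
lhs <= rhs                              % returns logical 1 (true)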

Example 7.28 A linear function f(x) = ax + b is both a convex and a
concave function because it follows from

f(cx′ + (1 − c)x″) = a(cx′ + (1 − c)x″) + b
                   = c(ax′ + b) + (1 − c)(ax″ + b)
                   = cf(x′) + (1 − c)f(x″).

Figure 7.17: Both a convex and a concave function.

From the above definitions of convex and concave functions, we see that
f (x1 , x2 , . . . , xn ) is a convex function, if and only if −f (x1 , x2 , . . . , xn ) is a
concave function, and vice-versa.

In Figure 7.18, we see a function that is neither convex
nor concave, because the line segment AB lies below y = f(x) and the line
segment BC lies above y = f(x).

Figure 7.18: Neither a convex nor a concave function.

A function f(x) is said to be strictly convex if, for two distinct points x′
and x″,

f(cx′ + (1 − c)x″) < cf(x′) + (1 − c)f(x″),

where 0 < c < 1. Conversely, a function f(x) is strictly concave if −f(x)
is strictly convex.

A special case of the convex (concave) function is the quadratic form

f (x) = Kx + xT Ax,

where K is a constant vector and A is a symmetric matrix. It can be


proved that f (x) is strictly convex, if A is positive-definite, and f (x) is
strictly concave, if A is negative-definite.

Properties of Convex Functions


1. If f (x) is a convex function, then af (x) is also a convex function, for
any a > 0.

2. The sum of convex functions is also a convex function. For example,


f (x) = x2 and g(x) = ex are convex functions, so h(x) = x2 + ex is
also a convex function.
3. If f (x) is a convex function, and g(y) is another convex function
whose value continuously increases, then the composite function g(f (x))
is also a convex function.
To check the convexity of a given function of a single variable, we can
use the following theorem:
Theorem 7.15 (Convex Function)

Suppose that the second derivative of a function f (x) exists for all x in a
convex set S. Then f (x) is a convex function on S, if and only if
f 00 (x) ≥ 0, for all x ∈ S.
For example, the function f (x) = x2 is a convex function on S = R1
because
f 0 (x) = 2x, f 00 (x) = 2 ≥ 0.
Theorem 7.16 (Concave Function)

Suppose that the second derivative of a function f (x) exists for all x in a
convex set S. Then f (x) is a concave function on S, if and only if
f 00 (x) ≤ 0, for all x ∈ S.
For example, the function f(x) = x^(1/2) is a concave function on S = [0, ∞)
because

f′(x) = (1/2)x^(−1/2),   f″(x) = −(1/4)x^(−3/2) ≤ 0.
Also, the function f (x) = 3x + 2 is both a convex and concave function on
S = R1 because
f 0 (x) = 3, f 00 (x) = 0.
Using the definition directly, it is difficult to check the convexity of a given
function of several variables because it would require consideration of
infinitely many points. However, using the sign (definiteness) of the Hessian
matrix of the function, we can determine the convexity of a function.

Theorem 7.17

1. A function f (x1 , x2 , . . . , xn ) is a convex function, if its Hessian ma-


trix H(x1 , x2 , . . . , xn ) is at least positive-semidefinite.

2. A function f (x1 , x2 , . . . , xn ) is a concave function, if its Hessian ma-


trix H(x1 , x2 , . . . , xn ) is at least negative-semidefinite.

3. A function f (x1 , x2 , . . . , xn ) is a nonconvex function, if its Hessian


matrix H(x1 , x2 , . . . , xn ) is indefinite. •

Example 7.29 Show that the function

f(x1, x2, x3) = 3x1² + 2x1x2 + 2x2² − 2x2x3 + 2x3²

is a convex function.

Solution. First, we find the first partial derivatives of the
given function as follows:

fx1 = ∂f/∂x1 = 6x1 + 2x2
fx2 = ∂f/∂x2 = 2x1 + 4x2 − 2x3
fx3 = ∂f/∂x3 = −2x2 + 4x3,

and the second derivatives of the function are as follows:

(fx1)x1 = ∂²f/∂x1² = 6
(fx2)x2 = ∂²f/∂x2² = 4
(fx3)x3 = ∂²f/∂x3² = 4
(fx1)x2 = ∂fx1/∂x2 = 2 = ∂fx2/∂x1 = (fx2)x1
(fx1)x3 = ∂fx1/∂x3 = 0 = ∂fx3/∂x1 = (fx3)x1
(fx2)x3 = ∂fx2/∂x3 = −2 = ∂fx3/∂x2 = (fx3)x2.
Hence, the Hessian matrix for the given function can be found as
 
6 2 0
H(x1 , x2 , x3 ) =  2 4 −2  .
0 −2 4
To check the definiteness of H, take
  
6 2 0 z1
zT Hz = (z1 , z2 , z3 )  2 4 −2   z2  ,
0 −2 4 z3
which gives
 
6z1 + 2z2 + 0z3
zT Hz = (z1 , z2 , z3 )  2z1 + 4z2 − 2z3 
0z1 − 2z2 + 4z3

= z1 (6z1 + 2z2 ) + z2 (2z1 + 4z2 − 2z3 ) + z3 (−2z2 + 4z3 ).


Thus,

zᵀHz = 6z1² + 4z1z2 + 4z2² − 4z2z3 + 4z3².

Note that

zᵀHz = 2(2z1² + (z1 + z2)² + (z2 − z3)² + z3²) > 0,
for z ≠ 0, so the Hessian matrix is positive-definite. Hence, the function
f (x1 , x2 , x3 ) is a convex function. •
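The definiteness of the Hessian can also be checked numerically. The following test (a sketch, not one of the book's programs) applies MATLAB's built-in eig function to the Hessian of Example 7.29; positive eigenvalues confirm positive-definiteness.

H = [6 2 0; 2 4 -2; 0 -2 4];            % Hessian from Example 7.29
lambda = eig(H)                         % all eigenvalues are positive, so H is positive-definite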
Another way to determine whether a function f (x1 , x2 , . . . , xn ) is a convex
or concave function is to use the principal minor test, which helps us to
determine the sign of the Hessian matrix. In the following, we discuss two
definitions.

Definition 7.13 (Principal Minor)

An ith principal minor of an n×n matrix is the determinant of an i×i ma-


trix obtained by deleting (n − i) rows and the corresponding (n − i) columns
of a matrix.

For example, the matrix


 
−3 −2
A=
−2 −5
has −3 and −5 as the first principal minors, and the second principal minor
is
−3(−5) − (−2)(−2) = 15 − 4 = 11,
which is the determinant of the given matrix. •

Note that for an n × n square matrix, there are, in all, 2n − 1 principal


minors (or determinants). Also, the first principal minors of a given matrix
are just the diagonal entries of a matrix.
Definition 7.14 (Leading Principal Minor)

A kth leading principal minor of an n × n matrix is the determinant of a


k × k matrix obtained by deleting (n − k) rows and columns of a matrix. •
Let Hk (x1 , x2 , . . . , xn ) be the kth leading principal minor of the Hessian
matrix evaluated at the point (x1 , x2 , . . . , xn ). Thus, if
f(x1, x2) = 3x1³ + 4x1x2 + 2x2²,
then
H1(x1, x2) = 18x1,    H2(x1, x2) = 18x1(4) − 4(4) = 72x1 − 16.
Theorem 7.18 (Convex Function)

Suppose that a function f (x1 , x2 , . . . , xn ) has continuous second-order par-


tial derivatives for each point x = (x1 , x2 , . . . , xn ) ∈ S. Then function f (x)
is a convex function on S, if and only if for each x ∈ S all principal minors
are nonnegative. •

Example 7.30 Show that the function f(x1, x2) = 3x1² + 4x1x2 + 2x2² is a
convex function on S = R².

Solution. First, we find the Hessian matrix, which is of the form


 
6 4
H(x1 , x2 ) = .
4 4

The first principal minors of the Hessian matrix are the diagonal entries,
both 6 > 0 and 4 > 0. The second principal minor of the Hessian matrix
is the determinant of the Hessian matrix, which is

6(4) − (4)(4) = 24 − 16 = 8 > 0.

So for any point, all principal minors of the Hessian matrix H(x1 , x2 )
are nonnegative, therefore, Theorem 7.18 shows that f (x1 , x2 ) is a convex
function on R2 . •
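For small matrices, all 2ⁿ − 1 principal minors required by Theorem 7.18 can be enumerated directly. The sketch below (an illustration, not one of the book's programs) does this for the Hessian of Example 7.30 using MATLAB's nchoosek and det functions.

H = [6 4; 4 4];                         % Hessian from Example 7.30
n = size(H,1);
for i = 1:n
    idx = nchoosek(1:n, i);             % all i-element row/column index sets
    for r = 1:size(idx,1)
        pm = det(H(idx(r,:), idx(r,:)));
        fprintf('order %d, rows/cols %s : %g\n', i, mat2str(idx(r,:)), pm);
    end
end

All of the printed values are nonnegative, which again shows that f(x1, x2) is convex on R².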

Figure 7.19: Convex function.



Theorem 7.19 (Concave Function)

Suppose that a function f (x1 , x2 , . . . , xn ) has continuous second-order par-


tial derivatives for each point x = (x1 , x2 , . . . , xn ) ∈ S. Then function f (x)
is a concave function on S, if and only if for each x ∈ S, k = 1, 2, . . . , n,
and all nonzero principal minors have the same sign as (−1)k . •

Example 7.31 Show that the function f(x1, x2) = −x1² − 2x1x2 − 3x2² is
a concave function on S = R².

Solution. The Hessian matrix of the given function has the form
 
−2 −2
H(x1 , x2 ) = .
−2 −6
The first principal minors of the Hessian matrix are the diagonal entries
(−2 and −6). These are both negative (nonpositive). The second principal
minor is the determinant of the Hessian matrix H(x1 , x2 ) and equals

−2(−6) − (−2)(−2) = 12 − 4 = 8 > 0.

Thus, from Theorem 7.19, f (x1 , x2 ) is a concave function on R2 . •

Example 7.32 Show that the function f(x1, x2) = 2x1² − 4x1x2 + 3x2² is
neither a convex nor a concave function on S = R².

Solution. The Hessian matrix of the given function has the form
 
2 −4
H(x1 , x2 ) = .
−4 3
The first principal minors of the Hessian matrix are 2 and 3. Because both
principal minors are positive, f (x1 , x2 ) cannot be concave. The second
principal minor is the determinant of the Hessian matrix H(x1 , x2 ) and it
is equal to
2(3) − (−4)(−4) = 6 − 16 = −10 < 0.
Thus, f (x1 , x2 ) cannot be a convex function on R2 . Together, these facts
show that f (x1 , x2 ) cannot be a convex nor a concave function. •

Figure 7.20: Concave function.

Example 7.33 Show that the function

f(x1, x2, x3) = 2x1² + 2x2² + 2x3² − 2x1x2 − 2x2x3 − 2x1x3

is a convex function on S = R3 .

Solution. The Hessian matrix of the given function has the form
 
4 −2 −2
H(x1 , x2 , x3 ) =  −2 4 −2  .
−2 −2 4

By deleting rows (and columns) 1 and 2 of the Hessian matrix, we obtain


the first-order principal minor 4 > 0. By deleting rows (and columns)
1 and 3 of the Hessian matrix, we obtain the first-order principal minor
4 > 0. By deleting rows (and columns) 2 and 3 of the Hessian matrix, we
obtain the first-order principal minor 4 > 0. By deleting row 1 and column
1 of the Hessian matrix, we find the second-order principal minor

|H2| = (4)(4) − (−2)(−2) = 16 − 4 = 12 > 0.

Figure 7.21: Neither a convex nor a concave function.

By deleting row 2 and column 2 of the Hessian matrix, we find the second-
order principal minor

|H2| = (4)(4) − (−2)(−2) = 16 − 4 = 12 > 0.
By deleting row 3 and column 3 of the Hessian matrix, we find the second-
order principal minor

|H2| = (4)(4) − (−2)(−2) = 16 − 4 = 12 > 0.
The third-order principal minor is simply the determinant of the Hessian
matrix itself. Expanding by row 1 cofactors, we find the third-order prin-
cipal minor as follows:
|H3 | = 4[(4)(4) − (−2)(−2)] − (−2)[(−2)(4) − (−2)(−2)]
+(−2)[(−2)(−2) − (−2)(4)] = 0.
Because for all (x1 , x2 , x3 ) all principal minors of the Hessian matrix are
nonnegative, we have shown that f (x1 , x2 , x3 ) is a convex function on R3 .


Example 7.34 For what values of a, b, and c will the function


f (x1 , x2 ) = ax21 + bx1 x2 + cx22
be a concave function on R2 ?

Solution. The first-order partial derivatives are

∂f/∂x1 = 2ax1 + bx2   and   ∂f/∂x2 = bx1 + 2cx2.

Thus, the gradient of the function is

∇f(x1, x2) = [2ax1 + bx2, bx1 + 2cx2]ᵀ.

The second-order partial derivatives are:

∂²f/∂x1² = 2a,   ∂²f/∂x2² = 2c,   ∂²f/∂x1∂x2 = b = ∂²f/∂x2∂x1,

and so the Hessian matrix for the function is

∇²f(x1, x2) = H(x1, x2) = [ 2a  b
                             b  2c ].

The first principal minors are

H1 = 2a and H1 = 2c,

and the second principal minor is the determinant of the Hessian matrix and is
equal to

H2 = |H| = 4ac − b².

If the given function is to be a concave function on R², then a, b, and c must
satisfy the conditions

a ≤ 0,   c ≤ 0,   and   4ac − b² ≥ 0,

where 4ac − b² ≥ 0 implies that

|b| ≤ √(4ac),   or equivalently   −√(4ac) ≤ b ≤ √(4ac).

7.5 Standard Form of a Nonlinear


Programming Problem
In solving NLP problems, we have to do the following:

Find an optimal solution x = (x1, x2, . . . , xn)ᵀ that minimizes or
maximizes an objective function f(x), subject to the constraint functions
gj(x), j = 1, 2, . . . , m, which may be either equality or inequality
constraints. Thus, the standard form of an NLP problem will be of
the form:

maximize(minimize) Z = f (x)
subject to (7.34)
gj (x) (≤, =, ≥) 0, j = 1, 2, . . . , m.

In the following we give two very important theorems that illustrate the
importance of convex and concave functions in NLP problems.

Theorem 7.20 (Concave Function)

Consider the NLP problem (7.34) and assume it is a maximization problem.


Suppose the feasible region S for NLP problem (7.34) is a convex set. If
f (x) is concave on S, then any local maximum for NLP problem (7.34) is
an optimal solution to this NLP problem. •

Theorem 7.21 (Convex Function)

Consider the NLP problem (7.34) and assume it is a minimization problem.


Suppose the feasible region S for NLP problem (7.34) is a convex set. If
f (x) is convex on S, then any local minimum for NLP problem (7.34) is
an optimal solution to this NLP problem. •

The above two theorems demonstrate that if we are maximizing a con-


cave function or minimizing a convex function over a convex feasible region
S, then any local maximum or local minimum will solve NLP problem
(7.34). As we solve NLP problems, we will repeatedly apply these two
theorems. •

7.6 One-Dimensional Unconstrained


Optimization
Optimization and root finding are related in the sense that both involve
guessing and searching for a point on a function. In root finding, we look
for zeros of a function or functions, while in optimization we search for
an extremum of a function, i.e., either the maximum or the minimum value
of a function.
Here, we will discuss the NLP problem that consists of only an objective
function, i.e., z = f(x), and no constraints. Note that if the given objective
function is convex (concave), then any point at which all derivatives are
zero is a global minimum (maximum). We will discuss three
one-dimensional optimization methods in the following sections.

7.6.1 Golden-Section Search


This is the first method we discuss for the single variable optimization
that has a goal of finding the value of x that yields an extremum; either a
maximum or minimum of a function f (x). It is a simple, general-purpose,
single-variable search method. It is similar to the bisection method for

Figure 7.22: Relationship between optimization and root finding.

nonlinear equations.

This method is an iterative method and starts with two initial guesses,
xL and xu, that bracket one local extremum of f(x) (here considered a
maximum); the function is then assumed to be unimodal on this interval.
Next, we look for two interior points, x1 and x2, which are chosen
according to the golden ratio

d = ((√5 − 1)/2)(xu − xL),

which gives

x1 = xL + d
x2 = xu − d.
After finding the two interior points, the given function is evaluated at
these points and two results can occur:
1. If f(x1) > f(x2), then the domain of x to the left of x2, from xL
   to x2, can be eliminated because it does not contain the maximum.
   The optimum now lies on the interval (x2, xu), so we set xL = x2
   and reuse the old x1 as the new x2 for the next iteration.

2. If f(x1) < f(x2), then the domain of x to the right of x1, from x1
   to xu, can be eliminated. In this case, the optimum lies on the
   interval (xL, x1), so we set xu = x1 and reuse the old x2 as the
   new x1 for the next iteration.

3. If f(x1) = f(x2), then the optimum lies on the interval (x2, x1).

Figure 7.23: Graphical interpretation of golden-section search.

Remember that we do not have to recalculate all the function values for
the next iteration; we need only one new function value. For example,
when the optimum is on the interval (x2, xu), the old x1 becomes the new
x2 and f(x2) = f(x1). After this, we have to find only the new x1 for the
next iteration, and it can be obtained (with the updated xL) as

x1 = xL + ((√5 − 1)/2)(xu − xL).

A similar approach is used for the other possible case, when the optimum
is on the interval (xL, x1): the old x2 becomes the new x1 and
f(x1) = f(x2). Then we need to find only the new x2 for the next
iteration, which can be obtained (with the updated xu) as

x2 = xu − ((√5 − 1)/2)(xu − xL).

As the iterations are repeated, the interval containing the optimum is re-
duced rapidly. In fact, with each iteration the interval is reduced by a
factor of the golden ratio (about 61.8%). This means that after 10 itera-
tions the interval is shrunk to about 0.008 or 0.8% of the initial interval.
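This reduction factor is easy to confirm in MATLAB (a simple check, not one of the book's programs):

r = (sqrt(5) - 1)/2;                    % the golden ratio, about 0.618
r^10                                    % about 0.0081, i.e., roughly 0.8% of the initial interval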

Example 7.35 Use golden-section search to find the approximation of the


maximum of the function

f (x) = 2x − 1.75x2 + 1.1x3 − 0.25x4 ,

with initial guesses xL = 1 and xu = 2.25.

Solution. To find the two interior points x1 and x2 , first we compute the
value of the golden ratio as
d = ((√5 − 1)/2)(xu − xL) = ((√5 − 1)/2)(2.25 − 1) = 0.7725,
and with this value, we have the values of the interior points as follows:

x1 = xL + d = 1 + 0.7725 = 1.7725,
x2 = xu − d = 2.25 − 0.7725 = 1.4775.

Next, we have to compute the function values at these interior points, which
are:

f (x1 ) = f (1.7725) = 1.7049


f (x2 ) = f (1.4775) = 1.4913.

Since f (x1 ) > f (x2 ), the maximum is on the interval defined by x2 , x1 , and
xu , i.e., (x2 , xu ). For this, we set the following scheme:

xL = x2 = 1.4775
x2 = x1 = 1.7725
xu = xu = 2.25.

So we have to find the new value of x1 , only for the second iteration, and
it can be computed with the help of the new value of the golden ratio as
follows:

d = ((√5 − 1)/2)(xu − xL) = ((√5 − 1)/2)(2.25 − 1.4775) = 0.4774

and
x1 = xL + d = 1.4775 + 0.4774 = 1.9549.

The function values at these new interior points are:

f (x1 ) = f (1.9549) = 1.7887


f (x2 ) = f (1.7725) = 1.7049.

Again, f (x1 ) > f (x2 ), so the maximum is on the interval defined by x2 , x1 ,


and xu . For this, we set the following scheme:

xL = x2 = 1.7725
x2 = x1 = 1.9549
xu = xu = 2.25.

The new value of x1 and d can be computed as follows:

d = 0.2951 and x1 = 2.0676.

Repeat the process, and the numerical results for the corresponding itera-
tions, starting with the initial approximations xL = 1.0 and xu = 2.25 with
accuracy 5 × 10−4 , are given in Table 7.6. From Table 7.6, we can see that
within 14 iterations (a rather slow rate) the result converges to the true
maximum value of 1.8082 at x = 2.0793. •

Figure 7.24: Graph of the given function.

Program 7.6
MATLAB m-file for the Golden-Section Search Method for Optimization
function sol=golden(fn,xL,xu,tol)
disp(' k xL f(xL) x2 f(x2) x1 f(x1) xu f(xu) d ')
d=((sqrt(5)-1)/2)*(xu-xL); x1=xL+d; x2=xu-d;   % interior points
fL=feval(fn,xL); fu=feval(fn,xu);
f1=feval(fn,x1); f2=feval(fn,x2); k=0;
[k xL fL x2 f2 x1 f1 xu fu d]
while abs(x1-x2)>tol
  f1=feval(fn,x1); f2=feval(fn,x2); k=k+1;
  if f1>f2                     % maximum lies in (x2,xu); discard (xL,x2)
    xL=x2; x2=x1;
    d=((sqrt(5)-1)/2)*(xu-xL); x1=xL+d; f1=feval(fn,x1);
    sol=x2; f2=feval(fn,x2); fL=feval(fn,xL); fu=feval(fn,xu);
    [k xL fL x2 f2 x1 f1 xu fu d sol]
  else                         % maximum lies in (xL,x1); discard (x1,xu)
    xu=x1; x1=x2;
    d=((sqrt(5)-1)/2)*(xu-xL); x2=xu-d; f2=feval(fn,x2);
    sol=x1; f1=feval(fn,x1); fL=feval(fn,xL); fu=feval(fn,xu);
    [k xL fL x2 f2 x1 f1 xu fu d sol]
  end
end

Table 7.6: Solution by the golden-section search.

n xL f (xL ) x2 f (x2 ) x1 f (x1 ) xu f (xu ) d


00 1.0000 1.1000 1.4775 1.4913 1.7725 1.7049 2.2500 1.7631 0.7725
01 1.4775 1.4913 1.7725 1.7049 1.9549 1.7887 2.2500 1.7631 0.4774
02 1.7725 1.7049 1.9549 1.7887 2.0676 1.8080 2.2500 1.7631 0.2951
03 1.9549 1.7887 2.0676 1.8080 2.1373 1.8034 2.2500 1.7631 0.1824
04 1.9549 1.7887 2.0246 1.8042 2.0676 1.8080 2.1373 1.8034 0.1127
05 2.0246 1.8042 2.0676 1.8080 2.0942 1.8079 2.1373 1.8034 0.0697
06 2.0246 1.8042 2.0512 1.8071 2.0676 1.8080 2.0942 1.8079 0.0431
07 2.0512 1.8071 2.0676 1.8080 2.0778 1.8082 2.0942 1.8079 0.0266
08 2.0676 1.8080 2.0778 1.8082 2.0841 1.8081 2.0942 1.8079 0.0164
09 2.0676 1.8080 2.0739 1.8081 2.0778 1.8082 2.0841 1.8081 0.0102
10 2.0739 1.8081 2.0778 1.8082 2.0802 1.8082 2.0841 1.8081 0.0063
11 2.0778 1.8082 2.0802 1.8082 2.0817 1.8082 2.0841 1.8081 0.0039
12 2.0778 1.8082 2.0793 1.8082 2.0802 1.8082 2.0817 1.8082 0.0024
13 2.0778 1.8082 2.0787 1.8082 2.0793 1.8082 2.0802 1.8082 0.0015
14 2.0787 1.8082 2.0793 1.8082 2.0796 1.8082 2.0802 1.8082 0.0009

To use MATLAB commands for the golden-section search method, first we


define a function m-file as fn.m for the equation as follows:

function y = fn(x)
y = 2*x - 1.75*x.^2 + 1.1*x.^3 - 0.25*x.^4;

then use the single command:

>> sol = golden('fn', 1.0, 2.25, 5e-4)


sol =
2.0793

7.6.2 Quadratic Interpolation


This iterative method is based on fitting a polynomial function through a
given number of points. As the name indicates, the quadratic interpola-
tion method uses three distinct points and fits a quadratic function through

these points. The minimum of this quadratic function is computed by
applying the necessary condition, i.e., setting its derivative equal to zero.

Since the method is iterative, a new set of three points is selected by


comparing function values at this minimum point with three initial guesses.
The process is repeated with the three new points until the interval on
which the minimum lies becomes fairly small.

Similar to the previously discussed method, the golden-section search,


this method also requires only one new function evaluation at each itera-
tion. As the interval becomes small, the quadratic approximation becomes
closer to the actual function, which speeds up convergence.

Derivation of the Formula

Just as there is only one straight line connecting two points, there is
only one quadratic or parabola connecting three points. Suppose that
we are given three distinct points x0 , x1 , and x2 and a quadratic function
p(x) passing through the corresponding function values f (x0 ), f (x1 ), and
f (x2 ). Thus, if these three points jointly bracket an optimum, we can fit a
quadratic function to the points as follows:

p(x) = [(x − x1)(x − x2)/((x0 − x1)(x0 − x2))] f(x0)
     + [(x − x0)(x − x2)/((x1 − x0)(x1 − x2))] f(x1)
     + [(x − x0)(x − x1)/((x2 − x0)(x2 − x1))] f(x2).
The necessary condition for the minimum of this quadratic function can
be obtained by differentiating it with respect to x. Set the result equal to
zero, and solve the equation for an estimate of optimal x, i.e.,
p′(x) = [(2x − x1 − x2)/((x0 − x1)(x0 − x2))] f(x0)
      + [(2x − x0 − x2)/((x1 − x0)(x1 − x2))] f(x1)
      + [(2x − x0 − x1)/((x2 − x0)(x2 − x1))] f(x2) = 0.

It can be shown by some algebraic manipulations that the minimum point


(or optimal point), denoted xopt, is

xopt = (1/2) · [(x1² − x2²)f(x0) + (x2² − x0²)f(x1) + (x0² − x1²)f(x2)] /
       [(x1 − x2)f(x0) + (x2 − x0)f(x1) + (x0 − x1)f(x2)],          (7.35)

which is called the quadratic interpolation formula.
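Formula (7.35) is easy to check numerically. The sketch below (not one of the book's programs) evaluates it for the test function and the three initial guesses used later in Example 7.36 and returns the same value computed there.

f  = @(x) 2*x - 1.75*x.^2 + 1.1*x.^3 - 0.25*x.^4;
x0 = 1.75;  x1 = 2.0;  x2 = 2.25;
num = (x1^2 - x2^2)*f(x0) + (x2^2 - x0^2)*f(x1) + (x0^2 - x1^2)*f(x2);
den = (x1 - x2)*f(x0) + (x2 - x0)*f(x1) + (x0 - x1)*f(x2);
xopt = num/(2*den)                      % about 2.0617, as in Example 7.36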

After finding the new point (optimum point), the next job is to deter-
mine which one of the given three points is discarded before repeating the
process. To discard a point we check the following:

1. xopt ≤ x1 :

(i) If f (x1 ) ≥ f (xopt ), then the minimum of the actual function is


on the interval (x0 , x1 ), therefore, we will use the new three points
x0 , xopt , and x1 for the next iteration.
(ii) If f (x1 ) < f (xopt ), then the minimum of the actual function is
on the interval (xopt , x2 ), therefore, in this case we will use the new
three points xopt , x1 , and x2 for the next iteration.

2. xopt > x1 :
(i) If f (x1 ) ≥ f (xopt ), then the minimum of the actual function is
on the interval (x1 , x2 ), therefore, we will use the new three points
x1 , xopt , and x2 for the next iteration.
(ii) If f (x1 ) < f (xopt ), then the minimum of the actual function is
on the interval (x0 , xopt ), therefore, in this case we will use the new
three points x0 , x1 , and xopt for the next iteration.

Example 7.36 Use quadratic interpolation to find the approximation of


the maximum of

f (x) = 2x − 1.75x2 + 1.1x3 − 0.25x4 ,

with initial guesses x0 = 1.75, x1 = 2, and x2 = 2.25.



Figure 7.25: Graphical interpretation of quadratic interpolation.

Solution. Using the three initial guesses, the corresponding functional


values are:
x0 = 1.75 f (1.75) = 1.6912
x1 = 2.0 f (2.0) = 1.8000
x2 = 2.25 f (2.25) = 1.7631.
Using formula (7.35), we get

xopt = [((2)² − (2.25)²)(1.691) + ((2.25)² − (1.75)²)(1.8) + ((1.75)² − (2)²)(1.763)] /
       [2(2 − 2.25)(1.691) + 2(2.25 − 1.75)(1.8) + 2(1.75 − 2)(1.763)]
     = 2.0617,

and the function value at this optimal point is

f(2.0617) = 2(2.0617) − 1.75(2.0617)² + 1.1(2.0617)³ − 0.25(2.0617)⁴ = 1.8077.

To perform the second iteration, we have to discard one point by using the
same strategy as in the previous golden-section search. Since the function
value at xopt is greater than the intermediate point x1 and the xopt value is
to the right of x1 , the first initial guess x0 is discarded. So for the second

iteration, we will start from the following initial guesses:


x0 = 2.0 f (2.0) = 1.8000
x1 = 2.0617 f (2.0617) = 1.8077
x2 = 2.25 f (2.25) = 1.7631.
Using formula (7.35) again gives
xopt = [((2.062)² − (2.25)²)(1.8) + ((2.25)² − (2)²)(1.808) + ((2)² − (2.062)²)(1.763)] /
       [2(2.062 − 2.25)(1.8) + 2(2.25 − 2)(1.808) + 2(2 − 2.062)(1.763)]
     = 2.0741,

and the function value at this optimal point is


f(2.0741) = 2(2.0741) − 1.75(2.0741)² + 1.1(2.0741)³ − 0.25(2.0741)⁴ = 1.8081.

Repeat the process and the numerical results for the corresponding iter-
ations, starting with the initial approximations x0 = 1.75, x1 = 2.0, and
x2 = 2.25, with accuracy 5 × 10−2 , which are given in Table 7.7.

Table 7.7: Solution by quadratic interpolation method.

n x0 f (x0 ) x1 f (x1 ) x2 f (x2 ) x3 f (x3 )


00 1.7500 1.6912 2.0000 1.8000 2.2500 1.7631 2.0617 1.8077
01 2.0000 1.8000 2.0617 1.8077 2.2500 1.7631 2.0741 1.8081
02 2.0741 1.8081 2.0781 1.8082 2.2500 1.7631 2.0781 1.8082
03 2.0781 1.8082 2.0790 1.8082 2.2500 1.7631 2.0790 1.8082
04 2.0790 1.8082 2.0793 1.8082 2.2500 1.7631 2.0793 1.8082
05 2.0793 1.8082 2.0793 1.8082 2.2500 1.7631 2.0793 1.8082

From Table 7.7, we can see that within five iterations, the result con-
verges rapidly on the true value of 1.8082 at x = 2.0793. Also, note that for
this problem the quadratic interpolation method converges only on one end
of the interval, and sometimes the convergence can be slow for this reason.

To use MATLAB commands for the quadratic interpolation method,


first we define a function m-file as fn.m for the equation as follows:

function y = fn(x)
y = 2*x - 1.75*x.^2 + 1.1*x.^3 - 0.25*x.^4;

then use the single command:

>> sol = Quadratic2('fn', 1.75, 2, 2.25, 5e-2)


sol =
2.0793

Remember that the procedure is essentially complete except for the choice
of three initial points. Choosing three arbitrary values of x may cause
problems if the denominator of the xopt equation is zero. Assume that
the three points are chosen as 0, ε, and 2ε, where ε is a chosen positive
parameter (say, ε = 1). In such a case, the expression for xopt takes
the form

xopt = ε(3f(x0) − 4f(x1) + f(x2)) / (2f(x0) − 4f(x1) + 2f(x2)),

and for the denominator to be greater than zero, we must have

2f (x0 ) − 4f (x1 ) + 2f (x2 ) > 0,

i.e.,

(f(x0) + f(x2))/2 > f(x1).
In the case of the convergence of the method, the interval on which the
minimum lies becomes smaller, the quadratic function becomes closer to
the actual function, and the process is terminated when

|f(xopt) − p(xopt)| / |f(xopt)| ≤ tol,

where tol is a small convergence tolerance.



Program 7.7
MATLAB m-file for the Quadratic Interpolation Method
function sol=Quadratic2(fn,x0,x1,x2,tol)
disp(' k x0 f(x0) x1 f(x1) x2 f(x2) x3 f(x3) ')
f0=feval(fn,x0); f1=feval(fn,x1); f2=feval(fn,x2); k=0;
A=f0*(x1^2-x2^2)+f1*(x2^2-x0^2)+f2*(x0^2-x1^2);
B=2*f0*(x1-x2)+2*f1*(x2-x0)+2*f2*(x0-x1);
x3=A/B; f3=feval(fn,x3);          % optimal point of the fitted parabola
while abs(x2-x0)>tol
  k=k+1;
  if f3>f1                        % keep x1, x3, x2 (discard x0)
    x0=x1; f0=f1; x1=x3; f1=f3;
  else                            % keep x0, x3, x1 (discard x2)
    x2=x1; f2=f1; x1=x3; f1=f3;
  end
  A=f0*(x1^2-x2^2)+f1*(x2^2-x0^2)+f2*(x0^2-x1^2);
  B=2*f0*(x1-x2)+2*f1*(x2-x0)+2*f2*(x0-x1);
  x3=A/B; f3=feval(fn,x3);        % recompute formula (7.35)
  [k x0 f0 x1 f1 x2 f2 x3 f3]
end
sol=x3;

7.6.3 Newton’s Method


This is one of the best one-dimensional iterative methods for single variable
optimization. Unlike other methods for one-dimensional optimization, this
method requires only a single initial approximation.

For finding a root of the nonlinear equation f(x) = 0, this method
can be written as

xn+1 = xn − f(xn)/f′(xn),   n ≥ 0,
and a similar open approach can be used to find an optimum of f(x) by
defining a new function F(x) = f′(x). Because the optimal value x∗ satisfies

F(x∗) = f′(x∗) = 0,

Newton's method for optimization can be written as

xn+1 = xn − f′(xn)/f″(xn),   n ≥ 0,          (7.36)
which can be used to find the minimum or maximum of f (x), if f (x) is
twice continuously differentiable.

It should be noted that formula (7.36) can be obtained by using second-


order Taylor’s series for the single variable function f (x) and setting the
derivative of the series equal to zero, i.e., using

f(x) = f(x0) + (x − x0)f′(x0) + ((x − x0)²/2!) f″(x0) + higher-order terms.

Taking the derivative with respect to x and ignoring the higher-order terms,
we get

f′(x) ≈ f′(x0) + (x − x0)f″(x0).

Setting f′(x) = 0 and simplifying the expression for x, we obtain

x ≈ x0 − f′(x0)/f″(x0).

It is an improved approximation and can be written as

x1 = x0 − f′(x0)/f″(x0),

or, in general, we have formula (7.36).

Example 7.37 Use Newton’s method to find the local maximum of the
function
f (x) = 2x − 1.75x2 + 1.1x3 − 0.25x4 ,
with an initial guess x0 = 2.5.

Solution. To use formula (7.36), first we compute the first and second
derivative of the given function as follows:

f′(x) = 2 − 3.5x + 3.3x² − x³

f″(x) = −3.5 + 6.6x − 3x².


Then using formula (7.36), we have

xn+1 = xn − (2 − 3.5xn + 3.3xn² − xn³)/(−3.5 + 6.6xn − 3xn²),   n ≥ 0.

Taking n = 0 and x0 = 2.5, we get

x1 = 2.5 − (2 − 3.5(2.5) + 3.3(2.5)² − (2.5)³)/(−3.5 + 6.6(2.5) − 3(2.5)²) = 2.1957,

which gives the function value f(2.1957) = 1.7880. Similarly, the second
iteration can be obtained as

x2 = 2.1957 − (2 − 3.5(2.1957) + 3.3(2.1957)² − (2.1957)³)/(−3.5 + 6.6(2.1957) − 3(2.1957)²) = 2.0917,

and the corresponding function value is f(2.0917) = 1.8080.

Repeat the process; the numerical results for the corresponding iter-
ations, starting with the initial approximation x0 = 2.5 with accuracy
5 × 10−2 , are given in Table 7.8.

Table 7.8: Solution by Newton’s method.


n xn f (xn ) f 0 (xn ) f 00 (xn )
00 2.5 1.4844 –1.7500 –5.7500
01 2.1957 1.7880 –0.3608 –3.4714
02 2.0917 1.8080 –0.0344 –2.8204
03 2.0795 1.8082 –0.00044 –2.7483
04 2.0793 1.8082 –7.5522e-008 –2.7474

From Table 7.8, we can see that within four iterations, the result converges
rapidly on the true value of 1.8082 at x = 2.0793. Also, note that this
method does not require initial guesses that bracket the optimum. In ad-
dition, this method also shares the disadvantage that it may be divergent.
For confirming the convergence of the method we must check the correct
sign of the second derivative of the function. For maximizing the function,
the second derivative of the function should be less than zero, and it should
be greater than zero for the minimizing problem. In both cases, the first
derivative of the function should be close to zero as much as possible be-
cause optimum here means the same as the root of f 0 (x) = 0. Note that if
the second derivative of the function equals zero at the given initial guess,
then change the initial guess. •

To get the above results using MATLAB commands, first the function
2x − 1.75x² + 1.1x³ − 0.25x⁴ and its first and second derivatives
2 − 3.5x + 3.3x² − x³ and −3.5 + 6.6x − 3x² were saved in three m-files
called fn.m, dfn.m, and ddfn.m, respectively, written as follows:

function y = fn(x)
y = 2*x - 1.75*x.^2 + 1.1*x.^3 - 0.25*x.^4;

first derivative of the function,

function dy = dfn(x)
dy = 2 - 3.5*x + 3.3*x.^2 - x.^3;

and the second derivative of the function,

function ddy = ddfn(x)
ddy = -3.5 + 6.6*x - 3*x.^2;

after which we do the following.

>> x0 = 2.5; tol = 5e-6;
>> sol = newtonO('fn', 'dfn', 'ddfn', x0, tol)

Program 7.8
MATLAB m-file for Newton’s Method for Optimization
function sol=newtonO(fn,dfn,ddfn,x0,tol)
old = x0 + 1;  k = 0;
while abs(x0 - old) > tol
    old = x0;
    x0 = old - feval(dfn,old)/feval(ddfn,old);   % Newton step for f'(x) = 0
end
sol = x0;

7.7 Multidimensional Unconstrained


Optimization
Just as the theory of linear programming is based on linear algebra with
several variables, the theory of NLP is based on calculus with several vari-
ables. A convenient and familiar place to begin is therefore with the prob-
lem of finding the minimum or maximum of a nonlinear function in the
absence of constraints.

Here, we will discuss the procedure to find an optimal solution (if it


exists) or a local extremum for the following NLP problem:

maximize (or minimize)  z = f(x1, x2, . . . , xn)
subject to                                            (7.37)
x = (x1, x2, . . . , xn) ∈ Rⁿ.

We assume that the first and second partial derivatives of f (x1 , x2 , . . . , xn )


exist and are continuous at all points.

Theorem 7.22 (Local Extremum)

If x̄ is a local extremum for NLP problem (7.37), then

∂f/∂xi (x̄) = 0,   i = 1, 2, . . . , n,          (7.38)

where the point x̄ is called the stationary point of a function f (x). •

The following theorems give conditions (involving the Hessian matrix of f )


under which a stationary point is a local minimum, or a local maximum
and not a local extremum.

Theorem 7.23 (Local Minimum)

If Hk (x̄) > 0, for k = 1, 2, . . . , n, then the stationary point x̄ is a local


minimum for NLP problem (7.37). •

Theorem 7.24 (Local Maximum)

If Hk (x̄) is nonzero, for k = 1, 2, . . . , n and has the same sign as (−1)k ,


then the stationary point x̄ is a local maximum for NLP problem (7.37). •

Theorem 7.25 (Saddle Point)

If Hk(x̄) ≠ 0, for k = 1, 2, . . . , n, and the conditions of Theorem 7.23
and Theorem 7.24 do not hold, then the stationary point x̄ is not a local
extremum for NLP problem (7.37); it is a saddle point. •

Theorem 7.26 If Hk (x̄) = 0, for k = 1, 2, . . . , n, then the stationary


point x̄ may be a local minimum, or a maximum, or a saddle point for
NLP problem (7.37). •

Example 7.38 Find all local minimum, local maximum, and saddle points
for the function
f(x1, x2) = (1/3)x1³ − (2/3)x2³ + (1/2)x1² − 6x1 + 32x2 + 4.
Solution. The first partial derivatives of the function are

∂f(x1, x2)/∂x1 = x1² + x1 − 6

and

∂f(x1, x2)/∂x2 = −2x2² + 32.
Since ∂f/∂x1 and ∂f/∂x2 exist for every (x1, x2), the only stationary points are
the solutions of the system
x1² + x1 − 6 = 0
−2x2² + 32 = 0.

Solving this system, we obtain the four stationary points

(−3, −4), (−3, 4), (2, −4), (2, 4).



The second partial derivatives of f are

∂²f/∂x1² = 2x1 + 1,   ∂²f/∂x2² = −4x2,

and

∂²f/∂x1∂x2 = 0 = ∂²f/∂x2∂x1.

Hence, the Hessian matrix for the function f(x) is

H(x1, x2) = [ 2x1 + 1      0
                  0     −4x2 ].

At (−3, −4) the Hessian matrix is


 
−5 0
H(−3, −4) = .
0 16

Since
H1 (−3, −4) = −5 < 0
and
H2(−3, −4) = det H(−3, −4) = (−5)(16) − 0 = −80 ≠ 0,
the conditions of Theorem 7.23 and Theorem 7.24 cannot be satisfied, there-
fore, the stationary point x̄ = (−3, −4) is not a local extremum for the
given function. But Theorem 7.25 now implies that x̄ = (−3, −4) is a
saddle point, i.e.,
407
(−3, −4, f (−3, −4)) = (−3, −4, − ).
6
At (−3, 4) the Hessian matrix is
 
−5 0
H(−3, 4) = .
0 −16

Since
H1 (−3, 4) = −5 < 0
and

H2(−3, 4) = det H(−3, 4) = (−5)(−16) − 0 = 80 > 0,
the conditions of Theorem 7.24 are satisfied, and it shows that the station-
ary point x̄ = (−3, 4) is a local maximum for the given function, i.e.,
617
f (−3, 4) = .
6
At (2, −4) the Hessian matrix is
 
5 0
H(2, −4) = .
0 16
Since
H1 (2, −4) = 5 > 0
and
H2(2, −4) = det H(2, −4) = (5)(16) − 0 = 80 > 0,
the conditions of Theorem 7.23 are satisfied, and it shows that the station-
ary point x̄ = (2, −4) is a local minimum for the given function, i.e.,
266
f (2, −4) = − .
3
Finally, at (2, 4) the Hessian matrix is
 
5 0
H(2, 4) = .
0 −16
Since
H1 (2, 4) = 5 > 0
and

H2(2, 4) = det H(2, 4) = (5)(−16) − 0 = −80 ≠ 0,
the conditions of Theorem 7.23 and Theorem 7.24 cannot be satisfied, there-
fore, the stationary point x̄ = (2, 4) is not a local extremum for the given
function. From Theorem 7.25, we see that x̄ = (2, 4) is a saddle point, i.e.,
(2, 4, f (2, 4)) = (2, 4, 82).
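The classification above can be cross-checked numerically. The following MATLAB sketch (not one of the book's programs) evaluates the Hessian at each stationary point and prints the eigenvalues; all positive means a local minimum, all negative a local maximum, and mixed signs a saddle point.

H = @(x1,x2) [2*x1 + 1, 0; 0, -4*x2];   % Hessian of Example 7.38
pts = [-3 -4; -3 4; 2 -4; 2 4];         % the four stationary points
for i = 1:size(pts,1)
    e = eig(H(pts(i,1), pts(i,2)));
    fprintf('(%g, %g): eigenvalues %g, %g\n', pts(i,1), pts(i,2), e(1), e(2));
end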


Example 7.39 Find all local minimum, local maximum, and saddle points
for the function
f (x1 , x2 ) = −x21 + x22 .
Solution. The first partial derivatives of the function are
∂f(x1, x2)/∂x1 = −2x1,   ∂f(x1, x2)/∂x2 = 2x2.

Since ∂f/∂x1 and ∂f/∂x2 exist for every (x1, x2), the only stationary points are
the solutions of the system

−2x1 = 0
2x2 = 0.

Solving this system, we obtain the only stationary point (0, 0).
The second partial derivatives of f are
∂²f/∂x1² = −2,   ∂²f/∂x2² = 2,

and

∂²f/∂x1∂x2 = 0 = ∂²f/∂x2∂x1.

Hence, the Hessian matrix for the function f (x) is:


 
−2 0
H(x1 , x2 ) = .
0 2

At (0, 0) the Hessian matrix is


 
−2 0
H(0, 0) = .
0 2

Since
H1 (0, 0) = −2 < 0
and
H2(0, 0) = det H(0, 0) = (−2)(2) − 0 = −4 ≠ 0,

the conditions of Theorem 7.23 and Theorem 7.24 cannot be satisfied, there-
fore, the stationary point x̄ = (0, 0) is not a local extremum for the given
function. From Theorem 7.25, we conclude that x̄ = (0, 0) is a saddle
point, i.e.,
(0, 0, f (0, 0)) = (0, 0, 0).

7.7.1 Gradient Methods


There are a number of techniques available for multidimensional uncon-
strained optimization. The techniques we discuss here require derivatives
and therefore are called gradient methods. As the name implies, gradi-
ent methods explicitly use derivative information to generate efficient al-
gorithms. Two methods will be discussed here, and they are called the
steepest ascent and steepest descent methods.

Consider the following NLP problem:



maximize (or minimize)  z = f(x1, x2, . . . , xn)
subject to                                            (7.39)
x = (x1, x2, . . . , xn) ∈ Rⁿ.

We know that if f (x1 , x2 , . . . , xn ) is a concave function, then the optimal


solution to the problem (7.39) (if it exists) will occur at a stationary point
x̄ having
∂f(x̄)/∂x1 = ∂f(x̄)/∂x2 = · · · = ∂f(x̄)/∂xn = 0.
Sometimes it is very easy to compute a stationary point of a function, but
in many problems, it may be very difficult. Here, we discuss a method that
can be used to approximate the stationary point of a function.

Definition 7.15 (Length of a Vector)

Given a vector x = (x1 , x2 , . . . , xn ) ∈ Rn , the length of x is denoted by kxk


and is defined as
kxk = (x1² + x2² + · · · + xn²)^(1/2).          (7.40)

Note that any n-dimensional vector represents a direction in Rn . Also,


for any direction there are an infinite number of vectors representing that
direction. For any vector x, the vector
u = x/kxk          (7.41)

is called a unit vector; it has a length of 1 and defines the same
direction as x.

Definition 7.16 (Gradient Vector)

Let f(x1, x2, . . . , xn) be a function of the n variables x = (x1, x2, . . . , xn).
Then the gradient vector of f at x is denoted ∇f(x) and is defined as

∇f(x) = [∂f/∂x1(x), ∂f/∂x2(x), . . . , ∂f/∂xn(x)]ᵀ.          (7.42)

Also, ∇f(x) defines the direction ∇f(x)/k∇f(x)k. •

For example, if

f(x1, x2) = x1² + x2²,

then

∇f(x1, x2) = [2x1, 2x2]ᵀ.

Thus, at (2, 3), the gradient vector of the function is

∇f(2, 3) = [2(2), 2(3)]ᵀ = [4, 6]ᵀ

and

k∇f(2, 3)k = √(4² + 6²) = √(16 + 36) = √52.

So the gradient vector ∇f(2, 3) defines the direction

∇f(2, 3)/k∇f(2, 3)k = (4/√52, 6/√52) = (0.5547, 0.8321).

Note that at any point x̄ on a level curve of f(x), the vector
∇f(x̄)/k∇f(x̄)k is perpendicular to that level curve.

For example, if f(x1, x2) = x1² + x2², then at (2, 3)

∇f(2, 3)/k∇f(2, 3)k = (0.5547, 0.8321)

is perpendicular to the level curve

x1² + x2² = 13.
Note that if at any point x = (x1 , x2 , . . . , xn ) the gradient vector ∇f (x)
points in the direction in which the function f (x) is increasing most rapidly,
it is called the direction of steepest ascent. So it follows that −∇f (x) points
in the direction in which f (x) is decreasing more rapidly, and it is called
the direction of steepest descent. In other words, we can say that if we are
looking for a maximum of f (x) using the initial point v0 , it seems sensible
to look in the direction of steepest ascent, and for a minimum of f (x) we
look in the direction of steepest descent.

Also, moving from v0 in the direction of ∇f (x) to get the local maxi-
mum, we have to find the new point v1 as
v1 = v0 + α0 ∇f (v0 ),
for some α0 > 0. Since we desire v1 to be as close as possible to the
maximum, we need to find the unknown variable α0 > 0 such that
f (v1 ) = f (v0 + α0 ∇f (v0 ))
is as large as possible.

Since f (v0 + α0 ∇f (v0 )) is a function of the one variable α0 , α0 can be


found by using one-dimensional search. Since the steepest ascent method
(also called the gradient method) is an iterative method, we need at each
iteration a new value of the variable αk , which helps us to maximize the
function f (vk + αk ∇f (vk )) at each step. The value of αk can be computed
by using the following form:
vk+1 = vk + αk ∇f(vk), for k ≥ 0.

Theorem 7.27 Suppose we have a point v, and we move from v a small


distance δ in a direction d. Then for a given δ, the maximum increase in
the value of f (x) will occur if we choose
d = ∇f(x)/k∇f(x)k.
In short, if we move a small distance from v and we want f (x) to increase
as quickly as possible, then we should move in the direction of ∇f (v). •

Beginning at any point v0 and moving in the direction of ∇f (v0 ) will result
in a maximum rate of increase for f . So we begin by moving away from v0
in the direction of ∇f (v0 ). For some nonnegative value of α0 , we move to
a point v1 , which can be written as
v1 = v0 + α0 ∇f (v0 ),
where α0 solves the following one-dimensional optimization problem:

z(α0) = maximize f(v0 + α0 ∇f(v0))
subject to α0 ≥ 0.

If k∇f (v1 )k is small, i.e.,


k∇f (v1 )k ≤ , given small  > 0,
we may terminate the process with the knowledge that v1 is a good ap-
proximation of the stationary point v̄ with ∇f (v̄) = 0.

But if k∇f (v1 )k is not sufficiently small, then we move away from v1
a distance α1 in the direction of k∇f (v1 )k. As before, we discuss α1 by
solving

z(α1) = maximize f(v1 + α1 ∇f(v1))
subject to α1 ≥ 0.

We are at point v2 , which can be written as


v2 = v1 + α1 ∇f (v1 ).

If k∇f (v2 )k is sufficiently small, then we terminate the process with the
knowledge that v2 is a good approximation of the stationary point v̄ of the
given function f (x), with ∇f (v̄) = 0.

This process is called the steepest ascent method because, to generate


points, we always move in the direction that maximizes the rate at which
f increases (at least locally).

Example 7.40 Use the steepest ascent method to approximate the solution
to
maximize z = −(x1 − 3)2 − (x2 − 2)2
subject to
(x1 , x2 ) ∈ R2 ,
by starting with v0 = (1, 1).

Solution. Given

f (x1 , x2 ) = −(x1 − 3)2 − (x2 − 2)2 and v0 = (1, 1),

and the gradient vector of the given function, which is

∇f (x1 , x2 ) = [−2(x1 − 3), −2(x2 − 2)]T ,

we choose to begin at the point v0 = (1, 1), so

∇f (1, 1) = [−2(1 − 3), −2(1 − 2)]T = [4, 2]T .

Thus, we must choose α0 to maximize

f (α0 ) = f [(1, 1) + α0 (4, 2)] = f (1 + 4α0 , 1 + 2α0 ),

which can be simplified as

f(α0) = f(1 + 4α0, 1 + 2α0) = −(4α0 − 2)² − (2α0 − 1)².

Thus,
f(α0) = −20α0² + 20α0 − 5.

So solving the one-dimensional optimization problem, we need to find the


value of α0 which can be obtained as

f 0 (α0 ) = 0 = −40α0 + 20,

which gives
−40α0 + 20 = 0, α0 = 0.5.
Thus our new point can be found as

v1 = v0 + α0 ∇f (v0 ) = (1, 1) + 0.5(4, 2) = (3, 2).

Since
∇f (3, 2) = [0, 0]T ,
we terminate the process. Thus, (3, 2) is the optimal solution to the given
NLP problem because f (x1 , x2 ) is a concave function:
 
−2 0
H(3, 2) = .
0 −2

The first principal minors of the Hessian matrix are the diagonal entries
(–2 and –2). These are both negative (nonpositive). The second principal
minor is the determinant of the Hessian matrix H and equals

−2(−2) − 0 = 4 ≥ 0.

Procedure 7.4 (The Method of Steepest Ascent)

1. Start with initial point x(0) and initial (given) function f0 (x).

2. Find the search direction; d0 = ∇f0 (x(0) );


and if d0 = 0, stop; x(0) is the maximum.

3. Search the line x = x(0) + αd0 for a maximum.

4. Find the approximation of α at which f0 (α) is maximized.



5. Update the estimate of the maximum:

x(k+1) = x(k) + αdk .

6. If kx(k+1) − x(k) k <  ( > 0), stop; x(k+1) is the maximum;


otherwise, repeat all steps.
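A compact MATLAB sketch of Procedure 7.4 is given below (an illustration only, not one of the book's programs). It uses the objective of Example 7.40; the built-in fminbnd performs the one-dimensional search for α, and the upper search bound of 10 is an assumption.

f     = @(v) -(v(1) - 3)^2 - (v(2) - 2)^2;     % objective of Example 7.40
gradf = @(v) [-2*(v(1) - 3); -2*(v(2) - 2)];   % its gradient
v = [1; 1];  tol = 1e-6;
for k = 1:100
    d = gradf(v);
    if norm(d) < tol, break, end               % stationary point reached
    % maximize f(v + alpha*d) over alpha >= 0 by minimizing its negative
    alpha = fminbnd(@(a) -f(v + a*d), 0, 10);
    v = v + alpha*d;
end
v                                              % converges to the maximizer (3, 2)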

Example 7.41 Use the steepest ascent method to approximate the solution
to the problem

maximize z = x1 + x2 − x21 − 2x1 x2 − 2x22


subject to
(x1 , x2 ) ∈ R2 ,

by starting with v0 = (1, 1).

Solution. Given

f(x1, x2) = x1 + x2 − x1² − 2x1x2 − 2x2²   and   v0 = (1, 1),

the gradient vector of the given function can be evaluated as

∇f(x1, x2) = [1 − 2x1 − 2x2, 1 − 2x1 − 4x2]ᵀ,

and at the given point v0 = (1, 1) we get

∇f(1, 1) = [1 − 2(1) − 2(1), 1 − 2(1) − 4(1)]ᵀ = [−3, −5]ᵀ.

Thus, we must choose α0 to maximize

f(α0) = f[(1, 1) + α0(−3, −5)] = f(1 − 3α0, 1 − 5α0),

which can be simplified as

f(α0) = −89α0² + 34α0 − 3.

So, solving the one-dimensional optimization problem, we find α0 from

f′(α0) = −178α0 + 34 = 0,

which gives α0 = 34/178 ≈ 0.191. Thus, our new point can be found as

v1 = v0 + α0∇f(v0) = (1, 1) + 0.191(−3, −5) = (0.427, 0.045).

Since ∇f(v1) = [0.056, −0.034]ᵀ is not yet zero, the process is repeated
from v1; the iterates approach the stationary point (1/2, 0), where the
gradient vanishes. It is important to note that this method converges
very slowly (only linearly), therefore it is mostly used to provide a good
initial guess for other, faster iterative methods. •
iterative methods. •

Example 7.42 Use the steepest descent method to approximate the solu-
tion to the following problem

minimize z = 2x1² + x2² + x1 − x2 − x1x2


subject to
(x1 , x2 ) ∈ R2 ,

by starting with v0 = (1, 1).

Solution. Given

f(x1, x2) = 2x1² + x2² + x1 − x2 − x1x2   and   v0 = (1, 1),

the gradient vector of the given function can be computed as

∇f (x1 , x2 ) = [4x1 + 1 − x2 , 2x2 − 1 − x1 ]T ,

and its value at the given point v0 = (1, 1) is

∇f (1, 1) = [4, 0]T .



The new point is defined as

v1 = v0 + α0 d0 , where d0 = −∇f (1, 1) = [−4, 0]T .

Thus, we must choose α0 to minimize

f (α0 ) = f [(1, 1) + α0 (−4, 0)] = f (1 − 4α0 , 1),

and simplifying it gives

f(α0) = 2(1 − 4α0)² + (1)² + (1 − 4α0) − 1 − (1 − 4α0)(1) = 32α0² − 16α0 + 2.

Thus, solving the one-dimensional optimization problem for finding the


value of α0 , we do the following:

f 0 (α0 ) = 0 = 64α0 − 16,

which gives
16 1
64α0 = 16, or α0 = = .
64 4
Notice that

f″(α0) = f″(1/4) = 64 > 0,

so α0 = 1/4 is the minimizing point.
Thus, our new point can be found as
v1 = v0 + α0 d0 = (1, 1) + (1/4)(−4, 0) = (0, 1)
and
∇f (v1 ) = ∇f (0, 1) = [0, 1]T ,
which gives
d1 = −∇f (v1 ) = −∇f (0, 1) = [0, −1]T .
Now we find α1 to minimize

f (α1 ) = f [(0, 1) + α1 (0, −1)] = f (0, 1 − α1 ),

and it gives

f(α1) = 0 + (1 − α1)² + 0 − (1 − α1) = α1² − α1.

Again, solving the one-dimensional optimization problem for finding the


value of α1 , we use the equation

f 0 (α1 ) = 0 = 2α1 − 1,

which gives α1 = 1/2.
So the new point can be found as

v2 = v1 + α1 d1 = (0, 1) + (1/2)(0, −1) = (0, 1/2)

and

∇f(v2) = ∇f(0, 1/2) = [1/2, 0]ᵀ,

which gives

d2 = −∇f(v2) = −∇f(0, 1/2) = [−1/2, 0]ᵀ.

Similarly, we have the other iterations as follows:

α2 = 1/4,   v3 = (−1/8, 1/2),   ∇f(v3) = [0, 1/8]ᵀ,

α3 = 1/2,   v4 = (−1/8, 7/16),   ∇f(v4) = [1/16, 0]ᵀ.

Since ∇f(v4) ≈ 0, the process can be terminated at this point. The
approximate minimum point is given by v4 = (−1/8, 7/16). Notice that the
gradients at the points v3 and v4 are orthogonal:

[0, 1/8] · [1/16, 0]ᵀ = 0. •

7.7.2 Newton’s Method


Newton’s method can also be used for multidimensional maximization or
minimization. This form of Newton’s method can be obtained by using
Taylor’s series of several variables as follows:
f(x) = f(x0) + ∇f(x0)ᵀ(x − x0) + (1/2)(x − x0)ᵀ H(x0)(x − x0) + · · · ,   (7.43)

where H(x0) is called the Hessian matrix or, simply, the Hessian of f(x).
Take x0 = x∗ (for example, a minimum of f ) and, ignoring the higher-order
terms, we get
f(x) ≈ f(x∗) + ∇f(x∗)ᵀ(x − x∗) + (1/2)(x − x∗)ᵀ H(x∗)(x − x∗).
Since ∇f (x∗ ) = 0 because x∗ is minimum of f (x),
f(x) ≈ f(x∗) + (1/2)(x − x∗)ᵀ H(x∗)(x − x∗).
Note that x∗ is the local minimum value of f (x∗ ), so

(x − x∗ )T H(x∗ )(x − x∗ ) ≥ 0,

at least for x near x∗ ; if the minimum is strict local minimum, then

(x − x∗ )T H(x∗ )(x − x∗ ) > 0, for x 6= x∗ ,

showing that H is positive-definite.

From (7.43), we have ∇f (x) = 0, which gives

0 ≈ 0 + ∇f (x0 )T + (x − x0 )H(x0 ).

Also, it can be written as

(x − x0 ) ≈ −H −1 (x0 )∇f (x0 )T

or
x ≈ x0 − H −1 (x0 )∇f (x0 )T ,

which is a better approximation of x∗ . Hence, Newton’s method for the


extremum value of f (x) (of several variables) is

xk+1 = xk − H −1 (xk )∇f (xk )T , k ≥ 0. (7.44)

Note that if H is positive-definite, then it is nonsingular and the inverse of


it exists. For example, if a given function is of two variables, then (7.44)
can be written as

[ xk+1 ]   [ xk ]   [ ∂²f/∂x²    ∂²f/∂x∂y ]⁻¹ [ ∂f/∂x ]
[ yk+1 ] = [ yk ] − [ ∂²f/∂y∂x   ∂²f/∂y²  ]   [ ∂f/∂y ]          (7.45)

or, for three variables,

[ xk+1 ]   [ xk ]   [ ∂²f/∂x²    ∂²f/∂x∂y   ∂²f/∂x∂z ]⁻¹ [ ∂f/∂x ]
[ yk+1 ] = [ yk ] − [ ∂²f/∂y∂x   ∂²f/∂y²    ∂²f/∂y∂z ]   [ ∂f/∂y ]          (7.46)
[ zk+1 ]   [ zk ]   [ ∂²f/∂z∂x   ∂²f/∂z∂y   ∂²f/∂z²  ]   [ ∂f/∂z ]

and so on. Note that in both formulas, (7.45) and (7.46), the Hessian
matrix and gradient vector on the right-hand side are evaluated at (x, y) =
(xk , yk ) and (x, y, z) = (xk , yk , zk ), respectively.

Example 7.43 Use Newton’s method to find the local minimum of the
given function

f(x, y, z) = x + 2z + yz − x² − y² − z²,

taking the starting values (x0 , y0 , z0 )T = (1, 1, 1)T .

Solution. First, we compute the partial derivatives of the given function


as

∂f/∂x = 1 − 2x
∂f/∂y = z − 2y
∂f/∂z = 2 + y − 2z,
so the gradient of the function can be written as

∇f (x, y, z) = [1 − 2x, z − 2y, 2 + y − 2z]T .

Also, the second partial derivatives are

∂²f/∂x² = −2,   ∂²f/∂y² = −2,   ∂²f/∂z² = −2,

∂²f/∂x∂y = ∂²f/∂y∂x = 0,
∂²f/∂x∂z = ∂²f/∂z∂x = 0,
∂²f/∂y∂z = ∂²f/∂z∂y = 1,

so the Hessian of f is
 
−2 0 0
H(x, y, z) =  0 −2 1 .
0 1 −2

Thus, Newton's method for the given function is

[ xk+1 ]   [ xk ]   [ −2  0  0 ]⁻¹ [ 1 − 2xk      ]
[ yk+1 ] = [ yk ] − [  0 −2  1 ]   [ zk − 2yk     ].
[ zk+1 ]   [ zk ]   [  0  1 −2 ]   [ 2 + yk − 2zk ]
Starting with the initial approximation (x0, y0, z0) = (1, 1, 1) and k = 0 in
the above formula, we have

[ x1 ]   [ 1 ]   [ −2  0  0 ]⁻¹ [ −1 ]
[ y1 ] = [ 1 ] − [  0 −2  1 ]   [ −1 ].
[ z1 ]   [ 1 ]   [  0  1 −2 ]   [  1 ]
Since the inverse of the Hessian matrix is

H⁻¹ = [ −1/2    0      0
           0   −2/3  −1/3
           0   −1/3  −2/3 ],
using this value we have the first iteration as

[ x1 ]   [ 1 ]   [ −1/2    0     0  ] [ −1 ]   [ 1/2 ]
[ y1 ] = [ 1 ] − [   0   −2/3  −1/3 ] [ −1 ] = [ 2/3 ].
[ z1 ]   [ 1 ]   [   0   −1/3  −2/3 ] [  1 ]   [ 4/3 ]

The norm of the gradient vector of the function at the new approximation is

k∇f(x1, y1, z1)k = k[1 − 2(1/2), 4/3 − 2(2/3), 2 + 2/3 − 2(4/3)]ᵀk = k[0, 0, 0]ᵀk = 0.
Since the gradient already vanishes at (x1, y1, z1) = (1/2, 2/3, 4/3), no
further iterations are needed: the given function is quadratic (its Hessian
is constant), so Newton's method reaches the stationary point in a single
step. The optimal solution can also be computed analytically from
∂f/∂x = 1 − 2x = 0
∂f/∂y = z − 2y = 0
∂f/∂z = 2 + y − 2z = 0,

and solving the above system gives


 
1 2 4
(x̄, ȳ, z̄) = , , .
2 3 3
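Formula (7.44) for this example can also be carried out in a few lines of MATLAB; the sketch below (not one of the book's programs) enters the gradient and the constant Hessian computed above.

gradf = @(v) [1 - 2*v(1); v(3) - 2*v(2); 2 + v(2) - 2*v(3)];
H = [-2 0 0; 0 -2 1; 0 1 -2];                  % constant Hessian of f
v = [1; 1; 1];                                 % starting values
for k = 1:10
    v = v - H \ gradf(v);                      % Newton step
    if norm(gradf(v)) < 1e-10, break, end
end
v                                              % returns (1/2, 2/3, 4/3) after a single step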

7.8 Constrained Optimization


Here, we will discuss an NLP problem that consists of both an objective
function and constraints. The uniqueness of an optimal solution of the
given NLP problem depends on the nature of both the objective function
and the constraints. If the given objective function is concave, and the con-
straint set forms a convex region, then there will be only one maximization
solution to the problem, and any stationary point must be a global max-
imization solution. But if the given objective function is convex and the
constraint set also forms a convex region, then any stationary point will
be a global minimum solution of the given NLP problem.

7.8.1 Lagrange Multipliers


Here, we will discuss a general, rather powerful method to maximize or
minimize a function with one or more constraints. This method is due to
Lagrange and is called the method of Lagrange multipliers.

This method can be used to solve the NLP problem in which all the
constraints are equality constraints. We consider an NLP problem of the
following type:
maximize (or minimize) z = f (x1 , x2 , . . . , xn )
subject to

g1 (x1 , x2 , . . . , xn ) = 0
g2 (x1 , x2 , . . . , xn ) = 0
..
. (7.47)
gm (x1 , x2 , . . . , xn ) = 0.

To solve problem (7.47), we associate a multiplier λi , for i = 1, 2, . . . , m,


with the ith constraints in (7.47) and form the Lagrangian as follows:
L(x1, . . . , xn, λ1, . . . , λm) = f(x1, . . . , xn) + Σ_{i=1}^{m} λi [bi − gi(x1, . . . , xn)].   (7.48)

Then we attempt to find an optimal point (x¯1 , . . . , x¯n , λ¯1 , . . . , λ¯m ) that
maximizes (or minimizes) L(x1 , . . . , xn , λ1 , . . . , λm ). If (x¯1 , . . . , x¯n , λ¯1 , . . . ,
λ¯m ) maximizes the Lagrangian L, then at (x¯1 , . . . , x¯n , λ¯1 , . . . , λ¯m ) we have
$$\frac{\partial L}{\partial \lambda_i} = b_i - g_i(x_1, \ldots, x_n) = 0,$$
where $\partial L/\partial \lambda_i$ is the partial derivative of $L$ with respect to $\lambda_i$.

This shows that $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$ will satisfy the constraints in (7.47). To show that $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$ solves (7.47), let $(x'_1, x'_2, \ldots, x'_n)$ be any point in (7.47)'s feasible region. Since $(\bar{x}_1, \ldots, \bar{x}_n, \bar{\lambda}_1, \ldots, \bar{\lambda}_m)$ maximizes $L$, for any numbers $\lambda'_1, \lambda'_2, \ldots, \lambda'_m$ we have
$$L(\bar{x}_1, \ldots, \bar{x}_n, \bar{\lambda}_1, \ldots, \bar{\lambda}_m) \ge L(x'_1, \ldots, x'_n, \lambda'_1, \ldots, \lambda'_m). \qquad (7.49)$$
Since $(\bar{x}_1, \ldots, \bar{x}_n)$ and $(x'_1, \ldots, x'_n)$ are both feasible in (7.47), the terms in (7.48) involving the $\lambda$s are all zero, and (7.49) becomes
$$f(\bar{x}_1, \ldots, \bar{x}_n) \ge f(x'_1, \ldots, x'_n). \qquad (7.50)$$

Thus, $(\bar{x}_1, \ldots, \bar{x}_n)$ does solve problem (7.47). In short, if $(\bar{x}_1, \ldots, \bar{x}_n, \bar{\lambda}_1, \ldots, \bar{\lambda}_m)$ solves the unconstrained maximization problem

maximize L(x1 , . . . , xn , λ1 , . . . , λm ), (7.51)

then $(\bar{x}_1, \ldots, \bar{x}_n)$ solves (7.47). We know that for $(\bar{x}_1, \ldots, \bar{x}_n, \bar{\lambda}_1, \ldots, \bar{\lambda}_m)$ to solve (7.51), it is necessary that at $(\bar{x}_1, \ldots, \bar{x}_n, \bar{\lambda}_1, \ldots, \bar{\lambda}_m)$
$$\frac{\partial L}{\partial x_1} = \cdots = \frac{\partial L}{\partial x_n} = \frac{\partial L}{\partial \lambda_1} = \cdots = \frac{\partial L}{\partial \lambda_m} = 0. \qquad (7.52)$$
The following theorems give conditions under which any point $(\bar{x}_1, \ldots, \bar{x}_n, \bar{\lambda}_1, \ldots, \bar{\lambda}_m)$ satisfying (7.52) yields an optimal solution $(\bar{x}_1, \ldots, \bar{x}_n)$ to (7.47).

Theorem 7.28 Suppose that NLP problem (7.47) is a maximization problem. If $f(x_1, x_2, \ldots, x_n)$ is a concave function and each $g_i(x_1, x_2, \ldots, x_n)$ for $i = 1, 2, \ldots, m$ is a linear function, then any point $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n, \bar{\lambda}_1, \bar{\lambda}_2, \ldots, \bar{\lambda}_m)$ satisfying (7.52) will yield an optimal solution $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$ to (7.47). •

Theorem 7.29 Suppose that the NLP problem (7.47) is a minimization problem. If $f(x_1, x_2, \ldots, x_n)$ is a convex function and each $g_i(x_1, x_2, \ldots, x_n)$ for $i = 1, 2, \ldots, m$ is a linear function, then any point $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n, \bar{\lambda}_1, \bar{\lambda}_2, \ldots, \bar{\lambda}_m)$ satisfying (7.52) will yield an optimal solution $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$ to (7.47). •

Example 7.44 A company is planning to spend $10, 000 advertising its


product. It costs $3, 000 per minute to advertise on the internet, $2, 000
per minute by television, and $1, 000 per minute on radio. If the firm buys
x1 minutes of internet advertising, x2 minutes of television advertising, and
x3 minutes of radio advertising, then its revenue in thousands of dollars is
given by
f (x1 , x2 , x3 ) = x1 + x2 + x3 − x21 − x22 − x23 .
How can the firm maximize its revenue?

Solution. Given the following NLP problem:

maximize w = f (x1 , x2 , x3 ) = x1 + x2 + x3 − x21 − x22 − x23


subject to

3, 000x1 + 2, 000x2 + 1, 000x3 = 10, 000,

the Lagrangian L(x1 , x2 , x3 ) is defined as

L(x1 , x2 , x3 ) = x1 + x2 + x3 − x21 − x22 − x23 + λ(3x1 + 2x2 + x3 − 10),

and we set
$$\frac{\partial L}{\partial x_1} = 1 - 2x_1 + 3\lambda = 0$$
$$\frac{\partial L}{\partial x_2} = 1 - 2x_2 + 2\lambda = 0$$
$$\frac{\partial L}{\partial x_3} = 1 - 2x_3 + \lambda = 0$$
$$\frac{\partial L}{\partial \lambda} = 3x_1 + 2x_2 + x_3 - 10 = 0.$$
From the first equation of the above system, $1 - 2x_1 + 3\lambda = 0$, we get
$$x_1 = \frac{1 + 3\lambda}{2}.$$
The second equation, $1 - 2x_2 + 2\lambda = 0$, gives
$$x_2 = \frac{1 + 2\lambda}{2}.$$
The third equation, $1 - 2x_3 + \lambda = 0$, gives
$$x_3 = \frac{1 + \lambda}{2}.$$
Finally, the last equation of the system is simply the given constraint $3x_1 + 2x_2 + x_3 = 10$, and using the values of $x_1$, $x_2$, and $x_3$ in this equation, we get
$$3\left(\frac{1 + 3\lambda}{2}\right) + 2\left(\frac{1 + 2\lambda}{2}\right) + \frac{1 + \lambda}{2} = 10.$$
Simplifying this expression, we get $14\lambda = 14$, which gives $\lambda = 1$.

Using this value of $\lambda = 1$, we obtain
$$x_1 = \frac{1 + 3(1)}{2} = 2.0, \qquad x_2 = \frac{1 + 2(1)}{2} = 1.5, \qquad x_3 = \frac{1 + 1}{2} = 1.0.$$
Thus, we get
$$\bar{x} = [2.0, 1.5, 1.0]^T.$$
Now we compute the Hessian matrix of the function, which can help us show that the given function is concave. The Hessian for the given function is
$$H(2, 1.5, 1) = \begin{bmatrix} -2 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -2 \end{bmatrix}.$$
Since the first-order principal minors are simply the diagonal elements of the Hessian,
$$H_1 = -2, \qquad H_1 = -2, \qquad H_1 = -2,$$
which are all negative. To find the second-order principal minors, we take the determinants of the matrices
$$\begin{bmatrix} -2 & 0 \\ 0 & -2 \end{bmatrix}, \qquad \begin{bmatrix} -2 & 0 \\ 0 & -2 \end{bmatrix}, \qquad \begin{bmatrix} -2 & 0 \\ 0 & -2 \end{bmatrix},$$
which are obtained by deleting row 1 and column 1, row 2 and column 2, and row 3 and column 3 of the Hessian matrix, respectively.

So the second-order principal minors are
$$H_2 = 4, \qquad H_2 = 4, \qquad H_2 = 4,$$
and all are positive. The third-order principal minor is simply the determinant of the Hessian itself, which can be obtained by expanding along the cofactors of row 1:
$$H_3 = -2[(-2)(-2) - 0] - 0 + 0 = -8,$$

which is negative. So by Theorem 7.19 the given function is concave. Also, since the given constraint is linear, Theorem 7.28 shows that the Lagrange multiplier method does yield the optimal solution to the given NLP problem. Thus, the firm should purchase 2 minutes of internet time, 1.5 minutes of television time, and 1 minute of radio time. Since $\lambda = 1$, spending an extra $\delta$ (thousand) dollars (for small $\delta$) would increase the firm's revenue by approximately $1\delta$ (thousand). •
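The stationary point of the Lagrangian in this example can also be found with MATLAB's Symbolic Math Toolbox. The following commands are only a minimal sketch (they are not taken from the text, and the variable names are ours):

% Sketch: solving the Lagrange conditions of Example 7.44 symbolically
syms x1 x2 x3 lambda
L   = x1 + x2 + x3 - x1^2 - x2^2 - x3^2 + lambda*(3*x1 + 2*x2 + x3 - 10);
eqs = [diff(L,x1)==0, diff(L,x2)==0, diff(L,x3)==0, diff(L,lambda)==0];
S   = solve(eqs, [x1 x2 x3 lambda]);
[S.x1, S.x2, S.x3, S.lambda]      % returns [2, 3/2, 1, 1]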

Example 7.45 Find the maximum and minimum of the function


$$f(x_1, x_2, x_3) = (x_1 - 1)^2 + (x_2 - 2)^2 + (x_3 - 2)^2$$
subject to the constraint
$$x_1^2 + x_2^2 + x_3^2 = 36$$
by using Lagrange multipliers.

Solution. Given
$$f(x_1, x_2, x_3) = (x_1 - 1)^2 + (x_2 - 2)^2 + (x_3 - 2)^2,$$
with the constraint
$$g_1(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2 - 36,$$
the Lagrangian $L(x_1, x_2, x_3)$ is defined as
$$L(x_1, x_2, x_3) = (x_1 - 1)^2 + (x_2 - 2)^2 + (x_3 - 2)^2 + \lambda(x_1^2 + x_2^2 + x_3^2 - 36),$$
which leads to the equations
$$\frac{\partial L}{\partial x_1} = 2(x_1 - 1) + 2x_1\lambda = 0$$
$$\frac{\partial L}{\partial x_2} = 2(x_2 - 2) + 2x_2\lambda = 0$$
$$\frac{\partial L}{\partial x_3} = 2(x_3 - 2) + 2x_3\lambda = 0$$
$$\frac{\partial L}{\partial \lambda} = x_1^2 + x_2^2 + x_3^2 - 36 = 0.$$

Assuming that $x_1 \ne 0$, $x_2 \ne 0$, and $x_3 \ne 0$, the first three equations give
$$\frac{1 - x_1}{x_1} = \lambda, \qquad \frac{2 - x_2}{x_2} = \lambda, \qquad \frac{2 - x_3}{x_3} = \lambda.$$
The first two of these imply that
$$\frac{1 - x_1}{x_1} = \frac{2 - x_2}{x_2},$$
from which it follows that
$$x_2 = 2x_1.$$
Similarly, the first and third equations imply that
$$x_3 = 2x_1.$$
Putting the values of $x_2$ and $x_3$ in the constraint equation
$$x_1^2 + x_2^2 + x_3^2 - 36 = 0,$$
we obtain
$$9x_1^2 = 36, \qquad \text{so} \qquad x_1 = \pm 2.$$
Using these values of $x_1$, we get the two points
$$(2, 4, 4) \qquad \text{and} \qquad (-2, -4, -4).$$

Since f (2, 4, 4) = 9 and f (−2, −4, −4) = 81, the function has a mini-
mum value at the point (2, 4, 4) and the maximum value at the other point
(−2, −4, −4). •

The above results can be reproduced using the following MATLAB


commands:

>> syms x1 x2 x3 lambda
>> f = (x1-1)^2 + (x2-2)^2 + (x3-2)^2;
>> g = x1^2 + x2^2 + x3^2 - 36;
>> fx1 = diff(f,x1); fx2 = diff(f,x2); fx3 = diff(f,x3);
>> gx1 = diff(g,x1); gx2 = diff(g,x2); gx3 = diff(g,x3);
>> [alambda, ax1, ax2, ax3] = solve(fx1 - lambda*gx1, ...
       fx2 - lambda*gx2, fx3 - lambda*gx3, g)
>> T = [ax1 ax2 ax3 subs(f, {x1 x2 x3}, {ax1 ax2 ax3})];
>> double(T)
% rows of T give the Lagrange points (2,4,4) and (-2,-4,-4)
% together with the function values 9 and 81

Procedure 7.5 (The Method of Lagrange Multipliers)

1. Verify that n > m and each gi has continuous first partials.

2. Form the Lagrangian function
$$L(x, \lambda) = f(x) + \sum_{i=1}^{m} \lambda_i g_i(x).$$

3. Find all of the solutions $(\bar{x}, \bar{\lambda})$ to the following system of nonlinear algebraic equations:
$$\nabla L(x, \lambda) = \nabla f(x) + \sum_{i=1}^{m} \lambda_i \nabla g_i(x) = 0$$
$$\frac{\partial L(x, \lambda)}{\partial \lambda_i} = g_i(x) = 0.$$
These equations are called the Lagrange conditions and $(\bar{x}, \bar{\lambda})$ are the Lagrange points.

4. Examine each solution (x̄, λ̄) to see if it is a minimizing point.

Example 7.46 Minimize the function
$$z = x_1^2 + x_2^2 + x_3^2,$$
subject to the constraints
$$2x_1 + x_2 + x_3 = 2$$
$$x_1 - x_2 - 3x_3 = 4.$$

Solution. Given
$$f(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2,$$
with constraints
$$g_1(x_1, x_2, x_3) = 2x_1 + x_2 + x_3 - 2$$
$$g_2(x_1, x_2, x_3) = x_1 - x_2 - 3x_3 - 4,$$
we have $m = 2$ and $n = 3$, so $n > m$ as required, i.e., the method can be used for the given problem.

The gradients of the constraints are
$$\nabla g_1(x) = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} \qquad \text{and} \qquad \nabla g_2(x) = \begin{bmatrix} 1 \\ -1 \\ -3 \end{bmatrix},$$
so the partial derivatives $\partial g_i/\partial x_j$ are continuous functions.

Now form the Lagrangian function

L(x, λ) = x21 + x22 + x23 + λ1 (2x1 + x2 + x3 − 2) + λ2 (x1 − x2 − 3x3 − 4).

Compute the derivatives of $L(x, \lambda)$ and set them equal to zero, which gives
$$\frac{\partial L}{\partial x_1} = 2x_1 + 2\lambda_1 + \lambda_2 = 0$$
$$\frac{\partial L}{\partial x_2} = 2x_2 + \lambda_1 - \lambda_2 = 0$$
$$\frac{\partial L}{\partial x_3} = 2x_3 + \lambda_1 - 3\lambda_2 = 0$$
$$\frac{\partial L}{\partial \lambda_1} = 2x_1 + x_2 + x_3 - 2 = 0$$
$$\frac{\partial L}{\partial \lambda_2} = x_1 - x_2 - 3x_3 - 4 = 0.$$

Note that this system is linear and quite easy to solve, but in many prob-
lems the systems are nonlinear, and in such cases the Lagrange conditions
cannot be solved analytically, on account of the particular nonlinearities
they contain. While in other cases an analytical solution is possible, some
ingenuity might be needed to find it.

Thus, solving the above linear system, one can get the values of x1 , x2 ,
and x3 as follows:

$$x_1 = -\frac{2\lambda_1 + \lambda_2}{2}, \qquad x_2 = -\frac{\lambda_1 - \lambda_2}{2}, \qquad x_3 = -\frac{\lambda_1 - 3\lambda_2}{2}.$$

Putting the values of $x_1$, $x_2$, and $x_3$ in the last two equations (constraints) of the above system, we get
$$2\left(-\frac{2\lambda_1 + \lambda_2}{2}\right) + \left(-\frac{\lambda_1 - \lambda_2}{2}\right) + \left(-\frac{\lambda_1 - 3\lambda_2}{2}\right) - 2 = 0$$
$$\left(-\frac{2\lambda_1 + \lambda_2}{2}\right) - \left(-\frac{\lambda_1 - \lambda_2}{2}\right) - 3\left(-\frac{\lambda_1 - 3\lambda_2}{2}\right) - 4 = 0.$$

After simplifying, we obtain
$$3\lambda_1 - \lambda_2 + 2 = 0$$
$$-2\lambda_1 + 11\lambda_2 + 8 = 0.$$
Solving this system, we get
$$\lambda_1 = -\frac{30}{31} \qquad \text{and} \qquad \lambda_2 = -\frac{28}{31}.$$
Using these values of the multipliers, we have
$$x_1 = \frac{44}{31}, \qquad x_2 = \frac{1}{31}, \qquad x_3 = -\frac{27}{31}.$$
Thus, one solution to the Lagrange conditions is
$$\bar{x} = \left[\frac{44}{31}, \frac{1}{31}, -\frac{27}{31}\right]^T \qquad \text{and} \qquad \bar{\lambda} = \left[-\frac{30}{31}, -\frac{28}{31}\right]^T.$$
Note that this solution is unique, but in many problems it is not, and mul-
tiple solutions must be sought.

One can easily verify that this solution is a minimizing point. The gradients of the constraints,
$$\nabla g_1(x) = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} \qquad \text{and} \qquad \nabla g_2(x) = \begin{bmatrix} 1 \\ -1 \\ -3 \end{bmatrix},$$
are linearly independent vectors, so the constraint qualification holds; and since $f$ is convex and the constraints are linear, the Lagrange point found above is indeed the minimizing point. •
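Because the Lagrange conditions in this example are linear, they can also be solved directly as one linear system. The following MATLAB lines are a small illustrative sketch (they are not part of the original text):

% Lagrange conditions of Example 7.46 as a linear system in
% [x1; x2; x3; lambda1; lambda2]
A = [ 2  0  0  2  1;
      0  2  0  1 -1;
      0  0  2  1 -3;
      2  1  1  0  0;
      1 -1 -3  0  0 ];
b = [0; 0; 0; 2; 4];
sol = A \ b     % approx [1.4194; 0.0323; -0.8710; -0.9677; -0.9032]
                % i.e., [44/31; 1/31; -27/31; -30/31; -28/31]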
Some applications may involve more than two constraints. In particular,
consider the problem of finding the extremum of f (x1 , x2 , x3 , x4 ) subject to
the constraints gi (x) = 0 (for i = 1, 2, 3). If f (x) has an extremum subject
to these constraints, then the following conditions must be satisfied for
some real numbers λ1 , λ2 , and λ3 :
∇f + λ1 ∇g1 + λ2 ∇g2 + λ3 ∇g3 = 0.
By equating components and using constraints, we obtain a system of seven
equations in the seven unknowns x1 , x2 , x3 , x4 , λ1 , λ2 , λ3 . This method can
also be extended to functions of more than 4 variables and to more than 3
constraints.

Theorem 7.30 (Lagrange Multipliers Theorem)


Given an NLP problem:

minimize z = f (x1 , x2 , . . . , xn )
subject to
g1 (x1 , x2 , . . . , xn ) = 0
g2 (x1 , x2 , . . . , xn ) = 0
..
. (7.53)
gm (x1 , x2 , . . . , xn ) = 0.

If $\bar{x}$ is a local minimizing point for the NLP problem (7.53), $n > m$ (there are more variables than constraints), the constraints $g_i$ ($i = 1, 2, \ldots, m$) have continuous first derivatives with respect to $x_j$ ($j = 1, 2, \ldots, n$), and the gradients $\nabla g_i(\bar{x})$ are linearly independent vectors, then there is a vector $\lambda = [\lambda_1, \lambda_2, \ldots, \lambda_m]^T$ such that
$$\nabla f(\bar{x}) + \sum_{i=1}^{m} \lambda_i \nabla g_i(\bar{x}) = 0, \qquad (7.54)$$
where the numbers $\lambda_i$ are called Lagrange multipliers. •

Note that, in general, a Lagrange multiplier for an equality-constrained problem can be of either sign. The requirement that the $\nabla g_i(\bar{x})$ be linearly independent is called a constraint qualification.

The above Theorem 7.30 gives a condition that must necessarily be satisfied by any minimizing point $\bar{x}$, namely,
$$\nabla f(\bar{x}) + \sum_{i=1}^{m} \lambda_i \nabla g_i(\bar{x}) = 0, \qquad \text{for some } \lambda \in \mathbb{R}^m.$$
For fixed $\bar{x}$, this vector equation is simply a system of linear equations in the variables $\lambda_i$ ($i = 1, 2, \ldots, m$). If the assumptions of Theorem 7.30 hold and there is no $\lambda$ such that the preceding gradient equation holds at $\bar{x}$, then the point $\bar{x}$ cannot be a minimizing point.

Example 7.47 Consider the following problem:
minimize $z = f(x_1, x_2, x_3) = 20 + 2x_1 + 2x_2 + x_3^2$
subject to
$$g_1(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2 - 11 = 0$$
$$g_2(x_1, x_2, x_3) = x_1 + x_2 + x_3 - 3 = 0.$$

Solution. Given the objective function
$$f(x_1, x_2, x_3) = 20 + 2x_1 + 2x_2 + x_3^2,$$
and the constraints
$$g_1(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2 - 11 = 0$$
$$g_2(x_1, x_2, x_3) = x_1 + x_2 + x_3 - 3 = 0,$$
the Lagrangian $L(x_1, x_2, x_3)$ is defined as
$$L(x_1, x_2, x_3) = 20 + 2x_1 + 2x_2 + x_3^2 + \lambda(x_1^2 + x_2^2 + x_3^2 - 11) + \mu(x_1 + x_2 + x_3 - 3).$$

Now calculate the gradients of L, f, g1 , and g2 , and then set

∇L(x) = ∇f (x) + λ∇g1 (x) + µ∇g2 (x) = 0,

which gives
$$\frac{\partial L}{\partial x_1} = 2 + 2\lambda x_1 + \mu = 0$$
$$\frac{\partial L}{\partial x_2} = 2 + 2\lambda x_2 + \mu = 0$$
$$\frac{\partial L}{\partial x_3} = 2x_3 + 2\lambda x_3 + \mu = 0.$$
Writing the above system in matrix form, we have
$$\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ 2x_3 \end{bmatrix} + \lambda \begin{bmatrix} 2x_1 \\ 2x_2 \\ 2x_3 \end{bmatrix} + \mu \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$
Suppose we test the feasible point $\bar{x} = [-1, 3, 1]^T$; then
$$\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ 2 \end{bmatrix} + \lambda \begin{bmatrix} -2 \\ 6 \\ 2 \end{bmatrix} + \mu \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$
or
$$0 = 2 - 2\lambda + \mu$$
$$0 = 2 + 6\lambda + \mu$$
$$0 = 2 + 2\lambda + \mu.$$
Subtracting the third equation from the first gives $-4\lambda = 0$, so $\lambda = 0$, and then either of these equations gives $\mu = -2$. Substituting $\lambda = 0$ and $\mu = -2$ into the second equation gives $2 + 0 - 2 = 0$, so it is also satisfied. Thus $(\lambda, \mu) = (0, -2)$ solves the system, and $\bar{x} = [-1, 3, 1]^T$ satisfies the necessary conditions, so it remains a candidate minimizing point. Remember that these equations are only necessary conditions for a minimizing point: if a solution $(\lambda, \mu)$ exists for a given $\bar{x}$, that point may or may not be a minimizing point, but if no solution existed, $\bar{x}$ could not be a minimizing point.

7.8.2 The Kuhn–Tucker Conditions


The KT conditions play an important role in the general theory of non-
linear programming, and in particular, they are the conditions that must
be used in solving problems with inequality constraints. In the Lagrange
method, we found that Lagrangian multipliers could be utilized in solving
equality-constrained optimization problems. Kuhn and Tucker have ex-
tended this theory to include the general NLP problem with both equality
and inequality constraints. In the Lagrange method we used the gradient
condition and the original equality constraint equations to find the station-
ary points of an equality-constrained problem. In a similar way, here we

can use the gradient condition, the orthogonality condition, and the orig-
inal constraint inequalities to find the stationary points for an inequality-
constrained problem.

The KT conditions are first-order necessary conditions for a general


constrained minimization problem written in the standard form

minimize z = f (x)

subject to

gi (x) ≤ 0, i = 1, 2, . . . , m
hi (x) = 0, i = 1, 2, . . . , p

In the above standard form, the objective function is of the minimization type, all constraint right-hand sides are zero, and the inequality constraints are of the less-than type. Before applying the KT conditions, it is necessary to ensure that the given problem has been converted to this standard form.

The KT conditions represent a set of equations that must be satisfied


for all local minimizing points of a considered minimization problem. Since
they are only necessary conditions, the points that satisfy these conditions
are only candidates for being a local minimum and are usually known as
KT points. Then one can also check sufficient conditions to determine if a
given KT point actually is a local minimum or not.

It is important to note that, in order to apply the results of this section, all the NLP constraints must be ≤ constraints. A constraint of the form

h(x1 , x2 , . . . , xn ) ≥ 0

must be written as
−h(x1 , x2 , . . . , xn ) ≤ 0.
Also, a constraint of the form

h(x1 , x2 , . . . , xn ) = 0

can be replaced by
h(x1 , x2 , . . . , xn ) ≤ 0
and
−h(x1 , x2 , . . . , xn ) ≤ 0.
For example,
x1 + 3x2 = 4
can be replaced by
x1 + 3x2 ≤ 4
and
−x1 − 3x2 ≤ −4.

7.8.3 Karush–Kuhn–Tucker Conditions


1. Form the Lagrangian function
$$L(x, \lambda) = f(x) + \sum_{i=1}^{m} \lambda_i g_i(x).$$

2. Find all of the solutions $(\bar{x}, \bar{\lambda})$ to the following system of nonlinear algebraic equations and inequalities:
$$\frac{\partial L}{\partial x_j} = 0, \quad j = 1, 2, \ldots, n \qquad \text{(gradient condition)}$$
$$g_i(x) \le 0, \quad i = 1, 2, \ldots, m \qquad \text{(feasibility condition)}$$
$$\lambda_i g_i(x) = 0, \quad i = 1, 2, \ldots, m \qquad \text{(orthogonality condition)}$$
$$\lambda_i \ge 0, \quad i = 1, 2, \ldots, m \qquad \text{(nonnegativity condition)}$$

3. If the functions gi (x) are all convex, the point (x̄) is the global min-
imizing point. Otherwise, examine each solution (x̄, λ̄) to see if (x̄)
is a minimizing point.

Note that for each inequality constraint, we need to consider the following
two possibilities.

Inactive Inequality Constraints: such constraints for which g(x̄) < 0.


These constraints do not determine the optimum and hence are not needed
in developing optimality conditions.

Active Inequality Constraints: such constraints for which g(x̄) = 0.

Now, we discuss necessary and sufficient conditions for x̄ = (x¯1 , x¯2 , . . . , x¯n )
to be an optimal solution for the following NLP problem:
maximize (or minimize) z = f (x1 , x2 , . . . , xn )
subject to
g1 (x1 , x2 , . . . , xn ) ≤ 0 (7.55)
g2 (x1 , x2 , . . . , xn ) ≤ 0 (7.56)
..
.
gm (x1 , x2 , . . . , xn ) ≤ 0
The following theorems give conditions (KT conditions) that are necessary
for a point
x̄ = (x¯1 , x¯2 , . . . , x¯n )
to solve (7.55).
Theorem 7.31 (Necessary Conditions, Maximization Problem)

Suppose (7.55) is a maximum problem. If x̄ = (x¯1 , x¯2 , . . . , x¯n ) is an optimal


solution to (7.55), then x̄ = (x¯1 , x¯2 , . . . , x¯n ) must satisfy the m constraints
in (7.55), and there exist multipliers λ̄ = (λ¯1 , λ¯2 , . . . , λ¯m ) satisfying

$$\frac{\partial f(\bar{x})}{\partial x_j} - \sum_{i=1}^{m} \bar{\lambda}_i \frac{\partial g_i(\bar{x})}{\partial x_j} = 0, \qquad j = 1, 2, \ldots, n \qquad (7.57)$$
$$\bar{\lambda}_i [g_i(\bar{x})] = 0, \qquad i = 1, 2, \ldots, m \qquad (7.58)$$
$$\bar{\lambda}_i \ge 0, \qquad i = 1, 2, \ldots, m. \qquad (7.59)$$


Theorem 7.32 (Necessary Conditions, Minimization Problem)

Suppose (7.55) is a minimization problem. If x̄ = (x¯1 , x¯2 , . . . , x¯n ) is an


optimal solution to (7.55), then x̄ = (x¯1 , x¯2 , . . . , x¯n ) must satisfy the m
constraints in (7.55), and there exist multipliers λ̄ = (λ¯1 , λ¯2 , . . . , λ¯m ) sat-
isfying
$$\frac{\partial f(\bar{x})}{\partial x_j} + \sum_{i=1}^{m} \bar{\lambda}_i \frac{\partial g_i(\bar{x})}{\partial x_j} = 0, \qquad j = 1, 2, \ldots, n \qquad (7.60)$$
$$\bar{\lambda}_i [g_i(\bar{x})] = 0, \qquad i = 1, 2, \ldots, m \qquad (7.61)$$
$$\bar{\lambda}_i \ge 0, \qquad i = 1, 2, \ldots, m. \qquad (7.62)$$

In many situations, the KT conditions are applied to the NLP problem in


which the variables must be nonnegative. For example, the KT-conditions
can be used to find the optimal solution to the following NLP problem:

maximize (or minimize) z = f (x1 , x2 , . . . , xn )


subject to
g1 (x1 , x2 , . . . , xn ) ≤ 0
g2 (x1 , x2 , . . . , xn ) ≤ 0
..
.
gm (x1 , x2 , . . . , xn ) ≤ 0
−x1 ≤ 0
−x2 ≤ 0
..
.
−xn ≤ 0.

If we associate multipliers µ1 , µ2 , . . . , µn with these nonnegative constraints,


Theorem 7.31 and Theorem 7.32 can be written as follows.

Theorem 7.33 (Necessary Conditions, Maximization Problem)

Consider a maximum problem:


maximize (or minimize) z = f (x1 , x2 , . . . , xn )
subject to
g1 (x1 , x2 , . . . , xn ) ≤ 0
g2 (x1 , x2 , . . . , xn ) ≤ 0
..
.
gm (x1 , x2 , . . . , xn ) ≤ 0 (7.63)
−x1 ≤ 0
−x2 ≤ 0
..
.
−xn ≤ 0.
If $\bar{x} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$ is an optimal solution to (7.63), then $\bar{x}$ must satisfy the $m$ constraints in (7.63), and there exist multipliers $\bar{\lambda} = (\bar{\lambda}_1, \ldots, \bar{\lambda}_m, \bar{\mu}_1, \ldots, \bar{\mu}_n)$ satisfying
$$\frac{\partial f(\bar{x})}{\partial x_j} - \sum_{i=1}^{m} \bar{\lambda}_i \frac{\partial g_i(\bar{x})}{\partial x_j} + \bar{\mu}_j = 0, \qquad j = 1, 2, \ldots, n \qquad (7.64)$$
$$\bar{\lambda}_i [g_i(\bar{x})] = 0, \qquad i = 1, 2, \ldots, m \qquad (7.65)$$
$$\left[\frac{\partial f(\bar{x})}{\partial x_j} - \sum_{i=1}^{m} \bar{\lambda}_i \frac{\partial g_i(\bar{x})}{\partial x_j}\right] \bar{x}_j = 0, \qquad j = 1, 2, \ldots, n \qquad (7.66)$$
$$\bar{\lambda}_i \ge 0, \qquad i = 1, 2, \ldots, m \qquad (7.67)$$
$$\bar{\mu}_j \ge 0, \qquad j = 1, 2, \ldots, n. \qquad (7.68)$$
Since $\bar{\mu}_j \ge 0$, the first equation in the above system is equivalent to
$$\frac{\partial f(\bar{x})}{\partial x_j} - \sum_{i=1}^{m} \bar{\lambda}_i \frac{\partial g_i(\bar{x})}{\partial x_j} \le 0, \qquad j = 1, 2, \ldots, n.$$

Thus, the KT conditions for the above maximization problem with nonnegativity constraints may be written as
$$\frac{\partial f(\bar{x})}{\partial x_j} - \sum_{i=1}^{m} \bar{\lambda}_i \frac{\partial g_i(\bar{x})}{\partial x_j} \le 0, \qquad j = 1, 2, \ldots, n \qquad (7.69)$$
$$\bar{\lambda}_i [g_i(\bar{x})] = 0, \qquad i = 1, 2, \ldots, m \qquad (7.70)$$
$$\left[\frac{\partial f(\bar{x})}{\partial x_j} - \sum_{i=1}^{m} \bar{\lambda}_i \frac{\partial g_i(\bar{x})}{\partial x_j}\right] \bar{x}_j = 0, \qquad j = 1, 2, \ldots, n \qquad (7.71)$$
$$\bar{\lambda}_i \ge 0, \qquad i = 1, 2, \ldots, m. \qquad (7.72)$$
Theorem 7.34 (Necessary Conditions, Minimization Problem)
Suppose (7.63) is a minimization problem. If $\bar{x} = (\bar{x}_1, \ldots, \bar{x}_n)$ is an optimal solution to (7.63), then $\bar{x}$ must satisfy the $m$ constraints in (7.63), and there exist multipliers $\bar{\lambda} = (\bar{\lambda}_1, \ldots, \bar{\lambda}_m, \bar{\mu}_1, \ldots, \bar{\mu}_n)$ satisfying
$$\frac{\partial f(\bar{x})}{\partial x_j} + \sum_{i=1}^{m} \bar{\lambda}_i \frac{\partial g_i(\bar{x})}{\partial x_j} - \bar{\mu}_j = 0, \qquad j = 1, 2, \ldots, n \qquad (7.73)$$
$$\bar{\lambda}_i [g_i(\bar{x})] = 0, \qquad i = 1, 2, \ldots, m \qquad (7.74)$$
$$\left[\frac{\partial f(\bar{x})}{\partial x_j} + \sum_{i=1}^{m} \bar{\lambda}_i \frac{\partial g_i(\bar{x})}{\partial x_j}\right] \bar{x}_j = 0, \qquad j = 1, 2, \ldots, n \qquad (7.75)$$
$$\bar{\lambda}_i \ge 0, \qquad i = 1, 2, \ldots, m \qquad (7.76)$$
$$\bar{\mu}_j \ge 0, \qquad j = 1, 2, \ldots, n. \qquad (7.77)$$
Since $\bar{\mu}_j \ge 0$, the first equation in the above system is equivalent to
$$\frac{\partial f(\bar{x})}{\partial x_j} + \sum_{i=1}^{m} \bar{\lambda}_i \frac{\partial g_i(\bar{x})}{\partial x_j} \ge 0, \qquad j = 1, 2, \ldots, n.$$

Thus, the KT conditions for the above minimization problem with nonnegativity constraints may be written as
$$\frac{\partial f(\bar{x})}{\partial x_j} + \sum_{i=1}^{m} \bar{\lambda}_i \frac{\partial g_i(\bar{x})}{\partial x_j} \ge 0, \qquad j = 1, 2, \ldots, n \qquad (7.78)$$
$$\bar{\lambda}_i [g_i(\bar{x})] = 0, \qquad i = 1, 2, \ldots, m \qquad (7.79)$$
$$\left[\frac{\partial f(\bar{x})}{\partial x_j} + \sum_{i=1}^{m} \bar{\lambda}_i \frac{\partial g_i(\bar{x})}{\partial x_j}\right] \bar{x}_j = 0, \qquad j = 1, 2, \ldots, n \qquad (7.80)$$
$$\bar{\lambda}_i \ge 0, \qquad i = 1, 2, \ldots, m. \qquad (7.81)$$

Theorems 7.31, 7.32, 7.33, and 7.34 give the necessary conditions for a point $\bar{x} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$ to be an optimal solution to (7.55) and (7.63). The following theorems give the sufficient conditions for a point $\bar{x} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$ to be an optimal solution to (7.55) and (7.63).

Theorem 7.35 (Sufficient Conditions, Maximization Problem)

Suppose (7.55) is a maximization problem. If f (x1 , . . . , xn ) is a concave


function and g1 (x1 , . . . , xn ), . . . , gm (x1 , . . . , xn ) are convex functions, then
any point x̄ = (x¯1 , . . . , x¯n ) satisfying the hypothesis of Theorem 7.31 is
an optimal solution to (7.55). Also, if (7.63) is a maximization problem,
f (x1 , . . . , xn ) is a concave function, and g1 (x1 , . . . , xn ), . . . , gm (x1 , . . . , xn )
are convex functions, then any point x̄ = (x¯1 , . . . , x¯n ) satisfying the hypoth-
esis of Theorem 7.33 is an optimal solution to (7.63). •

Theorem 7.36 (Sufficient Conditions, Minimization Problem)

Suppose (7.55) is a minimization problem. If f (x1 , . . . , xn ) is a convex


function and g1 (x1 , . . . , xn ), . . . , gm (x1 , . . . , xn ) are convex functions, then
any point x̄ = (x¯1 , . . . , x¯n ) satisfying the hypothesis of Theorem 7.32 is
an optimal solution to (7.55). Also, if (7.63) is a minimization problem,
f (x1 , . . . , xn ) is a convex function, and g1 (x1 , . . . , xn ), . . . , gm (x1 , . . . , xn )
are convex functions, then any point x̄ = (x¯1 , . . . , x¯n ) satisfying the hy-
pothesis of Theorem 7.34 is an optimal solution to (7.63). •

Example 7.48 Minimize the function
$$z = (x_1 - 2)^2 + (x_2 - 3)^2 + (x_3 - 3)^2,$$
subject to the inequality constraints
$$x_1 + x_2 - x_3 \le 1$$
$$2x_1 - x_2 - 2x_3 \le 2.$$

(a) Write down the KT conditions for the problem.


(b) Find all the solutions of the KT conditions for the problem.
(c) Find all the local minimizing points.

Solution. (a) Given

f (x1 , x2 , x3 ) = (x1 − 2)2 + (x2 − 3)2 + (x3 − 3)2 ,

with constraints

g1 (x1 , x2 , x3 ) = x1 + x2 − x3 − 1 ≤ 0
g2 (x1 , x2 , x3 ) = 2x1 − x2 − 2x3 − 2 ≤ 0,

first, we form the Lagrangian function as follows:
$$L(x, \lambda) = (x_1 - 2)^2 + (x_2 - 3)^2 + (x_3 - 3)^2 + \lambda_1[x_1 + x_2 - x_3 - 1] + \lambda_2[2x_1 - x_2 - 2x_3 - 2].$$
The KT conditions are:
$$\frac{\partial L}{\partial x_1} = 2(x_1 - 2) + \lambda_1 + 2\lambda_2 = 0$$
$$\frac{\partial L}{\partial x_2} = 2(x_2 - 3) + \lambda_1 - \lambda_2 = 0$$
$$\frac{\partial L}{\partial x_3} = 2(x_3 - 3) - \lambda_1 - 2\lambda_2 = 0$$
$$g_1(x) = x_1 + x_2 - x_3 - 1 \le 0$$
$$g_2(x) = 2x_1 - x_2 - 2x_3 - 2 \le 0$$
$$\lambda_1 [x_1 + x_2 - x_3 - 1] = 0$$
$$\lambda_2 [2x_1 - x_2 - 2x_3 - 2] = 0$$
$$\lambda_1 \ge 0, \qquad \lambda_2 \ge 0.$$
(b) Consider the four possible cases:
$$\lambda_1 = 0, \ \lambda_2 = 0; \qquad \lambda_1 \ne 0, \ \lambda_2 \ne 0; \qquad \lambda_1 = 0, \ \lambda_2 \ne 0; \qquad \lambda_1 \ne 0, \ \lambda_2 = 0.$$
First Case: When $\lambda_1 = 0$ and $\lambda_2 = 0$, the gradient conditions give
$$2(x_1 - 2) = 0 \ \Rightarrow \ x_1 = 2, \qquad 2(x_2 - 3) = 0 \ \Rightarrow \ x_2 = 3, \qquad 2(x_3 - 3) = 0 \ \Rightarrow \ x_3 = 3.$$
Putting these values of $x_1$, $x_2$, and $x_3$ in the first constraint, we have
$$x_1 + x_2 - x_3 - 1 = 2 + 3 - 3 - 1 = 1 \not\le 0.$$
Hence, this case does not hold.

Second Case: When $\lambda_1 \ne 0$ and $\lambda_2 \ne 0$, the gradient conditions give
$$2(x_1 - 2) + \lambda_1 + 2\lambda_2 = 0 \ \Rightarrow \ x_1 = \tfrac12[4 - \lambda_1 - 2\lambda_2]$$
$$2(x_2 - 3) + \lambda_1 - \lambda_2 = 0 \ \Rightarrow \ x_2 = \tfrac12[6 - \lambda_1 + \lambda_2]$$
$$2(x_3 - 3) - \lambda_1 - 2\lambda_2 = 0 \ \Rightarrow \ x_3 = \tfrac12[6 + \lambda_1 + 2\lambda_2].$$
Since $\lambda_1 \ne 0$ and $\lambda_2 \ne 0$, the orthogonality conditions give
$$x_1 + x_2 - x_3 - 1 = 0$$
$$2x_1 - x_2 - 2x_3 - 2 = 0.$$
Putting the values of $x_1$, $x_2$, and $x_3$ in this system, we get
$$3\lambda_1 + 9\lambda_2 = -14$$
$$3\lambda_1 + 3\lambda_2 = 2.$$
Solving this system gives
$$\lambda_1 = \frac{10}{3} \qquad \text{and} \qquad \lambda_2 = -\frac{8}{3}.$$
But the nonnegativity condition requires $\lambda_2 \ge 0$, so this case also does not hold.

Third Case: When $\lambda_1 = 0$ and $\lambda_2 \ne 0$, the gradient conditions give
$$2(x_1 - 2) + 2\lambda_2 = 0 \ \Rightarrow \ x_1 = 2 - \lambda_2$$
$$2(x_2 - 3) - \lambda_2 = 0 \ \Rightarrow \ x_2 = \tfrac12[6 + \lambda_2]$$
$$2(x_3 - 3) - 2\lambda_2 = 0 \ \Rightarrow \ x_3 = 3 + \lambda_2.$$
Since $\lambda_2 \ne 0$, the orthogonality condition gives
$$2x_1 - x_2 - 2x_3 - 2 = 0.$$
Using the values of $x_1$, $x_2$, and $x_3$, we get
$$2(2 - \lambda_2) - \tfrac12(6 + \lambda_2) - 2(3 + \lambda_2) - 2 = 0, \qquad \lambda_2 = -\frac{14}{9} < 0.$$
So this case also does not hold.

Fourth Case: When $\lambda_1 \ne 0$ and $\lambda_2 = 0$, the gradient conditions give
$$2(x_1 - 2) + \lambda_1 = 0 \ \Rightarrow \ x_1 = \tfrac12[4 - \lambda_1]$$
$$2(x_2 - 3) + \lambda_1 = 0 \ \Rightarrow \ x_2 = \tfrac12[6 - \lambda_1]$$
$$2(x_3 - 3) - \lambda_1 = 0 \ \Rightarrow \ x_3 = \tfrac12[6 + \lambda_1].$$
Since $\lambda_1 \ne 0$, the orthogonality condition gives
$$x_1 + x_2 - x_3 - 1 = 0.$$
Using the values of $x_1$, $x_2$, and $x_3$, we get
$$\tfrac12(4 - \lambda_1) + \tfrac12(6 - \lambda_1) - \tfrac12(6 + \lambda_1) - 1 = 0, \qquad \lambda_1 = \frac{2}{3}.$$
So this case holds, and
$$\bar{x} = \left[\frac53, \frac83, \frac{10}{3}\right]^T \qquad \text{and} \qquad \bar{\lambda} = \left[\frac23, 0\right]^T$$
is the only KT point.
(c) Now we check whether the functions $f$, $g_1$, and $g_2$ are convex. First, we check the convexity of the objective function
$$f(x_1, x_2, x_3) = (x_1 - 2)^2 + (x_2 - 3)^2 + (x_3 - 3)^2:$$
$$\frac{\partial f}{\partial x_1} = 2(x_1 - 2), \qquad \frac{\partial^2 f}{\partial x_1^2} = 2$$
$$\frac{\partial f}{\partial x_2} = 2(x_2 - 3), \qquad \frac{\partial^2 f}{\partial x_2^2} = 2$$
$$\frac{\partial f}{\partial x_3} = 2(x_3 - 3), \qquad \frac{\partial^2 f}{\partial x_3^2} = 2$$
$$\frac{\partial^2 f}{\partial x_1 \partial x_2} = \frac{\partial^2 f}{\partial x_1 \partial x_3} = \frac{\partial^2 f}{\partial x_2 \partial x_3} = 0.$$

Hence, the Hessian matrix of the objective function is
$$H = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}.$$
By deleting rows (and columns) 1 and 2, or 1 and 3, or 2 and 3, of the Hessian matrix, we obtain the first-order principal minors, each equal to $2 > 0$. By deleting row 1 and column 1, or row 2 and column 2, or row 3 and column 3, we obtain the second-order principal minors
$$|H_2| = \begin{vmatrix} 2 & 0 \\ 0 & 2 \end{vmatrix} = 4 - 0 = 4 > 0$$
in each case. The third-order principal minor is simply the determinant of the Hessian matrix itself, which is
$$|H_3| = 8 > 0.$$
Because for all (x1 , x2 , x3 ) all principal minors of the Hessian matrix are
nonnegative, we have shown that f (x1 , x2 , x3 ) is a convex function.

Also, since the functions $g_1$ and $g_2$ are linear and hence convex, all the functions involved are convex, so the point $\bar{x} = \left[\frac53, \frac83, \frac{10}{3}\right]^T$ is the global minimum.

Note that for this problem there is one active constraint, $g_1(\bar{x}) = 0$, so $\lambda_1 \ne 0$, and one inactive constraint, $g_2(\bar{x}) < 0$, so $\lambda_2 = 0$. •
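When the Optimization Toolbox is available, the KT point found above can be checked numerically with fmincon. The lines below are only a sketch (the function handle and starting point are our own choices); both constraints are linear, so they are passed through A and b:

% Checking the KT point of Example 7.48 (sketch; requires fmincon)
fun = @(x) (x(1)-2)^2 + (x(2)-3)^2 + (x(3)-3)^2;
A   = [1  1 -1;            % x1 + x2 - x3 <= 1
       2 -1 -2];           % 2x1 - x2 - 2x3 <= 2
b   = [1; 2];
x0  = [0; 0; 0];
[xopt, fval] = fmincon(fun, x0, A, b)
% xopt is approximately [5/3; 8/3; 10/3], with fval = 1/3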

7.9 Generalized Reduced-Gradient Method


Here, we will consider equality-constrained problems and how they can be obtained from inequality-constrained problems. Equality-constrained problems are of great importance in nonlinear optimization. As in the case of linear programming, many nonlinear optimization problems are naturally formulated in such a way that they contain equality constraints. Also, the theory of equality-constrained nonlinear programming leads naturally to a more complete theory that encompasses both equality and inequality constraints. The advantage of equality constraints is that the constraint equations can be used to eliminate some of the variables by expressing them in terms of the others. After eliminating these variables, we can solve the resulting reduced unconstrained problem by simple calculus methods. The elimination of the variables is very simple if the constraints are linear equations, while for the nonlinear case we can use Taylor's method to convert the given nonlinear constraint equations into linear constraint equations.

First, we consider the NLP problem having equality constraints in linear


form.
Example 7.49 Minimize the function
$$z = x_1^2 + x_2^2 + x_3^2 + x_4^2,$$
subject to the linear constraints
$$x_1 + 2x_2 + 3x_3 + 5x_4 = 10$$
$$x_1 + 2x_2 + 5x_3 + 6x_4 = 15.$$

Solution. Given the objective function
$$f(x) = x_1^2 + x_2^2 + x_3^2 + x_4^2$$
and the linear constraints
$$g_1(x) = x_1 + 2x_2 + 3x_3 + 5x_4 - 10$$
$$g_2(x) = x_1 + 2x_2 + 5x_3 + 6x_4 - 15,$$

we will solve the given constraints for two of the variables in terms of the other two. Solving for $x_1$ and $x_3$ in terms of $x_2$ and $x_4$, we multiply the first constraint equation by 5 and the second constraint equation by 3 and subtract the results, which gives
$$x_1 = \frac52 - 2x_2 - \frac72 x_4.$$
Next, subtracting the two given constraint equations, we get
$$x_3 = \frac52 - \frac12 x_4.$$
Putting these two expressions for $x_1$ and $x_3$ into the given objective function, we obtain the new problem (called the reduced problem):
$$\text{minimize } f(x_2, x_4) = \left(\frac52 - 2x_2 - \frac72 x_4\right)^2 + x_2^2 + \left(\frac52 - \frac12 x_4\right)^2 + x_4^2,$$
which can be written as
$$\text{minimize } f(x_2, x_4) = \frac{25}{2} - 10x_2 + 5x_2^2 - 20x_4 + 14x_2 x_4 + \frac{27}{2} x_4^2.$$
One can note that this is now an unconstrained problem, and it can be solved by setting the first partial derivatives with respect to $x_2$ and $x_4$ equal to zero, i.e.,
$$\frac{\partial f}{\partial x_2} = -10 + 10x_2 + 14x_4 = 0$$
and
$$\frac{\partial f}{\partial x_4} = -20 + 14x_2 + 27x_4 = 0.$$
Thus, we have a linear system of the form
$$10x_2 + 14x_4 = 10$$
$$14x_2 + 27x_4 = 20.$$
Solving this system by taking $x_2 = 1 - \frac{14}{10}x_4$ from the first equation and substituting into the second equation, we get
$$14\left(1 - \frac{14}{10}x_4\right) + 27x_4 = 20,$$
which gives $x_4 = \frac{30}{37}$, and it follows that $x_2 = -\frac{5}{37}$.
, and it implies that x2 = − 37 .

Now using these values of $x_2$ and $x_4$, we obtain the other two variables $x_1$ and $x_3$ as follows:
$$x_1 = \frac52 - 2\left(-\frac{5}{37}\right) - \frac72\left(\frac{30}{37}\right) = -\frac{5}{74}$$
and
$$x_3 = \frac52 - \frac12\left(\frac{30}{37}\right) = \frac{155}{74}.$$
Thus, the optimal solution is
$$\bar{x} = \left[-\frac{5}{74}, -\frac{5}{37}, \frac{155}{74}, \frac{30}{37}\right]^T.$$
Note that the main difference between this method and the Lagrange mul-
tipliers method is that this method is easy to solve for several constraint
equations simultaneously if they are linear. The gradient of the objective
function is called the reduced gradient, and the method is therefore called
the reduced-gradient method.

Note that pivoting in a linear programming tableau can also be used to


obtain the above result as follows:

x1   x2   x3    x4    constants
(1)   2    3     5       10
 1    2    5     6       15

(pivot on the x1 entry of row 1)

x1   x2   x3    x4    constants
 1    2    3     5       10
 0    0   (2)    1        5

(pivot on the x3 entry of row 2)

x1   x2   x3    x4    constants
 1    2    0    7/2      5/2
 0    0    1    1/2      5/2

which gives
$$x_1 = \frac52 - 2x_2 - \frac72 x_4, \qquad x_3 = \frac52 - \frac12 x_4,$$
the same as above. •
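The reduced problem can also be handled numerically. The following MATLAB lines are a small sketch (they are not part of the original solution) that solve the stationarity system of the reduced problem and back-substitute:

% Sketch for Example 7.49: solve the reduced problem, then back-substitute
A  = [10 14; 14 27];               % from df/dx2 = 0 and df/dx4 = 0
b  = [10; 20];
v  = A \ b;                        % v = [x2; x4] = [-5/37; 30/37]
x2 = v(1);  x4 = v(2);
x1 = 5/2 - 2*x2 - 7/2*x4;          % from the first eliminated equation
x3 = 5/2 - x4/2;                   % from the second eliminated equation
[x1 x2 x3 x4]                      % approx [-0.0676 -0.1351 2.0946 0.8108]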

Nonlinear Constraints
In the previous example, we solved the problem having linear constraints
by the gradient method. Now we will deal with a problem having nonlinear
constraints and consider the possibility of approximating such a problem
by a problem with linear constraints. To do this we expand each nonlinear
constraint function in a Taylor series and then truncate terms beyond the
linear one.

Example 7.50 Minimize the function
$$z = x_1 + 2x_2^2 + x_3 + x_4^2,$$
subject to the nonlinear constraints
$$x_1 + x_2^2 + x_3 + 2x_4 = 2$$
$$2x_1 + 2x_2 + x_3 + x_4^2 = 4.$$

Solution. Given the objective function

f (x) = x1 + 2x22 + x3 + x24

and the nonlinear constraints

g1 (x) = x1 + x22 + x3 + 2x4 − 2


g2 (x) = 2x1 + 2x2 + x3 + x24 − 4,

we know that the Taylor series approximation for a function of several variables about a given point $x^*$ can be written as
$$f(x) = f(x^*) + \nabla f(x^*)^T \Delta x + \frac12 \Delta x^T \nabla^2 f(x^*) \Delta x + \cdots.$$
If we truncate terms after the second term, we obtain the linear approximation
$$f(x) \approx f(x^*) + \nabla f(x^*)^T \Delta x,$$
which, with $\Delta x = x - x^*$, can also be written as
$$f(x) \approx f(x^*) + \nabla f(x^*)^T (x - x^*).$$

With the help of this formula we can replace the inequality constraints by
equality constraints that approximate the true constraints in the vicinity of
the point x∗ at which the linearization is performed:

x1 − x∗1
 
 x2 − x∗2 
g1 (x) ≈ [x∗1 + x∗ 22 + x∗3 + 2x∗4 − 2] + [1, 2x∗ 2 , 1, 2]  
 x3 − x∗3 
x4 − x∗4

or, it can be written as

g1 (x) ≈ x1 + (2x∗ 2 )x2 + x3 + 2x4 − [2x∗ 22 + 2].

Doing the similar approximation for the second constraint function, we


obtain
x1 − x∗1
 
 x2 − x∗2 
g2 (x) ≈ [2x∗1 + 2x∗ 2 + x∗3 + x∗ 24 − 4] + [2, 2, 1, 2x∗ 4 ]  
 x3 − x∗3 
x4 − x∗4

or, it can be written as

g2 (x) ≈ 2x1 + 2x2 + x3 + (2x∗ 4 )x4 − [x∗ 24 + 4].

This process is called the generalized reduced-gradient method and is


used to solve a sequence of subproblems, each of which uses a linear ap-
proximation of the constraints. The first step of the method is to start
with the initial given point and then, at each iteration of the method, the
constraint linearization is recalculated at the point gotten from the previous
iteration. After each iteration the approximated solution comes closer to

the optimal point and the linearized constraints of the subproblems become
better approximations to the original nonlinear constraints in the neighbor-
hood of the optimal point. At the optimal point, the linearized problem has
the same solution as the original nonlinear problem.

To apply the generalized reduced-gradient method, first we have to pick


the starting point x0 = [2, −1, 1, −1]T at which the first linearization can be
performed. In the second step we use the approximation formulas already
given to linearize the constraint functions at the starting point x0 and form
the first approximate problem as follows:

f (0) (x) = x1 + 2x22 + x3 + x24 ,

and the linear constraints are

g1 (x) = x1 − 2x2 + x3 + 2x4 − 4


g2 (x) = 2x1 + 2x2 + x3 − 2x4 − 5.

Now we solve the equality constraints of the approximate problem to express


two of the variables in terms of the others. By selecting x1 and x3 to be
basic variables, we solve the linear system to write them in terms of the
other variables x2 and x4 (nonbasic variables):

x1 = 1 − 4x2 + 4x4
x3 = 3 + 6x2 − 6x4 .

Putting the expressions for x1 and x3 in the objective function, we get

f (0) (x) = (1−4x2 +4x4 )+2x22 +(3+6x2 −6x4 )+x24 = 4+2x2 +2x22 −2x4 +x24 .

Solving this unconstrained minimization by putting the first partial deriva-


tives with respect to the nonbasic variables x2 and x4 equal to zero, we
obtain
∂f (0) 1
= 2 + 4x2 = 0 gives x2 = −
∂x2 2
∂f (0)
= −2 + 2x4 = 0 gives x4 = 1.
∂x4

Using these values in the above x1 equation and x3 equation, we get


 
1
x1 = 1 − 4 − + 4(1) = 1 + 2 + 4 = 7
2
 
1
x3 = 3 + 6 − − 6(1) = 3 − 3 − 6 = −6.
2
Thus, we get the new point, $x_1 = [7, -\frac12, -6, 1]^T$.

Similarly, using this new point x1 , we obtain the second approximate


problem as follows:
f (1) (x) = x1 + 2x22 + x3 + x24 ,
and the linear constraints are
g1 (x) = x1 − x2 + x3 + 2x4 − 2.5
g2 (x) = 2x1 + 2x2 + x3 + 2x4 − 5.
Again by selecting x1 and x3 to be basic variables, we solve the linear system
and write them in terms of other nonbasic variables x2 and x4 as follows:
x1 = 2.5 − 3x2
x3 = 4x2 − 2x4 .
Putting the expressions for x1 and x3 in the objective function, we get
f (1) (x) = (2.5 − 3x2 ) + 2x22 + (4x2 − 2x4 ) + x24 = 2.5 + x2 + 2x22 − 2x4 + x24 .
Solving this unconstrained minimization, we get
∂f (1) 1
= 1 + 4x2 = 0 gives x2 = −
∂x2 4
∂f (1)
= −2 + 2x4 = 0 gives x4 = 1.
∂x4
Using the values of x2 and x4 , we get the values of the other variables as
 
1
x1 = 2.5 − 3 − = 2.5 + 0.75 = 3.25
4
 
1
x3 = 4 − − 2(1) = −1 − 2 = −3.
4

Thus, we get the other new point, $x_2 = [3.25, -\frac14, -3, 1]^T$.

We continue in this way: the nonlinear constraint functions are linearized at the new point, the resulting system of linear equations is used to express two of the variables in terms of the others, these expressions are substituted into the objective function to obtain a new reduced problem, the reduced problem is solved to produce the next point $x_3$, and so forth.

We can also solve this problem by converting the given constraints for
two of the variables in terms of the other two. So solving for x1 and x3 in
terms of x2 and x4 , we obtain

x1 = 2 − x22 − x3 − 2x4

and
x3 = 4 − 2x1 − 2x2 − x24 .
Putting the x1 equation in the x3 equation, we get

x3 = 2x2 − 2x22 − 4x4 + x24 .

Now putting this value of x3 in the x1 equation, we obtain

x1 = 2 + x22 − 2x2 + 2x4 − x24 .

Using these new values of x1 and x3 , the given objective function becomes

f (x) = (2 + x22 − 2x2 + 2x4 − x24 ) + 2x22 + (2x2 − 2x22 − 4x4 + x24 ) + x24

or
f (x) = 2 + x22 − 2x4 + x24 .
Setting the first partial derivatives with respect to x2 and x4 equal to zero
gives
∂f
= 2x2 = 0
∂x2
and
∂f
= −2 + 2x4 = 0.
∂x4

By solving these two equations, we get

x2 = 0 and x4 = 1.

Using the values of x2 and x4 , we get

x1 = 2 + (0)2 − 2(0) + 2(1) − (1)2 = 2 + 2 − 1 = 3

and
x3 = 2(0) − 2(0)2 − 4(1) + (1)2 = −4 + 1 = −3.
Thus, the optimal solution is

x̄ = [3, 0, −3, 1]T ,

and the minimum value of the function is

f (x) = (3) + 2(0)2 + (−3) + (1)2 = 1.
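The same minimizer can be checked numerically with fmincon when the Optimization Toolbox is available. The call below is only a sketch (the starting point and the handles are our own); the two nonlinear equality constraints are supplied through a nonlcon function:

% Sketch: checking Example 7.50 with fmincon and nonlinear equality constraints
fun     = @(x) x(1) + 2*x(2)^2 + x(3) + x(4)^2;
nonlcon = @(x) deal([], [x(1) + x(2)^2 + x(3) + 2*x(4) - 2;
                       2*x(1) + 2*x(2) + x(3) + x(4)^2 - 4]);
x0   = [2; -1; 1; -1];
xopt = fmincon(fun, x0, [], [], [], [], [], [], nonlcon)
% xopt should be approximately [3; 0; -3; 1], with objective value 1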

Procedure 7.6 (Generalized Reduced-Gradient Method)

1. Start with the initial point x(0) .

2. Convert the nonlinear constraint functions into linear constraint functions using the first-order Taylor approximation
$$g(x) \approx g(x^{(k)}) + \nabla g(x^{(k)})^T (x - x^{(k)})$$
(a sketch of this step appears after the procedure).

3. Solve the linear constraint equations for the basic variables in terms
of the other (nonbasic) variables.

4. Solve the unconstrained reduced problem for nonbasic variables.

5. Find the basic variables using the nonbasic variables from the linear
constraints equations.

6. Repeat all the previous steps unless you get the desired accuracy.
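As an illustration of step 2 of Procedure 7.6, the linearized constraints can be generated symbolically. The lines below are only a sketch (not from the text) that linearize the constraints of Example 7.50 about a point $x^{(k)}$ using the Jacobian:

% Sketch: first-order (Taylor) linearization of the constraints of Example 7.50
syms x1 x2 x3 x4 real
x  = [x1; x2; x3; x4];
g  = [x1 + x2^2 + x3 + 2*x4 - 2;
      2*x1 + 2*x2 + x3 + x4^2 - 4];
J  = jacobian(g, x);
xk = [2; -1; 1; -1];                            % current linearization point
glin = subs(g, x, xk) + subs(J, x, xk)*(x - xk);
simplify(glin)                                   % linear constraints about xk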

7.10 Separable Programming


In separable programming NLP problems are solved by approximating the
nonlinear functions with piecewise linear functions and then solving the
optimization problem through the use of a modified simplex algorithm of
linear programming and, in special cases, the ordinary simplex algorithm.

Definition 7.17 (Separable Programming)

In using separable programming, a basic condition is that all given functions


in the problem be separable, i.e.,

f (x1 , x2 , . . . , xn ) = f1 (x1 ) + f2 (x2 ) + · · · + fn (xn ).

For example, the function

f (x1 , x2 ) = x31 − 4x21 + 2x1 + 2x22 − 3x2

is separable because

f (x1 , x2 ) = f1 (x1 ) + f2 (x2 ),

where
f1 (x1 ) = x31 − 4x21 + 2x1 and f2 (x2 ) = 2x22 − 3x2 .
But the function

f (x1 , x2 ) = x21 − cos(x1 + x2 ) + x1 ex2 + 2x22

is not separable.

Sometimes the given nonlinear functions are not separable, but they
can be made separable by approximate substitution. For example, the given
nonlinear programming problem

maximize z = x1 ex2

is not separable, but it can be made separable by letting

w = x1 ex2 ,

then
ln w = ln x1 + x2 .
Thus,
maximize z = w,
subject to
ln w = ln x1 + x2
is called a separable programming problem. Note that the substitution as-
sumes that x1 is a positive variable. •

There are different ways to deal with the separable programming prob-
lems, but we will solve the problems by the McMillan method.

McMillan's result states that any continuous nonlinear and separable function $f(x_1, x_2, \ldots, x_n)$ can be approximated by a piecewise linear function, and the problem solved using linear programming techniques, provided that the following conditions are satisfied:
$$f(x) \approx \bar{f} = \sum_{k=0}^{d} \lambda_{k1} f_{k1} + \sum_{k=0}^{d} \lambda_{k2} f_{k2} + \cdots + \sum_{k=0}^{d} \lambda_{kn} f_{kn},$$
where
$$x_n = \sum_{k=0}^{d} \lambda_{kn} x_{kn},$$

and $d$ is any suitable integer representing the number of segments into which the domain of $x$ is divided, given that

1. $\displaystyle\sum_{k=0}^{d} \lambda_{kj} = 1, \quad j = 1, 2, \ldots, n,$

2. λkj ≥ 0, k = 0, 1, . . . d, j = 1, 2, . . . , n, and

3. no more than two of the λs that are associated with any one variable
j are greater than zero, and if two are greater than zero, they must
be adjacent.

Example 7.51 Consider the NLP problem
$$\text{minimize } z = (x_1 - 2)^2 + 4(x_2 - 6)^2,$$
subject to
$$g(x) = 6x_1 + 3(x_2 + 1)^2 \le 12$$
$$x_1 \ge 0, \quad x_2 \ge 0.$$

Solution. Both the objective function and the constraint are separable functions, because
$$f(x) = f_1(x_1) + f_2(x_2), \quad \text{where} \quad f_1(x_1) = (x_1 - 2)^2 \ \text{ and } \ f_2(x_2) = 4(x_2 - 6)^2,$$
and also
$$g(x) = g_1(x_1) + g_2(x_2), \quad \text{where} \quad g_1(x_1) = 6x_1 \ \text{ and } \ g_2(x_2) = 3(x_2 + 1)^2.$$
Notice that both x1 and x2 enter the problem nonlinearly in the objective
function and the constraint. Thus we must write both x1 and x2 in terms
of the λs. But if the variables are linear throughout the entire problem, it is
not necessary to write in terms of the λs; they can be used as the variables
themselves.

First, we determine the domain d of interest for the variables x1 and


x2 . From the given constraints, the possible values for x1 and x2 are

0 ≤ x1 ≤ 2 and 0 ≤ x2 ≤ 1,

respectively. Dividing the domain of interest for $x_1$ and $x_2$ arbitrarily into two segments each and obtaining the grid points, the piecewise linear function to be used to approximate $f$ is
$$\bar{f} = \sum_{k=0}^{2} \lambda_{k1} f_{k1} + \sum_{k=0}^{2} \lambda_{k2} f_{k2},$$

and the approximating function for $g$ is
$$\bar{g} = \sum_{k=0}^{2} \lambda_{k1} g_{k1} + \sum_{k=0}^{2} \lambda_{k2} g_{k2}.$$

Evaluating both approximating functions at the grid points gives

k    x_{k1}   f_{k1}   g_{k1}   x_{k2}   f_{k2}   g_{k2}
0       0        4        0        0       144       3
1       1        1        6       1/2      121     27/4
2       2        0       12        1       100      12
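The grid values in this table can be generated with a few MATLAB lines; this is only a convenience sketch, not part of the original solution:

% Sketch: grid values for the piecewise-linear approximation of Example 7.51
x1g = [0 1 2];            x2g = [0 0.5 1];
f1  = (x1g - 2).^2;       f2  = 4*(x2g - 6).^2;    % 4 1 0   and  144 121 100
g1  = 6*x1g;              g2  = 3*(x2g + 1).^2;    % 0 6 12  and  3 6.75 12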

Now we solve the NLP problem

minimize z = 4λ01 + λ11 + 144λ02 + 121λ12 + 100λ22 ,

subject to
$$6\lambda_{11} + 12\lambda_{21} + 3\lambda_{02} + \frac{27}{4}\lambda_{12} + 12\lambda_{22} \le 12$$
$$\lambda_{01} + \lambda_{11} + \lambda_{21} = 1$$
$$\lambda_{02} + \lambda_{12} + \lambda_{22} = 1$$
$$\lambda_{kj} \ge 0, \quad k = 0, 1, 2, \quad j = 1, 2.$$

Note that this approximating problem to our original nonlinear problem


is linear. Thus, we can solve this problem using the simplex algorithm of
linear programming if we modified it to ensure that in any basic solution no
more than two of the λs that are associated with either of the xj variables
are greater than zero and if two (rather than one) are greater than zero,
then they must be adjacent.

λ01 λ11 λ21 λ02 λ12 λ22 s1 a2 a3 rhs


1 1 1 1 1 1 0 0 0 w=2
27
0 6 12 3 4
12 1 0 0 s1 = 12
1n 1 1 0 0 0 0 1 0 a2 = 1
0 0 0 1 1 1 0 0 1 a3 = 1

λ01 λ11 λ21 λ02 λ12 λ22 s1 a3 rhs


0 0 0 1 1 1 0 0 w=1
27
0 6 12 3 4
12 1 0 s1 = 12
1 1 1 0 0 0 0 0 λ01 = 1
0 0 0 1n 1 1 0 1 a3 = 1

λ01 λ11 λ21 λ02 λ12 λ22 s1 rhs


0 0 0 0 0 0 0 w=0
15
0 6 12 0 4
9 1 s1 = 9
1 1 1 0 0 0 0 λ01 = 1
0 0 0 1 1 1 0 λ02 = 1

λ01 λ11 λ21 λ02 λ12 λ22 s1 rhs


−4 −1 0 −144 −121 −100 0
15
0 6 12 0 4
9 1 s1 = 9
1 1 1 0 0 0 0 λ01 = 1
0 0 0 1 1 1 0 λ02 = 1

λ01 λ11 λ21 λ02 λ12 λ22 s1 rhs


0 3 4 0 23 44 0 z = 148
15
0 6 12 0 4
9 1 s1 = 9
1 1 1 0 0 0 0 λ01 = 1
0 0 0 1 1 1n 0 λ02 = 1

λ01 λ11 λ21 λ02 λ12 λ22 s1 rhs


0 3 4 −44 −21 0
 0 z = 104
0 6 12

−9 − 21
4
0 1 s1 = 0
1 1 1 0 0 0 0 λ01 = 1
0 0 0 1 1 1 0 λ22 = 1

λ01 λ11 λ21 λ02 λ12 λ22 s1 rhs

0 1 0 −41 − 77
4
0 − 13 z = 104

1
0 1 − 34 7
− 16 0 1
λ21 = 0

2 2

1 3 7 1
1 2
0 4 16
0 12
λ01 = 1

0 0 0 1 1 1 0 λ22 = 1

λ01 λ11 λ21 λ02 λ12 λ22 s1 rhs

0 0 −2 − 79
2
− 147
8
0 − 12 z = 104

0 1 2 − 32 − 78 0 1
6
λ11 = 0

6 14
1 0 −1 4 16
0 0 λ01 = 1

0 0 0 1 1 1 0 λ22 = 1
Since
λ11 = 0, λ01 = 1, λ22 = 1,
we have
x1 = λ01 x01 + λ11 x11 + λ21 x21 = 0
x2 = λ02 x02 + λ12 x12 + λ22 x22 = 1.
Hence,
f (x) = (x1 − 2)2 + 4(x2 − 6)2 = 4 + 100 = 104.
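Since the approximating problem is an ordinary linear program, it can also be passed to linprog when the Optimization Toolbox is available. The lines below are only a sketch; the adjacency restriction on the λ's still has to be verified by hand on the computed solution:

% Sketch: the approximating LP of Example 7.51 solved with linprog
% lambda = [l01 l11 l21 l02 l12 l22]
f   = [4 1 0 144 121 100]';
A   = [0 6 12 3 27/4 12];     b   = 12;
Aeq = [1 1 1 0 0 0;
       0 0 0 1 1 1];          beq = [1; 1];
lb  = zeros(6,1);
[lam, zval] = linprog(f, A, b, Aeq, beq, lb)
% lam is approximately [1 0 0 0 0 1]' and zval = 104, i.e., x1 = 0, x2 = 1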

7.11 Quadratic Programming


Quadratic programming is a technique for problems with a quadratic objective function. For example, for two variables, the objective function may contain only terms in $x_1^2$, $x_2^2$, $x_1$, $x_2$, $x_1 x_2$, and a constant term. The constraints can be linear inequalities or equalities.

An NLP problem whose constraints are linear and whose objective function is a sum of terms of the form $x_1^{k_1} x_2^{k_2} \cdots x_n^{k_n}$ (with each term having a degree of 2, 1, or 0) is a quadratic programming problem.

There are several algorithms that can be used to solve quadratic programming problems; here we describe Wolfe's method. A basic requirement of this method is that all the variables must be nonnegative.

Example 7.52 Solve the quadratic programming problem
$$\text{maximize } z = 6x_1 + 3x_2 - 4x_1 x_2 - 2x_1^2 - 3x_2^2,$$
subject to
$$x_1 + x_2 \le 1$$
$$2x_1 + 3x_2 \le 4$$
$$x_1, x_2 \ge 0.$$

Solution. The KT conditions may be written as:

6 − 4x1 − 4x2 − λ1 − 2λ2 ≤ 0


3 − 4x1 − 6x2 − λ1 − 3λ2 ≤ 0
x1 + x2 ≤ 1
2x1 + 3x2 ≤ 4
x1 , x2 ≥ 0.

The objective function may be shown to be concave, so any point satisfying


the KT conditions will solve this quadratic programming problem. Applying
the excess variable e1 for the first constraint (called the x1 constraint), the
excess variable e2 for the second constraint (called the x2 constraint), and
the slack variable s1 for the third constraint, and the slack variable s2 for

the last constraint, we have

6 − 4x1 − 4x2 − λ1 − 2λ2 − e1 = 0


3 − 4x1 − 6x2 − λ1 − 3λ2 − e2 = 0
x1 + x2 + s 1 = 1
2x1 + 3x2 + s2 = 4.

All variables are nonnegative:

e 1 x1 = 0
e 2 x2 = 0
λ1 s 1 = 0
λ2 s 2 = 0.

Observe that with the exception of the last four equations, the KT con-
ditions are all linear or nonnegative constraints. The last four equations
are the complementary slackness conditions for this quadratic programming
problem.

For general quadratic programming problems, the complementary slack-


ness conditions may be verbally expressed by

"the excess variable $e_i$ and the variable $x_i$ cannot both be positive, and the slack variable for the $i$th constraint and $\lambda_i$ cannot both be positive."

To find a point satisfying the KT conditions (except for the complementary


slackness conditions), Wolfe’s method simply applies a modified version of
Phase I of the Two-Phase simplex method. We first add an artificial vari-
able to each constraint in the KT conditions that does not have an obvious
basic variable, and then we attempt to minimize the sum of the artificial
variables.

To ensure that the final solution (with all the artificial variables equal
to zero) satisfies the above slackness conditions, Wolfe’s method is modified
by the simplex choice of the entering variable as follows:

1. Never perform a pivot that would make $e_i$ and $x_i$ both basic variables.
2. Never perform a pivot that would make the slack (or excess) variable for the $i$th constraint and $\lambda_i$ both basic variables.
To apply Wolfe’s method to the given problem, we have to the solve the LP
problem
minimize w = a1 + a2 ,
subject to
4x1 + 4x2 + λ1 + 2λ2 − e1 + a1 = 6
4x1 + 6x2 + λ1 + 3λ2 − e2 + a2 = 3
x1 + x2 + s 1 = 1
2x1 + 3x2 + s2 = 4
e1 x1 = e2 x2 = λ1 s1 = λ2 s 2 = 0
x1 , x2 ≥ 0, λi ≥ 0, i = 1, 2.
x1 x2 λ1 λ2 e1 e2 s1 s2 a1 a2 rhs
8 10 2 5 −1 −1 0 0 0 0 w=9
4 
4 1 2 −1 0 0 0 1 0 a1 = 6
4 
6 1 3 0 −1 0 0 0 1 a2 =3
1 1 0 0 0 0 1 0 0 0 s1 = 1
2 3 0 0 0 0 0 1 0 0 s2 = 4
x1 x2 λ1 λ2 e1 e2 s1 s2 a1 a2 rhs

4 1 2
3
0 3
0 −1 3
0 0 0 w=4

4 1 2
3
0 3
0 −1 3
0 0 1 a1 = 4

2 1 1
1 0 − 61 0 0 0 x2 = 1

3 6 2 2

1
3
0 − 16 − 12 0 1
6
1 0 0 s1 = 1
2

0 0 − 12 − 32 0 1
2
0 1 0 s2 = 5
2

x1 x2 λ1 λ2 e1 e2 s1 s2 a1 a2 rhs

0 −2 0 −1 −1 1 0 0 0 w=3
0 −2 0 −1 −1 1 0 0 1 a1 = 3

3 1 3
1 2 4 4
0 − 14 0 0 0 x1 = 3
4

0 − 12 − 14 − 34 0 1
1 0 0 s1 = 1

4 4

0 0 − 12 − 32 0 1
2
0 1 0 s2 = 5
2

x1 x2 λ1 λ2 e1 e2 s1 s2 a1 a2 rhs
0 0 2 −1 0
1  −4 0 0 w=2
0 0 2 −1 0
1  −4 0 1 a1 = 2
1 1 0 0 0 0 1 0 0 x1 = 1
0 −2 −1 −3 0 1 4 0 0 e2 = 1
0 1 0 0 0 0 −2 1 0 s2 = 2

x1 x2 λ1 λ2 e1 e2 s1 s2 rhs
0 0 0 0 0 0 0 0 w=2

1
0 0 1 − 12 0 −2 0 λ2 = 1

2

1 1 0 0 0 0 1 0 x1 = 1

1
0 −2 2
0 − 32 1 −2 0 e2 = 4

0 1 0 0 0 0 −2 1 s2 = 2

x1 x2 λ1 λ2 e1 e2 s1 s2 rhs
0 0 0 0 0 0 0 0 w=0
0 0 1 2 −1 0 −4 0 λ1 = 2
1 1 0 0 0 0 1 0 x1 = 1
0 −2 0 −1 −1 1 0 0 e2 = 3
0 1 0 0 0 0 −2 1 s2 = 2

We note from the last tableau that w = 0, so we have found a solution that
satisfies the KT conditions and is optimal for the quadratic programming
problem. Thus, the optimal solution to the quadratic programming problem
is
x1 = 1 and x2 = 0.
We also note that

λ1 = 2, λ2 = 0, e1 = 0, e2 = 3, s1 = 0, s2 = 2,

which satisfies

e1 x1 = 0, e2 x2 = 0, λ1 s1 = 0, λ2 s2 = 0.

Note that Wolfe's method is guaranteed to obtain the optimal solution to a quadratic programming problem if all the leading principal minors of the objective function's Hessian are positive. Otherwise, the method may not converge in a finite number of pivots. •
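For comparison, the same problem can be solved with quadprog when the Optimization Toolbox is available. The call below is only a sketch, obtained by rewriting the maximization as the minimization of the negated objective in the form (1/2)x'Hx + f'x:

% Sketch: Example 7.52 solved with quadprog
H  = [4 4; 4 6];          % negated quadratic part, doubled for the 1/2 factor
f  = [-6; -3];            % negated linear part
A  = [1 1; 2 3];  b = [1; 4];
lb = [0; 0];
[x, fneg] = quadprog(H, f, A, b, [], [], lb)
% x is approximately [1; 0]; the maximum value of the original objective is -fneg = 4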

7.12 Summary
Nonlinear programming is a very vast subject and in this chapter we gave
a brief introduction to the idea of nonlinear programming problems. We
started with a review of differential calculus. Classical optimization theory
uses differential calculus to determine points of extrema (maxima and min-
ima) for unconstrained and constrained problems. The methods may not
be suitable for efficient numerical problems, but they provide the theory
that is the basis for most nonlinear programming methods. The solution
methods for the nonlinear programming problem were discussed, including
direct and indirect methods. For one-dimensional optimization problems we used three numerical methods. First, we used one of the direct search methods, the golden-section search, which identifies an interval of uncertainty known to contain the optimum solution point and locates the optimum by iteratively shrinking this interval to any given accuracy. The other two one-dimensional methods we discussed are the quadratic interpolation method and Newton's method; both converge faster than the golden-section search method. Among the direct methods, we discussed gradient methods, in which the maximum (minimum) of a problem is found by following the direction of fastest increase (decrease) of the objective function.
found following the fastest rate of increase (decrease) of the objective func-
tion.

We also discussed necessary and sufficient conditions for determining


extremum, the Lagrange method for problems with equality constraints,
and the Karush–Kuhn–Tucker (KT) conditions for problems with inequal-
ity constraints. The KT conditions provide the most unifying theory for all
nonlinear programming problems. In indirect methods, the original prob-
lem is replaced by an auxiliary one from which the optimum is determined.
For such cases we used quadratic programming (Wolfe’s method) and sep-
arable programming (McMillan method).

This chapter contained many examples, which we solved numerically, graphically, and using MATLAB.

7.13 Problems
1. Find the following limits as x approaches 0:

(a) $\dfrac{\sin x^2}{x \tan x}$.

(b) $\dfrac{1 + \cos^2 x + 3x^2 - 2\cos x}{x^2}$.

(c) $\dfrac{\sqrt{x+2} - \sqrt{2}}{x}$.

(d) $\dfrac{\sin x - 1 + \cos x}{x}$.

2. Find the constants a and b so that each of the following functions is continuous on the entire real line:

(a) $f(x) = \begin{cases} 3x + 1, & \text{if } x \le -1 \\ ax + b, & \text{if } -1 < x < 4 \\ x + 4, & \text{if } x \ge 4. \end{cases}$

(b) $f(x) = \begin{cases} x + 1, & \text{if } 1 < x < 3 \\ x^2 + ax + b, & \text{if } |x - 2| \ge 1. \end{cases}$

(c) $f(x) = \begin{cases} ax - b, & \text{if } x \le -2 \\ \dfrac{x^2 - 4}{x + 2}, & \text{if } -2 < x < 1 \\ x^2 + b, & \text{if } x \ge 1. \end{cases}$

(d) $f(x) = \begin{cases} \dfrac{2\sin 2x}{x}, & \text{if } x < 0 \\ a - 3x, & \text{if } x > 0 \\ b, & \text{if } x = 0. \end{cases}$

3. Find the third derivatives of the following functions:

(a) f (x) = x3/4 + x2 + 5x + 3.


 
1
(b) f (x) = x − .
x
(c) f (x) = x3 + x cos x + x ln x + 3.

(d) f (x) = x2 + tan x + e3x .



4. Find the local extrema using the second derivative test, and find the
point of inflection of the following functions:

(a) f (x) = x3 − 2x2 + x + 3.

(b) f (x) = x4 − 2x2 .


 
3 12 1
(c) f (x) = x 1 − 2 + 3 .
x x
(d) f (x) = 2 − 15x + 9x2 − x3 .
5. Find the second partial derivatives of the following functions:

(a) z = f (x, y) = e−xy cos y.


1 x
(b) z = f (x, y) = (e − e−y ) sin xy.
2
(c) z = f (x, y, z) = 4x2 y − 6xyz 2 + 3yz 2 .

(d) z = f (x, y, z) = e−2x sin yz.


6. Find the directional derivative of the following functions at the indi-
cated point in the direction of the indicated vector:

(a) z = f (x, y, z) = x2 + y 2 + z 2 + xz + yz, P (1, 1, 1), u = (2, −1, 3).

(b) z = f (x, y, z) = cos xy + e2z , P (0, −1, 2), u = (0, 2, 3).

(c) z = f (x, y, z) = x3 + y 3 + z 3 + xyz, P (2, 1, 2), u = (0, −1, 1).

(d) z = f (x, y, z) = sin x + eyz , P (0, 3, 3), u = (2, −1, 2).


7. Find the gradient of the following functions at the indicated points:

(a) z = f (x, y, z) = x3 − yz 2 + x2 z + y 2 z 3 , (1, 1, 1).

(b) z = f (x, y, z) = cos xy + xye2z , (0, 1, 2).



(c) z = f (x, y, z) = ln(x3 + y 3 + z 3 ), (1, 2, 3).

(d) z = f (x, y, z) = tan xeyz , (0, 1, 1).

8. Find the Hessian matrix for the following functions:

(a) f (x, y) = (x − 3)2 + 2y 2 + 5.

(b) f (x, y) = x4 + 3y 4 − 2x2 y + xy 2 − x − y.

(c) f (x, y, z) = x2 + 2y 2 + 4z 2 − 2x − 3yz.

(d) f (x, y, z) = x3 + 3y 2 + 2z 2 − 2xy + yz.

9. Find the Hessian matrix for the following functions at the indicated
points:

(a) f (x, y) = x4 + y 4 + 5xy, P (1, −1).

(b) f (x, y) = 3x5 + y 5 + 3x3 y 3 − x2 y + 2x − 3y, P (2, 1).

(c) f (x, y, z) = x3 + 4y 3 + 2z 3 + 12x + 5yz, P (1, 2, 2).

(d) f (x, y, z) = 2x4 + 6y 4 + 2z 4 − 2x2 y + 3y 2 z, P (−2, 3, −2).

10. Find the linear and quadratic approximations of the following func-
tions at the given point (a, b) using Taylor’s series formulas:

(a) f (x, y) = ln(x2 + y 2 ), (1, 1).

(b) f (x, y) = x2 + xy + y 2 , (1, 1).

(c) f (x, y) = x2 + y 2 + exy , (1, 1).

(d) f (x, y) = cos xy, (0, 0).



11. Find the quadratic forms of the associated matrices:


   
(a) $A = \begin{bmatrix} 2 & 3 \\ 3 & 4 \end{bmatrix}$.  (b) $A = \begin{bmatrix} 4 & -1 \\ -1 & 3 \end{bmatrix}$.

(c) $A = \begin{bmatrix} 2 & 4 & -2 \\ 4 & 6 & 0 \\ -2 & 0 & 5 \end{bmatrix}$.  (d) $A = \begin{bmatrix} 4 & 3 & -1 \\ 3 & 5 & 1 \\ -1 & 1 & 3 \end{bmatrix}$.

12. Find the matrices associated with each of the following quadratic
forms:

(a) q(x, y) = 5x2 + 7y 2 + 12xy.

(b) q(x, y) = 3x2 + 2y 2 − 4xy.

(c) q(x, y, z) = 7x2 + 6y 2 + 2z 2 − 3xy − 3xz − 3yz.

(d) q(x, y, z) = x2 + 2y 2 + 3z 2 − 2xy + 2xz + 2yz.

13. Classify each of the following quadratic forms as positive-definite,


negative-definite, indefinite, or none of these:

(a) q(x, y) = 3x2 + 4y 2 − 2xy.

(b) q(x, y) = 4x2 + 2y 2 + 4xy.

(c) q(x, y, z) = 3x2 + 4y 2 + 5z 2 − 2xy − 2xz − 2yz.

(d) q(x, y, z) = 4x2 + 4y 2 + 4z 2 − 4xy + xz + 2yz.

14. Use the bisection method to find solutions accurate to within 10−4
on the indicated interval of the following functions:

(a) f (x) = x5 − 2x2 + x + 1, [−1, 1].

(b) f (x) = x5 − 4x2 + x + 1, [0, 1].



(c) f (x) = x6 − 7x4 − 2x2 + 1, [2, 3].

(d) f (x) = ex − 3x2 + x + 1, [3, 4].

15. Use the bisection method to find solutions accurate to within 10−4
on the indicated intervals of the following functions:

(a) f (x) = x3 − 8x2 + 4, [7, 8].

(b) f (x) = x3 + 2x2 + x − 3, [0, 1].

(c) f (x) = x4 + 2x3 − 7x2 − x + 2, [1, 2].

(d) f (x) = ln x + 2x5 + x − 3, [0.5, 1.5].

16. Use Newton’s method to find a solution accurate to within 10−4 of


Problem 14 using the suitable initial approximation.

17. Use Newton’s method to find a solution accurate to within 10−4 using
the given initial approximations of the following functions:

(a) f (x) = x3 + 2x2 − 5, x0 = 1.5.

(b) f (x) = x3 − 5x2 + 3x − 2, x0 = 4.5.

(c) f (x) = x4 − 3x3 − 4x2 + 3x + 5, x0 = 3.5.

(d) f (x) = ex − x2 /2 + x + 1, x0 = −0.5.

18. Use the fixed-point iteration to find a solution accurate to within


10−4 using the given intervals and initial approximations of the fol-
lowing functions:

(a) f (x) = 2x3 − 5x + 2, [1, 2], x0 = 1.5.

(b) f (x) = 3x3 − 7x2 − 2x + 1, [0, 1], x0 = 0.5.



(c) f (x) = x4 − 3x3 + 5x2 − 6x + 1, [1, 2], x0 = 1.5.

(d) f (x) = ex − 4x + 1, [0, 1], x0 = 0.5.


19. Solve the following system by Newton’s method using the indicated
initial approximation (x0 , y0 ) and stop when successive iterates differ
by less than 10^(−4) :

(a) (x0 , y0 ) = (1, 1)


2x3 + y = 4,
x2 y = 2.
(b) (x0 , y0 ) = (−1, −1)
x + y 2 = −2,
2x2 + 2y = −4.
(c) (x0 , y0 ) = (−1, 1)
x2 + y 2 = 4,
x2 + y = 3.
(d) (x0 , y0 ) = (0, 3)
x + ey = 40,
sin x − y = −2.

20. Solve the following system by fixed-point iterations using the indi-
cated initial approximation (x0 , y0 ) and stop when successive iterates
differ by less than 10^(−4) :

(a) (x0 , y0 ) = (3, −1.5)


2x3 + y = 4,
xy = 1.
(b) (x0 , y0 ) = (1, 1)
x2 − 2y 2 = 12,
xy 2 = 2.

(c) (x0 , y0 ) = (0, 0.5)

x2 + y = 4,
x + y 2 = 3.

(d) (x0 , y0 ) = (0, 1)

3x + ey = 4,
x2 + y = 2.

21. Find the maximum value of the following functions using accuracy
ε = 0.005 by the golden-section search:

(a) f (x) = 3 cos x − 2x3 , [0, 2].

(b) f (x) = sin x − x^2/5, [0, 2].

(c) f (x) = (1/5)x^3 − 2x^2 − 1.75x + 1.5, [0, 1].
(d) f (x) = x6 − 5x5 + 3x4 + 2x2 + 4, [2, 4].

22. Find the extrema of the following functions using accuracy ε = 0.005
by the quadratic interpolation method for optimization:
(a) Minimize f (x) = x^3 + (3/2)x^2 − 6x + 5, x0 = 0.0, x1 = 0.5, x2 = 1.5.
(b) Maximize f (x) = x3 −3x2 −9x+2, x0 = −2.0, x1 = 0.0, x2 = 1.0.

(c) Maximize f (x) = x4 − 8x3 + 18x2 − 16x + 5, x0 = 3.9, x1 = 4.1,


x2 = 4.5.

(d) Minimize f (x) = x5 − 2x4 − x3 , x0 = 0.2, x1 = 0.5, x2 = 1.0.

23. Find the extrema of the following functions using accuracy ε = 0.005
by the quadratic interpolation method for optimization:

(a) Minimize f (x) = x3 +3x2 −15x−5, x0 = 2.0, x1 = 2.5, x2 = 3.5.



(b) Maximize f (x) = (1/3)x^3 + 2x^2 − 12x + 1, x0 = −4.0, x1 = −5.5,
x2 = −7.0.

(c) Maximize f (x) = x4 − 2x2 − 12, x0 = 1.0, x1 = 1.5, x2 = 2.5.

(d) Minimize f (x) = x5 − 20x3 + 1, x0 = 1.0, x1 = 1.5, x2 = 2.5.

24. Find the extrema of the following functions using accuracy ε = 0.005
by Newton’s method for optimization:

(a) Minimize f (x) = x3 + 3x2 − 45x + 8, x0 = 2.5.

(b) Maximize f (x) = x3 + 3x2 − 24x + 6, x0 = −3.5.

(c) Maximize f (x) = x4 − 4x3 − 20x2 + 4x + 8, x0 = 2.5.

(d) Minimize f (x) = x2/3 (x − 6)2 + 4, x0 = 5.5.

25. Find the extrema of the following functions using accuracy ε = 0.005
by Newton’s method for optimization:

(a) Minimize f (x) = 5 − 6x2 − 2x3 , x0 = 0.5.

(b) Minimize f (x) = x3 − 13x2 − 10x + 1, x0 = −1.0.

(c) Minimize f (x) = x4 − 8x2 + 1, x0 = −0.5.

(d) Maximize f (x) = x(x − 4)3 , x0 = 0.5.

26. Find the extrema and saddle points of each of the following functions:

(a) z = f (x, y) = 2x3 − 3x2 y + 6x2 − 6y 2 .

(b) z = f (x, y) = xy + ln x + y 2 − 10.

(c) z = f (x, y) = 3x2 + x2 y + 4xy + y 2 .



(d) z = f (x, y) = 1 + 4xy − 2x2 − 2y 2 .

27. Find the extrema and saddle points of each of the following functions:

(a) z = f (x, y) = 4xy − x4 − y 4 .

(b) z = f (x, y) = x2 y − 6y 2 − 3x2 .

(c) z = f (x, y) = x3 + y 3 − 6y 2 − 3x + 9.

(d) z = f (x, y) = x2 + y 2 − 3xy + 4x − 2y + 5.

28. Use the method of steepest ascent to approximate (up to given iter-
ations) the optimal solution to the following problems:

(a) Maximize z = x2 − 2y 2 + 2xy + 3y; (1.0, 1.0), iter. = 2.

(b) Maximize z = 2x2 + 2y 2 − 2xy − 2x; (0.5, 0.5), iter. = 2.

(c) Maximize z = −(x − 3)2 + (y + 1)2 − 2x; (1.5, 1.0), iter. = 2.

(d) Maximize z = (x − 2)2 − 4y 2 − 3xy + y; (2.0, 1.5), iter. = 2.

29. Use the method of steepest descent to approximate (up to the given
iterations) the optimal solution to the following problems:

(a) Minimize z = 2x2 + 2y 2 + 2xy − y; (1.0, 1.0), iter. = 2.

(b) Minimize z = x2 − 3y 2 − 4xy + 2x − y; (2.5, 1.5), iter. = 2.

(c) Minimize z = (x + 2)2 + (y − 1)2 − y; (0.0, 1.0), iter. = 2.

(d) Minimize z = (x − 1)2 + 2y 2 + xy − x; (0.5, 1.5), iter. = 2.

30. Solve Problem 29 using Newton’s method.



31. Use Lagrange multipliers to find the extrema of function f subject


to the stated constraints:
(a) Maximize z = f (x, y) = √(6 − x^2 − y^2)
Subject to g(x, y) = x + y − 2 = 0

(b) Maximize z = f (x, y) = x^(2/3) y^(1/3)


Subject to g(x, y) = x + y − 4 = 0

(c) Minimize w = f (x, y, z) = xy + 2xz + 2yz


Subject to g(x, y, z) = xyz − 12 = 0

(d) Minimize w = f (x, y, z) = x4 + y 4 + z 4


Subject to g(x, y, z) = x + y + z − 1 = 0
32. Use Lagrange multipliers to find the extrema of function f subject
to the stated constraints:

(a) Minimize z = f (x, y) = x2 − y 2


Subject to g(x, y) = x − 2y + 6 = 0

(b) Maximize z = f (x, y) = 4x3 + y 2


Subject to g(x, y) = 2x2 + y 2 − 1 = 0

(c) Minimize w = f (x, y, z) = 2x + y + 2z


Subject to g(x, y, z) = x2 + y 2 + z 2 − 4 = 0

(d) Minimize z = f (x, y, z) = x2 + y 2 + z 2


Subject to g(x, y, z) = x + y + z − 6 = 0
33. Use Lagrange multipliers to find the extrema of function f subject
to the stated constraints:

(a) w = f (x, y, z) = x2 + y 2 + z 2
Subject to
g1 (x, y, z) = x + y + z − 12 = 0
g2 (x, y, z) = x2 + y 2 − z = 0

(b) w = f (x, y, z) = x + y + z
Subject to
g1 (x, y, z) = x2 − y 2 − z = 0
g2 (x, y, z) = x2 + z 2 − 4 = 0

(c) w = f (x, y, z) = x + 2y
Subject to
g1 (x, y, z) = x + y + z − 1 = 0
g2 (x, y, z) = x2 + z 2 − 4 = 0

(d) w = f (x, y, z) = xy + yz
Subject to
g1 (x, y, z) = xy − 1 = 0
g2 (x, y, z) = y 2 + z 2 − 1 = 0

34. Use Lagrange multipliers to find the extrema of function f subject


to the stated constraints:

(a) Minimize w = f (x, y, z) = x2 + y 2 + z 2


Subject to
g1 (x, y, z) = x + 2y + 3z − 6 = 0
g2 (x, y, z) = y + z = 0

(b) Maximize w = f (x, y, z) = xyz


Subject to
g1 (x, y, z) = x + y + z − 4 = 0
g2 (x, y, z) = x + y − z = 0

(c) Maximize w = f (x, y, z) = x2 + yz


Subject to
g1 (x, y, z) = x + y + z − 5 = 0
g2 (x, y, z) = x + y − z − 2 = 0
(d) Minimize w = f (x, y, z) = y^2 e^(x^2 − z)
Subject to

g1 (x, y, z) = 9x2 + 4y 2 + 36z 2 − 36 = 0


g2 (x, y, z) = xy + yz − 1 = 0
35. Use Lagrange multipliers to find the extrema of function f subject
to the stated constraints:

(a) w = f (x, y, z) = x2 + y 2 + z 2
Subject to
g1 (x, y, z) = x − y − 1 = 0
g2 (x, y, z) = y 2 − z 2 − 1 = 0

(b) w = f (x, y, z) = x2 + y 2 + z 2
Subject to
g1 (x, y, z) = x2 + y 2 − z 2 = 0
g2 (x, y, z) = x − 4z + 5 = 0

(c) w = f (x, y, z) = x2 + y 2 + z 2
Subject to
g1 (x, y, z) = x2 + y 2 + 2z − 16 = 0
g2 (x, y, z) = x + y − 4 = 0

(d) w = f (x, y, z) = 4x − 2y − 5z
Subject to
g1 (x, y, z) = x + y − z − 1 = 0
g2 (x, y, z) = x2 + y 2 + 2z 2 − 2 = 0
36. Use the KT conditions to find a solution to the following nonlinear
programming problems:

(a) Minimize z = f (x, y) = −x − 2y + y 2


Subject to
x+y ≤1
x ≥ 0, y ≥ 0

(b) Maximize z = f (x, y) = x + 3y 2


Subject to
x−y ≤2

x ≥ 0, y≥0

(c) Maximize w = f (x, y, z) = x + y − z


Subject to
x2 + y 2 + z ≤ 1
x ≥ 0, y ≥ 0, z ≥ 0
(d) Minimize w = f (x, y, z) = y^2 e^(x^2 − z)
Subject to
g1 (x, y, z) = 9x2 + 4y 2 + 36z 2 − 36 = 0
g2 (x, y, z) = xy + yz − 1 = 0
x ≥ 0, y ≥ 0, z ≥ 0

37. Use the KT conditions to find a solution to the following nonlinear


programming problems:

(a) Minimize z = f (x, y) = x2 + 4y 2 − 4xy − x + 12y


Subject to
x+y ≥4
x ≥ 0, y ≥ 0

(b) Maximize z = f (x, y) = x2 − y 2


Subject to
−(x − 2)2 − y 2 ≤ −4
x ≥ 0, y ≥ 0

(c) Maximize z = f (x, y) = −3x + 0.5y


Subject to
x2 + y 2 ≤ 1
−x ≤ 0
−y ≤ 0

(d) Minimize w = f (x, y, z) = (x − 10)2 + (y − 10)2 + (z − 10)2


Subject to
g1 (x, y, z) = x2 + y 2 + z 2 − 12 = 0
g2 (x, y, z) = −x − y + 2z = 0

x ≥ 0, y ≥ 0, z≥0
38. Use the reduced-gradient method to find the extrema of function f
subject to the stated constraints:

(a) Minimize u = f (x, y, z, w) = x2 + y + z 2 + w


Subject to
g1 (x, y, z, w) = x + y − 2z + 2w − 2 = 0
g2 (x, y, z, w) = −x + y + 4z − 4w + 4 = 0

(b) Minimize u = f (x, y, z, w) = 2x2 + 3y + z + w2


Subject to
g1 (x, y, z, w) = 2x − y − 3z + w − 3 = 0
g2 (x, y, z, w) = x − y + 2z − 2w + 2 = 0

(c) Minimize u = f (x, y, z, w) = 3x + y 2 + z 2 + 2w


Subject to
g1 (x, y, z, w) = x − y − z + w + 1 = 0
g2 (x, y, z, w) = 3x − y + 2z − w − 2 = 0

(d) Minimize u = f (x, y, z, w) = x2 − y + z 2 − w


Subject to
g1 (x, y, z, w) = x + 2y − 3z + 2w − 5 = 0
g2 (x, y, z, w) = 2x − y − 2z − 4w − 2 = 0
39. Convert the following problems into separable forms:

(a) Minimize z = f (x, y) = 2x2 + y 2


Subject to
x+y ≤3
x ≥ 0, y ≥ 0

(b) Maximize z = f (x, y) = 3x3 − 2x2 + 2x + 5y 2 − 4y


Subject to
2x + 3y ≤ 1
x ≥ 0, y ≥ 0

(c) Maximize w = f (x, y, z) = x2 + y 2 − z 2


Subject to
x2 + 2y 2 + 3z ≤ 3
x−y+z ≤1
x ≥ 0, y ≥ 0, z ≥ 0

(d) Minimize w = f (x, y, z) = x2 + 2y 2 + z 2 − 2xz


Subject to
5x2 + 4y 2 + 6z 2 ≤ 13
x + 3y − 2z ≤ 2
x ≥ 0, y ≥ 0, z ≥ 0

40. Solve each of the following quadratic programming problems using


Wolfe’s method:

(a) Maximize z = f (x, y) = 2x2 + 2xy + 4y 2 − x + y


Subject to
x+y ≤2
x−y ≤1
x ≥ 0, y ≥ 0

(b) Maximize z = f (x, y) = −x − y + 0.5x2 + y 2 − xy


Subject to
x+y ≤3
2x + 3y ≥ 6
x ≥ 0, y ≥ 0
Appendix A

Number Representations and Errors

A.1 Introduction
Here, we study in broad outline the floating-point representation used in
computers for real numbers and the errors that result from the finite na-
ture of this representation. We give a good general overview of how the
computer represents and manipulates numbers. We see later that such con-
siderations affect the choice of design of computer algorithms for solving
higher-order problems. We introduce several definitions and concepts that
may be unfamiliar. The reader should not spend time trying to master
all these immediately but should rather try to acquire a rough idea of the
sorts of difficulties that can arise from computer solutions of mathematical
problems. We describe methods for representing numbers on computers
and the errors introduced by these representations. In addition, we examine
other sources of various types of computational errors.

A.2 Number Representations and the Base of Numbers
The number system we use daily is called the decimal system. The base
of the decimal number system is 10. The familiar decimal notation for
numbers employs the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. When we write down a
whole number such as 478325, the individual digits represent coefficients
of powers of 10 as follows:

478325 = 5 + 20 + 300 + 8000 + 70000 + 400000
        = 5 × 10^0 + 2 × 10^1 + 3 × 10^2 + 8 × 10^3 + 7 × 10^4 + 4 × 10^5 .

Thus, in general, a string of digits represents a number according to the


formula

a_n a_(n−1) · · · a_1 a_0 = a_0 × 10^0 + a_1 × 10^1 + · · · + a_(n−1) × 10^(n−1) + a_n × 10^n . (A.1)

This takes care of the positive whole numbers. A number between 0 and
1 is represented by a string of digits to the right of a decimal point. For
example,

0.8543 = 8 × 10^(−1) + 5 × 10^(−2) + 4 × 10^(−3) + 3 × 10^(−4) .

Thus, in general, a string of digits represents a number according to the


formula

0.b_1 b_2 b_3 · · · = b_1 × 10^(−1) + b_2 × 10^(−2) + b_3 × 10^(−3) + · · · . (A.2)

For a real number of the form

a_n a_(n−1) · · · a_1 a_0 . b_1 b_2 · · · = Σ_(k=0)^n a_k 10^k + Σ_(k=1)^∞ b_k 10^(−k) , (A.3)

the integer part is the first summation in the expansion and the fractional
part is the second. Computers, however, don’t use the decimal system in
computations and memory but use the binary system. The binary system is
natural for computers because computer memory consists of a huge number

of electronic and magnetic recording devices, of which each element has


only “on” and “off” states. In the binary system the base is 2, and the
integer coefficients may take the values 0 or 1. The digits 0 and 1 are called
bits, which is short for binary digits. For example, the number 1110.11 in
the binary system represents the number

1 × 2^3 + 1 × 2^2 + 1 × 2^1 + 0 × 2^0 + 1 × 2^(−1) + 1 × 2^(−2)

in the decimal system.


There are other base systems used in computers, particularly, the octal
and hexadecimal systems. The base for the octal system is 8 and for the
hexadecimal it is 16. These two systems are close relatives of the binary
system and can be translated to and from binary easily. Expressions in
octal or hexadecimal form are shorter than in binary form, so they are
easier for humans to read and understand. Hexadecimal form also provides
more efficient use of memory space for real numbers. If we use another base,
say, β, then numbers represented in the β system look like this:
(a_n a_(n−1) · · · a_1 a_0 . b_1 b_2 · · ·)_β = Σ_(k=0)^n a_k β^k + Σ_(k=1)^∞ b_k β^(−k) , (A.4)

and the digits are 0, 1, 2, . . . , β − 1 in this representation. If β > 10, it is


necessary to introduce symbols for 10, 11, . . . , β − 1. In this system based
on 16, we use A, B, C, D, E, F for 10, 11, 12, 13, 14, 15, respectively. Thus,
for example,

(2BED)16 = D + E × 16 + B × 16^2 + 2 × 16^3 = 11245.

The base of a number system is also called the radix. The base of a number
is denoted by a subscript, for example, (4.445)10 is 4.445 in base 10 (deci-
mal), (1011.11)2 is 1011.11 in base 2 (binary), and (18C7.90)16 is 18C7.90
in base 16 (hexadecimal).
The conversion of an integer from one system to another is fairly simple
and can probably best be presented in terms of an example. Let k = 275
in decimal form, i.e., k = (2 × 10^2 ) + (7 × 10^1 ) + (5 × 10^0 ). Now (k/16^2 ) > 1
but (k/16^3 ) < 1, so in hexadecimal form k can be written as k = (α_2 ×
16^2 ) + (α_1 × 16^1 ) + (α_0 × 16^0 ). Now, 275 = 1(16^2 ) + 19 = 1(16^2 ) + 1(16) + 3,

and so the decimal integer, 275, can be written in hexadecimal form as


113, i.e.,
275 = (275)10 = (113)16 .
The reverse process is even simpler. For example,
(5C3)16 = 5(16^2 ) + 12(16) + 3 = 1280 + 192 + 3 = (1475)10 .
Conversion of a hexadecimal fraction to a decimal is similar. For example,
(0.2A8)16 = (2/16) + (A/16^2 ) + (8/16^3 ) = (2(16^2 ) + 10(16) + 8)/16^3
= 680/4096 = 0.166,
carrying only three digits in decimal form. Conversion of a decimal frac-
tion to a hexadecimal (or a binary) proceeds as in the following example.
Consider the number r_1 = 1/10 = 0.1 (decimal form). Then there exist
constants {α_k}, k = 1, 2, . . . , such that

r_1 = 0.1 = α_1 /16 + α_2 /16^2 + α_3 /16^3 + · · · .

Now
16r_1 = 1.6 = α_1 + α_2 /16 + α_3 /16^2 + · · · .
Thus, α_1 = 1 and
r_2 ≡ .6 = α_2 /16 + α_3 /16^2 + α_4 /16^3 + · · · .
Again,
16r_2 = 9.6 = α_2 + α_3 /16 + α_4 /16^2 + · · · ,
so α_2 = 9, and
r_3 ≡ .6 = α_3 /16 + α_4 /16^2 + · · · .
From this stage on we see that the process will repeat itself, and so we have
(0.1)10 equals the infinitely repeating hexadecimal fraction, (0.1999 · · ·)16 .
Since 1 = (0001)2 and 9 = (1001)2 we also have the infinite binary expan-
sion
r1 = (0.1)10 = (0.1999 · · ·)16 = (0.000110011001 · · ·)2 .
Example A.1 The conversion from one base to another base is:
(17)10 = (10001)2 = (21)8 = (11)16
(13.25)10 = (1101.01)2 = (15.2)8 = (D.4)16 .
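These conversions are easy to check in the MATLAB Command Window with the built-in functions dec2bin, dec2base, and base2dec (a quick sketch; these functions handle only the integer part, so the fractional part of 13.25 is treated separately here):

>> dec2bin(17)              % gives '10001'
>> dec2base(17, 8)          % gives '21'
>> dec2base(17, 16)         % gives '11'
>> dec2base(13, 16)         % integer part of 13.25, gives 'D'
>> base2dec('15', 8) + 2/8  % rebuilds (15.2)_8 and returns 13.25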

A.2.1 Normalized Floating-Point Representations


Unless numbers are specified to be integers, they are stored in the com-
puters in what is known as normalized floating-point form. This form is
similar to the scientific notation used as a compact form for writing very
small or very large numbers. For example, the number 0.0000123 may be
written in scientific notation as 0.123 × 10^(−4) .
In general, every nonzero real number x has a floating-point represen-
tation
x = ±M × r^e ,   where 1/r ≤ M < 1, or −1/r ≥ M > −1,
where
M = Σ_(k=1)^t d_k r^(−k) .

Here, M is called the mantissa, e is an integer called the exponent, r the
base, d_k is the value of the kth digit, and t is the maximum number of digits
allowed in the number. When r = 10, then the nonzero real number x has
the normalized floating-point decimal representation

x = ±M × 10^e ,

where the normalized mantissa M satisfies 1/10 ≤ M < 1. Normalization
consists of finding the exponent e for which |x|/10^e lies on the interval
[1/10, 1), then taking M = |x|/10^e . This corresponds to “floating” the dec-
imal point to the left of the leading significant digit of x’s decimal repre-
sentation, then adjusting e as needed. For example,
−12.75 has representation −0.1275 × 10^2 (M = 0.1275, e = 2)
0.1 has representation +0.1 × 10^0 (M = 0.1, e = 0)
1/15 = 0.066 · · · has representation (1/15 × 10) × 10^(−1) (M = 0.66 · · · , e = −1).
A machine number for a calculator is a real number that it stores exactly
in normalized floating-point form. For the calculator storage, a nonzero x
is a machine number, if and only if its normalized floating decimal point
representation is of the form

x = ±M × 10^e ,

where
M = 0.d_1 d_2 · · · d_k (d_k = 0, 1, 2, . . . , 9), with d_1 ≠ 0
e = −100, −99, · · · , +99.
The condition d_1 ≠ 0 ensures normalization (i.e., M ≥ 1/10).

Computers use a normalized floating-point binary representation for


real numbers. The computer stores a binary approximation to x as

x = ±M × 2^e .

Normalization in this case consists of finding the unique exponent e for
which |x|/2^e lies on the interval [1/2, 1), and then taking |x|/2^e as M. For
example,

−12.75 = −51/4 can be represented as −(51/4 · 1/16) · 2^4 (M = 51/64, e = 4)

0.1 = 1/10 can be represented as +(1/10 · 8) · 2^(−3) (M = 4/5, e = −3)

1/15 = 0.06666 · · · can be represented as (1/15 · 8) · 2^(−3) (M = 8/15, e = −3).

Computers have both an integer mode and a floating-point mode for


representing numbers. The integer mode is used for performing calculations
that are known to be integer values and have limited usage for numerical
analysis. Floating-point numbers are used for scientific and engineering
applications. It must be understood that any computer implementation
of the equation x = ±M × 2^e places restrictions on the number of digits
used in the mantissa M, and the range of the possible exponent e must
be limited. Computers that use 32 bits to represent single-precision real
numbers use 8 bits for the exponent and 24 bits for the mantissa. They can
represent real numbers whose magnitude is in the range 2.938736E−39
to 1.701412E+38 (i.e., 2^(−128) to 2^127 ), with six decimal digits of numerical
precision (for example, 2^(−23) = 1.2 × 10^(−7) ).
Computers that use 48 bits to represent single-precision real numbers
might use 8 bits for the exponent and 40 bits for the mantissa. They can
represent real numbers in the range 2.9387358771E−39 to 1.701418346E+38
(i.e., 2^(−128) to 2^127 ) with 11 decimal digits of precision (for example,

2^(−39) = 1.8 × 10^(−12) ). If the computer has 64-bit double-precision real


numbers, it might use 11 bits for the exponent and 53 bits for the mantissa.
They can represent real numbers in the range 5.56284646268003 × 10^(−309)
to 8.988465674311580 × 10^307 (i.e., 2^(−1024) to 2^1023 ) with about 16 decimal
digits of precision (for example, 2^(−52) = 2.2 × 10^(−16) ).
The most commonly used floating-point representations are the IEEE
binary floating-point standards. There are two such formats: single and
double precision. IEEE single precision uses a 32-bit word to represent the
sign, exponent, and mantissa. Double precision uses 64 bits. These bits
are distributed as shown below.

No. of Bits    Single Precision    Double Precision

Sign                  1                    1
Exponent              8                   11
Mantissa             24                   53

In all essential respects, MATLAB uses only one type of number—IEEE


double precision floating-point. In other words, MATLAB uses pairs of
these to represent double floating-point complex numbers, but that will
not affect much of what we do here. Integers are stored in MATLAB as
“floating integer” which means, in essence, that integers are stored in their
floating-point representations.

MATLAB Variable    Meaning                      Value

eps                Machine unit                 2^(−52) ≈ 2.22044604925031e−016
realmin            Smallest positive number     2.22507385850723e−308
realmax            Largest positive number      1.79769313486232e+308

From this we see that the above representation uses 11 bits for the binary
exponent, which therefore ranges from about −2^10 to 2^10 . (The actual range
is not exactly this because of special representations for small numbers
and for ±∞.) The mantissa has 53 bits including the implicit bit. If x =
M × 2^e is a normalized MATLAB floating-point number, then M ∈ [1, 2)

is represented by

M = 1 + Σ_(k=1)^52 d_k 2^(−k) .

Since 2^10 = 1024 ≈ 10^3 , these 53 significant bits are equivalent to approxi-
mately 16 significant decimal digits accuracy in MATLAB representation.
The fact that the mantissa has 52 bits after the binary point means that
the next machine number greater than 1 is 1 + 2^(−52) . This gap is called
machine epsilon.
In MATLAB, neither underflow nor overflow causes a program to stop.
Underflow is replaced by a zero, while overflow is replaced by ±∞. This
allows subsequent instructions to be executed and may permit meaningful
results. Frequently, however, it will result in meaningless answers such
as ±∞ or NaN, which stands for Not-a-Number. NaN is the result of
indeterminate arithmetic operations such as 0/0, ∞/∞, 0 · ∞, ∞ − ∞, etc.
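The quantities eps, realmin, and realmax from the table above, and the overflow and NaN behavior just described, can be inspected directly in the MATLAB Command Window; a small sketch:

>> eps                % machine epsilon, 2^(-52), about 2.2204e-16
>> (1 + eps) > 1      % returns 1 (true): 1 + eps is the next machine number after 1
>> (1 + eps/2) > 1    % returns 0 (false): the increment is lost in rounding
>> realmin, realmax   % smallest and largest positive double-precision numbers
>> 2*realmax          % overflow is replaced by Inf
>> 0/0                % indeterminate operation gives NaN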
There are two commonly used ways of translating a given real number
x into a k-digit floating-point number, rounding and chopping, which we
shall discuss in the following section.

A.2.2 Rounding and Chopping


When one gives the number of digits in a numerical value, one should not
include zeros in the beginning of the number, as these zeros only help to
denote where the decimal point should be. If one is counting the number
of decimals, one should of course include leading zeros to the right
of the decimal point. For example, the number 0.00123 is given with three
digits but has five decimals. The number 11.44 is given with four digits but
has two decimals. If the magnitude of the error in approximate number
p does not exceed 1/2 × 10^(−k) , then p is said to have k correct decimals.
The digits in p, which occupy positions where the unit is greater than
or equal to 10^(−k) , are called, then, significant digits (any initial zeros are
not counted). For example, 0.001234 ± 0.000004 has five correct decimals
and three significant digits, while 0.0012342 ± 0.000006 has four correct
decimals and two significant digits. The number of correct decimals gives
one an idea of the magnitude of the absolute error, while the number of
significant digits gives a rough idea of the magnitude of the relative error.

There are two ways of rounding off a number to a given number (k) of
decimals. In chopping, one simply leaves off all the decimals to the right
of the kth. That way of abridging a number is not recommended since
the error has, systematically, the opposite sign of the number itself. Also,
the magnitude of the error can be as large as 10^(−k) . A surprising number
of computers use chopping on the results of every arithmetical operation.
This usually does not do much harm, because the number of digits used
in the operations is generally far greater than the number of significant
digits in the data. In rounding (sometimes called “correct rounding”), one
chooses among the numbers that are closest to the given number. Thus,
if the part of the number which stands to the right of the kth decimal is
less than 1/2 × 10^(−k) in magnitude, then one should leave the kth decimal
unchanged. If it is greater than 1/2 × 10^(−k) , then one raises the kth decimal
by 1. In the boundary case, when the number that stands to the right
of the kth decimal is exactly 1/2 × 10^(−k) , one should raise the kth decimal
if it is odd or leave it unchanged if it is even. In this way, the error is
positive or negative about equally often. Most computers that perform
rounding always, in the boundary case mentioned above, raise the number
by 1/2 × 10^(−k) (or the corresponding operation in a base other than 10),
because this is easier to realize technically. Whichever convention one
chooses in the boundary case, the error in the rounding will always lie
on the interval [−1/2 × 10^(−k) , 1/2 × 10^(−k) ]. For example, shortening to three
decimals:

0.2397 rounds to 0.240 (is chopped to 0.239)


−0.2397 rounds to − 0.240 (is chopped to − 0.239)
0.23750 rounds to 0.238 (is chopped to 0.237)
0.23650 rounds to 0.236 (is chopped to 0.236)
0.23652 rounds to 0.237 (is chopped to 0.236)
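The two conventions can be imitated in MATLAB with round and fix; a minimal sketch for three decimals (note that MATLAB's round breaks ties away from zero rather than to the even digit described above):

>> x = 0.2397;
>> round(x*1000)/1000   % rounding to three decimals gives 0.2400
>> fix(x*1000)/1000     % chopping to three decimals gives 0.2390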

A.3 Error
An approximate number p is a number that differs slightly from an exact
number α. We write
p ≈ α.

By error E of an approximate number p, we mean the difference between


the exact number α and its computed approximation p. Thus, we define

E = α − p. (A.5)

If α > p, the error E is positive, and if α < p, the error E is negative. In


many situations, the sign of the error may not be known and might even
be irrelevant. Therefore, we define absolute error as

|E| = |α − p|. (A.6)

The relative error RE of an approximate number p is the ratio of the


absolute error of the number to the absolute value of the corresponding
exact number α. Thus,
RE = |α − p| / |α| ,   α ≠ 0. (A.7)

If we approximate 1/3 by 0.333, we have

E = 1/3 × 10^(−3) and RE = 10^(−3) .
Note that the relative error is generally a better measure of the extent of
error than the actual error. But one should also note that the relative
error is undefined if the exact answer is equal to zero. Generally, we shall
be interested in E (or sometimes |E|) rather than RE, but when the true
value of a quantity is very small or very large, relative errors are more
meaningful. For example, if the true value of a quantity is 10^15 , an error
of 10^6 is probably not serious, but this is more meaningfully expressed by
saying that RE = 10^(−9) . In actual computation of the relative error, we
shall often replace the unknown true value by the computed approximate
value. Sometimes the quantity

|α − p| / |α| × 100% (A.8)
is defined as percentage error. From the above example, we have

P E = 0.001 × 100 = 0.1%.
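For the approximation p = 0.333 to α = 1/3 used above, these error measures can be reproduced in the MATLAB Command Window (the variable names are only for illustration):

>> alpha = 1/3; p = 0.333;
>> E = alpha - p                    % error, about 3.3333e-04
>> RE = abs(alpha - p)/abs(alpha)   % relative error, 1.0e-03
>> PE = RE*100                      % percentage error, 0.1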



In investigating the effect of the total error in various methods, we shall


often mathematically derive an error called the error bound, which is a
limit on how large the error can be. We shall have reason to compute error
bounds in many situations. This applies to both absolute and relative
errors. Note that the error bound can be much larger than the actual error
and that this is often the case in practice. Any mathematically derived
error bound must account for the worst possible case that can occur and is
often based upon certain simplifying assumptions about the problem which
in many practical cases cannot be actually tested. For the error bound to
be used in any practical way, the user must have a good understanding of
how the error bound was derived in order to know how crude it is, i.e., how
likely it is to overestimate the actual error. Of course, whenever possible,
our goal is to eliminate or lessen the effects of errors, rather than trying to
estimate them after they occur.

A.4 Sources of Errors


In analyzing the accuracy of numerical results, one should be aware of the
possible sources of error in each stage of the computational process and
of the extent to which these errors can affect the final answer. We will
consider that there are three types of errors that occur in a computation.
We discuss them step-by-step as follows.

A.4.1 Human Errors


These types of errors arise when the equations of the mathematical model
are formed, due to sources such as the idealistic assumptions made to
simplify the model, inaccurate measurements of data, miscopying of figures,
the inaccurate representation of mathematical constants (for example, if
the constant π occurs in an equation, we must replace π by 3.1416 or
3.141593), etc.

A.4.2 Truncation Errors


This type of error is caused when we are forced to use mathematical tech-
niques that give approximate, rather than exact, answers. For example,

suppose that we use Maclaurin’s series expansion to represent sin x, so that


sin x = x − x^3/3! + x^5/5! − x^7/7! + · · · .

If we want a number that approximates sin(π/2), we must terminate the
expansion in order to obtain

sin(π/2) = π/2 − (π/2)^3/3! + (π/2)^5/5! − (π/2)^7/7! + E,
where E is the truncation error introduced in the calculation. Truncation
errors in numerical analysis usually occur because many numerical meth-
ods are iterative in nature, with the approximations theoretically becoming
more accurate as we take more iterations. As a practical matter, we must
stop the iterations after a finite number of steps, thus introducing a trun-
cation error. Taylor’s series is the most important means used to derive
numerical schemes and to analyze truncation errors.
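A short MATLAB sketch makes this truncation error concrete: summing the four terms of the Maclaurin expansion shown above and comparing with the exact value sin(π/2) = 1 gives the truncation error E.

>> x = pi/2;
>> approx = x - x^3/factorial(3) + x^5/factorial(5) - x^7/factorial(7);
>> E = sin(x) - approx   % truncation error, about 1.6e-04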

A.4.3 Round-off Errors


These errors are associated with the limited number of digits used to rep-
resent numbers in a computer. For example, by rounding off 1.32463672
to six decimal places to give 1.324637, any further calculation involving
such a number will also contain an error. We round off numbers according
to the following rules:
1. If the first discarded digit is less than 5, leave the remaining digits of
the number unchanged. For example,

48.47263 ≈ 48.4726.

2. If the first discarded digit exceeds 5, add 1 to the last retained digit.


For example,
48.4726 ≈ 48.473.

3. If the first discarded digit is exactly 5 and there are nonzero digits among
those discarded, add 1 to the last retained digit. For example,

3.0554 ≈ 3.06.

4. If the first discarded digit is exactly 5 and all other discarded dig-
its are zero, the last retained digit is left unchanged if it is even,
otherwise, 1 is added to it. For example,
3.05500 ≈ 3.06
3.04500 ≈ 3.04.

With these rules, the error is never larger in magnitude than one-half unit
of the place of the nth digit in the rounded number.
To understand the nature of round-off errors, it is necessary to learn
the ways numbers are stored and additions and subtractions are performed
in a computer. •

A.5 Effect of Round-off Errors in Arithmetic Operations
Here, we discuss the effect of rounding off errors in calculations in detail.
Let ar be the rounded off value ae , which is the exact value of a number
which is not necessarily known. Similarly, br , be , cr , ce , etc., are the corre-
sponding values for other numbers. The number EA = ar − ae is called the
error in the number ar . Similarly, EB is the error in br , etc. The error EA
will be positive or negative accordingly as ar is greater or less than ae . It
is, however, usually impossible to determine the sign of EA . Therefore, it is
normal to consider only the value of |EA |, called the absolute error of num-
ber ar . To indicate that a number has been rounded off to two significant
figures or four decimal places it is followed by 2S or 4dp as appropriate.

A.5.1 Round-off Errors in Addition and Subtraction


Let ar and br be two approximate numbers that have been rounded off,
and let their sum cr be represented by
cr = ar + br , (A.9)
which is an approximation for
ce = ae + be . (A.10)

Then by subtracting (A.10) from (A.9), we have


cr − ce = (ar − ae ) + (br − be )
EC = EA + EB . (A.11)
Then
|EC | ≤ |EA | + |EB |, (A.12)
i.e., the absolute error of the sum of two numbers is less than or equal to
the sum of the absolute error of the two numbers. Note that this can be
extended to the sum of any number. One should follow a similar argument
for the error involved in the difference of two rounded off numbers, i.e.,
c r = ar − b r , with ar > br , (A.13)
and one should find that the same result is obtained, which is
|EC | ≤ |EA | + |EB |. (A.14)
This can also be extended to any number of terms. For example, con-
sider the error in the sum 1.015 + 0.3572, where both numbers have
been rounded off. The first number 1.015 (ar ) has been rounded off to 3dp
so that the exact value must lie between 1.0145 and 1.0155, which implies
that −0.0005 ≤ EA ≤ 0.0005. This means that the absolute error is never
greater than 0.0005 or 1/2 × 10^(−3) , i.e., |EA | ≤ 1/2 × 10^(−3) . Note that if a
number is rounded off to n decimal places, then the absolute error is less
than or equal to 1/2 × 10^(−n) . Similarly, if the other given number 0.3572 (br )
has been rounded off to 4dp, then |EB | ≤ 1/2 × 10^(−4) .
Since
c r = ar + b r ,
then
cr = 1.015 + 0.3572 = 1.3722
but
|EC | ≤ |EA | + |EB |
      ≤ (1/2 × 10^(−3) ) + (1/2 × 10^(−4) )
      ≤ (0.5 + 0.05) × 10^(−3)
      ≤ 0.55 × 10^(−3) .

So the exact value of this sum must be in the range


1.3722 ± 0.55 × 10^(−3) ,
i.e., between 1.37165 and 1.37275, so this result may be correctly rounded
off to 1.37, i.e., to only 2dp.
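The same bound is easy to reproduce numerically; a minimal MATLAB sketch, where the half-unit bounds 0.5e-3 and 0.5e-4 come from the rounding of ar and br:

>> ar = 1.015; br = 0.3572;
>> cr = ar + br              % 1.3722
>> EC = 0.5e-3 + 0.5e-4      % bound on |EC|, 5.5e-04
>> [cr - EC, cr + EC]        % interval containing the exact sum: [1.37165, 1.37275]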

A.5.2 Round-off Errors in Multiplication


Let ar and br be the rounded off values and cr be the product of these
numbers, i.e.,
c r = ar b r ,
the number which approximates to the exact number
c e = ae b e .
Then
cr − EC = (ar − EA )(br − EB )

= ar br − EA br − EB ar + EA EB ,
since cr = ar br , so
EC = EA br + EB ar − EA EB
and
EC /cr = (EA br + EB ar − EA EB )/(ar br )
       = EA /ar + EB /br − (EA EB )/(ar br ).
The last term has as its numerator the product of two very small num-
bers, both of which will also be small compared with ar and br so we neglect
the last term, then we have
EC /cr = EA /ar + EB /br . (A.15)
The number EA /ar is called the relative error in ar . Then from (A.15), we
have
|EC /cr | ≤ |EA /ar | + |EB /br |. (A.16)


Hence, the relative error modulus of a product is less than or equal to


the sum of the relative error moduli of the factors of the product. Having
found the relative error modulus of a product from this result, the absolute
error is usually then obtained by multiplying that relative error modulus by
the modulus of the computed product, as in the example below; the relative
error modulus of a single number such as ar is RE = |EA |/|ar |.

This result can be extended to the product of more than two numbers
and simply increases the number of terms on the right-hand side of the
formula. For example, consider the error in 1.015 × 0.3573 where both
numbers have been rounded off. Then

1.015 × 0.3573 = 0.3626595 ≈ 0.363.

So the relative error modulus is given by

|EC /0.363| ≤ (1/2 × 10^(−3) )/1.015 + (1/2 × 10^(−4) )/0.3573
            ≤ (0.49 × 10^(−3) ) + (1.4 × 10^(−4) ).

Hence,

|EC /0.363| ≤ (0.49 × 10^(−3) ) + (0.14 × 10^(−3) )
            ≤ 0.63 × 10^(−3) .

So, we have
|EC | ≤ 0.63 × 0.363 × 10^(−3)
      ≤ 0.23 × 10^(−3) .

Hence, the exact value of this product lies in the range

0.3626595 ± 0.00023,

i.e., between 0.3624295 and 0.3628895, so that this result may be correctly
rounded off to 0.36, i.e., to 2dp.

A.5.3 Round-off Errors in Division


Let ar and br be rounded off values and cr be the division of these numbers,
i.e.,
cr = ar /br ,
the number which approximates to the exact number

ce = ae /be .

Then
cr − EC = (ar − EA )/(br − EB )
        = ar (1 − EA /ar ) / [br (1 − EB /br )]
        = (ar /br )(1 − EA /ar )(1 − EB /br )^(−1) .
The number (1 − EB /br )^(−1)
is expanded using the binomial series and neglecting those terms involving
powers of the relative error EB /br . Thus,

cr − EC = (ar /br )(1 − EA /ar )(1 + EB /br + · · ·)
        = (ar /br )(1 − EA /ar + EB /br )   (neglecting EA EB /(ar br ))
        = ar /br − EA /br + EB ar /br^2 ,

which implies that

EC = EA /br − EB ar /br^2

and
EC /cr = (EA /br − EB ar /br^2 ) ÷ (ar /br )
       = EA /ar − EB /br .
Hence,
|EC /cr | ≤ |EA /ar | + |EB /br |,

which gives the same result as for the product of the two numbers. It
follows that it is possible to extend this result to quotients with two or
more factors in the numerator or denominator by simply increasing the
number of terms on the right-hand side. For example, consider the error
in 17.28 ÷ 2.136, where both numbers have been rounded off. Then

17.28/2.136 = 8.0898876 ≈ 8.09.

Therefore,
1
× 10−2 1
× 10−3

EC 2 2
8.09 ≤ +

17.28 2.136

≤ (0.029 × 10−2 ) + (0.23 × 10−3 )


≤ (0.29 × 10−3 + 0.23 × 10−3 )
≤ 0.52 × 10−3 ,
so that
|EC | ≤ 4.2 × 10−3
≤ 0.42 × 10−2 .
Hence, the exact value of this quotient lies in the range

8.08989 ± 0.000432,

i.e., between 8.08569 and 8.09409, so that this result may be correctly
rounded off to 8.09, i.e., to 2dp. The value of |EC | suggested this directly.
This could be given to 3dp as 8.090, but with a large error of up to 5 units
in the third decimal place.

Example A.2 Consider the error in 5.381+(5.96×17.89), where all num-


bers have been rounded off. We first find the absolute error in |EC |. So
5.96 × 17.89 = 106.6244 ≈ 106.6,
then
|EC /106.6| ≤ (1/2 × 10^(−2) )/5.96 + (1/2 × 10^(−2) )/17.89
            ≤ (0.084 × 10^(−2) ) + (0.028 × 10^(−2) )
            ≤ 0.112 × 10^(−2) ,
which gives
|EC | ≤ 0.12.
The absolute error for 5.381 is 1/2 × 10^(−3) , so that the maximum absolute
error for the sum is
0.12 + 0.0005 = 0.1205.
But by the calculator
5.381 + (5.96 × 17.89) = 112.0054,
the exact value lies in the range
112.0054 ± 0.1205,
i.e., between 111.8849 and 112.1259. This means that the result may be
correctly rounded off to 3S or 0dp, as the error of 0.1205 suggests, or
could be given as 112.0 with an error of up to 1 unit in the first decimal
place. •

A.5.4 Round-off Errors in Powers and Roots


Let ar and br be rounded off values and
br = (ar )p ,
where the power is exact and may be rational. This approximates to the
exact number
be = (ae )p .

Using ae = ar − EA , we have
br − EB = (ar − EA )^p
        = (ar )^p (1 − EA /ar )^p
        = (ar )^p (1 − pEA /ar + · · ·) .
Using the binomial series and neglecting those terms involving powers
of the relative error EA /ar gives
br − EB = (ar )^p − pEA (ar )^(p−1) ,
which implies that
EB = pEA (ar )^(p−1) ,
and so
EB /br = pEA (ar )^(p−1) /(ar )^p
       = p EA /ar .
Hence,
|EB /br | = |p| |EA /ar | ,
i.e., the relative error modulus of a power of a number is equal to the
product of the modulus of the power and the relative error modulus of the
number. For example, consider √8.675, where 8.675 has been rounded off.
Here, p = 1/2 and by the calculator √8.675 = 2.9453, retaining 4dp. Thus,
|EB /2.945| ≤ (1/2) · (1/2 × 10^(−3) )/8.675
            ≤ 0.029 × 10^(−3) ,
so that
|EB | ≤ 0.85 × 10^(−4) .

This means that √8.675 may be correctly rounded off to 2.945, i.e., to 3dp
or may be given to 4dp with an error of up to 1 unit in the fourth decimal
place. •

A.6 Summary
In this chapter, we discussed the storage and arithmetic of numbers on a
computer. Efficient storage of numbers in computer memory requires allo-
cation of a fixed number of bits to each value. The fixed bit size translates
to a limit on the number of decimal digits associated with each number,
which limits the range of numbers that can be stored in computer memory.
The three number systems most commonly used in computing are bi-
nary (base 2), decimal (base 10), and hexadecimal (base 16). Techniques
were developed for transforming back and forth between the number sys-
tems. Binary numbers are a natural choice for computers because they
correspond directly to the underlying hardware, which features transistors
that are switched on and off.
The absolute and relative errors were discussed as measures of differ-
ence between exact x and approximate x̂. They were applied to the storage
mechanisms of chopping and rounding to estimate the maximum error in-
troduced when storing a number. Rounding is somewhat more accurate
than chopping (ignoring excess digits), but chopping is typically used be-
cause it is simpler to implement in hardware.
Round-off error is one of the principal sources of error in numerical
computations. Mathematical operations on floating-point values introduce
round-off errors because the results must be stored with a limited number
of decimal digits. In numerical calculations involving many operations,
round-off gradually corrupts the least significant digits of the results.
The other main source of error in numerical computations is called trun-
cation error. Truncation error is the error that arises when approximations
to exact mathematical expressions are used, such as the truncation of an
infinite series to a finite number of terms. Truncation error is independent
of round-off errors, although these two sources of error combine to affect
the accuracy of a computed result. Truncation error considerations are
important in many procedures and are discussed throughout the book. •

A.7 Problems
1. Convert the following binary numbers to decimal form:

(1010)2 , (100101)2 , (.1100011)2 .

2. Convert the following binary numbers to decimal form:

(101101)2 , (1010)2 , (100101)2 , (10000001)2 .

3. Write down the following ordinary numbers in terms of powers of 10:

8383, 285.625, 413.14159265 · · · .

4. Write down the following ordinary numbers in terms of powers of 10:

769825, 654285.2625, 29873.3087045 · · · .

5. Express the base of natural logarithms e as a normalized floating-


point number, using both chopping and symmetric rounding for each
of the following systems:
(a) base 10, with 4 significant figures.
(b) base 10, with 7 significant figures.
(c) base 2, with 10 significant figures.
6. Write down the normalized binary floating-point representations of
1/3, 1/5, 1/7, 1/9, and 1/10. Use enough bits in the mantissa to see the recurring
patterns.
7. Find the first five binary digits of (0.1)10 . Obtain values for the
absolute and relative errors in your results.
8. Convert the following:
(a) decimal numbers to binary numbers form.

165, 3433, 111, 2345, 278.5, 347.45

(b) decimal numbers to hexadecimal decimal numbers.

1025, 278.5, 14.09375, 1445, 347.45



(c) hexadecimal numbers to both decimal and binary.


1F.C, F F F.118, 1A4.C, 1023, 11.1

9. If a = 111010, b = 1011, then evaluate a + b, a − b, ab, and a/b.


10. Find the following expressions in binary form:
(a) 101 + 11 + 110110 + 110101 − 1101 − 1010.
(b) (111)2 − (110)2 .
(c) (11011)(101101).
(d) (101111001)/(10111).
11. What is the absolute error in approximating 1/3 by 0.3333? What is
the corresponding relative error?
12. Evaluate the absolute error in each of the following calculations and
give the answer to a suitable degree of accuracy:
(a) 9.01 + 9.96.
(b) 4.65 − 3.429.
(c) 0.7425 × 0.7199.
(d) 0.7078 ÷ 0.87.
13. Find the absolute and relative errors in approximating π by 3.1416.
What are the corresponding errors in the approximation 100π ≈
314.16?
14. Calculate the error, relative error, and number of significant digits in
the following approximations, with p ≈ x:
(a)
x = 25.234, p = 25.255.
(b)
x = e, p = 19/7.
(c)
x = √2, p = 1.414.
15. Write each of the following numbers in (decimal) floating-point form,
starting with the word length m and the exponent e:
13.2, −12.532, 2/125.

16. Find absolute error in each of the following calculations (all numbers
are rounded):
(a)
187.2 + 93.5.
(b)
0.281 × 3.7148.
(c)
√28.315.
(d)
√((6.2342 × 0.82137)/27.268).
Appendix B

Mathematical Preliminaries

This appendix presents some of the basic mathematical concepts that are
used frequently in our discussion. We start with the concept of vector
space, which is useful for the discussion of matrices and systems of linear
equations. We also give a review of complex numbers and how they can
be used in linear algebra. This appendix is also devoted to general inner
product spaces and how the different notations and processes generalize.

B.1 The Vector Space


In dealing with systems of linear equations we notice that solutions to linear
systems can be points in the plane if the equations have two variables,
points in three-space if they are equations in three variables, points in four-space if
they have four variables, and so on. The solutions make up subsets of large
spaces. Here, we set out to investigate the spaces and their subsets and to
develop mathematical structures on them. The spaces that we construct
are called vector spaces and they arise in many areas of mathematics.

A vector space V is intuitively a set together with the operations of


addition and multiplication by scalars. If we restrict the scalars to be the
set of real numbers, then the vector space V is called a vector space over
the real numbers. If the scalars are allowed to be complex numbers, then
it is called a vector space over the complex numbers.
Many quantities in the physical sciences are vectors because they have
both a magnitude and a direction associated with them. Examples are
velocity, force, and angular momentum. We start a discussion of vectors
in two dimensions because their properties are easy to visualize and the
results are readily extended to three (or more) dimensions.

B.1.1 Vectors in Two Dimensions


Two-dimensional vectors can be defined as ordered pairs of real numbers
(a, b) that obey certain algebraic rules. The numbers a and b are called
components of the vector. The vector (a, b) can be represented geometri-
cally by a directed line segment (arrow) from the origin of a coordinate
system to the point. As shown by Figure B.1, we use P~Q to denote the
vector with initial point P and terminal point Q and indicate the direction
of the vector by placing an arrow head at Q. The magnitude of P~Q is the
length of the segment P~Q and is denoted by kP~Qk. Vectors that have the
same length and the same direction are equal.

Figure B.1: Geometric representation of vectors.



Definition B.1 (Magnitude of a Vector)

The magnitude (norm or length) of a vector u = < u1 , u2 > is denoted by
‖u‖ and is defined as

‖u‖ = ‖< u1 , u2 >‖ = √(u1^2 + u2^2) .

For example, if u = < −4, 3 >, then

‖u‖ = ‖< −4, 3 >‖ = √((−4)^2 + 3^2) = √25 = 5,
which is called the magnitude of the given vector. •
The norm of a vector can be obtained using MATLAB command window
as follows:

>> u = [-4 3];
>> v = norm(u);
Operations on Vectors

Let u =< u1 , u2 > and v =< v1 , v2 > be two vectors, then:


1. We can multiply a vector u by a scalar k, the result being
ku = k < u1 , u2 >=< ku1 , ku2 > .
Geometrically, the magnitude of u is changed by this operation. If
k > 0, the length of u is scaled by a factor of k; if k < 0, then the
direction of u is reversed and the length is scaled by |k|. If k = 0, we have the
zero vector, i.e., 0 =< 0, 0 > .
2. The addition of two vectors is defined as
u + v =< u1 , u2 > + < v1 , v2 >=< u1 + v1 , u2 + v2 > .
The u + v is the vector connecting the tail of u to the head on v.
3. The subtraction of two vectors is defined as
u − v =< u1 , u2 > − < v1 , v2 >=< u1 − v1 , u2 − v2 > .

4. The vector addition is commutative and associative

u+v =v+u

(u + v) + w = u + (v + w).

5. The two vectors i =< 1, 0 > and j =< 0, 1 > have magnitude 1 and
can be used to obtain another way of denoting vectors as

u =< u1 , u2 >= u1 i + u2 j.

Figure B.2: Operations on vectors.

Definition B.2 (Unit Vector)

If a ≠ 0, then the unit vector u that has the same direction as a is defined
as
u = a/‖a‖ .
For example, the unit vector u that has the same direction as 4i − 3j is

u = (4i − 3j)/√(4^2 + (−3)^2) = (4i − 3j)/5 = (4/5) i − (3/5) j,

called the unit vector. •



The unit vector can be obtained using the MATLAB Command Window
as follows:

>> a = [4 -3];
>> u = a/norm(a);
Now we define two useful concepts that involve vectors u and v—the dot
product, which is a scalar, and the cross product, which is a vector. First,
we define the dot product of two vectors as follows.
Definition B.3 (Dot Product of Vectors)

The multiplication for two vectors u =< u1 , u2 > and v =< v1 , v2 > is
called the dot product (or scalar product) and is symbolized by u.v. It is
defined as
u.v =< u1 , u2 > . < v1 , v2 >= u1 v1 + u2 v2 .
For example, if u =< 3, −4 > and v =< 2, 3 > are two vectors, then

u.v =< 3, −4 > . < 2, 3 >= (3)(2) + (−4)(3) = 6 − 12 = −6

is their dot product. •


The dot product of two vectors can be obtained using the MATLAB Com-
mand Window as follows:

>> u = [3 -4];
>> v = [2 3];
>> dot(u, v);

Theorem B.1 If u, v, and w are vectors and k is a scalar, then these


properties hold:
1. u.v = v.u.
2. u.(v + w) = u.v + u.w.
3. k(u.v) = (ku).v = u.(kv).
4. 0.u = u.0 = 0.

5. u.u = ‖u‖^2 . •
The dot product of two vectors can be also defined as follows.
Definition B.4 (Dot Product of Vectors)

If u and v are nonzero vectors, and θ is the angle between them, the dot
product of u and v is defined as follows:

u.v = ‖u‖ ‖v‖ cos θ.

By the angle between vectors u and v, we mean the smallest nonnegative


angle between them, so 0 ≤ θ ≤ π.
For example, finding the angle between u = < 4, 3 > and v = < 2, 5 >, we
do the following:

< 4, 3 > . < 2, 5 > = ‖< 4, 3 >‖ ‖< 2, 5 >‖ cos θ

or
cos θ = (< 4, 3 > . < 2, 5 >) / (‖< 4, 3 >‖ ‖< 2, 5 >‖),
which gives

θ = cos^(−1) (23/(5√29)) ≈ 0.5468 radian = 31.3287 degree,
which is called the angle between the given vectors. •
The angle between two vectors can be obtained using the MATLAB Com-
mand Window as follows:

>> u = [4 3];
>> v = [2 5];
>> T = acos(dot(u, v)/(norm(u) * norm(v)));
>> (360 * T)/(2 * pi);

Theorem B.2 Let u and v be two vectors and k is a scalar, then:


1. u and v are orthogonal if and only if u.v = 0.

2. u and v are parallel if and only if v = ku.


For example, the vectors u =< 4, 3 > and v =< 3, −4 > are orthogonal
vectors because
u.v =< 4, 3 > . < 3, −4 >= 12 − 12 = 0,
while the vectors u =< 1, −2 > and v =< 2, −4 > are parallel vectors
because
v = < 2, −4 > = 2 < 1, −2 > = 2u.

B.1.2 Vectors in Three Dimensions


Three-dimensional vectors can be treated as ordered triplets of three num-
bers and obey rules very similar to those obeyed by two-dimensional vec-
tors. We represent three-dimensional vectors by arrows and the geometric
interpretation of the addition and subtraction of these vectors follows a
parallelogram rule just as it does in two dimensions. We define unit vec-
tors i, j, and k along the x, y, and z axes of a cartesian coordinate system
and express three-dimensional vectors as
u = u1 i + u2 j + u3 k,
in terms of ordered triplets of the real numbers
i =< 1, 0, 0 >, j =< 0, 1, 0 >, k =< 0, 0, 1 > .
Note that:
i.i = j.j = k.k = 1
i.j = j.i = i.k = k.i = j.k = k.j = 0.

Definition B.5 (Distance Between Points)

The distance between two points P1 (x1 , y1 , z1 ) and P2 (x2 , y2 , z2 ) is denoted


by d(P1 , P2 ) and is defined as
d(P1 , P2 ) = √((x2 − x1 )^2 + (y2 − y1 )^2 + (z2 − z1 )^2 ) .

Figure B.3: Geometric representation of three-dimensional space.

If the points P1 and P2 are on the xy-plane so that z1 = z2 = 0, then the


above distance formula reduces to the two-dimensional distance formula
d(P1 , P2 ) = √((x2 − x1 )^2 + (y2 − y1 )^2 ) .
Example B.1 Find the distance between P1 (−3, 2, 4) and P2 (3, 5, 2).

Solution. Using the above distance formula, we have


d(P1 , P2 ) = √((3 + 3)^2 + (5 − 2)^2 + (2 − 4)^2 )
            = √(36 + 9 + 4)
            = √49 = 7,
which is the required distance between the given points. •
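The same distance can be obtained in the MATLAB Command Window as the norm of the difference of the coordinate vectors (a quick check of Example B.1):

>> P1 = [-3 2 4];
>> P2 = [3 5 2];
>> d = norm(P2 - P1)   % gives 7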
The definitions stated for two-dimensional extend to three-dimensional,
the only change is the inclusion of a third component for each vector.
Magnitude of a vector, vector addition, and scalar multiples of vectors are
defined as follows:
‖u‖ = √(u1^2 + u2^2 + u3^2 )
u + v = < u1 + v1 , u2 + v2 , u3 + v3 >
ku = k < u1 , u2 , u3 > = < ku1 , ku2 , ku3 >
−u = − < u1 , u2 , u3 > = < −u1 , −u2 , −u3 >
0 = < 0, 0, 0 > .

Example B.2 If u =< 3, 4, −2 > and v =< −5, 7, 6 >, then find u + v,
4u − 3v, and kuk.

Solution. Using the given vectors, we have


u + v = < 3, 4, −2 > + < −5, 7, 6 >=< −2, 11, 4 >
4u − 3v = 4 < 3, 4, −2 > −3 < −5, 7, 6 >=< 12, 16, −8 > − < −15, 21, 18 >
= < 27, −5, −26 >
‖u‖ = √(3^2 + 4^2 + (−2)^2 ) = √(9 + 16 + 4) = √29,
which are the required operations on the given vectors. •
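These operations can be checked in the MATLAB Command Window as follows (a quick sketch of Example B.2):

>> u = [3 4 -2];
>> v = [-5 7 6];
>> u + v         % gives [-2 11 4]
>> 4*u - 3*v     % gives [27 -5 -26]
>> norm(u)       % gives sqrt(29), about 5.3852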
Also, if u =< u1 , u2 , u3 > and v =< v1 , v2 , v3 >, then their dot product is
defined as
u.v = u1 v1 + u2 v2 + u3 v3 ,
and the magnitude of the vectors u and v is defined as
‖u‖ = ‖< u1 , u2 , u3 >‖ = √(u1^2 + u2^2 + u3^2 )
‖v‖ = ‖< v1 , v2 , v3 >‖ = √(v1^2 + v2^2 + v3^2 ) .
Example B.3 If u =< 5, −7, 8 > and v =< −3, 6, 5 >, then find the dot
product of the vectors and the angle between the vectors.

Solution. Using the given vectors, we have


u.v = < 5, −7, 8 > . < −3, 6, 5 >
= (5)(−3) + (−7)(6) + (8)(5) = −15 − 42 + 40
= −17,
which is the required dot product of the given vectors.
The angle between the given vectors is defined as
cos θ = u.v/(‖u‖ ‖v‖)
      = −17/(√(25 + 49 + 64) √(9 + 36 + 25))
      = −17/(√138 √70) = −17/√9660 .

Hence,
θ = cos^(−1) (−17/√9660) ≈ 99.96°,
which is the required angle between the given vectors. •

Definition B.6 (Direction Angles and Cosines)

The smallest nonnegative angles α, β, and γ between a nonzero vector u and


the basis vectors i, j, and k are called the direction angles of u. The cosines
of these direction angles, cos α, cos β, and cos γ, are called the direction
cosines of the vector u.
If u = u1 i + u2 j + u3 k, then

u.i = ‖u‖ ‖i‖ cos α = ‖u‖ cos α

and
u.i = < u1 , u2 , u3 > . < 1, 0, 0 > = u1 ,
and it follows that
cos α = u1 /‖u‖ .
By similar reasoning with the basis vectors j and k, we have
cos β = u2 /‖u‖   and   cos γ = u3 /‖u‖ ,
where α, β, and γ are, respectively, the angles between u and i, u and
j, and u and k.
Consequently, any nonzero vector u in space has the normalized form
u/‖u‖ = (u1 /‖u‖) i + (u2 /‖u‖) j + (u3 /‖u‖) k = cos α i + cos β j + cos γ k,
and because u/‖u‖ is a unit vector, it follows that

cos^2 α + cos^2 β + cos^2 γ = 1.

Note that the vector < cos α, cos β, cos γ > is a unit vector with the same
direction as the original vector u. •

Figure B.4: Direction angles of a vector.

Example B.4 Find the direction cosines and angles for the vector u =
3i + 6j + 2k, and show that cos^2 α + cos^2 β + cos^2 γ = 1.

Solution. Because
‖u‖ = √(3^2 + 6^2 + 2^2 ) = √(9 + 36 + 4) = √49 = 7,

we can write it as
cos α = u1 /‖u‖ = 3/7 ,
and it gives
α = cos^(−1) (3/7) ≈ 64.62°.
Similarly,
cos β = u2 /‖u‖ = 6/7 ,   β = cos^(−1) (6/7) ≈ 31.00°
and
cos γ = u3 /‖u‖ = 2/7 ,   γ = cos^(−1) (2/7) ≈ 73.40°.
Furthermore, the sum of the squares of the direction cosines is
cos^2 α + cos^2 β + cos^2 γ = 9/49 + 36/49 + 4/49 = 1.
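The direction cosines and angles of Example B.4 can be verified in the MATLAB Command Window (acosd returns the inverse cosine in degrees):

>> u = [3 6 2];
>> c = u/norm(u)   % direction cosines [3/7 6/7 2/7]
>> acosd(c)        % direction angles, about [64.62 31.00 73.40] degrees
>> sum(c.^2)       % equals 1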


Definition B.7 (Component of Vector Along a Vector)

Let u and v be nonzero vectors. Then the component of a vector (also


called the scalar projection) u along vector v is denoted by compv u and
defined as
compv u = u.(v/‖v‖) = (u.v)/‖v‖ = (‖u‖ ‖v‖ cos θ)/‖v‖ = ‖u‖ cos θ.
Note that if u = u1 i + u2 j + u3 k, then by the above definition

compi u = u.i = u1
compj u = u.j = u2
compk u = u.k = u3 .

Thus, the components of u along i, j, and k are the same as the components
u1 , u2 , and u3 of the vector u. •

Example B.5 If u = 3i + 2j − 6k and v = 2i + 2j + k, then find compv u


and compu v.

Solution. Using the above definition, we have


compv u = u.(v/‖v‖) = (3i + 2j − 6k) . (1/3)(2i + 2j + k),
since
‖v‖ = √(2^2 + 2^2 + 1^2 ) = √9 = 3.
Thus,
compv u = ((3)(2) + (2)(2) + (−6)(1))/3 = 4/3 .
Similarly, we compute
compu v = v.(u/‖u‖) = (2i + 2j + k) . (1/7)(3i + 2j − 6k),
where
‖u‖ = √(3^2 + 2^2 + (−6)^2 ) = √49 = 7.

Thus,
(2)(3) + (2)(2) + (1)(−6) 4
compu v = = ,
7 7
the required solution. •
To get the results of Example B.5, we use the MATLAB Command Win-
dow as follows:

>> u = [3 2 -6];
>> v = [2 2 1];
>> compvu = dot(u, v)/norm(v);   % component of u along v
>> compuv = dot(u, v)/norm(u);   % component of v along u

Figure B.5: Projections of a vector onto a vector.

Definition B.8 (Projection of a Vector onto a Vector)

If u and v are nonzero vectors, then the projection of vector u onto vector
v is denoted by projv u and is defined as
 
projv u = ((u.v)/‖v‖^2 ) v.

Note that the projection of u onto v can be written as a scalar multiple of
a unit vector in the direction of v, i.e.,

projv u = ((u.v)/‖v‖^2 ) v = ((u.v)/‖v‖) (v/‖v‖) = K (v/‖v‖),

where
K = (u.v)/‖v‖ = ‖u‖ cos θ
is called the component of u in the direction of v. •
Example B.6 If u = 4i − 5j + 3k and v = 6i − 3j + 2k, then find the
projection of u onto v.

Solution. Since
u.v = (4)(6) + (−5)(−3) + (3)(2) = 24 + 15 + 6 = 45
and
‖v‖ = √(6^2 + (−3)^2 + 2^2 ) = √(36 + 9 + 4) = √49 = 7,
using the above definition, we have

projv u = ((u.v)/‖v‖^2 ) v = (45/49)(6i − 3j + 2k)
or
projv u = (270/49) i − (135/49) j + (90/49) k,
which is the required projection of u onto v. •
To get the results of Example B.6, we use the MATLAB Command Win-
dow as follows:

>> u = [4 -5 3];
>> v = [6 -3 2];
>> K = dot(u, v)/norm(v);
>> X = v/norm(v);
>> Proj = K*X;

Definition B.9 (Work Done)


The work done by a constant force F = P~R, as its point of application moves along the displacement vector D = P~Q, is defined as

W = F·D = ‖F‖ ‖D‖ cos θ.
Thus, the work done by a constant force is the dot product of the vectors.•

Figure B.6: Work done by a force.

Example B.7 A force is given by a vector F = 6i + 4j + 7k and moves a


particle from the point P (2, −3, 4) to the point Q(5, 4, −2). Find the work
done.

Solution. The vector D that corresponds to P~Q is

D =< 5 − 2, 4 + 3, −2 − 4 >=< 3, 7, −6 > .

If P~R corresponds to F, then the work done is

W = P~R.P~Q = F.D = < 6, 4, 7 > . < 3, 7, −6 >


= 18 + 28 − 42 = 4.

If, for example, the distance is in feet and the magnitude of the force is in
pounds, then the work done is 4 ft-lb. If the distance is in meters and the
force is in Newtons, then the work done is 4 joules. •
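The work done in Example B.7 can be verified quickly in the MATLAB Command Window (a brief sketch; the points are those given in the example):

>> F = [6 4 7];
>> P = [2 -3 4]; Q = [5 4 -2];
>> D = Q - P;        % displacement vector <3, 7, -6>
>> W = dot(F, D)     % work done, W = 4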

Now we define the cross product of two vectors in three-dimensional space


as follows.

Definition B.10 (Cross Product of Vectors)



The other way to multiply two vectors u =< u1 , u2 , u3 > and v =<
v1 , v2 , v3 > is known as the cross product (or vector product) and is sym-
bolized by u × v. It is defined as

u × v = | i   j   k  |   | u2  u3 |     | u1  u3 |     | u1  u2 |
        | u1  u2  u3 | = | v2  v3 | i − | v1  v3 | j + | v1  v2 | k.
        | v1  v2  v3 |

For example, if u =< 1, −1, 2 > and v =< 2, −1, −2 > are two vectors,
then their cross product is defined as

u × v = | i   j   k |   | −1   2 |     | 1   2 |     | 1  −1 |
        | 1  −1   2 | = | −1  −2 | i − | 2  −2 | j + | 2  −1 | k.
        | 2  −1  −2 |

By evaluating these determinants, we get


u × v = 4i + 6j + k,
the cross product of the vectors, which is also the vector. •
Theorem B.3 Let u and v be vectors in three dimensions and θ be the
angle between them, then:
1. u.(u × v) = v.(u × v) = 0.
2. ku × vk = kukkvk sin θ. •
Note that:
i × j = k, j × k = i, k×i=j
j × i = −k, k × j = −i, i × k = −j
i × i = j × j = k × k = 0.
Theorem B.4 Two vectors u and v are parallel, if and only if
u × v = 0.
For example, the vectors u =< −6, −10, 4 > and v =< 3, 5, −2 > are
parallel because

u × v = |  i    j   k |   | −10   4 |     | −6   4 |     | −6  −10 |
        | −6  −10   4 | = |   5  −2 | i − |  3  −2 | j + |  3    5 | k,
        |  3    5  −2 |

Figure B.7: Cross product of the vectors.

and it gives

u × v = (20 − 20)i − (12 − 12)j + (−30 + 30)k = 0.


Note that the length of the cross product u × v is equal to the area of the
parallelogram determined by the vectors u and v, i.e.,

Area of the parallelogram = A = ku × vk.

Also, the area of the triangle is half of the area of the parallelogram, i.e.,
Area of triangle = A = (1/2) ‖u × v‖.
Example B.8 Find the area of the parallelogram made by P~Q and P~R,
where P (3, 1, 2), Q(2, −1, 1), and R(4, 2, −1) are the points in the plane.

Solution. Since

P~Q = (2 − 3)i + (−1 − 1)j + (1 − 2)k = −i − 2j − k



Figure B.8: Length of the cross product of the vectors.

and
P~R = (4 − 3)i + (2 − 1)j + (−1 − 2)k = i + j − 3k,
their cross product is defined as follows:

P~Q × P~R = |  i   j   k |   | −2  −1 |     | −1  −1 |     | −1  −2 |
            | −1  −2  −1 | = |  1  −3 | i − |  1  −3 | j + |  1   1 | k
            |  1   1  −3 |

          = (6 + 1) i − (3 + 1) j + (−1 + 2) k
          = 7i − 4j + k.

Thus,

A = ‖P~Q × P~R‖ = √(7² + (−4)² + 1²) = √66 unit²
is the required area of the parallelogram. •
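A quick numerical check of Example B.8 can be made in the MATLAB Command Window (a sketch; the vectors are formed from the given points):

>> P = [3 1 2]; Q = [2 -1 1]; R = [4 2 -1];
>> PQ = Q - P; PR = R - P;
>> A = norm(cross(PQ, PR))     % area of the parallelogram, sqrt(66)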

Example B.9 Find a vector perpendicular to the plane that passes through
the points P (2, 1, 4), Q(−3, 4, −2), and R(2, −2, 1).

Solution. The vector P~Q × P~R is perpendicular to both P~Q and P~R and therefore
perpendicular to the plane through P, Q, and R. Since

P~Q = (−3 − 2)i + (4 − 1)j + (−2 − 4)k = −5i + 3j − 6k

and
P~R = (2 − 2)i + (−2 − 1)j + (1 − 4)k = 0i − 3j − 3k,

their cross product is defined as follows:



P~Q × P~R = |  i   j   k |   |  3  −6 |     | −5  −6 |     | −5   3 |
            | −5   3  −6 | = | −3  −3 | i − |  0  −3 | j + |  0  −3 | k
            |  0  −3  −3 |

          = (−9 − 18) i − (15 − 0) j + (15 − 0) k

          = −27i − 15j + 15k.
Thus, the vector (−27, −15, 15) is perpendicular to the given plane. Any
nonzero scalar multiple of this vector, such as (−9, −5, 5), is also perpendic-
ular to the plane. •
Example B.10 Find the area of the triangle with vertices P (2, 1, 1), Q(3, 1, 2),
and R(1, −2, 1).

Solution. Since the area of the triangle is half of the area of the paral-
lelogram, we compute first the area of the parallelogram. The area of the
parallelogram with adjacent sides P Q and P R is the length of the cross
product P~Q × P~R, therefore, we find the vectors P~Q and P~R as follows:
P~Q = (3 − 2)i + (1 − 1)j + (2 − 1)k = i + 0j + k,
and
P~R = (1 − 2)i + (−2 − 1)j + (1 − 1)k = −i − 3j + 0k.
Now we compute the cross product of these two vectors as follows:

P~Q × P~R = |  i   j   k |   |  0   1 |     |  1   1 |     |  1   0 |
            |  1   0   1 | = | −3   0 | i − | −1   0 | j + | −1  −3 | k
            | −1  −3   0 |

          = (0 + 3) i − (0 + 1) j + (−3 − 0) k

          = 3i − j − 3k.
Thus, the length of this cross product is

‖P~Q × P~R‖ = √(3² + (−1)² + (−3)²) = √19 unit²,

which is the area of the parallelogram. The area A of the triangle P QR is
half the area of this parallelogram, i.e.,

A = (√19)/2 unit²
is the required area of the triangle. •
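The area in Example B.10 can be confirmed with a few MATLAB commands (a sketch using the built-in cross and norm functions):

>> P = [2 1 1]; Q = [3 1 2]; R = [1 -2 1];
>> A = norm(cross(Q - P, R - P))/2     % area of triangle PQR, sqrt(19)/2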

Theorem B.5 Let u, v, and w be three vectors and k is a scalar, then:

1. u × v = −(v × u).

2. (ku) × v = k(u × v) = u × (kv).

3. u × (v + w) = (u × v) + (u × w).

4. (u + v) × w = (u × w) + (v × w).

5. (u × v).w = u.(v × w).

6. u × (v × w) = (u.w)v − (u.v)w. •

Note that the product u·(v × w), which appears in property 5 of Theorem B.5, is called
the scalar triple product of the vectors u, v, and w. We can write the scalar
triple product of the vectors as a determinant:

u·(v × w) = | u1  u2  u3 |
            | v1  v2  v3 |
            | w1  w2  w3 |.

Example B.11 Find the scalar triple product of the vectors u = 2i + j +


3k, v = 3i + 2j + 4k, and w = 4i + 3j + 5k.

Solution. We use the following determinant to compute the scalar triple


product of the given vectors as follows:

u·(v × w) = | 2  1  3 |     | 2  4 |     | 3  4 |     | 3  2 |
            | 3  2  4 | = 2 | 3  5 | − 1 | 4  5 | + 3 | 4  3 |
            | 4  3  5 |

          = 2(10 − 12) − 1(15 − 16) + 3(9 − 8)
          = −4 + 1 + 3 = 0,

which is the required scalar triple product of the given vectors. •

To get the scalar triple product of the given vectors of Example B.11, we
use the MATLAB Command Window as follows:

>> u = [2 1 3];
>> v = [3 2 4];
>> w = [4 3 5];
>> x = cross(v, w);
>> y = dot(u, x);
Note that the volume of the parallelepiped determined by the vectors u, v,
and w is the magnitude of their scalar triple product:

Volume of the parallelepiped = V = |u.(v × w)|.

Example B.12 Find the volume of the parallelepiped having adjacent sides
AB, AC, and AD, where

A(0, 1, 0), B(2, −2, 3), C(1, 1, −1), and D(4, −1, −1).

Solution. Since
~ = (2 − 0)i + (−2 − 1)j + (3 − 0)k = 2i − 3j + 3k
u = AB
~ = (1 − 0)i + (1 − 1)j + (−1 − 0)k = i + 0j − k
v = AC
~ = (4 − 0)i + (−1 − 1)j + (−1 − 0)k = 4i − 2j − k,
w = AD

use the following determinant to compute the scalar triple product of the
given vectors as follows:

u·(v × w) = | 2  −3   3 |     |  0  −1 |     | 1  −1 |     | 1   0 |
            | 1   0  −1 | = 2 | −2  −1 | + 3 | 4  −1 | + 3 | 4  −2 |
            | 4  −2  −1 |

          = 2(0 − 2) + 3(−1 + 4) + 3(−2 − 0)

          = −4 + 9 − 6 = −1,
which is the scalar triple product of the given vectors. Thus,

V = |u.(v × w)| = | − 1| = 1

is the volume of the parallelepiped. •


To get the volume of the parallelepiped of Example B.12, we use the MAT-
LAB Command Window as follows:

>> u = [2 -3 3];
>> v = [1 0 -1];
>> w = [4 -2 -1];
>> x = cross(v, w);
>> y = dot(u, x);
>> V = abs(y);
Note that if the volume of the parallelepiped determined by the vectors
u, v, and w is zero, then the vectors must lie in the same plane; i.e., they
are coplanar.
Example B.13 Use the scalar triple product to show that the vectors
u = 4i + 6j + 2k, v = 2i − 2j, and w = 14i + 6j + 4k are coplanar.

Solution. Given
u = 4i + 6j + 2k
v = 2i − 2j
w = 14i + 6j + 4k,
we use the following determinant to compute the scalar triple product of
the given vectors as follows:

u·(v × w) = |  4   6   2 |     | −2  0 |     |  2  0 |     |  2  −2 |
            |  2  −2   0 | = 4 |  6  4 | − 6 | 14  4 | + 2 | 14   6 |
            | 14   6   4 |

          = 4(−8 − 0) − 6(8 − 0) + 2(12 + 28)
          = −32 − 48 + 80 = 0.
Since
V = |u.(v × w)| = 0,
the volume of the parallelepiped determined by the given vectors u, v, and
w is zero. This means that u, v, and w are coplanar. •
Note that the product u × (v × w), which appears in property 6 of Theorem B.5, is called
the triple vector product of the vectors u, v, and w. We can write the triple
vector product of the vectors in dot product form as
u × (v × w) = (u.w)v − (u.v)w,

and the result of the triple vector product of the vectors is a vector.

Example B.14 Find the triple vector product of the vectors u = 3i−j, v =
2i + j + k, and w = i − j + k.

Solution. To find the triple vector product of u =< 3, −1, 0 >, v =<
2, 1, 1 >, and w =< 1, −1, 1 >, we compute the following dot products:

u.w =< 3, −1, 0 > . < 1, −1, 1 >= (3)(1)+(−1)(−1)+(0)(1) = 3+1+0 = 4

and

u.v =< 3, −1, 0 > . < 2, 1, 1 >= (3)(2) + (−1)(1) + (0)(1) = 6 − 1 + 0 = 5.

Thus,

u × (v × w) = 4v − 5w
= 4 < 2, 1, 1 > −5 < 1, −1, 1 >
= < 8, 4, 4 > − < 5, −5, 5 >
= < 8 − 5, 4 + 5, 4 − 5 >
= < 3, 9, −1 >,

which is the required triple vector product of the given vectors.


We can also find the triple vector product of the given vectors directly
by first taking the cross product of the vectors v × w = x and then taking
one more time the cross product of the vectors u × x as follows:

x = v × w = | i   j   k |   |  1  1 |     | 2  1 |     | 2   1 |
            | 2   1   1 | = | −1  1 | i − | 1  1 | j + | 1  −1 | k
            | 1  −1   1 |

          = (1 + 1) i − (2 − 1) j + (−2 − 1) k

          = 2i − j − 3k

and

u × (v × w) = u × x = | i   j   k |   | −1   0 |     | 3   0 |     | 3  −1 |
                      | 3  −1   0 | = | −1  −3 | i − | 2  −3 | j + | 2  −1 | k
                      | 2  −1  −3 |

            = (3 − 0) i − (−9 − 0) j + (−3 + 2) k
            = 3i + 9j − k,

the triple vector product of the given vectors. •

To get the triple vector product of the given vectors of Example B.14, we
use the MATLAB Command Window as follows:

>> u = [3 -1 0];
>> v = [2 1 1];
>> w = [1 -1 1];
>> x = cross(v, w);
>> y = cross(u, x);

B.1.3 Lines and Planes in Space


Here, we discuss parametric equations of lines in space, which are important
because they generally provide the most convenient form for representing
lines algebraically. Also, we will use vectors to derive equations of planes
in space, which we will use to solve various geometric problems.

Lines in Space
Let us consider a line that passes through the point P1 = (x1 , y1 , z1 ) and
is parallel to the position vector a = (a1 , a2 , a3 ). For any other point
P = (x, y, z) on the line, the vector P~1 P must be parallel to a, i.e.,

P~1 P = ta

for some scalar t. Since


P~1 P = (x − x1 , y − y1 , z − z1 )
and
ta = t(a1 , a2 , a3 ) = (a1 t, a2 t, a3 t),
we have
(x − x1 , y − y1 , z − z1 ) = (a1 t, a2 t, a3 t).
Two vectors are equal, if and only if all of their components are equal, so
x − x1 = a1 t, y − y1 = a2 t, z − z1 = a3 t,
which are called the parametric equations for the line, where t is the pa-
rameter.
Note that if all the components of the vector a are nonzero, then we
can solve for the parameter t in each of the three equations as follows:
(x − x1)/a1 = (y − y1)/a2 = (z − z1)/a3,
which are the symmetric equations of the line.

Example B.15 Find the parametric and symmetric equations of the line
passing through the points (1, 3, −2) and (3, −2, 5).

Solution. Begin by letting P1 = (1, 3, −2) and P2 = (3, −2, 5); then a
direction vector for the line passing through P1 and P2 is given by
a = P1~P2 = (3 − 1, −2 − 3, 5 + 2) = (2, −5, 7),
which is parallel to the given line and taking either point will give us an
equation for the line. So using direction number a1 = 2, a2 = −5, and
a3 = 7, with the point P1 = (1, 3, −2), we can obtain the parametric
equations of the form
x − 1 = 2t, y − 3 = −5t, z + 2 = 7t.
Similarly, the symmetric equations of the line are
(x − 1)/2 = (y − 3)/(−5) = (z + 2)/7.


Neither the parametric equations nor the symmetric equation of a given


line are unique. For instance, in Example B.15, by taking parameter t = 1
in the parametric equations we would obtain the point (3, −2, 5). Using
this point with the direction numbers a1 = 2, a2 = −5, and a3 = 7 produces
the parametric equations as follows:

x − 3 = 2t, y + 2 = −5t, z − 5 = 7t.

Definition B.11 Let l1 and l2 be two lines in R3 , with parallel vectors a


and b, respectively, and let θ be the angle between a and b. Then:
1. Lines l1 and l2 are parallel whenever a and b are parallel.

2. If lines l1 and l2 intersect, then:


(i) the angle between l1 and l2 is θ.
(ii) the lines l1 and l2 are orthogonal whenever a and b are orthogo-
nal.

Example B.16 Find the angle between lines l1 and l2 , where

l1 : x = 1+3t, y = 2−3t, z = −1+2t, l2 : x = −3+t, y = 2−2t, z = 2+3t.

Solution. Given that lines l1 and l2 are parallel, respectively, to the vectors

u =< 3, −3, 2 > and v =< 1, −2, 3 >,

if θ is the angle between u and v, then


cos θ = (<3, −3, 2> · <1, −2, 3>) / (‖<3, −3, 2>‖ ‖<1, −2, 3>‖) = 15/√308,

which gives

θ = cos⁻¹(15/√308) ≈ 0.5458 radian,
the angle between the given lines.
Note that the angle between lines l1 and l2 is defined for either inter-
secting or nonintersecting lines. •
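The angle in Example B.16 can also be computed numerically in the MATLAB Command Window (a sketch; acos returns the angle in radians):

>> u = [3 -3 2];
>> v = [1 -2 3];
>> theta = acos(dot(u, v)/(norm(u)*norm(v)))   % approximately 0.5458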

Note that nonparallel, nonintersecting lines are called skew lines. •



Example B.17 Show that lines l1 and l2 are skew lines:

l1 : x = 1 + 3t, y = 5 − 3t, z = −1 + 5t, l2 : x = 2 + 7t, y = 4 − 3t, z = 5 + t.

Solution. Line l1 is parallel to the vector 3i−3j+5k, and line l2 is parallel


to the vector 7i − 3j + k. These vectors are not parallel since neither is a
scalar multiple of the other. Thus, the lines are not parallel.
For l1 and l2 to intersect at some point (x0 , y0 , z0 ) these coordinates
would have to satisfy the equations of both lines, i.e., there exist values t1
and t2 for the parameters such that

x0 = 1 + 3t1 , y0 = 5 − 3t1 , z0 = −1 + 5t1

and
x0 = 2 + 7t2 , y0 = 4 − 3t2 , z0 = 5 + t2 .
This leads to three conditions on t1 and t2 :

1 + 3t1 = 2 + 7t2
5 − 3t1 = 4 − 3t2
−1 + 5t1 = 5 + t2 .

Adding the first two equations of the above system, we get

6 = 6 + 4t2 , which gives t2 = 0.

We can find t1 by putting the value of t2 in the first equation as

1 + 3t1 = 2 + 7(0), which gives t1 = 1/3.

With these values of t1 and t2 , the third equation of the above system is
not satisfied, so the lines do not intersect. Thus, the given lines are
skew lines. •

Planes in Space
As we have seen, an equation of a line in space can be obtained from a
point on the line and a vector parallel to it. A plane in space is determined
by specifying a vector n =< a, b, c > that is normal (perpendicular) to

the plane (i.e., orthogonal to every vector lying in the plane), and a point
P1 = (x1 , y1 , z1 ) lying in the plane.
In order to find an equation of the plane, let P = (x, y, z) represent any
point in the plane. Then, since P and P1 are both points in the plane, the
vector
P~1 P = (x − x1 , y − y1 , z − z1 )
lies in the plane and so must be orthogonal to n, i.e.,
n.P~1 P = 0
(a, b, c).(x − x1 , y − y1 , z − z1 ) = 0
a(x − x1 ) + b(y − y1 ) + c(z − z1 ) = 0
The above third equation is called the equation of the plane in standard
form or sometimes called the point-normal form of the equation of the
plane.
Let us rewrite the equation as
ax − ax1 + by − by1 + cz − cz1 = 0
or
ax + by + cz − ax1 − by1 − cz1 = 0.
Since the last three terms are constant, combine them into one constant d
and write
ax + by + cz + d = 0.
This is called the general form of the equation of the plane.
Given the general form of the equation of the plane, it is easy to find a
normal vector to the plane. Simply use the coefficients of x, y, and z and
write n =< a, b, c > .
Example B.18 Find an equation of the plane through the point (3, −4, 3)
with normal vector n =< 3, −4, 5 > .

Solution. Using the direction number for n =< 3, −4, 5 >=< a, b, c >
and the point (x1 , y1 , z1 ) = (3, −4, 3), we can obtain
a(x − x1 ) + b(y − y1 ) + c(z − z1 ) = 0
3(x − 3) − 4(y + 4) + 5(z − 3) = 0 (standard form)
3x − 4y + 5z − 40 = 0, (general form)

the equation of the plane. Observe that the given point (3, −4, 3) satisfies
this equation. •

Example B.19 Find the general equation of the plane containing the three
points (2, −1, 3), (3, 1, 2), and (4, 5, −3).

Solution. To find the equation of the plane, we need a point in the plane
and a vector that is normal to the plane. There are three choices for the
point, but no normal vector is given. To find a normal vector, use the
vector product of vectors a and b extending from the point P1 (2, −1, 3) to
the points P2 (3, 1, 2) and P3 (4, 5, −3). The component forms of a and b are
as follows:

a = P1~P2 =< 3 − 2, 1 + 1, 2 − 3 >=< 1, 2, −1 >


b = P2~P3 =< 4 − 3, 5 − 1, −3 − 2 >=< 1, 4, −5 > .

So, one vector orthogonal to both a and b is the vector product



n = a × b = | i   j   k |   | 2  −1 |     | 1  −1 |     | 1  2 |
            | 1   2  −1 | = | 4  −5 | i − | 1  −5 | j + | 1  4 | k.
            | 1   4  −5 |

Solving this, we get

n = (−10 + 4)i − (−5 + 1)j + (4 − 2)k


= −6i + 4j + 2k
= < −6, 4, 2 >=< a, b, c >,

the vector which is normal to the given plane. Using the direction number
for n and the point (x1 , y1 , z1 ) = (2, −1, 3), we can obtain an equation of
the plane to be

a(x − x1 ) + b(y − y1 ) + c(z − z1 ) = 0


−6(x − 2) + 4(y + 1) + 2(z − 3) = 0 (standard form)
−6x + 4y + 2z + 10 = 0. (general form)

Note that each of the given points (2, −1, 3), (3, 1, 2), and (4, 5, −3) satisfies
this plane equation. •
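The normal vector and the plane equation of Example B.19 can be reproduced in the MATLAB Command Window (a sketch; the constant term d is computed from the point P1):

>> P1 = [2 -1 3]; P2 = [3 1 2]; P3 = [4 5 -3];
>> n = cross(P2 - P1, P3 - P2)   % normal vector <-6, 4, 2>
>> d = -dot(n, P1)               % gives -6x + 4y + 2z + 10 = 0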

Note that:

1. Two planes are parallel if their normal vectors are parallel.

2. Two planes are orthogonal if their normal vectors are orthogonal.

3. The angle between planes is


cos θ = (n1 · n2)/(‖n1‖ ‖n2‖),

where n1 and n2 are normal vectors of the planes.

Example B.20 Show that the planes 2x − 2y + 3z − 2 = 0 and 8x − 8y +


12z − 5 = 0 are parallel.

Solution. The normal vectors to the given planes, respectively, are

n1 =< 2, −2, 3 > and n2 =< 8, −8, 12 > .

Since
n2 =< 8, −8, 12 >= 4 < 2, −2, 3 >= 4n1 ,
the vectors n1 and n2 are parallel, and so are the planes. •

Theorem B.6 (Distance Between a Plane and a Point)

The distance between a plane and a point R (which is not in the plane) is
defined as
D = |P~R · n| / ‖n‖,
where P is a point in the plane and n is normal to the plane. •

To find a point in the plane ax + by + cz + d = 0 (a ≠ 0), take y = 0 and
z = 0; then we get

ax + d = 0,

which gives x = −d/a, so the point in the plane will be (−d/a, 0, 0). •

Example B.21 Find the distance between the point R = (3, 7, −3) and
the plane given by 4x − 3y + 5z = 8.

Solution. The vector


n =< 4, −3, 5 >
is normal to the given plane. Now to find a point P in the plane, let
y = 0, z = 0, and we obtain the point P = (2, 0, 0). The vector from P to
R is given by

P~R =< 3 − 2, 7 − 0, −3 − 0 >=< 1, 7, −3 > .

Using the above distance formula, we have

|P~R.n| | < 1, 7, −3 > . < 4, −3, 5 > |


D= = p
knk 42 + (−3)2 + 52
|4 − 21 − 15|
= √
16 + 9 + 25
32
= √ ,
50
which is the required distance between the point and the plane. •

From Theorem B.6, we can determine the distance between the point R =
(x0 , y0 , z0 ), and the plane given by ax + by + cz + d = 0 is

D = |a(x0 − x1) + b(y0 − y1) + c(z0 − z1)| / √(a² + b² + c²).

It can be written as

D = |ax0 + by0 + cz0 + d| / √(a² + b² + c²),
where P = (x1 , y1 , z1 ) is a point in the plane and d = −(ax1 + by1 + cz1 ).

Example B.22 Find the distance between the point P (1, −2, −3) and the
plane 6x − 2y + 3z = 2.

Solution. Given the equation of the plane

6x − 2y + 3z − 2 = 0,

we obtain
a = 6, b = −2, c = 3, d = −2.
Using these values, we get
D = |(6)(1) + (−2)(−2) + (3)(−3) − 2| / √(6² + (−2)² + 3²)

or

D = |−1| / √49 = 1/7,
which is the distance from the given point to the given plane. •
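A numerical check of Example B.22 can be made in the MATLAB Command Window (a sketch; the plane is written in the form ax + by + cz + d = 0):

>> n = [6 -2 3]; d = -2;
>> P = [1 -2 -3];
>> D = abs(dot(n, P) + d)/norm(n)   % distance, 1/7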

Example B.23 Find the distance between the parallel planes 9x + 3y −


3z = 9 and 3x + y − z = 2.

Solution. First, we note that the planes are parallel because their normal
vectors < 9, 3, −3 > and < 3, 1, −1 > are parallel, i.e.,

< 9, 3, −3 >= 3 < 3, 1, −1 > .

To find the distance between the planes, we choose any point on one plane,
say (x0 , y0 , z0 ) = (1, 0, 0) is a point in the first plane, then, from the second
plane, we can find

a = 3, b = 1, c = −1, d = −2.

Using these values, the distance is


D = |(3)(1) + (1)(0) + (−1)(0) − 2| / √(3² + 1² + (−1)²)

  = 1/√11

  ≈ 0.3015,

which is the distance between the given planes. •



Example B.24 Show that the following system of equations has no solu-
tion:
x1 − x2 + 4x3 = 1
−2x1 + 2x2 − 8x3 = 3
x1 + x2 + 3x3 = 2.
Solution. Consider the general form of the equation of the plane

ax + by + cz + d = 0,

where the vector (a, b, c) is normal to this plane. Interpret each of the given
equations as defining a plane in R3 . On comparison with the general form,
it is seen that the following vectors are normal to these three planes:

(1, −1, 4), (−2, 2, −8), (1, 1, 3).

Note that
(−2, 2, −8) = −2(1, −1, 4),
which shows that the normals to the first two planes are parallel. These
two planes are therefore parallel and distinct, so the three planes have no
point in common and the given system has no solution. •

Theorem B.7 (Distance Between a Point and a Line in Space)

The distance between a point R and a line in a space is defined as

D = ‖P~R × u‖ / ‖u‖,

where u is the direction vector for the line and P is a point on the line. •


Example B.25 Find the distance between the point R = (4, −2, 5) and
the line given by
x = −1 + 3t, y = 2 − 5t, and z = 3 + 7t.
Solution. Using the direction numbers 3, −5, 7, we have the direction vec-
tor for the line, which is
u =< 3, −5, 7 > .
So to find a point P on the line, let t = 0, and we get the point P =
(−1, 2, 3). Thus, the vector from P to R is given by
P~R =< 4 + 1, −2 − 2, 5 − 3 >=< 5, −4, 2 >,
and we can form the vector product as

P~R × u = | i   j   k |   | −4  2 |     | 5  2 |     | 5  −4 |
          | 5  −4   2 | = | −5  7 | i − | 3  7 | j + | 3  −5 | k.
          | 3  −5   7 |
Solving this, we get
P~R × u = (−28 + 10)i − (35 − 6)j + (−25 + 12)k
= −18i − 29j − 13k
= < −18, −29, −13 > .
Thus, the distance between the point R and the given line is
D = ‖P~R × u‖ / ‖u‖ = ‖<−18, −29, −13>‖ / ‖<3, −5, 7>‖

  = √((−18)² + (−29)² + (−13)²) / √(3² + (−5)² + 7²)

  = √1334 / √83

  = 36.5240 / 9.1104

  = 4.0090,
which is the required distance between the given point and the line. •
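Example B.25 can be reproduced in the MATLAB Command Window (a sketch using the built-in cross and norm functions):

>> R = [4 -2 5];
>> P = [-1 2 3];      % point on the line (t = 0)
>> u = [3 -5 7];      % direction vector of the line
>> D = norm(cross(R - P, u))/norm(u)   % approximately 4.0090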

Example B.26 Show that the lines


l1 : x = 1 + 2t, y = 3 − 2t, z =5+t
l2 : x = 2 + 3s, y = 2 + s, z = −4 + 2s
are skew. Find the distance between them.

Solution. The direction vectors u1 =< 2, −2, 1 > and u2 =< 3, 1, 2 > are not
parallel (neither is a scalar multiple of the other), and equating the parametric
equations of l1 and l2 gives an inconsistent system, so the lines do not intersect;
hence l1 and l2 are skew. The two skew lines can therefore be viewed as
lying on two parallel planes P1 and P2 . The distance between l1 and l2 is
the same as the distance between P1 and P2 . The common normal vector
to both planes must be orthogonal to both u1 =< 2, −2, 1 > (the direction
of l1 ) and u2 =< 3, 1, 2 > (the direction of l2 ). So a normal vector is

n = u1 × u2 = | i   j   k |   | −2  1 |     | 2  1 |     | 2  −2 |
              | 2  −2   1 | = |  1  2 | i − | 3  2 | j + | 3   1 | k.
              | 3   1   2 |
Solving this, we get
n = (−4 − 1)i − (4 − 3)j + (2 + 6)k
= −5i − j + 8k
= < −5, −1, 8 > .
If we put s = 0 in the equations of l2 , we get the point (2, 2, −4) on P2 ,
and so the equation for P2 is
−5(x − 2) − (y − 2) + 8(z + 4) = 0,
which can also be written as
−5x − y + 8z + 44 = 0.
If we set t = 0 in the equations of l1 , we get the point (1, 3, 5) on P1 . So
the distance between l1 and l2 is the same as the distance from (1, 3, 5) to
−5x − y + 8z + 44 = 0. Thus, the distance is
D = |(−5)(1) + (−1)(3) + (8)(5) + 44| / √((−5)² + (−1)² + 8²)

  = 76/√90

  ≈ 8.0111,
which is the required distance between the skew lines. •
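The distance between the skew lines in Example B.26 can also be computed directly in the MATLAB Command Window (a sketch based on the formula D = |(P2 − P1)·n|/‖n‖, where P1 and P2 are points on the two lines):

>> u1 = [2 -2 1]; u2 = [3 1 2];
>> P1 = [1 3 5];  P2 = [2 2 -4];        % points on l1 and l2 (t = 0, s = 0)
>> n = cross(u1, u2);                   % common normal <-5, -1, 8>
>> D = abs(dot(P2 - P1, n))/norm(n)     % approximately 8.0111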

B.2 Complex Numbers


Although physical applications ultimately require real answers, complex
numbers and complex vector spaces play an extremely useful, if not es-
sential, role in the intervening analysis. Particularly in the description of
periodic phenomena, complex numbers and complex exponentials help to
simplify complicated trigonometric formulae.
Complex numbers arise naturally in the course of solving polynomial
equations. For example, the solutions of the quadratic equation

ax2 + bx + c = 0

are given by the quadratic formula



x = (−b ± √(b² − 4ac)) / (2a),
which are complex numbers, if b2 − 4ac < 0. To deal with the problem
that the equation x2 = −1 has no real solution, mathematicians of the
eighteenth century invented the “imaginary” number

i = √−1,

which is assumed to have the property



i² = (√−1)² = −1,

but which otherwise has the algebraic properties of a real number.


A complex number z is of the form

z = a + ib, (B.1)

where a and b are real numbers; a is called the real part of z and is denoted
by Re(z); and b is called the imaginary part of z and is denoted by Im(z).
We say that two complex numbers z1 = a1 + ib1 and z2 = a2 + ib2 are
equal, if their real and imaginary parts are equal, i.e., if

a1 = a2 and b1 = b2 .

Note that:

1. Every real number a is a complex number with its imaginary part


zero; a = a + i0.

2. The complex number z = 0 + i0 corresponds to zero.

3. If a = 0 and b ≠ 0, then z = ib is called the imaginary number, or a


purely imaginary number.

B.2.1 Geometric Representation of Complex Numbers


A complex number z = a + ib may be regarded as an ordered pair (a, b)
of real numbers. This ordered pair of real numbers corresponds to a point
in the plane. Such a correspondence naturally suggests that we represent

Figure B.9: Geometric representation of a complex number.

a + ib as a point in the complex plane (Figure B.9), where the horizontal


axis (also called the real axis) is used to represent the real part of z and
the vertical axis (also called the imaginary axis) is used to represent the
imaginary part of the complex number z.

B.2.2 Operations on Complex Numbers


Complex numbers are added, subtracted, and multiplied in accordance
with the standard rules of algebra, together with the rule i² = −1.
If z1 = a1 + ib1 and z2 = a2 + ib2 are two complex numbers, then their sum
is
z1 + z2 = (a1 + a2 ) + i(b1 + b2 ),
and their difference is

z1 − z2 = (a1 − a2 ) + i(b1 − b2 ).

The product of z1 and z2 is

z1 z2 = (a1 + ib1 )(a2 + ib2 ) = (a1 a2 − b1 b2 ) + i(a1 b2 + a2 b1 ).

This multiplication formula is obtained by expanding the left side and using
the fact that i2 = −1.
One can multiply a complex number by a real number α according to

αz = αa + iαb.

Finally, division is obtained in the following manner:


z1/z2 = (a1 + ib1)/(a2 + ib2) = [(a1 + ib1)(a2 − ib2)] / [(a2 + ib2)(a2 − ib2)]

      = (a1 a2 + b1 b2)/(a2² + b2²) + i (b1 a2 − a1 b2)/(a2² + b2²).
An important quantity associated with complex number z is its complex
conjugate, defined by
z̄ = a − ib.
Note that
z z̄ = (a + ib)(a − ib) = a² + b²
is an intrinsically positive quantity (unless a = b = 0).
The MATLAB built-in function conj can be used to find the complex
conjugate as follows:

>> z1 = conj(z);
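Several other elementary complex operations are also available as MATLAB built-in functions; a brief sketch (the numerical value of z is only illustrative):

>> z = 3 + 4i;
>> re = real(z); im = imag(z);   % real and imaginary parts
>> zbar = conj(z);               % complex conjugate, 3 - 4i
>> modz = abs(z);                % modulus, 5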

We call √(z z̄) the modulus, absolute value, or the magnitude of z and write

|z| = |a + ib| = √(z z̄) = √(a² + b²).

This also tells us that


1/z = z̄ / |z|².
Note that a complex number cannot be ordered in the sense that the in-
equality z1 < z2 has no meaning. Nevertheless, the absolute values of
complex numbers, being real numbers,
can be ordered. Thus, for example, |z| < 1 means that z is such that √(a² + b²) < 1.
Note that:

1. The conjugate of z̄ is z.

2. The conjugate of z1 + z2 is z̄1 + z̄2 .

3. The conjugate of z1 z2 is z̄1 z̄2 .

4. If z2 ≠ 0, then the conjugate of z1/z2 is z̄1 / z̄2 .

5. z is real, if and only if z̄ = z.

A complex vector space is defined in exactly the same manner as its real
counterpart, the only difference being that we replace real scalars by com-
plex scalars. The terms complex vector space and real vector space empha-
size the set from which the scalars are chosen. The most basic example is
the n-dimensional complex vector space Cn consisting of all column vectors
z = (z1 , z2 , . . . , zn )T that have n complex entries z1 , z2 , . . . , zn in C. Note
that
z ∈ Rn ⊂ Cn

is a real vector, if and only if z̄ = z.



B.2.3 Polar Forms of Complex Numbers


As we have seen, the complex number z = a + ib can be represented
geometrically by the point (a, b). This point can also be expressed in terms
of polar coordinates (r, θ), where r ≥ 0, as shown in Figure B.10. We have

a = r cos θ and b = r sin θ,

so
z = a + ib = r cos θ + ir sin θ.
Thus, any complex number can be written in the polar form

z = r(cos θ + i sin θ),

where
r = |z| = √(a² + b²)   and   tan θ = b/a.
The angle θ is called an argument of z and is denoted argz. Observe

Figure B.10: Polar form of a complex number.

that argz is not unique. Adding or subtracting any integer multiple of 2π



gives another argument of z. However, there is only one argument θ that


satisfies
−π < θ ≤ π.
This is called the principal argument of z and is denoted Argz.
Note that

z1 z2 = [r1 (cos θ1 + i sin θ1 )][r2 (cos θ2 + i sin θ2 )],

which can be written as (after using the trigonometric identities)

z1 z2 = r1 r2 [cos(θ1 + θ2 ) + i sin(θ1 + θ2 )],

which means that to multiply two complex numbers, we multiply their


absolute values and add their arguments. Similarly, we can get
z1/z2 = (r1/r2) [cos(θ1 − θ2) + i sin(θ1 − θ2)], z2 ≠ 0, (B.2)
which means that to divide two complex numbers, we divide their absolute
values and subtract their arguments.
As a special case of (B.2), we obtain a formula for the reciprocal of a
complex number in polar form. Setting z1 = 1 (and therefore θ1 = 0) and
z2 = z (and therefore θ2 = θ), we obtain:
If z = r(cos θ + i sin θ) is nonzero, then

1/z = (1/r)(cos θ − i sin θ).
In the following we give some well-known theorems concerning the polar
form of a complex number.

Theorem B.8 (De Moivre’s Theorem)

If z = r(cos θ + i sin θ) and n is a positive integer, then

z n = rn (cos nθ + i sin nθ).



Theorem B.9 (Euler’s Formula)

For any real number α,

eiα = cos α + i sin α.

Using Euler’s formula, we see that the polar form of a complex number can
be written more compactly as

z = r(cos θ + i sin θ) = reiθ

and z̄ = re−iθ . •

Theorem B.10 (Multiplication Rule)

If z1 = r1 eiθ1 and z2 = r2 eiθ2 are complex numbers in polar form, then

z1 z2 = r1 r2 ei(θ1 +θ2 ) .
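In MATLAB the modulus and argument of a complex number are returned by abs and angle, so the polar-form rules above are easy to check numerically (a sketch; the numbers are only illustrative):

>> z1 = 1 + 1i; z2 = sqrt(3) + 1i;
>> r = abs(z1*z2)          % equals abs(z1)*abs(z2)
>> theta = angle(z1*z2)    % equals angle(z1) + angle(z2), here 5*pi/12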

In the following we give other important theorems concerning complex


numbers.
Theorem B.11 If α1 and α2 are roots of the quadratic equation

x2 + ux + v = 0,

then α1 + α2 = −u and α1 α2 = v. •

Theorem B.12 (Fundamental Theorem of Algebra)

Every polynomial function f (x) of positive degree with complex coefficients


has a complex root. •

Theorem B.13 Every complex polynomial of degree n ≥ 1 has the form

f (x) = u(x − u1 )(x − u2 ) · · · (x − un ),

where u1 , u2 , . . . , un are the roots of f (x) (and need not all be distinct) and
u is the coefficient of xn . •

Theorem B.14 Every polynomial f (x) of positive degree with real coef-
ficients can be factored as a product of linear and irreducible quadratic
factors. •

Theorem B.15 (Nth Roots of Unity)

If n ≥ 1 is an integer, the nth roots of unity (i.e., the solution to z n = 1)


are given by
z = e2πik/n , k = 0, 1, . . . , n − 1.

B.2.4 Matrices with Complex Entries


If the entries of a matrix are complex numbers, we can perform the ma-
trix operations of addition, subtraction, multiplication, and scalar multi-
plication in the same manner as for real matrices. The validity of these
operations can be verified using the properties of complex arithmetic and
just imitating the proofs for real matrices as discussed in Chapter 1. For
example, consider the following matrices:

A = | 1+i   3i |,   B = | 2−2i   4i |,   C = | 1+i    2+i  |.
    | 5+i   6i |        | 1+3i   2i |        | 5+i    4+5i |
                                             | 2−4i   1+2i |

Then

A + B = | 1+i   3i | + | 2−2i   4i | = | 3−i    7i |
        | 5+i   6i |   | 1+3i   2i |   | 6+4i   8i |

and

A − B = | 1+i   3i | − | 2−2i   4i | = | −1+3i   −i |
        | 5+i   6i |   | 1+3i   2i |   |  4−2i   4i |.

Also,

CA = | 1+i    2+i  | | 1+i   3i |   |  9+9i     −9+15i  |
     | 5+i    4+5i | | 5+i   6i | = | 19+35i    −33+39i |
     | 2−4i   1+2i |                |  9+9i      12i    |

and

3iA = 3i | 1+i   3i | = | −3+3i    −9 |
         | 5+i   6i |   | −3+15i  −18 |.
There are special types of complex matrices, like Hermitian matrices, uni-
tary matrices, and normal matrices which we discussed in Chapter 3.
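MATLAB accepts complex matrices with the same syntax as real ones, so the computations above can be checked directly in the Command Window (a sketch):

>> A = [1+1i 3i; 5+1i 6i];
>> B = [2-2i 4i; 1+3i 2i];
>> C = [1+1i 2+1i; 5+1i 4+5i; 2-4i 1+2i];
>> A + B, A - B, C*A, 3i*A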

B.2.5 Solving Systems with Complex Entries


The results and techniques dealing with the solutions of linear systems
that we developed in Chapter 2 carry over directly to linear systems with
complex coefficients. For example, the solution of the linear system

3i x1 + 4 x2 = 5 + 15i

(5 − i) x1 + (3 − 4i) x2 = 24 + 5i

can be obtained by using the Gauss–Jordan method as follows:


[A|b] = | 3i     4      :  5 + 15i |  ~  | 3i   4               :  5 + 15i        |
        | 5−i    3−4i   :  24 + 5i |     | 0    13/3 + (8/3)i   :  2/3 + (55/3)i  |

      ~ | 3i   4   :  5 + 15i |  ~  | 3i   0   :  −3 + 3i |  ~  | 1   0   :  1 + i  |
        | 0    1   :  2 + 3i  |     | 0    1   :  2 + 3i  |     | 0   1   :  2 + 3i |.
Thus, the solution to the given system is x1 = 1 + i and x2 = 2 + 3i. •
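The same system can be solved in a single step in the MATLAB Command Window, since the backslash operator also works for complex coefficient matrices (a sketch):

>> A = [3i 4; 5-1i 3-4i];
>> b = [5+15i; 24+5i];
>> x = A\b        % returns [1+1i; 2+3i]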

B.2.6 Determinants of Complex Numbers


The definition of a determinant and all its properties derived in Chapter 1
applies to matrices with complex entries. For example, the determinant of
the matrix

A = | 1+i   3i |
    | 5+i   6i |

can be obtained as

|A| = | 1+i   3i | = (6i + 6i²) − (15i + 3i²) = −9i − 3.
      | 5+i   6i |
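In MATLAB the determinant of a complex matrix is computed with the det function (a sketch):

>> A = [1+1i 3i; 5+1i 6i];
>> det(A)         % returns -3 - 9i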

B.2.7 Complex Eigenvalues and Eigenvectors


Let A be an n × n matrix. The complex number λ is an eigenvalue of A,
if there exists a nonzero vector x in Cn such that

Ax = λx. (B.3)

Every nonzero vector x satisfying (B.3) is called an eigenvector of A asso-


ciated with the eigenvalue λ. The relation (B.3) can be rewritten as

(A − λI)x = 0. (B.4)

This homogeneous system has a nonzero solution x, if and only if

det(A − λI) = 0

has a solution. As in Chapter 5, det(A − λI) is called the characteristic


polynomial of the matrix A, which is a complex polynomial of degree n in
λ. The eigenvalues of the complex matrix A are the complex roots of the
characteristic polynomial. For example, let
 
A = |  0   1 |,
    | −1   0 |

then
det(A − λI) = λ2 + 1 = 0
gives the eigenvalues λ1 = i and λ2 = −i of A. One can easily find the
eigenvectors
x1 = [1, i]T and x2 = [−1, i]T ,
associated with eigenvalues i and −i, respectively. •
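The eig function in MATLAB returns complex eigenvalues and eigenvectors directly (a sketch for the matrix above; eig normalizes the eigenvectors, so they may appear as scalar multiples of the vectors given here):

>> A = [0 1; -1 0];
>> [V, D] = eig(A)   % columns of V are eigenvectors; diag(D) contains i and -i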

B.3 Inner Product Spaces


Now we study a more advanced topic in linear algebra called inner product
spaces. Inner products lie at the heart of linear (and nonlinear) analysis,
both in finite-dimensional vector spaces and infinite-dimensional function
spaces. It is impossible to overemphasize its importance for both theo-
retical developments, practical application, and in the design of numerical
solution techniques. Inner products are widely used from theoretical anal-
ysis to applied signal processing. Here, we discuss the basic properties of
inner products and give some important theorems.

Definition B.12 (Inner Product)

An inner product on a vector space V is an operation that assigns to every


pair of vectors u and v in V a real number < u, v > such that the following
properties hold for all vectors u, v, and w in V and all scalars α:

1. < u, v > = < v, u > .

2. < u, v + w > = < u, v > + < u, w > .

3. < αu, v > = α < u, v > .

4. < u, u > ≥ 0 and < u, u > = 0, if and only if u = 0.

A vector space with an inner product is called an inner product space. The
most basic example of an inner product is the familiar dot product
< u, v > = u·v = u1 v1 + u2 v2 + · · · + un vn = ∑_{j=1}^{n} uj vj ,

between (column) vectors

u = (u1 , u2 , . . . , un )T and v = (v1 , v2 , . . . , vn )T ,

lying in the Euclidean space Rn . •
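For column vectors in Rn this inner product is just MATLAB's dot function (a sketch; the vectors are arbitrary examples):

>> u = [1; 2; 3]; v = [4; 5; 6];
>> ip = dot(u, v)          % inner product, 32
>> nrm = sqrt(dot(u, u))   % induced norm, the same as norm(u)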



B.3.1 Properties of Inner Products


The following theorem summarizes some additional properties that follow
from the definition of an inner product.
Theorem B.16 Let u, v, and w be vectors in an inner product space V
and let α be a scalar:
1. < u + v, w > = < u, w > + < v, w > .
2. < u, αv > = α < u, v > .
3. < u, 0 > = < 0, v >= 0.

In an inner product space, we can define the length of a vector, the distance
between vectors, and orthogonal vectors.
Definition B.13 (Length of a Vector)

Let v be a vector in an inner product space V . Then the length (or norm)
of v is defined as

‖v‖ = √(< v, v >).

Theorem B.17 (Inner Product Norm Theorem)

If V is a real vector space with an inner product < u, v >, then the function

‖u‖ = √(< u, u >) is a norm on V. •
Definition B.14 (Distance Between Vectors)

Let u and v be vectors in an inner product space V . Then the distance


between u and v is defined as
d(u, v) = ku − vk.
Note that:
d(u, 0) = d(0, u) = kuk.
A vector with norm 1 is called a unit vector. The set S of all unit vectors
is called a unit circle or unit sphere S = {u|u ∈ V and kuk = 1}. •

The following theorem summarizes the most important properties of a


distance function.
Theorem B.18 Let d be a distance function defined on a normed linear
space V . The following properties hold for all u, v, and w vectors in V :
1. d(u, v) ≥ 0, and d(u, v) = 0, if and only if u = v.
2. d(u, v) = d(v, u).
3. d(u, w) ≤ d(u, v) + d(v, w). •

Definition B.15 (Orthogonal Vectors)

Let u and v be vectors in an inner product space V . Then u and v are


orthogonal if
< u, v > = 0.

In the following we give some well-known theorems concerning inner prod-


uct spaces.
Theorem B.19 (Pythagoras’ Theorem)

Let u and v be vectors in an inner product space V . Then u and v are


orthogonal if and only if
ku + vk2 = kuk2 + kvk2 .

Theorem B.20 (Orthogonality Test for Linear Independence)

Nonzero orthogonal vectors in an inner product space are linearly indepen-


dent. •

Theorem B.21 (Normalization Theorem)

For every nonzero vector u in an inner product space V , the vector v =


u/kuk is a unit vector. •

Theorem B.22 (Cauchy–Schwarz Inequality)

Let u and v be vectors in an inner product space V . Then

| < u, v > | ≤ kukkvk,

with the inequality holding, if and only if u and v are scalar multiples of
each other. •

Theorem B.23 (Triangle Inequality)

Let u and v be vectors in an inner product space V . Then

ku + vk ≤ kuk + kvk.

Theorem B.24 (Parallelogram Law)

Let V be an inner product space. For any vectors u and v of V , we have

ku + vk2 + ku − vk2 = 2kuk2 + 2kvk2 .

Theorem B.25 (Polarization Identity)

Let V be an inner product space. For any vectors u and v of V , we have


< u, v > = (1/4)‖u + v‖² − (1/4)‖u − v‖².

Theorem B.26 Let V be an inner product space. For any vectors u and
v of V , we have

ku + vk2 = kuk2 + kvk2 + 2 < u, v > .



Theorem B.27 Every inner product on Rn is given by


< u, v > = uT Av, for u, v ∈ Rn ,
where A is a symmetric, positive-definite matrix. •
Theorem B.28 All Gram matrices are positive semidefinite. The Gram
matrix
 
A = | <u1, u1>   <u1, u2>   · · ·   <u1, un> |
    | <u2, u1>   <u2, u2>   · · ·   <u2, un> |
    |    ...        ...       ...      ...   |
    | <un, u1>   <un, u2>   · · ·   <un, un> |
(where u1 , u2 , . . . , un are vectors in the inner product space V ) is positive-
definite, if and only if u1 , u2 , . . . , un are linearly independent. •

B.3.2 Complex Inner Products


Certain applications of linear algebra require complex-valued inner prod-
ucts.
Definition B.16 (Complex Inner Product)

An inner product on a complex vector space V is a function that associates


a complex number < u, v > with each pair of vectors u and v in such a
way that the following axioms are satisfied for all vectors u, v, and w in V
and all scalars α:
1. < u, v > is the complex conjugate of < v, u > (conjugate symmetry).
2. < u + v, w > = < u, w > + < v, w > .
3. < αu, v > = α < u, v > .
4. < v, v >≥ 0 and < v, v > = 0, if and only if v = 0.
Axiom 1 says that < u, v > equals the complex conjugate of < v, u >. Complex inner
products are no longer symmetric since < v, u > is not always equal to its
complex conjugate. A complex vector space with an inner product is called
a complex inner product space or unitary space. •

The following additional properties follow immediately from the four inner
product axioms:

1. < 0, u > = < v, 0 > = 0.

2. < u, v + w > = < u, v > + < u, w > .

3. < u, αv > = α < u, v > .

An inner product can then be used to define the norm, orthogonality, and
distance for a complex vector space.
Let u = (u1 , u2 , . . . , un ) and v = (v1 , v2 , . . . , vn ) be elements of Cn . The
most useful inner product for Cn is

< u, v > = u1 v̄1 + u2 v̄2 + · · · + un v̄n , where the bar denotes the complex conjugate.

It can be shown that this definition satisfies the inner product axioms for
a complex vector space.
This inner product leads to the following definitions of norm, distance,
and orthogonality for Cn :

1. ‖u‖ = √(u1 ū1 + u2 ū2 + · · · + un ūn ).

2. d(u, v) = ku − vk.

3. u is orthogonal to v, if < u, v > = 0. •
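As a numerical illustration, the complex inner product and norm above can be evaluated in MATLAB (a sketch; the conjugate is applied to the second vector to match the definition used here):

>> u = [1+1i; 2]; v = [3; 1-1i];
>> ip = sum(u.*conj(v));         % <u, v> = u1*conj(v1) + u2*conj(v2)
>> nrm = sqrt(sum(u.*conj(u)));  % the same as norm(u)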



B.4 Problems
1. Compute u + v, u − v, and their norms for each of the following:
(a) u =< 4, −5 >, v =< 3, 4 > .
(b) u =< −3, −7 >, v =< −4, 5 > .
(c) u =< 4, 5, 6 >, v =< 1, −3, 4 > .
(d) u =< −7, 15, 26 >, v =< 11, −13, 24 >

2. Compute u + v, u − v, and their norms for each of the following:


(a) u = 3i + 2j + 5k, v = 2i − 5j − 7k.
(b) u = i − 12j − 13k, v = 8i + 11j + 16k.
(c) u = 19i − 22j − 35k, v = −12i + 32j + 18k.
(d) u = 34i − 35k, v = −31i + 25j − 27k.

3. Find a unit vector that has the same direction as a vector a:


(a) a =< −7, 15, 26 > .
(b) a = 2 < 6, −11, 15 > .
(c) a = −3 < 11, −33, 46 > .
(d) a = 5 < 12, −23, −33 > .

4. Find a unit vector that has the same direction as a vector a:


(a) a = i − 5j + 4k.
(b) a = 3i + 7j − 3k.
(c) a = 25i − 17j + 22k.
(d) a = 33i + 45j − 51k.

5. Find the dot product of each of the following vectors:


(a) u =< −3, 4, 2 >, v =< 5, −3, 4 > .
(b) u =< 2, −1, 4 >, v =< 6, 9, 12 > .
(c) u =< 6, 7, 8 >, v =< −8, −11, 15 > .
(d) u =< −23, 24, 33 >, v =< 26, −45, 51 > .

6. Find the dot product of each of the following vectors:


(a) u = i − 3j + 2k, v = −2i + 3j − 5k.
(b) u = 5i + 7j − 8k, v = 6i + 10j + 14k.
(c) u = −21i + 13j − 26k, v = 17i − 33j + 56k.
(d) u = 55i − 63j + 72k, v = 33i − 43j − 75k.

7. Find the angle between each of the following vectors:


(a) u =< 2, 3, 1 >, v =< 4, −2, 5 > .
(b) u =< −3, 0, 7 >, v =< 5, −8, 4 > .
(c) u =< 7, −9, 11 >, v =< 6, −13, 10 > .
(d) u =< 22, −29, 31 >, v =< 27, 41, 57 > .

8. Find the angle between each of the following vectors:


(a) u = 2i + 4j − 3k, v = 3i + j − 6k.
(b) u = i − 11j − 12k, v = 5i − 13j − 16k.
(c) u = −11i + 23j + 32k, v = 7i − 43j − 26k.
(d) u = 25i − 36j + 47k, v = 31i + 24j + 15k.

9. Find the value of α such that the following vectors are orthogonal:
(a) u =< 4, 5, −3 >, v =< α, 4, 0 > .
(b) u =< 5, α, −4 >, v =< 5, −3, 4 > .
(c) u =< 2 sin x, 2, − cos x >, v =< − sin x, α, 2 cos x > .
(d) u =< sin x, cos x, −2 >, v =< cos x, − sin x, α > .

10. Show that the following vectors are orthogonal:


(a) u =< 3, −2, 1 >, v =< 4, 5, −2 > .
(b) u =< 4, −1, −2 >, v =< 2, −2, 5 > .
(c) u =< sin x, cos x, −2 >, v =< cos x, − sin x, 0 > .
(d) u =< 2 sin x, 2, − cos x >, v =< − sin x, 2, 2 cos x > .

11. Find the direction cosines and angles for the vector u = 1i + 2j + 3k,
and show that cos2 α + cos2 β + cos2 γ = 1.

12. Find the direction cosines and angles for the vector u = 9i−13j+22k,
and show that cos2 α + cos2 β + cos2 γ = 1.

13. Find compv u and compu v of each of the following vectors:


(a) u =< 2, 1, 1 >, v =< 3, 2, 2 > .
(b) u =< 3, 2, 2 >, v =< 1, 2, 4 > .
(c) u =< 3, 5, −2 >, v =< 4, −1, 4 > .
(d) u =< 5, 7, 8 >, v =< 10, 11, 12 > .

14. Find compv u and compu v of each of the following vectors:


(a) u =< 2, 4, −3 >, v =< 2, 2, 7 > .

(b) u =< 3, −3, −4 >, v =< −4, 2, 2 > .


(c) u =< 7, 5, 5 >, v =< 8, −10, 14 > .
(d) u =< 9, 7, 11 >, v =< 15, 17, 13 > .

15. Find the projection of a vector u and v of each of the following


vectors:
(a) u =< 2, 3, 4 >, v =< 3, 5, 2 > .
(b) u =< 5, −4, 2 >, v =< 5, 3, 1 > .
(c) u =< 6, 4, −2 >, v =< 2, −1, 3 > .
(d) u =< 9, 7, −5 >, v =< 10, −11, 9 > .

16. Find the projection of a vector u and v of each of the following


vectors:
(a) u =< 3, 4, 3 >, v =< 3, 2, 5 > .
(b) u =< 3, 3, 7 >, v =< −4, 6, 5 > .
(c) u =< 7, 8, 5 >, v =< 6, −10, 11 > .
(d) u =< 8, 7, 10 >, v =< 12, 13, 11 > .

17. A force is given by a vector F = 12i − 9j + 11k and moves a particle


from the point P (9, 7, 5) to the point Q(14, 22, 17). Find the work
done.

18. A force is given by a vector F = 4i + 5j + 7k and moves a particle


from point P (2, 1, 3) to point Q(5, 4, 3). Find the work done.

19. Find the cross product of each of the following vectors:


(a) u =< 2, −3, 4 >, v =< 3, −2, 6 > .
(b) u =< −3, 2, −2 >, v =< −1, 2, −4 > .
(c) u =< 3, 5, −2 >, v =< 2, −1, 4 > .
(d) u =< 12, −31, 21 >, v =< 14, 17, −19 > .

20. Use the cross product to show that each of the following vectors are
parallel:
(a) u =< 2, −1, 4 >, v =< −6, 3, −12 > .
(b) u =< −3, −2, 1 >, v =< 6, 4, −2 > .
(c) u =< 3, 4, 2 >, v =< −6, −8, −4 > .
(d) u =< −6, −10, 4 >, v =< 3, 5, −2 > .

21. Find the cross product of each of the following vectors:


(a) u = 12i − 8j + 11k, v = −9i + 17j + 13k.
(b) u = 13i − 17j + 3k, v = 22i − 13j + 12k.
(c) u = 18i − 19j + 12k, v = 14i − 9j + 13k.
(d) u = 15i − 13j + 4k, v = 8i − 11j + 15k.

22. Find the area of the parallelogram with vertices P, Q, R, and S:


(a) P (2, 3, 1), Q(3, 4, 2), R(5, −2, 1), S(4, −6, 3).
(b) P (2, 1, 1), Q(5, 2, 2), R(4, 2, 3), S(7, 5, 2).
(c) P (2, 1, 1), Q(6, 1, 5), R(5, −3, 4), S(9, −5, 4).
(d) P (2, 1, 1), Q(3, 5, 3), R(2, 5, 9), S(8, −7, 6).

23. Find the area of the parallelogram made by P~Q and P~R, where P,
Q, and R are the points in the plane:
(a) P (4, −5, 2), Q(2, 4, 7), R(−4, −2, 6).
(b) P (2, 1, 1), Q(5, 2, −5), R(4, −2, 5).
(c) P (2, 1, 1), Q(4, 4, 5), R(5, −3, 4).
(d) P (2, 1, 1), Q(9, 5, 3), R(4, 7, 9).

24. Find the area of the triangle with vertices P, Q, and R:


(a) P (2, 3, 1), Q(3, 4, 2), R(5, −2, 1).
(b) P (2, 1, 1), Q(5, 2, 2), R(4, 2, 3).
(c) P (2, 1, 1), Q(6, 1, 5), R(5, −3, 4).
(d) P (2, 1, 1), Q(3, 5, 3), R(2, 5, 9).

25. Find the area of the triangle with vertices P, Q, and R:


(a) P (4, −5, 2), Q(2, 4, 7), R(−4, −2, 6).
(b) P (2, 1, 1), Q(5, 2, −5), R(4, −2, 5).
(c) P (2, 1, 1), Q(4, 4, 5), R(5, −3, 4).
(d) P (2, 1, 1), Q(9, 5, 3), R(4, 7, 9).

26. Find the scalar triple product of each of the following vectors:
(a) u =< 2, 0, 1 >, v =< 3, −4, 2 >, w =< 3, −2, 0 > .
(b) u =< 5, 3, 0 >, v =< 1, −2, 5 >, w =< 3, −2, 7 > .
(c) u =< 5, −3, 2 >, v =< 2, 1, 6 >, w =< 4, 0, −5 > .
(d) u =< 5, −4, 9 >, v =< 7, −4, 4 >, w =< 6, 1, 5 > .

27. Find the scalar triple product of each of the following vectors:
(a) u = 3i − 5j + 2k, v = 6i + 3j + 4k, w = 3i − 8j + k.
(b) u = 4i − 3j + 6k, v = −4i + 3j + 5k, w = −3i + 9j + 2k.
(c) u = 17i − 25j + 10k, v = 5i + 9j + 13k, w = 4i + 5j + 8k.
(d) u = 25i+24j+15k, v = 13i−11j+17k, w = −9i+18j+27k.

28. Find the volume of the parallelepiped determined by each of the


following vectors:
(a) u =< 1, −1, 2 >, v =< 3, 2, 1 >, w =< 2, −2, 1 > .
(b) u =< 2, 3, 4 >, v =< 3, 2, 5 >, w =< 3, −2, 3 > .
(c) u =< 5, −3, 2 >, v =< 2, 1, 6 >, w =< 4, 3, 7 > .
(d) u =< 3, 6, 8 >, v =< 5, 7, 9 >, w =< 4, −2, 5 > .

29. Find the volume of the parallelepiped determined by each of the


following vectors:
(a) u = i − 4j + 2k, v = 2i + 3j + 4k, w = 3i − 5j + k.
(b) u = 3i − 2j + 5k, v = −3i + 2j + 5k, w = −2i + 4j + 3k.
(c) u = 7i − 5j + 10k, v = i + 9j + 3k, w = 2i + 6j + 7k.
(d) u = 2i + 4j + 5k, v = 5i − 11j + 6k, w = −i + 2j + k.

30. Find the volume of the parallelepiped with adjacent sides P Q, P R,


and P S:
(a) P (2, −1, 2), Q(4, 2, 1), R(3, −2, 1), S(5, −2, 1).
(b) P (3, −2, 4), Q(3, 2, 5), R(2, 1, 5), S(4, −3, 3).
(c) P (2, 4, 2), Q(4, 2, 3), R(3, 4, 6), S(3, 2, 5).
(d) P (10, 3, 3), Q(4, 2, 5), R(7, 11, 9), S(13, 12, 15).

31. Find the triple vector product by using each of the following vectors:
(a) u =< 3, 2, 1 >, v =< 3, −2, 1 >, w =< −2, −2, 1 > .
(b) u =< 5, 3, −4 >, v =< 3, −2, 4 >, w =< 3, −2, 2 > .
(c) u =< 5, 6, 7 >, v =< 6, 8, 9 >, w =< 14, 13, 17 > .
(d) u =< 17, 21, 18 >, v =< 15, 7, 12 >, w =< 14, −12, 15 > .

32. Find the triple vector product by using each of the following vectors:
(a) u = 2i − 2j + 2k, v = 3i + 5j + 4k, w = 4i − 3j + 2k.
(b) u = i − 3j + 2k, v = 4i + 6j + 5k, w = 2i + 4j + 3k.
(c) u = 3i + 4j + 7k, v = 4i + 9j + 11k, w = 5i + 7j + 12k.
(d) u = 8i+9j+15k, v = 10i−14j+16k, w = −16i−22j+15k.

33. Find the parametric equations for the line through point P parallel
to vector u :
(a) P (3, 2, −4), u = 2i + 2j + 2k.
(b) P (2, 0, 3), u = −2i − 3j + k.
(c) P (1, 2, 3), u = 2i + 4j + 6k.
(d) P (2, 2, −3), u = 4i + 5j − 6k.

34. Find the parametric equations for the line through points P and Q :
(a) P (4, −3, 5), Q(3, 5, 2).
(b) P (−2, 2, −3), Q(5, 8, 9).
(c) P (3, 2, 4), Q(−7, 2, 4).
(d) P (6, −5, 3), Q(3, −3, −4).

35. Find the angles between the lines l1 and l2 :


(a) l1 : x = 1 + 2t, y = 3 − 4t, z = −2 + t;
l2 : x = −5 − t, y = 2 − 3t, z = 4 + 3t.
(b) l1 : x = 2 + 5t, y = 1 − 7t, z = 7 + 3t;
l2 : x = 5 − 6t, y = 9 − 2t, z = 8 + 11t.
(c) l1 : x = 6 + 2t, y = 8 − 4t, z = 2 + 4t;
l2 : x = −4 + 6t, y = −3 + 2t, z = 4 − 3t.
(d) l1 : x = −3 − 4t, y = 2 − 7t, z = −3 + 5t;
l2 : x = 2 + 3t, y = 4 − 5t, z = 2 − 3t.

36. Determine whether the two lines l1 and l2 intersect, and if so, find
the point of intersection:
(a) l1 : x = 1 + 3t, y = 2 − 5t, z = 4 − t;
l2 : x = 1 − 6v, y = 2 + 3v, z = 1 + v.
(b) l1 : x = −3 + 3t, y = 2 − 2t, z = 4 − 4t;
l2 : x = 1 − v, y = 2 + 2v, z = 3 + 3v.
(c) l1 : x = 4 + 3t, y = 12 − 15t, z = 14 − 13t;
l2 : x = 11 − 16v, y = 12 + 13v, z = 10 + 10v.
(d) l1 : x = 9 + 5t, y = 10 − 11t, z = 9 − 21t;
l2 : x = 16 − 22v, y = 13 + 23v, z = 11 − 15v.

37. Find an equation of the plane through the point P with normal vector
u:
(a) P (4, −3, 5), u = 2i + 3j + 4k.

(b) P (3, 5, 6), u = i − j + 2k.


(c) P (5, −1, 2), u = i + 5j + 4k.
(d) P (9, 11, 13), u = 6i − 9j + 8k.

38. Find an equation of the plane determined by the points P, Q, and R:


(a) P (3, 3, 1), Q(2, 4, 2), R(5, 3, 4).
(b) P (2, 5, 6), Q(5, 2, 5), R(3, 2, 6).
(c) P (5, −4, 3), Q(6, −3, 7), R(5, −3, 4).
(d) P (12, 11, 11), Q(8, 6, 11), R(12, 15, 19).

39. Find the distance from point P to the plane:


(a) P (1, −4, −3), 2x − 3y + 6z + 1 = 0.
(b) P (3, 5, 7), 4x − y + 5z − 9 = 0.
(c) P (3, 0, 0), 2x + 4y − 4z − 7 = 0.
(d) P (9, 10, 11), 12x − 23y + 11z − 64 = 0.

40. Show that the two planes are parallel and find the distance between
the planes:
(a) 8x − 4y + 12z − 6 = 0, −6x + 3y − 9z − 4 = 0.
(b) x + 2y − 2z − 3 = 0, 2x + 4y − 4z − 7 = 0.
(c) 2x − 2y + 2z − 4 = 0, x − y + z − 1 = 0.
(d) −4x + 2y + 2z − 1 = 0, 6x − 3y − 3z − 4 = 0.

41. Perform the indicated operation on each of the following:


(a) (3 + 4i) + (7 − 2i) + (9 − 5i).
(b) 4(6 + 2i) − 7(4 − 3i) + 11(6 − 8i).
(c) (1 − 2i)(1 + 2i) + (3 − 5i)(5 − 3i) + (7 − 3i)(8 − 2i).
(d) (−4 − 12i)(−13 + 5i) + (21 + 15i)(−11 − 23i) + (13 + i)(−4 + 7i).

42. Perform the indicated operation on each of the following:


(a) (−2 + 7i) − (9 + 12i) − (11 − 15i).
(b) 3(−7 − 5i) + 9(−3 + 5i) − 8(6 + 17i).
(c) (3 + 2i)(5 − 3i) − (13 + 10i)(13 − 10i) + (−12 + 5i)(7 − 11i).
(d) (17 + 21i)(31 − 26i) − (15 − 22i)(10 + 22i) − (25 − i)(9 − 15i).

43. Convert each of the following complex numbers to its polar form:
(a) 4 + 3i.
(b) 5√3 + 7i.
(c) −3 + 5i.
(d) −7√3 − 5i.
44. Convert each of the following complex numbers to its polar form:
(a) 11 −√8i.
(b) −15 7√+ 22i.
(c) −45√ + 19i.
(d) 24 18 − 25i.
45. Compute the conjugate of each of the following:
(a) 5 − 3i.
(b) −7 + 9i.
(c) e−2πi .
(d) 11e5πi/4 .

46. Use the polar forms of the complex numbers z1 = −1 − √3 i and
z2 = 3 + i to compute z1 z2 and z1 /z2 .
47. Find A + B, A − B, and CA using the following matrices:

    A = | 2+i   i  |,   B = | 1−i    −2i |,   C = | i      1−i  |.
        | 3−i   2i |        | 1+2i    3i |        | 4−i    3+2i |
                                                  | 2−3i   1−2i |

48. Find 2A + 5B, 3A − 7B, and 4CA using the following matrices:

    A = | 1+i     3i      1+i  |,   B = | −3i      4i      1−5i |,
        | 4−i     2i      1−3i |        | −5+2i    8i      2+7i |
        | i       2i      i    |        | −i       1+4i    −7i  |

    C = | 1+i       1−i       4i   |.
        | 29−12i    1−44i     13i  |
        | 52−63i    31+21i    −42i |
49. Solve each of the following systems:
(a)
2ix1 − 3x2 = 1 − i
4ix1 + (5 + i)x2 = 2i.

(b)

(1 − i)x1 + (2 + i)x2 = 3i
(3 + i)x1 + 4ix2 = 1 − 5i.

(c)

−6ix1 + 9x2 = 4 − i
(1 − 4i)x1 − 6ix2 = −7i.

(d)

5ix1 − 3ix2 = 1 − i
7ix1 + 8ix2 = 3i.

50. Solve each of the following systems:


(a)

ix1 − 7x2 + (1 − 2i)x3 = 4 − 3i


2ix1 − 5ix2 + 4ix3 = 1 − 2i
ix1 + 11ix2 − 9ix3 = 5i.

(b)

ix1 + 5x2 + 3i)x3 = 6i


2ix1 − 9ix2 − 7ix3 = 7i
3ix1 − 8ix2 + 10ix3 = 8i.

(c)

13ix1 + 12x2 + (1 − 2i)x3 = 4 − 3i


21ix1 − 11ix2 − 12ix3 = 4 − 17i
33ix1 − 41ix2 − 29ix3 = 27i.

(d)

1 − 3ix1 − 13ix2 + (1 + 21i)x3 = 6i


−5ix1 + 14ix2 − 24ix3 = 1 + 11i
7ix1 + 15ix2 + 23ix3 = 18i.

51. Find the determinant of each of the following matrices:


(a)  
2 − 3i 2i
A= .
6 + 7i 1 − 3i
(b)  
12 + 7i 2i
A= .
11 − 5i 3 + 4i
(c)  
12 + 23i 1 − 5i
A= .
16 − 27i 21 − 13i
(d)  
10 − 27i 8i
A= .
13 − 15i 11 − 24i

52. Find the determinant of each of the following matrices:


(a)  
1 − i 2i 4i
A =  4i 3i −2i  .
i 5i 4i
(b)  
3i 1 − 5i 7i
A =  i 2i 5i  .
6i −2i 3i
(c)  
1 − 2i 7i 3 + 5i
A =  2 + 7i 8i 1 + 7i  .
3 − 5i 9i 5 − 8i

(d)
 
13i 11i 15i
A =  23i −25i 5 − 15i  .
33i 33i 18i

53. Find the determinant of each of the following matrices:


(a)
 
i i −1 + 3i
A =  3 − 5i 2 + 7i 2 − 2i  .
i 3i 5i
(b)
 
2 − 3i 11 − 15i 12 − 7i
A =  5 − 8i 6 − 13i 3 − 15i  .
8 − 25i 10 − 2i 3i
(c)
 
1 + 2i 1 − 7i 3 − 6i
A =  2 − 7i 4 − 8i 11 − 5i  .
13 + 5i 3 − 9i 5 + 17i
(d)
 
32 − 13i 8 + 11i 4 − 15i
A =  42 + 23i 9 − 25i 5 − 27i  .
52 − 33i 10 − 31i 13 − 18i

54. Find the real and imaginary part of each of the following matrices:
(a)
 
3i 2
A= .
2 − 5i 1 − i
(b)
 
2 − 5i i
A= .
1 − 3i 2 − 3i
(c)
 
13 − 21i 8 − 25i
A= .
15 + 27i 10 + 13i

(d)  
10 − 33i 3 − 9i
A= .
41 − 17i 10
55. Find the real and imaginary part of each of the following matrices:
(a)  
4 − 2i 3i 2 + 3i
A =  3 − 5i 2 + 7i 3 − 7i  .
2i 3i 1 − 5i
(b)  
2 + 5i 11 − 15i 12 − 7i
A =  5 − 7i 6 − 13i 3 − 15i  .
3 − 2i 9 − 12i 4 − 2i
(c)  
3i 3 − 4i 2 − 5i
A =  2 − 7i 4 − 8i 11 − 5i  .
11 + 2i 6 3 + 7i
(d)  
i 4 + 14i 24 − 35i
A =  11 + 13i 9 − 25i 25 − 17i  .
22 + 23i 12 − 31i 23 − 38i
56. Find the eigenvalues and the corresponding eigenvectors of each of
the following matrices:
(a)  
1 2 0
A =  4 5 6 .
7 0 9
(b)  
0 2 0
A =  4 5 −6  .
7 0 1
(c)  
1 0 1
A =  3 0 0 .
1 1 1

(d)  
2 0 −5
A= 3 1 0 .
1 5 1
57. Find the eigenvalues and the corresponding eigenvectors of each of
the following matrices:
(a)  
7 2 1
A =  3 3 0 .
1 2 1
(b)  
−2 12 1
A =  −23 13 1  .
6 −2 21
(c)  
11 −22 11
A =  12 −24 42  .
13 −26 −63
(d)  
−29 18 −11
A= 6 7 12  .
10 16 −6

58. Find A ± B, AB, and AT by using each of the following matrices:


(a)    
2+i i 1 − i −2i
A= , B= .
3 − i 2i 1 + 2i 3i
(b)
   
9 + 2i 1 − 4i 3 − 5i 7 − 8i
A= , B= .
3 + 4i 3 − 2i 9 + 11i 12 − 15i
(c)
   
1 − 3i 7i 1 − 5i 1 − 3i 3 − 4i 6 − 5i
A =  3 − i 4i 4 − 3i  , B =  −5 + 2i 8i 3 − 7i  .
2i 2i 4i 1 − 3i 5 + 4i 2 − 6i

(d)
   
1 + 5i 7 − 3i 1 + i 9 − 10i 11 + 12i 14 − 15i
A =  4 − 7i 8 − 2i 2 − 2i  , B =  10 + 11i 11 − 12i 13 + 14i  .
3i 9 − i 3 − 3i 11 − 12i 13 + 14i 15 − 16i
Appendix C

Introduction to MATLAB

C.1 Introduction
In this appendix, we discuss programming with the software package MAT-
LAB. The name MATLAB is an abbreviation for “Matrix Laboratory.”
MATLAB is an extremely powerful package for numerical computing and
programming. In MATLAB, we can give direct commands, as on a hand
calculator, and we can write programs. MATLAB is widely used in univer-
sities and colleges in introductory and advanced courses in mathematics,
science, and especially in engineering. In industry, software is used in re-
search, development, and design. The standard MATLAB program has
tools (functions) that can be used to solve common problems. Until re-
cently, most users of MATLAB have been people who had previous knowl-
edge of programming languages such as FORTRAN or C and switched to
MATLAB as the software became popular.
MATLAB software exists as a primary application program and a large
library of program modules called the standard toolbox. Most of the nu-
merical methods described in this textbook are implemented in one form

or another in the toolbox. The MATLAB toolbox contains an extensive


library for solving many practical numerical problems, such as root finding,
interpolation, numerical integration and differentiation, solving systems of
linear and nonlinear equations, and solving ordinary differential equations.
The MATLAB package also consists of an extensive library of numerical
routines, easily accessed two- and three-dimensional graphics, and a high-
level programming format. The ability to quickly implement and modify
programs makes MATLAB an appropriate format for exploring and exe-
cuting the algorithms in this textbook.
MATLAB is a mathematical software package based on matrices.
It is a highly optimized and extremely reliable system for numerical linear
algebra. Many numerical tasks can be concisely expressed in the language
of linear algebra.
MATLAB is a huge program; therefore, it is impossible to cover all of it
in this appendix. Here, we focus primarily on the foundations of MATLAB.
It is believed that once these foundations are well understood, the student
will be able to learn advanced topics easily by using the information in the
Help menu.
The MATLAB program, like most other software, is continually being
developed and new versions are released frequently. This appendix cov-
ers version 7.4, release 14. It should be emphasized, however, that this
appendix covers the basics of MATLAB which do not change much from
version to version. This appendix covers the use of MATLAB on comput-
ers that use the Windows operating system and almost everything is the
same when MATLAB is used on other machines.

C.2 Some Basic MATLAB Operations


It is assumed that the software is installed on the computer and that the
user can start the program. Once the program starts, the window that
opens contains three smaller windows which are the Command Window
(main window, enter variables, runs programs), the Current Directory Win-
dow (shows the files in the current directory), and the Command History
Window (logs the commands entered in the Command Window). Besides these,
there are other windows, including the Figure Window (contains output

from graphics commands), the Editor Window (creates and debugs script
and function files), the Help Window (gives help information), and the
Workspace Window (gives information about the variables that are used).
The Command Window in MATLAB is the main window and can be used
for executing commands, opening other windows, running programs writ-
ten by users, and managing the software.

(1) Throughout this discussion we use >> to indicate a MATLAB com-


mand statement. The command prompt >> may vary from system
to system. The command prompt >> is given by the system and
you only need to enter the MATLAB command.

(2) It is possible to include comments in the MATLAB workspace. Typ-


ing % before a statement indicates a comment statement. Comment
statements are not executable. For example:
>> % Find root of nonlinear equation f (x) = 0

(3) To get help on a topic, say, determinant, enter


>> help determinant

(4) A semicolon placed at the end of an expression suppresses the com-


puter output. For example:

>> a = 25;

Without ; a was displayed.

(5) If a command is too long to fit on one line, it can be continued to


the next line by typing three periods · · · (called an ellipsis). •
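For instance, a long sum can be continued onto the next line with the ellipsis (a small illustrative sketch; the variable name s is arbitrary):

>> s = 1 + 2 + 3 + ...
4 + 5
s =
15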

C.2.1 MATLAB Numbers and Numeric Formats


All numerical variables are stored in MATLAB in double-precision floating-
point form. While it is possible to force some variables to be other types,
this is not done easily and is unnecessary.
The default output to the screen is to have 4 digits to the right of the
decimal point. To control the formatting of output to the screen, use the
command format. The default formatting is obtained by using the follow-
ing command:

>> f ormat short


>> pi
ans =
3.1416
To obtain the full accuracy available in a number, we can use the following
(to have 14 digits to the right of the decimal point):

>> f ormat long


>> pi
ans =
3.14159265358979
The other format commands, called format short e and format long e, use
‘scientific notation’ for the output (4 decimal digits and 15 decimal digits):

>> f ormat short e


>> pi
ans =
3.1416e + 000
>> f ormat long e
>> pi
ans =
3.141592653589793e + 000

The other format commands, called format short g and format long g, use
‘scientific notation’ for the output (the best of 5-digit fixed or floating-point
and the best of 15-digit fixed or floating-point):
>> f ormat short g
>> pi
ans =
3.1416
>> f ormat long g
>> pi
ans =
3.14159265358979
We can also use the other format command for the output, called format
bank (to have 2 decimal digits):
>> f ormat bank
>> pi
ans =
3.14
There are two other format commands which can be used for the output,
called format compact (which eliminates empty lines to allow more lines
to be displayed on the screen) and format loose (which adds empty lines
(opposite of compact)).
As part of its syntax and semantics, MATLAB provides for exceptional
values. Positive infinity is represented by Inf, negative infinity by -Inf, and
“not a number” by NaN. These exceptional values are carried through
the computations in a logically consistent way. •
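As a brief illustration of these exceptional values (a sketch; any divide-by-zero warnings that MATLAB may print are omitted here):

>> 1/0
ans =
Inf
>> -1/0
ans =
-Inf
>> 0/0
ans =
NaN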

C.2.2 Arithmetic Operations


Arithmetic in MATLAB follows all the rules and uses standard computer
symbols for its arithmetic operation signs:

Symbol Effect

+ Addition
− Subtraction
∗ Multiplication
/ Division
\ Left (matrix) division
∧ Power
' Conjugate transpose
pi Constant π (e is obtained as exp(1))

In the present context, we shall consider these operations as scalar arith-


metic operations, which is to say that they operate on 2 numbers in the
conventional manner:

>> (4 − 2 + 3 ∗ pi)/2
ans =
5.7124
>> a = 2; b = sin(a);
>> 2 ∗ bˆ 2
ans =
1.6537
MATLAB’s arithmetic operations are actually much more powerful than
this. We shall see just a little of this extra power later.
There are some arithmetic operations that require great care. The order
in which multiplication and division operations are specified is especially
important. For example:

>> a = 2; b = 3; c = 4;
>> a/b ∗ c
Here, the absence of any parentheses results in MATLAB executing the
two operations from left-to-right so that:

First, a is divided by b, and then: The result is multiplied by c.

The result is therefore:

ans = 2.6667

This arithmetic is equivalent to (a/b)·c, or as a MATLAB command:

>> (a/b) ∗ c

Similarly, a/b/c yields the same result as (a/b)/c or a/(bc), which could be achieved
with the MATLAB command:

>> a/(b ∗ c)

Use parentheses to be sure that MATLAB does what you want.


MATLAB executes the calculations according to the order of prece-
dence which is the same as used in most calculations:

Precedence Mathematical Operation

First Parentheses. For nested parentheses,


the innermost are executed first.

Second Exponentiation.

Third Multiplication, division (equal precedence).

Fourth Addition and subtraction.

Note that in an expression that has several operations, higher prece-


dence operations are executed before lower precedence operations. If two
or more operations have the same precedence, the expression is executed
from left-to-right.
MATLAB can also be used as a calculator in the Command Window
by typing a mathematical expression. MATLAB calculates the expression

and responds by displaying ans = and the numerical result of the expres-
sion in the next line. For example:

>> 25 + 10/5
ans =
27
>> (25 + 10)/5
ans =
7
>> 35ˆ (1/2)+12*4
ans =
53.9161
>> 115ˆ (1/3)+(112*40)/12 - (0.87+3.25)/6
ans =
377.5096

C.2.3 MATLAB Mathematical Functions


All of the standard mathematical functions—often called elementary functions—
that we learned in our calculus courses are available in MATLAB using
their usual mathematical names. The important functions for our pur-
poses are:
Symbol Effect

abs(x) Absolute value
sqrt(x) Square root
sin(x) Sine function
cos(x) Cosine function
tan(x) Tangent function
log(x) Natural logarithmic function
exp(x) Exponential function
atan(x) Inverse tangent function
acos(x) Inverse cosine function
asin(x) Inverse sine function
cosh(x) Hyperbolic cosine function
sinh(x) Hyperbolic sine function

Note that the various trigonometric functions expect their argument to be


in radian (or pure number) form but not in degree form. For example:

>> cos(pi/3)

gives the output:

ans = 0.5
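If an angle is given in degrees, it can first be converted to radians; a short sketch (the value of theta is an illustrative choice):

>> theta = 60;              % angle in degrees
>> cos(theta*pi/180)
ans =
0.5000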

C.2.4 Scalar Variables

A variable is a name made of a letter or a combination of several letters


that is assigned a numerical value. Once a variable is assigned a numerical
value, it can be used in mathematical expressions, in functions, and in any
MATLAB statements and commands. Note that the variables appear to
be scalars. In fact, all MATLAB variables are arrays. An important aspect
of MATLAB is that it works very efficiently with arrays and the main tasks
are best done with arrays.

A variable is actually the name of a memory location. When a new


variable is defined, MATLAB allocates an appropriate memory space where
the variable’s assignment is stored. When the variable is used the stored
data is used. If the variable is assigned a new value the content of the
memory location is replaced:

>> x = 0.5
>> z = sin(x) + cos(x)ˆ 2
z=
1.2496

The following commands can be used to eliminate variables or to obtain



information about variables that have been created:


Command Outcome

clear Remove all variables from the memory.

who Display a list of variables currently in the memory.

whos Display a list of variables currently in the memory


and their size together with information about
their bytes and class. •
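A minimal sketch of these commands (the exact listing depends on what is currently in the workspace):

>> a = 25; v = [1 2 3];
>> who
Your variables are:
a  v
>> clear a
>> who
Your variables are:
v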

C.2.5 Vectors
In MATLAB the word vector can really be interpreted simply as a ‘list of
numbers.’ Strictly, it could be a list of objects other than numbers, but ‘list
of numbers’ fits our needs for now.
There are two basic kinds of MATLAB vectors: row and column vec-
tors. As the names suggest, a row vector stores its numbers in a long
‘horizontal list’ such as

1, 2, 3, 1.23, −10.3, 1.2,

which is a row vector with 6 components. A column vector stores its num-
bers in a vertical list such as:
1
2
3
1.23
−10.3
2.1,
which is a column vector with 6 components. In mathematical notation
these arrays are usually enclosed in brackets [ ].
There are various convenient forms of these vectors for allocating val-
ues to them and accessing the values that are stored in them. The most
basic method of accessing or assigning individual components of a vector is
based on using an index, or subscript, which indicates the position of the

particular component in the list. MATLAB notation for this subscript is


to enclose it in parentheses ( ). For assigning a complete vector in a single
statement, we can use the square brackets [ ] notation. For example:

>> x = [1, 2, 3.4, 1.23, −10.3, 2.1]


x=
1.0000 2.0000 3.4000 1.2300 − 10.3000 2.1000
>> x(3) = x(1) + 3 ∗ x(6)
x=
1.0000 2.0000 7.3000 1.2300 − 10.3000 2.1000
Remember that when entering values for a row vector, spaces can be
used in place of commas. For the corresponding column vector, simply
replace the commas with semicolons. To switch between column and row
format for a MATLAB vector we use the transpose operator, denoted by '.
For example:

>> x = x'
x=
1.0000
2.0000
7.3000
1.2300
−10.3000
2.1000
MATLAB has several convenient ways of allocating values to a vector where
these values fit a simple pattern.
The colon : has a very special and powerful role in MATLAB. Basically,
it allows an easy way to specify a vector of equally spaced numbers. There
are two basic forms of the MATLAB colon notation.
The first one is that two arguments are separated by a colon as in:

>> x = −2 : 4

which generates a row vector with the first component –2, the last one 4,
and others spaced at unit intervals.

The second form is that the three arguments separated by two colons
has the effect of specifying the starting value : spacing : final value. For
example:

>> x = −2 : 0.5 : 1

which generates

x=
−2.0 − 1.5 − 1.0 − 0.5 0.0 0.5 1.0
Also, one can use MATLAB colon notation as follows:

>> y = x(2 : 6)

which generates

y=
−1.5 − 1.0 − 0.5 0.0 0.5
MATLAB has two other commands for conveniently specifying vectors.
The first one is called the linspace function, which is used to specify a vec-
tor with a given number of equally spaced elements between specified start
and finish points. For example:

>> x = linspace(0, 1, 10)


x=
0.000 0.111 0.222 0.333 0.444 0.556 0.667 0.778 0.889 1.000
Using 10 points results in just 9 steps.
The other command is called the logspace function, which is similar to
the linspace function, except that it creates elements that are logarithmi-
cally equally spaced. The statement:

>> logspace(start value, end value, numpoints)

will create numpoints elements between 10^(start value) and 10^(end value). For

example:

>> x = logspace(1, 4, 4)
x=
10 100 1000 10000

We can use MATLAB’s vectors to generate tables of function values. For


example:

>> x = linspace(0, 1, 11);


>> y = cos(x);
>> [x', y']
ans =
0.0000 1.0000
0.1000 0.9950
0.2000 0.9801
0.3000 0.9553
0.4000 0.9211
0.5000 0.8776
0.6000 0.8253
0.7000 0.7648
0.8000 0.6967
0.9000 0.6216
1.0000 0.5403

Note the use of the transpose to convert the row vectors to columns, and
the separation of these two columns by a comma.

Note also that the standard MATLAB functions are defined to operate on
vectors of inputs in an element-by-element manner. The following exam-
ple illustrates the use of the colon (:) notation and arithmetic within the
argument of a function as:

>> y = sqrt(4 + 2 ∗ (0 : 0.1 : 1)')


ans =
2.0000
2.0494
2.0976
2.1448
2.1909
2.2361
2.2804
2.3238
2.3664
2.4083
2.4495

C.2.6 Matrices
A matrix is a two-dimensional array of numerical values that obeys the
rules of linear algebra as discussed in Chapter 3.
To enter a matrix, list all the entries of the matrix with the first row,
separating the entries by blank space or commas, separating two rows by a
semicolon, and enclosing the list in square brackets. For example, to enter
a 3 × 4 matrix A, we do the following:

>> A = [1 2 3 4; 3 2 1 4; 4 1 2 3]

and it will appear as follows:

A=
1 2 3 4
3 2 1 4
4 1 2 3

There are also other options available when directly defining an array. To
define a column vector, we can use the transpose operation. For example:

>> [1 2 5]'

results in the column vector:

ans =
1
2
5

The components (entries) of matrices can be manipulated in several ways.


For example:

>> A = [1 2 3; 4 5 6; 7 8 9];
>> A(2, 3)
ans =
6

Select a submatrix of A as follows:

>> A([1 3], [1 3])


ans =
1 3
7 9

or

>> A(1 : 2, 2 : 3)
ans =
2 3
5 6

An individual element or group of elements can be deleted from vectors


and matrices by assigning these elements to the null (zero) matrix, [ ]. For
example:

>> x = [1 2 3 4 5];
>> x(3) = [ ]
x=
[1 2 4 5]
>> A = [1 2 3; 4 5 6; 7 8 9];
>> A(:, 1) = [ ]
ans =
2 3
5 6
8 9

To interchange the two rows of a given matrix A, we type the following:

>> B = A([new order of rows separating the entries by commas], :)

For example, if the matrix A has three rows and we want to change rows
1 and 3, we type:

>> B = A([3, 2, 1], :)

For example:

>> A = [1 2 3; 4 5 6; 7 8 9]
>> B = A([3, 2, 1], :)
B=
7 8 9
4 5 6
1 2 3

Note that the method can be used to change the order of any number of
rows.
Similarly, one can interchange the columns easily by typing:

>> B = A(:, [new order of columns separating the entries by commas])

For example, if the matrix A has three columns and we want to change

column 1 and 3, we type:

>> B = A(:, [3, 2, 1])


B=
3 2 1
6 5 4
9 8 7
Note that the method can be used to change the order of any number of
columns.
In order to replace the kth row of a matrix A, set A(k, :) equal to the
new entries of the row separated by a space and enclosed in square brack-
ets, i.e., type:

>> A(k, :) = [New entries of kth row]

For example, to change the second row of a 3 × 3 matrix A to [2, 2, 2], type
the command:

>> A(2, :) = [2 2 2]

For example:

>> A = [1 2 3; 4 5 6; 7 8 9]
>> A(2, :) = [2 2 2]
A=
1 2 3
2 2 2
7 8 9
Similarly, one can replace the kth column of a matrix A equal to the new
entries of the column in square brackets separated by semicolons, i.e., type:

>> A(:, k) = [New entries of kth column]

For example, to change the second column of a 3 × 3 matrix A to [2, 2, 2]0 ,


type the command:

>> A(:, 2) = [2 2 2]

For example:

>> A = [1 2 3; 4 5 6; 7 8 9]
>> A(:, 2) = [2; 2; 2]
A=
1 2 3
4 2 6
7 2 9

C.2.7 Creating Special Matrices


There are several built-in functions for creating vectors and matrices.

• Create a zero matrix with m rows and n columns using zeros function
as follows:

>> A = zeros(m, n)

or, one can create an n × n zero matrix as follows:

>> A = zeros(n)

For example:

>> A = zeros(3)
A=
0 0 0
0 0 0
0 0 0

• Create an n × n ones matrix using the ones function as follows:

>> A = ones(n, n)

For example, the 3 × 3 ones matrix:

>> A = ones(3, 3)
A=
1 1 1
1 1 1
1 1 1

Of course, the matrix need not be square:

>> A = ones(2, 4)
A=
1 1 1 1
1 1 1 1

Indeed, ones and zeros can be used to create row and column vectors:

>> u = ones(1, 4)
u=
1 1 1 1

and

>> v = ones(4, 1)
v=
1
1
1
1

• Create an n × n identity matrix using the eye function as follows:

>> I = eye(n)

For example:

>> I = eye(3)
I=
1 0 0
0 1 0
0 0 1

• Create an n × n diagonal matrix using the diag function, which either


creates a matrix with specified values on the diagonal or it extracts
the diagonal entries. Using the diag function, the argument must be
a vector:

>> v = [4 5 6];
>> A = diag(v)
A=
4 0 0
0 5 0
0 0 6

or it can be specified directly in the input argument as in:

>> A = diag([4 5 6])

To extract the diagonal entries of an existing matrix, the same diag


function is used, but with the input being a matrix instead of a vector:

>> u = diag(A)
u=
4
5
6

• Use the length function and the size function to determine the number
of elements in vectors and matrices. These functions are useful when
one is dealing with matrices of unknown or variable size, especially
when writing loops. To illustrate the length function, type:

>> u = 1 : 5
u=
1 2 3 4 5

Then

>> n = length(u)
n=5

Now consider the size command, which returns two values and has
the syntax:

[nr, nc] = size(A),

where nr is the number of rows and nc is the number of columns in


matrix A. For example:

>> A = eye(3, 4);


>> [nr, nc] = size(A)
nr = 3
nc = 4
>> B = ones(size(A))
B=
1 1 1 1
1 1 1 1
1 1 1 1

• Creating a square root of a matrix A using the sqrt function means


to obtain a matrix B with entries of the square root of the entries of
matrix A. Type:

>> B = sqrt(A)

For example:

>> A = [1 4 5; 2 3 4; 4 7 8];
>> B = sqrt(A)
B=
1.0000 2.0000 2.2361
1.4142 1.7321 2.0000
2.0000 2.6458 2.8284

• Create an upper triangular matrix for a given matrix A using the


triu function as follows:

>> U = triu(A)

For example:

>> A = [1 2 3; 4 5 6; 7 8 9];
>> U = triu(A)
U =
1 2 3
0 5 6
0 0 9

Also, one can create an upper triangular matrix from a given matrix
A with a zero diagonal as:

>> W = triu(A, 1)

For example:

>> A = [1 2 3; 4 5 6; 7 8 9];
>> W = triu(A, 1)
W =
0 2 3
0 0 6
0 0 0

• Create a lower triangular matrix A for a given matrix using the tril
function as:

>> L = tril(A)

For example:

>> A = [1 2 3; 4 5 6; 7 8 9];
>> L = tril(A)
L =
1 0 0
4 5 0
7 8 9

Also, one can create a lower triangular matrix from a given matrix
A with a zero diagonal as follows:

>> V = tril(A, -1)

For example:

>> A = [1 2 3; 4 5 6; 7 8 9];
>> V = tril(A, -1)
V =
0 0 0
4 0 0
7 8 0

• Create an n × n random matrix using the rand function as follows:

>> R = rand(n)

For example:

>> R = rand(3)
R=
0.6038 0.0153 0.9318
0.2722 0.7468 0.4660
0.1988 0.4451 0.4186

• Create a reshape matrix of matrix A using the reshape function as


follows:

>> B = reshape(A, newrows, newcols)



For example:

>> A = [1 2 3; 4 5 6; 7 8 9; 10 11 12]
>> B = reshape(A, 2, 6)
B=
1 7 2 8 3 9
4 10 5 11 6 12

and

>> c = reshape(A, 1, 12)


c=
1 4 7 10 2 5 8 11 3 6 9 12

• Create an n × n Hilbert matrix using the hilb function as follows:

>> H = hilb(n)

For example:

>> H = hilb(3)
H=
1.0000 0.5000 0.3333
0.5000 0.3333 0.2500
0.3333 0.2500 0.2000

• Create a Toeplitz matrix with a given column vector C as the first


column and a given row vector R as the first row using the toeplitz
function as follows:

>> U = toeplitz(C, R)
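For example (a small sketch; when the first elements of C and R differ, the first element of C is used for the (1,1) entry):

>> C = [1 2 3];
>> R = [1 4 5];
>> U = toeplitz(C, R)
U =
1 4 5
2 1 4
3 2 1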

C.2.8 Matrix Operations


The basic arithmetic operations of addition, subtraction, and multiplica-
tion may be applied directly to matrix variables, provided that the partic-
ular operation is legal under the rules of linear algebra. When two matrices
have the same size, we add and subtract them in the standard way matrices
are added and subtracted. For example:

>> A = [3 2 − 3; 4 5 6; 7 6 7];
>> B = [1 2 3; 4 − 2 1; 7 5 − 4];
>> C = A + B
C=
4 4 0
8 3 7
14 11 3

and the difference of A and B gives:

>> D =A−B
D=
2 0 −6
0 7 5
0 1 11

Matrix multiplication has the standard meaning as well. Given any two
compatible matrix variables A and B, MATLAB expression A ∗ B evalu-
ates the product of A and B as defined by the rules of linear algebra. For
example:

>> A = [2 3; −1 4];
>> B = [5 − 2 1; 3 8 − 6];
>> C = A ∗ B
C=
19 20 −16
7 34 −25

Also,

>> A = [1 2; 3 4];
>> B = A0 ;
>> C = 3 ∗ (A ∗ B)ˆ 3
C=
13080 29568
29568 66840

Similarly, if the two vectors are the same size, they can be added or sub-
tracted from one other. They can be multiplied, or divided by a scalar, or
a scalar can be added to each of their components.
Mathematically the operation of division by a vector does not make
sense. To achieve the corresponding component-wise operation, we use the
./ operator. Similarly, for multiplication and powers we use .∗ and .∧,
respectively. For example:

>> a = [1 2 3];
>> b = [2 − 1 4];
>> c = a. ∗ b
c=
2 −2 12

Also,

>> c = a./b
c=
0.5 −2.0 0.75

and

>> c = a.ˆ 3
c=
1 8 27

Similarly,

>> c = 2.ˆ a
c=
2 4 8
and

>> c = b.ˆ a
c=
2 1 64
Note that these operations apply to matrices as well as vectors. For exam-
ple:

>> A = [1 2 3; 4 5 6; 7 8 9];
>> B = [9 8 7; 6 5 4; 3 2 1];
>> C = A. ∗ B
C=
9 16 21
24 25 24
21 16 9
Note that A. ∗ B is not the same as A ∗ B.

>> C = A.ˆ 2
C=
1 4 9
16 25 36
49 64 81
and

>> C = A.ˆ (1/2)


C=
1.0000 1.4142 1.7321
2.0000 2.2361 2.4495
2.6458 2.8284 3.0000
Note that there are no such special operators for addition and subtraction.

C.2.9 Strings and Printing


Strings are matrices with character elements. In more advanced applica-
tions such as symbolic computation, string manipulation is a very impor-
tant topic. For our purposes, however, we shall need only very limited
skills in handling strings initially. One most important use might be to
include your name.
Strings can be defined in MATLAB by simply enclosing the appropriate
string of characters in single quotes such as:

>> first = 'Rizwan';


>> last = 'Butt';
>> name = [first, ' ', last]
name =
Rizwan Butt
Since the transpose operator and the string delimiter are the same character
(the single quote), creating a single column vector with a direct assignment
requires enclosing the string literal in parentheses:

>> LastName = ('Butt')'


LastName =
B
u
t
t
String matrices can also be created as follows:

>> Name = ['Rizwan'; 'Butt  ']    % rows of a character array must have equal length


Name =
Rizwan
Butt
There are two functions for text output, called disp and fprintf. The disp

function is suitable for a simple printing task. The fprintf function provides
fine control over the displayed information as well as the capability of
directing the output to a file.
The disp function takes only one argument, which may be either a
string matrix or a numerical matrix. For example:

>> disp('Hello')


Hello

and

>> x = 0 : pi/5 : 2 ∗ pi;


>> y = cos(x);
>> disp([x' y'])
0.0000 1.0000
0.6283 0.8090
1.2566 0.3090
1.8850 −0.3090
2.5133 −0.8090
3.1416 −1.0000
3.7699 −0.8090
4.3982 −0.3090
5.0265 0.3090
5.6549 0.8090
6.2832 1.0000

More complicated strings can be printed using the fprintf function. This is
essentially a C programming command that can be used to obtain a wide
range of printing specifications. For example:

>> fprintf('My Name is \n Rizwan Butt \n')


My Name is
Rizwan Butt

where the \n is the newline command.

The sprintf function allows specification of the number of digits in the
display, as in:

>> root2 = sprintf('The square root of 2 is %9.7f', sqrt(2))

root2 =
The square root of 2 is 1.4142136
or use of the exponential format:
>> root2 = sprintf('The square root of 2 is %11.5e', sqrt(2))
root2 =
The square root of 2 is 1.41421e+000

C.2.10 Solving Linear Systems


MATLAB started as a linear algebra extension of Fortran. Since its early
days, MATLAB has been extended beyond its initial purpose, but linear
algebra methods are still one of its strongest features. To solve the linear
system
Ax = b
we can just set

>> x = A \ b

with A as a nonsingular matrix. For example:

>> A = [1 1 1; 2 3 1; 1 − 1 − 2];
>> b = [2; 3; −6];
>> x = A \ b
x=
−1
1
2
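The computed solution can be checked by substituting it back into the system; a quick sketch (here the residual is exactly zero because the solution is exact):

>> r = A*x - b
r =
0
0
0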
There are a small number of functions that should be mentioned.
• Reduce a given matrix A to reduced row echelon form by using the
rref function as:

>> rref (A)

For example:

>> A = [1 1 1; 2 3 1; 1 − 1 − 2];
>> rref (A)
ans =
1 0 0
0 1 0
0 0 1

• Find the determinant of a matrix A by using the det function as:

>> det(A)

For example:

>> A = [1 2 − 1; 3 0 1; 4 2 1];
>> det(A)
ans = −6

• Find the rank of a matrix A by using the rank function as:

>> rank(A)

For example:

>> A = [1 4 5; 2 3 4; 4 7 8];
>> rank(A)
ans = 3

• Find the inverse of a nonsingular matrix A by using the inv function


as:

>> inv(A)

For example:

>> A = [1 1 1; 1 2 4; 1 3 9];
>> inv(A)
ans =
3.0000 −3.0000 1.0000
−2.5000 4.0000 −1.5000
0.5000 −1.0000 0.5000

• To find the augmented matrix [A b], which is the combination of


the coefficient matrix A and the right-hand side vector b of the linear
system Ax = b and saving the answer in the matrix C, type:

>> C = [A b];

For example:

>> A = [1 1 1; 1 2 4; 1 3 9];
>> b = [2; 3; 4];
>> C = [A b]
C=
1 1 1 2
1 2 4 3
1 3 9 4

• The LU decomposition of a matrix A can be computed by using the


lu function as:

>> [L, U ] = lu(A)

For example:

>> A = [1 4 5; 2 3 4; 4 7 8];
>> [L, U ] = lu(A)
L=
0.2500 1.0000 0.0000
0.5000 −0.2222 1.0000
1.0000 0.0000 0.0000

and

U=
4.0000 7.0000 8.0000
0.0000 2.2500 3.0000
0.0000 0.0000 0.6667

• To compute the LU decomposition with partial pivoting, returning the permutation matrix P explicitly, use:

>> [L, U, P ] = lu(A)

For example:

>> A = [1 4 5; 2 3 4; 4 7 8];
>> [L, U, P ] = lu(A)
L =
1.0000 0 0
0.2500 1.0000 0
0.5000 −0.2222 1.0000

and

U =
4.0000 7.0000 8.0000
0 2.2500 3.0000
0 0 0.6667

and

P =
0 0 1
1 0 0
0 1 0
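The factors can be checked by verifying that P ∗ A equals L ∗ U (a sketch; the last displayed digits may differ slightly because of rounding):

>> P*A
ans =
4 7 8
1 4 5
2 3 4
>> L*U
ans =
4.0000 7.0000 8.0000
1.0000 4.0000 5.0000
2.0000 3.0000 4.0000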

• One can compute the various norms of the vectors and matrices by
using the norm function. The expression norm(A, 2) or norm(A)
gives the Euclidean norm or l2 -norm of A while norm(A,Inf) gives
the maximum or l∞ -norm. Here, A can be a vector or a matrix. The
l1 -norm of a vector or matrix can be obtained by norm(A,1). For
example, the different norms of the vector can be obtained as:

>> a = [6, 7, 8];
>> V1 = norm(a)
V1 = 12.2066
>> V2 = norm(a, 1)
V2 = 21
>> V3 = norm(a, Inf)
V3 = 8

Similarly, to find the different norms of matrix A type:



>> A = [1 1 1; 1 2 4; 1 3 9];
>> M1 = norm(A)
M1 = 10.6496
>> M2 = norm(A, 1)
M2 = 14
>> M3 = norm(A, Inf)
M3 = 13

• The condition number of a matrix A can be obtained by using the


cond function. The infinity-norm condition number cond(A, Inf) is equivalent
to norm(A, Inf) ∗ norm(inv(A), Inf). For example:

>> A = [1 1 1; 1 2 4; 1 3 9];
>> B = inv(A)        % B = [3 -3 1; -2.5 4 -1.5; 0.5 -1 0.5]
>> N1 = norm(A, Inf)
N1 = 13
>> N2 = norm(B, Inf)
N2 = 8

Thus, the infinity-norm condition number of the matrix A is

cond(A, Inf) = N1 ∗ N2 = 13 ∗ 8 = 104.
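The same value is returned directly by the built-in function when the infinity norm is requested (a quick check):

>> cond(A, inf)
ans =
104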

• The roots of a polynomial p(x) can be obtained by using the roots func-
tion roots(p). For example, if p(x) = 3x^2 + 5x − 6 is a polynomial,
enter:

>> p = [3 5 − 6];


>> r = roots(p)
r=
−2.4748
0.8081

• Use the polyval function to evaluate a polynomial pn(x) at a partic-
ular point x. For example, to evaluate the polynomial p3(x) =
x^3 − 2x + 12 at the given point x = 1.5, type:

>> coef = [1 0 − 2 12];


>> sol = polyval(coef, 1.5)
sol = 12.3750

• Create eigenvalues and eigenvectors of a given matrix A by using the


eig function as follows:

>> [U, D] = eig(A)

Here, U is a matrix with columns as eigenvectors and D is a diagonal


matrix with eigenvalues on the diagonal. For example:

>> A = [1 1 2; −1 2 1; 0 1 3];
>> [U, D] = eig(A)
U=
0.4082 −0.5774 0.7071
0.8165 0.5774 0.0000
−0.4082 −0.5774 0.7071

and

D=
1 0 0
0 2 0
0 0 3

which shows that 1, 2, and 3 are eigenvalues of the given matrix.
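The result can be verified from the defining relation AU = U D (a sketch; the columns of U may differ by sign or scaling between MATLAB versions):

>> norm(A*U - U*D)   % should be of the order of machine precision (about 1e-15)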



C.2.11 Graphing in MATLAB


Plots are a very useful tool for presenting information. This is true in
any field, but especially in science and engineering where MATLAB is
mostly used. MATLAB has many commands that can be used for creating
different types of plots. MATLAB can produce two- and three-dimensional
plots of curves and surfaces. The plot command is used to generate graphs
of two-dimensional functions. MATLAB’s plot function has the ability
to plot many types of ‘linear’ two-dimensional graphs from data which is
stored in vectors or matrices. For producing two-dimensional plots we have
to do the following:

• Divide the interval into subintervals of equal width. To do this, type:

>> x = a : d : b;

where a is the lower limit, d is the width of each subinterval, and b


is the upper limit of the interval.

• Enter the expression for y in term of x as:

>> y = f (x);

• Create the plot by typing:

>> plot(x, y)

For example, to graph the function y = e^x + 10, type:

>> x = −2 : 0.1 : 2;
>> y = exp(x) + 10;
>> plot(x, y)

Figure C.1: Graph of y = e^x + 10.

By default, the plot function connects the data with a solid line. The
markers used for points in a plot may be any of the following:

Symbol Effect

• P oint
◦ Circle
× Cross
? Star

For example, to put a marker for points in the above function plot using
the following commands, we get:

>> x = −2 : 0.1 : 2;
>> y = exp(x) + 10;
>> plot(x, y,0 o0 )
To plot several graphs using the hold on, hold off commands, one graph
is plotted first with the plot command. Then the hold on command is
typed. It keeps the Figure Window with the first plot open, including its

axis properties and formatting if any was done. Additional graphs can
be added with plot commands that are typed next. Each subsequent plot
command adds its graph to the same figure. To stop this process, the hold
off command can be used. For example:

>> x = [−3 : 0.01 : 5];


>> y = x.ˆ 2+x − cos(x);
>> plot(x, y,0 −0 )
>> hold on
>> dy = 2 ∗ x + 1 + sin(x);
>> plot(x, dy,0 −−0 )
>> ddy = 2 + cos(x);
>> plot(x, ddy,0 .0 )
>> dddy = −sin(x);
>> plot(x, dddy,0 .−0 )
>> hold of f

Figure C.2: Graph of function and its first three derivatives.

Also, we can use the fplot command, which plots a function of the
form y = f (x) between specified limits. For example, to plot the function

Figure C.3: A plot of the function y = x3 + 2 cos x + 4.

f (x) = x3 +2 cos x+4 in the domain −2 ≤ x ≤ 2 in the Command Window


type:

>> fplot('x.^3 + 2*cos(x) + 4', [−2, 2])

Three-dimensional surface plots are obtained by specifying a rectangular


subset of the domain of a function with the meshgrid command and then us-
ing the mesh or surf commands to obtain a graph. For a three-dimensional
graph, we do the following:

For the function of two variables Z = f (X, Y ) and three-dimensional plots,


use the following procedure:

• Define the scaling vector for X. For example, to divide the interval
[−2, 2] for x into subintervals of width 0.1, enter:

>> x = −2 : 0.1 : 2;

• Define the scaling vectors for Y . In order to use the same scaling for

y, enter:

>> y = x;

One may, however, use a different scaling for y.

• Create a meshgrid for the x and y axis:

>> [X, Y ] = meshgrid(x, y);

• Compute the function Z = f (X, Y ) at the points defined in the first


two steps. For example, if f (X, Y ) = −3X + Y , enter:

>> Z = −3 ∗ X + Y ;

• To plot the graph of Z = f (X, Y ) in three dimensions, type:

>> mesh(X, Y, Z)

For example, to create a surface plot of z = sin(√(x² + y² + 1))/√(x² + y² + 1) on the
domain −5 ≤ x ≤ 5, −5 ≤ y ≤ 5, we type the following:

>> x = linspace(−5, 5, 20);


>> y = linspace(−5, 5, 20);
>> [X, Y ] = meshgrid(x, y);
>> R = sqrt(X.ˆ 2+Y.ˆ 2+1) + eps;
>> Z = sin(R)./R;
>> surf (X, Y, Z)

Adding eps (a MATLAB constant equal to the floating-point relative accuracy,
i.e., the spacing between 1.0 and the next larger double-precision number)
is a common way to guard such a denominator against the indeterminate form 0/0.
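For reference, in double precision eps has the value (a quick check):

>> eps
ans =
2.2204e-016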

Figure C.4: Surface plot of z = sin(√(x² + y² + 1))/√(x² + y² + 1).

Subplots

Often, it is in our interest to place more than one plot in a single figure
window. This is possible with the graphic command called the subplot
function, which is always called with three arguments as in:

>> subplot(nrows, ncols, thisplot)

where nrows and ncols define a visual matrix of plots to be arranged in a


single figure window, and thisplot indicates the number of the subplot that is
currently being drawn. thisplot is an integer that counts across rows and
then columns. For a given arrangement of subplots in a figure window,
the nrows and ncols arguments do not change. Just before each plot in
the matrix is drawn, the subplot function is issued with the appropriate
value of thisplot. The following figure shows four subplots created with the
following statements:

>> x = linspace(0, 2 ∗ pi);


>> subplot(2, 2, 1);
>> plot(x, cos(x)); axis([0 2 ∗ pi − 1.5 1.5]); title(0 cos(x)0 );
>> subplot(2, 2, 2);
>> plot(x, cos(2 ∗ x)); axis([0 2 ∗ pi − 1.5 1.5]); title(0 cos(2x)0 );
>> subplot(2, 2, 3);
>> plot(x, cos(3 ∗ x)); axis([0 2 ∗ pi − 1.5 1.5]); title(0 cos(3x)0 );
>> subplot(2, 2, 4);
>> plot(x, cos(4 ∗ x)); axis([0 2 ∗ pi − 1.5 1.5]); title(0 cos(4x)0 );

Figure C.5: Four subplots in a figure window.

Similarly, one can use the subplot function for creating surface plots by
using the following command:

>> x = linspace(−5, 5, 20); y = linspace(−5, 5, 20);


>> [X, Y ] = meshgrid(x, y); Z = 2 + (X.ˆ 2+Y.ˆ 2);
>> subplot(2, 2, 1); mesh(x, y, Z); title('mesh plot');
>> subplot(2, 2, 2); surf(x, y, Z); title('surf plot');
>> subplot(2, 2, 3); surfc(x, y, Z); title('surfc plot');
>> subplot(2, 2, 4); surfl(x, y, Z); title('surfl plot');

Figure C.6: Four types of surface plots.

C.3 Programming in MATLAB


Here, we discuss the structure and syntax of MATLAB programs. There
are many similarities between MATLAB and other high-level languages.
The syntax is similar to Fortran, with the same ideas borrowed from C.
MATLAB has loop and conditional execution constructs. Several impor-
tant features of MATLAB differentiate it from other high-level languages.
MATLAB programs are tightly integrated into an interactive environment.
MATLAB programs are interpreted, not compiled. All MATLAB variables
are sophisticated data structures that manifest themselves to the user as
matrices. MATLAB automatically manages dynamic memory allocation
for matrices, which affords convenience and flexibility in the development
of algorithms. MATLAB provides highly optimized, built-in routines for
multiplication, adding, and subtracting matrices, along with solving linear
systems and computing eigenvalues.

C.3.1 Statements for Control Flow


The commands for, while, and if define decision-making structures called
control flow statements for the execution of parts of a script based on vari-
ous conditions. Each of the three structures is ended by an end command.
The statements that we use to control the flow are called relations.

The repetitions can be handled in MATLAB by using a for loop or


a while loop. The syntax is similar to the syntax of such loops in any
programming languages. In the following, we discuss such loops.

C.3.2 For Loop


This loop enables us to have an operation repeat a specified number of
times. This may be required in summing terms of a series, or specifying
the elements of a nonuniformly spaced vector such as the first terms of a
sequence defined recursively.
The syntax includes a counter variable, initial value of the counter, the
final value of the counter, and the action to be performed, written in the
following format:

>> for counter name = initial value : final value, action; end

For example, in order to create the 1 × 4 row vector x with entries accord-
ing to formula x(i) = i; type:

>> for i = 1 : 4
x(i) = i
end

The action in this loop will be performed once for each value of counter
name i, beginning with the initial value 1 and increasing by 1 each time, until
the actions are executed for the last time with the final value i = 4.
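As another use of a for loop — summing the terms of a series, as mentioned above — the following sketch sums 1/k^2 for k = 1, . . . , 100 (the displayed value assumes the default short format):

>> s = 0;
>> for k = 1 : 100
s = s + 1/k^2;
end
>> disp(s)
1.6350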

C.3.3 While Loop


This loop allows the number of times the loop operation is performed to
be determined by the results. It is often used in iterative processes such as
approximations to the solution of an equation.
The syntax for a while loop is as follows:

>> while (condition) action; increment action; end

The loop is executed repeatedly as long as the condition (the statement in
parentheses) evaluates to true. Note that the counter variable must be initialized before using
the above command and the increment action gives the increment in the
counter variable. For example:

>> x = 1;
while x > 0.01
x = x/2
end
which generates:

x=
0.5000
0.2500
0.1250
0.0625
0.0313
0.0156
0.0078

C.3.4 Nested for Loops


In order to have a nest of for loops or while loops, each type of loop must
have a separate counter. The syntax for two nested for loops is:

>> for counter1 = initial value1 : final value1,
for counter2 = initial value2 : final value2,
action;
end
end
For example, in order to create a 5 × 4 matrix A by the formula A(i, j) =

i + j, type:

>> for i = 1 : 5,
for j = 1 : 4,
A(i, j) = i + j;
end,
end

which generates a matrix of the form:

ans =
2 3 4 5
3 4 5 6
4 5 6 7
5 6 7 8
6 7 8 9

C.3.5 Structure
Finally, we introduce the basic structure of MATLAB’s logical branching
commands. Frequently, in programs, we wish for the computer to take dif-
ferent actions depending on the value of some variables. Strictly speaking
these are logical variables, or, more commonly, logical expressions similar
to those we saw when defining while loops.
Two types of decision statements are possible in MATLAB, one-way
decision statements and two-way decision statements.
The syntax for the one-way decision statement is:

>> if (condition), action, end

in which the statements in the action block are executed only if the con-
dition is satisfied (true). If the condition is not satisfied (false) then the
action block is skipped. For example:

>> if x > 0, disp('x is a positive number'); end



For a two-way decision statement we define its syntax as:

>> if (condition), action, else action, end

in which the first set of instructions in the action block is executed if the
condition is satisfied while the second set, the action block, is executed if
the condition is not satisfied. For example, if x and y are two numbers and
we want to display the value of the number, we type:

>> if (x > y), disp(x), else disp(y), end

MATLAB also contains a number of logical and relational operators.

The logical operations are represented by the following:

Symbol Effect

& and
| or
∼ not

However, these operators apply not only to scalar variables; they also
work on vectors and matrices when the operation is valid.

The relational operators in MATLAB are:

Symbol Effect

== is equal to
<= is less than or equal
>= is greater than or equal
∼= is not equal to
< is less than
> is greater than

The relational operators are used to compare values or elements of arrays.


If the relationship is true, the result is a logical variable whose value is one.
Otherwise, the value is zero if the relationship is false.
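For example (a small sketch combining relational and logical operators):

>> x = 5;
>> (x > 0) & (x < 10)
ans =
1
>> (x == 7) | (x < 3)
ans =
0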

C.4 Defining Functions


A simple function in mathematics, f (x), associates a unique number to
each value of x. The function can be expressed in the form y = f (x),
where f (x) is usually a mathematical expression in terms of x. Many
functions are programmed inside MATLAB as built-in functions and can
be used in mathematical expressions simply by typing their names with
an argument; examples are tan(x), sqrt(x), and exp(x). A user-defined
function is a MATLAB program that is created by the user, saved as a
function file, and then can be used like a built-in function.
MATLAB allows us to define their functions by constructing an m-file
in the m-file editor. If the m-file is to be a function m-file, the first word
of the file is function, and we must also specify names for its input and
output. The last two of these are purely local variable names.
The first line of the function has the form:

function y = function name(input arguments)

For example, to define the function

2x
f (x) = ex − ,
(1 + x3 )

type:

function y = fn1(x)
y = exp(x) − 2 ∗ x./(1 + x.ˆ 3);
Once this function is saved as an m-file named fn1.m, we can use the
MATLAB Command Window to compute function at any given point.
For example:

>> x = (0 : 0.2 : 2)';


>> fx = fn1(x);
>> [x, fx]

generates the following table:

ans =
0.0000 1.0000
0.2000 0.8246
0.4000 0.7399
0.6000 0.8353
0.8000 1.1673
1.0000 1.7183
1.2000 2.4404
1.4000 3.3073
1.6000 4.3251
1.8000 5.5227
2.0000 6.9446

MATLAB provides the option of using inline functions. An inline function


is defined with computer code (not as a separate file like a function file)
and is then used in the code. Inline functions are created with the inline
command according to the following format:

name = inline(’expression’)

For example, the function f(x) = x²/√(x² + 1) can be defined in the MATLAB
Command Window as follows:

>> y = inline('x^2/sqrt(x^2+1)')


y =
Inline function:
y(x) = x^2/sqrt(x^2+1)

The function can be calculated for different values of x. For example,



>> y(2)
ans =
1.7889
If x is expected to be an array and the function is calculated for each
element, then the function must be modified for element-by-element calcu-
lations:

>> y = inline('x.^2./sqrt(x.^2+1)')


>> y([1 2 3])
ans =
0.7071 1.7889 2.8460
If the inline function has two or more independent variables it can be writ-
ten in the following format:

>> z = inline('x.^2 + y.^2 + 4.*x.*y')


>> z(1, 2)
ans =
13
In MATLAB we can use the feval (function evaluate) command to evalu-
ate the values of a function for a given value (or values) of the function’s
argument.

variable = feval(’function name’, argument value)

For example:

>> y = feval('sqrt', 169)


y=
13
Note that the feval command can also be used with user-defined functions. •
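For example, the user-defined function fn1 from the previous section can also be evaluated through feval (a sketch; it assumes the file fn1.m is saved on the MATLAB path):

>> y = feval('fn1', 1)
y =
1.7183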

C.5 MATLAB Built-in Functions


Listed below are some of the MATLAB built-in functions grouped by sub-
ject area:

Built-in Function Definition

abs absolute value


cos cosine function
sin sine function
tan tangent function
cosh cosine hyperbolic function
sinh sine hyperbolic function
tanh tangent hyperboic function
acos inverse cosine function
asin inverse sine function
atan inverse tangent function
erf error function, erf(x) = (2/√π) ∫_0^x e^(−t²) dt
exp exponential function
expm matrix exponential
log natural logarithm
log10 common logarithm
sqrt calculate square root
sqrtm calculate square root of a matrix
sort arrange elements in ascending order
std calculate standard deviation
mean calculate mean value
median calculate median value
sum calculate sum of elements
angle calculate phase angle
fix round toward zero
floor round toward - ∞
ceil round toward ∞
sign signum function
round round to nearest integer
dot dot product of two vectors

Built-in Function Definition

cross cross product of two vectors


dist distance between two points
frac return the rational approximation
max return maximum value
min return minimum value
factorial factorial function
rref reduce echelon form
zeros generates a matrix of all zeros
ones generates a matrix of all ones
eye generates an identity matrix
hilb generates a Hilbert matrix
reshape rearranges a matrix
det calculate a determinant of a matrix
eig calculate a eigenvalues of a matrix
rank calculate a rank of a matrix
norm(v) calculate the Euclidean norm of a vector v
norm(v,inf) calculate the maximum norm of a vector v
cond(A,2) calculate a condition number of matrix using
Euclidean norm
cond(A,inf) calculate a condition number of matrix using
maximum norm
toeplitz creates a Toeplitz matrix
inv find the inverse of a matrix
pinv find the pseudoinverse of a matrix
diag create a diagonal matrix
length number of elements in a vector
size size of an array
qr create QR-decomposition of a matrix
svd calculate singular value decomposition of a matrix
polyval calculate the value of a polynomial
roots calculate the roots of a polynomial
conv multiplies two polynomials
deconv divide two polynomials
polyder calculate derivative of a polynomial

Built-in Function Definition

polyint calculate integral of a polynomial


polyfit calculate coefficients of a polynomial
fzero solve an equation with one variable
quad integrate a function
linspace create equally spaced vector
logspace create logarithmically spaced elements
axis sets limts to axes
plot create a plot
pie create a pie plot
polar create a polar plot
hist create a histogram
bar create a vertical bar plot
barh create a horizontal bar plot
fplot plot a function
bar3 create a vertical 3-D bar plot
contour create a 2-D contour plot
contour3 create a 3-D contour plot
cylinder create a cylinder
mesh create a mesh plot
meshc create a mesh and a contour plot
surf create a surface plot
surfc create a surface and a contour plot
surfl create a surface plot with lighting
sphere create a sphere
subplot create multiple plot on one page
title add a title to a plot
xlabel add label to x-axis
ylabel add label to y-axis
grid add grid to a plot

C.6 Symbolic Computation


In this appendix we discuss symbolic computation which is an important
and complementary aspect of computing. As we have noted, MATLAB

uses floating-point arithmetic for its calculations. But one can also do exact
arithmetic with symbolic expressions. Here, we will give many examples
to get the exact arithmetic. The starting point for symbolic operations
is symbolic objects. Symbolic objects are made of variables and numbers
that, when used in mathematical expressions, tell MATLAB to execute
the expression symbolically. Typically, the user first defines the symbolic
variables that are needed and then uses them to create symbolic expressions
that are subsequently used in symbolic operations. If needed, symbolic
expressions can be used in numerical operations.
Many applications in mathematics, science, and engineering require
symbolic operations, which are mathematical operations with expressions
that contain symbolic variables. Symbolic variables are variables that don’t
have specific numerical values when the operation is executed. The result
of such operations is also mathematical expression in terms of the sym-
bolic variables. Symbolic operations can be performed by MATLAB when
the Symbolic Math Toolbox is installed. The Symbolic Math Toolbox is
included in the student version of the software and can be added to the
standard program. The Symbolic Math Toolbox is a collection of MAT-
LAB functions that are used for execution of symbolic operations. The
commands and functions for the symbolic operations have the same style
and syntax as those for the numerical operations.
Symbolic computations are performed by computer programs such as
Derive®, Maple®, and Mathematica®. MATLAB also supports sym-
bolic computation through the Symbolic Math Toolbox, which uses the
symbolic routines of Maple. To check if the Symbolic Math Toolbox is
installed on a computer, one can type:

>> ver

In response, MATLAB displays information about the version that is used


as well as a list of the toolboxes that are installed.
Using the MATLAB Symbolic Math Toolbox, we can carry out al-
gebraic or symbolic calculations such as factoring polynomials or solving
algebraic equations. For example, to add the three numbers 3/4, 1/4, and 5/4 sym-
bolically, we do the following:

>> sym(’3/4’) + sym(’1/4’) + sym(’5/4’)


ans =
9/4

Symbolic computations can be performed without the approximations that
are necessary for numerical calculations. For example, to evaluate √5 · √5 − 5
symbolically, we type:

>> sym(sqrt(5)) ∗ sym(sqrt(5)) − 5


ans =
0

But when we do the same calculation numerically, we have:

>> sqrt(5) ∗ sqrt(5) − 5


ans =
8.8818e − 016

In general, numerical results are obtained much more quickly with numeri-
cal computation than with numerical evaluation of a symbolic calculation.
To perform symbolic computations, we must use syms to declare the vari-
ables we plan to use to be symbolic variables. For example, the quadratic
formula can be defined in terms of a symbolic expression by the following
kind of commands:

>> syms x a b c
>> sym(sqrt(a ∗ xˆ 2+b ∗ x + c))
ans =
(a ∗ xˆ 2+b ∗ x + c) ˆ (1/2)

A symbolic object that is created can also be a symbolic expression written


in terms of variables that have not been first created as symbolic objects.
For example, the above quadratic formula can be created as a symbolic
object by using the following sym command:

>> sym(0 sqrt(a ∗ xˆ 2+b ∗ x + c)0 )


ans =
(a ∗ xˆ 2+b ∗ x + c) ˆ (1/2)
The double (x) command can be used to convert a symbolic expression
(object) x, which is written in an exact numerical form. (The name double
comes from the fact that the command returns a double-precision floating-
point number representing the value of x.) For example:

>> x = sym(12) ∗ sin(7 ∗ pi/4)


x=
−6 ∗ 2ˆ (1/2)
x1 = double(x)
x1 =
−8.4853
Symbolic expressions that already exist can be used to create new sym-
bolic expressions, and this can be done by using the name of the existing
expression in the new expression. For example:

>> syms x y z
>> w = xˆ 2+yˆ 2
w=
xˆ 2+yˆ 2
>> u = xˆ 2∗yˆ 2
u=
xˆ 2∗yˆ 2
>> v = sqrt(w + u)
v=
(xˆ 2+yˆ 2+x ˆ 2∗yˆ 2)ˆ (1/2)

C.6.1 Some Important Symbolic Commands


Symbolic expressions are either created by the user or by MATLAB as the
result of symbolic operations. the expressions created by MATLAB might
not be in the simplest form or in a form that the user prefers. The form of

an existing symbolic expression can be changed by collecting terms with


the same power, by expanding products, by factoring out common multi-
pliers, by using mathematical and trigonometric identities, and by many
other operations. Now we define several commands that can be used to
change the form of an existing symbolic expression.

The Collect Command

This command collects the terms in the expression that have the variable
with the same power. In the new expression, the terms will be ordered in
decreasing order of power. The form of this command is:

>> collect(f )

or

>> collect(f, varn ame)

For example, if f = (2x2 +y 2 )(x+y 2 +3), then use the following commands:

>> syms x y
>> f = (2 ∗ xˆ 2+yˆ 2) ∗ (x + yˆ 2+3)
>> collect(f )
ans =
2 ∗ xˆ 3+(2 ∗ yˆ 2+6) ∗ xˆ 2+yˆ 2∗x + yˆ 2∗(yˆ 2+3)
But if we take y as a symbolic variable, then we do the following:

>> syms x y
>> f = (2 ∗ xˆ 2+yˆ 2) ∗ (x + yˆ 2+3)
>> collect(f, y)
ans =
yˆ 4+(2 ∗ xˆ 2+x + 3) ∗ yˆ 2+2 ∗ xˆ 2∗(x + 3)
The Factor Command

This command changes an expression that is a polynomial to be a product



of polynomials of lower degree. The form of this command is:

>> f actor(f )

For example, if f = x3 − 3x2 − 4x + 12, then use the following commands:

>> syms x
>> f = xˆ 3−3 ∗ xˆ 2−4 ∗ x + 12
>> f actor(f )
ans =
(x − 2) ∗ (x − 3) ∗ (x + 2)
The Expand Command

This command multiplies the expressions. The form of this command is:

>> expand(f )

For example, if f = (x^3 − 3x^2 − 4x + 12)(x − 3)^3, then use the following
commands:

>> syms x
>> f = (xˆ 3−3 ∗ xˆ 2−4 ∗ x + 12) ∗ (x − 3)ˆ 3
>> expand(f )
ans =
xˆ 6−12 ∗ xˆ 5+50 ∗ xˆ 4−60 ∗ xˆ 3−135 ∗ xˆ 2+432 ∗ x − 324
The Simplify Command

This command is used to generate a simpler form of the expression. The


form of this command is:

>> simplif y(f )

For example, if f = (x^3 − 3x^2 − 4x + 12)/(x − 3)^3, then use the following
commands:

>> syms x
>> f = (xˆ 3−3 ∗ xˆ 2−4 ∗ x + 12)/(x − 3)ˆ 3
>> simplif y(f )
ans =
(xˆ 2−4)/(x − 3)ˆ 2
The Simple Command

This command finds a form of the expression with the fewest number of
characters. The form of this command is:

>> simple(f )

For example, if f = (cos x cos y + sin x sin y), then use the simplify com-
mand, and we get:

>> syms x y
>> f = (cos(x) ∗ cos(y) + sin(x) ∗ sin(y))
>> simplif y(f )
ans =
cos(x) ∗ cos(y) + sin(x) ∗ sin(y)
But if we use the simple command, we get:

>> syms x y
>> f = (cos(x) ∗ cos(y) + sin(x) ∗ sin(y))
>> simple(f )
ans =
cos(x − y)
The Pretty Command

This command displays a symbolic expression in a format in which expres-


sions are generally typed. The form of this command is:

>> pretty(f )


For example, if f = sqrt(x^3 − 3x^2 − 4x + 12), then use the following com-
mands:

>> syms x
>> f = sqrt(xˆ 3−3 ∗ xˆ 2−4 ∗ x + 12)
>> pretty(f )
ans =
(x^3 − 3x^2 − 4x + 12)^(1/2)

The findsym Command

To determine what symbolic variables are used in an expression, we use


the findsym command. For example, the symbolic expressions f 1 and f 2
are defined by:

>> syms a b c x y z
>> f 1 = a ∗ xˆ 2+b ∗ x + c
>> f 2 = x ∗ y ∗ z
>> f indsym(f 1)
ans =
a, b, c
>> f indsym(f 2)
ans =
x, y, z

The Subs Command

We can substitute a numerical value for a symbolic variable using the


subs command. For example, to substitute the value x = 2 in f =
x³y + 12xy + 12, we use the following commands:

>> syms x y
>> f = xˆ 3∗y + 12 ∗ x ∗ y + 12
>> subs(f, 2)
ans =
32 ∗ y + 12

Note that if we do not specify a variable to substitute for, MATLAB chooses


a default variable according to the following rule. For one-letter variables,
MATLAB chooses the letter closest to x in the alphabet. If there are two
letters equally close to x, MATLAB chooses the one that comes later in
the alphabet. In the preceding function, subs(f,2) returns the same answer
as subs(f,x,2). One can also use the findsym command to determine the
default variable. For example:

>> syms u v
>> f = u ∗ v
>> f indsym(f, 1)
ans =
v
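As a minimal illustration (reusing the expression f = u*v above; the displayed form may
vary slightly between MATLAB versions), calling subs with only a value substitutes that
value for this default variable v:

>> syms u v
>> f = u*v
>> subs(f, 5)
ans =
5*u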

C.6.2 Solving Equations Symbolically

We can find the solutions of certain equations symbolically by using the


MATLAB command solve. For example, to solve the nonlinear equation
x³ − 2x − 1 = 0, we define the symbolic variable x and the expression
f = x³ − 2x − 1 with the following commands:

>> syms x
>> f = xˆ 3−2 ∗ x − 1
>> solve(f )
ans =
[−1]
[1/2 ∗ 5 ˆ (1/2) + 1/2]
[1/2 − 1/2 ∗ 5ˆ (1/2)]

Note that the equation to be solved may also be specified as a string, i.e., sur-
rounded by single quotes. The answer consists of the exact (symbolic) so-
lutions −1 and 1/2 ± √5/2. To get the numerical solutions, type double(ans):

>> double(ans)
ans =
−1.0000
1.6180
−0.6180

or type vpa(ans):
>> vpa(ans)
ans =
[−1.]
[1.6180339887498949025257388711907]
[−.61803398874989490252573887119070]
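The vpa command also accepts a second argument giving the number of significant
digits to use; as an optional variation (not required by the example above), the following
displays the same roots to six significant digits:

>> vpa(ans, 6)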

The command solve can also be used to solve polynomial equations of


higher degrees, as well as many other types of equations. It can also solve
equations involving more than one variable. For example, to solve the two
equations 3x + 3y = 2 and x + 2y² = 1, we do the following:

>> syms x y
>> [x, y] = solve(0 3 ∗ x + 3 ∗ y = 20 ,0 x + 2 ∗ yˆ 2= 10 )
x=
[5/12 − 1/12 ∗ 33ˆ (1/2)]
[5/12 + 1/12 ∗ 33ˆ (1/2)]

y=
[1/4 + 1/12 ∗ 33ˆ (1/2)]
[1/4 − 1/12 ∗ 33ˆ (1/2)]

Note that both solutions can be extracted with x(1), y(1), x(2), and y(2).
For example, type:

>> x(1)
ans =
[5/12 − 1/12 ∗ 33ˆ (1/2)]

and

>> y(1)
ans =
[1/4 + 1/12 ∗ 33ˆ (1/2)]

If we want to solve x + xy² + 3xy = 3 for y in terms of x, then we have to


specify the equation as well as the variable y as a string:

>> syms x y
>> solve(0 x + x ∗ yˆ 2+3 ∗ x ∗ y = 30 ,0 y 0 )
ans =
[1/2/x ∗ (−3 ∗ x + (5 ∗ x2 + 12 ∗ x)ˆ (1/2))]
[1/2/x ∗ (−3 ∗ x − (5 ∗ x2 + 12 ∗ x)ˆ (1/2))]

C.6.3 Calculus

The Symbolic Math Toolbox provides functions to do the basic operations


of calculus. Here, we describe these functions.

Symbolic Differentiation

This can be performed by using the diff command as follows:

>> dif f (f )

or

>> dif f (f, var)

where the command diff(f, var) is used for differentiation of expressions


with several symbolic variables. For example, to find the first derivative of
f = x³ + 3x² + 20x − 12, we use the following commands:

>> syms x
>> f = xˆ 3+3 ∗ xˆ 2+20 ∗ x − 12
>> dif f (f )
ans =
3 ∗ xˆ 2+6 ∗ x + 20
Note that if f = x³ + x ln y + ye^(x²) is taken, then MATLAB differentiates
f with respect to x (the default symbolic variable) as:

>> syms x y
>> f = xˆ 3+x ∗ log(y) + y ∗ exp(xˆ 2)
>> dif f (f )
ans =
3 ∗ xˆ 2+log(y) + 2 ∗ y ∗ x ∗ exp(xˆ 2)
If we want to differentiate f = x³ + x ln y + ye^(x²) with respect to y, then we
use the MATLAB diff(f, y) command as:

>> syms x y
>> f = xˆ 3+x ∗ log(y) + y ∗ exp(xˆ 2)
>> dif f (f, y)
ans =
x/y + exp(xˆ 2)

The numerical value of a symbolic expression can be found by using the MAT-
LAB subs command. For example, to find the derivative of f = x³ + 3x² +
20x − 12 at x = 2, we do the following:

>> syms x
>> f = xˆ 3+3 ∗ xˆ 2+20 ∗ x − 12
>> df = dif f (f )
>> subs(df, x, 2)
ans =
44

We can also find the second and higher derivative of expressions by using
the following command:

>> dif f (f, n)

or

>> dif f (f, var, n)

where n is a positive integer: n = 2 and n = 3 mean the second and


third derivative, respectively. For example, to find the second derivative of
f = x³ + x ln y + ye^(x²) with respect to y, we use the MATLAB diff(f, y, 2)
command as:

>> syms x y
>> f = xˆ 3+x ∗ log(y) + y ∗ exp(xˆ 2)
>> dif f (f, y, 2)
ans =
−x/yˆ 2
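The same syntax works for a single-variable expression. As a brief sketch (reusing
f = x³ + 3x² + 20x − 12 from above), the second derivative is obtained by:

>> syms x
>> f = x^3 + 3*x^2 + 20*x - 12
>> diff(f, 2)
ans =
6*x + 6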

Symbolic Integration

Integration can be performed symbolically by using the int command. This


command can be used to determine indefinite and definite inte-
grals of an expression f. For indefinite integration, we use:

>> int(f )

or

>> int(f, var)

If in using the int(f) command the expression contains one symbolic vari-
able, then integration takes place with respect to that variable. But if the
expression contains more than one variable, then the integration is per-
formed with respect to the default symbolic variable. For example, to find
the indefinite integral (antiderivative) of f = x³ + x ln y + ye^(x²) with respect
to y, we use the MATLAB int(f, y) command as:

>> syms x y
>> f = xˆ 3+x ∗ log(y) + y ∗ exp(xˆ 2)
>> int(f, y)
ans =
xˆ 3∗y + x ∗ y ∗ log(y) − x ∗ y + 1/2 ∗ yˆ 2∗exp(xˆ 2)
Similarly, for the case of a definite integral, we use the following command:

>> int(f, a, b)

or

>> int(f, var, a, b)

where a and b are the limits of integration. Note that the limits a and
b may be numbers or symbolic variables. For example, to determine the
value of ∫₀¹ (x² + 3e^x + x ln y) dx, we use the following commands:

>> syms x y
>> f = xˆ 2+3 ∗ exp(x) + x ∗ log(y)
>> int(f, 0, 1)
ans =
−8/3 + 3 ∗ exp(1) + 1/2 ∗ log(y)
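Definite integrals can also be nested to evaluate an iterated (double) integral. A small
sketch, assuming the same f as above and integrating over x from 0 to 1 and then over
y from 1 to 2, is:

>> int(int(f, x, 0, 1), y, 1, 2)

which evaluates to 3e + ln 2 − 19/6 (the arrangement of the symbolic answer printed by
MATLAB may differ).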
We can also use symbolic integration to evaluate the integral when f has
some parameters. For example, to evaluate ∫ e^(−ax²) dx over (−∞, ∞), we do the
following:

>> syms a positive


>> syms x
>> f = exp(−a ∗ xˆ 2)
>> int(f, x, −inf, inf )
ans =
1/aˆ (1/2) ∗ piˆ (1/2)
Note that if we don’t assign a value to a, then MATLAB assumes that a

represents a complex number and therefore gives a complex answer. If a


is any real number, then we do the following:

>> syms a real


>> syms x
>> f = exp(−a ∗ xˆ 2)
>> int(f, x, −inf, inf )
ans =
PIECEWISE([1/aˆ (1/2) ∗ piˆ (1/2), signum(a) = 1], [Inf, otherwise])
Symbolic Limits

The Symbolic Math Toolbox provides the limit command, which allows us
to obtain the limits of functions directly. For example, to use the definition
of the derivative of the function
f (x + h) − f (x)
f 0 (x) = lim , provided limits exist,
h→0 h
and for finding the derivative of the function f (x) = x2 , we use the fol-
lowing commands:

>> syms h x
>> f = (x + h)ˆ 2−xˆ 2
>> limit(f /h, h, 0)
ans =
2∗x
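Limits need not come from a difference quotient; any symbolic limit can be computed
directly. For instance, the standard limit of sin(x)/x as x → 0 is obtained by:

>> syms x
>> limit(sin(x)/x, x, 0)
ans =
1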
We can also find one-sided limits with the Symbolic Math Toolbox. To
find the limit as x approaches a from the left, we use the commands:

>> syms a real


>> syms x
>> limit(f, x, a,0 lef t0 )
and to find the limit as x approaches a from the right, we use the com-
mands:

>> syms a real


>> syms x
>> limit(f, x, a,0 right0 )
For example, to find the limit of |x − 3|/(x − 3) when x approaches 3, we need to
calculate

    lim_{x→3−} |x − 3|/(x − 3)   and   lim_{x→3+} |x − 3|/(x − 3).

Now to calculate the left-side limit, we do as follows:

>> syms a real


>> syms x
>> a = 3
>> f = abs(x − 3)/(x − 3)
>> limit(f, x, a,0 lef t0 )
ans =
−1

and to calculate the right-side limit, we use the commands:

>> syms a real


>> syms x
>> a = 3
>> f = abs(x − 3)/(x − 3)
>> limit(f, x, a,0 right0 )
ans =
1

Since the limit from the left does not equal the limit from the right, the
limit does not exist. It can be checked by using the following commands:

>> syms a real


>> syms x
>> a = 3
>> f = abs(x − 3)/(x − 3)
>> limit(f, x, a)
ans =
N aN

Taylor Polynomial of a Function

The Symbolic Math Toolbox provides the taylor command, which allows
us to obtain the analytical expression of the Taylor polynomial of a given
function. In particular, once the function f has been defined, the command
taylor(f, x, n+1) returns the associated Taylor polyno-
mial of degree n expanded about x₀ = 0. For example, to find the Taylor
polynomial of degree three for f(x) = e^x sin x expanded about x₀ = 0, we
use the following commands:

>> syms x
>> f = exp(x) ∗ sin(x)
>> taylor(f, x, 4)
ans =
x + xˆ 2+1/3 ∗ xˆ 3
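The related symsum command (listed in Section C.7) evaluates finite and infinite sums
in closed form. As a brief sketch, summing k² for k = 1, . . . , n:

>> syms k n
>> symsum(k^2, k, 1, n)   % returns the closed form n*(n+1)*(2*n+1)/6 (arrangement varies by version)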

C.6.4 Symbolic Ordinary Differential Equations


Like differentiation and integration, an ordinary differential equation can
be solved symbolically by using the dsolve command. This command can
be used to solve a single equation or a system of differential equations.
It can also be used to obtain either a general solution or a partic-
ular solution of an ordinary differential equation. For first-order ordinary
differential equations, we use:

>> dsolve(0 eq 0 )

or

>> dsolve(0 eq 0 ,0 var0 )

For example, in finding the general solution of the ordinary differential


equation

    dy/dt = t + 3y/t,
we use the following commands:

>> syms t y
>> f = t + 3 ∗ y/t
>> dsolve('Dy = t + 3*y/t')
ans =
−tˆ 2+tˆ 3∗C1
For finding a particular solution of a first-order ordinary differential equa-
tion, we use the following command:

>> dsolve(0 eq 0 ,0 cond10 )

For example, in finding the particular solution of the ordinary differential


equation

    dy/dt = t + 3y/t,
with the initial condition y(1) = 4, we do the following:

>> syms t y
>> f = t + 3 ∗ y/t
>> dsolve('Dy = t + 3*y/t', 'y(1) = 4', 't')
ans =
−tˆ 2+5 ∗ tˆ 3
Similarly, the higher-order ordinary differential equation can be solved sym-
bolically using the following command:

>> dsolve(0 eq 0 ,0 cond10 ,0 cond20 , · · · ,0 var0 )

For example, the second-order ordinary differential equation


    d²y/dx² − 4 dy/dx − 5y = 0,    y(1) = 0,   y′(1) = 2
can be solved by using the following commands:

>> syms x y
>> dsolve(0 D2y − 4 ∗ Dy − 5 ∗ y = 00 ,0 y(1) = 00 ,0 Dy(1) = 20 ,0 x0 )
ans =
1/3/exp(1)ˆ 5∗exp(5 ∗ x) − 1/3 ∗ exp(1) ∗ exp(−x)

C.6.5 Linear Algebra


Consider the matrix and vector

    A = [ 3  2 ;  x  y ]   and   b = [ 1 ;  x ].

Since the matrix A contains symbolic expressions, we can calculate the determi-
nant and the inverse of A, and also solve the linear system Ax = b using the vector
b:

>> syms x y
>> A = [3 2; x y]
>> det(A)
ans =
3∗y−2∗x
>> inv(A)
ans =
[−y/(−3 ∗ y + 2 ∗ x), 2/(−3 ∗ y + 2 ∗ x)]
[x/(−3 ∗ y + 2 ∗ x), −3/(−3 ∗ y + 2 ∗ x)]
>> b = [1; x]
>> A\b
ans =
[(2 ∗ x − y)/(−3 ∗ y + 2 ∗ x)]
[−2 ∗ x/(−3 ∗ y + 2 ∗ x)]
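Numerical values can then be substituted into any of these symbolic results with the
subs command. A small sketch with the hypothetical values x = 1 and y = 2 (the exact
display format may differ):

>> subs(inv(A), {x, y}, {1, 2})
ans =
[  1/2, -1/2]
[ -1/4,  3/4]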

C.6.6 Eigenvalues and Eigenvectors

To find a characteristic equation of the matrix

 
3 −1 0
A =  −1 2 −1  ,
0 −1 3

we use the following commands:

>> A = [3 − 1 0; −1 2 − 1; 0 − 1 3]
>> poly(sym(A))
ans =
xˆ 3−8 ∗ xˆ 2+19 ∗ x − 12

>> f actor(ans)
ans =
(x − 1) ∗ (x − 3) ∗ (x − 4)

We can also get the eigenvalues and eigenvectors of a square matrix A sym-
bolically by using the eig(sym(A)) command. The form of this command
is as follows:

>> [X, D] = eig(sym(A))

For example, to find the eigenvalues and eigenvectors of the matrix A, we


use the following commands:

>> A = [3 − 1 0; −1 2 − 1; 0 − 1 3]
>> [X, D] = eig(sym(A))
X=
[1, −1, 1]
[−1, 0, 2]
[1, 1, 1]

D=
[4, 0, 0]
[0, 3, 0]
[0, 0, 1]

where the eigenvector in the first column of the matrix X corresponds to the
eigenvalue in the first column of D, and so on.
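If numerical eigenvalues are preferred, the symbolic result can be converted with the
double command; for example (output spacing is approximate):

>> double(D)
ans =
     4     0     0
     0     3     0
     0     0     1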

C.6.7 Plotting Symbolic Expressions


We can easily plot a symbolic expression by using the ezplot command.
To plot a symbolic expression Z that contains one or two variables, the
ezplot command is:

>> ezplot(Z)

or

>> ezplot(Z, [min, max])

or

>> ezplot(Z, [xmin, xmax, ymin, ymax])

For example, we can plot a graph of the symbolic expression Z = (2x² + 2)/(x² − 6)
using the following commands:

>> syms x
>> Z = (2 ∗ xˆ 2+2)/(xˆ 2−6)
>> ezplot(Z)
and we obtain Figure C.7.

Figure C.7: Graph of Z = (2x² + 2)/(x² − 6).

Note that ezplot can also be used to plot a function that is given in a
parametric form. For example, when x = cos 2t and y = sin 4t, we use the
following commands:

>> syms t
>> x = cos(2 ∗ t)
>> y = sin(4 ∗ t)
>> ezplot(x, y)
and we obtain Figure C.8.
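The parameter interval can also be given explicitly as a third argument; for example, a
brief sketch restricting t to [0, π] is:

>> ezplot(x, y, [0, pi])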

Figure C.8: Graph of function in a parametric form.

C.7 Symbolic Math Toolbox Functions

Listed below are some of the Symbolic Math Toolbox functions:

Symbolic Math Toolbox Function        Definition

diff differentiate
int integration
limit limit of an expression
symsum summation of series
taylor Taylor’s series expansion
det determinant
diag create or extract diagonals
eig eigenvalues and eigenvectors
inv inverse of a matrix
expm exponential of a matrix
rref reduced row echelon form
svd singular value decomposition

Symbolic Math Toolbox Function        Definition

poly characteristic polynomial


rank rank of a matrix
tril lower triangle
triu upper triangle
collect collect common terms
expand expand polynomials and elementary functions
factor factor an expression
simplify simplification
simple search for shortest form
pretty pretty print of symbolic expression
findsym determine symbolic variables
subexpr rewrite in terms of subexpressions
numden numerator and denominator
compose functional composition
solve solution of algebraic equations
dsolve solution of differential equations
finverse functional inverse
sym create symbolic object
syms shortcut for creating multiple symbolic objects
real real part of a complex number
latex LATEX representation of a symbolic expression
fortran Fortran representation of a symbolic expression
imag imaginary part of a complex number
conj complex conjugate
rsums Riemann sums
taylortool Taylor series calculator
funtool function calculator
digits set variable precision accuracy
vpa variable precision arithmetic
double convert symbolic matrix to double
char convert sym object to string
poly2sym coefficient vector to symbolic polynomial
sym2poly symbolic polynomial to coefficient vector

Symbolic Math Toolbox Function        Definition

fix round toward zero


floor round toward minus infinity
ceil round toward plus infinity
int8 convert symbolic matrix to signed 8-bit integer
int16 convert symbolic matrix to signed 16-bit integer
int32 convert symbolic matrix to signed 32-bit integer
int64 convert symbolic matrix to signed 64-bit integer
uint8 convert symbolic matrix to unsigned 8-bit integer
uint16 convert symbolic matrix to unsigned 16-bit integer
dirac dirac delta function
zeta Riemann zeta function
cosint cosine integral
sinint sine integral
fourier Fourier transform
ifourier inverse Fourier transform
laplace Laplace transform
ilaplace inverse Laplace transform
ztrans z-transform
iztrans inverse z-transform
ezplot function plotter
ezplot3 3-D curve plotter
ezpolar polar coordinate plotter
ezcontour contour plotter
ezcontourf filled contour plotter
ezmesh mesh plotter
ezmeshc combined mesh and contour plotter
ezsurf surface plotter
ezsurfc combined surface and contour plotter

C.8 Index of MATLAB Programs


In this section we list all the MATLAB functions supplied with this book.
These functions are contained in a CD included with this book. The CD-
ROM includes a MATLAB program for each of the methods presented.
Every program is illustrated with a sample problem or example that is
closely correlated to the text. The programs can be easily modified for
other problems by making minor changes. All the programs are designed
to run on a minimally configured computer. Minimal hard disk space plus
the MATLAB package are all that is needed. All the programs are given
as ASCII files called m-files with the .m extension. They can be altered
using any word processor that creates a standard ASCII file. The m-files
can be run from within MATLAB by entering the name without the .m
extension. For example, fixpt.m can be run using fixpt as the command.
The files should be placed in MATLAB\work subdirectory of MATLAB.

MATLAB Function        Definition        Chapter 1

INVMAT Inverse of a matrix program 1.1


CofA Minor and cofactor of a matrix program 1.2
CofExp Determinant by cofactor expansion program 1.3
Adjoint Adjoint of a matrix program 1.4
CRule Cramer’s rule program 1.5
WP Gauss elimination method program 1.6
PP G.E. with partial pivoting program 1.7
TP G.E. with total pivoting program 1.8
GaussJ Gauss–Jordan method program 1.9
lu-gauss LU decomposition method program 1.10
Dolittle Doolittle’s method program 1.11
Crout Crout’s method program 1.12
Cholesky Cholesky method program 1.13
TridLU Tridiagonal system program 1.14
RES Calculate residual vector program 1.15

MATLAB Function        Definition        Chapter 2

JacobiM Jacobi iterative method program 2.1


GaussSM Gauss–Seidel iterative method program 2.2
SORM SOR iterative method program 2.3
CONJG Conjugate gradient method program 2.4

MATLAB Function        Definition        Chapter 3

trac Trace of a matrix program 3.1


EigTwo Eigenvalues of a 2 × 2 matrix program 3.2
Chim Cayley–Hamilton theorem program 3.3
sourian Souriau–Frame theorem program 3.4
BOCH Bocher’s theorem program 3.5

MATLAB Function        Definition        Chapter 4

POWERM1 Power method program 4.1


INVERSEPM1 Inverse power method program 4.2
ShiftedIPM1 Shifted inverse power method program 4.3
DEFLATION Deflation method program 4.4
JOBM Jacobi method for eigenvalues program 4.5
SturmS Sturm sequence method program 4.6
Given Givens’ method program 4.7
HHHM Householder method program 4.8
QRM QR method program 4.9
hes Upper Hessenberg form program 4.10

MATLAB Function        Definition        Chapter 5

Lint Lagrange method program 5.1


DiviDiff Divided differences of a function program 5.2
NDiviD Newton’s divided differences formula program 5.3
Aitken1 Aitken’s method program 5.4
ChebP Chebyshev polynomial program 5.5
ChebYA Chebyshev polynomial approximation program 5.6
linefit Linear least squares fit program 5.7
polyfit Polynomial least squares fit program 5.8
ex1fit Nonlinear least squares fit program 5.9
ex2fit Nonlinear least squares fit program 5.10
planefit Least squares plane fit program 5.11
overd Overdetermined program 5.12
underd Underdetermined program 5.13

MATLAB Function        Definition        Chapter 7

Hessian Hessian Matrix program 7.1


bisect Bisection method program 7.2
fixpt Fixed-point method program 7.3
newton Newton’s method program 7.4
newton2 Newton’s method for a nonlinear system program 7.5
golden Golden-section search method program 7.6
Quadratic2 Quadratic interpolation method program 7.7
newtonO Newton’s method for optimization program 7.8

C.9 Summary
MATLAB has a wide range of capabilities. In this book, we used only a
small portion of its features. We found that MATLAB’s command struc-
ture is very close to the way one writes algebraic expressions and linear
algebra operations. The names of many MATLAB commands closely par-
allel those of the operations and concepts of linear algebra. We gave de-
scriptions of commands and features of MATLAB that related directly to
this course. A more detailed discussion of MATLAB commands can be
found in the MATLAB user guide that accompanies the software and in
the following books:

Experiments in Computational Matrix Algebra by David R. Hill (New York,


Random House, 1988).

Linear Algebra LABS with MATLAB, second edition by David R. Hill and
David E. Zitarelli (Prentice-Hall, Inc., 1996).

For a very complete introduction to MATLAB graphics, one can use the
following book:

Graphics and GUIs with MATLAB, 2nd ed., by P. Marchand (CRC Press,
1999).

There are many websites to help you learn MATLAB and you can locate
many of those by using a web search engine. Alternatively, MATLAB soft-
ware provides immediate on-screen descriptions using the Help command
or one can contact Mathworks at: www.mathworks.com

C.10 Problems
1. Solve each of the following expressions in the Command window:
(a)
(165)3/2 (765)2
(15 + 17)2 + + .
4 24
(b) p
3 ( (45)3 + 23)3 (101/21)2
(2.55 + ln(4)) + + .
17 15
(c) √
3/2 2
(165 + 2e ) ( 876 + 234)4
(e2 + 245)3 + + .
134 342
(d) p
( (788)5 + 120)2 (e4 + 333)4
(e4/5 + ln(7))2 + + .
111 254
2. Solve each of the following expressions in the Command Window:
(a)
4π 2π 2π 5π
(sin( ) + 0.7757)2 + cos2 ( ) + 2 cos( ) sin( ).
3 3 3 3
(b)
7π 12π π π
(sin2 ( ) + ln(2.5))3 + e3 cos( ) + cos( ) tan( ).
4 5 3 4
(c)
π π π π
(tan( ) + e0.5 )1/2 + (sin3 ( ) + 3 cos( ))/4 + sin3 ( ).
3 6 6 6
(d)
p
3/2 ( (22) + 12 cos(4π)) (e2 + ln(4.5) sin(0.75))2
2
(e + ln(3.5) + + .
24 12

3. Solve each of the following expressions in the Command Window:


(a)
3π 12π π
cos2 ( ) sin( ) + 2 tan( ).
4 5 4

(b)
5π 5π π 7π
sec( ) tan2 ( ) + 2 cos2 ( ) sin( ).
4 5 4 4
(c)
5π 3π π 9π
cot2 ( ) sin2 ( ) + 3[csc2 ( ) sec2 ( )]2 .
4 5 4 4
(d)
π π π π π
[tan( ) sin( )]2 + cos3 ( )[csc2 ( ) sec2 ( )]3 .
6 4 6 6 6
4. Define variable x and calculate each of the following in the Command
Window:
(a)
f (x) = x4 + 23x3 + 19x2 + 2x + 32; x = 2.5.
(b)

4 2 x3 + 23x2 + 1.2x ((x4 − 12)/13)2
f (x) = x +ln(x +2)+ + ; x = 4.5.
x 5
(c)
p
(x2 +1) (15x 3
+ 2ex/2 4
) ( sin(x) + x)2
f (x) = (e +2x3 +5x)4 + + , x12.5.
x 2
(d)
p
(x+3) 3 (x + 18)6 + sin(x + 1))3
3 (
f (x) = 5(e +ln(x −2)) + , x = 35.5.
33

5. Define variables x, y, z and solve each of the following in the Com-


mand Window:
(a)

w = x2 y 3 + 3x3 yz 4 + 9xy 3 z + 2xyz + 32y; x = 0.5, y = 2.7, z = 13.5.

(b)

w = ln(xy 3 )+2.5xyz+sin(xy)+2x3 y 6 z 8 ; x = 12.5, y = 22.5, z = 33.5.



(c)
p
w= cos(xy + z)+ln(x2 +y 3 +z 4 )+tan(xy); x = 5.5, y = 6.5, z = 8.5.

(d)
p
w = ln( x + y 3 )+cos(x3 y)+15x2 yz 4 ; x = 11.0, y = 12.0, z = 13.0.


6. Create a vector that has the following elements using the Command
Window:
(a)
π
19, 4, 31, e25 , 63, cos( ), ln(3).
6
(b)
11π
π, 44, 101, e2 , 116, sin( ), ln(2).
4
(c)

35, 40, 321, e7 , 406, cos3 ( ), 2 ln(7).
4
(d)

60, 4 ln(4), 17, 1 + e3 , 83 sin( ).
6

7. Plot the function f(x) = (x³ − 4x + 1)/(x³ − x + 2) for the domain −5 ≤ x ≤ 5.
8. Plot the function f (x) = 4x cos x − 3x and its derivative, both on the
same plot, for the domain −2π ≤ x ≤ 2π.

9. Plot the function f (x) = 4x4 − 3x3 + 2x2 − x + 1, and its first and
second derivatives, for the domain −3 ≤ x ≤ 5, all in the same plot.

10. Make two separate plots of the function

f (x) = x4 − 3 sin x + cos x + x,

one plot for −2 ≤ x ≤ 2 and the other for −3 ≤ x ≤ 3.

11. Use the fplot command to plot the function f(x) = 0.25x⁴ − 0.15x³ +
0.5x2 − 1.5x + 3.5, for the domain −3 ≤ x ≤ 3.

12. The position of a moving particle as a function of time is given by

x = (1 − 2 sin t) cos t
y = (1 − 2 sin t) sin t
z = 3t2 .

Plot the position of the particle for 3 ≤ t ≤ 10.

13. The position of a moving particle as a function of time is given by

x = 1 + sin t
y = 1 + cos t
z = 3t4 .

Plot the position of the particle for 0 ≤ t ≤ 10.

14. Make a 3-D surface plot and contour plot (both in the same figure)
of the function z = (x + 2)² + 3y² − xy in the domain −5 ≤ x ≤ 5
and −5 ≤ y ≤ 5.

15. Make a 3-D mesh plot and contour plot (both in the same figure) of
the function z = (x − 2)² + (y − 2)² + xy in the domain −5 ≤ x ≤ 5
and −5 ≤ y ≤ 5.

16. Define x as a symbolic variable and create the two symbolic expres-
sions:
P1 = x⁴ − 6x³ + 12x² − 9x + 3 and
P2 = (x + 2)⁴ + 5x³ + 17(x + 3)² + 12x − 20.
Use symbolic operations to determine the simplest form of the fol-
lowing expressions:

(i) P1 .P2 .
(ii) P1 + P2 .
P1
(iii) .
P2
(iv) Use the subs command to evaluate the numerical value of the
results for x = 15.
17. Define x as a symbolic variable and create the two symbolic expres-
sions:
P1 = ln(sin(x + 2) + x²) − 12√(x³ + 12) − e^(x+2) − 15x + 10 and
P2 = (x − 2)³ + 11x² + 9x − 15.
Use symbolic operations to determine the simplest form of the fol-
lowing expressions:
(i) P1 .P2 .
(ii) P1 + P2 .
P1
(iii) .
P2
(iv) Use the subs command to evaluate the numerical value of the
results for x = 9.
18. Define x as a symbolic variable.
(i) Show that the roots of the polynomial
f(x) = x⁵ − 12x⁴ + 15x³ + 200x² − 276x − 1008
are −3, −2, 4, 6, and 7 by using the factor command.
(ii) Derive the equation of the polynomial that has the roots
x = −5, x = 4, x = 2, and x = 1.
19. Define x as a symbolic variable.
(i) Show that the roots of the polynomial
f(x) = x⁵ − 30x⁴ + 355x³ − 2070x² + 5944x − 6720
are 4, 5, 6, 7, and 8 by using the factor command.
(ii) Derive the equation of the polynomial that has the roots
x = 3, x = 2, x = 1, and x = 0.
20. Find the fourth-degree Taylor polynomial for the function f (x) =
(x3 + 1)−1 , expanded about x0 = 0.

21. Find the fourth-degree Taylor polynomial for the function f (x) =
x + 2 ln(x + 2), expanded about x0 = 0.
22. Find the fourth-degree Taylor polynomial for the function f (x) =
(x + 1)ex + cos x, expanded about x0 = 0.
23. Find the general solution of the ordinary differential equation

y 0 = 2(y + 1).

Then find its particular solution by taking the initial condition y(0) =
1 and plot the solution for −2 ≤ x ≤ 2.
24. Find the general solution of the second-order ordinary differential
equation
y 00 + xy 0 − 3y = x2 .
Then find its particular solution by taking the initial conditions
y(0) = 3, y 0 (0) = −6 and plot the solution for −4 ≤ x ≤ 4.
25. Find the inverse and determinant of the matrix
 
3 −1 5
A= p 4 2q  .
q 3p 5

Use b = [1, 2, 3]T to solve the system Ax = b.


26. Find the inverse and determinant of the matrix
 
3 −1 2 −3
 2p 6 −q 3 
A=  3
.
q 3p 5 
4 −5 7 8

Use b = [3, −2, 4, 5]T to solve the system Ax = b.


27. Find the characteristic equation of the matrix
 
4 2 1
A=  2 8 0 .
1 0 8

Then find its roots by using the factor command. Also, find the
eigenvalues and the eigenvectors of A.

28. Find the characteristic equation of the matrix


 
1 0 1 0
 1 0 1 0 
A=  1
.
0 1 0 
0 1 1 0

Then find its roots by using the factor command. Also, find the
eigenvalues and the eigenvectors of A.

29. Determine the solution of the nonlinear equation x3 + 2x2 − 4x = 8


using the solve command and plot the graph of the equation for
−4 ≤ x ≤ 4.

30. Determine the solution of the nonlinear equation cos x + 3x2 = 20


using the solve command and plot the graph of the equation for
−2 ≤ x ≤ 2.
Appendix D

Answers to Selected Exercises

D.0.1 Chapter 1

 
1 −5 −1
1. C =  −2 1 −3  .
−2 −1 −8

1 −5 −1

3. |AB| = −2 1 −3 = 0.
−2 −1 −8

5. x = −3, y = 2.

7. (a) B, (c) E.

 
2 3 4
9. (a) Row E.F. =  0 1 2 , x = [4/3, 1/3, −2/3]T .
 0 0 −3 
3 0 1
(c) Row E.F. =  0 −1 0  , x = [−1/3, −3, 2]T .
0 0 1

11. (a) x = [5/4, 5/2, −3/4]T .


(c) x = [−2, 0, 2, 0]T .

13. det(A) = a21 c21 + a22 c22 + a23 c23 = 0 + 3(−33) + 5(37) = 86.
det(B) = a13 c13 + a23 c23 + a33 c33 = 4(−152) + 6(−107) + 12(−8) =
−1346.
det(C) = a12 c12 +a22 c22 +a32 c32 = −8(−52)+1(−45)+10(94) = 1311.


3 1 −1
15. det(A) = 0 −2/3 14/3 = (3)(−2/3)(−36) = 72.
0 0 −36


4 1 6
det(B) = 0 27/4 17/2 = (4)(27/4)(83/27) = 83.
0 0 83/27


17 46 7
det(C) = 0 −87/17 −4/17 = (17)(−87/17)(10) = −870.
0 0 10

17. det(A) = x3 −3x2 −12x+31, det(A) = 0, x = [−3.3485, 4.0787, 2.2698]T .

19. det(A) = 17x3 −7x2 −882x−2052, det(A) = 0, x = [8.3530, −5.1174, −2.8238]T .


   
4 −2 2/5 −1/5
21. (a) Adj(A) = , A−1 = .
3 1 3/10 1/10
   
−1 −1 1 1/2 1/2 −1/2
(c) Adj(A) =  −1 1 1  , A−1 =  1/2 −1/2 1/2  .
1 −1 −1 −1/2 1/2 1/2

23.
 
12 16 −27
Adj(A) =  −5 9 13  ,
−10 −11 19
 
−314/707 1/101 −451/707
(Adj(A))−1 =  5/101 6/101 3/101  ,
−145/707 4/101 −188/707
det(Adj(A)) = −707.
 
32 −10 26
Adj(B) =  20 26 −16  ,
−41 37 7
 
1/86 2/129 −1/129
(Adj(B))−1 =  1/129 5/258 2/129  ,
7/258 −1/86 2/129
det(Adj(B)) = 66564.
 
4 2 −16
Adj(C) =  −1 −11 4 ,
−11 5 2
 
−1/42 −1/21 −2/21
(Adj(C))−1 =  −1/42 −2/21 0 ,
−1/14 −1/42 −1/42
det(Adj(C)) = 1764.
25.
 
−2 11 −3
Adj(A) =  −8 −10 15  , det(A) = 27
7 2 −3
 
−2/27 11/27 −1/9
A−1 =  −8/27 −10/27 5/9  .
7/27 2/27 −1/9

 
48 −8 34
Adj(B) =  41 18 −2  , det(B) = 298
−19 28 30
 
24/149 −4/149 17/149
B −1 =  41/298 9/149 −1/149  .
−19/298 14/149 15/149
 
74 108 44 −80
 −190 112 −128 −20 
Adj(C) =   , det(C) = −1112
 4 −96 −208 176 
90 66 −232 68
 
−37/556 27/278 −11/278 10/139
 95/556 −14/139 16/139 5/278 
C −1 = 
 −1/278
.
12/139 26/139 −22/139 
−45/556 −8/139 29/139 −17/278

27. (a) α = 1.

(c) α = 11.

 
0 1/7 1/7
29. (a) A−1 =  1/5 −3/35 4/35  ,
−2/5 −4/35 17/35
x = [1, 2, 3]T .
 
3/16 1/16 1/16
(c) A−1 =  −1/64 5/64 −11/64  ,
−17/192 7/64 5/192
x = [−1/2, 3/8, −5/24]T .
 
−157/5 203/5 2/5 36/5
 17 −22 0 −4 
31. (a) A−1 =  ,
 13 −17 0 −3 
81/5 −104/5 −1/5 −18/5

x = [1277/5, −137, −106, −641/5]T .


 
−34/2451 39/1634 13/258 139/4902
 −397/2451 143/1634 19/258 −35/4902 
(c) A−1 =
 898/2451 −165/1634 −55/258 −211/4902  ,

−751/4902 501/3268 −5/516 403/9804


x = [1557/751, 227/4902, −2273/4902, 2284/2485]T .

33. (a) det(A) = −50, det(A1) = −58, det(A2) = −44, det(A3) = −66,

x = [29/25, 22/25, 33/25]T .

(c) det(A) = 168, det(A1) = 108, det(A2) = −383, det(A3) =


−325,

x = [9/14, −383/168, −325/168]T .

35. (a) det(A) = 2, det(A1) = −7, det(A2) = 4, det(A3) = 3,

x = [−7/2, 2, 3/2]T .

(c) det(A) = 166, det(A1) = −105, det(A2) = −21, det(A3) = 101,

x = [−105/166, −21/166, 101/166]T .


 
3 4 5
37. (a) U =  0 −2 −4  , x = [7/6, −5/6, 1/6]T .
 0 0 3 
6 7 8
(c) U =  0 53/6 26/3  , x = [−2/9, −5/9, 7/9]T .
0 0 45/53
 
2 5 −4
39. (a) U =  0 −3 3 , x = [−4/3, 3, 7/3]T .
0 0 −5/2

 
1 2 0
(c) U =  0 2 −2  , x = [−1, 2, 3]T .
0 0 2

41. (a) α 6= 3, α 6= 5.

(c) α 6= 0, α 6= ±1.
43.
   
−1/7 3/7 −1/7 46/11 3/11 −2
A−1 =  −4/7 3/14 3/7  , B −1 =  −3/11 1/11 0 ,
6/7 −4/7 −1/7 −2 0 1
 
10/31 −2/31 −5/31
C −1 = 7/31 11/31 −19/31  .
−11/31 −4/31 21/31

45. rank(A) = 3, rank(B) = 3, rank(C) = 2.


 
4 −1 4
47. (a) U =  0 7/4 0 , x = [1, 3, 0]T .
0 0 −1
 
4 9 16
(c) U =  0 −23/4 −11 , x = [3/2, −5, 5/2]T .
0 0 18/23
   
1 1/2 1/3 1/4 1 1/2 1/3 1/4
 1/2 1/3 1/4 1/5   0 1/12 4/45 1/12 
49. A =  , U = ,
 1/3 1/4 1/5 1/6   0 0 −1/180 −1/120 
1/4 1/5 1/6 1/7 0 0 0 1/2800
x = [−64, 900, −2520, 1820]T .

51. (a) (x) = [3/2, −7/2, 3/2]T .


(b) (x) = [−14/9, 34/9, −13/9]T .
(c) (x) = [−7/6, 23/6, −17/6]T .
 
16 −120 240 −140
 −120 1200 −2700 1680 
53. A−1 =
 240 −2700
, x = [−64, 900, −2520, 1820]T .
6480 −4200 
−140 1680 −4200 2800

55. (a)
   
1 0 0 0 3 −2 1 1
 −1 1 0 0   0 5 5 −2 
L=
 2/3 −11/15
, U = ,
1 0   0 0 6 28/15 
7/3 1/3 −1/3 1 0 0 0 133/45

y = [3, 5, 8/3, −52/9]T .


x = [99/133, −111/133, 20/19, −260/133]T .

(c)
   
1 0 0 0 2 2 3 −2
 5 1 0 0   0 −8 −2 21 
L=
 1 −3/8
, U = ,
1 0   0 0 1/4 127/8 
1/2 5/8 −9 1 0 0 0 551/4

y = [10, −36, −25/2, −86]T .


x = [7967/551, 3003/551, −5706/551, −344/551]T .

(e)  
1 0 0 0
 12 1 0 0 
L= 22 −53/5
,
1 0 
8 −32/5 628/1119 1
 
1 −1 10 8
 0 −5 −109 −74 
U = ,
 0 0 −6812/5 −4807/5 
0 0 0 2541/232
y = [−2, 31, 1893/5, 727/105]T .
x = [758/1851, 391/1723, −501/692, 342/541]T .

 
2 3 −1
57. (a) U =  0 1/2 3/2  , det(A) = (2)(1/2)(1) = 1.
0 0 1
 
2 4 1
(c) U =  0 −3 1/2  , det(A) = (2)(−3)(5/6) = −5.
070 5/6
 
1 −1 0
(e) U =  0 1 1  , det(A) = (2)(−3)(5/6) = −5.
0 0 1
59. (a)
   
1 0 0 3 0 0 1 4 3
A = LDV =  2/3 1 0   0 1/3 0   0 1 1 .
1/3 5 1 0 0 1 0 0 1
   
1 0 0 4 0 0 1 −2 3
B = LDV =  5/4 1 0   0 9/2 0  0 1 −27/4  .
1 10/9 1 0 0 21/2 0 0 1
(c)
   
1 0 0 1 −5 4 1 −5 4
A = LDV =  2 1 0  0 13 −12   0 13 −12 .
3 17/13 1 0 0 126/13 0 0 126/13
   
1 0 0 4 7 −6 4 7 −6
B = LDV =  5/4 1 0  0 −15/4 5/2   0 −15/4 5/2  .
3/2 58/15 1 0 0 25/3 0 0 25/3

61. (a)
   
2 0 0 1 −1/2 1/2
L =  −3 5/2 0 , U = 0 1 1/5  ,
1 −1/2 3/5 0 0 1

y = [2, 22/5, 31/3]T .


x = [−2, 7/3, 31/3]T .

(c)    
2 0 0 1 1 1
L =  1 1 0 , U =  0 1 0 ,
3 0 1 0 0 1
y = [0, −4, 1]T .
x = [3, −4, 1]T .

(e)    
1 0 0 1 −1 0
L= 2 1 0 , U = 0 1 1 ,
270 −1 0 0 1
y = [2, 0, 1]T .
x = [1, −1, 1]T .

63. (a)    
1 0 0 1 −1 1
L =  −1 2 0  , LT =  0 2 0 ,
1 0 3 0 0 3
y = [2, 2, 0]T .
x = [3, 1, 0]T .

(c)
   
2 0 0 2 1 3/2
L= 1 4 0 , LT =  0 4 −1/8  ,
3/2 −1/8 253/153 0 0 253/153

y = [1/2, 3/8, 1042/401]T .


x = [−1, 1/7, 11/7]T .

65. (a)
   
1 0 0 3 −1 0
L =  −1/3 1 0 , U =  0 8/3 −1  ,
0 −3/8 1 0 0 21/8

y = [1, 4/3, 3/2]T .


x = [4/7, 5/7, 4/7]T .

(c)
   
1 0 0 0 4 −1 0 0
 −1/4 1 0 0   0 15/4 −1 0 
L= , U = ,
 0 −4/15 1 0   0 0 56/15 −1 
0 0 −15/56 1 0 0 0 209/56

y = [1, 5/4, 4/3, 19/14]T .


x = [4/11, 5/11, 5/11, 4/11]T .

67. (a) kxk1 = 12, kxk2 = 7.0711, kxk∞ = 6.


(b) kxk1 = 4.3818, kxk2 = 3.1623, kxk∞ = 3, (k = 1).

69. (a) kA3 k1 = 2996, kA3 k∞ = 23.

(c) kBCk1 = 427, kBCk∞ = 531.

71. (a) K(A) = 3.969, well-conditioned.

(c) K(A) = 42.2384, ill-conditioned.

73. K(A) = 100 (ill-conditioned), r = [−0.02, 0.02]T ,


relative error = 1.
6.06
75. K(A) ≥ = 25.25.
0.24
77. δx = [−100, 101]T , K(A) = 404.01, ill-conditioned.

79. δx = [1.9976, 0.6675]T , K(A) = 80002, ill-conditioned.

81. δx = [−17, 20]T , K(A) = 8004, ill-conditioned.



83. (a) y = 4 − 3x + x2 .
(c) y = 3 + 2x.
(e) y = 5 + x + 2x2 .

85. (a) l1 = 3, l2 = 1, l3 = 2.

87. x2 = 0.
10 12 10
89. x1 = , x2 = and x3 = .
7 7 7
91. (a) 4F eS2 + 11O2 −→ 2F e2 O3 + 8SO2 .
(c) 2C4 H10 + 13O2 −→ 8CO2 + 10H2 O.

55 25 5
93. x = , y= , z= .
8 4 8
95. 80, 000 yen, 900 francs, 1200 marks.

97. x = 100 (bacteria of strain I).


y = 350 (bacteria of strain II).
z = 350 (bacteria of strain III).

D.0.2 Chapter 2

1. (a) x(13) = [2, 4, 3]T , (c) x(12) = [−0.102, 0.539, 0.328]T .


(e) x(14) = [−0.673, 0.096, 0.711]T , (g) x(12) = [0.588, −0.147, 0.559]T .

3. (a) x(8) = [2, 4, 3]T , (c) x(6) = [−0.102, 0.539, 0.328]T ,


(e) x(8) = [−0.673, 0.096, 0.711]T (g) x(7) = [0.588, −0.147, 0.559]T .

5. (a) Divergent because ρ(A) = 4 > 1.

7. (a) x(8) = [2, 4, 3]T , (c)x(6) = [−0.102, 0.539, 0.328]T ,


(e)x(8) = [−0.673, 0.096, 0.711]T , (g) x(8) = [0.588, −0.147, 0.559]T .

9. Optimal choice ω = 1.172, x(9) = [0.375, 1.750, 2.375]T .


The Gauss–Seidel method needed 15 iterations and the Jacobi method
needed 26 iterations to get the same solution as the SOR method.

11. ρJ = 0.809017, ρGS = 0.654508, ρSOR = 0.25962.

13. (a) x(2) = [0.51814, −0.72359, −1.94301]T ,


r(2) = [1.28497, −0.80311, 0.64249]T .
(c) x(2) = [0.90654, 0.46729, −0.33645, −0.57009]T ,
r(2) = [−1.45794, −0.59813, −0.26168, −2.65421]T .

15. Simple Gaussian elimination x(1) = [−10, 1.01]T ,

Residual Corrector method x(2) = [10, 1]T .

D.0.3 Chapter 3

1. (a) p(λ) = λ3 − 7λ2 + 4λ + 12, λ1 = 6, λ2 = 2, λ3 = −1


     
1 7 1
x1 =  1  , x2 =  −1  , x3 =  −5/2  .
1 −5 1

(c) p(λ) = λ3 − 6λ2 + 11λ − 6, λ1 = 2, λ2 = 1, λ3 = 3,


     
1 0 1
x1 =  1  , x2 =  −1  , x3 =  1  .
−1 1 0

(e) p(λ) = λ3 − 2λ2 − λ + 2, λ1 = 2, λ2 = 1, λ3 = −1,


     
0 1 1
x1 =  1  , x2 =  0  , x3 =  −1  .
1 1 1

3. (a) k = 1, (c) k 6= 1/3.



5. (a) Diagonalizable, real distinct eigenvalues 2, −2, −1


(c) Not diagonalizable, real repeated eigenvalues −1, 2, 2
(e) Diagonalizable, real distinct eigenvalues −11, 7, −1

     
1 1 2 −5 1 0 0 1 1
7. (a)  1 2 3  , (c)  −2 −8 1/2  , (e)  1 −1 0 
2 4 1 1 −10 1 1 1 1

9. Similar matrices because have they have the same eigenvalues,


λ = 1, 1, 1.
   
2 0 0 3 0 0
11. (a)  0 1 0  , (c)  0 −4 0 .
0 0 −1 0 0 −1
   
1 √0 √0 −1 0 0
13. (a) Q =  0 1/√2 1/√2  , D =  0 1 0  .
0 1/ 2 −1/ 2 0 0 1
   
−2/3 1/3 2/3 0 0 0
(c) Q =  2/3 2/3 1/3  , D =  0 3 0  .
1/3 −2/3 2/3 0 0 6
   
2/3 −2/3 1/3 0 0 0
(e) Q =  1/3 2/3 2/3  , D =  0 9 0 
2/3 1/3 −2/3 0 0 9
   
2 −100
3 81 4 −524 105
15. (a) λ −7λ+22 = 0, A = , A = ,
−108 −19 −140 −419
   
−1 5/22 −3/22 −2 13/484 −21/484
A = , A = .
2/11 1/11 7/121 −2/121
   
−3 1 1 −6 −2 1
(c) λ3 −λ2 +3 = 0, A3 =  −3 −3 0  , A4 =  0 −3 −3 
−3 −2 −2 3 −5 −2
   
1/3 0 −1/3 0 −1/3 0
A−1 =  2/3 0 1/3  , A−2 =  1/3 1/3 −1/3  .
1/3 1 −1/3 2/3 −1/3 1/3

 
−5 −1/2 5/2
17. (a) λ3 − 2λ2 − λ + 2 = 0, A−1 =  −4 −1/2 3/2 
−12 −1 6

 
−3/4 −1/4 −1
(c) λ3 + 2λ2 − 11λ − 12 = 0, A−1 =  −1/2 −1/2 1 .
1/12 −1/12 1/3

19. (a) λ3 − 5λ2 − 19λ + 65 = 0, det(A) = −65,


   
−5 −20 5 1/3 4/13 −1/13
adj(A) =  −10 12 −3  , A−1 =  2/13 −12/65 3/65  .
0 39 −26 0 −3/5 2/5
(c) λ4 + 6λ3 − 3λ2 − 1 = 0, det(A) = −1,
   
0 −1 0 −2 0 1 0 2
 −1 1 2 −2   1 −1 −2 2 
adj(A) =  0 −1 −3 , A−1 =  .
3   0 1 3 −3 
2 −2 −3 3 −2 2 3 −3

21.

2 5 −2t
f1 (t) = − e4t + e
3 3
2 1 −2t
f1 (t) = − e4t − e .
3 3

23. (a)
1 3 4t
f1 (t) = − e2t + e
2 2

f2 (t) = e2t

1 3 4t
f3 (t) = − e2 + e .
2 2

(c)
f1 (t) = −7e−3t + 8et
f2 (t) = −3e−3t + 4et
f3 (t) = 5e−3t − 4et .

25. (a) an = (1/3)(2^(n+1) + 2^n), a20 = 1048576.
(c) an = (−1)n , a15 = −1.

D.0.4 Chapter 4

1. (a) λ(4) = 5.7320, X = [0.8255, 1.0000, 0.9225]T .

(c) λ(4) = 3.3103, X = [0.4583, 0.0729, 1.0000]T .

3. (a) λ(4) = 4.3731, X = [0.8897, −0.2962, 1.0000]T .

(c) λ(4) = 4.4143, X = [0.3333, 0.1381, 1.0000]T .

5. (a) The three real eigenvalues satisfies 0 ≤ λ ≤ 6.

(c) The three real eigenvalues satisfies −2 ≤ λ ≤ 4.


   
1 0 0 2 −3 6
7. Q =  0 1 0 , B= 0 3 −4 
0 0 1 0 2 −3
 
3 −4
C= .
2 −3
C has eigenvalues 1, −1 with eigenvectors [1, 1/2]T and [1, 1]T , re-
spectively. The eigenvectors of B are [0, 1, 1/2]T , [−1, 1, 1]T , which
are also the remaining eigenvectors of matrix A.

9. (a) λ = 13.6138, −0.1635, −5.4502,

 
0.7000 −0.6414 −0.3140
X =  0.4357 0.7319 −0.5240  .
0.5659 0.2300 0.79118

(c) λ = 11.5915, 6.6240, −7.2154,

 
0.7933 −0.1710 −0.5843
X =  0.4934 0.7429 0.4525  .
0.3567 −0.6472 0.6737

(e) λ = −1.6073, 4.9988, 14.5592, 4.0494,

 
0.6068 0.4539 0.4156 −0.5031
 −0.4738 0.5814 0.5323 0.3928 
X=
 −0.4513 −0.4775 0.5215 −0.5444  .

0.4513 −0.4775 0.5215 0.5444


11. (a) The upper Hessenberg form of the matrix is
 
11.0000 100.5000 45.0000
 12.0000 55.5000 23.0000  .
0 −14.7500 −3.5000
The 12th QR iteration is
 
69.5837 99.9605 35.8569
 0 −9.7509 −2.1207  ,
0 0 3.1673
and the approximation of the eigenvalues of matrix A are
λ = 69.5837, −9.7509, 3.1673.

(c) The upper Hessenberg form of the matrix is


 
14.0000 33.8000 1.9337 1.0000
 5.0000
 24.2000 5.1327 −2.0000  .
 0 −92.2400 −20.7564 11.4000 
0 0 −1.8283 7.5564
The 42th QR iteration is
 
26.3046 −30.0959 76.9405 19.6782
 0.0000
 −13.4367 53.4580 9.0694 
,
 0.0000 −3.0615 5.7097 −2.2534 
0.0000 −0.0000 −0.0000 6.4224
and the approximation of eigenvalues of matrix A are
λ = 26.3046, −13.4367, 5.7097, 6.4224.
 
3.0000 2.2361 0
13. (a) T =  2.2361 3.0000 0  , λ = 5.2361, 3.0000, 0.7639.
0 0 3.0000
 
1 −2 0
(c) T =  −2 1 −2  , λ = 3.8284, −1.8284, 1.0000.
0 −2 1
 
2.0000 −5.0000 0.0000
15. (a) T =  −5.0000 10.0800 0.4400  ,
0.0000 0.4400 −0.0800

λ = 12.4807, −0.4807, 0.0000.

 
4.0000 4.5826 0.0000 0.0000
 4.5826 5.0476 −3.3873 0.0000 
(c) T = 
 0.0000 −3.3873
,
7.1666 −1.1701 
0.0000 0.0000 −1.1701 5.7858

λ = 11.1117, 6.7240, 4.9673, −0.8030.



 
5.4713 −0.7373 2.1419
17. (a) A(15) =  −0.0097 −3.4713 1.9705  ,
0.0000 0.0000 1.0000

λ = 5.4713, −3.4713, 1.0000.

 
3.8284 −0.0001 0.0000
(c) A(15) =  −0.0001 −1.8284 −0.0002  ,
0.0000 −0.0002 1.0000
λ = 3.8284, −1.8284, 1.0000.
19. (a) λ = 4.3028, 0.70, 0, Iterations = 8.

(c) λ = 3, 2, 1, Iterations = 24.

21. (a) The upper Hessenberg form of the matrix is


 
11.0000 100.5000 45.0000
 12.0000 55.5000 23.0000  .
0 −14.7500 −3.5000
The 12 QR iteration is
 
69.5837 99.9605 35.8569
 0.0000 −9.7509 −2.1207  ,
−0.0000 −0.0000 3.1673
and the approximation of eigenvalues of matrix A are
λ = 69.5837, −9.7509, 3.1673.

(c) The upper Hessenberg form of the matrix is


 
14.0000 33.8000 1.9337 1.0000
 5.0000
 24.2000 5.1327 −2.0000  .
 0 −92.2400 −20.7564 11.4000 
0 0 −1.8283 7.5564

The 42 QR iteration is
 
26.3046 −30.0959 76.9405 19.6782
 0.0000 −13.4367 53.4580 9.0694 
 ,
 0.0000 −3.0615 5.7097 −2.2534 
0.0000 −0.0000 −0.0000 6.4224

and the approximation of the eigenvalues of matrix A are

λ = 26.3046, −13.4367, 5.7097, 6.4224.



23. (a) σ1 = 5, σ2 = 2, σ3 = 0.
√ √ √
(c) σ1 = 5, σ2 = 3, σ3 = 3.

25. Let A be an orthogonal matrix. Then AT = A−1 . Moreover, the


singular values of A are the eigenvalues of the matrix AT A = A−1 A =
I, since the only eigenvalue of an identity matrix is 1.

27. (a)    
T 0 1 3 0 −1 0
A = U DV = .
1 0 0 2 0 −1

(c)
  
0.5774 0 0.8165 1.7321 0  
0 1
A = U DV T =  0.5774 0.7071 −0.4082   0 1.4142 
1 0
−0.5774 0.7071 0.4082 0 0

29. (a) x = [x1 , x2 ]T = [0.25, −0.25]T .


(c) x = [x1 , x2 , x3 ]T = [−0.6154, 2.6923, 1.3846]T .

D.0.5 Chapter 5

1. p2 (x) = (−4x2 + 32x + 42)/210; f (1.5) ≈ p2 (1.5) = 0.3857;


Error = 0.0011.

3. A = −0.12, B = 0.84, C = 0.28.

5. α = 1/2.

7. p3 (x) = −14.269x3 + 8.726x2 + 1.935x + 1.057; f (0.3) ≈ 2.0375;


Error = 0.0075; Error Bound = 0.0124.

9. Divided difference table is:


2.1 1.6669 0 0 0 0 0
3.2 6.3513 4.2585 0 0 0 0
4.3 23.1166 15.2412 4.9921 0 0 0
5.4 81.5715 53.1409 17.2271 3.7076 0 0
6.5 281.4813 181.7362 58.4524 12.4925 1.9966 0
7.6 955.0494 612.3346 195.7266 41.5982 6.6149 0.8397

11. (a) Divided difference table is:

2.0000 0 0 0 0
2.2361 0.2361 0 0 0
2.4495 0.2134 −0.0113 0 0
2.6458 0.1963 −0.0086 0.0009 0
2.8284 0.1826 −0.0069 0.0006 −0.0001

(b) p3 (5.9) = 2.4290 andp4 (5.9) = 2.4290.


(c) E3 = 0.0005742 and E4 = 0.00000004.

13. All three divided differences can be expanded as

(x2 − x1 )f0 − (x2 − x0 )f1 + (x1 − x0 )f2


.
(x2 − x1 )(x2 − x0 )(x1 − x0 )

15. f [0, 1, 0] = 0.7183.

17. T4 (x) = 2xT3 (x) − T2 (x) = 8x4 − 8x2 + 1.

19. p3 (x) = 0.0491x3 − 0.1434x2 + 0.4991x + 0.6955;


Error Bound = 0.0313.

21. P0123 (1.5) = 2.0000 ≈ f (1.5).

23. P01234 (2.5) = 0.0673 ≈ f (2.5).

25. (a) a = 1, b = 3, y = x + 3, E(a, b) = 0.0.


(c) a = 5.4, b = −11, y = 5.4x − 11, E(a, b) = 1.2000.
(e) a = 6.24, b = 1.45, y = 6.24x + 1.45, E(a, b) = 4.0120.

27. (a) a = −1.782, b = 0.073, c = 1.818,


y = −1.782x2 + 0.073x + 1.818, E(a, b) = 0.146.
(c) a = 28.125, b = 0.3, c = −3.125,
y = 28.125x2 + 0.3x − 3.125, E(a, b) = 0.2.
(e) a = −0.6, b = −0.1, c = 2.5,
y = −0.6x2 − 0.1x + 2.5, E(a, b) = 1.6.

29. (a) a = 4.961, b = 0.037, y = 4.961e0.037x , E(a, b) = 0.452.


(c) a = 0.980, b = 0.530, y = 0.980e0.530x , E(a, b) = 21.185.
(e) a = 3.867, b = −0.508, y = 3.867e−0.508x , E(a, b) = 0.071.

31. (a) a = 1.333, b = 0.167, c = 1.333, Z = 1.333x + 0.167y + 1.333.


(c) a = 1.11, b = −0.44, c = 2.11, Z = 1.11x − 0.44y + 2.11.

33. (a) p(x) = 38.1975 + 16.3611 cos x − 1.2156 sin x.


(c) p(x) = 8.6690 − 3.3430 cos x + 1.7330 sin x.

35. (a) x̂ = [x1 , x2 ]T = [0.8460, −0.2051]T .


(c) x̂ = [x1 , x2 , x3 ]T = [2.3041, 0.0986, 1.1107]T .

37. (a) x̂ = [x1 , x2 , x3 ]T = [−0.1827, −0.1068, 0.8733]T .


(c) x̂ = [x1 , x2 , x3 , x4 ]T = [0.9316, −0.1110, −1.5383, 1.4928]T .

39. (a) x̂ = [x1 , x2 , x3 ]T = [1.4933, 0.5467, 0.4667]T .


(c) x̂ = [x1 , x2 , x3 , x4 ]T = [−0.0799, 2.3216, 1.0979, 0.0356]T .

41. (a) x̂ = [x1 , x2 , x3 ]T = [0.9554, −0.8005, 2.1858]T .


(c) x̂ = [x1 , x2 , x3 , x4 ]T = [0.3780, −0.0863, 0.8664, 0.1846]T .

43. (a)  
+ −1/75 −2/15 43/75
A = .
3/25 1/5 −4/25
(c)  
+ −13/159 −25/318 89/318
A = .
29/159 34/159 −32/159
 
+ 11/17 −5/17
45. (a) A = , x̂ = [19/17, 6/17]T .
−1/17 2/17
 
−38/29 −9/29 17/29
+
(c) A =  −5/29 −5/29 3/29  , x̂ = [−30/29, −7/29, 40/29]T .
41/29 12/29 −13/29

47. x̂ = [x1 , x2 ]T = [−4, 9/5]T .


49. x̂ = [x1 , x2 ]T = [5/3, −2]T .
51. x̂ = [x1 , x2 ]T = [1, 0]T .
53. x̂ = [x1 , x2 ]T = [1, −2]T .
55. x̂ = [x1 , x2 , x3 ]T = [−0.3334, 0.3333, 0.6666]T .
57. (a) x̂ = [x1 , x2 ]T = [0.4000, 0.2000]T .
(c) x̂ = [x1 , x2 ]T = [0.4000, 0.2000]T .
59. (a) x̂ = [x1 , x2 ]T = [0.9250, 0.0250]T .
(c) x̂ = [x1 , x2 ]T = [0.8833, −0.0250]T .

D.0.6 Chapter 6

1. maximize Z = 20x1 + 15x2


Subject to the constraints
x2 ≤ 8
2x1 − x2 ≤ 0
2x1 + x2 ≤ 12.5

x1 ≥ 0, x2 ≥ 0.

3. minimize Z = 20x1 + 20x2 + 31x3 + 11x4 + 12x5


subject to the constraints

x1 + x3 + x4 + 2x5 ≥ 21 (Vitamin A constraint)


x2 + 2x3 + x4 + x5 ≥ 12 (Vitamin B constraint)

x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, x4 ≥ 0, x5 ≥ 0.

5. maximize Z = 6(9x1 + 5x2 ) + 9(7x1 + 9x2 )


subject to the constraints

9x1 + 5x2 ≥ 500


7x1 + 9x2 ≥ 300
5x1 + 3x2 ≤ 1500
7x1 + 9x2 ≤ 1900
2x1 + 4x2 ≤ 1000

x1 ≥ 0, x2 ≥ 0.

7. 550 Containers from Company A


300 Containers from Company B
Maximum Shipping Charges = $2110

9. Company manufactures 400 of A


Company manufactures 300 of B
Maximum profit = $6400

11. 0.4 pounds of Ingredient A


2.4 pounds of Ingredient B
Minimum Cost = 24.8 cents

13. (a) Maximum = 24 at (6, 12),


(c) Maximum = 6 at (1, 2)

15. (a) Minimum = −350 at (100, 50),


(c) Minimum = −4 at (4, 0)

17. (a) Maximum = 14 when x1 = 4 and x2 = 1,


(c) Maximum = 1600 when x1 = 160 and x2 = 0

19. (a) No Maximum,


(c) Maximum = 22, 500 when x1 = 0, x2 = 100, and x3 = 50

21. (a) Minimum = 1 when x1 = x2 = 0, x3 = 1,


(c) Minimum = −24 when x1 = 4 and x2 = 6

23. (a) minimize V = 3y1 + 5y2


subject to the constraints

y1 + y2 ≥ −5
−y1 + 3y2 ≥ 2

y1 ≥ 0, y2 ≥ 0.
(c) maximize V = 2y1 + 5y2
subject to the constraints

6y1 + 3y2 ≤ 6
−3y1 + 4y2 ≤ 3
y1 + y2 ≤ 0

y1 ≥ 0, y2 ≥ 0.

D.0.7 Chapter 7

1. (a) 1.

2
(c) .
4
15
3. (a) .
16x9/4
1
(c) 6 − 3 cos x + x sin x − 2 .
x

∂ 2f
5. (a) = y 2 e−xy cos y.
∂ 2x
∂ 2f
(c) 2 = (x2 − 1)e−xy cos y + 2xe−xy sin y.
∂ x
7. (a) [5, 1, 2]T .
 T
3 12 27
(c) , , .
ln 36 ln 36 ln 36
 
  6 0 0
12 5
9. (a) H(1, −1) = , (c) H(1, 2, 2) =  0 48 5  .
5 12
0 5 24

11. (a) xT Ax = 2x2 +6xy+4y 2 , (c) xT Ax = 2x2 +4y 2 +5z 2 +8xy−4xz.


13. (a) Positive-definite.
(c) Positive-definite.
15. (a) C18 = 7.9365.
(c) C17 = 1.8187.
17. (a) x4 = 1.2419.
(c) x5 = 3.7578.
19. (a) [1.2599, −0.0000], iter. = 6.
(c) [−1.1756, 1.6180], iter. = 5.
21. (a) f (0.01) = 2.9998, iter. = 10.
(c) f (0.081) = 1.4856, iter. = 9.
23. (a) f (2.25) = −42.5469, iter. = 2.
(c) f (1.25) = −12.6863, iter. = 2.
25. (a) f (5.1200e − 006) = 5.0, iter. = 3.
(c) f (5.9729e − 012) = 1.0, iter. = 2.
27. (a) M ax : (1, 1) and (−1, −1), saddle (0, 0).
(c) M in : (−1, 0) M ax : (1, 4) saddle (1, 0), (−1, 4).
29. (a) (−0.0685, 0.2042).
(c) (−2.0, 1.5).

31. (a) f (1,√1) =√2. √


(c) f (2 3 3, 2 3 3, 3 3) = 12(3)2/3 .

33. (a) M in : f (2, √


2, 8) =√72, max √: f (−3, −3, 18) = 342.
(c) M in : f (1, √ 2, −√2) = 1 − 2√ 2,
max : f (1, − 2, 2) = 1 + 2 2.

35. (a) M in : f (0, −1,


√ 0) = 1,√ min : f (2, 1, 0) = 5.
(c) M in : f (2 + 3, 2 − 3, 1) = 15, max : f (2, 2, 4) = 24.

37. (a) x∗ = [61/18, 11/18]T , z = 311/36.


(a) x∗ = [1, 0]T , z = −3.

39. (a) f1 (x) = 2x2 , f2 (y) = y 2 and g1 (x) = x, g2 (y) = y.


(c) f1 (x) = x2 , f2 (y) = y, f3 (z) = −z 2 and
g1 (x) = x2 , g2 (y) = 2y 2 , g3 (z) = 3z
g4 (x) = x, g5 (y) = y, g6 (z) = z.

D.0.8 Appendix A

99
1. 10, 37, .
128
3. 8(10)3 + 3(10)2 + 8(10) + 3
2(10)2 + 8(10) + 5 + 6(10)−1 + 2(10)−2 + 5(10)−3
4(10)2 + (10) + 3 + (10)−1 + 4(10)−2 + (10)−3 + · · · .

5. (.00011)2 , 0.00625, 0.0025.

7. 1000101, 10111, 1001111110, 101.01000101 · · · .

9. (a) Chopping: e = 2.718 × 100 , Rounding: e = 2.718 × 100 .


(b) Chopping: e = 2.718281 × 100 , Rounding: e = 2.718282 × 100 .
(c) Chopping: e = 1.010110111 × 21 ,
Rounding: e = 1.010111000 × 21 .

11. Absolute error = 1/3 × 10−4 , Relative error = 10−4 .



13. (a) Absolute error = 7.346 × 10−6 , Relative error = 2.3383 × 10−6 .
(b) Absolute error = 7.346×10−4 , Relative error = 2.3383×10−4 .

15. 0.132 × 102 (m = 3, e = 2), −0.12532 × 102 (m = 5, e = 2),


0.16 × 10−1 (m = 2, e = −1).

D.0.9 Appendix B

1. (a) < 7, −1 >, < 1, −9 >, 7.0711, 9.0554.


(c) < 5, 2, 10 >, < 3, 8, 2 >, 11.3578, 8.7750.

3. (a) < −444/1955, 1332/2737, 399/473 >.


(c) < −70/367, 210/367, −4359/5465 >.

5. (a) −19 (c) − 5.

7. (a) 1.2882 radian.


(c) 0.2363 radian.

9. (a) α = −5.
(c) α = 1.

11. α ≈ 740 , β ≈ 580 , γ ≈ 37o .

13. 27 joules.

15. (a) 2.4254, 4.0825.


(c) −0.1741, −0.1622.

17. (a) < 2.2895, 3.8158, 1.5263 >.


(c) < 0.2857, −0.1429, 0.4286 >.

19. (a) < −10, 0, 5 >.


(c) < 18, −16, −13 >.

21. (a) < −291 − 255132 >.


(c) < −139, −66, 104 >.

23. (a) 59.2621.


(c) 113.9210.
25. (a) 38.1477.
(c) 15.4110.
27. (a) −39.
(c) −4771.
29. (a) 55.
(c) 200.
31. (a) < −15, 30, −15 >.
(c) < −372, 303, 6 >.
33. (a) x = 3 + 2t, y = 2 + 2t, z = −4 + 2t4.
(c) x = 1 + 2t, y = 2 + 4t, z = 3 + 6t.
35. (a) θ ≈ 0.86 radian.
(c) θ ≈ 1.38 radian.
37. (a) d = 3/7.
(c) d = 1/6.
39. (a) 2x + 3y + 4z − 19 = 0.
(c) x + 5y + 4z − 8 = 0.
41. (a) 19 − 3i.
(c) 55 − 72i.
43. (a) z = 5(cos( 43 ) + i sin( 43 )). √
√ √
(c) z = 14(cos( 35 ) + i sin( 35 )).
45. (a) 5 + 3i.
(c) e2πi .
47.
 
    3 − 2i 1 + 2i
3 − 2i −i 1 3i  18 − 3i −3 + 10i  .
, ,
4 + i 5i 2 − 3i −i
2 − 15i 7 + 4i

49. (a) [−1/61 − 11/61i, −13/61 − 21/61i]T .


(c) [−42/41 + 137/123i, −110/61 − 211/369i]T .
51. (a) 7 − 21i.
(c) 670 + 434i.
53. (a) −34 − 5i.
(c) −317 + 1225i.
55. (a)
   
4 0 2 −2 3 3
Re(A) =  3 2 3  , Im(A) =  −5 7 −7  .
0 0 1 2 3 −5
(c)
   
0 3 2 3 −4 −5
Re(A) =  2 4 11  , Im(A) =  −7 −8 −5  .
11 6 3 2 0 7
57. (a)
λ1 = 8.3972
x1 = [0.8480, 0.4714, 0.2421]T .

λ2 = 1.3014 + 0.6707i
x2 = [0.2163 + 0.1536i, −0.2378 − 0.3653i, −0.8600]T .

λ3 = 1.3014 − 0.6707i
x3 = [0.2163 − 0.1536i, −0.2378 + 0.3653i, −0.8600]T .
(c)
λ1 = 0.0000
x1 = [−0.8944, −0.4472, 0.0000]T .

λ2 = −38.0000 + 18.0000i
x2 = [0.4020 + 0.0783i, 0.7732, −0.3726 + 0.3090i]T .

λ3 = −38.0000 − 18.0000i
x3 = [0.4020 − 0.0783i, 0.7732, −0.3726 − 0.3090i]T .

D.0.10 Appendix C

1. (a) 2.9417e + 005.


(c) 3.0194e + 007.

3. (a) 2.4755.
(c) 24.9045.

5. (a) 8.0392e + 023.


(c) 6.2206e + 008.

7. 9.

11. 13.

17. (i) (log(sin(x + 2) + xˆ 2) − 12 ∗ (xˆ 3 + 12)ˆ (1/2) − exp(x + 2)−


15 ∗ x + 10) ∗ ((x − 2)ˆ 3 + 11 ∗ x2 + 9 ∗ x − 15).
(ii) log(sin(x + 2) + xˆ 2) − 12 ∗ (xˆ 3 + 12)ˆ (1/2) − exp(x + 2)−
6 ∗ x − 5 + (x − 2)ˆ 3 + 11 ∗ xˆ 2.

15.

(iii) (log(sin(x + 2) + xˆ 2) − 12 ∗ (xˆ 3 + 12)ˆ (1/2) − exp(x + 2)−


15 ∗ x + 10)/((x − 2)ˆ 3 + 11 ∗ xˆ 2 + 9 ∗ x − 15).
(iv) − 7.8418e + 007, −5.9021e + 004, −46.4011.

19. (i) (x − 5) ∗ (x − 6) ∗ (x − 7) ∗ (x − 8) ∗ (x − 4)
(ii) xˆ 4 − 6xˆ 3 + 11xˆ 2 − 6x

21. 3 ∗ x − xˆ 2 + 2/3 ∗ xˆ 3 − 1/2 ∗ xˆ 4.

23. y(x) = e2x+c − 1, y(x) = e2x − 1.

25. 60 − 18 ∗ q ∗ p + 5 ∗ p + 15 ∗ pˆ 2 − 2 ∗ qˆ 2 − 20 ∗ q,
[−2 ∗ (−10 + 3 ∗ q ∗ p)/(60 − 18 ∗ q ∗ p + 5 ∗ p + 15 ∗ pˆ 2 − 2 ∗ qˆ
2 − 20 ∗ q), 5 ∗ (1 + 3 ∗ p)/(60 − 18 ∗ q ∗ p + 5 ∗ p + 15 ∗ pˆ 2 − 2 ∗ qˆ
2 − 20 ∗ q), −2 ∗ (q + 10)/(60 − 18 ∗ q ∗ p + 5 ∗ p + 15 ∗ pˆ 2 − 2 ∗ qˆ
2 − 20 ∗ q)]
[−(5 ∗ p − 2 ∗ qˆ 2)/(60 − 18 ∗ q ∗ p + 5 ∗ p + 15 ∗ pˆ 2 − 2 ∗ qˆ
2 − 20 ∗ q), −5 ∗ (−3 + q)/(60 − 18 ∗ q ∗ p + 5 ∗ p + 15 ∗ pˆ 2 − 2 ∗ qˆ

2 − 20 ∗ q), (−6 ∗ q + 5 ∗ p)/(60 − 18 ∗ q ∗ p + 5 ∗ p + 15 ∗ pˆ 2 − 2 ∗ qˆ


2 − 20 ∗ q)]
[(3 ∗ pˆ 2 − 4 ∗ q)/(60 − 18 ∗ q ∗ p + 5 ∗ p + 15 ∗ pˆ 2 − 2 ∗ qˆ 2 −
20 ∗ q), −(9 ∗ p + q)/(60 − 18 ∗ q ∗ p + 5 ∗ p + 15 ∗ pˆ 2 − 2 ∗ qˆ
2 − 20 ∗ q), (12 + p)/(60 − 18 ∗ q ∗ p + 5 ∗ p + 15 ∗ pˆ 2 − 2 ∗ qˆ 2 − 20 ∗ q)]
[−6 ∗ (q ∗ p − 5 ∗ p + 5 + q)/(60 − 18 ∗ q ∗ p + 5 ∗ p + 15 ∗ pˆ 2 − 2 ∗ qˆ
2 − 20 ∗ q)]
[2 ∗ (5 ∗ p + 15 − 14 ∗ q + qˆ 2)/(60 − 18 ∗ q ∗ p + 5 ∗ p + 15 ∗ pˆ 2 − 2 ∗ qˆ
2 − 20 ∗ q)]
[3 ∗ (−5 ∗ p + 12 + pˆ 2 − 2 ∗ q)/(60 − 18 ∗ q ∗ p + 5 ∗ p + 15 ∗ pˆ 2 − 2 ∗ qˆ
2 − 20 ∗ q)].

27. xˆ 3−20 ∗ xˆ 2 + 123 ∗ x − 216


(x − 3) ∗ (x − 8) ∗ (x − 9)
X=
[0, −5, 1]
[1, 2, 2]
[−2, 1, 1]
D=
[8, 0, 0]
[0, 3, 0]
[0, 0, 9].

29. [2]
[−2]
[−2]
Bibliography

[1] Abramowitz, M. and I. A. Stegun (eds.): Handbook of Mathemat-


ical Functions, National Bureau of Standards, 1972.

[2] Achieser, N. I.: Theory of Approximation, Dover, New York, 1993.

[3] Ahlberg, J., E. Nilson, and J. Walsh: The Theory of Splines and Their
Application, Academic Press, New York, 1967.

[4] Akai, T. J.: Applied Numerical Methods for Engineers, John Wiley &
Sons, New York, 1993.

[5] Allgower, E. L., K. Glasshoff, and H. O. Peitgen (eds.): Numerical


Solutions of Nonlinear Equations, LNM878, Springer–Verlag, 1981.

[6] Atkinson, K. E. and W. Han: An Introduction to Numerical Analysis,


3rd ed., John Wiley & Sons, New York, 2004.

[7] Axelsson, O.: Iterative Solution Methods, Cambridge University


Press, New York, 1994.

[8] Ayyub, B. M. and R. H. McCuen: Numerical Methods for Engineers,


Prentice–Hall, Upper Saddle River, NJ, 1996.

[9] Bellman, R.: Introduction to Matrix Analysis, 2nd ed., McGraw–Hill,


New York, 1970.

[10] Bazaraa, M., H. Sherall, C. Shetty: Nonlinear Programming, Theory


and Algorithms, 2nd ed., John Wiley & Sons, New York, 1993.

[11] Beale, E. M. L.: Numerical Methods in Nonlinear Programming, (Ed.


J. Abadie), North–Holland, Amsterdam, 1967.
[12] Beale, E. M. L.: Mathematical Programming in Practice, Pitman,
London, 1968.
[13] Bertsetkas, D.: Nonlinear Programming, Athena Publishing, Cam-
bridge, MA, 1995.
[14] Beightler, C., D. Phillips, and D. Wilde: Foundations of Optimization,
2nd ed., Prentice–Hall, Upper Saddle River, New Jersey, 1979.
[15] Bradley, S. P., A. C. Hax and T. L. Magnanti: Applied Mathematical
Programming, Addison–Wesley, Reading, MA, 1977.
[16] Bender, C. M. and S. A. Orszag: Advanced Mathematical Methods
for Scientists and Engineers, McGraw–Hill, New York, 1978.
[17] Blum, E. K.: Numerical Analysis and Computation: Theory and Prac-
tice, Addison–Wesley, Reading, MA, 1972.
[18] Borse, G. H.: Numerical Methods with MATLAB, PWS, Boston,
1997.
[19] Bronson, R.: Matrix Methods—An Introduction, Academic Press,
New York, 1969.
[20] Buchanan, J. L. and P. R. Turner: Numerical Methods and Analysis,
McGraw–Hill, New York, 1992.
[21] Burden, R. L. and J. D. Faires: Numerical Analysis, 8th ed.,
Brooks/Cole Publishing Company, Boston, 2005.
[22] Butcher, J.: The Numerical Analysis of Ordinary Differential equa-
tions, John Wiley & Sons, New York, 1987.
[23] Carnahan, B., A. H. Luther, and J. O. Wilkes: Applied Numerical
Methods, John Wiley & Sons, New York, 1969.
[24] Chapra, S. C. and R. P. Canale: Numerical Methods for Engineers,
3rd ed., McGraw–Hill, New York, 1998.

[25] Cheney, E. W.: Introduction to Approximation Theory, McGraw–Hill,


New York, 1982.

[26] Chv’atal, V.: Linear Programming. W. H. Freeman, New York, 1983.

[27] Ciarlet, P. G.: Introduction to Numerical Linear Algebra and Opti-


mization, Cambridge University Press, Cambridge, MA, 1989.

[28] Coleman, T. F. and C. Van Loan: Handbook for Matrix Computa-


tions, SAIM, Philadelphia, 1988.

[29] Conte, S. D. and C. de Boor: Elementary Numerical Analysis, 3rd


ed., McGraw–Hill, New York, 1980.

[30] Daellenbach, H. G. and E. J. Bell: User’s Guide to Linear Program-


ming, Prentice–Hall, Englewood Cliffs, NJ, 1970.

[31] Dahlquist, G. and A. Bjorck: Numerical Methods, Prentice–Hall, En-


glewood Cliffs, NJ, 1974.

[32] Daniels, R. W.: An Introduction to Numerical Methods and Opti-


mization Techniques, North–Holland, New York, 1978.

[33] Dantzig, G. B.: Minimization of a Linear Function of Variables Sub-


ject to Linear Inequalities. In Activity Analysis of Production and
Allocation, Koopmans, T. C., ed., John Wiley & Sons, New York,
Chapter XXI, pp. 339-347, 1951(a).

[34] Dantzig, G. B.: Application of the Simplex Method to a Transporta-


tion Problem. In Activity Analysis of Production and Allocation,
Koopmans, T. C., ed., John Wiley & Sons, New York, Chapter XXIII,
pp. 35993-373, 1951(b).

[35] Dantzig, G. B.: Linear Programming and Extensions, Princeton Uni-


versity Press, Princeton, NJ, 1963.

[36] Dantzig, G. B. and M. N. Thapa: Linear Programming 1: Introduc-


tion, Springer–Verlag, New York, 1997.
1132 Bibliography

[37] Datta, B. N.: Numerical Linear Algebra and Application, Brook/Cole,


Pacific Grove, CA, 1995.
[38] Davis, P. J.: Interpolation and Approximation, Dover, New York,
1975.
[39] Davis, P. J. and P. Rabinowitz: Methods of Numerical Integration,
2nd ed., Academic Press, 1984.
[40] Demmel, J. W.: Applied Numerical Linear Algebra, SIAM, Philadel-
phia, 1997.
[41] Driebeek, N. J.: Applied Linear Programming, Addison–Wesley,
Reading, MA, 1969.
[42] Duff, I. S., A. M. Erisman and J. K. Reid: Direct Methods for Sparse
Matrices, Oxford University Press, Oxford, England, 1986.
[43] Epperson, J. F.: An Introduction to Numerical Methods and Analysis,
John Wiley & Sons, Chichester, 2001.
[44] Etchells, T. and J. Berry: Learning Numerical Analysis Through De-
rive, Chartwell–Bratt, Kent, 1997.
[45] Etter, D. M. and D. C. Kuncicky: Introduction to MATLAB,
Prentice–Hall, Englewood Cliffs, NJ, 1999.
[46] Evans, G.: Practical Numerical Analysis, John Wiley & Sons, Chich-
ester, England, 1995.
[47] Fang, S. C. and Puthenpura, S.: Linear Optimization and Extensions,
AT&T Prentice–Hall, Englewood Cliffs, NJ, 1993.
[48] Fatunla, S. O.: Numerical Methods for Initial-Value Problems in Or-
dinary Differential Equations, Academic Press, New York, 1988.
[49] Ferziger, J. H.: Numerical Methods for Engineering Application, John
Wiley & Sons, New York, 1981.
[50] Fiacco, A. V.: Introduction to Sensitivity and Stability Analysis in
Numerical Programming, Academic Press, New York, 1983.
Bibliography 1133

[51] Fletcher, R.: Practical Methods of Optimization, 2nd ed., John Wiley
& Sons, New York, 1987.
[52] Forsythe, G. E. and C. B. Moler: Computer Solution of Linear Alge-
braic Systems, Prentice–Hall, Englewood Cliffs, NJ, 1967.
[53] Fox, L.: Numerical Solution of Two-Point Boundary-Value Problems
in Ordinary Differential Equations, Dover, New York, 1990.
[54] Fox, L.: An Introduction to Numerical Linear Algebra, Oxford Uni-
versity Press, New York, 1965.
[55] Fröberg, C. E.: Introduction to Numerical Analysis, 2nd ed., Addison–
Wesley, Reading, MA, 1969.
[56] Fröberg, C. E.: Numerical Mathematics: Theory and Computer Ap-
plication, Benjamin/Cummnings, Menlo Park, CA, 1985.
[57] Gass, S. I.: An Illustrated Guide to Linear Programming, McGraw–
Hill, New York, 1970.
[58] Gass, S. I.: Linear Programming, 4th ed., McGraw–Hill, New York,
1975.
[59] Gerald, C. F. and P. O. Wheatley: Applied Numerical Analysis, 7th
ed., Addison–Wesley, Reading, MA, 2004.
[60] Gilat A.: MATLAB—An Introduction with Applications, John Wiley
& Sons, New York, 2005.
[61] Gill, P. E., W. Murray, and M. H. Wright: Numerical Linear Algebra
and Optimization, Addison–Wesley, Reading, MA, 1991.
[62] Gill, P. E., W. Murray, and M. H. Wright: Practical Optimization,
Academic Press, New York, 1981.
[63] Goldstine, H. H.: A History of Numerical Analysis From the 16th
Through the 19th Century, Springer–Verlag, New York, 1977.
[64] Golub, G. H.: Studies in Numerical Analysis, MAA, Washington, DC,
1984.
1134 Bibliography

[65] Golub, G. H. and J. M. Ortega: Scientific Computing and Differential


Equations, Academic Press, New York, 1992.

[66] Golub, G. H. and C. F. van Loan: Matrix Computation, 3rd ed., Johns
Hopkins University Press, Baltimore, MD, 1996.

[67] Goldstine, H. H.: A History of Numerical Analysis From the 16th


Through the 19th Century, Springer–Verlag, New York, 1977.

[68] Greenbaum, A.: Iterative Methods for Solving Linear Systems, SIAM,
Philadelphia, 1997.

[69] Greenspan, D. and V. Casulli: Numerical Analysis for Applied Math-


ematics, Science and Engineering, Addison–Wesley, New York, 1988.

[70] Greeville, T. N. E.: Theory and Application of Spline Functions, Aca-


demic Press, New York, 1969.

[71] Griffiths, D. V. and I. M. Smith: Numerical Methods for Engineers,


CRC Press, Boca Raton, FL, 1991.

[72] Hadley, G.: Linear Algebra, Addison–Wesley, Reading, MA, 1961.

[73] Hadley, G.: Linear Programming, Addison–Wesley, Reading, MA,


1962.

[74] Hageman, L. A. and D. M. Young: Applied Iterative Methods, Aca-


demic Press, New York, 1981.

[75] Hager, W. W.: Applied Numerical Linear Algebra, Prentice–Hall, En-


glewood Cliffs, NJ, 1988.

[76] Hamming, R. W.: Introduction to Applied Numerical Analysis,


McGraw–Hill, New York, 1971.

[77] Hamming, R. W.: Numerical Methods for Scientists and Engineers,


2nd ed., McGraw–Hill, New York, 1973.

[78] Harrington, S.: Computer Graphics: A Programming Approach,


McGraw–Hill, New York, 1987.
Bibliography 1135

[79] Hayhurst, G.: Mathematical Programming Applications, Macmillan,


New York, 1987.

[80] Henrici, P. K.: Elements of Numerical Analysis, John Wiley & Sons,
New York, 1964.

[81] Higham, N. J.: Accuracy and Stability of Numerical Algorithms,


SIAM, Philadelphia, 1996.

[82] Higham, D. J. and N. J. Higham: MATLAB Guide, SIAM, Philadel-


phia, 2000.

[83] Hildebrand, F. B.: Introduction to Numerical Analysis, 2nd ed.,


McGraw–Hill, New York, 1974.

[84] Himmelblau, D. M.: Applied Nonlinear Programming, McGraw–Hill,


New York, 1972.

[85] Hoffman, Joe. D.: Numerical Methods for Engineers and Scientists,
McGraw–Hill, New York, 1993.

[86] Hohn, F. E.: Elementary Matrix Algebra, 3rd ed., Macmillan, New
York, 1973.

[87] Horn, R. A. and C. R. Johnson: Matrix Analysis, Cambridge Univer-


sity Press, Cambridge, 1985.

[88] Hornbeck, R. W. Numerical Methods, Prentice–Hall, Englewood


Cliffs, NJ, 1975.

[89] Householder, A. S.: The Theory of Matrices in Numerical Analysis,


Dover Publications, New York, 1964.

[90] Householder, A. S.: The Numerical Treatment of a Single Non-linear


Equation, McGraw–Hill, New York, 1970.

[91] Hultquist, P. E.: Numerical Methods for Engineers and Computer


Scientists, Benjamin/Cummnings, Menlo Park, CA, 1988.
1136 Bibliography

[92] Hunt, B. R., R. L. Lipsman, and J. M. Rosenberg: A Guide to


MATLAB for Beginners and Experienced Users, Cambridge Univer-
sity Press, Cambridge, MA, 2001.

[93] Isaacson, E. and H. B. Keller: Analysis of Numerical Methods, John


Wiley & Sons, New York, 1966.

[94] Jacques, I. and C. Judd: Numerical Analysis, Chapman and Hall,


New York, 1987.

[95] Jahn, J.: Introduction to the Theory of Nonlinear Optimization, 2nd


ed., Springer–Verlag, Berlin, 1996.

[96] Jennings, A.: A Matrix Computation for Engineers and Scientists,


John Wiley & Sons, London, 1977.

[97] Jeter, M. W.: Mathematical Programming: An Introduction to Opti-


mization, Marcel Dekker, New York, 1986.

[98] Johnson, L. W. and R. D. Riess: Numerical Analysis, 2nd ed.,


Addison–Wesley, Reading, MA, 1982.

[99] Johnston, R. L.: Numerical Methods—A Software Approach, John


Wiley & Sons, New York, 1982.

[100] Kahanger, D., C. Moler, and S. Nash: Numerical Methods and Soft-
ware, Prentice–Hall, Englewood Cliffs, NJ, 1989.

[101] Kharab, A. and R. B. Guenther: An Introduction to Numerical


Methods—A MATLAB Approach, Chapman & Hall/CRC, New York,
2000.

[102] Kelley, C. T.: iterative Methods for Linear and Nonlinear Equations,
SIAM, Philadelphia, 1995.

[103] Kincaid, D. and W. Cheney: Numerical Analysis—Mathematics


of Scientific Computing, 3rd ed., Brooks/Cole Publishing Company,
Boston, 2002.
Bibliography 1137

[104] King, J.T.: Introduction to Numerical Computation, McGraw–Hill,


New York, 1984.

[105] Knuth, D. E.: Seminumerical Algorithms, 2nd ed., Vol. 2 of The Art
of Computer Programming, Addison–Wesley, Reading, MA, 1981.

[106] Kolman, B. and R. E. Beck: Elementary Linear Programming with


Applications, Academic Press, New York, 1980.

[107] Lambert, J. D.: Numerical Methods for Ordinary Differential Sys-


tems, John Wiley & Sons, Chichester, 1991.

[108] Lancaster, P.: “Explicit Solution of Linear Matrix Equations,” SIAM


Review 12, 544-66, 1970.

[109] Lancaster, P. and M. Tismenetsky: The Theory of Matrices, 2nd ed.,


Academic Press, New York, 1985.

[110] Lastman, G. J. and N. K. Sinha: Microcomputer-Based Numerical


Methods for Science and Engineering, Saunders, New York, 1989.

[111] Lawson, C. L. and R. J. Hanson: Solving Least Squares Problems,


SIAM, Philadelphia, 1995.

[112] Leader, J. J.: Numerical Analysis and Scientific Computation,


Addison–Wesley, Reading, MA, 2004.

[113] Linear, P.: Theoretical Numerical Analysis, John Wiley & Sons, New
York, 1979.

[114] Linz, P. and R. L. C. Wang: Exploring Numerical Methods—An


Introduction to Scientific Computing Using MATLAB, Jones and
Bartlett Publishers, Boston, 2002.

[115] Luenberger, D. G.: Linear and Nonlinear Programming, 2nd ed.,


Addison–Wesley, Reading, MA, 1984.

[116] Mangasarian, O.: Nonlinear Programming, McGraw–Hill, New York,


1969.
1138 Bibliography

[117] Marcus, M.: Matrices and Matlab, Prentice–Hall, Upper Saddle


River, NJ, 1993.

[118] Marcus, M. and H. Minc: A Survey of Matrix Theory and Matrix


Inequalities, Allyn and Bacon, Boston, 1964.

[119] Maron, M. J. and R. J. Lopez: Numerical Analysis: A Practical


Approach, 3rd ed., Wadsworth, Belmont, CA, 1991.

[120] Mathews, J. H.: Numerical Methods for Mathematics, Science and


Engineering, 2nd ed., Prentice–Hall, Englewood Cliffs, NJ, 1987.

[121] McCormick, G. P.: Nonlinear Programming Theory, Algorithms, and


Applications, John Wiley & Sons, New York, 1983.

[122] Meyer, C. D.: Matrix Analysis and Applied Linear Algebra, SIAM,
Philadelphia, 2000.

[123] Mirsky, L.: An Introduction to Linear Algebra, Oxford University


Press, Oxford, 1963.

[124] Modi, J. J.: Parallel Algorithms and Matrix Computation, Oxford


University Press, Oxford, 1988.

[125] Moore, R. E.: Mathematical Elements of Scientific Computing, Holt,


Reinhart & Winston, New York, 1975.

[126] Mori, M. and R. Piessens (eds.): Numerical Quadrature, North Hol-


land, New York, 1987.

[127] Morris, J. L.: Computational Methods in Elementary Theory and


Application of Numerical Analysis, John Wiley & Sons, New York,
1983.

[128] Murty, K. G.: Linear Programming, John Wiley & Sons, New York,
1983.

[129] Nakamura, S.: Applied Numerical Methods in C, Prentice–Hall, En-


glewood Cliffs, NJ, 1993.
Bibliography 1139

[130] Nakos, G. and D. Joyner: Linear Algebra With Applications,


Brooks/Cole Publishing Company, Boston, 1998.

[131] Nash, S. G. and A. Sofer: Linear and Nonlinear Programming,


McGraw–Hill, New York, 1998.

[132] Nazareth, J. L.: Computer Solution of Linear Programs, Oxford Uni-


versity Press, New York, 1987.

[133] Neumaier, A.: Introduction to Numerical Analysis, Cambridge Uni-


versity Press, Cambridge, MA, 2001.

[134] Nicholson, W. K.: Linear Algebra With Applications, 4th ed.,


McGraw–Hill Ryerson, New York, 2002.

[135] Noble, B. and J. W. Daniel: Applied Linear Algebra, 2nd ed.,


Prentice–Hall, Englewood Cliffs, NJ, 1977.

[136] Olver, P. J. and C. Shakiban: Applied Linear Algebra, Pearson


Prentice–Hall, Upper Saddle River, NJ, 2005.

[137] Ortega, J. M.: Numerical Analysis—A Second Course, Academic


Press, New York, 1972.

[138] Ostrowski, A. M.: Solution of Equations and Systems of Equations,


Academic Press, New York, 1960.

[139] Parlett, B. N.: The Symmetric Eigenvalue Problem, Prentice–Hall,


Englewood Cliffs, NJ, 1980.

[140] Peressini, A. L., F. E. Sullivan, and J. J. Uhl Jr.: The Mathematics


of Nonlinear Programming, Springer–Verlag, New York, 1988.

[141] Pike, R. W.: Optimization for Engineering Systems, Van Nostrand


Reinhold, New York, 1986.

[142] Polak, E.: Computational Methods in Optimization, Academic


Press, New York, 1971.
1140 Bibliography

[143] Polak, E.: Optimization Algorithm and Consistent Approximations,


Computational Methods in Optimization, Springer–Verlag, New York,
1997.

[144] Powell, M. J. D.: Approximation Theory and Methods, Cambridge


University Press, Cambridge, MA, 1981.

[145] Pshenichnyj, B. N.: The Linearization Method for Constrained Op-


timization, Springer–Verlag, Berlin, 1994.

[146] Quarteroni, Alfio: Scientific Computing with MATLAB, Springer–


Verlag, Berlin Heidelberg, 2003.

[147] Ralston, A. and P. Rabinowitz: A First Course in Numerical Analy-


sis, 2nd ed., McGraw–Hill, New York, 1978.

[148] Rao, S. S.: Engineering Optimization: Theory and Practice, 3rd ed.,
John Wiley & Sons, New York, 1996.

[149] Ravindran, A.: Linear Programming, in Handbook of Industrial En-


gineering, (G. Salvendy ed.), Chapter 14 (pp 14.2.1-14.2-11), John
Wiley & Sons, New York, 1982.

[150] Rardin, R. L.: Optimization in Operational Research, Prentice–Hall,


Upper Saddle River, NJ, 1998.

[151] Recktenwald, G.: Numerical Methods with MATLAB—


Implementation and Application, Prentice–Hall, Englewood Cliffs,
NJ, 2000.

[152] Rice, J. R.: Numerical Methods, Software and Analysis, McGraw–


Hill, New York, 1983.

[153] Rivlin, T. J.: An Introduction to the Approximation of Functions,


Dover Publications, New York, 1969.

[154] Rorrer, C. and H. Anton: Applications of Linear Algebra, John Wiley


& Sons, New York, 1977.
Bibliography 1141

[155] Saad, Y.: Numerical Methods for Large Eigenvalue Problems: The-
ory and Algorithms, John Wiley & Sons, New York, 1992.
[156] Saad, Y.: Iterative Methods for Sparse Linear Systems, PWS Pub-
lishing Co., Boston, 1996.
[157] Scales, L. E.: Introduction to Nonlinear Optimization, Macmillan,
London, 1985.
[158] Scarborough, J. B.: Numerical Mathematical Analysis, 6th ed., The
Johns Hopkins University Press, Baltimore, MA, 1966.
[159] Schatzman, M.: Numerical Analysis—A Mathematical Introduction,
Oxford University Press, New York, 2002.
[160] Scheid, F.: Numerical Analysis, McGraw–Hill, New York, 1988.
[161] Schilling, R. J. and S. L. Harris: Applied Numerical Methods for
Engineers using MATLAB and C, Brooks/Cole Publishing Company,
Boston, 2000.
[162] Shampine, L. F. and R. C. Allen: Numerical Computing—An Intro-
duction, Saunders, Philadelphia, 1973.
[163] Shapiro, J. F.: Mathematical Programmin Structures and Algo-
rithms, John Wiley & Sons, New York, 1979.
[164] Steward, B. W.:Introduction to Matrix Computations, Academic
Press, New York, 1973.
[165] Stewart, G. W.: Afternotes on Numerical Analysis, SIAM, Philadel-
phia, 1996.
[166] Store, J. and R. Bulirsch: Introduction to Numerical Analysis,
Springer–Verlag, New York, 1980.
[167] Strang, G.: Linear Algebra and Its Applications 3rd ed., Brooks/Cole
Publishing Company, Boston, 1988.
[168] Stroud, A. H. and D. Secrest: Gaussian Quadrature Formulas,
Prentice–Hall, Englewood Cliffs, NJ, 1966.
1142 Bibliography

[169] Suli, E. and D. Mayers: An Introduction to Numerical Analysis,


Cambridge University Press, Cambridge, MA, 2003.
[170] The Mathworks, Inc.: Using MATLAB, The Mathworks, Inc., Nat-
ick, MA, 1996.
[171] The Mathworks, Inc.: Using MATLAB Graphics, The Mathworks,
Inc., Natick, MA, 1996.
[172] The Mathworks, Inc.: MATLAB Language Reference Manual. The
Mathworks, Inc., Natick, MA, 1996.
[173] Trefethen, L. N. and D. Bau III: Numerical Linear Algebra, SIAM,
Philadelphia, 1997.
[174] Traub, J. F.: Iterative Methods for the Solution of Equations,
Prentice–Hall, Englewood Cliffs, NJ, 1964.
[175] Turnbull, H. W. and A. C. Aitken: An Introduction to the Theory
of Canonical Matrices, Dover Publications, New York, 1961.
[176] Turner, P. R.: Guide to Scientific Computing, 2nd ed., Macmillan
Press, Basingstoke, 2000.
[177] Ueberhuber, C. W.: Numerical Computation 1: Methods, Software,
and Analysis, Springer–Verlag, New York, 1997.
[178] Usmani, R. A.: Numerical Analysis for Beginners, D and R Texts
Publications, Manitoba, 1992.
[179] Vandergraft, J. S.: Introduction to Numerical Computations, Aca-
demic Press, New York, 1978.
[180] Verga, R. S.: Matrix Iterative Analysis, Prentice–Hall, Englewood
Cliffs, NJ, 1962.
[181] Wagner, H. M.: Principles of Operational Research, 2nd ed.,
Prentice–Hall, Englewood Cliffs, NJ, 1975.
[182] Walsh, G. R.: Methods of Optimization, John Wiley & Sons, London,
1975.
Bibliography 1143

[183] Wilkinson, J. H.: Rounding Errors in Algebraic Processes, Prentice–


Hall, Englewood Cliffs, NJ, 1963.

[184] Wilkinson, J. H.: The Algebraic Eigenvalue Problem, Clarendon


Press, Oxford, England, 1965.

[185] Wilkinson, J. H. and C. Reinsch: Linear Algebra, vol. II of Handbook


for Automatic Computation, Springer–Verlag, New York, 1971.

[186] Williams, G.: Linear Algebra With Applications, 4th ed., Jones and
Bartlett Publisher, UK, 2001.

[187] Wolfe, P.: Methods of Nonlinear Programming, in Nonlinear Pro-


gramming (Ed. J. Abadie), North–Holland, Amsterdam, 1967.

[188] Wood, A.: Introduction to Numerical Analysis, Addison–Wesley,


Reading, MA, 1999.

[189] Young, D. M.: Iterative Solution of Large Linear Systems, Academic


Press, New York, 1971.

[190] Zangwill, W.: Nonlinear Programming, Prentice–Hall, Englewood


Cliffs, NJ, 1969.
Index

LR method, 418, 482 arbitrary linear system, 286


LU decomposition, 111 arithmetic operation, 1012
QR decomposition, 474, 490 arithmetic operations, 1012
QR method, 418, 482 artificial objective function, 702
QR transformation, 474 artificial problem, 702
nth divided difference, 531 artificial system, 697
syms, 1063 artificial variables, 696
augmented matrix, 8
absolute error, 926 average temperature, 196
absolute extrema, 745
absolute value, 100, 979 backward substitution, 79, 82
accelerate convergence, 305, 491 Balancing Chemical Equations, 192
additional constraints, 717 band matrix, 29
adjacent feasible solution, 684 base points, 181
adjacent points, 514 base systems, 919
adjoint matrix, 60, 61 basic feasible solution, 677, 684
adjoint of a matrix, 386 basic solution, 683
Aitken’s interpolation, 512 basic variables, 677, 683
Aitken’s interpolatory polynomials, basis, 369, 700
512 basis vectors, 950
algebraic form, 86 Bessel difference polynomials, 512
algebraic method, 683 Big M simplex method, 697
alternate optimal solution, 682 binary approximation, 922
approximate number, 925 binary digits, 919
approximating functions, 513 binary expansion, 920
approximating polynomials, 583 binary system, 918
approximation polynomials, 512 bisection method, 777, 780
approximation theory, 512 blocks, 26
arbitrary constants, 400 Bocher’s theorem, 385

boundary value problems, 244 component of a vector, 952


bounds for errors, 780 computer solutions, 701
bracket the optimum, 833 concave downward, 742
bracketing method, 775 concave functions, 802
built-in functions, 1024 concave upward, 742
concavity, 742
canonical form, 676 condition for the convergence, 286
canonical reduction, 695 condition number, 170
canonical system, 696 conjugacy condition, 311
cartesian coordinate system, 947 conjugate direction, 311
Cayley–Hamilton theorem, 379 conjugate gradient iterations, 665
characteristic equation, 344, 369 conjugate gradient method, 310
Chebyshev approximation polynomial,conjugate transpose, 373
570 consistent system, 6
Chebyshev points, 560 constant force, 954
Chebyshev polynomial, 557 constrained minimization problem, 869
chemical equations, 193 constraint coefficients, 655
Chemical Solutions, 192 constraint equation, 697
chemical substance, 192 constraint equations, 883
Cholesky method, 143 constraint matrix, 710
chopping, 924 constraint set, 855
closed path, 183 constraints, 654, 655
coefficient matrix, 7, 63 continuous function, 513, 739, 780
cofactor, 41, 44 continuously differentiable function,
column matrix, 7, 63 765
column vector, 678, 1016 convergence tolerance, 830
Command History Window, 1008 convergent matrix, 289, 290
Command Window, 1008 convex, 675
companion matrix, 210 convex feasible region, 819
complex conjugate, 978 convex function, 803
complex conjugate pair, 419 convex set, 803
complex inner product space, 990 convexity, 809
complex number, 976 coordinate system, 942
complex scalars, 979 coplanar, 962
complex vector space, 979 correct decimals, 924
complex-valued inner products, 990 correct rounding, 925

cost coefficients, 655 digital computer, 701


Cramer’s rule, 75 digits represent, 918
critical points, 742 direct method, 111
cross product, 956 directed line segment, 942
Crout’s method, 125 direction angles, 950
current basic variable, 684 direction cosines, 950
Current Directory Window, 1008 direction of steepest ascent, 842
curve fitting, 575 direction of steepest descent, 842
direction vector, 973
data fitting, 574 direction vectors, 310
decimal digits, 922 directional derivative, 752
decimal fraction, 920 discontinuous, 739
decimal notation, 918 discrete Fourier analysis, 605
decimal number system, 918 discriminant, 333
decimal point, 918 displacement vector, 954
decimal system, 918 divided differences, 530
decision statement, 1055 dominant eigenvalue, 419
decision statements, 1054 dominant singular value, 492
decision variables, 654 Doolittle’s method, 115
decision vector, 678 dot product, 945, 986
deflation method, 442 double precision, 923
deluxe model, 679 dual constraints, 709
Derivatives of the Functions, 740 dual problem, 706
design values, 664 dual problems, 708
design variables, 664 dual variables, 708
determinant, 38, 50 duality, 706
diagonal elements, 401 Duality theorem, 715
diagonal matrix, 21
diagonal system, 400, 401 echelon form, 1037
diagonally dominant, 316, 435 Editor Window, 1009
diet problem, 670 eigenpair, 280
difference equation, 405 eigenspace, 369
differentiable function, 756 eigenvalue problem, 280
differentiable functions, 397 eigenvalues, 280
differential equations, 397, 401 eigenvectors, 280
differentiation, 740 electrical networks, 183

elementary functions, 511 first partial derivatives, 747


elementary matrix, 71 first Taylor polynomial, 787
elementary row operation, 33 fixed-point iteration, 783
elimination methods, 243 fixed-point method, 781, 782
ellipsis, 1009 floating-point, 921
entering variable, 684, 693 flow constraint, 187
equality constraint, 680 format bank, 1011
equality constraints, 679, 855 format compact, 1011
equally spaced, 512 format long, 1011
equivalent system, 81 format loose, 1011
error analysis, 585 format short, 1011
error bound, 636, 927 forward elimination, 81
error bounds, 273 fplot command, 1046
exact number, 925 fprintf function, 1036
excess variable, 680 fractional values, 673
exit condition, 664 full rank, 97
exitflag, 664 function coefficients, 677
experimental data, 609 function m-file, 1056
exponent, 921 function plot, 1045
exponential format, 1037 fundamental system, 400, 401
exponential functions, 512
extrapolation, 512 Gauss quadratures, 790
Gauss–Jordan method, 106
factorization method, 111 Gauss–Seidel iteration matrix, 253
Faddeev–Leverrier method, 386 Gauss–Seidel iterative method, 253
feasible point, 693 Gaussian elimination method, 79
feasible points, 662 general formulation, 655
feasible region, 661, 819 general solution, 398
feasible solutions, 656 generalized reduced-gradient, 885
feasible values, 662 George Dantzig, 653
Figure Window, 1008 Gerschgorin circles, 439
final simplex tableau, 685 Gerschgorin’s theorem, 439, 440
finer grids, 189 Given’s method, 460
finite optimum, 675, 682 global minimum, 855
first divided difference, 531 golden ratio, 820
first partial derivative, 746 golden-section search, 825

gradient, 755 inequality constraints, 679, 868


gradient methods, 840 infinitely many solutions, 32
gradient of a function, 755 inflection point, 743
gradient vector, 841 initial conditions, 403, 405
Gram matrix, 990 initial simplex tableau, 684
graphic command, 1049 initial-value problem, 398, 399
graphical method, 683 inline functions, 1057
graphical procedure, 660 inner product, 986
inner product axioms, 991
half-space, 802 inner product space, 986
Heat Conduction, 189 Input Parameters, 664
heat-transfer problems, 189 inspection error, 659
Help Window, 1009 inspection problem, 671, 673
Hermitian matrix, 372 integer mode, 922
Hessian matrix, 757 integer part, 918
hexadecimal, 919 interior point, 190, 819
hexadecimal fraction, 920 Intermediate Value theorem, 775
Hilbert matrix, 395, 1031 interpolating point, 524
homogeneous linear system, 62 interpolating polynomial, 518
homogeneous system, 9, 366 interpolation, 512
horizontal axis, 977 interpolation conditions, 516
Householder matrix, 465 interval-halving method, 775
Householder transformations, 474 inverse iteration, 435
Householder’s method, 465 inverse matrix, 20
idempotent matrix, 210 inverse power method, 418
identity matrix, 16 invertible matrix, 18, 19
ill-conditioned systems, 170 involution matrix, 210
ill-conditioning, 103 iterative methods, 244, 418
imaginary axis, 977 iterative scheme, 294
imaginary part, 976 Jacobi iteration matrix, 247
imaginary roots, 787 Jacobi method, 246, 448
inconsistent system, 6 Jacobian matrix, 793
indefinite, 759 junctions, 188
independent eigenvectors, 363
indirect factorization, 134 kernel of A, 615
individual constraints, 681 Kirchhoff’s Laws, 184

KT conditions, 869 linear independent, 4


linear inequality, 653
Lagrange coefficient polynomials, 516 linear interpolation, 516
Lagrange coefficients, 527 linear least squares, 575
Lagrange conditions, 864 linear polynomial, 514
Lagrange interpolation, 514 linear programming, 653
Lagrange interpolatory polynomials, linear programming tableau, 883
512 linearized form, 596
Lagrange multipliers, 855, 866 linearly independent columns, 619
Laplace expansion theorem, 45 linearly independent eigenvectors, 402
largest positive entry, 692 linprog, 663
largest singular value, 492 local extremum, 742
leading one, 30, 32 local maximum, 742
leading principal minor, 812 local minimum, 743, 850
least dominant eigenvalue, 430 logical operations, 1055
least squares approximation, 576, 580 logical variables, 1054
least squares error, 585 lower bounds, 664
least squares line, 578 lower-triangular matrix, 23
least squares method, 574 lowest value, 673
least squares plane, 601 LP Problem, 676
least squares polynomial, 582 LP problem, 654, 661, 662
least squares problem, 596
least squares solution, 624 machine epsilon, 924
leaving variable, 684 machine number, 921
left singular vectors, 494 magnitude, 942
length of a vector, 987 mathematical method, 653
length of the segment, 942 MATLAB, 1007
limit of a function, 737 matrix decomposition, 491
linear algebra, 1008 matrix inversion method, 70
linear approximation, 762 matrix norm, 164
linear combination, 4, 420 matrix of the cofactor, 46
linear constraints, 884 matrix-vector notation, 678
linear difference equation, 406 maximization in primal, 709
linear equation, 2 maximization problem, 709
linear equations, 1 maximizing a concave function, 819
linear function, 653 maximum profit, 660

maximum value, 669 Newton interpolation, 536


mean value property, 190 Newton’s divided difference, 512
mesh lines, 189 Newton’s method, 586, 786
mesh points, 189 Newton–Raphson method, 786
method of elimination, 30 nilpotent matrix, 210
method of tangents, 786 NLP problem, 819
minimal spectral, 305 no solution, 32
minimization in dual, 709 nonbasic variables, 677, 683
minimization problem, 690, 709 nondiagonal matrix, 411
minimizing a convex function, 819 nonhomogeneous system, 68
minimizing point, 865 nonlinear constraint, 884
minimum value, 669 nonlinear curves, 585
minor, 40 nonlinear equations, 780
minors, 44 nonlinear fit, 597
mixed partial derivatives, 758 nonlinear simultaneous system, 593
modified simplex algorithm, 890 nonnegative restriction, 658
modulus, 979
nonnegative values, 658
monetary constraints, 660
nonnegative vector, 682
monic polynomials, 565
nonnegativity constraints, 658, 662,
multidimensional maximization, 850
672
multidimensional unconstrained op-
nonsingular matrix, 18
timization, 840
nontrivial solutions, 62
multiple optimal solutions, 673
nonzero eigenvalues, 497
multiples, 80
nonzero singular values, 497
natural method, 449 nonzero vectors, 946, 952, 953
natural norm, 286 normal equations, 582, 585
negative infinity, 1011 normal matrices, 373
negative-definite, 759 normalized mantissa, 921
negative-semidefinite, 759 normalized power sequence, 424, 427
neighboring points, 190 normed vector space, 802
Network analysis, 185 null space, 615
newline command, 1036 number of nutrients, 670
Newton divided difference, 535 number system, 918
Newton divided difference interpola- numerical linear algebra, 1008
tion, 532 numerical matrix, 1036

objective function, 654, 669, 819 overdetermined system, 3


octal system, 919
one-dimensional optimization, 819 parabola equation, 583
one-dimensional search, 842 parallel vectors, 947
one-point iterative method, 790 parametric equations, 965
one-side limits, 737 partial derivatives, 611, 799
open region, 748 partial pivoting, 100
optimal assignment, 659 partitioned, 26
optimal choices, 302 partitioned matrix, 27
optimal integer solution, 673 percentage error, 926
optimal Phase 1 tableau, 703 permutation matrix, 30
optimal solution, 835 Phase 1 basis, 703
optimal solutions, 653 Phase 1 problem, 703
optimal value, 656, 662 Phase 1 tableau, 703
optimization direction, 710 pivot column, 685
optimization problem, 654 pivot element, 80, 87
optimization problems, 310 pivot location, 685
Optimization Toolbox, 663 pivotal equation, 80
optimum point, 827 pivoting strategies, 99
optimum tableau, 705 point-normal form, 968
optimum value, 301 polar coordinates, 980
original system, 701 polynomial equation, 280
orthogonal, 310 polynomial Fit, 585
orthogonal matrix, 367, 370, 447, 494 polynomial functions, 512
orthogonal set, 356 polynomial interpolation, 529
orthogonal vectors, 947 Polynomial Least Squares, 584
orthogonality condition, 310 positive dominant, 424, 427
orthogonally diagonalizing, 369 positive singular values, 497
orthogonally similar, 447 positive whole numbers, 918
orthonormal basis, 369, 494 positive-definite, 380, 759
orthonormal columns, 493 positive-definite matrix, 141
orthonormal rows, 493 positive-semidefinite, 759
orthonormal set, 369 power methods, 418
Output Parameters, 664 practical problems, 660
over-relaxation methods, 295 preassigned accuracy, 308
overdetermined linear system, 612 primal constraint, 709

primal problem, 707, 709 rectangular matrix, 16


primal variables, 709 rectangular real matrix, 491
primal-dual computations, 711 recurrence relation, 405
primal-dual problem, 711 recursion relation, 559
principal argument, 981 reduced gradient, 883
principal minors, 141, 817 reduced row echelon form, 32
product matrix, 12 reduced-gradient method, 883
profit function, 661 regression line, 578
program modules, 1007 regular power method, 418
programming command, 1036 relational operators, 1055
programming model, 657 relative error, 926
projection of vector, 953 relaxation factor, 295
pseudoinverse of a matrix, 619 repeated eigenvalues, 366
purely imaginary number, 977 reshape matrix, 1030
residual correction, 313
quadratic approximation, 768
residual correction method, 326
quadratic form, 768
residual corrector, 316
quadratic interpolation formula, 827
residual vector, 310
quadratic objective function, 895
resources values, 655
Quadratic programming, 895
revised problem, 719
radix, 919 right singular vectors, 494
random matrix, 1030 road network, 186
rank, 97, 493 rotation matrix, 447
rank deficient, 97 round-off errors, 310, 929
rank of a matrix, 1038 rounding, 924
rate of convergence, 305 row echelon form, 30
ratio test, 693 row equivalent, 35
rational functions, 512 row operations, 33
raw material, 657 row vector, 1016
Rayleigh quotient, 441 row-reducing, 356
Rayleigh Quotient Theorem, 441 Rutishauser’s LR method, 479
real axis, 977
real number, 976 s, 840
real part, 976 saddle points, 836
real vector space, 979 scalar arithmetic, 1012
rectangular array, 10 scalar matrix, 21, 51

scalar multiplication, 348 singular value decomposition, 493


scalar product, 348, 350, 945 singular values, 491
scalar projection, 952 skew Hermitian, 372
scalar triple product, 960 skew lines, 966
scaling vector, 1047 skew matrix, 25
scientific notation, 921 skew symmetric matrix, 26
SDD matrix, 154 skew symmetry, 372
search direction, 309 slack variables, 678
search method, 819 smallest eigenvalue, 429
second derivative test, 742 smallest singular value, 492
second partial derivatives, 747 software package, 1007
sensitivity analysis, 717 SOR iteration matrix, 295
separable functions, 892 SOR method, 301
separable programming, 890 Sourian–Frame theorem, 381
separable programming problem, 891 sparse matrix, 30
sequence converges, 783 sparse systems, 316
sequential derivatives, 740 spectral matrices, 357
serial method, 449 spectral radius, 283
shadow price, 720 spectral radius theorem, 395
shifted inverse power method, 418 square matrix, 15
sign restriction, 654 standard constraints, 683, 688
sign restrictions, 681, 693 standard form, 677, 680, 968
sign variable, 693 standard primal, 712
significant digits, 924 starting solution, 704
Similar Matrix, 357 stationary point, 756
similarity transformation, 357 steepest descent, 309
simple iterations, 418 Stirling difference polynomials, 512
simplex algorithm, 683 stopping criteria, 248
simplex method, 675 strictly concave, 808
simplex tableaus, 686 strictly convex, 808
simultaneous equations, 1 strictly lower-triangular matrix, 23
simultaneous iterations, 247 strictly upper-triangular matrix, 22
simultaneous linear systems, 4 string delimiter, 1035
single precision, 923 string literal, 1035
singular value, 493 string manipulation, 1035
Singular Value Decomposition, 491 string matrix, 1036

Sturm sequence iteration, 455 traffic flow, 186


Sturm sequences, 458 transformation matrix, 444
subdiagonal, 29 transportation systems, 185
subdominant eigenvalues, 446 transpose matrix, 17
submatrices, 26 transpose operator, 1035
subplot function, 1049 triangle inequality, 802
subplots, 1049 triangular form, 79
subspace, 350 triangular system, 79
successive iterates, 310, 788 tridiagonal elements, 465
successive iteration, 253 tridiagonal matrix, 29, 302
Successive Over-Relaxation, 294 tridiagonal system, 157
superdiagonal, 29 trigonometric functions, 512
surface plot, 1048 trigonometric polynomial, 604
surface plots, 1047, 1050 triple recursion relation, 559
surplus variables, 678 triple vector product, 962
symbolic calculations, 1062 trivial solution, 62, 68, 356
Symbolic Math Toolbox, 1062 truncation error, 928
symbolic variables, 1063 Two-Phase method, 701
symmetric dual linear programs, 709 Two-Phase simplex method, 697, 703
symmetric equations, 965
symmetric form, 709 unbounded above, 664
symmetric matrix, 23 unbounded below, 664
symmetric tridiagonal matrix, 457 unbounded solution, 675, 682
Syntax, 663 under-relaxation methods, 295
system of linear equations, 2 underdetermined linear system, 618
system of nonlinear equations, 794 underdetermined system, 3
system of two equations, 794 unequally spaced, 512
unique optimal solution, 673
Taylor series approximation, 884 unique solution, 4, 31
Taylor’s series, 762 unit circle, 987
the derivative, 740 unit dominant eigenvector, 424
threshold serial method, 449 unit sphere, 987
Toeplitz matrix, 1031 unit vector, 753, 944, 987
total pivoting, 103 unitarily diagonalizable, 373
total profit, 657 unitarily similar, 373
total voltage, 183 unitary matrix, 373

unitary space, 990


unrestricted in sign, 693
unrestricted sign, 654
upper bounds, 664
upper Hessenberg form, 482
upper-triangular matrix, 22

vandermonde matrix, 210


vector addition, 348, 944
vector equation, 866
vector functions, 398
vector norm, 162
vector product, 956
vector space, 348, 398, 986
vector spaces, 941
vector subspace is convex, 802
vertical axis, 977
volume of the parallelepiped, 961

Weak Duality Theorem, 717


Weierstrass Approximation Theorem, 513
Wolfe’s method, 896
Workspace Window, 1009

zero matrix, 16
zero solution, 62
zero subspace, 350
zero vector, 348, 943
zeroth divided difference, 531