
Linear Algebra and Systems Theory Notes

Manual Rewrite

May 2, 2025
Contents

1 Vector Spaces
1.1 Definition and Examples (over R, C)
1.2 Subspaces
1.3 Linear Independence, Span, Basis, and Dimension
1.4 Theorem: Existence and Uniqueness of Basis Representation

2 Linear Transformations and Matrices
2.1 Linear Transformations: Definition, Properties, Kernel (Null Space), Range (Image)
2.2 Matrix Representation of Linear Transformations
2.3 Matrix Operations (Addition, Scalar Multiplication, Multiplication)
2.4 Change of Basis Matrices
2.5 Matrix Transpose, Inverse, Determinant: Properties and Computation
2.6 Theorem: Rank-Nullity Theorem (with Proof)

3 Eigenvalues, Eigenvectors, and Diagonalization
3.1 Eigenvalues and Eigenvectors: Definition and Characteristic Polynomial
3.2 Finding Eigenvalues and Eigenvectors
3.3 Properties of Eigenvalues and Eigenvectors
3.4 Theorem: Linear Independence of Eigenvectors for Distinct Eigenvalues (with Proof)
3.5 Diagonalization: Condition (existence of eigenbasis), Process (A = V ΛV −1 )
3.6 Theorem: Diagonalizability Condition (Algebraic vs. Geometric Multiplicity) (with Proof)
3.7 Similarity Transformations and their Properties

4 Jordan Normal Form
4.1 Generalized Eigenvectors
4.2 Jordan Blocks and Jordan Canonical Form Matrix
4.3 Theorem: Existence and Uniqueness of Jordan Form (Statement, brief discussion on construction)
4.4 Application to understanding matrix structure when not diagonalizable

5 Matrix Norms and Positive Definite Matrices
5.1 Vector Norms (e.g., 1-norm, 2-norm, ∞-norm)
5.2 Matrix Norms (Induced Norms, Frobenius Norm)
5.3 Positive Definite and Semidefinite Matrices: Definition (xT P x > 0 or ≥ 0 for x ≠ 0), Properties
5.4 Theorem: Sylvester’s Criterion for Positive Definiteness (Statement and Application)

6 The Matrix Exponential
6.1 Definition via Power Series (eAt = Σk (At)k /k!)
6.2 Convergence of the Series
6.3 Properties of eAt (e.g., d/dt eAt = AeAt , e0 = I, eA(t+s) = eAt eAs , inverse)
6.4
6.5 Methods for Computing eAt
6.5.1 Using Laplace Transform: eAt = L−1 {(sI − A)−1 } (Derivation)
6.5.2 Using Diagonalization: eAt = V eΛt V −1 (Derivation)
6.5.3 Using Jordan Form: eAt = P eJt P −1 (Derivation)
6.5.4 Using Cayley-Hamilton Theorem (Theorem Statement, Application Example)
6.5.5

I Linear Systems Theory

7 State-Space Representation
7.1 Concept of State, State Variables, State Vector
7.2
7.3
7.4 Examples: Electrical Circuits, Mechanical Systems
7.5 Linearization of Nonlinear Systems around Equilibria (Brief Overview)

8 Solution of State Equations
8.1 State Transition Matrix (STM) Φ(t, t0 ) for LTV Systems
8.1.1 Definition as the unique solution to Φ̇ = A(t)Φ, Φ(t0 , t0 ) = I
8.1.2 Properties: Φ(t, t) = I, Φ(t, τ )Φ(τ, σ) = Φ(t, σ), Φ−1 (t, τ ) = Φ(τ, t) (with Proofs)
8.1.3
8.2 Matrix Exponential for LTI Systems: Φ(t, t0 ) = eA(t−t0 )
8.2.1
8.3 Fundamental Matrix Ψ(t)
8.3.1
8.3.2 Relation to STM: Φ(t, τ ) = Ψ(t)Ψ−1 (τ ) (with Proof)
8.4 Solution to Nonhomogeneous Systems (Variation of Constants / Convolution Integral)
8.4.1
8.4.2

9 Stability of Linear Systems
9.1 Concept of Stability (Internal Stability / Lyapunov Stability)
9.1.1 Definitions: Equilibrium Points, Stability, Asymptotic Stability, Instability
9.2 Stability of LTI Systems
9.2.1 Theorem: Stability based on Eigenvalues of A (Re(λi ) < 0 ⇔ Asymptotic Stability) (with Proof for diagonalizable case, discussion for Jordan case)
9.2.2 Marginal Stability
9.3 Lyapunov Theory for LTI Systems
9.3.1 Lyapunov Functions
9.3.2 Lyapunov Equation: AT P + P A = −Q
9.3.3 Theorem: Lyapunov Stability Theorem (If A stable, then for any Q > 0, a unique P > 0 exists; if P > 0 exists for some Q > 0, then A is stable) (with Proofs)
9.4 Input-Output Stability (BIBO)
9.4.1 Definition
9.4.2 Relation to Impulse Response Integrability (Mention)
9.4.3 Theorem: For LTI systems, Asymptotic Stability ⇒ BIBO Stability (Statement)

10 System Modes and Response Characteristics
10.1 Modal Decomposition of LTI Systems (for diagonalizable A)
10.1.1
10.1.2
10.2
10.3 Interpretation of Eigenvalues (Poles) and Eigenvectors (Mode Shapes) in System Response

11 Controllability and Observability (Introduction)
11.1 Definitions: Reachability, Controllability, Constructibility, Observability
11.2 Physical Interpretation
Chapter 1

Vector Spaces

1.1 Definition and Examples (over R, C)


Linear algebra provides the fundamental mathematical framework for understanding and manipulating linear
systems. At its core lies the concept of a vector space, an abstract structure that generalizes the familiar
properties of vectors in Euclidean space. A vector space consists of a set of objects, called vectors, which can
be added together and multiplied by scalars (numbers) according to a specific set of rules or axioms. These
operations must behave in a way that is consistent with our intuition about standard vector arithmetic.
Formally, let F denote a field, which is a set equipped with addition and multiplication operations satisfying
certain properties (associativity, commutativity, distributivity, existence of identity and inverse elements). For
our purposes, the field F will typically be either the set of real numbers, denoted by R, or the set of complex
numbers, denoted by C. A vector space V over the field F is a non-empty set V , whose elements are called
vectors, equipped with two operations:
1. Vector Addition: A binary operation denoted by ’+’, which takes any two vectors v, u ∈ V and produces
another vector v + u ∈ V .
2. Scalar Multiplication: An operation that takes any scalar a ∈ F and any vector v ∈ V and produces
another vector av ∈ V .
These two operations must satisfy the following axioms for all vectors u, v, w ∈ V and all scalars a, b ∈ F:
(A1) Associativity of Addition: (u + v) + w = u + (v + w)
(A2) Commutativity of Addition: u + v = v + u
(A3) Existence of Zero Vector: There exists an element 0 ∈ V , called the zero vector, such that v + 0 = v
for all v ∈ V .
(A4) Existence of Additive Inverse: For every v ∈ V , there exists an element −v ∈ V , called the additive
inverse of v, such that v + (−v) = 0.
(M1) Associativity of Scalar Multiplication: (ab)v = a(bv)
(M2) Distributivity of Scalar Multiplication with respect to Vector Addition: a(u + v) = au + av
(M3) Distributivity of Scalar Multiplication with respect to Field Addition: (a + b)v = av + bv
(M4) Identity Element of Scalar Multiplication: 1v = v, where 1 is the multiplicative identity in the field
F.
These axioms ensure that vector spaces possess a rich algebraic structure. Axioms (A1)-(A4) state that (V, +)
forms an abelian group. Axioms (M1)-(M4) describe how scalar multiplication interacts with vector addition
and field operations. When the field F is R, we call V a real vector space. When F is C, we call V a complex
vector space.
Let’s explore some fundamental examples:
Example 1.1.1 (Euclidean Space Rn ). The most familiar example of a real vector space is Rn , the set of all
ordered n-tuples of real numbers. An element x in Rn is written as x = (x1 , x2 , . . . , xn ), where each xi is a real
number. Vector addition is defined component-wise: if x = (x1 , . . . , xn ) and y = (y1 , . . . , yn ), then
x + y = (x1 + y1 , . . . , xn + yn ).


Scalar multiplication by a real number a is also defined component-wise:


ax = (ax1 , . . . , axn ).
The zero vector is 0 = (0, . . . , 0), and the additive inverse of x is −x = (−x1 , . . . , −xn ). It is straightforward
to verify that Rn with these operations satisfies all the vector space axioms over the field R.
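As a quick numerical illustration of these component-wise operations (a minimal sketch in Python with NumPy, which is not part of these notes; the vectors are arbitrary):

    import numpy as np

    x = np.array([1.0, -2.0, 3.0])   # a vector in R^3
    y = np.array([0.5, 4.0, -1.0])

    print(x + y)       # component-wise addition: [1.5 2. 2.]
    print(2.0 * x)     # scalar multiplication:   [2. -4. 6.]
    print(x + (-x))    # additive inverse yields the zero vector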
Example 1.1.2 (Complex Space Cn ). Similarly, Cn , the set of all ordered n-tuples of complex numbers, forms
a vector space over the field C. An element z in Cn is z = (z1 , . . . , zn ), where each zi ∈ C. Addition and scalar
multiplication (by complex scalars) are defined component-wise, analogous to Rn :
z + w = (z1 + w1 , . . . , zn + wn )
az = (az1 , . . . , azn ), where a ∈ C.
Cn satisfies all the vector space axioms over the field C. It is worth noting that Cn can also be considered as a vector space over R, but its dimension will be different (2n over R versus n over C).
Example 1.1.3 (Space of Matrices Mm×n (F)). Let Mm×n (F) denote the set of all m × n matrices with entries
from the field F. We can define matrix addition and scalar multiplication in the usual way. If A = [aij ] and
B = [bij ] are two m × n matrices, their sum A + B is the m × n matrix C = [cij ] where cij = aij + bij . If a ∈ F
is a scalar, the scalar multiple aA is the m × n matrix D = [dij ] where dij = aaij . The zero vector is the m × n
zero matrix (all entries are 0), and the additive inverse of A is −A = [−aij ]. With these operations, Mm×n (F)
forms a vector space over F.
Example 1.1.4 (Space of Polynomials P (F)). Let P (F) be the set of all polynomials with coefficients from
the field F. A polynomial p(x) can be written as p(x) = a0 + a1 x + a2 x2 + · · · + an xn for some non-negative integer n and coefficients ai ∈ F. Addition of two polynomials p(x) = Σi ai xi and q(x) = Σi bi xi is defined as (p + q)(x) = Σi (ai + bi )xi . Scalar multiplication by c ∈ F is defined as (cp)(x) = Σi (cai )xi . The zero vector is the zero polynomial (all coefficients are 0). P (F) forms a vector space over F.
Example 1.1.5 (Space of Functions F(S, F)). Let S be any non-empty set and F be a field. Consider the set
F(S, F) of all functions f : S → F. We can define addition of two functions f, g ∈ F(S, F) as the function (f + g)
defined by (f + g)(s) = f (s) + g(s) for all s ∈ S. Scalar multiplication by a ∈ F is defined as the function (af )
given by (af )(s) = a(f (s)) for all s ∈ S. The zero vector is the function that maps every element of S to the
zero element in F. With these operations, F(S, F) forms a vector space over F. Many important vector spaces,
such as spaces of continuous functions or differentiable functions, are subsets of this general construction.
Understanding the abstract definition of a vector space and recognizing these common examples is crucial for
applying the powerful tools of linear algebra to various problems, including the analysis of linear systems.

1.2 Subspaces
Within a given vector space, we often encounter subsets that themselves possess the structure of a vector space
under the same operations. These special subsets are called subspaces. Understanding subspaces is fundamental
because they represent smaller, self-contained linear structures within a larger space.
Formally, let V be a vector space over a field F. A non-empty subset W of V (W ⊆ V, W ̸= ∅) is called a
subspace of V if W itself forms a vector space over F under the vector addition and scalar multiplication
operations defined on V .
While we could verify all ten vector space axioms for W , there is a much simpler and more efficient criterion.
A non-empty subset W of a vector space V is a subspace if and only if it is closed under vector addition and
scalar multiplication. This means:
1. Closure under Addition: For any two vectors w1 , w2 ∈ W , their sum w1 + w2 must also be in W .
2. Closure under Scalar Multiplication: For any scalar a ∈ F and any vector w ∈ W , the scalar multiple
aw must also be in W .
Theorem 1.2.1 (Subspace Criterion). Let V be a vector space over a field F, and let W be a non-empty subset
of V . Then W is a subspace of V if and only if for all w1 , w2 ∈ W and all scalars a ∈ F, the following two
conditions hold:
1. w1 + w2 ∈ W
2. aw1 ∈ W

Proof. (⇒) Necessity: If W is a subspace, it is, by definition, a vector space under the operations inherited from V . Therefore, vector addition and scalar multiplication must be closed operations within W , meaning (i) and (ii) must hold.

(⇐) Sufficiency: Assume W is a non-empty subset of V satisfying conditions (i) and (ii). We need to show W satisfies all
vector space axioms.
– Axioms (A1), (A2), (M1), (M2), (M3), (M4) involve only the properties of addition and scalar
multiplication themselves (associativity, commutativity, distributivity, identity). Since these hold for
all vectors in V , they automatically hold for all vectors in the subset W .
– We need to verify (A3) the existence of the zero vector in W and (A4) the existence of additive
inverses in W .
(A3) Zero Vector: Since W is non-empty, there exists at least one vector w ∈ W . By condition (ii), taking the
scalar a = 0 (the zero scalar in F), we have 0w ∈ W . We know from basic vector space properties
(which can be derived from the axioms, see Exercise) that 0w = 0 (the zero vector of V ). Thus,
0 ∈ W.
(A4) Additive Inverse: Let w ∈ W . By condition (ii), taking the scalar a = −1 (the additive inverse of the multiplicative
identity 1 in F), we have (−1)w ∈ W . We also know from basic vector space properties (see
Exercise) that (−1)w = −w (the additive inverse of w in V ). Thus, for every w ∈ W , its
additive inverse −w is also in W .
Since W is closed under addition and scalar multiplication, and contains the zero vector and additive in-
verses for all its elements, and inherits the necessary associative, commutative, and distributive properties
from V , W is itself a vector space over F. Therefore, W is a subspace of V .

Note: An alternative, equivalent criterion is that a non-empty subset W is a subspace if and only if for all
w1 , w2 ∈ W and all scalars a, b ∈ F, the linear combination aw1 + bw2 is also in W .
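The combined criterion aw1 + bw2 ∈ W can be checked numerically for a concrete candidate subspace. A minimal sketch (Python/NumPy; the plane x1 + x2 + x3 = 0 in R3 and the particular vectors and scalars are chosen only for illustration):

    import numpy as np

    def in_W(v, tol=1e-12):
        # membership test for W = {x in R^3 : x1 + x2 + x3 = 0}
        return abs(v.sum()) < tol

    w1 = np.array([1.0, -1.0, 0.0])   # both vectors lie in W
    w2 = np.array([2.0, 3.0, -5.0])
    a, b = -4.0, 2.5                  # arbitrary scalars

    print(in_W(w1), in_W(w2))         # True True
    print(in_W(a * w1 + b * w2))      # True: the linear combination stays in W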

Examples of Subspaces:
1. The Trivial Subspaces: For any vector space V , the set containing only the zero vector, {0}, is always
a subspace (the zero subspace). Also, V itself is always a subspace of V .
2. Lines and Planes through the Origin in R3 : Consider V = R3 . The set W1 = {(x, 0, 0) | x ∈ R}
(the x-axis) is a subspace. Let w1 = (x1 , 0, 0) and w2 = (x2 , 0, 0). Then w1 + w2 = (x1 + x2 , 0, 0) ∈ W1 ,
and aw1 = (ax1 , 0, 0) ∈ W1 . Similarly, the set W2 = {(x, y, 0) | x, y ∈ R} (the xy-plane) is a subspace. In
general, any line or plane passing through the origin in R3 forms a subspace.
3. Solutions to Homogeneous Linear Systems: Let A be an m × n matrix over F. The set W = {x ∈
Fn | Ax = 0} (the null space or kernel of A) is a subspace of Fn . To verify:
• W is non-empty since A0 = 0, so 0 ∈ W .
• If x1 , x2 ∈ W , then Ax1 = 0 and Ax2 = 0. Then A(x1 + x2 ) = Ax1 + Ax2 = 0 + 0 = 0, so
x1 + x2 ∈ W .
• If x ∈ W and a ∈ F, then Ax = 0. Then A(ax) = a(Ax) = a0 = 0, so ax ∈ W .
4. Continuous Functions: Let V = C[a, b] be the vector space of all real-valued continuous functions on
the interval [a, b]. The subset W = C 1 [a, b] of all continuously differentiable functions on [a, b] is a subspace
of V . The sum of two differentiable functions is differentiable, and a scalar multiple of a differentiable
function is differentiable.
5. Polynomials of Degree ≤ n: Let V = P (F) be the space of all polynomials. Let Pn (F) be the
subset of polynomials with degree less than or equal to n. If p(x) and q(x) have degrees ≤ n, then
deg(p + q) ≤ max(deg(p), deg(q)) ≤ n. Also, deg(ap) = deg(p) ≤ n (if a ̸= 0). The zero polynomial has
degree −∞ (by convention) and is in Pn (F). Thus, Pn (F) is a subspace of P (F).

Non-Examples:
1. A Line Not Through the Origin: The set W = {(x, 1) | x ∈ R} in R2 is not a subspace because it
does not contain the zero vector (0, 0). Also, it’s not closed under addition: (1, 1) ∈ W and (2, 1) ∈ W ,
but (1, 1) + (2, 1) = (3, 2) ∉ W .
2. The Union of Two Subspaces: Generally, the union of two subspaces is not a subspace. For example,
let W1 be the x-axis and W2 be the y-axis in R2 . Both are subspaces. However, W1 ∪ W2 is not a subspace
because (1, 0) ∈ W1 and (0, 1) ∈ W2 , but (1, 0) + (0, 1) = (1, 1) ∉ W1 ∪ W2 .

Subspaces inherit the algebraic structure of the parent space, making them fundamental building blocks in the
study of vector spaces and linear transformations.

1.3 Linear Independence, Span, Basis, and Dimension


Having defined vector spaces and subspaces, we now turn to concepts that allow us to describe the structure
and size of these spaces more precisely: linear independence, span, basis, and dimension.

Linear Combinations and Span


A fundamental operation in vector spaces is forming linear combinations. Given a set of vectors S = {v1 , v2 , . . . , vk }
in a vector space V over a field F, a linear combination of these vectors is any vector v of the form:

v = c1 v1 + c2 v2 + · · · + ck vk

where c1 , c2 , . . . , ck are scalars from the field F.

The set of all possible linear combinations of vectors in S is called the span of S, denoted as span(S) or lin(S):

span(S) = {c1 v1 + · · · + ck vk | ci ∈ F, vi ∈ S}

If S is an infinite set, the span consists of all linear combinations of any finite subset of S.
Theorem 1.3.1. The span of any subset S of a vector space V is always a subspace of V .

Proof. Let W = span(S). We need to show W is non-empty and closed under addition and scalar multiplication.

• Non-empty: If S is non-empty, let v ∈ S. Then 1v = v is a linear combination, so v ∈ W . If S is empty, span(∅) is defined as {0}, which is non-empty and is a subspace.

• Closure under Addition: Let w1 , w2 ∈ W . By definition, w1 = Σi ai vi and w2 = Σj bj uj for some scalars ai , bj and vectors vi , uj ∈ S. Their sum w1 + w2 = Σi ai vi + Σj bj uj is also a finite sum of scalar multiples of vectors from S, hence it is a linear combination of vectors in S. Thus, w1 + w2 ∈ W .

• Closure under Scalar Multiplication: Let w ∈ W and c ∈ F. Then w = Σi ai vi for some ai ∈ F, vi ∈ S. The scalar multiple cw = c(Σi ai vi ) = Σi (cai )vi . Since cai ∈ F, cw is also a linear combination of vectors in S. Thus, cw ∈ W .

Therefore, span(S) is a subspace of V .

If span(S) = V , we say that the set S spans or generates the vector space V .

Linear Independence
While a set S might span a vector space V , it might contain redundant vectors. The concept of linear indepen-
dence addresses this redundancy.

A set of vectors S = {v1 , v2 , . . . , vk } in a vector space V is said to be linearly independent if the only solution
to the vector equation:
c1 v1 + c2 v2 + · · · + ck vk = 0

is the trivial solution c1 = c2 = · · · = ck = 0.

If there exists a non-trivial solution (i.e., at least one ci is non-zero), then the set S is said to be linearly
dependent. An infinite set S is linearly independent if every finite subset of S is linearly independent.

Intuitively, a set is linearly dependent if at least one vector in the set can be expressed as a linear combination
of the others. For example, if c1 v1 + · · · + ck vk = 0 with c1 ̸= 0, then we can write:

v1 = (−c2 /c1 )v2 + · · · + (−ck /c1 )vk

showing that v1 is redundant in terms of spanning, as it already lies in the span of the other vectors {v2 , . . . , vk }.
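Numerically, a finite set {v1 , . . . , vk } in Fn is linearly independent exactly when the matrix having these vectors as columns has rank k. A minimal sketch (Python/NumPy; the example vectors are chosen so that v3 = v1 + v2):

    import numpy as np

    V = np.column_stack([[1, 0, 2],
                         [0, 1, 1],
                         [1, 1, 3]])   # columns v1, v2, v3 with v3 = v1 + v2

    k = V.shape[1]
    print(np.linalg.matrix_rank(V), k)  # 2 3: rank < k, so the set is linearly dependent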

Basis
A basis for a vector space V combines the concepts of spanning and linear independence. It is a minimal set of
vectors that spans the entire space.
A subset B of a vector space V is called a basis for V if it satisfies two conditions:
1. Linear Independence: B is a linearly independent set.
2. Spanning: B spans V , i.e., span(B) = V .
Equivalently, a basis is a maximal linearly independent set or a minimal spanning set.
Example 1.3.1. In Rn , the set of standard basis vectors {e1 , e2 , . . . , en }, where ei is the vector with a 1 in
the i-th position and 0s elsewhere, forms a basis. This is called the standard basis for Rn .
• Linear Independence: c1 e1 + · · · + cn en = (c1 , . . . , cn ) = 0 = (0, . . . , 0) implies c1 = · · · = cn = 0.
• Spanning: Any vector x = (x1 , . . . , xn ) ∈ Rn can be written as x = x1 e1 + · · · + xn en .
Example 1.3.2. In P2 (R), the space of polynomials of degree at most 2, the set {1, x, x2 } forms a basis.
• Linear Independence: c1 · 1 + c2 · x + c3 · x2 = 0 (the zero polynomial) for all x implies c1 = c2 = c3 = 0.
• Spanning: Any polynomial p(x) = a0 + a1 x + a2 x2 is clearly a linear combination of {1, x, x2 }.

Dimension
A crucial property of vector spaces is that although a space can have many different bases, all bases for a given
vector space contain the same number of vectors.
Theorem 1.3.2 (Invariance of Dimension). If a vector space V has a basis consisting of n vectors, then every
basis for V must consist of exactly n vectors.
(Proof omitted here, but relies on the Steinitz Exchange Lemma or similar arguments showing that a linearly
independent set cannot have more elements than a spanning set.)
The number of vectors in any basis for a vector space V is called the dimension of V , denoted as dim(V ).
• If V has a basis with a finite number of vectors n, V is called finite-dimensional, and dim(V ) = n.
• If V does not have a finite basis (e.g., P (F) or C[a, b]), it is called infinite-dimensional.
• By convention, the dimension of the zero vector space {0} is 0.

Examples of Dimension:
• dim(Rn ) = n
• dim(Cn ) = n (as a complex vector space)
• dim(Mm×n (F)) = m × n
• dim(Pn (F)) = n + 1

Properties related to Dimension:


• If W is a subspace of a finite-dimensional vector space V , then dim(W ) ≤ dim(V ). Furthermore, if
dim(W ) = dim(V ), then W = V .
• In an n-dimensional vector space V :
– Any linearly independent set of n vectors is automatically a basis.
– Any spanning set of n vectors is automatically a basis.
– Any set with more than n vectors is linearly dependent.
– Any set with fewer than n vectors cannot span V .
These concepts provide the tools to precisely describe the size and structure of vector spaces and their subspaces,
which is essential for understanding linear transformations and solving systems of linear equations.

1.4 Theorem: Existence and Uniqueness of Basis Representation


One of the most significant consequences of having a basis for a vector space is that it provides a unique way to
represent every vector in the space as a linear combination of the basis vectors. This representation essentially
establishes a coordinate system for the vector space.
Theorem 1.4.1 (Unique Representation Theorem). Let V be a vector space over a field F, and let B =
{b1 , b2 , . . . , bn } be a basis for V . Then for every vector v ∈ V , there exists a unique set of scalars c1 , c2 , . . . , cn ∈
F such that:
v = c1 b1 + c2 b2 + · · · + cn bn

Proof. The proof consists of two parts: existence and uniqueness.


1. Existence: Since B is a basis for V , by definition, it must span V (span(B) = V ). The definition of
spanning means that every vector v ∈ V can be expressed as a linear combination of the vectors in B.
Therefore, for any given v ∈ V , there must exist scalars c1 , c2 , . . . , cn ∈ F such that v = c1 b1 + c2 b2 +
· · · + cn bn . This establishes the existence part of the theorem.
2. Uniqueness: To prove uniqueness, we assume that a vector v ∈ V can be represented in two ways using
the basis vectors in B. Let these representations be:
v = c1 b1 + c2 b2 + · · · + cn bn
v = d1 b1 + d2 b2 + · · · + dn bn
where c1 , . . . , cn and d1 , . . . , dn are scalars from F. Our goal is to show that ci = di for all i = 1, . . . , n.
Subtracting the second equation from the first gives:
v − v = (c1 b1 + · · · + cn bn ) − (d1 b1 + · · · + dn bn )
0 = (c1 − d1 )b1 + (c2 − d2 )b2 + · · · + (cn − dn )bn
Now, recall the other defining property of a basis: the set B = {b1 , b2 , . . . , bn } is linearly independent.
By the definition of linear independence, the only way a linear combination of the basis vectors can equal
the zero vector is if all the scalar coefficients are zero. Therefore, we must have:
(c1 − d1 ) = 0
(c2 − d2 ) = 0
...
(cn − dn ) = 0
This implies that c1 = d1 , c2 = d2 , . . . , cn = dn . Thus, the two representations must be identical, proving
that the representation of v as a linear combination of the basis vectors is unique.

Coordinates
The unique scalars c1 , c2 , . . . , cn associated with the representation of a vector v relative to an ordered basis
B = {b1 , b2 , . . . , bn } are called the coordinates of v relative to the basis B. The vector formed by these
coordinates is called the coordinate vector of v relative to B, often denoted as [v]B :
 
[v]B = [c1 , c2 , . . . , cn ]T ∈ Fn

This theorem establishes a fundamental link between an abstract n-dimensional vector space V over F and
the familiar concrete vector space Fn . Once a basis B is chosen for V , every vector v in V corresponds to a
unique coordinate vector [v]B in Fn , and vice versa. This correspondence preserves the vector space operations,
meaning:
• [u + v]B = [u]B + [v]B
• [av]B = a[v]B
This type of structure-preserving mapping is called an isomorphism. Thus, any n-dimensional vector space over
F is isomorphic to Fn . This allows us to translate problems about abstract vector spaces into problems about
column vectors in Fn , which can often be solved using matrix algebra.
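When V = Fn and the basis vectors are collected as the columns of an invertible matrix, finding the coordinate vector [v]B amounts to solving one linear system. A minimal sketch (Python/NumPy; the basis of R2 below is only an example):

    import numpy as np

    B = np.array([[1.0, 1.0],      # columns are the basis vectors b1 = (1, 0), b2 = (1, 1)
                  [0.0, 1.0]])
    v = np.array([3.0, 2.0])

    c = np.linalg.solve(B, v)      # the unique coordinates [v]_B
    print(c)                       # [1. 2.], i.e. v = 1*b1 + 2*b2
    print(B @ c)                   # reconstructs v = (3, 2)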
Chapter 2

Linear Transformations and Matrices

Linear transformations are the fundamental mappings between vector spaces that preserve the underlying
linear structure. They are central to linear algebra and play a crucial role in understanding linear systems, as
they represent operations like rotation, scaling, projection, and, importantly, the evolution of linear dynamical
systems.

2.1 Linear Transformations: Definition, Properties, Kernel (Null Space),


Range (Image)
Definition
Let V and W be two vector spaces over the same field F. A function (or mapping) T : V → W is called a linear
transformation (or linear map, linear operator, or vector space homomorphism) if it satisfies the following
two properties for all vectors u, v ∈ V and all scalars c ∈ F:
1. Additivity: T (u + v) = T (u) + T (v)
2. Homogeneity: T (cv) = cT (v)
These two conditions can be combined into a single equivalent condition:

T (cu + dv) = cT (u) + dT (v) for all u, v ∈ V and all scalars c, d ∈ F.

This single condition states that a linear transformation preserves linear combinations. Intuitively, applying a
linear transformation to a linear combination of vectors yields the same result as applying the transformation
to each vector individually and then forming the same linear combination of the results.

Properties of Linear Transformations


Several important properties follow directly from the definition:
1. Mapping the Zero Vector: A linear transformation always maps the zero vector of the domain space
V to the zero vector of the codomain space W . That is, T (0V ) = 0W .
Proof. T (0V ) = T (0v) (where v is any vector in V ). Using homogeneity, T (0v) = 0T (v). Since the scalar
multiple of any vector by the zero scalar is the zero vector, 0T (v) = 0W . Thus, T (0V ) = 0W .
2. Mapping Additive Inverses: T (−v) = −T (v) for all v ∈ V .
Proof. T (−v) = T ((−1)v) = (−1)T (v) = −T (v).
3. Preservation of Subspaces: If U is a subspace of V , then its image T (U ) = {T (u) | u ∈ U } is a
subspace of W . If Z is a subspace of W , then its preimage T −1 (Z) = {v ∈ V | T (v) ∈ Z} is a subspace
of V .

Examples of Linear Transformations:


• Matrix Multiplication: Let A be an m × n matrix over F. The function T : Fn → Fm defined by
T (x) = Ax is a linear transformation. This follows from the distributive and associative properties of
matrix multiplication: A(cu + dv) = c(Au) + d(Av).


• Zero Transformation: The function T : V → W defined by T (v) = 0W for all v ∈ V is a linear


transformation.
• Identity Transformation: The function T : V → V defined by T (v) = v for all v ∈ V is a linear
transformation.
• Differentiation: Let V = P (R) (space of polynomials) and W = P (R). The differentiation operator D
defined by D(p(x)) = p′ (x) is a linear transformation because (cp + dq)′ = cp′ + dq ′ .
• Integration: Let V = C[a, b] (continuous functions on [a, b]) and W = R. The definite integral operator T defined by T (f ) = ∫ab f (x) dx is a linear transformation because ∫ab (cf (x) + dg(x)) dx = c ∫ab f (x) dx + d ∫ab g(x) dx.
• Rotation in R2 : The function T : R2 → R2 that rotates a vector counterclockwise by an angle θ is a linear transformation. It can be represented by matrix multiplication T (x) = Ax where A = [ cos θ  − sin θ ; sin θ  cos θ ].
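A quick numerical check of the rotation example against the combined linearity condition T (cu + dv) = cT (u) + dT (v) (a minimal Python/NumPy sketch; θ, the vectors, and the scalars are arbitrary):

    import numpy as np

    theta = np.pi / 6
    A = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])   # rotation by theta

    u = np.array([1.0, 2.0])
    v = np.array([-3.0, 0.5])
    c, d = 2.0, -1.5

    print(np.allclose(A @ (c * u + d * v), c * (A @ u) + d * (A @ v)))   # True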

Kernel (Null Space)


The kernel or null space of a linear transformation T : V → W , denoted as ker(T ) or N (T ), is the set of all
vectors in the domain V that are mapped to the zero vector in the codomain W :
ker(T ) = {v ∈ V | T (v) = 0W }
Theorem 2.1.1. The kernel of a linear transformation T : V → W is a subspace of V .

Proof. • Non-empty: We know T (0V ) = 0W , so 0V ∈ ker(T ).


• Closure under Addition: Let u, v ∈ ker(T ). Then T (u) = 0W and T (v) = 0W . Using linearity,
T (u + v) = T (u) + T (v) = 0W + 0W = 0W . Thus, u + v ∈ ker(T ).
• Closure under Scalar Multiplication: Let v ∈ ker(T ) and c ∈ F. Then T (v) = 0W . Using linearity,
T (cv) = cT (v) = c0W = 0W . Thus, cv ∈ ker(T ).
Therefore, ker(T ) is a subspace of V .

The dimension of the kernel is called the nullity of T, denoted nullity(T ) = dim(ker(T )). The kernel provides
information about whether the transformation is injective (one-to-one).
Theorem 2.1.2. A linear transformation T : V → W is injective if and only if ker(T ) = {0V }.

Proof. (⇒) Assume T is injective. We know 0V ∈ ker(T ) because T (0V ) = 0W . If v ∈ ker(T ), then T (v) = 0W .
Since T (0V ) = 0W and T is injective, we must have v = 0V . Thus, ker(T ) contains only the zero vector.
(⇐) Assume ker(T ) = {0V }. Suppose T (u) = T (v) for some u, v ∈ V . Then T (u) − T (v) = 0W . By linearity,
T (u − v) = 0W . This means u − v ∈ ker(T ). Since ker(T ) only contains the zero vector, we must have
u − v = 0V , which implies u = v. Therefore, T is injective.

Range (Image)
The range or image of a linear transformation T : V → W , denoted as range(T ), Im(T ), or R(T ), is the set
of all vectors in the codomain W that are the image of at least one vector in the domain V :
range(T ) = {w ∈ W | w = T (v) for some v ∈ V }
Theorem 2.1.3. The range of a linear transformation T : V → W is a subspace of W .

Proof. • Non-empty: Since T (0V ) = 0W , the zero vector 0W is in range(T ).


• Closure under Addition: Let w1 , w2 ∈ range(T ). Then there exist v1 , v2 ∈ V such that T (v1 ) = w1
and T (v2 ) = w2 . Using linearity, w1 + w2 = T (v1 ) + T (v2 ) = T (v1 + v2 ). Since v1 + v2 ∈ V , this shows
that w1 + w2 is the image of a vector in V , so w1 + w2 ∈ range(T ).
• Closure under Scalar Multiplication: Let w ∈ range(T ) and c ∈ F. Then there exists v ∈ V such
that T (v) = w. Using linearity, cw = cT (v) = T (cv). Since cv ∈ V , this shows that cw is the image of a
vector in V , so cw ∈ range(T ).
Therefore, range(T ) is a subspace of W .

The dimension of the range is called the rank of T, denoted rank(T ) = dim(range(T )). The range determines
if the transformation is surjective (onto). T is surjective if and only if range(T ) = W .
Understanding the kernel and range is crucial for analyzing the properties of linear transformations and solving
linear equations. The relationship between their dimensions is captured by the Rank-Nullity Theorem, which
we will discuss later.
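For the matrix transformation T (x) = Ax, both subspaces can be computed directly: rank(T ) is the rank of A, and a basis of ker(T ) can be read off from the SVD. A minimal sketch (Python/NumPy; the rank-1 matrix is only an example):

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])        # T : R^3 -> R^2, rank 1

    rank = np.linalg.matrix_rank(A)         # dim(range(T))
    _, s, Vt = np.linalg.svd(A)
    null_basis = Vt[rank:].T                # columns span ker(T)

    print(rank, A.shape[1] - rank)          # 1 2  (rank and nullity)
    print(np.allclose(A @ null_basis, 0))   # True: these vectors map to 0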

2.2 Matrix Representation of Linear Transformations


While linear transformations are abstract mappings between vector spaces, when dealing with finite-dimensional
vector spaces, we can represent them concretely using matrices. This representation allows us to perform
computations related to linear transformations using familiar matrix algebra.
The key idea is that a linear transformation T : V → W is completely determined by its action on a basis of
the domain V . If we know where the basis vectors of V are mapped, we can determine the image of any vector
in V .
Let V and W be finite-dimensional vector spaces over a field F. Let B = {b1 , b2 , . . . , bn } be an ordered basis
for V (dim(V ) = n), and let C = {c1 , c2 , . . . , cm } be an ordered basis for W (dim(W ) = m).
Consider a linear transformation T : V → W . For each basis vector bj ∈ B, its image T (bj ) is a vector in W .
Since C is a basis for W , T (bj ) can be uniquely expressed as a linear combination of the basis vectors in C:

T (bj ) = a1j c1 + a2j c2 + · · · + amj cm

The scalars a1j , a2j , . . . , amj are the coordinates of T (bj ) relative to the basis C. We can write this coordinate
vector as:  
[T (bj )]C = [a1j , a2j , . . . , amj ]T ∈ Fm
The matrix representation of T relative to the bases B and C, denoted as [T ]CB or A, is the m × n matrix
whose j-th column is the coordinate vector of T (bj ) relative to the basis C:

A = [T ]CB = [ [T (b1 )]C  [T (b2 )]C  · · ·  [T (bn )]C ]

  = [ a11  a12  · · ·  a1n
      a21  a22  · · ·  a2n
      ...
      am1  am2  · · ·  amn ]

The Action of the Matrix Representation


The significance of this matrix A = [T ]CB is that it allows us to compute the coordinates of the image T (v)
relative to basis C by simply multiplying the matrix A by the coordinates of v relative to basis B.
Theorem 2.2.1. Let T : V → W be a linear transformation, B = {b1 , . . . , bn } be a basis for V , and C =
{c1 , . . . , cm } be a basis for W . Let A = [T ]CB be the matrix representation of T relative to B and C. Then for
any vector v ∈ V :
[T (v)]C = A[v]B

Proof. Let v ∈ V . Since B is a basis for V , v can be uniquely written as:

v = x1 b1 + x2 b2 + · · · + xn bn
 
where [v]B = [x1 , x2 , . . . , xn ]T .
Apply the linear transformation T to v:

T (v) = T (x1 b1 + x2 b2 + · · · + xn bn )

Using the linearity of T :


T (v) = x1 T (b1 ) + x2 T (b2 ) + · · · + xn T (bn )
Now, consider the coordinate vector of T (v) relative to the basis C. The mapping from a vector to its coordinate
vector is itself a linear transformation (isomorphism). Therefore:

[T (v)]C = [x1 T (b1 ) + x2 T (b2 ) + · · · + xn T (bn )]C

[T (v)]C = x1 [T (b1 )]C + x2 [T (b2 )]C + · · · + xn [T (bn )]C


Recall that the j-th column of the matrix A = [T ]CB is precisely [T (bj )]C . The expression above is exactly the
definition of the matrix-vector product A[v]B :
 
A[v]B = [ [T (b1 )]C  [T (b2 )]C  · · ·  [T (bn )]C ] [x1 , x2 , . . . , xn ]T
       = x1 [T (b1 )]C + x2 [T (b2 )]C + · · · + xn [T (bn )]C


Comparing the results, we conclude that:
[T (v)]C = A[v]B

This theorem provides the bridge between the abstract linear transformation T and its concrete matrix repre-
sentation A. It shows that the action of T on a vector v corresponds to the multiplication of the matrix A by
the coordinate vector of v.
Example 2.2.1. Let T : R2 → R3 be defined by T (x, y) = (x + y, 2x − y, y). Let B = {b1 = (1, 0), b2 = (0, 1)}
be the standard basis for R2 . Let C = {c1 = (1, 0, 0), c2 = (0, 1, 0), c3 = (0, 0, 1)} be the standard basis for R3 .
We find the matrix representation A = [T ]CB . First, find the images of the basis vectors in B:

T (b1 ) = T (1, 0) = (1 + 0, 2(1) − 0, 0) = (1, 2, 0)

T (b2 ) = T (0, 1) = (0 + 1, 2(0) − 1, 1) = (1, −1, 1)


Next, find the coordinate vectors of these images relative to basis C (which is trivial since C is the standard
basis):

[T (b1 )]C = [(1, 2, 0)]C = [1, 2, 0]T
[T (b2 )]C = [(1, −1, 1)]C = [1, −1, 1]T

Construct the matrix A using these coordinate vectors as columns:

A = [T ]CB = [ 1   1
               2  −1
               0   1 ]

Now, let’s take an arbitrary vector v = (3, 4) in R2 . Its coordinate vector relative to B is [v]B = [3, 4]T . According to the theorem, [T (v)]C = A[v]B :

[T (v)]C = A[v]B = [1(3) + 1(4), 2(3) + (−1)(4), 0(3) + 1(4)]T = [7, 2, 4]T

This coordinate vector [7, 2, 4]T corresponds to the vector 7c1 + 2c2 + 4c3 = (7, 2, 4) in R3 . Let’s check by applying T directly to v = (3, 4):
T directly to v = (3, 4):
T (3, 4) = (3 + 4, 2(3) − 4, 4) = (7, 2, 4).
The results match, confirming the theorem.
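A minimal numerical check of Example 2.2.1 (Python/NumPy; since B and C are the standard bases, the coordinate vectors coincide with the vectors themselves):

    import numpy as np

    A = np.array([[1.0,  1.0],
                  [2.0, -1.0],
                  [0.0,  1.0]])     # the matrix [T]_CB found above

    def T(x, y):
        return np.array([x + y, 2 * x - y, y])

    v = np.array([3.0, 4.0])        # [v]_B in the standard basis
    print(A @ v)                    # [7. 2. 4.]
    print(T(3.0, 4.0))              # (7, 2, 4): agrees with A [v]_B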

Important Note:
The matrix representation of a linear transformation depends crucially on the choice of bases B and C. Changing
either basis will generally result in a different matrix representation for the same linear transformation. We will
explore the relationship between different matrix representations of the same transformation under a change of
basis later.

2.3 Matrix Operations (Addition, Scalar Multiplication, Multiplica-


tion)
Just as linear transformations can be added and scaled, their corresponding matrix representations can also be
combined using standard matrix operations. Furthermore, the composition of linear transformations corresponds
to the multiplication of their matrix representations.
Let V and W be vector spaces over the same field F. Let S : V → W and T : V → W be two linear
transformations.

Addition of Linear Transformations


The sum of S and T , denoted S + T , is a function from V to W defined by:
(S + T )(v) = S(v) + T (v) for all v ∈ V.
We can verify that S + T is also a linear transformation:
(S + T )(cu + dv) = S(cu + dv) + T (cu + dv)
= (cS(u) + dS(v)) + (cT (u) + dT (v)) (by linearity of S and T )
= c(S(u) + T (u)) + d(S(v) + T (v)) (rearranging terms)
= c(S + T )(u) + d(S + T )(v)

Scalar Multiplication of Linear Transformations


Let a ∈ F be a scalar. The scalar multiple of T by a, denoted aT , is a function from V to W defined by:
(aT )(v) = a(T (v)) for all v ∈ V.
We can verify that aT is also a linear transformation:
(aT )(cu + dv) = a(T (cu + dv))
= a(cT (u) + dT (v)) (by linearity of T )
= acT (u) + adT (v)
= c(aT (u)) + d(aT (v)) (rearranging scalars)
= c(aT )(u) + d(aT )(v)

With these operations, the set of all linear transformations from V to W , denoted L(V, W ) or Hom(V, W ), itself
forms a vector space over F.

Matrix Representation of Sums and Scalar Multiples


Now, let V and W be finite-dimensional with ordered bases B = {b1 , . . . , bn } and C = {c1 , . . . , cm }, respectively.
Let [S]CB and [T ]CB be the matrix representations of S and T relative to these bases.
• Sum: The matrix representation of the sum S + T is the sum of their individual matrix representations:
[S + T ]CB = [S]CB + [T ]CB
Proof Sketch: The j-th column of [S + T ]CB is [(S + T )(bj )]C = [S(bj ) + T (bj )]C . Since the coordinate
mapping is linear, this equals [S(bj )]C + [T (bj )]C , which is the sum of the j-th columns of [S]CB and [T ]CB .
• Scalar Multiple: The matrix representation of the scalar multiple aT is the scalar multiple of the matrix
representation of T :
[aT ]CB = a[T ]CB
Proof Sketch: The j-th column of [aT ]CB is [(aT )(bj )]C = [a(T (bj ))]C . By linearity of the coordinate
mapping, this equals a[T (bj )]C , which is a times the j-th column of [T ]CB .

These results show that the mapping from L(V, W ) to the space of m × n matrices Mm×n (F) defined by
T 7→ [T ]CB is itself a linear transformation (an isomorphism, in fact). This means that the vector space of
linear transformations L(V, W ) is isomorphic to the vector space of matrices Mm×n (F), and dim(L(V, W )) =
dim(Mm×n (F)) = mn.

Composition of Linear Transformations


Let U, V, and W be vector spaces over F. Let T : U → V and S : V → W be linear transformations. The
composition of S and T , denoted S ◦ T or ST , is a function from U to W defined by:
(S ◦ T )(u) = S(T (u)) for all u ∈ U.
We can verify that the composition S ◦ T is also a linear transformation:
(S ◦ T )(cu1 + du2 ) = S(T (cu1 + du2 ))
= S(cT (u1 ) + dT (u2 )) (by linearity of T )
= cS(T (u1 )) + dS(T (u2 )) (by linearity of S)
= c(S ◦ T )(u1 ) + d(S ◦ T )(u2 )

Matrix Representation of Composition


Now, let U, V, W be finite-dimensional with ordered bases A = {a1 , . . . , ap }, B = {b1 , . . . , bn }, and C =
{c1 , . . . , cm }, respectively. Let [T ]BA be the n × p matrix representation of T : U → V relative to bases A and B. Let [S]CB be the m × n matrix representation of S : V → W relative to bases B and C.
The matrix representation of the composition S ◦ T : U → W relative to bases A and C is the product of the individual matrix representations:

[S ◦ T ]CA = [S]CB [T ]BA

Proof. Let u ∈ U . We know:


[T (u)]B = [T ]BA [u]A
Let v = T (u) ∈ V . Then:
[S(v)]C = [S]CB [v]B
Substituting v = T (u) and [v]B = [T (u)]B into the second equation:
[S(T (u))]C = [S]CB [T (u)]B
Now substitute the expression for [T (u)]B :
[(S ◦ T )(u)]C = [S]CB ([T ]BA [u]A )

Using the associativity of matrix multiplication:


[(S ◦ T )(u)]C = ([S]CB [T ]BA )[u]A

This equation holds for all u ∈ U . By the definition of the matrix representation of S ◦ T relative to bases A
and C, the matrix that multiplies [u]A to give [(S ◦ T )(u)]C is precisely [S ◦ T ]CA . Therefore:
[S ◦ T ]CA = [S]CB [T ]BA

This fundamental result connects the abstract operation of function composition with the concrete operation
of matrix multiplication. It underscores why matrix multiplication is defined the way it is – it mirrors the
composition of the underlying linear transformations.
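For matrix transformations with the standard bases, this correspondence is easy to confirm numerically: applying T and then S gives the same result as multiplying by the single matrix [S][T ]. A minimal sketch (Python/NumPy; the random matrices are only examples):

    import numpy as np

    rng = np.random.default_rng(0)
    T_mat = rng.normal(size=(3, 2))      # T : R^2 -> R^3
    S_mat = rng.normal(size=(4, 3))      # S : R^3 -> R^4
    u = rng.normal(size=2)

    composed = S_mat @ (T_mat @ u)       # apply T first, then S
    via_product = (S_mat @ T_mat) @ u    # single matrix for S ∘ T
    print(np.allclose(composed, via_product))   # True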

2.4 Change of Basis Matrices


We established that the matrix representation of a linear transformation T : V → W depends on the chosen
bases B for V and C for W . A natural question arises: how does the matrix representation change if we choose
different bases for V and/or W ?
Understanding how to change basis is crucial because it allows us to choose a basis that might simplify the
matrix representation of a transformation (e.g., making it diagonal) or simplify the coordinate representation
of vectors.

Change of Basis in the Domain (V)


Let V be an n-dimensional vector space with two ordered bases:

B = {b1 , . . . , bn }

B ′ = {b′1 , . . . , b′n }
Any vector v ∈ V has unique coordinate representations relative to each basis:
 
[v]B = [x1 , . . . , xn ]T such that v = x1 b1 + · · · + xn bn

[v]B′ = [x′1 , . . . , x′n ]T such that v = x′1 b′1 + · · · + x′n b′n
We want to find a matrix that relates [v]B and [v]B′ .
Consider the identity transformation Id : V → V , defined by Id(v) = v. Let’s find its matrix representation
relative to the bases B ′ (for the domain) and B (for the codomain). This matrix is called the change-of-
coordinates matrix (or transition matrix) from B ′ to B, denoted PB←B′ .

PB←B′ = [Id]BB′

The j-th column of this matrix is the coordinate vector of Id(b′j ) = b′j relative to the basis B:

PB←B′ = [ [b′1 ]B  [b′2 ]B  · · ·  [b′n ]B ]

Using the theorem about matrix representations, we have:

[Id(v)]B = [Id]BB′ [v]B′

Since Id(v) = v, we get the relationship:


[v]B = PB←B′ [v]B′
This equation allows us to convert coordinates from the B ′ basis to the B basis.
Similarly, we can find the change-of-coordinates matrix from B to B ′ , denoted PB′ ←B :
PB′ ←B = [Id]B′B = [ [b1 ]B′  [b2 ]B′  · · ·  [bn ]B′ ]

This matrix converts coordinates from B to B ′ :

[v]B′ = PB′ ←B [v]B

Substituting this into the previous equation:

[v]B = PB←B′ (PB′ ←B [v]B ) = (PB←B′ PB′ ←B )[v]B

Since this holds for all [v]B , the matrix product must be the identity matrix:

PB←B′ PB′ ←B = In

This shows that the change-of-coordinates matrices are inverses of each other:

(PB←B′ )−1 = PB′ ←B
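A minimal sketch of these relations in R2 (Python/NumPy; each basis is stored as the columns of a matrix, and the bases themselves are arbitrary examples). The j-th column of PB←B′ is [b′j ]B , obtained by solving B c = b′j :

    import numpy as np

    B  = np.array([[1.0, 0.0],      # columns b1, b2 (here the standard basis)
                   [0.0, 1.0]])
    Bp = np.array([[1.0, 1.0],      # columns b1', b2'
                   [1.0, 2.0]])

    P = np.linalg.solve(B, Bp)      # P_{B<-B'}: column j solves B c = b'_j

    v_Bp = np.array([2.0, -1.0])    # [v]_B' for some v
    print(P @ v_Bp)                 # [v]_B = P_{B<-B'} [v]_B' = [1. 0.]

    # the two change-of-coordinates matrices are inverses of each other
    print(np.allclose(np.linalg.inv(P), np.linalg.solve(Bp, B)))   # True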

Change of Basis for Linear Transformations


Now consider a linear transformation T : V → W . Let B, B′ be bases for V and C, C′ be bases for W . Let A = [T ]CB be the matrix representation relative to B and C. Let A′ = [T ]C′B′ be the matrix representation relative to B′ and C′. We want to find the relationship between A and A′.

We have the following relationships:


[T (v)]C = A[v]B

[T (v)]C ′ = A′ [v]B′
Let P = PB←B′ be the change-of-coordinates matrix from B ′ to B in V ([v]B = P [v]B′ ). Let Q = PC←C ′ be
the change-of-coordinates matrix from C ′ to C in W ([w]C = Q[w]C ′ ). Then Q−1 = PC ′ ←C converts coordinates
from C to C ′ ([w]C ′ = Q−1 [w]C ).
Start with [T (v)]C ′ = A′ [v]B′ . We want to relate this to A. Apply the change of basis in W :

[T (v)]C ′ = Q−1 [T (v)]C

Now use the relationship involving A:


[T (v)]C ′ = Q−1 (A[v]B )
Now apply the change of basis in V :
[T (v)]C ′ = Q−1 (A(P [v]B′ ))
Using associativity:
[T (v)]C ′ = (Q−1 AP )[v]B′
Comparing this with [T (v)]C ′ = A′ [v]B′ , we see that:

A′ = Q−1 AP

where:
• A = [T ]CB

• A′ = [T ]C′B′

• P = PB←B′ (Change of basis B ′ to B in V )


• Q = PC←C ′ (Change of basis C ′ to C in W )
This formula shows how the matrix representation transforms when the bases for both the domain and codomain
are changed.

Special Case: T: V → V (Linear Operators)


A common scenario is when the transformation maps a space to itself, T : V → V . In this case, we usually use the same basis for both the domain and codomain. Let B and B′ be two bases for V . Let A = [T ]BB be the matrix relative to basis B. Let A′ = [T ]B′B′ be the matrix relative to basis B′.

In the general formula A′ = Q−1 AP , we now have W = V , C = B, and C ′ = B ′ . So, P = PB←B′ (change of
basis B ′ to B). And Q = PB←B′ (change of basis B ′ to B), which is the same matrix P . Therefore, Q−1 = P −1 .
The formula becomes:
A′ = P −1 AP

where A = [T ]BB , A′ = [T ]B′B′ , and P = PB←B′ is the change-of-coordinates matrix from B′ to B.

Two matrices A and A′ related by A′ = P −1 AP for some invertible matrix P are called similar matrices.
Similar matrices represent the same linear operator T : V → V but relative to different bases. This concept is
fundamental in understanding eigenvalues, eigenvectors, and diagonalization.
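A minimal sketch of the relation A′ = P −1 AP (Python/NumPy; the operator and the new basis are arbitrary examples). Since A and A′ represent the same operator, basis-independent quantities such as the trace and the eigenvalues must agree:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [0.0, 3.0]])          # [T]_BB in the original basis B
    P = np.array([[1.0, 1.0],
                  [1.0, 2.0]])          # columns of P are the new basis vectors expressed in B

    A_prime = np.linalg.inv(P) @ A @ P  # [T]_B'B' in the new basis

    print(np.allclose(np.trace(A), np.trace(A_prime)))       # True
    print(np.allclose(np.sort(np.linalg.eigvals(A)),
                      np.sort(np.linalg.eigvals(A_prime))))  # True: eigenvalues {2, 3}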

2.5 Matrix Transpose, Inverse, Determinant: Properties and Com-


putation
Beyond the basic operations of addition, scalar multiplication, and matrix multiplication (corresponding to
transformation composition), several other matrix operations and concepts are crucial in linear algebra and its
applications, particularly the transpose, inverse, and determinant.

Matrix Transpose
The transpose of an m × n matrix A, denoted as AT (or sometimes A′ ), is the n × m matrix obtained by
interchanging the rows and columns of A. If A = [aij ], then AT = [aji ], where the entry in the i-th row and j-th
column of AT is the entry from the j-th row and i-th column of A.
Properties of Transpose: Let A and B be matrices of appropriate sizes and c be a scalar.

1. (AT )T = A (The transpose of the transpose is the original matrix)


2. (A + B)T = AT + B T (The transpose of a sum is the sum of the transposes)
3. (cA)T = cAT (The transpose of a scalar multiple is the scalar multiple of the transpose)
4. (AB)T = B T AT (The transpose of a product is the product of the transposes in reverse order )

Proof of (4). Let A be m × n and B be n × p. Then AB is m × p, and (AB)T is p × m. B T is p × n and AT is n × m, so B T AT is p × m. Let C = AB, so cij = Σk aik bkj . The entry in the i-th row and j-th column of (AB)T is cji = Σk ajk bki . Let D = B T AT . The entry in the i-th row and j-th column of D is given by the dot product of the i-th row of B T and the j-th column of AT . The i-th row of B T is the i-th column of B, which has elements bki for k = 1..n. The j-th column of AT is the j-th row of A, which has elements ajk for k = 1..n. So, dij = Σk (B T )ik (AT )kj = Σk bki ajk = Σk ajk bki . Comparing cji and dij , we see they are equal. Thus, (AB)T = B T AT .

A square matrix A is called symmetric if AT = A, and skew-symmetric if AT = −A.

Matrix Inverse
For square matrices (n × n), we can define the concept of an inverse, analogous to the reciprocal of a non-zero
number. An n × n matrix A is called invertible (or non-singular) if there exists an n × n matrix B such that:

AB = BA = In

where In is the n × n identity matrix. If such a matrix B exists, it is unique and is called the inverse of A,
denoted as A−1 . If no such matrix B exists, A is called non-invertible (or singular).
Properties of Inverse: Let A and B be invertible n × n matrices and c be a non-zero scalar.
1. (A−1 )−1 = A
2. (AB)−1 = B −1 A−1 (The inverse of a product is the product of the inverses in reverse order )
3. (cA)−1 = (1/c)A−1
4. (AT )−1 = (A−1 )T (The inverse of the transpose is the transpose of the inverse)

Proof of (2). We need to show that (B −1 A−1 )(AB) = I and (AB)(B −1 A−1 ) = I.

(B −1 A−1 )(AB) = B −1 (A−1 A)B = B −1 (I)B = B −1 B = I.

(AB)(B −1 A−1 ) = A(BB −1 )A−1 = A(I)A−1 = AA−1 = I.


Thus, the inverse of AB is indeed B −1 A−1 .

Conditions for Invertibility: For an n × n matrix A, the following conditions are equivalent (if one is true,
all are true):
• A is invertible.
• The equation Ax = 0 has only the trivial solution x = 0.
• The columns of A are linearly independent.
• The columns of A span Fn .
• The columns of A form a basis for Fn .
• The linear transformation T (x) = Ax is injective (one-to-one).
• The linear transformation T (x) = Ax is surjective (onto).
• The rank of A is n (rank(A) = n).
• The nullity of A is 0 (nullity(A) = 0).
• The determinant of A is non-zero (det(A) ̸= 0).
• A is row equivalent to the identity matrix In .

Computation of Inverse: One common method to compute the inverse A−1 is using Gaussian elimination
(row reduction). We form the augmented matrix [A | I] and perform elementary row operations to transform
A into the identity matrix I. The same sequence of operations applied to I will transform it into A−1 :

[A | I] → [I | A−1 ]

If A cannot be reduced to I, then A is not invertible.
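A minimal sketch of the [A | I] → [I | A−1 ] procedure (Python/NumPy, with partial pivoting; this is for illustration only, and in practice np.linalg.inv or a solver would be used):

    import numpy as np

    def invert_gauss_jordan(A):
        """Row-reduce the augmented matrix [A | I] to [I | A^-1]."""
        n = A.shape[0]
        M = np.hstack([A.astype(float), np.eye(n)])
        for col in range(n):
            pivot = col + np.argmax(np.abs(M[col:, col]))   # partial pivoting
            if np.isclose(M[pivot, col], 0.0):
                raise ValueError("matrix is singular")
            M[[col, pivot]] = M[[pivot, col]]               # row swap
            M[col] /= M[col, col]                           # scale the pivot row to 1
            for r in range(n):
                if r != col:
                    M[r] -= M[r, col] * M[col]              # clear the rest of the column
        return M[:, n:]

    A = np.array([[2.0, 1.0],
                  [5.0, 3.0]])
    print(invert_gauss_jordan(A))    # [[ 3. -1.] [-5.  2.]]
    print(np.linalg.inv(A))          # agrees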

Determinant
The determinant is a scalar value associated with every square matrix A, denoted as det(A) or |A|. It provides
crucial information about the matrix and the linear transformation it represents.
 
For a 2 × 2 matrix A = [ a  b ; c  d ], det(A) = ad − bc.

For a 3 × 3 matrix A = [ a  b  c ; d  e  f ; g  h  i ],

det(A) = a(ei − f h) − b(di − f g) + c(dh − eg).

In general, the determinant can be defined recursively using cofactor expansion. Let Aij be the submatrix
obtained by deleting the i-th row and j-th column of A. The cofactor Cij is defined as Cij = (−1)i+j det(Aij ).
• Expansion across row i: det(A) = Σj aij Cij

• Expansion down column j: det(A) = Σi aij Cij
Properties of Determinant: Let A and B be n × n matrices.


1. det(I) = 1
2. det(AT ) = det(A)
3. det(AB) = det(A) det(B) (Multiplicative property)
4. det(A−1 ) = 1/ det(A) (if A is invertible)
5. det(cA) = cn det(A) (where A is n × n)
6. If A has a row or column of zeros, det(A) = 0.
7. If A has two identical rows or columns, det(A) = 0.
8. If B is obtained from A by swapping two rows or columns, det(B) = − det(A).
9. If B is obtained from A by multiplying a single row or column by a scalar c, det(B) = c det(A).
10. If B is obtained from A by adding a multiple of one row (or column) to another row (or column),
det(B) = det(A). (This is key for computation using row reduction).
11. The determinant of a triangular matrix (upper or lower) is the product of its diagonal entries.
Computation of Determinant: While cofactor expansion works, it is computationally expensive for large
matrices (O(n!)). A more efficient method uses row reduction:
1. Use elementary row operations (row swaps, adding a multiple of one row to another) to transform A into
an upper triangular matrix U . Keep track of the number of row swaps (k) and any row multiplications
by scalars.
2. det(A) = (−1)k ×(product of diagonal entries of U )/(product of scalars used for row multiplication, if any).
(Usually, row multiplication is avoided, only row swaps and adding multiples are used, simplifying the
calculation to det(A) = (−1)k det(U )).
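A minimal sketch of this row-reduction method, assuming Python/NumPy purely for illustration, is shown below; only row swaps and additions of row multiples are used, so det(A) = (−1)^k det(U).

import numpy as np

def det_by_row_reduction(A, tol=1e-12):
    """det(A) = (-1)^k * (product of the diagonal of U), using only row swaps
    and additions of multiples of one row to another."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    swaps = 0
    for col in range(n):
        pivot = col + np.argmax(np.abs(U[col:, col]))
        if abs(U[pivot, col]) < tol:
            return 0.0                        # no usable pivot: det(A) = 0
        if pivot != col:
            U[[col, pivot]] = U[[pivot, col]] # a row swap flips the sign
            swaps += 1
        for r in range(col + 1, n):
            U[r] -= (U[r, col] / U[col, col]) * U[col]   # does not change the determinant
    return (-1) ** swaps * np.prod(np.diag(U))

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])
print(det_by_row_reduction(A))   # -3.0 (up to rounding)
print(np.linalg.det(A))          # comparison with the library routine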
Geometric Interpretation: In R2 , | det(A)| represents the area of the parallelogram formed by the columns
(or rows) of A. In R3 , | det(A)| represents the volume of the parallelepiped formed by the columns (or rows) of
A. In general, | det(A)| represents the factor by which the transformation T (x) = Ax scales volumes. The sign
of det(A) indicates whether the transformation preserves or reverses orientation.
Determinant and Invertibility: As mentioned earlier, a square matrix A is invertible if and only if det(A) ̸=
0. This is one of the most important applications of the determinant.

2.6 Theorem: Rank-Nullity Theorem (with Proof)


The Rank-Nullity Theorem (also known as the Dimension Theorem for linear maps) establishes a fundamental
relationship between the dimensions of the domain, the kernel (null space), and the range (image) of a linear
transformation.
Theorem 2.6.1 (Rank-Nullity Theorem). Let V and W be vector spaces over a field F, and let V be finite-
dimensional. Let T : V → W be a linear transformation. Then:

dim(V ) = dim(ker(T )) + dim(range(T ))

Or, using the terms rank and nullity:

dim(V ) = nullity(T ) + rank(T )

Proof. Let dim(V ) = n. Since ker(T ) is a subspace of V , it is also finite-dimensional. Let dim(ker(T )) = k,
where 0 ≤ k ≤ n.
1. Basis for the Kernel: Choose a basis for ker(T ). Let this basis be Bker = {u1 , u2 , . . . , uk }. Since
this set is linearly independent and consists of k vectors, k = dim(ker(T )) = nullity(T ). If k = n, then
ker(T ) = V , which means T (v) = 0 for all v ∈ V . In this case, range(T ) = {0}, so dim(range(T )) = 0.
The theorem holds: n = n + 0. If k = 0, then ker(T ) = {0}. The basis Bker is the empty set.
2. Extend to a Basis for V: Assume 0 ≤ k < n. Since Bker is a linearly independent set in V , we
can extend it to form a basis for the entire space V (by the Basis Extension Theorem). Let B =
{u1 , . . . , uk , v1 , . . . , vn−k } be such a basis for V . The total number of vectors in this basis is k + (n − k) =
n = dim(V ).
3. Show {T (v1 ), . . . , T (vn−k )} is a Basis for range(T ): We need to show that the set Brange = {T (v1 ), T (v2 ), . . . , T (vn−k )}
forms a basis for the range of T . This requires proving two things: that Brange spans range(T ) and that
it is linearly independent.
• Spanning: Let w be any vector in range(T ). By definition, w = T (x) for some x ∈ V . Since B is
a basis for V , we can write x as a linear combination of the basis vectors:

x = (c1 u1 + · · · + ck uk ) + (d1 v1 + · · · + dn−k vn−k )

Apply T to x using linearity:

w = T (x) = T (c1 u1 + · · · + ck uk ) + T (d1 v1 + · · · + dn−k vn−k )

w = [c1 T (u1 ) + · · · + ck T (uk )] + [d1 T (v1 ) + · · · + dn−k T (vn−k )]


Since u1 , . . . , uk are in ker(T ), we have T (u1 ) = · · · = T (uk ) = 0. Therefore:

w = d1 T (v1 ) + d2 T (v2 ) + · · · + dn−k T (vn−k )

This shows that any vector w in range(T ) can be written as a linear combination of the vectors in
Brange . Thus, Brange spans range(T ).
• Linear Independence: Suppose there is a linear combination of vectors in Brange that equals the
zero vector:
d1 T (v1 ) + d2 T (v2 ) + · · · + dn−k T (vn−k ) = 0W
Using linearity, this can be written as:

T (d1 v1 + d2 v2 + · · · + dn−k vn−k ) = 0W

This implies that the vector z = d1 v1 + · · · + dn−k vn−k is in the kernel of T (ker(T )). Since
Bker = {u1 , . . . , uk } is a basis for ker(T ), z must be expressible as a linear combination of these
vectors:
z = c1 u1 + c2 u2 + · · · + ck uk
Substituting the expression for z:

d1 v1 + · · · + dn−k vn−k = c1 u1 + · · · + ck uk

Rearranging the terms to form a linear combination of the basis vectors of V :

(−c1 )u1 + · · · + (−ck )uk + d1 v1 + · · · + dn−k vn−k = 0V



Since B = {u1 , . . . , uk , v1 , . . . , vn−k } is a basis for V , it is a linearly independent set. Therefore, the
only way this linear combination can equal the zero vector is if all the scalar coefficients are zero:

−c1 = 0, . . . , −ck = 0, and d1 = 0, . . . , dn−k = 0.

In particular, we have shown that d1 = d2 = · · · = dn−k = 0. This is precisely the condition needed
to show that the set Brange = {T (v1 ), . . . , T (vn−k )} is linearly independent.
4. Conclusion: Since Brange spans range(T ) and is linearly independent, it forms a basis for range(T ). The
number of vectors in this basis is n − k. Therefore:

dim(range(T )) = rank(T ) = n − k.

We already established that dim(ker(T )) = nullity(T ) = k and dim(V ) = n. Substituting these into the
equation rank(T ) = n − k gives:
rank(T ) = dim(V ) − nullity(T )
Rearranging, we get the Rank-Nullity Theorem:

dim(V ) = nullity(T ) + rank(T )

Application to Matrices:
If T : Fn → Fm is a linear transformation represented by the matrix multiplication T (x) = Ax, where A is an
m × n matrix, then:
• ker(T ) is the null space of A, N (A) = {x ∈ Fn | Ax = 0}. nullity(T ) = dim(N (A)).
• range(T ) is the column space of A, Col(A) = span{columns of A}. rank(T ) = dim(Col(A)) = rank(A).
The Rank-Nullity Theorem for matrices states:

n = dim(N (A)) + rank(A)

where n is the number of columns of A.
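A quick numerical check of this identity, assuming Python with NumPy and SciPy (scipy.linalg.null_space returns an orthonormal basis of N(A)), might look as follows.

import numpy as np
from scipy.linalg import null_space

# A is 3 x 4, so T(x) = Ax maps F^4 -> F^3 and n = 4 (the number of columns).
A = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],    # a multiple of the first row, so the rank drops
              [0.0, 1.0, 1.0, 1.0]])

rank = np.linalg.matrix_rank(A)          # dim Col(A)
nullity = null_space(A).shape[1]         # number of basis vectors of N(A)
print(rank, nullity, A.shape[1])         # 2, 2, 4
assert rank + nullity == A.shape[1]      # Rank-Nullity: rank(A) + nullity(A) = n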


This theorem provides a powerful check on calculations involving rank and nullity. It also implies relationships
between the injectivity and surjectivity of linear transformations between spaces of the same dimension. For
instance, if T : V → V and dim(V ) is finite, then T is injective (nullity=0) if and only if T is surjective
(rank=dim(V )).
Chapter 3

Eigenvalues, Eigenvectors, and Diagonalization

3.1 Eigenvalues and Eigenvectors: Definition and Characteristic Polynomial

In the study of linear transformations and matrices, certain vectors possess a remarkable property: when acted
upon by the transformation, they are simply scaled, meaning their direction remains unchanged (or is reversed
if the scaling factor is negative). These special vectors and their corresponding scaling factors are fundamental
concepts in linear algebra, revealing deep insights into the structure and behavior of the transformation itself.
They are known as eigenvectors and eigenvalues, respectively.

Definition of Eigenvalues and Eigenvectors


Let V be a vector space over a field F (typically R or C), and let T : V → V be a linear transformation. A
non-zero vector v in V is called an eigenvector of T if there exists a scalar λ in F such that

T (v) = λv

The scalar λ is called the eigenvalue corresponding to the eigenvector v. The term "eigen" comes from German,
meaning "proper" or "characteristic". Thus, eigenvectors are characteristic vectors of the transformation, and
eigenvalues are the characteristic values associated with these directions.
Similarly, for a square matrix A of size n × n representing a linear transformation from Fn to Fn relative to
some basis, a non-zero vector x in Fn is called an eigenvector of A if there exists a scalar λ in F such that

Ax = λx

The scalar λ is called the eigenvalue corresponding to the eigenvector x.


It is crucial to emphasize that an eigenvector must be non-zero. If v = 0, the equation T (0) = λ0 holds for any
scalar λ, which provides no useful information about the transformation. The zero vector is therefore explicitly
excluded from the definition of an eigenvector.
An eigenvalue λ can be zero. If λ = 0 is an eigenvalue, then the corresponding eigenvector v satisfies T (v) =
0v = 0. This means that any non-zero vector in the kernel (or null space) of the transformation T is an
eigenvector corresponding to the eigenvalue λ = 0.

Geometric Interpretation
Geometrically, the equation Ax = λx signifies that the action of the matrix A on the vector x results in a
vector Ax that is parallel to the original vector x. The eigenvalue λ represents the factor by which the vector x
is stretched or shrunk (or reversed if λ < 0) along the direction defined by x. Eigenvectors define the invariant
directions under the linear transformation represented by A.
For example, consider a rotation matrix in R2 . Unless the rotation is by 0 or π radians, no non-zero vector will
map to a scalar multiple of itself. Thus, such a rotation matrix (over R) has no real eigenvectors. However, if we


consider the transformation over C, eigenvalues and eigenvectors might exist. A projection matrix, on the other
hand, will typically have eigenvectors. Vectors lying in the subspace being projected onto are eigenvectors with
eigenvalue 1, while vectors in the subspace being projected out (the kernel) are eigenvectors with eigenvalue 0.

Finding Eigenvalues: The Characteristic Polynomial


To find the eigenvalues and eigenvectors of a matrix A, we start from the defining equation:

Ax = λx

Since we are looking for non-zero vectors x, we can rewrite this equation. Let I be the identity matrix of the
same size as A. Then λx can be written as λIx. Substituting this into the equation gives:

Ax = λIx

Rearranging the terms to one side, we get:


Ax − λIx = 0

(A − λI)x = 0
This equation represents a homogeneous system of linear equations. We are seeking non-trivial solutions (x ̸= 0)
for this system. A fundamental result from linear algebra states that a homogeneous system (M x = 0) has
non-trivial solutions if and only if the matrix M is singular, which means its determinant is zero.
In our case, the matrix is M = (A − λI). Therefore, for non-zero eigenvectors x to exist, the matrix (A − λI)
must be singular. This condition translates to:

det(A − λI) = 0

This equation is called the characteristic equation of the matrix A. The expression det(A − λI), when ex-
panded, results in a polynomial in the variable λ. This polynomial is known as the characteristic polynomial
of A, often denoted as p(λ) or charA (λ).
p(λ) = det(A − λI)
The roots of the characteristic polynomial p(λ) = 0 are precisely the eigenvalues of the matrix A.
Let A be an n × n matrix. The matrix (A − λI) is:
 
A − λI =
[ a11 − λ   a12       ···   a1n     ]
[ a21       a22 − λ   ···   a2n     ]
[ ...       ...       ···   ...     ]
[ an1       an2       ···   ann − λ ]

The determinant of this matrix, det(A − λI), can be computed using standard determinant expansion methods
(like cofactor expansion). The resulting characteristic polynomial p(λ) will have a degree of n.

p(λ) = det(A − λI) = cn λn + cn−1 λn−1 + · · · + c1 λ + c0

where the coefficients ci depend on the entries of A. Specifically, the leading coefficient cn is (−1)n , and the
constant term c0 is det(A) (obtained by setting λ = 0). Another important coefficient is cn−1 , which is equal
to (−1)n−1 times the trace of A (the sum of the diagonal elements).
By the Fundamental Theorem of Algebra, a polynomial of degree n with complex coefficients has exactly n
complex roots, counting multiplicities. Therefore, an n × n matrix A always has n eigenvalues in the complex
numbers (counting multiplicities). These eigenvalues might be real or complex, and they might not be distinct.
Once the eigenvalues (λ1 , λ2 , . . . , λn ) are found by solving the characteristic equation det(A − λI) = 0, the
corresponding eigenvectors for each eigenvalue λi can be found by solving the homogeneous system of linear
equations:
(A − λi I)x = 0
The non-zero solutions x to this system are the eigenvectors associated with the eigenvalue λi . The set of all
solutions (including the zero vector) forms a subspace called the eigenspace corresponding to λi , denoted as
Eλi or Nul(A − λi I). Any non-zero vector in this eigenspace is an eigenvector for λi .
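The whole procedure can be checked numerically. The sketch below (Python/NumPy, assumed only for illustration) forms the characteristic polynomial, finds its roots, and confirms that each computed eigenvector satisfies (A − λI)v = 0.

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Coefficients of the monic characteristic polynomial det(lambda*I - A),
# which has the same roots as det(A - lambda*I) = 0.
coeffs = np.poly(A)                     # [ 1., -4.,  3.]  ->  lambda^2 - 4*lambda + 3
print(coeffs, np.roots(coeffs))         # the roots 3 and 1 are the eigenvalues

# np.linalg.eig returns eigenvalues and unit-length eigenvectors directly.
vals, vecs = np.linalg.eig(A)
for lam, v in zip(vals, vecs.T):        # the columns of vecs are the eigenvectors
    residual = (A - lam * np.eye(2)) @ v
    print(lam, v, np.allclose(residual, 0))   # each v satisfies (A - lam*I)v = 0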

3.2 Finding Eigenvalues and Eigenvectors


Section 3.1 established the theoretical foundation for eigenvalues and eigenvectors, defining them and deriving
the characteristic equation det(A − λI) = 0, whose roots are the eigenvalues. This section focuses on the
practical procedure for finding the eigenvalues and their corresponding eigenvectors for a given square matrix
A.

Step 1: Find the Eigenvalues by Solving the Characteristic Equation


The first step is to determine the eigenvalues of the matrix A. This involves setting up and solving the
characteristic equation:
p(λ) = det(A − λI) = 0
1. Form the matrix (A−λI): Subtract λ from each diagonal element of A, leaving the off-diagonal elements
unchanged.
2. Calculate the determinant: Compute the determinant of the resulting matrix (A − λI). This will yield
the characteristic polynomial p(λ), a polynomial in λ of degree n, where n is the size of the matrix A.
3. Solve the polynomial equation: Find the roots of the characteristic polynomial p(λ) = 0. These roots
are the eigenvalues (λ1 , λ2 , . . . , λn ) of the matrix A. According to the Fundamental Theorem of Algebra,
an n-degree polynomial has exactly n roots in the complex number system, counting multiplicities. These
eigenvalues might be real or complex, and they might not all be distinct.
Example 3.2.1. Find the eigenvalues of the matrix A = [2 1; 1 2].
1. Form (A − λI):
A − λI = [2 − λ  1; 1  2 − λ]

2. Calculate the determinant:

det(A − λI) = (2 − λ)(2 − λ) − (1)(1)


= 4 − 2λ − 2λ + λ2 − 1
= λ2 − 4λ + 3

3. Solve p(λ) = 0:
λ2 − 4λ + 3 = 0
Factoring the polynomial, we get:
(λ − 1)(λ − 3) = 0
The roots are λ1 = 1 and λ2 = 3.
Therefore, the eigenvalues of the matrix A are 1 and 3.

Step 2: Find the Eigenvectors for Each Eigenvalue


Once an eigenvalue λi is found, the next step is to find the corresponding eigenvectors. Eigenvectors associated
with λi are the non-zero vectors x that satisfy the equation:

(A − λi I)x = 0

This is a homogeneous system of linear equations. The set of all solutions (including the zero vector) forms the
eigenspace Eλi = Nul(A − λi I). To find the eigenvectors:
1. Substitute the eigenvalue: Replace λ with the specific eigenvalue λi in the matrix (A − λI).
2. Solve the homogeneous system: Solve the system of linear equations (A − λi I)x = 0 for the vector
x. This is typically done using methods like Gaussian elimination (row reduction) to find the basis for
the null space (eigenspace).
3. Identify the eigenvectors: The non-zero vectors in the basis of the null space are the linearly indepen-
dent eigenvectors corresponding to λi . Any non-zero linear combination of these basis vectors is also an
eigenvector for λi . The set of all solutions (including the zero vector) constitutes the eigenspace Eλi .
26 CHAPTER 3. EIGENVALUES, EIGENVECTORS, AND DIAGONALIZATION
 
Continuing the Example (A = [2 1; 1 2]):
For λ1 = 1:
1. Substitute λ1 = 1 into (A − λI):
A − 1I = [2 − 1  1; 1  2 − 1] = [1 1; 1 1]

2. Solve (A − 1I)x = 0:
[1 1; 1 1][x1 ; x2 ] = [0; 0]
This corresponds to the system:

x1 + x2 = 0
x1 + x2 = 0

Both equations are identical: x1 = −x2 . Let x2 = t (a free parameter). Then x1 = −t. The solution
vector is x = [−t; t] = t[−1; 1].
 
3. Identify eigenvectors: The eigenspace E1 is spanned by the vector [−1; 1]. Any non-zero scalar multiple
of this vector is an eigenvector for λ1 = 1. For instance, v1 = [−1; 1] is an eigenvector.
For λ2 = 3:
1. Substitute λ2 = 3 into (A − λI):
A − 3I = [2 − 3  1; 1  2 − 3] = [−1 1; 1 −1]

2. Solve (A − 3I)x = 0:
[−1 1; 1 −1][x1 ; x2 ] = [0; 0]
This corresponds to the system:

−x1 + x2 = 0
x1 − x2 = 0

Both equations imply x1 = x2 . Let x2 = s (a free parameter). Then x1 = s. The solution vector is
x = [s; s] = s[1; 1].
 
3. Identify eigenvectors: The eigenspace E3 is spanned by the vector [1; 1]. Any non-zero scalar multiple
of this vector is an eigenvector for λ2 = 3. For instance, v2 = [1; 1] is an eigenvector.
 
In summary, for the matrix A = [2 1; 1 2], the eigenvalues are λ1 = 1 and λ2 = 3. The corresponding
eigenspaces are E1 = span{[−1; 1]} and E3 = span{[1; 1]}.
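As a hedged numerical cross-check of this hand computation (assuming Python with NumPy/SciPy), one can ask for a basis of each null space Nul(A − λI) directly:

import numpy as np
from scipy.linalg import null_space

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

for lam in (1.0, 3.0):
    E = null_space(A - lam * np.eye(2))   # orthonormal basis of the eigenspace E_lambda
    print(lam, E.ravel())                 # proportional to [-1, 1] and [1, 1] respectively
    assert np.allclose(A @ E, lam * E)    # every basis vector satisfies A v = lam v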

Important Considerations
• Multiplicity: An eigenvalue can have an algebraic multiplicity (its multiplicity as a root of the
characteristic polynomial) and a geometric multiplicity (the dimension of its corresponding eigenspace,
i.e., the number of linearly independent eigenvectors). The geometric multiplicity is always less than or
equal to the algebraic multiplicity (1 ≤ dim(Eλi ) ≤ algebraic multiplicity of λi ).

• Complex Eigenvalues: If the matrix A has real entries, its characteristic polynomial will have real
coefficients. However, the roots (eigenvalues) can still be complex. If a complex number λ is an eigenvalue,
its complex conjugate λ̄ will also be an eigenvalue. The corresponding eigenvectors will also occur in
conjugate pairs.
• Finding Roots: For matrices larger than 2 × 2 or 3 × 3, finding the roots of the characteristic polynomial
analytically can be difficult or impossible. Numerical methods are often employed to approximate the
eigenvalues in such cases.

3.3 Properties of Eigenvalues and Eigenvectors


Eigenvalues and eigenvectors possess several important properties that provide further insight into the structure
of matrices and linear transformations. Understanding these properties is crucial for various applications,
including diagonalization, solving systems of differential equations, and analyzing stability.
Let A be an n×n matrix with eigenvalues λ1 , λ2 , . . . , λn (counting multiplicities) and corresponding eigenvectors
v1 , v2 , . . . , vn .
1. Trace and Eigenvalues: The sum of the eigenvalues of a matrix is equal to the trace of the matrix (the
sum of the diagonal elements).
Σ_{i=1}^{n} λi = λ1 + λ2 + · · · + λn = tr(A) = a11 + a22 + · · · + ann

Derivation Sketch: The characteristic polynomial is p(λ) = det(A−λI) = (−1)n λn +(−1)n−1 tr(A)λn−1 +
· · ·+det(A). By Vieta’s formulas, the sum of the roots (eigenvalues) of a polynomial is related to its coeffi-
cients. For p(λ), the sum of the roots is −(coefficient of λn−1 )/(coefficient of λn ) = −((−1)n−1 tr(A))/((−1)n ) =
−(− tr(A)) = tr(A).
2. Determinant and Eigenvalues: The product of the eigenvalues of a matrix is equal to the determinant
of the matrix.
Π_{i=1}^{n} λi = λ1 · λ2 · · · λn = det(A)

Derivation Sketch: The characteristic polynomial is p(λ) = det(A − λI). Setting λ = 0 gives p(0) =
det(A−0I) = det(A). Also, from the factored form of the polynomial p(λ) = (−1)n (λ−λ1 )(λ−λ2 ) . . . (λ−
λn ), setting λ = 0 gives p(0) = (−1)n (−λ1 )(−λ2 ) . . . (−λn ) = (−1)n (−1)n (λ1 λ2 . . . λn ) = λ1 λ2 . . . λn .
Equating the two expressions for p(0) yields det(A) = λ1 λ2 . . . λn .
3. Eigenvalues of AT : A matrix A and its transpose AT have the same characteristic polynomial and
therefore the same eigenvalues.
Proof. The characteristic polynomial of AT is det(AT − λI). We know that the determinant of a matrix is
equal to the determinant of its transpose: det(M ) = det(M T ). Let M = A − λI. Then M T = (A − λI)T =
AT − (λI)T = AT − λI. Therefore, det(A − λI) = det((A − λI)T ) = det(AT − λI). Since the characteristic
polynomials are identical, their roots (the eigenvalues) must also be identical.
Note: While the eigenvalues are the same, the eigenvectors of A and AT are generally different (unless A
is symmetric).
4. Eigenvalues of A−1 (if A is invertible): If A is an invertible matrix (det(A) ̸= 0) and λ is an eigenvalue
of A with corresponding eigenvector x, then 1/λ is an eigenvalue of A−1 with the same eigenvector x.
Proof. Since A is invertible, det(A) ̸= 0. From property 2, the product of eigenvalues is det(A), so no
eigenvalue λ can be zero. We start with the eigenvalue equation:

Ax = λx

Multiply both sides by A−1 from the left:

A−1 (Ax) = A−1 (λx)

(A−1 A)x = λ(A−1 x) (Scalar λ commutes with matrix A−1 )


Ix = λ(A−1 x)
x = λ(A−1 x)

Since λ ̸= 0, we can divide by λ:


(1/λ)x = A−1 x
Rearranging gives:
A−1 x = (1/λ)x
This shows that x is an eigenvector of A−1 with the corresponding eigenvalue 1/λ.
5. Eigenvalues of Ak (for integer k ≥ 0): If λ is an eigenvalue of A with corresponding eigenvector x,
then λk is an eigenvalue of Ak with the same eigenvector x.
Proof by induction on k. Base case (k=1): A1 x = Ax = λx = λ1 x. The statement holds for k=1.
Inductive step: Assume Ak x = λk x holds for some integer k ≥ 1. Consider Ak+1 x:

Ak+1 x = A(Ak x)

Using the inductive hypothesis:


Ak+1 x = A(λk x)
Since λk is a scalar:
Ak+1 x = λk (Ax)
Using the eigenvalue definition Ax = λx:

Ak+1 x = λk (λx)

Ak+1 x = (λk+1 )x
Thus, the statement holds for k+1. By induction, Ak x = λk x holds for all integers k ≥ 1. The case k=0
also holds, since A0 x = Ix = x and λ0 x = 1x = x (with the usual conventions A0 = I and λ0 = 1).
6. Eigenvalues of cA (for scalar c): If λ is an eigenvalue of A with corresponding eigenvector x, then cλ
is an eigenvalue of cA with the same eigenvector x.
Proof. Start with Ax = λx. Multiply both sides by the scalar c:

c(Ax) = c(λx)

(cA)x = (cλ)x
This shows that x is an eigenvector of cA with eigenvalue cλ.
7. Eigenvalues of A + cI: If λ is an eigenvalue of A with corresponding eigenvector x, then λ + c is an
eigenvalue of A + cI with the same eigenvector x.
Proof. Consider the action of (A + cI) on x:

(A + cI)x = Ax + (cI)x

= Ax + cx
Using Ax = λx:
= λx + cx
= (λ + c)x
This shows that x is an eigenvector of A + cI with eigenvalue λ + c.
8. Eigenvalues of Triangular Matrices: The eigenvalues of a triangular matrix (either upper or lower)
are the entries on its main diagonal.
Proof. Let A be an n × n upper triangular matrix. Then (A − λI) is also an upper triangular matrix with
diagonal entries (a11 −λ), (a22 −λ), . . . , (ann −λ). The determinant of a triangular matrix is the product of
its diagonal entries. Therefore, det(A − λI) = (a11 − λ)(a22 − λ) . . . (ann − λ). The characteristic equation
is (a11 − λ)(a22 − λ) . . . (ann − λ) = 0. The roots (eigenvalues) are λ1 = a11 , λ2 = a22 , . . . , λn = ann . The
same logic applies to lower triangular matrices.
9. Linear Independence of Eigenvectors: Eigenvectors corresponding to distinct eigenvalues are linearly
independent. (This is a crucial theorem that will be proven in the next section).
These properties are fundamental tools for working with eigenvalues and eigenvectors, simplifying calculations
and providing theoretical insights.
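Several of these properties (1, 2, 5, and 7) are easy to sanity-check numerically; the following sketch, assuming Python/NumPy for illustration only, does so for an arbitrary 3 × 3 example.

import numpy as np

A = np.array([[4.0, 1.0, 2.0],
              [0.0, 3.0, -1.0],
              [2.0, 0.0, 1.0]])
lam = np.linalg.eigvals(A)

# Property 1: the sum of the eigenvalues equals the trace.
print(np.isclose(lam.sum(), np.trace(A)))                    # True
# Property 2: the product of the eigenvalues equals the determinant.
print(np.isclose(np.prod(lam), np.linalg.det(A)))            # True
# Property 5: the eigenvalues of A^3 are the cubes lambda^3 (compare as sorted multisets).
print(np.allclose(np.sort(np.linalg.eigvals(np.linalg.matrix_power(A, 3))),
                  np.sort(lam**3)))                          # True
# Property 7: the eigenvalues of A + cI are lambda + c.
c = 2.5
print(np.allclose(np.sort(np.linalg.eigvals(A + c * np.eye(3))),
                  np.sort(lam + c)))                         # True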

3.4 Theorem: Linear Independence of Eigenvectors for Distinct Eigenvalues (with Proof)

A fundamental and highly useful property concerning eigenvectors is their linear independence when they
correspond to distinct eigenvalues. This theorem is crucial for understanding the structure of matrices and
forms the basis for diagonalization.
Theorem 3.4.1. Let A be an n × n matrix. If v1 , v2 , . . . , vk are eigenvectors of A corresponding to distinct
eigenvalues λ1 , λ2 , . . . , λk (i.e., λi ̸= λj for i ̸= j), then the set of eigenvectors {v1 , v2 , . . . , vk } is linearly
independent.

Proof. We will prove this theorem by induction on the number of eigenvectors, k.


Base Case (k=1): Consider a single eigenvector v1 corresponding to eigenvalue λ1 . By definition, an eigen-
vector must be non-zero (v1 ̸= 0). A set containing a single non-zero vector is always linearly independent.
Thus, the theorem holds for k=1.
Inductive Hypothesis: Assume that any set of k − 1 eigenvectors corresponding to k − 1 distinct eigenvalues
is linearly independent. That is, assume the theorem holds for sets of size k − 1.
Inductive Step (Prove for k): Consider a set of k eigenvectors {v1 , v2 , . . . , vk } corresponding to k distinct
eigenvalues {λ1 , λ2 , . . . , λk }. We want to show that this set is linearly independent. To do this, we examine the
linear dependence equation:
c1 v1 + c2 v2 + · · · + ck vk = 0 (3.1)
where c1 , c2 , . . . , ck are scalars. We need to show that the only solution to this equation is c1 = c2 = · · · = ck = 0.
Multiply Equation (3.1) by the matrix A:

A(c1 v1 + c2 v2 + · · · + ck vk ) = A(0)

Using the linearity of matrix multiplication:

c1 Av1 + c2 Av2 + · · · + ck Avk = 0

Since vi is an eigenvector corresponding to eigenvalue λi , we have Avi = λi vi . Substitute this into the equation:

c1 λ1 v1 + c2 λ2 v2 + · · · + ck λk vk = 0 (3.2)

Now, multiply the original Equation (3.1) by the scalar λk :

λk (c1 v1 + c2 v2 + · · · + ck vk ) = λk (0)

c1 λk v1 + c2 λk v2 + · · · + ck−1 λk vk−1 + ck λk vk = 0 (3.3)


Subtract Equation (3.3) from Equation (3.2):

(c1 λ1 v1 + · · · + ck λk vk ) − (c1 λk v1 + · · · + ck λk vk ) = 0 − 0

Group terms by the eigenvectors vi :

c1 (λ1 − λk )v1 + c2 (λ2 − λk )v2 + · · · + ck−1 (λk−1 − λk )vk−1 + ck (λk − λk )vk = 0

c1 (λ1 − λk )v1 + c2 (λ2 − λk )v2 + · · · + ck−1 (λk−1 − λk )vk−1 = 0 (3.4)


Equation (3.4) is a linear combination of the k − 1 eigenvectors {v1 , v2 , . . . , vk−1 }. By the inductive hypothesis,
since these k −1 eigenvectors correspond to the k −1 distinct eigenvalues {λ1 , λ2 , . . . , λk−1 }, they form a linearly
independent set.
Therefore, the only way Equation (3.4) can hold is if all the scalar coefficients are zero:

c1 (λ1 − λk ) = 0

c2 (λ2 − λk ) = 0
...
ck−1 (λk−1 − λk ) = 0

Since all the eigenvalues are distinct by assumption, we know that λi ̸= λk for i = 1, 2, . . . , k − 1. This means
that (λi − λk ) ̸= 0 for all i in this range.
Because (λi − λk ) ̸= 0, the only way for the products ci (λi − λk ) to be zero is if the coefficients ci are zero:

c1 = 0

c2 = 0

...

ck−1 = 0

Now, substitute these zero coefficients back into the original Equation (3.1):

0v1 + 0v2 + · · · + 0vk−1 + ck vk = 0

This simplifies to:


ck vk = 0

Since vk is an eigenvector, it must be non-zero (vk ̸= 0). Therefore, the only way for ck vk to be zero is if the
scalar ck is zero:
ck = 0

We have now shown that c1 = c2 = · · · = ck−1 = ck = 0 is the only solution to the linear dependence equation
(3.1). This proves that the set of eigenvectors {v1 , v2 , . . . , vk } is linearly independent.
Conclusion: By the principle of mathematical induction, the theorem holds for any number k of eigenvectors
corresponding to distinct eigenvalues.

Significance: This theorem guarantees that if an n × n matrix A has n distinct eigenvalues, then it has a set
of n linearly independent eigenvectors. As we will see in the next section, this is a sufficient condition for the
matrix to be diagonalizable.

3.5 Diagonalization: Condition (existence of eigenbasis), Process (A = VΛV−1)

Diagonalization is a fundamental process in linear algebra where a square matrix A is decomposed into a product
involving a diagonal matrix. This decomposition simplifies many matrix computations, such as calculating
powers of A or the matrix exponential, and provides deep insights into the geometric action of the linear
transformation represented by A.

Definition of Diagonalizability
A square matrix A of size n × n is said to be diagonalizable if it is similar to a diagonal matrix. That is, A is
diagonalizable if there exists an invertible matrix V and a diagonal matrix Λ such that:

A = V ΛV −1

Equivalently, multiplying by V −1 from the left and V from the right, we can write:

Λ = V −1 AV

This means that A can be transformed into a diagonal matrix Λ through a similarity transformation using the
matrix V .

Condition for Diagonalizability: Existence of an Eigenbasis


The crucial question is: when is a matrix A diagonalizable? The answer lies in its eigenvectors.
Theorem 3.5.1. An n × n matrix A is diagonalizable if and only if A has n linearly independent eigenvectors.

Proof. (⇒) Assume A is diagonalizable.


If A is diagonalizable, then there exists an invertible matrix V and a diagonal matrix Λ such that A = V ΛV −1
or equivalently AV = V Λ. Let the columns of V be the vectors v1 , v2 , . . . , vn , so V = [v1 |v2 | . . . |vn ]. Let the
diagonal entries of Λ be λ1 , λ2 , . . . , λn , so:
 
Λ =
[ λ1   0    ···   0  ]
[ 0    λ2   ···   0  ]
[ ...  ...  ···  ... ]
[ 0    0    ···   λn ]

Now, let’s look at the equation AV = V Λ column by column. The j-th column of AV is A times the j-th column
of V , which is Avj . The j-th column of V Λ is V times the j-th column of Λ. The j-th column of Λ is a vector
with λj in the j-th position and zeros elsewhere. Thus, the j-th column of V Λ is:
 
[v1 |v2 | . . . |vn ] [0, . . . , 0, λj , 0, . . . , 0]T = 0v1 + · · · + 0vj−1 + λj vj + 0vj+1 + · · · + 0vn = λj vj

Equating the j-th columns of AV and V Λ, we get:

Avj = λj vj

This is precisely the eigenvalue equation. It shows that the columns vj of the matrix V are eigenvectors of A,
and the corresponding diagonal entries λj of Λ are the eigenvalues of A.

Since A is diagonalizable, the matrix V exists and is invertible. A matrix is invertible if and only if its columns
are linearly independent. Therefore, the eigenvectors v1 , v2 , . . . , vn (which form the columns of V ) must be
linearly independent.

Furthermore, since V is an n × n matrix and its columns are linearly independent, these n eigenvectors form
a basis for the vector space Fn (where F is the field, typically R or C). Such a basis consisting entirely of
eigenvectors is called an eigenbasis.

Thus, if A is diagonalizable, it must have n linearly independent eigenvectors (which form an eigenbasis).

(⇐) Assume A has n linearly independent eigenvectors.


Let v1 , v2 , . . . , vn be n linearly independent eigenvectors of A, corresponding to eigenvalues λ1 , λ2 , . . . , λn (note
that the eigenvalues λi are not necessarily distinct here).

Construct the matrix V whose columns are these eigenvectors:

V = [v1 |v2 | . . . |vn ]

Construct the diagonal matrix Λ whose diagonal entries are the corresponding eigenvalues:
 
Λ =
[ λ1   0    ···   0  ]
[ 0    λ2   ···   0  ]
[ ...  ...  ···  ... ]
[ 0    0    ···   λn ]

Now consider the product AV :

AV = A[v1 |v2 | . . . |vn ] = [Av1 |Av2 | . . . |Avn ]

Since Avi = λi vi , we have:


AV = [λ1 v1 |λ2 v2 | . . . |λn vn ]

Next, consider the product V Λ:

V Λ = [v1 |v2 | . . . |vn ]Λ
    = [λ1 v1 + 0v2 + · · · + 0vn | 0v1 + λ2 v2 + · · · + 0vn | . . . | 0v1 + · · · + 0vn−1 + λn vn ]
    = [λ1 v1 |λ2 v2 | . . . |λn vn ]

Comparing the results, we see that AV = V Λ.


Since the eigenvectors v1 , v2 , . . . , vn are linearly independent by assumption, the matrix V whose columns are
these vectors is invertible.
Because V is invertible, we can multiply the equation AV = V Λ by V −1 from the right:

(AV )V −1 = (V Λ)V −1

A(V V −1 ) = V ΛV −1
AI = V ΛV −1
A = V ΛV −1
This shows that A is similar to the diagonal matrix Λ. Therefore, A is diagonalizable.

Conclusion: An n × n matrix A is diagonalizable if and only if it possesses a set of n linearly independent


eigenvectors. This set of eigenvectors forms an eigenbasis for the vector space Fn .

Process of Diagonalization
If an n × n matrix A is diagonalizable, the process to find the matrices V and Λ such that A = V ΛV −1 is as
follows:
1. Find the Eigenvalues: Solve the characteristic equation det(A − λI) = 0 to find the n eigenvalues
λ1 , λ2 , . . . , λn of A.
2. Find Linearly Independent Eigenvectors: For each distinct eigenvalue λi , find a basis for the corre-
sponding eigenspace Eλi = Nul(A − λi I). Collect all the basis vectors found for all eigenspaces.
3. Check for Diagonalizability: Check if the total number of linearly independent eigenvectors found in
Step 2 is equal to n (the size of the matrix). If it is less than n, the matrix A is not diagonalizable. If it
is equal to n, then A is diagonalizable.
• Sufficient Condition: A shortcut based on Section 3.4: If A has n distinct eigenvalues, then it is
guaranteed to have n linearly independent eigenvectors, and thus A is diagonalizable.
• Necessary and Sufficient Condition: A is diagonalizable if and only if the geometric multiplicity
(dimension of the eigenspace) of each eigenvalue equals its algebraic multiplicity (multiplicity as a
root of the characteristic polynomial). (This will be discussed further in the next section).
4. Construct V: Form the matrix V whose columns are the n linearly independent eigenvectors found in
Step 2. V = [v1 |v2 | . . . |vn ]. The order of the eigenvectors in V matters.
5. Construct Λ: Form the diagonal matrix Λ whose diagonal entries are the eigenvalues corresponding to the
eigenvectors in V , in the same order. That is, if the j-th column of V is the eigenvector vj corresponding
to eigenvalue λj , then the j-th diagonal entry of Λ must be λj .
6. Verification (Optional but Recommended): Calculate V −1 and verify that A = V ΛV −1 or, more
easily, check that AV = V Λ.
Example 3.5.1. Diagonalize A = [2 1; 1 2] (from Section 3.2).
1. Eigenvalues: We found λ1 = 1, λ2 = 3.
   
2. Eigenvectors: We found corresponding eigenvectors v1 = [−1; 1] for λ1 = 1 and v2 = [1; 1] for λ2 = 3.

3. Check: We have 2 linearly independent eigenvectors ({v1 , v2 }) for a 2 × 2 matrix. Thus, A is diagonal-
izable.
 
4. Construct V: V = [v1 |v2 ] = [−1 1; 1 1]
5. Construct Λ: Λ = [λ1 0; 0 λ2 ] = [1 0; 0 3]
6. Verify (AV = V Λ):
AV = [2 1; 1 2][−1 1; 1 1] = [−2 + 1  2 + 1; −1 + 2  1 + 2] = [−1 3; 1 3]
V Λ = [−1 1; 1 1][1 0; 0 3] = [−1(1) + 1(0)  −1(0) + 1(3); 1(1) + 1(0)  1(0) + 1(3)] = [−1 3; 1 3]
Since AV = V Λ, the diagonalization is correct. A = V ΛV −1 .
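The same verification can be done numerically, and it also shows one payoff of diagonalization: powers of A become cheap. The sketch below assumes Python/NumPy for illustration.

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Eigenvectors from the worked example as the columns of V, with the matching
# eigenvalues on the diagonal of Lambda (in the same order).
V = np.array([[-1.0, 1.0],
              [ 1.0, 1.0]])
Lam = np.diag([1.0, 3.0])

print(np.allclose(A @ V, V @ Lam))                    # AV = V*Lambda
print(np.allclose(A, V @ Lam @ np.linalg.inv(V)))     # A = V*Lambda*V^{-1}

# Powers: A^5 = V * Lambda^5 * V^{-1}, and Lambda^5 is just the diagonal of fifth powers.
print(np.allclose(np.linalg.matrix_power(A, 5),
                  V @ np.diag([1.0**5, 3.0**5]) @ np.linalg.inv(V)))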

3.6 Theorem: Diagonalizability Condition (Algebraic vs. Geometric Multiplicity) (with Proof)

In the previous section, we established that an n × n matrix A is diagonalizable if and only if it has n linearly
independent eigenvectors. While finding all eigenvectors and checking their independence works, it can be
cumbersome. A more refined condition for diagonalizability involves comparing the algebraic and geometric
multiplicities of each eigenvalue.

Definitions: Algebraic and Geometric Multiplicity


Let A be an n × n matrix and let λ be an eigenvalue of A.
1. Algebraic Multiplicity (AM): The algebraic multiplicity of an eigenvalue λ, denoted AM(λ), is its
multiplicity as a root of the characteristic polynomial p(λ) = det(A − λI). In other words, if p(λ) =
(λ − λ0 )k q(λ) where q(λ0 ) ̸= 0, then the algebraic multiplicity of the eigenvalue λ0 is k. The sum of
the algebraic multiplicities of all distinct eigenvalues of an n × n matrix is always equal to n (by the
Fundamental Theorem of Algebra applied to the characteristic polynomial).
2. Geometric Multiplicity (GM): The geometric multiplicity of an eigenvalue λ, denoted GM(λ), is the
dimension of the corresponding eigenspace Eλ = Nul(A − λI). That is, GM(λ) = dim(Eλ ) = dim(Nul(A −
λI)). The geometric multiplicity represents the maximum number of linearly independent eigenvectors
that can be found for the eigenvalue λ.

Relationship between AM and GM


For any eigenvalue λ of a matrix A, its geometric multiplicity is always less than or equal to its algebraic
multiplicity.
Theorem 3.6.1. For any eigenvalue λ of an n × n matrix A, 1 ≤ GM(λ) ≤ AM(λ).

Proof Sketch. Let λ0 be an eigenvalue of A with geometric multiplicity k = GM(λ0 ). This means the eigenspace
Eλ0 has dimension k. Let {v1 , v2 , . . . , vk } be a basis for Eλ0 . We can extend this basis to a basis for the entire
space Fn : {v1 , . . . , vk , w1 , . . . , wn−k }. Form an invertible matrix P whose columns are these basis vectors:
P = [v1 | . . . |vk |w1 | . . . |wn−k ]. Consider the matrix P −1 AP . The first k columns of AP are Av1 , . . . , Avk .
Since vi are eigenvectors for λ0 , Avi = λ0 vi . So, AP = [λ0 v1 | . . . |λ0 vk |Aw1 | . . . |Awn−k ]. Then P −1 AP =
P −1 [λ0 v1 | . . . |λ0 vk |Aw1 | . . . |Awn−k ]. Since P −1 P = I, and the first k columns of P are v1 , . . . , vk , the first
k columns of P −1 AP will be P −1 (λ0 vi ) = λ0 (P −1 vi ). Because vi is the i-th column of P , P −1 vi is the i-th
standard basis vector ei . Thus, the first k columns of P −1 AP are λ0 e1 , . . . , λ0 ek . This means P −1 AP has the
block form:
P −1 AP = [ λ0 Ik   B; 0   C ]
where Ik is the k × k identity matrix, B is k × (n − k), 0 is (n − k) × k zero matrix, and C is (n − k) × (n − k).
Now, consider the characteristic polynomial of A. Since A and P −1 AP are similar matrices, they have the same
characteristic polynomial:
det(A − λI) = det(P −1 AP − λI)
 
det(P −1 AP − λI) = det [ λ0 Ik − λIk   B; 0   C − λIn−k ]
This is the determinant of a block upper triangular matrix, which is the product of the determinants of the
diagonal blocks:

det(P −1 AP − λI) = det(λ0 Ik − λIk ) · det(C − λIn−k )


= det((λ0 − λ)Ik ) · det(C − λIn−k )
= (λ0 − λ)k · det(C − λIn−k )

The characteristic polynomial det(A − λI) has a factor of (λ0 − λ)k . This implies that the algebraic multiplicity
of λ0 , AM(λ0 ), must be at least k. Since k = GM(λ0 ), we have shown that GM(λ0 ) ≤ AM(λ0 ). The inequality
1 ≤ GM(λ) holds because an eigenvalue must have at least one corresponding eigenvector (which spans a space
of dimension at least 1).

The Diagonalizability Theorem


Now we can state the main theorem connecting diagonalizability with algebraic and geometric multiplicities.
Theorem 3.6.2. Let A be an n × n matrix with distinct eigenvalues λ1 , λ2 , . . . , λp . The matrix A is diagonal-
izable if and only if the following two conditions hold:
1. The characteristic polynomial factors completely into linear factors (possibly with repetitions) over the
field F (i.e., all eigenvalues λi belong to F). This is always true if F = C (complex numbers).
2. For each distinct eigenvalue λi of A, the geometric multiplicity equals the algebraic multiplicity: GM(λi ) =
AM(λi ).

Proof. (⇒) Assume A is diagonalizable.


If A is diagonalizable, then by the theorem in Section 3.5, A has n linearly independent eigenvectors. Let
these eigenvectors be grouped according to their eigenvalues. Let B1 , B2 , . . . , Bp be bases for the eigenspaces
Eλ1 , Eλ2 , . . . , Eλp corresponding to the distinct eigenvalues λ1 , . . . , λp . The dimension of each eigenspace is
GM(λi ) = |Bi | (the number of vectors in the basis Bi ).
Any vector from one eigenspace Eλi is linearly independent of any linear combination of vectors from other
eigenspaces Eλj (where j ̸= i). This is a consequence of the theorem that eigenvectors corresponding to distinct
eigenvalues are linearly independent (Section 3.4). Therefore, the union of the bases B = B1 ∪ B2 ∪ · · · ∪ Bp
forms a linearly independent set of eigenvectors.
The total number of vectors in B is |B| = |B1 | + |B2 | + · · · + |Bp | = GM(λ1 ) + GM(λ2 ) + · · · + GM(λp ).
Since A is diagonalizable, it must have n linearly independent eigenvectors. Thus, the set B must contain
exactly n vectors, meaning:
GM(λ1 ) + GM(λ2 ) + · · · + GM(λp ) = n (3.5)
We also know the sum of the algebraic multiplicities must equal n:

AM(λ1 ) + AM(λ2 ) + · · · + AM(λp ) = n (3.6)

Furthermore, we know from the previous theorem that for each eigenvalue, GM(λi ) ≤ AM(λi ).
Comparing Equations (3.5) and (3.6), and knowing GM(λi ) ≤ AM(λi ), the only way for both sums to equal n
is if GM(λi ) = AM(λi ) for all i = 1, . . . , p. Also, for A to be diagonalizable over F, the eigenvalues λi (roots of
the characteristic polynomial) must be in F, meaning the polynomial factors completely over F.
(⇐) Assume the characteristic polynomial factors completely over F and GM(λi ) = AM(λi ) for all
distinct eigenvalues λi .
Let λ1 , . . . , λp be the distinct eigenvalues. Since the characteristic polynomial factors completely, the sum of
the algebraic multiplicities is n:
AM(λ1 ) + · · · + AM(λp ) = n
By assumption, GM(λi ) = AM(λi ) for all i. Therefore:

GM(λ1 ) + · · · + GM(λp ) = n

Let Bi be a basis for the eigenspace Eλi . The size of Bi is |Bi | = GM(λi ). Consider the union of these bases:
B = B1 ∪ B2 ∪ · · · ∪ Bp . The total number of vectors in B is |B| = GM(λ1 ) + · · · + GM(λp ) = n.

As argued before, the set B consists of eigenvectors, and vectors from different eigenspaces are linearly indepen-
dent. Within each Bi , the vectors are linearly independent by definition of a basis. Therefore, the entire set B
is a collection of n linearly independent eigenvectors of A.
Since A has n linearly independent eigenvectors, by the theorem in Section 3.5, A is diagonalizable.

Conclusion: The condition GM(λi ) = AM(λi ) for all eigenvalues λi , along with the requirement that all
eigenvalues belong to the field of interest, provides a precise criterion for determining if a matrix is diagonalizable
without explicitly finding the full set of n eigenvectors and checking their independence.
Example 3.6.1 (Revisited). Consider A = [2 1; 1 2]. Eigenvalues are λ1 = 1, λ2 = 3. AM(1) = 1, AM(3) = 1.
Sum = 1+1 = 2 = n. We found E1 = span{[−1; 1]}, so GM(1) = dim(E1 ) = 1. We found E3 = span{[1; 1]}, so
GM(3) = dim(E3 ) = 1. Since GM(1) = AM(1) = 1 and GM(3) = AM(3) = 1, the matrix A is diagonalizable
(as we already knew).
Example 3.6.2 (Non-Diagonalizable). Consider B = [2 1; 0 2]. Characteristic polynomial: det(B − λI) =
det([2 − λ  1; 0  2 − λ]) = (2 − λ)(2 − λ) = (λ − 2)2 . Eigenvalue: λ1 = 2 with AM(2) = 2. Find eigenspace
E2 = Nul(B − 2I):
B − 2I = [0 1; 0 0]
(B − 2I)x = 0 =⇒ [0 1; 0 0][x1 ; x2 ] = [0; 0] =⇒ 0x1 + 1x2 = 0 =⇒ x2 = 0.
x1 is a free variable. Solution: x = [x1 ; 0] = x1 [1; 0]. The eigenspace E2 is spanned by [1; 0]. Thus, GM(2) =
dim(E2 ) = 1. Since GM(2) = 1 < AM(2) = 2, the matrix B is not diagonalizable.
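Because multiplicities are exact quantities, a symbolic tool is convenient for this check. The sketch below assumes Python with SymPy (Matrix.eigenvects and Matrix.is_diagonalizable) purely for illustration and reproduces the AM/GM comparison for both examples.

from sympy import Matrix

B = Matrix([[2, 1],
            [0, 2]])

# eigenvects() returns (eigenvalue, algebraic multiplicity, eigenspace basis) triples.
for lam, am, basis in B.eigenvects():
    gm = len(basis)                        # geometric multiplicity = dim of the eigenspace
    print(lam, "AM =", am, "GM =", gm)     # prints: 2 AM = 2 GM = 1

print(B.is_diagonalizable())               # False, since GM(2) < AM(2)

A = Matrix([[2, 1],
            [1, 2]])
print(A.is_diagonalizable())               # True: GM = AM = 1 for both eigenvalues 1 and 3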

3.7 Similarity Transformations and their Properties


Similarity transformations are a fundamental concept in linear algebra, providing a way to relate different
matrix representations of the same linear transformation under different bases. Diagonalization, as discussed
previously, is a specific and important type of similarity transformation.

Definition of Similarity
Two n × n matrices A and B (over the same field F) are said to be similar if there exists an invertible n × n
matrix P such that:
B = P −1 AP
The transformation from A to B (or vice versa, since A = (P −1 )−1 B(P −1 )) is called a similarity transfor-
mation. The matrix P is often referred to as the change-of-basis matrix.

Connection to Change of Basis


The concept of similarity arises naturally when considering the matrix representation of a linear transformation
T : V → V with respect to different bases for the vector space V .
Let V be an n-dimensional vector space. Let α = {v1 , . . . , vn } be a basis for V . Let β = {w1 , . . . , wn } be
another basis for V .
Let A be the matrix representation of T with respect to the basis α, denoted [T ]α . This means that for any
vector x in V , if [x]α is the coordinate vector of x relative to α, then [T (x)]α = A[x]α .
Let B be the matrix representation of T with respect to the basis β, denoted [T ]β . This means [T (x)]β = B[x]β .
There exists an invertible change-of-basis matrix P that converts coordinates from basis β to basis α. Specifically,
P = [Id]βα , where the columns of P are the coordinate vectors of the basis vectors in β relative to the basis
α: P = [[w1 ]α | . . . |[wn ]α ]. This matrix satisfies [x]α = P [x]β for any vector x in V . The inverse matrix
P −1 = [Id]αβ converts coordinates from basis α to basis β, satisfying [x]β = P −1 [x]α .

Now, let’s relate A and B. Consider T (x):

[T (x)]β = B[x]β

We can also express [T (x)]β using basis α and the change-of-basis matrices:

[T (x)]β = P −1 [T (x)]α

Since [T (x)]α = A[x]α :


[T (x)]β = P −1 (A[x]α )
And since [x]α = P [x]β :
[T (x)]β = P −1 (A(P [x]β ))
[T (x)]β = (P −1 AP )[x]β
Comparing the two expressions for [T (x)]β :

B[x]β = (P −1 AP )[x]β

Since this must hold for all vectors x (and thus all coordinate vectors [x]β ), we must have:

B = P −1 AP

This shows that the matrices representing the same linear transformation T with respect to different bases
(A = [T ]α and B = [T ]β ) are similar. The matrix P is the change-of-basis matrix from β to α.
Conversely, if two matrices A and B are similar (B = P −1 AP ), they can be interpreted as representing the
same linear transformation but with respect to different bases.

Properties Preserved by Similarity Transformations


Similar matrices share many important properties. If A and B are similar (B = P −1 AP ), then:
1. Determinant: det(A) = det(B)
Proof. det(B) = det(P −1 AP ) = det(P −1 ) det(A) det(P ) = (1/ det(P )) det(A) det(P ) = det(A).
2. Trace: tr(A) = tr(B)
Proof. Using the property tr(XY ) = tr(Y X): tr(B) = tr(P −1 (AP )) = tr((AP )P −1 ) = tr(A(P P −1 )) =
tr(AI) = tr(A).
3. Characteristic Polynomial: A and B have the same characteristic polynomial.
Proof. The characteristic polynomial of B is det(B − λI).

det(B − λI) = det(P −1 AP − λI)


= det(P −1 AP − λP −1 IP ) (Since I = P −1 IP )
= det(P −1 (A − λI)P )
= det(P −1 ) det(A − λI) det(P )
= (1/ det(P )) det(A − λI) det(P )
= det(A − λI)

Thus, the characteristic polynomials are identical.


4. Eigenvalues: A and B have the same eigenvalues with the same algebraic and geometric multiplicities.
Proof. Since the characteristic polynomials are identical (Property 3), the roots of these polynomials (the
eigenvalues) must be identical, including their algebraic multiplicities. For geometric multiplicity, let λ
be an eigenvalue and consider the eigenspace of B for λ: Eλ (B) = Nul(B − λI) = Nul(P −1 AP − λI) =
Nul(P −1 (A−λI)P ). Let x ∈ Nul(B −λI). Then (B −λI)x = 0 =⇒ P −1 (A−λI)P x = 0. Multiply by P :
(A − λI)P x = 0. This means P x ∈ Nul(A − λI). Conversely, let y ∈ Nul(A − λI). Then (A − λI)y = 0.
Let x = P −1 y. Then y = P x. Substituting into the equation: (A − λI)P x = 0. Multiply by P −1 :
P −1 (A − λI)P x = 0 =⇒ (B − λI)x = 0. This means x = P −1 y ∈ Nul(B − λI). This establishes a
one-to-one correspondence (isomorphism) between the null spaces Nul(B − λI) and Nul(A − λI) via the
mapping x 7→ P x (or y 7→ P −1 y). Since P is invertible, this mapping preserves dimension. Therefore,
dim(Nul(B − λI)) = dim(Nul(A − λI)), which means GMB (λ) = GMA (λ).
3.7. SIMILARITY TRANSFORMATIONS AND THEIR PROPERTIES 37

5. Rank: rank(A) = rank(B)


Proof. Multiplication by an invertible matrix does not change the rank. rank(B) = rank(P −1 AP ). Since P
is invertible, rank(P −1 AP ) = rank(AP ). Since P −1 is invertible, rank(AP ) = rank(A). Thus, rank(B) =
rank(A).
6. Nullity: nullity(A) = nullity(B)
Proof. Follows directly from the Rank-Nullity Theorem (rank + nullity = n) and the fact that rank is
preserved.
7. Invertibility: A is invertible if and only if B is invertible.
Proof. Follows directly from the determinant property (det(A) = det(B)). A is invertible iff det(A) ̸= 0,
which is equivalent to det(B) ̸= 0, which is equivalent to B being invertible.
Diagonalization as Similarity: If a matrix A is diagonalizable, it means A is similar to a diagonal matrix
Λ (A = V ΛV −1 or Λ = V −1 AV ). This implies that the linear transformation represented by A looks simplest
(is represented by a diagonal matrix) when viewed in the basis formed by the eigenvectors (the columns of V ).
The properties listed above confirm why diagonalization is so useful: the fundamental characteristics of the
transformation (eigenvalues, determinant, trace) are preserved in the simpler diagonal form Λ.
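These invariants are easy to confirm numerically: build a random similarity transform B = P −1 AP and compare. The sketch below assumes Python/NumPy for illustration only.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
P = rng.standard_normal((4, 4))             # generically invertible
B = np.linalg.inv(P) @ A @ P                # B = P^{-1} A P, so A and B are similar

print(np.isclose(np.linalg.det(A), np.linalg.det(B)))          # same determinant
print(np.isclose(np.trace(A), np.trace(B)))                    # same trace
print(np.allclose(np.poly(A), np.poly(B)))                     # same characteristic polynomial
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(B))    # same rank (hence same nullity)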
Chapter 4

Jordan Normal Form

Not all square matrices are diagonalizable. As seen in Section 3.6, a matrix is diagonalizable if and only if
for every eigenvalue, its geometric multiplicity equals its algebraic multiplicity. When this condition fails for
one or more eigenvalues, the matrix lacks a full set of n linearly independent eigenvectors needed to form an
eigenbasis. However, it is still possible to find a basis that simplifies the matrix representation of the associated
linear transformation into a near-diagonal form called the Jordan Normal Form (or Jordan Canonical Form).
This form is crucial for understanding the structure of non-diagonalizable matrices and has important ap-
plications, particularly in solving systems of linear differential equations when the coefficient matrix is not
diagonalizable.

4.1 Generalized Eigenvectors


When the geometric multiplicity GM(λ) of an eigenvalue λ is less than its algebraic multiplicity AM(λ), there
are fewer linearly independent eigenvectors associated with λ than the dimension of the subspace associated
with λ in the characteristic polynomial’s factorization. To form a complete basis that reveals the matrix’s
structure, we need to introduce the concept of generalized eigenvectors.

Definition
Let A be an n × n matrix and let λ be an eigenvalue of A. A non-zero vector x is called a generalized
eigenvector of rank k corresponding to the eigenvalue λ if:

(A − λI)k x = 0

and
(A − λI)k−1 x ̸= 0
where k is a positive integer.
• A generalized eigenvector of rank 1 is simply a regular eigenvector, because (A − λI)1 x = 0 and (A −
λI)0 x = Ix = x ̸= 0 (since eigenvectors are non-zero).
• A generalized eigenvector of rank k > 1 is not a regular eigenvector, because (A − λI)x ̸= 0 (otherwise
its rank would be 1).

Chains of Generalized Eigenvectors


Generalized eigenvectors associated with a single eigenvalue λ form chains. Starting with a generalized eigen-
vector xk of rank k, we can generate a chain of k vectors:

xk (rank k)
xk−1 = (A − λI)xk (rank k − 1)
xk−2 = (A − λI)xk−1 = (A − λI)2 xk (rank k − 2)
...
x1 = (A − λI)x2 = (A − λI)k−1 xk (rank 1 - this is a regular eigenvector)


Applying (A − λI) one more time gives:

(A − λI)x1 = (A − λI)k xk = 0

This sequence {x1 , x2 , . . . , xk } is called a Jordan chain of length k, generated by the generalized eigenvector
xk . The vector x1 at the end of the chain is always a standard eigenvector.
Action of A on the Chain: Let’s see how the matrix A acts on the vectors in a Jordan chain:
• For x1 (the eigenvector): Ax1 = λx1
• For x2 : (A − λI)x2 = x1 =⇒ Ax2 − λx2 = x1 =⇒ Ax2 = λx2 + x1
• For x3 : (A − λI)x3 = x2 =⇒ Ax3 − λx3 = x2 =⇒ Ax3 = λx3 + x2
• ...
• For xj (1 < j ≤ k): (A − λI)xj = xj−1 =⇒ Axj − λxj = xj−1 =⇒ Axj = λxj + xj−1
This structure (Axj = λxj + xj−1 ) is key to understanding the form of Jordan blocks, which will be discussed
in the next section.

Generalized Eigenspace
For a given eigenvalue λ, the set of all generalized eigenvectors corresponding to λ, together with the zero vector,
forms a subspace called the generalized eigenspace of λ, denoted Kλ .

Kλ = {x ∈ Fn | (A − λI)j x = 0 for some integer j ≥ 1}

Equivalently, Kλ is the null space of some power of the matrix (A − λI). It can be shown that if AM(λ) = m,
then Kλ = Nul((A − λI)m ).
Properties of Generalized Eigenspaces:
1. Subspace: Kλ is a subspace of Fn .
2. Invariance: Kλ is an invariant subspace under A, meaning that if x ∈ Kλ , then Ax ∈ Kλ .
Proof. If x ∈ Kλ , then (A − λI)j x = 0 for some j. We need to show (A − λI)p (Ax) = 0 for some p.
Consider (A − λI)j (Ax). Since A commutes with (A − λI), (A − λI)j (Ax) = A((A − λI)j x) = A(0) = 0.
So Ax ∈ Kλ (with p=j).
3. Dimension: The dimension of the generalized eigenspace Kλ is equal to the algebraic multiplicity of the
eigenvalue λ: dim(Kλ ) = AM(λ). This is a crucial result. While the dimension of the regular eigenspace
Eλ (which is GM(λ)) might be smaller than AM(λ), the dimension of the generalized eigenspace Kλ
always matches AM(λ).
4. Direct Sum Decomposition: The entire vector space Fn can be decomposed into a direct sum of the
generalized eigenspaces corresponding to the distinct eigenvalues λ1 , . . . , λp of A:

Fn = Kλ1 ⊕ Kλ2 ⊕ · · · ⊕ Kλp

This means every vector in Fn can be uniquely written as a sum of vectors, one from each generalized
eigenspace.

Finding Generalized Eigenvectors and Jordan Chains


Finding generalized eigenvectors and the corresponding Jordan chains is more involved than finding regular
eigenvectors.
1. Find Eigenvalues and AM/GM: Determine the eigenvalues λi , their algebraic multiplicities AM(λi ),
and geometric multiplicities GM(λi ).
2. Identify Deficient Eigenvalues: Focus on eigenvalues where GM(λ) < AM(λ). These are the eigen-
values that require generalized eigenvectors.
3. Calculate Powers of (A − λI): For a deficient eigenvalue λ with AM(λ) = m, calculate the matrices
(A − λI), (A − λI)2 , . . . , (A − λI)m .

4. Determine Ranks/Nullities: Find the nullities (dimensions of the null spaces) of these powers: n1 =
nullity(A − λI) = GM(λ), n2 = nullity((A − λI)2 ), . . . , nm = nullity((A − λI)m ) = AM(λ). The number
of Jordan blocks of size k × k or larger associated with λ is given by nk − nk−1 (with n0 = 0). The number
of blocks of exactly size k × k is (nk − nk−1 ) − (nk+1 − nk ) = 2nk − nk−1 − nk+1 .
5. Find Chain Generators: Find vectors x that are in Nul((A − λI)k ) but not in Nul((A − λI)k−1 ). These
are the generalized eigenvectors of rank k (the top of the chains).
• Start by finding a basis for Nul((A − λI)m ) relative to Nul((A − λI)m−1 ). These vectors will generate
the longest chains (length m).
• Then find a basis for Nul((A − λI)m−1 ) relative to Nul((A − λI)m−2 ), ensuring linear independence
from vectors already generated by longer chains. These generate chains of length m-1.
• Continue this process down to Nul(A − λI), which contains the regular eigenvectors (bottoms of the
chains).
6. Construct Chains: For each generator xk found in step 5, construct the full Jordan chain: xk , xk−1 =
(A − λI)xk , . . . , x1 = (A − λI)k−1 xk .
The collection of all vectors from all Jordan chains for all eigenvalues forms a basis for Fn , known as a Jordan
basis. This basis is what allows the transformation of A into its Jordan Normal Form.
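A sketch of steps 4–6 on the small example from Section 3.6 is given below. It assumes Python with SymPy; Matrix.nullspace gives the nullities symbolically, and Matrix.jordan_form is assumed to return a pair (P, J) with B = P J P −1.

from sympy import Matrix, eye

# The non-diagonalizable example from Section 3.6: lambda = 2 with AM = 2, GM = 1.
B = Matrix([[2, 1],
            [0, 2]])
lam = 2
N = B - lam * eye(2)

# Step 4: nullities of powers of (B - lam*I) reveal the chain/block structure.
n1 = len((N**1).nullspace())     # = GM(2) = 1  (regular eigenvectors)
n2 = len((N**2).nullspace())     # = AM(2) = 2  (all generalized eigenvectors)
print(n1, n2)                    # 1 2  -> one chain of length 2, i.e. one 2x2 Jordan block

# SymPy can also return a Jordan basis and the Jordan form directly.
P, J = B.jordan_form()           # assumed to satisfy B = P * J * P^{-1}
print(J)                         # Matrix([[2, 1], [0, 2]])
print(B == P * J * P.inv())      # True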

4.2 Jordan Blocks and Jordan Canonical Form Matrix


Section 4.1 introduced generalized eigenvectors and Jordan chains, which form a basis (called a Jordan basis)
for the vector space Fn even when the matrix A is not diagonalizable. This Jordan basis is precisely what allows
us to transform A via a similarity transformation into a special block diagonal matrix known as the Jordan
Normal Form (or Jordan Canonical Form). This form consists of diagonal blocks called Jordan blocks.

Jordan Blocks
A Jordan block is a square matrix associated with a single eigenvalue λ. It has the eigenvalue λ on the main
diagonal, ones on the super-diagonal (the diagonal directly above the main diagonal), and zeros everywhere
else. A k × k Jordan block, denoted Jk (λ), has the form:
 
Jk (λ) =
[ λ    1    0    ···   0 ]
[ 0    λ    1    ···   0 ]
[ 0    0    λ    ···   0 ]
[ ...  ...  ...  ···  ... ]
[ 0    0    0    ···   1 ]
[ 0    0    0    ···   λ ]

• A 1 × 1 Jordan block is simply [λ].


 
• A 2 × 2 Jordan block is [λ 1; 0 λ].
• A 3 × 3 Jordan block is [λ 1 0; 0 λ 1; 0 0 λ].

Properties of a Jordan Block Jk (λ):


• Eigenvalue: The only eigenvalue of Jk (λ) is λ, with algebraic multiplicity k (since det(Jk (λ) − µI) =
(λ − µ)k ).
• Eigenvectors: The geometric multiplicity of the eigenvalue λ is 1. The eigenspace Eλ = Nul(Jk (λ) − λI)
is spanned by the single eigenvector e1 = [1, 0, . . . , 0]T .
Proof. Jk (λ) − λI is a matrix with zeros on the main diagonal and ones on the super-diagonal. Its null
space consists of vectors [x1 , . . . , xk ]T such that x2 = 0, x3 = 0, . . . , xk = 0, leaving x1 free. Thus, the null
space is span(e1 ), and its dimension (GM) is 1.

• Generalized Eigenvectors: The standard basis vectors e1 , e2 , . . . , ek form a Jordan chain for Jk (λ)
corresponding to eigenvalue λ. Let J = Jk (λ). Then J − λI is the matrix N with 1s on the super-diagonal
and 0s elsewhere.

(J − λI)e1 = N e1 = 0
(J − λI)e2 = N e2 = e1
(J − λI)e3 = N e3 = e2
...
(J − λI)ek = N ek = ek−1

This matches the structure Axj = λxj + xj−1 derived for Jordan chains in Section 4.1, with J playing
the role of A and the standard basis vectors ej playing the role of the chain vectors xj (in reverse order:
ek is the generator xk , e1 is the eigenvector x1 ). Also, (J − λI)k = N k = 0 (N is nilpotent), confirming
that e1 , . . . , ek are generalized eigenvectors.

Jordan Canonical Form Matrix


The Jordan Canonical Form (JCF) or Jordan Normal Form (JNF) of a square matrix A is a block
diagonal matrix J where each diagonal block is a Jordan block Jki (λi ).
 
J =
[ J1   0    ···   0  ]
[ 0    J2   ···   0  ]
[ ...  ...  ···  ... ]
[ 0    0    ···   Jp ]

where each Ji = Jki (λi ) is a Jordan block corresponding to some eigenvalue λi .


Key Properties of the JCF:
1. Similarity: Every square matrix A over an algebraically closed field (like the complex numbers C) is
similar to a Jordan Canonical Form matrix J. That is, there exists an invertible matrix P such that:

J = P −1 AP or equivalently, A = P JP −1

2. Structure of P: The columns of the matrix P are the vectors of a Jordan basis for A, arranged appro-
priately. Specifically, the columns corresponding to a single Jordan block Jk (λ) must be the vectors of the
corresponding Jordan chain {x1 , . . . , xk }, ordered from the eigenvector x1 to the generator xk .

P = [. . . |x1 |x2 | . . . |xk | . . . ]

If P is constructed this way, then AP = P J, which leads to J = P −1 AP .


3. Uniqueness: The Jordan Canonical Form J of a matrix A is unique up to the permutation of the Jordan
blocks along the diagonal.
4. Eigenvalue Information: The eigenvalues of A appear on the diagonal of its JCF, J. The algebraic
multiplicity AM(λ) of an eigenvalue λ is the sum of the sizes of all Jordan blocks corresponding to λ.
5. Geometric Multiplicity Information: The geometric multiplicity GM(λ) of an eigenvalue λ is equal
to the number of Jordan blocks corresponding to λ. Reasoning: Each Jordan block contributes exactly
one linearly independent eigenvector (the first vector in its corresponding chain) to the eigenspace Eλ .
The total dimension of Eλ (which is GM(λ)) is therefore the sum of these contributions, which equals the
number of blocks for λ.
6. Block Sizes: The sizes of the Jordan blocks for a given eigenvalue λ are determined by the structure of
the null spaces of the powers of (A − λI), as discussed in Section 4.1. Let nj = nullity((A − λI)j ). The
number of Jordan blocks of size at least k × k is nk − nk−1 . The number of blocks of size exactly k × k is
(n_k − n_{k−1}) − (n_{k+1} − n_k) = 2n_k − n_{k−1} − n_{k+1}.
Example 4.2.1. Consider the non-diagonalizable matrix B = [ 2 1 ; 0 2 ] from Section 3.6.
• Eigenvalue: λ = 2, AM(2) = 2.
• Eigenspace: E_2 = Nul(B − 2I) = Nul([ 0 1 ; 0 0 ]) = span{ [1, 0]^T }. So GM(2) = 1.

• Since GM(2) < AM(2), the matrix is not diagonalizable.


• Number of Jordan blocks for λ = 2 is GM(2) = 1.
• The sum of the sizes of the blocks for λ = 2 must be AM(2) = 2.
• Since there is only one block, its size must be 2 × 2.
• The Jordan block is J_2(2) = [ 2 1 ; 0 2 ].
• The Jordan Canonical Form is J = [ 2 1 ; 0 2 ]. (In this case, B was already in JCF.)
Let’s find the Jordan basis P such that J = P^{−1}BP. Since B = J, we expect P = I. We need a Jordan chain of length 2 for λ = 2. First,
(B − 2I)^2 = [ 0 1 ; 0 0 ][ 0 1 ; 0 0 ] = [ 0 0 ; 0 0 ].
Nul((B − 2I)^2) is all of R^2, while Nul(B − 2I) = span{ [1, 0]^T }. We need a vector x_2 in Nul((B − 2I)^2) but not in Nul(B − 2I). Let’s choose x_2 = [0, 1]^T. Then x_1 = (B − 2I)x_2 = [ 0 1 ; 0 0 ][0, 1]^T = [1, 0]^T. The Jordan chain is {x_1, x_2} = { [1, 0]^T, [0, 1]^T }. The Jordan basis matrix P is formed by columns x_1 and x_2: P = [ 1 0 ; 0 1 ] = I.
Verification: P −1 BP = I −1 BI = B = J. This confirms the structure.
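As a computational illustration of Key Property 6 above, the following Python sketch (an illustrative addition, not part of the original notes; the 4 × 4 test matrix is an arbitrary choice) determines the Jordan block sizes for an eigenvalue from the nullities n_k = nullity((A − λI)^k):

import numpy as np

def jordan_block_sizes(A, lam, tol=1e-9):
    """Return {size: count} of Jordan blocks of A for eigenvalue lam, using
    n_k = nullity((A - lam*I)^k) and
    (number of blocks of size exactly k) = 2 n_k - n_{k-1} - n_{k+1}."""
    n = A.shape[0]
    M = A - lam * np.eye(n)
    nullity = [0]                      # n_0 = 0
    Mk = np.eye(n)
    for k in range(1, n + 2):          # n_1, ..., n_{n+1}
        Mk = Mk @ M
        nullity.append(n - np.linalg.matrix_rank(Mk, tol=tol))
    sizes = {}
    for k in range(1, n + 1):
        count = 2 * nullity[k] - nullity[k - 1] - nullity[k + 1]
        if count > 0:
            sizes[k] = count
    return sizes

# A is already in Jordan form: a 2x2 block and a 1x1 block for lambda = 2,
# plus a 1x1 block for lambda = 5.
A = np.array([[2.0, 1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 2.0, 0.0],
              [0.0, 0.0, 0.0, 5.0]])
print(jordan_block_sizes(A, 2.0))      # {1: 1, 2: 1}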

Diagonalizability and JCF


A matrix A is diagonalizable if and only if its Jordan Canonical Form J is a diagonal matrix. This happens
precisely when all Jordan blocks are of size 1 × 1. This, in turn, occurs if and only if the number of blocks for
each eigenvalue λ (which is GM(λ)) equals the sum of the sizes of the blocks for λ (which is AM(λ)). Thus, the
JCF framework confirms the diagonalizability condition GM(λ) = AM(λ) for all eigenvalues λ.

4.3 Theorem: Existence and Uniqueness of Jordan Form (Statement,


brief discussion on construction)
The previous sections introduced generalized eigenvectors, Jordan chains, and the structure of Jordan blocks
and the Jordan Canonical Form (JCF). A fundamental theorem guarantees that every square matrix (over an
algebraically closed field like C) can be transformed into this specific form, and this form is essentially unique.

Theorem: Existence and Uniqueness of the Jordan Canonical Form


Statement: Let A be an n × n matrix whose characteristic polynomial factors completely into linear factors
over the field F (this is always true if F = C, the field of complex numbers). Then A is similar to a Jordan
Canonical Form matrix J, i.e., there exists an invertible matrix P such that J = P −1 AP .
The Jordan Canonical Form matrix J is unique up to the permutation (reordering) of the Jordan blocks
J1 , J2 , . . . , Jp along its diagonal.
Implications:
• Existence: Every matrix (over C, or over R if all eigenvalues are real) has a Jordan Canonical Form.
This provides a standard, simplified representation even for non-diagonalizable matrices.
• Uniqueness: Although the matrix P (the Jordan basis) is not unique, the resulting Jordan form J is
unique, apart from the order in which the blocks appear. This means the number and sizes of the Jordan
blocks corresponding to each eigenvalue are uniquely determined by the matrix A itself.

Brief Discussion on Construction and Uniqueness Proof


Existence (Construction Sketch): The proof of existence is constructive and relies heavily on the properties
of generalized eigenspaces discussed in Section 4.1.
1. Decomposition into Generalized Eigenspaces: The vector space Fn decomposes into the direct sum
of the generalized eigenspaces Kλi corresponding to the distinct eigenvalues λi : Fn = Kλ1 ⊕ · · · ⊕ Kλp .

2. Invariant Subspaces: Each Kλi is invariant under the transformation A. This means that the action of
A can be studied independently within each Kλi . If we choose a basis for Fn that is the union of bases for
each Kλi , the matrix representation of A in this basis will be block diagonal, with blocks corresponding
to the restriction of A to each Kλi .
3. Action within a Generalized Eigenspace: Within a single generalized eigenspace Kλ , the transfor-
mation A acts like λI + N , where N = A − λI is nilpotent on Kλ (meaning N m = 0 on Kλ , where
m = AM(λ)).
4. Jordan Basis for Nilpotent Operator: The core of the construction involves finding a specific basis
for Kλ (a Jordan basis) such that the matrix representation of the nilpotent operator N restricted to Kλ
takes the form of a block diagonal matrix composed of Jordan blocks with 0 on the diagonal (e.g., Jk (0)).
This construction involves analyzing the null spaces of N, N 2 , . . . , N m and carefully selecting basis vectors
to form Jordan chains, as outlined in Section 4.1.
5. Combining Bases: When the Jordan bases for each Kλi are combined, they form a Jordan basis for
the entire space Fn . In this basis, the matrix representation of A becomes the Jordan Canonical Form
J = P −1 AP , where P ’s columns are the Jordan basis vectors.
Uniqueness (Proof Sketch): The uniqueness part relies on showing that the number and sizes of the Jordan
blocks for each eigenvalue λ are uniquely determined by the properties of the matrix A, specifically by the
dimensions of the null spaces of the powers of (A − λI).
1. Similarity Invariants: If J = P −1 AP , then J − λI = P −1 (A − λI)P . This means J − λI and A − λI
are similar.
2. Powers are Similar: Consequently, (J − λI)k = P −1 (A − λI)k P , so (J − λI)k and (A − λI)k are also
similar for any k ≥ 1.
3. Rank/Nullity Invariance: Similar matrices have the same rank and the same nullity. Therefore,
nullity((J − λI)k ) = nullity((A − λI)k ) for all k ≥ 1.
4. Nullity Determines Block Structure: The nullity of (J − λI)k depends directly on the number and
sizes of the Jordan blocks in J corresponding to the eigenvalue λ. Specifically, nullity((J − λI)k ) is the
sum, over all blocks Jj (λ) associated with λ, of min(k, size of block). Let nk = nullity((A − λI)k ). As
shown previously, the number of blocks of size exactly m × m is 2nm − nm−1 − nm+1 .
5. Unique Determination: Since the values nk = nullity((A − λI)k ) are determined solely by A, and these
values in turn uniquely determine the number of Jordan blocks of each possible size for the eigenvalue λ,
the structure of the Jordan Canonical Form J associated with A must be unique (up to the ordering of
the blocks).
In summary, the existence of the JCF is guaranteed by the ability to decompose the space using generalized
eigenspaces and construct Jordan bases within them. The uniqueness follows because the dimensions of the null
spaces of powers of (A − λI), which are similarity invariants, completely dictate the structure of the Jordan
blocks.

4.4 Application to understanding matrix structure when not diago-


nalizable
The Jordan Canonical Form (JCF) is more than just a theoretical construct; it provides significant practi-
cal insights into the structure and behavior of linear transformations, especially those represented by non-
diagonalizable matrices.

Understanding Geometric vs. Algebraic Multiplicity


The JCF provides a clear visualization of why a matrix might not be diagonalizable. Diagonalizability requires
GM(λ) = AM(λ) for all eigenvalues λ. The JCF reveals the discrepancy:
• AM(λ): The sum of the sizes of all Jordan blocks associated with λ.
• GM(λ): The number of Jordan blocks associated with λ.
If GM(λ) < AM(λ), it means there must be at least one Jordan block for λ with size greater than 1 × 1. These
larger blocks correspond to the presence of generalized eigenvectors that are not regular eigenvectors, indicating

directions where the transformation involves both scaling (by λ) and a "shear-like" component represented by
the super-diagonal 1s.

Structure of the Transformation


Recall that A = P JP −1 , where J is the JCF and P contains the Jordan basis vectors. This decomposition
allows us to understand the action of A by considering the action of J in the Jordan basis.
The action of J is block-diagonal. Within each block Jk (λ) corresponding to a Jordan chain {x1 , . . . , xk }
(ordered x1 to xk in P ), the transformation acts as:
• Ax1 = λx1 (Eigenvector behavior)
• Ax2 = λx2 + x1
• ...
• Axk = λxk + xk−1
This shows that within the subspace spanned by a Jordan chain associated with a block larger than 1 × 1, the
transformation A is not a simple scaling. Instead, it scales vectors by λ but also shifts them along the direction
of the preceding vector in the chain. This reveals the more complex geometric action of non-diagonalizable
transformations compared to the pure scaling along eigenvector directions seen in diagonalizable transformations.

Computing Matrix Powers and Exponentials


Diagonalization simplifies computing Ak = (V ΛV −1 )k = V Λk V −1 because Λk is easy to compute (just raise
diagonal elements to the power k). The JCF provides a similar, albeit slightly more complex, simplification for
non-diagonalizable matrices.
If A = P JP −1 , then Ak = P J k P −1 . Computing J k involves computing the k-th power of each Jordan block
Ji on the diagonal. Let Ji = Jm (λ) be an m × m Jordan block. We can write Ji = λI + N , where N is the
nilpotent matrix with 1s on the super-diagonal. Since λI and N commute, we can use the binomial theorem:
(J_i)^k = (λI + N)^k = Σ_{j=0}^{k} C(k, j) (λI)^{k−j} N^j = Σ_{j=0}^{k} C(k, j) λ^{k−j} N^j

where C(k, j) are the binomial coefficients. Since N is nilpotent (N^m = 0), the sum only goes up to j = m − 1 (or to k if k < m − 1). The powers N^j are known matrices with the 1s moving up and to the right: N^0 = I; N^1 = N (1s on the 1st super-diagonal); N^2 has 1s on the 2nd super-diagonal; . . . ; N^{m−1} has a single 1 in the top-right corner; and N^j = 0 for j ≥ m.
This provides an explicit formula for (Ji )k and thus for J k and Ak .
Similarly, the matrix exponential eAt = P eJt P −1 can be computed. eJt is block diagonal, and for each block
Ji = λI + N :
e^{J_i t} = e^{(λI+N)t} = e^{λIt} e^{Nt}   (since λI and N commute)
          = (e^{λt} I) · Σ_{j=0}^{∞} (Nt)^j / j!
          = e^{λt} · Σ_{j=0}^{m−1} N^j t^j / j!   (since N^m = 0)

This gives an explicit formula for e^{J_i t}:

e^{J_i t} = e^{λt} ·
[ 1  t  t^2/2!  ···  t^{m−1}/(m−1)! ]
[ 0  1  t       ···  t^{m−2}/(m−2)! ]
[ ⋮  ⋮  ⋱      ⋱   ⋮              ]
[ 0  0  0       ···  t              ]
[ 0  0  0       ···  1              ]

This is crucial for solving systems of linear differential equations ẋ = Ax, where the solution is x(t) = eAt x(0).
The JCF allows computing eAt even when A is not diagonalizable.
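To make the block formula above concrete, here is a small Python sketch (illustrative only; the 3 × 3 Jordan block, the value of λ, and the time t are arbitrary choices) that builds e^{J_m(λ)t} from the finite sum e^{λt} Σ_{j=0}^{m−1} N^j t^j/j! and compares it against a general-purpose matrix exponential routine:

import numpy as np
from scipy.linalg import expm
from math import factorial

def jordan_block_exp(lam, m, t):
    """e^{J_m(lam) t} via e^{lam t} * sum_{j=0}^{m-1} N^j t^j / j!,
    where N is the nilpotent super-diagonal part of the Jordan block."""
    N = np.diag(np.ones(m - 1), k=1)                     # 1s on the super-diagonal
    S = sum(np.linalg.matrix_power(N, j) * t**j / factorial(j) for j in range(m))
    return np.exp(lam * t) * S

lam, m, t = -0.5, 3, 2.0
J = lam * np.eye(m) + np.diag(np.ones(m - 1), k=1)       # the Jordan block J_m(lam)
print(np.allclose(jordan_block_exp(lam, m, t), expm(J * t)))   # True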

Stability Analysis
In systems theory (e.g., ẋ = Ax), the stability of the system is determined by the eigenvalues of A. For
asymptotic stability, all eigenvalues must have negative real parts. The JCF helps understand the behavior
near marginal stability (eigenvalues on the imaginary axis).
If an eigenvalue λ on the imaginary axis has GM(λ) < AM(λ), meaning there is a Jordan block of size > 1 for
λ, the corresponding solutions will involve terms like teλt , t2 eλt , etc. (arising from the powers of t in eJi t ). Even
if Re(λ) = 0, these polynomial terms in t cause the solution magnitude to grow over time, leading to instability.
Therefore, for an LTI system ẋ = Ax to be stable (in the sense of Lyapunov), all eigenvalues must have Re(λ)
≤ 0, AND for any eigenvalue with Re(λ) = 0, its geometric multiplicity must equal its algebraic multiplicity
(i.e., all Jordan blocks for purely imaginary eigenvalues must be 1 × 1).
In conclusion, the Jordan Canonical Form provides the necessary tool to fully understand the structure, compute
functions (like powers and exponentials), and analyze the behavior (like stability) of linear transformations and
systems represented by matrices that are not diagonalizable.
Chapter 5

Matrix Norms and Positive Definite


Matrices

In the analysis of linear systems, particularly when dealing with stability, convergence of iterative methods, and
sensitivity analysis, it is essential to have ways to measure the "size" or "magnitude" of vectors and matrices.
Vector norms provide a way to quantify the length or magnitude of vectors, while matrix norms extend this
concept to matrices, measuring how much a matrix can amplify the norm of a vector.
This chapter also introduces positive definite and semidefinite matrices, which play a crucial role in stability
analysis (especially Lyapunov stability), optimization, and various areas of engineering and physics.

5.1 Vector Norms (e.g., 1-norm, 2-norm, ∞-norm)


A norm on a vector space V over a field F (typically R or C) is a function || · || : V → R that assigns a
non-negative real number (the "length" or "magnitude") to each vector, satisfying specific properties.

Definition of a Vector Norm


A function || · || : V → R is a vector norm if for all vectors x, y ∈ V and all scalars α ∈ F, the following
properties hold:
1. Non-negativity: ||x|| ≥ 0
2. Positive definiteness: ||x|| = 0 if and only if x = 0 (the zero vector)
3. Homogeneity (or Scaling): ||αx|| = |α|||x|| (where |α| is the absolute value or modulus of the scalar
α)
4. Triangle Inequality: ||x + y|| ≤ ||x|| + ||y||
A vector space equipped with a norm is called a normed vector space.

Common Vector Norms on Fn (where F = R or C)


For the common vector space Fn (vectors with n components), several standard norms are frequently used. Let
x = [x1 , x2 , . . . , xn ]T ∈ Fn .
1. The Euclidean Norm (or 2-norm): This is the most common norm, representing the standard Euclidean
distance from the origin.

||x||_2 = ( Σ_{i=1}^{n} |x_i|^2 )^{1/2} = √( |x_1|^2 + |x_2|^2 + · · · + |x_n|^2 )

In R^n, this simplifies to √( x_1^2 + x_2^2 + · · · + x_n^2 ). For vectors in R^2 or R^3, this corresponds to the usual notion of length. If F = C, ||x||_2 = √( x^H x ), where x^H is the conjugate transpose (Hermitian transpose) of x.

Proof of Triangle Inequality (Minkowski Inequality for p=2): This relies on the Cauchy-Schwarz inequality:
|xH y| ≤ ||x||2 ||y||2 . Then ||x + y||22 = (x + y)H (x + y) = xH x + y H y + xH y + y H x = ||x||22 + ||y||22 + 2Re(xH y).


Using Cauchy-Schwarz, Re(xH y) ≤ |xH y| ≤ ||x||2 ||y||2 . So, ||x + y||22 ≤ ||x||22 + ||y||22 + 2||x||2 ||y||2 = (||x||2 +
||y||2 )2 . Taking the square root gives ||x + y||2 ≤ ||x||2 + ||y||2 .
2. The 1-norm (or Manhattan/Taxicab norm): This norm measures the sum of the absolute values of
the components.
||x||_1 = Σ_{i=1}^{n} |x_i| = |x_1| + |x_2| + · · · + |x_n|

It represents the distance traveled if movement is restricted to grid lines (like streets in Manhattan).
Proof of Triangle Inequality: ||x + y||_1 = Σ_i |x_i + y_i|. By the triangle inequality for scalars, |x_i + y_i| ≤ |x_i| + |y_i|. Therefore, ||x + y||_1 ≤ Σ_i (|x_i| + |y_i|) = Σ_i |x_i| + Σ_i |y_i| = ||x||_1 + ||y||_1.
3. The Infinity-norm (or Maximum/Supremum norm): This norm measures the maximum absolute
value among the components.
||x||∞ = max{|x1 |, |x2 |, . . . , |xn |}

Proof of Triangle Inequality: ||x+y||∞ = maxi |xi +yi |. For any i, |xi +yi | ≤ |xi |+|yi |. Since |xi | ≤ maxj |xj | =
||x||∞ and |yi | ≤ maxj |yj | = ||y||∞ , we have |xi + yi | ≤ ||x||∞ + ||y||∞ . Since this holds for all i, the maximum
value must also satisfy this inequality: maxi |xi + yi | ≤ ||x||∞ + ||y||∞ . Thus, ||x + y||∞ ≤ ||x||∞ + ||y||∞ .
4. The p-norm (Hölder norm): This is a generalization that includes the 1-norm, 2-norm, and ∞-norm as
special cases. For any real number p ≥ 1:

||x||_p = ( Σ_{i=1}^{n} |x_i|^p )^{1/p}

• p = 1 gives the 1-norm.


• p = 2 gives the 2-norm (Euclidean norm).
• As p → ∞, the p-norm converges to the ∞-norm: limp→∞ ||x||p = ||x||∞ .
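As a quick numerical illustration (an added example, not from the original text; the vector is arbitrary), the following Python sketch computes the 1-, 2-, and ∞-norms of a small vector both from the definitions and with numpy's built-in norm function:

import numpy as np

x = np.array([3.0, -4.0, 1.0])

# From the definitions
norm1   = np.sum(np.abs(x))                 # |3| + |-4| + |1| = 8
norm2   = np.sqrt(np.sum(np.abs(x)**2))     # sqrt(9 + 16 + 1) = sqrt(26)
norminf = np.max(np.abs(x))                 # 4

# Same values via numpy
assert np.isclose(norm1,   np.linalg.norm(x, 1))
assert np.isclose(norm2,   np.linalg.norm(x, 2))
assert np.isclose(norminf, np.linalg.norm(x, np.inf))
print(norm1, norm2, norminf)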

Equivalence of Norms
An important property in finite-dimensional vector spaces like Fn is that all norms are equivalent. This means
that for any two norms || · ||a and || · ||b on Fn , there exist positive constants c1 and c2 such that for all vectors
x ∈ Fn :
c1 ||x||b ≤ ||x||a ≤ c2 ||x||b

This equivalence implies that concepts like convergence and boundedness are independent of the specific norm
chosen in a finite-dimensional space. If a sequence of vectors converges to a limit using one norm, it converges
to the same limit using any other norm.
Examples of Equivalence Inequalities in Fn :

• ||x||_∞ ≤ ||x||_2 ≤ √n ||x||_∞
• ||x||_∞ ≤ ||x||_1 ≤ n ||x||_∞
• (1/√n) ||x||_1 ≤ ||x||_2 ≤ ||x||_1
These relationships can be derived using inequalities like Cauchy-Schwarz or by considering the definitions
directly.
Vector norms provide the foundation for defining matrix norms and are essential tools for analyzing the mag-
nitude and convergence properties of vectors in linear algebra and systems theory.

5.2 Matrix Norms (Induced Norms, Frobenius Norm)


Just as vector norms measure the size of vectors, matrix norms measure the "size" or "magnitude" of matrices.
They are particularly important for analyzing how a linear transformation (represented by a matrix) scales
vectors, for studying the convergence of iterative matrix algorithms, and for sensitivity analysis in linear systems.
5.2. MATRIX NORMS (INDUCED NORMS, FROBENIUS NORM) 49

Definition of a Matrix Norm


A function || · || : Fm×n → R (where Fm×n is the space of m × n matrices over field F) is a matrix norm if it
satisfies the same properties as a vector norm for all matrices A, B ∈ Fm×n and all scalars α ∈ F:
1. Non-negativity: ||A|| ≥ 0
2. Positive definiteness: ||A|| = 0 if and only if A = 0 (the zero matrix)
3. Homogeneity (or Scaling): ||αA|| = |α|||A||
4. Triangle Inequality: ||A + B|| ≤ ||A|| + ||B||
In addition to these basic properties, matrix norms are often required to be submultiplicative (or consistent
with the vector norms they interact with), especially when dealing with square matrices (m = n):
5. Submultiplicativity: ||AB|| ≤ ||A||||B|| (for compatible matrices A, B)
This property relates the norm of a product of matrices to the product of their norms and is crucial for analyzing
powers of matrices and convergence.

Induced (Operator) Norms


One important class of matrix norms is derived from vector norms. Given vector norms || · ||a on Fn and || · ||b
on Fm , the induced matrix norm (or operator norm) || · ||a,b on Fm×n is defined as the maximum scaling
factor that the matrix A applies to any non-zero vector x ∈ F^n:

||A||_{a,b} = sup{ ||Ax||_b / ||x||_a : x ∈ F^n, x ≠ 0 }

Equivalently, using the properties of supremum and norms:

||A||a,b = sup{||Ax||b : x ∈ Fn , ||x||a = 1}

This definition measures the largest possible "amplification" or "gain" of the linear transformation represented
by A, relative to the chosen vector norms.
Properties of Induced Norms:
• They satisfy the four basic matrix norm properties (non-negativity, positive definiteness, homogeneity,
triangle inequality).
• They satisfy the consistency condition with the underlying vector norms: ||Ax||b ≤ ||A||a,b ||x||a .
• If the same vector norm is used for both the domain and codomain (|| · ||a = || · ||b ) for square matrices,
the induced norm is submultiplicative: ||AB|| ≤ ||A||||B||.
Common Induced Norms (usually assuming the same vector norm for domain and codomain):
Let A be an m × n matrix.
1. The Matrix 1-norm (induced by vector 1-norm): This is the maximum absolute column sum.
||A||_1 = max_{1≤j≤n} Σ_{i=1}^{m} |a_ij|

Derivation Sketch: ||Ax||_1 = Σ_i |Σ_j a_ij x_j| ≤ Σ_i Σ_j |a_ij| |x_j|. Rearranging the sum gives Σ_j |x_j| (Σ_i |a_ij|). Let C_j = Σ_i |a_ij| (the j-th absolute column sum). Then ||Ax||_1 ≤ Σ_j C_j |x_j|. Since C_j ≤ max_k C_k, we have ||Ax||_1 ≤ (max_k C_k) Σ_j |x_j| = (max_k C_k) ||x||_1. Thus ||A||_1 ≤ max_k C_k. It can be shown that this maximum is achieved by choosing x to be the standard basis vector e_k corresponding to the column k with the maximum absolute column sum.
2. The Matrix ∞-norm (induced by vector ∞-norm): This is the maximum absolute row sum.

||A||_∞ = max_{1≤i≤m} Σ_{j=1}^{n} |a_ij|

Derivation Sketch: ||Ax||_∞ = max_i |Σ_j a_ij x_j| ≤ max_i Σ_j |a_ij| |x_j|. Since |x_j| ≤ ||x||_∞, we have ||Ax||_∞ ≤ max_i (Σ_j |a_ij|) ||x||_∞. Let R_i = Σ_j |a_ij| (the i-th absolute row sum). Then ||Ax||_∞ ≤ (max_k R_k) ||x||_∞. Thus ||A||_∞ ≤ max_k R_k. It can be shown this maximum is achieved by choosing an x with ||x||_∞ = 1 where x_j = sign(a_kj) for the row k with the maximum absolute row sum.
3. The Matrix 2-norm (or Spectral Norm, induced by vector 2-norm): This norm is related to the
singular values of the matrix.

||A||2 = σmax (A) (the largest singular value of A)

Recall that the singular values σ_i(A) are the square roots of the eigenvalues of the positive semidefinite matrix A^H A (or AA^H). So, σ_max(A) = √( λ_max(A^H A) ).
Derivation Sketch: ||A||_2^2 = sup_{||x||_2=1} ||Ax||_2^2 = sup_{||x||_2=1} (Ax)^H (Ax) = sup_{||x||_2=1} x^H (A^H A) x. Since A^H A is Hermitian and positive semidefinite, it has real non-negative eigenvalues λ_i(A^H A) and an orthonormal basis of eigenvectors. Any vector x with ||x||_2 = 1 can be written as a linear combination of these eigenvectors. The expression x^H (A^H A) x is maximized when x is the eigenvector corresponding to the largest eigenvalue of A^H A, λ_max(A^H A). In this case, x^H (A^H A) x = λ_max(A^H A). Thus, ||A||_2^2 = λ_max(A^H A), and ||A||_2 = √( λ_max(A^H A) ) = σ_max(A).
Note: If A is a normal matrix (AH A = AAH ), then ||A||2 = max |λi (A)| (the spectral radius). If A is Hermitian
(AH = A) or symmetric real (AT = A), then ||A||2 = max |λi (A)|.

The Frobenius Norm (or Hilbert-Schmidt Norm)


Another common matrix norm, which is not an induced norm, is the Frobenius norm. It treats the matrix as
a long vector formed by stacking its columns (or rows) and computes the standard Euclidean (vector 2-) norm
of that vector.
 1/2
Xm X n
||A||F =  |aij |2 
i=1 j=1

Equivalently, using the trace function (sum of diagonal elements):


q q
||A||F = tr(AH A) = tr(AAH )

Properties of the Frobenius Norm:


• Satisfies the four basic matrix norm properties.
• Is submultiplicative: ||AB||_F ≤ ||A||_F ||B||_F. Proof Sketch: Apply Cauchy-Schwarz to the inner-product definition (AB)_ij = Σ_k a_ik b_kj.
• Is consistent with the vector 2-norm: ||Ax||_2 ≤ ||A||_F ||x||_2.
• Is related to singular values: ||A||_F^2 = Σ_i σ_i(A)^2 (sum of squared singular values).

• Is easy to compute directly from the matrix entries.


• Is unitarily invariant: ||U AV ||F = ||A||F if U and V are unitary matrices.
Induced 2-Norm vs. Frobenius Norm: It’s important to note the relationship and difference between ||A||2
and ||A||F :
||A||_2 = σ_max(A),   ||A||_F = √( Σ_i σ_i(A)^2 )

From these, we can derive the inequality:

||A||_2 ≤ ||A||_F ≤ √r ||A||_2

where r = rank(A) ≤ min(m, n).
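The following Python sketch (an illustrative addition; the 2 × 2 matrix is arbitrary) computes the induced 1-, 2-, and ∞-norms and the Frobenius norm, checking them against the column-sum, row-sum, and singular-value characterizations given above:

import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

col_sum = np.abs(A).sum(axis=0).max()          # max absolute column sum
row_sum = np.abs(A).sum(axis=1).max()          # max absolute row sum
sigma   = np.linalg.svd(A, compute_uv=False)   # singular values of A

assert np.isclose(col_sum,     np.linalg.norm(A, 1))        # ||A||_1
assert np.isclose(row_sum,     np.linalg.norm(A, np.inf))   # ||A||_inf
assert np.isclose(sigma.max(), np.linalg.norm(A, 2))        # ||A||_2 = sigma_max
assert np.isclose(np.sqrt((sigma**2).sum()), np.linalg.norm(A, 'fro'))  # Frobenius
print(col_sum, row_sum, sigma.max())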


Matrix norms are fundamental for analyzing matrix properties, convergence of algorithms, and the behavior of
linear systems.

5.3 Positive Definite and Semidefinite Matrices: Definition (xT Px >


0 or ≥ 0 for x ̸= 0), Properties
Positive definite and positive semidefinite matrices are special classes of symmetric (or Hermitian) matrices
that appear frequently in various fields, including optimization, stability analysis (Lyapunov theory), statistics
(covariance matrices), physics, and engineering. Their defining property relates to the sign of the quadratic
form associated with the matrix.
Throughout this section, we will primarily consider real symmetric matrices. The concepts extend directly to
complex Hermitian matrices (where AH = A) by replacing the transpose (T ) with the conjugate transpose (H ).

Definitions
Let A be an n × n real symmetric matrix (A = AT ).
1. Positive Definite (PD): The matrix A is called positive definite if the quadratic form xT Ax is strictly
positive for all non-zero vectors x ∈ Rn .

A is PD ⇐⇒ xT Ax > 0 for all x ∈ Rn , x ̸= 0.

Notation: A > 0.
2. Positive Semidefinite (PSD): The matrix A is called positive semidefinite if the quadratic form
xT Ax is non-negative for all vectors x ∈ Rn .

A is PSD ⇐⇒ xT Ax ≥ 0 for all x ∈ Rn .

Notation: A ≥ 0.
Similarly, we can define negative definite (A < 0 if xT Ax < 0 for x ̸= 0) and negative semidefinite (A ≤ 0
if xT Ax ≤ 0 for all x). A symmetric matrix that is neither positive semidefinite nor negative semidefinite is
called indefinite (meaning the quadratic form xT Ax can take both positive and negative values).
Note: Non-symmetric matrices are generally not classified as positive definite or semidefinite using this def-
inition, although the quadratic form xT Ax can still be evaluated. However, xT Ax = xT ((A + AT )/2)x, so
the quadratic form only depends on the symmetric part of A, (A + AT )/2. Therefore, discussions of positive
definiteness typically focus on symmetric matrices.

Properties of Positive Definite and Semidefinite Matrices


Let A be an n × n real symmetric matrix.
1. Eigenvalues:
• A is PD if and only if all its eigenvalues are strictly positive (λi > 0 for all i).
• A is PSD if and only if all its eigenvalues are non-negative (λi ≥ 0 for all i).
Proof Sketch (PD case, ⇒): Let Ax = λx with x ̸= 0. Then xT Ax = xT (λx) = λ(xT x) = λ||x||22 .
Since A is PD, x^T Ax > 0. Since ||x||_2^2 > 0, we must have λ > 0. Proof Sketch (PD case, ⇐): Since A is symmetric, it has an orthonormal basis of eigenvectors {v_1, . . . , v_n} with corresponding real eigenvalues λ_1, . . . , λ_n. Any non-zero x can be written as x = Σ_i c_i v_i (with not all c_i = 0). Then x^T Ax = (Σ_i c_i v_i)^T A (Σ_j c_j v_j) = (Σ_i c_i v_i)^T (Σ_j c_j λ_j v_j). Using orthonormality (v_i^T v_j = δ_ij), this simplifies to x^T Ax = Σ_{i=1}^{n} λ_i c_i^2. If all λ_i > 0 and at least one c_i ≠ 0, then x^T Ax > 0. (Similar arguments hold for the PSD case with > replaced by ≥.)
2. Determinant:
• If A is PD, then det(A) > 0 (since det(A) is the product of eigenvalues).
• If A is PSD, then det(A) ≥ 0.
• Note: det(A) > 0 is necessary but not sufficient for A to be PD (e.g., [ −1 0 ; 0 −1 ] has det = 1 but is negative definite).
3. Diagonal Entries:
• If A is PD, then all its diagonal entries are strictly positive (aii > 0).

• If A is PSD, then all its diagonal entries are non-negative (aii ≥ 0).
Proof: Consider x = e_i (the i-th standard basis vector). Then x^T Ax = e_i^T A e_i = a_ii. If A is PD, a_ii > 0.
If A is PSD, aii ≥ 0.
4. Principal Submatrices:
• If A is PD, then all its principal submatrices are also PD. (A principal submatrix is obtained by
deleting the same set of rows and columns).
• If A is PSD, then all its principal submatrices are also PSD.
Proof Sketch: Let AK be a principal submatrix corresponding to index set K. Let y be a non-zero vector
of the size of AK . Construct x by setting xi = yi if i ∈ K and xi = 0 otherwise. Then y T AK y = xT Ax.
Since x ̸= 0 if y ̸= 0, the sign of y T AK y follows the sign of xT Ax.
5. Invertibility:
• A PD matrix is always invertible (since all eigenvalues are non-zero, det(A) ̸= 0).
• A PSD matrix may or may not be invertible (it is invertible if and only if all eigenvalues are strictly
positive, i.e., if it is actually PD).
6. Inverse: If A is PD, then its inverse A−1 is also PD.
Proof. The eigenvalues of A−1 are 1/λi (A). If λi (A) > 0, then 1/λi (A) > 0. Since A−1 is also symmetric,
it is PD.
7. Sum: If A and B are n × n matrices of the same type (both PD or both PSD), then A + B is also of that
type.
Proof (PD case). xT (A + B)x = xT Ax + xT Bx. If x ̸= 0, then xT Ax > 0 and xT Bx > 0, so their sum
is > 0.
8. Congruence Transformation: If A is PD (or PSD) and C is an invertible n × n matrix, then the
congruent matrix C T AC is also PD (or PSD).
Proof (PD case). Consider y T (C T AC)y = (Cy)T A(Cy). Let x = Cy. Since C is invertible and y ̸= 0,
then x ̸= 0. Therefore, (Cy)T A(Cy) = xT Ax > 0 (since A is PD). Thus C T AC is PD.
Note: If C is merely non-square (m × n, m ̸= n) but has full column rank, and A (m × m) is PD, then
C T AC (n × n) is PD. If A is PSD, C T AC is PSD.
9. Matrix Square Root: If A is PSD, there exists a unique PSD matrix B such that B 2 = A. This matrix
B is called the principal square root of A, often denoted A1/2 .

Examples
• A = [ 2 1 ; 1 2 ]. Eigenvalues are 1, 3 (both > 0). A is PD.
  x^T Ax = [x_1, x_2] [ 2 1 ; 1 2 ] [x_1, x_2]^T = 2x_1^2 + 2x_1 x_2 + 2x_2^2 = x_1^2 + x_2^2 + (x_1 + x_2)^2 > 0 for (x_1, x_2) ≠ (0, 0).
• B = [ 1 1 ; 1 1 ]. Eigenvalues are 0, 2 (both ≥ 0). B is PSD (but not PD).
  x^T Bx = x_1^2 + 2x_1 x_2 + x_2^2 = (x_1 + x_2)^2 ≥ 0.
• C = [ 1 2 ; 2 1 ]. Eigenvalues are 3, −1. C is indefinite.
  x^T Cx = x_1^2 + 4x_1 x_2 + x_2^2. If x = [1, 0]^T, then x^T Cx = 1. If x = [1, −1]^T, then x^T Cx = 1 − 4 + 1 = −2.

• I (Identity matrix): Eigenvalues are all 1. I is PD.


• 0 (Zero matrix): Eigenvalues are all 0. 0 is PSD.
Positive definite and semidefinite matrices are fundamental in many areas, particularly in Lyapunov stability
theory where the existence of a positive definite matrix satisfying certain conditions guarantees system stability.
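As a numerical companion to the eigenvalue characterization (Property 1), the following Python sketch (illustrative only; the test matrices are those of the examples above) classifies a real symmetric matrix from its eigenvalues:

import numpy as np

def classify(A, tol=1e-10):
    """Classify a real symmetric matrix via the signs of its eigenvalues."""
    eig = np.linalg.eigvalsh(A)          # eigenvalues of a symmetric matrix
    if np.all(eig > tol):
        return "positive definite"
    if np.all(eig > -tol):
        return "positive semidefinite"
    if np.all(eig < -tol):
        return "negative definite"
    if np.all(eig < tol):
        return "negative semidefinite"
    return "indefinite"

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 1, 3
B = np.array([[1.0, 1.0], [1.0, 1.0]])   # eigenvalues 0, 2
C = np.array([[1.0, 2.0], [2.0, 1.0]])   # eigenvalues 3, -1
print(classify(A), "|", classify(B), "|", classify(C))
# positive definite | positive semidefinite | indefinite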

5.4 Theorem: Sylvester’s Criterion for Positive Definiteness (State-


ment and Application)
While checking the eigenvalues of a symmetric matrix A provides a definitive test for positive definiteness (all
eigenvalues must be positive), calculating eigenvalues can be computationally intensive, especially for large
matrices. Sylvester’s criterion offers an alternative test based on the determinants of certain submatrices, which
can sometimes be easier to compute.

Leading Principal Minors


Sylvester’s criterion involves examining the leading principal minors of the matrix A. For an n × n matrix
A, the k-th leading principal minor, denoted ∆k , is the determinant of the upper-left k × k submatrix of A.

Let A = [aij ].

• ∆_1 = det([a_11]) = a_11
• ∆_2 = det [ a_11 a_12 ; a_21 a_22 ]
• ∆_3 = det [ a_11 a_12 a_13 ; a_21 a_22 a_23 ; a_31 a_32 a_33 ]
• ...

• ∆n = det(A)

There are n leading principal minors for an n × n matrix.

Sylvester’s Criterion
Theorem 5.4.1 (Sylvester’s Criterion). An n × n real symmetric matrix A is positive definite if and only if all
of its leading principal minors are strictly positive.

A is PD ⇐⇒ ∆k > 0 for all k = 1, 2, . . . , n.

Proof Sketch. (⇒) Assume A is PD.


We know from Section 5.3 (Property 4) that if A is PD, all of its principal submatrices are also PD. The upper-
left k × k submatrices, whose determinants are the leading principal minors ∆k , are principal submatrices.
Therefore, these submatrices Ak are PD. Since any PD matrix has a positive determinant (Property 2), it
follows that det(Ak ) = ∆k > 0 for all k = 1, . . . , n.

(⇐) Assume ∆k > 0 for all k = 1, . . . , n.


This direction is typically proven by induction on the size n of the matrix.

• Base Case (n=1): A = [a11 ]. ∆1 = a11 . If ∆1 > 0, then a11 > 0. The quadratic form is xT Ax = a11 x21 .
Since a11 > 0, xT Ax > 0 for x1 ̸= 0. So A is PD.

• Inductive Step: Assume the criterion holds for (n − 1) × (n − 1) matrices. Let A be n × n with ∆k > 0
for k = 1, . . . , n. Let An−1 be the upper-left (n − 1) × (n − 1) submatrix. Its leading principal minors are
∆1 , . . . , ∆n−1 , which are all positive by assumption. By the inductive hypothesis, An−1 is PD. The proof
then proceeds by considering the block decomposition of A and using Schur complements or eigenvalue interlacing: since A_{n−1} is PD, at least n − 1 eigenvalues of A are positive, and ∆_n = det(A) > 0 (the product of all eigenvalues of A) then forces the remaining eigenvalue to be positive as well, so A is PD. Alternatively, one can show that
the conditions ∆k > 0 allow for a Cholesky decomposition A = LLT where L is a lower triangular matrix
with positive diagonal entries. Then xT Ax = xT LLT x = (LT x)T (LT x) = ||LT x||22 . Since L is invertible
(because its diagonal entries are positive), LT x ̸= 0 if x ̸= 0. Therefore, xT Ax > 0 for x ̸= 0, proving A
is PD.

Application and Examples


Sylvester’s criterion provides a direct way to check for positive definiteness without computing eigenvalues.
Example 5.4.1. A = [ 2 1 ; 1 2 ]
• ∆_1 = det([2]) = 2
• ∆_2 = det [ 2 1 ; 1 2 ] = (2)(2) − (1)(1) = 4 − 1 = 3
Since ∆_1 = 2 > 0 and ∆_2 = 3 > 0, A is positive definite.
Example 5.4.2. B = [ 1 2 0 ; 2 5 1 ; 0 1 3 ]
• ∆_1 = det([1]) = 1
• ∆_2 = det [ 1 2 ; 2 5 ] = (1)(5) − (2)(2) = 5 − 4 = 1
• ∆_3 = det(B) = 1 · det [ 5 1 ; 1 3 ] − 2 · det [ 2 1 ; 0 3 ] + 0 = 1 · (15 − 1) − 2 · (6 − 0) = 14 − 12 = 2
Since ∆_1 = 1 > 0, ∆_2 = 1 > 0, and ∆_3 = 2 > 0, B is positive definite.
Example 5.4.3. C = [ 1 1 ; 1 1 ]
• ∆_1 = det([1]) = 1
• ∆_2 = det [ 1 1 ; 1 1 ] = 1 − 1 = 0
Since ∆_2 is not strictly positive, C is not positive definite. (It is positive semidefinite.)
Example 5.4.4. D = [ 1 2 ; 2 1 ]
• ∆_1 = det([1]) = 1
• ∆_2 = det [ 1 2 ; 2 1 ] = 1 − 4 = −3
Since ∆_2 < 0, D is not positive definite. (It is indefinite.)
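To automate this check, here is a short Python sketch (an illustrative addition, reusing the example matrices above) that evaluates the leading principal minors ∆_1, . . . , ∆_n and applies Sylvester's criterion:

import numpy as np

def leading_principal_minors(A):
    """Return [det(A[:1,:1]), det(A[:2,:2]), ..., det(A)]."""
    n = A.shape[0]
    return [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]

def is_positive_definite(A):
    """Sylvester's criterion: all leading principal minors strictly positive."""
    return all(d > 0 for d in leading_principal_minors(A))

B = np.array([[1.0, 2.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 3.0]])
D = np.array([[1.0, 2.0],
              [2.0, 1.0]])
print(leading_principal_minors(B), is_positive_definite(B))   # approx [1, 1, 2], True
print(leading_principal_minors(D), is_positive_definite(D))   # approx [1, -3], False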

Limitations and Extensions


• Positive Semidefinite: Sylvester’s criterion (positivity of leading principal minors) does not directly
test for positive semidefiniteness. A matrix is positive semidefinite if and only if all its principal minors
(not just the leading ones) are non-negative. Checking all principal minors (there are 2^n − 1 of them) is
generally much harder than checking eigenvalues.
• Negative Definite: A symmetric matrix A is negative definite if and only if its leading principal minors
alternate in sign, starting with negative: (−1)k ∆k > 0 for all k = 1, . . . , n. (i.e., ∆1 < 0, ∆2 > 0, ∆3 <
0, . . .).
Sylvester’s criterion is a valuable tool for verifying positive definiteness, especially for smaller matrices or
matrices with structure where determinants are easily computed. It is frequently used in optimization and
stability analysis.
Chapter 6

The Matrix Exponential

The matrix exponential, denoted eAt , is a fundamental concept in the theory of linear time-invariant (LTI)
systems and differential equations. It generalizes the scalar exponential function eat to matrices and provides
the solution to homogeneous linear systems of differential equations ẋ = Ax.

6.1 Definition via Power Series (e^{At} = Σ (At)^k/k!)
Just as the scalar exponential function ez can be defined by its Taylor series expansion around z = 0:


e^z = 1 + z + z^2/2! + z^3/3! + · · · = Σ_{k=0}^{∞} z^k/k!

the matrix exponential eM for a square matrix M is defined by substituting the matrix M into this power series:
Definition 6.1.1. For an n × n matrix M (real or complex), the matrix exponential eM is defined by the
infinite series:

M2 M3 X Mk
eM = I + M + + + ··· =
2! 3! k!
k=0

where M 0 = I (the n × n identity matrix) and 0! = 1.

This definition yields an n × n matrix.

The Matrix Exponential eAt :


In the context of linear systems and differential equations, we are often interested in the matrix exponential
where the argument is the matrix A multiplied by a scalar variable t (usually representing time). Replacing M
with At in the definition gives:


e^{At} = I + (At) + (At)^2/2! + (At)^3/3! + · · · = Σ_{k=0}^{∞} (At)^k/k!

e^{At} = I + tA + (t^2/2!) A^2 + (t^3/3!) A^3 + · · · = Σ_{k=0}^{∞} (t^k/k!) A^k

This series defines the matrix exponential e^{At} as a function of t, resulting in an n × n matrix for each value of t.
Example 6.1.1. Let A = [ λ 0 ; 0 µ ]. Then
A^2 = [ λ^2 0 ; 0 µ^2 ],  A^3 = [ λ^3 0 ; 0 µ^3 ],  . . . ,  A^k = [ λ^k 0 ; 0 µ^k ].


e^{At} = Σ_{k=0}^{∞} (t^k/k!) A^k
       = Σ_{k=0}^{∞} (t^k/k!) [ λ^k 0 ; 0 µ^k ]
       = [ Σ_{k=0}^{∞} (λt)^k/k!   0 ; 0   Σ_{k=0}^{∞} (µt)^k/k! ]
       = [ e^{λt} 0 ; 0 e^{µt} ]

This shows that for a diagonal matrix, the matrix exponential is simply the diagonal matrix of the scalar exponentials of the diagonal entries (multiplied by t).
Example 6.1.2 (Nilpotent Matrix). Let N = [ 0 1 ; 0 0 ]. Then
N^2 = [ 0 0 ; 0 0 ] = 0.

All higher powers N k (k ≥ 2) are also the zero matrix.

e^{Nt} = I + tN + (t^2/2!) N^2 + (t^3/3!) N^3 + . . .
       = I + tN + 0 + 0 + . . .
       = [ 1 0 ; 0 1 ] + t [ 0 1 ; 0 0 ]
       = [ 1 t ; 0 1 ]

The power series definition is fundamental, but its convergence must be established to ensure it is well-defined
for any square matrix A and any scalar t. This will be addressed in the next section.
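As an illustrative numerical check (not part of the original notes; the matrix, time value, and truncation order are arbitrary), the truncated power series can be compared against a library implementation of the matrix exponential:

import numpy as np
from scipy.linalg import expm

def expm_series(A, t, terms=30):
    """Truncated power series sum_{k=0}^{terms-1} (t^k / k!) A^k."""
    n = A.shape[0]
    result = np.zeros((n, n))
    term = np.eye(n)                    # (t^0/0!) A^0 = I
    for k in range(terms):
        result += term
        term = term @ A * t / (k + 1)   # next term: multiply by At/(k+1)
    return result

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
t = 1.5
print(np.allclose(expm_series(A, t), expm(A * t)))   # True (the series converges quickly)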

6.2 Convergence of the Series


The definition of the matrix exponential eAt relies on an infinite power series:
e^{At} = Σ_{k=0}^{∞} (t^k/k!) A^k

For this definition to be meaningful, we must ensure that this series converges for any n × n matrix A and any
scalar t. Convergence here means that each element (i, j) of the partial sum matrices converges to a finite limit
as the number of terms goes to infinity.
We can prove convergence using matrix norms. Recall from Section 5.2 that for any matrix norm || · || that is
submultiplicative (like induced norms or the Frobenius norm), we have ||M k || ≤ ||M ||k .
Let’s consider the norm of each term in the series for eAt :
|| (t^k/k!) A^k || = (|t^k|/k!) ||A^k|| = (|t|^k/k!) ||A^k||
Using the submultiplicative property:
|| (t^k/k!) A^k || ≤ (|t|^k/k!) ||A||^k = (||A|| |t|)^k / k!
Now, consider the infinite series of the norms of the terms:
Σ_{k=0}^{∞} || (t^k/k!) A^k || ≤ Σ_{k=0}^{∞} (||A|| |t|)^k / k!

Let z = ||A|||t|. This is a non-negative real number. The series on the right becomes:

Σ_{k=0}^{∞} z^k / k!

This is exactly the power series expansion for the scalar exponential function ez . We know from calculus that
the power series for ez converges absolutely for all real (or complex) numbers z.
Therefore, the series Σ_{k=0}^{∞} (||A|| |t|)^k / k! converges to e^{||A|| |t|}.
Since the series of norms Σ_k ||(t^k/k!) A^k|| converges (it is bounded above by the convergent series for e^{||A|| |t|}), the matrix series Σ_k (t^k/k!) A^k converges absolutely (element-wise). Absolute convergence implies convergence in finite-dimensional spaces.
Formal Argument (Convergence of Matrix Series): A series of matrices Σ_k M_k converges if the sequence of partial sums S_m = Σ_{k=0}^{m} M_k converges element-wise. That is, for each entry (i, j), the sequence (S_m)_ij converges to a limit (S)_ij. A sufficient condition for convergence is absolute convergence with respect to some matrix norm, meaning that Σ_k ||M_k|| converges.
In our case, M_k = (t^k/k!) A^k. We showed that Σ_k ||M_k|| ≤ e^{||A|| |t|}, which is finite. By the Weierstrass M-test adapted for matrix series (or simply by noting that convergence of the norm series implies element-wise absolute convergence due to norm equivalence), the matrix series Σ_k (t^k/k!) A^k converges for all n × n matrices A and all scalars t.
Conclusion: The power series definition of the matrix exponential eAt is well-defined because the series con-
verges absolutely for all square matrices A and all scalars t.

6.3 Properties of e^{At} (e.g., d/dt e^{At} = Ae^{At}, e^0 = I, e^{A(t+s)} = e^{At}e^{As}, inverse)
Now that we have established the definition and convergence of the matrix exponential eAt , we can explore its
fundamental properties. Many of these properties are direct analogues of the properties of the scalar exponential
function eat .
Let A and B be n × n matrices, and let t, s be scalars.
1. Value at t=0:
eA·0 = e0 = I (the n × n identity matrix)
Proof. Substitute t = 0 into the power series definition:

e^{A·0} = Σ_{k=0}^{∞} (A · 0)^k / k! = A^0/0! + (A · 0)^1/1! + (A · 0)^2/2! + · · · = I + 0 + 0 + · · · = I

2. Derivative with respect to t:


d/dt e^{At} = Ae^{At} = e^{At} A
Proof. We can differentiate the power series term by term (justified by uniform convergence on any finite
interval of t):
d/dt e^{At} = d/dt ( Σ_{k=0}^{∞} (t^k/k!) A^k )
            = Σ_{k=0}^{∞} ( d/dt (t^k/k!) ) A^k
            = Σ_{k=1}^{∞} (k t^{k−1}/k!) A^k   (the k = 0 term is I, its derivative is 0)
            = Σ_{k=1}^{∞} (t^{k−1}/(k−1)!) A^k

Let j = k − 1. As k goes from 1 to ∞, j goes from 0 to ∞.

            = Σ_{j=0}^{∞} (t^j/j!) A^{j+1}
            = Σ_{j=0}^{∞} (t^j/j!) A A^j
            = A ( Σ_{j=0}^{∞} (t^j/j!) A^j )   (factoring out A from the left)
            = Ae^{At}

Similarly, we can factor A out from the right:

            = ( Σ_{j=0}^{∞} (t^j/j!) A^j ) A
            = e^{At} A

This shows that A and eAt commute.


3. Commutativity with A:
AeAt = eAt A
Proof. This was shown directly during the proof of the derivative property. Alternatively:
Ae^{At} = A ( Σ_k (t^k/k!) A^k ) = Σ_k (t^k/k!) A A^k = Σ_k (t^k/k!) A^{k+1}
e^{At} A = ( Σ_k (t^k/k!) A^k ) A = Σ_k (t^k/k!) A^k A = Σ_k (t^k/k!) A^{k+1}
Thus, AeAt = eAt A.
4. Exponential of a Sum (Commuting Case): If A and B commute (i.e., AB = BA), then e(A+B)t =
eAt eBt .
Proof Sketch (using power series). This proof mirrors the proof for scalar exponentials, relying on the binomial theorem for commuting matrices: ((A + B)t)^k = t^k (A + B)^k = t^k Σ_{j=0}^{k} C(k, j) A^j B^{k−j}. Expanding e^{(A+B)t} = Σ_{k=0}^{∞} (t^k/k!) (A + B)^k and e^{At} e^{Bt} = ( Σ_i (t^i/i!) A^i )( Σ_j (t^j/j!) B^j ) and carefully rearranging the sums (using the commutativity AB = BA) shows the two expressions are equal. The rearrangement is valid due to absolute convergence.
Caution: This property does not hold if A and B do not commute (AB ̸= BA). In general, e(A+B)t ̸=
eAt eBt .
5. Additive Property (Time):
eA(t+s) = eAt eAs
Proof. Let M1 = At and M2 = As. Since A commutes with itself, At commutes with As. We can use
Property 4:
eA(t+s) = eAt+As = eAt eAs .

6. Inverse: The inverse of eAt is e−At .


(eAt )−1 = e−At
Proof. Using Property 5 with s = −t:
eA(t−t) = eAt eA(−t)
eA·0 = eAt e−At
I = eAt e−At
Similarly, e−At eAt = I. Therefore, e−At is the inverse of eAt . This also implies that eAt is always invertible
(non-singular) for any A and t.

7. Similarity Transformation: If A = P M P −1 for some invertible matrix P , then eAt = P eM t P −1 .


Proof. First, note that Ak = (P M P −1 )k = P M k P −1 (this can be shown by induction). Now use the
power series definition:
e^{At} = Σ_{k=0}^{∞} (t^k/k!) A^k
       = Σ_{k=0}^{∞} (t^k/k!) (P M^k P^{−1})
       = P ( Σ_{k=0}^{∞} (t^k/k!) M^k ) P^{−1}   (factoring out P and P^{−1})
       = P e^{Mt} P^{−1}

This property is extremely useful for computing eAt when A is diagonalizable (M = Λ) or can be trans-
formed to Jordan form (M = J).
8. Transpose / Conjugate Transpose:
(e^{At})^T = e^{A^T t}
(e^{At})^H = e^{A^H t}

Proof (Transpose case). Using the property that (M k )T = (M T )k and that transpose distributes over
sums:

(e^{At})^T = ( Σ_{k=0}^{∞} (t^k/k!) A^k )^T
           = Σ_{k=0}^{∞} ( (t^k/k!) A^k )^T
           = Σ_{k=0}^{∞} (t^k/k!) (A^k)^T
           = Σ_{k=0}^{∞} (t^k/k!) (A^T)^k
           = e^{A^T t}

(The conjugate transpose case is analogous).


9. Block Diagonal Matrices: If A is a block diagonal matrix:

A =
[ A_1  0    ···  0   ]
[ 0    A_2  ···  0   ]
[ ⋮    ⋮    ⋱   ⋮   ]
[ 0    0    ···  A_p ]

Then e^{At} is also block diagonal with the exponentials of the blocks:

e^{At} =
[ e^{A_1 t}  0          ···  0          ]
[ 0          e^{A_2 t}  ···  0          ]
[ ⋮          ⋮          ⋱   ⋮          ]
[ 0          0          ···  e^{A_p t}  ]

Proof. This follows from the fact that powers Ak are block diagonal with blocks (Ai )k , and the power
series definition.
These properties are essential for manipulating and computing the matrix exponential, and they form the basis
for solving linear systems of differential equations.
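The following Python sketch (an illustrative check added here; the matrix and time points are arbitrary) verifies Properties 1, 5, 6, and 7 numerically:

import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
t, s = 0.7, 1.3

print(np.allclose(expm(A * 0), np.eye(2)))                        # e^{A*0} = I
print(np.allclose(expm(A * (t + s)), expm(A * t) @ expm(A * s)))  # e^{A(t+s)} = e^{At} e^{As}
print(np.allclose(np.linalg.inv(expm(A * t)), expm(-A * t)))      # (e^{At})^{-1} = e^{-At}

# Similarity: A = P M P^{-1}  =>  e^{At} = P e^{Mt} P^{-1}
eigvals, P = np.linalg.eig(A)
M = np.diag(eigvals)
print(np.allclose(expm(A * t), (P @ expm(M * t) @ np.linalg.inv(P)).real))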

6.4 Theorem: Solution to ẋ = Ax is x(t) = eAt x(0) (with Proof)


One of the most important applications of the matrix exponential eAt is that it provides the explicit solution
to the initial value problem for a system of homogeneous linear first-order ordinary differential equations with
constant coefficients.
Consider the system:
ẋ(t) = Ax(t)
with the initial condition:
x(t0 ) = x0
where x(t) is an n × 1 vector function of time t, ẋ(t) is its derivative with respect to t, A is a constant n × n
matrix, and x0 is the initial state vector at time t0 .
Theorem 6.4.1. The unique solution to the initial value problem ẋ = Ax, x(t0 ) = x0 is given by:
x(t) = eA(t−t0 ) x0
In the common case where the initial time is t0 = 0, the solution simplifies to:
x(t) = eAt x(0)

Proof. We need to show two things: first, that the proposed solution x(t) = eA(t−t0 ) x0 actually satisfies both
the differential equation and the initial condition, and second, that this solution is unique.
1. Verification of the Solution:
• Initial Condition: Let’s check the value of the proposed solution at t = t0 .
x(t0 ) = eA(t0 −t0 ) x0 = eA·0 x0
Using Property 1 from Section 6.3 (e0 = I):
x(t0 ) = Ix0 = x0
The initial condition is satisfied.
• Differential Equation: Let’s differentiate the proposed solution x(t) = eA(t−t0 ) x0 with respect to t.
Let τ = t − t0 . Then dτ /dt = 1. Using the chain rule and Property 2 from Section 6.3 (d/dτ eAτ = AeAτ ):
ẋ(t) = d/dt [ e^{A(t−t_0)} x_0 ]
      = ( d/dt e^{A(t−t_0)} ) x_0   (since x_0 is a constant vector)
      = ( d/dτ e^{Aτ} · dτ/dt ) x_0
      = [ Ae^{Aτ} · 1 ] x_0
      = Ae^{A(t−t_0)} x_0

Now, substitute the proposed solution x(t) = eA(t−t0 ) x0 back into the expression:
ẋ(t) = A[eA(t−t0 ) x0 ]
= Ax(t)
The differential equation ẋ = Ax is satisfied.
Since the proposed solution satisfies both the differential equation and the initial condition, it is indeed a solution
to the initial value problem.
2. Uniqueness of the Solution: Suppose there are two solutions, x1 (t) and x2 (t), that both satisfy ẋ = Ax
and x(t0 ) = x0 . Let y(t) = x1 (t) − x2 (t).
• Derivative of y(t):
ẏ(t) = d/dt ( x_1(t) − x_2(t) ) = ẋ_1(t) − ẋ_2(t)
Since both x1 and x2 satisfy ẋ = Ax, we have ẋ1 (t) = Ax1 (t) and ẋ2 (t) = Ax2 (t).
ẏ(t) = Ax1 (t) − Ax2 (t) = A(x1 (t) − x2 (t)) = Ay(t)
So, y(t) also satisfies the homogeneous differential equation ẏ = Ay.

• Initial Condition for y(t):

y(t0 ) = x1 (t0 ) − x2 (t0 ) = x0 − x0 = 0.

So, y(t) satisfies the initial condition y(t0 ) = 0.


Now, consider the function z(t) = e^{−A(t−t_0)} y(t). Let’s differentiate z(t) using the product rule:

ż(t) = ( d/dt e^{−A(t−t_0)} ) y(t) + e^{−A(t−t_0)} ẏ(t)

Using the chain rule and the derivative property for the first term (let τ = −(t − t_0), so dτ/dt = −1):

d/dt e^{−A(t−t_0)} = d/dτ e^{Aτ} · dτ/dt = (Ae^{Aτ}) · (−1) = −Ae^{−A(t−t_0)}
Substituting this and ẏ(t) = Ay(t) into the expression for ż(t):

ż(t) = [−Ae−A(t−t0 ) ]y(t) + e−A(t−t0 ) [Ay(t)]

ż(t) = −Ae−A(t−t0 ) y(t) + e−A(t−t0 ) Ay(t)


Since A and e−A(t−t0 ) commute (Property 3, Section 6.3), we can rearrange the second term:

ż(t) = −Ae−A(t−t0 ) y(t) + Ae−A(t−t0 ) y(t)

ż(t) = 0 (the zero vector)


Since the derivative ż(t) is the zero vector for all t, the function z(t) must be a constant vector.

z(t) = C (constant vector)

To find the constant C, we evaluate z(t) at t = t0 :

C = z(t0 ) = e−A(t0 −t0 ) y(t0 ) = e0 y(t0 ) = I · 0 = 0.

So, the constant vector C is the zero vector. This means z(t) = 0 for all t.

z(t) = e−A(t−t0 ) y(t) = 0

Since e−A(t−t0 ) is always invertible (Property 6, Section 6.3), we can multiply by its inverse eA(t−t0 ) :

eA(t−t0 ) [e−A(t−t0 ) y(t)] = eA(t−t0 ) · 0

[eA(t−t0 ) e−A(t−t0 ) ]y(t) = 0

Iy(t) = 0

y(t) = 0 for all t.


Since y(t) = x1 (t) − x2 (t), we have x1 (t) − x2 (t) = 0, which implies x1 (t) = x2 (t) for all t. Therefore, the
solution is unique.

Conclusion: The matrix exponential eA(t−t0 ) provides the unique solution x(t) = eA(t−t0 ) x0 to the funda-
mental initial value problem ẋ = Ax, x(t0 ) = x0 . This result is central to the analysis and solution of linear
time-invariant systems.
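As an illustrative numerical cross-check (not part of the original text; the system matrix and initial state are arbitrary), the closed-form solution x(t) = e^{At} x(0) can be compared with a general-purpose ODE integrator:

import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
x0 = np.array([1.0, 0.0])
t_final = 2.0

# Closed-form solution at t_final
x_exact = expm(A * t_final) @ x0

# Numerical integration of xdot = A x
sol = solve_ivp(lambda t, x: A @ x, (0.0, t_final), x0, rtol=1e-10, atol=1e-12)
x_numeric = sol.y[:, -1]

print(x_exact, x_numeric)                            # the two agree to integration tolerance
print(np.allclose(x_exact, x_numeric, atol=1e-6))    # True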

6.5 Methods for Computing eAt


While the power series definition e^{At} = Σ_k (t^k/k!) A^k is fundamental, it is often impractical for direct computation, except for simple cases like diagonal or nilpotent matrices. Fortunately, several alternative methods exist,
leveraging properties of Laplace transforms, eigenvalues/eigenvectors, Jordan forms, and the Cayley-Hamilton
theorem.

6.5.1 Using Laplace Transform: eAt = L−1 {(sI − A)−1 } (Derivation)


The Laplace transform provides a powerful method for solving linear differential equations and can be used to
compute eAt .
Recall the definition of the Laplace transform of a matrix function F (t):
L{F(t)} = F̃(s) = ∫_0^∞ e^{−st} F(t) dt

And the property for the transform of a derivative:

L{dF/dt} = sF̃ (s) − F (0)

Consider the differential equation satisfied by eAt (from Section 6.3):

d At
e = AeAt
dt
with the initial condition eA·0 = I.
Let X(t) = eAt . Then the equation is dX/dt = AX, with X(0) = I. Take the Laplace transform of both sides
of the differential equation:
L{dX/dt} = L{AX}
Using the derivative property on the left side and linearity on the right side (A is constant):

sX̃(s) − X(0) = AL{X(t)} = AX̃(s)

Substitute the initial condition X(0) = I:

sX̃(s) − I = AX̃(s)

Now, solve for X̃(s):


sX̃(s) − AX̃(s) = I
(sI − A)X̃(s) = I
Assuming (sI − A) is invertible (which it is for s sufficiently large, specifically for Re(s) greater than the real
part of any eigenvalue of A), we can multiply by its inverse:

X̃(s) = (sI − A)−1

Since X(t) = eAt , we have found the Laplace transform of the matrix exponential:

L{eAt } = (sI − A)−1

To find eAt itself, we take the inverse Laplace transform:

eAt = L−1 {(sI − A)−1 }

Derivation Summary:
1. Start with the defining differential equation dX/dt = AX, X(0) = I, where X(t) = eAt .
2. Take the Laplace transform: sX̃(s) − X(0) = AX̃(s).
3. Substitute X(0) = I: sX̃(s) − I = AX̃(s).
4. Rearrange: (sI − A)X̃(s) = I.
5. Solve for X̃(s): X̃(s) = (sI − A)−1 .
6. Take the inverse Laplace transform: X(t) = eAt = L−1 {(sI − A)−1 }.
Computation: This method requires computing the inverse of the matrix (sI − A), which involves symbolic
manipulation with the variable s (often using Cramer’s rule or adjugate matrix formula: (sI − A)−1 = adj(sI −
A)/ det(sI − A)), and then finding the inverse Laplace transform of each element of the resulting matrix of
rational functions in s. This can be computationally intensive but provides a closed-form expression.
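The symbolic steps can be carried out with a computer algebra system. The following sympy sketch (an illustrative addition; the 2 × 2 matrix is arbitrary) computes e^{At} = L^{−1}{(sI − A)^{−1}} element by element:

import sympy as sp

t = sp.symbols('t', positive=True)
s = sp.symbols('s')

A = sp.Matrix([[0, 1],
               [-2, -3]])                  # eigenvalues -1 and -2
n = A.shape[0]

resolvent = (s * sp.eye(n) - A).inv()      # (sI - A)^{-1}, rational functions of s
eAt = resolvent.applyfunc(lambda F: sp.inverse_laplace_transform(F, s, t))
print(sp.simplify(eAt))
# Expected (up to a Heaviside(t) factor on some sympy versions):
# Matrix([[ 2*exp(-t) -   exp(-2*t),    exp(-t) -   exp(-2*t)],
#         [-2*exp(-t) + 2*exp(-2*t),   -exp(-t) + 2*exp(-2*t)]])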

6.5.2 Using Diagonalization: eAt = V eΛt V −1 (Derivation)


If the matrix A is diagonalizable, this method is often the most straightforward. If A is diagonalizable, then
A = V ΛV −1 , where Λ is a diagonal matrix of eigenvalues (λ1 , . . . , λn ) and V is the invertible matrix whose
columns are the corresponding linearly independent eigenvectors.
Using the similarity transformation property (Property 7, Section 6.3):
eAt = P eM t P −1
Substitute P = V and M = Λ:
eAt = V eΛt V −1
Since Λ is diagonal, Λ = diag(λ1 , . . . , λn ), we know from the example in Section 6.1 that eΛt is also diagonal:
eΛt = diag(eλ1 t , . . . , eλn t )

Derivation Summary:
1. Assume A is diagonalizable: A = V ΛV −1 .
2. Apply the similarity property of matrix exponentials: eAt = V eΛt V −1 .
3. Compute eΛt for the diagonal matrix Λ: eΛt = diag(eλ1 t , . . . , eλn t ).
4. Substitute back: eAt = V diag(eλ1 t , . . . , eλn t )V −1 .
Computation:
1. Find the eigenvalues λi and corresponding eigenvectors vi of A.
2. Check if A is diagonalizable (n linearly independent eigenvectors).
3. Form V = [v1 | . . . |vn ] and Λ = diag(λ1 , . . . , λn ).
4. Compute V −1 .
5. Form eΛt = diag(eλ1 t , . . . , eλn t ).
6. Calculate the product eAt = V eΛt V −1 .
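These steps translate directly into a few lines of Python (an illustrative sketch, assuming A is diagonalizable; the example matrix and time value are arbitrary):

import numpy as np
from scipy.linalg import expm

def expm_diag(A, t):
    """e^{At} = V diag(e^{lambda_i t}) V^{-1}, assuming A is diagonalizable."""
    lam, V = np.linalg.eig(A)            # eigenvalues and eigenvector matrix
    eLt = np.diag(np.exp(lam * t))       # e^{Lambda t}
    return V @ eLt @ np.linalg.inv(V)

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])               # eigenvalues 1 and 3
t = 0.8
print(np.allclose(expm_diag(A, t).real, expm(A * t)))   # True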

6.5.3 Using Jordan Form: eAt = P eJt P −1 (Derivation)


This method works for any square matrix A (over C, or over R if all eigenvalues are real), including non-
diagonalizable ones. It relies on transforming A to its Jordan Canonical Form (JCF).
Any square matrix A can be written as A = P JP −1 , where J is the JCF of A and P is the matrix whose
columns form a Jordan basis for A.
Using the similarity transformation property (Property 7, Section 6.3):
eAt = P eJt P −1
The JCF matrix J is block diagonal:

J =
[ J_1  0    ···  0   ]
[ 0    J_2  ···  0   ]
[ ⋮    ⋮    ⋱   ⋮   ]
[ 0    0    ···  J_p ]

where each J_i = J_{k_i}(λ_i) is a Jordan block. Using the block diagonal property (Property 9, Section 6.3):

e^{Jt} =
[ e^{J_1 t}  0          ···  0          ]
[ 0          e^{J_2 t}  ···  0          ]
[ ⋮          ⋮          ⋱   ⋮          ]
[ 0          0          ···  e^{J_p t}  ]
The computation reduces to finding the exponential of each Jordan block Ji = Jm (λ) (size m × m). As shown
in Section 4.4, we write J_i = λI + N, where N is nilpotent (N^m = 0).

e^{J_i t} = e^{λt} · Σ_{j=0}^{m−1} N^j t^j / j!

e^{J_i t} = e^{λt} ·
[ 1  t  t^2/2!  ···  t^{m−1}/(m−1)! ]
[ 0  1  t       ···  t^{m−2}/(m−2)! ]
[ ⋮  ⋮  ⋱      ⋱   ⋮              ]
[ 0  0  0       ···  t              ]
[ 0  0  0       ···  1              ]

Derivation Summary:
1. Find the Jordan form J and Jordan basis matrix P such that A = P JP −1 .
2. Apply the similarity property: eAt = P eJt P −1 .
3. Compute e^{Jt} by computing the exponential of each Jordan block J_i using the formula e^{J_i t} = e^{λt} Σ_{j=0}^{m−1} (N^j t^j / j!).

4. Assemble eJt from the block exponentials eJi t .


5. Calculate the product eAt = P eJt P −1 .
Computation: This method requires finding the JCF and the Jordan basis, which can be complex. However,
once J and P are found, computing eJt and the final product is relatively straightforward using the explicit
formula for eJi t .

6.5.4 Using Cayley-Hamilton Theorem (Theorem Statement, Application Exam-


ple)
Cayley-Hamilton Theorem: Every square matrix A satisfies its own characteristic equation. If p(λ) =
det(A − λI) = cn λn + · · · + c1 λ + c0 is the characteristic polynomial of A, then p(A) = cn An + · · · + c1 A + c0 I = 0
(the zero matrix).
This theorem implies that any power Ak (for k ≥ n) can be expressed as a linear combination of lower powers
I, A, . . . , An−1 .
Application to e^{At}: The infinite series e^{At} = Σ_{k=0}^{∞} (t^k/k!) A^k involves all powers of A. Using the Cayley-
Hamilton theorem, we can argue that eAt must be expressible as a finite polynomial in A of degree at most
n − 1, with coefficients that depend on t:

eAt = α0 (t)I + α1 (t)A + α2 (t)A2 + · · · + αn−1 (t)An−1

The coefficients αi (t) can be found by considering the eigenvalues. If we substitute an eigenvalue λi into the
characteristic polynomial, we get p(λi ) = 0. It can be shown that the scalar exponential eλi t must satisfy the
same relationship with the coefficients αj (t):

e^{λ_i t} = α_0(t) + α_1(t) λ_i + α_2(t) λ_i^2 + · · · + α_{n−1}(t) λ_i^{n−1}

If the matrix A has n distinct eigenvalues λ1 , . . . , λn , this gives a system of n linear equations for the n unknown
coefficients α0 (t), . . . , αn−1 (t). Solving this system yields the coefficients, which can then be plugged back into
the expression for eAt .
If there are repeated eigenvalues, we need additional equations obtained by differentiating the equation with
respect to λ and evaluating at the repeated eigenvalue. For an eigenvalue λi with multiplicity m, we use the
equations obtained from:

dk λt dk h X i
[e ]| λ=λi = α j (t)λj
|λ=λi for k = 0, 1, . . . , m − 1.
dλk dλk

Computation:
1. Find the characteristic polynomial p(λ) of A.
2. Find the eigenvalues λi (roots of p(λ) = 0).
3. Set up the system of n equations relating eλi t (and possibly its derivatives w.r.t. λ) to the polynomial in
λi with unknown coefficients α0 (t), . . . , αn−1 (t).
4. Solve the system for α0 (t), . . . , αn−1 (t).
5. Compute eAt = α0 (t)I + α1 (t)A + · · · + αn−1 (t)An−1 .
Example 6.5.1. A = [ 2 1 ; 1 2 ]. Eigenvalues λ_1 = 1, λ_2 = 3. n = 2. Assume e^{At} = α_0(t) I + α_1(t) A. Equations
from eigenvalues:

e1t = α0 (t) + α1 (t)(1)


e3t = α0 (t) + α1 (t)(3)

Subtracting the first from the second: e3t − et = 2α1 (t) =⇒ α1 (t) = (e3t − et )/2. Substituting α1 into the first
equation: et = α0 (t) + (e3t − et )/2 =⇒ α0 (t) = et − (e3t − et )/2 = (2et − e3t + et )/2 = (3et − e3t )/2.
Now compute eAt :

e^{At} = α_0(t) I + α_1(t) A
       = ((3e^t − e^{3t})/2) [ 1 0 ; 0 1 ] + ((e^{3t} − e^t)/2) [ 2 1 ; 1 2 ]
       = (1/2) [ 3e^t − e^{3t}  0 ; 0  3e^t − e^{3t} ] + (1/2) [ 2(e^{3t} − e^t)  e^{3t} − e^t ; e^{3t} − e^t  2(e^{3t} − e^t) ]
       = (1/2) [ e^t + e^{3t}  e^{3t} − e^t ; e^{3t} − e^t  e^t + e^{3t} ]

6.5.5 Eigenstructure Expansion (Spectral Decomposition): e^{At} = Σ_i e^{λ_i t} v_i w_i^T (Derivation for distinct eigenvalues)
This method applies when the matrix A is diagonalizable and provides insight into the modal structure of
the solution. Assume A has n distinct eigenvalues λ1 , . . . , λn and corresponding right eigenvectors v1 , . . . , vn
(Avi = λi vi ). Since the eigenvalues are distinct, A is diagonalizable. Let w1 , . . . , wn be the corresponding
left eigenvectors (wiT A = λi wiT , or AT wi = λi wi ). The sets {vi } and {wi } can be chosen such that they are
biorthogonal: wi^T vj = δij (1 if i = j, 0 if i ≠ j).
The matrix A can be written using its spectral decomposition:
A = Σ_{i=1}^{n} λi vi wi^T

Similarly, any function f (A) that can be expressed as a power series (like eAt ) can be written using the eigen-
values:
f(A) = Σ_{i=1}^{n} f(λi) vi wi^T

For the matrix exponential, f(λ) = e^{λt}. Therefore:

e^{At} = Σ_{i=1}^{n} e^{λi t} vi wi^T

Derivation: Start with the solution x(t) = eAt x(0). Since {v1 , . . . , vn } form a basis, we can write the initial
condition x(0) as a linear combination of eigenvectors:
x(0) = Σ_{j=1}^{n} cj vj

Using biorthogonality, the coefficients are cj = wjT x(0).


x(0) = Σ_{j=1}^{n} (wj^T x(0)) vj

Now apply e^{At} to x(0):

x(t) = e^{At} x(0) = e^{At} ( Σ_{j=1}^{n} cj vj )

= Σ_{j=1}^{n} cj (e^{At} vj)

We need to compute e^{At} vj. Using the power series:

e^{At} vj = ( Σ_{k=0}^{∞} (t^k/k!) A^k ) vj
          = Σ_{k=0}^{∞} (t^k/k!) (A^k vj)

Since A vj = λj vj, then A^k vj = λj^k vj.

          = Σ_{k=0}^{∞} (t^k/k!) (λj^k vj)
          = ( Σ_{k=0}^{∞} (λj t)^k / k! ) vj
          = e^{λj t} vj

Substitute this back into the expression for x(t):


x(t) = Σ_{j=1}^{n} cj e^{λj t} vj

Substitute cj = wjT x(0):


x(t) = Σ_{j=1}^{n} (wj^T x(0)) e^{λj t} vj

x(t) = Σ_{j=1}^{n} e^{λj t} vj wj^T x(0)

x(t) = ( Σ_{i=1}^{n} e^{λi t} vi wi^T ) x(0)

Comparing this with x(t) = eAt x(0), we identify:


e^{At} = Σ_{i=1}^{n} e^{λi t} vi wi^T

Computation:
1. Find eigenvalues λi .
2. Find right eigenvectors vi .
3. Find left eigenvectors wi (eigenvectors of AT ).
4. Normalize vi and wi such that wiT vj = δij .
5. Compute the sum e^{At} = Σ_{i=1}^{n} e^{λi t} vi wi^T.

This method explicitly shows how the solution is a sum of modes eλi t shaped by the eigenvectors.
Each of these methods provides a way to compute eAt , with varying degrees of computational complexity and
applicability depending on the properties of the matrix A.
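To make the last method concrete, here is a minimal numerical sketch (assuming numpy/scipy and distinct eigenvalues): the rows of V^{-1} serve as the biorthogonally normalized left eigenvectors, so the spectral sum can be formed directly and checked against expm.

import numpy as np
from scipy.linalg import expm

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
t = 0.5

lam, V = np.linalg.eig(A)        # columns of V are right eigenvectors v_i
W = np.linalg.inv(V)             # rows of W are left eigenvectors w_i^T,
                                 # already normalized so that W @ V = I

eAt = sum(np.exp(lam[i] * t) * np.outer(V[:, i], W[i, :]) for i in range(len(lam)))
print(np.allclose(eAt, expm(A * t)))   # True: spectral sum equals the matrix exponential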
Part I

Linear Systems Theory


This part transitions from the foundational linear algebra concepts to their application in the analysis and
description of linear dynamical systems. We will focus on the state-space representation, which provides a
powerful and unified framework for modeling, analyzing, and designing control systems.
Chapter 7

State-Space Representation

State-space representation is a mathematical model of a physical system described by a set of input, output, and
state variables related by first-order differential equations (for continuous-time systems) or difference equations
(for discrete-time systems). This approach provides a complete description of the system’s internal dynamics,
unlike the input-output representation (e.g., transfer functions) which only describes the relationship between
the input and output signals.

7.1 Concept of State, State Variables, State Vector


The Concept of State
The state of a dynamical system is a minimal set of variables such that the knowledge of these variables at a
specific time t0 , together with the knowledge of the input signals for all times t ≥ t0 , completely determines the
behavior of the system (including the output and the state variables themselves) for all t ≥ t0 .
In essence, the state encapsulates the entire history of the system relevant to its future evolution. It acts as the
memory of the system. Knowing the state at time t0 means we don’t need to know the inputs or states prior to
t0 to predict the future, given the future inputs.

State Variables
The variables that constitute the state are called state variables. They are typically chosen to represent
quantities that describe the internal energy storage or memory elements of the system. For example:
• In mechanical systems: positions and velocities of masses.
• In electrical circuits: voltages across capacitors and currents through inductors.
• In thermal systems: temperatures at various points.
The choice of state variables for a given system is not unique, but the number of state variables required for a
minimal representation (the dimension of the state) is unique and is called the order of the system.

State Vector
The state variables at a given time t are typically arranged into a column vector called the state vector,
denoted by x(t).
If a system has n state variables, x1(t), x2(t), . . . , xn(t), the state vector is:

x(t) = [ x1(t), x2(t), . . . , xn(t) ]^T

The state vector x(t) belongs to an n-dimensional vector space called the state space, typically Rn or Cn . The
evolution of the system over time corresponds to a trajectory traced by the state vector x(t) within the state
space.


Input and Output Vectors


In addition to the state vector, state-space models include:
• Input Vector (u(t)): A vector representing the external signals applied to the system that influence its
behavior. If there are m inputs, u(t) ∈ Rm .
• Output Vector (y(t)): A vector representing the signals measured or observed from the system. These
are typically functions of the state variables and possibly the input variables. If there are p outputs,
y(t) ∈ Rp .
The state-space representation provides a structured way to describe the relationship between the input u(t),
the state x(t), and the output y(t) through a set of first-order equations.

7.2 Linear Time-Invariant (LTI) Systems: Standard form ẋ = Ax + Bu, y = Cx + Du
A crucial and widely studied class of dynamical systems is Linear Time-Invariant (LTI) systems. These sys-
tems are characterized by linear relationships between state, input, and output variables, and their governing
equations do not change over time.

Standard State-Space Form


A continuous-time LTI system is typically represented in the standard state-space form by two equations:
1. State Equation: Describes the evolution of the state vector x(t) over time. It is a first-order vector
differential equation:
ẋ(t) = Ax(t) + Bu(t)

2. Output Equation: Describes how the output vector y(t) is obtained from the state vector x(t) and the
input vector u(t). It is an algebraic equation:

y(t) = Cx(t) + Du(t)

Where:
• t: Time (continuous variable).
• x(t): The n × 1 state vector (n is the order of the system).
• ẋ(t): The time derivative of the state vector, dx/dt.
• u(t): The m × 1 input vector (m is the number of inputs).
• y(t): The p × 1 output vector (p is the number of outputs).
• A: The n × n state matrix (or system matrix). It describes the internal dynamics of the system (how
the state evolves in the absence of input).
• B: The n × m input matrix (or control matrix). It describes how the inputs affect the state dynamics.
• C: The p × n output matrix (or sensor matrix). It describes how the state variables are combined to
form the outputs.
• D: The p × m feedthrough matrix (or direct transmission matrix). It describes the direct influence of
the inputs on the outputs, bypassing the state dynamics.
Key Characteristics:
• Linearity: The equations are linear combinations of the state and input vectors. This allows the use of
powerful linear algebra tools for analysis and design.
• Time-Invariance: The matrices A, B, C, and D are constant; they do not depend on time t. This implies
that the system’s behavior is consistent regardless of when the inputs are applied.

Interpretation of Matrices
• A Matrix (State Matrix): The term Ax(t) in the state equation governs the system’s natural response
(how the state changes if u(t) = 0). The eigenvalues of A determine the stability and modes of the
unforced system.
• B Matrix (Input Matrix): The term Bu(t) shows how the external inputs u(t) drive the state variables.
If an element Bij is zero, the j-th input has no direct effect on the rate of change of the i-th state variable.
• C Matrix (Output Matrix): The term Cx(t) determines how the internal state x(t) is observed
through the outputs y(t). If an element Cij is zero, the j-th state variable does not directly contribute to
the i-th output.
• D Matrix (Feedthrough Matrix): The term Du(t) represents a direct path from input to output.
If D = 0 (the zero matrix), the system is called strictly proper. In many physical systems, especially
those with inertia, D is often zero because inputs typically affect the state derivatives first, and the state
then influences the output. Non-zero D implies an instantaneous effect of the input on the output.
The quartet of matrices (A, B, C, D) completely defines the LTI system in state-space form.
Example 7.2.1 (Simple Mass-Spring-Damper). Consider a mass m attached to a wall by a spring (stiffness k)
and a damper (damping coefficient c). An external force u(t) is applied to the mass. Let the position of the
mass from equilibrium be z(t). The equation of motion is: mz̈ + cż + kz = u(t).
To put this in state-space form, we choose state variables. A common choice for second-order mechanical
systems is position and velocity:
• x1 (t) = z(t) (position)
• x2 (t) = ż(t) (velocity)
Now, find the derivatives of the state variables:
• ẋ1 (t) = ż(t) = x2 (t)
• ẋ2 (t) = z̈(t). From the equation of motion: z̈ = (1/m)[u(t) − cż − kz] = (1/m)[u(t) − cx2 (t) − kx1 (t)]
Arrange these into the state equation ẋ = Ax + Bu:

[ ẋ1 ]   [ 0      1    ] [ x1 ]   [ 0   ]
[ ẋ2 ] = [ −k/m   −c/m ] [ x2 ] + [ 1/m ] u(t)

So, A = [ 0, 1 ; −k/m, −c/m ] and B = [ 0 ; 1/m ].
If we choose the output to be the position, y(t) = z(t) = x1(t), then the output equation y = Cx + Du is:

y(t) = [ 1, 0 ] [ x1 ; x2 ] + [ 0 ] u(t)

So, C = [ 1, 0 ] and D = [0] (a scalar zero in this case). This LTI state-space representation (A, B, C, D) fully describes the dynamics of the mass-spring-damper system.
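A minimal simulation sketch of this model follows, assuming scipy is available and using illustrative parameter values m = 1.0, c = 0.5, k = 2.0 (not specified in the example): it packs (A, B, C, D) into a scipy.signal.StateSpace object and computes the position response to a unit step force.

import numpy as np
from scipy import signal

m, c, k = 1.0, 0.5, 2.0                  # assumed numerical parameter values
A = np.array([[0.0, 1.0],
              [-k / m, -c / m]])
B = np.array([[0.0],
              [1.0 / m]])
C = np.array([[1.0, 0.0]])               # output y = x1 = position z
D = np.array([[0.0]])

sys = signal.StateSpace(A, B, C, D)      # continuous-time LTI model
t, y = signal.step(sys)                  # position response to a unit step force
print(y[-1])                             # settles near the static deflection 1/k = 0.5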

7.3 Linear Time-Varying (LTV) Systems: Standard form ẋ = A(t)x + B(t)u, y = C(t)x + D(t)u
While LTI systems are fundamental and widely applicable, many real-world systems have parameters that change
over time. Examples include aircraft whose dynamics change with altitude and speed, or chemical processes
where reaction rates vary with temperature. Such systems are modeled as Linear Time-Varying (LTV) systems.

Standard State-Space Form


An LTV system retains the linear structure of the state-space equations, but the matrices defining the system
are allowed to be functions of time. The standard state-space form for a continuous-time LTV system is:
1. State Equation:
ẋ(t) = A(t)x(t) + B(t)u(t)

2. Output Equation:
y(t) = C(t)x(t) + D(t)u(t)

Where:
• t: Time (continuous variable).
• x(t): The n × 1 state vector.
• ẋ(t): The time derivative of the state vector.
• u(t): The m × 1 input vector.
• y(t): The p × 1 output vector.
• A(t): The n × n time-varying state matrix.
• B(t): The n × m time-varying input matrix.
• C(t): The p × n time-varying output matrix.
• D(t): The p × m time-varying feedthrough matrix.
Key Characteristics:
• Linearity: The equations remain linear in x(t) and u(t) at any given time t.
• Time-Varying: At least one of the matrices A(t), B(t), C(t), or D(t) explicitly depends on time t. If all
matrices are constant, the system reduces to an LTI system.

Comparison with LTI Systems


The key difference is the time dependence of the system matrices. This has significant implications for the
analysis and solution of LTV systems:
• Solution: Unlike LTI systems where the solution involves the matrix exponential eA(t−t0 ) , the solution
for LTV systems involves the more complex concept of the state transition matrix Φ(t, t0 ), which
generally does not have a simple closed-form expression like the matrix exponential. (This will be covered
in Chapter 8).
• Eigenvalue Analysis: Standard eigenvalue analysis of A(t) at a specific time t does not provide complete
information about stability or system behavior over time, as the dynamics are constantly changing.
• Transfer Functions: The concept of a transfer function, based on Laplace transforms, is generally not
applicable to LTV systems because the Laplace transform assumes time-invariance.
Example 7.3.1 (RC Circuit with Time-Varying Resistor). Consider a simple series RC circuit where the
resistor R(t) varies with time, and the input is the voltage source u(t). We want to find the voltage across the
capacitor, y(t).
Let the state variable be the voltage across the capacitor, x(t) = VC (t). By Kirchhoff’s Voltage Law (KVL):

u(t) = VR (t) + VC (t) = R(t)i(t) + x(t)

The current through the capacitor is i(t) = CdVC /dt = C ẋ(t). Substituting i(t) into the KVL equation:

u(t) = R(t)[C ẋ(t)] + x(t)

Rearranging to find the state equation for ẋ(t):

R(t)C ẋ(t) = u(t) − x(t)


   
ẋ(t) = −(1/(R(t)C)) x(t) + (1/(R(t)C)) u(t)
This is in the form ẋ = A(t)x + B(t)u, where:
• A(t) = −1/(R(t)C) (a 1 × 1 matrix, i.e., a scalar function of time)
• B(t) = 1/(R(t)C) (a 1 × 1 matrix, i.e., a scalar function of time)
The output is y(t) = VC (t) = x(t). So, the output equation is y = C(t)x + D(t)u, where:

• C(t) = 1 (constant 1 × 1 matrix)


• D(t) = 0 (constant 1 × 1 matrix)
Since A(t) and B(t) depend on time through R(t), this is an LTV system.
LTV systems provide a more general framework than LTI systems but are often more challenging to analyze
due to the time-varying nature of their dynamics.
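Because no closed-form matrix exponential is available here, LTV models such as this RC circuit are usually simulated numerically. The sketch below (assuming scipy, an illustrative resistance profile R(t) = 1 + 0.5 sin t, C = 1, and a unit-step source) integrates ẋ = A(t)x + B(t)u directly with solve_ivp.

import numpy as np
from scipy.integrate import solve_ivp

C_cap = 1.0
R = lambda t: 1.0 + 0.5 * np.sin(t)        # assumed time-varying resistance profile
u = lambda t: 1.0                          # unit-step source voltage

def xdot(t, x):
    a = -1.0 / (R(t) * C_cap)              # A(t), a scalar here
    b = 1.0 / (R(t) * C_cap)               # B(t)
    return a * x + b * u(t)

sol = solve_ivp(xdot, (0.0, 10.0), [0.0], max_step=0.05)
print(sol.y[0, -1])                        # capacitor voltage approaches u = 1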

7.4 Examples: Electrical Circuits, Mechanical Systems


To solidify the understanding of state-space representation for LTI and LTV systems, let’s look at deriving the
models for common physical systems.
Example 7.4.1 (RLC Circuit (LTI)). Consider a series RLC circuit with input voltage u(t) = Vin (t). We want
to model the system, perhaps with the output y(t) being the voltage across the capacitor, VC (t).
[Placeholder for Series RLC Circuit Diagram]
State Variables: A natural choice for state variables in electrical circuits corresponds to energy storage
elements: the voltage across the capacitor (related to electric field energy) and the current through the inductor
(related to magnetic field energy).
• x1 (t) = VC (t) (Voltage across capacitor)
• x2 (t) = iL (t) (Current through inductor)
Derive State Equations: We need equations for ẋ1 and ẋ2 .
• Capacitor Equation: The current through the capacitor is iC = CdVC /dt. In a series circuit, the
current is the same through all elements, so iC = iL = x2 . Therefore:

x2 = C ẋ1 =⇒ ẋ1 = (1/C) x2

• Inductor Equation: The voltage across the inductor is VL = LdiL /dt = Lẋ2 .
• KVL Equation: Apply Kirchhoff’s Voltage Law around the loop:

Vin (t) − VR (t) − VL (t) − VC (t) = 0


u(t) − RiL (t) − Lẋ2 (t) − x1 (t) = 0
u(t) − Rx2 (t) − Lẋ2 (t) − x1 (t) = 0

Solve for ẋ2 :


Lẋ2(t) = −x1(t) − Rx2(t) + u(t)

ẋ2(t) = −(1/L) x1(t) − (R/L) x2(t) + (1/L) u(t)

State-Space Form: Combine the equations for ẋ1 and ẋ2:

[ ẋ1 ]   [ 0      1/C  ] [ x1 ]   [ 0   ]
[ ẋ2 ] = [ −1/L   −R/L ] [ x2 ] + [ 1/L ] u(t)

This gives the LTI matrices:

A = [ 0, 1/C ; −1/L, −R/L ],   B = [ 0 ; 1/L ]
Output Equation: If the output is the capacitor voltage, y(t) = VC(t) = x1(t).

y(t) = [ 1, 0 ] [ x1 ; x2 ] + [ 0 ] u(t)

So, C = [ 1, 0 ] and D = [0].
If the output were the current, y(t) = iL(t) = x2(t), then C = [ 0, 1 ] and D = [0]. If the output were the voltage across the resistor, y(t) = VR(t) = R iL(t) = R x2(t), then C = [ 0, R ] and D = [0].

Example 7.4.2 (Two-Mass System (LTI)). Consider two masses, m1 and m2 , connected by a spring (k2 ) and
damper (c2 ). Mass m1 is connected to a wall by another spring (k1 ) and damper (c1 ). An external force u(t)
acts on mass m1 .
[Placeholder for Two-Mass System Diagram]
Let z1 (t) and z2 (t) be the displacements of m1 and m2 from their equilibrium positions.
Equations of Motion (Newton’s Second Law):
• For m1 :
m1 z̈1 = −k1 z1 − c1 ż1 + k2 (z2 − z1 ) + c2 (ż2 − ż1 ) + u(t)
m1 z̈1 = −(k1 + k2 )z1 − (c1 + c2 )ż1 + k2 z2 + c2 ż2 + u(t)

• For m2 :
m2 z̈2 = −k2 (z2 − z1 ) − c2 (ż2 − ż1 )
m2 z̈2 = k2 z1 + c2 ż1 − k2 z2 − c2 ż2

State Variables: Choose positions and velocities:


• x1 = z1
• x2 = ż1
• x3 = z2
• x4 = ż2
Derive State Equations:
• ẋ1 = ż1 = x2
• ẋ3 = ż2 = x4
• ẋ2 = z̈1 = (1/m1)[−(k1 + k2)x1 − (c1 + c2)x2 + k2 x3 + c2 x4 + u(t)]
• ẋ4 = z̈2 = (1/m2)[k2 x1 + c2 x2 − k2 x3 − c2 x4]
State-Space Form:

[ ẋ1 ]   [ 0              1              0        0      ] [ x1 ]   [ 0    ]
[ ẋ2 ] = [ −(k1+k2)/m1    −(c1+c2)/m1    k2/m1    c2/m1  ] [ x2 ] + [ 1/m1 ] u(t)
[ ẋ3 ]   [ 0              0              0        1      ] [ x3 ]   [ 0    ]
[ ẋ4 ]   [ k2/m2          c2/m2          −k2/m2   −c2/m2 ] [ x4 ]   [ 0    ]

This gives the 4 × 4 matrix A and the 4 × 1 matrix B.
Output Equation: If the output is the position of the second mass, y(t) = z2(t) = x3(t).

y(t) = [ 0, 0, 1, 0 ] x(t) + [ 0 ] u(t)

So, C = [ 0, 0, 1, 0 ] and D = [0].
Example 7.4.3 (Simple Pendulum (Nonlinear, then LTV via Linearization)). Consider a simple pendulum of
length L and mass m, with angle θ from the vertical. An input torque τ (t) is applied at the pivot. The equation
of motion (ignoring friction) is:
mL2 θ̈ = −mgL sin(θ) + τ (t)
This is a nonlinear system due to the sin(θ) term.
Linearization around Equilibrium (θ = 0): Assume the pendulum stays close to the stable equilibrium
point θ = 0 (hanging down). For small angles, sin(θ) ≈ θ. The linearized equation becomes:

mL2 θ̈ ≈ −mgLθ + τ (t)

θ̈ + (g/L) θ ≈ (1/(mL^2)) τ(t)
Let u(t) = τ (t). Choose state variables:
• x1 = θ

• x2 = θ̇

State Equations:

• ẋ1 = θ̇ = x2

• ẋ2 = θ̈ ≈ −(g/L) x1 + (1/(mL^2)) u(t)

State-Space Form (LTI, linearized around θ = 0):

[ ẋ1 ]   [ 0      1 ] [ x1 ]   [ 0         ]
[ ẋ2 ] = [ −g/L   0 ] [ x2 ] + [ 1/(mL^2)  ] u(t)

A = [ 0, 1 ; −g/L, 0 ],   B = [ 0 ; 1/(mL^2) ]

Time-Varying Example (Pendulum on a Moving Cart - LTV): Imagine the pendulum pivot is on a cart
whose horizontal position is given by a known function s(t). The dynamics become more complex, and if we
linearize around a trajectory (e.g., keeping the pendulum vertical while the cart moves), the resulting linearized
system matrices A(t) and B(t) might depend on the trajectory s(t) and its derivatives, making the system LTV.
Deriving this explicitly is more involved but illustrates how time-varying parameters or reference trajectories
can lead to LTV models.

These examples demonstrate the process of selecting state variables and deriving the A, B, C, D matrices (or
A(t), B(t), C(t), D(t)) for physical systems, translating differential equations into the standard state-space form.

7.5 Linearization of Nonlinear Systems around Equilibria (Brief Overview)


Many real-world systems are inherently nonlinear. However, linear systems theory provides a wealth of powerful
analysis and design tools that are not readily available for general nonlinear systems. A common and effective
technique to bridge this gap is linearization: approximating the behavior of a nonlinear system near a specific
operating point (usually an equilibrium point) with a linear model (LTI or LTV).

Nonlinear State-Space Systems


A general continuous-time nonlinear system can be represented in state-space form as:

ẋ(t) = f (x(t), u(t), t)


y(t) = h(x(t), u(t), t)

where:

• x(t) is the state vector, u(t) is the input vector, y(t) is the output vector.

• f is a nonlinear vector function describing the state dynamics.

• h is a nonlinear vector function describing the output relationship.

• The explicit dependence on t indicates potential time-varying behavior.

Equilibrium Points
An equilibrium point (or operating point) of a nonlinear system is a state xeq such that if the system starts at
xeq with a constant input ueq , it remains at xeq indefinitely. For a time-invariant system (f does not explicitly
depend on t), this means:
ẋ = 0 when x = xeq and u = ueq

So, equilibrium points (xeq , ueq ) satisfy:


f (xeq , ueq ) = 0

The corresponding equilibrium output is yeq = h(xeq , ueq ).



Linearization using Taylor Series Expansion


We want to approximate the nonlinear system’s behavior near an equilibrium point (xeq , ueq ). Let the state
and input be slightly perturbed from this point:

x(t) = xeq + δx(t)

u(t) = ueq + δu(t)

where δx(t) and δu(t) are small deviations. The corresponding output will be:

y(t) = yeq + δy(t)

Now, substitute these into the state equation and use a first-order Taylor series expansion of f around (xeq , ueq )
(assuming f is sufficiently smooth and time-invariant for simplicity here):

ẋ(t) = (d/dt)(xeq + δx(t)) = (d/dt)(δx(t))

f(x, u) ≈ f(xeq, ueq) + (∂f/∂x)|_(xeq,ueq) (x − xeq) + (∂f/∂u)|_(xeq,ueq) (u − ueq)

Since f (xeq , ueq ) = 0 (equilibrium condition), and substituting the deviations:

(d/dt)(δx(t)) ≈ 0 + (∂f/∂x)|_(xeq,ueq) δx(t) + (∂f/∂u)|_(xeq,ueq) δu(t)

This is a linear differential equation in terms of the deviations δx(t) and δu(t). Let:

A = (∂f/∂x)|_(xeq,ueq)   (Jacobian matrix of f w.r.t. x, evaluated at equilibrium)

B = (∂f/∂u)|_(xeq,ueq)   (Jacobian matrix of f w.r.t. u, evaluated at equilibrium)

Then the linearized state equation is:

(d/dt)(δx(t)) = A δx(t) + B δu(t)
Similarly, linearize the output equation y = h(x, u):

y(t) = yeq + δy(t)

h(x, u) ≈ h(xeq, ueq) + (∂h/∂x)|_(xeq,ueq) (x − xeq) + (∂h/∂u)|_(xeq,ueq) (u − ueq)

Since yeq = h(xeq , ueq ), we get:

yeq + δy(t) ≈ yeq + (∂h/∂x)|_(xeq,ueq) δx(t) + (∂h/∂u)|_(xeq,ueq) δu(t)

Let:
C = (∂h/∂x)|_(xeq,ueq)   (Jacobian matrix of h w.r.t. x, evaluated at equilibrium)

D = (∂h/∂u)|_(xeq,ueq)   (Jacobian matrix of h w.r.t. u, evaluated at equilibrium)

Then the linearized output equation is:

δy(t) = Cδx(t) + Dδu(t)



The Linearized System


The resulting system, described in terms of the deviation variables (δx, δu, δy), is an LTI system:

(d/dt)(δx) = A δx + B δu
δy = Cδx + Dδu
where the constant matrices A, B, C, D are the Jacobians of f and h evaluated at the equilibrium point
(xeq , ueq ).
Jacobian Matrices: Recall that the Jacobian matrix of a vector function f (z) (where f has m components
and z has n components) is the m × n matrix of partial derivatives:

∂f/∂z = [ ∂f1/∂z1   ∂f1/∂z2   · · ·   ∂f1/∂zn
          ∂f2/∂z1   ∂f2/∂z2   · · ·   ∂f2/∂zn
            ...       ...      ...      ...
          ∂fm/∂z1   ∂fm/∂z2   · · ·   ∂fm/∂zn ]

So, A = [∂fi /∂xj ], B = [∂fi /∂uj ], C = [∂hi /∂xj ], D = [∂hi /∂uj ], all evaluated at (xeq , ueq ).

Validity and Application


• The linearized model (A, B, C, D) accurately approximates the behavior of the original nonlinear system
only for small deviations (δx, δu) around the chosen equilibrium point (xeq , ueq ).
• The stability properties of the linearized system (determined by the eigenvalues of A) often correspond
to the local stability properties of the nonlinear system near the equilibrium point (Hartman-Grobman
theorem, Lyapunov’s indirect method).
• Linear control design techniques (like pole placement or LQR, discussed later) can be applied to the
linearized model to design controllers that work well near the operating point.
Linearization around a Trajectory: A similar process can be used to linearize a nonlinear system around a
nominal trajectory (x∗ (t), u∗ (t)) that satisfies ẋ∗ = f (x∗ , u∗ , t). The resulting linearized system will generally
be an LTV system, as the Jacobian matrices A(t), B(t), C(t), D(t) will be evaluated along the time-varying
trajectory (x∗ (t), u∗ (t)).
Linearization is a fundamental technique for applying linear systems theory to the analysis and control of
nonlinear systems in the vicinity of desired operating points or trajectories.
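As a small worked sketch of the Jacobian computation (assuming sympy, and reusing the pendulum example from Section 7.4), the code below evaluates A = ∂f/∂x and B = ∂f/∂u at the equilibrium (θ, θ̇) = (0, 0), u = 0, reproducing the linearized matrices obtained earlier.

import sympy as sp

g, L, m, u = sp.symbols('g L m u')
x1, x2 = sp.symbols('x1 x2')                            # x1 = theta, x2 = theta_dot

f = sp.Matrix([x2,
               -g / L * sp.sin(x1) + u / (m * L**2)])   # nonlinear dynamics f(x, u)

A = f.jacobian(sp.Matrix([x1, x2])).subs({x1: 0, x2: 0, u: 0})   # df/dx at equilibrium
B = f.jacobian(sp.Matrix([u])).subs({x1: 0, x2: 0, u: 0})        # df/du at equilibrium
print(A)   # [[0, 1], [-g/L, 0]]
print(B)   # [[0], [1/(m L^2)]]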
Chapter 8

Solution of State Equations

Having established the state-space representation for linear systems (both LTI and LTV), the next crucial step
is to determine how the state vector x(t) evolves over time, given an initial state x(t0 ) and the input u(t). This
chapter focuses on finding the solution to the state equation.
We begin with the more general case of LTV systems, introducing the concept of the State Transition Matrix
(STM), and then show how it simplifies to the matrix exponential for LTI systems.

8.1 State Transition Matrix (STM) Φ(t, t0 ) for LTV Systems


Consider the homogeneous LTV state equation:

ẋ(t) = A(t)x(t)

with initial condition x(t0 ) = x0 .


Unlike the LTI case where the solution involves eA(t−t0 ) , finding a solution for the LTV case is more complex
because A(t) varies with time. The solution involves a matrix function called the State Transition Matrix
(STM).

8.1.1 Definition as the unique solution to Φ̇ = A(t)Φ, Φ(t0 , t0 ) = I


Definition 8.1.1. The State Transition Matrix (STM), denoted Φ(t, t0 ), for the system ẋ = A(t)x is the
unique n × n matrix function that satisfies the matrix differential equation:

(d/dt) Φ(t, t0) = A(t) Φ(t, t0)
with the initial condition:
Φ(t0 , t0 ) = I (the n × n identity matrix)

Existence and Uniqueness: If the matrix A(t) is continuous over an interval, then a unique solution Φ(t, t0 )
to this matrix initial value problem exists for all t and t0 within that interval. Each column of Φ(t, t0 ) represents
the solution x(t) to ẋ = A(t)x starting from an initial condition where x(t0 ) is the corresponding standard
basis vector ei .
Interpretation: The STM Φ(t, t0 ) maps the state vector from time t0 to time t. If the system starts at
state x(t0 ) at time t0 , its state at time t (assuming no input) will be x(t) = Φ(t, t0 )x(t0 ). Φ(t, t0 ) essentially
"transitions" the state from t0 to t.
Note: Finding an analytical expression for Φ(t, t0 ) is generally difficult unless A(t) has special structures (e.g.,
A(t) is constant, diagonal, or commutes with its integral).

8.1.2 Properties: Φ(t, t) = I, Φ(t, τ )Φ(τ, σ) = Φ(t, σ), Φ−1 (t, τ ) = Φ(τ, t) (with Proofs)
The STM has several important properties that follow directly from its definition:
1. Identity Property: Φ(t, t) = I for all t.


Proof. By definition, Φ(t, t0 ) satisfies Φ̇ = A(t)Φ with Φ(t0 , t0 ) = I. If we set t0 = t, the initial condition
becomes Φ(t, t) = I.
2. Semigroup Property (Composition Property): Φ(t, τ )Φ(τ, σ) = Φ(t, σ) for all t, τ, σ.
Proof. Consider the solution x(t) starting from x(σ) = x0 . We have x(t) = Φ(t, σ)x0 . We can also
transition the state from σ to τ , and then from τ to t:
x(τ ) = Φ(τ, σ)x0
x(t) = Φ(t, τ )x(τ ) = Φ(t, τ )[Φ(τ, σ)x0 ] = [Φ(t, τ )Φ(τ, σ)]x0
Since the solution x(t) starting from x0 at time σ is unique, we must have:
Φ(t, σ)x0 = [Φ(t, τ )Φ(τ, σ)]x0
This holds for any initial state x0 , which implies the matrix equality:
Φ(t, σ) = Φ(t, τ )Φ(τ, σ)

3. Inverse Property: Φ(t, τ ) is invertible, and its inverse is Φ(τ, t).


Φ−1 (t, τ ) = Φ(τ, t)
Proof. Using the semigroup property, set σ = t:
Φ(t, τ )Φ(τ, t) = Φ(t, t)
Using the identity property, Φ(t, t) = I:
Φ(t, τ )Φ(τ, t) = I
Similarly, setting σ = τ in Φ(τ, σ)Φ(σ, t) = Φ(τ, t) gives:
Φ(τ, t)Φ(t, τ ) = Φ(τ, τ ) = I
Since Φ(t, τ )Φ(τ, t) = I and Φ(τ, t)Φ(t, τ ) = I, the matrix Φ(t, τ ) is invertible and its inverse is Φ(τ, t).
4. Determinant Property (Liouville-Jacobi Formula):
det(Φ(t, t0)) = exp( ∫_{t0}^{t} tr(A(τ)) dτ )

Proof Sketch: This involves showing that d/dt(det(Φ)) = tr(A(t)) det(Φ) and solving this scalar differential
equation for det(Φ) with the initial condition det(Φ(t0 , t0 )) = det(I) = 1. This property implies that
Φ(t, t0 ) is always invertible if A(t) is continuous, because the exponential function is never zero.

8.1.3 Solution to Homogeneous LTV: x(t) = Φ(t, t0 )x(t0 )


Now we show that the STM provides the solution to the homogeneous LTV initial value problem ẋ = A(t)x,
x(t0 ) = x0 .
Theorem 8.1.1. The unique solution to ẋ = A(t)x with x(t0 ) = x0 is given by x(t) = Φ(t, t0 )x0 .

Proof. We verify that x(t) = Φ(t, t0 )x0 satisfies the differential equation and the initial condition.
• Initial Condition:
x(t0 ) = Φ(t0 , t0 )x0 = Ix0 = x0 . (Satisfied)

• Differential Equation: Differentiate x(t) with respect to t:


ẋ(t) = (d/dt)[Φ(t, t0) x0]
      = [(d/dt) Φ(t, t0)] x0    (since x0 is a constant vector)

Using the definition of Φ(t, t0), (d/dt) Φ(t, t0) = A(t)Φ(t, t0):

ẋ(t) = [A(t)Φ(t, t0)] x0
      = A(t)[Φ(t, t0) x0]
      = A(t) x(t).    (Satisfied)

Since the solution exists and is unique (from theory of ODEs), x(t) = Φ(t, t0 )x0 is the unique solution.

The State Transition Matrix Φ(t, t0 ) is the fundamental concept for solving LTV systems, playing a role analo-
gous to the matrix exponential eA(t−t0 ) in LTI systems.

8.2 Matrix Exponential for LTI Systems: Φ(t, t0 ) = eA(t−t0 )


Now let’s consider the special, but very important, case where the system is Linear Time-Invariant (LTI). In
this case, the state matrix A is constant.
The homogeneous LTI state equation is:
ẋ(t) = Ax(t)
with initial condition x(t0 ) = x0 .
We seek the State Transition Matrix Φ(t, t0 ) for this system. By definition, Φ(t, t0 ) must satisfy:
(d/dt) Φ(t, t0) = A Φ(t, t0),   Φ(t0, t0) = I
Let’s propose a candidate solution based on the matrix exponential we studied in Chapter 6. Consider the
matrix function:
X(t) = eA(t−t0 )
Let’s check if this function satisfies the defining properties of the STM.
• Initial Condition: Evaluate X(t) at t = t0 :

X(t0 ) = eA(t0 −t0 ) = eA·0 = e0 = I.

The initial condition Φ(t0 , t0 ) = I is satisfied.


• Differential Equation: Differentiate X(t) with respect to t. Let τ = t − t0 , so dτ /dt = 1.
(d/dt) X(t) = (d/dt) e^{A(t−t0)}
            = (d/dτ) e^{Aτ} · (dτ/dt)
            = (A e^{Aτ}) · 1
            = A e^{A(t−t0)}
            = A X(t)

The differential equation (d/dt) Φ(t, t0) = A Φ(t, t0) is satisfied (since A is constant).
Since the function eA(t−t0 ) satisfies both the matrix differential equation and the initial condition that uniquely
define the STM for the LTI system ẋ = Ax, it must be the State Transition Matrix.
Result: For an LTI system ẋ = Ax, the State Transition Matrix is given by the matrix exponential:

Φ(t, t0 ) = eA(t−t0 )

Verification of STM Properties for eA(t−t0 ) : We can verify that eA(t−t0 ) satisfies the general properties of
the STM discussed in Section 8.1:
1. Identity: Φ(t, t) = eA(t−t) = e0 = I. (Matches)
2. Semigroup: Φ(t, τ )Φ(τ, σ) = eA(t−τ ) eA(τ −σ) . Using the property eM1 eM2 = eM1 +M2 if M1 and M2
commute (here M1 = A(t − τ ), M2 = A(τ − σ), which commute since A commutes with itself):

= eA(t−τ )+A(τ −σ) = eA(t−τ +τ −σ) = eA(t−σ) = Φ(t, σ). (Matches)

3. Inverse: Φ−1 (t, τ ) = (eA(t−τ ) )−1 . Using the inverse property of matrix exponential (eM )−1 = e−M :

= e−A(t−τ ) = eA(τ −t) = Φ(τ, t). (Matches)



8.2.1 Solution to Homogeneous LTI: x(t) = eA(t−t0 ) x(t0 )


Using the general result for LTV systems that the solution to the homogeneous equation ẋ = A(t)x, x(t0 ) = x0
is x(t) = Φ(t, t0 )x0 , and substituting the specific form of the STM for LTI systems, Φ(t, t0 ) = eA(t−t0 ) , we
immediately get the solution for the homogeneous LTI system.
Theorem 8.2.1. The unique solution to the LTI initial value problem ẋ = Ax, x(t0 ) = x0 is given by:

x(t) = eA(t−t0 ) x0

This confirms the result we proved directly using the properties of the matrix exponential in Section 6.4.
In the common case where the initial time is t0 = 0:

x(t) = eAt x(0)

This fundamental result connects the matrix exponential directly to the time evolution of the state of a linear
time-invariant system. The behavior of the system is entirely determined by the matrix exponential eAt acting
on the initial state x(0). The methods for computing eAt discussed in Section 6.5 are therefore crucial for
finding explicit solutions to LTI state equations.

8.3 Fundamental Matrix Ψ(t)


Another important concept related to the solution of homogeneous linear systems (both LTI and LTV) is the
Fundamental Matrix.
Consider the homogeneous linear system:
ẋ(t) = A(t)x(t)
where x(t) is the n × 1 state vector.

8.3.1 Definition: Columns are linearly independent solutions to ẋ = A(t)x


Definition 8.3.1. An n × n matrix Ψ(t) is called a Fundamental Matrix for the system ẋ = A(t)x if its n
columns, denoted ψ1 (t), ψ2 (t), . . . , ψn (t), are n linearly independent solutions to the differential equation.

Ψ(t) = [ψ1 (t)|ψ2 (t)| . . . |ψn (t)]

This means each column ψi (t) satisfies:


(d/dt) ψi(t) = A(t) ψi(t)
And the set of vectors {ψ1 (t), . . . , ψn (t)} is linearly independent for all t in the interval of interest.
Since each column satisfies the differential equation, the entire matrix Ψ(t) also satisfies the matrix differential
equation:
(d/dt) Ψ(t) = A(t) Ψ(t)
Linear Independence and Invertibility: The requirement that the columns are linearly independent for all
t is equivalent to requiring that the determinant of Ψ(t) is non-zero for all t.

det(Ψ(t)) ≠ 0

This means a fundamental matrix Ψ(t) is always invertible. (This can be shown using the Liouville-Jacobi
formula: det(Ψ(t)) = det(Ψ(t0)) exp( ∫_{t0}^{t} tr(A(τ)) dτ ). If the columns are linearly independent at any time t0,
then det(Ψ(t0)) ≠ 0, and since the exponential term is never zero, det(Ψ(t)) ≠ 0 for all t).
Non-Uniqueness: Unlike the State Transition Matrix Φ(t, t0 ) which is uniquely defined by its initial condition
Φ(t0 , t0 ) = I, a fundamental matrix Ψ(t) is not unique. If Ψ(t) is a fundamental matrix, then Ψ(t)C is also a
fundamental matrix for any constant invertible n × n matrix C.

Proof. Let Ψ̃(t) = Ψ(t)C. Then (d/dt)Ψ̃(t) = ((d/dt)Ψ(t))C = (A(t)Ψ(t))C = A(t)(Ψ(t)C) = A(t)Ψ̃(t). So Ψ̃(t) satisfies the matrix differential equation. Also, det(Ψ̃(t)) = det(Ψ(t)) det(C). Since det(Ψ(t)) ≠ 0 and det(C) ≠ 0, then det(Ψ̃(t)) ≠ 0, meaning its columns are linearly independent. Thus, Ψ̃(t) is also a fundamental matrix.

General Solution using Fundamental Matrix: Any solution x(t) to ẋ = A(t)x can be expressed as a
linear combination of the columns of a fundamental matrix Ψ(t):
x(t) = Ψ(t)c
where c is a constant n × 1 vector determined by the initial conditions. If x(t0 ) = x0 , then x0 = Ψ(t0 )c, which
implies c = Ψ−1 (t0 )x0 (since Ψ(t0 ) is invertible). Therefore, the solution to the initial value problem is:
x(t) = Ψ(t)Ψ−1 (t0 )x0

8.3.2 Relation to STM: Φ(t, τ ) = Ψ(t)Ψ−1 (τ ) (with Proof )


Comparing the solution obtained using the fundamental matrix, x(t) = [Ψ(t)Ψ−1 (t0 )]x0 , with the solution
obtained using the state transition matrix, x(t) = Φ(t, t0 )x0 , we can establish a direct relationship between
Φ(t, t0 ) and any fundamental matrix Ψ(t).
Theorem 8.3.1. For any fundamental matrix Ψ(t) of the system ẋ = A(t)x, the State Transition Matrix
Φ(t, τ ) is given by:
Φ(t, τ ) = Ψ(t)Ψ−1 (τ )

Proof. We need to show that the matrix X(t) = Ψ(t)Ψ−1 (τ ) satisfies the defining properties of the STM Φ(t, τ ),
namely:
1. (d/dt) X(t) = A(t) X(t)
2. X(τ) = I
Let τ be fixed. Consider X(t) = Ψ(t)Ψ−1 (τ ).
1. Differential Equation: Differentiate X(t) with respect to t. Note that Ψ−1 (τ ) is a constant matrix with
respect to t.
(d/dt) X(t) = (d/dt)[Ψ(t)Ψ^{−1}(τ)]
            = [(d/dt) Ψ(t)] Ψ^{−1}(τ)

Since Ψ(t) is a fundamental matrix, (d/dt) Ψ(t) = A(t)Ψ(t).

            = [A(t)Ψ(t)] Ψ^{−1}(τ)
            = A(t)[Ψ(t)Ψ^{−1}(τ)]
            = A(t) X(t)
The differential equation is satisfied.
2. Initial Condition: Evaluate X(t) at t = τ .
X(τ ) = Ψ(τ )Ψ−1 (τ )
Since Ψ(τ ) is invertible, Ψ(τ )Ψ−1 (τ ) = I.
X(τ ) = I
The initial condition is satisfied.
Since X(t) = Ψ(t)Ψ−1 (τ ) satisfies the unique definition of the State Transition Matrix Φ(t, τ ), we conclude:
Φ(t, τ ) = Ψ(t)Ψ−1 (τ )

Special Case: LTI Systems. For an LTI system ẋ = Ax, we know Φ(t, τ) = e^{A(t−τ)}. Also, Ψ(t) = e^{At} is a fundamental matrix because (d/dt)(e^{At}) = Ae^{At} and det(e^{At}) = e^{tr(A)t} ≠ 0. Let's check the relationship using Ψ(t) = e^{At}:

Ψ(t)Ψ^{−1}(τ) = e^{At} (e^{Aτ})^{−1}
              = e^{At} e^{−Aτ}
              = e^{At−Aτ}    (since At and −Aτ commute)
              = e^{A(t−τ)}

This matches Φ(t, τ ), confirming the relationship for the specific fundamental matrix eAt .
If we chose a different fundamental matrix, say Ψ̃(t) = eAt C (where C is invertible), then:

Ψ̃(t)Ψ̃−1 (τ ) = (eAt C)((eAτ C)−1 )


= (eAt C)(C −1 (eAτ )−1 )
= eAt (CC −1 )e−Aτ
= eAt Ie−Aτ
= eA(t−τ )

We get the same unique STM Φ(t, τ ), regardless of which fundamental matrix is used.
The fundamental matrix provides an alternative way to think about the basis of solutions for homogeneous
linear systems and offers another route to finding the state transition matrix.
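A quick numerical illustration of Theorem 8.3.1 (assuming numpy/scipy and an arbitrarily chosen invertible C) is sketched below: whichever fundamental matrix Ψ(t) = e^{At}C is used, the product Ψ(t)Ψ^{−1}(τ) reproduces the same STM e^{A(t−τ)}.

import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
C = np.array([[1.0, 2.0],
              [0.0, 1.0]])                 # any constant invertible matrix
t, tau = 1.3, 0.4

Psi = lambda s: expm(A * s) @ C            # a (non-unique) fundamental matrix
Phi = Psi(t) @ np.linalg.inv(Psi(tau))     # Phi(t, tau) = Psi(t) Psi(tau)^{-1}
print(np.allclose(Phi, expm(A * (t - tau))))   # True: independent of the choice of C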

8.4 Solution to Nonhomogeneous Systems (Variation of Constants / Convolution Integral)
So far, we have focused on the solution to the homogeneous state equation ẋ = A(t)x. Now, we consider the
full linear state equation, including the input term:

ẋ(t) = A(t)x(t) + B(t)u(t)

with the initial condition x(t0 ) = x0 .


The solution to this nonhomogeneous equation consists of two parts: the zero-input response (due to the
initial state x0 ) and the zero-state response (due to the input u(t)). The zero-input response is simply the
solution to the homogeneous equation, which we already found using the State Transition Matrix (STM).

Zero-Input Response: xzi (t) = Φ(t, t0 )x0

To find the complete solution, we need to find the particular solution due to the input u(t), also known as the
zero-state response (assuming x0 = 0). We use a method analogous to the variation of parameters technique
used for scalar linear differential equations.
8.4.1 Derivation for LTV case: x(t) = Φ(t, t0)x(t0) + ∫_{t0}^{t} Φ(t, τ)B(τ)u(τ) dτ
Let the full solution be x(t). We guess a solution form inspired by the homogeneous solution, but allow the
"constant" vector to vary with time. Let:
x(t) = Φ(t, t0 )z(t)
where z(t) is an unknown vector function to be determined. Note that if z(t) were constant, this would just be
the homogeneous solution.
Substitute this guess into the nonhomogeneous state equation ẋ = A(t)x + B(t)u. First, differentiate x(t) using
the product rule:    
d d
ẋ(t) = Φ(t, t0 ) z(t) + Φ(t, t0 ) z(t)
dt dt
We know d
dt Φ(t, t0 ) = A(t)Φ(t, t0 ). Substitute this:

ẋ(t) = [A(t)Φ(t, t0 )]z(t) + Φ(t, t0 )ż(t)

Now, set this equal to the right-hand side of the state equation, A(t)x + B(t)u, substituting x(t) = Φ(t, t0 )z(t):

A(t)Φ(t, t0 )z(t) + Φ(t, t0 )ż(t) = A(t)[Φ(t, t0 )z(t)] + B(t)u(t)

The first term on both sides cancels out:

Φ(t, t0 )ż(t) = B(t)u(t)

Since Φ(t, t0 ) is invertible, with inverse Φ−1 (t, t0 ) = Φ(t0 , t), we can solve for ż(t):

ż(t) = Φ−1 (t, t0 )B(t)u(t)



ż(t) = Φ(t0 , t)B(t)u(t)


To find z(t), we integrate ż(τ ) from the initial time t0 to the current time t:
z(t) − z(t0) = ∫_{t0}^{t} ż(τ) dτ

z(t) = z(t0) + ∫_{t0}^{t} Φ(t0, τ)B(τ)u(τ) dτ

Now we need to find z(t0 ). From our assumed solution form, x(t) = Φ(t, t0 )z(t). Evaluate this at t = t0 :

x(t0 ) = Φ(t0 , t0 )z(t0 )

x0 = Iz(t0 )
z(t0 ) = x0
Substitute z(t0 ) = x0 back into the expression for z(t):
z(t) = x0 + ∫_{t0}^{t} Φ(t0, τ)B(τ)u(τ) dτ

Finally, substitute this z(t) back into our assumed solution form x(t) = Φ(t, t0 )z(t):
x(t) = Φ(t, t0) [ x0 + ∫_{t0}^{t} Φ(t0, τ)B(τ)u(τ) dτ ]

x(t) = Φ(t, t0)x0 + Φ(t, t0) ∫_{t0}^{t} Φ(t0, τ)B(τ)u(τ) dτ

Using the semigroup property Φ(t, t0 )Φ(t0 , τ ) = Φ(t, τ ), we can bring Φ(t, t0 ) inside the integral:
x(t) = Φ(t, t0)x0 + ∫_{t0}^{t} [Φ(t, t0)Φ(t0, τ)]B(τ)u(τ) dτ

Complete Solution (LTV):


x(t) = Φ(t, t0)x0 + ∫_{t0}^{t} Φ(t, τ)B(τ)u(τ) dτ

This is the general solution for the state vector of an LTV system. It clearly shows the two components:
• Zero-Input Response: Φ(t, t0 )x0 (response due to initial state x0 only)
• Zero-State Response: ∫_{t0}^{t} Φ(t, τ)B(τ)u(τ) dτ (response due to input u(t) only, assuming x0 = 0)
The integral term is a convolution-like integral, summing the effect of the input u(τ ) at all past times τ (from
t0 to t), propagated to the current time t by the state transition matrix Φ(t, τ ).
8.4.2 Derivation for LTI case: x(t) = e^{A(t−t0)} x(t0) + ∫_{t0}^{t} e^{A(t−τ)} Bu(τ) dτ
For the LTI case, the matrices A and B are constant, and the state transition matrix simplifies to Φ(t, τ ) =
eA(t−τ ) . We can substitute this directly into the general LTV solution derived above.
Substitute Φ(t, t0 ) = eA(t−t0 ) and Φ(t, τ ) = eA(t−τ ) into:
x(t) = Φ(t, t0)x0 + ∫_{t0}^{t} Φ(t, τ)B(τ)u(τ) dτ

Since B is constant, B(τ ) = B.


Complete Solution (LTI):
x(t) = e^{A(t−t0)} x0 + ∫_{t0}^{t} e^{A(t−τ)} Bu(τ) dτ

This is the well-known variation of constants formula or convolution integral solution for LTI systems.

• Zero-Input Response: eA(t−t0 ) x0


• Zero-State Response: ∫_{t0}^{t} e^{A(t−τ)} Bu(τ) dτ
Convolution Interpretation (for t0 = 0): If we set t0 = 0, the solution becomes:
x(t) = e^{At} x(0) + ∫_{0}^{t} e^{A(t−τ)} Bu(τ) dτ

The integral term represents the convolution of the system’s matrix impulse response, h(t) = eAt B (for t ≥ 0),
with the input u(t). Let H(t) = e^{At} B. The integral is ∫_{0}^{t} H(t − τ)u(τ) dτ, which is the definition of (H ∗ u)(t).
So, the zero-state response is xzs (t) = (eAt B) ∗ u(t).
Output Equation Solution: Once the state solution x(t) is found (for either LTV or LTI), the output y(t)
is obtained algebraically from the output equation:
• LTV: y(t) = C(t)x(t) + D(t)u(t)
• LTI: y(t) = Cx(t) + Du(t)
Substituting the full solution for x(t) gives the complete output response, which also consists of a zero-input
and a zero-state component. For LTI systems:
y(t) = Ce^{At} x(0) + C ∫_{0}^{t} e^{A(t−τ)} Bu(τ) dτ + Du(t)

The term CeAt B (for t ≥ 0, and 0 for t < 0) plus Dδ(t) (where δ(t) is the Dirac delta) is the impulse response
matrix of the LTI system from input u to output y.
These formulas provide the complete analytical solution for the state and output evolution of linear systems,
forming the basis for understanding system response to initial conditions and external inputs.
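The sketch below (assuming numpy/scipy, with an illustrative stable A, B, a sine input, and a nonzero initial state) evaluates the variation-of-constants formula by trapezoidal quadrature and compares the resulting state at the final time with scipy.signal.lsim, which integrates the same state equation.

import numpy as np
from scipy.linalg import expm
from scipy.integrate import trapezoid
from scipy import signal

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])                  # illustrative stable state matrix
B = np.array([[0.0],
              [1.0]])
C, D = np.eye(2), np.zeros((2, 1))
x0 = np.array([1.0, 0.0])
u = lambda tau: np.sin(tau)                   # illustrative input signal

t_end = 5.0
taus = np.linspace(0.0, t_end, 2001)

# zero-state part: integral of e^{A(t - tau)} B u(tau) d tau, by trapezoidal quadrature
integrand = np.array([expm(A * (t_end - tau)) @ B[:, 0] * u(tau) for tau in taus])
x_formula = expm(A * t_end) @ x0 + trapezoid(integrand, taus, axis=0)

# reference: scipy.signal.lsim integrates the same state equation
_, _, x_sim = signal.lsim((A, B, C, D), U=u(taus), T=taus, X0=x0)
print(np.allclose(x_formula, x_sim[-1], atol=1e-4))   # True up to quadrature error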
Chapter 9

Stability of Linear Systems

Stability is arguably the most fundamental property of a dynamical system. Informally, a stable system is one
that remains near its equilibrium state when subjected to small perturbations. An unstable system, conversely,
will diverge from equilibrium even for arbitrarily small disturbances. This chapter introduces the formal concepts
of stability, focusing primarily on linear systems.

9.1 Concept of Stability (Internal Stability / Lyapunov Stability)


We focus on the internal stability of a system, which concerns the behavior of the state vector x(t) in
the absence of input (i.e., the stability of the homogeneous system ẋ = A(t)x). This is often referred to as
Lyapunov stability, named after Aleksandr Lyapunov who pioneered the mathematical theory of stability.

9.1.1 Definitions: Equilibrium Points, Stability, Asymptotic Stability, Instability


Consider the autonomous (unforced) system:

ẋ(t) = f (x(t), t)

where f is a function describing the dynamics. For linear systems, f (x, t) = A(t)x.
1. Equilibrium Point: A state xeq is an equilibrium point (or fixed point) if f (xeq , t) = 0 for all t ≥ t0 .
If the system starts at xeq , it remains there forever. For the linear system ẋ = A(t)x, the origin xeq = 0
is always an equilibrium point, since A(t) · 0 = 0. If A(t) is invertible for all t, the origin is the only
equilibrium point. We will primarily focus on the stability of the origin xeq = 0.
2. Stability in the Sense of Lyapunov (i.s.L.): The equilibrium point xeq = 0 is stable (i.s.L.) if,
for any given tolerance ϵ > 0, there exists a sensitivity δ(ϵ, t0 ) > 0 such that if the initial state x(t0 ) is
close enough to the origin (||x(t0 )|| < δ), then the subsequent state x(t) remains within the tolerance ϵ
of the origin (||x(t)|| < ϵ) for all future times t ≥ t0 . Intuition: If you start close enough (within δ),
you stay close (within ϵ). Formal Definition: The origin is stable if ∀ϵ > 0, ∀t0 , ∃δ(ϵ, t0 ) > 0 such that
||x(t0 )|| < δ =⇒ ||x(t)|| < ϵ, ∀t ≥ t0 . If δ can be chosen independently of t0 , the stability is called
uniform stability. For LTI systems, stability is always uniform.
3. Asymptotic Stability: The equilibrium point xeq = 0 is asymptotically stable if it is:
(a) Stable (i.s.L.), and
(b) Convergent: There exists some δ0 (t0 ) > 0 such that if the initial state is within δ0 of the origin
(||x(t0 )|| < δ0 ), then the state x(t) not only stays close but also approaches the origin as time goes
to infinity (limt→∞ x(t) = 0).
Intuition: If you start close enough (within δ0 ), you eventually return to the equilibrium point. Formal
Definition (Convergence part): ∃δ0 (t0 ) > 0 such that ||x(t0 )|| < δ0 =⇒ limt→∞ ||x(t)|| = 0. If the
convergence is uniform with respect to t0 and initial states within δ0 , the stability is called uniform
asymptotic stability. For LTI systems, asymptotic stability is always uniform. If convergence holds for
any initial state (δ0 can be arbitrarily large), the origin is globally asymptotically stable. For linear
systems, asymptotic stability is always global.


4. Instability: The equilibrium point xeq = 0 is unstable if it is not stable. This means there exists some
ϵ > 0 such that for any δ > 0, no matter how small, there is always some initial state x(t0 ) within δ of the
origin (||x(t0 )|| < δ) whose trajectory eventually leaves the ϵ-neighborhood (||x(t)|| ≥ ϵ for some t > t0 ).
Intuition: No matter how close you start, there’s a chance you’ll eventually move far away.
Visualizing Stability:
• Stable: Trajectories starting near the origin remain confined within a slightly larger region around the
origin.
• Asymptotically Stable: Trajectories starting near the origin not only remain nearby but also eventually
converge back to the origin.
• Unstable: At least some trajectories starting arbitrarily close to the origin eventually move far away.
These definitions provide the formal framework for analyzing the stability of the internal state dynamics of
linear (and nonlinear) systems.

9.2 Stability of LTI Systems


For Linear Time-Invariant (LTI) systems, ẋ = Ax, the stability of the equilibrium point xeq = 0 can be
determined directly from the eigenvalues of the constant state matrix A.

9.2.1 Theorem: Stability based on Eigenvalues of A (Re(λi) < 0 ⇔ Asymptotic Stability) (with Proof for diagonalizable case, discussion for Jordan case)
Theorem 9.2.1. Consider the LTI system ẋ = Ax. Let λ1 , λ2 , . . . , λn be the eigenvalues of the matrix A.
1. The system (or the equilibrium point x=0) is asymptotically stable if and only if all eigenvalues of A
have strictly negative real parts (Re(λi ) < 0 for all i = 1, . . . , n).
2. The system is stable (i.s.L.) if and only if all eigenvalues of A have non-positive real parts (Re(λi ) ≤ 0
for all i = 1, . . . , n), AND any eigenvalue with zero real part (Re(λi ) = 0, i.e., purely imaginary or zero
eigenvalues) corresponds to Jordan blocks of size 1 × 1 in the Jordan Canonical Form of A. (This means
eigenvalues on the imaginary axis must have equal algebraic and geometric multiplicities).
3. The system is unstable if at least one eigenvalue has a strictly positive real part (Re(λi ) > 0 for some i),
OR if there is an eigenvalue with zero real part (Re(λi ) = 0) that corresponds to a Jordan block of size
greater than 1 × 1.

Proof Sketch (Asymptotic Stability, Diagonalizable Case). Assume A is diagonalizable, so A = V ΛV −1 where


Λ = diag(λ1 , . . . , λn ). The solution to ẋ = Ax is x(t) = eAt x(0). Using the diagonalization method for eAt :

x(t) = V eΛt V −1 x(0)

where eΛt = diag(eλ1 t , . . . , eλn t ). Let c = V −1 x(0) be the representation of the initial condition in the eigen-
vector basis. Then:
x(t) = V [diag(eλ1 t , . . . , eλn t )]c
This means x(t) is a linear combination of terms like eλi t vi , where vi are the eigenvectors.
x(t) = Σ_{i=1}^{n} ci e^{λi t} vi

Now, consider the behavior as t → ∞. The magnitude of each term eλi t depends on the real part of λi . Let
λi = σi + jωi .

|eλi t | = |e(σi +jωi )t | = |eσi t ejωi t | = |eσi t ||ejωi t | = eσi t (since |ejωi t | = | cos(ωi t) + j sin(ωi t)| = 1).

• If Re(λi ) = σi < 0 for all i: Then eσi t → 0 as t → ∞ for all i. Consequently, each term ci eλi t vi goes
to the zero vector, and thus x(t) → 0 as t → ∞. Since the solution involves decaying exponentials, it
can also be shown that the solution remains bounded for any bounded initial condition, satisfying the
definition of stability i.s.L. Therefore, the system is asymptotically stable.

• If Re(λi ) > 0 for some i: Let λk be an eigenvalue with σk > 0. If the initial condition x(0) has a
non-zero component ck along the eigenvector vk (which is generally true unless x(0) lies exactly in the
subspace spanned by other eigenvectors), then the term ck eλk t vk will grow unboundedly as t → ∞ because
|eλk t | = eσk t → ∞. Thus, ||x(t)|| → ∞, and the system is unstable.
• If Re(λi ) ≤ 0 for all i, but Re(λk ) = 0 for some k: Let λk be an eigenvalue with σk = 0. The term
ck eλk t vk becomes ck ejωk t vk . The magnitude |ejωk t | = 1, so this term represents an oscillation (if ωk ̸= 0)
or remains constant (if ωk = 0, i.e., λk = 0). The terms corresponding to eigenvalues with σi < 0 decay
to zero. Therefore, the overall solution x(t) remains bounded. It does not necessarily converge to 0 (due
to the non-decaying terms from eigenvalues on the imaginary axis). This corresponds to stability i.s.L.,
but not asymptotic stability.

Discussion for Jordan Case (Non-Diagonalizable A): If A is not diagonalizable, the solution involves the
Jordan form, x(t) = P eJt P −1 x(0). The matrix eJt contains blocks eJi t corresponding to the Jordan blocks Ji .
Recall from Section 6.5.3 that for a Jordan block Ji = λI + N of size m × m:

e^{Ji t} = e^{λt} · Σ_{k=0}^{m−1} (N^k t^k / k!)

This involves terms like tk eλt for k = 0, 1, . . . , m − 1.


• If Re(λ) < 0: The exponential term eλt decays faster than any polynomial term tk grows. Therefore,
tk eλt → 0 as t → ∞ for all k. The system is asymptotically stable.
• If Re(λ) > 0: The term tk eλt grows unboundedly as t → ∞. The system is unstable.
• If Re(λ) = 0: The term becomes tk ejωt .
– If the Jordan block size m = 1 (i.e., the eigenvalue is semisimple), then k = 0 only. The term is ejωt ,
which is bounded (magnitude 1). If all other eigenvalues have Re ≤ 0 and those with Re=0 have
m = 1, the system is stable i.s.L.
– If the Jordan block size m > 1, then terms like tejωt , t2 ejωt , . . . , tm−1 ejωt appear. Since tk grows
unboundedly as t → ∞ for k ≥ 1, the solution x(t) will grow unboundedly even though Re(λ)=0.
The system is unstable.
This confirms the conditions stated in the theorem, particularly the requirement that eigenvalues on the imag-
inary axis must correspond to 1 × 1 Jordan blocks (be semisimple) for stability i.s.L.
Summary of Eigenvalue Locations and Stability:
• Asymptotic Stability: All eigenvalues strictly in the Left Half Plane (LHP): Re(λi ) < 0.
• Stability (i.s.L.): All eigenvalues in the LHP or on the imaginary axis (Re(λi ) ≤ 0), with any eigenvalues
on the imaginary axis being semisimple (corresponding to 1 × 1 Jordan blocks).
• Instability: At least one eigenvalue strictly in the Right Half Plane (RHP) (Re(λi ) > 0), OR at least
one eigenvalue on the imaginary axis (Re(λi ) = 0) that is not semisimple (corresponding to Jordan blocks
> 1 × 1).

9.2.2 Marginal Stability


The term marginal stability is often used to describe systems that are stable (i.s.L.) but not asymptotically
stable. This occurs when the system has eigenvalues on the imaginary axis (Re(λi ) = 0) that are semisimple,
while all other eigenvalues are strictly in the LHP (Re(λi ) < 0).
Characteristics of marginally stable systems:
• Trajectories remain bounded for all bounded initial conditions.
• Trajectories do not necessarily decay to zero; they may exhibit sustained oscillations (if λ = ±jω, ω ̸= 0)
or converge to a non-zero constant (if λ = 0).
Examples:
• A system with eigenvalues {−1, ±j} is marginally stable.
• A system with eigenvalues {−2, 0} is marginally stable.
• A system with eigenvalues {0, 0} corresponding to A = [ 0, 1 ; 0, 0 ] (Jordan block size 2) is unstable.
• A system with eigenvalues {±j, ±j} corresponding to a 4 × 4 matrix with two 2 × 2 Jordan blocks for ±j
is unstable.
Determining stability based on eigenvalue locations is a cornerstone of LTI system analysis.
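A minimal sketch of this eigenvalue test follows (assuming numpy); note that it deliberately does not check the semisimplicity (Jordan block size) condition for eigenvalues on the imaginary axis, so that borderline case is only flagged, not resolved.

import numpy as np

def classify_lti(A, tol=1e-9):
    """Classify x_dot = A x from the eigenvalues of A (semisimplicity of
    imaginary-axis eigenvalues is not checked here)."""
    re = np.linalg.eigvals(A).real
    if np.all(re < -tol):
        return "asymptotically stable"
    if np.any(re > tol):
        return "unstable"
    return "eigenvalues on the imaginary axis: check their Jordan block sizes"

print(classify_lti(np.array([[0.0, 1.0], [-2.0, -3.0]])))   # asymptotically stable
print(classify_lti(np.array([[0.0, 1.0], [0.0, 0.0]])))     # imaginary-axis case (double eigenvalue at 0)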

9.3 Lyapunov Theory for LTI Systems


While eigenvalue analysis provides a complete stability picture for LTI systems, Lyapunov theory offers an
alternative and powerful approach. It is particularly important because:
1. It provides a more general framework that can be extended to analyze the stability of nonlinear and
time-varying systems, where eigenvalue analysis is often insufficient or inapplicable.
2. It provides a concept (the Lyapunov function) that can be used not only for analysis but also for the
design of stabilizing controllers.
Lyapunov’s approach is often called the direct method because it assesses stability directly from the system’s
equations without explicitly solving for the trajectories x(t).

9.3.1 Lyapunov Functions


The core idea of Lyapunov’s direct method is to find a scalar function V (x) of the state vector x, often
interpreted as a generalized "energy" function for the system, such that:
1. V (x) is positive everywhere except at the equilibrium point (x = 0), where it is zero.
2. The time derivative of V (x) along the system’s trajectories, V̇ (x), is negative (or non-positive) everywhere
except possibly at the equilibrium.
If such a function exists, it implies that the system’s "energy" is continuously decreasing (or non-increasing)
as the state evolves, suggesting that the state must eventually settle down at the equilibrium point where the
"energy" is minimal (zero).
Formal Definition (Lyapunov Function Candidate): A continuously differentiable scalar function V (x) :
Rn → R is a Lyapunov function candidate for the system ẋ = f (x) (where f (0) = 0) in a region D
containing the origin if:
1. Positive Definiteness: V (0) = 0 and V (x) > 0 for all x ∈ D, x ̸= 0.
Time Derivative along Trajectories: The time derivative of V (x) along the trajectories of the system
ẋ = f(x) is given by the chain rule:

V̇(x) = (d/dt) V(x(t)) = (∂V/∂x)^T ẋ = (∇V)^T f(x)

where ∇V = (∂V /∂x) is the gradient of V with respect to x.


Lyapunov Stability Theorems (Conceptual Overview):
• Stability i.s.L.: If there exists a Lyapunov function candidate V (x) such that its time derivative V̇ (x)
is negative semidefinite (V̇ (x) ≤ 0) in a region D around the origin, then the origin is stable (i.s.L.).
Intuition: The "energy" never increases, so trajectories starting close enough cannot move far away.
• Asymptotic Stability: If there exists a Lyapunov function candidate V (x) such that its time derivative
V̇ (x) is negative definite (V̇ (x) < 0 for all x ∈ D, x ̸= 0), then the origin is asymptotically stable.
Intuition: The "energy" strictly decreases towards zero, forcing the state to converge to the origin.
• Global Asymptotic Stability: If the conditions for asymptotic stability hold for D = Rn (the entire
state space) and V (x) is radially unbounded (V (x) → ∞ as ||x|| → ∞), then the origin is globally
asymptotically stable.
Lyapunov Functions for LTI Systems (Quadratic Forms): For LTI systems ẋ = Ax, a common and
effective choice for a Lyapunov function candidate is a quadratic form:

V (x) = xT P x

where P is a real, symmetric n × n matrix.


For V (x) = xT P x to be a Lyapunov function candidate, it must be positive definite. From Section 5.3, this
requires the symmetric matrix P to be positive definite (P > 0).
Now, let’s compute the time derivative V̇ (x) along the trajectories of ẋ = Ax:

V̇(x) = (d/dt)(x^T P x)
      = ((d/dt) x)^T P x + x^T P ((d/dt) x)
      = (ẋ)^T P x + x^T P (ẋ)

Substitute ẋ = Ax:

      = (Ax)^T P x + x^T P (Ax)
      = x^T A^T P x + x^T P A x
      = x^T (A^T P + P A) x

For asymptotic stability, we require V̇ (x) to be negative definite. This means we need the matrix (AT P + P A)
to be negative definite. Let Q = −(AT P + P A). Then we require Q to be positive definite (Q > 0).
This leads to the central equation in Lyapunov theory for LTI systems:

AT P + P A = −Q

If we can find a symmetric positive definite matrix P (P > 0) that satisfies this equation for some symmetric
positive definite matrix Q (Q > 0), then V (x) = xT P x is a valid Lyapunov function proving the asymptotic
stability of the LTI system ẋ = Ax.

9.3.2 Lyapunov Equation: AT P + P A = −Q


As derived in the previous subsection, when using a quadratic Lyapunov function candidate V (x) = xT P x
(with P symmetric and P > 0) for the LTI system ẋ = Ax, the time derivative is V̇ (x) = xT (AT P + P A)x.
For asymptotic stability, we require V̇ (x) to be negative definite, meaning V̇ (x) < 0 for all x ̸= 0. This is
equivalent to requiring the matrix −(AT P + P A) to be positive definite.
This leads to the Lyapunov Equation:
AT P + P A = −Q

where:
• A is the state matrix of the LTI system ẋ = Ax.
• P is a symmetric, positive definite matrix (P = P T , P > 0) that we are trying to find.
• Q is a chosen symmetric, positive definite matrix (Q = QT , Q > 0).
Interpretation: The Lyapunov equation relates the system dynamics (A) to the properties of the chosen
Lyapunov function (P ) and its rate of decrease (represented by Q).
• If the system ẋ = Ax is asymptotically stable, then for any chosen symmetric positive definite Q, there
exists a unique symmetric positive definite solution P to the Lyapunov equation.
• Conversely, if for some chosen symmetric positive definite Q, there exists a symmetric positive definite
solution P to the Lyapunov equation, then the system ẋ = Ax is asymptotically stable.
Choosing Q: In the context of proving stability, Q can often be chosen conveniently. A common choice is
Q = I (the identity matrix), or Q = C T C for some matrix C such that the pair (A, C) is observable (this
ensures Q is at least positive semidefinite, and often positive definite under certain conditions). The choice of
Q affects the resulting P matrix, but as long as Q is positive definite, the existence of a positive definite P
guarantees asymptotic stability.
Solving the Lyapunov Equation: The equation AT P +P A = −Q is a linear matrix equation for the unknown
matrix P . Since P is symmetric, it has n(n + 1)/2 independent entries. The equation can be rewritten as a
system of n^2 linear algebraic equations for the entries of P . This can be done using the Kronecker product (⊗)
and the vectorization operator (vec):

(I ⊗ AT + AT ⊗ I)vec(P ) = −vec(Q)

This is a standard linear system of the form M x = b, where x = vec(P ), b = −vec(Q), and M = (I⊗AT +AT ⊗I).
This system has a unique solution for P if and only if A and −AT have no common eigenvalues, which is
equivalent to λi (A) + λj (A) ̸= 0 for all pairs of eigenvalues i, j. If A is asymptotically stable (all Re(λi ) < 0),
this condition is always met, guaranteeing a unique solution P for any Q.
While the Kronecker product formulation shows existence and uniqueness, numerical methods are typically used
to solve the Lyapunov equation for P in practice (e.g., using functions like scipy.linalg.solve_lyapunov in
Python or lyap in MATLAB).
The Lyapunov equation is a fundamental tool in stability analysis and control design for LTI systems.
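As a minimal sketch of this workflow (not part of the original notes), the snippet below solves the Lyapunov equation for an illustrative stable matrix with Q = I and confirms that the solution P is positive definite. It uses SciPy's solve_continuous_lyapunov, of which solve_lyapunov is an older alias; the matrices A and Q are arbitrary choices for demonstration.

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov

    A = np.array([[0.0, 1.0],
                  [-2.0, -3.0]])            # eigenvalues -1 and -2: asymptotically stable
    Q = np.eye(2)                           # chosen Q = I > 0

    # Solve A^T P + P A = -Q  (SciPy solves a X + X a^H = q, so pass a = A^T and q = -Q)
    P = solve_continuous_lyapunov(A.T, -Q)

    print(np.linalg.eigvalsh(P))            # all eigenvalues positive  =>  P > 0
    print(np.allclose(A.T @ P + P @ A, -Q)) # residual check: True

Positive eigenvalues of the symmetric solution P confirm P > 0, which by the theorem in the next subsection certifies asymptotic stability without computing the eigenvalues of A directly.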

9.3.3 Theorem: Lyapunov Stability Theorem (If A stable, then for any Q>0,
unique P>0 exists; If P>0 exists for some Q>0, then A is stable) (with
Proofs)
This theorem formally connects the stability of the LTI system ẋ = Ax with the existence of solutions to the
Lyapunov equation AT P + P A = −Q.
Theorem 9.3.1 (Lyapunov Stability Theorem for LTI Systems). Let A be a real n × n matrix. The following
statements are equivalent:
1. The LTI system ẋ = Ax is asymptotically stable (i.e., all eigenvalues of A have strictly negative real
parts).
2. For any given symmetric positive definite matrix Q (Q = QT > 0), there exists a unique symmetric positive
definite matrix P (P = P T > 0) that satisfies the Lyapunov equation AT P + P A = −Q.
3. For some given symmetric positive definite matrix Q (Q = QT > 0), there exists a unique symmetric
positive definite matrix P (P = P T > 0) that satisfies the Lyapunov equation AT P + P A = −Q.

Proof. (1 ⇒ 2): Assume ẋ = Ax is asymptotically stable.


This means all eigenvalues λi of A satisfy Re(λi ) < 0. We need to show that for any symmetric Q > 0, there
exists a unique symmetric P > 0 satisfying AT P + P A = −Q.
• Existence and Uniqueness of P: As discussed in Section 9.3.2, the linear matrix equation AT P +P A =
−Q has a unique solution P for any Q if and only if λi (A) + λj (A) ̸= 0 for all pairs i, j. Since Re(λi ) < 0
and Re(λj ) < 0, their sum Re(λi + λj ) = Re(λi ) + Re(λj ) < 0, which means λi + λj ̸= 0. Thus, a unique
solution P exists for any Q.
• Symmetry of P: If Q is symmetric, is the unique solution P also symmetric? Take the transpose of the
Lyapunov equation:
(AT P + P A)T = (−Q)T

P T A + AT P T = −QT

Since Q = QT , we have AT P T + P T A = −Q. This shows that if P is a solution, then P T is also a solution.
Since the solution is unique, we must have P = P T . So, P is symmetric.
• Positive Definiteness of P: We need to show that the unique symmetric solution P is positive definite.
Consider the candidate solution P given by the integral:
P = ∫_0^∞ e^{A^T τ} Q e^{Aτ} dτ

First, we need to show this integral converges. Since A is asymptotically stable, the matrix exponential e^{At}
(and thus e^{A^T t}) decays to zero as t → ∞. Specifically, ||e^{At}|| ≤ M e^{−αt} for some M > 0, α > 0.
Then the norm of the integrand is bounded by ||e^{A^T τ} Q e^{Aτ}|| ≤ ||e^{A^T τ}|| ||Q|| ||e^{Aτ}|| ≤ M^2 ||Q|| e^{−2ατ}, which
is integrable from 0 to ∞. Thus, the integral converges. Since Q > 0 and e^{Aτ} is always invertible, the
integrand e^{A^T τ} Q e^{Aτ} is positive definite for all τ ≥ 0 (using the congruence transformation property). The
integral of a positive definite matrix function over a positive interval results in a positive definite matrix
P . So, P > 0. Now, let’s verify if this P satisfies the Lyapunov equation:
A^T P + P A = A^T (∫_0^∞ e^{A^T τ} Q e^{Aτ} dτ) + (∫_0^∞ e^{A^T τ} Q e^{Aτ} dτ) A
            = ∫_0^∞ [A^T e^{A^T τ} Q e^{Aτ} + e^{A^T τ} Q e^{Aτ} A] dτ
            = ∫_0^∞ [(d/dτ e^{A^T τ}) Q e^{Aτ} + e^{A^T τ} Q (d/dτ e^{Aτ})] dτ
            = ∫_0^∞ (d/dτ)[e^{A^T τ} Q e^{Aτ}] dτ
            = [e^{A^T τ} Q e^{Aτ}]_{τ=0}^{τ=∞}
            = lim_{τ→∞} (e^{A^T τ} Q e^{Aτ}) − e^{A^T·0} Q e^{A·0}

Since A is stable, eAt → 0 as t → ∞, so the limit term is the zero matrix.

= 0 − e0 Qe0
= −IQI
= −Q
Thus, P = ∫_0^∞ e^{A^T τ} Q e^{Aτ} dτ is the unique symmetric positive definite solution.

(2 ⇒ 3): This implication is trivial. If statement 2 holds for any Q > 0, it certainly holds for some Q > 0.

(3 ⇒ 1): Assume for some symmetric Q > 0, there exists a symmetric P > 0 satisfying AT P + P A =
−Q.
We need to show that ẋ = Ax is asymptotically stable. Consider the quadratic function V (x) = xT P x. Since P
is symmetric and positive definite, V (x) is a valid Lyapunov function candidate (V (x) > 0 for x ̸= 0, V (0) = 0).
Calculate the time derivative V̇ (x) along the trajectories of ẋ = Ax:

V̇ (x) = xT (AT P + P A)x

Substitute the Lyapunov equation AT P + P A = −Q:

V̇ (x) = xT (−Q)x = −xT Qx

Since Q is positive definite (Q > 0), −Q is negative definite. Therefore, V̇ (x) = −xT Qx < 0 for all x ̸= 0. We
have found a Lyapunov function V (x) = xT P x whose derivative V̇ (x) is negative definite. By the Lyapunov
stability theorem (conceptual version from 9.3.1), this implies that the equilibrium point x = 0 is asymptotically
stable. Furthermore, since V (x) = xT P x is quadratic, it is radially unbounded. Therefore, the stability is global
asymptotic stability.

Conclusion: The theorem establishes a fundamental equivalence: the asymptotic stability of an LTI system is
completely characterized by the existence of a positive definite solution P to the Lyapunov equation AT P +P A =
−Q for any (or some) positive definite Q. This provides a powerful algebraic test for stability that avoids
calculating eigenvalues directly.
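To make the integral construction used in the proof tangible, the following numerical sketch (illustrative matrices, not from the notes) compares the algebraic solution of the Lyapunov equation with a truncated evaluation of P = ∫_0^∞ e^{A^T τ} Q e^{Aτ} dτ; the truncation point is an assumption chosen large enough that the integrand has decayed to negligible size.

    import numpy as np
    from scipy.linalg import expm, solve_continuous_lyapunov
    from scipy.integrate import quad_vec

    A = np.array([[0.0, 1.0],
                  [-2.0, -3.0]])                  # stable: eigenvalues -1 and -2
    Q = np.eye(2)

    # Algebraic solution of A^T P + P A = -Q
    P_alg = solve_continuous_lyapunov(A.T, -Q)

    # Integral formula, truncated at tau = 40 where exp(A*tau) is numerically zero
    P_int, _ = quad_vec(lambda tau: expm(A.T * tau) @ Q @ expm(A * tau), 0.0, 40.0)

    print(np.allclose(P_alg, P_int, atol=1e-8))   # True: both constructions give the same P > 0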

9.4 Input-Output Stability (BIBO)


While Lyapunov stability concerns the behavior of the internal state of a system in the absence of input (ho-
mogeneous system), another important concept is Input-Output Stability. This focuses on the relationship
between the system’s input and output signals, specifically whether bounded inputs always lead to bounded
outputs.

This type of stability is often crucial from a practical perspective, as we typically interact with a system through
its inputs and outputs. We want assurance that if we apply a reasonable (bounded) input signal, the resulting
output signal will also remain within reasonable bounds and not grow indefinitely.

9.4.1 Definition
The most common form of input-output stability is Bounded-Input, Bounded-Output (BIBO) stability.
Consider a system (LTI or LTV) described by:

ẋ(t) = A(t)x(t) + B(t)u(t)


y(t) = C(t)x(t) + D(t)u(t)

Definition 9.4.1 (BIBO Stability). A system is said to be BIBO stable if, for zero initial conditions (x(t0 ) =
0), any bounded input u(t) produces a bounded output y(t).
Formal Definition: The system is BIBO stable if for any initial time t0 and any input u(t) such that
||u(t)|| ≤ Mu < ∞ for all t ≥ t0 (where Mu is some finite constant and || · || is a suitable vector norm, e.g.,
the ∞-norm), the resulting output y(t) (assuming x(t0 ) = 0) is also bounded, i.e., there exists a finite constant
My (Mu , t0 ) such that ||y(t)|| ≤ My < ∞ for all t ≥ t0 .
Key Points:
• Zero Initial State: BIBO stability is defined based on the system’s response to inputs only, assuming
the system starts from rest (x(t0 ) = 0). The effect of non-zero initial conditions is related to internal
(Lyapunov) stability.
• Bounded Input: The input signal must be bounded over time. This means its magnitude never exceeds
some finite limit. Examples include step inputs, sinusoidal inputs, and decaying exponentials. An input
like u(t) = t is not bounded.
• Bounded Output: If the system is BIBO stable, the output signal resulting from any bounded input
must also remain bounded over time; it cannot grow infinitely large.
• Uniformity: For LTI systems, if the system is BIBO stable, the bound My on the output depends only
on the bound Mu of the input, not on the initial time t0 .
BIBO stability ensures predictable and safe operation in response to realistic input signals.

9.4.2 Relation to Impulse Response Integrability (Mention)


For LTI systems, BIBO stability is directly related to the properties of the system’s impulse response. The
impulse response matrix, denoted H(t), describes the output y(t) when the input u(t) is a Dirac delta function
δ(t) applied at t = 0, assuming zero initial conditions.
From the LTI solution formula (Section 8.4.2):
y(t) = C e^{At} x(0) + C ∫_0^t e^{A(t−τ)} B u(τ) dτ + D u(t)

Setting x(0) = 0 and u(t) = δ(t)I (where I is the m × m identity, applying a delta to each input channel), the
output is the impulse response matrix H(t):
H(t) = C ∫_0^t e^{A(t−τ)} B δ(τ) I dτ + D δ(t) I

H(t) = C e^{At} B + D δ(t)   (for t ≥ 0)


(Note: Often the impulse response is defined without the Dδ(t) term, focusing on the strictly proper part.)
The zero-state output response can be written as the convolution of the impulse response H(t) with the input
u(t):
yzs(t) = ∫_0^t H(t − τ) u(τ) dτ = (H ∗ u)(t)

(This includes the Du(t) term if H(t) contains Dδ(t)).


Condition for BIBO Stability (Impulse Response): An LTI system is BIBO stable if and only if its
impulse response matrix H(t) is absolutely integrable. This means that the integral of the absolute value
(or a suitable norm) of each element of H(t) from 0 to ∞ is finite.
∫_0^∞ |hij(t)| dt < ∞   for all elements hij(t) of H(t).

Intuition: If the impulse response decays to zero sufficiently quickly (is absolutely integrable), then the convo-
lution integral ∫ H(t − τ) u(τ) dτ will remain bounded even if the input u(τ) is bounded but persists indefinitely.
If the impulse response does not decay (or decays too slowly), a bounded input could potentially excite the
system continuously, leading to an unbounded output (resonance-like behavior).
For the state-space representation H(t) = CeAt B +Dδ(t), the absolute integrability condition primarily depends
on the decay rate of eAt . If A is asymptotically stable (all Re(λi ) < 0), then eAt decays exponentially, ensuring
that CeAt B is absolutely integrable. The Dδ(t) term does not affect BIBO stability as defined (response to
bounded inputs, not impulses).
This connection highlights that for LTI systems, the conditions for internal stability (asymptotic stability based
on eigenvalues) are closely related to the conditions for external stability (BIBO stability based on impulse
response integrability).
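The absolute-integrability condition can be checked numerically for a given state-space model. The sketch below (with illustrative matrices, not taken from the text) integrates |h(t)| for the strictly proper part h(t) = C e^{At} B of a stable single-input, single-output system.

    import numpy as np
    from scipy.linalg import expm
    from scipy.integrate import quad

    # Illustrative stable SISO system
    A = np.array([[0.0, 1.0],
                  [-2.0, -3.0]])
    B = np.array([[0.0], [1.0]])
    C = np.array([[1.0, 0.0]])

    h = lambda t: (C @ expm(A * t) @ B).item()         # strictly proper impulse response
    integral, _ = quad(lambda t: abs(h(t)), 0.0, np.inf)
    print(integral)   # finite (0.5 here) => absolutely integrable => BIBO stable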

9.4.3 Theorem: For LTI systems, Asymptotic Stability ⇒ BIBO Stability (Statement)
As hinted in the previous subsection, there is a strong connection between the internal stability (asymptotic
stability of ẋ = Ax) and the external stability (BIBO stability) for LTI systems.
Theorem 9.4.1. For an LTI system described by ẋ = Ax + Bu, y = Cx + Du: If the system is asymptotically
stable (i.e., all eigenvalues of A have strictly negative real parts), then the system is BIBO stable.

Proof Idea. 1. Asymptotic Stability Implies Exponential Decay: If A is asymptotically stable, then
the matrix exponential eAt decays exponentially. This means ||eAt || ≤ M e−αt for some constants M > 0
and α > 0, for all t ≥ 0.
2. Boundedness of Zero-State Response: Consider the zero-state output response (assuming x(0) = 0):
yzs(t) = C ∫_0^t e^{A(t−τ)} B u(τ) dτ + D u(t)

3. Bound the Integral Term: Take the norm of the integral part:
|| C ∫_0^t e^{A(t−τ)} B u(τ) dτ || ≤ ∫_0^t ||C e^{A(t−τ)} B u(τ)|| dτ      (triangle inequality for integrals)
                                   ≤ ∫_0^t ||C|| ||e^{A(t−τ)}|| ||B|| ||u(τ)|| dτ      (submultiplicativity)

4. Use Input Bound and Exponential Decay: Assume the input is bounded: ||u(τ )|| ≤ Mu for all τ .
Substitute the exponential decay bound for ||eA(t−τ ) ||:
≤ ∫_0^t ||C|| (M e^{−α(t−τ)}) ||B|| Mu dτ

≤ ||C|| M ||B|| Mu ∫_0^t e^{−α(t−τ)} dτ

5. Evaluate the Integral: Let s = t − τ , ds = −dτ . When τ = 0, s = t. When τ = t, s = 0.


∫_0^t e^{−α(t−τ)} dτ = ∫_t^0 e^{−αs} (−ds) = ∫_0^t e^{−αs} ds
                     = [−(1/α) e^{−αs}]_{s=0}^{s=t}
                     = (1/α)(1 − e^{−αt})
Since α > 0, this integral is bounded for all t ≥ 0 by 1/α.

6. Bound the Integral Term (Result):

|| C ∫_0^t e^{A(t−τ)} B u(τ) dτ || ≤ ||C|| M ||B|| Mu (1/α)

This shows the integral part of the output is bounded by a constant that depends on Mu but not on t.
7. Bound the Feedthrough Term: The term Du(t) is also bounded:

||Du(t)|| ≤ ||D||||u(t)|| ≤ ||D||Mu

8. Bound the Total Output: Using the triangle inequality for the output yzs (t):
||yzs(t)|| ≤ || C ∫_0^t e^{A(t−τ)} B u(τ) dτ || + ||Du(t)||

||yzs(t)|| ≤ ( ||C|| M ||B|| / α + ||D|| ) Mu

This shows that ||yzs(t)|| is bounded by the finite constant My = ( ||C|| M ||B|| / α + ||D|| ) Mu for all t ≥ 0.
Therefore, if the system is asymptotically stable, any bounded input produces a bounded output (assuming zero
initial state), satisfying the definition of BIBO stability.

Converse Statement (Requires Controllability and Observability): The converse (BIBO stability ⇒
Asymptotic Stability) is not always true. A system can be BIBO stable but internally unstable if the unstable
internal modes are either uncontrollable (cannot be excited by the input) or unobservable (do not appear at
the output).
However, if the LTI system is both controllable and observable (concepts introduced in Chapter 11), then
BIBO stability is equivalent to asymptotic stability.
Summary: For LTI systems, internal asymptotic stability is a sufficient condition for external BIBO stability.
They become equivalent conditions if the system is also controllable and observable.
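A small sketch (hypothetical matrices, not from the notes) illustrates the converse failure: the system below has an unstable eigenvalue, but that mode is neither driven by B nor seen through C, so the impulse response still decays and the input-output behavior is BIBO stable despite internal instability.

    import numpy as np
    from scipy.linalg import expm

    # The unstable mode (eigenvalue +1) is uncontrollable and unobservable:
    # it never appears in h(t) = C e^{At} B.
    A = np.diag([-1.0, 1.0])
    B = np.array([[1.0], [0.0]])
    C = np.array([[1.0, 0.0]])

    print(np.linalg.eigvals(A))        # [-1.  1.]  -> internally unstable
    h = [(C @ expm(A * t) @ B).item() for t in np.linspace(0.0, 10.0, 5)]
    print(h)                           # decays like e^{-t}: bounded, BIBO-stable behavior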
Chapter 10

System Modes and Response Characteristics

Understanding the stability of an LTI system, as discussed in Chapter 9, tells us whether the state converges to
the origin. However, it doesn’t fully describe how the system behaves over time. The concepts of eigenvalues
and eigenvectors, introduced in Chapter 3, reappear here to provide deeper insight into the system’s dynamic
behavior through modal analysis.

Modal analysis decomposes the system’s response into a sum of simpler components, called modes, each
associated with an eigenvalue and eigenvector of the state matrix A. This decomposition helps visualize and
understand the system’s natural frequencies, damping characteristics, and how different initial conditions excite
different patterns of behavior.

This chapter focuses primarily on systems where the state matrix A is diagonalizable.

10.1 Modal Decomposition of LTI Systems (for diagonalizable A)


Consider the homogeneous LTI system ẋ = Ax. If the n × n matrix A is diagonalizable, it has a full set of n
linearly independent eigenvectors, v1 , . . . , vn , corresponding to the eigenvalues λ1 , . . . , λn (which may not be
distinct, but the geometric multiplicity must equal the algebraic multiplicity for each eigenvalue).

We can form the modal matrix V whose columns are the eigenvectors, and the diagonal eigenvalue matrix Λ:

V = [v1 |v2 | . . . |vn ]

Λ = diag(λ1 , λ2 , . . . , λn )

Since the eigenvectors are linearly independent, V is invertible, and we have the relationship:

A = V ΛV −1

10.1.1 Transformation to Modal Coordinates (z = V −1 x)


The key idea of modal decomposition is to perform a change of basis for the state vector x using the eigenvectors
as the new basis vectors. Let the new state vector be z(t), defined by the linear transformation:

x(t) = V z(t)

This means z(t) represents the coordinates of the state x(t) in the basis of eigenvectors. We can find z(t) from
x(t) using the inverse transformation:
z(t) = V −1 x(t)

The vector z(t) is often called the vector of modal coordinates.


10.1.2 Decoupled System Dynamics: ż = Λz


Now, let’s see how the system dynamics look in terms of the modal coordinates z(t). Differentiate the trans-
formation x = V z with respect to time (note that V is a constant matrix):

ẋ = V ż

Substitute this and x = V z into the original state equation ẋ = Ax:

V ż = A(V z)

Substitute A = V ΛV −1 :
V ż = (V ΛV −1 )(V z)

V ż = V Λ(V −1 V )z

V ż = V ΛIz

V ż = V Λz

Since V is invertible, we can multiply both sides by V −1 from the left:

V −1 (V ż) = V −1 (V Λz)

(V −1 V )ż = (V −1 V )Λz

I ż = IΛz

ż(t) = Λz(t)

This is a remarkable result. The transformation to modal coordinates z = V −1 x has transformed the original
coupled system ẋ = Ax into a completely decoupled system ż = Λz. Since Λ is a diagonal matrix, this vector
equation represents n independent scalar first-order differential equations:

ż1 (t) = λ1 z1 (t)


ż2 (t) = λ2 z2 (t)
...
żn (t) = λn zn (t)

Each modal coordinate zi (t) evolves independently according to its corresponding eigenvalue λi .
Solution in Modal Coordinates: The solution for each decoupled scalar equation is simply:

zi (t) = eλi t zi (0)

In vector form, the solution for z(t) is:


z(t) = eΛt z(0)

where eΛt = diag(eλ1 t , . . . , eλn t ) and z(0) = V −1 x(0) is the initial condition transformed into modal coordinates.
Solution in Original Coordinates: To get the solution back in the original state coordinates x(t), we use
the transformation x = V z:

x(t) = V z(t) = V [eΛt z(0)]


= V eΛt V −1 x(0)

This recovers the solution we found in Section 8.2.1 using the diagonalization method for computing eAt . The
modal decomposition provides a clear interpretation of this solution: the system’s evolution is governed by the
simple exponential behavior of the decoupled modal coordinates, which are then mapped back to the original
state space via the eigenvector matrix V .
This decomposition is fundamental to understanding how the system’s internal structure (eigenvalues and
eigenvectors) dictates its dynamic response.

10.2 Homogeneous Response via Modes: x(t) = Σ_i ci e^{λi t} vi
In the previous section, we saw that for a diagonalizable LTI system ẋ = Ax, the solution to the homogeneous
initial value problem x(t0 ) = x0 is given by:

x(t) = V eΛt V −1 x(0)

where V is the matrix of eigenvectors, Λ is the diagonal matrix of eigenvalues, and eΛt = diag(eλ1 t , . . . , eλn t ).
This form is useful for computation, but we can rewrite it to gain more insight into the structure of the response.
Let c be the vector of initial modal coordinates:

c = z(0) = V −1 x(0)

c = [c1 , c2 , . . . , cn ]T
The vector c represents the weights of the initial state x(0) when expressed as a linear combination of the
eigenvectors (the columns of V ). Specifically, x(0) = V c = c1 v1 + c2 v2 + · · · + cn vn .
Now, substitute c back into the solution x(t) = V [eΛt z(0)] = V eΛt c:
 
x(t) = [v1 | v2 | · · · | vn] · diag(e^{λ1 t}, e^{λ2 t}, . . . , e^{λn t}) · [c1, c2, . . . , cn]^T

First, multiply the diagonal matrix by the vector c:

diag(e^{λ1 t}, . . . , e^{λn t}) c = [e^{λ1 t} c1, e^{λ2 t} c2, . . . , e^{λn t} cn]^T

Now, multiply the matrix V by this resulting vector:

x(t) = [v1 | v2 | · · · | vn] [e^{λ1 t} c1, e^{λ2 t} c2, . . . , e^{λn t} cn]^T

This matrix-vector multiplication results in a linear combination of the columns of V , weighted by the elements
of the vector:
x(t) = (eλ1 t c1 )v1 + (eλ2 t c2 )v2 + · · · + (eλn t cn )vn

Homogeneous Response via Modes:

x(t) = Σ_{i=1}^{n} ci e^{λi t} vi

where ci are the components of c = V −1 x(0).


Interpretation: This equation expresses the state trajectory x(t) as a superposition of modes. Each term,
ci eλi t vi , represents a mode of the system:
• λi (Eigenvalue): Determines the temporal behavior of the mode.
– If Re(λi ) < 0, the mode decays exponentially.
– If Re(λi ) = 0, the mode oscillates (if Im(λi )̸= 0) or remains constant (if λi = 0).
– If Re(λi ) > 0, the mode grows exponentially.
• vi (Eigenvector): Determines the spatial pattern or "shape" of the mode in the state space. The state
evolution associated with this mode occurs along the direction defined by the eigenvector vi .

• ci (Modal Participation Factor): Determines how much each mode is initially excited by the initial
condition x(0). It is the i-th component of the initial state when represented in the eigenvector basis
(c = V −1 x(0)). If ci = 0, the i-th mode is not excited by that specific initial condition.
The overall system response x(t) is the sum of these individual modal responses. The long-term behavior is
dominated by the modes corresponding to eigenvalues with the largest real parts (the slowest decaying or fastest
growing modes).
This modal representation is extremely powerful for understanding the qualitative behavior of LTI systems
without needing to compute the full matrix exponential eAt . It directly links the system’s structure (eigenvalues
and eigenvectors) to its dynamic response.
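A short numerical sketch (illustrative matrices, not from the notes) shows the modal reconstruction in practice: the participation factors c = V^{-1} x(0) are computed with NumPy and the modal sum Σ ci e^{λi t} vi is compared against the exact solution e^{At} x(0).

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0.0, 1.0],
                  [-2.0, -3.0]])          # diagonalizable, eigenvalues -1 and -2
    x0 = np.array([1.0, 0.0])

    lam, V = np.linalg.eig(A)             # columns of V are the eigenvectors v_i
    c = np.linalg.solve(V, x0)            # modal participation factors c = V^{-1} x(0)

    t = 1.5
    x_modes = sum(c[i] * np.exp(lam[i] * t) * V[:, i] for i in range(len(lam)))
    x_exact = expm(A * t) @ x0            # reference solution x(t) = e^{At} x(0)

    print(np.allclose(np.real(x_modes), x_exact))  # True (np.real guards against round-off imaginary parts)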

10.3 Interpretation of Eigenvalues (Poles) and Eigenvectors (Mode Shapes) in System Response
The modal decomposition x(t) = Σ_i ci e^{λi t} vi provides a clear framework for interpreting the roles of eigenvalues
and eigenvectors in shaping the system’s dynamic response.
Eigenvalues (λi ) - Poles of the System:
The eigenvalues λi of the state matrix A are often referred to as the poles of the system (this terminology comes
from transfer function analysis in classical control, where eigenvalues correspond to the poles of the system’s
transfer function).
Each eigenvalue λi directly governs the temporal behavior of its corresponding mode eλi t :
1. Real Part (Re(λi ) = σi ): Decay Rate / Growth Rate
• The real part σi determines the exponential growth or decay of the mode’s amplitude.
• σi < 0 (Stable Pole): The mode decays exponentially with a time constant τ = −1/σi . The larger
the magnitude |σi |, the faster the decay.
• σi = 0 (Marginally Stable Pole): The mode’s amplitude neither decays nor grows exponentially.
It remains constant or oscillates with constant amplitude.
• σi > 0 (Unstable Pole): The mode grows exponentially with a time constant τ = 1/σi . The larger
σi , the faster the growth.
• The overall stability of the system is determined by the eigenvalue with the largest real part (most
unstable or least stable pole).
2. Imaginary Part (Im(λi ) = ωi ): Oscillation Frequency
• The imaginary part ωi determines the oscillatory behavior of the mode.
• Eigenvalues occur in complex conjugate pairs (λ = σ ± jω) for real matrices A.
• The corresponding modes e(σ±jω)t combine to produce real-valued responses of the form eσt ·(K1 cos(ωt)+
K2 sin(ωt)).
• ωi = 0 (Real Pole): The mode exhibits pure exponential decay or growth without oscillation.
• ωi ̸= 0 (Complex Pole): The mode exhibits oscillations with a natural frequency of ωi (in radians
per second). The oscillations are damped if σi < 0, sustained if σi = 0, or growing if σi > 0.
Eigenvectors (vi ) - Mode Shapes:
Each eigenvector vi associated with an eigenvalue λi defines the spatial structure or direction of the corre-
sponding mode in the n-dimensional state space.
• Direction of Motion: If a single mode i is excited (i.e., x(0) = ci vi ), the state vector x(t) = ci eλi t vi
will always remain aligned with the direction of the eigenvector vi . The state evolves purely along this
line (or plane, for complex eigenvectors) in the state space, either decaying towards the origin, growing
away from it, or oscillating along it.
• Mode Shape: The components of the eigenvector vi = [v1i , v2i , . . . , vni ]T describe the relative relationship
between the state variables when the system is behaving purely according to that mode. For example,
if vi = [1, −2]T , it means that when mode i is active, the state variable x2 is always -2 times the state
variable x1 .

• Basis for Response: The set of n linearly independent eigenvectors {v1 , . . . , vn } forms a basis for the
state space. Any state vector x(t) can be represented as a linear combination of these eigenvectors, with
the time-varying coefficients given by the modal responses ci eλi t .
Example 10.3.1 (Interpretation). Consider a system whose eigenvalues include λ1 = −1 (real, stable) and the
complex-conjugate pair λ2 , λ∗2 = −0.1 ± j2 (stable; for a real matrix A the conjugate λ∗2 is necessarily also an
eigenvalue), with corresponding eigenvectors v1 = [1, 1]T , v2 = [1, −1 + j]T , and v2∗ = [1, −1 − j]T for λ∗2 .
• Mode 1 (λ1 = −1, v1 = [1, 1]T ): This is a non-oscillatory decaying mode. If the initial state x(0) is
proportional to v1 , the state x(t) will decay exponentially towards the origin along the line x2 = x1 .
• Mode 2 (λ2 = −0.1 ± j2, v2 ): This is a damped oscillatory mode. The real part (-0.1) indicates slow
decay, and the imaginary part (±2) indicates oscillations with frequency 2 rad/s. The complex eigenvector
v2 describes the phase relationship between x1 and x2 during these oscillations.
• General Response: A general initial condition x(0) will excite both modes (unless x(0) happens to be
aligned with one eigenvector). The response x(t) will be a superposition:

x(t) = c1 e−t v1 + [c2 e(−0.1+j2)t v2 + c∗2 e(−0.1−j2)t v2∗ ]

The response will exhibit damped oscillations (from mode 2) superimposed on a faster decay along the v1
direction (from mode 1). As t → ∞, the faster decaying mode 1 becomes negligible, and the behavior is
dominated by the slower decaying oscillatory mode 2, eventually converging to the origin.
By analyzing the eigenvalues (poles) and eigenvectors (mode shapes), we can predict and understand the
qualitative characteristics of the system’s response—such as stability, speed of response (decay rates), presence
and frequency of oscillations, and the dominant patterns of behavior in the state space—without needing to
simulate the system for every possible initial condition.
Chapter 11

Controllability and Observability (Introduction)

Chapters 9 and 10 focused on the internal dynamics and stability of linear systems, primarily analyzing the
homogeneous system ẋ = Ax. Now, we turn our attention to how the inputs u(t) affect the state x(t) and how
the state x(t) affects the output y(t). These concepts are formalized by controllability and observability.
• Controllability relates to the ability of the input u(t) to influence or steer the state vector x(t) to any
desired value within a finite time.
• Observability relates to the ability to determine the initial state vector x(t0 ) by observing the output
y(t) over a finite time interval.
These are fundamental structural properties of a system, independent of specific input signals or initial states.
They determine whether it is possible, in principle, to control the system completely or to deduce its internal
state from external measurements. These concepts are crucial for control design (e.g., pole placement requires
controllability) and state estimation (e.g., designing observers requires observability).

11.1 Definitions: Reachability, Controllability, Constructibility, Observability
There are slightly different, but closely related, definitions associated with controllability and observability,
particularly concerning the ability to reach from the origin versus reaching to the origin, and determining the
initial state versus the current state.
Consider the LTI system:

ẋ(t) = Ax(t) + Bu(t)


y(t) = Cx(t) + Du(t)

Reachability:
• Definition: A state xf is said to be reachable at time tf > t0 if, starting from the zero initial state
(x(t0 ) = 0), there exists an input signal u(t) defined over [t0 , tf ] that transfers the state to x(tf ) = xf .
• Reachable Subspace: The set of all reachable states at time tf forms a subspace of the state space Rn ,
called the reachable subspace R(tf ).
• Complete Reachability: The system is said to be completely reachable (or simply reachable) if the
reachable subspace is the entire state space (R(tf ) = Rn ) for some finite tf > t0 . (For LTI systems, if it’s
reachable for some tf , it’s reachable for all tf > t0 ).
• Interpretation: Can we reach any target state starting from the origin using some control input?
Controllability:
• Definition: A state x0 is said to be controllable at time t0 if there exists a finite time tf > t0 and an
input signal u(t) defined over [t0 , tf ] that transfers the state from x(t0 ) = x0 to the zero state (x(tf ) = 0).


• Controllable Subspace: The set of all controllable states forms a subspace C(t0 ).
• Complete Controllability: The system is said to be completely controllable (or simply controllable)
if the controllable subspace is the entire state space (C(t0 ) = Rn ). This means any initial state can be
driven to the origin in finite time.
• Interpretation: Can we drive any initial state back to the origin using some control input?
Relationship between Reachability and Controllability (LTI): For continuous-time LTI systems, reach-
ability and controllability are equivalent concepts. A state is reachable if and only if it is controllable. Therefore,
the terms are often used interchangeably, and we typically refer to the property as controllability.
Constructibility:
• Definition: A state x0 at time t0 is said to be unconstructible if, assuming zero input (u(t) = 0 for
t ≤ t0 ), the output is identically zero (y(t) = 0 for t ≤ t0 ) when the initial state is x(t0 ) = x0 . If x0 = 0
is the only initial state that produces zero output for t ≤ t0 , then the system is constructible.
• Unconstructible Subspace: The set of all unconstructible initial states forms a subspace U(t0 ).
• Complete Constructibility: The system is said to be completely constructible (or simply con-
structible) if the unconstructible subspace contains only the zero vector (U(t0 ) = {0}). This means any
non-zero initial state must produce a non-zero output at some point in the past (assuming zero input).
• Interpretation: If we observe the output generated by the system evolving freely (zero input) up to time
t0 , can we uniquely determine the state x(t0 )? If the system is constructible, only x(t0 ) = 0 could have
produced y(t) = 0 for all t ≤ t0 .
Observability:
• Definition: A state x0 at time t0 is said to be unobservable if, assuming zero input (u(t) = 0 for
t ≥ t0 ), the output is identically zero (y(t) = 0 for all t ≥ t0 ) when the initial state is x(t0 ) = x0 .
• Unobservable Subspace: The set of all unobservable initial states forms a subspace N (t0 ).
• Complete Observability: The system is said to be completely observable (or simply observable) if
the unobservable subspace contains only the zero vector (N (t0 ) = {0}). This means any non-zero initial
state must produce a non-zero output at some point in the future (assuming zero input).
• Interpretation: If we observe the output generated by the system evolving freely (zero input) from time
t0 onwards, can we uniquely determine the initial state x(t0 )? If the system is observable, only x(t0 ) = 0
could produce y(t) = 0 for all t ≥ t0 .
Relationship between Constructibility and Observability (LTI): For continuous-time LTI systems,
constructibility and observability are equivalent concepts. Therefore, the terms are often used interchangeably,
and we typically refer to the property as observability.
These definitions lay the groundwork for analyzing whether a system’s internal state can be fully influenced by
inputs and fully inferred from outputs.

11.2 Physical Interpretation


Controllability and observability have intuitive physical meanings related to the system’s structure and how
energy or information flows through it.
Controllability Interpretation:
• Can the inputs influence all internal states? A system is controllable if the inputs Bu(t) can, possibly
indirectly through the system dynamics A, affect every single state variable xi (t) or, more generally, every
mode of the system. If a state variable or a mode is completely decoupled from the inputs, we cannot
control it.
• Energy Input: Think of the input u(t) as injecting "energy" or influence into the system through the
input matrix B. The system dynamics, represented by A, propagate this influence through the state
variables. Controllability means this influence can eventually reach all parts of the state space.
• Example (Uncontrollable): Imagine two separate subsystems described by ẋ1 = A1 x1 + B1 u and
ẋ2 = A2 x2 . The overall system state is x = [x1^T , x2^T ]^T . If the input u only affects the first subsystem
(B = [B1^T , 0]^T ), then the second subsystem x2 evolves independently according to ẋ2 = A2 x2 regardless
of the input u. The states in x2 are uncontrollable: we cannot use u to steer x2 to a desired value (a
numerical sketch of this situation follows after this list).

• Modal Interpretation: In the modal coordinates (assuming A is diagonalizable, ż = Λz + V −1 Bu),


controllability relates to whether the input affects each mode zi . If the i-th row of the matrix V −1 B is zero,
then the input u does not directly influence the i-th mode żi = λi zi , making that mode uncontrollable.
Even if V −1 B is non-zero, certain modes might still be uncontrollable due to structural properties (this
is captured by the Kalman rank condition).
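Below is the numerical sketch referred to in the uncontrollable example above (hypothetical matrices). It builds the block-diagonal system with B = [B1^T, 0]^T and evaluates the Kalman controllability matrix [B, AB, . . . , A^{n−1}B] mentioned in the modal interpretation; a rank smaller than n confirms that the input cannot steer the second subsystem.

    import numpy as np

    # Two decoupled subsystems; the input only drives the first one
    A1, B1 = np.array([[-1.0]]), np.array([[1.0]])
    A2 = np.array([[-2.0]])
    A = np.block([[A1, np.zeros((1, 1))],
                  [np.zeros((1, 1)), A2]])
    B = np.vstack([B1, np.zeros((1, 1))])                      # B = [B1; 0]

    # Kalman controllability matrix [B, AB, ..., A^{n-1} B]
    n = A.shape[0]
    ctrb = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])
    print(np.linalg.matrix_rank(ctrb))                         # 1 < n = 2: not completely controllable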
Observability Interpretation:
• Can the outputs reveal all internal states? A system is observable if the behavior of every state
variable xi (t) or mode eventually affects the output y(t) through the output matrix C. If a state variable
or mode evolves without ever influencing the output, its behavior (and initial value) cannot be determined
by observing y(t).
• Information Output: Think of the state x(t) as containing internal information. The output matrix
C determines which parts of this information are made available externally through the output y(t).
Observability means that observing y(t) over time allows us to reconstruct the entire internal state.
• Example (Unobservable): Consider the same two separate subsystems ẋ1 = A1 x1 + B1 u and ẋ2 =
A2 x2 . If the output only depends on the first subsystem, y = C1 x1 , then the state x2 never affects the
output. The states in x2 are unobservable. We cannot determine the initial value of x2 by measuring y.
• Modal Interpretation: In modal coordinates (y = CV z + Du), observability relates to whether each
mode zi affects the output. If the i-th column of the matrix CV is zero, then the i-th mode zi never
influences the output y (beyond the direct Du term), making that mode unobservable.
Duality: There is a mathematical duality between controllability and observability. The system (A, B) is
controllable if and only if the "dual" system (AT , C T ) is observable. Similarly, (A, C) is observable if and only
if (AT , B T ) is controllable. This duality is reflected in the structure of the controllability and observability tests
(discussed next).
Understanding controllability and observability is essential for designing effective control systems and state
estimators. If a system has uncontrollable states, no controller acting through the given inputs can stabilize them
if they are unstable. If a system has unobservable states, their behavior cannot be inferred from measurements,
which poses challenges for feedback control and system monitoring.
