Course note
Manual Rewrite
May 2, 2025
Contents
1 Vector Spaces
  1.1 Definition and Examples (over R, C)
  1.2 Subspaces
  1.3 Linear Independence, Span, Basis, and Dimension
  1.4 Theorem: Existence and Uniqueness of Basis Representation
Chapter 1

Vector Spaces
The space Cn can also be regarded as a vector space over R, but its dimension will be different (2n over R versus n over C).
Example 1.1.3 (Space of Matrices Mm×n (F)). Let Mm×n (F) denote the set of all m × n matrices with entries
from the field F. We can define matrix addition and scalar multiplication in the usual way. If A = [aij ] and
B = [bij ] are two m × n matrices, their sum A + B is the m × n matrix C = [cij ] where cij = aij + bij . If a ∈ F
is a scalar, the scalar multiple aA is the m × n matrix D = [dij ] where dij = aaij . The zero vector is the m × n
zero matrix (all entries are 0), and the additive inverse of A is −A = [−aij ]. With these operations, Mm×n (F)
forms a vector space over F.
Example 1.1.4 (Space of Polynomials P(F)). Let P(F) be the set of all polynomials with coefficients from the field F. A polynomial p(x) can be written as p(x) = a0 + a1 x + a2 x2 + · · · + an xn for some non-negative integer n and coefficients ai ∈ F. Addition of two polynomials p(x) = Σ ai xi and q(x) = Σ bi xi is defined as (p + q)(x) = Σ (ai + bi) xi. Scalar multiplication by c ∈ F is defined as (cp)(x) = Σ (c ai) xi. The zero vector is the zero polynomial (all coefficients are 0). P(F) forms a vector space over F.
Example 1.1.5 (Space of Functions F(S, F)). Let S be any non-empty set and F be a field. Consider the set
F(S, F) of all functions f : S → F. We can define addition of two functions f, g ∈ F(S, F) as the function (f + g)
defined by (f + g)(s) = f (s) + g(s) for all s ∈ S. Scalar multiplication by a ∈ F is defined as the function (af )
given by (af )(s) = a(f (s)) for all s ∈ S. The zero vector is the function that maps every element of S to the
zero element in F. With these operations, F(S, F) forms a vector space over F. Many important vector spaces,
such as spaces of continuous functions or differentiable functions, are subsets of this general construction.
Understanding the abstract definition of a vector space and recognizing these common examples is crucial for
applying the powerful tools of linear algebra to various problems, including the analysis of linear systems.
1.2 Subspaces
Within a given vector space, we often encounter subsets that themselves possess the structure of a vector space
under the same operations. These special subsets are called subspaces. Understanding subspaces is fundamental
because they represent smaller, self-contained linear structures within a larger space.
Formally, let V be a vector space over a field F. A non-empty subset W of V (W ⊆ V, W ̸= ∅) is called a
subspace of V if W itself forms a vector space over F under the vector addition and scalar multiplication
operations defined on V .
While we could verify all ten vector space axioms for W , there is a much simpler and more efficient criterion.
A non-empty subset W of a vector space V is a subspace if and only if it is closed under vector addition and
scalar multiplication. This means:
1. Closure under Addition: For any two vectors w1 , w2 ∈ W , their sum w1 + w2 must also be in W .
2. Closure under Scalar Multiplication: For any scalar a ∈ F and any vector w ∈ W , the scalar multiple
aw must also be in W .
Theorem 1.2.1 (Subspace Criterion). Let V be a vector space over a field F, and let W be a non-empty subset
of V . Then W is a subspace of V if and only if for all w1 , w2 ∈ W and all scalars a ∈ F, the following two
conditions hold:
(i) w1 + w2 ∈ W
(ii) aw1 ∈ W
Proof. (⇒) Necessity: If W is a subspace, it is, by definition, a vector space under the operations inherited from V. Therefore, vector addition and scalar multiplication must be closed operations within W, meaning (i) and (ii) must hold.
(⇐) Sufficiency: Assume W is a non-empty subset of V satisfying conditions (i) and (ii). We need to show W satisfies all
vector space axioms.
– Axioms (A1), (A2), (M1), (M2), (M3), (M4) involve only the properties of addition and scalar
multiplication themselves (associativity, commutativity, distributivity, identity). Since these hold for
all vectors in V , they automatically hold for all vectors in the subset W .
– We need to verify (A3) the existence of the zero vector in W and (A4) the existence of additive
inverses in W .
(A3) Zero Vector: Since W is non-empty, there exists at least one vector w ∈ W . By condition (ii), taking the
scalar a = 0 (the zero scalar in F), we have 0w ∈ W . We know from basic vector space properties
(which can be derived from the axioms, see Exercise) that 0w = 0 (the zero vector of V ). Thus,
0 ∈ W.
(A4) Additive Inverse: Let w ∈ W . By condition (ii), taking the scalar a = −1 (the additive inverse of the multiplicative
identity 1 in F), we have (−1)w ∈ W . We also know from basic vector space properties (see
Exercise) that (−1)w = −w (the additive inverse of w in V ). Thus, for every w ∈ W , its
additive inverse −w is also in W .
Since W is closed under addition and scalar multiplication, and contains the zero vector and additive in-
verses for all its elements, and inherits the necessary associative, commutative, and distributive properties
from V , W is itself a vector space over F. Therefore, W is a subspace of V .
Note: An alternative, equivalent criterion is that a non-empty subset W is a subspace if and only if for all
w1 , w2 ∈ W and all scalars a, b ∈ F, the linear combination aw1 + bw2 is also in W .
Examples of Subspaces:
1. The Trivial Subspaces: For any vector space V , the set containing only the zero vector, {0}, is always
a subspace (the zero subspace). Also, V itself is always a subspace of V .
2. Lines and Planes through the Origin in R3 : Consider V = R3 . The set W1 = {(x, 0, 0) | x ∈ R}
(the x-axis) is a subspace. Let w1 = (x1 , 0, 0) and w2 = (x2 , 0, 0). Then w1 + w2 = (x1 + x2 , 0, 0) ∈ W1 ,
and aw1 = (ax1 , 0, 0) ∈ W1 . Similarly, the set W2 = {(x, y, 0) | x, y ∈ R} (the xy-plane) is a subspace. In
general, any line or plane passing through the origin in R3 forms a subspace.
3. Solutions to Homogeneous Linear Systems: Let A be an m × n matrix over F. The set W = {x ∈
Fn | Ax = 0} (the null space or kernel of A) is a subspace of Fn . To verify:
• W is non-empty since A0 = 0, so 0 ∈ W .
• If x1 , x2 ∈ W , then Ax1 = 0 and Ax2 = 0. Then A(x1 + x2 ) = Ax1 + Ax2 = 0 + 0 = 0, so
x1 + x2 ∈ W .
• If x ∈ W and a ∈ F, then Ax = 0. Then A(ax) = a(Ax) = a0 = 0, so ax ∈ W .
4. Continuous Functions: Let V = C[a, b] be the vector space of all real-valued continuous functions on
the interval [a, b]. The subset W = C 1 [a, b] of all continuously differentiable functions on [a, b] is a subspace
of V . The sum of two differentiable functions is differentiable, and a scalar multiple of a differentiable
function is differentiable.
5. Polynomials of Degree ≤ n: Let V = P (F) be the space of all polynomials. Let Pn (F) be the
subset of polynomials with degree less than or equal to n. If p(x) and q(x) have degrees ≤ n, then
deg(p + q) ≤ max(deg(p), deg(q)) ≤ n. Also, deg(ap) = deg(p) ≤ n (if a ̸= 0). The zero polynomial has
degree −∞ (by convention) and is in Pn (F). Thus, Pn (F) is a subspace of P (F).
Non-Examples:
1. A Line Not Through the Origin: The set W = {(x, 1) | x ∈ R} in R2 is not a subspace because it
does not contain the zero vector (0, 0). Also, it’s not closed under addition: (1, 1) ∈ W and (2, 1) ∈ W ,
but (1, 1) + (2, 1) = (3, 2) ∉ W.
2. The Union of Two Subspaces: Generally, the union of two subspaces is not a subspace. For example,
let W1 be the x-axis and W2 be the y-axis in R2 . Both are subspaces. However, W1 ∪ W2 is not a subspace
because (1, 0) ∈ W1 and (0, 1) ∈ W2, but (1, 0) + (0, 1) = (1, 1) ∉ W1 ∪ W2.
Subspaces inherit the algebraic structure of the parent space, making them fundamental building blocks in the
study of vector spaces and linear transformations.
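As a small numerical companion to Example 3 above (the null space of a matrix), the following sketch checks closure of the solution set of Ax = 0 under addition and scalar multiplication. It assumes Python with NumPy and SciPy; the matrix A and the coefficient vectors are arbitrary illustrative choices, not data from the text.

    import numpy as np
    from scipy.linalg import null_space

    # An arbitrary 2x4 real matrix; its null space is a subspace of R^4.
    A = np.array([[1.0, 2.0, 0.0, -1.0],
                  [0.0, 1.0, 1.0,  3.0]])

    # Columns of N form an (orthonormal) basis of N(A) = {x : Ax = 0}.
    N = null_space(A)

    # Two vectors in the null space and a scalar, chosen arbitrarily.
    x1 = N @ np.array([1.0, -2.0])
    x2 = N @ np.array([0.5,  3.0])
    a = -4.0

    print(np.allclose(A @ (x1 + x2), 0))   # closure under addition
    print(np.allclose(A @ (a * x1), 0))    # closure under scalar multiplication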
1.3 Linear Independence, Span, Basis, and Dimension

Let S = {v1, v2, . . . , vk} be a set of vectors in a vector space V over F. A vector v ∈ V is a linear combination of the vectors in S if there exist scalars c1, c2, . . . , ck ∈ F such that

v = c1 v1 + c2 v2 + · · · + ck vk
The set of all possible linear combinations of vectors in S is called the span of S, denoted as span(S) or lin(S):
span(S) = {c1 v1 + · · · + ck vk | ci ∈ F, vi ∈ S}
If S is an infinite set, the span consists of all linear combinations of any finite subset of S.
Theorem 1.3.1. The span of any subset S of a vector space V is always a subspace of V .
Proof. Let W = span(S). We need to show W is non-empty and closed under addition and scalar multiplication. W is non-empty because it contains the zero vector (take all coefficients equal to 0). If u = c1 v1 + · · · + ck vk and w = d1 w1 + · · · + dm wm are linear combinations of vectors in S, then u + w is again a linear combination of finitely many vectors in S, so u + w ∈ W. Likewise, for any a ∈ F, au = (ac1)v1 + · · · + (ack)vk ∈ W. By the Subspace Criterion (Theorem 1.2.1), W is a subspace of V.
If span(S) = V , we say that the set S spans or generates the vector space V .
Linear Independence
While a set S might span a vector space V , it might contain redundant vectors. The concept of linear indepen-
dence addresses this redundancy.
A set of vectors S = {v1, v2, . . . , vk} in a vector space V is said to be linearly independent if the only solution to the vector equation

c1 v1 + c2 v2 + · · · + ck vk = 0

is the trivial one, c1 = c2 = · · · = ck = 0.
If there exists a non-trivial solution (i.e., at least one ci is non-zero), then the set S is said to be linearly
dependent. An infinite set S is linearly independent if every finite subset of S is linearly independent.
Intuitively, a set is linearly dependent if at least one vector in the set can be expressed as a linear combination
of the others. For example, if c1 v1 + · · · + ck vk = 0 with c1 ̸= 0, then we can write

v1 = −(c2/c1) v2 − · · · − (ck/c1) vk,

showing that v1 is redundant in terms of spanning, as it already lies in the span of the other vectors {v2, . . . , vk}.
Basis
A basis for a vector space V combines the concepts of spanning and linear independence. It is a minimal set of
vectors that spans the entire space.
A subset B of a vector space V is called a basis for V if it satisfies two conditions:
1. Linear Independence: B is a linearly independent set.
2. Spanning: B spans V , i.e., span(B) = V .
Equivalently, a basis is a maximal linearly independent set or a minimal spanning set.
Example 1.3.1. In Rn , the set of standard basis vectors {e1 , e2 , . . . , en }, where ei is the vector with a 1 in
the i-th position and 0s elsewhere, forms a basis. This is called the standard basis for Rn .
• Linear Independence: c1 e1 + · · · + cn en = (c1 , . . . , cn ) = 0 = (0, . . . , 0) implies c1 = · · · = cn = 0.
• Spanning: Any vector x = (x1 , . . . , xn ) ∈ Rn can be written as x = x1 e1 + · · · + xn en .
Example 1.3.2. In P2 (R), the space of polynomials of degree at most 2, the set {1, x, x2 } forms a basis.
• Linear Independence: c1 · 1 + c2 · x + c3 · x2 = 0 (the zero polynomial) for all x implies c1 = c2 = c3 = 0.
• Spanning: Any polynomial p(x) = a0 + a1 x + a2 x2 is clearly a linear combination of {1, x, x2 }.
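The two conditions for a basis of Rn can be tested numerically by computing the rank of the matrix whose columns are the candidate vectors: full column rank means linear independence, and rank n means the vectors span Rn. Below is a minimal sketch assuming Python with NumPy; the helper name is_basis and the example vectors are illustrative choices.

    import numpy as np

    def is_basis(vectors):
        """Return True if the given vectors form a basis of R^n (n = length of each vector)."""
        M = np.column_stack(vectors)      # candidate vectors as columns of an n x k matrix
        n, k = M.shape
        r = np.linalg.matrix_rank(M)
        return r == k == n                # full column rank (independence) and rank n (spanning)

    e = np.eye(3)
    print(is_basis([e[:, 0], e[:, 1], e[:, 2]]))          # standard basis of R^3 -> True
    print(is_basis([np.array([1.0, 2.0, 3.0]),
                    np.array([2.0, 4.0, 6.0]),            # a multiple of the first vector
                    np.array([0.0, 1.0, 0.0])]))          # -> False (linearly dependent)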
Dimension
A crucial property of vector spaces is that although a space can have many different bases, all bases for a given
vector space contain the same number of vectors.
Theorem 1.3.2 (Invariance of Dimension). If a vector space V has a basis consisting of n vectors, then every
basis for V must consist of exactly n vectors.
(Proof omitted here, but relies on the Steinitz Exchange Lemma or similar arguments showing that a linearly
independent set cannot have more elements than a spanning set.)
The number of vectors in any basis for a vector space V is called the dimension of V , denoted as dim(V ).
• If V has a basis with a finite number of vectors n, V is called finite-dimensional, and dim(V ) = n.
• If V does not have a finite basis (e.g., P (F) or C[a, b]), it is called infinite-dimensional.
• By convention, the dimension of the zero vector space {0} is 0.
Examples of Dimension:
• dim(Rn ) = n
• dim(Cn ) = n (as a complex vector space)
• dim(Mm×n (F)) = mn
• dim(Pn (F)) = n + 1
Coordinates
By the theorem on existence and uniqueness of basis representations (Section 1.4), every vector v in a finite-dimensional vector space V can be written in exactly one way as v = c1 b1 + c2 b2 + · · · + cn bn relative to an ordered basis B = {b1, b2, . . . , bn}. The unique scalars c1, c2, . . . , cn are called the coordinates of v relative to the basis B. The vector formed by these coordinates is called the coordinate vector of v relative to B, often denoted as [v]B:
[v]B = (c1, c2, . . . , cn)T ∈ Fn
This theorem establishes a fundamental link between an abstract n-dimensional vector space V over F and
the familiar concrete vector space Fn . Once a basis B is chosen for V , every vector v in V corresponds to a
unique coordinate vector [v]B in Fn , and vice versa. This correspondence preserves the vector space operations,
meaning:
• [u + v]B = [u]B + [v]B
• [av]B = a[v]B
This type of structure-preserving mapping is called an isomorphism. Thus, any n-dimensional vector space over
F is isomorphic to Fn . This allows us to translate problems about abstract vector spaces into problems about
column vectors in Fn , which can often be solved using matrix algebra.
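To make this isomorphism concrete, the sketch below represents polynomials in P2(R) by their coordinate vectors relative to the ordered basis B = {1, x, x2} and checks that the coordinate map preserves the vector space operations. It assumes Python with NumPy; the helper coords and the specific polynomials are illustrative choices.

    import numpy as np

    def coords(p):
        """Coordinate vector [p]_B of p(x) = a0 + a1*x + a2*x^2 relative to B = {1, x, x^2}."""
        return np.array(p, dtype=float)

    p = (1.0, -2.0, 3.0)      # p(x) = 1 - 2x + 3x^2
    q = (0.0,  4.0, 5.0)      # q(x) = 4x + 5x^2
    a = 2.5

    p_plus_q = tuple(pi + qi for pi, qi in zip(p, q))    # (p + q)(x), coefficientwise
    a_p = tuple(a * pi for pi in p)                      # (a p)(x)

    print(np.allclose(coords(p_plus_q), coords(p) + coords(q)))   # [u + v]_B = [u]_B + [v]_B
    print(np.allclose(coords(a_p), a * coords(p)))                # [a v]_B = a [v]_B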
Chapter 2

Linear Transformations and Matrices
Linear transformations are the fundamental mappings between vector spaces that preserve the underlying
linear structure. They are central to linear algebra and play a crucial role in understanding linear systems, as
they represent operations like rotation, scaling, projection, and, importantly, the evolution of linear dynamical
systems.
A map T : V → W between vector spaces over the same field F is called a linear transformation if

T(au + bv) = aT(u) + bT(v) for all u, v ∈ V and all scalars a, b ∈ F.

This single condition states that a linear transformation preserves linear combinations. Intuitively, applying a linear transformation to a linear combination of vectors yields the same result as applying the transformation to each vector individually and then forming the same linear combination of the results.
The kernel (or null space) of a linear transformation T : V → W, denoted ker(T) or N(T), is the set of all vectors in V that T maps to the zero vector of W: ker(T) = {v ∈ V | T(v) = 0W}. It is a subspace of V. The dimension of the kernel is called the nullity of T, denoted nullity(T) = dim(ker(T)). The kernel provides information about whether the transformation is injective (one-to-one).
Theorem 2.1.2. A linear transformation T : V → W is injective if and only if ker(T ) = {0V }.
Proof. (⇒) Assume T is injective. We know 0V ∈ ker(T ) because T (0V ) = 0W . If v ∈ ker(T ), then T (v) = 0W .
Since T (0V ) = 0W and T is injective, we must have v = 0V . Thus, ker(T ) contains only the zero vector.
(⇐) Assume ker(T ) = {0V }. Suppose T (u) = T (v) for some u, v ∈ V . Then T (u) − T (v) = 0W . By linearity,
T (u − v) = 0W . This means u − v ∈ ker(T ). Since ker(T ) only contains the zero vector, we must have
u − v = 0V , which implies u = v. Therefore, T is injective.
Range (Image)
The range or image of a linear transformation T : V → W , denoted as range(T ), Im(T ), or R(T ), is the set
of all vectors in the codomain W that are the image of at least one vector in the domain V :
range(T ) = {w ∈ W | w = T (v) for some v ∈ V }
Theorem 2.1.3. The range of a linear transformation T : V → W is a subspace of W .
The dimension of the range is called the rank of T, denoted rank(T ) = dim(range(T )). The range determines
if the transformation is surjective (onto). T is surjective if and only if range(T ) = W .
Understanding the kernel and range is crucial for analyzing the properties of linear transformations and solving
linear equations. The relationship between their dimensions is captured by the Rank-Nullity Theorem, which
we will discuss later.
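For a matrix transformation T(x) = Ax these subspaces can be computed numerically: the kernel is the null space of A and the range is the column space, whose dimension is rank(A). A short sketch assuming Python with NumPy and SciPy (the example matrix is an arbitrary rank-deficient choice); it also checks rank + nullity = number of columns, anticipating the Rank-Nullity Theorem.

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0],     # second row is twice the first: rank(A) = 2
                  [1.0, 0.0, 1.0]])

    K = null_space(A)                  # columns form a basis of ker(T) = N(A)
    rank = np.linalg.matrix_rank(A)    # dim(range(T)) = dim(Col(A))
    nullity = K.shape[1]

    print("rank =", rank, "nullity =", nullity)
    print(rank + nullity == A.shape[1])     # Rank-Nullity for the matrix
    print(np.allclose(A @ K, 0))            # each kernel basis vector is mapped to 0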
Let T : V → W be a linear transformation, let B = {b1, b2, . . . , bn} be an ordered basis for V, and let C = {c1, c2, . . . , cm} be an ordered basis for W. For each basis vector bj of B, express its image under T in terms of the basis C:

T(bj) = a1j c1 + a2j c2 + · · · + amj cm

The scalars a1j, a2j, . . . , amj are the coordinates of T(bj) relative to the basis C. We can write this coordinate vector as:
[T(bj)]C = (a1j, a2j, . . . , amj)T ∈ Fm
The matrix representation of T relative to the bases B and C, denoted as [T ]CB or A, is the m × n matrix
whose j-th column is the coordinate vector of T (bj ) relative to the basis C:
A = [T]CB =
[ a11  a12  · · ·  a1n ]
[ a21  a22  · · ·  a2n ]
[  ⋮    ⋮    ⋱     ⋮  ]
[ am1  am2  · · ·  amn ]
Now let v ∈ V be an arbitrary vector, and write it in terms of the basis B:

v = x1 b1 + x2 b2 + · · · + xn bn

so that [v]B = (x1, x2, . . . , xn)T.
Apply the linear transformation T to v:
T(v) = T(x1 b1 + x2 b2 + · · · + xn bn) = x1 T(b1) + x2 T(b2) + · · · + xn T(bn)

Taking coordinates relative to C and using the linearity of the coordinate map:

[T(v)]C = x1 [T(b1)]C + x2 [T(b2)]C + · · · + xn [T(bn)]C = A[v]B

Theorem (Matrix Representation of a Linear Transformation). With A = [T]CB as above, [T(v)]C = A[v]B for every v ∈ V.
This theorem provides the bridge between the abstract linear transformation T and its concrete matrix repre-
sentation A. It shows that the action of T on a vector v corresponds to the multiplication of the matrix A by
the coordinate vector of v.
Example 2.2.1. Let T : R2 → R3 be defined by T (x, y) = (x + y, 2x − y, y). Let B = {b1 = (1, 0), b2 = (0, 1)}
be the standard basis for R2 . Let C = {c1 = (1, 0, 0), c2 = (0, 1, 0), c3 = (0, 0, 1)} be the standard basis for R3 .
We find the matrix representation A = [T]CB. First, find the images of the basis vectors in B:

T(b1) = T(1, 0) = (1, 2, 0), so [T(b1)]C = (1, 2, 0)T
T(b2) = T(0, 1) = (1, −1, 1), so [T(b2)]C = (1, −1, 1)T

These coordinate vectors form the columns of the matrix:

A = [T]CB = [ 1  1 ; 2  −1 ; 0  1 ]
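Because B and C are the standard bases here, coordinate vectors coincide with the vectors themselves, so the relation [T(v)]C = A[v]B can be checked directly. A minimal sketch assuming Python with NumPy; the test vector is an arbitrary choice.

    import numpy as np

    def T(x, y):
        """The linear map of Example 2.2.1: T(x, y) = (x + y, 2x - y, y)."""
        return np.array([x + y, 2 * x - y, y], dtype=float)

    # Matrix of T relative to the standard bases; its columns are T(b1) and T(b2).
    A = np.array([[1.0,  1.0],
                  [2.0, -1.0],
                  [0.0,  1.0]])

    v = np.array([3.0, -2.0])            # arbitrary test vector; [v]_B = v for the standard basis
    print(np.allclose(A @ v, T(*v)))     # [T(v)]_C = A [v]_B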
Important Note:
The matrix representation of a linear transformation depends crucially on the choice of bases B and C. Changing
either basis will generally result in a different matrix representation for the same linear transformation. We will
explore the relationship between different matrix representations of the same transformation under a change of
basis later.
Given linear transformations S, T : V → W and a scalar a ∈ F, define (S + T)(v) = S(v) + T(v) and (aT)(v) = a T(v) for all v ∈ V. With these operations, the set of all linear transformations from V to W, denoted L(V, W) or Hom(V, W), itself forms a vector space over F.
Moreover, one checks that [S + T]CB = [S]CB + [T]CB and [aT]CB = a[T]CB. These results show that the mapping from L(V, W) to the space of m × n matrices Mm×n(F) defined by
T ↦ [T]CB is itself a linear transformation (an isomorphism, in fact). This means that the vector space of
linear transformations L(V, W ) is isomorphic to the vector space of matrices Mm×n (F), and dim(L(V, W )) =
dim(Mm×n (F)) = mn.
Now let T : U → V and S : V → W be linear transformations, and let A, B, C be ordered bases of U, V, W respectively. For any u ∈ U,

[(S ∘ T)(u)]C = [S(T(u))]C = [S]CB [T(u)]B = [S]CB [T]BA [u]A

This equation holds for all u ∈ U. By the definition of the matrix representation of S ◦ T relative to bases A
and C, the matrix that multiplies [u]A to give [(S ◦ T )(u)]C is precisely [S ◦ T ]CA . Therefore:
[S ◦ T]CA = [S]CB [T]BA
This fundamental result connects the abstract operation of function composition with the concrete operation
of matrix multiplication. It underscores why matrix multiplication is defined the way it is – it mirrors the
composition of the underlying linear transformations.
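For matrix transformations with standard bases this says that composing T(x) = AT x with S(y) = AS y is represented by the single matrix AS AT. A quick numerical sketch assuming Python with NumPy; the random matrices and test vector are purely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    A_T = rng.standard_normal((3, 2))    # matrix of T : R^2 -> R^3 (standard bases)
    A_S = rng.standard_normal((4, 3))    # matrix of S : R^3 -> R^4 (standard bases)
    u = rng.standard_normal(2)           # arbitrary test vector

    lhs = A_S @ (A_T @ u)                # (S o T)(u): apply T, then S
    rhs = (A_S @ A_T) @ u                # the single matrix [S o T] = [S][T] applied to u
    print(np.allclose(lhs, rhs))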
Let V be an n-dimensional vector space over F with two ordered bases

B = {b1, . . . , bn} and B′ = {b′1, . . . , b′n}
Any vector v ∈ V has unique coordinate representations relative to each basis:
[v]B = (x1, . . . , xn)T such that v = x1 b1 + · · · + xn bn
[v]B′ = (x′1, . . . , x′n)T such that v = x′1 b′1 + · · · + x′n b′n
We want to find a matrix that relates [v]B and [v]B′ .
Consider the identity transformation Id : V → V , defined by Id(v) = v. Let’s find its matrix representation
relative to the bases B ′ (for the domain) and B (for the codomain). This matrix is called the change-of-
coordinates matrix (or transition matrix) from B ′ to B, denoted PB←B′ .
PB←B′ = [Id]BB′
The j-th column of this matrix is the coordinate vector of Id(b′j ) = b′j relative to the basis B:
For any v ∈ V,

[Id(v)]B = [Id]BB′ [v]B′, that is, [v]B = PB←B′ [v]B′
Applying the two conversions in succession gives [v]B = PB←B′ PB′←B [v]B. Since this holds for all [v]B, the matrix product must be the identity matrix:
PB←B′ PB′ ←B = In
This shows that the change-of-coordinates matrices are inverses of each other: PB′←B = (PB←B′)−1.
Now consider a linear transformation T : V → W with matrix A = [T]CB relative to bases B (of V) and C (of W), and matrix A′ = [T]C′B′ relative to bases B′ and C′, so that [T(v)]C = A[v]B and

[T(v)]C′ = A′ [v]B′
Let P = PB←B′ be the change-of-coordinates matrix from B ′ to B in V ([v]B = P [v]B′ ). Let Q = PC←C ′ be
the change-of-coordinates matrix from C ′ to C in W ([w]C = Q[w]C ′ ). Then Q−1 = PC ′ ←C converts coordinates
from C to C ′ ([w]C ′ = Q−1 [w]C ).
Start with [T(v)]C′ = A′[v]B′. We want to relate A′ to A. Applying the change of basis in W and then in V:

[T(v)]C′ = Q−1 [T(v)]C = Q−1 A [v]B = Q−1 A P [v]B′

Since this holds for all [v]B′, we conclude

A′ = Q−1 AP
where:
• A = [T]CB is the matrix of T relative to B and C
• A′ = [T]C′B′ is the matrix of T relative to B′ and C′
In the general formula A′ = Q−1 AP , we now have W = V , C = B, and C ′ = B ′ . So, P = PB←B′ (change of
basis B ′ to B). And Q = PB←B′ (change of basis B ′ to B), which is the same matrix P . Therefore, Q−1 = P −1 .
The formula becomes:
A′ = P −1 AP
where A = [T]B, A′ = [T]B′, and P = PB←B′ is the change-of-coordinates matrix from B′ to B.
Two matrices A and A′ related by A′ = P −1 AP for some invertible matrix P are called similar matrices.
Similar matrices represent the same linear operator T : V → V but relative to different bases. This concept is
fundamental in understanding eigenvalues, eigenvectors, and diagonalization.
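Similarity can be illustrated numerically: for a random invertible P, the matrix A′ = P−1AP represents the same operator in a different basis, and basis-independent quantities such as eigenvalues, trace, and determinant are unchanged. A sketch assuming Python with NumPy; the random matrices are for illustration only.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 4))
    P = rng.standard_normal((4, 4))          # generically invertible

    A_prime = np.linalg.inv(P) @ A @ P       # A' = P^{-1} A P, similar to A

    print(np.allclose(np.sort_complex(np.linalg.eigvals(A)),
                      np.sort_complex(np.linalg.eigvals(A_prime))))   # same eigenvalues
    print(np.isclose(np.trace(A), np.trace(A_prime)))                 # same trace
    print(np.isclose(np.linalg.det(A), np.linalg.det(A_prime)))       # same determinant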
Matrix Transpose
The transpose of an m × n matrix A, denoted as AT (or sometimes A′ ), is the n × m matrix obtained by
interchanging the rows and columns of A. If A = [aij ], then AT = [aji ], where the entry in the i-th row and j-th
column of AT is the entry from the j-th row and i-th column of A.
Properties of Transpose: Let A and B be matrices of appropriate sizes and c be a scalar.
1. (AT)T = A
2. (A + B)T = AT + BT
3. (cA)T = c AT
4. (AB)T = BT AT (the transpose of a product is the product of the transposes in reverse order)
Proof of (4). Let A be an m × n matrix and B an n × p matrix, and let C = AB. The entry in the i-th row and j-th column of (AB)T is cji = Σk ajk bki. Let D = BT AT. The entry in the i-th row and j-th column of D is given by the dot product of the i-th row of BT and the j-th column of AT. The i-th row of BT is the i-th column of B, which has elements bki for k = 1..n. The j-th column of AT is the j-th row of A, which has elements ajk for k = 1..n. So, dij = Σk (BT)ik (AT)kj = Σk bki ajk = Σk ajk bki. Comparing cji and dij, we see they are equal. Thus, (AB)T = BT AT.
Matrix Inverse
For square matrices (n × n), we can define the concept of an inverse, analogous to the reciprocal of a non-zero
number. An n × n matrix A is called invertible (or non-singular) if there exists an n × n matrix B such that:
AB = BA = In
where In is the n × n identity matrix. If such a matrix B exists, it is unique and is called the inverse of A,
denoted as A−1 . If no such matrix B exists, A is called non-invertible (or singular).
Properties of Inverse: Let A and B be invertible n × n matrices and c be a non-zero scalar.
1. (A−1 )−1 = A
2. (AB)−1 = B −1 A−1 (The inverse of a product is the product of the inverses in reverse order )
3. (cA)−1 = (1/c)A−1
4. (AT )−1 = (A−1 )T (The inverse of the transpose is the transpose of the inverse)
Proof of (2). We need to show that (B−1A−1)(AB) = I and (AB)(B−1A−1) = I. Using associativity, (B−1A−1)(AB) = B−1(A−1A)B = B−1 In B = B−1B = In, and similarly (AB)(B−1A−1) = A(BB−1)A−1 = A In A−1 = AA−1 = In. Hence AB is invertible with inverse B−1A−1.
Conditions for Invertibility: For an n × n matrix A, the following conditions are equivalent (if one is true,
all are true):
• A is invertible.
• The equation Ax = 0 has only the trivial solution x = 0.
• The columns of A are linearly independent.
• The columns of A span Fn .
• The columns of A form a basis for Fn .
• The linear transformation T (x) = Ax is injective (one-to-one).
• The linear transformation T (x) = Ax is surjective (onto).
• The rank of A is n (rank(A) = n).
• The nullity of A is 0 (nullity(A) = 0).
• The determinant of A is non-zero (det(A) ̸= 0).
• A is row equivalent to the identity matrix In .
Computation of Inverse: One common method to compute the inverse A−1 is using Gaussian elimination
(row reduction). We form the augmented matrix [A | I] and perform elementary row operations to transform
A into the identity matrix I. The same sequence of operations applied to I will transform it into A−1 :
[A | I] → [I | A−1 ]
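The following is a minimal Gauss-Jordan implementation of the [A | I] → [I | A−1] procedure, assuming Python with NumPy and written for illustration only (in practice one calls a library routine such as numpy.linalg.inv); the helper name inverse_by_row_reduction is an arbitrary choice.

    import numpy as np

    def inverse_by_row_reduction(A):
        """Compute A^{-1} by row-reducing the augmented matrix [A | I] to [I | A^{-1}]."""
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        M = np.hstack([A, np.eye(n)])                      # augmented matrix [A | I]
        for col in range(n):
            pivot = col + np.argmax(np.abs(M[col:, col]))  # partial pivoting for stability
            if np.isclose(M[pivot, col], 0.0):
                raise ValueError("matrix is singular")
            M[[col, pivot]] = M[[pivot, col]]              # swap rows
            M[col] /= M[col, col]                          # scale the pivot row so the pivot is 1
            for row in range(n):                           # eliminate the pivot column elsewhere
                if row != col:
                    M[row] -= M[row, col] * M[col]
        return M[:, n:]                                    # the right half is now A^{-1}

    A = np.array([[2.0, 1.0], [1.0, 2.0]])
    print(np.allclose(inverse_by_row_reduction(A) @ A, np.eye(2)))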
Determinant
The determinant is a scalar value associated with every square matrix A, denoted as det(A) or |A|. It provides
crucial information about the matrix and the linear transformation it represents.
For a 2 × 2 matrix A = [ a  b ; c  d ], det(A) = ad − bc.
For a 3 × 3 matrix A = [ a  b  c ; d  e  f ; g  h  i ], expanding along the first row gives det(A) = a(ei − fh) − b(di − fg) + c(dh − eg).
In general, the determinant can be defined recursively using cofactor expansion. Let Aij be the submatrix
obtained by deleting the i-th row and j-th column of A. The cofactor Cij is defined as Cij = (−1)i+j det(Aij ).
• Expansion across row i: det(A) = Σj aij Cij
• Expansion down column j: det(A) = Σi aij Cij
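A direct recursive implementation of the cofactor expansion along the first row, assuming Python with NumPy. It is for illustration only (its cost grows like n!); library routines such as numpy.linalg.det use LU factorization instead. The helper name det_cofactor is an arbitrary choice.

    import numpy as np

    def det_cofactor(A):
        """Determinant by cofactor expansion along the first row (O(n!), illustrative only)."""
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        if n == 1:
            return A[0, 0]
        total = 0.0
        for j in range(n):
            minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)  # remove row 1 and column j+1
            total += (-1) ** j * A[0, j] * det_cofactor(minor)     # sign (-1)^(1 + (j+1)) = (-1)^j
        return total

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 2.0, 1.0],
                  [0.0, 1.0, 2.0]])
    print(det_cofactor(A), np.linalg.det(A))    # both approximately 4.0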
Theorem (Rank-Nullity). Let V be a finite-dimensional vector space and let T : V → W be a linear transformation. Then rank(T) + nullity(T) = dim(V).

Proof. Let dim(V) = n. Since ker(T) is a subspace of V, it is also finite-dimensional. Let dim(ker(T)) = k, where 0 ≤ k ≤ n.
1. Basis for the Kernel: Choose a basis for ker(T ). Let this basis be Bker = {u1 , u2 , . . . , uk }. Since
this set is linearly independent and consists of k vectors, k = dim(ker(T )) = nullity(T ). If k = n, then
ker(T ) = V , which means T (v) = 0 for all v ∈ V . In this case, range(T ) = {0}, so dim(range(T )) = 0.
The theorem holds: n = n + 0. If k = 0, then ker(T ) = {0}. The basis Bker is the empty set.
2. Extend to a Basis for V: Assume 0 ≤ k < n. Since Bker is a linearly independent set in V , we
can extend it to form a basis for the entire space V (by the Basis Extension Theorem). Let B =
{u1 , . . . , uk , v1 , . . . , vn−k } be such a basis for V . The total number of vectors in this basis is k + (n − k) =
n = dim(V ).
3. Show {T(v1), . . . , T(vn−k)} is a Basis for range(T): We need to show that the set Brange = {T(v1), T(v2), . . . , T(vn−k)} forms a basis for the range of T. This requires proving two things: that Brange spans range(T) and that
it is linearly independent.
• Spanning: Let w be any vector in range(T ). By definition, w = T (x) for some x ∈ V . Since B is
a basis for V, we can write x as a linear combination of the basis vectors:

x = c1 u1 + · · · + ck uk + d1 v1 + · · · + dn−k vn−k

Applying T and using linearity, together with T(ui) = 0W for each ui ∈ ker(T):

w = T(x) = d1 T(v1) + d2 T(v2) + · · · + dn−k T(vn−k)
This shows that any vector w in range(T ) can be written as a linear combination of the vectors in
Brange . Thus, Brange spans range(T ).
• Linear Independence: Suppose there is a linear combination of vectors in Brange that equals the
zero vector:
d1 T (v1 ) + d2 T (v2 ) + · · · + dn−k T (vn−k ) = 0W
Using linearity, this can be written as:

T(d1 v1 + d2 v2 + · · · + dn−k vn−k) = 0W
This implies that the vector z = d1 v1 + · · · + dn−k vn−k is in the kernel of T (ker(T )). Since
Bker = {u1 , . . . , uk } is a basis for ker(T ), z must be expressible as a linear combination of these
vectors:
z = c1 u1 + c2 u2 + · · · + ck uk
Substituting the expression for z:
d1 v1 + · · · + dn−k vn−k = c1 u1 + · · · + ck uk
Since B = {u1 , . . . , uk , v1 , . . . , vn−k } is a basis for V , it is a linearly independent set. Therefore, the
only way this linear combination can equal the zero vector is if all the scalar coefficients are zero:

c1 = c2 = · · · = ck = 0 and d1 = d2 = · · · = dn−k = 0
In particular, we have shown that d1 = d2 = · · · = dn−k = 0. This is precisely the condition needed
to show that the set Brange = {T (v1 ), . . . , T (vn−k )} is linearly independent.
4. Conclusion: Since Brange spans range(T ) and is linearly independent, it forms a basis for range(T ). The
number of vectors in this basis is n − k. Therefore:
dim(range(T )) = rank(T ) = n − k.
We already established that dim(ker(T )) = nullity(T ) = k and dim(V ) = n. Substituting these into the
equation rank(T ) = n − k gives:
rank(T ) = dim(V ) − nullity(T )
Rearranging, we get the Rank-Nullity Theorem:

rank(T) + nullity(T) = dim(V)
Application to Matrices:
If T : Fn → Fm is a linear transformation represented by the matrix multiplication T (x) = Ax, where A is an
m × n matrix, then:
• ker(T ) is the null space of A, N (A) = {x ∈ Fn | Ax = 0}. nullity(T ) = dim(N (A)).
• range(T ) is the column space of A, Col(A) = span{columns of A}. rank(T ) = dim(Col(A)) = rank(A).
The Rank-Nullity Theorem for matrices states:

rank(A) + nullity(A) = n, where n is the number of columns of A.
Chapter 3

Eigenvalues, Eigenvectors, and Diagonalization

Let T : V → V be a linear operator on a vector space V over a field F. A non-zero vector v ∈ V is called an eigenvector of T if there exists a scalar λ ∈ F such that

T(v) = λv
The scalar λ is called the eigenvalue corresponding to the eigenvector v. The term "eigen" comes from German,
meaning "proper" or "characteristic". Thus, eigenvectors are characteristic vectors of the transformation, and
eigenvalues are the characteristic values associated with these directions.
Similarly, for a square matrix A of size n × n representing a linear transformation from Fn to Fn relative to
some basis, a non-zero vector x in Fn is called an eigenvector of A if there exists a scalar λ in F such that
Ax = λx
Geometric Interpretation
Geometrically, the equation Ax = λx signifies that the action of the matrix A on the vector x results in a
vector Ax that is parallel to the original vector x. The eigenvalue λ represents the factor by which the vector x
is stretched or shrunk (or reversed if λ < 0) along the direction defined by x. Eigenvectors define the invariant
directions under the linear transformation represented by A.
For example, consider a rotation matrix in R2 . Unless the rotation is by 0 or π radians, no non-zero vector will
map to a scalar multiple of itself. Thus, such a rotation matrix (over R) has no real eigenvectors. However, if we
consider the transformation over C, eigenvalues and eigenvectors might exist. A projection matrix, on the other
hand, will typically have eigenvectors. Vectors lying in the subspace being projected onto are eigenvectors with
eigenvalue 1, while vectors in the subspace being projected out (the kernel) are eigenvectors with eigenvalue 0.
To find the eigenvalues of a matrix A, we return to the defining equation

Ax = λx
Since we are looking for non-zero vectors x, we can rewrite this equation. Let I be the identity matrix of the
same size as A. Then λx can be written as λIx. Substituting this into the equation gives:
Ax = λIx
(A − λI)x = 0
This equation represents a homogeneous system of linear equations. We are seeking non-trivial solutions (x ̸= 0)
for this system. A fundamental result from linear algebra states that a homogeneous system (M x = 0) has
non-trivial solutions if and only if the matrix M is singular, which means its determinant is zero.
In our case, the matrix is M = (A − λI). Therefore, for non-zero eigenvectors x to exist, the matrix (A − λI)
must be singular. This condition translates to:
det(A − λI) = 0
This equation is called the characteristic equation of the matrix A. The expression det(A − λI), when ex-
panded, results in a polynomial in the variable λ. This polynomial is known as the characteristic polynomial
of A, often denoted as p(λ) or charA (λ).
p(λ) = det(A − λI)
The roots of the characteristic polynomial p(λ) = 0 are precisely the eigenvalues of the matrix A.
Let A be an n × n matrix. The matrix (A − λI) is:
A − λI =
[ a11 − λ   a12       · · ·  a1n      ]
[ a21       a22 − λ   · · ·  a2n      ]
[   ⋮         ⋮         ⋱      ⋮      ]
[ an1       an2       · · ·  ann − λ  ]
The determinant of this matrix, det(A − λI), can be computed using standard determinant expansion methods
(like cofactor expansion). The resulting characteristic polynomial p(λ) will have degree n and can be written as

p(λ) = cn λn + cn−1 λn−1 + · · · + c1 λ + c0
where the coefficients ci depend on the entries of A. Specifically, the leading coefficient cn is (−1)n , and the
constant term c0 is det(A) (obtained by setting λ = 0). Another important coefficient is cn−1 , which is equal
to (−1)n−1 times the trace of A (the sum of the diagonal elements).
By the Fundamental Theorem of Algebra, a polynomial of degree n with complex coefficients has exactly n
complex roots, counting multiplicities. Therefore, an n × n matrix A always has n eigenvalues in the complex
numbers (counting multiplicities). These eigenvalues might be real or complex, and they might not be distinct.
Once the eigenvalues (λ1 , λ2 , . . . , λn ) are found by solving the characteristic equation det(A − λI) = 0, the
corresponding eigenvectors for each eigenvalue λi can be found by solving the homogeneous system of linear
equations:
(A − λi I)x = 0
The non-zero solutions x to this system are the eigenvectors associated with the eigenvalue λi . The set of all
solutions (including the zero vector) forms a subspace called the eigenspace corresponding to λi , denoted as
Eλi or Nul(A − λi I). Any non-zero vector in this eigenspace is an eigenvector for λi .
Example. Find the eigenvalues of A = [ 2  1 ; 1  2 ].
1. Form A − λI:
A − λI = [ 2 − λ  1 ; 1  2 − λ ]
2. Compute the characteristic polynomial:
p(λ) = det(A − λI) = (2 − λ)(2 − λ) − (1)(1) = λ2 − 4λ + 3
3. Solve p(λ) = 0:
λ2 − 4λ + 3 = 0
Factoring the polynomial, we get:
(λ − 1)(λ − 3) = 0
The roots are λ1 = 1 and λ2 = 3.
Therefore, the eigenvalues of the matrix A are 1 and 3.
Finding Eigenvectors: Once an eigenvalue λi is known, the corresponding eigenvectors are the non-zero solutions x of

(A − λi I)x = 0
This is a homogeneous system of linear equations. The set of all solutions (including the zero vector) forms the
eigenspace Eλi = Nul(A − λi I). To find the eigenvectors:
1. Substitute the eigenvalue: Replace λ with the specific eigenvalue λi in the matrix (A − λI).
2. Solve the homogeneous system: Solve the system of linear equations (A − λi I)x = 0 for the vector
x. This is typically done using methods like Gaussian elimination (row reduction) to find the basis for
the null space (eigenspace).
3. Identify the eigenvectors: The non-zero vectors in the basis of the null space are the linearly indepen-
dent eigenvectors corresponding to λi . Any non-zero linear combination of these basis vectors is also an
eigenvector for λi . The set of all solutions (including the zero vector) constitutes the eigenspace Eλi .
Continuing the Example (A = [ 2  1 ; 1  2 ]):
For λ1 = 1:
1. Substitute λ1 = 1 into (A − λI):
A − 1I = [ 2−1  1 ; 1  2−1 ] = [ 1  1 ; 1  1 ]
2. Solve (A − 1I)x = 0:
[ 1  1 ; 1  1 ] (x1, x2)T = (0, 0)T
This corresponds to the system:
x1 + x2 = 0
x1 + x2 = 0
Both equations give x1 = −x2. Let x2 = s (a free parameter). Then x1 = −s, and the solution vector is x = (−s, s)T = s(−1, 1)T.
3. Identify eigenvectors: The eigenspace E1 is spanned by (−1, 1)T. Any non-zero scalar multiple of this vector is an eigenvector for λ1 = 1; for instance, v1 = (−1, 1)T.
For λ2 = 3:
1. Substitute λ2 = 3 into (A − λI):
A − 3I = [ 2−3  1 ; 1  2−3 ] = [ −1  1 ; 1  −1 ]
2. Solve (A − 3I)x = 0:
[ −1  1 ; 1  −1 ] (x1, x2)T = (0, 0)T
This corresponds to the system:
−x1 + x2 = 0
x1 − x2 = 0
Both equations imply x1 = x2. Let x2 = s (a free parameter). Then x1 = s. The solution vector is x = (s, s)T = s(1, 1)T.
3. Identify eigenvectors: The eigenspace E3 is spanned by the vector (1, 1)T. Any non-zero scalar multiple of this vector is an eigenvector for λ2 = 3. For instance, v2 = (1, 1)T is an eigenvector.
In summary, for the matrix A = [ 2  1 ; 1  2 ], the eigenvalues are λ1 = 1 and λ2 = 3. The corresponding eigenspaces are E1 = span{(−1, 1)T} and E3 = span{(1, 1)T}.
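The hand computation can be cross-checked numerically. In the sketch below (assuming Python with NumPy), numpy.linalg.eig returns the eigenvalues together with unit-length eigenvectors as columns, which are scalar multiples of the eigenvectors found above.

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

    eigenvalues, eigenvectors = np.linalg.eig(A)    # eigenvectors are the columns
    print(eigenvalues)                              # approximately [3., 1.] (ordering may differ)

    for lam, v in zip(eigenvalues, eigenvectors.T):
        print(np.allclose(A @ v, lam * v))          # verify A v = lambda v for each pair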
Important Considerations
• Multiplicity: An eigenvalue can have an algebraic multiplicity (its multiplicity as a root of the
characteristic polynomial) and a geometric multiplicity (the dimension of its corresponding eigenspace,
i.e., the number of linearly independent eigenvectors). The geometric multiplicity is always less than or
equal to the algebraic multiplicity (1 ≤ dim(Eλi ) ≤ algebraic multiplicity of λi ).
• Complex Eigenvalues: If the matrix A has real entries, its characteristic polynomial will have real
coefficients. However, the roots (eigenvalues) can still be complex. If a complex number λ is an eigenvalue,
its complex conjugate λ̄ will also be an eigenvalue. The corresponding eigenvectors will also occur in
conjugate pairs.
• Finding Roots: For matrices larger than 2 × 2 or 3 × 3, finding the roots of the characteristic polynomial
analytically can be difficult or impossible. Numerical methods are often employed to approximate the
eigenvalues in such cases.
3.3 Properties of Eigenvalues and Eigenvectors

1. Trace and Eigenvalues: The sum of the eigenvalues of an n × n matrix A (counted with algebraic multiplicity) is equal to the trace of the matrix:

λ1 + λ2 + · · · + λn = tr(A)

Derivation Sketch: The characteristic polynomial is p(λ) = det(A−λI) = (−1)n λn +(−1)n−1 tr(A)λn−1 +
· · ·+det(A). By Vieta’s formulas, the sum of the roots (eigenvalues) of a polynomial is related to its coeffi-
cients. For p(λ), the sum of the roots is −(coefficient of λn−1 )/(coefficient of λn ) = −((−1)n−1 tr(A))/((−1)n ) =
−(− tr(A)) = tr(A).
2. Determinant and Eigenvalues: The product of the eigenvalues of a matrix is equal to the determinant
of the matrix.
λ1 · λ2 · · · · · λn = det(A)
Derivation Sketch: The characteristic polynomial is p(λ) = det(A − λI). Setting λ = 0 gives p(0) =
det(A−0I) = det(A). Also, from the factored form of the polynomial p(λ) = (−1)n (λ−λ1 )(λ−λ2 ) . . . (λ−
λn ), setting λ = 0 gives p(0) = (−1)n (−λ1 )(−λ2 ) . . . (−λn ) = (−1)n (−1)n (λ1 λ2 . . . λn ) = λ1 λ2 . . . λn .
Equating the two expressions for p(0) yields det(A) = λ1 λ2 . . . λn .
3. Eigenvalues of AT : A matrix A and its transpose AT have the same characteristic polynomial and
therefore the same eigenvalues.
Proof. The characteristic polynomial of AT is det(AT − λI). We know that the determinant of a matrix is
equal to the determinant of its transpose: det(M ) = det(M T ). Let M = A − λI. Then M T = (A − λI)T =
AT − (λI)T = AT − λI. Therefore, det(A − λI) = det((A − λI)T ) = det(AT − λI). Since the characteristic
polynomials are identical, their roots (the eigenvalues) must also be identical.
Note: While the eigenvalues are the same, the eigenvectors of A and AT are generally different (unless A
is symmetric).
4. Eigenvalues of A−1 (if A is invertible): If A is an invertible matrix (det(A) ̸= 0) and λ is an eigenvalue
of A with corresponding eigenvector x, then 1/λ is an eigenvalue of A−1 with the same eigenvector x.
Proof. Since A is invertible, det(A) ̸= 0. From property 2, the product of eigenvalues is det(A), so no
eigenvalue λ can be zero. We start with the eigenvalue equation:
Ax = λx

Multiplying both sides on the left by A−1 gives x = λ(A−1 x), and dividing by λ (which is non-zero), A−1 x = (1/λ)x. Thus x is an eigenvector of A−1 with eigenvalue 1/λ.

5. Eigenvalues of Ak (Powers): If λ is an eigenvalue of A with corresponding eigenvector x, then λk is an eigenvalue of Ak with the same eigenvector x, for every integer k ≥ 1.

Proof (by induction on k). The base case k = 1 is the eigenvalue equation Ax = λx. Assume Ak x = λk x for some k ≥ 1. Then:
Ak+1 x = A(Ak x) = A(λk x) = λk (Ax) = λk (λx) = λk+1 x
Thus, the statement holds for k + 1. By induction, Ak x = λk x holds for all integers k ≥ 1. (The case k = 0 is trivial, since A0 x = Ix = x and λ0 x = 1x = x.)
6. Eigenvalues of cA (for scalar c): If λ is an eigenvalue of A with corresponding eigenvector x, then cλ
is an eigenvalue of cA with the same eigenvector x.
Proof. Start with Ax = λx. Multiply both sides by the scalar c:
c(Ax) = c(λx)
(cA)x = (cλ)x
This shows that x is an eigenvector of cA with eigenvalue cλ.
7. Eigenvalues of A + cI: If λ is an eigenvalue of A with corresponding eigenvector x, then λ + c is an
eigenvalue of A + cI with the same eigenvector x.
Proof. Consider the action of (A + cI) on x:
(A + cI)x = Ax + (cI)x
= Ax + cx
Using Ax = λx:
= λx + cx
= (λ + c)x
This shows that x is an eigenvector of A + cI with eigenvalue λ + c.
8. Eigenvalues of Triangular Matrices: The eigenvalues of a triangular matrix (either upper or lower)
are the entries on its main diagonal.
Proof. Let A be an n × n upper triangular matrix. Then (A − λI) is also an upper triangular matrix with
diagonal entries (a11 −λ), (a22 −λ), . . . , (ann −λ). The determinant of a triangular matrix is the product of
its diagonal entries. Therefore, det(A − λI) = (a11 − λ)(a22 − λ) . . . (ann − λ). The characteristic equation
is (a11 − λ)(a22 − λ) . . . (ann − λ) = 0. The roots (eigenvalues) are λ1 = a11 , λ2 = a22 , . . . , λn = ann . The
same logic applies to lower triangular matrices.
9. Linear Independence of Eigenvectors: Eigenvectors corresponding to distinct eigenvalues are linearly
independent. (This is a crucial theorem that will be proven in the next section).
These properties are fundamental tools for working with eigenvalues and eigenvectors, simplifying calculations
and providing theoretical insights.
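Several of these properties are easy to confirm numerically for a random matrix. The sketch below (assuming Python with NumPy; the random matrix and the shift c are arbitrary) checks the trace and determinant relations (properties 1 and 2), the eigenvalues of AT (property 3), and the shift property (property 7).

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((5, 5))
    lam = np.linalg.eigvals(A)                        # eigenvalues (complex in general)

    print(np.isclose(lam.sum(), np.trace(A)))         # property 1: sum of eigenvalues = trace
    print(np.isclose(lam.prod(), np.linalg.det(A)))   # property 2: product of eigenvalues = det

    lam_T = np.linalg.eigvals(A.T)                    # property 3: same eigenvalues as A
    print(np.allclose(np.sort_complex(lam), np.sort_complex(lam_T)))

    c = 2.0                                           # property 7: eigenvalues of A + cI are lambda + c
    lam_shift = np.linalg.eigvals(A + c * np.eye(5))
    print(np.allclose(np.sort_complex(lam), np.sort_complex(lam_shift - c)))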
Theorem (Linear Independence of Eigenvectors for Distinct Eigenvalues). Let v1, v2, . . . , vk be eigenvectors of A corresponding to distinct eigenvalues λ1, λ2, . . . , λk. Then the set {v1, v2, . . . , vk} is linearly independent.

Proof (by induction on k). For k = 1 the claim holds because an eigenvector is non-zero. Assume the claim holds for any k − 1 eigenvectors corresponding to distinct eigenvalues, and suppose that

c1 v1 + c2 v2 + · · · + ck vk = 0 (3.1)

for some scalars c1, . . . , ck. Apply A to both sides of Equation (3.1):

A(c1 v1 + c2 v2 + · · · + ck vk) = A(0)
Since vi is an eigenvector corresponding to eigenvalue λi , we have Avi = λi vi . Substitute this into the equation:
c1 λ1 v1 + c2 λ2 v2 + · · · + ck λk vk = 0 (3.2)

Next, multiply both sides of Equation (3.1) by the scalar λk:

λk (c1 v1 + c2 v2 + · · · + ck vk) = λk (0)

c1 λk v1 + c2 λk v2 + · · · + ck λk vk = 0 (3.3)

Subtracting Equation (3.3) from Equation (3.2):
(c1 λ1 v1 + · · · + ck λk vk) − (c1 λk v1 + · · · + ck λk vk) = 0 − 0

c1 (λ1 − λk) v1 + c2 (λ2 − λk) v2 + · · · + ck−1 (λk−1 − λk) vk−1 = 0

(the terms involving vk cancel). By the induction hypothesis, the eigenvectors v1, . . . , vk−1 are linearly independent, so each coefficient in this combination must be zero:
c1 (λ1 − λk ) = 0
c2 (λ2 − λk ) = 0
...
ck−1 (λk−1 − λk ) = 0
Since all the eigenvalues are distinct by assumption, we know that λi ̸= λk for i = 1, 2, . . . , k − 1. This means
that (λi − λk ) ̸= 0 for all i in this range.
Because (λi − λk ) ̸= 0, the only way for the products ci (λi − λk ) to be zero is if the coefficients ci are zero:
c1 = 0
c2 = 0
...
ck−1 = 0
Now, substitute these zero coefficients back into the original Equation (3.1); it reduces to

ck vk = 0
Since vk is an eigenvector, it must be non-zero (vk ̸= 0). Therefore, the only way for ck vk to be zero is if the
scalar ck is zero:
ck = 0
We have now shown that c1 = c2 = · · · = ck−1 = ck = 0 is the only solution to the linear dependence equation
(3.1). This proves that the set of eigenvectors {v1 , v2 , . . . , vk } is linearly independent.
Conclusion: By the principle of mathematical induction, the theorem holds for any number k of eigenvectors
corresponding to distinct eigenvalues.
Significance: This theorem guarantees that if an n × n matrix A has n distinct eigenvalues, then it has a set
of n linearly independent eigenvectors. As we will see in the next section, this is a sufficient condition for the
matrix to be diagonalizable.
Definition of Diagonalizability
A square matrix A of size n × n is said to be diagonalizable if it is similar to a diagonal matrix. That is, A is
diagonalizable if there exists an invertible matrix V and a diagonal matrix Λ such that:
A = V ΛV −1
Equivalently, multiplying by V −1 from the left and V from the right, we can write:
Λ = V −1 AV
This means that A can be transformed into a diagonal matrix Λ through a similarity transformation using the
matrix V .
Theorem (Diagonalizability and Eigenvectors). An n × n matrix A is diagonalizable if and only if it has n linearly independent eigenvectors.

Proof. (⇒) Suppose A is diagonalizable, so A = V ΛV −1 for some invertible matrix V = [v1 |v2 | . . . |vn] and diagonal matrix Λ with diagonal entries λ1, . . . , λn. Multiplying on the right by V gives AV = V Λ.

Now, let’s look at the equation AV = V Λ column by column. The j-th column of AV is A times the j-th column
of V , which is Avj . The j-th column of V Λ is V times the j-th column of Λ. The j-th column of Λ is a vector
with λj in the j-th position and zeros elsewhere. Thus, the j-th column of V Λ is:
[v1 |v2 | . . . |vn] (0, . . . , 0, λj, 0, . . . , 0)T = 0·v1 + · · · + 0·vj−1 + λj vj + 0·vj+1 + · · · + 0·vn = λj vj

Equating the j-th columns of AV and V Λ gives:
Avj = λj vj
This is precisely the eigenvalue equation. It shows that the columns vj of the matrix V are eigenvectors of A,
and the corresponding diagonal entries λj of Λ are the eigenvalues of A.
Since A is diagonalizable, the matrix V exists and is invertible. A matrix is invertible if and only if its columns
are linearly independent. Therefore, the eigenvectors v1 , v2 , . . . , vn (which form the columns of V ) must be
linearly independent.
Furthermore, since V is an n × n matrix and its columns are linearly independent, these n eigenvectors form
a basis for the vector space Fn (where F is the field, typically R or C). Such a basis consisting entirely of
eigenvectors is called an eigenbasis.
Thus, if A is diagonalizable, it must have n linearly independent eigenvectors (which form an eigenbasis).
(⇐) Conversely, suppose A has n linearly independent eigenvectors v1, v2, . . . , vn with Avj = λj vj. Form the matrix V = [v1 |v2 | . . . |vn] whose columns are these eigenvectors. Construct the diagonal matrix Λ whose diagonal entries are the corresponding eigenvalues:
Λ =
[ λ1   0    · · ·   0  ]
[ 0    λ2   · · ·   0  ]
[ ⋮     ⋮     ⋱     ⋮  ]
[ 0    0    · · ·   λn ]
By the same column-by-column computation as above, AV = [Av1 | . . . |Avn] = [λ1 v1 | . . . |λn vn] = V Λ. Since V is invertible (its columns are n linearly independent vectors in Fn), multiply both sides on the right by V −1:

(AV )V −1 = (V Λ)V −1
A(V V −1 ) = V ΛV −1
AI = V ΛV −1
A = V ΛV −1
This shows that A is similar to the diagonal matrix Λ. Therefore, A is diagonalizable.
Process of Diagonalization
If an n × n matrix A is diagonalizable, the process to find the matrices V and Λ such that A = V ΛV −1 is as
follows:
1. Find the Eigenvalues: Solve the characteristic equation det(A − λI) = 0 to find the n eigenvalues
λ1 , λ2 , . . . , λn of A.
2. Find Linearly Independent Eigenvectors: For each distinct eigenvalue λi , find a basis for the corre-
sponding eigenspace Eλi = Nul(A − λi I). Collect all the basis vectors found for all eigenspaces.
3. Check for Diagonalizability: Check if the total number of linearly independent eigenvectors found in
Step 2 is equal to n (the size of the matrix). If it is less than n, the matrix A is not diagonalizable. If it
is equal to n, then A is diagonalizable.
• Sufficient Condition: A shortcut based on Section 3.4: If A has n distinct eigenvalues, then it is
guaranteed to have n linearly independent eigenvectors, and thus A is diagonalizable.
• Necessary and Sufficient Condition: A is diagonalizable if and only if the geometric multiplicity
(dimension of the eigenspace) of each eigenvalue equals its algebraic multiplicity (multiplicity as a
root of the characteristic polynomial). (This will be discussed further in the next section).
4. Construct V: Form the matrix V whose columns are the n linearly independent eigenvectors found in
Step 2. V = [v1 |v2 | . . . |vn ]. The order of the eigenvectors in V matters.
5. Construct Λ: Form the diagonal matrix Λ whose diagonal entries are the eigenvalues corresponding to the
eigenvectors in V , in the same order. That is, if the j-th column of V is the eigenvector vj corresponding
to eigenvalue λj , then the j-th diagonal entry of Λ must be λj .
6. Verification (Optional but Recommended): Calculate V −1 and verify that A = V ΛV −1 or, more
easily, check that AV = V Λ.
Example 3.5.1. Diagonalize A = [ 2  1 ; 1  2 ] (from Section 3.2).
1. Eigenvalues: We found λ1 = 1, λ2 = 3.
2. Eigenvectors: We found corresponding eigenvectors v1 = (−1, 1)T for λ1 = 1 and v2 = (1, 1)T for λ2 = 3.
3. Check: We have 2 linearly independent eigenvectors ({v1 , v2 }) for a 2 × 2 matrix. Thus, A is diagonal-
izable.
4. Construct V: V = [v1 |v2] = [ −1  1 ; 1  1 ]
5. Construct Λ: Λ = [ λ1  0 ; 0  λ2 ] = [ 1  0 ; 0  3 ]
6. Verify (AV = V Λ):
AV = [ 2  1 ; 1  2 ][ −1  1 ; 1  1 ] = [ −2+1  2+1 ; −1+2  1+2 ] = [ −1  3 ; 1  3 ]
V Λ = [ −1  1 ; 1  1 ][ 1  0 ; 0  3 ] = [ −1(1)+1(0)  −1(0)+1(3) ; 1(1)+1(0)  1(0)+1(3) ] = [ −1  3 ; 1  3 ]
Since AV = V Λ, the diagonalization is correct. A = V ΛV −1 .
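The same diagonalization can be reproduced numerically: numpy.linalg.eig returns an eigenvector matrix directly (its columns are unit-length, so it differs from the hand-chosen V only by column scaling). A minimal sketch assuming Python with NumPy.

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

    eigenvalues, V = np.linalg.eig(A)   # columns of V are eigenvectors of A
    Lam = np.diag(eigenvalues)          # eigenvalues on the diagonal, in matching order

    print(np.allclose(A @ V, V @ Lam))                   # AV = V Lambda
    print(np.allclose(A, V @ Lam @ np.linalg.inv(V)))    # A = V Lambda V^{-1}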
3.6 Theorem: Diagonalizability Condition (Algebraic vs. Geometric Multiplicity)

Theorem (Geometric vs. Algebraic Multiplicity). For every eigenvalue λ0 of an n × n matrix A, 1 ≤ GM(λ0) ≤ AM(λ0).

Proof Sketch. Let λ0 be an eigenvalue of A with geometric multiplicity k = GM(λ0). This means the eigenspace
Eλ0 has dimension k. Let {v1 , v2 , . . . , vk } be a basis for Eλ0 . We can extend this basis to a basis for the entire
space Fn : {v1 , . . . , vk , w1 , . . . , wn−k }. Form an invertible matrix P whose columns are these basis vectors:
P = [v1 | . . . |vk |w1 | . . . |wn−k ]. Consider the matrix P −1 AP . The first k columns of AP are Av1 , . . . , Avk .
Since vi are eigenvectors for λ0 , Avi = λ0 vi . So, AP = [λ0 v1 | . . . |λ0 vk |Aw1 | . . . |Awn−k ]. Then P −1 AP =
P −1 [λ0 v1 | . . . |λ0 vk |Aw1 | . . . |Awn−k ]. Since P −1 P = I, and the first k columns of P are v1 , . . . , vk , the first
k columns of P −1 AP will be P −1 (λ0 vi ) = λ0 (P −1 vi ). Because vi is the i-th column of P , P −1 vi is the i-th
standard basis vector ei . Thus, the first k columns of P −1 AP are λ0 e1 , . . . , λ0 ek . This means P −1 AP has the
block form:
P −1 AP = [ λ0 Ik   B ; 0   C ]
where Ik is the k × k identity matrix, B is k × (n − k), 0 is (n − k) × k zero matrix, and C is (n − k) × (n − k).
Now, consider the characteristic polynomial of A. Since A and P −1 AP are similar matrices, they have the same
characteristic polynomial:
det(A − λI) = det(P −1 AP − λI)
det(P −1 AP − λI) = det [ (λ0 − λ) Ik   B ; 0   C − λIn−k ]
This is the determinant of a block upper triangular matrix, which is the product of the determinants of the
diagonal blocks:

det(P −1 AP − λI) = det((λ0 − λ) Ik) · det(C − λIn−k) = (λ0 − λ)k · det(C − λIn−k)
The characteristic polynomial det(A − λI) has a factor of (λ0 − λ)k . This implies that the algebraic multiplicity
of λ0 , AM(λ0 ), must be at least k. Since k = GM(λ0 ), we have shown that GM(λ0 ) ≤ AM(λ0 ). The inequality
1 ≤ GM(λ) holds because an eigenvalue must have at least one corresponding eigenvector (which spans a space
of dimension at least 1).
Theorem (Diagonalizability Criterion). An n × n matrix A over F is diagonalizable over F if and only if its characteristic polynomial factors completely over F and GM(λi) = AM(λi) for every distinct eigenvalue λi.

Proof. (⇒) Assume A is diagonalizable, so it has n linearly independent eigenvectors forming an eigenbasis of Fn. Let λ1, . . . , λp be the distinct eigenvalues of A. Counting the roots of the characteristic polynomial (which has degree n) gives

AM(λ1) + AM(λ2) + · · · + AM(λp) = n (3.5)

and counting the eigenbasis vectors, grouped by the eigenspace each one lies in, gives

GM(λ1) + GM(λ2) + · · · + GM(λp) = n (3.6)

Furthermore, we know from the previous theorem that for each eigenvalue, GM(λi) ≤ AM(λi).
Comparing Equations (3.5) and (3.6), and knowing GM(λi ) ≤ AM(λi ), the only way for both sums to equal n
is if GM(λi ) = AM(λi ) for all i = 1, . . . , p. Also, for A to be diagonalizable over F, the eigenvalues λi (roots of
the characteristic polynomial) must be in F, meaning the polynomial factors completely over F.
(⇐) Assume the characteristic polynomial factors completely over F and GM(λi ) = AM(λi ) for all
distinct eigenvalues λi .
Let λ1 , . . . , λp be the distinct eigenvalues. Since the characteristic polynomial factors completely, the sum of
the algebraic multiplicities is n:
AM(λ1 ) + · · · + AM(λp ) = n
By assumption, GM(λi ) = AM(λi ) for all i. Therefore:
GM(λ1 ) + · · · + GM(λp ) = n
Let Bi be a basis for the eigenspace Eλi . The size of Bi is |Bi | = GM(λi ). Consider the union of these bases:
B = B1 ∪ B2 ∪ · · · ∪ Bp . The total number of vectors in B is |B| = GM(λ1 ) + · · · + GM(λp ) = n.
As argued before, the set B consists of eigenvectors, and vectors from different eigenspaces are linearly indepen-
dent. Within each Bi , the vectors are linearly independent by definition of a basis. Therefore, the entire set B
is a collection of n linearly independent eigenvectors of A.
Since A has n linearly independent eigenvectors, by the theorem in Section 3.5, A is diagonalizable.
Conclusion: The condition GM(λi ) = AM(λi ) for all eigenvalues λi , along with the requirement that all
eigenvalues belong to the field of interest, provides a precise criterion for determining if a matrix is diagonalizable
without explicitly finding the full set of n eigenvectors
and checking their independence.
Example 3.6.1 (Revisited). Consider A = [ 2  1 ; 1  2 ]. Eigenvalues are λ1 = 1, λ2 = 3. AM(1) = 1, AM(3) = 1. Sum = 1 + 1 = 2 = n. We found E1 = span{(−1, 1)T}, so GM(1) = dim(E1) = 1. We found E3 = span{(1, 1)T}, so GM(3) = dim(E3) = 1. Since GM(1) = AM(1) = 1 and GM(3) = AM(3) = 1, the matrix A is diagonalizable (as we already knew).
Example 3.6.2 (Non-Diagonalizable). Consider B = [ 2  1 ; 0  2 ]. Characteristic polynomial: det(B − λI) = det [ 2−λ  1 ; 0  2−λ ] = (2 − λ)(2 − λ) = (λ − 2)2. Eigenvalue: λ1 = 2 with AM(2) = 2. Find the eigenspace E2 = Nul(B − 2I):

B − 2I = [ 0  1 ; 0  0 ]

(B − 2I)x = 0  =⇒  [ 0  1 ; 0  0 ] (x1, x2)T = (0, 0)T  =⇒  0x1 + 1x2 = 0  =⇒  x2 = 0.

x1 is a free variable. Solution: x = (x1, 0)T = x1 (1, 0)T. The eigenspace E2 is spanned by (1, 0)T. Thus, GM(2) = dim(E2) = 1. Since GM(2) = 1 < AM(2) = 2, the matrix B is not diagonalizable.
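The multiplicity comparison can be reproduced numerically: the geometric multiplicity is the nullity of B − 2I, obtained here from the rank. A sketch assuming Python with NumPy (counting the algebraic multiplicity by matching computed eigenvalues is only reliable for simple examples like this one, since repeated eigenvalues are sensitive to rounding).

    import numpy as np

    B = np.array([[2.0, 1.0],
                  [0.0, 2.0]])
    lam = 2.0
    n = B.shape[0]

    gm = n - np.linalg.matrix_rank(B - lam * np.eye(n))      # GM(2) = nullity(B - 2I)
    am = int(np.sum(np.isclose(np.linalg.eigvals(B), lam)))  # AM(2), counted numerically

    print("GM(2) =", gm, " AM(2) =", am)   # GM(2) = 1 < AM(2) = 2, so B is not diagonalizable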
Definition of Similarity
Two n × n matrices A and B (over the same field F) are said to be similar if there exists an invertible n × n
matrix P such that:
B = P −1 AP
The transformation from A to B (or vice versa, since A = (P −1 )−1 B(P −1 )) is called a similarity transfor-
mation. The matrix P is often referred to as the change-of-basis matrix.
Connection to Linear Transformations: Let T : V → V be a linear operator, let α and β be two ordered bases of V, and let A = [T]α and B = [T]β be the matrices of T relative to α and β. Let P = Pα←β be the change-of-coordinates matrix from β to α, so that [x]α = P [x]β. By the definition of B, for any x ∈ V:

[T(x)]β = B[x]β
We can also express [T (x)]β using basis α and the change-of-basis matrices:
[T(x)]β = P −1 [T(x)]α = P −1 A [x]α = P −1 AP [x]β

Comparing the two expressions for [T(x)]β:

B[x]β = (P −1 AP )[x]β
Since this must hold for all vectors x (and thus all coordinate vectors [x]β ), we must have:
B = P −1 AP
This shows that the matrices representing the same linear transformation T with respect to different bases
(A = [T ]α and B = [T ]β ) are similar. The matrix P is the change-of-basis matrix from β to α.
Conversely, if two matrices A and B are similar (B = P −1 AP ), they can be interpreted as representing the
same linear transformation but with respect to different bases.
Chapter 4

Jordan Normal Form

Not all square matrices are diagonalizable. As seen in Section 3.6, a matrix is diagonalizable if and only if
for every eigenvalue, its geometric multiplicity equals its algebraic multiplicity. When this condition fails for
one or more eigenvalues, the matrix lacks a full set of n linearly independent eigenvectors needed to form an
eigenbasis. However, it is still possible to find a basis that simplifies the matrix representation of the associated
linear transformation into a near-diagonal form called the Jordan Normal Form (or Jordan Canonical Form).
This form is crucial for understanding the structure of non-diagonalizable matrices and has important ap-
plications, particularly in solving systems of linear differential equations when the coefficient matrix is not
diagonalizable.
Definition
Let A be an n × n matrix and let λ be an eigenvalue of A. A non-zero vector x is called a generalized
eigenvector of rank k corresponding to the eigenvalue λ if:
(A − λI)k x = 0
and
(A − λI)k−1 x ̸= 0
where k is a positive integer.
• A generalized eigenvector of rank 1 is simply a regular eigenvector, because (A − λI)1 x = 0 and (A −
λI)0 x = Ix = x ̸= 0 (since eigenvectors are non-zero).
• A generalized eigenvector of rank k > 1 is not a regular eigenvector, because (A − λI)x ̸= 0 (otherwise
its rank would be 1).
Jordan Chains: Given a generalized eigenvector xk of rank k corresponding to the eigenvalue λ, we obtain a chain of vectors by repeatedly applying (A − λI):

xk (rank k)
xk−1 = (A − λI)xk (rank k − 1)
xk−2 = (A − λI)xk−1 = (A − λI)2 xk (rank k − 2)
...
x1 = (A − λI)x2 = (A − λI)k−1 xk (rank 1 - this is a regular eigenvector)
(A − λI)x1 = (A − λI)k xk = 0
This sequence {x1 , x2 , . . . , xk } is called a Jordan chain of length k, generated by the generalized eigenvector
xk . The vector x1 at the end of the chain is always a standard eigenvector.
Action of A on the Chain: Let’s see how the matrix A acts on the vectors in a Jordan chain:
• For x1 (the eigenvector): Ax1 = λx1
• For x2 : (A − λI)x2 = x1 =⇒ Ax2 − λx2 = x1 =⇒ Ax2 = λx2 + x1
• For x3 : (A − λI)x3 = x2 =⇒ Ax3 − λx3 = x2 =⇒ Ax3 = λx3 + x2
• ...
• For xj (1 < j ≤ k): (A − λI)xj = xj−1 =⇒ Axj − λxj = xj−1 =⇒ Axj = λxj + xj−1
This structure (Axj = λxj + xj−1 ) is key to understanding the form of Jordan blocks, which will be discussed
in the next section.
Generalized Eigenspace
For a given eigenvalue λ, the set of all generalized eigenvectors corresponding to λ, together with the zero vector,
forms a subspace called the generalized eigenspace of λ, denoted Kλ .
Equivalently, Kλ is the null space of some power of the matrix (A − λI). It can be shown that if AM(λ) = m,
then Kλ = Nul((A − λI)m ).
Properties of Generalized Eigenspaces:
1. Subspace: Kλ is a subspace of Fn .
2. Invariance: Kλ is an invariant subspace under A, meaning that if x ∈ Kλ , then Ax ∈ Kλ .
Proof. If x ∈ Kλ , then (A − λI)j x = 0 for some j. We need to show (A − λI)p (Ax) = 0 for some p.
Consider (A − λI)j (Ax). Since A commutes with (A − λI), (A − λI)j (Ax) = A((A − λI)j x) = A(0) = 0.
So Ax ∈ Kλ (with p=j).
3. Dimension: The dimension of the generalized eigenspace Kλ is equal to the algebraic multiplicity of the
eigenvalue λ: dim(Kλ ) = AM(λ). This is a crucial result. While the dimension of the regular eigenspace
Eλ (which is GM(λ)) might be smaller than AM(λ), the dimension of the generalized eigenspace Kλ
always matches AM(λ).
4. Direct Sum Decomposition: The entire vector space Fn can be decomposed into a direct sum of the
generalized eigenspaces corresponding to the distinct eigenvalues λ1, . . . , λp of A:

Fn = Kλ1 ⊕ Kλ2 ⊕ · · · ⊕ Kλp
This means every vector in Fn can be uniquely written as a sum of vectors, one from each generalized
eigenspace.
Computing Jordan Chains: For each eigenvalue λ of A with m = AM(λ), the Jordan chains can be found as follows.
1. Find the Eigenvalue Data: Determine the eigenvalue λ and its algebraic multiplicity m.
2. Check the Eigenspace: Compute GM(λ) = dim Nul(A − λI); if GM(λ) < AM(λ), generalized eigenvectors are needed.
3. Compute Powers: Form the powers (A − λI), (A − λI)2 , . . . , (A − λI)m .
4. Determine Ranks/Nullities: Find the nullities (dimensions of the null spaces) of these powers: n1 =
nullity(A − λI) = GM(λ), n2 = nullity((A − λI)2 ), . . . , nm = nullity((A − λI)m ) = AM(λ). The number
of Jordan blocks of size k × k or larger associated with λ is given by nk − nk−1 (with n0 = 0). The number
of blocks of exactly size k × k is (nk − nk−1 ) − (nk+1 − nk ) = 2nk − nk−1 − nk+1 .
5. Find Chain Generators: Find vectors x that are in Nul((A − λI)k ) but not in Nul((A − λI)k−1 ). These
are the generalized eigenvectors of rank k (the top of the chains).
• Start by finding a basis for Nul((A − λI)m ) relative to Nul((A − λI)m−1 ). These vectors will generate
the longest chains (length m).
• Then find a basis for Nul((A − λI)m−1 ) relative to Nul((A − λI)m−2 ), ensuring linear independence
from vectors already generated by longer chains. These generate chains of length m-1.
• Continue this process down to Nul(A − λI), which contains the regular eigenvectors (bottoms of the
chains).
6. Construct Chains: For each generator xk found in step 5, construct the full Jordan chain: xk , xk−1 =
(A − λI)xk , . . . , x1 = (A − λI)k−1 xk .
The collection of all vectors from all Jordan chains for all eigenvalues forms a basis for Fn , known as a Jordan
basis. This basis is what allows the transformation of A into its Jordan Normal Form.
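In exact arithmetic this construction is automated by computer algebra systems. For instance, SymPy's Matrix.jordan_form returns a transformation matrix P and the Jordan form J with A = P J P−1; a minimal sketch (assuming Python with SymPy) applied to the non-diagonalizable matrix of Example 3.6.2:

    import sympy as sp

    B = sp.Matrix([[2, 1],
                   [0, 2]])

    P, J = B.jordan_form()     # returns (P, J) with B = P * J * P**(-1)
    print(J)                   # a single Jordan block J_2(2): Matrix([[2, 1], [0, 2]])
    print(P * J * P.inv() - B) # the zero matrix confirms the factorization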
Jordan Blocks
A Jordan block is a square matrix associated with a single eigenvalue λ. It has the eigenvalue λ on the main
diagonal, ones on the super-diagonal (the diagonal directly above the main diagonal), and zeros everywhere
else. A k × k Jordan block, denoted Jk (λ), has the form:
Jk(λ) =
[ λ  1  0  · · ·  0 ]
[ 0  λ  1  · · ·  0 ]
[ 0  0  λ  · · ·  0 ]
[ ⋮  ⋮  ⋮   ⋱    1 ]
[ 0  0  0  · · ·  λ ]
• Generalized Eigenvectors: The standard basis vectors e1 , e2 , . . . , ek form a Jordan chain for Jk (λ)
corresponding to eigenvalue λ. Let J = Jk (λ). Then J − λI is the matrix N with 1s on the super-diagonal
and 0s elsewhere.
(J − λI)e1 = N e1 = 0
(J − λI)e2 = N e2 = e1
(J − λI)e3 = N e3 = e2
...
(J − λI)ek = N ek = ek−1
This matches the structure Axj = λxj + xj−1 derived for Jordan chains in Section 4.1, with J playing
the role of A and the standard basis vectors ej playing the role of the chain vectors xj (in reverse order:
ek is the generator xk , e1 is the eigenvector x1 ). Also, (J − λI)k = N k = 0 (N is nilpotent), confirming
that e1 , . . . , ek are generalized eigenvectors.
1. Jordan Normal Form: Every n × n matrix A (over C, or over any field containing all of its eigenvalues) is similar to a block diagonal matrix J whose diagonal blocks are Jordan blocks; that is, there exists an invertible matrix P such that

J = P −1 AP or equivalently, A = P JP −1
2. Structure of P: The columns of the matrix P are the vectors of a Jordan basis for A, arranged appro-
priately. Specifically, the columns corresponding to a single Jordan block Jk (λ) must be the vectors of the
corresponding Jordan chain {x1 , . . . , xk }, ordered from the eigenvector x1 to the generator xk .
Existence (Proof Sketch): The construction proceeds as follows.
1. Decomposition: Decompose the space as the direct sum of generalized eigenspaces, Fn = Kλ1 ⊕ · · · ⊕ Kλp (see the Direct Sum Decomposition property above).
2. Invariant Subspaces: Each Kλi is invariant under the transformation A. This means that the action of
A can be studied independently within each Kλi . If we choose a basis for Fn that is the union of bases for
each Kλi , the matrix representation of A in this basis will be block diagonal, with blocks corresponding
to the restriction of A to each Kλi .
3. Action within a Generalized Eigenspace: Within a single generalized eigenspace Kλ , the transfor-
mation A acts like λI + N , where N = A − λI is nilpotent on Kλ (meaning N m = 0 on Kλ , where
m = AM(λ)).
4. Jordan Basis for Nilpotent Operator: The core of the construction involves finding a specific basis
for Kλ (a Jordan basis) such that the matrix representation of the nilpotent operator N restricted to Kλ
takes the form of a block diagonal matrix composed of Jordan blocks with 0 on the diagonal (e.g., Jk (0)).
This construction involves analyzing the null spaces of N, N 2 , . . . , N m and carefully selecting basis vectors
to form Jordan chains, as outlined in Section 4.1.
5. Combining Bases: When the Jordan bases for each Kλi are combined, they form a Jordan basis for
the entire space Fn . In this basis, the matrix representation of A becomes the Jordan Canonical Form
J = P −1 AP , where P ’s columns are the Jordan basis vectors.
Uniqueness (Proof Sketch): The uniqueness part relies on showing that the number and sizes of the Jordan
blocks for each eigenvalue λ are uniquely determined by the properties of the matrix A, specifically by the
dimensions of the null spaces of the powers of (A − λI).
1. Similarity Invariants: If J = P −1 AP , then J − λI = P −1 (A − λI)P . This means J − λI and A − λI
are similar.
2. Powers are Similar: Consequently, (J − λI)k = P −1 (A − λI)k P , so (J − λI)k and (A − λI)k are also
similar for any k ≥ 1.
3. Rank/Nullity Invariance: Similar matrices have the same rank and the same nullity. Therefore,
nullity((J − λI)k ) = nullity((A − λI)k ) for all k ≥ 1.
4. Nullity Determines Block Structure: The nullity of (J − λI)k depends directly on the number and
sizes of the Jordan blocks in J corresponding to the eigenvalue λ. Specifically, nullity((J − λI)k ) is the
sum, over all blocks J_j(λ) associated with λ, of min(k, size of block). Let n_k = nullity((A − λI)^k). As
shown previously, the number of blocks of size exactly m × m is 2n_m − n_{m−1} − n_{m+1}.
5. Unique Determination: Since the values nk = nullity((A − λI)k ) are determined solely by A, and these
values in turn uniquely determine the number of Jordan blocks of each possible size for the eigenvalue λ,
the structure of the Jordan Canonical Form J associated with A must be unique (up to the ordering of
the blocks).
In summary, the existence of the JCF is guaranteed by the ability to decompose the space using generalized
eigenspaces and construct Jordan bases within them. The uniqueness follows because the dimensions of the null
spaces of powers of (A − λI), which are similarity invariants, completely dictate the structure of the Jordan
blocks.
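As a quick numerical illustration of this uniqueness argument, the following sketch (plain Python with NumPy; the matrix and eigenvalue are chosen purely for illustration) computes the nullities n_k = nullity((A − λI)^k) and recovers the number of Jordan blocks of each size from 2n_m − n_{m−1} − n_{m+1}.

import numpy as np

# Illustrative matrix: one 2x2 and one 1x1 Jordan block for eigenvalue 2
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 2.0]])
lam = 2.0
n = A.shape[0]

def nullity(M, tol=1e-9):
    # nullity = dimension minus numerical rank (from singular values)
    return M.shape[0] - np.linalg.matrix_rank(M, tol=tol)

N = A - lam * np.eye(n)
# n_k = nullity((A - lam I)^k) for k = 0, 1, ..., n+1 (n_0 = 0)
nk = [nullity(np.linalg.matrix_power(N, k)) for k in range(n + 2)]

for m in range(1, n + 1):
    blocks = 2 * nk[m] - nk[m - 1] - nk[m + 1]
    if blocks:
        print(f"{blocks} Jordan block(s) of size {m} for eigenvalue {lam}")
# Expected: one block of size 1 and one block of size 2.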
directions where the transformation involves both scaling (by λ) and a "shear-like" component represented by
the super-diagonal 1s.
Since N is nilpotent (N^m = 0), the sum only goes up to j = m − 1 (or k, if k < m − 1). The powers N^j are
known matrices with 1s moving up and to the right: N^0 = I; N^1 = N (1s on the 1st super-diagonal); N^2 (1s on
the 2nd super-diagonal); . . . ; N^{m−1} (a single 1 in the top-right corner); and N^j = 0 for j ≥ m.
This provides an explicit formula for (Ji )k and thus for J k and Ak .
Similarly, the matrix exponential eAt = P eJt P −1 can be computed. eJt is block diagonal, and for each block
Ji = λI + N :
e^{J_i t} = e^{(λI+N)t} = e^{λIt} e^{Nt}   (since λI and N commute)
          = e^{λt} I · ∑_{j=0}^{∞} (Nt)^j / j!
          = e^{λt} · ∑_{j=0}^{m−1} N^j t^j / j!   (since N^m = 0)
This is crucial for solving systems of linear differential equations ẋ = Ax, where the solution is x(t) = eAt x(0).
The JCF allows computing eAt even when A is not diagonalizable.
Stability Analysis
In systems theory (e.g., ẋ = Ax), the stability of the system is determined by the eigenvalues of A. For
asymptotic stability, all eigenvalues must have negative real parts. The JCF helps understand the behavior
near marginal stability (eigenvalues on the imaginary axis).
If an eigenvalue λ on the imaginary axis has GM(λ) < AM(λ), meaning there is a Jordan block of size > 1 for
λ, the corresponding solutions will involve terms like teλt , t2 eλt , etc. (arising from the powers of t in eJi t ). Even
if Re(λ) = 0, these polynomial terms in t cause the solution magnitude to grow over time, leading to instability.
Therefore, for an LTI system ẋ = Ax to be stable (in the sense of Lyapunov), all eigenvalues must have Re(λ)
≤ 0, AND for any eigenvalue with Re(λ) = 0, its geometric multiplicity must equal its algebraic multiplicity
(i.e., all Jordan blocks for purely imaginary eigenvalues must be 1 × 1).
In conclusion, the Jordan Canonical Form provides the necessary tool to fully understand the structure, compute
functions (like powers and exponentials), and analyze the behavior (like stability) of linear transformations and
systems represented by matrices that are not diagonalizable.
Chapter 5
Matrix Norms and Positive Definite Matrices
In the analysis of linear systems, particularly when dealing with stability, convergence of iterative methods, and
sensitivity analysis, it is essential to have ways to measure the "size" or "magnitude" of vectors and matrices.
Vector norms provide a way to quantify the length or magnitude of vectors, while matrix norms extend this
concept to matrices, measuring how much a matrix can amplify the norm of a vector.
This chapter also introduces positive definite and semidefinite matrices, which play a crucial role in stability
analysis (especially Lyapunov stability), optimization, and various areas of engineering and physics.
1. The 2-norm (or Euclidean norm): This norm measures the usual Euclidean length of a vector.

||x||_2 = ( ∑_{i=1}^{n} |x_i|^2 )^{1/2} = √( |x_1|^2 + |x_2|^2 + · · · + |x_n|^2 )
Proof of Triangle Inequality (Minkowski Inequality for p=2): This relies on the Cauchy-Schwarz inequality:
|xH y| ≤ ||x||2 ||y||2 . Then ||x + y||22 = (x + y)H (x + y) = xH x + y H y + xH y + y H x = ||x||22 + ||y||22 + 2Re(xH y).
Using Cauchy-Schwarz, Re(xH y) ≤ |xH y| ≤ ||x||2 ||y||2 . So, ||x + y||22 ≤ ||x||22 + ||y||22 + 2||x||2 ||y||2 = (||x||2 +
||y||2 )2 . Taking the square root gives ||x + y||2 ≤ ||x||2 + ||y||2 .
2. The 1-norm (or Manhattan/Taxicab norm): This norm measures the sum of the absolute values of
the components.
||x||_1 = ∑_{i=1}^{n} |x_i| = |x_1| + |x_2| + · · · + |x_n|
It represents the distance traveled if movement is restricted to grid lines (like streets in Manhattan).
Proof of Triangle Inequality: ||x + y||_1 = ∑ |x_i + y_i|. By the triangle inequality for scalars, |x_i + y_i| ≤ |x_i| + |y_i|.
Therefore, ||x + y||_1 ≤ ∑ (|x_i| + |y_i|) = ∑ |x_i| + ∑ |y_i| = ||x||_1 + ||y||_1.
3. The Infinity-norm (or Maximum/Supremum norm): This norm measures the maximum absolute
value among the components.
||x||∞ = max{|x1 |, |x2 |, . . . , |xn |}
Proof of Triangle Inequality: ||x+y||∞ = maxi |xi +yi |. For any i, |xi +yi | ≤ |xi |+|yi |. Since |xi | ≤ maxj |xj | =
||x||∞ and |yi | ≤ maxj |yj | = ||y||∞ , we have |xi + yi | ≤ ||x||∞ + ||y||∞ . Since this holds for all i, the maximum
value must also satisfy this inequality: maxi |xi + yi | ≤ ||x||∞ + ||y||∞ . Thus, ||x + y||∞ ≤ ||x||∞ + ||y||∞ .
4. The p-norm (Hölder norm): This is a generalization that includes the 1-norm, 2-norm, and ∞-norm as
special cases. For any real number p ≥ 1:
||x||_p = ( ∑_{i=1}^{n} |x_i|^p )^{1/p}
Equivalence of Norms
An important property in finite-dimensional vector spaces like Fn is that all norms are equivalent. This means
that for any two norms || · ||a and || · ||b on Fn , there exist positive constants c1 and c2 such that for all vectors
x ∈ Fn :
c1 ||x||b ≤ ||x||a ≤ c2 ||x||b
This equivalence implies that concepts like convergence and boundedness are independent of the specific norm
chosen in a finite-dimensional space. If a sequence of vectors converges to a limit using one norm, it converges
to the same limit using any other norm.
Examples of Equivalence Inequalities in Fn :
• ||x||_∞ ≤ ||x||_2 ≤ √n ||x||_∞
• ||x||_∞ ≤ ||x||_1 ≤ n ||x||_∞
• (1/√n) ||x||_1 ≤ ||x||_2 ≤ ||x||_1
These relationships can be derived using inequalities like Cauchy-Schwarz or by considering the definitions
directly.
Vector norms provide the foundation for defining matrix norms and are essential tools for analyzing the mag-
nitude and convergence properties of vectors in linear algebra and systems theory.
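As a quick numerical check of the norms and equivalence inequalities above, the following sketch uses NumPy (the test vector is arbitrary):

import numpy as np

x = np.array([3.0, -4.0, 1.0, 2.0])
n = x.size

norm1 = np.linalg.norm(x, 1)          # sum of absolute values
norm2 = np.linalg.norm(x, 2)          # Euclidean norm
norm_inf = np.linalg.norm(x, np.inf)  # maximum absolute value
print(norm1, norm2, norm_inf)

# Equivalence inequalities from the text
assert norm_inf <= norm2 <= np.sqrt(n) * norm_inf
assert norm_inf <= norm1 <= n * norm_inf
assert norm1 / np.sqrt(n) <= norm2 <= norm1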
This definition measures the largest possible "amplification" or "gain" of the linear transformation represented
by A, relative to the chosen vector norms.
Properties of Induced Norms:
• They satisfy the four basic matrix norm properties (non-negativity, positive definiteness, homogeneity,
triangle inequality).
• They satisfy the consistency condition with the underlying vector norms: ||Ax||b ≤ ||A||a,b ||x||a .
• If the same vector norm is used for both the domain and codomain (|| · ||a = || · ||b ) for square matrices,
the induced norm is submultiplicative: ||AB|| ≤ ||A||||B||.
Common Induced Norms (usually assuming the same vector norm for domain and codomain):
Let A be an m × n matrix.
1. The Matrix 1-norm (induced by vector 1-norm): This is the maximum absolute column sum.
||A||_1 = max_{1≤j≤n} ∑_{i=1}^{m} |a_{ij}|
2. The Matrix ∞-norm (induced by vector ∞-norm): This is the maximum absolute row sum,
||A||_∞ = max_{1≤i≤m} ∑_{j=1}^{n} |a_{ij}|
Writing R_k = ∑_j |a_{kj}| for the k-th absolute row sum, one has ||Ax||_∞ ≤ max_k R_k whenever ||x||_∞ = 1, so
||A||_∞ ≤ max_k R_k. It can be shown this maximum is achieved by choosing an x with ||x||_∞ = 1 where
x_j = sign(a_{kj}) for the row k with the maximum absolute row sum.
3. The Matrix 2-norm (or Spectral Norm, induced by vector 2-norm): This norm is related to the
singular values of the matrix:
||A||_2 = σ_max(A),
the largest singular value of A.
Note: If A is a normal matrix (AH A = AAH ), then ||A||2 = max |λi (A)| (the spectral radius). If A is Hermitian
(AH = A) or symmetric real (AT = A), then ||A||2 = max |λi (A)|.
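A short numerical sketch of these three induced norms using NumPy (the matrix is arbitrary); numpy.linalg.norm with ord = 1, 2, and inf returns the induced 1-, 2-, and ∞-norms for a matrix argument:

import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

n1 = np.linalg.norm(A, 1)
n2 = np.linalg.norm(A, 2)
ninf = np.linalg.norm(A, np.inf)

# Compare with the characterizations given in the text
print(n1, np.abs(A).sum(axis=0).max())                 # maximum absolute column sum
print(ninf, np.abs(A).sum(axis=1).max())               # maximum absolute row sum
print(n2, np.linalg.svd(A, compute_uv=False).max())    # largest singular value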
Definitions
Let A be an n × n real symmetric matrix (A = AT ).
1. Positive Definite (PD): The matrix A is called positive definite if the quadratic form xT Ax is strictly
positive for all non-zero vectors x ∈ Rn .
Notation: A > 0.
2. Positive Semidefinite (PSD): The matrix A is called positive semidefinite if the quadratic form
xT Ax is non-negative for all vectors x ∈ Rn .
Notation: A ≥ 0.
Similarly, we can define negative definite (A < 0 if xT Ax < 0 for x ̸= 0) and negative semidefinite (A ≤ 0
if xT Ax ≤ 0 for all x). A symmetric matrix that is neither positive semidefinite nor negative semidefinite is
called indefinite (meaning the quadratic form xT Ax can take both positive and negative values).
Note: Non-symmetric matrices are generally not classified as positive definite or semidefinite using this def-
inition, although the quadratic form xT Ax can still be evaluated. However, xT Ax = xT ((A + AT )/2)x, so
the quadratic form only depends on the symmetric part of A, (A + AT )/2. Therefore, discussions of positive
definiteness typically focus on symmetric matrices.
• If A is PSD, then all its diagonal entries are non-negative (aii ≥ 0).
Proof: Consider x = e_i (the i-th standard basis vector). Then x^T A x = e_i^T A e_i = a_{ii}. If A is PD, a_{ii} > 0.
If A is PSD, a_{ii} ≥ 0.
4. Principal Submatrices:
• If A is PD, then all its principal submatrices are also PD. (A principal submatrix is obtained by
deleting the same set of rows and columns).
• If A is PSD, then all its principal submatrices are also PSD.
Proof Sketch: Let AK be a principal submatrix corresponding to index set K. Let y be a non-zero vector
of the size of AK . Construct x by setting xi = yi if i ∈ K and xi = 0 otherwise. Then y T AK y = xT Ax.
Since x ̸= 0 if y ̸= 0, the sign of y T AK y follows the sign of xT Ax.
5. Invertibility:
• A PD matrix is always invertible (since all eigenvalues are non-zero, det(A) ̸= 0).
• A PSD matrix may or may not be invertible (it is invertible if and only if all eigenvalues are strictly
positive, i.e., if it is actually PD).
6. Inverse: If A is PD, then its inverse A−1 is also PD.
Proof. The eigenvalues of A−1 are 1/λi (A). If λi (A) > 0, then 1/λi (A) > 0. Since A−1 is also symmetric,
it is PD.
7. Sum: If A and B are n × n matrices of the same type (both PD or both PSD), then A + B is also of that
type.
Proof (PD case). xT (A + B)x = xT Ax + xT Bx. If x ̸= 0, then xT Ax > 0 and xT Bx > 0, so their sum
is > 0.
8. Congruence Transformation: If A is PD (or PSD) and C is an invertible n × n matrix, then the
congruent matrix C T AC is also PD (or PSD).
Proof (PD case). Consider y T (C T AC)y = (Cy)T A(Cy). Let x = Cy. Since C is invertible and y ̸= 0,
then x ̸= 0. Therefore, (Cy)T A(Cy) = xT Ax > 0 (since A is PD). Thus C T AC is PD.
Note: If C is merely non-square (m × n, m ̸= n) but has full column rank, and A (m × m) is PD, then
C T AC (n × n) is PD. If A is PSD, C T AC is PSD.
9. Matrix Square Root: If A is PSD, there exists a unique PSD matrix B such that B 2 = A. This matrix
B is called the principal square root of A, often denoted A1/2 .
Examples
• A = [ 2 1 ; 1 2 ]. Eigenvalues are 1, 3 (both > 0). A is PD.
  x^T A x = [x_1, x_2] [ 2 1 ; 1 2 ] [x_1 ; x_2] = 2x_1² + 2x_1x_2 + 2x_2² = x_1² + x_2² + (x_1 + x_2)² > 0 for (x_1, x_2) ≠ (0, 0).
• B = [ 1 1 ; 1 1 ]. Eigenvalues are 0, 2 (both ≥ 0). B is PSD (but not PD).
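These two examples can be checked numerically. The sketch below (NumPy) tests definiteness via the eigenvalues of the symmetric matrix and via the Cholesky factorization, which exists exactly when the matrix is (numerically) positive definite:

import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
B = np.array([[1.0, 1.0], [1.0, 1.0]])

for name, M in [("A", A), ("B", B)]:
    eigs = np.linalg.eigvalsh(M)          # real eigenvalues of a symmetric matrix
    if np.all(eigs > 0):
        print(name, eigs, "-> positive definite")
    elif np.all(eigs >= -1e-12):
        print(name, eigs, "-> positive semidefinite (but not PD)")

np.linalg.cholesky(A)                     # succeeds, since A is PD
try:
    np.linalg.cholesky(B)                 # raises LinAlgError, since B is only PSD
except np.linalg.LinAlgError as err:
    print("Cholesky failed for B:", err)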
Let A = [a_{ij}]. The k-th leading principal minor Δ_k is the determinant of the upper-left k × k submatrix of A:
• Δ_1 = det([a_{11}]) = a_{11}
• Δ_2 = det [ a_{11} a_{12} ; a_{21} a_{22} ]
• Δ_3 = det [ a_{11} a_{12} a_{13} ; a_{21} a_{22} a_{23} ; a_{31} a_{32} a_{33} ]
• ...
• Δ_n = det(A)
Sylvester’s Criterion
Theorem 5.4.1 (Sylvester’s Criterion). An n × n real symmetric matrix A is positive definite if and only if all
of its leading principal minors are strictly positive.
Proof Sketch (by induction on n):
• Base Case (n=1): A = [a_{11}]. Δ_1 = a_{11}. If Δ_1 > 0, then a_{11} > 0. The quadratic form is x^T Ax = a_{11} x_1².
Since a11 > 0, xT Ax > 0 for x1 ̸= 0. So A is PD.
• Inductive Step: Assume the criterion holds for (n − 1) × (n − 1) matrices. Let A be n × n with ∆k > 0
for k = 1, . . . , n. Let An−1 be the upper-left (n − 1) × (n − 1) submatrix. Its leading principal minors are
∆1 , . . . , ∆n−1 , which are all positive by assumption. By the inductive hypothesis, An−1 is PD. The proof
then often proceeds by considering the block decomposition of A and using properties of Schur complements
or LU/Cholesky decomposition, showing that the positivity of all ∆k implies that all eigenvalues of A must
be positive. A key step involves showing that if An−1 is PD and det(A) > 0, it doesn’t necessarily mean
A is PD, but the condition on all leading principal minors ensures it. Alternatively, one can show that
the conditions ∆k > 0 allow for a Cholesky decomposition A = LLT where L is a lower triangular matrix
with positive diagonal entries. Then xT Ax = xT LLT x = (LT x)T (LT x) = ||LT x||22 . Since L is invertible
(because its diagonal entries are positive), LT x ̸= 0 if x ̸= 0. Therefore, xT Ax > 0 for x ̸= 0, proving A
is PD.
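A minimal sketch of Sylvester's criterion in NumPy follows; the determinant-based test mirrors the statement of the theorem, although in practice a Cholesky factorization (as in the previous example) is the numerically preferred PD test:

import numpy as np

def is_pd_sylvester(A, tol=1e-12):
    """Positive definiteness of a symmetric matrix via Sylvester's criterion:
    all leading principal minors must be strictly positive."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    return all(np.linalg.det(A[:k, :k]) > tol for k in range(1, n + 1))

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # leading minors 2, 3  -> PD
B = np.array([[1.0, 1.0], [1.0, 1.0]])   # leading minors 1, 0  -> not PD (only PSD)
print(is_pd_sylvester(A), is_pd_sylvester(B))   # True False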
Chapter 6
The Matrix Exponential
The matrix exponential, denoted eAt , is a fundamental concept in the theory of linear time-invariant (LTI)
systems and differential equations. It generalizes the scalar exponential function eat to matrices and provides
the solution to homogeneous linear systems of differential equations ẋ = Ax.
Just as the scalar exponential function is defined by the convergent power series

e^z = 1 + z + z²/2! + z³/3! + · · · = ∑_{k=0}^{∞} z^k / k!,

the matrix exponential e^M for a square matrix M is defined by substituting the matrix M into this power series:
Definition 6.1.1. For an n × n matrix M (real or complex), the matrix exponential eM is defined by the
infinite series:
e^M = I + M + M²/2! + M³/3! + · · · = ∑_{k=0}^{∞} M^k / k!

In particular, for M = At (A an n × n matrix and t a scalar):

e^{At} = I + (At) + (At)²/2! + (At)³/3! + · · · = ∑_{k=0}^{∞} (At)^k / k!

Since the scalar t commutes with A, this can equivalently be written as

e^{At} = I + tA + (t²/2!)A² + (t³/3!)A³ + · · · = ∑_{k=0}^{∞} (t^k/k!) A^k
For example, let A = diag(λ, µ) be a 2 × 2 diagonal matrix. Since A^k = diag(λ^k, µ^k),

e^{At} = ∑_{k=0}^{∞} (t^k/k!) A^k
       = ∑_{k=0}^{∞} (t^k/k!) diag(λ^k, µ^k)
       = ∑_{k=0}^{∞} diag( (λt)^k/k!, (µt)^k/k! )
       = diag( ∑_{k=0}^{∞} (λt)^k/k!, ∑_{k=0}^{∞} (µt)^k/k! )
       = diag( e^{λt}, e^{µt} )
This shows that for a diagonal matrix, the matrix exponential is simply the diagonal matrix of the scalar
exponentials of the diagonal entries (multiplied by t).
Example 6.1.2 (Nilpotent Matrix). Let N = [ 0 1 ; 0 0 ]. Then N² = [ 0 0 ; 0 0 ] = 0, so

e^{Nt} = I + tN + (t²/2!)N² + (t³/3!)N³ + . . .
       = I + tN + 0 + 0 + . . .
       = [ 1 0 ; 0 1 ] + t [ 0 1 ; 0 0 ]
       = [ 1 t ; 0 1 ]
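Both small examples can be verified numerically with scipy.linalg.expm (a sketch; the value of t is arbitrary):

import numpy as np
from scipy.linalg import expm

t = 0.7

# Diagonal case: e^{At} = diag(e^{lambda t}, e^{mu t})
A = np.diag([2.0, -1.0])
print(expm(A * t))
print(np.diag(np.exp(np.diag(A) * t)))   # same result

# Nilpotent case: e^{Nt} = I + tN = [[1, t], [0, 1]]
N = np.array([[0.0, 1.0], [0.0, 0.0]])
print(expm(N * t))
print(np.array([[1.0, t], [0.0, 1.0]]))  # same result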
The power series definition is fundamental, but its convergence must be established to ensure it is well-defined
for any square matrix A and any scalar t. This will be addressed in the next section.
For this definition to be meaningful, we must ensure that this series converges for any n × n matrix A and any
scalar t. Convergence here means that each element (i, j) of the partial sum matrices converges to a finite limit
as the number of terms goes to infinity.
We can prove convergence using matrix norms. Recall from Section 5.2 that for any matrix norm || · || that is
submultiplicative (like induced norms or the Frobenius norm), we have ||M k || ≤ ||M ||k .
Let’s consider the norm of each term in the series for eAt :
|| (t^k/k!) A^k || = (|t^k|/k!) ||A^k|| = (|t|^k/k!) ||A^k||
Using the submultiplicative property:
|| (t^k/k!) A^k || ≤ (|t|^k/k!) ||A||^k = (||A|| |t|)^k / k!
Now, consider the infinite series of the norms of the terms:
∑_{k=0}^{∞} || (t^k/k!) A^k || ≤ ∑_{k=0}^{∞} (||A|| |t|)^k / k!
Let z = ||A|||t|. This is a non-negative real number. The series on the right becomes:
∑_{k=0}^{∞} z^k / k!
This is exactly the power series expansion for the scalar exponential function ez . We know from calculus that
the power series for ez converges absolutely for all real (or complex) numbers z.
Therefore, the series ∑_{k=0}^{∞} (||A|| |t|)^k / k! converges to e^{||A|| |t|}.
Since the series of norms ∑ ||(t^k/k!)A^k|| converges (it is bounded above by the convergent series for e^{||A|| |t|}), the
matrix series ∑ (t^k/k!)A^k converges absolutely (element-wise). Absolute convergence implies convergence in
finite-dimensional spaces.
Formal Argument (Convergence of Matrix Series): A series of matrices ∑ M_k converges if the sequence
of partial sums S_m = ∑_{k=0}^{m} M_k converges element-wise. That is, for each entry (i, j), the sequence (S_m)_{ij}
converges to a limit S_{ij}. A sufficient condition for convergence is absolute convergence with respect to some
matrix norm, meaning ∑ ||M_k|| converges.
In our case, M_k = (t^k/k!)A^k. We showed that ∑ ||M_k|| ≤ e^{||A|| |t|}, which is finite. By the Weierstrass M-test
adapted for matrix series (or simply by noting that convergence of the norm series implies element-wise absolute
convergence due to norm equivalence), the matrix series ∑ (t^k/k!)A^k converges for all n × n matrices A and all
scalars t.
Conclusion: The power series definition of the matrix exponential eAt is well-defined because the series con-
verges absolutely for all square matrices A and all scalars t.
6.3 Properties of e^{At} (e.g., d/dt e^{At} = Ae^{At}, e^0 = I, e^{A(t+s)} = e^{At}e^{As}, inverse)
Now that we have established the definition and convergence of the matrix exponential eAt , we can explore its
fundamental properties. Many of these properties are direct analogues of the properties of the scalar exponential
function eat .
Let A and B be n × n matrices, and let t, s be scalars.
1. Value at t=0:
eA·0 = e0 = I (the n × n identity matrix)
Proof. Substitute t = 0 into the power series definition:
e^{A·0} = ∑_{k=0}^{∞} (A·0)^k / k! = (A·0)^0/0! + (A·0)^1/1! + (A·0)^2/2! + · · · = I + 0 + 0 + · · · = I
2. Derivative: d/dt e^{At} = A e^{At} = e^{At} A (obtained by differentiating the power series term by term; A
commutes with every power of itself).
A further property, similarity invariance, states that if A = P M P^{−1} with P invertible, then e^{At} = P e^{Mt} P^{−1}.
This property is extremely useful for computing e^{At} when A is diagonalizable (M = Λ) or can be trans-
formed to Jordan form (M = J).
8. Transpose / Conjugate Transpose:
(e^{At})^T = e^{A^T t}
(e^{At})^H = e^{A^H t}
Proof (Transpose case). Using the property that (M k )T = (M T )k and that transpose distributes over
sums:
(e^{At})^T = ( ∑_{k=0}^{∞} (t^k/k!) A^k )^T
           = ∑_{k=0}^{∞} ( (t^k/k!) A^k )^T
           = ∑_{k=0}^{∞} (t^k/k!) (A^k)^T
           = ∑_{k=0}^{∞} (t^k/k!) (A^T)^k
           = e^{A^T t}
9. Block Diagonal Structure: Suppose A is block diagonal, A = diag(A_1, A_2, . . . , A_p), with each A_i square.
Then e^{At} is also block diagonal with the exponentials of the blocks:
e^{At} = diag( e^{A_1 t}, e^{A_2 t}, . . . , e^{A_p t} )
Proof. This follows from the fact that powers Ak are block diagonal with blocks (Ai )k , and the power
series definition.
These properties are essential for manipulating and computing the matrix exponential, and they form the basis
for solving linear systems of differential equations.
Proof. We need to show two things: first, that the proposed solution x(t) = eA(t−t0 ) x0 actually satisfies both
the differential equation and the initial condition, and second, that this solution is unique.
1. Verification of the Solution:
• Initial Condition: Let’s check the value of the proposed solution at t = t0 .
x(t0 ) = eA(t0 −t0 ) x0 = eA·0 x0
Using Property 1 from Section 6.3 (e0 = I):
x(t0 ) = Ix0 = x0
The initial condition is satisfied.
• Differential Equation: Let’s differentiate the proposed solution x(t) = eA(t−t0 ) x0 with respect to t.
Let τ = t − t0 . Then dτ /dt = 1. Using the chain rule and Property 2 from Section 6.3 (d/dτ eAτ = AeAτ ):
ẋ(t) = d/dt [ e^{A(t−t_0)} x_0 ]
     = ( d/dt e^{A(t−t_0)} ) x_0     (since x_0 is a constant vector)
     = ( d/dτ e^{Aτ} · dτ/dt ) x_0
     = [ A e^{Aτ} · 1 ] x_0
     = A e^{A(t−t_0)} x_0
Now, substitute the proposed solution x(t) = eA(t−t0 ) x0 back into the expression:
ẋ(t) = A[eA(t−t0 ) x0 ]
= Ax(t)
The differential equation ẋ = Ax is satisfied.
Since the proposed solution satisfies both the differential equation and the initial condition, it is indeed a solution
to the initial value problem.
2. Uniqueness of the Solution: Suppose there are two solutions, x1 (t) and x2 (t), that both satisfy ẋ = Ax
and x(t0 ) = x0 . Let y(t) = x1 (t) − x2 (t).
• Derivative of y(t):
ẏ(t) = d/dt (x_1(t) − x_2(t)) = ẋ_1(t) − ẋ_2(t)
Since both x1 and x2 satisfy ẋ = Ax, we have ẋ1 (t) = Ax1 (t) and ẋ2 (t) = Ax2 (t).
ẏ(t) = Ax1 (t) − Ax2 (t) = A(x1 (t) − x2 (t)) = Ay(t)
So, y(t) also satisfies the homogeneous differential equation ẏ = Ay.
Now define z(t) = e^{−A(t−t_0)} y(t). Differentiating with the product rule,
ż(t) = ( d/dt e^{−A(t−t_0)} ) y(t) + e^{−A(t−t_0)} ẏ(t)
Using the chain rule and derivative property for the first term (let τ = −(t − t_0), dτ/dt = −1):
d/dt e^{−A(t−t_0)} = d/dτ e^{Aτ} · dτ/dt = (A e^{Aτ}) · (−1) = −A e^{−A(t−t_0)}
Substituting this and ẏ(t) = Ay(t) into the expression for ż(t):
ż(t) = −A e^{−A(t−t_0)} y(t) + e^{−A(t−t_0)} A y(t) = 0
since e^{−A(t−t_0)} commutes with A. Hence z(t) is a constant vector C. Evaluating at t = t_0 gives
C = z(t_0) = I · y(t_0) = x_1(t_0) − x_2(t_0) = x_0 − x_0 = 0.
So, the constant vector C is the zero vector. This means z(t) = 0 for all t.
Since e^{−A(t−t_0)} is always invertible (Property 6, Section 6.3), we can multiply z(t) = e^{−A(t−t_0)} y(t) = 0 on the
left by its inverse e^{A(t−t_0)}:
I y(t) = 0
so y(t) = 0 for all t, i.e., x_1(t) = x_2(t). The solution is therefore unique.
Conclusion: The matrix exponential eA(t−t0 ) provides the unique solution x(t) = eA(t−t0 ) x0 to the funda-
mental initial value problem ẋ = Ax, x(t0 ) = x0 . This result is central to the analysis and solution of linear
time-invariant systems.
6.5 Methods for Computing e^{At}
One systematic method uses the Laplace transform. Recall that the matrix exponential satisfies the matrix
differential equation
d/dt e^{At} = A e^{At}
with the initial condition eA·0 = I.
Let X(t) = eAt . Then the equation is dX/dt = AX, with X(0) = I. Take the Laplace transform of both sides
of the differential equation:
L{dX/dt} = L{AX}
Using the derivative property on the left side and linearity on the right side (A is constant):
sX̃(s) − I = AX̃(s)
Rearranging gives (sI − A)X̃(s) = I, so X̃(s) = (sI − A)^{−1}. Since X(t) = e^{At}, we have found the Laplace transform of the matrix exponential:
L{e^{At}} = (sI − A)^{−1},   equivalently   e^{At} = L^{−1}{(sI − A)^{−1}}
Derivation Summary:
1. Start with the defining differential equation dX/dt = AX, X(0) = I, where X(t) = eAt .
2. Take the Laplace transform: sX̃(s) − X(0) = AX̃(s).
3. Substitute X(0) = I: sX̃(s) − I = AX̃(s).
4. Rearrange: (sI − A)X̃(s) = I.
5. Solve for X̃(s): X̃(s) = (sI − A)−1 .
6. Take the inverse Laplace transform: X(t) = eAt = L−1 {(sI − A)−1 }.
Computation: This method requires computing the inverse of the matrix (sI − A), which involves symbolic
manipulation with the variable s (often using Cramer’s rule or adjugate matrix formula: (sI − A)−1 = adj(sI −
A)/ det(sI − A)), and then finding the inverse Laplace transform of each element of the resulting matrix of
rational functions in s. This can be computationally intensive but provides a closed-form expression.
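A sketch of this method in SymPy follows (the 2 × 2 matrix A is an illustrative choice; SymPy's inverse Laplace transform may return Heaviside(t) factors, which are harmless for t ≥ 0):

import sympy as sp

t, s = sp.symbols('t s', positive=True)
A = sp.Matrix([[0, 1], [-2, -3]])            # illustrative matrix, eigenvalues -1 and -2
n = A.shape[0]

# Resolvent (sI - A)^{-1}: a matrix of rational functions in s
resolvent = (s * sp.eye(n) - A).inv()

# Entrywise inverse Laplace transform gives e^{At}
eAt = resolvent.applyfunc(lambda entry: sp.inverse_laplace_transform(entry, s, t))
print(sp.simplify(eAt))

# Cross-check against SymPy's direct matrix exponential
print(sp.simplify((A * t).exp()))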
Diagonalization Method (when A has n linearly independent eigenvectors).
Derivation Summary:
1. Assume A is diagonalizable: A = V ΛV −1 .
2. Apply the similarity property of matrix exponentials: eAt = V eΛt V −1 .
3. Compute eΛt for the diagonal matrix Λ: eΛt = diag(eλ1 t , . . . , eλn t ).
4. Substitute back: eAt = V diag(eλ1 t , . . . , eλn t )V −1 .
Computation:
1. Find the eigenvalues λi and corresponding eigenvectors vi of A.
2. Check if A is diagonalizable (n linearly independent eigenvectors).
3. Form V = [v1 | . . . |vn ] and Λ = diag(λ1 , . . . , λn ).
4. Compute V −1 .
5. Form eΛt = diag(eλ1 t , . . . , eλn t ).
6. Calculate the product eAt = V eΛt V −1 .
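A minimal numerical sketch of the computation just listed (NumPy/SciPy; the matrix and time are illustrative), cross-checked against scipy.linalg.expm:

import numpy as np
from scipy.linalg import expm

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # symmetric, hence diagonalizable
t = 0.5

lam, V = np.linalg.eig(A)                # eigenvalues and eigenvector matrix V
eAt = V @ np.diag(np.exp(lam * t)) @ np.linalg.inv(V)

print(eAt)
print(expm(A * t))                       # agrees with the eigendecomposition result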
Jordan Form Method (for matrices that are not diagonalizable).
Derivation Summary:
1. Find the Jordan form J and Jordan basis matrix P such that A = P JP −1 .
2. Apply the similarity property: eAt = P eJt P −1 .
3. Compute e^{Jt} by computing the exponential of each Jordan block J_i = λI + N (of size m) using the formula
e^{J_i t} = e^{λt} ∑_{j=0}^{m−1} (N^j t^j / j!).
4. Calculate the product e^{At} = P e^{Jt} P^{−1}.
Polynomial (Cayley–Hamilton) Method. By the Cayley–Hamilton theorem, every power A^k with k ≥ n can
be reduced to a polynomial in A of degree at most n − 1, so the matrix exponential can be written as
e^{At} = α_0(t)I + α_1(t)A + · · · + α_{n−1}(t)A^{n−1}
The coefficients α_i(t) can be found by considering the eigenvalues. If we substitute an eigenvalue λ_i into the
characteristic polynomial, we get p(λ_i) = 0. It can be shown that the scalar exponential e^{λ_i t} must satisfy the
same relationship with the coefficients α_j(t):
e^{λ_i t} = α_0(t) + α_1(t)λ_i + α_2(t)λ_i² + · · · + α_{n−1}(t)λ_i^{n−1}
If the matrix A has n distinct eigenvalues λ1 , . . . , λn , this gives a system of n linear equations for the n unknown
coefficients α0 (t), . . . , αn−1 (t). Solving this system yields the coefficients, which can then be plugged back into
the expression for eAt .
If there are repeated eigenvalues, we need additional equations obtained by differentiating the equation with
respect to λ and evaluating at the repeated eigenvalue. For an eigenvalue λi with multiplicity m, we use the
equations obtained from:
d^k/dλ^k [ e^{λt} ] |_{λ=λ_i} = d^k/dλ^k [ ∑_j α_j(t) λ^j ] |_{λ=λ_i}   for k = 0, 1, . . . , m − 1.
Computation:
1. Find the characteristic polynomial p(λ) of A.
2. Find the eigenvalues λi (roots of p(λ) = 0).
3. Set up the system of n equations relating eλi t (and possibly its derivatives w.r.t. λ) to the polynomial in
λi with unknown coefficients α0 (t), . . . , αn−1 (t).
4. Solve the system for α0 (t), . . . , αn−1 (t).
5. Compute eAt = α0 (t)I + α1 (t)A + · · · + αn−1 (t)An−1 .
Example 6.5.1. A = [ 2 1 ; 1 2 ]. Eigenvalues λ_1 = 1, λ_2 = 3. n = 2. Assume e^{At} = α_0(t)I + α_1(t)A. Equations
from the eigenvalues:
e^t = α_0(t) + α_1(t)
e^{3t} = α_0(t) + 3α_1(t)
Subtracting the first from the second: e3t − et = 2α1 (t) =⇒ α1 (t) = (e3t − et )/2. Substituting α1 into the first
equation: et = α0 (t) + (e3t − et )/2 =⇒ α0 (t) = et − (e3t − et )/2 = (2et − e3t + et )/2 = (3et − e3t )/2.
Now compute e^{At}:
e^{At} = α_0(t)I + α_1(t)A = (1/2) [ e^t + e^{3t},  e^{3t} − e^t ; e^{3t} − e^t,  e^t + e^{3t} ]
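A quick numerical check of this closed form against scipy.linalg.expm (a sketch; the value of t is arbitrary):

import numpy as np
from scipy.linalg import expm

A = np.array([[2.0, 1.0], [1.0, 2.0]])
t = 0.3

a0 = (3 * np.exp(t) - np.exp(3 * t)) / 2      # alpha_0(t) from the example
a1 = (np.exp(3 * t) - np.exp(t)) / 2          # alpha_1(t) from the example
eAt_poly = a0 * np.eye(2) + a1 * A            # alpha_0 I + alpha_1 A

print(eAt_poly)
print(expm(A * t))                            # matches the polynomial expression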
Similarly, any function f(A) that can be expressed as a power series (like e^{At}) can be written using the eigen-
values (here v_i and w_i denote the right and left eigenvectors of A, normalized so that w_i^T v_j = δ_{ij}):
f(A) = ∑_{i=1}^{n} f(λ_i) v_i w_i^T
In particular,
e^{At} = ∑_{i=1}^{n} e^{λ_i t} v_i w_i^T
Derivation: Start with the solution x(t) = eAt x(0). Since {v1 , . . . , vn } form a basis, we can write the initial
condition x(0) as a linear combination of eigenvectors:
x(0) = ∑_{j=1}^{n} c_j v_j
Then
x(t) = e^{At} x(0) = ∑_{j=1}^{n} c_j ( e^{At} v_j )
Now, for each eigenvector v_j,
e^{At} v_j = ( ∑_{k=0}^{∞} (t^k/k!) A^k ) v_j
           = ∑_{k=0}^{∞} (t^k/k!) (A^k v_j)
           = ∑_{k=0}^{∞} (t^k/k!) λ_j^k v_j = e^{λ_j t} v_j
Substituting back, and using c_i = w_i^T x(0) (which follows from the normalization w_i^T v_j = δ_{ij}), we obtain
x(t) = ( ∑_{i=1}^{n} e^{λ_i t} v_i w_i^T ) x(0)
Computation:
1. Find eigenvalues λi .
2. Find right eigenvectors vi .
3. Find left eigenvectors wi (eigenvectors of AT ).
4. Normalize vi and wi such that wiT vj = δij .
5. Compute the sum e^{At} = ∑_{i} e^{λ_i t} v_i w_i^T.
This method explicitly shows how the solution is a sum of modes eλi t shaped by the eigenvectors.
Each of these methods provides a way to compute eAt , with varying degrees of computational complexity and
applicability depending on the properties of the matrix A.
Part I
This part transitions from the foundational linear algebra concepts to their application in the analysis and
description of linear dynamical systems. We will focus on the state-space representation, which provides a
powerful and unified framework for modeling, analyzing, and designing control systems.
Chapter 7
State-Space Representation
State-space representation is a mathematical model of a physical system described by a set of input, output, and
state variables related by first-order differential equations (for continuous-time systems) or difference equations
(for discrete-time systems). This approach provides a complete description of the system’s internal dynamics,
unlike the input-output representation (e.g., transfer functions) which only describes the relationship between
the input and output signals.
State Variables
The variables that constitute the state are called state variables. They are typically chosen to represent
quantities that describe the internal energy storage or memory elements of the system. For example:
• In mechanical systems: positions and velocities of masses.
• In electrical circuits: voltages across capacitors and currents through inductors.
• In thermal systems: temperatures at various points.
The choice of state variables for a given system is not unique, but the number of state variables required for a
minimal representation (the dimension of the state) is unique and is called the order of the system.
State Vector
The state variables at a given time t are typically arranged into a column vector called the state vector,
denoted by x(t).
If a system has n state variables, x1 (t), x2 (t), . . . , xn (t), the state vector is:
x(t) = [ x_1(t) ; x_2(t) ; ⋯ ; x_n(t) ]   (an n × 1 column vector)
The state vector x(t) belongs to an n-dimensional vector space called the state space, typically Rn or Cn . The
evolution of the system over time corresponds to a trajectory traced by the state vector x(t) within the state
space.
2. Output Equation: Describes how the output vector y(t) is obtained from the state vector x(t) and the
input vector u(t). It is an algebraic equation:
y(t) = Cx(t) + Du(t)
Where:
• t: Time (continuous variable).
• x(t): The n × 1 state vector (n is the order of the system).
• ẋ(t): The time derivative of the state vector, dx/dt.
• u(t): The m × 1 input vector (m is the number of inputs).
• y(t): The p × 1 output vector (p is the number of outputs).
• A: The n × n state matrix (or system matrix). It describes the internal dynamics of the system (how
the state evolves in the absence of input).
• B: The n × m input matrix (or control matrix). It describes how the inputs affect the state dynamics.
• C: The p × n output matrix (or sensor matrix). It describes how the state variables are combined to
form the outputs.
• D: The p × m feedthrough matrix (or direct transmission matrix). It describes the direct influence of
the inputs on the outputs, bypassing the state dynamics.
Key Characteristics:
• Linearity: The equations are linear combinations of the state and input vectors. This allows the use of
powerful linear algebra tools for analysis and design.
• Time-Invariance: The matrices A, B, C, and D are constant; they do not depend on time t. This implies
that the system’s behavior is consistent regardless of when the inputs are applied.
Interpretation of Matrices
• A Matrix (State Matrix): The term Ax(t) in the state equation governs the system’s natural response
(how the state changes if u(t) = 0). The eigenvalues of A determine the stability and modes of the
unforced system.
• B Matrix (Input Matrix): The term Bu(t) shows how the external inputs u(t) drive the state variables.
If an element Bij is zero, the j-th input has no direct effect on the rate of change of the i-th state variable.
• C Matrix (Output Matrix): The term Cx(t) determines how the internal state x(t) is observed
through the outputs y(t). If an element Cij is zero, the j-th state variable does not directly contribute to
the i-th output.
• D Matrix (Feedthrough Matrix): The term Du(t) represents a direct path from input to output.
If D = 0 (the zero matrix), the system is called strictly proper. In many physical systems, especially
those with inertia, D is often zero because inputs typically affect the state derivatives first, and the state
then influences the output. Non-zero D implies an instantaneous effect of the input on the output.
The quartet of matrices (A, B, C, D) completely defines the LTI system in state-space form.
Example 7.2.1 (Simple Mass-Spring-Damper). Consider a mass m attached to a wall by a spring (stiffness k)
and a damper (damping coefficient c). An external force u(t) is applied to the mass. Let the position of the
mass from equilibrium be z(t). The equation of motion is: mz̈ + cż + kz = u(t).
To put this in state-space form, we choose state variables. A common choice for second-order mechanical
systems is position and velocity:
• x1 (t) = z(t) (position)
• x2 (t) = ż(t) (velocity)
Now, find the derivatives of the state variables:
• ẋ1 (t) = ż(t) = x2 (t)
• ẋ2 (t) = z̈(t). From the equation of motion: z̈ = (1/m)[u(t) − cż − kz] = (1/m)[u(t) − cx2 (t) − kx1 (t)]
Arrange these into the state equation ẋ = Ax + Bu:
[ ẋ_1 ; ẋ_2 ] = [ 0  1 ; −k/m  −c/m ] [ x_1 ; x_2 ] + [ 0 ; 1/m ] u(t)

So, A = [ 0  1 ; −k/m  −c/m ] and B = [ 0 ; 1/m ].
If we choose the output to be the position, y(t) = z(t) = x_1(t), then the output equation y = Cx + Du is:
y(t) = [ 1  0 ] [ x_1 ; x_2 ] + [0] u(t)
So, C = [ 1  0 ] and D = [0] (a scalar zero in this case). This LTI state-space representation (A, B, C, D) fully
describes the dynamics of the mass-spring-damper system.
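The sketch below builds these matrices for illustrative parameter values and computes the step response of the position using scipy.signal (the numerical values of m, c, k are assumptions for the example):

import numpy as np
from scipy import signal

m, c, k = 1.0, 0.5, 2.0                        # illustrative parameters
A = np.array([[0.0, 1.0], [-k / m, -c / m]])
B = np.array([[0.0], [1.0 / m]])
C = np.array([[1.0, 0.0]])                     # output: position x1 = z
D = np.array([[0.0]])

sys = signal.StateSpace(A, B, C, D)
T, y = signal.step(sys)                        # position response to a unit step force
print(y[-1], 1.0 / k)                          # approaches the static deflection u/k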
2. Output Equation:
y(t) = C(t)x(t) + D(t)u(t)
Where:
• t: Time (continuous variable).
• x(t): The n × 1 state vector.
• ẋ(t): The time derivative of the state vector.
• u(t): The m × 1 input vector.
• y(t): The p × 1 output vector.
• A(t): The n × n time-varying state matrix.
• B(t): The n × m time-varying input matrix.
• C(t): The p × n time-varying output matrix.
• D(t): The p × m time-varying feedthrough matrix.
Key Characteristics:
• Linearity: The equations remain linear in x(t) and u(t) at any given time t.
• Time-Varying: At least one of the matrices A(t), B(t), C(t), or D(t) explicitly depends on time t. If all
matrices are constant, the system reduces to an LTI system.
• Capacitor Equation: The current through the capacitor is i(t) = C dV_C/dt = C ẋ_1(t). In the series loop this
current equals the inductor current x_2(t), so
x_2 = C ẋ_1 ⟹ ẋ_1 = (1/C) x_2
• Inductor Equation: The voltage across the inductor is V_L = L di_L/dt = L ẋ_2.
• KVL Equation: Applying Kirchhoff's Voltage Law around the loop (u = V_R + V_L + V_C) and solving for ẋ_2:
ẋ_2(t) = −(1/L) x_1(t) − (R/L) x_2(t) + (1/L) u(t)
If the output is the current, y(t) = i_L(t) = x_2(t), then C = [ 0  1 ] and D = [0]. If the output is the voltage
across the resistor, y(t) = V_R(t) = R i_L(t) = R x_2(t), then C = [ 0  R ] and D = [0].
Example 7.4.2 (Two-Mass System (LTI)). Consider two masses, m1 and m2 , connected by a spring (k2 ) and
damper (c2 ). Mass m1 is connected to a wall by another spring (k1 ) and damper (c1 ). An external force u(t)
acts on mass m1 .
[Placeholder for Two-Mass System Diagram]
Let z1 (t) and z2 (t) be the displacements of m1 and m2 from their equilibrium positions.
Equations of Motion (Newton’s Second Law):
• For m1 :
m1 z̈1 = −k1 z1 − c1 ż1 + k2 (z2 − z1 ) + c2 (ż2 − ż1 ) + u(t)
m1 z̈1 = −(k1 + k2 )z1 − (c1 + c2 )ż1 + k2 z2 + c2 ż2 + u(t)
• For m2 :
m2 z̈2 = −k2 (z2 − z1 ) − c2 (ż2 − ż1 )
m2 z̈2 = k2 z1 + c2 ż1 − k2 z2 − c2 ż2
Choose state variables x_1 = z_1, x_2 = ż_1, x_3 = z_2, x_4 = ż_2. Then:
• ẋ_1 = ż_1 = x_2
• ẋ_2 = z̈_1 = (1/m_1)[ −(k_1 + k_2)x_1 − (c_1 + c_2)x_2 + k_2 x_3 + c_2 x_4 + u(t) ]
• ẋ_3 = ż_2 = x_4
• ẋ_4 = z̈_2 = (1/m_2)[ k_2 x_1 + c_2 x_2 − k_2 x_3 − c_2 x_4 ]
State-Space Form:
[ ẋ_1 ]   [ 0                 1                0          0         ] [ x_1 ]   [ 0     ]
[ ẋ_2 ] = [ −(k_1+k_2)/m_1   −(c_1+c_2)/m_1    k_2/m_1    c_2/m_1   ] [ x_2 ] + [ 1/m_1 ] u(t)
[ ẋ_3 ]   [ 0                 0                0          1         ] [ x_3 ]   [ 0     ]
[ ẋ_4 ]   [ k_2/m_2           c_2/m_2         −k_2/m_2   −c_2/m_2   ] [ x_4 ]   [ 0     ]
Example 7.4.3 (Simple Pendulum (Nonlinear, then LTV via Linearization)). Consider a simple pendulum of
length L and mass m, with angle θ from the vertical. An input torque τ (t) is applied at the pivot. The equation
of motion (ignoring friction) is:
mL2 θ̈ = −mgL sin(θ) + τ (t)
This is a nonlinear system due to the sin(θ) term.
Linearization around Equilibrium (θ = 0): Assume the pendulum stays close to the stable equilibrium
point θ = 0 (hanging down). For small angles, sin(θ) ≈ θ. The linearized equation becomes:
θ̈ + (g/L) θ ≈ (1/(mL²)) τ(t)
Let u(t) = τ (t). Choose state variables:
• x1 = θ
• x2 = θ̇
State Equations:
• ẋ_1 = θ̇ = x_2
• ẋ_2 = θ̈ ≈ −(g/L) x_1 + (1/(mL²)) u(t)
A = [ 0  1 ; −g/L  0 ],   B = [ 0 ; 1/(mL²) ]
Time-Varying Example (Pendulum on a Moving Cart - LTV): Imagine the pendulum pivot is on a cart
whose horizontal position is given by a known function s(t). The dynamics become more complex, and if we
linearize around a trajectory (e.g., keeping the pendulum vertical while the cart moves), the resulting linearized
system matrices A(t) and B(t) might depend on the trajectory s(t) and its derivatives, making the system LTV.
Deriving this explicitly is more involved but illustrates how time-varying parameters or reference trajectories
can lead to LTV models.
These examples demonstrate the process of selecting state variables and deriving the A, B, C, D matrices (or
A(t), B(t), C(t), D(t)) for physical systems, translating differential equations into the standard state-space form.
7.5 Linearization of Nonlinear Systems around Equilibria (Brief Overview)
Consider a nonlinear system described by
ẋ(t) = f(x(t), u(t), t)
y(t) = h(x(t), u(t), t)
where:
• x(t) is the state vector, u(t) is the input vector, y(t) is the output vector.
Equilibrium Points
An equilibrium point (or operating point) of a nonlinear system is a state xeq such that if the system starts at
xeq with a constant input ueq , it remains at xeq indefinitely. For a time-invariant system (f does not explicitly
depend on t), this means:
ẋ = 0 when x = xeq and u = ueq
Consider small perturbations around the equilibrium: x(t) = x_eq + δx(t) and u(t) = u_eq + δu(t), where δx(t)
and δu(t) are small deviations. The corresponding output will be y(t) = y_eq + δy(t), where y_eq = h(x_eq, u_eq).
Now, substitute these into the state equation and use a first-order Taylor series expansion of f around (xeq , ueq )
(assuming f is sufficiently smooth and time-invariant for simplicity here):
ẋ(t) = d/dt ( x_eq + δx(t) ) = d/dt ( δx(t) )

f(x, u) ≈ f(x_eq, u_eq) + ∂f/∂x |_{(x_eq, u_eq)} (x − x_eq) + ∂f/∂u |_{(x_eq, u_eq)} (u − u_eq)

d/dt ( δx(t) ) ≈ 0 + ∂f/∂x |_{(x_eq, u_eq)} δx(t) + ∂f/∂u |_{(x_eq, u_eq)} δu(t)
This is a linear differential equation in terms of the deviations δx(t) and δu(t). Let:
A = ∂f/∂x |_{(x_eq, u_eq)}   (Jacobian matrix of f w.r.t. x, evaluated at equilibrium)
B = ∂f/∂u |_{(x_eq, u_eq)}   (Jacobian matrix of f w.r.t. u, evaluated at equilibrium)
so that
d/dt ( δx(t) ) = A δx(t) + B δu(t)
Similarly, linearize the output equation y = h(x, u):
h(x, u) ≈ h(x_eq, u_eq) + ∂h/∂x |_{(x_eq, u_eq)} (x − x_eq) + ∂h/∂u |_{(x_eq, u_eq)} (u − u_eq)
y_eq + δy(t) ≈ y_eq + ∂h/∂x |_{(x_eq, u_eq)} δx(t) + ∂h/∂u |_{(x_eq, u_eq)} δu(t)
Let:
C = ∂h/∂x |_{(x_eq, u_eq)}   (Jacobian matrix of h w.r.t. x, evaluated at equilibrium)
D = ∂h/∂u |_{(x_eq, u_eq)}   (Jacobian matrix of h w.r.t. u, evaluated at equilibrium)
The linearized model is therefore
d/dt ( δx ) = A δx + B δu
δy = C δx + D δu
where the constant matrices A, B, C, D are the Jacobians of f and h evaluated at the equilibrium point
(xeq , ueq ).
Jacobian Matrices: Recall that the Jacobian matrix of a vector function f (z) (where f has m components
and z has n components) is the m × n matrix of partial derivatives:
∂f/∂z = [ ∂f_1/∂z_1  ∂f_1/∂z_2  ···  ∂f_1/∂z_n ]
        [ ∂f_2/∂z_1  ∂f_2/∂z_2  ···  ∂f_2/∂z_n ]
        [     ⋮          ⋮       ⋱       ⋮     ]
        [ ∂f_m/∂z_1  ∂f_m/∂z_2  ···  ∂f_m/∂z_n ]
So, A = [∂fi /∂xj ], B = [∂fi /∂uj ], C = [∂hi /∂xj ], D = [∂hi /∂uj ], all evaluated at (xeq , ueq ).
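In practice the Jacobians can also be approximated numerically. The sketch below linearizes the pendulum of Example 7.4.3 around its hanging-down equilibrium using central finite differences (the helper jacobian and the parameter values are illustrative, not part of the original text):

import numpy as np

g, L, m = 9.81, 1.0, 1.0   # illustrative pendulum parameters

def f(x, u):
    """Pendulum dynamics: x = [theta, theta_dot], u = applied torque."""
    theta, theta_dot = x
    return np.array([theta_dot, -(g / L) * np.sin(theta) + u / (m * L**2)])

def jacobian(fun, z0, eps=1e-6):
    """Central-difference Jacobian of fun with respect to z at z0."""
    z0 = np.asarray(z0, dtype=float)
    f0 = fun(z0)
    J = np.zeros((f0.size, z0.size))
    for j in range(z0.size):
        dz = np.zeros_like(z0); dz[j] = eps
        J[:, j] = (fun(z0 + dz) - fun(z0 - dz)) / (2 * eps)
    return J

x_eq, u_eq = np.array([0.0, 0.0]), 0.0                    # hanging-down equilibrium
A = jacobian(lambda x: f(x, u_eq), x_eq)                  # df/dx at the equilibrium
B = jacobian(lambda u: f(x_eq, u[0]), np.array([u_eq]))   # df/du at the equilibrium
print(A)   # approximately [[0, 1], [-g/L, 0]]
print(B)   # approximately [[0], [1/(m L^2)]]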
Chapter 8
Solution of State Equations
Having established the state-space representation for linear systems (both LTI and LTV), the next crucial step
is to determine how the state vector x(t) evolves over time, given an initial state x(t0 ) and the input u(t). This
chapter focuses on finding the solution to the state equation.
We begin with the more general case of LTV systems, introducing the concept of the State Transition Matrix
(STM), and then show how it simplifies to the matrix exponential for LTI systems.
Consider first the homogeneous (zero-input) LTV system
ẋ(t) = A(t)x(t)
The State Transition Matrix (STM) Φ(t, t_0) is defined as the unique n × n matrix function satisfying the matrix
differential equation
d/dt Φ(t, t_0) = A(t)Φ(t, t_0)
with the initial condition:
Φ(t0 , t0 ) = I (the n × n identity matrix)
Existence and Uniqueness: If the matrix A(t) is continuous over an interval, then a unique solution Φ(t, t0 )
to this matrix initial value problem exists for all t and t0 within that interval. Each column of Φ(t, t0 ) represents
the solution x(t) to ẋ = A(t)x starting from an initial condition where x(t0 ) is the corresponding standard
basis vector ei .
Interpretation: The STM Φ(t, t0 ) maps the state vector from time t0 to time t. If the system starts at
state x(t0 ) at time t0 , its state at time t (assuming no input) will be x(t) = Φ(t, t0 )x(t0 ). Φ(t, t0 ) essentially
"transitions" the state from t0 to t.
Note: Finding an analytical expression for Φ(t, t0 ) is generally difficult unless A(t) has special structures (e.g.,
A(t) is constant, diagonal, or commutes with its integral).
8.1.2 Properties: Φ(t, t) = I, Φ(t, τ )Φ(τ, σ) = Φ(t, σ), Φ−1 (t, τ ) = Φ(τ, t) (with Proofs)
The STM has several important properties that follow directly from its definition:
1. Identity Property: Φ(t, t) = I for all t.
Proof. By definition, Φ(t, t0 ) satisfies Φ̇ = A(t)Φ with Φ(t0 , t0 ) = I. If we set t0 = t, the initial condition
becomes Φ(t, t) = I.
2. Semigroup Property (Composition Property): Φ(t, τ )Φ(τ, σ) = Φ(t, σ) for all t, τ, σ.
Proof. Consider the solution x(t) starting from x(σ) = x0 . We have x(t) = Φ(t, σ)x0 . We can also
transition the state from σ to τ , and then from τ to t:
x(τ ) = Φ(τ, σ)x0
x(t) = Φ(t, τ )x(τ ) = Φ(t, τ )[Φ(τ, σ)x0 ] = [Φ(t, τ )Φ(τ, σ)]x0
Since the solution x(t) starting from x0 at time σ is unique, we must have:
Φ(t, σ)x0 = [Φ(t, τ )Φ(τ, σ)]x0
This holds for any initial state x0 , which implies the matrix equality:
Φ(t, σ) = Φ(t, τ )Φ(τ, σ)
Proof Sketch: This involves showing that d/dt(det(Φ)) = tr(A(t)) det(Φ) and solving this scalar differential
equation for det(Φ) with the initial condition det(Φ(t0 , t0 )) = det(I) = 1. This property implies that
Φ(t, t0 ) is always invertible if A(t) is continuous, because the exponential function is never zero.
Proof. We verify that x(t) = Φ(t, t0 )x0 satisfies the differential equation and the initial condition.
• Initial Condition:
x(t0 ) = Φ(t0 , t0 )x0 = Ix0 = x0 . (Satisfied)
Since the solution exists and is unique (from theory of ODEs), x(t) = Φ(t, t0 )x0 is the unique solution.
The State Transition Matrix Φ(t, t0 ) is the fundamental concept for solving LTV systems, playing a role analo-
gous to the matrix exponential eA(t−t0 ) in LTI systems.
For an LTI system ẋ = Ax (A constant), the state transition matrix is simply the matrix exponential:
Φ(t, t_0) = e^{A(t−t_0)}
Verification of STM Properties for eA(t−t0 ) : We can verify that eA(t−t0 ) satisfies the general properties of
the STM discussed in Section 8.1:
1. Identity: Φ(t, t) = eA(t−t) = e0 = I. (Matches)
2. Semigroup: Φ(t, τ)Φ(τ, σ) = e^{A(t−τ)} e^{A(τ−σ)}. Using the property e^{M_1} e^{M_2} = e^{M_1+M_2} if M_1 and M_2
commute (here M_1 = A(t − τ), M_2 = A(τ − σ), which commute since A commutes with itself):
e^{A(t−τ)} e^{A(τ−σ)} = e^{A(t−σ)} = Φ(t, σ). (Matches)
3. Inverse: Φ^{−1}(t, τ) = (e^{A(t−τ)})^{−1}. Using the inverse property of the matrix exponential (e^M)^{−1} = e^{−M}:
(e^{A(t−τ)})^{−1} = e^{−A(t−τ)} = e^{A(τ−t)} = Φ(τ, t). (Matches)
Consequently, the solution of the homogeneous LTI initial value problem ẋ = Ax, x(t_0) = x_0 is
x(t) = e^{A(t−t_0)} x_0
This confirms the result we proved directly using the properties of the matrix exponential in Section 6.4.
In the common case where the initial time is t_0 = 0:
x(t) = e^{At} x(0)
This fundamental result connects the matrix exponential directly to the time evolution of the state of a linear
time-invariant system. The behavior of the system is entirely determined by the matrix exponential eAt acting
on the initial state x(0). The methods for computing eAt discussed in Section 6.5 are therefore crucial for
finding explicit solutions to LTI state equations.
det(Ψ(t)) ̸= 0
This means a fundamental matrix Ψ(t) is always invertible. (This can be shown using the Liouville–Jacobi
formula: det(Ψ(t)) = det(Ψ(t_0)) exp( ∫_{t_0}^{t} tr(A(τ)) dτ ). If the columns are linearly independent at any time t_0,
then det(Ψ(t_0)) ≠ 0, and since the exponential term is never zero, det(Ψ(t)) ≠ 0 for all t.)
Non-Uniqueness: Unlike the State Transition Matrix Φ(t, t0 ) which is uniquely defined by its initial condition
Φ(t0 , t0 ) = I, a fundamental matrix Ψ(t) is not unique. If Ψ(t) is a fundamental matrix, then Ψ(t)C is also a
fundamental matrix for any constant invertible n × n matrix C.
General Solution using Fundamental Matrix: Any solution x(t) to ẋ = A(t)x can be expressed as a
linear combination of the columns of a fundamental matrix Ψ(t):
x(t) = Ψ(t)c
where c is a constant n × 1 vector determined by the initial conditions. If x(t0 ) = x0 , then x0 = Ψ(t0 )c, which
implies c = Ψ−1 (t0 )x0 (since Ψ(t0 ) is invertible). Therefore, the solution to the initial value problem is:
x(t) = Ψ(t)Ψ−1 (t0 )x0
Proof. We need to show that the matrix X(t) = Ψ(t)Ψ−1 (τ ) satisfies the defining properties of the STM Φ(t, τ ),
namely:
1. d/dt X(t) = A(t)X(t)
2. X(τ ) = I
Let τ be fixed. Consider X(t) = Ψ(t)Ψ−1 (τ ).
1. Differential Equation: Differentiate X(t) with respect to t. Note that Ψ−1 (τ ) is a constant matrix with
respect to t.
d/dt X(t) = d/dt [ Ψ(t)Ψ^{−1}(τ) ]
          = ( d/dt Ψ(t) ) Ψ^{−1}(τ)
Since Ψ(t) is a fundamental matrix, d/dt Ψ(t) = A(t)Ψ(t). Therefore,
d/dt X(t) = [ A(t)Ψ(t) ] Ψ^{−1}(τ)
          = A(t) [ Ψ(t)Ψ^{−1}(τ) ]
          = A(t)X(t)
The differential equation is satisfied.
2. Initial Condition: Evaluate X(t) at t = τ .
X(τ ) = Ψ(τ )Ψ−1 (τ )
Since Ψ(τ ) is invertible, Ψ(τ )Ψ−1 (τ ) = I.
X(τ ) = I
The initial condition is satisfied.
Since X(t) = Ψ(t)Ψ−1 (τ ) satisfies the unique definition of the State Transition Matrix Φ(t, τ ), we conclude:
Φ(t, τ ) = Ψ(t)Ψ−1 (τ )
Special Case: LTI Systems. For an LTI system ẋ = Ax, we know Φ(t, τ) = e^{A(t−τ)}. Also, Ψ(t) = e^{At} is
a fundamental matrix because d/dt (e^{At}) = Ae^{At} and det(e^{At}) = e^{tr(A)t} ≠ 0. Let's check the relationship using
Ψ(t) = e^{At}:
Ψ(t)Ψ^{−1}(τ) = e^{At} (e^{Aτ})^{−1} = e^{At} e^{−Aτ} = e^{A(t−τ)}
This matches Φ(t, τ), confirming the relationship for the specific fundamental matrix e^{At}.
If we chose a different fundamental matrix, say Ψ̃(t) = e^{At}C (where C is invertible), then:
Ψ̃(t)Ψ̃^{−1}(τ) = e^{At} C C^{−1} e^{−Aτ} = e^{A(t−τ)} = Φ(t, τ)
We get the same unique STM Φ(t, τ), regardless of which fundamental matrix is used.
The fundamental matrix provides an alternative way to think about the basis of solutions for homogeneous
linear systems and offers another route to finding the state transition matrix.
To find the complete solution, we need to find the particular solution due to the input u(t), also known as the
zero-state response (assuming x0 = 0). We use a method analogous to the variation of parameters technique
used for scalar linear differential equations.
8.4.1 Derivation for LTV case: x(t) = Φ(t, t_0)x(t_0) + ∫_{t_0}^{t} Φ(t, τ)B(τ)u(τ)dτ
Let the full solution be x(t). We guess a solution form inspired by the homogeneous solution, but allow the
"constant" vector to vary with time. Let:
x(t) = Φ(t, t0 )z(t)
where z(t) is an unknown vector function to be determined. Note that if z(t) were constant, this would just be
the homogeneous solution.
Substitute this guess into the nonhomogeneous state equation ẋ = A(t)x + B(t)u. First, differentiate x(t) using
the product rule:
ẋ(t) = ( d/dt Φ(t, t_0) ) z(t) + Φ(t, t_0) ż(t)
We know d/dt Φ(t, t_0) = A(t)Φ(t, t_0). Substituting this:
ẋ(t) = A(t)Φ(t, t_0)z(t) + Φ(t, t_0)ż(t)
Now, set this equal to the right-hand side of the state equation, A(t)x + B(t)u, substituting x(t) = Φ(t, t_0)z(t):
A(t)Φ(t, t_0)z(t) + Φ(t, t_0)ż(t) = A(t)Φ(t, t_0)z(t) + B(t)u(t)   ⟹   Φ(t, t_0)ż(t) = B(t)u(t)
Since Φ(t, t_0) is invertible, with inverse Φ^{−1}(t, t_0) = Φ(t_0, t), we can solve for ż(t):
ż(t) = Φ(t_0, t)B(t)u(t)
Integrating both sides from t_0 to t (with dummy variable τ):
z(t) = z(t_0) + ∫_{t_0}^{t} Φ(t_0, τ)B(τ)u(τ)dτ
Now we need to find z(t_0). From our assumed solution form, x(t) = Φ(t, t_0)z(t), evaluating at t = t_0 gives
x(t_0) = Φ(t_0, t_0)z(t_0), i.e.,
x_0 = I z(t_0)
z(t_0) = x_0
Substitute z(t0 ) = x0 back into the expression for z(t):
z(t) = x_0 + ∫_{t_0}^{t} Φ(t_0, τ)B(τ)u(τ)dτ
Finally, substitute this z(t) back into our assumed solution form x(t) = Φ(t, t0 )z(t):
x(t) = Φ(t, t_0) [ x_0 + ∫_{t_0}^{t} Φ(t_0, τ)B(τ)u(τ)dτ ]
x(t) = Φ(t, t_0)x_0 + Φ(t, t_0) ∫_{t_0}^{t} Φ(t_0, τ)B(τ)u(τ)dτ
Using the semigroup property Φ(t, t_0)Φ(t_0, τ) = Φ(t, τ), we can bring Φ(t, t_0) inside the integral:
x(t) = Φ(t, t_0)x_0 + ∫_{t_0}^{t} [Φ(t, t_0)Φ(t_0, τ)]B(τ)u(τ)dτ = Φ(t, t_0)x_0 + ∫_{t_0}^{t} Φ(t, τ)B(τ)u(τ)dτ
This is the general solution for the state vector of an LTV system. It clearly shows the two components:
• Zero-Input Response: Φ(t, t0 )x0 (response due to initial state x0 only)
• Zero-State Response: ∫_{t_0}^{t} Φ(t, τ)B(τ)u(τ)dτ (response due to input u(t) only, assuming x_0 = 0)
The integral term is a convolution-like integral, summing the effect of the input u(τ ) at all past times τ (from
t0 to t), propagated to the current time t by the state transition matrix Φ(t, τ ).
8.4.2 Derivation for LTI case: x(t) = e^{A(t−t_0)}x(t_0) + ∫_{t_0}^{t} e^{A(t−τ)}Bu(τ)dτ
For the LTI case, the matrices A and B are constant, and the state transition matrix simplifies to Φ(t, τ ) =
eA(t−τ ) . We can substitute this directly into the general LTV solution derived above.
Substitute Φ(t, t_0) = e^{A(t−t_0)} and Φ(t, τ) = e^{A(t−τ)} into
x(t) = Φ(t, t_0)x_0 + ∫_{t_0}^{t} Φ(t, τ)B(τ)u(τ)dτ
to obtain
x(t) = e^{A(t−t_0)}x_0 + ∫_{t_0}^{t} e^{A(t−τ)}Bu(τ)dτ
This is the well-known variation of constants formula or convolution integral solution for LTI systems.
The integral term represents the convolution of the system's matrix impulse response, H(t) = e^{At}B (for t ≥ 0),
with the input u(t): the integral is ∫_{0}^{t} H(t − τ)u(τ)dτ, which is the definition of (H ∗ u)(t).
So, the zero-state response is x_zs(t) = (e^{At}B) ∗ u(t).
Output Equation Solution: Once the state solution x(t) is found (for either LTV or LTI), the output y(t)
is obtained algebraically from the output equation:
• LTV: y(t) = C(t)x(t) + D(t)u(t)
• LTI: y(t) = Cx(t) + Du(t)
Substituting the full solution for x(t) gives the complete output response, which also consists of a zero-input
and a zero-state component. For LTI systems:
y(t) = Ce^{At}x(0) + C ∫_{0}^{t} e^{A(t−τ)}Bu(τ)dτ + Du(t)
The term CeAt B (for t ≥ 0, and 0 for t < 0) plus Dδ(t) (where δ(t) is the Dirac delta) is the impulse response
matrix of the LTI system from input u to output y.
These formulas provide the complete analytical solution for the state and output evolution of linear systems,
forming the basis for understanding system response to initial conditions and external inputs.
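A numerical sketch of the LTI variation-of-constants formula follows (NumPy/SciPy; the system, initial state, and unit-step input are illustrative). The convolution integral is evaluated by trapezoidal quadrature and cross-checked, for a step input, against the closed form A^{-1}(e^{At} − I)B:

import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # illustrative stable system (eigenvalues -1, -2)
B = np.array([[0.0], [1.0]])
x0 = np.array([[1.0], [0.0]])
t = 2.0

# Zero-input response e^{At} x0
x_zi = expm(A * t) @ x0

# Zero-state response for u(tau) = 1 via trapezoidal quadrature of the convolution integral
taus = np.linspace(0.0, t, 401)
integrand = np.array([(expm(A * (t - tau)) @ B).ravel() for tau in taus])
x_zs = np.trapz(integrand, taus, axis=0).reshape(-1, 1)

print(x_zi + x_zs)
# Closed-form check of the zero-state part, valid for a unit-step input and invertible A
print(x_zi + np.linalg.inv(A) @ (expm(A * t) - np.eye(2)) @ B)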
Chapter 9
Stability of Linear Systems
Stability is arguably the most fundamental property of a dynamical system. Informally, a stable system is one
that remains near its equilibrium state when subjected to small perturbations. An unstable system, conversely,
will diverge from equilibrium even for arbitrarily small disturbances. This chapter introduces the formal concepts
of stability, focusing primarily on linear systems.
Consider a general dynamical system described by
ẋ(t) = f(x(t), t)
where f is a function describing the dynamics. For linear systems, f(x, t) = A(t)x.
1. Equilibrium Point: A state xeq is an equilibrium point (or fixed point) if f (xeq , t) = 0 for all t ≥ t0 .
If the system starts at xeq , it remains there forever. For the linear system ẋ = A(t)x, the origin xeq = 0
is always an equilibrium point, since A(t) · 0 = 0. If A(t) is invertible for all t, the origin is the only
equilibrium point. We will primarily focus on the stability of the origin xeq = 0.
2. Stability in the Sense of Lyapunov (i.s.L.): The equilibrium point xeq = 0 is stable (i.s.L.) if,
for any given tolerance ϵ > 0, there exists a sensitivity δ(ϵ, t0 ) > 0 such that if the initial state x(t0 ) is
close enough to the origin (||x(t0 )|| < δ), then the subsequent state x(t) remains within the tolerance ϵ
of the origin (||x(t)|| < ϵ) for all future times t ≥ t0 . Intuition: If you start close enough (within δ),
you stay close (within ϵ). Formal Definition: The origin is stable if ∀ϵ > 0, ∀t0 , ∃δ(ϵ, t0 ) > 0 such that
||x(t0 )|| < δ =⇒ ||x(t)|| < ϵ, ∀t ≥ t0 . If δ can be chosen independently of t0 , the stability is called
uniform stability. For LTI systems, stability is always uniform.
3. Asymptotic Stability: The equilibrium point xeq = 0 is asymptotically stable if it is:
(a) Stable (i.s.L.), and
(b) Convergent: There exists some δ0 (t0 ) > 0 such that if the initial state is within δ0 of the origin
(||x(t0 )|| < δ0 ), then the state x(t) not only stays close but also approaches the origin as time goes
to infinity (limt→∞ x(t) = 0).
Intuition: If you start close enough (within δ0 ), you eventually return to the equilibrium point. Formal
Definition (Convergence part): ∃δ0 (t0 ) > 0 such that ||x(t0 )|| < δ0 =⇒ limt→∞ ||x(t)|| = 0. If the
convergence is uniform with respect to t0 and initial states within δ0 , the stability is called uniform
asymptotic stability. For LTI systems, asymptotic stability is always uniform. If convergence holds for
any initial state (δ0 can be arbitrarily large), the origin is globally asymptotically stable. For linear
systems, asymptotic stability is always global.
4. Instability: The equilibrium point xeq = 0 is unstable if it is not stable. This means there exists some
ϵ > 0 such that for any δ > 0, no matter how small, there is always some initial state x(t0 ) within δ of the
origin (||x(t0 )|| < δ) whose trajectory eventually leaves the ϵ-neighborhood (||x(t)|| ≥ ϵ for some t > t0 ).
Intuition: No matter how close you start, there’s a chance you’ll eventually move far away.
Visualizing Stability:
• Stable: Trajectories starting near the origin remain confined within a slightly larger region around the
origin.
• Asymptotically Stable: Trajectories starting near the origin not only remain nearby but also eventually
converge back to the origin.
• Unstable: At least some trajectories starting arbitrarily close to the origin eventually move far away.
These definitions provide the formal framework for analyzing the stability of the internal state dynamics of
linear (and nonlinear) systems.
Assume first that A is diagonalizable, A = V ΛV^{−1}. The solution of ẋ = Ax is then
x(t) = V e^{Λt} V^{−1} x(0)
where e^{Λt} = diag(e^{λ_1 t}, . . . , e^{λ_n t}). Let c = V^{−1}x(0) be the representation of the initial condition in the eigen-
vector basis. Then:
x(t) = V [diag(e^{λ_1 t}, . . . , e^{λ_n t})] c
This means x(t) is a linear combination of terms like eλi t vi , where vi are the eigenvectors.
x(t) = ∑_{i=1}^{n} c_i e^{λ_i t} v_i
Now, consider the behavior as t → ∞. The magnitude of each term eλi t depends on the real part of λi . Let
λi = σi + jωi .
|eλi t | = |e(σi +jωi )t | = |eσi t ejωi t | = |eσi t ||ejωi t | = eσi t (since |ejωi t | = | cos(ωi t) + j sin(ωi t)| = 1).
• If Re(λi ) = σi < 0 for all i: Then eσi t → 0 as t → ∞ for all i. Consequently, each term ci eλi t vi goes
to the zero vector, and thus x(t) → 0 as t → ∞. Since the solution involves decaying exponentials, it
can also be shown that the solution remains bounded for any bounded initial condition, satisfying the
definition of stability i.s.L. Therefore, the system is asymptotically stable.
• If Re(λi ) > 0 for some i: Let λk be an eigenvalue with σk > 0. If the initial condition x(0) has a
non-zero component ck along the eigenvector vk (which is generally true unless x(0) lies exactly in the
subspace spanned by other eigenvectors), then the term ck eλk t vk will grow unboundedly as t → ∞ because
|eλk t | = eσk t → ∞. Thus, ||x(t)|| → ∞, and the system is unstable.
• If Re(λi ) ≤ 0 for all i, but Re(λk ) = 0 for some k: Let λk be an eigenvalue with σk = 0. The term
ck eλk t vk becomes ck ejωk t vk . The magnitude |ejωk t | = 1, so this term represents an oscillation (if ωk ̸= 0)
or remains constant (if ωk = 0, i.e., λk = 0). The terms corresponding to eigenvalues with σi < 0 decay
to zero. Therefore, the overall solution x(t) remains bounded. It does not necessarily converge to 0 (due
to the non-decaying terms from eigenvalues on the imaginary axis). This corresponds to stability i.s.L.,
but not asymptotic stability.
Discussion for Jordan Case (Non-Diagonalizable A): If A is not diagonalizable, the solution involves the
Jordan form, x(t) = P e^{Jt} P^{−1} x(0). The matrix e^{Jt} contains blocks e^{J_i t} corresponding to the Jordan blocks J_i.
Recall from Section 6.5.3 that for a Jordan block J_i = λI + N of size m × m:
e^{J_i t} = e^{λt} · ∑_{k=0}^{m−1} N^k t^k / k!
The entries of e^{J_i t} therefore contain terms of the form t^k e^{λt} with k up to m − 1. If Re(λ) < 0 these still decay
to zero; if Re(λ) = 0 and m > 1 they grow polynomially in t, which is why Jordan blocks of size greater than 1
on the imaginary axis cause instability.
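A quick numerical check of these eigenvalue conditions is sketched below (NumPy; the example matrices are illustrative, and the marginal case still requires inspecting the Jordan block sizes of imaginary-axis eigenvalues, as discussed above):

import numpy as np

def classify_stability(A, tol=1e-9):
    """Classify xdot = Ax from the eigenvalues of A. The marginal case needs a further
    Jordan-block check, so only a message is returned for it."""
    re = np.linalg.eigvals(A).real
    if np.all(re < -tol):
        return "asymptotically stable"
    if np.any(re > tol):
        return "unstable"
    return "marginal: check Jordan blocks of imaginary-axis eigenvalues"

A_stable = np.array([[0.0, 1.0], [-2.0, -3.0]])
A_marginal = np.array([[0.0, 1.0], [0.0, 0.0]])   # double eigenvalue 0 in a 2x2 Jordan block -> unstable
print(classify_stability(A_stable))
print(classify_stability(A_marginal))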
Consider a quadratic Lyapunov function candidate built from a symmetric positive definite matrix P:
V(x) = x^T P x
Its derivative along trajectories of ẋ = Ax is computed as follows.
V̇(x) = d/dt ( x^T P x )
      = ( d/dt x )^T P x + x^T P ( d/dt x )
      = (ẋ)^T P x + x^T P (ẋ)
Substitute ẋ = Ax:
= (Ax)T P x + xT P (Ax)
= xT AT P x + xT P Ax
= xT (AT P + P A)x
For asymptotic stability, we require V̇ (x) to be negative definite. This means we need the matrix (AT P + P A)
to be negative definite. Let Q = −(AT P + P A). Then we require Q to be positive definite (Q > 0).
This leads to the central equation in Lyapunov theory for LTI systems:
AT P + P A = −Q
If we can find a symmetric positive definite matrix P (P > 0) that satisfies this equation for some symmetric
positive definite matrix Q (Q > 0), then V (x) = xT P x is a valid Lyapunov function proving the asymptotic
stability of the LTI system ẋ = Ax.
The Lyapunov equation is
A^T P + P A = −Q
where:
• A is the state matrix of the LTI system ẋ = Ax.
• P is a symmetric, positive definite matrix (P = P T , P > 0) that we are trying to find.
• Q is a chosen symmetric, positive definite matrix (Q = QT , Q > 0).
Interpretation: The Lyapunov equation relates the system dynamics (A) to the properties of the chosen
Lyapunov function (P ) and its rate of decrease (represented by Q).
• If the system ẋ = Ax is asymptotically stable, then for any chosen symmetric positive definite Q, there
exists a unique symmetric positive definite solution P to the Lyapunov equation.
• Conversely, if for some chosen symmetric positive definite Q, there exists a symmetric positive definite
solution P to the Lyapunov equation, then the system ẋ = Ax is asymptotically stable.
Choosing Q: In the context of proving stability, Q can often be chosen conveniently. A common choice is
Q = I (the identity matrix), or Q = C T C for some matrix C such that the pair (A, C) is observable (this
ensures Q is at least positive semidefinite, and often positive definite under certain conditions). The choice of
Q affects the resulting P matrix, but as long as Q is positive definite, the existence of a positive definite P
guarantees asymptotic stability.
Solving the Lyapunov Equation: The equation AT P +P A = −Q is a linear matrix equation for the unknown
matrix P . Since P is symmetric, it has n(n + 1)/2 independent entries. The equation can be rewritten as a
system of n2 linear algebraic equations for the entries of P . This can be done using the Kronecker product (⊗)
and the vectorization operator (vec):
(I ⊗ AT + AT ⊗ I)vec(P ) = −vec(Q)
This is a standard linear system of the form M x = b, where x = vec(P ), b = −vec(Q), and M = (I⊗AT +AT ⊗I).
This system has a unique solution for P if and only if A and −AT have no common eigenvalues, which is
equivalent to λi (A) + λj (A) ̸= 0 for all pairs of eigenvalues i, j. If A is asymptotically stable (all Re(λi ) < 0),
this condition is always met, guaranteeing a unique solution P for any Q.
While the Kronecker product formulation shows existence and uniqueness, numerical methods are typically used
to solve the Lyapunov equation for P in practice (e.g., using functions like scipy.linalg.solve_lyapunov in
Python or lyap in MATLAB).
The Lyapunov equation is a fundamental tool in stability analysis and control design for LTI systems.
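A minimal sketch of this algebraic stability test in SciPy follows. scipy.linalg.solve_continuous_lyapunov (the current name of the solve_lyapunov routine mentioned above) solves M X + X Mᴴ = Y, so passing M = Aᵀ and Y = −Q yields the equation Aᵀ P + P A = −Q; the example system is an illustrative choice:

import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # eigenvalues -1, -2 (asymptotically stable)
Q = np.eye(2)

P = solve_continuous_lyapunov(A.T, -Q)     # solves A^T P + P A = -Q

print(P)
print(np.linalg.eigvalsh(P))               # all positive, so P > 0, confirming stability
print(A.T @ P + P @ A + Q)                 # residual, approximately the zero matrix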
9.3.3 Theorem: Lyapunov Stability Theorem (If A stable, then for any Q>0,
unique P>0 exists; If P>0 exists for some Q>0, then A is stable) (with
Proofs)
This theorem formally connects the stability of the LTI system ẋ = Ax with the existence of solutions to the
Lyapunov equation AT P + P A = −Q.
Theorem 9.3.1 (Lyapunov Stability Theorem for LTI Systems). Let A be a real n × n matrix. The following
statements are equivalent:
1. The LTI system ẋ = Ax is asymptotically stable (i.e., all eigenvalues of A have strictly negative real
parts).
2. For any given symmetric positive definite matrix Q (Q = QT > 0), there exists a unique symmetric positive
definite matrix P (P = P T > 0) that satisfies the Lyapunov equation AT P + P A = −Q.
3. For some given symmetric positive definite matrix Q (Q = QT > 0), there exists a unique symmetric
positive definite matrix P (P = P T > 0) that satisfies the Lyapunov equation AT P + P A = −Q.
Proof. (1 ⇒ 2): Assume A is asymptotically stable, so every eigenvalue satisfies Re(λi(A)) < 0 and hence
λi(A) + λj(A) ≠ 0 for all i, j. By the Kronecker-product formulation above, the Lyapunov equation then has a
unique solution P for any given Q.
• Symmetry of P: Take the transpose of A^T P + P A = −Q:
P^T A + A^T P^T = −Q^T
Since Q = Q^T, we have A^T P^T + P^T A = −Q. This shows that if P is a solution, then P^T is also a solution.
Since the solution is unique, we must have P = P^T, so P is symmetric.
• Positive Definiteness of P: We need to show that the unique symmetric solution P is positive definite.
Consider the candidate solution P given by the integral:
P = ∫_0^∞ e^{A^T τ} Q e^{Aτ} dτ
First, we need to show this integral converges. Since A is asymptotically stable, the matrix exponential e^{At}
(and thus e^{A^T t}) decays to zero as t → ∞. Specifically, ||e^{At}|| ≤ M e^{−αt} for some M > 0, α > 0.
Then the norm of the integrand is bounded by ||e^{A^T τ} Q e^{Aτ}|| ≤ ||e^{A^T τ}|| ||Q|| ||e^{Aτ}|| ≤ M² ||Q|| e^{−2ατ}, which
is integrable from 0 to ∞. Thus, the integral converges. Since Q > 0 and e^{Aτ} is always invertible, the
integrand e^{A^T τ} Q e^{Aτ} is positive definite for all τ ≥ 0 (by the congruence transformation property). The
integral of a positive definite matrix function over a positive interval results in a positive definite matrix
P. So, P > 0. Now, let us verify that this P satisfies the Lyapunov equation:
A^T P + P A = A^T ∫_0^∞ e^{A^T τ} Q e^{Aτ} dτ + ( ∫_0^∞ e^{A^T τ} Q e^{Aτ} dτ ) A
= ∫_0^∞ [ A^T e^{A^T τ} Q e^{Aτ} + e^{A^T τ} Q e^{Aτ} A ] dτ
= ∫_0^∞ [ (d/dτ e^{A^T τ}) Q e^{Aτ} + e^{A^T τ} Q (d/dτ e^{Aτ}) ] dτ
= ∫_0^∞ d/dτ [ e^{A^T τ} Q e^{Aτ} ] dτ
= [ e^{A^T τ} Q e^{Aτ} ]_{τ=0}^{τ=∞}
= lim_{τ→∞} ( e^{A^T τ} Q e^{Aτ} ) − e^{A^T · 0} Q e^{A · 0}
= 0 − e^0 Q e^0
= −I Q I
= −Q
Thus, P = ∫_0^∞ e^{A^T τ} Q e^{Aτ} dτ is the unique symmetric positive definite solution.
(2 ⇒ 3): This implication is trivial. If statement 2 holds for any Q > 0, it certainly holds for some Q > 0.
(3 ⇒ 1): Assume for some symmetric Q > 0, there exists a symmetric P > 0 satisfying AT P + P A =
−Q.
We need to show that ẋ = Ax is asymptotically stable. Consider the quadratic function V (x) = xT P x. Since P
is symmetric and positive definite, V (x) is a valid Lyapunov function candidate (V (x) > 0 for x ̸= 0, V (0) = 0).
Calculate the time derivative V̇(x) along the trajectories of ẋ = Ax:
V̇(x) = ẋ^T P x + x^T P ẋ = x^T (A^T P + P A) x = −x^T Q x
Since Q is positive definite (Q > 0), −Q is negative definite. Therefore, V̇(x) = −x^T Q x < 0 for all x ≠ 0. We
have found a Lyapunov function V (x) = xT P x whose derivative V̇ (x) is negative definite. By the Lyapunov
stability theorem (conceptual version from 9.3.1), this implies that the equilibrium point x = 0 is asymptotically
stable. Furthermore, since V (x) = xT P x is quadratic, it is radially unbounded. Therefore, the stability is global
asymptotic stability.
Conclusion: The theorem establishes a fundamental equivalence: the asymptotic stability of an LTI system is
completely characterized by the existence of a positive definite solution P to the Lyapunov equation AT P +P A =
−Q for any (or some) positive definite Q. This provides a powerful algebraic test for stability that avoids
calculating eigenvalues directly.
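A possible numerical sanity check of this equivalence, using the integral construction from the proof (illustrative 2 × 2 matrices; the improper integral is truncated at a finite horizon, which is a reasonable approximation because the integrand decays exponentially):

import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad_vec

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])   # asymptotically stable
Q = np.eye(2)

# P = integral_0^infinity exp(A^T t) Q exp(A t) dt, truncated at a large horizon
integrand = lambda t: expm(A.T * t) @ Q @ expm(A * t)
P, _ = quad_vec(integrand, 0.0, 50.0)

print(np.allclose(A.T @ P + P @ A, -Q, atol=1e-6))   # Lyapunov equation holds
print(np.all(np.linalg.eigvalsh(P) > 0))             # P is positive definite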
9.4 Input-Output Stability (BIBO)
This type of stability is often crucial from a practical perspective, as we typically interact with a system through
its inputs and outputs. We want assurance that if we apply a reasonable (bounded) input signal, the resulting
output signal will also remain within reasonable bounds and not grow indefinitely.
9.4.1 Definition
The most common form of input-output stability is Bounded-Input, Bounded-Output (BIBO) stability.
Consider a system (LTI or LTV) described by the state-space equations ẋ(t) = A(t)x(t) + B(t)u(t), y(t) = C(t)x(t) + D(t)u(t) (with constant matrices A, B, C, D in the LTI case).
Definition 9.4.1 (BIBO Stability). A system is said to be BIBO stable if, for zero initial conditions (x(t0 ) =
0), any bounded input u(t) produces a bounded output y(t).
Formal Definition: The system is BIBO stable if for any initial time t0 and any input u(t) such that
||u(t)|| ≤ Mu < ∞ for all t ≥ t0 (where Mu is some finite constant and || · || is a suitable vector norm, e.g.,
the ∞-norm), the resulting output y(t) (assuming x(t0 ) = 0) is also bounded, i.e., there exists a finite constant
My (Mu , t0 ) such that ||y(t)|| ≤ My < ∞ for all t ≥ t0 .
Key Points:
• Zero Initial State: BIBO stability is defined based on the system’s response to inputs only, assuming
the system starts from rest (x(t0 ) = 0). The effect of non-zero initial conditions is related to internal
(Lyapunov) stability.
• Bounded Input: The input signal must be bounded over time. This means its magnitude never exceeds
some finite limit. Examples include step inputs, sinusoidal inputs, and decaying exponentials. An input
like u(t) = t is not bounded.
• Bounded Output: If the system is BIBO stable, the output signal resulting from any bounded input
must also remain bounded over time; it cannot grow infinitely large.
• Uniformity: For LTI systems, if the system is BIBO stable, the bound My on the output depends only
on the bound Mu of the input, not on the initial time t0 .
BIBO stability ensures predictable and safe operation in response to realistic input signals.
For an LTI system ẋ = Ax + Bu, y = Cx + Du, the zero-state response is y(t) = ∫_0^t C e^{A(t−τ)} B u(τ) dτ + D u(t).
Setting x(0) = 0 and u(t) = δ(t)I (where I is the m × m identity, applying a delta to each input channel), the
output is the impulse response matrix H(t):
H(t) = C ∫_0^t e^{A(t−τ)} B δ(τ) I dτ + D δ(t) I = C e^{At} B + D δ(t)
Intuition: If the impulse response decays to zero sufficiently quickly (is absolutely integrable), then the convolution
integral ∫ H(t − τ) u(τ) dτ will remain bounded even if the input u(τ) is bounded but persists indefinitely.
If the impulse response does not decay (or decays too slowly), a bounded input could potentially excite the
system continuously, leading to an unbounded output (resonance-like behavior).
For the state-space representation H(t) = CeAt B +Dδ(t), the absolute integrability condition primarily depends
on the decay rate of eAt . If A is asymptotically stable (all Re(λi ) < 0), then eAt decays exponentially, ensuring
that CeAt B is absolutely integrable. The Dδ(t) term does not affect BIBO stability as defined (response to
bounded inputs, not impulses).
This connection highlights that for LTI systems, the conditions for internal stability (asymptotic stability based
on eigenvalues) are closely related to the conditions for external stability (BIBO stability based on impulse
response integrability).
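A small sketch of this integrability condition (illustrative matrices only, chosen for the example): it evaluates the impulse response C e^{At} B on a time grid and approximates the integral of its absolute value, which stays finite because A is stable.

import numpy as np
from scipy.linalg import expm
from scipy.integrate import trapezoid

# Illustrative SISO system with a stable A (eigenvalues -1 and -2)
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

t = np.linspace(0.0, 20.0, 2001)
h = np.array([(C @ expm(A * ti) @ B).item() for ti in t])  # impulse response C e^{At} B

print(trapezoid(np.abs(h), t))   # finite approximation of the integral of |h(t)|
print(abs(h[-1]) < 1e-6)         # the impulse response has essentially decayed to zero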
9.4.3 Theorem: For LTI systems, Asymptotic Stability ⇒ BIBO Stability (Statement)
As hinted in the previous subsection, there is a strong connection between the internal stability (asymptotic
stability of ẋ = Ax) and the external stability (BIBO stability) for LTI systems.
Theorem 9.4.1. For an LTI system described by ẋ = Ax + Bu, y = Cx + Du: If the system is asymptotically
stable (i.e., all eigenvalues of A have strictly negative real parts), then the system is BIBO stable.
Proof Idea. 1. Asymptotic Stability Implies Exponential Decay: If A is asymptotically stable, then
the matrix exponential eAt decays exponentially. This means ||eAt || ≤ M e−αt for some constants M > 0
and α > 0, for all t ≥ 0.
2. Boundedness of Zero-State Response: Consider the zero-state output response (assuming x(0) = 0):
y_zs(t) = C ∫_0^t e^{A(t−τ)} B u(τ) dτ + D u(t)
3. Bound the Integral Term: Take the norm of the integral part:
|| C ∫_0^t e^{A(t−τ)} B u(τ) dτ || ≤ ∫_0^t ||C e^{A(t−τ)} B u(τ)|| dτ (triangle inequality for integrals)
≤ ∫_0^t ||C|| ||e^{A(t−τ)}|| ||B|| ||u(τ)|| dτ (submultiplicativity)
4. Use Input Bound and Exponential Decay: Assume the input is bounded: ||u(τ)|| ≤ Mu for all τ.
Substitute the exponential decay bound for ||e^{A(t−τ)}||:
≤ ∫_0^t ||C|| (M e^{−α(t−τ)}) ||B|| Mu dτ = ||C|| M ||B|| Mu ∫_0^t e^{−α(t−τ)} dτ
5. Evaluate the Integral: Since ∫_0^t e^{−α(t−τ)} dτ = (1 − e^{−αt})/α ≤ 1/α, the integral part of the output
is bounded by ||C|| M ||B|| Mu / α, a constant that depends on Mu but not on t.
6. Bound the Feedthrough Term: The term Du(t) is also bounded: ||D u(t)|| ≤ ||D|| Mu.
7. Bound the Total Output: Using the triangle inequality for the output y_zs(t):
||y_zs(t)|| ≤ || C ∫_0^t e^{A(t−τ)} B u(τ) dτ || + ||D u(t)|| ≤ ( ||C|| M ||B|| / α + ||D|| ) Mu
This bound is finite for every bounded input, so the output is bounded and the system is BIBO stable.
Converse Statement (Requires Controllability and Observability): The converse (BIBO stability ⇒
Asymptotic Stability) is not always true. A system can be BIBO stable but internally unstable if the unstable
internal modes are either uncontrollable (cannot be excited by the input) or unobservable (do not appear at
the output).
However, if the LTI system is both controllable and observable (concepts introduced in Chapter 11), then
BIBO stability is equivalent to asymptotic stability.
Summary: For LTI systems, internal asymptotic stability is a sufficient condition for external BIBO stability.
They become equivalent conditions if the system is also controllable and observable.
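The caveat about the converse can be illustrated with a hypothetical example (the matrices below are chosen only for this purpose): the unstable mode is neither excited by the input nor seen at the output, so every bounded input produces a bounded output even though the system is not asymptotically stable.

import numpy as np

# x1: stable mode (eigenvalue -1), x2: unstable mode (eigenvalue +1)
A = np.array([[-1.0, 0.0],
              [ 0.0, 1.0]])
B = np.array([[1.0], [0.0]])   # the input does not reach x2 (uncontrollable mode)
C = np.array([[1.0, 0.0]])     # the output does not see x2 (unobservable mode)
D = np.array([[0.0]])

# Zero-state response to a bounded step input, simulated by forward Euler
dt, T = 1e-3, 20.0
x = np.zeros((2, 1))
y_max = 0.0
for k in range(int(T / dt)):
    u = np.array([[1.0]])                 # bounded input, |u| <= 1
    y = C @ x + D @ u
    y_max = max(y_max, abs(y.item()))
    x = x + dt * (A @ x + B @ u)

print(y_max)                   # stays bounded (approaches 1) despite the unstable mode
print(np.linalg.eigvals(A))    # eigenvalues [-1, +1]: not asymptotically stable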
Chapter 10
System Modes and Response Characteristics
Understanding the stability of an LTI system, as discussed in Chapter 9, tells us whether the state converges to
the origin. However, it doesn’t fully describe how the system behaves over time. The concepts of eigenvalues
and eigenvectors, introduced in Chapter 3, reappear here to provide deeper insight into the system’s dynamic
behavior through modal analysis.
Modal analysis decomposes the system’s response into a sum of simpler components, called modes, each
associated with an eigenvalue and eigenvector of the state matrix A. This decomposition helps visualize and
understand the system’s natural frequencies, damping characteristics, and how different initial conditions excite
different patterns of behavior.
This chapter focuses primarily on systems where the state matrix A is diagonalizable, i.e., A has n linearly
independent eigenvectors v1, . . . , vn with corresponding eigenvalues λ1, . . . , λn (A vi = λi vi).
We can form the modal matrix V whose columns are the eigenvectors, and the diagonal eigenvalue matrix Λ:
V = [v1 | v2 | · · · | vn],   Λ = diag(λ1, λ2, . . . , λn)
Since the eigenvectors are linearly independent, V is invertible, and we have the relationship:
A = V ΛV −1
We now define new coordinates z(t), the modal coordinates, by the change of variables
x(t) = V z(t)
This means z(t) represents the coordinates of the state x(t) in the basis of eigenvectors. We can find z(t) from
x(t) using the inverse transformation:
z(t) = V −1 x(t)
Differentiating x = V z with respect to time (V is constant) gives ẋ = V ż. Substituting into ẋ = Ax:
V ż = A(V z)
Substitute A = V ΛV^{−1}:
V ż = (V ΛV^{−1})(V z) = V Λ(V^{−1}V)z = V Λz
Multiplying both sides on the left by V^{−1} gives:
ż(t) = Λz(t)
This is a remarkable result. The transformation to modal coordinates z = V −1 x has transformed the original
coupled system ẋ = Ax into a completely decoupled system ż = Λz. Since Λ is a diagonal matrix, this vector
equation represents n independent scalar first-order differential equations:
żi(t) = λi zi(t),   i = 1, 2, . . . , n
Each modal coordinate zi(t) evolves independently according to its corresponding eigenvalue λi.
Solution in Modal Coordinates: The solution of each decoupled scalar equation is simply zi(t) = e^{λi t} zi(0), or in vector form:
z(t) = e^{Λt} z(0)
where e^{Λt} = diag(e^{λ1 t}, . . . , e^{λn t}) and z(0) = V^{−1} x(0) is the initial condition transformed into modal coordinates.
Solution in Original Coordinates: To get the solution back in the original state coordinates x(t), we use
the transformation x = V z:
x(t) = V z(t) = V e^{Λt} V^{−1} x(0)
This recovers the solution we found in Section 8.2.1 using the diagonalization method for computing eAt . The
modal decomposition provides a clear interpretation of this solution: the system’s evolution is governed by the
simple exponential behavior of the decoupled modal coordinates, which are then mapped back to the original
state space via the eigenvector matrix V .
This decomposition is fundamental to understanding how the system’s internal structure (eigenvalues and
eigenvectors) dictates its dynamic response.
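A minimal sketch of the decomposition in code (the matrix A and initial state are illustrative assumptions): diagonalize A, confirm that V^{−1} A V = Λ, and check that V e^{Λt} V^{−1} x(0) agrees with the matrix-exponential solution.

import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])            # diagonalizable, eigenvalues -1 and -2
lam, V = np.linalg.eig(A)               # columns of V are the eigenvectors
Lam = np.diag(lam)

print(np.allclose(np.linalg.inv(V) @ A @ V, Lam))   # V^{-1} A V = Lambda

x0 = np.array([1.0, -1.0])
t = 0.7
x_modal = V @ np.diag(np.exp(lam * t)) @ np.linalg.inv(V) @ x0   # V e^{Lam t} V^{-1} x0
x_expm = expm(A * t) @ x0
print(np.allclose(x_modal, x_expm))     # the two solutions agree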
10.2 Homogeneous Response via Modes: x(t) = Σ ci e^{λi t} vi
In the previous section, we saw that for a diagonalizable LTI system ẋ = Ax, the solution of the homogeneous
initial value problem with initial condition x(0) is given by:
x(t) = V e^{Λt} V^{−1} x(0)
where V is the matrix of eigenvectors, Λ is the diagonal matrix of eigenvalues, and e^{Λt} = diag(e^{λ1 t}, . . . , e^{λn t}).
This form is useful for computation, but we can rewrite it to gain more insight into the structure of the response.
Let c be the vector of initial modal coordinates:
c = z(0) = V −1 x(0)
c = [c1 , c2 , . . . , cn ]T
The vector c represents the weights of the initial state x(0) when expressed as a linear combination of the
eigenvectors (the columns of V ). Specifically, x(0) = V c = c1 v1 + c2 v2 + · · · + cn vn .
Now, substitute c back into the solution x(t) = V [e^{Λt} z(0)] = V e^{Λt} c. Since e^{Λt} is diagonal,
e^{Λt} c = [e^{λ1 t} c1, e^{λ2 t} c2, . . . , e^{λn t} cn]^T, so
x(t) = [v1 | v2 | · · · | vn] [e^{λ1 t} c1, e^{λ2 t} c2, . . . , e^{λn t} cn]^T
This matrix-vector multiplication results in a linear combination of the columns of V , weighted by the elements
of the vector:
x(t) = (eλ1 t c1 )v1 + (eλ2 t c2 )v2 + · · · + (eλn t cn )vn
x(t) = Σ_{i=1}^{n} ci e^{λi t} vi
• ci (Modal Participation Factor): Determines how much each mode is initially excited by the initial
condition x(0). It is the i-th component of the initial state when represented in the eigenvector basis
(c = V −1 x(0)). If ci = 0, the i-th mode is not excited by that specific initial condition.
The overall system response x(t) is the sum of these individual modal responses. The long-term behavior is
dominated by the modes corresponding to eigenvalues with the largest real parts (the slowest decaying or fastest
growing modes).
This modal representation is extremely powerful for understanding the qualitative behavior of LTI systems
without needing to compute the full matrix exponential eAt . It directly links the system’s structure (eigenvalues
and eigenvectors) to its dynamic response.
• Basis for Response: The set of n linearly independent eigenvectors {v1 , . . . , vn } forms a basis for the
state space. Any state vector x(t) can be represented as a linear combination of these eigenvectors, with
the time-varying coefficients given by the modal responses ci eλi t .
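The participation factors and the mode-sum form of the response can be computed directly; the following sketch (same illustrative matrix and initial state as before, repeated so the snippet is self-contained) verifies that Σ ci e^{λi t} vi reproduces e^{At} x(0).

import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
lam, V = np.linalg.eig(A)
x0 = np.array([1.0, -1.0])

c = np.linalg.solve(V, x0)              # participation factors c = V^{-1} x(0)

t = 0.7
# x(t) = sum_i c_i e^{lambda_i t} v_i
x_modes = sum(c[i] * np.exp(lam[i] * t) * V[:, i] for i in range(len(lam)))
print(np.allclose(x_modes, expm(A * t) @ x0))   # matches the exact solution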
Example 10.3.1 (Interpretation). Consider a third-order system (n = 3) with eigenvalues λ1 = −1 (real, stable)
and λ2,3 = −0.1 ± j2 (a stable complex conjugate pair), with corresponding eigenvectors v1, v2, and v2* = v̄2.
(A real matrix cannot have an unpaired complex eigenvalue, so an example mixing a real mode with an
oscillatory mode needs at least three states.)
• Mode 1 (λ1 = −1, v1): This is a non-oscillatory decaying mode. If the initial state x(0) is proportional
to v1, the state x(t) decays exponentially towards the origin along the direction of v1.
• Mode 2 (λ2,3 = −0.1 ± j2, v2, v2*): This is a damped oscillatory mode. The real part (−0.1) indicates slow
decay, and the imaginary part (±2) indicates oscillations with angular frequency 2 rad/s. The complex
eigenvector v2 describes the relative amplitude and phase of the state components during these oscillations.
• General Response: A general initial condition x(0) will excite both modes (unless x(0) happens to be
aligned with a single eigenvector direction). The response x(t) will be the superposition
x(t) = c1 e^{−t} v1 + c2 e^{(−0.1+j2)t} v2 + c2* e^{(−0.1−j2)t} v2* = c1 e^{−t} v1 + 2 Re{ c2 e^{(−0.1+j2)t} v2 }
The response will exhibit damped oscillations (from mode 2) superimposed on a faster decay along the v1
direction (from mode 1). As t → ∞, the faster decaying mode 1 becomes negligible, and the behavior is
dominated by the slower decaying oscillatory mode 2, eventually converging to the origin.
By analyzing the eigenvalues (poles) and eigenvectors (mode shapes), we can predict and understand the
qualitative characteristics of the system’s response—such as stability, speed of response (decay rates), presence
and frequency of oscillations, and the dominant patterns of behavior in the state space—without needing to
simulate the system for every possible initial condition.
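One way to realize a system with these eigenvalues is to build A in real modal form (an assumption made purely for illustration); the sketch below confirms the spectrum and the qualitative behaviour described above.

import numpy as np
from scipy.linalg import expm

# Real modal form: one real mode (-1) and one complex pair (-0.1 +/- 2j)
A = np.array([[-1.0,  0.0,  0.0],
              [ 0.0, -0.1,  2.0],
              [ 0.0, -2.0, -0.1]])
print(np.linalg.eigvals(A))            # -1, -0.1+2j, -0.1-2j

x0 = np.array([1.0, 1.0, 0.0])
t = np.linspace(0.0, 30.0, 601)
X = np.array([expm(A * ti) @ x0 for ti in t])

# The first state decays quickly (mode 1); the remaining states show slowly
# decaying oscillations at about 2 rad/s (mode 2).
print(abs(X[-1, 0]) < 1e-9)                 # fast mode has vanished
print(np.max(np.abs(X[200:, 1:])) > 1e-2)   # oscillatory mode still visible at t >= 10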
Chapter 11
Controllability and Observability (Introduction)
Chapters 9 and 10 focused on the internal dynamics and stability of linear systems, primarily analyzing the
homogeneous system ẋ = Ax. Now, we turn our attention to how the inputs u(t) affect the state x(t) and how
the state x(t) affects the output y(t). These concepts are formalized by controllability and observability.
• Controllability relates to the ability of the input u(t) to influence or steer the state vector x(t) to any
desired value within a finite time.
• Observability relates to the ability to determine the initial state vector x(t0 ) by observing the output
y(t) over a finite time interval.
These are fundamental structural properties of a system, independent of specific input signals or initial states.
They determine whether it is possible, in principle, to control the system completely or to deduce its internal
state from external measurements. These concepts are crucial for control design (e.g., pole placement requires
controllability) and state estimation (e.g., designing observers requires observability).
Reachability:
• Definition: A state xf is said to be reachable at time tf > t0 if, starting from the zero initial state
(x(t0 ) = 0), there exists an input signal u(t) defined over [t0 , tf ] that transfers the state to x(tf ) = xf .
• Reachable Subspace: The set of all reachable states at time tf forms a subspace of the state space Rn ,
called the reachable subspace R(tf ).
• Complete Reachability: The system is said to be completely reachable (or simply reachable) if the
reachable subspace is the entire state space (R(tf ) = Rn ) for some finite tf > t0 . (For LTI systems, if it’s
reachable for some tf , it’s reachable for all tf > t0 ).
• Interpretation: Can we reach any target state starting from the origin using some control input?
Controllability:
• Definition: A state x0 is said to be controllable at time t0 if there exists a finite time tf > t0 and an
input signal u(t) defined over [t0 , tf ] that transfers the state from x(t0 ) = x0 to the zero state (x(tf ) = 0).
• Controllable Subspace: The set of all controllable states forms a subspace C(t0 ).
• Complete Controllability: The system is said to be completely controllable (or simply controllable)
if the controllable subspace is the entire state space (C(t0 ) = Rn ). This means any initial state can be
driven to the origin in finite time.
• Interpretation: Can we drive any initial state back to the origin using some control input?
Relationship between Reachability and Controllability (LTI): For continuous-time LTI systems, reach-
ability and controllability are equivalent concepts. A state is reachable if and only if it is controllable. Therefore,
the terms are often used interchangeably, and we typically refer to the property as controllability.
Constructibility:
• Definition: A state x0 at time t0 is said to be unconstructible if, assuming zero input (u(t) = 0 for
t ≤ t0 ), the output is identically zero (y(t) = 0 for t ≤ t0 ) when the initial state is x(t0 ) = x0 . If x0 = 0
is the only initial state that produces zero output for t ≤ t0 , then the system is constructible.
• Unconstructible Subspace: The set of all unconstructible initial states forms a subspace U(t0 ).
• Complete Constructibility: The system is said to be completely constructible (or simply con-
structible) if the unconstructible subspace contains only the zero vector (U(t0 ) = {0}). This means any
non-zero initial state must produce a non-zero output at some point in the past (assuming zero input).
• Interpretation: If we observe the output generated by the system evolving freely (zero input) up to time
t0 , can we uniquely determine the state x(t0 )? If the system is constructible, only x(t0 ) = 0 could have
produced y(t) = 0 for all t ≤ t0 .
Observability:
• Definition: A state x0 at time t0 is said to be unobservable if, assuming zero input (u(t) = 0 for
t ≥ t0 ), the output is identically zero (y(t) = 0 for all t ≥ t0 ) when the initial state is x(t0 ) = x0 .
• Unobservable Subspace: The set of all unobservable initial states forms a subspace N (t0 ).
• Complete Observability: The system is said to be completely observable (or simply observable) if
the unobservable subspace contains only the zero vector (N (t0 ) = {0}). This means any non-zero initial
state must produce a non-zero output at some point in the future (assuming zero input).
• Interpretation: If we observe the output generated by the system evolving freely (zero input) from time
t0 onwards, can we uniquely determine the initial state x(t0 )? If the system is observable, only x(t0 ) = 0
could produce y(t) = 0 for all t ≥ t0 .
Relationship between Constructibility and Observability (LTI): For continuous-time LTI systems,
constructibility and observability are equivalent concepts. Therefore, the terms are often used interchangeably,
and we typically refer to the property as observability.
These definitions lay the groundwork for analyzing whether a system’s internal state can be fully influenced by
inputs and fully inferred from outputs.
11.2 Physical Interpretation
Suppose the state can be partitioned as x = [x1^T, x2^T]^T so that the dynamics take the block upper-triangular
form ẋ1 = A1 x1 + A12 x2 + B1 u, ẋ2 = A2 x2, with the input entering only the first block (B = [B1^T, 0]^T).
Then the second subsystem x2 evolves independently according to ẋ2 = A2 x2 regardless of the input u. The
states in x2 are uncontrollable. We cannot use u to steer x2 to a desired value.
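A minimal simulation sketch of this situation (hypothetical block matrices chosen for illustration): the x2 component of the forced trajectory coincides with the input-free trajectory, confirming that u has no influence on x2.

import numpy as np

# Block upper-triangular A with the input entering only the first block
A1, A12, A2 = np.array([[-1.0]]), np.array([[0.5]]), np.array([[-2.0]])
A = np.block([[A1, A12],
              [np.zeros((1, 1)), A2]])
B = np.array([[1.0], [0.0]])           # B = [B1; 0]: u cannot reach x2

dt, T = 1e-3, 5.0
x_forced = np.array([[1.0], [1.0]])
x_free = x_forced.copy()
for k in range(int(T / dt)):
    u = np.array([[np.sin(0.01 * k)]])              # some arbitrary bounded input
    x_forced = x_forced + dt * (A @ x_forced + B @ u)
    x_free = x_free + dt * (A @ x_free)             # zero input

print(np.isclose(x_forced[1, 0], x_free[1, 0]))     # True: x2 is unaffected by u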