
SciCom LecNotes

Long Nguyen Chi


August 2023

1 MATLAB
- An algorithm is unstable if rounding errors can lead to large errors in the results.
- a = first:increment:last : creates the vector a starting at first and ending at last with step length increment. If increment is not specified, its default value is 1.
- A = [1 2 3; 4 5 6; 7 8 9] : rows are separated by ';'.

• Take a row: A(3,:)

• Take a column: A(:,2)


• Take a sub-matrix: A(i:j,k:l)
• Reorder columns: A(:,[3 1 2]) puts columns 1, 2, 3 in the order 3, 1, 2
• Merge two matrices by columns (side by side): C = [A B]

• Merge two matrices by rows (stacked): C = [A; B]

• Delete a column: A(:,2) = []

- Length of vector: length(A), size of matrix: size(A)


- Operators: +, -, *, /. A*B is matrix multiplication, while A.*B, A./B, A.^B operate element-wise (like +). Matrices can be multiplied by scalars. Adding a scalar to a matrix adds the scalar to every element.
- MATLAB functions are vectorized. That is, if the input is an array, the output is also an array.

• eye(n) or eye(a,b): identity matrix of size n×n or a×b (ones on the main diagonal, zeros elsewhere)

• diag(v) (v is a vector): matrix with v on the main diagonal and zeros elsewhere. diag(v,1) places v on the first diagonal above the main one.

- String: msg = 'something inside'. Inside a string, two single quotes '' produce a literal '.


name = [’Michel ’ ’Paul ’ ’Heath ’]
>> name = Michel Paul Heath

• S = char(X) converts numbers (ASCII code) to the corresponding characters.


• X = double(S) converts characters to numbers (ASCII code)
• S = {'Hello', 'Yes', 'No', 'Goodbye'}: a cell array of strings.

- Clear all variables: clear. clear var1 clears only var1. ~= means not equal.
- if statements are similar to Python's; Python's elif corresponds to elseif. The block must end with end. Ending a statement with ; suppresses its output.
- Loops: for i = [1,2,4], for i = 1:4, for i = n:-1:1
x = []; x = [x,k]: appends k at the end of x
- disp(x): displays x in the command window.
- Function definition: function [out1,out2,...] = functionName(inp1,inp2,...) ... end
- plot, plot3 (3-D), surf (use [X,Y] = meshgrid(x,y); then X and Y are length(y)-by-length(x) matrices built from copies of x and y, respectively.)
Ex: x = 0:2:6;
y = 0:1:6;
[X,Y] = meshgrid(x,y);
F = X.^2 + Y.^2;
surf(X,Y,F);
- Left division: 10\5 = 5/10 = 0.5

2 Linear Equations
2.1 System of Linear Equations
     
A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}, \quad x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}
Then, the system of linear equations can be written as:

Ax = b
where A is the coefficient matrix, m is the number of equations, n is the number of variables, x is the variable vector, and b is the right-hand side vector:

a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n = b_1
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n = b_2
\vdots
a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n = b_m

• m = n : square system (often with a unique solution)

• m < n : underdetermined system (usually infinitely many solutions)

• m > n : overdetermined system (usually has no solution)

- To solve:
• Ax = b, A ∈ R^{n×n}; x, b ∈ R^n:
x = A^{-1} b   (x = inv(A)*b)

• AX = B, A ∈ R^{n×m}; X ∈ R^{m×k}; B ∈ R^{n×k}:
X = A\B

• XA = B, X ∈ R^{n×m}; A ∈ R^{m×k}; B ∈ R^{n×k}:
X = B/A

- D=det(A), T=trace(A), R=rank(A)


- Kronecker-Capelli theorem: The system of linear equations Ax = b has a solution if and only if:

rank(A) = rank([A b])

- Cases when solving Ax = b:

• The system of equations has a unique solution if det(A) ≠ 0.
• When det(A) = 0 the system can have infinitely many solutions or no solution (we can apply the Kronecker-Capelli theorem to determine which).
• When det(A) ≠ 0, the inverse matrix of A exists and A is called a non-singular matrix.
• When det(A) = 0, the inverse matrix A^{-1} does not exist and A is called a singular (degenerate) matrix.

2.2 Permutation matrices and triangular matrices
- Permutation matrix: a permutation matrix is obtained from the identity matrix I by permuting its rows. Its properties:
• For a permutation matrix: P^{-1} = P^T
• Multiplication PX permutes the rows of matrix X
• Multiplication XP permutes the columns of matrix X

A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}

- Short form in MATLAB: p = [2, 4, 3, 1]
- Upper triangular matrix: a matrix X ∈ R^{n×n} with x_{ij} = 0 for all i > j:

X = \begin{pmatrix} x_{1,1} & x_{1,2} & x_{1,3} & \dots & x_{1,n} \\ 0 & x_{2,2} & x_{2,3} & \dots & x_{2,n} \\ 0 & 0 & x_{3,3} & \dots & x_{3,n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & x_{n,n} \end{pmatrix}
- Unit triangular matrix: a triangular matrix whose diagonal elements are x_{ii} = 1 for all i = 1, ..., n.
- The determinant of an upper triangular matrix is non-zero ⇔ all the elements on the main diagonal are ̸= 0.
- Lower triangular matrix: similar
- Subunit triangular matrix: lower triangular matrix with ones on the diagonal and zeros above the diagonal.
- Solve Ux = b, U is an upper triangular matrix:
x = zeros(n,1);
for k = n:-1:1
    x(k) = b(k)/U(k,k);
    i = (1:k-1)';
    b(i) = b(i) - x(k)*U(i,k);
end
This back-substitution code works from the last equation up to the first: once x(k) is known, its contribution x(k)*U(i,k) is subtracted from the right-hand side of all equations above it.

2.3 LU decomposition
• Let Pk be the permutation matrices at steps k = 1, . . . , n − 1.
• Let M_k be the subunit triangular matrices obtained by inserting the elimination multipliers used in the k-th step below the diagonal position of the k-th column of the identity matrix.
• Let U be the final upper triangular matrix obtained at the end of the forward elimination phase.
So: U = M_{n-1} P_{n-1} \cdots M_1 P_1 A. Define

M'_{n-1} = M_{n-1}
M'_{n-2} = P_{n-1} M_{n-2} P_{n-1}^{-1}
M'_{n-3} = P_{n-1} P_{n-2} M_{n-3} P_{n-2}^{-1} P_{n-1}^{-1}
\vdots
M'_k = P_{n-1} \cdots P_{k+1} M_k P_{k+1}^{-1} \cdots P_{n-1}^{-1}

Then M_{n-1} P_{n-1} \cdots M_1 P_1 = (M'_{n-1} \cdots M'_1)(P_{n-1} \cdots P_1). Thus,

M_{n-1} P_{n-1} \cdots M_1 P_1 A = U
(M'_{n-1} \cdots M'_1)(P_{n-1} \cdots P_1) A = U
P A = L U

where P = P_{n-1} \cdots P_1 and L = (M'_{n-1} \cdots M'_1)^{-1}.
- Solving the system Ax = b by LU decomposition:
LU decomposition: PA = LU
+ Solve Ly = Pb to find y.
+ Solve Ux = y to find x.
- Note: if a poor pivot element is chosen, rounding errors in floating-point arithmetic can make the computed solution differ greatly from the exact one.
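A minimal MATLAB sketch of these two triangular solves using the built-in lu factorization (the matrix A and right-hand side b are illustrative):

A = [2 1 1; 4 3 3; 8 7 9];   % example matrix
b = [1; 2; 3];               % example right-hand side
[L, U, P] = lu(A);           % LU decomposition with partial pivoting: P*A = L*U
y = L \ (P*b);               % forward substitution: solve L*y = P*b
x = U \ y;                   % back substitution:    solve U*x = y
% x agrees with A\b up to rounding error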

2.4 Effect of Rounding Error


- Problem: find x such that Ax = b.
- Measures of the difference:
+ Error: e = x − x*
+ Residual: r = b − Ax*
where x* is the solution computed by the algorithm, including rounding error.
+ The computed solution can deviate noticeably when det(A) is very small (e.g. on the order of 10^{-6}).
+ Near-singular matrix: a matrix whose determinant is very close to 0.

2.5 Ill-conditioning and the matrix condition number


- Vector norm:
+ Definition: the function v : R^n → R is said to be a vector norm over R^n if and only if:

• v(x) ≥ 0 ∀x ∈ Rn , and v(x) = 0 if and only if x = 0.


• v(αx) = |α|v(x) ∀x ∈ Rn and all α ∈ R.
• v(x + y) ≤ v(x) + v(y) ∀x, y ∈ Rn . This is known as the triangle inequality.

The function v(x) is typically denoted by ||x||.


+ Types:

• ||x||_2 = \sqrt{\sum_{i=1}^n x_i^2}
• ||x||_1 = \sum_{i=1}^n |x_i|
• ||x||_\infty = \max_{1 \le i \le n} |x_i|
• ||x||_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}

+ Code: norm(x,p). norm(x) returns the 2-norm ||x||_2.
- Matrix norm:
+ Definition: the function || · || : R^{n×n} → R is said to be a matrix norm if it satisfies the following properties:

• ||A|| ≥ 0 for all A ∈ Rn×n , and ||A|| = 0 if and only if A is the zero matrix.
• ||αA|| = |α|||A|| for all A ∈ Rn×n and all α ∈ R.
• ||A + B|| ≤ ||A|| + ||B|| for all A, B ∈ Rn×n .
• ||AB|| ≤ ||A|| · ||B|| for all A, B ∈ Rn×n .

The norm of a matrix A ∈ R^{n×n} is defined as:

||A|| = \max_{||x||=1} ||Ax|| = \max_{x \ne 0} \frac{||Ax||}{||x||}

In other words, the norm of A is the maximum amount by which A can stretch a unit vector.
The inequality ||Ax|| ≤ ||A|| · ||x|| is called the norm bound or norm inequality, and it follows directly from the definition of the norm.
+ Types: Given a vector norm || · || on R^n, we can define a matrix norm || · || on R^{n×n} as follows:
• ||A||_2 = \max_{||x||_2=1} ||Ax||_2 = \sqrt{\lambda_{\max}(A^* A)}, where A^* is the conjugate transpose of A (A^*_{ij} = \overline{A_{ji}})
• Maximum absolute row sum: ||A||_\infty = \max_{||x||_\infty=1} ||Ax||_\infty = \max_{1 \le i \le n} \sum_{j=1}^n |a_{ij}|
• Maximum absolute column sum: ||A||_1 = \max_{||x||_1=1} ||Ax||_1 = \max_{1 \le j \le n} \sum_{i=1}^n |a_{ij}|
• Frobenius norm: ||A||_F = \sqrt{\mathrm{Tr}(A^T A)} = \sqrt{\sum_{i=1}^n \sum_{j=1}^n |a_{ij}|^2}

+ Code: norm(A,p), p can be inf


- Condition number of a matrix:
+ Definition: the condition number cond(A), usually denoted κ_p(A), of a square matrix A computed for a given matrix norm p is the number:

cond(A) = ||A||_p \, ||A^{-1}||_p

where we conventionally define cond(A) = ∞ when A is singular (degenerate). Because

||A||_p \times ||A^{-1}||_p = \frac{\max_{x \ne 0} ||Ax||_p / ||x||_p}{\min_{x \ne 0} ||Ax||_p / ||x||_p},

the condition number measures the ratio of the maximum stretching to the maximum shrinking that the matrix can apply to a nonzero vector.
+ Meaning: if cond(A) is close to 1, the matrix is well-conditioned (a small change in the inputs, the independent variables, gives a small change in the answer or dependent variable); if cond(A) is very large, the matrix is ill-conditioned.
- Note:

||A^{-1}||_p = \frac{1}{\min_{x \ne 0} ||Ax||_p / ||x||_p}
+ Properties:
• For all A, cond(A) ≥ 1
• For the identity matrix I: cond(I) = 1
• For any permutation matrix P: cond(P) = 1
• For all matrices A and non-zero scalars α: cond(αA) = cond(A)
• For any diagonal matrix D = diag(d_i): cond(D) = \frac{\max_{i \in [1,n]} |d_i|}{\min_{i \in [1,n]} |d_i|}
+ Code:
• cond(A,p) to calculate κp (A) with p = 1, 2, inf.
• cond(A) or cond(A,2) calculates κ2 (A). Use svd(A). Should be used with small matrices.
• cond(A,1) calculates κ1 (A). Use the function inv(A). Requires less computation time than cond(A,2).
• cond(A,inf) calculates κ∞ (A). Requires less computation time than cond(A,1).
• condest(A) to evaluate κ1 (A). Using the function lu(A) and the Higham-Tisseur algorithm. Recommended for large
matrices.
• rcond(A) to evaluate 1/κ1 (A).
- Estimating the error when the condition number of the matrix is known:
Let x be the exact solution of Ax = b, and x* the solution of the perturbed system Ax* = b + Δb (where Δb is additive noise). Define Δx = x* − x. Then:

b + Δb = Ax* = A(x + Δx) = Ax + AΔx \;\Rightarrow\; Δx = A^{-1} Δb
||b|| \le ||A|| \, ||x||
||Δx|| \le ||A^{-1}|| \, ||Δb||
\Rightarrow ||Δx|| \, ||b|| \le ||A|| \, ||A^{-1}|| \, ||x|| \, ||Δb||
\Rightarrow \frac{||Δx||}{||x||} \le \mathrm{cond}(A) \, \frac{||Δb||}{||b||}

- If the input data is accurate to machine precision, then the relative error of the computed solution can be estimated by:

\frac{\|x^* - x\|}{\|x\|} \approx \mathrm{cond}(A) \, \epsilon_M

where cond(A) is the condition number of the matrix A and ε_M is the machine epsilon.
The computed solution loses about log10(cond(A)) decimal digits of accuracy because of the limited precision of the input data.
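A small MATLAB illustration of this loss of accuracy, using the built-in ill-conditioned Hilbert matrix (the size n = 10 is just an example):

n = 10;
A = hilb(n);                               % Hilbert matrix, a classic ill-conditioned example
x_exact = ones(n,1);                       % choose a known exact solution
b = A*x_exact;                             % build the right-hand side
x = A\b;                                   % solve numerically
relerr = norm(x - x_exact)/norm(x_exact);  % actual relative error
estimate = cond(A)*eps;                    % predicted error level cond(A)*eps_M
fprintf('cond(A) = %.2e, error = %.2e, estimate = %.2e\n', cond(A), relerr, estimate);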

3 Curve fitting
3.1 Problem
- Suppose the values of f(x) are known at N + 1 discrete points, f_i = f(x_i) for i = 0, 1, ..., N. Given the dataset (x_i, f_i)_{i=0}^{N}, approximate f(x) by a function p(x).
- 2 methods:
• Interpolation: function p(x) must go through all points in the dataset.

• Regression: given the form p(x) with parameters, we must determine the parameters to minimize a certain error
criterion (usually the least squares criterion is used).

3.2 Interpolation
- The possible interpolation with functions p(x) is:
• Polynomial function
• Rational function

• Fourier series function


- Polynomial of degree no more than K:

pK (x) = a0 + a1 x + · · · + aK xK

where a0 , a1 , . . . , aK are constants, can be 0.


- Lagrange interpolation formula:
Consider polynomial:
pM (x) = a0 + a1 x + · · · + aM xM
The system of linear equations representing the interpolation conditions for the polynomial p_M(x) to interpolate the dataset (x_i, f_i)_{i=0}^{N} can be written in matrix form as:

\begin{pmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^M \\ 1 & x_1 & x_1^2 & \cdots & x_1^M \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_N & x_N^2 & \cdots & x_N^M \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_M \end{pmatrix} = \begin{pmatrix} f_0 \\ f_1 \\ \vdots \\ f_N \end{pmatrix}

If M = N, the matrix on the left is called the Vandermonde matrix (VN ).


\det(V_N) = \prod_{i > j} (x_i - x_j)

so the system has a (unique) solution when x_0, x_1, \ldots, x_N are pairwise distinct.


- Theorem (Uniqueness of the Interpolating Polynomial): if the data nodes x_0, x_1, ..., x_N are pairwise distinct, then there exists a unique interpolation polynomial p_N(x) of degree at most N that interpolates the dataset {(x_i, f_i) | i = 0, 1, ..., N}.
- We can use the Lagrange interpolation formula to compute the interpolating polynomial directly, without solving the above system of linear equations.

- The Lagrange basis polynomials are defined as:

V_i(x) = \prod_{\substack{j=0 \\ j \ne i}}^{N} \frac{x - x_j}{x_i - x_j}

where i = 0, 1, ..., N.
- The Lagrangian form of the interpolation polynomial is given by:

pN (x) = f0 V0 (x) + f1 V1 (x) + · · · + fN VN (x)

where Vi (x) : i = 0, 1, . . . , N are polynomials of degree N satisfying the condition:


V_i(x_j) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases}

- The family of polynomials Vi (x) : i = 0, 1, . . . , N is called the family of basis polynomials.
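A minimal MATLAB sketch of evaluating the Lagrange form p_N(x) at a query point xq (the data vectors x, f and the point xq are illustrative; polyfit/polyval could be used instead):

x = [0 1 2 4];  f = [1 3 2 5];   % example data nodes and values
xq = 1.5;                        % example evaluation point
N = numel(x) - 1;
p = 0;
for i = 1:N+1
    Vi = 1;                      % build the i-th Lagrange basis polynomial at xq
    for j = [1:i-1, i+1:N+1]
        Vi = Vi * (xq - x(j)) / (x(i) - x(j));
    end
    p = p + f(i) * Vi;           % accumulate f_i * V_i(xq)
end
disp(p)                          % value of the interpolating polynomial at xq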


- Interpolation error: the error of the interpolating polynomial g(x) is determined by the formula:

e(x) = \frac{M}{(N+1)!} \prod_{i=0}^{N} (x - x_i)

where M = \max_{[a,b]} f^{(N+1)}(x).


The magnitude of e(x) depends on:
• The magnitude of the data points.
• Size of the interpolation domain D = xn − x0 .
• Degree of interpolation polynomial
For the Lagrange interpolating polynomial, the error is given by:

e(x) = f(x) - g(x) = L(x) \, f^{(N+1)}(\xi)

where L(x) = \frac{1}{(N+1)!} \prod_{i=0}^{N} (x - x_i) and \xi \in [a, b].

- Spline interpolation: In this case we consider interpolation by a set of low-order polynomials instead of a single
higher-order polynomial:
• Interpolation by linear spline
• Interpolation by cubic spline
For the dataset
{(xi , fi ) : i = 0, 1, . . . , N }
and:
a = x_0 < x_1 < \cdots < x_N = b, \qquad h \equiv \max_i |x_i - x_{i-1}|

- Linear spline: S_{1,N}(x) is a continuous function that interpolates the given data and is built from the linear interpolants through consecutive pairs of data points:

S_{1,N}(x) = \begin{cases}
\dfrac{f_1 - f_0}{x_1 - x_0}(x - x_1) + f_1, & \text{if } x \in [x_0, x_1] \\
\quad \vdots \\
\dfrac{f_i - f_{i-1}}{x_i - x_{i-1}}(x - x_i) + f_i, & \text{if } x \in [x_{i-1}, x_i] \\
\quad \vdots \\
\dfrac{f_N - f_{N-1}}{x_N - x_{N-1}}(x - x_N) + f_N, & \text{if } x \in [x_{N-1}, x_N]
\end{cases}
It is easy to see that

L_i(x) = \frac{f_i - f_{i-1}}{x_i - x_{i-1}}(x - x_i) + f_i

is the equation of the line through the two points (x_i, f_i) and (x_{i-1}, f_{i-1}). Using the interpolation error formula on each segment [x_{i-1}, x_i] we get:

\max_{z \in [x_{i-1}, x_i]} |f(z) - S_{1,N}(z)| \le \frac{|x_i - x_{i-1}|^2}{8} \max_{x \in [x_{i-1}, x_i]} |f^{(2)}(x)| \le \frac{h^2}{8} \max_{x \in [x_{i-1}, x_i]} |f^{(2)}(x)|

where h ≡ \max_i |x_i - x_{i-1}|.


- Third-order (cubic) splines: instead of line segments, third-order splines use degree-3 polynomials on each segment:

S_{3,N}(x) = \begin{cases}
p_1(x) = a_{1,0} + a_{1,1} x + a_{1,2} x^2 + a_{1,3} x^3, & \text{if } x \in [x_0, x_1] \\
\quad \vdots \\
p_i(x) = a_{i,0} + a_{i,1} x + a_{i,2} x^2 + a_{i,3} x^3, & \text{if } x \in [x_{i-1}, x_i] \\
\quad \vdots \\
p_N(x) = a_{N,0} + a_{N,1} x + a_{N,2} x^2 + a_{N,3} x^3, & \text{if } x \in [x_{N-1}, x_N]
\end{cases}

- S3,N (x) is a continuous function with continuous first and second derivatives on the segment [x0 , xN ]
- Condition (Find polynomial based on this):
• Smoothness Condition at the point xi : p′i (xi ) = p′i+1 (xi ), p′′i (xi ) = p′′i+1 (xi ), i = 1, 2, . . . , N − 1

• Guaranteed Data Interpolation Condition: pi (xi ) = pi+1 (xi ) = fi , i = 0, 1, . . . , N − 1


• Natural Boundary Condition: p′′1 (x0 ) = 0, p′′N (xN ) = 0
• Second Derivative Condition: p′′1 (x0 ) = f ′′ (x0 ), p′′N (xN ) = f ′′ (xN )
• Not-a-Knot Condition: p_1'''(x_1) = p_2'''(x_1), \; p_{N-1}'''(x_{N-1}) = p_N'''(x_{N-1})

+ p=spline(x,y): returns the segmented polynomial of the third order spline for the dataset (x,y).
+ v=ppval(p,x): Calculates the value of the interpolated polynomial at input x (p is determined by the spline
function).
+ yy=spline(x,y,xx): Defines the 3rd order interpolated spline for the dataset (x,y) and returns yy as the vector of
the values of the spline interpolated at points in the vector xx.
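A short MATLAB sketch of these spline functions (the dataset and query points are illustrative):

x = 0:1:5;  y = sin(x);          % example dataset
xx = 0:0.1:5;                    % points where the spline is evaluated
p  = spline(x, y);               % piecewise cubic spline in ppform
yy = ppval(p, xx);               % evaluate the spline at xx
% equivalently: yy = spline(x, y, xx);
plot(x, y, 'o', xx, yy, '-')     % data points and spline curve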

3.3 Regression
- Regression problem: learn y = f(x) from given training data D = {(x_1, y_1), ..., (x_M, y_M)} such that y_i ≈ f(x_i) for all i.
- Linear model:
f (x) = w0 + w1 x1 + · · · + wn xn
w0 , . . . , wn : regression coefficients/weights, w0 : bias.
Note: Learning a linear function = learning w = (w0 , . . . , wn )T .
- Expected loss of f:
E = E_x\left[ (f(x) - y)^2 \right]
- Empirical loss:

\mathrm{RSS}(f) = \sum_{i=1}^{M} (y_i - f(x_i))^2 = \sum_{i=1}^{M} (y_i - w_0 - \cdots - w_n x_{in})^2

Note: \frac{1}{M} \mathrm{RSS}(f) is an approximation to E_x[r(x)].
- Generalization error: \frac{1}{M} \mathrm{RSS}(f) - E_x[r(x)]
M
- Some methods:
+ Ordinary least squares (OLS): given D, find f* that minimizes RSS:

f^* = \arg\min_{f \in H} \mathrm{RSS}(f)
\Leftrightarrow w^* = \arg\min_{w} \sum_{i=1}^{M} (y_i - w_0 - \cdots - w_n x_{in})^2

- Linear regression: Linear regression is finding a line that fits the data points in the least squares sense. Given a set
of N pairs of data points (xi , fi ) : i = 1, . . . , N , find the coefficient m and the free constant b of the line y(x) = mx + b
such that this line fits the data according to the least squares criterion.
L(m, b) = \sum_{i=1}^{N} (f_i - (m x_i + b))^2

This is the formula for the least squares criterion, where L(m, b) is the sum of the squared differences between the actual
fi values and the predicted values mxi + b. The goal is to find the values of m and b that minimize L(m, b).
To find this minimum, we solve the system of equations for the stationary point:
\frac{\partial L}{\partial m} = \sum_{i=1}^{N} 2(f_i - (m x_i + b))(-x_i) = 0

\frac{\partial L}{\partial b} = \sum_{i=1}^{N} 2(f_i - (m x_i + b))(-1) = 0
These equations represent the partial derivatives of the least squares criterion with respect to m and b, respectively. The
goal is to find the values of m and b that satisfy these equations, which correspond to the minimum of the least squares
criterion.
Using the equations from the previous step, we can derive the following expressions:
\left( \sum_{i=1}^{N} x_i^2 \right) m + \left( \sum_{i=1}^{N} x_i \right) b = \sum_{i=1}^{N} x_i f_i

\left( \sum_{i=1}^{N} x_i \right) m + N b = \sum_{i=1}^{N} f_i

In matrix form, this can be written as:

\begin{pmatrix} \sum_{i=1}^{N} x_i^2 & \sum_{i=1}^{N} x_i \\ \sum_{i=1}^{N} x_i & N \end{pmatrix} \begin{pmatrix} m \\ b \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{N} x_i f_i \\ \sum_{i=1}^{N} f_i \end{pmatrix}

Solving this system of equations will give us the coefficients m and b that minimize the least squares criterion.
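A minimal MATLAB sketch of this least-squares line fit (the data are illustrative; polyfit(x,f,1) gives the same coefficients):

x = (1:5)';  f = [1.1 1.9 3.2 3.9 5.1]';   % example data
N = numel(x);
M = [sum(x.^2) sum(x); sum(x) N];          % normal-equations matrix
rhs = [sum(x.*f); sum(f)];                 % right-hand side
mb = M \ rhs;                              % solve for [m; b]
m = mb(1);  b = mb(2);
% check: p = polyfit(x, f, 1) returns p(1) = m, p(2) = b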
- High order curve fitting: We want to build the curve fitting
pM (x) = a0 + a1 x + a2 x2 + · · · + aM xM
that matches the dataset (xi , fi ) : i = 1, . . . , N according to the least squares criterion:
σM ≡ (pM (x1 ) − f1 )2 + (pM (x2 ) − f2 )2 + · · · + (pM (xN ) − fN )2
To find the coefficients a0 , a1 , . . . , aM that minimize σM , we need to solve the system of equations:
\frac{\partial \sigma_M}{\partial a_0} = 0; \quad \frac{\partial \sigma_M}{\partial a_1} = 0; \quad \cdots; \quad \frac{\partial \sigma_M}{\partial a_M} = 0
These equations represent the partial derivatives of the least squares criterion with respect to each of the coefficients
a0 , a1 , . . . , aM . The goal is to find the values of a0 , a1 , . . . , aM that satisfy these equations, which correspond to the
minimum of the least squares criterion.
\begin{pmatrix}
N & \sum_{i=1}^{N} x_i & \cdots & \sum_{i=1}^{N} x_i^M \\
\sum_{i=1}^{N} x_i & \sum_{i=1}^{N} x_i^2 & \cdots & \sum_{i=1}^{N} x_i^{M+1} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_{i=1}^{N} x_i^M & \sum_{i=1}^{N} x_i^{M+1} & \cdots & \sum_{i=1}^{N} x_i^{2M}
\end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_M \end{pmatrix}
=
\begin{pmatrix} \sum_{i=1}^{N} f_i \\ \sum_{i=1}^{N} f_i x_i \\ \vdots \\ \sum_{i=1}^{N} f_i x_i^M \end{pmatrix}

A special feature of this matrix is that it is symmetric positive definite, which means Gaussian elimination can be applied without row exchanges to solve the system of normal equations.
- General curve fitting: suppose we want to fit the n data points {(x_i, f_i) : i = 1, ..., n} by a linear combination of m linearly independent functions φ_j(x), j = 1, 2, ..., m. That is, the function f(x) we look for has the form:
f(x) = \sum_{j=1}^{m} c_j \varphi_j(x)

For example, the following family of functions can be used as φj (x):

φj (x) = xj , j = 1, 2, . . . , m

φj (x) = sin(jx), j = 1, 2, . . . , m
φj (x) = cos(jx), j = 1, 2, . . . , m
These functions are linearly independent and can be used to construct a basis for the space of functions that can be matched
to the data points. The coefficients cj can be found by solving the system of equations that results from matching the
data points to the linear combination of the basis functions.
  2
n
X n
X m
X
E(c1 , c2 , . . . , cm ) = (f (xk ) − fk )2 =  cj φj (xk ) − fk 
k=1 k=1 j=1

To find the coefficients c1 , c2 , . . . , cm , we need to solve the system of equations:


\frac{\partial E}{\partial c_j} = 0, \quad j = 1, 2, \ldots, m
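In practice this least-squares system can be set up and solved through a design matrix; a minimal MATLAB sketch, assuming the basis φ_j(x) = x^j from the first example above (the data are illustrative):

x = linspace(0, 2, 20)';           % example data sites
f = exp(x) + 0.05*randn(size(x));  % example noisy data
m = 3;                             % number of basis functions
Phi = zeros(numel(x), m);          % design matrix with Phi(k,j) = phi_j(x_k) = x_k^j
for j = 1:m
    Phi(:, j) = x.^j;
end
c = Phi \ f;                       % least-squares coefficients c_j
f_fit = Phi*c;                     % fitted values at the data sites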

4 Nonlinear equations
4.1 Bisection Method

Algorithm 1 Bisection method for finding a root of f (x)


Require: f (x) is continuous on [a, c], f (a)f (c) < 0, ϵ > 0
Ensure: Approximate solution x∗ with |f (x∗ )| < ϵ
1: b ← (a + c)/2 ▷ Initial midpoint of interval
2: while |c − a| > ϵ do ▷ Until interval is sufficiently small
3: if f (b) = 0 then
4: return b ▷ Exact solution found
5: else if f (a)f (b) < 0 then
6: c←b ▷ New interval is [a,b]
7: else
8: a←b ▷ New interval is [b,c]
9: end if
10: b ← (a + c)/2 ▷ Compute new midpoint
11: end while
12: return b ▷ Approximate solution

- Strength: Works even with non-analytic functions. (analytic functions: a function that is locally given by a convergent
power series)
- Weaknesses:

• Need to determine the range of solutions and find only one solution.
• When the function f has singularities (singularity is a point at which a given mathematical object is not defined),
the bisection method can treat them as solutions.

4.2 Chord method

Algorithm 2 Regula falsi method for finding a root of f (x)


Require: f (x) is continuous on [a, c], f (a)f (c) < 0, ϵ > 0
Ensure: Approximate solution x∗ with |f (x∗ )| < ϵ
1: b ← (a·f(c) − c·f(a)) / (f(c) − f(a))   ▷ Initial estimate using the secant line
2: while |c − a| > ϵ do ▷ Until interval is sufficiently small
3: if f (b) = 0 then
4: return b ▷ Exact solution found
5: else if f (a)f (b) < 0 then
6: c←b ▷ New interval is [a,b]
7: else
8: a←b ▷ New interval is [b,c]
9: end if
10: b ← (a·f(c) − c·f(a)) / (f(c) − f(a))   ▷ Compute new estimate using the secant line
11: end while
12: return b ▷ Approximate solution

- Strength: like bisection, we do not need the analytic form of the equation f
- Weaknesses:
• Need to know an interval containing the solution

• One-sided convergence can be slow, especially on long intervals
• Can be improved by combining it with the interval-halving (bisection) approach

4.3 Newton’s method


Given a function f(x) that is (n+1) times continuously differentiable, we can approximate the nonlinear equation f(x) = 0 by a linear equation in x using the Taylor series expansion of f(x) around the point a:

f(b) = f(a) + f'(a)(b - a) + \frac{f''(a)}{2!}(b - a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(b - a)^n + \frac{f^{(n+1)}(\xi)}{(n+1)!}(b - a)^{n+1}

for some ξ ∈ (a, b). If we neglect the terms of second and higher order and solve f(a) + f'(a)(b − a) = 0 for b, we get:

b = a - \frac{f(a)}{f'(a)}
which can be iteratively solved using a fixed-point iteration scheme. The convergence of the method depends on the
behavior of the higher-order terms, which can be difficult to estimate in general.

Algorithm 3 Newton's method (as a fixed-point iteration)


Require: f (x) is continuous and differentiable on [a, b], f ′ (x) ̸= 0 for all x ∈ [a, b], x0 ∈ [a, b], ϵ > 0
Ensure: Approximate solution x∗ with |f (x∗ )| < ϵ
1: k ← 0
2: while |f (xk )| > ϵ do
3: x_{k+1} ← x_k − f(x_k)/f'(x_k)
4: k ←k+1
5: end while
6: return xk

- Advantages:
• For a sufficiently smooth function, and starting from a point near the solution, the convergence rate of the method is quadratic (r = 2).

• No need to bracket the solution in an interval; only the initial point x_0 is required.
- Disadvantages:
• Need to calculate the first derivative f'(x_k), which can be computationally expensive. We can approximate it with the central difference f'(x_k) ≈ (f(x_k + h) − f(x_k − h)) / (2h), but choosing the value of h (e.g. h = 0.001) can be tricky and affects the accuracy of the approximation.
• The iterative procedure may not always converge or may converge to a wrong solution if the initial guess is too far
from the actual solution, or if the function has multiple roots or singular points.
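A minimal MATLAB sketch of this iteration for the example equation f(x) = x^2 − 2 = 0 (the function, derivative, starting point and tolerance are illustrative):

f  = @(x) x.^2 - 2;        % example equation f(x) = 0
df = @(x) 2*x;             % its derivative
x  = 1;                    % initial guess x0
tol = 1e-10;               % tolerance on |f(x)|
while abs(f(x)) > tol
    x = x - f(x)/df(x);    % Newton step
end
disp(x)                    % approximates sqrt(2)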

4.4 Secant method

Algorithm 4 Secant method


Require: f (x) is continuous and differentiable, x0 and x1 are initial guesses, ϵ > 0
Ensure: Approximate solution x∗ with |f (x∗ )| < ϵ
1: k ← 1
2: while |f (xk )| > ϵ do
3: s_k ← (f(x_k) − f(x_{k−1})) / (x_k − x_{k−1})   ▷ Secant slope
4: x_{k+1} ← x_k − f(x_k)/s_k
5: k ←k+1
6: end while
7: return xk

- Advantages:
• No need to bracket the solution in an interval; only two initial points x_0 and x_1 are required.
• No need to calculate the first derivative f ′ (xk ).
- Disadvantages:
• Two initialization points are required, which may be more difficult to obtain than a single initial point.
• The convergence rate of the method is only moderately better than linear (superlinear), 1 < r < 2; specifically r = (1 + √5)/2 ≈ 1.618, the golden ratio.
• The method may converge slowly or not at all if the initial guesses are far from the actual solution or if the function
has singular points or multiple roots.

4.5 Iterative method


- Fixed point:
Instead of writing the equation as f (x) = 0, we rewrite it as a problem:
Find x satisfying x = g(x).
The point x∗ is called a fixed point of the function g(x) if x∗ = g(x∗ ), i.e. the point x∗ is not transformed by the
mapping g.
We are given an iterative procedure x_{k+1} = g(x_k), where k = 1, 2, .... This procedure is called fixed-point iteration, and we are given a starting point x_1.
To solve this problem, we can use the following approach:

Algorithm 5 Fixed-point iteration


Require: g(x) is continuous, x1 is an initial guess, ϵ > 0
Ensure: An approximate fixed point x∗ such that |x∗ − g(x∗ )| < ϵ
1: k ← 1
2: while |xk − g(xk )| > ϵ do
3: xk+1 ← g(xk )
4: k ←k+1
5: end while
6: return xk

- Advantages: no need to know an interval containing the solution.
- Disadvantages: Doesn’t always converge.
- Note: if the contraction factor k of g satisfies k > 1, the error grows with each iteration; if k < 1, the error decreases and the iteration converges.
Functions for finding roots in MATLAB:
X = roots(C): finds the roots of a polynomial with coefficient vector C
X = fzero(F,X0): finds a root of the nonlinear equation F(x) = 0 near the starting point X0
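For example (a minimal sketch; the function handle and starting point are illustrative):

F  = @(x) cos(x) - x;     % example nonlinear equation cos(x) = x
x0 = 0.5;                 % starting point for the search
xs = fzero(F, x0);        % root near x0
C  = [1 0 -2];            % coefficients of x^2 - 2
r  = roots(C);            % both roots, +/- sqrt(2)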

5 Approximation of Derivative and Integral


5.1 APPROXIMATION OF DERIVATIVE
Consider the Taylor expansion of the function f at the neighborhood of x:

f(x + h) = f(x) + f'(x) h + \frac{f''(\xi) h^2}{2}, \quad \xi \in [x, x + h]

We have:

f'(x) = \frac{f(x + h) - f(x)}{h} - \frac{f''(\xi) h}{2}

Treating \frac{f''(\xi) h}{2} as the truncation error, we obtain:

f'(x) \approx \frac{f(x + h) - f(x)}{h}
This is the Forward Difference (FD) formula to approximate the derivative.
- Backward Difference Method (BD):
+ Formulate the method: Similar to the forward difference method, in Taylor expansion we use x − h instead of x + h,
we have:
f'(x) \approx \frac{f(x) - f(x - h)}{h}

Exercise: use the BD formula to approximate f'(\pi/3), knowing that f(x) = \sin(x).

f'\!\left(\frac{\pi}{3}\right) \approx \frac{f(\pi/3) - f(\pi/3 - h)}{h}

Using h = 0.1, we have:

f'\!\left(\frac{\pi}{3}\right) \approx \frac{\sin(\pi/3) - \sin(\pi/3 - 0.1)}{0.1} \approx 0.542

(the exact value is \cos(\pi/3) = 0.5).
- Central Difference Method (CD):
Consider the Taylor expansions of the function f in a neighborhood of x:

f(x + h) = f(x) + f'(x) h + \frac{f''(x) h^2}{2!} + \frac{f'''(\xi^+) h^3}{3!}, \quad \xi^+ \in [x, x + h]
f(x - h) = f(x) - f'(x) h + \frac{f''(x) h^2}{2!} - \frac{f'''(\xi^-) h^3}{3!}, \quad \xi^- \in [x - h, x]

We have:

f'(x) = \frac{f(x + h) - f(x - h)}{2h} - \frac{f'''(\xi) h^2}{6}, \quad \xi \in [x - h, x + h]

This is the Central Difference (CD) method:

f'(x) \approx \frac{f(x + h) - f(x - h)}{2h}

where ξ belongs to (x − h, x + h).
+ Consider the Taylor expansions of the function f in a neighborhood of x:

f(x + h) = f(x) + f'(x) h + \frac{f''(x) h^2}{2!} + \frac{f'''(x) h^3}{3!} + \frac{f^{(4)}(x) h^4}{4!} + \frac{f^{(5)}(\xi^+) h^5}{5!}
f(x - h) = f(x) - f'(x) h + \frac{f''(x) h^2}{2!} - \frac{f'''(x) h^3}{3!} + \frac{f^{(4)}(x) h^4}{4!} - \frac{f^{(5)}(\xi^-) h^5}{5!}

Adding these, we obtain the approximate formula for the second-order derivative:

f''(x) \approx \frac{f(x + h) - 2f(x) + f(x - h)}{h^2} - \frac{f^{(4)}(\xi) h^2}{12}

The truncation error is O(h^2); the total error is smallest when h ≈ \epsilon^{1/4}. Thus:

f''(x) \approx \frac{f(x + h) - 2f(x) + f(x - h)}{h^2} + O(h^2)

where ξ belongs to (x − h, x + h).
- Approximation of partial derivatives:
Similarly, we can formulate approximate formulas for partial derivatives, for example central differences for the partial derivatives of a function f(x, y):

\frac{\partial f}{\partial x}(x, y) \approx \frac{f(x + h, y) - f(x - h, y)}{2h}
\frac{\partial f}{\partial y}(x, y) \approx \frac{f(x, y + h) - f(x, y - h)}{2h}

where h is the step size.

\frac{\partial^2 f}{\partial x^2}(x, y) \approx \frac{f(x + h, y) - 2f(x, y) + f(x - h, y)}{h^2}
\frac{\partial^2 f}{\partial y^2}(x, y) \approx \frac{f(x, y + h) - 2f(x, y) + f(x, y - h)}{h^2}
\frac{\partial^2 f}{\partial x \partial y}(x, y) \approx \frac{f(x + h, y + h) - f(x + h, y - h) - f(x - h, y + h) + f(x - h, y - h)}{4h^2}
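A minimal MATLAB check of the forward, backward and central difference formulas on the earlier example f(x) = sin(x) at x = π/3 (the step size is illustrative):

f = @(x) sin(x);
x = pi/3;  h = 0.1;
fd = (f(x+h) - f(x)) / h;          % forward difference, error O(h)
bd = (f(x) - f(x-h)) / h;          % backward difference, error O(h)
cd = (f(x+h) - f(x-h)) / (2*h);    % central difference, error O(h^2)
exact = cos(x);                    % exact derivative
[fd bd cd exact]                   % compare the three approximations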

5.2 APPROXIMATION OF INTEGRAL


Suppose the function f is defined on [a, b] and ∆ is the division of the interval [a, b] into n closed sub-intervals Ik =
[xk−1 , xk ], k = 1, . . . , n, where a = x0 < x1 < . . . < xn−1 < xn = b. Choose n points ck : k = 1, . . . , n, each of which
belongs to a sub-interval, that is: ck belongs to Ik for all k. The sum:
\sum_{k=1}^{n} f(c_k) \Delta x_k = f(c_1) \Delta x_1 + f(c_2) \Delta x_2 + \ldots + f(c_n) \Delta x_n

where ∆xk = xk − xk−1 and ck ∈ Ik is called the Riemann sum of the function f (x) corresponding to the division ∆
and the selection points ck : k = 1, . . . , n.
- Trapezoidal rule: Extension
Divide [a, b] into n equal subintervals using n + 1 points: x_0 = a, x_1 = a + h, ..., x_{n-1} = a + (n-1)h, x_n = a + nh, where h = (b - a)/n. Applying the trapezoidal formula on each subinterval we have:

\int_{x_{i-1}}^{x_i} f(x)\,dx \approx \frac{x_i - x_{i-1}}{2} \left( f(x_{i-1}) + f(x_i) \right)

\int_a^b f(x)\,dx \approx \frac{h}{2} \left( f(a) + 2 \sum_{i=1}^{n-1} f(x_i) + f(b) \right)

This is called the expanded trapezoid formula.


- Simpson 1/3 formula
Substituting n = 2 into the Newton-Cotes formula and integrating, we get:

\int_a^b f(x)\,dx \approx \frac{b - a}{6} \left( f(a) + 4 f\!\left(\frac{a + b}{2}\right) + f(b) \right)

This is called Simpson’s formula 1/3.


- Simpson 1/3 formula: Extension

Like the extended trapezoidal rule, we divide the interval [a, b] into several subintervals and apply Simpson 1/3 on each pair of subintervals, which gives the extended Simpson 1/3 rule:

\int_a^b f(x)\,dx \approx \frac{h}{3} \left( f(x_0) + 4 \sum_{i=0}^{n/2 - 1} f(x_{2i+1}) + 2 \sum_{i=1}^{n/2 - 1} f(x_{2i}) + f(x_n) \right)

Note: We need an even number of subintervals, or an odd number of points.


Error: O(h4 )
- Simpson 3/8 formula
Substituting n = 3 into the Newton-Cotes formula and integrating, we get:

\int_a^b f(x)\,dx \approx \frac{3h}{8} \left( f(a) + 3 f(a + h) + 3 f(a + 2h) + f(b) \right)

This is called Simpson formula 3/8.


- Simpson 3/8 extended formula: when 3 | n:

\int_a^b f(x)\,dx \approx \frac{3h}{8} \left( f(x_0) + 3 \sum_{\substack{i=1 \\ 3 \nmid i}}^{n-1} f(x_i) + 2 \sum_{i=1}^{n/3 - 1} f(x_{3i}) + f(x_n) \right)
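A minimal MATLAB sketch of the extended trapezoidal and Simpson 1/3 rules on an illustrative integral (the integrand e^x on [0, 1] with n = 10 subintervals):

f = @(x) exp(x);                        % example integrand
a = 0;  b = 1;  n = 10;                 % n must be even for Simpson 1/3
h = (b - a)/n;
x = a:h:b;  y = f(x);
T = h/2 * (y(1) + 2*sum(y(2:end-1)) + y(end));                          % extended trapezoid
S = h/3 * (y(1) + 4*sum(y(2:2:end-1)) + 2*sum(y(3:2:end-2)) + y(end));  % extended Simpson 1/3
exact = exp(1) - 1;
[T - exact, S - exact]                  % errors: O(h^2) versus O(h^4)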

6 Differential Equations
6.1 Forward Euler (FE)
Consider the differential equation y' = f(y, t). The Forward Euler method is obtained by using the forward difference formula to approximate y':

y'(t_n) \approx \frac{y(t_{n+1}) - y(t_n)}{h}

We can rewrite it in iterative form as:

y_{n+1} = y_n + h f(y_n, t_n)

where h is the step size, y_n is the approximation of y(t_n), f(y_n, t_n) is the value of f at (y_n, t_n), and t_{n+1} = t_n + h.
- Disadvantages:

• Large rounding error.


• Instability occurs when the time constant of the equation is negative, unless the time step h is small enough.
Consider a system of differential equations:

y' = f(y, z, t), \quad y(0) = y_0
z' = g(y, z, t), \quad z(0) = z_0

Forward Euler for a system of differential equations:

y_{n+1} = y_n + h f(y_n, z_n, t_n)
z_{n+1} = z_n + h g(y_n, z_n, t_n)

where h is the step size, yn and zn are the approximations of y(tn ) and z(tn ), respectively, and f (yn , zn , tn ) and g(yn , zn , tn )
are the values of f and g at (yn , zn , tn ).
To solve a higher-order differential equation, we can decompose it into a system of first-order differential equations.
For example, consider the second-order differential equation:

y''(t) - 0.05\, y'(t) + 0.15\, y(t) = 0
y'(0) = 0
y(0) = 1

Let y' = z; then the problem can be rewritten as:

y' = z, \quad y(0) = 1
z' = 0.05\, z - 0.15\, y, \quad z(0) = 0

The equation becomes a system of first-order differential equations.
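A minimal MATLAB sketch of Forward Euler applied to this system (the step size and final time are illustrative):

h = 0.01;  T = 10;                 % step size and final time (illustrative)
t = 0:h:T;
y = zeros(size(t));  z = zeros(size(t));
y(1) = 1;  z(1) = 0;               % initial conditions y(0) = 1, y'(0) = 0
for n = 1:numel(t)-1
    y(n+1) = y(n) + h*z(n);                       % y' = z
    z(n+1) = z(n) + h*(0.05*z(n) - 0.15*y(n));    % z' = 0.05 z - 0.15 y
end
plot(t, y)                          % approximate solution y(t)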
- Modified Euler method:

y_{n+1} = y_n + \frac{h}{2} \left[ f(y_{n+1}, t_{n+1}) + f(y_n, t_n) \right]

6.2 Backward Euler (BE)


yn+1 = yn + hf (yn+1 , tn+1 )
This method is stable, so it is used to solve stiff problems (which are difficult to solve by other methods).

6.3 Runge-Kutta method (RK)


The disadvantage of the Euler methods is that the order of accuracy is small. High accuracy requires h to be very small. In
the Runge-Kutta method, the order of accuracy is increased by using the intermediate points in each iteration. Consider
the differential equation:
y ′ = f (y, t), y(0) = y0
To calculate yn+1 at tn+1 = tn + h with yn known, we integrate the above equation in the interval [tn , tn+1 ] as follows:
y_{n+1} = y_n + \int_{t_n}^{t_{n+1}} f(y(t), t)\,dt

The Runge-Kutta method was developed by applying integral methods to integrate the right-hand side.
The second-order Runge-Kutta formula:
k_1 = h f(y_n, t_n)
k_2 = h f(y_n + k_1, t_{n+1})
y_{n+1} = y_n + \frac{1}{2}(k_1 + k_2)
where h is the step size, yn is the approximation of y(tn ), and f (yn , tn ) is the value of f at (yn , tn ).

- Third-order Runge-Kutta formula:

k_1 = h f(y_n, t_n)
k_2 = h f\!\left(y_n + \tfrac{1}{2} k_1,\; t_n + \tfrac{1}{2} h\right)
k_3 = h f\!\left(y_n + \theta k_1 + (1 - \theta) k_2,\; t_n + h\right)
y_{n+1} = y_n + \frac{1}{6}(k_1 + 4 k_2 + k_3)

where θ = −1 is optimal.
- Fourth-order Runge-Kutta formula:

k_1 = h f(y_n, t_n)
k_2 = h f\!\left(y_n + \tfrac{1}{2} k_1,\; t_n + \tfrac{1}{2} h\right)
k_3 = h f\!\left(y_n + \tfrac{1}{2} k_2,\; t_n + \tfrac{1}{2} h\right)
k_4 = h f(y_n + k_3,\; t_n + h)
y_{n+1} = y_n + \frac{1}{6}(k_1 + 2 k_2 + 2 k_3 + k_4)

which is based on Simpson 1/3.

k_1 = h f(y_n, t_n)
k_2 = h f\!\left(y_n + \tfrac{1}{3} k_1,\; t_n + \tfrac{1}{3} h\right)
k_3 = h f\!\left(y_n - \tfrac{1}{3} k_1 + k_2,\; t_n + \tfrac{2}{3} h\right)
k_4 = h f(y_n + k_1 - k_2 + k_3,\; t_n + h)
y_{n+1} = y_n + \frac{1}{8}(k_1 + 3 k_2 + 3 k_3 + k_4)

which is based on Simpson 3/8.
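A minimal MATLAB sketch of the classical fourth-order Runge-Kutta step for y' = f(y, t) (the test equation y' = −y, step size and final time are illustrative; in practice ode45 can be used):

f = @(y, t) -y;                    % example right-hand side y' = -y
h = 0.1;  T = 2;  t = 0:h:T;
y = zeros(size(t));  y(1) = 1;     % initial condition y(0) = 1
for n = 1:numel(t)-1
    k1 = h*f(y(n),        t(n));
    k2 = h*f(y(n) + k1/2, t(n) + h/2);
    k3 = h*f(y(n) + k2/2, t(n) + h/2);
    k4 = h*f(y(n) + k3,   t(n) + h);
    y(n+1) = y(n) + (k1 + 2*k2 + 2*k3 + k4)/6;
end
max(abs(y - exp(-t)))              % error versus the exact solution exp(-t)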

7 UNCONSTRAINED MINIMIZATION METHODS


7.1 Calculus concepts
Denote R^n as the set of n-dimensional real vectors:

R^n = \left\{ x = (x_1, x_2, \ldots, x_n)^T : x_i \in R,\; i = 1, 2, \ldots, n \right\}

where R is the set of real numbers.


 There,
 we define
  the operations:
u1 v1
 u2   v2 
Addition of two vectors u =  .  and v =  . :
   
 ..   .. 
un vn
 
u1 + v1
 u2 + v2 
u+v =
 
.. 
 . 
un + vn

Vector multiplication with a real number α:  


αu1
 αu2 
αu =  . 
 
 .. 
αun
Rn and the operations just defined form a linear space. Elements of Rn are sometimes referred to as points.
If we consider the concept of the scalar product of two vectors u and v in Rn :
\langle u, v \rangle = \sum_{i=1}^{n} u_i v_i

then Rn together with the scalar product becomes an n-dimensional Euclidean space.
The length of vector u in Rn is given by:
\|u\| = \sqrt{\langle u, u \rangle} = \sqrt{\sum_{i=1}^{n} u_i^2}

The distance between two points u and v in R^n is defined as:

\rho(u, v) = \|u - v\| = \sqrt{\sum_{i=1}^{n} (u_i - v_i)^2}

For u, v, w ∈ Rn , we have the triangle inequality:


∥u − v∥ ≤ ∥u − w∥ + ∥w − v∥
- Assume {u_k; k = 1, 2, ...} is a sequence of points in R^n, i.e. u_k ∈ R^n, k = 1, 2, .... The point v is called a cluster point (accumulation point) of the sequence {u_k} if we can find a subsequence {u_{k_i}} that converges to v.
- A sequence {uk } is said to be bounded if there exists a constant M ≥ 0 such that ∥uk ∥ ≤ M , for all k = 1, 2, . . ..
- The set O(x; ε) = {u ∈ R^n : ||u − x|| < ε} is the open ball centered at x with radius ε > 0, and it is called the ε-neighborhood of x.
- The point v ∈ R^n is called a limit point of the set U ⊂ R^n if every ε-neighborhood of v contains a point of U different from v.
- A point x ∈ X is said to be an interior point of set X if there exists a neighborhood ε that lies entirely in X. The
set of interior points of X is denoted by int(X).
- A point x ∈ X is said to be a boundary point of set X if among all its neighborhoods ε, there are points in X and
points not in X. The set of boundary points of X is denoted by ∂(X).
- A set X is said to be an open set if every point x ∈ X is an interior point of X.
- A set X in Rn is said to be bounded if there exists a constant L > 0 such that ∥u∥ ≤ L for all u ∈ X.
- A set X in Rn is said to be a closed set if it contains all its boundary points.
- Assume {xk } is a sequence of points in the closed set X, and limk→+∞ xk = x, then x ∈ X.
- A set X is said to be a compact set if it is both closed and bounded.
- Assume {xk } is a sequence of points in the compact set X. Then, from {xk }, we can always extract a convergent
subsequence {xki } such that limi→+∞ xki = x, and x ∈ X.
Differential of the multivariable function:
• Assume the function f is defined in a neighborhood O(x; ε) of the point x. We say that f is differentiable at x if there exists a vector f'(x) ∈ R^n such that the increment of the function at x,
Δf(x) = f(x + Δx) − f(x), \quad ||Δx|| ≤ ε,
can be written as:

Δf(x) = ⟨f'(x), Δx⟩ + o(x, Δx)

where \lim_{||Δx|| \to 0} \frac{o(x, Δx)}{||Δx||} = 0.
The function f ′ (x) is called the gradient of the function f at x and is denoted by ∇f (x).
• Assume the function f is defined in a neighborhood O(x; ε) of the point x. We say that f is twice differentiable at x if, along with the vector f'(x), there exists a symmetric matrix f''(x) ∈ R^{n×n} such that the increment of the function at x can be written as:

Δf(x) = f(x + Δx) − f(x) = ⟨f'(x), Δx⟩ + \frac{⟨f''(x)Δx, Δx⟩}{2} + o(x, Δx)

where \lim_{||Δx|| \to 0} \frac{o(x, Δx)}{||Δx||^2} = 0.
The matrix f ′′ (x) is called the second-order derivative matrix (also known as the Hessian) of the function f at x and
is sometimes denoted by ∇2 f (x).
• Assume the function f is defined on an open set X. We say that the function f is continuously differentiable over
the set X if f is differentiable at every point x of X and
∥f ′ (x + ∆x) − f ′ (x)∥ → 0 as ∥∆x∥ → 0, ∀x, x + ∆x ∈ X.
The set of functions satisfying this property is denoted by C^1(X).

• Assume the function f is defined on an open set X. We say that the function f is twice continuously differentiable
on set X if f is twice differentiable at every point x of X and

∥f ′′ (x + ∆x) − f ′′ (x)∥ → 0 as ∥∆x∥ → 0, ∀x, x + ∆x ∈ X.

The set of functions satisfying this property is denoted by C 2 (X).


Taylor’s Formula: Assume f (x) is twice continuously differentiable in a neighborhood ε of x0 . Then, for any x in ε,
we have
f(x) = f(x_0) + ⟨f'(x_0), x − x_0⟩ + \frac{1}{2} ⟨f''(x_0)(x − x_0), x − x_0⟩ + α(x, x_0)\|x − x_0\|^2,

where \lim_{x \to x_0} α(x, x_0) = 0. The error can be written as o(\|x − x_0\|^2).
Taylor’s Formula (Vector-Valued): Assume f (x) is a vector-valued function that is twice continuously differentiable
in a neighborhood ε of x0 . Then, for any x in ε, we have
f(x) = f(x_0) + J(x_0)(x − x_0) + \frac{1}{2}(x − x_0)^T H(x_0)(x − x_0) + α(x, x_0)\|x − x_0\|^2,

where J(x_0) is the Jacobian matrix of f evaluated at x_0, H(x_0) is the Hessian matrix of f evaluated at x_0, and \lim_{x \to x_0} α(x, x_0) = 0. The error can be written as o(\|x − x_0\|^2).
Example: Let f(x) = (f_1(x), f_2(x), f_3(x))^T be a vector-valued function of x = (x_1, x_2, x_3)^T. Suppose f(x) is twice continuously differentiable in a neighborhood ε of x_0 = (x_{01}, x_{02}, x_{03})^T. Then, for any x in ε, we have Taylor's formula:

f(x) = f(x_0) + J(x_0)(x − x_0) + \frac{1}{2}(x − x_0)^T H(x_0)(x − x_0) + α(x, x_0)\|x − x_0\|^2,

where

J(x_0) = \begin{pmatrix}
\frac{\partial f_1}{\partial x_1}(x_0) & \frac{\partial f_1}{\partial x_2}(x_0) & \frac{\partial f_1}{\partial x_3}(x_0) \\
\frac{\partial f_2}{\partial x_1}(x_0) & \frac{\partial f_2}{\partial x_2}(x_0) & \frac{\partial f_2}{\partial x_3}(x_0) \\
\frac{\partial f_3}{\partial x_1}(x_0) & \frac{\partial f_3}{\partial x_2}(x_0) & \frac{\partial f_3}{\partial x_3}(x_0)
\end{pmatrix}

is the Jacobian matrix of f evaluated at x_0, H(x_0) is the Hessian matrix of f evaluated at x_0, and \lim_{x \to x_0} α(x, x_0) = 0. The error can be written as o(\|x − x_0\|^2).
H(x_0) = \begin{pmatrix}
\frac{\partial^2 f_1}{\partial x_1^2}(x_0) & \frac{\partial^2 f_1}{\partial x_1 \partial x_2}(x_0) & \frac{\partial^2 f_1}{\partial x_1 \partial x_3}(x_0) \\
\frac{\partial^2 f_2}{\partial x_1^2}(x_0) & \frac{\partial^2 f_2}{\partial x_1 \partial x_2}(x_0) & \frac{\partial^2 f_2}{\partial x_1 \partial x_3}(x_0) \\
\frac{\partial^2 f_3}{\partial x_1^2}(x_0) & \frac{\partial^2 f_3}{\partial x_1 \partial x_2}(x_0) & \frac{\partial^2 f_3}{\partial x_1 \partial x_3}(x_0)
\end{pmatrix}
   
Example: Let f(x) = (\sin(x_1), e^{x_2}, \ln(x_3))^T be a vector-valued function of x = (x_1, x_2, x_3)^T. Suppose f(x) is twice continuously differentiable in a neighborhood ε of x_0 = (x_{01}, x_{02}, x_{03})^T = (0, 1, 2)^T. Then, for any x in ε, we have Taylor's formula:

f(x) = f(x_0) + J(x_0)(x − x_0) + \frac{1}{2}(x − x_0)^T H(x_0)(x − x_0) + α(x, x_0)\|x − x_0\|^2,

where

J(x_0) = \begin{pmatrix} \cos(x_{01}) & 0 & 0 \\ 0 & e^{x_{02}} & 0 \\ 0 & 0 & \frac{1}{x_{03}} \end{pmatrix}

is the Jacobian matrix of f evaluated at x_0, and

H(x_0) = \begin{pmatrix} -\sin(x_{01}) & 0 & 0 \\ 0 & e^{x_{02}} & 0 \\ 0 & 0 & -\frac{1}{x_{03}^2} \end{pmatrix}

is the Hessian matrix of f evaluated at x_0. The error term α(x, x_0) satisfies \lim_{x \to x_0} α(x, x_0) = 0, and the error can be written as o(\|x − x_0\|^2).
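A small MATLAB sketch that checks J(x_0) for this example numerically with central differences (the step size is illustrative):

f  = @(x) [sin(x(1)); exp(x(2)); log(x(3))];   % the example vector-valued function
x0 = [0; 1; 2];
h  = 1e-5;                                     % finite-difference step (illustrative)
n  = numel(x0);
J  = zeros(n, n);
for j = 1:n
    e = zeros(n,1);  e(j) = h;
    J(:, j) = (f(x0 + e) - f(x0 - e)) / (2*h); % central difference, column by column
end
disp(J)   % close to diag(cos(0), exp(1), 1/2)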
- The finite-increment formula: suppose the function f is continuously differentiable on the open set S, and x is a vector in S. For any vector y satisfying x + y ∈ S, we have

f(x + y) - f(x) = \int_0^1 \langle f'(x + t y), y \rangle \, dt.

If f is twice continuously differentiable, then there exists a number α ∈ [0, 1] such that

f(x + y) - f(x) = \langle f'(x), y \rangle + \frac{1}{2} \langle f''(x + \alpha y) y, y \rangle.
Extrema of Multivariable Functions:
- Optimization Problem: Minimize f (x) over X, where X ⊆ Rn and f is a function defined on X.
• The point x* ∈ X is called the global minimum point of f on X if f(x*) ≤ f(x) for all x ∈ X. The value f(x*) is the minimum value of f on X, denoted \min_{x \in X} f(x). The point x* ∈ X is called a local minimum point of f on X if there exists a neighborhood O(x*; ε), ε > 0, such that f(x*) ≤ f(x) for all x ∈ O(x*; ε) ∩ X.
• Suppose the function f is bounded below on X. The value f* is the infimum (greatest lower bound) of f on X if:
1. f* ≤ f(x) for all x ∈ X.
2. For every value ε > 0, there exists u_ε ∈ X such that f(u_ε) < f* + ε.
We denote the infimum of f over X as \inf_{x \in X} f(x) = f*.
• Obviously, if the function f has a global minimum on X, then:

\inf_{x \in X} f(x) = \min_{x \in X} f(x)

7.2 Unconstrained nonlinear programming


- Theorem 1 (Necessary condition for optimality) The necessary condition for x0 to be a local minimum is:

\frac{\partial f}{\partial x}(x_0) = 0 \qquad (2)
Condition (2) is called the stationary condition, and the point x0 satisfying (2) is called a stationary point. Therefore,
solving the problem (1) can be reduced to solving equation (2).
- Theorem 2 (Sufficient condition for optimality) Suppose f is twice continuously differentiable. The stationary point
x0 is a local minimum if the matrix f ′′ (x0 ) is positive definite.
- Sylvester's criterion: to determine whether a matrix is positive definite, Sylvester's criterion can be used. A symmetric matrix A = (a_{ij})_{n×n} is positive semidefinite if and only if all of its principal minors are non-negative (and positive definite if and only if all of its leading principal minors are positive).
Formula: for 1 ≤ i_1 < i_2 < ... < i_k ≤ n, k = 1, 2, ..., n, the principal minor D_{i_1, i_2, ..., i_k} satisfies:

D_{i_1, i_2, \ldots, i_k} = \det \begin{pmatrix}
a_{i_1 i_1} & a_{i_1 i_2} & \ldots & a_{i_1 i_k} \\
a_{i_2 i_1} & a_{i_2 i_2} & \ldots & a_{i_2 i_k} \\
\vdots & \vdots & \ddots & \vdots \\
a_{i_k i_1} & a_{i_k i_2} & \ldots & a_{i_k i_k}
\end{pmatrix} \ge 0

7.3 Single-variable minimization methods


- Unimodal function: a unimodal function is a function with only one maximum or minimum point on the specified interval.
If x∗ is the single minimum point of f (x) in the range a ≤ x ≤ b, then f (x) is unimodal on the interval if and only if
for any two points x1 and x2 :
• If x∗ ≤ x1 ≤ x2 , then f (x∗ ) ≤ f (x1 ) ≤ f (x2 ).
• If x∗ ≥ x1 ≥ x2 , then f (x∗ ) ≤ f (x1 ) ≤ f (x2 ).
- Searching algorithm:

1. Calculate x_1 = a + (b − a)/2 − e and x_2 = a + (b − a)/2 + e, where e is the precision.
2. Calculate f1 = f (x1 ) and f2 = f (x2 ).
3. If f1 < f2 , then set b = x2 (remove segment x > x2 ).

• If f1 > f2 , then set a = x1 (remove segment x < x1 ).


• If f1 = f2 , then set a = x1 and b = x2 (remove segment x < x1 and x > x2 ).
4. If |b − a| < 2e, then terminate; otherwise, go back to step 1.
- Fibonacci method: The Fibonacci sequence is defined recursively as follows:

F_0 = 1, \quad F_1 = 1, \quad F_k = F_{k-1} + F_{k-2}, \; k \ge 2.

The number of iterations N is fixed before the computation; the trial points at iteration k are given by:

x_1^{(k)} = \frac{F_{N-1-k}}{F_{N+1-k}} (b_k - a_k) + a_k, \quad k = 0, 1, \ldots, N-1

x_2^{(k)} = \frac{F_{N-k}}{F_{N+1-k}} (b_k - a_k) + a_k, \quad k = 0, 1, \ldots, N-1
where ak and bk are the lower and upper bounds of the interval at iteration k.
- Golden section method:
x_1^{(k)} = 0.382\,(b_k - a_k) + a_k, \quad k = 0, 1, 2, \ldots

x_2^{(k)} = 0.618\,(b_k - a_k) + a_k, \quad k = 0, 1, 2, \ldots
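A minimal MATLAB sketch of the golden-section method for a unimodal example function (the function, interval and tolerance are illustrative):

f = @(x) (x - 2).^2 + 1;      % example unimodal function, minimum at x = 2
a = 0;  b = 5;  tol = 1e-6;   % search interval and tolerance
while (b - a) > tol
    x1 = a + 0.382*(b - a);   % interior trial points
    x2 = a + 0.618*(b - a);
    if f(x1) < f(x2)
        b = x2;               % minimum lies in [a, x2]
    else
        a = x1;               % minimum lies in [x1, b]
    end
end
xmin = (a + b)/2              % approximate minimizer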

7.4 Unconstrained Optimization Methods


Consider the unconstrained nonlinear programming problem:

min f (x), x ∈ Rn

where f(x) is continuously differentiable. To solve the problem, if a solution exists, it could be found among the solutions of the equations:

\frac{\partial f}{\partial x}(x) = 0

However, solving these equations in the general case is still quite complex. This leads us to use efficient iterative methods to solve the problem.
A common approach is to use methods that iterate from an initial value x_0 and move "toward" the optimal value x*. At each iteration, we calculate:

xk+1 = xk + αk pk , k = 1, 2, . . .

where:

• pk is the displacement vector from the point xk .


• αk is the length of the move in the direction pk .
Obviously, the procedure is deterministic when we determine the direction pk of the move and the way to calculate the
length of the move αk . Depending on the different constructions of pk and αk , we have iterative procedures with different
properties. We are particularly interested in the following two properties:
• The value variation of the objective function f along the sequence {xk }. We want the objective function to consis-
tently decrease or converge to a minimum value as we iterate through the procedure. This property helps us assess
the progress and effectiveness of the iterative method.

• The convergence of the sequence {xk } to the solution x∗ . Additionally, it’s important to note that defining pk and
αk differently also affects the amount of computation required.
- Gradient methods: We select the direction pk such that:

⟨∇f (xk ), pk ⟩ < 0

This condition ensures that when we choose αk small enough, we have:

f (xk+1 ) = f (xk + αk pk ) = f (xk ) + αk ⟨∇f (xk ), pk ⟩ + o(αk ) < f (xk )

In other words, by moving in the direction of pk with a sufficiently small length, we will reach the point xk+1 with a
smaller objective function value. Therefore, the direction pk satisfying condition is called the descent direction of the
objective function f (x). One of the vectors satisfying the inequality can be chosen as the gradient vector of the function
f at xk :
pk = −∇f (xk ), αk > 0, k = 0, 1, 2, . . .
Then, we have the iterative procedure:

xk+1 = xk − αk ∇f (xk ), αk > 0, k = 0, 1, 2, . . .

Iterative procedures that follow the formula are called gradient methods.
Since the search direction is fixed, the gradient methods differ due to the choice of αk . We list some of the basic choices
below:
+ Procedure: Minimization of a Function with One Variable
We solve the problem of minimizing a function with one variable:

min tk (λ) : λ ≥ 0, where tk (λ) = f (xk − λ∇f (xk )).

The optimal solution of this problem is taken as the value of αk .


+ Procedure: Selection of αk
To select αk , we perform the following process:

1. Set α = α0 > 0.
2. Set u = xk − α∇f (xk ) and calculate f (u).
3. Check if f (u) − f (xk ) ≤ −ϵα∥∇f (xk )∥2 , where 0 < ϵ < 1.
4. If the inequality is satisfied, then set α_k = α. Otherwise, set α = α/2 and go back to step 2.
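A minimal MATLAB sketch of this gradient method with the step-halving rule, on an illustrative quadratic objective (the function, gradient, starting point and constants are assumptions):

f     = @(x) (x(1) - 1)^2 + 10*(x(2) + 2)^2;          % example objective
gradf = @(x) [2*(x(1) - 1); 20*(x(2) + 2)];           % its gradient
x = [0; 0];                                           % starting point x0
eps_c = 0.5;  alpha0 = 1;                             % constants 0 < eps_c < 1 and initial alpha
for k = 1:100
    g = gradf(x);
    if norm(g) < 1e-8, break; end
    alpha = alpha0;
    while f(x - alpha*g) - f(x) > -eps_c*alpha*norm(g)^2
        alpha = alpha/2;                              % halve alpha until the decrease test holds
    end
    x = x - alpha*g;                                  % gradient step
end
disp(x)   % close to the minimizer (1, -2)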

- Newton method:
In the case that the function f is twice continuously differentiable and the calculation of f ′ (x) and f ′′ (x) is not difficult,
we can use the quadratic term of the Taylor series expansion:
f_k(x) \approx f(x_k) + \langle f'(x_k), x - x_k \rangle + \frac{1}{2} \langle H(x_k)(x - x_k), x - x_k \rangle \qquad (9)
This equation represents a quadratic approximation of the function f in the neighborhood of the point x_k. When x_k is very close to x*, the gradient f'(x_k) is very close to 0, and the quadratic term in equation (9) provides more precise information about the variation of f in the neighborhood of x_k.
We determine the approximation vector uk from the condition:

fk (uk ) = min fk (x) (10)

and build the next approximation solution:

xk+1 = xk + αk (uk − xk ) (11)

Depending on the choice of αk , different methods can be obtained. For αk = 1 for every k, we have Newton’s method.
From equation (11), we have xk+1 = uk when choosing αk = 1. The condition in equation (10) becomes:

fk (xk+1 ) = min fk (x), k = 1, 2, . . . (12)

From equation (12), we infer that x_{k+1} is a stationary point of the function f_k(x), i.e., f_k'(x) = f'(x_k) + H(x_k)(x − x_k) = 0. Therefore, if H(x_k) is non-singular, we have the following Newton formula:

xk+1 = xk − [H(xk )]−1 f ′ (xk ), k = 1, 2, . . .
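A minimal MATLAB sketch of this Newton iteration on the same illustrative quadratic objective used above (the gradient, Hessian and starting point are assumptions):

gradf = @(x) [2*(x(1) - 1); 20*(x(2) + 2)];   % gradient of the example objective
H     = @(x) [2 0; 0 20];                     % its (constant) Hessian
x = [0; 0];                                   % starting point
for k = 1:20
    g = gradf(x);
    if norm(g) < 1e-10, break; end
    x = x - H(x) \ g;                         % Newton step: solve H(x) p = g, then x = x - p
end
disp(x)   % reaches the minimizer (1, -2) in one step for a quadratic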

8 LINEAR PROGRAMMING
8.1 Simplex method
8.1.1 The canonical and standard form of the linear programming problems
- The general form of the linear programming problem: the general linear programming problem is the optimization problem in which we have to find the maximum (minimum) of a linear objective function subject to the condition that the variables satisfy a number of linear equations and inequalities. The mathematical model of the problem can be stated as follows:

Minimize (Maximize): f(x) = \sum_{j=1}^{n} c_j x_j
subject to: \sum_{j=1}^{n} a_{ij} x_j = b_i, \quad i = 1, 2, \ldots, p
\sum_{j=1}^{n} a_{ij} x_j \ge b_i, \quad i = p + 1, \ldots, m
x_j \ge 0, \quad j = 1, 2, \ldots, q
x_j \ \text{unrestricted in sign}, \quad j = q + 1, \ldots, n

The canonical form of the linear programming problem:

Minimize f (x) = cT x
subject to Ax = b
x≥0

The standard form of the linear programming problem:

Minimize f (x) = cT x
subject to Ax ≥ b
x≥0

where A is a matrix and x is a vector.
To pass from the standard form to the canonical form, each inequality constraint \sum_{j=1}^{n} a_{ij} x_j \ge b_i is replaced by the equality constraint \sum_{j=1}^{n} a_{ij} x_j - y_i = b_i, and a nonnegativity constraint y_i \ge 0 is added for each surplus (slack) variable y_i. This transformation ensures that if (x_1, x_2, \ldots, x_n, y_i) is a solution of the transformed problem, then (x_1, x_2, \ldots, x_n) is a solution of the original problem.

To replace each sign-unrestricted variable x_j by two sign-restricted variables, we introduce x_j^+ and x_j^- such that x_j = x_j^+ - x_j^- and x_j^+, x_j^- \ge 0. This transformation allows us to represent positive and negative values separately.
- Solve linear programming problem graphically:
Consider a linear programming problem with two variables in standard form:

f (x1 , x2 ) = c1 x1 + c2 x2 → min

subject to
ai1 x1 + ai2 x2 ≥ bi , i = 1, 2, . . . , m
We denote the feasible region as D = {(x_1, x_2) : a_{i1} x_1 + a_{i2} x_2 ≥ b_i, i = 1, 2, ..., m}. Each linear inequality a_{i1} x_1 + a_{i2} x_2 ≥ b_i corresponds to a half-plane whose boundary line marks the limit of what the constraint permits. Therefore, the feasible region D, determined as the intersection of m half-planes, is a convex polygonal region in the plane.
An equation of the form c1 x1 + c2 x2 = α has a normal vector (c1 , c2 ). As α changes, it determines parallel lines
that we call contour lines with the value α. Each point u = (u1 , u2 ) ∈ D lies on the contour line with the value
αu = c1 u1 + c2 u2 = f (u1 , u2 ).
- Conclusion: If the problem has an optimal solution, then it always has an optimal solution that is the corner of the
feasible region even for many dimensions. Just find the optimal solution among a finite number of feasible solutions.
- Simplex algorithm: the algorithm starts from any corner (vertex) of the feasible region and repeatedly moves to an adjacent corner with a better objective value, if one exists. When it reaches a corner with no better neighbor, it stops: this is the optimal solution. The algorithm is finite but has exponential worst-case complexity.

+ Some notations and definitions:
The linear programming problem in canonical form (LP-C) can be represented as follows:
Find \min f(x_1, x_2, \ldots, x_n) = \sum_{j=1}^{n} c_j x_j,
subject to
\sum_{j=1}^{n} a_{ij} x_j = b_i, \quad i = 1, 2, \ldots, m,
x_j \ge 0, \quad j = 1, 2, \ldots, n.
In the context of linear programming in canonical form (LP-C), we use the following notations:

• x = (x1 , x2 , . . . , xn )T : the variable vector


• c = (c1 , c2 , . . . , cn )T : the coefficient vector
• A is the constraint matrix, which has dimensions m × n. It is defined as A = (aij )m×n , where aij represents the
coefficient of variable xj in the ith constraint equation.
• b = (b1 , b2 , . . . , bm )T is the constraint vector, also known as the right-hand side vector.

The linear programming problem in canonical form (LP-C) can be rewritten in matrix form as follows:

min{f (x) = cT x : Ax = b, x ≥ 0}

Furthermore, in linear programming, we often encounter vector inequalities. A vector y = (y1 , y2 , . . . , yk ) ≥ 0 means
that each component of the vector satisfies yi ≥ 0 for i = 1, 2, . . . , k.
In linear programming in canonical form (LP-C), we introduce index sets and notations to represent the variables,
objective function coefficients, and constraint matrix. Let’s define the following symbols:

• J = {1, 2, . . . , n} is the index set representing the indices for variables.


• I = {1, 2, . . . , m} is the index set representing the indices for constraints.
• x = x(J) = {xj : j ∈ J}: the variable vector.
• c = c(J) = {cj : j ∈ J}: the objective function coefficient vector.
• A = A(I, J) = {aij : i ∈ I, j ∈ J}: the constraint matrix.
• Aj = (aij : i ∈ I): the jth column vector of the matrix A.

Using these notations, the basic constraint equations of the LP-C can also be written in the form:

A1 x1 + A2 x2 + . . . + An xn = b

A feasible solution x∗ is a solution that belongs to the constraint region D, i.e., Ax∗ = b and x∗ ≥ 0. Feasible solutions
are the potential solutions that satisfy the problem’s constraints.
An optimal solution of the problem is a feasible solution x∗ that gives the smallest value of the objective function.
In other words, for all x ∈ D, we have cT x∗ ≤ cT x. The optimal solution is the solution that minimizes the objective
function.
The value f ∗ = cT x∗ associated with the optimal solution x∗ is called the optimal value of the problem. It represents
the minimum value of the objective function obtained at the optimal solution.

8.1.2 BASIC FEASIBLE SOLUTION


In linear programming in canonical form (LP-C), we make the following assumption:
Assumption: We assume that the rank of the constraint matrix A is equal to m, denoted as rank(A) = m. This
means that the system of basic constraint equations consists of m linearly independent equations.
This assumption is crucial for the solvability of the LP-C problem. In fact, the assumption rank(A) = m is equivalent
to the assumption that the system of linear equations Ax = b has a solution.
Definition 1: A basis of matrix A is a set of m linearly independent column vectors B = {Aj1 , Aj2 , . . . , Ajm }. We
assume B = A(I, JB ), where JB = {j1 , j2 , . . . , jm } is a basis of matrix A.
Given a basis B, the vector x = (x1 , x2 , . . . , xn ) is a basic feasible solution corresponding to the basis B if the following
conditions hold:
• xj = 0 for j ∈ JN = J\JB (nonbasic variables).
• xj ≥ 0 for j ∈ JB (basic variables).

x_{j_k} denotes the k-th element of B^{-1} b.
In linear programming in canonical form (LP-C), the basic solution x corresponding to a given basis B can be deter-
mined using the following procedure:

1. Set xN = 0, where xN = x(JN ) represents the values of the nonbasic variables set to zero.

2. Determine xB from the equations BxB = b, where xB = x(JB ) represents the values of the basic variables corre-
sponding to the basis vectors B.

Assume x = (xB , xN ) is a basic solution corresponding to the basis B. Then the LP-C problem can be rewritten as
follows:

minimize: f (xB , xN ) = cTB xB + cTN xN


subject to: BxB + N xN = b
xB , xN ≥ 0,

where cB = (cj : j ∈ JB ) represents the objective function coefficient vector for the basic variables, cN = (cj : j ∈ JN )
represents the objective function coefficient vector for the nonbasic variables, and N = (Aj : j ∈ JN ) is called the non-basic
matrix of A.
Example: Consider the following linear programming problem (LP):

minimize: 6x1 + 2x2 − 5x3 + x4 + 4x5 − 3x6 + 12x7


subject to: x1 + x2 + x3 + x4 = 4
x1 + x5 = 2
x3 + x6 = 3
3x2 + x3 + x7 = 6
x1 , x2 , x3 , x4 , x5 , x6 , x7 ≥ 0.

Let

c = (6, 2, -5, 1, 4, -3, 12)^T, \quad b = (4, 2, 3, 6)^T,

and

A = \begin{pmatrix}
1 & 1 & 1 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 3 & 1 & 0 & 0 & 0 & 1
\end{pmatrix}

with columns denoted A_1, A_2, \ldots, A_7.
Consider the basis B = {A_4, A_5, A_6, A_7} = E_4, where E_4 denotes the 4×4 identity matrix.
To obtain the basic solution x = (x1 , x2 , . . . , x7 ) corresponding to basis B, we can set:

x_1 = 0, \quad x_2 = 0, \quad x_3 = 0,

and determine the values of x_B = (x_4, x_5, x_6, x_7) by solving the equations B x_B = E_4 x_B = b.


Solving the equations, we find xB = (4, 2, 3, 6).
Therefore, the basic solution corresponding to basis B is:

x = (0, 0, 0, 4, 2, 3, 6).

⇒ A basic solution is called a basic feasible solution if it is also a feasible solution.

The number of basic feasible solutions ≤ the number of bases ≤ C(n, m).
Theorem 1. If the LP has a feasible solution, then it also has a basic feasible solution.
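As a practical check, this example LP can be solved directly in MATLAB, assuming the Optimization Toolbox function linprog is available (a minimal sketch):

c = [6; 2; -5; 1; 4; -3; 12];          % objective coefficients
A = [1 1 1 1 0 0 0;
     1 0 0 0 1 0 0;
     0 0 1 0 0 1 0;
     0 3 1 0 0 0 1];
b = [4; 2; 3; 6];
lb = zeros(7,1);                       % x >= 0
% canonical form has only equality constraints, so the inequality arguments are empty
[x, fval] = linprog(c, [], [], A, b, lb, []);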

8.1.3 Formula for incremental change of the objective function
Assume x is a basic feasible solution corresponding to basis B = (Aj : j ∈ JB ). Denote:
JB = {j1 , j2 , . . . , jm } – indices of basic variables;
JN = J \ JB – indices of non-basic variables;
B = (Aj : j ∈ JB ) – basis matrix;
N = (Aj : j ∈ JN ) – non-basic matrix;
xB = x(JB ) = {xj : j ∈ JB } – basic variable vector;
xN = x(JN ) = {xj : j ∈ JN } – non-basic variable vector;
cB = c(JB ) = {cj : j ∈ JB } – objective function coefficient vectors of basic variables;
cN = c(JN ) = {cj : j ∈ JN } – objective function coefficient vectors of non-basic variables.
Consider a basic feasible solution z = x + ∆x, where ∆x = (∆x1 , ∆x2 , . . . , ∆xn ) is the incremental change vector in
variables. We can find the formula to calculate the incremental change of the objective function:

∆f = cT z − cT x = cT ∆x.
Since x and z are both feasible solutions, we have Ax = b and Az = b. Therefore, the incremental change ∆x must
satisfy the condition A∆x = 0, which means:

B∆xB + N ∆xN = 0,
where ∆xB = (∆xj : j ∈ JB ) and ∆xN = (∆xj : j ∈ JN ).
Thus, we have ∆xB = −B −1 N ∆xN
We can express the incremental change of the objective function as follows:

cT ∆x = cTB ∆xB + cTN ∆xN = −(cTB B −1 N − cTN )∆xN .


Denoting u = c_B^T B^{-1} and ∆_N = (∆_j : j ∈ J_N) = u N − c_N^T (the estimate vector), we obtain the formula:

∆f = c^T z - c^T x = -∆_N ∆x_N = -\sum_{j \in J_N} ∆_j ∆x_j.

The obtained formula is called the formula for the incremental change of the objective function.

8.1.4 Optimality criterion (Optimality test)


Definition. A basic feasible solution x is said to be non-degenerate if all of its basic variables are different from 0. The linear programming problem (LP) is said to be non-degenerate if all of its basic feasible solutions are non-degenerate.
Theorem 2. (Optimality criterion) The inequality

∆N ≤ 0 (∆j ≤ 0 ∀j ∈ JN )

is a sufficient condition, and in the non-degenerate case, it is also a necessary condition for a basic feasible solution x to
be optimal.
If among the ∆j values of a basic feasible solution x, there is a positive value ∆j0 > 0, and the corresponding elements
of the vector B −1 Aj0 ≤ 0, then the objective function of the problem is unbounded.

8.1.5 SIMPLEX METHOD

