SciCom LecNotes
1 MATLAB
- An algorithm is unstable if rounding errors can lead to large errors in the results.
- a = first:increment:last : initialize the vector a starting at the first element and ending at the last element with a
step length of increment. If increment is not specified, its default value is 1.
- A = [1 2 3; 4 5 6; 7 8 9] : rows are separated by ';'.
• diag(v) (v is a vector): diagonal matrix with diagonal = v, zeros elsewhere. diag(v,1) places v on the first superdiagonal, i.e. the diagonal shifted one position to the right.
- Clear all variables: clear. clear var1 clears only var1. ~= means not equal.
- if is similar to Python; Python's elif is elseif in MATLAB. Close the block with 'end'. A trailing ; after a statement suppresses its output.
- for i = [1,2,4], for i = 1:4, for i = n:-1:1
- x = []; x = [x,k]: appends k at the end of the vector x.
- disp(x): display x in the command window.
- Function definition: function out = functionName(inp1, inp2, ...) or function [out1, out2, ...] = functionName(inp1, inp2, ...), terminated by end.
- plot, plot3 (3D), surf. For surf, use [X,Y] = meshgrid(x,y); then X and Y are length(y)-by-length(x) matrices built from copies of x and y, respectively.
Ex: x = 0:2:6;
y = 0:1:6;
[X,Y] = meshgrid(x,y);
F = X.^2 + Y.^2;
surf(X,Y,F);
- 10\5 = 5/10 = 0.5 (backslash is left division).
2 Linear Equations
2.1 System of Linear Equations
\[
A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}, \quad
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad
b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}
\]
Then, the system of linear equations can be written as:
Ax = b
where A is the coefficient matrix (m is the number of equations and n the number of variables), x is the variable vector, and b is the right-hand side vector.
- To solve:
• Ax = b, A ∈ R^{n×n}; x, b ∈ R^n: x = A^{-1}b (x = inv(A)*b)
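- Note: in practice x = A\b (the backslash operator) solves the system directly, without forming the inverse, and is generally preferred to inv(A)*b; a minimal sketch with made-up data:
A = [4 -2 1; -2 4 -2; 1 -2 4];
b = [11; -16; 17];
x1 = inv(A)*b;          % explicit inverse, as above
x2 = A\b;               % backslash: solves Ax = b via a factorization
disp(norm(x1 - x2));    % the two solutions agree up to rounding error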
2.2 Permutation matrices and triangular matrices
- Permutation matrix: The permutation matrix is obtained from the unit matrix I by permuting its rows. Its properties:
• With a permutation matrix: P^{-1} = P^T
• Multiplication PX: permute rows of matrix X
• Multiplication XP: permute columns of matrix X
\[
A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}
\]
- Short form in Matlab: p = [2, 4, 3, 1]
- Upper triangular matrix: the matrix X ∈ R^{n×n} with x_{ij} = 0 for all i > j, i.e. all entries below the main diagonal are zero:
\[
\begin{pmatrix}
x_{1,1} & x_{1,2} & x_{1,3} & \cdots & x_{1,n} \\
0 & x_{2,2} & x_{2,3} & \cdots & x_{2,n} \\
0 & 0 & x_{3,3} & \cdots & x_{3,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & x_{n,n}
\end{pmatrix}
\]
- Unit triangular matrix: a triangular matrix whose diagonal elements are all equal to 1, i.e. x_{ii} = 1 for i = 1, . . . , n.
- The determinant of an upper triangular matrix is non-zero ⇔ all the elements on the main diagonal are ̸= 0.
- Lower triangular matrix: similar
- Subunit triangular matrix: lower triangular matrix with ones on the diagonal and zeros above the diagonal.
- Solve U x = b, U is an upper triangular matrix:
x = zeros ( n , 1 ) ;
f o r k = n : −1:1
x ( k ) = b ( k ) /U( k , k ) ;
i =(1: k − 1 ) ’ ;
b ( i ) = b ( i ) − x ( k ) ∗ U( i , k ) ;
end
This code solves from the last equation up to the first: once x(k) is known, its contribution is subtracted from the right-hand side of every equation above it, using the corresponding coefficients U(i,k).
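- A quick check of the loop above (a minimal sketch; the test matrix is made up and chosen to be well conditioned):
n = 4;
U = triu(rand(n) + n*eye(n));    % random upper triangular test matrix
b = rand(n,1);
bb = b;                          % work on a copy, since the loop overwrites b
x = zeros(n,1);
for k = n:-1:1
    x(k) = bb(k)/U(k,k);
    i = (1:k-1)';
    bb(i) = bb(i) - x(k)*U(i,k);
end
disp(norm(U*x - b));             % residual should be near machine precision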
2.3 LU analysis
• Let P_k be the permutation matrices at steps k = 1, . . . , n − 1.
• Let M_k be the subunit triangular matrices obtained by inserting the "addition" factors used in the k-th step below the diagonal position of the k-th column of the unit matrix.
• Let U be the final upper triangular matrix obtained at the end of the forward elimination phase.
So: U = M_{n−1} P_{n−1} · · · M_1 P_1 A.
\[ M'_{n-1} = M_{n-1} \]
\[ M'_{n-2} = P_{n-1} M_{n-2} P_{n-1}^{-1} \]
\[ M'_{n-3} = P_{n-1} P_{n-2} M_{n-3} P_{n-2}^{-1} P_{n-1}^{-1} \]
\[ \vdots \]
\[ M'_k = P_{n-1} \cdots P_{k+1} M_k P_{k+1}^{-1} \cdots P_{n-1}^{-1} \]
then M_{n−1} P_{n−1} · · · M_1 P_1 = M'_{n−1} · · · M'_1 × P_{n−1} · · · P_1.
Thus,
\[ M_{n-1} P_{n-1} \cdots M_1 P_1 A = U \]
\[ M'_{n-1} \cdots M'_1 \times P_{n-1} \cdots P_1 A = U \]
\[ P A = LU \]
where: P = P_{n−1} · · · P_1, L = (M'_{n−1} · · · M'_1)^{-1}.
- Solve the system of equations Ax = b by LU analysis:
+ LU analysis: P A = LU.
+ Solve Ly = P b to find y.
+ Solve U x = y to find x.
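- A short sketch of this procedure using MATLAB's built-in lu (the data are made up for illustration):
A = [2 1 1; 4 -6 0; -2 7 2];
b = [5; -2; 9];
[L, U, P] = lu(A);       % P*A = L*U with partial pivoting
y = L \ (P*b);           % forward substitution: L*y = P*b
x = U \ y;               % back substitution:    U*x = y
disp(norm(A*x - b));     % residual should be near machine precision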
- Note: if a poor pivot element is chosen, rounding errors in floating-point arithmetic can make the computed solution differ greatly from the exact one.
- Vector norm:
• ||x||_1 = Σ_{i=1}^{n} |x_i|
• ||x||_∞ = max_{1≤i≤n} |x_i|
• ||x||_p = (Σ_{i=1}^{n} |x_i|^p)^{1/p}
+ Code: norm(x,p); norm(x) returns the 2-norm ||x||_2.
- Matrix norm:
+ Definition: The function || · || : R^{n×n} → R is said to be a matrix norm if it satisfies the following properties:
• ||A|| ≥ 0 for all A ∈ Rn×n , and ||A|| = 0 if and only if A is the zero matrix.
• ||αA|| = |α|||A|| for all A ∈ Rn×n and all α ∈ R.
• ||A + B|| ≤ ||A|| + ||B|| for all A, B ∈ Rn×n .
• ||AB|| ≤ ||A|| · ||B|| for all A, B ∈ Rn×n .
In other words, the norm of A is the maximum amount by which A can stretch a unit vector.
The inequality ||Ax|| ≤ ||A|| · ||x|| is called the norm bound or norm inequality, and it follows from the definition of
the norm.
+ Types: Given a vector norm || · || on R^n, we can define an induced matrix norm || · || on R^{n×n} as follows:
• ||A||_2 = max_{||x||_2=1} ||Ax||_2 = √(λ_max(A*A)), where A* is the conjugate transpose of A (A*_{ij} = conj(A_{ji}))
• Maximum absolute row sum: ||A||_∞ = max_{||x||_∞=1} ||Ax||_∞ = max_{1≤i≤n} Σ_{j=1}^{n} |a_{ij}|
• Maximum absolute column sum: ||A||_1 = max_{||x||_1=1} ||Ax||_1 = max_{1≤j≤n} Σ_{i=1}^{n} |a_{ij}|
• Frobenius norm: ||A||_F = √(Tr(A^T A)) = √(Σ_{i=1}^{n} Σ_{j=1}^{n} |a_{ij}|^2)
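- All of these norms are available through MATLAB's norm function; a minimal sketch (the matrix is arbitrary):
A = [1 -2; 3 4];
n1   = norm(A, 1);       % maximum absolute column sum
ninf = norm(A, inf);     % maximum absolute row sum
n2   = norm(A, 2);       % spectral norm, sqrt(lambda_max(A'*A))
nfro = norm(A, 'fro');   % Frobenius norm
disp([n1 ninf n2 nfro]);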
- If the input data is represented approximately to computer accuracy, then the relative error estimate of the calculated
solution is given by the formula:
\[ \frac{\|x^* - x\|}{\|x\|} \approx \mathrm{cond}(A)\,\epsilon_M \]
where cond(A) is the condition number of the matrix A and ϵM is the machine epsilon.
The computed solution loses about log10(cond(A)) decimal digits of accuracy relative to the precision of the input data.
3 Curve fitting
3.1 Problem
- Given the values of f(x) at N + 1 discrete points, f_i = f(x_i) for i = 0, 1, . . . , N, i.e. the dataset {(x_i, f_i)}_{i=0}^{N}, approximate f(x) by a function p(x).
- 2 methods:
• Interpolation: function p(x) must go through all points in the dataset.
• Regression: given the form p(x) with parameters, we must determine the parameters to minimize a certain error
criterion (usually the least squares criterion is used).
3.2 Interpolation
- Possible choices for the interpolating function p(x) are:
• Polynomial function
• Rational function
For polynomial interpolation: p_K(x) = a_0 + a_1 x + · · · + a_K x^K.
\[
\begin{pmatrix}
1 & x_0 & x_0^2 & \cdots & x_0^M \\
1 & x_1 & x_1^2 & \cdots & x_1^M \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_N & x_N^2 & \cdots & x_N^M
\end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_M \end{pmatrix}
=
\begin{pmatrix} f_0 \\ f_1 \\ \vdots \\ f_N \end{pmatrix}
\]
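- A sketch of setting up and solving this system in MATLAB for the interpolation case M = N (the data points are made up for illustration):
x = [0; 1; 2; 3];                 % interpolation nodes x_i
f = [1; 2; 0; 5];                 % values f_i
V = fliplr(vander(x));            % V(i,j) = x_i^(j-1)
a = V \ f;                        % coefficients a_0, ..., a_N
xx = linspace(0, 3, 7)';
pxx = polyval(flipud(a), xx);     % polyval expects the highest power first
disp([xx pxx]);                   % p(x) reproduces the data at the nodes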
- The Lagrange basis polynomials are defined as:
\[ V_i(x) = \prod_{\substack{j=0 \\ j \ne i}}^{N} \frac{x - x_j}{x_i - x_j} \]
where i = 0, 1, . . . , N.
- The Lagrange form of the interpolation polynomial is given by p_N(x) = Σ_{i=0}^{N} f_i V_i(x).
- Spline interpolation: In this case we consider interpolation by a set of low-order polynomials instead of a single
higher-order polynomial:
• Interpolation by linear spline
• Interpolation by cubic spline
For the dataset
{(xi , fi ) : i = 0, 1, . . . , N }
and:
\[ a = x_0 < x_1 < \cdots < x_N = b, \qquad h \equiv \max_i |x_i - x_{i-1}| \]
- Linear spline: S_{1,N}(x) is a continuous function that interpolates the given data and is built piecewise from the linear two-point interpolation polynomials:
\[
S_{1,N}(x) =
\begin{cases}
\dfrac{f_1 - f_0}{x_1 - x_0}(x - x_1) + f_1, & \text{if } x \in [x_0, x_1] \\
\quad\vdots \\
\dfrac{f_i - f_{i-1}}{x_i - x_{i-1}}(x - x_i) + f_i, & \text{if } x \in [x_{i-1}, x_i] \\
\quad\vdots \\
\dfrac{f_N - f_{N-1}}{x_N - x_{N-1}}(x - x_N) + f_N, & \text{if } x \in [x_{N-1}, x_N]
\end{cases}
\]
It is easy to see that
\[ L_i(x) = \frac{f_i - f_{i-1}}{x_i - x_{i-1}}(x - x_i) + f_i \]
is the equation of the line through the two points (x_i, f_i) and (x_{i−1}, f_{i−1}). Using the error formula for the interpolating polynomial with x ∈ [a, b] on each segment [x_{i−1}, x_i] we get:
\[
\max_{z \in [x_{i-1}, x_i]} |f(z) - S_{1,N}(z)| \le \frac{|x_i - x_{i-1}|^2}{8} \max_{x \in [x_{i-1}, x_i]} \left|f^{(2)}(x)\right| \le \frac{h^2}{8} \max_{x \in [x_{i-1}, x_i]} \left|f^{(2)}(x)\right|
\]
- S3,N (x) is a continuous function with continuous first and second derivatives on the segment [x0 , xN ]
- Conditions used to determine the polynomials:
• Smoothness at the interior points x_i: p'_i(x_i) = p'_{i+1}(x_i), p''_i(x_i) = p''_{i+1}(x_i), i = 1, 2, . . . , N − 1
+ p = spline(x,y): returns the piecewise polynomial (pp-form) of the cubic spline for the dataset (x,y).
+ v = ppval(p,x): evaluates the interpolating spline at the input x (p is produced by the spline function).
+ yy = spline(x,y,xx): constructs the cubic interpolating spline for the dataset (x,y) and returns yy, the vector of spline values at the points in the vector xx.
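- A short usage sketch of these built-ins (the data are made up for illustration):
x  = 0:5;
y  = sin(x);
p  = spline(x, y);            % piecewise cubic spline in pp-form
v  = ppval(p, 2.5);           % evaluate the spline at a single point
xx = 0:0.1:5;
yy = spline(x, y, xx);        % evaluate directly at many points
plot(x, y, 'o', xx, yy, '-');
disp(v);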
3.3 Regression
- Regression problem: learn y = f(x) from given training data D = {(x_1, y_1), . . . , (x_M, y_M)} such that y_i ≈ f(x_i) for all i.
- Linear model:
f (x) = w0 + w1 x1 + · · · + wn xn
w0 , . . . , wn : regression coefficients/weights, w0 : bias.
Note: Learning a linear function = learning w = (w0 , . . . , wn )T .
- Expected loss of f: E = E_x[r(x)], where r(x) = (f(x) − y)^2 is the squared residual.
- Empirical loss:
\[ \mathrm{RSS}(f) = \sum_{i=1}^{M} (y_i - f(x_i))^2 = \sum_{i=1}^{M} (y_i - w_0 - \cdots - w_n x_{in})^2 \]
Note: (1/M)·RSS(f) is an approximation to E_x[r(x)].
- Generalization error: (1/M)·RSS(f) − E_x[r(x)]
- Some methods:
+ Ordinary least squares (OLS): Given D, find f* that minimizes RSS:
\[ f^* = \arg\min_{f \in H} \mathrm{RSS}(f) \quad \Leftrightarrow \quad w^* = \arg\min_{w} \sum_{i=1}^{M} (y_i - w_0 - \cdots - w_n x_{in})^2 \]
- Linear regression: Linear regression is finding a line that fits the data points in the least squares sense. Given a set
of N pairs of data points (xi , fi ) : i = 1, . . . , N , find the coefficient m and the free constant b of the line y(x) = mx + b
such that this line fits the data according to the least squares criterion.
\[ L(m, b) = \sum_{i=1}^{N} (f_i - (m x_i + b))^2 \]
This is the formula for the least squares criterion, where L(m, b) is the sum of the squared differences between the actual
fi values and the predicted values mxi + b. The goal is to find the values of m and b that minimize L(m, b).
To find this minimum, we solve the system of equations defining the stationary point:
\[ \frac{\partial L}{\partial m} = \sum_{i=1}^{N} 2(f_i - (m x_i + b))(-x_i) = 0 \]
\[ \frac{\partial L}{\partial b} = \sum_{i=1}^{N} 2(f_i - (m x_i + b))(-1) = 0 \]
These equations represent the partial derivatives of the least squares criterion with respect to m and b, respectively. The
goal is to find the values of m and b that satisfy these equations, which correspond to the minimum of the least squares
criterion.
Using the equations from the previous step, we can derive the following expressions:
\[ \left(\sum_{i=1}^{N} x_i^2\right) m + \left(\sum_{i=1}^{N} x_i\right) b = \sum_{i=1}^{N} x_i f_i \]
\[ \left(\sum_{i=1}^{N} x_i\right) m + N\, b = \sum_{i=1}^{N} f_i \]
In matrix form, this can be written as:
\[
\begin{pmatrix} \sum_{i=1}^{N} x_i^2 & \sum_{i=1}^{N} x_i \\ \sum_{i=1}^{N} x_i & N \end{pmatrix}
\begin{pmatrix} m \\ b \end{pmatrix}
=
\begin{pmatrix} \sum_{i=1}^{N} x_i f_i \\ \sum_{i=1}^{N} f_i \end{pmatrix}
\]
Solving this system of equations will give us the coefficients m and b that minimize the least squares criterion.
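- A sketch of solving this 2-by-2 system in MATLAB (the data are made up); polyfit(x, f, 1) returns the same line:
x = [1 2 3 4 5]';
f = [2.1 3.9 6.2 7.8 10.1]';
N = numel(x);
M = [sum(x.^2) sum(x); sum(x) N];
r = [sum(x.*f); sum(f)];
mb = M \ r;                    % mb(1) = slope m, mb(2) = intercept b
p  = polyfit(x, f, 1);         % built-in least-squares line, p = [m b]
disp([mb'; p]);                % the two rows should agree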
- High order curve fitting: We want to build the curve fitting
\[ p_M(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_M x^M \]
that matches the dataset (xi , fi ) : i = 1, . . . , N according to the least squares criterion:
\[ \sigma_M \equiv (p_M(x_1) - f_1)^2 + (p_M(x_2) - f_2)^2 + \cdots + (p_M(x_N) - f_N)^2 \]
To find the coefficients a0 , a1 , . . . , aM that minimize σM , we need to solve the system of equations:
\[ \frac{\partial \sigma_M}{\partial a_0} = 0; \quad \frac{\partial \sigma_M}{\partial a_1} = 0; \quad \cdots; \quad \frac{\partial \sigma_M}{\partial a_M} = 0 \]
These equations represent the partial derivatives of the least squares criterion with respect to each of the coefficients
a0 , a1 , . . . , aM . The goal is to find the values of a0 , a1 , . . . , aM that satisfy these equations, which correspond to the
minimum of the least squares criterion.
\[
\begin{pmatrix}
N & \sum_{i=1}^{N} x_i & \cdots & \sum_{i=1}^{N} x_i^M \\
\sum_{i=1}^{N} x_i & \sum_{i=1}^{N} x_i^2 & \cdots & \sum_{i=1}^{N} x_i^{M+1} \\
\vdots & \vdots & & \vdots \\
\sum_{i=1}^{N} x_i^M & \sum_{i=1}^{N} x_i^{M+1} & \cdots & \sum_{i=1}^{N} x_i^{2M}
\end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_M \end{pmatrix}
=
\begin{pmatrix} \sum_{i=1}^{N} f_i \\ \sum_{i=1}^{N} f_i x_i \\ \vdots \\ \sum_{i=1}^{N} f_i x_i^M \end{pmatrix}
\]
A special feature of this matrix is that it is symmetric positive definite, so the system of normal equations can be solved by Gaussian elimination without row interchanges (pivoting).
- General curve fitting: suppose we want to fit the n data points {(x_i, f_i) : i = 1, . . . , n} using m linearly independent functions φ_j(x), j = 1, 2, . . . , m. That is, the fitting function f(x) has the form:
\[ f(x) = \sum_{j=1}^{m} c_j \varphi_j(x) \]
φ_j(x) = x^j, j = 1, 2, . . . , m
φ_j(x) = sin(jx), j = 1, 2, . . . , m
φ_j(x) = cos(jx), j = 1, 2, . . . , m
These functions are linearly independent and can be used to construct a basis for the space of functions that can be matched
to the data points. The coefficients cj can be found by solving the system of equations that results from matching the
data points to the linear combination of the basis functions.
\[ E(c_1, c_2, \ldots, c_m) = \sum_{k=1}^{n} (f(x_k) - f_k)^2 = \sum_{k=1}^{n} \left( \sum_{j=1}^{m} c_j \varphi_j(x_k) - f_k \right)^2 \]
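- A sketch of this general least-squares fit, using MATLAB's backslash on the design matrix rather than forming the normal equations explicitly (the basis φ_j(x) = sin(jx), the data, and m are chosen here only for illustration):
xk = linspace(0.1, pi, 20)';           % data sites
fk = exp(-xk);                         % data values (made up)
m  = 3;                                % number of basis functions
Phi = zeros(numel(xk), m);
for j = 1:m
    Phi(:, j) = sin(j*xk);             % phi_j(x) = sin(j*x)
end
c = Phi \ fk;                          % least-squares coefficients c_1..c_m
disp(norm(Phi*c - fk));                % residual of the fit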
4 Nonlinear equations
4.1 Bisection Method
- Strength: Works even with non-analytic functions. (analytic functions: a function that is locally given by a convergent
power series)
- Weaknesses:
• Need to determine the range of solutions and find only one solution.
• When the function f has singularities (singularity is a point at which a given mathematical object is not defined),
the bisection method can treat them as solutions.
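- A minimal bisection sketch (the test function, interval and tolerance are chosen for illustration):
f = @(x) x.^3 - x - 2;        % has a root in [1, 2]
a = 1; b = 2; tol = 1e-8;
while (b - a)/2 > tol
    c = (a + b)/2;
    if f(a)*f(c) <= 0         % the root lies in [a, c]
        b = c;
    else                      % otherwise it lies in [c, b]
        a = c;
    end
end
disp((a + b)/2);              % approx 1.5214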
4.2 Chord method
- Strength: like bisection, it does not require the analytic form of the function f.
- Weaknesses:
• Need to know an interval containing the solution.
• One-sided convergence is slow, especially when the solution interval is large.
• Can be improved by combining it with the interval-halving (bisection) idea.
Expanding f in a Taylor series about the point a,
\[ f(b) = f(a) + f'(a)(b - a) + \frac{f''(\xi)}{2}(b - a)^2 + \cdots \]
for some ξ ∈ (a, b). If we set f(b) = 0, neglect the higher-order terms and solve for b, we get the linear approximation
\[ b = a - \frac{f(a)}{f'(a)}, \]
which is applied repeatedly as a fixed-point iteration scheme, x_{k+1} = x_k − f(x_k)/f'(x_k). The convergence of the method depends on the behavior of the neglected higher-order terms, which can be difficult to estimate in general.
- Advantages:
• For a sufficiently smooth function, and starting from a point near the solution, the convergence rate of the method is quadratic (r = 2).
• No need to know an interval containing the solution; only the initial point x0 is required.
- Disadvantages:
• Need to calculate the first derivative f'(x_k), which can be computationally expensive. We can approximate it by the formula f'(x_k) ≈ (f(x_k + h) − f(x_k − h))/(2h), but choosing the value of h (e.g. h = 0.001) can be tricky and affects the accuracy of the approximation.
• The iterative procedure may not always converge or may converge to a wrong solution if the initial guess is too far
from the actual solution, or if the function has multiple roots or singular points.
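- A minimal sketch of the Newton-type iteration described above (the function, its derivative and the starting point are chosen for illustration):
f  = @(x) x.^2 - 2;           % solve x^2 - 2 = 0
df = @(x) 2*x;                % analytic first derivative
x  = 1;                       % initial point x0
for k = 1:10
    x = x - f(x)/df(x);       % Newton update
end
disp(x);                      % converges to sqrt(2) = 1.4142...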
- Advantages:
• No need to know an interval containing the solution; only two initial points x0 and x1 are required.
• No need to calculate the first derivative f ′ (xk ).
- Disadvantages:
• Two initialization points are required, which may be more difficult to obtain than a single initial point.
• The convergence rate of the method is only superlinear, 1 < r < 2; specifically r = (1 + √5)/2 ≈ 1.618, the golden ratio.
• The method may converge slowly or not at all if the initial guesses are far from the actual solution or if the function
has singular points or multiple roots.
- Advantages: no need to know the solution interval.
- Disadvantages: does not always converge.
- Note: Power of k > 1: increase error and iteration, k < 1: decrease error and iteration.
Functions to find solutions in MATLAB:
X = roots(C): find polynomial roots.
X = fzero(F, X0): find a solution of a nonlinear equation near X0.
5 Numerical Differentiation and Integration
Consider the Taylor expansion of the function f at the point x:
\[ f(x + h) = f(x) + f'(x)h + \frac{f''(\xi)h^2}{2}, \qquad \xi \in [x, x + h] \]
We have:
\[ f'(x) = \frac{f(x+h) - f(x)}{h} - \frac{f''(\xi)h}{2} \]
Given that f''(ξ)h/2 is the truncation error, we have:
\[ f'(x) \approx \frac{f(x+h) - f(x)}{h} \]
This is the Forward Difference (FD) formula to approximate the derivative.
- Backward Difference Method (BD):
+ Formulate the method: Similar to the forward difference method, in Taylor expansion we use x − h instead of x + h,
we have:
\[ f'(x) \approx \frac{f(x) - f(x - h)}{h} \]
Exercise: use the BD formula to approximate f'(π/3), knowing that f(x) = sin(x).
\[ f'\!\left(\frac{\pi}{3}\right) \approx \frac{f(\pi/3) - f(\pi/3 - h)}{h} \]
Using h = 0.1, we have:
\[ f'\!\left(\frac{\pi}{3}\right) \approx \frac{\sin(\pi/3) - \sin(\pi/3 - 0.1)}{0.1} \approx 0.542 \]
(the exact value is cos(π/3) = 0.5).
- Central Difference Method (CD): consider the Taylor expansions of f at x + h and x − h; subtracting them gives
\[ f'(x) = \frac{f(x+h) - f(x-h)}{2h} - \frac{f'''(\xi)h^2}{6}, \]
hence
\[ f'(x) \approx \frac{f(x+h) - f(x-h)}{2h}, \]
where ξ belongs to (x − h, x + h).
+ Similarly, combining the Taylor expansions of f at x + h and x − h, we have the approximate formula for the 2nd order derivative:
\[ f''(x) \approx \frac{f(x+h) - 2f(x) + f(x-h)}{h^2} \]
For a function of two variables f(x, y), the first partial derivatives can be approximated by:
\[ \frac{\partial f}{\partial x}(x, y) \approx \frac{f(x+h, y) - f(x-h, y)}{2h} \]
\[ \frac{\partial f}{\partial y}(x, y) \approx \frac{f(x, y+h) - f(x, y-h)}{2h} \]
where h is the step size.
\[ \frac{\partial^2 f}{\partial x^2}(x, y) \approx \frac{f(x+h, y) - 2f(x, y) + f(x-h, y)}{h^2} \]
\[ \frac{\partial^2 f}{\partial y^2}(x, y) \approx \frac{f(x, y+h) - 2f(x, y) + f(x, y-h)}{h^2} \]
\[ \frac{\partial^2 f}{\partial x \partial y}(x, y) \approx \frac{f(x+h, y+h) - f(x+h, y-h) - f(x-h, y+h) + f(x-h, y-h)}{4h^2} \]
- Numerical integration: the sum Σ_{k=1}^{n} f(c_k)Δx_k, where Δx_k = x_k − x_{k−1} and c_k ∈ I_k, is called the Riemann sum of the function f(x) corresponding to the partition Δ and the selection points c_k, k = 1, . . . , n.
- Extended trapezoidal rule: divide [a, b] into n equal subintervals using the n + 1 points x_0 = a, x_1 = a + h, . . . , x_{n−1} = a + (n − 1)h, x_n = a + nh, where h = (b − a)/n. Applying the trapezoidal formula on each subinterval we have:
\[ \int_{x_{i-1}}^{x_i} f(x)\,dx \approx \frac{x_i - x_{i-1}}{2}\left( f(x_{i-1}) + f(x_i) \right) \]
\[ \int_a^b f(x)\,dx \approx \frac{h}{2}\left( f(a) + 2\sum_{i=1}^{n-1} f(x_i) + f(b) \right) \]
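- A short sketch of the extended trapezoidal rule (the test integral is chosen for illustration); MATLAB's trapz gives the same value:
f = @(x) exp(-x.^2);
a = 0; b = 1; n = 100;
h = (b - a)/n;
x = a + (0:n)*h;                                   % n+1 equally spaced nodes
I = (h/2)*(f(a) + 2*sum(f(x(2:end-1))) + f(b));    % extended trapezoidal rule
disp(I);                                           % approx 0.7468
disp(trapz(x, f(x)));                              % built-in, same value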
Like the extended trapezoidal rule, we divide the interval [a, b] into several subintervals and apply the Simpson 1/3 to
each subinterval, we get the extended Simpson 1/3:
\[ \int_a^b f(x)\,dx \approx \frac{h}{3}\left( f(x_0) + 4\sum_{i=0}^{n/2-1} f(x_{2i+1}) + 2\sum_{i=1}^{n/2-1} f(x_{2i}) + f(x_n) \right) \]
6 Differential Equations
6.1 Forward Euler (FE)
Considering the differential equation y' = f(y, t), the Forward Euler method is obtained by using the forward difference formula to approximate y':
\[ y'(t_n) \approx \frac{y(t_{n+1}) - y(t_n)}{h} \]
We can rewrite it in an iterative form as follows:
yn+1 = yn + hf (yn , tn )
where h is the step size, yn is the approximation of y(tn ), and f (yn , tn ) is the value of f at (yn , tn ). tn+1 = tn + h
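- A minimal Forward Euler sketch (the test equation y' = -2y, y(0) = 1 is chosen for illustration):
f = @(y, t) -2*y;             % right-hand side y' = f(y, t)
h = 0.01;                     % step size
t = 0:h:1;
y = zeros(size(t));
y(1) = 1;                     % initial condition y(0) = 1
for n = 1:numel(t)-1
    y(n+1) = y(n) + h*f(y(n), t(n));   % Forward Euler update
end
disp(y(end));                 % close to the exact value exp(-2) = 0.1353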
- Disadvantages: the method is only first-order accurate, and the step size h may have to be taken very small for the iteration to remain stable.
- For a system of two first-order equations y' = f(y, z, t), z' = g(y, z, t), Forward Euler gives:
y_{n+1} = y_n + h f(y_n, z_n, t_n)
z_{n+1} = z_n + h g(y_n, z_n, t_n)
where h is the step size, y_n and z_n are the approximations of y(t_n) and z(t_n), respectively, and f(y_n, z_n, t_n) and g(y_n, z_n, t_n) are the values of f and g at (y_n, z_n, t_n).
To solve a higher-order differential equation, we can decompose it into a system of first-order differential equations. For example, consider the second-order differential equation
y'' = 0.05 y' − 0.15 y, y(0) = 1, y'(0) = 0.
Let y' = z; then we can rewrite it as follows:
y' = z, y(0) = 1
z' = 0.05 z − 0.15 y, z(0) = 0
The equation becomes a system of first-order differential equations.
- Modified Euler method:
\[ y_{n+1} = y_n + \frac{h}{2}\left[ f(y_{n+1}, t_{n+1}) + f(y_n, t_n) \right] \]
The Runge-Kutta methods are developed by applying quadrature rules to integrate the right-hand side.
The second-order Runge-Kutta formula:
k_1 = h f(y_n, t_n)
k_2 = h f(y_n + k_1, t_{n+1})
y_{n+1} = y_n + (k_1 + k_2)/2
where h is the step size, yn is the approximation of y(tn ), and f (yn , tn ) is the value of f at (yn , tn ).
such that θ = −1 is optimal.
- Fourth-order Runge-Kutta formula:
k_1 = h f(y_n, t_n)
k_2 = h f(y_n + k_1/2, t_n + h/2)
k_3 = h f(y_n + k_2/2, t_n + h/2)
k_4 = h f(y_n + k_3, t_n + h)
y_{n+1} = y_n + (k_1 + 2k_2 + 2k_3 + k_4)/6
is based on Simpson 1/3.
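- A short sketch of the classical fourth-order formula on a scalar test problem (chosen for illustration):
f = @(y, t) -2*y;
h = 0.1; t = 0:h:1;
y = zeros(size(t)); y(1) = 1;
for n = 1:numel(t)-1
    k1 = h*f(y(n),        t(n));
    k2 = h*f(y(n) + k1/2, t(n) + h/2);
    k3 = h*f(y(n) + k2/2, t(n) + h/2);
    k4 = h*f(y(n) + k3,   t(n) + h);
    y(n+1) = y(n) + (k1 + 2*k2 + 2*k3 + k4)/6;
end
disp(y(end));                 % very close to exp(-2) = 0.1353 even with this larger step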
k_1 = h f(y_n, t_n)
k_2 = h f(y_n + k_1/3, t_n + h/3)
k_3 = h f(y_n − k_1/3 + k_2, t_n + 2h/3)
k_4 = h f(y_n + k_1 − k_2 + k_3, t_n + h)
y_{n+1} = y_n + (k_1 + 3k_2 + 3k_3 + k_4)/8
is based on Simpson 3/8.
If for u, v ∈ R^n we define the scalar product ⟨u, v⟩ = Σ_{i=1}^{n} u_i v_i, then R^n together with this scalar product becomes an n-dimensional Euclidean space.
The length of a vector u in R^n is given by:
\[ \|u\| = \sqrt{\langle u, u \rangle} = \sqrt{\sum_{i=1}^{n} u_i^2} \]
The distance between two points u and v in R^n is defined as:
\[ \rho(u, v) = \|u - v\| = \sqrt{\sum_{i=1}^{n} (u_i - v_i)^2} \]
• Assume the function f is defined on an open set X. We say that the function f is twice continuously differentiable on the set X if f is twice differentiable at every point x of X and
\[ f(x) = f(x_0) + J(x_0)(x - x_0) + \frac{1}{2}(x - x_0)^T H(x_0)(x - x_0) + \alpha(x, x_0)\|x - x_0\|^2, \]
where
\[ J(x_0) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(x_0) & \frac{\partial f_1}{\partial x_2}(x_0) & \frac{\partial f_1}{\partial x_3}(x_0) \\ \frac{\partial f_2}{\partial x_1}(x_0) & \frac{\partial f_2}{\partial x_2}(x_0) & \frac{\partial f_2}{\partial x_3}(x_0) \\ \frac{\partial f_3}{\partial x_1}(x_0) & \frac{\partial f_3}{\partial x_2}(x_0) & \frac{\partial f_3}{\partial x_3}(x_0) \end{pmatrix} \]
is the Jacobian matrix of f evaluated at x0 , H(x0 ) is the Hessian matrix of f evaluated at x0 , and limx→x0 α(x, x0 ) = 0.
The error can be written as o(∥x − x0 ∥2 ).
\[ H(x_0) = \begin{pmatrix} \frac{\partial^2 f_1}{\partial x_1^2}(x_0) & \frac{\partial^2 f_1}{\partial x_1 \partial x_2}(x_0) & \frac{\partial^2 f_1}{\partial x_1 \partial x_3}(x_0) \\ \frac{\partial^2 f_2}{\partial x_1^2}(x_0) & \frac{\partial^2 f_2}{\partial x_1 \partial x_2}(x_0) & \frac{\partial^2 f_2}{\partial x_1 \partial x_3}(x_0) \\ \frac{\partial^2 f_3}{\partial x_1^2}(x_0) & \frac{\partial^2 f_3}{\partial x_1 \partial x_2}(x_0) & \frac{\partial^2 f_3}{\partial x_1 \partial x_3}(x_0) \end{pmatrix} \]
Example: Let f(x) = (sin(x_1), e^{x_2}, ln(x_3))^T be a vector-valued function of x = (x_1, x_2, x_3)^T. Suppose f(x) is twice continuously differentiable in a neighborhood ε of x_0 = (x_{01}, x_{02}, x_{03})^T = (0, 1, 2)^T. Then, for any x in ε, we have Taylor's formula as follows:
\[ f(x) = f(x_0) + J(x_0)(x - x_0) + \frac{1}{2}(x - x_0)^T H(x_0)(x - x_0) + \alpha(x, x_0)\|x - x_0\|^2, \]
where
\[ J(x_0) = \begin{pmatrix} \cos(x_{01}) & 0 & 0 \\ 0 & e^{x_{02}} & 0 \\ 0 & 0 & \frac{1}{x_{03}} \end{pmatrix}, \qquad H(x_0) = \begin{pmatrix} -\sin(x_{01}) & 0 & 0 \\ 0 & e^{x_{02}} & 0 \\ 0 & 0 & -\frac{1}{x_{03}^2} \end{pmatrix} \]
is the Hessian matrix of f evaluated at x0 . The error term α(x, x0 ) satisfies limx→x0 α(x, x0 ) = 0. The error can be
written as o(∥x − x0 ∥2 ).
- The finite-increments formula: Suppose the function f is continuously differentiable on the open set S, and x is
a vector in S. For any vector y satisfying x + y ∈ S, there exists a number α ∈ [0, 1] such that
\[ f(x + y) - f(x) = \int_0^1 \langle f'(x + ty), y \rangle \, dt. \]
\[ \frac{\partial f}{\partial x}(x_0) = 0 \qquad (2) \]
Condition (2) is called the stationary condition, and the point x0 satisfying (2) is called a stationary point. Therefore,
solving the problem (1) can be reduced to solving equation (2).
- Theorem 2 (Sufficient condition for optimality) Suppose f is twice continuously differentiable. The stationary point
x0 is a local minimum if the matrix f ′′ (x0 ) is positive definite.
- Sylvester's criterion: to determine whether a matrix is positive (semi)definite, Sylvester's criterion can be used. The matrix A = (a_{ij})_{n×n} is positive semidefinite if and only if all of its principal minors are non-negative (it is positive definite if and only if all of its leading principal minors are positive).
Formula: for 1 ≤ i_1 < i_2 < . . . < i_k ≤ n, k = 1, 2, . . . , n, the principal minor D_{i_1,i_2,...,i_k} must satisfy:
\[ D_{i_1, i_2, \ldots, i_k} = \det \begin{pmatrix} a_{i_1 i_1} & a_{i_1 i_2} & \cdots & a_{i_1 i_k} \\ a_{i_2 i_1} & a_{i_2 i_2} & \cdots & a_{i_2 i_k} \\ \vdots & \vdots & \ddots & \vdots \\ a_{i_k i_1} & a_{i_k i_2} & \cdots & a_{i_k i_k} \end{pmatrix} \ge 0 \]
1. Calculate x_1 = a + (b − a)/2 − e and x_2 = a + (b − a)/2 + e, where e is the precision.
2. Calculate f1 = f (x1 ) and f2 = f (x2 ).
3. If f1 < f2 , then set b = x2 (remove segment x > x2 ).
- Fibonacci search method: define the Fibonacci numbers F_0 = 1, F_1 = 1, F_k = F_{k−1} + F_{k−2}, k ≥ 2.
To determine the number of iterations N before calculating the minimum, we use the following formulas:
\[ x_1^{(k)} = \frac{F_{N-1-k}}{F_{N+1-k}}(b_k - a_k) + a_k, \quad k = 0, 1, \ldots, N-1 \]
\[ x_2^{(k)} = \frac{F_{N-k}}{F_{N+1-k}}(b_k - a_k) + a_k, \quad k = 0, 1, \ldots, N-1 \]
where ak and bk are the lower and upper bounds of the interval at iteration k.
- Golden section method:
\[ x_1^{(k)} = 0.382(b_k - a_k) + a_k, \quad k = 0, 1, 2, \ldots \]
\[ x_2^{(k)} = 0.618(b_k - a_k) + a_k, \quad k = 0, 1, 2, \ldots \]
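- A minimal golden-section search sketch using the 0.382/0.618 points above (the test function and tolerance are chosen for illustration):
f = @(x) (x - 2).^2 + 1;          % minimum at x = 2
a = 0; b = 5; tol = 1e-6;
while b - a > tol
    x1 = a + 0.382*(b - a);
    x2 = a + 0.618*(b - a);
    if f(x1) < f(x2)
        b = x2;                   % the minimum lies in [a, x2]
    else
        a = x1;                   % the minimum lies in [x1, b]
    end
end
disp((a + b)/2);                  % approx 2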
- Multivariable minimization: consider the problem min f(x), x ∈ R^n,
where f(x) is continuously differentiable. To solve the problem, if a solution exists, it can be found among the solutions of the equations:
\[ \frac{\partial f}{\partial x}(x) = 0 \]
However, solving these equations in the general case is still quite complex. This leads us to use efficient iterative methods to solve the problem.
A common approach is to use methods that iterate from an initial value x0 and move "toward" the optimal value x*. At each iteration, we calculate:
xk+1 = xk + αk pk , k = 1, 2, . . .
where:
• The convergence of the sequence {xk } to the solution x∗ . Additionally, it’s important to note that defining pk and
αk differently also affects the amount of computation required.
- Gradient methods: we select the direction p_k such that ⟨∇f(x_k), p_k⟩ < 0.
In other words, by moving in the direction p_k with a sufficiently small step length, we reach a point x_{k+1} with a smaller objective function value. Therefore, a direction p_k satisfying this condition is called a descent direction of the objective function f(x). One vector satisfying the inequality is the negative gradient of f at x_k:
p_k = −∇f(x_k), α_k > 0, k = 0, 1, 2, . . .
Then, we have the iterative procedure:
x_{k+1} = x_k − α_k ∇f(x_k), k = 0, 1, 2, . . .
Iterative procedures that follow this formula are called gradient methods.
Since the search direction is fixed, the gradient methods differ due to the choice of αk . We list some of the basic choices
below:
+ Procedure: Minimization of a Function with One Variable
We solve the problem of minimizing a function with one variable:
1. Set α = α0 > 0.
2. Set u = xk − α∇f (xk ) and calculate f (u).
3. Check if f (u) − f (xk ) ≤ −ϵα∥∇f (xk )∥2 , where 0 < ϵ < 1.
4. If the inequality is satisfied, then set α_k = α. Otherwise, set α = α/2 and go back to step 2.
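- A sketch of a gradient method with the step-size halving rule above (the quadratic test function, epsilon value and iteration limit are chosen for illustration):
f     = @(x) (x(1) - 1)^2 + 10*(x(2) + 2)^2;          % minimum at (1, -2)
gradf = @(x) [2*(x(1) - 1); 20*(x(2) + 2)];
x = [0; 0];                   % starting point x0
eps1 = 0.5;                   % parameter 0 < eps < 1 in the sufficient-decrease test
for k = 1:200
    g = gradf(x);
    if norm(g) < 1e-8, break; end
    alpha = 1;                % initial trial step alpha0
    while f(x - alpha*g) - f(x) > -eps1*alpha*norm(g)^2
        alpha = alpha/2;      % halve the step and test again (step 4 above)
    end
    x = x - alpha*g;          % gradient step x_{k+1} = x_k - alpha_k*grad f(x_k)
end
disp(x');                     % approx [1 -2]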
- Newton method:
In the case that the function f is twice continuously differentiable and the calculation of f ′ (x) and f ′′ (x) is not difficult,
we can use the quadratic term of the Taylor series expansion:
\[ f_k(x) \approx f(x_k) + \langle f'(x_k), x - x_k \rangle + \frac{1}{2}\langle H(x_k)(x - x_k), x - x_k \rangle \qquad (9) \]
This equation represents a quadratic approximation of the function f at the neighborhood of the point xk . When xk is
very close to x∗ , f (xk ) is very close to 0, and the quadratic term in equation (9) provides more precise information about
the variation of f in the neighborhood of xk .
We determine the approximation vector uk from the condition:
Depending on the choice of αk , different methods can be obtained. For αk = 1 for every k, we have Newton’s method.
From equation (11), we have x_{k+1} = u_k when choosing α_k = 1. The condition in equation (10) becomes:
From equation (12), we infer that x_{k+1} is a stationary point of the function f_k(x), i.e., f'_k(x) = f'(x_k) + H(x_k)(x − x_k) = 0.
Therefore, if H(x_k) is not degenerate (i.e. H(x_k) is invertible), we have the following Newton's formula:
x_{k+1} = x_k − H(x_k)^{-1} f'(x_k).
8 LINEAR PROGRAMMING
8.1 Simplex method
8.1.1 The canonical and standard form of the linear programming problems
- The general form of the linear programming problem: the optimization problem in which we find the maximum (minimum) of a linear objective function subject to the condition that the variables satisfy a number of linear equations and inequalities. The mathematical model of the problem can be stated as follows:
Minimize (Maximize): f(x) = Σ_{j=1}^{n} c_j x_j
subject to:
Σ_{j=1}^{n} a_{ij} x_j = b_i, i = 1, 2, . . . , p
Σ_{j=1}^{n} a_{ij} x_j ≥ b_i, i = p + 1, . . . , m
x_j ≥ 0, j = 1, 2, . . . , q
x_j unrestricted in sign, j = q + 1, . . . , n
- Canonical form:
Minimize f(x) = c^T x subject to Ax = b, x ≥ 0.
- Standard form:
Minimize f(x) = c^T x subject to Ax ≥ b, x ≥ 0.
- Graphical interpretation (two variables):
f(x_1, x_2) = c_1 x_1 + c_2 x_2 → min
subject to
a_{i1} x_1 + a_{i2} x_2 ≥ b_i, i = 1, 2, . . . , m
We denote the feasible region as D = {(x_1, x_2) : a_{i1} x_1 + a_{i2} x_2 ≥ b_i, i = 1, 2, . . . , m}. Each linear inequality a_{i1} x_1 + a_{i2} x_2 ≥ b_i corresponds to a half-plane whose boundary is the line a_{i1} x_1 + a_{i2} x_2 = b_i. Therefore, the feasible region D, determined as the intersection of the m half-planes, is a convex polygon in the plane.
An equation of the form c1 x1 + c2 x2 = α has a normal vector (c1 , c2 ). As α changes, it determines parallel lines
that we call contour lines with the value α. Each point u = (u1 , u2 ) ∈ D lies on the contour line with the value
αu = c1 u1 + c2 u2 = f (u1 , u2 ).
- Conclusion: if the problem has an optimal solution, then it always has an optimal solution at a corner (vertex) of the feasible region, and this holds in any number of dimensions. It therefore suffices to search for the optimal solution among a finite number of corner points.
- Simplex algorithm: the algorithm starts from any corner of the feasible region and repeatedly moves to an adjacent corner with a better objective value, if one exists. When it reaches a corner with no better neighbor, it stops: this is an optimal solution. The algorithm is finite, but its worst-case complexity is exponential.
+ Some notations and definitions:
The linear programming problem in canonical form (LP-C) can be represented as follows:
Find min f(x_1, x_2, . . . , x_n) = Σ_{j=1}^{n} c_j x_j,
subject to
Σ_{j=1}^{n} a_{ij} x_j = b_i, i = 1, 2, . . . , m,
x_j ≥ 0, j = 1, 2, . . . , n.
In the context of linear programming in canonical form (LP-C), we use the following notations:
The linear programming problem in canonical form (LP-C) can be rewritten in matrix form as follows:
min{f (x) = cT x : Ax = b, x ≥ 0}
Furthermore, in linear programming, we often encounter vector inequalities. A vector y = (y1 , y2 , . . . , yk ) ≥ 0 means
that each component of the vector satisfies yi ≥ 0 for i = 1, 2, . . . , k.
In linear programming in canonical form (LP-C), we introduce index sets and notations to represent the variables, objective function coefficients, and constraint matrix. In particular, let A_j denote the j-th column of the constraint matrix A.
Using these notations, the basic constraint equations of the LP-C can also be written in the column form:
A_1 x_1 + A_2 x_2 + . . . + A_n x_n = b
A feasible solution x∗ is a solution that belongs to the constraint region D, i.e., Ax∗ = b and x∗ ≥ 0. Feasible solutions
are the potential solutions that satisfy the problem’s constraints.
An optimal solution of the problem is a feasible solution x∗ that gives the smallest value of the objective function.
In other words, for all x ∈ D, we have cT x∗ ≤ cT x. The optimal solution is the solution that minimizes the objective
function.
The value f ∗ = cT x∗ associated with the optimal solution x∗ is called the optimal value of the problem. It represents
the minimum value of the objective function obtained at the optimal solution.
x_{j_k}: the k-th element of B^{-1}b.
In linear programming in canonical form (LP-C), the basic solution x corresponding to a given basis B can be deter-
mined using the following procedure:
1. Set xN = 0, where xN = x(JN ) represents the values of the nonbasic variables set to zero.
2. Determine xB from the equations BxB = b, where xB = x(JB ) represents the values of the basic variables corre-
sponding to the basis vectors B.
Assume x = (x_B, x_N) is a basic solution corresponding to the basis B. Then the LP-C problem can be rewritten as follows:
min{ f(x) = c_B^T x_B + c_N^T x_N : B x_B + N x_N = b, x_B ≥ 0, x_N ≥ 0 }
where c_B = (c_j : j ∈ J_B) represents the objective function coefficient vector for the basic variables, c_N = (c_j : j ∈ J_N) represents the objective function coefficient vector for the nonbasic variables, and N = (A_j : j ∈ J_N) is called the non-basic matrix of A.
Example: Consider a linear programming problem (LP) with a given cost vector c ∈ R^7 (its exact entries are not needed to compute the basic solution below), right-hand side
b = (4, 2, 3, 6)^T,
and constraint matrix
\[ A = \begin{pmatrix} 1 & 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 3 & 1 & 0 & 0 & 0 & 1 \end{pmatrix} \]
with columns denoted A_1, A_2, . . . , A_7.
Consider the basis B = (A_4, A_5, A_6, A_7) = E_4, where E_4 denotes the 4×4 identity matrix.
To obtain the basic solution x = (x_1, x_2, . . . , x_7) corresponding to basis B, we set x_1 = 0, x_2 = 0, x_3 = 0; then x_B = B^{-1}b = b, so
x = (0, 0, 0, 4, 2, 3, 6).
8.1.3 Formula for incremental change of the objective function
Assume x is a basic feasible solution corresponding to basis B = (Aj : j ∈ JB ). Denote:
JB = {j1 , j2 , . . . , jm } – indices of basic variables;
JN = J \ JB – indices of non-basic variables;
B = (Aj : j ∈ JB ) – basis matrix;
N = (Aj : j ∈ JN ) – non-basic matrix;
xB = x(JB ) = {xj : j ∈ JB } – basic variable vector;
xN = x(JN ) = {xj : j ∈ JN } – non-basic variable vector;
cB = c(JB ) = {cj : j ∈ JB } – objective function coefficient vectors of basic variables;
cN = c(JN ) = {cj : j ∈ JN } – objective function coefficient vectors of non-basic variables.
Consider a basic feasible solution z = x + ∆x, where ∆x = (∆x1 , ∆x2 , . . . , ∆xn ) is the incremental change vector in
variables. We can find the formula to calculate the incremental change of the objective function:
∆f = cT z − cT x = cT ∆x.
Since x and z are both feasible solutions, we have Ax = b and Az = b. Therefore, the incremental change ∆x must
satisfy the condition A∆x = 0, which means:
B∆xB + N ∆xN = 0,
where ∆xB = (∆xj : j ∈ JB ) and ∆xN = (∆xj : j ∈ JN ).
Thus, we have ∆xB = −B −1 N ∆xN
We can express the incremental change of the objective function as follows:
Δf = c_B^T Δx_B + c_N^T Δx_N = (c_N^T − c_B^T B^{-1}N) Δx_N = − Σ_{j∈J_N} Δ_j Δx_j, where Δ_j = c_B^T B^{-1}A_j − c_j.
The obtained formula is called the formula for the incremental change of the objective function.
∆N ≤ 0 (∆j ≤ 0 ∀j ∈ JN )
is a sufficient condition, and in the non-degenerate case, it is also a necessary condition for a basic feasible solution x to
be optimal.
If among the ∆j values of a basic feasible solution x, there is a positive value ∆j0 > 0, and the corresponding elements
of the vector B −1 Aj0 ≤ 0, then the objective function of the problem is unbounded.
8.1.5 SIMPLEX METHOD