CS515 Homework 2
Problem 0
I would like to thank my wife, Preeti Agarwal, whose understanding of matrices
helped me derive the solutions for a number of problems in this homework.
In class, I got some help during the breakout sessions from Christos Boutsikas
and Michael Beshear, who discussed parts of the problems with me.
I would also like to thank Professor Gleich, Nadim, and Sai for providing help
with the classes and homeworks.
Problem 1.1
For α = 1.0, we can see from the plot that the residual, instead of approaching 0,
grows without bound.
Thus the Poisson equation system for n = 10 will not converge under
Richardson's method with α = 1.0.
Julia Code:
using SparseArrays, LinearAlgebra, Plots

# Build the reduced (interior-points-only) Poisson/Laplacian matrix on an
# (n+1) x (n+1) grid, together with the right-hand-side vector built from f.
function laplacian_reduced(n, f)
    h = 1.0/n
    N = (n-1)*(n-1)                              # number of interior grid points
    position_identity_diagonal = n - 1           # offset of the -1 "identity" blocks
    length_identity_diagonal = (n-1) * (n-2)
    vector_identity_diagonal = -1 .* ones(length_identity_diagonal)
    diagVec = 4 .* ones(N)                       # main diagonal entries
    sub_sup_diag = -1 .* ones(N-1)               # sub-/super-diagonal entries
    sub_sup_diag[(n-1):(n-1):end] .= 0           # no coupling across grid-row boundaries
    A = spdiagm(-position_identity_diagonal=>vector_identity_diagonal,
                -1 => sub_sup_diag,
                0 => diagVec,
                1 => sub_sup_diag,
                position_identity_diagonal=>vector_identity_diagonal)
    fvec = zeros(N)
    ctr = 1
    for i=0:n
        for j=0:n
            if i==0 || j == 0 || i == n || j == n
                continue
            else
                fvec[ctr] = -f(i*h, j*h)*h^2
                ctr = ctr + 1
            end
        end
    end
    return A, fvec
end
function solve_linear_system_return_residual(
    A::SparseMatrixCSC, b::Vector, alpha::Float64 = 1.0, niter=0)
    n = size(A,1)
    @assert(n == size(A,2), "The matrix is not square")
    py = Vector{Float64}()       # residual-norm history
    x = copy(b)                  # initialize x0 with b
    r = b - A*x                  # initial residual
    iter = 0
    while norm(r) >= n*eps(1.0) && iter <= niter
        x .+= alpha .* r
        r = b - A*x
        iter += 1
        push!(py, norm(r))
    end
    println("Num iterations ", niter, " alpha ", alpha, " residual ", norm(r))
    if iter < niter && norm(r) < n*eps(1.0) && !isnan(norm(r))
        println("Further iterations will not make a difference.\nIteration=", iter)
    else
        println("Could not converge within the given number of iterations")
    end
    return x, iter, py
end
n = 10
f(::Float64, ::Float64) = 1
A1, f1 = laplacian_reduced(n, f)
# run 100 Richardson iterations with alpha = 1.0
u1, num_iters1, residual1 = solve_linear_system_return_residual(A1, f1, 1.0, 100)
px = 1:num_iters1
Problem 1.2
Consider the linear system of equations:
Ax = b
For convergence of the Richardson method, a sufficient condition is that the norm
of the matrix I − αA must be less than 1.
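This condition is sufficient because the error e_k = x − x_k of the Richardson iteration
satisfies e_{k+1} = (I − αA) e_k, so for any induced matrix norm

||e_k|| ≤ ||I − αA||^k ||e_0|| → 0   whenever   ||I − αA|| < 1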
We need:

||I − αA||_∞ < 1
⇒ max_i Σ_j |(I − αA)_{i,j}| < 1
Since this is the maximum over all rows of the sum of the absolute values of the entries
in a row, for the matrix I − αA, due to the structure of A we get only two different row sums:

|1 − 4α| + |α| + |α| = |1 − 4α| + 2|α|

and

|1 − 4α| + |α| + |α| + |α| = |1 − 4α| + 3|α|

Since α is greater than 0 (by definition), the maximum (and thus the norm) is:

||I − αA||_∞ = |1 − 4α| + 3|α|
We therefore require |1 − 4α| + 3α < 1.

Case 1 (1 − 4α ≥ 0, i.e. α ≤ 1/4):
1 − 4α + 3α < 1 ⇒ 1 − α < 1 ⇒ α > 0, which already holds.

Case 2 (1 − 4α < 0, i.e. α > 1/4):
−1 + 4α + 3α < 1 ⇒ 7α < 2 ⇒ α < 2/7, which is not possible here since α is greater than 0.25.

Hence the range of α is 0 < α ≤ 1/4 for any n in the Poisson equation matrix.
We can verify this by setting α to 0.25 and 0.2 and checking the residual plots: both
residuals fall to an extremely small value, within 576 and 724 iterations respectively.
Any α larger than 0.25 led to a diverging residual.
Julia Code (the extra code beyond what was already given in Problem 1.1):
using Printf   # needed for @sprintf

for alph = 0.25:-0.05:0.05
    u, num_iters, residual = solve_linear_system_return_residual(A1, f1, alph, 20000)
    px = 1:num_iters
    plot(px, residual, title="Number of iterations vs residual", legend=false)
    xlab = @sprintf("Iterations for alpha = %g", alph)
    xlabel!(xlab)
    ylabel!("Residual")
    figname = @sprintf("IterationsVsResidualPlotAlpha%gN%g", alph*100, n)
    savefig(figname)
end
Problem 1.3
For n = 20, through experiments using the same code as in Problem 1.2, we saw that the
same range of values for α as derived in Problem 1.2 worked, i.e. 0 < α ≤ 1/4.
For example, with α set to 0.25 and 0.2, the residual converged to an extremely small
value within 2172 and 2718 iterations respectively.
Problem 1.4
Through our experiments using the code as in Problem 1.2, we saw the following
number of iterations taken for the residual to converge for n = 10 and n = 20.
For n = 10:

    α        Iterations
    0.05     2939
    0.1      1462
    0.15     972
    0.2      724
    0.25     576

For n = 20:

    α        Iterations
    0.05     10910
    0.1      5448
    0.15     3628
    0.2      2718
    0.25     2172
We can see that the fewest iterations were needed at α = 0.25 in both cases.
Problem 1.5
(The theoretical derivation for this problem was already done in Problem 1.2 and is repeated here.)
Consider the linear system of equations:
Ax = b
For convergence of the Richardson method, a sufficient condition is that the norm
of the matrix I − αA must be less than 1.
We need:

||I − αA||_∞ < 1
⇒ max_i Σ_j |(I − αA)_{i,j}| < 1
Since this is the maximum over all rows of the sum of the absolute values of the entries
in a row, for the matrix I − αA, due to the structure of A we get only two different row sums:

|1 − 4α| + |α| + |α| = |1 − 4α| + 2|α|

and

|1 − 4α| + |α| + |α| + |α| = |1 − 4α| + 3|α|

Since α is greater than 0 (by definition), the maximum (and thus the norm) is:

||I − αA||_∞ = |1 − 4α| + 3|α|
We therefore require |1 − 4α| + 3α < 1.

Case 1 (1 − 4α ≥ 0, i.e. α ≤ 1/4):
1 − 4α + 3α < 1 ⇒ 1 − α < 1 ⇒ α > 0, which already holds.

Case 2 (1 − 4α < 0, i.e. α > 1/4):
−1 + 4α + 3α < 1 ⇒ 7α < 2 ⇒ α < 2/7, which is not possible here since α is greater than 0.25.

Hence the range of α is 0 < α ≤ 1/4 for any n in the Poisson equation matrix.
Problem 2.1
Given that ||·|| is a matrix norm and k ≠ 0 is a real-valued vector, define the function

f(x) = ||x kᵀ||

We check the properties of a vector norm.

1. f(x) ≥ 0
Since ||·|| is a matrix norm, ||x kᵀ|| ≥ 0 for every x, so f(x) ≥ 0.

2. f(x) = 0 iff x = 0
Let x = 0. Then x kᵀ = 0 (the zero matrix), so by the properties of the matrix norm, ||x kᵀ|| = 0.
Conversely, if f(x) = ||x kᵀ|| = 0, then x kᵀ is the zero matrix; since k ≠ 0 there is some
entry k_j ≠ 0, and column j of x kᵀ equals k_j x = 0, which forces x = 0.

3. f(αx) = |α| f(x)
f(αx) = ||α x kᵀ||
⇒ f(αx) = |α| · ||x kᵀ|| (by the property of matrix norms)
⇒ f(αx) = |α| · f(x) (by definition)
Hence proved.
4. f(x + y) ≤ f(x) + f(y)
f(x + y) = ||(x + y) kᵀ||
⇒ f(x + y) = ||x kᵀ + y kᵀ|| (by the property of vector multiplication)
⇒ f(x + y) ≤ ||x kᵀ|| + ||y kᵀ|| (by the triangle inequality for matrix norms)
which is the same as saying
f(x + y) ≤ f(x) + f(y)
Hence proved.
Problem 2.2
Given that the matrix norm ||·|| is sub-multiplicative, i.e.

||AB|| ≤ ||A|| · ||B||

we want to prove that

f(Ax) ≤ ||A|| f(x)
Now,

f(Ax) = ||(Ax) kᵀ||

By the associativity of matrix multiplication (treating vectors as a special case of matrices),
we can write

f(Ax) = ||A (x kᵀ)||

Since ||·|| is sub-multiplicative,

f(Ax) ≤ ||A|| · ||x kᵀ||

Thus, because of the way the vector norm f was defined,

f(Ax) ≤ ||A|| f(x)

Hence proved.
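A quick numerical sanity check of these properties, using the operator 2-norm as the
(sub-multiplicative) matrix norm and made-up test data k, x, y, A, might look like:

using LinearAlgebra
# Sanity check with the operator 2-norm; k, x, y, A are arbitrary test data.
k = [1.0, -2.0, 3.0]
f(x) = opnorm(x * k')                             # f(x) = ||x k^T||
x, y, A = randn(3), randn(3), randn(3, 3)
println(f(x + y) <= f(x) + f(y) + 1e-12)          # triangle inequality (Problem 2.1)
println(isapprox(f(2.5 .* x), 2.5 * f(x)))        # homogeneity (Problem 2.1)
println(f(A * x) <= opnorm(A) * f(x) + 1e-12)     # f(Ax) <= ||A|| f(x) (Problem 2.2)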
Problem 3.1
Given the PageRank linear system:

(I − α_pr P) x = (1 − α_pr) v

so that A = I − α_pr P and b = (1 − α_pr) v.
In order for Richardson's method (with α = 1) to converge, it suffices that either
||I − A|| < 1 or ρ(I − A) < 1 holds.
Thus we need to show that the spectral radius of I − A = I − (I − α_pr P) = α_pr P
is less than 1.
We know that the eigenvalues of a matrix and of its transpose are the same (they have
the same characteristic polynomial).
Now let λ be an eigenvalue of Pᵀ with eigenvector v:

Pᵀ v = λ v

Let v_k be the entry of v with the largest absolute value. The kth row of this equation gives

λ v_k = Σ_j (Pᵀ)_{k,j} v_j

Since the entries of row k of Pᵀ are non-negative and sum to at most 1 (the columns of the
transition matrix P sum to at most 1), we get

|λ| |v_k| ≤ Σ_j (Pᵀ)_{k,j} |v_j| ≤ Σ_j (Pᵀ)_{k,j} |v_k| ≤ |v_k|
⇒ |λ| ≤ 1

Thus all eigenvalues of P are at most 1 in absolute value, and since α_pr < 1, all
eigenvalues of α_pr P are strictly less than 1 in absolute value. Hence the spectral radius
of I − A is less than 1, and Richardson's method converges for the PageRank linear system.
Problem 3.2
For Richardson's method, with x_0 = b, the update equations can be written as:

r_k = b − A x_k
x_{k+1} = x_k + α r_k
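For the PageRank system in particular (A = I − α_pr P, b = (1 − α_pr) v), it is worth noting
that the Richardson update with α = 1 reduces to the familiar PageRank iteration:

x_{k+1} = x_k + (b − A x_k) = α_pr P x_k + (1 − α_pr) v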
function solve_linear_system(
    A::SparseMatrixCSC,
    b::Vector,
    alpha::Float64 = 1.0,
    niter = 0,
)
    n = size(A, 1)
    @assert(n == size(A, 2), "The matrix is not square")
    # Richardson iteration with x0 = b (update and stopping rule as in Problem 1.1)
    x = copy(b)
    r = b - A * x
    iter = 0
    while norm(r) >= n * eps(1.0) && iter <= niter && !isnan(norm(r))
        x .+= alpha .* r
        r = b - A * x
        iter += 1
    end
    if iter < niter && norm(r) < n * eps(1.0)
        println("Further iterations will not make a difference.\nIteration=", iter)
    end
    return x, iter
end

A = [0.7 0 0.4
     0 1 0
     0.3 0 0.6]
b = [1/4
     1/2
     1/4]
x1 = A\b
x2, _ = solve_linear_system(sparse(A), b, 1.0, 10000)
display(x1)
display(x2)
println("")
Output
Further iterations will not make a difference.
Iteration=25
3-element Vector{Float64}:
0.16666666666666669
0.5
0.3333333333333333
3-element Vector{Float64}:
0.1666778422384972
0.5
0.3333221577615028
Problem 3.3
We tested the implementation for correctness by running it with P as a random
1000 × 1000 matrix.
For α_pr = 0.85 and α_Richardson = 1.0, the algorithm reached a relative residual
of 10⁻⁵ in 80 iterations.
For α_pr = 0.85 and α_Richardson = 0.5, the algorithm reached a relative residual
of 10⁻⁵ in 167 iterations.
Julia Code
using SparseArrays, LinearAlgebra, Plots, Printf, ZipFile, DelimitedFiles, Random
Random.seed!(0)
function csc_matvec(colptr, rowval, nzval, m, n, x)
y = zeros(m) # allocate output
for j = 1:length(colptr)-1 # for each column ...
for nzi = colptr[j]:colptr[j+1]-1 # for each entry in the column
i = rowval[nzi]
v = nzval[nzi]
y[i] += v * x[j] # same as indexed form!
end
end
return y
end
function solve_linear_system(
    A::SparseMatrixCSC,
    b::Vector,
    alpha::Float64 = 1.0,
    niter = 0,
)
    n = size(A, 1)
    @assert(n == size(A, 2), "The matrix is not square")
    # Richardson iteration; the matrix-vector product uses csc_matvec defined above.
    # Stop once the relative residual drops below 1e-5 (the tolerance quoted above).
    x = copy(b)
    r = b - csc_matvec(A.colptr, A.rowval, A.nzval, n, n, x)
    iter = 0
    while norm(r) / norm(b) >= 1e-5 && iter <= niter && !isnan(norm(r))
        x .+= alpha .* r
        r = b - csc_matvec(A.colptr, A.rowval, A.nzval, n, n, x)
        iter += 1
    end
    return x, norm(r) / norm(b), iter
end
P =
sprand(1_000, 1_000, 15 / 1_000) |>
A ->
begin
fill!(A.nzval, 1)
A
end |> A -> begin
ei, ej, ev = findnz(A)
d = sum(A; dims = 2)
return sparse(ej, ei, ev ./ d[ei], size(A)...)
end
colptr, rowval, nzval = P.colptr, P.rowval, P.nzval
function load_data()
r = ZipFile.Reader("HW2/wikipedia-2005.zip")
try
@assert length(r.files) == 1
f = first(r.files)
data = readdlm(f, '\n', Int)
n = data[1]
colptr = data[2:2+n] # colptr has length n+1
rowval = data[3+n:end] # n+2 elements before start of rowval
A =
SparseMatrixCSC(n, n, colptr, rowval, ones(length(rowval))) |>
A -> begin
ei, ej, ev = findnz(A)
d = sum(A; dims = 2)
return sparse(ej, ei, ev ./ d[ei], size(A)...)
end
finally
close(r)
end
end
P =
sprand(1_000, 1_000, 15 / 1_000) |>
A ->
begin
fill!(A.nzval, 1)
A
end |> A -> begin
ei, ej, ev = findnz(A)
d = sum(A; dims = 2)
return sparse(ej, ei, ev ./ d[ei], size(A)...)
end
v = 1/size(P,1) .* ones(size(P,1))
alpha_pr = 0.85
A = (I(size(P,1)) - alpha_pr .* P)
b = (1.0-alpha_pr) .* v
x_actual = A\b
x, r_r, iters = solve_linear_system(sparse(A), b, 1.0, 10000000)
@printf("\nRelative Residual = %g Iterations Taken = %g\n", r_r, iters)
@printf("Norm of x_actual - x = %g", norm(x_actual .- x))
P = load_data()
v = 1/size(P,1) .* ones(size(P,1))
alpha_pr = 0.85
alpha_r = 1.0
A = (I(size(P,1)) - alpha_pr .* P)
b = (1.0-alpha_pr) .* v
x1, r_r1, iters1 = solve_linear_system(sparse(A), b, alpha_r, 10000000)
@printf("\nRelative Residual = %g Iterations Taken = %g\n", r_r1, iters1)
alpha_pr = 0.85
alpha_r = 0.5
A = (I(size(P,1)) - alpha_pr .* P)
b = (1.0-alpha_pr) .* v
x2, r_r2, iters2 = solve_linear_system(sparse(A), b, alpha_r, 10000000)
@printf("\nRelative Residual = %g Iterations Taken = %g\n", r_r2, iters2)
Problem 4.1
From HW1 Problem 7 we were given T as the sparse matrix representing the transition
probabilities from one state to another.
Through the analysis done in HW1 Problem 7.1, we were able to write a linear system of
equations whose solution gives the expected length of the game from each starting state:

(I − T′) x = [1 1 1 … 1 1 0 1]′
x = (I − T′)⁻¹ [1 1 1 … 1 1 0 1]′
So, for the Chutes and Ladders expected-length system, starting with α = 1: the Richardson
iteration matrix is I − α(I − T′) = T′, and its spectral radius is less than 1 (the chain is
absorbing, and this is also what find_spectral_radius checks in the code below).
Thus for α = 1, Richardson's method for solving these equations will converge.
Running the Richardson method for different values of α and measuring the number of
iterations and the time taken until convergence (or divergence), we can make the following
inferences:
• The value of α that takes the fewest iterations is around 1.4.
• This value of α also takes the least amount of time (through benchmarking).
• This is also demonstrated by the rate of decrease of the residual for the different
values of α.
• In the graph of α vs. number of iterations, the curve falls after 1.5 because the
program was set to exit early once the residual norm exceeded its limits.
• In the graph of α vs. time taken, the curve falls after 1.5 for the same reason.
(Figure: Number of iterations vs. residual for different values of α.)
Julia Code
using CSV, DataFrames, SparseArrays, Plots, LinearAlgebra,
BenchmarkTools, Printf
CLMatrixDF = CSV.read(
"chutes-and-ladders-matrix.csv", DataFrame, header=false)
CLMatrixCoords = CSV.read(
"chutes-and-ladders-coords.csv", DataFrame, header=false)
Ivec = CLMatrixDF[!,1]
Jvec = CLMatrixDF[!,2]
Vvec = CLMatrixDF[!,3]
xc = CLMatrixCoords[!,1]
yc = CLMatrixCoords[!,2]
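The listing never shows how T, b, Id101, and find_spectral_radius (all used below) are
constructed. A minimal sketch of that setup, under the assumption that the CSV triplets
are 1-based indices over the 101 board states, is:

# Assumed setup, not shown in the original listing; names match the calls below.
T = sparse(Ivec, Jvec, Vvec, 101, 101)       # transition matrix from HW1 Problem 7
Id101 = sparse(1.0I, 101, 101)               # 101 x 101 identity
b = ones(101); b[100] = 0.0                  # right-hand side [1 1 ... 1 0 1]'
# spectral radius of the Richardson iteration matrix I - alpha*M (diagnostic only)
find_spectral_radius(M, alpha) = maximum(abs.(eigvals(Matrix(Id101 - alpha .* M))))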
function solve_linear_system(
A::SparseMatrixCSC, b::Vector, alpha::Float64 = 1.0, niter=0)
n = size(A,1)
@assert(n == size(A,2), "The matrix is not square")
residual = Vector{Float64}()
diverging_series = false
x = copy(b) # Initialize x0 with b
r = b - A*x # compute the initial residual
iter = 0
    while norm(r) >= 1000*n*eps(1.0) && iter <= niter && !isnan(norm(r))
        x .+= alpha .* r
        r = b - A*x
        iter += 1
        push!(residual, norm(r))
        if norm(r) > 1e10   # exit early once the residual exceeds its limits (threshold assumed)
            diverging_series = true
            break
        end
    end
    return x, iter, residual, diverging_series
end
plot_array = Any[]
for alph in [0.1, 0.4, 0.7, 1.0, 1.4, 1.5]
spectral_rad = find_spectral_radius(Id101 - T', alph)
# not checking for spectral radius because we want to show the
# divergence of the iterations
solution, iterations, residual, _ = solve_linear_system(
Id101 - T', b, alph, 10000)
title = @sprintf("alpha = %g", alph)
push!(plot_array,plot(1:iterations,residual, title=title, legend=false))
end
plot(plot_array..., layout = (3,2))
savefig("HW2_4_1_IterationsVsResidualForAlpha")
px = Any[]
py = Any[]
for alph = 0.1:0.1:1.7
spectral_rad = find_spectral_radius(Id101 - T', alph)
# not checking for spectral radius because we want to show the
# divergence of the iterations
solution, iterations, _, _ = solve_linear_system(
Id101 - T', b, alph, 10000)
push!(px, alph)
push!(py, iterations)
end
plot(px, py, title = "Alpha vs number of iterations", legend = false)
xlabel!("alpha")
ylabel!("Number of iterations")
savefig("HW2_4_1_AlphaVsIterations")
timeBenchmark = Any[]
for alph = 0.1:0.1:1.7
spectral_rad = find_spectral_radius(Id101 - T', alph)
# not checking for spectral radius because we want to show the
# divergence of the iterations
t = @belapsed solve_linear_system(Id101 - T', b, $alph, 100000)
push!(timeBenchmark, t)
end
plot(0.1:0.1:1.7, 1000 .* timeBenchmark,
title = "Alpha vs time elapsed in iterations",
legend=false)
xlabel!("alpha")
ylabel!("Time (msec)")
savefig("HW2_4_1_AlphaVsTimeTaken")
Problem 4.2
In the linear system for the expected length of the Chutes and Ladders game,

x = (I − T′)⁻¹ [1 1 1 … 1 1 0 1]′

the expected length of the game is given by the 101st entry of the solution vector, i.e.

E = [0 0 0 … 0 1] (I − T′)⁻¹ [1 1 1 … 1 1 0 1]′
⇒ C = ([1 1 1 … 1 1 1 1] − [0 0 0 … 0 0 1 0]) (I + T + T² + T³ + …)

C₁ = [1 1 1 … 1 1 1 1] (I + T + T² + T³ + …)

Since the column sums of T^k (the columns represent states that will lead to a conclusion)
start at 1 and gradually diminish to extremely small values as k increases,

[C₁]₁₀₁ = the 101st entry of [1 1 1 … 1 1 1 1] (I + T + T² + T³ + …)

Now we need to show that E is the same as Σ_{k=1}^∞ k [p_k]₁₀₀.
Since t₁₀₁ = [0 0 0 … 0 1] T, we need to show that C = Σ_{k=1}^∞ k [p_k]₁₀₀.
Equivalently, we need to show that

[1 1 1 … 1 1 0 1] (I + T + T² + T³ + …)   is equal to   [0 0 0 … 0 0 1 0] Σ_{k=1}^∞ k T^{k−1}
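Two standard series identities may help in connecting the two sides (they hold here
because the spectral radius of T is less than 1 for this absorbing chain):

(I − T)⁻¹ = Σ_{k=0}^∞ T^k   and   (I − T)⁻² = Σ_{k=1}^∞ k T^{k−1}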
Problem 5
In order to implement the gradient (steepest) descent method for the equation Ax = b,
the recurring equations are:

g_k = A x_k − b
α_k = (g_kᵀ g_k) / (g_kᵀ A g_k)
x_{k+1} = x_k − α_k g_k
g_{k+1} = A x_{k+1} − b
⇒ g_{k+1} = A (x_k − α_k g_k) − b
⇒ g_{k+1} = A x_k − b − α_k A g_k
⇒ g_{k+1} = g_k − α_k A g_k

Thus we can rewrite the recurring equations so that only a single matrix-vector product
(A g_k) is computed per iteration. Initialise

x_0 = b
g_0 = A x_0 − b

then iterate

α_k = (g_kᵀ g_k) / (g_kᵀ A g_k)
x_{k+1} = x_k − α_k g_k
g_{k+1} = g_k − α_k A g_k
We use the stopping condition ||g|| < n · eps(1.0), where eps(1.0) is the machine epsilon
provided by Julia.
Julia Code
# Gradient Descent Method:
using SparseArrays, Plots, LinearAlgebra, BenchmarkTools, Printf

A = [1 0 0
     0 4 5
     0 5 6] # SPD
b = [7
     8
     9]

function steepest_descent_method(A::SparseMatrixCSC, b::Vector, niter=0)
    n = size(A,1)
    @assert(n == size(A,2), "The matrix is not square")
    residual = Vector{Float64}()
    diverging_series = false
    x = copy(b)      # x0 = b
    g = A*x - b      # initial gradient g0 = A*x0 - b
    iter = 0
    while norm(-g) >= n*eps(1.0) && iter <= niter && !isnan(norm(-g))
        iter += 1
        mat_vec_pdt = A * g                      # the single matrix-vector product per iteration
        alpha = (g' * g)/(g' * mat_vec_pdt)
        x = x - alpha .* g
        g = g - alpha .* mat_vec_pdt
        push!(residual, norm(g))
    end
    if iter < niter && !isnan(norm(g))
        println("Further iterations won't make difference.\nIterations=", iter)
    end
    return x, iter, residual, diverging_series
end
sol1 = A\b
sol2, _, _, _ = steepest_descent_method(sparse(A), b, 10000)
display(sol1)
display(sol2)
Output
Further iterations won't make difference.
Iterations=13
3-element Vector{Float64}:
7.0
-3.0000000000000155
4.000000000000013
3-element Vector{Float64}:
7.0
-2.999999999999985
3.999999999999986
Problem 6
The coordinate descent method can be implemented as (taking x_0 = b):

g_k = A x_k − b
x_{k+1} = x_k + γ e_i,    where γ = −g_i / A_{i,i}

and e_i is a vector of the same size as b with a 1 at index i and 0 everywhere else.
The index i can be picked at random or cyclically.
This can be rewritten as (again selecting the index i by some means):

g_k = A x_k − b
x_{k+1}[i] = x_k[i] − g_i / A_{i,i}
Given that the input matrix is sparse, we can avoid computing the complete gradient
vector g at every step.
This is done by using only the row of A indexed by i, so the final algorithm is:

[g_k]_i = A[i, :] x_k − b[i]
x_{k+1}[i] = x_k[i] − [g_k]_i / A_{i,i}

To access a row of A cheaply, we store Aᵀ in CSC form (effectively a CSR representation
of A) and take the column projection of Aᵀ corresponding to the required row of the
original matrix. This way the work per update is at most the number of non-zero entries
in that row.
Julia Code
# Coordinate Descent Method:
using SparseArrays, Random, Plots, LinearAlgebra, BenchmarkTools, Printf

A = [1 0 1
     0 4 0
     1 0 6] # SPD
b = [7.0
     8.0
     9.0]
# To get consistent results
Random.seed!(10)
# Dot product of column j of a CSC matrix (given by its raw arrays) with x.
# Applied to the stored transpose of A, this computes (row j of A) . x.
function csc_column_projection(colptr, rowval, nzval, m, n, j, x)
    y = 0.0
    for nzi = colptr[j]:colptr[j+1]-1   # for each stored entry in column j
        y += nzval[nzi] * x[rowval[nzi]]
    end
    return y
end

# Look up the single entry (i, j) of a CSC matrix from its raw arrays.
function csc_lookup(colptr, rowval, nzval, m, n, i, j)
    for nzi = colptr[j]:colptr[j+1]-1
        if rowval[nzi] == i
            return nzval[nzi]
        end
    end
    return 0.0
end
function coordinate_descent_method(
A::SparseMatrixCSC,
b::Vector,
niter = 0,
use_random = false,
)
n = size(A, 1)
@assert(n == size(A, 2), "The matrix is not square")
    residual = Vector{Float64}()
    diverging_series = false
    x = copy(b)                         # x0 = b
    g = A * x - b                       # gradient entries; refreshed as coordinates are visited
    mat_transpose_csc = sparse(A')      # CSC storage of A' gives cheap row access to A
    iter = 0
    while norm(-g) >= n * eps(1.0) && iter <= niter && !isnan(norm(-g))
        iter += 1
        if use_random
            idx = rand(1:n)             # pick the coordinate at random
        else
            idx = iter % n + 1          # cycle through the coordinates (1-based)
        end
        g[idx] =
            csc_column_projection(
                mat_transpose_csc.colptr,
                mat_transpose_csc.rowval,
                mat_transpose_csc.nzval,
                mat_transpose_csc.m,
                mat_transpose_csc.n,
                idx,
                x,
            ) - b[idx]
        x[idx] =
            x[idx] - (
                g[idx] /
                csc_lookup(A.colptr, A.rowval, A.nzval, A.m, A.n, idx, idx)
            )
        push!(residual, norm(g))
    end
    if iter < niter && !isnan(norm(g))
        println("Further iterations won't make difference.\nIterations=", iter)
    end
    return x, iter, residual, diverging_series
end
sol1 = A\b
sol2, _, _, _ = coordinate_descent_method(sparse(A), b, 100000, false)
sol3, _, _, _ = coordinate_descent_method(sparse(A), b, 100000, true)
display(sol1)
display(sol2)
display(sol3)
Output
Further iterations won't make difference.
Iterations=63
Further iterations won't make difference.
Iterations=31
3-element Vector{Float64}:
6.6
2.0
0.4
3-element Vector{Float64}:
6.6
2.0
0.4000000000000001
3-element Vector{Float64}:
6.599815672153635
2.0
0.40003072130772743
Problem 7
We want to solve the linear system of equations Ax = b by minimizing the function

f(x) = (1/2) xᵀ A x − xᵀ b

where A is SPD.
The generalized coordinate descent update step for two coordinates i and j is:

x_{k+1} = x_k + E_{ij} c_k

where E_{ij} = [e_i  e_j] (the matrix whose columns are e_i and e_j) and c_k = [c_i ; c_j].
Substituting this update step into f(x) and minimizing the change in objective value:

f(x_{k+1}) − f(x_k) = (1/2)(x_k + c_i e_i + c_j e_j)ᵀ A (x_k + c_i e_i + c_j e_j) − (x_k + c_i e_i + c_j e_j)ᵀ b − ((1/2) x_kᵀ A x_k − x_kᵀ b)
Taking partial derivatives with respect to c_i and c_j and setting them to 0, we get the
equations

[g_k]_i + c_i A_{i,i} + c_j A_{i,j} = 0
[g_k]_j + c_i A_{i,j} + c_j A_{j,j} = 0
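Solving this 2 × 2 linear system for c_i and c_j (for example by Cramer's rule) gives the
closed forms implemented in the code below:

c_i = ([g_k]_j A_{i,j} − [g_k]_i A_{j,j}) / (A_{i,i} A_{j,j} − A_{i,j}²)
c_j = ([g_k]_i A_{i,j} − [g_k]_j A_{i,i}) / (A_{i,i} A_{j,j} − A_{i,j}²)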
Thus we can now write the code for the 2-coordinate descent method. Care must be taken
that, if the indices are selected at random, they are distinct from each other; otherwise
the 2 × 2 system is singular and the updates divide by zero.
Julia Code
# 2-Coordinate Descent Method:
using SparseArrays, Random, StatsBase, Plots, LinearAlgebra, BenchmarkTools, Printf

A = [
    1 0 1
    0 4 0
    1 0 6
] # SPD
b = [
    7.0
    8.0
    9.0
]
function coordinate_descent_method_2(
A::SparseMatrixCSC,
b::Vector,
niter = 0,
use_random = false,
)
n = size(A, 1)
@assert(n == size(A, 2), "The matrix is not square")
    residual = Vector{Float64}()
    diverging_series = false
    x = copy(b)       # x0 = b
    g = A * x - b     # initial gradient
    iter = 0
while norm(-g) >= n * eps(1.0) && iter <= niter && !isnan(norm(-g))
iter += 1
if use_random
#select an index in gradient array at random
#they must be different from each other
idx1, idx2 = sample(1:n, 2, replace=false)
else
# select an index in gradient array cyclically
idx1 = iter % n + 1 # since Julia indices are 1 based
idx2 = (iter + 1) % n + 1
end
c1 =
(g[idx1] * A[idx2,idx2] - g[idx2] * A[idx1,idx2]) /
((A[idx1,idx2])^2 - A[idx1,idx1] * A[idx2,idx2])
c2 =
(g[idx2] * A[idx1,idx1] - g[idx1] * A[idx1,idx2]) /
((A[idx1,idx2])^2 - A[idx1,idx1] * A[idx2,idx2])
e1 = zeros(size(A,1))
e1[idx1] = 1
e2 = zeros(size(A,1))
e2[idx2] = 1
        x = x + [e1 e2] * [c1; c2]
        g = A * x - b
        push!(residual, norm(g))
    end
    if iter < niter && !isnan(norm(g))
        println("Further iterations won't make difference.\nIterations=", iter)
    end
    return x, iter, residual, diverging_series
end
sol1 = A \ b
sol2, _, _, _ = coordinate_descent_method_2(sparse(A), b, 100000, false)
sol3, _, _, _ = coordinate_descent_method_2(sparse(A), b, 100000, true)
display(sol1)
display(sol2)
display(sol3)
Output
Further iterations won't make difference.
Iterations=2
Further iterations won't make difference.
Iterations=3
3-element Vector{Float64}:
6.6
2.0
0.4
3-element Vector{Float64}:
6.6
2.0
0.4
3-element Vector{Float64}:
6.6
2.0
0.4
Problem 8.1
For both the coordinate descent method and the gradient descent method, in order for
the iterations to converge we need the matrix A in Ax = b to be SPD.
We know that the generated Laplacian matrix (with the boundary conditions removed)
is symmetric.
On checking the eigenvalues, all of them are real valued and lie between
0.012330665067488346 and 7.987669334932514.
Thus the matrix is SPD.
Hence, applying the steepest descent or coordinate descent method, the iterations will
converge; a quick numerical check of this is sketched below.
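A quick numerical check of these two facts (symmetry and positive eigenvalues), assuming
the laplacian_reduced function from Problem 1.1 is available, might look like:

using SparseArrays, LinearAlgebra
f(::Float64, ::Float64) = 1
A, _ = laplacian_reduced(40, f)                             # reduced Laplacian for n = 40
println("symmetric: ", issymmetric(A))
println("eigenvalue range: ", extrema(eigvals(Symmetric(Matrix(A)))))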
On running both methods on the n = 40 Laplacian matrix, we got similar solutions for x
from both methods.
The work done to reach a given relative residual is as follows:
    Relative Residual    Steepest Descent Work    Coordinate Descent Work
    1                    37625                    343450
    0.1                  5643750                  4756953
    0.01                 11287500                 9276724
    0.001                16931250                 13619145
    0.0001               22379350                 18124070
(Figures: work done vs. relative residual for coordinate descent and steepest descent, on
linear and log scales.)
Julia Code
using SparseArrays, LinearAlgebra, Plots

f(::Float64, ::Float64) = 1
A, b = laplacian_reduced(40, f)   # reduced Laplacian and RHS from Problem 1.1
# w  / r_r  : (work, relative residual) trace recorded from the steepest-descent run
# w1 / r_r1 : (work, relative residual) trace recorded from the coordinate-descent run
plot(
w1[1:2000:end],
r_r1[1:2000:end],
legend = false,
title = "Coord-Descent Work done vs relative residual",
)
xlabel!("Work done")
ylabel!("Relative Residual")
savefig("HW2/Prob8_1_CoordinateDesc")
plot(
w1[1:3000:end],
r_r1[1:3000:end],
yaxis = :log,
legend = false,
title = "Coord-Descent Work done vs relative residual (Log)",
)
xlabel!("Work done (log scale)")
ylabel!("Relative Residual")
savefig("HW2/Prob8_1_CoordinateDescLog")
plot(
w,
r_r,
legend = false,
title = "Steepest Descent Work done vs relative residual",
)
xlabel!("Work done")
ylabel!("Relative Residual")
savefig("HW2/Prob8_1_SteepestDesc")
plot(
w,
r_r,
yaxis = :log,
legend = false,
title = "Steepest Descent Work done vs relative residual (Log)",
)
xlabel!("Work done (log scale)")
ylabel!("Relative Residual")
savefig("HW2/Prob8_1_SteepestDescLog")
println("")
Problem 8.2
For both the coordinate descent method and the gradient descent method, in order for
the iterations to converge we need the matrix A in Ax = b to be SPD.
The matrix that corresponds to A in this system is I − Tᵀ.
On checking the eigenvalues, they are complex, but all have a positive real component.
To obtain a convergent formulation we can modify the system so that its matrix is SPD:
we solve the normal equations instead,

AᵀA x = Aᵀ b

Since AᵀA is an SPD matrix, we now have a system for which the iterations converge.
On checking the eigenvalues of AᵀA, they range from 0.0007665789654431876
to 2.0619327436207917.
Hence, applying the steepest descent or coordinate descent method, the iterations will
converge.
On running both methods on this modified system, we got similar solutions for x from
both methods.
The work done to reach a given relative residual is as follows:
    Relative Residual    Steepest Descent Work    Coordinate Descent Work
    0.1                  24423                    1984
    0.01                 2093400                  1983075
    0.001                4884600                  5250627
    0.0001               7265261                  8911588
The plots show a lot of noise, but the general trend of the residual diminishing with
increasing iterations still holds.
(Figures: work done vs. relative residual for both methods, on linear and log scales.)
Julia Code
CLMatrixDF =
CSV.read("HW2/chutes-and-ladders-matrix.csv", DataFrame, header = false)
CLMatrixCoords =
CSV.read("HW2/chutes-and-ladders-coords.csv", DataFrame, header = false)
Ivec = CLMatrixDF[!, 1]
Jvec = CLMatrixDF[!, 2]
Vvec = CLMatrixDF[!, 3]
xc = CLMatrixCoords[!, 1]
yc = CLMatrixCoords[!, 2]
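The listing stops after loading the CSV data. A minimal sketch of the remaining setup
described above (building A = I − Tᵀ, forming the normal equations, and handing them to
the Problem 5 and Problem 6 solvers) might look like the following; the construction of T
from the triplets and the 1-based indexing are assumptions:

using SparseArrays, LinearAlgebra
# Assumed continuation, not shown in the original listing.
T = sparse(Ivec, Jvec, Vvec, 101, 101)          # transition matrix (1-based indices assumed)
b0 = ones(101); b0[100] = 0.0                   # right-hand side [1 ... 1 0 1]' from Problem 4
A = sparse(1.0I, 101, 101) - sparse(T')         # A = I - T'
An = A' * A                                     # SPD normal-equations matrix
bn = Vector(A' * b0)
x_sd, = steepest_descent_method(An, bn, 10_000_000)            # Problem 5 solver
x_cd, = coordinate_descent_method(An, bn, 10_000_000, false)   # Problem 6 solver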
Problem 9
In the approximate model, we saw that if x is the vector of probabilities P(i is infected at
time t) and A is the adjacency matrix, then x after k steps is given by (taking x_0 as the
initial infections and ρ as the probability of getting infected):

x_k = (ρA)^k x_0

This tells us that, as the number of iterations increases, if we require the expected number
of infected nodes to converge to a fixed value, we need (ρA)^k to approach 0.
Thus the largest eigenvalue of ρA gives a bound on how large ρ can get before the
approximate model no longer converges: we need ρ λ_max(A) < 1, i.e. ρ < 1 / λ_max(A).
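As a quick numerical illustration (on a small, made-up adjacency matrix, not one from the
assignment), the threshold can be read off from the largest eigenvalue of A:

using LinearAlgebra
# Hypothetical 4-node graph, only to illustrate the threshold rho < 1/lambda_max(A)
A = [0 1 1 0;
     1 0 1 1;
     1 1 0 1;
     0 1 1 0.0]
lambda_max = maximum(abs.(eigvals(A)))
println("largest eigenvalue of A = ", lambda_max)
println("the approximate model stays bounded only for rho < ", 1 / lambda_max)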
Problem 10
Given:

D = the diagonal part of A
N = A − D = the off-diagonal part of A
||N x|| ≤ γ ||D x||   for all vectors x,

where ||·|| is the vector 2-norm and γ < 1.
Let us work with the operator 2-norm for matrices induced by this vector norm:

||A|| = max_{x ≠ 0} ||A x|| / ||x||
Now, from ||N x|| ≤ γ ||D x|| we get

||N x|| / ||D x|| ≤ γ

and since ||D x|| ≤ ||D|| ||x|| (dividing by the larger quantity only decreases the ratio),

max_{x ≠ 0} ||N x|| / (||D|| ||x||) ≤ γ
⇒ ||N|| / ||D|| ≤ γ
For the diagonal matrix D,

||D||² = max_{x ≠ 0} ( Σ_i D_{i,i}² x_i² ) / ( Σ_i x_i² )

Now we can argue that the right-hand side can only increase if we replace each D_{i,i} by
D_max (the diagonal entry of largest magnitude), i.e.

( Σ_i D_{i,i}² x_i² ) / ( Σ_i x_i² ) ≤ ( Σ_i D_max² x_i² ) / ( Σ_i x_i² ) = D_max²

and the vector attaining this value is the standard basis vector (an eigenvector of D)
corresponding to D_max. Hence

||D|| = D_max
Hence we can write

||N|| / ||D|| ≤ γ  ⇒  ||D⁻¹|| ||N|| ≤ γ

By the sub-multiplicative property of matrix norms,

||D⁻¹ N|| ≤ ||D⁻¹|| ||N|| ≤ γ

and since the spectral radius of a matrix is never larger than any matrix norm of it, we
can write

ρ(D⁻¹ N) ≤ γ < 1
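A small numerical check of this conclusion on a made-up, diagonally dominant test matrix
(γ is taken as the smallest constant with ||N x|| ≤ γ ||D x||, which equals ||N D⁻¹||):

using LinearAlgebra
# Hypothetical diagonally dominant test matrix, only to illustrate rho(D^-1 N) <= gamma < 1
A = [4.0 1.0 0.5;
     1.0 5.0 1.0;
     0.5 1.0 3.0]
D = Diagonal(diag(A))
N = A - D
gamma = opnorm(N * inv(D))                    # smallest gamma with ||N x|| <= gamma ||D x||
rho = maximum(abs.(eigvals(inv(D) * N)))      # spectral radius of D^-1 N
println("gamma = ", gamma, "   rho(D^-1 N) = ", rho)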