
HOMEWORK 2 SOLUTIONS

Purdue University · CS 51500


Akshya Gupta
Matrix Computations
September 27, 2021

Problem 0
I would like to thank my wife, Preeti Agarwal, whose understanding of matrices helped me derive the solutions to a number of problems in this homework.
In class, I got some help during the breakout sessions from Christos Boutsikas and Michael Beshear, who discussed parts of the problems with me.
I would also like to thank Professor Gleich, Nadim, and Sai for their help with the classes and homework.

I also took references and direction from Wikipedia.


Lastly, I took a lot of reference from the Julia online documentation.

Problem 1.1
For α = 1.0, we can see from the graph that the residual, instead of approaching 0, grows without bound.

Thus the Poisson equation for n = 10 will not converge using Richardson's method with α = 1.0.

Julia Code:
using SparseArrays, LinearAlgebra, Plots

function laplacian_reduced(n::Integer, f::Function)


if n < 3
display("Invalid input. n must be >= 3")
return
end
N = (n-1)^2 # number of element in u vector
fvec = zeros(N)

h = 1.0/(n)

# Form the sparse matrix:


# diagonal is all 4
# sub-diagonal and superdiagonal is repeated negative ones vector with every n-1 th element set to 0
# there are diagonals of negative ones which are n-1 away from main diagonal
diagVec = 4 .* ones(N)
sub_sup_diag = -1 .* ones(N-1)
for i=n-1:n-1:N-1
sub_sup_diag[i] = 0
end

length_identity_diagonal = (n-1) * ( n - 2)
vector_identity_diagonal = -1 .* ones(length_identity_diagonal)

position_identity_diagonal = n-1 # -(n-1) for sub and n+1 for super

A = spdiagm(-position_identity_diagonal=>vector_identity_diagonal,
-1 => sub_sup_diag,
0 => diagVec,
1 => sub_sup_diag,
position_identity_diagonal=>vector_identity_diagonal)
ctr = 1
for i=0:n
for j=0:n
if i==0 || j == 0 || i == n || j == n
continue
else
fvec[ctr] = -f(i*h, j*h)*h^2
ctr = ctr + 1
end
end
end
return A, fvec
end

function solve_linear_system_return_residual(
A::SparseMatrixCSC, b::Vector, alpha::Float64 = 1.0, niter=0)
n = size(A,1)
@assert(n == size(A,2), "The matrix is not square")

x = copy(b) # Initialize x0 with b


r = b - A*x # compute the initial residual
iter = 0

py = Vector{Float64}()
while norm(r) >= n*eps(1.0) && iter <= niter
x .+= alpha .* r
r = b - A*x
iter += 1
push!(py, norm(r))
end
println("Num iterations ", niter, " alpha ", alpha, " residual ", norm(r))
if iter < niter && norm(r) < n*eps(1.0) && !isnan(norm(r))
println("Further iterations will not make a difference.\nIteration=", iter)
else
println("Could not converge within the given number of iterations")
end
return x, iter, py
end

n=10
f(::Float64, ::Float64) = 1
A1, f1 = laplacian_reduced(n, f)
# run up to 100 iterations
u1, num_iters1, residual1 = solve_linear_system_return_residual(A1, f1, 1.0, 100)
px = 1:num_iters1

plot(px, residual1, title="Number of iterations vs residual", legend=false)


xlabel!("Iterations")
ylabel!("Residual")
savefig("IterationsVsResidualPlotAlpha1")

Problem 1.2
Consider the linear system of equation:

Ax = b

For convergence of the Richardson method, a sufficient condition is that some norm of the matrix I − αA is less than 1.

Using the infinity matrix norm,

||A||_∞ = max_i Σ_j |A_{i,j}|

we need:

||I − αA||_∞ < 1
⇒ max_i Σ_j |(I − αA)_{i,j}| < 1

This is the maximum absolute row sum over all rows. For the matrix I − αA, because of the structure of A, the rows give only two different sums:

|1 − 4α| + |α| + |α| = |1 − 4α| + 2|α|

and

|1 − 4α| + |α| + |α| + |α| = |1 − 4α| + 3|α|

Since α is greater than 0 by definition, the maximum (and thus the norm) is:

||I − αA||_∞ = |1 − 4α| + 3|α|

Thus the inequality we need to solve is

|1 − 4α| + 3α < 1

For 1 − 4α ≥ 0, i.e. α ≤ 1/4, we get:

1 − 4α + 3α < 1
⇒ 1 − α < 1
⇒ α > 0, which already holds.

For 1 − 4α < 0, i.e. α > 1/4, we get:

−1 + 4α + 3α < 1
⇒ 7α < 2
⇒ α < 2/7, which only adds the narrow interval 1/4 < α < 2/7, so we keep the simpler bound α ≤ 1/4.

Hence the range 0 < α ≤ 1/4 is sufficient for any n in the Poisson equation matrix.

We can verify this by setting α to 0.25 and 0.2 and checking the residual plots: the residual drops to an extremely small value within 576 and 724 iterations respectively.
Any α larger than 0.25 led to a diverging residual.

Julia Code (in addition to the code already given in Problem 1.1):
using Printf
for alph=0.25:-0.05:0.05
u, num_iters, residual = solve_linear_system_return_residual(A1, f1, alph, 20000)
px = 1:num_iters
plot(px,residual, title="Number of iterations vs residual", legend=false)
xlab = @sprintf("Iterations for alpha = %g", alph)
xlabel!(xlab)
ylabel!("Residual")
figname = @sprintf("IterationsVsResidualPlotAlpha%gN%g", alph*100, n)
savefig(figname)
end
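
As an additional cross-check of this range (an illustrative sketch, assuming the laplacian_reduced function from Problem 1.1 is in scope), we can look at the spectral radius of I − αA, which also governs convergence: for n = 10 it stays below 1 at α = 0.2 and 0.25 and exceeds 1 at α = 0.3, consistent with the divergence observed above 0.25.

using LinearAlgebra

A_check, _ = laplacian_reduced(10, (x, y) -> 1.0)
for alpha in (0.2, 0.25, 0.3)
    rho = maximum(abs.(eigvals(Matrix(I - alpha * A_check))))
    println("alpha = ", alpha, "   spectral radius of I - alpha*A = ", rho)
end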

Problem 1.3
For n = 20:
Through our experiments using the same code as in Problem 1.2, we saw that the same range of values for α derived in Problem 1.2 worked, i.e. 0 < α ≤ 1/4.

For example, with α set to 0.25 and 0.2, the residual converged to an extremely small value within 2172 and 2718 iterations respectively.

Problem 1.4
Through our experiments using the code from Problem 1.2, we saw the following numbers of iterations for the residual to converge for n = 10 and n = 20.

For n = 10:

    α      Iterations
    0.05   2939
    0.1    1462
    0.15   972
    0.2    724
    0.25   576

For n = 20:

    α      Iterations
    0.05   10910
    0.1    5448
    0.15   3628
    0.2    2718
    0.25   2172

The fewest iterations were required at α = 0.25 in both cases.

Problem 1.5
The theoretical derivation for this problem was already given in Problem 1.2: requiring ||I − αA||_∞ < 1 yields the sufficient range 0 < α ≤ 1/4, independent of n.

Problem 2.1
Given that ||·|| is a matrix norm and k ≠ 0 is a real-valued vector, define the function

f(x) = ||xkᵀ||

To prove that f(x) is a vector norm, we need to prove the following:

1. f(x) ≥ 0 for all x ≠ 0

For any matrix norm, ||A|| ≥ 0, with ||A|| > 0 for any A ≠ 0.
Since k ≠ 0 and x ≠ 0, there are nonzero entries in xkᵀ.
Thus xkᵀ ≠ 0, and by the matrix-norm property ||xkᵀ|| > 0; in particular f(x) ≥ 0.

2. f(x) = 0 iff x = 0

Suppose x = 0. Then xkᵀ is the zero matrix, so by the matrix-norm property ||xkᵀ|| = 0.
Now suppose ||xkᵀ|| = 0. By the matrix-norm property, this can only be true if xkᵀ is the zero matrix.
Since k is nonzero, xkᵀ can be the zero matrix only if x = 0.
Hence proved both ways.

3. f(αx) = |α| f(x)

f(αx) = ||αxkᵀ||
⇒ f(αx) = |α| · ||xkᵀ||   (by the homogeneity of matrix norms)
⇒ f(αx) = |α| · f(x)      (by definition)
Hence proved.

4. f(x + y) ≤ f(x) + f(y)

f(x + y) = ||(x + y)kᵀ||
⇒ f(x + y) = ||xkᵀ + ykᵀ||          (by distributivity of the outer product)
⇒ f(x + y) ≤ ||xkᵀ|| + ||ykᵀ||      (by the triangle inequality for matrix norms)

which is the same as saying

f(x + y) ≤ f(x) + f(y)

Hence proved.
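
As an illustrative numerical spot-check of these properties (not part of the original proof; fnorm below stands for the f(x) above, instantiated with the matrix 2-norm via opnorm and an arbitrary nonzero k):

using LinearAlgebra

k = [1.0, -2.0, 3.0]                 # an arbitrary nonzero vector
fnorm(x) = opnorm(x * k')            # x*k' is the rank-one matrix xkᵀ

x, y = randn(3), randn(3)
alpha = -2.5
println(fnorm(alpha .* x) ≈ abs(alpha) * fnorm(x))   # homogeneity
println(fnorm(x + y) <= fnorm(x) + fnorm(y) + 1e-12) # triangle inequality
println(fnorm(zeros(3)) == 0.0)                      # f(0) = 0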

Problem 2.2
Given that ||·|| is sub-multiplicative, i.e.

||AB|| ≤ ||A|| · ||B||

we want to prove that

f(Ax) ≤ ||A|| f(x)

Now,

f(Ax) = ||(Ax)kᵀ||

By the associativity of matrix multiplication (treating vectors as a special case of matrices), we can say

f(Ax) = ||A(xkᵀ)||

Since ||·|| is sub-multiplicative,

f(Ax) ≤ ||A|| · ||xkᵀ||

Thus, by the way the vector norm was defined, we can write

f(Ax) ≤ ||A|| f(x)

Hence f(x) is consistent with the matrix norm.
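
A quick numerical illustration of this consistency property, using the same 2-norm instantiation as in the previous sketch (again illustrative, not part of the original solution):

using LinearAlgebra

k = [1.0, -2.0, 3.0]
fnorm(x) = opnorm(x * k')

A = randn(3, 3)
x = randn(3)
println(fnorm(A * x) <= opnorm(A) * fnorm(x) + 1e-12)   # f(Ax) <= ||A|| f(x)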

Problem 3.1
Given the PageRank linear system

(I − α_pr P)x = (1 − α_pr)v

where v_i ≥ 0, Σ_i v_i = 1, P_{i,j} ≥ 0, Σ_i P_{i,j} = 1, and α_pr < 1.

Comparing this with the linear system of equations Ax = b, we have:

A = I − α_pr P

b = (1 − α_pr)v

In order for Richardson's method to converge, either ||I − A|| < 1 (in some norm) or ρ(I − A) < 1 must hold.
Thus we need to show that the largest eigenvalue modulus of the matrix I − (I − α_pr P) = α_pr P is less than 1.
We know that a matrix and its transpose have the same eigenvalues (because they have the same characteristic polynomial).
Now, let λ be an eigenvalue of Pᵀ with eigenvector w ≠ 0:

Pᵀw = λw

Let w_k be the entry of w with the largest absolute value. For the kth row of this equation, we can write:

(Pᵀ)_{k,1} w_1 + (Pᵀ)_{k,2} w_2 + … + (Pᵀ)_{k,n} w_n = λ w_k

|λ w_k| = |(Pᵀ)_{k,1} w_1 + (Pᵀ)_{k,2} w_2 + … + (Pᵀ)_{k,n} w_n|

By the triangle inequality,

|λ w_k| ≤ |(Pᵀ)_{k,1} w_1| + |(Pᵀ)_{k,2} w_2| + … + |(Pᵀ)_{k,n} w_n|

Since |w_k| is maximal,

|λ w_k| ≤ |(Pᵀ)_{k,1} w_k| + |(Pᵀ)_{k,2} w_k| + … + |(Pᵀ)_{k,n} w_k|

Since the entries of P are nonnegative and the rows of Pᵀ (the columns of P) sum to 1,

|λ w_k| ≤ ((Pᵀ)_{k,1} + (Pᵀ)_{k,2} + … + (Pᵀ)_{k,n}) |w_k|

|λ w_k| ≤ |w_k|

|λ| ≤ 1

Thus every eigenvalue of P has modulus at most 1. Since α_pr < 1, every eigenvalue of α_pr P has modulus less than 1, so the spectral radius of I − A = α_pr P is less than 1.
Hence Richardson's method will converge for the PageRank linear system.
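
An illustrative numerical check of this bound (a sketch with a randomly generated column-stochastic matrix P_check; this construction is separate from the matrices used in Problem 3.3):

using LinearAlgebra, Random

Random.seed!(1)
n = 50
M = rand(n, n)
P_check = M ./ sum(M, dims = 1)                 # normalize each column to sum to 1
alpha_pr = 0.85
rho = maximum(abs.(eigvals(alpha_pr .* P_check)))
println("spectral radius of alpha_pr * P = ", rho)   # equals alpha_pr < 1 up to rounding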

Problem 3.2
For Richardson's method, the update equations can be written as (with x_0 = b):

r_k = b − Ax_k

x_{k+1} = x_k + α r_k

In these iterations, we can optimize the matrix-vector product Ax_k.

With the CSC representation, we can easily get an optimized matrix-vector product.
Julia Code
using SparseArrays, LinearAlgebra, Plots, Printf

function csc_matvec(colptr, rowval, nzval, m, n, x)


y = zeros(m) # allocate output
for j = 1:length(colptr)-1 # for each column ...
for nzi = colptr[j]:colptr[j+1]-1 # for each entry in the column
i = rowval[nzi]
v = nzval[nzi]
y[i] += v * x[j] # same as indexed form!
end
end
return y
end

function solve_linear_system(
A::SparseMatrixCSC,
b::Vector,
alpha::Float64 = 1.0,
niter = 0,
)
n = size(A, 1)
@assert(n == size(A, 2), "The matrix is not square")

x = copy(b) # Initialize x0 with b


r = b - csc_matvec(A.colptr, A.rowval, A.nzval, A.m, A.n, x) # compute the initial residual
iter = 0

while norm(r)/norm(b) >= 1e-5 && iter <= niter


x .+= alpha .* r
r = b - csc_matvec(A.colptr, A.rowval, A.nzval, A.m, A.n, x)
iter += 1
end
println("Further iterations will not make a difference.\nIteration=", iter)
return x, iter
end

A = [0.7 0 0.4
0 1 0
0.3 0 0.6]
b = [1/4
1/2
1/4]
x1 = A\b
x2,_ = solve_linear_system(sparse(A), b, 1.0, 10000)

display(x1)
display(x2)
println("")

Output
Further iterations will not make a difference.
Iteration=25
3-element Vector{Float64}:
0.16666666666666669
0.5
0.3333333333333333
3-element Vector{Float64}:
0.1666778422384972
0.5
0.3333221577615028

Problem 3.3
We tested the implementation for correctness by first running it with P as a random 1000 × 1000 matrix and comparing against the direct solve. On the wikipedia-2005 matrix:
For α_PR = 0.85 and α_Richardson = 1.0, the algorithm reached a relative residual of 10⁻⁵ in 80 iterations.

For α_PR = 0.85 and α_Richardson = 0.5, the algorithm reached a relative residual of 10⁻⁵ in 167 iterations.

Julia Code
using SparseArrays, LinearAlgebra, Plots, Printf, ZipFile, DelimitedFiles, Random

Random.seed!(0)
function csc_matvec(colptr, rowval, nzval, m, n, x)
y = zeros(m) # allocate output
for j = 1:length(colptr)-1 # for each column ...
for nzi = colptr[j]:colptr[j+1]-1 # for each entry in the column
i = rowval[nzi]
v = nzval[nzi]
y[i] += v * x[j] # same as indexed form!
end
end
return y
end

function solve_linear_system(
A::SparseMatrixCSC,
b::Vector,
alpha::Float64 = 1.0,
niter = 0,
)
n = size(A, 1)
@assert(n == size(A, 2), "The matrix is not square")

x = copy(b) # Initialize x0 with b


r = b - csc_matvec(A.colptr, A.rowval, A.nzval, A.m, A.n, x) # compute the initial residual
iter = 0

while norm(r) / norm(b) >= 1e-5 && iter <= niter


x .+= alpha .* r
r = b - csc_matvec(A.colptr, A.rowval, A.nzval, A.m, A.n, x)
iter += 1
end

return x, norm(r) / norm(b), iter


end

P =
sprand(1_000, 1_000, 15 / 1_000) |>
A ->
begin
fill!(A.nzval, 1)
A
end |> A -> begin

ei, ej, ev = findnz(A)
d = sum(A; dims = 2)
return sparse(ej, ei, ev ./ d[ei], size(A)...)
end
colptr, rowval, nzval = P.colptr, P.rowval, P.nzval

function load_data()
r = ZipFile.Reader("HW2/wikipedia-2005.zip")
try
@assert length(r.files) == 1
f = first(r.files)
data = readdlm(f, '\n', Int)
n = data[1]
colptr = data[2:2+n] # colptr has length n+1
rowval = data[3+n:end] # n+2 elements before start of rowval
A =
SparseMatrixCSC(n, n, colptr, rowval, ones(length(rowval))) |>
A -> begin
ei, ej, ev = findnz(A)
d = sum(A; dims = 2)
return sparse(ej, ei, ev ./ d[ei], size(A)...)
end
finally
close(r)
end
end

P =
sprand(1_000, 1_000, 15 / 1_000) |>
A ->
begin
fill!(A.nzval, 1)
A
end |> A -> begin
ei, ej, ev = findnz(A)
d = sum(A; dims = 2)
return sparse(ej, ei, ev ./ d[ei], size(A)...)
end
v = 1/size(P,1) .* ones(size(P,1))
alpha_pr = 0.85
A = (I(size(P,1)) - alpha_pr .* P)
b = (1.0-alpha_pr) .* v
x_actual = A\b
x, r_r, iters = solve_linear_system(sparse(A), b, 1.0, 10000000)
@printf("\nRelative Residual = %g Iterations Taken = %g\n", r_r, iters)
@printf("Norm of x_actual - x = %g", norm(x_actual .- x))

P = load_data()
v = 1/size(P,1) .* ones(size(P,1))

alpha_pr = 0.85
alpha_r = 1.0
A = (I(size(P,1)) - alpha_pr .* P)
b = (1.0-alpha_pr) .* v

x1, r_r1, iters1 = solve_linear_system(sparse(A), b, alpha_r, 10000000)
@printf("\nRelative Residual = %g Iterations Taken = %g\n", r_r1, iters1)

alpha_pr = 0.85
alpha_r = 0.5
A = (I(size(P,1)) - alpha_pr .* P)
b = (1.0-alpha_pr) .* v

x2, r_r2, iters2 = solve_linear_system(sparse(A), b, alpha_r, 10000000)


@printf("\nRelative Residual = %g Iterations Taken = %g\n", r_r2, iters2)

output
julia>

Relative Residual = 8.59502e-06 Iterations Taken = 71


Norm of x_actual - x = 2.71798e-07
Relative Residual = 9.28214e-06 Iterations Taken = 80

Relative Residual = 9.51691e-06 Iterations Taken = 167

Problem 4.1
From HW1 Problem 7 we were given T as the sparse matrix representing the transition probabilities from one state to another:

T_{i,j} = probability of transitioning from state j to state i

State 101 is the starting state and state 100 is the ending state.

Through the analysis in HW1 Problem 7.1, we wrote a linear system of equations for the expected length of the game from each starting state:

(I − T′)x = [1 1 1 … 1 1 0 1]′

Thus x is obtained by solving

x = (I − T′)⁻¹ [1 1 1 … 1 1 0 1]′

On solving the above linear system, we get the expected number of steps from the start:

x_101 ≈ 40

Now we want to solve the above system using Richardson's method.

Using Richardson's method, we can solve the linear system Ax = b provided the step factor α satisfies

ρ(I − αA) < 1

where ρ(B) is the spectral radius of the matrix B.

So, for the Chutes and Ladders expected-length system, starting with α = 1, we observed that the eigenvalue moduli of I − αA lie in the range (0.45, 0.96).

⇒ The spectral radius is less than 1.

Thus for α = 1, Richardson's method for this system converges.
Running Richardson's method for different values of α and measuring the number of iterations and the time taken until convergence (or divergence), we can make the following inferences:

• The value of α requiring the fewest iterations is around 1.4 (see the spectral-radius sketch after the Julia code below).
• This value of α also takes the least amount of time (measured by benchmarking).
• This is also reflected in how quickly the residual falls for the different values of α.
• In the graphs of α vs number of iterations and α vs time taken, the curves drop after 1.5 because the program was set to exit early when the residual norm exceeded its limits.

[Figures: number of iterations vs residual for different values of α; α vs number of iterations; α vs time taken]

Julia Code
using CSV, DataFrames, SparseArrays, Plots, LinearAlgebra,
BenchmarkTools, Printf

CLMatrixDF = CSV.read(
"chutes-and-ladders-matrix.csv", DataFrame, header=false)
CLMatrixCoords = CSV.read(
"chutes-and-ladders-coords.csv", DataFrame, header=false)

Ivec = CLMatrixDF[!,1]
Jvec = CLMatrixDF[!,2]
Vvec = CLMatrixDF[!,3]

xc = CLMatrixCoords[!,1]
yc = CLMatrixCoords[!,2]

function find_spectral_radius(A::SparseMatrixCSC, alpha::Float64 = 1.0)


n = size(A,1)
@assert(n == size(A,2), "The matrix is not square")
E = eigen(Matrix(sparse(I, n, n) - alpha .* A))
rhoA = maximum(abs.(E.values))
return rhoA
end

function solve_linear_system(
A::SparseMatrixCSC, b::Vector, alpha::Float64 = 1.0, niter=0)
n = size(A,1)
@assert(n == size(A,2), "The matrix is not square")
residual = Vector{Float64}()
diverging_series = false

# Setting the convergence tolerance to 1000*n*eps(1.0), since that is still
# a small enough stopping condition and helps cut down iterations
x = copy(b) # Initialize x0 with b
r = b - A*x # compute the initial residual
iter = 0

while norm(r) >= 1000*n*eps(1.0) && iter <= niter && !isnan(norm(r))
x .+= alpha .* r
r = b - A*x
iter += 1
push!(residual, norm(r))
end

if iter < niter && norm(r) < 1000*n*eps(1.0) && !isnan(norm(r))


println("Further iterations won't make difference.\nIterations=", iter)
else
println("Could not converge within the given number of iterations")
# so that we know that non-convergence happened
diverging_series = true
end

return x, iter, residual, diverging_series


end

# the sparse matrix representation of chutes and ladders


# game transition matrix
T = sparse(Ivec, Jvec, Vvec, 101, 101)
# Starting state = 101
# Ending state = 100
Id101 = zeros(101,101)
Id101[diagind(Id101)] .= 1
b=ones(101)
b[end-1] = 0
c = zeros(101)
c[end] = 1

plot_array = Any[]
for alph in [0.1, 0.4, 0.7, 1.0, 1.4, 1.5]
spectral_rad = find_spectral_radius(sparse(Id101 - T'), alph)
# not checking the spectral radius here because we want to show the
# divergence of the iterations
solution, iterations, residual, _ = solve_linear_system(
sparse(Id101 - T'), b, alph, 10000)
title = @sprintf("alpha = %g", alph)
push!(plot_array,plot(1:iterations,residual, title=title, legend=false))
end
plot(plot_array..., layout = (3,2))
savefig("HW2_4_1_IterationsVsResidualForAlpha")

px = Any[]
py = Any[]
for alph = 0.1:0.1:1.7
spectral_rad = find_spectral_radius(sparse(Id101 - T'), alph)
# not checking the spectral radius here because we want to show the
# divergence of the iterations
solution, iterations, _, _ = solve_linear_system(
sparse(Id101 - T'), b, alph, 10000)

push!(px, alph)
push!(py, iterations)
end
plot(px, py, title = "Alpha vs number of iterations", legend = false)
xlabel!("alpha")
ylabel!("Number of iterations")
savefig("HW2_4_1_AlphaVsIterations")

timeBenchmark = Any[]
for alph = 0.1:0.1:1.7
spectral_rad = find_spectral_radius(sparse(Id101 - T'), alph)
# not checking the spectral radius here because we want to show the
# divergence of the iterations
t = @belapsed solve_linear_system(sparse(Id101 - T'), b, $alph, 100000)
push!(timeBenchmark, t)
end
plot(0.1:0.1:1.7, 1000 .* timeBenchmark,
title = "Alpha vs time elapsed in iterations",
legend=false)
xlabel!("alpha")
ylabel!("Time (msec)")
savefig("HW2_4_1_AlphaVsTimeTaken")

Problem 4.2
In the linear system for the expected length of the Chutes and Ladders game,

x = (I − T′)⁻¹ [1 1 1 … 1 1 0 1]′

the expected length of the game is given by the 101st entry of the solution vector, i.e.

E = [0 0 0 … 0 1] (I − T′)⁻¹ [1 1 1 … 1 1 0 1]′

Now, since E is a scalar it equals its own transpose; using the facts that the inverse of a transpose is the transpose of the inverse and that (I − T′)′ = I − T, we can rewrite this as:

E = ([0 0 0 … 0 1] (I − T′)⁻¹ [1 1 1 … 1 1 0 1]′)′

⇒ E = [1 1 1 … 1 1 0 1] (I − T)⁻¹ [0 0 0 … 0 1]′

Consider the part

C = [1 1 1 … 1 1 0 1] (I − T)⁻¹

By the Neumann series expansion,

⇒ C = ([1 1 1 … 1 1 1 1] − [0 0 0 … 0 0 1 0]) (I + T + T² + T³ + …)

⇒ C = [1 1 1 … 1 1 1 1] (I + T + T² + T³ + …) − [0 0 0 … 0 0 1 0] (I + T + T² + T³ + …)

Consider the 101st entry of the term

C1 = [1 1 1 … 1 1 1 1] (I + T + T² + T³ + …)

The column sums of Tᵏ (which reflect the states that will eventually lead to a conclusion) start at 1 and gradually diminish to an extremely small value as k increases.

Now we need to show that E is the same as Σ_{k=1}^∞ k [p_k]_{100}, where p_k = Tᵏ t_101 and t_101 = [0 0 0 … 0 1]′. Since t_101 = [0 0 0 … 0 1]′, we need to show that C t_101 = Σ_{k=1}^∞ k [p_k]_{100}, or equivalently that

[1 1 1 … 1 1 0 1] (I + T + T² + T³ + …)

is equal to

[0 0 0 … 0 0 1 0] Σ_{k=1}^∞ k Tᵏ⁻¹
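
The key tool above is the Neumann series (I − T)⁻¹ = I + T + T² + T³ + …, valid when the spectral radius of T is less than 1. A standalone numerical illustration on a small random matrix (made up for this sketch, not the Chutes and Ladders T):

using LinearAlgebra, Random

Random.seed!(2)
n = 5
B = 0.1 .* rand(n, n)                   # row sums are at most 0.5, so spectral radius < 1
neumann = sum(B^k for k in 0:200)       # truncated series I + B + B^2 + ...
println(norm(inv(I - B) - neumann))     # ≈ 0: the series matches (I - B)^-1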

Problem 5
To implement the gradient (steepest) descent method for the equation Ax = b, the recurrence equations are:

g_k = Ax_k − b

α_k = (g_kᵀ g_k) / (g_kᵀ A g_k)

x_{k+1} = x_k − α_k g_k

The first equation can be rewritten for g_{k+1} as:

g_{k+1} = Ax_{k+1} − b

⇒ g_{k+1} = A(x_k − α_k g_k) − b
⇒ g_{k+1} = Ax_k − b − α_k A g_k
⇒ g_{k+1} = g_k − α_k A g_k

Thus we can rewrite the recurrence so that only a single matrix-vector product (A g_k) is computed per iteration. Initialize

x_0 = b

g_0 = Ax_0 − b

Then

α_k = (g_kᵀ g_k) / (g_kᵀ A g_k)

x_{k+1} = x_k − α_k g_k

g_{k+1} = g_k − α_k A g_k

We use the stopping condition ||−g_k|| < n · eps(1.0), where eps(1.0) is the machine epsilon provided by Julia.

Julia Code
# Gradient Descent Method:
using SparseArrays, Plots, LinearAlgebra,
BenchmarkTools, Printf

A = [1 0 0
0 4 5
0 5 6] # symmetric (note: not strictly positive definite, but the iteration still converges here)
b = [7
8
9]
function steepest_descent_method(A::SparseMatrixCSC, b::Vector, niter=0)
n = size(A,1)
@assert(n == size(A,2), "The matrix is not square")
residual = Vector{Float64}()
diverging_series = false

x = copy(b) # Initialize x0 with b


g = A*x - b # compute the initial gradient
iter = 0

while norm(-g) >= n*eps(1.0) && iter <= niter && !isnan(norm(-g))
iter += 1

mat_vec_pdt = A * g
alpha = (g' * g)/(g' * mat_vec_pdt)
x = x - alpha .* g
g = g - alpha .* mat_vec_pdt
end

if iter < niter && norm(-g) < n*eps(1.0) && !isnan(norm(-g))


println("Further iterations won't make difference.\nIterations=", iter)
else
println("Could not converge within the given number of iterations")
# so that we know that non-convergence happened
diverging_series = true
end

return x, iter, residual, diverging_series


end

sol1 = A\b
sol2, _, _, _ = steepest_descent_method(sparse(A), b, 10000)

display(sol1)
display(sol2)

Output
Further iterations won't make difference.
Iterations=13
3-element Vector{Float64}:
7.0
-3.0000000000000155
4.000000000000013
3-element Vector{Float64}:
7.0
-2.999999999999985
3.999999999999986

Problem 6
The coordinate descent method can be implemented as (taking x_0 = b):

g_{k+1} = Ax_k − b

x_{k+1} = x_k + γ e_i

where γ = −g_i / A_{i,i} and e_i is a vector of the same size as b with a 1 at index i and 0 everywhere else. The index i can be picked at random or cyclically.

This can be rewritten as (again selecting the index i through some means):

g_{k+1} = Ax_k − b

x_{k+1}[i] = x_k[i] − g_i / A_{i,i}

Given that the input matrix is sparse, we can optimize by not calculating the complete vector g at each step.

This can be done by using just the row of the matrix A indexed by i. So the final algorithm is:

(g_i)_{k+1} = A[i, :] x_k − b[i]

x_{k+1}[i] = x_k[i] − g_i / A_{i,i}

We can use the CSC representation of Aᵀ (which plays the role of a CSR representation of A) and take the column projection corresponding to the required row of the original matrix. This way, the work per update is at most the number of nonzero entries in that row.

Julia Code
# Coordinate Descent Method:
using SparseArrays, Random,
Plots, LinearAlgebra, BenchmarkTools, Printf

A = [1 0 1
0 4 0
1 0 6] # SPD
b = [7.0
8.0
9.0]
# To get consistent results
Random.seed!(10)

# This method can be used to get the matrix-row-to-vector multiplication


# by first taking a transpose of the matrix and then taking the column
# projection
""" Returns = A[:,i]'*x where A is given by the CSC arrays
colptr, rowval, nzval, m, n and x is the vector. """
function csc_column_projection(colptr, rowval, nzval, m, n, i, x)
y = 0.0
for nzi = colptr[i]:colptr[i+1]-1
row = rowval[nzi] # row index of this nonzero entry
v = nzval[nzi]
y += v * x[row]
end
return y
end

""" Returns rho = A[i,j] where A is given by the CSC arrays


colptr, rowval, nzval, m, n and i, and j are the column indices. """
function csc_lookup(colptr, rowval, nzval, m, n, i, j)
for nzi = colptr[j]:colptr[j+1]-1
if i == rowval[nzi]
return nzval[nzi]
end
end
return 0
end

function coordinate_descent_method(
A::SparseMatrixCSC,
b::Vector,
niter = 0,
use_random = false,
)
n = size(A, 1)
@assert(n == size(A, 2), "The matrix is not square")
residual = Vector{Float64}()
diverging_series = false

x = copy(b) # Initialize x0 with b


g = A * x - b
iter = 0
mat_transpose_csc = copy(A')
while norm(-g) >= n * eps(1.0) && iter <= niter && !isnan(norm(-g))
iter += 1
if use_random
#select an index in gradient array at random
idx = round(rand(1:n))
else
# select an index in gradient array cyclically
idx = iter % n + 1 # since Julia indices are 1 based
end

g[idx] =
csc_column_projection(
mat_transpose_csc.colptr,
mat_transpose_csc.rowval,
mat_transpose_csc.nzval,
mat_transpose_csc.m,
mat_transpose_csc.n,
idx,
x,
) - b[idx]

x[idx] =
x[idx] - (
g[idx] /
csc_lookup(A.colptr, A.rowval, A.nzval, A.m, A.n, idx, idx)
)

end

if iter < niter && norm(-g) < n * eps(1.0) && !isnan(norm(-g))


println("Further iterations won't make difference.\nIterations=", iter)
else
println("Could not converge within the given number of iterations")
# so that we know that non-convergence happened
diverging_series = true
end

return x, iter, residual, diverging_series


end

sol1 = A\b
sol2, _, _, _ = coordinate_descent_method(sparse(A), b, 100000, false)
sol3, _, _, _ = coordinate_descent_method(sparse(A), b, 100000, true)

display(sol1)
display(sol2)
display(sol3)

Output
Further iterations won't make difference.
Iterations=63
Further iterations won't make difference.
Iterations=31
3-element Vector{Float64}:
6.6
2.0
0.4
3-element Vector{Float64}:
6.6
2.0
0.4000000000000001
3-element Vector{Float64}:
6.599815672153635
2.0
0.40003072130772743

Problem 7
We want to solve the linear system Ax = b, where A is SPD, by minimizing the function

f(x) = ½ xᵀAx − xᵀb

The generalized coordinate descent update step for two coordinates is:

x_{k+1} = x_k + E_{ij} c_k

where E_{ij} = [e_i e_j] and c_k = [c_i; c_j].

Putting this update step into the expression for f(x), we try to minimize the difference:

error = f(x_{k+1}) − f(x_k)
      = ½ (x_k + c_i e_i + c_j e_j)ᵀ A (x_k + c_i e_i + c_j e_j) − (x_k + c_i e_i + c_j e_j)ᵀ b − (½ x_kᵀ A x_k − x_kᵀ b)

Expanding,

⇒ error = ½ x_kᵀ A x_k + ½ c_i x_kᵀ A e_i + ½ c_j x_kᵀ A e_j
        + ½ c_i e_iᵀ A x_k + ½ c_i² e_iᵀ A e_i + ½ c_i c_j e_iᵀ A e_j
        + ½ c_j e_jᵀ A x_k + ½ c_j² e_jᵀ A e_j + ½ c_j c_i e_jᵀ A e_i
        − x_kᵀ b − c_i e_iᵀ b − c_j e_jᵀ b − ½ x_kᵀ A x_k + x_kᵀ b

Using the symmetry of A (i.e. xᵀAy = yᵀAx) and removing the terms that cancel, we can write:

⇒ error = c_i e_iᵀ A x_k + c_j e_jᵀ A x_k + c_i c_j e_iᵀ A e_j
        + ½ c_i² e_iᵀ A e_i + ½ c_j² e_jᵀ A e_j
        − c_i e_iᵀ b − c_j e_jᵀ b

Setting g_k = A x_k − b and rearranging, we can write:

⇒ error = c_i e_iᵀ g_k + c_j e_jᵀ g_k + c_i c_j e_iᵀ A e_j
        + ½ c_i² e_iᵀ A e_i + ½ c_j² e_jᵀ A e_j

Taking partial derivatives with respect to c_i and c_j and setting them to 0, we get the equations:

e_iᵀ g_k + c_j e_iᵀ A e_j + c_i e_iᵀ A e_i = 0

e_jᵀ g_k + c_i e_iᵀ A e_j + c_j e_jᵀ A e_j = 0

Or, rewriting:

[g_k]_i + c_j A_{i,j} + c_i A_{i,i} = 0

[g_k]_j + c_i A_{i,j} + c_j A_{j,j} = 0

Solving this 2×2 linear system gives:

c_i = ([g_k]_i A_{j,j} − [g_k]_j A_{i,j}) / (A_{i,j}² − A_{i,i} A_{j,j})

c_j = ([g_k]_j A_{i,i} − [g_k]_i A_{i,j}) / (A_{i,j}² − A_{i,i} A_{j,j})

Thus we can now write the code for the 2-coordinate descent method. Care must be taken that, if the indices are selected at random, they must be different from each other; otherwise the denominator A_{i,j}² − A_{i,i} A_{j,j} vanishes and the update is undefined.

Julia Code
# 2-Coordinate Descent Method:
using SparseArrays, Random, StatsBase,
Plots, LinearAlgebra, BenchmarkTools, Printf

A = [
1 0 1
0 4 0
1 0 6
] # SPD
b = [
7.0
8.0
9.0
]

function coordinate_descent_method_2(
A::SparseMatrixCSC,
b::Vector,
niter = 0,
use_random = false,
)
n = size(A, 1)
@assert(n == size(A, 2), "The matrix is not square")
residual = Vector{Float64}()
diverging_series = false

x = copy(b) # Initialize x0 with b


g = A * x - b

iter = 0

while norm(-g) >= n * eps(1.0) && iter <= niter && !isnan(norm(-g))
iter += 1
if use_random
#select an index in gradient array at random
#they must be different from each other
idx1, idx2 = sample(1:n, 2, replace=false)
else
# select an index in gradient array cyclically
idx1 = iter % n + 1 # since Julia indices are 1 based
idx2 = (iter + 1) % n + 1
end

c1 =
(g[idx1] * A[idx2,idx2] - g[idx2] * A[idx1,idx2]) /
((A[idx1,idx2])^2 - A[idx1,idx1] * A[idx2,idx2])
c2 =
(g[idx2] * A[idx1,idx1] - g[idx1] * A[idx1,idx2]) /
((A[idx1,idx2])^2 - A[idx1,idx1] * A[idx2,idx2])

e1 = zeros(size(A,1))
e1[idx1] = 1

e2 = zeros(size(A,1))
e2[idx2] = 1

x = x + [e1 e2]*[c1;c2]
g = A * x - b
end

if iter < niter && norm(-g) < n * eps(1.0) && !isnan(norm(-g))


println("Further iterations won't make difference.\nIterations=", iter)
else
println("Could not converge within the given number of iterations")
# so that we know that non-convergence happened
diverging_series = true
end

return x, iter, residual, diverging_series


end

sol1 = A \ b
sol2, _, _, _ = coordinate_descent_method_2(sparse(A), b, 100000, false)
sol3, _, _, _ = coordinate_descent_method_2(sparse(A), b, 100000, true)

display(sol1)
display(sol2)
display(sol3)

Output
julia>
Further iterations won't make difference.
Iterations=2
Further iterations won't make difference.
Iterations=3
3-element Vector{Float64}:
6.6
2.0
0.4
3-element Vector{Float64}:
6.6
2.0
0.4
3-element Vector{Float64}:
6.6
2.0
0.4

Problem 8.1
For both coordinate descent method and gradient descent method, in order for
the iterations to converge, we need that the matrix A in Ax = b must be SPD.
We know that the Laplacian matrix generated (with boundary conditions removed)
is symmetric.
On checking out the eigen-values, all of them are real valued and lie between
0.012330665067488346 and 7.987669334932514
Thus the matrix is SPD.
Hence applying steepest descent or coordinate-descent method, the iterations will
converge.
On running both methods for the n=40 Laplacian matrix, we got similar solutions
of x by both methods.
The work done for the relative residual looks like following:
Relative Residual Steepest Descent Work Coordinate Descent Work
1 37625 343450
0.1 5643750 4756953
0.01 11287500 9276724
0.001 16931250 13619145
0.0001 22379350 18124070
The Plots look like:

[Figures: work done vs relative residual for coordinate descent and steepest descent, on linear and log scales]
Julia Code
f(::Float64, ::Float64) = 1
A, b = laplacian_reduced(40, f)

x1, w1, r_r1, _ = coordinate_descent_method(A, b, 10000000, true)

plot(
w1[1:2000:end],
r_r1[1:2000:end],
legend = false,
title = "Coord-Descent Work done vs relative residual",
)
xlabel!("Work done")
ylabel!("Relative Residual")
savefig("HW2/Prob8_1_CoordinateDesc")

plot(
w1[1:3000:end],
r_r1[1:3000:end],
yaxis = :log,
legend = false,
title = "Coord-Descent Work done vs relative residual (Log)",
)
xlabel!("Work done (log scale)")
ylabel!("Relative Residual")
savefig("HW2/Prob8_1_CoordinateDescLog")

x, w, r_r, _ = steepest_descent_method(A, b, 20000000)

plot(
w,
r_r,
legend = false,
title = "Steepest Descent Work done vs relative residual",
)

xlabel!("Work done")
ylabel!("Relative Residual")
savefig("HW2/Prob8_1_SteepestDesc")

plot(
w,
r_r,
yaxis = :log,
legend = false,
title = "Steepest Descent Work done vs relative residual (Log)",
)
xlabel!("Work done (log scale)")
ylabel!("Relative Residual")
savefig("HW2/Prob8_1_SteepestDescLog")

println("")

Problem 8.2
For both the coordinate descent method and the gradient descent method, the matrix A in Ax = b must be SPD for the iterations to converge.
The matrix corresponding to A in this system is I − Tᵀ.
Checking its eigenvalues, they are complex but all have positive real part. To obtain an SPD system, we can modify the equation: we instead solve AᵀAx = Aᵀb.
Since AᵀA is an SPD matrix, we now have a system for which the methods converge.
Checking the eigenvalues of AᵀA, they range from 0.0007665789654431876 to 2.0619327436207917.
Hence, applying steepest descent or coordinate descent, the iterations will converge.
Running both methods on the modified system, we got similar solutions x from both methods.
The work done to reach a given relative residual is as follows:

    Relative Residual    Steepest Descent Work    Coordinate Descent Work
    0.1                  24423                    1984
    0.01                 2093400                  1983075
    0.001                4884600                  5250627
    0.0001               7265261                  8911588

The plots look like the following (there is a lot of noise, but the general trend of the residual diminishing with increasing iterations still holds):

[Figures: work done vs relative residual for steepest descent and coordinate descent on the modified system]
Julia Code
CLMatrixDF =
CSV.read("HW2/chutes-and-ladders-matrix.csv", DataFrame, header = false)
CLMatrixCoords =
CSV.read("HW2/chutes-and-ladders-coords.csv", DataFrame, header = false)

Ivec = CLMatrixDF[!, 1]
Jvec = CLMatrixDF[!, 2]
Vvec = CLMatrixDF[!, 3]

xc = CLMatrixCoords[!, 1]
yc = CLMatrixCoords[!, 2]

T = sparse(Ivec, Jvec, Vvec, 101, 101)


# Starting state = 101
# Ending state = 100
Id101 = zeros(101, 101)
Id101[diagind(Id101)] .= 1
b_temp = ones(101)
b_temp[end-1] = 0

temp = (Id101 - T')

A = temp'*temp
b = temp'*b_temp   # right-hand side of the normal equations, (I - T')' * b_temp
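
The calls that produced the data above are not shown in the listing; a minimal usage sketch, under the assumption that the (instrumented) steepest_descent_method and coordinate_descent_method from Problems 5 and 6 are in scope (the iteration caps here are arbitrary):

using SparseArrays, LinearAlgebra

x_sd, _, _, _ = steepest_descent_method(sparse(A), b, 20000000)
x_cd, _, _, _ = coordinate_descent_method(sparse(A), b, 10000000, true)
println(norm(x_sd - x_cd))    # compare the two solutions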

Problem 9
In the approximate model, we saw that if x is the vector of probabilities P(node i is infected at time T) and A is the adjacency matrix, then x after k steps is given by (taking x_0 as the initial infections and ρ as the probability of getting infected):

[x_k]_i = max([(ρA)ᵏ x_0]_i, [x_0]_i)

Here we assume that once a node is infected, it stays infected.

The largest eigenvalue of ρA in absolute value is the spectral radius of ρA.
From the graphs, we observe that as long as the spectral radius stays below 1, the curves converge to a fixed value; as soon as the spectral radius goes beyond 1, the approximate-model curves start to diverge.
We also know that if the spectral radius of a matrix is less than 1, then

(ρA)ᵏ → 0 as k → ∞

This tells us that, as the number of iterations increases, the expected number of infected nodes can only converge to a fixed value if (ρA)ᵏ approaches 0.
Thus the largest eigenvalue of A gives a bound on how large ρ can get before the approximate model no longer converges: we need ρ < 1/λ_max(A).
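
A small illustrative check of this threshold on a randomly generated undirected graph (the graph here is made up for the sketch, not the homework data): the model can only settle when ρ is below the reciprocal of the largest eigenvalue of A.

using LinearAlgebra, Random

Random.seed!(3)
n = 200
B = Float64.(rand(n, n) .< 0.03)        # random 0/1 entries
A_adj = triu(B, 1) + triu(B, 1)'        # symmetric adjacency matrix, zero diagonal
lam_max = maximum(abs.(eigvals(A_adj)))
println("largest eigenvalue: ", lam_max)
println("threshold on rho:   ", 1 / lam_max)   # (rho*A)^k -> 0 only when rho is below this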

Problem 10
Given:
D = the diagonal entries of A
N = A − D = the off-diagonal entries of A
||Nx|| ≤ γ ||Dx||
for the vector 2-norm, with γ < 1, for all vectors x.
Let us use the operator 2-norm for matrices; with ||·|| denoting the 2-norm,

||A|| = max_{x ≠ 0} ||Ax|| / ||x||

Now,

||Nx|| ≤ γ ||Dx||

⇒ ||Nx|| / ||Dx|| ≤ γ

By the sub-multiplicative property of the operator 2-norm,

||Dx|| ≤ ||D|| ||x||

Thus we can write:

||Nx|| / (||D|| ||x||) ≤ γ

Since the inequality holds for all x, we can write

max_{x ≠ 0} ||Nx|| / (||D|| ||x||) ≤ γ

⇒ ||N|| / ||D|| ≤ γ

Now, for the operator 2-norm of a diagonal matrix, we can write:

||D|| = max_{x ≠ 0} ||Dx|| / ||x||

||D||² = max_{x ≠ 0} (Σ_i D_{i,i}² x_i²) / (Σ_i x_i²)

The right-hand side can only increase if we replace each D_{i,i}² by D_max², where D_max = max_i |D_{i,i}|:

(Σ_i D_{i,i}² x_i²) / (Σ_i x_i²) ≤ (Σ_i D_max² x_i²) / (Σ_i x_i²) = D_max²

and this value is attained at the standard basis vector corresponding to D_max. Hence

||D|| = D_max

Note that for an invertible diagonal matrix the same argument gives ||D⁻¹|| = 1/D_min, where D_min = min_i |D_{i,i}|, so the bound ||N||/||D|| ≤ γ cannot be combined with sub-multiplicativity directly. A direct argument finishes the proof: let λ be any eigenvalue of D⁻¹N with eigenvector v ≠ 0 (D must be invertible for D⁻¹N to be defined). Then Nv = λDv, so by the hypothesis

||Nv|| = |λ| ||Dv|| ≤ γ ||Dv||

Since D is invertible and v ≠ 0, we have ||Dv|| > 0, and hence |λ| ≤ γ. Therefore

ρ(D⁻¹N) ≤ γ < 1

which is the requirement for convergence of the Jacobi iteration. Hence proved.
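
An illustrative numerical check of this relationship on a small diagonally dominant matrix (the matrix below is made up for the sketch): the smallest γ satisfying ||Nx|| ≤ γ||Dx|| for all x is ||N D⁻¹||₂, and the spectral radius of D⁻¹N indeed stays below it.

using LinearAlgebra

A = [4.0 1.0 0.5;
     1.0 5.0 1.0;
     0.5 1.0 3.0]                       # diagonally dominant test matrix
D = Diagonal(diag(A))
N = A - D
gamma = opnorm(N * inv(D))              # smallest gamma with ||N*x|| <= gamma*||D*x||
rho = maximum(abs.(eigvals(inv(D) * N)))
println("gamma = ", gamma, ",  rho(D^-1 N) = ", rho)   # rho <= gamma < 1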

