Test 02
Date: 09/12/2023
Time: 90 minutes
Instruction: Only two handwritten A4 notes are permitted in the examination room.
Consider the problem

    minimize   h(a₁ᵀx − b₁) + · · · + h(aₘᵀx − bₘ),

where x ∈ Rⁿ is the variable, the vectors a₁, . . . , aₘ ∈ Rⁿ and b ∈ Rᵐ are given, and h is the function given by

    h(z) = 0         if |z| ≤ 1,
    h(z) = |z| − 1   if |z| > 1.

Note that this problem can be thought of as a sort of hybrid between the ℓ₁- and ℓ∞-norms, since there is no cost for residuals smaller than one, and a linearly growing cost for residuals larger than one.
a) Graph the function h and show that h(z) = max{(−z − 1), 0, (z − 1)}.
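As a quick numerical check of part a), the piecewise definition and the max form can be compared on a grid; this is an illustrative Python sketch, with the grid range chosen arbitrarily:

```python
import numpy as np

def h_piecewise(z):
    # h(z) = 0 for |z| <= 1, and |z| - 1 for |z| > 1 (deadzone-linear penalty)
    return np.where(np.abs(z) <= 1.0, 0.0, np.abs(z) - 1.0)

def h_max(z):
    # Candidate closed form from part a): h(z) = max{-z - 1, 0, z - 1}
    return np.maximum.reduce([-z - 1.0, np.zeros_like(z), z - 1.0])

z = np.linspace(-3.0, 3.0, 601)
assert np.allclose(h_piecewise(z), h_max(z))  # both forms agree on the grid
```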
Find the largest range of values of α for which the algorithm is globally convergent.
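Assuming the algorithm in question is gradient descent with a constant step size α on a strongly convex quadratic f(x) = ½xᵀAx − bᵀx (an assumption; the statement of the algorithm is not reproduced here), the classical answer is 0 < α < 2/λ_max(A). The following sketch probes that threshold numerically, borrowing the matrix from part e):

```python
import numpy as np

# Assumed setting: x_{k+1} = x_k - alpha * (A x_k - b) on f(x) = 0.5 x^T A x - b^T x.
A = np.diag([10.0, 0.1])            # matrix from part e), used for illustration
b = np.array([-3.0, 2.0])
x_star = np.linalg.solve(A, b)      # unique minimizer, since A is positive definite

def run_gd(alpha, steps=2000):
    x = np.array([5.0, 5.0])
    for _ in range(steps):
        x = x - alpha * (A @ x - b)  # constant-step gradient step
    return np.linalg.norm(x - x_star)

lam_max = np.linalg.eigvalsh(A).max()
print("threshold 2/lambda_max:", 2.0 / lam_max)   # 0.2 here
print("alpha = 0.19:", run_gd(0.19))              # below the threshold: converges
print("alpha = 0.21:", run_gd(0.21, steps=200))   # above the threshold: blows up
```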
d) What would be the rate of convergence of the steepest descent method for this problem? How many steepest
descent iterations would it take (at most) to reduce the function value to ε = 10⁻⁵?
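For part d), a standard reference point (an assumption about the intended setting) is exact-line-search steepest descent on a strongly convex quadratic, where f(x⁽ᵏ⁾) − f* ≤ ((κ − 1)/(κ + 1))²ᵏ (f(x⁽⁰⁾) − f*) with κ the condition number of A. Under that bound, the worst-case iteration count needed to reach a gap of ε can be computed as in this sketch (the κ and initial-gap values are illustrative):

```python
import numpy as np

def sd_rate(kappa):
    # Worst-case per-iteration contraction of f(x_k) - f* for steepest descent
    # with exact line search on a strongly convex quadratic.
    return ((kappa - 1.0) / (kappa + 1.0)) ** 2

def iters_to_eps(kappa, gap0, eps=1e-5):
    # Smallest k with sd_rate(kappa)**k * gap0 <= eps (an upper bound on the
    # iteration count, not the number actually observed).
    return int(np.ceil(np.log(gap0 / eps) / np.log(1.0 / sd_rate(kappa))))

print(iters_to_eps(kappa=2.0, gap0=100.0))    # well conditioned: 8 iterations
print(iters_to_eps(kappa=100.0, gap0=100.0))  # ill conditioned: 403 iterations
```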
e) Consider

    A = [ 10    0
           0   0.1 ],    b = ( −3, 2 )ᵀ,    x⁽⁰⁾ = ( 5, 5 )ᵀ.

How does the rate of convergence to the global minimum compare with that in the previous question? What makes the difference?
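For part e), here is a minimal sketch of steepest descent with exact line search on this quadratic (assuming, as in part d), the objective f(x) = ½xᵀAx − bᵀx); with κ = λ_max/λ_min = 100, the iterates zig-zag and progress is slow:

```python
import numpy as np

# Part e) data: an ill-conditioned quadratic, kappa = 10 / 0.1 = 100.
A = np.diag([10.0, 0.1])
b = np.array([-3.0, 2.0])
x = np.array([5.0, 5.0])
x_star = np.linalg.solve(A, b)       # minimizer: (-0.3, 20)

for k in range(500):
    g = A @ x - b                    # gradient of 0.5 x^T A x - b^T x
    alpha = (g @ g) / (g @ A @ g)    # exact line-search step for a quadratic
    x = x - alpha * g
print(np.linalg.norm(x - x_star))    # still far from machine precision: slow
```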
Question 4. (1p)
a) What is the difference between gradient descent and mini-batch gradient descent? Compare their advantages and drawbacks.
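To make the comparison in part a) concrete, the sketch below contrasts one full-batch gradient step per epoch with many cheap, noisy mini-batch steps per epoch on synthetic least-squares data (all data and hyperparameters here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                 # synthetic least-squares data
w_true = rng.normal(size=5)
y = X @ w_true + 0.01 * rng.normal(size=1000)

def grad(w, Xb, yb):
    # Gradient of the mean squared error 0.5 * mean((Xb @ w - yb)**2)
    return Xb.T @ (Xb @ w - yb) / len(yb)

w_full = np.zeros(5)
w_mini = np.zeros(5)
lr, batch = 0.1, 32
for epoch in range(50):
    # Full-batch GD: one exact but expensive gradient per epoch.
    w_full -= lr * grad(w_full, X, y)
    # Mini-batch GD: many cheap, noisy steps per epoch.
    perm = rng.permutation(len(y))
    for i in range(0, len(y), batch):
        idx = perm[i:i + batch]
        w_mini -= lr * grad(w_mini, X[idx], y[idx])

print(np.linalg.norm(w_full - w_true), np.linalg.norm(w_mini - w_true))
```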
b) Briefly describe the idea behind the Adaptive Gradient Algorithm (Adagrad) and its update rule.
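For reference, the standard (diagonal) Adagrad update accumulates squared gradients per coordinate, Gₜ = Gₜ₋₁ + gₜ², and then takes wₜ₊₁ = wₜ − η gₜ / (√Gₜ + ε). A minimal sketch follows; the learning rate and ε are conventional placeholder values:

```python
import numpy as np

def adagrad_step(w, g, G, lr=0.1, eps=1e-8):
    # Adagrad: accumulate squared gradients per coordinate, then scale the step
    # so frequently updated coordinates get smaller effective learning rates.
    G += g * g
    w -= lr * g / (np.sqrt(G) + eps)
    return w, G

# Example on f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([5.0, -3.0])
G = np.zeros_like(w)
for _ in range(100):
    w, G = adagrad_step(w, w.copy(), G)
print(w)   # shrinks toward the minimizer at the origin
```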