Homework 2 Solution
Problem 1
Here I use gradient descent (GD), that is, at each iteration I update with all misclassified points, using step size 0.01. The figures are shown below:
(ii) Show the evolution of the binary classification error with the number of iterations from random initialization
(iii) Show the evolution of the perceptron error with the number of iterations from random initialization
For GD, discuss the convergence behavior as you vary the step size (η):
As shown in the figures below, when the step size is small (0.001), the binary classification error decreases stably as the number of iterations increases. However, as the step size grows, the error starts to oscillate. The reason is that a large step size can cause the updates to overshoot.
step size = 0.001:
step size = 0.1:
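To make the procedure concrete, below is a minimal sketch of the batch perceptron update described above: at each iteration, one gradient step of size η is taken over all currently misclassified points, and the binary classification error is recorded. The synthetic data, function name, and iteration count are illustrative assumptions, not the homework's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linearly separable data (an assumption, not the homework's data).
n = 200
X = rng.normal(size=(n, 2))
y = np.sign(X @ np.array([1.0, -1.0]))       # labels in {-1, +1}

def train_perceptron_gd(X, y, eta, n_iters=100):
    """Batch perceptron GD: one step per iteration over all mistakes."""
    w = rng.normal(size=X.shape[1])          # random initialization
    errors = []
    for _ in range(n_iters):
        mis = y * (X @ w) <= 0               # misclassified points
        errors.append(mis.mean())            # binary classification error
        # Gradient of the perceptron loss summed over the mistakes.
        grad = -(y[mis, None] * X[mis]).sum(axis=0)
        w = w - eta * grad
    return w, errors

# Compare error curves for the step sizes discussed above.
for eta in (0.001, 0.01, 0.1):
    _, errs = train_perceptron_gd(X, y, eta)
    print(f"eta={eta}: final error {errs[-1]:.3f}")
```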
Problem 2
(a)
(b) When there is no feedback loop, $Y$ can always be computed from

$$Y = \cdots\, C_2 W_2^{T} C_1 W_1^{T} X,$$

so the equivalent weight of the network without hidden layers should be $\cdots\, C_2 W_2^{T} C_1 W_1^{T}$.
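As a quick sanity check of this claim, here is a tiny numpy sketch; the layer shapes, and treating the $C_i$ and $W_i$ as generic matrices, are assumptions for illustration. Because every operation is linear, the stacked layers collapse into the single weight matrix given above.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 10))                 # 4 input units, 10 samples
W1, C1 = rng.normal(size=(4, 5)), rng.normal(size=(5, 5))
W2, C2 = rng.normal(size=(5, 3)), rng.normal(size=(3, 3))

Y_deep = C2 @ (W2.T @ (C1 @ (W1.T @ X)))     # layer-by-layer forward pass
W_eq = C2 @ W2.T @ C1 @ W1.T                 # equivalent single-layer weight
assert np.allclose(Y_deep, W_eq @ X)         # same output, no hidden layers
```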
(c)
Here we have $w_1 = w_3 = 1$, $w_2 = w_4 = -1$, $w_5 = -2$, $w_6 = 1$.
Problem 3
(a)
$$\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial x_i}\frac{\partial x_i}{\partial s_i}\frac{\partial s_i}{\partial w_{ji}}$$

$$\frac{\partial E}{\partial x_i} = -\frac{t_i}{x_i} + \frac{1-t_i}{1-x_i} = \frac{x_i - t_i}{x_i(1-x_i)}$$

$$\frac{\partial x_i}{\partial s_i} = \frac{e^{-s_i}+1-1}{(1+e^{-s_i})^2} = x_i(1-x_i), \qquad \frac{\partial s_i}{\partial w_{ji}} = y_j$$

so we have:

$$w_{ji}^{t+1} = w_{ji}^{t} - \eta_1 \frac{\partial E}{\partial w_{ji}}$$

$$\frac{\partial E}{\partial y_j} = \sum_i \frac{\partial E}{\partial x_i}\frac{\partial x_i}{\partial s_i}\frac{\partial s_i}{\partial y_j}, \qquad \frac{\partial s_i}{\partial y_j} = w_{ji}$$

$$\frac{\partial E}{\partial w_{kj}} = \frac{\partial E}{\partial y_j}\frac{\partial y_j}{\partial s_j}\frac{\partial s_j}{\partial w_{kj}}$$

$$\frac{\partial y_j}{\partial s_j} = \frac{e^{-s_j}+1-1}{(1+e^{-s_j})^2} = y_j(1-y_j), \qquad \frac{\partial s_j}{\partial w_{kj}} = z_k$$

so we have:

$$w_{kj}^{t+1} = w_{kj}^{t} - \eta_2 \frac{\partial E}{\partial w_{kj}}$$
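Below is a short numpy sketch of these gradients with a finite-difference check; the layer sizes, variable names, and targets are illustrative assumptions. Note how the chain-rule factors cancel so that $\partial E/\partial s_i = x_i - t_i$.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(2)
z = rng.normal(size=4)                    # input layer
W_kj = rng.normal(size=(4, 3))            # input -> hidden, s_j = sum_k w_kj z_k
W_ji = rng.normal(size=(3, 2))            # hidden -> output, s_i = sum_j w_ji y_j
t = np.array([1.0, 0.0])                  # targets

def forward(W_kj, W_ji):
    y = sigmoid(z @ W_kj)                 # hidden activations
    x = sigmoid(y @ W_ji)                 # output activations
    E = -np.sum(t * np.log(x) + (1 - t) * np.log(1 - x))
    return y, x, E

y, x, E = forward(W_kj, W_ji)

dE_ds_i = x - t                           # the x_i(1-x_i) factors cancel
grad_W_ji = np.outer(y, dE_ds_i)          # dE/dw_ji = (x_i - t_i) y_j
dE_dy_j = W_ji @ dE_ds_i                  # sum_i (x_i - t_i) w_ji
dE_ds_j = dE_dy_j * y * (1 - y)           # times y_j (1 - y_j)
grad_W_kj = np.outer(z, dE_ds_j)          # dE/dw_kj = delta_j z_k

# Finite-difference check on one hidden-to-output weight.
eps = 1e-6
W_pert = W_ji.copy()
W_pert[0, 0] += eps
num = (forward(W_kj, W_pert)[2] - E) / eps
assert abs(num - grad_W_ji[0, 0]) < 1e-4
```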
(b)
$$\frac{\partial E}{\partial w_{ji}} = \sum_u \frac{\partial E}{\partial x_u}\frac{\partial x_u}{\partial s_i}\frac{\partial s_i}{\partial w_{ji}}$$

$$\frac{\partial E}{\partial x_u} = -\frac{t_u}{x_u}$$

$$\frac{\partial x_u}{\partial s_i} = -(x_u)^2\,\frac{e^{s_i}}{e^{s_u}}, \qquad \frac{\partial s_i}{\partial w_{ji}} = y_j$$

when $u = i$, it becomes

$$\frac{\partial x_i}{\partial s_i} = x_i - x_i^2$$
and the rest is in the same form as in (a).
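These softmax derivatives can be verified numerically. Below is a small finite-difference sketch (the particular logits are an arbitrary choice); it checks the $u \neq i$ case, where $-(x_u)^2 e^{s_i}/e^{s_u} = -x_u x_i$, and the $u = i$ case, $x_i - x_i^2$.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())               # shift for numerical stability
    return e / e.sum()

s = np.array([0.5, -1.0, 2.0])            # arbitrary logits
x = softmax(s)

eps = 1e-6
for i in range(len(s)):
    s_pert = s.copy()
    s_pert[i] += eps
    num = (softmax(s_pert) - x) / eps     # numerical dx_u/ds_i for all u
    for u in range(len(s)):
        analytic = x[i] - x[i] ** 2 if u == i else -(x[u] ** 2) * np.exp(s[i] - s[u])
        assert abs(num[u] - analytic) < 1e-4
```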
Problem 4
Reason: As we can see, the expert weight shown by the red line increases with t and always remains the highest among all the weights while the other weights decrease, which means no shifting occurs and the best expert never changes.
Reason: As we can see, the sequence of expert weights changes with t, so it is non-stationary and shifting exists; the best expert varies over t.
Problem 5
Since $J(h)$ is the expectation of an absolute value, it must have a lower bound of 0. And since $P$ is not smaller than 0, we have (here we write $P(h(x) > 0 \mid y)$ as $P_y$):
$$J(h) = 2\sum_{i=1}^{k}\left|\pi_i\Big(P_i - \sum_{j=1}^{k}\pi_j P_j\Big)\right|$$
When $h$ induces a perfectly pure partition, we have $P_i \in \{0, 1\}$; thus the equation becomes:
$$J(h) = 2\sum_{i=1}^{k}\left|\pi_i\Big(P_i - \sum_{j=1}^{k}\pi_j P_j\Big)\right| = 2\sum_{P_i=1}\pi_i\Big(1 - \sum_{P_j=1}\pi_j\Big) + 2\sum_{P_i=0}\pi_i\Big(\sum_{P_j=1}\pi_j\Big) = 4A(1-A)$$
where $A = \sum_{P_i=1}\pi_i$. So when $A = \frac{1}{2}$, which means $h$ induces a perfectly balanced partition, $J(h) = 1$.
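As a quick numeric check of this derivation, the sketch below evaluates $J(h)$ for an arbitrary pure partition (the particular $\pi_i$ and $P_i$ values are illustrative assumptions) and confirms it equals $4A(1-A)$, which peaks at 1 when $A = \frac{1}{2}$.

```python
import numpy as np

pi = np.array([0.2, 0.3, 0.1, 0.4])       # partition weights, sum to 1
P = np.array([1.0, 1.0, 0.0, 0.0])        # pure partition: P_i in {0, 1}

A = pi[P == 1].sum()                      # A = sum of pi_i over {i : P_i = 1}
J = 2 * np.sum(np.abs(pi * (P - np.sum(pi * P))))
assert np.isclose(J, 4 * A * (1 - A))
print(A, J)                               # here A = 0.5, so J = 1
```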