
Homework 2 - Solution

Special Topics in Advanced Machine Learning

Shangwen Yan | N17091204 | sy2160

Problem 1
Here I use gradient descent (GD); that is, each update uses all currently misclassified points, with step size 0.01. Figures as below:

(i) The resulting linear decision boundary on the 2-D x data.

(ii) The evolution of the binary classification error with the number of iterations, from a random initialization.

(iii) The perceptron error with the number of iterations, from a random initialization.

For GD, the convergence behavior as the step size (η) varies:

As shown in the figures below, when the step size is small (0.001), the binary classification error decreases steadily as the iterations proceed.

However, as the step size grows, the error starts to oscillate. The reason is that a large step size can overshoot the minimum at each update.
step size = 0.001:

step size = 0.1:
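For reference, a minimal sketch of the update rule described above, assuming 2-D inputs X, labels y ∈ {−1, +1}, and the perceptron criterion summed over all misclassified points; the function name and defaults are illustrative, not from the original solution:

```python
import numpy as np

def perceptron_gd(X, y, eta=0.01, n_iters=100, seed=0):
    """Batch GD on the perceptron criterion: each iteration updates w
    with the summed gradient over all currently misclassified points."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias feature
    w = rng.normal(size=Xb.shape[1])               # random initialization
    errors = []                                    # binary error per iteration
    for _ in range(n_iters):
        mis = y * (Xb @ w) <= 0                    # misclassified points
        errors.append(mis.mean())
        if not mis.any():
            break
        # E(w) = -sum_mis y_i w^T x_i, so the descent step adds sum_mis y_i x_i
        w += eta * (y[mis, None] * Xb[mis]).sum(axis=0)
    return w, errors
```

Running this with eta = 0.001 versus eta = 0.1 reproduces the behavior described above: the small step size decreases the error steadily, while the large one oscillates.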

Problem 2
(a)

(b) When there is no feedback loop, y can always be computed as $Y = \cdots\, C_2 W_2^T C_1 W_1^T X$, so the equivalent weight of the network without hidden layers should be $\cdots\, C_2 W_2^T C_1 W_1^T$.
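A small numerical check of this collapse, assuming every layer is linear; the matrix shapes below are made up for illustration and the names follow the formula above:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, C1 = rng.normal(size=(3, 4)), rng.normal(size=(4, 4))
W2, C2 = rng.normal(size=(4, 5)), rng.normal(size=(5, 5))
X = rng.normal(size=3)

# Layer-by-layer forward pass of the linear network.
Y = C2 @ (W2.T @ (C1 @ (W1.T @ X)))

# Collapsed single weight matrix, as claimed in (b).
W = C2 @ W2.T @ C1 @ W1.T
assert np.allclose(Y, W @ X)
```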

(c)
Here we have $w_1 = w_3 = 1,\; w_2 = w_4 = -1,\; w_5 = -2,\; w_6 = 1$.

Problem 3
(a)
$$\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial x_i}\,\frac{\partial x_i}{\partial s_i}\,\frac{\partial s_i}{\partial w_{ji}}$$

$$\frac{\partial E}{\partial x_i} = -\frac{t_i}{x_i} + \frac{1 - t_i}{1 - x_i} = \frac{x_i - t_i}{x_i(1 - x_i)}$$

$$\frac{\partial x_i}{\partial s_i} = \frac{e^{-s_i}}{(1 + e^{-s_i})^2} = x_i(1 - x_i), \qquad \frac{\partial s_i}{\partial w_{ji}} = y_j$$

so we have:

$$w_{ji}^{t+1} = w_{ji}^{t} - \eta_1 \frac{\partial E}{\partial w_{ji}}$$

For the lower layer,

$$\frac{\partial E}{\partial y_j} = \sum_i \frac{\partial E}{\partial x_i}\,\frac{\partial x_i}{\partial s_i}\,\frac{\partial s_i}{\partial y_j}, \qquad \frac{\partial s_i}{\partial y_j} = w_{ji}$$

$$\frac{\partial E}{\partial w_{kj}} = \frac{\partial E}{\partial y_j}\,\frac{\partial y_j}{\partial s_j}\,\frac{\partial s_j}{\partial w_{kj}}$$

$$\frac{\partial y_j}{\partial s_j} = \frac{e^{-s_j}}{(1 + e^{-s_j})^2} = y_j(1 - y_j), \qquad \frac{\partial s_j}{\partial w_{kj}} = z_k$$

so we have:

$$w_{kj}^{t+1} = w_{kj}^{t} - \eta_2 \frac{\partial E}{\partial w_{kj}}$$
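A sketch of one update implementing these formulas, assuming a single hidden layer of sigmoid units with cross-entropy loss and the index convention above (inputs z_k → hidden y_j → outputs x_i); all names are illustrative:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def backprop_step(z, t, W_kj, W_ji, eta1=0.1, eta2=0.1):
    """One gradient step for a one-hidden-layer sigmoid network with
    cross-entropy loss, following the formulas of Problem 3(a)."""
    # Forward pass: s_j = sum_k w_kj z_k, y_j = sigmoid(s_j), likewise for x_i.
    y = sigmoid(W_kj @ z)
    x = sigmoid(W_ji @ y)

    # (dE/dx_i)(dx_i/ds_i) simplifies to x_i - t_i.
    delta_i = x - t
    grad_ji = np.outer(delta_i, y)        # dE/dw_ji = (x_i - t_i) y_j

    # Lower layer: dE/dy_j = sum_i delta_i w_ji, then chain through the sigmoid.
    delta_j = (W_ji.T @ delta_i) * y * (1.0 - y)
    grad_kj = np.outer(delta_j, z)        # dE/dw_kj = delta_j z_k

    return W_kj - eta2 * grad_kj, W_ji - eta1 * grad_ji
```

Note that $(\partial E/\partial x_i)(\partial x_i/\partial s_i)$ collapses to $x_i - t_i$, the usual simplification for sigmoid outputs under cross-entropy loss.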

(b)

$$\frac{\partial E}{\partial w_{ji}} = \sum_u \frac{\partial E}{\partial x_u}\,\frac{\partial x_u}{\partial s_i}\,\frac{\partial s_i}{\partial w_{ji}}$$

$$\frac{\partial E}{\partial x_u} = -\frac{t_u}{x_u}$$

$$\frac{\partial x_u}{\partial s_i} = -(x_u)^2\,\frac{e^{s_i}}{e^{s_u}} = -x_u x_i \quad (u \neq i), \qquad \frac{\partial s_i}{\partial w_{ji}} = y_j$$

when u = i, it becomes

$$\frac{\partial x_i}{\partial s_i} = x_i - x_i^2$$
and the rest is in the same form as in (a)
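Combining the two cases makes the final gradient explicit, a step the solution leaves implicit; assuming the targets form a distribution with $\sum_u t_u = 1$:

$$\frac{\partial E}{\partial s_i} = -\frac{t_i}{x_i}\,(x_i - x_i^2) + \sum_{u \neq i}\Big(-\frac{t_u}{x_u}\Big)(-x_u x_i) = -t_i + t_i x_i + x_i\sum_{u \neq i} t_u = x_i - t_i,$$

so $\partial E/\partial w_{ji} = (x_i - t_i)\,y_j$, matching the sigmoid case in (a).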

Problem 4: Experts advice


Left plot: static-expert

Reason: As we can see, the expert weight shown by the red line increases with t and always remains the highest among all the weights while the other weights decrease, which means no shifting occurs and the best expert never changes.

Right plot: fixed-share (α)

Reason: As we can see, the sequence of expert weights changes with t, so the environment is non-stationary and shifting occurs: the best expert varies over t.
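For reference, a minimal sketch of the two weight updates being compared, following the standard static-expert and fixed-share(α) algorithms; the learning rate eta and the loss vector are illustrative:

```python
import numpy as np

def update_weights(w, losses, eta=1.0, alpha=0.0):
    """One round of expert-weight updating: alpha = 0 is static-expert,
    alpha > 0 is fixed-share(alpha), which redistributes a fraction of
    each expert's weight so a shifting best expert can be tracked."""
    n = len(w)
    v = w * np.exp(-eta * np.asarray(losses))  # exponential loss update
    v /= v.sum()                               # normalize
    # Share step: keep (1 - alpha), spread alpha evenly over the other experts.
    return (1.0 - alpha) * v + (alpha / (n - 1)) * (v.sum() - v)
```

With alpha = 0 the share step vanishes and weight concentrates on a single best expert (left plot); with alpha > 0 weight keeps leaking between experts, letting the algorithm track shifts (right plot).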

Problem 5
Since J(h) is the expectation of an absolute value, it has a lower bound of 0. Writing $P(h(x) > 0 \mid y = i)$ as $P_i$ and $P(y = i)$ as $\pi_i$ (all non-negative), we have:

$$J(h) = 2\,\mathbb{E}_y\big[\,\lvert P(h(x) > 0) - P(h(x) > 0 \mid y)\rvert\,\big] = 2\sum_{i=1}^{k}\Big\lvert \pi_i\Big(P_i - \sum_{j=1}^{k}\pi_j P_j\Big)\Big\rvert$$

When h induces a perfectly pure partition, we have $P_i \in \{0, 1\}$, so the expression becomes:

$$J(h) = 2\sum_{i=1}^{k}\Big\lvert \pi_i\Big(P_i - \sum_{j=1}^{k}\pi_j P_j\Big)\Big\rvert = 2\sum_{P_i = 1}\pi_i\Big(1 - \sum_{P_j = 1}\pi_j\Big) + 2\sum_{P_i = 0}\pi_i\Big(\sum_{P_j = 1}\pi_j\Big) = 4A(1 - A),$$

where $A = \sum_{P_i = 1} \pi_i$.
So when $A = \tfrac{1}{2}$, which means h induces a perfectly balanced partition, $J(h) = 1$.
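A quick numerical sanity check of the $4A(1 - A)$ identity, with made-up class priors $\pi$ and a pure partition $P_i \in \{0, 1\}$:

```python
import numpy as np

pi = np.array([0.2, 0.3, 0.1, 0.4])  # hypothetical class priors (sum to 1)
P = np.array([1.0, 0.0, 1.0, 0.0])   # pure partition: P_i in {0, 1}

J = 2 * np.sum(np.abs(pi * (P - np.sum(pi * P))))
A = pi[P == 1].sum()
print(J, 4 * A * (1 - A))  # both evaluate to 0.84 (up to float rounding)
```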

