hw1_red

Uploaded by

chonleda777

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

16 views

hw1_red

Uploaded by

chonleda777

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 4

Machine Learning (NTU, Fall 2024) instructor: Hsuan-Tien Lin

Homework #1
RELEASE DATE: 09/09/2024
RED CORRECTION: 09/16/2024 06:00
DUE DATE: 10/07/2024, BEFORE 13:00 on GRADESCOPE
QUESTIONS ARE WELCOMED ON DISCORD (INFORMALLY) OR VIA EMAILS (FORMALLY).

You will use Gradescope to upload your scanned/printed solutions. For problems marked with (*), please
follow the guidelines on the course website and upload your source code to Gradescope as well. Any
programming language/platform is allowed.
Any form of cheating, lying, or plagiarism will not be tolerated. Students can get zero scores and/or fail
the class and/or be kicked out of school and/or receive other punishments for those kinds of misconducts.
Discussions on course materials and homework solutions are encouraged. But you should write the final
solutions alone and understand them fully. Books, notes, and Internet resources can be consulted, but
not copied from.
Since everyone needs to write the final solutions alone, there is absolutely no need to lend your homework
solutions and/or source codes to your classmates at any time. In order to maximize the level of fairness
in this class, lending and borrowing homework solutions are both regarded as dishonest behaviors and will
be punished according to the honesty policy.
You should write your solutions in English or Chinese with the common math notations introduced in
class or in the problems. We do not accept solutions written in any other languages.

This homework set comes with 200 points and 20 bonus points. In general, every home-
work set would come with a full credit of 200 points, with some possible bonus points.

1. (10 points, auto-graded) Which of the following tasks is best suited for machine learning? Choose
the best answer.
[a] generate an image of Hercules that matches his actual facial look
[b] search for the shortest road path from Taipei to Taichung
[c] summarize any news article to 10 lines
[d] predict whether Schrödinger’s cat is alive or dead inside the box
[e] none of the other choices
N
2. (10 points, auto-graded) Assume that a data set of an even size N , with being positive examples
2
N
and 2being negative. If each example is used to update wt in PLA exactly once. What is the
resulting w0 in wPLA ? Please assume that the initial weight vector w0 is 0. Choose the correct
answer.

[a] N
N
[b] 2
[c] 0
[d] − N2
[e] none of the other choices

1 of 4
Machine Learning (NTU, Fall 2024) instructor: Hsuan-Tien Lin

3. (10 points, auto-graded) Dr. Norman thinks PLA will be highly influenced by very long examples,
as wt changes drastically if ∥xn(t) ∥ is large. Hence, ze decides to preprocess the training data
by scaling down each input vector by 2 i.e., zn ← x2n . How does PLA’s upper bound on Page
19 of Lecture 2 change with this preprocessing procedure, with respect to the R and ρ that were
calculated before scaling? Choose the correct answer.
2R2
[a] ρ2
R2
[b] ρ2
R2
[c] 2ρ2
R2
[d] 4ρ2
[e] none of the other choices
4. (10 points, auto-graded) Dr. Norman has another idea of scaling. Instead of scaling by a constant,
ze decides to preprocess the training data by normalizing each input vector i.e., zn ← ∥xxnn ∥ . How
does PLA’s upper bound on Page 19 of Lecture 2 change with this preprocessing procedure in
yn wfT zn
terms of ρz = minn ∥wf ∥ ? Choose the correct answer.

[a] ∞ (i.e., PLA might never terminate)

1
[b] ρ2z
1
[c] 2ρz

[d] √1
ρz

[e] none of the other choices

5. (20 points, human-graded) Go ask any chatGPT-like agent the following question, “what is a
possible application of active learning?”, list the answer that you get, and argue with 10-20 English
sentences on whether you agree with the agent or not, as if you are the “boss” of the agent.
The TAs will grade based on the persuasiveness of your arguments—please note that our TAs are
more used to being persuaded by humans than machines. So if your arguments do not look very
human-written, the TAs may not be persuaded.

6. (20 points, human-graded) Go ask any chatGPT-like agent the following question, “can machine
learning be used to predict earthquakes?”, list the answer that you get, and argue with 10-20
English sentences on whether you agree with the agent or not, as if you are the “boss” of the agent.
The TAs will grade based on the persuasiveness of your arguments—please note that our TAs are
more used to being persuaded by humans than machines. So if your arguments do not look very
human-written, the TAs may not be persuaded.
7. (20 points, human-graded) Before running PLA, our class convention adds x0 = 1 to every xn
′ ′
vector, forming xn = (1, xorig
n ). Suppose that x0 = 2 is added instead to form xn = (2, xn ).
orig
N
Consider running PLA on {(xn , yn )}n=1 in a cyclic manner with the naı̈ve cycle. That is, the
algorithm keeps finding the next mistake in the order of 1, 2, . . . , n, 1, 2, . . .. Assume that such a
PLA with w0 = 0 returns wPLA , and running PLA on {(x′n , yn )}N n=1 with the same cyclic manner
′ ′
with w0 = 0 returns wPLA . Prove or disprove that wPLA and wPLA are equivalent. We define two
weight vectors to be equivalent if they return the same binary classification output on every possible
example in Rd , the space that every xorig belongs to. Please take any deterministic convention for
sign(0), for example setting sign(0) = 1.

2 of 4
Machine Learning (NTU, Fall 2024) instructor: Hsuan-Tien Lin

8. (20 points, human-graded) Before running PLA, our class convention adds x0 = 1 to every xn
′
vector, forming xn = (1, xorig
n ). Suppose that we scale every xn by 3, and x0 = 3 is added instead
′ orig N
to form xn = (3, 3xn ). Consider running PLA on {(xn , yn )}n=1 in a cyclic manner with the naı̈ve
cycle. That is, the algorithm keeps finding the next mistake in the order of 1, 2, . . . , n, 1, 2, . . ..
Assume that such a PLA with w0 = 0 returns wPLA , and running PLA on {(x′n , yn )}N n=1 with
′ ′
the same cyclic manner with w0 = 0 returns wPLA . Prove or disprove that wPLA and wPLA are
equivalent. Similar to Problem 7, please take any deterministic convention for sign(0), for example
setting sign(0) = 1.
9. (20 points, human-graded) Consider online hatred article detection with machine learning. We will
represent each article x by the distinct words that it contains. In particular, assume that there are
at most m distinct words in each article, and each word belongs to a big dictionary of size d ≥ m.
The i-th component xi is defined as Jword i is in article xK for i = 1, 2, . . . , d, and x0 = 1 as always.
We will assume that d+ of the words in the dictionary are more hatred-like, and d− = d − d+ of
the words are less hatred-like. A simple function that classifies whether an artile is a hatred is
to count z+ (x), the number of more hatred-like words with the article (ignoring duplicates), and
z− (x), the number of less hatred-like words in the article, and classify by

f (x) = sign(z+ (x) − z− (x) − 3.5).

That is, an article x is classified as a hatred iff the integer z+ (x) is more than the integer z− (x)
by 4.
Assume that f can perfectly classify any article into hatred/non-hatred, but is unknown to us. We
now run an online version of Perceptron Learning Algorithm (PLA) to try to approximate f . That
is, we maintain a weight vector wt in the online PLA, initialized with w0 = 0. Then for every
article xt encountered at time t, the algorithm makes a prediction sign(wtT xt ), and receives a true
label yt . If the prediction is not the same as the true label (i.e. a mistake), the algorithm updates
wt by
wt+1 ← wt + yt xt .
Otherwise the algorithm keeps wt without updating

wt+1 ← wt .

Derive an upper bound on the maximum number of mistakes that the online PLA can make for
this hatred article classification problem. The tightness of your upper bound will be taken into
account during grading.
Note: For those who know the bag-of-words representation for documents, the representation we
use is a simplification that ignores duplicates of the same word.

3 of 4
Machine Learning (NTU, Fall 2024) instructor: Hsuan-Tien Lin

10. (20 points, human-graded) Next, we use a real-world data set to study PLA. Please download the
RCV1.binary (training) data set at
https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/rcv1_train.binary.bz2

and takes the first N = 200 lines as our data set. Each line of the data set contains one (xn , yn ) in
the LIBSVM format, with xn ∈ R47205 . The first number of the line is yn , and the rest of the line
is xn represented in the sparse format that LIBSVM uses.
https://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#/Q03:_Data_preparation

Please initialize your algorithm with w = 0 and take sign(0) as −1.

Please first follow page 4 of Lecture 2, and add x0 = 1 to every xn . Implement a version of PLA that
randomly picks an example (xn , yn ) in every iteration, and updates wt if and only if wt is incorrect
on the example. Note that the random picking can be simply implemented with replacement—that
is, the same example can be picked multiple times, even consecutively. Stop updating and return
wt as wPLA if wt is correct consecutively after checking 5N randomly-picked examples.
Hint: You can simply follow the algorithm above to solve this problem. But if you are interested
in knowing why the algorithm above is somewhat equivalent to the PLA algorithm that you learned
in class, here is some more information. (1) The update procedure described above is equivalent to
the procedure of gathering all the incorrect examples first and then randomly picking an example
among the incorrect ones. But the description above is usually much easier to implement. (2) The
stopping criterion above is a randomized, more efficient implementation of checking whether wt
makes no mistakes on the data set. Passing 5N times of correctness checking means that wt is
mistake-free with more than 99% of probability.
Repeat your experiment for 1000 times, each with a different random seed. Plot a histogram to
visualize the distribution of the number of updates needed before returning wPLA . Describe your
findings. Then, provide the first page of the snapshot of your code as a proof that you have written
the code.
(Note: As a general principle, you can use any plotting software outside your data processing and
machine learning code.)
11. (20 points, human-graded) When running the 1000 experiments above, record ∥wt ∥ as a function
of t. Plot the ∥wt ∥ in each experiment as a function of t for t = 1, 2, . . . , Tmin , where Tmin is the
smallest number of updates in the previous problem. Superpose the 1000 functions on the same
figure. Describe your findings. Then, provide the first page of the snapshot of your code as a proof
that you have written the code.
12. (20 points, code needed, human-graded) Modify your PLA above to a variant that keeps correcting
the same example until it is perfectly classified. That is, when selecting an incorrect example
(xn(t) , yn(t) ) for updating, the algorithm keeps using that example (that is, n(t + 1) = n(t)) to
update until the weight vector perfectly classifies the example (and each update counts!). Repeat
the 1000 experiments above. Plot a histogram to visualize the distribution of the number of updates
needed before returning wPLA . What is the median number of updates? Compare your result to
that of Problem 10. Describe your findings. Then, provide the first page of the snapshot of your
code as a proof that you have written the code.
13. (Bonus 20 points, human-graded) When PLA makes an update on a misclassified example (xn(t) , yn(t) ),
the new weight vector wt+1 does not always classify (xn(t) , yn(t) ) correctly. Consider a variant of
PLA that makes an update by
$ %
1 −10yn(t) wtT xn(t)
wt+1 ← wt + yn(t) xn(t) · +1 .
10 ∥xn(t) ∥2

First, prove that wt+1 always correctly classifies (xn(t) , yn(t) ) after the update. Second, prove
that such a PLA halts with a perfect hyperplane if the data is linearly separable. (Hint: Check
Problem 12.)

4 of 4

Cos4852 2018 A1
No ratings yet
Cos4852 2018 A1
11 pages
CS236 Homework 1
100% (1)
CS236 Homework 1
4 pages
hw1 Red
No ratings yet
hw1 Red
4 pages
HW 2
No ratings yet
HW 2
4 pages
Machine Learning Assignments and Answers
No ratings yet
Machine Learning Assignments and Answers
35 pages
hw3_red
No ratings yet
hw3_red
4 pages
Introduction To Machine Learning
No ratings yet
Introduction To Machine Learning
56 pages
hw2_red
No ratings yet
hw2_red
4 pages
final_exam_solutions
No ratings yet
final_exam_solutions
12 pages
Assignment 1: Welcome To Tensorflow: Problem 1: Op Is All You Need
No ratings yet
Assignment 1: Welcome To Tensorflow: Problem 1: Op Is All You Need
4 pages
Midterm 2006
No ratings yet
Midterm 2006
11 pages
hw5_1
No ratings yet
hw5_1
6 pages
hw4
No ratings yet
hw4
13 pages
Assignment 1
No ratings yet
Assignment 1
6 pages
Midterm Solutions
No ratings yet
Midterm Solutions
8 pages
Solutions: 10-601 Machine Learning, Midterm Exam: Spring 2008 Solutions
No ratings yet
Solutions: 10-601 Machine Learning, Midterm Exam: Spring 2008 Solutions
8 pages
hw2 311
No ratings yet
hw2 311
4 pages
15-381 Spring 2007 Assignment 6: Learning
No ratings yet
15-381 Spring 2007 Assignment 6: Learning
14 pages
CSCI_5521_Spring_2025_Final_Exam
No ratings yet
CSCI_5521_Spring_2025_Final_Exam
8 pages
DS3001_DAV_Final Exam_Fall23_v3
No ratings yet
DS3001_DAV_Final Exam_Fall23_v3
14 pages
Cs224n Midterm 2018 Solution
No ratings yet
Cs224n Midterm 2018 Solution
17 pages
Winter21Exam1
No ratings yet
Winter21Exam1
17 pages
hw1
No ratings yet
hw1
4 pages
Ai Fall-23 Assignment
No ratings yet
Ai Fall-23 Assignment
5 pages
Homework 04: Your Content Should Use Any Color That Is Different From Those
No ratings yet
Homework 04: Your Content Should Use Any Color That Is Different From Those
5 pages
Ps 3
No ratings yet
Ps 3
15 pages
Kinetic theory of gases notes
No ratings yet
Kinetic theory of gases notes
5 pages
hw4_red
No ratings yet
hw4_red
6 pages
HW 1 Eeowh 3
No ratings yet
HW 1 Eeowh 3
6 pages
Exam Spring 10
No ratings yet
Exam Spring 10
10 pages
COL 774: Assignment 2
No ratings yet
COL 774: Assignment 2
3 pages
mid-sem
No ratings yet
mid-sem
2 pages
CS 540: Introduction To Artificial Intelligence: Final Exam: 8:15-9:45am, December 21, 2016 132 Noland
No ratings yet
CS 540: Introduction To Artificial Intelligence: Final Exam: 8:15-9:45am, December 21, 2016 132 Noland
8 pages
HW 23 P 4 Rie
No ratings yet
HW 23 P 4 Rie
5 pages
Final Fall00
No ratings yet
Final Fall00
9 pages
ISYE6740_Fall2024_HW4_Rubric
No ratings yet
ISYE6740_Fall2024_HW4_Rubric
5 pages
02u Handout
No ratings yet
02u Handout
37 pages
Exam 2015
No ratings yet
Exam 2015
12 pages
07au Midterm
No ratings yet
07au Midterm
17 pages
hw1 PDF
No ratings yet
hw1 PDF
6 pages
ps1
No ratings yet
ps1
9 pages
hw1
No ratings yet
hw1
12 pages
Ps 1
No ratings yet
Ps 1
25 pages
Machine Learning Foundations (機器學習基石) : Lecture 2: Learning to Answer Yes/No
No ratings yet
Machine Learning Foundations (機器學習基石) : Lecture 2: Learning to Answer Yes/No
37 pages
Machine Learning Assignments
No ratings yet
Machine Learning Assignments
3 pages
Ps 1
No ratings yet
Ps 1
5 pages
f11ms
No ratings yet
f11ms
4 pages
Instructors: Moses Charikar, Tengyu Ma, and Chris Re: Hope Everyone Stays Safe and Healthy in These Difficult Times!
No ratings yet
Instructors: Moses Charikar, Tengyu Ma, and Chris Re: Hope Everyone Stays Safe and Healthy in These Difficult Times!
40 pages
exercises-en-text-models-2
No ratings yet
exercises-en-text-models-2
5 pages
Solution PDF
No ratings yet
Solution PDF
20 pages
Deep Learning
No ratings yet
Deep Learning
5 pages
Ps 2
No ratings yet
Ps 2
11 pages
Midterm Fall02
No ratings yet
Midterm Fall02
12 pages
Final F01soln
No ratings yet
Final F01soln
13 pages
hw1 f21112 Problems11
No ratings yet
hw1 f21112 Problems11
2 pages
hw2 2020
No ratings yet
hw2 2020
3 pages
Week 5
No ratings yet
Week 5
3 pages
CS_221_Fall_19_Solution
No ratings yet
CS_221_Fall_19_Solution
30 pages
Top Numerical Methods With Matlab For Beginners!
From Everand
Top Numerical Methods With Matlab For Beginners!
Andrei Besedin
No ratings yet
Satplan: Fundamentals and Applications
From Everand
Satplan: Fundamentals and Applications
Fouad Sabry
No ratings yet
Revision Chapter 7
No ratings yet
Revision Chapter 7
4 pages
Unit5 Lecture1 PDF
No ratings yet
Unit5 Lecture1 PDF
46 pages
As 9100 C To As 9100 D Migration Upgradation and Transition With MyEasyISO - R00 - 03082017
No ratings yet
As 9100 C To As 9100 D Migration Upgradation and Transition With MyEasyISO - R00 - 03082017
6 pages
Complete Download (PowerPoint) Data Communications and Computer Networks A Business User's Approach 8th Edition PDF All Chapters
100% (2)
Complete Download (PowerPoint) Data Communications and Computer Networks A Business User's Approach 8th Edition PDF All Chapters
24 pages
15201222082_CA2_BCAC602 (1)
No ratings yet
15201222082_CA2_BCAC602 (1)
3 pages
Flex Card
No ratings yet
Flex Card
2 pages
01 Test Bank
No ratings yet
01 Test Bank
15 pages
Unit 3 - MLnotes-WPS Office
No ratings yet
Unit 3 - MLnotes-WPS Office
18 pages
User Manual Inspinia Mobile App
No ratings yet
User Manual Inspinia Mobile App
14 pages
CSP Unit 3 App Development Planning Guide
No ratings yet
CSP Unit 3 App Development Planning Guide
7 pages
M.util - Automatic Burner Control Mpa 41xx
No ratings yet
M.util - Automatic Burner Control Mpa 41xx
126 pages
Topic Wise Java Questions
No ratings yet
Topic Wise Java Questions
9 pages
Jabra Sport Pulse Techsheet Standard
No ratings yet
Jabra Sport Pulse Techsheet Standard
1 page
Jex-7804 GD Arabic L.
No ratings yet
Jex-7804 GD Arabic L.
2 pages
Transistor BD139
No ratings yet
Transistor BD139
4 pages
What Is Virtual Reality
No ratings yet
What Is Virtual Reality
17 pages
SMS Firewall
No ratings yet
SMS Firewall
5 pages
Design Thinking: Data Intelligence: Learning Objectives
No ratings yet
Design Thinking: Data Intelligence: Learning Objectives
18 pages
Senarai Peralatan Ict
No ratings yet
Senarai Peralatan Ict
35 pages
5G Testing With Python PDF
No ratings yet
5G Testing With Python PDF
11 pages
Airteltigo Sms Bundle Code - Google Search
No ratings yet
Airteltigo Sms Bundle Code - Google Search
1 page
Content:-1.: Keyword Research
No ratings yet
Content:-1.: Keyword Research
2 pages
BCS 31 C Imp Ques
No ratings yet
BCS 31 C Imp Ques
5 pages
Salary Sleep Answer Key
No ratings yet
Salary Sleep Answer Key
53 pages
Electrification X - Sustainability-Energy Management - Sales Presentation
No ratings yet
Electrification X - Sustainability-Energy Management - Sales Presentation
19 pages
Which One of The Binary Number Ranges Shown Below
No ratings yet
Which One of The Binary Number Ranges Shown Below
8 pages
User Manual DASP Version 2.3: DASP: Distributive Analysis Stata Package
No ratings yet
User Manual DASP Version 2.3: DASP: Distributive Analysis Stata Package
150 pages
Unit 1 - Cga - 2021
No ratings yet
Unit 1 - Cga - 2021
40 pages
ABW Air Circuit Breaker
100% (1)
ABW Air Circuit Breaker
28 pages
CellRoute GPRS Manual
No ratings yet
CellRoute GPRS Manual
10 pages

hw1_red

Uploaded by

hw1_red

Uploaded by

Machine Learning (NTU, Fall 2024) instructor: Hsuan-Tien Lin

[a] ∞ (i.e., PLA might never terminate)

[e] none of the other choices

f (x) = sign(z+ (x) − z− (x) − 3.5).

Please initialize your algorithm with w = 0 and take sign(0) as −1.

You might also like