
Homework #3

CSE 446/546: Machine Learning


Prof. Kevin Jamieson and Prof. Simon S. Du
Due: May 19, 2023 11:59pm
Points A: 90; B: 5

Please review all homework guidance posted on the website before submitting it to Gradescope. Reminders:
• Make sure to read the “What to Submit” section following each question and include all items.
• Please provide succinct answers and supporting reasoning for each question. Similarly, when discussing
experimental results, concisely create tables and/or figures when appropriate to organize the experimental
results. All explanations, tables, and figures for any particular part of a question must be grouped together.
• For every problem involving generating plots, please include the plots as part of your PDF submission.
• When submitting to Gradescope, please link each question from the homework in Gradescope to the
location of its answer in your homework PDF. Failure to do so may result in deductions of up to 10% of
the value of each question not properly linked. For instructions, see https://www.gradescope.com/get_
started#student-submission.
• If you collaborate on this homework with others, you must indicate who you worked with on your homework
by providing a complete list of collaborators on the first page of your assignment. Make sure to include
the name of each collaborator, and on which problem(s) you collaborated. Failure to do so may result
in accusations of plagiarism. You can review the course collaboration policy at https://courses.cs.
washington.edu/courses/cse446/23sp/assignments/
• For every problem involving code, please include all code you have written for the problem as part of your
PDF submission in addition to submitting your code to the separate assignment on Gradescope created
for code. Not submitting all code files will lead to a deduction of up to 10% of the value of each question
missing code.
Not adhering to these reminders may result in point deductions.

Conceptual Questions
A1. These questions should be answerable without referring to external materials. Briefly justify your answers with a few words.
a. [2 points] Say you trained an SVM classifier with an RBF kernel, K(u, v) = exp(−‖u − v‖₂² / (2σ²)). It seems to underfit the training set: should you increase or decrease σ?

b. [2 points] True or False: Training deep neural networks requires minimizing a convex loss function, and
therefore gradient descent will provide the best result.
c. [2 points] True or False: It is a good practice to initialize all weights to zero when training a deep neural
network.
d. [2 points] True or False: We use non-linear activation functions in a neural network’s hidden layers so that
the network learns non-linear decision boundaries.
e. [2 points] True or False: Given a neural network, the time complexity of the backward pass step in the
backpropagation algorithm can be prohibitively larger compared to the relatively low time complexity of
the forward pass step.
f. [2 points] True or False: Neural Networks are the most extensible model and therefore the best choice for
any circumstance.

What to Submit:
• Parts a-f: 1-2 sentence explanation containing your answer.

Support Vector Machines


A2. Recall that solving the SVM problem amounts to solving the following constrained optimization problem: given data points D = {(x_i, y_i)}_{i=1}^n, find

min_{w,b} ‖w‖_2   subject to   y_i(x_i^T w − b) ≥ 1 for i ∈ {1, . . . , n}

where x_i ∈ R^d, y_i ∈ {−1, 1}, and w ∈ R^d.


Consider the following labeled data points:

(1, 2), (1, 3), (2, 3), (3, 4) with label y = −1,   and   (0, 0.5), (1, 0), (2, 1), (3, 0) with label y = 1.

a. [2 points] Graph the data points above. Highlight the support vectors and write their coordinates. Draw the two parallel hyperplanes separating the two classes of data such that the distance between them is as large as possible. Draw the maximum-margin hyperplane. Write the equations describing these three hyperplanes using only x, w, b (that is, without using any specific values). Draw w (it doesn't have to have the exact magnitude, but it should have the correct orientation).
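If it helps to get started on the graph in part a, here is a minimal matplotlib sketch for plotting the two classes. The coordinates are taken from the list above; the marker choices and variable names are just illustrative, and the support vectors and hyperplanes are left for you to add.

import matplotlib.pyplot as plt
import numpy as np

# Coordinates as listed above.
X_neg = np.array([[1, 2], [1, 3], [2, 3], [3, 4]])    # points with label y = -1
X_pos = np.array([[0, 0.5], [1, 0], [2, 1], [3, 0]])  # points with label y = +1

plt.scatter(X_neg[:, 0], X_neg[:, 1], marker="o", label="y = -1")
plt.scatter(X_pos[:, 0], X_pos[:, 1], marker="x", label="y = +1")
plt.xlabel("x1")
plt.ylabel("x2")
plt.legend()
plt.show()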

b. [2 points] For the data points above, find w and b.

Hint: Use the support vectors and the values {−1, 1} to create a linear system of equations where the unknowns are w_1, w_2, and b.

c. [4 points] Show that for any solvable SVM problem, the distance between the two separating hyperplanes is 2 / ‖w‖₂.

Hint 1: The distance between two hyperplanes is the distance between any point x0 on one of the hyper-
planes and its projection on the other hyperplane.
Hint 2: A direction w and an offset c define the hyperplane H = {x ∈ R^n | w^T x = c}. The projection of a vector y onto H is given by P_H(y) = y − ((w^T y − c) / ‖w‖₂²) w.

What to Submit:
• Part a: Write down support vectors and equations. Graph the points, hyperplanes, and w.
• Part b: Solution and corresponding calculations.
• Part c: Proof.

Kernels
A3. [5 points] Suppose that our inputs x are one-dimensional and that our feature map is infinite-dimensional:
φ(x) is a vector whose ith component is:
(1/√(i!)) e^{−x²/2} x^i,

for all nonnegative integers i. (Thus, φ is an infinite-dimensional vector.) Show that K(x, x′) = e^{−(x−x′)²/2} is a kernel function for this feature map, i.e.,

φ(x) · φ(x′) = e^{−(x−x′)²/2}.

Hint: Use the Taylor expansion of z ↦ e^z. (This is the one-dimensional version of the Gaussian (RBF) kernel.)

What to Submit:
• Proof.

A4. This problem will get you familiar with kernel ridge regression using the polynomial and RBF kernels.
First, let's generate some data. Let n = 30 and f_*(x) = 4 sin(πx) cos(6πx²). For i = 1, . . . , n let each x_i be drawn uniformly at random from [0, 1], and let y_i = f_*(x_i) + ε_i where ε_i ∼ N(0, 1). For any function f, the true error and the train error are respectively defined as:

E_true(f) = E_{X,Y}[(f(X) − Y)²],    Ê_train(f) = (1/n) Σ_{i=1}^n (f(x_i) − y_i)².

Now, our goal is, using kernel ridge regression, to construct a predictor:
f̂(x) = Σ_{i=1}^n α̂_i k(x_i, x),    α̂ = argmin_α ‖Kα − y‖₂² + λ α^T K α,

where K ∈ R^{n×n} is the kernel matrix such that K_{i,j} = k(x_i, x_j), and λ ≥ 0 is the regularization constant.

a. [10 points] Using leave-one-out cross validation, find a good λ and hyperparameter settings for the following
kernels:
• k_poly(x, z) = (1 + x^T z)^d, where d ∈ N is a hyperparameter,
• k_rbf(x, z) = exp(−γ‖x − z‖₂²), where γ > 0 is a hyperparameter¹.

¹ Given a dataset x_1, . . . , x_n ∈ R^d, a heuristic for choosing a range of γ in the right ballpark is the inverse of the median of all (n choose 2) squared distances ‖x_i − x_j‖₂².

We strongly recommend implementing either grid search or random search. Do not use sklearn; actually implement one of these algorithms yourself. Reasonable values to search over in this problem are: λ ∈ 10^[−5, −1], d ∈ [5, 25], and γ sampled from a narrow Gaussian distribution centered at the value described in the footnote.
Report the values of d, γ, and the λ values for both kernels.
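As a rough starting point for part a (not the required implementation; the function names and the grid/random-search wrapper are left to you), a minimal NumPy sketch of the pieces involved is below. It uses the closed-form choice α̂ = (K + λI)⁻¹ y, which makes the gradient of the objective above vanish.

import numpy as np

def k_poly(x, z, d):
    # Polynomial kernel (1 + x * z)^d for 1-D input arrays x (n,) and z (m,).
    return (1.0 + np.outer(x, z)) ** d

def k_rbf(x, z, gamma):
    # RBF kernel exp(-gamma * (x - z)^2) for 1-D input arrays.
    return np.exp(-gamma * np.subtract.outer(x, z) ** 2)

def fit_alpha(K, y, lam):
    # Minimizer of ||K a - y||_2^2 + lam * a^T K a is a = (K + lam * I)^{-1} y.
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

def loo_error(x, y, kernel, lam, **hyper):
    # Leave-one-out CV: refit with point i held out, then measure squared error on x_i.
    n, errs = len(x), []
    for i in range(n):
        keep = np.arange(n) != i
        alpha = fit_alpha(kernel(x[keep], x[keep], **hyper), y[keep], lam)
        pred = kernel(x[[i]], x[keep], **hyper) @ alpha   # shape (1,)
        errs.append((pred[0] - y[i]) ** 2)
    return float(np.mean(errs))

A grid or random search over (λ, d) or (λ, γ) would simply call loo_error for each candidate setting and keep the one with the lowest error.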
b. [10 points] Let f̂_poly(x) and f̂_rbf(x) be the functions learned using the hyperparameters you found in part a. For a single plot per function f̂ ∈ {f̂_poly(x), f̂_rbf(x)}, plot the original data {(x_i, y_i)}_{i=1}^n, the true f(x), and f̂(x) (i.e., define a fine grid on [0, 1] to plot the functions).

What to Submit:
• Part a: Report the values of d, γ and the value of λ for both kernels as described.
• Part b: Two plots. One plot for each function.

• Code on Gradescope through coding submission.

Perceptron

B1. One of the oldest algorithms used in machine learning (from the early 60’s) is an online algorithm for
learning a linear threshold function called the Perceptron Algorithm. It works as follows:

1. Start with the all-zeroes weight vector w_1 = 0, and initialize t to 1. Also let's automatically scale all
examples x to have (Euclidean) norm 1, since this doesn't affect which side of the plane they are on.
2. Given example x, predict positive iff w_t · x > 0.
3. On a mistake, update as follows:
• Mistake on positive: w_{t+1} ← w_t + x.
• Mistake on negative: w_{t+1} ← w_t − x.
4. t ← t + 1.

If we make a mistake on a positive x we get w_{t+1} · x = (w_t + x) · x = w_t · x + 1, and similarly if we make a mistake on a negative x we have w_{t+1} · x = (w_t − x) · x = w_t · x − 1. So, in both cases we move closer (by 1) to the value we wanted. Here is a link if you are interested in more details.
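For concreteness, a minimal sketch of the update rule just described (illustrative only; it assumes rows of X have unit Euclidean norm and labels lie in {−1, +1}):

import numpy as np

def perceptron(X, y, n_passes=10):
    # Online perceptron: predict positive iff w . x > 0; on a mistake add y_i * x_i.
    w = np.zeros(X.shape[1])
    for _ in range(n_passes):
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i) <= 0:      # mistake (counting the boundary as a mistake)
                w = w + y_i * x_i         # +x on positive mistakes, -x on negative ones
    return w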

Now consider the linear decision boundary for classification (labels in {−1, 1}) of the form w · x = 0 (i.e., no offset), and consider the following loss function, a variant of the hinge loss, evaluated at a data point (x, y):

ℓ((x, y), w) = max{0, −y(w · x)}.
a. [2 points] Given a dataset of (x_i, y_i) pairs, write down a single step of subgradient descent with a step size of η if we are trying to minimize

(1/n) Σ_{i=1}^n ℓ((x_i, y_i), w)

for ℓ(·) defined as above. That is, given a current iterate w̃, what is an expression for the next iterate?

b. [2 points] Use what you derived to argue that the Perceptron can be viewed as implementing SGD applied to the loss function just described. For what value of η?

c. [1 point] Suppose your data was drawn i.i.d. and that there exists a w∗ that separates the two classes
perfectly. Provide an explanation for why hinge loss is generally preferred over the loss given above.

What to Submit:
• Part a: Expression for a single step of subgradient descent.

• Part b: A 1-2 sentence explanation.

• Part c: A 1-2 sentence explanation.

Introduction to PyTorch
A5. PyTorch is a great tool for developing, deploying, and researching neural networks and other gradient-based algorithms. In this problem we will explore how this package is built and re-implement some of its core components. Start by reading the README.md file provided in the intro_pytorch subfolder. Many of the problem statements overlap between this document, the READMEs, and the comments in the provided functions.

a. [10 points] You will start by implementing components of our own PyTorch modules. You can find these in the layers, losses, and optimizers folders. Almost every file there contains at least one function to implement, along with exact directions for what to achieve in this problem. Lastly, you should implement the functions in the train.py file.
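As a rough illustration only, the kind of interface such a hand-rolled component typically exposes is sketched below; the exact class names, signatures, and initialization scheme you must use are the ones specified in the starter code, not the ones here.

import torch

class IllustrativeLinear(torch.nn.Module):
    # Sketch of a fully-connected layer y = x W^T + b built from raw tensors.
    # The real classes to implement live in the layers/losses/optimizers folders.

    def __init__(self, dim_in: int, dim_out: int):
        super().__init__()
        alpha = dim_in ** -0.5  # illustrative Unif(-alpha, alpha) initialization
        self.weight = torch.nn.Parameter(torch.empty(dim_out, dim_in).uniform_(-alpha, alpha))
        self.bias = torch.nn.Parameter(torch.empty(dim_out).uniform_(-alpha, alpha))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.weight.T + self.bias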

b. [5 points] Next we will use the above module to perform a hyperparameter search. Here we will also treat the loss function as a hyperparameter. However, because cross-entropy and MSE require different shapes, we are going to use two different files: crossentropy_search.py and mean_squared_error_search.py. For each, you will need to build and train (in the provided order) 5 models:
• Linear neural network (single layer, no activation function)
• NN with one hidden layer (2 units) and sigmoid activation function after the hidden layer
• NN with one hidden layer (2 units) and ReLU activation function after the hidden layer
• NN with two hidden layers (each with 2 units) and sigmoid, ReLU activation functions after the first and second hidden layers, respectively
• NN with two hidden layers (each with 2 units) and ReLU, sigmoid activation functions after the first and second hidden layers, respectively
For each loss function, submit a plot of losses from the training and validation sets. All models should be on the same plot (10 lines per plot), with two plots total (1 for MSE, 1 for cross-entropy).
c. [5 points] For each loss function, report the best performing architecture (best performing is defined here as achieving the lowest validation loss at any point during training), and plot its guesses on the test set. You should use the plot_model_guesses function from the train.py file. Lastly, report the accuracy of that model on the test set.

On the Softmax Function
One of the activation functions we ask you to implement is softmax. For a prediction ŷ ∈ R^k corresponding to a single datapoint (in a problem with k classes):

softmax(ŷ_i) = exp(ŷ_i) / Σ_j exp(ŷ_j)
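A common, numerically stable way to compute this is to subtract the per-row maximum before exponentiating; the shift cancels in the ratio, so the result is unchanged. A minimal sketch (function name is illustrative):

import torch

def softmax(y_hat):
    # Softmax over the last dimension of a tensor of logits of shape (..., k).
    shifted = y_hat - y_hat.max(dim=-1, keepdim=True).values  # subtract max for stability
    exp = torch.exp(shifted)
    return exp / exp.sum(dim=-1, keepdim=True)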

What to Submit:
• Part b: 2 plots (one per loss function), with 10 lines each, showing both training and validation loss of
each model. Make sure plots are titled, and have proper legends.

• Part c: Names of the best performing models (i.e., descriptions of their architectures), and their accuracy on the test set.
• Part c: 2 scatter plots (one per loss function), with predictions of the best performing models on the test set.
• Code on Gradescope through coding submission

Neural Networks for MNIST


Resources
For questions A.4, A.5, and A.6 you will use a lot of PyTorch. In the section materials (Week 6) there is a notebook that you might find useful. Additionally, make use of the PyTorch documentation when needed.
If you do not have access to a GPU, you might find Google Colaboratory useful: it allows you to use a cloud GPU for free. To enable it, make sure "Runtime" -> "Change runtime type" -> "Hardware accelerator" is set to "GPU". When submitting, please download and submit a .py version of your notebook.
A6. In Homework 1, we used ridge regression for training a classifier for the MNIST data set. In Homework 2,
we used logistic regression to distinguish between the digits 2 and 7. In this problem, we will use PyTorch to
build a simple neural network classifier for MNIST to further improve our accuracy.

We will implement two different architectures: a shallow but wide network, and a narrow but deeper network. For both architectures, we use d to refer to the number of input features (in MNIST, d = 28² = 784), h_i to refer to the dimension of the i-th hidden layer, and k for the number of target classes (in MNIST, k = 10). For the non-linear activation, use ReLU. Recall from lecture that

ReLU(x) = x if x ≥ 0, and 0 if x < 0.

Weight Initialization
Consider a weight matrix W ∈ R^{n×m} and b ∈ R^n. Note that here m refers to the input dimension and n to the output dimension of the transformation x ↦ Wx + b. Define α = 1/√m. Initialize all your weight matrices and biases according to Unif(−α, α).
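For instance, a minimal sketch of this initialization (the helper name is just illustrative):

import torch

def init_linear_params(n, m):
    # W in R^{n x m} and b in R^n drawn i.i.d. from Unif(-alpha, alpha) with alpha = 1/sqrt(m).
    alpha = 1.0 / m ** 0.5
    W = torch.empty(n, m).uniform_(-alpha, alpha).requires_grad_()
    b = torch.empty(n).uniform_(-alpha, alpha).requires_grad_()
    return W, b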

Training
For this assignment, use the Adam optimizer from torch.optim. Adam is a more advanced form of gradient
descent that combines momentum and learning rate scaling. It often converges faster than regular gradient
descent in practice. You can use either Gradient Descent or any form of Stochastic Gradient Descent. Note
that you are still using Adam, but might pass either the full data, a single datapoint or a batch of data to it.
Use cross entropy for the loss function and ReLU for the non-linearity.
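A minimal sketch of what such a training loop might look like; the forward function, parameter list, and data loader are placeholders you would supply, not part of the required interface.

import torch
import torch.nn.functional as F

def train(params, forward, loader, epochs=20, lr=1e-3):
    # Generic Adam loop: `forward(x, params)` should return logits of shape (batch, k),
    # and `loader` yields (inputs, integer-label) mini-batches.
    optimizer = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for x, labels in loader:
            optimizer.zero_grad()
            loss = F.cross_entropy(forward(x, params), labels)
            loss.backward()
            optimizer.step()
    return params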

Implementing the Neural Networks


a. [10 points] Let W_0 ∈ R^{h×d}, b_0 ∈ R^h, W_1 ∈ R^{k×h}, b_1 ∈ R^k and σ(z) : R → R some non-linear activation function applied element-wise. Given some x ∈ R^d, the forward pass of the wide, shallow network can be formulated as:

F_1(x) := W_1 σ(W_0 x + b_0) + b_1

Use h = 64 for the number of hidden units and choose an appropriate learning rate. Train the network until it reaches 99% accuracy on the training data and provide a training plot (loss vs. epoch). Finally, evaluate the model on the test data and report both the accuracy and the loss.
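One way the forward pass F_1 could be written with raw parameter tensors, using only torch.nn.functional.relu (consistent with the restriction noted under "Using PyTorch" below), is sketched here; names and shapes are illustrative, not the required implementation.

import torch
import torch.nn.functional as F

def forward_wide(x, W0, b0, W1, b1):
    # F_1(x) = W_1 * relu(W_0 x + b_0) + b_1, applied to a batch x of shape (batch, d).
    hidden = F.relu(x @ W0.T + b0)   # shape (batch, h)
    return hidden @ W1.T + b1        # shape (batch, k), unnormalized logits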

b. [10 points] Let W_0 ∈ R^{h_0×d}, b_0 ∈ R^{h_0}, W_1 ∈ R^{h_1×h_0}, b_1 ∈ R^{h_1}, W_2 ∈ R^{k×h_1}, b_2 ∈ R^k and σ(z) : R → R some non-linear activation function. Given some x ∈ R^d, the forward pass of the network can be formulated as:

F_2(x) := W_2 σ(W_1 σ(W_0 x + b_0) + b_1) + b_2

Use h_0 = h_1 = 32 and perform the same steps as in part a.

c. [5 points] Compute the total number of parameters of each network and report them. Then compare the
number of parameters as well as the test accuracies the networks achieved. Is one of the approaches (wide,
shallow vs. narrow, deeper) better than the other? Give an intuition for why or why not.

Using PyTorch: For your solution, you may not use any functionality from the torch.nn module except for
torch.nn.functional.relu and torch.nn.functional.cross_entropy. You must implement the networks
F1 and F2 from scratch. For starter code and a tutorial on PyTorch refer to the sections 6 and 7 material.

What to Submit:
• Parts a-b: Provide a plot of the training loss versus epoch. In addition, evaluate the trained model on the test data and report the accuracy and loss.

• Part c: Report the number of parameters for the network trained in part (a) and for the network trained in part (b). Provide a comparison of the two networks, as described in part (c), in 1-2 sentences.
• Code on Gradescope through coding submission.
