
Homework #2

CSE 446/546: Machine Learning


Prof. Kevin Jamieson and Prof. Simon S. Du
Due: May 3, 2023 11:59pm
Points A: 104; B: 10

Please review all homework guidance posted on the website before submitting your work to Gradescope. Reminders:
• Make sure to read the “What to Submit” section following each question and include all items.
• Please provide succinct answers and supporting reasoning for each question. When discussing experimental results, use tables and/or figures where appropriate to organize them concisely. All explanations, tables, and figures for any particular part of a question must be grouped together.
• For every problem involving generating plots, please include the plots as part of your PDF submission.
• When submitting to Gradescope, please link each question from the homework in Gradescope to the location of its answer in your homework PDF. Failure to do so may result in deductions of up to 10% of the value of each question not properly linked. For instructions, see https://www.gradescope.com/get_started#student-submission.
• If you collaborate on this homework with others, you must indicate who you worked with on your homework
by providing a complete list of collaborators on the first page of your assignment. Make sure to include
the name of each collaborator, and on which problem(s) you collaborated. Failure to do so may result
in accusations of plagiarism. You can review the course collaboration policy at https://courses.cs.washington.edu/courses/cse446/23sp/assignments/
• For every problem involving code, please include all code you have written for the problem as part of your
PDF submission in addition to submitting your code to the separate assignment on Gradescope created
for code. Not submitting all code files will lead to a deduction of up to 10% of the value of each question with missing code.
Not adhering to these reminders may result in point deductions.

Conceptual Questions
A1. These questions should be answerable without referring to external materials. Briefly justify your answers with a few words.
a. [2 points] Compared to an L2 norm penalty, explain why an L1 norm penalty is more likely to result in sparsity (a larger number of 0s) in the weight vector.
b. [2 points] In at most one sentence each, state one possible upside and one possible downside of using the following regularizer: $\sum_i |w_i|^{0.5}$.

c. [2 points] True or False: If the step-size for gradient descent is too large, it may not converge.
d. [2 points] In at most one sentence each, state one possible advantage of SGD over GD (gradient descent),
and one possible disadvantage of SGD relative to GD.
e. [2 points] Why is it necessary to apply the gradient descent algorithm on logistic regression but not linear
regression?

What to Submit:
• Part c: True or False.

• Parts a-e: Brief (2-3 sentence) explanation.

Convexity and Norms
A2. A norm $\|\cdot\|$ over $\mathbb{R}^n$ is defined by the properties: (i) non-negativity: $\|x\| \geq 0$ for all $x \in \mathbb{R}^n$ with equality if and only if $x = 0$, (ii) absolute scalability: $\|ax\| = |a|\,\|x\|$ for all $a \in \mathbb{R}$ and $x \in \mathbb{R}^n$, (iii) triangle inequality: $\|x + y\| \leq \|x\| + \|y\|$ for all $x, y \in \mathbb{R}^n$.

a. [3 points] Show that $f(x) = \sum_{i=1}^n |x_i|$ is a norm. (Hint: for (iii), begin by showing that $|a + b| \leq |a| + |b|$ for all $a, b \in \mathbb{R}$.)

b. [2 points] Show that $g(x) = \left(\sum_{i=1}^n |x_i|^{1/2}\right)^2$ is not a norm. (Hint: it suffices to find two points in $n = 2$ dimensions such that the triangle inequality does not hold.)
Context: norms are often used in regularization to encourage specific behaviors of solutions. If we define $\|x\|_p := \left(\sum_{i=1}^n |x_i|^p\right)^{1/p}$, then one can show that $\|x\|_p$ is a norm for all $p \geq 1$. The important cases of $p = 2$ and $p = 1$ correspond to the penalty for ridge regression and the Lasso, respectively.

What to Submit:
• Parts a, b: Proof.

B1. [6 points] For any $x \in \mathbb{R}^n$, define the following norms: $\|x\|_1 = \sum_{i=1}^n |x_i|$, $\|x\|_2 = \sqrt{\sum_{i=1}^n |x_i|^2}$, and $\|x\|_\infty := \lim_{p \to \infty} \|x\|_p = \max_{i=1,\ldots,n} |x_i|$. Show that $\|x\|_\infty \leq \|x\|_2 \leq \|x\|_1$.

What to Submit:
• Proof.

A3. [2 points] A set A ⊆ Rn is convex if λx + (1 − λ)y ∈ A for all x, y ∈ A and λ ∈ [0, 1]. For each of the
grey-shaded sets below (I-II), state whether each one is convex, or state why it is not convex using any of the
points a, b, c, d in your answer.

What to Submit:
• Parts I, II: 1-2 sentence explanation of why the set is convex or not.

A4. [2 points] We say a function f : Rd → R is convex on a set A if f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y)
for all x, y ∈ A and λ ∈ [0, 1]. For each of the functions shown below (I-II), state whether each is convex on the
specified interval, or state why not with a counterexample using any of the points a, b, c, d in your answer.

a. Function in panel I on [a, c]

b. Function in panel II on [a, d]

What to Submit:
• Parts a, b: 1-2 sentence explanation of why the function is convex or not.

B2. For $i = 1, \ldots, n$ let $\ell_i(w)$ be convex functions over $w \in \mathbb{R}^d$ (e.g., $\ell_i(w) = (y_i - w^T x_i)^2$), let $\|\cdot\|$ be any norm, and let $\lambda > 0$.

a. [3 points] Show that

$$\sum_{i=1}^n \ell_i(w) + \lambda \|w\|$$

is convex over $w \in \mathbb{R}^d$. (Hint: Show that if $f, g$ are convex functions, then $f(x) + g(x)$ is also convex.)

b. [1 point] Explain in one sentence why we prefer to use loss functions and regularized loss functions
that are convex.

What to Submit
• Part a: Proof.
• Part b: 1-2 sentence explanation.

Lasso on a Real Dataset
Given $\lambda > 0$ and data $\{(x_i, y_i)\}_{i=1}^n$, the Lasso is the problem of solving

$$\arg\min_{w \in \mathbb{R}^d,\, b \in \mathbb{R}} \; \sum_{i=1}^n (x_i^T w + b - y_i)^2 + \lambda \sum_{j=1}^d |w_j|$$

where λ is a regularization parameter. For the programming part of this homework, you will implement the
iterative shrinkage thresholding algorithm shown in Algorithm 1 to solve the Lasso problem in ISTA.py. This
is a variant of the subgradient descent method; a more detailed discussion can be found in these slides. You may use common computing packages (such as numpy or scipy), but do not use an existing Lasso solver (e.g., the one in scikit-learn).

Algorithm 1: Iterative Shrinkage Thresholding Algorithm for Lasso

Input: Step size η
while not converged do
    b′ ← b − 2η Σ_{i=1}^n (x_i^T w + b − y_i)
    for k ∈ {1, 2, . . . , d} do
        w′_k ← w_k − 2η Σ_{i=1}^n x_{i,k} (x_i^T w + b − y_i)
        w′_k ← w′_k + 2ηλ   if w′_k < −2ηλ
        w′_k ← 0            if w′_k ∈ [−2ηλ, 2ηλ]
        w′_k ← w′_k − 2ηλ   if w′_k > 2ηλ
    end
    b ← b′, w ← w′
end
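As a point of reference, here is a minimal vectorized numpy sketch of a single pass of Algorithm 1, assuming X is the n × d data matrix and y the length-n response vector (the function name and signature are illustrative, not part of the assignment):

import numpy as np

def ista_step(X, y, w, b, eta, lam):
    # One pass of Algorithm 1 (sketch): bias update, gradient step on w, then soft-thresholding
    residual = X @ w + b - y                       # x_i^T w + b - y_i for all i at once
    b_new = b - 2 * eta * np.sum(residual)
    w_new = w - 2 * eta * (X.T @ residual)
    thresh = 2 * eta * lam
    w_new = np.sign(w_new) * np.maximum(np.abs(w_new) - thresh, 0.0)   # the three-case threshold in one line
    return w_new, b_new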

Before you get started, the following hints may be useful:


• Wherever possible, use matrix libraries for matrix operations (not for loops). This especially applies to
computing the updates for w. While we wrote the algorithm above with a for loop for clarity, you should
be able to replace this loop using equivalent matrix/vector operations in your code (e.g., numpy functions).
• As a sanity check, ensure the objective value is nonincreasing with each step.
• It is up to you to decide on a suitable stopping condition. A common criterion is to stop when no element
of w changes by more than some small δ during an iteration. If you need your algorithm to run faster, an
easy place to start is to loosen this condition.
• You will need to solve the Lasso on the same dataset for many values of λ. This is called a regularization
path. One way to do this efficiently is to start at a large λ, and then for each consecutive solution, initialize
the algorithm with the previous solution, decreasing λ by a constant ratio (e.g., by a factor of 2).
• The smallest value of λ for which the solution $\hat{w}$ is entirely zero is given by

$$\lambda_{\max} = \max_{k=1,\ldots,d} \; 2\left|\sum_{i=1}^n x_{i,k}\Big(y_i - \frac{1}{n}\sum_{j=1}^n y_j\Big)\right| \qquad (1)$$

This is helpful for choosing the first λ in a regularization path.
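Under the same assumptions as above (X an n × d numpy array, y a length-n vector), Equation (1) can be computed in one line; this is only a sketch:

import numpy as np

def lambda_max(X, y):
    # Equation (1): largest absolute inner product between a column of X and the centered response, times 2
    return np.max(2 * np.abs(X.T @ (y - y.mean())))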

A5. We will first try out your solver with some synthetic data. A benefit of the Lasso is that if we believe
many features are irrelevant for predicting y, the Lasso can be used to enforce a sparse solution, effectively
differentiating between the relevant and irrelevant features. Suppose that $x \in \mathbb{R}^d$, $y \in \mathbb{R}$, $k < d$, and data are generated independently according to the model $y_i = w^T x_i + \epsilon_i$ where

$$w_j = \begin{cases} j/k & \text{if } j \in \{1, \ldots, k\} \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$

and $\epsilon_i \sim N(0, \sigma^2)$ is noise (note that in the model above $b = 0$). We can see from Equation (2) that since $k < d$ and $w_j = 0$ for $j > k$, the features $k + 1$ through $d$ are irrelevant for predicting $y$.
Generate a dataset using this model with $n = 500$, $d = 1000$, $k = 100$, and $\sigma = 1$. You should generate the dataset such that each $\epsilon_i \sim N(0, 1)$, and $y_i$ is generated as specified above. You are free to choose a distribution from which the x's are drawn, but make sure to standardize the x's before running your experiments.
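One possible way to generate such a dataset (a sketch; drawing the x's from a standard normal is just one choice the prompt allows, and the variable names are illustrative):

import numpy as np

rng = np.random.default_rng(0)
n, d, k, sigma = 500, 1000, 100, 1.0

X = rng.standard_normal((n, d))
X = (X - X.mean(axis=0)) / X.std(axis=0)           # standardize the x's
w_true = np.zeros(d)
w_true[:k] = np.arange(1, k + 1) / k               # w_j = j/k for j = 1, ..., k (Equation (2))
y = X @ w_true + sigma * rng.standard_normal(n)    # y_i = w^T x_i + eps_i, eps_i ~ N(0, sigma^2)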

a. [10 points] With your synthetic data, solve multiple Lasso problems on a regularization path, starting at
λmax where no features are selected (see Equation (1)) and decreasing λ by a constant ratio (e.g., 2) until
nearly all the features are chosen. In plot 1, plot the number of non-zeros as a function of λ on the x-axis
(Tip: use plt.xscale('log')).
b. [10 points] For each value of λ tried, record values for the false discovery rate (FDR) (number of incorrect nonzeros in $\hat{w}$ / total number of nonzeros in $\hat{w}$) and the true positive rate (TPR) (number of correct nonzeros in $\hat{w}$ / $k$). Note: for each $j$, $\hat{w}_j$ is an incorrect nonzero if and only if $\hat{w}_j \neq 0$ while $w_j = 0$. In plot 2, plot these values with the x-axis as FDR, and the y-axis as TPR (a small helper for computing these quantities is sketched after this list).
Note that in an ideal situation we would have an (FDR, TPR) pair in the upper left corner. We can always trivially achieve $(0, 0)$ and $\left(\frac{d-k}{d}, 1\right)$.

c. [5 points] Comment on the effect of λ in these two plots in 1-2 sentences.
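For part (b), one possible sketch of how the FDR and TPR definitions above might be computed, assuming w_hat is your Lasso solution and w_true is the vector from Equation (2) (names are illustrative):

import numpy as np

def fdr_tpr(w_hat, w_true, k):
    # FDR and TPR as defined in part (b)
    nonzero = w_hat != 0
    incorrect = nonzero & (w_true == 0)             # nonzero in w_hat but truly zero
    correct = nonzero & (w_true != 0)               # nonzero in w_hat and truly nonzero
    fdr = incorrect.sum() / max(nonzero.sum(), 1)   # guard against an all-zero solution
    tpr = correct.sum() / k
    return fdr, tpr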

What to Submit:
• Part a: Plot 1.
• Part b: Plot 2.
• Part c: 1-2 sentence explanation.
• Code on Gradescope through coding submission

• All code you wrote in the write-up, with correct page mapping.

A6. We'll now put the Lasso to work on some real data in crime_data_lasso.py. Download the training data
set “crime-train.txt” and the test data set “crime-test.txt” from the website. Store your data in your working
directory, ensure you have the pandas library for Python installed, and read in the files with:

import pandas as pd
df_train = pd.read_table("crime-train.txt")
df_test = pd.read_table("crime-test.txt")

This stores the data as Pandas DataFrame objects. DataFrames are similar to Numpy arrays but more flexible;
unlike arrays, DataFrames store row and column indices along with the values of the data. Each column of a
DataFrame can also store data of a different type (here, all data are floats). Here are a few commands that will
get you working with Pandas for this assignment:

df.head()                  # Print the first few lines of DataFrame df.
df.index                   # Get the row indices for df.
df.columns                 # Get the column indices.
df["foo"]                  # Return the column named "foo".
df.drop("foo", axis=1)     # Return all columns except "foo".
df.values                  # Return the values as a Numpy array.
df["foo"].values           # Grab column foo and convert to Numpy array.
df.iloc[:3, :3]            # Use numerical indices (like Numpy) to get 3 rows and cols.

The data consist of local crime statistics for 1,994 US communities. The response y is the rate of violent crimes
reported per capita in a community. The name of the response variable is ViolentCrimesPerPop, and it is held
in the first column of df_train and df_test. There are 95 features, spanning many kinds of variables.
Some features are the consequence of complex political processes, such as the size of the police force and other
systemic and historical factors. Others are demographic characteristics of the community, including self-reported
statistics about race, age, education, and employment drawn from Census reports.
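One possible way to pull the design matrix and response out of the DataFrames read in above (a sketch; it assumes only the read-in code shown earlier):

y_train = df_train["ViolentCrimesPerPop"].values
X_train = df_train.drop("ViolentCrimesPerPop", axis=1).values
y_test = df_test["ViolentCrimesPerPop"].values
X_test = df_test.drop("ViolentCrimesPerPop", axis=1).values
feature_names = df_train.drop("ViolentCrimesPerPop", axis=1).columns   # handy for locating specific features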

The goals of this problem are threefold: (i) to encourage you to think about how data collection processes affect
the resulting model trained from that data; (ii) to encourage you to think deeply about models you might train
and how they might be misused; and (iii) to see how Lasso encourages sparsity of linear models in settings where
d is large relative to n. We emphasize that training a model on this dataset can suggest a degree of
correlation between a community’s demographics and the rate at which a community experiences
and reports violent crime. We strongly encourage students to consider why these correlations
may or may not hold more generally, whether correlations might result from a common cause,
and what issues can result in misinterpreting what a model can explain.

The dataset is split into a training and test set with 1,595 and 399 entries, respectively.¹ We will use this training set to fit a model to predict the crime rate in new communities and evaluate model performance on the test set. As there are a considerable number of input variables and fairly few training observations, overfitting is a serious issue. In order to avoid this, use the Lasso solver (Algorithm 1) implemented in the previous problem.

a. [4 points] Read the documentation for the original version of this dataset: http://archive.ics.uci.edu/ml/datasets/communities+and+crime. Report 3 features included in this dataset for which historical policy choices in the US would lead to variability in these features. As an example, the number of police in a community is often the consequence of decisions made by governing bodies, elections, and the amount of tax revenue available to decision makers.
b. [4 points] Before you train a model, describe 3 features in the dataset which might, if found to have nonzero weight in the model, be interpreted as reasons for higher levels of violent crime, but which might actually be a result rather than (or in addition to being) the cause of this violence.

Now, we will run the Lasso solver. Begin with λ = λmax defined in Equation (1). Initialize all weights to 0.
Then, reduce λ by a factor of 2 and run again, but this time initialize ŵ from your λ = λmax solution as your
initial weights, as described above. Continue the process of reducing λ by a factor of 2 until λ < 0.01. For all
plots use a log-scale for the λ dimension (Tip: use plt.xscale('log')).
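One possible shape for this regularization path loop (a sketch only: solve_lasso stands in for your own Algorithm 1 implementation, and its name and signature are placeholders; X_train and y_train are the numpy arrays extracted from df_train):

import numpy as np

lam = np.max(2 * np.abs(X_train.T @ (y_train - y_train.mean())))   # lambda_max from Equation (1)
w, b = np.zeros(X_train.shape[1]), 0.0
lambdas, nonzeros, weight_paths = [], [], []
while lam >= 0.01:
    w, b = solve_lasso(X_train, y_train, lam, w_init=w, b_init=b)   # warm start from the previous solution
    lambdas.append(lam)
    nonzeros.append(np.count_nonzero(w))
    weight_paths.append(w.copy())
    lam = lam / 2                                                   # reduce lambda by a factor of 2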

c. [4 points] Plot the number of nonzero weights of each solution as a function of λ.


d. [4 points] Plot the regularization paths (in one plot) for the coefficients for input variables agePct12t29,
pctWSocSec, pctUrban, agePct65up, and householdsize.

e. [4 points] On one plot, plot the mean squared error on the training and test data as a function of λ.
f. [4 points] Sometimes a larger value of λ performs nearly as well as a smaller value, but a larger value will
select fewer variables and perhaps be more interpretable. Inspect the weights ŵ for λ = 30. Which feature
had the largest (most positive) Lasso coefficient? What about the most negative? Discuss briefly.

g. [4 points] Suppose there was a large negative weight on agePct65up and upon seeing this result, a politician
suggests policies that encourage people over the age of 65 to move to high crime areas in an effort to reduce
crime. What is the (statistical) flaw in this line of reasoning? (Hint: fire trucks are often seen around
burning buildings, do fire trucks cause fire?)
¹ The features have been standardized to have mean 0 and variance 1.

What to Submit:
• Parts a, b: 1-2 sentence explanation.

• Part c: Plot 1.
• Part d: Plot 2.
• Part e: Plot 3.
• Parts f, g: Answers and 1-2 sentence explanation.

• Code on Gradescope through coding submission.


• All code you wrote in the write-up, with correct page mapping.

Logistic Regression
A7. Here we consider the MNIST dataset, but for binary classification. Specifically, the task is to determine
whether a digit is a 2 or 7. Here, let Y = 1 for all the “7” digits in the dataset, and use Y = −1 for “2”.
We will use regularized logistic regression. Given a binary classification dataset $\{(x_i, y_i)\}_{i=1}^n$ for $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, 1\}$, we showed in class that the regularized negative log likelihood objective function can be written as

$$J(w, b) = \frac{1}{n}\sum_{i=1}^n \log\left(1 + \exp(-y_i (b + x_i^T w))\right) + \lambda \|w\|_2^2$$

Note that the offset term $b$ is not regularized. For all experiments, use $\lambda = 10^{-1}$. Let $\mu_i(w, b) = \frac{1}{1 + \exp(-y_i (b + x_i^T w))}$.
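For reference, the objective and the $\mu_i$'s above can be evaluated with a few lines of numpy; this is a sketch assuming X is the n × d data matrix and y the ±1 label vector (function names are illustrative):

import numpy as np

def mu(w, b, X, y):
    # mu_i(w, b) = 1 / (1 + exp(-y_i (b + x_i^T w))), evaluated for all i at once
    return 1.0 / (1.0 + np.exp(-y * (b + X @ w)))

def J(w, b, X, y, lam=0.1):
    # Regularized negative log-likelihood from the display above
    return np.mean(np.log(1.0 + np.exp(-y * (b + X @ w)))) + lam * np.dot(w, w)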

a. [8 points] Derive the gradients ∇w J(w, b), ∇b J(w, b) and give your answers in terms of µi (w, b) (your
answers should not contain exponentials).
b. [8 points] Implement gradient descent with an initial iterate of all zeros. Try several step sizes to find one that appears to make convergence on the training set as fast as possible. Run until you feel you are near to convergence.
(i) For both the training set and the test set, plot $J(w, b)$ as a function of the iteration number (and show both curves on the same plot).
(ii) For both the training set and the test set, classify the points according to the rule $\mathrm{sign}(b + x_i^T w)$ and plot the misclassification error as a function of the iteration number (and show both curves on the same plot).
Reminder: Make sure you are only using the test set for evaluation (not for training).
c. [7 points] Repeat (b) using stochastic gradient descent with a batch size of 1. Note, the expected gradient
with respect to the random selection should be equal to the gradient found in part (a). Show both plots
described in (b) when using batch size 1. Take careful note of how to scale the regularizer.
d. [7 points] Repeat (b) using mini-batch gradient descent with a batch size of 100. That is, instead of approximating the gradient with a single example, use 100. Note: the expected gradient with respect to the random selection should be equal to the gradient found in part (a). (A skeleton of the training loop for parts (b)-(d), with the part (a) gradients left as a placeholder, follows this list.)
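A minimal skeleton of the training loop for parts (b)-(d), with the part (a) gradients left as a placeholder (gradients, the step size, iteration count, and batch size are illustrative assumptions, not prescribed by the assignment):

import numpy as np

def train(X, y, step_size, num_iters, batch_size=None, lam=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0                          # initial iterate of all zeros
    for t in range(num_iters):
        if batch_size is None:                       # part (b): full-batch gradient descent
            Xb, yb = X, y
        else:                                        # part (c): batch_size=1, part (d): batch_size=100
            idx = rng.choice(n, size=batch_size, replace=False)
            Xb, yb = X[idx], y[idx]
        grad_w, grad_b = gradients(w, b, Xb, yb, lam)   # placeholder for your part (a) result
        w = w - step_size * grad_w
        b = b - step_size * grad_b
        # record J(w, b) and misclassification error on train and test here for the plots
    return w, b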

What to Submit
• Part a: Proof
• Part b: Separate plots for b(i) and b(ii).

• Part c: Separate plots for c which reproduce those from b(i) and b(ii) for this case.
• Part d: Separate plots for d which reproduce those from b(i) and b(ii) for this case.
• Code on Gradescope through coding submission.

Administrative
A8.
a. [2 points] About how many hours did you spend on this homework? There is no right or wrong answer :)
