
CMPS 460 – Spring 2022

Tamer Elsayed

7.b Linear Models: Weight Regularization
(Chapter 7, Section 7.3)
Objective function:

    \min_{w,b} \sum_n \ell(y_n, \hat{y}_n) + \lambda \, R(w)

• Loss function ℓ(y_n, ŷ_n): measures how well the classifier fits the training data
• Regularizer R(w): prefers solutions that generalize well
• λ: parameter that controls the importance of the regularization term
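A minimal numerical sketch of this objective (illustrative only, not from the slides: squared loss and a squared ℓ2 regularizer are assumed, and the function name regularized_objective and the synthetic data are made up):

import numpy as np

def regularized_objective(w, X, y, lam):
    # Loss term + lambda * regularizer for a linear model y_hat = X @ w.
    # Squared loss and the squared l2 norm are used purely for illustration.
    y_hat = X @ w                      # linear predictions
    loss = np.sum((y - y_hat) ** 2)    # how well the model fits the training data
    reg = np.sum(w ** 2)               # prefers "simple" (small-weight) solutions
    return loss + lam * reg

# Tiny usage example on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = X @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + 0.1 * rng.normal(size=20)
print(regularized_objective(np.zeros(5), X, y, lam=0.1))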
• Different loss functions approximate the 0-1 loss
  − easier to optimize
• Regularizer
  − prevents overfitting / prefers simple models
• 0-1 Loss
• Hinge Loss
• Logistic Loss
• Exponential Loss

[Figure: the four losses plotted as functions of the margin y_n ŷ_n]
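For reference, the usual textbook forms of these losses, written as functions of the margin m = y_n ŷ_n (standard definitions; the exact scaling used on the slides may differ):

    0-1 loss:          \ell^{(0/1)}(m) = \mathbf{1}[m \le 0]
    hinge loss:        \ell^{(\mathrm{hin})}(m) = \max(0,\, 1 - m)
    logistic loss:     \ell^{(\mathrm{log})}(m) = \tfrac{1}{\log 2}\,\log(1 + e^{-m})
    exponential loss:  \ell^{(\mathrm{exp})}(m) = e^{-m}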
Weight Regularization



• A technique to improve the generalizability of a learned model.
• Without bounds on the complexity of the function space, the model tends to overfit the training data.
• Introduces a penalty for exploring certain regions of the function space.



• Goal: find simple solutions
• Ideally, we want most entries of w to be zero, so the prediction depends only on a small number of features.
• Formally, we want to minimize the number of nonzero entries of w, i.e., the ℓ0 "norm" of w (written out below).
• That's NP-hard!
• So we use approximations instead.
  − e.g., we encourage the w_d values to be small
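One way to write this formally (a reconstruction consistent with the objective above; the slides show it as a figure):

    \min_{w,b} \sum_n \ell(y_n, \hat{y}_n) + \lambda \, \|w\|_0,
    \quad \text{where } \|w\|_0 = \sum_d \mathbf{1}[w_d \neq 0]

Replacing ‖w‖₀ with a norm that merely encourages small w_d (such as ℓ1 or ℓ2, on the following slides) gives a tractable surrogate.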



• ℓp norms can be used as regularizers.

[Figure: contour plots of the ℓp norm for p = 2, p = 1, and p < 1]
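For reference, the standard definition of the ℓp norm being contoured here (for p < 1 it is only a quasi-norm):

    \|w\|_p = \Big( \sum_d |w_d|^p \Big)^{1/p}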



Figure source: https://www.researchgate.net/publication/331855021_An_Efficient_Image_Reconstruction_Framework_Using_Total_Variation_Regularization_with_Lp-Quasinorm_and_Group_Gradient_Sparsity/figures?lo=1



• ℓp norms can be used as regularizers.
• Smaller p favors sparse vectors w
  − i.e., most entries of w are close or equal to 0
• p < 1: the norm is non-convex and hard to optimize!
• ℓ1 norm: encourages sparse w; convex, but not smooth at the axis points
• ℓ2 norm: convex, smooth, easy to optimize
  − a small numerical sketch comparing ℓ1 and ℓ2 follows below
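A minimal sketch of that comparison (assumptions: synthetic data, squared loss, plain gradient descent for the ℓ2 penalty and a soft-thresholding/proximal step for ℓ1; none of the names or numbers come from the slides):

import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 20
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]              # only 3 informative features
y = X @ w_true + 0.1 * rng.normal(size=n)

def fit(lam, penalty, steps=2000, lr=1e-3):
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y)            # gradient of the squared loss
        w = w - lr * grad
        if penalty == "l2":
            w = w - lr * lam * 2 * w        # gradient of lam * ||w||_2^2
        else:                               # "l1": soft-threshold toward zero
            w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w

w_l2 = fit(lam=5.0, penalty="l2")
w_l1 = fit(lam=5.0, penalty="l1")
print("nonzero weights, l2:", int(np.sum(np.abs(w_l2) > 1e-3)))
print("nonzero weights, l1:", int(np.sum(np.abs(w_l1) > 1e-3)))

Typically the ℓ1-regularized weights contain many exact (or near) zeros, while ℓ2 shrinks all weights but rarely zeroes any of them out.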
