
Homework 3

Qiu G

3/31/2020

Problem 1

Part 1)

Let $\mathcal{F} = \mathbb{R}^d$. For all $x \in \mathbb{R}^d$, we define $\phi : \mathbb{R}^d \to \mathcal{F}$ such that $\phi(x) = x$.
We show that $k^*$ is a valid kernel:

$$
\begin{aligned}
k^*(x, x') &= \langle \phi(x), \phi(x') \rangle_{\mathcal{F}} && (1) \\
           &= \langle x, x' \rangle && (2)
\end{aligned}
$$
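As an informal sanity check (not part of the proof), the Gram matrix of this kernel on any finite set of points should be positive semidefinite. A minimal R sketch on arbitrary random points:

#Sanity check (illustration only): the Gram matrix of the linear kernel is PSD
set.seed(1)
X <- matrix(rnorm(20), nrow = 5)         # 5 points in R^4
K <- X %*% t(X)                          # Gram matrix of k(x, x') = <x, x'>
min(eigen(K, symmetric = TRUE)$values)   # >= 0 up to numerical error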

Part 2)

We know that $k_1(x, x')$ is a valid kernel. Using the basic definitions and properties of kernels, we can deduce that its Gram matrix is positive semidefinite, and that multiplying a positive semidefinite matrix by a real $a > 0$ preserves positive semidefiniteness. Hence

$$
k^*(x, x') = a\,k_1(x, x') \tag{3}
$$

is a valid kernel.

Part 3)

We know that $k_1(x, x')$ is a valid kernel. We use Mercer's Theorem and the following decomposition, defining $\phi^*(x) = g(x)\,\phi(x)$:

$$
\begin{aligned}
k^*(x, x') &= g(x)\Big[\sum_{j=1}^{\infty} \lambda_j\, \phi_j(x)\,\phi_j(x')\Big]\, g(x') && (4) \\
           &= \sum_{j=1}^{\infty} \lambda_j\, g(x)\phi_j(x)\, g(x')\phi_j(x') && (5) \\
           &= \langle g(x)\phi(x),\, g(x')\phi(x') \rangle_{\mathcal{F}} && (6) \\
           &= \langle \phi^*(x), \phi^*(x') \rangle && (7)
\end{aligned}
$$
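In matrix form, the new Gram matrix is $K^* = D K_1 D$ with $D = \operatorname{diag}(g(x_1), \dots, g(x_n))$, which remains positive semidefinite. A small illustrative check in R, with an arbitrary choice of $g$:

#Illustration: K* = D K D stays PSD for any real-valued g
set.seed(2)
X <- matrix(rnorm(20), nrow = 5)
K <- X %*% t(X)                              # Gram matrix of a valid kernel
g <- function(x) sum(x^2)                    # arbitrary real-valued function
D <- diag(apply(X, 1, g))
Kstar <- D %*% K %*% D
min(eigen(Kstar, symmetric = TRUE)$values)   # >= 0 up to numerical error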
Part 4)

Both $k_1(x, x')$ and $k_2(x, x')$ are valid kernels. Suppose $\phi_1(x)$ and $\phi_2(x)$ correspond to these kernels, respectively. We define $\phi^* : \mathbb{R}^d \to \mathcal{F}$ such that $\phi^*_{i,j}(x) = \phi_{1,i}(x)\,\phi_{2,j}(x)$. Since the inner product over the index pairs $(i, j)$ factorizes as $\big(\sum_i \phi_{1,i}(x)\phi_{1,i}(x')\big)\big(\sum_j \phi_{2,j}(x)\phi_{2,j}(x')\big)$, we can easily see that:

$$
\begin{aligned}
k^*(x, x') &= \langle \phi^*_{i,j}(x), \phi^*_{i,j}(x') \rangle_{\mathcal{F}} && (8) \\
           &= \langle \phi_{1,i}(x)\phi_{2,j}(x),\, \phi_{1,i}(x')\phi_{2,j}(x') \rangle && (9) \\
           &= k_1(x, x')\,k_2(x, x') && (10)
\end{aligned}
$$
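In matrix terms, the Gram matrix of $k^*$ is the element-wise (Schur) product $K_1 \circ K_2$ of the two Gram matrices, which the Schur product theorem guarantees is positive semidefinite. A quick numerical illustration in R:

#Illustration: the element-wise product of two PSD Gram matrices is PSD
set.seed(3)
X  <- matrix(rnorm(20), nrow = 5)
K1 <- X %*% t(X)                                # linear kernel
K2 <- exp(-as.matrix(dist(X))^2)                # Gaussian kernel with gamma = 1
min(eigen(K1 * K2, symmetric = TRUE)$values)    # >= 0 up to numerical error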

Part 5)

We import the data and fit an SVM.

#Import data, and load svm library


library(tidyverse)

## ── Attaching packages ─────────────────────────────── tidyverse 1.3.0 ──

## ✔ ggplot2 3.3.0     ✔ purrr   0.3.4
## ✔ tibble  3.0.1     ✔ dplyr   0.8.5
## ✔ tidyr   1.0.3     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.5.0

## ── Conflicts ──────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(e1071)

data1 <- read_csv("~/Downloads/HW3Problem1.csv")%>%mutate(class=factor(class))

## Parsed with column specification:


## cols(
## x1 = col_double(),
## x2 = col_double(),
## class = col_character()
## )

#Train svm
svm1 <- svm(class~.,data = data1)
svm1

##
## Call:
## svm(formula = class ~ ., data = data1)
##
##
## Parameters:
## SVM-Type: C-classification

## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 347

#Generate plot
plot(svm1, data = data1)

[SVM classification plot from e1071: decision regions labelled A and B over axes x1 (vertical) and x2 (horizontal); support vectors drawn as 'x', remaining points as 'o'.]
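To see how well the fitted model separates the training data, we can also tabulate its in-sample predictions (a quick check, not required by the assignment):

#In-sample confusion table for the fitted SVM
table(predicted = predict(svm1, data1), actual = data1$class)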
Problem 2

Part 1)

We construct the hinge loss function $g(z) = \max(0, 1 - z)$ and plot it.

#Range
z <- seq(-1,3,by=0.001)

#Hinge Loss
g <- function(z){
result <- ifelse(1-z>=0,1-z,0)
return(result)
}

#Create tibble
t1 <- tibble(z=z)%>%mutate(loss=g(z))

#Create plot
g1 <- ggplot(data = t1, mapping = aes(x=z,y=loss))+
geom_line()+
labs(title = "Hinge Loss g(z)")
g1

[Plot titled "Hinge Loss g(z)": the loss decreases linearly from 2 at $z = -1$ to 0 at $z = 1$, and is 0 for $z \ge 1$.]
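Equivalently, since $g(z) = \max(0, 1 - z)$, the same function can be written in one line with base R's vectorized pmax (the name hinge is introduced only for this illustration):

#Equivalent one-line definition of the hinge loss
hinge <- function(z) pmax(1 - z, 0)
all.equal(g(z), hinge(z))   # TRUE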
Part 2)

When $z_i = y_i(w_1 x_{i,1} + w_2 x_{i,2} + c) < 1$, we have $Q_i(v_H, c) = 1 - y_i(w_1 x_{i,1} + w_2 x_{i,2} + c) + \frac{\lambda}{2}(w_1^2 + w_2^2)$. Therefore:

$$
\nabla Q_i(v_H, c) = \begin{pmatrix} -y_i x_{i,1} + \lambda w_1 \\ -y_i x_{i,2} + \lambda w_2 \\ -y_i \end{pmatrix} \tag{11}
$$

When $z_i > 1$, we have $Q_i(v_H, c) = \frac{\lambda}{2}(w_1^2 + w_2^2)$. Therefore:

$$
\nabla Q_i(v_H, c) = \begin{pmatrix} \lambda w_1 \\ \lambda w_2 \\ 0 \end{pmatrix} \tag{12}
$$

(At $z_i = 1$ the hinge loss is not differentiable, so a subgradient is used.)
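These expressions can be spot-checked against a finite-difference approximation of $Q_i$; a minimal sketch with made-up values, where qi and num.grad are names introduced only for this illustration (the parameter order is $(w_1, w_2, c)$, matching equations (11) and (12)):

#Finite-difference check of the per-sample gradient (illustration only)
qi <- function(b, x, y, lambda = 0.25) {
  z <- y * (b[1] * x[1] + b[2] * x[2] + b[3])   # b = (w1, w2, c)
  max(0, 1 - z) + lambda / 2 * (b[1]^2 + b[2]^2)
}
b <- c(0.3, -0.2, 0.1); x <- c(1, 2); y <- 1; eps <- 1e-6
num.grad <- sapply(1:3, function(k) {
  e <- replace(numeric(3), k, eps)
  (qi(b + e, x, y) - qi(b - e, x, y)) / (2 * eps)
})
num.grad   # matches equation (11), since z < 1 at these values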

Part 3)

Here $\beta = (c, w_1, w_2)$ is the parameter vector, with the intercept stored first to match the code below.

#Load data
data2 <- read_csv("~/Downloads/HW3Problem2.csv")

## Parsed with column specification:


## cols(
## x1 = col_double(),
## x2 = col_double(),
## y = col_double()
## )

#Create Algorithm
gradient.descent <- function(data, lambda = 0.25, iter = 10000) {
  n  <- nrow(data)
  x1 <- data[[1]]
  x2 <- data[[2]]
  y  <- data[[3]]
  beta <- runif(3)   # beta = (c, w1, w2), random initialization
  for (i in 1:iter) {
    #Margins z_i = y_i * (c + w1*x1_i + w2*x2_i)
    z <- y * (beta[1] + beta[2] * x1 + beta[3] * x2)
    active <- as.numeric(z < 1)   # indicator of margin violations
    #Average gradient over all n samples (equations 11 and 12)
    grad <- c(mean(-y * active),
              mean(-y * x1 * active) + lambda * beta[2],
              mean(-y * x2 * active) + lambda * beta[3])
    eta <- 1 / (i * lambda)   # decaying step size
    beta <- beta - eta * grad
  }
  return(beta)
}

#Evaluate beta
beta.hat <- gradient.descent(data2)

#Display
beta.hat

## [1] -2.8072328 -0.8211780 0.8368775
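As a rough convergence check (not part of the assignment), we can evaluate the averaged regularized hinge objective at beta.hat; the helper obj below is just an illustration:

#Averaged regularized hinge objective (illustration)
obj <- function(beta, data, lambda = 0.25) {
  z <- data$y * (beta[1] + beta[2] * data$x1 + beta[3] * data$x2)
  mean(pmax(0, 1 - z)) + lambda / 2 * sum(beta[2:3]^2)
}
obj(beta.hat, data2)   # should be well below obj(c(0, 0, 0), data2) = 1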

Part 4)

We plot the points with the decision boundary from the previous part.

data3<-data2%>%mutate(y=factor(y))
g2 <- ggplot(data = data3, mapping = aes(x=x1,y=x2,colour=y))+
geom_point()+
geom_abline(slope=-beta.hat[2]/beta.hat[3],intercept = -beta.hat[1]/beta.hat[3])+
labs(title = "SVM")
g2

[Scatter plot titled "SVM": x2 against x1, points coloured by class y ∈ {−1, 1}, with the fitted decision boundary overlaid as a straight line.]
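Finally, the in-sample accuracy of the learned linear rule can be checked directly (a quick sanity check, assuming the labels are coded as -1 and 1 as in the plot):

#Training accuracy of sign(c + w1*x1 + w2*x2)
pred <- sign(beta.hat[1] + beta.hat[2] * data2$x1 + beta.hat[3] * data2$x2)
mean(pred == data2$y)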
