Machine Learning 3
Annalisa Marsico
OWL RNA Bioinformatics group
Max Planck Institute for Molecular Genetics
Free University of Berlin
29 April, SoSe 2015
Support Vector Machines (SVMs)
where $w^* = \arg\min_w \sum_i (y_i - w^T x_i)^2 + \lambda \sum_j w_j^2$
Vectors, data points, inner products
Consider $f(x) = \sum_j w_j x_j = \langle w, x \rangle = w^T x$
where $w = (3, 1)$ and $x = (1, 2)$, so $\langle w, x \rangle = 3 \cdot 1 + 1 \cdot 2 = 5$
[Figure: vectors w and x plotted in the (x1, x2) plane, with angle θ between them]
For any two vectors, their dot product (aka inner product) equals the product of their lengths times the cosine of the angle between them: $\langle w, x \rangle = \|w\|\,\|x\|\cos\theta$
where $w^* = \arg\min_w \sum_i (y_i - w^T x_i)^2 + \lambda \|w\|^2$
(the second term is the regularization term)
Solve by taking the derivative with respect to w and setting it to zero:
$X^T X w + \lambda w = X^T y$
so: $w = (X^T X + \lambda I)^{-1} X^T y$
Linear Regression Primal Form
Learn $f(x) = \sum_j w_j x_j = \langle w, x \rangle = w^T x$
where $w^* = \arg\min_w \sum_i (y_i - w^T x_i)^2 + \lambda \|w\|^2$
Primal solution: $w = (X^T X + \lambda I)^{-1} X^T y$
Dual solution: $\alpha = (X X^T + \lambda I)^{-1} y$, with $w = X^T \alpha = \sum_i \alpha_i x_i$
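As a sanity check, the two closed-form solutions can be compared numerically. Below is a minimal NumPy sketch on made-up toy data (the data, λ = 0.5, and all variable names are illustrative assumptions, not part of the slides): the primal form inverts a d × d matrix, the dual form an n × n matrix, and both yield the same weight vector.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))                              # 50 examples, 5 features
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=50)
lam = 0.5                                                 # regularization strength lambda

# Primal solution: w = (X^T X + lambda I)^-1 X^T y   (solve a d x d system)
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Dual solution: alpha = (X X^T + lambda I)^-1 y, w = X^T alpha   (solve an n x n system)
alpha = np.linalg.solve(X @ X.T + lam * np.eye(X.shape[0]), y)
w_dual = X.T @ alpha

print(np.allclose(w_primal, w_dual))                      # True: both give the same weights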
$K_{ij} = \langle x_i, x_j \rangle$
$k(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$
$\Phi: X \to F$, $\Phi(x) \in F$
Kernel Functions
[Figure: the feature map $\Phi$ takes points x1, x2 from the original 2D space to $\Phi(x_1)$, $\Phi(x_2)$ in a higher-dimensional projected space with axes u1, u2]
Dual solution: $\alpha = (K + \lambda I)^{-1} y$, where $K_{ij} = k(x_i, x_j)$
We can compute the kernel directly from the original vectors, e.g.
$k(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle = \langle x_i, x_j \rangle^2$
and use it to train and apply our regression function, never leaving 2D space:
$f(x) = \sum_i \alpha_i k(x_i, x)$
Implications of the “kernel trick”
• We implicitly learn in the (much higher-dimensional) projected space, but actually use less computation for the learning phase than we did in the original space – e.g. with 1000 training examples and 1024 projected features, the dual form inverts a 1000 x 1000 matrix instead of a 1024 x 1024 matrix
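To illustrate the trick, the sketch below (toy data and the degree-2 feature map are illustrative assumptions) fits kernel ridge regression with $k(x, z) = \langle x, z \rangle^2$, so it only ever builds the n x n kernel matrix, and checks that its prediction matches the one obtained with the explicit 3-dimensional feature map $\Phi$.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))                              # training data stays in 2D
y = X[:, 0] ** 2 + X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=30)
lam = 0.1

# Explicit degree-2 feature map Phi(x) = (x1^2, sqrt(2) x1 x2, x2^2)
def phi(X):
    return np.column_stack([X[:, 0] ** 2,
                            np.sqrt(2) * X[:, 0] * X[:, 1],
                            X[:, 1] ** 2])

# Kernel trick: k(x, z) = <x, z>^2 equals <Phi(x), Phi(z)> without ever building Phi
K = (X @ X.T) ** 2
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

x_new = np.array([[0.3, -0.7]])
f_kernel = ((X @ x_new.T) ** 2).ravel() @ alpha           # sum_i alpha_i k(x_i, x_new)
f_explicit = phi(x_new) @ (phi(X).T @ alpha)              # same prediction via w = Phi(X)^T alpha
print(np.allclose(f_kernel, f_explicit))                  # True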
Some common kernels
Polynomial of degree d:
$k(x_i, x_j) = \langle x_i, x_j \rangle^d$
Polynomial of degree up to d:
$k(x_i, x_j) = (\langle x_i, x_j \rangle + 1)^d$
Gaussian / radial (RBF) kernel (polynomials of all orders – the projected space has infinitely many dimensions):
$k(x_i, x_j) = \exp\!\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$
Linear kernel:
$k(x_i, x_j) = \langle x_i, x_j \rangle$
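For reference, here is one possible NumPy implementation of these kernels for a pair of vectors (the function names and the default d and σ are arbitrary choices, not from the slides).

import numpy as np

def linear_kernel(xi, xj):
    """k(xi, xj) = <xi, xj>"""
    return xi @ xj

def poly_kernel(xi, xj, d=3):
    """Polynomial of degree d: k(xi, xj) = <xi, xj>^d"""
    return (xi @ xj) ** d

def poly_kernel_up_to_d(xi, xj, d=3):
    """Polynomial of degree up to d: k(xi, xj) = (<xi, xj> + 1)^d"""
    return (xi @ xj + 1) ** d

def rbf_kernel(xi, xj, sigma=1.0):
    """Gaussian / radial kernel: k(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2))"""
    return np.exp(-np.sum((xi - xj) ** 2) / (2 * sigma ** 2))

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(linear_kernel(x, z), poly_kernel_up_to_d(x, z, d=2), rbf_kernel(x, z, sigma=0.5))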
Key points about kernels
• Many learning tasks are framed as optimization problems
[Figure: linearly separable + and − training points in 2D, with several candidate separating lines]
Pick the one with the largest margin!
Parametrizing the decision boundary
The decision boundary is the hyperplane $w^T x + b = 0$: points with $w^T x + b > 0$ are classified +, points with $w^T x + b < 0$ are classified −
[Figure: + and − points on either side of the hyperplane $w^T x + b = 0$]
Margin $\gamma$ = distance of the closest examples from the decision line / hyperplane
Margin $= \gamma = a / \|w\|$
[Figure: separating hyperplane with the margin γ marked on both sides]
Labels $y_j \in \{-1, +1\}$ for the + and − classes
Maximizing the margin corresponds to minimizing $\|w\|$!
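A small numerical example may help: for a hypothetical hyperplane with w = (3, 4) and b = −2 (values chosen only for illustration), the sketch below computes $y_j (w^T x_j + b) / \|w\|$ for a few labelled points; the smallest of these distances is the margin $a / \|w\|$.

import numpy as np

# Hypothetical hyperplane w^T x + b = 0; values chosen only for illustration
w = np.array([3.0, 4.0])                     # ||w|| = 5
b = -2.0
X = np.array([[2.0, 1.0],                    # a few labelled points
              [0.0, 0.0],
              [1.0, 2.0]])
y = np.array([+1, -1, +1])

scores = X @ w + b                           # y_j * score > 0 if correctly classified
distances = y * scores / np.linalg.norm(w)   # distance of each point from the hyperplane
print(distances)                             # [1.6, 0.4, 1.8]
print(distances.min())                       # the margin a / ||w|| = 0.4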
SVM: Maximize the margin
Margin $= \gamma = a / \|w\|$
$\max_{w,b}\ \gamma = a / \|w\|$
s.t. $y_j (w^T x_j + b) \ge a$ for all j training examples
Note: 'a' is arbitrary (we can normalize the equations by a)
[Figure: + and − points with the maximum-margin hyperplane and margin γ on each side]
Setting a = 1 gives the primal form:
$\min_{w,b}\ w^T w$
s.t. $y_j (w^T x_j + b) \ge 1$
Solve efficiently by quadratic programming (QP) – a well-studied class of problems with standard solution algorithms
Primal: $\min_{w,b}\ w^T w$   s.t. $y_j (w^T x_j + b) \ge 1$ for all j training examples
Dual: $\max_{\alpha_1,\ldots,\alpha_n}\ \sum_j \alpha_j - \frac{1}{2} \sum_{j,k} \alpha_j \alpha_k\, y_j y_k \langle x_j, x_k \rangle$   s.t. $\alpha_j \ge 0$, $\sum_j \alpha_j y_j = 0$
[Figure: separating hyperplane with $w^T x + b > 0$ on one side and $w^T x + b < 0$ on the other; the closest + and − points lie on the margin]
The linear hyperplane is defined by the support vectors:
• Moving the other points a little doesn't change the decision boundary
• We only need to store the support vectors to predict labels of new points
“Hard margin” Support Vector Machine
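In practice a hard-margin SVM can be approximated by a soft-margin solver with a very large C. The sketch below uses scikit-learn's SVC on made-up separable toy data (the data and C = 1e6 are illustrative assumptions) and prints w, b, the support vectors, and the margin width.

import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 2.5], [2.5, 1.5],          # + class
              [-1.0, -1.0], [-2.0, -1.5], [-1.5, -2.5]])   # - class
y = np.array([+1, +1, +1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6)     # hard margin in the limit C -> infinity
clf.fit(X, y)

print(clf.coef_, clf.intercept_)      # w and b of the separating hyperplane
print(clf.support_vectors_)           # only these points define the decision boundary
print(2 / np.linalg.norm(clf.coef_))  # width of the margin band, 2 / ||w||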
Kernel SVMs
Because the dual form only depends on dot products, we can apply the kernel trick to work in a (virtual) projected space $\Phi: X \to F$
$\min_{w,b}\ w^T w$
s.t. $y_j (w^T \Phi(x_j) + b) \ge 1$ for all j training examples
Dual: $\max_{\alpha_1,\ldots,\alpha_n}\ \sum_j \alpha_j - \frac{1}{2} \sum_{j,k} \alpha_j \alpha_k\, y_j y_k\, k(x_j, x_k)$
Classify new points with $f(x) = w^T \Phi(x) + b = \sum_j \alpha_j y_j\, k(x_j, x) + b$
[Figure: RBF-kernel SVM decision function shown as contour lines in the original 2D (x1, x2) space]
Points are plotted in the original 2D space; circled points are the support vectors, i.e. training examples with non-zero $\alpha_j$
Contour lines correspond to $f(x) = b + \sum_{j \in SV} \alpha_j y_j\, k(x_j, x) = b + \sum_{j \in SV} \alpha_j y_j \exp\!\left(-\frac{\|x_j - x\|^2}{2\sigma^2}\right)$
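This decision function can be reproduced from a fitted model. The sketch below (toy data with circular labels, σ = 0.7 and C = 10 are illustrative assumptions) trains an RBF SVC with scikit-learn, whose dual_coef_ stores the products $\alpha_j y_j$ for the support vectors, and rebuilds f(x) by hand.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1, -1)    # non-linear (circular) labels

sigma = 0.7
clf = SVC(kernel="rbf", gamma=1.0 / (2 * sigma ** 2), C=10.0).fit(X, y)

# Reconstruct f(x) = b + sum_{j in SV} alpha_j y_j exp(-||x_j - x||^2 / (2 sigma^2))
x_new = np.array([0.3, 0.4])
sq_dists = np.sum((clf.support_vectors_ - x_new) ** 2, axis=1)
f_manual = clf.intercept_[0] + clf.dual_coef_[0] @ np.exp(-sq_dists / (2 * sigma ** 2))

print(np.isclose(f_manual, clf.decision_function(x_new.reshape(1, -1))[0]))   # True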
SVMs with Soft Margin
Allow errors in classification:
$\min_{w,b}\ w^T w + C \cdot (\#\text{mistakes})$
s.t. $y_j (w^T \Phi(x_j) + b) \ge 1$ for all j training examples
[Figure: overlapping + and − points that cannot be separated without errors]
Maximize the margin and minimize the number of mistakes on the training data
C – tradeoff parameter
Not a QP
Treats all errors equally
What if the data are not linearly separable?
Allow errors in classification:
$\min_{w,b}\ w^T w + C \sum_j \xi_j$
s.t. $y_j (w^T \Phi(x_j) + b) \ge 1 - \xi_j$, $\xi_j \ge 0$ for all j training examples
[Figure: overlapping + and − points; misclassified points are assigned a slack $\xi_j$]
$\xi_j$ = 'slack' variable ($\xi_j > 1$ if the point is misclassified)
Pay a linear penalty for mistakes
C – tradeoff parameter
Still a QP
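The effect of the tradeoff parameter C can be seen on overlapping classes. The sketch below (the toy Gaussian blobs and the chosen C values are illustrative assumptions) fits a linear soft-margin SVC for several C and reports the number of support vectors and training errors.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(loc=+1.0, scale=1.2, size=(50, 2)),
               rng.normal(loc=-1.0, scale=1.2, size=(50, 2))])
y = np.array([+1] * 50 + [-1] * 50)       # overlapping classes: not linearly separable

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    n_errors = np.sum(clf.predict(X) != y)
    # Small C tolerates more margin violations; large C penalizes training errors more heavily
    print(f"C={C:7.2f}  support vectors={len(clf.support_):3d}  training errors={n_errors}")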
Variable selection with SVMs
Forward selection: all features are tried separately and the best-performing one, $x^*_1$, is retained. Then all remaining features are added in turn and the best pair $\{x^*_1, x^*_2\}$ is retained; then all remaining features are added in turn and the best trio $\{x^*_1, x^*_2, x^*_3\}$ is retained, and so on, until the performance stops increasing or until all features have been exhausted.
Backward elimination (pseudocode):
While F ≠ {}:
  - train an SVM on D (training set), using cross-validation to tune the parameters
  - p is the performance obtained with the best set of parameters
  - if p − p* > t:
      - p* = p, oldF = F
      - for each feature f in F: compute the difference in performance when f is removed
      - discard the feature that leads to the smallest difference
  - else: F = oldF; stop
Output the features that are left in F
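One possible Python rendering of this procedure, using scikit-learn's SVC and cross_val_score (the fixed C = 1.0, cv = 5, and the threshold t are simplifying assumptions; the per-iteration hyperparameter tuning from the pseudocode is omitted for brevity):

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def backward_elimination(X, y, t=0.0):
    """Greedy backward elimination with a linear SVM, following the pseudocode above."""
    clf = SVC(kernel="linear", C=1.0)
    F = list(range(X.shape[1]))               # current feature set (column indices)
    p_star, old_F = -np.inf, list(F)
    while F:
        p = cross_val_score(clf, X[:, F], y, cv=5).mean()   # performance with current F
        if p - p_star <= t:                   # no longer improving by more than t
            F = old_F
            break
        p_star, old_F = p, list(F)
        if len(F) == 1:                       # nothing left to remove
            break
        # performance after removing each feature f in turn; drop the one whose
        # removal leads to the smallest difference (i.e. hurts accuracy the least)
        perf_without = {f: cross_val_score(clf, X[:, [g for g in F if g != f]], y, cv=5).mean()
                        for f in F}
        F.remove(max(perf_without, key=perf_without.get))
    return F                                  # the features that are left in F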
SVM Summary
• Objective: maximize the margin between the decision surface and the data
• Kernel SVMs: learn a linear decision boundary in a high-dimensional space while working in the original low-dimensional space
• Application example: protein localization