
Automatic Genre-Specific Text Classification

Figure 1. Support Vector Machines, where the hyperplane (3) is found to separate two classes of objects (represented here by stars and triangles, respectively) by considering the margin, i.e., the distance (2), between two support hyperplanes (4) defined by the support vectors (1). A special case is depicted here in which each object has only two feature variables, x1 and x2.

$\frac{1}{2}\|\omega\|^2$.

D is a vector of the classes of the training data, i.e., each item in D is +1 or −1. A is the matrix of feature values of the training data, and e is the vector of ones. After ω and γ are estimated from the training data, a testing item x will be classified as +1 if $\omega^T x + \gamma > 0$ and as −1 otherwise.
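To make the decision rule concrete, here is a minimal sketch in Python with NumPy (our illustration, not code from the paper); the values of ω, γ, and the testing item are made up for demonstration:

```python
import numpy as np

# Hypothetical parameter values, for illustration only; in practice
# omega and gamma are estimated from the training data.
omega = np.array([0.8, -0.3])  # weight vector of the hyperplane
gamma = 0.1                    # offset of the hyperplane

def classify(x, omega, gamma):
    """Classify x as +1 if it lies on the positive side of the
    hyperplane (omega^T x + gamma > 0), and as -1 otherwise."""
    return 1 if omega @ x + gamma > 0 else -1

x_test = np.array([0.5, 1.2])          # a testing item with two features
print(classify(x_test, omega, gamma))  # -> 1 (0.8*0.5 - 0.3*1.2 + 0.1 > 0)
```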

The soft margin hyperplane [Cortes & Vapnik, 1995] was proposed to allow for the case where the training data points cannot be split without errors. The modified objective function is

$$\frac{1}{2}\|\omega\|^2 + \sum_i \varepsilon_i \quad \text{such that} \quad D(A\omega - e\gamma) \ge e - \xi$$

where $\xi = (\varepsilon_1, \ldots, \varepsilon_n)^T$ and $\varepsilon_i$ measures the degree of misclassification of the ith training data point during the training. It considers minimizing the errors while maximizing the margins.
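To make the role of the slack variables concrete, here is a small sketch (our own illustration; all values are assumed) that computes the smallest slacks satisfying the constraint above, $\varepsilon_i = \max(0,\, 1 - d_i(a_i^T\omega - \gamma))$, and evaluates the soft-margin objective on a toy data set:

```python
import numpy as np

# Toy training set (assumed values, for illustration only).
A = np.array([[1.0, 2.0],    # matrix of feature values, one row per item
              [2.0, 0.5],
              [0.2, 0.3]])
d = np.array([1, 1, -1])     # vector of classes D, each entry +1 or -1
omega = np.array([0.6, 0.4]) # a candidate weight vector
gamma = 1.0                  # a candidate offset

# The smallest slack satisfying d_i*(a_i . omega - gamma) >= 1 - eps_i
# with eps_i >= 0 is eps_i = max(0, 1 - d_i*(a_i . omega - gamma)).
eps = np.maximum(0.0, 1.0 - d * (A @ omega - gamma))

# Soft-margin objective: (1/2)*||omega||^2 + sum of the slacks.
objective = 0.5 * (omega @ omega) + eps.sum()
print(eps, objective)
```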
In some cases, it is not easy to find the hyperplane in the original data space, in which case the original data space has to be transformed into a higher dimensional space by applying kernels [Boser, Guyon, & Vapnik, 1992]. Commonly used kernels include the polynomial, radial basis function, Gaussian radial basis function, and sigmoid kernels. In our comparative study, we only tested SVM with a polynomial kernel. In addition, sequential minimal optimization (SMO) [Platt, 1999], a fast nonlinear optimization method, was employed during the training process to accelerate training.
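As a sketch of the kernel idea (our own illustration; the paper gives no code), a polynomial kernel replaces the plain inner product $x^T z$ with $(x^T z + c)^p$, which is equivalent to an inner product in a higher dimensional feature space that is never constructed explicitly:

```python
import numpy as np

def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel K(x, z) = (x . z + coef0) ** degree: an inner
    product in a higher dimensional feature space, computed without
    ever building that space explicitly."""
    return (x @ z + coef0) ** degree

x = np.array([1.0, 2.0])
z = np.array([0.5, -1.0])
print(polynomial_kernel(x, z))  # (0.5 - 2.0 + 1.0)**2 = 0.25
```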
Evaluation Results and Discussions

Evaluation Setups - We applied the classification models discussed above (five settings in total, implemented with the Weka package [Witten & Frank, 2005]) on the training corpus with the three different feature sets. In the rest of this paper, we refer to the SVM implemented using SMO simply as ‘SMO’ for short; the one with the polynomial kernel as ‘SMO-K’; Naïve Bayes with numeric features estimated by a Gaussian distribution as ‘NB’; and the one with kernel density estimation as ‘NB-K’. We used tenfold cross validation to estimate the classification performance as measured by F1. Tenfold cross validation estimates the average classification performance by splitting a training corpus into ten parts and averaging the performance over ten runs, each run with nine of these parts as a training set and the remaining part as a testing set. F1 is a measure that trades off precision and recall (their harmonic mean, $F_1 = 2PR/(P+R)$). It provides an overall measure of classification performance.
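The experiments themselves were run in Weka; as a rough Python analogue (a sketch under our own assumptions, using scikit-learn and synthetic data rather than the paper's corpus and settings), tenfold cross validation with F1 scoring looks like this:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for a training corpus (assumed data; the paper's
# corpus and feature sets are not reproduced here).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# An SVM with a polynomial kernel, loosely analogous to the 'SMO-K'
# setting; scikit-learn's SVC trains with an SMO-style solver (libsvm).
model = SVC(kernel="poly", degree=2)

# Tenfold cross validation: ten runs, each training on nine parts of
# the data and testing on the held-out tenth; F1 scores are averaged.
scores = cross_val_score(model, X, y, cv=10, scoring="f1")
print("mean F1: %.3f" % scores.mean())
```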
For


