Abstract

Support Vector Machines (SVMs) have become a popular learning algorithm, in particular for large, high-dimensional classification problems. SVMs have been shown to give highly accurate classification results in a variety of applications. Several methods have been proposed to obtain not only a classification, but also an estimate of the SVM's confidence in the correctness of the predicted label. In this paper, several algorithms are compared which scale the SVM decision function to obtain an estimate of the conditional class probability. A new, simple and fast method is derived from theoretical arguments and empirically compared to the existing approaches.
1 Introduction

Support Vector Machines (SVMs) have become a popular learning algorithm, in particular for large, high-dimensional classification problems. SVMs have been shown to give highly accurate classification results in a variety of applications. Several methods have been proposed to obtain not only a classification, but also an estimate of the SVM's confidence in the correctness of the predicted label.
Usually, the performance of a classifier is measured in terms of accuracy or some other performance measure based on the comparison of the classifier's prediction ŷ with the true class y. But in some cases, this does not give sufficient information. For example, in credit card fraud detection one usually has many more negative than positive examples, such that the optimal classifier may be the default negative classifier. But then one would still like to find out which transactions are most probably fraudulent, even if this probability is small. In other situations, e.g. information retrieval, one could be more interested in a ranking of the examples with respect to their interestingness instead of a simple yes/no decision. Third, one may be interested in integrating a classifier into a bigger system, for example a multi-classifier learner. To combine and compare the SVM prognosis with that of other learners, one would like a comparable, well-defined confidence estimate. The best way to achieve a confidence estimate that allows one to rank the examples and gives well-defined, interpretable values is to estimate the conditional class probability P(y|x). Obviously, this is a more complex problem than finding a classification l(x) ∈ {−1, 1}, as it is possible to obtain a classification function by comparing P̂(y|x) to the threshold 0.5, but not vice versa.
For numerical classifiers, i.e. classifiers of the type l(x) = sign(f(x)) with a numerical decision function f, one usually tries to estimate the conditional class probability from the decision function, P̂(y|x) = P̂(y|f(x)). This reduces the probability estimation from a multivariate to a one-dimensional problem, where one has to find a scaling function σ such that P̂(Y = 1|x) = σ(f(x)). The idea behind this approach is that the classification l(x) of examples that lie close to the decision boundary {x | f(x) = 0} can easily change when the examples are randomly perturbed by a small amount, whereas this is very hard for examples with very high or very low f(x) (this argument requires some sort of continuity or differentiability constraints on the function f). Hence, the probability that the classifier is correct should be higher for larger absolute values of f. As was noted by Platt [10], this also means there is a strong prior for selecting a monotonic scaling function σ.
The rest of the paper is organized as follows: In the next section, we will briefly present the Support Vector Machine and Kernel Logistic Regression algorithms, as far as is necessary for this paper. In Section 3, existing methods for probabilistic scaling of SVM outputs will be discussed and a new, simple scaling method will be presented. The effectiveness of this method will be empirically evaluated in Section 4.
2 Algorithms
2.1 Support Vector Machines
Support Vector Machines are a classification method based on Statistical Learning Theory [12]. The goal is to find a function f(x) = w · x + b that minimizes the expected risk

    R[f] = \int\!\!\int L(y, f(x)) \, dP(y|x) \, dP(x)

of the learner by minimizing the regularized risk R_reg[f], which is the weighted sum of the empirical risk with respect to the data (x_i, y_i), i = 1, ..., n, and a complexity term ||w||^2:

    R_reg[f] = \frac{1}{2} ||w||^2 + C \sum_i |1 - y_i f(x_i)|_+        (1)
where |t|_+ = max(t, 0). This optimization problem can be solved efficiently in its dual formulation:

    \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j) - \sum_{i=1}^{n} \alpha_i \to \min        (2)

    \text{w.r.t.} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \qquad \forall i: \; 0 \le \alpha_i \le C.
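To make the dual formulation (2) concrete, the following sketch in plain NumPy (hypothetical helper names, linear kernel) evaluates the dual objective and checks the constraints for a given vector of multipliers alpha:

    import numpy as np

    def dual_objective(alpha, X, y):
        # Dual objective of Eq. (2) for the linear kernel:
        # 0.5 * sum_{i,j} alpha_i alpha_j y_i y_j (x_i . x_j) - sum_i alpha_i
        K = X @ X.T                  # Gram matrix of inner products x_i . x_j
        v = alpha * y
        return 0.5 * v @ K @ v - np.sum(alpha)

    def is_feasible(alpha, y, C, tol=1e-8):
        # Constraints of Eq. (2): sum_i alpha_i y_i = 0 and 0 <= alpha_i <= C
        return (abs(alpha @ y) < tol
                and np.all(alpha >= -tol)
                and np.all(alpha <= C + tol))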
2.2 The Kernel Trick

The inner product x_i · x_j in Equation 2 can be replaced by a kernel function K(x_i, x_j) which corresponds to an inner product in some space, called the feature space. That is, there exists a mapping Φ into the feature space such that K(x, x') = Φ(x) · Φ(x'). This allows the construction of non-linear classifiers by an essentially linear algorithm.

The resulting decision function is given by

    f(x) = w \cdot \Phi(x) + b = \sum_{i=1}^{n} y_i \alpha_i K(x_i, x) + b.

The actual SVM classification is given by sign(f(x)). It can be shown that the SVM solution depends only on its support vectors SV = {x_i | α_i ≠ 0}. See [12, 2] for a more detailed introduction to SVMs.
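As an illustration, the decision function can be evaluated directly from the support vector expansion. The following sketch (hypothetical helper names, RBF kernel chosen as an example, plain NumPy) assumes the multipliers α_i, labels y_i, support vectors x_i and the offset b have already been obtained from training:

    import numpy as np

    def rbf_kernel(x1, x2, gamma=1.0):
        # K(x, x') = exp(-gamma * ||x - x'||^2), one common kernel choice
        return np.exp(-gamma * np.sum((x1 - x2) ** 2))

    def decision_function(x, support_vectors, alphas, labels, b, kernel=rbf_kernel):
        # f(x) = sum_i y_i alpha_i K(x_i, x) + b, summed over the support
        # vectors (the examples with alpha_i != 0)
        return sum(y_i * a_i * kernel(x_i, x)
                   for x_i, a_i, y_i in zip(support_vectors, alphas, labels)) + b

    # the predicted class is sign(decision_function(x, ...))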
2.3 Kernel Logistic Regression

Kernel Logistic Regression (KLR) applies the kernel trick to the logistic regression model, which estimates the conditional class probability directly as

    P(y|x) = \frac{1}{1 + e^{-y(w \cdot x + b)}}.

The drawback of KLR is that typically all α_i are nonzero, as all examples play a role in estimating the conditional class probability, whereas the SVM needs only a small number of support vectors to classify the examples. Hence, KLR is computationally much more expensive than the SVM.
3 Scaling SVM Outputs

The value of the SVM decision function f(x) measures the (signed, scaled) distance of the example x from the hyperplane (w, b). Assuming that P(Y = 1|x) is continuous in x, it seems reasonable that examples lying closer to the hyperplane have a larger probability of being misclassified than examples lying far away (the closer the example is to the hyperplane, the smaller the changes that are needed to produce a different classification). Hence, it seems suitable to model the conditional class probability P(y|x) as a function of the value of the SVM decision function, i.e. P̂(Y = 1|x) = σ(f(x)) with an appropriate scaling function σ.
There are several ad-hoc scaling functions, e.g. the softmax scaler

    \sigma_{softmax}(z) = \frac{1}{1 + e^{-2z}},

which monotonically maps the decision function value z = f(x) to the interval [0, 1]. The scaler assumes that the class decision is given by sign(z), so that at z = 0 the classifier's confidence in its class decision is smallest; accordingly, z = 0 is mapped to the conditional class probability 0.5. This allows one to view σ_softmax(z) as a probability. However, this mapping is not very well founded, as the scaled values are not justified by the data.
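For illustration, the softmax scaler is a one-liner (a minimal sketch; the function name is ours):

    import numpy as np

    def softmax_scaler(z):
        # Ad-hoc scaling of a decision value z = f(x) to [0, 1]:
        # sigma_softmax(z) = 1 / (1 + exp(-2 z)); z = 0 is mapped to 0.5
        return 1.0 / (1.0 + np.exp(-2.0 * z))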
To justify the interpretation P̂(Y = 1|x) = σ(f(x)), it is better to use data to calibrate the scaling. One can use a subset of the data which has not been used for training (or use a cross-validation-like approach) and optimize the scaling function to minimize the error between the predicted class probability σ(f(x)) and the empirical class probability defined by the class values y in the new data. Two error measures are usually used, cross-entropy and mean squared error. Cross-entropy is defined by

    CRE = -\sum_i \left( y_i \log(z_i) + (1 - y_i) \log(1 - z_i) \right)

(where z_i = σ(f(x_i))), which corresponds to the Kullback-Leibler distance between the predicted and the empirical class probability. For the comparison of different data sets it is better to divide the cross-entropy by the number of examples and work with the mean cross-entropy mCRE. The mean squared error is defined by

    MSE = \frac{1}{n} \sum_i (y_i - p_i)^2.

It is an appropriate error measure because for a binary random variable Y ∈ {0, 1}, the expected value of (Y − p)² is minimized by p = P(Y = 1). Hence, the task of estimating the conditional class probability becomes a regression task. The open question is which types of scaling functions should be fitted to the data.
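Both error measures are straightforward to compute on a calibration set. A minimal sketch (labels coded as 0/1, hypothetical function names, with clipping added as a numerical safeguard that the formulas above do not contain):

    import numpy as np

    def mean_cross_entropy(y, p, eps=1e-12):
        # mCRE for labels y in {0, 1} and predicted probabilities p = sigma(f(x));
        # probabilities are clipped to avoid log(0)
        p = np.clip(p, eps, 1.0 - eps)
        return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

    def mean_squared_error(y, p):
        # MSE = 1/n * sum_i (y_i - p_i)^2
        return np.mean((y - p) ** 2)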
Motivated by an empirical analysis, Platt [10] uses scaling functions of the form

    \sigma_{a,b}(z) = \frac{1}{1 + e^{az + b}}

with a ≤ 0 to obtain a monotonically increasing function. The parameters a and b are found by minimizing the cross-entropy error over a test set (x_i, y_i) with z_i = f(x_i). For an efficient implementation, see [8].
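A simple way to fit a and b is generic numerical minimization of the cross-entropy. The sketch below (scipy-based, hypothetical function name) omits the numerical safeguards and target smoothing of the careful implementation discussed in [8]:

    import numpy as np
    from scipy.optimize import minimize

    def fit_platt_scaler(f_vals, y, eps=1e-12):
        # Fit sigma_{a,b}(z) = 1 / (1 + exp(a z + b)) by minimizing the
        # cross-entropy on a calibration set; labels y are coded as 0/1.
        def cross_entropy(params):
            a, b = params
            p = np.clip(1.0 / (1.0 + np.exp(a * f_vals + b)), eps, 1.0 - eps)
            return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
        # note: the constraint a <= 0 is not enforced in this simple sketch
        a, b = minimize(cross_entropy, x0=np.array([-1.0, 0.0]),
                        method="Nelder-Mead").x
        return lambda z: 1.0 / (1.0 + np.exp(a * z + b))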
Garczarek [4] proposes a method which scales classification values by

    \sigma(z) = B^{-1}_{\alpha_1,\beta_1}\!\left( B_{\alpha_2,\beta_2}(z) \right),
where B_{α,β} is the Beta distribution function with parameters α and β. The parameters α_1, β_1, α_2 and β_2 are selected such that over a test set (x_i, y_i)

1. the average value of σ(f(x)) for each class is identical to the classification performance of the classifier f in this class, and

2. the mean squared error (y − σ(f(x)))² is minimized.

Originally, the algorithm is designed for multi-class problems and computes an individual scaler for each predicted class. For binary problems, it is better to modify this approach such that only one scaler is generated. This avoids discontinuities in P̂(Y = 1|x) when the prediction changes from one class to the other.
Binning has also been applied to this problem [3]. The decision values are discretized into several bins, and the conditional class probability is estimated from the class distribution within each bin. Other, more complicated approaches also exist, see e.g. [7] or [12], Ch. 11.11.
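A binning scaler can be sketched as follows (equal-frequency bins are one possible choice; [3] does not prescribe a particular binning, and the names are ours). Labels y are again coded as 0/1 and passed as NumPy arrays:

    import numpy as np

    def fit_binning_scaler(f_vals, y, n_bins=10):
        # Discretize the decision values into (here: equal-frequency) bins and
        # estimate P(Y=1 | f(x)) by the fraction of positives in each bin.
        edges = np.quantile(f_vals, np.linspace(0.0, 1.0, n_bins + 1))
        idx = np.clip(np.searchsorted(edges, f_vals, side="right") - 1, 0, n_bins - 1)
        probs = np.array([y[idx == b].mean() if np.any(idx == b) else 0.5
                          for b in range(n_bins)])
        def scaler(z):
            b = np.clip(np.searchsorted(edges, z, side="right") - 1, 0, n_bins - 1)
            return probs[b]
        return scaler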
[Figure: example data (x, y) together with the decision function values of the SVM and of KLR.]
4 Experiments

The experiments were conducted on 11 data sets, including 7 data sets from the UCI Repository [9] (covtype, diabetes, digits, ionosphere, liver, mushroom, promoters) and 4 other real-world data sets: a business cycle analysis problem (business), an analysis of a direct mailing application (directmailing), a data set from a life insurance company (insurance) and intensive care patient monitoring data (medicine). Prior to learning, nominal attributes were binarised and the attributes were scaled to mean 0 and variance 1. Multi-class problems were converted to two-class problems by arbitrarily selecting two of the classes (covtype and digits) or by combining smaller classes into a single class (business, medicine). For the covtype data set, a 1% sample was drawn. The following table sums up the description of the data sets:
Experiments were conducted with Support Vector Machines and Kernel Logistic Regression with both linear and radial basis function kernels. The parameters of the algorithms were selected in a prior step to optimize accuracy. The following algorithms were compared in the experiments:

KLR: Kernel Logistic Regression, used as the baseline.
For the linear kernel, the following results were obtained:

Method               MSE      mCRE
KLR                  0.1000   0.0332
SVM-Platt            0.0912   0.0291
SVM-Beta             0.5966   1
SVM-Beta-2           0.0915   0.0301
SVM-Bin (10 bins)    0.1201   0.0384
SVM-Bin (50 bins)    0.1301   0.0415
SVM-Softmax          0.0975   0.0343
SVM-01               0.0970   0.0317
SVM-PP               0.0933   0.0296
With respect to the mean squared error, this gives the following ranking (where "<" means "better than"): SVM-Platt < SVM-Beta-2 < SVM-PP < SVM-01 < SVM-Softmax < KLR < SVM-Bin-10 < SVM-Bin-50 << SVM-Beta. Sorting by mean cross-entropy, SVM-Beta-2 and SVM-PP change places, as do SVM-Softmax and SVM-Bin-10.

The RBF kernel gave the following results:

This gives the following ranking for MSE: KLR < SVM-Platt < SVM-Beta-2 < SVM-PP < SVM-01 < SVM-Bin-10 < SVM-Softmax < SVM-Bin-50 << SVM-Beta.
A closer inspection reveals that these results do not give the full picture, as the error measures reach very different values on the individual data sets. E.g., the MSE for Kernel Logistic Regression with the radial basis kernel ranges from 10^{-7} (mushroom) to 0.191 (liver). To allow for a better comparison, the methods were ranked according to their performance on each data set. The following table gives the average rank of each of the methods for the linear kernel:
Method               avg. rank (MSE)   avg. rank (mCRE)
KLR                  3.18              3.09
SVM-Platt            3.18              3.45
SVM-Beta             9.00              9.00
SVM-Beta-2           3.27              3.45
SVM-Bin (10 bins)    5.18              5.55
SVM-Bin (50 bins)    6.55              6.45
SVM-Softmax          5.18              5.36
SVM-01               4.91              5.09
SVM-PP               3.45              3.55
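The average ranks shown above can be reproduced from a matrix of per-data-set errors with a few lines. A sketch with hypothetical names, using scipy's rankdata:

    import numpy as np
    from scipy.stats import rankdata

    def average_ranks(errors):
        # errors: array of shape (n_datasets, n_methods), e.g. the MSE of every
        # method on every data set. Rank the methods per data set (rank 1 =
        # lowest error, ties get averaged ranks) and average over the data sets.
        ranks = np.vstack([rankdata(row) for row in errors])
        return ranks.mean(axis=0)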
To validate the significance of the results, a paired t-test (α = 0.05) was run over the cross-validation runs. The following table compares the cross-entropy for the linear kernel among the best five of the scaling algorithms and the KLR baseline. Each entry shows for how many data sets the hypothesis that the estimator in its row is better than the estimator in its column was rejected. E.g., the 6 in the last row and first column shows that the hypothesis that softmax scaling is better than KLR was rejected for 6 of the data sets. The contrary hypothesis was rejected on 2 data sets (first row, last column).

These are the results for cross-entropy and the radial basis kernel:
         KLR   Platt  Beta2  PP   Bin10  Soft
KLR       0     0      0     0     0      0
Platt     6     0      0     1     0      0
Beta2     7     6      0     4     1      0
PP        7     5      4     0     2      0
Bin10     8     3      3     3     0      2
Soft      9     9      7     9     6      0
5 Summary

The experiments in this paper showed that a trivial method of estimating the conditional class probability P(y|x) from the output of an SVM classifier performs comparably to much more complicated estimation techniques.

Acknowledgments

The financial support of the Deutsche Forschungsgemeinschaft (SFB 475, "Reduction of Complexity for Multivariate Data Structures") is gratefully acknowledged.
References
[1] Peter L. Bartlett and Ambuj Tewari. Sparseness vs estimating conditional proba-
bilities: Some asymptotic results. submitted, 2004.
[2] C. Burges. A tutorial on support vector machines for pattern recognition. Data
Mining and Knowledge Discovery, 2(2):121–167, 1998.
[3] Joseph Drish. Obtaining calibrated probability estimates from support vector ma-
chines. Technical report, University of California, San Diego, June 2001.
[4] Ursula Garczarek. Classification Rules in Standardized Partition Spaces. PhD
thesis, Universität Dortmund, 2002.
[5] T. S. Jaakkola and D. Haussler. Probabilistic kernel regression models. In Pro-
ceedings of the 1999 Conference on AI and Statistics, 1999.
[6] S. S. Keerthi, K. Duan, S. K. Shevade, and A.N. Poo. A fast dual algorithm for
kernel logistic regression. Submitted for publication in Machine Learning.
[7] James Tin-Yau Kwok. Moderating the outputs of support vector machine clas-
sifiers. IEEE Transactions on Neural Networks, 10(5):1018–1031, September
1999.
[8] H.-T. Lin, C.-J. Lin, and R. C. Weng. A note on Platt's probabilistic outputs for support vector machines, May 2003.
[9] P. M. Murphy and D. W. Aha. UCI repository of machine learning databases,
1994.
[10] John Platt. Advances in Large Margin Classifiers, chapter Probabilistic Outputs
for Support Vector Machines and Comparisons to Regularized Likelihood Meth-
ods. MIT Press, 1999.
[11] Volker Roth. Probabilistic discriminative kernel classifiers for multi-class prob-
lems. In B. Radig and S. Florczyk, editors, Pattern Recognition–DAGM’01, num-
ber 2191 in LNCS, pages 246–253. Springer, 2001.
[12] V. Vapnik. Statistical Learning Theory. Wiley, Chichester, GB, 1998.
[13] Grace Wahba. Advances in Kernel Methods - Support Vector Learning, chapter
Support Vector Machines, Reproducing Kernel Hilbert Spaces and the Random-
ized GACV, pages 69–88. MIT Press, 1999.
[14] Ji Zhu and Trevor Hastie. Kernel logistic regression and the import vector ma-
chine. In Neural Information Processing Systems, volume 14, 2001.