$\mathrm{Decision}(x) = \begin{cases} \max_{h \in EPPC_{sc}}\big(MPWC_{sc}(h \mid x)\big) & \text{if } MPWC_{sc}\big(\mathrm{MaxDecision}(x) \mid x\big) > thr \\ \max_{h \in \{1,\ldots,c\}}\big(MCC(h \mid x)\big) & \text{otherwise} \end{cases}$ . (3)
where MCC(h|x) is the confidence of the main multiclass classifier for class h given a test instance x, and $MPWC_{sc}(h \mid x)$ is the confidence of the sc-th ensemble of binary classifiers for class h given a test instance x. MaxDecision is calculated according to (4).
[Framework diagram (cf. Figure 5): a test instance is fed to the multiclass classifier and to k ensembles of pairwise classifiers, PWC_{i,1}, ..., PWC_{i,m}, each trained on the i-th EPP; each ensemble output MPWC_i is the weighted mean of its members with weights w_{i,j} = log(p_{i,j}/(1-p_{i,j})), where p_{i,j} is the accuracy of the j-th classifier in the i-th PWC ensemble; a threshold test Abs(Maxval) > thr routes the final decision either to the multiclass classifier or to the Max over the MPWC outputs, with thr the decision threshold.]
$\mathrm{MaxDecision}(x) = \arg\max_{h \in EPPC_{sc}}\big(MPWC_{sc}(h \mid x)\big)$ . (4)
where sc is computed according to (5).
$sc(x) = \arg\max_{i}\Big(\max_{h \in EPPC_{i}}\big(MPWC_{i}(h \mid x)\big)\Big)$ . (5)
Because the main classifier is reinforced by ensembles in the erroneous regions, this method is expected to outperform a simple MLP or an unweighted ensemble in accuracy. Figures 3 and 5 illustrate the structure of the ensemble framework.
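To make the reconstructed decision rule of (3)-(5) and the weighting of Figure 5 concrete, the following minimal Python sketch is offered; it assumes per-class confidence vectors are already available, and every name in it (mcc_conf, mpwc_conf, eppc_classes, thr) is illustrative rather than taken from the authors' implementation.

```python
import numpy as np

def ensemble_confidence(votes, accuracies):
    """MPWC_i: weighted mean vote of one EPPC, with weights
    w_ij = log(p_ij / (1 - p_ij)) as in the legend of Figure 5.
    votes is (m, 2): one one-hot row per binary classifier over
    the ensemble's two classes; accuracies is (m,)."""
    w = np.log(accuracies / (1.0 - accuracies))
    return votes.T @ w / w.sum()  # confidence for each class of the pair

def decide(mcc_conf, mpwc_conf, eppc_classes, thr):
    """Decision rule of (3)-(5) as reconstructed above."""
    # (5): select the pairwise ensemble with the largest confidence
    sc = int(np.argmax([conf.max() for conf in mpwc_conf]))
    # (4): MaxDecision(x) is the best class of the selected ensemble
    best = int(np.argmax(mpwc_conf[sc]))
    # (3): trust the pairwise ensemble only if its confidence clears thr,
    # otherwise fall back to the main multiclass classifier
    if mpwc_conf[sc][best] > thr:
        return eppc_classes[sc][best]
    return int(np.argmax(mcc_conf))
```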
VI. WHY THE PROPOSED METHOD WORKS
As presumed in this paper, the aim is to add enough pairwise classifiers to compensate for a predefined error rate, PDER*EF(MCL, DValidation), where PDER is a predefined error rate and EF(MCL, DValidation) is the error frequency of the multiclass classifier MCL over the validation data DValidation. Assume we add |EPS| pairwise classifiers to the main MCL; this is expressed by the equation below.
$\sum_{i=1}^{|EPS|}\Big(p\big(EPPC_i.w = x \mid EPPC_i.w = y,\, x\big) + p\big(EPPC_i.w = y \mid EPPC_i.w = x,\, x\big)\Big) = PDER \ast EF(MCL, DValidation, DTrain)$ . (6)
Now assume that a data instance x really belonging to class q is to be classified by the proposed algorithm; its error rate can be obtained by (12). First assume $p^{pr}_{max}$ is the probability that the proposed classifier ensemble takes its decision from one of its binary classifiers that can distinguish the two classes p and r. Also assume $p^{p}_{max}$ is the probability that the proposed classifier ensemble takes its decision from one of its binary classifiers that can distinguish the two classes p and q. They can be obtained by (7) and (8), respectively.
$p^{pr}_{max} = p\big(EPPC(p,r) \mid x \in q\big) = \big(MCC(p \mid x) + MCC(r \mid x)\big) \ast \max\big(PWC(p \mid x), PWC(r \mid x)\big)$ . (7)
$p^{p}_{max} = p\big(EPPC(p,q) \mid x \in q\big) = \big(MCC(p \mid x) + MCC(q \mid x)\big) \ast \max\big(PWC(p \mid x), PWC(q \mid x)\big)$ . (8)
where $w_{i,j}$ is the accuracy-based weight of the j-th classifier in the i-th pairwise ensemble (Figure 5). We can assume (9) without loss of generality.
$\forall\, r \neq q:\ \max\big(PWC(p \mid x \in q),\, PWC(r \mid x \in q)\big) \approx \varepsilon \ll \max\big(PWC(p \mid x \in q),\, PWC(q \mid x \in q)\big)$ . (9)
where $\varepsilon$ is a fixed value, and then we have:
$p^{pr}_{max} = p\big(EPPC(p,r) \mid x \in q\big) \approx \big(MCC(p \mid x) + MCC(r \mid x)\big)\,\varepsilon \approx \big(b_{p,q} + b_{r,q}\big)\,\varepsilon$ . (10)
$p^{p}_{max} = p\big(EPPC(p,q) \mid x \in q\big) = \big(MCC(p \mid x) + MCC(q \mid x)\big) \approx b_{p,q} + b_{q,q}$ . (11)
As inferred from the algorithm under the same conditions, its error can be formulated as follows.
$error(x \mid w = q) = p^{p}_{max}\, p\big(EPPC(p,q) \mid x\big)\big(1 - p^{EPPC(p,q)}_{pair}\big) + p^{pr}_{max}\, p\big(EPPC(p,r) \mid x\big) + \big(1 - p^{p}_{max}\big)\big(1 - p^{pr}_{max}\big)\, b_{q,q}$ . (12)
where $p_{pair}$ is the probability that the binary classifier takes the correct decision, and $b_{j,q}$ is defined as follows.
$b_{j,p} = \dfrac{confusion_{j,p}}{\sum_{i=1}^{c} confusion_{i,p}}$ . (13)
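Equation (13) is simply a column-normalized confusion matrix. A small sketch follows, under the assumption that confusion[j, p] counts validation instances of true class p that were labeled as class j (the column convention is ours):

```python
import numpy as np

def b_matrix(confusion):
    # b[j, p] = confusion[j, p] / sum_i confusion[i, p], as in (13)
    return confusion / confusion.sum(axis=0, keepdims=True)

# Toy usage with 3 classes; columns index the true class
conf = np.array([[90.0,  5.0,  2.0],
                 [ 7.0, 88.0,  4.0],
                 [ 3.0,  7.0, 94.0]])
print(b_matrix(conf)[:, 0])  # label distribution for true class 0
```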
So we can reformulate (12) as follows.
$error(x \mid w = q) = p^{p}_{max}\, p\big(EPPC(p,q) \mid x\big)\big(1 - p^{EPPC(p,q)}_{pair}\big) + p^{pr}_{max}\, p\big(EPPC(p,r) \mid x\big) + \big(1 - p^{p}_{max}\big)\big(1 - p^{pr}_{max}\big)\, b_{q,q} \approx p^{p}_{max}\, p\big(EPPC(p,q) \mid x\big)\big(1 - p^{EPPC(p,q)}_{pair}\big) + \big(1 - p^{p}_{max}\big)\big(1 - p^{pr}_{max}\big)\, b_{q,q}$ . (14)
Note that in (14), if $p^{pr}_{max}$ and $p^{p}_{max}$ are zero for an exemplary input, the classification error remains equal to that of the main multiclass classifier. If they are not zero, the misclassification rate is still reduced, because of the reduction in the second part of (14). Although the first part of (14) increases the error, if we assume that the binary classifiers are more accurate than the multiclass classifier, the increase is nullified by the decrease.
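Under the reconstruction of (14) above, a toy numeric check (every number below is invented for illustration, none comes from the paper) shows the claimed behavior: increasing $p^{p}_{max}$ shrinks the multiclass term faster than the binary term grows, provided the binary classifiers are accurate.

```python
# Two-term approximation on the right-hand side of (14)
b_qq = 0.10        # multiclass error mass on the true class q
p_pair = 0.98      # probability the (p,q) binary classifier decides correctly
p_eppc = 0.90      # p(EPPC(p,q) | x): the right pairwise ensemble fires
p_pr_max = 0.05    # chance an irrelevant (p,r) ensemble takes the decision

for p_p_max in (0.0, 0.5, 0.9):
    err = (p_p_max * p_eppc * (1 - p_pair)
           + (1 - p_p_max) * (1 - p_pr_max) * b_qq)
    print(f"p^p_max = {p_p_max:.1f} -> error ~ {err:.4f}")
# 0.0 -> ~0.0950 (multiclass baseline); 0.5 -> ~0.0565; 0.9 -> ~0.0257
```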
VII. EXPERIMENTAL RESULTS
This section evaluates the results of applying the proposed framework to a Persian handwritten digit dataset named Hoda [4]. This dataset contains 102,364 instances of the digits 0-9 and is divided into three parts: train, evaluation and test sets. The train set contains 60,000 instances; the evaluation and test sets contain 20,000 and 22,364 instances, respectively. From each instance, the 106 features described in [4] are extracted. Some instances of this dataset are depicted in Figure 6.
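As a quick consistency check of these split sizes (all numbers are from the text):

```python
# Hoda dataset splits as described above
splits = {"train": 60000, "evaluation": 20000, "test": 22364}
assert sum(splits.values()) == 102364  # total number of instances
```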
In this paper, MLP, 3-NN and DT are used as the base classifiers. An MLP with two hidden layers, containing 10 and 5 neurons respectively, serves as the base multiclass classifier, and the confusion matrix is obtained from its output. The DTs use the Gini measure as their decision criterion. The classifier parameters are kept fixed throughout the experiments. It is important to note that all classifiers in the algorithm are kept of the same type: all of them are MLPs in the first
experiments; after that, the same experiments are repeated with all MLPs substituted by DTs.
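As one concrete reading of this setup, a scikit-learn sketch follows; the layer sizes and the Gini criterion come from the text, while every other hyper-parameter (max_iter, random_state) is an assumption of ours.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# Base multiclass classifier: an MLP with two hidden layers of 10 and
# 5 neurons, as stated in the text; remaining settings are assumptions.
mlp = MLPClassifier(hidden_layer_sizes=(10, 5), max_iter=500, random_state=0)

# Decision tree using the Gini measure as its decision criterion.
dt = DecisionTreeClassifier(criterion="gini", random_state=0)

# 3-NN, the third base classifier used in the experiments.
knn = KNeighborsClassifier(n_neighbors=3)
```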
The parameter k is set to 11, so the number of pairwise ensembles of binary classifiers added in the experiments equals 11. The parameter m is set to 9, so each EPPC contains 9 binary classifiers. This means that 99 binary classifiers are trained for the pair-classes that have considerable error rates. Assume that the error number of each pair-class is available. To choose the most erroneous pair-classes, it suffices to sort the error numbers of the pair-classes and select an arbitrary number of them, as sketched below. This number can be determined by trial and error; it is set to 11 in the experiments.
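A minimal sketch of this selection step, assuming the error number of pair-class (i, j) is the off-diagonal confusion mass between the two classes (the names and that reading are ours):

```python
import numpy as np

def top_erroneous_pairs(confusion, k=11):
    """Return the k pair-classes with the largest error numbers."""
    c = confusion.shape[0]
    pairs = [(i, j) for i in range(c) for j in range(i + 1, c)]
    # error number of pair (i, j): confusion mass between the two classes
    errors = [confusion[i, j] + confusion[j, i] for i, j in pairs]
    order = np.argsort(errors)[::-1]          # most erroneous first
    return [pairs[idx] for idx in order[:k]]  # keep the k worst pairs
```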
As mentioned, 9*11 = 99 pairwise classifiers are added to the main multiclass classifier. Since the parameter b is set to 20, each of these classifiers is trained on only b percent of the corresponding training data, i.e., on 20 percent of the training instances belonging to its two classes. The cardinality of this set is calculated by (15).
$Car = train \ast 2 \ast b\,/\,c = 60000 \ast 2 \ast 0.2\,/\,10 = 2400$ . (15)
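The arithmetic of (15) can be checked directly; the variable names below are ours, and balanced classes in the train set are assumed:

```python
# Each binary classifier sees b percent of the training data restricted
# to its two classes (60,000/10 = 6,000 instances per class, assumed balanced)
train_size, b, c = 60000, 0.20, 10
car = train_size * 2 / c * b  # = 2400.0, matching (15)
print(car)
```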
It means that each binary classifier is trained on 2,400 data points with 2 class labels. Table 1 shows the experimental results comparatively. As can be inferred, the framework outperforms the previous works and the simple classifiers when a decision tree is employed as the base classifier.
Figure 6. Some instances of the Persian OCR dataset, with different qualities
It is inferred from Table 1 that the proposed framework significantly improves the classification precision, especially when DT is employed as the base classifier. A look at Table 1 shows that using DT as the base classifier in the ensemble almost always produces a better-performing classification, which may be due to the inherent instability of DT: because a DT is an unstable classifier, it is well suited as a base classifier in an ensemble. A stable classifier is one that converges to an identical classifier regardless of its training initialization, meaning that two consecutive trainings with different initializations result in two classifiers with the same performance. This does not hold for the MLP and DT classifiers. Although the MLP is not a stable classifier, it is more stable than a DT, so using DT as the base classifier is expected to have the greatest impact on improving the recognition ratio.
As another point, the reader can infer that the framework outperforms the Unweighted Full Ensemble, Unweighted Static Classifier Selection and Weighted Static Classifier Selection methods explained in [14]. This can be a consequence of employing binary classifiers instead of multiclass classifiers.
Table 1 likewise shows that the proposed framework significantly improves the classification precision with both DT and MLP as base classifiers, and that DT has the greatest impact on the recognition ratio, which again may be due to its inherent instability. As expected, using a stable classifier like k-NN in an ensemble is not a good option, while unstable classifiers like DT and MLP are better options.
VIII. CONCLUSION
Although a more accurate classifier leads to better performance, another option is to use many inaccurate classifiers, each specialized for a small portion of the problem space, and to take their consensus vote as the classification. This paper therefore proposes a heuristic classifier ensemble to improve the performance of learning in multiclass problems. The main idea behind the proposed method is to focus classifiers on the erroneous regions of the problem space. We also propose a framework in which a set of classifier ensembles is produced whose order is kept low: a pairwise classifier ensemble of far lower order than the set of all possible pairwise classifiers. Indeed, the paper proposes an ensemble of binary classifier ensembles of order c, where c is the number of classes. First, an arbitrary number of binary classifier ensembles are added to the main classifier; the results of these binary ensembles are then given to a heuristic-based ensemble and combined in a weighted manner to decide the final vote. The proposed framework is evaluated on a very large Persian handwritten digit dataset, and the experimental results show the effectiveness of the algorithm. The use of the confusion matrix makes the proposed method flexible. The number of all possible pairwise classifiers is c*(c-1)/2, which is O(c^2); since the number of added ensembles and of classifiers per ensemble is fixed, this method decreases the order to O(1) without giving up considerable accuracy. This feature makes our proposed method applicable to problems with a large number of classes. We also reached very good results in Persian handwritten digit recognition on a very large dataset.
TABLE I. THE ACCURACIES (%) OF DIFFERENT SETTINGS OF THE PROPOSED FRAMEWORK

Methods                                          Base Classifier
                                                 DT      MLP     3-NN
A simple multiclass classifier                   95.57   95.70   96.66
Method proposed in [8]                           -       98.89   -
Method proposed in [7]                           -       98.27   -
Method proposed in [15]                          97.20   96.70   96.86
Unweighted Full Ensemble in [14]                 98.22   98.11   -
Unweighted Static Classifier Selection in [14]   98.13   98.15   -
Weighted Static Classifier Selection in [14]     98.34   98.21   -
Proposed Method                                  99.01   98.46   96.89
It is concluded that using a stable classifier like k-NN in an
ensemble is not a good option and unstable classifiers like DT
and MLP are better options.
REFERENCES
[1] L. Breiman, "Bagging Predictors," Machine Learning, vol. 24, no. 2, pp. 123-140, 1996.
[2] S. Gunter, and H. Bunke, "Creation of classifier ensembles for
handwritten word recognition using feature selection algorithms,"
IWFHR 2002 on January 15, 2002.
[3] S. Haykin, Neural Networks, a comprehensive foundation, second
edition, Prentice Hall International, 1999.
[4] H. Khosravi, and E. Kabir, "Introducing a very large dataset of
handwritten Farsi digits and a study on the variety of handwriting
styles," Pattern Recognition Letters, vol 28 issue 10 pp.1133-1141,
2007.
[5] L.I. Kuncheva, Combining Pattern Classifiers, Methods and Algorithms,
New York: Wiley, 2005.
[6] B. Minaei-Bidgoli, and W.F. Punch, "Using Genetic Algorithms for
Data Mining Optimization in an Educational Web-based System,"
GECCO, 2003.
[7] H. Parvin, H. Alizadeh and B. Minaei-Bidgoli, "A New Approach to
Improve the Vote-Based Classifier Selection," International Conference
on Networked Computing and advanced Information Management,
2008.
[8] H. Parvin, H. Alizadeh, M. Fathi, B. Minaei-Bidgoli, "Improved Face
Detection Using Spatial Histogram Features," Int. Conf. on Image
Processing, Computer Vision, and Pattern Recognition, pp. 381-386,
2008.
[9] H. Parvin, H. Alizadeh, B. Minaei-Bidgoli, M. Analoui, "A Scalable
Method for Improving the Performance of Classifiers in Multiclass
Applications by Pairwise Classifiers and GA," International Conference
on Networked Computing and advanced Information Management, pp.
137-142, 2008.
[10] A. Saberi, M. Vahidi, B. Minaei-Bidgoli, "Learn to Detect Phishing
Scams Using Learning and Ensemble Methods," IEEE/WIC/ACM
International Conference on Intelligent Agent Technology, Workshops,
pp. 311-314, 2007.
[11] T. Yang, "Computational Verb Decision Trees," International Journal of
Computational Cognition, pp. 3446, 2006.
[12] H. Parvin, H. Alizadeh, B. Minaei-Bidgoli, "Using Clustering for
Generating Diversity in Classifier Ensemble," JDCTA Vol. 3, no. 1,
pp.51-57, 2009.
[13] H. Parvin, H. Alizadeh, B. Minaei-Bidgoli, "A New Method for
Constructing Classifier Ensembles," JDCTA Vol. 3, no. 2, pp.62-66,
2009.
[14] H. Parvin, H. Alizadeh, "Classifier Ensemble Based Class Weighting,"
American Journal of Scientific Research, pp.84-90, 2011.
[15] H. Parvin, H. Alizadeh, M. Moshki, B. Minaei-Bidgoli, N. Mozayani,
"Divide & Conquer Classification and Optimization by Genetic
Algorithm," International Conference on Convergence and hybrid
Information Technology, pp.858-863, 2008.
[16] F. Masulli, and G. Valentini, "Comparing decomposition methods for
classification," In Proc. International Conference on Knowledge-Based
Intelligent Engineering Systems and Applied Technologies, pp. 788-792,
2000.
[17] F. Cutzu, "Polychotomous classification with pairwise classifiers: A new
voting principle," In Proc. International Workshop on Multiple Classifier
Systems, Lecture Notes in Computer Science, pp. 115-124, 2003.
[18] A. Jozwik, and G. Vernazza, "Recognition of leucocytes by a parallel k-
nn classifier," In Proc. International Conference on Computer-Aided
Medical Diagnosis, pp. 138-153, 1987.