On the Use of MapReduce to Build Linguistic Fuzzy Rule Based Classification Systems for Big Data
Abstract— Big data has become one of the emergent topics when learning from data is involved. The notorious increment in data generation has directed attention towards obtaining effective models that are able to analyze and extract knowledge from these colossal data sources. However, the vast amount of data, the variety of the sources and the need for an immediate intelligent response pose a critical challenge to traditional learning algorithms.

To be able to deal with big data, we propose the usage of a linguistic fuzzy rule based classification system, which we have called Chi-FRBCS-BigData. As a fuzzy method, it is able to deal with the uncertainty that is inherent to the variety and veracity of big data and, because of the usage of linguistic fuzzy rules, it is able to provide an interpretable and effective classification model. This method is based on the MapReduce framework, one of the most popular approaches for big data nowadays, and has been developed in two different versions: Chi-FRBCS-BigData-Max and Chi-FRBCS-BigData-Ave.

The good performance of the Chi-FRBCS-BigData approach is supported by means of an experimental study over six big data problems. The results show that the proposal is able to provide competitive results, obtaining more precise but slower models with the Chi-FRBCS-BigData-Ave alternative and faster but less accurate classification results with Chi-FRBCS-BigData-Max.

Victoria López, Sara del Río, José Manuel Benítez and Francisco Herrera are with the Department of Computer Science and Artificial Intelligence, CITIC-UGR (Research Center on Information and Communications Technology), University of Granada, Granada, Spain (email: {vlopez, srio, J.M.Benitez, herrera}@decsai.ugr.es).
This work was partially supported by the Spanish Ministry of Science and Technology under project TIN2011-28488 and the Andalusian Research Plans P10-TIC-6858, P11-TIC-7765 and P12-TIC-2958. Victoria López holds an FPU scholarship from the Spanish Ministry of Education.

I. INTRODUCTION

One of the most highlighted trends in recent years in the information technology industry is what is known as Big Data. The term "Big Data" refers to the analysis and treatment of data repositories of a colossal size, which traditional data management systems and analytics are unable to deal with [1]. This trend can be observed in multiple environments like webpages, multimedia data, social networks, mobile devices, sensor networks and so on [2].

With more data available, the analysis and knowledge extraction process should benefit, and more accurate and precise information should be obtained. However, the standard techniques and approaches that are commonly used in data mining are not able to manage datasets of this size [3]. Therefore, the standard learning methods need to be modified following the guidelines of the existing solutions that are able to effectively deal with big data while maintaining their predictive capacity.

Fuzzy Rule Based Classification Systems (FRBCSs) [4] are potent and popular tools for pattern recognition and classification. They are able to provide good precision results while supplying an interpretable model for the end user through the usage of linguistic labels. One of the complications that hinders the extraction of potentially useful information from big data is the uncertainty associated with the variety and veracity inherent to it. FRBCSs are able to effectively deal with uncertainty, ambiguity or vagueness, making them a very interesting approach for big data as they are able to manage its inherent incertitude.

In a scenario with big data, a high number of instances and/or attributes is usually provided. The performance of FRBCSs degrades in these cases, as the search space grows exponentially. This growth hinders the learning process, leading to scalability or complexity problems that may end up in non-interpretable models [5]. To overcome this situation, several approaches that try to build parallel fuzzy systems have been presented [6][7]; however, they are focused on reducing the processing time while preserving the accuracy, and they are not able to manage colossal collections of data.

The frameworks that are typically used to handle big data usually involve some kind of parallelization so that they can easily process and analyze the data at hand. One of the most popular platforms nowadays, MapReduce [8], suggests a computational scheme where all the processing is distributed along two key operations: a map function that acts over a subset of the data, and a reduce function that integrates the results obtained by the map function.

In this work, we present a FRBCS that is able to provide an interpretable model while maintaining a competitive predictive accuracy in the big data scenario, which we have denoted Chi-FRBCS-BigData. This method is based on the Chi et al.'s approach [9], a classical FRBCS learning method, which has been modified to deal with big data following a MapReduce procedure. The Chi-FRBCS-BigData proposal has been developed under two different versions, Chi-FRBCS-BigData-Max and Chi-FRBCS-BigData-Ave, which precisely differ in the "Reduce" operation and which are compared to analyze how they deal with big data.

Moreover, the Chi et al.'s method is especially suitable to be used in a parallel approach, instead of a more complex FRBCS method, as it provides fuzzy rules that share the same structure and that can be independently created from a subset of examples. Furthermore, the usage of a FRBCS in big data is also quite interesting as it provides an interpretable model.
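The two-operation scheme just described can be illustrated with a tiny, self-contained Python sketch. This is our own toy example (a per-chunk sum), not anything from the paper or from an actual MapReduce framework such as Hadoop:

```python
from functools import reduce

def map_fn(chunk):
    """The 'map' operation: any computation that works on one data subset."""
    return sum(chunk)  # a per-chunk partial result

def reduce_fn(a, b):
    """The 'reduce' operation: integrate two partial results into one."""
    return a + b

data = list(range(1000))
n_units = 4  # illustrative number of processing units
chunks = [data[i::n_units] for i in range(n_units)]  # split the data

partials = [map_fn(c) for c in chunks]  # run in parallel by a real framework
result = reduce(reduce_fn, partials)    # aggregation of the partial results
assert result == sum(data)
```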
Fig. 1. The MapReduce programming model

To overcome these problems, several approaches have been proposed to deal with big data as substitutes for MapReduce and Hadoop. These approaches include projects like Spark, Apache Drill, Twister or Impala, just to mention some of them.

III. CHI-FRBCS-BIGDATA: A LINGUISTIC FUZZY RULE BASED CLASSIFICATION SYSTEM FOR BIG DATA

In this section, we introduce two versions of a linguistic FRBCS that manage big data. To do so, we first present some definitions related to FRBCSs and the fuzzy learning algorithm that has been adapted in this work, Chi-FRBCS. Then, we describe how this method is adapted for big data using a MapReduce scheme that is modified to produce two variants that provide different classification results.

A. Fuzzy Rule Based Classification Systems

A FRBCS is composed of two elements: the Inference System and the Knowledge Base (KB). In a linguistic FRBCS, the KB is formed by the Data Base (DB), which contains the membership functions of the fuzzy partitions associated to the input attributes, and the Rule Base (RB), which comprises the fuzzy rules that describe the problem. Traditionally, expert information to build the KB is not available and, therefore, a machine learning procedure is needed to construct the KB from the available examples.

A classification problem is usually defined by m training samples x_p = (x_{p1}, ..., x_{pn}), p = 1, 2, ..., m, from M classes, where x_{pi} is the value of attribute i (i = 1, 2, ..., n) of the p-th training sample. In this work, we use fuzzy rules of the following form to build our FRBCS:

$$\text{Rule } R_j:\ \text{If } x_1 \text{ is } A_{1j} \text{ and } \ldots \text{ and } x_n \text{ is } A_{nj} \text{ then Class} = C_j \text{ with } RW_j \qquad (1)$$

where R_j is the label of the j-th rule, x = (x_1, ..., x_n) is an n-dimensional pattern vector, A_{ij} is an antecedent fuzzy set, C_j is a class label, and RW_j is the rule weight [18]. We use triangular membership functions as linguistic labels.

There are many alternatives that have been proposed to compute the rule weight [18]. Among them, a good choice is to use the heuristic method known as the Penalized Certainty Factor (PCF) [19]:

$$RW_j = PCF_j = \frac{\sum_{x_p \in C_j} \mu_{A_j}(x_p) \;-\; \sum_{x_p \notin C_j} \mu_{A_j}(x_p)}{\sum_{p=1}^{m} \mu_{A_j}(x_p)} \qquad (2)$$

where μ_{A_j}(x_p) is the membership degree of the p-th example x_p of the training set with the antecedent of the rule, and C_j is the consequent class of rule j. We use the fuzzy reasoning method of the winning rule [20] when predicting the class of a given example with the built KB.

B. The Chi et al.'s algorithm for Classification

To build the KB of a linguistic FRBCS, we need a learning procedure that specifies how the DB and the RB are created. In this work, we use the method proposed in [9], an extension of the well-known Wang and Mendel method for classification [21], which we have called the Chi et al.'s method, Chi-FRBCS.

To generate the fuzzy KB, this generation method tries to find the relationship between the input attributes and the class space following the next steps:

1) Building the linguistic fuzzy partitions: This step builds the fuzzy DB from the domain associated to each attribute A_i using equally distributed triangular membership functions.
2) Generating a new fuzzy rule associated to each example x_p = (x_{p1}, ..., x_{pn}, C_p):
   a) Compute the matching degree μ(x_p) of the example with respect to the fuzzy labels of each attribute using a conjunction operator.
   b) Select the fuzzy region that obtains the maximum membership degree in relation with the example.
   c) Build a new fuzzy rule whose antecedent is calculated according to the previous fuzzy region and whose consequent is the class label of the example, C_p.
   d) Compute the rule weight.

When following the previous procedure, several rules with the same antecedent can be built. If they have the same class in the consequent, the duplicated rules are deleted. However, if the class in the consequent is different, only the rule with the highest weight is maintained in the RB.

C. The Chi-FRBCS-BigData algorithm: A MapReduce Design

At this point, we present the Chi-FRBCS-BigData algorithm, a FRBCS that is able to effectively classify big data. To do so, this method uses two different MapReduce processes to deal with two different parts of the algorithm: one MapReduce process is devoted to the building of the fuzzy KB from a big data training set, and the other MapReduce process is used to estimate the class of samples belonging to big data classification sets.

Fig. 2. (Figure showing the MapReduce phases used to build the fuzzy KB: INITIAL, MAP, REDUCE, FINAL.)
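Before turning to the distributed design, the sequential building blocks of Sections III-A and III-B can be made concrete. The following Python sketch is a minimal illustration under our own assumptions (function names, data layout and the toy data are hypothetical, not the authors' implementation): it builds an equally distributed triangular DB, derives one rule per example, and computes the PCF weight of Eq. (2).

```python
import numpy as np

def build_db(X, n_labels=3):
    """Step 1: equally distributed triangular partitions (the fuzzy DB).
    Returns, per attribute, the centers of n_labels triangles spanning the
    attribute domain (assumes n_labels >= 2 and non-constant attributes)."""
    mins, maxs = X.min(axis=0), X.max(axis=0)
    return [np.linspace(lo, hi, n_labels) for lo, hi in zip(mins, maxs)]

def tri_membership(value, centers):
    """Membership of `value` in each triangular label of one attribute."""
    width = centers[1] - centers[0]
    return np.maximum(1.0 - np.abs(value - centers) / width, 0.0)

def antecedent_of(x, db):
    """Steps 2a-2b: per attribute, pick the label with maximum membership."""
    return tuple(int(np.argmax(tri_membership(v, c))) for v, c in zip(x, db))

def matching_degree(x, antecedent, db):
    """Product T-norm over the memberships of the chosen labels."""
    degrees = [tri_membership(v, db[i])[lab]
               for i, (v, lab) in enumerate(zip(x, antecedent))]
    return float(np.prod(degrees))

def pcf_weight(antecedent, rule_class, X, y, db):
    """Step 2d: the Penalized Certainty Factor of Eq. (2)."""
    mu = np.array([matching_degree(x, antecedent, db) for x in X])
    total = mu.sum()
    if total == 0.0:
        return 0.0
    return float((mu[y == rule_class].sum() - mu[y != rule_class].sum()) / total)

# Minimal usage: one rule per example (steps 2a-2d) on toy data
X = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.1]])
y = np.array([0, 0, 1])
db = build_db(X)
rules = [(antecedent_of(x, db), c) for x, c in zip(X, y)]
rules = [(ant, c, pcf_weight(ant, c, X, y, db)) for ant, c in rules]
print(rules[0])  # ((0, 2), 0, 1.0) for this toy data
```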
Both processes follow the MapReduce structure, distributing all the computations along several processing units that manage different chunks of information, and aggregating the results obtained in an appropriate manner.

Furthermore, we have produced two versions of the Chi-FRBCS-BigData algorithm, which we have named Chi-FRBCS-BigData-Max and Chi-FRBCS-BigData-Ave. These versions share most of their operations; however, they behave differently in the "Reduce" step of the approach, when the different rule bases generated by each mapper are combined. These versions obtain different rule bases and thus different KBs, providing different results when estimating the class of new examples.

The procedure to build the fuzzy KB following a MapReduce scheme in Chi-FRBCS-BigData is depicted in Figure 2. This procedure is divided into the following phases:

1) Initial: In this first phase, the method computes the domain associated to each attribute A_i using the whole training set. With that information, the fuzzy DB is created using equally distributed triangular membership functions, as in Chi-FRBCS. Then, the system automatically segments the original training dataset into independent data blocks, which are automatically transferred to the different processing units together with the created fuzzy DB.

2) Map: In this second phase, each processing unit works independently over its available data to build its associated fuzzy RB (called RB_i in Figure 2) following the original Chi-FRBCS method.
Specifically, for each example in the data partition, an associated fuzzy rule is created: first, the membership degree of the fuzzy labels is computed according to the example values; then, the fuzzy region that obtains the greatest value is selected to become the antecedent of the rule; next, the class of the example is assigned to the rule as consequent; and finally, the rule weight is computed using the set of examples that belong to the current map process.
After the rules have been created and before finishing the map step, each map process searches for rules with the same antecedent. If the rules share the same consequent, only one rule is preserved; if the rules have different consequents, only the rule with the highest weight is kept in the mapper's RB (a code sketch of this pruning is given right after this phase list).
3) Reduce: In this third phase, a processing unit receives the results obtained by each map process (RB_i) and combines them to form the final RB (called RB_R in Figure 2). The combination of the rules is straightforward: the rules created by each mapper, RB_1, RB_2, ..., RB_n, are all integrated in one RB, RB_R. However, contradictory rules (rules with the same antecedent, with or without the same consequent, and with different rule weights) may be created. Therefore, specific procedures to deal with these contradictory rules are needed. Precisely, these procedures define the two variants of the Chi-FRBCS-BigData algorithm (both policies are sketched in code after this phase list):
   a) Chi-FRBCS-BigData-Max: In this approach, the method searches for the rules with the same antecedent. Among these rules, only the rule with the highest weight is maintained in the final RB, RB_R. In this case it is not necessary to check whether the consequent is the same or not, as we are only maintaining the most powerful rules. Equivalent rules (rules with the same antecedent and consequent) can present different weights, as they are computed in different mapper processes over different training sets.
   For instance, if we have five rules with the same antecedent and the following consequents and rule weights: R1: Class 1, RW1 = 0.8743; R2: Class 2, RW2 = 0.9254; R3: Class 1, RW3 = 0.7142; R4: Class 2, RW4 = 0.2143 and R5: Class 1, RW5 = 0.8215, then Chi-FRBCS-BigData-Max will keep in RB_R the rule R2: Class 2, RW2 = 0.9254, because it is the rule with the maximum weight.
   b) Chi-FRBCS-BigData-Ave: In this approach, the method also searches for the rules with the same antecedent. Then, the average weight of the rules that have the same consequent is computed (this step is needed because rules with the same antecedent and consequent may have different weights, as they are built over different training sets). Finally, the rule with the greatest average weight is kept in the final RB, RB_R.
   For instance, if we have five rules with the same antecedent and the following consequents
and rule weights: R1: Class 1, RW1 = 0.8743; R2: Class 2, RW2 = 0.9254; R3: Class 1, RW3 = 0.7142; R4: Class 2, RW4 = 0.2143 and R5: Class 1, RW5 = 0.8215, then Chi-FRBCS-BigData-Ave will first compute the average weight for the rules with the same consequent, namely, RC1: Class 1, RWC1 = 0.8033 and RC2: Class 2, RWC2 = 0.5699, and it will keep in RB_R the rule RC1: Class 1, RWC1 = 0.8033, because it is the rule with the maximum average weight.
Please note that it is not needed for any Chi-FRBCS-BigData version to recompute the rule weights in the "Reduce" stage, as we are calculating the new rule weights from the rule weights previously provided by each mapper.
4) Final: In this last phase, the results computed in the previous phases are provided as the output of the computation process. Precisely, the generated fuzzy KB is composed of the fuzzy DB built in the "Initial" phase and the fuzzy RB, RB_R, finally obtained in the "Reduce" phase. This KB will be the model used to predict the classes of new examples.
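The map-side pruning and the two "Reduce" policies described above can be condensed into a few lines. The following Python sketch is a minimal illustration under our own assumptions (the rule representation and function names are hypothetical, not the authors' implementation); it reproduces the five-rule example used in the text.

```python
from collections import defaultdict

# A rule is (antecedent, consequent_class, weight); the antecedent is a
# tuple of label indexes, e.g. (0, 2, 1).

def prune_rb(rules):
    """Map-side pruning: among rules sharing an antecedent, keep only the
    one with the highest weight (duplicates with equal consequent collapse)."""
    best = {}
    for ant, cls, rw in rules:
        if ant not in best or rw > best[ant][2]:
            best[ant] = (ant, cls, rw)
    return list(best.values())

def reduce_max(all_rules):
    """Chi-FRBCS-BigData-Max: keep the single highest-weight rule per
    antecedent, regardless of its consequent (same operation as pruning)."""
    return prune_rb(all_rules)

def reduce_ave(all_rules):
    """Chi-FRBCS-BigData-Ave: average the weights of rules sharing both
    antecedent and consequent, then keep the best average per antecedent."""
    groups = defaultdict(list)
    for ant, cls, rw in all_rules:
        groups[(ant, cls)].append(rw)
    averaged = [(ant, cls, sum(ws) / len(ws))
                for (ant, cls), ws in groups.items()]
    return prune_rb(averaged)

# The five contradictory rules used as the running example in the text:
ant = (0, 1, 2)
rules = [(ant, 1, 0.8743), (ant, 2, 0.9254), (ant, 1, 0.7142),
         (ant, 2, 0.2143), (ant, 1, 0.8215)]
print(reduce_max(rules))  # -> [((0, 1, 2), 2, 0.9254)]
print(reduce_ave(rules))  # -> [((0, 1, 2), 1, 0.8033...)], Class 1 wins
```

Note how reduce_max reuses the map-side pruning unchanged, while reduce_ave only adds the per-consequent averaging step; this matches the observation in the text that no rule weight is recomputed from data during the "Reduce" stage.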
As it was previously said, Chi-FRBCS-BigData uses another MapReduce mechanism to estimate the class of examples that belong to big data classification sets using the fuzzy KB built in the previous step. This approach follows a scheme similar to the previous step, where the initial dataset is distributed along several processing units that each provide a part of the final result. Specifically, this class estimation process is depicted in Figure 3 and follows these phases:

1) Initial: In this first phase, the method does not need to perform a specific operation. The system automatically segments the original big data dataset that needs to be classified into independent data blocks, which are automatically transferred to the different processing units together with the previously created fuzzy KB.
2) Map: In this second phase, each map task estimates the class for the examples that are included in its data partition. To do so, each processing unit goes through all the examples in its data chunk and predicts their output class according to the given fuzzy KB, using the fuzzy reasoning method of the winning rule (a code sketch of this prediction step is given below).
Please note that Chi-FRBCS-BigData-Max and Chi-FRBCS-BigData-Ave will produce different classification estimations because the input fuzzy RBs are also different; however, the class estimation process followed is exactly the same for both approaches.
3) Final: In this last phase, the results computed in the previous phase are provided as the output of the computation process. Precisely, the estimated classes for the different examples of the big data classification set are aggregated just by concatenating the results provided by each map task.

It is important to note that this mechanism does not include a "Reduce" step, as it is not necessary to perform a computation to combine the results obtained in the "Map" phase.

Fig. 3. A flowchart of how the classification of a big data classification set is organized in Chi-FRBCS-BigData (panels: INITIAL, MAP, FINAL)
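To make the map task of this second process concrete, here is a minimal, self-contained Python sketch of winning-rule inference. Triangular memberships and the product T-norm mirror the learning stage; all names and the toy data are our illustrative assumptions rather than the authors' code.

```python
import numpy as np

def tri_membership(value, centers):
    """Membership of `value` in each equally spaced triangular label."""
    width = centers[1] - centers[0]
    return np.maximum(1.0 - np.abs(value - centers) / width, 0.0)

def predict(x, db, rb):
    """Winning-rule fuzzy reasoning: fire each rule with the product T-norm,
    weight it by the rule weight, and return the class of the strongest rule."""
    best_class, best_strength = None, -1.0
    for antecedent, cls, rw in rb:
        degree = 1.0
        for i, label in enumerate(antecedent):
            degree *= tri_membership(x[i], db[i])[label]
        strength = degree * rw
        if strength > best_strength:
            best_class, best_strength = cls, strength
    return best_class

def map_classify(chunk, db, rb):
    """One map task: estimate the class of every example in its data block.
    The 'Final' phase simply concatenates the lists from all map tasks."""
    return [predict(x, db, rb) for x in chunk]

# Usage on a toy DB and RB (two attributes, three labels each):
db = [np.array([0.1, 0.5, 0.9]), np.array([0.1, 0.5, 0.9])]
rb = [((0, 2), 0, 1.0), ((2, 0), 1, 0.9)]
print(map_classify(np.array([[0.15, 0.85], [0.8, 0.2]]), db, rb))  # [0, 1]
```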
IV. EXPERIMENTAL STUDY

In this section, we first provide some details of the problems selected for the experiments, the configuration parameters for the methods analyzed and the statistical tests applied to compare the results (Section IV-A). Then, we provide in Section IV-B the accuracy performance of the approaches tested in the study with respect to the number of mappers considered. Finally, the runtime spent by the algorithms over the selected data is shown in Section IV-C.

A. Experimental Framework

In this study, our aim is to analyze the behavior of the Chi-FRBCS-BigData algorithm in the scenario of big data. To do so, we will consider six problems from the UCI dataset repository [22], shown in Table I, where we denote the number of examples (#Ex.), number of attributes (#Atts.), selected classes and the number of examples per class. This table is in descending order according to the number of examples.

TABLE I
SUMMARY OF DATASETS

Datasets                    #Ex.      #Atts.  Selected classes      #Samples per class
RLCP                        5749132   2       (FALSE; TRUE)         (5728201; 20931)
Kddcup DOS vs normal        4856151   41      (DOS; normal)         (3883370; 972781)
Poker 0 vs 1                946799    10      (0; 1)                (513702; 433097)
Covtype 2 vs 1              495141    54      (2; 1)                (283301; 211840)
Census                      141544    41      (- 50000.; 50000+.)   (133430; 8114)
Fars Fatal Inj vs No Inj    62123     29      (Fatal Inj; No Inj)   (42116; 20007)
The selected datasets only feature two classes, even when some of them are multi-class problems. In this work, we have decided to limit the number of classes, despite the ability of the Chi-FRBCS-BigData algorithm to deal with multiple classes, to avoid the imbalance in the data that arises in many real-world problems [23], as the division approach of the MapReduce scheme presented aggravates the small sample size problem, which decrements the performance in the imbalanced scenario.

In order to develop our study we use a 10-fold stratified cross-validation partitioning scheme, that is, ten random partitions of the data, each with 10% of the samples, where the combination of nine of them (90%) is considered as training set and the remaining one is treated as test set. For each dataset we consider the average results over the ten partitions.
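This partitioning scheme is standard; as an illustration only (not the authors' tooling), it corresponds to scikit-learn's StratifiedKFold:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in data; the paper uses the six UCI problems of Table I
rng = np.random.default_rng(0)
X = rng.random((1000, 5))
y = rng.integers(0, 2, size=1000)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
fold_scores = []
for train_idx, test_idx in skf.split(X, y):
    X_train, y_train = X[train_idx], y[train_idx]  # nine partitions (90%)
    X_test, y_test = X[test_idx], y[test_idx]      # held-out partition (10%)
    # ... train the classifier on (X_train, y_train), score it on X_test ...
    fold_scores.append(0.0)  # placeholder score
print(sum(fold_scores) / len(fold_scores))  # the reported 10-fold average
```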
To verify the performance of the proposed model, we compare the results obtained by Chi-FRBCS-BigData-Max with Chi-FRBCS-BigData-Ave, so that we can understand how they behave over the selected big data problems.

The configuration parameters used for these algorithms are the following: three fuzzy labels for each attribute, the product T-norm to compute the matching degree of the antecedent of the rule with the example, the PCF to compute the rule weight, and the winning rule as fuzzy reasoning method. Additionally, another parameter is used in the MapReduce procedure: the number of mappers associated to the computation, which has been set to 16, 32 and 64.
To perform the experiments we have used the Atlas research group's cluster with 16 nodes, connected by a 1 Gb/s Ethernet. Each node is composed of two Intel E5-2620 microprocessors (at 2 GHz, 15 MB cache) and 64 GB of memory, running under Linux CentOS 6.3. Furthermore, the cluster works with Hadoop 2.0.0 (Cloudera CDH4.5.0), where one node is configured as name-node and job-tracker, and the rest are data-nodes and task-trackers.

Moreover, when an experimental study is carried out, it is highly advisable to validate the extracted conclusions through the use of statistical tests [10][11]. Standard parametric tests, like the t-test, need to meet some initial conditions on the data that are not always met in classification experiments and, therefore, non-parametric tests need to be used in their place.

In this work, we compare the performance of the approaches using a Wilcoxon signed-rank test [24], a non-parametric statistical test suitable for pairwise comparisons. This test calculates the differences between two classifiers and then ranks them in ascending order with respect to their absolute value. With these ranks, we compute the R+ and R− values: R+ is the sum of the ranks where the first algorithm outperforms the second, and R− sums the contrary case. With this information, the p-value associated to the statistical distribution is calculated and, if that value is below a pre-specified significance level α, the null hypothesis of equality of means can be rejected.
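As an illustration of this test, the following sketch applies SciPy's paired Wilcoxon signed-rank test to the 16-mapper test accuracies of Table II (our own usage example, not the statistical software used in the paper):

```python
from scipy.stats import wilcoxon

# Test accuracies with 16 mappers, taken from Table II (one value per dataset)
acc_max = [99.63, 99.93, 59.88, 74.72, 93.75, 94.75]  # Chi-BigData-Max
acc_ave = [99.63, 99.93, 60.35, 74.69, 93.52, 95.01]  # Chi-BigData-Ave

# Paired two-sided test on the per-dataset differences; zero differences
# (the first two datasets) are discarded by the default zero_method.
stat, p_value = wilcoxon(acc_max, acc_ave)
print(f"W = {stat}, p = {p_value:.4f}")
# The null hypothesis of equal performance is rejected only if p < alpha
```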
B. Analysis of the Chi-FRBCS-BigData precision

In this section, we try to identify the possible differences between the two versions of the Chi-FRBCS-BigData proposal: Chi-FRBCS-BigData-Max and Chi-FRBCS-BigData-Ave (for the sake of space, these algorithms are called Chi-BigData-Max and Chi-BigData-Ave in the tables).

With this aim, Table II shows the average accuracy classification values obtained by the Chi-FRBCS-BigData versions. This table shows the average training and test results of each approach and is divided in three horizontal parts that correspond to the performance results achieved with the different numbers of mappers. Moreover, the bold values indicate which algorithm is more effective to classify the test examples for a given number of mappers, and the underlined values highlight the best performing method in test over all the experiments considered.

TABLE II
AVERAGE ACCURACY RESULTS FOR THE CHI-FRBCS-BIGDATA VERSIONS USING 16, 32 AND 64 MAPPERS

16 mappers
                            Chi-BigData-Max     Chi-BigData-Ave
Datasets                    Acc_tr    Acc_tst   Acc_tr    Acc_tst
RLCP                        99.63     99.63     99.63     99.63
Kddcup DOS vs normal        99.93     99.93     99.93     99.93
Poker 0 vs 1                62.18     59.88     62.58     60.35
Covtype 2 vs 1              74.77     74.72     74.77     74.69
Census                      97.14     93.75     97.15     93.52
Fars Fatal Inj vs No Inj    96.69     94.75     97.06     95.01
Average                     88.39     87.11     88.52     87.19

32 mappers
                            Chi-BigData-Max     Chi-BigData-Ave
Datasets                    Acc_tr    Acc_tst   Acc_tr    Acc_tst
RLCP                        99.63     99.63     99.63     99.63
Kddcup DOS vs normal        99.92     99.92     99.92     99.92
Poker 0 vs 1                61.27     58.93     61.82     59.30
Covtype 2 vs 1              74.69     74.62     74.88     74.85
Census                      97.11     93.48     97.12     93.32
Fars Fatal Inj vs No Inj    96.49     94.26     96.87     94.63
Average                     88.19     86.81     88.37     86.94

64 mappers
                            Chi-BigData-Max     Chi-BigData-Ave
Datasets                    Acc_tr    Acc_tst   Acc_tr    Acc_tst
RLCP                        99.63     99.63     99.63     99.63
Kddcup DOS vs normal        99.92     99.92     99.93     99.93
Poker 0 vs 1                60.45     57.95     60.88     58.12
Covtype 2 vs 1              74.67     74.52     75.05     74.96
Census                      97.07     93.30     97.13     93.11
Fars Fatal Inj vs No Inj    96.27     93.98     96.76     94.56
Average                     88.00     86.55     88.23     86.72

From this table we can observe that, on average, the Chi-FRBCS-BigData-Ave method is able to provide better classification results, both in training and test, than Chi-FRBCS-BigData-Max for any number of mappers considered. Therefore, obtaining the average rule weight over all the partial rule bases shows a positive influence in classification, as we are trying to make the rules as general as possible.
The only clear exception to this tendency can be observed in the "Census" dataset, which obtains slightly better results with the Chi-FRBCS-BigData-Max variant. This behavior may be explained in relation with the training results, as this specific dataset seems to be the one that presents the greatest gap between the training and the testing results.

Moreover, the performance results improve when a smaller number of mappers is used, for both Chi-FRBCS-BigData versions and for both training and test sets. This behavior is also expected from the MapReduce design followed: the rule weights are estimated from smaller data subsets when the number of mappers is high and, therefore, the estimation performed is in these cases further away from the rule weight that would be computed if the whole dataset were available. However, there are also cases, like the "Covtype 2 vs 1" dataset, where this trend is not observed.

In order to give statistical support to the findings previously extracted, in Table III we carry out a Wilcoxon test to compare how both Chi-FRBCS-BigData variants behave when different numbers of mappers are used. From this test, we may conclude that there are no clear differences between the approaches, as the obtained p-values are not lower than a given significance level α = 0.05 or 0.1.

TABLE III
WILCOXON TEST TO COMPARE THE ACCURACY OF THE CHI-FRBCS-BIGDATA VERSIONS. R+ CORRESPONDS TO THE SUM OF THE RANKS FOR CHI-BIGDATA-MAX AND R− TO CHI-BIGDATA-AVE

Comparison    #Mappers    R+    R−    p-Value
Even when we cannot find statistical differences, we can observe that there is a tendency to consider the Chi-FRBCS-BigData-Ave approach as the best performing one, as the sum of ranks is always directed to its side. Furthermore, we can also see that the difference between the approaches is smaller (higher p-value) when the number of mappers is also smaller, which is precisely when both approaches obtain a better classification performance.
C. Analysis of the Chi-FRBCS-BigData runtime

In this section, we focus on understanding the different behavior of the two versions of the Chi-FRBCS-BigData proposal with respect to the runtime of the model.

Table IV shows the runtime in seconds spent by the Chi-FRBCS-BigData-Max and Chi-FRBCS-BigData-Ave methods. As in the previous case, this table is divided in three parts, which show the results for each dataset with respect to the different numbers of mappers. There are two types of highlighting in the table: the bold values correspond to the fastest method within the same number of mappers, while the underlined values refer to the quickest execution for a dataset.

TABLE IV
AVERAGE RUNTIME ELAPSED IN SECONDS FOR THE CHI-FRBCS-BIGDATA VERSIONS USING 16, 32 AND 64 MAPPERS

Datasets                    Chi-BigData-Max    Chi-BigData-Ave
16 mappers – Runtime (s)
RLCP                        9023.82            8868.84
Kddcup DOS vs normal        30120.03           29820.01
Poker 0 vs 1                3075.50            6582.32
Covtype 2 vs 1              1477.67            924.65
Census                      939.32             884.30
Fars Fatal Inj vs No Inj    363.05             236.40
Average                     7499.90            7886.09
32 mappers – Runtime (s)
RLCP                        2460.89            2303.02
Kddcup DOS vs normal        7890.87            7708.96
Poker 0 vs 1                2210.13            6331.09
Covtype 2 vs 1              391.40             493.00
Census                      388.64             771.04
Fars Fatal Inj vs No Inj    141.92             228.96
Average                     2247.31            2972.68
64 mappers – Runtime (s)
RLCP                        701.31             714.41
Kddcup DOS vs normal        2079.93            2096.34
Poker 0 vs 1                1635.98            8373.40
Covtype 2 vs 1              252.19             348.86
Census                      325.24             764.94
Fars Fatal Inj vs No Inj    136.24             241.75
Average                     855.15             2089.95

Fig. 4. Average runtimes for the Chi-FRBCS-BigData versions

On average, we can see that the runtime results show a better behavior for the Chi-FRBCS-BigData-Max algorithm for all the values of the number of mappers considered. This behavior is expected, as this version of the algorithm performs fewer operations than the alternative and the operations performed are simpler. For the smallest number of mappers considered, it seems that there is a greater number of cases that benefit the Chi-FRBCS-BigData-Ave alternative; however, this improvement per dataset is not very high and it is not able to compensate how much slower this alternative is on the "Poker 0 vs 1" dataset. In Figure 4, we can see the difference between the average runtimes of the Chi-FRBCS-BigData alternatives, where the Chi-FRBCS-BigData-Ave version consumes more time.
Furthermore, both versions also notably decrement their runtimes when larger numbers of mappers are used. This diminution in the runtime does not follow a linear proportion, as can be seen in Figure 4: for instance, the speed gain obtained when we double the number of processing units is much greater than a mere halving of the processing time.
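Taking the Chi-FRBCS-BigData-Max averages of Table IV as a worked instance of this claim (our own arithmetic over the reported values):

$$\text{speedup}_{16 \to 32} = \frac{7499.90\,\text{s}}{2247.31\,\text{s}} \approx 3.34 > 2, \qquad \text{speedup}_{32 \to 64} = \frac{2247.31\,\text{s}}{855.15\,\text{s}} \approx 2.63 > 2.$$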
We can also see that this runtime improvement is not proportional over the different datasets: the biggest datasets are the ones that are able to further improve their performance, while the smaller datasets are not able to do so in the same proportion. Moreover, the Chi-FRBCS-BigData-Max algorithm is able to scale up better than the Chi-FRBCS-BigData-Ave alternative, as the second approach seems to halt its progression when 64 mappers are used.

To sum up, our experimental study shows that the Chi-FRBCS-BigData-Ave alternative allows us to obtain better classification results for the Chi-FRBCS-BigData algorithm. We have also found that greater numbers of mappers decrement the accuracy of the model, as the model is less general when it is built over smaller data subsets.

As a counterpart, the Chi-FRBCS-BigData-Max version does not show a significant drop in accuracy with respect to the Chi-FRBCS-BigData-Ave alternative, and it provides better response times. Furthermore, its speed gain is notable when higher numbers of mappers are used. In this manner, it is necessary to establish a trade-off in each occasion so that the most suitable Chi-FRBCS-BigData approach is selected according to our needs.

V. CONCLUDING REMARKS

In this work, we have introduced a linguistic fuzzy rule-based classification method for big data named Chi-FRBCS-BigData. This method builds an interpretable model that manages colossal collections of data without damaging the classification accuracy and with fast response times.

Moreover, this approach has been designed using one of the most popular frameworks for big data nowadays: the MapReduce framework. In this manner, this algorithm distributes its computation using a map function and combines the output via a reduce function. Specifically, the Chi-FRBCS-BigData proposal has been developed under two versions, which have been called Chi-FRBCS-BigData-Max and Chi-FRBCS-BigData-Ave. Although these alternatives follow the same structure and share numerous operations, their differences in the reduce function finally produce diverse classification models with divergent classification results.

The performance of the Chi-FRBCS-BigData alternatives has been contrasted in an experimental study including six different big data problems. These results corroborate the goodness of the approaches; however, it is not possible to identify a clear winner, and one of them needs to be selected according to our needs. If we aim to obtain the best precision results, then using the Chi-FRBCS-BigData-Ave method with a lower number of mappers seems to be the choice, in spite of worse runtime results. In the contrary case, if we are interested in obtaining the fastest results without greatly damaging the performance, then using Chi-FRBCS-BigData-Max with a high number of mappers seems to be the sensible choice.

REFERENCES

[1] P. Zikopoulos, C. Eaton, D. DeRoos, T. Deutsch and G. Lapis, Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data. McGraw-Hill, 2011.
[2] S. Madden, "From Databases to Big Data," IEEE Internet Computing, vol. 16, no. 3, pp. 4–6, 2012.
[3] A. Sathi, Big Data Analytics: Disruptive Technologies for Changing the Game. MC Press, 2012.
[4] H. Ishibuchi, T. Nakashima and M. Nii, Classification and Modeling with Linguistic Information Granules: Advanced Approaches to Linguistic Data Mining. Springer-Verlag, 2004.
[5] Y. Jin, "Fuzzy modeling of high-dimensional systems: complexity reduction and interpretability improvement," IEEE Transactions on Fuzzy Systems, vol. 8, no. 2, pp. 212–221, 2000.
[6] T.P. Hong, Y.C. Lee and M.T. Wu, "An effective parallel approach for genetic-fuzzy data mining," Expert Systems with Applications, vol. 41, no. 2, pp. 655–662, 2014.
[7] H. Ishibuchi, S. Mihara and Y. Nojima, "Parallel distributed hybrid fuzzy GBML models with rule set migration and training data rotation," IEEE Transactions on Fuzzy Systems, vol. 21, no. 2, pp. 355–368, 2013.
[8] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
[9] Z. Chi, H. Yan and T. Pham, Fuzzy Algorithms with Applications to Image Processing and Pattern Recognition. World Scientific, 1996.
[10] J. Demšar, "Statistical Comparisons of Classifiers over Multiple Data Sets," Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006.
[11] S. García and F. Herrera, "An Extension on "Statistical Comparisons of Classifiers over Multiple Data Sets" for all Pairwise Comparisons," Journal of Machine Learning Research, vol. 9, pp. 2607–2624, 2008.
[12] T. White, Hadoop: The Definitive Guide. O'Reilly Media, Inc., 2012.
[13] D. Laney, "3D Data Management: Controlling Data Volume, Velocity, and Variety," [Online; accessed January 2014] (http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf), 2001.
[14] IBM, "What is big data? Bringing big data to the enterprise," [Online; accessed January 2014] (http://www-01.ibm.com/software/data/bigdata/), 2012.
[15] J. Dean and S. Ghemawat, "MapReduce: A flexible data processing tool," Communications of the ACM, vol. 53, no. 1, pp. 72–77, 2010.
[16] S. Owen, R. Anil, T. Dunning and E. Friedman, Mahout in Action. Manning Publications Co., 2011.
[17] J. Lin, "MapReduce is Good Enough? If All You Have is a Hammer, Throw Away Everything That's Not a Nail!," Big Data, vol. 1, no. 1, pp. 28–37, 2013.
[18] H. Ishibuchi and T. Nakashima, "Effect of Rule Weights in Fuzzy Rule-Based Classification Systems," IEEE Transactions on Fuzzy Systems, vol. 9, no. 4, pp. 506–515, 2001.
[19] H. Ishibuchi and T. Yamamoto, "Rule Weight Specification in Fuzzy Rule-Based Classification Systems," IEEE Transactions on Fuzzy Systems, vol. 13, no. 4, pp. 428–435, 2005.
[20] O. Cordón, M.J. del Jesus and F. Herrera, "A proposal on Reasoning Methods in Fuzzy Rule-Based Classification Systems," International Journal of Approximate Reasoning, vol. 20, no. 1, pp. 21–45, 1999.
[21] L.X. Wang and J.M. Mendel, "Generating fuzzy rules by learning from examples," IEEE Transactions on Systems, Man, and Cybernetics, vol. 22, no. 6, pp. 1414–1427, 1992.
[22] K. Bache and M. Lichman, "UCI Machine Learning Repository," [Online; accessed January 2014] (http://archive.ics.uci.edu/ml), 2014.
[23] V. López, A. Fernández, S. García, V. Palade and F. Herrera, "An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics," Information Sciences, vol. 250, pp. 113–141, 2013.
[24] D. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures. Chapman & Hall/CRC, 2006.