On the Use of MapReduce to Build Linguistic Fuzzy Rule Based Classification Systems for Big Data
Abstract— Big data has become one of the emergent topics when learning from data is involved. The notorious increment in data generation has directed attention towards obtaining effective models that are able to analyze and extract knowledge from these colossal data sources. However, the vast amount of data, the variety of the sources and the need for an immediate intelligent response pose a critical challenge to traditional learning algorithms.

To be able to deal with big data, we propose the usage of a linguistic fuzzy rule based classification system, which we have called Chi-FRBCS-BigData. As a fuzzy method, it is able to deal with the uncertainty that is inherent to the variety and veracity of big data and, because of the usage of linguistic fuzzy rules, it is able to provide an interpretable and effective classification model. This method is based on the MapReduce framework, one of the most popular approaches for big data nowadays, and has been developed in two different versions: Chi-FRBCS-BigData-Max and Chi-FRBCS-BigData-Ave.

The good performance of the Chi-FRBCS-BigData approach is supported by means of an experimental study over six big data problems. The results show that the proposal is able to provide competitive results, obtaining more precise but slower models with the Chi-FRBCS-BigData-Ave alternative and faster but less accurate classification results with Chi-FRBCS-BigData-Max.

Victoria López, Sara del Río, José Manuel Benítez and Francisco Herrera are with the Department of Computer Science and Artificial Intelligence, CITIC-UGR (Research Center on Information and Communications Technology), University of Granada, Granada, Spain (email: {vlopez, srio, J.M.Benitez, herrera}@decsai.ugr.es).
This work was partially supported by the Spanish Ministry of Science and Technology under project TIN2011-28488 and the Andalusian Research Plans P10-TIC-6858, P11-TIC-7765 and P12-TIC-2958. Victoria López holds an FPU scholarship from the Spanish Ministry of Education.

I. INTRODUCTION

One of the most highlighted trends in recent years in the information technology industry is what is known as Big Data. The term "Big Data" refers to the analysis and treatment of data repositories of a colossal size, which traditional data management systems and analytics are unable to deal with [1]. This trend can be observed in multiple environments like webpages, multimedia data, social networks, mobile devices, sensor networks and so on [2].

With more data available, the analysis and knowledge extraction process should benefit, and more accurate and precise information should be obtained. However, the standard techniques and approaches that are commonly used in data mining are not able to manage datasets of this size [3]. Therefore, the standard learning methods need to be modified following the guidelines of the existing solutions that are able to effectively deal with big data while maintaining their predictive capacity.

Fuzzy Rule Based Classification Systems (FRBCSs) [4] are potent and popular tools for pattern recognition and classification. They are able to provide good precision results while supplying an interpretable model for the end user through the usage of linguistic labels. One of the complications that hinders the extraction of potentially useful information from big data is the uncertainty associated with the variety and veracity inherent to it. FRBCSs are able to effectively deal with uncertainty, ambiguity or vagueness, making them a very interesting approach for big data as they are able to manage its inherent incertitude.

In a scenario with big data, a high number of instances and/or attributes is usually provided. The performance of FRBCSs degrades in these cases, as the search space grows exponentially. This growth hinders the learning process, leading to scalability or complexity problems that may end up in non-interpretable models [5]. To overcome this situation, several approaches that try to build parallel fuzzy systems have been presented [6][7]; however, they are focused on reducing the processing time while preserving the accuracy, and they are not able to manage colossal collections of data.

The frameworks that are typically used to handle big data usually involve some kind of parallelization so that they can easily process and analyze the data at hand. One of the most popular platforms nowadays, MapReduce [8], suggests a computational scheme where all the processing is distributed along two key operations: a map function that acts over a subset of the data, and a reduce function that integrates the results obtained by the map function.

In this work, we present a FRBCS that is able to provide an interpretable model while maintaining a competitive predictive accuracy in the big data scenario, which we have denoted Chi-FRBCS-BigData. This method is based on the Chi et al.'s approach [9], a classical FRBCS learning method, which has been modified to deal with big data following a MapReduce procedure. The Chi-FRBCS-BigData proposal has been developed under two different versions, Chi-FRBCS-BigData-Max and Chi-FRBCS-BigData-Ave, which precisely differ in the "Reduce" operation and which are compared to analyze how they deal with big data.

Moreover, the Chi et al.'s method is especially suitable to be used in a parallel approach, instead of a more complex FRBCS method, as it provides fuzzy rules that share the same structure and that can be independently created from a subset of examples. Furthermore, the usage of a FRBCS in big data is also quite interesting as it provides an interpretable model.
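The two-operation scheme just described can be illustrated with a tiny, self-contained Python sketch. This is our own toy example (a per-chunk sum), not anything from the paper or from an actual MapReduce framework such as Hadoop:

```python
from functools import reduce

def map_fn(chunk):
    """The 'map' operation: any computation that works on one data subset."""
    return sum(chunk)  # a per-chunk partial result

def reduce_fn(a, b):
    """The 'reduce' operation: integrate two partial results into one."""
    return a + b

data = list(range(1000))
n_units = 4  # illustrative number of processing units
chunks = [data[i::n_units] for i in range(n_units)]  # split the data

partials = [map_fn(c) for c in chunks]  # run in parallel by a real framework
result = reduce(reduce_fn, partials)    # aggregation of the partial results
assert result == sum(data)
```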
Fig. 1. The MapReduce programming model

To overcome these problems, several approaches have been proposed to deal with big data as substitutes for MapReduce and Hadoop. These approaches include projects like Spark, Apache Drill, Twister or Impala, just to mention some of them.

III. CHI-FRBCS-BIGDATA: A LINGUISTIC FUZZY RULE BASED CLASSIFICATION SYSTEM FOR BIG DATA

In this section, we introduce two versions of a linguistic FRBCS that manage big data. To do so, we first present some definitions related to FRBCSs and the fuzzy learning algorithm that has been adapted in this work, Chi-FRBCS. Then, we describe how this method is adapted for big data using a MapReduce scheme that is modified to produce two variants that provide different classification results.

A. Fuzzy Rule Based Classification Systems

A FRBCS is composed of two elements: the Inference System and the Knowledge Base (KB). In a linguistic FRBCS, the KB is formed by the Data Base (DB), which contains the membership functions of the fuzzy partitions associated to the input attributes, and the Rule Base (RB), which comprises the fuzzy rules that describe the problem. Traditionally, expert information to build the KB is not available and, therefore, a machine learning procedure is needed to construct the KB from the available examples.

A classification problem is usually defined by m training samples x_p = (x_{p1}, ..., x_{pn}), p = 1, 2, ..., m, from M classes, where x_{pi} is the value of attribute i (i = 1, 2, ..., n) of the p-th training sample. In this work, we use fuzzy rules of the following form to build our FRBCS:

$$\text{Rule } R_j:\ \text{If } x_1 \text{ is } A_{1j} \text{ and } \ldots \text{ and } x_n \text{ is } A_{nj} \text{ then Class} = C_j \text{ with } RW_j \qquad (1)$$

where R_j is the label of the j-th rule, x = (x_1, ..., x_n) is an n-dimensional pattern vector, A_{ij} is an antecedent fuzzy set, C_j is a class label, and RW_j is the rule weight [18]. We use triangular membership functions as linguistic labels.

There are many alternatives that have been proposed to compute the rule weight [18]. Among them, a good choice is to use the heuristic method known as the Penalized Certainty Factor (PCF) [19]:

$$RW_j = PCF_j = \frac{\sum_{x_p \in C_j} \mu_{A_j}(x_p) \;-\; \sum_{x_p \notin C_j} \mu_{A_j}(x_p)}{\sum_{p=1}^{m} \mu_{A_j}(x_p)} \qquad (2)$$

where μ_{A_j}(x_p) is the membership degree of the p-th example x_p of the training set with the antecedent of the rule, and C_j is the consequent class of rule j. We use the fuzzy reasoning method of the winning rule [20] when predicting the class of a given example with the built KB.

B. The Chi et al.'s algorithm for Classification

To build the KB of a linguistic FRBCS, we need a learning procedure that specifies how the DB and the RB are created. In this work, we use the method proposed in [9], an extension of the well-known Wang and Mendel method for classification [21], which we have called the Chi et al.'s method, Chi-FRBCS.

To generate the fuzzy KB, this generation method tries to find the relationship between the input attributes and the class space following the next steps:

1) Building the linguistic fuzzy partitions: This step builds the fuzzy DB from the domain associated to each attribute A_i using equally distributed triangular membership functions.
2) Generating a new fuzzy rule associated to each example x_p = (x_{p1}, ..., x_{pn}, C_p):
   a) Compute the matching degree μ(x_p) of the example with respect to the fuzzy labels of each attribute using a conjunction operator.
   b) Select the fuzzy region that obtains the maximum membership degree in relation with the example.
   c) Build a new fuzzy rule whose antecedent is calculated according to the previous fuzzy region and whose consequent is the class label of the example, C_p.
   d) Compute the rule weight.

When following the previous procedure, several rules with the same antecedent can be built. If they have the same class in the consequent, the duplicated rules are deleted. However, if the class in the consequent is different, only the rule with the highest weight is maintained in the RB.

C. The Chi-FRBCS-BigData algorithm: A MapReduce Design

At this point, we present the Chi-FRBCS-BigData algorithm, a FRBCS that is able to effectively classify big data. To do so, this method uses two different MapReduce processes to deal with two different parts of the algorithm: one MapReduce process is devoted to the building of the fuzzy KB from a big data training set, and the other MapReduce process is used to estimate the class of samples belonging to big data classification sets.

Fig. 2. (Figure showing the MapReduce phases used to build the fuzzy KB: INITIAL, MAP, REDUCE, FINAL.)
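Before turning to the distributed design, the sequential building blocks of Sections III-A and III-B can be made concrete. The following Python sketch is a minimal illustration under our own assumptions (function names, data layout and the toy data are hypothetical, not the authors' implementation): it builds an equally distributed triangular DB, derives one rule per example, and computes the PCF weight of Eq. (2).

```python
import numpy as np

def build_db(X, n_labels=3):
    """Step 1: equally distributed triangular partitions (the fuzzy DB).
    Returns, per attribute, the centers of n_labels triangles spanning the
    attribute domain (assumes n_labels >= 2 and non-constant attributes)."""
    mins, maxs = X.min(axis=0), X.max(axis=0)
    return [np.linspace(lo, hi, n_labels) for lo, hi in zip(mins, maxs)]

def tri_membership(value, centers):
    """Membership of `value` in each triangular label of one attribute."""
    width = centers[1] - centers[0]
    return np.maximum(1.0 - np.abs(value - centers) / width, 0.0)

def antecedent_of(x, db):
    """Steps 2a-2b: per attribute, pick the label with maximum membership."""
    return tuple(int(np.argmax(tri_membership(v, c))) for v, c in zip(x, db))

def matching_degree(x, antecedent, db):
    """Product T-norm over the memberships of the chosen labels."""
    degrees = [tri_membership(v, db[i])[lab]
               for i, (v, lab) in enumerate(zip(x, antecedent))]
    return float(np.prod(degrees))

def pcf_weight(antecedent, rule_class, X, y, db):
    """Step 2d: the Penalized Certainty Factor of Eq. (2)."""
    mu = np.array([matching_degree(x, antecedent, db) for x in X])
    total = mu.sum()
    if total == 0.0:
        return 0.0
    return float((mu[y == rule_class].sum() - mu[y != rule_class].sum()) / total)

# Minimal usage: one rule per example (steps 2a-2d) on toy data
X = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.1]])
y = np.array([0, 0, 1])
db = build_db(X)
rules = [(antecedent_of(x, db), c) for x, c in zip(X, y)]
rules = [(ant, c, pcf_weight(ant, c, X, y, db)) for ant, c in rules]
print(rules[0])  # ((0, 2), 0, 1.0) for this toy data
```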
Both processes follow the MapReduce structure, distributing all the computations along several processing units that manage different chunks of information, and aggregating the results obtained in an appropriate manner.

Furthermore, we have produced two versions of the Chi-FRBCS-BigData algorithm, which we have named Chi-FRBCS-BigData-Max and Chi-FRBCS-BigData-Ave. These versions share most of their operations; however, they behave differently in the "Reduce" step of the approach, when the different rule bases generated by each mapper are combined. These versions obtain different rule bases and thus different KBs, providing different results when estimating the class of new examples.

The procedure to build the fuzzy KB following a MapReduce scheme in Chi-FRBCS-BigData is depicted in Figure 2. This procedure is divided into the following phases:

1) Initial: In this first phase, the method computes the domain associated to each attribute A_i using the whole training set. With that information, the fuzzy DB is created using equally distributed triangular membership functions, as in Chi-FRBCS. Then, the system automatically segments the original training dataset into independent data blocks, which are automatically transferred to the different processing units together with the created fuzzy DB.

2) Map: In this second phase, each processing unit works independently over its available data to build its associated fuzzy RB (called RB_i in Figure 2) following the original Chi-FRBCS method.
Specifically, for each example in the data partition, an associated fuzzy rule is created: first, the membership degree of the fuzzy labels is computed according to the example values; then, the fuzzy region that obtains the greatest value is selected to become the antecedent of the rule; next, the class of the example is assigned to the rule as consequent; and finally, the rule weight is computed using the set of examples that belong to the current map process.
After the rules have been created and before finishing the map step, each map process searches for rules with the same antecedent. If the rules share the same consequent, only one rule is preserved; if the rules have different consequents, only the rule with the highest weight is kept in the mapper's RB (a code sketch of this pruning is given right after this phase list).
3) Reduce: In this third phase, a processing unit receives the results obtained by each map process (RB_i) and combines them to form the final RB (called RB_R in Figure 2). The combination of the rules is straightforward: the rules created by each mapper, RB_1, RB_2, ..., RB_n, are all integrated in one RB, RB_R. However, contradictory rules (rules with the same antecedent, with or without the same consequent, and with different rule weights) may be created. Therefore, specific procedures to deal with these contradictory rules are needed. Precisely, these procedures define the two variants of the Chi-FRBCS-BigData algorithm (both policies are sketched in code after this phase list):
   a) Chi-FRBCS-BigData-Max: In this approach, the method searches for the rules with the same antecedent. Among these rules, only the rule with the highest weight is maintained in the final RB, RB_R. In this case it is not necessary to check whether the consequent is the same or not, as we are only maintaining the most powerful rules. Equivalent rules (rules with the same antecedent and consequent) can present different weights, as they are computed in different mapper processes over different training sets.
   For instance, if we have five rules with the same antecedent and the following consequents and rule weights: R1: Class 1, RW1 = 0.8743; R2: Class 2, RW2 = 0.9254; R3: Class 1, RW3 = 0.7142; R4: Class 2, RW4 = 0.2143 and R5: Class 1, RW5 = 0.8215, then Chi-FRBCS-BigData-Max will keep in RB_R the rule R2: Class 2, RW2 = 0.9254, because it is the rule with the maximum weight.
   b) Chi-FRBCS-BigData-Ave: In this approach, the method also searches for the rules with the same antecedent. Then, the average weight of the rules that have the same consequent is computed (this step is needed because rules with the same antecedent and consequent may have different weights, as they are built over different training sets). Finally, the rule with the greatest average weight is kept in the final RB, RB_R.
   For instance, if we have five rules with the same antecedent and the following consequents
and rule weights: R1: Class 1, RW1 = 0.8743; R2: Class 2, RW2 = 0.9254; R3: Class 1, RW3 = 0.7142; R4: Class 2, RW4 = 0.2143 and R5: Class 1, RW5 = 0.8215, then Chi-FRBCS-BigData-Ave will first compute the average weight for the rules with the same consequent, namely, RC1: Class 1, RWC1 = 0.8033 and RC2: Class 2, RWC2 = 0.5699, and it will keep in RB_R the rule RC1: Class 1, RWC1 = 0.8033, because it is the rule with the maximum average weight.
Please note that it is not needed for any Chi-FRBCS-BigData version to recompute the rule weights in the "Reduce" stage, as we are calculating the new rule weights from the rule weights previously provided by each mapper.
4) Final: In this last phase, the results computed in the previous phases are provided as the output of the computation process. Precisely, the generated fuzzy KB is composed of the fuzzy DB built in the "Initial" phase and the fuzzy RB, RB_R, finally obtained in the "Reduce" phase. This KB will be the model used to predict the classes of new examples.
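The map-side pruning and the two "Reduce" policies described above can be condensed into a few lines. The following Python sketch is a minimal illustration under our own assumptions (the rule representation and function names are hypothetical, not the authors' implementation); it reproduces the five-rule example used in the text.

```python
from collections import defaultdict

# A rule is (antecedent, consequent_class, weight); the antecedent is a
# tuple of label indexes, e.g. (0, 2, 1).

def prune_rb(rules):
    """Map-side pruning: among rules sharing an antecedent, keep only the
    one with the highest weight (duplicates with equal consequent collapse)."""
    best = {}
    for ant, cls, rw in rules:
        if ant not in best or rw > best[ant][2]:
            best[ant] = (ant, cls, rw)
    return list(best.values())

def reduce_max(all_rules):
    """Chi-FRBCS-BigData-Max: keep the single highest-weight rule per
    antecedent, regardless of its consequent (same operation as pruning)."""
    return prune_rb(all_rules)

def reduce_ave(all_rules):
    """Chi-FRBCS-BigData-Ave: average the weights of rules sharing both
    antecedent and consequent, then keep the best average per antecedent."""
    groups = defaultdict(list)
    for ant, cls, rw in all_rules:
        groups[(ant, cls)].append(rw)
    averaged = [(ant, cls, sum(ws) / len(ws))
                for (ant, cls), ws in groups.items()]
    return prune_rb(averaged)

# The five contradictory rules used as the running example in the text:
ant = (0, 1, 2)
rules = [(ant, 1, 0.8743), (ant, 2, 0.9254), (ant, 1, 0.7142),
         (ant, 2, 0.2143), (ant, 1, 0.8215)]
print(reduce_max(rules))  # -> [((0, 1, 2), 2, 0.9254)]
print(reduce_ave(rules))  # -> [((0, 1, 2), 1, 0.8033...)], Class 1 wins
```

Note how reduce_max reuses the map-side pruning unchanged, while reduce_ave only adds the per-consequent averaging step; this matches the observation in the text that no rule weight is recomputed from data during the "Reduce" stage.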
As it was previously said, Chi-FRBCS-BigData uses another MapReduce mechanism to estimate the class of examples that belong to big data classification sets using the fuzzy KB built in the previous step. This approach follows a scheme similar to the previous step, where the initial dataset is distributed along several processing units that each provide a part of the final result. Specifically, this class estimation process is depicted in Figure 3 and follows these phases:

1) Initial: In this first phase, the method does not need to perform a specific operation. The system automatically segments the original big data dataset that needs to be classified into independent data blocks, which are automatically transferred to the different processing units together with the previously created fuzzy KB.
2) Map: In this second phase, each map task estimates the class for the examples that are included in its data partition. To do so, each processing unit goes through all the examples in its data chunk and predicts their output class according to the given fuzzy KB, using the fuzzy reasoning method of the winning rule (a code sketch of this prediction step is given below).
Please note that Chi-FRBCS-BigData-Max and Chi-FRBCS-BigData-Ave will produce different classification estimations because the input fuzzy RBs are also different; however, the class estimation process followed is exactly the same for both approaches.
3) Final: In this last phase, the results computed in the previous phase are provided as the output of the computation process. Precisely, the estimated classes for the different examples of the big data classification set are aggregated just by concatenating the results provided by each map task.

It is important to note that this mechanism does not include a "Reduce" step, as it is not necessary to perform a computation to combine the results obtained in the "Map" phase.

Fig. 3. A flowchart of how the classification of a big data classification set is organized in Chi-FRBCS-BigData (panels: INITIAL, MAP, FINAL)
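To make the map task of this second process concrete, here is a minimal, self-contained Python sketch of winning-rule inference. Triangular memberships and the product T-norm mirror the learning stage; all names and the toy data are our illustrative assumptions rather than the authors' code.

```python
import numpy as np

def tri_membership(value, centers):
    """Membership of `value` in each equally spaced triangular label."""
    width = centers[1] - centers[0]
    return np.maximum(1.0 - np.abs(value - centers) / width, 0.0)

def predict(x, db, rb):
    """Winning-rule fuzzy reasoning: fire each rule with the product T-norm,
    weight it by the rule weight, and return the class of the strongest rule."""
    best_class, best_strength = None, -1.0
    for antecedent, cls, rw in rb:
        degree = 1.0
        for i, label in enumerate(antecedent):
            degree *= tri_membership(x[i], db[i])[label]
        strength = degree * rw
        if strength > best_strength:
            best_class, best_strength = cls, strength
    return best_class

def map_classify(chunk, db, rb):
    """One map task: estimate the class of every example in its data block.
    The 'Final' phase simply concatenates the lists from all map tasks."""
    return [predict(x, db, rb) for x in chunk]

# Usage on a toy DB and RB (two attributes, three labels each):
db = [np.array([0.1, 0.5, 0.9]), np.array([0.1, 0.5, 0.9])]
rb = [((0, 2), 0, 1.0), ((2, 0), 1, 0.9)]
print(map_classify(np.array([[0.15, 0.85], [0.8, 0.2]]), db, rb))  # [0, 1]
```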
IV. EXPERIMENTAL STUDY

In this section, we first provide some details of the problems selected for the experiments, the configuration parameters for the methods analyzed and the statistical tests applied to compare the results (Section IV-A). Then, we provide in Section IV-B the accuracy performance of the approaches tested in the study with respect to the number of mappers considered. Finally, the runtime spent by the algorithms over the selected data is shown in Section IV-C.

A. Experimental Framework

In this study, our aim is to analyze the behavior of the Chi-FRBCS-BigData algorithm in the scenario of big data. To do so, we will consider six problems from the UCI dataset repository [22], shown in Table I, where we denote the number of examples (#Ex.), number of attributes (#Atts.), selected classes and the number of examples per class. This table is in descending order according to the number of examples.

TABLE I
SUMMARY OF DATASETS

Datasets                    #Ex.      #Atts.  Selected classes      #Samples per class
RLCP                        5749132   2       (FALSE; TRUE)         (5728201; 20931)
Kddcup DOS vs normal        4856151   41      (DOS; normal)         (3883370; 972781)
Poker 0 vs 1                946799    10      (0; 1)                (513702; 433097)
Covtype 2 vs 1              495141    54      (2; 1)                (283301; 211840)
Census                      141544    41      (- 50000.; 50000+.)   (133430; 8114)
Fars Fatal Inj vs No Inj    62123     29      (Fatal Inj; No Inj)   (42116; 20007)
The selected datasets only feature two classes, even when some of them are multi-class problems. In this work, we have decided to limit the number of classes, despite the ability of the Chi-FRBCS-BigData algorithm to deal with multiple classes, to avoid the imbalance in the data that arises in many real-world problems [23], as the division approach of the MapReduce scheme presented aggravates the small sample size problem, which decrements the performance in the imbalanced scenario.

In order to develop our study we use a 10-fold stratified cross-validation partitioning scheme, that is, ten random partitions of the data, each with 10% of the samples, where the combination of nine of them (90%) is considered as training set and the remaining one is treated as test set. For each dataset we consider the average results over the ten partitions.
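This partitioning scheme is standard; as an illustration only (not the authors' tooling), it corresponds to scikit-learn's StratifiedKFold:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in data; the paper uses the six UCI problems of Table I
rng = np.random.default_rng(0)
X = rng.random((1000, 5))
y = rng.integers(0, 2, size=1000)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
fold_scores = []
for train_idx, test_idx in skf.split(X, y):
    X_train, y_train = X[train_idx], y[train_idx]  # nine partitions (90%)
    X_test, y_test = X[test_idx], y[test_idx]      # held-out partition (10%)
    # ... train the classifier on (X_train, y_train), score it on X_test ...
    fold_scores.append(0.0)  # placeholder score
print(sum(fold_scores) / len(fold_scores))  # the reported 10-fold average
```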
To verify the performance of the proposed model, we compare the results obtained by Chi-FRBCS-BigData-Max with Chi-FRBCS-BigData-Ave, so that we can understand how they behave over the selected big data problems.

The configuration parameters used for these algorithms are the following: three fuzzy labels for each attribute, the product T-norm to compute the matching degree of the antecedent of the rule with the example, the PCF to compute the rule weight, and the winning rule as fuzzy reasoning method. Additionally, another parameter is used in the MapReduce procedure: the number of mappers associated to the computation, which has been set to 16, 32 and 64.
To perform the experiments we have used the Atlas research group's cluster with 16 nodes, connected by a 1 Gb/s Ethernet. Each node is composed of two Intel E5-2620 microprocessors (at 2 GHz, 15 MB cache) and 64 GB of memory, running under Linux CentOS 6.3. Furthermore, the cluster works with Hadoop 2.0.0 (Cloudera CDH4.5.0), where one node is configured as name-node and job-tracker, and the rest are data-nodes and task-trackers.

Moreover, when an experimental study is carried out, it is highly advisable to validate the extracted conclusions through the use of statistical tests [10][11]. Standard parametric tests, like the t-test, need to meet some initial conditions on the data that are not always met in classification experiments and, therefore, non-parametric tests need to be used in their place.

In this work, we compare the performance of the approaches using a Wilcoxon signed-rank test [24], a non-parametric statistical test suitable for pairwise comparisons. This test calculates the differences between two classifiers and then ranks them in ascending order with respect to their absolute value. With these ranks, we compute the R+ and R− values: R+ is the sum of the ranks where the first algorithm outperforms the second, and R− sums the contrary case. With this information, the p-value associated to the statistical distribution is calculated and, if that value is below a pre-specified significance level α, the null hypothesis of equality of means can be rejected.
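As an illustration of this test, the following sketch applies SciPy's paired Wilcoxon signed-rank test to the 16-mapper test accuracies of Table II (our own usage example, not the statistical software used in the paper):

```python
from scipy.stats import wilcoxon

# Test accuracies with 16 mappers, taken from Table II (one value per dataset)
acc_max = [99.63, 99.93, 59.88, 74.72, 93.75, 94.75]  # Chi-BigData-Max
acc_ave = [99.63, 99.93, 60.35, 74.69, 93.52, 95.01]  # Chi-BigData-Ave

# Paired two-sided test on the per-dataset differences; zero differences
# (the first two datasets) are discarded by the default zero_method.
stat, p_value = wilcoxon(acc_max, acc_ave)
print(f"W = {stat}, p = {p_value:.4f}")
# The null hypothesis of equal performance is rejected only if p < alpha
```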
B. Analysis of the Chi-FRBCS-BigData precision

In this section, we try to identify the possible differences between the two versions of the Chi-FRBCS-BigData proposal: Chi-FRBCS-BigData-Max and Chi-FRBCS-BigData-Ave (for the sake of space, these algorithms are called Chi-BigData-Max and Chi-BigData-Ave in the tables).

With this aim, Table II shows the average accuracy classification values obtained by the Chi-FRBCS-BigData versions. This table shows the average training and test results of each approach and is divided in three horizontal parts that correspond to the performance results achieved with the different numbers of mappers. Moreover, the bold values indicate which algorithm is more effective to classify the test examples for a given number of mappers, and the underlined values highlight the best performing method in test over all the experiments considered.

TABLE II
AVERAGE ACCURACY RESULTS FOR THE CHI-FRBCS-BIGDATA VERSIONS USING 16, 32 AND 64 MAPPERS

16 mappers
                            Chi-BigData-Max     Chi-BigData-Ave
Datasets                    Acc_tr    Acc_tst   Acc_tr    Acc_tst
RLCP                        99.63     99.63     99.63     99.63
Kddcup DOS vs normal        99.93     99.93     99.93     99.93
Poker 0 vs 1                62.18     59.88     62.58     60.35
Covtype 2 vs 1              74.77     74.72     74.77     74.69
Census                      97.14     93.75     97.15     93.52
Fars Fatal Inj vs No Inj    96.69     94.75     97.06     95.01
Average                     88.39     87.11     88.52     87.19

32 mappers
                            Chi-BigData-Max     Chi-BigData-Ave
Datasets                    Acc_tr    Acc_tst   Acc_tr    Acc_tst
RLCP                        99.63     99.63     99.63     99.63
Kddcup DOS vs normal        99.92     99.92     99.92     99.92
Poker 0 vs 1                61.27     58.93     61.82     59.30
Covtype 2 vs 1              74.69     74.62     74.88     74.85
Census                      97.11     93.48     97.12     93.32
Fars Fatal Inj vs No Inj    96.49     94.26     96.87     94.63
Average                     88.19     86.81     88.37     86.94

64 mappers
                            Chi-BigData-Max     Chi-BigData-Ave
Datasets                    Acc_tr    Acc_tst   Acc_tr    Acc_tst
RLCP                        99.63     99.63     99.63     99.63
Kddcup DOS vs normal        99.92     99.92     99.93     99.93
Poker 0 vs 1                60.45     57.95     60.88     58.12
Covtype 2 vs 1              74.67     74.52     75.05     74.96
Census                      97.07     93.30     97.13     93.11
Fars Fatal Inj vs No Inj    96.27     93.98     96.76     94.56
Average                     88.00     86.55     88.23     86.72

From this table we can observe that, on average, the Chi-FRBCS-BigData-Ave method is able to provide better classification results, both in training and test, than Chi-FRBCS-BigData-Max for any number of mappers considered. Therefore, obtaining the average rule weight over all the partial rule bases shows a positive influence in classification, as we are trying to make the rules as general as possible.
The only clear exception to this tendency can be observed in the "Census" dataset, which obtains slightly better results with the Chi-FRBCS-BigData-Max variant. This behavior may be explained in relation with the training results, as this specific dataset seems to be the one that presents the greatest gap between the training and the testing results.

Moreover, the performance results improve when a smaller number of mappers is used, for both Chi-FRBCS-BigData versions and for both training and test sets. This behavior is also expected from the MapReduce design followed: the rule weights are estimated from smaller data subsets when the number of mappers is high and, therefore, the estimation performed is in these cases further away from the rule weight that would be computed if the whole dataset were available. However, there are also cases, like the "Covtype 2 vs 1" dataset, where this trend is not observed.

In order to give statistical support to the findings previously extracted, in Table III we carry out a Wilcoxon test to compare how both Chi-FRBCS-BigData variants behave when different numbers of mappers are used. From this test, we may conclude that there are no clear differences between the approaches, as the obtained p-values are not lower than a given significance level α = 0.05 or 0.1.

TABLE III
WILCOXON TEST TO COMPARE THE ACCURACY OF THE CHI-FRBCS-BIGDATA VERSIONS. R+ CORRESPONDS TO THE SUM OF THE RANKS FOR CHI-BIGDATA-MAX AND R− TO CHI-BIGDATA-AVE

Comparison    #Mappers    R+    R−    p-Value
Even when we cannot find statistical differences, we can observe that there is a tendency to consider the Chi-FRBCS-BigData-Ave approach as the best performing one, as the sum of ranks is always directed to its side. Furthermore, we can also see that the difference between the approaches is smaller (higher p-value) when the number of mappers is also smaller, which is precisely when both approaches obtain a better classification performance.
C. Analysis of the Chi-FRBCS-BigData runtime

In this section, we focus on understanding the different behavior of the two versions of the Chi-FRBCS-BigData proposal with respect to the runtime of the model.

Table IV shows the runtime in seconds spent by the Chi-FRBCS-BigData-Max and Chi-FRBCS-BigData-Ave methods. As in the previous case, this table is divided in three parts, which show the results for each dataset with respect to the different numbers of mappers. There are two types of highlighting in the table: the bold values correspond to the fastest method within the same number of mappers, while the underlined values refer to the quickest execution for a dataset.

TABLE IV
AVERAGE RUNTIME ELAPSED IN SECONDS FOR THE CHI-FRBCS-BIGDATA VERSIONS USING 16, 32 AND 64 MAPPERS

Datasets                    Chi-BigData-Max    Chi-BigData-Ave
16 mappers – Runtime (s)
RLCP                        9023.82            8868.84
Kddcup DOS vs normal        30120.03           29820.01
Poker 0 vs 1                3075.50            6582.32
Covtype 2 vs 1              1477.67            924.65
Census                      939.32             884.30
Fars Fatal Inj vs No Inj    363.05             236.40
Average                     7499.90            7886.09
32 mappers – Runtime (s)
RLCP                        2460.89            2303.02
Kddcup DOS vs normal        7890.87            7708.96
Poker 0 vs 1                2210.13            6331.09
Covtype 2 vs 1              391.40             493.00
Census                      388.64             771.04
Fars Fatal Inj vs No Inj    141.92             228.96
Average                     2247.31            2972.68
64 mappers – Runtime (s)
RLCP                        701.31             714.41
Kddcup DOS vs normal        2079.93            2096.34
Poker 0 vs 1                1635.98            8373.40
Covtype 2 vs 1              252.19             348.86
Census                      325.24             764.94
Fars Fatal Inj vs No Inj    136.24             241.75
Average                     855.15             2089.95

Fig. 4. Average runtimes for the Chi-FRBCS-BigData versions

On average, we can see that the runtime results show a better behavior for the Chi-FRBCS-BigData-Max algorithm for all the values of the number of mappers considered. This behavior is expected, as this version of the algorithm performs fewer operations than the alternative and the operations performed are simpler. For the smallest number of mappers considered, it seems that there is a greater number of cases that benefit the Chi-FRBCS-BigData-Ave alternative; however, this improvement per dataset is not very high and it is not able to compensate how much slower this alternative is on the "Poker 0 vs 1" dataset. In Figure 4, we can see the difference between the average runtimes of the Chi-FRBCS-BigData alternatives, where the Chi-FRBCS-BigData-Ave version consumes more time.
Furthermore, both versions also notably decrement their runtimes when larger numbers of mappers are used. This diminution in the runtime does not follow a linear proportion, as can be seen in Figure 4: for instance, the speed gain obtained when we double the number of processing units is much greater than a mere halving of the processing time.
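Taking the Chi-FRBCS-BigData-Max averages of Table IV as a worked instance of this claim (our own arithmetic over the reported values):

$$\text{speedup}_{16 \to 32} = \frac{7499.90\,\text{s}}{2247.31\,\text{s}} \approx 3.34 > 2, \qquad \text{speedup}_{32 \to 64} = \frac{2247.31\,\text{s}}{855.15\,\text{s}} \approx 2.63 > 2.$$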
We can also see that this runtime improvement is not proportional over the different datasets: the biggest datasets are the ones that are able to further improve their performance, while the smaller datasets are not able to do so in the same proportion. Moreover, the Chi-FRBCS-BigData-Max algorithm is able to scale up better than the Chi-FRBCS-BigData-Ave alternative, as the second approach seems to halt its progression when 64 mappers are used.

To sum up, our experimental study shows that the Chi-FRBCS-BigData-Ave alternative allows us to obtain better classification results for the Chi-FRBCS-BigData algorithm. We have also found that greater numbers of mappers decrement the accuracy of the model, as the model is less general when it is built over smaller data subsets.

As a counterpart, the Chi-FRBCS-BigData-Max version does not show a significant drop in accuracy with respect to the Chi-FRBCS-BigData-Ave alternative, and it provides better response times. Furthermore, its speed gain is notable when higher numbers of mappers are used. In this manner, it is necessary to establish a trade-off in each occasion so that the most suitable Chi-FRBCS-BigData approach is selected according to our needs.

V. CONCLUDING REMARKS

In this work, we have introduced a linguistic fuzzy rule-based classification method for big data named Chi-FRBCS-BigData. This method builds an interpretable model that manages colossal collections of data without damaging the classification accuracy and with fast response times.

Moreover, this approach has been designed using one of the most popular frameworks for big data nowadays: the MapReduce framework. In this manner, this algorithm distributes its computation using a map function and combines the output via a reduce function. Specifically, the Chi-FRBCS-BigData proposal has been developed under two versions, which have been called Chi-FRBCS-BigData-Max and Chi-FRBCS-BigData-Ave. Although these alternatives follow the same structure and share numerous operations, their differences in the reduce function finally produce diverse classification models with divergent classification results.

The performance of the Chi-FRBCS-BigData alternatives has been contrasted in an experimental study including six different big data problems. These results corroborate the goodness of the approaches; however, it is not possible to identify a clear winner, and one of them needs to be selected according to our needs. If we aim to obtain the best precision results, then using the Chi-FRBCS-BigData-Ave method with a lower number of mappers seems to be the choice, in spite of worse runtime results. In the contrary case, if we are interested in obtaining the fastest results without greatly damaging the performance, then using Chi-FRBCS-BigData-Max with a high number of mappers seems to be the sensible choice.

REFERENCES

[1] P. Zikopoulos, C. Eaton, D. DeRoos, T. Deutsch and G. Lapis, Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data. McGraw-Hill, 2011.
[2] S. Madden, "From Databases to Big Data," IEEE Internet Computing, vol. 16, no. 3, pp. 4–6, 2012.
[3] A. Sathi, Big Data Analytics: Disruptive Technologies for Changing the Game. MC Press, 2012.
[4] H. Ishibuchi, T. Nakashima and M. Nii, Classification and Modeling with Linguistic Information Granules: Advanced Approaches to Linguistic Data Mining. Springer-Verlag, 2004.
[5] Y. Jin, "Fuzzy modeling of high-dimensional systems: complexity reduction and interpretability improvement," IEEE Transactions on Fuzzy Systems, vol. 8, no. 2, pp. 212–221, 2000.
[6] T.P. Hong, Y.C. Lee and M.T. Wu, "An effective parallel approach for genetic-fuzzy data mining," Expert Systems with Applications, vol. 41, no. 2, pp. 655–662, 2014.
[7] H. Ishibuchi, S. Mihara and Y. Nojima, "Parallel distributed hybrid fuzzy GBML models with rule set migration and training data rotation," IEEE Transactions on Fuzzy Systems, vol. 21, no. 2, pp. 355–368, 2013.
[8] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
[9] Z. Chi, H. Yan and T. Pham, Fuzzy Algorithms with Applications to Image Processing and Pattern Recognition. World Scientific, 1996.
[10] J. Demšar, "Statistical Comparisons of Classifiers over Multiple Data Sets," Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006.
[11] S. García and F. Herrera, "An Extension on "Statistical Comparisons of Classifiers over Multiple Data Sets" for all Pairwise Comparisons," Journal of Machine Learning Research, vol. 9, pp. 2607–2624, 2008.
[12] T. White, Hadoop: The Definitive Guide. O'Reilly Media, Inc., 2012.
[13] D. Laney, "3D Data Management: Controlling Data Volume, Velocity, and Variety," [Online; accessed January 2014] (http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf), 2001.
[14] IBM, "What is big data? Bringing big data to the enterprise," [Online; accessed January 2014] (http://www-01.ibm.com/software/data/bigdata/), 2012.
[15] J. Dean and S. Ghemawat, "MapReduce: A flexible data processing tool," Communications of the ACM, vol. 53, no. 1, pp. 72–77, 2010.
[16] S. Owen, R. Anil, T. Dunning and E. Friedman, Mahout in Action. Manning Publications Co., 2011.
[17] J. Lin, "MapReduce is Good Enough? If All You Have is a Hammer, Throw Away Everything That's Not a Nail!," Big Data, vol. 1, no. 1, pp. 28–37, 2013.
[18] H. Ishibuchi and T. Nakashima, "Effect of Rule Weights in Fuzzy Rule-Based Classification Systems," IEEE Transactions on Fuzzy Systems, vol. 9, no. 4, pp. 506–515, 2001.
[19] H. Ishibuchi and T. Yamamoto, "Rule Weight Specification in Fuzzy Rule-Based Classification Systems," IEEE Transactions on Fuzzy Systems, vol. 13, no. 4, pp. 428–435, 2005.
[20] O. Cordón, M.J. del Jesus and F. Herrera, "A proposal on Reasoning Methods in Fuzzy Rule-Based Classification Systems," International Journal of Approximate Reasoning, vol. 20, no. 1, pp. 21–45, 1999.
[21] L.X. Wang and J.M. Mendel, "Generating fuzzy rules by learning from examples," IEEE Transactions on Systems, Man, and Cybernetics, vol. 22, no. 6, pp. 1414–1427, 1992.
[22] K. Bache and M. Lichman, "UCI Machine Learning Repository," [Online; accessed January 2014] (http://archive.ics.uci.edu/ml), 2014.
[23] V. López, A. Fernández, S. García, V. Palade and F. Herrera, "An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics," Information Sciences, vol. 250, pp. 113–141, 2013.
[24] D. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures. Chapman & Hall/CRC, 2006.