
Available online at www.sciencedirect.com

ScienceDirect
Future Computing and Informatics Journal 2 (2017) 19-30
http://www.journals.elsevier.com/future-computing-and-informatics-journal/

Interesting association rule mining with consistent and inconsistent rule detection from big sales data in distributed environment

Dinesh J. Prajapati a,*, Sanjay Garg b, N.C. Chauhan c
a Department of Computer Science & Engineering, Institute of Technology, Nirma University, Ahmedabad, India
b Department of Computer Engineering, Institute of Technology, Nirma University, Ahmedabad, India
c Department of Information Technology, A. D. Patel Institute of Technology, Anand, India
Received 19 September 2016; revised 21 March 2017; accepted 12 April 2017
Available online 10 May 2017

Abstract

Nowadays, there is an increasing demand for mining interesting patterns from big data. Analyzing such a huge amount of data with traditional methods is a computationally complex task. The purpose of this paper is twofold. First, it presents a novel approach to identify consistent and inconsistent association rules from sales data located in a distributed environment. Second, it overcomes the main-memory bottleneck and computing-time overhead of a single computing system by distributing the computations over a multi-node cluster. The proposed method first extracts frequent itemsets for each zone using existing distributed frequent pattern mining algorithms. The paper also compares the time efficiency of a MapReduce-based frequent pattern mining algorithm with the Count Distribution Algorithm (CDA) and the Fast Distributed Mining (FDM) algorithm. The association rules generated from frequent itemsets are so numerous that they become difficult to analyze. Thus, a MapReduce-based consistent and inconsistent rule detection (MR-CIRD) algorithm is proposed to detect consistent and inconsistent rules from big data and to provide useful, actionable knowledge to domain experts. These pruned interesting rules also support better marketing strategies. The extracted consistent and inconsistent rules are evaluated and compared using different interestingness measures, and the experimental results lead to the final conclusions.
© 2017 Faculty of Computers and Information Technology, Future University in Egypt. Production and hosting by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Keywords: Outlier association rules; Interestingness measures; Distributed frequent pattern mining; MapReduce; Preprocessing

* Corresponding author.
E-mail addresses: [email protected] (D.J. Prajapati), gargsv@gmail.com (S. Garg), [email protected] (N.C. Chauhan).
Peer review under responsibility of Faculty of Computers and Information Technology, Future University in Egypt.
http://dx.doi.org/10.1016/j.fcij.2017.04.003
2314-7288/© 2017 Faculty of Computers and Information Technology, Future University in Egypt. Production and hosting by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

Data mining refers to the extraction of useful information and patterns through the knowledge discovery process. One of the most promising and widely used data mining techniques is association rule mining, the task of uncovering relationships in large data. Association rule mining is popular in the retail sales industry, where a company is interested in identifying items that are frequently purchased together. The association rule mining process consists of two steps [1,2]: (i) finding all frequent itemsets that satisfy a minimum support threshold, and (ii) generating strong association rules from the derived frequent itemsets by applying a minimum confidence threshold. Big data is the term for a collection of data sets so large and complex that they are difficult to process using traditional data processing tools [3].
Most existing research focuses on the centralized outlier rule detection problem, where all data are stored and processed at a central location. Such centralized data processing systems do not work efficiently on large, distributed data. Hence, huge data must be analyzed effectively as well as efficiently under distributed
computing environments [4]. To handle big data, a distributed system consists of a pool of autonomous compute nodes that appears as a single workstation [5]. Interestingness measures play an important role in association rule mining: they are used to find interesting patterns based on user needs. Many of the association rules generated by a frequent pattern mining algorithm may not be useful to the organization as a whole. Therefore, interesting rules need to be filtered from uninteresting ones for business intelligence.
In brief, the contribution of this paper is summarized in five steps: (i) the big sales dataset is transformed into a zone-wise transactional dataset using Hadoop MapReduce; (ii) null transactions and infrequent itemsets in each zone are removed from the transactional dataset; (iii) the existing distributed frequent itemset mining algorithms CDA, FDM and DFPM are applied to each zone to generate the complete set of frequent itemsets, and the time efficiency of these algorithms is compared; (iv) association rules are generated for each zone; (v) finally, the proposed MR-CIRD algorithm is applied to find consistent and inconsistent rules zone-wise using various interestingness measures. These algorithms are tested on a big sales dataset of AMUL Dairy.
The remainder of this paper is organized as follows. Section 2 presents preliminaries for interesting association rule mining with consistent and inconsistent rule detection in a distributed environment. Related work is given in Section 3. Section 4 presents the proposed methodology. In Section 5, the performance of the proposed method is evaluated on the sales dataset of AMUL Dairy. Finally, conclusions and future scope are drawn in Section 6.

2. Preliminaries

This section presents the definitions, terminology and assumptions used in this paper.

2.1. Itemset

Let I = {I1, I2, …, In} be a set of distinct items in the dataset D. An itemset X is a set of items that is a subset of I. An itemset X with k distinct items is referred to as a k-itemset [6].

2.2. Support

The support of an association rule X → Y is the percentage of transactions in the database D that contain both itemsets X and Y. It is given by Refs. [1,7]:

$$\mathrm{Support}(X \rightarrow Y) = \mathrm{Support}(X \cup Y) = P(X \cup Y)$$

2.3. Association rule

Consider a dataset D having n transactions, each containing a set of items. An association rule expresses a relationship between those items. It is represented as X → Y, where X and Y are disjoint itemsets, and it exposes the relationship between itemset X and itemset Y [7].

2.4. Consistent rule

Consistent rules are the association rules whose itemsets are frequent both locally and globally in a large dataset.

2.5. Inconsistent rule

Inconsistent rules are the association rules whose itemsets are frequent locally but not globally, or vice versa. Inconsistent rules are non-conforming patterns in the dataset; i.e., the sales pattern does not exhibit normal behavior.

2.6. NULL transaction

A null transaction is a transaction that does not contain any of the itemsets under consideration, not even a single item [5,8]. The presence of null transactions is a critical efficiency problem when mining strong association rules.

2.7. Interestingness measures

Interestingness measures are an essential aspect of extracting interesting patterns from a database. The following interestingness measures are used in this experiment.

2.7.1. Confidence

The confidence is the percentage of transactions in the database D containing itemset X that also contain itemset Y. It is calculated as a conditional probability, which can be expressed in terms of itemset support. The equation for confidence is given by Refs. [1,7]:

$$\mathrm{Confidence}(X \rightarrow Y) = P(Y \mid X) = \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)}$$

Here, Support(X∪Y) is the number of transactions containing both itemsets X and Y, and Support(X) is the number of transactions containing itemset X.

2.7.2. All-confidence

All-confidence is defined as [9]:

$$\text{All-confidence}(X \rightarrow Y) = \frac{\mathrm{Support}(X \cup Y)}{\max(\mathrm{Support}(X), \mathrm{Support}(Y))}$$

All-confidence satisfies the downward closure property. Hence, it is used effectively for interesting association rule mining.

2.7.3. Cosine

Cosine is defined as [10]:
$$\mathrm{Cosine}(X \rightarrow Y) = \frac{\mathrm{Support}(X \cup Y)}{\sqrt{\mathrm{Support}(X) \cdot \mathrm{Support}(Y)}}$$

A cosine(X → Y) value close to 1 indicates that most transactions containing X also contain Y, and vice versa. Similarly, a value close to 0 indicates that most transactions containing X do not contain Y, and vice versa.

2.7.4. Interestingness of a rule

The interestingness of a rule, denoted Interestingness(X → Y), measures how surprising the rule is to the user. The central goal of association rule mining is to find hidden information in the data, and this measure identifies not only rules with high frequency but also rules with comparatively low frequency in the database. The interestingness of a rule can be defined as [11,12]:

$$\mathrm{Interestingness}(X \rightarrow Y) = \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)} \cdot \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(Y)} \cdot \left(1 - \frac{\mathrm{Support}(X \cup Y)}{\mathrm{NOT}}\right)$$

where NOT is the total number of transactions in the database.

2.7.5. Lift/Interest

Lift (also called interest) measures how much more often X and Y occur together than would be expected if they were statistically independent [13,14]. The lift of rule X → Y is defined as:

$$\mathrm{Lift}(X \rightarrow Y) = \frac{\mathrm{Confidence}}{\text{Expected confidence}} = \frac{\mathrm{Confidence}(X \rightarrow Y)}{\mathrm{Support}(Y)}$$

A lift value of 1 indicates that X and Y appear together exactly as often as expected under independence; in this case, X and Y are said to be independent of each other. Lift/Interest is not downward closed, and it does not suffer from the rare item problem [15].

2.7.6. Conviction

Conviction measures the implication strength of a rule relative to statistical independence [13]. Conviction is defined as:

$$\mathrm{Conviction}(X \rightarrow Y) = \frac{1 - \mathrm{Support}(Y)}{1 - \mathrm{Confidence}(X \rightarrow Y)} = \frac{P(X)\,P(\neg Y)}{P(X \cup \neg Y)}$$

where P(¬Y) is the probability that Y does not appear in a transaction. Conviction compares the probability that X appears without Y if they were independent with the actual frequency with which X appears without Y. Unlike confidence, conviction factors in both P(X) and P(Y), and it always has a value of 1 when the relevant items are completely unrelated [14]. In contrast to lift, conviction is a directed measure because it also uses information about the absence of the consequent [15]. Conviction is also monotone in confidence and lift.

3. Related work

The Count Distribution Algorithm (CDA) [16-18] is a fundamental distributed association rule mining algorithm. In CDA, each node holds a large number of frequent itemsets and counts candidate itemsets locally. These counts are stored in the local database, which also maintains the incoming count values. All computing nodes execute the Apriori algorithm locally and, after reading the count values from the local database, broadcast them to the remaining nodes. Each node can then generate new candidate itemsets based on the global counts. The FDM (Fast Distributed Mining) algorithm [18,19] generates candidate sets in a way similar to Apriori. It uses the relationship between locally and globally frequent itemsets to generate a reduced set of candidates in each iteration, which reduces the number of messages exchanged between nodes. Once the candidate sets are generated, local reduction and global reduction techniques are applied to eliminate some candidate sets at each site. Lin and Tseng [20] proposed an automatic support specification for efficiently mining high-confidence and positive-lift associations without consulting the users. The proposed support specification still cannot find all interesting association rules: there is no way to set the minimum item support automatically without missing some interesting association rules. Stephane et al. [21] proposed a method for validating interesting rules against selected measures. A bootstrap-based method, BS_FD, can be used to filter rules in which the antecedent increases the probability of the consequent; discriminant rules are filtered in the context of genomics. Narita and Kitagawa [22] proposed algorithms for detecting outlier transactions more efficiently from a transactional database. Two strategies are used for faster detection: first, redundant association rules are removed, and then candidate outlier transactions are pruned using maximal frequent itemsets. The approach is compared with a brute-force algorithm in terms of detection accuracy and the time efficiency of outlier transaction detection. Aydin and Guvenir [23] proposed a post-processing method that learns a subjective model of the interestingness concept description for streaming association rules. The method works incrementally and employs user interactivity to a certain degree, and the results show that the model can successfully select the interesting rules. As future work, a novelty interestingness factor may be incorporated into the system. Shaari et al. [24] discuss the discovery of meaningful outliers based on Non_Reduct computation by considering negative association rules. The authors proposed Non_Reduct computation to detect outliers from rare classes; the generated outliers are used to discover meaningful knowledge, incorporating the concept of negative rules. Preetha and Radha [25] proposed an FP-growth-based associative classification algorithm to identify outlier transactions. The algorithm is modified to calculate the minimum support and minimum confidence automatically. The authors also introduce two new measures called
collective support and confidence measures for interesting association rule mining. As future scope, the authors suggested parallel processing of the proposed approach to reduce the execution time of the system. The PARMA algorithm [26,27] was proposed to greatly improve the runtime of finding association rules. PARMA achieves this by using probabilistic results, so it only approximates the answers. This solution uses clustering to create groups of transactions and chooses candidate sets from the representative itemsets in the clusters.
Wu et al. [28] proposed performance analysis factors such as heterogeneity and autonomy. The authors also proposed a theorem that characterizes the features of both the big data revolution and the big data processing model, and they analyze the challenging issues in the data mining model and in big data analysis. Lin and Ryaboy [29] proposed a method for analyzing Twitter data. Two major topics are discussed: first, schemas alone are insufficient to provide an understanding of terabytes or petabytes of data; second, a major challenge in analyzing the data is the heterogeneity of the various components. The objective of that paper is to share the authors' experiences in analyzing Twitter data in a production environment. Karim et al. [30] proposed a distributed system for mining business-related transactional datasets using an improved MapReduce framework. This model is highly scalable as the database size increases. The authors implemented an “Associated-Correlated-Independent” algorithm, which effectively mines the complete set of customers' purchase rules. Rajeswari et al. [31] proposed a modified Fuzzy Apriori Rare Itemsets Mining (FARIM) algorithm to detect outliers (weak students) and compared it by heap space usage. The heap space used by the FARIM and modified FARIM algorithms was measured on an educational dataset, showing that the modified FARIM algorithm uses less heap space than FARIM. This approach extracts as outliers not only the students who failed but also those who passed with borderline marks. Here, a fuzzy Apriori algorithm is used to generate less frequent itemsets, which helps teachers give extra coaching to weak students. Butincu and Craus [32] present an improved version of the frequent itemset mining algorithm as well as its generalized version. The authors introduced optimized formulas for generating valid candidates that reduce the number of invalid candidates. By reusing computations performed in previous steps by other nodes, the algorithm avoids generating redundant candidates. The authors also suggested running the same algorithm on a parallel or distributed system.
For marketing strategy, it is especially important to analyze inconsistent patterns when data are geographically distributed. However, none of the above-mentioned works finds regional inconsistent patterns in large datasets. Therefore, the initial part of the proposed methodology transforms the sales data into transactions and then eliminates null transactions from further consideration. After the null transactions are removed, distributed frequent pattern mining algorithms are applied to each zone to generate useful patterns, and their time efficiency is compared. Then, the proposed MR-CIRD algorithm is applied to find zone-wise consistent and inconsistent rules. The objective of this work is to remove the drawbacks of relational databases and exploit the existing MapReduce framework to generate the complete set of regional consistent and inconsistent rules with smaller candidate sets, less message passing and improved execution time.

4. Proposed methodology

The proposed methodology is applied in two phases. In the first phase, association rules are derived together with their interestingness measures and zone number. In the second phase, the association rules are categorized zone-wise into consistent and inconsistent rules.

Fig. 1. Proposed methodology phase-I.

4.1. Phase-I: zone-wise association rule generation

In the first phase, the dataset of each zone is given as input to the data preprocessing unit, as shown in Fig. 1. Due to the huge
dataset size, the preprocessing is done in a distributed environment. The original sales dataset of each zone is transformed into a zone-wise transactional dataset using the Hadoop MapReduce framework.

4.1.1. Data pre-processing using MapReduce

Data pre-processing transforms the relational sales dataset into a transactional dataset. For pre-processing big data, MapReduce is used in two phases: mapper setup and reducer setup.

4.1.1.1. Mapper setup. The first step of any MapReduce job is the map step. In this step, the Hadoop framework splits the input database Ds into n smaller chunks, Dn. These n chunks are stored in the Hadoop Distributed File System (HDFS). The size of each database split depends on the configuration of the MapReduce framework and on the way the data is distributed across the file systems of the machines in the cluster. The purpose of the map function is to combine the zone code (zone), distributor code (dist), sales date (date) and retailer code (ret). The input sales database is given to the mapper line by line, and each line is split into zone, dist, date, ret and pr. The output <key, value> pair is <zone + dist + date + ret, pr>. The pseudo code of the map task is shown in Fig. 2.

Fig. 2. Mapper function.

4.1.1.2. Reducer setup. The reducer function receives as input the <key, value> pairs output by the preceding map function. The pairs are sorted, and it is guaranteed that if a reduce task receives a key, it also receives all values with that key. The sorting and movement of the intermediate <key, value> pairs is done automatically by the framework and is called the shuffle step. The key is split into two parts: zone and dist + date + ret. After combining all the values for each key, the reduce task creates the transactional database zone-wise, separating the transactions of each zone. The pseudo code of the reduce task is shown in Fig. 3.
After preprocessing with MapReduce, the original dairy sales dataset is transformed into a zone-wise transactional dataset, and null transactions are then removed from it to generate the actual transactions.

4.1.2. Screening of null transactions

Null transactions are transactions that do not contain any itemset, or contain only infrequent itemsets. The transactions generated by the preprocessing unit include a large number of null transactions, which affects the performance of the system. For example, some distributors may not sell cow milk, butter and protein powder together, even if these three items form a frequent 3-itemset. Since big sales datasets typically contain null transactions, it is necessary to remove them. For example, suppose that the dairy dataset contains 100,000 transactions, of which 10% are null transactions. Any frequent pattern mining algorithm scans all 100,000 transactions, while the proposed approach considers only the 90,000 valid transactions that remain after removing the 10,000 null transactions. Null transactions are therefore removed from the transactional dataset to improve the performance of the system, and the actual transactions are thus generated.

4.1.3. Distributed frequent pattern mining algorithm

The actual transactional dataset is given to distributed frequent pattern mining algorithms to generate frequent k-itemsets. The CDA and FDM algorithms are data-parallel algorithms. In the CDA algorithm, the dataset is divided into n partitions, and each partition is given to a separate node. Each node counts the candidates and then broadcasts its counts to the remaining nodes. Each node then determines the global counts, which are used to determine the large itemsets and to generate the candidates for the next iteration. In the FDM algorithm, the candidate set is generated as in the Apriori algorithm. To reduce the number of candidates in each iteration, local and global frequent itemsets are used, which reduces the number of messages exchanged between nodes. Once the candidate sets are generated, local reduction and global reduction techniques are applied at each site to eliminate redundant candidate sets. The main drawback of both CDA and FDM is that they generate large candidate sets, require more message passing and have higher execution times when mining big data. These drawbacks can be mitigated by a MapReduce-based frequent pattern mining algorithm.
In the Distributed Frequent Pattern Mining (DFPM) algorithm, once the actual transactional dataset is stored in HDFS, the entire dataset is split into smaller segments, and each segment is transferred to a data node. The map function is executed on each data segment and produces <key, value> pairs for each record of the database. The MapReduce framework groups the <key, value> pairs having the same items and invokes the reducer function, passing it the list of values for each candidate itemset. For each database scan, the map function generates local candidate itemsets, and the reduce function computes the global count by adding these local counts. Multiple iterations of the MapReduce functions are necessary for the overall computation; each iteration produces a set of frequent itemsets, and the iterations continue until no further frequent itemsets exist. The reducer function sums up all the values produced by the mapper and generates a count for each candidate item. The main advantage of this approach is that it does not exchange data between nodes; it exchanges only the count values. The DFPM algorithm, shown in Fig. 4, uses the notation Ck for the set of candidate k-itemsets and Lk for the set of frequent k-itemsets. For each zone, the
transactional data is given as input to the mapper line by line. Each line is split into items, and the output <key, value> pair consists of the item and the value 1, which is the local frequency of the item. The reduce task starts with the itemsets of length 1 and generates candidates of length 2. During step k of the algorithm, it starts with length-k itemsets and generates length-(k+1) candidate itemsets. If the reduce task cannot generate bigger candidate itemsets, it stops the whole computation. Frequent itemsets are calculated for different values of the minimum support threshold, and a support decision system checks for the appropriate support count value for generating association rules.

Fig. 3. Reducer function.

4.1.4. Association rule generation

The generated frequent k-itemsets are given to the association rule generator to construct all possible rules. Association rules can be generated as follows [1,33]:

➢ For each frequent itemset l, generate all non-empty proper subsets of l.
➢ For every non-empty proper subset s of l, output the rule "s → (l − s)" if Support(l)/Support(s) ≥ min_conf, where min_conf is the minimum confidence threshold.

In the proposed approach, all rules are generated without checking the minimum confidence threshold. Finally, association rules of the form (Rule, Interestingness Measure, Zone) are generated for each zone.
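To make the measures of Section 2.7 and the rule-generation step of Section 4.1.4 concrete, the following minimal Python sketch computes all six measures from support counts and enumerates every rule s → (l − s) without a confidence filter. It is an illustration under stated assumptions, not the authors' implementation: the function names `measures` and `generate_rules` are hypothetical, supports are taken as fractions of the total transaction count (NOT), and conviction is reported as infinite when confidence equals 1.

```python
from itertools import combinations
from math import sqrt


def measures(n_xy, n_x, n_y, n_total):
    """Interestingness measures of Section 2.7 from raw transaction counts.

    n_xy, n_x, n_y: counts of transactions containing X and Y together, X, and Y.
    n_total plays the role of NOT (total number of transactions).
    """
    sup_xy, sup_x, sup_y = n_xy / n_total, n_x / n_total, n_y / n_total
    conf = sup_xy / sup_x
    return {
        "confidence": conf,
        "all_confidence": sup_xy / max(sup_x, sup_y),
        "cosine": sup_xy / sqrt(sup_x * sup_y),
        "interestingness": (n_xy / n_x) * (n_xy / n_y) * (1 - n_xy / n_total),
        "lift": conf / sup_y,
        # Conviction is unbounded when the rule always holds (confidence == 1).
        "conviction": float("inf") if conf == 1 else (1 - sup_y) / (1 - conf),
    }


def generate_rules(counts, n_total, zone):
    """Emit (antecedent, consequent, measures, zone) for every rule s -> (l - s).

    Assumes `counts` maps frozenset itemsets to support counts and contains
    every subset of each frequent itemset (true for Apriori-style output).
    As in Section 4.1.4, no minimum-confidence filter is applied.
    """
    rules = []
    for itemset, n_xy in counts.items():
        for size in range(1, len(itemset)):          # non-empty proper subsets
            for subset in combinations(sorted(itemset), size):
                x = frozenset(subset)
                y = itemset - x
                rules.append((x, y, measures(n_xy, counts[x], counts[y], n_total), zone))
    return rules


if __name__ == "__main__":
    # Hypothetical toy counts over 5 transactions: {a,b}, {a,b,c}, {a,c}, {b,c}, {a,b}
    counts = {frozenset(k): v for k, v in
              {"a": 4, "b": 4, "c": 3, "ab": 3, "ac": 2, "bc": 2, "abc": 1}.items()}
    for x, y, m, zone in generate_rules(counts, n_total=5, zone=1):
        print(sorted(x), "->", sorted(y), round(m["lift"], 4), "zone", zone)
```

Computing every measure from the same four counts keeps the rule tuple (Rule, Interestingness Measure, Zone) cheap to produce per zone; the Phase-II classification then only needs the per-zone measure values.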

Fig. 4. The DFPM algorithm.
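The Phase-I pipeline of Sections 4.1.1 to 4.1.3 can be simulated in a single Python process to show the data flow. This is a sketch under assumptions, not the authors' Hadoop code: the map, shuffle and reduce steps are imitated with dictionaries, the composite key is a tuple rather than the concatenated string <zone + dist + date + ret>, the hypothetical `screen_null` encodes one possible reading of the null-transaction definition (drop infrequent items, then drop transactions left empty), and the hypothetical `dfpm` is a generic Apriori-style level-wise counter rather than a transcription of Fig. 4.

```python
from collections import Counter, defaultdict
from itertools import combinations


def preprocess(rows):
    """Map each sales row to <(zone, dist|date|ret), pr>, group values by key
    (the shuffle), then split the key so transactions are emitted zone-wise."""
    grouped = defaultdict(set)
    for zone, dist, date, ret, pr in rows:                  # map
        grouped[(zone, f"{dist}|{date}|{ret}")].add(pr)     # shuffle: values meet per key
    zones = defaultdict(list)
    for (zone, _), items in grouped.items():                # reduce: split key on zone
        zones[zone].append(frozenset(items))
    return dict(zones)


def screen_null(transactions, min_count):
    """Assumed reading of Section 4.1.2: remove infrequent items, then drop
    transactions that no longer contain any item."""
    freq = Counter(i for t in transactions for i in t)
    kept = (frozenset(i for i in t if freq[i] >= min_count) for t in transactions)
    return [t for t in kept if t]


def dfpm(transactions, min_count):
    """Level-wise counting: each pass 'maps' transactions to (candidate, 1)
    pairs and 'reduces' by summing, then joins L_k into C_(k+1)."""
    frequent = {}
    candidates = {frozenset([i]) for t in transactions for i in t}   # C_1
    k = 1
    while candidates:
        counts = Counter(c for t in transactions for c in candidates if c <= t)
        level = {c: n for c, n in counts.items() if n >= min_count}  # L_k
        if not level:
            break
        frequent.update(level)
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori pruning: keep a candidate only if every (k-1)-subset is frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


if __name__ == "__main__":
    # A few rows from Table 1: (zone, distributor, date, retailer, product).
    rows = [("1", "2001219", "7/9/2014", "200121900391", "SKMCP01"),
            ("1", "2001219", "7/9/2014", "200121900391", "TDMCP01"),
            ("1", "2001219", "7/9/2014", "200121900392", "BTRCP01"),
            ("1", "2001219", "7/9/2014", "200121900392", "BTRCP02"),
            ("4", "2001263", "7/10/2014", "200126300594", "BTRCP01"),
            ("4", "2001263", "7/10/2014", "200126300594", "BTRCP02")]
    for zone, txns in preprocess(rows).items():
        print(zone, dfpm(screen_null(txns, min_count=1), min_count=1))
```

Only candidate counts cross the simulated map/reduce boundary, which mirrors the point made above that DFPM exchanges count values rather than raw data between nodes.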


Fig. 5. Proposed methodology phase-II.
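The two-stage MR-CIRD flow described in Section 4.2 (given as Fig. 6) can be sketched in plain Python. The assumptions are labeled: IM values are fractions in [0, 1], so the 30%/60% thresholds suit measures such as confidence, all-confidence and cosine (lift and conviction would need other cut-offs); the MS1, RS1, Rx and Ry files are modeled as in-memory collections; and the Reducer_stage-1 rule that combines per-zone values into IM_final, which the text does not spell out, is omitted. The names `im_group` and `mr_cird` are hypothetical.

```python
from collections import defaultdict


def im_group(im):
    """Mapper_stage-1 grouping: IM <= 30%, 30% < IM <= 60%, IM > 60%."""
    if im <= 0.30:
        return 0
    return 1 if im <= 0.60 else 2


def mr_cird(zonal_rules):
    """zonal_rules: iterable of (AR, Zone, IM) triples produced in Phase-I.

    Returns (rx, ry): the consistent and the inconsistent rule sets.
    """
    # Mapper_stage-1: route <AR, (Zone, IM)> to one of three MS1 "files".
    ms1 = [defaultdict(list) for _ in range(3)]
    for rule, zone, im in zonal_rules:
        ms1[im_group(im)][rule].append((zone, im))

    # Reducer_stage-1 would combine the grouped (Zone, IM) values into
    # <AR, IM_final>; only rule membership per file matters below, so the
    # grouped lists are kept as the RS1 contents.
    rs1_files = ms1

    # Mapper_stage-2 emits <AR, 1> once per RS1 file containing the rule;
    # Reduce_stage-2 sums these into the rule's total frequency.
    total = defaultdict(int)
    for rs1 in rs1_files:
        for rule in rs1:
            total[rule] += 1

    rx = {r for r, f in total.items() if f == 1}   # consistent rules
    ry = {r for r, f in total.items() if f > 1}    # inconsistent rules
    return rx, ry


if __name__ == "__main__":
    # Hypothetical rules and IM values, for illustration only.
    rules = [("a -> b", 1, 0.20), ("a -> b", 2, 0.25),
             ("a -> c", 1, 0.20), ("a -> c", 2, 0.70)]
    print(mr_cird(rules))
```

A rule whose zone-level measures all fall in the same IM group appears in exactly one RS1 file and is therefore consistent; a rule whose zones straddle groups is counted more than once and lands in Ry.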

4.2. Phase-II: consistent and inconsistent rule detection

In the second phase, the consistent and inconsistent association rules of each zone are generated based on different interestingness measures (IM). In this experiment, confidence, all-confidence, cosine, interestingness of a rule, lift and conviction are used as interestingness measures. A rule is said to be consistent if its interestingness measure in a zone is close to the global value of the IM; otherwise, it is said to be inconsistent. The framework for interesting association rule mining with inconsistent rule detection in a distributed environment is shown in Fig. 5.
The consistent and inconsistent association rules for each zone are calculated using the MapReduce-based consistent and inconsistent rule detection (MR-CIRD) algorithm given in Fig. 6. The algorithm uses two stages of mappers and reducers. The association rules generated for each zone in Phase-I are given as input to Mapper_stage-1, line by line. In Mapper_stage-1, each zonal rule is classified into three groups based on its interestingness measure (IM): IM ≤ 30%, 30% < IM ≤ 60% and IM > 60%. The input of Mapper_stage-1 is a set of association rules of the form (AR, Zone, IM), and the output is three MS1 files containing <AR, Zone + IM> as <key, value> pairs. For each MS1 file, the Reducer_stage-1 function combines the interestingness measures of all zones having the same association rule, and the results are stored in an RS1 file. The input of Reducer_stage-1 is <AR, Zone + IM> as a <key, value> pair, and the generated output is <AR, IM_final>. Here, the function NOT gives the number of transactions for that zone. The output of Reducer_stage-1 is given as input to Mapper_stage-2, which generates an output <key, value> pair for each RS1 file, where the key is the association rule and the value is 1; the value indicates the local frequency of the rule. Reduce_stage-2 combines the output of Mapper_stage-2 and generates the total frequency of each rule. Rules with a total frequency greater than one are considered inconsistent and are stored in the Ry file. Similarly, rules with a total frequency of one are considered consistent and are stored in the Rx file.

Fig. 6. The MR-CIRD algorithm.

5. Experimental setup & results

For the experiments, a cluster of four desktop machines, each with an i5 processor and 4 GB of DDR-3 RAM, is used. The Ubuntu 12.04 LTS operating system is installed on all four computers. Since the JVM is not usually part of Ubuntu 12.04, it is also installed on all four computers. Using Apache Hadoop packages, a multi-node cluster is configured on three computers and a single-node cluster on a single computer. The preprocessing algorithm, the distributed frequent pattern mining algorithm and Phase-II of the proposed method are tested on both the multi-node and the single-node cluster.
For this experiment, the sales database of AMUL Dairy, with more than 1500 different dairy products, is used. The database, with a total size of 5 GB, is divided into six zones based on the area of the distributors. Zone codes 1 to 6 correspond to Delhi, Chennai, Kolkata, Mumbai, Ahmedabad and Guwahati, respectively. Dairy products are sold through a concept hierarchy: a product is first sent to the distributor of a specific zone, who in turn distributes it to local retailers, and finally the retailer sells it to the customer. A sample of the AMUL Dairy sales dataset containing zone code, distributor code, sales date, retailer code and product code is shown in Table 1.

5.1. Pre-processing of big sales data

The AMUL sales dataset, distributed across six different zones, is given as input to the preprocessing unit, and the data is grouped
based on distributor code, date and retailer code for each zone.

Table 1
Sample AMUL dairy sales dataset.

Zone code  Distributor code  Date       Retailer code  Product code
1          2001219           7/9/2014   200121900391   SKMCP01
1          2001219           7/9/2014   200121900391   TDMCP01
1          2001219           7/9/2014   200121900392   BTRCP01
1          2001219           7/9/2014   200121900392   BTRCP02
1          2001219           7/9/2014   200121900392   DELCP05
2          2001234           7/9/2014   200123400654   FMPCP04
2          2001234           7/9/2014   200123400654   PCHCP08
2          2001234           7/10/2014  200123400657   BTRCP01
3          2001287           7/10/2014  200128700476   DWRCP45
3          2001287           7/10/2014  200128700476   SKMCP01
3          2001287           7/10/2014  200128700497   IMFCP04
3          2001287           7/10/2014  200128700497   TDMCP01
4          2001263           7/10/2014  200126300594   BTRCP01
4          2001263           7/10/2014  200126300594   BTRCP02
4          2001263           7/10/2014  200126300567   SKMCP01
5          2001291           7/10/2014  200129100478   BTRCP01
5          2001291           7/10/2014  200129100478   BTRCP02
5          2001291           7/10/2014  200129100438   SKMCP03
5          2001291           7/10/2014  200129100481   FMPCP04
5          2001291           7/10/2014  200129100481   TDMCP01
6          2001254           7/10/2014  200125400711   FMPCP01
6          2001254           7/10/2014  200125400711   TDMCP06
6          2001254           7/10/2014  200125400731   BTRCP01
6          2001254           7/10/2014  200125400731   BTRCP02
6          2001254           7/10/2014  200125400731   TDMCP01
6          2001254           7/10/2014  200125400756   FMPCP04
6          2001254           7/10/2014  200125400756   SKMCP01

As the dataset is large, preprocessing is done in the distributed environment using Hadoop MapReduce. The preprocessing unit generates a huge number of transactions for each zone. The sample output of the pre-processing unit on Table 1 is

transaction T2 as null transactions, so they are to be removed from the transactional dataset. The result of null transaction screening on Table 2 is shown in Table 3.

5.3. The time efficiency of DFPM and MR-CIRD algorithms

For this experiment, the minimum support count is set to 100 in order to generate a large number of association rules. The existing DFPM algorithm is tested on a three-node cluster for each zone. The execution time of frequent k-itemset generation, along with the number of association rules, is shown in Table 4.
After the transactional dataset is transformed into the actual transactional dataset, the actual transaction file is given as input to the distributed frequent pattern mining algorithms to find the frequent k-itemsets. The results of the CDA, FDM and DFPM algorithms on the AMUL dataset for a database size of 5 GB, compared on a three-node cluster with a minimum support count of 100, are shown in Fig. 7. The results show that the DFPM algorithm performs much better than both CDA and FDM when the dataset is large.
The experimental results show that, to keep execution times comparatively small, the number of nodes must be increased as the database size increases. The execution time of the MR-CIRD algorithm tested on clusters of one, two and three nodes is shown in Table 5. The results show that time efficiency improves as the number of nodes increases.

5.4. Consistent and inconsistent rule detection

The MR-CIRD algorithm generates consistent (Rx) and
shown in Table 2. inconsistent (Ry) association rules. The number of consistent
and inconsistent rules generated using interestingness mea-
5.2. Screening of null transactions sures as confidence, all-confidence, cosine, interestingness of a
rule (IR), lift and conviction for all zones is shown in Table 6.
After converting sales dataset into transactional dataset, It is observed that, for all the zones, number of consistent rules
null transactions has to be removed to improve the computa- are more compared to inconsistent rules.
tion time. As shown in Table 2, zone 2, 4 and 5 contains In Table 6, the consistent (Rx) and inconsistent (Ry) asso-
ciation rules of Zone-1 using confidence as an interestingness
measure, is calculated by following steps.

Table 2
Zone wise pre-processing output.
Zone code  Transaction ID  Transactional dataset
1          T1              SKMCP01, TDMCP01
           T2              BTRCP01, BTRCP02, DELCP05
2          T1              FMPCP04, PCHCP08
           T2              BTRCP01
3          T1              DWRCP45, SKMCP01
           T2              IMFCP04, TDMCP01
4          T1              BTRCP01, BTRCP02
           T2              SKMCP01
5          T1              BTRCP01, BTRCP02
           T2              SKMCP03
           T3              FMPCP04, TDMCP01
6          T1              FMPCP01, TDMCP06
           T2              BTRCP01, BTRCP02, TDMCP01
           T3              FMPCP04, SKMCP01

Table 3
Sample actual transactions.
Zone code  Transaction ID  Transactional dataset
1          T1              SKMCP01, TDMCP01
           T2              BTRCP01, BTRCP02, DELCP05
2          T1              FMPCP04, PCHCP08
3          T1              DWRCP45, SKMCP01
           T2              IMFCP04, TDMCP01
4          T1              BTRCP01, BTRCP02
5          T1              BTRCP01, BTRCP02
           T3              FMPCP04, TDMCP01
6          T1              FMPCP01, TDMCP06
           T2              BTRCP01, BTRCP02, TDMCP01
           T3              FMPCP04, SKMCP01
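
Tables 2 and 3 can be reproduced from the Table 1 sample with a small in-memory sketch. It follows Section 5.1 in treating one transaction as the set of product codes that share a distributor code, date and retailer code within a zone. It also assumes that a null transaction is one holding fewer than two products, which is the only property shared by T2 of zones 2, 4 and 5. The function names are hypothetical. The paper runs this step with Hadoop MapReduce; this version only mirrors the grouping and screening logic on a single machine.

```python
from collections import defaultdict

# Sample rows of Table 1: zone, distributor, date, retailer, product.
TABLE_1 = """\
1 2001219 7/9/2014 200121900391 SKMCP01
1 2001219 7/9/2014 200121900391 TDMCP01
1 2001219 7/9/2014 200121900392 BTRCP01
1 2001219 7/9/2014 200121900392 BTRCP02
1 2001219 7/9/2014 200121900392 DELCP05
2 2001234 7/9/2014 200123400654 FMPCP04
2 2001234 7/9/2014 200123400654 PCHCP08
2 2001234 7/10/2014 200123400657 BTRCP01
3 2001287 7/10/2014 200128700476 DWRCP45
3 2001287 7/10/2014 200128700476 SKMCP01
3 2001287 7/10/2014 200128700497 IMFCP04
3 2001287 7/10/2014 200128700497 TDMCP01
4 2001263 7/10/2014 200126300594 BTRCP01
4 2001263 7/10/2014 200126300594 BTRCP02
4 2001263 7/10/2014 200126300567 SKMCP01
5 2001291 7/10/2014 200129100478 BTRCP01
5 2001291 7/10/2014 200129100478 BTRCP02
5 2001291 7/10/2014 200129100438 SKMCP03
5 2001291 7/10/2014 200129100481 FMPCP04
5 2001291 7/10/2014 200129100481 TDMCP01
6 2001254 7/10/2014 200125400711 FMPCP01
6 2001254 7/10/2014 200125400711 TDMCP06
6 2001254 7/10/2014 200125400731 BTRCP01
6 2001254 7/10/2014 200125400731 BTRCP02
6 2001254 7/10/2014 200125400731 TDMCP01
6 2001254 7/10/2014 200125400756 FMPCP04
6 2001254 7/10/2014 200125400756 SKMCP01
"""

def to_transactions(rows):
    """Group sales rows into per-zone transactions (Table 2).

    One transaction holds all products sold under the same
    (distributor, date, retailer) key within a zone. IDs T1, T2, ...
    follow order of first appearance, matching the numbering of Table 2.
    """
    baskets = defaultdict(dict)  # zone -> {(distributor, date, retailer): [products]}
    for zone, distributor, date, retailer, product in rows:
        baskets[zone].setdefault((distributor, date, retailer), []).append(product)
    return {
        zone: {f"T{i}": items for i, items in enumerate(groups.values(), start=1)}
        for zone, groups in baskets.items()
    }

def screen_null(transactions, min_items=2):
    """Drop 'null' transactions, assumed here to mean fewer than min_items products."""
    return {
        zone: {tid: items for tid, items in tx.items() if len(items) >= min_items}
        for zone, tx in transactions.items()
    }

rows = [line.split() for line in TABLE_1.splitlines()]
table_2 = to_transactions(rows)
table_3 = screen_null(table_2)
for zone, tx in table_3.items():
    print(zone, tx)
```

The printed result matches Table 3. Surviving IDs keep their original labels instead of being renumbered, so zone 5, for example, retains T1 and T3, as in the paper.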

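The interestingness measures listed in Section 5.4 can be illustrated on the Table 3 sample. This sketch uses the common textbook definitions (see, e.g., [10] and [14]): confidence s(X∪Y)/s(X), all-confidence s(X∪Y)/max(s(X), s(Y)), cosine s(X∪Y)/sqrt(s(X)s(Y)), lift s(X∪Y)/(s(X)s(Y)) and conviction (1 - s(Y))/(1 - confidence). It assumes these agree with the definitions given earlier in the paper. IR is omitted because its formula is not restated in this section. Pooling the eleven transactions of all zones is an illustrative simplification; the paper evaluates each zone separately.

```python
import math

# Table 3 transactions, pooled across all six zones for illustration only.
TRANSACTIONS = [
    {"SKMCP01", "TDMCP01"}, {"BTRCP01", "BTRCP02", "DELCP05"},   # zone 1
    {"FMPCP04", "PCHCP08"},                                      # zone 2
    {"DWRCP45", "SKMCP01"}, {"IMFCP04", "TDMCP01"},              # zone 3
    {"BTRCP01", "BTRCP02"},                                      # zone 4
    {"BTRCP01", "BTRCP02"}, {"FMPCP04", "TDMCP01"},              # zone 5
    {"FMPCP01", "TDMCP06"}, {"BTRCP01", "BTRCP02", "TDMCP01"},   # zone 6
    {"FMPCP04", "SKMCP01"},                                      # zone 6
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def measures(x, y, transactions):
    """Objective interestingness measures for the rule X -> Y."""
    s_x, s_y = support(x, transactions), support(y, transactions)
    s_xy = support(x | y, transactions)
    conf = s_xy / s_x
    return {
        "confidence": conf,
        "all-confidence": s_xy / max(s_x, s_y),
        "cosine": s_xy / math.sqrt(s_x * s_y),
        "lift": s_xy / (s_x * s_y),
        # Conviction is unbounded when the rule never fails (confidence == 1).
        "conviction": math.inf if conf == 1 else (1 - s_y) / (1 - conf),
    }

print(measures({"SKMCP01"}, {"TDMCP01"}, TRANSACTIONS))
print(measures({"BTRCP01"}, {"BTRCP02"}, TRANSACTIONS))
```

For SKMCP01 → TDMCP01, the sketch gives confidence 1/3 and lift 11/12. BTRCP01 → BTRCP02 holds in every transaction that contains BTRCP01, so its confidence is 1 and its conviction is infinite.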
Table 4
Zone wise execution time of the DFPM algorithm.
Zone code  Zone name  No. of transactions  Frequent k-itemset  Execution time (in seconds)  No. of association rules
1          Delhi      735125               13                  63459.380                    6679
2          Chennai    313061               15                  72247.179                    7033
3          Kolkata    1114936              12                  54876.813                    6357
4          Mumbai     750368               16                  79236.741                    8168
5          Ahmedabad  917108               17                  87769.736                    8532
6          Guwahati   196586               15                  74567.241                    7031

Table 5
Zone wise execution time of the MR-CIRD algorithm (execution time in seconds).
Zone code  Zone name  Single node cluster  Two node cluster  Three node cluster
1          Delhi      882.541              678.142           518.231
2          Chennai    719.834              514.812           398.712
3          Kolkata    813.882              648.145           446.012
4          Mumbai     1021.869             876.211           601.213
5          Ahmedabad  1335.491             1003.210          767.912
6          Guwahati   752.566              519.761           389.412
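
The improvement reported for Table 5 can be quantified with the usual speedup S_n = T_1 / T_n and parallel efficiency E_n = S_n / n, where T_n is the execution time on an n-node cluster. The following is a minimal sketch over the Table 5 timings; the helper name is hypothetical.

```python
# Table 5 execution times in seconds: zone -> (1 node, 2 nodes, 3 nodes).
TABLE_5 = {
    "Delhi": (882.541, 678.142, 518.231),
    "Chennai": (719.834, 514.812, 398.712),
    "Kolkata": (813.882, 648.145, 446.012),
    "Mumbai": (1021.869, 876.211, 601.213),
    "Ahmedabad": (1335.491, 1003.210, 767.912),
    "Guwahati": (752.566, 519.761, 389.412),
}

def speedup_and_efficiency(times):
    """Return [(nodes, speedup, efficiency)] relative to the single-node run."""
    t1 = times[0]
    return [(n, t1 / t, t1 / (n * t)) for n, t in enumerate(times, start=1)]

for zone, times in TABLE_5.items():
    n, s, e = speedup_and_efficiency(times)[-1]
    print(f"{zone:<10} {n}-node speedup {s:.2f}x, efficiency {e:.0%}")
```

Based on these figures, the three-node speedup ranges from about 1.70 (Mumbai) to 1.93 (Guwahati).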

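The MapReduce flow described at the start of this section (Reducer_stage-1 assigning each rule its measure category, Mapper_stage-2 emitting <rule, 1> per RS1 file, Reduce_stage-2 summing these to a total frequency) can be sketched in plain Python, together with the consistent/inconsistent split of Steps 3 and 4 below. This is one possible reading under stated assumptions, not the MR-CIRD implementation. It keys stage 2 on the (rule, confidence range) pair rather than on the rule alone, and it treats a rule as consistent when that pair occurs in at least `min_zones` zones. The bucket edges, `min_zones` and the example confidences are all hypothetical.

```python
from collections import Counter

# Hypothetical bucket edges for stage-1 categorization; the paper does not
# list its confidence ranges in this section.
EDGES = (0.5, 0.7, 0.9)

def categorize(value, edges=EDGES):
    """Stage 1: map an IM value to a range index (0 = below the first edge)."""
    return sum(value >= edge for edge in edges)

def stage1(zone_rules):
    """Per zone, emit the set of (rule, category) pairs, i.e. one RS1 file per zone."""
    return {
        zone: {(rule, categorize(value)) for rule, value in rules.items()}
        for zone, rules in zone_rules.items()
    }

def stage2(rs1_files):
    """Map: emit ((rule, category), 1) per RS1 file; reduce: sum to a total frequency."""
    totals = Counter()
    for pairs in rs1_files.values():
        for key in pairs:
            totals[key] += 1  # local frequency of the rule in this zone
    return totals

def classify(zone_rules, min_zones=2):
    """Return {zone: (Rx, Ry)} using a hypothetical min_zones agreement threshold."""
    rs1 = stage1(zone_rules)
    totals = stage2(rs1)
    result = {}
    for zone, pairs in rs1.items():
        rx = {rule for rule, cat in pairs if totals[(rule, cat)] >= min_zones}
        ry = {rule for rule, _ in pairs} - rx
        result[zone] = (rx, ry)
    return result

# Hypothetical per-zone confidence values, not taken from the paper.
example = {
    "Z1": {"BTRCP01->BTRCP02": 0.82, "SKMCP01->TDMCP01": 0.55},
    "Z2": {"BTRCP01->BTRCP02": 0.85, "FMPCP04->PCHCP08": 0.60},
    "Z3": {"BTRCP01->BTRCP02": 0.95, "SKMCP01->TDMCP01": 0.57},
}
print(classify(example))
```

With these inputs, BTRCP01->BTRCP02 is consistent in Z1 and Z2 but inconsistent in Z3, because its confidence in Z3 falls in a different range.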
Step 1: Initially, the distributed frequent pattern mining algorithm is applied to each zone to generate the frequent k-itemsets.
Step 2: The association rules satisfying the minimum confidence threshold are generated from the frequent k-itemsets. These rules are given as input to phase-II.
Step 3: In phase-II, for each zone, the rules generated in phase-I are categorized by range of confidence using MapReduce stage-1.
Step 4: In MapReduce stage-2, the categorized rules are compared to determine whether they belong to the same zone(s) or to different zone(s). Rules belonging to the same zone(s) are considered consistent rules (Rx), and rules belonging to different zone(s) are considered inconsistent rules (Ry).

In this experiment, the numbers of consistent and inconsistent rules for zone-1 are 5811 and 868, respectively.

Similarly, for the remaining interestingness measures, the numbers of consistent and inconsistent rules are generated zone wise by following the above four steps. The zone wise consistent and inconsistent rules that are commonly generated for confidence, all-confidence, cosine, interestingness of a rule, lift and conviction are shown in Fig. 8. It is observed that for zones 4 and 5 the number of inconsistent rules is relatively small, even though the number of consistent rules is large.

6. Conclusions and future scope

HDFS and MapReduce play an important role in reducing the processing time for large datasets. However, most existing algorithms are limited in processing speed. In this paper, a Hadoop-based distributed approach is presented that partitions the transactional dataset and transfers the task to all participating nodes, with the aim of reducing inter-node message passing in the cluster. The DFPM algorithm generates a smaller candidate set and uses less message passing than the CDA and FDM algorithms, so its execution time is lower than theirs. In the first phase, the DFPM algorithm generates zone-wise distributed frequent itemsets, and association rules are generated from those frequent itemsets. In the second phase, the proposed MR-CIRD algorithm is used to detect consistent and inconsistent rules when the data is distributed

Fig. 7. Zone wise execution time comparison of distributed algorithms.

Table 6
Zone wise consistent and inconsistent association rules using interestingness measures.
Interestingness measures (IM)  Zone-1      Zone-2      Zone-3      Zone-4      Zone-5      Zone-6
                               Rx    Ry    Rx    Ry    Rx    Ry    Rx    Ry    Rx    Ry    Rx    Ry
Confidence                     5811  868   6010  1023  5481  876   7241  927   7653  879   5925  1106
All-confidence                 5950  729   6113  920   5515  842   7267  901   7635  897   6019  1012
Cosine                         5850  829   6016  1017  5586  771   7129  927   7430  1102  6030  1001
IR                             5784  895   6072  961   5575  782   7172  996   7563  969   6008  1023
Lift/Interest                  5840  839   6114  919   5565  792   7210  958   7783  749   6018  1013
Conviction                     5870  809   6090  943   5601  756   7180  988   7740  792   6101  930

Fig. 8. Zone wise common consistent and inconsistent association rules.

geographically. This will help the organization improve its marketing strategy for zones with more inconsistent rules. Performance studies have shown that the distributed computing tasks scale linearly with the number of nodes. It is observed that for some regions the number of inconsistent rules is relatively small, even though the number of consistent rules is large. The proposed algorithm is flexible, scalable and efficient for mining huge amounts of data.

The time efficiency of the algorithm may be improved by using FP-tree based data structures for candidate itemset generation. Further, the work can be extended by assigning different weights to each interestingness measure to find weighted interesting association rules.

Acknowledgments

The authors would like to thank their institute for its resources and constant inspiration. Special thanks to the authority of AMUL dairy located at Anand, Gujarat, India for providing the sales dataset for the purpose of analytics and research.

References

[1] Han J, Kamber M. Data mining concepts and techniques. San Francisco: Morgan Kaufmann Publishers; 2004.
[2] Tseng FSC, Chen PY. Parallel association rule mining by data de-clustering to support grid computing. In: Proceedings of PACIS, 89; 2005. p. 1071–84.
[3] Agrawal D, Das S, Abbadi A. Big data and cloud computing: current state and future opportunities. In: Proc 14th int conf extending database technology. ACM; 2011. p. 530–3.
[4] Zhang J, Xiaohui T, Wang H. Outlier detection from large distributed databases. World Wide Web: Internet and Web Information Systems (WWW), Springer 2013;17(4):539–68.
[5] Chang F, Dean J, Ghemawat S, Hsieh WC, Wallach DA, Burrows M, et al. Bigtable: a distributed storage system for structured data. ACM Trans Comput Syst (TOCS) 2008;26(2):1–14.
[6] Srikumar K, Bhasker B. Metamorphosis: mining maximal frequent sets in dense domains. Int J Artif Intell Tools 2005;14(3):491–506.
[7] Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases. In: Proc. Int. Conf. of ACM-SIGMOD on Management of Data; 1993. p. 207–16.
[8] Karim MR, Jho JH, Jeong BS. Mining E-shopper's purchase behavior based on maximal frequent itemsets: an E-commerce perspective. In: Proc. 3rd Int. Conf. Inf Sci Appl (ICISA, 2012), vol. 1; 2012. p. 1–6.
[9] Omiecinski ER. Alternative interest measures for mining associations in databases. IEEE Trans Knowl Data Eng 2003;15(1):57–69.
[10] Tan PN, Kumar V, Srivastava J. Selecting the right objective measure for association analysis. Inf Syst 2004;29(4):293–313.
[11] Gupta MK, Sikka G. Association rules extraction using multi-objective feature of genetic algorithm. In: Proceedings of the World Congress on Engineering and Computer Science (WCECS), San Francisco, USA, vol. 2; 2013. p. 23–5.
[12] Ghosh A, Nath B. Multi-objective rule mining using genetic algorithms. Inf Sci 2004;163:123–33.
[13] Brijs T, Vanhoof K, Wets G. Defining interestingness for association rules. Int J Inf Theor Appl 2003;10(4):370–5.
[14] Brin S, Motwani R, Ullman JD, Tsur S. Dynamic itemset counting and implication rules for market basket data. In: Proc. of the ACM SIGMOD Int. Conf. on Management of Data (ACM SIGMOD '97), USA; 1997. p. 255–64.
[15] Hahsler M. A probabilistic comparison of commonly used interest measures for association rules. 2015. http://michael.hahsler.net/research/association_rules/measures.html.
[16] Dunham MH. Data mining: introductory and advanced topics. Prentice Hall; 2003. ISBN 0-13-088892-3.
[17] Agrawal R, Shafer JC. Parallel mining of association rules. IEEE Trans Knowl Data Eng 1996;8:962–9.
[18] Gyorodi C. A comparative study of distributed algorithms in mining association rules. In: Int. symposium on system theory – XI Edition, SINTES 11, Craiova, Romania, vol. 1; 2003. p. 339–45.
[19] Cheung DW, Han J, Vincent TN, Ada WF. A fast distributed algorithm for mining association rules. In: Proc. 4th IEEE int. conf. parallel and distributed information systems; 1996. p. 31–42.
[20] Lin WY, Tseng MC. Automated support specification for efficient mining of interesting association rules. J Inf Sci CILIP 2006;32(3):238–50.
[21] Stephane L, Teytaud O, Prudhomme E. Association rule interestingness: measure and statistical validation. Studies in Computational Intelligence (SCI), vol. 43. Springer; 2007. p. 251–75.
[22] Narita K, Kitagawa H. Outlier detection for transaction databases using association rules. The 9th int. conf. on Web-age information management. IEEE Computer Society; 2008. p. 373–80.
[23] Aydın T, Guvenir HA. Modeling interestingness of streaming association rules as a benefit-maximizing classification problem. Knowledge-based systems. Elsevier; 2009. p. 85–99.
[24] Shaari F, Ahmad A, Bakar AA, Hamdan AR. Incorporating negative association rules to discover meaningful outlier from Non_Reduct computation: a medical predictive analysis. Trends in Innovative Computing, ISDA. 2012. p. 151–4.
[25] Preetha S, Radha V. Enhanced outlier detection method using association rule mining technique. Int J Comput Appl 2012;42(7):1–6.
[26] Riondato M, DeBrabant JA, Fonseca R, Upfal E. Parma: a parallel randomized algorithm for approximate association rules mining in MapReduce. In: Proc. 21th Int. Conf. Information and Knowledge Management (CIKM' 12), ACM, U. S. A; 2012. p. 85–94.
[27] Malek M, Kadima H. Searching frequent itemsets by clustering data: towards a parallel approach using MapReduce. Web Information Systems Engineering (WISE), vol. 7652. Berlin Heidelberg: Springer; 2013. p. 251–8.
[28] Wu X, Zhu X, Wu G, Ding W. Data mining with big data. IEEE Trans Knowl Data Eng 2013;26(1):97–107.
[29] Lin J, Ryaboy D. Scaling big data mining infrastructure: the twitter experience. ACM SIGKDD Explor Newsl 2013;14:6–19.
[30] Karim MR, Ahmed CF, Jeong B, Choi H. An efficient distributed programming model for mining useful patterns in big datasets. IETE Tech Rev 2013;30(1):53–63.
[31] Rajeswari AM, Sridevi M, Deisy C. Outliers detection on educational data using fuzzy association rule mining. In: Int. Conf. on Adv. in Computer, Communication and information Science (ACCIS-14). Elsevier Publications; 2014. p. 1–9.
[32] Butincu CN, Craus M. An improved version of the frequent itemset mining algorithm. In: 14th IEEE Int. Conf. Networking in Education and Research, Craiova; 2015. p. 184–9.
[33] Ban T, Eto M, Guo S, Inoue D, Nakao K, Huang R. A study on association rule mining of darknet big data. In: Proc IEEE Int Joint Conf on Neural Network (IJCNN); 2015. p. 1–7.
