2017, Prajapati - Interesting association rule mining with consistent and inconsistent rule detection
Future Computing and Informatics Journal 2 (2017) 19–30
http://www.journals.elsevier.com/future-computing-and-informatics-journal/
Abstract
There is an increasing demand for mining interesting patterns from big data. Analyzing such a huge amount of data is a computationally complex task when using traditional methods. The purpose of this paper is twofold. First, it presents a novel approach to identify consistent and inconsistent association rules from sales data located in a distributed environment. Second, it overcomes the main memory bottleneck and computing time overhead of a single computing system by distributing the computations over a multi-node cluster. The proposed method initially extracts frequent itemsets for each zone using existing distributed frequent pattern mining algorithms. The paper also compares the time efficiency of a Mapreduce based frequent pattern mining algorithm with the Count Distribution Algorithm (CDA) and Fast Distributed Mining (FDM) algorithms. The set of association rules generated from the frequent itemsets is too large to analyze easily. Thus, a Mapreduce based consistent and inconsistent rule detection (MR-CIRD) algorithm is proposed to detect the consistent and inconsistent rules from big data and provide useful and actionable knowledge to the domain experts. These pruned interesting rules also give useful knowledge for a better marketing strategy. The extracted consistent and inconsistent rules are evaluated and compared based on different interestingness measures, presented together with experimental results that lead to the final conclusions.
© 2017 Faculty of Computers and Information Technology, Future University in Egypt. Production and hosting by Elsevier B.V. This is an open
access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Keywords: Outlier association rules; Interestingness measures; Distributed frequent pattern mining; Mapreduce; Preprocessing
http://dx.doi.org/10.1016/j.fcij.2017.04.003
2314-7288/© 2017 Faculty of Computers and Information Technology, Future University in Egypt. Production and hosting by Elsevier B.V. This is an open access
article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
D.J. Prajapati et al. / Future Computing and Informatics Journal 2 (2017) 19–30
computing environments [4]. To handle big data, a distributed system consists of a pool of autonomous compute nodes that appears as a single workstation [5]. Interestingness measures play an important role in association rule mining. These measures are used to find interesting patterns based on user need. The large number of association rules generated by a frequent pattern mining algorithm may not be useful for the organization as a whole. Therefore, there is a need to filter the interesting rules from the uninteresting ones for business intelligence.

In brief, the contribution of this paper is summarized in five steps: (i) the big sales dataset is transformed into a zone wise transactional dataset using Hadoop Mapreduce; (ii) null transactions and infrequent itemsets at each zone are removed from the transactional dataset; (iii) the existing distributed frequent itemset mining algorithms CDA, FDM and DFPM are applied on each zone to generate the complete set of frequent itemsets, and the time efficiency of these algorithms is compared; (iv) association rules are generated for each zone; (v) finally, the proposed MR-CIRD algorithm is applied to find consistent and inconsistent rules zone wise using various interestingness measures. Both these algorithms are tested on the big sales dataset of AMUL Dairy.

The remainder of this paper is organized as follows. Section 2 presents preliminaries for interesting association rule mining with consistent and inconsistent rule detection in a distributed environment. Related work is given in Section 3. Section 4 shows the proposed methodology. In Section 5, the performance of the proposed method is evaluated on the sales dataset of AMUL dairy. Finally, the conclusions and future scope are drawn in Section 6.

2. Preliminaries

In this section, the complete set of definitions, terminologies and assumptions used in this paper is presented.

2.1. Itemset

Let I = {I1, I2, …, In} be a set of distinct items in the dataset D. An itemset X is a set of items which is a subset of I. An itemset X with k distinct items is referred to as a k-itemset [6].

2.2. Support

The support is the percentage of transactions in the database D that contain both itemsets X and Y. The support of an association rule X → Y is given by Refs. [1,7],

\[ \mathrm{Support}(X \rightarrow Y) = \mathrm{Support}(X \cup Y) = P(X \cup Y) \]

2.3. Association rule

relationship between those items. An association rule is represented by X → Y, where X and Y are distinct itemsets. The association rule exposes the relationship between the itemset X and the itemset Y [7].

2.4. Consistent rule

The set of association rules containing itemsets which are locally as well as globally frequent in a large dataset are the consistent rules.

2.5. Inconsistent rule

The set of association rules containing itemsets which are frequent locally but not frequent globally, or vice versa, are the inconsistent rules. Inconsistent rules are non-conforming patterns in the dataset; i.e., the sales pattern does not exhibit normal behavior.

2.6. NULL transaction

A null transaction is a transaction that does not contain any itemsets or single item [5,8]. The presence of null transactions is one of the critical efficiency problems in mining strong association rules.

2.7. Interestingness measures

An interestingness measure is an essential aspect of extracting interesting patterns from the database. For this experiment, the following interestingness measures are used.

2.7.1. Confidence

The confidence is the percentage of transactions in the database D with itemset X that also contain the itemset Y. The confidence is calculated using the conditional probability, which is further expressed in terms of itemset support. The equation for the confidence is given by Refs. [1,7],

\[ \mathrm{Confidence}(X \rightarrow Y) = P(Y \mid X) = \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)} \]

Here, Support(X ∪ Y) is the number of transactions containing both the itemsets X and Y, and Support(X) is the number of transactions containing the itemset X.

2.7.2. All-confidence

All-confidence is defined as [9],

\[ \mathrm{All\text{-}confidence}(X \rightarrow Y) = \frac{\mathrm{Support}(X \cup Y)}{\max(\mathrm{Support}(X), \mathrm{Support}(Y))} \]

All-confidence satisfies the downward closure property. Hence, it is effectively used for interesting association rule mining.
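The support, confidence and all-confidence definitions of Section 2 can be checked on a toy example. The following is a minimal sketch, assuming support is taken as the fraction of transactions containing an itemset; the function names and the five sample transactions are hypothetical and are not taken from the AMUL dataset.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, x, y):
    """Confidence(X -> Y) = Support(X u Y) / Support(X)."""
    joint = support(transactions, frozenset(x) | frozenset(y))
    return joint / support(transactions, x)


def all_confidence(transactions, x, y):
    """All-confidence(X -> Y) = Support(X u Y) / max(Support(X), Support(Y))."""
    joint = support(transactions, frozenset(x) | frozenset(y))
    return joint / max(support(transactions, x), support(transactions, y))


# Hypothetical toy transactions (illustration only).
transactions = [
    frozenset({"milk", "butter"}),
    frozenset({"milk", "butter", "ghee"}),
    frozenset({"milk"}),
    frozenset({"milk", "ghee"}),
    frozenset({"butter"}),
]

conf_bm = confidence(transactions, {"butter"}, {"milk"})
allconf_bm = all_confidence(transactions, {"butter"}, {"milk"})
print(round(conf_bm, 3), round(allconf_bm, 3))
```

For the rule butter → milk, confidence divides the joint support by Support(butter) only, while all-confidence divides by the larger of the two single-itemset supports, so all-confidence can never exceed confidence.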
collective support and confidence measure for interesting association rule mining. The author suggested parallel processing of the proposed approach to reduce the execution time of the system as future scope. The PARMA algorithm is proposed [26,27] to provide great improvements to the runtime of finding association rules. PARMA achieves this by utilizing probabilistic results, so it only approximates the answers. This solution uses clustering to create groups of transactions and chooses candidate sets from the representative itemsets in the clusters.

Wu et al. [28] proposed performance analysis factors such as heterogeneous and autonomous sources. The authors also proposed a complex theorem which characterizes the features of both the big data revolution and the big data processing model. The authors analyze the challenging issues in the data mining model and in big data analysis. Lin and Ryaboy [29] proposed a method for analyzing Twitter data. In that paper two major topics are discussed. First, schemas are insufficient to provide an understanding of petabytes or terabytes of data. Second, a major challenge for analyzing the data is the heterogeneity of the various components. The objective of that paper is to share the authors' experience of analyzing Twitter data in a production environment. Karim et al. [30] proposed a distributed system for mining business related transactional datasets using an improved Mapreduce framework. This model is highly scalable in terms of increasing database size. In that paper, the authors implemented the “Associated-Correlated-Independent” algorithm, which effectively mines the complete set of customers' purchase rules. Rajeswari et al. [31] proposed a modified Fuzzy Apriori Rare Itemsets Mining (FARIM) algorithm to detect outliers (weak students) based on heap space usage. The heap space used by the FARIM and modified FARIM algorithms on an educational dataset is tested, and it is found that the modified FARIM algorithm uses less heap space than the FARIM algorithm. This approach extracts as outliers not only the students who failed but also those who passed with borderline marks. Here, a fuzzy based apriori algorithm is used to generate less frequent itemsets. This will help the teachers in giving extra coaching to the weak students. Butincu and Craus [32] present an improved version of the frequent itemset mining algorithm as well as its generalized version. The authors introduced optimized formulas for generating valid candidates by reducing the number of invalid candidates. By using the computations of previous steps by other processed nodes, it avoids generating redundant candidates. The authors also suggested running the same algorithm in a parallel or distributed system.

For the marketing strategy, it is more important to analyze inconsistent patterns when data is distributed geographically. However, none of the above mentioned work finds regional inconsistent patterns from a large dataset. Therefore, transforming the sales data into transactions and then eliminating null transactions from further consideration is the initial part of the proposed methodology. After removing null transactions, distributed frequent mining algorithms are applied for each zone to generate useful patterns, and their time efficiency is also compared. Then, the proposed MR-CIRD algorithm is applied to find zone wise consistent and inconsistent rules. The objective of this work is to remove the drawbacks of the relational database and to use the existing Mapreduce framework to generate the complete set of regional consistent and inconsistent rules with smaller candidate set generation, less message passing and improved execution time of the system.

4. Proposed methodology

The proposed methodology is applied in two phases. In the first phase, association rules along with interestingness measures and zone number are derived. In the second phase, the association rules are categorized into consistent and inconsistent rules, zone wise.

4.1. Phase-I: zone wise association rule generation

In the first phase, the dataset of each zone is given as input to the data preprocessing unit as shown in Fig. 1. Due to huge
dataset size, the pre-processing is done in a distributed environment. The original sales dataset of each zone is transformed into a zone wise transactional dataset using the Hadoop Mapreduce framework.

4.1.1. Data pre-processing using Mapreduce

Data pre-processing is used to transform the relational sales dataset into a transactional dataset. For the pre-processing of big data, Mapreduce is used in two phases: Mapper setup and Reducer setup.

4.1.1.1. Mapper setup. The first step of any Mapreduce job is the map step. In this step the Hadoop framework splits the input database Ds into n smaller chunks Dn. These n chunks are given to the Hadoop Distributed File System (HDFS). The size of a database split depends on the configuration of the Mapreduce framework and the way in which the data is distributed on the file systems of the machines in the given cluster. The purpose of the map function is to combine zone code (zone), distributor code (dist), sales date (date) and retailer code (ret). The input sales database is given to the mapper line by line, then each line is split into zone, dist, date, ret, and pr. The output <key, value> pair consists of <zone + dist + date + ret, pr>. The pseudo code of the map task is shown in Fig. 2.

Fig. 2. Mapper function.

4.1.1.2. Reducer setup. The reducer function gets its input as <key, value> pairs from the output of the previous map function. The pairs are ordered, and there is a guarantee that if a reduce task receives a key it will also receive all values with the same key. The ordering and moving of the intermediate <key, value> pairs is done automatically by the framework and is called the shuffle step. The key is split into two parts, zone and dist + date + ret. After combining all the values of each key, the reduce task creates the transactional database zone wise to separate the transactions of each zone. The pseudo code of the reduce task is shown in Fig. 3.

After preprocessing using Mapreduce, the original dairy sales dataset is transformed into the zone wise transactional dataset, and then null transactions are removed from it to generate the actual transactions.

4.1.2. Screening of null transactions

Null transactions are transactions which do not contain any itemset or which contain only infrequent itemsets. The preprocessing unit generates transactions containing a large number of null transactions, which affects the performance of the system. For example, some of the distributors may not sell cow milk, butter and protein powder together, even if these three items form a frequent 3-itemset. Since a big sales dataset typically has null transactions, it is necessary to remove them. For example, suppose that the dairy dataset contains 100,000 transactions of which 10% are null transactions. Any frequent pattern mining algorithm scans all the 100,000 transactions, while the proposed approach considers 90,000 valid transactions after removing 10,000 null transactions. So, null transactions are removed from the transactional dataset to improve the performance of the system, and thus the actual transactions are generated.

4.1.3. Distributed frequent pattern mining algorithm

The actual transactional dataset is given to distributed frequent pattern mining algorithms to generate frequent k-itemsets. The CDA and FDM algorithms are data parallelism algorithms. In the CDA algorithm, the dataset is divided into n partitions, and each partition is given to a separate node. Each node counts the candidates and then broadcasts its counts to the remaining nodes. Each node then determines the global counts. The global counts are used to determine the large itemsets and to generate the candidates for the next iteration. In the FDM algorithm, the candidate set is generated as in the apriori algorithm. To reduce the number of candidates at each iteration, local and global frequent itemsets are used, which reduces the number of messages exchanged between nodes. Once the candidate sets are generated, local reduction and global reduction techniques are applied at each site to eliminate redundant candidate sets. The main drawback of the CDA and FDM algorithms is that both generate a large candidate set, pass more messages and have a higher execution time while mining big data. These drawbacks can be reduced by a Mapreduce based frequent pattern mining algorithm.

In the Distributed Frequent Pattern Mining (DFPM) algorithm, once the actual transactional dataset is stored in HDFS, the entire dataset is split into smaller segments and each segment is transferred to the data nodes. The map function is executed on each data segment and produces <key, value> pairs for each record of the database. The Mapreduce framework groups the <key, value> pairs having the same items and invokes the reducer function by passing the list of values for candidate itemsets. For each database scan, the map function generates local candidate itemsets, and the reduce function obtains the global count by adding these local counts. For the overall computation, multiple iterations of Mapreduce functions are necessary. Each Mapreduce iteration produces frequent itemsets. The iteration continues until no further frequent itemsets exist. The reducer function sums up all the values produced by the mapper and generates a count for the candidate item. The main advantage of this approach is that it doesn't exchange data between nodes; it only exchanges the count values. The DFPM algorithm uses the notation Ck for the set of candidate k-itemsets and Lk for the set of frequent k-itemsets, as shown in Fig. 4. For each zone, the
transactional data is given as input to the mapper line by line. Each line is split into items, and the output <key, value> pair consists of the item and the value 1. This is the local frequency of the item. The reduce task starts with the itemsets

In the proposed approach, all the rules are generated without checking the minimum confidence threshold. Finally, association rules containing (Rule, Interesting Measure, Zone) are generated for each zone.
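The Phase-I steps (regrouping sales lines into zone wise transactions, screening null transactions, then level-wise frequent itemset counting) can be sketched on a single machine as follows. This is only an illustrative sketch, not the authors' Hadoop code from Figs. 2 to 4: the comma-separated line format, the function names, the sample records, and the reading of "null transaction" as "a transaction with no frequent item" are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def to_transactions(sales_lines):
    """Map: emit <zone+dist+date+ret, pr>. Reduce: group values per key, per zone."""
    grouped = defaultdict(set)
    for line in sales_lines:
        zone, dist, date, ret, pr = line.split(",")  # assumed CSV layout
        grouped[(zone, dist, date, ret)].add(pr)
    by_zone = defaultdict(list)
    for (zone, _dist, _date, _ret), products in grouped.items():
        by_zone[zone].append(frozenset(products))
    return by_zone


def drop_null_transactions(transactions, min_count):
    """Keep only transactions holding at least one frequent item (assumed rule)."""
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[item] += 1
    frequent = {item for item, c in counts.items() if c >= min_count}
    return [t for t in transactions if t & frequent]


def frequent_itemsets(transactions, min_count):
    """One counting round per k: map emits <candidate, 1>, reduce sums the 1s."""
    result = {}
    k = 1
    candidates = {frozenset([item]) for t in transactions for item in t}
    while candidates:
        counts = defaultdict(int)
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_count}  # L_k
        result.update(level)
        k += 1
        # C_{k+1}: unions of two frequent k-itemsets that have exactly k+1 items
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
    return result


# Hypothetical sample records: zone, dist, date, ret, pr
lines = [
    "1,D1,2016-01-01,R1,milk", "1,D1,2016-01-01,R1,butter",
    "1,D1,2016-01-02,R2,milk",
    "1,D2,2016-01-02,R3,milk", "1,D2,2016-01-02,R3,butter",
    "1,D2,2016-01-03,R4,paneer",
    "2,D9,2016-01-01,R7,ghee",
]
zone1 = to_transactions(lines)["1"]
clean = drop_null_transactions(zone1, min_count=2)
freq = frequent_itemsets(clean, min_count=2)
print(len(zone1), len(clean), freq[frozenset({"milk", "butter"})])
```

In a real cluster only the per-candidate counts would cross node boundaries, which is the property the DFPM description emphasizes; here the loops simply stand in for the map and reduce rounds.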
4.2. Phase-II: consistent and inconsistent rule detection

In the second phase, the consistent and inconsistent association rules of each zone are generated based on different interestingness measures (IM). For this experiment, confidence, all-confidence, cosine, interestingness of a rule, lift and conviction are used as interestingness measures. A rule is said to be consistent if its interestingness measure in a zone is close to the global value of the IM; otherwise the rule is said to be inconsistent. The framework for interesting association rule mining with inconsistent rule detection in a distributed environment is shown in Fig. 5.

The consistent and inconsistent association rules for each zone are calculated using the Mapreduce based consistent and inconsistent rule detection algorithm, which is given in Fig. 6. The proposed algorithm uses two stages of mapper as well as reducer. The association rules generated for each zone in phase-I are given as input to Mapper_stage-1, line by line. In Mapper_stage-1, each zonal rule is classified into three groups based on the interestingness measure (IM): IM ≤ 30%, 30% < IM ≤ 60% and IM > 60%. The input of Mapper_stage-1 is a set of association rules containing (AR, Zone, IM), and the output is three MS1 files containing <AR, Zone + IM> as <key, value> pairs. For each MS1 file, the Reducer_stage-1 function combines the interestingness measures of all the zones having the same association rule, which are stored in an RS1 file. The input of Reducer_stage-1 is <AR, Zone + IM> as a <key, value> pair and the generated output is <AR, IM_final>. Here, the function NOT gives the number of transactions for that zone. The output of Reducer_stage-1 is given as input to Mapper_stage-2, which generates the output <key, value> pair for each RS1 file, where the key is the association rule and the value is 1. Here, the value indicates the local frequency of the rule. Reducer_stage-2 combines the output of Mapper_stage-2 and generates the total frequency of the rule. The rules having a total frequency of more than one are considered inconsistent rules and are stored in the Ry file. Similarly, the rules having a total frequency of one are considered consistent rules and are stored in the Rx file.

5. Experimental setup & results

For the experiments, a cluster of four desktop machines, each with an i5 processor and 4 GB DDR-3 RAM, is used. The Ubuntu 12.04 LTS operating system is installed on all four computers. A JVM is usually not part of Ubuntu 12.04; hence, a JVM is also installed on all four computers. A multi-node cluster is configured on three computers and a single-node cluster is configured on a single computer using Apache Hadoop packages. The preprocessing algorithm, the distributed frequent pattern mining algorithm and phase-II of the proposed method are tested on both the multi-node and the single-node cluster.

For this experiment, the sales database of AMUL dairy with more than 1500 different dairy products is used. The database, with a total size of 5 GB, is divided into six zones based on the area of the distributors. The distributors with zone codes 1 to 6 have the zone names Delhi, Chennai, Kolkata, Mumbai, Ahmedabad and Guwahati, respectively. The sales of the various dairy products follow a concept hierarchy. First, the product is sent to the distributor of a specific zone, who in turn distributes it to the local retailer, and finally the retailer sells it to the customer. A part of the sample sales dataset of AMUL dairy containing zone code, distributor code, sales date, retailer code and actual product code is shown in Table 1.

5.1. Pre-processing of big sales data

The AMUL sales dataset, distributed across six different zones, is given as input to the preprocessing unit and data is grouped
Table 4
Zone wise execution time of the DFPM algorithm.

Zone code  Zone name   No. of transactions  Frequent k-itemset  Execution time (in seconds)  No. of association rules
1          Delhi       735125               13                  63459.380                    6679
2          Chennai     313061               15                  72247.179                    7033
3          Kolkata     1114936              12                  54876.813                    6357
4          Mumbai      750368               16                  79236.741                    8168
5          Ahmedabad   917108               17                  87769.736                    8532
6          Guwahati    196586               15                  74567.241                    7031

Table 5
Zone wise execution time of the MR-CIRD algorithm (in seconds).

Zone code  Zone name   Single node cluster  Two node cluster  Three node cluster
1          Delhi       882.541              678.142           518.231
2          Chennai     719.834              514.812           398.712
3          Kolkata     813.882              648.145           446.012
4          Mumbai      1021.869             876.211           601.213
5          Ahmedabad   1335.491             1003.210          767.912
6          Guwahati    752.566              519.761           389.412

Table 6
Zone wise consistent (Rx) and inconsistent (Ry) association rules using interestingness measures.

Interestingness    Zone-1       Zone-2       Zone-3       Zone-4       Zone-5       Zone-6
measure (IM)       Rx    Ry     Rx    Ry     Rx    Ry     Rx    Ry     Rx    Ry     Rx    Ry
Confidence         5811  868    6010  1023   5481  876    7241  927    7653  879    5925  1106
All-confidence     5950  729    6113  920    5515  842    7267  901    7635  897    6019  1012
Cosine             5850  829    6016  1017   5586  771    7129  927    7430  1102   6030  1001
IR                 5784  895    6072  961    5575  782    7172  996    7563  969    6008  1023
Lift/Interest      5840  839    6114  919    5565  792    7210  958    7783  749    6018  1013
Conviction         5870  809    6090  943    5601  756    7180  988    7740  792    6101  930
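The Rx and Ry labels counted in Table 6 come from the two-stage MR-CIRD flow of Section 4.2. The following single-machine sketch, under stated assumptions, mirrors that flow: rules are binned by IM range, grouped per bin, and a rule that lands in exactly one bin is labeled consistent. The function names and sample rules are hypothetical, the IM_final computation is omitted because its formula is not given in this excerpt, and the sketch labels rules globally rather than reproducing the per-zone counts of the table.

```python
from collections import defaultdict


def im_bin(im_percent):
    """Stage-1 grouping: IM <= 30%, 30% < IM <= 60%, IM > 60%."""
    if im_percent <= 30:
        return "low"
    if im_percent <= 60:
        return "mid"
    return "high"


def mr_cird(zonal_rules):
    """zonal_rules: iterable of (rule, zone, im_percent) tuples."""
    # Mapper_stage-1: route each zonal rule to the file (bin) of its IM range.
    ms1 = defaultdict(list)
    for rule, zone, im in zonal_rules:
        ms1[im_bin(im)].append((rule, (zone, im)))

    # Reducer_stage-1: inside each bin, gather the zones that share a rule.
    rs1 = {}
    for bin_name, pairs in ms1.items():
        grouped = defaultdict(list)
        for rule, zone_im in pairs:
            grouped[rule].append(zone_im)
        rs1[bin_name] = grouped

    # Mapper_stage-2 emits <rule, 1> per bin file; Reducer_stage-2 sums them.
    total_frequency = defaultdict(int)
    for grouped in rs1.values():
        for rule in grouped:
            total_frequency[rule] += 1

    consistent = sorted(r for r, f in total_frequency.items() if f == 1)   # Rx
    inconsistent = sorted(r for r, f in total_frequency.items() if f > 1)  # Ry
    return consistent, inconsistent


# Hypothetical zonal rules with confidence-style IM values in percent.
rules = [
    ("milk -> butter", 1, 72.0),
    ("milk -> butter", 2, 68.5),
    ("milk -> ghee", 1, 25.0),
    ("milk -> ghee", 2, 64.0),
]
rx, ry = mr_cird(rules)
print(rx, ry)
```

Here "milk -> butter" stays in the same IM range in both zones and is labeled consistent, while "milk -> ghee" falls into two different ranges and is labeled inconsistent.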
geographically. This will help the organization to improve the marketing strategy for the zones where inconsistent rules are more numerous. Performance studies have shown that the distributed computing tasks scale linearly with the number of nodes. It is observed that for some regions the number of inconsistent rules is relatively small even though the number of consistent rules is large. The proposed algorithm is more flexible, scalable and efficient for mining huge amounts of data.

The time efficiency of the algorithm may be improved by using FP-tree based data structures for candidate itemset generation. Further, the work can be extended by considering different weights for each interestingness measure to find weighted interesting association rules.

Acknowledgments

The authors would like to thank their institute for its resources and constant inspiration. Special thanks to the authority of AMUL dairy located at Anand, Gujarat, India for providing the sales dataset for the purpose of analytics and research.

References

[1] Han J, Kamber M. Data mining concepts and techniques. San Francisco: Morgan Kaufmann Publishers; 2004.
[2] Tseng FSC, Chen PY. Parallel association rule mining by data de-clustering to support grid computing. In: Proceedings of PACIS, 89; 2005. p. 1071–84.
[3] Agrawal D, Das S, Abbadi A. Big data and cloud computing: current state and future opportunities. In: Proc 14th int conf extending database technology. ACM; 2011. p. 530–3.
[4] Zhang J, Xiaohui T, Wang H. Outlier detection from large distributed databases. World Wide Web: Internet and Web Information Systems (WWW), Springer 2013;17(4):539–68.
[5] Chang F, Dean J, Ghemawat S, Hsieh WC, Wallach DA, Burrows M, et al. Bigtable: a distributed storage system for structured data. ACM Trans Comput Syst (TOCS) 2008;26(2):1–14.
[6] Srikumar K, Bhasker B. Metamorphosis: mining maximal frequent sets in dense domains. Int J Artif Intell Tools 2005;14(3):491–506.
[7] Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases. In: Proc. Int. Conf. of ACM-SIGMOD on Management of Data; 1993. p. 207–16.
[8] Karim MR, Jho JH, Jeong BS. Mining E-shopper's purchase behavior based on maximal frequent itemsets: an E-commerce perspective. In: Proc. 3rd Int. Conf. Inf Sci Appl (ICISA, 2012), vol. 1; 2012. p. 1–6.
[9] Omiecinski ER. Alternative interest measures for mining associations in databases. IEEE Trans Knowl Data Eng 2003;15(1):57–69.
[10] Tan PN, Kumar V, Srivastava J. Selecting the right objective measure for association analysis. Inf Syst 2004;29(4):293–313.
[11] Gupta MK, Sikka G. Association rules extraction using multi-objective feature of genetic algorithm. In: Proceedings of the World Congress on Engineering and computer Science (WCECS), San Francisco, USA, vol. 2; 2013. p. 23–5.
[12] Ghosh A, Nath B. Multi-objective rule mining using genetic algorithms. Inf Sci 2004;163:123–33.
[13] Brijs T, Vanhoof K, Wets G. Defining interestingness for association rules. Int J Inf Theor Appl 2003;10(4):370–5.
[14] Brin S, Motwani R, Ullman JD, Tsur S. Dynamic itemset counting and implication rules for market basket data. In: Proc. of the ACM SIGMOD Int. Conf. on Management of Data (ACM SIGMOD '97), USA; 1997. p. 255–64.
[15] Hahsler MA. Probabilistic comparison of commonly used interest measures for association rules. 2015. http://michael.hahsler.net/research/association_rules/measures.html.
[16] Dunham MH. Data mining: introductory and advanced topics. Prentice Hall; 2003. ISBN 0-13-088892-3.
[17] Agrawal R, Shafer JC. Parallel mining of association rules. IEEE Trans. Knowl Data Eng 1996;8:962–9.
[18] Gyorodi C. A comparative study of distributed algorithms in mining association rules. In: Int. symposium on system theory – XI Edition, SINTES 11, Craiova, Romania, vol. 1; 2003. p. 339–45.
[19] Cheung DW, Han J, Vincent TN, Ada WF. A fast distributed algorithm for mining association rules. In: Proc. 4th IEEE int. conf. parallel and distributed information systems; 1996. p. 31–42.
[20] Lin WY, Tseng MC. Automated support specification for efficient mining of interesting association rules. J Inf Sci CILIP 2006;32(3):238–50.
[21] Stephane L, Teytaud O, Prudhomme E. Association rule interestingness: measure and statistical validation. Studies in Computational Intelligence (SCI), vol. 43. Springer; 2007. p. 251–75.
[22] Narita K, Kitagawa H. Outlier detection for transaction databases using association rules. The 9th int. conf. on Web-age information management. IEEE Computer Society; 2008. p. 373–80.
[23] Aydın T, Guvenir HA. Modeling interestingness of streaming association rules as a benefit-maximizing classification problem. Knowledge-based systems. Elsevier; 2009. p. 85–99.
[24] Shaari F, Ahmad A, Bakar AA, Hamdan AR. Incorporating negative association rules to discover meaningful outlier from Non_Reduct computation: a medical predictive analysis. Trends in Innovative Computing, ISDA. 2012. p. 151–4.
[25] Preetha S, Radha V. Enhanced outlier detection method using association rule mining technique. Int J Comput Appl 2012;42(7):1–6.
[26] Riondato M, DeBrabant JA, Fonseca R, Upfal E. Parma: a parallel randomized algorithm for approximate association rules mining in Mapreduce. In: Proc. 21th Int. Conf. Information and Knowledge Management (CIKM' 12), ACM, U. S. A; 2012. p. 85–94.
[27] Malek M, Kadima H. Searching frequent itemsets by clustering data: towards a parallel approach using Mapreduce. Web Information Systems Engineering (WISE), vol. 7652. Berlin Heidelberg: Springer; 2013. p. 251–8.
[28] Wu X, Zhu X, Wu G, Ding W. Data mining with big data. IEEE Trans Knowl Data Eng 2013;26(1):97–107.
[29] Lin J, Ryaboy D. Scaling big data mining infrastructure: the twitter experience. ACM SIGKDD Explor Newsl 2013;14:6–19.
[30] Karim MR, Ahmed CF, Jeong B, Choi H. An efficient distributed programming model for mining useful patterns in big datasets. IETE Tech Rev 2013;30(1):53–63.
[31] Rajeswari AM, Sridevi M, Deisy C. Outliers detection on educational data using fuzzy association rule mining. In: Int. Conf. on Adv. in Computer, Communication and information Science (ACCIS-14). Elsevier Publications; 2014. p. 1–9.
[32] Butincu CN, Craus M. An improved version of the frequent itemset mining algorithm. In: 14th IEEE Int. Conf. Networking in Education and Research, Craiova; 2015. p. 184–9.
[33] Ban T, Eto M, Guo S, Inoue D, Nakao K, Huang R. A study on association rule mining of darknet big data. In: Proc IEEE Int Joint Conf on Neural Network (IJCNN); 2015. p. 1–7.