0% found this document useful (0 votes)

12 views

CApriori Conviction Based Apriori Algorithm For Discovering Frequent Determinant Patterns From High Dimensional Datasets

The document discusses a new algorithm called CApriori that aims to efficiently discover frequent determinant patterns and association rules from high dimensional datasets. It adopts the conventional Apriori algorithm and uses conviction thresholds to prune frequent determinant sets and generate strong association rules. The approach also uses a compressed data structure and heuristic functions to improve the mining process.

Uploaded by

giripontha

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

12 views

CApriori Conviction Based Apriori Algorithm For Discovering Frequent Determinant Patterns From High Dimensional Datasets

Uploaded by

giripontha

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 7

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

net/publication/338935389

CApriori:Conviction based Apriori algorithm for discovering frequent

determinant patterns from high dimensional datasets

Conference Paper · April 2014

CITATIONS READS

4 11

1 author:

Prasanna Kottapalle
G. Narayanamma Institute of Technology and Science
17 PUBLICATIONS 64 CITATIONS

SEE PROFILE

Some of the authors of this publication are also working on these related projects:

CApriori View project

All content following this page was uploaded by Prasanna Kottapalle on 06 May 2023.

The user has requested enhancement of the downloaded file.

IEEE-32331

CApriori: Conviction Based Apriori Algorithm for

Discovering Frequent Determinant Patterns from
High Dimensional Datasets

K.Prasanna Dr. M.Seetha Dr. A. P. Siva Kumar

Research Scholar, Honorary Professor, JNIAS Assistant Professor,
JNIAS-JNTUH, Hyderabad Professor, GNITS, Dept. CSE, JNTUCE
Assistant Professor, AITS, HYDERABAD, AP, INDIA Anatapuramu, AP, India
RAJAMPET, AP, INDIA [email protected] [email protected]
[email protected]

Abstract— At present, due to the developments in Database Dimensional Data needs efficient data mining techniques to
Technology, large volumes of data are produced by everyday discover interesting knowledge from datasets.
operations and they have introduced the necessity of representing
the data in High Dimensional Datasets. Discovering Frequent Since its introduction, Association Rule Mining Techniques
Determinant Patterns and Association Rules from these High have become a core research topic in Data mining. Association
Dimensional Datasets has become very tedious since these Rule Mining has gained much attention in discovering
databases contain large number of different attributes. For the fascinating Frequent Patterns, affinities and their relationships
reason that, it generates extremely large number of redundant from huge amounts of business Transactional data, which are
rules which makes the algorithms inefficient and it does not fit in potentially useful for real life applications and decision
main memory.In this paper, a new Association Rule Mining making. Association Rule Mining was first introduced in [1].
approach is presented, and it efficiently discovers Frequent In the given transactional database, where each transaction is a
Determinant Patterns and Association Rules from High set of items, an Association Rule is an inference of the form
Dimensional Datasets. The proposed approach adopts the Į Æ ȕ; where Į and ȕ are sets of items. A Frequent
conventional Apriori algorithm and device anew CApriori Determinant Pattern set is a set of attributes (Į,ȕ) where
algorithm to prune the generated Frequent Determinant Sets attributes Į and ȕ are frequently occurring attributes in the
effectively. A Frequent Determinant set is selected if its value is High Dimensional Dataset.
first compared with Conviction threshold value and then
compared with Support threshold. This double comparison will Association Rule Mining process is a two step process. In
eliminate the redundancy and generate strong Association Rules. the first step, all frequent itemsets are discovered; and in the
To improve the mining process, this algorithm also makes use of second step it generates strong Association Rules from frequent
a compressed data structure f_list constructed from feature itemsets found in first step. Association rule mining is used to
attributes selected using Heuristic Fitness Function (HFF) and a find interesting correlations along with affinities in a given
Heuristic Discretization algorithm. It also makes use of Count transactional database [1], [2]. Apriori is used to discover
Array (CA) devised as One Dimensional Triple Array pair set to Association Rules from Market Basket data which involve sets
minimize main memory utilization. This comprehensive study of items [1]. In general, Apriori is used as a prominent
shows that the approach outperforms with traditional Apriori
algorithm for mining frequent itemsets for generating
and obtains more rapid computing speed and at the same time
generates Sententious Rules. Further the mining methodology is
Association Rules. In literature, there is an assortment of
ascertained to be better in generating strong Association Rules development in algorithms and techniques for finding frequent
from High Dimensional Databases. itemsets. Now, many applications and databases are in High
Dimensional Space which contains multi-valued attributes that
Keywords— Frequent Determinant Patterns, Association Rule pose a great challenge in applying knowledge mining process.
Mining, Conviction Value, Heuristic Fitness Function, One The efficiency Association Rule Mining has been concerned
Dimensional Triple Array pair set for last decade since it is a difficult problem in view of the fact
that it comprehends large number of attributes and rules, the
I. INTRODUCTION mining may have to generate or combine explosive number of
candidate patterns and rules from the databases and makes the
The development of Bioinformatics and Microarray user difficult in decision making.
Technology has produced many High Dimensional Datasets
like Gene Expression Data and Microarray Data, which are Association Rules are classified according to the number of
different from Transactional data. Microarray Data usually attributes appearing in the Condition Action Rule. Considering
contains less number of rows or samples and has large number each database attribute as a dimension, it is now interesting to
of columns or attributes or genes. This kind of Very High mine High Dimensional Association Rules. If a rule contains
large number of attributes in its derivation, it is called as High
Veltech Multitech Dr.Rangarajan Dr.Sakunthala
Engineering College, Avadi, Chennai (Sponsors)

International Conference on Science, Engineering and Management Research (ICSEMR 2014)

Authorized licensed use limited to: G Narayanamma Institute of Technology & Science. Downloaded on May 06,2023 at 10:19:14 UTC from IEEE Xplore. Restrictions apply.
IEEE-32331
Dimensional Association Rule [2]. There are several data Definition1: a Frequent Determinant pattern set (Į, ȕ) is
repositories storing data in different features. In general the frequent if both Į and ȕ in the set are also frequent and it is true
databases contain Qualitative and Quantitative attributes. in Database D, if it is having higher support(s) and
Mining Association Rules on these databases are the confidence(c) in the database.
challenging issues. There is a need to develop efficient
techniques to mine High Dimensional Data effectively for Definition 2: An Association Rule is an implication of the
different applications and decision making. Relatively, there is form ĮÆȕ, which satisfies user supplied Support s and
a stable advancement in mining High Dimensional Association Confidence c.
Rules form databases. In this paper, a new approach is Definition 3: A support(s) is defined as the percentage of
projected to discover frequent k-dimension sets and transactions in D, that contains both Į and ȕ, represented as
Association Rules efficiently from large databases.
Support (ĮÆ ȕ) = P (Į U ȕ)
Finally, the main contribution of the paper is to devise a
new Apriori based algorithm which efficiently prunes the Definition 4: Confidence (c) of the Rule ĮÆ ȕ is true in the
generated frequent k-dimension sets using conviction values Database D, if it contains the percentage of transactions
known as CApriori algorithm. This algorithm also makes use a containing Į that also contains ȕ, represented as
Count Array, a one dimensional triple array pair set to Confidence (ĮÆ ȕ) = P (ȕ | Į) =P (Į U ȕ)| P (Į)
minimize the main memory reference while accessing the
dataset. This algorithm also makes use of a compressed data The problem is to find out all Association Rules which
structure f_list, which is constructed by selecting feature satisfy user-specified minimum support and confidence
attributes using a Heuristic Fitness Function (HFF). It also uses constraints. The problem of generating Association Rules was
Heuristic discretization as a preprocessing technique to initially introduced with the well-known Apriori algorithm [3].
improve the mining task faster. In Section 2, we present basic There are several algorithms for mining Association Rules with
preliminaries associated with Association Rule Mining. In the target of several studies in obtaining Association Rules
section 3 the proposed approach is elicited. Comprehensive from transactional databases. The pre-eminently recognized
analysis is presented in section 4 and conclusion will be in Apriori is proven to improve performance on 2-Dimensional
section 5. Transactional Database. Further, several algorithms are
designed and developed similar to with Apriori to find
Association Rules from 2-D transactional databases. Since its
II. BASIC PRELIMINARIES introduction, there has been affordable work on designing
algorithms for mining Association Rules [1]. This work was
A. Problem statement
subsequently extended to find Association Rules over
The main objective is to discover Frequent Determinants multidimensional databases. Currently the researchers are
and Association Rules from a High Dimensional Database D, focusing on designing and developing techniques for High
with N attributes and M records. The amn gives us a value of the Dimensional Association Rules. The proposed work discusses
Attribute An over the record rm, as shown in Table1. the issues of effective mining of High Dimensional datasets.
The major task of this approach is to find out High
TABLE I. HIGH DIMENSIONAL DATABASE Dimensional Association Rules which satisfy the conditions of
minimum support and confidence.
Ref.ID A1 A2 A3 A4 A5 ……… An

r1 a11 a21 a31 a41 a51 ……… an1

C. Apriori Algorithm
Apriori was the first introduced algorithm to discover
r2 a12 a22 a32 a42 a52 ...……. an2 Association Rules form databases [1]. It uses level wise
r3 a13 a23 a33 a43 a53 ……… an3 iterative approach to determine all frequent patterns. According
to the user specified minimum support, the original database is
.. .. .. .. .. .. ……… .. scanned for once to identify all frequent 1-itemsets L1and
rm a1m a2m a3m a4m a5m ……… anm pruned to discover C1. The L1 is used to generate all frequent
2-itemsets L2 by applying intra-Dimensional natural join
A High Dimensional Association rule is an inference of the operation. All the subsequent frequent itemsets are discovered
form ĮÆ ȕ. Where Į and ȕ are the set of attributes {(A1, by the continuous repetition of this process, until no more
A2….Ax)} in the database D, X‫{ א‬1, 2…N}. A rule ĮÆ ȕ can frequent itemsets are to be added. When the frequent itemsets
be true in D, with support s>ms and confidence c>=mc. ms are discovered, the Association Rules are generated according
and mc are the Minimum Support and Confidence. Association to the second step. According to the Apriori property, when a
Rule is also called as ConditionÆAction rule. Condition and pattern of k length is not frequent, automatically its super
Action, called as antecedent and consequent can be implicated pattern of k+1 length is also not frequent in the database. As a
as a set of conjunctions of atomic Attributes. result, with the reduction in Candidate itemsets in every
iteration it is possible to achieve good performance. Many
B. Basic Definitions numbers of modified methods on Apriori property have been
Let I be the set of items {i1, i2, i3, i4, i5, i6….im} and D be presented by several researchers.
a Database D with set of transactions {T1, T2, T3….Tn}.

Authorized licensed use limited to: G Narayanamma Institute of Technology & Science. Downloaded on May 06,2023 at 10:19:14 UTC from IEEE Xplore. Restrictions apply.
IEEE-32331
It is necessary to understand and analyze such large amount attributes. An essential preprocessing technique used by many
data for efficient decision making. A study on mining large of the Association Rule Mining algorithms is discretization.
databases is presented in [4]. Naturally, in a Gene Expression For every algorithm it is necessary to define character of input
Microarray Databases, there could be tens or hundreds of data, or data the algorithms can deal with. In Data Mining
attributes or dimensions in the dataset, and there may be up to Literature, the databases may include different types of data
hundreds of thousands of samples, each of which is mapped to such as nominal, ordinal, discrete and continuous. The
a dimension. Analyzing these datasets offers great challenges preprocessing the data main objective is to reduce a potentially
on attribute selection. These challenges are better described in infinite number of values for these data types. Many
[5], [6], [7]. The complexity of many existing data mining Association Rule Mining algorithms are used discretization as
algorithms is exponential with respect to the number of a preprocessing technique. The most frequently used
dimensions [5]. With an increasing dimensionality, these unsupervised discretization types are Equi-depth discretization
algorithms soon become computationally intractable and and Equi-distance discretization.
therefore inapplicable in many real applications. The detailed
description about the process of discovering Frequent Discretization is a process which partitions the data into
Determinants and generating High Dimensional Association discrete intervals. In general, many discretization methods are
Rules is explained in next section. guided or controlled by an external expert, such as “k” in k-
means discretization. In this paper, a new Heuristic
Discretization algorithm, which is derived from k-means
III. FREQUENT DETERMINANT MINING discretization, is used. The Heuristic Discretization is an
In this section, the process of generating Frequent automatic unsupervised discretization method, which is
Determinant Patterns and Association Rules on High proficient to adapt the data set character and to combine
Dimensional datasets is described. Generation of Patterns and advantages of Equi-depth and Equi-distance discretization. It is
Rules on High Dimensional datasets using basic Apriori described as follows.
algorithms is a time consuming process and generates Heuristic Discretization algorithm
redundant rules, since it takes only two-Dimensional data as
input. In this section, a new approach of taking High Input: d as degree of discretization,
Dimensional Dataset as input and producing the Association
D as High Dimensional database with M records
Rules as output is elucidated.
rm: record of the database
In this, a new measure called Conviction is used along with
Apriori algorithm to prune the Frequent Determinant Pattern Output: Discretized set of records
Set according to the user supplied prior knowledge as support.
The Conviction value mainly used to reduce the infrequent x For each discretized attribute
patterns and generates strong association rules. The CApriori x Generate discretized attribute intervals S1,S2 …Sd where
uses One Dimensional Triple Array Pair Set known as Count
Array (CA) to count the occurrences of attributes in the dataset.
The objective of using CA is to optimize the main memory
usage. The detailed description of the process is explained in and
the following sub sections.

A. Preprocessing the dataset x Repeat until there is no change in consecutive iterations.

Before starting the mining process, there is a need to define
a uniform data structure for the mining process. The different x For each record rm in database D
datasets are combined into a large High Dimensional dataset x For each i=1 to d ; compute closeness between rm and Si
which is linear data structure. It is very important because the
using
database may contain combinations of nominal or quantitative
attributes. The process of combining attributes was described
in [8]. The equivalent algorithm with some necessary x Assign rm to discrete interval i, for which is has
modifications is used. By preprocessing the quality of mining minimum closeness.
results can be improved by selecting the featured attributes x Update each interval value for every attribute.
substantially.
The discretization procedure is explained as follows. If the
The basic problem with Apriori algorithm is that it is time Attribute is nominal data, then replace with Attribute suffix
consuming and it takes large space. The proposed algorithm is values. If the dataset attribute is nominal, it contains a finite
time efficient and optimizes in the main memory consumption. number of possible values. If the Attribute is quantitative data
In this algorithm, an efficient feature selection approach is used attribute, replace it by it’s their respective discretized binary
to construct compressed data structure using Heuristic Fitness data generated using Binning and Equi-depth partitioning. In
Function. Most of the datasets contain nominal and quantitative the Database D, if the attribute contains quantitative values,
attributes. If these attributes are not preprocessed, they show which are discretized to be nominal attributes with certain limit
significant effect on mining process. In this paper we used a or range. The detailed explanation of discretization is given in
basic discretization [10] for processing the quantitative discussion part of this paper. Partitioning and discretizing the

Authorized licensed use limited to: G Narayanamma Institute of Technology & Science. Downloaded on May 06,2023 at 10:19:14 UTC from IEEE Xplore. Restrictions apply.
IEEE-32331
data into intervals will renovate raw data into corresponding Conviction-Based Apriori algorithm for Discovering
binary values [10]. Frequent Determinant Pattern sets
Input: Reduced Feature Matrix D, minsup and minconf
B. Compute Fitness value for attribute selection.
Attribute selection plays an important role in predicting the Output: complete set of Frequent Determinant Patterns
mining outcomes. Because of discretization, the quantitative 1. Set Count Array [i].count = 0;
attribute will introduce plentiful sub attributes in the dataset 2. For each reference id in reduced feature matrix
and the mining task will become complex. So there is a need to
reduce number of attributes in the mining task. In this paper, a 3. For (k=2;Fk-1z0; k++)
Heuristics based Fitness Function (HFF) is used to evaluate the 4. for all reference Rid D do Ct = subset( f-list).
quality of each added attribute [9]. Consider each attribute in 5. For each pattern pair(A,B) in the f-list compute
the dataset with a sequence of data items; apply Fitness Conviction Value (A, B);
function to compute quality of each attribute using the formula. 6. for all patterns in f-list, which are above conviction
value, cCt do update Count Array[i].count++
7. if Count Array [i].count tminsup then discover it is
pattern (A,B) is a Frequent Determinants

Where N = number of Discretized attribute bins with Equi- The algorithm reduces passes over the database where each
Depth partitioning, pass consists of two phases. This algorithm scans the database
to for each frequent pattern set to discover the Conviction (A,
SUMi= the sum of all random sequences in bin I, B) using the formula given below.
C= the bin capacity. For each pattern A, B ‫א‬f-list, and A B, the Conviction¬
value computed as.
The attributes whose Fitness value is less than the threshold
limit will be removed from the dataset because they are not P ( A) P (B)
useful for mining process. After removing such attributes, a Conviction Value (CV) =
compressed data structure f_list will be constructed. P ( AB)
If its conviction values than the limit then it will generate
C. Conviction-based Apriori Algorithm all Frequent Determinant pattern found in the (k-1)th pass, and
In this section, the process to find all Frequent Determinant it is used to generate the candidate dimension set Ct. It ensures
patterns is described. A Frequent Determinant set is selected that Ct is a superset of the set of all frequent sets . After this,
using Conviction value. Conviction Values are computed for the procedure the discovered pattern is compared with user
each frequent pair and used to identify infrequent patterns from supplied support value to determine which of the pattern sets
the databases. A Frequent Determinant set is selected once it Frequent Determinants pattern sets. This process will terminate
satisfies the conviction first principle. A few changes are made if no more patterns are added to Count Array.
to the basic Apriori algorithm to select Frequent Determinant
set. The new Conviction Based Apriori algorithm has the The Conviction value is used to identify infrequent itemsets
process. in the database. The set of patterns whose values are less than
Conviction Value are discovered as infrequent patterns sets.
x Scan the database to obtain frequent pattern sets. Beginning with frequent pattern sets, all such Frequent
Determinants sets are generated in the procedure given is as
x If every pattern set first compared with conviction follows. Let Lk be the set of frequent pattern set. Ct be the set
values and then compared with support value. If both
are satisfied then it produces a valid Frequent of candidate sets. If an item set (A, B) is frequent cCt. This
Determinant Pattern Set. algorithm make use of One Dimensional Triple Array pair set
as Count Array to measure the frequency of occurrence. If its
x For this Frequent Determinants, strong association rules count value is less than Conviction Values that frequent pattern
were generated using confidence and conviction values. is treated as infrequent pattern and removed from set. Prune the
At this stage, the prepared and preprocessed dataset is used pattern attributes further in a frequent pattern set at each
as described above. This dataset contains partitioned iteration step which is not satisfying the user specified interest
quantitative attributes, and created combinations of intervals of measure ‘support’.
the quantitative attributes. This combination, along with those
values of categorical attributes, collectively forms the frequent D. One Dimensional Triple Array Pair Set
dimension sets. f-list contains all possible Frequent pattern In General, the Association Rule mining algorithms
generated in the dataset. The description of algorithm is as maintain different item count frequency values throughout a
follows: scan over database. For instance, it is essential to have
adequate main memory to hoard each pattern count that the
Let f-list represents the set of frequent pattern set and Ct the
number of times a pattern pair sets occurs in the transaction
set of Candidate set for the frequent pattern set.
database. It is hard to update a 1 to a count set where the

Authorized licensed use limited to: G Narayanamma Institute of Technology & Science. Downloaded on May 06,2023 at 10:19:14 UTC from IEEE Xplore. Restrictions apply.
IEEE-32331
counting sequences are stored in different memory locations thoroughly preprocessed and replace integer and real data
and difficult in loading the page to main memory. In such values with integer and binary values. Using this datasets the
cases, these algorithms will be slow in finding that pattern pair interesting Association Rules with varying support and
count in main memory as it takes extra overhead on processing confidence values are generated. Table 3 shows the evaluation
time and increases the time to discover frequent pattern set. So of Heuristic Discretization algorithm on Iris data.
it is difficult to count anything that doesn’t ¿t in main memory.
So each algorithm has a limit on how many items it can deal TABLE III. : DISCRETIZATION VALUES OF IRIS DATASET
with. When it comes to high dimensional datasets, it is difficult
to maintain all in one memory. So a new one dimensional triple Discretized Attribute1 Attribute2 Attribute3 Attribute4
array set is used to count all the pattern occurrences in the Intervals Sepallength Sepalwidth petallength Petalwidth
given database. Interval 1 4.7748 3.1789 1.4194 0.1948
Interval 2 6.8585 3.0862 5.7859 2.1327
To optimize main memory, a pattern pair (i,j) occurrence in Interval 3 6.1613 2.8547 4.7484 1.5757
the dataset should be counted in one place. If it is in the order Interval 4 5.2823 3.7037 1.5173 0.3028
a pattern pair such that i < j, and use only one entry a[i,j] in Interval 5 5.5432 2.5786 3.863 1.1696
Fitness
two dimensional array a. This strategy makes half of the array Value
0.26 0.23 0.28 0.15
as useless. Count Array (CA) is a more efficient way to store In the Table 3, for each attribute and for each interval, we
pattern pairs in memory. A count array is defined as a one- calculated the average Sd and it is shown in Table3. For each
dimensional triple array set which will store a count as a[k] for attribute the calculated fitness values are also shown in table.
the pair (i,j), with 1 i < j n, where The threshold value for fitness function is 0.2 and the attributes
selected for consideration for mining process is Sepallength,
Sepalwidth, Petalwidth.

The graphs shown in Figure.1 and Figure 2 depict the
The pattern pair sets are stored in lexicographic order. execution times of discovering Frequent Determinants sets on
two datasets with varying combinations of parameter values
E. Generating and Validating Association Rules
In the second phase of Association Rule mining, the basic
rule generation algorithm is used to generate Association Rules
from these Frequent Determinant sets. The strong rules which
have maximum support and confidence are validated. Once the
frequent attributes have been found, it is straight forward to
generate strong Association Rules which satisfying both
minimum support and confidence. This process is going to
generate all Association Rules which are above the confidence
value. The redundant rules are eliminated by selecting feature
attributes using Fitness function.
Fig. 1. Performance on synthetic dataset with varying support values
IV. RESULTS AND DICUSSION
In this section, the performance of the proposed approach is
evaluated based on support and elapsed time with respect to
three factors namely the number of Frequent Determinant set,
number of strong rules generated, and Dimensionality (D). The
elapsed time is measured as the time duration (in mille
seconds) to generate all frequent pattern sets. The datasets used
for evaluation as shown the Table 2.

TABLE II. DATASET DESCRIPTIONS

Attributes selected
Dataset name Attributes Samples using Fitness Fig. 2. Performance on SAGE Dataset
function ( 0.2)
T100-AT10-
100 1000 38
Figure.1and Figure 2 shows the performance on datasets
I100-P50-AP5 with varying support values compared with traditional Apriori.
Iris 5 150 3 It is observed from Figure2 that as the support values are
SAGE 12533 215 8412 increasing the elapsed time to generate Frequent Determinant
Pattern set values are decreasing. From the Figure.3, shows that
These datasets as a collection of quantitative, categorical as the support values decrease the elapsed time for generating
and numerical data values. The dataset SAGE and Iris Frequent Determinant Pattern set values are increasing. And
obtained from UCI Machine learning repository and

Authorized licensed use limited to: G Narayanamma Institute of Technology & Science. Downloaded on May 06,2023 at 10:19:14 UTC from IEEE Xplore. Restrictions apply.
IEEE-32331
the increase of elapsed time with the decrease of the support V. CONCLUSION
value is noticeable. In the recent years, Association Rule mining has gained
The numbers of Frequent Determinant Pattern sets considerable interest in the research community. In this paper,
generated for the datasets with varying support values shown in a framework approach for mining Association Rules on High
Table 4 and it is observed that the number of generated Dimensional data using new C-Apriori is discussed. The
Frequent Determinant Pattern sets are decreased as the support quantitative data values are better dealt with partitioning
values are increased. As we can see, in all the experiments the method and different attributes are combined into a master
runtime will be more when the support values are minimum. table where effective Frequent Determinant Pattern set can be
As the support values are increasing the elapsed time is easily generated. From the experiments, it is ascertained that
gradually reducing. Because, the reason is the generations of the strong rules are generated with the selected feature
Frequent Determinant Pattern sets are more when support attributes using Heuristic Fitness function and efficient
values are low and it reduces when support values are frequents are selected using Conviction values. With the above
increasing. results, our approach for mining High Dimensional Association
Rules produces better results compared to traditional method.
TABLE IV. FREQUENT DETERMINANT PATTERN SET GENERATION
REFERENCES
Frequent Determinant Pattern Sets [1] Rakesh Agrawal, Tomasz Imielinski and Arun Swami,” Mining
Support
T100-AT10-I100-P50-AP5 SAGE Association Rules between sets of items in large databases”, in
proceedings of the ACM SIGMOD Conference on Management of Data,
0.2 131 9945 pp 207-216, Washington, D.C., May 1993.
0.4 43 5689 [2] Bodon, F., “A Fast Apriori Implementation”, FIMI’03, November 2003.
0.6 16 2673 [3] Rakesh Agrawal, Tomasz Imielinski and Arun Swami,” Database
Mining- A performance perspective”, IEEE transactions on knowledge
0.8 2 809 and data engineering, vol 5 1993.
1.0 0 89 [4] M J Zaki and C J Hsiao,” CHARM- an Efficient algorithm for closed
itemset mining, in the proceedings of SDM 2002, p 457-473., 2002
[5] Aggrawal, C.C., Hinnwburg, A., and Keim, D.A. “On the surprising
The number of interesting Association Rules generated behavior of distance metrics in High Dimensional space”. IBM Research
with varying confidence values on SAGE dataset are shown in report, RC 21739, 2000.
Table 5. As the confidence values increase the number of rules [6] Beyer K., Goldstein, J.Ramakrishnan, R., and Shaft, U. “When is nearest
and elapsed time are decrease. neighbor meaningful?” In Proceedings of the 7th ICDT, Jerusalem,
Israel. 1999.
[7] Beyer K and Ramakrishnan. “Bottom-up computation of sparse and
TABLE V. RESULTS ON SAGE DATASET iceberg cubes”. In Proceeding of the ACM-SIGMOD 1999 International
Number of Elapsed time Conference on Management of Data, Philadelphia, PA, 359–370, 1999.
Confidence [8] K.Prasanna, M.Seetha “Mining High Dimensional Association Rules by
rules generated in ms
0.2 37520 621 Generating Large Frequent K-Dimension Set”, in proceedings of IEEE
International Conference on Data Science and Engineering, Kochin ,
0.4 31110 490 India, 2012.
0.6 18300 141 [9] Masri Ayob, Yang Xiao Fei, “ Local Search Heuristics for One
0.8 8230 78 Dimensional bin packing problem”, in the proceedings of International
1.0 5860 57 Journal of Soft Computing, 8(2):108-112,2013.
[10] Rama kirshna Srikant, Rakesh agrawal ,” Mining quantitative
Association ures in large relational tables”, in the proceedings of ACM
SIGMOD , USA 1996.

Authorized licensed use limited to: G Narayanamma Institute of Technology & Science. Downloaded on May 06,2023 at 10:19:14 UTC from IEEE Xplore. Restrictions apply.
View publication stats

Apriori Principle Example Question and Answer
100% (11)
Apriori Principle Example Question and Answer
11 pages
Data Mining and Warehouse MCQS With Answer Good
74% (72)
Data Mining and Warehouse MCQS With Answer Good
30 pages
Data Mining For Supermarket Sale Analysis Using Association Rule
No ratings yet
Data Mining For Supermarket Sale Analysis Using Association Rule
5 pages
Performance Evaluation of Sequential and Parallel Mining of Association Rules Using Apriori Algorithms
No ratings yet
Performance Evaluation of Sequential and Parallel Mining of Association Rules Using Apriori Algorithms
6 pages
V1i6 Ijertv1is6251 PDF
No ratings yet
V1i6 Ijertv1is6251 PDF
6 pages
Evaluating The Performance of Apriori and Predictive Apriori Algorithm To Find New Association Rules Based On The Statistical Measures of Datasets
No ratings yet
Evaluating The Performance of Apriori and Predictive Apriori Algorithm To Find New Association Rules Based On The Statistical Measures of Datasets
6 pages
Feature Extraction and Reduction by using ModifiedApriori algorithm (1)
No ratings yet
Feature Extraction and Reduction by using ModifiedApriori algorithm (1)
9 pages
CBAR: An Efficient Method For Mining Association Rules: Yuh-Jiuan Tsay, Jiunn-Yann Chiang
No ratings yet
CBAR: An Efficient Method For Mining Association Rules: Yuh-Jiuan Tsay, Jiunn-Yann Chiang
7 pages
Efficient Transaction Reduction in Actionable Pattern Mining For High Voluminous Datasets Based On Bitmap and Class Labels
No ratings yet
Efficient Transaction Reduction in Actionable Pattern Mining For High Voluminous Datasets Based On Bitmap and Class Labels
8 pages
SDFP-growth Algorithm as a Novelty of Association
No ratings yet
SDFP-growth Algorithm as a Novelty of Association
12 pages
Mining of Frequent Item With BSW Chunking: Pratik S. Chopade Prof. Priyanka More
No ratings yet
Mining of Frequent Item With BSW Chunking: Pratik S. Chopade Prof. Priyanka More
4 pages
cc4 PDF
No ratings yet
cc4 PDF
6 pages
Hot Keys
No ratings yet
Hot Keys
4 pages
I Jcs It 2014050535
No ratings yet
I Jcs It 2014050535
5 pages
FP Growth Algorithm Implementation
No ratings yet
FP Growth Algorithm Implementation
6 pages
PRP Icp 1
No ratings yet
PRP Icp 1
5 pages
"Implementation On An Approach For Mining of Datasets Using APRIORI Hybrid
No ratings yet
"Implementation On An Approach For Mining of Datasets Using APRIORI Hybrid
5 pages
Image Content With Double Hashing Techniques: ISSN No. 2278-3091
No ratings yet
Image Content With Double Hashing Techniques: ISSN No. 2278-3091
4 pages
Mining High Utility Dataset
No ratings yet
Mining High Utility Dataset
8 pages
426-Article Text-1037-1-10-20210421
No ratings yet
426-Article Text-1037-1-10-20210421
9 pages
A Survey of Association Rule Hiding Algorithms
No ratings yet
A Survey of Association Rule Hiding Algorithms
4 pages
(IJCST-V4I2P44) :dr. K.Kavitha
No ratings yet
(IJCST-V4I2P44) :dr. K.Kavitha
7 pages
SDFP-Growth Algorithm As A Novelty of Association Rule Mining Optimization
No ratings yet
SDFP-Growth Algorithm As A Novelty of Association Rule Mining Optimization
12 pages
PRP Icp 2
No ratings yet
PRP Icp 2
11 pages
Applsci 11 08971 v2
No ratings yet
Applsci 11 08971 v2
15 pages
Literature Review On Mining High Utility Itemset From Transactional Database
No ratings yet
Literature Review On Mining High Utility Itemset From Transactional Database
3 pages
Acta1034_submission_6613
No ratings yet
Acta1034_submission_6613
17 pages
2013 Selection of The Best Classifier From Different Datasets Using WEKA PDF
No ratings yet
2013 Selection of The Best Classifier From Different Datasets Using WEKA PDF
8 pages
Online Mining For Association Rules and Collective Anomalies in Data Streams
No ratings yet
Online Mining For Association Rules and Collective Anomalies in Data Streams
10 pages
Aqrat Al 2018
No ratings yet
Aqrat Al 2018
32 pages
8.research Plan-N.M MISHRA
No ratings yet
8.research Plan-N.M MISHRA
3 pages
Estimating Frequent Products in Shopping Cart Using Data Mining
No ratings yet
Estimating Frequent Products in Shopping Cart Using Data Mining
5 pages
13238-Article Text-23626-1-10-20221220
No ratings yet
13238-Article Text-23626-1-10-20221220
7 pages
Dwdmsem 6 QB
No ratings yet
Dwdmsem 6 QB
13 pages
1 s2.0 S2665917422000551 Main
No ratings yet
1 s2.0 S2665917422000551 Main
9 pages
Analysis and Implementation of FP & Q-FP Tree With Minimum CPU Utilization in Association Rule Mining
No ratings yet
Analysis and Implementation of FP & Q-FP Tree With Minimum CPU Utilization in Association Rule Mining
6 pages
Implementation of An Efficient Algorithm: 2. Related Works
No ratings yet
Implementation of An Efficient Algorithm: 2. Related Works
5 pages
Implementation of Apriori Algorithm Using Weka: Ajay Kumar Shrivastava R. N. Panda
No ratings yet
Implementation of Apriori Algorithm Using Weka: Ajay Kumar Shrivastava R. N. Panda
4 pages
Data Mining For Biological Data Analysis: Glover Eric Leo Cimi Smith Calvin
No ratings yet
Data Mining For Biological Data Analysis: Glover Eric Leo Cimi Smith Calvin
8 pages
The Impact of Threshold Parameters in Transactional Data Analysis
No ratings yet
The Impact of Threshold Parameters in Transactional Data Analysis
6 pages
10.1.1.449.1341
No ratings yet
10.1.1.449.1341
3 pages
Parallel Data Mining of Association Rules
No ratings yet
Parallel Data Mining of Association Rules
10 pages
privacy preservation
No ratings yet
privacy preservation
9 pages
Comparing The Performance of Frequent Pattern Mini
No ratings yet
Comparing The Performance of Frequent Pattern Mini
5 pages
Navigo: Harshal Kamble, Mayuri Waghmare, Rajeshree Sonwane, Sonal Shende, Sonali Tiwari. Prof.N.R.Hatwar
No ratings yet
Navigo: Harshal Kamble, Mayuri Waghmare, Rajeshree Sonwane, Sonal Shende, Sonali Tiwari. Prof.N.R.Hatwar
3 pages
Research Modern Rules
No ratings yet
Research Modern Rules
4 pages
2017, Prajapati - Intersting association rule mining whith consistent and inconsistent rule detection
No ratings yet
2017, Prajapati - Intersting association rule mining whith consistent and inconsistent rule detection
12 pages
Data Mining Techniques - Javatpoint
No ratings yet
Data Mining Techniques - Javatpoint
9 pages
KajalReview
No ratings yet
KajalReview
5 pages
An Overview of Ontology Based Approach To Organize The
No ratings yet
An Overview of Ontology Based Approach To Organize The
6 pages
A Survey On Decision Tree Algorithms of Classification in Data Mining
No ratings yet
A Survey On Decision Tree Algorithms of Classification in Data Mining
5 pages
Database Reverse Engineering Based On Association Rule Mining
No ratings yet
Database Reverse Engineering Based On Association Rule Mining
6 pages
FP-tree and COFI Based Approach For Mining of Multiple Level Association Rules in Large Databases
No ratings yet
FP-tree and COFI Based Approach For Mining of Multiple Level Association Rules in Large Databases
7 pages
Accurate Decision Trees For Mining High-Speed Data Streams: Jo Ao Gama Ricardo Rocha Pedro Medas
No ratings yet
Accurate Decision Trees For Mining High-Speed Data Streams: Jo Ao Gama Ricardo Rocha Pedro Medas
6 pages
CSDA-S-24-00391
No ratings yet
CSDA-S-24-00391
45 pages
A Partition Enhanced Mining Algorithm For Distributed - 2015 - Egyptian Informat
No ratings yet
A Partition Enhanced Mining Algorithm For Distributed - 2015 - Egyptian Informat
11 pages
Applying Data Mining in Prediction and Classification of Urban Traffic
No ratings yet
Applying Data Mining in Prediction and Classification of Urban Traffic
5 pages
Pec-It602b
No ratings yet
Pec-It602b
7 pages
Data Mining Using Clouds: An Experimental Implementation of Apriori Over Mapreduce
No ratings yet
Data Mining Using Clouds: An Experimental Implementation of Apriori Over Mapreduce
8 pages
An Approach on Efficient Clustering Technique of High Dimensional Records
No ratings yet
An Approach on Efficient Clustering Technique of High Dimensional Records
6 pages
Association Rule Mining On Distributed Data: Pallavi Dubey
No ratings yet
Association Rule Mining On Distributed Data: Pallavi Dubey
6 pages
Introduction to Robotics
From Everand
Introduction to Robotics
Swarnalata Verma
No ratings yet
Approximate Association Rule Mining: Jyothsna R. Nayak and Diane J. Cook
No ratings yet
Approximate Association Rule Mining: Jyothsna R. Nayak and Diane J. Cook
6 pages
Association Rule
No ratings yet
Association Rule
22 pages
Association Rule Generation For Student Performance Analysis Using Apriori Algorithm
No ratings yet
Association Rule Generation For Student Performance Analysis Using Apriori Algorithm
5 pages
Data Mining Methods
No ratings yet
Data Mining Methods
18 pages
15-Fp-Tree Problem-10-09-2024
No ratings yet
15-Fp-Tree Problem-10-09-2024
2 pages
What Is A Frequent Itemset?
No ratings yet
What Is A Frequent Itemset?
7 pages
Association and Recommendation System
No ratings yet
Association and Recommendation System
24 pages
DWDM Unit-3
100% (1)
DWDM Unit-3
63 pages
DM Unit 3
No ratings yet
DM Unit 3
22 pages
Data Analyticskit601 Unit 4 Notes
No ratings yet
Data Analyticskit601 Unit 4 Notes
178 pages
Frequent Patterns
No ratings yet
Frequent Patterns
80 pages
BDA Lab Manual
No ratings yet
BDA Lab Manual
62 pages
AprioriTID Algorithm Improved From Apriori Algorithm
No ratings yet
AprioriTID Algorithm Improved From Apriori Algorithm
5 pages
MS (Data Science) Fall 2020 Semester
No ratings yet
MS (Data Science) Fall 2020 Semester
36 pages
Association Rules & Sequential Patterns
No ratings yet
Association Rules & Sequential Patterns
65 pages
Unit-8 (2)
No ratings yet
Unit-8 (2)
146 pages
FP Growth PPT Shabnam
No ratings yet
FP Growth PPT Shabnam
19 pages
Frequent Item-Set Mining Methods: Prepared By-Mr - Nilesh Magar
No ratings yet
Frequent Item-Set Mining Methods: Prepared By-Mr - Nilesh Magar
31 pages
Data Mining in SIEM
No ratings yet
Data Mining in SIEM
5 pages
"Fast Algorithms For Mining Association Rules" by Rakesh Agarwal Ramakrishnan Srikant
No ratings yet
"Fast Algorithms For Mining Association Rules" by Rakesh Agarwal Ramakrishnan Srikant
5 pages
1 Ijetst PDF
No ratings yet
1 Ijetst PDF
9 pages
8CT-DWM Lab Manual-19-20
No ratings yet
8CT-DWM Lab Manual-19-20
31 pages
Qustion Bank DMDW
No ratings yet
Qustion Bank DMDW
8 pages
Assignment 2 Slot8 TTS3208 Summer
No ratings yet
Assignment 2 Slot8 TTS3208 Summer
11 pages
DWDM LAB MANUAL-converted1
No ratings yet
DWDM LAB MANUAL-converted1
18 pages
A New Method For Mining Maximal Frequent Itemsets Based On Graph Theory
No ratings yet
A New Method For Mining Maximal Frequent Itemsets Based On Graph Theory
6 pages
Association Rules and Sequential Patterns Sequential Patterns
No ratings yet
Association Rules and Sequential Patterns Sequential Patterns
83 pages
DM Unit-II
No ratings yet
DM Unit-II
80 pages

CApriori Conviction Based Apriori Algorithm For Discovering Frequent Determinant Patterns From High Dimensional Datasets

Uploaded by

CApriori Conviction Based Apriori Algorithm For Discovering Frequent Determinant Patterns From High Dimensional Datasets

Uploaded by

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

CApriori:Conviction based Apriori algorithm for discovering frequent

Conference Paper · April 2014

CApriori View project

The user has requested enhancement of the downloaded file.

CApriori: Conviction Based Apriori Algorithm for

K.Prasanna Dr. M.Seetha Dr. A. P. Siva Kumar

International Conference on Science, Engineering and Management Research (ICSEMR 2014)

r1 a11 a21 a31 a41 a51 ……… an1

A. Preprocessing the dataset x Repeat until there is no change in consecutive iterations.

TABLE II. DATASET DESCRIPTIONS

You might also like