U1 - Data Mining Task Primitives

Uploaded by

Chaitali Nagbhidkar

Data Mining Task Primitives

A data mining task can be specified in the form of a data mining query,
which is input to the data mining system. A data mining query is defined
in terms of data mining task primitives. These primitives allow the user to
interactively communicate with the data mining system during discovery
to direct the mining process or examine the findings from different angles
or depths. The data mining primitives specify the following:

1. Set of task-relevant data to be mined.
2. Kind of knowledge to be mined.
3. Background knowledge to be used in the discovery process.
4. Interestingness measures and thresholds for pattern evaluation.
5. Representation for visualizing the discovered patterns.

A data mining query language can be designed to incorporate these
primitives, allowing users to interact with data mining systems flexibly.
Having a data mining query language provides a foundation on which
user-friendly graphical interfaces can be built.

Designing a comprehensive data mining language is challenging because
data mining covers a wide spectrum of tasks, from data characterization
to evolution analysis. Each task has different requirements. The design of
an effective data mining query language requires a deep understanding of
the power, limitations, and underlying mechanisms of the various kinds of
data mining tasks. Such a language also facilitates a data mining system's
communication with other information systems and its integration with the
overall information processing environment.

1. The set of task-relevant data to be mined

This specifies the portions of the database or the set of data in which the
user is interested. This includes the database attributes or data
warehouse dimensions of interest (the relevant attributes or dimensions).

In a relational database, the set of task-relevant data can be collected via
a relational query involving operations like selection, projection, join, and
aggregation.

The data collection process results in a new data relation called
the initial data relation. The initial data relation can be ordered or
grouped according to the conditions specified in the query. This data
retrieval can be thought of as a subtask of the data mining task.

This initial relation may or may not correspond to a physical relation in the
database. Since virtual relations are called views in the field of databases,
the set of task-relevant data for data mining is called a minable view.
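
The idea of a minable view can be sketched with Python's built-in sqlite3 module. The schema, table names, and sample rows below are hypothetical and chosen only for illustration; the point is that selection, projection, join, and aggregation together define a virtual relation from which the initial data relation is read.

```python
import sqlite3

# Hypothetical toy schema; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, age INTEGER, income REAL);
    CREATE TABLE purchase (cust_id INTEGER, item_type TEXT, price REAL);

    INSERT INTO customer VALUES (1, 34, 52000), (2, 51, 38000), (3, 27, 61000);
    INSERT INTO purchase VALUES
        (1, 'laptop', 900), (1, 'camera', 400),
        (2, 'phone', 600),
        (3, 'tv', 1200), (3, 'cable', 15);

    -- The view is virtual: it stores the query, not a physical table.
    CREATE VIEW minable_view AS
    SELECT c.cust_id, c.age, c.income,           -- projection
           SUM(p.price) AS total_spent           -- aggregation
    FROM customer AS c
    JOIN purchase AS p ON p.cust_id = c.cust_id  -- join
    WHERE c.income >= 40000                      -- selection
    GROUP BY c.cust_id;                          -- grouping
""")

# Reading the view yields the initial data relation handed to the mining step.
initial_relation = conn.execute(
    "SELECT * FROM minable_view ORDER BY cust_id"
).fetchall()
for row in initial_relation:
    print(row)
```

Customer 2 is filtered out by the income condition, so only two tuples reach the mining step.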

2. The kind of knowledge to be mined

This specifies the data mining functions to be performed, such as
characterization, discrimination, association or correlation analysis,
classification, prediction, clustering, outlier analysis, or evolution analysis.

3. The background knowledge to be used in the discovery process

This knowledge about the domain to be mined is useful for guiding the
knowledge discovery process and evaluating the patterns found. Concept
hierarchies are a popular form of background knowledge; they allow
data to be mined at multiple levels of abstraction.

A concept hierarchy defines a sequence of mappings from low-level
concepts to higher-level, more general concepts.

o Rolling up (generalization of data): lets users view data at more
meaningful and explicit levels of abstraction, making it easier to
understand. Generalization also compresses the data, so mining it
requires fewer input/output operations.
o Drilling down (specialization of data): concept values are replaced
by lower-level concepts.

Based on different user viewpoints, there may be more than one concept
hierarchy for a given attribute or dimension.

An example of a concept hierarchy for the attribute (or dimension) age is
shown below. User beliefs regarding relationships in the data are another
form of background knowledge.
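
As a rough illustration of rolling up along a concept hierarchy, the sketch below generalizes exact ages to 10-year bands and then to general labels. The three levels, the cut-off values, and the function names are assumptions made for this example, not the hierarchy from the original notes.

```python
from collections import Counter


def to_band(age: int) -> str:
    """Level 1 -> level 2: map an exact age to a 10-year band, e.g. 34 -> '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def to_label(band: str) -> str:
    """Level 2 -> level 3: map a band to a general label (assumed cut-offs)."""
    low = int(band.split("-")[0])
    if low < 40:
        return "young"
    if low < 60:
        return "middle_aged"
    return "senior"


def roll_up(ages, level):
    """Generalize ages to the requested level and count tuples per concept."""
    if level == "age":
        values = [str(a) for a in ages]
    elif level == "band":
        values = [to_band(a) for a in ages]
    elif level == "label":
        values = [to_label(to_band(a)) for a in ages]
    else:
        raise ValueError(f"unknown level: {level}")
    return Counter(values)


ages = [23, 27, 34, 41, 45, 58, 63, 71]
by_band = roll_up(ages, "band")    # six distinct concepts
by_label = roll_up(ages, "label")  # three distinct concepts
print(by_band)
print(by_label)
```

Each roll-up step reduces the number of distinct values. Drilling down is the reverse move, from the label view back to the band view, and it needs the original data because the generalized counts alone do not record the lower-level values.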

4. The interestingness measures and thresholds for pattern evaluation

Different kinds of knowledge may have different interestingness measures.
They may be used to guide the mining process or, after discovery, to
evaluate the discovered patterns. For example, interestingness measures for
association rules include support and confidence. Rules whose support
and confidence values are below user-specified thresholds are considered
uninteresting.

o Simplicity: A factor contributing to the interestingness of a pattern
is the pattern's overall simplicity for human comprehension. For
example, the more complex the structure of a rule is, the more
difficult it is to interpret, and hence, the less interesting it is likely to
be. Objective measures of pattern simplicity can be viewed as
functions of the pattern structure, defined in terms of the pattern
size in bits or the number of attributes or operators appearing in the
pattern.
o Certainty (Confidence): Each discovered pattern should have a
measure of certainty associated with it that assesses the validity or
"trustworthiness" of the pattern. A certainty measure for association
rules of the form "A => B", where A and B are sets of items, is
confidence. Given a set of task-relevant data tuples, the confidence
of "A => B" is defined as
Confidence (A => B) = (# tuples containing both A and B) / (# tuples containing A)
o Utility (Support): The potential usefulness of a pattern is a factor
defining its interestingness. It can be estimated by a utility function,
such as support. The support of an association pattern refers to the
percentage of task-relevant data tuples (or transactions) for which
the pattern is true.
Support (A => B) = (# tuples containing both A and B) / (total # of tuples)
o Novelty: Novel patterns are those that contribute new information
or increased performance to the given pattern set. For example, a
data exception can be considered novel. Another strategy for
detecting novelty is to remove redundant patterns.
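
The support and confidence definitions above can be checked on a small example. This is a minimal sketch: the transactions, item names, and threshold values are made up for illustration, and each transaction is modeled as a Python set.

```python
def support(transactions, a, b):
    """Fraction of all transactions that contain every item of both A and B."""
    both = sum(1 for t in transactions if (a | b) <= t)
    return both / len(transactions)


def confidence(transactions, a, b):
    """Among transactions containing A, the fraction that also contain B."""
    with_a = sum(1 for t in transactions if a <= t)
    both = sum(1 for t in transactions if (a | b) <= t)
    return both / with_a if with_a else 0.0


# Hypothetical task-relevant data: five transactions.
transactions = [
    {"computer", "software"},
    {"computer", "printer"},
    {"computer", "software", "printer"},
    {"printer"},
    {"software"},
]

A, B = {"computer"}, {"software"}
s = support(transactions, A, B)      # 2 of 5 transactions contain both
c = confidence(transactions, A, B)   # 2 of the 3 transactions with A also have B

# Assumed user-specified thresholds; rules below either one are uninteresting.
MIN_SUPPORT, MIN_CONFIDENCE = 0.3, 0.6
interesting = s >= MIN_SUPPORT and c >= MIN_CONFIDENCE
print(f"support={s:.2f} confidence={c:.2f} interesting={interesting}")
```

Note that support is symmetric in A and B, while confidence is not: swapping the two sides of the rule changes the denominator.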

5. The expected representation for visualizing the discovered patterns

This refers to the form in which discovered patterns are to be displayed,
which may include rules, tables, cross tabs, charts, graphs, decision trees,
cubes, or other visual representations.

Users must be able to specify the forms of presentation to be used for
displaying the discovered patterns. Some representation forms may be
better suited than others for particular kinds of knowledge.

For example, generalized relations and their corresponding cross tabs or
pie/bar charts are good for presenting characteristic descriptions, whereas
decision trees are common for classification.

Example of Data Mining Task Primitives

Suppose, as a marketing manager of AllElectronics, you would like to
classify customers based on their buying patterns. You are especially
interested in those customers whose salary is no less than $40,000 and
who have bought more than $1,000 worth of items, each of which is
priced at no less than $100.

In particular, you are interested in the customer's age, income, the types
of items purchased, the purchase location, and where the items were
made. You would like to view the resulting classification in the form of
rules. This data mining query is expressed in DMQL as follows, where
each line of the query has been enumerated to aid in our discussion.

1. use database AllElectronics_db
2. use hierarchy location_hierarchy for T.branch, age_hierarchy for C.age
3. mine classification as promising_customers
4. in relevance to C.age, C.income, I.type, I.place_made, T.branch
5. from customer C, item I, transaction T
6. where I.item_ID = T.item_ID and C.cust_ID = T.cust_ID and C.income ≥ 40,000 and I.price ≥ 100
7. group by T.cust_ID
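
To see how the numbered query lines map onto the five primitives, the query can be held in a small data structure. This is only a sketch: the MiningQuery class and its field names are hypothetical and are not part of DMQL. Fields for primitives that do not appear in the query lines shown are left empty rather than guessed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MiningQuery:
    """Hypothetical container with one group of fields per task primitive."""
    # Primitive 1: task-relevant data
    database: str
    relations: List[str]
    attributes: List[str]
    condition: str
    group_by: str
    # Primitive 2: kind of knowledge to be mined
    task: str
    result_name: str
    # Primitive 3: background knowledge (attribute -> concept hierarchy)
    hierarchies: Dict[str, str] = field(default_factory=dict)
    # Primitive 4: interestingness thresholds (none in the lines shown)
    thresholds: Dict[str, float] = field(default_factory=dict)
    # Primitive 5: presentation form
    display_as: Optional[str] = None


query = MiningQuery(
    database="AllElectronics_db",                            # line 1
    hierarchies={"T.branch": "location_hierarchy",
                 "C.age": "age_hierarchy"},                  # line 2
    task="classification",                                   # line 3
    result_name="promising_customers",                       # line 3
    attributes=["C.age", "C.income", "I.type",
                "I.place_made", "T.branch"],                 # line 4
    relations=["customer C", "item I", "transaction T"],     # line 5
    condition=("I.item_ID = T.item_ID and C.cust_ID = T.cust_ID "
               "and C.income >= 40000 and I.price >= 100"),  # line 6
    group_by="T.cust_ID",                                    # line 7
    display_as="rules",  # from the prose description, not the numbered lines
)
print(query.task, "->", query.display_as)
```

Grouping the fields this way makes it easy to see which primitives a given query leaves unspecified.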
