05 DM BI Concept Description

Concept description involves summarizing complex datasets to highlight essential characteristics, patterns, and trends for easier interpretation and decision-making. It includes data generalization techniques such as data cube-based aggregation and attribute-oriented induction, focusing on both characterization and comparison of data collections. The document discusses the limitations of current OLAP systems and presents attribute-oriented induction as an alternative method for effective data analysis.

Uploaded by

batch0406sem

Data Mining and Business Intelligence

Concept Description
(Continuation of Data Cube & OLAP)

Module 3
Created/Adopted/Modified for
Data Mining and Business Intelligence – MCA II Semester
Vidya Vikas Institute of Engineering & Technology
Mysore
2023-24
GPD
Concept Description
 Concept description refers to the process of representing complex or
large datasets in a more concise and understandable manner, while
preserving their meaningful aspects.
 The goal of concept description is to distill the essential
characteristics, patterns, and trends from the data, making it easier
for humans to interpret and make decisions based on the
summarized information.
 The focus is on providing a high-level overview of the data's key
features, rather than presenting all the detailed data points.
 This is particularly important when dealing with large datasets
that might be overwhelming to analyze in their raw form.
Concept Description
 Data Generalization summarizes data by replacing relatively low-
level values (e.g., numeric values for an attribute age) with higher-
level concepts (e.g., young, middle-aged, and senior), or by reducing
the number of dimensions to summarize data in concept space
involving fewer dimensions (e.g., removing birth date and telephone
number when summarizing the behavior of a group of students).
 Allowing data sets to be generalized at multiple levels of
abstraction facilitates users in examining the general behavior of
the data.
 Concept description is a form of data generalization.
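As a rough illustration of the two forms of generalization described above, the Python sketch below replaces a numeric age with a higher-level concept and drops the birth date and telephone number before summarizing. The records, field names, and age cut-offs are assumptions made up for this example, not part of the original slides.

```python
from collections import Counter

# Hypothetical student records; every name and value here is illustrative only.
students = [
    {"birth_date": "2001-04-02", "phone": "555-0101", "age": 22, "major": "physics"},
    {"birth_date": "1999-11-20", "phone": "555-0102", "age": 24, "major": "physics"},
    {"birth_date": "1988-01-15", "phone": "555-0103", "age": 35, "major": "physics"},
    {"birth_date": "1962-07-30", "phone": "555-0104", "age": 61, "major": "history"},
]


def generalize_age(age: int) -> str:
    """Map a numeric age to a higher-level concept (cut-offs are assumptions)."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle-aged"
    return "senior"


def generalize(record: dict) -> tuple:
    """Drop birth_date and phone (fewer dimensions) and lift age to a concept."""
    return (generalize_age(record["age"]), record["major"])


# Summarize the data in the smaller concept space.
summary = Counter(generalize(r) for r in students)
for concept, count in summary.items():
    print(concept, count)
```

Changing the cut-offs in generalize_age, or keeping more attributes in generalize, gives a view of the same data at a different level of abstraction.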
Concept Description
 Concept description generates descriptions for data characterization
and comparison.
 It is sometimes called class description when the concept to be
described refers to a class of objects.
 Data characterization provides a concise and clear summarization
of the given data collection.
 Concept comparison or class comparison (also known as
discrimination) provides descriptions comparing two or more data
collections.
Concept Description
 We have studied data cube (or OLAP) approaches to concept
description using multidimensional, multilevel data generalization in
data warehouses. But the question is:
 “Is data cube technology sufficient to accomplish all kinds of
concept description tasks for large data sets?”
 There are limitations...
Concept Description
 “Is data cube technology sufficient to accomplish all kinds of concept
description tasks for large data sets?”
 1. Complex data types and aggregation: many current OLAP systems
confine dimensions to non-numeric data and measures to numeric data.
 In reality, the database can include attributes of various data types,
including numeric, non-numeric, spatial, text, or image, which
ideally should be included in the concept description.
Concept Description
 “Is data cube technology sufficient to accomplish all kinds of concept
description tasks for large data sets?”
 2. The selection of dimensions and the application of OLAP
operations (e.g., drill-down, roll-up, slicing, and dicing) are primarily
directed and controlled by users.
 This means that users need a good understanding of the role of each
dimension.
 There is a need for more automated approaches that assist users in
selecting dimensions and determining the appropriate level of data
generalization for meaningful summarization.
 So, we will study an alternative method for concept description.
Attribute-Oriented Induction
 Attribute-Oriented Induction is an alternative method for concept
description, which works for complex data types and relies on a
data-driven generalization process.
 The data cube approach is based on materialized views of the data,
which typically have been precomputed in a data warehouse.
 In general, it performs offline aggregation before an OLAP or data
mining query is submitted for processing.
 On the other hand, the attribute-oriented induction approach is
basically a query-oriented, generalization-based, online data
analysis technique.
Attribute-Oriented Induction – the Idea
 First collect the task-relevant data using a database query and then
perform generalization based on the examination of the number of
each attribute’s distinct values in the relevant data set.
 The generalization is performed by either attribute removal or
attribute generalization.
 Aggregation is performed by merging identical generalized tuples
and accumulating their respective counts.
 This reduces the size of the generalized data set.
 The resulting generalized relation can be mapped into different
forms (e.g., charts or rules) for presentation to the user.
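The idea above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the student table, its attribute names, and the list comprehension standing in for the database query are all hypothetical.

```python
from collections import Counter

# Hypothetical student table; all names and values are made up for illustration.
students = [
    {"name": "Ann", "status": "graduate", "major": "physics"},
    {"name": "Bob", "status": "graduate", "major": "physics"},
    {"name": "Cho", "status": "undergraduate", "major": "history"},
    {"name": "Dev", "status": "graduate", "major": "history"},
]

# Step 1: collect the task-relevant data (stands in for the database query).
relevant = [s for s in students if s["status"] == "graduate"]

# Step 2: examine the number of distinct values of each attribute.
distinct = {attr: len({s[attr] for s in relevant}) for attr in ("name", "major")}

# Step 3: 'name' has one distinct value per tuple and no hierarchy, so it is
# dropped here; identical generalized tuples are merged and their counts
# accumulated.
merged = Counter((s["major"],) for s in relevant)

print(distinct)
print(dict(merged))
```

The merged relation has fewer tuples than the relevant data set, and each tuple carries a count that can later be shown as a chart or turned into a rule.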
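Attribute-Oriented Induction – Example
 Example: Suppose that a user wants to describe the general
characteristics of graduate students in the BigUniversity database.
 [The DMQL query and the resulting initial working relation were
shown as images on the original slide and are not reproduced here.]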
Attribute-Oriented Induction
 “Now that the data are ready for attribute-oriented induction, how is
attribute-oriented induction performed?”
 The essential operation of attribute-oriented induction is data
generalization, which can be performed in either of two ways on
the initial working relation:
 attribute removal and
 attribute generalization.
Attribute-Oriented Induction
 Attribute removal is based on the following rule:
 If there is a large set of distinct values for an attribute of the initial
working relation, but either (case 1) there is no generalization
operator on the attribute (e.g., there is no concept hierarchy
defined for the attribute), or (case 2) its higher-level concepts are
expressed in terms of other attributes, then the attribute should be
removed from the working relation.
 name, phone#: Since there are a large number of distinct values for name
and phone# and there is no generalization operator defined on them, these
attributes are removed. (Case 1)
 street (if any) will also be removed, since its higher-level concepts are
expressed in terms of other attributes (city, state, etc.). (Case 2)
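The removal rule can be written as a small predicate. The sketch below is an assumption-laden illustration: the threshold that decides what counts as "a large set of distinct values", the distinct counts, and the flags per attribute are all hypothetical.

```python
def should_remove(distinct_count, threshold, has_generalization_operator,
                  covered_by_other_attributes):
    """Apply the attribute removal rule to one attribute.

    An attribute is removed only if it has many distinct values AND either
    (case 1) no generalization operator exists for it, or (case 2) its
    higher-level concepts are already expressed by other attributes.
    """
    if distinct_count <= threshold:
        return False  # few distinct values: keep the attribute as it is
    return (not has_generalization_operator) or covered_by_other_attributes


THRESHOLD = 5  # hypothetical cut-off for "a large set of distinct values"

# attribute -> (distinct values, has operator?, covered by other attributes?)
attributes = {
    "name": (200, False, False),    # case 1
    "phone#": (200, False, False),  # case 1
    "street": (150, True, True),    # case 2: city, state, etc. cover it
    "major": (20, True, False),     # has a concept hierarchy: generalize instead
    "gender": (2, False, False),    # few distinct values: keep
}

decisions = {name: should_remove(n, THRESHOLD, op, covered)
             for name, (n, op, covered) in attributes.items()}
print(decisions)
```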
Attribute-Oriented Induction
 Attribute generalization is based on the following rule:
 If there is a large set of distinct values for an attribute in the initial
working relation, and there exists a set of generalization operators
on the attribute, then a generalization operator should be
selected and applied to the attribute.
 This rule is based on the following reasoning.
 Use of a generalization operator to generalize an attribute value
within a tuple, or rule, in the working relation will make the rule
cover more of the original data tuples, thus generalizing the
concept it represents.
Attribute-Oriented Induction
 Attribute generalization
 major: Suppose that a concept hierarchy has been defined that allows the
attribute major to be generalized to the values {arts&sciences, engineering,
business}.
 birth place: This attribute has a large number of distinct values; therefore,
we would like to generalize it based on the concept hierarchy “city <
province or state < country.”
 birth date: Generalized to age, and age in turn to an age range.
 gpa: Can be generalized based on the concept hierarchy that groups values
for grade point average into numeric intervals like {3.75–4.0, 3.5–3.75, . . . },
which in turn are grouped into descriptive values such as {“excellent”, “very
good”, . . . }.
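To make the birth place case concrete, here is a minimal Python sketch of climbing a concept hierarchy until the number of distinct values falls to a threshold. The hierarchy entries, the sample values, and the thresholds are assumptions chosen for illustration.

```python
# Hypothetical concept hierarchy "city < province or state < country",
# stored as child -> parent links.
PARENT = {
    "Vancouver": "British Columbia",
    "Victoria": "British Columbia",
    "Seattle": "Washington",
    "British Columbia": "Canada",
    "Washington": "USA",
}


def generalize_until(values, threshold):
    """Lift every value one hierarchy level at a time until the attribute
    has at most `threshold` distinct values or no value can be lifted."""
    current = list(values)
    while len(set(current)) > threshold:
        lifted = [PARENT.get(v, v) for v in current]
        if lifted == current:  # top of the hierarchy reached
            break
        current = lifted
    return current


birth_places = ["Vancouver", "Victoria", "Seattle"]
by_province = generalize_until(birth_places, threshold=2)
by_country = generalize_until(birth_places, threshold=1)
print(by_province)
print(by_country)
```

With a threshold of 2 the values stop at the province or state level; with a threshold of 1 they climb to the country level and stop there, because the hierarchy has no higher concept to merge Canada and USA.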
Class Comparison
 In many applications, users may not be interested in having a single
class (or concept) described or characterized, but prefer to mine a
description that compares or distinguishes one class (or concept)
from other comparable classes (or concepts).
 Class discrimination or comparison (hereafter referred to as class
comparison) mines descriptions that distinguish a target class
from its contrasting classes.
 The target and contrasting classes must be comparable, that is, they
must share similar dimensions and attributes.
 For example, the three classes person, address, and item are not
comparable.
 However, sales in the last three years are comparable classes, and so
are, for example, computer science students versus physics students.
Class Comparison – General Procedure
 1. Data collection: The set of relevant data in the database is
collected by query processing and is partitioned respectively into a
target class and one or a set of contrasting classes.
 2. Dimension relevance analysis: If there are many dimensions, then
dimension relevance analysis should be performed on these classes
to select only the highly relevant dimensions for further analysis.
Correlation or entropy-based measures can be used for this step.
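One entropy-based measure that could be used in step 2 is information gain. The Python sketch below is an illustration under assumptions: the rows, the attribute names, and the choice of information gain itself are hypothetical and not prescribed by the slides.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, class_attr="cls"):
    """Reduction in class entropy obtained by splitting rows on `attribute`."""
    labels = [r[class_attr] for r in rows]
    groups = defaultdict(list)
    for r in rows:
        groups[r[attribute]].append(r[class_attr])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical tuples from a target class and a contrasting class.
rows = [
    {"cls": "target", "major": "cs", "gender": "F"},
    {"cls": "target", "major": "cs", "gender": "M"},
    {"cls": "contrast", "major": "physics", "gender": "F"},
    {"cls": "contrast", "major": "physics", "gender": "M"},
]

gain_major = information_gain(rows, "major")
gain_gender = information_gain(rows, "gender")
print(gain_major, gain_gender)
```

In this toy data, major separates the two classes completely while gender tells us nothing about class membership, so only major would be kept as a highly relevant dimension.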
Class Comparison – General Procedure
 3. Synchronous generalization: Generalization is performed on the target class
to the level controlled by a user- or expert-specified dimension threshold,
which results in a prime target class relation. The concepts in the contrasting
class(es) are generalized to the same level as those in the prime target class
relation, forming the prime contrasting class(es) relation.
 4. Presentation of the derived comparison: The resulting class comparison
description can be visualized in the form of tables, graphs, and rules. This
presentation usually includes a “contrasting” measure such as count%
(percentage count) that reflects the comparison between the target and
contrasting classes. The user can adjust the comparison description by
applying drill-down, roll-up, and other OLAP operations to the target and
contrasting classes, as desired.
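Steps 3 and 4 can be sketched as follows. Both classes are generalized with the same function, so they end up at the same level, and count% is then computed per class. The ages, the class names in the comments, and the five-year interval width are assumptions for this example.

```python
from collections import Counter


def generalize_age(age):
    """Generalize an age to a five-year range (interval width is an assumption)."""
    low = (age // 5) * 5
    return f"{low}-{low + 4}"


def count_percent(ages):
    """Merge identical generalized values and express counts as percentages."""
    counts = Counter(generalize_age(a) for a in ages)
    total = sum(counts.values())
    return {k: 100.0 * v / total for k, v in counts.items()}


target = [21, 22, 23, 27]    # hypothetical target class, e.g. graduate students
contrast = [18, 19, 19, 21]  # hypothetical contrasting class, e.g. undergraduates

# Synchronous generalization: the same generalization level for both classes.
target_pct = count_percent(target)
contrast_pct = count_percent(contrast)

# Presentation: one row per generalized value with the count% of each class.
for age_range in sorted(set(target_pct) | set(contrast_pct)):
    print(age_range,
          target_pct.get(age_range, 0.0),
          contrast_pct.get(age_range, 0.0))
```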
Summary – Concept Description
 Data generalization is a process that abstracts a large set of task-
relevant data in a database from a relatively low conceptual level to
higher conceptual levels.
 Data generalization approaches include data cube-based data
aggregation and attribute-oriented induction.
 Concept description is the most basic form of descriptive data
mining.
 It describes a given set of task-relevant data in a concise and
summarized manner, presenting interesting general properties of
the data.
Summary – Concept Description
 Concept (or class) description consists of characterization and
comparison (or discrimination).
Concept Characterization
 Summarizes and describes a data collection, called the target class.
Concept Comparison (or discrimination)
 Summarizes and distinguishes one data collection, called the
target class, from other data collection(s), collectively called the
contrasting class(es).
Summary – Concept Description
 Concept characterization can be implemented using
 data cube (OLAP-based) approaches and
 the attribute-oriented induction approach.
 Concept comparison can be performed using the
 attribute-oriented induction or
 data cube approaches in a manner similar to concept
characterization.
 Generalized tuples from the target and contrasting classes can be
quantitatively compared and contrasted.