Task-Relevant Data, Discretization, and Concept Hierarchy
* Subject: Data Mining & Business Intelligence
* CE-B
* Maulik Togadiya, 130240107090
Task-Relevant Data
* The task-relevant data is the portion of the database, or the set of data, in which the
user is interested. It includes the database attributes or data warehouse
dimensions of interest (referred to as the relevant attributes or
dimensions).
* Specifying it involves the following:
- Data warehouse name
- Database table
- Condition for data selection
- Dimension
- Data grouping criteria
Example
* If a data mining task is to study associations between items frequently
purchased at AllElectronics by customers in India, the task-relevant data
can be specified by providing the following information:
- Name of the database or data warehouse to be used (e.g., AllElectronics_db)
- Names of the tables or data cubes containing the relevant data (e.g., item,
customer, purchases, and items_sold)
- Conditions for selecting the relevant data (e.g., retrieve data for purchases
made in India for the current year)
- The relevant attributes or dimensions (e.g., name and price from the item table,
and income and age from the customer table)
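The four parts of this specification can be written down as a small data structure. The following is a minimal sketch, not a real data mining query language: the class `TaskRelevantData`, the function `select_relevant`, the constant `CURRENT_YEAR`, and the sample rows are all hypothetical, and the rows are assumed to be already joined across the four tables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

CURRENT_YEAR = 2024  # hypothetical value standing in for "the current year"


@dataclass
class TaskRelevantData:
    """The four parts of the specification listed in the example."""
    database: str                      # name of the database or data warehouse
    tables: List[str]                  # tables or data cubes holding the data
    condition: Callable[[Dict], bool]  # condition for selecting relevant rows
    attributes: List[str]              # relevant attributes or dimensions


def select_relevant(spec: TaskRelevantData, rows: List[Dict]) -> List[Dict]:
    """Keep rows that satisfy the condition, projected onto the relevant attributes."""
    return [{a: row[a] for a in spec.attributes}
            for row in rows if spec.condition(row)]


spec = TaskRelevantData(
    database="AllElectronics_db",
    tables=["item", "customer", "purchases", "items_sold"],
    condition=lambda r: r["country"] == "India" and r["year"] == CURRENT_YEAR,
    attributes=["name", "price", "income", "age"],
)

# Hypothetical rows, as if the four tables had already been joined.
rows = [
    {"name": "TV", "price": 500.0, "income": 40000, "age": 31,
     "country": "India", "year": 2024},
    {"name": "Phone", "price": 300.0, "income": 52000, "age": 45,
     "country": "USA", "year": 2024},
    {"name": "Radio", "price": 40.0, "income": 28000, "age": 23,
     "country": "India", "year": 2023},
]

relevant = select_relevant(spec, rows)
print(relevant)
```

Only the first row passes the condition, and only the four relevant attributes are kept for it.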
The Kind of Knowledge to Be Mined
* Characterization
* Discrimination
* Association
* Classification/prediction
* Clustering
* Outlier analysis
* Other data mining tasks
Concept Hierarchies
* A concept hierarchy defines a sequence of mappings from a set of low-level
concepts to higher-level, more general concepts.
* Different types of concept hierarchies:
- Schema hierarchy
- Set-grouping hierarchy
- Operation-derived hierarchy
- Rule-based hierarchy
Schema hierarchy
* A schema hierarchy is a total or partial order among the attributes in the
database schema. It may formally express semantic relationships
between attributes.
* Schema hierarchy of a relation for address containing the attributes house_number,
street, city, state, and country:
house_number < street < city < state < country
* In this example, house_number is at a conceptually lower level than street,
which is lower than city, which is lower than state, which in turn is lower than country.
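A total order like this one can be represented as an ordered list, lowest level first. The sketch below is illustrative only: the names `SCHEMA_HIERARCHY`, `is_lower`, and `roll_up`, and the sample address, are assumptions made for the example.

```python
# Attributes ordered from the lowest (most specific) to the highest level.
SCHEMA_HIERARCHY = ["house_number", "street", "city", "state", "country"]


def is_lower(a, b):
    """True if attribute a is at a conceptually lower level than attribute b."""
    return SCHEMA_HIERARCHY.index(a) < SCHEMA_HIERARCHY.index(b)


def roll_up(record, level):
    """Generalize a record by dropping every attribute below the given level."""
    start = SCHEMA_HIERARCHY.index(level)
    return {attr: record[attr] for attr in SCHEMA_HIERARCHY[start:]}


# Hypothetical address record.
address = {
    "house_number": "12",
    "street": "MG Road",
    "city": "Surat",
    "state": "Gujarat",
    "country": "India",
}

print(is_lower("street", "country"))
print(roll_up(address, "city"))
```

Rolling the address up to the city level keeps only city, state, and country.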
Set-grouping hierarchy
* Organizes the values of a given attribute into groups, sets, or ranges of
values.
* A total or partial order can be defined among the groups.
* Used to refine or enrich schema-defined hierarchies.
* Example: set-grouping hierarchy for age
{young, middle_aged, senior} ⊂ all(age)
{20, ..., 29} ⊂ young
{40, ..., 59} ⊂ middle_aged
{60, ..., 89} ⊂ senior
Set-grouping hierarchy (tree view)
All ages
- young: {20, 21, ..., 29}
- middle_aged: {40, 41, ..., 59}
- senior: {60, 61, ..., 89}
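The age groups can be expressed as a lookup from a numeric value to its group. This is a minimal sketch that uses the ranges exactly as given above; the names `AGE_GROUPS` and `age_group` are hypothetical. Ages that fall in none of the listed ranges (for example 30 to 39) have no group here, so the function returns `None` for them.

```python
# Ranges copied from the example; range(20, 30) covers the ages 20..29.
AGE_GROUPS = {
    "young": range(20, 30),
    "middle_aged": range(40, 60),
    "senior": range(60, 90),
}


def age_group(age):
    """Map a numeric age to its level-1 concept, or None if no group covers it."""
    for name, ages in AGE_GROUPS.items():
        if age in ages:
            return name
    return None


for age in (25, 45, 89, 35):
    print(age, "->", age_group(age))
```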
Operation-derived hierarchy
* An operation-derived hierarchy is based on operations specified by
users, experts, or the data mining system. Operations can include the
decoding of information-encoded strings, information extraction from complex
data objects, and data clustering.
* Example: the e-mail address markovz@cs.ccsu.edu
instantiates the hierarchy user-name < department < university <
usa-university.
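Decoding an e-mail address into hierarchy levels can be sketched as a string operation. The function name `email_hierarchy` is hypothetical, and the sketch assumes the address has exactly the form user@department.university.edu; anything else is rejected rather than guessed at.

```python
def email_hierarchy(address):
    """Decode user@department.university.edu into levels, lowest level first."""
    user, _, domain = address.partition("@")
    parts = domain.split(".")
    # Assumption: exactly three domain labels, ending in "edu".
    if not user or len(parts) != 3 or parts[2] != "edu":
        raise ValueError("expected user@department.university.edu")
    department, university, _ = parts
    # The ".edu" suffix is what places the address under usa-university.
    return [user, department, university, "usa-university"]


levels = email_hierarchy("markovz@cs.ccsu.edu")
print(" < ".join(levels))
```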
Rule-based hierarchy
* In a rule-based hierarchy, either the whole concept hierarchy or a portion of it
is defined by a set of rules and is evaluated dynamically based on the current
data and the rule definitions.
* Example: define hierarchy profit_margin_hierarchy on item as
- level_1: low_profit_margin < level_0: all if (price - cost) < $50
- level_1: medium_profit_margin < level_0: all if ((price - cost) > $50) and
((price - cost) <= $250)
- level_1: high_profit_margin < level_0: all if (price - cost) > $250
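The three rules can be evaluated on the fly for each item. The following is a minimal sketch with a hypothetical function name, `profit_margin_level`. The rules as written do not cover a margin of exactly $50; assigning that case to medium_profit_margin is an assumption of this sketch.

```python
def profit_margin_level(price, cost):
    """Return the level_1 concept for an item, based on its profit margin."""
    margin = price - cost
    if margin < 50:
        return "low_profit_margin"
    # Assumption: a margin of exactly 50 is treated as medium.
    if margin <= 250:
        return "medium_profit_margin"
    return "high_profit_margin"


# Hypothetical (price, cost) pairs.
for price, cost in [(100, 70), (200, 100), (350, 100), (400, 100)]:
    print(price, cost, profit_margin_level(price, cost))
```

Because the level is computed from the current price and cost, an item moves to a different level as soon as either value changes.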
Discretization and Concept Hierarchy
* Discretization
- Reduces the number of values for a given continuous attribute by dividing the
range of the attribute into intervals.
* Concept hierarchies
- Reduce the data by collecting and replacing low-level concepts (such as
numeric values for the attribute age) with higher-level concepts (such as young,
middle-aged, or senior).
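One simple way to divide a range into intervals is equal-width partitioning. The sketch below is only one possible discretization scheme; the function name `equal_width_discretize` and the sample ages are hypothetical.

```python
def equal_width_discretize(values, k):
    """Split [min, max] into k equal-width intervals and assign each value an index."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; the range cannot be divided")
    width = (hi - lo) / k
    intervals = [(lo + i * width, lo + (i + 1) * width) for i in range(k)]
    # The maximum value would land in interval k, so clamp it into the last one.
    indices = [min(int((v - lo) / width), k - 1) for v in values]
    return intervals, indices


ages = [20, 25, 41, 59, 60, 80]  # hypothetical data
intervals, indices = equal_width_discretize(ages, 3)
print(intervals)
print(indices)
```

Each of the six ages is replaced by one of three interval indices, which is the reduction in distinct values that discretization aims for.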
Discretization and concept hierarchy generation for numeric data
* Binning
* Histogram analysis
* Clustering analysis
Binning Methods for Data Smoothing
Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34
* Partition into equal-frequency (equi-depth) bins:
- Bin 1: 4, 8, 9, 15
- Bin 2: 21, 21, 24, 25
- Bin 3: 26, 28, 29, 34
* Smoothing by bin means (each value is replaced by its bin mean, rounded to the nearest dollar):
- Bin 1: 9, 9, 9, 9
- Bin 2: 23, 23, 23, 23
- Bin 3: 29, 29, 29, 29
* Smoothing by bin boundaries (each value is replaced by the closer of its bin's minimum and maximum):
- Bin 1: 4, 4, 4, 15
- Bin 2: 21, 21, 25, 25
- Bin 3: 26, 26, 26, 34
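The binning example can be reproduced in a few lines. This is a minimal sketch with hypothetical function names; it assumes the number of values divides evenly into the bins, rounds bin means to the nearest integer, and sends ties to the lower boundary.

```python
def equal_frequency_bins(sorted_values, n_bins):
    """Partition sorted values into n_bins bins of equal size."""
    size = len(sorted_values) // n_bins  # assumes an even split
    return [sorted_values[i * size:(i + 1) * size] for i in range(n_bins)]


def smooth_by_means(bins):
    """Replace every value with the rounded mean of its bin."""
    return [[round(sum(b) / len(b))] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value with the closer of its bin's minimum and maximum."""
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]  # bins are sorted, so these are the boundaries
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed


prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
bins = equal_frequency_bins(prices, 3)
print(bins)
print(smooth_by_means(bins))
print(smooth_by_boundaries(bins))
```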
Histogram analysis
* A partitioning rule is applied to define the ranges of values.
* The data is divided into buckets, and the average (or sum) is stored for each bucket.
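A histogram with equal-width buckets is one possible partitioning rule. The sketch below stores the count, sum, and average per bucket; the function name `histogram`, the bucket width of 10, and the choice of equal-width buckets are assumptions made for illustration.

```python
def histogram(values, width):
    """Group values into equal-width buckets keyed by the bucket's lower bound."""
    buckets = {}
    for v in values:
        start = (v // width) * width  # lower bound of the bucket containing v
        count, total = buckets.get(start, (0, 0))
        buckets[start] = (count + 1, total + v)
    return {
        start: {"count": c, "sum": s, "average": s / c}
        for start, (c, s) in sorted(buckets.items())
    }


prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
hist = histogram(prices, 10)
for start, stats in hist.items():
    print(f"[{start}, {start + 10}): {stats}")
```

Twelve prices are reduced to four buckets, each summarized by a single stored value.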
Clustering analysis
* Partitions the data into groups, or clusters.
* Clustering is the process of partitioning a set of data (or objects) into a set of
meaningful sub-classes, called clusters.
* Helps users understand the natural grouping or structure in a data set.
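For a single numeric attribute, one very simple clustering heuristic is to sort the values and cut at the largest gaps. This is a swapped-in illustration, not a method named in these slides and not a general clustering algorithm such as k-means; the function name `gap_clusters` and the sample ages are hypothetical.

```python
def gap_clusters(values, k):
    """Split sorted 1-D values into k clusters by cutting at the k-1 largest gaps."""
    ordered = sorted(values)
    if k < 1 or k > len(ordered):
        raise ValueError("k must be between 1 and the number of values")
    # (gap size, index of the value just before the gap)
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    largest = sorted(gaps, key=lambda g: (-g[0], g[1]))[:k - 1]
    cut_after = sorted(i for _, i in largest)
    clusters, start = [], 0
    for i in cut_after:
        clusters.append(ordered[start:i + 1])
        start = i + 1
    clusters.append(ordered[start:])
    return clusters


ages = [21, 22, 25, 41, 43, 47, 70, 72, 75]  # hypothetical data
clusters = gap_clusters(ages, 3)
print(clusters)
```

Each resulting cluster can then serve as one interval of the discretized attribute.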
Concept Hierarchy Generation for Categorical Data
* Specification of a partial/total ordering of attributes explicitly at the schema level
by users or experts
- street < city < state < country
* Specification of a hierarchy for a set of values by explicit data grouping
- {Ahmedabad, Surat, Rajkot} < Gujarat
* Specification of only a partial set of attributes
- E.g., only street < city, not others
* Automatic generation of hierarchies (or attribute levels) by the analysis of the
number of distinct values
- E.g., for a set of attributes: {street, city, state, country}
Automatic Concept Hierarchy Generation
* Some hierarchies can be generated automatically by analyzing
the number of distinct values per attribute in the data set.
- The attribute with the most distinct values is placed at the lowest level
of the hierarchy.
- There are exceptions, e.g., weekday, month, quarter, year.
* Example (highest level first):
- country: 15 distinct values
- province_or_state: 365 distinct values
- city: 3567 distinct values
- street: 674,339 distinct values
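The distinct-value heuristic amounts to sorting attributes by their distinct counts. The following is a minimal sketch with hypothetical function names, `distinct_counts` and `generate_hierarchy`; the counts are the ones from the example, and the sample rows are made up.

```python
def distinct_counts(rows, attributes):
    """Count the distinct values of each attribute in a list of records."""
    return {a: len({row[a] for row in rows}) for a in attributes}


def generate_hierarchy(counts):
    """Order attributes lowest level first: most distinct values at the bottom."""
    return sorted(counts, key=counts.get, reverse=True)


counts = {
    "country": 15,
    "province_or_state": 365,
    "city": 3567,
    "street": 674339,
}
hierarchy = generate_hierarchy(counts)
print(" < ".join(hierarchy))

# Hypothetical rows showing how the counts could be obtained from data.
sample_rows = [
    {"city": "Surat", "state": "Gujarat"},
    {"city": "Rajkot", "state": "Gujarat"},
]
print(distinct_counts(sample_rows, ["city", "state"]))
```

The exception noted above shows up directly in this sketch: weekday has 7 distinct values and month has 12, so the heuristic would rank weekday above month, which is why such hierarchies still need a manual check.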