Data Mining and Decision Trees: Prof. Sin-Min Lee Department of Computer Science

This document provides an overview of data mining and decision trees. It discusses the evolution of database technology and defines data mining as the extraction of interesting patterns from large databases. Decision trees are described as a type of model that can be used for classification or prediction tasks in data mining. The document also outlines some common applications of data mining such as risk analysis, direct marketing, and fraud detection.

Uploaded by

Sandeep Masini

Data Mining and Decision Trees

Prof. Sin-Min Lee Department of Computer Science

Evolution of Database Technology

1960s:
Data collection, database creation, IMS and network DBMS

1970s:
Relational data model, relational DBMS implementation

1980s:
RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)

1990s-2000s:
Data mining and data warehousing, multimedia databases, and Web databases

What Is Data Mining?

Data mining (knowledge discovery in databases):
Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases

Alternative names and their inside stories:

Data mining: a misnomer?
Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

What is not data mining?

(Deductive) query processing. Expert systems or small ML/statistical programs

Why Data Mining? Potential Applications

Database analysis and decision support
Market analysis and management: target marketing, customer relation management, market basket analysis, cross selling, market segmentation
Risk analysis and management: forecasting, customer retention, improved underwriting, quality control, competitive analysis
Fraud detection and management

Other Applications
Text mining (news group, email, documents) and Web analysis
Intelligent query answering
Market Analysis and Management (1)

Where are the data sources for analysis?
Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies

Target marketing
Find clusters of model customers who share the same characteristics: interest, income level, spending habits, etc.

Determine customer purchasing patterns over time

Conversion of single to a joint bank account: marriage, etc.

Cross-market analysis
Associations/co-relations between product sales

Prediction based on the association information

Market Analysis and Management (2)

Customer profiling
data mining can tell you what types of customers buy what products (clustering or classification)

Identifying customer requirements
identifying the best products for different customers
use prediction to find what factors will attract new customers

Provides summary information
various multidimensional summary reports
statistical summary information (data central tendency and variation)

Corporate Analysis and Risk Management

Finance planning and asset evaluation:
cash flow analysis and prediction
contingent claim analysis to evaluate assets
cross-sectional and time series analysis (financial-ratio, trend analysis, etc.)

Resource planning:
summarize and compare the resources and spending

Competition:
monitor competitors and market directions
group customers into classes and a class-based pricing procedure
set pricing strategy in a highly competitive market

Fraud Detection and Management (1)

Applications
widely used in health care, retail, credit card services, telecommunications (phone card fraud), etc.

Approach
use historical data to build models of fraudulent behavior and use data mining to help identify similar instances

Examples
auto insurance: detect a group of people who stage accidents to collect on insurance
money laundering: detect suspicious money transactions (US Treasury's Financial Crimes Enforcement Network)
medical insurance: detect professional patients and ring of doctors and ring of references

Fraud Detection and Management (2)

Detecting inappropriate medical treatment
Australian Health Insurance Commission identifies that in many cases blanket screening tests were requested (save Australian $1m/yr).

Detecting telephone fraud

Telephone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm. British Telecom identified discrete groups of callers with frequent intra-group calls, especially mobile phones, and broke a multimillion dollar fraud.

Retail
Analysts estimate that 38% of retail shrink is due to dishonest employees.

Other Applications

Sports
IBM Advanced Scout analyzed NBA game statistics (shots blocked, assists, and fouls) to gain competitive advantage for New York Knicks and Miami Heat

Astronomy
JPL and the Palomar Observatory discovered 22 quasars with the help of data mining

Internet Web Surf-Aid

IBM Surf-Aid applies data mining algorithms to Web access logs for market-related pages to discover customer preference and behavior pages, analyzing effectiveness of Web marketing, improving Web site organization, etc.

Data Mining: A KDD Process

Data mining: the core of knowledge discovery process.

[Figure: the KDD process. Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation]

Steps of a KDD Process

Learning the application domain:
relevant prior knowledge and goals of application

Creating a target data set: data selection

Data cleaning and preprocessing: (may take 60% of effort!)

Data reduction and transformation:
Find useful features, dimensionality/variable reduction, invariant representation.

Choosing functions of data mining:
summarization, classification, regression, association, clustering.

Choosing the mining algorithm(s)

Data mining: search for patterns of interest

Pattern evaluation and knowledge presentation:
visualization, transformation, removing redundant patterns, etc.

Use of discovered knowledge

Area 1: Risk Analysis

Insurance companies and banks use data mining for risk analysis. An insurance company searches its own insurants and claims databases for relationships between personal characteristics and claim behavior.

The company is especially interested in the characteristics of insurants whose claim behavior deviates strongly from the norm. With data mining, these so-called risk profiles can be discovered, and the company can use this information to adapt its premium policy.

Area 2: Direct Marketing

Data mining can also be used to discover the relationship between one's personal characteristics (e.g. age, gender, hometown) and the probability that one will respond to a mailing. Such relationships can be used to select those customers from the mailing database that have the highest probability of responding to a mailing.

This allows the company to mail its prospects selectively, thus maximizing the response. For example:
1. Company X sends a mailing to a number of prospects.
2. The response is 2%.

What Data Mining can do

Enables companies to determine relationships among internal and external factors.
Predicts cross-sell opportunities and makes recommendations.
Segments markets and personalizes communications.
Predicts outcomes of future situations.

The Process of Data Mining

There are 3 main steps in the Data Mining process:
Preparation: data is selected from the warehouse and cleansed.
Processing: algorithms are used to process the data. This step uses modeling to make predictions.
Analysis: output is evaluated.

Reasons for growing popularity

Growing data volume: an enormous amount of existing and newly appearing data requires processing.
Limitations of human analysis: humans lack objectivity when analyzing dependencies in data.
Low cost of machine learning: the data mining process has a lower cost than hiring highly trained professionals to analyze data.

Data Mining Techniques

Association rules: discover interesting associations between attributes that are contained in a database.
Clustering: finds appropriate groupings of elements for a set of data.
Sequential patterns: looks for patterns where one event leads to another, later event.
Classification: assigns items to classes by recognizing patterns.

Applications of Data Mining

Data Mining is applied in the following areas:
Prediction of the stock market: predicting future trends.
Bankruptcy prediction: prediction based on computer-generated rules, using models.
Foreign exchange market: data mining is used to identify trading rules.
Fraud detection: construction of algorithms and models that will help recognize a variety of fraud patterns.

Results of Data Mining Include:

Forecasting what may happen in the future
Classifying people or things into groups by recognizing patterns
Clustering people or things into groups based on their attributes
Associating what events are likely to occur together
Sequencing what events are likely to lead to later events

Data mining is not

Brute-force crunching of bulk data
Blind application of algorithms
Going to find relationships where none exist
Presenting data in different ways
A database-intensive task
A difficult-to-understand technology requiring an advanced degree in computer science

What data mining has done for...

The US Internal Revenue Service needed to improve customer service and scheduled its workforce to provide faster, more accurate answers to questions.

The US Drug Enforcement Administration needed to be more effective in its drug busts and analyzed suspects' cell phone usage to focus investigations.

HSBC needed to cross-sell more effectively by identifying profiles that would be interested in higher-yielding investments and reduced direct mail costs by 30% while garnering 95% of the campaign's revenue.

Data Mining process model -DM

Search in State Spaces

Decision Trees
A decision tree is a special case of a state-space graph. It is a rooted tree in which each internal node corresponds to a decision, with a subtree at these nodes for each possible outcome of the decision.

Decision trees can be used to model problems in which a series of decisions leads to a solution.

The possible solutions of the problem correspond to the paths from the root to the leaves of the decision tree.

Decision Trees

Example: The n-queens problem
How can we place n queens on an n×n chessboard so that no two queens can capture each other?

A queen can move any number of squares horizontally, vertically, and diagonally. Here, the possible target squares of the queen Q are marked with an x.

[Figure: chessboard with a queen Q; every square in its row, column, and diagonals is marked with an x.]

Let us consider the 4-queens problem.

Question: How many possible configurations of 4×4 chessboards containing 4 queens are there?

Answer: There are 16!/(12!·4!) = (13·14·15·16)/(2·3·4) = 13·7·5·4 = 1820 possible configurations.

Shall we simply try them out one by one until we encounter a solution? No, it is generally useful to think about a search problem more carefully and discover constraints on the problem's solutions. Such constraints can dramatically reduce the size of the relevant state space.
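To make the size of the unconstrained state space concrete, the count above can be checked by enumeration. The following is a minimal sketch, not part of the original slides; the helper name attacks and the (row, column) square encoding are assumptions made for illustration.

```python
from itertools import combinations
from math import comb

def attacks(a, b):
    """True if queens on squares a and b, given as (row, col), can capture each other."""
    (r1, c1), (r2, c2) = a, b
    return r1 == r2 or c1 == c2 or abs(r1 - r2) == abs(c1 - c2)

n = 4
squares = [(r, c) for r in range(n) for c in range(n)]

# Brute force: every way to choose 4 of the 16 squares for the queens.
configurations = list(combinations(squares, n))

# Keep only the configurations in which no pair of queens attack each other.
solutions = [cfg for cfg in configurations
             if not any(attacks(a, b) for a, b in combinations(cfg, 2))]

print(comb(n * n, n))       # 16 choose 4, the closed-form count
print(len(configurations))  # the same count obtained by enumeration
print(len(solutions))       # how many of those configurations are solutions
```

Only a tiny fraction of the enumerated configurations are solutions, which is why exploiting constraints pays off.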

Obviously, in any solution of the n-queens problem, there must be exactly one queen in each column of the board. Otherwise, the two queens in the same column could capture each other. Therefore, we can describe the solution of this problem as a sequence of n decisions:

Decision 1: Place a queen in the first column.

Decision 2: Place a queen in the second column.
. . .
Decision n: Place a queen in the n-th column.

Backtracking in Decision Trees

[Figure: decision tree for the 4-queens problem. The root is the empty board; successive levels correspond to placing the 1st, 2nd, 3rd, and 4th queen.]
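The column-by-column decisions and the backtracking step can be sketched in a few lines. This is an illustrative sketch under the one-queen-per-column constraint described earlier, not code from the slides; the function name solve_n_queens and the list encoding (entry c holds the row of the queen in column c) are assumptions.

```python
def solve_n_queens(n):
    """Return all solutions; entry c of each solution is the row of the queen in column c."""
    solutions = []
    rows = []  # rows[c] = row chosen so far for column c

    def safe(row):
        # The next queen goes into column len(rows); check it against earlier columns.
        col = len(rows)
        return all(r != row and abs(r - row) != col - c
                   for c, r in enumerate(rows))

    def place(col):
        if col == n:                 # a leaf of the decision tree: all columns filled
            solutions.append(list(rows))
            return
        for row in range(n):         # one branch per possible decision
            if safe(row):
                rows.append(row)
                place(col + 1)
                rows.pop()           # backtrack: undo the decision, try the next row

    place(0)
    return solutions

for solution in solve_n_queens(4):
    print(solution)
```

Branches that violate a constraint are never expanded, so the search visits far fewer states than trying every configuration.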

Neural Network
Many inputs and a single output
Trained on signal and background sample
Well understood and mostly accepted in HEP

Decision Tree
Many inputs and a single output
Trained on signal and background sample
Used mostly in life sciences & business

Decision tree Basic Algorithm

Initialize top node to all examples
While impure leaves available
    select next impure leaf L
    find splitting attribute A with maximal information gain
    for each value of A add child to L
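A minimal sketch of this algorithm follows. It is written recursively (depth-first) rather than as a loop over impure leaves, and it assumes categorical attributes; the function names and the four-row toy data set are hypothetical and are not taken from the slides.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((k / total) * log2(k / total) for k in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    total = len(rows)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    # A pure node becomes a leaf; if no attributes remain, use the majority label.
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    node = {"attribute": best, "children": {}}
    for value in sorted(set(row[best] for row in rows)):  # one child per value of A
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        node["children"][value] = build_tree(
            [rows[i] for i in keep],
            [labels[i] for i in keep],
            [a for a in attrs if a != best])
    return node

def classify(tree, row):
    """Follow the decisions from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["attribute"]]]
    return tree

# Hypothetical toy data, for illustration only.
rows = [
    {"size": "small", "colour": "red"},
    {"size": "small", "colour": "green"},
    {"size": "large", "colour": "red"},
    {"size": "large", "colour": "green"},
]
labels = ["no", "no", "yes", "yes"]

tree = build_tree(rows, labels, ["size", "colour"])
print(tree)
print(classify(tree, {"size": "large", "colour": "red"}))
```

In this toy data the label depends only on size, so the sketch splits on size once and every child is already pure.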

Decision tree: Find good split

Sufficient statistics to compute info gain: count matrix

outlook   temperature  humidity  windy  play
sunny     hot          high      FALSE  no
sunny     hot          high      TRUE   no
overcast  hot          high      FALSE  yes
rainy     mild         high      FALSE  yes
rainy     cool         normal    FALSE  yes
rainy     cool         normal    TRUE   no
overcast  cool         normal    TRUE   yes
sunny     mild         high      FALSE  no
sunny     cool         normal    FALSE  yes
rainy     mild         normal    FALSE  yes
sunny     mild         normal    TRUE   yes
overcast  mild         high      TRUE   yes
overcast  hot          normal    FALSE  yes
rainy     mild         high      TRUE   no

outlook      play  don't play
sunny        2     3
overcast     4     0
rainy        3     2
gain: 0.25 bits

temperature  play  don't play
hot          2     2
mild         4     2
cool         3     1
gain: 0.03 bits

humidity     play  don't play
high         3     4
normal       6     1
gain: 0.15 bits

windy        play  don't play
FALSE        6     2
TRUE         3     3
gain: 0.05 bits
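Since the count matrix is a sufficient statistic, the information gain of each attribute can be computed from the counts alone, without revisiting the individual rows. The sketch below assumes the standard entropy-based definition of information gain; the function and variable names are illustrative.

```python
from math import log2

def entropy(counts):
    """Entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def info_gain(matrix):
    """matrix maps each attribute value to its (play, don't play) counts."""
    class_totals = [sum(column) for column in zip(*matrix.values())]
    n = sum(class_totals)
    remainder = sum(sum(counts) / n * entropy(counts)
                    for counts in matrix.values())
    return entropy(class_totals) - remainder

# Count matrices for the 14 weather examples.
count_matrices = {
    "outlook":     {"sunny": (2, 3), "overcast": (4, 0), "rainy": (3, 2)},
    "temperature": {"hot": (2, 2), "mild": (4, 2), "cool": (3, 1)},
    "humidity":    {"high": (3, 4), "normal": (6, 1)},
    "windy":       {"FALSE": (6, 2), "TRUE": (3, 3)},
}

gains = {attr: info_gain(matrix) for attr, matrix in count_matrices.items()}
for attr, gain in gains.items():
    print(f"{attr}: {gain:.3f} bits")

best = max(gains, key=gains.get)
print("split on:", best)
```

With these counts, outlook has the largest gain (roughly a quarter of a bit), so it would be chosen as the splitting attribute at the root.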

Decision trees
Simple depth-first construction
Needs entire data to fit in memory
Unsuitable for large data sets
Need to scale up

Decision Trees: Planning Tool

Decision Trees
Enable a business to quantify decision making
Useful when the outcomes are uncertain
Place a numerical value on likely or potential outcomes
Allow comparison of different possible decisions to be made

Decision Trees
Limitations:
How accurate is the data used in the construction of the tree?
How reliable are the estimates of the probabilities?
Data may be historical: does this data relate to real time?
Necessity of factoring in the qualitative factors: human resources, motivation, reaction, relations with suppliers and other stakeholders

The Process

Expand by opening new outlet:
Economic growth rises (probability 0.7): expected outcome 300,000
Economic growth declines (probability 0.3): expected outcome -500,000
Maintain current status: 0

A square denotes the point where a decision is made. In this example, a business is contemplating opening a new outlet. The uncertainty is the state of the economy: if the economy continues to grow healthily, the option is estimated to yield profits of 300,000. However, if the economy fails to grow as expected, the potential loss is estimated at 500,000.

There is also the option to do nothing and maintain the current status quo! This would have an outcome of 0.

The circle denotes the point where different outcomes could occur. The estimates of the probability and the knowledge of the expected outcome allow the firm to make a calculation of the likely return. In this example it is:
Economic growth rises: 0.7 x 300,000 = 210,000
Economic growth declines: 0.3 x -500,000 = -150,000
The calculation would suggest it is wise to go ahead with the decision (a net benefit figure of +60,000).

The Process

Expand by opening new outlet:
Economic growth rises (probability 0.5): expected outcome 300,000
Economic growth declines (probability 0.5): expected outcome -500,000
Maintain current status: 0

Look what happens, however, if the probabilities change. If the firm is unsure of the potential for growth, it might estimate it at 50:50. In this case the outcomes will be:
Economic growth rises: 0.5 x 300,000 = 150,000
Economic growth declines: 0.5 x -500,000 = -250,000
In this instance, the net benefit is -100,000: the decision looks less favourable!
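The net benefit of each option is a probability-weighted sum of its outcomes, so both scenarios can be compared with a short calculation. The following sketch uses the figures from the outlet example; the function names and the dictionary layout are assumptions made for illustration.

```python
def expected_value(outcomes):
    """outcomes: list of (probability, payoff) pairs for one option."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities of an option must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)

def best_option(options):
    """Name of the option with the highest expected value."""
    return max(options, key=lambda name: expected_value(options[name]))

# Growth estimated at 70:30.
scenario_a = {
    "expand":   [(0.7, 300_000), (0.3, -500_000)],
    "maintain": [(1.0, 0)],
}
# Growth estimated at 50:50.
scenario_b = {
    "expand":   [(0.5, 300_000), (0.5, -500_000)],
    "maintain": [(1.0, 0)],
}

ev_a = expected_value(scenario_a["expand"])
ev_b = expected_value(scenario_b["expand"])
print(round(ev_a), best_option(scenario_a))
print(round(ev_b), best_option(scenario_b))
```

Changing only the probability estimates flips the preferred decision, which is why the reliability of those estimates matters so much.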

Advantages

Disadvantages

Trained Decision Tree

[Figure: trained decision tree; panel labels "(Limit)" and "(Binned Likelihood Fit)".]

Decision Trees from Data Base

Ex Num  Att Size  Att Colour  Att Shape  Concept Satisfied
1       med       blue        brick      yes
2       small     red         wedge      no
3       small     red         sphere     yes
4       large     red         wedge      no
5       large     green       pillar     yes
6       large     red         pillar     no
7       large     green       sphere     yes

Choose target: Concept satisfied
Use all attributes except Ex Num

Rules from Tree

IF (SIZE = large AND ((SHAPE = wedge) OR (SHAPE = pillar AND COLOUR = red))) OR (SIZE = small AND SHAPE = wedge) THEN NO

IF (SIZE = large AND ((SHAPE = pillar AND COLOUR = green) OR SHAPE = sphere)) OR (SIZE = small AND SHAPE = sphere) OR (SIZE = medium) THEN YES

Disjunctive Normal Form - DNF

IF (SIZE = medium) OR (SIZE = small AND SHAPE = sphere) OR (SIZE = large AND SHAPE = sphere) OR (SIZE = large AND SHAPE = pillar AND COLOUR = green) THEN CONCEPT = satisfied ELSE CONCEPT = not satisfied
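A DNF rule translates directly into a boolean expression: each conjunction becomes an "and" term, and the terms are joined with "or". The sketch below checks the rule against the seven examples of the data base; the function name satisfied and the tuple layout are illustrative, and the size "med" is written out as "medium" to match the rule.

```python
# (example number, size, colour, shape, concept satisfied)
examples = [
    (1, "medium", "blue",  "brick",  "yes"),
    (2, "small",  "red",   "wedge",  "no"),
    (3, "small",  "red",   "sphere", "yes"),
    (4, "large",  "red",   "wedge",  "no"),
    (5, "large",  "green", "pillar", "yes"),
    (6, "large",  "red",   "pillar", "no"),
    (7, "large",  "green", "sphere", "yes"),
]

def satisfied(size, colour, shape):
    """The DNF rule: an OR of AND-terms, one term per 'yes' path of the tree."""
    return (
        size == "medium"
        or (size == "small" and shape == "sphere")
        or (size == "large" and shape == "sphere")
        or (size == "large" and shape == "pillar" and colour == "green")
    )

# Collect the example numbers on which the rule disagrees with the table.
mismatches = [num for num, size, colour, shape, label in examples
              if satisfied(size, colour, shape) != (label == "yes")]
print(mismatches)
```

An empty list of mismatches means the rule reproduces the Concept Satisfied column for every example.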
