Lecture 05: Decision Tree and K-Means
Example: Text Clustering
Improve Healthcare, Win $3M
Motivation:
◦ 71M Americans are admitted to hospitals per year
◦ $30 billion was spent on unnecessary hospital admissions
◦ Can we identify earlier those most at risk and ensure they get
the treatment they need?
Objective:
◦ Identify patients who will be admitted to a hospital within
the next year, using historical claims data.
◦ Develop new care plans and strategies to reach patients before
emergencies occur, thereby reducing the number of
unnecessary hospitalizations.
Competition:
◦ Grand Prize: $3M
◦ Milestone Prizes: $230K across 6 milestones
◦ Time: 4 April 2011 - 3 April 2013
Methods/algorithms for data analysis / data mining
ICDM’06 survey (number of votes in parentheses):
1. C4.5 (61)
2. K-Means (60)
3. SVM (58)
4. Apriori (52)
5. EM (48)
6. PageRank (46)
7. AdaBoost (45)
8. k-NN (45)
9. Naïve Bayes (45)
10. CART (34)
CLASSIFICATION BY DECISION TREE INDUCTION
BuyComputer Data
Classification by Decision Tree
◦ Internal node: a condition (test on an attribute)
◦ Leaf: a conclusion (class label)
General Algorithm
Create a new node N
If all the data belongs to the same class C Then
◦ Return N as a leaf node labeled with C
Select the “best” attribute A
Label N with A
For each value Ai of attribute A
◦ Select the subset Di of examples for which A = Ai
◦ Run the algorithm recursively on Di and attach the resulting subtree to N
EndFor
Return N
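A minimal Python sketch of this recursive procedure (the names and the dictionary representation are illustrative, not from the slides); choose_best_attribute is a placeholder for the attribute-selection rule, e.g. information gain as defined on the next slides:

from collections import Counter

def build_tree(examples, attributes, choose_best_attribute):
    # examples: list of (features: dict, label) pairs
    # attributes: attribute names still available for splitting
    # choose_best_attribute: scoring rule, e.g. information gain (ID3)
    labels = [label for _, label in examples]

    # If all the data belongs to the same class C, return a leaf labeled with C
    if len(set(labels)) == 1:
        return {"leaf": labels[0]}

    # If no attributes are left, fall back to the majority class (a common refinement)
    if not attributes:
        return {"leaf": Counter(labels).most_common(1)[0][0]}

    # Select the "best" attribute A and label the new node N with it
    best = choose_best_attribute(examples, attributes)
    node = {"attribute": best, "children": {}}

    # For each value Ai of A, recurse on the subset Di of examples with A = Ai
    for value in {features[best] for features, _ in examples}:
        subset = [(f, y) for f, y in examples if f[best] == value]
        remaining = [a for a in attributes if a != best]
        node["children"][value] = build_tree(subset, remaining, choose_best_attribute)
    return node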
Which Attribute is “best”?
Entropy
Given a collection S, containing positive and negative examples of some target concept, the entropy of S relative to this Boolean classification is
Entropy(S) = - p_+ \log_2 p_+ - p_- \log_2 p_-
where p_+ and p_- are the proportions of positive and negative examples in S. More generally, for a c-class problem:
H(X) = - \sum_{i=1}^{c} p_i \log_2 (p_i)
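This definition translates directly into a small Python helper (a sketch; the function name and signature are illustrative):

import math

def entropy(class_counts):
    # H = -sum_i p_i * log2(p_i) over the classes in a collection
    total = sum(class_counts)
    probs = [c / total for c in class_counts if c > 0]   # 0 * log(0) is taken as 0
    return -sum(p * math.log2(p) for p in probs)

For example, for a set with 29 positive and 35 negative examples (used on a later slide), entropy([29, 35]) ≈ 0.994, i.e. the E([29+,35-]) = 0.99 shown there.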
ID3: Information Gain
Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)
◦ S – a collection of examples
◦ A – an attribute
◦ Values(A) – possible values of attribute A
◦ S_v – the subset of S for which attribute A has value v
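Using the entropy helper above, the gain of splitting on an attribute can be computed as follows (again an illustrative sketch, assuming examples are (feature-dict, label) pairs):

def information_gain(examples, attribute):
    # Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)
    def label_counts(subset):
        counts = {}
        for _, label in subset:
            counts[label] = counts.get(label, 0) + 1
        return list(counts.values())

    total = len(examples)
    gain = entropy(label_counts(examples))
    for value in {features[attribute] for features, _ in examples}:
        subset = [(f, y) for f, y in examples if f[attribute] == value]
        gain -= len(subset) / total * entropy(label_counts(subset))
    return gain

Plugged into the general algorithm as choose_best_attribute = lambda ex, attrs: max(attrs, key=lambda a: information_gain(ex, a)), this gives an ID3-style learner.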
Which Attribute is “best”?
[Figure: the same collection of 29 positive and 35 negative examples, with E([29+,35-]) = 0.99, split by candidate attribute A (branches a, b) and by candidate attribute B (branches c, d); the attribute giving the higher information gain is “best”.]
Gini-index
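For reference, the Gini index used by CART (an alternative to entropy as a splitting criterion) is Gini(S) = 1 - \sum_i p_i^2; a minimal helper in the same style as the entropy function above:

def gini_index(class_counts):
    # Gini impurity: 1 - sum_i p_i^2; 0 for a pure node, larger when classes are mixed
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)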
Tree Pruning
Occam’s razor:
prefer the simplest hypothesis that fits the data
SUPPORT VECTOR MACHINES
CLUSTERING
Visualization of the Iraq War Logs
Image Clustering
K-means Clustering: Example
Step 1: pick K points at random as the initial cluster centers
K-means Clustering: Example
Step 2: assign each data point to the closest cluster center
K-means Clustering: Example
Step 3: change each cluster center to the mean/average of its assigned data points
K-means Clustering: Example
Repeat Steps 2 and 3 until convergence
K-means Clustering
Initialize
◦ Pick K random points as cluster centers
Repeat
1. Assign each data point to the closest cluster center
2. Change each cluster center to the mean/average of its assigned data points
Until no point's assignment changes
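A compact NumPy sketch of this loop (illustrative names and code, not from the lecture), assuming the data is a float array of shape (n, d):

import numpy as np

def kmeans(X, k, seed=0):
    # X: (n, d) data matrix; k: number of clusters
    rng = np.random.default_rng(seed)

    # Initialize: pick K random points as the cluster centers
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    assignment = np.full(len(X), -1)

    while True:
        # Step 1: assign each data point to its closest cluster center -- O(K*n)
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_assignment = distances.argmin(axis=1)

        # Stop when no point's assignment changes
        if np.array_equal(new_assignment, assignment):
            return centers, assignment
        assignment = new_assignment

        # Step 2: move each cluster center to the mean of its assigned points -- O(n)
        for j in range(k):
            members = X[assignment == j]
            if len(members) > 0:          # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)

Usage: centers, labels = kmeans(data, k=3) for a float array data of shape (n, d).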
K-means: Features
Guaranteed to converge in a finite number of iterations
Complexity of one iteration:
1. Assign each data point to the closest cluster center: O(Kn)
2. Change each cluster center to the average of its assigned points: O(n)
K-means: Convergence
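Why the loop terminates (a sketch of the standard argument): both steps can only decrease the K-means objective, the total squared distance of points to their assigned centers,

J(c, \mu) = \sum_{i=1}^{n} \lVert x_i - \mu_{c(i)} \rVert^2

Reassigning a point to its closest center cannot increase its term in J, and replacing a center by the mean of its assigned points minimizes that cluster's contribution to J. Since there are only finitely many possible assignments and J never increases, the algorithm stops after a finite number of iterations.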
K-means: Randomness
[Figure: the same input data clustered twice with different random initializations, producing two different outputs.]
K-means: Local Minimum
Hierarchical Clustering
Distance Measures
Algorithms
Evaluation
◦ Open problem: the number of clusters is usually not known in advance
Segmentation – Classification
Report: Decision Tree Learning Algorithm
Tree pruning
◦ Pre-pruning
◦ Post-pruning