
CSC354 – Machine Learning
Dr Muhammad Sharjeel
Lecture 03: Decision Trees
 The general motive of a Decision Tree (DT) is to create a training model that can predict the class (or value) of the target variable by learning decision rules inferred from prior (training) data
 In a DT, each node represents a feature (attribute), each link (branch) a decision (rule), and each leaf an outcome

 DTs belong to the family of supervised learning algorithms
 They can be used to solve both classification and regression problems
 DTs are transparent algorithms: their decisions can be read and understood

 Algorithm pseudocode
1. Place the best attribute of the dataset (complete training set) at the root of the tree
2. Split the training set into subsets such that each subset contains data with the same value for an attribute
3. Repeat steps 1 and 2 on each subset until leaf nodes are reached in all branches of the tree

 To create a DT
 Shortlist a root node among all the nodes (nodes are ‘features/attributes’ in the dataset)
 Determine the node (attribute) that best classifies the training data and use it as the root
 Repeat the process for each branch

 Three implementations used to create DTs
 ID3
 C4.5
 CART

 ID3 (Iterative Dichotomiser) uses information gain as its metric
 Dichotomisation means dividing something into two completely opposite things
 ID3 iteratively divides attributes into two groups (dominant vs others) to construct a tree
 Dominant attributes are selected based on information gain
 It performs a top-down, greedy search through the space of possible decision trees
 Top-down means it starts building the tree from the top
 Greedy means at each iteration it selects the best feature at the present moment to create a node

 Which attribute (node) best classifies the training data?
 Most dominant attribute would be the one with the highest information gain
 Information gain calculates the reduction in entropy
 Entropy (uncertainty) of a dataset is the measure of disorder in the target attribute
 Information gain therefore measures
 How well a given attribute/feature separates (or classifies) the target classes
 Attribute with the highest information gain is selected as the best one

 Entropy is the measurement of the impurity or randomness in the values of a dataset
 Low (or no) disorder implies a low level of impurity
 Values lie between 0 and 1 (for a binary target); a ‘1’ signifies the highest level of disorder, i.e., maximum impurity

 Formulae to calculate Entropy and Information Gain
 Entropy(S) = ∑ −p(i) . log2 p(i), summed over the classes i
 Gain(S, A) = Entropy(S) − ∑v (|Sv|/|S|) . Entropy(Sv), summed over the values v of attribute A
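
As a concrete illustration, here is a minimal Python sketch of these two formulas, assuming the dataset is represented as a list of (feature-dict, label) pairs; the function and variable names are illustrative, not from the slides:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S): sum over classes i of -p(i) * log2 p(i)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(examples, attribute):
    """Gain(S, A): Entropy(S) minus the weighted entropy of each subset S_v."""
    labels = [label for _, label in examples]
    gain = entropy(labels)
    for value in {features[attribute] for features, _ in examples}:
        subset = [label for features, label in examples
                  if features[attribute] == value]
        gain -= (len(subset) / len(labels)) * entropy(subset)
    return gain
```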

 Compute the entropy [Entropy(S)] for the entire dataset
 For each attribute/feature:
 Calculate entropy [Entropy(A)] for each value of the attribute
 Calculate average information entropy (IE) for the attribute
 Calculate information gain (IG) for the attribute
 Pick the highest gain attribute
 Repeat until the complete tree is formed
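
The loop above can be written as a short recursive sketch, continuing the previous snippet's helpers; this is an illustrative reading of the procedure, not a complete ID3 implementation:

```python
from collections import Counter

def id3(examples, attributes):
    """Recursively build a tree as nested dicts: {attribute: {value: subtree}}."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:            # pure subset -> leaf node
        return labels[0]
    if not attributes:                   # nothing left to split on -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a))
    tree = {best: {}}
    for value in {features[best] for features, _ in examples}:
        subset = [(f, l) for f, l in examples if f[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, remaining)
    return tree
```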

 Example dataset, 14 instances, 4 input attributes
No. Outlook Temperature Humidity Wind PlayGolf
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Rain Mild High Strong No

 Compute the entropy [Entropy(S)] for the entire dataset
 Entropy(S) = −p(Yes) . log2 p(Yes) − p(No) . log2 p(No)
 Entropy(S) = −(9/14) . log2(9/14) − (5/14) . log2(5/14) = 0.940
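
A quick check with the entropy() sketch from earlier (9 ‘Yes’ and 5 ‘No’ labels):

```python
print(round(entropy(["Yes"] * 9 + ["No"] * 5), 3))  # prints 0.94
```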

 For each attribute/feature: (let's say, Outlook)
 Calculate entropy [Entropy(A)] for each value of the attribute, i.e., in case of Outlook: 'Sunny', 'Rain', 'Overcast'

Outlook = Sunny: No, No, No, Yes, Yes
Outlook = Rain: Yes, Yes, No, Yes, No
Outlook = Overcast: Yes, Yes, Yes, Yes

Calculations for Outlook (Sunny):
Entropy(Sunny) = −(2/5) . log2(2/5) − (3/5) . log2(3/5)
             = −(0.4) . (−1.322) − (0.6) . (−0.737)
             = 0.529 + 0.442 = 0.971

Outlook    Positive  Negative  Entropy
Sunny      2         3         0.971
Rain       3         2         0.971
Overcast   4         0         0

 For each attribute/feature:
 Calculate average information entropy (IE) for the attribute (i.e., Outlook)
 IE(Outlook) = ((2+3)/(9+5)) . 0.971 + ((3+2)/(9+5)) . 0.971 + ((4+0)/(9+5)) . 0
 IE(Outlook) = 0.693

 Calculate information gain (IG) for the attribute (i.e., Outlook)
 IG(Outlook) = 0.940 − 0.693 = 0.247
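
The same number falls out of the information_gain() sketch from earlier; the Outlook column and PlayGolf labels below are copied from the table above:

```python
outlook = ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
           "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"]
play = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
        "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]
examples = [({"Outlook": o}, p) for o, p in zip(outlook, play)]
print(round(information_gain(examples, "Outlook"), 3))  # prints 0.247
```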

 Pick the highest gain attribute, in this case, Outlook

Attribute     Gain
Outlook       0.247
Temperature   0.029
Humidity      0.152
Wind          0.048

Outlook therefore becomes the root node.

 Outlook (Overcast) only contains examples of ‘Yes’
 Outlook (Sunny, Rain) contains both ‘Yes’ and ‘No’ examples

Outlook
Sunny → ?   Overcast → Yes   Rain → ?

 Repeat until the complete tree is formed

 Outlook (Overcast) only contains examples of ‘Yes’
 Outlook (Sunny, Rain) contains both ‘Yes’ and ‘No’ examples

Outlook
Sunny → ?   Overcast → Yes   Rain → ?

Outlook (Sunny) subset:
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Sunny Mild Normal Strong Yes

Outlook (Rain) subset:
Outlook Temperature Humidity Wind PlayGolf
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Rain Mild Normal Weak Yes
Rain Mild High Strong No

 Outlook (Sunny)
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Sunny Mild Normal Strong Yes

 Entropy(S) = 0.971
 Entropy(A)[Temperature](Cool) = 0
 Entropy(A)[Temperature](Hot) = 0
 Entropy(A)[Temperature](Mild) = 1
 IE(Temperature) = 0.400
 IG(Temperature) = 0.571

 Outlook (Sunny)
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Sunny Mild Normal Strong Yes

 Entropy(S) = 0.971
 Entropy(A)[Humidity](High) = 0
 Entropy(A)[Humidity](Normal) = 0
 IE(Humidity) = 0
 IG(Humidity) = 0.971

 Outlook (Sunny)
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Sunny Mild Normal Strong Yes

 Entropy(S) = 0.971
 Entropy(A)[Wind](Strong) = 1
 Entropy(A)[Wind](Weak) = 0.918
 IE(Wind) = 0.951
 IG(Wind) = 0.020

 Pick the highest gain attribute, in this case, Humidity

Outlook
Sunny → Humidity   Overcast → Yes   Rain → ?
Humidity: Normal → Yes, High → No

 Outlook (Rain)
Outlook Temperature Humidity Wind PlayGolf
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Rain Mild Normal Weak Yes
Rain Mild High Strong No

 Entropy(S) = 0.971
 Entropy(A)[Temperature](Cool) = 1
 Entropy(A)[Temperature](Mild) = 0.918
 IE(Temperature) = 0.951
 IG(Temperature) = 0.020

 Outlook (Rain)
Outlook Temperature Humidity Wind PlayGolf
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Rain Mild Normal Weak Yes
Rain Mild High Strong No

 Entropy(S) = 0.971
 Entropy(A)[Humidity](High) = 1
 Entropy(A)[Humidity](Normal) = 0.918
 IE(Humidity) = 0.951
 IG(Humidity) = 0.020

 Outlook (Rain)
Outlook Temperature Humidity Wind PlayGolf
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Rain Mild Normal Weak Yes
Rain Mild High Strong No

 Entropy(S) = 0.971
 Entropy(A)[Wind](Weak) = 0
 Entropy(A)[Wind](Strong) = 0
 IE(Wind) = 0
 IG(Wind) = 0.971

 Pick the highest gain attribute, in this case, Wind

Outlook
Sunny → Humidity   Overcast → Yes   Rain → Wind
Humidity: Normal → Yes, High → No
Wind: Weak → Yes, Strong → No

 Use the final DT (ID3) to classify an unseen example
 Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong
 Output = No

 Shortcomings of ID3
 Information gain only measures the reduction in entropy due to the selection of a particular attribute
 It is biased toward attributes with a large number of distinct values, which might lead to overfitting
 The tree continues to go deeper and deeper (builds many branches) to reduce the training error, but this results in an increased test error
 Overfitting: the model fits the training data well but fails to generalize
 Underfitting: the model is too simple to find the patterns in the data

 Improving ID3
 Pruning is a mechanism that reduces the size and complexity of a DT by removing unnecessary nodes
 Pre-pruning stops the tree construction a bit early
 Do not split a node if its goodness measure is below a threshold value
 Post-pruning: once a DT is complete, cross-validation is performed to test whether expanding a node makes an improvement
 If it shows an improvement, continue expanding the node
 If it shows a reduction in accuracy, the node is converted to a leaf node
 To overcome the problems with information gain, the information gain ratio is used (C4.5)

 C4.5 is the improved version of ID3
 Creates more generalized models
 Works with continuous data
 Can handle missing data
 Avoids overfitting
 Also known as J48 (a Java implementation of C4.5 release 8 in the Weka tool)
 Uses the information gain ratio as the metric to split the dataset
 Information gain (used in ID3) tends to prefer attributes with more categories
 Splitting on such attributes yields subsets with lower entropy
 This results in overfitting
 Gain ratio mitigates this issue by penalising attributes with more categories
 It uses split information (or intrinsic information)

 Information gain ratio
 GainRatio(A) = Gain(A) / SplitInfo(A)
 Split information
 SplitInfo(A) = −∑ (|Dj|/|D|) . log2(|Dj|/|D|), summed over the partitions Dj of D
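
A minimal sketch of these two formulas, reusing the entropy()/information_gain() helpers from the ID3 snippets (names illustrative):

```python
from collections import Counter
from math import log2

def split_info(examples, attribute):
    """SplitInfo(A): -sum over values j of (|Dj|/|D|) * log2(|Dj|/|D|)."""
    total = len(examples)
    counts = Counter(features[attribute] for features, _ in examples)
    return -sum((n / total) * log2(n / total) for n in counts.values())

def gain_ratio(examples, attribute):
    """GainRatio(A) = Gain(A) / SplitInfo(A)."""
    return information_gain(examples, attribute) / split_info(examples, attribute)
```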

 Example dataset, 14 instances, 4 input attributes
No. Outlook Temperature Humidity Wind PlayGolf
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Rain Mild High Strong No

 Split information for the Outlook attribute
 Sunny = 5, Overcast = 4, Rain = 5
 SplitInfo(Outlook) = −(5/14).log2(5/14) − (4/14).log2(4/14) − (5/14).log2(5/14) = 1.577
 GainRatio(Outlook) = 0.247/1.577 = 0.156

 The entropy of the whole dataset, the Outlook attribute entropy, and the information gain of Outlook were already calculated (ID3)
 Entropy(S) = 0.940
 IE(Outlook) = 0.693
 IG(Outlook) = 0.940 − 0.693 = 0.247
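
Reusing the examples list built in the ID3 section, the sketches above reproduce these values:

```python
print(round(split_info(examples, "Outlook"), 3))  # prints 1.577
print(round(gain_ratio(examples, "Outlook"), 3))  # prints 0.156
```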

 Gain ratio for Temperature attribute
 Hot = 4, Mild = 6, Cool = 4
 SplitInfo(Temperature) = - (4/14).log2(4/14) - (6/14).log2(6/14) - (4/14).log2(4/14) = 1.556
 GainRatio(Temperature) = 0.029/1.556 = 0.018
 Gain ratio for Humidity attribute
 High = 7, Normal = 7
 SplitInfo(Humidity) = - (7/14).log2(7/14) - (7/14).log2(7/14) = 1
 GainRatio(Humidity) = 0.152/1 = 0.152
 Gain ratio for Wind attribute
 Weak = 8, Strong = 6
 SplitInfo(Wind) = - (8/14).log2(8/14) - (6/14).log2(6/14) = 0.985
 GainRatio(Wind) = 0.048/0.985 = 0.048

 Gain ratio of Outlook is the highest, so it will be the root node

Outlook
Sunny → ?   Overcast → Yes   Rain → ?

Outlook (Sunny) subset:
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Sunny Mild Normal Strong Yes

Outlook (Rain) subset:
Outlook Temperature Humidity Wind PlayGolf
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Rain Mild Normal Weak Yes
Rain Mild High Strong No

 Outlook (Sunny)
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Sunny Mild Normal Strong Yes

 GainRatio(Temperature) = 0.571/1.521 = 0.375
 GainRatio(Humidity) = 0.971/0.971 = 1
 GainRatio(Wind) = 0.020/0.971 = 0.021

 Outlook (Rain)
Outlook Temperature Humidity Wind PlayGolf
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Rain Mild Normal Weak Yes
Rain Mild High Strong No

 GainRatio(Temperature) = 0.020/0.971 = 0.021
 GainRatio(Humidity) = 0.020/0.971 = 0.021
 GainRatio(Wind) = 0.971/0.971 = 1

 Final DT using C4.5

Outlook
Sunny → Humidity   Overcast → Yes   Rain → Wind
Humidity: Normal → Yes, High → No
Wind: Weak → Yes, Strong → No

 Use the final DT (C4.5) to classify an unseen example
 Outlook = Rain, Temperature = Cool, Humidity = High, Wind = Weak
 Output = Yes

 Some drawbacks of C4.5
 Split information is higher for multi-valued attributes (more outcomes)
 It tends to prefer unbalanced splits in which one partition is much smaller than the others
 Classification And Regression Tree (CART) uses the gini index as its metric
 If a dataset D contains examples from n classes, the gini index is defined as
 Gini(D) = 1 − Σ (pi)^2, for i = 1 to n (number of classes)
 CART creates a binary tree
 If an attribute has more than two outcomes, its values are grouped into two subsets (D1, D2) and the gini index of the split is
 GiniA(D) = (|D1|/|D|) . Gini(D1) + (|D2|/|D|) . Gini(D2)
 Reduction in impurity
 Gini(A) = Gini(D) − GiniA(D)
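
A hedged sketch of these gini computations (illustrative names, same conventions as the earlier snippets):

```python
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum over classes i of p_i^2."""
    total = len(labels)
    return 1 - sum((n / total) ** 2 for n in Counter(labels).values())

def gini_split(d1_labels, d2_labels):
    """Weighted gini index of a binary partition D1 | D2."""
    total = len(d1_labels) + len(d2_labels)
    return (len(d1_labels) / total) * gini(d1_labels) \
         + (len(d2_labels) / total) * gini(d2_labels)
```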

 Example dataset, 14 instances, 4 input attributes
No. Outlook Temperature Humidity Wind PlayGolf
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Rain Mild High Strong No

 Total 14 examples, 9 positive, 5 negative
 Gini(D) = 1 − ((9/14)^2 + (5/14)^2) = 0.459
 Compute the gini index of each attribute
 Start with Outlook (Sunny, Overcast, Rain)
 The attribute has three values, so it will have 6 subsets
 {(Sunny, Overcast), (Overcast, Rain), (Sunny, Rain), (Sunny), (Overcast), (Rain)}
 The empty and full subsets are not used
 Gini(S,O), R = (9/14) x [1 − ((6/9)^2 + (3/9)^2)] + (5/14) x [1 − ((3/5)^2 + (2/5)^2)] = 0.457
 Gini(O,R), S = (9/14) x [1 − ((7/9)^2 + (2/9)^2)] + (5/14) x [1 − ((2/5)^2 + (3/5)^2)] = 0.393
 Gini(S,R), O = (10/14) x [1 − ((5/10)^2 + (5/10)^2)] + (4/14) x [1 − ((4/4)^2 + (0/4)^2)] = 0.357
 Gini(A) = 0.459 − 0.357 = 0.102
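
For instance, the (Sunny, Rain) | (Overcast) partition above can be checked with the sketch: ten examples with 5 Yes / 5 No against four examples that are all Yes:

```python
print(round(gini(["Yes"] * 9 + ["No"] * 5), 3))                     # prints 0.459
print(round(gini_split(["Yes"] * 5 + ["No"] * 5, ["Yes"] * 4), 3))  # prints 0.357
```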

 Temperature (Hot, Mild, Cool)
 The attribute has three values, so it will have 6 subsets
 {(Hot, Mild), (Hot, Cool), (Mild, Cool), (Hot), (Mild), (Cool)}
 Gini(H,M), C = (10/14) x [1 − ((6/10)^2 + (4/10)^2)] + (4/14) x [1 − ((3/4)^2 + (1/4)^2)] = 0.450
 Gini(H,C), M = (8/14) x [1 − ((5/8)^2 + (3/8)^2)] + (6/14) x [1 − ((4/6)^2 + (2/6)^2)] = 0.458
 Gini(M,C), H = (10/14) x [1 − ((7/10)^2 + (3/10)^2)] + (4/14) x [1 − ((2/4)^2 + (2/4)^2)] = 0.442
 Gini(A) = 0.459 − 0.442 = 0.016

 Humidity (High, Normal)
 The attribute has only two values
 Gini(H, N) = (7/14) x [1 − ((6/7)^2 + (1/7)^2)] + (7/14) x [1 − ((3/7)^2 + (4/7)^2)] = 0.367
 Gini(A) = 0.459 − 0.367 = 0.092

 Wind (Weak, Strong)
 The attribute has only two values
 Gini(W, S) = (8/14) x [1 − ((6/8)^2 + (2/8)^2)] + (6/14) x [1 − ((3/6)^2 + (3/6)^2)] = 0.428
 Gini(A) = 0.459 − 0.428 = 0.031

 The attribute with the highest impurity reduction, Gini(A), is Outlook; hence, it is chosen as the root node
 Within Outlook, the partition [(Sunny, Rain), (Overcast)] [Gini(S,R), O] has the lowest gini index, so it defines the binary split

 Partial DT using CART

Outlook
(Sunny, Rain) → ?   Overcast → Yes

Outlook (Sunny, Rain) subset:
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Rain Mild High Strong No

 Calculate the gini index for the following subset Outlook (Sunny, Rain)

Outlook Temperature Humidity Wind PlayGolf


Sunny Hot High Weak No
Sunny Hot High Strong No
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Rain Mild High Strong No

 Information Gain: biased toward high-branching features (many distinct values)
 Gain Ratio: prefers splits in which some partitions are much smaller than the others
 Gini Index: balanced around 0.5 (its maximum for a two-class split, reached when the classes are evenly mixed)

 C4.5 with continuous (numeric) data
 Example dataset, 14 instances, 4 input attributes, 2 attributes with continuous data
No. Outlook Temperature Humidity Wind PlayGolf
1 Sunny 85 85 Weak No
2 Sunny 80 90 Strong No
3 Overcast 83 78 Weak Yes
4 Rain 70 96 Weak Yes
5 Rain 68 80 Weak Yes
6 Rain 65 70 Strong No
7 Overcast 64 65 Strong Yes
8 Sunny 72 95 Weak No
9 Sunny 69 70 Weak Yes
10 Rain 75 80 Weak Yes
11 Sunny 75 70 Strong Yes
12 Overcast 72 90 Strong Yes
13 Overcast 81 75 Weak Yes
14 Rain 71 80 Strong No

 Outlook and Wind are nominal attributes
 Gain ratio for Wind = 0.048
 Gain ratio for Outlook = 0.156
 Humidity and Temperature are continuous attributes
 Convert continuous values to nominal ones
 Perform binary split based on a threshold value
 Threshold should be a value which offers maximum gain for an attribute

 Separate the dataset into two parts
 Instances less than or equal to the threshold (<=)
 Instances greater than the threshold (>)
 How?
 Sort the attribute values in ascending order
 Calculate the gain ratio for every candidate value
 The value which maximizes the gain is chosen as the threshold (separator)
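
A minimal sketch of this threshold search, reusing entropy() from the earlier snippets; it scores each candidate by information gain, as the slides do (names illustrative). Applied to the Humidity column on the next slide, it returns a threshold of 80:

```python
def best_threshold(values, labels):
    """Return (threshold, gain) maximizing the gain of a <= / > binary split."""
    base = entropy(labels)
    best_gain, best_t = -1.0, None
    for t in sorted(set(values))[:-1]:   # the largest value would leave the '>' side empty
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        gain = base - (len(left) / len(labels)) * entropy(left) \
                    - (len(right) / len(labels)) * entropy(right)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain
```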

 Sort the Humidity values smallest to largest
Humidity PlayGolf

65 Yes
70 No
70 Yes
70 Yes
75 Yes
78 Yes
80 Yes
80 Yes
80 No
85 No
90 No
90 Yes
95 No
96 Yes

 Humidity (65)
 Entropy(Humidity<=65) = 0 (the single instance with Humidity <= 65 is pure)
 Entropy(Humidity>65) = −(5/13).log2(5/13) − (8/13).log2(8/13) = 0.961
 Gain(Humidity<=,>65) = 0.940 − (1/14).0 − (13/14).(0.961) = 0.048
 SplitInfo(Humidity<=,>65) = −(1/14).log2(1/14) − (13/14).log2(13/14) = 0.371
 GainRatio(Humidity<=,>65) = 0.126
 Humidity (70)
 Entropy(Humidity<=70) = – (1/4).log2(1/4) – (3/4).log2(3/4) = 0.811
 Entropy(Humidity>70) = – (4/10).log2(4/10) – (6/10).log2(6/10) = 0.970
 Gain(Humidity<=,> 70) = 0.940 – (4/14).(0.811) – (10/14).(0.970) = 0.014
 SplitInfo(Humidity<=,> 70) = -(4/14).log2(4/14) -(10/14).log2(10/14) = 0.863
 GainRatio(Humidity<=,> 70) = 0.016

 GainRatio(Humidity<=,> 75) = 0.047
 GainRatio(Humidity <=,> 78) = 0.090
 GainRatio(Humidity <=,> 80) = 0.107
 GainRatio(Humidity <=,> 85) = 0.027
 GainRatio(Humidity <=,> 90) = 0.016
 GainRatio(Humidity <=,> 95) = 0.128

 No gain ratio is calculated for Humidity (96) because no instance has a value greater than it, so the split would be trivial
 Gain is maximum when the threshold is Humidity (80)

 Apply the same process to Temperature as its values are continuous too
 Gain is maximum when the threshold is Temperature (83)
 GainRatio(Temperature<=,>83) = 0.305
 The gain ratio for all the attributes is summarized in the following table

Attribute            GainRatio
Wind                 0.048
Outlook              0.156
Humidity <=,>80      0.107
Temperature <=,>83   0.305

 Temperature will be the root node as it has the highest gain ratio value
 Can you build the complete DT?

 Famous DT implementations
 CHAID (1980)
 CART (1984)
 ID3 (1986)
 C4.5 (1993)

 CHAID (CHi-square Automatic Interaction Detection)
 Uses chi-square tests to find the most dominant feature
 It checks whether there is a relationship between two variables and chooses the independent variable that has the strongest interaction with the dependent variable
 Chi-square value of a cell = √((y − y')^2 / y'), where y is the actual and y' the expected value

 How to construct a DT using CHAID?
 Find the most dominant feature in the dataset (here, chi-square is computed on the Yes/No PlayGolf decisions for the same 14 instances)
No. Outlook Temperature Humidity Wind Hour-Played
1 Sunny Hot High Weak 25
2 Sunny Hot High Strong 30
3 Overcast Hot High Weak 46
4 Rain Mild High Weak 45
5 Rain Cool Normal Weak 52
6 Rain Cool Normal Strong 23
7 Overcast Cool Normal Strong 43
8 Sunny Mild High Weak 35
9 Sunny Cool Normal Weak 38
10 Rain Mild Normal Weak 46
11 Sunny Mild Normal Strong 48
12 Overcast Mild High Strong 52
13 Overcast Hot Normal Weak 44
14 Rain Mild High Strong 30

 Outlook
 3 possible values (Sunny, Rain, and Overcast)
 2 decisions (Yes and No)
 Chi-square(Yes, Sunny) = √((2 − 2.5)^2 / 2.5) = 0.316

Value     Yes  No  Total  Expected  Chi-square (Yes)  Chi-square (No)
Sunny     2    3   5      2.5       0.316             0.316
Rain      3    2   5      2.5       0.316             0.316
Overcast  4    0   4      2         1.414             1.414

 Chi-square (Outlook) = 0.316 + 0.316 + 0.316 + 0.316 + 1.414 + 1.414 = 4.092
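
A small sketch reproducing this computation; the expected counts follow the slides' convention of an even Yes/No split per attribute value (names illustrative; the slide's 4.092 is the same value truncated):

```python
from math import sqrt

def chaid_chi_square(counts):
    """Sum sqrt((observed - expected)^2 / expected) over the Yes/No cells."""
    total_chi = 0.0
    for yes, no in counts:
        expected = (yes + no) / 2        # even split assumed, as in the table
        total_chi += sqrt((yes - expected) ** 2 / expected)
        total_chi += sqrt((no - expected) ** 2 / expected)
    return total_chi

print(round(chaid_chi_square([(2, 3), (3, 2), (4, 0)]), 3))  # prints 4.093
```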

 Outlook = 0.316+0.316+1.414+1.414+0.316+0.316 = 4.092
 Temperature = 0 + 0 + 0.577 + 0.577 + 0.707 + 0.707 = 2.569
 Humidity = 0.267 + 0.267 + 1.336 + 1.336 = 3.207
 Wind = 0.802 + 0.802 + 0 + 0 = 1.604

 Outlook has the highest chi-square value (most significant feature) and will be
the root node
 Can you build the complete DT?

 How to construct a DT when the output attribute is a numeric value?
 Regression problems are solved by using the metric ‘standard deviation’
No. Outlook Temperature Humidity Wind Hour-Played
1 Sunny Hot High Weak 25
2 Sunny Hot High Strong 30
3 Overcast Hot High Weak 46
4 Rain Mild High Weak 45
5 Rain Cool Normal Weak 52
6 Rain Cool Normal Strong 23
7 Overcast Cool Normal Strong 43
8 Sunny Mild High Weak 35
9 Sunny Cool Normal Weak 38
10 Rain Mild Normal Weak 46
11 Sunny Mild Normal Strong 48
12 Overcast Mild High Strong 52
13 Overcast Hot Normal Weak 44
14 Rain Mild High Strong 30

 Regression problems are solved by using the metric ‘standard deviation’
 Hours-Played = {25, 30, 46, 45, 52, 23, 43, 35, 38, 46, 48, 52, 44, 30}
 Average = 39.78
 Standard deviation = 9.32

 Outlook (SD of Hours-Played per value)
 Overcast = 3.49
 Rain = 10.87
 Sunny = 7.78
 Weighted SD (Outlook) = (4/14) x 3.49 + (5/14) x 10.87 + (5/14) x 7.78 = 7.66
 SD reduction (Outlook) = 9.32 − 7.66 = 1.66
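
A short sketch of the standard-deviation-reduction metric, using Python's statistics.pstdev (population SD, which matches the slide's numbers); the group values are copied from the dataset above:

```python
from statistics import pstdev

def sd_reduction(groups):
    """SD of all target values minus the size-weighted SD of each value group."""
    everything = [v for group in groups for v in group]
    weighted = sum((len(g) / len(everything)) * pstdev(g) for g in groups)
    return pstdev(everything) - weighted

overcast = [46, 43, 52, 44]   # Outlook = Overcast
rain = [45, 52, 23, 46, 30]   # Outlook = Rain
sunny = [25, 30, 35, 38, 48]  # Outlook = Sunny
print(round(sd_reduction([overcast, rain, sunny]), 2))  # prints 1.66
```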

 Regression problems are solved by using the metric ‘standard deviation’
 SD reduction (Outlook) = 9.32 – 7.66 = 1.66
 SD reduction (Temperature) = 9.32 – 8.84 = 0.47
 SD reduction (Humidity) = 9.32 – 9.04 = 0.27
 SD reduction (Wind) = 9.32 – 9.03 = 0.29

 Outlook will be the root node as it has the highest SD reduction value
 Can you build the complete DT?

Thanks
