
For Classification Models

Attribute Selection Measures
There are two popular attribute selection measures:
1. Information gain
2. Gini index
Decision Tree : Using “Information Gain”
ID3 uses information gain as its attribute selection measure.
Remember... earlier slides:

This measure is based on pioneering work by Claude Shannon on information theory, which studied the value or "information content" of messages. J. Ross Quinlan, a researcher in machine learning, developed a decision tree algorithm known as ID3 (Iterative Dichotomiser).

The three decision tree algorithms ID3, C5.0, and CART adopt a greedy (i.e., non-backtracking) approach in which decision trees are constructed in a top-down, recursive, divide-and-conquer manner.
Case : Build an ML predictive model: "Who will buy a computer?"

    age          income  student  credit_rating  Class: buy_computer
1   youth        high    no       fair           no
2   youth        high    no       excellent      no
3   middle_aged  high    no       fair           yes
4   senior       medium  no       fair           yes
5   senior       low     yes      fair           yes
6   senior       low     yes      excellent      no
7   middle_aged  low     yes      excellent      yes
8   youth        medium  no       fair           no
9   youth        low     yes      fair           yes
10  senior       medium  yes      fair           yes
11  youth        medium  yes      excellent      yes
12  middle_aged  medium  no       excellent      yes
13  middle_aged  high    yes      fair           yes
14  senior       medium  no       excellent      no
Algorithm : "Information Gain" (using Entropy)

Step 1: Calculate Info(D) or Entropy(D), where D is the set of class labels of the dataset ("yes", "no").

Calculate the probability of each class (from the table above): yes = 9, no = 5, Total = 14.

Probability formula: p = favourable cases / total

Let p1 = P(yes) = 9/14 = 0.64
Let p2 = P(no) = 5/14 = 0.36
Step 1 (continued):

Let p1 = P(yes) = 9/14 = 0.64
Let p2 = P(no) = 5/14 = 0.36

Entropy(D) = -p1 * log2(p1) - p2 * log2(p2)

Use Excel → Calculate

Entropy(D) = 0.94
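The same Step-1 arithmetic can be checked in R instead of Excel; a minimal sketch, using the class counts from the table above:

# Step 1 in R: entropy of the class column (9 "yes", 5 "no")
p1 <- 9/14                          # P(buy_computer = "yes")
p2 <- 5/14                          # P(buy_computer = "no")
entropy_D <- -p1*log2(p1) - p2*log2(p2)
entropy_D                           # ~0.94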


Step 2: Calculate Info(Age) or Entropy(Age).

Partition the data by the three values of age and compute the entropy of each subset:

a) middle_aged: Total = 4, yes = 4, no = 0; p1 = 4/4 = 1.0, p2 = 0/4 = 0; Entropy = 0
b) senior:      Total = 5, yes = 3, no = 2; p1 = 3/5 = 0.6, p2 = 2/5 = 0.4; Entropy = 0.97
c) youth:       Total = 5, yes = 2, no = 3; p1 = 2/5 = 0.4, p2 = 3/5 = 0.6; Entropy = 0.97

Go to Excel → Calculate
Step 3: Since the Age attribute has THREE subsets, Info(Age) or Entropy(Age) is the weighted average of the three:

a) middle_aged: Total = 4, Entropy = 0
b) senior:      Total = 5, Entropy = 0.97
c) youth:       Total = 5, Entropy = 0.97
Grand total = 14

Entropy(Age) = (4/14)*0 + (5/14)*0.97 + (5/14)*0.97 = 0.69
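Steps 2 and 3 can also be checked in R; a small sketch of the per-subset entropies and their weighted average, using the counts from the partition above:

# Entropy of each age subset, then the weighted average
entropy <- function(yes, no) {
  p <- c(yes, no) / (yes + no)
  p <- p[p > 0]                     # treat 0 * log2(0) as 0
  -sum(p * log2(p))
}
n <- c(middle_aged = 4, senior = 5, youth = 5)
e <- c(entropy(4, 0), entropy(3, 2), entropy(2, 3))
entropy_age <- sum(n / sum(n) * e)  # ~0.69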


Decision Tree : Using "Information Gain"

Gain(age) = Entropy(D) - Entropy(age) = 0.94 - 0.69 = 0.25

Similarly:
Gain(income) = 0.03
Gain(student) = 0.15
Gain(credit_rating) = 0.05

Because age has the highest information gain among the attributes, it is selected as the splitting attribute.
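The same gains can be reproduced directly in R; a hedged sketch that assumes the 14-row table has been read into a data frame d with the column names shown above (buy_computer as the class column):

# Information gain of one attribute with respect to the class column
info_gain <- function(d, attr, target = "buy_computer") {
  ent <- function(y) {                        # entropy of a class vector
    p <- as.numeric(table(y)) / length(y)
    p <- p[p > 0]
    -sum(p * log2(p))
  }
  subsets <- split(d[[target]], d[[attr]])    # partition the classes by the attribute
  ent(d[[target]]) -
    sum(sapply(subsets, function(s) length(s) / nrow(d) * ent(s)))
}
sapply(c("age", "income", "student", "credit_rating"), info_gain, d = d)
# age should come out highest (~0.25), matching the hand calculation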
Attribute Selection Measures : Gini Index

The Gini index is the name of the cost function used to evaluate splits in the dataset. A Gini score gives an idea of how good a split is by how mixed the classes are in the two groups created by the split.

• The Gini index measures the impurity of D, a data partition.
[Figure: two impure sets and one pure set]

A perfect separation results in a Gini score of 0, whereas the worst-case split, which produces 50/50 classes in each group, results in a Gini score of 0.5 (for a 2-class problem).

Gini Index
Let p1 = probability of "yes" and p2 = probability of "no"; for a 2-class problem, Gini = 1 - p1^2 - p2^2.

Example (Income):
Income   yes  no  Total  p1   p2   Gini
High     4    0   4      1.0  0.0  0.00
Low      3    3   6      0.5  0.5  0.50
Medium   4    4   8      0.5  0.5  0.50
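A minimal R sketch of the same two-class Gini computation, with the counts from the table above:

# Gini impurity of a two-class group
gini <- function(yes, no) {
  p1 <- yes / (yes + no)
  p2 <- no  / (yes + no)
  1 - p1^2 - p2^2
}
gini(4, 0)   # 0.00  (pure group)
gini(3, 3)   # 0.50  (worst case for 2 classes)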
Case : Marketing Decision Tree using Gini Index

Data file: Decision-Tree-Buy-Computer.xlsx (the same 14-row "buy_computer" table shown earlier).

Objective: Find a decision tree to predict whether a customer will buy a computer or not (for given customer info.).
Gini Index for all variables

Let p1 = probability of "yes" and p2 = probability of "no" within each attribute value, then compute Gini = 1 - p1^2 - p2^2 for every value (go to Excel → calculate).

For age:
age          yes  no  Total  p1   p2   Gini
middle_aged  4    0   4      1.0  0.0  0.00
senior       3    2   5      0.6  0.4  0.48
youth        2    3   5      0.4  0.6  0.48
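The per-value scores are then combined into one weighted Gini for the attribute; a quick check in R, with the weights taken from the table above:

# Weighted Gini for the age attribute
n <- c(middle_aged = 4, senior = 5, youth = 5)
g <- c(0, 0.48, 0.48)
sum(n / sum(n) * g)   # ~0.34, the value used for root-node selection on the next slide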
Gini Index for "Root node"

Calculate the Gini index for ALL attributes:
• Age = 0.34
• Income = 0.44
• Student = 0.37
• Credit_rating = 0.43

"Age" has the lowest cost (Gini), so split on "Age" at the root node.
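As with information gain, these four numbers can be reproduced in R; a hedged sketch, again assuming the 14-row table is in a data frame d:

# Weighted Gini index of one attribute (lower is better)
gini_index <- function(d, attr, target = "buy_computer") {
  g <- function(y) {                          # Gini impurity of a class vector
    p <- as.numeric(table(y)) / length(y)
    1 - sum(p^2)
  }
  subsets <- split(d[[target]], d[[attr]])
  sum(sapply(subsets, function(s) length(s) / nrow(d) * g(s)))
}
sapply(c("age", "income", "student", "credit_rating"), gini_index, d = d)
# age should come out lowest (~0.34), so it is chosen for the root split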
Age at "Root node"

Split on age:
• middle_aged → all 4 records are "yes", so this branch becomes a leaf (yes).
• senior / youth → the classes are mixed, so this branch is split further.

Gini for the candidate attributes within the senior/youth branch:
• Income = 0.27
• Student = 0.23
• Credit_rating = 0.30
Age at "Root node" + further split

Student has the lowest Gini (0.23) in the senior/youth branch, so it is used for the next split:
• age = middle_aged → yes
• age = senior/youth → split on student
  - student = no → split further; Gini: Income = 0.09, age = 0.07, Credit_rating = 0.09
  - student = yes → split further; Gini: Income = 0.09, age = 0.09, Credit_rating = 0.07
Age at "Root node" + further split (continued)

In the student = no branch, age has the lowest Gini (0.07), so split on age:
• youth → no
• senior → split on credit_rating: excellent → no, fair → yes

In the student = yes branch the Gini values are Income = 0.09, age = 0.09, Credit_rating = 0.07, so credit_rating is chosen for the next split.
Final decision tree:
• age = middle_aged → yes
• age = senior/youth → split on student
  - student = no → split on age
    • youth → no
    • senior → split on credit_rating: excellent → no, fair → yes
  - student = yes → split on credit_rating
    • fair → yes
    • excellent → split on age: senior → no, youth → yes
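The same tree can be grown programmatically; a hedged sketch with rpart, again assuming the 14-row table is in a data frame d (the control settings are relaxed only because this toy data set is so small):

# Grow a Gini-based classification tree on the buy_computer table
library(rpart)
d[] <- lapply(d, factor)                     # rpart expects factors for a classification tree
fit <- rpart(buy_computer ~ age + income + student + credit_rating,
             data = d, method = "class",
             parms = list(split = "gini"),
             control = rpart.control(minsplit = 2, minbucket = 1, cp = 0))
print(fit)                                   # compare the splits with the hand-built tree above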
Decision Tree : Case -Telecom Customer churn
Customer Attrition
Customer attrition, also known as customer churn, customer turnover, or
customer defection, is the loss of clients or customers.

Telephone service companies, Internet service providers, pay TV companies, insurance firms, and alarm monitoring services often use customer attrition analysis and customer attrition rates as one of their key business metrics, because the cost of retaining an existing customer is far less than that of acquiring a new one.

Companies from these sectors often have customer service branches which
attempt to win back defecting clients, because recovered long-term customers
can be worth much more to a company than newly recruited clients.
Decision Tree : Case -Telecom Customer churn
Dataset : Telco Customer Churn

• Each row represents a customer; each column contains a customer attribute, described in the column metadata.

• The raw data contains 7043 rows (customers) and 20 columns (features).

• The "Churn" column is our target.
Decision Tree : Case -Telecom Customer churn

df <- read.csv(file.choose(), header = T,
               stringsAsFactors = TRUE)      # keep character columns as factors (needed for revalue/summary below; R 4.x defaults to FALSE)
str(df)
# data cleaning (based on str output)
df$customerID <- NULL                        # was churn$customerID; the data frame here is named df
df$SeniorCitizen <- as.integer(df$SeniorCitizen)
df$SeniorCitizen <- as.factor(df$SeniorCitizen)
str(df$SeniorCitizen)
Decision Tree : Case -Telecom Customer churn
str(df)                                      # was str(churn); the data frame is named df
# data cleaning: recode "No internet service" to "No"
library(plyr)

df$OnlineBackup <- revalue(df$OnlineBackup,
                           c("No internet service" = "No"))
summary(df$OnlineBackup)
Decision Tree : Case -Telecom Customer churn
df$OnlineMovies <- revalue(df$OnlineMovies, c("No internet service" = "No"))
df$OnlineTV <- revalue(df$OnlineTV, c("No internet service" = "No"))
df$TechnicalHelp <- revalue(df$TechnicalHelp, c("No internet service" = "No"))
df$DeviceProtectionService <- revalue(df$DeviceProtectionService,
                                      c("No internet service" = "No"))
df$OnlineSecurity <- revalue(df$OnlineSecurity, c("No internet service" = "No"))

df$MultipleConnections <- revalue(df$MultipleConnections,
                                  c("No phone service" = "No"))

summary(df)                                  # was Summary(df); R function names are case-sensitive
Decision Tree : Case -Telecom Customer churn

# check for NAs
sapply(df, function(x) sum(is.na(x)))        # was sapply(churn, ...); the data frame is named df


# EDA
library(gmodels)
CrossTable(df$Churn,df$gender,
prop.chisq = FALSE,
prop.c = F,
prop.t = F,
chisq = T)

dev.new()
boxplot(df$tenure~df$Churn)
# Model - starts

set.seed(123)
rno <- sample(nrow(df), nrow(df) * 0.7)      # 70% of rows for training

trn <- df[rno, ]
tst <- df[-rno, ]

library(rpart)
library(rpart.plot)

dtree1 <- rpart(Churn ~ ., data = trn,
                method = 'class')

# Tree plot
library(rattle)

dev.new()
fancyRpartPlot(dtree1, type = 3)

# Predict & Confusion Matrix
tst$predProb <- predict(dtree1, newdata = tst)[, 'Yes']   # probability of class "Yes"
str(trn$Churn)

tst$pred <- ifelse(tst$predProb > 0.5, 'Yes', 'No')
str(tst$pred)
tst$pred <- factor(tst$pred, levels = levels(tst$Churn))  # align factor levels with the reference

library(caret)
confusionMatrix(tst$pred, tst$Churn)
Random Forest - Ensemble Method

Ensemble learning helps improve machine learning results by combining several models; this approach yields better predictive performance than any single model.
Bagging (Bootstrap Sampling) in RF

Randomly draw datasets with replacement from the original data, each sample the same size as the training set.

Original data set: 1000 records
• Training data set (70%): 700 records
• Testing data set (30%): 300 records

Each bootstrap training sample (70% of the original data, 700 records) is used to grow one decision tree:
• Bootstrap Training Sample-1 → Decision Tree
• Bootstrap Training Sample-2 → Decision Tree
• Bootstrap Training Sample-3 → Decision Tree
• Bootstrap Training Sample-4 → Decision Tree
... and so on.
Many trees grown randomly → Random Forest

Each bootstrap random sample (70% of the original data, 700 records) grows its own decision tree. The forest is a committee of trees: each tree casts a vote for the predicted class, and the majority vote becomes the prediction.
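To make the "committee of trees" idea concrete, a toy sketch of bagging by hand in R (randomForest does all of this internally; trn and tst are the training/testing splits from the earlier slides):

# Hand-rolled bagging: bootstrap samples with replacement, one tree each, then a majority vote
set.seed(1)
votes <- replicate(5, {
  boot <- trn[sample(nrow(trn), replace = TRUE), ]            # bootstrap sample, same size as trn
  tree <- rpart::rpart(Churn ~ ., data = boot, method = "class")
  as.character(predict(tree, newdata = tst, type = "class"))
})
tst$bagPred <- apply(votes, 1, function(v) names(which.max(table(v))))   # majority vote per row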
# Random Forest
library(randomForest)
rf <- randomForest(Churn ~ ., data = trn)
print(rf)

tst$rfPredProb <- predict(rf, newdata = tst, type = 'prob')[, 'Yes']   # probability of "Yes"

rfPred <- predict(rf, newdata = tst)

confusionMatrix(rfPred, tst$Churn)
Model Evaluation :
Decision Tree vs. Random Forest

• Accuracy of RF > accuracy of DT → RF is the better model.

• But the threshold used was 0.50 for 'Yes' vs. 'No'.

• To check accuracy at various thresholds (0 to 1), plot the ROC curve and calculate the AUC (Area Under the Curve).
ROC Curve : Various Thresholds

Each threshold produces its own confusion matrix and hence its own TPR and FPR. In general:

          Ref.pos            Ref.neg
Pred.pos  TP                 FP
Pred.neg  FN                 TN
          TPR = TP/(TP+FN)   FPR = FP/(FP+TN)

Three example thresholds:

          Ref.pos   Ref.neg
Pred.pos  588       94
Pred.neg  112       206
Total     700       300
          TPR=0.84  FPR=0.31

          Ref.pos   Ref.neg
Pred.pos  647       120
Pred.neg  53        180
Total     700       300
          TPR=0.92  FPR=0.40

          Ref.pos   Ref.neg
Pred.pos  684       202
Pred.neg  16        98
Total     700       300
          TPR=0.98  FPR=0.67

The ROC curve plots TPR (True Positive Rate) on the y-axis against FPR (False Positive Rate) on the x-axis across all thresholds.
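A short sketch of how one (TPR, FPR) point is produced, using the random forest probabilities computed earlier and an arbitrary threshold of 0.3:

# Confusion counts and rates at a chosen threshold
thr  <- 0.3
pred <- ifelse(tst$rfPredProb > thr, 'Yes', 'No')
TP <- sum(pred == 'Yes' & tst$Churn == 'Yes'); FP <- sum(pred == 'Yes' & tst$Churn == 'No')
FN <- sum(pred == 'No'  & tst$Churn == 'Yes'); TN <- sum(pred == 'No'  & tst$Churn == 'No')
c(TPR = TP / (TP + FN), FPR = FP / (FP + TN))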


# ROC curve & AUC
library(pROC)                                # plot.roc() is provided by the pROC package

dev.new()
plot.roc(tst$Churn, tst$predProb,
         print.auc = T, main = "Decision Tree")

dev.new()
plot.roc(tst$Churn, tst$rfPredProb,
         print.auc = T, main = "Random Forest")
Model Evaluation :
Decision Tree vs. Random Forest

1. Accuracy of RF > DT → RF is the better model.
2. AUC of RF > DT → RF is the better model.