5 - Predictive Modeling Using Decision Trees
What is prediction?
In data science, prediction means estimating an unknown value. Machine
learning typically works with historical data: models are built and
tested using events from the past.
Predictive models for credit scoring estimate the likelihood that a
potential customer will default (become a write-off). Predictive models
for spam filtering estimate whether a given piece of email is spam.
Predictive models for fraud detection judge whether an account has
been defrauded. The key is that the model is intended to be used to
estimate an unknown value.
Models, Induction, and Prediction
Supervised learning is the creation of a model that describes a
relationship between a set of selected variables (attributes or features)
and a predefined variable called the target variable.
An instance or example represents a fact or a data point (a row in the data set).
Target – the column that gives the class for each instance. Also called the
label or dependent variable.
Let's say that for the churn problem we want to create a supervised
learning model that divides (classifies) the data into segments such as
high risk, medium risk, and low risk.
In the churn example, what variable gives us the most information about
the future churn rate of the population? Being a professional? Age?
Place of residence? Income? Number of complaints to customer service?
Amount of overage charges?
Selecting Informative Attributes
When we split on an attribute, each resulting group should, ideally,
contain only one target value.
Such a group is called "pure".
If every member of a group has the same value for the target, then the
group is pure. If there is at least one member of the group that has a
different value for the target variable than the rest of the group, then the
group is impure.
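As a minimal Python sketch (the function name and the example label lists are ours, invented for illustration), purity can be checked directly:

```python
def is_pure(labels):
    """A group is pure when every member shares the same target value."""
    return len(set(labels)) <= 1

# A group where everyone churned is pure; a mixed group is not.
print(is_pure(["churn", "churn", "churn"]))     # True
print(is_pure(["churn", "no churn", "churn"]))  # False
```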
Not all attributes are binary; many attributes have three or more
distinct values. We must also take into account that one attribute may
split the data into two groups while another splits it into three, or
seven. How do we compare these splits?
For classification problems, we can address these issues by defining a
formula that evaluates how well each attribute splits a set of examples
into segments. Such a formula is based on a purity measure; the one used
here is entropy.
entropy = − p+ log2(p+) − p− log2(p−), where p+ is the proportion of
positive instances, p− is the proportion of negative instances, and
p+ = 1 − p−.
Starting with a set of all negative instances (p+ = 0), the set has
minimal disorder (it is pure) and the entropy is zero.
If we start to switch class labels of elements of the set from − to +,
the entropy increases. Entropy is maximized at 1 when the classes are
perfectly balanced and p+ = p− = 0.5 (e.g., five of each in a
ten-element set).
As more class labels are switched, the + class starts to predominate and
the entropy decreases again. When all instances are positive, p+ = 1 and
the entropy is minimal again at zero.
Entropy
Example:
Consider a set S of 10 people, seven of the non-write-off class and
three of the write-off class, so p(non-write-off) = 0.7 and
p(write-off) = 0.3. Then
entropy(S) = −0.7 × log2(0.7) − 0.3 × log2(0.3) ≈ 0.88.
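The computation for set S can be checked with a short Python sketch (the `entropy` helper is ours, not from the text; it takes the per-class counts):

```python
import math

def entropy(counts):
    """Entropy of a set, given the count of each class."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total)
                for c in counts if c > 0)

# Set S: 7 non-write-offs and 3 write-offs.
print(round(entropy([7, 3]), 3))  # 0.881
```

Note the guard `if c > 0`: an empty class contributes zero to the sum, matching the convention 0 × log2(0) = 0.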
Information Gain
Entropy tells us how impure a set is.
Information gain is a function of both a parent set and of the children
sets resulting from some partitioning of the parent set on an attribute.
Information gain for a split is the entropy of the parent minus the
weighted average of the entropies of the children:
IG(parent, children) = entropy(parent) − Σi p(ci) × entropy(ci),
where p(ci) is the proportion of instances falling into child ci.
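Continuing with set S (7 non-write-offs, 3 write-offs), here is a sketch of the calculation in Python; the particular split into children is invented for illustration:

```python
import math

def entropy(counts):
    """Entropy of a set, given the count of each class."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total)
                for c in counts if c > 0)

def information_gain(parent, children):
    """Parent entropy minus the weighted average of child entropies."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical split of S: child 1 gets 6 non-write-offs and 1 write-off,
# child 2 gets 1 non-write-off and 2 write-offs.
gain = information_gain([7, 3], [[6, 1], [1, 2]])
print(round(gain, 3))  # 0.192
```

The split reduces impurity, so the gain is positive; a split that left both children as mixed as the parent would have a gain near zero.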
Splitting a class example
Consider a data set containing loan defaulters (dots) and non-defaulters
(stars), plotted along a numeric attribute.
The question: how do we pick the right splitting point, or threshold?
Conceptually, we can try every reasonable split point and choose the one
that gives the highest information gain. In practice, candidate
thresholds are usually the midpoints between adjacent values of the
sorted attribute.
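A minimal sketch of that search in Python (the data values, labels, and function names are made up for illustration; candidate thresholds are taken as midpoints between adjacent distinct sorted values):

```python
import math

def entropy(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return -sum((labels.count(c) / total) * math.log2(labels.count(c) / total)
                for c in set(labels))

def best_split(values, labels):
    """Return the (threshold, gain) pair with the highest information gain."""
    pairs = sorted(zip(values, labels))
    parent = entropy(labels)
    best = (None, -1.0)
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue  # no threshold can separate identical values
        t = (v1 + v2) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        gain = parent - (len(left) / len(pairs)) * entropy(left) \
                      - (len(right) / len(pairs)) * entropy(right)
        if gain > best[1]:
            best = (t, gain)
    return best

# Made-up example: defaulters ("dot") cluster at higher attribute values.
values = [10, 20, 30, 40, 50, 60]
labels = ["star", "star", "star", "dot", "dot", "dot"]
print(best_split(values, labels))  # (35.0, 1.0)
```

Here the split at 35.0 separates the classes perfectly, so the gain equals the parent entropy of 1.0; real data rarely allows a perfectly pure split.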