• A decision tree is a structure that includes a root node, branches, and leaf nodes. Each internal node denotes a test on an attribute, each branch denotes the outcome of a test, and each leaf node holds a class label. The topmost node in the tree is the root node.

What are Decision Trees?
• A decision tree is a tree-like structure that is used as a model for classifying data. A decision tree decomposes the data into sub-trees made of other sub-trees and/or leaf nodes.
• A decision tree is made up of three types of nodes:
– Decision Nodes: these nodes have two or more branches.
– Root Node: also a decision node, but at the topmost level.
– Leaf Nodes: the lowest nodes, which represent the final decisions.
• Topics: the ID3 Algorithm; Entropy and Information Gain.
• Consider the table below. It represents factors that affect whether John would go out to play golf or not (the attributes are Outlook, Temperature, Humidity and Windy, and the decision column is Play Golf). Using the data in the table, build a decision tree with the ID3 algorithm that can predict whether John will play golf.

Step by Step Procedure for Building a Decision Tree (ID3 Algorithm)
Step 1: Determine the Decision Column
• Since decision trees are used for classification, you need to determine the classes that form the basis for the decision. In this case it is the last column, the Play Golf column, with classes Yes and No.
• To determine the root node we need to compute the entropy. To do this, we create a frequency table for the classes (the Yes/No column).

Step 2: Calculate the Entropy for the Classes (Play Golf)
• In this step, you calculate the entropy of the Play Golf column. With 9 Yes and 5 No out of 14 records:

Entropy(Play Golf) = E(9, 5) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.94

Step 3: Calculate the Entropy for the Other Attributes After the Split
• For the other four attributes, we need to calculate the entropy after each split:
– E(Play Golf, Outlook)
– E(Play Golf, Temperature)
– E(Play Golf, Humidity)
– E(Play Golf, Windy)
• The entropy for two variables is calculated using the formula:

E(T, X) = Σ over c in X of P(c) · E(c)

where T is the target (Play Golf), X is the splitting attribute, and E(c) is the entropy of the classes within the subset where X = c.
• Therefore, to calculate E(Play Golf, Outlook), we would use the formula above, which instantiated for Outlook is:

E(Play Golf, Outlook) = P(Sunny)·E(3, 2) + P(Overcast)·E(4, 0) + P(Rainy)·E(2, 3)

• The easiest way to approach this calculation is to create a frequency table for the two variables, Play Golf and Outlook.
• This frequency table is given below:
Frequency Table for Outlook

Outlook     Yes   No   Total
Sunny        3     2     5
Overcast     4     0     4
Rainy        2     3     5
Total        9     5    14
• Using this table, we can then calculate E(Play Golf, Outlook) by substituting the values E(Sunny) = E(3, 2) = 0.971, E(Overcast) = E(4, 0) = 0 and E(Rainy) = E(2, 3) = 0.971 into the equation:

E(Play Golf, Outlook) = P(Sunny)·E(3, 2) + P(Overcast)·E(4, 0) + P(Rainy)·E(2, 3)
                      = (5/14)(0.971) + (4/14)(0) + (5/14)(0.971)
                      = 0.693
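These entropy values are easy to check in code. The following is a minimal Python sketch; the counts come from the frequency table above, and the helper name is our own:

import math

def entropy(*counts):
    """Entropy E(a, b, ...) of a class distribution, given raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Entropy of the Play Golf column: 9 Yes, 5 No
print(entropy(9, 5))            # ≈ 0.940

# Entropy after splitting on Outlook: weighted sum over Sunny, Overcast, Rainy
e_outlook = (5/14) * entropy(3, 2) + (4/14) * entropy(4, 0) + (5/14) * entropy(2, 3)
print(e_outlook)                # ≈ 0.693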
E(Play Golf, Temperature) Calculation
• Just like in the previous calculation, the calculation of E(Play Golf, Temperature) is given below. It is easier to do if you first form the frequency table for the split on Temperature, as shown.
Step 4: Calculate the Information Gain for Each Attribute
• The information gain of a split is the drop in entropy it produces:

Gain(T, X) = E(T) - E(T, X)

• Applying this to each attribute:
– Gain(Play Golf, Outlook) = 0.94 - 0.693 = 0.247
– Gain(Play Golf, Temperature) = 0.94 - 0.911 = 0.029
– Gain(Play Golf, Humidity) = 0.94 - 0.788 = 0.152
– Gain(Play Golf, Windy) = 0.94 - 0.892 = 0.048
• Having calculated all the information gains, we now choose the attribute that gives the highest information gain after the split.
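Continuing the sketch from Step 2, the gains can be computed and compared in a few lines. The after-split entropies for Temperature, Humidity and Windy are taken as given above (they come from their frequency tables in the same way as Outlook's):

import math

def entropy(*counts):                          # same helper as in the previous sketch
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

e_before = entropy(9, 5)                       # ≈ 0.940, entropy before any split

e_after = {
    'Outlook':     (5/14)*entropy(3, 2) + (4/14)*entropy(4, 0) + (5/14)*entropy(2, 3),
    'Temperature': 0.911,
    'Humidity':    0.788,
    'Windy':       0.892,
}

gains = {attr: e_before - e for attr, e in e_after.items()}
print(gains)                                   # Outlook ≈ 0.247 is the highest
print(max(gains, key=gains.get))               # 'Outlook' -> root of the tree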
Step 5: Perform the First Split
Draw the First Split of the Decision Tree
• Now that we have all the information gains, we split the tree on the attribute with the highest information gain.
• From our calculation, the highest information gain comes from Outlook (0.247). Therefore the split will look like this:
• We can see that the Overcast outlook requires no further split, because it is just one homogeneous group (all four records are Yes), so it becomes a leaf node.

Step 6: Perform Further Splits
• The Sunny and the Rainy branches need to be split further.
• The Rainy outlook can be split using either Temperature, Humidity or Windy.
• What attribute would best be used for this split? Why? Answer: Humidity, because it produces homogeneous groups.
• The Rainy branch is therefore split on Humidity, into its High and Normal values, which gives us the tree below.
Split using the Humidity Attribute
• Let's now go ahead and do the same thing for the Sunny outlook. The Sunny outlook can be split using either Temperature, Humidity or Windy.
Quiz 2: What attribute would best be used for this split? Why? Answer: Windy, because it produces homogeneous groups.
Split using Windy Attribute
Step 7: Complete the Decision Tree
• The complete tree is shown in Figure 4.
• Note that the same calculations used initially could also be applied to the further splits, but that is not necessary here: you can simply look at each sub-table and determine by inspection which attribute to use for the split.

Final Decision Tree
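The same tree can also be grown programmatically. Below is a minimal scikit-learn sketch; note that scikit-learn implements CART rather than ID3, but criterion='entropy' makes it split on information gain. The file name play_golf.csv is an assumption; the column names follow this example:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumed CSV with the columns used in this example
df = pd.read_csv('play_golf.csv')   # Outlook, Temperature, Humidity, Windy, Play Golf

# One-hot encode the categorical attributes
X = pd.get_dummies(df[['Outlook', 'Temperature', 'Humidity', 'Windy']])
y = df['Play Golf']

tree = DecisionTreeClassifier(criterion='entropy')  # entropy => information gain
tree.fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))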
CART (Classification and Regression Trees)
• CART is a predictive algorithm used in machine learning; it explains how the target variable's values can be predicted from the other variables. It is a decision tree in which each fork is a split on a predictor variable and each leaf node holds a prediction for the target variable.
• In the decision tree, nodes are split into sub-nodes on the basis of a threshold value of an attribute. The root node is taken as the training set and is split into two by considering the best attribute and threshold value.
• Further, the subsets are split using the same logic. This continues until the last pure subset is found or the maximum allowed number of leaves in the growing tree is reached.
• The CART algorithm works via the following process:
– The best split point of each input is obtained.
– Based on the best split points of each input in Step 1, the new "best" split point is identified.
– Split the chosen input according to the "best" split point.
– Continue splitting until a stopping rule is satisfied or no further desirable splitting is available.
• The CART algorithm uses Gini impurity to split the dataset into a decision tree. It does so by searching for the best homogeneity of the sub-nodes, with the help of the Gini index criterion.
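To make the "best split point" search concrete, the sketch below scans candidate thresholds on a single numeric input and scores each with weighted Gini impurity. This is a simplified, single-feature version of the search CART performs per input; the data values are made up for illustration:

from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    """Try a split at each midpoint and keep the one with the lowest weighted Gini."""
    pairs = sorted(zip(values, labels))
    best_t, best_score = None, float('inf')
    for i in range(1, len(pairs)):
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left  = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Hypothetical temperatures and play/no-play labels
print(best_threshold([64, 68, 70, 72, 75, 80, 85],
                     ['Yes', 'Yes', 'Yes', 'Yes', 'No', 'No', 'No']))
# -> (73.5, 0.0): a perfectly pure split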
Gini Index / Gini Impurity
• The Gini index is the metric CART uses for classification tasks. It is based on the sum of squared class probabilities: it measures the probability that a randomly chosen element is wrongly classified, and it is a variation of the Gini coefficient. It works on categorical variables, gives outcomes of either "success" or "failure", and hence performs binary splits only.
• The Gini index is calculated as

Gini = 1 - Σ over i of (p_i)^2

where p_i is the probability of an object being classified to a particular class.
• The value of the Gini index varies from 0 to 1:
– 0 means that all the elements belong to a single class (only one class exists at the node);
– 1 means that the elements are randomly distributed across many classes;
– 0.5 means that the elements are uniformly distributed over two classes.

Classification Tree
• A classification tree is used when the target variable is categorical. The algorithm identifies the "class" within which the target variable is most likely to fall. Classification trees are used when the dataset needs to be split into classes that belong to the response variable (like Yes or No).

Regression Tree
• A regression tree is used when the target variable is continuous, and the tree is used to predict its value, for example when the response variable is the temperature of the day.

Decision for the Rain Outlook
• In the play-golf example, the winner for the rain outlook is the Windy feature, because it has the minimum Gini index score among the candidate features.
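The boundary values of the Gini index listed above are easy to verify numerically; a self-contained check (the class labels are arbitrary illustrations):

from collections import Counter

def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(['Yes'] * 4))                  # 0.0   -> all elements in one class
print(gini(['Yes', 'Yes', 'No', 'No']))   # 0.5   -> uniform over two classes
print(gini(list('ABCDEFGH')))             # 0.875 -> approaches 1 as classes spread out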
Ensemble Learning
• Topics: Bagging (Bootstrap Aggregation); Boosting: AdaBoost, Stumping; Random Forests.
• An ensemble method is a technique that combines the predictions from multiple machine learning algorithms to make more accurate predictions than any individual model. A model comprised of many models is called an ensemble.
• Ensemble learning helps improve machine learning results by combining several models. This approach allows better predictive performance than a single model. The basic idea is to learn a set of classifiers and let them vote.
• Ensemble methods are techniques that create multiple models and then combine them to produce improved results. Ensemble methods usually produce more accurate solutions than a single model. This has been the case in a number of machine learning competitions, where the winning solutions used ensemble methods.

Types of Ensemble Learning
• Some ensembles contain the same type of learning algorithm; these are called homogeneous ensembles. Other methods combine different types of learning algorithms; these are called heterogeneous ensembles.
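A heterogeneous ensemble can be sketched with scikit-learn's VotingClassifier, which lets different learner types vote on each prediction. The model choices and the synthetic data are illustrative:

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)  # toy data for illustration

# Three different learner types voting by majority (hard voting)
ensemble = VotingClassifier([
    ('lr', LogisticRegression(max_iter=1000)),
    ('dt', DecisionTreeClassifier()),
    ('nb', GaussianNB()),
], voting='hard')
ensemble.fit(X, y)
print(ensemble.score(X, y))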
Bagging (Bootstrap Aggregation)
• There are two key ingredients of bagging: one is the bootstrap and the other is aggregation.
• It is a general procedure that can be used to reduce the variance of algorithms that have high variance, typically decision trees. Bagging makes each model run independently and then aggregates the outputs at the end, without preference to any model.
Random forest is a Bagging Technique
• In bagging, we take different random subsets of the dataset and combine them with the help of bootstrap sampling. In detail, given a training data set containing n training records, a sample of m training records is generated by sampling with replacement.
• In bagging, the most popular strategies for aggregating the outputs of the base learners are the majority vote in a classification task and the mean in a regression task.
• In bagging we actually combine several strong learners: all the base models are overfitted models with very high variance, and at aggregation time we simply reduce that variance without affecting the bias, so accuracy may improve.
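A minimal bagging sketch with scikit-learn, using high-variance decision trees as base learners; the parameter values and synthetic data are illustrative:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)  # toy data for illustration

# 50 trees, each trained on a bootstrap sample; predictions aggregated by majority vote
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),   # named base_estimator in scikit-learn < 1.2
    n_estimators=50,
    bootstrap=True,                       # sample with replacement
    random_state=0,
)
bagging.fit(X, y)
print(bagging.score(X, y))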
Boosting
• Boosting is an ensemble modelling technique that attempts to build a strong classifier from a number of weak classifiers. It is done by building a model from weak models in series: first, a model is built from the training data; then a second model is built which tries to correct the errors of the first. This procedure continues, and models are added, until either the complete training data set is predicted correctly or the maximum number of models has been added.
• Boosting is an efficient algorithm that converts a weak learner into a strong learner.
• It uses the concept of weak-learner to strong-learner conversion through weighted average values and higher vote values for prediction.

AdaBoost
• AdaBoost was the first really successful boosting algorithm developed for the purpose of binary classification. AdaBoost is short for Adaptive Boosting; it is a very popular boosting technique that combines multiple "weak classifiers" into a single "strong classifier".
• AdaBoost is implemented by combining several weak learners into a single strong learner. The weak learners in AdaBoost consider a single input feature and draw out a single-split decision tree, called a decision stump. Each observation is weighted equally while drawing out the first decision stump.
• The results from the first decision stump are analyzed, and if any observations are wrongly classified, they are assigned higher weights. A new decision stump is drawn by treating the higher-weight observations as more significant. Again, if any observations are misclassified, they are given a higher weight, and this process continues until all the observations fall into the right class.
• AdaBoost can be used for both classification and regression problems; however, it is more commonly used for classification.
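A short AdaBoost sketch with scikit-learn: max_depth=1 gives the decision stumps described above, and the other parameter values and synthetic data are illustrative:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)  # toy data for illustration

# Decision stumps (single-split trees) boosted sequentially,
# with misclassified samples up-weighted at each round
ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # base_estimator in scikit-learn < 1.2
    n_estimators=100,
    learning_rate=0.5,
    random_state=0,
)
ada.fit(X, y)
print(ada.score(X, y))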
Gradient Boosting
• Gradient Boosting is also based on sequential ensemble learning. Here the base learners are generated sequentially, so that the present base learner is always more effective than the previous one, i.e., the overall model improves with each iteration.
• The difference in this boosting type is that the weights of misclassified outcomes are not incremented. Instead, gradient boosting tries to optimize the loss function of the previous learner by adding a new model that adds weak learners to reduce the loss.
• The main idea is to overcome the errors of the previous learner's predictions. This form of boosting has three main components:
– Loss function: the choice of loss function depends on the type of problem. An advantage of gradient boosting is that no new boosting algorithm is needed for each loss function.
– Weak learner: in gradient boosting, decision trees are used as the weak learners. A regression tree is used to output real values, which can be combined to form correct predictions. As in the AdaBoost algorithm, small trees with a single split (decision stumps) can be used; larger trees, e.g. with 4-8 levels, are used for harder problems.
– Additive model: trees are added one at a time, and existing trees are left unchanged. As trees are added, gradient descent is used to minimize the loss function.
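A gradient boosting sketch with scikit-learn, reflecting the three components above; the parameter values and synthetic data are illustrative:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, random_state=0)  # toy data for illustration

# Shallow regression trees added one at a time; each new tree
# fits the gradient of the loss of the current ensemble
gb = GradientBoostingClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=3,        # small trees, as described above
    random_state=0,
)
gb.fit(X, y)
print(gb.score(X, y))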
Random Forest
• Random forest is a supervised learning algorithm which is used for both classification and regression; however, it is mainly used for classification problems.
• As we know, a forest is made up of trees, and more trees means a more robust forest. Similarly, the random forest algorithm creates decision trees on data samples, gets a prediction from each of them, and finally selects the best solution by voting (or by the mean).
• It is an ensemble method which is better than a single decision tree because it reduces over-fitting by averaging the results.
• Random Forest is a popular machine learning algorithm that belongs to the supervised learning technique. It combines multiple classifiers to solve a complex problem and to improve the performance of the model.
• Random Forest is a classifier that contains a number of decision trees on various subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset.

Working of the Random Forest Algorithm
We can understand the working of the Random Forest algorithm with the help of the following steps:
• Step 1 − First, start with the selection of random samples from a given dataset.
• Step 2 − Next, the algorithm constructs a decision tree for every sample, then gets the prediction result from every decision tree.
• Step 3 − In this step, voting is performed over every predicted result.
• Step 4 − At last, select the most voted prediction result as the final prediction.

# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# importing the dataset
data_set = pd.read_csv('user_data.csv')

# extracting the independent and dependent variables
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values

# splitting the dataset into training and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# feature scaling
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)
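The listing above stops at feature scaling. A minimal sketch of the remaining fit-and-predict steps, continuing with the same variable names (the hyperparameter values are assumptions):

# fitting the Random Forest classifier to the training set
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
classifier.fit(x_train, y_train)

# predicting the test-set results and measuring accuracy
from sklearn.metrics import accuracy_score
y_pred = classifier.predict(x_test)
print(accuracy_score(y_test, y_pred))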