
Decision Tree

• A decision tree is a structure that includes a root node, branches, and leaf nodes. Each internal node denotes a test on an attribute, each branch denotes the outcome of a test, and each leaf node holds a class label. The topmost node in the tree is the root node.
What are Decision Trees
A decision tree is a tree-like structure that is used as a model for classifying data. A decision tree decomposes the data into sub-trees made of other sub-trees and/or leaf nodes.
A decision tree is made up of three types of nodes:
Decision Nodes: These nodes have two or more branches.
Root Node: This is also a decision node, but at the topmost level.
Leaf Nodes: The lowest nodes, which represent the final decisions.
• ID3 Algorithm
• Entropy and Information Gain

• Consider the table below. It represents factors that affect whether John would go out to play golf or not. Using the data in the table, build a decision tree with the ID3 algorithm that can predict whether John would play golf or not.
Step-by-Step Procedure for Building a Decision Tree (ID3 Algorithm)
Step 1: Determine the Decision Column

• Since decision trees are used for classification, you need to determine the classes which are the basis for the decision. In this case, it is the last column, that is, the Play Golf column with classes Yes and No.

• To determine the Root Node, we need to compute the entropy. To do this, we create a frequency table for the classes (the Yes/No column).
Step 2: Calculating Entropy for the Classes (Play Golf)
In this step, you need to calculate the entropy for the Play Golf column; the calculation is given below (5 No and 9 Yes out of 14 records).
Entropy(Play Golf) = E(5,9) = – (5/14) log2(5/14) – (9/14) log2(9/14) = 0.94
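To make this arithmetic concrete, here is a small Python sketch (not part of the original slides) that computes the entropy from raw class counts; the counts 5 and 9 are the No/Yes counts of the Play Golf column.

import math

def entropy(*counts):
    # Entropy of a class distribution given raw counts, e.g. entropy(5, 9)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Play Golf column: 5 "No" and 9 "Yes" out of 14 records
print(entropy(5, 9))  # ~0.940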
Step 3: Calculate Entropy for Other Attributes After Split
For the other four attributes, we need to calculate the
entropy after each of the split.
• E(Play Golf, Outlook)
• E(Play Golf, Temperature)
• E(Play Golf, Humidity)
• E(Play Golf, Windy)
• The entropy for two variables (a split on an attribute T) is calculated using the formula:
E(S, T) = Σ P(c) × E(c), summed over each value c of the attribute T
Therefore, to calculate E(Play Golf, Outlook), we would use the formula below:
E(Play Golf, Outlook) = P(Sunny) E(3,2) + P(Overcast) E(4,0) + P(Rainy) E(2,3)
• The easiest way to approach this calculation is to
create a frequency table for the two variables, that is
Play Golf and Outlook.
• This frequency table is given below:

Frequency Table for Outlook


• Using this table, we can then calculate E(Play Golf, Outlook), which is given by the formula below.
• Calculate E(Play Golf, Outlook) by substituting the values computed for E(Sunny), E(Overcast) and E(Rainy) into the equation:
E(Play Golf, Outlook) = P(Sunny) E(3,2) + P(Overcast) E(4,0) + P(Rainy) E(2,3)
= (5/14)(0.971) + (4/14)(0.0) + (5/14)(0.971) = 0.693
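As an illustration, the weighted (split) entropy can also be computed in a few lines of Python; this sketch hard-codes the Outlook class counts shown in the frequency table and is not part of the original slides.

import math

def entropy(*counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def split_entropy(groups):
    # E(S, T): weighted entropy after splitting on attribute T.
    # 'groups' maps each attribute value to its pair of class counts.
    total = sum(sum(c) for c in groups.values())
    return sum((sum(c) / total) * entropy(*c) for c in groups.values())

# Frequency table for Outlook from the Play Golf example
outlook = {"Sunny": (3, 2), "Overcast": (4, 0), "Rainy": (2, 3)}
print(split_entropy(outlook))  # ~0.693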
E(Play Golf, Temperature) Calculation

• Just like in the previous calculation, the calculation of E(Play Golf, Temperature) is given below. It is easier to do if you form the frequency table for the Temperature split, as shown.

Frequency Table for Temperature


E(Play Golf, Temperature) = P(Hot) E(2,2) + P(Cold) E(3,1)
+ P(Mild) E(4,2)
E(Play Golf, Humidity) Calculation

Just like in the previous calculation, the calculation of E(Play Golf, Humidity) is given below.
It is easier to do if you form the frequency table for the Humidity split, as shown.

Frequency Table for Humidity


E(Play Golf, Windy) Calculation

• Just like in the previous calculation, the calculation of E(Play Golf, Windy) is given below.
• It is easier to do if you form the frequency table for the Windy split, as shown.

Frequency Table for Windy


• So now that we have the entropies for all four attributes, let's go ahead and summarize them as shown below:

• E(Play Golf, Outlook) = 0.693

• E(Play Golf, Temperature) = 0.911

• E(Play Golf, Humidity) = 0.788

• E(Play Golf, Windy) = 0.892


The information gain is calculated using the formula:

• Gain(S,T) = Entropy(S) – Entropy(S,T)


For example, the information gain after splitting using the Outlook attribute is given by:
Gain(Play Golf, Outlook) = Entropy(Play Golf) – Entropy(Play Golf, Outlook)
So let's go ahead and do the calculation:
Gain(Play Golf, Outlook) = Entropy(Play Golf) – Entropy(Play Golf, Outlook)
= 0.94 – 0.693 = 0.247

Gain(Play Golf, Temperature) = Entropy(Play Golf) – Entropy(Play Golf, Temperature)
= 0.94 – 0.911 = 0.029
• Gain(Play Golf, Humidity) = Entropy(Play
Golf) – Entropy(Play Golf, Humidity)
= 0.94 – 0.788 = 0.152

Gain(Play Golf, Windy) = Entropy(Play Golf) – Entropy(Play Golf, Windy)
= 0.94 – 0.892 = 0.048
• Having calculated all the information gains, we now choose the attribute that gives the highest information gain after the split.
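The whole attribute-selection step can be sketched in Python as below. The Outlook and Temperature counts come from the expressions in the slides; the Humidity and Windy counts are assumed from the standard Play Golf table (they reproduce the slides' entropy values 0.788 and 0.892).

import math

def entropy(*counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def split_entropy(groups):
    total = sum(sum(c) for c in groups.values())
    return sum((sum(c) / total) * entropy(*c) for c in groups.values())

# Class counts per attribute value (Humidity/Windy counts assumed from the standard table)
splits = {
    "Outlook":     {"Sunny": (3, 2), "Overcast": (4, 0), "Rainy": (2, 3)},
    "Temperature": {"Hot": (2, 2), "Cold": (3, 1), "Mild": (4, 2)},
    "Humidity":    {"High": (3, 4), "Normal": (6, 1)},
    "Windy":       {"False": (6, 2), "True": (3, 3)},
}

e_play_golf = entropy(5, 9)  # 0.94
gains = {attr: e_play_golf - split_entropy(g) for attr, g in splits.items()}
print(gains)                      # Outlook has the largest gain (~0.247)
print(max(gains, key=gains.get))  # -> "Outlook"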

Step 5: Perform the First Split


• Draw the First Split of the Decision Tree
Now that we have all the information gains, we split the tree based on the attribute with the highest information gain.
• From our calculation, the highest information gain comes from Outlook. Therefore the split will look like this:
We can see that the Overcast outlook requires no further splitting because it is just one homogeneous group, so we have a leaf node.
Step 6: Perform Further Splits
• The Sunny and the Rainy branches need to be split further.
• The Rainy outlook can be split using either Temperature, Humidity or Windy.
• Which attribute would be best for this split? Why?
Humidity, because it produces homogeneous groups.
• The Rainy branch can therefore be split using Humidity's High and Normal values, which gives us the tree below.

Split using the Humidity Attribute


• Let’t now go ahead to do the same thing for the Sunny outlook
The Rainy outlook can be split using either Temperature,
Humidity or Windy.

Quiz 2: What attribute would best be used for this split? Why?
Answer: Windy . Because it produces homogeneous groups.

Split using Windy Attribute


• Step 7: Complete the Decision Tree
• The complete tree is shown in Figure 4.
Note that the same calculation that was used initially could also be used for the further splits, but that is not necessary, since you can just look at the sub-table and determine which attribute to use for the split.
Final Decision Tree
CART (Classification And Regression Tree)

• CART is a predictive algorithm used in machine learning, and it explains how the target variable's values can be predicted based on other variables. It is a decision tree where each fork is a split on a predictor variable and each node at the end (leaf) holds a prediction for the target variable.
• In the decision tree, nodes are split into sub-nodes on the basis of a threshold value of an attribute. The root node is taken as the training set and is split into two by considering the best attribute and threshold value.
• Further, the subsets are also split using the same logic. This continues until the last pure subset is found or the maximum number of leaves in the growing tree is reached.
• The CART algorithm works via the following process:
• The best split point of each input is obtained.
• Based on the best split points of each input in Step 1,
the new “best” split point is identified.
• Split the chosen input according to the “best” split
point.
• Continue splitting until a stopping rule is satisfied or
no further desirable splitting is available.
• The CART algorithm uses Gini impurity to split the dataset into a decision tree. It does that by searching for the best homogeneity of the sub-nodes, with the help of the Gini index criterion.
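As a hedged illustration (not from the slides), scikit-learn's DecisionTreeClassifier implements a CART-style tree; the sketch below trains one with the Gini criterion on the built-in Iris dataset, which simply stands in for any training set.

# A minimal CART-style tree with scikit-learn (illustrative data)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# criterion="gini" selects each split by Gini impurity, as described above
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X_train, y_train)
print(tree.score(X_test, y_test))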
Gini index/Gini impurity

• The Gini index is a metric for classification tasks in CART. It is based on the sum of squared class probabilities. It measures how likely a randomly chosen element is to be wrongly classified, and it is a variation of the Gini coefficient. It works on categorical variables, produces outcomes of either "success" or "failure", and hence performs binary splitting only.
• The degree of the Gini index varies from 0 to 1:
• A value of 0 indicates that all the elements belong to one class, or that only one class exists.
• A value of 1 indicates that the elements are randomly distributed across various classes.
• A value of 0.5 indicates that the elements are uniformly distributed over some classes.
The Gini index is calculated as Gini = 1 − Σ (pi)², where pi is the probability of an object being classified to a particular class.
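A minimal Python sketch of this formula, using raw class counts (the example values below are illustrative and not from the slides):

def gini_index(counts):
    # Gini impurity 1 - sum(pi^2) for a list of class counts
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini_index([4, 0]))  # 0.0 -> pure node, only one class present
print(gini_index([5, 5]))  # 0.5 -> two classes, evenly mixed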
• Classification tree
• A classification tree is an algorithm where the target variable is categorical. The algorithm is then used to identify the "class" within which the target variable is most likely to fall. Classification trees are used when the dataset needs to be split into classes that belong to the response variable (like Yes or No).
• Regression tree
• A regression tree is an algorithm where the target variable is continuous and the tree is used to predict its value. Regression trees are used when the response variable is continuous, for example when the response variable is the temperature of the day.
Decision for the Rain outlook
The winner is the Windy feature for the Rain outlook because it has the minimum Gini index score among the features.
Ensemble Learning
• Bagging (Bootstrap Aggregation)
• Boosting: AdaBoost, Stumping
• Random Forests
• Ensemble Learning
• An ensemble method is a technique that combines the predictions from multiple machine learning algorithms to make more accurate predictions than any individual model. A model comprised of many models is called an ensemble.
• Ensemble learning helps improve machine learning results by combining several models. This approach allows the production of better predictive performance compared to a single model. The basic idea is to learn a set of classifiers and to allow them to vote.

• Ensemble methods are techniques that create multiple models and then combine them to produce improved results. Ensemble methods usually produce more accurate solutions than a single model. This has been the case in a number of machine learning competitions, where the winning solutions used ensemble methods.
Types of Ensemble Learning

• An ensemble that combines the same type of learning algorithm is called a homogeneous ensemble, while methods that combine different types of learning algorithms are called heterogeneous ensembles.
Bagging (Bootstrap Aggregation)

• There are two main ingredients of Bagging: one is Bootstrap and the other is Aggregation.
• It is a general procedure that can be used to reduce the variance of algorithms that have high variance, typically decision trees. Bagging makes each model run independently and then aggregates the outputs at the end without preference to any model.

Random forest is a Bagging technique.

• In Bagging, we take different random subsets of the dataset and combine them with the help of bootstrap sampling. In detail, given a training dataset containing n training records, a sample of m training records is generated by sampling with replacement. For aggregating the outputs of the base learners, the most popular strategies are the majority vote in a classification task and the mean in a regression task.
• In Bagging, we actually combine several strong learners: the base models are overfitted models with very high variance, and at aggregation time we simply try to reduce that variance without affecting the bias, so the accuracy may improve.
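For illustration only, the sketch below uses scikit-learn's BaggingClassifier on a synthetic dataset (an assumption, not data from the slides); by default its base learner is a decision tree, and predictions are aggregated by majority vote.

# A minimal bagging sketch with scikit-learn (illustrative data)
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# 50 trees, each trained on a bootstrap sample of the training set;
# the default base estimator is a decision tree, combined by majority vote
bag = BaggingClassifier(n_estimators=50, random_state=0)
bag.fit(X_train, y_train)
print(bag.score(X_test, y_test))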
Boosting
• Boosting is an ensemble modeling technique that attempts to build a strong classifier from a number of weak classifiers. It is done by building models using weak learners in series. First, a model is built from the training data. Then a second model is built which tries to correct the errors present in the first model. This procedure is continued and models are added until either the complete training data set is predicted correctly or the maximum number of models is reached.
• Boosting is an efficient algorithm that converts a weak learner into a strong learner.
• It relies on converting weak learners into a strong learner through weighted averages and higher-weighted votes for prediction.

• AdaBoost was the first really successful boosting algorithm developed for the purpose of binary classification. AdaBoost is short for Adaptive Boosting and is a very popular boosting technique that combines multiple "weak classifiers" into a single "strong classifier".
• AdaBoost is implemented by combining several weak learners into a single strong learner. The weak learners in AdaBoost consider a single input feature and draw out a single-split decision tree called a decision stump. Each observation is weighted equally when drawing out the first decision stump.
• The results from the first decision stump are analyzed, and if any observations are wrongly classified, they are assigned higher weights. A new decision stump is drawn by treating the higher-weight observations as more significant. Again, if any observations are misclassified, they are given higher weights, and this process continues until all the observations fall into the right class or the maximum number of stumps is reached.
• AdaBoost can be used for both classification and
regression-based problems. However, it is more
commonly used for classification purposes.
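A minimal AdaBoost sketch with scikit-learn is shown below; the synthetic dataset and the choice of 100 estimators are illustrative assumptions, while the default base learner is a depth-1 tree (a decision stump), as described above.

# A minimal AdaBoost sketch with scikit-learn (illustrative data)
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Each round re-weights misclassified samples so the next stump focuses on them
ada = AdaBoostClassifier(n_estimators=100, random_state=0)
ada.fit(X_train, y_train)
print(ada.score(X_test, y_test))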
• Gradient Boosting: Gradient Boosting is also based on sequential ensemble learning. Here the base learners are generated sequentially so that the present base learner is always more effective than the previous one, i.e., the overall model improves sequentially with each iteration.
• The difference in this boosting type is that the weights for misclassified outcomes are not incremented. Instead, the Gradient Boosting method tries to optimize the loss function of the previous learner by adding a new weak learner that reduces the loss.
• The main idea here is to overcome the errors in the
previous learner's predictions. This boosting has three
main components:
• Loss function: The use of the loss function depends on
the type of problem. The advantage of gradient boosting
is that there is no need for a new boosting algorithm for
each loss function.

• Weak learner: In gradient boosting, decision trees are used as weak learners. Regression trees are used because they output real values, and their outputs can be added together so that each new tree corrects the predictions of the previous ones. As in the AdaBoost algorithm, small trees with a single split (decision stumps) can be used; larger trees with 4-8 levels are also common.

• Additive Model: Trees are added one at a time in this model, and the existing trees remain unchanged. When adding trees, gradient descent is used to minimize the loss function.
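A minimal gradient boosting sketch with scikit-learn is given below; the synthetic dataset and hyperparameter values are illustrative assumptions, not values prescribed by the slides.

# A minimal gradient boosting sketch with scikit-learn (illustrative data)
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Shallow regression trees are added one at a time; each new tree is fitted to
# the gradient of the loss, so it corrects the errors of the current ensemble
gb = GradientBoostingClassifier(n_estimators=100, max_depth=3, learning_rate=0.1, random_state=0)
gb.fit(X_train, y_train)
print(gb.score(X_test, y_test))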
Random Forest
• Random forest is a supervised learning algorithm which is used for both classification and regression. However, it is mainly used for classification problems.
• As we know, a forest is made up of trees, and more trees mean a more robust forest.
• Similarly, the random forest algorithm creates decision trees on data samples, gets a prediction from each of them, and finally selects the best solution by averaging or voting.
• It is an ensemble method which is better than a single decision tree because it reduces over-fitting by averaging the results.
• Random Forest is a popular machine learning
algorithm that belongs to the supervised learning
technique.
• It combines multiple classifiers to solve a complex problem and to improve the performance of the model.
• Random Forest is a classifier that contains a number of decision trees built on various subsets of the given dataset and takes the average of their predictions to improve the predictive accuracy on that dataset.
Working of Random Forest Algorithm
We can understand the working of the Random Forest algorithm with the help of the following steps −
• Step 1 − First, start with the selection of random
samples from a given dataset.
• Step 2 − Next, this algorithm will construct a
decision tree for every sample. Then it will get the
prediction result from every decision tree.
• Step 3 − In this step, voting will be performed for
every predicted result.
• Step 4 − At last, select the most voted prediction
result as the final prediction result.
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# importing the dataset
data_set = pd.read_csv('user_data.csv')

# extracting independent and dependent variables
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values

# splitting the dataset into training and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# feature scaling
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)
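The snippet above stops at feature scaling. A minimal continuation sketch is given below; it assumes the same x_train, x_test, y_train and y_test variables, and the hyperparameters (10 trees, entropy criterion) are illustrative choices rather than values given in the slides.

# fitting the Random Forest classifier to the training set
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators=10, criterion="entropy", random_state=0)
classifier.fit(x_train, y_train)

# predicting the test set results and checking accuracy
y_pred = classifier.predict(x_test)

from sklearn.metrics import confusion_matrix, accuracy_score
print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))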
