
Pattern Recognition (Unit II)

Basic Terminology in Classification Algorithms

• Classifier: An algorithm that maps the input data to a specific category.


• Classification model: A classification model tries to draw some conclusion
from the input values given for training. It will predict the class
labels/categories for the new data.
• Feature: A feature is an individual measurable property of a phenomenon
being observed.
• Binary classification: a classification task with two possible outcomes, e.g. gender
classification (male / female).
• Multi-class classification: classification with more than two classes. In multi-class
classification, each sample is assigned to one and only one target label, e.g. an animal
can be a cat or a dog but not both at the same time.
• Multi-label classification: a classification task where each sample is mapped to a set
of target labels (more than one class), e.g. a news article can be about sports, a person,
and a location at the same time. (The three label formats are sketched in the code below.)
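A minimal sketch of the three label formats, using plain NumPy arrays (my own illustration; the example label values are made up):

import numpy as np

# Binary classification: one label per sample, exactly two possible values.
y_binary = np.array(["male", "female", "female", "male"])

# Multi-class classification: one label per sample, more than two possible values,
# and each sample belongs to exactly one class.
y_multiclass = np.array(["cat", "dog", "rabbit", "dog"])

# Multi-label classification: each sample is mapped to a SET of labels, usually
# encoded as a binary indicator matrix (columns here mean sports, person, location).
y_multilabel = np.array([
    [1, 0, 1],   # article about sports and a location
    [1, 1, 0],   # article about sports and a person
    [0, 0, 1],   # article about a location only
])

print(y_binary.shape, y_multiclass.shape, y_multilabel.shape)   # (4,) (4,) (3, 3)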
Applications of Classification Algorithms
• Email spam classification
• Predicting bank customers' willingness to repay a loan
• Identification of cancerous tumour cells
• Sentiment analysis
• Drug classification
• Facial key-point detection
• Pedestrian detection for autonomous driving
Types of Classifiers
• Linear Classifiers
– Logistic regression
– Naive Bayes classifier
– Fisher’s linear discriminant
• Support vector machines
– Least squares support vector machines
• Quadratic classifiers
• Kernel estimation
– k-nearest neighbor
• Decision trees
– Random forests
• Neural networks
• Learning vector quantization
Naïve Bayes Classification

• Based on Bayes' theorem:
  P(c|x) = P(x|c) × P(c) / P(x)
• Assumptions (naïve):
  – the presence of one piece of evidence (feature) is independent of the other evidence/features
  – each feature makes an equal contribution to the outcome.
P(c|x) is the posterior probability of class c (target) given predictor x (attributes).
P(c) is the prior probability of the class.
P(x|c) is the likelihood, i.e. the probability of the predictor given the class
(probability density function / conditional probability).
P(x) is the prior probability of the predictor (the evidence).
• Convert the data set into a frequency table.
• Create a likelihood table by finding the probabilities.
• Use the naive Bayes equation to calculate the posterior probability for each class.
  The class with the highest posterior probability is the outcome of the prediction.
     OUTLOOK    TEMPERATURE   HUMIDITY   WINDY   PLAY GOLF
0    Rainy      Hot           High       False   No
1    Rainy      Hot           High       True    No
2    Overcast   Hot           High       False   Yes
3    Sunny      Mild          High       False   Yes
4    Sunny      Cool          Normal     False   Yes
5    Sunny      Cool          Normal     True    No
6    Overcast   Cool          Normal     True    Yes
7    Rainy      Mild          High       False   No
8    Rainy      Cool          Normal     False   Yes
9    Sunny      Mild          Normal     False   Yes
10   Rainy      Mild          Normal     True    Yes
11   Overcast   Mild          High       True    Yes
12   Overcast   Hot           Normal     False   Yes
13   Sunny      Mild          High       True    No

Outlook      Yes   No   P(value|Yes)   P(value|No)   P(value)
Sunny         3     2       3/9            2/5         5/14
Overcast      4     0       4/9            0/5         4/14
Rainy         2     3       2/9            3/5         5/14
Total         9     5     (P(Yes) = 9/14, P(No) = 5/14)

Temperature  Yes   No   P(value|Yes)   P(value|No)   P(value)
Hot           2     2       2/9            2/5         4/14
Mild          4     2       4/9            2/5         6/14
Cool          3     1       3/9            1/5         4/14
Total         9     5     (P(Yes) = 9/14, P(No) = 5/14)
Humidity     Yes   No   P(value|Yes)   P(value|No)   P(value)
High          3     4       3/9            4/5         7/14
Normal        6     1       6/9            1/5         7/14
Total         9     5     (P(Yes) = 9/14, P(No) = 5/14)

Windy        Yes   No   P(value|Yes)   P(value|No)   P(value)
False         6     2       6/9            2/5         8/14
True          3     3       3/9            3/5         6/14
Total         9     5     (P(Yes) = 9/14, P(No) = 5/14)

Play    P(Play)
Yes     9/14
No      5/14

For a new day with Outlook = Rainy, Temperature = Hot, Humidity = Normal, Windy = False:

P(Yes|today) = P(Rainy|Yes) × P(Hot|Yes) × P(Normal|Yes) × P(False|Yes) × P(Yes) / P(today)
             = (2/9 × 2/9 × 6/9 × 6/9 × 9/14) / P(today) ≈ 0.0141 / P(today)
P(No|today)  = (3/5 × 2/5 × 1/5 × 2/5 × 5/14) / P(today) ≈ 0.0068 / P(today)

Normalising:
P(Yes|today) = 0.0141 / (0.0141 + 0.0068) ≈ 0.67
P(No|today)  = 0.0068 / (0.0141 + 0.0068) ≈ 0.33
So the prediction is Play = Yes.
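A self-contained Python sketch (my own illustrative code, assuming the data set and tables above) that rebuilds the frequency counts and reproduces the 0.67 / 0.33 posterior split for the new day:

from collections import Counter, defaultdict

# The 14-row play-golf data set from the table above: (outlook, temperature, humidity, windy, play)
data = [
    ("Rainy","Hot","High",False,"No"),     ("Rainy","Hot","High",True,"No"),
    ("Overcast","Hot","High",False,"Yes"), ("Sunny","Mild","High",False,"Yes"),
    ("Sunny","Cool","Normal",False,"Yes"), ("Sunny","Cool","Normal",True,"No"),
    ("Overcast","Cool","Normal",True,"Yes"),("Rainy","Mild","High",False,"No"),
    ("Rainy","Cool","Normal",False,"Yes"), ("Sunny","Mild","Normal",False,"Yes"),
    ("Rainy","Mild","Normal",True,"Yes"),  ("Overcast","Mild","High",True,"Yes"),
    ("Overcast","Hot","Normal",False,"Yes"),("Sunny","Mild","High",True,"No"),
]

class_counts = Counter(row[-1] for row in data)              # {"Yes": 9, "No": 5}
likelihood = defaultdict(int)                                # (feature_index, value, class) -> count
for *features, label in data:
    for i, v in enumerate(features):
        likelihood[(i, v, label)] += 1

def posterior(x):
    """Normalised P(class | x) for x = (outlook, temperature, humidity, windy)."""
    scores = {}
    for c, n_c in class_counts.items():
        p = n_c / len(data)                                  # prior P(c)
        for i, v in enumerate(x):
            p *= likelihood[(i, v, c)] / n_c                 # P(x_i | c)
        scores[c] = p
    total = sum(scores.values())                             # plays the role of P(x)
    return {c: p / total for c, p in scores.items()}

print(posterior(("Rainy", "Hot", "Normal", False)))
# -> roughly {'Yes': 0.67, 'No': 0.33}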
Name            Give Birth   Can Fly   Live in Water   Have Legs   Class
human            yes          no        no              yes         mammal
python           no           no        no              no          non-mammal
salmon           no           no        yes             no          non-mammal
whale            yes          no        yes             no          mammal
frog             no           no        sometimes       yes         non-mammal
komodo           no           no        no              yes         non-mammal
bat              yes          yes       no              yes         mammal
pigeon           no           yes       no              yes         non-mammal
cat              yes          no        no              yes         mammal
leopard shark    yes          no        yes             no          non-mammal
turtle           no           no        sometimes       yes         non-mammal
penguin          no           no        sometimes       yes         non-mammal
porcupine        yes          no        no              yes         mammal
eel              no           no        yes             no          non-mammal
salamander       no           no        sometimes       yes         non-mammal
gila monster     no           no        no              yes         non-mammal
platypus         no           no        no              yes         mammal
owl              no           yes       no              yes         non-mammal
dolphin          yes          no        yes             no          mammal
eagle            no           yes       no              yes         non-mammal
Give birth       Mammal   Non-mammal   P(value|M)   P(value|NM)
Yes                 6          1           6/7          1/13
No                  1         12           1/7         12/13
Total               7         13        P(M) = 7/20,  P(NM) = 13/20

Can fly          Mammal   Non-mammal   P(value|M)   P(value|NM)
Yes                 1          3           1/7          3/13
No                  6         10           6/7         10/13
Total               7         13        P(M) = 7/20,  P(NM) = 13/20

Live in water    Mammal   Non-mammal   P(value|M)   P(value|NM)
Yes                 2          3           2/7          3/13
No                  5          6           5/7          6/13
Sometimes           0          4           0            4/13
Total               7         13        P(M) = 7/20,  P(NM) = 13/20

Have legs        Mammal   Non-mammal   P(value|M)   P(value|NM)
Yes                 5          9           5/7          9/13
No                  2          4           2/7          4/13
Total               7         13        P(M) = 7/20,  P(NM) = 13/20
• P(M) = 7/20
• P(NM) = 13/20
• New instance X: Give Birth = yes, Can Fly = no, Live in Water = yes, Have Legs = no
• P(X|M) × P(M)  = 6/7 × 6/7 × 2/7 × 2/7 × 7/20 ≈ 0.021
• P(X|NM) × P(NM) = 1/13 × 10/13 × 3/13 × 4/13 × 13/20 ≈ 0.0027
• Since 0.021 > 0.0027, the new instance is classified as a mammal.
• Rearranging these leads us to the answer to our
question, which is called Bayes’ formula:
– P(ωj|x) = p(x|ωj) P(ωj) / p(x)
• where, in this case of two categories,
  p(x) = Σ (j = 1 to 2) p(x|ωj) P(ωj)
Bayes' formula can be expressed informally in English by saying that
  posterior = (likelihood × prior) / evidence.
• If P(ω1|x) is greater than P(ω2|x), we would naturally be inclined to decide that the
true state of nature is ω1, and vice versa.
• There is a probability of error whenever we make a decision: whenever we observe a
particular x,
  P(error|x) = P(ω1|x) if we decide ω2
  P(error|x) = P(ω2|x) if we decide ω1.
• We shall now formalize the ideas just considered and generalize them in four ways:
• by allowing the use of more than one feature:
  x becomes a feature vector;
• by allowing more than two states of nature:
  this provides a useful generalization at only a small notational expense;
• by allowing actions other than merely deciding the state of nature:
  this primarily allows the possibility of rejection, i.e., of refusing to make
  a decision in close cases; this is a useful option if being indecisive is not
  too costly;
• by introducing a loss function more general than the probability of error:
  it states exactly how costly each action is, and is used to convert a
  probability determination into a decision.
• Let ω1,...,ωc be the finite set of c states of
nature (“categories”)
• α1,...,αa be the finite set of a possible actions.
• The loss function λ(αi|ωj) describes the loss
incurred for taking action αi when the state of
nature is ωj
• Let the feature vector x be a d-component vector-valued
random variable, and let p(x|ωj) be the state conditional
probability density function for x — the probability density
function for x conditioned on ωj being the true state of
nature. As before, P(ωj) describes the prior probability that
nature is in state ωj. Then the posterior probability P(ωj|x)
can be computed from p(x|ωj) by Bayes’ formula:
• P(x) = Σj p(x|ωj) P(ωj)
• The expected loss of taking action αi, given the observed feature vector x, is
  – R(αi|x) = Σj λ(αi|ωj) P(ωj|x)
In decision-theoretic terminology, an expected loss is called a risk, and R(αi|x) is
called the conditional risk.
The problem is to find a decision rule α(x) that minimizes the overall risk.
Two-Category Classification
• λij = λ(αi|ωj) is the loss incurred for deciding ωi when the true class is ωj.
The conditional risks are
  – R(α1|x) = λ11 P(ω1|x) + λ12 P(ω2|x)
  – R(α2|x) = λ21 P(ω1|x) + λ22 P(ω2|x)
The fundamental rule is to decide ω1 if R(α1|x) < R(α2|x).
In terms of posterior probabilities:
  decide ω1 if (λ21 − λ11) P(ω1|x) > (λ12 − λ22) P(ω2|x)


Normal Distribution / Gaussian Distribution
• The same bell shape comes up over and over again in different distributions, which is why it is called the normal distribution.
• Symmetric bell shape.
• The mean and median are equal; both are located at the center of the distribution.
• ≈68% of the data falls within 1 standard deviation of the mean.
• ≈95% of the data falls within 2 standard deviations of the mean.
• ≈99.7% of the data falls within 3 standard deviations of the mean.
• The trunk diameter of a certain variety of pine tree is normally distributed with a
mean of μ = 150 cm and a standard deviation of σ = 30 cm.
• Approximately what percent of these trees have a diameter greater than 210 cm?
• Approximately what percent of these trees have a diameter between 90 and 210 centimeters?

(Normal curve with the diameter axis marked at 60, 90, 120, 150, 180, 210 and 240 cm, i.e. μ ± 1σ, 2σ, 3σ.)


• A certain section of a forest has 500 of these trees. Approximately how many of these
trees have a diameter between 120 and 180 centimeters?
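A short sketch of the three answers, using the exact normal CDF from the Python standard library rather than the 68-95-99.7 approximation (my own illustration):

from math import erf, sqrt

def phi(x, mu=150.0, sigma=30.0):
    """Normal CDF P(X <= x) for trunk diameter X ~ N(mu, sigma^2)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

print(1 - phi(210))                  # P(diameter > 210 cm)       ~ 0.023  (empirical rule: ~2.5%)
print(phi(210) - phi(90))            # P(90 < diameter < 210)     ~ 0.954  (~95%)
print(500 * (phi(180) - phi(120)))   # trees between 120-180 cm   ~ 341    (~68% of 500)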
• Naive Bayes can be extended to real-valued attributes by assuming a Gaussian distribution.
• This is easy to implement, as we only need to compute the mean and standard deviation
of each attribute for each class.
The numeric weather data with summary statistics

Outlook       yes   no    |   Windy    yes   no    |   Play    yes    no
sunny          2     3    |   false      6    2    |            9      5
overcast       4     0    |   true       3    3    |
rainy          3     2    |                        |
sunny         2/9   3/5   |   false    6/9   2/5   |          9/14   5/14
overcast      4/9   0/5   |   true     3/9   3/5   |
rainy         3/9   2/5   |                        |

Temperature   yes: 83, 70, 68, 64, 69, 75, 75, 72, 81   →  mean 73,   std dev 6.2
              no:  85, 80, 65, 72, 71                   →  mean 74.6, std dev 7.9
Humidity      yes: 86, 96, 80, 65, 70, 80, 70, 90, 75   →  mean 79.1, std dev 10.2
              no:  85, 90, 70, 95, 91                   →  mean 86.2, std dev 9.7
• For example, consider a new day with outlook = sunny, temperature = 66, humidity = 90
and windy = true:

  f(temperature = 66 | yes) = 1 / (√(2π) × 6.2) × e^( −(66 − 73)² / (2 × 6.2²) ) ≈ 0.0340

• Likelihood of yes = 2/9 × 0.0340 × 0.0221 × 3/9 × 9/14 ≈ 0.000036
• Likelihood of no  = 3/5 × 0.0279 × 0.0381 × 3/5 × 5/14 ≈ 0.000137
• Normalising, P(yes) = 0.000036 / (0.000036 + 0.000137) ≈ 21% and P(no) ≈ 79%,
so the prediction is play = no.
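A minimal Python sketch (my own code, assuming the summary statistics in the table above) that reproduces these densities and class likelihoods:

from math import exp, pi, sqrt

def gaussian(x, mean, std):
    """Gaussian probability density f(x) for the given mean and standard deviation."""
    return exp(-((x - mean) ** 2) / (2 * std ** 2)) / (sqrt(2 * pi) * std)

# Summary statistics and categorical probabilities taken from the table above.
temp  = {"yes": (73.0, 6.2),  "no": (74.6, 7.9)}
humid = {"yes": (79.1, 10.2), "no": (86.2, 9.7)}
sunny = {"yes": 2/9, "no": 3/5}        # P(outlook = sunny | class)
windy = {"yes": 3/9, "no": 3/5}        # P(windy = true | class)
prior = {"yes": 9/14, "no": 5/14}

print(gaussian(66, *temp["yes"]))      # ~0.0340

# Likelihood of each class for the new day (sunny, 66, 90, true).
for c in ("yes", "no"):
    like = sunny[c] * gaussian(66, *temp[c]) * gaussian(90, *humid[c]) * windy[c] * prior[c]
    print(c, like)                     # yes ~3.6e-05, no ~1.4e-04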
Decision Tree
• Decision trees are used for both classification and regression problems.
• The decision tree is one of the most powerful and popular tools for classification
and prediction.
• A decision tree is a flowchart-like tree structure, where each internal node denotes
a test on an attribute, each branch represents an outcome of the test, and each leaf
node (terminal node) holds a class label.
• Decision trees are simple to understand and allow good interpretation of the data.
A decision tree is a tree where each node represents a feature (attribute), each link
(branch) represents a decision (rule) and each leaf represents an outcome (a categorical
or continuous value).

A tree can be built using many algorithms, but the two most popular (compared in the
usage sketch after this list) are:
1. ID3 (Iterative Dichotomiser 3) → uses entropy and information gain as its metrics.
2. CART (Classification and Regression Trees) → uses the Gini index (for classification)
   as its metric.
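If scikit-learn is available, the two split criteria map directly onto the criterion parameter of DecisionTreeClassifier; this is only a usage sketch on made-up toy data, and note that scikit-learn implements an optimized CART, so criterion="entropy" merely mimics ID3's split metric:

from sklearn.tree import DecisionTreeClassifier

# Toy data: XOR of two binary features.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# ID3-style splitting: entropy / information gain as the split metric.
id3_like = DecisionTreeClassifier(criterion="entropy").fit(X, y)

# CART-style splitting: Gini index as the split metric (scikit-learn's default).
cart_like = DecisionTreeClassifier(criterion="gini").fit(X, y)

print(id3_like.predict([[0, 1]]), cart_like.predict([[0, 1]]))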
Entropy
• Entropy is a measure of the impurity, disorder or uncertainty in a set of examples.
• Entropy controls how a decision tree decides to split the data; it directly affects
where a decision tree draws its boundaries.
• For a set S with class proportions p_i: Entropy(S) = − Σ p_i log2(p_i).
Information Gain
• Information gain (IG) measures how much "information" a feature gives us about the class:
  Gain(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) × Entropy(S_v),
  where S_v is the subset of S having value v for attribute A.
• Information gain is the main criterion used by decision tree algorithms to construct
a decision tree.
• The decision tree algorithm always tries to maximize information gain.
• The attribute with the highest information gain is tested/split first.
Splitting on Humidity divides the 14 play-golf examples into
High → {3 Yes, 4 No} and Normal → {6 Yes, 1 No}:
E(Humidity = High)   = −3/7 × log2(3/7) − 4/7 × log2(4/7) = x
E(Humidity = Normal) = −6/7 × log2(6/7) − 1/7 × log2(1/7) = y
Average entropy of Humidity: z = 7/14 × x + 7/14 × y
Gain(Humidity) = 0.94 − z = p

Splitting on Temperature divides the examples into
Hot → {2 Yes, 2 No}, Mild → {4 Yes, 2 No} and Cool → {3 Yes, 1 No}:
E(Temp = Hot)  = −2/4 × log2(2/4) − 2/4 × log2(2/4) = s
E(Temp = Mild) = −4/6 × log2(4/6) − 2/6 × log2(2/6) = t
E(Temp = Cool) = −3/4 × log2(3/4) − 1/4 × log2(1/4) = u
Average entropy of Temperature: Q = 4/14 × s + 6/14 × t + 4/14 × u
Gain(Temperature) = 0.94 − Q
Splitting on Outlook at the root, which holds Play(yes) = 9, Play(no) = 5:
  Sunny → {2 Yes, 3 No}, further split on Humidity: High → {0 Yes, 3 No}, Normal → {2 Yes, 0 No}
  Overcast → {4 Yes, 0 No}, a pure leaf
  Rainy → {3 Yes, 2 No}, further split on Wind: True → {0 Yes, 2 No}, False → {3 Yes, 0 No}
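A small sketch (my own code, on the standard 14-day weather data that this section assumes) that computes the information gain of each attribute and shows why Outlook ends up at the root of the tree above:

from math import log2
from collections import Counter

# (outlook, temperature, humidity, windy, play) -- the standard 14-day weather data
data = [
    ("Sunny","Hot","High","Weak","No"),        ("Sunny","Hot","High","Strong","No"),
    ("Overcast","Hot","High","Weak","Yes"),    ("Rain","Mild","High","Weak","Yes"),
    ("Rain","Cool","Normal","Weak","Yes"),     ("Rain","Cool","Normal","Strong","No"),
    ("Overcast","Cool","Normal","Strong","Yes"),("Sunny","Mild","High","Weak","No"),
    ("Sunny","Cool","Normal","Weak","Yes"),    ("Rain","Mild","Normal","Weak","Yes"),
    ("Sunny","Mild","Normal","Strong","Yes"),  ("Overcast","Mild","High","Strong","Yes"),
    ("Overcast","Hot","Normal","Weak","Yes"),  ("Rain","Mild","High","Strong","No"),
]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr_index):
    labels = [r[-1] for r in rows]
    gain = entropy(labels)                       # 0.940 for the full data set
    for value in set(r[attr_index] for r in rows):
        subset = [r[-1] for r in rows if r[attr_index] == value]
        gain -= (len(subset) / len(rows)) * entropy(subset)
    return gain

for i, name in enumerate(["Outlook", "Temperature", "Humidity", "Windy"]):
    print(name, round(info_gain(data, i), 3))
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Windy 0.048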
ID3
• ID3 (Quinlan, 1986) is a basic algorithm for learning decision trees.
• Given a training set of examples, the algorithm for building a DT performs a search
in the space of decision trees.
• The construction of the tree is top-down. The algorithm is greedy.
• The fundamental question is "which attribute should be tested next? Which question
gives us the most information?"
• Select the best attribute.
• A descendant node is then created for each possible value of this attribute, and the
examples are partitioned according to this value.
• The process is repeated for each successor node until all the examples are classified
correctly or there are no attributes left.
Decision Tree Learning: ID3
Function ID3(Training-set, Attributes)
  – If all elements in Training-set are in the same class, then return a
    leaf node labeled with that class
  – Else if Attributes is empty, then return a leaf node labeled with the
    majority class in Training-set
  – Else if Training-set is empty, then return a leaf node labeled with the
    default (majority) class
  – Else
      Select and remove A from Attributes
      Make A the root of the current tree
      For each value V of A
        – Create a branch of the current tree labeled by V
        – Partition_V ← elements of Training-set with value V for A
        – Induce the subtree ID3(Partition_V, Attributes)
        – Attach the result to branch V
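A compact runnable rendering of this procedure in Python (my own sketch; it selects attributes by information gain, which is one common choice for the "Select ... A" step):

from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def id3(rows, attributes):
    """rows: list of dicts, each with a 'class' key; attributes: attribute names."""
    if not rows:
        return None                                   # empty partition: caller decides a default
    labels = [r["class"] for r in rows]
    if len(set(labels)) == 1:                         # all examples in the same class
        return labels[0]
    if not attributes:                                # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]

    def gain(a):                                      # information gain of attribute a
        remainder = sum(
            entropy([r["class"] for r in rows if r[a] == v])
            * sum(r[a] == v for r in rows) / len(rows)
            for v in set(r[a] for r in rows))
        return entropy(labels) - remainder

    best = max(attributes, key=gain)                  # "select the best attribute"
    tree = {best: {}}
    for v in set(r[best] for r in rows):              # one branch per value of best
        subset = [r for r in rows if r[best] == v]
        tree[best][v] = id3(subset, [a for a in attributes if a != best])
    return tree

# On the loan-risk table below, this would place "Income Level" at the root
# (it has the highest information gain), matching the worked example that follows.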
Illustrative Training Set
Risk Assessment for Loan Applications
Client # Credit History Debt Level Collateral Income Level RISK LEVEL
1 Bad High None Low HIGH
2 Unknown High None Medium HIGH
3 Unknown Low None Medium MODERATE
4 Unknown Low None Low HIGH
5 Unknown Low None High LOW
6 Unknown Low Adequate High LOW
7 Bad Low None Low HIGH
8 Bad Low Adequate High MODERATE
9 Good Low None High LOW
10 Good High Adequate High LOW
11 Good High None Low HIGH
12 Good High None Medium MODERATE
13 Good High None High LOW
14 Bad High None Medium HIGH
ID3 Example (I)
1) Choose Income as the root of the tree:
   Income = Low    → examples {1, 4, 7, 11}
   Income = Medium → examples {2, 3, 12, 14}
   Income = High   → examples {5, 6, 8, 9, 10, 13}
2) Low branch: all examples are in the same class, HIGH. Return a leaf node labeled HIGH.
3) Medium branch: choose Debt Level as the root of the subtree:
   Debt Level = Low  → {3}: all examples are MODERATE (3a). Return a leaf node labeled MODERATE.
   Debt Level = High → {2, 12, 14} (3b)
ID3 Example (II)
3b) For the Medium-income / High-debt branch {2, 12, 14}, choose Credit History as the
root of the subtree. All resulting partitions are single-class, so return leaf nodes:
   Credit History = Unknown → {2}: HIGH
   Credit History = Bad     → {14}: HIGH
   Credit History = Good    → {12}: MODERATE
4) For the High-income branch {5, 6, 8, 9, 10, 13}, choose Credit History as the root of
the subtree. All resulting partitions are single-class, so return leaf nodes:
   Credit History = Unknown → {5, 6}: LOW
   Credit History = Bad     → {8}: MODERATE
   Credit History = Good    → {9, 10, 13}: LOW
ID3 Example (III)
Attach the subtrees at the appropriate places. The final tree:
Income?
  Low    → HIGH
  Medium → Debt Level?
             Low  → MODERATE
             High → Credit History?  Unknown → HIGH,  Bad → HIGH,  Good → MODERATE
  High   → Credit History?  Unknown → LOW,  Bad → MODERATE,  Good → LOW


Another Example
• Assume A1 is a binary feature (Gender: M/F)
• Assume A2 is a nominal feature (Color: R/G/B)
One consistent tree splits on A1 first (M/F) and then on A2 (R/G/B) within each branch;
another splits on A2 first and then on A1. In both cases the decision surfaces are
axis-aligned hyper-rectangles.
Non-Uniqueness

• Decision trees are not unique:
  – Given a set of training instances T, there generally exist a number of decision
    trees that are consistent with (or fit) T
Tid   Refund   Marital Status   Taxable Income   Cheat
1     Yes      Single           125K             No
2     No       Married          100K             No
3     No       Single           70K              No
4     Yes      Married          120K             No
5     No       Divorced         95K              Yes
6     No       Married          60K              No
7     Yes      Divorced         220K             No
8     No       Single           85K              Yes
9     No       Married          75K              No
10    No       Single           90K              Yes

An alternative tree consistent with this data:
MarSt?
  Married           → NO
  Single, Divorced  → Refund?
                        Yes → NO
                        No  → TaxInc?  < 80K → NO,  > 80K → YES
ID3’s Question

Given a training set, which of all of the decision trees consistent with that training
set should we pick?

More precisely:
Given a training set, which of all of the decision trees consistent with that training
set has the greatest likelihood of correctly classifying unseen instances of the
population?
ID3’s (Approximate) Bias
• ID3 (and family) prefers simpler decision trees
• Occam’s Razor Principle:
– “It is vain to do with more what can be
done with less...Entities should not be
multiplied beyond necessity.”
• Intuitively:
– Always accept the simplest answer that fits
the data, avoid unnecessary constraints
– Simpler trees are more general
ID3’s Question Revisited

ID3 builds a decision tree by recursively selecting attributes and splitting the
training data based on the values of these attributes.

Practically:
Given a training set, how do we select attributes so that the resulting tree is as
small as possible, i.e. contains as few attributes as possible?
ID3
• Start with a training data set, which we'll call S. It should have attributes and a
class label.
• Determine the best attribute in the data set (the definition of "best attribute" is
given below).
• Split S into the subsets corresponding to the possible values of the best attribute.
• Make a decision tree node that contains the best attribute.
• Recursively generate new decision trees using the subsets of data created in step 3,
until a stage is reached where the data cannot be split further. Represent the class
as a leaf node.
A split has low information gain when the data is divided into children that each
contain an almost equal number of '+' and '−' examples; it has high information gain
when one child contains mostly '+' examples and the other mostly '−' examples. To
identify the best attribute we therefore use information gain.
Problem with Continuous Data
• Continuous attributes are handled by discretizing them into intervals: values from
10–20 form one bin, 20–30 the next, and so on, so that each particular value is mapped
to a discrete bin before splitting.
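A minimal sketch of this binning idea (my own illustration; the bin width and values are made up):

def discretize(value, bin_width=10):
    """Map a continuous value to an interval label, e.g. 14.2 -> '10-20'."""
    low = int(value // bin_width) * bin_width
    return f"{low}-{low + bin_width}"

ages = [14.2, 23.0, 27.5, 31.9]
print([discretize(a) for a in ages])   # ['10-20', '20-30', '20-30', '30-40']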
CART
• Classification And Regression Tree (CART)
• The Gini index is the metric CART uses for classification tasks.
• It is based on the sum of squared class probabilities:
  Gini = 1 − Σ (p_i)²,  for i = 1 to the number of classes
Outlook     Yes   No   Number of instances
Sunny        2     3   5
Overcast     4     0   4
Rain         3     2   5

Gini(Outlook=Sunny)    = 1 − (2/5)² − (3/5)² = 1 − 0.16 − 0.36 = 0.48
Gini(Outlook=Overcast) = 1 − (4/4)² − (0/4)² = 0
Gini(Outlook=Rain)     = 1 − (3/5)² − (2/5)² = 1 − 0.36 − 0.16 = 0.48
Then we calculate the weighted sum of the Gini indexes for the Outlook feature:
Gini(Outlook) = (5/14) × 0.48 + (4/14) × 0 + (5/14) × 0.48 = 0.171 + 0 + 0.171 = 0.342
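A short sketch (my own code) of the Gini computation above, written as a reusable function and applied to the Outlook counts:

def gini(counts):
    """Gini impurity 1 - sum(p_i^2) from a list of class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

# (yes, no) counts per Outlook value, from the table above
outlook = {"Sunny": (2, 3), "Overcast": (4, 0), "Rain": (3, 2)}
total = sum(sum(c) for c in outlook.values())           # 14

weighted = sum(sum(c) / total * gini(c) for c in outlook.values())
print(round(weighted, 3))                               # 0.343 (the slide rounds to 0.342)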
Temperature   Yes   No   Number of instances
Hot            2     2   4
Cool           3     1   4
Mild           4     2   6

Gini(Temp=Hot)  = 1 − (2/4)² − (2/4)² = 0.5
Gini(Temp=Cool) = 1 − (3/4)² − (1/4)² = 1 − 0.5625 − 0.0625 = 0.375
Gini(Temp=Mild) = 1 − (4/6)² − (2/6)² = 1 − 0.444 − 0.111 = 0.445
Weighted sum of the Gini index for the Temperature feature:
Gini(Temp) = (4/14) × 0.5 + (4/14) × 0.375 + (6/14) × 0.445 = 0.142 + 0.107 + 0.190 = 0.439
Humidity   Yes   No   Number of instances        Wind     Yes   No   Number of instances
High        3     4   7                          Weak      6     2   8
Normal      6     1   7                          Strong    3     3   6

Humidity (a binary feature: High or Normal):
Gini(Humidity=High)   = 1 − (3/7)² − (4/7)² = 1 − 0.183 − 0.326 = 0.489
Gini(Humidity=Normal) = 1 − (6/7)² − (1/7)² = 1 − 0.734 − 0.02 = 0.244
Gini(Humidity) = (7/14) × 0.489 + (7/14) × 0.244 = 0.367

Wind:
Gini(Wind=Weak)   = 1 − (6/8)² − (2/8)² = 1 − 0.5625 − 0.0625 = 0.375
Gini(Wind=Strong) = 1 − (3/6)² − (3/6)² = 1 − 0.25 − 0.25 = 0.5
Gini(Wind) = (8/14) × 0.375 + (6/14) × 0.5 = 0.428
Feature        Gini index
Outlook        0.342
Temperature    0.439
Humidity       0.367
Wind           0.428

Outlook has the lowest Gini index, so it is placed at the root of the tree and the data
is partitioned by its values.
Day Outlook Temp. Humidity Wind Decision

1 Sunny Hot High Weak No

2 Sunny Hot High Strong No

8 Sunny Mild High Weak No

9 Sunny Cool Normal Weak Yes

11 Sunny Mild Normal Strong Yes


Temperature Yes No Number of instances

Hot 0 2 2

Cool 1 0 1

Mild 1 1 2

Gini of Temperature for sunny outlook:
Gini(Outlook=Sunny and Temp=Hot)  = 1 − (0/2)² − (2/2)² = 0
Gini(Outlook=Sunny and Temp=Cool) = 1 − (1/1)² − (0/1)² = 0
Gini(Outlook=Sunny and Temp=Mild) = 1 − (1/2)² − (1/2)² = 1 − 0.25 − 0.25 = 0.5
Gini(Outlook=Sunny and Temp) = (2/5)×0 + (1/5)×0 + (2/5)×0.5 = 0.2
Wind      Yes   No   Number of instances        Humidity   Yes   No   Number of instances
Weak       1     2   3                          High        0     3   3
Strong     1     1   2                          Normal      2     0   2

Gini of Wind for sunny outlook:
Gini(Outlook=Sunny and Wind=Weak)   = 1 − (1/3)² − (2/3)² = 0.444
Gini(Outlook=Sunny and Wind=Strong) = 1 − (1/2)² − (1/2)² = 0.5
Gini(Outlook=Sunny and Wind) = (3/5)×0.444 + (2/5)×0.5 = 0.266 + 0.2 = 0.466

Gini of Humidity for sunny outlook:
Gini(Outlook=Sunny and Humidity=High)   = 1 − (0/3)² − (3/3)² = 0
Gini(Outlook=Sunny and Humidity=Normal) = 1 − (2/2)² − (0/2)² = 0
Gini(Outlook=Sunny and Humidity) = (3/5)×0 + (2/5)×0 = 0
Decision for sunny outlook
We have calculated the Gini index scores of each feature when the outlook is sunny. The winner is Humidity because it has the lowest value.
We therefore put a Humidity check at the extension of the sunny-outlook branch.

Feature Gini index

Temperature 0.2

Humidity 0

Wind 0.466
Decision for rain outlook
The winner for the rain-outlook branch is the Wind feature because it has the minimum Gini index score among the features.
Put the Wind feature on the rain-outlook branch and examine the new sub-data sets.

Feature Gini index

Temperature 0.466

Humidity 0.466

Wind 0
Advantages of Decision Trees
• Easy to use and understand.
• Can handle both categorical and numerical
data.
• Resistant to outliers, hence require little data
preprocessing.
• New features can be easily added.
• Can be used to build larger classifiers by using
ensemble methods.
Disadvantages
• Prone to overfitting.
• Require some measure of how well the tree is performing (e.g. validation accuracy).
• Need to be careful with parameter tuning.
• Can create biased learned trees if some
classes dominate.
