UNIT 1 CLASSIFICATION & PREDICTION DM
Classification
Basic Concepts
• There are two forms of data analysis that can be used to extract models describing
important classes or to predict future data trends.
» Classification
» Prediction
• Prediction :- for example, predicting how much a given customer will spend during a sale at the company.
How does classification work?
• Data classification is a two-step process,
» Learning step
» Classification step
• Learning step :- The training data sets are analyzed by a classification algorithm, and
the learned classifier is represented in the form of classification rules.
• Classification step :- Test data are used to estimate the accuracy of the classification rules.
If the accuracy is considered acceptable, the rules can be applied to the classification of
new data tuples.
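The two-step process can be sketched with a deliberately simple rule learner. This is an illustrative sketch only; the loan tuples below are hypothetical, not the actual Bank Loan_Decision data.

```python
from collections import Counter, defaultdict

def learn_rules(train, attr, label):
    """Learning step: analyze the training tuples and derive one
    classification rule (majority class) per value of `attr`."""
    counts = defaultdict(Counter)
    for row in train:
        counts[row[attr]][row[label]] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}

def estimate_accuracy(rules, test, attr, label):
    """Classification step: apply the learned rules to test tuples
    and estimate their accuracy."""
    hits = sum(rules.get(row[attr]) == row[label] for row in test)
    return hits / len(test)

# Hypothetical loan-decision tuples
train = [
    {"income": "high", "loan_decision": "approve"},
    {"income": "high", "loan_decision": "approve"},
    {"income": "low",  "loan_decision": "reject"},
]
test = [
    {"income": "high", "loan_decision": "approve"},
    {"income": "low",  "loan_decision": "reject"},
    {"income": "low",  "loan_decision": "approve"},
]
rules = learn_rules(train, "income", "loan_decision")
print(estimate_accuracy(rules, test, "income", "loan_decision"))  # 2 of 3 correct
```

If the estimated accuracy on the test tuples is acceptable, the learned rules can then be applied to classify new tuples.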
Models
Database :- Bank Loan_Decision
Datasets :- Bank Loan_Decision
Learning step :- Training data sets
Classification step :- Test data
Decision Tree Induction
• A decision tree includes a root node, branches, and leaf nodes.
• Each internal node denotes a test on an attribute,
• each branch denotes the outcome of a test,
• each leaf node holds a class label.
• The topmost node in the tree is the root node.
Example Decision Tree :-
• Decision trees can easily be converted to classification rules.
• Decision trees can handle multidimensional data.
• The learning and classification steps of decision tree induction are
simple and fast.
• They have good accuracy.
• Application areas include medicine, manufacturing and production,
financial analysis, astronomy, and molecular biology.
Decision Tree Induction Algorithm :-
Three parameters in DTI
• Input
• Method
• Output
• Input :-
– Data Partition , D (Training tuples and class label)
– Attribute_List
– Attribute_Selection_Method
• Method :-
1. create a node N;
2. if tuples in D are all of the same class, C, then
3. return N as a leaf node labeled with the class C;
4. if attribute_list is empty then
5. return N as a leaf node labeled with the majority class in D;
6. apply Attribute_selection_method(D, attribute_list) to find the best
splitting_criterion;
7. label node N with splitting_criterion;
8. if splitting_attribute is discrete-valued and multiway splits allowed then
remove splitting_attribute from attribute_list;
9. for each outcome j of splitting_criterion
10. let Dj be the set of data tuples in D satisfying outcome j;
11. if Dj is empty then
12. attach a leaf labeled with the majority class in D to node N;
13. else attach the node returned by Generate_decision_tree(Dj, attribute_list)
to node N;
endfor
14. return N;
• Output :-
A Decision Tree
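The Method above can be rendered as a runnable Python sketch. This is illustrative, not the exact textbook procedure: it assumes discrete-valued attributes, uses information gain as the Attribute_selection_method, and grows a multiway split for each observed value. Removing the splitting attribute at each level guarantees the recursion terminates.

```python
import math
from collections import Counter

def entropy(rows, label):
    counts = Counter(r[label] for r in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def info_gain(rows, attr, label):
    total = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / total * entropy(subset, label)
    return entropy(rows, label) - remainder

def generate_decision_tree(rows, attrs, label):
    classes = {r[label] for r in rows}
    if len(classes) == 1:                    # steps 2-3: pure partition -> leaf
        return classes.pop()
    if not attrs:                            # steps 4-5: no attributes left -> majority leaf
        return Counter(r[label] for r in rows).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, a, label))   # step 6
    remaining = [a for a in attrs if a != best]                  # step 8
    tree = {}                                # node N, labeled with `best`
    for value in {r[best] for r in rows}:    # steps 9-13: multiway split
        subset = [r for r in rows if r[best] == value]
        tree[(best, value)] = generate_decision_tree(subset, remaining, label)
    return tree

rows = [
    {"student": "yes", "buys_computer": "yes"},
    {"student": "yes", "buys_computer": "yes"},
    {"student": "no",  "buys_computer": "no"},
]
print(generate_decision_tree(rows, ["student"], "buys_computer"))
# maps ('student', 'yes') -> 'yes' and ('student', 'no') -> 'no'
```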
There are three possible scenarios,
1. Discrete-valued
2. Continuous-valued
3. Discrete-valued and a binary tree
1. Discrete-valued :- If A is discrete-valued, then one branch is grown for each
known value of A.
2. Continuous-valued :- If A is continuous-valued, then two branches are
grown, corresponding to A <= split_point and A > split_point,
where split_point is the split-point returned by Attribute_selection_method
as part of the splitting criterion.
3. Discrete-valued and a binary tree :- If A is discrete-valued and a binary tree
must be produced, then the test is of the form A ∈ SA, where SA is the
splitting subset for A.
Attribute Selection Measures
Information Gain :- For the customers table (9 tuples of class buys_computer =
yes and 5 tuples of class buys_computer = no),
Info(D) = -(9/14) log2(9/14) - (5/14) log2(5/14)
= 0.940 bits
Info_age(D) = (5/14) [-(2/5) log2(2/5) - (3/5) log2(3/5)]
+ (4/14) [-(4/4) log2(4/4)]
+ (5/14) [-(3/5) log2(3/5) - (2/5) log2(2/5)]
= 0.694 bits
Gain(age) = Info(D) - Info_age(D) = 0.940 - 0.694 = 0.246 bits
Similarly, we can compute,
Gain(income) = 0.029 bits,
Gain(student) = 0.151 bits,
Gain(credit_rating) = 0.048 bits.
Age has the highest information gain among the attributes, so it is selected as
the splitting attribute. Node N is labeled with age, and branches are grown
for each of the attribute's values.
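The Info(D), Info_age(D), and Gain(age) values can be reproduced with a short sketch. The per-value class counts for age (youth: 2 yes/3 no, middle_aged: 4 yes/0 no, senior: 3 yes/2 no) are assumed from the customers table used in this example.

```python
import math

def entropy(counts):
    """Entropy of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

info_d = entropy([9, 5])            # Info(D) for 9 yes / 5 no tuples
# Assumed class distribution per age value (youth, middle_aged, senior):
partitions = [[2, 3], [4, 0], [3, 2]]
info_age = sum(sum(p) / 14 * entropy(p) for p in partitions)
gain_age = info_d - info_age
print(round(info_d, 3), round(info_age, 3), round(gain_age, 3))
# prints 0.94 0.694 0.247 (the text rounds intermediate values, giving 0.246)
```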
• Again we have split the remaining branches
• Similarly, we can compute,
Gain(student) = 0.971
Gain(credit_rating) = 0.021
Student has the highest information gain among the remaining attributes, so it
is selected as the splitting attribute.
Final Decision Tree
Gain Ratio
• The information gain measure is biased toward tests with many outcomes.
• It prefers to select attributes having a large number of values.
• For example, consider an attribute that acts as a unique identifier such as
product ID. A split on product ID would result in a large number of
partitions (as many as there are values), each one containing just one tuple.
• Because each partition is pure, the information required to classify data set
D based on this partitioning would be Info_product_ID(D) = 0.
• Therefore, the information gained by partitioning on this attribute is
maximal, yet such a partitioning is useless for classification.
• C4.5, a successor of ID3, uses an extension to information gain known as
the gain ratio.
• It applies a kind of normalization to information gain using a "split
information" value defined analogously with Info(D) as
SplitInfo_A(D) = - Σ (|Dj| / |D|) log2(|Dj| / |D|), summed over the v outcomes j = 1, ..., v
• The gain ratio is GainRatio(A) = Gain(A) / SplitInfo_A(D), and the attribute
with the maximum gain ratio is selected as the splitting attribute.
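A short sketch of the gain-ratio computation. The income distribution (4 low, 6 medium, 4 high tuples) is assumed from the customers table, and Gain(income) = 0.029 bits is taken from the information-gain example earlier.

```python
import math

def split_info(partition_sizes):
    """SplitInfo_A(D) for a split producing partitions of the given sizes."""
    total = sum(partition_sizes)
    return -sum(s / total * math.log2(s / total) for s in partition_sizes if s)

# income splits D into 4 low, 6 medium, and 4 high tuples (assumed counts)
si_income = split_info([4, 6, 4])
gain_ratio_income = 0.029 / si_income   # Gain(income) from the example above
print(round(si_income, 3), round(gain_ratio_income, 3))
# prints 1.557 0.019
```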
Gini Index
• The Gini index measures the impurity of a data partition D as
Gini(D) = 1 - Σ pi^2
where pi is the probability that a tuple in D belongs to class Ci. For a
binary split of D into D1 and D2 on attribute A,
Gini_A(D) = (|D1| / |D|) Gini(D1) + (|D2| / |D|) Gini(D2).
• The attribute that maximizes the reduction in impurity (or, equivalently, has
the minimum Gini index) is selected as the splitting attribute. This attribute
and either its splitting subset (for a discrete-valued splitting attribute) or
split-point (for a continuous-valued splitting attribute) together form the
splitting criterion.
• Example :- Induction of a decision tree using the Gini index. Let D be the
training data shown earlier in the customers table, where 9 tuples belong to
the class buys_computer = yes and the remaining 5 tuples belong to the class
buys_computer = no. A (root) node N is created for the tuples in D. We first
use the Gini index formula to compute the impurity of D:
• Solution :-
Gini(D) = 1 - (9/14)^2 - (5/14)^2 = 0.459
• To find the splitting criterion for the tuples in D, we need to compute the
Gini index for each attribute.
• Let’s start with the attribute income and consider each of the possible
splitting subsets.
• Consider the subset {low, medium}. This would result in 10 tuples in
partition D1 satisfying the condition "income Є {low, medium}." The
remaining 4 tuples of D would be assigned to partition D2. The Gini index
value computed based on this partitioning is
Gini_income Є {low,medium}(D) = (10/14) Gini(D1) + (4/14) Gini(D2)
= (10/14) [1 - (7/10)^2 - (3/10)^2] + (4/14) [1 - (2/4)^2 - (2/4)^2]
= 0.443
• Similarly, the Gini index can be computed for the remaining splitting
subsets of income and for the other attributes age, student, and credit_rating.
• The binary split "age Є {youth, senior}" gives the minimum Gini index
overall, with a reduction in impurity of
0.459 - 0.357 = 0.102
• This split yields the maximum reduction in impurity of the tuples in D and
is returned as the splitting criterion. Node N is labeled with the criterion,
two branches are grown from it, and the tuples are partitioned accordingly.
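The quoted Gini values can be checked with a few lines of Python. The class counts per partition (age Є {youth, senior}: 5 yes/5 no; middle_aged: 4 yes/0 no) are assumed from the customers table.

```python
def gini(counts):
    """Gini impurity of a class distribution given as a list of counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

gini_d = gini([9, 5])   # impurity of the full data set D
# Binary split age Є {youth, senior}: D1 has 5 yes / 5 no,
# D2 (middle_aged) has 4 yes / 0 no  (assumed counts)
gini_age = 10 / 14 * gini([5, 5]) + 4 / 14 * gini([4, 0])
reduction = gini_d - gini_age
print(round(gini_d, 3), round(gini_age, 3), round(reduction, 3))
# prints 0.459 0.357 0.102
```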
Tree Pruning
• In some data sets (i.e., in large data sets), decision tree induction builds
trees with many branches, some of which reflect anomalies due to noise and
outliers.
• Tree pruning methods address this problem of overfitting the data.
• Such methods typically use statistical measures to remove the least reliable
branches.
• “How does tree pruning work?” There are two common approaches to tree
pruning:
1. Prepruning
2. Postpruning
1. Prepruning :-
In the prepruning approach, a tree is "pruned" by halting its construction
early.
Example :- by deciding not to further split or partition the subset of
training tuples at a given node. Upon halting, the node becomes a leaf.
The leaf may hold the most frequent class among the subset tuples or
the probability distribution of those tuples.
2. Postpruning :-
The postpruning approach removes subtrees from a "fully grown" tree.
A subtree at a given node is pruned by removing its branches and
replacing it with a leaf.
The leaf is labeled with the most frequent class among the subtree
being replaced.
• Step 3 :- As P(X) is constant for all classes, only P(X/Ci) P(Ci) needs to be
maximized.
If the class prior probabilities are not known, then it is commonly
assumed that the classes are equally likely, that is, P(C1) = P(C2) = ........ =
P(Cm), and we would therefore maximize P(X/Ci). Otherwise, we maximize
P(X/Ci) P(Ci). Note that the class prior probabilities may be estimated by
P(Ci) = |Ci, D|/|D|, where |Ci,D| is the number of training tuples of class Ci
in D.
• Step 4 :- Given data sets with many attributes, it would be extremely
computationally expensive to compute P(X/Ci). To reduce computation in
evaluating P(X/Ci), the naive assumption of class-conditional independence
is made. This presumes that the attributes' values are conditionally
independent of one another, given the class label of the tuple (i.e., that there
are no dependence relationships among the attributes). Thus,
P(X/Ci) = P(x1/Ci) * P(x2/Ci) * ...... * P(xn/Ci)
• The predicted class label is the class Ci for which P(X/Ci) P(Ci) is the
maximum.
Example :-
Predicting a class label using naive Bayesian classification. We wish to
predict the class label of a tuple using naive Bayesian classification, given the
same training data as in customer dataset for decision tree induction. The
training data were shown earlier in Table customer dataset. The data tuples
are described by the attributes age, income, student, and credit rating. The
class label attribute, buys computer, has two distinct values (namely, {yes,
no}). Let C1 correspond to the class buys computer = yes and C2 correspond
to buys computer = no. The tuple we wish to classify is
X = (age = youth, income = medium, student = yes, credit-rating = fair)
We need to maximize P(X/Ci) P(Ci), for i = 1, 2. P(Ci), the prior probability
of each class, can be computed based on the training tuples:
P(buys computer = yes) = 9/14 = 0.643
P(buys computer = no) = 5/14 = 0.357
To compute P(X/Ci), for i = 1, 2, we compute the following conditional
probabilities:
P(age = youth / buys computer = yes) = 2/9 = 0.222
P(age = youth / buys computer = no) = 3/5 = 0.600
P(income = medium / buys computer = yes) = 4/9 = 0.444
P(income = medium / buys computer = no) = 2/5 = 0.400
P(student = yes / buys computer = yes) = 6/9 = 0.667
P(student = yes / buys computer = no) = 1/5 = 0.200
P(credit rating = fair / buys computer = yes) = 6/9 = 0.667
P(credit rating = fair / buys computer = no) = 2/5 = 0.400
• Using these probabilities, we obtain
P(X/buys computer = yes) = P(age = youth / buys computer = yes)
* P(income = medium / buys computer = yes)
* P(student = yes / buys computer = yes)
* P(credit rating = fair / buys computer = yes)
=0.222 * 0.444 * 0.667 * 0.667
= 0.044.
Similarly,
P(X/buys computer = no) = 0.600 * 0.400 * 0.200 * 0.400
= 0.019.
To find the class, Ci , that maximizes P(X/Ci) P(Ci), we compute
P(X/buys computer = yes) P(buys computer = yes) = 0.044 * 0.643 = 0.028
P(X/buys computer = no) P(buys computer = no) = 0.019 * 0.357 = 0.007
Therefore, the naive Bayesian classifier predicts buys computer = yes for tuple X.
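The arithmetic of this worked example can be verified with a short sketch that plugs in the priors and class-conditional probabilities listed above.

```python
from math import prod

# Priors and class-conditional probabilities from the worked example above
p = {"yes": 9 / 14, "no": 5 / 14}
cond = {
    "yes": [2 / 9, 4 / 9, 6 / 9, 6 / 9],   # age=youth, income=medium, student=yes, credit=fair
    "no":  [3 / 5, 2 / 5, 1 / 5, 2 / 5],
}
# Score each class by P(X/Ci) * P(Ci) and pick the maximum
scores = {c: p[c] * prod(cond[c]) for c in p}
prediction = max(scores, key=scores.get)
print(prediction, round(scores["yes"], 3), round(scores["no"], 3))
# prints yes 0.028 0.007
```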
Regression
• Regression is a data mining function that predicts a number. Age, weight,
distance, temperature, income, or sales could all be predicted using
regression techniques. For example, a regression model could be used to
predict children's height, given their age, weight, and other factors.
• A regression task begins with a data set in which the target values are
known.
• For example, a regression model that predicts children's height could be
developed based on observed data for many children over a period of time.
The data might track age, height, weight, developmental milestones, family
history, and so on. Height would be the target, the other attributes would be
the predictors, and the data for each child would constitute a case.
• Regression models are tested by computing various statistics that measure
the difference between the predicted values and the expected values.
• Regression is for predicting a numeric attribute. Regression analysis can be
used to model the relationship between one or more independent (predictor)
variables and a dependent (response) variable.
• Two types of Regression
1. Linear Regression
2. Multiple Regression
1. Linear Regression :-
Simple linear regression is a method that enables you to determine the
relationship between a continuous process output (Y) and one factor (X).
The relationship is typically expressed in terms of a mathematical equation
such as Y = b + mX
Here,
Y --> response, b and m --> constant, x --> predictor variable.
Y = m0 + m1x
m1 = Σ (xi - x')(yi - y') / Σ (xi - x')^2, with the sums taken over i = 1, ..., |D|
m0 = y' - m1 x'
where x' and y' are the mean values of x and y over the |D| training pairs.
Linear regression is performed either to predict the response variable based
on the predictor variables, or to study the relationship between the response
variable and predictor variables.
Example :- Linear Regression, salary database
x (Experience) y (salary in $1000)
3 30
8 57
9 64
13 72
3 36
6 43
11 59
21 90
1 20
16 83
• Solution :-
x' = (3 + 8 + 9 + 13 + 3 + 6 + 11 + 21 + 1 + 16) / 10
x' = 9.1
y' = (30 + 57 + 64 + 72 + 36 + 43 + 59 + 90 + 20 + 83) / 10
y' = 55.4
m1 = [(3 - 9.1)(30 - 55.4) + (8 - 9.1)(57 - 55.4) + ...... + (16 - 9.1)(83 - 55.4)]
/ [(3 - 9.1)^2 + (8 - 9.1)^2 + ...... + (16 - 9.1)^2]
m1 = 3.5
m0 = 55.4 - 3.5 * 9.1
m0 = 23.6
y = 23.6 + 3.5x (equation for straight line)
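A minimal sketch of simple linear regression on the salary data above. Note that computing at full precision gives m1 ≈ 3.54 and m0 ≈ 23.2; the text's m0 = 23.6 comes from rounding m1 to 3.5 before computing m0.

```python
def simple_linear_regression(xs, ys):
    """Least-squares estimates of m0 and m1 for Y = m0 + m1*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    m1 = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) \
         / sum((x - x_mean) ** 2 for x in xs)
    m0 = y_mean - m1 * x_mean
    return m0, m1

xs = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]        # years of experience
ys = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]  # salary in $1000
m0, m1 = simple_linear_regression(xs, ys)
print(round(m1, 1), round(m0, 1))  # prints 3.5 23.2
```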
Fig for Linear Regression :-
2. Multiple Regression :-
Y = b0 + b1X1 + b2X2 + .... + bkXk + e
where Y is the dependent variable (response), X1, X2, ..., Xk are the
independent variables (predictors), and e is the random error. b0, b1, b2, ...., bk
are known as the regression coefficients, which have to be estimated from the
data.
The multiple linear regression algorithm in XLMiner chooses regression
coefficients so as to minimize the difference between predicted values and
actual values.
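A minimal sketch of the same least-squares fit, assuming NumPy is available (XLMiner is not required; any least-squares solver minimizes the same objective). The data and the generating equation Y = 2 + 3X1 - X2 are hypothetical, chosen so the solver recovers the coefficients exactly.

```python
import numpy as np

# Hypothetical noise-free data generated from Y = 2 + 3*X1 - X2
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 2 + 3 * X[:, 0] - X[:, 1]
A = np.column_stack([np.ones(len(X)), X])   # prepend intercept column for b0
b, *_ = np.linalg.lstsq(A, y, rcond=None)   # minimize ||A b - y||^2
print(np.round(b, 6))                       # [b0, b1, b2]
```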
Fig for Multiple Linear regression :-
Model Evaluation and Selection
• Having built a classification model, we face the questions of which classifier
is the best and how accurate it is.
• For example, suppose you used data from previous sales to build a classifier
to predict customer purchasing behavior. You would like an estimate of how
accurately the classifier can predict the purchasing behavior of future
customers, that is, future customer data on which the classifier has not been
trained.
• But what is accuracy? How can we estimate it? Are some measures of a
classifier's accuracy more appropriate than others? How can we obtain a
reliable accuracy estimate?
• This section describes various evaluation metrics for the predictive accuracy
of a classifier:
– Holdout and random subsampling
– Cross-validation and bootstrap methods
• These are common techniques for assessing accuracy, based on randomly
sampled partitions of the given data.
• What if we have more than one classifier and want to choose the “best” one?
This is referred to as model selection.
• The last two sections address this issue and discuss how to use tests of
statistical significance to assess whether the difference in accuracy between
two classifiers is due to chance.
• Techniques to Improve Classification Accuracy presents how to compare
classifiers based on cost–benefit and receiver operating characteristic (ROC)
curves.
Metrics for Evaluating Classifier Performance
• This section presents measures for assessing how good or how “accurate”
your classifier is at predicting the class label of tuples.
• We will consider the case where the class tuples are more or less evenly
distributed, as well as the case where classes are unbalanced.
• They include accuracy (also known as recognition rate), sensitivity (or
recall), specificity, precision, F1, and Fβ.
• Note that although accuracy is a specific measure, the word “accuracy” is
also used as a general term to refer to a classifier’s predictive abilities.
• Before we discuss the various measures, we need to become comfortable
with some terminology. Recall that we can talk in terms of positive tuples
(tuples of the main class of interest) and negative tuples (all other tuples).
• Given two classes, for example, the positive tuples may be buys computer =
yes while the negative tuples are buys computer = no.
• Suppose we use our classifier on a test set of labeled tuples. P is the number of positive
tuples and N is the number of negative tuples. For each tuple, we compare the
classifier’s class label prediction with the tuple’s known class label.
• There are four additional terms we need to know that are the "building blocks" used in
computing many evaluation measures. Understanding them will make it easy to grasp
the meaning of the various measures.
– True positives (TP): These refer to the positive tuples that were correctly labeled by
the classifier. Let TP be the number of true positives.
– True negatives (TN): These are the negative tuples that were correctly labeled by
the classifier. Let TN be the number of true negatives.
– False positives (FP): These are the negative tuples that were incorrectly labeled as
positive (e.g., tuples of class buys computer = no for which the classifier predicted
buys computer = yes). Let FP be the number of false positives.
– False negatives (FN): These are the positive tuples that were mislabeled as negative
(e.g., tuples of class buys computer = yes for which the classifier predicted buys
computer = no). Let FN be the number of false negatives.
Confusion Matrix
• These terms are summarized in the confusion matrix,
• The confusion matrix is a useful tool for analyzing how well your classifier
can recognize tuples of different classes. TP and TN tell us when the
classifier is getting things right, while FP and FN tell us when the classifier
is getting things wrong.
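The confusion-matrix counts map directly onto the evaluation measures listed earlier. A minimal sketch with hypothetical counts:

```python
def classifier_metrics(tp, tn, fp, fn):
    """Standard evaluation measures computed from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # recall / true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, precision, f1

# Hypothetical counts: 90 TP, 80 TN, 20 FP, 10 FN
acc, sens, spec, prec, f1 = classifier_metrics(tp=90, tn=80, fp=20, fn=10)
print(round(acc, 3), round(sens, 3), round(spec, 3), round(prec, 3), round(f1, 3))
# prints 0.85 0.9 0.8 0.818 0.857
```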
• In addition to accuracy-based measures, classifiers can also be compared
with respect to the following additional aspects:
– Speed: This refers to the computational costs involved in generating and
using the given classifier.
– Robustness: This is the ability of the classifier to make correct
predictions given noisy data or data with missing values. Robustness is
typically assessed with a series of synthetic data sets representing
increasing degrees of noise and missing values.
– Scalability: This refers to the ability to construct the classifier efficiently
given large amounts of data. Scalability is typically assessed with a series
of data sets of increasing size.
– Interpretability: Interpretability is subjective and therefore more difficult
to assess. Decision trees and classification rules can be easy to interpret,
yet their interpretability may diminish the more they become complex.
Holdout Method and Random Subsampling
Holdout Method
• The holdout method is what we have alluded to so far in our discussions
about accuracy.
• In this method, the given data are randomly partitioned into two
independent sets, a training set and a test set.
• Typically, two-thirds of the data are allocated to the training set, and the
remaining one-third is allocated to the test set.
Random subsampling
• Random subsampling is a variation of the holdout method in which the
holdout method is repeated k times.
• The overall accuracy estimate is taken as the average of the accuracies
obtained from each iteration.
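The holdout method and random subsampling can be sketched as follows, with a trivial majority-class classifier standing in for a real learner (the data and split fraction are illustrative):

```python
import random
from collections import Counter

def holdout_accuracy(rows, label, train_frac=2/3, rng=None):
    """One holdout round: random ~2/3 train / 1/3 test split."""
    rng = rng or random.Random(0)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    train, test = shuffled[:cut], shuffled[cut:]
    # "Train" a majority-class classifier on the training set only
    majority = Counter(r[label] for r in train).most_common(1)[0][0]
    return sum(r[label] == majority for r in test) / len(test)

def random_subsampling(rows, label, k=10):
    """Repeat the holdout method k times and average the accuracies."""
    rng = random.Random(42)
    return sum(holdout_accuracy(rows, label, rng=rng) for _ in range(k)) / k

rows = [{"buys_computer": "yes"}] * 9 + [{"buys_computer": "no"}] * 5
print(round(random_subsampling(rows, "buys_computer"), 3))
```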
Cross-Validation