
UNIT V: Classification

Decision trees
Naïve Bayes

Ref: Han and Kamber, Data Mining: Concepts and Techniques (book and slides)

Decision trees: overview, general algorithm, decision tree algorithms, evaluating a decision tree.
Naïve Bayes: Bayes' theorem and algorithm, the Naïve Bayes classifier, smoothing, diagnostics.
Diagnostics of classifiers, additional classification methods.

Classification and Prediction

 What is classification? What is prediction?
 Issues regarding classification and prediction
 Classification by decision tree induction
 Bayesian classification
 Rule-based classification
 Classification by back propagation


Classification vs. Prediction
 Classification
 predicts categorical class labels (discrete or nominal)
 classifies data (constructs a model) based on the training set and the values (class labels) of a classifying attribute, and uses the model to classify new data
 Prediction
 models continuous-valued functions, i.e., predicts unknown or missing values
 Ex. A manager wants to predict how much a given customer will spend during a sale at the shop.
 This is numeric prediction, so regression is normally used.


Typical applications

 Loan approval: e.g., which loan applications are safe or risky for the bank.

 Target marketing: e.g., whether a customer with a given profile will buy a new computer or not.

 Medical diagnosis: e.g., a medical researcher may want to analyze cancer data to predict which one of three treatments a patient should receive.

 Fraud detection: e.g., detecting whether fraud has occurred or not.


Classification—A Two-Step Process
 Model construction: describing a set of predetermined classes
 Each tuple/sample is assumed to belong to a predefined class,
as determined by the class label attribute
 The set of tuples used for model construction is training set
 The model is represented as classification rules, decision trees,
or mathematical formulae
 Model usage: for classifying future or unknown objects
 Estimate accuracy of the model
 The known label of test sample is compared with the
classified result from the model
 Accuracy rate is the percentage of test set samples that are
correctly classified by the model
 Test set is independent of training set, otherwise over-fitting
will occur
 If the accuracy is acceptable, use the model to classify data
tuples whose class labels are not known
Process (1): Model Construction

The training data are fed to a classification algorithm, which produces the classifier (model).

NAME   RANK            YEARS   TENURED
Mike   Assistant Prof  3       no
Mary   Assistant Prof  7       yes
Bill   Professor       2       yes
Jim    Associate Prof  7       yes
Dave   Assistant Prof  6       no
Anne   Associate Prof  3       no

Learned model (example rule): IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
Process (2): Using the Model in Prediction

The classifier is applied first to test data (to estimate accuracy) and then to unseen data, e.g., (Jeff, Professor, 4) → Tenured?

NAME     RANK            YEARS   TENURED
Tom      Assistant Prof  2       no
Merlisa  Associate Prof  7       no
George   Professor       5       yes
Joseph   Assistant Prof  7       yes
Supervised vs. Unsupervised Learning

 Supervised learning (classification)
 Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
 New data are classified based on the training set
 Unsupervised learning (clustering)
 The class labels of the training data are unknown
 Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data


Issues: Data Preparation

 Data cleaning
 Preprocess data in order to reduce noise and handle
missing values
 Relevance analysis (feature selection)
 Remove the irrelevant or redundant attributes
 Data transformation
 Generalize and/or normalize data



Issues: Evaluating Classification Methods
 Accuracy
 classifier accuracy: predicting class label
 predictor accuracy: how accurately the predictor guesses the value of the predicted attribute (e.g., when filling in missing values)
 Speed
 time to construct the model (training time)
 time to use the model (classification/prediction time)
 Robustness: handling noise and missing values
 Scalability: efficiency in disk-resident databases
 Interpretability
 understanding and insight provided by the model
 Other measures, e.g., goodness of rules, such as decision
tree size or compactness of classification rules


Decision Tree Induction: Training Dataset

age income student credit_rating buys_computer


<=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no



Output: A Decision Tree for "buys_computer"

age?
  <=30:    student?
             no  -> buys_computer = no
             yes -> buys_computer = yes
  31..40:  buys_computer = yes
  >40:     credit rating?
             excellent -> buys_computer = no
             fair      -> buys_computer = yes


Algorithm for Decision Tree Induction
 Basic algorithm (a greedy algorithm)
 Tree is constructed in a top-down recursive divide-and-conquer
manner
 At start, all the training examples are at the root
 Attributes are categorical (if continuous-valued, they are
discretized in advance)
 Examples are partitioned recursively based on selected attributes
 Test attributes are selected on the basis of a heuristic or
statistical measure (e.g., information gain)
 Conditions for stopping partitioning
 All samples for a given node belong to the same class
 There are no remaining attributes for further partitioning –
majority voting is employed for classifying the leaf
 There are no samples left
Attribute Selection Measure:
Information Gain (ID3/C4.5)
 Select the attribute with the highest information gain
 Let p_i be the probability that an arbitrary tuple in D belongs to class C_i, estimated by |C_i,D| / |D|
 Expected information (entropy) needed to classify a tuple in D:

    Info(D) = − Σ_{i=1}^{m} p_i log2(p_i)


 Information needed (after using attribute A to split D into v partitions, where v is the number of distinct values of A) to classify D:

    Info_A(D) = Σ_{j=1}^{v} (|D_j| / |D|) × Info(D_j)

    where |D_j| is the number of tuples in D that have outcome a_j of A.

 Information gained by branching on attribute A:

    Gain(A) = Info(D) − Info_A(D)


Attribute Selection: Information Gain

 Class P: buys_computer = "yes" (9 tuples)
 Class N: buys_computer = "no" (5 tuples)

    Info(D) = I(9,5) = − (9/14) log2(9/14) − (5/14) log2(5/14) = 0.940

 For age, the class counts in each partition of the training table above are:

    age       p_i   n_i   I(p_i, n_i)
    <=30      2     3     0.971
    31...40   4     0     0
    >40       3     2     0.971

    Info_age(D) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694

    Here (5/14) I(2,3) means that "age <= 30" covers 5 of the 14 samples, with 2 yes's and 3 no's. Hence

    Gain(age) = Info(D) − Info_age(D) = 0.246

 Similarly, find Gain(income), Gain(student) and Gain(credit_rating):

    Gain(income) = 0.029
    Gain(student) = 0.151
    Gain(credit_rating) = 0.048
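The slides give no code for this computation; the following is a minimal Python sketch (an addition, with the training table hard-coded from the dataset slide above) that reproduces these gain values.

import math
import pandas as pd

# buys_computer training set copied from the slide above.
data = pd.DataFrame({
    "age": ["<=30", "<=30", "31...40", ">40", ">40", ">40", "31...40",
            "<=30", "<=30", ">40", "<=30", "31...40", "31...40", ">40"],
    "income": ["high", "high", "high", "medium", "low", "low", "low",
               "medium", "low", "medium", "medium", "medium", "high", "medium"],
    "student": ["no", "no", "no", "no", "yes", "yes", "yes",
                "no", "yes", "yes", "yes", "no", "yes", "no"],
    "credit_rating": ["fair", "excellent", "fair", "fair", "fair", "excellent",
                      "excellent", "fair", "fair", "fair", "excellent",
                      "excellent", "fair", "excellent"],
    "buys_computer": ["no", "no", "yes", "yes", "yes", "no", "yes",
                      "no", "yes", "yes", "yes", "yes", "yes", "no"],
})

def entropy(labels):
    # Info(D) = -sum p_i log2(p_i) over the class proportions.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in labels.value_counts() if c > 0)

def info_gain(df, attr, target="buys_computer"):
    # Gain(A) = Info(D) - Info_A(D), with Info_A(D) the size-weighted entropy.
    info_a = sum(len(part) / len(df) * entropy(part[target]) for _, part in df.groupby(attr))
    return entropy(df[target]) - info_a

for attr in ["age", "income", "student", "credit_rating"]:
    print(attr, round(info_gain(data, attr), 3))
# Prints approximately 0.247, 0.029, 0.152 and 0.048; the slide's 0.246 and 0.151
# come from rounding the Info values to three decimals before subtracting.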

How to handle a continuous target value

Take an example where age is the target variable. Compute the variance of age at the node, and say it comes out to be x. The decision tree then looks at the candidate splits and calculates the total weighted variance of the partitions produced by each split. It chooses the split that gives the minimum weighted variance.
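A minimal sketch of this variance-reduction criterion, using hypothetical numbers (none of these values come from the slides):

import numpy as np

target = np.array([23, 25, 31, 35, 40, 47, 52, 60], dtype=float)   # continuous target (age)
feature = np.array([1, 1, 2, 2, 3, 3, 4, 4], dtype=float)          # candidate split attribute

def weighted_variance(y, mask):
    # Total weighted variance of the two partitions induced by the split.
    left, right = y[mask], y[~mask]
    return (len(left) / len(y)) * left.var() + (len(right) / len(y)) * right.var()

# Candidate split points: midpoints between adjacent distinct feature values.
vals = np.unique(feature)
midpoints = (vals[:-1] + vals[1:]) / 2
best = min(midpoints, key=lambda s: weighted_variance(target, feature <= s))
print("chosen split point:", best)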

Techniques for encoding categorical variables
• Replacing values
• Encoding labels
• One-hot encoding
• Binary encoding
• Backward difference encoding

• replace(): map each category to a number, e.g.

replace_map = {'carrier': {'AA': 1, 'AS': 2, 'B6': 3, 'DL': 4, 'F9': 5,
                           'HA': 6, 'OO': 7, 'UA': 8, 'US': 9, 'VX': 10, 'WN': 11}}
replace_map = {'College': {'VIT': 1, 'VIIT': 2, 'MIT': 3}}

Suitable when there are few categories.
labels = cat_df_flights['carrier'].astype('category').cat.categories.tolist()
replace_map_comp = {'carrier': {k: v for k, v in zip(labels, range(1, len(labels) + 1))}}
print(replace_map_comp)

# Assumes cat_df_flights is the flights DataFrame used in the examples above.
cat_df_flights_replace = cat_df_flights.copy()
cat_df_flights_replace.replace(replace_map_comp, inplace=True)
print(cat_df_flights_replace.head())
Label encoding: converts each value in a column to a number. Numerical labels are always between 0 and n_categories - 1.

# The column must have the pandas 'category' dtype before .cat.codes can be used.
cat_df_flights_lc = cat_df_flights.copy()
cat_df_flights_lc['carrier'] = cat_df_flights_lc['carrier'].astype('category')
cat_df_flights_lc['carrier'] = cat_df_flights_lc['carrier'].cat.codes
cat_df_flights_lc.head()   # alphabetically labeled from 0 to 10
One-hot encoding: convert each category value into a new column and assign a 1 or 0 (True/False) value to the column. This has the benefit of not weighting a value improperly.

pandas.get_dummies()

cat_df_flights_onehot = cat_df_flights.copy()
cat_df_flights_onehot = pd.get_dummies(cat_df_flights_onehot, columns=['carrier'], prefix=['carrier'])
print(cat_df_flights_onehot.head())
Binary encoding

The categories are first encoded as ordinal integers, those integers are then converted into binary code, and the digits of the binary string are split into separate columns. This encodes the data in fewer dimensions than one-hot encoding.
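As an illustration only (the slides do not name a library), binary encoding is available in the third-party category_encoders package; the carrier column follows the earlier flight examples:

import pandas as pd
import category_encoders as ce

df = pd.DataFrame({'carrier': ['AA', 'AS', 'B6', 'DL', 'AA', 'B6']})
encoder = ce.BinaryEncoder(cols=['carrier'])    # ordinal codes -> binary digits -> columns
print(encoder.fit_transform(df).head())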

Backward Difference Encoding

A feature with K categories (levels) usually enters a regression as a sequence of K - 1 dummy variables. In backward difference coding, the mean of the dependent variable for a level is compared with the mean of the dependent variable for the prior level. This type of coding may be useful for a nominal or an ordinal variable.
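Again as an illustration only, the third-party category_encoders package provides this contrast coding:

import pandas as pd
import category_encoders as ce

df = pd.DataFrame({'carrier': ['AA', 'AS', 'B6', 'DL', 'AA', 'B6']})
encoder = ce.BackwardDifferenceEncoder(cols=['carrier'])   # K categories -> K-1 contrast columns
print(encoder.fit_transform(df).head())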

Example: building and visualizing a decision tree with scikit-learn

from sklearn import tree
from sklearn.datasets import load_iris
import graphviz

# Load dataset:
iris = load_iris()

# Build and train classifier:
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)

# Visualize model (the original snippet is truncated here; filled/rounded styling
# and graphviz rendering are a reasonable completion):
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=iris.feature_names,
                                class_names=iris.target_names,
                                filled=True, rounded=True)
graph = graphviz.Source(dot_data)

Computing Information-Gain for
Continuous-Value Attributes
 Let attribute A be a continuous-valued attribute
 Must determine the best split point for A
 Sort the values of A in increasing order
 Typically, the midpoint between each pair of adjacent values is considered as a possible split point
 (a_i + a_{i+1}) / 2 is the midpoint between the values of a_i and a_{i+1}
 The point with the minimum expected information requirement for A is selected as the split-point for A
 Split: D1 is the set of tuples in D satisfying A ≤ split-point, and D2 is the set of tuples in D satisfying A > split-point
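A minimal sketch of this procedure on hypothetical (age, class) pairs, with a small entropy helper:

import math

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n) for c in set(labels))

def expected_info(values, labels, split):
    # Info_A(D) for the binary split A <= split vs. A > split.
    left = [y for v, y in zip(values, labels) if v <= split]
    right = [y for v, y in zip(values, labels) if v > split]
    n = len(labels)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)

ages = [22, 25, 28, 33, 38, 41, 45, 52]                       # hypothetical attribute values
buys = ["no", "no", "yes", "yes", "yes", "yes", "no", "no"]   # hypothetical class labels
vals = sorted(set(ages))
midpoints = [(a + b) / 2 for a, b in zip(vals, vals[1:])]
best = min(midpoints, key=lambda s: expected_info(ages, buys, s))
print("split point:", best)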
Gain Ratio for Attribute Selection (C4.5)

 The information gain measure is biased towards attributes with a large number of values
 Ex. Product_ID: splitting on it may produce a large number of partitions, each containing only one tuple
 The information gain of such an attribute is maximal
 C4.5 (a successor of ID3) uses gain ratio to overcome the problem (normalization of information gain using a split information value):
    SplitInfo_A(D) = − Σ_{j=1}^{v} (|D_j| / |D|) × log2(|D_j| / |D|)

 Ex. income splits the 14 tuples into partitions of 4, 6 and 4 tuples:

    SplitInfo_income(D) = − (4/14) log2(4/14) − (6/14) log2(6/14) − (4/14) log2(4/14) = 1.557

 GainRatio(A) = Gain(A) / SplitInfo_A(D)
 Ex. GainRatio(income) = 0.029 / 1.557 = 0.019
 The attribute with the maximum gain ratio is selected as the splitting attribute
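A short check of this arithmetic (the partition sizes 4, 6 and 4 are read off the income column of the training table):

import math

split_info_income = -sum(n / 14 * math.log2(n / 14) for n in (4, 6, 4))
print(round(split_info_income, 3))             # 1.557
print(round(0.029 / split_info_income, 3))     # GainRatio(income) ≈ 0.019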

Gini index (CART, IBM IntelligentMiner)

 The Gini index measures the impurity of D, a data partition or set of training tuples

 If a data set D contains examples from n classes, the Gini index gini(D) is defined as

    gini(D) = 1 − Σ_{j=1}^{n} p_j²

    where p_j is the relative frequency of class j in D, i.e., |C_j,D| / |D|

 Ex. income has three possible values, {low, medium, high}, so it has 2³ = 8 subsets; excluding {low, medium, high} and {} leaves 2³ − 2 = 6 possible ways to form two partitions of the data D based on a binary split on A


 If a data set D is split on A into two subsets D1 and D2, the Gini index gini_A(D) is defined as

    gini_A(D) = (|D1| / |D|) gini(D1) + (|D2| / |D|) gini(D2)

    i.e., gini_A(D) is a weighted sum of the impurity of each resulting partition.

 For a discrete-valued attribute, the subset that gives the minimum Gini index for that attribute is selected as its splitting subset.

 Reduction in impurity:

    Δgini(A) = gini(D) − gini_A(D)


 The attribute that provides the smallest gini_A(D) (or, equivalently, the largest reduction in impurity) is chosen to split the node.
Gini index (CART, IBM IntelligentMiner)

 Ex. D has 9 tuples with buys_computer = "yes" and 5 with "no":

    gini(D) = 1 − (9/14)² − (5/14)² = 0.459

 Suppose the attribute income partitions D into D1 = {low, medium} with 10 tuples and D2 = {high} with 4 tuples:

    gini_income ∈ {low,medium}(D) = (10/14) gini(D1) + (4/14) gini(D2)

 Find the values for gini_{medium,high} and gini_{low,high}.


The results are gini_{medium,high} = 0.450 and gini_{low,high} = 0.458, while gini_{low,medium} = 0.443, so the best binary split for income is on {low, medium} (and {high}).

Now apply the same method to the other attributes: age, student and credit_rating.

For age, the best splitting subset is {youth, senior} (versus {middle_aged}), with a Gini index of 0.357; student and credit_rating are binary attributes with Gini index values of 0.367 and 0.429, respectively.

Finally, the attribute age with splitting subset {youth, senior} gives the minimum Gini index overall, with a reduction in impurity of 0.459 − 0.357 = 0.102.
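A minimal sketch that reproduces these Gini numbers from class counts read off the training table (9 yes / 5 no overall; the {low, medium} partition holds 7 yes / 3 no and {high} holds 2 yes / 2 no):

def gini(counts):
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

print(round(gini([9, 5]), 3))                                    # gini(D) = 0.459
g_low_medium = 10 / 14 * gini([7, 3]) + 4 / 14 * gini([2, 2])    # binary split on income
print(round(g_low_medium, 3))                                    # 0.443, the minimum for income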

Comparing Attribute Selection Measures

 The three measures, in general, return good results but


 Information gain:
biased towards multivalued attributes
 Gain ratio:
tends to prefer unbalanced splits in which one
partition is much smaller than the others
 Gini index:
biased towards multivalued attributes
has difficulty when number of classes is large
tends to favor tests that result in equal-sized
partitions and purity in both partitions
Overfitting and Tree Pruning(Repetition and Replication)

 Overfitting: An induced tree may overfit the training data


 Too many branches, some may reflect irregularities due to noise
or outliers
 Poor accuracy for unseen samples
 Two approaches of Tree pruning:
 Prepruning: Halt tree construction early—do not split a node if this
would result in the goodness measure falling below a threshold
 Difficult to choose an appropriate threshold.
 High thresholds could result in oversimplified trees
 Low thresholds could result in very little simplification.



 Postpruning: Remove branches from a “fully grown” tree.
 A subtree at a given node is pruned by removing its branches and replacing it with a leaf.
 The leaf is labeled with the most frequent class among the subtree being replaced.
 Use a set of data different from the training data to decide which is the “best pruned tree”, as illustrated below.
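The slides do not tie post-pruning to a tool; as one hedged illustration, scikit-learn's cost-complexity pruning can play the role of pruning a fully grown tree and picking the best pruned tree on separate data:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Candidate pruning strengths (alphas) derived from the fully grown tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Keep the alpha whose pruned tree does best on the held-out validation split.
best_alpha = max(
    path.ccp_alphas,
    key=lambda a: DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train).score(X_val, y_val),
)
print("chosen ccp_alpha:", best_alpha)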

Bayesian Classification: Why?

 A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities, such as the probability that a given tuple belongs to a particular class.

 Foundation: based on Bayes' theorem.

 Performance: a simple Bayesian classifier, known as the naïve Bayesian classifier, has performance comparable with decision tree and selected neural network classifiers.


Incremental: each training example can incrementally increase/decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data.

Standard: even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured.
Bayesian Theorem: Basics
 Let X be a data sample (“evidence”): class label is unknown
 Let H be a hypothesis that X belongs to class C
 Classification is to determine P(H|X), the probability that
the hypothesis holds given the observed data sample X
 Suppose X is described by the attributes age and income, e.g., a 35-year-old customer with an income of 40,000, and suppose the hypothesis H is that the customer will buy a computer. Then
 P(H|X) reflects the probability that customer X will buy a computer given that we know the customer's age and income.



 P(H) (prior probability): the initial probability
 E.g., the probability that X (or any given customer) will buy a computer, regardless of age, income, ...
 P(X): the probability that the sample data is observed
 E.g., the probability that a person from our set of customers is 35 years old with an income of 40,000.
 P(X|H) (the likelihood, i.e., the posterior probability of X conditioned on H): the probability of observing the sample X given that the hypothesis holds
 E.g., the probability that customer X is 35 years old with an income of 40,000, given that we know the customer will buy a computer.
Bayesian Theorem

 Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:

    P(H|X) = P(X|H) P(H) / P(X)

 Informally, this can be written as
    posterior = likelihood × prior / evidence
 Predict that X belongs to Ci if the probability P(Ci|X) is the highest among all the P(Ck|X) for the k classes
 Practical difficulty: requires initial knowledge of many probabilities, and significant computational cost


Towards Naïve Bayesian Classifier

 Let D be a training set of tuples and their associated class labels, where each tuple is represented by an n-attribute vector X = (x1, x2, ..., xn)
 Suppose there are m classes C1, C2, ..., Cm
 Classification is to derive the maximum posteriori, i.e., the maximal P(Ci|X)
 This can be derived from Bayes' theorem:

    P(Ci|X) = P(X|Ci) P(Ci) / P(X)

 Since P(X) is constant for all classes, only

    P(X|Ci) P(Ci)

    needs to be maximized
Naïve Bayesian Classifier: Training Dataset

Classes:
  C1: buys_computer = 'yes'
  C2: buys_computer = 'no'

Data sample to classify:
  X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age      income   student  credit_rating  buys_computer
<=30     high     no       fair           no
<=30     high     no       excellent      no
31...40  high     no       fair           yes
>40      medium   no       fair           yes
>40      low      yes      fair           yes
>40      low      yes      excellent      no
31...40  low      yes      excellent      yes
<=30     medium   no       fair           no
<=30     low      yes      fair           yes
>40      medium   yes      fair           yes
<=30     medium   yes      excellent      yes
31...40  medium   no       excellent      yes
31...40  high     yes      fair           yes
>40      medium   no       excellent      no
Naïve Bayesian Classifier: An Example
 X = (age <= 30 , income = medium, student = yes,
credit_rating = fair)

 P(Ci): P(buys_computer = “yes”) = 9/14 = 0.643


P(buys_computer = “no”) = 5/14= 0.357

 Compute P(X|Ci) for each class


P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<= 30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4



P(student = “yes” | buys_computer = “yes) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4

 X = (age <= 30 , income = medium, student = yes, credit_rating


= fair)

P(X|Ci) : P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044


P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019

P(X|Ci)*P(Ci) : P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028


P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007

Therefore, X belongs to class (“buys_computer = yes”)
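A minimal sketch that simply re-runs the hand computation above (the probabilities are the counts from the training table):

p_yes = (9 / 14) * (2 / 9) * (4 / 9) * (6 / 9) * (6 / 9)   # P(Ci) * P(X|Ci) for "yes"
p_no = (5 / 14) * (3 / 5) * (2 / 5) * (1 / 5) * (2 / 5)    # P(Ci) * P(X|Ci) for "no"
print(round(p_yes, 3), round(p_no, 3))                      # 0.028 0.007
print("buys_computer =", "yes" if p_yes > p_no else "no")   # yes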



 Exercise: find the class label for X = (age > 40, income = medium, student = yes, credit_rating = excellent)

 Exercise: find the class label for X = (age = 31...40, income = medium, student = yes, credit_rating = excellent)

Outlook = "Overcast", Temp = "Mild", Humidity = "High", Windy = "False"

Play Golf = Yes (9/14): 4/9 × 4/9 × 3/9 × 6/9 = 0.04
Play Golf = No (5/14): 0/5 × 2/5 × 4/5 × 2/5 = 0

This zero-frequency case is a limitation of Naïve Bayes.
Solution: use a smoothing technique such as Laplace estimation, sketched below.
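A minimal sketch of Laplace (add-one) smoothing for this zero-frequency case, assuming Outlook has three categories (Sunny, Overcast, Rainy):

def laplace(count, class_total, n_categories, alpha=1):
    # Add alpha to each count and alpha * n_categories to the denominator.
    return (count + alpha) / (class_total + alpha * n_categories)

# P(Outlook = "Overcast" | Play Golf = No): the raw estimate 0/5 becomes (0 + 1) / (5 + 3).
print(laplace(0, 5, 3))   # 0.125 instead of 0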



Naïve Bayesian Classifier: Comments
 Advantages
 Easy to implement
 Good results obtained in most of the cases
 Disadvantages
 Assumption: class conditional independence, therefore
loss of accuracy
 Practically, dependencies exist among variables
 E.g., hospitals: patients: Profile: age, family history, etc.
Symptoms: fever, cough etc., Disease: lung cancer, diabetes, etc.
 Dependencies among these cannot be modeled by Naïve
Bayesian Classifier
 How to deal with these dependencies?
 Bayesian Belief Networks
Other limitations of Naive Bayes:
1. The assumption of independent predictors.
   In real life, it is almost impossible to get a set of predictors that are completely independent.
2. Zero probability.

Types of Naïve Bayes classifier
Gaussian: assumes that the features follow a normal distribution.
  Used when the data are continuous, i.e., features take continuous values.
Multinomial: used for discrete counts; works with occurrence counts.
  Used in text classification, where the data are typically represented as word-count vectors.
  Example: count of each word, movie rating.
Bernoulli: useful when the feature vectors are binary (zeros and ones).
  There may be multiple features, but each one is assumed to be binary-valued.
  E.g., a symptom present or not, a word present or not.
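A hedged scikit-learn sketch of the three variants named above, on tiny made-up arrays:

import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 1, 0, 1])

# Gaussian NB: continuous features.
X_cont = np.array([[5.1, 3.5], [6.2, 2.9], [4.8, 3.0], [6.7, 3.1]])
print(GaussianNB().fit(X_cont, y).predict([[5.0, 3.2]]))

# Multinomial NB: discrete counts (e.g., word counts).
X_counts = np.array([[3, 0, 1], [0, 2, 4], [2, 1, 0], [0, 3, 3]])
print(MultinomialNB().fit(X_counts, y).predict([[1, 0, 0]]))

# Bernoulli NB: binary present/absent features.
X_bin = (X_counts > 0).astype(int)
print(BernoulliNB().fit(X_bin, y).predict([[1, 0, 1]]))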

Using IF-THEN Rules for Classification

 Represent the knowledge in the form of IF-THEN rules


R: IF age = youth AND student = yes THEN buys_computer = yes
 Rule antecedent/precondition vs. rule consequent
 Assessment of a rule: coverage and accuracy
 ncovers = # of tuples covered by R
 ncorrect = # of tuples correctly classified by R
coverage(R) = ncovers /|D| /* D: training data set */
accuracy(R) = ncorrect / ncovers
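A minimal sketch computing coverage and accuracy for the rule R above against the 14-tuple buys_computer table (taking age "<=30" as youth):

rows = [  # (age, student, buys_computer), copied from the training table
    ("<=30", "no", "no"), ("<=30", "no", "no"), ("31...40", "no", "yes"),
    (">40", "no", "yes"), (">40", "yes", "yes"), (">40", "yes", "no"),
    ("31...40", "yes", "yes"), ("<=30", "no", "no"), ("<=30", "yes", "yes"),
    (">40", "yes", "yes"), ("<=30", "yes", "yes"), ("31...40", "no", "yes"),
    ("31...40", "yes", "yes"), (">40", "no", "no"),
]
covered = [r for r in rows if r[0] == "<=30" and r[1] == "yes"]
correct = [r for r in covered if r[2] == "yes"]
print("coverage(R) =", len(covered), "/", len(rows))      # 2/14 ≈ 0.143
print("accuracy(R) =", len(correct), "/", len(covered))   # 2/2 = 1.0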



If more than one rule is triggered, conflict resolution is needed:
Size ordering: assign the highest priority to the triggering rule that has the "toughest" requirement (i.e., with the most attribute tests)
Class-based ordering: decreasing order of prevalence or misclassification cost per class
Rule-based ordering (decision list): rules are organized into one long priority list, according to some measure of rule quality or by experts
Rule Extraction from a Decision Tree

 Rules are easier to understand than large trees
 One rule is created for each path from the root to a leaf
 Each attribute-value pair along a path forms a conjunction; the leaf holds the class prediction
 Rules are mutually exclusive and exhaustive
 Example: rule extraction from our buys_computer decision tree (root age?, with branches <=30, 31..40, >40):

    IF age = young AND student = no            THEN buys_computer = no
    IF age = young AND student = yes           THEN buys_computer = yes
    IF age = mid-age                           THEN buys_computer = yes
    IF age = old AND credit_rating = excellent THEN buys_computer = yes
    IF age = old AND credit_rating = fair      THEN buys_computer = no


Rule Extraction from the Training Data

 Sequential covering algorithm: Extracts rules directly from training data


 Typical sequential covering algorithms: FOIL, AQ, CN2, RIPPER
 Rules are learned sequentially, each for a given class Ci will cover many
tuples of Ci but none (or few) of the tuples of other classes
 Steps:
 Rules are learned one at a time
 Each time a rule is learned, the tuples covered by the rules are
removed
 The process repeats on the remaining tuples unless termination
condition, e.g., when no more training examples or when the quality
of a rule returned is below a user-specified threshold
 Compare with decision-tree induction, which learns a set of rules simultaneously
Classification: A Mathematical Mapping

 Classification: predicts categorical class labels
 E.g., personal homepage classification
   x_i = (x1, x2, x3, ...), y_i = +1 or −1
   x1: # of occurrences of the word "homepage"
   x2: # of occurrences of the word "welcome"
 Mathematically:
   x ∈ X = ℝ^n, y ∈ Y = {+1, −1}
   We want a function f: X → Y
Linear Classification

 Binary classification problem
 [Figure: scatter of 'x' and 'o' points separated by a red line]
 The data above the red line belong to class 'x'
 The data below the red line belong to class 'o'
 Examples: SVM, Perceptron, probabilistic classifiers

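A hedged sketch of one such linear classifier (scikit-learn's Perceptron) on synthetic two-class data resembling the picture above:

import numpy as np
from sklearn.linear_model import Perceptron

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[2.0, 2.0], size=(20, 2)),     # class 'x' -> +1
               rng.normal(loc=[-2.0, -2.0], size=(20, 2))])  # class 'o' -> -1
y = np.array([1] * 20 + [-1] * 20)

clf = Perceptron().fit(X, y)   # learns a separating hyperplane (a line in 2-D)
print(clf.predict([[1.5, 2.0], [-2.5, -1.0]]))   # points on either side of the line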


Discriminative Classifiers
 Advantages
 prediction accuracy is generally high
 As compared to Bayesian methods – in general
 robust, works when training examples contain errors
 fast evaluation of the learned target function
 Bayesian networks are normally slow
 Criticism
 long training time
 difficult to understand the learned function (weights)
 Bayesian networks can be used easily for pattern discovery
 not easy to incorporate domain knowledge
 Easy in the form of priors on the data or distributions

