Chapter 3
Outline
• Knowledge Representation
• Tables
• Linear Models
• Trees
• Rules
• Instance-based Representation
• Clusters
Tables
● Simplest way of representing output: use the same format as input!
● Decision table for the weather problem:
Outlook    Humidity   Play
Sunny      High       No
Sunny      Normal     Yes
Overcast   High       Yes
Overcast   Normal     Yes
Rainy      High       No
Rainy      Normal     No
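
A decision table can be used directly as a lookup from attribute values to a class. A minimal Python sketch (the dictionary simply transcribes the table above; the function name is illustrative):

    # Decision table for the weather problem: look up the class
    # directly from the (Outlook, Humidity) attribute values.
    decision_table = {
        ("Sunny", "High"): "No",
        ("Sunny", "Normal"): "Yes",
        ("Overcast", "High"): "Yes",
        ("Overcast", "Normal"): "Yes",
        ("Rainy", "High"): "No",
        ("Rainy", "Normal"): "No",
    }

    def classify(outlook, humidity):
        """Return the stored class, or None for an unseen combination."""
        return decision_table.get((outlook, humidity))

    print(classify("Sunny", "Normal"))  # -> Yes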
Linear Models
• Another simple representation
• Regression model
• Inputs (attribute values) and output are all numeric
• Output is the sum of weighted attribute values
• The trick is to find good values for the weights
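
A linear model is just a weighted sum of the numeric attribute values plus a constant. A minimal sketch; the weights and attribute values below are made up for illustration, not taken from the book:

    def linear_predict(weights, bias, attributes):
        """Output = bias + sum of weight_i * attribute_i."""
        return bias + sum(w * x for w, x in zip(weights, attributes))

    # Hypothetical weights for a three-attribute regression problem.
    weights = [0.5, -1.2, 2.0]
    bias = 3.0
    print(linear_predict(weights, bias, [1.0, 0.5, 2.0]))  # 3.0 + 0.5 - 0.6 + 4.0 = 6.9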
Trees
● “Divide-and-conquer” approach produces a tree
● Nodes involve testing a particular attribute
  ● Usually, the attribute value is compared to a constant
  ● Other possibilities:
    ● Comparing the values of two attributes
    ● Using a function of one or more attributes
● Leaves assign a classification, a set of classifications, or a probability distribution to instances
● An unknown instance is routed down the tree (see the sketch below)
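
One way to picture this: an internal node stores an attribute test, a leaf stores a class, and a new instance is sent from the root to a leaf. A minimal sketch; the node structure and attribute names are illustrative, not the book's:

    class Leaf:
        def __init__(self, label):
            self.label = label

    class Node:
        """Internal node: tests one attribute against a constant."""
        def __init__(self, attribute, constant, below, above_or_equal):
            self.attribute = attribute
            self.constant = constant
            self.below = below                    # branch if value < constant
            self.above_or_equal = above_or_equal  # branch if value >= constant

    def route(tree, instance):
        """Send an unknown instance down the tree until a leaf is reached."""
        while isinstance(tree, Node):
            value = instance[tree.attribute]
            tree = tree.below if value < tree.constant else tree.above_or_equal
        return tree.label

    # Tiny hypothetical tree: test humidity first, then temperature on one branch.
    tree = Node("humidity", 80,
                Leaf("play"),
                Node("temperature", 30, Leaf("play"), Leaf("don't play")))
    print(route(tree, {"humidity": 85, "temperature": 25}))  # -> play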
Nominal and Numeric Attributes
● Nominal:
  ● Number of children usually equal to number of values
    ⇒ attribute won’t get tested more than once
  ● Other possibility: division into two subsets
● Numeric:
  ● Test whether value is greater or less than a constant
    ⇒ attribute may get tested several times
  ● Other possibility: three-way split (or multi-way split)
    ● Integer: less than, equal to, greater than
    ● Real: below, within, above
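
The branching factor therefore depends on the attribute type: a nominal test usually gets one child per value, while a numeric test compares against a constant (two- or three-way). A small illustrative sketch, with made-up values:

    def nominal_branch(value, children):
        """One child per attribute value, e.g. children = {'sunny': ..., 'rainy': ...}."""
        return children[value]

    def three_way_branch(value, constant):
        """Three-way split for an integer attribute: less than / equal to / greater than."""
        if value < constant:
            return "less"
        elif value == constant:
            return "equal"
        return "greater"

    print(three_way_branch(7, 5))                                           # -> greater
    print(nominal_branch("sunny", {"sunny": "leaf A", "rainy": "leaf B"}))  # -> leaf A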
Missing Values
● Does absence of value have some significance?
  ● Yes ⇒ “missing” is a separate value
● Example: a linear regression model for the CPU performance data:
  PRP = -56.1 + 0.049 MYCT + 0.015 MMIN + 0.006 MMAX
        + 0.630 CACH - 0.270 CHMIN + 1.46 CHMAX
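
Plugging attribute values into the weighted sum gives the predicted PRP. A minimal sketch using the coefficients above; the sample machine's attribute values are made up for illustration:

    # Coefficients of the linear model above for predicting PRP (CPU performance data).
    coefficients = {
        "MYCT": 0.049, "MMIN": 0.015, "MMAX": 0.006,
        "CACH": 0.630, "CHMIN": -0.270, "CHMAX": 1.46,
    }
    intercept = -56.1

    def predict_prp(instance):
        """PRP = intercept + sum of coefficient * attribute value."""
        return intercept + sum(coefficients[a] * instance[a] for a in coefficients)

    # Hypothetical machine: values chosen only for illustration.
    machine = {"MYCT": 125, "MMIN": 256, "MMAX": 6000, "CACH": 16, "CHMIN": 4, "CHMAX": 8}
    print(round(predict_prp(machine), 1))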
Classification Rules
● Popular alternative to decision trees
● Antecedent (pre-condition): a series of tests (just like the tests at the nodes of a decision tree)
● Tests are usually logically ANDed together (but may also be general logical expressions)
● Consequent (conclusion): classes, set of classes, or probability distribution assigned by rule
● Individual rules are often logically ORed together
  ● Conflicts arise if different conclusions apply (see the sketch below)
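
A rule can be represented as an ANDed list of attribute tests (the antecedent) plus a class (the consequent); a rule set then behaves like the OR of its rules. A minimal sketch with made-up weather rules:

    # Each rule: (antecedent, consequent). The antecedent is a list of
    # (attribute, required value) tests, all of which must hold (logical AND).
    rules = [
        ([("outlook", "sunny"), ("humidity", "high")], "no"),
        ([("outlook", "overcast")], "yes"),
    ]

    def rule_fires(antecedent, instance):
        return all(instance.get(attr) == value for attr, value in antecedent)

    def conclusions(instance):
        """All conclusions of rules that fire (rules are ORed; conflicts are possible)."""
        return [cls for antecedent, cls in rules if rule_fires(antecedent, instance)]

    print(conclusions({"outlook": "overcast", "humidity": "high"}))  # -> ['yes']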
From Trees to Rules
● One rule for each leaf (see the sketch below):
  ● Antecedent contains a condition for every node on the path from the root to the leaf
  ● Consequent is the class assigned by the leaf
● Produces rules that are unambiguous
  ● Doesn’t matter in which order they are executed
● But: resulting rules are unnecessarily complex
  ● Pruning to remove redundant tests/rules
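
Reading one rule off each leaf amounts to collecting the test made at every node along the root-to-leaf path. A minimal sketch on a small hypothetical tree (tuple-based structure chosen only for brevity):

    # Minimal tree: ("attribute", constant, left_subtree, right_subtree) or a class label.
    tree = ("humidity", 80,
            "play",
            ("temperature", 30, "play", "don't play"))

    def tree_to_rules(node, path=()):
        """One rule per leaf: the antecedent is every test on the path to that leaf."""
        if not isinstance(node, tuple):            # leaf: emit a finished rule
            return [(list(path), node)]
        attribute, constant, below, above = node
        return (tree_to_rules(below, path + ((attribute, "<", constant),)) +
                tree_to_rules(above, path + ((attribute, ">=", constant),)))

    for antecedent, consequent in tree_to_rules(tree):
        print(antecedent, "->", consequent)
    # e.g. [('humidity', '<', 80)] -> play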
From Rules to Trees
● A tree cannot easily express the disjunction between rules
● Example: rules that test different attributes
    If a and b then x
    If c and d then x
● Symmetry needs to be broken
● Corresponding tree contains identical subtrees (the “replicated subtree problem”)
● Example: the exclusive-or problem
    If x = 1 and y = 0 then class = a
    If x = 0 and y = 1 then class = a
    If x = 0 and y = 0 then class = b
    If x = 1 and y = 1 then class = b
● Example: rules whose equivalent tree contains a replicated subtree
    If x = 1 and y = 1 then class = a
    If z = 1 and w = 1 then class = a
    Otherwise class = b
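
As a rule set this is compact, but an equivalent tree must pick one attribute for the root and then repeat the whole z/w subtree in more than one branch. A minimal sketch of the rule set itself (the tree replication is only described in the comment):

    def classify(x, y, z, w):
        """Rule set: two rules for class 'a', default 'b'.
        An equivalent decision tree has to test one attribute (say x) at the root,
        and the subtree testing z and w ends up duplicated under several branches
        -- the 'replicated subtree problem'."""
        if x == 1 and y == 1:
            return "a"
        if z == 1 and w == 1:
            return "a"
        return "b"

    print(classify(1, 1, 0, 0))  # -> a
    print(classify(0, 0, 1, 1))  # -> a
    print(classify(1, 0, 0, 1))  # -> b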
“Nuggets” of Knowledge
● Are rules independent pieces of knowledge? (It seems easy to add a rule to an existing rule base.)
● Problem: ignores how rules are executed
● Two ways of executing a rule set:
  ● Ordered set of rules (“decision list”)
    ● Order is important for interpretation
  ● Unordered set of rules
    ● Rules may overlap and lead to different conclusions for the same instance (see the sketch below)
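
The two execution strategies can give different answers for the same rule set. A minimal sketch contrasting first-match (decision list) with collect-all (unordered) interpretation; the two rules are hypothetical and deliberately overlapping:

    # Two overlapping rules: as a decision list only the first match counts,
    # as an unordered set both fire and their conclusions may conflict.
    rules = [
        (lambda inst: inst["petal_length"] < 2.5, "setosa"),
        (lambda inst: inst["petal_width"] < 1.0, "versicolor"),
    ]

    def decision_list(instance):
        for condition, cls in rules:          # ordered: first matching rule wins
            if condition(instance):
                return cls
        return None

    def unordered(instance):
        return [cls for condition, cls in rules if condition(instance)]  # all matches

    instance = {"petal_length": 1.4, "petal_width": 0.2}
    print(decision_list(instance))  # -> setosa
    print(unordered(instance))      # -> ['setosa', 'versicolor'] (conflict)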
Interpreting Rules
● What if two or more rules conflict?
  ● Give no conclusion at all?
  ● Go with the rule that is most popular on the training data?
  ● …
● What if no rule applies to a test instance?
  ● Give no conclusion at all?
  ● Go with the class that is most frequent in the training data?
  ● …
Association Rules
• Association rules…
  • … can predict any attribute and combinations of attributes
  • … are not intended to be used together as a set
• Problem: immense number of possible associations
  • Output needs to be restricted to show only the most predictive associations ⇒ only those with high support and high confidence (see the sketch below)
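
Support and confidence are the usual filters: support counts how often the whole rule holds in the data, confidence is the fraction of instances matching the antecedent that also match the consequent. A minimal sketch over a made-up transaction list:

    transactions = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]

    def support(itemset):
        """Fraction of transactions containing every item in the itemset."""
        return sum(itemset <= t for t in transactions) / len(transactions)

    def confidence(antecedent, consequent):
        """Of the transactions matching the antecedent, the fraction also matching the consequent."""
        return support(antecedent | consequent) / support(antecedent)

    # Rule: {bread} -> {milk}
    print(support({"bread", "milk"}))        # 2/4 = 0.5
    print(confidence({"bread"}, {"milk"}))   # 2/3 ~ 0.67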
• New instance:

  Sepal length   Sepal width   Petal length   Petal width   Type
  5.1            3.5           2.6            0.2           Iris-setosa
• Modified rule:

  default: Iris-setosa
  except if petal-length ≥ 2.45 and petal-length < 5.355
            and petal-width < 1.75
         then Iris-versicolor
         except if petal-length ≥ 4.95 and petal-width < 1.55
                then Iris-virginica
                else if sepal-length < 4.95 and sepal-width ≥ 2.45
                then Iris-virginica
  else if petal-length ≥ 3.35
       then Iris-virginica
       except if petal-length < 4.85 and sepal-length < 5.95
              then Iris-versicolor
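
Written out as code, the default…except structure becomes nested if/else (the next slide makes the same point). A sketch of one plausible reading of the nesting above; the test instance in the final line is made up:

    def classify_iris(petal_length, petal_width, sepal_length, sepal_width):
        """Nested if/else transcription of the default...except rule above."""
        label = "Iris-setosa"                                   # default
        if 2.45 <= petal_length < 5.355 and petal_width < 1.75:
            label = "Iris-versicolor"
            if petal_length >= 4.95 and petal_width < 1.55:     # exception
                label = "Iris-virginica"
            elif sepal_length < 4.95 and sepal_width >= 2.45:
                label = "Iris-virginica"
        elif petal_length >= 3.35:
            label = "Iris-virginica"
            if petal_length < 4.85 and sepal_length < 5.95:     # exception
                label = "Iris-versicolor"
        return label

    print(classify_iris(1.4, 0.2, 5.0, 3.3))  # -> Iris-setosa (falls through to the default)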
More on Exceptions
• Default...except if...then...
is logically equivalent to
if...then...else
(where the else specifies what the default did)
• But: exceptions offer a psychological advantage
• Assumption: defaults and tests early on apply
more widely than exceptions further down
• Exceptions reflect special cases
Instance-based Representation
• Simplest form of learning: rote learning
• Training instances are searched for instance that most
closely resembles new instance
• The instances themselves represent the knowledge
• Also called instance-based learning
• Similarity function defines what’s “learned”
• Instance-based learning is lazy learning
• Methods: nearest-neighbor, k-nearest-neighbor, …
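
Nearest-neighbor classification keeps the training instances themselves and assigns a new instance the class of its closest stored instance under some distance function. A minimal 1-NN sketch using Euclidean distance; the training instances are made up:

    import math

    # Stored training instances: (attribute vector, class). Values are illustrative.
    training = [
        ((5.1, 3.5), "setosa"),
        ((6.0, 2.9), "versicolor"),
        ((6.5, 3.0), "virginica"),
    ]

    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def nearest_neighbor(instance):
        """1-NN: no model is built up front; all work happens at prediction time (lazy)."""
        return min(training, key=lambda pair: euclidean(pair[0], instance))[1]

    print(nearest_neighbor((5.0, 3.4)))  # -> setosa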
Learning Prototypes
Rectangular Generalizations
Representing Clusters I
Overlapping clusters
Representing Clusters II