
Tutorial 1

This document contains questions and solutions related to a data mining tutorial. It discusses attributes and instances in a weather data set, defines classification rules and decision trees with an example, shows how to convert a set of classification rules into a decision tree, and describes common issues with real-world data sets and important steps in the data mining process.

Dr. Dimitrios Letsios
Department of Informatics
King’s College London

7CCSMDM1 Data Mining


Tutorial 1

Question 1
• What are the weekly tasks for 7CCSMDM1?

• What are the assessments for 7CCSMDM1?

Question 2
• Explain what the attributes and instances of a tabular data set are and give examples from the
weather data set.

• Explain what a classification rule and a decision tree are and show an example of each for
the weather data set.

Question 3
Consider a classification problem with a binary label y and four binary attributes x1, x2, x3, x4 whose
values must belong to the set {0, 1}. Convert the following ordered set of classification rules into a
decision tree:

• (x1 = 1) ∧ (x2 = 1) → (y = 1).

• (x3 = 1) ∧ (x4 = 1) → (y = 1).

• otherwise → (y = 0).

Question 4
• Describe 4 issues that we might have to deal with when applying data mining methodologies to
real-world data sets.

• Describe 5 important steps of data mining as a process.

Solution 2
An attribute is an individual measurable characteristic of the data mining problem under consideration.
For example, the temperature (hot, mild, cool) and the outlook (sunny, overcast, rainy) affect whether
we are going to play a game or not. An instance is a vector of values, one for each attribute, which
is used for building data mining models. For example, historical weather conditions and the
corresponding decisions are instances used for building a data mining model that predicts whether we
are going to play the game in the future, based on the weather conditions.
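To make the two terms concrete, here is a minimal sketch in Python. It assumes the standard nominal weather data set, so the humidity and windy attributes and the three sample rows are taken from that data set rather than from the text above.

```python
# Attribute names of the weather data; the last one, "play", is the class.
ATTRIBUTES = ("outlook", "temperature", "humidity", "windy", "play")

# Three instances: each holds exactly one value per attribute.
instances = [
    ("sunny", "hot", "high", False, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
]

# Pair attribute names with values so each instance reads as a record.
records = [dict(zip(ATTRIBUTES, values)) for values in instances]

for record in records:
    print(record)
```

Each column of the table is an attribute and each row is an instance.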
A classification rule is a type of data mining model for predicting the class of an instance in
the form of an “if-then” statement. The if part is called the antecedent or precondition and the then
part the consequent or conclusion. An example of a classification rule which is true for all instances
satisfying the precondition is:

• (outlook=overcast) ∧ (temperature=mild) → (play=yes)

There exist other classification rules which are not necessarily true for every possible instance. In
general, we are interested in classification rules with high accuracy.
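The accuracy of a rule can be checked by counting the instances it covers and how many of them it predicts correctly. The sketch below assumes the standard 14-instance nominal weather data set; the helper `evaluate_rule` and the second rule (outlook=sunny → play=no) are illustrative and not part of the tutorial.

```python
# Assumed data: the standard 14-instance nominal weather data set.
# Columns: outlook, temperature, humidity, windy, play.
RAW = """\
sunny,hot,high,false,no
sunny,hot,high,true,no
overcast,hot,high,false,yes
rainy,mild,high,false,yes
rainy,cool,normal,false,yes
rainy,cool,normal,true,no
overcast,cool,normal,true,yes
sunny,mild,high,false,no
sunny,cool,normal,false,yes
rainy,mild,normal,false,yes
sunny,mild,normal,true,yes
overcast,mild,high,true,yes
overcast,hot,normal,false,yes
rainy,mild,high,true,no
"""
NAMES = ["outlook", "temperature", "humidity", "windy", "play"]
DATA = [dict(zip(NAMES, line.split(","))) for line in RAW.strip().splitlines()]


def evaluate_rule(antecedent, consequent, data):
    """Return (coverage, accuracy) of the rule 'antecedent -> consequent'.

    antecedent: dict mapping attribute -> required value
    consequent: (attribute, value) pair predicted by the rule
    """
    covered = [row for row in data
               if all(row[a] == v for a, v in antecedent.items())]
    if not covered:
        return 0, None  # the rule fires on no instance
    attr, value = consequent
    correct = sum(1 for row in covered if row[attr] == value)
    return len(covered), correct / len(covered)


# The rule from the text: correct on every instance it covers.
rule_a = evaluate_rule({"outlook": "overcast", "temperature": "mild"},
                       ("play", "yes"), DATA)
# An illustrative rule that is right only for some covered instances.
rule_b = evaluate_rule({"outlook": "sunny"}, ("play", "no"), DATA)

print("overcast and mild -> yes:", rule_a)
print("sunny -> no:", rule_b)
```

On this data the first rule covers a single instance and is correct on it, while the second covers five instances and is correct on three of them.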
A decision tree is a tree-like model of decisions for making predictions. Nodes represent decisions,
i.e. tests on the value of one or more attributes. E.g. a node may test the attribute outlook against
its 3 possible values: sunny, overcast and rainy. In the context of classification, each leaf is associated
with a class. An example of a decision tree that predicts all instances of the weather data set correctly
is the following:
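One tree with this property can be sketched in code. This assumes the standard 14-instance nominal weather data set; the humidity and windy attributes come from that data set, not from the text above, and the function name `predict` is illustrative.

```python
# Assumed data: the standard 14-instance nominal weather data set.
# Columns: outlook, temperature, humidity, windy, play.
RAW = """\
sunny,hot,high,false,no
sunny,hot,high,true,no
overcast,hot,high,false,yes
rainy,mild,high,false,yes
rainy,cool,normal,false,yes
rainy,cool,normal,true,no
overcast,cool,normal,true,yes
sunny,mild,high,false,no
sunny,cool,normal,false,yes
rainy,mild,normal,false,yes
sunny,mild,normal,true,yes
overcast,mild,high,true,yes
overcast,hot,normal,false,yes
rainy,mild,high,true,no
"""
NAMES = ["outlook", "temperature", "humidity", "windy", "play"]
DATA = [dict(zip(NAMES, line.split(","))) for line in RAW.strip().splitlines()]


def predict(row):
    """Decision tree: the root tests outlook, inner nodes test humidity or windy."""
    if row["outlook"] == "sunny":
        # Sunny days: the decision depends on humidity.
        return "no" if row["humidity"] == "high" else "yes"
    if row["outlook"] == "overcast":
        # Overcast days: leaf node, always play.
        return "yes"
    # Rainy days: the decision depends on wind.
    return "no" if row["windy"] == "true" else "yes"


errors = [row for row in DATA if predict(row) != row["play"]]
print("misclassified instances:", len(errors))
```

Each nested `if` corresponds to an internal node and each `return` to a leaf, so every root-to-leaf path reads as one classification rule.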

Solution 3
Recall that each path of a decision tree corresponds to a classification rule. We may convert a set
of classification rules into a decision tree by adding them one by one to a tree-like structure until we
obtain a tree that makes a prediction for every possible input. In order to be able to create such a
tree (i.e. represent every rule as a path of the tree), we need to rewrite the set of rules into a new set
which is equivalent to the original one and in which every rule tests the attributes in the same order
(x1, then x2, then x3, then x4):
• (x1 = 1) ∧ (x2 = 1) → (y = 1).
• (x1 = 1) ∧ (x2 = 0) ∧ (x3 = 1) ∧ (x4 = 1) → (y = 1).
• (x1 = 1) ∧ (x2 = 0) ∧ (x3 = 1) ∧ (x4 = 0) → (y = 0).
• (x1 = 1) ∧ (x2 = 0) ∧ (x3 = 0) → (y = 0).
• (x1 = 0) ∧ (x2 = 1) ∧ (x3 = 1) ∧ (x4 = 1) → (y = 1).
• (x1 = 0) ∧ (x2 = 1) ∧ (x3 = 1) ∧ (x4 = 0) → (y = 0).
• (x1 = 0) ∧ (x2 = 1) ∧ (x3 = 0) → (y = 0).
• (x1 = 0) ∧ (x2 = 0) ∧ (x3 = 1) ∧ (x4 = 1) → (y = 1).
• (x1 = 0) ∧ (x2 = 0) ∧ (x3 = 1) ∧ (x4 = 0) → (y = 0).
• (x1 = 0) ∧ (x2 = 0) ∧ (x3 = 0) → (y = 0).
Sometimes, classification rules are a more compact representation than trees. However, a set of
classification rules may fail to classify an instance and may include individual rules making different
predictions for the same instance. These two situations cannot happen in decision trees.
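Since there are only 16 possible inputs, the equivalence between the ordered rules of Question 3 and a tree can be checked exhaustively. The following is a minimal sketch; the function names are illustrative, and the tree tests x1, then x2, then x3, then x4.

```python
from itertools import product


def ordered_rules(x1, x2, x3, x4):
    """The ordered rule set from Question 3: the first matching rule wins."""
    if x1 == 1 and x2 == 1:
        return 1
    if x3 == 1 and x4 == 1:
        return 1
    return 0  # the "otherwise" rule


def x3_x4_subtree(x3, x4):
    """Subtree reached whenever the first rule does not fire."""
    if x3 == 1:
        return 1 if x4 == 1 else 0
    return 0


def decision_tree(x1, x2, x3, x4):
    """Tree with root x1; every input follows exactly one root-to-leaf path."""
    if x1 == 1:
        if x2 == 1:
            return 1
        return x3_x4_subtree(x3, x4)
    # x1 == 0: both values of x2 lead to the same subtree,
    # so the test on x2 can be dropped on this branch.
    return x3_x4_subtree(x3, x4)


# Compare the two models on all 2^4 = 16 possible inputs.
mismatches = [x for x in product((0, 1), repeat=4)
              if ordered_rules(*x) != decision_tree(*x)]
print(mismatches)  # prints []
```

The comment in `decision_tree` also shows why the tree can be smaller than the expanded rule list: when x1 = 0 the value of x2 does not influence the prediction.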

Solution 4
Real-world data sets can be:
• massive, with many attributes and instances, and may require very efficient data mining algorithms.
• noisy with erroneous attribute values affecting the quality of the derived models.
• incomplete, with missing values that may require adapting the data mining approach.
• biased or insufficient, i.e. not representing well the space of all instances (population) which
results in data mining models that generalize poorly.
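As one illustration of handling missing values, the sketch below replaces a missing nominal value with the most frequent observed value of that attribute. This is only one common option, not a method prescribed by the tutorial; the `?` marker, the helper `impute_mode` and the toy rows are assumptions made for the example.

```python
from collections import Counter


def impute_mode(rows, attribute, missing="?"):
    """Return copies of rows with missing values of `attribute`
    replaced by its most frequent observed value."""
    observed = [r[attribute] for r in rows if r[attribute] != missing]
    if not observed:
        # Nothing to learn the replacement from: return unchanged copies.
        return [dict(r) for r in rows]
    mode = Counter(observed).most_common(1)[0][0]
    return [dict(r, **{attribute: mode}) if r[attribute] == missing else dict(r)
            for r in rows]


# Toy rows (hypothetical), with one missing outlook value.
rows = [
    {"outlook": "sunny"},
    {"outlook": "?"},
    {"outlook": "sunny"},
    {"outlook": "rainy"},
]
cleaned = impute_mode(rows, "outlook")
print(cleaned)
```

The input rows are left untouched, so the original data stays available for comparison with other strategies such as discarding incomplete instances.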

Important steps of the data mining process are:


• Specify the objective, e.g. whether we are faced with a supervised or an unsupervised learning problem.
• Explore the data, e.g. visualize the data and obtain insights on whether the specified objective
is achievable.
• Clean the data, e.g. deal with issues of real-world data sets mentioned before.
• Build a model using some state-of-the-art data mining approach.
• Evaluate the model, e.g. assess whether it generalizes well to unseen data.
