Tutorial 1

This document contains questions and solutions related to a data mining tutorial. It discusses attributes and instances in a weather data set, defines classification rules and decision trees with an example, shows how to convert a set of classification rules into a decision tree, and describes common issues with real-world data sets and important steps in the data mining process.

Uploaded by

Jonas Jixiao Wang


Dr. Dimitrios Letsios
Department of Informatics
King’s College London

7CCSMDM1 Data Mining

Tutorial 1

Question 1
• What are the weekly tasks for 7CCSMDM1?

• What are the assessments for 7CCSMDM1?

Question 2
• Explain what the attributes and instances of a tabular data set are, and give examples from the
weather data set.

• Explain what a classification rule and a decision tree are, and show an example of each for
the weather data set.

Question 3
Consider a classification problem with a binary label y and four binary attributes x1 , x2 , x3 , x4 , each
taking values in the set {0, 1}. Convert the following ordered set of classification rules into a
decision tree:

• (x1 = 1) ∧ (x2 = 1) → (y = 1).

• (x3 = 1) ∧ (x4 = 1) → (y = 1).

• otherwise → (y = 0).

Question 4
• Describe 4 issues that we might have to deal with when applying data mining methodologies to
real-world data sets.

• Describe 5 important steps of data mining as a process.

Solution 2
An attribute is an individual measurable characteristic of the data mining problem under consideration.
For example, the temperature (hot, mild, cold) and outlook (sunny, overcast, rainy) affect whether
we are going to play a game or not. An instance is a vector of values, one for each attribute, which
is used for building data mining models. For example, historical weather conditions and decisions are
used for building a data mining model to make predictions on whether we are going to play the game
in the future, based on the weather conditions.
A classification rule is a type of data mining model for predicting the class of an instance in
the form of an “if-then” statement. The if part is called the antecedent or precondition and the then
part the consequent or conclusion. An example of a classification rule which is true for all instances
satisfying the precondition is:

• (outlook=overcast) ∧ (temperature=mild)→ (play=yes)

There exist other classification rules which are not necessarily true for every possible instance. In
general, we are interested in classification rules with high accuracy.
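The notions of antecedent, consequent and accuracy can be made concrete with a minimal sketch. It assumes the 14-instance nominal weather data set as it is commonly published (where the lowest temperature value is written "cool"); the names `INSTANCES` and `evaluate_rule` are hypothetical helpers introduced only for illustration.

```python
# The 14-instance nominal weather data set as commonly published (assumed here).
# Columns: outlook, temperature, humidity, windy, play.
RAW = """
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true no
"""
ATTRIBUTES = ["outlook", "temperature", "humidity", "windy", "play"]
# Each instance is one value per attribute, stored as a dict.
INSTANCES = [dict(zip(ATTRIBUTES, line.split()))
             for line in RAW.strip().splitlines()]


def evaluate_rule(antecedent, consequent, instances):
    """Return (covered, correct) counts for an if-then rule.

    antecedent: dict attribute -> required value (a conjunction of tests).
    consequent: (attribute, value) pair predicted when the antecedent holds.
    """
    target, value = consequent
    covered = [x for x in instances
               if all(x[a] == v for a, v in antecedent.items())]
    correct = [x for x in covered if x[target] == value]
    return len(covered), len(correct)


# The rule from the text: true for every instance it covers.
rule_a = evaluate_rule({"outlook": "overcast", "temperature": "mild"},
                       ("play", "yes"), INSTANCES)
# A rule that is not true for every covered instance.
rule_b = evaluate_rule({"humidity": "high"}, ("play", "no"), INSTANCES)
print(rule_a)  # (1, 1)
print(rule_b)  # (7, 4)
```

On this data the first rule covers one instance and is correct on it, while the second covers seven instances and is correct on four, so its accuracy is 4/7.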
A decision tree is a tree-like model of decisions for making predictions. Nodes represent decisions,
i.e. tests on the value of one or more attributes. E.g. a node may test for the attribute outlook the 3
possible alternatives: sunny, overcast and rainy. In the context of classification, each leaf is associated
with a class. An example of a decision tree that predicts all instances of the weather data set correctly
is the following:
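As a sketch only, such a tree can be written as nested Python data. The shape below is the tree usually given for the standard 14-instance nominal weather data set, which is an assumption about the version used in this tutorial; the attributes humidity (high, normal) and windy (true, false) come from that standard data set, and the encoding and the names `TREE`, `predict` and `paths` are hypothetical.

```python
# Internal node: (attribute, {value: subtree}); leaf: a class label string.
# Assumed tree for the commonly published weather data set.
TREE = ("outlook", {
    "sunny": ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rainy": ("windy", {"true": "no", "false": "yes"}),
})


def predict(tree, instance):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[instance[attribute]]
    return tree


def paths(tree, tests=()):
    """Yield (tests, label) for every root-to-leaf path, i.e. one rule per leaf."""
    if isinstance(tree, str):
        yield tests, tree
        return
    attribute, branches = tree
    for value, subtree in branches.items():
        yield from paths(subtree, tests + ((attribute, value),))


example = {"outlook": "sunny", "humidity": "normal", "windy": "true"}
print(predict(TREE, example))  # yes
print(len(list(paths(TREE))))  # 5
```

Each of the five root-to-leaf paths reads as one classification rule, for instance (outlook=sunny) ∧ (humidity=high) → (play=no).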

Solution 3
Recall that each root-to-leaf path of a decision tree corresponds to a classification rule. We may convert a set
of classification rules into a decision tree by adding them one by one to a tree-like structure until we
obtain a tree that makes a prediction for every possible input. In order to be able to create such a
tree (i.e. represent every rule as a path of the tree), we need to modify the set of rules into a new set
which is equivalent to the original one. Testing the attributes in the order x1 , x2 , x3 , x4 gives one rule per leaf:
• (x1 = 1) ∧ (x2 = 1) → (y = 1).
• (x1 = 1) ∧ (x2 = 0) ∧ (x3 = 1) ∧ (x4 = 1) → (y = 1).
• (x1 = 1) ∧ (x2 = 0) ∧ (x3 = 1) ∧ (x4 = 0) → (y = 0).
• (x1 = 1) ∧ (x2 = 0) ∧ (x3 = 0) → (y = 0).
• (x1 = 0) ∧ (x2 = 1) ∧ (x3 = 1) ∧ (x4 = 1) → (y = 1).
• (x1 = 0) ∧ (x2 = 1) ∧ (x3 = 1) ∧ (x4 = 0) → (y = 0).
• (x1 = 0) ∧ (x2 = 1) ∧ (x3 = 0) → (y = 0).
• (x1 = 0) ∧ (x2 = 0) ∧ (x3 = 1) ∧ (x4 = 1) → (y = 1).
• (x1 = 0) ∧ (x2 = 0) ∧ (x3 = 1) ∧ (x4 = 0) → (y = 0).
• (x1 = 0) ∧ (x2 = 0) ∧ (x3 = 0) → (y = 0).
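Equivalence between the ordered rule list of Question 3 and a tree that tests x1 , x2 , x3 , x4 in that order can be checked exhaustively, since there are only 16 possible inputs. The sketch below is one way to do this; the function names are hypothetical.

```python
from itertools import product


def ordered_rules(x1, x2, x3, x4):
    """The ordered rule list: the first rule whose antecedent holds fires."""
    if x1 == 1 and x2 == 1:
        return 1
    if x3 == 1 and x4 == 1:
        return 1
    return 0  # the default ("otherwise") rule


def x3_x4_subtree(x3, x4):
    """Subtree shared by every branch in which rule 1 does not fire."""
    if x3 == 1:
        return 1 if x4 == 1 else 0
    return 0


def tree(x1, x2, x3, x4):
    """Decision tree testing x1, then x2, then x3, then x4 (10 leaves)."""
    if x1 == 1:
        if x2 == 1:
            return 1
        return x3_x4_subtree(x3, x4)
    else:
        if x2 == 1:
            return x3_x4_subtree(x3, x4)
        return x3_x4_subtree(x3, x4)


mismatches = [x for x in product((0, 1), repeat=4)
              if ordered_rules(*x) != tree(*x)]
print(mismatches)  # []
```

When x1 = 0 both x2 branches lead to the same subtree, so the test on x2 could be dropped there; it is kept here so that the tree has one leaf per rule in the expanded list.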
Sometimes, classification rules are more compact representations than trees. However, a set of classification
rules may fail to classify an instance and may include individual rules making different predictions
for the same instance. These two situations cannot happen in decision trees.
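Both failure modes are easy to exhibit with a small unordered rule set. The two rules below are hypothetical and chosen only to trigger each situation once; they are not taken from the tutorial.

```python
from itertools import product

# Hypothetical unordered rule set over two binary attributes (x1, x2).
# Each entry is (antecedent as {attribute index: value}, predicted y).
RULES = [
    ({0: 1}, 1),  # (x1 = 1) -> (y = 1)
    ({1: 1}, 0),  # (x2 = 1) -> (y = 0)
]


def fired_predictions(x):
    """Set of labels predicted by all rules whose antecedent holds for x."""
    return {y for tests, y in RULES
            if all(x[i] == v for i, v in tests.items())}


inputs = list(product((0, 1), repeat=2))
unclassified = [x for x in inputs if not fired_predictions(x)]
conflicting = [x for x in inputs if len(fired_predictions(x)) > 1]
print(unclassified)  # [(0, 0)]
print(conflicting)   # [(1, 1)]
```

Ordering the rules and adding a default rule, as in Question 3, removes both problems; a decision tree avoids them by construction because every input follows exactly one path.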

Solution 4
Real-world data sets can be:
• massive, with many attributes and instances, and may require very efficient data mining algorithms.
• noisy, with erroneous attribute values affecting the quality of the derived models.
• incomplete, with missing values that may require adapting data mining approaches.
• biased or insufficient, i.e. not representing well the space of all instances (population), which
results in data mining models that generalize poorly.
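As an illustration of the missing-values issue, the sketch below counts missing values per attribute and fills them with the most common observed value. The instances are hypothetical, `None` is an assumed marker for a missing value, and mode imputation is only one of several possible adaptations.

```python
from collections import Counter

# Hypothetical instances; None marks a missing attribute value.
instances = [
    {"outlook": "sunny", "humidity": "high"},
    {"outlook": None, "humidity": "normal"},
    {"outlook": "sunny", "humidity": None},
    {"outlook": "rainy", "humidity": "high"},
]


def missing_counts(rows):
    """Count missing values per attribute."""
    return Counter(a for row in rows for a, v in row.items() if v is None)


def impute_mode(rows):
    """Return copies of rows with each missing value replaced by the
    most common observed value of that attribute."""
    modes = {}
    for a in rows[0]:
        observed = [row[a] for row in rows if row[a] is not None]
        modes[a] = Counter(observed).most_common(1)[0][0]
    return [{a: (modes[a] if v is None else v) for a, v in row.items()}
            for row in rows]


counts = missing_counts(instances)
cleaned = impute_mode(instances)
print(dict(counts))  # {'outlook': 1, 'humidity': 1}
print(cleaned[1])    # {'outlook': 'sunny', 'humidity': 'normal'}
```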

Important steps of the data mining process are:

• Specify the objective, e.g. whether we are faced with a supervised or unsupervised learning problem.
• Explore the data, e.g. visualize the data and obtain insights on whether the specified objective
is achievable.
• Clean the data, e.g. deal with issues of real-world data sets mentioned before.
• Build a model using some state-of-the-art data mining approach.
• Evaluate the model, e.g. assess whether it generalizes well to unseen data.
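The build and evaluate steps can be sketched with a hold-out split. The labelled data below is hypothetical, and a majority-class predictor stands in for a real data mining approach purely to keep the example short; in practice the data would be shuffled before splitting.

```python
from collections import Counter

# Hypothetical labelled data: (instance, class label) pairs.
data = [
    ({"outlook": "overcast"}, "yes"),
    ({"outlook": "sunny"}, "no"),
    ({"outlook": "rainy"}, "yes"),
    ({"outlook": "overcast"}, "yes"),
    ({"outlook": "sunny"}, "no"),
    ({"outlook": "overcast"}, "yes"),
]


def build_majority_model(train):
    """Build step (stand-in): always predict the most common training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda instance: majority


def accuracy(model, examples):
    """Evaluate step: fraction of examples whose label is predicted correctly."""
    hits = sum(model(x) == y for x, y in examples)
    return hits / len(examples)


# Hold out the last two examples as unseen data.
train, test = data[:4], data[4:]
model = build_majority_model(train)
test_accuracy = accuracy(model, test)
print(test_accuracy)  # 0.5
```

Measuring accuracy on the held-out examples rather than on the training examples is what gives an indication of how well the model generalizes.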
