COC131 Tutorial w6
Briefly inspect the output produced by each Associator and try
to interpret its meaning.
(b) In association rule mining the number of possible association rules can be very large even with tiny datasets, so it is in our best interest to reduce the set of rules found to only the most interesting ones. This is usually achieved by setting minimum thresholds on support and confidence values. Still in the Associate view, select the Apriori algorithm again, click on the textbox next to the Choose button and try, in turn, different values for the following parameters: lowerBoundMinSupport (minimum threshold for support) and minMetric (minimum threshold for confidence). As you change these parameter values, what do you notice about the rules that are found by the associator? Note that the parameter numRules limits the maximum number of rules that the associator looks for; you can try changing this value too. (A code sketch after part (d) shows how these same parameters can be set programmatically.)
(c) This time run the Apriori algorithm with the outputItemSets parameter set to true. You will notice that the algorithm now also outputs a list of Generated sets of large itemsets: at different levels. If you have the module's Data Mining book by Witten & Frank with you, you can compare and contrast the Apriori associator's output with the association rules on pages 114-116 (I will have a couple of copies circulating in the lab during the session, just ask me for one). I also strongly recommend reading through chapter 4.5 in your own time while playing with the weather data in Weka; this chapter gives a nice & easy introduction to association rules. Notice in particular how the item sets and association rules produced by Weka compare with tables 4.10-4.11 in the book.
(d) Compare the association rules output by Apriori and Tertius (you can do this by navigating through the already built associator models in the Result list on the left side of the screen). Make sure that the Apriori algorithm shows at least 20 rules. How do the association rules generated by the two different methods compare to each other?
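For reference, here is a minimal sketch of running Apriori with the parameters from parts (b) and (c) through the Weka Java API instead of the Explorer GUI. It assumes weka.jar is on your classpath; the file path and the parameter values are only illustrative, so adjust them to your own install.

    import weka.associations.Apriori;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class AprioriSketch {
        public static void main(String[] args) throws Exception {
            // Path is an assumption: point this at the weather data in your Weka install
            Instances data = DataSource.read("data/weather.nominal.arff");

            Apriori apriori = new Apriori();
            apriori.setNumRules(20);              // cap on the number of rules reported
            apriori.setLowerBoundMinSupport(0.2); // minimum threshold for support
            apriori.setMinMetric(0.9);            // minimum threshold for confidence
            apriori.setOutputItemSets(true);      // also print the large item sets, as in (c)

            apriori.buildAssociations(data);
            System.out.println(apriori);          // same rules you see in the Associate view
        }
    }

If your Weka version still ships Tertius, it can be swapped in analogously and built with buildAssociations in the same way.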
Something to always remember about association rules is that they should not be used for prediction directly, that is, without further analysis or domain knowledge, as they do not necessarily indicate causality. They are, however, a very helpful starting point for further exploration and for building a better understanding of our data.
3. As you should certainly know by this point, a correlation matrix and a scatter plot matrix can be very useful for identifying associations between parameters. To remind yourself of this, it might be helpful to look back at tutorials 2, 3 or 5.
The simple linear regression model is

    y = β₀ + β₁x + ε,

where β₀ is the intercept, β₁ the slope, and ε an error term.
So the most accurate model is the one that yields the best-fit line to the data in question: we are looking for the minimal sum of squared deviations between the actual and fitted values. This is called the method of least squares, written out below.
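In LaTeX notation, the least-squares criterion and its standard closed-form solution for the simple linear model above are:

    % Choose the coefficients that minimise the sum of squared residuals
    \hat{\beta}_0, \hat{\beta}_1
      = \arg\min_{\beta_0, \beta_1} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2

    % Closed-form solution in terms of the sample means \bar{x}, \bar{y}
    \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
                         {\sum_{i=1}^{n} (x_i - \bar{x})^2},
    \qquad
    \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}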
Now that we have briefly reminded ourselves of the very basics of regression, let's move directly on to an example in Weka.
(a) In Weka go back to the Preprocess tab. Open the iris dataset (iris.arff; this should be in the ./data/ directory of the Weka install).
(b) In the Attributes section (bottom left of the screen) select the class feature and click Remove. We need to do this, as simple linear regression cannot deal with non-numeric values.
(c) Next select the Classify tab to get into the Classification perspective of Weka, and choose LinearRegression (under functions).
(d) Clicking on the textbox next to the Choose button brings up the parameter editor window. Click on the More button to get information about the parameters. Make sure that attributeSelectionMethod is set to No attribute selection and eliminateColinearAttributes is set to False.
(e) Finally, make sure that you select the parameter petalwidth in the dropdown box just under the Test options. Hit Start to run the regression. Inspect the results; in particular, pay attention to the Linear Regression Model formula returned, and the coefficients and intercept of the straight line equation. As this is a numeric prediction/regression problem, accuracy is measured with Root Mean Squared Error, Mean Absolute Error and the like. As most of you will have noticed, you can repeat this process for regressing the other features in turn, and compare how well the different features can be predicted. (A sketch of the same experiment run through the Weka Java API follows below.)
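To tie the steps together, here is a minimal sketch of the same experiment via the Weka Java API. It assumes weka.jar is on your classpath; the file path is an assumption, and the 10-fold cross-validation at the end is just one reasonable way to obtain the RMSE and MAE figures mentioned above.

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.LinearRegression;
    import weka.core.Instances;
    import weka.core.SelectedTag;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.Remove;

    public class IrisRegressionSketch {
        public static void main(String[] args) throws Exception {
            // Path is an assumption: point this at iris.arff in your Weka install
            Instances data = DataSource.read("data/iris.arff");

            // Step (b): drop the nominal class attribute (the last attribute),
            // as simple linear regression cannot deal with non-numeric values
            Remove remove = new Remove();
            remove.setAttributeIndices("last");
            remove.setInputFormat(data);
            Instances numeric = Filter.useFilter(data, remove);

            // Step (e): regress petalwidth (the last remaining attribute)
            numeric.setClassIndex(numeric.numAttributes() - 1);

            // Step (d): no attribute selection, no collinearity elimination
            LinearRegression lr = new LinearRegression();
            lr.setAttributeSelectionMethod(new SelectedTag(
                LinearRegression.SELECTION_NONE, LinearRegression.TAGS_SELECTION));
            lr.setEliminateColinearAttributes(false);

            lr.buildClassifier(numeric);
            System.out.println(lr); // coefficients and intercept of the fitted line

            // Report the error measures discussed above via 10-fold cross-validation
            Evaluation eval = new Evaluation(numeric);
            eval.crossValidateModel(lr, numeric, 10, new Random(1));
            System.out.println("RMSE: " + eval.rootMeanSquaredError());
            System.out.println("MAE:  " + eval.meanAbsoluteError());
        }
    }

Changing the class index picked out with setClassIndex lets you regress each of the other features in turn, mirroring the comparison suggested in step (e).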