University of Waikato: Data Mining With Weka

This document discusses a dataset called "glass" that contains information about different types of glass. The dataset contains 214 instances with 10 attributes, including the glass type, which indicates whether it is from containers, windows, etc. The other attributes provide information like the refractive index and percentages of elements like sodium and silicon that compose the glass. The document examines the attribute values to ensure they are reasonable and consistent with glass composition.

Uploaded by

Sadhi Kumar

Data Mining with Weka

University of Waikato

WEEK 1
The glass data

I’m going to look at a different dataset: the “glass” dataset,
which is rather more extensive. It’s a real-world dataset, though not a terribly big
one. Let’s open it. Here we’ve got 214 instances and 10 attributes. Here are the 10
attributes; it’s not clear what they are. Let’s look at the “class”, which by default is the last
attribute shown. There are seven values for the class, and the labels of these values
give you some indication of what this dataset is about. We have “headlamps”,
“tableware” (starting from the bottom), “containers”. Then we have “building” and
“vehicle” windows, both “float” and “non-float”. You may not know this, but there
are different ways of making glass, and the float process is one of them.
These are seven different kinds of glass.

What are the attribute values? I don’t know what you remember about physics, and
I guess it doesn’t matter if you don’t remember. RI stands for the refractive index.

It’s always a good idea to check for reasonableness when you’re looking at
datasets. It’s really important to get down and dirty with your data. Here we’re
looking at the values of the refractive index—a minimum of 1.511, a maximum of
1.534. It’s good to think about whether these are reasonable values for refractive
index. If you go to the web and have a look around, you’ll find that these are good
values for the refractive index.
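
The same reasonableness check can be scripted. The sketch below is a minimal Python illustration, not something Weka provides: the function name `summarize_range`, the sample readings, and the plausibility bounds of 1.4 and 1.7 are all assumptions chosen for the example, so check the bounds against a reference before relying on them.

```python
def summarize_range(values, low, high):
    """Return (minimum, maximum, values outside [low, high])."""
    outliers = [v for v in values if not (low <= v <= high)]
    return min(values), max(values), outliers


# Hypothetical refractive index readings, not taken from glass.arff.
ri_values = [1.511, 1.518, 1.521, 1.534]

# Assumed plausibility bounds for glass; these are illustrative only.
ri_min, ri_max, ri_outliers = summarize_range(ri_values, low=1.4, high=1.7)
print(ri_min, ri_max, ri_outliers)
```

An empty list of outliers means every reading fell inside the assumed bounds; anything listed there deserves a closer look.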

Na. If you did chemistry, you’ll recognize Na as sodium. Here, it looks like these are
percentages: the percentages of sodium (Na), magnesium (Mg), and so on. We
would expect silicon (Si) to make up the majority of glass, and it varies between 69.81%
and 75.41%. These are percentages of different elements in the glass.

We can confirm our guesses here by looking at the data file itself. Let me just find
the “glass” data. It’s in Weka datasets, and it’s glass.arff. This is the ARFF file
format. It starts with a bunch of comments about the glass database. These lines
beginning with percentage signs (%) are comments. You can read about this. We
don’t have time to read it now.

FutureLearn 1
You can see a description of the attributes, and it does say that the attributes are refractive
index, sodium, magnesium, and so on. And the type of glass, just like I said, is about
windows, containers, tableware, and so on. We get down to the end of the
comments, and here we have stuff for Weka. This is the ARFF format. The relation
has a name; you’ll see it printed in the interface when you look. The attributes are
defined; they are real-valued, numeric attributes. The “type” attribute is
nominal, and the different values of type are enumerated here in quotes.

That defines the relation and the attributes. Then we have an ‘@data’ line, and
following that, in the ARFF format, are simply the instances, one after the other,
with the attribute values all on one line, ending with the class by default. This is the
class value for the first instance. I think there are 214 instances here. There’s the
last one. That’s the ARFF format. It is a very simple, textual file format.
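
To make the layout concrete, here is a minimal sketch of a reader for the parts of ARFF described above: comment lines starting with %, the relation name, the attribute declarations, and the comma-separated instances after the data marker. It is an illustration only, not Weka's own loader. The function name `parse_arff`, the toy relation, the nominal labels, and the two data rows are all made up for the example, and the sketch assumes attribute names without spaces and values without embedded commas.

```python
def parse_arff(text):
    """Parse a small ARFF document into (relation, attributes, rows).

    Handles only % comments, @relation, @attribute, and the
    comma-separated rows that follow @data. Values stay as strings.
    """
    relation = None
    attributes = []  # list of (name, type string or list of nominal values)
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            rows.append([f.strip().strip("'\"") for f in line.split(",")])
            continue
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1].strip()
        elif keyword == "@attribute":
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                # Nominal attribute: collect the enumerated values.
                spec = [v.strip().strip("'\"") for v in spec.strip("{}").split(",")]
            attributes.append((name.strip("'\""), spec))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


# Toy file in the spirit of glass.arff; names, labels, and rows are hypothetical.
SAMPLE = """\
% Comment lines start with a percentage sign.
@relation toy_glass
@attribute RI numeric
@attribute Na numeric
@attribute Si numeric
@attribute Type {'building float', containers, headlamps}
@data
1.52,13.1,72.6,'building float'
1.51,14.0,73.0,headlamps
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation)
print([name for name, _ in attributes])
print(rows[0][-1])  # class value of the first instance
```

The class value is simply the last field of each row, matching the default of treating the last attribute as the class.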

Now we’ve confirmed our guesses about these numbers being percentages of
different elements. We can think about this some more. It’s important, then, that
these numbers are reasonable. If they went negative, for example, that would
indicate some kind of corrupted value; you can’t have a negative percentage. We’re
expecting silicon to be the majority component; we’re expecting the refractive index
to be in this kind of range. It’s always a good idea when you get a dataset to just
click around in the Weka interface and make sure things look real. There are rather small
amounts of aluminum in glass; I guess that’s not surprising, though I don’t know very much
about glass myself. We’re just checking for reasonableness here, which is a very good thing
to do. That’s it then.
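
The checks mentioned here (no negative percentages, silicon as the largest component) can be sketched in a few lines of Python. This is a hedged illustration rather than a Weka feature: the function name `check_composition` and both sample instances are hypothetical, and the rule that nothing may exceed 100% is an added assumption that follows from the values being percentages.

```python
def check_composition(row):
    """Return a list of problems found in one instance's percentages.

    `row` maps element symbols to percentages, e.g. {"Si": 72.6, "Na": 13.1}.
    """
    problems = []
    for element, pct in row.items():
        if pct < 0:
            problems.append(f"{element} is negative ({pct})")
        elif pct > 100:
            problems.append(f"{element} exceeds 100% ({pct})")
    # Silicon is expected to be the largest single component.
    largest = max(row, key=row.get)
    if largest != "Si":
        problems.append(f"largest component is {largest}, not Si")
    return problems


# Hypothetical instances, not taken from glass.arff.
good = {"Na": 13.1, "Mg": 3.5, "Al": 1.2, "Si": 72.6}
bad = {"Na": 13.1, "Mg": -3.5, "Al": 1.2, "Si": 72.6}

print(check_composition(good))
print(check_composition(bad))
```

An empty list means the instance passed every check; each string in a non-empty list names one value worth inspecting by hand.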

In this lesson, we’ve looked at the classification problem. We’ve looked at the
nominal weather data and the numeric weather data. We’ve talked about nominal
versus numeric attributes, and we’ve talked about the ARFF file format. We’ve
looked at the glass.arff dataset, and I’ve talked about sanity checking of attributes,
and the importance of getting down and dirty with your data.

We’ll see you soon. Bye!
