The University of the South Pacific

School of Computing, Information and Mathematical Sciences

IS328: Data Mining

Assignment I Semester 2, 2020
Total Marks: 20%

Due Date: as shown on Moodle

This assignment covers both theoretical and practical aspects of this course. The marking rubric is heavily based on Data & Information
Management, which is aligned with the course outline and the BSE program map. Rubrics have been taken from ACS-SCIMS rubrics V1.0. This
assessment covers the following course learning outcomes:
CLO 2: Perform pre-processing tasks to refine data sets
CLO 3: Apply various data mining methods for interpreting results

Overview
The goals of Assignment I are:
• As a class, to become familiar with the WEKA tool and understand some of the data preprocessing methods, attribute selection
methods, classification and clustering algorithms.
• As a team of 2 members, to discuss your findings on the chosen questions and consolidate your learning as a report (answers
to each question: algorithm, working screenshots and results, comparison and analysis).
• To make a consolidated report about your findings.
• This assignment is an important part of the course and counts for 20% of your final grade. Grades will be based on the completeness
of your findings, analysis and the quality of the report.

IS328: Assignment I Page 1 of 10 Dr.Vani Vasudevan

Grading
• This assignment is worth 20 marks.
• Late delivery without prior notification and permission from the instructor will result in a loss of 10% of the marks per day.
• Plagiarism or cheating in any form is strictly prohibited. If found, the complete Assignment 1 will be nullified.
Plagiarism
For all assignment and project work, it is essential that you avoid plagiarism. Not only do you expose yourself to possibly serious
disciplinary consequences, but you’ll also cheat yourself of a proper understanding of the concepts emphasized in the assignment.
• It’s not plagiarism to discuss the assignment with your friends and consider solutions to the problems together. However, it is
plagiarism if you copy all or part of each other’s solutions.

Question 1: Data Pre-processing [5 Marks]

Use bank-data.arff to perform a series of pre-processing operations using filters in WEKA.

1. Selecting or Filtering Attributes

In the bank-data.arff data set, each record is uniquely identified by a customer id (the "id" attribute). Remove this attribute before the data
mining step by using the attribute filters in WEKA. In the Filter panel, click on the "Choose" button. This will show a popup window with a list
of available filters. Scroll down the list and select the weka.filters.unsupervised.attribute.Remove filter.

Next, click on the text field immediately to the right of the "Choose" button. In the resulting dialog box, enter the index of the attribute to be
filtered out (this can be a range, or a list separated by commas). In this case, enter 1, which is the index of the "id" attribute (see the left
panel). Make sure that the invertSelection option is set to false (otherwise everything except attribute 1 will be filtered). Then click "OK".
Now, in the filter box you will see Remove -R 1.

Click the "Apply" button to apply this filter to the data. This will remove the "id" attribute and create a new working relation (whose name
now includes the details of the filter that was applied). Display the result.
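The effect of the Remove filter and its invertSelection option can be sketched outside WEKA in a few lines of Python. This is only an illustration of the logic, not WEKA's implementation: the function name `remove_attributes` and the two sample rows are hypothetical, and indices are 1-based to match the value typed into the dialog.

```python
def remove_attributes(header, rows, indices, invert_selection=False):
    """Drop the 1-based attribute indices, or keep only them when inverted."""
    selected = {i - 1 for i in indices}  # convert to 0-based positions
    keep = [
        pos for pos in range(len(header))
        if (pos in selected) == invert_selection
    ]
    new_header = [header[pos] for pos in keep]
    new_rows = [[row[pos] for pos in keep] for row in rows]
    return new_header, new_rows


# Hypothetical sample: three attributes, with "id" at index 1.
header = ["id", "age", "income"]
rows = [["ID12101", 48, 17546.0], ["ID12102", 40, 30085.1]]

# Equivalent in spirit to "Remove -R 1" with invertSelection set to false.
new_header, new_rows = remove_attributes(header, rows, [1])
```

With `invert_selection=True` the same call would keep only "id", which is why the option must stay false here.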

It is now possible to apply additional filters to the new working relation. Save the intermediate results as separate data files and treat each
step as a separate WEKA session. To save the new working relation as an ARFF file, click on the "Save" button in the top panel. Save the new
relation in the file bank-data-R1.arff.

2. Discretization

Some techniques, such as association rule mining, can only be performed on categorical data. This requires performing discretization on
numeric or continuous attributes. There are 3 such attributes in this data set: "age", "income", and "children". In the case of the "children"
attribute, the only possible values are 0, 1, 2, and 3. In this case, we have opted for keeping all of these values in the data. This
means we can simply discretize by removing the keyword "numeric" as the type for the "children" attribute in the ARFF file and replacing
it with the set of discrete values. Do this directly in a text editor and save the resulting relation in a separate file, bank-data2.arff.
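The manual header edit can also be scripted. The sketch below assumes the attribute is declared on a single line of the form `@attribute children numeric`, which is the usual ARFF layout. The helper name `make_nominal` and the tiny sample text are hypothetical.

```python
import re


def make_nominal(arff_text, name, values):
    """Rewrite '@attribute <name> numeric' as a nominal declaration."""
    pattern = re.compile(
        r"^(@attribute\s+" + re.escape(name) + r"\s+)numeric[ \t]*$",
        re.IGNORECASE | re.MULTILINE,
    )
    nominal = "{" + ",".join(values) + "}"
    # Keep the '@attribute <name> ' prefix and swap only the type.
    return pattern.sub(lambda m: m.group(1) + nominal, arff_text)


# Hypothetical miniature ARFF file used only for illustration.
sample = (
    "@relation bank\n"
    "@attribute age numeric\n"
    "@attribute children numeric\n"
    "@data\n"
    "48,1\n"
)
converted = make_nominal(sample, "children", ["0", "1", "2", "3"])
```

Only the "children" declaration changes; the data rows stay as they are because the values 0 to 3 are already valid labels of the new nominal attribute.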

Rely on WEKA to perform discretization on the "age" and "income" attributes. Divide each of these into 3 bins (intervals). The
WEKA discretization filter can divide the ranges blindly, or use various statistical techniques to automatically determine the best way of
partitioning the data. In this case, perform simple binning.

First, load the filtered data set into WEKA by opening the file "bank-data2.arff". Select the "children" attribute in this new data set and note
that it is now a categorical attribute with four possible discrete values. Now, once again activate the Filter dialog box, but this time, select
weka.filters.unsupervised.attribute.Discretize.

Next, to change the defaults for this filter, click on the box immediately to the right of the "Choose" button. This will open the Discretize
Filter dialog box. Enter the index for the attributes to be discretized. In this case, enter 1, corresponding to the attribute "age". Also, enter 3
as the number of bins (note that it is possible to discretize more than one attribute at the same time by using a list of attribute indices). Since
this is simple binning, all of the other available options are set to "false".

Click "Apply" in the Filter panel. This will result in a new working relation with the selected attribute partitioned into 3 bins. To examine
the results, save the new working relation in the file bank-data3.arff.

Now, examine the new data set using a text editor (in this case, TextPad/WordPad). You can observe that WEKA has assigned its own labels
to each of the value ranges for the discretized attribute. For example, the lower range in the "age" attribute is labeled "(-inf-34.333333]"
(enclosed in single quotes and escape characters), while the middle range is labeled "(34.333333-50.666667]", and so on. These labels now
also appear in the data records where the original age value was in the corresponding range.
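Simple (equal-width) binning splits the attribute's range into intervals of the same width. The sketch below shows the arithmetic behind labels of this kind. It is not WEKA's code, and the minimum of 18 and maximum of 67 are assumptions chosen because they reproduce the two cut points quoted above.

```python
def equal_width_cut_points(low, high, bins):
    """Return the interior boundaries of `bins` equal-width intervals."""
    width = (high - low) / bins
    return [low + width * k for k in range(1, bins)]


def bin_labels(cut_points):
    """Build labels in the '(a-b]' style, open-ended at both extremes."""
    edges = ["-inf"] + [f"{c:.6f}" for c in cut_points] + ["inf"]
    labels = [f"({edges[k]}-{edges[k + 1]}]" for k in range(len(edges) - 1)]
    labels[-1] = labels[-1][:-1] + ")"  # the last interval ends at infinity
    return labels


def assign_bin(value, cut_points):
    """Return the 0-based bin index; a value equal to a cut goes to the lower bin."""
    for k, cut in enumerate(cut_points):
        if value <= cut:
            return k
    return len(cut_points)


# Assumed range for "age": 18 to 67, split into 3 bins.
cuts = equal_width_cut_points(18, 67, 3)
labels = bin_labels(cuts)
# labels[0] is "(-inf-34.333333]" and labels[1] is "(34.333333-50.666667]"
```

For example, `assign_bin(34, cuts)` returns 0 and `assign_bin(35, cuts)` returns 1, so a 35-year-old customer lands in the middle interval.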

Next, apply the same process to discretize the "income" attribute into 3 bins. Again, WEKA automatically performs the binning and replaces
the values in the "income" column with the appropriate automatically generated labels. Save the new file as bank-data3.arff, replacing
the older version.

Clearly, the WEKA labels, while readable, leave much to be desired as far as naming conventions go. Therefore, use the global search/replace
function in TextPad/WordPad to replace these labels with more succinct and readable ones. For example, replace all occurrences of the age
label "(-inf-34.333333]" with the label "0_34".

Note that the new label now appears in place of the old one both in the attribute section of the ARFF file as well as in the relevant data
records. Repeat this manual re-labeling process with all of the WEKA-assigned labels for the "age" and the "income" attributes. Also,
change the relation name in the ARFF file to bank-data-final and save the file as bank-data-final.arff.
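The global search/replace can be reproduced with plain string replacement. In this sketch the keys include the surrounding quotes and escape characters, on the assumption that the saved file stores each label as `'\'...\''`; check the actual file before relying on that. The helper name `relabel`, the sample text, and the replacement label "35_50" are hypothetical; only "0_34" comes from the text above.

```python
def relabel(arff_text, mapping):
    """Replace every occurrence of each old label with its new label."""
    for old, new in mapping.items():
        arff_text = arff_text.replace(old, new)
    return arff_text


# Assumed on-disk form of the WEKA labels (quotes plus escaped quotes).
mapping = {
    r"'\'(-inf-34.333333]\''": "0_34",
    r"'\'(34.333333-50.666667]\''": "35_50",  # hypothetical new label
}

# Hypothetical two-line excerpt: one header line and one data row.
sample = (
    r"@attribute age {'\'(-inf-34.333333]\'','\'(34.333333-50.666667]\''}"
    "\n"
    r"'\'(-inf-34.333333]\'',FEMALE"
    "\n"
)
cleaned = relabel(sample, mapping)
```

Because the same replacement runs over the whole file, the header declaration and the data rows stay consistent with each other.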

3. Missing Values

1. Open the file bank-data.arff.

2. Check whether there are any missing values in any attribute.

3. Edit the data to create some missing values.
4. Delete some data in the "region" (nominal) and "children" (numeric) attributes. Click on the "OK" button when finished.
5. Make a note of the label that has the maximum count in "region" and the mean of the "children" attribute.
6. Choose the ReplaceMissingValues filter (weka.filters.unsupervised.attribute.ReplaceMissingValues). Then, click on the "Apply" button.
7. Look into the data. How did those missing values get replaced?
8. Edit bank-data.arff with a text editor. Make some data missing by replacing values with '?'. (Try with nominal and numeric attributes.)
Save to bank-data-missing.arff.
9. Load bank-data-missing.arff into WEKA and observe the data and attribute information.
10. Replace the missing values using the same procedure as before.
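Step 5 hints at what to expect in step 7: the ReplaceMissingValues filter fills numeric gaps with the attribute mean and nominal gaps with the most frequent label (the mode). A minimal sketch of that rule follows, using `None` to stand for a missing '?' entry. The function name and the sample values are hypothetical.

```python
from collections import Counter


def replace_missing(column, numeric):
    """Fill None entries with the mean (numeric) or the mode (nominal)."""
    present = [v for v in column if v is not None]
    if numeric:
        fill = sum(present) / len(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in column]


# Hypothetical columns with one missing entry each.
children = [0, 1, None, 3, 0]
region = ["TOWN", None, "INNER_CITY", "TOWN", "RURAL"]

filled_children = replace_missing(children, numeric=True)
filled_region = replace_missing(region, numeric=False)
```

Here the missing "children" value becomes 1.0 (the mean of 0, 1, 3 and 0) and the missing "region" value becomes "TOWN", the label with the highest count.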

Write a report outlining the main items completed under the following operations. Provide evidence using screenshots and other
descriptions.

1. Selecting or Filtering Attributes

2. Discretization
3. Missing Values
Include your own reflection on the capabilities of WEKA to perform data preprocessing.

CBOK: Data and Information Management
Marks Allocated: 5
% Marks Attained:

Unsatisfactory (0%-49%)
I. Do not identify accurately any of the data quality problems
II. Do not perform all required tasks correctly and consistently
III. Provide inaccurate and/or incomplete reports

Satisfactory (50%-75%)
I. Identify accurately some of the data quality problems
II. Perform most of the required tasks correctly and consistently
III. Provide relatively accurate and complete reports

Good (76%-100%)
I. Identify accurately most of the data quality problems
II. Perform all the required tasks correctly and consistently
III. Provide accurate and complete reports

Sub Total & comments:

Question 2: Apply J48, PART and SimpleCart Classification Algorithms [25 Marks]

A.
1. You suspect marked differences in promotional purchasing trends between female and male Acme credit card customers. You wish
to confirm or refute your suspicion. Perform a supervised data mining session using the CreditCardPromotion database (ccpromo.arff)
in conjunction with PART. Use sex as the output attribute. Designate all other attributes as input attributes and use all 15 instances
for training. Write a summary confirming or refuting your hypothesis. Base the analysis on the rules created for each class.

2. Repeat the exercise using J48 rather than PART but base the analysis on the created decision tree.

B.
1. For this question, use WEKA's J48 decision tree algorithm to perform a data mining session with the cardiology patient data. Open
the WEKA Explorer and load the cardiology-weka.arff file. This is the mixed form of the dataset, containing both categorical and
numeric data. The data contains 303 instances representing patients who have a heart condition (sick) as well as those who do not.

Answer the following Preprocess Mode Questions:

a. How many of the instances are classified as Healthy?
b. What percent of the data is female?
c. What is the most commonly occurring domain value for the attribute slope?
d. What is the mean age within the dataset?
e. How many instances have the value 2 for # of Colored Vessels?

Answer the Classification Questions using J48:

Note: Perform a supervised mining session using 10-fold cross validation with J48 and class as the output attribute.

a. What attribute did J48 choose as the top-level decision tree node?
b. Draw a diagram showing the attributes and values for the first two levels of the J48 created decision tree.
c. What percent of the instances were correctly classified?
d. How many healthy class instances were correctly classified?
e. How many sick class instances were falsely classified as healthy individuals?
f. Determine how True Positive Rate (TP Rate) and False Positive Rate (FP Rate) are computed.
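For item f, the two rates follow directly from the confusion matrix: TP Rate = TP / (TP + FN) and FP Rate = FP / (FP + TN), computed separately for each class treated as "positive". The sketch below assumes rows are actual classes and columns are predicted classes, which is how WEKA prints its confusion matrix. The counts are hypothetical and are not results for the cardiology data.

```python
def tp_fp_rates(matrix, labels, positive):
    """Return (TP rate, FP rate) for one class of a confusion matrix.

    matrix[i][j] counts instances of actual class i predicted as class j.
    """
    p = labels.index(positive)
    tp = matrix[p][p]
    fn = sum(matrix[p]) - tp                      # positives predicted as other
    fp = sum(row[p] for row in matrix) - tp       # others predicted as positive
    tn = sum(sum(row) for row in matrix) - tp - fn - fp
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical counts: rows = actual, columns = predicted.
labels = ["healthy", "sick"]
matrix = [
    [40, 10],  # actual healthy: 40 predicted healthy, 10 predicted sick
    [5, 45],   # actual sick:     5 predicted healthy, 45 predicted sick
]
healthy_rates = tp_fp_rates(matrix, labels, "healthy")
sick_rates = tp_fp_rates(matrix, labels, "sick")
```

With these counts the healthy class has a TP rate of 0.8 and an FP rate of 0.1, while the sick class has a TP rate of 0.9 and an FP rate of 0.2.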

Answer the Classification Questions using PART:
a. List one rule for the healthy class that covers at least 50 instances.
b. List one rule for the sick class that covers at least 50 instances.
c. List one rule that is likely to show an inaccuracy rate of at least 0.05.
d. What percent of the instances were correctly classified?
e. How many healthy class instances were correctly classified?
f. How many sick class instances were falsely classified as healthy individuals?
C.

Load the CreditScreening dataset into the WEKA Explorer. Make sure that class is designated as the output attribute.
a. Use J48 together with 10-fold cross validation to mine the data. Record your results including the attributes used to create the root
node and first level of the decision tree.
b. Use Info Gain attribute evaluation to determine the most predictive categorical attribute for each of the two classes. Return to Weka
and preprocess mode. Eliminate all but the two most predictive input attributes from the attribute list. Be sure to save the output
attribute class. Use J48 with 10-fold cross validation to mine the data. Record your results. Compare the results to those seen in part
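Information gain, the measure behind Info Gain attribute evaluation, is the entropy of the class minus the weighted entropy that remains after splitting on an attribute. The following sketch computes it for one categorical attribute. The function names and the four-row sample are hypothetical, and it ignores details such as missing values that a full evaluator would handle.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def info_gain(values, labels):
    """Class entropy minus the weighted entropy after splitting on `values`."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [cls for x, cls in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical sample: one attribute separates the classes, the other does not.
classes = ["+", "+", "-", "-"]
perfect = ["a", "a", "b", "b"]
useless = ["a", "b", "a", "b"]

gain_perfect = info_gain(perfect, classes)
gain_useless = info_gain(useless, classes)
```

The first attribute yields a gain of 1 bit and the second a gain of 0, so a ranking by information gain would place the first attribute on top.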

D.

Load the CreditScreening dataset into the WEKA Explorer. Make sure that class is designated as the output attribute.

a. Use SimpleCart together with 10-fold cross validation to mine the data. How many nodes are seen in the decision tree? What is the
classification accuracy?
b. Compare your results with those seen in question C part b.

E.

Use WordPad or MS Word to open the soybean dataset located in the folder c:\program files\weka-3-6\data or in your WEKA data set folder. This
dataset represents one of the more famous data mining successes. Classification accuracy of unseen instances is likely to be above 90% with most
classifiers.
a. Scroll through the file to get a better understanding of the dataset. Open WEKA’s Explorer and load this dataset. Classify the data
by applying J48 with a 10-fold cross validation. Report your results.

b. Repeat your experiment using SimpleCart rather than J48. Detail the differences between this result and your result in a. Specify
differences between the decision trees and their resultant classification accuracies.
c. Return to preprocess mode. Apply Weka’s supervised attribute selection filter to the dataset. How many attributes are eliminated
from the dataset? Apply J48 to the modified data. Do your results differ from those seen in part a?

CBOK: Data and Information Management
Marks Allocated: 25
% Marks Attained:

Unsatisfactory (0%-49%)
IV Do not use correct algorithm(s) for the problem in hand
II Do not perform all required tasks correctly and consistently
III Provide inaccurate and/or incomplete reports

Satisfactory (50%-75%)
IV Identify and use correct algorithm for the problem in hand
II Perform most of the required tasks correctly and consistently
III Provide relatively accurate and complete reports

Good (76%-100%)
IV Identify and use correct algorithm for the problem in hand
II Perform all required tasks correctly and consistently
III Provide accurate and complete reports

Sub Total & comments:

Submission Instructions:

1. Completely fill in the Mark Allocation Sheet and submit it with your assignment. Failing to do so may result in a deduction of 50% of the marks.
2. This assignment can be submitted in groups of 2 members. Assign a group leader and submit the assignment through the group
leader's Moodle account. You have to submit just one zip/rar file of your project. The submission filename should read
A1_Sxxx_Syyy.zip or A1_Sxxx_Syyy.rar, where Sxxx and Syyy are the student ids of the group members. For example,
A1_S11003232_S01004488.zip or A1_S11003232_S01004488.rar. Incorrect submission will result in a high penalty.
3. 25 marks are allocated for applying appropriate DM techniques and deriving the correct results in Question 2 (A to E: 5 marks each)
and for the drafting and consolidation of the report.

Mark Allocation Sheet

After having discussed as a group, we recommend the following mark allocation to each group member based on contribution or lack of it
throughout the assignment.

Group Name ________________________

Project manager ________________________

Member ID Percentage contribution of allocated task

Certification

ID Member Name Signature

Assessments mapping with CBOK

Column headings: Core Body of Knowledge, Complex Computing, IS328, Assign 1, Assign 2, Test1, MST, Presentation

ICT Professional Knowledge
  Ethics: M ✓ ✓
  Professional expectations: M
  Teamwork concepts/issues: M ✓ ✓
  Communication: M ✓ ✓
  Societal Issues/Legal issues/Privacy: M
  Understanding the ICT profession

ICT Problem Solving
  Abstraction
  Design

Technology Resources
  Hardware and Software Fundamentals
  Data and Information Management: M ✓ ✓
  Networking

Technology Building
  Human Factors
  Programming
  Systems Development / Acquisition

ICT Management
  IT Governance and organisational issues
  IT Project management
  Service management
  Security management
