ICT583 Data Science Applications - Final Assignment - Individual - UPDATED!!! - Explanation

This assignment requires students to complete a health care data science project using the Mammographic Mass Data Set. Students must ask two interesting questions about the data, clean and explore the data, build predictive models using at least three machine learning methods, analyze the results, and document their findings in an R code file and report. The report should include an overview, data description, data cleaning steps, exploratory analysis, predictive modeling details and results, final analysis, and conclusion.

Uploaded by

Hammadiqbal12

This assignment specification was updated on 5th May.

Please pay attention to the BLUE highlights!

ICT 583 Data Science Applications

School of Engineering and Information Technology
Murdoch University
Semester 1, 2020

Assignment: Data Science Project

Due date: 5th June 11:55 pm

Unit Coordinator: Dr Guanjin Wang

Instructions
1. Individual assignment
2. The assignment accounts for 35% of the whole unit.
3. Submit the assignment from ICT583 LMS site using the Assignment unit tool.
4. Late work may attract a penalty of 10% (of the mark for that piece of assessment) per day late, up
to and including 10 days late. Work submitted more than 10 days late might not be marked.
5. You must keep a copy of the final version of your assignment as submitted and be prepared to
provide it on request.
6. The University treats plagiarism, collusion, theft of other students’ work and other forms of
dishonesty in assessment seriously. Any instances of dishonesty in this assessment will be forwarded
immediately to the Faculty Dean. For guidelines on honesty in assessment including avoiding
plagiarism, see: http://our.murdoch.edu.au/Educationaltechnologies/Academic-integrity/

Assignment overview:
In recent years, advances in machine learning have opened the door to intelligent health care data
prediction and decision-making. A variety of machine learning algorithms can be used to learn
iteratively from data, uncover hidden patterns, and predict future events. Successful
applications such as individualized diagnosis and prognosis, hospital readmission prediction, and
personalized medicine can lead to improvements in medical practices and health care experiences. For your
final assignment, you will work on a health care data science project. The goal of this project is to follow the
data science analysis pipeline: answer interesting questions of your own choosing, acquire the data,
perform data manipulations, design your visualizations, build your predictive models, run statistical
analysis, and present the results in a report format.
What does the data science analysis pipeline look like? (See p. 26, Topic 6.)

The dataset is given; you need to complete the rest of the steps - ask interesting questions, explore the
data, model the data, communicate and visualize the results.
Step 1: Get your dataset: You will use one health care dataset in this project, the Mammographic
Mass Data Set (retrieve it from http://archive.ics.uci.edu/ml/datasets/mammographic+mass).
* understand your dataset first
6 attributes in total (1 goal field, 1 non-predictive, 4 predictive attributes)
1. BI-RADS assessment: 1 to 5 (ordinal, non-predictive!)
2. Age: patient's age in years (integer)
3. Shape: mass shape: round=1 oval=2 lobular=3 irregular=4 (nominal)
4. Margin: mass margin: circumscribed=1 microlobulated=2 obscured=3 ill-defined=4 spiculated=5 (nominal)
5. Density: mass density: high=1 iso=2 low=3 fat-containing=4 (ordinal)
6. Severity: benign=0 or malignant=1 (binominal, goal field!)
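As a rough illustration of "understanding the dataset first", the sketch below parses rows in the attribute order listed above. It is written in Python only to show the logic (the assignment itself asks for R). It assumes the data file is comma-separated and marks missing values with `?`; check both against the file you actually download. The column names and the sample rows are hypothetical.

```python
import csv
import io

# Column labels are our own; the order follows the attribute list above.
COLUMNS = ["bi_rads", "age", "shape", "margin", "density", "severity"]

# Hypothetical rows for illustration only (not copied from the real file).
SAMPLE = """4,52,2,1,3,0
5,71,4,5,3,1
4,?,1,1,?,0
3,38,1,1,2,0
5,64,4,4,?,1
"""

def parse_rows(text):
    """Parse comma-separated rows into dicts, mapping '?' to None (assumed missing marker)."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        values = [None if cell.strip() == "?" else int(cell) for cell in record]
        rows.append(dict(zip(COLUMNS, values)))
    return rows

rows = parse_rows(SAMPLE)
print(len(rows), "rows parsed; third row:", rows[2])
```

Keeping missing values as an explicit `None` (or `NA` in R) rather than as the string `?` makes the later cleaning step easier to reason about.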

Step 2: You will raise TWO interesting questions on the dataset and prepare to answer them
in your following analysis via data manipulation, visualization, predictive modeling, etc. You can
refer to the examples in lecture 1 and exercise 1, where several good questions have been raised based
on the given datasets.
* refer to lecture 1, exercise 1
- Can we predict the probability that a patient will have a malignant mammographic mass lesion
given the BI-RADS attributes and the age? (not recommended)
- What is the age distribution of the benign and malignant targets?

Think of your own questions!

Step 3: Data manipulation and cleaning: Observe your dataset, pre-process the data if necessary, and
justify your choices.
* refer to lecture 3
Is there any missing data? How should it be dealt with? What strategies are available, and how would you
apply them to the dataset?
Is any feature selection needed? What about the non-predictive attribute?
Are there any outliers?
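The questions above can be made concrete with a small sketch. It is Python used purely for illustration (the submission itself must be in R), and all records are hypothetical. It counts missing values, contrasts two common strategies (listwise deletion and simple median/mode imputation), flags codes outside the ranges given in Step 1, and drops the non-predictive BI-RADS attribute. Which strategy is appropriate is for you to justify; this is not a recommended recipe.

```python
from statistics import median, mode

# Hypothetical records; None marks a missing value.
rows = [
    {"bi_rads": 4, "age": 52, "shape": 2, "margin": 1, "density": 3, "severity": 0},
    {"bi_rads": 5, "age": 71, "shape": 4, "margin": 5, "density": 3, "severity": 1},
    {"bi_rads": 4, "age": None, "shape": 1, "margin": 1, "density": None, "severity": 0},
    {"bi_rads": 3, "age": 38, "shape": 1, "margin": 1, "density": 2, "severity": 0},
    {"bi_rads": 5, "age": 64, "shape": 4, "margin": 4, "density": None, "severity": 1},
    {"bi_rads": 55, "age": 45, "shape": 2, "margin": 1, "density": 3, "severity": 0},
]
PREDICTORS = ["age", "shape", "margin", "density"]

# 1. How many values are missing in each predictor?
missing = {c: sum(r[c] is None for r in rows) for c in PREDICTORS}

# 2a. Strategy A: listwise deletion (keep only complete rows).
complete = [r for r in rows if all(r[c] is not None for c in PREDICTORS)]

# 2b. Strategy B: simple imputation (median for age, mode for coded columns).
def impute(records):
    fill = {}
    for c in PREDICTORS:
        observed = [r[c] for r in records if r[c] is not None]
        fill[c] = median(observed) if c == "age" else mode(observed)
    return [{k: (fill[k] if v is None and k in fill else v) for k, v in r.items()}
            for r in records]

imputed = impute(rows)

# 3. Flag codes outside the ranges documented in Step 1.
VALID = {"bi_rads": range(1, 6), "shape": range(1, 5),
         "margin": range(1, 6), "density": range(1, 5)}
out_of_range = [(i, c) for i, r in enumerate(rows) for c, ok in VALID.items()
                if r[c] is not None and r[c] not in ok]

# 4. Drop the non-predictive attribute before modeling.
features = [{k: v for k, v in r.items() if k != "bi_rads"} for r in imputed]

print("missing:", missing)
print("complete rows:", len(complete), "of", len(rows))
print("out-of-range codes (row index, column):", out_of_range)
```

If imputation is used together with a train/test split, computing the fill values from the training subset only avoids leaking information from the test subset.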

Step 4: Exploratory data analysis: Perform initial investigations on the data using summary statistics and
visualizations.
Descriptive statistics:
Central tendency, variability
Visualizations:
Remember the box-and-whisker plot (by groups)? Histogram? Cumulative distribution function?

What about the categorical variables? Frequency tables, bar charts (stacked bar charts)
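A minimal sketch of the descriptive side of this step follows: central tendency and variability of age per class, plus a frequency table for one categorical variable. It is Python for illustration only (the assignment requires R), and the records are hypothetical. Plots are left out; the same grouped numbers are what a box-and-whisker plot by group or a stacked bar chart would display.

```python
from collections import Counter, defaultdict
from statistics import mean, median, stdev

# Hypothetical (age, shape, severity) records.
records = [
    (52, 2, 0), (38, 1, 0), (45, 1, 0), (60, 2, 0),
    (71, 4, 1), (64, 4, 1), (58, 3, 1), (77, 4, 1),
]

# Group ages by the goal field (0 = benign, 1 = malignant).
ages_by_class = defaultdict(list)
for age, shape, severity in records:
    ages_by_class[severity].append(age)

def describe(values):
    """Central tendency and variability for one group."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),
        "min": min(values),
        "max": max(values),
    }

summary = {label: describe(values) for label, values in ages_by_class.items()}

# Frequency table for a categorical variable, split by class.
shape_counts = Counter((shape, severity) for _, shape, severity in records)

for label, s in sorted(summary.items()):
    print(f"severity={label}: n={s['n']} mean={s['mean']:.2f} "
          f"median={s['median']} sd={s['sd']:.2f} range={s['min']}-{s['max']}")
for (shape, severity), count in sorted(shape_counts.items()):
    print(f"shape={shape} severity={severity}: {count}")
```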

Step 5: You will select at least three machine learning methods and apply them to the dataset
for predictive modeling. The performances of the different models should be evaluated.

Select at least three machine learning methods: you already know logistic regression, which could
be a possible option for this binary (0 = benign, 1 = malignant) classification task; the other machine
learning methods (neural networks, support vector machines, k-nearest neighbours, etc.) will be
introduced in the following two lectures.
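To give a feel for one of the methods named above, here is a minimal k-nearest-neighbours classifier. It is a Python sketch for illustration only: the assignment expects R, where an existing package would normally be used. The feature values (age rescaled to roughly 0-1, plus shape and margin codes) and the choice of k = 3 are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Predict the majority label among the k training points closest to x."""
    # Euclidean distance from x to every training point, nearest first.
    distances = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training data: (scaled age, shape code, margin code).
train_X = [
    (0.38, 1, 1), (0.45, 1, 1), (0.52, 2, 1),
    (0.64, 4, 4), (0.71, 4, 5), (0.77, 4, 5),
]
train_y = [0, 0, 0, 1, 1, 1]  # 0 = benign, 1 = malignant

print(knn_predict(train_X, train_y, (0.60, 4, 4)))
print(knn_predict(train_X, train_y, (0.40, 1, 1)))
```

Two design points are worth discussing in a report: distance-based methods are sensitive to feature scale, and treating nominal codes such as shape as numeric distances is a simplification.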
Apply them to the dataset for predictive modeling:
Dataset partitioning: split the data into a training subset (to build your predictive model) and a testing
subset (to evaluate the performance of the constructed model).
- Training subset: 70% or 80%, to build the prediction model
- Testing subset: 30% or 20%, to validate your constructed prediction model
For example, if you only have 10 samples in this dataset as shown in the table below, you randomly
select 20% of the dataset to be the testing subset (orange), and the remaining samples to be the training subset.

(Table: 10 example samples with columns Number, Patient age, Shape, Margin, Density, Outcome; testing rows highlighted in orange.)
Repeat this random partitioning process 10 times so that you can calculate the mean expected
performance.
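The repeated random partitioning can be sketched as follows, using the 10-sample, 20% test example from the text. This is Python for illustration only (the assignment itself requires R); the helper name `split_indices` and the seed are hypothetical.

```python
import random

def split_indices(n, test_fraction, rng):
    """Randomly split the indices 0..n-1 into (train, test) lists."""
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]

rng = random.Random(583)  # fixed seed so the partitions are reproducible

# 10 samples, 20% held out for testing, repeated 10 times.
splits = [split_indices(10, 0.2, rng) for _ in range(10)]

for run, (train, test) in enumerate(splits, start=1):
    print(f"run {run}: test samples {sorted(test)}, {len(train)} training samples")
```

Each run would train every model on `train`, score it on `test`, and store the score, so that a mean and standard deviation over the 10 runs can be reported.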

The performances of the different models should be evaluated: What performance metric do you choose to
compare the results?
Mean accuracy ± SD
.........
If the mean expected performances of two models differ, how do you know
that the difference is statistically significant? (statistical test)
T-test
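As a hedged illustration of the comparison described above, the sketch below reports mean accuracy ± SD for two models and computes a paired t statistic. It assumes both models were scored on the same 10 random partitions, which is what makes pairing reasonable; the text only says "T-test", so the paired variant is an assumption. The accuracy values are hypothetical, and the code is Python for illustration only (in R, `t.test` covers this).

```python
import math
from statistics import mean, stdev

# Hypothetical test accuracies of two models on the same 10 random partitions.
acc_a = [0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.80, 0.81, 0.82]
acc_b = [0.78, 0.77, 0.80, 0.79, 0.79, 0.76, 0.80, 0.78, 0.80, 0.79]

def paired_t(a, b):
    """Paired t statistic: mean difference divided by its standard error."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

t_stat = paired_t(acc_a, acc_b)

# Two-sided 5% critical value of Student's t with 9 degrees of freedom (10 pairs).
T_CRIT = 2.262

print(f"Model A: {mean(acc_a):.3f} ± {stdev(acc_a):.3f}")
print(f"Model B: {mean(acc_b):.3f} ± {stdev(acc_b):.3f}")
print(f"paired t = {t_stat:.2f}")
print("significant at the 5% level" if abs(t_stat) > T_CRIT
      else "not significant at the 5% level")
```

Accuracies from overlapping random partitions are not fully independent, so the result is best read as approximate evidence and discussed as a limitation.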

Step 6: Analyze the results

Step 7: Document all your findings
What you need to submit:
R file
An essential part of your project is your R coding. Your R file should record the steps in developing
your solutions and obtaining the final data analysis results. Make sure your code matches the findings
you put in the report. For example, if there are three separate plots in the report, your code should
produce exactly the same three separate plots.
Report
You also need to submit an in-depth report. The following components and discussions might be
considered in your report:
Overview of the project: Provide an overview of the project, the goals, and the motivation for it.
Consider that this will be read by people who first see your project.
Dataset: Describe the background of the dataset and provide the summary statistics.
Interesting questions: What questions are you trying to answer? Do any questions evolve throughout the project?
Are there any new questions you consider in the course of your analysis? ...
Data manipulation and cleaning: Are there any data pre-processing steps performed, and why? Are there
any questions that can be answered via data manipulation? ...
Exploratory data analysis: What visualizations did you use to look at your data in different ways? Are
there any detected outliers? ...
Predictive modeling: What are the various machine learning methods you considered? Justify the
decisions you made. What are the main ideas of the selected methods? How do you build the models?
Are there any concerns when designing your model? ...
Final analysis: What did you learn about the data? Which method statistically outperformed the rest?
Have you found the answers to the raised questions? How can you justify your answers? ... Present your
results engagingly using text and visualizations.
Conclusion: Are there any limitations of your study? What is your future work?
