PA Assignment 1 Oct2021

This document provides instructions for Assignment 1 of the Programming with Analytics module. Students are asked to complete 9 questions analyzing Titanic passenger data using Python programming. The questions involve reading data from a CSV file, loading it into a NumPy array, defining functions to analyze frequencies and correlations, and building an interactive program to query the data. Correct formatting of outputs is required. Plagiarism is strictly prohibited and late submissions will be penalized.

Uploaded by

shumin

Official (Closed) - Non Sensitive

Programming with Analytics (PA)


October 2021, Semester 1

SCHOOL OF INFOCOMM TECHNOLOGY


Specialist Diploma in Data Analytics

ASSIGNMENT 1
Due on 19 December 2021 (Sunday), 23:59 hrs

Weightage: 30% of Module

Individual/Team/Both: Individual

Format: Programming and Presentation

Penalty for late submission:

• 10% per day from the due date.
• NO submission will be accepted more than 7 calendar days after the due
date.

There are a total of 8 pages (including this page) in this handout.

WARNING

If a student is found to have submitted work not done by him/her,
he/she will not be awarded any marks for this assignment.
Disciplinary action will also be taken.
Similar action will be taken for the student who allows other
student(s) to copy his/her work.

1. OBJECTIVE

This assignment assesses the student’s ability to apply relevant programming concepts
to develop a simple application using the numpy package of the Python programming
language.

2. QUESTIONS

The sinking of the Titanic was a tragedy in 1912. Data analytics and data science are
often applied to improve future outcomes, based on past events.
For this assignment, we will be using a modified, anonymized dataset from the Titanic
Passenger Data. This dataset is a partial sub-set of the full original data, with
missing/unusual data already cleaned or adjusted.
Column details:
• passenger_id: (anonymized, unique id numbers assigned to each passenger)
• pclass: represents passenger cabin class, (1 - first class, 2 - second class, 3 -
third class)
• survived: (0 - no, 1 - yes)
• gender: (0 - female, 1 - male)
• age: (numerical age of passenger)
• sibsp: (number of siblings and/or spouses the passenger had onboard together
with them)
• parch: (number of parents and/or children the passenger had onboard together
with them)
• fare: (assume in $US based on 1912 pricing)

Answer the following questions based on the "titanic_mod.csv" data file by writing Python
code.

1) Read the provided data file into the Jupyter Notebook using suitable file opening
functions, and perform the following tasks:

i) Print a list (python data structure) of all the column header names of the
dataset (the column names are in the first line of the data file).

ii) Print the first 5 column header names, followed by the first 5 rows of the data.

For ii), your output should clearly display the column names and the required
rows of data as per the example below.

PA Oct 2021 Assignment 1 Updated: 10 Oct 2021

Hint: Use Python's open() and readline() to open the provided file, and to read the
file's column headers and rows of the data, line by line.
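The hint above can be illustrated with a minimal sketch. It assumes a comma-separated file whose first line holds the column names; the sample rows, the temporary file, and the helper name read_header_and_rows are made up for illustration and are not the real "titanic_mod.csv" contents.

```python
import os
import tempfile

# Made-up sample rows for illustration only; the real data file will differ.
SAMPLE_TEXT = (
    "passenger_id,pclass,survived,gender,age,sibsp,parch,fare\n"
    "1,3,0,1,22,1,0,7.25\n"
    "2,1,1,0,38,1,0,71.28\n"
    "3,3,1,0,26,0,0,7.92\n"
)


def read_header_and_rows(path, row_count):
    """Return (header names, up to row_count data rows) using open() and readline()."""
    rows = []
    with open(path, "r") as data_file:
        # The first line of the file holds the column header names.
        header_names = data_file.readline().strip().split(",")
        for _ in range(row_count):
            line = data_file.readline()
            if not line:  # readline() returns "" at end of file
                break
            rows.append(line.strip().split(","))
    return header_names, rows


# Write the sample to a temporary file so the sketch runs without the real data.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(sample_path, "w") as sample_file:
    sample_file.write(SAMPLE_TEXT)

header_names, first_rows = read_header_and_rows(sample_path, 5)
print(header_names)       # all column header names as a Python list
print(header_names[:5])   # first 5 column header names
for row in first_rows:
    print(row)
```

The sample has only 3 data rows, so the loop stops early when readline() returns an empty string.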

Reminder:
Use of non-basic Python libraries such as csv or pandas will result in a 50% penalty.

(8 marks)

2) Using relevant numpy functions, load the data file into an array, excluding the first
row of column headers. Display the contents and properties of the array as per the
below example. (You may need to use numpy’s set_printoptions() function to
achieve the desired display.)

Hint: Use numpy’s genfromtxt() function to load data from a text or csv file. Read
the documentation and consider carefully what parameters to use when calling
genfromtxt().
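As a rough illustration of the hint above, the sketch below loads made-up CSV text from an in-memory buffer instead of the real data file. The parameter choices (delimiter, skip_header) are one reasonable option and not necessarily the expected answer; the sample rows are invented.

```python
import io

import numpy as np

# Made-up sample rows for illustration only; the real data file will differ.
SAMPLE_TEXT = (
    "passenger_id,pclass,survived,gender,age,sibsp,parch,fare\n"
    "1,3,0,1,22,1,0,7.25\n"
    "2,1,1,0,38,1,0,71.28\n"
    "3,3,1,0,26,0,0,7.92\n"
)

# skip_header=1 drops the row of column names so that every value parses as a number.
passenger_data = np.genfromtxt(io.StringIO(SAMPLE_TEXT), delimiter=",", skip_header=1)

# suppress=True avoids scientific notation; precision limits the decimals shown.
np.set_printoptions(suppress=True, precision=2)

print(passenger_data)
print("Shape:", passenger_data.shape)
print("Dimensions:", passenger_data.ndim)
print("Data type:", passenger_data.dtype)
```

With the real file, a path string such as "titanic_mod.csv" would be passed in place of the in-memory buffer.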

3) Write a user-defined function that has two parameters: a column index and a
passenger age number. It will count the occurrences of the passenger age number
in the column index of the numpy array and return the total occurrences.
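One possible shape for such a function is sketched below, assuming the data is held in a module-level numpy array. The two-column array, the name count_occurrences, and the constant AGE_COLUMN are made up for illustration; in the real dataset the age column sits at a different index.

```python
import numpy as np

# Made-up data for illustration: column 0 is passenger_id, column 1 is age.
passenger_data = np.array(
    [[1, 22], [2, 38], [3, 22], [4, 30], [5, 22], [6, 38]], dtype=float
)
AGE_COLUMN = 1  # hypothetical index, valid for this made-up array only


def count_occurrences(column_index, age):
    """Count how many rows hold the given age in the given column."""
    return int(np.sum(passenger_data[:, column_index] == age))


# Count every distinct age, then sort by count, highest first.
unique_ages = np.unique(passenger_data[:, AGE_COLUMN])
age_counts = [(age, count_occurrences(AGE_COLUMN, age)) for age in unique_ages]
age_counts.sort(key=lambda pair: pair[1], reverse=True)

total_passengers = passenger_data.shape[0]
for age, count in age_counts[:3]:
    proportion = count / total_passengers * 100
    print(f"Age {age:.0f}: {count} passengers ({proportion:.3f}%)")
```

Comparing a column against a single value yields a boolean array, and summing it counts the True entries.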

Use the above user-defined function to answer the following question:

In the dataset, print the 3 most frequent ages of the passengers. Include the
proportion as a % out of entire passenger manifest, to 3 decimal places, for each
age.

A print out of the program is shown below:

(12 marks)


4) It is often important to explore the data to gain preliminary insights, before
proceeding to predictive models or deciding on a problem statement to investigate.

Please print out the following values amongst passengers (when appropriate, to 2
decimal places):

i) Highest value of number of siblings and/or spouses on-board
ii) Mean value of parents and/or children on-board
iii) 50th-percentile value of fare paid
iv) Cheapest non-zero fare paid
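The four values listed above map onto standard numpy functions. The sketch below uses three small made-up arrays in place of the real sibsp, parch and fare columns; the variable names are illustrative.

```python
import numpy as np

# Made-up values for illustration only.
sibsp = np.array([1, 0, 3, 0])
parch = np.array([0, 2, 1, 0])
fare = np.array([7.25, 0.0, 71.28, 13.0])

highest_sibsp = np.max(sibsp)
mean_parch = np.mean(parch)
median_fare = np.percentile(fare, 50)           # 50th percentile is the median
cheapest_nonzero_fare = np.min(fare[fare > 0])  # mask out zero fares before taking the minimum

print(f"Highest sibsp: {highest_sibsp}")
print(f"Mean parch: {mean_parch:.2f}")
print(f"50th-percentile fare: {median_fare:.2f}")
print(f"Cheapest non-zero fare: {cheapest_nonzero_fare:.2f}")
```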

(12 marks)

5) An example of a more advanced investigation, requiring comparison across
aggregated values of different features/columns, is a researcher wanting to
measure the difference in fares between males who survived and males who did
not:

Print out the difference between mean fare paid by males that survived, and mean
fare paid by males that did not, appropriately formatted.
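A comparison of this kind can be sketched with boolean masks that combine two conditions. The three-column array and the column index constants below are made up for illustration; the real dataset has more columns in a different order.

```python
import numpy as np

# Made-up data for illustration: columns are survived, gender, fare.
passenger_data = np.array(
    [
        [1, 1, 30.0],
        [0, 1, 10.0],
        [1, 0, 80.0],
        [1, 1, 50.0],
        [0, 1, 20.0],
    ]
)
SURVIVED, GENDER, FARE = 0, 1, 2  # hypothetical indices for this made-up array

# Per the column details, gender 1 is male; & combines the two conditions row by row.
is_male = passenger_data[:, GENDER] == 1
did_survive = passenger_data[:, SURVIVED] == 1

mean_fare_survived = passenger_data[is_male & did_survive, FARE].mean()
mean_fare_not_survived = passenger_data[is_male & ~did_survive, FARE].mean()
fare_difference = mean_fare_survived - mean_fare_not_survived

print(f"Difference in mean fare: ${fare_difference:.2f}")
```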

(8 marks)

6) A research think-tank has tasked you with automating some of the common queries
that their members make about the Titanic dataset.

Write a simple Python program for the user to query the data based on his/her given
inputs. When a user enters an option from 0 to 3, the program will process the
option accordingly.

After the option has been processed, the program will display the main menu again
and the process is repeated until the user chooses to exit.

The options are explained in Questions 7 to 9.
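The repeat-until-exit behaviour described above can be sketched as a loop around a dictionary of handlers. Only "Compute Correlation" is named in this handout; the other two menu labels, the function run_menu, and the placeholder handlers are hypothetical. The input function is passed in as a parameter so that the sketch can be run with scripted answers; in the notebook the default input() would be used.

```python
# Labels for options 2 and 3 are hypothetical; only "Compute Correlation" is from the handout.
MENU_TEXT = (
    "0. Exit\n"
    "1. Compute Correlation\n"
    "2. Oldest survivors by passenger class\n"
    "3. Female survivors by family size"
)


def run_menu(handlers, read_input=input, write_output=print):
    """Show the menu repeatedly until the user enters 0; return the options processed."""
    processed_options = []
    while True:
        write_output(MENU_TEXT)
        choice = read_input("Enter option (0-3): ").strip()
        if choice == "0":
            write_output("Exiting.")
            break
        handler = handlers.get(choice)
        if handler is None:
            write_output("Invalid option. Please enter a number from 0 to 3.")
            continue
        handler()
        processed_options.append(choice)
    return processed_options


# Placeholder handlers standing in for the work of Questions 7 to 9.
placeholder_handlers = {
    "1": lambda: print("(correlation would run here)"),
    "2": lambda: print("(oldest survivors would run here)"),
    "3": lambda: print("(female survivors would run here)"),
}

# Scripted answers: a valid option, an invalid option, then exit.
scripted_answers = iter(["2", "9", "0"])
processed_options = run_menu(
    placeholder_handlers, read_input=lambda prompt: next(scripted_answers)
)
```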

(10 marks)


7) Correlation between quantities may indicate some underlying relationship or likely
pattern of behaviour.

For the Compute Correlation option, display a numbered list of all the column
header names and prompt the user to input the numbers representing the two
quantities for the computation of correlation. The computed correlation should be
rounded off to 3 decimal places, as per the sample run below.
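The following is a sketch of the computation only, not the sample run. It assumes numpy's corrcoef() is an acceptable way to obtain the correlation; the array, the header names, and the function compute_correlation are made up for illustration, and user prompts are left out.

```python
import numpy as np

# Made-up headers and data for illustration only.
header_names = ["age", "fare", "discount"]
passenger_data = np.array(
    [
        [10.0, 1.0, -2.0],
        [20.0, 2.0, -4.0],
        [30.0, 3.0, -6.0],
    ]
)


def compute_correlation(data, first_index, second_index):
    """Return the correlation between two columns, rounded to 3 decimal places."""
    # corrcoef returns a 2x2 matrix; the off-diagonal entry is the correlation.
    matrix = np.corrcoef(data[:, first_index], data[:, second_index])
    return round(float(matrix[0, 1]), 3)


# Numbered list of column names that the user would choose from.
for number, name in enumerate(header_names):
    print(f"{number}. {name}")

print(f"Correlation: {compute_correlation(passenger_data, 0, 1):.3f}")
```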

8) In the absence of actual lifeboat data, survivor age can be used to gauge if certain
demographics were allowed on the lifeboats first.

Prompt the user to enter the passenger class number, before displaying the
corresponding rows of the 20 oldest survivors for that passenger class, in order
from oldest to youngest.
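One way to approach this query is to filter with a boolean mask and then sort with argsort(). The sketch below uses a made-up five-column array; the column index constants and the function oldest_survivors are illustrative, and the user prompt is left out.

```python
import numpy as np

# Made-up data for illustration: passenger_id, pclass, survived, gender, age.
passenger_data = np.array(
    [
        [1, 1, 1, 0, 38],
        [2, 1, 0, 1, 54],
        [3, 1, 1, 1, 62],
        [4, 3, 1, 0, 26],
        [5, 1, 1, 0, 45],
    ],
    dtype=float,
)
PCLASS, SURVIVED, AGE = 1, 2, 4  # hypothetical indices for this made-up array


def oldest_survivors(data, pclass, limit=20):
    """Return up to `limit` survivor rows for one passenger class, oldest first."""
    mask = (data[:, PCLASS] == pclass) & (data[:, SURVIVED] == 1)
    selected = data[mask]
    # argsort gives ascending order; reversing it puts the oldest first.
    order = np.argsort(selected[:, AGE])[::-1]
    return selected[order][:limit]


first_class_oldest = oldest_survivors(passenger_data, pclass=1)
print(first_class_oldest)
```

Slicing with [:limit] is safe even when fewer than 20 rows match.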


9) It was reported that while generally women were allowed onto lifeboats first,
researchers are also keen to identify female survivors with larger numbers of family
members on-board (not including themselves).

Write a simple lambda function to calculate a new numpy array column containing
each passenger's non-self family members on-board, by adding the count of siblings
and/or spouses to the count of parents and/or children, for each passenger.

Append this column to the existing numpy 2-D array of values (you may need to
use numpy.reshape() before appending) and display the top 20 rows of female
survivors, ordered by highest to lowest by non-self family member count primarily,
and in case of a tie, by highest to lowest fare secondarily.

Sample output below only shows first 3 rows of output.

(15 marks)
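The steps in Question 9 can be sketched as follows, assuming numpy's lexsort() is used for the two-level ordering (other approaches are possible). The six rows are made up, and the index constants follow the column order listed under "Column details".

```python
import numpy as np

# Made-up rows: passenger_id, pclass, survived, gender, age, sibsp, parch, fare.
passenger_data = np.array(
    [
        [1, 1, 1, 0, 38, 1, 0, 71.28],
        [2, 3, 1, 0, 26, 0, 0, 7.92],
        [3, 2, 1, 0, 30, 1, 2, 30.00],
        [4, 3, 0, 1, 22, 1, 0, 7.25],
        [5, 3, 1, 0, 19, 0, 1, 15.00],
        [6, 1, 0, 0, 50, 4, 2, 90.00],
    ]
)
SURVIVED, GENDER, SIBSP, PARCH, FARE, FAMILY = 2, 3, 5, 6, 7, 8

# Lambda adding the two family counts; it works element-wise on whole columns.
count_family = lambda sibsp, parch: sibsp + parch

# reshape(-1, 1) turns the 1-D result into a single column so it can be appended.
family_column = count_family(passenger_data[:, SIBSP], passenger_data[:, PARCH]).reshape(-1, 1)
extended_data = np.append(passenger_data, family_column, axis=1)

# Per the column details, gender 0 is female and survived 1 is yes.
is_female_survivor = (extended_data[:, GENDER] == 0) & (extended_data[:, SURVIVED] == 1)
female_survivors = extended_data[is_female_survivor]

# lexsort treats the LAST key as primary; negating the values sorts highest first.
order = np.lexsort((-female_survivors[:, FARE], -female_survivors[:, FAMILY]))
top_female_survivors = female_survivors[order][:20]
print(top_female_survivors)
```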

3. SCOPE

For this assignment, you are expected to:

• understand the problem completely and plan your program before you start coding.
• implement and test each feature as it is developed.
• do all the possible data validations (e.g. use try-except).
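As a small illustration of the try-except validation mentioned in the last point, the sketch below checks a menu option typed by the user. The function name parse_menu_option and the 0 to 3 range (taken from Question 6) are used for illustration only.

```python
def parse_menu_option(raw_text, lowest=0, highest=3):
    """Return the option as an int, or None if the text is not a valid option."""
    try:
        option = int(raw_text)
    except ValueError:
        # Raised when the text is not a whole number, e.g. "abc" or "1.5".
        return None
    if lowest <= option <= highest:
        return option
    return None


for sample_text in ["2", "abc", "7"]:
    print(repr(sample_text), "->", parse_menu_option(sample_text))
```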

Marks may be deducted for programs that exhibit one or more of the
below undesirable characteristics (not necessarily an exhaustive list):
• Nondescript or irrelevant choice of variable names
• Lack of accompanying documentation/comments for complex code segments

Note:
• You are expected to follow the naming conventions introduced in this module.
• You should think carefully what input is required for each option if there is any.
• You are allowed to customize your own output.
• You are required to present your solution to explain your program to your tutor
before submission. Programs need not be complete, in which case, a
discussion on how you would proceed to complete would suffice.

• Marks will be deducted if you are not able to show your understanding of the
program during the presentation.

Data File:
• The data file for your program is available from PolyMall > Programming for Analytics
module > Learning Materials.

Links to Documentation:
• Section 7.2 of https://docs.python.org/3/tutorial/inputoutput.html on Reading and
Writing Files.
• https://numpy.org/devdocs/user/how-to-io.html on reading text and csv files.

6. DELIVERABLES

• Write your solution in the given Jupyter Notebook "Oct2021_ASG1.ipynb" file.
Rename this file as "JasonTan_ASG1.ipynb" where "JasonTan" is your
name.

• Submit your solution file into your MS Teams Class Notebook by 19 Dec 2021
23:59hrs.

• Presentation. Demonstrate your application to your tutor during timeslots scheduled
by your tutor starting 13 Dec 2021.

7. ASSESSMENT

This assignment constitutes 30% of this module.

The performance criteria for grading the assignment are described below. Marks will be
awarded based on the program code and on the student’s understanding of the
completed work, and of how any incomplete parts would be finished, as assessed during
the presentation.


A Grade

♦ Program implements Q1 to Q9 successfully with comprehensive input validation
♦ Program demonstrates excellent design with correct use of functions and is error-free
♦ Program completed with excellent comments
♦ Program has been tested adequately
♦ Program is coded with excellent application of fundamental concepts
♦ Excellent demonstration of program and showing excellent understanding of work
done and/or discussion of what needs to be done during presentation

B Grade

♦ Program implements Q1 to Q9 successfully with some input validation
♦ Program demonstrates good design with correct use of functions, with few errors
♦ Program completed with good comments
♦ Program has been tested adequately
♦ Program is coded with good application of fundamental concepts
♦ Good demonstration of program and showing good understanding of work done
and/or discussion of what needs to be done during presentation

C Grade

♦ Program implements most of Q1 to Q9 with some input validation
♦ Program demonstrates a reasonable design with some correct use of functions
♦ Program completed with some comments
♦ Program has been tested to some extent
♦ Some demonstration of program and showing some understanding of work done
and/or discussion of what needs to be done during presentation

D Grade

♦ Program implements some of Q1 to Q9 successfully with little or no input
validation
♦ Program completed with few comments
♦ Program has been tested to a limited extent
♦ Limited demonstration of program while being able to answer some questions
during presentation

== END OF DOCUMENT ==
