CETM50 - Assignment

This document outlines the requirements for a coursework assignment with two parts: 1) A Python programming task to extract and combine customer data from multiple sources in different formats into a single database. Students are provided mock customer data and must write Python code to perform ETL and load the unified data into a MySQL database. 2) A 7-page report discussing challenges of combining the heterogeneous data sources, critiquing the use of a relational database for the expanded business needs, and highlighting potential big data issues from expanding operations internationally and increasing data volumes. The report must address specific points around data challenges, database solution comparisons, and regulatory/technological impacts of scaling the business globally.

Uploaded by

Amine Elkari

Technology Management for Organisations (CETM50) Coursework

Gathering Up Your Data & Scaling Out

Submission ( 100% of Module Marks )


● Code Submission ( Python Script / Jupyter Notebook ) - 40% of Assignment
● Report PDF Submission ( Turnitin ) - 60% of Assignment

Deadline: Thursday January 27th @ 23:59

Aims
This coursework provides assessed practical experience in the use of technology to perform
fundamental ETL processes within organisations using Python, alongside consideration of
data and its management for modern organisations wishing to expand their operations
and/or data volume.

The assignment is split into two deliverables: Python ETL, a programming task for
combining multiple data sources and pushing these to a central organisational data store;
and Scaling Up, a theoretical report which poses an expansion scenario for a business.
Please read the sub-sections below carefully, as these detail the requirements for each
component.

Task 1 - Python ETL: Programming Task
An SMB (small-to-medium business) has recently begun to utilise the data they obtain from
their customers. Unfortunately, their business has multiple areas which each hold customer
data specific to that area, so this data is fragmented within the organisation. For example,
credit card data is only stored by the financial systems, employment data by HR, etc. There
is no single cohesive record representing each customer. The SMB is looking to unify these
records ahead of further data investigation, and to pool all this data together into a
central datastore.

The data provided for this assessment is mock data representing a typical customer-facing
business, including names, banking credentials, family attributes, etc. The data files are
provided in a variety of formats (CSV, JSON, XML, and TXT).

The work herein requires processing these data into a homogeneous record, aligning the same
customers from different sources, and then automatically entering the unified records into
a Relational Database System using modern tools & libraries.

Figure 1 - Example of how two separate data files can be combined into the final form
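
As a rough illustration of the kind of combination Figure 1 describes, the sketch below merges two small lists of records into one record per customer. The field names (first_name, last_name, iban, employer), the sample values, and the choice of name as the matching key are all assumptions made for illustration; the real files may need a different key.

```python
def customer_key(record):
    """Build a case-insensitive match key from the name fields (an assumed identifier)."""
    return (record["first_name"].strip().lower(), record["last_name"].strip().lower())


def combine(*sources):
    """Merge records sharing a key; the first value seen for each field is kept."""
    unified = {}
    for source in sources:
        for record in source:
            target = unified.setdefault(customer_key(record), {})
            for field, value in record.items():
                target.setdefault(field, value)
    return list(unified.values())


# Hypothetical sample records standing in for two departmental files.
finance = [{"first_name": "Ada", "last_name": "Lovelace", "iban": "GB00TEST"}]
hr = [{"first_name": "ada ", "last_name": "LOVELACE", "employer": "Example Ltd"}]

customers = combine(finance, hr)
```

Matching on name alone will wrongly merge two different customers who share a name, so a stricter key (for example name plus another attribute present in every source) may be needed in practice.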

Raw data for this assignment is provided on Canvas, containing a mix of .csv, .json, .xml, and
.txt files. This data is synthetic, but derived from a realistic domain with data generated in
accordance with 2016 UK Census data. You will need to apply your knowledge of data in order
to correctly parse these and perform the task.
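
One possible starting point for the parsing step is Python's standard library, sketched below for three of the formats. The sample contents, field names, and the assumption that the XML stores values as attributes of a user element are hypothetical; the actual files on Canvas should be inspected first, and the .txt file is left out because its layout is not described here.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample contents; the real files will have different fields.
CSV_TEXT = "first_name,last_name,age\nAda,Lovelace,36\n"
JSON_TEXT = '[{"firstName": "Ada", "lastName": "Lovelace", "iban": "GB00TEST"}]'
XML_TEXT = '<users><user firstName="Ada" lastName="Lovelace" salary="1000"/></users>'


def read_csv(text):
    """Return one dict per CSV row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))


def read_json(text):
    """Return the parsed JSON document (assumed here to be a list of objects)."""
    return json.loads(text)


def read_xml(text):
    """Return one dict per <user> element, built from its attributes."""
    root = ET.fromstring(text)
    return [dict(element.attrib) for element in root.iter("user")]
```

Values read from CSV and XML arrive as strings, so numeric fields such as age or salary need converting before they are compared or stored.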

You are expected to read and extract data from these various formats, wrangle the data -
resolving inconsistencies if present - and bring the data together into a single format (see
Figure 1 for an example). These unified records are then to be mapped to a relational
database using PonyORM Entities, with all unified records being entered into the database.

Database Access
A MySQL Database is available for connection for this task; however, you are welcome to
install a WAMP/LAMP stack yourself for testing purposes.

If you require credentials resetting, please contact [email protected]. This will take
time, depending on when your request comes in.

Host: europa.ashley.work
User: student_ followed by your student ID
Password: iE93F2@8EhM@1zhD&u9M@K
Database: student_ followed by your student ID

E.g. if your student ID is bh12xy, then your connection would use:

User: student_bh12xy
Database: student_bh12xy
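
The naming rule above can be sketched as a small helper that builds the connection settings from a student ID. The helper name connection_settings and the environment variable CETM50_DB_PASSWORD are hypothetical; the keyword names follow the form PonyORM documents for binding to MySQL.

```python
import os


def connection_settings(student_id, password, host="europa.ashley.work"):
    """Return keyword arguments in the shape Database.bind() takes for MySQL."""
    account = f"student_{student_id}"
    return {
        "provider": "mysql",
        "host": host,
        "user": account,
        "passwd": password,
        "db": account,
    }


# CETM50_DB_PASSWORD is a hypothetical variable name; reading the password
# from the environment keeps it out of the submitted file.
settings = connection_settings("bh12xy", os.environ.get("CETM50_DB_PASSWORD", ""))

# With PonyORM installed and a Database() instance named db, the connection
# would then be opened with: db.bind(**settings)
```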

Note: These credentials will also work for phpMyAdmin, should you wish to use this for
inspection purposes: https://europa.ashley.work/phpmyadmin

Note 2: The credentials covered in the workshops are the same as these; you are free to use
either the student_ or sec_student_ variant.

Deliverable
A Python Notebook (.ipynb) file, or a Python Script (.py) can be used for submission.

When marking, files will be rerun from scratch. Therefore, ensure your submission works
prior to submission. For notebooks, this will be Kernel -> Restart and Run All. Any existing
output within your scripts will not be considered when marking.

It is expected that your code be sufficiently commented where applicable, especially where
any documentation was referenced.

Task 2 - Scaling Up: Big Data, Big Problems?
The SMB is currently based in the UK with a singular office. Their IT systems are
off-the-shelf components for their specific areas (Finance, HR, etc) requiring manual
intervention if any data needs to be compared / crossed between those systems. In the past
year they have found huge success with a large influx of customers, and they are looking to
expand; this expansion will allow them to take on more customers - and to expand their
offerings/operations to include more social media aspects.

Part of this includes potential expansion into foreign markets (Northeast Asia), with an office
space initially in Fukuoka, Japan. The Fukuoka office would be responsible for customers in
that region; however the main company will still be based within the UK, and require regular
communication back-and-forth including customer data. The company also wishes to
improve the core infrastructure of their business by combining their different data streams
with an aim to analyse it. In Task 1 (Python ETL) you will have already pulled together
various data towards this goal.

Based on the above expansion scenario you should write a report ( 7 pages max ) which
covers the following:

1. Reflect on the types of data provided in Task 1, and the challenges of combining
them. In particular you should reflect on the challenges presented in Task 1 from a
data perspective, as well as giving a personal reflection on your process of undertaking
the task: any difficulties faced, challenges overcome, etc. Your answer should also
consider the difficulties of automating this process.
2. Task 1 utilised a Relational Database to store the combination of customer data using
PonyORM. Critique this decision, and compare and contrast potential solutions the
company could utilise for their expansion, settling on a recommendation for the client.
3. Discuss and highlight the potential Big Data issues present with the company
opening a foreign office and dealing with non-UK customer data. You should consider
any regulatory or legal requirements of the business, as well as technological issues
presented by the scaling up of a company's operations and the data volumes they
begin to accrue.

Deliverable
A PDF submission is required via Canvas with a maximum of 7 pages. Note that any work
which goes beyond this limit will be ignored for the purposes of marking. You should ensure
that the PDF is uploaded as-is, and not within a ZIP file or any other archive, for the
purposes of Turnitin. The overall structure is up to you.

Cover page, table of contents, references, and appendices sections do not count towards
the page limit.
