ITNPBD6 Assignment 2018-2 PDF
2018
Computing and Maths
University of Stirling
The banks are having a bit of trouble with debt at the moment. They have lent lots of
money to people who promised to pay it back, and then didn’t. In the future, they
would like to avoid lending to the kind of person who won’t pay back the loan, and
that is where you come in. We have got some data from a bank describing 2000 of its
loan customers. The data also tells us whether or not each customer repaid the loan.
The question is simple: can we predict who will repay the loans and who won’t?
Your assignment is to answer that question using data mining techniques and produce
a system that would be able to tell the bank how likely it is that a new customer would
pay back a loan.
You can use any software of your choice (for example, Weka or scikit-learn in
Python) and you will not be required to submit any code, just a report. You should
employ best practice for both the project management and the machine learning
aspects of the project. The data you need for the project is also available on the course
Canvas page.
Introduction 10 Marks
Describe the task you were given, the data you received and the requirements of the
finished system. Define any terminology that you will use in the report (for example,
model, variable, task, etc.). Describe the project methodology you will use.
Modelling 50 Marks
You must use two different techniques and build models with both: pick a suitable
tree building algorithm and also use a multi-layer perceptron. Describe the different
methods you used and the results that you got. Give a detailed technical description of
the techniques and the way the models are represented. Include one diagram showing
the structure of each type of model that you build. In this section, it is particularly
important that the description and the diagrams are your own work. Do not copy (or
even paraphrase) from other sources. You must avoid plagiarism.
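As a rough illustration of what "the way the models are represented" can mean for a multi-layer perceptron, the sketch below computes the output of a network with a single hidden layer. It is a minimal sketch under stated assumptions, not a required design: the function names, the choice of sigmoid activations, and the single output unit read as a repayment probability are all assumptions made for this example.

```python
import math


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def mlp_forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass through a one-hidden-layer perceptron.

    hidden_weights holds one list of weights per hidden unit, with one
    weight per input variable. The returned value lies in (0, 1) and can
    be read as an estimated probability (an assumption of this sketch).
    """
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)
```

The model itself is nothing more than the weight and bias lists; training adjusts those numbers, while the structure (number of layers and units) is fixed beforehand.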
Describe which hyperparameters may be changed and what effect each change has. If you
varied the hyperparameters of a model, show how this affected the results.
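One way to keep hyperparameter experiments methodical is to enumerate every combination up front and log one record per run. The sketch below is illustrative only: the `run_grid` helper, the `evaluate` callback, and the hyperparameter names and values in `tree_grid` are assumptions for this example. `evaluate` stands in for whatever routine trains a model with the given settings and returns a validation score.

```python
from itertools import product


def run_grid(grid, evaluate):
    """Evaluate every hyperparameter combination and record each result.

    grid: dict mapping a hyperparameter name to the list of values to try.
    evaluate: hypothetical callback that trains a model with the given
        settings and returns a validation score (higher is better).
    """
    names = sorted(grid)
    log = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        log.append({"params": params, "score": evaluate(**params)})
    # Best run first, so the log can be turned into a results table.
    log.sort(key=lambda record: record["score"], reverse=True)
    return log


# Example grid for a decision tree; names and values are illustrative only.
tree_grid = {"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5, 20]}
```

Calling `run_grid(tree_grid, evaluate)` would then return nine records, one for each combination of the two settings.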
Describe how you split the data for training, validation and testing purposes. Be
methodical and record each result. This stage is a little like scientific research – you
are carrying out experiments in your search for the best solution. Once you have a
solution, show how you verified its robustness. For the two different techniques, report
on their comparative ability to predict a defaulted loan, and also on how easy it would
be for the bank to understand the model and the reasons behind each
prediction it makes.
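The split into training, validation and test sets, and the focus on predicting defaulted loans, can be sketched as follows. This is a minimal sketch under stated assumptions: the 60/20/20 fractions, the label strings, and the function names are choices made for this example, and recall on the defaulted class is only one of several measures that could be reported.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return (train, validation, test) index lists with a similar class mix.

    labels: one class label per customer, e.g. "repaid" or "defaulted".
    fractions: assumed shares for train and validation; the test set
        receives whatever remains of each class.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, validation, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        validation += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return train, validation, test


def recall_for_class(actual, predicted, target):
    """Fraction of true `target` cases that were also predicted as `target`."""
    relevant = [p for a, p in zip(actual, predicted) if a == target]
    if not relevant:
        return 0.0
    return sum(p == target for p in relevant) / len(relevant)
```

Splitting class by class keeps the proportion of defaulted loans roughly equal across the three sets, so a score measured on the test set is comparable with the validation scores used during model selection.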
Submission
Check the course web site on Canvas for the submission deadline. Upload your report
via Canvas by the deadline. There is an 8000-word limit on the report, and marks will
be deducted at a rate of 10 for every 1000 words by which you exceed it.
You do not need to submit the models that you built, just the report.
You can assume that the client has a good technical understanding of data mining and
statistics, so do not shy away from technical terms in your report. Where you use
them, however, explain what they mean in plain language too. To maximise your
mark, make sure you follow the instructions above and include everything that is
asked for in the report.
Plagiarism
Work which is submitted for assessment must be your own work. All students should
note that the University has a formal policy on plagiarism which can be found at
http://www.quality.stir.ac.uk/ac-policy/assessment.php.
This assignment is worth 50% of the overall grade for the course, and is subject to the
usual grade penalties for late submission. This assignment is set by Kevin Swingler.
You can email questions about it to [email protected].