
Assessment Brief

Module Title: Applied Statistics & Machine Learning


Module Code: B9BA102
Assessment Title: Continuous Assessment One
Assessment Number: 1
Assessment Type: Practical Assignment – Supervised Machine Learning
Individual/Group: Individual
Assessment Weighting: 30%
Issue Date: 28/05/2023
Due Date/Time: Deadline – 11.55 pm on Sunday 18/06/2023
Mode of Submission: MOODLE

Learning Outcomes to be assessed

1. Analyze a dataset from a problem domain in depth, and select appropriate statistical models,
tools, and techniques to derive insights regarding the dataset and domain.

2. Effectively extract, transform, interrogate, and analyze large datasets.

3. Construct, refine, interpret, and critically evaluate predictive analytical and machine learning
models.

4. Critically evaluate and utilize hyperparameter search strategies for optimizing machine
learning models.

Supervised Machine Learning – Classification (100 Marks)

Dataset

Each row in CustomerChurn.csv corresponds to a bank’s credit card customer.


Relevant information about this dataset is given below:

Number of Instances: 6237


Number of Attributes: 15 independent variables + 1 target variable
Independent Variables:
Customer_Age – Customer age in years
Gender – M = Male, F = Female
Dependent_count – Number of dependents
Education_Level – Highest education level of the customer
Marital_Status – Married, Single, Divorced
Income_Category – Annual income category of the customer
Card_Category – Type of card. Platinum cards offer more reward points than gold cards, which
offer more reward points than silver cards. Blue cards offer the fewest reward points
on purchases.
Months_on_book – Period of relationship with the bank (in months)
Total_Relationship_Count – Total number of products held by the customer (e.g., savings
account, car loan, credit card)
Months_Inactive – Number of months inactive in the last 12 months
Contacts_Count – Number of customer service contacts in the last 12 months
Credit_Limit – Credit limit on the credit card
Total_Revolving_Bal – Total revolving balance on the credit card (the portion of credit card
spending that goes unpaid at the end of a billing cycle)
Total_Trans_Amt – Total transaction amount in the last 12 months
Total_Trans_Ct – Total transaction count in the last 12 months

Target Variable:
Attrition_Flag – Two labels: ‘Existing Customer’ and ‘Attrited Customer’ (a customer who has
churned)
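As a starting point for data preparation, the variable list above can be written down as a schema. The sketch below is illustrative only and rests on stated assumptions: that the CSV header uses exactly the column names listed above, that the file has no missing values, and that the numeric/categorical split inferred from the descriptions is correct. The names `NUMERIC_FEATURES`, `CATEGORICAL_FEATURES`, `TARGET_CODES` and `load_rows` are hypothetical, and coding ‘Attrited Customer’ as 1 is a modelling choice, not a requirement of the brief.

```python
import csv

# Assumed numeric columns, inferred from the variable descriptions in the brief.
NUMERIC_FEATURES = [
    "Customer_Age", "Dependent_count", "Months_on_book",
    "Total_Relationship_Count", "Months_Inactive", "Contacts_Count",
    "Credit_Limit", "Total_Revolving_Bal", "Total_Trans_Amt", "Total_Trans_Ct",
]
# Assumed categorical columns (kept as raw strings; encoding is left to later steps).
CATEGORICAL_FEATURES = [
    "Gender", "Education_Level", "Marital_Status",
    "Income_Category", "Card_Category",
]
TARGET = "Attrition_Flag"
# Assumption: churners are treated as the positive class (1).
TARGET_CODES = {"Existing Customer": 0, "Attrited Customer": 1}


def load_rows(lines):
    """Split CSV input into feature dicts and a 0/1 target list.

    `lines` may be an open file handle or any iterable of CSV text lines.
    Assumes no missing values: float() will raise on empty numeric fields.
    """
    features, target = [], []
    for row in csv.DictReader(lines):
        # Convert numeric columns; keep categorical columns as strings.
        record = {name: float(row[name]) for name in NUMERIC_FEATURES}
        record.update({name: row[name] for name in CATEGORICAL_FEATURES})
        features.append(record)
        # Map the text label to 0/1; an unexpected label raises KeyError.
        target.append(TARGET_CODES[row[TARGET]])
    return features, target
```

Card_Category is kept as a string here even though the brief implies an order (Blue < Silver < Gold < Platinum); whether to encode it ordinally or with one-hot columns is a data-preparation decision to justify in the report.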

Task

The bank wants to use a classification model that can predict customer churn. Construct a
suitable classification model for the bank by implementing both random forest and support
vector classification algorithms in Python.

In addition to the Python code file, you must provide a critical analysis of your approach
and results in a PDF report.

Your code and analysis should cover the following points:

1. Data Preparation (What steps would you take to prepare your data? Discuss your approach.) [20]
2. Model Hyperparameter Tuning (Which hyperparameters would you tune and why? How
would you tune them?) [20]

3. Choice of Evaluation Metric (Which metric would be suitable for model evaluation and
why?) [20]

4. Overfitting avoidance mechanism (Which mechanism (feature selection/regularization)
would you use and why?) [20]

5. Results analysis
a) Which of the two models (random forest or support vector classifier) would you
recommend for deployment in the real world?
b) Is any model underfitting? If so, what could be the possible reasons?
[20]
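Point 3 asks which metric suits this problem. If churners turn out to be the minority class (an assumption to check against the class counts in CustomerChurn.csv), accuracy alone can be misleading. The hypothetical helper `churn_metrics` below is a minimal sketch that reports accuracy alongside precision, recall and F1 for the churn class, assuming 0/1 labels with 1 = ‘Attrited Customer’; it illustrates the trade-off rather than prescribing an answer.

```python
def churn_metrics(y_true, y_pred, positive=1):
    """Accuracy plus precision, recall and F1 for the positive (churn) class."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts for the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative labels only: a model that never predicts churn.
print(churn_metrics([0] * 8 + [1] * 2, [0] * 10))
# → {'accuracy': 0.8, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```

The example shows a model that scores 80% accuracy while identifying no churners at all, which is why a class-specific metric is worth discussing alongside accuracy.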

You must submit the following in a zipped folder:

1. Critical Analysis Report (.pdf)
2. Python Code (.py)

Naming convention:
Report should be named as –
Report_Firstname_Surname.pdf
Code should be named as –
Code_Firstname_Surname.py
Zipped folder should be named as –
Firstname_Surname.zip

There is no prescribed word count for the report. It will be assessed on the quality, not the
quantity, of its content.

Assessment Criteria
Each part will be graded according to the following criteria:
1. Quality of code (correctness and completeness) [Weighting – 40%]
2. Quality of analysis in report (critical analysis of approach, presentation and interpretation
of results, conclusion) [Weighting – 60%]

General Requirements for Students

PLEASE READ CAREFULLY


1. It is your responsibility to ensure your file is uploaded correctly.
2. Students are required to retain a copy of each assignment.
3. When an assignment is submitted, it is the student’s responsibility to ensure that the file
is in the correct format and opens correctly.
4. Students should refer to the assessment regulations in their Course Guide.
5. DBS penalizes students who engage in academic impropriety (i.e., plagiarism, collusion and/or
copying). Please refer to the referencing guidelines on Moodle for information on
correct referencing.
6. All relevant provisions of the Assessment Regulations must be complied with.
7. Penalties for late submission of assignments are as follows:
a. 25% penalty for assignments submitted within 5 working days of the deadline.
b. No marks for assignments submitted more than 5 working days after the deadline.

Extensions to assignment submission deadlines will be granted in exceptional circumstances
only. The appropriate “Application for Extension” form must be used and supporting
documentation (e.g., medical certificate) must be attached. Applications for extensions
should be made directly to the Programme Coordinator in advance of the deadline date.
