BUS2004 Ass3 Sem2 2024
Overview
So far, you have gained a good understanding of predictive analytics and how to use the Orange data
mining platform to analyse data and build prediction models. This assignment will provide you with
an opportunity to demonstrate how predictive analytics helps businesses in decision making.
Assignment Requirements
Given the data in the BUS2004Ass3.xlsx file used to predict employee attrition, you are required to
create an Orange workflow to:
1. (8 marks) Explore the data and examine if there are any issues including missing values and
inconsistent data and address them accordingly.
2. (4 marks) Set up the features and target variable to build prediction models based on all
given features. Then, split the data into 80% for training and 20% for testing. Make sure we
can recreate the results.
3. (12 marks) Set up an experiment to build a decision tree (DT) model with a depth of 3, called
Tree-3, and an SVM. Train the models on the training set and compare their performance
on the test set using AUC, accuracy, and F1. Present the learnt tree.
4. (18 marks) Create a Python script in Orange to perform a grid search to find the best tree
depth that gives the highest classification accuracy from a list of at least 5 predefined values.
Justify your choice of these values. Report the best tree depth and its average accuracy of
5-fold cross validation on the training set, called Validate-Accuracy. Make sure all the
random factors are controlled so we can recreate the same results.
5. (10 marks) Add to the experiment another DT model with the best tree depth found in
Question 4, called Tree-best. Compare the test accuracy of the Tree-3 model and Tree-best
model. Is the test accuracy of Tree-best the same as the Validate-Accuracy found in Q4?
Why?
6. (18 marks) Add a Preprocess widget to the Test and Score widget to select the top 50% of
features ranked by information gain. Compare the test performance of the Tree-best and
the SVM models before and after feature selection. Discuss the effect of feature selection.
List the selected features and discuss the importance of these features in predicting
employee attrition. Support your discussion with relevant literature, explaining how these
features have been shown to influence employee turnover in past research.
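The reproducibility requirements in Questions 2 and 4 can be illustrated with a minimal sketch. It is not a model answer and does not use Orange's API: it uses only the Python standard library to show how a fixed seed makes an 80/20 split, a 5-fold partition, and a grid search loop repeatable. The names SEED, train_test_split_indices, k_fold_indices, grid_search and score_fn are hypothetical; in your own script, score_fn would be replaced by code that trains a tree of the given depth on the training fold and returns its accuracy on the validation fold.

```python
import random
from statistics import mean

SEED = 42  # assumed value; any fixed integer makes the run repeatable


def train_test_split_indices(n_rows, train_fraction=0.8, seed=SEED):
    """Shuffle row indices with a fixed seed and cut them into train/test."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    cut = int(n_rows * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_indices(indices, k=5, seed=SEED):
    """Yield (training, validation) index lists for k-fold cross validation."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


def grid_search(candidate_depths, train_indices, score_fn, k=5, seed=SEED):
    """Return (best_depth, best_mean_score, all_mean_scores).

    score_fn(depth, training, validation) is a hypothetical callback that
    must return the validation accuracy for one fold.
    """
    mean_scores = {}
    for depth in candidate_depths:
        fold_scores = [
            score_fn(depth, training, validation)
            for training, validation in k_fold_indices(train_indices, k, seed)
        ]
        mean_scores[depth] = mean(fold_scores)
    best_depth = max(mean_scores, key=mean_scores.get)
    return best_depth, mean_scores[best_depth], mean_scores
```

Because every shuffle draws from a random.Random object created with the same seed, repeated runs produce identical splits and folds. The same principle applies in Orange, where the sampling and model widgets and the corresponding Python classes expose their own options for fixing randomness.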
Submission Guidelines
Your submission for this assignment must consist of TWO FILES (no compression or other formats):
1) Your report (in a WORD file): Answers to the questions above. You must use the template Word
file provided for this assignment.
2) Your analytics solution (in an ORANGE file): Include the workflow.
Note that if one of the two files is not submitted or the content of the two files does not match, NO
MARK WILL BE AWARDED for all questions.
Marking rubrics
The marker will use the following marking guide to assess your work. Please make sure you
understand what you need to cover for each question in this assignment.
Full marks will be given for each question if the expectations are met; half marks for
something close.
Submissions with a high similarity score will be considered plagiarism/collusion and will be
reported to the Academic Integrity Advisors.