Volume 10, Issue 5, May – 2025 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/25may172

Predicting Employee Attrition using Machine Learning Techniques
N. Bhavana¹; Chukka Ganesh²
¹Assistant Professor, ²Student
¹,²Department of MCA, Annamacharya Institute of Technology and Sciences, Karakambadi, Tirupati, Andhra Pradesh, India

Publication Date: 2025/05/07

Abstract: Employee retention is a major issue for businesses, and forecasting attrition can help HR departments put proactive measures in place to lower turnover. This project uses machine learning methods, including Random Forest, XGBoost, Decision Tree, Support Vector Classifier (SVC), Logistic Regression, K-Nearest Neighbors (KNN), and Naive Bayes, to study the important factors affecting employee departure. Trained on the IBM Analytics dataset with 35 features and 1,500 records, the models uncover trends in job satisfaction, workload, career development, and work-life balance. Deployed as an interactive Flask-based web application, the system includes capabilities for data upload, forecasting, and model performance visualization. This AI-driven solution helps HR staff identify at-risk employees early, manage issues efficiently, and enhance staff stability by offering practical insights. By using predictive analytics in HR management, businesses can lower attrition costs, improve staff engagement, and create a more resilient workplace.

Keywords: Employee Attrition Prediction, Machine Learning, Random Forest, XGBoost, Decision Tree, Support Vector Classifier
(SVC), Logistic Regression, K-Nearest Neighbors (KNN), Naive Bayes, Flask.

How to Cite: N. Bhavana; Chukka Ganesh. (2025). Predicting Employee Attrition using Machine Learning Techniques. International Journal of Innovative Science and Research Technology, 10(5), 1-10. https://doi.org/10.38124/ijisrt/25may172.

I. INTRODUCTION

Employee attrition remains a critical issue for organizations, impacting productivity, operational efficiency, and overall workplace morale. High staff turnover not only raises hiring and development costs but also interrupts workflow and stunts company expansion. Knowing the root causes of attrition and forecasting possible staff turnover enables companies to adopt proactive retention initiatives. Traditional attrition analysis techniques often rely on manual evaluations and surveys, which are time-consuming, subjective, and less precise. Machine learning models that draw on artificial intelligence and data analytics offer a more precise, data-driven way of spotting employees who might leave.

The analysis is based on a sample of 1,500 employee records and 35 determinants, including job satisfaction, work-life balance, pay scales, workload, career development opportunities, and engagement. The study aims to determine which predictive models best identify the factors that most strongly influence attrition. Combining multiple models through ensemble techniques can further improve the predictions.

The system is developed as a web-based application using Flask, offering an intuitive interface for HR professionals to upload employee data, analyze attrition trends, and visualize model performance. This tool enables organizations to identify employees at high risk of leaving and implement targeted interventions such as better compensation, flexible work arrangements, career growth opportunities, and employee engagement programs. By integrating machine learning into HR analytics, businesses can significantly reduce turnover rates, improve workforce retention, and enhance overall organizational stability. Future developments in this project may include incorporating deep learning techniques, real-time attrition monitoring, and expanding the model to accommodate industry-specific attrition patterns.

II. METHODOLOGY

The proposed method uses machine learning to predict the likelihood of an employee leaving the organization, based on parameters such as job satisfaction, work-life balance, pay, career growth or advancement, and workload, among others. The main intention of the system is to predict attrition with reasonable accuracy and to counteract the risks it poses to workforce stability in the organization.

The system utilizes a range of machine learning methodologies to generate accurate employee turnover predictions based on HR data. It is designed to recognize patterns that may not be immediately visible, enhancing the ability to detect early signs of potential attrition. The prediction process involves training on historical employee records to identify signals indicative of future resignations. By leveraging multiple classification techniques, the system improves its predictive capability, allowing it to accurately identify employees who are most likely to leave. This approach supports proactive decision-making and helps organizations implement effective retention strategies.

Because employee turnover has become an almost expected hazard, the human resources function needs closer attention. Employees leave for, or migrate to, other firms in search of opportunities or better terms. With these issues in mind, a predictive approach to attrition is proposed that draws on commonly held views of how a typical employee responds to income and wages. High attrition is a drain on organizations, which must then replace the employees they lose. If the model works, it reduces the cost of keeping an active workforce: the higher the turnover, the higher the overall costs. Improving the model therefore yields cost savings by helping to retain employees.

A user-friendly web application is developed using HTML, CSS, and JavaScript, allowing HR professionals to enter employee data and receive real-time predictions. The platform is designed to offer insights into workforce retention trends, empowering organizations to take data-driven, proactive measures to reduce turnover. The system also enables companies to refine their HR policies, improve employee engagement, and implement targeted retention strategies.

By integrating machine learning into HR analytics, this system provides valuable insights for businesses, supporting early identification of attrition risks and enabling timely interventions. It helps organizations make informed decisions regarding employee retention, ultimately leading to a more stable, productive, and engaged workforce.

Fig 1: Flow Chart

III. MODULES AND THEIR IMPLEMENTATION

A. System Operations

• Upload Data: HR professionals collect and upload a structured dataset containing various employee-related factors that influence attrition. This dataset includes details such as job roles, work experience, compensation, performance metrics, work-life balance indicators, and employee engagement levels. By gathering comprehensive data, the system can better analyze workforce trends and predict employee attrition accurately.
• Data Preprocessing: Once the dataset is uploaded, it undergoes a series of data cleaning and preprocessing procedures. This involves handling missing or corrupted data, encoding categorical variables such as department and job role, normalizing or standardizing numerical features like salary and experience, and applying techniques to balance the dataset to prevent bias. These preprocessing steps help the machine learning models perform well and provide accurate predictions.
• Model Building: The system trains multiple machine learning models to classify employees as likely to stay or leave based on historical HR data. These models are fine-tuned through hyperparameter optimization to improve prediction accuracy and overall reliability. Model selection is guided by performance metrics such as accuracy, precision, recall, and F1 score, so that the most effective model is chosen for employee turnover prediction.
• Model Prediction: The trained models analyze new employee data using the same preprocessing techniques applied to the training data. Based on this analysis, the system predicts whether an employee is at risk of leaving the organization. The model's decision is based on historical trends and key influencing factors identified in the dataset.
• Result: The system presents the prediction results for each employee, along with confidence scores to indicate prediction reliability. Additionally, it provides detailed performance metrics, including confusion matrices, accuracy, precision, recall, and F1 score. Visual aids such as bar charts, histograms, and ROC curves are also incorporated to help HR professionals interpret the results more effectively.

B. User Operations

• Register: Users, primarily HR professionals, must first register with their credentials to create an account in the system.
• Login: Registered users can log in using their credentials to securely access the system and perform data analysis.
• Upload Data: Users can upload employee datasets containing relevant information about job satisfaction, salary, experience, and performance. The uploaded data should be in a structured format compatible with the system.
• Model Page: This section displays the accuracy of each machine learning model used in the system. Users can compare different models.
• Prediction Page: After uploading data, users can navigate to the prediction page, where they can view individual and overall attrition predictions.
• Viewing Results: Once the data is processed, users can view the classification results, including whether an employee is at risk of leaving.
• Logout: To ensure data security and privacy, users can log out of the system after completing their tasks, securing their session and personal data.

IV. MODELING AND ANALYSIS

A. Random Forest
Random Forest is an ensemble algorithm that builds many decision trees, each trained on a random subset of the IBM Analytics dataset and its 35 features (such as job satisfaction and workload). The trees' outputs are then aggregated by voting to produce the final classification, which reduces the chance of overfitting while increasing accuracy. Through its feature importance scores, Random Forest also helps identify the most salient attrition factors, such as work-life balance. Because it is robust to noisy data and copes with imbalanced classes, this classifier is well suited to predicting high-risk employees within the organization.

B. XGBoost
XGBoost is an advanced ensemble algorithm that builds decision trees sequentially, optimizing a loss function with gradient boosting. For attrition prediction, it processes features like career growth and workload, weighting errors to improve accuracy. Its regularization prevents overfitting, and its scalability handles the 1,500-record dataset efficiently. In the Flask app, XGBoost's high predictive power aids early identification of at-risk employees.

C. Decision Tree
A Decision Tree splits the IBM dataset into branches based on features like job satisfaction or work-life balance, creating a flowchart-like model to predict attrition. Each node represents a decision, and leaves indicate outcomes (stay/leave). The simplicity and interpretability of Decision Trees help HR visualize attrition patterns. In the project, Decision Trees provide clear rules for identifying at-risk employees. However, they are prone to overfitting, especially with noisy data, leading to poor generalization. Pruning and limiting tree depth mitigate this.

D. Support Vector Classifier (SVC)
SVC finds the optimal hyperplane to separate employees who stay from those who leave, maximizing the margin between classes. For non-linear patterns in the dataset (e.g., complex interactions between career growth and workload), SVC uses kernels like RBF. In the project, SVC effectively classifies attrition risk but struggles with the dataset's size due to high computational costs. Scaling features is essential for performance.
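The point about feature scaling for SVC can be made concrete with a small sketch. It uses only the Python standard library and a hypothetical three-employee toy dataset (the income and tenure values are invented for illustration and are not from the IBM dataset). It shows that an RBF kernel computed on raw features is dominated by the large-valued column, whereas z-score standardization puts both features on a comparable footing. The `gamma` value is an arbitrary assumption.

```python
import math

def standardize(column):
    """Rescale a numeric column to zero mean and unit variance (z-scores)."""
    mean = sum(column) / len(column)
    variance = sum((x - mean) ** 2 for x in column) / len(column)
    std = math.sqrt(variance) or 1.0  # guard against constant columns
    return [(x - mean) / std for x in column]

def rbf_kernel(a, b, gamma=0.5):
    """RBF similarity exp(-gamma * ||a - b||^2) between two feature vectors."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)

# Hypothetical toy data: monthly income and years at company for three employees.
income = [2000.0, 5000.0, 8000.0]
years = [1.0, 5.0, 9.0]

raw = list(zip(income, years))
scaled = list(zip(standardize(income), standardize(years)))

# On raw values the income gap swamps everything and the similarity collapses to 0.
raw_similarity = rbf_kernel(raw[0], raw[1])
# After scaling, both features contribute equally to the distance.
scaled_similarity = rbf_kernel(scaled[0], scaled[1])

print(raw_similarity, round(scaled_similarity, 4))
```

The same consideration applies to any distance-based model, which is why standardization is normally fitted on the training data and then reused unchanged at prediction time.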

E. Logistic Regression
Logistic Regression predicts the probability of employee attrition by modeling the relationship between features (e.g., job satisfaction, work-life balance) and a binary outcome (stay/leave) using a logistic function. Its simplicity and interpretability make it ideal for HR to understand feature impacts via coefficients. In the project, it handles the 1,500-record dataset efficiently, providing baseline predictions for the Flask app. However, it assumes linear relationships, which may miss complex patterns. Regularization (e.g., L1, L2) prevents overfitting. Its fast training and deployment make it practical for real-time attrition risk assessment, supporting proactive HR interventions.

F. K-Nearest Neighbors (KNN)
KNN classifies employees as likely to remain or likely to leave by considering the 'k' most similar employees in the dataset with respect to workload, career growth, importance of training, and so forth. It uses distance metrics such as Euclidean distance and Manhattan distance. KNN therefore captures local patterns in attrition data, but it is sensitive to feature scaling and noise. Its computational cost also increases with the size of the dataset, which affects the performance of the Flask app. Careful choice of 'k' is imperative. KNN is slow and not storage-efficient, which limits its scalability. It assists HR in identifying employees vulnerable to leaving by offering insights through similarity detection.

G. Naive Bayes
Naive Bayes predicts attrition by calculating the probabilities of staying or leaving based on features, assuming independence between them (e.g., job satisfaction and workload). Using Bayes' theorem, it is computationally efficient and excels with categorical data in the IBM dataset. In the Flask app, it provides fast predictions, ideal for real-time HR use. However, its independence assumption may oversimplify complex relationships, reducing accuracy. It performs well with imbalanced classes, which are common in attrition data. Its simplicity aids deployment but limits its ability to capture intricate patterns. Naive Bayes supports HR by offering quick, interpretable insights for early intervention.

V. RESULTS AND DISCUSSION

The Employee Attrition Prediction System can help predict which employees are likely to leave the company by using a variety of machine learning models, ranging from distance-based classifiers to margin-based classifiers, tree-based methods, and even neural networks. Prediction reliability can be further improved by implementing ensemble learning techniques such as a Voting Classifier and a Stacking Classifier, which combine the merits of individual models to achieve better performance.

A study with the IBM HR Analytics Employee Attrition data showed that job satisfaction, work-life balance, compensation, and career development opportunities critically affect employee retention. These results are consistent with earlier research and reflect the complex, multidimensional nature of turnover.

The system is deployed as a simple, user-friendly web application built on Flask, in which HR personnel can submit staff data and obtain predictions in real time. It also allows employees at risk of leaving to be identified on demand so that targeted interventions can follow. The application additionally provides performance visualizations that put model results in context and enable data-informed decisions.
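The Voting Classifier idea mentioned in this section can be illustrated with a minimal hard-voting sketch. The three prediction lists below are hypothetical outputs of already-trained models (the paper does not report per-employee predictions), and the function name `hard_vote` is an illustrative choice rather than anything from the authors' code. Each employee receives the label that the majority of models agree on.

```python
from collections import Counter

def hard_vote(predictions):
    """Return the majority label per employee across several models' predictions."""
    combined = []
    for votes in zip(*predictions):  # votes = one label from each model
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined

# Hypothetical outputs of three trained classifiers for five employees
# (1 = likely to leave, 0 = likely to stay).
knn_pred = [1, 0, 1, 0, 0]
tree_pred = [1, 1, 0, 0, 0]
logreg_pred = [0, 1, 1, 0, 1]

ensemble_pred = hard_vote([knn_pred, tree_pred, logreg_pred])
print(ensemble_pred)  # [1, 1, 1, 0, 0]
```

Using an odd number of models avoids ties for a binary label. A stacking ensemble differs in that a second model is trained on the base models' outputs instead of counting votes.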

A. KNN Classifier

Fig 2: ROC Curve of KNN

Table 1: Classification Report of KNN

The KNN model's AUC of 0.87 on the ROC curve indicates a good ability to distinguish between employees prone to leave and those likely to stay. The classification report further shows an overall accuracy of 80%. Class 1 (attrition) has a very good recall of 0.91, so most employees at risk of leaving are correctly identified, but a lower precision of 0.72, implying a significant number of false positives. Conversely, Class 0 (no attrition) has a high precision of 0.91 and a lower recall of 0.72, meaning that the model missed some no-attrition instances. With both the macro average and the weighted average at 0.80, the classifier performs comparably on the two classes.
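As a quick consistency check on the figures above, the F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below applies that standard formula to the rounded precision and recall values quoted for KNN; it does not use the underlying confusion matrix, which is not reproduced in this text.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Rounded values quoted in the text for the KNN classifier.
knn_class1_f1 = f1_score(precision=0.72, recall=0.91)  # attrition
knn_class0_f1 = f1_score(precision=0.91, recall=0.72)  # no attrition

# The macro average is the unweighted mean of the per-class scores.
macro_f1 = (knn_class1_f1 + knn_class0_f1) / 2

print(round(knn_class1_f1, 2), round(knn_class0_f1, 2), round(macro_f1, 2))
```

Because precision and recall are mirrored between the two classes, both per-class F1 scores come out at about 0.80, which matches the reported macro average.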

B. SVM Classifier

Fig 3: ROC Curve of SVM

Table 2: Classification Report of SVM

An AUC of 0.87 on the ROC curve for the SVC (linear kernel) shows that the model discriminates well between employees who are going to leave (class 1) and those who will stay (class 0). The classification report gives an accuracy of 79%. For the attrition class (class 1), precision is 0.76 and recall is 0.77, indicating that the model does reasonably well at identifying employees at risk of leaving. Class 0 performs better, with a precision of 0.82 and therefore fewer false positives among employees predicted to stay. Both classes have an F1-score of 0.79, which indicates a good trade-off between precision and recall and supports the reliability of the model in predicting employee attrition.
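The AUC values quoted throughout this section have a simple interpretation: AUC is the probability that a randomly chosen leaver receives a higher score than a randomly chosen stayer. The sketch below computes it directly from that definition. The labels and scores are hypothetical toy values, not model outputs from the paper, and the pairwise loop is chosen for clarity, not speed.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (leaver, stayer) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical attrition scores: 1 = left, 0 = stayed.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
print(round(auc, 3))  # 11 of 12 pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the 0.79 to 1.00 range reported for the different models.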

C. Decision Tree Classifier

Fig 4: ROC Curve of Decision Tree

Table 3: Classification Report of Decision Tree

In this setup, the Decision Tree performs well: an AUC of 0.99 indicates near-perfect separation of employees who are likely to stay from those who are likely to leave. The classification report gives an overall accuracy of 95%. Precision differs slightly between the classes (0.93 for staying employees and 0.97 for leaving employees), reflecting the model's capacity to distinguish the two. Recall is also strong, at 0.97 for staying employees and 0.93 for leaving employees, so the model captures the majority of true positives in each class. The F1-score for both classes is high at 0.95, representing a good balance between precision and recall and making this model robust in predicting employee attrition.

D. Random Forest Classifier

Fig 5: ROC Curve of Random Forest

Table 4: Classification Report of Random Forest

The model achieves an AUC of 1.00 on the ROC curve, meaning it discriminates excellently between employees in Class 1 (leaving) and those in Class 0 (staying). According to the classification report, the model's accuracy is 98%. Precision and recall for Class 0 are both 0.99, while precision and recall for Class 1 are 0.98 and 0.99, respectively. The F1-scores for both classes are very close to 0.99, showing an excellent balance between recall and precision. High scores across all metrics highlight the strength of the model in predicting employee attrition, making it a good tool for identifying at-risk employees and reducing turnover.

E. Logistic Regression

Fig 6: ROC Curve of Logistic Regression

Table 5: Classification Report of Logistic Regression

The Logistic Regression model, with an AUC of 0.83 on the ROC curve, performs moderately well, showing some effectiveness in separating employees who eventually leave (class 1) from those who stay (class 0). However, it does not match the performance of models with higher AUC values, such as Random Forest or Decision Tree. The classification report shows an accuracy of 77%. For class 1 (attrition), precision is 0.76 and recall is 0.79, meaning the model identifies employees prone to quitting fairly well but has room for improvement regarding false positives. For class 0 (no attrition), precision is 0.77 and recall is 0.74, indicating comparable performance, with slightly higher precision, for employees likely to remain. The resulting F1 of 0.77 means that the model does reasonably well on both classes, so it can be regarded as reasonably good at predicting attrition, though not in the same category as highly reliable models like Random Forest or Decision Tree.
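Because Logistic Regression outputs a probability, the balance between false positives and missed leavers noted above depends on the decision threshold applied to that probability. The sketch below illustrates the trade-off on hypothetical predicted probabilities; the values are invented for illustration, and the paper does not state which threshold was used or whether it was tuned.

```python
def precision_recall_at(threshold, labels, probabilities):
    """Precision and recall for class 1 when predicting 1 if probability >= threshold."""
    predicted = [1 if p >= threshold else 0 for p in probabilities]
    tp = sum(1 for y, yhat in zip(labels, predicted) if y == 1 and yhat == 1)
    fp = sum(1 for y, yhat in zip(labels, predicted) if y == 0 and yhat == 1)
    fn = sum(1 for y, yhat in zip(labels, predicted) if y == 1 and yhat == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical attrition probabilities for eight employees (1 = left, 0 = stayed).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
probabilities = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.1]

for threshold in (0.35, 0.5, 0.75):
    precision, recall = precision_recall_at(threshold, labels, probabilities)
    print(f"threshold={threshold:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Raising the threshold removes false positives at the cost of missing more leavers, so the appropriate setting depends on whether HR would rather over-flag employees or risk overlooking some.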

F. Naïve Bayes

Fig 7: ROC Curve of Naïve Bayes

Table 6: Classification Report of Naïve Bayes

The Naive Bayes model shows moderate performance, with an AUC of about 0.80 on the ROC curve. This indicates that the model can discriminate fairly well between employees who will leave and those who will stay, although this ability is weaker than that of the Random Forest or Decision Tree models. The classification report gives an accuracy of around 70 percent; for class 1 (attrition), precision is 0.67 and recall is 0.81. The model is therefore better at detecting employees likely to leave than at keeping false positives low. For class 0 (no attrition), precision is 0.74 while recall is only 0.58, which shows that the model has difficulty identifying employees likely to stay in the organization. An F1 score of 0.69 is reported for both classes, indicating a moderate balance between precision and recall and room to improve the identification of both employees at risk of leaving and those likely to stay.
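For reference, the independence assumption behind these Naive Bayes scores can be sketched in a few lines. The example below is a minimal categorical Naive Bayes with add-one (Laplace) smoothing, written with the standard library only. The six toy records and the two features are hypothetical and are not taken from the IBM dataset; the function names are illustrative, and the variant the authors actually used is not stated.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts

def predict_proba(row, class_counts, value_counts, n_values=2):
    """Posterior P(class | row) with add-one smoothing; n_values = levels per feature."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # P(value | class), treated as independent of the other features
            score *= (value_counts[(i, label)][value] + 1) / (count + n_values)
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

# Hypothetical toy records: (job satisfaction, overtime) -> 1 = left, 0 = stayed.
rows = [("low", "yes"), ("low", "yes"), ("low", "no"),
        ("high", "no"), ("high", "no"), ("high", "yes")]
labels = [1, 1, 1, 0, 0, 0]

model = train_naive_bayes(rows, labels)
posterior = predict_proba(("low", "yes"), *model)
print(round(posterior[1], 3))  # estimated probability of leaving
```

Multiplying per-feature likelihoods is what makes the method fast, and it is also why correlated features, such as workload and overtime, can distort the resulting probabilities.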

G. XGBoost

Fig 8: ROC Curve of XGBoost

Table 7: Classification Report of XGBoost

XGBoost performs very strongly, with an AUC of 0.94 on the ROC curve, demonstrating an excellent ability to differentiate between leaving employees (class 1) and staying employees (class 0). In the classification report, the model reaches an accuracy of 86%. For class 0 (no attrition), precision is 0.92 and recall is 0.83, which means the model is good at predicting employees who will stay, although there is room to identify more of them. For class 1 (attrition), precision is 0.81 and recall is 0.91, suggesting the model does very well at predicting employees at risk of leaving while maintaining reasonable precision. Both classes have a high F1 score, so performance is balanced across classes, and the model stands as a dependable predictive tool for employee attrition.

VI. CONCLUSION

The approaches demonstrated here explore the merit of machine learning for predicting employee attrition using algorithms such as Random Forest, XGBoost, Decision Tree, Support Vector Classifier (SVC), Logistic Regression, K-Nearest Neighbors (KNN), and Naive Bayes. The system considers key parameters such as job satisfaction, performance, tenure, and demographic details to give robust predictions of employee turnover. It uses Flask to provide a web-based, interactive interface through which HR professionals upload data, monitor model performance, and derive insights. Well-informed and timely, these insights give the HR team a forward-thinking approach to issues relating to workloads, job satisfaction, and career growth, which should benefit retention strategy and hence workforce stability. In this way, organizations empower themselves to make informed, data-driven decisions for more effective human resource management.
