
01.Promotion_Classification_Using_DecisionTree_and_Principal_Component_Analysis

This paper presents a machine learning approach for promotion classification in human resource management using Decision Tree and Principal Component Analysis (PCA). The study utilizes a dataset from Kaggle, demonstrating that PCA enhances classification accuracy, achieving a maximum of 91.25% accuracy for performance features. The findings suggest that performance features are more significant for promotion status than personal features, with future work aimed at testing with additional datasets to validate results.

The 7th International Conference on Digital Arts, Media and Technology (DAMT)

and 5th ECTI Northern Section Conference on Electrical, Electronics, Computer and Telecommunications Engineering (NCON)

Promotion Classification Using DecisionTree and Principal Component Analysis
2022 Joint International Conference on Digital Arts, Media and Technology with ECTI Northern Section Conference on Electrical, Electronics, Computer and Telecommunications Engineering (ECTI DAMT & NCON)

Theeramet Kaewwiset
Computer and Communication Engineering for Capacity Building Research Center,
School of Information Technology
Mae Fah Luang University
Chiang Rai, Thailand
[email protected]

Punnarumol Temdee
Computer and Communication Engineering for Capacity Building Research Center,
School of Information Technology
Mae Fah Luang University
Chiang Rai, Thailand
[email protected]

978-1-6654-9510-3/22/$31.00 ©2022 IEEE | DOI: 10.1109/ECTIDAMTNCON53731.2022.9720415

Abstract - The goal of human resource management is to support employees in being promoted to higher positions and to improve their work performance. A key process in performance evaluation is promotion classification. This paper proposes machine learning based promotion classification using Decision Tree and Principal Component Analysis (PCA). It uses the Human Resource Analysis Case Study dataset from Kaggle to compare promotion classification. Classification performance is compared for all features, personal features, and performance features. The results show that classification with PCA provides the highest accuracy, at 91.25%.

Keywords - Professional Development, Decision Tree, Human Resource, PCA

I. INTRODUCTION
One part of professional development in human resource management (HRM) is the Training and Development (T&D) process. The main objective of T&D is to develop employees so that they satisfy job criteria or can be promoted to the next position. Nowadays, companies are concerned with recruiting staff who satisfy the job criteria of each position, as well as with training and developing existing employees to meet their objectives or the criteria of their positions. Problems in training and development have many causes, which can arise from the staff or from the organization. On the organization side, they involve resource allocation, such as the high cost of training and development, sending inappropriate employees for training, insufficient work time, etc. Because of these problems, an organization must find employees with good productivity and performance to put forward for training and development. Therefore, the skills, work performance, work experience, or background knowledge of employees are evaluated against the job criteria. Generally, each company has different techniques for managing human resources and developing its staff. Each company uses different human resource software, depending on its business, to report employee performance results. Decisions are then made from the report by analyzing weights or scores of overall attributes such as performance, KPI, length of service, merit and ability, education, technical skills, potential, training, etc.

This research aims to find the features that are highly correlated with promotion status. Promotion selection is one indicator for choosing employees who have good performance and more experience. For these reasons, the features of the dataset are organized into three groups: all features, personal features, and performance features. These groups are combined with the PCA feature extraction process to find the features with high variance and high correlation with promotion status. Decision Tree is used to classify promotion with the features of the dataset and with the PCA features. By comparing the accuracy of the model using the dataset features and the PCA features, we want to find the group of features that affects promotion status in the human resource dataset.

II. LITERATURE REVIEW
A. Human Resource Research with IT Aspect
This research focuses on human resource literature that uses machine learning to classify HR datasets. The reviewed works have different objectives, such as classifying professional bloggers, finding experts in organizations, classifying employee talent, modeling employee engagement, evaluating employee performance, classifying employee promotion, and predicting employee turnover [1-11].

B. Promotion Evaluation with IT Aspect
Promotion in human resources can be related to performance evaluation. Many human resource studies in information technology fields point out the necessity of performance evaluation in the human resource management process. [12] The importance of human resource management lies in managing the human resources of an organization into high-quality manpower, and the quality and performance of this practice can determine a company's fate. This research presents the goal of all human resource development theories as “selecting the right people for the right positions”. [13] The key to success in an enterprise comes from successfully managing employees' capabilities. Matching the right jobs and discovering excellent employees are difficult processes and remain a problem for managers.

C. Principal Component Analysis (PCA)
PCA is an unsupervised feature extraction algorithm used to reduce the dimensionality of the data. Linear algebra and statistics are used in the PCA calculation to find features with high variance and high correlation with the outputs, and to rearrange the features through a linear transformation that creates new variables in a simpler matrix [14]. The first PCA feature captures the highest variance and holds the most information of any projection of the dataset. The second feature is the next most informative and has a larger variance than the third, and so on. The steps of PCA are shown below.
• Normalize the features by standardization.
• Calculate the covariance matrix.
• Find the eigenvalues and eigenvectors of the covariance matrix.
• Plot the vectors on the scaled data.
After the PCA process, the PCA features are used in the Decision Tree for promotion classification.
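The PCA steps listed above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function name `pca_steps`, the synthetic data, and the reading of the 95% level as a cumulative explained-variance target are all assumptions, and the final step projects the scaled data onto the retained eigenvectors instead of plotting them.

```python
import numpy as np

def pca_steps(X, variance_to_keep=0.95):
    """Apply the listed PCA steps to a 2-D array X (rows are samples)."""
    # Step 1: standardize each feature to zero mean and unit variance.
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    Z = (X - X.mean(axis=0)) / std

    # Step 2: covariance matrix of the standardized features.
    cov = np.cov(Z, rowvar=False)

    # Step 3: eigenvalues and eigenvectors (eigh suits symmetric matrices).
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Assumption: keep the fewest components that reach the variance target.
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(ratio, variance_to_keep)) + 1, len(eigvals))

    # Step 4: project the scaled data onto the retained eigenvectors.
    return Z @ eigvecs[:, :k], eigvals[:k]

# Hypothetical data: five features, the last one nearly a copy of the first.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 4] = 2.0 * X[:, 0] + 0.01 * rng.normal(size=200)

scores, variances = pca_steps(X)
print(scores.shape, variances.round(3))
```

Because one column is almost redundant, fewer components than original features are enough to reach the target, which is the dimensionality reduction the section describes.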


III. DATASET
This research uses the Human Resource Analysis Case Study dataset from Kaggle. The dataset contains 54,808 people. It has 13 features and 1 output, which is the promotion status.

A. Features of Human Resource Dataset
This research separates the features into two categories: personal data and performance data. The details of each type are shown below:
• Personal features: employee id, age, gender, education, region, department, recruitment channel.
• Performance features: number of training, length of service, previous year rating, KPI, awards, average training score.

IV. METHODOLOGY
This paper compares classification performance with and without PCA for three groups: all features, personal features, and performance features.

A. Promotion Classification of All Features with and without PCA.
The human resource dataset has 13 features: employee id, age, gender, education, region, department, recruitment channel, number of training, length of service, previous year rating, KPI, awards, and average training score. These features go through data pre-processing, such as checking and filling null values, dropping irrelevant features, and converting data with the dummy process. After data pre-processing, the all-features group has 58 features. Then, PCA is used to extract features, and 48 PCA features are chosen at 95% correlation with the outputs.
Figure 1 shows the variance of the PCA features from all features, which number 58. In this experiment, 48 PCA features that have a strong correlation with promotion status at 95% are chosen.
Table I shows the accuracy of promotion classification using Decision Tree with all features and with the PCA features of all features.

Figure 1. PCA Features of All Features Variance.

TABLE I. COMPARISON ACCURACY BETWEEN ALL FEATURES WITH AND WITHOUT PCA.

Algorithm       Round   All features   All features + PCA
Decision Tree   1       90.21%         87.33%
Decision Tree   2       90.29%         87.12%
Decision Tree   3       90.19%         87.16%
Decision Tree   Avg.    90.23%         87.2%

B. Promotion Classification of Personal Features with and without PCA.
The personal group has 7 features: employee id, age, gender, education, region, department, and recruitment channel. After data pre-processing, the personal group has 52 features. Then, PCA is used to extract features, and 44 PCA features are chosen at 95% correlation with the outputs.
Figure 2 shows the variance of the PCA features from the personal features, which number 52. In this experiment, 44 PCA features that have a strong correlation with promotion status at 95% are chosen.
Table II shows the accuracy of promotion classification using Decision Tree with personal features and with the PCA features of personal features.

Figure 2. PCA Features of Personal Features Variance.

TABLE II. COMPARISON ACCURACY BETWEEN PERSONAL FEATURES WITH AND WITHOUT PCA.

Algorithm       Round   Personal features   Personal features + PCA
Decision Tree   1       89.67%              89.76%
Decision Tree   2       89.71%              89.73%
Decision Tree   3       89.72%              89.74%
Decision Tree   Avg.    89.7%               89.74%
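The pre-processing, PCA, and Decision Tree comparison described in this section can be sketched with pandas and scikit-learn. This is a minimal sketch under stated assumptions, not the authors' code: the data are synthetic, the column names are hypothetical stand-ins for the Kaggle fields, the 70/30 stratified split is an assumption (the paper does not state how each round was split), and the 95% level is interpreted here as cumulative explained variance via `PCA(n_components=0.95)`.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the HR data; column names and values are hypothetical.
rng = np.random.default_rng(42)
n = 2000
df = pd.DataFrame({
    "employee_id": np.arange(n),
    "department": rng.choice(["Sales", "HR", "Technology"], size=n),
    "education": rng.choice(["Bachelor", "Master", None], size=n),
    "age": rng.integers(20, 60, size=n),
    "previous_year_rating": rng.choice([1.0, 2.0, 3.0, 4.0, 5.0, np.nan], size=n),
    "avg_training_score": rng.integers(40, 100, size=n),
})
# Hypothetical promotion rule, used only to give the sketch a learnable target.
df["is_promoted"] = ((df["avg_training_score"] > 85)
                     & (df["previous_year_rating"] >= 4)).astype(int)

def preprocess(frame):
    frame = frame.drop(columns=["employee_id"])        # drop an irrelevant identifier
    frame["education"] = frame["education"].fillna("Unknown")
    rating = frame["previous_year_rating"]
    frame["previous_year_rating"] = rating.fillna(rating.median())
    return pd.get_dummies(frame, dtype=float)          # dummy (one-hot) encoding

X = preprocess(df.drop(columns=["is_promoted"]))
y = df["is_promoted"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Fit scaling and PCA on the training part only, then reuse them on the test part.
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=0.95, svd_solver="full").fit(scaler.transform(X_train))

def tree_accuracy(train, test):
    model = DecisionTreeClassifier(random_state=0).fit(train, y_train)
    return accuracy_score(y_test, model.predict(test))

acc_raw = tree_accuracy(X_train, X_test)
acc_pca = tree_accuracy(pca.transform(scaler.transform(X_train)),
                        pca.transform(scaler.transform(X_test)))
print(f"columns after dummies: {X.shape[1]}, PCA components kept: {pca.n_components_}")
print(f"accuracy without PCA: {acc_raw:.4f}, with PCA: {acc_pca:.4f}")
```

Running the same comparison on a subset of columns (personal only, or performance only) would mirror the three feature groups compared in the paper. The numbers printed for this synthetic data say nothing about the accuracies reported in the tables.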


C. Promotion Classification of Performance Features with and without PCA.
The performance group has 6 features: number of training, length of service, previous year rating, KPI, awards, and average training score. After data pre-processing, the performance group still has 6 features. Then, PCA is used to extract features, and 6 PCA features are chosen at 95% correlation with the outputs.
Figure 3 shows the variance of the PCA features from the performance features, which number 6. In this experiment, 6 PCA features that have a strong correlation with promotion status at 95% are chosen.
Table III shows the accuracy of promotion classification using Decision Tree with performance features and with the PCA features of performance features.

Figure 3. PCA Features of Performance Features Variance.

TABLE III. COMPARISON ACCURACY BETWEEN PERFORMANCE FEATURES WITH AND WITHOUT PCA.

Algorithm       Round   Performance features   Performance features + PCA
Decision Tree   1       91.22%                 91.23%
Decision Tree   2       91.24%                 91.25%
Decision Tree   3       91.21%                 91.28%
Decision Tree   Avg.    91.22%                 91.25%

Figure 4. Compare Accuracy of Promotion Classification

V. DISCUSSION
From Table III, classification using Decision Tree with performance features without PCA provides an average accuracy of 91.22%. This means that performance features are more significant for promotion status than personal features. Performance features also have a stronger correlation with the outputs than all features. At the same time, classification of performance features with PCA provides an accuracy of 91.25%, which is higher than without PCA. Therefore, PCA can improve the accuracy of Decision Tree for the classification of performance features. In contrast, PCA decreases the classification performance of Decision Tree for all features. However, it is not clear whether PCA can increase the classification performance of Decision Tree for personal data, which is worth further study. Future work is to increase classification performance by using different classifiers and to validate it by testing with other datasets.

VI. CONCLUSION
This paper proposes promotion classification using Decision Tree and PCA. This research compares all features, personal features, and performance features with and without PCA. The results show that Decision Tree provides the highest average accuracy, 91.25%, for performance features with PCA. Among the settings without PCA, Decision Tree also provides its highest average accuracy, 91.22%, for performance features. For future work, implementation with other human resource datasets is suggested to validate the performance of the proposed methods and increase classification performance.
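The averages quoted from Tables I to III can be re-derived from the per-round values with a short calculation. The sketch below only restates the reported numbers; the dictionary layout and the variable names are illustrative choices, not part of the paper.

```python
from statistics import mean

# Per-round accuracies (%) from Tables I-III: (without PCA, with PCA).
rounds = {
    "all": ([90.21, 90.29, 90.19], [87.33, 87.12, 87.16]),
    "personal": ([89.67, 89.71, 89.72], [89.76, 89.73, 89.74]),
    "performance": ([91.22, 91.24, 91.21], [91.23, 91.25, 91.28]),
}

summary = {}
for group, (plain, with_pca) in rounds.items():
    avg_plain, avg_pca = mean(plain), mean(with_pca)
    # Store rounded averages and the change introduced by PCA.
    summary[group] = (round(avg_plain, 2), round(avg_pca, 2),
                      round(avg_pca - avg_plain, 2))
    print(f"{group:12s} without PCA {avg_plain:.2f}%  "
          f"with PCA {avg_pca:.2f}%  change {avg_pca - avg_plain:+.2f}")

# Group with the highest average accuracy when PCA is applied.
best_group = max(summary, key=lambda g: summary[g][1])
print("highest average with PCA:", best_group)
```

The change column makes the pattern in the discussion explicit: PCA lowers the average for all features by about three percentage points, while the differences for personal and performance features are only a few hundredths of a point.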
ACKNOWLEDGMENT
This work is supported by the Capacity building and
ExchaNge towards attaining Technological Research and
modernizing Academic Learning-CENTRAL, which is
funded by the Erasmus+ program (Project ID: 598914,
Reference: 598914-EPP-1-2018-1-DK-EPPKA2-CBHE-JP).
REFERENCES

[1] Y. Asim, B. Raza, A. K. Malik, S. Rathore and A. Bilal, "Improving the performance of professional blogger's classification," 2018 International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), Sukkur, 2018, pp. 1-6, doi: 10.1109/ICOMET.2018.8346342.


[2] Z. Huang and D. Jiang, "Research and Implementation of Fuzzy ISODATA Clustering Algorithm Based on Gene Expression Programming in Human Resource Management," 2011 International Conference of Information Technology, Computer Engineering and Management Sciences, Nanjing, Jiangsu, 2011, pp. 178-180, doi: 10.1109/ICM.2011.361.
[3] G. K. Hoon, G. K. Min, O. Wong, O. B. Pin and C. Y. Sheng, "Classifly: Classification of Experts by Their Expertise on the Fly," 2015 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), Singapore, 2015, pp. 245-246, doi: 10.1109/WI-IAT.2015.63.
[4] C. Stephanie and R. Sarno, "Classification Talent of Employee Using C4.5, KNN, SVM," 2019 International Conference on Information and Communications Technology (ICOIACT), 2019, pp. 388-393, doi: 10.1109/ICOIACT46704.2019.8938508.
[5] Q. Chen and Z. Gong, "Data mining modeling of employee engagement for IT enterprises based on decision tree algorithm," 2013 6th International Conference on Information Management, Innovation Management and Industrial Engineering, 2013, pp. 305-308, doi: 10.1109/ICIII.2013.6703145.
[6] A. Nedelcu, B. Nedelcu, A. I. Sgarciu and V. Sgarciu, "Data Mining Techniques for Employee Evaluation," 2020 12th International Conference on Electronics, Computers and Artificial Intelligence (ECAI), 2020, pp. 1-6, doi: 10.1109/ECAI50035.2020.9223165.
[7] T. Kaewwiset, P. Temdee and T. Yooyativong, "Employee Classification for Personalized Professional Training Using Machine Learning Techniques and SMOTE," 2021 Joint International Conference on Digital Arts, Media and Technology with ECTI Northern Section Conference on Electrical, Electronics, Computer and Telecommunication Engineering, 2021, pp. 376-379, doi: 10.1109/ECTIDAMTNCON51128.2021.9425754.
[8] T. Juvitayapun, "Employee Turnover Prediction: The impact of employee event features on interpretable machine learning methods," 2021 13th International Conference on Knowledge and Smart Technology (KST), 2021, pp. 181-185, doi: 10.1109/KST51265.2021.9415794.
[9] A. Assiri, J. Berri and A. Chikh, "Classification and tendencies of evaluations in e-learning," International Conference on Education and e-Learning Innovations, Sousse, 2012, pp. 1-6, doi: 10.1109/ICEELI.2012.6360570.
[10] M. Eminagaoglu and S. Eren, "Implementation and comparison of machine learning classifiers for information security risk analysis of a human resources department," 2010 International Conference on Computer Information Systems and Industrial Management Applications (CISIM), Krackow, 2010, pp. 187-192, doi: 10.1109/CISIM.2010.5643665.
[11] Q. Guohao, W. Bin, W. Bai and Z. Baoli, "Competency Analysis in Human Resources Using Text Classification Based on Deep Neural Network," 2019 IEEE Fourth International Conference on Data Science in Cyberspace (DSC), Hangzhou, China, 2019, pp. 322-329, doi: 10.1109/DSC.2019.00056.
[12] H. Jing, "Application of Fuzzy Data Mining Algorithm in Performance Evaluation of Human Resource," 2009 International Forum on Computer Science-Technology and Applications, 2009, pp. 343-346, doi: 10.1109/IFCSTA.2009.90.
[13] Z. Huang and D. Jiang, "Research and Implementation of Fuzzy ISODATA Clustering Algorithm Based on Gene Expression Programming in Human Resource Management," 2011 International Conference of Information Technology, Computer Engineering and Management Sciences, 2011, pp. 178-180, doi: 10.1109/ICM.2011.361.
[14] M. Syafrudin, G. Alfian, N. L. Fitriyani, A. H. Sidiq, T. Tjahjanto and J. Rhee, "Improving Efficiency of Self-care Classification Using PCA and Decision Tree Algorithm," 2020 International Conference on Decision Aid Sciences and Application (DASA), 2020, pp. 224-227, doi: 10.1109/DASA51403.2020.9317243.
