Paper 43-Employability Prediction of Information Technology Graduates
Article in International Journal of Advanced Computer Science and Applications · November 2022
DOI: 10.14569/IJACSA.2022.0131043
3 authors:
Engy Yehia
Helwan University
All content following this page was uploaded by Gehad Elsharkawy on 26 February 2023.
Abstract—The ability to predict graduates' employability to match labor market demands is crucial for any educational institution aiming to enhance students' performance and learning process, as graduates' employability is the metric of success for any higher education institution (HEI). This is especially true for information technology (IT) graduates, given the growing demand for IT professionals in the current era. Job mismatch and unemployment remain major challenges for educational institutions because of the various factors that influence graduates' employability to match labor market needs. Therefore, this paper introduces a predictive model that uses machine learning (ML) algorithms to predict IT graduates' employability to match labor market demands. Five machine learning classification algorithms were applied: Decision Tree (DT), Gaussian Naïve Bayes (Gaussian NB), Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM). The dataset used in this study was collected through a survey given to IT graduates and employers. Performance is evaluated in terms of accuracy, precision, recall, and F1 score. The results showed that DT achieved the highest accuracy, and the second highest accuracy was achieved by LR and SVM.

Keywords—Machine learning; IT graduates; higher education; employability; labor market

Machine learning (ML) techniques can be used to predict the employability signals of IT graduates and identify the most significant factors affecting their employability as early as possible, so that appropriate actions can be taken to equip them with the appropriate knowledge and skills before they enter the dynamic job market.

According to prior studies, there is increasing interest in applying machine learning in higher education to predict graduates' employability, but the use of automated machine learning to predict students' employability is still in its initial stage. ML is a subset of artificial intelligence (AI) in which computers analyze large datasets to learn patterns that are then used to make predictions for new data. By contrast, in traditional computing, algorithms are sets of explicitly defined instructions that computers use to describe or solve problems [5], [6]. In the hiring process, graduates with experience are in high demand because of their higher productivity and lower training cost compared with those without any experience. HEIs must undergo frequent evaluations to provide future IT graduates with the demanded skills, as HEIs are considered the main factor in producing this workforce [7].
359 | P a g e
www.ijacsa.thesai.org
(IJACSA) International Journal of Advanced Computer Science and Applications,
Vol. 13, No. 10, 2022
• Improve the IT graduates' qualifications to match labor market demands.
• Provide valuable insights for guiding HEIs to make better long-term plans for producing graduates who are knowledgeable and skilled through prediction of their employability status.
• Contribute significantly to the placement process for employers.
• Decrease the high unemployment rate of IT graduates.

The rest of this paper is organized as follows: Section II presents relevant works in the field of employability prediction. Section III describes the proposed methodology in detail. Section IV shows the results of the used algorithms and discusses the analysis of the used features. Section V presents the conclusions of this study with some limitations and improvements.

II. RELATED WORK

In recent years, many researchers have attempted to use machine learning in higher education to enhance graduates' features and curricula to support employability [8]. To discuss the contribution of ML in continuous quality improvement, we focused on some of the previous works that used different machine learning techniques such as Artificial Neural Network (ANN), Decision Tree (DT), K-Nearest Neighbor (KNN), Gaussian Naïve Bayes (Gaussian NB), Logistic Regression (LR), Neural Network (NN), Random Forest (RF), Naïve Bayes (NB), and Support Vector Machine (SVM).

In [9], the author predicted which students are most likely to get work after graduation by using data analytics and machine learning techniques such as SVM, LR, ANN, DT, and discriminant analysis. The features used were hard skills, demographic features, extra/co-curricular activities, and internships; the data were obtained from student surveys and institutional databases. The SVM classification algorithm achieved an accuracy of 87.26%.

The authors in [10] aimed to identify the most significant factors affecting graduate employability by using three classification algorithms: DT, ANN, and SVM. The features used in this research were hard skills, soft skills, demographic features, extra/co-curricular activities, university features, and internships; the research data were collected from institutional databases. The SVM algorithm showed 66.096% accuracy.

A web-based application was developed in [11] by applying the machine learning algorithms DT, NB, and NN to predict the suitability of IT students' skills for recruitment, mainly hard skills and soft skills. The data were collected from student and recruiter surveys, and NB achieved the highest accuracy of 69%. In another study [12], supervised machine learning techniques such as LR, DT, RF, KNN, and SVM were used to predict high school students' employability for part-time jobs with local businesses. Hard skills, demographic features, and extra/co-curricular activities were used as features and collected from student surveys. The LR algorithm achieved an accuracy of 93%.

The authors in [13] analyzed data from education institutions to predict students' employability and determine the factors affecting it, using hard skills, soft skills, demographic features, extra/co-curricular activities, and university features, and then applied four ML algorithms: DT, Gaussian NB, SVM, and KNN. DT and SVM achieved an accuracy of 98%.

Furthermore, a student employability prediction system was developed in [14] using the SVM, DT, RF, KNN, and LR algorithms. Institutional databases were obtained, and hard skills, soft skills, and demographic features were used. The SVM algorithm achieved an accuracy of 91%. In [14], the authors identified the most predictive attributes among hard skills, soft skills, and demographic features to determine why students are most likely to get employed, using graduate surveys and institutional databases. The three methods applied and compared were SVM, RF, and DT; SVM achieved the highest accuracy of 91.22%.

The authors in [15] investigated the impact of various institution features on graduate employability using a hyperbox-based machine learning model, which achieved 78% accuracy. A hybrid model combining a deep belief network and Softmax regression (DBN-SR) was proposed in [16] for student employability prediction. The dataset was obtained from student surveys, and hard skills, soft skills, demographic features, and university features were adopted as features. The results reached a high accuracy of 98%.

The authors in [17] predicted students' employability based on technical skills. Institutional databases were collected, and the following algorithms were applied: SVM, LR, DT, RF, AdaBoost, and NB. The highest accuracy, 70%, was achieved by the RF algorithm. Finally, the authors in [18] developed a model using various machine learning methods (DT, RF, NN, and Gaussian NB) to forecast candidate hiring, employing different statistical measures for feature selection over features such as hard skills, demographic features, and professional experience. The highest accuracy, 99%, was achieved by Gaussian NB. Table I summarizes the relevant studies according to their adopted features, dataset sources, ML models, output features, and accuracy of the best-performing model to answer RQ1.
| Reference | Year | Adopted features categories | Dataset sources | ML model | Output features | Accuracy |
|---|---|---|---|---|---|---|
| Hugo [9] | 2018 | Hard skills; Demographic features; Extra/co-curricular activities; Internship | Student surveys and institution databases | SVM; ANN; LR; Discriminant analysis; DT | Employability: {Employed, Not Employed} | SVM 87.26% |
| Othman et al. [10] | 2018 | Hard skills; Soft skills; Demographic features; Extra/co-curricular activities; University features; Internship | Institutional databases | DT; ANN; SVM | Employability: {Employed, Not employed} | SVM 66.0967% |
| Bai and Hira [16] | 2021 | Hard skills; Soft skills; Demographic features; University features | Student surveys | Softmax regression | Employability: {Employed, Unemployed} | 98% |
| Laddha et al. [17] | 2021 | Hard skills | Institution databases | SVM; LR; DT; RF; AdaBoost; NB | Placement: {Placed, Not placed} | RF 70% |
| Reddy et al. [18] | 2021 | Hard skills; Demographic features; Professional experience | Employee surveys | DT; RF; NN; Gaussian NB | Recruitment: {Join, Not join} | Gaussian NB 99% |
III. METHODOLOGY

This section discusses the methodology of our study, the machine learning algorithms applied, and the evaluation metrics used. Fig. 1 highlights the research methodology: i) collecting the data; ii) applying data preprocessing; iii) splitting the dataset into two sets, a train set to train the model and a test set to evaluate the model; iv) building our model by applying five ML classification algorithms; v) evaluating the model; vi) using the proposed model to predict the qualified IT graduates who meet labor market demands. To answer RQ1 (What are the most significant features that affect graduates' competitive advantage to match labor market demands?), we followed the methodology steps described below.

A. Data Source

The dataset used in this research was obtained from a survey given to IT graduates and employers in Egypt. We created an online survey with pertinent questions and then distributed it to IT graduates (from Computers & Artificial Intelligence, Business Information Systems, Software Engineering, and Management Information Systems) and to several IT companies from different sectors. A brief description of each selected feature and its value is given in Table II. We classified the features into four categories (Trainings, Soft skills, Hard skills, and In-demand skills), each containing its most related features. For the first three categories the values are binary: "0" means the graduate was not trained or given a specific course in that skill during their college years, while "1" means the graduate was trained or given a specific course in that skill. In the fourth category, the value (0-7) indicates how many courses or trainings the graduate received in those fields to be qualified for industry requirements.

B. Data Preprocessing

Data preparation is a critical stage in creating a machine learning model, as it is difficult for a machine to read raw datasets and produce the expected results [19]. Data preprocessing therefore makes the data suitable for a machine learning model. First, we eliminate noise and missing values and make the data consistent. Then, we apply feature selection to identify the relevant features, those with the greatest impact on IT graduates' employability to match labor market demands, so that the classifiers can reach optimal performance. Finally, we split the dataset into two sets: 80% for training the model and 20% for testing, to measure the accuracy of the model and improve the performance of our machine learning models.

C. Prediction Models

Five binary classification algorithms are used to predict IT graduates' employability from the collected dataset; binary classification is appropriate because each new observation is assigned to one of two classes. The class in our dataset takes two values: (0) for a not qualified graduate who does not match labor market demands, and (1) for a qualified graduate. The number of records used in this study is 296. We used the Scikit-learn, Pandas, NumPy, Matplotlib, and Seaborn libraries of the Python programming language. The five classification algorithms are:

Decision Tree Algorithm: a supervised learning technique equivalent to a series of IF-THEN statements that builds a structure of branches and nodes based on the evidence obtained for each feature during the learning process [10]. The DT algorithm generates decision trees from training data to solve classification and regression problems. In our proposed model, the Gini method was used to create split points by finding a decision rule that produces the greatest decrease in impurity at a node.

$$G(t) = 1 - \sum_{i=1}^{c} p_i^{2} \tag{1}$$

where G(t) is the Gini impurity at node t, c is the number of classes, and p_i is the proportion of observations of class i at node t. This decision-making process is carried out recursively until all leaf nodes are pure or a certain cutoff is reached.
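To make Eq. (1) concrete, the following is a minimal sketch in plain Python of how the Gini impurity of a node and the impurity decrease of a candidate split could be computed. It is an illustration only, not the study's implementation: the function names `gini_impurity` and `impurity_decrease`, the toy labels, and the example split are all assumptions introduced here, and the weighting of child nodes by their share of observations is the standard convention rather than something stated in the text above.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity G(t) = 1 - sum_i p_i^2 for the class labels at one node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((k / n) ** 2 for k in counts.values())


def impurity_decrease(parent, left, right):
    """Drop in Gini impurity when `parent` is split into `left` and `right`.

    Each child is weighted by its share of the parent's observations.
    """
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Hypothetical node: 1 = qualified graduate, 0 = not qualified.
# These are toy labels, not records from the survey dataset.
parent = [1, 1, 1, 0, 0, 0, 0, 1]

# Hypothetical split on a binary feature (for example, a 0/1 training flag).
left = [0, 0, 0, 0]
right = [1, 1, 1, 1]

print(gini_impurity(parent))                   # balanced two-class node
print(impurity_decrease(parent, left, right))  # a perfect split removes all impurity
```

A tree learner would evaluate `impurity_decrease` for every candidate rule at a node and keep the rule with the largest value, which is the "greatest decrease in impurity" criterion described above.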
For further research, the size of the used dataset can be expanded, and various ML algorithms can be used to get better performance.

REFERENCES

[1] H. B. Kenayathulla, N. A. Ahmad, and A. R. Idris, “Gaps between competence and importance of employability skills: evidence from Malaysia,” Higher Education Evaluation and Development, vol. 13, no. 2, pp. 97–112, 2019, doi: 10.1108/heed-08-2019-0039.
[2] F. Biagi, J. Castaño Muñoz, and G. Di Pietro, “Mismatch Between Demand and Supply Among Higher Education Graduates in the EU,” JRC Tech. Rep., pp. 1–21, 2020, doi: 10.2760/003134.
[3] R. Assaad, C. Krafft, and D. Salehi-Isfahani, “Does the type of higher education affect labor market outcomes? Evidence from Egypt and Jordan,” High. Educ., vol. 75, no. 6, pp. 945–995, Jun. 2018, doi: 10.1007/s10734-017-0179-0.
[4] M. I. Hossain, K. S. A. Yagamaran, T. Afrin, N. Limon, M. Nasiruzzaman, and A. M. Karim, “Factors Influencing Unemployment among Fresh Graduates: A Case Study in Klang Valley, Malaysia,” Int. J. Acad. Res. Bus. Soc. Sci., vol. 8, no. 9, Oct. 2018, doi: 10.6007/IJARBSS/v8-i9/4859.
[5] H. Pallathadka et al., “Investigating the impact of artificial intelligence in education sector by predicting student performance,” Mater. Today Proc., vol. 51, pp. 2264–2267, 2022, doi: 10.1016/j.matpr.2021.11.395.
[6] H. Zeineddine, U. Braendle, and A. Farah, “Enhancing prediction of student success: Automated machine learning approach,” Comput. Electr. Eng., vol. 89, 2021, doi: 10.1016/j.compeleceng.2020.106903.
[7] M. E. Oswald-Egg and U. Renold, “No experience, no employment: The effect of vocational education and training work experience on labour market outcomes after higher education,” Econ. Educ. Rev., vol. 80, 2021, doi: 10.1016/j.econedurev.2020.102065.
[8] O. Saidani, L. J. Menzli, A. Ksibi, N. Alturki, and A. S. Alluhaidan, “Predicting Student Employability Through the Internship Context Using Gradient Boosting Models,” IEEE Access, vol. 10, pp. 46472–46489, 2022, doi: 10.1109/ACCESS.2022.3170421.
[9] L. S. Hugo, “Predicting Employment Through Machine Learning.” https://www.naceweb.org/career-development/trends-and-predictions/predicting-employment-through-machine-learning/ (accessed Oct. 18, 2022).
[10] Z. Othman, S. W. Shan, I. Yusoff, and C. P. Kee, “Classification Techniques for Predicting Graduate Employability,” Int. J. Adv. Sci. Eng. Inf. Technol., vol. 8, no. 4–2, p. 1712, Sep. 2018, doi: 10.18517/ijaseit.8.4-2.6832.
[11] M. Alghamlas and R. Alabduljabbar, “Predicting the Suitability of IT Students’ Skills for the Recruitment in Saudi Labor Market,” in 2019 2nd International Conference on Computer Applications & Information Security (ICCAIS), May 2019, pp. 1–5, doi: 10.1109/CAIS.2019.8769577.
[12] A. Dubey and M. Mani, “Using Machine Learning to Predict High School Student Employability – A Case Study,” in 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Oct. 2019, pp. 604–605, doi: 10.1109/DSAA.2019.00078.
[13] M. S. Kumar and G. P. Babu, “Comparative Study of Various Supervised Machine Learning Algorithms for an Early Effective Prediction of the Employability of Students,” J. Eng. Sci., vol. 10, no. 10, pp. 240–251, 2019.
[14] C. D. Casuat and E. D. Festijo, “Identifying the Most Predictive Attributes among Employability Signals of Undergraduate Students,” Proc. - 2020 16th IEEE Int. Colloq. Signal Process. its Appl. CSPA 2020, no. May, pp. 203–206, 2020, doi: 10.1109/CSPA48992.2020.9068681.
[15] K. B. Aviso, J. I. B. Janairo, R. I. G. Lucas, M. A. B. Promentilla, D. E. C. Yu, and R. R. Tan, “Predicting higher education outcomes with hyperbox machine learning: what factors influence graduate employability?,” Chem. Eng. Trans., vol. 81, no. 2019, pp. 679–684, 2020, doi: 10.3303/CET2081114.
[16] A. Bai and S. Hira, “An intelligent hybrid deep belief network model for predicting students employability,” Soft Comput., vol. 25, no. 14, pp. 9241–9254, Jul. 2021, doi: 10.1007/s00500-021-05850-x.
[17] M. D. Laddha, V. T. Lokare, A. W. Kiwelekar, and L. D. Netak, “Performance Analysis of the Impact of Technical Skills on Employability,” Int. J. Performability Eng., vol. 17, no. 4, p. 371, 2021, doi: 10.23940/ijpe.21.04.p5.371378.
[18] D. Jagan Mohan Reddy, S. Regella, and S. R. Seelam, “Recruitment Prediction using Machine Learning,” in 2020 5th International Conference on Computing, Communication and Security (ICCCS), Oct. 2020, pp. 1–4, doi: 10.1109/ICCCS49678.2020.9276955.
[19] S. R. Rahman, M. A. Islam, P. P. Akash, M. Parvin, N. N. Moon, and F. N. Nur, “Effects of co-curricular activities on student’s academic performance by machine learning,” Curr. Res. Behav. Sci., vol. 2, no. May, p. 100057, 2021, doi: 10.1016/j.crbeha.2021.100057.
[20] A. Alhassan, B. Zafar, and A. Mueen, “Predict Students’ Academic Performance based on their Assessment Grades and Online Activity Data,” Int. J. Adv. Comput. Sci. Appl., vol. 11, no. 4, 2020, doi: 10.14569/IJACSA.2020.0110425.
[21] C. D. Casuat, “Predicting Students’ Employability using Support Vector Machine: A SMOTE-Optimized Machine Learning System,” Int. J. Emerg. Trends Eng. Res., vol. 8, no. 5, pp. 2101–2106, May 2020, doi: 10.30534/ijeter/2020/102852020.
[22] P. Thakar, P. Dr., and D. Manisha, “Role of Secondary Attributes to Boost the Prediction Accuracy of Students’ Employability Via Data Mining,” Int. J. Adv. Comput. Sci. Appl., vol. 6, no. 11, 2015, doi: 10.14569/IJACSA.2015.061112.