

Employability Prediction of Information Technology Graduates using Machine Learning Algorithms

Article in International Journal of Advanced Computer Science and Applications · November 2022
DOI: 10.14569/IJACSA.2022.0131043


3 authors: Gehad Elsharkawy (Helwan University), Yehia K. Helmy (Helwan University), Engy Yehia (Helwan University)

All content following this page was uploaded by Gehad Elsharkawy on 26 February 2023.



(IJACSA) International Journal of Advanced Computer Science and Applications,
Vol. 13, No. 10, 2022

Employability Prediction of Information Technology Graduates using Machine Learning Algorithms
Gehad ElSharkawy, Yehia Helmy, Engy Yehia
Dept. Business Information Systems
Faculty of Commerce and Business Administration, Helwan University, Cairo, Egypt

Abstract—The ability to predict graduates’ employability to match labor market demands is crucial for any educational institution aiming to enhance students’ performance and learning process, as graduates’ employability is the metric of success for any higher education institution (HEI). This holds especially for information technology (IT) graduates, given the growing demand for IT professionals in the current era. Job mismatch and unemployment remain major challenges for educational institutions because of the various factors that influence graduates’ employability to match labor market needs. Therefore, this paper introduces a predictive model that uses machine learning (ML) algorithms to predict IT graduates’ employability to match labor market demands. Five machine learning classification algorithms were applied: Decision Tree (DT), Gaussian Naïve Bayes (Gaussian NB), Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM). The dataset used in this study was collected through a survey given to IT graduates and employers. Performance is evaluated in terms of accuracy, precision, recall, and F1 score. The results showed that DT achieved the highest accuracy, and the second highest accuracy was achieved by LR and SVM.

Keywords—Machine learning; IT graduates; higher education; employability; labor market

I. INTRODUCTION

Due to the dynamically changing job market and the rapid advancements in technology, the demand for Information Technology (IT) professionals is among the highest worldwide [1]. Human capital is one of the most important economic assets of production and is considered the main pillar for raising the standard of living and developing the human resources on which countries depend in strategic planning to achieve sustainable development, as human capital represents the workforce that engages in all service, production, and consumer activities in society. As a result, higher education institutions (HEIs) produce an increasing number of graduates each year. The mismatch between higher education outputs and labor market demands is considered one of the major threats to economic growth, and it causes a high unemployment rate and misplacement problems among higher education graduates in Egypt. The mismatch is due to poor collaboration between the labor market and HEIs. This lack of communication produces the wrong kind of workforce, and such errors are costly [2]–[4]. To avoid this mismatch, HEIs have to ensure their graduates’ employability.

Machine learning (ML) techniques can be used to predict the employability signals of IT graduates and to identify the most significant factors affecting their employability as early as possible, so that appropriate actions can be taken to equip them with the appropriate knowledge and skills before they enter the dynamic job market.

According to certain prior studies, there is increasing interest in applying machine learning in higher education to predict graduates’ employability, but the use of automated machine learning to predict students’ employability is still in its initial stage. ML is a subset of artificial intelligence (AI) in which computers analyze large datasets to learn patterns that are then used to make predictions for new data. In traditional computing, by contrast, algorithms are sets of explicitly defined instructions that computers use to describe or solve problems [5], [6]. In the hiring process, graduates with experience are in high demand because they offer higher productivity and lower training cost than those without any experience. HEIs must undergo frequent evaluations to provide future IT graduates with the demanded skills, as HEIs are considered the main factor in producing this workforce [7].

Earlier studies have shown great interest in examining the mismatch between HEI output and labor market demands by applying different ML algorithms. However, these studies focused on one or a few features only. As a result, the two main research questions of this study are:

RQ1) What are the most significant features that affect graduates’ competitive advantage to match labor market demands?
RQ2) What are the best machine learning algorithms for employability prediction of IT graduates?

The objective of this study is to develop an ML prediction model for graduates’ employability status (predicting whether an IT graduate is most likely to be qualified or not qualified to match labor market demands), and to make better use of the collected dataset, which can greatly help in understanding the extent to which IT graduates were prepared to enter highly technical IT careers.

The findings of this study will help:

• Reduce the gap between labor market demands and HEIs.


• Improve the IT graduates’ qualifications to match labor market demands.

• Provide valuable insights for guiding HEIs to make better long-term plans for producing graduates who are knowledgeable and skilled through prediction of their employability status.

• Contribute significantly to the placement process for employers.

• Decrease the high unemployment rate of IT graduates.

The rest of this paper is organized as follows: Section II presents relevant works in the field of employability prediction. Section III describes the proposed methodology in detail. Section IV shows the results of the algorithms used and discusses the analysis of the features used. Section V presents the conclusions of this study with some limitations and improvements.

II. RELATED WORK

In recent years, many researchers have attempted to use machine learning in higher education to enhance graduates’ features and curricula to support employability [8]. To discuss the contribution of ML to continuous quality improvement, we focused on previous works that used different machine learning techniques such as Artificial Neural Network (ANN), Decision Tree (DT), K-Nearest Neighbor (KNN), Gaussian Naïve Bayes (Gaussian NB), Logistic Regression (LR), Neural Network (NN), Random Forest (RF), Naïve Bayes (NB), and Support Vector Machine (SVM).

In [9], the author predicted which students are most likely to get work after graduation by using data analytics and machine learning techniques such as SVM, LR, ANN, DT, and discriminant analysis. The features used were hard skills, demographic features, extra/co-curricular activities, and internships, and the data were obtained from student surveys and institutional databases. The SVM classification algorithm achieved an accuracy of 87.26%.

The authors in [10] aimed to identify the most significant factors affecting graduate employability by using three classification algorithms: DT, ANN, and SVM. The features used in this research were hard skills, soft skills, demographic features, extra/co-curricular activities, university features, and internships, and the research data were collected from institutional databases. The SVM algorithm showed 66.096% accuracy.

A web-based application was developed in [11] by applying the machine learning algorithms DT, NB, and NN to predict the suitability of IT students’ skills for recruitment, mainly hard skills and soft skills. The data were collected from student and recruiter surveys, and NB achieved the highest accuracy of 69%. In another study [12], supervised machine learning techniques such as LR, DT, RF, KNN, and SVM were used to predict high school students’ employability for part-time jobs with local businesses. Hard skills, demographic features, and extra/co-curricular activities were used as features and collected from student surveys. The LR algorithm achieved an accuracy of 93%.

The authors in [13] analyzed data from education institutions to predict students’ employability and determine the factors affecting it, using hard skills, soft skills, demographic features, extra/co-curricular activities, and university features. They applied four ML algorithms: DT, Gaussian NB, SVM, and KNN. DT and SVM achieved an accuracy of 98%.

Furthermore, a student employability prediction system was developed by [14] using the SVM, DT, RF, KNN, and LR algorithms. Institutional databases were obtained, and hard skills, soft skills, and demographic features were used. The SVM algorithm achieved an accuracy of 91%. In [14], the authors identified the most predictive attributes among hard skills, soft skills, and demographic features to determine why students are most likely to get employed, using graduate surveys and institutional databases. The three methods applied and compared were SVM, RF, and DT. SVM achieved the highest accuracy of 91.22%.

The authors in [15] investigated the impact of various institution features on graduate employability using a hyperbox-based machine learning model, which achieved 78% accuracy. A hybrid model was proposed by [16] for student employability prediction through a deep belief network and Softmax regression (DBN-SR). The dataset was obtained from student surveys, and hard skills, soft skills, demographic features, and university features were used as the adopted features. The results achieved a high accuracy of 98%.

The authors in [17] predicted students’ employability based on technical skills. Institution databases were collected and the following algorithms were applied: SVM, LR, DT, RF, AdaBoost, and NB. The highest accuracy, 70%, was achieved by the RF algorithm. Finally, the authors in [18] developed a model using various machine learning methods (DT, RF, NN, and Gaussian NB) to forecast candidate hiring, employing different statistical measures for feature selection over hard skills, demographic features, and professional experience. The highest accuracy, 99%, was achieved by Gaussian NB. Table I depicts and summarizes the relevant studies according to their adopted features, dataset sources, ML models, output features, and accuracy of the best-adapted model to answer RQ1.


TABLE I. COMPARISON OF RELATED STUDIES

Reference | Year | Adopted features categories | Dataset sources | ML model | Output features | Accuracy
Hugo [9] | 2018 | Hard skills; Demographics features; Extra/co-curricular activities; Internship | Student surveys and Institution databases | SVM, ANN, LR, Discriminant analysis, DT | Employability: {Employed, Not Employed} | SVM 87.26%
Othman et al. [10] | 2018 | Hard skills; Soft skills; Demographic features; Extra/co-curricular activities; University features; Internship | Institutional databases | DT, ANN, SVM | Employability: {Employed, Not employed} | SVM 66.0967%
Alghamlas and Alabduljabbar [11] | 2018 | Hard skills; Soft skills | Student surveys and Recruiter surveys | DT, NB, NN | Matching to industry-required skills | Naïve Bayes 69%
Dubey and Mani [12] | 2019 | Hard skill; Demographic features; Extra/co-curricular activities | Student surveys | LR, DT, RF, KNN, SVM | Hiring: {Hired, Not hired} | LR 93%
Kumar and Babu [13] | 2019 | Hard skills; Soft skills; Demographic features; Extra/co-curricular activities; University features | Student surveys | DT, Gaussian NB, SVM, KNN | Getting a job: {Yes, no} | DT & SVM 98%
Casuat [21] | 2020 | Hard skills; Soft skills; Demographic features | Institution databases | DT, RF, SVM, KNN, LR | Employability: {Employed, Less Employed} | SVM 91%
Casuat & Festijo [14] | 2020 | Hard skills; Soft skills; Demographic features | Graduate surveys and Institution databases | SVM, RF, DT | Employability: {Employed, Less Employed} | SVM 91.22%
Aviso et al. [15] | 2020 | University features | Institution databases | Rule-based Hyperbox model | Employability: {Yes, no} | 78%
Bai and Hira [16] | 2021 | Hard skills; Soft skills; Demographic features; University features | Student surveys | Softmax regression | Employability: {Employed, Unemployed} | 98%
Laddha et al. [17] | 2021 | Hard skills | Institution databases | SVM, LR, DT, RF, AdaBoost, NB | Placement: {Placed, Not placed} | RF 70%
Reddy et al. [18] | 2021 | Hard skills; Demographic features; Professional experience | Employee surveys | DT, RF, NN, Gaussian NB | Recruitment: {Join, Not join} | Gaussian NB 99%


III. METHODOLOGY

In this section, we discuss the methodology of our study, the machine learning algorithms applied, and the evaluation metrics used. Fig. 1 highlights the research methodology: i) collecting the data; ii) applying data preprocessing; iii) splitting the dataset into two sets, a train set to train the model and a test set to evaluate it; iv) building our model by applying five ML classification algorithms; v) evaluating the model; vi) producing the proposed model to predict which IT graduates are qualified to meet labor market demands. To answer RQ1 (What are the most significant features that affect graduates’ competitive advantage to match labor market demands?), we followed the methodology steps described below.

A. Data Source

The dataset used in this research was obtained through a survey given to IT graduates and employers in Egypt. We created an online survey with pertinent questions and distributed it to IT graduates (from Computers & Artificial Intelligence, Business Information Systems, Software Engineering, and Management Information Systems) and to several IT companies from different sectors. A brief description of each selected feature and its values is given in Table II. We classified the features into four categories (Trainings, Soft skills, Hard skills, and In-demand skills), each containing its most related features. For the first three categories, the values are (0,1): “0” means the graduate has not been trained or given a specific course in that skill during their years of study at the college, while “1” means the graduate has been trained or given such a course. In the fourth category, the value (0-7) is the number of courses or trainings the graduate received in those fields to be qualified for the industry requirements.

B. Data Preprocessing

Data preparation is a critical stage in creating a machine learning model, as it is difficult for a machine to read raw datasets and produce the expected results [19]. Data preprocessing therefore makes the data suitable for a machine learning model. First, we eliminate noise and missing values and make the data consistent. Then, we apply feature selection to identify the relevant features, those with the greater impact on IT graduates’ employability to match labor market demands, so that the classifiers can reach their optimal performance. Finally, we split the dataset into two sets: 80% for training the model and 20% for testing its accuracy.

C. Prediction Models

Five different binary classification algorithms are used to predict IT graduates’ employability from the collected dataset. Binary classification is appropriate because each new observation is assigned to one of two classes. The binary class in our dataset has two values: (0) for a not-qualified graduate who does not match labor market demands, and (1) for a qualified graduate. The number of records used in this study is 296. We used the following Python libraries: Scikit Learn, Pandas, NumPy, Matplotlib, and Seaborn. The five classification algorithms are:

Decision Tree Algorithm: a supervised learning technique, equivalent to a series of IF-THEN statements, that builds a structure of branches and nodes based on the evidence obtained for each feature during the learning process [10]. The DT algorithm generates decision trees from training data to solve classification and regression problems. In our proposed model, the Gini method was used to create split points by finding the decision rule that produces the greatest decrease in impurity at a node:

$$G(t) = 1 - \sum_{i=1}^{c} p_i^{2} \tag{1}$$

where G(t) is the Gini impurity at node t, c is the number of classes, and p_i is the proportion of observations of class i at node t. This decision-making process is carried out recursively until all leaf nodes are pure or a certain cutoff is reached.
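As an illustration of the Gini impurity in (1), the following minimal sketch computes G(t) for a node and the weighted impurity decrease of a candidate split. The function names and the toy labels are hypothetical and are not taken from the study's code.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity G(t) = 1 - sum_i p_i^2 for the class labels at one node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def split_impurity_decrease(parent, left, right):
    """Decrease in Gini impurity when `parent` is split into `left` and `right`.

    The children are weighted by their share of the parent's observations.
    """
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Hypothetical node: 8 "not qualified" (0) and 2 "qualified" (1) graduates.
node = [0] * 8 + [1] * 2
print(gini_impurity(node))

# A split that sends all qualified graduates to a pure right child.
print(split_impurity_decrease(node, [0] * 8, [1] * 2))
```

A split that leaves both children pure removes all of the parent's impurity, which is the "greatest decrease" the split search looks for.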

Fig. 1. The Research Methodology.
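The steps summarized in Fig. 1 can be sketched with scikit-learn, which the study reports using. This is a minimal sketch and not the authors' code: the data are synthetic (296 random records with ten binary features and one 0-7 count), the labelling rule is hypothetical, and the stratified split, fixed random seeds, and macro-averaged scores are assumptions that the paper does not state.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the survey data (assumption, not the real dataset).
rng = np.random.default_rng(0)
n_records = 296
binary_features = rng.integers(0, 2, size=(n_records, 10))  # trainings, soft, hard skills
in_demand = rng.integers(0, 8, size=(n_records, 1))         # count of in-demand courses
X = np.hstack([binary_features, in_demand])
y = (in_demand[:, 0] >= 5).astype(int)                      # hypothetical labelling rule

# 80% training / 20% testing; stratification is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# The five classifiers named in the paper; hyperparameters are assumptions.
models = {
    "DT": DecisionTreeClassifier(criterion="gini", random_state=0),
    "Gaussian NB": GaussianNB(),
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(criterion="gini", random_state=0),
    "SVM": LinearSVC(max_iter=10000, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, average="macro", zero_division=0),
        "recall": recall_score(y_test, y_pred, average="macro", zero_division=0),
        "f1": f1_score(y_test, y_pred, average="macro", zero_division=0),
    }
    print(name, results[name])
```

The scores printed here describe only the synthetic data and say nothing about the study's results.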


TABLE II. DESCRIPTION OF DATA FEATURES

Category | Feature | Values | Description
Trainings | Internship | (0,1) | A professional learning experience that provides meaningful work experience related to a student's field of study or career interest for a limited period of time.
Trainings | Summer training | (0,1) | A period spent in a reputable company to gain relevant skills and experience in a particular field, usually conducted during July and August of each year.
Trainings | Workshops | (0,1) | A period of discussion in which people work on a particular subject by discussing it or doing activities relating to it.
Trainings | Co-curricular activities | (0,1) | The activities and learning experiences that take place in the university along with the academic curriculum by students to enhance their skills.
Soft skills | Problem-solving | (0,1) | The act of defining a problem; finding the cause of the problem; identifying, prioritizing, and selecting alternatives for a solution; and implementing a solution.
Soft skills | Creative thinking | (0,1) | The ability to generate new solutions to problems.
Soft skills | Time management | (0,1) | The process of planning and organizing how much time to spend on specific activities.
Soft skills | English proficiency | (0,1) | The ability to use and understand spoken and written English.
Hard skills | Data security | (0,1) | The practice of protecting digital information.
Hard skills | Network security | (0,1) | The practice of protecting networks and data.
In-demand skills | Data Analytics; Artificial Intelligence (AI); Internet of Things (IoT); Machine Learning (ML); Cybersecurity; Data Science; Cloud Computing | (0-7) | The student’s knowledge and experience gained in those fields are based on their years of studies at the university through curricula and practical applications of them.
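The encoding described in Table II can be illustrated with a short sketch that turns one survey response into a numeric feature row: ten binary indicators followed by a single 0-7 count of in-demand fields. The field names, the dictionary input format, and the choice to treat a missing answer as 0 are all assumptions made for illustration.

```python
# Hypothetical field names for the ten (0,1) features of Table II.
BINARY_FEATURES = [
    "internship", "summer_training", "workshops", "co_curricular_activities",
    "problem_solving", "creative_thinking", "time_management", "english_proficiency",
    "data_security", "network_security",
]

# Hypothetical field names for the seven in-demand fields counted in the (0-7) feature.
IN_DEMAND_FIELDS = [
    "data_analytics", "artificial_intelligence", "internet_of_things",
    "machine_learning", "cybersecurity", "data_science", "cloud_computing",
]


def encode_response(response):
    """Encode one survey response (a dict of truthy/falsy answers) as a feature row.

    Missing answers are treated as 0 here, which is an assumption.
    """
    row = [1 if response.get(name) else 0 for name in BINARY_FEATURES]
    in_demand_count = sum(1 for name in IN_DEMAND_FIELDS if response.get(name))
    row.append(in_demand_count)
    return row


example = {"internship": True, "data_science": True, "cloud_computing": True}
print(encode_response(example))
```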
Gaussian Naïve Bayes (Gaussian NB) algorithm: a variant of Naïve Bayes. It is a probabilistic machine learning algorithm used for many classification tasks. It is based on Bayes' theorem and makes the strong assumption that the predictors are independent of each other [13]. The likelihood of the features in our proposed model is assumed to be Gaussian:

$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^{2}}} \exp\left(-\frac{(x_i - \mu_y)^{2}}{2\sigma_y^{2}}\right) \tag{2}$$

where the parameters σ_y and μ_y are estimated using maximum likelihood.

1) Random forest algorithm: a supervised learning algorithm that can be used for both classification and regression. The model first generates a forest of random trees. The trees' predictions are then merged by voting, so that the most frequently predicted outcome is selected. If a dataset contains x features, the algorithm first chooses a random subset of them, denoted y. It then merges the trees based on the expected outcome and the voting procedure [20]. We used the Gini method given in (1).

2) Logistic Regression (LR) algorithm: LR uses regression analysis and requires a class variable that is binary [17]. Accordingly, the target column in this dataset, named the employability class, holds two binary values: “0” for a not-qualified IT graduate who has no chance of being employable to meet labor market demands, and “1” for an IT graduate who has been predicted to be qualified and to match labor market requirements. In our proposed model, a linear model is included in a logistic function as follows:

$$P(y_i = 1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}} \tag{3}$$

where P(y_i = 1 | X) is the probability of the i-th observation’s target value, y_i, being class 1, X is the training data, β_0 and β_1 are the parameters to be learned, and e is Euler’s number. The logistic function limits its output to the range between 0 and 1 so that it can be interpreted as a probability.

3) Support Vector Machine (SVM) algorithm: SVM requires the classes in the dataset to be predefined and uses them to classify the objects in the given dataset. It categorizes observations by assigning them to one or more classes in order to increase accuracy [21]. We used the linear SVC (Linear Support Vector Classification).

D. Model Evaluation

To evaluate the model's effectiveness, a confusion matrix with true positive (TP), false positive (FP), true negative (TN), and false negative (FN) counts is formed for the predicted data. Performance is measured with respect to accuracy, precision, recall, and F1 score. A brief description of each follows:

Accuracy: a common metric for evaluating classifier performance. It computes the ratio of correctly classified instances to the total number of instances [8]. Its formula is as follows:


$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{4}$$

Precision: the ratio of true positive instances to the total number of instances predicted as positive [22].

$$\text{Precision} = \frac{TP}{TP + FP} \tag{5}$$

Recall: the ratio of true positive instances to the total number of actual positive instances, that is, the share of relevant instances that are retrieved [22].

$$\text{Recall} = \frac{TP}{TP + FN} \tag{6}$$

F1 score: the harmonic mean of precision and recall, which combines both into a single value [20].

$$\text{F1 score} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \tag{7}$$

IV. RESULTS AND DISCUSSION

After data preprocessing, according to the methodology used, 80% of the 296 graduate records collected were used as a training dataset and 20% were kept as a test dataset. The findings of this study are presented as follows. Fig. 2 shows the correlation matrix for the features used.

Fig. 2. Correlation Matrix of Selected Features.

The distribution of the employability class (qualified and not qualified) is illustrated in Fig. 3, where the value 0 represents the number of not-qualified graduates and 1 represents the number of qualified graduates. The figure shows that most samples are “not qualified” graduates (82%), compared with “qualified” graduates (18%).

Fig. 3. Count of Employability Class (Qualified/Not Qualified).

In Fig. 4, we present the participants’ distribution in terms of the features that represent the trainings taken during the graduates’ years of study. For internships, 11 participants were trained and qualified, while 15 participants reported this training but were not qualified. For summer training, 12 participants were trained and qualified, while 105 participants reported this training but were not qualified. For co-curricular activities, 41 participants were trained and qualified, whereas 168 participants reported this training but were not qualified. Lastly, for workshops, 37 participants were trained and qualified, and 82 participants reported this training but were not qualified.

Fig. 4. Respondents’ Distribution in Terms of Trainings.

Fig. 5 illustrates the participants’ distribution in terms of the features that represent the soft skills the graduates were trained on during their years of study. For problem-solving skills, 39 participants were trained and qualified, while 110 participants reported this skill but were not qualified. For creative thinking skills, 33 participants were trained and qualified, while 74 participants reported this skill but were not qualified. For time management skills, 37 participants were trained and qualified, while 107 participants reported this skill but were not qualified. For English proficiency skills, 43 participants were trained and qualified, while 102 participants reported this skill but were not qualified.


Fig. 5. Respondents’ Distribution in Terms of Soft Skills.
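Counts such as "trained and qualified" versus "trained but not qualified" amount to a cross-tabulation of one binary feature against the employability class. The sketch below shows this under that reading, using only the Python standard library. The toy values are hypothetical and are not the survey data.

```python
from collections import Counter


def cross_tabulate(feature_values, class_labels):
    """Count records for each (feature value, class label) pair."""
    if len(feature_values) != len(class_labels):
        raise ValueError("feature_values and class_labels must have equal length")
    return Counter(zip(feature_values, class_labels))


# Hypothetical toy data: feature 1 = trained in the skill, 0 = not trained;
# class 1 = qualified, class 0 = not qualified.
problem_solving = [1, 1, 0, 1, 0, 1, 0, 0]
employability = [1, 0, 0, 0, 0, 1, 0, 1]

table = cross_tabulate(problem_solving, employability)
trained_and_qualified = table[(1, 1)]
trained_not_qualified = table[(1, 0)]
print(trained_and_qualified, trained_not_qualified)
```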

Fig. 6 depicts the participants’ distribution in terms of the features that represent the hard skills the graduates were trained on during their years of study. For data security skills, 39 participants were trained and qualified, while 107 participants reported this skill but were not qualified. For network security skills, 40 participants were trained and qualified, whereas 120 participants reported this skill but were not qualified.

Fig. 6. Respondents’ Distribution in Terms of Hard Skills.

Fig. 7 shows the participants’ distribution in terms of the in-demand skills required by the industry, from the employers’ perspective, in which the graduates were trained or given a specific course during their years of study. At value 0, 2 participants did not take any of those skills and were qualified to match labor market requirements, whereas 114 participants did not take any of them and were not qualified to be employable. At value 7, 7 participants took all seven demanded skills and were qualified; no participant who took all seven skills was not qualified.

Fig. 7. Respondents’ Distribution in Terms of In-demand Skills.

We applied five machine learning classification algorithms to predict IT graduates’ employability. The confusion matrix for each model is given in Table III. Fig. 8 shows the outcome prediction.

Fig. 8. Confusion Matrix for the Five Machine Learning Models.

TABLE III. CONFUSION MATRIX FOR THE MACHINE LEARNING CLASSIFICATION MODELS

   | DT | Gaussian NB | LR | RF | SVM
TP | 52 | 46 | 46 | 49 | 46
TN | 8 | 9 | 13 | 9 | 13
FP | 0 | 5 | 1 | 0 | 0
FN | 0 | 0 | 0 | 2 | 1

Table III reveals that the DT model predicts the highest number of true positives (52 out of 59 test samples) among the five models. Furthermore, the LR and SVM models predict the highest number of true negatives (13 among 59 test samples). The lowest number of false positives (0 out of 59 samples) is achieved by DT, RF, and SVM. DT, Gaussian NB, and LR obtained the lowest number of false negatives (0 among 59).
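As a worked check, the counts in Table III can be passed through the definitions in (4) to (7). In this minimal sketch the helper name `evaluate` is hypothetical, and the counts are copied from the table as printed, where each column sums to 60. Only the accuracy is compared with the reported values, because which class the table treats as positive is not stated explicitly.

```python
def evaluate(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 score from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts copied from Table III.
TABLE_III = {
    "DT": {"tp": 52, "tn": 8, "fp": 0, "fn": 0},
    "Gaussian NB": {"tp": 46, "tn": 9, "fp": 5, "fn": 0},
    "LR": {"tp": 46, "tn": 13, "fp": 1, "fn": 0},
    "RF": {"tp": 49, "tn": 9, "fp": 0, "fn": 2},
    "SVM": {"tp": 46, "tn": 13, "fp": 0, "fn": 1},
}

for name, counts in TABLE_III.items():
    scores = evaluate(**counts)
    print(name, {metric: round(value, 2) for metric, value in scores.items()})
```

Rounded to two decimals, the accuracies computed this way are 1.0, 0.92, 0.98, 0.97, and 0.98 for DT, Gaussian NB, LR, RF, and SVM.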


The per-class performance of each model for the employability target class is presented in Table IV. In this table, the Match class “1” means the graduates have a chance of being employable and matching labor market demands. The Not Match class “0” denotes graduates having no chance of being employable. Each row gives the scores computed for the corresponding class: the class precision, recall, and F1 score. The class recall and precision values can be used to determine the classifier's overall accuracy. According to the table values, the DT classifier has the highest precision and recall, while the Gaussian NB classifier has the lowest.

The performance of the study was evaluated in terms of accuracy, precision, recall, and F1 score. The calculated performance measures are shown in Fig. 9 and Table V.

RQ2: What are the best machine learning algorithms for employability prediction of IT graduates?

Fig. 9 and Table V indicate that DT outperformed all other machine learning algorithms with a maximum accuracy of 100%, while LR and SVM achieved the second highest accuracy of 98%. DT also achieved precision, recall, and F1 score of 100%. The second highest F1 score, 98%, is achieved by LR and SVM. The second highest precision is achieved by SVM, and the second highest recall is achieved by LR. Most of the techniques have an F1 score higher than 93%; only Gaussian NB falls below this level.

Fig. 9. Performance Measurement using Five Machine Learning Algorithms.

TABLE V. PERFORMANCE EVALUATION OF THE FIVE MACHINE LEARNING ALGORITHMS

         | DT | Gaussian NB | LR | RF | SVM
Accuracy | 1 | 0.92 | 0.98 | 0.97 | 0.98
Precision | 1 | 0.82 | 0.96 | 0.98 | 0.99
Recall | 1 | 0.95 | 0.99 | 0.91 | 0.96
F1 score | 1 | 0.87 | 0.98 | 0.94 | 0.98
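The precision, recall, and F1 values in Table V are numerically consistent with the unweighted (macro) average of the two per-class rows in Table IV. This is an observation from the reported numbers, not something the text states. The sketch below checks it with values copied from both tables; the variable names are hypothetical.

```python
# Per-class (precision, recall, F1) copied from Table IV.
TABLE_IV = {
    "DT": {"match": (1.0, 1.0, 1.0), "not_match": (1.0, 1.0, 1.0)},
    "Gaussian NB": {"match": (0.64, 1.0, 0.78), "not_match": (1.0, 0.90, 0.95)},
    "LR": {"match": (0.93, 1.0, 0.96), "not_match": (1.0, 0.98, 0.99)},
    "RF": {"match": (1.0, 0.82, 0.90), "not_match": (0.96, 1.0, 0.98)},
    "SVM": {"match": (1.0, 0.93, 0.96), "not_match": (0.98, 1.0, 0.99)},
}

# Overall (precision, recall, F1) copied from Table V.
TABLE_V = {
    "DT": (1.0, 1.0, 1.0),
    "Gaussian NB": (0.82, 0.95, 0.87),
    "LR": (0.96, 0.99, 0.98),
    "RF": (0.98, 0.91, 0.94),
    "SVM": (0.99, 0.96, 0.98),
}


def macro_average(per_class):
    """Unweighted mean of each metric across the classes."""
    return tuple(sum(values) / len(values) for values in zip(*per_class.values()))


# Largest absolute gap between the macro averages and Table V, per model.
gaps = {}
for name, per_class in TABLE_IV.items():
    averaged = macro_average(per_class)
    gaps[name] = max(abs(a - b) for a, b in zip(averaged, TABLE_V[name]))
    print(name, [round(value, 3) for value in averaged], "max gap:", round(gaps[name], 3))
```

The remaining gaps are at most about 0.005, which is what two-decimal rounding of the per-class scores would produce.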
TABLE IV. EVALUATION OF EMPLOYABILITY CLASS (QUALIFIED / NOT QUALIFIED)

Algorithm | Class | Precision | Recall | F1 score
Decision Tree | Match (1) | 1 | 1 | 1
Decision Tree | Not Match (0) | 1 | 1 | 1
Gaussian Naive Bayes | Match (1) | 0.64 | 1 | 0.78
Gaussian Naive Bayes | Not Match (0) | 1 | 0.9 | 0.95
Logistic Regression | Match (1) | 0.93 | 1 | 0.96
Logistic Regression | Not Match (0) | 1 | 0.98 | 0.99
Random Forest | Match (1) | 1 | 0.82 | 0.9
Random Forest | Not Match (0) | 0.96 | 1 | 0.98
Support Vector Machine | Match (1) | 1 | 0.93 | 0.96
Support Vector Machine | Not Match (0) | 0.98 | 1 | 0.99

V. CONCLUSION

The number of information technology graduates produced by higher education institutions has been increasing every year. To address their unemployment and the mismatch between HEI outputs and labor market demands, there is a need for a model that can predict IT graduates’ employability to match labor market requirements using machine learning techniques. Therefore, this paper proposed, discussed, and implemented five machine learning classification algorithms, namely DT, Gaussian NB, LR, RF, and SVM.

This study achieved higher accuracy than the earlier works reviewed. The highest accuracy, 100%, is achieved by DT, and the second highest accuracy, 98%, is achieved by LR and SVM, whereas the lowest accuracy, 92%, is achieved by Gaussian NB. The small size of the dataset is the main limitation of this study. From the study, we can conclude that machine learning techniques can predict IT graduates’ employability with high accuracy.

The proposed model can help higher education institutions make better long-term plans for producing graduates who are knowledgeable, skilled, and able to fulfill labor market needs. The findings of the feature analysis indicated that adapting the curriculum to include the skills demanded by industry, and improving teaching and learning methods by offering more training, would produce quality graduates in the following years. The proposed model will also help employers by contributing significantly to the placement process.


For further research, the dataset can be expanded, and additional ML algorithms can be evaluated to obtain better performance.
