Article
Student Cheating Detection in Higher Education by
Implementing Machine Learning and LSTM Techniques
Waleed Alsabhan
College of Engineering, Al Faisal University, P.O. Box 50927, Riyadh 11533, Saudi Arabia; [email protected]
Abstract: Both paper-based and computerized exams have a high level of cheating. It is, therefore, de-
sirable to be able to detect cheating accurately. Keeping the academic integrity of student evaluations
intact is one of the biggest issues in online education. There is a substantial possibility of academic
dishonesty during final exams since teachers are not directly monitoring students. We suggest a
novel method in this study for identifying possible exam-cheating incidents using Machine Learning
(ML) approaches. The 7WiseUp behavior dataset compiles data from surveys, sensor data, and
institutional records to improve student well-being and academic performance. It offers information
on academic achievement, student attendance, and behavior in general. In order to build models
for predicting academic accomplishment, identifying at-risk students, and detecting problematic
behavior, the dataset is designed for use in research on student behavior and performance. Our
model approach surpassed all three prior reference efforts with an accuracy of 90%, using a long
short-term memory (LSTM) technique with a dropout layer, dense layers, and the Adam
optimizer. Implementing a more intricate and optimized architecture and hyperparameters is credited
with increased accuracy. In addition, the increased accuracy could have been caused by how we
cleaned and prepared our data. More investigation and analysis are required to determine the precise
elements that led to our model’s superior performance.
Keywords: student cheating detection; machine learning; exploratory data analysis; online examination; student assessment
often more complex and challenging than tests at other levels. They require students
to have a deeper understanding of the material and apply critical thinking skills, and the
exams often include open-ended questions [12]. This complexity can increase the pressure on
students to perform well, which may lead some to resort to cheating to obtain good grades.
Additionally, the stakes are higher in higher education exams. These exams are often
worth a larger proportion of the overall grade, and their outcomes can have significant
implications for students’ academic and professional futures. This increased pressure
to perform well can lead some students to engage in cheating behavior to achieve good
grades, even if they have not adequately prepared for the exam [13]. Furthermore, in
higher education, students are expected to have developed a strong sense of academic
integrity and ethical conduct. However, some students may face significant challenges in
maintaining these standards, particularly in high-stress situations such as final exams. For
instance, students who feel overwhelmed or underprepared for an exam may be more likely
to cheat. Another factor that contributes to the uniqueness of cheating behavior in higher
education exams is the difficulty of detecting cheating. In higher education, courses often
require students to have a deep understanding of the material and demonstrate critical
thinking skills. As a result, it can be challenging for educators to distinguish between
genuine student work and cheating [14]. Additionally, in online exams, where students are
not directly monitored, there is a higher risk of academic dishonesty.
The deep learning algorithm employed in this study is called LSTM [15]. It is considered
the most effective approach for dealing with object detection and recognition challenges
and may be used to resolve data categorization concerns accurately [12–15]. It is a deep
learning technique designed to understand two-dimensional input, such as audio or
images [16]. The motivation came from how humans process and grow their visual
perception to recognize or differentiate an object in a digital image. To classify labeled
data, it employs supervised learning techniques. It is commonly used to detect, segment,
and classify pictures, as well as to discriminate between objects or viewpoints [17]. This
technique may also determine what people are doing [18]. It is constructed by stacking
three types of layers: pooling, convolutional, and fully connected layers [19]. A variety of
Convolutional Neural Network (CNN) architectures are also employed as deep-learning
techniques in many research articles. The design varies from previous CNN systems in
that the convolution layer's filter thickness matches the input's thickness. Both depth-wise
and point-wise convolutions are recognized [20]. The bottleneck is where inputs and
outputs between models occur, while the inner layers represent the model's ability to
accept inputs from lower-level ideas. Faster training and greater accuracy are possible via
bypasses around bottlenecks [4].
The generalizability of these results is constrained despite the efficiency of the sug-
gested approach. Through diligent work and study, it is unquestionably realistic and
viable for a student to earn an extraordinarily high score [21]. Hence, a human expert
must do more research before a final judgment is made in any situation flagged as a poten-
tial infringement. Notwithstanding its shortcomings, this research intends to add to this
expanding field of study and provide insightful information on detecting final exam fraud.
Figure 1 displays the framework of the proposed model for detecting student cheating.
Initially, all datasets go through data preparation and cleansing. Following that, the image
pre-processing unit, feature extraction, and model selection activities are carried out. The
model evaluation metrics are given to the classifier, which is then used to train the model.
The optimization approach employs Support Vector Machine (SVM), LSTM, and Recurrent
Neural Network (RNN) classifiers. The system model is then applied to identify whether a
student is cheating or not.
Figure 1. Framework of the proposed model diagram related to student cheating detection system.
The current study uses an internet protocol network detector and a behavior detection
agent based on ML to solve the limits of online test cheating. The study was a case study,
and its findings present ways to enhance the intelligent tutoring system. These are this
study's primary contributions:
• To identify online cheating using ML methods, we suggested a cheating intelligence
agent based on the 7WiseUp behavior dataset. We use the LSTM network with a
densely linked idea, DenseLayer, and LSTM to construct behavior detectors. We
selected cutting-edge ML methods for the online tests since they have developed
quickly and have been extensively employed recently. They may offer helpful insights
that contribute to the research field of an intelligent teaching system.
• Records were gathered from online examinations taken throughout mock, midterm,
and final exam periods in highly uncontrolled settings. The database contained testing
and training programs to analyze and evaluate performance.
The following is how the paper is structured: The second section is the literature review.
Section 3 contains the dataset, followed by Section 4, which has the methodology part.
Sections 5 and 6 pertain to the results and discussion sections, respectively. Section 7
concludes the paper.
2. Literature Review
Academic dishonesty is a challenging issue typically thought to happen when students
use unethical writing practices, such as copying, plagiarism, pasting, glancing at other
people's work, and data falsification [2,3]. Due to faulty academic assessments and perhaps
misleading student grades, academic dishonesty threatens the credibility of educational
institutions [10,22]. Cheating on academic assignments is a significant ethical violation that
jeopardizes academic integrity. Academic dishonesty has a significant negative influence
on both the student's ability to be trusted and the reputation of educational institutions.
Educational institutions may ensure that their students are held accountable for their
work by utilizing technology, such as digital essay scanning, turnitin.com, or software, to
identify plagiarism [3,9,23]. Research in this area has advanced thanks to technological
development.
The authors of the study [1] introduce a brand-new paradigm for the understanding
and categorization of cheating video sequences. This sort of research assists in the early
detection of academic dishonesty. The authors also present a brand-new dataset called
“activities of student cheating in paper-based tests.” The dataset comprises suspicious
behaviors that occurred in a testing setting. Eight different actors represented the five
various types of cheating. Each pair of individuals engaged in five different types of
dishonest behavior. They ran studies on action detection tasks at the frame level using five
different kinds of well-known characteristics to gauge how well the suggested framework
performed. The results of the framework trials were spectacular and significant.
In earlier research [12], the authors exploited the fact that students compose their written
work on electronic devices and used recorded keystrokes from assignments and exams to
identify authors. Typing profile vectors from the students are compared by calculating the
Euclidean distance between them. The method is independent of what is written and works
for both writing and programming tasks. Its drawback is that it requires additional software
on the students' devices to monitor their typing habits.
In order to identify cheating during an online test, the authors of [5] developed
a system for assessing the head position and time delay. A high statistical correlation
between cheating activity and a student’s head position change relative to a computer
screen was also discussed. Therefore, we can instantly spot dubious student activity in
online courses. Similarly, in [17], the authors suggested a novel technique for tracking a
student’s anomalous conduct during an online test that uses a camera to establish the link
between the examinee’s head and mouth. According to experiments, an irregular pattern
of conduct in the online course may be easily identified using the suggested strategy.
In addition, the methods used by students to spot plagiarism in online tests were
examined. To identify and discourage test cheating, the authors of [20] proposed an
electronic exam monitoring system. The eye tribe tracker and the fingerprint reader were
employed for continual authentication by the system. Due to this, the system used two
factors to determine whether an examinee was cheating: the amount of time they were on-
screen overall and how frequently they were off-screen. Keystroke dynamics’ importance
in preserving security in online tests was discussed by [4]. Using statistical verification, ML,
and logical comparison as its three stages, the suggested system employed authentication.
An applicant’s typing style is immediately detected when he signs in for the first time, and
a template is created for him. These templates are used as a reference to ensure the user is
always authenticated when taking an online test. They are based on some characteristics,
including dwell time (the time between pressing and releasing keys), flight time (the time
between key releases and the next keypress), and the user’s typing speed for improved
precision and responsiveness. The security risks related to online exams experienced in
the past are discussed in [16]. Complicity, which
frequently entails the cooperation of a third party that helps the student by impersonating
him or her online, was identified as a threat that was becoming more difficult to deal
with. The probable mechanisms of security threats in online cheating were uncovered by a
subsequent investigation conducted by the same authors [17,18]. Using dynamic profile
questions from an online course, scientists evaluated the behavior of 31 individuals who
took the test while being observed online. The findings revealed that students who cheated
by impersonation exchanged most of the material through a mobile device. As a result,
their reaction times were considerably different from those of non-cheaters [4].
The authors of [24] employed a different strategy involving hardware. The gear for
the system comprises a camera, a wearable camera, and a microphone to keep track of the
testing site’s visual and auditory environment [24]. Their research describes a multimedia
analytics system that automatically gives out online tests. The system comprises six
core parts that constantly evaluate the most significant behavioral cues: user verification,
text detection, voice detection, active window detection, gaze estimation, and phone
detection. In order to classify whether a test-taker is cheating at any point throughout the
exam, they combined the continuous estimating components and added a temporal sliding
window [25].
The authors of [26] used a case study to assess the incidence of possible e-cheating
and offer preventative strategies that may be used. The internet protocol (IP) detector and
the behavior detector are the two main components of the authors’ e-cheating intelligence
agent, which they used as a technique for identifying online cheating behaviors. The
intelligence agent keeps an eye on the students’ actions and is equipped to stop and identify
any dishonest behavior. It may be connected with online learning tools to track student
behavior and be used to assign randomized multiple-choice questions in a course test. This
approach’s usefulness has been verified through testing on numerous datasets.
Past references on the use of ML to identify student cheating are summarized in Table 1.
Integrating computer vision with an ML technique is a crucial component of research
breakthroughs, in addition to the hardware. By utilizing developments in ML,
computer vision has become more adept at processing pictures, identifying objects, and
tracking, making research more precise and trustworthy. Cheating on an exam is usually
thought of as an unusual event. Peculiar postures or movements aid researchers in
identifying it [26–28]. The application of computer vision efficiently enables this detection.
Systems develop more intelligence through ML. Computer vision systems are now more
capable of spotting suspicious actions because of this improved intelligence. ML processing
technological developments also directly influence the results [29–32]. The use of this tech-
nique to identify suspicious behavior in both online and offline tests has been documented
in many studies. Most strategies were derived from CNNs [33–36].
Table 1. List of Past References, including methodology, dataset, techniques, and results.
The proposed approach overcomes these research gaps by utilizing a deep learning
model that uses LSTM layers with dropout and dense layers to identify exam cheating
among students. This approach is based on the use of ML technology and is more advanced
than previous approaches that mainly relied on computer vision systems to detect
cheating incidents.
The selection of LSTM as the technique for classifying cheating behavior of students
was based on several reasons. Firstly, LSTMs are designed to handle sequential data,
making them a natural choice for our time-series data consisting of sequential snapshots
of student activity during the test. Secondly, LSTMs are known for their ability to capture
long-term dependencies in sequential data, which is important in detecting cheating behav-
ior that may involve multiple actions occurring over an extended period of time. Thirdly,
LSTMs are capable of handling variable-length input sequences, which is necessary in a
scenario where the number of actions a student takes during a test may vary. Fourthly,
LSTMs are stateful models, which can be useful in detecting cheating behavior occurring
over multiple input sequences. In summary, the LSTM technique was selected for its ability to
handle sequential data, capture long-term dependencies, handle variable-length sequences,
and maintain an internal state, making it a suitable choice for our problem of classifying
cheating behavior of students. Additionally, our approach uses students’ grades in vari-
ous exam portions as features in the dataset and labels them as “normal” or “cheating,”
which improves anomaly identification methods. The detailed description of the proposed
methodology is presented in the subsequent sections.
3. Dataset
3.1. Data Collection
A dataset that is openly accessible and contains information about student behavior in
a university environment is called the 7WiseUp behavior dataset. The 7WiseUp initiative,
which seeks to enhance student performance in the classroom by identifying and addressing
issues that affect behavior, gathered the data.
The collection contains information from several sources, including surveys, sensor
data, and institutional records. Information on student attendance, academic performance,
and social conduct is included. The information may be used to create models for forecasting
academic achievement, identifying at-risk students, and spotting problematic conduct.
It is intended for use in research on student behavior and performance. The dataset is made
available under a Creative Commons license, which permits reuse and redistribution, as
long as credit is given. However, the collection contains sensitive information about specific
persons, so adhering to ethical standards and ensuring the data are handled responsibly
is critical.
Figure 2. A high cheating dataset of student 1 with normal grades.
The high cheating dataset of students that committed cheating is shown in Figure 3.
According to the graph, 85% of students that take final examinations were found to cheat.
At around 50%, Q4 had the lowest percentage of cheating pupils.
The graph for the dataset of students who were discovered to be cheating that involved
less cheating and greater grades is shown in Figure 6. A total of 90% of the students who
attempted to cheat on the final examinations were detected, on average. On the other hand,
Figure 7 shows the graph of the decreasing cheating and increasing grades of dataset 2,
and the same is true: almost 85% of students were discovered cheating.
Figure 6. Less cheating increasing grades dataset of students with cheating found.
• Check for duplicates: We first checked for duplicates by ensuring that no rows or
columns in the dataset are repeated, since results that contain duplicates may be skewed
or erroneous. To find and eliminate duplicates, the pandas methods duplicated() and
drop_duplicates() are used. However, since these student ratings can be similar, we will
modify the dataset later to remove any duplicates.
• Normalize the data: The data were normalized to ensure that all variables are scaled
equally, which is crucial, especially when using models that are sensitive to the data's
scale. To normalize the data, we used scikit-learn's StandardScaler() method. For instance,
exam scores are a collection of features in the dataset that are entirely numerical and were
converted using the standard scaler.
• Feature selection: Feature selection was performed to choose the most relevant features
for the investigation. This reduces the dimensionality of the dataset and increases the
model's accuracy. To choose the most relevant features, scikit-learn routines such as
SelectKBest() or RFE() were used. Since the dataset is small, the four datasets were first
concatenated and the features from Q1, Q2, midterm, Q3, Q4, and final were selected.
Any category variables should be converted to numerical variables, since many ML models
use only numerical data. For this, pandas provides get_dummies(), and scikit-learn provides
LabelEncoder(). The "detection" column, which contains two classes, normal and cheating,
is added as the label. These labels are first transformed using a label encoder and then
converted to categorical form.
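For illustration, a minimal sketch of these preprocessing steps using pandas and scikit-learn is given below; the file name, column layout, and the value of k are assumptions made for the example, not the exact configuration used in this study.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical file layout: one row per student, grade columns plus a
# "detection" label with the classes "normal" and "cheating".
df = pd.read_csv("7wiseup_grades.csv")

# Remove duplicate rows so repeated records do not skew the results.
df = df.drop_duplicates()

# Encode the two-class label as integers.
y = LabelEncoder().fit_transform(df["detection"])

# Scale the numeric grade features to zero mean and unit variance.
feature_cols = ["Q1", "Q2", "midterm", "Q3", "Q4", "final"]
X = StandardScaler().fit_transform(df[feature_cols])

# Keep the k most informative features according to an ANOVA F-test;
# k=4 here is illustrative.
X_selected = SelectKBest(f_classif, k=4).fit_transform(X, y)
```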
4. Proposed Methodology
Developing a deep learning model for detecting cheating in students involves several
phases, which are outlined below, to ensure its success. To start, exploratory data analysis
(EDA) is conducted to identify any irregularities or patterns in the dataset. Through this,
information can be presented, statistical tests can be computed, and any missing or incorrect
numbers can be detected. After running EDA, data cleansing and preparation must be
performed to clean and prepare the dataset for analysis. This stage involves normalizing
the data, handling missing values, and transforming categorical variables into numerical
variables. Feature engineering is the next step; new characteristics are generated using
current data that may be more useful or predictive in identifying cheating. Feature selection
follows, where the most useful traits for spotting cheating are determined using statistical
analysis or machine learning methods. The selected features are then used to choose a
suitable machine learning method for identifying cheating. The choice of algorithm may
be influenced by the size and complexity of the dataset, as well as the specific aims of the
research. During model training, the chosen model is trained on a portion of the dataset,
and its parameters are tweaked to maximize its performance. Following model training,
model evaluation is done using a different validation set, and the model’s performance
is assessed using F1 score, accuracy, recall, precision, and other performance indicators.
Finally, hyperparameter tuning is conducted to improve the model’s performance on the
validation set by changing its regularization, learning rate, or other model parameters to
boost its performance.
The proposed methodology for the student cheating detection system is depicted in
Figure 8. It comprises the retrieved characteristics from the 7WiseUp datasets. The method
is separated into three levels, each with its own set of characteristics.
Figure 8. Proposed methodology for student cheating detection system.
In order to capture the sequential patterns in the grades data, our model architecture
consists of two LSTM layers, two dense layers with dropout to minimize overfitting, and
a final output layer with sigmoid activation to predict the likelihood of cheating. The model
is tested using a different validation set after being trained on the training data to improve
performance, as shown in Figure 9 below.
Figure 9. The model architecture of the proposed model.
4.1.1. LSTM
The RNN architecture known as LSTM, or long short-term memory, is used to model
sequential data. To better capture long-term relationships in the sequence, LSTM networks,
unlike conventional RNNs, utilize a memory cell to retain data about the prior inputs and
their dependencies. There are gates in the memory cell that manage the flow of information.
These gates, which include an input gate, an output gate, and a forget gate, govern how
information is added to, extracted from, and deleted from the cell.
We have implemented an LSTM network for tasks that involve sequential data such
as time series analysis and voice recognition. LSTM networks are particularly useful for
input sequences with long-term dependencies as they can selectively retain and discard
information over time; this is unlike classic RNNs that suffer from the "vanishing gradient
problem" when the input sequence is long. We have utilized two LSTM layers in our model.
The first layer consists of two units and is initialized with the LSTM layer type. The input
shape parameter is defined to match the shape of a single sample from the training data.
Additionally, we have set the return sequences option to True, which allows the layer to
produce a sequence of hidden state values instead of a single output value. The second
layer in our model is also an LSTM layer but only contains one unit. By default, the return
sequences option for this layer is set to False, which means that it outputs a single value for
each sequence. By implementing these LSTM layers, we can effectively handle tasks that
involve sequential data and long-term dependencies.
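The layer stack described above can be sketched in Keras as follows. The LSTM unit counts and the return sequences settings follow the text; the dense-layer sizes, the placement of the dropout layer, and the input shape (one grade per time step) are assumptions, since they are not fixed explicitly here.

```python
from tensorflow.keras import layers, models

n_timesteps, n_features = 6, 1  # assumed: Q1, Q2, midterm, Q3, Q4, and final as steps

model = models.Sequential([
    # First LSTM layer: 2 units, returning the full hidden-state sequence.
    layers.LSTM(2, return_sequences=True, input_shape=(n_timesteps, n_features)),
    # Second LSTM layer: 1 unit, returning a single value per sequence.
    layers.LSTM(1),
    # Dropout with a rate of 0.8, as described in Section 4.1.2.
    layers.Dropout(0.8),
    # Two dense layers; the sizes here are illustrative.
    layers.Dense(8, activation="relu"),
    layers.Dense(4, activation="relu"),
    # Sigmoid output predicting the likelihood of cheating.
    layers.Dense(1, activation="sigmoid"),
])
model.summary()
```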
4.1.2. Dropout
We have implemented the dropout regularization approach in the proposed model to
address the issue of overfitting. Dropout is a technique that randomly removes a portion of
the input units during training to prevent the model from learning to match the training
data too closely, which can lead to poor generalization performance on new, unseen data.
We applied dropout to a layer’s input or hidden units with a predetermined dropout
rate, typically between 0.2 and 0.5. During training, a unit is likely to be dropped out in
any iteration if the dropout rate is high enough. The remaining units' activations are scaled
by the inverse of the keep probability (one minus the dropout rate) to compensate for the
discarded units during training.
During testing, the complete network is used without dropout.
The dropout technique allows the network to acquire more reliable and generalizable
input representations by randomly removing units during training. This technique helps to
prevent the network from relying too heavily on certain input properties, thereby increasing
the model’s ability to generalize to new data.
To implement the dropout technique in our DL model, we used a dropout layer with a
dropout rate of 0.8. During training, this dropout layer randomly drops out a portion of
the input units, introducing noise to the network, and reducing its dependence on specific
input qualities, thus helping to prevent overfitting. With a dropout rate of 0.8, training
would lose 80% of the input units, further aiding in generalization.
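To illustrate the scaling behavior described above, the following NumPy sketch implements inverted dropout, the standard formulation in which surviving activations are divided by the keep probability so that their expected value is unchanged; it is illustrative and does not reproduce Keras internals.

```python
import numpy as np

rng = np.random.default_rng(0)

def inverted_dropout(x: np.ndarray, rate: float) -> np.ndarray:
    """Zero out a fraction `rate` of units and rescale the survivors."""
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob   # True for units that survive
    return x * mask / keep_prob              # expected output equals the input

activations = np.ones((2, 5))
# With rate=0.8, roughly 20% of units survive, each scaled by 1/0.2 = 5.
print(inverted_dropout(activations, rate=0.8))
```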
The decision made about the optimizer, loss function, and performance metric may
significantly impact the model’s performance. Binary cross-entropy is a logical option for
a binary classification task. The Adam optimizer is well recognized as suitable for many
deep-learning models. The accuracy metric gives a straightforward and understandable
indication of how well the model performs. However, alternative metrics such as precision,
recall, or F1 score could be more suitable depending on the application’s objectives. The
hyperparameters may be adjusted by performing several tests with various settings and
choosing the one that produces the best results.
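Continuing the sketch from Section 4.1.1, a compilation step consistent with these choices is shown below; adding precision and recall as extra metrics is our own suggestion, mirroring the note that they may suit some applications better.

```python
import tensorflow as tf

model.compile(
    optimizer="adam",                      # Adam optimizer
    loss="binary_crossentropy",            # loss for binary classification
    metrics=["accuracy",                   # headline metric reported in this study
             tf.keras.metrics.Precision(),
             tf.keras.metrics.Recall()],
)
```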
5. Results
The process of evaluating a trained ML model on a dataset different from the data used
for training is known as model evaluation. Python is a popular programming language
that is widely used in data science and ML. In our research, we used Python version 3.8.5
for our implementation. To aid our implementation, we made use of several popular
libraries such as TensorFlow, Keras, Pandas, and Numpy. These libraries helped us to
carry out various tasks such as data preprocessing, building and training our deep learning
model, and evaluating our results. We specifically used TensorFlow version 2.4.0, which
is a popular open-source platform for machine learning and deep learning. We also used
Keras version 2.4.3, which is a high-level API built on top of TensorFlow that simplifies the
process of building and training deep learning models. Pandas version 1.1.3 was also used
to manipulate and analyze our dataset. Finally, we made use of Numpy version 1.19.2 to
perform numerical computations on our dataset.
We have followed several procedures to evaluate the performance of the LSTM model.
Firstly, we split the dataset into training and testing sets using the train–test split function
from the sci-kit-learn library. The testing set was used to evaluate the model’s performance
after it was trained on the training set. Next, we applied the LSTM model to the training
set using the fit() method to train the algorithm to predict each student's likelihood of
cheating based on their performance on the Q1, Q2, midterm, Q3, Q4, and final exams.
After training, we evaluated the model on the testing set to predict the probability of
cheating for each student. We calculated evaluation metrics such as accuracy, precision,
recall, F1-score, Receiver Operating Characteristic Area Under the Curve (ROC-AUC), etc.,
using the predicted probabilities and actual labels. The evaluation metrics indicated how
well the model distinguished between normal and detected cheating in the two groups.
To enhance the LSTM model’s performance, we adjusted hyperparameters such as the
number of LSTM layers, neurons in each layer, dropout rate, learning rate, and batch size.
The hyperparameters were chosen based on the model’s performance on the validation set.
Finally, we visualized the results using charts such as the ROC curve, confusion matrix,
and precision–recall curve. These plots helped to identify areas that needed improvement
and provided insights into the model’s strengths and weaknesses. By following these
procedures, we effectively assessed the performance of the LSTM model.
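A condensed sketch of this evaluation procedure is given below, under the same assumptions as the earlier snippets (X is the scaled six-column grade matrix, y the binary labels, and model the compiled network); the split ratio, batch size, and decision threshold are illustrative.

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Reshape flat grade vectors into sequences of one grade per time step.
X_seq = X.reshape((-1, 6, 1))

X_train, X_test, y_train, y_test = train_test_split(
    X_seq, y, test_size=0.2, random_state=42, stratify=y)

model.fit(X_train, y_train, epochs=50, batch_size=16, validation_split=0.2)

probs = model.predict(X_test).ravel()   # predicted cheating probabilities
preds = (probs >= 0.5).astype(int)      # hard labels at a 0.5 threshold

print("accuracy :", accuracy_score(y_test, preds))
print("precision:", precision_score(y_test, preds))
print("recall   :", recall_score(y_test, preds))
print("F1 score :", f1_score(y_test, preds))
print("ROC-AUC  :", roc_auc_score(y_test, probs))
```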
By carrying out the processes listed above, the LSTM model’s performance may be
assessed and enhanced for better outcomes on the dataset for detecting student cheating.
After 50 iterations, the model achieved 90% training and 92% validation accuracy.
The performance of a DL model during training may be assessed using measures such
as training and validation accuracy. Whereas validation accuracy refers to the model’s
performance on a different validation set that is not used for training, training accuracy
refers to the model’s performance on the training set. Figure 10 demonstrates the model’s
training accuracy, with a blue graph for training and an orange graph for validation.
We want training and validation loss to be minimal and somewhat near to one another,
similar to how training and validation accuracy should be. High validation loss may signify
overfitting to the training data.
5.1. Model Evaluation Metrics
Metrics for measuring the effectiveness of a DL model on a specific task are known as
model evaluation metrics. For binary classification tasks, the following assessment criteria
are frequently used:
• Recall: The fraction of genuine positives in the test set out of all real positive samples.
• Accuracy: The model's percentage of correct predictions on the test set.
• F1 score: The harmonic mean of precision and recall.
• Precision: The fraction of true positives (positive samples accurately detected) out of
all positive predictions produced by the model.
• ROC curve and AUC: The ROC curve represents the relationship between the true
positive rate (recall) and the false positive rate (1-specificity) at various categorization
thresholds. The ROC curve's AUC measures how well the model can differentiate
between positive and negative data.
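In symbols, with TP, TN, FP, and FN denoting true positives, true negatives, false positives, and false negatives, these criteria take their standard forms:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP},$$
$$\text{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.$$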
The proposed model for the test data's ROC curve is shown in Figure 12, where the ROC
curve tracks the same path as the random guess graph. The false positive rate (FPR) and
true positive rate (TPR) are shown in the graph.
Figure 12. ROC curve of the proposed model for test data.
5.2. Confusion Matrix
The confusion matrix summarizes the number of true positives, true negatives, false
positives, and false negatives in the test set. The confusion matrix of the suggested test data
model is displayed in Figure 13.
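Both the confusion matrix in Figure 13 and the ROC curve in Figure 12 can be reproduced from the preds and probs of the earlier evaluation sketch using scikit-learn's plotting helpers; this is an illustrative sketch, not the plotting code used for the figures.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import (confusion_matrix, ConfusionMatrixDisplay,
                             RocCurveDisplay)

# Rows are actual classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_test, preds)
ConfusionMatrixDisplay(cm, display_labels=["normal", "cheating"]).plot()

# ROC curve computed from the predicted probabilities.
RocCurveDisplay.from_predictions(y_test, probs)
plt.show()
```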
To benchmark the proposed model, three reference works were compared on the same task.
The first reference work applied a CNN method with an accuracy of 84.52%; the same
dataset was utilized in the second reference work's LSTM method, which had an accuracy
rate of 81%. For the 7WiseUp behavior dataset, the third reference study employed an RNN
method with an accuracy of 86%. The comparison is as shown in Table 4.
Our model design, which utilized an LSTM strategy with a dropout layer, dense layers,
and an Adam optimizer, reached an accuracy of 90%, higher than that of all three reference
works. We credit our usage of more complex and optimized architecture
and hyperparameters for greater accuracy. The greater accuracy is likely attributable to the
data processing and cleaning we performed. More investigation and analysis are needed to
pinpoint precisely what contributed to our model’s enhanced performance.
6. Discussion
Keeping the academic integrity of student evaluations intact is one of the biggest
issues in online education. The absence of direct teacher monitoring significantly increases
academic dishonesty during final exams. The 7WiseUp behavior dataset, which anybody
can access, is used in this project to offer details on student conduct in a university environ-
ment. The 7WiseUp program gathered the data, which aim to improve student success by
identifying and resolving issues that impact behavior.
A variety of sources, including surveys, sensor data, and institutional records, are
included in the collection. It includes information on student attendance, academic progress,
and social behavior. The information may be used to build models for predicting academic
success, locating at-risk students, and spotting problematic behavior. It is designed for use
in research on student behavior and performance [38–49].
The term’s dataset, final exam, suggested technique, final test, and average score
all use four different synthetic datasets and one real-world dataset for the experimental
evaluation of a suggested strategy. The final test score and the average of the regular
assessment scores differ by 35 points, making datasets 1 and 2 equal. The last 20 regular
grades improve during the semester; the final test result is within a 10-point range for
around 80% of the average marks. A new label column with two classes—normal and
cheating—has been added, improving anomaly identification methods. Our experiment’s
final dataset demonstrates that cheating incidents can be automatically identified even
in challenging situations. The cheating occurrences are designed to raise exam scores by
25 points and mimic a jump in average grades of 10 points over the mean of the preceding
grades. Together with the synthetic data, we also employ one real-world dataset. For each
observation, we include 1 of the 52 observations in our collection.
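As a purely hypothetical illustration of how such cheating incidents could be injected into synthetic data (a jump of roughly 10 points over the mean of the preceding grades and a 25-point boost to the exam score, as described above), consider the following sketch; the exact generation procedure is not specified here, so this is only one interpretation.

```python
import numpy as np

rng = np.random.default_rng(7)

def inject_cheating(regular_grades: np.ndarray, exam_score: float):
    """Simulate one cheating incident on top of honest grades (interpretation)."""
    jumped_grade = regular_grades.mean() + 10.0    # ~10-point jump over the mean
    boosted_exam = min(exam_score + 25.0, 100.0)   # exam score raised by 25 points
    return jumped_grade, boosted_exam

honest = rng.uniform(50, 70, size=5)               # hypothetical regular grades
print(inject_cheating(honest, exam_score=60.0))
```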
A CNN technique (84.52%), an LSTM approach (81%), and an RNN approach (86%) from
three reference works were compared to our model on the student cheating detection task.
The first reference work used a CNN approach with 84.52% accuracy, the second used an
LSTM approach with an accuracy of 81%, and the third used an RNN approach with an
accuracy of 86%. All three reference works were tested on the 7WiseUp behavior dataset.
In contrast, the accuracy of our model architecture, which used an LSTM approach with a
dropout layer, dense layers, and the Adam optimizer, was 90%, which was higher than the
accuracy of all three reference studies. Implementing more intricate and sophisticated
architectures and hyperparameters is responsible for the increased accuracy. Additionally,
our data preparation and cleaning process may be responsible for improved accuracy.
More investigation and analysis are required to identify the specific factors that were
responsible for our model's superior performance.
One limitation of this research is that it relied on a single dataset, the 7WiseUp behavior
dataset, which may not be representative of all online education environments. Further-
more, the dataset was not specifically designed for cheating detection, which may limit the
accuracy of the models developed in this research. Additionally, the synthetic datasets used
in the experiments may not fully capture the complexity of real-world cheating incidents.
Further research could benefit from using multiple datasets, including those specifically
designed for cheating detection, to ensure the generalizability of the findings. Another
limitation is that the specific factors responsible for the superior performance of the model
are not identified, highlighting the need for further analysis and investigation.
In addition, another limitation concerns performance on imbalanced data: the proposed
model has low values of recall and F1 score. Recall, also known as
sensitivity or true positive rate, measures the proportion of actual positives that are correctly
identified by the model. F1 score is a combination of precision and recall, and it takes into
account both false positives and false negatives. These metrics are particularly important
for imbalanced datasets, where the proportion of positive cases is much lower than that of
negative cases. In such cases, a model that simply predicts all cases as negative may achieve
high accuracy but perform poorly in terms of identifying positive cases. Therefore, low
values of recall and F1 score indicate that the proposed model is not effective at identifying
positive cases, which is a significant limitation for its applicability in real-world scenarios.
7. Conclusions
The rise of online education has presented many benefits for students and educational
institutions, but it has also brought forth numerous challenges, including academic dishon-
esty in the form of cheating, during online assessments. To address this issue, educational
institutions must implement better detection techniques to ensure academic integrity. This
research uses ML technology to investigate the problem of online cheating and provides
practical solutions for monitoring and eliminating such incidents. The goal of this research
was to create a deep learning model using LSTM layers with dropout and dense layers to
identify exam cheating among students. We used the students’ grades in various exam por-
tions as features in our dataset and labeled them as “normal” or “cheating.” Despite having
a smaller dataset than previous research, our model architecture resulted in a 90% training
and 92% validation accuracy, outperforming models that used CNN and RNN layers. Our
approach accurately and successfully identified student exam cheating, showcasing the
potential of deep learning approaches in identifying academic dishonesty. By utilizing
such models, educational institutions can create more efficient strategies for guaranteeing
academic integrity. Ultimately, this research emphasizes the importance of using advanced
technologies in addressing contemporary challenges in online education.
Future research should focus on further refining and optimizing deep learning models
for detecting academic dishonesty in online assessments. This can include exploring the
use of other machine learning algorithms and techniques, such as ensemble learning and
transfer learning, to improve model performance and accuracy. Additionally, research can
investigate the feasibility of implementing real-time monitoring systems that can detect
and prevent cheating during online exams.
References
1. Hussein, F.; Al-Ahmad, A.; El-Salhi, S.; Alshdaifat, E.; Al-Hami, M. Advances in Contextual Action Recognition: Automatic
Cheating Detection Using Machine Learning Techniques. Data 2022, 7, 122. [CrossRef]
2. Alshdaifat, E.; Alshdaifat, D.; Alsarhan, A.; Hussein, F.; El-Salhi, S.M.F.S. The effect of preprocessing techniques, applied to
numeric features, on classification algorithms’ performance. Data 2021, 6, 11. [CrossRef]
3. Alam, A.; Das, A.; Tasjid, S.; Al Marouf, A. Leveraging Sensor Fusion and Sensor-Body Position for Activity Recognition for
Wearable Mobile Technologies. Int. J. Interact. Mob. Technol. 2021, 15, 141–155. [CrossRef]
4. Yulita, I.N.; Hariz, F.A.; Suryana, I.; Prabuwono, A.S. Educational Innovation Faced with COVID-19: Deep Learning for Online
Exam Cheating Detection. Educ. Sci. 2023, 13, 194. [CrossRef]
5. Rodríguez-Villalobos, M.; Fernandez-Garza, J.; Heredia-Escorza, Y. Monitoring methods and student performance in distance
education exams. Int. J. Inf. Learn. Technol. 2023, 40, 164–176. [CrossRef]
6. Nugroho, M.A.; Abdurohman, M.; Prabowo, S.; Nurhayati, I.K.; Rizal, A. Intelligent Remote Online Proctoring in Learning
Management Systems. In Information Systems for Intelligent Systems: Proceedings of ISBM 2022; Springer: Berlin/Heidelberg,
Germany, 2023; pp. 229–238. [CrossRef]
7. Chang, S.-C.; Chang, K.L. Cheating Detection of Test Collusion: A Study on Machine Learning Techniques and Feature
Representation. Educ. Meas. Issues Pract. 2023. [CrossRef]
8. Kadthim, R.K.; Ali, Z.H. Cheating Detection in online exams using machine learning. J. AL-Turath Univ. Coll. 2023, 2, 35–41.
Available online: https://www.iasj.net/iasj/article/260632 (accessed on 10 March 2023).
9. Perrett, T.; Masullo, A.; Burghardt, T.; Mirmehdi, M.; Damen, D. Temporal-relational crosstransformers for few-shot action
recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA,
20–25 June 2021; pp. 475–484. [CrossRef]
10. Somers, R.; Cunningham, S.; Dart, S.; Thomson, S.; Chua, C.; Pickering, E. AssignmentWatch: An automated detection and alert
tool for reducing academic misconduct associated with file-sharing websites. IEEE Trans. Learn. Technol. 2023. [CrossRef]
11. Obionwu, C.V.; Kumar, R.; Shantharam, S.; Broneske, D.; Saake, G. Semantic Relatedness: A Strategy for Plagiarism Detection in
SQL Assignments. In Proceedings of the 2023 6th World Conference on Computing and Communication Technologies (WCCCT),
Chengdu, China, 6–8 January 2023; pp. 158–165. [CrossRef]
12. Kock, E.; Sarwari, Y.; Russo, N.; Johnsson, M. Identifying cheating behaviour with machine learning. In Proceedings of the 2021
Swedish Artificial Intelligence Society Workshop (SAIS), Luleå, Sweden, 14–15 June 2021; pp. 1–4. [CrossRef]
13. Tiong, L.C.O.; Lee, H.J. E-cheating Prevention Measures: Detection of Cheating at Online Examinations Using Deep Learning
Approach—A Case Study. arXiv 2021, arXiv:2101.09841. [CrossRef]
14. Kamalov, F.; Sulieman, H.; Calonge, D.S. Machine learning based approach to exam cheating detection. PLoS ONE 2021, 16,
e0254340. [CrossRef]
15. Qu, S.; Li, K.; Wu, B.; Zhang, X.; Zhu, K. Predicting student performance and deficiency in mastering knowledge points in
MOOCs using multi-task learning. Entropy 2019, 21, 1216. [CrossRef]
16. El Kohli, S.; Jannaj, Y.; Maanan, M.; Rhinane, H. Deep learning: New approach for detecting scholar exams fraud. Int. Arch.
Photogramm. Remote Sens. Spat. Inf. Sci. 2022, 46, 103–107. [CrossRef]
17. Ahmad, I.; AlQurashi, F.; Abozinadah, E.; Mehmood, R. A novel deep learning-based online proctoring system using face
recognition, eye blinking, and object detection techniques. Int. J. Adv. Comput. Sci. Appl. 2021, 12, 10. [CrossRef]
18. Genemo, M.D. Suspicious activity recognition for monitoring cheating in exams. Proc. Indian Natl. Sci. Acad. 2022, 88, 1–10.
[CrossRef]
19. Ozdamli, F.; Aljarrah, A.; Karagozlu, D.; Ababneh, M. Facial Recognition System to Detect Student Emotions and Cheating in
Distance Learning. Sustainability 2022, 14, 13230. [CrossRef]
20. Ullah, A.; Xiao, H.; Barker, T. A dynamic profile questions approach to mitigate impersonation in online examinations. J. Grid
Comput. 2019, 17, 209–223. [CrossRef]
21. Herrera, G.; Nuñez-del-Prado, M.; Lazo, J.G.L.; Alatrista, H. Through an agnostic programming languages methodology for
plagiarism detection in engineering coding courses. In Proceedings of the 2019 IEEE World Conference on Engineering Education
(EDUNINE), Santos, Brazil, 17–20 March 2019; pp. 1–6. [CrossRef]
22. Pawelczak, D. Benefits and drawbacks of source code plagiarism detection in engineering education. In Proceedings of the 2018
IEEE Global Engineering Education Conference (EDUCON), Santa Cruz de Tenerife, Spain, 17–20 April 2018; pp. 1048–1056.
[CrossRef]
23. Ghizlane, M.; Reda, F.H.; Hicham, B. A smart card digital identity check model for university services access. In Proceedings of
the 2nd International Conference on Networking, Information Systems & Security, Rabat, Morocco, 27–29 March 2019; pp. 1–4.
[CrossRef]
24. Ghizlane, M.; Hicham, B.; Reda, F.H. A new model of automatic and continuous online exam monitoring. In Proceedings of the
2019 International Conference on Systems of Collaboration Big Data, Internet of Things & Security (SysCoBIoTS), Casablanca,
Morocco, 12–13 December 2019; pp. 1–5. [CrossRef]
25. Putarek, V.; Pavlin-Bernardić, N. The role of self-efficacy for self-regulated learning, achievement goals, and engagement in
academic cheating. Eur. J. Psychol. Educ. 2020, 35, 647–671. [CrossRef]
Sensors 2023, 23, 4149 21 of 21
26. Mukasa, J.; Stokes, L.; Mukona, D.M. Academic dishonesty by students of bioethics at a tertiary institution in Australia: An
exploratory study. Int. J. Educ. Integr. 2023, 19, 3. [CrossRef]
27. Balderas, A.; Caballero-Hernández, J.A. Analysis of learning records to detect student cheating on online exams: Case study
during COVID-19 pandemic. In Proceedings of the Eighth International Conference on Technological Ecosystems for Enhancing
Multiculturality, Salamanca, Spain, 21–23 October 2020; pp. 752–757. [CrossRef]
28. Wang, P.; Fan, E.; Wang, P. Comparative analysis of image classification algorithms based on traditional machine learning and
deep learning. Pattern Recognit. Lett. 2021, 141, 61–67. [CrossRef]
29. Rogelio, J.; Dadios, E.; Bandala, A.; Vicerra, R.R.; Sybingco, E. Alignment control using visual servoing and mobilenet single-shot
multi-box detection (SSD): A review. Int. J. Adv. Intell. Inform. 2022, 8, 97–114. [CrossRef]
30. Gaba, S.; Budhiraja, I.; Kumar, V.; Garg, S.; Kaddoum, G.; Hassan, M.M. A federated calibration scheme for convolutional neural
networks: Models, applications and challenges. Comput. Commun. 2022, 192, 144–162. [CrossRef]
31. Antelo, C.; Martinho, D.; Marreiros, G. A Review on Supervised Learning Methodologies for Detecting Eating Habits of Diabetic
Patients. In Proceedings of the Progress in Artificial Intelligence: 21st EPIA Conference on Artificial Intelligence, EPIA 2022,
Lisbon, Portugal, 31 August–2 September 2022; pp. 374–386. [CrossRef]
32. Salih, A.A.; Abdulazeez, A.M. Evaluation of classification algorithms for intrusion detection system: A review. J. Soft Comput.
Data Min. 2021, 2, 31–40. [CrossRef]
33. Saba, T.; Rehman, A.; Jamail, N.S.M.; Marie-Sainte, S.L.; Raza, M.; Sharif, M. Categorizing the students’ activities for automated
exam proctoring using proposed deep L2-GraftNet CNN network and ASO based feature selection approach. IEEE Access 2021, 9,
47639–47656. [CrossRef]
34. Malhotra, M.; Chhabra, I. Student Invigilation Detection Using Deep Learning and Machine After COVID-19: A Review on Tax-
onomy and Future Challenges. In Future of Organizations and Work after the 4th Industrial Revolution; Springer: Berlin/Heidelberg,
Germany, 2022; pp. 311–326. [CrossRef]
35. Maschler, B.; Weyrich, M. Deep transfer learning for industrial automation: A review and discussion of new techniques for
data-driven machine learning. IEEE Ind. Electron. Mag. 2021, 15, 65–75. [CrossRef]
36. Kumar, Y.; Gupta, S. Deep transfer learning approaches to predict glaucoma, cataract, choroidal neovascularization, diabetic
macular edema, drusen and healthy eyes: An experimental review. Arch. Comput. Methods Eng. 2023, 30, 521–541. [CrossRef]
37. Charan, A.; Darshan, D.; Madhu, N.; Manjunatha, B.S. A Survey on Detection of Anomalousbehaviour in Examination Hall. Int. J.
Eng. Appl. Sci. Technol. 2020, 5, 583–588. [CrossRef]
38. Shaukat, K.; Luo, S.; Varadharajan, V.; Hameed, I.A.; Xu, M. A survey on machine learning techniques for cyber security in the
last decade. IEEE Access 2020, 8, 222310–222354. [CrossRef]
39. Shaukat, K.; Luo, S.; Varadharajan, V.; Hameed, I.A.; Chen, S.; Liu, D.; Li, J. Performance comparison and current challenges of
using machine learning techniques in cybersecurity. Energies 2020, 13, 2509. [CrossRef]
40. Shaukat, K.; Luo, S.; Varadharajan, V. A novel deep learning-based approach for malware detection. Eng. Appl. Artif. Intell. 2023,
1122, 106030. [CrossRef]
41. Shaukat, K.; Luo, S.; Varadharajan, V. A novel method for improving the robustness of deep learning-based malware detectors
against adversarial attacks. Eng. Appl. Artif. Intell. 2022, 116, 105461. [CrossRef]
42. Luo, S.; Shaukat, K. Computational Methods for Medical and Cyber Security; MDPI: Basel, Switzerland, 2022.
43. Alam, T.M.; Mushtaq, M.; Shaukat, K.; Hameed, I.A.; Sarwar, M.U.; Luo, S. A novel method for performance measurement of
public educational institutions using machine learning models. Appl. Sci. 2021, 11, 9296. [CrossRef]
44. Shaukat, K.; Nawaz, I.; Aslam, S.; Zaheer, S.; Shaukat, U. Student’s performance in the context of data mining. In Proceedings of
the 2016 19th International Multi-Topic Conference (INMIC), Islamabad, Pakistan, 5 December 2016; IEEE: Piscataway, NJ, USA,
2016; pp. 1–8.
45. Shaukat, K.; Nawaz, I.; Aslam, S.; Zaheer, S.; Shaukat, U. Student’s Performance: A Data Mining Perspective; LAP Lambert Academic
Publishing: London, UK, 2017.
46. Shaukat, K.; Luo, S.; Abbas, N.; Alam, T.M.; Tahir, M.E.; Hameed, I.A. An analysis of blessed Friday sale at a retail store
using classification models. In Proceedings of the 2021 4th International Conference on Software Engineering and Information
Management, Yokohama, Japan, 16 January 2021; pp. 193–198.
47. Ibrar, M.; Hassan, M.A.; Shaukat, K.; Alam, T.M.; Khurshid, K.S.; Hameed, I.A.; Aljuaid, H.; Luo, S. A Machine Learning-Based
Model for Stability Prediction of Decentralized Power Grid Linked with Renewable Energy Resources. Wirel. Commun. Mob.
Comput. 2022, 2022, 2697303. [CrossRef]
48. Dar, K.S.; Shafat, A.B.; Hassan, M.U. An efficient stop word elimination algorithm for Urdu language. In Proceedings of the 2017
14th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology
(ECTI-CON), Phuket, Thailand, 27 June 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 911–914.
49. Shaukat, K.; Masood, N.; Khushi, M. A novel approach to data extraction on hyperlinked webpages. Appl. Sci. 2019, 9, 5102.
[CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.