
Computers and Education: Artificial Intelligence 6 (2024) 100231


The power of Deep Learning techniques for predicting student performance in Virtual Learning Environments: A systematic literature review

Bayan Alnasyan*, Mohammed Basheri, Madini Alassafi
Department of Information Technology, King Abdulaziz University, Jeddah, 21589, Saudi Arabia

Keywords: Deep Learning; Machine Learning; Ensemble Learning; Hybrid model; Prediction; Learner performance

Abstract

With the advances in Artificial Intelligence (AI) and the increasing volume of online educational data, Deep Learning techniques have played a critical role in predicting student performance. Recent developments have assisted instructors in determining the strengths and weaknesses of student achievement. This understanding supports adopting the necessary interventions to help students improve their performance, assist students at risk of failure, and reduce dropout rates. The review analyzed 46 studies published between 2019 and 2023 that apply one or more Deep Learning (DL) techniques, either alone or in combination with Machine Learning (ML) or Ensemble Learning techniques. The reviewed studies utilized datasets from public Massive Open Online Courses (MOOCs), private Learning Management Systems (LMSs), and other platforms. Four categories were used to group the features: demographic, previous academic performance, current academic performance, and learning behavior/activity features. The analysis revealed that DNN and CNN-LSTM models were the most common techniques. Moreover, the studies that used DL techniques such as CNNs, DNNs, and LSTMs performed well, achieving prediction accuracy above 90%; the remaining studies achieved accuracy ranging from 60% to 90%. Regarding the datasets used within the reviewed studies, even though 44% of the studies used LMS datasets, the Open University Learning Analytics Dataset (OULAD) was the most used MOOC dataset. The analysis of grouped features shows that, among the various categories examined, learning behavior and activity features stand out as the most significant predictors, suggesting that students' engagement with their learning environment through their overall participation offers crucial insights into their success. These educational prediction findings hopefully serve as a strong foundation for administrators and instructors to observe student performance and provide suitable educational adaptations that meet students' needs, protect them from failure, and prevent dropout.

* Corresponding author.
E-mail addresses: [email protected] (B. Alnasyan), [email protected] (M. Basheri), malasafi@kau.edu.sa (M. Alassafi).
https://doi.org/10.1016/j.caeai.2024.100231
Received 26 December 2023; Received in revised form 21 April 2024; Accepted 25 April 2024
Available online 3 May 2024
2666-920X/© 2024 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

The rapid expansion of Virtual Learning Environments (VLEs) has significantly influenced the educational landscape by offering open-access, large-scale learning opportunities to the global community through MOOCs, or by enhancing specific organizational and institutional learning via LMSs. These environments include, but are not limited to, educational institutions such as schools, colleges (e.g., Open University), vocational training centers (e.g., Codecademy and Coursera), and universities, as well as structures beyond traditional education, such as online academies (e.g., Khan Academy and Udemy) and intelligent tutoring systems (e.g., ASSISTments). These VLEs expand the scope of education and increase flexibility and accessibility for students worldwide, offering students the convenience of submitting homework, participating in online discussions, and taking exams remotely (Magalhães et al., 2020, Pem et al., 2021). Moreover, these environments keep track of information for each student's learning process, such as how often a student logs into the system or accesses certain course materials, and the time spent responding to questions. These activities lead to the production of vast amounts of student data, which can then be used to evaluate the success of the educational process, discover learning patterns, develop systems to evaluate student performance, understand student needs, and finally, recognize their future demands (Agudo-Peregrina et al., 2014, Magalhães et al., 2020, Pem et al., 2021). As a result of the rapid advancements in technology being leveraged to analyze vast amounts of student data, Learning Analytics (LA), Predictive Learning Analytics (PLA), and Educational Data Mining (EDM) have shown promising results when used to resolve several educational issues. These fields collaborate to improve educational outcomes through data-driven insights, each with its own specialized focus (Siemens & Baker, 2012, Papamitsiou & Economides, 2014, Du et al., 2020).

LA focuses on the collection, analysis, and reporting of data about students, with the aim of understanding student behaviors and environments (Mah, 2016, Viberg et al., 2018, Schumacher & Ifenthaler, 2018). As a specialized subset of LA, PLA leverages statistical models and predicting methods to anticipate future learning patterns, outcomes, and student behaviors using past and present data (Calvert, 2014, Wagner & Longanecker, 2016, Herodotou et al., 2020). EDM additionally applies computational techniques to analyze educational data, aiming to answer research questions and address learning-related problems (Siemens & Baker, 2012, Papamitsiou & Economides, 2014, Hernández-Blanco et al., 2019). Both PLA and EDM utilize advanced data analytics techniques, from ML, within the broader field of AI, to sophisticated DL as a subfield of ML. The application of DL in these two fields has opened new avenues for predictive modeling, such as the prediction of student performance and student learning outcomes, and the early detection of students at risk of failure or dropout. These predictions provide administrators and instructors with insights about students, which thereby enable them to support these students as early as possible through tutoring, personalized learning experiences, feedback and guidance, and other forms of support (Baneres et al., 2023, El-Sabagh, 2021). Moreover, these predictions help students better understand their own strengths and weaknesses, as well as identify the areas where they might benefit from extra assistance. Based on this, the term "predicting student performance" refers both to predicting students' overall outcomes and to predicting students who may be at risk of failure or dropout, of not graduating at the planned time, or of not meeting an educational goal (The Glossary of Educational Reform, 2013, at https://www.edglossary.org/professional-development/). A range of evaluation techniques, such as exams, tests, essays, projects, and other assignments, is used to determine overall performance and achievement. Overall outcomes may consider additional aspects of success, such as behavior, attendance, and social-emotional development (Aljaloud et al., 2022, Yang & Bai, 2022). As student performance can be assessed through various indicators, numerous researchers have defined student performance in different ways. Biggs (1996) referred to learning outcomes as the specific knowledge, skills, and abilities that students are expected to acquire through their educational experiences. Hattie (2008) measured academic achievement by student grades, grade point averages (GPAs), or scores on standardized tests. Moreover, Fredricks et al. (2004) defined student performance in terms of engagement and participation, including attendance, participation in class discussions, and involvement in extracurricular activities. Marzano (2003) defined student performance as the measurable outcomes of a student's academic efforts, including their mastery of subject content, problem-solving abilities, critical thinking skills, and overall academic success. Similarly, the National Academies of Sciences, Engineering, and Medicine (2012) assessed student performance based on the gaining of specific skills, such as critical thinking, problem-solving, and communication skills. In addition, Tinto (2012) defined performance as the progress students make towards completing their degree or certification within a specified time frame. Biggs et al. (2022) described it as the extent to which students have achieved the desired learning outcomes or educational objectives set by the curriculum or educational standards, based on the knowledge, skills, and competencies acquired by students through their educational experiences. Furthermore, Lerner and Steinberg (2009) viewed student performance as the overall growth, development, and improvement exhibited by students in areas such as social skills, emotional intelligence, self-regulation, and personal values. Each of these definitions highlights different aspects of student performance considered in evaluating student success.

To explore how the integration of PLA and EDM grounded in the principles of AI intersects with educational theories, researchers have become increasingly interested in studies of VLEs, and have turned their attention towards the rich landscape of educational theoretical frameworks. These frameworks not only provide valuable insights into predicting student performance but also offer a deep understanding of the features influencing students' learning processes. These theories are categorized into environmental factor theories, individual-focused theories, and interactions-focused theories. For environmental factors, theories such as Tinto's Model of Student Integration (1975), Walberg's Hypothesis of Educational Productivity (1981), Astin's Input-Environment-Outcome Model (1984), and Biggs's 3P Model (1999) offer valuable insights into the relationship between student learning characteristics and their performance (Haertel et al., 1983). For instance, Tinto's model emphasizes the impact of socioeconomic status, academic experiences, demographics, family background, skills, abilities, and prior schooling experiences on students' academic levels (Tinto, 1975), while Walberg's hypothesis identifies key factors, such as classroom climate, home environment, age/developmental level, and student ability/prior achievement, as necessary for educational achievement (Walberg et al., 1986). Meanwhile, Astin's model considers the physical and psychological energy students invest in the educational process, and student involvement in academic and extracurricular activities, to be important factors related to student learning and development that therefore improve academic achievement. Studies by Astin (1993), Fraser et al. (1987), and Fullarton (2002) emphasize the crucial role of student engagement, together with other psychosocial factors, in influencing student performance. Biggs proposes the Presage-Process-Product (3P) model, which delves into how students' backgrounds, learning styles, and educational environments collectively influence academic success (Biggs, 1999). Among individual-focused theories, and beyond the environment, Self-Determination Theory (SDT) by Deci and Ryan (2012) highlights the role that relatedness, competence, and autonomy play in promoting engagement. Meanwhile, Expectancy-Value Theory (EVT) by Wigfield and Eccles (2000) shows how expectations (confidence in achievement) and values (perceived relevance of activities) affect how well an individual performs. While grades have an impact on students' expectations of future success, which in turn influence their academic engagement and achievements, demographic backgrounds can have an impact on students' belief in their own abilities and the significance they place on academic assignments. Interactions-focused theories, including the Community of Inquiry (CoI) framework by Garrison et al. (2001), emphasize the critical role of social, cognitive, and teaching interactions in online learning environments for enhancing student engagement and success. Furthermore, the Dynamic Model of Educational Effectiveness (DMEE) (Kyriakides, 2008), the Multilevel Model of Student Achievement (MLMSA) (Creemers & Kyriakides, 2010), and the Integrated Multilevel Model of Education (Scheerens & Blomeke, 2016) are multilevel frameworks that explore the complex interactions between student, classroom, school, and system-level factors to enhance educational outcomes. While DMEE focuses on the dynamic interplay of instructional practices and school leadership, MLMSA and the Integrated Model delve into the statistical analysis and integration of individual, classroom, and school influences on student learning. These various educational theories identify the most significant indicators of student success and guide the way these indicators are used in advanced AI techniques. This theoretical basis not only acts as a standard for assessing the relevance and efficacy of these indicators when applied to AI techniques, but also helps improve a model's validity and reliability, making it more educationally meaningful. These theories guide the selection of student features to ensure that the variables considered are both theoretically grounded and empirically proven to affect student outcomes, which enhances understanding of how student outcomes can be predicted. The process of categorizing student features is guided by these theories through highlighting which features are more likely to be significant predictors of student performance, thereby allowing these features to be prioritized in prediction. Lastly, the reliance on the indicators provided by these educational theories directs the development of theory-informed educational technologies, allowing researchers and educators to gain insights into the learning process and design more effective educational interventions that are responsive to students' needs.


Hence, the main objective of the review is to comprehensively map, measure, and evaluate the studies released between 2019 and 2023 on Deep Learning techniques utilized to predict student performance.

The main contributions of the review are as follows. The review:

• Identifies different Deep Learning techniques developed for predicting student performance.
• Categorizes the various data sources and features used in the predictive models.
• Evaluates the models' performance in predicting student performance.
• Discusses the challenges and future work for applying DL research based on the knowledge acquired in the review.

The review is organized as follows: Section 2 presents and compares previous studies of both ML and DL techniques in the education field; Section 3 then describes the methodologies used to collect the studies that will be reviewed; after that, Section 4 summarizes the current studies with their main ideas and limitations based on the applied technique; Section 5 discusses DL techniques and describes the public and private datasets and the features used in each article; Section 6 presents the challenges; and finally, Section 7 concludes the paper.

2. Related works

Several researchers have reviewed and summarized investigations of ML, DL, and Ensemble Learning techniques utilized to predict student performance. A systematic review by Hernández-Blanco et al. (2019) highlighted the tasks of EDM that have benefited from DL and what needs to be further investigated. Some of these tasks include student performance prediction, the profiling and grouping of students, the generation of recommendations, and undesirable student behavior detection. Moreover, the review categorized and explained both the public and private datasets used for testing and training DL models in EDM tasks. In addition, the review provided an outline of the fundamental ideas, primary structures, and configurations of DL applied to EDM. Finally, according to their research, DL outperformed conventional ML baselines in 67% of the studies examined.

Namoun and Alshanqiti (2021) investigated the prediction of student outcomes using ML models and data mining. They focused on three areas: how learning outcomes can be predicted, the predictive analytics models used to predict student learning, and the key variables influencing student performance. Achievement grades and performance class standards were used to gauge how well students had learned their objectives. The review mentioned that the most apparent indicators of learning outcomes were student academic mood, term assessment grades, and student online learning activities. Another systematic literature review, by Albreiki et al. (2021), identified the relevant EDM research on students at risk of failure and on dropout rates. The review's findings showed that numerous ML approaches have been utilized to comprehend and address underlying issues, including the prediction of student dropout rates and of students at risk of failing ongoing courses in academic institutions. The review assessed student performance based on both dynamic and static data, and demonstrated that ML techniques were crucial for identifying students at risk of failure and dropout rates. Moreover, it recommended solutions, including adopting in-time feedback, which can help both instructors and students tackle issues and raise student achievement.

Nawang et al. (2021) discussed the use of EDM, LA, and ML to predict student performance in secondary schools and higher educational institutions. The review provided an overview of the methods and algorithms used in the prediction, and identified the features that have the most significant effects on performance. It was mentioned that the academic feature category outperforms other feature categories, and is where the majority of features are utilized when predicting student performance. Another finding from the review is that, within academic performance, GPA, exam scores, and marks are the best indicators of performance.

Baashar et al. (2022) examined the Artificial Neural Network (ANN) techniques used to predict student performance at higher education institutes. The review demonstrated that the most common input variables used in the studies were demographic features and student test marks, whereas other studies used cognitive features. High accuracy was reported when ANNs were used along with data mining algorithms and data analysis to predict student outcomes and performance, evaluated by the effectiveness of the findings in measuring academic accomplishment. The authors indicated that the sample size, level, educational background, and study environment had no bearing on the accuracy of the methodology.

A preliminary search was performed to ensure that no reviews had been conducted previously with the same details. Table 1 summarizes the findings of the reviews that are most relevant to this review. Most of the reviewed surveys considered studies that apply ML techniques, while a few of them considered studies that use DL techniques in predicting student performance. Previous reviews generally covered the use of DL techniques, but did not emphasize the contribution of hybrid techniques to improving prediction accuracy. Moreover, most reviews in the area do not emphasize the necessity of integrating various feature categories. The current review focuses on the most recently published articles that have applied one or more DL techniques, or combined DL with ML techniques, to predict student performance. Additionally, it describes the datasets used, and then discusses the importance of integrating various feature categories to identify the qualities best suited for algorithm adaptation.

3. Research methods

The research methods described in this section follow the systematic literature review process identified by Okoli (2015) to ensure completeness and consistency in evaluating the various studies in the area of predicting student performance. Fig. 1 shows the stages followed to guarantee a comprehensive systematic literature review procedure (Albreiki et al., 2021).

Fig. 1. Okoli's (2015) guide for conducting a systematic literature review (Albreiki et al., 2021).

3.1. Research questions

The review formulates research questions to explore and understand the various aspects of the DL predictive techniques devised for predicting student performance. Accordingly, to guarantee a comprehensive comparison, the review addresses the following issues:

RQ1: Which techniques were applied in the selected studies?
RQ2: Which VLEs do the datasets belong to?
RQ3: What predictive features were chosen?
RQ4: What are the challenges and future directions?


Table 1
Summary of the previously conducted surveys.

Reference | Topic | Technique | Dataset Features | Dataset
Hernández-Blanco et al. (2019) | A Systematic Review of Deep Learning Approaches to EDM | DL | No | Yes (MOOC)
Namoun and Alshanqiti (2021) | Predicting Student Performance Using Data Mining and Learning Analytics Techniques: A Systematic Literature Review | ML | No | No
Albreiki et al. (2021) | A Systematic Literature Review of Student Performance Prediction Using Machine Learning Techniques | ML | No | Yes (MOOC & LMS)
Nawang et al. (2021) | A systematic literature review on student performance predictions | ML | Yes | No
Baashar et al. (2022) | Toward Predicting Student Performance Using ANNs | ML | Yes | No
Current Review | The Power of Deep Learning Techniques for Predicting Student Performance in Virtual Learning Environments: A Systematic Literature Review | DL | Yes | Yes (MOOC & LMS)

Table 2
Number of found and selected studies.

Database | URL | Access Date | Results after Search String Query | Results after Applying Inclusion and Exclusion Criteria
MDPI | https://www.mdpi.com | 7 March 2023 | 65 | 16
Springer | https://link.springer.com | 7 March 2023 | 96 | 7
IEEE | https://ieeexplore.ieee.org | 9 March 2023 | 52 | 8
Science Direct | https://www.sciencedirect.com/ | 7 March 2023 | 73 | 8
Scopus | https://www.scopus.com | 8 March 2023 | 34 | 7

3.2. Research process

The search process first derives the search string related to the research questions, then determines the electronic databases used, after that describes the inclusion and exclusion selection criteria, and finally provides the quality assessment criteria.

To capture most of the studies that apply one or more DL techniques, or combine DL with ML or Ensemble Learning techniques, to predict student performance in terms of achievement, completion of courses, and students at risk of failure, the search is done using the following search string:

("Deep_Learning")
AND
("Artificial_Intelligence" OR "Machine_Learning" OR "Ensemble_Learning")
AND
("Predict_Student_Performance" OR "Predict_Student_Outcome" OR "Predict_Student_at_Risk")
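For illustration only (this is not the authors' tooling), the boolean structure of this search string can be applied programmatically when screening exported bibliographic records; the record structure and lower-cased keyword lists below are assumptions made for the sketch.

# A minimal sketch of abstract-level screening with the review's
# boolean structure: (A) AND (any of B) AND (any of C).
GROUP_A = ["deep learning"]
GROUP_B = ["artificial intelligence", "machine learning", "ensemble learning"]
GROUP_C = ["predict student performance", "predict student outcome",
           "predict student at risk"]

def matches_search_string(abstract: str) -> bool:
    """True when an abstract satisfies all three keyword groups."""
    text = abstract.lower()
    return (any(term in text for term in GROUP_A)
            and any(term in text for term in GROUP_B)
            and any(term in text for term in GROUP_C))

# Hypothetical exported records; only the first satisfies all groups.
records = [
    {"title": "Study 1", "abstract": "A deep learning and machine learning "
                                     "model to predict student performance."},
    {"title": "Study 2", "abstract": "An ensemble learning study of churn."},
]
candidates = [r for r in records if matches_search_string(r["abstract"])]
print(len(candidates))  # -> 1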
Six digital online sources were searched, starting from Google Scholar and including IEEE, Springer, Science Direct, MDPI, and Scopus. Based on the search string, for the period between the 1st of January 2019 and the 3rd of January 2023, a total of 320 studies on DL techniques were found, as shown in Table 2: 65 came from MDPI, 96 from Springer, 52 from IEEE, 73 from Science Direct, and 34 from Scopus. The studies taken from Scopus were not included in the other previously mentioned databases.

To establish a robust and reliable framework for the review, inclusion and exclusion criteria are determined as follows:

Conditions for inclusion

• Studies that predict student performance, achievement, outcome, and risk of failure.
• Studies that used DL techniques, or combined DL with ML or Ensemble Learning techniques.
• Studies between the 1st of January 2019 and the 3rd of January 2023.
• Studies that were published in peer-reviewed journals only.
• Studies in the English language.

Conditions for exclusion

• Studies that predict student academic dropout/withdrawal/retention.
• Studies that used only ML or Ensemble Learning techniques, without combining any DL techniques.
• Studies that were published in conferences, editorials, posters, patents, or technical reports.

The review only considers journal articles, to guarantee a high level of research quality, as these articles normally go through a more rigorous peer-review procedure. For the 320 collected studies, the search focused on the abstracts to find the studies that addressed the research questions, because searching inside the full text brings up many unrelated studies, such as articles that applied only ML or only Ensemble Learning techniques. Since the search string might merely have been referenced in a study's introduction or related works, such studies were first considered and then filtered. Some of the filtered studies were still unrelated and were filtered further. The further filtering criteria require that studies present all of the following: the developed model types, the dataset used, the features selected, and the evaluation measurements applied. Each study was examined individually, and those not meeting the search criteria were excluded. Finally, a total of 46 studies were considered. Fig. 2 provides the detailed study selection procedure as identified by Okoli (2015). Moreover, Fig. 3 shows the number of studies used in the review based on the year of publication.

The review defines the quality criteria as follows:
QC1: Are the goals of the review well-defined?
QC2: Are suggested techniques well described?
QC3: Is the achieved accuracy verified and measured?
QC4: Are the review's challenges and limitations clearly placed?


Fig. 2. Detailed study selection procedure as identified by Okoli (2015).

A structured narrative and thematic analysis method are used in the review. First, each selected article is summarized with a focus on the developed techniques. Subsequently, a comparative analysis evaluates the accuracy achieved by each article that uses similar techniques. The datasets are then analyzed, with a focus on their sizes, types, sources, and the distinctive characteristics that may have an impact on model performance. The features employed in these models, and their influence on the outcomes, are categorized and assessed using a thematic analysis. To give a comprehensive overview of the present status of research in evaluating predictive models, this synthesis attempts to integrate techniques, model accuracy, dataset characteristics, feature analysis, gaps, and future directions for further investigation.

Fig. 3. The number of studies used in the review based on the year of publication.

4. Student performance prediction models

DL techniques have been applied in the educational domain to resolve several issues, such as the prediction of student performance (Jiao, 2022, Liu et al., 2023, Sikder et al., 2022, Xiong et al., 2022), student learning outcomes (Aljaloud et al., 2022, Yang & Bai, 2022), and the early detection of students who are at risk of failure (Aljohani et al., 2019, Brdesee et al., 2022, He et al., 2020, Huang et al., 2022, T. Liu et al., 2022, Waheed et al., 2022). This section presents a summary of the selected 46 journal articles that applied either single or hybrid DL techniques, or combined DL with ML or Ensemble Learning techniques, to predict student performance.

4.1. Deep Neural Networks (DNNs)

Table 3 summarizes all reviewed studies that used DNN techniques. To predict students at risk of failure and provide strategies for early intervention in such circumstances, Waheed et al. (2019) used five layers of DNNs on a set of features collected from demographic variables and student clickstream activity from a VLE. The DNN model's effectiveness is assessed over four different categories, including distinction-fail, distinction-pass, and withdrawn-pass, with each category classified using binary classification. The findings demonstrated that the proposed DNN model performed better than the underlying machine learning models, achieving an accuracy of 93%.
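As a concrete illustration of this kind of feed-forward DNN classifier, the following is a minimal Keras sketch for binary pass/fail prediction from tabular demographic and clickstream features. The input size, layer widths, and training settings are illustrative assumptions, not the configuration reported by Waheed et al. (2019).

# A minimal sketch: five dense layers on 20 tabular features, with a
# sigmoid output giving a pass/fail probability.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(pass)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Dummy data standing in for an OULAD-style feature matrix.
X = np.random.rand(256, 20).astype("float32")
y = np.random.randint(0, 2, size=(256, 1))
model.fit(X, y, epochs=3, batch_size=32, verbose=0)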
Another study, by Hussain et al. (2019), aimed to predict student outcomes using student internal assessment data from past semesters. The model is based on DL techniques that use a sequential neural model with the Adam optimization method. The records of 10140 students from three colleges in Assam, India were used to test the model. Compared to other classification methods, such as AdaBoost and the Artificial Immune Recognition System, the highest classification accuracy achieved was 95.34%, produced by the DNN techniques. Other results included a precision of approximately 96%, a recall of approximately 99%, and an F-Score of approximately 98%.
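The accuracy, precision, recall, and F-Score figures quoted throughout this section are standard classification metrics; a minimal scikit-learn sketch with placeholder labels shows how they are typically computed.

# How the headline figures in this section are computed; y_true and
# y_pred here are placeholders, not data from any reviewed study.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 1]   # actual pass (1) / fail (0)
y_pred = [1, 0, 1, 0, 0, 1, 1, 1]   # model predictions

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1-score :", f1_score(y_true, y_pred))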
Using DL-based long-sequence generation, Wang et al. (2020) addressed the issue of fine-grained student performance prediction in an online course by structuring the student's characteristics as a matrix with elements missing in some places. An adaptive sparse self-attention network was designed to anticipate fine-grained performance and to generate the missing data. The matrix representation of the features enables position-wise feature selection, making it easier to identify the elements from the previous learning stage that are most associated, which results in an embedding of the original features and their spatial associations. To achieve sequence generation, a DNN model was built with several sparse self-attention layers placed on top of one another. An experiment using three datasets from various e-learning platforms demonstrated the benefits and effectiveness of the proposed model.

Table 3
Summary of studies that used DNN techniques.

Waheed et al. (2019). Dataset: MOOC (OULAD), 32,593 students. Features (20): demographic and learning behavior/activity. Technique: DNN (five layers). Evaluation: accuracy 88%, sensitivity 69%, precision 93%. Contributions: used student clickstream data to assess the extent to which students interact with the virtual environment; students are categorized into 'withdrawn-pass', 'pass-fail', 'distinction-pass', and 'distinction-fail'. Limitations: the model assumes the student's behavior is treated equally throughout the course; individual student behavior patterns were not considered.

Hussain et al. (2019). Dataset: LMS of three colleges in Assam, India (Digboi College, Duliajan College, and Doomdooma College), 10140 records. Features (10): academic. Technique: DNN (two layers). Evaluation: precision 96%, recall 99%, F1-Score 98%, accuracy 95.34%. Contributions: predicted student results using internal assessment data from previous semesters. Limitations: apparent overfitting between training and test data.

Wang et al. (2020). Datasets: WorldUC (10,523 records, 10 lessons), Liru (1046 records, 18 lessons), and Junyi (2,063 records, 18 lessons). Features: learning behavior/activity (WorldUC: 9, Liru: 12, Junyi: 25). Technique: DNN (three layers). Evaluation: WorldUC MAE 3.91, RMSE 7.5; Liru MAE 6.8, RMSE 7.7; Junyi MAE 8.5, RMSE 11.0. Contributions: built a deep neural network with several sparse self-attention layers stacked together to achieve sequence generation. Limitations: the adaptive sparse self-attention mechanism increases model complexity, leading to longer training times; the model needs to be tested on datasets with different lengths, contents, and formats to confirm its validity.

Tsiakmaki et al. (2020). Dataset: LMS (private dataset of five courses from two undergraduate programs), about 900 students across all courses. Features: demographic and learning behavior/activity (between 33 and 51 for a combination of two courses). Technique: DNN (four layers). Evaluation: accuracy up to 75.6% (C1), 60.9% (C2), 77.1% (C3), 68.1% (C4), and 79.5% (C5); the accuracy with combined features reaches up to 86%. Contributions: a transfer learning methodology for predicting student performance by combining different numbers of features. Limitations: the complexity of hyperparameter tuning in transfer learning scenarios is not addressed; the effectiveness of transfer learning in DNNs is limited by the similarity between source and target tasks, and differences can hinder model performance.

Aslam et al. (2021). Dataset: MOOC (Mathematics, 395 records; Portuguese language, 649 records). Features (20): demographic and academic. Technique: DNN (eight layers). Evaluation: Mathematics accuracy 93%, precision 94.9%, recall 90.3%, F1-Score 92.6%; Portuguese accuracy 96.4%, precision 99%, recall 93.3%, F1-Score 96.2%. Contributions: a model for early performance prediction using SMOTE to reduce the risk of model overfitting. Limitations: data imbalance, which leads to model overfitting; the model needs further validation on a large and balanced dataset.

Li and Liu (2021). Dataset: multidisciplinary university, 83,993 students and 3,828,879 records. Features: academic (number not reported). Technique: DNN (five layers). Evaluation: MAE 0.59, MSE 0.78. Contributions: a DNN with variable nodes and hidden layers that extracts new features and finds their weights, used to predict student performance. Limitations: the features used in the study are not mentioned; the process of updating extracted features and their weights was very sensitive.

Nabil et al. (2021). Dataset: LMS (public 4-year university), 4266 records. Features (12): demographic, academic, and learning behavior/activity. Technique: DNN (six layers). Evaluation: accuracy, F1-Score, and sensitivity 89%. Contributions: used various re-sampling techniques to address the imbalanced classes. Limitations: SMOTE is limited in capturing the minority class's characteristics, which affects prediction performance; other oversampling techniques, such as SVM-SMOTE and Borderline-SMOTE, were not used; accuracy could improve by adding data from more semesters.

Hidalgo et al. (2021). Dataset: LMS (provided by the International University of La Rioja, extracted from the Sakai educational platform), 5,066 records of 500 students studying 4 courses. Features (18): academic and learning behavior/activity. Technique: DNN (three layers). Evaluation: accuracy 67.2%, precision 67%. Contributions: optimized the use of DNNs and Meta-Learning, which provided equal performance with less effort. Limitations: limited dataset for a DL prediction model; low performance achieved.

Lee et al. (2021). Dataset: two MOOC courses from National Tsing Hua University (NTHU), 1317 student records. Features: learning behavior/activity (video-watching: 16; exercise: 8). Technique: DNN (eight layers). Evaluation: accuracy 97.5%. Contributions: first, analyzed students' learning behaviors while watching videos to evaluate their learning performance; second, used a novel exercise-based model to predict whether students would correctly answer examination questions on relevant concepts. Limitations: small dataset (about 1300 records).

Yang and Bai (2022). Dataset: MOOC (OULAD), 412 courses with 600 students. Features: demographic, academic, and learning behavior/activity (first-order, second-order, and two different high-order features). Technique: DNN (three layers). Evaluation: RMSE 84%, MAE 66.7%, MAPE 21%, recall 95%, F1-Score 90.7%, precision 90.7%, AUC 82%, accuracy 86.6%. Contributions: the ADMF model predicts students' scores in the next semester by integrating a self-attention mechanism and depth matrix decomposition. Limitations: the model did not incorporate the temporal aspect of student learning; the focus on final scores leads to the omission of varied student-related data; efficacy is constrained by dataset quality, with sparse datasets significantly limiting the ability to generalize predictions.

Jiao (2022). Dataset: MOOC (OULAD), 7 courses, 32,593 students, and 10,655,280 data records. Features (54): academic and learning behavior/activity. Technique: FDPNN (four layers). Evaluation: recall 95%, AUC 82%, accuracy 86.6%, precision 86.8%. Contributions: proposed an FDPNN based on feature combinations, containing an embedding layer, a concatenate layer (factorization machine, DNN, PNN), and a prediction layer. Limitations: the model's design is too specialized, which limits its applicability to other tasks; the feature combination was not explained in the paper.

Rahman et al. (2022). Dataset: LMS (academic records from a reputed university in Bangladesh), 398 instances. Features (43): academic. Technique: DNN (two layers) with BERT. Evaluation: MSE 0.007, MAE 0.064, MAPE 1.951. Contributions: a novel method integrating DNNs with BERT, leveraging both numerical and textual data for the early prediction of student outcomes. Limitations: limited dataset and limited evaluation measurements; combining a DNN with BERT increases the computational resources required and enlarges the model-tuning process; using numerical data with BERT limits its effectiveness.

Liu et al. (2022a). Dataset: LMS (a university in Shenyang), 55 students and 3976 pieces of processed data. Features (26 variables): course information and academic. Technique: feedforward spiking neural network. Evaluation: accuracy 70.8%, precision 79%, recall 78%, F1-Score 79%. Contributions: predicted students' grades by studying the relationship between the content of the courses students study and their test scores. Limitations: limited experimental data samples; limited interpretability of the model, making it challenging to understand the basis for predictions.

Liu et al. (2022b). Datasets: Alibaba Cloud Tianchi's xAPI-Edu-Data (480 student records, 17 variables) and the UCI machine learning library (395 student records, 33 variables). Features: demographic, academic, and learning behavior/activity. Technique: evolutionary spiking neural network. Evaluation: accuracy 81.4%, precision 94%, recall 97%, F1-Score 95%. Contributions: used student behavior/activity data to assess the present level of learning, changes in test scores, and the need for early warning. Limitations: focused on a single aspect of educational data mining, without exploring advanced analysis techniques or expanding the case samples for a comprehensive evaluation of the model's accuracy and practical applicability.

Tao et al. (2022). Dataset: LMS (private dataset from the College of Metallurgical Engineering at a university in Anhui Province), 1683 students. Features (28): academic. Technique: DNN (eight layers). Evaluation: not reported. Contributions: predicted unknown course grades based on previous learning situations. Limitations: limited dataset; the paper did not provide any measurements of how the model performed.

To determine whether transfer learning expedites training and enhances performance using the potential of DNNs, Tsiakmaki et al. (2020) suggested an approach to predict student performance. The dataset used in the experiment was a collection of five datasets representing five undergraduate courses, each of which was hosted on the Moodle platform and lasted for one semester. Twenty distinct pairs of datasets were created by individually matching the features of associated courses. The DNN model was trained using the dataset from the first course and, after a predetermined number of epochs, was applied to the dataset from the second course for additional training. The results showed that while the proposed strategy easily outperformed the baseline model, only a fair performance was generally attained.
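A minimal sketch of this course-to-course transfer idea, under the assumption of two hypothetical courses with matched feature columns (X_course_a, X_course_b): the same network is pretrained on the first course and then fine-tuned on the second. This is an illustration of the general pattern, not Tsiakmaki et al.'s implementation.

# Pretrain on a source course, then continue training on a target
# course whose features have been aligned to the same columns.
import numpy as np
import tensorflow as tf

def build_model(n_features: int) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

rng = np.random.default_rng(0)
X_course_a = rng.random((300, 40), dtype=np.float32)  # source course
y_course_a = rng.integers(0, 2, 300)
X_course_b = rng.random((120, 40), dtype=np.float32)  # target course
y_course_b = rng.integers(0, 2, 120)

model = build_model(n_features=40)
model.fit(X_course_a, y_course_a, epochs=5, verbose=0)  # pretraining
model.fit(X_course_b, y_course_b, epochs=5, verbose=0)  # fine-tuning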
Aslam et al. (2021) examined data from two courses, including information on students' demographic, socioeconomic, educational, and course grade data. The Synthetic Minority Over-sampling Technique (SMOTE) was proposed to address the imbalance problem of the datasets. The DNN model's accuracy in the early prediction of student performance was 96% for the Portuguese course dataset and 93% for the mathematics course dataset. The achieved accuracy demonstrated the extent to which DL techniques can obtain high results. However, some challenges can be seen in the study: one was the data imbalance, which may cause model overfitting; another was that the model requires further validation using larger and more balanced datasets.
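SMOTE, as used by Aslam et al. (2021) and several other reviewed studies, synthesizes new minority-class samples along the lines connecting minority neighbors; a minimal sketch using the imbalanced-learn package on synthetic data:

# Before/after class counts for a SMOTE oversampling step; the
# dataset here is synthetic and purely illustrative.
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=20,
                           weights=[0.9, 0.1], random_state=42)
print("before:", Counter(y))             # heavily imbalanced

X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print("after :", Counter(y_res))         # classes balanced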
Li and Liu (2021) employed a DNN method for prediction by extracting valuable data as features with matching weights. The NN architecture used several updated hidden layers, managed through feed-forward and backpropagation passes generated from prior cases. The system was trained using labeled data from a dataset in training mode, and was then evaluated in testing mode. The results of the proposed method were effective and deserving of consideration, achieving a best prediction of 0.59 MAE and 0.785 RMSE.

Nabil et al. (2021) identified students who were at risk of failure at an early stage of the semester, based on their previous grades. The dataset comprised 4266 records with demographic, academic, and behavioral features from a public university dataset. Various re-sampling techniques, including SMOTE, ROS, ADASYN, and SMOTE-ENN, were used to address the imbalanced classes, thereby enhancing model performance and producing accurate results. The results showed that the best outcome was obtained when training the predictive model using a DNN and a dataset balanced with SMOTE as the oversampling method, with an accuracy of up to 89%.
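The alternative re-sampling strategies that Nabil et al. (2021) compare are all available in imbalanced-learn; the following sketch contrasts them on the same synthetic imbalanced dataset (the data are illustrative, not the study's records).

# Compare the class counts produced by each re-sampling strategy.
from collections import Counter
from imblearn.over_sampling import SMOTE, RandomOverSampler, ADASYN
from imblearn.combine import SMOTEENN
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=600, n_features=12,
                           weights=[0.85, 0.15], random_state=7)
samplers = [("SMOTE", SMOTE(random_state=7)),
            ("ROS", RandomOverSampler(random_state=7)),
            ("ADASYN", ADASYN(random_state=7)),
            ("SMOTE-ENN", SMOTEENN(random_state=7))]
for name, sampler in samplers:
    _, y_res = sampler.fit_resample(X, y)
    print(name, Counter(y_res))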
Hidalgo et al. (2021) examined the potential of Deep Learning and Meta-Learning in the context of predicting student performance. The DNN model was created manually using tried-and-true methods of searching for the best neural architecture and optimizing the parameter values. AutoKeras recommended a Meta-Learning model because it has fewer layers and parameters, and it was demonstrated that models with the same performance can be produced with less effort. The characteristics were obtained from 5,066 records on the Sakai educational platform, used by 500 students at the International University of La Rioja enrolled in four courses. Filter methods, such as analysis of variance (ANOVA) and recursive feature elimination with cross-validation (RFECV), were used to select features. The DNN model achieved a 67% precision and a 67.2% accuracy on the test set.

Two DNN models were developed by Lee et al. (2021) to identify and assist low-performing students. Based on student learning behaviors, the first model effectively assessed student performance while students watched videos. Meanwhile, the second model effectively predicted student performance based on how they responded to the exercise questions. The dataset for the study was gathered from two MOOC courses offered by National Tsing Hua University. The proposed DNN outperformed the other baseline models, achieving a prediction accuracy of 97.5%.

A self-attention mechanism and depth matrix decomposition were integrated into a student performance prediction model using DNNs; the model, suggested by Yang and Bai (2022), is called the Factorization Deep Product Neural network (FDPN). The FDPN model fully learns the relationships between the features and scores, which improves the prediction effect of the model in comparison with using only single-feature learning. In addition, the model, which used a factorization machine and two different types of neural networks, considered the influence of first-order, second-order, and high-order features. Moreover, as the model was combination-based, it combined the student demographic, course, and learning behavior features obtained from the learning analysis dataset containing 412 courses with 600 students, which was used to assess the model's performance. The experimental findings demonstrated the strong capacity of the proposed model for performance prediction. The recall rate was approximately 95%, the precision approximately 90.7%, the AUC approximately 82%, and the accuracy rate approximately 86.6% when the number of neurons was 256. The model also had the best time efficiency, less than 0.3, when compared to the other models. The RMSE of the ADMF model was 84%, the MAE was 66.7%, and the MAPE was 21%.

Jiao (2022) proposed an FDPN model using a sports education dataset available in OULAD for course score prediction. The study discovered that using more than two features simultaneously had some impact on the results of student-grade prediction. The proposed feature combination approach can automatically learn feature combinations, consider the influence of first-order, second-order, and higher-order features, and gather relationship data between each feature and performance. The experimental results showed that the proposed method achieved an accuracy of approximately 86.6%.

Rahman et al. (2022) suggested a strategy for gathering institutional data, analyzing and pre-processing the data, and using DNNs to estimate students' progress and final GPA. To analyze the prediction of student quality, a real-time dataset was created from student transcript data from a reputable university in Bangladesh. Based on each student's academic history during each semester, student performance was classified using six classification algorithms (GBT, RF, DT, SVM, and KNN). The accuracy achieved varied in the range of 90-94% for the various classification models. The positive findings of this study included an MSE of 0.007, an MAE of 0.064, and a MAPE of 1.951.

Liu et al. (2022a) used a feedforward spiking neural network trained on information gathered from an online learning platform (involving a university in Shenyang) and an educational administration system to predict student grades. Using pertinent student data and course information, this study investigated the prediction of follow-up grades. The experimental results demonstrated that the suggested model could significantly increase the predictive accuracy of student accomplishments, reaching up to 70.8%.

Similarly, Liu et al. (2022b) proposed a student accomplishment prediction model based on evolving spiking neural networks, built on a thorough analysis of the correlation between student and course features. An evolving membrane algorithm was added to train the model's hyperparameters, thereby enhancing the model's capability for prediction. Finally, using two benchmark student datasets (Alibaba Cloud Tianchi's xAPI-Edu-Data and the UCI machine learning library), the proposed model was employed to predict student achievement. The results of the experiment demonstrated that the model based on spiking neural networks can significantly increase the accuracy of predicting student achievement.

Tao et al. (2022) proposed an early warning system to predict student performance using DNNs. The study aimed to predict students' future course grades using their prior learning experiences, and then utilized clustering algorithms to find instances of comparable learning experiences, to help students perform better academically. The DNN method was integrated with a grade prediction and learning recommendation model. The results showed that, when compared with LR, RF, and BP Neural Networks, the proposed model can fit the data more accurately and improve prediction accuracy.
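Several of the Table 3 studies pair DNNs with classical feature selection; for instance, the ANOVA and RFECV filters reported by Hidalgo et al. (2021) are available in scikit-learn. A minimal sketch on synthetic data follows; the estimator choice and feature counts are assumptions, not the study's configuration.

# Filter-style (ANOVA F-test) and wrapper-style (RFECV) selection.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, RFECV
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, n_features=18, n_informative=6,
                           random_state=0)

# ANOVA filter: keep the 8 features with the highest F-scores.
X_anova = SelectKBest(score_func=f_classif, k=8).fit_transform(X, y)

# RFECV wrapper: recursively drop features, choosing the count by CV.
selector = RFECV(estimator=LogisticRegression(max_iter=1000), cv=5).fit(X, y)
print("ANOVA kept:", X_anova.shape[1], "| RFECV kept:", selector.n_features_)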


4.2. Convolutional Neural Networks (CNNs)

Table 4 summarizes all reviewed studies that used CNN techniques. Akour et al. (2020) examined the effectiveness of utilizing a CNN to predict student performance and determine course completion. The proposed model has three levels of layers. The dataset was collected from the learner activity tracker tool (xAPI), and includes demographic, academic, and behavioral features. In terms of prediction accuracy, the proposed model performed better than the existing approaches; the best prediction accuracy achieved was approximately 99%, with two layers.

Kavipriya and Sengaliappan (2021) proposed a Deep Learning Decision Support System (DLDSS) prediction model that demonstrates the potential of data mining using DL techniques to estimate overall student performance in college placements. An Adaptive Weight Deep Convolutional Neural Network (AWDCNN) classifier was utilized to model the DLDSS prediction algorithm. To estimate student performance, the AWDCNN was based on a feedforward NN that generates precise prediction values. The classifier's weights were regularized using the GA method. In addition, the prediction was enhanced by using adaptive weights, as opposed to conventional fixed weights. Compared with other prediction techniques, the proposed AWDCNN model demonstrated superior student result prediction.
by using adaptive weights, as opposed to conventional fixed weights.
using their data on LMS for early course success prediction. The results
Compared with other prediction techniques, the proposed AWDCNN
showed that a time-series technique based on click frequency informa-
model demonstrated superior student result prediction.
tion successfully identified at-risk of failure students with a modest level
To predict academic performance (pass or fail), Poudyal et al. (2022)
of prediction accuracy 75%.
built a hybrid 2D CNN model using a combination of two different 2D
Xie (2021) proposed a real-time student performance prediction
CNN models. The 1D data were transformed into 2D data such that the
model based on an Attention-based Multi-layer LSTM. Every five weeks,
2D CNN could be applied to the dataset. The proposed model outper-
the model measured the progress of each online course participant by
formed the standard baseline methods, achieving an accuracy of 88%.
A study by Begum and Padmannavar (2022) proposed a feature se- fusing the clickstream and demographic information from the OULAD.
lection strategy based on CNNs and Binary Particle Swarm Optimization Two classification tasks (binary classification and four class classifi-
(BPSO) to predict student performance. This project used the Math and cation task) were used to test the model. The model’s accuracy has
Portuguese data set from the UCI machine learning repository to cre- increased for both binary and four class classification, for binary classi-
ate classifiers for 2-class and 5-class predictions. The proposed system fications going from 77.21% at week 5 to 95.58% at week 25, and for
outperformed other models, and achieved an accuracy of 96.6%. four classes classification from 51.89% at week 5 to 66.46% at week 25.
To predict course grades, Zhang et al. (2023) suggested a model To identify at-risk of failure students, Huang et al. (2022) suggested
using a Multi-source sparse attention Convolutional Neural Network using LSTM network techniques. The method involved extracting be-
(MsaCNN). MsaCNN uses several input-heads to combine multi-source havioral data from a public dataset, and creating two datasets of sequen-
features, multi-scale convolution kernels on student grade records to tial datasets and aggregated datasets. In addition to LSTM, eight ML
identify structural features, and a global attention layer to identify the techniques were used to train the prediction models on these datasets.
relationships between courses. A softmax classifier was then fed with all According to the findings, the models trained on sequential datasets us-
the obtained features to create a DL model. To predict student grades, ing LSTM achieved 84% accuracy, which was higher than that of other
the MsaCNN was tested on three real world datasets from a private ML techniques. The limitations of the study include that even though
institute. The results of each experiment demonstrated that MsaCNN the experiment has only used data from one course, the raw log data
outperformed the comparison approaches on every dataset for every were included, and the learning behaviors sequence features were ex-
scenario. tracted.
Brdesee et al. (2022) proposed a sequential DL model based on LSTM
4.3. Recurrent Neural Networks (RNNs) to predict the students at-risk of failure, to ensure that students graduate
on time, and to assess the student numbers on campus in a particu-
4.3.1. Gated Recurrent Units (GRUs) lar year, and their projection for succeeding years. The dataset used
To continuously predict student performance in a specific course, was from the LMS platform of a Saudi university, and contained demo-
He et al. (2020) proposed the RNN-GRU model based on student demo- graphic, educational, learning, and academic features. The imbalance in
graphics, historical data in the current course, and sequential behavior academic award gap performances and timely late graduates was elim-
data that can fit static and sequential data. The RNN, GRU, and LSTM inated using the synthetic minority over-sampling technique for class
were utilized as they fit static and sequential data, as well as their ability balancing. The results showed that the LSTM model was more effective
to incorporate the sequential relationship of learning data. The results than the other ML models for the early identification of at-risk of failure
from experiments conducted on the OULAD revealed that the GRU out- students, achieving an accuracy of 85%.
performed the LSTM model in terms of prediction accuracy, achieving
over 80% for at-risk of failure students at the end of the semester. Ta-
ble 5 presents a summary of this study. 4.3.3. Bidirectional LSTM (Bi-LSTM)
Yousafzai et al. (2021) developed a DNN model called attention-
4.3.2. Long Short-Term Memory (LSTM) based-(Bi-LSTM), which combines Bi-LSTM with an attention mech-
Table 6 summarizes all reviewed studies that used LSTM techniques. anism model for feature classification and prediction. The proposed
Aljohani et al. (2019) proposed a deep LSTM model. Utilizing the model uses historical data to predict student outcomes (i.e., grades).
OULAD dataset, clickstream data produced by the interaction of stu- The integrated model benefited from both attention mechanisms and
dents with online learning platforms can be assessed to help identify the superior Bi-LSTM sequence learning abilities to achieve excellent
at-risk of failure students at an early stage. Furthermore, by tackling performance. The prediction accuracy of the proposed model was about
the time-series sequential classification problem, the model predicts 90.16%. Table 7 presents a summary of this study.

10
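To make the attention-based Bi-LSTM design concrete, a minimal Keras-style sketch follows; the sequence length, layer widths, and the simple additive attention used here are illustrative assumptions, not the exact architecture of Yousafzai et al. (2021).

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sequences of 11 per-timestep academic features over 10 timesteps
# (both sizes are assumed for illustration).
inputs = keras.Input(shape=(10, 11))
h = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(inputs)
# Simple additive attention: score each timestep, normalize, weighted sum.
scores = layers.Dense(1, activation="tanh")(h)    # (batch, 10, 1)
weights = layers.Softmax(axis=1)(scores)          # attention weights over time
context = layers.Dot(axes=1)([weights, h])        # (batch, 1, 128) context
context = layers.Flatten()(context)
outputs = layers.Dense(5, activation="softmax")(context)  # five grade bands
model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```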
Table 4
Summary of studies that used CNN techniques.

Akour et al. (2020). Dataset: Learner Activity Tracker Tool called xAPI. Dataset size: 480 students. Features (16): demographic, academic, and learning behavior/activity features. Technique: CNN. Evaluation: accuracy 95.5% (one layer), 99% (two layers), 89% (three layers). Contributions: the study measured the model's effectiveness in predicting performance by using different numbers of CNN layers and epochs. Limitations: limited dataset (only 480 students).

Kavipriya and Sengaliappan (2021). Dataset: LMS (various colleges under Bharathiar University). Dataset size: 6,324 students. Features (10): demographic and academic features. Technique: CNN. Evaluation: accuracy 92.41%; precision 91.25%; recall 92.35%; F-measure 91.8%. Contributions: the study proposed a DLDSS model that used course-specific data analysis to predict the overall student performance, whether it met the placement criteria, and their skills. Limitations: limited dataset used for DL models.

Poudyal et al. (2022). Dataset: MOOC (OULAD). Dataset size: 32,593 students studying 22 courses. Features (40): demographic and learning behavior/activity features. Technique: 2D CNN. Evaluation: accuracy 88%. Contributions: converted the 1D numerical data to 2D image data. Limitations: only used one pooling layer; the model did not investigate the impact of each feature on academic performance.

Begum and Padmannavar (2022). Dataset: UCI machine learning repository (Math and Portuguese data). Dataset size: 1,044 students. Features (22): demographic and academic features. Technique: BPSO-CNN. Evaluation: accuracy of UCI Maths data (2 classes) 93.33%; accuracy of UCI Portuguese data (2 classes) 96.6%. Contributions: BPSO used for feature selection, then the dataset is classified into 2-class and 5-class predictions. Limitations: limited dataset used; the model needs to balance feature selection to reduce overfitting and improve accuracy.

Zhang et al. (2023). Dataset: LMS (private institute dataset including Computer Science and Technology (CST), Software Engineering (SE), and Electronic Information Engineering (EIE) courses). Dataset size: CST 1,463 students and 34,694 records; SE 1,621 students and 25,669 records; EIE 1,937 students and 33,743 records. Features: demographic, academic, and course information features. Technique: MsaCNN. Evaluation: CST accuracy 83.2%, AUC 88%, F1-score 83.1%; SE accuracy 85.8%, AUC 89.8%, F1-score 85.5%; EIE accuracy 86%, AUC 79.4%, F1-score 78.7%. Contributions: the proposed MsaCNN model incorporated multi-source features (using two input heads), extracted the overall relationship across courses (using an attention layer), captured local structured features (using convolution layers), and used a fully connected layer for the softmax classifier. Limitations: bias may be introduced and replicated on account of the demographic features used; more analyses of potential feature biases are needed.
Table 5
Summary of the study that used the GRU technique.

He et al. (2020). Dataset: MOOC (OULAD). Dataset size: 32,593 students. Features (20): demographic, current academic, and learning behavior/activity features. Technique: GRU. Evaluation: accuracy 80%; recall 85%. Contributions: predicted at-risk of failure students by extracting time-series features based on student historic data in the current course (interaction history and assessment logs). Limitations: the model was only compared with one DL technique.
4.4. Hybrid DL techniques

Table 8 shows a summary of all the studies that used hybrid DL techniques. The analysis of categorical variable transformation techniques and their compatibility with DL models are the main objectives of the study by Hien et al. (2020). To predict student performance, this study compared the effectiveness of label encoding, one-hot encoding, and learned embedding encoding with deep learning methods, such as DNNs and LSTM. The findings of the experiment demonstrated that the learned embedding encoding method for categorical data transformation enhanced the performance of the deep learning model, and that its application in conjunction with LSTM produced excellent results. The average accuracy of the proposed model was about 86.26%.
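A minimal sketch of the learned-embedding encoding that performed best in that comparison is shown below; the vocabulary size, embedding width, and layer sizes are illustrative assumptions. The embedding table is trained jointly with the rest of the network, so similar category values can end up with similar vectors, which is what distinguishes this encoding from one-hot or label encoding.

```python
from tensorflow import keras
from tensorflow.keras import layers

# One categorical input (e.g., a course code with 30 distinct values, an
# assumed vocabulary) plus 8 numeric features.
cat_in = keras.Input(shape=(1,), dtype="int32")
num_in = keras.Input(shape=(8,))
emb = layers.Embedding(input_dim=30, output_dim=4)(cat_in)  # learned embedding
emb = layers.Flatten()(emb)
x = layers.Concatenate()([emb, num_in])
x = layers.Dense(32, activation="relu")(x)
out = layers.Dense(1, activation="sigmoid")(x)  # pass/fail probability
model = keras.Model([cat_in, num_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```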
Song et al. (2020) proposed a Sequential Engagement Based Academic Performance Prediction Network (SEPN). The network contained a Sequential Predictor (SP) and an Engagement Detector (ED). The SP has an LSTM structure, and learns about the interaction between the demographic and engagement features. Based on weekly correlation data, the ED created a matrix from the everyday actions of the students to detect patterns of student engagement by taking advantage of the CNN. An experiment on a real dataset (OULAD) was conducted, and the results demonstrated that SEPN performed better in prediction accuracy than the ML models when involving the ED mechanism.

Duru et al. (2021) used DL models to predict the performance of different English second language speakers who were categorized according to their likely linguistic abilities. The dataset used within the proposed model included about 420,000 comments that the University of Southampton delivered through the FutureLearn MOOC. This study divided language learners into three categories: those for whom English is their official second language (ESL), those for whom English is their official first language but not a primary one (EOL), and those for whom English is their official first and primary language (EPL). Different DL models (CNN, GRU, LSTM, Bi-LSTM, and CNN-LSTM) were proposed to leverage either extracted features from user activities during the course, or comment texts, or a mix of both. The results showed that the DL models performed better when using a combination of both extracted features and comment texts than when using comment texts alone.
To predict student GPA, Prabowo et al. (2021) introduced a dual-input DL model that can handle time-series and tabular data at the same time. The model is composed of an LSTM branch for time-series data and a Multi-Layer Perceptron (MLP) branch for tabular data. The database of the Student Advisory and Support Center of Bina Nusantara University contains information on 46,670 undergraduate students. The experimental findings demonstrated that the proposed model performed well in GPA prediction, with an RMSE of 0.4142 and an MAE of 0.418.

Li et al. (2022) suggested an end-to-end deep learning model that predicts academic success by automatically extracting characteristics from campus-based student behavior data obtained from many sources. Two-Dimensional Convolutional Networks (2D CNN) are required to extract the correlation features among various behaviors, and the model uses LSTM networks to model the temporal features of behavior data (capturing intrinsic time-series patterns for each type of behavior). Finally, the academic performance level was generated using fully connected layers by combining student demographic data, correlation features, and time-series features. The experiments, which involved four different types of daily behavior data from students at a Beijing university (web browsing behavior, library entry behavior, consumption behavior, and gateway login behavior), showed that the proposed DL model outperformed several ML algorithms.

Chen et al. (2022) proposed an explainable student performance prediction (ESPP) framework to interpret the prediction findings. The framework employed a time series of weekly student activity data, and applied a hybrid data-sampling strategy to address the imbalanced data distribution in the VLE. Moreover, the spatiotemporal features of the model were extracted by combining CNNs and LSTM networks. Then, the DL model was described by visualizing and examining typical forecasts, maps of student's activities, and feature importance. The experimental results demonstrated that the combined CNN-LSTM outperformed the baseline LSTM and ML models in the early prediction cases. As early as the sixth week, the suggested technique correctly predicted student learning performance, with an accuracy of 91%.

Aljaloud et al. (2022) aimed to analyze the relationship between student interactions and student performance in university courses. To predict learning outcomes, the study used student interactions in seven general preparation courses presented on Blackboard. Mixed methods were utilized that included four DL models, and the best prediction was achieved with CNN-LSTM, compared with CNN, RNN, LSTM, and CNN-RNN. The reason for this is the selection of many factors that affect DL models, which then influence the prediction accuracy and prediction error, including the size of the CNN convolution filter, the number of neurons in the LSTM, and the size of the LSTM batch.

T. Liu et al. (2022) proposed a student performance prediction model named GDPN. The model simultaneously extracted temporal information and general information about student learning behaviors. Three different parts were used for this: a feature connect mechanism with an attention mechanism, a DNN-based feature generator for overall behavior, and a gated unit neural network-based feature generator for temporal behavior. The student learning behavior data from the OULA dataset included 6,455 students and 18 features. The experimental results showed that the proposed method offers better prediction performance, with an accuracy of up to 92.5%.

A hybrid DL model using a combination of CNNs and RNNs was proposed by Xiong et al. (2022) to predict student performance and discover the primary factor with the highest association with student performance. The RNN was used to obtain the semantic connection between features, whereas the CNN was used to collect the local dominant features and alleviate the curse of dimensionality. The results of the trials showed that the hybrid CNN-RNN prediction model outperformed the existing DL models, with an accuracy of roughly 79.23% when using data from the Kaggle repository.

To predict student success on the final exam, Sikder et al. (2022) suggested a Deep Convolutional Neural Networks (DCNNs) model that used data from the Institute of Science, Trade & Technology (ISTT), and included about 2,844 records of 158 students. The model used a total of 18 data features, and achieved an accuracy of about 98.33%.
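Several of the models above share a common CNN-LSTM backbone, in which convolutions summarize local activity patterns and a recurrent layer models how those patterns evolve over time. The following minimal sketch illustrates this family under assumed input sizes (six weeks of activity, 20 clickstream counts per week); it is not the exact architecture of any single reviewed study.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Weekly activity sequences: 6 weeks, 20 clickstream counts per week.
inputs = keras.Input(shape=(6, 20))
# 1D convolutions over the week axis extract local activity patterns...
x = layers.Conv1D(32, kernel_size=3, padding="same", activation="relu")(inputs)
# ...and the LSTM models how those patterns evolve across weeks.
x = layers.LSTM(64)(x)
x = layers.Dense(32, activation="relu")(x)
outputs = layers.Dense(1, activation="sigmoid")(x)  # pass/fail probability
model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```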

Table 6
Summary of the studies that used LSTM techniques.

Aljohani et al. (2019). Dataset: MOOC (OULAD). Dataset size: 32,593 students. Features (20): learning behavior/activity features. Technique: LSTM. Evaluation: accuracy 95.23%; precision 93.46%; recall 75.79%. Contributions: the study transformed the learning behavior/activity features into a sequential format in order to improve the identification of students who are at-risk of failure. Limitations: the model's limited generalizability and scalability beyond the specific dataset it was tested on (OULAD).

Liu et al. (2020). Dataset: math homework and classwork of students provided by ASSISTments. Dataset size: 338,001 logs consisting of 4,216 students and 24,896 exercises. Features (24): learning behavior/activity features. Technique: LSTM. Evaluation: AUC 98%; accuracy 97%; RMSE 0.26; MAE 0.20. Contributions: the proposed MFA-DKT model fused multiple features, including student exercise features and behavior/activity features, to trace student knowledge states; LSTM was then used to model the sequential exercise results, and attention mechanisms were used to assign different weights to features. Limitations: dependencies on specific datasets, affecting generalizability and applicability; potential information loss during feature dimensionality reduction; complexity in handling multiple feature data.

Chen and Cui (2020). Dataset: LMS (Canadian university). Dataset size: 668 students. Features: learning behavior/activity features (5 selected from 21). Technique: LSTM. Evaluation: AUC 75.2%. Contributions: the study proposed an early prediction model that examines student behavior/activity features. Limitations: class imbalance for the LSTM analysis (the test dataset is largely imbalanced), where training and test data need to come from the same feature and target distributions.

Xie (2021). Dataset: MOOC (OULAD). Dataset size: 22 courses, 32,593 students, and 10,655,280 data records. Features: demographic and learning behavior/activity features (number not stated). Technique: LSTM. Evaluation: four classes in week 25: accuracy 69.55%, precision 62.95%, recall 52.84%, F1-score 51.50%; binary classes in week 25: accuracy 89.12%, precision 92.07%, recall 86.48%, F1-score 89.18%. Contributions: the study added an attention layer to the multi-layer LSTM model to predict student performance. Limitations: the number of features used was not mentioned in the paper; the OULA dataset includes many features that improve the model accuracy.

Huang et al. (2022). Dataset: MOOC (OULAD). Dataset size: 22 online courses and 32,593 students. Features (17): learning behavior/activity features. Technique: LSTM. Evaluation: AUC 84%. Contributions: the predictive model integrated dynamic features of learning behaviors for early detection of students who are at-risk of failure. Limitations: the exclusive use of data from one course limits the generalizability of the findings to other contexts; the paper used aggregated data instead of raw log data, which hampers the extraction of learning behavior sequences.

Brdesee et al. (2022). Dataset: LMS (Student Information System of a Saudi university). Dataset size: 230,000 students. Features (19): demographic, previous and current academic features. Technique: LSTM. Evaluation: accuracy 85%; AUC 80%. Contributions: the proposed model predicted at-risk of failure students by assessing them on campus using demographic and academic features to ensure they graduate on time. Limitations: LSTM was still incapable of handling the up-sampling of students in a temporal setting.
Table 7
Summary of the study that used the Bi-LSTM technique.

Yousafzai et al. (2021). Dataset: UCI Machine Learning Repository. Dataset size: 1,044 records. Features: demographic, historical, and current academic features (11 selected from 33). Technique: Bi-LSTM. Evaluation: accuracy 90.16%; precision, recall, and F1-score 90%. Contributions: implemented an attention-based Bi-LSTM based on improved feature selection. Limitations: the dataset was confined to a single domain; only significant features were considered for the prediction.
Liu et al. (2023) examined how clickstream data can be used to predict student performance. The study utilized the most important indicators of how well students would perform from the OULAD with weekly and monthly time intervals; these indicators included clicks on the homepage, related sites, quizzes, and content. It was found that the optimal method to predict student performance was weekly-based click count aggregation in the form of panel data, along with the LSTM model. Because the 1D CNN is inadequate for handling sequential data, the LSTM achieved a greater prediction accuracy of up to 89.25% compared to the 1D CNN.

4.5. Hybrid DL merged with either ML or Ensemble Learning techniques

Table 9 shows a summary of all studies that used hybrid DL merged with either ML or Ensemble Learning techniques. Botelho et al. (2019) examined low and high representations of unproductive persistence, which are described as student stopout and wheel spinning, using a transfer learning method that includes DL and conventional modeling methodologies. The study used LSTM, Decision Tree, and Logistic Regression to construct early detectors of these behaviors. The developed stopout and wheel-spinning models can learn a set of attributes that generalize to predict the other behavior. The effectiveness of these models at each learning opportunity within student assignments was examined to determine when interventions can be most effectively used to help students who may be displaying unproductive persistence.

Kostopoulos et al. (2020) assessed the effectiveness of DNN, bagging, and boosting for the early prediction of students who are prone to failure. A variety of features drawn from various educational sources (LMS) were utilized in relation to students' characteristics, academic achievements, and online behavior. The experimental findings showed that the proposed DL model provided decent accuracy with academic attributes; however, its performance appears to be greater in the early risk prediction of students with economic factors, as it achieved an accuracy of up to 72%. With the economic background attributes, the accuracy of the bagging classifier was approximately 88%, whereas the accuracy of the boosting classifier was 86%.
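To illustrate how bagging can wrap a neural base learner of the kind evaluated by Kostopoulos et al. (2020), a minimal scikit-learn sketch follows; the toy data and the small MLP base estimator are assumptions made for illustration only.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.neural_network import MLPClassifier

# Toy data: 500 students, 14 mixed features, pass/fail target (all assumed).
rng = np.random.default_rng(0)
X = rng.random((500, 14))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

# Bagging trains each MLP on a bootstrap sample and averages the votes,
# reducing the variance of the individual networks.
bag = BaggingClassifier(
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500),
    n_estimators=10,
)
bag.fit(X, y)
print(bag.score(X, y))
```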
Based on the course’s achievement outcomes from prior semesters, ter of the course, and two weeks before the course’s end, four distinct
Dien et al. (2020) predicted student performance in the upcoming binary classification problems (Distinction, Pass, Fail, and Withdrawn)
semesters. This was accomplished by analyzing and presenting sev- were developed using the OULA dataset to test the proposed model. For
eral pre-processing strategies (such as Quantile Transforms and MinMax Pass-Fail, Distinction-Fail, Distinction-Pass, and Withdrawn-Pass, both
Scaler), before DL models such as LSTM and CNN were used to collect RF and ANNs without feature engineering achieved high prediction ac-
the data. The model was additionally modified to include other opti- curacy during the fourth quarter of up to (85.83, 88.15, 81.26, and
mizer functions, such as Adam and RMSprop, to enhance prediction 92.91) %, respectively.
performance. A student information system from a Vietnamese inter-
disciplinary university was used to acquire four million samples for
5. Discussion
the experiments, which were constructed using 16 datasets connected
to a wide range of different disciplines. The results indicated that the
proposed strategy, particularly when employing data modification, pro- This section discusses the application of DL techniques, both as
duced accurate prediction results. standalone methods, and in combination with other ML or Ensemble
A pre-trained Ensemble Learning Neural Network model was created Learning techniques, to predict student performance. It explores the ef-
in the study by Kuo et al. (2021) using student data from the most recent fectiveness of these techniques, specifically their accuracy in predicting
academic years and course results as training data. The K-means tech- outcomes. Moreover, it details the datasets used for testing and train-
nique was used to classify all the first through fourth grade courses, so ing the models. Lastly, the selection of features extracted from these
that they would all have the same characteristics, and could be placed datasets for predictive modeling is explained.

Table 8
Summary of the studies that used hybrid DL techniques.

Hien et al. (2020). Dataset: LMS (Vietnamese universities). Dataset size: 524. Features (41): demographic and academic features. Technique: DNN (two layers). Evaluation: accuracy 86.26%. Contributions: the study tried to analyze the compatibility of DL techniques with categorical variable transformation techniques at the same time. Limitations: the dataset size was not mentioned in the study; the experiment took a long time to find the optimal set of parameters in the DL model design; the study selects parameters based on recommendations rather than empirical findings, such as the activation function and the number of hidden layers.

Song et al. (2020). Dataset: MOOC (OULAD). Dataset size: 32,593 students. Features (7): demographic, previous academic, and learning behavior features. Technique: CNN-LSTM. Evaluation: accuracy 61%; MSE 15.12; F1-score 53.3%; recall 48.8%. Contributions: a Sequential Engagement Based Academic Performance Prediction Network was proposed, with two parts: an engagement detector and a sequential predictor. Limitations: the study did not take into account the prediction of learning performance.

Duru et al. (2021). Dataset: FutureLearn MOOC. Dataset size: 420,000 comments. Features (11): learning behavior features. Technique: CNN-LSTM, Bi-LSTM, GRU. Evaluation: accuracy using comments only 73%; accuracy using comments and extracted features 87.05%. Contributions: the model categorized students by likely English language proficiency and predicted their performance using either comment texts, learning behavior features, or a combination of both. Limitations: the study relied on predefined regular expressions to identify participants' language from their posts, which led the model to perform poorly with only comments.

Prabowo et al. (2021). Dataset: LMS (undergraduate database obtained from the Student Advisory and Support Center at Bina Nusantara University). Dataset size: 46,670 students. Features: tabular data: demographic and course information (6 features); time-series data: academic (GPA). Technique: MLP-LSTM. Evaluation: RMSE 0.4142; MAE 0.418. Contributions: the proposed model processed both tabular data and time-series data at the same time by combining five layers of LSTM and MLP. Limitations: the model's performance is affected by overfitting; when the MLP alone received the time-series data, the long-range dependencies problem was still apparent even with the use of LSTM; for the collected dataset, different time-series lengths for different years impacted the model performance.

Li et al. (2022). Dataset: LMS (dataset from a university in Beijing). Dataset size: 9,000 students. Features (16): learning behavior features (consumption behavior, library entry behavior, gateway login behavior, and web browsing behavior). Technique: 2D CNN, LSTM, DNN (four layers). Evaluation: accuracy 94%; precision 94%; recall 77%; F1-score 79%. Contributions: the model used LSTM, an embedding layer, and a 1D CNN to extract time-series features from each kind of behavior data separately; a 2D CNN was used to obtain the correlation features across different types of behaviors. Limitations: the current models might not yet fully leverage the potential insights from integrating diverse data sources, and further exploration is required into the correlation between multi-source behavior data; the model's interpretability needs more enhancement through collecting more student behavior data.

Chen et al. (2022). Dataset: LMS (Gadjah Mada University). Dataset size: 202,000 logs of 977 students. Features (8): academic and learning behavior features. Technique: CNN-LSTM. Evaluation: accuracy, precision, recall, and F-score 91%. Contributions: the study proposed a model that utilizes a time series of weekly learner behavior/activity features, and uses a data-sampling technique for dealing with the VLE imbalanced data distribution. Limitations: the complexity of the model did not allow the authors to identify the important features; the imbalanced dataset distribution limited the model's explainability in early prediction.

Aljaloud et al. (2022). Dataset: LMS (seven general preparation courses from Blackboard). Dataset size: 35,000 student records. Features (7): learning behavior features. Technique: CNN-LSTM. Evaluation: precision 94.2%; F1-score 93.59%; RMSE 39.69%; MAPE 27.56%. Contributions: the study analyzed the correlation and time series of student performance at university per attended course. Limitations: the study only focused on limited student features; the CNN-LSTM model had high time consumption when increasing the size of the CNN layers, filters, and LSTM batch size.

T. Liu et al. (2022). Dataset: MOOC (OULAD). Dataset size: 6,455 students studying the FFF course for two years (4,014,499 records). Features (18): learning behavior features. Technique: DNN (three layers)-GRU. Evaluation: accuracy 92.5%; precision 90.6%; recall 96.5%; F1-score 93.5%; AUC 92%. Contributions: the study used the attention mechanism to separate the temporal behavior information and the overall behavior information from the student learning behavior data. Limitations: the relationship between students' learning behavior and their performance was not analyzed well.

Xiong et al. (2022). Dataset: Kaggle repository. Dataset size: not stated. Features (10): demographic features. Technique: CNN-RNN. Evaluation: accuracy 79.23%; precision 83%; recall 94.5%. Contributions: the study proposed an early student performance predictor prior to course commencement, and identified the main factor that has the highest correlation with student performance. Limitations: the dataset size was not mentioned in the paper; few features were selected, and more features could be obtained by using automated tracking devices; only three methods were used for selecting the features (SelectKBest, ExtraTreeClassifier, and Correlation), although many more exist; a lack of detailed exploration into how the hybrid CNN-RNN model performs with different structured data, as the study focused on improving model accuracy.

Sikder et al. (2022). Dataset: LMS (real dataset from ISTT). Dataset size: 2,844 records of 158 students. Features (18): demographic and academic features. Technique: DNN (two layers), CNN. Evaluation: accuracy 97%. Contributions: the implemented DNN model was trained using CNN to make yearly predictions on student achievement. Limitations: overfitting occurred, as a small dataset was used; adding more data can improve the model's proficiency.

Liu et al. (2023). Dataset: MOOC (OULAD). Dataset size: 5,521 students. Features (12): learning behavior features. Technique: LSTM and 1D CNN. Evaluation: LSTM accuracy using the weekly feature set 89.25%; LSTM accuracy using the monthly feature set 88.67%; 1D CNN accuracy using the weekly feature set 70.25%; 1D CNN accuracy using the monthly feature set 77.55%. Contributions: the study built a predictive model based on students' learning behavior/activity features at weekly and monthly time intervals. Limitations: panel data was found to be unsuitable for CNN and traditional machine learning techniques, making the comparison of weekly and monthly views irrelevant for these methods.
Table 9
Summary of the studies that used hybrid DL merged with either ML or Ensemble Learning techniques.

Botelho et al. (2019). Dataset: ASSISTments platform. Dataset size: 12,714 students, 1,055,588 records. Features (15): learning behavior features. Technique: LSTM, DT, and LR. Evaluation: wheel spinning AUC 88.7%, RMSE 0.313; stopout AUC 75.9%, RMSE 0.223. Contributions: the study used transfer learning to investigate the relationship between representations of unproductive student persistence and to predict early student attrition based on their behavior/activity data. Limitations: mislabeling students who do not wheel spin on the next assignment led the model to struggle to distinguish between the future behaviors of students who either persist or stop out, which as a result led the AUC to decrease.

Kostopoulos et al. (2020). Dataset: LMS (Hellenic Open University). Dataset size: 1,073 students. Features (14): demographic, academic, and behavior features. Technique: DNN (four layers), bagging, and boosting. Evaluation: DNN 72%; bagging 88%; boosting 86%. Contributions: used various methodologies to analyze how the student's situation influences their performance. Limitations: limited dataset records were used in the experiment.

Dien et al. (2020). Dataset: 17 private datasets from a Vietnamese multidisciplinary university. Dataset size: 3,828,879 records, 4,699 courses, and 83,993 students. Features (21): demographic and academic features. Technique: CNN, LSTM, and LR. Evaluation: MAE of CN1D-RMS 0.485 and LSTM-RMS 0.491; RMSE of CN1D-RMS 0.646 and LSTM-RMS 0.649. Contributions: predicted next-semester student performance based on the previous semesters' results; data pre-processing went through quantile transforms and MinMax scaler techniques. Limitations: the study did not measure accuracy; the model settings need to be modified to improve the performance; the study did not compare the proposed model with other approaches.

Kuo et al. (2021). Dataset: LMS (private dataset from Department of Electrical Engineering students at National Taipei University of Technology (NTUT)). Dataset size: not stated. Features (6): demographic and academic features. Technique: DNN (four layers) and K-means. Evaluation: accuracy 89.62%. Contributions: the model was built using K-means to classify courses according to the year, then a DNN was applied for each year. Limitations: the number of dataset records was not known.

Yang et al. (2022). Dataset: UCI machine learning repository (Math and Portuguese data). Dataset size: not stated. Features (29): demographic, academic, and school-related features. Technique: DNN (three layers) and K-means. Evaluation: Portuguese MSR 15.43; Mathematics MSR 15.54. Contributions: the study tried to predict students' grades using an integration of the DNN and K-means algorithms based on continuous student grade values to examine the relationships between several factors that have an impact on student performance. Limitations: the size of the dataset was not mentioned.

Waheed et al. (2022). Dataset: MOOC (OULAD). Dataset size: 32,593 students. Features (20): learning behavior features. Technique: clustering and LSTM. Evaluation: accuracy 84.57%; precision 82.24%; recall 79.43%; AUC 82%. Contributions: a stacked LSTM model was used to early identify students at-risk of failure based on their behavior/activity data. Limitations: the model's performance underscores the need for large datasets to train DNNs effectively; when using sequential data, there was a class imbalance between 'pass' and 'fail' classes.

Al-Zawqari et al. (2022). Dataset: MOOC (OULAD). Dataset size: 32,593 students. Features (32): demographic, course information, and learning behavior features. Technique: DNN (three layers) and RF. Evaluation: Pass-Fail 85.83%; Distinction-Fail 88.15%; Distinction-Pass 81.26%; Withdrawn-Pass 92.91%. Contributions: the model tried to predict student performance through flexible reduction of features based on targets. Limitations: the high sensitivity of the ANN classifier's performance to hyperparameter choices.
Table 10
The number of DNN layers used in the reviewed studies.

One layer: Liu et al. (2022a), Liu et al. (2022b)
Two layers: Hussain et al. (2019), Rahman et al. (2022)
Three layers: Wang et al. (2020), Hidalgo et al. (2021), Yang and Bai (2022)
Four layers: Tsiakmaki et al. (2020), Jiao (2022)
Five layers: Waheed et al. (2019), Li and Liu (2021)
Six layers: Nabil et al. (2021)
Eight layers: Aslam et al. (2021), Lee et al. (2021), Tao et al. (2022)

Fig. 4. The number of different DL techniques used in the reviewed studies for predicting student performance.

Fig. 5. The maximum and minimum accuracy achieved by each implemented model.

5.1. Predictive models and their performance

This subsection answers the first research question, which is "RQ1: Which techniques were applied in the selected studies?". Deciding on the appropriate DL technique(s) to predict student performance is the most challenging aspect of building the model. Several factors need to be considered, including the problem that needs to be addressed, the type of input data, and the type of target variable (Sarker, 2021). This subsection discusses the different DL techniques implemented in the reviewed studies and their accuracy. Fig. 4 shows the number of different DL techniques used in the reviewed articles to predict student performance. Moreover, this subsection discusses the accuracy achieved by the implemented predictive models. Fig. 5 shows the maximum and minimum accuracy achieved by each implemented model.

Deep Neural Networks (DNNs) have shown significant potential to predict student performance by learning complex patterns and relationships between student data and their academic achievements (Aslam et al., 2021, Hidalgo et al., 2021, Hussain et al., 2019, Jiao, 2022, Lee et al., 2021, Li & Liu, 2021, Liu et al., 2022a, 2022b, Nabil et al., 2021, Rahman et al., 2022, Tao et al., 2022, Tsiakmaki et al., 2020, Waheed et al., 2019, Wang et al., 2020, Yang & Bai, 2022). High prediction performance has been achieved using an appropriate model architecture design with appropriate hyperparameters, including the number of layers, the number of neurons in each layer, activation functions, the loss function, the learning rate, the number of epochs, the number of batches, and regularization techniques to prevent underfitting or overfitting (Ying, 2019, Benkendorf & Hawkins, 2020, Lopez et al., 2022, Dos Santos & Papa, 2022, Tao et al., 2022).

The performance of the DNN model depends on many factors, including the number of layers. Table 10 shows the number of layers applied in each reviewed study that used DNNs to predict student performance (ranging from one to eight layers). It has been noted that adding more layers to the model can lead DNNs to perform better by learning more complicated correlations between the input and output data (Hussain et al., 2019, Aslam et al., 2021, Lee et al., 2021). However, adding too many layers may negatively impact the model's performance through overfitting (Hussain et al., 2019, Tsiakmaki et al., 2020, Aslam et al., 2021), which has been addressed using many methods, including the dropout technique, as in Hidalgo et al. (2021); a validation set, as in Tao et al. (2022); SMOTE, as in Aslam et al. (2021) and Nabil et al. (2021); or L2 regularization, as in Jiao (2022). Nevertheless, simply increasing the model's depth does not necessarily optimize the DNN's performance. In some circumstances, it is preferable to utilize a different type of activation function (ReLU, as in Hussain et al. (2019), Lee et al. (2021), Hidalgo et al. (2021), Aslam et al. (2021), Jiao (2022), Li and Liu (2021), Nabil et al. (2021), Rahman et al. (2022), and Tsiakmaki et al. (2020); Sigmoid, as in Hussain et al. (2019), Li and Liu (2021), Tsiakmaki et al. (2020), and Yang and Bai (2022); or Tanh, as in Li and Liu (2021) and Jiao (2022)), or to increase the number of neurons in each layer (up to 1024, as in Hidalgo et al. (2021) and Jiao (2022); or 521, as in Aslam et al. (2021) and Tao et al. (2022)). The best strategy to enhance the DNN's performance is to experiment with various hyperparameters.
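To make these regularization options concrete, a minimal sketch of a small DNN classifier that combines dropout and L2 weight penalties follows; the input width, layer sizes, and rates are illustrative assumptions, not a configuration taken from any reviewed study.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

# A small fully connected network over 20 tabular student features (assumed).
model = keras.Sequential([
    layers.Dense(128, activation="relu", input_shape=(20,),
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 weight penalty
    layers.Dropout(0.3),  # dropout to reduce overfitting
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),  # pass/fail probability
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])
```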
The application of DNN architectures in educational prediction models has yielded impressive accuracy, ranging from 70% to 97.5%, with an average of around 85.89%. An exception was noted in the case study by Hidalgo et al. (2021), which reported a lower accuracy of approximately 67.2%. This deviation was attributed to the use of a limited dataset from an LMS. DNNs typically require extensive datasets to perform well, especially in tasks involving early predictions, which are not available for all VLEs.

Convolutional Neural Networks (CNNs) have emerged as powerful tools for predictive modeling, showcasing their ability to analyze student performance data with remarkable accuracy. Studies by Akour et al. (2020), Begum and Padmannavar (2022), Kavipriya and Sengaliappan (2021), Poudyal et al. (2022), and Zhang et al. (2023) demonstrate the diverse applications and advancements of CNN architectures in predicting student performance. Specifically, Poudyal et al. (2022) illustrates the effectiveness of CNNs in processing sequences of student grades over multiple semesters to predict future academic performance. This approach underscores the potential of CNNs to handle data as a sequence, where each point corresponds to a distinct timestep, thus enabling the model to capture temporal patterns in educational outcomes. Another key advancement in the application of CNNs is the transformation of 1D data into a 2D format, as employed by Poudyal et al.


(2022), as transformations allow CNNs to effectively process and learn from the complex relationships found in educational data. Further innovations in CNN architectures are evident in the work of Zhang et al. (2023), who introduce the MsaCNN model by incorporating attention mechanisms and the integration of multi-source data, aiming to mitigate overfitting and enhance predictive accuracy. Moreover, the use of BPSO for feature selection prior to CNN classification in the study by Begum and Padmannavar (2022) showcases the potential of combining different computational approaches to improve model effectiveness. The effectiveness of CNNs in predicting student performance has markedly improved, showcasing reported accuracy that ranges from 83.2% to 99%, with an average accuracy of 91.84%.

Gated Recurrent Unit (GRU) is considered a gating mechanism within RNNs. The work by He et al. (2020) addresses the dynamic and sequential nature of educational data by incorporating both static personal biographical information and sequential behavior data. This approach reflects the GRU's simplified structure to manage the data flow from the current state's candidate state to the historical state without the need for extra memory units, as the GRU has the ability to strike a balance between capturing relevant information from recent input and retaining necessary historical information without being burdened by an extensive memory unit. The use of GRU with fewer gate structures leads to a model that is quick to train and less prone to overfitting. The designed data completion mechanism plays a crucial role in completing the gaps in stream data, thus allowing the model to be effectively trained on courses with short lengths. By achieving over 80% prediction accuracy in identifying students at-risk of failure, the study demonstrates the potential of the GRU and simple RNN in handling educational data more efficiently than the more complex models. GRU models, while effective for sequence modeling, are less frequently used in predicting student performance due to several key challenges. The structure of many student performance datasets may not align well with the architecture of the GRU, which struggles with capturing long-term dependencies due to issues with the vanishing gradient problem, making LSTM and other advanced models a preferred choice for their enhanced memory capabilities. Moreover, the rise of advanced models, especially transformer-based architectures, offers superior performance in handling complex sequences. These are the reasons behind the lower use of GRU; however, this does not mean it is ineffective or unsuitable for predicting student performance.
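A minimal sketch of a GRU-based at-risk classifier over weekly behavior sequences follows; the sequence length and feature count are illustrative assumptions, and the masking layer is only a simple stand-in for the data completion mechanism described above.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Weekly behavior sequences: up to 25 weeks, 20 activity features per week.
model = keras.Sequential([
    layers.Masking(mask_value=0.0, input_shape=(25, 20)),  # skip padded weeks
    layers.GRU(64),  # gated unit; no separate memory cell as in LSTM
    layers.Dense(1, activation="sigmoid"),  # at-risk probability
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[keras.metrics.AUC()])
```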
Long Short-Term Memory (LSTM) is designed for time-series prediction tasks, handling sequential data over extended periods of time. The studies by Aljohani et al. (2019), Brdesee et al. (2022), Chen and Cui (2020), Huang et al. (2022), Liu et al. (2020), and Xie (2021) highlight the applications of LSTMs in capturing patterns in student data over semesters or academic years. LSTMs are effective at using previous exam results as indicators to identify patterns and predict students' future academic paths, identifying students at-risk of failure. Aljohani et al. (2019) and Huang et al. (2022) investigate the potential of LSTM architectures for the early identification of students at-risk of failure by analyzing their learning behaviors. These studies underscore the proficiency of LSTMs in facilitating timely interventions through the handling of sequential data analysis. In contrast, Brdesee et al. (2022) employs LSTM to process semester-wise data, demonstrating its adaptability to various sequence lengths. By integrating the LSTMs with attention mechanisms, Liu et al. (2020) creates a hybrid model that achieves remarkable accuracy in predicting student performance of up to 97%. Meanwhile, Xie (2021) and Chen and Cui (2020) highlight the important role that data pre-processing and feature selection play in optimizing LSTM models and enhancing their predictive accuracy through careful data management. The reported predictive accuracy across these studies demonstrates the potential of optimized LSTMs to significantly enhance student outcome predictions. Despite the variance in reported accuracy, between 69.55% and 97% with an average of 84.29%, the overall findings confirm that LSTMs have significant advantages in predicting student performance.

Bidirectional Long Short-Term Memory (Bi-LSTM), as an advancement over traditional LSTM, is designed to process data sequences in both forward and backward directions. This enables Bi-LSTM to capture dependencies and relationships in sequential data that LSTM might fail to recognize. The bidirectional processing enables the model to consider a broader context of student academic records, considering complex interactions between various student achievements. In the study by Yousafzai et al. (2021), the application of a Bi-LSTM model, augmented with an attention mechanism, was explored for this purpose. The model's superior sequence learning capabilities, combined with its attention to the most relevant features, yielded an impressive prediction accuracy of 90.16%. This outcome demonstrates the potential of the Bi-LSTM to offer insights for early interventions aimed at improving student outcomes. It has been noted that there is limited use of Bi-LSTM in predicting student performance, which could be due to several factors. Bi-LSTMs are powerful in capturing bidirectional dependencies in sequence data, but their complexity might lead to challenges in model training and interpretation, especially with limited data. Additionally, the domain of student performance prediction might benefit more from models that can easily integrate diverse types of data (e.g., time series, categorical, and textual data). Other models or advanced architectures might offer advantages in these areas, leading researchers to explore alternative methods.

Hybrid Deep Learning models combine multiple neural network architectures to predict student performance effectively. These models leverage the strengths of different architectures and combine them to improve overall performance (Towell & Shavlik, 1994). One of the most common hybrid DL models to predict student performance is the combination of a CNN with an RNN, such as LSTM or GRU, which takes advantage of the strengths of both architectures and improves prediction accuracy. The CNN-LSTM hybrid model is flexible in terms of the number of CNN and LSTM layers, adjusted in accordance with the difficulty of the problem statement and input data. This model effectively handles sequential data with both spatial and temporal dependencies, and leverages multiple sources of input features to improve performance. CNNs extract useful features from the input data by identifying patterns in the spatial dimensions, such as extracting useful information from student academic records (demographic data), whereas LSTMs are effective in modeling sequential data by capturing temporal dependencies in the input data, such as modeling student grades and behavioral data. Xiong et al. (2022) combined a CNN with an RNN for early prediction of student performance before the beginning of the course (the demographic data were used to train the CNN to extract useful features, and then the CNN's output was used as a feature vector to feed into the RNN to predict the student performance). This means that the model depends on the information available at that time to predict the student's performance after a specific length of time, such as at the start of a course or semester. The CNN-LSTM architectures (Aljaloud et al., 2022, Chen et al., 2022, Duru et al., 2021, Li et al., 2022, Liu et al., 2023, Xiong et al., 2022) have achieved a high accuracy of more than 70.25% and up to 91%, and on average 85.89%, except in Song et al. (2020), which achieved an accuracy of about 61%.

Other architectures include combining DNNs with a CNN, where the CNN is designed to capture spatial features in input data, such as student demographics or academic records, which are then fed into DNNs to model the temporal dependencies in the data; Sikder et al. (2022) applied DNN-CNN, and achieved a high accuracy of about 97%. Furthermore, combining DNNs with either GRU (T. Liu et al., 2022) or LSTM (Hien et al., 2020, Prabowo et al., 2021) is an effective method to predict student performance. For the DNNs/GRU hybrid model, as in T. Liu et al. (2022), the GRU is used to model the temporal dependencies by extracting the temporal behavior data from the learning behavior of students with the help of an attention mechanism, and then DNNs are used to further process the output of the GRU and produce the final prediction; this combination achieved an accuracy of about 92.5%.


When combining DNNs with LSTM, as in Hien et al. (2020) and Prabowo et al. (2021), the hybrid models are used to predict exam scores based on the student demographic and academic history data; LSTM is utilized to model the temporal dependencies in the academic history, and DNNs are employed to analyze the demographic data. The outputs of the LSTM and DNNs are combined and input into another DNN to produce the final prediction.
Hybrid Deep Learning techniques merged with ML and Ensem-
ble Learning Models are another effective method to predict student
performance. Deep Learning techniques, such as DNNs and LSTM, are
well-suited to capture complex relationships in the data, while Machine
Learning techniques, such as RF, DT, LR, and clustering, are effective at
handling high-dimensional data and capturing non-linear relationships.
Ensemble Learning techniques, such as bagging, boosting, and stacking,
are used to further increase the predictive potential of the models by
integrating the capabilities of various models (Sagi & Rokach, 2018).

Fig. 6. The percentage of studies that used either MOOCs, LMSs, or other platforms.
DNNs have been used with Ensemble Learning techniques, including bagging and boosting, in the research by Kostopoulos et al. (2020) for the early prediction of students at-risk of failure, through extracting features from student data, and then using either bagging or boosting to combine the predictions of DNNs. This improved the accuracy of the predictions by reducing the variance of the individual models. Another method is performed by combining DNNs with RF, as in Al-Zawqari et al. (2022), or with K-means, as in Kuo et al. (2021) and Yang et al. (2022). Kuo et al. (2021) categorized courses using the K-means method based on the year and shared attributes, and then applied four layers of DNNs for each year. Meanwhile, Yang et al. (2022) integrated three DNN layers with the K-means algorithm based on the continuous values of student grades. Therefore, the combination of DNNs with either Ensemble Learning or ML techniques achieved a prediction accuracy of between 72% and 92.9%.

Combining LSTM with other DL or ML techniques has been applied to student predictive models. Dien et al. (2020) combined LSTM with CNN and integrated them with LR algorithms to make predictions more accurate. Moreover, LSTM has been combined with clustering, as in Waheed et al. (2022), or with DT and LR, as in Botelho et al. (2019). These combinations create powerful predictive models that take advantage of the strengths of each approach.

5.2. Dataset used in predictive models

This subsection answers the second research question, which is "RQ2: Which VLEs do the datasets belong to?". Academic performance prediction models have been tested and evaluated using a number of public and private online platforms, such as MOOCs, LMSs, and other learning platforms. These platforms contain data on how students learn and interact, as well as their achievements. MOOCs cover a wide range of subjects, from fundamental skills to advanced courses, based on students' own needs and preferences, regardless of their experience, which allows them to learn new skills or brush up on their knowledge. However, because of the large number of students, it may be challenging to receive individualized attention from instructors in MOOCs. LMSs, on the other hand, can offer a wider range of online courses that are more inclusive than MOOCs, and try to meet the needs of specific students. Students enrolled in LMS programs or courses can get attention from instructors. Therefore, in MOOCs, students need to encourage and motivate themselves to complete their courses, while in LMSs, students might attempt to complete the course, but either fail or withdraw due to their lack of understanding, or for other reasons (Bendou et al., 2017, Mihaescu & Popescu, 2021).

While many studies have used publicly available datasets, such as MOOCs, as they include a considerable number of courses and students, the majority of studies have used private datasets from LMSs, as they contain more specific courses, and students receive specific attention. Fig. 6 shows the percentage of studies that used either MOOCs (35%), LMSs (44%), or others (21%) to predict student performance. Datasets are usually gathered from a variety of courses, ranging from 1 to 4,699 courses, and the data collection duration differs among studies, either over every week, or at the course end.

Related studies have proven that student performance can be predicted using several public and private databases. Researchers can use these datasets as benchmark datasets to assess the performance of the model, compared with other models. Table 11 describes the datasets that are used in predicting student performance, including the number of students and courses used with each one of them. The most popular open datasets utilized in MOOCs are as follows:

The Open University Learning Analytics Dataset (OULAD) (https://analyse.kmi.open.ac.uk) was established by the Open University (OU). This dataset is publicly available, and contains data on online courses provided by the UK's Open University. The dataset was made available in 2016, and has been used in many studies to predict student performance. The OULAD has been used to investigate various aspects of student performance and behavior in online courses, as it contains features such as demographic, previous, and current academic performance and learning behavioral/activity features. The dataset includes information on 32,000 students and 22 courses, making it a useful tool for researchers interested in exploring the variables that affect students' success in virtual learning environments. The OULAD has been used in these studies: Aljohani et al. (2019), Al-Zawqari et al. (2022), He et al. (2020), Huang et al. (2022), Jiao (2022), T. Liu et al. (2022), Liu et al. (2023), Poudyal et al. (2022), Song et al. (2020), Waheed et al. (2019), Waheed et al. (2022), Xie (2021), and Yang and Bai (2022).

The National Tsing Hua University (NTHU) dataset, which is openly accessible, comprises data on the academic performance of NTHU undergraduate students in Taiwan captured between 2005 and 2014. The NTHU dataset contains about 1,317 students' interaction behaviors when watching videos and participating in laboratory sessions. Moreover, a wide range of student features has been included in the NTHU dataset, such as demographic, academic, and course-specific features, which have been utilized by researchers to develop DL-based predictive models for student performance (Lee et al., 2021).

The FutureLearn dataset (https://www.futurelearn.com) is a publicly accessible dataset on student performance collected from the FutureLearn platform, which is a UK-based MOOC provider. The FutureLearn dataset contains data on over 213,000 students enrolled between 2014 and 2016 on a variety of subjects that include business, medicine, and computer science. Researchers have developed DL-based models to predict student performance based on a number of features, such as demographic, prior achievement, current academic achievement, and course engagement data (Duru et al., 2021).

Researchers have used several private datasets from different LMS platforms provided by various universities to predict student performance. Each of these platforms offers a different set of student features, such as demographic, previous, and current academic performance and learning behavioral/activity features.


Table 11
The datasets used in the reviewed studies for predicting student performance.

MOOCs:
- OULAD: Poudyal et al. (2022), Xie (2021), and Huang et al. (2022) (32,593 students, 22 courses); Jiao (2022) (32,593 students, 7 courses); Waheed et al. (2019), He et al. (2020), Song et al. (2020), and Al-Zawqari et al. (2022) (32,593 students); Waheed et al. (2022) (32,593 students, 1 course); Aljohani et al. (2019) (32,593 students); T. Liu et al. (2022) (6,455 students, 1 course); Liu et al. (2023) (5,521 students, 1 course); Yang and Bai (2022) (600 students, 1 course).
- National Tsing Hua University (NTHU): Lee et al. (2021) (1,317 students, 2 courses).
- FutureLearn: Duru et al. (2021) (213,617 students).

LMSs:
- Colleges of Assam, India: Hussain et al. (2019) (10,140 students).
- Private dataset: Tsiakmaki et al. (2020) (900 students, 5 courses).
- Public 4-year university: Nabil et al. (2021) (4,266 students).
- International University of La Rioja: Hidalgo et al. (2021) (500 students, 4 courses).
- Bangladesh University: Rahman et al. (2022) (398 students).
- Shenyang University: Liu et al. (2022a) (55 students).
- College of Metallurgical Engineering at Anhui Province University: Tao et al. (2022) (1,683 students, 5 courses).
- Bharathiar University: Kavipriya and Sengaliappan (2021) (6,324 students).
- Private institute dataset: Zhang et al. (2023) (5,021 students, 3 courses).
- Canadian University: Chen and Cui (2020) (668 students, 1 course).
- Saudi University: Brdesee et al. (2022) (230,000 students); Aljaloud et al. (2022) (35,000 students, 1 course).
- Beijing University: Li et al. (2022) (9,000 students).
- Gadjah Mada University: Chen et al. (2022) (977 students, 1 course).
- Vietnamese universities: Hien et al. (2020) (524 students).
- Institute of Science, Trade & Technology (ISTT): Sikder et al. (2022) (158 students).
- Bina Nusantara University: Prabowo et al. (2021) (46,670 students, 50 courses).
- Hellenic Open University: Kostopoulos et al. (2020) (1,073 students, 1 course).
- National Taipei University of Technology (NTUT): Kuo et al. (2021) (6 courses; student count not stated).

Other platforms:
- UCI machine learning repository (Mathematics and Portuguese Language Courses): Aslam et al. (2021), Begum and Padmannavar (2022), Yang et al. (2022), and Yousafzai et al. (2021) (1,044 students, 2 courses).
- Learner Activity Tracker Tool called Alibaba Cloud Tianchi's xAPI combined with the UCI machine learning library: Liu et al. (2022b) (480 students from Alibaba Cloud Tianchi's xAPI and 395 students from the UCI Mathematics course).
- Learner Activity Tracker Tool called Alibaba Cloud Tianchi's xAPI: Akour et al. (2020) (480 students).
- ASSISTments: Liu et al. (2020) (4,216 students, 1 course); Botelho et al. (2019) (12,714 students).
- Kaggle repository: Xiong et al. (2022) (dataset size not stated).
- Multidisciplinary University in the United States: Li and Liu (2021) and Dien et al. (2020) (83,993 students, 4,699 courses).
- WorldUC dataset: Wang et al. (2020) (10,523 students, 10 courses); Liru dataset (1,046 students, 18 courses); Junyi dataset (2,063 students, 18 courses).

Anhui University, Bharathiar University, Beijing University, Gadjah Mada University, ISTT, Bina Nusantara University, Hellenic Open University, and National Taipei University of Technology (NTUT), etc. The studies that used LMS datasets are Aljaloud et al. (2022), Brdesee et al. (2022), Chen and
Cui (2020), Chen et al. (2022), Hidalgo et al. (2021), Hien et al. (2020),
Hussain et al. (2019), Kavipriya and Sengaliappan (2021), Kostopoulos
et al. (2020), Kuo et al. (2021), Li et al. (2022), Liu et al. (2022a), Nabil
et al. (2021), Prabowo et al. (2021), Rahman et al. (2022), Sikder et
al. (2022), Tao et al. (2022), Tsiakmaki et al. (2020), and Zhang et al.
(2023). Furthermore, other datasets collected from different platforms
and used to capture patterns from student’s data include the following:
The UCI Machine Learning library is a library of datasets that is open to the public and used in DL studies to develop student performance predictive models. There are more than 400 datasets in the repository that cover a wide range of subjects; however, only the datasets for the Mathematics and Portuguese language courses were used in the reviewed studies. The dataset from these two courses includes 1,044 students; the Mathematics course contains 395 records, while the dataset for the Portuguese language course contains 649 records. These two courses’ datasets include data on student grades and demographic, social, and academic features that were gathered by using school reports and questionnaires. The UCI machine learning repository has been used by Aslam et al. (2021), Begum and Padmannavar (2022), Liu et al. (2022b), Yang et al. (2022), and Yousafzai et al. (2021) to train and test the predictive models.

The Learner Activity Tracker Tool called Alibaba Cloud Tianchi’s xAPI-Edu-Data is an extensive collection of educational data from more than 10 million students who use the online learning platform. The xAPI dataset includes 480 student records and 16 features covering student demographics, their educational activities, and their performance in various courses. The xAPI enables the tracking of learner activity and performance data across many platforms and systems, and is intended to address some of the drawbacks of LMSs. With the help of the xAPI, a variety of learning activities (including taking quizzes, viewing videos, and engaging in discussion forums) can be recorded and analyzed. Therefore, xAPI has been used by Akour et al. (2020) and Liu et al. (2022b) for a deeper knowledge of learner performance and engagement.

The ASSISTments dataset is a publicly accessible dataset that contains data on student engagement and performance in online mathematics and science courses collected between 2004 and 2013 by the Worcester Polytechnic Institute and Carnegie Mellon University. The dataset contains a wide range of student features, including demographics, engagement, and academic achievement. The ASSISTments dataset was used by studies such as Botelho et al. (2019) and Liu et al. (2020) to develop predictive models for student performance, and to investigate the relationship between student behavior, engagement, and academic outcomes.

The Kaggle repository is an openly available platform that hosts data science competitions, and provides researchers with access to a wide range of datasets. A dataset from the Kaggle repository was utilized by Xiong et al. (2022) to train and test DL predictive models.

A multidisciplinary university dataset was obtained from 16 academic units at Can Tho University, Vietnam between 2007 and 2019. The dataset contains student data, marks, and other information from 4,699 courses, with 83,993 students and 3,828,879 records. This dataset was used to predict student performance by Dien et al. (2020) and Li and Liu (2021).

Three real-world datasets, the WorldUC, Liru, and Junyi datasets, were used to train, test, and validate DL predictive models by Wang et al. (2020). For the WorldUC dataset, there are about 10,523 records with 10 consecutive online lessons, including static features (gender, age, and expectation score) and dynamic features (engagement, liveness, the relevance of learning resources and content, sentiment, and duration time). For the Liru dataset, about 1,046 student records were taken from 18 online continuous lessons, including static features (gender, grade, class, and college) and dynamic features (knowledge level and assignment score). For the Junyi dataset, there are data from about 2,063 students that include static features (student modeling, answer time duration, problems taken in exercises) and dynamic features (answering accuracy, user taking exercises, and user answering orders).

Fig. 7. The percentage of MOOC’s datasets used in the reviewed studies.

Various datasets have been collected from MOOCs, LMSs, and other platforms, and used for student performance prediction. The most commonly used dataset in the studies is the OULAD from MOOCs, which includes records of 32,593 students and 22 courses, and has been used in 13 of the 15 studies that used MOOC datasets, as shown in Fig. 7. Other studies have used MOOC datasets from the NTHU and FutureLearn datasets. Moreover, LMS private datasets have been used in about 19 studies that took these datasets from specific courses at universities. Additionally, datasets collected from the UCI machine learning repository, Cloud Tianchi’s xAPI, ASSISTments, the Kaggle repository, the Multidisciplinary University, the WorldUC dataset, the Liru dataset, and the Junyi dataset have also been used in several studies. The datasets from the UCI machine learning repository were utilized in five studies, whereas the datasets from the other platforms were used in only one or two studies. It has been noted that all the studies have trained, tested, and validated their models using only one dataset, except Liu et al. (2022b) and Wang et al. (2020), which used two and three datasets, respectively.

Predicting student performance relies on extracting various features from the VLE’s datasets. OULAD is the most commonly used dataset from MOOCs, and it contains various features, such as demographic, previous, and current academic performance, course-related information, and learning behavioral/activity features. All studies that used the OULAD dataset extracted learning behavioral/activity features. Aljohani et al. (2019), Huang et al. (2022), T. Liu et al. (2022), Liu et al. (2023), and Waheed et al. (2022) used only learning behavioral/activity features; Poudyal et al. (2022), Waheed et al. (2019), and Xie (2021) added to them demographic features; while Jiao (2022) used current student performance in addition to learning behavioral/activity features; Al-Zawqari et al. (2022), He et al. (2020), Song et al. (2020), and Yang and Bai (2022) used academic performance features along with demographic and learning behavioral/activity features.

In addition to the DL techniques applied and the features selected, the model’s prediction accuracy can depend on many factors, including the volume and type of dataset used in training and testing. The use of various datasets has led to remarkable prediction accuracy. The prediction accuracy of models that used datasets from MOOC platforms was in the range of 61 to 97.5%, with an average of 86.89%. It has been noted that the models that used datasets from MOOCs achieved a high predictive accuracy above 80%, except for one study, that of Song et al. (2020). On the other hand, it has been observed that some studies that used datasets from LMS platforms show a slightly higher prediction accuracy than what has been achieved using MOOC datasets; the prediction accuracy through the use of datasets from LMS platforms ranged from 67.2 to 97%, with an average of 85.1%.


The LMSs’ average accuracy is slightly less than that of MOOCs, which may be due to some studies not providing a numeric prediction accuracy for their models (Prabowo et al., 2021, Rahman et al., 2022, Tao et al., 2022). Another reason is that a considerable number of studies have low prediction accuracy compared to the studies using MOOC datasets. For the datasets from other platforms, no pattern was found, and the prediction accuracy of the models ranged from 79.7 to 99%, with an average of about 90.97%. Some of the studies (Dien et al., 2020, Liu et al., 2022b, Wang et al., 2020, Yang et al., 2022) did not provide the model’s accuracy, even though they used the DNNs technique, which could be the reason behind this average.
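To make the preceding dataset discussion concrete, the following minimal sketch shows how per-student learning behavioral/activity features can be aggregated from raw OULAD clickstream logs before model training. It assumes the standard public OULAD file layout (studentVle.csv and studentInfo.csv with their published column names) and is an illustrative simplification, not a reconstruction of any reviewed study’s pipeline.

```python
import pandas as pd

# Assumed standard OULAD files: studentVle.csv logs one row per
# student-material interaction (with a sum_click count), while
# studentInfo.csv holds demographics and the final_result label.
vle = pd.read_csv("studentVle.csv")    # id_student, date, sum_click, ...
info = pd.read_csv("studentInfo.csv")  # id_student, final_result, ...

# Aggregate raw clickstream rows into per-student activity features:
# total clicks, number of distinct active days, and mean clicks per row.
activity = (
    vle.groupby("id_student")
       .agg(total_clicks=("sum_click", "sum"),
            active_days=("date", "nunique"),
            mean_clicks=("sum_click", "mean"))
       .reset_index()
)

# Join the behavioral features with the outcome and derive a binary
# at-risk label (Fail/Withdrawn), a common simplification in the
# reviewed studies; a full pipeline would also key on module/presentation.
data = activity.merge(info[["id_student", "final_result"]], on="id_student")
data["at_risk"] = data["final_result"].isin(["Fail", "Withdrawn"]).astype(int)
print(data.head())
```

Features of this kind (clicks, active days, interaction counts) are exactly the learning behavioral/activity inputs that dominate the models reviewed in the next subsection.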

5.3. Features used in student performance prediction models

This subsection answers the third research question, which is “RQ3: What predictive features were chosen?”. DL models, regardless of their specific application, depend on the features extracted from data to make predictions (Alhothali et al., 2022). DL models developed to predict student performance rely on the features of data supplied by students during their virtual learning experiences. In the review, student features have been categorized into four groups. The first category is demographic features, which include the student’s personal, family, and social data, such as age, gender, health status, occupation, family status, average study hours, number of friends, etc. The second category is previous academic background features, such as admission score, admission type, entrance exam scores, place of previous education, pre-university subject grades, and prior course knowledge. The third category is current academic performance features, which include course information, years of enrollment, attendance, assignment, laboratory and final grades, and GPA. The last category is student learning behavior and interaction data, such as web page clicking, number of logins, total online time, number of uploads/downloads and visits to discussion forums, number of submitted assessments, and video interaction data, such as playback_speed, pauses, seek_back, seek_forward, and video_completed. Some of the features used in the research have the same meaning, but are labeled differently. These are only a few of the many features that can be used to predict student performance. Table 12 provides a summary of the feature categories used in the related studies for performance prediction.

Fig. 8. The percentage of features used for predicting student performance.

Feature selection plays a critical role in traditional ML techniques, leading to simplification, training time reduction, and avoiding the dimensionality curse (Huan & Hiroshi, 1998). The goal of feature selection is to identify a subset of the input data’s features that effectively describe the data, based on the problem being addressed and the target of the predictive model being implemented (Chandrashekar & Sahin, 2014). However, when it comes to DL techniques, the necessity of feature selection shifts, as Neural Networks are efficient at handling high-dimensional data and automatically extracting important features by finding complex patterns. Nevertheless, feature selection in DL techniques is adopted in some circumstances to minimize the impact of irrelevant features and maintain prediction results, thereby facilitating a greater understanding of the data, reducing computation time, and improving prediction performance (Guyon & Elisseeff, 2003, Chandrashekar & Sahin, 2014). Using many features leads in some circumstances to more accurate predictions, as in Hien et al. (2020), Jiao (2022), Rahman et al. (2022), Poudyal et al. (2022), and Al-Zawqari et al. (2022), as more features provide more data to the model, leading to a better understanding of the relationship between the features and the target variable (student performance). However, it has been noted that in some studies, increasing the number of features can lead to overfitting (Tsiakmaki et al., 2020), which can reduce the accuracy of the predictions (Ying, 2019); a minimal illustration of the selection step is sketched below.

The analysis of the use of each category of features in the reviewed studies, as shown in Fig. 8, highlights the significance of student behavioral and activity features in predicting student performance. These features, which include click-stream data, represent the highest recurrence rate at 34%, exceeding even current academic performance and demographic features, followed closely by current academic performance features at 32% and demographic features at 30%. The least recurrent were the student’s previous academic performance features, with a 4% recurrence rate. The high recurrence of student behavioral and activity features in predictive models indicates that student engagement has a stronger correlation with academic success than other features. These features serve as direct measures of engagement, and capture the complex interactions that students have with online learning environments. Engaged students spend more time interacting with the course materials, participating actively in discussions, and utilizing available resources, which reflects on their performance. Thereby, those engaged students are more likely to understand the course content, participate in discussions, complete assignments on time, and ask for assistance when needed, which is considered essential for academic success. Behavioral and activity features reflect real-time engagement with the learning process, allowing instructors and administrators to identify at-risk students early in the course, and enabling timely interventions to support those who might fail or withdraw.

Moreover, it has been shown that student academic features, such as attendance, assignment, laboratory, and final grades, are considered important factors, and are not ranked below student activity features in importance. Student academic and activity features are the only features that have been used alone in the predictive models. The use of student demographic features can be a valuable tool for understanding and improving student performance; however, they are rarely used by themselves in a predictive model, unless combined with other features, except in Xiong et al. (2022). Previous academic features provide a historical context of a student’s learning journey, helping predict future performance. Previous academic features of students have only been used in three studies (Brdesee et al., 2022, Song et al., 2020, Yousafzai et al., 2021), as these studies had already incorporated all the other categories of features.

The reviewed studies used one or more categories of features when predicting student performance. A preference for using just one feature category has been observed, accounting for approximately 41% of the studies, as shown in Fig. 9. This approach is followed by the integration of two feature categories in 37% of the studies. Meanwhile, a smaller subset of 22% of the studies opted for a more comprehensive approach by incorporating three different feature categories. Notably, none of the reviewed studies explored the possibility of combining four unique groups of features. This analysis highlights how single feature categories are often used to predict student performance, which is consistent with their popularity among researchers. However, the importance of employing multiple feature categories should not be ignored. Numerous studies have shown more complicated and revealing patterns in student behavior and performance by adopting two or three categories of features.
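As a concrete illustration of the feature selection step discussed above, the following sketch applies a univariate ANOVA filter to keep the most informative columns before a network is trained. It is a generic example under stated assumptions — the feature matrix X and binary label y here are synthetic stand-ins for a prepared student dataset — and not the selection procedure of any particular reviewed study.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a prepared student dataset: 500 students,
# 40 candidate features (demographic, academic, activity counts).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))
y = rng.integers(0, 2, size=500)  # 1 = at-risk of failure

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# The ANOVA F-test scores each feature against the class label;
# keeping only the top-k features reduces the impact of irrelevant
# inputs and the computation time, as discussed above.
selector = SelectKBest(score_func=f_classif, k=15).fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

print("kept feature indices:", selector.get_support(indices=True))
```

Fitting the selector on the training split only, then transforming both splits, avoids leaking label information from the test set into the chosen feature subset.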


Table 12
The feature categories used for predicting student performance.

Feature’s Categories | Features | References
Demographic | Personal: Age; Gender; Nationality; Health status; Marital status; Living location; Occupation; Education level; Self-sponsorship. Family status: Parent’s qualification; Parent’s occupation; Parent’s status; Family income; Family size. Social: Average study hours; Number of friends; Frequency of going out; Sports/extracurricular activities; Time spent on social media | Waheed et al. (2019), Xiong et al. (2022), Tsiakmaki et al. (2020), Aslam et al. (2021), Nabil et al. (2021), Yang and Bai (2022), Liu et al. (2022b), Akour et al. (2020), Kavipriya and Sengaliappan (2021), Poudyal et al. (2022), Begum and Padmannavar (2022), Zhang et al. (2023), He et al. (2020), Xie (2021), Brdesee et al. (2022), Yousafzai et al. (2021), Song et al. (2020), Hien et al. (2020), Sikder et al. (2022), Prabowo et al. (2021), Kostopoulos et al. (2020), Dien et al. (2020), Kuo et al. (2021), Al-Zawqari et al. (2022), Yang et al. (2022)
Previous academic background | Admission score; Admission type; Entrance exam scores; Place of previous education; Pre-university subject grades; Prior course knowledge | Yousafzai et al. (2021), Song et al. (2020), Brdesee et al. (2022)
Current academic performance | GPA; Final grade; Class attendance; Course marks; Number of assignments; Assignment scores; Class test marks; Laboratory score; Years of enrollment; Course information | Hussain et al. (2019), Aslam et al. (2021), Li and Liu (2021), Nabil et al. (2021), Hidalgo et al. (2021), Yang and Bai (2022), Jiao (2022), Rahman et al. (2022), Liu et al. (2022a), Liu et al. (2022b), Tao et al. (2022), Akour et al. (2020), Kavipriya and Sengaliappan (2021), Begum and Padmannavar (2022), Zhang et al. (2023), He et al. (2020), Brdesee et al. (2022), Yousafzai et al. (2021), Chen et al. (2022), Hien et al. (2020), Sikder et al. (2022), Prabowo et al. (2021), Kostopoulos et al. (2020), Dien et al. (2020), Kuo et al. (2021), Yang et al. (2022), Al-Zawqari et al. (2022)
Behavioral activities (click streams) | Web page clicking; Number of logins; Total online time; Average session time; Duration of watching videos; Number of uploaded/downloaded files; Number of created/edited posts; Number of visits to the discussion forum; Number of asked/answered questions; Number of browsed wikis; Number and total time of materials viewed; Total interval between two logins; Visited resources; Number of submitted assessments/tasks; Video interaction data (playback_speed, pauses, seek_back, seek_forward, and video_completed) | Waheed et al. (2019), Wang et al. (2020), Tsiakmaki et al. (2020), Nabil et al. (2021), Hidalgo et al. (2021), Poudyal et al. (2022), Lee et al. (2021), Yang and Bai (2022), Jiao (2022), Akour et al. (2020), Liu et al. (2022b), He et al. (2020), Aljohani et al. (2019), Liu et al. (2020), Chen and Cui (2020), Xie (2021), Huang et al. (2022), Duru et al. (2021), Li et al. (2022), Song et al. (2020), Chen et al. (2022), Aljaloud et al. (2022), Liu et al. (2023), T. Liu et al. (2022), Kostopoulos et al. (2020), Waheed et al. (2022), Botelho et al. (2019), Al-Zawqari et al. (2022)

Fig. 9. The percentage of the number of features used in each study.

Fig. 10. The percentage of each category used in the studies that use only one feature category.

This multidimensional method shows that integrating several categories might lead to a deeper understanding of student performance, thereby improving the predictive powers of these models.

For the studies that used only one category of features, as shown in Fig. 10, the majority (about 69%) selected student activity features, followed by student current academic features (about 26%), while only Xiong et al. (2022) utilized student demographic features to predict student performance. For the studies that combined two categories of features for prediction, as shown in Fig. 11, it has been observed that the use of student demographic features in many studies (42%) is combined with either student current academic (39%) or activity features (19%) to help develop more accurate models of student performance and identify students who are at-risk of failure. Moreover, it has been noted that none of the studies that used one or two categories of features utilized the student’s previous academic features, either because previous academic features may not be available, may not be relevant, or may not be predictive for the chosen task. For the studies that used three categories of features, as shown in Fig. 12, about 33% of the studies used student demographic features combined with student current academic features (30%) and either student activity features (27%) or the student’s previous academic features (10%). This combination helps provide a more comprehensive understanding of the factors that influence academic performance, which allows a predictive model to capture a complete picture of a student’s situation.

Fig. 11. The percentage of each category used in the studies that use two feature categories.

Fig. 12. The percentage of each category used in the studies that use three feature categories.

6. Challenges found in previous studies

This section answers the last research question, which is “RQ4: What are the challenges and future directions?”. Even though many DL techniques have been developed to measure student performance and find students at-risk of failure, there are still some issues that have not yet been solved. These issues include:

• There is ambiguity in the meaning of student performance among the studies, as defined earlier in the introduction (Biggs, 1996, Hattie, 2008, Fredricks et al., 2004, Marzano, 2003, National Academies of Sciences, Engineering, and Medicine, 2012, Tinto, 2012, Biggs et al., 2022, Lerner & Steinberg, 2009), which will have an impact on how student performance is identified and handled. Each study highlights a different dimension of student performance.
These varied perspectives underscore the complexity of assessing student performance and the importance of considering multiple facets of educational success.

• Hybrid models integrate various techniques, such as ML, DL, and Ensemble Learning techniques. These hybrid models leverage the strengths of each component model to improve the overall predictive capability and produce competitive accuracy. Despite their potential, hybrid models in educational prediction tasks require further investigation for several reasons. First, a hybrid model may perform well in one type of prediction task, but it may not necessarily generalize to other tasks. Second, the deployment of hybrid models is more resource-intensive, and requires more computational power. More importantly, hybrid models can be more complex, making them harder to interpret and understand. Due to this complexity, administrators and instructors might experience difficulties when making decisions based on model predictions. Finally, it is not always clear which combinations will yield the best results, and this likely varies depending on the specific educational context (a minimal sketch of one such hybrid architecture follows this list).

• There is a gap between predicting student outcomes at the end of a course and making predictions early in the course. The limitation of predicting student outcomes at the end of a course is their timing. By the time these predictions are made, the course is over, and opportunities for intervention during the course are missed. At best, they can inform future courses or provide late interventions that do not help the student improve in the current course. On the other hand, the difficulty in making early predictions lies in their potential inaccuracy due to reliance on limited data, which can lead to a higher incidence of false positives or negatives. This means there is a risk of incorrectly identifying students as either at-risk of failure or not, potentially resulting in missed opportunities for support. Therefore, developing a hybrid model that combines early and end-of-course predictions can help with timely intervention. The challenge in developing such a hybrid model lies in identifying the right data points for early prediction without overwhelming the predictive model with too much data as the course progresses.

• Most studies have concentrated on analyzing students’ learning behaviors and activities, such as engagement with online platforms, assignment completion, and interaction patterns, which, while insightful, may not fully capture the factors influencing learning outcomes. On the other hand, fewer studies have incorporated other student features that include demographic information, previous academic history, and current academic performance. Integrating these diverse data types can be complex due to the need for sophisticated models to handle the increased data complexity and extract meaningful patterns. In contrast, this integration offers a more comprehensive understanding of student experiences, and promises a deeper exploration of the multifaceted nature of learning.
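To make the hybrid-model challenge above more tangible, the following is a minimal, illustrative Keras sketch of the CNN-LSTM combination that recurs in the reviewed studies: a Conv1D layer extracts local patterns from weekly activity sequences, and an LSTM layer models their longer-range temporal dependencies. The input shape, layer sizes, and synthetic data are assumptions for illustration, not the architecture of any specific study.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Synthetic stand-in: weekly activity sequences for 1,000 students over
# 30 weeks with 8 behavioral features per week (clicks, logins, ...).
X = np.random.rand(1000, 30, 8).astype("float32")
y = np.random.randint(0, 2, size=(1000,))  # 1 = at-risk of failure

model = tf.keras.Sequential([
    # Conv1D detects local patterns within short windows of weeks.
    layers.Conv1D(32, kernel_size=3, activation="relu",
                  input_shape=(30, 8)),
    layers.MaxPooling1D(pool_size=2),
    # LSTM captures longer-range temporal dependencies across weeks.
    layers.LSTM(64),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # probability of being at-risk
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=64, validation_split=0.2)
```

Even in this toy form, the division of labor is visible: the convolutional front end compresses each local window of behavior before the recurrent layer; the combined model nonetheless remains harder to explain than either component alone, as noted above.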

7. Conclusion and future work

Studies on predicting student performance have been developed in recent years to help both administrators and instructors enhance the teaching and learning process. This review presents an analysis of the DL techniques, datasets, and features for 46 studies conducted between 2019 and 2023. The DL techniques used in the reviewed studies include applying either one or more of DNNs, CNNs, RNNs, LSTMs, Bi-LSTMs, and GRU, or combining DL with either ML or Ensemble Learning techniques. Among the 46 articles examined, the DNNs model was the most common technique, followed by the hybrid model (CNN-LSTM). For accuracy, the studies that used DL techniques, such as CNNs, DNNs, and LSTMs, performed well by achieving high prediction accuracy above 90%; other studies reported accuracy ranging from 60 to 90%. For the datasets used within the reviewed studies, even though 44% of the studies used LMS datasets, it was found that OULAD, from MOOCs, is the most used dataset. For the selected features, the results of the analysis indicate that the best features for prediction are the learning behavior and activity features, which outperform the other feature categories. Finally, the educational prediction findings hopefully serve as a strong foundation for administrators and instructors to observe student performance and provide a suitable educational adaptation that can meet students’ needs, protect them from failure, and prevent their dropout.

To enhance the development of well-defined and more accurate prediction solutions, a wide range of research issues need to be further explored. Firstly, there is a significant opportunity in adapting hybrid models that merge various predictive techniques, with the aim of demonstrating their effectiveness in generating highly accurate predictions. Secondly, the concept of continuous prediction throughout the course’s duration emerges as a critical strategy. By implementing ongoing predictions, administrators and instructors can identify students at-risk of failure earlier and intervene in a timely manner, thereby increasing student success and course completion (a minimal sketch of such week-by-week prediction is given below). Finally, an integrative approach that utilizes a comprehensive range of features promises to provide a full understanding of each student’s situation. This comprehensive perspective could identify patterns that may be ignored by adopting only a single category of features, allowing for customized support that attends to each student’s specific needs.
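As a hedged illustration of the continuous, week-by-week prediction direction outlined above, the sketch below retrains a simple classifier on the features available at successive checkpoint weeks and reports a test AUC for each. The data are synthetic placeholders; the point is the evaluation pattern — issuing predictions while there is still time to intervene — not any specific model or result.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic placeholder: X[:, :w, :] holds everything observed up to
# week w for each student; y is the final pass/fail outcome.
rng = np.random.default_rng(42)
X = rng.normal(size=(800, 30, 5))
y = rng.integers(0, 2, size=800)

idx_train, idx_test = train_test_split(
    np.arange(len(X)), test_size=0.25, random_state=42)

# Retrain on the cumulative data available at each checkpoint week, so
# at-risk predictions can be issued during the course rather than after it.
for week in (5, 10, 20, 30):
    feats = X[:, :week, :].reshape(len(X), -1)  # flatten weeks seen so far
    clf = LogisticRegression(max_iter=1000).fit(feats[idx_train], y[idx_train])
    auc = roc_auc_score(y[idx_test], clf.predict_proba(feats[idx_test])[:, 1])
    print(f"week {week:2d}: test AUC = {auc:.3f}")
```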


Ethics

This review does not report on or involve the use of any animal or human data and therefore did not require the authors to obtain ethics approval or consent. It followed Okoli’s guidelines for systematic literature reviews. Furthermore, this review has respected all copyrights and intellectual property rights associated with the studies.

Consent to participate

Not applicable.

Consent to publish

Not applicable.

Funding

No funds, grants, or other support was received.

List of acronyms and full terms

AI Artificial Intelligence
ML Machine Learning
DL Deep Learning
VLEs Virtual Learning Environments
LMSs Learning Management Systems
MOOCs Massive Open Online Courses
OULAD Open University Learning Analytics Dataset
LA Learning Analytics
PLA Predictive Learning Analytics
EDM Educational Data Mining
SDT Self-Determination Theory
EVT Expectancy-Value Theory
CoI Community of Inquiry
DMEE Dynamic Model of Educational Effectiveness
MLMSA Multilevel Model of Student Achievement
NNs Neural Networks
ANNs Artificial Neural Networks
DNNs Deep Neural Networks
CNNs Convolutional Neural Networks
DCNNs Deep Convolutional Neural Networks
1D CNN One-dimensional Convolutional Neural Network
2D CNN Two-dimensional Convolutional Neural Network
FDPN Factorization Deep Product Neural Network
RNN Recurrent Neural Network
LSTM Long Short-Term Memory
Bi-LSTM Bidirectional Long Short-Term Memory
GRU Gated Recurrent Unit
MLP Multi-layer Perceptron
RF Random Forest
SVM Support Vector Machine
GBT Gradient Boosting Tree
NB Naïve Bayes
DT Decision Tree
KNN K-Nearest Neighbor
LR Logistic Regression
GPA Grade Point Average
ReLU Rectified Linear Unit
SMOTE Synthetic Minority Over-sampling Technique
ROS Random Over Sampling
ADASYN Adaptive Synthetic Sampling
SMOTE-ENN SMOTE and Edited Nearest Neighbors
RFECV Recursive Feature Elimination with Cross-Validation
ANOVA Analysis of Variance
MAE Mean Absolute Error
MAPE Mean Absolute Percentage Error
RMSE Root Mean Square Error
AUC Area Under the Curve
MSE Mean Squared Error
GA Genetic Algorithm
BPSO Binary Particle Swarm Optimization
DLDSS Deep Learning Decision Support System
AWDCNN Adaptive Weight Deep Convolutional Neural Network
MsaCNN Multi-source sparse Attention Convolutional Neural Network
MFA-DKT Multiple Features fusion Attention mechanism enhanced-Deep Knowledge Tracing
SEPN Sequential Engagement based academic performance Prediction Network
SP Sequential Predictor
ED Engagement Detector
CST Computer Science and Technology
SE Software Engineering
EIE Electronic Information Engineering
ESL English Second Language
EOL English Official Language
EPL English Primary Language
ESPP Explainable Student Performance Prediction
ISTT Institute of Science, Trade & Technology
NTUT National Taipei University of Technology

CRediT authorship contribution statement

Bayan Alnasyan: Writing – review & editing, Writing – original draft. Mohammed Basheri: Writing – review & editing. Madini Alassafi: Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

All of the data in this work comes from publicly accessible repositories, including a variety of electronic databases detailed in Section 5.2.

Declaration of generative AI and AI-assisted technologies in the writing process

During the preparation of this work, the authors utilized GPT-4 from OpenAI to enhance language and readability. Subsequent to employing this tool, the authors meticulously reviewed and revised the content as necessary. The authors take full responsibility for the content of the publication.

References

Agudo-Peregrina, A. F., Iglesias-Pradas, S., Conde-Gonzalez, M. A., & Hernandez-Garcia, A. (2014). Can we predict success from log data in vles? Classification of interactions for learning analytics and their relation with performance in vle-supported f2f and online learning. Computers in Human Behavior, 31, 542–550.
Akour, M., Sghaier, H. A., & Qasem, O. A. (2020). The effectiveness of using deep learning algorithms in predicting students achievements. Indonesian Journal of Electrical Engineering and Computer Science, 19, 388–394.
Al-Zawqari, A., Peumans, D., & Vandersteen, G. (2022). A flexible feature selection approach for predicting students’ academic performance in online courses. Computers and Education: Artificial Intelligence, 3, Article 100103.
Albreiki, B., Zaki, N., & Alashwal, H. (2021). A systematic literature review of students’ performance prediction using machine learning techniques. Education Sciences, 11.
Alhothali, A., Albsisi, M., Assalahi, H., & Aldosemani, T. (2022). Predicting student outcomes in online courses using machine learning techniques: A review. Sustainability, 14(10), 6199.


Aljaloud, A. S., Uliyan, D. M., Alkhalil, A., Elrhman, M. A., Alogali, A. F. M., Altameemi, Y. M., Altamimi, M., & Kwan, P. (2022). A deep learning model to predict student learning outcomes in lms using cnn and lstm. IEEE Access, 10, 85255–85265.
Aljohani, N. R., Fayoumi, A., & Hassan, S. U. (2019). Predicting at-risk students using clickstream data in the virtual learning environment. Sustainability (Switzerland), 11.
Aslam, N., Khan, I. U., Alamri, L. H., & Almuslim, R. S. (2021). An improved early student’s performance prediction using deep learning. International Journal: Emerging Technologies in Learning, 16, 108–122.
Astin, A. (1993). What matters in college?: Four critical years revisited. Jossey-Bass.
Astin, A. W. (1984). Student involvement: A developmental theory for higher education. Journal of College Student Personnel, 25, 297–308.
Baashar, Y., Alkawsi, G., Mustafa, A., Alkahtani, A. A., Alsariera, Y. A., Ali, A. Q., Hashim, W., & Tiong, S. K. (2022). Toward predicting student’s academic performance using artificial neural networks (anns). Applied Sciences (Switzerland), 12.
Baneres, D., Rodriguez-Gonzalez, M. E., Guerrero-Roldan, A. E., & Cortadas, P. (2023). An early warning system to identify and intervene online dropout learners. International Journal of Educational Technology in Higher Education, 20(1).
Begum, S., & Padmannavar, S. S. (2022). Student performance prediction with bpso feature selection and cnn classifier. International Journal of Advanced and Applied Sciences, 9, 84–92.
Bendou, K., Megder, S. A. M. E., & Cherkaoui, S. C. A. M. (2017). Animated pedagogical agents to assist learners and to keep them motivated on online learning environments (lms or mooc). International Journal of Computer Applications, 168, 975–8887.
Benkendorf, D. J., & Hawkins, C. P. (2020). Effects of sample size and network depth on a deep learning approach to species distribution modeling. Ecological Informatics, 60.
Biggs, J. (1996). Enhancing teaching through constructive alignment. Higher Education, 32(3), 347–364.
Biggs, J. (1999). What the student does: Teaching for enhanced learning. Higher Education Research & Development, 18(1), 57–75.
Biggs, J., Tang, C., & Kennedy, G. (2022). Teaching for quality learning at university 5e. McGraw-Hill Education (UK).
Botelho, A. F., Varatharaj, A., Patikorn, T., Doherty, D., Adjei, S. A., & Beck, J. E. (2019). Developing early detectors of student attrition and wheel spinning using deep learning. IEEE Transactions on Learning Technologies, 12, 158–170.
Brdesee, H. S., Alsaggaf, W., Aljohani, N., & Hassan, S.-U. (2022). Predictive model using a machine learning approach for enhancing the retention rate of students at-risk. International Journal on Semantic Web and Information Systems, 18, 1–21.
Calvert, C. E. (2014). Developing a model and applications for probabilities of student success: A case study of predictive analytics. Open Learning: The Journal of Open, Distance and e-Learning, 29(2), 160–173.
Chandrashekar, G., & Sahin, F. (2014). A survey on feature selection methods. Computers & Electrical Engineering, 40(1), 16–28.
Chen, F., & Cui, Y. (2020). Utilizing student time series behaviour in learning management systems for early prediction of course performance. Journal of Learning Analytics, 7, 1–17.
Chen, H. C., Prasetyo, E., Tseng, S. S., Putra, K. T., Prayitno, K. S. S., & Weng, C. E. (2022). Week-wise student performance early prediction in virtual learning environment using a deep explainable artificial intelligence. Applied Sciences (Switzerland), 12.
Creemers, B., & Kyriakides, L. (2010). School factors explaining achievement on cognitive and affective outcomes: Establishing a dynamic model of educational effectiveness. Scandinavian Journal of Educational Research, 54(3), 263–294.
Deci, E. L., & Ryan, R. M. (2012). Self-determination theory. Handbook of Theories of Social Psychology, 1(20), 416–436.
Dien, T. T., Luu, S. H., Thanh-Hai, N., & Thai-Nghe, N. (2020). Deep learning with data transformation and factor analysis for student performance prediction. IJACSA International Journal of Advanced Computer Science and Applications, 11.
Dos Santos, C. F. G., & Papa, J. P. (2022). Avoiding overfitting: A survey on regularization methods for convolutional neural networks. ACM Computing Surveys, 54, 1–25.
Du, X., Yang, J., Hung, J.-L., & Shelton, B. (2020). Educational data mining: A systematic review of research and emerging trends. Information Discovery and Delivery, 48(4), 225–236.
Duru, I., Sunar, A. S., White, S., & Diri, B. (2021). Deep learning for discussion-based cross-domain performance prediction of mooc learners grouped by language on futurelearn. Arabian Journal for Science and Engineering, 46, 3613–3629.
El-Sabagh, H. A. (2021). Adaptive e-learning environment based on learning styles and its impact on development students’ engagement. International Journal of Educational Technology in Higher Education, 18.
Fraser, B. J., Walberg, H. J., Welch, W. W., & Hattie, J. A. (1987). Syntheses of educational productivity research. International Journal of Educational Research, 11(2), 147–252.
Fredricks, J. A., Blumenfeld, P. C., & Paris, A. H. (2004). School engagement: Potential of the concept, state of the evidence. Review of Educational Research, 74(1), 59–109.
Fullarton, S. (2002). Student engagement with school: Individual and school-level influences, longitudinal surveys of Australian youth. Australian Council for Education Research (ACER).
Garrison, D. R., Anderson, T., & Archer, W. (2001). Critical thinking, cognitive presence, and computer conferencing in distance education. American Journal of Distance Education, 15(1), 7–23.
Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157–1182.
Haertel, G. D., Walberg, H. J., & Weinstein, T. (1983). Psychological models of educational performance: A theoretical synthesis of constructs. Review of Educational Research, 53(1), 75–91.
Hattie, J. (2008). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. Routledge.
He, Y., Chen, R., Li, X., Hao, C., Liu, S., Zhang, G., & Jiang, B. (2020). Online at-risk student identification using rnn-gru joint neural networks. Information (Switzerland), 11, 1–11.
Hernández-Blanco, A., Herrera-Flores, B., Tomás, D., Navarro-Colorado, B., et al. (2019). A systematic review of deep learning approaches to educational data mining. Complexity, 2019.
Herodotou, C., Rienties, B., Hlosta, M., Boroowa, A., Mangafa, C., & Zdrahal, Z. (2020). The scalable implementation of predictive learning analytics at a distance learning university: Insights from a longitudinal case study. The Internet and Higher Education, 45, Article 100725.
Hidalgo, A. C., Ger, P. M., & Valentin, L. D. L. F. (2021). Using meta-learning to predict student performance in virtual learning environments. Applied Intelligence, 52, 3352–3365.
Hien, D. T. T., Thuy, C. T. T., Anh, T. K., Son, D. T., & Giap, C. N. (2020). Optimize the combination of categorical variable encoding and deep learning technique for the problem of prediction of Vietnamese student academic performance. IJACSA International Journal of Advanced Computer Science and Applications, 11.
Huan, L., & Hiroshi, M. (1998). Feature selection for knowledge discovery and data mining. Engineering and computer science. Kluwer Academic Publishers.
Huang, H., Yuan, S., He, T., & Hou, R. (2022). Use of behavior dynamics to improve early detection of at-risk students in online courses. Mobile Networks and Applications, 27, 441–452.
Hussain, S., Muhsin, Z. F., Salal, Y. K., Theodorou, P., Kurtoğlu, F., & Hazarika, G. C. (2019). Prediction model on student performance based on internal assessment using deep learning. International Journal: Emerging Technologies in Learning, 14, 4–22.
Jiao, X. (2022). A factorization deep product neural network for student physical performance prediction. Computational Intelligence and Neuroscience, 2022, 1–8.
Kavipriya, T., & Sengaliappan, M. (2021). Adaptive weight deep convolutional neural network (awdcnn) classifier for predicting student’s performance in job placement process. Annals of the Romanian Society for Cell Biology, 5494(5590), 25.
Kostopoulos, G., Tsiakmaki, M., Kotsiantis, S., & Ragos, O. (2020). Deep dense neural network for early prediction of failure-prone students. Machine Learning Paradigms: Advances in Deep Learning-based Technological Applications, 291–306.
Kuo, J. Y., Chung, H. T., Wang, P. F., & Lei, B. (2021). Building student course performance prediction model based on deep learning. Journal of Information Science and Engineering, 37, 243–257.
Kyriakides, L. (2008). Testing the validity of the comprehensive model of educational effectiveness: A step towards the development of a dynamic model of effectiveness. School Effectiveness and School Improvement, 19(4), 429–446.
Lee, C.-A., Tzeng, J.-W., Huang, N.-F., & Su, Y.-S. (2021). Prediction of student performance in massive open online courses using deep learning system based on learning behaviors. Technology & Society, 24, 130–146.
Lerner, R. M., & Steinberg, L. (2009). Handbook of adolescent psychology, volume 1: Individual bases of adolescent development. John Wiley & Sons.
Li, S., & Liu, T. (2021). Performance prediction for higher education students using deep learning. Complexity, 2021.
Li, X., Zhang, Y., Cheng, H., Li, M., & Yin, B. (2022). Student achievement prediction using deep neural network from multi-source campus data. In Complex and intelligent systems.
Liu, C., Wang, H., & Yuan, Z. (2022a). A method for predicting the academic performances of college students based on education system data. Mathematics, 10.
Liu, C., Wang, H., & Yuan, Z. (2022b). A predictive model for student achievement using spiking neural networks based on educational data. Applied Sciences (Switzerland), 12.
Liu, D., Zhang, Y., Zhang, J., Li, Q., Zhang, C., & Yin, Y. (2020). Multiple features fusion attention mechanism enhanced deep knowledge tracing for student performance prediction. IEEE Access, 8, 194894–194903.
Liu, T., Wang, C., Chang, L., & Gu, T. (2022). Predicting high-risk students using learning behavior. Mathematics, 10.
Liu, Y., Fan, S., Xu, S., Sajjanhar, A., Yeom, S., & Wei, Y. (2023). Predicting student performance using clickstream data and machine learning. Education Sciences, 13.
Lopez, O. A. M., Lopez, A. M., & Crossa, J. (2022). Fundamentals of artificial neural networks and deep learning. Springer International Publishing.
Magalhães, P., Ferreira, D., Cunha, J., & Rosário, P. (2020). Online vs traditional homework: A systematic review on the benefits to students’ performance. Computers and Education, 152, Article 103869.
Mah, D.-K. (2016). Learning analytics and digital badges: Potential impact on student retention in higher education. Technology, Knowledge and Learning, 21, 285–305.
Marzano, R. J. (2003). What works in schools: Translating research into action. ASCD.
Mihaescu, M. C., & Popescu, P. S. (2021). Review on publicly available datasets for educational data mining. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 11.
Nabil, A., Seyam, M., & Abou-Elfetouh, A. (2021). Prediction of students’ academic performance based on courses’ grades using deep neural networks. IEEE Access, 9, 140731–140746.


Namoun, A., & Alshanqiti, A. (2021). Predicting student performance using data mining and learning analytics techniques: A systematic literature review. Applied Sciences (Switzerland), 11, 1–28.
National Academies of Sciences, Engineering, and Medicine (2012). Education for life and work: Developing transferable knowledge and skills in the 21st century. Washington, DC: The National Academies Press.
Nawang, H., Makhtar, M., & Hamzah, W. M. A. F. W. (2021). A systematic literature review on student performance predictions. International Journal of Advanced Technology and Engineering Exploration, 8, 1441–1453.
Okoli, C. (2015). A guide to conducting a standalone systematic literature review. Communications of the Association for Information Systems, 37.
Papamitsiou, Z., & Economides, A. A. (2014). Learning analytics and educational data mining in practice: A systematic literature review of empirical evidence. Journal of Educational Technology & Society, 17(4), 49–64.
Pem, U., Dorji, C., Tshering, S., & Dorji, R. (2021). Effectiveness of the virtual learning environment (vle) for online teaching, learning, and assessment: Perspectives of academics and students of the royal university of Bhutan. International Journal of English Literature and Social Sciences, 6(4), 183–197.
Poudyal, S., Mohammadi-Aragh, M. J., & Ball, J. E. (2022). Prediction of student academic performance using a hybrid 2d cnn model. Electronics (Switzerland), 11.
Prabowo, H., Hidayat, A. A., Cenggoro, T. W., Rahutomo, R., Purwandari, K., & Pardamean, B. (2021). Aggregating time series and tabular data in deep learning model for university students’ gpa prediction. IEEE Access, 9, 87370–87377.
Rahman, M., Hasan, M., Billah, M., & Sajuti, R. J. (2022). Grading system prediction of educational performance analysis using data mining approach. MJSAT, 2, 204–211.
Sagi, O., & Rokach, L. (2018). Ensemble learning: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8.
Sarker, I. H. (2021). Deep learning: A comprehensive overview on techniques, taxonomy, applications and research directions. SN Computer Science, 2.
Scheerens, J., & Blomeke, S. (2016). Integrating teacher education effectiveness research into educational effectiveness models. Educational Research Review, 18, 70–87.
Schumacher, C., & Ifenthaler, D. (2018). Features students really expect from learning analytics. Computers in Human Behavior, 78, 397–407.
Siemens, G., & Baker, R. S. d. (2012). Learning analytics and educational data mining: Towards communication and collaboration. In Proceedings of the 2nd international conference on learning analytics and knowledge (pp. 252–254).
Sikder, M. H., Hosen, M. R., Fatema, K., & Islam, M. A. (2022). Predicting students’ performance in final examination using deep neural network. Asian Journal of Research in Computer Science, 218–227.
Song, X., Li, J., Sun, S., Yin, H., Dawson, P., & Doss, R. R. M. (2020). Sepn: A sequential engagement based academic performance prediction model. IEEE Intelligent Systems, 36, 46–53.
Tao, T., Sun, C., Wu, Z., Yang, J., & Wang, J. (2022). Deep neural network-based prediction and early warning of student grades and recommendations for similar learning approaches. Applied Sciences (Switzerland), 12.
Tinto, V. (1975). Dropout from higher education: A theoretical synthesis of recent research. Review of Educational Research, 45(1), 89–125.
Tinto, V. (2012). Leaving college: Rethinking the causes and cures of student attrition. University of Chicago Press.
Towell, G. G., & Shavlik, J. W. (1994). Knowledge-based artificial neural networks. Artificial Intelligence, 70, 119–165.
Tsiakmaki, M., Kostopoulos, G., Kotsiantis, S., & Ragos, O. (2020). Transfer learning from deep neural networks for predicting student performance. Applied Sciences (Switzerland), 10.
Viberg, O., Hatakka, M., Bälter, O., & Mavroudi, A. (2018). The current landscape of learning analytics in higher education. Computers in Human Behavior, 89, 98–110.
Wagner, E., & Longanecker, D. (2016). Scaling student success with predictive analytics: Reflections after four years in the data trenches. Change: The Magazine of Higher Learning, 48(1), 52–59.
Waheed, H., Hassan, S. U., Aljohani, N. R., Hardman, J., Alelyani, S., & Nawaz, R. (2019). Predicting academic performance of students from vle big data using deep learning models. Computers in Human Behavior, 104.
Waheed, H., Hassan, S. U., Nawaz, R., Aljohani, N. R., Chen, G., & Gasevic, D. (2022). Early prediction of learners at risk in self-paced education: A neural network approach. Expert Systems with Applications, 213.
Walberg, H. J. (1981). A psychological theory of educational productivity. Australian Journal of Education.
Walberg, H. J., Fraser, B. J., & Welch, W. W. (1986). A test of a model of educational productivity among senior high school students. The Journal of Educational Research, 79(3), 133–139.
Wang, X., Mei, X., Huang, Q., Han, Z., & Huang, C. (2020). Fine-grained learning performance prediction via adaptive sparse self-attention networks. Information Sciences, 545, 223–240.
Wigfield, A., & Eccles, J. S. (2000). Expectancy–value theory of achievement motivation. Contemporary Educational Psychology, 25(1), 68–81.
Xie, Y. (2021). Student performance prediction via attention-based multi-layer long-short term memory. Journal of Computer and Communications, 09, 61–79.
Xiong, S., Gasim, E., XinYing, C., Wah, K. K., & Ha, L. M. (2022). A proposed hybrid cnn-rnn architecture for student performance prediction. International Journal of Intelligent Systems and Applications in Engineering IJISAE, 2022, 347–355.
Yang, L., & Bai, Z. (2022). Study on score prediction model with high efficiency based on deep learning. Electronics (Switzerland), 11.
Yang, X., Zhang, H., Chen, R., Li, S., Zhang, N., Wang, B., & Wang, X. (2022). Research on forecasting of student grade based on adaptive k-means and deep neural network. Wireless Communications and Mobile Computing, 2022.
Ying, X. (2019). An overview of overfitting and its solutions. Journal of Physics. Conference Series, 1168.
Yousafzai, B. K., Afzal, S., Rahman, T., Khan, I., Ullah, I., Rehman, A. U., Baz, M., Hamam, H., & Cheikhrouhou, O. (2021). Student-performulator: Student academic performance using hybrid deep neural network. Sustainability (Switzerland), 13.
Zhang, Y., An, R., Liu, S., Cui, J., & Shang, X. (2023). Predicting and understanding student learning performance using multi-source sparse attention convolutional neural networks. IEEE Transactions on Big Data, 9, 118–132.

Bayan Alnasyan received her B.S. degree in Computer Science from Qassim University, Saudi Arabia in 2009, and received her M.S. degree in Information Technology from the University of Technology, Sydney in 2014. She is currently a PhD candidate at the Computing and Information Technology College at King Abdulaziz University. Her research interests focus on Artificial Intelligence, and the application of Deep Learning to characterize student performance.

Dr. Mohammed Basheri, a distinguished Associate Professor of Information Technology at King Abdulaziz University in Jeddah, gained his Ph.D. in Computer Science from Durham University, UK. His expertise includes data analysis, Human–Computer Interaction, and Technology Enhanced Learning. He passionately fosters growth and innovation in his students. His substantial research record reflects his standing in the global academic community as a beacon of expertise and educational excellence.

Dr. Madini Alassafi received his B.S. degree in Computer Science from King Abdulaziz University, Saudi Arabia in 2006, received his M.S. degree in Computer Science from California Lutheran University, United States of America in 2013, and received his PhD in “Security Cloud Computing” in February 2018 from the University of Southampton, United Kingdom. He is currently an associate professor in the Information Technology department and Vice Dean of the Faculty of Computing and Information Technology at King Abdulaziz University, Jeddah, Saudi Arabia. His main research interests include Cloud Computing and Security, Distributed Systems, Internet of Things (IoT) Security issues, Cloud Security Adoption, Risks, Cloud Migration Project Management, Cloud of Things, Security Threats, and Machine Learning.