PredictingStudentSuccess-AutoML PrePrint
All content following this page was uploaded by Hassan Zeineddine on 03 February 2022.
*AUD/IBM Center of Excellence for Smarter Logistics, American University in Dubai, Dubai, PO Box 28282, UAE
Abstract
Students’ success has recently become a primary strategic objective for most institutions of higher education. With
budget cuts and increasing operational costs, academic institutions are paying more attention to sustaining students’
enrollment in their programs without compromising rigor and quality of education. With the scientific advancements
in Big Data Analytics and Machine Learning, universities are increasingly relying on data to predict students’
performance. Many initiatives and research projects have addressed the use of students’ behavioral and academic data to
classify students and predict their future performance using advanced statistics and Machine Learning. To allow for
early intervention, this paper proposes the use of Automated Machine Learning to enhance the accuracy of predicting
student performance using data available prior to the start of the academic program.
Key words: Automated Machine Learning, Prediction Accuracy, Student Performance, Pre-Admission Data.
1. Introduction
Student retention is a pressing issue for academic institutions around the globe, given tight budgets and limited
resources [1]. The average dropout rate in Organization for Economic Co-operation and Development (OECD)
countries is around 45% [2]. Accordingly, higher education establishments are creating and setting intervention
strategies to remedy this problem. Researchers and practitioners agree that such strategies are most effective if applied
in a student’s first year of study. Hence, a lot of focus has been placed on predicting, as early as possible, vulnerable students.
Recently, predictive analysis has relied on Machine Learning to support business decision-making. Applications
in finance, operations and risk management are good attestations of the relevance of Machine Learning research in
various business functions. Evermann et al., for example, used machine learning to predict business process behaviour [4].
Machine Learning is increasingly used in the field of higher education management. Specifically, there has
been an increased interest in adopting Machine Learning to predict student performance and identify students at risk
based on initial data gathered during their years of study, as surveyed in the work of Miguéis et al. [6]. Less work has
addressed the prediction of student performance using data gathered prior to the start of the academic journey [2, 3].
Given the complexity of choosing an optimal prediction model for a given dataset from a wide pool of predictive
methods and different hyper-parameter values per model, the automation of this process can help increase the
prediction accuracy [7, 8, 9]. In this regard, Automated Machine Learning (AutoML) is a technique meant to
derive the best classification model and corresponding hyper-parameters for a given decision-making problem. This
technique can add value if used in predicting student performance. Yet, the review of the literature in this area shows
a lack of empirical work using AutoML. Our research work relies on AutoML to help increase the accuracy of
predicting student performance using data available upon entering an academic program.
The rest of the paper is organized as follows. Section 2 discusses the Theoretical Background, Section 3 presents the Methodology, Section 4 reports the Results, and Section 5 concludes.
2. Theoretical Background
The topic of predicting student performance in academic institutions has attracted the attention of researchers and
academic administrators for the past two decades [10]. The literature mainly focuses on two fronts: identifying the
most critical attributes for predicting student performance, and finding the best prediction method for enhancing the prediction accuracy.
In relation to identifying critical attributes, several factors may affect a student’s performance such as social and
economic standing, psychological elements, demographics, school systems, and social networks [15]. Reviews of the
common attributes used in predicting student performance discussed several factors and categorized them as either
internal or external [16]. Attributes such as assignment marks, quizzes, class tests and attendance are classified as
internal assessment [17]. Several papers have also used cumulative grade point average (CGPA) as their main internal
attributes to assess student performance [16]. In terms of external assessment, student demographics such as gender,
age, family background, and special needs are commonly used [18]. Other popular external attributes are
socio-demographic characteristics, extra-curricular activities, high school background and social interaction network
[18, 19, 20]. Several researchers have also used psychometric factors such as personal interest, study habits, and family support.
Several machine learning methods have been used in the literature to predict student performance, mainly Logistic
Regression, Decision Tree, Artificial Neural Network, Naive Bayes, K-Nearest Neighbor, Support Vector Machines,
and different Ensemble methods. The next paragraphs discuss the use of these methods in predicting student performance.
2.1. Logistic Regression
Regression methods for predicting student performance use a finite set of relationships among the dependent and
independent variables, generating a predictive function that models these associations [12, 18, 19, 23]. The logistic
regression method for predicting student performance is normally used to describe the association between a binary
outcome and a number of independent variables that may be binary, categorical or continuous [2, 13, 21, 24]. The
level of prediction accuracy using logistic regression is around 70%, using variables such as career aspirations.
2.2. Decision Tree
Many researchers have used the Decision Tree prediction method for its clarity and its ease in handling small and
large data sets and forecasting outcomes [6, 18, 21, 23, 25]. The logic when applying decision tree techniques is
equivalent to a series of IF-THEN statements, which can help in simplifying the understanding of this method. There
are several papers that have used this method to predict student performance using key indicators such as student
grades in specific courses and current CGPA [13, 22, 26]. The accuracy of prediction using this method while relying
on data prior to students starting an academic program is around 70% [2], and reaches 90% when using data gathered
2.3. Artificial Neural Network
An Artificial Neural Network (ANN) can detect all existing interactions among independent variables. It has been
widely used as a method in educational data mining. The ANN’s ability to detect with high confidence complex
associations between independent and dependent variables makes it a powerful tool in predicting student performance
[12, 13, 23, 24, 25, 26]. The most common variables used in forecasting student performance with neural networks
are student attitude towards learning, admission data, CGPA and grades in specific courses. This technique led to up
to 98% accuracy in predicting student performance using data collected after students joined an institution, and had an accuracy
of around 70% using data prior to students starting their academic journey [2, 16].
2.4. Naïve Bayes
Naïve Bayes is another method used to predict student performance. It uses all attributes existing in the data and
makes comparisons among independent variables to show the significance and effect of each of these predictors. The
papers that used this method predominantly considered variables such as grades, scholarships, CGPA, high school
background, demographics, social network data and internal assessments. Research using Naïve Bayes relied mostly
on data gathered after students had started their academic journey [6, 13, 21, 23, 24].
2.5. K-Nearest Neighbors
The K-Nearest Neighbors is a simple algorithm that classifies a data point based on the prevalent class of its K-
Nearest Neighbors. The data in this technique encompasses a number of multivariate attributes that are used for
classification. The K-Nearest Neighbors method is quick in predicting student performance in terms of level of
learning (slow, medium, good and excellent learner) [13, 21, 23]. Its accuracy rate was slightly above 60% when using
psychomotor factors, and reached 83% when using data extracted from internal assessments, CGPA, and extra-curricular activities.
2.6. Support Vector Machine
Support Vector Machine (SVM) is a supervised learning method that classifies data points by segregating them
using an N-dimensional hyperplane, where N is the number of attributes characterizing a data point. This method has
helped researchers in predicting student performance when working with small samples [6, 12, 13, 21, 23, 25]. The
SVM also proves to be effective when dealing with overlapped data. Earlier research used CGPA, extra-curricular
activities, psychomotor tests and internal assessments in predicting student performance [19].
2.7. Ensemble Methods
There is a general consensus that combining prediction methods produces more accurate and more robust
prediction results [27]. The collective decision of all methods is the result of a probabilistic averaging or a voting
scheme. To ensure an increase in accuracy over individual methods, the methods in an ensemble should have a fair
level of uncorrelated errors [28]. In other words, each constituent method should yield better accuracy than the other
methods in the set if applied individually on a different segment of the data space. In addition, none of the methods
will be able to yield optimal accuracy if applied on the entire data space. Several papers addressed the topic of
predicting student performance using the Ensemble Method [2, 3, 6, 11, 24]. Specifically, those papers relied on
Random Forest, Boosted Trees, Bagged Trees, and Information Fusion. Delen [11] reported 82% accuracy in predicting
students’ performance within their first year of studies using the Information Fusion approach. Miguéis et al. [6] reported
95% accuracy in predicting students’ performance within their first year of studies using Boosted Trees, relying on
earned grades and completed credits. Hoffait and Schyns’ work [2] is distinguished by its use of Random Forest
based on data gathered prior to admission. They extended different ensemble models with a special algorithm to
increase their prediction accuracy. The algorithm aims at identifying, out of the general set of students who are
predicted to fail, a subset of students who are most likely to fail. It ensures that the prediction accuracy rate,
computed over the identified subset, equals a confidence level defined by the decision maker. After applying the
algorithm on Random Forest, it identified 21.2% of students from the set of those who were facing a high risk of
failure, with a confidence of 91%. However, when considering the entire set of students, the authors reported close to
70% accuracy for predicting Fail, and close to 59% accuracy for predicting Pass.
After reviewing the widely used prediction methods, it is important to re-emphasize the value of automation for
choosing an optimal prediction model, given the complexity of such a task. Various AutoML applications have
recently been described in the literature [7, 8, 9]. The study of Tuggener et al. [9] confirms the superiority of auto-generated
machine learning models over human-designed models. Luo et al. highlight the cost of building and
generalizing Machine Learning models that often requires hundreds of manual iterations to identify a suitable
prediction model and corresponding hyper-parameters, and encourage medical researchers to adopt AutoML for cost
efficiency. Salvador et al. [7] conducted an experimental analysis examining a search space of 812 billion possible
combinations of methods and categorical hyper-parameters, across 21 publicly available data sets and 7 data sets from
real chemical production processes. Relying on their results, they encouraged practitioners to use AutoML on a broad
variety of classification problems. Stadelmann et al. [8] reported practical uses of AutoML in analyzing house and client data.
In light of the reviewed literature, there is an evident need to use AutoML in an attempt to improve the accuracy
of predicting student performance. Particularly, such a need is prominent when predicting students’ performance based
on data prior to starting their first academic year, where the accuracy level is around 70%. Increasing the accuracy of
prediction, based on data available from day one, is of high value not only for researchers but also for practitioners.
This paper relies on an automatic search algorithm in machine learning to identify the optimal model to predict
student success at the start of their first year in a university, using data available prior to starting a new program.
3. Methodology
In this study, we rely on AutoML to derive the best classification model and corresponding hyper-parameters.
Amongst the most popular tools that offer AutoML features are Auto-Weka [28] and Auto-sklearn [29]. We chose to
run the Auto-Weka search algorithm with the hyper-parameter optimization option. Figure 1 represents the automated
machine learning process that looped through the list of predictive methods and corresponding hyper-parameter values
to identify the model with the best accuracy. The search algorithm concluded with an Ensemble Model of multiple
methods that yielded the best classification accuracy out of all the auto-tested combinations of prediction methods and
corresponding hyper-parameters. The prediction mechanism of the identified Ensemble Model is based on a voting
scheme that adopts the prediction outcome resulting from the majority of the constituent methods. The constituent
methods of the identified Ensemble Model are:
▪ Artificial Neural Network
▪ K-Nearest Neighbors
▪ K-Means Clustering
▪ Naïve Bayes
▪ Support Vector Machine
▪ Logistic Regression
▪ Decision Tree
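The voting scheme described above can be sketched as follows; this is a minimal illustration, and the votes shown are hypothetical rather than taken from the study’s data:

```python
from collections import Counter

def majority_vote(predictions):
    """Adopt the class predicted by the majority of the constituent
    methods (ties resolved by first-seen order)."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical votes from the constituent methods for one student:
votes = ["PASS", "FAIL", "FAIL", "PASS", "FAIL"]
print(majority_vote(votes))  # FAIL
```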
Figure 1 – The automated search process: for each ML method, all combinations of corresponding hyper-parameter values are evaluated against the training and testing datasets.
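The search loop of Figure 1 can be sketched as a plain grid search; this is a simplified exhaustive version (Auto-Weka itself uses far more sophisticated Bayesian optimization), and the `methods` mapping with its `train`/`evaluate` callables are hypothetical stand-ins:

```python
from itertools import product

def automl_search(methods, evaluate):
    """Try every method with every combination of its hyper-parameter
    values and keep the candidate with the best accuracy.
    `methods` maps a method name to (train_fn, hyper_parameter_grid)."""
    best_model, best_accuracy = None, -1.0
    for name, (train, grid) in methods.items():
        keys = list(grid)
        for values in product(*(grid[k] for k in keys)):
            params = dict(zip(keys, values))
            model = train(**params)          # fit one candidate model
            accuracy = evaluate(model)       # score it on held-out data
            if accuracy > best_accuracy:
                best_model, best_accuracy = (name, params), accuracy
    return best_model, best_accuracy
```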
Mimicking the neural connections and interactions in the human brain, an ANN models a brain neuron using a
mathematical function F(𝑥) [30]. The ANN simulates the interconnections among neurons by nesting functions based
on a network model. The function’s parameter x is a vector of size n, 𝑥 = [𝑥1 , 𝑥2 , … , 𝑥𝑛 ]. We can represent this
function as:

F(x) = S(∑_{i=1}^{n} ω_i F(x_i))    (1)

If x is a scalar, F(x) = x. The factor ω_i is a weight that will be learned through training the network on historical data.
S is a transfer function that normalizes the output within a specific range of values. The adopted transfer function in
this study is the Sigmoid function that modulates values between 0 and 1 as follows:
S(x) = 1 / (1 + e^(-x))    (2)
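The transfer function of equation (2) can be expressed directly in code; a minimal check of its behaviour:

```python
import math

def sigmoid(x):
    """Sigmoid transfer function S(x) = 1 / (1 + e^(-x)), mapping any
    real input into the (0, 1) range, with S(0) = 0.5."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))   # 0.5
print(sigmoid(4))   # close to 1
```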
The ANN is a hierarchical model made of multiple layers. Each layer has a number of nodes (neurons) that
connect via unidirectional links with all nodes in the downstream layer. There is no connection to upstream or same-layer
nodes. Normally, there are three types of layers: the input layer, a set of middle layers, and the output layer as
shown in figure 2. The architecture of the ANN adopted in this study is made of an input layer representing the
different categorical values of the adopted data features, 2 middle layers having 12 and 7 neurons respectively, and an output layer.

Figure 2 – ANN layers: input layer, middle layers, and output layer.
The K-Nearest Neighbors method classifies a data point based on the dominant class of its K-nearest neighboring
points within a training data set. The distance between two data points is measured using a specific function, such as
the Euclidean, Manhattan and Chebychev functions [30]. In our study, the adopted distance function is the Euclidean,
and K was set to 1. The Euclidean distance between two data points x and y, where x and y are vectors of size n, is:

d(x, y) = √(∑_{i=1}^{n} (x_i − y_i)²)    (3)
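The distance computation and the neighbour vote can be sketched as follows, with K = 1 as in this study; the training pairs are toy values, not the study’s data:

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equally sized feature vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def knn_predict(point, training_set, k=1):
    """Classify `point` by the dominant class among its k nearest
    neighbours in `training_set`, a list of (vector, label) pairs."""
    neighbours = sorted(training_set, key=lambda rec: euclidean(point, rec[0]))[:k]
    labels = [label for _, label in neighbours]
    return max(set(labels), key=labels.count)

# Toy training set of (feature-vector, outcome) pairs:
train = [([1.0, 1.0], "PASS"), ([5.0, 5.0], "FAIL"), ([1.2, 0.8], "PASS")]
print(knn_predict([1.1, 1.0], train))  # PASS
```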
The K-Means Clustering method assigns a data point to one class out of K different classes. Before the
classification, the clustering algorithm arranges the data points of a training set into K different clusters, which
eventually represent the classes. In this study, K was equal to 2 since we have two different classes: Pass and Fail.
The assignment of a data point to a cluster is decided based on its distance from the centroid of each cluster. The
centroid is the average data point of all points in the cluster. The adopted distance function for this algorithm is the Euclidean function.
In our study, after the training phase to assign historical data points into two different clusters, the K-Means
Clustering classifier is able to predict the outcome of a new data point by assigning it to the cluster that has the closest
centroid.
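The closest-centroid assignment described above can be sketched as follows; the two toy clusters are hypothetical stand-ins for the Pass and Fail clusters learned from historical data:

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def centroid(points):
    """Average data point of all points in a cluster."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def assign(point, clusters):
    """Assign `point` to the class whose cluster centroid is closest."""
    return min(clusters, key=lambda label: euclidean(point, centroid(clusters[label])))

# Toy clusters (K = 2, as in the study):
clusters = {"PASS": [[1.0, 1.0], [1.2, 0.8]], "FAIL": [[5.0, 5.0], [4.8, 5.2]]}
print(assign([1.1, 0.9], clusters))  # PASS
```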
The Naive Bayes classifier is a simple technique that predicts outcomes based on the Bayesian theorem. The
training of Naïve Bayes classifier is fast compared to other computationally intensive models. It classifies a data point
x based on the conditional probability of being in a class C given the values of its constituent scalars [x1, x2, …, xn],
without relying on any additional parameter. The class that has the highest probability of occurrence given the inputs is selected as the prediction:
P(C|x) = (P(C) / P(x)) · ∏_{i=1}^{n} P(x_i|C)    (4)
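Equation (4) can be evaluated per class as follows; since the evidence P(x) is constant across classes, it can be dropped when only the most probable class is needed. The priors and conditional probabilities below are toy values, not estimates from the study’s data:

```python
def naive_bayes_predict(x, priors, likelihoods):
    """Pick the class maximising P(C) * prod_i P(x_i | C); the
    normalising term P(x) is the same for every class, so it is omitted."""
    scores = {}
    for c in priors:
        score = priors[c]
        for i, value in enumerate(x):
            score *= likelihoods[c][i][value]
        scores[c] = score
    return max(scores, key=scores.get)

# Toy conditional probabilities for two categorical features:
priors = {"PASS": 0.68, "FAIL": 0.32}
likelihoods = {
    "PASS": [{"MATH1": 0.2, "MATH3": 0.8}, {"ENGL1": 0.3, "ENGL3": 0.7}],
    "FAIL": [{"MATH1": 0.7, "MATH3": 0.3}, {"ENGL1": 0.6, "ENGL3": 0.4}],
}
print(naive_bayes_predict(["MATH1", "ENGL1"], priors, likelihoods))  # FAIL
```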
The SVM classifier derives boundaries between data points that belong to different classes. Points within certain
boundaries are normally part of a common class. The ideal scenario is when the data points belonging to different
classes are separable via a linear boundary. However, in most cases this is not possible due to data overlaps as shown
in figure 3. SVM casts the data points to a new higher dimension space in which the data becomes linearly separable
with a hyperplane, using a specific kernel function. This technique is based on Cover’s theorem, stating that non-linearly
separable data points would very likely be separable by a hyperplane if projected to a higher-dimensional
space via some non-linear transformation. The boundary hyperplane will be realized by referencing the borderline
data points, which are called the support vectors. The identified support vectors should be away from the boundary by
a given margin. The kernel function not only takes care of casting to the new space but also provides the dot product
between two data points x and y for measuring distances, hence reducing the computational overhead. We relied in
this study on the polynomial kernel function F of degree d, as shown below [30]:

F(x, y) = (x · y + 1)^d    (5)
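A polynomial kernel of this kind can be sketched as follows, assuming the common form (x · y + 1)^d; the input vectors are arbitrary toy values:

```python
def poly_kernel(x, y, d=2):
    """Polynomial kernel (x . y + 1)^d: the dot product in the implicit
    higher-dimensional space, computed without materialising that space."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + 1) ** d

print(poly_kernel([1, 2], [3, 4], d=2))  # (3 + 8 + 1)^2 = 144
```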
The Logistic Regression classifier transforms the output of a linear regression function f(x) into a value between 0 and
1 using the logistic function L, as described below [30]. The result reflects the probability of class occurrence:

L(f(x)) = 1 / (1 + e^(-f(x)))    (6)
A Decision Tree classifier learns from a set of historic data points and generates a corresponding tree-like structure.
The features and respective values are analyzed and structured in a hierarchical tree-like topology, which helps in
answering questions by a simple root-to-leaf traversal. The root and all other decision nodes are connected to two or
more downstream nodes (all representing answers to decision questions). A leaf node has no downstream connections
and represents the final answer to the series of questions captured in the path of nodes preceding it up to the root [30].
Figure 4 is a snapshot from a section of the Decision Tree pertaining to this study.
Figure 4 – Section of the Decision Tree (No – CGPA Not Below 2.0; Yes – CGPA Below 2.0)
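Such a root-to-leaf traversal is equivalent to nested IF-THEN statements; the splits below are hypothetical, loosely echoing the CGPA-below-2.0 leaves shown in Figure 4:

```python
def classify(record):
    """A decision-tree path read as IF-THEN rules (hypothetical splits)."""
    if record["math_level"] == "MATH1":          # decision node
        if record["course_load"] == "HIGH":      # decision node
            return "FAIL"                        # leaf: CGPA below 2.0
        return "PASS"                            # leaf: CGPA not below 2.0
    return "PASS"

print(classify({"math_level": "MATH1", "course_load": "HIGH"}))  # FAIL
```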
We have collected the data for this study from different sources within academic institutions in the United Arab
Emirates. Specifically, we relied on student records from Admission, Registrar and Student Service offices. Our
sample included records of 1491 students, of whom 1014 were in good academic standing.
We faced three main challenges when building the predictive model based on this sample: data inconsistency,
imbalance and overlap. For students who have spent at least a semester in a university program, several data features
would exist and should help in producing predictive models with high precision. For example, we can rely on several
features to predict students’ success in a particular course or program such as grades in key courses, exams, past terms’
CGPAs, probations, warnings, class participations, and extra-curricular engagements. For new entrants, in the absence
of this data, other variables that are available upon admission are required to build a precise predictive model. These
variables represent common attributes of the admitted students such as age, gender, ethnicity, study program, course
load, on-campus residency, probation, and school education system. We used all of these variables in this study.
Furthermore, to address the differences in the high school systems and the inconsistency in evaluation schemes, we
relied on the students’ placement in developmental English and Math courses that are based on scores from standard
exams such as TOEFL, IELTS, English ACCUPLACER, Math ACCUPLACER, and SAT. We have used 13 data
features in developing this predictive model, as described in Table 1, and transformed their values to categorical
ranges.
The imbalance between the number of passing (1014) and failing (477) students biases the predictive model. We
needed to apply a careful data balancing technique to ensure better precision without compromising the learning value
from the data. We chose the Synthetic Minority Oversampling Technique (SMOTE) [31] to create extra data points
in the training data set in order to balance the data classes. Table 2 shows the percentage of failing students per feature value.
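The core idea of SMOTE can be sketched as follows; this is a simplified single-sample version (the full algorithm considers k nearest neighbours and generates many samples), and the minority points are toy values:

```python
import random

def smote_sample(minority, rng=random.Random(0)):
    """Generate one synthetic minority point: pick a minority point,
    find its nearest minority neighbour, and interpolate at a random
    position on the segment between them."""
    base = rng.choice(minority)
    neighbour = min((p for p in minority if p is not base),
                    key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    gap = rng.random()  # random position along the segment
    return [a + gap * (b - a) for a, b in zip(base, neighbour)]

# Toy minority (FAIL) points in a 2-feature space:
fail_points = [[1.0, 2.0], [1.2, 2.1], [0.9, 1.8]]
synthetic = smote_sample(fail_points)
# The synthetic point lies on a segment between two existing minority points.
```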
Students having similar data might end up having different outcomes, causing confusion for a prediction method.
Due to this data overlap, a method resorts to a particular stochastic guess within certain probabilistic limits to predict
an outcome, leading to reduced prediction accuracy. Our proposed ensemble of multiple predictive methods increases
the prediction accuracy since it relies on voting amongst different methods. In other words, the prediction outcome of
the Ensemble Model is the most recurring classification among the set of methods.
Table 1
Data Features.

Feature        Values                                  Description
Program        BBA, ENG, BAIS, ARC, ID, …              Program of study in the university (6 considered programs).
School         HSD, IB, IGCSE, BAC, OTH                School system from which a student is coming, as per the UAE classification; other systems grouped under OTH.
Ethnicity      NAMR, AUS, ASIA, SAMR, EURO, LEVN,      The ethnic community to which the student belongs: North American (NAMR), Australian (AUS), Asian (ASIA),
               PERS, GCC, AFRC, NAFR, SASA, NASA       South American (SAMR), European (EURO), Levantine (LEVN), …
Age Group      AGE20+, AGE19-                          The age is inferred from the date of birth and grouped under two ranges.
Scholarship    NONE, QUART, HALF, …                    The scholarship status of the student: no scholarship (NONE), …
Transfer       TRC, TRN, NON                           The transfer status of the student: …, non-transfer (NON).
Residency      …, NO                                   Whether the student resides in a university dormitory (NO).
Course Load    HIGH, MODR, NORM                        The course load is inferred from the number of registered credits.
Math Level     MATH1, MATH2, MATH3, MATH4, MATH5       The level of math skills upon admission based on the math placement test; MATH1 is the lowest.
English Level  ENGL1, ENGL2, ENGL3, ENGL4              The level of English skills upon admission based on the English placement test; ENGL1 is the lowest.
Result         PASS, FAIL                              The outcome based on the student’s Grade Point Average: failing (FAIL) or passing (PASS).
Table 2
Descriptive Data.

Feature        Percentage of failing students
…              ID (15%), BCIS (13%)
…              MEST (28%), EURO (21%)
Course Load    LOW (65%), MOD (67%), NORM (35%), HIGH (18%)
Math Level     NONE (15%), MATH1 (48%), MATH2 (36%), all other Math levels (average of 25%)
English Level  NONE (23%), ENGL1 (44%), ENGL2 (18%), all other English levels (average of 10%)
4. Results
We used a 10-fold cross-validation to test the accuracy of the resulting Ensemble Model. The model is trained on
90% of the points and tested with 10% over 10 different runs. It is important to note that the data points allocated
for testing as part of the 10% split are different each time. Figure 5 is a schematic representation of the cross-validation process.

Figure 5 – The cross-validation process: each 90% training split is balanced with SMOTE before training the Ensemble Model, and the held-out 10% split is used for testing.
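The fold construction can be sketched as follows; the round-robin split is a simplification (stratified splits are common in practice), shown here with this study’s sample size of 1491:

```python
def ten_fold_indices(n_points, n_folds=10):
    """Assign indices 0..n_points-1 to n_folds disjoint test folds in
    round-robin order; each run trains on the remaining ~90% of points."""
    folds = [[] for _ in range(n_folds)]
    for i in range(n_points):
        folds[i % n_folds].append(i)
    return folds

folds = ten_fold_indices(1491)  # sample size used in this study
# Every point appears in exactly one test fold, so the test splits
# differ across the 10 runs.
```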
Table 3 lists the classification methods with their corresponding accuracy rates when applied to our data set. In
addition to the overall accuracy, the table differentiates between the accuracy of predicting Fail and Pass. This
differentiation is important to assess the efficiency of these methods in targeting students at risk.
Table 3
Methods Comparison.
Further, Table 3 highlights the kappa coefficient (κ), which is a statistic representing the level of agreement between
two different classifiers. It factors in the possibility of accidental agreements. In our case, the agreement is measured as:

κ = (P_o − P_e) / (1 − P_e)    (7)
P_o is the probability of making the right prediction, i.e. the accuracy measure. P_e is the probability of accidental
agreement between the classifiers. In a binary system having two predictors, P_e = P1(a)·P2(a) + P1(b)·P2(b),
where Pi(n) is the probability of classifier i predicting class n. A kappa coefficient between 0.4 and 0.75 is considered
good according to Fleiss’ scale; a kappa below 0.4 is poor, and above 0.75 is excellent. Our Ensemble Model achieved
a kappa of 0.5, which is nearly 20% higher than what is achieved using each prediction model separately (on the
same data). This implies that our Ensemble Model, resulting from the automatic search, leaves less chance for accidental
guessing.
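Equation (7) is simple to evaluate; the probabilities below are hypothetical values chosen to land near the reported kappa of 0.5:

```python
def kappa(p_o, p_e):
    """Cohen's kappa: agreement beyond what chance alone would give."""
    return (p_o - p_e) / (1 - p_e)

# Hypothetical values: observed accuracy 0.76, chance agreement 0.52.
print(round(kappa(0.76, 0.52), 2))  # 0.5
```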
5. Conclusion
The reported work in this paper contributes to the body of knowledge in the field of predicting student academic
success. Specifically, it relies on AutoML to increase the prediction accuracy of student performance using data
features available prior to the students starting their new academic program, i.e. pre-start data. In effect, the accuracy
of predicting student performance using pre-start data has never exceeded 70%, as found in the current literature [2,
3]. In our study, we achieved 75.9% overall accuracy through the use of AutoML, with a kappa of 0.5. Accordingly,
we encourage researchers in this field to adopt AutoML in their search for an optimal student performance prediction model.
Besides improving the overall prediction accuracy, it is of paramount importance to improve the accuracy of
predicting the failing students, who need immediate attention and support from specialized units within academic
institutions. The maximum accuracy rate reported in the literature on predicting failure of new-start students is at
70%. In our case, the auto-generated Ensemble Model predicts failing students with an accuracy of 83%, after
balancing the data using Synthetic Minority Oversampling Technique. Such a result emphasizes the importance of
balancing data using advanced statistical techniques to achieve better prediction, especially if the minority class is of
interest. The authors acknowledge the overgeneralization limitation of using SMOTE. Yet, since the data set
contains a sizeable minority, the risk of creating synthetic values outside of the minority set, overlapping with the majority class, remains limited.
The resulting increase in prediction accuracy of students at risk allows academic institutions to be more efficient
in supporting those students while utilizing the least amount of resources. Future studies may rely on descriptive
statistics to analyze the role of different psychographic variables and their impact on the predictive model. It would
also be interesting for upcoming studies to test auto-generated ensemble models in predicting student career success.
References
[1] M. Tight, Student retention and engagement in higher education, Journal of Further and Higher Education, Mar
[2] A.S. Hoffait, M. Schyns, Early detection of university students with potential difficulties, Decision Support
[3] J.P. Vandamme, N. Meskens, J.F. Superby, Predicting academic performance by data mining methods,
[4] J. Evermann, J.R. Rehse, P. Fettke, Predicting process behaviour using deep learning, Decision Support Systems
[5] N. Carneiro, G. Figueira, M. Costa, A data mining based system for credit-card fraud detection in e-tail, Decision
[6] V.L. Miguéis, Ana Freitas, Paulo J.V. Garcia, André Silva, Early segmentation of students according to their
academic performance: A predictive modelling approach, Decision Support Systems 115 (2018) 36-51.
[7] M. M. Salvador, M. Budka, B. Gabrys, Automatic Composition and Optimization of Multicomponent Predictive
Systems With an Extended Auto-WEKA, IEEE Transactions on Automation Science and Engineering 16 (2) 2019.
[8] T. Stadelmann, M. Amirian, I. Arabaci, M. Arnold, G. F. Duivesteijn, I. Elezi, M. Geiger, S. Lӧrwald, B.B.
Meier, K. Rombach, Deep learning in the wild, IAPR Workshop on Artificial Neural Networks in Pattern
Machine Learning in Practice, State of the Art and Recent Results, Proceedings of the 6th IEEE Swiss Conference
[10] A. Pena-Ayala, Educational data mining: a survey and a data mining-based analysis of recent works, Expert
[11] D. Delen, A comparative analysis of machine learning techniques for student retention management, Decision
[12] S. Huang, N. Fang, Predicting student academic performance in an engineering dynamics course: a comparison
of four types of predictive mathematical models, Computers & Education 61 (2013) 133–145.
[13] F. Marbouti, H.A. Diefes-Dux, K. Madhavan, Models for early prediction of at-risk students in a course using
[14] C. Márquez-Vera, A. Cano, C. Romero, A.Y.M. Noaman, H. Mousa Fardoun, S. Ventura, Early dropout
prediction using data mining: a case study with high school students, Expert Systems 33 (1) (2016) 107–124.
[15] M. Richardson, C. Abraham, R. Bond, Psychological correlates of university students' academic performance: a
systematic review and meta-analysis, Psychological Bulletin 138 (2) (2012) 353–387.
[16] A.M. Shahiri, H. Wahidah, A.R. Nur’aini, A Review on Predicting Student's Performance Using Data Mining
[17] Z.K. Papamitsiou, V. Terzis, A.A. Economides, Temporal learning analytics for computer based testing,
Proceedings of the Fourth International Conference on Learning Analytics And Knowledge, LAK ’14, ACM, New
[18] S. Natek, M. Zwilling, Student data mining solution-knowledge management system related to higher education
[19] M. Mayilvaganan, D. Kalpanadevi, Comparison of classification techniques for predicting the performance of
students academic environment, 2014 International Conference on Communication and Network Technologies
[20] G. Putnik, E. Costa, C. Alves, H. Castro, L. Varela, V. Shah, Analysing the correlation between social network
analysis measures and performance of students in social network-based engineering education, International Journal
[21] G. Gray, C. McGuinness, P. Owende, An application of classification models to predict learner progression in
tertiary education, Advance Computing Conference (IACC), 2014 IEEE International, 2014, pp. 549–554.
[22] T. Mishra, D. Kumar, S. Gupta, Mining students' data for prediction performance, 2014 Fourth International
[23] P. Strecht, L. Cruz, C. Soares, J. Mendes-Moreira, R. Abreu, A Comparative Study of Classification and
Regression Algorithms for Modelling Students' Academic Performance, International Educational Data Mining
[24] C. Romero, P.G. Espejo, A. Zafra, J.R. Romero, S. Ventura, Web usage mining for predicting final marks of
students that use Moodle courses, Computer Applications in Engineering Education 21 (1) (2013) 135–146.
[25] E.B. Costa, B. Fonseca, M.A. Santana, F.F. de Araújo, J. Rego, Evaluating the effectiveness of educational data
mining techniques for early prediction of students' academic failure in introductory programming courses,
[26] C. Romero, S. Ventura, Data mining in education, Wiley Interdisciplinary Reviews: Data Mining and
[27] G. Seni, J. Elder, Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions,
[28] L. Kotthoff, C. Thornton, H.H. Hoos, F. Hutter, K. Leyton-Brown, Auto-WEKA 2.0: Automatic model
selection and hyperparameter optimization in WEKA, Journal of Machine Learning Research 18 (2017) 1-5.
[29] M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, F. Hutter, Efficient and robust automated machine learning, Advances in Neural Information Processing Systems 28 (2015).
[30] G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning: With Applications in R, Springer, 2013.
[31] G. Douzas, F. Bacao, F. Last, Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE, Information Sciences 465 (2018).
Bios:
Hassan Zeineddine holds a PhD in computer sciences from the University of Ottawa in Canada. He has 15 years of
industry experience associated with several leading telecommunication companies in North America. Hassan’s
current research interests are in the fields of data analytics, operations research, logistics and supply chains
collaboration. His other research interests include process modeling and simulations.
Assaad Farah holds a PhD in Management from the University of Bath in the United Kingdom. In addition to his
academic responsibilities, he is an executive educator and consultant mainly for the UAE public sector. Prior to that,
he worked in the aeronautical and mobile industry in Canada. His research focus revolves around knowledge
Udo Braendle has worked in practice and for universities for more than 15 years. His research mainly focuses on
management science, regulation and the social and environmental behavior of firms. He has published widely on
these issues in leading journals, such as the Social Responsibility Journal and the Journal of Management and Governance.