
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/335219629

An Apache Spark-based Platform for Predicting The Performance of Undergraduate Student

Preprint · August 2019
CITATIONS: 0 · READS: 252
4 authors, including Minh Chung and Nam Thoai, Ho Chi Minh City University of Technology (HCMUT).

All content following this page was uploaded by Minh Chung on 17 August 2019.


An Apache Spark-based Platform for Predicting The Performance of
Undergraduate Student

Thong Le Mai, Phat Thanh Do, Minh Thanh Chung, Nam Thoai
Faculty of Computer Science & Engineering
Ho Chi Minh City University of Technology, Vietnam
Email: 1513293,1512400,ctminh,[email protected]

Abstract—Nowadays, Educational Data Mining (EDM) plays a very important role in higher education institutions. Plenty of algorithms have been employed to predict a student's GPA in the next semester's courses. The results can be used to identify dropout-prone students early or to help students choose the elective courses that suit them. The most widely used methods come from machine learning; however, their accuracy can change from dataset to dataset. More importantly, the performance of a prediction model can be affected by the characteristics of the dataset it is applied to. In this paper, we build a distributed platform on Spark to predict the missing grades of elective courses for undergraduate students. The paper compares several methods based on the combination of Collaborative Filtering and Matrix Factorization (namely Alternating Least Squares). We evaluate the performance of these algorithms on a dataset provided by Ho Chi Minh City University of Technology (HCMUT), which contains information about undergraduate students from 2006 to 2017. Given the characteristics of our dataset, the paper highlights that Alternating Least Squares with a non-negative constraint achieves better results than the other methods in comparison.

Index Terms—Educational Data Mining, Spark, prediction, student performance, distributed system, machine learning

1. Introduction

Educational Data Mining (EDM) is a research field that applies data mining techniques to analyze patterns in data from educational contexts [1]. Currently, many learning systems gather large amounts of educational data, such as Learning Management Systems (LMS) and Massive Open Online Courses (MOOCs). Applications and tasks in EDM can be divided into different categories depending on their properties. Based on the literature review in 2013 [2], we consider two main groups in EDM: "Student Modeling" and "Decision Support Systems". Concretely, predicting student performance belongs to the "Student Modeling" group. The most widely used methods for this problem are regression and classification, but other methods such as clustering and feature selection have also been used.

Prediction has been one of the most dominant topics in EDM since 1995 [3]. Related studies usually exploit influential factors from university data, such as GPA, age, and sex, to build a prediction model. Many machine learning algorithms are used to solve these problems, including Decision Trees, Random Forests, Regression, and Neural Networks [4] [5] [6]. Other techniques, frequently based on Recommender Systems (e.g., Collaborative Filtering, Matrix Factorization), have also found considerable success [7] [8] [9] [10] [11]. Furthermore, some studies focus on whether predictor variables beyond the explicit ratings, such as age, sex, online time, and response efficiency, improve the accuracy [6] [12]. However, given how easy it is to collect educational data today, the open problems are the scale of prediction models and making each prediction method aware of the characteristics of the dataset it is applied to.

In this paper, we propose a distributed platform based on Spark to predict the scores of future courses for undergraduate students in our university. We obtain different prediction algorithms by combining techniques based on Recommender Systems: Collaborative Filtering (namely User-based Collaborative Filtering and Item-based Collaborative Filtering), Matrix Factorization (using Alternating Least Squares), and Non-negative Matrix Factorization (Alternating Least Squares with an added non-negative constraint). The original dataset is grouped by faculty: Industrial Maintenance, Chemical Engineering, Civil Engineering, and so on. For the evaluation, the paper applies our proposed methods to the data of four faculties: Computer Science and Engineering (MT), Environment and Natural Resources (MO), Chemical Engineering (HC), and Civil Engineering (XD). Our experiments highlight that the best prediction model depends on the characteristics of the dataset; performance can change from dataset to dataset. Furthermore, the accuracy of each prediction model is also affected by how the dataset is divided or chosen for training. In some cases, the experiments show a difference when the prediction models are trained on data from the whole university versus a separate faculty.

The rest of this paper is organized as follows. Section 2 reviews related work on methods for predicting student performance. Section 3 presents the background of the algorithms and techniques that are used.
Section 4 describes the dataset provided by Ho Chi Minh City University of Technology. Section 5 presents the proposed platform and methods. In Section 6, we present the experimental results, and we highlight the conclusions as well as future work in Section 7.

2. Related Work

In Educational Data Mining (EDM), one of the most common tasks is extracting information that can be used to predict student performance [3] [1]. Many studies have been conducted to predict students' grades as well as to identify at-risk students. Currently, machine learning algorithms are used to solve this prediction problem, and some of them are based on techniques frequently used in Recommender Systems [13]. Romero et al. used classification algorithms such as Decision Trees, Rule Induction, and Neural Networks to predict students' final marks based on information in an e-learning system [5]. Random Forest was employed in [4] to examine the statistical relationship between students' graduate-level performance and undergraduate achievements. García et al. applied association rule mining to discover interesting information in students' usage data in the form of IF-THEN recommendation rules, building a system that helps teachers continually improve and maintain adaptive and non-adaptive e-learning courses [14]. Nurjanah et al. proposed a technique for recommending learning materials that combines content-based filtering and collaborative filtering [7], [8]. In detail, content-based filtering is first applied to filter out relevant materials; then, collaborative filtering is used to select good students. This technique aims to reduce the drawback of classic collaborative filtering, which recommends materials based only on the similarity between students without taking a student's competence into account. The resulting model achieved an MAE of 0.96 on a scale of 1 to 10 and 0.73 on a scale of 1 to 5. In 2017, Iqbal et al. applied and evaluated Collaborative Filtering, Matrix Factorization, and Restricted Boltzmann Machines (RBM) to predict student grades on a dataset of 225 students and 24 courses with 1736 available grades and 3664 missing grades (grades given on a scale of 0-4). They concluded that the RBM model gives the best result with an RMSE of 0.3, nearly half that of the second-best model (Non-negative Matrix Factorization), which had an RMSE of 0.57 [9]. In addition, Conijn et al. analyzed 17 blended courses with 4,989 students using 23 predictor variables collected from the Moodle Learning Management System (LMS) [6]. They found that the LMS data yield a larger improvement in prediction when in-between assessment grades are unavailable; thus, the LMS data in this dataset have substantially smaller predictive value than the midterm grades.

Concerning another technique, Thai-Nghe et al. used matrix factorization to predict student performance on the Knowledge Discovery and Data Mining Challenge 2010 dataset. They showed that matrix factorization could improve prediction results compared to traditional regression methods such as logistic/linear regression [10]. Furthermore, in their follow-up paper [11], they extended the research, using tensor-based factorization to take the temporal effect into account when predicting student performance. Feng et al. [12] took advantage of student-system interaction information that is normally not available in traditional practice tests, such as the time students take to answer questions and the time they take to correct an answer they got wrong. Adding this information is shown to yield better predictions than traditional models that only use the correctness of test questions. Elbadrawy et al. attempted to use Personalized Multi-regression and Matrix Factorization to forecast students' grades on in-class assessments [15]. The results revealed that these methods can achieve a lower error rate than traditional methods.

Furthermore, several surveys review the existing types of educational systems and the methods applied in EDM. Romero et al. categorized EDM research into 2 large groups [16]:

• Statistics and visualization
• Web mining

Web mining is considered a prominent group of EDM because many methods revolve around the analysis of logs of student-computer interaction. [1] examined three hundred papers published until 2009, grouped by task/category such as recommendation, predicting performance, detecting behavior, analysis, and visualization. [3] investigated the major trends in EDM research. They found that 43% of the papers examined in [16] published between 1995 and 2005 centered around relationship mining methods. However, in 2008 and 2009, relationship mining slipped to fifth place with only 9% of papers. On the other hand, prediction, which was in second place between 1995 and 2005, moved to the dominant position in 2008-2009.

3. Background

3.1. Spark cluster overview

Today, Spark [17] is used in a wide range of areas to analyze large-scale datasets. Figure 1 gives an overview of how Spark performs tasks on clusters. Applications or user-defined programs run as independent sets of processes, coordinated by the SparkContext object defined in the code (called the Driver Program). The SparkContext allocates resources across programs. Once connected, Spark finds executors on worker nodes in the cluster, where the processes of the user's programs will run and store data. Next, Spark sends the program code (written in Java or Python) to the executors. Finally, the SparkContext sends tasks to the executors to run.

3.2. Data mining techniques

3.2.1. Collaborative Filtering. Collaborative Filtering is a popular algorithm commonly used in recommender systems.
Figure 1: Spark architecture with the cluster mode

A recommender system focuses on suggesting a set of items to users based on their existing ratings. To do this, we first determine the user's rating for each item. In this paper, the users are students, the items are courses, and the ratings are grades. There are two kinds of Collaborative Filtering: User-based Collaborative Filtering (UBCF) [18] and Item-based Collaborative Filtering (IBCF) [19].

User-based Collaborative Filtering: The student's grade in a course can be predicted by identifying similar students; the predictions are performed by selecting and aggregating the grades of other students. There is a list of n students S = {S1, S2, ..., Sn} and a list of m courses C = {C1, C2, ..., Cm}, and each student has a list of courses with corresponding grades. To predict the student's grade in a course:

• Firstly, the UBCF algorithm calculates the similarity matrix to determine how similar each student in the database is to the active student.
• After that, the algorithm selects the most similar students using the k-nearest neighbors algorithm [20].
• The prediction results are generated by aggregating the GPAs of the most similar students. In the simple case, the aggregation can be the mean, or a weighted average that takes the similarity between students into account.

Item-based Collaborative Filtering: IBCF is used in the case where the courses rarely change. In this algorithm, the student's grade in a course is predicted by identifying similar courses already taken by the current student. Instead of identifying the most similar students as in UBCF, the IBCF algorithm determines the most similar courses from the set of courses that the current student has taken. The predictions are made by selecting and aggregating the grades of those courses. To predict the student's grade in a course:

• Firstly, the IBCF algorithm calculates the similarity matrix between the courses to determine how similar each course in the database is to the course to be predicted.
• Then, the algorithm selects the most similar courses taken by the active student, using the k-nearest neighbors algorithm.
• Similar to UBCF, the prediction result is made by aggregating the grades of the most similar courses.

3.2.2. Matrix Factorization. Matrix Factorization is the basis for some of the most successful realizations of the latent factor model, which tries to characterize students and courses by k factors that explain the grade patterns. For courses, these factors can correspond to the amount of math, the difficulty, or the number of equations; for students, they correspond to the student's affinity toward those latent factors. Matrix Factorization decomposes a matrix, in this case the utility matrix holding all student grades, into two or more matrices.

Singular Value Decomposition: Singular Value Decomposition tries to decompose the utility matrix G into two matrices, U and V:

    G ≈ U × V    (1)

where:

• U is an m × r matrix, where m is the number of students and r is the number of latent factors. Each student u is associated with a vector p_u of length r; each element of this vector corresponds to the affinity of student u for the corresponding latent factor. The vector p_u can be viewed as a row of U, where U_uk represents the affinity of student u for latent factor k.
• V is an r × n matrix, where n is the number of courses. Each course i is associated with a vector q_i of length r; each element of this vector corresponds to the affinity of course i for the corresponding latent factor. The vector q_i can be viewed as a column of V, where V_ik represents the affinity of course i for latent factor k.

The dot product p_u q_i^T gives the estimated grade ĝ_ui of student u in course i:

    ĝ_ui = p_u q_i^T    (2)

To learn the matrices U and V, we minimize the cost function:

    Σ_{(u,i)∈H} (g_ui - p_u q_i^T)² + λ(‖p_u‖² + ‖q_i‖²)    (3)

where H is the set of (u, i) pairs for which g_ui is in the training set, and λ is the regularization parameter.

Using gradient descent, for each given value of g_ui in the training set, the vectors p_u and q_i are updated as:

    p_u ← p_u + γ((g_ui - p_u q_i^T) · q_i - λ p_u)
    q_i ← q_i + γ((g_ui - p_u q_i^T) · p_u - λ q_i)    (4)

where γ is the learning rate.
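The update rules in Equations (3)-(4) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's implementation: the toy ratings below mirror the shape of Table 4, and the rank k, learning rate γ, regularization λ, and epoch count are assumed values.

```python
import numpy as np

def sgd_mf(grades, n_students, n_courses, k=3, gamma=0.01, lam=0.1,
           epochs=500, seed=0):
    """Factor observed grades into student factors P (m x k) and course
    factors Q (n x k) using the regularized SGD update of Equation (4)."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_students, k))
    Q = 0.1 * rng.standard_normal((n_courses, k))
    for _ in range(epochs):
        for u, i, g in grades:            # loop over the training set H
            err = g - P[u] @ Q[i]         # g_ui - p_u q_i^T
            pu = P[u].copy()              # keep the old p_u for both updates
            P[u] += gamma * (err * Q[i] - lam * pu)
            Q[i] += gamma * (err * pu - lam * Q[i])
    return P, Q

# Toy training data: (student index, course index, grade) triples.
grades = [(0, 0, 9.0), (0, 1, 8.0), (1, 0, 8.0), (1, 1, 7.0),
          (1, 2, 7.5), (1, 3, 8.5), (2, 0, 7.5), (2, 1, 8.5), (2, 2, 7.5)]
P, Q = sgd_mf(grades, n_students=3, n_courses=4)
prediction = float(P[0] @ Q[2])           # estimated grade of student 0 in course 2
```

After training, the dot product of a student row and a course row fills in any missing cell of the utility matrix, which is exactly how the missing elective grades are estimated.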
Alternating Least Squares (ALS): This method is an optimization of the Singular Value Decomposition (SVD) [21] approach. Recall Equation 3, where both p_u and q_i are unknown and tied to each other in a multiplication, which makes the problem non-convex. The idea of ALS is that when one of the unknown variables, either p_u or q_i, is fixed, the cost function becomes a quadratic problem. In each iteration, ALS first fixes U (all vectors p_u) and solves for V; then it fixes V (all vectors q_i) and solves for U. The process is repeated until convergence. In ALS, each p_u is independent of every other p_u', and each q_i is independent of every other q_i', so the algorithm can be massively parallelized.

Non-negative Matrix Factorization: Another type of matrix factorization, where a non-negative constraint is added. Given a non-negative matrix G containing all observed grades, we need to find non-negative matrix factors W and H [22]:

    G ≈ W × H    (5)

where:

• W is a non-negative m × r matrix.
• H is a non-negative r × n matrix.

With normal matrix factorization, we can obtain a negative affinity between a student u and a latent factor k, which can be hard to interpret (e.g., the difficulty of a latent factor can be negative). Non-negative matrix factorization can give a better representation of the latent factors by guaranteeing non-negative values; thus a course can be described as a collection of smaller knowledge domains. For example, a Computer Graphics course can be interpreted as a collection of 60% algebra + 20% math + 20% art + 0% literature.

4. Dataset

The educational dataset is collected from Ho Chi Minh City University of Technology in Vietnam. It contains data on 61271 undergraduate students with over three million records of student grades. There are 35 columns in the dataset; however, it is reprocessed to keep the influential fields. In the context of this work, we focus on predicting the average grades of some unfinished courses of students. Therefore, the basic foundation of our proposed module is the relation between finished courses and unfinished courses. This also depends on the educational program of each university.

Table 1: The statistics of the dataset

    Number of faculties        | 14
    Number of courses          | 2389
    Number of students         | 61271
    Number of student's grades | 2270045
    Sparsity                   | 0.9845

In HCMUT, there are in total 14 faculties and 2389 courses, and the number of students in the given dataset (from 2006 until 2017) is 61271, as Table 1 shows. The undergraduate program finishes in 4-4.5 years, and the courses are divided into three groups: Basic Courses, Compulsory Courses, and Core & Elective Courses. The courses correspond to the year levels that students are studying. Figure 2 shows the program of undergraduate students in the Faculty of Computer Science & Engineering. It highlights that our module predicts only the scores of unfinished elective courses. Fundamentally, the information from finished courses (such as basic and compulsory courses) is the foundation for predicting unfinished courses.

From the 35 columns of information in the dataset, we reprocess 4 main fields, as in Table 2. In the core of our prediction module, the proposed algorithms use these fields to train the prediction model. In detail, we have the student ID, the name of the faculty, the course ID, and the average grade computed from the component grades of each course.

Table 2: Educational dataset

    Student  | Faculty                          | Course             | Grade
    29081892 | Computer Science and Engineering | Data Mining        | 7.5
    28193782 | Chemical Engineering             | Calculus           | 9.0
    32876719 | Mechanical Engineering           | Physical Education | 7.5
    ...      | ...                              | ...                | ...

Each record holds a student's grade in the corresponding course. The grades are scaled from 0 to 10 and stored as doubles. The distribution of student grades is shown in Figure 3; the most common range of undergraduate grades is from 5 to 8.5, and the sparsity of the dataset is 0.9845. The sparsity is calculated by the following formula:

    S = 1 - G / (N · C)    (6)

where G, N, and C are the total numbers of student grades, students, and courses, respectively. Furthermore, Table 3 shows an overview of the information for all faculties in our university, e.g., the number of courses, students, and grades.

Figure 2: An example of the educational program in Faculty of Computer Science & Engineering.

Table 3: The detailed statistics of each faculty in the dataset

    Faculty                                | Notation | # courses | # students | # student's grades | Sparsity
    Computer Science and Engineering       | MT       | 168       | 5158       | 155574             | 0.8205
    Industrial Maintenance                 | BD       | 116       | 1958       | 54976              | 0.7580
    Mechanical Engineering                 | CK       | 435       | 9233       | 351539             | 0.9125
    Geology & Petroleum Engineering        | DC       | 207       | 2476       | 93516              | 0.8175
    Electrical and Electronic Engineering  | DD       | 325       | 9391       | 360546             | 0.8819
    Transportation Engineering             | GT       | 230       | 2323       | 88510              | 0.8343
    Chemical Engineering                   | HC       | 322       | 6117       | 222478             | 0.8870
    Environment and Natural Resources      | MO       | 177       | 2401       | 90633              | 0.7867
    Energy Engineering                     | PD       | 89        | 565        | 16912              | 0.6637
    Industrial Management                  | QL       | 137       | 3577       | 104514             | 0.7867
    Applied Sciences                       | UD       | 192       | 2099       | 78986              | 0.8040
    Materials Technology                   | VL       | 183       | 2910       | 109612             | 0.7942
    Training Program of Excellent Engineers in Vietnam (PFIEV) | VP | 309 | 1515 | 92040           | 0.8039
    Civil Engineering                      | XD       | 445       | 11691      | 450209             | 0.9135

Figure 3: Distribution of student's grades

5. Methods

Apache Spark [17] has a well-defined layered architecture in which all components are loosely coupled, and it can be integrated with various extensions and libraries. The core of Spark is the base engine for large-scale parallel and distributed data processing. Our platform is built on top of this core, which allows the predictor to be modified and improved for different demands.

In our platform, there are 4 main modules: preprocessing, training, deployment & recommender. Figure 4 shows the workflow and overview of our prediction platform; the white boxes are inputs, outputs, or state definitions in the related modules. The paper aims at predicting students' scores in unfinished & elective courses in the third and fourth years. The main idea is a machine learning core fed with insights from the history of finished courses that undergraduate students have taken. According to the workflow:

• Firstly, the raw dataset is analyzed and preprocessed by the preprocessing module. Afterward, there are two sets: a training set and a test set.
• Secondly, the training set is used for building the prediction model with the machine learning techniques. The training module includes several core methods for training the prediction model, and it is followed by a testing phase that improves the prediction model across a number of loops.
• Next, the best model for predicting student scores is deployed and used in practice with the student's input (including student information & scores of finished courses).
• Finally, the result of the prediction model is used by the recommender module, which focuses on recommending which courses students should choose to achieve a high result. Association rules are used in the recommender module.

As mentioned above, the training module is the core of our prediction platform. We use several machine learning techniques to build this module, and they can be modified or improved with further methods. Depending on the features of each dataset, as well as the structure of the education programs in different universities, a suitable method is chosen in the training module. The proposed machine learning techniques include:

5.1. Baseline Method

This method simply takes the average of all available grades of the student to predict the missing course grades. It is used as a baseline for comparison among the different prediction models. For example, in Table 4, the estimated grade of Course 2 for student 1511000 is calculated as (9 + 8) / 2 = 8.5.

5.2. Matrix Factorization

We apply Matrix Factorization in Spark in the form of Alternating Least Squares. Spark also supports Non-negative Matrix Factorization by adding the non-negative constraint when solving the least squares problems. In detail, the data in Table 4 can be interpreted as a matrix, as shown in Table 5. This matrix is then factorized using the Alternating Least Squares approach, and the resulting factor matrices are used to predict the missing values.
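The platform relies on Spark's built-in ALS. As a single-machine illustration of the alternating step described in Section 3.2.2 (fix one factor matrix, solve a regularized least-squares problem for the other), the NumPy sketch below factorizes a utility matrix shaped like Table 5; the rank, regularization value, and iteration count are assumptions, and a real deployment would use pyspark.ml.recommendation.ALS instead.

```python
import numpy as np

def als_step(G, M, F_fixed, lam):
    """Solve the ridge normal equations for one side while the other is fixed.
    G: utility matrix with np.nan for missing grades; M: mask of observed cells."""
    k = F_fixed.shape[1]
    out = np.zeros((G.shape[0], k))
    for u in range(G.shape[0]):
        obs = M[u]                          # entries observed for this row
        A = F_fixed[obs].T @ F_fixed[obs] + lam * np.eye(k)
        b = F_fixed[obs].T @ G[u, obs]
        out[u] = np.linalg.solve(A, b)
    return out

# Utility matrix in the shape of Table 5 (np.nan marks missing grades).
G = np.array([[9.0, 8.0, np.nan, np.nan],
              [8.0, 7.0, 7.5,    8.5],
              [7.5, 8.5, 7.5,    np.nan]])
M = ~np.isnan(G)
rng = np.random.default_rng(0)
k, lam = 2, 0.05
P, Q = rng.random((3, k)), rng.random((4, k))
for _ in range(30):                         # alternate until convergence
    P = als_step(G, M, Q, lam)              # fix Q, solve for all p_u
    Q = als_step(G.T, M.T, P, lam)          # fix P, solve for all q_i
pred = P @ Q.T                              # pred[0, 2] fills the '?' cell
```

Spark's ALS distributes these independent per-row solves across executors; its `nonnegative=True` flag adds the W, H ≥ 0 constraint of Section 3.2.2.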
Figure 4: The overview of the score-prediction platform based on Spark for undergraduate level

Table 4: An example dataset

    Student ID | Course ID | Grade
    1511000    | 0         | 9
    1511000    | 1         | 8
    1512000    | 1         | 7
    1512000    | 0         | 8
    1512000    | 2         | 7.5
    1512000    | 3         | 8.5
    1513000    | 0         | 7.5
    1513000    | 2         | 7.5
    1513000    | 1         | 8.5
    1511000    | 2         | ?

Table 5: Representing the student's grades as a matrix

            | 0   | 1   | 2   | 3
    1511000 | 9   | 8   | ?   | ?
    1512000 | 8   | 7   | 7.5 | 8.5
    1513000 | 7.5 | 8.5 | 7.5 | ?

5.3. User-based and Item-based Collaborative Filtering

Combined with the DataFrame API [23] supported in Spark, we implement the prediction models based on the principles of UBCF & IBCF. To illustrate how this method works on the sample data in Table 4, consider predicting the grade of student 1511000 in course 2 with UBCF. The algorithm calculates the similarity between student 1511000 and the other students who took course 2 (using cosine similarity [19]), as shown in Table 6. Each student's feature vector contains the student's grades in [course 0, course 1, course 2, course 3]. Finally, the UBCF algorithm predicts the grade of student 1511000 in course 2 by combining the grades of students 1512000 & 1513000 with the corresponding similarities:

    predicted score = (0.68402 · 7.5 + 0.82787 · 7.5) / (0.68402 + 0.82787) = 7.5

Table 6: Similarity score between student 1511000 and the other students

    1511000's feature            | [9, 8, 0, 0]
    1512000's feature            | [8, 7, 7.5, 8.5]
    1513000's feature            | [7.5, 8.5, 7.5, 0]
    Similarity(1511000, 1512000) | 0.68402
    Similarity(1511000, 1513000) | 0.82787

For IBCF, the algorithm calculates the similarity between the predicted course (course 2) and the other courses taken by student 1511000, instead of calculating the similarity between students, as shown in Table 7. Each course's feature vector contains the grades of students [1511000, 1512000, 1513000] in that course. Finally, the IBCF algorithm predicts the grade of student 1511000 in course 2 by combining the grades of student 1511000 in course 0 and course 1 with the corresponding similarities:

    predicted score = (0.77259 · 9 + 0.80526 · 8) / (0.77259 + 0.80526) = 8.48960

Table 7: Similarity score between course 2 and the other courses

    course 0's feature             | [9, 8, 7.5]
    course 1's feature             | [8, 7, 8.5]
    course 2's feature             | [0, 7.5, 7.5]
    Similarity(course 2, course 0) | 0.77259
    Similarity(course 2, course 1) | 0.80526

5.4. Item-based Collaborative Filtering on the Course Factor Matrix of Matrix Factorization

The course factor matrices produced by the ALS algorithms can be used to calculate the similarity among courses. The course factor matrix represents the relation between courses and hidden features in the ALS algorithms; we can use those hidden features as inputs to IBCF.
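The worked examples above (and the baseline average from Section 5.1) can be reproduced with a few lines of NumPy; the zero-filled feature vectors follow Tables 6 and 7.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Baseline (Section 5.1): mean of student 1511000's available grades.
baseline = (9.0 + 8.0) / 2                          # 8.5

# UBCF (Table 6): student features over [course 0..3], missing grades as 0.
s1, s2, s3 = [9, 8, 0, 0], [8, 7, 7.5, 8.5], [7.5, 8.5, 7.5, 0]
w12 = cosine(s1, s2)                                # ~0.68402
w13 = cosine(s1, s3)                                # ~0.82787
ubcf = (w12 * 7.5 + w13 * 7.5) / (w12 + w13)        # both neighbors scored 7.5

# IBCF (Table 7): course features over [1511000, 1512000, 1513000].
c0, c1, c2 = [9, 8, 7.5], [8, 7, 8.5], [0, 7.5, 7.5]
w20 = cosine(c2, c0)                                # ~0.77259
w21 = cosine(c2, c1)                                # ~0.80526
ibcf = (w20 * 9 + w21 * 8) / (w20 + w21)            # ~8.4896
```

Running the script confirms the numbers in the text: the UBCF estimate is exactly 7.5 (a similarity-weighted average of two identical neighbor grades), while IBCF blends student 1511000's own grades in courses 0 and 1.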
6. Results

6.1. Testing environment

Our experiments are performed on a cluster named SuperNode-XP, a heterogeneous cluster with 24 compute nodes. Each node has 2 CPU sockets with Intel Xeon E5-2680 v3 @ 2.70 GHz processors, 2 Intel Xeon Phi 7120P (Knights Corner) cards, and 128 GB of RAM. The Spark cluster in this paper is built on 4 nodes: 1 master & 3 workers. In addition, the software stack consists of:

Table 8: Software specification of the testing environment

    No. | Software         | Description
    1   | Operating System | Red Hat Enterprise Linux 7.2
    2   | Spark            | Apache Spark ver 2.4.0
    3   | Python           | Version 2.7.15

6.2. Grade prediction

For undergraduate students, the dataset is divided into separate groups for training and testing the prediction model. Remarkably, the paper evaluates two situations in the experiments of predicting students' grades on the real dataset. Concretely, there are 2 case studies:

1) Only use the data of a specific faculty to train & test the prediction for that faculty (Locality Case - LC, or experiment 1).
2) Use the data of the whole university to train & test the prediction for students in a faculty (Global Case - GC, or experiment 2).

Students are split into 2 groups: students used in the train set and students used in the test set (80% and 20%, respectively); for the train students we have all of their information, while the tested students are treated as new students with missing grades. In short:

• The train set occupies 80% of the whole dataset.
• The test set occupies 20% of the whole dataset.

We fit the models using the train sets, then calculate the Root Mean Square Error (RMSE), Mean Square Error (MSE), and Mean Absolute Error (MAE) of the predictions on the test set. Each method is run and tested 10 times, and we take the average for the final results. Concretely, we use seven algorithms for the prediction model, as shown in Table 9.

Table 9: Details of the proposed algorithms

    Name        | Algorithm
    Baseline    | Taking the average of all visible grades of each student
    IBCF        | Item-based Collaborative Filtering
    UBCF        | User-based Collaborative Filtering
    ALS         | Alternating Least Squares
    ALS_NN      | Alternating Least Squares with non-negative constraint
    ALS_NN_IBCF | Item-based Collaborative Filtering on the non-negative ALS course factor matrix
    ALS_IBCF    | Item-based Collaborative Filtering on the normal ALS course factor matrix

When UBCF is used on a dataset with a large number of users, the run time can be too long: the users must first be clustered together, and the algorithm applied to each cluster. Therefore, experiment 2 is not run for UBCF. The results for the 4 faculties are shown in Figures 5 to 8, and Table 10 lists the detailed error values for the two experiments, (1) and (2).

Figure 5: The error scores of different prediction models in experiment 1 for the faculty of Computer Science & Engineering (MT)

Table 10: Detailed results

    Faculty | Metric | Baseline | IBCF | UBCF | ALS  | ALS_NN | ALS_NN_IBCF | ALS_IBCF
    MT(1)   | RMSE   | 1.85     | 1.81 | 1.78 | 1.75 | 1.69   | 1.76        | 1.83
            | MSE    | 3.40     | 3.29 | 3.15 | 3.05 | 2.85   | 3.10        | 3.37
            | MAE    | 1.37     | 1.33 | 1.22 | 1.21 | 1.19   | 1.31        | 1.30
    MO(1)   | RMSE   | 1.61     | 1.57 | 1.54 | 1.50 | 1.44   | 1.56        | 1.58
            | MSE    | 2.59     | 2.48 | 2.37 | 2.24 | 2.08   | 2.44        | 2.50
            | MAE    | 1.21     | 1.18 | 1.05 | 1.02 | 1.03   | 1.18        | 1.18
    HC(1)   | RMSE   | 1.68     | 1.64 | 1.55 | 1.50 | 1.47   | 1.61        | 1.64
            | MSE    | 2.82     | 2.69 | 2.41 | 2.25 | 2.17   | 2.60        | 2.71
            | MAE    | 1.24     | 1.20 | 1.02 | 1.05 | 1.01   | 1.19        | 1.18
    XD(1)   | RMSE   | 1.91     | 1.84 | 1.73 | 1.76 | 1.73   | 1.79        | 1.88
            | MSE    | 3.64     | 3.39 | 2.99 | 3.11 | 2.98   | 3.22        | 3.54
            | MAE    | 1.41     | 1.35 | 1.20 | 1.26 | 1.25   | 1.33        | 1.33
    MT(2)   | RMSE   | 1.85     | 1.78 | n/a  | 1.80 | 1.72   | 1.78        | 2.10
            | MSE    | 3.44     | 3.18 | n/a  | 3.24 | 2.95   | 3.16        | 4.64
            | MAE    | 1.37     | 1.30 | n/a  | 1.26 | 1.22   | 1.31        | 1.31
    MO(2)   | RMSE   | 1.62     | 1.55 | n/a  | 1.46 | 1.45   | 1.56        | 1.59
            | MSE    | 2.61     | 2.41 | n/a  | 2.12 | 2.11   | 2.44        | 2.53
            | MAE    | 1.22     | 1.15 | n/a  | 1.04 | 1.04   | 1.17        | 1.16
    HC(2)   | RMSE   | 1.68     | 1.62 | n/a  | 2.28 | 1.49   | 1.61        | 1.62
            | MSE    | 2.82     | 2.61 | n/a  | 7.36 | 2.23   | 2.60        | 2.63
            | MAE    | 1.24     | 1.19 | n/a  | 1.21 | 1.05   | 1.19        | 1.18
    XD(2)   | RMSE   | 1.90     | 1.82 | n/a  | 1.74 | 1.71   | 1.79        | 1.88
            | MSE    | 3.60     | 3.32 | n/a  | 3.03 | 2.93   | 3.21        | 3.56
            | MAE    | 1.41     | 1.34 | n/a  | 1.24 | 1.24   | 1.33        | 1.33

Overall, all methods perform better than the baseline method. The ALS models with the non-negative constraint achieve the best scores compared to the other models across all four faculties; in other words, the models with the non-negative constraint perform better than those without it. For example, as Figure 5 shows, ALS_NN (the non-negative ALS model) is clearly better than ALS in RMSE and MSE, with gains of ≈3.5% & 7%. Likewise, ALS_NN_IBCF has lower errors than ALS_IBCF, corresponding to gains of ≈4% for RMSE and 8% for MSE.
Figure 6: The error scores of different prediction models in Figure 8: The error scores of different prediction models in
the experiment 1 for the faculty of Environment and Natural the experiment 1 for the faculty of Civil Engineering (XD)
Resources (MO)

sparsity (89%) and the number of courses (322) compared


to MT which has sparsity of 82% and 168 courses. When
running the experiment 2 of models without non-negative
constraint, sometimes RMSE and MSE scores will be much
higher abnormally despite the fact that MAE scores are
very consistent among all running trials. For example, the
ALS model in the faculty of HC has a pretty bad RMSE
score (2.28) - worse than the RMSE of the baseline model.
However, with the MAE score, it gets a better MAE score
compared to the baseline model. This behavior is also seen
in IBCF and ALS models when running the dataset of MT
faculty. The MAE score is very close to IBCF but RMSE
and MSE errors are very high. In summary, ALS models
with the non-negative constraint can give the best accuracy
out of all tested models in the case of our dataset in Ho Chi
Minh City University of Technology. The accuracy can be
Figure 7: The error scores of different prediction models in changed from data to data and also depend on the prediction
the experiment 1 for the faculty of Chemical Engineering model that we apply.
(HC)
7. Conclusions and Future Work
faculties as MO, HC & XD in Figure 6, 7, 8 respectively, Predicting GPA of future courses can be a valuable
we also get the same trend of results. Remarkably, in source of information to determine student’s performance.
most scenarios, ALS N N achieves the best results about These predicted GPA can assist instructors in identifying
the accuracy when predicting student’s grades for missing risky students and help students find their strength. In this
courses. At the other view of prediction models, U BCF study, we proposed a distributed platform built in Spark to
always has a lower error compared to IBCF across all 4 accommodate all important components of a prediction and
faculties (with average RMSE score of 1.65 and 1.72 re- recommender system. Furthermore, our work also analyzes
spectively). Furthermore, another interesting point is: using and compares various techniques commonly based on the
IBCF on the raw dataset results in the worse scores rather combination of two theories Matrix Factorization and Col-
than applying IBCF on Non-negative ALS’s Course Factor laborative Filtering. The experiments are performed with
Matrix (ALS N N IBCF ). two cases of training the prediction models: Locality Case
These two experiments ((1) and (2)) also show that using and Global Case. In the Locality Case (1), the prediction
a big train dataset with all of the faculties does not affect models are trained with student’s data in a specific faculty
much when comparing to the prediction models which are and then we use it to predict student’s grade in that faculty.
trained by the dataset of a specific faculty. Table 10 shows Otherwise, the Global Case tries to train the prediction
that MT and XD faculties have a much higher base error models with student’s data of the whole university, then
than MO and HC faculties although HC has far higher the model will be used to predict student’s grade in one
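As a concrete illustration of the evaluation procedure used throughout this section (repeated 80/20 splits, scored by RMSE, MSE, and MAE, averaged over 10 runs), the sketch below shows the idea in plain Python. All names here (`error_scores`, `evaluate`, `baseline`) are ours for illustration only and are not part of the platform's API; the baseline model simply predicts the global mean grade of the train set.

```python
import math
import random

def error_scores(actual, predicted):
    """Return (RMSE, MSE, MAE) for paired lists of grades."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    mse = sum(r * r for r in residuals) / len(residuals)
    mae = sum(abs(r) for r in residuals) / len(residuals)
    return math.sqrt(mse), mse, mae

def evaluate(model_fn, records, trials=10, train_ratio=0.8, seed=7):
    """Average (RMSE, MSE, MAE) over repeated random train/test splits.

    `records` is a list of (student, course, grade) tuples; `model_fn`
    takes a train set and returns a predict(student, course) function.
    """
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(trials):
        data = records[:]
        rng.shuffle(data)
        cut = int(len(data) * train_ratio)
        train, test = data[:cut], data[cut:]
        predict = model_fn(train)
        actual = [grade for (_, _, grade) in test]
        predicted = [predict(s, c) for (s, c, _) in test]
        for i, score in enumerate(error_scores(actual, predicted)):
            totals[i] += score
    return [t / trials for t in totals]

def baseline(train):
    """Baseline model: always predict the mean grade of the train set."""
    mean = sum(g for (_, _, g) in train) / len(train)
    return lambda student, course: mean
```

Any of the seven compared models could be plugged in as `model_fn` in this scheme; only the body of the returned predict function would change.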
The result shows that there is little difference in performance between those two cases. Furthermore, the prediction models with the non-negative constraint are better than those without; especially, the Non-negative Alternating Least Squares models (ALS_NN) achieve the best accuracy in most scenarios.

In summary, the Machine Learning techniques applied in this study only take advantage of the available grades to predict future performance, leaving out other variables such as age, social standing, hobbies, preliminary test performance, etc. These predictor variables can be used to estimate the GPA of first-year courses, where the dataset is extremely sparse. Our future research will explore more deeply the relation of such influential variables in the prediction models, and then evaluate the feasibility of applying them to graduate students' performance.

Acknowledgments

This research was conducted within the “Studying Tools to Support Applications Running on Powerful Clusters & Big Data Analytic (HPDA phase I 2018-2020)” project funded by the Ho Chi Minh City Department of Science and Technology (under grant number 46/2018/HĐ-QKHCN).

References

[1] C. Romero and S. Ventura, “Educational data mining: A review of the state of the art,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 40, pp. 601–618, Dec. 2010.
[2] K. Chrysafiadi and M. Virvou, “Student modeling approaches: A literature review for the last decade,” Expert Systems with Applications, vol. 40, no. 11, pp. 4715–4729, 2013.
[3] R. S. Baker and K. Yacef, “The state of educational data mining in 2009: A review and future visions,” Journal of Educational Data Mining, vol. 1, pp. 601–618, Dec. 2009.
[4] J. Zimmermann, K. H. Brodersen, J.-P. Pellet, E. August, and J. Buhmann, “Predicting graduate-level performance from undergraduate achievements,” Jul. 2011, pp. 357–358.
[5] C. Romero, S. Ventura, P. G. Espejo, and C. Hervás, “Data mining algorithms to classify students,” 1st International Conference on Educational Data Mining, pp. 8–17, Jun. 2008.
[6] R. Conijn, C. Snijders, A. Kleingeld, and U. Matzat, “Predicting student performance from LMS data: A comparison of 17 blended courses using Moodle LMS,” IEEE Transactions on Learning Technologies, 2017.
[7] D. Nurjanah, “Good and similar learners’ recommendation in adaptive learning systems,” Conference on Computer Supported Education, vol. 1, pp. 434–440, 2016.
[8] R. Turnip, D. Nurjanah, and D. Kusumo, “Hybrid recommender system for learning material using content-based filtering and collaborative filtering with good learners’ rating,” Nov. 2017, pp. 61–66.
[9] Z. Iqbal, J. Qadir, A. N. Mian, and F. Kamiran, “Machine learning based student grade prediction: A case study,” Aug. 2017.
[10] N. Thai-Nghe, L. Drumond, A. Krohn-Grimberghe, and L. Schmidt-Thieme, “Recommender system for predicting student performance,” Procedia Computer Science, vol. 1, pp. 2811–2819, 2010.
[11] N. Thai-Nghe, L. Drumond, R. Nanopoulos, and L. Schmidt-Thieme, “Recommender system for predicting student performance,” in Proceedings of the 3rd International Conference on Computer Supported Education (CSEDU), 2011.
[12] M. Feng, N. Heffernan, and K. Koedinger, “Addressing the assessment challenge with an online system that tutors as it assesses,” User Modeling and User-Adapted Interaction, vol. 19, pp. 243–266, Aug. 2009.
[13] P. Resnick and H. R. Varian, “Recommender systems,” Communications of the ACM, vol. 40, no. 3, pp. 56–59, 1997.
[14] E. García, C. Romero, S. Ventura, and C. de Castro, “An architecture for making recommendations to courseware authors using association rule mining and collaborative filtering,” User Modeling and User-Adapted Interaction, vol. 19, no. 1-2, pp. 99–132, Feb. 2009.
[15] A. Elbadrawy, A. Polyzou, Z. Ren, M. Sweeney, G. Karypis, and H. Rangwala, “Predicting student performance using personalized analytics,” IEEE Computer, pp. 61–69, Apr. 2016.
[16] C. Romero and S. Ventura, “Educational data mining: A survey from 1995 to 2005,” Expert Systems with Applications, vol. 33, pp. 135–146, Jul. 2007.
[17] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, “Spark: Cluster computing with working sets,” HotCloud, vol. 10, no. 10-10, p. 95, 2010.
[18] Z.-D. Zhao and M.-S. Shang, “User-based collaborative-filtering recommendation algorithms on Hadoop,” in 2010 Third International Conference on Knowledge Discovery and Data Mining. IEEE, 2010, pp. 478–481.
[19] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl, “Item-based collaborative filtering recommendation algorithms,” WWW, vol. 1, pp. 285–295, 2001.
[20] K. Fukunaga and P. M. Narendra, “A branch and bound algorithm for computing k-nearest neighbors,” IEEE Transactions on Computers, no. 7, pp. 750–753, 1975.
[21] G. H. Golub and C. Reinsch, “Singular value decomposition and least squares solutions,” in Linear Algebra. Springer, 1971, pp. 134–151.
[22] D. Lee and H. Seung, “Algorithms for non-negative matrix factorization,” in Proceedings of the 13th International Conference on Neural Information Processing Systems, vol. 13, Jan. 2000, pp. 535–541.
[23] M. Armbrust, R. S. Xin, C. Lian, Y. Huai, D. Liu, J. K. Bradley, X. Meng, T. Kaftan, M. J. Franklin, A. Ghodsi et al., “Spark SQL: Relational data processing in Spark,” in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015, pp. 1383–1394.
