Students Performance Prediction System Using Multi Agent Data Mining Technique
ABSTRACT
A high prediction accuracy of students' performance is very helpful for identifying low-performing students at the beginning of the learning process. Data mining is used to attain this objective. Data mining techniques discover models or patterns in data, which is very helpful in decision-making. Boosting is the most popular technique for constructing ensembles of classifiers to improve classification accuracy. Adaptive Boosting (AdaBoost) is a well-known boosting algorithm. It is designed for binary classification and is not directly applicable to multiclass classification. The SAMME boosting technique extends AdaBoost to multiclass classification without reducing it to a set of binary sub-problems.
In this paper, a students' performance prediction system using multi-agent data mining is proposed to predict the performance of students from their data with high prediction accuracy and to help low-performing students through optimization rules.
The proposed system has been implemented and evaluated by investigating the prediction accuracy of the Adaboost.M1 and LogitBoost ensemble classifier methods and of the C4.5 single-classifier method. The results show that the SAMME boosting technique improves the prediction accuracy and outperforms the C4.5 single classifier and LogitBoost.
KEYWORDS
E-learning, Educational Data Mining, Classification, Ensemble of Classifier, Boosting, Adaboost, SAMME,
Adaboost.M1, LogitBoost, Students Performance Prediction, Multi-Agent System.
1. INTRODUCTION
There is increasing research interest in using data mining techniques in education. This emerging field is called Educational Data Mining, and it is applied to data related to the field of education. One of the educational problems solved with data mining is the prediction of students' academic performance. Predicting students' performance is very beneficial for identifying students with low academic performance. Student retention is an indicator of the academic performance and enrolment management of a university. The ability to predict a student's performance is therefore very important in educational environments. Students' academic performance is based upon different factors such as social, personal, psychological and other environmental variables. A very promising tool to achieve this objective is data mining. Data mining techniques discover hidden patterns and relationships in data, which is very helpful in decision-making. Nowadays, educational databases grow rapidly because of the large amounts of data stored in them. Loyal students are a motivation for higher education systems, and knowing them well requires proper management and processing of the students' database. A data mining approach provides valid information from existing students that helps manage relationships with upcoming students [1] [2] [3].
One of the most useful data mining techniques in e-learning is classification. It is a predictive data mining technique that makes predictions about data values using known results found in other data. The C4.5 decision tree algorithm has been used successfully to capture knowledge, and it presents a powerful method for inferring classification models from a set of labelled examples [4].
Recently, there has been increasing interest in combining classifiers, a concept proposed to improve the performance of individual classifiers. Integration algorithms are used to make decisions more accurate, reliable and precise. One mechanism used to build an ensemble of classifiers is to use different learning methods [5]. An ensemble of classifiers is a set of classifiers whose individual decisions are combined to improve the performance of the overall system. The most popular techniques for constructing ensembles are Bagging and Boosting. These two techniques manipulate the training examples to generate multiple classifiers [6] [7].
The main objective of this paper is to use data mining to predict students' performance in courses accurately. We build a students' performance prediction system that is able to predict students' performance from their academic results with high accuracy. We also want to establish whether the boosting technique enhances the prediction accuracy, by investigating the prediction accuracy of different ensemble classifier methods and of a single-classifier method. We use a multi-agent data mining technique in our system; each agent is responsible for a specific set of tasks.
2. BACKGROUND
2.1 ELECTRONIC LEARNING (E-LEARNING):
E-learning, as one of the main areas of e-services, has undergone intensive development as an inevitable result of the recent proliferation of internet technology. Traditional learning restricts students to certain learning methods at a particular place and time, whereas e-learning services create wider ranges for organizations and individuals who are involved in learning and teaching. These environments facilitate the delivery of large parts of education through tools and materials that are accessible directly from the learner's home or office, and at any time. In addition, the advancements in technology used to enhance the interactivity and media content of the web, together with the increasing quality of delivery platforms, create an ideal environment for the expansion of e-learning systems [8].
The data mining and knowledge discovery process consists of the following steps:

1. Understanding the application domain: understand the relevant previous knowledge and the goals of the end-user (formulate the hypothesis) [9].
2. Data Collection: Determining how to find and extract the right data for modeling. First, we need to identify the different data sources that are available. Data may be scattered in different data silos, spreadsheets, files, and hard-copy (paper) lists [10].
3. Data integration: Integration of multiple data cubes, databases or files. A big part of the
integration activity is to build a data map, which expresses how each data element in each
data set must be prepared to express it in a common format and record structure [10].
4. Data selection: After the data are collected and integrated from all the various sources, we select only the data that are useful for data mining. Only relevant information is selected [9].
5. Pre-processing: The Major Tasks in Data Pre-processing are: Cleaning, Transformation and
Reduction.
o Data cleaning: Also called data cleansing, it deals with detecting and removing errors from data in order to improve data quality [11]. Data cleaning usually includes filling in missing values and identifying or removing outliers.
o Data Transformation: Data transformation operations are additional procedures of
data pre-processing that would contribute toward the success of the mining process and
improve data-mining results. Some of Data transformation techniques are
Normalization, Differences and ratios and Smoothing [9].
o Data Reduction: For large data sets, there is an increased likelihood that an
intermediate, data reduction step should be performed prior to applying data-mining
techniques. While large datasets have potential for better mining results, there is no
guarantee that they will produce better knowledge than small datasets. Data reduction
obtains a reduced dataset representation that is much smaller in volume, yet produces
the same analytical results [9].
6. Building the model: In this step we choose and implement the appropriate data-mining task (e.g., association rules, sequential pattern discovery, classification, regression, clustering, etc.), the data mining technique and the data mining algorithm(s) to build the model.
7. Interpretation of the discovered knowledge (model/patterns): The interpretation of the detected pattern or model reveals whether or not the patterns are interesting. This step is also called model validation/verification and is used to represent the result in a suitable way so it can be examined thoroughly [9].
8. Decisions / Use of Discovered Knowledge: It helps to make use of the knowledge gained to
take better decisions.
In the educational area, data mining is the process of transforming raw data compiled by education systems into useful information that could be used to take informed decisions. It can be applied to the various data involved in communicating with stakeholders, student modeling, structuring the education domain, predicting student grades, etc. [13].
The prediction of students' performance with high accuracy is beneficial for identifying students with low academic achievement at an early stage. Student retention is an indicator of the academic performance and enrolment management of a university. The ability to predict the performance of a student is very important in educational environments. Students' academic performance is based upon different factors such as personal, psychological, social and other environmental variables. A very promising tool to achieve this objective is data mining [2] [3]. One of the most useful data mining techniques in e-learning is classification. It is a predictive data mining technique that makes predictions about values using known results found in other data [4].
A classifier's predictions can be summarized in a confusion matrix:

                          Predicted Class 1        Predicted Class 2
    Actual Class 1        True Positive (TP)       False Negative (FN)
    Actual Class 2        False Positive (FP)      True Negative (TN)
Where true positives (TP) mean the correct classifications of the positive examples; true negatives
(TN) are the correct classifications of the negative examples; false positives (FP) represent the
incorrect classification of the negative examples into the positive class, and false negatives (FN)
are the incorrect classification of the positive examples into the negative class. There are several
performance metrics available [17] [18].
1. The predictive accuracy of the classifier measures the proportion of correctly classified instances = (TP + TN) / (TP + FP + TN + FN)
2. True Positive Rate (TPR or Recall or Sensitivity): measures the percent of actual
positive examples that are correctly classified = TP/(TP + FN)
3. True Negative Rate (TNR or Specificity): measures the percent of actual negative
examples that are correctly classified = TN/(TN + FP)
4. Positive Predictive Value (PPV): often called Precision, it is the percentage of the
examples predicted to be positive that were correct = TP/(TP + FP)
5. False Negative Rate (FNR): the percentage of positive examples that were incorrectly classified = FN/(TP + FN) = 1 − TPR
6. False Positive Rate (FPR): the percentage of negative examples that were incorrectly classified = FP/(TN + FP) = 1 − TNR
2.7 ADABOOST:
One of the earliest and best-known boosting algorithms is AdaBoost. Freund and Schapire
introduced the Adaptive Boosting (AdaBoost) method as another method to influence the training
data [22]. Initially, the algorithm assigns every instance x_i in the training data an equal weight. In each iteration i, the learning algorithm is trained sequentially, tries to minimize the weighted error on the training set, and returns the classifier H_i(x). The weighted error of H_i(x) is calculated and used to update the weights of the training instances: a misclassified instance x_i receives a high weight and a correctly classified instance a low weight, according to its impact on the performance of the classifier. This allows the current classifier to focus on the most difficult patterns, that is, the ones that were not well classified by the previous classifiers. The final classifier H*(x) is constructed by a weighted vote of the individual classifiers H_i(x) according to their classification accuracy on the weighted training set [7]. AdaBoost is designed for two-class classification and hence is not directly applicable to multiclass problems.
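As a small worked illustration of this reweighting (our own example, using the standard AdaBoost update): suppose a classifier has weighted error err = 0.2. Its vote weight is

    α = log((1 − err) / err) = log(0.8 / 0.2) = log 4 ≈ 1.386,

and each misclassified instance has its weight multiplied by e^α = 4 before renormalization, so the next classifier concentrates on exactly those instances.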
2.9 ADABOOST.M1:
AdaBoost.M1 is the first attempt to extend AdaBoost to multi-class classification problems and is
very popular [22].
Adaboost.M1 assumes that the error of each weak classifier is less than 1/2 with respect to the distribution on which it was trained. This assumption is easily satisfied in two-class classification problems, because the error rate of random guessing is 1/2, but it is much harder to achieve in the multiclass case, where the random-guessing error rate is (K − 1)/K. As a result, AdaBoost.M1 can easily fail and stop early in the multi-class case [26].
1. Initialize the observation weights w_i = 1/n, i = 1, 2, …, n.
2. For m = 1 to M:
   (a) Fit a classifier T^(m)(x) to the training data using weights w_i.
   (b) Compute the weighted error
       err^(m) = Σ_{i=1..n} w_i · I(c_i ≠ T^(m)(x_i)) / Σ_{i=1..n} w_i.
       If (err^(m) <= 0 || err^(m) >= 0.5) return.
   (c) Compute α^(m) = log((1 − err^(m)) / err^(m)).
   (d) Set w_i ← w_i · exp(α^(m) · I(c_i ≠ T^(m)(x_i))) for i = 1, 2, …, n.
   (e) Re-normalize w_i.
3. Output C(x) = arg max_k Σ_{m=1..M} α^(m) · I(T^(m)(x) = k).

Figure 2: Adaboost.M1 Algorithm [26].
The weights of all dataset instances are then normalized so that their sum is equal to 1; to maintain this constraint, the weight of each instance is divided by the sum of the new weights.
After M rounds, an ensemble of classifiers (a composite model) is generated, which is then used to classify new data. When a new instance X arrives, it is classified through these steps [28]:

    Initialize the weight of each class to 0;
    for m = 1 to M do
        obtain the class prediction k = T^(m)(X);
        add the weight α^(m) of classifier m to the weight of class k;
    return the class with the highest total weight.
2.10 SAMME:
The SAMME (Stagewise Additive Modeling using a Multiclass Exponential loss function) algorithm extends the original AdaBoost algorithm to the multiclass case directly, without reducing it to multiple two-class problems, thereby solving the Adaboost.M1 problem. For a K-class classification task, SAMME returns only one weighted classifier (rather than K) in each boosting iteration, and the weak classifier only needs to be better than K-class random guessing (rather than better than 1/2) [26].
The solution of SAMME is compatible with the classification rule, so it is optimal in minimizing the misclassification error, and it only needs weak classifiers better than random guessing (correct probability larger than 1/K), rather than better than 1/2 as the two-class AdaBoost requires [26].
1. Initialize the observation weights w_i = 1/n, i = 1, 2, …, n.
2. For m = 1 to M:
   (a) Fit a classifier T^(m)(x) to the training data using weights w_i.
   (b) Compute err^(m) = Σ_{i=1..n} w_i · I(c_i ≠ T^(m)(x_i)) / Σ_{i=1..n} w_i.
   (c) Compute α^(m) = log((1 − err^(m)) / err^(m)) + log(K − 1).
   (d) Set w_i ← w_i · exp(α^(m) · I(c_i ≠ T^(m)(x_i))) for i = 1, 2, …, n.
   (e) Re-normalize w_i.
3. Output C(x) = arg max_k Σ_{m=1..M} α^(m) · I(T^(m)(x) = k).
Figure 3: SAMME Algorithm [26].
The difference between SAMME and AdaBoost.M1 lies in step (c), where SAMME adds the term log(K − 1) to α^(m). In the case of K = 2, SAMME is equivalent to the original two-class AdaBoost because log(K − 1) = 0. The term log(K − 1) is not artificial; it follows naturally from the multiclass generalization of the exponential loss in the binary case, and it makes the SAMME algorithm equivalent to fitting a forward stagewise additive model using the multiclass exponential loss function. SAMME is highly competitive with the best currently available multiclass classification methods, in terms of both practical performance and computational cost [26].
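To make the loop concrete, the following is a minimal Java sketch of SAMME (our own illustration, not the system's actual code). It assumes a base learner that supports instance weights and class labels encoded as 0 … K−1:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    interface WeakLearner {
        WeakClassifier fit(double[][] X, int[] y, double[] w);   // weighted training
    }
    interface WeakClassifier {
        int predict(double[] x);                                 // returns a label 0..K-1
    }

    class Samme {
        private final List<WeakClassifier> models = new ArrayList<>();
        private final List<Double> alphas = new ArrayList<>();
        private final int K;                                     // number of classes

        Samme(int numClasses) { this.K = numClasses; }

        void train(double[][] X, int[] y, WeakLearner learner, int M) {
            int n = X.length;
            double[] w = new double[n];
            Arrays.fill(w, 1.0 / n);                             // step 1: equal weights
            for (int m = 0; m < M; m++) {
                WeakClassifier h = learner.fit(X, y, w);         // step (a)
                boolean[] miss = new boolean[n];
                double err = 0;
                for (int i = 0; i < n; i++) {
                    miss[i] = h.predict(X[i]) != y[i];
                    if (miss[i]) err += w[i];                    // weights sum to 1
                }
                // SAMME only needs err < (K-1)/K, i.e. better than random guessing
                if (err <= 0 || err >= (K - 1.0) / K) break;
                double alpha = Math.log((1 - err) / err) + Math.log(K - 1.0);  // step (c)
                models.add(h);
                alphas.add(alpha);
                double sum = 0;                                  // steps (d)-(e): reweight, renormalize
                for (int i = 0; i < n; i++) {
                    if (miss[i]) w[i] *= Math.exp(alpha);
                    sum += w[i];
                }
                for (int i = 0; i < n; i++) w[i] /= sum;
            }
        }

        int predict(double[] x) {                                // step 3: weighted vote
            double[] votes = new double[K];
            for (int m = 0; m < models.size(); m++)
                votes[models.get(m).predict(x)] += alphas.get(m);
            int best = 0;
            for (int k = 1; k < K; k++) if (votes[k] > votes[best]) best = k;
            return best;
        }
    }

A weighted C4.5 tree, as used in this paper, could serve as the WeakLearner; a decision stump sketch appears at the end of Section 2.11 below.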
2.11 LOGITBOOST:
LogitBoost [25] performs additive logistic regression and generates models that maximize the
probability of a class. In each iteration, the algorithm fits a regression model to weighted training data. For a two-class problem, if the weak classifier minimizes the squared error, then the
probability of the first class is maximized, and it can be extended to the multiclass as well. In
general, AdaBoost optimizes the exponential loss whereas LogitBoost optimizes the probability
[29]. LogitBoost relies on a binomial log-likelihood as a loss function, which is a more natural criterion in binary classification than the exponential criterion underlying the AdaBoost algorithm. LogitBoost can operate with logit models, decision stumps, or decision trees as base classifiers [30]. A decision stump is a simplified decision tree introduced by Iba and Langley; it has just one level and builds a simple binary tree [31].
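As an illustration of such a one-level tree, here is a weighted decision stump sketch (our own code, not from the paper) implementing the WeakLearner interface from the SAMME sketch above:

    import java.util.Arrays;

    /** A one-level decision tree (decision stump) trained on weighted data. */
    class StumpLearner implements WeakLearner {
        public WeakClassifier fit(double[][] X, int[] y, double[] w) {
            int n = X.length, d = X[0].length;
            int K = Arrays.stream(y).max().getAsInt() + 1;        // number of classes
            double bestErr = Double.MAX_VALUE, bestT = 0;
            int bestF = 0, bestL = 0, bestR = 0;
            for (int f = 0; f < d; f++) {                         // candidate split feature
                for (int i = 0; i < n; i++) {                     // candidate threshold
                    double t = X[i][f];
                    double[][] mass = new double[2][K];           // weighted class mass per side
                    for (int j = 0; j < n; j++)
                        mass[X[j][f] <= t ? 0 : 1][y[j]] += w[j];
                    int l = argMax(mass[0]), r = argMax(mass[1]); // majority class per side
                    double err = 0;                               // weighted error of this split
                    for (int j = 0; j < n; j++)
                        if ((X[j][f] <= t ? l : r) != y[j]) err += w[j];
                    if (err < bestErr) {
                        bestErr = err; bestF = f; bestT = t; bestL = l; bestR = r;
                    }
                }
            }
            final int f = bestF, l = bestL, r = bestR;
            final double t = bestT;
            return x -> x[f] <= t ? l : r;                        // the one-level tree
        }
        private static int argMax(double[] a) {
            int best = 0;
            for (int k = 1; k < a.length; k++) if (a[k] > a[best]) best = k;
            return best;
        }
    }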
3. RELATED WORK:
Paris et al. [5] compared the accuracy of data mining methods for classifying students in order to predict the class grade of a student. These predictions are useful for identifying weak students and assisting the administration in taking remedial measures at an early stage, so as to produce graduates who finish at least with a second class upper.
Rathee and Mathur applied the ID3, C4.5 and CART decision tree algorithms to educational data to predict a student's performance in the examination. All the algorithms were applied to the internal assessment data of the students to predict their academic performance in the final exam. The efficiency of the various decision tree algorithms was analyzed based on their accuracy and the time taken to derive the tree. The predictions obtained from the system helped the tutor to identify weak students and improve their performance. C4.5 proved the best algorithm of the three because it provides better accuracy and efficiency than the others [35].
Minaei-Bidgoli, Kortemeyer and Punch applied data mining classifiers as a means of comparing and analyzing the usage and performance of students who took a technical course via the web. The results show that combining multiple classifiers leads to a significant accuracy improvement on a given data set. The prediction performance of combined classifiers is often better than that of a single classifier because the decision relies on the combined output of several models [36].
Zhu et al. proposed the SAMME algorithm using a multiclass exponential loss function. Their numerical experiments indicated that AdaBoost.MH in general performs very well, and that SAMME's performance is comparable with that of AdaBoost.MH, and sometimes slightly better. They also discussed the computational cost of SAMME: SAMME generates only one K-class classifier in each iteration; thus, it is K times faster than AdaBoost.MH [26].
Bakar et al. proposed an agent-based data classification approach based on creating agents within the main classification process. They showed that the use of agents within classification helps to improve classification speed while maintaining the quality of knowledge. The proposed agents are embedded within the standard rule application techniques. The results show significant improvements in classification time and in the number of matched rules, with comparable classification accuracy [37].
Each agent communicates with the other agents by sending request messages and receiving the results. When sending a message, an agent needs only to specify the identity of the accepting agent and the content.
1. User Interface Agent: It is responsible for receiving the user's request and sending this request with its required inputs to the other agents so that they perform their tasks, and for receiving the results and presenting them to the user. It is also responsible for gathering information about the students' data: the number of attributes, the number of records, the types of the data and other information.
2. Model Building Agent: It is responsible for building the prediction model. It receives the building request from the User Interface Agent with the required data (the training data and the iteration number for boosting) and sends the result (the prediction model) to the User Interface Agent. It also generates the rules from the highest-weight classifier and stores them to be used later for performance optimization.
3. Model Evaluation Agent: It is responsible for evaluating the prediction model. It receives the evaluation request from the User Interface Agent with the required data (the prediction model, the test data and the actual students' performance), evaluates the model and sends the evaluation results to the User Interface Agent.
4. Performance Prediction Agent: It is responsible for predicting the students' performance. First, it receives the prediction request from the User Interface Agent with the required data (the prediction model and the students' data), predicts the students' performance and stores it to be ready for prediction requests from users. When the Performance Prediction Agent receives a user prediction request from the User Interface Agent, by student ID or by a certain predicted performance, it selects the required students with the predicted performance and sends the prediction results to the User Interface Agent.

For optimization, the Performance Prediction Agent selects the suitable rules for improving low student performance, shows the reason for this low performance, motivates the student to reach a higher performance through these rules, and sends these results to the User Interface Agent.
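The following minimal Java sketch (our own illustration; the names and interfaces are assumptions, not the system's actual code) shows this style of message passing, where a sender specifies only the receiver's identity and the content:

    import java.util.HashMap;
    import java.util.Map;

    // A request message: only the receiver's identity and the content are needed.
    record Message(String receiver, String content, Object payload) {}

    interface Agent {
        String name();
        Object handle(Message request);   // process a request and return the result
    }

    class MessageBus {
        private final Map<String, Agent> agents = new HashMap<>();
        void register(Agent a) { agents.put(a.name(), a); }
        // Deliver a request to the named agent and return its result to the sender.
        Object send(Message m) { return agents.get(m.receiver()).handle(m); }
    }

For example, the User Interface Agent would call send(new Message("ModelBuildingAgent", "build", trainingData)) and receive the prediction model as the result.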
4.2.3 SYSTEM IMPLEMENTATION STEPS:
A. Data Preparation:
Data preparation was implemented manually; the User Interface Agent then sends the requests with the required data (inputs) to the other agents and gets the results.
B. Data Collection:
Students' data were collected manually from the EMES e-learning system for the computer skills (CPIT100) course, KAU, Jeddah, Saudi Arabia, for the 2012-13 session. The course sections are AAP (79 female students) and ABP (76 female students).
C. Data Selection:
We selected from the EMES e-learning system only the data that are useful for data mining and entered these data into external Excel sheets.
The attributes selected for prediction:
Student's name.
Student's ID.
Number of solved quizzes.
Number of correct answers across all quizzes.
Number of submitted assignments.
Number of correct answers across all assignments.
Total login time (hours) the student spent in the e-learning system.
Number of pages the student read.
The student's final grade class.
D. Data Pre-processing:
1) Data cleaning: Data cleaning involves several processes such as filling in missing values, removing outliers, smoothing noisy data and resolving inconsistencies.
2) Data Transformation: The data come from different sources: different section teachers and different numbers of quizzes and assignments in each course section. We therefore take percentages for these attributes (a small sketch of this step follows the list below). The attributes are:
Percentage of solved quizzes = number of solved quizzes / number of all quizzes * 100
Percentage of correct answers across all quizzes = number of correct quiz answers / number of quiz answers * 100
Percentage of submitted assignments = number of submitted assignments / number of all assignments * 100
Percentage of correct answers across all assignments = number of correct assignment answers / number of assignment answers * 100
Smoothing: rounding the transformed attribute values; for example, if an attribute value is 0.93, the smoothed value is 1.
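A minimal sketch of this transformation, assuming hypothetical raw counts for one student (the variable and method names are ours, not the system's):

    public class StudentTransform {
        // Percentage attribute: part of a whole, scaled to 0..100.
        static double pct(double part, double whole) { return 100.0 * part / whole; }

        public static void main(String[] args) {
            double solvedQuizPct   = pct(9, 10);    // solved quizzes / all quizzes * 100
            double correctQuizPct  = pct(28, 30);   // correct quiz answers / all quiz answers * 100
            double submittedAsgPct = pct(4, 5);     // submitted assignments / all assignments * 100
            double correctAsgPct   = pct(17, 20);   // correct assignment answers / all answers * 100
            long smoothed = Math.round(0.93);       // smoothing by rounding: 0.93 -> 1
            System.out.println(solvedQuizPct + " " + correctQuizPct + " "
                               + submittedAsgPct + " " + correctAsgPct + " " + smoothed);
        }
    }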
3) Data Reduction:
i. Column-reduction techniques (feature/attribute reduction): We choose only the attributes that are used for building the prediction model.
To build the prediction model we choose:
Percentage of solved quizzes.
Percentage of correct answers across all quizzes.
Percentage of submitted assignments.
Percentage of correct answers across all assignments.
Total login time (hours) the student spent in the e-learning system.
Number of pages the student read.
The student's final grade class.
ii. Feature-discretization techniques (value reduction):
We reduce the values manually, based on our a priori knowledge of the attributes:
If the student's grade >= 90, the performance class is High.
If the student's grade >= 70 and < 90, the performance class is Medium.
If the student's grade < 70, the performance class is Low.
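These discretization rules map directly to a small helper (a sketch; the class and method names are ours):

    public class GradeDiscretizer {
        // Value reduction: map a numeric grade to the performance class.
        static String performanceClass(double grade) {
            if (grade >= 90) return "High";
            if (grade >= 70) return "Medium";   // 70 <= grade < 90
            return "Low";                       // grade < 70
        }

        public static void main(String[] args) {
            System.out.println(performanceClass(85));   // prints "Medium"
        }
    }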
E. Data Formatting: Convert the students' data in the Excel file to an ARFF file (Attribute-Relation File Format).
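An ARFF file for this data set might look as follows (an illustrative sketch; the attribute names are ours, not the system's actual file):

    @relation student-performance

    @attribute quizzes_solved_pct        numeric
    @attribute quiz_correct_pct          numeric
    @attribute assignments_submitted_pct numeric
    @attribute assignment_correct_pct    numeric
    @attribute login_hours               numeric
    @attribute pages_read                numeric
    @attribute performance               {High, Medium, Low}

    @data
    90, 93, 80, 85, 35.5, 120, High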
F. Data Partitioning: The data will be divided into training and testing data. The training data is
applied to build the model while the testing data is used to verify the model.
G. Performance Prediction Model Building: Build the performance prediction model from the training data using a data mining technique. We use the classification task with the decision tree method (C4.5 with the SAMME boosting technique) to build the performance model, and we generate and store the rules for performance optimization.
H. Performance Prediction Model Evaluation: Evaluate the prediction model on the test data using the confusion matrix, calculate the prediction accuracy and store the evaluated model.
I. Students Performance Prediction: Predict the students' performance, optimize low predicted performance, and store the students' performance and activities/data to be ready for performance prediction requests.
1) Students Performance Prediction Model Building and Evaluation:
a) Build the Model: We select the pre-processed training data and click Upload Training Data.
When Upload Training Data is clicked, the system shows a description of the training data: the file name, relation name, number of attributes, number of students, and the attribute list with data types.
We choose Use Boosting Technique (SAMME) with 10 iterations and click Build The Prediction Model. A composite model (10 classifiers) is built, and each classifier has a weight that indicates its accuracy. The system traverses each path of the highest-weight classifier to generate the rules and stores them with their performance class, to be used for performance optimization.
b) Evaluate the Model: We select the pre-processed test data and click Evaluate the Prediction Model.
The system evaluates the model by tracing the prediction model over the processed test data, compares the predicted student performance with the actual student performance, and evaluates the
model by the Confusion Matrix. The evaluation window in Figure 7 shows the model rules,
confusion matrix result, evaluation results and computation time.
For the computational time, we record the start and end times of the model building and model testing processes with a stopwatch (a Java timer), subtract the start time from the end time, and divide by 1000, because the stopwatch returns the time in milliseconds:
Duration = endTime − startTime
Time in seconds = Duration / 1000
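A sketch of this measurement using System.currentTimeMillis as the stopwatch (our illustration; the system's code may use a different timer class):

    public class TimingDemo {
        public static void main(String[] args) {
            long start = System.currentTimeMillis();              // start time
            // ... build or test the prediction model here ...
            long durationMs = System.currentTimeMillis() - start; // Duration = endTime - startTime
            double seconds = durationMs / 1000.0;                 // milliseconds -> seconds
            System.out.println("elapsed: " + seconds + " s");
        }
    }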
The performance metrics are viewed by clicking the Performance Metrics button; the results are based on the confusion matrix.
Show Graph: used to visualize the tree for a single classifier (C4.5 without boosting).
ROC Curve (Receiver Operating Characteristics): False Positive Rate (FPR) vs. True Positive Rate (TPR).
Classification Error Curve: in each boosting iteration, the SAMME algorithm builds a classifier and then tests it, counting the incorrectly classified students to obtain the total error weight of this classifier and to calculate the classifier's accuracy weight; the curve shows the incorrectly classified students in each iteration (number of iterations vs. test error).
We can save the prediction model to be used later for performance prediction.
2) Current Students Performance Prediction:
First we load the prediction model and click Predict the Students Performance. The system fetches all required students' data from the Excel sheet, puts them in a list and converts them to ARFF format, so that they can be traced through the prediction model to predict the students' performance, and stores the performance results to be ready for student/teacher requests.
When predicting the performance by student ID, the system gets the predicted performance from the stored students' data. The system shows the student's predicted performance with suitable messages according to the prediction.
When the Details button is clicked, the system shows more details about the reason for this performance and how the performance can be improved, by fetching rules from the rules file for gaining a higher performance and showing the student, through motivational messages, how to achieve it. For example, if the student gets a Low performance, the system fetches the Medium rules only.
The teacher can view the students' predicted performance by selecting a certain predicted performance to see the list of students with this performance; the system fetches from the stored data all students with the same performance, and the teacher can send an email to these students.
Figure 11: View Students Performance Prediction List by selecting a certain Performance
5. EXPERIMENTAL RESULTS:
The experiment was conducted using the EMES e-learning system dataset. The dataset contains 155 students from two sections of the computer skills (CPIT100) course, KAU, Jeddah, Saudi Arabia, session 2012-13.
The dataset is divided into two parts, training and testing data: 105 students were selected as training data to create the prediction model, and the remaining 50 were used as testing data for model evaluation. The classification algorithm used is C4.5 with the SAMME boosting technique. We also used C4.5 as a single classifier, as well as the Adaboost.M1 and LogitBoost boosting techniques, to evaluate our system. The results are recorded for comparison purposes.
The experiment measures the classification accuracy, performance metrics, classification error curve, ROC curve, and the computational time to complete the classification tasks (the training time for model building and the testing time for model evaluation). We also measure the effect of increasing the number of iterations on the prediction accuracy. There are no special experiments or standards to measure the efficiency of the multi-agent system, because our design goal for the multi-agent system is only to divide the classification tasks, with each agent responsible for a particular set of tasks.
We start building the performance prediction model using the SAMME boosting technique with 10 iterations and C4.5 as the base classifier. After the process finishes, the system returns the results: the confusion matrix, performance metrics, correctly classified students, incorrectly classified students, accuracy percentage, ROC curve, classification error curve, model building time and model testing time. Figure 13 shows the system results after building and evaluating the prediction model.
Figure 14 shows the performance prediction results according to the confusion matrix for each class (High, Medium and Low): True Positive Rate (TPR), True Negative Rate (TNR), False Positive Rate (FPR), False Negative Rate (FNR), Precision and Recall.
The classification error curve in Figure 15 shows the incorrectly classified students in each iteration. The ROC curve in Figure 16 shows the False Positive Rate (FPR) vs. the True Positive Rate (TPR) for the High prediction performance.
Comparison of the classification methods:

Method                            Correctly      Incorrectly    Prediction   Model Building   Model Testing
                                  classified     classified     Accuracy     Time (Sec.)      Time (Sec.)
                                  students       students
SAMME Boosting Technique          40             10             80 %         0.266            0.001
Adaboost.M1 Boosting Technique    40             10             80 %         0.262            0.001
LogitBoost boosting technique     25             25             50 %         0.132            0.001
C4.5                              37             13             74 %         0.094            0.001
Per-class performance metrics for each method:

Method                  Class     TPR     FPR     TNR     FNR     Precision   Recall
SAMME Boosting          High      1       0.143   0.857   0       0.571       1
Technique               Medium    0.846   0.167   0.833   0.154   0.846       0.846
                        Low       0.625   0       1       0.375   1           0.625
Adaboost.M1 Boosting    High      1       0.143   0.857   0       0.571       1
Technique               Medium    0.846   0.167   0.833   0.154   0.846       0.846
                        Low       0.625   0       1       0.375   1           0.625
LogitBoost boosting     High      0.75    0.31    0.69    0.25    0.316       0.75
technique               Medium    0.577   0.5     0.5     0.423   0.556       0.577
                        Low       0.25    0       1       0.75    1           0.25
C4.5                    High      0.875   0.119   0.881   0.125   0.583       0.875
                        Medium    0.885   0.333   0.667   0.115   0.742       0.885
                        Low       0.438   0       1       0.563   1           0.438
Boosting iteration effects: we increased the number of iterations to 60. Figure 17 shows the prediction accuracy curves for SAMME, Adaboost.M1 and LogitBoost over 60 boosting iterations.
Figure 17: Prediction Accuracy Curves for SAMME, Adaboost.M1 and LogitBoost
SAMME and Adaboost.M1 outperformed the single classifier C4.5 and the sub-binary boosting technique LogitBoost.
In model building time, Adaboost.M1 takes less time than the SAMME boosting technique, but building the prediction model with C4.5 is faster than the others because it builds only one classifier.
In SAMME, the highest classifier weight in the prediction model (composite model) is 2.94, but in Adaboost.M1 it is 2.25. The highest classifier weight indicates the classifier's accuracy.
When the number of iterations is increased, the prediction accuracy of SAMME and Adaboost.M1 decreased, while the accuracy of LogitBoost increased.
SAMME and Adaboost.M1 have the same prediction accuracy (80 %) at the 5th and 10th iterations; only at the 35th and 40th iterations did SAMME achieve the higher accuracy (78 %), with a highest classifier weight of 2.94 in the model.
In Adaboost.M1, the maximum number of iterations performed is 40, and the highest classifier weight in the model is 2.25. Adaboost.M1 stopped at the 40th iteration because the total weight of incorrectly classified training students reached 0.5.
When the number of iterations was increased in LogitBoost, we obtained the highest accuracy (66 %) at the 50th iteration.
6. CONCLUSIONS
We applied a multi-agent data mining technique in the EMES e-learning system for students' performance prediction and optimization. The work showed how useful data mining is in an educational system, in particular for predicting the performance of a student accurately. Specifically, we obtained the prediction model using the SAMME boosting technique. We also used the rules extracted from the prediction model for performance optimization, and we examined the effects of ensemble methods on the performance prediction accuracy.
When the system is compared with the single classifier C4.5, SAMME outperformed C4.5. We also compared SAMME with the Adaboost.M1 and LogitBoost boosting techniques: SAMME and Adaboost.M1 have the same accuracy percentage, and only at the 35th and 40th iterations did SAMME achieve the higher accuracy. SAMME and Adaboost.M1 outperformed the sub-binary boosting technique LogitBoost.
REFERENCES
[1] E. Osmanbegović and M. Suljić, 'Data mining approach for predicting student performance', Economic Review, vol 10, iss 1, 2012.
[2] M. Sukanya, S. Biruntha, S. Karthik and T. Kalaikumaran, 'Data Mining: Performance Improvement in Education Sector using Classification and Clustering', in International Conference on Computing and Control Engineering (ICCCE), 2012.
[3] B. Bhardwaj and S. Pal, 'Data Mining: A prediction for performance improvement using classification', International Journal of Computer Science and Information Security (IJCSIS), vol 9, iss 4, 2011.
[4] C. Chen, Y. Chen and C. Liu, 'Learning performance assessment approach using web-based learning portfolios for e-learning systems', IEEE Transactions on Systems, Man, and Cybernetics-Part C: Applications and Reviews, vol 37, iss 6, pp. 1349--1359, 2007.
[5] I. Paris, L. Affendey and N. Mustapha, 'Improving academic performance prediction using voting technique in data mining', World Academy of Science, Engineering and Technology, vol 4, pp. 820--823, 2010.
[6] C. Wang, 'New ensemble machine learning method for classification and prediction on gene expression data', pp. 3478--3481, 2006.
[7] Y. Liu, 'Drug Design by Machine Learning: Ensemble Learning for QSAR Modeling', in the Fourth International Conference on Machine Learning and Applications (ICMLA'05), 2005.
[8] M. Alkhattabi, D. Neagu and A. Cullen, 'Assessing information quality of e-learning systems: a web mining approach', Computers in Human Behavior, vol 27, iss 2, pp. 862--873, 2011.
[9] M. Kantardzic, Data mining: Concepts, models, methods, and algorithms, John Wiley & Sons, 2003.
[10] R. Nisbet, J. Elder and G. Miner, Handbook of statistical analysis and data mining applications, 1st ed.
Amsterdam: Academic Press/Elsevier, 2009.
[11] E. Rahm and H. Do, 'Data cleaning: Problems and current approaches', IEEE Data Eng. Bull., vol 23, iss 4, pp. 3-13, 2000.
[12] S. Yadav, B. Bharadwaj and S. Pal, 'Data mining applications: A comparative study for predicting student's
performance', in International Journal of Innovative Technology & Creative Engineering (IJITCE), vol 1, iss 12,
pp. 13--19, 2012.
[13] S. Kumar and M. Vijayalakshmi, 'Mining Of Student Academic Evaluation Records in Higher Education', in International Conference on Recent Advances in Computing and Software Systems (RACSS), pp. 67--70, 2012.
[14] J. Quinlan, C4.5: Programs for Machine Learning. San Mateo, Calif.: Morgan Kaufmann Publishers, 1993.
[15] G. Agrawal and H. Gupta, 'Optimization of C4.5 Decision Tree Algorithm for Data Mining Application',
International Journal of Emerging Technology and Advanced Engineering, vol 3, iss 3, pp. 341-345, 2013.
[16] J. Thongkam, G. Xu and Y. Zhang, 'AdaBoost algorithm with random forests for predicting breast cancer
survivability', in International Joint Conference on Neural Networks, pp. 3062--3069, 2008.
[17] S. Shanthi and R.G. Ramani, 'Gender specific classification of road accident patterns through data mining
techniques', in IEEE International Conference On Advances In Engineering, Science And Management
(ICAESM -2012), pp.359--365, 2012.
[18] A. Tan and D. Gilbert, 'Ensemble machine learning on gene expression data for cancer classification', Applied
Bioinformatics, vol 2, iss 3, pp.75--83, 2003.
[19] B. Verma and A. Rahman, 'Cluster-oriented ensemble classifier: impact of multicluster characterization on
ensemble classifier learning', IEEE Transactions on Knowledge and Data Engineering, vol 24, iss 4, pp. 605-618, 2012.
[20] Y. Freund and R. Schapire, 'A short introduction to boosting', Journal of Japanese Society For Artificial
Intelligence, vol 14, iss 5, pp. 771--780, 1999.
[21] Y. Freund and R. Schapire, 'A decision-theoretic generalization of on-line learning and an application to
boosting', Journal of Computer and System Sciences, vol 55, iss 1, pp.119--139, 1997.
[22] Y. Freund and R. Schapire, 'Experiments with a new boosting algorithm', In the 13th International Conference on
Machine Learning, pp. 148--156, 1996.
[23] R. Schapire, 'The strength of weak learnability', Machine learning, vol 5, iss 2, pp.197--227, 1990.
[24] A. Fernandez-Baldera and L. Baumela, 'Multi-class boosting with asymmetric binary weak-learners', Pattern
Recognition, vol 47, iss 5, pp. 2080--2090, 2014.
[25] J. Friedman, T. Hastie, and R. Tibshirani, 'Additive logistic regression: a statistical view of boosting', The
Annals of Statistics, vol 28, iss 2, pp. 337-- 407, 2000.
[26] J. Zhu, H. Zou, S. Rosset and T. Hastie, 'Multi-class adaboost', Statistics and Its Interface, vol 2, pp. 349--360,
2009.
[27] R. Schapire and Y. Singer, 'Improved boosting algorithms using confidence-rated prediction', Machine Learning,
vol 37, iss 1, pp. 297--336, 1999.
[28] S. Pramanik, U. Chowdhury, B. Pramanik and N. Huda, 'A comparative study of bagging, boosting and C4.5:
The recent improvements in decision tree learning algorithm', Asian J. Inf. Tech, vol 9, iss 6, pp. 300--306, 2010.
[29] Y. Krishnaraj and C. Reddy, 'Boosting methods for protein fold recognition: an empirical comparison', In
International Conference on Bioinformatics and Biomedicine, pp. 393--396, 2008.
[30] J. Wu, W. Zhang and R. Jiang, 'Comparative study of ensemble learning approaches in the identification of
disease mutations', In 3rd International Conference on Biomedical Engineering and Informatics (BMEI 2010),
vol 6, pp. 2306--2310, 2010.
[31] W. Iba and P. Langley, 'Induction of one-level decision trees', In the Ninth International Workshop on Machine
Learning, pp. 233--240, 1992.
[32] M. Wooldridge, An Introduction to Multiagent Systems, 1st ed. Chichester: John Wiley & Sons, 2002.
[33] K. Albashiri, 'An investigation into the issues of multi-agent data mining', Ph.D, University of Liverpool, 2010.
[34] S. Srinivasan, J. Singh and V. Kumar, 'Multi-agent based decision Support System using Data Mining and Case
Based Reasoning', International Journal of Computer Science Issues, vol 8, iss 4, pp. 340--349, 2011.
[35] A. Rathee and R.P. Mathur, 'Survey on Decision Tree Classification algorithms for the Evaluation of Student
Performance', International Journal of Computers & Technology, vol 4, iss 2, pp. 244--247, 2013.
[36] B. Minaei-Bidgoli, G. Kortemeyer, and W. F. Punch, 'Enhancing Online Learning Performance: An Application
of Data Mining Method', In 7th IASTED International Conference on Computers and Advanced Technology in
Education (CATE 2004), 2004.
[37] A. Bakar, Z. Othman, A. Hamdan, R. Yusof and R. Ismail, 'Agent based data classification approach for data
mining', In International Symposium on Information Technology, vol 2, pp. 1--6, 2008.
[38] I. Witten, E. Frank and M. Hall, Data Mining: Practical machine learning tools and techniques, 3rd ed.
Burlington, Mass.: Morgan Kaufmann Publishers, 2011.