Review On Prediction Algorithms in Educational Data Mining: A.Dinesh Kumar, R.Pandi Selvam, K.Sathesh Kumar
Keywords: Data Mining, Educational Data Mining and Prediction.

I. INTRODUCTION

Knowledge Discovery in Databases (KDD) is the automatic, exploratory analysis and modeling of large data warehouses. KDD is the organized process of identifying valid, new, useful, and understandable patterns in huge and complex data sets. Data Mining (DM) is the heart of the KDD process: it involves inferring algorithms that explore the data, develop the model and discover previously unknown patterns. The model is used for understanding phenomena from the data, and for analysis and prediction [1,36].

Educational data mining is an independent research field in data mining. Applying data mining techniques in an educational

This paper is organized as follows. Section II introduces related work on prediction techniques. Section III describes the prediction techniques used in educational data mining. Section IV tabulates the research that has been carried out on prediction techniques in educational data mining. Section V concludes the paper.

II. BACKGROUND STUDY

Mustafa Agaoglu conducted research on predicting instructor performance in higher education, using artificial neural networks, classification algorithms, decision trees, support vector machines and discriminant analysis. The results show that the C5.0 classifier is the best algorithm for predicting instructor performance [3].

Sushil and Thakur applied fuzzy association rule mining to predict students' end-of-semester performance. They used
International Journal of Pure and Applied Mathematics Special Issue
data such as attendance, mid-semester marks, previous semester marks and previous academic records collected from an earlier database. Based on these data they analysed hidden patterns behind poor student performance [4].

Mukesh Kumar and A.J. Singh based their study on the performance of 412 students. They used Naive Bayes, Random Forest, PART and Bayes Network to predict student performance, and found that the random forest algorithm gave the best result compared with the other algorithms [5].

Kamal et al. described a performance study of first-year B.A. students at Vikram University, Ujjain, India. They used the ID3, C4.5 and CART algorithms to predict first-year student performance, and from these algorithms obtained the final grade of a student in a course [6].

Bevinda Alisha et al. conducted a study on predicting the performance of drop-out students. They used attributes such as previous semester marks and internal grades to predict the student's final semester marks, and compared several classification algorithms: ID3, C4.5, CART and CHAID. Their results show that the CHAID algorithm predicted the performance of drop-out students with the highest accuracy [7].

Shaleena and Shaiju Paul conducted a study on predicting student performance using decision trees together with class imbalance and cost-sensitive classification methods. They identified the relevant factors and relationships that lead a student to pass or fail, and concluded that some factors are related to student failure [8].

Dinesh and R.V. Radhika investigated student performance using a feature selection method. They focused on the factors related to student performance and compared environmental factors with educational institute factors to see which affect performance most. They concluded that environmental factors affect student performance [2].

Dr. N. Tajunisha et al. conducted a study on the factors that affect students' academic achievement. They used classification techniques and the MapReduce technique to predict student performance [9].

A.S. Galathiya et al. studied classification with an improved decision tree algorithm using a feature selection technique. They used a genetic search algorithm to improve the classification accuracy. The accuracy was improved by implementing variants of the algorithm using RGui with Weka packages [10].

III. PREDICTION TECHNIQUES

In educational data mining, prediction techniques are used to predict student performance. Several tasks are used for this purpose: classification, regression and density estimation.

A. Classifications

Classification is a supervised learning technique whose aim is to create a model, in this case called a classifier, that can assign a class label to unknown data. In other words, a classifier is built from a training set and is then used to assign unknown data to one of the existing classes. Classification is a two-step process: a learning phase and a classification phase. In the classification phase, the learned mapping function is used to classify any attribute vector. To evaluate the classifier, already classified input is used, and accuracy is calculated as the percentage of correct classifications obtained [11]. Several classification algorithms have been applied to predict student performance: Decision Tree, Bayesian Classifier, Artificial Neural Network, Support Vector Machine and K-Nearest Neighbor.

B. Decision Trees

Decision trees are the best-known classification model. A decision tree (DT) is a tree in which each branch node represents a choice between a number of alternatives and each leaf node represents a decision. Decision trees are commonly used to gain information for the purpose of decision-making. A decision tree starts with a root node on which users take action. From this node, each node is split recursively according to the decision tree learning algorithm. The final result is a decision tree in which each branch represents a possible decision scenario and its outcome [17].

C. Bayesian Classifier

Bayesian classifiers are statistical classifiers that can be represented visually as a graph structure. A Bayesian classifier predicts class membership by probabilities, such as the probability that a given sample belongs to a particular class. Several Bayes algorithms have been developed, among which Bayesian networks and naïve Bayes are the two essential techniques. The naïve Bayes algorithm assumes that the effect of an attribute on a given class is independent of the values of the other attributes [34].

D. Artificial Neural Network

Artificial neural networks are widely used in pattern recognition. A neural network is a set of connected input/output units in which each connection has an associated weight. During the learning phase, the network learns by adjusting the weights so as to be able to
predict the correct class labels of the input tuples. Neural networks are well suited to continuous-valued inputs and outputs. They are good at identifying patterns or trends in data and are well suited to predicting student performance [17].

E. Support Vector Machine

Support Vector Machines (SVMs) are a good choice when the class boundaries are nonlinear but there is too little data to learn complex nonlinear models. The fundamental idea is that when the data is mapped to a higher dimension, the classes become linearly separable. In practice, the mapping is done only implicitly, using kernel functions. SVMs focus only on the class boundaries; points that are easily classified anyway are skipped. The objective is to find the "thickest hyperplane" that separates the classes [35].

F. K-Nearest Neighbor

K-nearest neighbor classifiers represent a totally different approach to classification. They do not build any explicit global model, but approximate it only locally and implicitly. The main idea is to classify a new object by examining the class values of the K most similar data points. The selected class can be either the most frequent class among the neighbors or the class distribution in the neighborhood. The only learning task in K-nearest neighbor classifiers is to select two important parameters: the number of neighbors k and the distance metric d [35].

G. Regression

Regression analysis is a statistical methodology that is most often used for numeric prediction [12]. The objective is to obtain a function of the independent variables that gives the conditional expectation of a dependent variable, for prediction and forecasting, by minimizing a certain type of error via an iterative procedure. Classification and Regression Trees (CART) broadly cover these tasks [13].

H. Density Estimation

Density estimation is concerned with the estimation of probability masses, univariate densities, joint densities, and conditional densities. Most existing estimators assume that all the data instances are available at once. With ever-increasing amounts of data and a tendency towards online settings, however, there is a growing demand for density estimation on data streams. Density estimates are used in tasks such as inference, pattern mining and outlier detection [14].

IV. PREDICTION METHODS USED IN EDM

For this study, 20 papers were chosen from the educational data mining literature. In the papers analyzed, student performance is predicted by a variety of classification algorithms. The research aims to improve student performance based on the predicted results. These predictions help the instructor and the institution to identify weak students and to take appropriate measures to improve their level of study. The papers reviewed were published between 2011 and 2017. From each work, the paper title, algorithms, tools and results are listed in Table I. Most of the research uses decision tree classification algorithms and the naïve Bayes classification algorithm to predict student performance.

TABLE I
PREDICTION ALGORITHM AS APPLIED IN EDM

| No | Title of Paper | Algorithm Used | Tools | Results |
|----|----------------|----------------|-------|---------|
| 1 | Data Mining: A Prediction for Performance Improvement Using Classification [15] | Bayesian Classification | MatLab | Predicted the high-potential variables that affect students' performance |
| 2 | Data Mining: A Prediction of performer or under performer using classification [16] | Bayesian Classification | MatLab | Predicted student's final mark |
| 3 | Mining Educational Data to Analyze Students' Performance [17] | ID3 | Weka | Predicted the student's end-of-semester performance |
| 4 | Data Mining for Engineering Schools: Predicting Student Performance and Enrollment in Master Programs [18] | CART | MatLab | Predicted overall performance of students |
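
To make the classification workflow of Section III concrete (train on labelled records, classify unseen records, and report accuracy as the percentage of correct classifications), the following is a minimal sketch of a k-nearest neighbor classifier in Python. The student records, the two attributes (attendance and mid-semester mark) and the function names knn_predict and accuracy are hypothetical illustrations and are not taken from any of the reviewed papers; Euclidean distance is assumed as the distance metric d.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+


def knn_predict(train, query, k=3):
    """Return the most frequent label among the k training points nearest to query."""
    # train is a list of (feature_vector, label) pairs.
    nearest = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Percentage of already-labelled test records that are classified correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return 100.0 * correct / len(test)


# Hypothetical records: (attendance %, mid-semester mark) -> outcome.
train = [
    ((90, 80), "pass"), ((85, 70), "pass"), ((80, 75), "pass"),
    ((40, 30), "fail"), ((50, 35), "fail"), ((45, 40), "fail"),
]
test = [((88, 78), "pass"), ((42, 33), "fail"), ((60, 50), "fail")]

print(knn_predict(train, (88, 78)))
print(accuracy(train, test))
```

The two parameters named in Section III-F appear here as the argument k and the call to dist. An odd k is chosen so that a two-class vote cannot tie; a different distance metric could be substituted without changing the rest of the sketch.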
[6] Kamal Bunkar, Rajesh Bunkar, "Data Mining: Prediction for Performance Improvement of Graduate Students using Classification", IEEE, 2012.
[7] Bevinda Alisha Pereira, Anusha Pai, "A Comparative Analysis of Decision Tree Algorithms for Predicting Student's Performance", International Journal of Engineering Science and Computing, Vol 7, 2017.
[8] Shaleena K.P, Shaiju Paul, "Data Mining Techniques for Predicting Student", International Conference on Engineering and Technology (ICETECH), IEEE, 2015.
[9] Dr. N. Tajunisha and M. Anjali, "Predicting Student Performance Using Mapreduce", International Journal of Engineering and Computer Science (IJECS), Vol 4, Issue 1, 2015.
[10] A.S. Galathiya and A.P. Ganatra, "Classification with an improved Decision Tree Algorithm", International Journal of Computer Applications (IJCA), Vol 46, 2012.
[11] Ricardo Mendes and Joao P. Vilela, "Privacy-Preserving Data Mining: Methods, Metrics, and Applications", IEEE, 2017.
[12] Jiawei Han and Micheline Kamber, "Data mining concepts and techniques", Second Edition.
[13] Chady El Moucary, "Data mining for Engineering Schools", International Journal of Advanced Computer Science and Applications (IJACSA), Vol 2, 2011.
[14] www.datamining.informatik.uni-mainz.de
[15] Birijesh Kumar Bharadwaj, Saurabh Pal, "Data Mining: A Prediction for performance improvement using classification", International Journal of Computer Science and Information Security, Vol 9, 2011.
[16] Umesh Kumar Pandey, S. Pal, "Data Mining: A Prediction of performer or underperformer using classification", International Journal of Computer Science and Information Technologies (IJCSIT), Vol 2(2), 2011.
[17] Birijesh Kumar, Saurabh Pal, "Mining Educational Data to Analyze Students' Performance", International Journal of Advanced Computer Science and Applications, Vol 2, 2011.
[18] Chady El Moucary, "Data Mining for Engineering schools Predicting Student Performance and Enrollment in Master Programs", International Journal of Advanced Computer Science and Applications (IJACSA), Vol 2, 2011.
[19] S. Anupama Kumar, Dr. Vijayalakshmi, "Efficiency of Decision Trees in Predicting Student's Academic Performance", CCSEA, 2011.
[20] Sajadin Sembiring and Zarlis, "Prediction of student performance by an application of data mining techniques", International Conference on Management and Artificial Intelligence, IPEDR Vol 6, 2011.
[21] Surjeet Kumar Yadav, Saurabh Pal, "Data Mining: A Prediction for Performance Improvement of Engineering Students using Classification", World of Computer Science and Information Technology Journal (WCSIT), Vol 2, 2012.
[22] Ajay Kumar Pal, Saurabh Pal, "Data Mining Techniques in EDM for Predicting the Performance of Students", International Journal of Computer Science and Information Technology, Vol 2, 2013.
[23] Azwa Abdul, Nur Hafieza, Fadhilah Ahmad, "Mining Students' Academic Performance", Journal of Theoretical and Applied Information Technology, Vol 53, 2013.
[24] Edin Osmanbegovic and Mirza Suljic, "Data Mining approach for Predicting Student Performance", Journal of Economics and Business, May 2012.
[25] Abeer Badr El Din Ahmed, Ibrahim Sayed Elaraby, "Data Mining: A Prediction for Student's Performance Using Classification Method", World Journal of Computer Application and Technology, 2014.
[26] Elaf Abu Amrieh, Thair Hamtini, Ibrahim Aljarah, "Mining Educational Data to Predict Student's Performance Using Ensemble Methods", International Journal of Database Theory and Application, Vol 9, 2016.
[27] Patinon Galvan, "Educational Evaluation and Prediction of School Performance through Data Mining and Genetic Algorithms", Future Technologies Conference, IEEE, 2016.
[28] Ms. Tismy Devasia, Ms. Vinushree, "Prediction of Students Performance using Educational Data Mining", Data Mining and Advanced Computing (SAPIENCE), IEEE, 2016.
[29] Amjad Abu Saa, "Educational Data Mining & Students' Performance Prediction", International Journal of Advanced Computer Science and Applications (IJACSA), Vol 7, 2016.
[30] Febrianti Widyahastuti, Viany Utami Tjhin, "Predicting Students Performance in Final Examination using Linear Regression and Multilayer Perceptron", IEEE, 2017.
[31] Raheela Asif, Saman Hina and Saba, "Predicting Student Academic Performance using Data Mining Methods", International Journal of Computer Science and Network Security (IJCSNS), Vol 17, 2017.
[32] Saurabh Pal, Vikas Chaurasia, "Is Alcohol Affect Higher Education Students Performance: Searching and Predicting pattern using Data Mining Algorithms", International Journal of Innovations & Advancement in Computer Science (IJIACS), Vol 6, April 2017.
[33] Bevinda Alisha, Anusha Pai, "A Comparative Analysis of Decision Tree Algorithms for Predicting Student's Performance", International Journal of Engineering Science and Computing (IJESC), Vol 7, 2017.
[34] Dorina Kabakchieva, "Predicting Student Performance by Using Data Mining Methods for Classification", Cybernetics and Information Technologies, Vol 13, 2013.
[35] C. Romero, Ventura, Pechenizkiy, and R.S.J.d. Baker, "Handbook of Educational Data Mining", 2010.
[36] Shankar, K., "Prediction of Most Risk Factors in Hepatitis Disease using Apriori Algorithm", Research Journal of Pharmaceutical, Biological and Chemical Sciences, 8.5 (2017): 477-484.