With the growing use of data mining techniques to improve the operation of educational systems,
Educational Data Mining has emerged as a new and fast-growing research area. Educational Data Mining
aims to analyze data from educational environments in order to solve educational research problems.
In this paper, a new associative classification technique is proposed to predict students' final
performance. Unlike several machine learning approaches such as ANNs and SVMs, associative
classifiers maintain interpretability along with high accuracy. In this research work, we have
employed Honeybee Colony Optimization and Particle Swarm Optimization to extract association rules
for student performance prediction as a multi-objective classification problem. Results indicate
that the proposed swarm-based algorithm outperforms well-known classification techniques on the
student performance prediction classification problem.
Student Performance Evaluation in Education Sector Using Prediction and Clust...IJSRD
Data mining is a crucial step in discovering previously unknown information in large relational databases. Various techniques and algorithms are used in data mining, such as association rules, clustering, classification, and prediction. Each of these techniques has particular characteristics and behaviour. This paper focuses on clustering and prediction techniques. Nowadays the amount of data stored in educational databases is increasing rapidly. A database for a particular set of students was collected, clustering and prediction were carried out in detail, and the results were produced. The K-means clustering algorithm is used here to find the nearest possible cluster for a similar group. A turning point for India is the performance of all students in higher education. This academic performance is influenced by various factors; therefore, to identify the difference between high learners and slow learners, it is important to develop a predictive data mining model for student performance.
A Model for Predicting Students’ Academic Performance using a Hybrid of K-mea...Editor IJCATR
Higher learning institutions nowadays operate in a more complex and competitive environment due to a high demand from prospective
students and a growing number of universities, both public and private. University management faces the challenge of
predicting students’ academic performance in order to put mechanisms in place early enough for their improvement. This research aims at
employing Decision tree and K-means data mining algorithms to model an approach to predict the performance of students in advance
so as to devise mechanisms for reducing student dropout rates and improving performance. In Kenya, for example, there has been
an increase in students enrolling in universities since the Government started free primary education. Therefore the Government
expects an increased workforce of professionals from these institutions without compromising quality so as to achieve its millennium
development goals and Vision 2030. The backlog of students not finishing their studies in the stipulated time due to poor performance is another
issue that can be addressed from the results of this research, since predicting student performance in advance will enable University
management to devise ways of assisting weak students and even make further decisions on how to select students for particular courses.
Previous studies in Educational Data Mining have mostly focused on factors affecting students’ performance and have used
different algorithms to predict students’ performance. In all this research, accuracy of prediction is key and is what researchers
seek to improve.
STUDENT PERFORMANCE ANALYSIS USING DECISION TREEAkshay Jain
This document describes a student performance analysis project that uses decision trees. It introduces decision trees and their use for classification problems. The project aims to use decision tree methodology to analyze student performance data, including attendance, test scores, seminar marks, and assignment marks to predict exam performance. It discusses the existing manual system and proposes a computerized system using decision tree induction. The key modules described are the calling class for data insertion, binary nodes to represent attribute values, and the decision tree module to build the tree from training data and classify new data.
Correlation based feature selection (cfs) technique to predict student perfro...IJCNCJournal
Education data mining is an emerging stream which helps in mining academic data for solving various
types of problems. One of the problems is the selection of a proper academic track. The admission of a
student in engineering college depends on many factors. In this paper we have tried to implement a
classification technique to assist students in predicting their success in admission in an engineering
stream. We have analyzed the data set containing information about student’s academic as well as
socio-demographic variables, with attributes such as family pressure, interest, gender, XII marks and CET rank
in entrance examinations and historical data of previous batch of students. Feature selection is a process
for removing irrelevant and redundant features which will help improve the predictive accuracy of
classifiers. In this paper first we have used feature selection attribute algorithms Chi-square, InfoGain, and
GainRatio to predict the relevant features. Then we have applied fast correlation-based filter on given
features. Later classification is done using NBTree, MultilayerPerceptron, NaiveBayes and instance-based
k-nearest neighbor. Results showed reduction in computational cost and time and increase in predictive
accuracy for the student model.
Data mining to predict academic performance. Ranjith Gowda
This document proposes using data warehousing and data mining techniques to predict student academic performance in schools. It describes collecting student data like scores, attendance, discipline, and assignments into a data warehouse. Data mining methods are then used to analyze the student data and identify relationships between variables to predict performance, such as whether students are progressing, being retained, or conditionally progressing. The results could help schools identify students at risk of failing and take actions to help them succeed.
Students academic performance using clustering techniquesaniacorreya
This document summarizes a study analyzing students' academic performance data. The study collected internal and external marks for 45 students over 5 semesters. It cleaned the data, transforming the marks into sums, and used k-means clustering to group students into 4 categories (excellent, good, fair, poor) for each semester based on their internal and external marks. The analysis found the clusters followed the same performance pattern each semester, with students scoring higher internally also scoring higher externally, indicating a direct relationship between internal and external marks. The study concluded a student's university exam performance can generally be predicted from their internal marks.
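The grouping step summarized above can be illustrated with a minimal k-means sketch. This is not the study's code: the marks, the four starting centroids, and the function name `kmeans` are all hypothetical, chosen only to show how students described by (internal, external) marks might fall into four clusters.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Plain k-means (Lloyd's algorithm) on 2-D points with given starting centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical (internal, external) marks, not data from the study.
marks = [(18, 72), (19, 75), (14, 58), (15, 60), (10, 45), (11, 47), (5, 25), (6, 28)]
# One assumed seed per category: excellent, good, fair, poor.
start = [(19, 75), (14, 58), (10, 45), (5, 25)]
labels, centroids = kmeans(marks, start)
print(labels)
print(centroids)
```

Seeding one centroid per expected category keeps the toy run deterministic; a real analysis would more likely use random or k-means++ initialization and repeat the run several times.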
Data Mining Techniques for School Failure and Dropout SystemKumar Goud
Abstract: Data mining techniques are applied to predict school failure and student dropout. The method uses real data on middle-school students to predict failure and dropout. It implements white-box classification strategies such as induction rules and decision trees. A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences. It is a flowchart-like structure in which each internal node represents a "test" on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label. The attributes are real student information collected from schools at the middle or secondary level. The paths from root to leaf represent classification rules. A decision tree consists of three kinds of nodes: decision nodes, chance nodes, and end nodes. It is used specifically in decision analysis. The technique is used to improve accuracy in predicting which students might fail or drop out, first by using all the available attributes and then by choosing the most effective attributes. Attribute selection is done using the WEKA tool.
Keywords: dataset, classification, clustering.
Predicting students' performance using id3 and c4.5 classification algorithmsIJDKP
An educational institution needs to have an approximate prior knowledge of enrolled students to predict
their performance in future academics. This helps them to identify promising students and also provides
them an opportunity to pay attention to and improve those who would probably get lower grades. As a
solution, we have developed a system which can predict the performance of students from their previous
performances using concepts of data mining techniques under Classification. We have analyzed the data
set containing information about students, such as gender, marks scored in the board examinations of
classes X and XII, marks and rank in entrance examinations and results in first year of the previous batch
of students. By applying the ID3 (Iterative Dichotomiser 3) and C4.5 classification algorithms on this data,
we have predicted the general and individual performance of freshly admitted students in future
examinations.
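As background for the ID3 approach named above, the sketch below computes entropy and information gain, the criterion ID3 uses to choose which attribute to split on (C4.5 refines this with gain ratio). The records, attribute names, and class labels are hypothetical and are not taken from the paper's data set.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` obtained by splitting `rows` on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical student records, for illustration only.
students = [
    {"entrance_rank": "high", "gender": "M", "result": "pass"},
    {"entrance_rank": "high", "gender": "F", "result": "pass"},
    {"entrance_rank": "low", "gender": "M", "result": "fail"},
    {"entrance_rank": "low", "gender": "F", "result": "fail"},
]
gain_rank = information_gain(students, "entrance_rank", "result")
gain_gender = information_gain(students, "gender", "result")
print(gain_rank, gain_gender)
```

In this toy data the entrance rank separates the classes perfectly while gender carries no information, so a tree learner driven by information gain would split on entrance rank first.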
IRJET - A Study on Student Career PredictionIRJET Journal
This document discusses research on using machine learning techniques to predict student performance and career outcomes. It provides an overview of various studies that have used methods like decision trees, naive Bayes classification, neural networks, and clustering algorithms. The studies aimed to identify factors influencing student performance and predict outcomes like course grades, dropout risk, and placement success. The document also compares the different techniques, finding that deep neural networks and ensemble methods can achieve relatively high prediction accuracy, above 80% in some cases. Overall, the research aims to help educational institutions identify at-risk students and improve student performance.
This document discusses using data mining techniques to analyze faculty performance at an engineering college in India. It proposes analyzing 4 parameters - student complaints, feedback, results, and reviews - to evaluate faculty instead of just 2 parameters (feedback and results) used previously. It will use opinion mining to analyze faculty performance and calculate scores. The system will collect data, preprocess it, apply a KNN algorithm to the 4 parameters to calculate scores for each faculty, sum the scores, classify results using rule-based classification, and analyze outcomes by subject and class. It reviews related work applying educational data mining and concludes the multiple classifier approach is better, and future work could consider more parameters and expand to all college branches and departments.
A Study on Learning Factor Analysis – An Educational Data Mining Technique fo...iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
This document describes an academic performance analysis system that uses educational data mining techniques. It analyzes student and teacher performance data collected from an engineering college. The system applies the Apriori algorithm and decision tree algorithm to mine patterns in academic data. The Apriori algorithm is used to generate rules based on support, confidence and lift to analyze student performance in different courses. The decision tree algorithm is used to analyze and visualize results for individual students, student groups, and indirectly for teachers. The goal is to identify existing patterns in past student performance data and use it to improve future student and teacher performance.
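The three rule measures mentioned above (support, confidence, and lift) can be computed directly from a list of records. The following is a minimal sketch under assumed data: the item strings and the function name `rule_metrics` are hypothetical, and the full Apriori candidate-generation step is deliberately left out.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(1 for t in transactions if a <= t)         # records containing the antecedent
    count_c = sum(1 for t in transactions if c <= t)         # records containing the consequent
    count_ac = sum(1 for t in transactions if (a | c) <= t)  # records containing both
    support = count_ac / n
    confidence = count_ac / count_a  # assumes the antecedent occurs at least once
    lift = confidence / (count_c / n)
    return support, confidence, lift

# Hypothetical student records encoded as item sets.
records = [
    {"attendance=high", "result=pass"},
    {"attendance=high", "result=pass"},
    {"attendance=low", "result=fail"},
    {"attendance=high", "result=fail"},
]
support, confidence, lift = rule_metrics(records, {"attendance=high"}, {"result=pass"})
print(support, confidence, lift)
```

A lift above 1 suggests the antecedent and consequent occur together more often than they would if they were independent, which is why lift is often reported alongside confidence when ranking rules.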
Predicting instructor performance using data mining techniques in higher educ...redpel dot com
Predicting instructor performance using data mining techniques in higher education
IRJET- Using Data Mining to Predict Students PerformanceIRJET Journal
This document describes a study that used logistic regression to predict student performance based on educational data. The researchers collected student data including exam scores, attendance, study hours, family income, etc. from a large dataset. Logistic regression achieved the best prediction accuracy of 82.03% compared to other models like naive bayes, K-nearest neighbor, and multi-layer perceptron. The results indicate that around 230 students would perform poorly, 600 would perform fairly, and 200 would perform well based on the predictive model. This analysis can help identify students needing extra support and help universities improve academic outcomes.
Application of Higher Education System for Predicting Student Using Data mini...AM Publications
The aim of this research paper is to improve the current trends in higher education systems and to understand
from the outside which factors might create loyal students. The need for loyal students motivates higher
education systems to know them well; one way to do this is through sound management and processing of the student
database. Data mining methods represent a valid approach for extracting valuable information from existing
students to manage relations with future students. This may indicate at an early stage which type of students will
potentially be enrolled and which areas to concentrate on in higher education systems for support. For this purpose,
a data mining framework is used for mining academic data from enrolled students. The rule generation
process is based on the classification method. The generated rules are studied and evaluated using different
evaluation methods, and the main attributes that may affect student loyalty are highlighted. Software that
facilitates the use of the generated rules is built, which allows higher education systems to predict student
loyalty (numbers of enrolled students) so that they can manage and prepare the necessary resources for newly enrolled students.
Analyzing undergraduate students’ performance in various perspectives using d...Alexander Decker
This document discusses analyzing undergraduate student performance using data mining techniques. It analyzes student performance in two perspectives: 1) supervised vs unsupervised assessment instruments and 2) performance in mathematics, English, and programming courses. The study uses association rule mining with the Apriori algorithm to discover patterns in student performance data from both analyses. The goal is to identify useful insights that can help improve assessment methods, curriculum structure, and course prerequisites.
This paper highlights important issues of the higher education system, such as predicting students’ academic performance. This is important to study, chiefly from the point of view of institutional administration, management, different stakeholders, faculty, students, and parents. To analyze the student data we selected algorithms such as Decision Tree, Naive Bayes, Random Forest, PART and Bayes Network, with three evaluation techniques: 10-fold cross-validation, percentage split (74%) and training set. After comparing the data mining algorithms on different metrics (Time to build Classifier, Mean Absolute Error, Root Mean Squared Error, Relative Absolute Error, Root Relative Squared Error, Precision, Recall, F-Measure, ROC Area), we are able to find which algorithm performs better than the others on the student dataset in hand, so that we can provide a guideline for future improvement in student performance in education. According to our analysis of the student dataset, the Random Forest algorithm gave the best result compared to the other algorithms, with a Recall value approximately equal to one. The analysis of the different data mining algorithms gave in-depth insight into how these algorithms predict the performance of different students and help enhance their skills.
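To make the 10-fold cross-validation protocol mentioned above concrete, here is a minimal sketch of how a data set might be partitioned into folds. The function name `k_fold_indices` and the sample count are assumptions for illustration; the sketch does not shuffle the data, which a real evaluation would normally do first.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# Hypothetical data set of 25 students split into 10 folds.
folds = list(k_fold_indices(25, k=10))
for train, test in folds:
    print(len(train), len(test))
```

Each record appears in exactly one test fold, so every student is used for testing once and for training in the remaining nine rounds; the reported metric is then usually the average over the folds.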
Clustering Students of Computer in Terms of Level of ProgrammingEditor IJCATR
Educational data mining (EDM) is one of the applications of data mining. In educational data mining there are two key domains: the student domain and the faculty domain. Different types of research work have been done in both domains.
In the existing system, faculty performance is calculated on the basis of two parameters: student feedback and the students' results in that subject. The existing system defines two approaches, a multiple classifier approach and a single classifier approach, and compares them for relative evaluation of faculty performance using data mining techniques. In the multiple classifier approach, K-nearest neighbor (KNN) is used in the first step and rule-based classification in the second step of classification, while in the single classifier approach only KNN is used in both steps.
In the proposed system, I will analyse faculty performance using four parameters: student complaints about faculty, student review feedback for faculty, student feedback, and student results.
For this proposed system I will use an opinion mining technique to analyze faculty performance and calculate a score for each faculty member.
Data mining, or knowledge discovery in databases (KDD), is
extracting unknown (hidden) and useful knowledge from data.
Data mining is widely used in many areas such as retail, sales, e-commerce,
remote sensing, and bioinformatics. Student
performance has become one of the most complex puzzles for
universities and colleges in the recent past, given their tremendous
growth. In this paper, the authors deployed data mining techniques
such as classification, association rules, and chi-square for knowledge
discovery. For this study, the authors used a data set containing
the results of approximately 180 MCA (postgraduate) students from 3
colleges. The study found that one can apply data mining
functionalities such as chi-square, association rules, and lift in
education and discover areas of improvement.
Predictive models are quasi-experimental structures used to determine future
patterns in data. These meaningful data patterns form the building blocks of any
decision support system. Researchers all over the world have built many prediction
models for major industries. Research work in the educational sector has increased
steeply. This steep increase may be due to the high availability of data in the
educational domain. This survey reviews a selection of published work on
academic performance prediction of engineering students, with a focus on grade
prediction. Meaningful interpretations have been made and inferences are presented
at the end of this paper.
The document proposes a new framework called Quasi Framework to detect disengagement in online learning. It analyzes log file data from an online learning system to identify attributes related to disengagement. The framework merges log file information with student database information and uses it to predict disengagement. Experimental results on a real student dataset show the Quasi Framework achieves higher accuracy than an existing system called iHelp, particularly for predicting disengaged students. The study suggests that considering both reading and assessment attributes is important for accurate disengagement detection.
Feature selection is one of the most fundamental steps in machine learning. It is closely related to
dimensionality reduction. A commonly used approach in feature selection is ranking the individual
features according to some criteria and then searching for an optimal feature subset based on an evaluation
criterion to test the optimality. The objective of this work is to predict more accurately the presence of
Learning Disability (LD) in school-aged children with a reduced number of symptoms. For this purpose, a
novel hybrid feature selection approach is proposed by integrating a popular Rough Set based feature
ranking process with a modified backward feature elimination algorithm. The process of feature ranking
follows a method of calculating the significance or priority of each symptom of LD as per its
contribution in representing the knowledge contained in the dataset. Each symptom's significance or
priority value reflects its relative importance in predicting LD among the various cases. Then by eliminating
least significant features one by one and evaluating the feature subset at each stage of the process, an
optimal feature subset is generated. For comparative analysis and to establish the importance of rough set
theory in feature selection, the backward feature elimination algorithm is combined with two state-of-the-art
filter based feature ranking techniques viz. information gain and gain ratio. The experimental results
show the proposed feature selection approach outperforms the other two in terms of the data reduction.
Also, the proposed method eliminates all the redundant attributes efficiently from the LD dataset without
sacrificing the classification performance.
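The elimination loop described above (drop the least significant feature, re-evaluate the subset, and keep the drop only if quality holds) can be sketched generically. This is a simplified illustration, not the authors' modified algorithm: the ranking, the `evaluate` scoring function, and the symptom names are all hypothetical stand-ins for the rough-set significance values and classifier evaluation used in the work.

```python
def backward_elimination(ranked_features, evaluate, tolerance=0.0):
    """Drop features from least to most significant while the score does not degrade.

    ranked_features: feature names ordered from most to least significant.
    evaluate: callable taking a feature list and returning a quality score.
    """
    selected = list(ranked_features)
    best = evaluate(selected)
    for feature in reversed(ranked_features):  # least significant first
        if len(selected) == 1:
            break  # always keep at least one feature
        candidate = [f for f in selected if f != feature]
        score = evaluate(candidate)
        if score >= best - tolerance:
            selected, best = candidate, score  # the feature was redundant
    return selected, best

# Hypothetical ranking and scoring: only two symptoms carry information.
informative = {"reading_difficulty", "spelling_difficulty"}
ranking = ["reading_difficulty", "spelling_difficulty", "handedness", "seat_position"]

def toy_score(features):
    # Stand-in for classifier accuracy on the reduced data set.
    return len(informative & set(features))

subset, score = backward_elimination(ranking, toy_score)
print(subset, score)
```

In practice `evaluate` would train and validate a classifier on the reduced attribute set, and `tolerance` would control how much accuracy loss is acceptable in exchange for a smaller feature subset.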
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...ijistjournal
Classification is a widely used technique in the data mining domain, where scalability and efficiency are the immediate problems in classification algorithms for large databases. We suggest improvements to the existing C4.5 decision tree algorithm. In this paper, attribute oriented induction (AOI) and relevance analysis are incorporated with concept hierarchy knowledge and the HeightBalancePriority algorithm for construction of the decision tree, along with multi-level mining. The assignment of priorities to attributes is done by evaluating information entropy at different levels of abstraction for building the decision tree using the HeightBalancePriority algorithm. Modified DMQL queries are used to understand and explore the shortcomings of the decision trees generated by the C4.5 classifier for an education dataset, and the results are compared with the proposed approach.
Using the Students Performance in Exams Dataset we will try to understand what affects the exam scores. The data is limited, but it will present a good visualization to spot the relations. First of all, we explore our data and after that we apply Naive Bayes Classification technique for evaluation purpose.
CORRELATION BASED FEATURE SELECTION (CFS) TECHNIQUE TO PREDICT STUDENT PERFRO...IJCNCJournal
This document discusses using feature selection and classification techniques to predict student performance and recommend an engineering stream for students. It first describes feature selection algorithms like chi-square and correlation-based feature selection to identify relevant attributes from a student data set. It then applies classifiers like NBTree, Naive Bayes, k-nearest neighbor, and multilayer perceptron on the selected features and evaluates their performance. The results show that correlation-based feature selection reduces computation time and improves predictive accuracy for recommending an engineering stream for students.
Performance Evaluation of Feature Selection Algorithms in Educational Data Mi...IIRindia
Educational Data Mining (EDM) is a prominent field concerned with developing methods for exploring the unique and increasingly large-scale data that come from educational settings, and using those methods to better understand students and the settings in which they learn. It has been shown in various studies, including a previous study by the authors, that data mining techniques find widespread application in the educational decision-making process for improving the performance of students in higher educational institutions. Classification techniques assume significant importance in machine learning tasks and are mostly employed in prediction-related problems. In machine learning problems, feature selection techniques are used to reduce the number of attributes by removing redundant and irrelevant features from the dataset. The aim of this research work is to compare, using the WEKA tool, the performance of various feature selection techniques in the prediction of students’ performance in the final semester examination using different classification algorithms. In particular, J48, Naïve Bayes, Bayes Net, IBk, OneR, and JRip are used in this research work. The dataset for the study was collected from the student performance reports of a private college in Tamil Nadu state of India. The effectiveness of various feature selection algorithms was compared with six classifiers and the results are discussed. The results of this study show that the accuracy of IBk is 99.680% which is found to be
The document discusses using Learning Factor Analysis (LFA), an educational data mining technique, to model student knowledge based on student-tutor interaction log data. LFA uses a multiple logistic regression model with difficulty factors defined by subject experts to quantify skills. A combinatorial search method called A* search is used to select the best-fitting model. The document illustrates applying LFA to data from an online math tutor, identifying 5 skills and presenting the results of the logistic regression modeling, including fit statistics and learning rates for skills. Learning curves are used to visualize student performance over time.
This document analyzes and compares the performance of various classification algorithms (J48, Random Forest, Multilayer Perceptron, IB1, Decision Table) in predicting student performance using data from 260 students. Random Forest performed the best with 89.23% accuracy, taking the least time to build the model and having the lowest error rates compared to the other algorithms. Attributes like attendance, economic status, and parental education were found to be most important factors influencing student results. The analysis provides insight into how different factors impact student performance.
IRJET - A Study on Student Career PredictionIRJET Journal
This document discusses research on using machine learning techniques to predict student performance and career outcomes. It provides an overview of various studies that have used methods like decision trees, naive Bayes classification, neural networks, and clustering algorithms. The studies aimed to identify factors influencing student performance and predict outcomes like course grades, dropout risk, and placement success. The document also compares the different techniques, finding that deep neural networks and ensemble methods can achieve relatively high prediction accuracy, above 80% in some cases. Overall, the research aims to help educational institutions identify at-risk students and improve student performance.
This document discusses using data mining techniques to analyze faculty performance at an engineering college in India. It proposes analyzing 4 parameters - student complaints, feedback, results, and reviews - to evaluate faculty instead of just 2 parameters (feedback and results) used previously. It will use opinion mining to analyze faculty performance and calculate scores. The system will collect data, preprocess it, apply a KNN algorithm to the 4 parameters to calculate scores for each faculty, sum the scores, classify results using rule-based classification, and analyze outcomes by subject and class. It reviews related work applying educational data mining and concludes the multiple classifier approach is better, and future work could consider more parameters and expand to all college branches and departments.
A Study on Learning Factor Analysis – An Educational Data Mining Technique fo...iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
This document describes an academic performance analysis system that uses educational data mining techniques. It analyzes student and teacher performance data collected from an engineering college. The system applies the Apriori algorithm and decision tree algorithm to mine patterns in academic data. The Apriori algorithm is used to generate rules based on support, confidence and lift to analyze student performance in different courses. The decision tree algorithm is used to analyze and visualize results for individual students, student groups, and indirectly for teachers. The goal is to identify existing patterns in past student performance data and use it to improve future student and teacher performance.
Predicting instructor performance using data mining techniques in higher educ...redpel dot com
Predicting instructor performance using data mining techniques in higher education
IRJET- Using Data Mining to Predict Students PerformanceIRJET Journal
This document describes a study that used logistic regression to predict student performance based on educational data. The researchers collected student data including exam scores, attendance, study hours, family income, etc. from a large dataset. Logistic regression achieved the best prediction accuracy of 82.03% compared to other models like naive bayes, K-nearest neighbor, and multi-layer perceptron. The results indicate that around 230 students would perform poorly, 600 would perform fairly, and 200 would perform well based on the predictive model. This analysis can help identify students needing extra support and help universities improve academic outcomes.
Application of Higher Education System for Predicting Student Using Data mini...AM Publications
The aim of this research paper is to improve on current trends in higher education systems by understanding
from the outside which factors might create loyal students. The need for loyal students motivates higher
education systems to know them well; one way to do this is through sound management and processing of the student
database. Data mining methods represent a valid approach for extracting valuable information from existing
students in order to manage relations with future students. This may indicate at an early stage which type of students will
potentially be enrolled and which areas higher education systems should concentrate on for support. For this purpose,
a data mining framework is used to mine academic data from enrolled students. The rule generation
process is based on the classification method. The generated rules are studied and evaluated using different
evaluation methods, and the main attributes that may affect student loyalty are highlighted. Software that
facilitates the use of the generated rules has been built, allowing higher education systems to predict student
loyalty (numbers of enrolled students) so that they can manage and prepare the necessary resources for newly enrolled students.
Analyzing undergraduate students’ performance in various perspectives using d...Alexander Decker
This document discusses analyzing undergraduate student performance using data mining techniques. It analyzes student performance in two perspectives: 1) supervised vs unsupervised assessment instruments and 2) performance in mathematics, English, and programming courses. The study uses association rule mining with the Apriori algorithm to discover patterns in student performance data from both analyses. The goal is to identify useful insights that can help improve assessment methods, curriculum structure, and course prerequisites.
This paper highlights important issues in the higher education system, such as predicting students' academic performance. This is important to study, predominantly from the point of view of institutional administration, management, different stakeholders, faculty, students, and parents. To analyze the student data we selected the algorithms Decision Tree, Naive Bayes, Random Forest, PART, and Bayes Network, with three test options: 10-fold cross-validation, percentage split (74%), and training set. After comparing the data mining algorithms on several metrics (time to build the classifier, Mean Absolute Error, Root Mean Squared Error, Relative Absolute Error, Root Relative Squared Error, Precision, Recall, F-Measure, ROC Area), we were able to find which algorithm performs better than the others on the student dataset at hand, and so to draw up a guideline for future improvement of student performance in education. On the student dataset, the Random Forest algorithm gave the best result compared to the other algorithms, with a Recall value approximately equal to one. The analysis of the different data mining algorithms gave in-depth awareness of how these algorithms predict the performance of different students and help enhance their skills.
Clustering Students of Computer in Terms of Level of ProgrammingEditor IJCATR
Educational data mining (EDM) is one of the applications of data mining. In educational data mining there are two key domains: the student domain and the faculty domain. Different types of research work have been done in both domains.
In the existing system, faculty performance is calculated on the basis of two parameters: student feedback and the students' results in that subject. The existing system defines two approaches, a multiple classifier approach and a single classifier approach, and compares them for relative evaluation of faculty performance using data mining techniques.
In the multiple classifier approach, K-nearest neighbor (KNN) is used in the first step and rule-based classification in the second step, while in the single classifier approach only KNN is used in both steps of classification.
In the proposed system, I will analyse faculty performance using 4 parameters: student complaints about faculty, student review feedback for faculty, student feedback, and student results.
For the proposed system I will use an opinion mining technique to analyze faculty performance and calculate a score for each faculty member.
Data mining, or knowledge discovery (KDD), is
extracting unknown (hidden) and useful knowledge from data.
Data mining is widely used in many areas such as retail, sales, e-commerce,
remote sensing, and bioinformatics. Student
performance has become one of the most complex puzzles for
universities and colleges in the recent past with the tremendous
growth. In this paper, the authors deployed data mining techniques
such as classification, association rules, and chi-square for knowledge
discovery. For this study, the authors used a data set containing
the results of approximately 180 MCA (postgraduate) students from 3
colleges. The study found that one can apply data mining
functionalities such as chi-square, association rules, and lift in
education and discover areas for improvement.
Predictive models are quasi experimental structures used to determine the future
patterns in data. These meaningful data patterns form the building block of any
decision support system. Researchers all over the world have built many prediction
models for major industries. Research work in the educational sector has increased
steeply. This steep increase may be due to the high availability of data in the
educational domain. This survey reviews a few works in the literature on
academic performance prediction of engineering students, with a focus on grade
prediction. Meaningful interpretations have been made and inferences are presented
at the end of this paper.
The document proposes a new framework called Quasi Framework to detect disengagement in online learning. It analyzes log file data from an online learning system to identify attributes related to disengagement. The framework merges log file information with student database information and uses it to predict disengagement. Experimental results on a real student dataset show the Quasi Framework achieves higher accuracy than an existing system called iHelp, particularly for predicting disengaged students. The study suggests that considering both reading and assessment attributes is important for accurate disengagement detection.
Feature selection is one of the most fundamental steps in machine learning. It is closely related to
dimensionality reduction. A commonly used approach in feature selection is ranking the individual
features according to some criteria and then search for an optimal feature subset based on an evaluation
criterion to test the optimality. The objective of this work is to predict more accurately the presence of
Learning Disability (LD) in school-aged children with reduced number of symptoms. For this purpose, a
novel hybrid feature selection approach is proposed by integrating a popular Rough Set based feature
ranking process with a modified backward feature elimination algorithm. The process of feature ranking
follows a method of calculating the significance or priority of each symptom of LD according to its
contribution in representing the knowledge contained in the dataset. Each symptom's significance or
priority value reflects its relative importance in predicting LD among the various cases. Then, by eliminating
the least significant features one by one and evaluating the feature subset at each stage of the process, an
optimal feature subset is generated. For comparative analysis and to establish the importance of rough set
theory in feature selection, the backward feature elimination algorithm is combined with two state-of-the-art
filter-based feature ranking techniques, viz. information gain and gain ratio. The experimental results
show the proposed feature selection approach outperforms the other two in terms of the data reduction.
Also, the proposed method eliminates all the redundant attributes efficiently from the LD dataset without
sacrificing the classification performance.
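The rank-then-eliminate procedure described above can be sketched in Python as follows. This is a generic backward elimination loop under stated assumptions, not the paper's modified algorithm: the ranking is taken as given (it could come from rough set significance, information gain, or gain ratio), and the `evaluate` callback, the `tolerance` parameter, and the symptom names are hypothetical.

```python
def backward_eliminate(ranked, evaluate, tolerance=0.0):
    """Drop features from least to most significant while the score holds.

    ranked:   features ordered from most to least significant.
    evaluate: callable that scores a feature subset (higher is better).
    """
    selected = list(ranked)
    baseline = evaluate(selected)  # score of the full feature set
    for feature in reversed(ranked):
        if len(selected) == 1:
            break  # always keep at least one feature
        candidate = [f for f in selected if f != feature]
        # Keep the smaller subset only if performance is not sacrificed.
        if evaluate(candidate) >= baseline - tolerance:
            selected = candidate
    return selected


# Hypothetical scorer: only symptoms s1 and s2 carry predictive signal.
def toy_score(features):
    return 1.0 if {"s1", "s2"} <= set(features) else 0.5


reduced = backward_eliminate(["s1", "s2", "s3", "s4"], toy_score)
print(reduced)
```

In practice `evaluate` would wrap a cross-validated classifier score, so each elimination step costs one model fit.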
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...ijistjournal
Classification is a widely used technique in the data mining domain, where scalability and efficiency are the immediate problems for classification algorithms on large databases. We suggest improvements to the existing C4.5 decision tree algorithm. In this paper, attribute-oriented induction (AOI) and relevance analysis are incorporated with concept hierarchy knowledge and the HeightBalancePriority algorithm for constructing a decision tree, along with multi-level mining. Priorities are assigned to attributes by evaluating information entropy at different levels of abstraction when building the decision tree with the HeightBalancePriority algorithm. Modified DMQL queries are used to understand and explore the shortcomings of the decision trees generated by the C4.5 classifier for an education dataset, and the results are compared with the proposed approach.
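As an illustration of the entropy evaluation mentioned above, the following Python sketch computes Shannon entropy and information gain for one attribute. It shows only the standard measures used in decision tree induction, not the HeightBalancePriority algorithm itself; the `rows` data and the attribute names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in target entropy obtained by splitting rows on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical education records.
rows = [
    {"attendance": "high", "gender": "f", "result": "pass"},
    {"attendance": "high", "gender": "m", "result": "pass"},
    {"attendance": "low", "gender": "f", "result": "fail"},
    {"attendance": "low", "gender": "m", "result": "fail"},
]

gain_attendance = information_gain(rows, "attendance", "result")
gain_gender = information_gain(rows, "gender", "result")
print(gain_attendance, gain_gender)
```

An attribute with higher gain would receive higher priority when choosing the next split.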
Using the Students Performance in Exams Dataset we will try to understand what affects the exam scores. The data is limited, but it will present a good visualization to spot the relations. First of all, we explore our data and after that we apply Naive Bayes Classification technique for evaluation purpose.
CORRELATION BASED FEATURE SELECTION (CFS) TECHNIQUE TO PREDICT STUDENT PERFRO...IJCNCJournal
Education data mining is an emerging stream which helps in mining academic data for solving various
types of problems. One of the problems is the selection of a proper academic track. The admission of a
student in engineering college depends on many factors. In this paper we have tried to implement a
classification technique to assist students in predicting their success in admission in an engineering
stream. We have analyzed a data set containing information about students' academic as well as socio-demographic variables, with attributes such as family pressure, interest, gender, XII marks, and CET rank
in entrance examinations, together with historical data from the previous batch of students. Feature selection is a process
for removing irrelevant and redundant features, which helps improve the predictive accuracy of
classifiers. In this paper we first used the feature selection algorithms Chi-square, InfoGain, and
GainRatio to identify the relevant features. Then we applied the fast correlation-based filter to the given
features. Finally, classification was done using NBTree, MultilayerPerceptron, NaiveBayes, and instance-based
K-nearest neighbor. Results showed a reduction in computational cost and time and an increase in predictive
accuracy for the student model.
This document discusses using feature selection and classification techniques to predict student performance and recommend an engineering stream for students. It first describes feature selection algorithms like chi-square and correlation-based feature selection to identify relevant attributes from a student data set. It then applies classifiers like NBTree, Naive Bayes, k-nearest neighbor, and multilayer perceptron on the selected features and evaluates their performance. The results show that correlation-based feature selection reduces computation time and improves predictive accuracy for recommending an engineering stream for students.
A Model for Predicting Students’ Academic Performance using a Hybrid of K-mea...Editor IJCATR
Higher learning institutions nowadays operate in a more complex and competitive environment due to high demand from prospective
students and a growing number of universities, both public and private. University management faces the challenge of
predicting students' academic performance in order to put mechanisms in place early enough for their improvement. This research aims at
employing Decision tree and K-means data mining algorithms to model an approach to predict the performance of students in advance
so as to devise mechanisms for reducing student dropout rates and improving performance. In Kenya, for example, there has been
an increase in students enrolling in universities since the Government introduced free primary education. The Government therefore
expects an increased workforce of professionals from these institutions without compromising quality so as to achieve its millennium
development goals and Vision 2030. The backlog of students not finishing their studies in the stipulated time due to poor performance is another
issue that can be addressed from the results of this research, since predicting student performance in advance will enable University
management to devise ways of assisting weak students and even make further decisions on how to select students for particular courses.
Previous studies in Educational Data Mining have mostly focused on factors affecting students' performance and have used
different algorithms to predict students' performance. In all this research, prediction accuracy is key and is what researchers
seek to improve.
Performance Evaluation of Feature Selection Algorithms in Educational Data Mi...IIRindia
Educational Data Mining (EDM) is a prominent field concerned with developing methods for exploring the unique and increasingly large-scale data that come from educational settings and using those methods to better understand students and the settings in which they learn. Various studies, including a previous study by the authors, have shown that data mining techniques find widespread application in the educational decision making process for improving the performance of students in higher educational institutions. Classification techniques are of significant importance in machine learning tasks and are mostly employed in prediction-related problems. In machine learning problems, feature selection techniques are used to reduce the number of attributes by removing redundant and irrelevant features from the dataset. The aim of this research work is to compare the performance of various feature selection techniques, using the WEKA tool, in predicting students' performance in the final semester examination with different classification algorithms. In particular, J48, Naïve Bayes, Bayes Net, IBk, OneR, and JRip are used in this research work. The dataset for the study was collected from the student performance reports of a private college in the state of Tamil Nadu, India. The effectiveness of various feature selection algorithms was compared across six classifiers and the results are discussed. The results of this study show that the accuracy of IBk is 99.680%, which is found to be
The document discusses using Learning Factor Analysis (LFA), an educational data mining technique, to model student knowledge based on student-tutor interaction log data. LFA uses a multiple logistic regression model with difficulty factors defined by subject experts to quantify skills. A combinatorial search method called A* search is used to select the best-fitting model. The document illustrates applying LFA to data from an online math tutor, identifying 5 skills and presenting the results of the logistic regression modeling, including fit statistics and learning rates for skills. Learning curves are used to visualize student performance over time.
This document analyzes and compares the performance of various classification algorithms (J48, Random Forest, Multilayer Perceptron, IB1, Decision Table) in predicting student performance using data from 260 students. Random Forest performed the best with 89.23% accuracy, taking the least time to build the model and having the lowest error rates compared to the other algorithms. Attributes like attendance, economic status, and parental education were found to be most important factors influencing student results. The analysis provides insight into how different factors impact student performance.
Fuzzy Association Rule Mining based Model to Predict Students’ Performance IJECEIAES
The major aim of higher education institutions is to provide quality education to their students. One approach to achieving the highest level of quality in a higher education system is discovering knowledge for prediction regarding internal assessment and the end-semester examination. The proposed work approaches this objective by using a fuzzy inference technique to classify student score data according to performance level. In this paper, student performance is evaluated using fuzzy association rule mining, which predicts students' performance at the end of the semester on the basis of records such as attendance, mid-semester marks, previous semester marks, and previous academic records collected from the students' previous database. The aim is to identify those students who need individual attention, in order to decrease the fail ratio and take suitable action for the next semester examination.
M-Learners Performance Using Intelligence and Adaptive E-Learning Classify th...IRJET Journal
This document discusses using machine learning classification algorithms to predict student performance based on educational data. It compares the performance of five classification algorithms - J48, Naive Bayes, Bayes Net, Backpropagation Network, and Radial Basis Function Network - in predicting student academic achievement using attributes like demographic information, test scores, and academic factors. The experiment found that the Radial Basis Function Network algorithm achieved the highest accuracy, correctly classifying 100% of instances, compared to 75-95% accuracy for the other algorithms. Convolutional neural networks are also discussed as a powerful tool for image and language processing in educational data mining.
A Survey on the Classification Techniques In Educational Data MiningEditor IJCATR
Due to increasing interest in data mining and educational systems, educational data mining is an emerging topic for the research
community. Educational data mining means extracting hidden knowledge from large repositories of data with the use of techniques
and tools. Educational data mining develops new methods to discover knowledge from educational databases, which is used for decision
making in educational systems. Various data mining techniques, such as classification and clustering, can be applied to bring out hidden
knowledge from educational data.
In this paper, we focus on educational data mining and classification techniques. In this study we analyze attributes for the
prediction of students' behavior and academic performance using the WEKA open source data mining tool and various classification
methods such as decision trees, the C4.5 algorithm, and the ID3 algorithm.
ADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODELijcsit
Predicting student performance is a great concern to higher education management. This
prediction helps to identify and improve students' performance. Several factors may improve this
performance. In the present study, we employ data mining processes, particularly classification, to
enhance the quality of the higher education system. Recently, a new direction has been used to improve
classification accuracy by combining classifiers. In this paper, we design and evaluate a fast learning
algorithm using an AdaBoost ensemble with a simple genetic algorithm, called “Ada-GA”, where the genetic
algorithm is demonstrated to successfully improve the accuracy of the combined classifier.
The Ada-GA algorithm proved to be of considerable usefulness in identifying students at risk early,
especially in very large classes. This early prediction allows the instructor to provide appropriate advising
to those students. The Ada-GA algorithm is implemented and tested on the ASSISTments dataset; the results
showed that this algorithm successfully improved detection accuracy and reduced the
complexity of computation.
Extending the Student’s Performance via K-Means and Blended Learning IJEACS
In this paper, we use a clustering technique to monitor the status of students' scholastic performance. The paper focuses on improving the education system via K-means clustering. Clustering is the process of grouping similar objects. In academia, students' performance is commonly grouped by Grade Point (GP). We adopted the K-means algorithm and implemented it on students' mark data. This system is a promising indicator for tracking students' development and categorizing students by their academic performance. Based on these categories, we train the students according to their GP. The system was implemented in MATLAB and produced the student clusters accurately.
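The grouping of students by grade point can be sketched with a one-dimensional K-means loop. The paper's implementation was in MATLAB; the version below is an illustrative Python sketch, and the `grade_points` values, the initial centroids, and the function name `kmeans_1d` are hypothetical.

```python
def kmeans_1d(values, centroids, iterations=100):
    """Cluster scalar values with K-means, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return centroids, clusters


# Hypothetical grade points on a 10-point scale.
grade_points = [9.1, 8.7, 8.9, 6.2, 5.8, 6.0, 3.1, 2.9]
centroids, clusters = kmeans_1d(grade_points, centroids=[9.0, 6.0, 3.0])
print(centroids)
print(clusters)
```

The result depends on the initial centroids; a common choice is to run the algorithm from several starting points and keep the tightest clustering.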
ANALYSIS OF STUDENT ACADEMIC PERFORMANCE USING MACHINE LEARNING ALGORITHMS:– ...indexPub
Student academic performance is of great value to institutes, universities, and colleges. All colleges focus chiefly on the career development of students. Students' academic performance plays a vital role in establishing a bright career. Better academic performance leads to better placement of students, which is in turn reflected in better admissions and future prospects. Machine learning can be deployed to predict student performance. Various algorithms play an important role in the prediction accuracy of machine learning models. This article discusses various algorithms that can be deployed for predicting student academic performance, covering the methods, predictive features, and accuracy of machine learning algorithms. The primary factors used for predicting student performance are academic institution, sessional marks, semester progress, family occupation, methods, and algorithms. The accuracy level of various machine learning algorithms is also discussed.
IRJET- Tracking and Predicting Student Performance using Machine LearningIRJET Journal
This document describes a study that uses machine learning models to predict student performance and whether students will complete their degrees based on their academic records and other features. The study collected data on scholarship students from various universities. It applied learning analytics, discriminative, and generative classification models to the data. Experimental results showed the proposed method, which considered features like family expenditures and personal information, outperformed existing methods that primarily used academic performance, family income, and assets. The document discusses using k-means clustering and support vector machines (SVM) algorithms to analyze the data and predict student performance. It concludes that past academic performance significantly influences students' future performance and that predictive performance increases with larger datasets.
Oversampling technique in student performance classification from engineering...IJECEIAES
This document discusses various oversampling techniques for dealing with imbalanced data in student performance classification. It compares SMOTE, Borderline-SMOTE, SVMSMOTE, and ADASYN oversampling combined with MLP, gradient boosting, AdaBoost, and random forest classifiers. The results show that Borderline-SMOTE gave the best performance for predicting the minority (low performance) class according to several evaluation metrics. SVMSMOTE also performed well overall, particularly for recall, F1-measure, and AUC. Gradient boosting provided high and consistent precision, recall, F1-measure, and AUC across the different oversampling methods.
Prognostication of the placement of students applying machine learning algori...BIJIAM Journal
Placement is the process of connecting the selected candidate with the employer. Every student might dream of having a job offer when he or she is about to complete the course. All educational institutions aim at having their students well placed in good organizations. The reputation of any institution depends on the placement of its students. Hence, many institutions try hard to have a good placement cell. Classification using machine learning may be utilized to retrieve data from the student databases. A prediction model that can foretell the eligibility of the students based on their academic and extracurricular achievements is proposed. Related data was collected from many institutions for which the placement prediction is made. This model is compared with existing algorithms, and findings have been made regarding the accuracy of predictions. It was found that the proposed algorithm performed significantly better and yielded good results.
Educational Data Mining is used to find interesting patterns from the data taken from
educational settings to improve teaching and learning. Assessing student’s ability and performance with
EDM methods in e-learning environment for math education in school level in India has not been
identified in our literature review. Our method is a novel approach in providing quality math education
with assessments indicating the knowledge level of a student in each lesson. This paper illustrates how
Learning Curve – an EDM visualization method is used to compare rural and urban students’ progress
in learning mathematics in an e-learning environment. The experiment is conducted in two different
schools in Tamil Nadu, India. After practicing the problems, the students took the test; their
interaction data were collected and their performance was analyzed in different aspects: knowledge
component level, time taken to solve a problem, and error rate. This work studies the student actions for
identifying learning progress. The results show that the learning curve method is very helpful for
teachers in visualizing students' performance at a granular level, which is not possible manually. It also
helps students know their skill level when they complete each unit.
In this study, the effect of combining variables from the different data sources for student academic performance prediction was examined using three state-of-the-art classifiers: Decision Tree (DT), Artificial Neural Network (ANN) and Support Vector Machine (SVM). The study examined the use of heterogeneous multi-model ensemble techniques to predict student academic performance based on the combination of these classifiers and three different data sources. A quantitative approach was used to develop the various base classifier models while the ensemble models were developed using the stacked generalisation ensemble method in order to overcome the individual weaknesses of the different models. Variables were extracted from the institution’s Student Record System and Learning Management System (Moodle) and from a structured student questionnaire. At present, negligible work has been done using this integrated approach and ensemble techniques especially with aggregated learner data in performance prediction in HE. The empirical results obtained show that the ensemble models.........................
This document provides a systematic review of educational data mining (EDM) techniques and their applications. It discusses how EDM can be used to extract hidden information from large student data repositories using clustering, classification, prediction, and recommendation algorithms. These algorithms help group similar students, categorize students, predict student outcomes, and suggest courses. The document also reviews literature applying these EDM techniques and outlines future work on semantic and opinion mining to improve adaptive learning systems.
Data Mining Techniques in Higher Education an Empirical Study for the Univer...IJMER
Nowadays, one of the biggest challenges that educational institutions face is the explosive
growth of educational data and how to use these data to improve the quality of managerial decisions.
Data mining, as an analytical tool that can be used to extract meaningful knowledge from large data
sets, can be used to achieve this goal.
This paper addresses the applications of Educational Data Mining (EDM) to extract useful information
from student registration information at the University of Palestine in the Gaza Strip. The data cover a
five-year period [2005-2011], and the paper provides an analytical tool to view and use this information for decision
making processes, taking real-life examples such as students' grades and GPA.
Educational Data Mining is used to predict the future learning behavior of students. It remains an open research topic for researchers who want better prediction results. The results of these techniques help teachers, management, and administrators draft new rules and policies to improve educational standards and hence overall results and student retention. With this in mind, work has been done to find the slow learners in a high school class and then provide timely help to improve their overall results. Many data mining techniques are available, but we selected only those most often used by other researchers for result prediction: J48, REPTree, Naive Bayes, SMO, and Multilayer Perceptron. On the collected dataset, the Multilayer Perceptron classification algorithm gives 87.43% accuracy when the whole dataset is used as the training dataset, and SMO and J48 give 69.00% accuracy under 10-fold cross-validation.
UNIT-4-PPT UNIT COMMITMENT AND ECONOMIC DISPATCHSridhar191373
Statement of unit commitment problem-constraints: spinning reserve, thermal unit constraints, hydro constraints, fuel constraints and other constraints. Solution methods: priority list methods, forward dynamic programming approach. Numerical problems only in priority list method using full load average production cost. Statement of economic dispatch problem-cost of generation-incremental cost curve –co-ordination equations without loss and with loss- solution by direct method and lambda iteration method (No derivation of loss coefficients)
UNIT-1-PPT-Introduction about Power System Operation and ControlSridhar191373
Power scenario in Indian grid – National and Regional load dispatching centers –requirements of good power system - necessity of voltage and frequency regulation – real power vs frequency and reactive power vs voltage control loops - system load variation, load curves and basic concepts of load dispatching - load forecasting - Basics of speed governing mechanisms and modeling - speed load characteristics - regulation of two generators in parallel.
Bituminous binders are sticky, black substances derived from the refining of crude oil. They are used to bind and coat aggregate materials in asphalt mixes, providing cohesion and strength to the pavement.
This presentation provides a comprehensive overview of air filter testing equipment and solutions based on ISO 5011, the globally recognized standard for performance testing of air cleaning devices used in internal combustion engines and compressors.
Key content includes:
"The Enigmas of the Riemann Hypothesis" by Julio ChaiJulio Chai
In the vast tapestry of the history of mathematics, where the brightest minds have woven with threads of logical reasoning and flash-es of intuition, the Riemann Hypothesis emerges as a mystery that chal-lenges the limits of human understanding. To grasp its origin and signif-icance, it is necessary to return to the dawn of a discipline that, like an incomplete map, sought to decipher the hidden patterns in numbers. This journey, comparable to an exploration into the unknown, takes us to a time when mathematicians were just beginning to glimpse order in the apparent chaos of prime numbers.
Centuries ago, when the ancient Greeks contemplated the stars and sought answers to the deepest questions in the sky, they also turned their attention to the mysteries of numbers. Pythagoras and his followers revered numbers as if they were divine entities, bearers of a universal harmony. Among them, prime numbers stood out as the cornerstones of an infinite cathedral—indivisible and enigmatic—hiding their ar-rangement beneath a veil of apparent randomness. Yet, their importance in building the edifice of number theory was already evident.
The Middle Ages, a period in which the light of knowledge flick-ered in rhythm with the storms of history, did not significantly advance this quest. It was the Renaissance that restored lost splendor to mathe-matical thought. In this context, great thinkers like Pierre de Fermat and Leonhard Euler took up the torch, illuminating the path toward a deeper understanding of prime numbers. Fermat, with his sharp intuition and ability to find patterns where others saw disorder, and Euler, whose overflowing genius connected number theory with other branches of mathematics, were the architects of a new era of exploration. Like build-ers designing a bridge over an unknown abyss, their contributions laid the groundwork for later discoveries.
Tesia Dobrydnia brings her many talents to her career as a chemical engineer in the oil and gas industry. With the same enthusiasm she puts into her work, she engages in hobbies and activities including watching movies and television shows, reading, backpacking, and snowboarding. She is a Relief Senior Engineer for Chevron and has been employed by the company since 2007. Tesia is considered a leader in her industry and is known to for her grasp of relief design standards.
Structural Health and Factors affecting.pptxgunjalsachin
Structural Health- Factors affecting Health of Structures,
Causes of deterioration in RC structures-Permeability of concrete, capillary porosity, air voids, Micro cracks and macro cracks, corrosion of reinforcing bars, sulphate attack, alkali silica reaction
Causes of deterioration in Steel Structures: corrosion, Uniform deterioration, pitting, crevice, galvanic, laminar, Erosion, cavitations, fretting, Exfoliation, Stress, causes of defects in connection
Maintenance and inspection of structures.
Video Games and Artificial-Realities.pptxHadiBadri1
🕹️ #GameDevs, #AIteams, #DesignStudios — I’d love for you to check it out.
This is where play meets precision. Let’s break the fourth wall of slides, together.
2. 116 Computer Science & Information Technology (CS & IT)
As mentioned above, EDM is a domain that uses machine learning, data mining and statistical
techniques to analyze educational data. By employing these techniques, it is possible to
improve the learning and teaching processes involving students or instructors.
Educational data come in many different and very complex formats. The most recent survey in this
area is by Peña-Ayala (2013), which establishes the following EDM approaches [1]:
• Student behavior modeling
• Student performance modeling
• Student modeling
• Assessment
• Curriculum, domain knowledge, sequencing, and teachers support
• Student support and feedback
Another survey, by Romero and Ventura [2], covers educational data mining between 1995 and
2005. Using data mining techniques in higher education is a recent research domain, and there
is a lot of work in this area because of its potential for educational institutes.
Ayesha et al. employed the k-means clustering algorithm to predict students' learning
activities in an educational database including classroom quizzes, final and mid-term exams
and other assignments. This correlated information is conveyed to the teacher before the
final exam takes place. The study helps teachers improve the performance of students and
reduce the failure ratio by taking appropriate steps in time [3].
Baradwaj and Pal (2011) used classification as a data mining method to evaluate students'
performance, applying the decision tree technique. The aim of their research is to extract
knowledge that describes students' performance in end-of-semester quizzes. They used
students' educational data from the previous student database, including class tests,
assignment marks, attendance and seminars. The study helps identify earlier the students
who need more attention and allows the teacher to provide appropriate advising [4].
Chandra and Nandhini applied the association rule mining method to students' courses in order
to identify students' failure patterns. The aim of their research is to identify hidden
relationships between the failed courses and to suggest relevant causes of the failure, so as
to improve the performance of low-capacity students. The extracted association rules reveal
hidden patterns in students' courses which could serve as a foundation for academic planners
in making decisions and as an aid in restructuring the curriculum, with a view to improving
students' performance and reducing the failure rate [5].
Shannaq et al. used classification as a data mining technique to predict the number of
enrolled students by evaluating academic data from enrolled students, in order to study the
main attributes that may affect student loyalty (number of enrolled students) [6]. They used
the decision tree as a classification method to extract classification rules, which were then
analyzed and evaluated using different evaluation methods. This allows the university
management to prepare the necessary resources for newly enrolled students, and indicates at an
early stage which type of students will potentially be enrolled and which areas to focus on in
higher education systems for support and feedback.
Márquez-Vera et al. built a prediction model using the genetic programming (GP) method to
identify at-risk students in traditional school settings. A feature selection technique was
used to reduce the number of attributes [7].
Wolff et al. (2013) applied a decision tree as a data mining technique to identify at-risk
students in a virtual learning environment.
In this paper, a new associative classification technique is proposed to predict students'
final performance. We employ Honeybee Colony Optimization and Particle Swarm Optimization to
extract association rules for student performance prediction as a multi-objective
classification problem. Results indicate that the proposed swarm-based algorithm outperforms
well-known classification techniques on the student performance prediction problem.
The rest of this paper is organized as follows: Section 2 presents the proposed classification
method for student performance prediction, Section 3 reports the experimental results, and
Section 4 concludes the paper.
2. PROPOSED METHOD
In this section, we introduce a new multi-objective optimization approach, called Bee_RM,
based on the bee colony optimization algorithm and particle swarm optimization.
In the following, we outline the proposed approach.
Association rule extraction is a widely used data mining task, owing to the interpretability
of these rules for non-experts. The extraction of association rules is usually performed
using meta-heuristic algorithms. In this paper, we take two major factors into consideration
regarding classification: the first is accuracy and the second is interpretability.
The knowledge base used in this work is represented as a rule base, so selecting a set of
optimum rules is an important issue. In our Bee_RM approach, rule extraction is performed
using Pareto optimality, taking the multiple objectives into account.
Since there is rarely a unique solution that optimizes all objective functions, we look for a
trade-off between objectives instead of seeking a unique solution to the multi-objective
optimization problem.
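The trade-off described above can be illustrated with a small Pareto filter. This is only a sketch under stated assumptions, not the Bee_RM implementation: the function names, the two-objective tuples and the sample scores are hypothetical, and both objectives are assumed to be maximized.

```python
from typing import List, Tuple

# Hypothetical objective vector, e.g. (accuracy, interpretability); larger is better.
Objectives = Tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative scores for four candidate rules (made-up numbers).
candidates = [(0.50, 0.90), (0.53, 0.60), (0.48, 0.70), (0.53, 0.80)]
front = pareto_front(candidates)
```

Here the first and last candidates survive: each is better than the other on one objective, so neither can be discarded without giving something up.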
2.1 RULE GENERATION BY BEE_RM
In this work, we extract rules over continuous values, as only a few works perform
continuous rule extraction. The advantage of continuous rule extraction is that the whole
space is explored. However, exploring the whole space requires a lot of space, which demands
more powerful algorithms.
In the following, we show how association rules are modeled using bee colony optimization
and particle swarm optimization (PSO). Each member of the population is represented as an
array with three rows, and each association rule is created by one member.
Since rules are created for each class, we use class zero as an example.
In the first array, "A" denotes the absence and "P" the presence of the specific property in
the rule. In this approach, we do not need to perform binning, so the span is treated as
continuous. The values of the second array give the lower limit of each property, and the
third array gives the upper limit of each property. Therefore, the rule represented by these
arrays is:
If (0.2 < F1 < 0.9) and (7 < F4 < 8.5) then class = 0
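The three-row encoding can be sketched in a few lines. The array values below are hypothetical, chosen only so that the decoded rule matches the example for class zero; four features are assumed, and the bounds stored for absent ("A") features are ignored.

```python
from typing import Sequence


def decode_rule(presence: Sequence[str], lower: Sequence[float],
                upper: Sequence[float], target_class: int) -> str:
    """Turn the three rows of an individual into a readable rule string."""
    terms = [
        f"({lo:g} < F{i} < {hi:g})"
        for i, (flag, lo, hi) in enumerate(zip(presence, lower, upper), start=1)
        if flag == "P"  # "A" (absent) features do not appear in the rule
    ]
    return f"If {' and '.join(terms)} then class = {target_class}"


def matches(presence: Sequence[str], lower: Sequence[float],
            upper: Sequence[float], record: Sequence[float]) -> bool:
    """Check whether a record satisfies every interval of the present features."""
    return all(lo < x < hi
               for flag, lo, hi, x in zip(presence, lower, upper, record)
               if flag == "P")


# Hypothetical individual with four features F1..F4 (values are assumptions).
presence = ["P", "A", "A", "P"]
lower = [0.2, 0.0, 0.0, 7.0]
upper = [0.9, 0.0, 0.0, 8.5]
rule_text = decode_rule(presence, lower, upper, target_class=0)
```

Keeping the discrete row separate from the two real-valued rows is what allows a different optimizer to be used for each part.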
The first array contains discrete values, so in the ConstructSolution function we use bee
colony optimization to determine it. For the two other arrays, which represent the span, we
use PSO.
In the first fold of each category in the dataset, generation is performed "MaxGeneration"
times. Within each generation, the population size equals the value of the "Population"
parameter. In each execution of the algorithm, generation is performed for each class in the
dataset, and every member of the population produces optimized results. We then use the
optimized association rules extracted for all classes as the input of the classifier in order
to classify the test dataset.
Finally, the average accuracy obtained over the 10-fold execution is taken as the accuracy of
the Bee_RM algorithm.
2.2 HONEYBEE HIVE OPTIMIZATION (HHO)
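The "ConstructSolution" method, which optimizes the first array, creates a path for each bee
according to the Dance Table and the heuristic information, as given in Eq. (1):

$$P_k(r,s) = \begin{cases} \dfrac{[\delta(r,s)]^{\alpha}\,[\eta(r,s)]^{\beta}}{\sum_{u \in J_k(r)} [\delta(r,u)]^{\alpha}\,[\eta(r,u)]^{\beta}} & \text{if } s \in J_k(r) \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

Here $\delta(r,s)$ is the Dance Table entry and $\eta(r,s)$ is the heuristic information for
choosing $s$ at step $r$; $\alpha$ and $\beta$ weight the two terms.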
The original fitness function presented in this paper is implemented according to Eq. (2):

$$\eta(r,s) = p_1 \times \mathrm{support}(\mathrm{solution}) + p_2 \times \frac{\#\,\text{non-Don't-care}}{\#\,\text{features}} \qquad (2)$$

In Eq. (2), p1 is the importance given to the support of the produced solution, and p2 is the
importance given to the "Don't-care" term relative to the number of all features.
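As a rough illustration of Eq. (1) and Eq. (2), the sketch below computes selection probabilities and the heuristic value. It is a minimal sketch under stated assumptions: Eq. (1) is read as the usual normalized product of a Dance Table weight and a heuristic weight, the function and argument names are hypothetical, and the numbers are made up (only the exponents 2 and 1 echo the α, β setting reported in Table 2).

```python
from typing import Dict, List


def transition_probabilities(dance: Dict[str, float], heuristic: Dict[str, float],
                             allowed: List[str],
                             alpha: float = 2.0, beta: float = 1.0) -> Dict[str, float]:
    """Eq. (1): probability of picking each option s, zero outside the allowed set."""
    weights = {s: (dance[s] ** alpha) * (heuristic[s] ** beta) for s in allowed}
    total = sum(weights.values())  # assumed positive for this sketch
    return {s: (weights[s] / total if s in weights else 0.0) for s in dance}


def heuristic_value(support: float, n_non_dont_care: int, n_features: int,
                    p1: float, p2: float) -> float:
    """Eq. (2): weighted sum of rule support and the share of non-Don't-care features."""
    return p1 * support + p2 * n_non_dont_care / n_features


# Illustrative choice between marking a feature present ("P") or absent ("A").
dance = {"P": 1.0, "A": 2.0}
heur = {"P": 0.5, "A": 0.25}
probs = transition_probabilities(dance, heur, allowed=["P", "A"])

# A rule with support 0.4 that uses 2 of 4 features, with p1 = 1 and p2 = 4.
h = heuristic_value(support=0.4, n_non_dont_care=2, n_features=4, p1=1.0, p2=4.0)
```

Raising α makes the bees follow the Dance Table more closely, while raising β favors options with a high heuristic value.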
2.3 PARTICLE SWARM OPTIMIZATION (PSO)
We use the particle swarm optimization (PSO) algorithm in a continuous space and in
multi-objective form. The objective function in Eq. (3) is used to calculate the Local-best
found by each individual, which is stored inside that individual. The Global-best found in the
whole population is kept in another variable, called Gbest, in each individual. In other
words, we do not have a unique Global-best but many.
Fitness = SupportPercent × Support(solution) + (1 − SupportPercent) × Confidence(solution) (3)
All Gbest values are the best local non-dominated association rules obtained by the Eq. (4)
optimization in the current population. To calculate the location of the next move of a
particle, we use the average of these local non-dominated rules, as shown in Eq. (5) and
Eq. (6).
More general rules cover a large span of the dataset records, which reduces the
interestingness of the rules. Our objective is to make a trade-off between the interestingness
and the support value of the obtained association rules. We try to extract more detailed
association rules with high support and interestingness by defining the "Interval_p"
parameter.
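The fitness of Eq. (3) can be sketched for an interval rule as follows. This is an illustration under stated assumptions rather than the authors' code: support is taken as the fraction of all records that satisfy the rule and belong to the target class, confidence as the fraction of covered records that belong to the target class, and the helper names and toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

# (feature index, lower bound, upper bound); a hypothetical rule representation.
Interval = Tuple[int, float, float]


def covers(rule: Sequence[Interval], record: Sequence[float]) -> bool:
    """True if the record lies strictly inside every interval of the rule."""
    return all(lo < record[i] < hi for i, lo, hi in rule)


def support_confidence(rule: Sequence[Interval], target: int,
                       X: List[Sequence[float]], y: List[int]) -> Tuple[float, float]:
    """Support and confidence of 'rule -> target' on a labeled dataset."""
    covered = [label for rec, label in zip(X, y) if covers(rule, rec)]
    correct = sum(1 for label in covered if label == target)
    support = correct / len(X)
    confidence = correct / len(covered) if covered else 0.0
    return support, confidence


def fitness(support: float, confidence: float, support_percent: float = 0.5) -> float:
    """Eq. (3): convex combination of support and confidence."""
    return support_percent * support + (1 - support_percent) * confidence


# Toy data: two features, labels 0/1 (made-up values).
X = [[0.5, 8.0], [0.3, 7.5], [0.6, 9.0], [0.1, 8.0]]
y = [0, 1, 0, 0]
rule = [(0, 0.2, 0.9), (1, 7.0, 8.5)]
sup, conf = support_confidence(rule, 0, X, y)
fit = fitness(sup, conf, support_percent=0.5)
```

With SupportPercent = 0.5, the value listed in Table 2, support and confidence contribute equally; narrowing the intervals typically raises confidence at the cost of support.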
2.4 STOPPING CONDITION
Once all rules have been created by all members of the current population, the local and
global non-dominated rules are determined. The main condition for stopping the training phase
is a fixed number of repetitions: the members continue the procedure until the number of
repetitions reaches the "Maxgeneration" value. Then, the best association rules according to
Pareto optimality are selected.
3. EXPERIMENTAL RESULTS
This section shows the experimental results of the proposed method versus other classification
techniques. Our proposed method is used to analyze educational data generated on a Moodle
platform.
Moodle's log is the baseline data source used in this research. Moodle is a free virtual
learning environment (VLE) that anyone can download and install, and it is therefore an
evolving and dynamic system. An administrator is responsible for managing users (students,
teachers, etc.) and virtual course classrooms. The view of the Moodle system differs depending
on the role the user plays (teacher, student, administrator, etc.).
Moodle is developed as an open source system by programmers from all over the world. As of
2013, Moodle has over 77,000 registered sites in over 215 countries. It supports over
65 million students all over the world, taught by over 1.2 million teachers.
Moodle is only one of many support tools for virtual learning environments (VLE). There are
other similar distance-learning systems, for example ATutor, eCollege, Desire2Learn and Dokeos.
The interaction information is stored as attributes in a user (student) profile. In our data
set, 11 attributes and values are stored, with 357 records. These attributes include the
number of student-student interactions, student-teacher interactions, and so on. Table 1
gives detailed information about the attributes of the Moodle data set.
Experimentally, we have tried to set the best parameters for the proposed method. The values
of the different user-defined parameters of Bee_RM are reported in Table 2.
Table 1. Information about the features of the Moodle dataset.
Category                                     Features
Category 1: Based on agent                   ST-ST: Student-student
                                             ST-TE: Student-teacher
                                             ST-CO: Student-content
                                             ST-SY: Student-system
Category 2: Based on frequency of use        TC: Transmission of contents
                                             CI: Creating class interactions
                                             SA: Student assessment / evaluating students
Category 3: Based on participation mode      AC: Active
                                             PA: Passive
Dependent variable: Academic performance     GR: Final grade
The performance of Bee_RM is evaluated using a 10-fold cross-validation test (Michalski et al.,
1998), and all obtained results are reported in this section. The main measure used to
evaluate the proposed method is accuracy, the proportion of instances that are correctly
classified, calculated according to Eq. (7):

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (7)$$
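The evaluation measure can be sketched as below, assuming Eq. (7) is the usual ratio of correctly classified instances to all instances and that per-fold accuracies are summarized by their mean and standard deviation. The function names and all numbers are illustrative; they are not results from this study.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of instances whose predicted class equals the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def binary_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Eq. (7) written with confusion-matrix counts for the two-class case."""
    return (tp + tn) / (tp + tn + fp + fn)


def summarize_folds(fold_accuracies: List[float]) -> Tuple[float, float]:
    """Mean and (population) standard deviation over cross-validation folds."""
    return mean(fold_accuracies), pstdev(fold_accuracies)


acc = accuracy([0, 1, 2, 1], [0, 1, 1, 1])          # 3 of 4 correct
acc_counts = binary_accuracy(tp=40, tn=35, fp=15, fn=10)
avg, spread = summarize_folds([0.5, 0.6, 0.5, 0.6])  # made-up fold accuracies
```

The first form also works for more than two classes, which matters when the final grade has several levels.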
Table 2. Parameter settings of Bee_RM.
Parameter        Value
PopulationSize   30
Maxgeneration    150
DefultDancers    6
C                0.5, 0.03
P1               1, 4
SupportPercent   0.5
Interval_p       0.5
α, β             2, 1
Figs. 1 and 2 show the effect of different population sizes of the proposed metaheuristic
algorithm on accuracy and execution time, respectively. Fig. 3 shows the influence of the P2
parameter on the average length of rules.
Figure 1. Influence of the number of individuals on accuracy.
Figure 2. Influence of the number of individuals on the time taken to learn the classifier.
Figure 3. Influence of the P2 parameter on the average length of rules.
Table 3. Classification accuracy obtained with different methods for the Moodle dataset.
Method                       Accuracy (%)      Study                                     Tool
KNN                          47.29 +/- 6.05    Cover & Hart (1967)                       RapidMiner
NN                           51.68 +/- 3.83    Nsky (1954)                               RapidMiner
Bayesian                     43.71 +/- 8.26    Russell, Stuart (1995)                    RapidMiner
Rule Induction               46.63 +/- 6.55    J. Stefanowski (1998)                     RapidMiner
PART                         51.26             Witten and Frank (2005)                   WEKA
OneR                         45.93             http://www.cs.waikato.ac.nz/~ml/weka/     WEKA
JRip                         50.42             http://www.cs.waikato.ac.nz/~ml/weka/     WEKA
ZeroR                        41.73             Witten and Frank (2005)                   WEKA
IBK                          40.33             Witten and Frank (2005)                   WEKA
Logistic                     46.49             Witten and Frank (2005)                   WEKA
SimpleLogistic               51.26             Witten and Frank (2005)                   WEKA
SMO                          52.10             Witten and Frank (2005)                   WEKA
NaiveBayes                   36.13             Witten and Frank (2005)                   WEKA
ClassificationViaRegression  52.66             Witten and Frank (2005)                   WEKA
Vote                         41.73             Witten and Frank (2005)                   WEKA
Random Tree                  45.93             http://www.cs.waikato.ac.nz/~ml/weka/     WEKA
Random Forest                47.05             Witten and Frank (2005)                   WEKA
J48                          46.21             J.R. Quinlan (1993)                       WEKA
CPSO-C                       42                Liu et al. (2004)                         KEEL
SLAVEC                       51                González and Pérez (2001)                 KEEL
MPLCS-C                      47                Bacardit and Krasnogor (2009)             KEEL
C-SVM-C                      51                                                          KEEL
XCS-c                        47                Wilson (1995)                             KEEL
GFS-SP-C                     48                Sánchez et al. (2001)                     KEEL
Bee_RM                       53.46 +/- 5.46    Our study
Table 3 shows the accuracy of Bee_RM versus several recent and well-known classification
methods. To compare our results with other studies, we used three well-known data mining
tools: WEKA, RapidMiner and KEEL.
Six evolutionary rule learning algorithms are used, of which three learn fuzzy rules and three
learn crisp rules in an evolutionary way. These results reveal that our proposed method
Bee_RM, using 10-fold cross-validation, obtains the highest classification accuracy in
Table 3, 53.46%. We therefore conclude that the combination of Bee Colony Optimization and
Particle Swarm Optimization with continuous logic is effective in predicting students' final
performance in educational data.
Although there is no precise definition of interpretability for classification methods, the
number of rules (NR) and the mean length of rules (Len) are often mentioned as its two main
factors.
4. CONCLUSIONS
In this paper, we employed swarm-based techniques to extract association rules for student
performance prediction as a multi-objective classification problem. The proposed algorithm
had a low convergence time and used a small number of parameters. Honeybee Colony
Optimization and Particle Swarm Optimization were the two metaheuristics used to extract
association rules. The fitness function in both of these algorithms considers the support and
length of the association rules. Results showed that the proposed metaheuristic-based rule
discovery approach enables us to extract accurate and interpretable knowledge for student
performance prediction. Our future work will focus on using recently proposed metaheuristic
algorithms, such as Gravity Search and the Vortex Search Algorithm, instead of PSO and
Honeybee Colony. Moreover, we aim to consider other measures such as confidence, correlation
and interestingness along with support and rule length.
REFERENCES
[1] Peña-Ayala, Alejandro. "Educational data mining: A survey and a data mining-based analysis of
recent works." Expert Systems with Applications 41.4 (2014): 1432-1462.
[2] Romero, Cristobal, and Sebastian Ventura. "Educational data mining: A survey from 1995 to 2005."
Expert Systems with Applications 33.1 (2007): 135-146.
[3] Ayesha, S., Mustafa, T., Sattar, A. and Khan, I. (2010) 'Data Mining Model for Higher Education
System', European Journal of Scientific Research, vol. 43, no. 1, pp. 24-29.
[4] Baradwaj, B. and Pal, S. (2011) 'Mining Educational Data to Analyze Students' Performance',
International Journal of Advanced Computer Science and Applications, vol. 2, no. 6, pp. 63-69.
[5] Chandra, E. and Nandhini, K. (2010) 'Knowledge Mining from Student Data', European Journal of
Scientific Research, vol. 47, no. 1, pp. 156-163.
[6] Shannaq, B., Rafael, Y. and Alexandro, V. (2010) 'Student Relationship in Higher Education Using
Data Mining Techniques', Global Journal of Computer Science and Technology, vol. 10, no. 11, pp.
54-59.
[7] Marquez-Vera, C., Cano, A., Romero, C., & Ventura, S. (2013). Predicting student failure at school
using genetic programming and different data mining
[8] Adriaans, Pieter, and Dolf Zantinge (1996). Data Mining. New York: Addison Wesley.
[9] D. T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining. Wiley, 2005.