Bee jay1
INTRODUCTION
The overall success of any institution can be measured by the success rate of its current students and graduates. An institution cannot improve its educational system if it cannot track and accurately predict student performance. Academic institutions collect large amounts of data on students when they are admitted and keep collecting data throughout their time at the institution. Collecting this data is not the problem; however, using it for the benefit of the students and the institution at large can be very difficult. To improve student performance, a deep analysis of student records must be carried out. This is done using Educational Data Mining (EDM), the process of applying data mining tools and techniques to analyze data from academic institutions. Predicting student academic performance is one of the most important applications of educational data mining. It allows academic institutions to provide appropriate support for students facing difficulties. It also helps to curb the failure rate, improve the educational system, uncover hidden patterns in students’ records and behaviors, and ultimately predict students’ performance. In recent years, different techniques such as Naive Bayes, decision trees, neural networks, outlier detection, and advanced statistical techniques have been used for EDM.
Classification is a supervised learning technique used to build a model that assigns a data item to one of a set of predefined class labels. It is one of the most widely used techniques for this kind of predictive data analysis. Hence, this project uses classification to predict the future performance of enrolled students based on their grades in core courses, especially the mandatory courses. We analyze data obtained from the Computer Science department of the Federal Polytechnic Offa (a case study of student performance in HND 1 CS, 2022/2023 academic session) to predict students’ final GPA class (Pass or Fail).
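Because the class label in this project is derived from the final GPA, it may help to see how course grades could be turned into a Pass/Fail label before any classifier is trained. The sketch below is illustrative only: the grade-point scale (`GRADE_POINTS`), the credit units, and the pass mark of 2.0 (`PASS_MARK`) are hypothetical placeholders, not the polytechnic’s official grading rules.

```python
# Hypothetical 4-point scale; the polytechnic's official grade points may differ.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}
PASS_MARK = 2.0  # hypothetical GPA cut-off between Pass and Fail


def gpa(results):
    """Credit-weighted GPA from a list of (grade, credit_units) pairs."""
    total_units = sum(units for _, units in results)
    if total_units == 0:
        raise ValueError("no credit units recorded")
    points = sum(GRADE_POINTS[grade] * units for grade, units in results)
    return points / total_units


def label(results):
    """Turn a student's course results into the Pass/Fail class label."""
    return "Pass" if gpa(results) >= PASS_MARK else "Fail"


# Hypothetical results: (grade, credit units) in three mandatory courses.
student = [("A", 3), ("C", 2), ("F", 1)]
print(round(gpa(student), 2), label(student))  # 2.67 Pass
```

Labels produced this way become the target class that the decision tree later learns to predict from the individual course grades.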
With the rising rate of dropouts and failures, and the drop in the standard of education in higher institutions in Nigeria, it has become increasingly important to put measures in place to curb these patterns. Students’ records should therefore be tracked in a way that shows them their academic performance over time and points to areas that can be improved. Lecturers also need to know how effective their teaching is over time, simply by looking at students’ results. If this can be done early in the semester, even before examinations, the failure rate could be reduced considerably.
The aim of this project is to predict students’ academic performance using decision tree algorithms. The objectives are:
i. To predict students’ final GPA, classified as Pass or Fail, given their grades in all the mandatory courses.
ii. To carry out research on decision tree algorithms for prediction.
iii. To help identify the various factors that affect students’ academic performance.
iv. To implement a model that predicts academic performance using decision tree classifiers.
v. To clean the data used for testing the model, so as to remove any duplicate records.
vi. To train the model using data collected from the students, in order to increase the accuracy of its results.
vii. To compute the accuracy of the model to support better decision making.
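Objectives vi and vii (training the model and computing its accuracy) can be illustrated with a minimal sketch. It assumes a hold-out split of 70% training and 30% testing records, which the project does not specify, and it uses a one-level decision tree (a decision stump) instead of a full tree so that the example stays short. The course code `COM111`, the `Gender` attribute, and the 20 records are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent class label."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows, labels):
    """Fit a one-level decision tree (decision stump) on nominal attributes.

    For every attribute, each attribute value predicts the majority label of
    the training rows with that value; the attribute with the fewest training
    errors is kept.
    """
    best = None
    for attr in rows[0]:
        groups = {}
        for row, lab in zip(rows, labels):
            groups.setdefault(row[attr], []).append(lab)
        rule = {value: majority(labs) for value, labs in groups.items()}
        errors = sum(rule[row[attr]] != lab for row, lab in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, attr, rule)
    _, attr, rule = best
    fallback = majority(labels)  # used for values never seen in training

    def predict(row):
        return rule.get(row[attr], fallback)

    return predict


def holdout_split(rows, labels, test_fraction=0.3, seed=0):
    """Shuffle indices, then return (train_x, train_y), (test_x, test_y)."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_fraction)
    train, test = idx[n_test:], idx[:n_test]
    train_part = ([rows[i] for i in train], [labels[i] for i in train])
    test_part = ([rows[i] for i in test], [labels[i] for i in test])
    return train_part, test_part


def accuracy(model, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(model(row) == lab for row, lab in zip(rows, labels))
    return correct / len(labels)


# Hypothetical data: the COM111 grade band decides the outcome; Gender is noise.
rows = [{"Gender": "M" if i % 10 < 5 else "F",
         "COM111": "Good" if i < 10 else "Poor"} for i in range(20)]
labels = ["Pass" if i < 10 else "Fail" for i in range(20)]

(train_x, train_y), (test_x, test_y) = holdout_split(rows, labels)
model = train_stump(train_x, train_y)
print(len(train_x), len(test_x), accuracy(model, test_x, test_y))  # 14 6 1.0
```

Accuracy here is the fraction of held-out students whose Pass/Fail label the model predicts correctly. Measuring it on records the model never saw during training gives a more honest estimate than measuring it on the training data.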
This project is helpful to both students and lecturers. Lecturers will be able to identify weak students quickly, and they will also be able to find courses in which students are underperforming, which makes it easier for them to introduce more effective teaching methods. Predicting low-performing students at an early stage makes it possible to provide additional support for them. Students, on the other hand, can track their own performance over time. The knowledge produced will bring out useful, hidden information that students, lecturers, and the polytechnic management can use to take appropriate action to improve student outcomes and contribute to a better quality of education.
This project, titled “Developing a Predictive Model for Classifying Student Academic Performance Using Decision Tree Classifiers”, is limited to building a model to predict students’ performance in the Computer Science department of the Federal Polytechnic Offa (a case study of student performance in HND 1 CS, 2022/2023 academic session). The model does not automatically solve the problem of students’ poor academic performance; it only provides information that enables the Computer Science department to make the right decisions to monitor and support students and improve the quality of the program.
A limitation encountered during this project was:
i. Heavy data consumption, because Wi-Fi was not available to work with.
Data Mining: The process of finding anomalies, patterns, and correlations within large data sets to predict outcomes.
Decision tree: A decision support tool with a tree-like structure that models probable outcomes, resource costs, utilities, and possible consequences.
Educational Data Mining: An emerging discipline concerned with developing methods for exploring the unique and increasingly large-scale data from educational settings, and using those methods to better understand students and the settings in which they learn.
WEKA: The Waikato Environment for Knowledge Analysis is open-source software that provides tools for data preprocessing, implementations of several machine learning algorithms, and visualization tools, so that machine learning techniques can be developed and applied to real-world data mining problems.
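WEKA’s native dataset format is ARFF (Attribute-Relation File Format), a text format in which each nominal attribute lists its allowed values and missing values are written as `?`. The sketch below shows how cleaned survey records could be written out as ARFF for loading into WEKA. The relation name `hnd1_cs`, the attribute `COM111`, and the class attribute `result` are hypothetical, and names and values are assumed to need no quoting (no spaces, commas, or braces).

```python
def to_arff(relation, attributes, rows):
    """Serialise nominal records as ARFF text for WEKA.

    attributes: list of (name, allowed_values) pairs; the class attribute is
                placed last here by convention.
    rows:       list of value lists in attribute order; None becomes '?'.
    Assumes names and values need no quoting (no spaces, commas or braces).
    """
    lines = [f"@relation {relation}", ""]
    for name, values in attributes:
        # Nominal attribute: allowed values listed inside braces.
        lines.append(f"@attribute {name} {{{','.join(values)}}}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join("?" if v is None else v for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical records: one complete, one with a missing COM111 grade.
arff = to_arff(
    "hnd1_cs",
    [("COM111", ["A", "B", "C", "F"]), ("result", ["Pass", "Fail"])],
    [["A", "Pass"], [None, "Fail"]],
)
print(arff)
```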
CHAPTER TWO
LITERATURE REVIEW
Educational institutions aim to provide quality education and to analyze the performance of students to help them improve. The varying factors in the current education sector have led to the pursuit of effective and efficient monitoring of student performance; thus, the ability to predict students’ performance plays a vital role in providing information that helps students, lecturers, administrators, and policymakers make decisions (Joseph et al., 2017). In educational data mining, predictive modeling is usually used to predict student performance. Several tasks are used to build prediction models, namely classification, regression, and categorization. The most popular task for predicting students’ performance is classification, and several classification algorithms have been applied to it, among them Decision Tree, Artificial Neural Networks, Naive Bayes, K-Nearest Neighbor, and Support Vector Machine. In this project, we use the decision tree classifier.
A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is based on the principle of divide and conquer. A decision tree builds classification models in the form of a tree structure: it breaks a dataset down into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes.
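The divide-and-conquer process can be sketched in a few lines. This is a minimal ID3-style sketch, not the exact algorithm used in this project: it assumes nominal attributes, picks the attribute with the highest information gain at each step, splits the records on that attribute, and recurses on each subset until a subset is pure or no attributes remain. The course codes `COM111` and `MTH111` and the six records are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the records on `attr`."""
    groups = {}
    for row, lab in zip(rows, labels):
        groups.setdefault(row[attr], []).append(lab)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs):
    """Divide and conquer: return a class label (leaf) or an
    (attribute, {value: subtree}) pair (decision node)."""
    if len(set(labels)) == 1:          # pure subset: make a leaf
        return labels[0]
    if not attrs:                      # nothing left to split on: majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    branches = {}
    for value in {row[best] for row in rows}:   # one subset per attribute value
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], remaining)
    return (best, branches)


# Hypothetical grade bands in two courses and the resulting outcome.
rows = [
    {"COM111": "Good", "MTH111": "Good"},
    {"COM111": "Good", "MTH111": "Poor"},
    {"COM111": "Good", "MTH111": "Poor"},
    {"COM111": "Poor", "MTH111": "Good"},
    {"COM111": "Poor", "MTH111": "Poor"},
    {"COM111": "Poor", "MTH111": "Poor"},
]
labels = ["Pass", "Pass", "Pass", "Pass", "Fail", "Fail"]
tree = build_tree(rows, labels, ["COM111", "MTH111"])
print(tree)
```

Here COM111 is chosen for the root because it has the higher information gain; MTH111 is then tested only within the "Poor" subset, which is the incremental, subset-by-subset growth the paragraph describes.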
Data mining helps to extract relevant information from large and complex databases, and data mining techniques are useful for data analysis and prediction. Classification is a supervised learning technique that assigns data items to predefined class labels. There are various classification techniques, such as decision tree algorithms, Bayesian networks, neural networks, and genetic algorithms, and these techniques can be used to build a classification model. Such a model helps to predict future trends based on previous patterns. This project proposes a classification model, specifically a decision tree algorithm, to predict the future grades of students in their final examinations. The decision tree classifier classifies the future grades as either Pass or Fail. A number of factors may affect students’ performance; here, some significant factors have been considered while constructing the decision tree for classifying students according to their attributes (grades). The decision tree is one of the best and simplest data mining methods and is often used by researchers.
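Once a tree has been built, classifying a student simply means starting at the root, following the branch that matches the student’s grade in the course being tested, and stopping at a leaf. The sketch below uses a hypothetical tree over two hypothetical courses (`COM111`, `MTH111`). As a further assumption, it returns `Fail` when a grade has no matching branch, so that such students are flagged for review rather than silently passed.

```python
def classify(tree, student, fallback="Fail"):
    """Follow the branch matching the student's grade until a leaf is reached.

    A tree is either a class label (leaf) or an (attribute, {value: subtree})
    pair. `fallback` (a hypothetical policy) is returned when the student's
    grade has no branch, including when the grade is missing.
    """
    while isinstance(tree, tuple):
        attr, branches = tree
        value = student.get(attr)
        if value not in branches:
            return fallback
        tree = branches[value]
    return tree


# Hypothetical tree: COM111 is tested first, MTH111 only for C grades.
tree = ("COM111", {
    "A": "Pass",
    "B": "Pass",
    "C": ("MTH111", {"A": "Pass", "B": "Pass", "C": "Fail", "F": "Fail"}),
    "F": "Fail",
})

print(classify(tree, {"COM111": "C", "MTH111": "B"}))  # Pass
print(classify(tree, {"COM111": "F"}))                 # Fail
```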
In 2013, Kabakchieva studied the prediction of student performance using several data mining classification methods, such as Naive Bayes, Decision Trees, and k-Nearest Neighbor (kNN). Kabakchieva used the WEKA software to analyze the performance of students from the University of National and World Economy (UNWE). WEKA is a popular software application that contains several data analysis tools, such as tools for classification, clustering, and data visualization. The dataset used contained students’ grades in previous courses, their gender, and their age. The students’ final performance was categorized into five levels: excellent, very good, good, average, and bad. The results revealed that the factors that most influenced the classification were the students’ University Admission Score and the Number of Failures in the first-year university exams (Feng, 2019).

Rathee and Mathur applied the ID3, C4.5, and CART decision tree algorithms to educational data to predict students’ performance in examinations. All the algorithms were applied to students’ internal assessment data to predict their academic performance in the final examination, and the efficiency of the algorithms was analyzed based on their accuracy and the time taken to build the tree. The predictions obtained from the system helped the class teacher to identify weak students and improve their performance. C4.5 was the best of the three algorithms, as it provided higher accuracy and efficiency than the others (Ramesh, 2013).

Tripti Mishra et al. used different classification techniques to build performance prediction models based on students’ social integration, academic integration, and various emotional skills, factors that had not been considered before. Two algorithms, J48 (an implementation of C4.5) and Random Tree, were applied to the records of MCA students of colleges affiliated with Guru Gobind Singh Indraprastha University to predict third-semester performance. Random Tree was found to predict performance more accurately than J48.

Kotsiantis applied five classification algorithms, namely Decision Trees, Perceptron-based Learning, Bayesian Nets, Instance-Based Learning, and Rule Learning, to predict the performance of computing students in the distance learning stream of the Hellenic Open University, Greece. A total of 365 student records comprising several demographic variables, such as sex, age, and legal status, were used. In addition, a performance attribute, namely the marks in a given assignment, was used as input to a binary (pass/fail) classifier. A filter-based variable selection technique was used to select the most influential variables, and all five classification models were then constructed.

There are three main components in predicting Student’s Academic Performance (SAP): the parameters, the method, and the tool. Sembiring et al. used parameters such as interest, study behavior, engagement time, beliefs, family support, demographics, and CGPA. The methods used were Smooth Support Vector Machine (SSVM), K-Means, and Decision Tree; they used the RapidMiner tool to predict SAP and reported 93.7% accuracy. Shana and Venkatachalam used parameters such as family background and schooling information, with Naïve Bayes and Decision Tree as the methods; they used the WEKA tool to predict SAP and reported 82.4% accuracy. Arsad et al. used parameters such as educational background, CGPA, and gender, and the method they used was an Artificial Neural Network (ANN).
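The three algorithms compared by Rathee and Mathur differ mainly in how they score a candidate split: ID3 uses information gain, C4.5 uses the gain ratio (information gain divided by the split information), and CART uses the decrease in Gini impurity and builds binary splits. The sketch below computes all three scores for one split of Pass/Fail labels. It is a simplified illustration of the criteria only, not a full implementation of any of the algorithms, and the labels are hypothetical.

```python
import math
from collections import Counter


def proportions(labels):
    """Class proportions in a list of labels."""
    n = len(labels)
    return [c / n for c in Counter(labels).values()]


def entropy(labels):
    return -sum(p * math.log2(p) for p in proportions(labels))


def gini(labels):
    return 1 - sum(p * p for p in proportions(labels))


def split_scores(parent, children):
    """Score one split of the `parent` labels into `children` label lists."""
    n = len(parent)
    weights = [len(child) / n for child in children]
    info_gain = entropy(parent) - sum(w * entropy(c) for w, c in zip(weights, children))
    split_info = -sum(w * math.log2(w) for w in weights if w > 0)
    return {
        "information gain (ID3)": info_gain,
        "gain ratio (C4.5)": info_gain / split_info if split_info > 0 else 0.0,
        "Gini decrease (CART)": gini(parent) - sum(w * gini(c) for w, c in zip(weights, children)),
    }


parent = ["Pass"] * 4 + ["Fail"] * 4
# Split on a two-valued course grade that separates the classes perfectly.
by_grade = split_scores(parent, [["Pass"] * 4, ["Fail"] * 4])
# Split on a unique identifier (e.g. a matriculation number): eight singletons.
by_id = split_scores(parent, [[lab] for lab in parent])
print(by_grade)
print(by_id)
```

Both splits give pure groups and the same information gain of 1.0, but the identifier split’s gain ratio drops to 1/3. This penalty on many-valued attributes is the main reason C4.5 uses the gain ratio; CART avoids the same problem differently, by restricting itself to binary splits.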
CHAPTER THREE
METHODOLOGY
Data collection methods are an important aspect of any research project. Using the correct data collection method ensures that quality data is collected consistently. In this project, non-probability sampling, specifically convenience sampling, was used to collect student data. Non-probability sampling involves non-random selection based on convenience or other criteria, which simplifies data collection; a convenience sample includes only those people who are most accessible to the researcher. The sampling frame is the group of individuals to be sampled and, ideally, it should encompass the entire target audience (excluding those who do not belong to it). Here, the sampling frame consists of HND 1 students, because they are the target audience for this prediction.
In this project, two methods were used to source data: surveys and the internet/journals.
Surveys: In a survey, the researcher asks respondents a set of questions, for example by phone or in person, and records their responses. In this research, an online questionnaire was used. Online surveys are a popular choice for students doing dissertation research because of their low cost and flexibility, and there are many online tools for constructing them, such as SurveyMonkey and Google Forms. Google Forms was used for this questionnaire. An online questionnaire was chosen for this research project for the following reasons:
i. A large sample can be reached quickly, without constraints on time or location.
ii. The data is easy to process and analyze.
The survey consisted of 20 multiple-choice questions. The aim was to survey HND 1 students based on their academic session in BIU, so that predictions could be made for the subsequent semester. Participants were given 5 minutes to fill in the survey anonymously. A total of 54 students responded, but not all of them completed it fully, which made it necessary to preprocess the collected data.
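The preprocessing step can be sketched as follows. This minimal example assumes each response is a dictionary mapping a question to its answer, with an empty string or `None` for an unanswered question; it drops responses with any unanswered required question and then removes exact duplicates. The question keys `Q1` and `Q2` are hypothetical. Because the survey was anonymous, two identical rows could come from two different students, so in practice a form timestamp or other response identifier, if available, would be a safer basis for detecting duplicates.

```python
def clean_responses(responses, required):
    """Drop incomplete responses, then exact duplicate responses.

    responses: list of dicts mapping question -> answer (hashable values)
    required:  questions that must be answered ("" or None counts as blank)
    """
    seen = set()
    cleaned = []
    for response in responses:
        if any(response.get(q) in (None, "") for q in required):
            continue                     # incomplete: skip
        key = tuple(sorted(response.items()))
        if key in seen:
            continue                     # identical to an earlier response
        seen.add(key)
        cleaned.append(response)
    return cleaned


# Hypothetical survey export with one blank answer and one repeated row.
responses = [
    {"Q1": "A", "Q2": "B"},
    {"Q1": "A", "Q2": ""},
    {"Q1": "A", "Q2": "B"},
    {"Q1": "C", "Q2": "D"},
]
print(clean_responses(responses, ["Q1", "Q2"]))
# [{'Q1': 'A', 'Q2': 'B'}, {'Q1': 'C', 'Q2': 'D'}]
```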
Internet/Journals: The internet and journals were consulted throughout this research, although the relevant information available was limited.
Fig 3.1: Architectural view of the proposed system.
Figure 3.1 gives a high-level view of the proposed system, showing its main components, the services they provide, and how they communicate. The steps involved in the proposed system are:
i. The model takes the input attributes, known as predictors, for each student.
ii. A decision tree algorithm performs classification on the collected dataset, producing a categorical result, i.e. Pass or Fail.
iii. The results are visualized as a graphical tree-like structure with different nodes.
iv. The accuracy of the model is computed.
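The results are visualized as a graphical tree; where a graphical view is not available, a plain indented text view conveys the same structure. The sketch below prints a tree represented as nested (attribute, branches) pairs with class labels at the leaves. This representation and the course codes `COM111` and `MTH111` are assumptions for illustration.

```python
def render(tree, depth=0):
    """Return indented text lines for a tree of (attribute, {value: subtree})
    pairs, where any non-tuple subtree is a class label (leaf)."""
    if not isinstance(tree, tuple):
        return [str(tree)]               # the whole tree is a single leaf
    attr, branches = tree
    lines = []
    for value in sorted(branches):
        subtree = branches[value]
        line = "|   " * depth + f"{attr} = {value}"
        if isinstance(subtree, tuple):   # decision node: recurse one level deeper
            lines.append(line)
            lines.extend(render(subtree, depth + 1))
        else:                            # leaf: show the predicted class inline
            lines.append(f"{line}: {subtree}")
    return lines


# Hypothetical tree over two hypothetical courses.
tree = ("COM111", {"Good": "Pass",
                   "Poor": ("MTH111", {"Good": "Pass", "Poor": "Fail"})})
print("\n".join(render(tree)))
# COM111 = Good: Pass
# COM111 = Poor
# |   MTH111 = Good: Pass
# |   MTH111 = Poor: Fail
```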
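[Form design: Academic Performance registration page with fields Name, Mat. No./Staff ID, Gender, Department, Passport, Username and Password, and a Submit button]
[Form design: Login page with fields Username and Password, and a Login button]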
[Form design: Cyber Security Management System page with fields Username, Name and Message, and a Submit button]
Field      Type
GENDER     Text
PASSPORT   Image
USERNAME   Text
PASSWORD   Text