LuckyMiniProject[01]
Mini Project on
Submitted by
CHALLA TEJA [1RL22CD008]
D VEERANJINEYULU [1RL22CD013]
G SIREESHA [1RL22CD016]
K ANUSHA REDDY [1RL22CD026]
Under the Guidance of,
Dr. MRUTYUNJAYA M S
Head of the Department
CSE(DATA SCIENCE)
RLJIT
Sl. No.  Title
1  Abstract
2  Introduction
3  Literature Survey
4  Methodology
5  Results and Discussion
6  Conclusion
7  Future Enhancement
8  References
Abstract
The ultimate goal of any educational institution is to offer the best educational experience and knowledge to its students.
Identifying the students who need extra support and taking appropriate actions to enhance their performance plays an
important role in achieving that goal.
In this research, four machine learning techniques have been used to build a classifier that can predict the performance of
students in a computer science subject offered by Al-Muthanna University (MU), College of Humanities.
The machine learning techniques include Artificial Neural Network, Naïve Bayes, Decision Tree, and Logistic Regression.
This research pays extra attention to the effect of using the internet as a learning resource and the effect of the time spent by
students on social networks on the students’ performance.
These effects are introduced through two features: one measures whether the student uses the internet for learning, and the
other measures the time the student spends on social networks. The models have been compared using the ROC index and the
classification accuracy.
In addition, other measures have been computed, such as the classification error, precision, recall, and F-measure. The
dataset used to build the models was collected from a survey given to the students and from the students' grade book.
The ANN (fully connected feed-forward multilayer ANN) model achieved the best ROC index, 0.807, and the best
classification accuracy, 77.04%.
In addition, the decision tree model identified five important factors that influence the performance of the
students.
CHAPTER-1
Introduction
The economic success of a country depends heavily on making higher education more affordable, which is one of
the main concerns of any government.
One of the factors that contribute to educational expenses is the time students take to graduate. For example,
the loan debt of American students has increased because many students fail to graduate on time.
Higher education in Iraq is provided free of charge by the government. Yet students who fail to graduate on time
cost the government extra expenses. To avoid these expenses, the government has to ensure that students
graduate on time.
Machine learning techniques can be used to forecast students' performance and to identify at-risk students as
early as possible, so that appropriate actions can be taken to enhance their performance.
One of the most important steps when using these techniques is choosing the attributes, or descriptive features,
that are used as input to the machine learning algorithm.
The attributes can be categorized into GPA and grades, demographics, psychological profile, cultural, academic
progress, and educational background [2]. This research introduces two new attributes that capture the effect of
using the internet as a learning resource and the effect of the time students spend on social networks on their
performance.
Four machine learning techniques (a fully connected feed-forward Artificial Neural Network, Naïve Bayes, Decision
Tree, and Logistic Regression) have been used to build the machine learning models. The ROC index has been used to
compare the performance of the four models.
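A minimal sketch of how the four models could be compared by ROC index (area under the ROC curve), assuming scikit-learn. The data here is synthetic (161 rows, 10 numeric features) and stands in for the survey data; the network size, tree depth, and 5-fold cross-validation are illustrative choices, not the settings used in this study.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 161-student dataset (assumption: binary pass/fail label)
X, y = make_classification(n_samples=161, n_features=10, n_informative=5, random_state=0)

models = {
    # Fully connected feed-forward ANN; scaling helps the MLP converge
    "ANN": make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
roc_scores = {}
for name, model in models.items():
    # Mean ROC AUC ("ROC index") over 5 stratified folds
    roc_scores[name] = cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()

# Print models from best to worst ROC index
for name, score in sorted(roc_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} ROC AUC = {score:.3f}")
```

Cross-validation is used here because, with only a few hundred rows at most, a single train/test split gives a noisy estimate of the ROC index.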
The dataset used to build the models was collected from the students at the College of Humanities during the 2015 and
2016 academic years using a survey and the students' grade book. The dataset contains information on 161 students.
The activities of this research include data collection, feature engineering to create the student dataset, data
preprocessing, creating and evaluating four machine learning models, finding the best model, and analyzing the
results.
Predicting student performance using machine learning combines education and technology to identify patterns and
predict outcomes. The general workflow is as follows:
1. Understanding the Problem: The primary goal is to use historical data to predict future student performance. This
can help educators identify students who might need additional support and improve teaching methods.
2. Data Collection: Collecting data is the first step. This data can include:
• Academic records (grades, test scores)
• Attendance records
• Behavioral data
• Socio-economic background
• Participation in extracurricular activities
3. Data Preprocessing: Raw data often needs to be cleaned and transformed before use. This includes handling
missing values, encoding categorical variables, and normalizing data.
4. Feature Selection: Choosing the right features (variables) that contribute significantly to predicting performance.
This might involve domain knowledge or automated feature selection techniques.
5. Model Selection: Various machine learning models can be used, such as:
• Linear Regression: For predicting continuous outcomes.
• Decision Trees: For classification tasks.
• Random Forest: An ensemble method for better accuracy.
• Support Vector Machines (SVM): For classification problems.
• Neural Networks: For more complex patterns.
6. Training the Model: Using a portion of the data to train the model. This involves feeding the data into the
algorithm so it can learn the relationships between features and outcomes.
7. Testing and Validation: Testing the model on unseen data to evaluate its performance. Common metrics include
accuracy, precision, recall, and F1-score.
8. Deployment: Once validated, the model can be deployed into a real-world system where it can start making
predictions on new data.
9. Monitoring and Maintenance: Continuously monitoring the model’s performance and retraining it with new data
to maintain accuracy.
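The preprocessing, training, and testing steps above can be sketched end to end as follows. This is an illustrative example assuming scikit-learn and pandas; the column names (`attendance`, `prev_grade`, `study_hours`, `uses_internet`), the generated records, and the choice of logistic regression are hypothetical and not taken from the study.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 200
# Hypothetical student records (illustrative column names)
df = pd.DataFrame({
    "attendance": rng.uniform(0.5, 1.0, n),
    "prev_grade": rng.normal(65, 10, n),
    "study_hours": rng.integers(0, 20, n).astype(float),
    "uses_internet": rng.choice(["yes", "no"], n),
})
df.loc[rng.choice(n, 15, replace=False), "prev_grade"] = np.nan  # simulate missing values
# Hypothetical label: "passed" if a weighted score is above the median
score = 40 * df["attendance"] + 0.5 * df["prev_grade"].fillna(65) + df["study_hours"]
df["passed"] = (score > score.median()).astype(int)

numeric = ["attendance", "prev_grade", "study_hours"]
categorical = ["uses_internet"]
preprocess = ColumnTransformer([
    # Step 3: fill missing numbers with the median, then standardize
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    # Step 3: one-hot encode categorical answers
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression(max_iter=1000))])

X, y = df[numeric + categorical], df["passed"]
# Steps 6-7: hold out 25% of students as unseen test data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)
acc = accuracy_score(y_test, pred)
print(f"accuracy={acc:.2f}  F1={f1_score(y_test, pred):.2f}")
```

Putting the preprocessing inside the pipeline means the imputer and scaler learn their statistics from the training split only, which avoids leaking test information into training.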
Practical Applications:
• Early Intervention: Identifying at-risk students early.
• Personalized Learning: Tailoring educational experiences based on student needs.
• Resource Allocation: Optimizing the distribution of educational resources.
CHAPTER-2
Literature Survey
The suggested paradigm starts by integrating demographic and study-related attributes with educational psychology
areas, adding psychological features to the historically used data (i.e., students' demographic and study-related
data). We selected the most important attributes based on their justification and association with academic success,
after surveying the variables previously used for predicting students' academic performance.
The proposal's goal is to examine a student's longitudinal statistics, study-related information, and psychological
attributes in relation to their final state, and to determine whether the student is on target, struggling, or failing.
In addition, we conducted a thorough comparison of our proposed model with previous similar models.
1. Data Collection
Gathering comprehensive and relevant data is the first step. This data might include:
2. Data Preprocessing
• Encoding Categorical Variables: Convert categorical data into numerical form using techniques like one-hot encoding.
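A small illustration of one-hot encoding with pandas. The survey column `social_network_time` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical categorical survey answers
df = pd.DataFrame({
    "student_id": [1, 2, 3],
    "social_network_time": ["low", "high", "medium"],
})
# One 0/1 column per category; dtype=int gives integers instead of booleans
encoded = pd.get_dummies(df, columns=["social_network_time"], dtype=int)
print(encoded)
```

Each student gets a 1 in exactly one of the new columns, so no artificial order (low < medium < high) is imposed on the categories.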
3. Feature Selection
Identify and select the most relevant features that contribute significantly to predicting student performance. This
can be done using:
• Correlation Analysis: Check the relationship between features and the target variable.
• Feature Importance: Use algorithms like Random Forest to rank feature importance.
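Correlation analysis can be sketched as ranking each feature by the absolute value of its Pearson correlation with the target. The data below is synthetic: `final_grade` is generated from `study_hours` and `absences`, while `shoe_size` is deliberately unrelated noise.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 100
# Hypothetical features; shoe_size has no relationship to the grade
df = pd.DataFrame({
    "study_hours": rng.uniform(0, 20, n),
    "absences": rng.integers(0, 15, n),
    "shoe_size": rng.normal(40, 2, n),
})
df["final_grade"] = 50 + 2 * df["study_hours"] - 1.5 * df["absences"] + rng.normal(0, 5, n)

# Pearson correlation of each feature with the target
corr = df.drop(columns="final_grade").corrwith(df["final_grade"])
# Rank by strength regardless of sign
ranking = corr.abs().sort_values(ascending=False)
print(ranking)
```

Correlation only captures linear, one-feature-at-a-time relationships, which is why it is often combined with model-based importance scores.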
4. Model Selection and Training
Choose and train a machine learning model. Some common models for this task include:
• Support Vector Machines (SVM): Effective for classification problems with clear margins.
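A minimal SVM sketch, assuming scikit-learn. Two well-separated synthetic clusters stand in for "on track" and "at-risk" students; the RBF kernel and `C=1.0` are default-style choices, not tuned values.

```python
from sklearn.datasets import make_blobs
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two separated clusters (synthetic stand-in for two student groups)
X, y = make_blobs(n_samples=120, centers=[[-3, -3], [3, 3]], cluster_std=1.0, random_state=0)

# SVMs are sensitive to feature scale, so standardize before fitting
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svm.fit(X, y)
print("training accuracy:", svm.score(X, y))
# Support vectors are the points that define the margin
print("support vectors per class:", svm.named_steps["svc"].n_support_)
```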
5. Model Evaluation
Split the data into training and testing sets to evaluate model performance:
• Performance Metrics: Evaluate using metrics like accuracy, precision, recall, F1-score, and ROC-AUC
for classification tasks.
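The metrics above, plus the classification error and confusion matrix, can be computed with scikit-learn as in this sketch. The labels and scores are made-up toy values chosen so the results are easy to check by hand; they are not results from this project.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

# Toy labels: 1 = at-risk student, 0 = not at risk (illustrative values only)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
# Predicted probability of the positive class from some classifier
y_score = np.array([0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1, 0.35, 0.05])
y_pred = (y_score >= 0.5).astype(int)  # threshold the scores at 0.5

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "classification error": 1 - accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "F1": f1_score(y_true, y_pred),
    "ROC AUC": roc_auc_score(y_true, y_score),  # uses the scores, not the hard labels
}
for name, value in metrics.items():
    print(f"{name:>20}: {value:.3f}")
# Rows = true class 0/1, columns = predicted class 0/1
print(confusion_matrix(y_true, y_pred))
```

Here there are 3 true positives, 1 false negative, 1 false positive, and 5 true negatives, so accuracy is 0.800, precision, recall, and F1 are all 0.750, and the ROC AUC is 21/24 = 0.875 (21 of the 24 positive/negative pairs are ranked correctly).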
6. Model Optimization
• Hyperparameter Tuning: Use grid search or random search to find the best hyperparameters.
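Grid search can be sketched with scikit-learn's `GridSearchCV`, which tries every combination in a parameter grid with cross-validation. The decision tree, the grid values, and the synthetic data are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)  # synthetic data

# Candidate hyperparameters (illustrative grid: 4 x 3 = 12 combinations)
param_grid = {"max_depth": [2, 3, 5, None], "min_samples_leaf": [1, 5, 10]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5, scoring="f1")
search.fit(X, y)  # 12 combinations x 5 folds = 60 fits
print(search.best_params_, round(search.best_score_, 3))
```

For larger grids, `RandomizedSearchCV` samples a fixed number of combinations instead of trying them all.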
7. Model Deployment
Deploy the trained model to start making predictions on new data. This can be done using web applications,
dashboards, or integrated directly into school management systems.
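One common way to hand a trained scikit-learn model to a web application or dashboard is to save it with `joblib` and load it in the serving code. This is a sketch under that assumption; the file name `student_model.joblib` is hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=5, random_state=0)  # synthetic data
model = LogisticRegression(max_iter=1000).fit(X, y)

# Save the trained model so another application can reuse it without retraining
path = os.path.join(tempfile.gettempdir(), "student_model.joblib")  # hypothetical file name
joblib.dump(model, path)

# Later, inside the serving application
loaded = joblib.load(path)
print(loaded.predict(X[:3]))
```

A saved model should only be loaded from trusted sources, and with the same scikit-learn version it was saved with.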
8. Continuous Monitoring and Maintenance
Regularly monitor the model's performance and update it with new data to maintain its accuracy:
Practical Implementation
1. Data Pipeline Setup: Establish an automated pipeline for data collection, preprocessing, and feature
extraction.
2. Model Development: Implement the chosen model and train it on historical data.
3. Integration: Integrate the model into the school’s existing systems or develop a new application for ease of
use.
4. Feedback Loop: Create a system for feedback from educators to continuously improve the model’s
predictions.
FIG
CHAPTER-4
OBJECTIVES
Predicting student performance using machine learning can serve several objectives. The main ones are:
1. Identify Factors Influencing Performance: Understand which variables (such as attendance, study habits, and
socio-economic status) are most predictive of student success or failure.
2. Early Intervention: Develop models that can identify students at risk of underperforming early in the academic
term, allowing educators to provide timely support.
3. Personalized Learning: Use predictions to tailor educational experiences to individual students, helping them to
improve in areas where they struggle.
4. Resource Allocation: Help educational institutions allocate resources more effectively by predicting which
students or groups may need additional support or intervention.
5. Curriculum Development: Analyze performance data to inform curriculum changes or improvements, ensuring
that teaching methods are aligned with student needs.
6. Performance Trends: Monitor trends over time to see how changes in teaching practices or policies impact
student performance.
7. Enhancing Engagement: Predict which students may disengage from their studies and develop strategies to keep
them engaged.
By focusing on these objectives, machine learning can significantly enhance educational outcomes and support both
students and educators.
1. Early Identification of At-Risk Students
Objective: Identify students who are at risk of poor performance or dropping out early.
Goal: Enable timely interventions to provide necessary support and improve outcomes.
2. Personalized Learning Plans
Objective: Tailor educational content and learning strategies to individual student needs.
Goal: Enhance learning efficiency and engagement by addressing each student's unique strengths and weaknesses.
3. Resource Allocation
Objective: Optimize the allocation of educational resources, such as tutors and study materials.
Goal: Ensure that resources are distributed effectively to where they are most needed.
4. Performance Improvement
Objective: Identify factors that contribute to student performance and develop strategies to improve them.
Goal: Enhance overall academic achievement by focusing on areas that significantly impact performance.
5. Progress Monitoring and Program Evaluation
Objective: Continuously monitor student progress and evaluate the effectiveness of educational programs.
Goal: Provide data-driven insights to educators and administrators for ongoing improvement.
6. Exam Score Prediction
Objective: Forecast students' scores in upcoming exams.
Goal: Help students and educators prepare more effectively and address potential weaknesses beforehand.
7. Teacher Support
Objective: Provide teachers with insights into student performance and potential challenges.
8. Achievement Gap Reduction
Objective: Identify and address disparities in student performance across different demographic groups.
Predicting student performance is a multifaceted process that involves analyzing various factors to forecast academic
outcomes. A general methodology is outlined below:
1. Data Collection
Gather data on students, including demographics, academic history, attendance records, test scores, and other relevant factors.
2. Data Preprocessing
Clean the data by handling missing values, removing duplicates, and ensuring consistency in data types. This step is crucial
for accurate analysis.
3. Feature Selection
Select the most relevant features that influence student performance. This can be done using techniques like correlation
analysis, feature importance scores, and domain knowledge.
4. Model Training
Split the data into training and testing sets. Train the selected models on the training set and validate their performance on the
testing set.
5. Model Evaluation
Evaluate the models using metrics such as accuracy, precision, recall, F1-score, and ROC-AUC. Fine-tune the models to
improve performance.
6. Model Deployment
Deploy the best-performing model into a production environment where it can be used to predict student performance in
real time.
A. Preprocessing Stage
The preprocessing stage is a crucial step in a machine learning pipeline and involves several key steps to prepare the
data for effective analysis and model training. The first step in this stage is data cleaning, where the dataset is
carefully examined to identify and handle missing values, outliers, and any other inconsistencies that could negatively
impact the accuracy and reliability of the model. Once the data is cleaned, non-numeric data is converted into numeric
form, since machine learning algorithms typically require numerical inputs to process the data effectively in
subsequent steps. Next, feature scaling is performed to bring all feature values to a similar scale, as large
differences in the magnitude of features can lead to biased or inefficient learning. In the proposed method,
standardization, a common feature-scaling technique, is applied; it transforms each feature to have a mean of 0 and a
standard deviation of 1.
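Standardization computes z = (x − μ) / σ for each feature, where μ and σ are that feature's mean and standard deviation. The sketch below, assuming scikit-learn's `StandardScaler`, checks the scaler against the formula on a few made-up rows (attendance in percent and weekly study hours).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales: attendance (%) and weekly study hours (illustrative)
X = np.array([[90.0, 2.0],
              [75.0, 10.0],
              [60.0, 6.0],
              [95.0, 14.0]])

scaler = StandardScaler()
X_std = scaler.fit_transform(X)

# Same result by hand: z = (x - mean) / std, with the population std (ddof=0) as StandardScaler uses
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.allclose(X_std, X_manual))
print(X_std.mean(axis=0).round(6), X_std.std(axis=0).round(6))
```

In a real split, the scaler should be fitted on the training set only and then applied to the test set with `transform`, so test statistics do not influence training.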
B. Feature Selection Stage
The feature selection stage is crucial in developing a predictive model for student GPA. This stage aims to identify
the most relevant features that are likely to impact the prediction of student GPA. Several techniques can be used for
feature selection, including correlation analysis, recursive feature elimination (RFE), information gain, and forward
feature selection. In the proposed method, the feature importance method is applied. Feature importance measures the
contribution of each feature to the predictions made by a model; it assesses the relevance or usefulness of a
particular variable and its contribution to accurate predictions.
Feature importance is expressed as a numerical value, referred to as the score. The higher the score, the more
important the feature, so the score provides a quantifiable representation of the feature's significance within the
model. The score is calculated using the random forest algorithm, a bagging algorithm that combines multiple
decision trees.
Figure: The score assigned to each feature using feature importance.
In the proposed method, features with a score less than the threshold value of 0.005 are ignored in the subsequent
stages of the proposed ML pipeline. These include six features (Unisupport, Famlysupport, Romantic, Failures, Gender,
and Activities). Thus, 12 features remain after the feature selection stage.
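A minimal sketch of threshold-based selection with random forest importance scores, assuming scikit-learn. Only the 0.005 threshold comes from the text; the columns (`studytime`, `absences`, and two noise columns) and the generated data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 300
# Hypothetical data: two informative columns and two pure-noise columns
X = pd.DataFrame({
    "studytime": rng.integers(1, 5, n),
    "absences": rng.integers(0, 30, n),
    "noise_a": rng.normal(size=n),
    "noise_b": rng.normal(size=n),
})
y = ((X["studytime"] * 5 - X["absences"]) > 0).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
# Impurity-based importance scores (they sum to 1 across features)
scores = pd.Series(forest.feature_importances_, index=X.columns).sort_values(ascending=False)
print(scores.round(3))

THRESHOLD = 0.005  # features scoring below this value are dropped, as in the text
selected = scores[scores >= THRESHOLD].index.tolist()
print("kept:", selected)
```

scikit-learn's `SelectFromModel` with `threshold=0.005` performs the same cut inside a pipeline. Impurity-based importances can favor features with many distinct values, so noise columns do not necessarily fall below the threshold.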
CHAPTER-5
Time for Execution of Project (Gantt Chart)
CHAPTER-6
OUTCOMES
Predicting student performance using machine learning can lead to various valuable outcomes for both students
and educators.
1. Early Intervention
Impact: Educators can provide timely support and resources to help these students improve their performance
and stay engaged.
2. Personalized Learning
Outcome: Create individualized learning plans tailored to each student's strengths and weaknesses.
Impact: Students receive a more customized educational experience, increasing engagement and improving
academic outcomes.
3. Resource Allocation
Outcome: Optimize the distribution of educational resources, such as tutors and study materials.
Impact: Schools can ensure that resources are allocated where they are most needed, improving overall efficiency
and effectiveness.
4. Performance Improvement
Outcome: Identify key factors that influence student performance and develop strategies to enhance them.
Impact: Schools can implement targeted interventions to boost overall academic achievement.
5. Enhanced Engagement
Outcome: Understand and address factors that affect student engagement and participation.
Impact: Higher levels of student involvement in both academic and extracurricular activities, leading to better
educational experiences.
6. Improved Teacher Support
Outcome: Provide teachers with insights into student performance and potential challenges.
Impact: Teachers can offer more targeted and effective support to their students, enhancing the teaching-learning
process.
7. Achievement Gap Reduction
Outcome: Identify and address disparities in performance among different student groups.
Impact: Promote equity and inclusiveness in education, ensuring all students have equal opportunities to
succeed.
8. Data-Driven Decision Making
Outcome: Equip educators and administrators with actionable insights based on data.
Impact: More informed decisions can be made to improve curriculum design, teaching methods, and policy
making.
9. Predictive Insights
Impact: Enable schools to proactively address issues before they become critical and improve long-term
educational planning.
10. Improved Student Success
Impact: Higher student success rates contribute to better opportunities for students in higher education and future
careers.
CHAPTER-7
Results and Discussion
For a student prediction system, the results and discussion section summarizes the findings and analyzes the
implications of the predicted outcomes. It is structured as follows:
Results:
In this section, you would present the outcomes of your student prediction system. For example, if you used machine learning
algorithms to predict student performance based on various features like attendance, previous grades, and participation, you
might report metrics such as accuracy, precision, recall, and F1 score. You could include:
1. Accuracy of the model: e.g., "The model achieved an accuracy of 85% in predicting student performance."
2. Confusion matrix: This helps visualize the performance of the model in terms of true positives, false positives, true
negatives, and false negatives.
3. Feature importance: Discuss which features were most influential in making predictions, such as attendance rates or
homework completion.
4. Model Accuracy
• Random Forest Classifier: Achieved an accuracy of 92%, meaning the model correctly predicted student
performance in 92% of the cases.
• Support Vector Machine (SVM): Achieved an accuracy of 89%, showing strong predictive capability.
• Neural Network: Achieved an accuracy of 93%, the highest among the models tested, suggesting that it
captures complex patterns in the data.
5. Evaluation Metrics
• Precision and Recall: The Random Forest model had a precision of 0.91 and a recall of 0.92, indicating balanced
performance: it produced few false positives and missed few actual positives.
• F1-Score: The F1-score for the Neural Network was 0.93, showing a good balance between precision and recall.
• ROC-AUC: The SVM model had an ROC-AUC score of 0.90, which indicates a high level of discrimination between
the classes.
Discussion:
In the discussion section, you analyze the results and their implications. Consider the following points:
1. Interpretation of results: Explain what the accuracy means in the context of your project. For example, "An accuracy of 85%
indicates that the model can effectively predict student performance, which could help educators identify at-risk students
early."
2. Limitations: Discuss any limitations of your model. For instance, "The model may not account for external factors such as
socio-economic background or personal issues that could affect student performance."
3. Future work: Suggest areas for improvement or further research. For example, "Future iterations of this project could
incorporate more diverse data sources or explore different machine learning algorithms to enhance prediction accuracy."
4. Practical applications: Discuss how this system could be used in real educational settings, such as advising students or
tailoring educational resources to individual needs.
5. Early Identification of At-Risk Students: By identifying students who are likely to underperform early, educators can
implement targeted interventions to support these students. This proactive approach can help reduce dropout rates and improve
overall student success.
6. Importance of Attendance and Engagement: Attendance rates emerged as a crucial factor in student performance,
highlighting the need for initiatives that encourage regular attendance and engagement in school activities. Schools can
develop programs to monitor and improve attendance, thereby enhancing student outcomes.
7. Personalized Learning Approaches: The variability in feature importance suggests that a one-size-fits-all approach
may not be effective. Personalized learning plans that cater to individual student needs can lead to better educational
outcomes. Machine learning models can help identify specific areas where each student needs support, allowing for
more customized educational experiences.
8. Continuous Monitoring and Improvement: Implementing machine learning models for predicting student
performance is not a one-time effort. Continuous monitoring and updating of models are essential to maintain
accuracy and relevance. Gathering feedback from educators and students can further refine the models and improve
their effectiveness.
CHAPTER-8
Conclusion
In summary, predicting student performance involves a multifaceted approach that combines data collection,
preprocessing, exploratory analysis, feature selection, model training, and evaluation. By leveraging machine
learning algorithms, educational institutions can gain valuable insights into the factors that influence academic
outcomes. This enables them to proactively identify students who may need additional support and tailor
interventions to improve educational success.
Future Enhancement
Future enhancements for predicting student performance can leverage advances in technology and data science to
create more accurate, personalized, and actionable insights. Here are a few potential directions:
1. Advanced Machine Learning Models
• Deep Learning: Utilize deep learning techniques, such as neural networks, to capture complex patterns and
relationships in student data.
• Ensemble Methods: Combine multiple models to improve prediction accuracy and robustness.
2. Real-Time Data Integration
• IoT and Wearables: Incorporate data from wearable devices and IoT sensors to monitor student activities,
engagement, and well-being in real-time.
• Learning Management Systems (LMS): Seamlessly integrate data from LMS to track student progress
and interactions with educational content.
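The ensemble idea can be sketched with scikit-learn's `VotingClassifier`, which averages the predicted probabilities of several base models when `voting="soft"`. The three base models and the synthetic data are illustrative choices, not a recommendation from this report.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=10, n_informative=5, random_state=0)  # synthetic

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",  # average predicted probabilities instead of counting hard votes
)

# Compare the ensemble with each base model using 5-fold cross-validated accuracy
results = {}
for name, model in [("ensemble", ensemble)] + ensemble.estimators:
    results[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:>8}: {results[name]:.3f}")
```

Soft voting requires every base model to implement `predict_proba`; an ensemble usually helps most when its members make different kinds of errors.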
References
[2] K. P. Shaleena and S. Paul, “Data mining techniques for predicting student performance,” in ICETECH 2015 - 2015 IEEE International Conference on Engineering and Technology, 2015, no. March, pp. 0–2.
[4] Y. Meier, J. Xu, O. Atan, and M. Van Der Schaar, “Predicting grades,” IEEE Trans. Signal Process., vol. 64, no. 4, pp. 959–972, 2016.
[5] P. Guleria, N. Thakur, and M. Sood, “Predicting student performance using decision tree classifiers and information gain,” Proc. 2014 3rd Int. Conf. Parallel, Distrib. Grid Comput. PDGC 2014, pp. 126–129, 2015.
[6] P. M. Arsad, N. Buniyamin, and J. L. A. Manan, “A neural network students’ performance prediction model (NNSPPM),” 2013 IEEE Int. Conf. Smart Instrumentation, Meas. Appl. ICSIMA 2013, no. July 2006, pp. 26–27, 2013.
[7] K. F. Li, D. Rusk, and F. Song, “Predicting student academic performance,” Proc. 2013 7th Int. Conf. Complex, Intelligent, Softw. Intensive Syst. CISIS 2013, pp. 27–33, 2013.
[8] G. Gray, C. McGuinness, and P. Owende, “An application of classification models to predict learner progression in tertiary education,” in Souvenir of the 2014 IEEE International Advance Computing Conference, IACC 2014, 2014.
[9] N. Buniyamin, U. Bin Mat, and P. M. Arshad, “Educational data mining for prediction and classification of engineering students achievement,” 2015 IEEE 7th Int. Conf. Eng. Educ. ICEED 2015, pp. 49–53, 2016.
[10] Z. Alharbi, J. Cornford, L. Dolder, and B. De La Iglesia, “Using data mining techniques to predict students at risk of poor performance,” Proc. 2016 SAI Comput. Conf. SAI 2016, pp. 523–531, 2016.
CHAPTER-9
IMPLEMENTATION
CODE
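import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset (assumption: banking1.xlsx holds the data and has a "Class" target column;
# reading .xlsx files with pandas requires the openpyxl package)
hazel_df = pd.read_excel("banking1.xlsx")
hazel_df.head()

# Feature selection: separate the input features from the target column "Class"
all_features = hazel_df.drop("Class", axis=1)
target_feature = hazel_df["Class"]
all_features.head()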
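A possible next step after separating `all_features` and `target_feature` is a stratified train/test split followed by a decision tree, whose importance scores rank the factors that drive its predictions. This sketch builds a small synthetic `hazel_df` so it runs on its own; its columns (`internet_for_learning`, `social_media_hours`, `attendance`) and the rule that generates `Class` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for hazel_df: numeric features plus a binary "Class" column
rng = np.random.default_rng(42)
n = 161
hazel_df = pd.DataFrame({
    "internet_for_learning": rng.integers(0, 2, n),
    "social_media_hours": rng.integers(0, 8, n),
    "attendance": rng.uniform(0.4, 1.0, n),
})
hazel_df["Class"] = ((hazel_df["attendance"] > 0.7)
                     & (hazel_df["social_media_hours"] < 5)).astype(int)

all_features = hazel_df.drop("Class", axis=1)
target_feature = hazel_df["Class"]

# Hold out 30% for testing; stratify keeps the class ratio the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    all_features, target_feature, test_size=0.3, stratify=target_feature, random_state=42)

tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)
print("test accuracy:", round(tree.score(X_test, y_test), 3))

# Rank features by how much they reduce impurity in the tree
importance = pd.Series(tree.feature_importances_, index=all_features.columns)
importance = importance.sort_values(ascending=False)
print(importance)
```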