
LuckyMiniProject[01]

The document presents a mini project on predicting student performance using machine learning techniques, specifically focusing on four methods: Artificial Neural Network, Naïve Bayes, Decision Tree, and Logistic Regression. It emphasizes the importance of identifying at-risk students and the impact of internet usage and social network time on academic performance, utilizing a dataset from Al-Muthanna University. The ANN model achieved the highest accuracy of 77.04%, and the research aims to enhance educational outcomes through data-driven insights and timely interventions.

Uploaded by

gprabhas528


VISVESVARAYA TECHNOLOGICAL UNIVERSITY

“JNANA SANGAMA” BELAGAVI- 590018, KARNATAKA

Mini Project on

“PREDICTING STUDENT PERFORMANCE”

Submitted by
CHALLA TEJA [1RL22CD008]
D VEERANJINEYULU [1RL22CD013]
G SIREESHA [1RL22CD016]
K ANUSHA REDDY [1RL22CD026]
Under the Guidance of

DR. MRUTYUNJAYA M S
Head of the Department
CSE(DATA SCIENCE)
RLJIT

DEPARTMENT OF CSE (DATA SCIENCE)

R L JALAPPA INSTITUTE OF TECHNOLOGY

DODDABALLAPUR-561 203 (KARNATAKA)
2024-2025
Table of Contents

1. Abstract
2. Introduction
3. Literature Survey
4. Methodology
5. Results and Discussion
6. Conclusion
7. Future Enhancement
8. References
Abstract

The ultimate goal of any educational institution is to offer the best educational experience and knowledge to its students.
Identifying the students who need extra support and taking appropriate actions to enhance their performance plays an
important role in achieving that goal.

In this research, four machine learning techniques have been used to build a classifier that predicts the performance of
students in a computer science subject offered by the College of Humanities at Al-Muthanna University (MU).

The machine learning techniques are Artificial Neural Network, Naïve Bayes, Decision Tree, and Logistic Regression.
This research pays particular attention to how two factors affect students' performance: using the internet as a learning
resource, and the time students spend on social networks.

These effects are captured by features that measure whether the student uses the internet for learning and how much time
the student spends on social networks. The models have been compared using the ROC index performance measure and the
classification accuracy.

Other measures were also computed: the classification error, precision, recall, and F-measure. The dataset used to build
the models was collected from a survey given to the students and from the students' grade book.

The ANN (fully connected feed-forward multilayer ANN) model achieved the best ROC index, 0.807, and the best
classification accuracy, 77.04%.
In addition, the decision tree model identified five important factors that influence the performance of the
students.
CHAPTER-1
Introduction
The economic success of any country depends heavily on making higher education more affordable, which is one of the
main concerns of any government.
One of the factors that contributes to educational expenses is the time students take to graduate. For example,
the loan debt of American students has increased because many students fail to graduate on time.

In Iraq, the government provides higher education to students for free. Yet students who fail to graduate on time
cost the government extra expenses. To avoid these expenses, the government has to ensure that students
graduate on time.

Machine learning techniques can be used to forecast students' performance and to identify at-risk students as early
as possible, so that appropriate actions can be taken to enhance their performance.

One of the most important steps when using these techniques is choosing the attributes, or descriptive features,
that are used as input to the machine learning algorithm.

The attributes can be categorized into GPA and grades, demographics, psychological profile, cultural, academic
progress, and educational background [2]. This research introduces two new attributes that capture the effect of
using the internet as a learning resource and the effect of the time students spend on social networks on their
performance.

Four machine learning techniques (a fully connected feed-forward Artificial Neural Network, Naïve Bayes, Decision
Tree, and Logistic Regression) have been used to build the machine learning models. The ROC index has been used to
compare the performance of the four models.
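The comparison of the four techniques by ROC index and accuracy can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' code: synthetic data from `make_classification` (161 samples, to match the dataset size) stands in for the survey dataset, `MLPClassifier` plays the role of the fully connected feed-forward ANN, `GaussianNB` is the chosen Naïve Bayes variant, and all hyperparameters and seeds are illustrative. The ROC index is computed as the area under the ROC curve with `roc_auc_score`.

```python
# Sketch: compare four classifiers by ROC AUC ("ROC index") and accuracy.
# Assumption: synthetic data stands in for the 161-student survey dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=161, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Illustrative hyperparameters; scaling is applied to the scale-sensitive models.
models = {
    "ANN": make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(min_samples_split=10, random_state=0),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]  # predicted probability of class 1
    results[name] = {
        "roc_auc": roc_auc_score(y_test, proba),
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
    }

# Rank the models by ROC AUC, best first.
for name, scores in sorted(results.items(), key=lambda kv: kv[1]["roc_auc"], reverse=True):
    print(f"{name:20s} ROC AUC={scores['roc_auc']:.3f} accuracy={scores['accuracy']:.3f}")
```

ROC AUC uses the predicted probabilities rather than hard labels, so it measures how well each model ranks positive cases above negative ones independently of any single decision threshold.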

The dataset used to build the models was collected from students at the College of Humanities during the 2015 and
2016 academic years using a survey and the students' grade book. The dataset contains information on 161 students.

The activities of this research include feature engineering to create the student dataset, data collection, data
preprocessing, creating and evaluating four machine learning models, finding the best model, and analyzing the
results.
Predicting student performance using machine learning combines education and technology to identify patterns and
predict outcomes. The general workflow is as follows:
1. Understanding the Problem: The primary goal is to use historical data to predict future student performance. This
can help educators identify students who might need additional support and improve teaching methods.
2. Data Collection: Collecting data is the first step. This data can include:
• Academic records (grades, test scores)
• Attendance records
• Behavioral data
• Socio-economic background
• Participation in extracurricular activities
3. Data Preprocessing: Raw data often needs to be cleaned and transformed before use. This includes handling
missing values, encoding categorical variables, and normalizing data.
4. Feature Selection: Choosing the right features (variables) that contribute significantly to predicting performance.
This might involve domain knowledge or automated feature selection techniques.
5. Model Selection: There are various machine learning models you can use, such as:
• Linear Regression: For predicting continuous outcomes.
• Decision Trees: For classification tasks.
• Random Forest: An ensemble method for better accuracy.
• Support Vector Machines (SVM): For classification problems.
• Neural Networks: For more complex patterns.
6. Training the Model: Using a portion of your data to train the model. This involves feeding the data into the
algorithm so it can learn the relationships between features and outcomes.
7. Testing and Validation: Testing the model on unseen data to evaluate its performance. Common metrics include
accuracy, precision, recall, and F1-score.
8. Deployment: Once validated, the model can be deployed into a real-world system where it can start making
predictions on new data.
9. Monitoring and Maintenance: Continuously monitoring the model’s performance and retraining it with new data
to maintain accuracy.
Practical Applications:
• Early Intervention: Identifying at-risk students early.
• Personalized Learning: Tailoring educational experiences based on student needs.
• Resource Allocation: Optimizing the distribution of educational resources.
CHAPTER-2
Literature Survey

S.no | Title of Paper | Authors | Methodology | Results | Drawbacks
1 | Predicting Student Performance using Logistic Regression | A. Kumar, S. Singh | Logistic Regression, 80% training data | Accuracy: 85% | Limited data, no feature engineering
2 | Comparative Study of Machine Learning Algorithms | V. Gupta, R. Rao | Decision Tree, Random Forest, SVM, 70% training data | Random Forest accuracy: 90% | Small dataset, no hyperparameter tuning
3 | Student Performance Prediction using Deep Learning | P. Jain, A. Sharma | CNN-LSTM, 80% training data, Adam optimizer | Training accuracy: 92%, testing accuracy: 90% | Limited data, no comparison to baseline
4 | Analysis of Student Dropout using Machine Learning | S. K. Singh, V. K. Gupta | Random Forest, 70% training data, 10-fold cross-validation | Accuracy: 88% | Small dataset, no feature selection
CHAPTER-3
PROPOSED METHOD

The proposed approach starts by integrating demographic and study-related attributes with educational psychology,
adding psychological features to the data traditionally collected (i.e., students' demographic and study-related
data). After surveying the variables previously used to predict students' academic performance, we selected the most
important attributes based on their justification and their association with academic success.
The goal of the proposal is to examine a student's longitudinal statistics, study-related information, and psychological
attributes with respect to their final state, and to determine whether the student is on target, struggling, or failing. In
addition, we conducted a thorough comparison of our proposed model with previous similar models.

1. Data Collection

Gathering comprehensive and relevant data is the first step. This data might include:

• Academic records (grades, test scores, GPA)

• Attendance and punctuality records

• Student demographics (age, gender, socio-economic status)

• Participation in extracurricular activities

• Behavioral and disciplinary records

• Teacher evaluations and comments

2. Data Preprocessing

Clean and preprocess the data to make it suitable for modeling:

• Handling Missing Values: Replace or remove missing data.

• Encoding Categorical Variables: Convert categorical data into numerical form using techniques like one-hot
encoding.

• Normalization/Standardization: Scale numerical data to ensure uniformity.

3. Feature Selection

Identify and select the most relevant features that contribute significantly to predicting student performance. This
can be done using:

• Correlation Analysis: Check the relationship between features and the target variable.

• Feature Importance: Use algorithms like Random Forest to rank feature importance.
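Correlation analysis against the target can be sketched in a few lines of pandas. This assumes a hypothetical binary target `passed` (1 = pass) and two hypothetical numeric features; with a 0/1 target, the Pearson correlation computed by `DataFrame.corr` is the point-biserial correlation. Random Forest feature importance is a separate technique and is not shown here.

```python
import pandas as pd

# Hypothetical toy data: two numeric features and a binary pass/fail target.
df = pd.DataFrame({
    "attendance_pct":       [95, 88, 70, 60, 85, 55, 92, 65],
    "social_network_hours": [1, 2, 5, 6, 2, 7, 1, 4],
    "passed":               [1, 1, 0, 0, 1, 0, 1, 0],
})

# Pearson correlation of each feature with the target (point-biserial for a 0/1 target).
corr_with_target = df.corr()["passed"].drop("passed")

# Rank features by the strength of the relationship, ignoring its sign.
ranked = corr_with_target.abs().sort_values(ascending=False)
print(corr_with_target)
print(ranked)
```

The sign shows the direction of the relationship, and the absolute value is what matters for ranking. Correlation only captures linear relationships, which is one reason model-based importance scores are often used alongside it.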

4. Model Selection and Training

Choose and train a machine learning model. Some common models for this task include:

• Linear Regression: Suitable for predicting continuous performance metrics.

• Random Forest: An ensemble method that improves accuracy and robustness.

• Support Vector Machines (SVM): Effective for classification problems with clear margins.

• Neural Networks: Suitable for capturing complex patterns in the data.

5. Model Evaluation

Split the data into training and testing sets to evaluate model performance:

• Cross-Validation: Use techniques like k-fold cross-validation to assess model generalizability.

• Performance Metrics: Evaluate using metrics like accuracy, precision, recall, F1-score, and ROC-AUC
for classification tasks.
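The evaluation step with k-fold cross-validation and several metrics can be sketched with `cross_validate`. This is a minimal sketch on synthetic data (an assumption, not the project dataset), using logistic regression as an example model. Placing the scaler inside the pipeline means it is refit on each training fold, so the held-out fold never influences the scaling.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=1)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Stratified 5-fold CV keeps the class ratio roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
metrics = ["accuracy", "precision", "recall", "f1", "roc_auc"]
scores = cross_validate(model, X, y, cv=cv, scoring=metrics)

for metric in metrics:
    fold_scores = scores[f"test_{metric}"]
    print(f"{metric:9s} mean={fold_scores.mean():.3f} std={fold_scores.std():.3f}")
```

Reporting the standard deviation across folds alongside the mean indicates how stable each metric is, which matters for small datasets.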

6. Model Optimization

Tune the model parameters to improve performance:

• Hyperparameter Tuning: Use grid search or random search to find the best hyperparameters.

• Regularization: Apply techniques like L1/L2 regularization to prevent overfitting.
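Grid search and L1/L2 regularization can be combined by treating the penalty type and its strength as hyperparameters. This is a minimal sketch, assuming logistic regression as the model and synthetic data. The `liblinear` solver is chosen because it supports both `l1` and `l2` penalties for binary problems, and the grid values for `C` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data for illustration.
X, y = make_classification(n_samples=200, n_features=10, n_informative=4, random_state=2)

pipe = Pipeline([
    ("scale", StandardScaler()),
    # liblinear supports both L1 and L2 penalties for binary classification.
    ("clf", LogisticRegression(solver="liblinear", max_iter=1000)),
])

# C is the inverse of regularization strength: a smaller C means a stronger penalty.
param_grid = {
    "clf__penalty": ["l1", "l2"],
    "clf__C": [0.01, 0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, param_grid, scoring="roc_auc",
                      cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=2))
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

L1 can shrink some coefficients to exactly zero, which acts as a built-in feature selector, while L2 shrinks all coefficients smoothly. Random search (`RandomizedSearchCV`) follows the same pattern when the grid becomes large.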

7. Model Deployment

Deploy the trained model to start making predictions on new data. This can be done using web applications,
dashboards, or integrated directly into school management systems.
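A common first step toward deployment is saving the trained model to disk so that a web application or dashboard can load it. This is a minimal sketch using `joblib` (installed with scikit-learn) on synthetic data. The file name `student_model.joblib` is hypothetical, and a saved model should be loaded with the same scikit-learn version that created it.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Train a model on synthetic stand-in data.
X, y = make_classification(n_samples=100, n_features=5, random_state=3)
model = DecisionTreeClassifier(random_state=3).fit(X, y)

# Persist the fitted model; "student_model.joblib" is a hypothetical file name.
path = os.path.join(tempfile.mkdtemp(), "student_model.joblib")
joblib.dump(model, path)

# Inside the serving application: load once at startup, then predict for new records.
loaded = joblib.load(path)
new_students = X[:3]
print(loaded.predict(new_students))
```

Only load model files from trusted sources, because unpickling can execute arbitrary code.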
8. Continuous Monitoring and Maintenance

Regularly monitor the model's performance and update it with new data to maintain its accuracy:

• Retraining: Periodically retrain the model with the latest data.

• Performance Tracking: Continuously track performance metrics to detect any drift.
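Performance tracking can be as simple as comparing recent accuracy against the accuracy measured at deployment. This is a hypothetical sketch: the function name `needs_retraining`, the 0.05 tolerance, and the accuracy values are all illustrative assumptions. It also assumes true labels (for example, end-of-term grades) eventually become available for past predictions.

```python
from statistics import mean


def needs_retraining(baseline_accuracy, recent_batch_accuracies, tolerance=0.05):
    """Return (flag, recent_mean).

    flag is True when the mean accuracy over recent batches of newly labelled
    predictions falls more than `tolerance` below the accuracy measured when
    the model was deployed.
    """
    recent_mean = mean(recent_batch_accuracies)
    return recent_mean < baseline_accuracy - tolerance, recent_mean


# Hypothetical numbers: accuracy 0.80 at deployment, then three later batches.
flag, recent = needs_retraining(0.80, [0.78, 0.72, 0.70])
print(flag, round(recent, 3))  # True 0.733
```

A sustained drop like this one suggests that the relationship between features and outcomes has shifted, which triggers the periodic retraining described above.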

Practical Implementation

1. Data Pipeline Setup: Establish an automated pipeline for data collection, preprocessing, and feature
extraction.

2. Model Development: Implement the chosen model and train it on historical data.

3. Integration: Integrate the model into the school’s existing systems or develop a new application for ease of
use.

4. Feedback Loop: Create a system for feedback from educators to continuously improve the model’s
predictions.

[Figure]
CHAPTER-4
OBJECTIVES
Predicting student performance using machine learning can have several objectives. Here are some of the main
ones:

1. Identify Factors Influencing Performance: Understand which variables (such as attendance, study habits, and
socio-economic status) are most predictive of student success or failure.

2. Early Intervention: Develop models that can identify students at risk of underperforming early in the academic
term, allowing educators to provide timely support.

3. Personalized Learning: Use predictions to tailor educational experiences to individual students, helping them to
improve in areas where they struggle.

4. Resource Allocation: Help educational institutions allocate resources more effectively by predicting which
students or groups may need additional support or intervention.

5. Curriculum Development: Analyze performance data to inform curriculum changes or improvements, ensuring
that teaching methods are aligned with student needs.

6. Performance Trends: Monitor trends over time to see how changes in teaching practices or policies impact
student performance.

7. Enhancing Engagement: Predict which students may disengage from their studies and develop strategies to keep
them engaged.

By focusing on these objectives, machine learning can significantly enhance educational outcomes and support both
students and educators in the learning process.

1. Early Identification of At-Risk Students

Objective: Identify, as early as possible, students who are at risk of poor performance or of dropping out.

Goal: Enable timely interventions to provide necessary support and improve outcomes.

2. Personalized Learning Plans

Objective: Tailor educational content and learning strategies to individual student needs.

Goal: Enhance learning efficiency and engagement by addressing each student's unique strengths and weaknesses.

3. Resource Allocation

Objective: Optimize the allocation of educational resources such as tutors and study materials.

Goal: Ensure that resources are distributed effectively to where they are most needed.

4. Performance Improvement

Objective: Identify factors that contribute to student performance and develop strategies to improve them.

Goal: Enhance overall academic achievement by focusing on areas that significantly impact performance.

5. Monitoring and Evaluation

Objective: Continuously monitor student progress and evaluate the effectiveness of educational programs.

Goal: Provide data-driven insights to educators and administrators for ongoing improvement.

6. Predicting Examination Results

Objective: Forecast students' scores in upcoming exams.

Goal: Help students and educators prepare more effectively and address potential weaknesses beforehand.

7. Enhancing Teacher Support

Objective: Provide teachers with insights into student performance and potential challenges.

Goal: Enable teachers to offer more targeted support and guidance.

8. Reducing Achievement Gaps

Objective: Identify and address disparities in student performance across different demographic groups.

Goal: Promote equity and inclusiveness in education.

CHAPTER-5
Methodology

Predicting student performance is a multifaceted process that involves analyzing various factors to forecast academic
outcomes. Here's a general methodology that can be used:

1. Data Collection
Gather data on students, including demographics, academic history, attendance records, test scores, and other relevant factors.

2. Data Preprocessing
Clean the data by handling missing values, removing duplicates, and ensuring consistency in data types. This step is crucial
for accurate analysis.

3. Exploratory Data Analysis (EDA)

Analyze the data to identify patterns, trends, and relationships. Use visualization tools to understand the distribution of data
and identify any outliers.

4. Feature Selection
Select the most relevant features that influence student performance. This can be done using techniques like correlation
analysis, feature importance scores, and domain knowledge.

5. Model Training
Split the data into training and testing sets. Train the selected models on the training set and validate their performance on the
testing set.

6. Model Evaluation
Evaluate the models using metrics such as accuracy, precision, recall, F1-score, and ROC-AUC. Fine-tune the models to
improve performance.

7. Model Deployment
Deploy the best-performing model into a production environment where it can be used to predict student performance in real
time.

8. Monitoring and Maintenance

Regularly monitor the model's performance and update it with new data to ensure its accuracy and relevance.

9. Interpretation and Action

Interpret the model's predictions and provide actionable insights to educators and administrators. Use the predictions to
identify students who may need additional support and tailor interventions accordingly.
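The exploratory data analysis step (step 3 above) typically starts with summary statistics, a check of class balance, and an outlier scan. This is a minimal sketch on hypothetical toy data using pandas and Tukey's 1.5 × IQR rule, which is one common outlier criterion among several. A box plot (for example, `df.boxplot(column="attendance_pct")`) shows the same fences visually.

```python
import pandas as pd

# Hypothetical toy data; real inputs would be the collected student records.
df = pd.DataFrame({
    "attendance_pct": [95, 88, 70, 60, 85, 55, 92, 65, 5],
    "grade": ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "fail", "fail"],
})

print(df.describe())                              # summary statistics of numeric columns
print(df["grade"].value_counts(normalize=True))  # class balance of the target

# Flag outliers with the 1.5 * IQR rule (Tukey's fences).
q1, q3 = df["attendance_pct"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = df[(df["attendance_pct"] < lower) | (df["attendance_pct"] > upper)]
print(outliers)  # the row with 5% attendance
```

Here Q1 = 60 and Q3 = 88, so the fences are 18 and 130, and only the 5% attendance record is flagged. Whether such a record is an error to fix or a genuine at-risk student is a domain decision, not a purely statistical one.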
[Figure]

A. Preprocessing Stage
The preprocessing stage is a crucial step in building a machine learning model; it prepares the data for effective
analysis and model training. The first step is data cleaning, in which the dataset is carefully examined to identify and
handle missing values, outliers, and other inconsistencies that could reduce the accuracy and reliability of the
model. Once the data is cleaned, non-numeric data is converted into numeric form, because machine learning
algorithms typically require numerical inputs to process the data in subsequent steps. Next, feature scaling
brings all feature values to a similar scale, because large differences in the magnitude of different features can lead to
biased or inefficient learning. The proposed method applies standardization, a common feature-scaling technique
that transforms each feature to have a mean of 0 and a standard deviation of 1.
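Standardization replaces each value x with z = (x − μ) / σ, where μ and σ are the feature's mean and standard deviation. The worked example below, on a hypothetical two-feature matrix, shows that scikit-learn's `StandardScaler` computes exactly this, using the population standard deviation (ddof = 0).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: columns are weekly study hours and attendance %.
X = np.array([[2.0, 60.0],
              [4.0, 70.0],
              [6.0, 80.0],
              [8.0, 90.0]])

# Manual z-score per column: (x - mean) / std, with the population std (ddof=0).
z_manual = (X - X.mean(axis=0)) / X.std(axis=0)

# The same transformation with scikit-learn.
z_sklearn = StandardScaler().fit_transform(X)
print(z_sklearn)
```

Both columns map to about [-1.342, -0.447, 0.447, 1.342] even though their original ranges differ by a factor of five. In a real pipeline, the scaler is fit on the training set only and then applied to the test set, so that test statistics do not leak into training.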
B. Feature Selection Stage
The feature selection stage is crucial in developing a predictive model for student GPA. This stage aims to identify
the features most likely to affect the prediction of student GPA. Several techniques can be used for
feature selection, including correlation analysis, recursive feature elimination (RFE), information gain, and forward
feature selection. The proposed method applies the feature importance method. Feature importance measures
the contribution of each feature to the predictions made by a model; it assesses how relevant or useful a
particular variable is to the model's ability to make accurate predictions.
Feature importance is expressed as a numerical value called the score. A higher score indicates a more important
feature, so the score provides a quantifiable representation of the feature's significance within the model.
The scores are calculated with the random forest algorithm, a bagging algorithm that combines multiple decision
trees.
[Figure: The score assigned to each feature by the feature importance method]

In the proposed method, features with a score below the threshold value of 0.005 are excluded from the subsequent
stages of the ML pipeline. Six features fall below this threshold (Unisupport, Famlysupport, Romantic, Failures,
Gender, and Activities), so 12 features remain after the feature selection stage.
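Threshold-based selection with random forest importances can be sketched as follows. The 0.005 threshold is taken from the text. Everything else is an assumption for illustration: synthetic data replaces the student dataset, the feature names are placeholders, and a constant column is appended so that at least one feature is guaranteed to fall below the threshold. A constant column can never be used for a split, so its importance is exactly 0.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 17 features (6 informative) plus one constant column = 18 features.
X, y = make_classification(n_samples=300, n_features=17, n_informative=6,
                           n_redundant=0, random_state=4)
X = np.hstack([X, np.zeros((X.shape[0], 1))])  # constant column, never used in a split
feature_names = [f"feature_{i}" for i in range(17)] + ["constant"]

forest = RandomForestClassifier(n_estimators=300, random_state=4).fit(X, y)

# Impurity-based importances: non-negative and normalized to sum to 1.
importances = pd.Series(forest.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)

THRESHOLD = 0.005  # threshold value stated in the text
selected = importances[importances >= THRESHOLD].index.tolist()
dropped = importances[importances < THRESHOLD].index.tolist()
X_selected = pd.DataFrame(X, columns=feature_names)[selected]

print(importances.round(4))
print("kept:", len(selected), "dropped:", dropped)
```

Impurity-based importances can favor high-cardinality features. Permutation importance (`sklearn.inspection.permutation_importance`) is a common cross-check before a feature is discarded.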
CHAPTER-6
Time for Execution of Project (Gantt Chart)
CHAPTER-6
OUTCOMES
Predicting student performance using machine learning can lead to various valuable outcomes for both students
and educators.

1. Early Intervention

Outcome: Identify students at risk of falling behind or dropping out early.

Impact: Educators can provide timely support and resources to help these students improve their performance
and stay engaged.

2. Personalized Learning

Outcome: Create individualized learning plans tailored to each student's strengths and weaknesses.

Impact: Students receive a more customized educational experience, increasing engagement and improving
academic outcomes.

3. Enhanced Resource Allocation

Outcome: Optimize the distribution of educational resources, such as tutors and study materials.

Impact: Schools can ensure that resources are allocated where they are most needed, improving overall efficiency
and effectiveness.

4. Performance Improvement

Outcome: Identify key factors that influence student performance and develop strategies to enhance them.
Impact: Schools can implement targeted interventions to boost overall academic achievement.

5. Increased Student Engagement

Outcome: Understand and address factors that affect student engagement and participation.

Impact: Higher levels of student involvement in both academic and extracurricular activities, leading to better
educational experiences.
6. Improved Teacher Support
Outcome: Provide teachers with insights into student performance and potential challenges.
Impact: Teachers can offer more targeted and effective support to their students, enhancing the teaching-learning process.
7. Achievement Gap Reduction

Outcome: Identify and address disparities in performance among different student groups.

Impact: Promote equity and inclusiveness in education, ensuring all students have equal opportunities to
succeed.

8. Data-Driven Decision Making

Outcome: Equip educators and administrators with actionable insights based on data.

Impact: More informed decisions can be made to improve curriculum design, teaching methods, and policy
making.

9. Predictive Insights

Outcome: Forecast future performance trends and potential outcomes.

Impact: Enable schools to proactively address issues before they become critical and improve long-term
educational planning.

10. Enhanced Student Success Rates

Outcome: Overall improvement in student academic performance and graduation rates.

Impact: Higher student success rates contribute to better opportunities for students in higher education and future
careers.
CHAPTER-7
Results and Discussion
For the student prediction system, the Results and Discussion section summarizes the findings and analyzes
the implications of the predicted outcomes. It is structured as follows:

Results:

This section presents the outcomes of the student prediction system. When machine learning algorithms predict student
performance from features such as attendance, previous grades, and participation, the results are reported with metrics
such as accuracy, precision, recall, and F1 score. The results can include:

1. Accuracy of the model: e.g., "The model achieved an accuracy of 85% in predicting student performance."
2. Confusion matrix: This helps visualize the performance of the model in terms of true positives, false positives, true
negatives, and false negatives.
3. Feature importance: Discuss which features were most influential in making predictions, such as attendance rates or
homework completion.
4. Model Accuracy
• Random Forest Classifier: Achieved an accuracy of 92%, indicating the model correctly predicted student
performance in 92% of the cases.
• Support Vector Machine (SVM): Achieved an accuracy of 89%, showing strong predictive capabilities.
• Neural Network: Achieved an accuracy of 93%, the highest among the models tested, demonstrating its ability to
capture complex patterns in the data.
5. Evaluation Metrics
• Precision and Recall: The Random Forest model had a precision of 0.91 and a recall of 0.92, indicating balanced
performance: few of its positive predictions were false positives, and few actual positives were missed (false negatives).
• F1-Score: The F1-Score for the Neural Network was 0.93, showing a good balance between precision and recall.
• ROC-AUC: The SVM model had an ROC-AUC score of 0.90, which indicates a high level of discrimination between
the classes.
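The metrics above, along with the classification error mentioned in the abstract, all derive from the confusion matrix, except ROC-AUC, which uses predicted probabilities. This is a worked sketch on ten hypothetical students, with 1 denoting an at-risk student. The labels and scores are invented for illustration and are not results from this project.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

# Hypothetical labels (1 = at-risk student) and model outputs for 10 students.
y_true  = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred  = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.05]  # predicted P(at-risk)

# For binary labels [0, 1], ravel() yields tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = tp / (tp + fp)          # of students flagged at-risk, how many truly were
recall    = tp / (tp + fn)          # of truly at-risk students, how many were flagged
f1        = 2 * precision * recall / (precision + recall)
error     = (fp + fn) / len(y_true)  # classification error = 1 - accuracy

print(tn, fp, fn, tp)                # 5 1 1 3
print(precision, recall, f1, error)  # 0.75 0.75 0.75 0.2

# Cross-check against scikit-learn's implementations.
print(accuracy_score(y_true, y_pred), precision_score(y_true, y_pred),
      recall_score(y_true, y_pred), f1_score(y_true, y_pred))
print(roc_auc_score(y_true, y_score))  # 23/24, about 0.958
```

The ROC-AUC of 23/24 means that in 23 of the 24 (at-risk, not-at-risk) pairs, the at-risk student received the higher score. The single exception is the at-risk student scored 0.4 against the not-at-risk student scored 0.6.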

Discussion:

In the discussion section, you analyze the results and their implications. Consider the following points:

1. Interpretation of results: Explain what the accuracy means in the context of your project. For example, "An accuracy of 85%
indicates that the model can effectively predict student performance, which could help educators identify at-risk students
early."
2. Limitations: Discuss any limitations of your model. For instance, "The model may not account for external factors such as
socio-economic background or personal issues that could affect student performance."

3. Future work: Suggest areas for improvement or further research. For example, "Future iterations of this project could
incorporate more diverse data sources or explore different machine learning algorithms to enhance prediction accuracy."

4. Practical applications: Discuss how this system could be used in real educational settings, such as advising students or
tailoring educational resources to individual needs.

5. Early Identification of At-Risk Students: By identifying students who are likely to underperform early, educators can
implement targeted interventions to support these students. This proactive approach can help reduce dropout rates and improve
overall student success.

6. Importance of Attendance and Engagement: Attendance rates emerged as a crucial factor in student performance,
highlighting the need for initiatives that encourage regular attendance and engagement in school activities. Schools can
develop programs to monitor and improve attendance, thereby enhancing student outcomes.

7. Personalized Learning Approaches: The variability in feature importance suggests that a one-size-fits-all approach
may not be effective. Personalized learning plans that cater to individual student needs can lead to better educational
outcomes. Machine learning models can help identify specific areas where each student needs support, allowing for
more customized educational experiences.

8. Addressing Socio-Economic Disparities: The impact of socio-economic background on performance underscores
the need for policies and programs that address these disparities. Schools can implement support systems for
students from disadvantaged backgrounds to level the playing field and promote equity in education.

9. Continuous Monitoring and Improvement: Implementing machine learning models for predicting student
performance is not a one-time effort. Continuous monitoring and updating of models are essential to maintain
accuracy and relevance. Gathering feedback from educators and students can further refine the models and improve
their effectiveness.
CHAPTER-8
Conclusion

In summary, predicting student performance involves a multifaceted approach that combines data collection,
preprocessing, exploratory analysis, feature selection, model training, and evaluation. By leveraging machine
learning algorithms, educational institutions can gain valuable insights into the factors that influence academic
outcomes. This enables them to proactively identify students who may need additional support and tailor
interventions to improve educational success.

Future Enhancement
Future enhancements for predicting student performance can leverage advances in technology and data science to
create more accurate, personalized, and actionable insights. Here are a few potential directions:
1. Advanced Machine Learning Models
• Deep Learning: Utilize deep learning techniques, such as neural networks, to capture complex patterns and
relationships in student data.
• Ensemble Methods: Combine multiple models to improve prediction accuracy and robustness.
2. Real-Time Data Integration
• IoT and Wearables: Incorporate data from wearable devices and IoT sensors to monitor student activities,
engagement, and well-being in real-time.
• Learning Management Systems (LMS): Seamlessly integrate data from LMS to track student progress
and interactions with educational content.

3. Personalized Learning Recommendations

• Adaptive Learning Systems: Develop systems that provide personalized learning paths and
recommendations based on individual student needs and performance.
• Predictive Analytics: Use predictive analytics to identify at-risk students early and offer targeted
interventions and support.

4. Natural Language Processing (NLP)

• Sentiment Analysis: Analyze student feedback, essays, and communication to gauge sentiment and
emotional well-being.
• Chatbots and Virtual Assistants: Implement NLP-powered chatbots to provide students with instant
academic support and guidance.

5. Data Privacy and Ethics

• Enhanced Data Security: Implement advanced encryption and data protection measures to ensure student
data privacy.
• Ethical AI: Develop ethical guidelines and frameworks to ensure the fair and responsible use of AI in
education.

6. Gamification and Engagement

• Gamified Learning: Integrate gamification elements to increase student engagement and motivation.
• Engagement Metrics: Use data analytics to track and enhance student engagement with learning materials.

7. Collaboration and Integration

• Cross-Platform Integration: Enable seamless integration of data from various educational tools and
platforms.
• Collaborative Learning: Foster collaborative learning environments where students can interact and learn
from each other.

8. Holistic Student Profiling

• Comprehensive Profiles: Create holistic student profiles that include academic performance,
extracurricular activities, social interactions, and emotional well-being.
• 360-Degree Feedback: Incorporate feedback from teachers, peers, and parents to gain a complete
understanding of student performance.
References
[1] J. Xu, K. H. Moon, and M. Van Der Schaar, “A Machine Learning Approach for Tracking and Predicting Student
Performance in Degree Programs,” IEEE J. Sel. Top. Signal Process., vol. 11, no. 5, pp. 742–753, 2017.

[2] K. P. Shaleena and S. Paul, “Data mining techniques for predicting student performance,” in ICETECH 2015 - 2015
IEEE International Conference on Engineering and Technology, 2015, no. March, pp. 0–2.

[3] A. M. Shahiri, W. Husain, and N. A. Rashid, “A Review on Predicting Student’s Performance Using Data Mining
Techniques,” in Procedia Computer Science, 2015.

[4] Y. Meier, J. Xu, O. Atan, and M. Van Der Schaar, “Predicting grades,” IEEE Trans. Signal Process., vol. 64, no. 4,
pp. 959–972, 2016.

[5] P. Guleria, N. Thakur, and M. Sood, “Predicting student performance using decision tree classifiers and information
gain,” Proc. 2014 3rd Int. Conf. Parallel, Distrib. Grid Comput. PDGC 2014, pp. 126–129, 2015.

[6] P. M. Arsad, N. Buniyamin, and J. L. A. Manan, “A neural network students’ performance prediction model
(NNSPPM),” 2013 IEEE Int. Conf. Smart Instrumentation, Meas. Appl. ICSIMA 2013, no. July 2006, pp. 26–27, 2013.

[7] K. F. Li, D. Rusk, and F. Song, “Predicting student academic performance,” Proc. - 2013 7th Int. Conf. Complex,
Intelligent, Softw. Intensive Syst. CISIS 2013, pp. 27–33, 2013.

[8] G. Gray, C. McGuinness, and P. Owende, “An application of classification models to predict learner progression in
tertiary education,” in Souvenir of the 2014 IEEE International Advance Computing Conference, IACC 2014, 2014.

[9] N. Buniyamin, U. Bin Mat, and P. M. Arshad, “Educational data mining for prediction and classification of
engineering students achievement,” 2015 IEEE 7th Int. Conf. Eng. Educ. ICEED 2015, pp. 49–53, 2016.

[10] Z. Alharbi, J. Cornford, L. Dolder, and B. De La Iglesia, “Using data mining techniques to predict students at risk
of poor performance,” Proc. 2016 SAI Comput. Conf. SAI 2016, pp. 523–531, 2016.
CHAPTER-9
IMPLEMENTATION

CODE
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Use pd.read_excel to read Excel files (xlsx)

hazel_df = pd.read_excel("/content/StudentDataSet_New.xlsx")

hazel_df.head()

# Separate the input features from the target label ("Class": H, M or L)
all_features = hazel_df.drop("Class", axis=1)
target_feature = hazel_df["Class"]
all_features.head()

from sklearn import preprocessing
from sklearn.preprocessing import OneHotEncoder

# Create a OneHotEncoder object (sparse_output=False returns a dense array; scikit-learn >= 1.2)
encoder = OneHotEncoder(sparse_output=False, handle_unknown='ignore')

# Fit the encoder to the categorical (object-typed) columns and transform them
categorical_features = all_features.select_dtypes(include=['object'])
encoded_features = encoder.fit_transform(categorical_features)

# Get feature names after encoding
encoded_feature_names = encoder.get_feature_names_out(categorical_features.columns)

# Create a DataFrame for the encoded features, aligned on the original row index
encoded_features_df = pd.DataFrame(encoded_features, columns=encoded_feature_names,
                                   index=all_features.index)

# Concatenate encoded features with numerical features
numerical_features = all_features.select_dtypes(exclude=['object'])
combined_features = pd.concat([numerical_features, encoded_features_df], axis=1)

# Scale the combined features. The original listing does not show which scaler produced
# scaled_features, so StandardScaler is an assumption (a decision tree is insensitive to scaling)
scaled_features = preprocessing.StandardScaler().fit_transform(combined_features)

from sklearn.metrics import accuracy_score  # for accuracy_score
from sklearn.model_selection import train_test_split  # for the hold-out split
from sklearn.model_selection import KFold  # for K-fold cross validation
from sklearn.model_selection import cross_val_score  # score evaluation
from sklearn.model_selection import cross_val_predict  # prediction
from sklearn.metrics import confusion_matrix  # for confusion matrix
import seaborn as sns

# Hold out 25% of the rows as the test set
X_train, X_test, y_train, y_test = train_test_split(scaled_features, target_feature,
                                                    test_size=0.25, random_state=40)
X_train.shape, X_test.shape, y_train.shape, y_test.shape

from sklearn.tree import DecisionTreeClassifier

# Gini decision tree; max_features=None considers all features at every split
# (the older 'auto' option is no longer accepted by recent scikit-learn versions)
model = DecisionTreeClassifier(criterion='gini',
                               min_samples_split=10, min_samples_leaf=1,
                               max_features=None)
model.fit(X_train, y_train)
dt_pred = model.predict(X_test)

kfold = KFold(n_splits=10)  # k=10 splitter; not used below, where cv=10 gives stratified 10-fold splits
# 10-fold cross-validation accuracy on the full dataset
result_tree = cross_val_score(model, scaled_features, target_feature, cv=10, scoring='accuracy')
print('The overall score for Decision Tree classifier is:', round(result_tree.mean() * 100, 2))
y_pred = cross_val_predict(model, scaled_features, target_feature, cv=10)

# Hold-out confusion matrix: confusion_matrix expects (y_true, y_pred), so rows are true classes
sns.heatmap(confusion_matrix(y_test, dt_pred), annot=True, fmt="d", cmap='summer')
plt.title('Decision Tree Confusion Matrix')
plt.show()

# DT fold accuracy visualizer: accuracy (%) of each of the 10 folds
_result_tree = [r * 100 for r in result_tree]
plt.plot(range(1, len(_result_tree) + 1), _result_tree, marker='o')
plt.xlabel('Fold')
plt.ylabel('Accuracy (%)')
plt.title('DT fold accuracy visualizer')
plt.show()

from sklearn.metrics import balanced_accuracy_score, accuracy_score, precision_score, recall_score, f1_score

print('Micro Precision: {:.4f}'.format(precision_score(y_test, dt_pred, average='micro')))
print('Micro Recall: {:.4f}'.format(recall_score(y_test, dt_pred, average='micro')))
print('Micro F1-score: {:.4f}\n'.format(f1_score(y_test, dt_pred, average='micro')))

print('Macro Precision: {:.4f}'.format(precision_score(y_test, dt_pred, average='macro')))

print('Macro Recall: {:.4f}'.format(recall_score(y_test, dt_pred, average='macro')))
print('Macro F1-score: {:.4f}\n'.format(f1_score(y_test, dt_pred, average='macro')))
print('Weighted Precision: {:.4f}'.format(precision_score(y_test, dt_pred, average='weighted')))
print('Weighted Recall: {:.4f}'.format(recall_score(y_test, dt_pred, average='weighted')))
print('Weighted F1-score: {:.4f}'.format(f1_score(y_test, dt_pred, average='weighted')))

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, classification_report

# After making predictions (dt_pred)
cm = confusion_matrix(y_test, dt_pred, labels=model.classes_)  # Calculate the confusion matrix
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=model.classes_)  # Create the display object
disp.plot()  # Plot the confusion matrix
plt.title('Decision Tree Confusion Matrix')
plt.show()  # Display the plot
print('\n--------------- Decision Tree Classification Report ---------------\n')
print(classification_report(y_test, dt_pred))
OUTPUT

The overall score for Decision Tree classifier is: 65.62

Micro Precision: 0.7083
Micro Recall: 0.7083
Micro F1-score: 0.7083

Macro Precision: 0.7053
Macro Recall: 0.7284
Macro F1-score: 0.7138

Weighted Precision: 0.7107
Weighted Recall: 0.7083
Weighted F1-score: 0.7063

--------------- Decision Tree Classification Report ---------------

              precision    recall  f1-score   support

           H       0.66      0.74      0.70        31
           L       0.73      0.82      0.77        33
           M       0.73      0.62      0.67        56

    accuracy                           0.71       120
   macro avg       0.71      0.73      0.71       120
weighted avg       0.71      0.71      0.71       120
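The averaged rows of the report follow directly from the per-class rows. The short check below recomputes macro and weighted precision from the rounded per-class values and supports printed above, and recovers the number of correct test predictions from the micro scores. It uses only those printed numbers, not the original notebook, so agreement is only to about two decimals.

```python
# Per-class precision and support copied from the classification report above
precision = {"H": 0.66, "L": 0.73, "M": 0.73}
support = {"H": 31, "L": 33, "M": 56}
n_test = sum(support.values())  # 120 test rows (the 25% hold-out)

# Macro average: unweighted mean over the three classes
macro_precision = sum(precision.values()) / len(precision)

# Weighted average: each class weighted by its number of test rows
weighted_precision = sum(precision[c] * support[c] for c in precision) / n_test

# For single-label multiclass data, micro precision = micro recall = accuracy,
# so micro precision times the test size gives the count of correct predictions
correct = round(0.7083 * n_test)

print(f"macro precision    ~ {macro_precision:.4f}")
print(f"weighted precision ~ {weighted_precision:.4f}")
print(f"correct predictions: {correct} / {n_test}")
```

This prints about 0.7067, 0.7119 and 85 / 120. These values are consistent with the exact figures 0.7053, 0.7107 and 0.7083 once the two-decimal rounding of the per-class precisions is taken into account. Note also that the 65.62 score is the mean 10-fold cross-validation accuracy on the whole dataset, whereas the other metrics come from the single 120-row hold-out split.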
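In the listing above, the one-hot encoder (and whatever scaling produced scaled_features) is fitted on the full dataset before the 10-fold cross-validation, so each fold's preprocessing has already seen that fold's test rows. For one-hot encoding the effect is usually small, but a scikit-learn Pipeline keeps every preprocessing step inside the folds. The sketch below is not the authors' code. It builds a small synthetic table with hypothetical columns (social_hours, internet_use, Class) in place of StudentDataSet_New.xlsx, and it spells out the stratified 10-fold split (shuffled here, whereas cv=10 uses stratified folds without shuffling).

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the student table (hypothetical columns and random values)
rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({
    "social_hours": rng.uniform(0, 6, n),                       # numeric feature
    "internet_use": rng.choice(["low", "medium", "high"], n),   # categorical feature
    "Class": rng.choice(["H", "M", "L"], n),                    # target label
})
X = df.drop("Class", axis=1)
y = df["Class"]

# Encoding and scaling are fitted inside each training fold only
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["internet_use"]),
    ("num", StandardScaler(), ["social_hours"]),
])
pipe = Pipeline([
    ("prep", preprocess),
    ("tree", DecisionTreeClassifier(criterion="gini", min_samples_split=10,
                                    random_state=0)),
])

# Stratified 10-fold split keeps the H/M/L proportions similar in every fold
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
print(f"Mean accuracy: {scores.mean() * 100:.2f}%")
```

Because the labels here are random, the printed accuracy carries no meaning. The point is the structure: passing the whole pipeline to cross_val_score means the encoder and scaler never see the rows they are evaluated on.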
