University Recommendation System For Abroad Studies Using Machine Learning
https://doi.org/10.22214/ijraset.2023.50835
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue IV Apr 2023- Available at www.ijraset.com
Abstract: Finding the right college or university to pursue postgraduate studies can be a daunting task, particularly when
considering admission requirements and available academic resources. This can lead to students applying to institutions that
are beyond their reach, wasting time and money on applications they are unlikely to be accepted to. This research study suggests
creating a recommendation system to help graduate admission applicants choose the best university for them as a solution to this
problem. The system leverages historical data of previously enrolled graduate students and considers several factors, including
academic records, standardized test scores, academic standing, and university characteristics, to provide personalized
recommendations to admission seekers.
Keywords: Recommender System, Random Forests, Decision Trees, XGBoost
I. INTRODUCTION
Finding out which institutions or universities one has a decent chance of getting into based on one’s academic record, including
GPA, TOEFL, and GRE, can be challenging for those who are interested in pursuing postgraduate study. Sadly, some people make
the error of applying to universities that are out of their financial grasp and that have strict scoring criteria, which could eventually
hurt their future employment opportunities. Consequently, candidates should only submit applications to colleges and universities
where they have a good chance of being accepted. Otherwise, they will just be wasting their time and money.
The dataset utilised for processing contains a variety of variables, including the university name, verbal and quantitative test
scores, and GRE, TOEFL, or IELTS scores. Many colleges and graduate programmes use the GRE (Graduate Record
Examinations) as part of their admissions procedure. Beyond standardised test scores, admissions decisions also take into
account letters of reference, statements of purpose, extracurricular activities, and research papers.
It can be difficult to choose which schools to apply to based on one’s academic record, scores on standardised tests like the GRE and
TOEFL, and GPA. This is particularly true for someone who has completed an undergraduate degree and decides to pursue a
postgraduate degree in their chosen field. Sadly, many applicants submit applications to universities whose score requirements
they do not meet, which wastes time and money. Sending standardised test results to several colleges can also
significantly raise the cost of the application process. Unfortunately, few effective solutions exist to address this
issue, which is why this system was created.
The research study’s main objective is to develop a recommendation system that will help graduate admission applicants choose
the best university for them. To do this, the system will examine the historical data of previously enrolled graduate students and use
pertinent data to offer individualised suggestions to applicants. The academic history, academic standing, and test scores will all be
considered by this recommendation mechanism. By utilising this data, the system will offer insightful information that admission
hopefuls can use to decide which university will effectively fulfill their educational needs and objectives. Overall, the
recommendation system will be a useful tool for people looking to improve their educational and career options by pursuing a
graduate degree.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 2954
The system aims to streamline the college admissions process by helping students avoid spending time and money on counsellors
and stressful research related to finding a suitable college [2].
Overall, the article is a useful resource for scholars and practitioners in the field of recommender systems by offering a blueprint for
developing recommender systems that are efficient and dependable and can cater to the requirements of various applications and
domains [7].
An improved method for the movie recommendation system is presented in the study "An Improved Approach for Movie
Recommendation System" by Shreya Agrawal and Pooja Jain. To provide movie suggestions, the authors suggest an algorithm that
combines the user-based collaborative filtering and item-based collaborative filtering methodologies. The cosine similarity
technique and the Jaccard coefficient are also included in the recommended approach to solving the issue of data sparsity. On the
MovieLens dataset, the authors assessed the effectiveness of the suggested algorithm and contrasted it with other recommendation
methods. The experimental findings showed that the suggested strategy performed better and was more precise than the
alternative methods [9].
The research also includes a hybridization methodology that takes advantage of the strengths of both user-based and item-based
collaborative filtering approaches. The suggested approach successfully addresses the cold start issue and offers consumers
recommendations personalised according to their preferences. The report also provides a thorough assessment of the suggested approach using several
evaluation measures. The proposed approach by the authors makes a significant contribution to the field of movie recommendation
systems and is pertinent to researchers, practitioners, and business experts who work in the discipline of recommendation systems
due to its potential to enhance the performance and accuracy of such systems [9].
The creation of a recommendation system to help prospective graduate students choose appropriate graduate schools to apply to is
presented in the research article titled "Graduate School Recommender System: Assisting Admission Seekers to Apply for Graduate
Studies in Appropriate Graduate Schools". Based on the student's academic profile, such as their GPA, GRE scores, and
undergraduate major, the algorithm is intended to suggest graduate institutions. To create the recommendation engine, the authors
combined collaborative filtering methods with machine learning algorithms [10].
In the graduate school admissions procedure, when there are many institutions to select from and students are sometimes
overwhelmed, the report emphasises the value of personalised recommendations. The algorithm performed well in terms of
accuracy and relevancy of suggestions when tested using a dataset of graduate programmes in the United States. Overall, the study
adds to the expanding body of knowledge on recommendation systems in the field of education and offers a workable way to help
students choose graduate schools [10].
The creation of a hybrid collaborative filtering-based recommender system is discussed in the article "A real-time recommender
system based on hybrid collaborative filtering" by W. Yuan-hong and T. Xiao-qiu, which was presented at the 5th International
Conference on Computer Science & Education in 2010. To provide customised suggestions for users, the system incorporates
approaches for user- and item-based collaborative filtering. To enhance the accuracy of suggestions, the authors suggest a hybrid
strategy that incorporates the advantages of the two approaches. The system's real-time operation incorporates aspects like user
feedback and preference change to raise the accuracy of recommendations over time. The suggested system is a potential method for
developing real-time recommender systems since it is easily adaptable to new domains [16].
B. Algorithms Used
1) XGBoost Algorithm: Classification models determine the class or group of a new instance based on its characteristics. In other
words, classification is a form of supervised learning that learns from labelled data in order to predict outcomes for new,
unseen data. Classification models find patterns and relationships in the data that distinguish between the various
classes. A classification algorithm could be used, for instance, to determine, based on an email’s text and other
characteristics, whether or not it is spam. Decision trees are one kind of supervised learning algorithm, and
XGBoost (Extreme Gradient Boosting) is a machine learning algorithm designed to improve their performance. To produce
more accurate predictions, XGBoost builds an ensemble of decision trees and improves it iteratively. The fundamental
concept is a sequence of decision trees, which may be trained on different subsamples of the data, in which every tree is
built to correct the errors of the ones before it, so that the overall prediction improves over time. To reduce errors, the
training procedure determines the optimal weights for each tree. XGBoost is an ensemble technique: it combines many weak
models to produce a strong one, which lets it capture more intricate relationships in the data. XGBoost uses regularisation
to limit overfitting, which can improve generalisation to unseen data: it adds penalty terms to the loss function in order to
keep the model simple. XGBoost also provides a measure of feature importance, which can be used to identify the features
most relevant to a particular task. Concentrating on the most important features can improve accuracy and reduce the
influence of noise in the data. Overall, XGBoost is a powerful machine learning method that combines many weak models with
regularisation, gradient-based optimisation, feature importance estimates, and built-in handling of missing values to
achieve high accuracy.
2) Decision Tree Algorithm: The decision tree is a popular machine learning technique for classification and regression problems. It
recursively divides the data into more homogeneous groups based on the values of the input features. Each split is a node in
the tree, and the tree's leaves represent the final predictions or outcomes. Decision trees are a popular choice because they
are straightforward and easy to visualise. They can handle categorical and continuous input features as well as binary and
multi-class classification problems. Because they are simple to interpret, decision trees are well suited to projects
where the interpretability of the model is essential: they reveal the model's decision-making process, which is useful for
applications like fraud detection, medical diagnosis, and credit assessment. Decision trees are also a useful tool for
predictive modelling because they can capture complex nonlinear relationships between the input features and the output
variable. They can handle interactions between variables and represent complex decision boundaries that linear models
might struggle with. To improve performance, decision trees can be combined into ensembles using techniques such as
random forests or boosting. These ensemble techniques can reduce overfitting and raise predictive accuracy. Given their
capacity to handle complex relationships, feature selection, scalability, and interpretability, decision trees are a flexible
and effective machine learning technique that may be applied in many projects.
3) Random Forest Algorithm: Random forest is an ensemble learning method that combines multiple decision trees to increase the
model’s accuracy and robustness. Each decision tree in the random forest is trained using a random subset of the input
features and a random sample of the training data. Random forest is often used in classification and regression machine learning
applications. It is well known for its ability to handle complex nonlinear relationships between input and output variables and
for its capacity to reduce overfitting by combining the predictions of many individual trees.
Random forest is known for its high predictive accuracy, particularly when the relationships between the input variables and the
target variable are complex and nonlinear. It is a dependable option for predictive modelling tasks because it is robust to
noise and outliers in the data. The feature importance estimates provided by random forest can be used to select features and
gain insight into the underlying data.
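The boosting idea described above, where each new tree is fitted to the errors left by the trees before it, can be illustrated with a minimal sketch. This is plain gradient boosting for squared error using one-split trees ("stumps"), not XGBoost itself, which additionally uses second-order gradients and regularisation. The function names and the toy GRE/admission data below are invented for illustration and do not come from the study's dataset.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]        # (threshold, left_value, right_value)
Model = Tuple[float, float, List[Stump]]  # (base_prediction, learning_rate, stumps)


def fit_stump(x: List[float], residuals: List[float]) -> Stump:
    """Fit a one-split regression tree to the residuals by least squares."""
    best_stump, best_error = None, float("inf")
    # Skip the largest value: splitting there would leave the right side empty.
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if error < best_error:
            best_stump, best_error = (threshold, left_mean, right_mean), error
    if best_stump is None:
        raise ValueError("need at least two distinct feature values")
    return best_stump


def stump_output(stump: Stump, xi: float) -> float:
    threshold, left_value, right_value = stump
    return left_value if xi <= threshold else right_value


def predict(model: Model, xi: float) -> float:
    """Base prediction plus the shrunken sum of all stump corrections."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_output(s, xi) for s in stumps)


def fit_boosted_stumps(x: List[float], y: List[float],
                       n_rounds: int = 20, learning_rate: float = 0.5) -> Model:
    """Each round fits a new stump to what the current ensemble still gets wrong."""
    base = sum(y) / len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        current = (base, learning_rate, stumps)
        residuals = [yi - predict(current, xi) for xi, yi in zip(x, y)]
        stumps.append(fit_stump(x, residuals))
    return base, learning_rate, stumps


def mean_squared_error(model: Model, x: List[float], y: List[float]) -> float:
    return sum((yi - predict(model, xi)) ** 2 for xi, yi in zip(x, y)) / len(y)


# Invented toy data: GRE score and whether the applicant was admitted (1) or not (0).
gre = [300, 305, 310, 315, 320, 325, 330, 335]
admitted = [0, 0, 0, 1, 0, 1, 1, 1]

baseline = (sum(admitted) / len(admitted), 0.5, [])  # predicts the mean for everyone
boosted = fit_boosted_stumps(gre, admitted)
print("baseline MSE:", mean_squared_error(baseline, gre, admitted))
print("boosted MSE:", mean_squared_error(boosted, gre, admitted))
```

A random forest would instead train its trees independently on random samples of the rows and features and average their predictions, rather than fitting each tree to the previous trees' residuals.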
C. Performance Metrics
Performance metrics are measurements that are employed to evaluate the effectiveness and productivity of a system, process, or
individual. They are used to assess how well a work or project is doing and whether its objectives are being met. Performance
metrics provide a way to make data-driven decisions to improve outcomes and precisely gauge progress. In the context of machine
learning, performance measures are used to evaluate how well a model performs in terms of producing accurate predictions. In
machine learning, accuracy, precision, recall, F1-score, and AUC-ROC are frequently used success metrics. The success metric to
utilise is determined by the specific problem and the desired outcome.
A confusion matrix is a table used to assess the performance of a machine learning model on a classification problem.
It summarises the predicted and actual classes of a set of instances.
For a binary classification problem, the confusion matrix is usually laid out as a 2 × 2 table with one cell for each of the four
possible outcomes:
Fig.2 Representation of Confusion Matrix in a tabular format
True Positive (TP) - The model correctly predicts the positive class.
False Positive (FP) - The model incorrectly predicts the positive class.
True Negative (TN) - The model correctly predicts the negative class.
False Negative (FN) - The model incorrectly predicts the negative class.
Accuracy, precision, and recall are some examples of performance measures that can be calculated from the confusion matrix.
For example, recall can be computed as TP / (TP + FN), precision as TP / (TP + FP), and accuracy as (TP + TN) / (TP + TN +
FP + FN).
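As a minimal sketch of how the four cells are obtained in practice, the snippet below counts TP, FP, TN, and FN from a list of actual labels and a list of predicted labels, then computes accuracy. The function names and the example labels are invented for illustration (1 stands for "admitted", 0 for "not admitted").

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, tn, fn


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """(TP + TN) / (TP + TN + FP + FN); 0.0 when there are no samples."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


# Invented example: 1 = admitted, 0 = not admitted.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

tp, fp, tn, fn = confusion_counts(actual, predicted)
print(tp, fp, tn, fn)            # 3 1 3 1
print(accuracy(tp, fp, tn, fn))  # 0.75
```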
The confusion matrix is a helpful instrument for assessing a machine learning model’s performance, particularly for imbalanced
datasets in which one class is more common than the other. By examining the confusion matrix, one can find where the model
makes mistakes and correct them. Precision is the proportion of correctly classified positive samples among all samples
that the model classified as positive.
Precision = TP/ (TP + FP) (1)
Recall is defined as the ratio of the number of positive samples that were correctly identified as positive to the total number of
positive samples. Recall measures the model’s capacity to identify positive samples: the higher the recall, the more positive
samples are found.
Recall = TP/ (TP + FN) (2)
The F1-score, or F-measure, is an evaluation metric for classification defined as the harmonic mean of precision and recall.
F1-score = 2 * (Precision * Recall) / (Precision + Recall) (3)
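Equations (1) to (3) can be evaluated directly from the confusion-matrix counts. The sketch below is a minimal illustration; the function names and the counts are invented, and returning 0.0 when a denominator is zero is a convention chosen here, not something specified in the study.

```python
def precision(tp: int, fp: int) -> float:
    """Equation (1): TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Equation (2): TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Equation (3): harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Invented counts: 8 true positives, 2 false positives, 4 false negatives.
print(precision(8, 2))    # 0.8
print(recall(8, 4))       # 8 / 12, roughly 0.667
print(f1_score(8, 2, 4))  # 16 / 22, roughly 0.727
```

Because the harmonic mean is pulled towards the smaller of its two inputs, the F1-score here sits closer to the recall than a simple average of the two would.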
Fig.3 Methodology for solving the problem statement using Machine Learning
TABLE I
Model comparison with respect to their accuracies
MODEL ACCURACY
XGBoost 0.81495
C. XGBoost Algorithm
We applied Random Forests, Decision Trees and XGBoost to attributes such as CGPA, GRE, and TOEFL scores. Comparing
the graphs of each model shows that XGBoost has the best accuracy in this recommendation system, at 81%,
compared with 76% for both Random Forests and Decision Trees.
REFERENCES
[1] M. Hassan and M. Hamada, "Smart media-based context-aware recommender systems for learning: A conceptual framework," 2017 16th International
Conference on Information Technology Based Higher Education and Training (ITHET), Ohrid, 2017, pp. 1-4.
[2] Jain, Satia. (2021, December). College Admission Prediction using Ensemble Machine Learning Models. International Research Journal of
Engineering and Technology (IRJET), 08(12), Article e-ISSN: 2395-0056.
[3] Judy, D’cruz, Kathe, Motwani. (2020, April). Recommendation System for Higher Studies using Machine Learning. International Research Journal of
Engineering and Technology (IRJET), 07(04), Article e-ISSN:2395-0056.
[4] Nalawade, Tiple. (2020, March). The University Recommendation System for Higher Education. International Journal of Recent Technology and Engineering
(IJRTE), 08(06), Article ISSN: 2277-3878.
[5] E. Soldatova, U. Bach, R. Vossen and S. Jeschke, "Creating an E-Learning recommender system supporting teachers of engineering disciplines," 2013
International Conference on Interactive Collaborative Learning (ICL), Kazan, 2013, pp. 811-815.
[6] Bhatt, Shah, Soni. (2020, July). Recommendation System for Higher Studies at Abroad via Machine Learning Techniques. International Journal of Advanced
Research in Science Technology (IJARST), 07(03), Article ISSN (Online) 2581-9429.
[7] M. H. Mohamed, M. H. Khafagy and M. H. Ibrahim, "Recommender Systems Challenges and Solutions Survey," 2019 International Conference on Innovative
Trends in Computer Engineering (ITCE), Aswan, Egypt, 2019, pp. 149-155.
[8] M. H. Ansari, M. Moradi, O. NikRah and K. M. Kambakhsh, "CodERS: A hybrid recommender system for an E-learning system," 2016 2nd International
Conference of Signal Processing and Intelligent Systems (ICSPIS), Tehran, 2016, pp. 1-5.
[9] Shreya Agrawal and Pooja Jain, An Improved Approach For Movie Recommendation System, 978-1-5090-3243-3/17/$31.00 ©2017 IEEE
[10] Mahamudul Hasan, Shibbir Ahmed, Deen Md. Abdullah, and Md. Shamimur Rahman, Graduate School Recommender System: Assisting Admission Seekers to
Apply for Graduate Studies in Appropriate Graduate Schools, 978-1-5090-1269-5/16/$31.00 ©2016 IEEE
[11] Usue Mori, Alexander Mendiburu, and Jose A.Lozano, “Similarity Measure Selection for Clustering Time Series databases,” IEEE Transactions on Knowledge
and Data Engineering. Vol. 28. No. 1. January 2016.