
Volume 7, Issue 3, March – 2022 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Restaurant Review Prediction using Machine Learning and Neural Network

Tanbin Siddique Eidul, Md. Alim Imran, Amit Kumar Das
Computer Science and Engineering
East West University, Dhaka, Bangladesh

Abstract:- Nowadays, people often judge whether a restaurant is good or bad by looking at its rating, which makes ratings a critical factor in the restaurant business. Ratings are usually given by people judging the kind of service a restaurant provides, so the features of a restaurant play a very important role. The main goal of this research is to predict the rating of a restaurant business from its features, to help new entrepreneurs set up a new business. We used several machine learning algorithms: Decision tree, Support vector machine (SVM), the k-nearest neighbors algorithm (KNN), Stochastic gradient descent (SGD), and Gaussian Naive Bayes. We also used a convolutional neural network (CNN) model, which gives an accuracy score of 97.22 percent, higher than all the other algorithms.

Keywords:- machine learning algorithm, convolutional neural network (CNN).

I. INTRODUCTION

In our daily life, when we try something new or make an important decision, we ask the community for suggestions, and these suggestions can heavily influence our decisions. In the current modernized world people are more connected via the internet, so they now often make decisions based on other people's recommendations online. Rating or ranking plays a very important role in almost any kind of business, and these evaluations heavily influence people's choices.

This is very much true for the restaurant business. People tend to go to restaurants with higher ratings. One study shows that even a half-star improvement in rating can make a restaurant 19% more likely to sell out, which can have a significant influence on its overall business [1]. To open a new business in this highly competitive sector, people need to be careful: about 59% of new restaurants fail in their opening years, and about 80% fail within the next five years [2]. Location also plays a vital role; a good location can greatly extend the chances of success for a new restaurant. We can use machine learning algorithms on collective data to show new entrepreneurs which must-have features can increase the rating of their restaurant business. In recent years machine learning models have improved to the point where machines produce better accuracy than humans on some tasks; one of the biggest examples is Google's Inception network, which surpassed human-level accuracy in image classification [3]. This implies that using a machine learning model is now more convenient than ever. We can also suggest suitable places for new entrepreneurs to open a new restaurant business. Most of the work done in this field uses fuzzy logic [4][5][6], where customer satisfaction is one of the main concerns.

Our goal here is to predict ratings for new restaurant businesses based on collective features. This will benefit people who are trying to set up a new restaurant: knowing the expected rating, the business plan can be modified according to the features.

The remainder of the paper is organized as follows. Section II, Background study, covers related work on the same topic; Section III describes the data processing; Section IV describes the algorithms used in this work. The analysis, working procedure, and results are in Section V, followed by the conclusion, limitations, and future work.

II. BACKGROUND STUDY

There has been a lot of scientific research and work on this subject; this section describes some of the notable works that have already taken place.

In one research paper [7], researchers tried to predict the future success of Yelp restaurants. Reviews collected from customers online are useful for predicting the future of a restaurant business. The paper focuses on online ratings provided by consumers and determines whether a restaurant will continue its business or not. According to the paper, it is important to maintain a certain number of reviews and a minimum rating on Yelp (whose dataset they used) to attract average-to-maximum customer interest; it is also important to have an eye-catching and well-crafted environment. Other factors that affect ratings are food quality, employee behavior, location, etc. Two different datasets and 15 attributes were used, and the data were analyzed by categorizing them as text features and non-text features. As it was a binary classification problem, logistic regression was used as the classifier. Accuracy was 67.46%, and 73% of restaurants were found open. The prediction was conducted over a one-year period of the dataset.

In another work, Aileen Wang and fellow researchers [8] tried to predict a new restaurant's success and rating, and to find out which features control that success. They defined conditions for a restaurant to be considered successful and used the Yelp restaurant dataset. They performed a chi-square test and stochastic gradient descent (SGD) to find the optimal restaurant features carrying the most weight. Different types of binary and multi-class classification algorithms, such as Random forest, logistic regression, Support Vector Machine (SVM), and a multilayered neural network, were used to predict the restaurant's rating and success, with predicted ratings rounded to the nearest star. Among these, two algorithms performed much better than the others: Random Forest, with about 60% accuracy for binary classification, and the multilayer neural network, with about 56% for multi-class classification. They also performed sentiment analysis, which increased accuracy up to 85% across several algorithms. As future work, they want to add another feature, type of cuisine, to better predict a restaurant's success.

In another research paper [9], the main focus of Ibne Farabi Shihab and his teammates was to suggest a proper location for setting up a new restaurant business, based on the average rating given by customers. They tried to predict restaurant ratings using different machine learning algorithms: find a restaurant's rating from its current features, then suggest a good location for a new restaurant. The Yelp dataset was used for the whole process. They first used a linear regression model on the restaurant features to predict ratings; the result was not satisfactory, so they tried different algorithms such as Decision Tree, Logistic Regression, and non-linear SVM. This time the results were much better: non-linear SVM achieved the highest accuracy score of 97.02% and a precision score of 95.29%, far better than the previous results.

Another interesting work [10], by Sunitha Cheriyan and fellow researchers, is about intelligent sales prediction using machine learning techniques. Data were collected from a store database (2015-2017). The original dataset consists of many attributes: Category, City, Type of items and its description, number of items, Quantity, Quarter, Sales, Revenue, Year, SKU description, and Week. As the original dataset is very large, non-usable data had to be removed from it. To predict sales revenue they used a generalized linear model, a gradient boosted tree, and a decision tree. In their results, the gradient boosted tree performed better than the others on accuracy, precision, and recall, with an accuracy of 98%. They conclude that to perform better than this, one needs a stronger dataset.

IJISRT22MAR614 www.ijisrt.com 1388

III. DATA COLLECTION AND PREPROCESSING

A. Dataset:
For any kind of ML system, one has to have a dataset. For our paper we used the Yelp dataset [11], as Yelp is a globally popular platform for rating and reviewing restaurants. The Yelp dataset is huge and has been contributed to by millions of people; here we use the freely provided Yelp dataset for academic purposes. The main dataset includes six JSON files: business, check-in, tips, review, photos, and user.

B. Feature Extraction:
Of the six sub-datasets in the Yelp dataset, we used the Yelp academic business dataset for our work. The business dataset has columns such as business_id, name, address, city, state, postal_code, latitude, longitude, stars, review_count, is_open, attributes, categories, and hours. From this dataset we identified and picked only restaurant businesses, of which a total of 63,961 were listed. As we picked only restaurant businesses, the categories column became unnecessary, so we dropped it along with some other columns: name, address, postal_code, latitude, longitude, and hours were dropped here, as they are mainly used to identify the business and are not necessary for our work. After this we ended up with six columns, among which the attributes column contained nested values, so we separated the attributes column from the dataset and normalized it. After normalizing, we found that six nested columns were still left: BusinessParking, Ambience, GoodForMeal, DietaryRestriction, Music, and BestNights, so we separated and normalized those columns as well. We set all NaN values to the string "None" for data handling purposes later on. There were 33 columns in the attributes after normalizing and extracting the nested columns, and among the six columns extracted from attributes, BusinessParking had 5 columns, Ambience had 9, GoodForMeal had 6, and DietaryRestriction, Music, and BestNights each had 7. So we ended up with 8 different data frames. Lastly, we merged all the data frames into a single frame, ending up with 79 columns, and then dropped business_id as there was no further use for it. We then separated stars from the dataset, as it serves as our label for this work, and the remaining columns act as features. Finally, we ended up with 63,961 restaurants and 77 features. For the rating, we simplified the scale to 0 for poor, 1 for average, and 2 for good, and also used a label encoder to convert the categorical values to numerical values. Our dataset is now cleaned and ready for applying different algorithms.
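The feature-extraction steps above can be sketched with pandas. This is a minimal illustration on made-up rows, not the authors' code: the toy frame, and the star thresholds used to bucket ratings into poor/average/good, are assumptions, since the paper does not state its exact cut-offs.

```python
# Sketch of the preprocessing in Section III on a toy stand-in for the
# Yelp business dataset (the rows below are invented for illustration).
import pandas as pd

rows = [
    {"business_id": "b1", "city": "Dhaka", "stars": 4.5,
     "categories": "Restaurants, Cafes",
     "attributes": {"WiFi": "free", "BusinessParking": {"lot": True}}},
    {"business_id": "b2", "city": "Austin", "stars": 2.0,
     "categories": "Plumbing",
     "attributes": {"WiFi": "no", "BusinessParking": {"lot": False}}},
]
df = pd.DataFrame(rows)

# 1. Keep only restaurant businesses, then drop identifying columns.
df = df[df["categories"].str.contains("Restaurants")]
df = df.drop(columns=["categories", "business_id"])

# 2. Flatten the nested 'attributes' column into separate columns.
attrs = pd.json_normalize(df["attributes"].tolist())  # WiFi, BusinessParking.lot
df = pd.concat([df.drop(columns=["attributes"]).reset_index(drop=True),
                attrs.reset_index(drop=True)], axis=1)

# 3. Replace NaN with the string "None", as the paper does.
df = df.fillna("None")

# 4. Separate the label and bucket stars into 0 = poor, 1 = average,
#    2 = good (these thresholds are an assumption).
label = pd.cut(df.pop("stars"), bins=[0, 2.5, 3.5, 5], labels=[0, 1, 2])

# 5. Label-encode the remaining categorical feature columns.
for col in df.columns:
    if df[col].dtype == object:
        df[col] = df[col].astype("category").cat.codes
```

The same pattern (filter, `json_normalize` the nested attributes, fill NaN, split off the `stars` label, label-encode) scales directly to the full 63,961-row business file.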


Fig.1: Dataset after processing


IV. USED ALGORITHMS

We used different machine learning algorithms, implemented with the scikit-learn (sklearn) library. We split our dataset in an 80:20 ratio, with 80% of the data used for training the models and 20% for testing.

A. Decision tree:
The decision tree algorithm can be used to solve both regression and classification problems. It builds a training model that predicts the value of the target variable by learning decision rules inferred from the training data. For our work we used the decision tree classifier from the tree module of sklearn [12]. The accuracy we get from the decision tree is 83.6%, with a precision score of 86.16%.

B. Support vector machine (SVM)
In machine learning, a support vector machine (SVM) is a supervised learning model whose algorithm finds an optimal hyperplane that can categorize new examples when labeled data is given. The svm package from sklearn was used here, and the accuracy we get from SVM is 91.1%.

C. k-nearest neighbors algorithm (KNN)
K-NN is a supervised machine learning algorithm that handles both regression and classification problems. The accuracy we get from KNN is 91.1%, with a precision score of 82.91%.

D. Stochastic gradient descent (SGD)
Stochastic gradient descent (SGD) is a simple yet highly efficient approach to fitting linear classifiers and regressors under convex loss functions, such as (linear) support vector machines and logistic regression. The word "stochastic" refers to a system or process governed by random probability: in stochastic gradient descent, a few samples are selected randomly for each iteration rather than the whole dataset. In gradient descent, the term "batch" denotes the total number of samples from the dataset used to compute the gradient in each iteration. Here we get 90.02% accuracy and 86.43% precision.

E. Gaussian Naive Bayes
Naive Bayes is a family of supervised machine learning classification algorithms based on Bayes' theorem. It is a straightforward classification technique with high utility; such classifiers are particularly useful when the dimensionality of the inputs is high, and complex classification problems can also be handled with a Naive Bayes classifier. When the data is continuous, Gaussian Naive Bayes performs much better than plain Naive Bayes. Here we get 91% accuracy and 82.91% precision.

F. Convolutional neural network (CNN)
Although we got relatively good accuracy with the machine learning algorithms above, the problem with these algorithms is that they take more time and memory, which made it hard to train on the full dataset. So we also applied a convolutional neural network. Yann LeCun first proposed the convolutional neural network (CNN) in the late 1980s. It is made up of layers of artificial neurons whose basic role is to compute a weighted sum of their inputs and produce an activation value as output; a CNN typically comprises several convolution layers.

A CNN takes inputs from a large dataset and processes them with initially random weights. When the output does not match the labels given in the dataset, the model learns by repeating the whole process and correcting the weights; this is the basic training process of a CNN. After a few runs, the model is ready for testing on an unlabeled dataset, and once it achieves good precision on the test dataset, it is ready for real use. In our CNN model, all input sequences are fed to an embedding layer first; once the full dataset is embedded, the embedding layer acts as the input to the convolutional layer. We used 75 filters in the convolutional layer to build the feature maps used in the next layer, a global max-pooling layer, and applied the softmax activation function at the output. Using this CNN model, we got a 97.22% accuracy score.
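The experimental setup in this section — an 80:20 split and the five scikit-learn classifiers — can be sketched as below. This is not the authors' code: it runs on a synthetic three-class dataset standing in for the processed Yelp frame, and all hyperparameters are sklearn defaults.

```python
# Sketch of the Section IV setup: 80:20 split, five sklearn classifiers.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Synthetic stand-in with 3 classes, like the poor/average/good labels.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)   # the paper's 80:20 split

models = {
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "SGD": SGDClassifier(random_state=0),
    "Gaussian Naive Bayes": GaussianNB(),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                       # train on the 80%
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.3f}")
```

Swapping in the real feature matrix and the 0/1/2 rating labels from Section III reproduces the pipeline the paper describes.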


Fig. 2: Convolutional Neural Network (CNN) model.
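The core operations of the model in Fig. 2 — convolution filters computing weighted sums over the embedded sequence, global max pooling, and a softmax output over the three rating classes — can be illustrated in plain NumPy. This is a toy forward pass with random weights, not the trained model; the embedding size, filter width, and ReLU activation are assumptions (the paper only specifies 75 filters, global max pooling, and softmax).

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy embedded input: sequence of 10 steps, embedding dimension 8.
x = rng.normal(size=(10, 8))

# A 1-D convolution layer: 75 filters (as in the paper) of width 3,
# each computing a weighted sum over a window of the sequence.
n_filters, width = 75, 3
w = rng.normal(size=(n_filters, width, 8))
steps = x.shape[0] - width + 1
conv = np.empty((steps, n_filters))
for t in range(steps):
    window = x[t:t + width]                  # (width, embed_dim)
    conv[t] = np.tensordot(w, window, axes=([1, 2], [0, 1]))
conv = np.maximum(conv, 0)                   # ReLU (an assumption)

# Global max pooling: keep the strongest response of each filter.
pooled = conv.max(axis=0)                    # shape (75,)

# Dense softmax output over the 3 rating classes (poor/average/good).
w_out = rng.normal(size=(75, 3)) * 0.1
logits = pooled @ w_out
probs = np.exp(logits - logits.max())
probs /= probs.sum()                         # probabilities summing to 1
```

A deep-learning framework would fuse these steps into embedding, Conv1D, GlobalMaxPooling, and Dense-softmax layers, but the arithmetic is the same.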

V. RESULT ANALYSIS

Algorithm Name                  Accuracy %   Precision %   Recall %   F1 Score %

Decision Tree                   83.6         86.15         83.64      84.81
Support vector machine          91.1         82.91         91.06      86.79
k-nearest neighbors algorithm   91.1         82.91         91.06      86.79
Stochastic gradient descent     90.02        86.43         90.24      87.62
Gaussian Naive Bayes            91           82.91         91.05      86.79
Convolutional neural network    97.22        96.27         96.3       96.28
Table 1: Different algorithm performance table

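For a single precision/recall pair, the F1 score is their harmonic mean; checking the SVM row of Table 1 in plain Python:

```python
# F1 as the harmonic mean of precision and recall, checked against the
# SVM row of Table 1 (precision 82.91%, recall 91.06%, F1 86.79%).
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

svm_f1 = f1(82.91, 91.06)
print(round(svm_f1, 2))   # 86.79, matching Table 1
```

(The per-class weighted averaging used by standard metric libraries can make some rows, such as the decision tree, differ from this two-number check by a few hundredths.)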
Here we used multiple machine learning algorithms. Among them, the support vector machine, k-nearest neighbors algorithm, stochastic gradient descent, and Gaussian Naive Bayes performed at an average level, and the decision tree did not perform as well as the others. But when we use the CNN, we get a result that is far better than the other machine learning algorithms.

Table 2: CNN model testing performance


A. Hamming Loss and Jaccard Similarity:
Multi-label classification problems should be assessed using different performance measures than single-label classification problems. Two of the most common performance metrics are Hamming loss and Jaccard similarity. Hamming loss is the average fraction of incorrect labels; note that it is a loss function, so the perfect score is zero. Jaccard similarity, or the Jaccard index, is the size of the intersection of the predicted labels and the true labels divided by the size of their union. It ranges from zero to one, and one is the perfect score.

B. Cohen Kappa:
When working with unbalanced datasets, Cohen's kappa is a valuable evaluation metric. While calculating the Cohen kappa score, we start from the assumption that the true and predicted class distributions are independent, i.e., that the target class has no bearing on the likelihood of a successful prediction. Cohen proposed that the kappa outcome be interpreted as follows: 0 indicates no agreement, 0.01-0.20 none to slight agreement, 0.21-0.40 fair agreement, 0.41-0.60 moderate agreement, 0.61-0.80 substantial agreement, and 0.81-1.00 almost perfect agreement.
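Cohen's kappa can be computed directly from binary confusion counts, using total accuracy (TA) and random (chance) accuracy (RA). The sketch below is illustrative; the counts are invented, not taken from the paper's experiments.

```python
# Cohen's kappa from binary confusion counts: K = (TA - RA) / (1 - RA),
# where TA is total accuracy and RA the chance (random) accuracy.
def cohen_kappa(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    ta = (tp + tn) / total
    ra = ((tn + fp) * (tn + fn) + (tp + fn) * (tp + fp)) / total ** 2
    return (ta - ra) / (1 - ra)

# Invented counts for illustration: 85% raw accuracy, 50% chance accuracy.
k = cohen_kappa(tp=40, tn=45, fp=5, fn=10)
print(round(k, 3))   # 0.7 -> "substantial agreement" on the scale above
```

Note how a raw accuracy of 85% deflates to a kappa of 0.7 once chance agreement is discounted, which is why kappa is preferred on unbalanced datasets.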

K = (TA - RA) / (1 - RA)
TA = (TP + TN) / (TP + TN + FP + FN)
RA = [(TN + FP)(TN + FN) + (TP + FN)(TP + FP)] / (TP + TN + FP + FN)^2

Here K is Cohen's kappa, TA is total accuracy, RA is random accuracy, TP is true positive, TN is true negative, FP is false positive, and FN is false negative.

Here, the precision, accuracy, and recall values of our model are much better than those of the other algorithms. We then used the Jaccard similarity and Cohen's kappa to see how good our model is, and it scores well on these too: the Jaccard similarity of our model is 0.93, which is very close to the best value of 1, and the Cohen kappa value is 0.95, which is also close to 1. From all of this we can say that our CNN model performs better than the other previous work in this field, so we can state that it is the best model for this type of work.

VI. FUTURE WORK AND CONCLUSION

We used the business dataset, a sub-dataset of the Yelp dataset, which is USA-based, so our plan is to work with datasets from different countries. Our model here covers restaurant-type businesses; we want to extend it to work with different types of business.

Overall, we tried to build a model that can successfully predict the expected rating of a restaurant based on its features, as accurately as possible. After analyzing the performance and results, our model clearly performs much better than the other models we tried. We hope this model of ours can help many new entrepreneurs in the restaurant business.

REFERENCES

[1.] M. Anderson and J. Magruder, "Learning from the Crowd: Regression Discontinuity Estimates of the Effects of an Online Review Database," Econ. J., 2012, doi: 10.1111/j.1468-0297.2012.02512.x.
[2.] H. G. Parsa, J. T. Self, D. Njite, and T. King, "Why restaurants fail," Cornell Hotel Restaur. Adm. Q., 2005, doi: 10.1177/0010880405275598.
[3.] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the Inception Architecture for Computer Vision," 2016, doi: 10.1109/CVPR.2016.308.
[4.] S. Khatwani and M. B. Chandak, "Building Personalized and Non Personalized recommendation systems," 2017, doi: 10.1109/ICACDOT.2016.7877661.
[5.] T. Osman, M. Mahjabeen, S. S. Psyche, A. I. Urmi, J. M. S. Ferdous, and R. M. Rahman, "Adaptive food suggestion engine by fuzzy logic," 2016, doi: 10.1109/ICIS.2016.7550755.
[6.] L. Anitha and M. K. Kavitha Devi, "A Review on Recommender System," Int. J. Comput. Appl., 2013. [Online]. Available: https://www.researchgate.net/publication/260972980_A_Review_on_Recommender_System.
[7.] X. Lu, J. Qu, Y. Jiang, and Y. Zhao, "Should I invest it? Predicting future success of Yelp restaurants," 2018, doi: 10.1145/3219104.3229287.
[8.] J. Z. A. Wang and W. Zeng, "Predicting New Restaurant Success and Rating with Yelp," 2016.
[9.] I. F. Shihab, M. M. Oishi, S. Islam, K. Banik, and H. Arif, "A machine learning approach to suggest ideal geographical location for new restaurant establishment," 2019, doi: 10.1109/R10-HTC.2018.8629845.
[10.] S. Cheriyan, S. Ibrahim, S. Mohanan, and S. Treesa, "Intelligent Sales Prediction Using Machine Learning Techniques," 2019, doi: 10.1109/iCCECOME.2018.8659115.
[11.] Yelp.com, "Yelp Dataset," 2019. https://www.yelp.com/dataset.
[12.] F. Pedregosa et al., "Scikit-learn: Machine Learning in Python," JMLR 12, pp. 2825-2830, 2011.
