Earlier known as

B. V. B. College of Engineering & Technology

School of Computer Science and Engineering

City & Cuisine-Based Restaurant Recommender Using Yelp Dataset
Team Id: 5A07

Data Mining and Analysis Course Project Report

Team Members:
Abhijeet Prakash 01FE16BCS003

Abhishek D Sawant 01FE16BCS004

Adarsh Raj 01FE16BCS009

Apeksha Ninnekar 01FE16BCS038

KLETECH/SoCSE(2018-19)/DMA/Course Project/5ADMACP10

1. Introduction
Today, customer reviews on social media have a deep impact on the success of any business.
Restaurant customers look for a complete and satisfactory experience in terms of food quality,
service and ambience, and they often seek the opinions of other patrons when choosing a place
for their next meal. Yelp offers this information to its users. When users look for a place to
eat, they can ask the service for a list of nearby restaurants in a cuisine category. Users also
see the overall rating that other customers gave each restaurant, as well as some reviews of it.
Review content is very diverse. Reviews can talk about the food, the service or the ambience;
they can reflect a positive experience or a complaint about some specific aspect of the visit.
Reviews are therefore a wealth of information and are usually more informative than a numeric
rating. On the other hand, a service like Yelp receives thousands of reviews each day from every
corner of the world, and summarizing or extracting specific pieces of information from such a
large corpus is a challenging task.
Data mining, and more specifically text mining, techniques allow us to explore a massive corpus
such as the Yelp reviews. We can obtain new insights about the text content that may be helpful
for customers, restaurant owners, governments or even Yelp itself.
In this project, we mine a corpus of Yelp restaurant reviews to explore the following questions:
What are the best restaurants in a city? How many different cuisines do restaurants serve? Can we
recommend dishes for a cuisine, and which restaurant is best to try them? Recommendation systems
make this easier for users by using their personal preferences to suggest the best restaurants
for their preferred cuisine.

2. Problem Statement
The project is designed to search for and recommend the best restaurants in a city for different
kinds of cuisine, based on reviews given by customers.

3. Objectives
● To predict the rating of restaurants listed in the Yelp dataset from the reviews given by
users, using classification techniques such as Support Vector Machines.
● To recommend restaurants to users using the predicted stars and sentiment polarity
values.


● A graphical user interface that takes two inputs from the user, a city and a cuisine, and
returns the ten best restaurants available in that city for that cuisine.

4. Data Description
The data used in this project is part of the Yelp Dataset Challenge (Round 12). The dataset
consists of a set of JSON files that include business information, reviews, tips (shorter
reviews), user information and check-ins. Business objects list the name, location, opening
hours, categories, average star rating, number of reviews and a series of attributes such as
noise level or reservation policy. Review objects list a star rating, the review text, the review
date and the number of votes the review has received. In this project, we focused on these two
types of objects. The data consists of six sub-datasets, briefly described below.

● The total size of the data, including all sub-files, is 6.84 GB.

1. Business Dataset (139 MB)

2. Check-In Dataset (50.3 MB)

3. Photo Dataset (34.9 MB)

4. Review Dataset (4.39 GB)

5. Tips Dataset (203 MB)

6. Users Dataset (2.03 GB)
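To our understanding, the Yelp files are newline-delimited JSON (one object per line), so they can be streamed rather than loaded whole, which matters for the 4.39 GB review file. The sketch below illustrates that under this assumption, using only the standard library; `read_json_lines` is a hypothetical helper rather than project code, the file name in the comment is assumed, and the sample field names (`business_id`, `name`, `state`, `stars`) follow the Yelp business schema as we understand it.

```python
import io
import json


def read_json_lines(fp, fields=None):
    """Yield one dict per non-empty line of a newline-delimited JSON file.

    If `fields` is given, keep only those keys (missing keys map to None),
    which keeps memory use down when scanning the large review file.
    """
    for line in fp:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if fields is not None:
            record = {key: record.get(key) for key in fields}
        yield record


# In practice fp would be something like open("business.json", encoding="utf-8");
# that file name is an assumption. An in-memory sample is used here instead.
sample = io.StringIO(
    '{"business_id": "b1", "name": "Sushi Place", "state": "ON", "stars": 4.5}\n'
    '{"business_id": "b2", "name": "Pho Spot", "state": "AZ", "stars": 3.0}\n'
)
rows = list(read_json_lines(sample, fields=["business_id", "state"]))
```

Because `read_json_lines` is a generator, records can be filtered one at a time without ever holding the whole file in memory.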

5. Related Work:


Due to the rich information contained in the Yelp dataset, many past research projects have used
it to predict restaurant ratings and to evaluate future development. For example, Kong, Nguyen
and Xu classified restaurants by cultural category and analyzed the success of international
restaurants, mostly with Gaussian Discriminant Analysis (GDA). Several other papers focused on
sentiment analysis of Yelp text content. Xu, Wu and Wang combined customer reviews and ratings to
conduct sentiment analysis, while Gingerich and Bochkov mainly used matrix factorization to
analyze text information and predict Yelp ratings. Linshi worked on user-based text analysis for
Yelp rating prediction and showed how rating prediction can improve the Yelp user experience.
Beyond Yelp reviews, Tang, Qin, Liu and Yang introduced a neural network model to predict the
ratings of movie reviews. They claimed that matrix-vector multiplication is more effective than
vector concatenation for text analysis. So far, most research has focused on text analysis of
customer reviews and has left out the other features in the Yelp Dataset Challenge. In this
project, we apply non-text features to predict restaurant ratings and aim to perform a
region-based analysis instead of a user-based analysis, in order to provide suggestions to Yelp
restaurants.

6. Methodology:
We aim to build a recommendation system that makes restaurant recommendations for Yelp users. We
begin with a brief explanation of the dataset used to create the recommendation system, followed
by an exploratory analysis of the data.

Fig (6.0.1): Methodology flow diagram

6.1 Exploratory Analysis:

The primary business features used in our data analysis are business category and location
(state and city). The preliminary exploratory analysis studies the distribution of reviews with
respect to business category and location.


Fig (6.1.1): Frequency distribution of State v/s Number of food businesses

Fig (6.1.2): Frequency distribution of categories v/s count
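The two distributions in Figs 6.1.1 and 6.1.2 amount to simple frequency counts over the business records. A minimal standard-library sketch, assuming each record has a `state` field and a `categories` field that may be a comma-separated string, a list or missing; the helper names are hypothetical and not taken from the project notebooks.

```python
from collections import Counter


def split_categories(raw):
    """Normalize a 'categories' value (comma-separated string, list or None) to a list."""
    if raw is None:
        return []
    if isinstance(raw, str):
        return [c.strip() for c in raw.split(",") if c.strip()]
    return list(raw)


def food_businesses_per_state(businesses, food_categories):
    """Count businesses per state that list at least one category in the set food_categories."""
    return Counter(
        b["state"]
        for b in businesses
        if food_categories & set(split_categories(b.get("categories")))
    )


def category_counts(businesses):
    """Count how many businesses list each category (each business counted once per category)."""
    counts = Counter()
    for b in businesses:
        counts.update(set(split_categories(b.get("categories"))))
    return counts
```

Calling `category_counts(businesses).most_common(20)` would give the top of a frequency plot like Fig 6.1.2.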

6.2 Dataset Reduction:

After the exploratory analysis, we trimmed the dataset to food-related businesses in Ontario.


We selected instances with:

● Business category as ‘Restaurants’, ‘Food’, ‘Japanese’, ‘Chinese’, ‘Thai’, ‘Italian’, ‘Indian’.

● State as ‘Ontario’ (ON)
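A sketch of this reduction step, assuming a business is kept when its state is `ON` and it lists at least one of the selected categories (the report does not say whether the categories were combined with "any" or "all"). The function and field names are illustrative assumptions.

```python
SELECTED_CATEGORIES = {"Restaurants", "Food", "Japanese", "Chinese", "Thai", "Italian", "Indian"}


def split_categories(raw):
    """Normalize a 'categories' value (comma-separated string, list or None) to a list."""
    if raw is None:
        return []
    if isinstance(raw, str):
        return [c.strip() for c in raw.split(",") if c.strip()]
    return list(raw)


def reduce_dataset(businesses, state="ON", categories=SELECTED_CATEGORIES):
    """Keep businesses located in `state` that list at least one of `categories` (assumed 'any' rule)."""
    return [
        b
        for b in businesses
        if b.get("state") == state
        and categories & set(split_categories(b.get("categories")))
    ]
```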

6.3 Predictive Tasks:

There are two major tasks in our project:

● Predicting rating from the review text alone.

● Recommending restaurants based on predicted stars and sentiment polarity.

6.4 Predicting Rating from Review Text:
To predict ratings from the review text, we implemented the following model:
● Linear Support Vector Machine Classifier

6.4.1 Linear Support Vector Machine Classifier:

Support Vector Machine (SVM) is primarily a classification method that constructs hyperplanes in a
multidimensional space to separate cases of different class labels. SVM is effective in
high-dimensional spaces. Its decision function uses only a subset of the training points (called
support vectors), so it is also memory efficient, and different kernel functions can be specified
for the decision function. In this project, we use the open-source Python library scikit-learn to
implement the classifier.
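To make the hyperplane idea concrete, the sketch below trains a binary linear SVM from scratch with Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss. It is an illustration only, not the scikit-learn implementation used in the project (scikit-learn's `LinearSVC` is built on liblinear); the toy points are invented, and for five star classes a binary classifier like this would typically be applied one-vs-rest.

```python
import random


def train_linear_svm(points, labels, lam=0.01, epochs=200, seed=0):
    """Train a binary linear SVM with Pegasos-style stochastic subgradient descent.

    points: list of equal-length feature lists; labels: +1 or -1.
    The bias is learned as the weight of a constant 1.0 feature appended to each
    point, so it is regularized together with the other weights.
    Returns (weights, bias).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(points, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Gradient step for the regularizer (lam / 2) * ||w||^2.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge loss max(0, 1 - margin) is active for this point:
                # move the hyperplane so this point's margin grows.
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w[:-1], w[-1]


def predict(weights, bias, x):
    """Classify x by the side of the hyperplane w.x + b = 0 it falls on."""
    score = sum(wj * xj for wj, xj in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


pos = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0]]
neg = [[-2.0, -2.0], [-3.0, -1.0], [-1.0, -3.0]]
weights, bias = train_linear_svm(pos + neg, [1, 1, 1, -1, -1, -1])
print([predict(weights, bias, p) for p in pos + neg])  # [1, 1, 1, -1, -1, -1]
```

The parameter `lam` trades margin width against training errors: larger values push toward a smaller weight norm, and hence a wider margin, at the cost of more hinge-loss violations.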

To build a linear SVM classifier from the review text, we carried out the following
preprocessing steps:

● Removed punctuation

● Removed the stop words
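The two preprocessing steps can be sketched with the standard library alone. The stop-word set below is a hypothetical, deliberately tiny list for illustration; the project used NLTK's English list (Section 8.1), which, unlike this one, also contains "not".

```python
import string

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "it", "of", "to", "in", "we", "our"}


def preprocess(review):
    """Lowercase a review, strip ASCII punctuation and drop stop words; returns tokens."""
    no_punct = review.lower().translate(str.maketrans("", "", string.punctuation))
    return [token for token in no_punct.split() if token not in STOP_WORDS]


tokens = preprocess("The pad thai was great, and the service was not bad!")
```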

The classifier needs a feature vector for each review in order to perform classification. We used
TF-IDF features to convert the review text into vector form, so each review is represented as a
point in a high-dimensional space. During training, the SVM tries to find hyperplanes that
separate the training examples by rating. When we feed it the test data, it uses the boundaries
learned during training to predict the rating of each test review.
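The TF-IDF weighting, including the bigrams discussed in Section 7, can be illustrated with a small from-scratch sketch: each document becomes a sparse vector with one coordinate per unigram or bigram, weighted by term count × log(N / document frequency). This is the textbook formula; scikit-learn's `TfidfVectorizer`, which the project used, defaults to a smoothed IDF, ln((1 + N) / (1 + df)) + 1, and L2-normalizes each vector, so its numbers differ.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return all n-grams up to length n_max (space-joined) of a token list."""
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def tfidf(docs, n_max=2):
    """Textbook TF-IDF: raw term count * log(N / document frequency).

    docs: list of token lists. Returns one sparse {term: weight} dict per document.
    """
    grams_per_doc = [Counter(ngrams(doc, n_max)) for doc in docs]
    df = Counter()
    for grams in grams_per_doc:
        df.update(grams.keys())  # each term counted once per document
    n_docs = len(docs)
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in grams.items()}
        for grams in grams_per_doc
    ]


vectors = tfidf([["not", "great"], ["great", "food"]])
```

Here "great" occurs in every document and gets weight 0, while the bigram "not great" keeps a positive weight, which is the behaviour that lets negated phrases carry signal.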

7. Discussion: Predicting Ratings

Evaluation metrics: We use precision and recall as the evaluation metrics to measure our rating
prediction performance. SVM performs better than Naïve Bayes, as a naive Bayes classifier simply
assumes that the value of a particular feature is unrelated to the presence or absence of any
other feature, given the class variable. SVM, on the other hand, makes no such independence
assumption; it classifies by constructing hyperplanes in a multidimensional space that separate
cases of different class labels. TF-IDF with bigrams also performs better. This aligns with the
intuition that we need to factor in phrases like ‘not great’ and ‘not bad’ to understand the
sentiment of a review.
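The report does not state how per-class precision and recall were averaged over the five star classes. One common choice, support-weighted averaging (scikit-learn's `average="weighted"`), is sketched below. Under it, weighted recall always equals accuracy, which would be consistent with the identical Recall and Accuracy values in the results table of Section 8.3; this is our reading, not something the report confirms.

```python
from collections import Counter


def weighted_precision_recall(y_true, y_pred):
    """Support-weighted precision and recall, plus accuracy, for multiclass labels.

    Classes that are never predicted get precision 0 (like zero_division=0).
    """
    labels = set(y_true) | set(y_pred)
    support = Counter(y_true)      # true count per class
    predicted = Counter(y_pred)    # predicted count per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)
    precision = sum(
        support[c] / n * (correct[c] / predicted[c] if predicted[c] else 0.0)
        for c in labels
    )
    recall = sum(
        support[c] / n * (correct[c] / support[c] if support[c] else 0.0)
        for c in labels
    )
    accuracy = sum(correct.values()) / n
    return precision, recall, accuracy


p, r, a = weighted_precision_recall([5, 5, 4, 1, 1], [5, 4, 4, 1, 5])
```

Weighting class c's recall correct_c / support_c by support_c / n leaves the sum of correct_c / n, which is exactly the accuracy.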

8. Code:
Our code is divided into three parts:

1) Exploratory analysis of datasets

2) Predicting ratings from review text and calculating sentiment polarity

3) Recommendation of restaurants.

(Diagram: Analysis of Datasets → Predicting Ratings → Recommendation)

8.1 Linear Support Vector Machine Classifier Python Notebook

● Pre-processed the review text by removing stop words (using NLTK) and punctuation.

● Converted the review text into vectors with the TF-IDF approach, using TfidfVectorizer from
sklearn.

● Split the dataset into training and test sets (80:20) using sklearn's train_test_split.

● Built a linear SVM model and fitted it to the training set.

● Evaluated the model on 5 classes (1- to 5-star ratings).
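The notebook steps above map naturally onto scikit-learn. The following is a minimal sketch, not the project's notebook: `train_and_evaluate` is a hypothetical name, the input texts are assumed to be already preprocessed, and weighted averaging of precision and recall is an assumption. Stop-word removal is left to the caller; note that NLTK's English stop-word list contains "not", so removing stop words before building bigrams would drop phrases such as "not bad".

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC


def train_and_evaluate(texts, stars, seed=42):
    """Train a TF-IDF (unigram + bigram) + linear SVM star-rating classifier.

    texts: review strings; stars: integer ratings 1-5.
    Uses an 80:20 train/test split and returns weighted precision,
    weighted recall and accuracy on the test split.
    """
    x_train, x_test, y_train, y_test = train_test_split(
        texts, stars, test_size=0.2, random_state=seed
    )
    # Unigrams plus bigrams, so phrases such as "not bad" become features.
    vectorizer = TfidfVectorizer(ngram_range=(1, 2))
    # Learn vocabulary and IDF weights from the training split only.
    x_train_vec = vectorizer.fit_transform(x_train)
    x_test_vec = vectorizer.transform(x_test)

    model = LinearSVC()  # one-vs-rest over the star classes by default
    model.fit(x_train_vec, y_train)
    y_pred = model.predict(x_test_vec)

    return {
        "precision": precision_score(y_test, y_pred, average="weighted", zero_division=0),
        "recall": recall_score(y_test, y_pred, average="weighted", zero_division=0),
        "accuracy": accuracy_score(y_test, y_pred),
    }
```

Fitting the vectorizer only on the training split keeps test-set vocabulary and document frequencies out of the model, so the reported scores are not inflated by leakage.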

8.2 Restaurant Recommender Python Notebook

● Calculated the sentiment polarity for each business.

● Kept only the rows with a stars value greater than 3.5 and a sentiment polarity greater than 0.

● Selected the top 10 restaurants with the highest sentiment polarity.
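The recommendation steps can be sketched as a filter followed by a sort. The sketch assumes review-level polarity scores in [-1, 1] have already been computed (the report does not name the sentiment tool) and that a business's polarity is the mean over its reviews; `stars` stands for whichever star value is used, predicted or observed, and `categories` is assumed to be a list. The `city` and `cuisine` arguments correspond to the two GUI inputs described in the objectives. All names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def business_polarity(review_polarities):
    """Average review-level polarity per business_id (an assumed aggregation).

    review_polarities: iterable of (business_id, polarity) pairs.
    """
    grouped = defaultdict(list)
    for business_id, polarity in review_polarities:
        grouped[business_id].append(polarity)
    return {bid: mean(values) for bid, values in grouped.items()}


def recommend(businesses, polarity_by_id, city, cuisine, k=10):
    """Return names of up to k restaurants in `city` serving `cuisine`,
    keeping only stars > 3.5 and polarity > 0, ranked by polarity (highest first)."""
    candidates = []
    for b in businesses:
        polarity = polarity_by_id.get(b["business_id"])
        if polarity is None:
            continue  # no reviews, so no polarity score
        if b["city"] != city or cuisine not in b["categories"]:
            continue
        if b["stars"] > 3.5 and polarity > 0:
            candidates.append((polarity, b["name"]))
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in candidates[:k]]
```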

8.3 Result:
Model        Feature            Precision        Recall           Accuracy         Number of Classes
Linear SVM   Bigram + TF-IDF    0.590484199818   0.596285137787   0.596285137787   5


