Tourist Place Reviews Sentiment Classification
ON
“TOURIST PLACE REVIEWS SENTIMENT
CLASSIFICATION”
SUBMITTED BY
AFFAN ABUTALHA CHAUS
MITU22MCAD0003
SUBMITTED TO
We further certify that to the best of our knowledge and belief, the matter
presented in this project has not been submitted to any Degree or Diploma
course.
Dr. Sangita Phunde Dr. Vijay Gondane Prof. Dr. Sunita Karad
HOD, MCA PG HEAD Executive Director-MITCOM
1.______________________ __________________
Internal Examiner
2.______________________ __________________
DECLARATION
I hereby declare that the project work entitled “Tourist Place Reviews
Sentiment Classification” submitted to the MIT – ADT University, Pune,
is a record of an original work done by me under the guidance of Dr.
Sangita Phunde, and this project work is submitted in the partial
fulfillment of the requirements for the award of the degree of Master of
Computer Application. The project work in this report has not been
submitted to any other University or Institute for the award of any degree
or diploma. This is my own and original work.
Date:
opportunity to work with them. Sincere thanks are uttered towards project
I express my gratitude to the PG Head Dr. Vijaya Gondane & Head of MCA
I am also thankful to Dr. Sangita Phunde, my internal project guide for her
invaluable guidance, help and great support during the project work.
I am greatly thankful to the staff of MITCOM, Pune for helping me through the
entire course.
CONTENTS
ABSTRACT
1. INTRODUCTION
2. LITERATURE SURVEY
2.1 KEY-DEDUPLICATION WITH IBBE
2.2 SERVERLESS DISTRIBUTED FILE SYSTEM
2.3 THE GOOGLE FILE SYSTEM
2.4 CONVERGENT KEY MANAGEMENT
2.5 SOFTWARE ENVIRONMENT
2.6 WHY CHOOSE PYTHON
3. SYSTEM ANALYSIS
3.1 EXISTING SYSTEM
3.2 PROPOSED SYSTEM
4. FEASIBILITY STUDY
4.1 ECONOMICAL FEASIBILITY
4.2 TECHNICAL FEASIBILITY
4.3 SOCIAL FEASIBILITY
5. SYSTEM REQUIREMENTS
6. SYSTEM DESIGN
6.1 SYSTEM ARCHITECTURE
6.2 DATA FLOW DIAGRAM
6.3 UML DIAGRAMS
7. IMPLEMENTATION
7.1 MODULES
7.2 SAMPLE CODE
8. SYSTEM TESTING
8.1 UNIT TESTING
8.2 INTEGRATION TESTING
8.3 ACCEPTANCE TESTING
9.2 OUTPUT DESIGN
10. SCREENSHOTS
11. FUTURE WORK
12. CONCLUSION
13. BIBLIOGRAPHY
ABSTRACT:
Social media use is growing rapidly. Every day, millions of users review and
rate tourist places on tourism websites. Sentiment analysis of these reviews
can help gauge the popularity of a tourist place, and tourists can use the
results to decide which destination to visit. In this paper, sentiment analysis
has been implemented using a machine learning approach. The dataset has been
collected from various tourism review websites. We performed a comparative
study of two feature extraction algorithms, CountVectorization and
TFIDFVectorization, combined with three classification algorithms: Naive Bayes
(NB), Support Vector Machine (SVM) and Random Forest (RF). The performance of
the algorithms has been compared using accuracy, recall, precision and
f1-score. In our experiments, TFIDFVectorization improved the accuracy of the
classification algorithms compared to CountVectorization on the given review
dataset. For sentiment classification of tourist place reviews,
TFIDFVectorization+RF gave the highest accuracy, 86%, on the research dataset
used.
1. INTRODUCTION
Social media is growing rapidly. Millions of users post reviews and rate
tourist places on tourism websites every day. Sentiment analysis can be
performed to analyze these reviews. Proper analysis of the reviews can reveal
trends in the popularity of tourist places. Summarized results from sentiment
analysis help tourists decide on a destination and plan their tour. In this
research paper, two feature extraction algorithms have been used:
CountVectorization and TFIDFVectorization. Three classification algorithms,
Naive Bayes (NB), Support Vector Machine (SVM) and Random Forest (RF), have
been used for sentiment classification. The performance of each combination of
feature extraction and classification algorithm has been compared on the basis
of execution time, accuracy, recall, precision and f1-score.
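To make the difference between the two feature extraction schemes concrete, the following is a minimal pure-Python sketch. It is an illustration only, not the project's code: the function names, the toy reviews and the particular smoothed IDF formula (ln((1 + n) / (1 + df)) + 1 followed by L2 normalization, one common variant) are assumptions, since the report does not state which weighting was used.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; real preprocessing would do more.
    return text.lower().split()

def count_vectors(docs):
    """Return (vocabulary, rows) where each row holds raw term counts."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append([counts[t] for t in vocab])
    return vocab, rows

def tfidf_vectors(docs):
    """Reweight raw counts by a smoothed inverse document frequency."""
    vocab, rows = count_vectors(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in rows if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    weighted = []
    for row in rows:
        vec = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # L2 normalization
        weighted.append([v / norm for v in vec])
    return vocab, weighted

# Toy reviews, invented for illustration.
reviews = [
    "beautiful place beautiful view",
    "crowded place bad view",
    "beautiful fort",
]
vocab, counts = count_vectors(reviews)
_, tfidf = tfidf_vectors(reviews)
```

In the third review, "beautiful" and "fort" have the same raw count, but TF-IDF gives "fort" the larger weight because it appears in fewer reviews. This is the kind of reweighting that can help a classifier focus on distinctive terms.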
2. LITERATURE SURVEY
Python 2.0 was released in 2000, and the 2.x versions were the prevalent releases until
December 2008. At that time, the development team made the decision to release version 3.0,
which contained a few relatively small but significant changes that were not backward
compatible with the 2.x versions. Python 2 and 3 are very similar, and some features of
Python 3 have been back ported to Python 2. But in general, they remain not quite
compatible.
Both Python 2 and 3 have continued to be maintained and developed, with periodic release
updates for both. As of this writing, the most recent versions available are 2.7.15 and 3.6.5.
However, an official End of Life date of January 1, 2020 has been established for Python 2,
after which time it will no longer be maintained. If you are a newcomer to Python, it is
recommended that you focus on Python 3.
Python is maintained by a core development team. Guido van Rossum, its creator, was given
the title of BDFL (Benevolent Dictator For Life) by the Python community and held that role
until he stepped down in 2018. The name Python, by the way, derives not from the snake, but
from the British comedy series Monty Python's Flying Circus, of which Guido was, and
presumably still is, a fan. It is common to find references to Monty Python sketches and
movies scattered throughout the Python documentation.
If you're going to write programs, there are literally dozens of commonly used languages to
choose from. Why choose Python? Here are some of the features that make Python an
appealing choice.
Python is Popular
Python has been growing in popularity over the last few years. The 2018 Stack Overflow
Developer Survey ranked Python as the 7th most popular and the number one most wanted
technology of the year. World-class software development companies around the globe use
Python every single day.
According to research by Dice, Python is also one of the hottest skills to have and the most
popular programming language in the world based on the Popularity of Programming
Language Index.
Due to the popularity and widespread use of Python as a programming language, Python
developers are sought after and paid well.
Python is interpreted
Many languages are compiled, meaning the source code you create needs to be translated into
machine code, the language of your computer's processor, before it can be run. Programs
written in an interpreted language are passed straight to an interpreter that runs them directly.
This makes for a quicker development cycle because you just type in your code and run it,
without the intermediate compilation step.
One potential downside to interpreted languages is execution speed. Programs that are
compiled into the native language of the computer processor tend to run more quickly than
interpreted programs. For some applications that are particularly computationally intensive,
like graphics processing or intense number crunching, this can be limiting.
In practice, however, for most programs, the difference in execution speed is measured in
milliseconds, or seconds at most, and not appreciably noticeable to a human user. The
expediency of coding in an interpreted language is typically worth it for most applications.
Python is Free
The Python interpreter is developed under an OSI-approved open-source license, making it
free to install, use, and distribute, even for commercial purposes.
A version of the interpreter is available for virtually any platform there is, including all
flavors of Unix, Windows, macOS, smart phones and tablets, and probably anything else you
ever heard of. A version even exists for the half dozen people remaining who use OS/2.
Python is Portable
Because Python code is interpreted and not compiled into native machine instructions, code
written for one platform will work on any other platform that has the Python interpreter
installed. (This is true of any interpreted language, not just Python.)
Python is Simple
As programming languages go, Python is relatively uncluttered, and the developers have
deliberately kept it that way.
A rough estimate of the complexity of a language can be gleaned from the number of
keywords or reserved words in the language. These are words that are reserved for special
meaning by the compiler or interpreter because they designate specific built-in functionality
of the language.
Python 3.6 has 33 keywords (later releases add a few more), and Python 2 has 31. By
contrast, C++ has 62, Java has 53, and Visual Basic has more than 120, though these latter
examples probably vary somewhat by implementation or dialect.
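The keyword count can be checked directly from the interpreter with the standard `keyword` module. The snippet below is a small sketch; the exact number printed depends on the Python version in use, so no fixed output is given.

```python
import keyword
import sys

# keyword.kwlist holds the reserved words of the running interpreter.
print(sys.version_info[:2], len(keyword.kwlist))

# iskeyword() tells whether a name is reserved and so cannot be used as an identifier.
print(keyword.iskeyword("lambda"), keyword.iskeyword("review"))
```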
Python code has a simple and clean structure that is easy to learn and easy to read. In fact, as
you will see, the language definition enforces code structure that is easy to read.
But It's Not That Simple
For all its syntactical simplicity, Python supports most constructs
that would be expected in a very high-level language, including complex dynamic data types,
structured and functional programming, and object-oriented programming.
Additionally, a very extensive library of classes and functions is available that provides
capability well beyond what is built into the language, such as database manipulation or GUI
programming.
Python accomplishes what many programming languages don't: the language itself is simply
designed, but it is very versatile in terms of what you can accomplish with it.
Conclusion
This section gave an overview of the Python programming language.
Python is a great option, whether you are a beginning programmer looking to learn the basics,
an experienced programmer designing a large application, or anywhere in between. The
basics of Python are easily grasped, and yet its capabilities are vast.
Python is an open source programming language that was made to be easy-to-read and
powerful. A Dutch programmer named Guido van Rossum made Python in 1991. He named
it after the television show Monty Python's Flying Circus. Many Python examples and
tutorials include jokes from the show.
Python drew inspiration from other programming languages like C, C++, Java, Perl, and Lisp.
Python has a very easy-to-read syntax. Some of Python's syntax comes from C, because that
is the language that Python was written in. But Python uses whitespace to delimit code:
spaces or tabs are used to organize code into groups. This is different from C. In C, there is
a semicolon at the end of each statement and curly braces ({}) are used to group code. Using
whitespace to delimit code makes Python a very easy-to-read language.
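A short example shows how indentation alone groups statements. The function name `label_rating` and the star thresholds are hypothetical, chosen only to illustrate block structure.

```python
def label_rating(stars):
    # The indented lines form the function body; no braces or semicolons are needed.
    if stars >= 4:
        label = "positive"   # this line belongs to the `if` block
    elif stars == 3:
        label = "neutral"
    else:
        label = "negative"
    return label             # back at function level: runs after the if/elif/else

labels = [label_rating(s) for s in (5, 3, 1)]
print(labels)
```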
Sometimes only Python code is used for a program, but most of the time it is used to
do simple jobs while another programming language is used to do more complicated tasks.
Its standard library is made up of many functions that come with Python when it is installed.
On the Internet there are many other libraries available that make it possible for the Python
language to do more things. These libraries make it a powerful language; it can do many
different things.
Python is often used for:
• Web development
• Scientific programming
• Desktop GUIs
• Network programming
• Game programming
3. SYSTEM ANALYSIS
3.1 EXISTING SYSTEM:
Every day, millions of users review and rate tourist places on tourism websites.
Sentiment analysis can be performed on these reviews to gauge the popularity
of tourist places. Based on the sentiment analysis results, tourists can easily
decide which destination to visit.
Prior work has elaborated the different levels of sentiment analysis: document
level, sentence level and aspect level. The approaches used for sentiment
analysis are machine learning based, rule based and lexicon based. Within the
machine learning approach, the techniques described in detail include SVM
(Support Vector Machine), NB (Naive Bayes), Maximum Entropy, K-NN and
Weighted K-NN, as well as multilingual sentiment analysis and feature-driven
sentiment analysis. The various approaches have been compared, and their
advantages and disadvantages described in detail. Across comparison parameters
such as performance, efficiency and accuracy, the machine learning approach was
found to give the best results. In [2], Twitter sentiment analysis was
performed on movie reviews using supervised machine learning algorithms such as
support vector machine, naive Bayes and maximum entropy, with feature
extraction techniques such as unigram, bigram and hybrid (unigram + bigram).
The authors concluded that SVM with the hybrid feature extractor outperforms
the other techniques.
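The unigram, bigram and hybrid feature schemes mentioned above can be sketched in a few lines. This is an illustrative sketch with hypothetical function names, not the code used in the cited study.

```python
def ngrams(tokens, n):
    """Return the n-grams of a token list, in order, as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_features(text, mode="hybrid"):
    tokens = text.lower().split()
    unigrams = ngrams(tokens, 1)
    bigrams = ngrams(tokens, 2)
    if mode == "unigram":
        return unigrams
    if mode == "bigram":
        return bigrams
    if mode == "hybrid":
        # Hybrid simply combines both feature sets.
        return unigrams + bigrams
    raise ValueError(f"unknown mode: {mode}")

features = extract_features("not a good movie")
print(features)
```

Bigrams such as "a good" or "good movie" retain some word-order context that isolated unigrams lose, which is one plausible reason a hybrid extractor can help.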
3.2 PROPOSED SYSTEM:
In this paper, sentiment analysis has been implemented using a machine
learning approach. The dataset has been collected from various tourism
review websites. We performed a comparative study of two feature
extraction algorithms, CountVectorization and TFIDFVectorization,
combined with three classification algorithms: Naive Bayes (NB), Support
Vector Machine (SVM) and Random Forest (RF). The performance of the
algorithms has been compared using accuracy, recall, precision and
f1-score. In our experiments, TFIDFVectorization improved the accuracy
of the classification algorithms compared to CountVectorization on the
given review dataset. For sentiment classification of tourist place
reviews, TFIDFVectorization+RF gave the highest accuracy, 86%, on the
research dataset used.
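The evaluation parameters named above can be computed from true and predicted labels as in the following sketch. The function name `binary_metrics` and the example labels are assumptions for illustration; they are not the project's data or results.

```python
def binary_metrics(y_true, y_pred, positive="positive"):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up labels for eight reviews.
y_true = ["positive", "positive", "positive", "negative",
          "negative", "negative", "positive", "negative"]
y_pred = ["positive", "positive", "negative", "negative",
          "positive", "negative", "positive", "negative"]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

Here there are 3 true positives, 1 false positive and 1 false negative, and 6 of the 8 predictions are correct, so all four scores equal 0.75.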
4. FEASIBILITY STUDY
The feasibility of the project is analyzed in this phase, and a business
proposal is put forth with a very general plan for the project and some cost
estimates. During system analysis, the feasibility study of the proposed system
is carried out. This is to ensure that the proposed system is not a burden to
the company. For feasibility analysis, some understanding of the major
requirements for the system is essential.
• Tool : PyCharm
• Database : MySQL
• Server : Flask
6. SYSTEM DESIGN
[Diagram: flow with a check step (Yes / No; unauthorized user) and the Comparison Graph and Your Review modules]
GOALS:
The Primary goals in the design of the UML are as follows:
1. Provide users a ready-to-use, expressive visual modeling Language so
that they can develop and exchange meaningful models.
2. Provide extendibility and specialization mechanisms to extend the core
concepts.
3. Be independent of particular programming languages and development
processes.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of the OO tools market.
6. Integrate best practices.
USE CASE DIAGRAM:
A use case diagram in the Unified Modeling Language (UML) is a type
of behavioral diagram defined by and created from a Use-case analysis. Its
purpose is to present a graphical overview of the functionality provided by a
system in terms of actors, their goals (represented as use cases), and any
dependencies between those use cases. The main purpose of a use case diagram
is to show what system functions are performed for which actor. Roles of the
actors in the system can be depicted.
CLASS DIAGRAM:
In software engineering, a class diagram in the Unified Modeling Language
(UML) is a type of static structure diagram that describes the structure of a
system by showing the system's classes, their attributes, operations (or
methods), and the relationships among the classes. It explains which class
contains information.
SEQUENCE DIAGRAM:
A sequence diagram in Unified Modeling Language (UML) is a kind of
interaction diagram that shows how processes operate with one another and in
what order. It is a construct of a Message Sequence Chart. Sequence diagrams
are sometimes called event diagrams, event scenarios, and timing diagrams.
ACTIVITY DIAGRAM:
Activity diagrams are graphical representations of workflows of stepwise
activities and actions with support for choice, iteration and concurrency. In the
Unified Modeling Language, activity diagrams can be used to describe the
business and operational step-by-step workflows of components in a system. An
activity diagram shows the overall flow of control.
7. IMPLEMENTATION
7.1 MODULES:
Upload Tourism Reviews Dataset
Preprocess Dataset
TFIDF Feature Extraction
Count Vectorization Features Extraction
Run SVM, Naive Bayes And Random Forest With TFIDF
Run SVM, Naive Bayes And Random Forest With CountVector
Comparison Graph
Your Review
Predict Sentiments from Review
Search Places
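As an illustration of what the "Preprocess Dataset" module might involve, the sketch below lowercases a review, strips digits and punctuation, and drops stop words. The report does not describe the actual preprocessing steps, so the function name, the steps and the tiny stop-word list are all assumptions.

```python
import re

# Hypothetical stop-word list for illustration; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "to", "of", "it", "this"}

def preprocess(review):
    text = review.lower()
    # Replace anything that is not a lowercase letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)

cleaned = preprocess("The stepwell was AMAZING!!! Worth a visit, 10/10.")
print(cleaned)  # stepwell amazing worth visit
```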
MODULES DESCRIPTION:
7.2 SAMPLE CODE
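As an illustration of the kind of code behind the sentiment prediction module, the following is a minimal sketch of a multinomial Naive Bayes classifier written from scratch. It is not the project's implementation: the class name `NaiveBayes`, the training sentences and the add-one smoothing are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in text.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training data, invented for illustration.
train_texts = [
    "beautiful place great view",
    "amazing architecture must visit",
    "dirty crowded bad experience",
    "bad maintenance not worth",
]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("great view amazing place"))
print(model.predict("bad crowded"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and add-one smoothing keeps a word unseen in one class from forcing that class's probability to zero.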
8. SYSTEM TESTING
TYPES OF TESTS
Unit testing:
Unit testing involves the design of test cases that validate that the internal
program logic is functioning properly, and that program inputs produce valid
outputs. All decision branches and internal code flow should be validated. It is
the testing of individual software units of the application. It is done after the
completion of an individual unit and before integration. This is structural
testing that relies on knowledge of the unit's construction and is invasive. Unit
tests perform basic tests at component level and test a specific business
process, application, and/or system configuration. Unit tests ensure that each
unique path of a business process performs accurately to the documented
specifications and contains clearly defined inputs and expected results.
Integration testing:
Integration tests are designed to test integrated software components to
determine if they actually run as one program. Testing is event driven and is
more concerned with the basic outcome of screens or fields. Integration tests
demonstrate that although the components were individually satisfactory, as
shown by successful unit testing, the combination of components is correct
and consistent. Integration testing is specifically aimed at exposing the
problems that arise from the combination of components.
Functional test:
Functional tests provide systematic demonstrations that functions tested
are available as specified by the business and technical requirements, system
documentation, and user manuals.
Functional testing is centered on the following items:
Valid Input : identified classes of valid input must be accepted.
Invalid Input : identified classes of invalid input must be rejected.
Functions : identified functions must be exercised.
Output : identified classes of application outputs must be
exercised.
Systems/Procedures : interfacing systems or procedures must be invoked.
Organization and preparation of functional tests is focused on
requirements, key functions, or special test cases. In addition, systematic
coverage of identified business process flows, data fields, predefined
processes, and successive processes must be considered for testing. Before
functional testing is complete, additional tests are identified and the effective
value of current tests is determined.
System Test:
System testing ensures that the entire integrated software system meets
requirements. It tests a configuration to ensure known and predictable results.
An example of system testing is the configuration oriented system integration
test. System testing is based on process descriptions and flows, emphasizing
pre-driven process links and integration points.
White Box Testing:
White Box Testing is testing in which the software tester has
knowledge of the inner workings, structure and language of the software, or at
least its purpose. It is used to test areas that cannot be reached
from a black box level.
Black Box Testing:
Black Box Testing is testing the software without any knowledge of the
inner workings, structure or language of the module being tested. Black box
tests, as most other kinds of tests, must be written from a definitive source
document, such as a specification or requirements document. It is testing in
which the software under test is treated as a black box: you cannot “see” into
it. The test provides inputs and responds to outputs without considering how
the software works.
8.1 Unit Testing:
Unit testing is usually conducted as part of a combined code and unit test
phase of the software lifecycle, although it is not uncommon for coding and unit
testing to be conducted as two distinct phases.
Test strategy and approach:
Field testing will be performed manually and functional tests will be
written in detail.
Test objectives:
All field entries must work properly.
Pages must be activated from the identified link.
The entry screen, messages and responses must not be delayed.
Features to be tested
Verify that the entries are of the correct format
No duplicate entries should be allowed
All links should take the user to the correct page.
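The format and duplicate checks listed above could be automated with Python's standard `unittest` module, as in this sketch. The validator `is_valid_entry`, the helper `add_entry` and their field rules are hypothetical; the report states that field testing was performed manually.

```python
import re
import unittest

# Hypothetical review-entry validator; the field rules are assumptions for illustration.
def is_valid_entry(place, review):
    return bool(re.fullmatch(r"[A-Za-z ]{2,50}", place)) and bool(review.strip())

def add_entry(store, place, review):
    """Add an entry unless it is invalid or already present (case-insensitive)."""
    entry = (place.strip().lower(), review.strip().lower())
    if not is_valid_entry(place, review) or entry in store:
        return False
    store.add(entry)
    return True

class EntryTests(unittest.TestCase):
    def test_format(self):
        self.assertTrue(is_valid_entry("Chand Baori", "Stunning stepwell"))
        self.assertFalse(is_valid_entry("12345", "ok"))          # digits not allowed
        self.assertFalse(is_valid_entry("Chand Baori", "   "))   # empty review

    def test_no_duplicates(self):
        store = set()
        self.assertTrue(add_entry(store, "Chand Baori", "Stunning stepwell"))
        self.assertFalse(add_entry(store, "chand baori", "stunning stepwell"))
        self.assertEqual(len(store), 1)

# Run the tests programmatically and keep the result object for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(EntryTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```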
8.2 Integration Testing
Software integration testing is the incremental integration testing of two
or more integrated software components on a single platform to expose failures
caused by interface defects.
The task of the integration test is to check that components or software
applications (for example, components in a software system or, one step up,
software applications at the company level) interact without error.
Test Results: All the test cases mentioned above passed successfully. No
defects encountered.
The output below also shows suggestions in the text box.
The screen below shows separate tables for the searched place 'Chand Baori', with all positive
reviews listed.
12. CONCLUSION
From this research study, we can infer that TFIDFVectorization outperformed
the CountVectorization feature extraction algorithm by increasing the accuracy
of classification. However, feature extraction using TFIDFVectorization
requires more execution time than CountVectorization. In this research, the
classification algorithms Support Vector Machine (SVM), Naive Bayes (NB) and
Random Forest (RF) have been used. It was found that TFIDFVectorization+RF
outperformed the other combinations on the basis of several evaluation
parameters: accuracy, precision, recall and f1-score.
13. BIBLIOGRAPHY
1. M.D.Devika, C.Sunitha, Amal Ganesh, “Sentiment Analysis: A
Comparative Study on Different Approaches”, ScienceDirect, Fourth
International Conference on Recent Trends in Computer Science
Engineering, https://doi.org/10.1016/j.procs.2016.05.124
10. Xing Fang and Justin Zhan, “Sentiment analysis using product review
data”, Springer, Journal of Big Data (2015) 2:5, DOI
10.1186/s40537-015-0015-2
17. https://www.tripadvisor.in/
18. https://www.mouthshut.com/