
DEFENCE SERVICES ACADEMY

DEPARTMENT OF COMPUTER TECHNOLOGY

Twitter Sentiment Analysis Project using NLP

By
Captain Lin Htet Paing

SECOND SEMESTER
(COURSE PROJECT)

MARCH, 2023
PYIN OO LWIN
ABSTRACT
With the advancement of web technology and its growth, there is a huge
volume of data present in the web for internet users and a lot of data is generated
too. The Internet has become a platform for online learning, exchanging ideas,
and sharing opinions. Social networking sites like Twitter, Facebook, and
Google+ are rapidly gaining popularity as they allow people to share and express
their views about topics, have discussions with different communities, or post
messages across the world. There has been a lot of work in the field of sentiment
analysis of Twitter data. This survey focuses mainly on sentiment analysis of
Twitter data which is helpful to analyze the information in the tweets where
opinions are highly unstructured, heterogeneous and are either positive or
negative, or neutral in some cases. Sentiment analysis (also known as opinion
mining or emotion AI) is the use of natural language processing, text analysis,
computational linguistics, and biometrics to systematically identify, extract,
quantify, and study affective states and subjective information. In this paper,
we present research on Twitter data streams using various machine learning
algorithms: Naive Bayes, Logistic Regression, Random Forest, Decision Tree,
and Support Vector Machine. We also discuss general challenges and
applications of sentiment analysis on Twitter. Random Forest proved to be the
best model in this project, with an accuracy of 93.94%.

Keywords: Machine Learning, Support Vector Machine, Naïve Bayes, Logistic
Regression, Random Forest, Decision Tree
TABLE OF CONTENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
CHAPTER TITLE
1 INTRODUCTION
1.1 Problem Statement
1.2 Objectives
1.3 Summary
2 BACKGROUND THEORY
2.1 Sentiment Analysis
2.2 Twitter Sentiment Analysis
2.3 Natural Language Processing (NLP)
2.3.1 NLP Terminology
2.3.2 Application of NLP
2.3.3 Processes in NLP
2.4 Different Approaches for Sentiment Analysis
2.4.1 Lexicon-Based Approach
2.4.2 Machine Learning Approach
2.5 Types of Machine Learning Algorithms
2.5.1 Logistic Regression
2.5.2 Naive Bayes
2.5.3 Random Forest
2.5.4 Decision Tree
2.5.5 Support Vector Machine
2.6 Feature Extraction Method
2.7 Literature Review
2.8 Summary
3 PROPOSED SYSTEM AND IMPLEMENTATION
3.1 Dataset Description
3.2 Proposed System
3.3 Steps of Implementation
3.4 Data Collection
3.5 Data Exploration
3.6 Summary
4 EXPERIMENTAL RESULTS AND CONCLUSION
4.1 Implementation
4.1.1 Import Twitter Datasets in Python
4.1.2 Value Counts in the Sentiment Column
4.1.3 Removing Unused Columns from the Dataset
4.1.4 Entity vs Sentiment Relation
4.1.5 Visualizing the Sentiment Feature (Target)
4.1.6 Dropping Null and Duplicated Values
4.1.7 Dataset Description After Dropping Unused Columns
4.1.8 Convert to Lower Case
4.1.9 Remove Punctuation
4.1.10 Remove Stopwords
4.1.11 Tokenizing the Texts
4.1.12 Remove Words Shorter than Four Characters
4.1.13 Stemming
4.1.14 Lemmatize the Texts
4.1.15 Return Statements
4.1.16 Feature Extraction
4.1.17 Plot WordCloud Target
4.1.18 Train and Test Splitting
4.1.19 Convert Lemmatized Words to Tf-Idf Feature Vectors
4.1.20 Using Logistic Regression
4.1.21 Using Random Forest Classifier
4.1.22 Using Naïve Bayes Classifier
4.1.23 Using DecisionTreeClassifier
4.1.24 Using Support Vector Machine
4.1.25 Comparing Models and Predicting Texts
4.2 Limitation of the Sentiment Analysis System
4.3 Conclusion
REFERENCES
APPENDICES
LIST OF FIGURES
Figure 2. 1 Sentiment Analysis
Figure 2. 2 Twitter Sentiment Analysis in NLP
Figure 2. 3 Natural Language Processing (NLP)
Figure 2. 4 Processes in NLP
Figure 2. 5 Sentiment Analysis using machine learning
Figure 2. 6 Logistic Regression Classification System
Figure 2. 7 Naïve Bayes Classification System
Figure 2. 8 Random Forest Classification
Figure 2. 9 Schematic Diagram for Decision Tree Classification
Figure 3. 1 Training Dataset for Twitter
Figure 3. 2 The Proposed System of Sentiment Analysis
Figure 4. 1 Training Twitter Datasets Description
Figure 4. 2 Values in Sentiment Description
Figure 4. 3 Removing the Tweet ID and Entity Column
Figure 4. 4 Entity and Sentiment Relation
Figure 4. 5 Visualizing Sentiment Features
Figure 4. 6 Drop and Duplicate Description
Figure 4. 7 Dataset Description After Dropping Unused Columns
Figure 4. 8 Convert to Lower Case Description
Figure 4. 9 Remove Punctuation
Figure 4. 10 Remove Stopwords
Figure 4. 11 Tokenizing Description
Figure 4. 12 Remove Words Description
Figure 4. 13 Stemming Description
Figure 4. 14 Lemmatization Description
Figure 4. 15 Return Statement Description
Figure 4. 16 Feature Extraction
Figure 4. 17 WordCloud Description
Figure 4. 18 Train-Test Splitting Description
Figure 4. 19 Tf-Idf Feature Description
Figure 4. 20 Logistic Regression Accuracy Description
Figure 4. 21 Random Forest Accuracy Description
Figure 4. 22 Naïve Bayes Accuracy Description
Figure 4. 23 Decision Tree Classifier Accuracy Description
Figure 4. 24 Support Vector Machine Accuracy Description
Figure 4. 25 Models Comparing and Predicting the Texts
CHAPTER 1
INTRODUCTION
Sentiment analysis is a technique through which you can analyze a piece
of text to determine the sentiment behind it. It combines machine learning and
natural language processing (NLP) to achieve this. Using basic sentiment
analysis, a program can understand whether the sentiment behind a piece of text
is positive, negative, or neutral. It is a powerful technique in artificial
intelligence that has important business applications. For example, we can use
sentiment analysis to analyze customer feedback. After collecting that feedback
through various mediums like Twitter and Facebook, we can run sentiment
analysis algorithms on those text snippets to understand our customers' attitudes
towards our product. Sentiment Analysis is a type of classification where the
data is classified into different classes. With the help of sentiment analysis, we
can find out the nature of opinion that is reflected in documents, websites, social
media feeds, etc. These classes can be binary in nature (positive or negative) or,
they can have multiple classes (happy, sad, angry, etc.).
In this project, one such social networking site, Twitter, which is among the
largest of them, is taken into account. Looking at the statistics, monthly active
users number about 316 million, and on average about 500 million tweets are
sent daily (Twitter, 2016). Because these volumes are extremely high, tweet
content is restricted to a minimal length, and because the text has no uniform
structure, social networking sites such as Twitter and those similar to it pose
challenges for classifiers analyzing their data.

1.1 Problem Statement


We try to implement an NLP Twitter sentiment analysis model that helps
to overcome the challenges of identifying the sentiments of the tweets. The key
point is to find the sentiment of the text data. The task is to identify if the tweets
have a Negative, Positive, or Neutral sentiment towards the company or entity.
We want to compare sentiment from one quarter to the next to see if we need to
take action. Despite the availability of software to extract data regarding a
person’s sentiment on a specific product or service, organizations and other data
workers still face issues regarding data extraction.
1.2 Objectives
The Objectives of this project are as follows.
(a) To build a sentiment analysis model that categorizes words based
on their sentiments.
(b) To analyze Twitter data.
(c) To monitor and analyze social phenomena, predict potentially
dangerous situations, and determine the general mood of the
blogosphere.
1.3 Summary
The introduction and the basic concept of Sentiment analysis are briefly
described in this chapter. Sentiment analysis provides a way to understand the
attitudes and opinions expressed in texts. The background theory of sentiment
analysis is explained in the next chapter.

CHAPTER 2
BACKGROUND THEORY
In this chapter, a definition of sentiment analysis and its methodologies
are presented. Twitter is a good source of information for
individuals' opinions. Twitter receives about 500 million tweets a day, where
people share comments regarding a wide range of topics. Many consumers take
to Twitter to give their opinions on current events, including real-time affairs.
By performing sentiment analysis on these tweets, one can determine the
polarity and inclination of a population towards specific topics, items, or entities.
Retweeting is a largely used mechanism for information diffusion on Twitter. It
has emerged as a simple yet powerful way of circulating information in the
Twitter social realm.
2.1 Sentiment Analysis
Sentiment analysis is contextual mining of text that identifies and extracts
subjective information in the source material and helps a business to understand
the social sentiment of their brand, product, or service while monitoring online
conversations. However, analysis of social media streams is usually restricted to
just basic sentiment analysis and count-based metrics. This is akin to just
scratching the surface and missing out on those high-value insights that are
waiting to be discovered. With the recent advances in deep learning, the ability of
algorithms to analyze text has improved considerably. Creative use of advanced
artificial intelligence techniques can be an effective tool for doing in-depth
research. We believe it is important to classify incoming customer conversations
about a brand along two lines: the key aspects of a brand’s product and service
that customers care about, and users’ underlying intentions and reactions
concerning those aspects. These basic concepts, when used in

combination, become a very important tool for analyzing millions of brand


conversations with human-level accuracy.

Figure 2. 1 Sentiment Analysis


2.2 Twitter Sentiment Analysis
Twitter sentiment analysis, which is a subset of social media sentiment
analysis, helps firms to understand their audience on social channels, stay on top
of what’s being said about their brand – and their rivals – and uncover new
trends in the market by carefully listening to the voice of the consumer on
Twitter.
On Twitter, countless discussions are going on, and these talks have no
boundaries. Politicians use Twitter to speak about their agendas and progress
with the audience. Businesses and brands use Twitter to communicate with their
customers in real time.
Unfortunately, the amount of data generated is so voluminous that apart
from user data, there is a large amount of non-useful data, often known as noise.
This is why it’s difficult for businesses to know which tweets to respond to
initially. Twitter sentiment analysis fills this void.

The Automated Machine Learning Sentiment Analysis Model has been


developed to understand customer perception from the data collected from
Twitter. This is why sentiment analysis has become a crucial tool for social
media marketing strategies.

Figure 2. 2 Twitter Sentiment Analysis in NLP


2.3 Natural Language Processing (NLP)
NLP stands for Natural Language Processing, which is a part of Computer
Science, Human language, and Artificial Intelligence. It is the technology that is
used by machines to understand, analyze, manipulate, and interpret human
languages.

Figure 2. 3 Natural Language Processing (NLP)


NLP helps developers organize knowledge for performing tasks such
as translation, automatic summarization, Named Entity Recognition (NER),

speech recognition, relationship extraction, and topic segmentation. NLP


combines computational linguistics (rule-based modeling of human language)
with statistical, machine learning, and deep learning models. Together, these
technologies enable computers to process human language in the form of text or
voice data and to ‘understand’ its full meaning, complete with the speaker or
writer’s intent and sentiment.
2.3.1 NLP Terminology
(a) Phonology − It is the study of organizing sound systematically.
(b) Morphology − It is a study of the construction of words from
primitive meaningful units.
(c) Morpheme − It is a primitive unit of meaning in a language.
(d) Syntax − It refers to arranging words to make a sentence. It also
involves determining the structural role of words in the sentence
and in phrases.
(e) Semantics − It is concerned with the meaning of words and how to
combine words into meaningful phrases and sentences.
(f) Pragmatics − It deals with using and understanding sentences in
different situations and how the interpretation of the sentence is
affected.
(g) Discourse − It deals with how the immediately preceding sentence
can affect the interpretation of the next sentence.
(h) World Knowledge − It includes the general knowledge about the
world.
2.3.2 Application of NLP
The following are some applications of NLP:
(a) Question Answering
(b) Spam Detection
(c) Sentiment Analysis
(d) Machine Translation

(e) Spelling Correction


(f) Speech Recognition
(g) Chatbot
(h) Information Extraction
(i) Natural Language Understanding (NLU)
2.3.3 Processes in NLP
There are nine main processes in NLP:
(a) Sentence Segmentation
(b) Word Tokenization
(c) Stemming
(d) Lemmatization
(e) Identifying Stop Words
(f) Dependency Parsing
(g) Part of Speech (POS) Tags
(h) Named Entity Recognition (NER)
(i) Chunking

Figure 2. 4 Processes in NLP
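
As a concrete illustration of several of these processes, the following minimal sketch (assuming the NLTK library and its standard resource downloads) tokenizes a sentence, removes stop words, and then stems and lemmatizes the remaining tokens:

import nltk
from nltk import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download('punkt')      # tokenizer models
nltk.download('stopwords')  # stop word lists
nltk.download('wordnet')    # lemmatizer dictionary

sentence = "The players were running quickly across the fields"
tokens = word_tokenize(sentence.lower())       # word tokenization
stop = set(stopwords.words('english'))
tokens = [t for t in tokens if t not in stop]  # stop word removal
print([PorterStemmer().stem(t) for t in tokens])           # stems, e.g. 'running' -> 'run'
print([WordNetLemmatizer().lemmatize(t) for t in tokens])  # lemmas, e.g. 'fields' -> 'field'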


2.4 Different Approaches for Sentiment Analysis
There are many approaches used for sentiment analysis on linguistic data,
and the choice of approach depends on the nature of the data and the

platform you are working on. Most research carried out in the field of sentiment
analysis employs lexicon-based analysis or machine-learning techniques.
Machine learning techniques control the data processing by the use of machine
learning algorithms and by classifying the linguistic data by representing them in
vector form (Olsson et al. 2009). On the other hand, the Lexicon-based (also
called Dictionary-based) approach classifies the linguistic data using a dictionary
lookup database. During this classification, it computes sentence or document-
level sentiment polarity using lexicon databases for processing linguistic data
like WordNet, SentiWordNet, and Treebanks. In this section, a brief discussion
of lexicon-based and Machine Learning approaches has been outlined.
2.4.1 Lexicon-Based Approach
The lexicon-based approach predicts the sentiments by using lexical
databases like SentiWordNet and WordNet. It obtains a score for each word in
the sentence or document and annotates using the features from the lexicon
database that are present. It derives text polarity based on a set of words, each of
which is annotated with the weight and extracts information that contributes to
concluding the overall sentiments of the text. Also, it is necessary to pre-process
data before assigning the weight to the words.
Moreover, a lexicon dictionary or database contains opinionated words
classified as positive or negative, together with a description of the context in
which each word occurs. Each word in the document is assigned a numeric
score, an average score is computed by summing all the numeric scores, and a
sentiment polarity is assigned to the document. There are
three general-purpose lexicons. They are:
(a) AFINN
(b) Bing
(c) Loughran
These three lexicons make use of unigrams. A unigram is an n-gram
consisting of a sequence of one item, that is, a single word collected from the
given textual data. The AFINN lexicon scores words on a scale from -5 to +5,
where more negative scores correspond to negative sentiment and more positive
scores to positive sentiment. The Bing lexicon, on the other hand, classifies
sentiment into a binary category of negative or positive. Finally, the Loughran
lexicon is designed for analyzing shareholders’ reports. In this project, however,
we extract sentiments from our data with the machine learning approach
described next rather than with a lexicon.
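
To make the lexicon idea concrete, the sketch below scores a sentence against a tiny hand-made AFINN-style dictionary; the words and weights here are invented for illustration and are not the real AFINN entries:

# Hypothetical AFINN-style weights (illustrative, not the real lexicon)
lexicon = {"love": 3, "great": 3, "good": 2, "bad": -2, "terrible": -3, "hate": -3}

def lexicon_score(text):
    words = text.lower().split()
    score = sum(lexicon.get(w, 0) for w in words)  # unigram lookup; unknown words score 0
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_score("I love this great phone"))  # positive (3 + 3 = 6)
print(lexicon_score("the battery is terrible"))  # negative (-3)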
2.4.2 Machine Learning Approach
The machine learning approach is widely seen in the literature on sentiment
analysis. In this approach, the words in a sentence are represented as vectors
and analyzed using different machine learning algorithms such as Naïve Bayes,
SVM, and Maximum Entropy. The data is prepared so that it can be fed to
machine learning algorithms: each tweet is annotated with its total positive and
negative scores, and a sentiment label (‘POSITIVE’, ‘NEGATIVE’, or
‘NEUTRAL’) is assigned to it. These labeled datasets are then used to train
machine learning algorithms and to measure the accuracy, performance, and
reliability of the results obtained from lexicon-based sentiment analysis. Figure
2.5 shows the most abstract view of sentiment analysis using machine learning.

Figure 2. 5 Sentiment Analysis using machine learning


2.5 Types of Machine Learning Algorithms
The proposed system uses the following machine learning algorithms:
(a) Logistic Regression
(b) Naive Bayes
(c) Random Forest
(d) Decision Tree
(e) Support Vector Machine
2.5.1 Logistic Regression
Logistic regression is a statistical analysis technique that uses previous
observations from a data set to predict a binary outcome, such as yes or no. A
logistic regression model predicts a dependent data variable by analyzing the
correlation between one or more pre-existing independent variables.
For instance, logistic regression could be used to forecast whether a
candidate for office will win or lose, or whether a high school student will be
admitted to a particular institution. These binary outcomes allow
straightforward decisions between two alternatives. A logistic regression model
can take multiple input criteria into consideration.

Figure 2. 6 Logistic Regression Classification System


Logistic regression has become an important tool in the discipline of
machine learning. It allows algorithms used in machine learning applications to
classify incoming data based on historical data. As additional relevant data
comes in, the algorithms get better at predicting classifications within data sets.
Logistic regression can also play a role in data preparation activities by allowing
data sets to be put into specifically predefined buckets during the extract,
transform, and load (ETL) process to stage the information for analysis.
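
A minimal sketch of logistic regression for text classification with scikit-learn; the toy texts and labels below are invented for illustration:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["great product, love it", "worst purchase ever",
         "really happy with this", "terrible, do not buy"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

vec = TfidfVectorizer()
X = vec.fit_transform(texts)               # texts -> TF-IDF feature vectors
clf = LogisticRegression().fit(X, labels)  # learn a weight per vocabulary term

new = vec.transform(["love this, really great"])
print(clf.predict(new))        # predicted class
print(clf.predict_proba(new))  # class probabilities from the logistic function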
2.5.2 Naive Bayes
The Naïve Bayes algorithm is a supervised learning algorithm based on
the Bayes theorem and used for solving classification problems. It is mainly
used in text classification, which involves high-dimensional training datasets.
One of the most straightforward and efficient classification algorithms, the
Naive Bayes classifier aids in the development of fast machine learning models
capable of making prompt predictions. As a probabilistic classifier, it makes
predictions based on the likelihood that an object belongs to a class. Spam
filtration, sentiment analysis, and classification are a few common uses of the
Naive Bayes algorithm.

Bayes’ Theorem finds the probability of an event occurring given the
probability of another event that has already occurred. The formula of Bayes’
theorem is shown in the equation:

\[ P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \]

where,
A, B = events
P(A | B) = probability of A given B is true
P(B | A) = probability of B given A is true
P(A), P(B) = the independent probabilities of A and B
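
In practice, scikit-learn's MultinomialNB estimates these probabilities from word counts. A minimal sketch on invented toy data:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["free prize click now", "meeting at noon tomorrow",
         "win free money now", "lunch with the team"]
labels = ["spam", "ham", "spam", "ham"]

vec = CountVectorizer()
X = vec.fit_transform(texts)         # word-count features
nb = MultinomialNB().fit(X, labels)  # learns P(word | class) and P(class)

test = vec.transform(["free money tomorrow"])
print(nb.predict(test))  # the class with the highest posterior P(class | words)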

Figure 2. 7 Naive Bayes Classification System


2.5.3 Random Forest
Several decision trees are grown and combined to form a "forest" in the
Random Forest supervised machine learning technique. It is possible to employ
Random Forest to solve both classification and regression problems. For a more
precise forecast, Random Forest produces numerous decision trees that are then
combined.
A random forest is a machine learning method for tackling classification
and regression issues. It makes use of ensemble learning, a method for solving
complicated issues by combining a number of classifiers. A random forest
algorithm contains many different decision trees, and the "forest" it creates is
trained via bagging, or bootstrap aggregation, an ensemble meta-algorithm that
increases the accuracy of machine learning algorithms. Based on the predictions
of the decision trees, the random forest algorithm determines the result: by
majority vote for classification, or by averaging the outputs of the individual
trees for regression. The accuracy of the result grows as the number of trees
increases.
The Gini index formula is shown in equation 2.2, and the weighted Gini
index formula in equation 2.3:

\[ \text{Gini}(T) = 1 - \sum_{i=1}^{n} P_i^2 \tag{2.2} \]

For a binary node this is \( 1 - P_{(+)}^2 - P_{(-)}^2 \), and the weighted Gini
index of a split is the size-weighted average of the Gini indices of its branches:

\[ \text{Weighted Gini Index} = \frac{N_{(+)}\,\text{Gini}_{(+)} + N_{(-)}\,\text{Gini}_{(-)}}{N_{(+)} + N_{(-)}} \tag{2.3} \]

where,
P_i = the probability of class i
P(+) = the probability of the positive class
P(−) = the probability of the negative class
N(+) = the total number of the positive class
N(−) = the total number of the negative class

The algorithm computes the Gini index of all the possible splits and
chooses the feature with the lowest Gini index for the root node. Then, for each
decision tree, the result is decided. Finally, the random forest classifier uses
majority voting over the trees to choose the best result.
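
The following sketch computes the Gini index of a node and the weighted Gini index of a split directly from the formulas above:

def gini(labels):
    # Gini(T) = 1 - sum of P_i^2 over the classes present in the node
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def weighted_gini(left, right):
    # Size-weighted average of the child nodes' Gini indices
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

node = ["+", "+", "+", "-", "-"]  # 3 positive, 2 negative
print(gini(node))                 # 1 - (0.6^2 + 0.4^2) = 0.48
print(weighted_gini(["+", "+", "+"], ["-", "-"]))  # pure children -> 0.0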

Figure 2. 8 Random Forest Classification


2.5.4 Decision Tree
A Decision Tree is a supervised Machine learning algorithm. It is used in
both classification and regression algorithms. The decision tree is like a tree with
nodes. The branches depend on a number of factors. It splits data into branches
like these till it achieves a threshold value. A decision tree consists of the root
nodes, children’s nodes, and leaf nodes.
Let’s understand decision tree methods by taking one real-life
scenario. Imagine that you play football every Sunday and you always invite
your friend to come and play with you. Sometimes your friend actually comes
and sometimes he doesn’t. Whether or not he comes depends on numerous
factors, like weather, temperature, wind, and fatigue. We start to take all of
these features into consideration and begin tracking them alongside your
friend’s decision whether to come to play or not.
You can use this data to predict whether or not your friend will come to
play football, and the technique you could use is a decision tree. Figure 2.9
shows what such a decision tree looks like after implementation.

Every decision tree consists of the following elements:
(a) Nodes: the points where the tree splits according to the value of
some attribute/feature of the dataset.
(b) Edges: they direct the outcome of a split to the next node. In the
figure, there are nodes for features like outlook, humidity, and
windy, and there is an edge for each potential value of each of
those attributes/features.
(c) Root: the node where the first split takes place.
(d) Leaves: the terminal nodes that predict the outcome of the
decision tree.

Figure 2. 9 Schematic Diagram for Decision Tree Classification
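
A minimal sketch of the football scenario with scikit-learn's DecisionTreeClassifier; the weather records below are invented for illustration:

from sklearn.tree import DecisionTreeClassifier

# Invented records: [outlook (0 = sunny, 1 = rain), windy (0/1), temperature in C]
X = [[0, 0, 25], [0, 1, 22], [1, 0, 18], [1, 1, 15], [0, 0, 30], [1, 1, 10]]
y = [1, 1, 0, 0, 1, 0]  # 1 = the friend comes to play, 0 = he stays home

tree = DecisionTreeClassifier(random_state=1).fit(X, y)
print(tree.predict([[0, 0, 27]]))  # sunny, calm, warm: the tree predicts he comes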


2.5.5 Support Vector Machine
SVM stands for Support Vector Machines. It is a popular machine
learning algorithm used for classification and regression analysis. SVM is a
powerful and versatile algorithm that can be used for a variety of applications,
including image classification, text classification, and bioinformatics.

The main idea behind SVM is to find a hyperplane that maximally


separates the data points of different classes in a high-dimensional space. The
hyperplane is defined by a set of weights and biases, which are optimized during
the training process. The hyperplane is chosen so that it maximizes the margin
between the two closest data points of different classes. The data points that lie
closest to the hyperplane are called support vectors, and they are used to define
the hyperplane.
SVM can handle both linearly separable and non-linearly separable
datasets by using different types of kernels, such as linear, polynomial, and
radial basis function (RBF) kernels. The choice of kernel depends on the nature
of the data and the complexity of the decision boundary that needs to be learned.
SVM has several advantages, including:
(a) High accuracy and performance on a wide range of datasets.
(b) Ability to handle high-dimensional data with many features.
(c) Robustness to outliers and noise in the data.
(d) Flexibility to handle both linearly and non-linearly separable data.
However, SVM also has some disadvantages, such as:
(a) Sensitivity to the choice of kernel and its parameters
(b) Difficulty in interpreting the learned model
(c) High computational and memory requirements for large datasets
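
A minimal sketch of trying the different kernels with scikit-learn's SVC on an invented two-class set of 2-D points:

from sklearn import svm

X = [[0, 0], [1, 1], [1, 0], [0, 1], [3, 3], [4, 4], [4, 3], [3, 4]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # two well-separated groups

for kernel in ("linear", "poly", "rbf"):  # the kernel choice depends on the data
    clf = svm.SVC(kernel=kernel).fit(X, y)
    print(kernel, clf.predict([[0.5, 0.5], [3.5, 3.5]]))

print(clf.support_vectors_)  # the points closest to the separating hyperplane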
2.6 Feature Extraction Method
Feature extraction is a process of dimensionality reduction by which an
initial set of raw data is reduced to more manageable groups for processing. A
characteristic of these large data sets is a large number of variables that require a
lot of computing resources to process. Feature extraction is the name for
methods that select and/or combine variables into features, effectively reducing
the amount of data that must be processed, while still accurately and completely
describing the original data set. The process of feature extraction is useful when
you need to reduce the number of resources needed for processing without losing

important or relevant information. Feature extraction can also reduce the amount
of redundant data for a given analysis. Also, the reduction of the data and the
machine’s efforts in building variable combinations (features) facilitate the
speed of learning and generalization steps in the machine learning process.
Data scientists use many feature extraction methods to tap into the value
of raw data sources. Let’s look at three of the most common and how they’re
used to extract data useful for machine learning applications.
(a) Image Processing
(b) Bag of Words
(c) Autoencoder
In this project, we are using the TF-IDF feature extraction method.
TF-IDF (term frequency-inverse document frequency) is a statistical technique
that assesses how relevant a word is to a document within a collection of
documents. This is achieved by multiplying a word’s frequency in a document
by its inverse document frequency over a group of documents. It is highly
useful for scoring words in machine learning algorithms for Natural Language
Processing (NLP) and has a wide range of applications, with automated text
analysis being the most essential one. TF-IDF was developed for document
search and information retrieval. A word’s score increases with the frequency
with which it appears in a document, but is counterbalanced by the number of
documents in which the word appears. So, words that are common in every
document, such as this, what, and if, rank low even though they may appear
many times, since they don’t mean much to that document in particular.
However, if the word "Bug" appears frequently in one document but not
in another, it is presumably because it is highly relevant. For instance, if our goal
is to identify the subjects to which specific NPS replies belong, the term "bug" is
likely to be associated with the topic "reliability" because most responses
including that word will be related to that topic.
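
A minimal sketch of computing TF-IDF scores with scikit-learn over a tiny invented corpus, showing how a common word like 'the' scores lower than a discriminative word like 'bug':

from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

docs = ["the app has a bug", "the app is great", "fix the login bug"]
vec = TfidfVectorizer()
X = vec.fit_transform(docs)  # rows = documents, columns = vocabulary terms

scores = pd.DataFrame(X.toarray(), columns=vec.get_feature_names_out())
print(scores.round(2))  # 'the' appears everywhere, so its weight is relatively low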

2.7 Literature Review


Raksha Sharma and Yogesh Kumar Sharma [8] describe sentiment
analysis of Twitter data using machine learning techniques, analyzing the
effectiveness of different machine learning algorithms for this task. The authors
collected a dataset of tweets related to different topics and sentiments.
and sentiments. They preprocessed the data by removing stop words, stemming,
and converting to lowercase. They then used various machine learning
algorithms, including Naive Bayes, Support Vector Machines, and Decision
Trees, to classify the tweets as positive, negative, or neutral. They evaluated the
accuracy of each algorithm using metrics such as precision, recall, and F1 score.
The study found that the Support Vector Machines (SVM) algorithm achieved
the highest accuracy of 87.52% for sentiment analysis of Twitter data, followed
by the Naive Bayes algorithm with an accuracy of 84.64%. The Decision Trees
algorithm achieved the lowest accuracy of 73.53%.
Kalpana and Vasuki [9] conduct a comparative study to evaluate the
effectiveness of various sentiment analysis techniques for Twitter data. They
collect a dataset of tweets related to different domains and preprocess the data by
removing stop words, stemming, and converting to lowercase. The authors use
Naive Bayes, Maximum Entropy, and Support Vector Machines algorithms to
classify the tweets as positive, negative, or neutral and evaluate the performance
of each technique using accuracy, precision, recall, and F1 score. The study
found that Support Vector Machines achieved the highest accuracy of 87.3% for
sentiment analysis of Twitter data, followed by Maximum Entropy with an
accuracy of 85.2%. The study concludes that the choice of technique depends on
the specific domain and the characteristics of the data, and recommends
appropriate preprocessing techniques and feature selection methods to improve
the performance of sentiment analysis techniques. This article provides valuable
insights for researchers and practitioners working in the area of sentiment
analysis of social media data.

2.8 Summary
The background theories of sentiment analysis and natural language
processing are briefly described in this chapter. There are two different
approaches to sentiment analysis: the lexicon-based and the machine
learning-based approach. In our proposed system, we use Logistic Regression,
Naive Bayes, Random Forest, Support Vector Machine, and Decision Tree as
machine learning algorithms, and the TF-IDF feature for feature extraction.
Twitter sentiment analysis, sentiment analysis, machine learning, and feature
extraction methods are discussed. The implementation of Twitter sentiment
analysis will be described in the next chapter.

CHAPTER 3
PROPOSED SYSTEM AND IMPLEMENTATION
In this section, we will discuss the Sentiment Analysis Project using NLP
techniques.
3.1 Dataset Description
The Twitter sentiment analysis dataset is taken from the Kaggle
website. The training dataset contains 61,691 rows and 4 columns. We focused
our selection on datasets that are: (i) publicly available to the research
community, (ii) manually annotated, providing a reliable set of judgements over
the tweets, and (iii) usable to evaluate sentiment analysis models, giving for
each tweet the entity or company to which it refers and an assigned sentiment.
The columns present in the dataset are:
(a) Sentiment: the polarity of the tweet (positive, negative, neutral, or irrelevant)
(b) Tweet ID: the unique id of the tweet
(c) Entity: the name of the mentioned entity
(d) Tweet_content: the text of the tweet
Figure 3.1 displays a sample of the dataset used in the project:

Figure 3. 1 Training Dataset for Twitter



3.2 Proposed System


The proposed system for the sentiment analysis system is as follows. The
Twitter dataset is first preprocessed and cleaned, and the cleaned text is
converted into feature vectors with TF-IDF feature extraction. The data is then
split into training and testing sets, the training data is used to build models with
the machine learning algorithms, and the best model is selected. New data is
preprocessed in the same way and passed to the best model to produce the
prediction result.

Figure 3. 2 The Proposed System of Sentiment Analysis


3.3 Steps of Implementation
The main steps are as follows.
(a) Step 1 Data Collection
(b) Step 2 Data Exploration
(c) Step 3 Data Cleaning
(d) Step 4 Feature Extraction
(e) Step 5 Train-Test Splitting
(f) Step 6 Model Building
(g) Step 7 Model Evaluation
(h) Step 8 Prediction
(i) Step 9 Result

3.4 Data Collection


Data collection is one of the most important and crucial aspects of the
sentiment analysis application. Given the wide adoption of machine learning
models, simply having large datasets for a domain-specific task does not ensure
superior performance. The performance of the model depends on the quality of
the dataset and its labelling/annotation. As ML models learn from the data they
are trained with, automatic predictions are likely to mirror the human
disagreement identified during annotation. As a result, having a proper
guideline to annotate data is also of utmost importance (Mohammad, S. 2016).
Ways to collect data for sentiment analysis:
(a) Using an API provided by a social media platform, which allows
data to be collected in a streaming fashion (see the sketch after
this list). Example: the Twitter API to extract tweets by hashtags,
or the News API to extract news by category from different news
publishers.
(b) Using web scrapers that crawl web data and collect specified
information from web pages (HTML documents). Example:
Scrapy and Beautiful Soup are Python web-scraping packages that
extract information from the web, such as news articles or
comments from blogs, by parsing HTML tags.
(c) Using a web browser plugin with which users can extract
information from any public website and export the data to the
desired file format. Example: Webscraper.io is a free extension for
the Google Chrome web browser.
(d) Using existing open-source repositories of data that are cleaned
and compiled and can be used directly. Example: Rotten
Tomatoes, IMDB movie reviews, Yelp, Amazon product reviews,
and Twitter tweets on Kaggle and other websites.
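
As an illustration of option (a), here is a minimal sketch using the Tweepy library for the Twitter API. The bearer token and query are placeholders, and access to the search endpoint depends on the developer account tier:

import tweepy  # third-party Twitter API client

# Placeholder credential; obtain a real bearer token from the Twitter developer portal
client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")

# Pull recent tweets matching a query (availability depends on the API access level)
response = client.search_recent_tweets(query="#machinelearning -is:retweet",
                                       max_results=10)
for tweet in response.data or []:
    print(tweet.text)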

3.5 Data Exploration


Data exploration is an important first step in any data analysis
project, including sentiment analysis on a Twitter dataset. Here are some
key steps we can take to explore the Twitter sentiment analysis dataset:
(a) Understand the dataset: Begin by understanding the structure of the
dataset. This includes the number of tweets, the date range covered
by the dataset, and the attributes of each tweet, such as the text of
the tweet, the user who posted it, and any metadata like retweets,
hashtags, or mentions.
(b) Data cleaning: Before exploring the data, it is important to clean it.
This involves removing any duplicate tweets, removing tweets that
contain irrelevant information, and correcting any misspellings or
other errors in the text.
(c) Text preprocessing: Once the data is cleaned, we should preprocess
the text of the tweets. This involves removing stop words (common
words like "the" and "and"), stemming (reducing words to their
base form, such as "running" to "run"), and converting all text to
lowercase.
(d) Visualization: Visualizing the data can help us identify patterns
and trends in the dataset. We can use tools like word clouds, bar
charts, and scatterplots to visualize the frequency of certain words,
the sentiment of tweets over time, and other relevant metrics.
(e) Sentiment analysis: Finally, we can perform sentiment analysis on
the dataset to classify each tweet as positive, negative, or neutral.
This can help us identify the overall sentiment of the dataset, as
well as any patterns in sentiment over time or across different user
groups.

By following these steps, we can gain a deeper understanding of our


Twitter sentiment analysis dataset and uncover insights that can inform our
analysis.
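
A minimal sketch of the first exploration steps with pandas, assuming the Kaggle file name and the column layout from Section 3.1:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# File name assumed from the Kaggle Twitter entity sentiment dataset
df = pd.read_csv("twitter_training.csv",
                 names=["Tweet ID", "entity", "sentiment", "Tweet_content"])
print(df.shape)                        # number of tweets and columns
print(df["sentiment"].value_counts())  # class balance of the target

sns.countplot(x=df["sentiment"])       # visualize the sentiment distribution
plt.show()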
3.6 Summary
In this chapter, we described the implementation of our proposed system
design and the datasets used in our project. The process flow diagram showed
the steps used in our project. The experimental results will be described in the
next chapter.

CHAPTER 4
EXPERIMENTAL RESULTS AND CONCLUSION
In this chapter, the experimental results of the models and the limitations
of sentiment analysis will be described.
4.1 Implementation
Firstly, we download the Twitter dataset from Kaggle and import it into
Python.
4.1.1 Import Twitter Datasets in Python

Figure 4. 1 Training Twitter datasets Description


4.1.2 Value Counts in the Sentiment Column

Figure 4. 2 Values in Sentiment Description



4.1.3 Removing Unused Columns from the Dataset

Figure 4. 3 Removing the Tweet ID and Entity Column


4.1.4 Entity vs Sentiment Relation

Figure 4. 4 Entity and Sentiment Relation



4.1.5 Visualizing the Sentiment Feature (Target)

Figure 4. 5 Visualizing Sentiment Features


4.1.6 Dropping Null and Duplicated Values

Figure 4. 6 Drop and Duplicate Description



4.1.7 Dataset Description After Dropping Unused Columns

Figure 4. 7 Dataset Description After Dropping Unused Columns


4.1.8 Convert to Lower Case

Figure 4. 8 Convert to Lower Case Description



4.1.9 Remove Punctuation

Figure 4. 9 Remove punctuation


4.1.10 Remove Stopwords

Figure 4. 10 Remove Stopwords


4.1.11 Tokenizing the Texts

Figure 4. 11 Tokenizing Description



4.1.12 Remove Words Shorter than Four Characters

Figure 4. 12 Remove Words Description


4.1.13 Stemming

Figure 4. 13 Stemming Description


4.1.14 Lemmatize the Texts

Figure 4. 14 Lemmatization Description



4.1.15 Return Statements

Figure 4. 15 Return Statement Description


4.1.16 Feature Extraction

Figure 4. 16 Feature Extraction



4.1.17 Plot WordCloud Target

Figure 4. 17 WordCloud Description


4.1.18 Train and Test Splitting

Figure 4. 18 Train-Test Splitting Description



4.1.19 Convert Lemmatized Words to Tf-Idf Feature Vectors

Figure 4. 19 Tf-Idf Feature Description


4.1.20 Using Logistic Regression

Figure 4. 20 Logistic Regression Accuracy Description



4.1.21 Using Random Forest Classifier

Figure 4. 21 Random Forest Accuracy Description



4.1.22 Using Naïve Bayes Classifier

Figure 4. 22 Naïve Bayes Accuracy Description


4.1.23 Using DecisionTreeClassifier

Figure 4. 23 Decision Tree Classifier Accuracy Description



4.1.24 Using Support Vector Machine

Figure 4. 24 Support Vector Machine Accuracy Description


4.1.25 Comparing Models and Predicting Texts

Figure 4. 25 Models Comparing and Predicting the Texts



4.2 Limitation of the Sentiment Analysis System

In this project, the dataset comes from an existing Kaggle repository
rather than from surveys or reviews collected directly from social platforms.
Only analysis and visualization are presented in this project; in the future, the
work can be carried forward to decision making and recommendations.
4.3 Conclusion
Sentiment analysis provides a way to understand the attitudes and
opinions expressed in texts. This project explored how to approach Twitter
sentiment analysis with NLP preprocessing, TF-IDF feature extraction, and five
machine learning classifiers: Logistic Regression, Naïve Bayes, Random Forest,
Decision Tree, and Support Vector Machine. Among these, Random Forest
achieved the best accuracy of 93.94%. Sentiment analysis of this kind can be
used to understand public attitudes towards an entity and to identify which
words with emotional and opinion content are important for a particular text.
We will continue to develop and apply sentiment analysis to different kinds of
text in our case studies.

REFERENCES
[1] M. Shivanandhan, "What is Sentiment Analysis?".
[2] A. Rajput ([email protected]), "Natural Language Processing, Sentiment
Analysis and Clinical Analytics," Information System Department, Effat
University, Jeddah, Saudi Arabia.
[3] https://data-flair.training/blogs/data-science-r-sentiment-analysis-project/
[4] https://towardsdatascience.com/sentiment-analysis-concept-analysis-and-applications-6c94d6f58c17
[5] https://www.javatpoint.com/nlp
[6] https://www.tidytextmining.com/sentiment.html
[7] https://lazarinastoy.com/sentiment-analysis-theory-methods-applications/
[8] R. Sharma and Y. K. Sharma, "Sentiment analysis of Twitter data using
machine learning techniques," International Journal of Advanced Computer
Science and Applications, 11(8), 174-180, 2020.
https://doi.org/10.14569/IJACSA.2020.0110824
[9] T. Kalpana and V. Vasuki, "A comparative study on sentiment analysis
techniques in Twitter data," Journal of Advanced Research in Dynamical and
Control Systems, 10(5), 273-282, 2018.
APPENDIX
import pandas as pd
import numpy as np
import string as st
from wordcloud import WordCloud #Word visualization
import matplotlib.pyplot as plt #Plotting properties
import seaborn as sns #Plotting properties
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import CountVectorizer #Data transformation
from sklearn.model_selection import train_test_split #Data testing
from sklearn.linear_model import LogisticRegression #Prediction Model
import xgboost
from xgboost import XGBClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score, confusion_matrix #Comparison between
real and predicted
import re #Regular expressions
import nltk
from nltk import word_tokenize
from nltk import PorterStemmer, WordNetLemmatizer
nltk.download('stopwords')
import warnings

# Load the training and validation sets (file names assumed from the Kaggle
# Twitter entity sentiment dataset)
train_df = pd.read_csv('twitter_training.csv',
                       names=['Tweet ID', 'entity', 'sentiment', 'Tweet_content'])
valid_df = pd.read_csv('twitter_validation.csv',
                       names=['Tweet ID', 'entity', 'sentiment', 'Tweet_content'])
train_df.head()
valid_df.head()
train_df.shape
valid_df.shape
data = pd.concat([train_df, valid_df]).reset_index(drop = True)
data.shape
from sklearn.utils import shuffle
data = shuffle(data)
data = data.reset_index(drop=True)
data.head()
data.drop(["Tweet ID"],axis=1,inplace=True)
data.head()
Twitter_sentiment = data.groupby(['entity', 'sentiment']).sentiment.count().unstack()
Twitter_sentiment.plot(kind='bar',figsize=(10,5))
data.drop(["entity"],axis=1,inplace=True)
data.head()
data['sentiment'].value_counts()
plt.figure(figsize=(10,10))#Visualizing Sentiment feature - Target feature
plt.subplot(1,2,1)
sns.set(style ='whitegrid')
pd.set_option('display.max_columns',None)
sns.countplot(x=data['sentiment'])
plt.subplot(1,2,2)#plot countplot
data.sentiment.value_counts().plot(kind='pie', autopct='%1.0f%%',figsize=(10,5))
plt.show()
data.isnull().sum()# check null
data =data.dropna()
data.isnull().sum()# check null
data.duplicated().sum()
data = data.drop_duplicates()
data.duplicated().sum()
data['sentiment'].value_counts()
data["sentiment"] =data["sentiment"].map({"Positive": 3, "Neutral": 2, "Negative":
1,"Irrelevant":0})
data
from nltk.stem.snowball import SnowballStemmer
import re ,string, unicodedata, html
import nltk
from nltk import word_tokenize , WordNetLemmatizer
from nltk.corpus import stopwords
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('omw-1.4')
data['Tweet_content'] = data['Tweet_content'].apply(lambda x: x.lower())
data.head()
import string
def punctuation_removal(text):
all_list = [char for char in text if char not in string.punctuation]
clean_str = ''.join(all_list)
return clean_str
data['Tweet_content'] = data['Tweet_content'].apply(punctuation_removal)
data.head()
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
stop = stopwords.words('english')
data['Tweet_content'] = data['Tweet_content'].apply(lambda x: ' '.join([word for word in
x.split() if word not in (stop)]))
data.head()
def tokenize(text):
text = re.split(r'\s+', text)
return [x.lower() for x in text]
data['Tweet_content'] = data['Tweet_content'].apply(lambda msg : tokenize(msg))
data.head()
def remove_small_words(text):
return [x for x in text if len(x) > 3 ]
data['Tweet_content'] = data['Tweet_content'].apply(lambda x :
remove_small_words(x))
data.head()
def stemming(text):
ps = PorterStemmer()
return [ps.stem(word) for word in text]
data['Tweet_content'] = data['Tweet_content'].apply(lambda wrd: stemming(wrd))
data.head()
def lemmatize(text):
word_net = WordNetLemmatizer()
return [word_net.lemmatize(word) for word in text]
data['Tweet_content'] = data['Tweet_content'].apply(lambda x : lemmatize(x))
data.head()
def return_sentences(lemmatize):
return " ".join([word for word in lemmatize])
data['Tweet_content'] = data['Tweet_content'].apply(lambda x : return_sentences(x))
data.head()
import re
def remove_numbers(text):
return re.sub(r'\d+', '', text)
data['Tweet_content']=data['Tweet_content'].apply(lambda x : remove_numbers(x))
data.head()
data.Tweet_content[0]  # inspect a sample tweet after cleaning

# WordClouds per sentiment class
plt.figure(figsize=(16, 12))
plt.subplot(2,2,1)
# Positive
plt.grid(False)
plt.title("for positive")
wc = WordCloud(max_words = 2000 , width = 1600 , height =
800).generate(data[data.sentiment == 3].Tweet_content.str.cat(sep=' '))
plt.imshow(wc , interpolation = 'bilinear')
plt.subplot(2,2,2)
# Negative
plt.grid(False)
plt.title("for negative")
wc = WordCloud(max_words = 2000 , width = 1600 , height =
800).generate(data[data.sentiment == 1].Tweet_content.str.cat(sep=' '))
plt.imshow(wc , interpolation = 'bilinear')
plt.subplot(2,2,3)
# Neutral
plt.grid(False)
plt.title("for neutral")
wc = WordCloud(max_words = 2000 , width = 1600 , height =
800).generate(data[data.sentiment == 2].Tweet_content.str.cat(sep=' '))
plt.imshow(wc , interpolation = 'bilinear')
plt.subplot(2,2,4)
# Irrelevant
plt.grid(False)
plt.title("for irrelevant")
wc = WordCloud(max_words = 2000 , width = 1600 , height =
800).generate(data[data.sentiment == 0].Tweet_content.str.cat(sep=' '))
plt.imshow(wc , interpolation = 'bilinear')
plt.show()
x=data.iloc[:,:-1].values
y=data.iloc[:,-1].values
print(x.shape)
print(y.shape)
x
tr_idf_model = TfidfVectorizer()
X = tr_idf_model.fit_transform(data['Tweet_content'])
#X_test = tr_idf_model.transform(y)
tf_idf_array = X.toarray()
tfidf_tokens = tr_idf_model.get_feature_names_out()
df_tf_idf = pd.DataFrame(tf_idf_array, columns = tfidf_tokens)
df_tf_idf
X_train, X_test, y_train, y_test = train_test_split(X, data['sentiment'],
                                                    test_size=0.1, random_state=1)
print(X_train.shape)
print(X_test.shape)
train=0.90*100  # matches test_size=0.1 in the split above
test=0.10*100
labels=['Train','Test']
sizes=[train,test]
colors=['yellow','green']
fig1,ax1=plt.subplots()
ax1.pie(sizes, colors=colors, labels=labels,autopct='%1.1f%%',startangle=90)
ax1.axis('equal')
plt.title('Train Vs Test')
plt.show()
y_train.value_counts()
dct=dict()
lr = LogisticRegression(max_iter = 500)
lr.fit(X_train, y_train)
print('Logistic Regression model fitted..')
pred = lr.predict(X_test)
print("Accuracy score : {}".format(accuracy_score(y_test, pred)))
dct['LogisticRegression']=round(accuracy_score(y_test, pred)*100,2)
accuracy_score(y_test,pred)
plt.figure(figsize=(8,8))
sns.heatmap(confusion_matrix(y_test,pred),annot = True,fmt = "d")
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier()
rf.fit(X_train, y_train)
print('Random Forest Classifier model fitted..')
pred = rf.predict(X_test)
print("Accuracy score : {}".format(accuracy_score(y_test, pred)))
print("Confusion matrix : \n {}".format(confusion_matrix(y_test, pred)))
dct['RdmFC']=round(accuracy_score(y_test, pred)*100,2)
accuracy_score(y_test,pred)
plt.figure(figsize=(8,8))
sns.heatmap(confusion_matrix(y_test,pred),annot = True,fmt = "d") nb =
MultinomialNB()
nb.fit(X_train,y_train)
pred = nb.predict(X_test)
print("Accuracy score : {}".format(accuracy_score(y_test, pred)))
dct['MultinomialNB']=round(accuracy_score(y_test, pred)*100,2)
accuracy_score(y_test,pred)
plt.figure(figsize=(8,8))
sns.heatmap(confusion_matrix(y_test,pred),annot = True,fmt = "d")
from sklearn.tree import DecisionTreeClassifier
#Decision Tree
dtc=DecisionTreeClassifier(random_state=1)
dtc.fit(X_train,y_train)
pred = dtc.predict(X_test)
print("Accuracy score : {}".format(accuracy_score(y_test, pred)))
dct['DecisionTreeClassifier']=round(accuracy_score(y_test, pred)*100,2)
accuracy_score(y_test,pred)
plt.figure(figsize=(8,8))
sns.heatmap(confusion_matrix(y_test,pred),annot = True,fmt = "d")
from sklearn import svm
#Create a svm Classifier
clf = svm.SVC(kernel='linear') # Linear Kernel
model = clf.fit(X_train, y_train)
prediction = model.predict(X_test)
print("accuracy: {}%".format(round(accuracy_score(y_test, prediction)*100,2)))
dct['SVM'] = round(accuracy_score(y_test, prediction)*100,2)
accuracy_score(y_test,prediction)
plt.figure(figsize=(8,8))
sns.heatmap(confusion_matrix(y_test,prediction),annot = True,fmt = "d")  # SVM predictions
colors = ["green", "orange", "blue" , "red" ,"yellow"]
sns.set_style("whitegrid")
plt.figure(figsize=(7,4))
plt.title('Comparing Models',fontsize=20)
plt.yticks(np.arange(0,100,10))
plt.ylabel("Accuracy %",fontsize=18)
plt.xlabel("Algorithms",fontsize=18)
sns.barplot(x=list(dct.keys()), y=list(dct.values()), palette=colors)
plt.show()
text = ["I love the musics."]
texts = tr_idf_model.transform(text)
print(texts)
test1=lr.predict(texts)
test2=nb.predict(texts)
test3=dtc.predict(texts)
test4=rf.predict(texts)  # Random Forest (xgboost was imported but never fitted)
print(test1,test2,test3,test4)
text = ["I hate you fucking."]
texts = tr_idf_model.transform(text)
print(texts)
test1=lr.predict(texts)
test2=nb.predict(texts)
test3=dtc.predict(texts)
test4=rf.predict(texts)
print(test1,test2,test3,test4)
