Ml termwork report

The document discusses sentiment analysis, a growing field in natural language processing that focuses on classifying opinions in text as positive, negative, or neutral. It highlights the importance of analyzing consumer feedback from e-commerce platforms like Flipkart and Amazon to improve marketing strategies and customer service. The research employs various methodologies, including machine learning techniques, to analyze product reviews and extract insights into consumer sentiment.

Uploaded by

hamzashaik98854
Copyright
© All Rights Reserved

ABSTRACT

Sentiment Analysis is a field within text analysis, natural language
processing (NLP), and computational linguistics that focuses on
identifying, extracting, and analyzing subjective information from
textual data. It has become one of the fastest-growing areas in
computer science, driven by the increasing amount of user-generated
content on the internet. As individuals regularly share their opinions,
thoughts, and feedback online, the ability to analyze and classify
these sentiments—whether positive, negative, or neutral—has
gained significant importance.
In sentiment analysis, the primary task is to classify the polarity of a
given text, whether at the document, sentence, or feature level. This
means determining whether a piece of text expresses a positive,
negative, or neutral opinion. Some approaches go further and label
specific emotions such as "happy," "sad," or "angry" to gauge the
sentiment of the content.
This research utilizes product reviews sourced from Kaggle, focusing
on ratings and sentiments expressed by users on e-commerce
platforms like Flipkart. The study aims to classify sentiments into
positive, negative, and neutral categories and provides a breakdown
of customer opinions. Additionally, the analysis identifies frequently
used words and word pairs to highlight key aspects of the
conversations.
The findings suggest that the most commonly used positive and
negative words play a crucial role in understanding public sentiment.
This helps businesses interpret consumer feedback more effectively,
improving customer service and marketing strategies based on
consumer emotions and opinions. The research also emphasizes the
potential of sentiment analysis in understanding psychological states
and trends in consumer behavior on online platforms.
Introduction
The rapid expansion of the Internet and e-commerce has significantly
changed how consumers make purchasing decisions. In the past,
advertisements and personal recommendations were the primary
sources of product information. However, with the rise of online
shopping platforms, consumers now have access to a vast array of
options and detailed customer reviews. These reviews, often
replacing traditional word-of-mouth, play a crucial role in shaping
purchasing behavior and influencing product sales.
E-commerce platforms like Amazon and Flipkart encourage
customers to share their experiences through numerical ratings and
written reviews. These reviews not only help potential buyers make
informed decisions but also enhance the credibility of online stores,
increasing customer engagement and trust. Trustworthy reviews can
attract new customers, improve a website’s traffic, and extend
visitors’ time on the platform.
For manufacturers, online reviews serve as valuable feedback
mechanisms. They help businesses identify customer needs, improve
existing products, and develop new offerings based on market
demand. Additionally, they provide insights into industry trends and
competitive positioning, enabling companies to refine their
marketing strategies. Whether positive or negative, reviews are
instrumental in understanding consumer preferences and enhancing
product quality.
The integration of big data analytics in e-commerce has further
amplified the impact of online reviews. Analyzing customer feedback
allows businesses to make data-driven decisions, ultimately leading
to better products and higher customer satisfaction. In this evolving
digital landscape, online reviews have become a powerful tool for
both consumers and businesses, driving innovation and improving
the overall shopping experience.
Online Shopping Players in India
Amazon:
Amazon, founded by Jeff Bezos in 1994, is the largest online retailer
in the world. Initially named Cadabra.com, it was later rebranded to
Amazon, inspired by the world's second-longest river. Amazon began
as an online bookstore but quickly expanded to sell various products,
including movies, games, music, software, and more. It is now a
global leader in e-commerce, offering a wide range of products.
Flipkart:
Flipkart Pvt Ltd, founded in 2007 by Sachin Bansal and Binny Bansal,
is one of India’s largest e-commerce companies. Originally focused on
selling books, Flipkart soon expanded its product offerings to
electronics, fashion, home care, and lifestyle products. As of 2020,
Flipkart holds approximately 50% of the Indian e-commerce market
share and is a strong competitor to Amazon in India. The company
also owns PhonePe, a mobile payment service, and has expanded
into various product categories, including smartphones, home
appliances, and personal health care products. Flipkart employs over
15,000 people and has shifted to predominantly digital payments,
especially after the pandemic.
Snapdeal:
Snapdeal, founded in 2010 by Kunal Bahl and Rohit Bansal, is a major
online marketplace in India. With over 275,000 sellers and more than
30 million products across 6,000 towns and cities, Snapdeal
competes with Flipkart and Amazon in various categories. It offers a
wide range of products, including electronics, fashion, and home
goods.
Myntra:
Myntra, established in 2007 by Mukesh Bansal, Ashutosh Lawania,
and Vineet Saxena, is a leading Indian fashion e-commerce company.
Initially focusing on custom-made gifts, Myntra has evolved into a
prominent online retailer for clothing, footwear, and accessories. The
company operates on a business-to-consumer (B2C) model, offering
a broad range of fashion and lifestyle products.
eBay:
Founded in 1995 by Pierre Omidyar, eBay is a global online
marketplace headquartered in San Jose, California. It operates as a
platform for both consumer-to-consumer (C2C) and business-to-
consumer (B2C) transactions. eBay allows individuals and businesses
to buy and sell a wide variety of goods worldwide. Today, eBay
operates in over 30 countries, making it one of the largest online
marketplaces globally.

Research and Sentiment Analysis


This research leverages product reviews from various e-commerce
platforms, including Amazon, Flipkart, Snapdeal, Myntra, and eBay,
gathered from a reputable source, Kaggle. The reviews are analyzed
through sentiment analysis, categorizing them as positive, negative,
or neutral.
The study aims to break down and categorize different product
opinions, identifying the most frequently used words and word pairs
that highlight consumer sentiment. By examining these words, the
research provides insights into the psychological state of the general
public when shopping on these platforms. Positive and negative
keywords are instrumental in understanding customer perceptions,
allowing businesses to better understand consumer needs and
improve their products and services.
Levels of Sentiment Analysis
Sentiment analysis is the process of identifying and extracting
opinions, feelings, and attitudes from text, such as product reviews,
social media comments, and more. It is a key field within natural
language processing (NLP), and is divided into various levels of
analysis based on the granularity of the text being processed.
Token Level
At the token level, the analysis is performed on the smallest units of
meaning, such as words or phrases. For example, in the sentence
"Even though the food quality was not very good, I enjoyed the
restaurant's service," the first part of the sentence conveys a negative
sentiment (regarding food), while the latter part expresses a positive
sentiment (regarding service). In such cases, the sentence may be
identified as neutral by a machine, but a human could perceive a
predominantly positive sentiment. Token-level analysis involves
breaking down sentences into smaller chunks, eliminating
unnecessary words, preprocessing the text, and categorizing the
sentiment (positive, negative, or neutral).
Document Level
At the document level, the sentiment analysis evaluates an entire
document or review, classifying it as positive, negative, or neutral.
This approach is useful for analyzing product reviews, where a single
document expresses an overall sentiment about a product. However,
this level may not work effectively for documents that contain
multiple reviews or mixed sentiments about various products.
Examples include book reviews or movie critiques, where the entire
text is assessed for its general tone.
Sentence Level
The sentence level focuses on analyzing individual sentences to
determine if they contain positive, negative, or neutral sentiments.
Sentences are first categorized as subjective or objective, as in
subjectivity classification. Some sentences are
straightforward, such as "He is a good boy," which clearly conveys
positivity. However, more complex sentences, like "You are a good
player, but I am very disappointed that you failed the exam," may
present multiple conflicting emotions, making sentiment
classification more challenging for a machine. This level is essential
for analyzing the nuances of sentence structures and determining the
overall sentiment expressed.
Paragraph Level
Paragraph-level sentiment analysis involves evaluating sentiments
across entire paragraphs. With the rise of social media comments,
reviews, and feedback, this level becomes useful in analyzing longer,
more detailed pieces of text, such as product reviews or social media
posts. Paragraph-level analysis can also take into account the context
provided by surrounding text, offering a more comprehensive
understanding of the sentiment expressed in the paragraph.
Aspect Level
Aspect-level sentiment analysis, or feature-level sentiment analysis,
takes a deeper look at the specific features or aspects of a product or
service. It focuses on understanding why people liked or disliked
certain elements. For example, in the review "The Samsung J7 offers
the finest camera quality," the sentiment analysis would focus
specifically on the camera quality as the aspect being evaluated. This
level is more granular and helps businesses understand not just the
overall sentiment but the specific reasons behind it, such as the
quality of a product's camera, performance, or customer service.
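The aspect-level idea can be illustrated with a very small sketch. It assumes a hypothetical set of aspect terms (ASPECTS) and a toy opinion lexicon (OPINION_WORDS), neither of which comes from the study, and it simply scores the opinion words found next to each aspect term. Real aspect extraction is considerably more involved.

```python
import re

# Hypothetical aspect terms and toy opinion lexicon (assumptions for
# illustration only; they are not taken from the report).
ASPECTS = {"camera", "battery", "service"}
OPINION_WORDS = {"finest": 1, "good": 1, "poor": -1, "bad": -1}

def aspect_sentiment(text, window=1):
    """Score each aspect term by the opinion words within `window` tokens."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    scores = {}
    for i, token in enumerate(tokens):
        if token in ASPECTS:
            nearby = tokens[max(0, i - window): i + window + 1]
            scores[token] = scores.get(token, 0) + sum(
                OPINION_WORDS.get(word, 0) for word in nearby
            )
    return scores

print(aspect_sentiment("The Samsung J7 offers the finest camera quality"))
# {'camera': 1}
print(aspect_sentiment("Good camera but poor battery"))
# {'camera': 1, 'battery': -1}
```

A fixed window is a deliberate simplification: it keeps the example short but will miss opinion words that sit far from the aspect they describe.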
Sentiment Analysis Techniques
Several methodologies have been applied to sentiment analysis of
social networking content. They fall into three primary groups:
machine learning methods, lexicon-based methods, and hybrid
methods. Figure 1.1 shows the sentiment analysis techniques.

Figure 1.1: Sentiment analysis techniques

Machine Learning Techniques for Sentiment Analysis


Sentiment analysis uses various machine learning techniques, which
can be classified into supervised, unsupervised, semi-supervised, and
deep learning methods.
Supervised Learning Techniques
Supervised learning relies on labeled data for training models. The
two main stages are:
1. Model Training: The algorithm learns from labeled data.
2. Prediction: The trained model classifies new instances.
Algorithms commonly used include:
• Support Vector Machine (SVM): Maximizes the margin
between classes using a hyperplane.
• Naive Bayes: Based on Bayes' Theorem, assuming feature
independence.
• Decision Trees: Split data into smaller segments to identify
patterns.
• Bayesian Networks: Use directed acyclic graphs to represent
dependencies between words.

Unsupervised Learning Techniques


Unsupervised learning is used when labeled data is unavailable. It
primarily involves:
• Clustering Algorithms: Grouping data into categories based on
sentiment.
• Spectral Clustering: Applied to classify tweets into positive and
negative categories.

Semi-Supervised Learning Techniques


A hybrid of supervised and unsupervised learning, semi-supervised
learning uses both labeled and unlabeled data, often applied to large
datasets. Clustering techniques are typically used in this approach.
Deep Learning Techniques
Deep learning methods use multi-layered neural networks to
automatically extract features:
• Deep Neural Networks (DNN): Multi-layer models with hidden
layers for data processing.
• Convolutional Neural Networks (CNN): Used for NLP tasks and
computer vision.
• Recurrent Neural Networks (RNN): Suitable for sequential data.
These techniques can offer high accuracy because they learn features
automatically rather than relying on hand-crafted ones.

Classification Techniques
Several classifiers are used in sentiment analysis, including:
• Naive Bayes: Based on feature independence assumptions.
• Maximum Entropy (MaxEnt): Estimates conditional
distributions using exponential family distributions.
• Support Vector Machine (SVM): Classifies data by maximizing
the margin between classes.

Lexicon Techniques
Lexicon-based sentiment analysis assigns predefined scores to words
based on their polarity (positive, neutral, or negative). The total
polarity of a text is calculated by summing individual token scores.
However, it faces challenges in domain-specific contexts, where
words may have different meanings.
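The summation of token scores can be sketched as follows. This is a minimal illustration that assumes a tiny hand-made lexicon (TOY_LEXICON) and a hypothetical helper named lexicon_polarity; a real study would rely on a published sentiment lexicon.

```python
import re

# Toy lexicon for illustration only (an assumption, not from the report):
# positive words score +1, negative words score -1, unknown words score 0.
TOY_LEXICON = {
    "good": 1, "great": 1, "excellent": 1, "love": 1,
    "bad": -1, "poor": -1, "terrible": -1, "hate": -1,
}

def lexicon_polarity(text, lexicon=TOY_LEXICON):
    """Return the sum of the lexicon scores of all tokens in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)

print(lexicon_polarity("Great phone, excellent camera"))    # 2
print(lexicon_polarity("Terrible battery and poor build"))  # -2
print(lexicon_polarity("Good screen but bad speaker"))      # 0
```

The last call shows the limitation of plain summation: a review with one positive and one negative word cancels out to zero even though it expresses two distinct opinions.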
Hybrid Techniques
Hybrid methods combine multiple strategies to improve sentiment
analysis. These can include:
• Contextual Opinion Mining: Uses semantic similarity to extract
contextual sentiment.
• Unsupervised Learning with POS Patterns: Extracts sentiment
phrases using part-of-speech rules.
Each technique has its strengths and is chosen based on the dataset
and analysis goals.

Research design
The choice of descriptive and exploratory research was made with
the expectation that it would provide marketers with a clear
understanding of the millennial mindset. Using the Naïve Bayes
algorithm, this method extracts emotions from the dataset and
categorizes them by assigning each text a score based on the
emotions associated with it. A graph is then plotted from these
scores. Figure 3.1 shows the research design for the sentiment
analysis model and Figure 3.2 shows the sentiment polarity model
design.
A Sentiment Polarity Model classifies text into sentiment categories:
positive, negative, or neutral. It determines the direction (polarity)
of the sentiment expressed in the text.
• Positive indicates favorable sentiment, negative indicates
unfavorable sentiment, and neutral means no clear sentiment.
• The model often uses feature extraction techniques like
tokenization, TF-IDF, or word embeddings to process text.
Some models use polarity scores: +1 for positive, -1 for negative, and
0 for neutral.
Research Methodology for Sentiment Analysis
1. Data Collection:
• The first step in opinion mining is gathering substantial volumes
of data.
• Sources: Data can be obtained from:
  ◦ Social media platforms (Facebook, Twitter)
  ◦ Reviews and comments from various sources (blogs,
  ratings, forums)
  ◦ Datasets from Kaggle (e.g., product reviews from Flipkart)
• Example Dataset: A Flipkart product review dataset from
Kaggle with 205,053 rows and 5 columns containing product
names, prices, ratings, reviews, and summaries.

2. Data Preprocessing:
• Preprocessing cleans and prepares the raw data for analysis.
• Steps:
  ◦ Tokenization: Splitting text into words, phrases, symbols,
  or characters.
  ◦ Stopword Removal: Removing common words (e.g.,
  "the", "is", "in") that don't contribute to sentiment
  analysis.
  ◦ Filling Missing Values: Replacing missing values with a
  global constant.
• Tools Used: NLTK (Natural Language Toolkit) for tokenization,
stopword removal, and stemming.
• Goal: Ensure data is cleaned and tokenized properly, reducing
noise for better model accuracy.
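The preprocessing steps above can be sketched in plain Python. The study itself used NLTK; this dependency-free version is only an approximation of that pipeline, and the stopword list, the MISSING constant, and all function names are assumptions made for illustration. Stemming is left out to keep the example short.

```python
import re

# Small illustrative stopword list (an assumption); NLTK ships a fuller one.
STOPWORDS = {"the", "is", "in", "a", "an", "and", "of", "to", "it", "this"}
MISSING = "unknown"  # hypothetical global constant for missing reviews

def fill_missing(reviews, constant=MISSING):
    """Replace None or blank reviews with a global constant."""
    return [r if r and r.strip() else constant for r in reviews]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop common words that carry little sentiment."""
    return [t for t in tokens if t not in stopwords]

def preprocess(reviews):
    """Fill missing values, tokenize, then remove stopwords."""
    return [remove_stopwords(tokenize(r)) for r in fill_missing(reviews)]

raw = ["The battery is great", None, "Poor sound in this phone"]
print(preprocess(raw))
# [['battery', 'great'], ['unknown'], ['poor', 'sound', 'phone']]
```

Negation words such as "not" are deliberately kept out of the stopword list here, since removing them would flip the meaning of phrases like "not good".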

3. Feature Extraction:
• This step involves identifying key features to build the
sentiment analysis model.
• Methods:
  ◦ Skip N-Gram Model: A generalization of n-grams that
  allows gaps between tokens, addressing data sparsity.
  ◦ TF-IDF Hybrid Method: Combines Term Frequency (TF)
  and Inverse Document Frequency (IDF) to calculate word
  importance in the text.
• Feature extraction reduces model complexity and improves
accuracy by selecting relevant features.
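Both feature types can be illustrated with a short sketch. The function names skip_bigrams and tf_idf are hypothetical, and the TF-IDF variant shown (term count divided by document length, multiplied by log(N / document frequency)) is one common textbook form; the report does not state which variant was used, and libraries often apply smoothing.

```python
import math
from collections import Counter

def skip_bigrams(tokens, max_skip=1):
    """Return ordered token pairs with at most `max_skip` tokens between them."""
    pairs = []
    for i, first in enumerate(tokens):
        for j in range(i + 1, min(i + 2 + max_skip, len(tokens))):
            pairs.append((first, tokens[j]))
    return pairs

def tf_idf(docs):
    """Compute unsmoothed TF-IDF weights for a list of tokenized documents."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

print(skip_bigrams(["very", "good", "camera"]))
# [('very', 'good'), ('very', 'camera'), ('good', 'camera')]

weights = tf_idf([["good", "camera"], ["bad", "camera"]])
print(weights[0]["camera"])  # 0.0, because "camera" appears in every document
```

In the second example the word shared by both reviews receives zero weight, while "good" and "bad" receive positive weight, which is the behavior that makes TF-IDF useful for separating reviews.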
4. Classification:
• Training Stage: A classification model is built using a training
dataset.
• Testing Stage: The accuracy of the model is tested using test
data.
• Classifier Used: Naïve Bayes Classifier to classify sentiment.
• Polarity Determination: Sentiment values are calculated using
subjectivity (0-1) and polarity (-1 to +1).
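The training and prediction stages can be sketched with a small multinomial Naive Bayes classifier written from scratch. This is not the study's implementation: the class name, the add-one (Laplace) smoothing, and the four toy training reviews are all assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        """Count documents per class and word occurrences per class."""
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        """Return the class with the highest log posterior score."""
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                if token not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training data (made up for illustration).
train_docs = [["good", "phone"], ["great", "camera"],
              ["bad", "battery"], ["poor", "sound"]]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["good", "camera"]))   # positive
print(model.predict(["poor", "battery"]))  # negative
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing term prevents a single unseen word-class combination from zeroing out a class.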

The text is positive if the polarity is greater than 0, negative if the
polarity is less than 0, and neutral if the polarity is equal to 0. The
subjectivity range runs from 0.0 to 1.0; a higher score indicates that the
text is more subjective. Table 3.4 shows the value counts for positive,
negative and neutral reviews.
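The thresholding rule and the value counts can be sketched as follows. The polarity values below are invented for illustration and are not the study's results; the report does not name the tool that produced its scores, although the stated ranges match those returned by libraries such as TextBlob.

```python
from collections import Counter

def label_from_polarity(polarity):
    """Map a polarity score in [-1, 1] to a sentiment label."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"

# Hypothetical polarity scores for five reviews (not real data).
polarities = [0.8, -0.3, 0.0, 0.5, -0.1]
labels = [label_from_polarity(p) for p in polarities]
print(labels)
# ['positive', 'negative', 'neutral', 'positive', 'negative']
print(Counter(labels))
# Counter({'positive': 2, 'negative': 2, 'neutral': 1})
```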

5. Sentiment Calculation:
• Sentiment Values: For each review, subjectivity and polarity
values are computed.
• Final Sentiment Score: The overall sentiment score is the
summation of the product of individual subjectivity and polarity
values.
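Read literally, the final score is the sum over all reviews of polarity multiplied by subjectivity. A minimal sketch under that reading, with a hypothetical function name and made-up values:

```python
def overall_sentiment(scores):
    """Sum polarity * subjectivity over (polarity, subjectivity) pairs."""
    return sum(polarity * subjectivity for polarity, subjectivity in scores)

# Hypothetical (polarity, subjectivity) pairs for three reviews.
scores = [(0.5, 1.0), (-0.25, 0.5), (0.0, 0.0)]
print(overall_sentiment(scores))  # 0.375
```

Weighting by subjectivity means that strongly opinionated reviews contribute more to the total than largely factual ones.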
6. Visualization:
Once a machine learning model has been developed, visualizing the
classification results and the other outputs of the algorithm is
strongly advised. The dataset used to train the model should also be
plotted for sentiment analysis so that the distribution of the data is
visible. The results are displayed using graphs and charts. The word
cloud was built using the frequency of occurrence of words. Figure 3.4
shows the visualization and Figure 3.5 shows the count plot of the
sentiments.
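The word frequencies behind a word cloud, and the frequent word pairs mentioned earlier in the report, can be computed with a simple counter. The sketch below assumes already tokenized reviews and uses hypothetical function names; drawing the cloud itself is left to a plotting library and is not shown.

```python
from collections import Counter

def top_words(token_lists, n=3):
    """Return the `n` most frequent words across all reviews."""
    counts = Counter(t for tokens in token_lists for t in tokens)
    return counts.most_common(n)

def top_pairs(token_lists, n=3):
    """Return the `n` most frequent adjacent word pairs (bigrams)."""
    counts = Counter(
        pair for tokens in token_lists for pair in zip(tokens, tokens[1:])
    )
    return counts.most_common(n)

# Toy tokenized reviews (made up for illustration).
reviews = [["good", "battery", "life"], ["good", "battery"], ["bad", "camera"]]
print(top_words(reviews, 2))  # [('good', 2), ('battery', 2)]
print(top_pairs(reviews, 1))  # [(('good', 'battery'), 2)]
```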
Testing:
Testing is necessary to confirm whether the developed model
accurately predicts the intended result; it is also called the
validation step. It is crucial because the process must continue to
work well when applied to large-scale applications. In the final
stage of testing, a user enters text at runtime and the model predicts
whether the statement is positive, negative, or neutral.
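A held-out evaluation of this kind can be sketched as follows. The split ratio, the random seed, and the function names are assumptions, and the labels in the example are invented; the report does not state how its data was partitioned.

```python
import random

def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of `items` and split it into train and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]

def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

train, test = train_test_split(range(10))
print(len(train), len(test))  # 8 2

# Hypothetical true and predicted labels for four test reviews.
y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "negative", "positive", "positive"]
print(accuracy(y_true, y_pred))  # 0.75
```

Accuracy alone can be misleading when one sentiment class dominates the dataset, so per-class measures are usually reported alongside it.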

This methodology, which combines data collection, preprocessing,
feature extraction, classification, visualization, and testing, supports
efficient and accurate sentiment analysis.
Conclusion
Sentiment analysis is rapidly advancing and plays a crucial role in
evaluating and processing large amounts of data, especially from
social networks. It has become an essential tool in decision-making,
particularly in businesses. With the rise of online shopping, especially
among younger, educated, and higher-income groups, sentiment
analysis helps improve customer experiences and guide businesses in
enhancing their offerings.
This research focuses on understanding the relationships between
different attributes of product reviews and performing sentiment
analysis to provide valuable insights. Using a Naïve Bayes algorithm,
we analyzed product reviews to determine the polarity of feedback.
The results demonstrated the model’s effectiveness across key
metrics. This approach can reduce the need for costly and
time-consuming methods like surveys and market research, benefiting
both consumers and product designers.
