Amazon Product Review
Sentiment Analysis
Lalit Jain: https://www.linkedin.com/in/lalit7jain
Big Data Systems & Analytics
Agenda
1. Case Study
2. Scraping reviews
3. Sentiment analysis
4. Classification using Doc2Vec (Logistic, SGD, SVM)
5. Challenges
Case Study
• Scrape the reviews from any website and perform sentiment analysis on the resulting corpus
• Once each document has been labeled by the sentiment analysis, use those labels to
train classifiers on doc2vec features (SVM / Deep Belief Network)
Scraping Reviews
Programming Language used: R
Libraries required: rvest, dplyr, tm, quanteda, etc.
Approach:
1. From the main page of the product, navigate automatically to the review page
2. Loop through as many pages as needed to collect the required number of reviews, at an
average of 10 reviews per page
3. Write the reviews directly to disk rather than holding them all in memory (RAM), taking
advantage of hard disk capacity
Note: Both the “page link” and the “number of pages” can be passed as arguments to
the script
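The paging loop above can be sketched as follows. The deck's implementation is in R with rvest; this Python version only illustrates the same control flow. The `pageNumber` query parameter, the `fetch_reviews` callback, the example URL, and the CSV layout are all assumptions, and the fetcher is a stub that returns placeholder strings instead of making network requests.

```python
import csv
import os
import tempfile
from typing import Callable, List


def page_urls(review_url: str, n_pages: int) -> List[str]:
    """Build one URL per review page (the query parameter name is assumed)."""
    return [f"{review_url}?pageNumber={page}" for page in range(1, n_pages + 1)]


def scrape_to_disk(review_url: str, n_pages: int,
                   fetch_reviews: Callable[[str], List[str]],
                   out_path: str) -> int:
    """Write each page's reviews to a CSV as soon as they are fetched."""
    written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["page", "review"])
        for page_no, url in enumerate(page_urls(review_url, n_pages), start=1):
            # Only one page of reviews is held in memory at a time.
            for text in fetch_reviews(url):
                writer.writerow([page_no, text])
                written += 1
    return written


def fake_fetch(url: str) -> List[str]:
    """Stub standing in for a real HTTP request plus HTML parsing."""
    return [f"placeholder review {i} from {url}" for i in range(10)]


# Both the page link and the number of pages are plain arguments.
out_file = os.path.join(tempfile.mkdtemp(), "reviews.csv")
total = scrape_to_disk("https://example.com/product-reviews/ITEM", 3, fake_fetch, out_file)
print(total)
```

Swapping `fake_fetch` for a real fetcher leaves the loop unchanged, which keeps the disk-streaming logic separate from the site-specific parsing.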
Results
Corpus Operations
Loading the documents using the quanteda package
quanteda creates a document-feature matrix with the dfm() function.
The function does this through a series of operations, including tokenizing, lowercasing, indexing,
stemming, and matching against a dictionary
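To make the idea of a document-feature matrix concrete, here is a minimal Python sketch. It is not quanteda's implementation: it only tokenizes, lowercases, and counts, and it skips stemming and dictionary matching. The function names and the two sample reviews are made up for illustration.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def document_feature_matrix(docs: Dict[str, str]) -> Tuple[List[str], Dict[str, List[int]]]:
    """Return (features, rows); rows[doc] holds counts aligned with features."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    features = sorted(set().union(*counts.values()))
    rows = {name: [c[f] for f in features] for name, c in counts.items()}
    return features, rows


# Hypothetical sample reviews.
docs = {"r1": "Great phone, great battery.", "r2": "Battery died. Terrible."}
features, rows = document_feature_matrix(docs)
print(features)  # ['battery', 'died', 'great', 'phone', 'terrible']
print(rows)      # {'r1': [1, 0, 2, 1, 0], 'r2': [1, 1, 0, 0, 1]}
```

Each row is one document and each column is one feature (token), so the matrix can be passed directly to counting or dictionary-lookup steps.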
Hu and Liu’s lexicon
Uses the list of positive and negative words (dictionary) from Hu and Liu’s lexicon, which contains more than
6,700 words
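Dictionary-based scoring of this kind can be sketched in a few lines of Python. The two word sets below are tiny illustrative stand-ins, not the actual lexicon, and the rule "positive count minus negative count" is an assumption about how the counts are turned into a label.

```python
import re
from typing import Dict, Set

# Tiny illustrative word lists; the real lexicon is far larger.
POSITIVE: Set[str] = {"good", "great", "love", "excellent"}
NEGATIVE: Set[str] = {"bad", "poor", "terrible", "broken"}


def sentiment_counts(text: str) -> Dict[str, int]:
    """Count lexicon hits in a review after lowercasing and tokenizing."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return {"positive": pos, "negative": neg, "score": pos - neg}


def label(text: str) -> str:
    """Assumed labeling rule: the sign of (positive hits - negative hits)."""
    score = sentiment_counts(text)["score"]
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label("Great screen, love it"))                                # positive
print(sentiment_counts("Terrible, broken and bad but good price"))   # score is -2
```

The labels produced this way are what the later classification stage would treat as its training targets.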
All operations in one line!
Result
Visualizing the Sentiments
Classification
Programming Language used: Python
Libraries required: gensim, nltk, sklearn, etc.
Approach:
1. Load the raw reviews and clean them using the nltk package (stop words, stemming,
numbers, etc.)
2. Create the TaggedDocuments required for building Doc2Vec models (both DM and DBOW)
3. Train both models 10 times, randomly shuffling the documents each time
4. Split the dataset and apply classification algorithms
1) Cleaning the loaded documents
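A cleaning step along these lines might look like the sketch below. The deck uses nltk for this; to stay self-contained, the sketch substitutes a small hand-written stop-word set (an assumption, not nltk's list) and leaves out stemming entirely.

```python
import re
from typing import List

# Small illustrative stop-word set; nltk ships a much longer English list.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "to", "of", "in", "i", "was"}


def clean_review(text: str) -> List[str]:
    """Lowercase, drop digits and punctuation, then remove stop words."""
    # Keeping only runs of letters discards numbers and punctuation in one pass.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


cleaned = clean_review("I bought this in 2016 and the battery is GREAT!")
print(cleaned)  # ['bought', 'battery', 'great']
```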
2) Create TaggedDocuments
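The tagging step pairs each cleaned token list with a unique tag. gensim's `TaggedDocument` is a namedtuple with `words` and `tags` fields; the sketch below defines a stand-in with the same two fields so it runs without gensim installed. The `label_index` tag scheme and the sample data are assumptions for illustration.

```python
from collections import namedtuple
from typing import List

# Stand-in with the same two fields as gensim's TaggedDocument.
TaggedDocument = namedtuple("TaggedDocument", ["words", "tags"])


def tag_reviews(token_lists: List[List[str]], labels: List[str]) -> List[TaggedDocument]:
    """Attach one unique tag per review, built from its label and position."""
    tagged = []
    for index, (tokens, lab) in enumerate(zip(token_lists, labels)):
        tagged.append(TaggedDocument(words=tokens, tags=[f"{lab}_{index}"]))
    return tagged


# Hypothetical cleaned reviews and their sentiment labels.
tagged = tag_reviews([["great", "battery"], ["screen", "broken"]], ["pos", "neg"])
print(tagged[1].tags)  # ['neg_1']
```

With gensim available, the stand-in can be replaced by importing `TaggedDocument` from `gensim.models.doc2vec`.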
DM and DBOW models
dbow (distributed bag of words)
A simpler model that ignores word order, so the
training stage is quicker. The model uses no local
context (neighboring words) when making predictions
dm (distributed memory)
The paragraph is treated as an extra word. Its vector is
concatenated or averaged with the local context word vectors when
making predictions. During training, both paragraph and word
embeddings are updated. This requires more computation and
adds complexity.
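The difference between the two models is easiest to see in the training examples each one forms. The sketch below is a conceptual illustration only, not gensim's implementation: it lists (input, target) pairs without training anything, and the symmetric context window is an assumption.

```python
from typing import List, Tuple


def dbow_examples(tag: str, words: List[str]) -> List[Tuple[str, str]]:
    """DBOW: the paragraph tag alone predicts each word; no context is used."""
    return [(tag, word) for word in words]


def dm_examples(tag: str, words: List[str], window: int = 2) -> List[Tuple[tuple, str]]:
    """DM: the paragraph tag plus neighboring words predict the center word."""
    examples = []
    for i, target in enumerate(words):
        left = words[max(0, i - window):i]
        right = words[i + 1:i + 1 + window]
        examples.append(((tag, *left, *right), target))
    return examples


words = ["battery", "lasts", "long"]
print(dbow_examples("doc_0", words))
print(dm_examples("doc_0", words, window=1))
```

In DBOW every input is just the tag, which is why word order is ignored; in DM the input also carries the surrounding words, which is where the extra computation comes from.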
3) Training the model
Checking how both models arrange
the word vectors
4) Applying Classification Algorithm
Testing
Training
4) Applying Classification Algorithm
Testing
Training
4) Applying Classification Algorithm
Testing
Training
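The split-then-classify step, with accuracy reported on both the training and testing portions, can be sketched as below. The deck applies scikit-learn classifiers to Doc2Vec vectors; this stand-in uses only the standard library, implements a single logistic regression trained by stochastic gradient descent, and runs on synthetic 2-D vectors that are not from the deck. All names and hyperparameter values are assumptions.

```python
import math
import random


def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle indices once, then cut into training and testing portions."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    cut = int(len(order) * (1 - test_fraction))
    train, test = order[:cut], order[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def predict_proba(weights, bias, x):
    """Sigmoid of the linear score, clamped to avoid overflow in exp."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_sgd(X, y, epochs=50, lr=0.1, seed=0):
    """Logistic regression fitted with one-example-at-a-time gradient steps."""
    rng = random.Random(seed)
    weights, bias = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            error = predict_proba(weights, bias, X[i]) - y[i]
            weights = [w - lr * error * v for w, v in zip(weights, X[i])]
            bias -= lr * error
    return weights, bias


def accuracy(weights, bias, X, y):
    """Fraction of examples whose thresholded prediction matches the label."""
    hits = sum((predict_proba(weights, bias, x) >= 0.5) == bool(t) for x, t in zip(X, y))
    return hits / len(X)


# Synthetic "document vectors": positives near (1, 1), negatives near (-1, -1).
rng = random.Random(42)
X = [[rng.gauss(c, 0.3), rng.gauss(c, 0.3)] for c in (1.0, -1.0) for _ in range(40)]
y = [1] * 40 + [0] * 40

X_train, y_train, X_test, y_test = train_test_split(X, y)
weights, bias = fit_logistic_sgd(X_train, y_train)
train_acc = accuracy(weights, bias, X_train, y_train)
test_acc = accuracy(weights, bias, X_test, y_test)
print(f"training accuracy: {train_acc:.2f}, testing accuracy: {test_acc:.2f}")
```

Comparing the two accuracies is a quick check for overfitting: a large gap between training and testing scores suggests the classifier has memorized the training vectors.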
Deep Belief Network
Trained on only 3,000 documents
Hyperparameters selected after trying 52 different combinations
of learn_rates, decays, epochs, and hidden units
Accuracy achieved: 89%
Conclusion
The Deep Belief Network works well even with only 3,000 documents.
SVM performs poorly irrespective of the kernel and other hyperparameters.
Best Model: Deep Belief Network
Challenges
1. The Deep Belief Network implementation does not work on Python 3
2. A Python 2.7.3 virtual environment needs to be set up
3. Compatibility issues with the “nolearn” library
4. Python runs into memory issues when working on a large corpus; it does not work on a CPU and needs a
GPU-powered machine
Thank You
References:
https://districtdatalabs.silvrback.com/modern-methods-for-sentiment-analysis
https://www.zybuluo.com/HaomingJiang/note/462804