TEXT ANALYTICS With Python

The document discusses text mining and sentiment analysis. It covers topics like the process of text analytics, techniques used in text analytics like natural language processing and sentiment analysis, practical applications of text analytics, and use case discussions. The document provides details on each step of text analytics like data collection, preprocessing, feature extraction, feature selection and different analysis methods.

Uploaded by

ignacio.pelirojo
Copyright
© All Rights Reserved

TEXT MINING AND SENTIMENT ANALYSIS
Extracting textual information to draw insights

Jeroen VK Rombouts

Topics for the Session

1. Introduction
2. Process of Text Analytics
3. Text Analytics Techniques
4. Practical Applications
5. Use Case Discussion

1. INTRODUCTION


What is Text Mining?

• Text mining is the process of deriving high-quality information from text through statistical pattern learning.
• Types: text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modelling.


Need for Text Mining (1/2)

• The global text analytics market was valued at USD 3.95 billion and is expected to reach USD 10.38 billion by 2023, a compound annual growth rate (CAGR) of 17.3% over the forecast period 2018–2023.
• Organizations increasingly use text analytics tools to support their business decision-making, drawing actionable insights from text sources such as client interactions, emails, blogs, product reviews, and tweets.
• The primary objective of text analytics is to gather data in different forms, both structured and unstructured, and analyse it to inform the organization’s business decisions.


Need for Text Mining (2/2)

• In marketing: analytical customer relationship management, predictive models for customer attrition, sentiment analysis of a brand (benchmarking, market analysis, competitive analysis …).
• Determine the identity of a brand, the way it communicates with its audience, and which emotional triggers it uses in its marketing campaigns …
• Ultimately, text mining allows a brand to readjust its communication and strategy by identifying how its audience, partners, and competitors perceive it.

User      Valence   Volume   …
Peter     +5        500
Sarah     +3        400
Comp. 1   -10       5000
…         …         …

Text mining & Social Media Data – The Questions

• Volume:
  • How much?
  • Examples of metrics?
• Valence:
  • How to measure?
  • Examples of metrics?
• Heterogeneity:
  • How different?
  • Examples of metrics?


Text mining & Social Media Data – The Answers

• Volume:
  • Amount of data scraped, measured in kilobytes/gigabytes
  • Number of records in the given data
• Valence:
  • Measure the amount of positivity or negativity of a sentence
  • Polarity and subjectivity
• Heterogeneity:
  • Similarity of words in the text corpus
  • Clustering based on the term frequencies
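
One way to put a number on heterogeneity is the cosine similarity between term-frequency vectors of two texts. The sketch below is only an illustration under simple assumptions (whitespace tokenization, raw counts); the function name `cosine_similarity` is hypothetical and is not taken from the session notebook.

```python
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the term-frequency vectors of two texts."""
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(tf_a[w] * tf_b[w] for w in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as dissimilar to everything
    return dot / (norm_a * norm_b)

print(cosine_similarity("great phone great battery", "great battery life"))
print(cosine_similarity("great phone", "terrible service"))  # no shared words
```

Values close to 1 indicate near-identical vocabulary and values close to 0 indicate little overlap, so a corpus with low average pairwise similarity can be read as more heterogeneous.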


What is our Prime Focus?

• External and non-structured data: Network, UGC, etc.
• Internalized data: Datawarehouse, ERP, CRM, etc.
• External structured data: Panel, Survey, Tests, etc.
• Data for organizations and businesses directly usable for business solutions

2. PROCESS OF TEXT ANALYTICS


Process of Text Analytics

• Collection of Text Data
• Pre-processing
• Feature Extraction
• Feature Selection
• Text Analysis and Modelling
  • Natural Language Processing
  • Sentiment Analysis
  • Text Grouping and Classification


Text Mining – Classification tree


Text Data

Data for text analytics can take many forms, such as:
• Structured – Survey forms, Tests, Word docs
• Semi-structured – Job listings, Retail invoices, Reports
• Unstructured – Blogs, Tweets, Comments


Pre-processing

• Case conversion
• Punctuation removal
• Stopwords removal – common words that carry little meaning on their own
• Rare words removal – very rare words that add little information
• Spelling correction
• Tokenization – breaking a sentence down into a list of words
• Stemming – stripping affixes from a word to obtain its stem
• Lemmatization – reducing an inflected word (e.g. a different tense or number) to its dictionary form
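
Several of the steps above can be sketched with the standard library alone. The snippet below is a rough illustration, not the session notebook's code: the stopword list is a tiny made-up sample, `preprocess` and `min_count` are hypothetical names, and spelling correction, stemming and lemmatization are left out because they usually rely on a dedicated NLP library.

```python
import string
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "it"}

def preprocess(docs, min_count=2):
    """Lowercase, strip punctuation, tokenize, drop stopwords and rare words."""
    table = str.maketrans("", "", string.punctuation)
    tokenized = []
    for doc in docs:
        text = doc.lower().translate(table)  # case conversion + punctuation removal
        tokens = text.split()                # whitespace tokenization
        tokenized.append([t for t in tokens if t not in STOPWORDS])
    counts = Counter(t for tokens in tokenized for t in tokens)
    # Rare-word removal: keep tokens seen at least min_count times corpus-wide.
    return [[t for t in tokens if counts[t] >= min_count] for tokens in tokenized]

docs = ["The service is GREAT!", "Great food, great service.", "It is a zyx place"]
print(preprocess(docs))
```

The rare-word threshold is a tuning choice: on a three-document toy corpus it removes most words, whereas on a real corpus it mainly removes typos and one-off tokens.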


Feature Extraction

• Number of words
• Number of characters
• Average word length
• Number of stopwords
• Number of special characters
• Number of numeric characters
• Number of uppercase words
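
The counts listed above are straightforward to compute per document. The following is a minimal sketch under stated assumptions: words are split on whitespace, the stopword set is a small made-up sample, and `basic_features` and its dictionary keys are hypothetical names.

```python
# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "it"}

def basic_features(text: str) -> dict:
    """Compute simple surface-level features of one document."""
    words = text.split()
    return {
        "word_count": len(words),
        "char_count": len(text),  # includes spaces
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "stopword_count": sum(w.lower() in STOPWORDS for w in words),
        # Special characters: anything that is neither alphanumeric nor whitespace.
        "special_char_count": sum(not c.isalnum() and not c.isspace() for c in text),
        "numeric_count": sum(c.isdigit() for c in text),
        "uppercase_word_count": sum(w.isupper() for w in words),
    }

print(basic_features("The OFFER ends in 24 hours! #sale"))
```

These features should be computed on the raw text, before case conversion and punctuation removal, since pre-processing erases exactly the information they measure.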


Feature Selection

• Feature selection refers to filtering the useful information out of the features extracted through the methods discussed before.
• Feature selection can be done either with the ‘Bag of Words’ method or with machine learning.
• Other feature selection techniques include N-grams, Term Frequency–Inverse Document Frequency (TF-IDF), and word embeddings.
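
To make two of these techniques concrete, here is a small sketch of a bag-of-words representation and N-gram extraction. It assumes whitespace tokenization, and the helper names `ngrams` and `bag_of_words` are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_words(docs):
    """Map each document to a count vector over a shared, sorted vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for tokens in tokenized for t in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors

vocab, vectors = bag_of_words(["good movie", "not a good movie"])
print(vocab)
print(vectors)
print(ngrams("not a good movie".split(), 2))
```

Bag of words discards word order, so "good movie" and "not a good movie" share most of their vector. Bigrams such as ("not", "a") and ("a", "good") recover some of that local context.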

3. TEXT ANALYTICS TECHNIQUES


Natural Language Processing

• A core function of NLP in text mining is part-of-speech tagging (commonly referred to as POS tagging), which identifies the grammatical role of each word in a sentence and tags it accordingly.
• Other features of NLP include:
  • Text summarization
  • Machine translation
  • Optical character recognition
  • Document to information


Sentiment Analysis

• Brand perception among customers is one of the key factors to consider before making any critical decision in the current market.
• Sentiment analysis of text data that has been collected, cleaned and processed helps us to better understand the consumer market.
• The data for sentiment analysis is usually tweets, social media posts, blog comments, product reviews, etc.
• Sentiment analysis can also be carried out on longer passages to gauge the emotion of the given text.
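
A very simple form of sentiment analysis scores a text against a polarity lexicon. The sketch below is an assumption-laden toy: the six-word `LEXICON` and its weights are invented for illustration, and the functions `polarity` and `label` are hypothetical names. Libraries such as TextBlob (installed in the hands-on session later in this deck) ship with far larger lexicons and also report subjectivity.

```python
# Hypothetical mini-lexicon: word -> polarity weight (an assumption for illustration).
LEXICON = {"love": 1.0, "great": 1.0, "good": 0.5, "bad": -0.5, "hate": -1.0, "awful": -1.0}

def polarity(text: str) -> float:
    """Average lexicon weight of the matched words; 0.0 if none match."""
    tokens = [t.strip(".,!?;:") for t in text.lower().split()]
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def label(text: str) -> str:
    """Turn a polarity score into a coarse sentiment label."""
    score = polarity(text)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

for tweet in ["I love this, great job!", "I hate awful weather", "The train left"]:
    print(tweet, "->", polarity(tweet), label(tweet))
```

A lexicon approach like this ignores negation and sarcasm ("not good" still scores as positive), which is one reason supervised classifiers are often preferred when labelled data is available.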


Text Classification

• Types:
  • Supervised document classification
  • Unsupervised document classification
  • Semi-supervised document classification
• Techniques:
  • K-nearest neighbour algorithms
  • Naïve Bayes classifier
  • Support Vector Machines
  • Artificial Neural Networks
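
As an illustration of supervised document classification, the following is a compact multinomial Naïve Bayes sketch with add-one smoothing. It is written from scratch for readability and assumes whitespace tokenization; the class name `NaiveBayes` and the toy spam/ham training data are made up for this example.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        for text, y in zip(texts, labels):
            self.word_counts[y].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for y, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[y].values())
            score = math.log(n_docs / total_docs)  # log prior
            for w in text.lower().split():
                if w in self.vocab:  # ignore words never seen in training
                    likelihood = (self.word_counts[y][w] + 1) / (total_words + len(self.vocab))
                    score += math.log(likelihood)
            if score > best_score:
                best_label, best_score = y, score
        return best_label

# Made-up training data for illustration only.
texts = ["win free prize now", "free cash offer", "meeting at noon", "lunch at noon tomorrow"]
labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayes().fit(texts, labels)
print(model.predict("free prize offer"))
print(model.predict("lunch at noon"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the add-one smoothing keeps a single unseen word from driving a class probability to zero.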


POS Tagging – Parts Of Speech

• POS tagging is the process of assigning a single part-of-speech tag to each word (and each symbol or punctuation mark) in a text.
• This is very useful for finding grammatical patterns in N-grams and for calculating distance metrics between different POS tags.


TF-IDF (1/2)

• TF-IDF stands for Term Frequency – Inverse Document Frequency. It measures how important a particular word is to a document within a text corpus.
• The value of TF-IDF increases proportionally to the number of times a word appears in the document and is offset by the number of documents in the corpus that contain the word.
• The formula for Term Frequency is given by:
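
Several variants of term frequency are in use. A common one, assumed here, divides the raw count of a term by the length of the document, where $f_{t,d}$ denotes the number of times term $t$ occurs in document $d$ (the symbols are this sketch's notation, not necessarily the slide's):

```latex
\mathrm{tf}(t, d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}
```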


TF-IDF (2/2)
• The Inverse Document Frequency is given by:

• Finally TF-IDF:
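
The definitions can be put together in a few lines of Python. This sketch assumes one common formulation, which may differ from the variant intended on the slide: tf is the term count divided by the document length, idf is log(N / n_t) with N the number of documents and n_t the number of documents containing the term, and TF-IDF is their product. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: tf-idf} dict per document.

    Assumes tf = count / document length and idf = log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            t: (c / len(tokens)) * math.log(n_docs / doc_freq[t])
            for t, c in counts.items()
        })
    return scores

docs = ["the cat sat", "the dog barked", "the cat purred"]
for row in tf_idf(docs):
    print(row)
```

In this toy corpus "the" occurs in every document, so its idf is log(3/3) = 0 and its TF-IDF vanishes, while a word unique to one document such as "sat" gets the highest weight. Library implementations often add smoothing terms to the idf, so their numbers will not match this sketch exactly.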


Similarity – Levenshtein Distance

• The minimum number of edits (insertion, deletion, substitution) needed to change one string of characters into another.
• For example, the Levenshtein distance between kitten and sitting is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:
  kitten → sitten (substitution of "s" for "k")
  sitten → sittin (substitution of "i" for "e")
  sittin → sitting (insertion of "g" at the end)
• Applications: spell checkers, fuzzy string searching, and assisting natural language translation based on translation memory.
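
The distance can be computed with the classic dynamic-programming recurrence. The sketch below keeps only two rows of the table in memory; the function name `levenshtein` is this example's choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]                   # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))  # 3, matching the example above
```

The running time is proportional to the product of the two string lengths, which is fine for words and short phrases but becomes expensive when comparing every pair of strings in a large corpus.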

4. PRACTICAL APPLICATIONS


Practical Applications
• Spam mail classification
• Brand perception in the current market
• Competitor analysis
• Contextual advertising
• Business intelligence
• Prediction and prevention of crime
• Customer care services
• Fraud detection by insurance companies

5. USE CASE DISCUSSION


Hands-on Text Analytics session

• Open the “Text Analytics - Accenture Strategic Business Analytics Chair” Python notebook.
• Type ‘pip install’ followed by the library name to download any required packages or dependencies, for example:
  • pip install textblob
• Set the working directory to the location of the “train_E6oV3lV” CSV file.


Motivation behind the Use Case

• Hate speech is an unfortunately common occurrence on the Internet. Social media sites like Facebook and Twitter often face the problem of identifying and censoring problematic posts while weighing the right to freedom of speech.
• The importance of detecting and moderating hate speech is evident from the strong connection between hate speech and actual hate crimes.
• Early identification of users promoting hate speech could enable outreach programs that attempt to prevent an escalation from speech to action.


About the Data set

• The data consists of tweets extracted from Twitter and is publicly available through the Analytics Vidhya contest “Twitter Sentiment Analysis”.
• The data is a CSV file containing 31,962 unique tweets scraped from Twitter, with a mix of hate, neutral and positive tweets.
• Each tweet has a corresponding tweet ID and a sentiment label.
• The hate tweets are labelled ‘1’ and all others ‘0’.
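
A first look at the label balance could be sketched as follows. The column names `id`, `label` and `tweet` are assumptions about the CSV layout, the file path in the comment is likewise assumed, and the three rows below are invented stand-ins so that the example runs without the real file.

```python
import csv
import io
from collections import Counter

def label_counts(csv_file):
    """Count rows per label in an open CSV file (assumes a 'label' column)."""
    reader = csv.DictReader(csv_file)
    return Counter(row["label"] for row in reader)

# Stand-in rows for illustration. With the real data one would instead pass
# open("train_E6oV3lV.csv", newline="", encoding="utf-8")  (path is an assumption).
sample = io.StringIO(
    "id,label,tweet\n"
    "1,0,what a lovely day\n"
    "2,1,example of a hateful tweet\n"
    "3,0,coffee time\n"
)
counts = label_counts(sample)
print(counts)
```

Checking the class balance first matters because a heavily imbalanced label column makes plain accuracy a misleading metric for any classifier trained on it.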


Let’s explore the data


Text Pre-processing


Feature Selection
TF-IDF N-grams


Sentiment Analysis - Output

This analysis gives us a general opinion about the set of tweets we took into consideration. From the pie chart, we can see that around 80% of the tweets are either neutral or positive, so hate/negative content makes up only a small share of this text corpus.

Word Cloud
What conclusions can we draw based on the resulting word cloud?

We can refine the graph by removing certain words from the original corpus, e.g.:
• Remove “go”
• Use spelling checks
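
A word cloud is driven by word frequencies, so the effect of removing a word can be checked on the counts directly. This is a small sketch with invented tweets; `top_words` and its `exclude` parameter are hypothetical names.

```python
from collections import Counter

def top_words(texts, exclude=(), n=3):
    """Most frequent words across texts, skipping any excluded words."""
    excluded = {w.lower() for w in exclude}
    counts = Counter(
        w for text in texts for w in text.lower().split() if w not in excluded
    )
    return counts.most_common(n)

tweets = ["go team go", "love this team", "go love life"]  # made-up examples
print(top_words(tweets))                  # "go" dominates the raw counts
print(top_words(tweets, exclude={"go"}))  # more informative words surface
```

Spelling correction has a similar effect from the other side: merging misspelled variants into one token raises the count of the intended word.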


K-means Clustering
Through K-means clustering we can now identify the group of people with a higher positive sentiment than the rest, which is cluster 2.

By clustering the tweets on their sentiment scores instead, we can classify the users according to the emotions expressed in their posts.
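
The idea of clustering on sentiment can be sketched with a one-dimensional K-means over polarity scores. The scores and starting centroids below are invented, the function name `kmeans_1d` is hypothetical, and the cluster indices produced here bear no relation to the cluster numbering in the session's output.

```python
def kmeans_1d(values, centroids, iterations=20):
    """Lloyd's algorithm on scalar values, starting from the given centroids."""
    centroids = list(centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every value.
        assignments = [
            min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, a in zip(values, assignments) if a == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return centroids, assignments

scores = [-0.8, -0.6, -0.7, 0.0, 0.1, 0.7, 0.9, 0.8]  # made-up polarity scores
centroids, assignments = kmeans_1d(scores, centroids=[-1.0, 0.0, 1.0])
print(centroids)
print(assignments)
```

Fixing the starting centroids keeps the example deterministic. Library implementations typically pick them at random and rerun several times, so cluster numbers can change between runs and should be interpreted through the centroid values.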

Conclusion and Future Scope

• From the above analysis, we have obtained insights into the overall sentiment of the people whose tweets were examined.
• This sentiment analysis provides the basis for hate/love speech recognition.
• Going further, we can train a model on our newly tagged tweets and predict the occurrence of hate speech in a new set of tweets.
