TEXT ANALYTICS With Python
SENTIMENT ANALYSIS
Extracting textual information to draw insights
Jeroen VK Rombouts
Topics for the Session
1. Introduction
2. Process of Text Analytics
3. Text Analytics Techniques
4. Practical Applications
5. Use Case Discussion
1. INTRODUCTION
Process of Text Text Analytics Practical Use Case
Introduction
Analytics Techniques Applications Discussion
• The global text analytics market was valued at USD 3.95 billion and is expected to reach USD 10.38 billion by 2023, a compound annual growth rate (CAGR) of 17.3% over the 2018–2023 forecast period.
• Organizations increasingly use text analytics tools to support decision-making, drawing actionable insights from text sources such as client interactions, emails, blogs, product reviews, and tweets.
• The primary objective of text analytics is to gather data in different forms, structured and unstructured, and analyse it to inform the organization's business decisions.
• Volume:
    • How much?
    • Examples of metrics?
• Valence:
    • How to measure?
    • Examples of metrics?
• Heterogeneity:
    • How different?
    • Examples of metrics?
• Volume:
    • Amount of data scraped, measured in kilobytes or gigabytes
    • Number of records in the data set
• Valence:
    • How positive or negative a sentence is
    • Polarity and subjectivity scores
• Heterogeneity:
    • Similarity of words across the text corpus
    • Clustering based on term frequencies
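The three metrics can be illustrated with a minimal sketch using only the standard library. The word lists `POSITIVE` and `NEGATIVE` are a hypothetical mini-lexicon invented for this example, the sample sentences are made up, and Jaccard overlap is just one possible similarity measure. Subjectivity is not covered here.

```python
import re

# Hypothetical mini-lexicon, for illustration only; real work would use a
# published sentiment lexicon or a trained model.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "poor", "hate", "sad"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def volume(docs):
    """Volume: number of records and total size in bytes (UTF-8)."""
    return {"records": len(docs),
            "bytes": sum(len(d.encode("utf-8")) for d in docs)}


def polarity(text):
    """Valence: (positive - negative) / matched lexicon words, in [-1, 1]."""
    tokens = tokenize(text)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def jaccard(a, b):
    """Heterogeneity: vocabulary overlap of two texts (1.0 = same vocabulary)."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


docs = ["I love this great phone", "I hate this bad phone", "Delivery was late"]
print(volume(docs))
print([polarity(d) for d in docs])
print(jaccard(docs[0], docs[1]))
```

A low Jaccard score between records indicates a heterogeneous corpus; a score near 1 indicates that the records share most of their vocabulary.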
2. PROCESS OF TEXT ANALYTICS
Text Data
Pre-processing
• Case conversion
• Punctuation removal
• Stopword removal – dropping common words that carry little meaning (e.g. "the", "is")
• Rare word removal – dropping words that occur too rarely to be informative
• Spelling correction
• Tokenization – breaking a sentence into a list of words
• Stemming – stripping suffixes to reduce a word to its stem, which may not be a dictionary word
• Lemmatization – reducing an inflected form to its dictionary form (lemma), e.g. "ran" → "run"
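Most of these steps can be sketched in a few lines of plain Python. The stopword list and the `naive_stem` helper below are simplified stand-ins invented for this example (a real pipeline would typically use a library stemmer or lemmatizer), and spelling correction and lemmatization are left out.

```python
import string
from collections import Counter

# Tiny illustrative stopword list (an assumption; NLP libraries ship
# much larger ones).
STOPWORDS = {"the", "a", "an", "is", "are", "was", "and", "of", "to", "in"}


def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Apply case conversion, punctuation removal, tokenization,
    stopword removal, and stemming, in that order."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [naive_stem(t) for t in tokens]


def drop_rare(token_lists, min_count=2):
    """Remove tokens that occur fewer than min_count times in the whole corpus."""
    counts = Counter(t for tokens in token_lists for t in tokens)
    return [[t for t in tokens if counts[t] >= min_count] for tokens in token_lists]


corpus = ["The cats are playing in the garden.", "A cat played, and the dog barked!"]
cleaned = [preprocess(doc) for doc in corpus]
print(cleaned)
print(drop_rare(cleaned))
```

The order matters: lowercasing before stopword removal ensures "The" matches "the", and stemming before rare-word removal lets "cats" and "cat" count as the same token.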
Feature Extraction
• Number of words
• Number of characters
• Average word length
• Number of stopwords
• Number of special characters
• Number of numeric characters
• Number of uppercase words
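These surface features can be computed directly from the raw text. The following is a minimal sketch; the function name `basic_features`, the short stopword list, and the sample sentence are assumptions made for the example, and "special characters" is taken to mean ASCII punctuation.

```python
import string

# Illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "is", "are", "was", "and", "of", "to", "in"}


def basic_features(text):
    """Return simple count-based features for one piece of raw text."""
    words = text.split()
    return {
        "n_words": len(words),
        "n_chars": len(text),  # includes spaces
        "avg_word_len": sum(len(w) for w in words) / len(words) if words else 0.0,
        "n_stopwords": sum(w.lower() in STOPWORDS for w in words),
        "n_special": sum(c in string.punctuation for c in text),
        "n_numeric": sum(c.isdigit() for c in text),
        "n_upper_words": sum(w.isupper() for w in words),
    }


features = basic_features("WOW the new phone is GREAT, 10/10 #happy")
for name, value in features.items():
    print(f"{name}: {value}")
```

These features are computed before pre-processing, since lowercasing and punctuation removal would erase the uppercase and special-character counts.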
Feature Selection
3. TEXT ANALYTICS TECHNIQUES
Sentiment Analysis
Text Classification
• Types:
    • Supervised document classification
    • Unsupervised document classification
    • Semi-supervised document classification
• Techniques:
    • K-nearest neighbour algorithms
    • Naïve Bayes classifier
    • Support Vector Machines
    • Artificial Neural Networks
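As an illustration of supervised classification, here is a minimal sketch of a multinomial Naïve Bayes classifier with add-one smoothing, written in plain Python. The class name `NaiveBayesText` and the four training messages are invented for the example; in practice a library implementation would normally be used.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(n_docs / total_docs)
            for word in doc.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train_docs = ["win a free prize now", "free cash offer",
              "meeting agenda attached", "lunch at noon tomorrow"]
train_labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayesText().fit(train_docs, train_labels)
print(model.predict("free prize offer"))
print(model.predict("agenda for the meeting"))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.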
TF-IDF (1/2)
TF-IDF (2/2)
• The Inverse Document Frequency is given by:
• Finally TF-IDF:
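A minimal sketch, assuming the common textbook definitions (the variant intended on the slide may differ, for example by smoothing the IDF): tf(t, d) is the count of term t in document d divided by the number of terms in d, idf(t) = log(N / df(t)) where N is the number of documents and df(t) the number of documents containing t, and tf-idf(t, d) = tf(t, d) × idf(t). The function name `tf_idf` and the three sample documents are invented for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf(t, d) = count of t in d / number of terms in d
    idf(t)   = log(N / df(t)), N = number of documents,
               df(t) = number of documents containing t
    weight   = tf(t, d) * idf(t)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({t: (c / len(tokens)) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return weights


docs = ["the cat sat", "the dog sat", "the cat ran"]
weights = tf_idf(docs)
for doc, w in zip(docs, weights):
    print(doc, {t: round(v, 3) for t, v in w.items()})
```

A term that appears in every document, such as "the" here, gets a weight of zero, while a term unique to one document, such as "dog", gets the highest weight.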
4. PRACTICAL APPLICATIONS
Practical Applications
• Spam mail classification
• Brand perception in the current market
• Competitor analysis
• Contextual advertising
• Business intelligence
• Prediction and prevention of crime
• Customer care services
• Fraud detection by insurance companies
5. USE CASE DISCUSSION
Text Pre-processing
Feature Selection
TF-IDF N-grams
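An n-gram is a run of n consecutive tokens; counting n-grams captures short phrases such as "not good" that single-word features miss. A minimal sketch follows, where the helper name `ngrams` and the sample sentence are invented for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "not a good movie not a good plot".split()
bigrams = ngrams(tokens, 2)
print(Counter(bigrams).most_common(2))
```

The resulting n-gram counts can be weighted with TF-IDF in the same way as single-word counts.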
Word Cloud
What conclusions can we draw based on the resulting word cloud?
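A word cloud scales each word by how often it occurs, so the underlying data is simply a frequency table. A minimal sketch of that table, assuming a short illustrative stopword list and made-up sample texts:

```python
from collections import Counter

# Illustrative stopword list (an assumption).
STOPWORDS = {"the", "and", "was", "a", "is"}


def word_frequencies(docs, top=5):
    """Return the `top` most frequent non-stopword tokens with their counts."""
    counts = Counter(
        word
        for doc in docs
        for word in doc.lower().split()
        if word not in STOPWORDS
    )
    return counts.most_common(top)


tweets = ["great service and great food", "the food was cold", "service was slow"]
top_words = word_frequencies(tweets, top=3)
print(top_words)
```

Reading the most frequent words alongside the cloud helps separate genuinely dominant themes from words that merely look large because they are long.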
K-means Clustering
K-means clustering lets us identify the group of people whose tweets carry a more positive sentiment than the rest: cluster 2.
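A minimal sketch of the idea, using a plain-Python version of Lloyd's algorithm. The features being clustered are not stated here, so the example assumes each user is summarized by a hypothetical (mean polarity, mean subjectivity) pair, and the data points are made up.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm on tuples of floats. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return centroids, labels


# Hypothetical per-user features: (mean polarity, mean subjectivity).
users = [(0.8, 0.6), (0.7, 0.5), (0.9, 0.7), (-0.6, 0.4), (-0.7, 0.5), (-0.5, 0.3)]
centroids, labels = kmeans(users, k=2)
most_positive = max(range(2), key=lambda c: centroids[c][0])
print("labels:", labels)
print("most positive cluster:", most_positive)
```

The cluster whose centroid has the highest mean polarity is the "more positive" group; cluster numbers themselves are arbitrary and depend on the initialisation.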
Conclusion and Future Scope
• From the analysis above, we obtained insights into the overall sentiment of the people whose tweets were analysed.
• This sentiment analysis provides a basis for hate/love speech recognition.
• As a next step, we can train a model on the newly labelled tweets and use it to predict hate speech in a new set of tweets.