
Amina Rahman K_DL_Lab5

December 12, 2021

1 SENTIMENT ANALYSIS IN PYTHON 3 USING NLTK

2 1. INSTALL AND DOWNLOAD THE DATA
[1]: import nltk
nltk.download('twitter_samples')

3 2. TOKENIZING THE DATA


[ ]: from nltk.corpus import twitter_samples

[21]: pos_tweets = twitter_samples.strings('positive_tweets.json')
neg_tweets = twitter_samples.strings('negative_tweets.json')
text = twitter_samples.strings('tweets.20150430-223406.json')

[22]: print(pos_tweets[:10])
print()
print(neg_tweets[:10])

['#FollowFriday @France_Inte @PKuchly57 @Milipol_Paris for being top engaged members


in my community this week :)', '@Lamb2ja Hey James! How odd :/ Please call our Contact
Centre on 02392441234 and we will be able to assist you :) Many thanks!', '@DespiteOfficial
we had a listen last night :) As You Bleed is an amazing track. When are you in Scotland?!',
'@97sides CONGRATS :)', 'yeaaaah yippppy!!! my accnt verified rqst has succeed got a blue
tick mark on my fb profile :) in 15 days', '@BhaktisBanter @PallaviRuhail This one is irresistible
:)\n#FlipkartFashionFriday http://t.co/EbZ0L2VENM', "We don't like to keep our lovely
customers waiting for long! We hope you enjoy! Happy Friday! - LWWF :)
https://t.co/smyYriipxI", '@Impatientraider On second thought, there’s just not enough time for
a DD :) But new shorts entering system. Sheep must be buying.', 'Jgh , but we have to go to
Bayan :D bye', 'As an act of mischievousness, am calling the ETL layer of our in-house
warehousing app Katamari.\n\nWell… as the name implies :p.']

['hopeless for tmr :(', "Everything in the kids section of IKEA is so cute. Shame I'm nearly 19 in
2 months :(", '@Hegelbon That heart sliding into the waste basket. :(', '“@ketchBurning: I hate
Japanese call him "bani" :( :(”\n\nMe

too', 'Dang starting next week I have "work" :(', "oh god, my babies' faces :(
https://t.co/9fcwGvaki0", '@RileyMcDonough make me smile :((', '@f0ggstar @stuartthull
work neighbour on motors. Asked why and he said hates the updates on search :(
http://t.co/XvmTUikWln', 'why?:("@tahuodyy: sialan:(
https://t.co/Hv1i0xcrL2"', 'Athabasca glacier was there in #1948 :-( #athabasca #glacier
#jasper #jaspernationalpark #alberta #explorealberta #…
http://t.co/dZZdqmf7Cz']

[4]: nltk.download('punkt')

[26]: pos_tweets = twitter_samples.tokenized('positive_tweets.json')
neg_tweets = twitter_samples.tokenized('negative_tweets.json')

[6]: print(tweet_tokens[0])
print()
print(tweet_tokens[0][0])

['#FollowFriday', '@France_Inte', '@PKuchly57', '@Milipol_Paris', 'for', 'being', 'top',


'engaged', 'members', 'in', 'my', 'community', 'this', 'week', ':)']

#FollowFriday

4 3. NORMALIZING THE DATA


[36]: nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')

[8]: from nltk.tag import pos_tag


from nltk.corpus import twitter_samples

[9]: tweet_tokens = twitter_samples.tokenized('positive_tweets.json')


print(pos_tag(tweet_tokens[0]))

[('#FollowFriday', 'JJ'), ('@France_Inte', 'NNP'), ('@PKuchly57', 'NNP'),


('@Milipol_Paris', 'NNP'), ('for', 'IN'), ('being', 'VBG'), ('top', 'JJ'), ('engaged', 'VBN'),
('members', 'NNS'), ('in', 'IN'), ('my', 'PRP$'), ('community', 'NN'), ('this', 'DT'), ('week',
'NN'), (':)', 'NN')]
The pos_tag function returns a list of (token, tag) pairs; here is a list of the most common tags and their meanings:
• NNP: Noun, proper, singular
• NN: Noun, common, singular or mass
• IN: Preposition or conjunction, subordinating
• VBG: Verb, gerund or present participle
• VBN: Verb, past participle
In general, if a tag starts with NN, the word is a noun, and if it starts with VB, the word is a verb.

[10]: from nltk.stem.wordnet import WordNetLemmatizer

def lemmatize_sentence(tokens):
    lemmatizer = WordNetLemmatizer()
    lemmatized_sentence = []
    for word, tag in pos_tag(tokens):
        if tag.startswith('NN'):
            pos = 'n'
        elif tag.startswith('VB'):
            pos = 'v'
        else:
            pos = 'a'
        lemmatized_sentence.append(lemmatizer.lemmatize(word, pos))
    return lemmatized_sentence

print(lemmatize_sentence(tweet_tokens[0]))

['#FollowFriday', '@France_Inte', '@PKuchly57', '@Milipol_Paris', 'for', 'be', 'top', 'engage',


'member', 'in', 'my', 'community', 'this', 'week', ':)']
This code imports the WordNetLemmatizer class and initializes it to a variable, lemmatizer.
The function lemmatize_sentence first gets the part-of-speech tag of each token of a tweet. Within the if
statement, if the tag starts with NN, the token is lemmatized as a noun. Similarly, if the tag starts
with VB, the token is lemmatized as a verb; otherwise, it is lemmatized as an adjective.
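To see why the part-of-speech argument matters, here is a minimal sketch, not part of the original lab, using an illustrative word of our own choosing:

[ ]: from nltk.stem.wordnet import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
# With no POS hint, lemmatize() treats the word as a noun and leaves it unchanged.
print(lemmatizer.lemmatize('running'))       # running
# With pos='v', the verb is reduced to its base form.
print(lemmatizer.lemmatize('running', 'v'))  # run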

5 4. REMOVING NOISE FROM DATA


Noise is any part of the text that does not add meaning or information to data.
Noise is specific to each project, so what constitutes noise in one project may not be in a
different project. For instance, the most common words in a language are called stop words.
Some examples of stop words are “is”, “the”, and “a”. They are generally irrelevant when
processing language, unless a specific use case warrants their inclusion.
In this project, you will use regular expressions in Python to search for and remove these items:
• Hyperlinks - All hyperlinks in Twitter are converted to the URL shortener t.co. Therefore,
keeping them in the text processing would not add any value to the analysis.
• Twitter handles in replies - These Twitter usernames are preceded by a @ symbol, which
does not convey any meaning.
• Punctuation and special characters - While these often provide context to textual data, this
context is often difficult to process. For simplicity, you will remove all punctuation and
special characters from tweets.
To remove hyperlinks, you need to first search for a substring that matches a URL starting with
http:// or https://, followed by letters, numbers, or special characters. Once a pattern is matched,
the .sub() method replaces it with an empty string.
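As a quick illustration, and not part of the original notebook, the same URL pattern can be tried on a single made-up string with re.sub():

[ ]: import re

sample = 'check this out https://t.co/abc123'
# The matched URL is replaced with an empty string.
print(re.sub('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+#]|[!*\(\),]|'
             '(?:%[0-9a-fA-F][0-9a-fA-F]))+', '', sample))
# prints: check this out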

[11]: import re, string

def remove_noise(tweet_tokens, stop_words = ()):

    cleaned_tokens = []

    for token, tag in pos_tag(tweet_tokens):
        token = re.sub('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+#]|[!*\(\),]|'\
                       '(?:%[0-9a-fA-F][0-9a-fA-F]))+', '', token)
        token = re.sub("(@[A-Za-z0-9_]+)", "", token)

        if tag.startswith("NN"):
            pos = 'n'
        elif tag.startswith('VB'):
            pos = 'v'
        else:
            pos = 'a'

        lemmatizer = WordNetLemmatizer()
        token = lemmatizer.lemmatize(token, pos)

        if len(token) > 0 and token not in string.punctuation and token.lower() not in stop_words:
            cleaned_tokens.append(token.lower())
    return cleaned_tokens

This code creates a remove_noise() function that removes noise and incorporates the
normalization and lemmatization mentioned in the previous section. The code takes two
arguments: the tweet tokens and the tuple of stop words.
The code then uses a loop to remove the noise from the dataset. To remove hyperlinks, the code
first searches for a substring that matches a URL starting with http:// or https://, followed by
letters, numbers, or special characters. Once a pattern is matched, the .sub() method replaces it
with an empty string, ''.
Similarly, to remove @ mentions, the code substitutes the relevant part of the text using regular
expressions. The code uses the re library to search for @ symbols, followed by letters, numbers, or
underscores, and replaces them with an empty string.
Finally, we can remove punctuation using the string library.
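One detail worth noting, shown in a small sketch that is not part of the original lab: because membership on a string is a substring test, the check token not in string.punctuation drops single punctuation marks but keeps multi-character emoticons such as :).

[ ]: import string

print('!' in string.punctuation)   # True  -> '!' is dropped
print(':)' in string.punctuation)  # False -> ':)' is kept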

[12]: from nltk.corpus import stopwords


stop_words = stopwords.words('english')
print(remove_noise(tweet_tokens[0],stop_words))

['#followfriday', 'top', 'engage', 'member', 'community', 'week', ':)']

[27]: positive_cleaned_tokens_list = []
negative_cleaned_tokens_list = []

for tokens in pos_tweets:
    positive_cleaned_tokens_list.append(remove_noise(tokens, stop_words))

for tokens in neg_tweets:
    negative_cleaned_tokens_list.append(remove_noise(tokens, stop_words))

Next, compare the original tokens to the cleaned tokens for a sample tweet. If you'd like to test this,
add the following code to compare both versions of the 500th tweet in the list:

[28]: print(pos_tweets[500])
print(positive_cleaned_tokens_list[500])

['Dang', 'that', 'is', 'some', 'rad', '@AbzuGame', '#fanart', '!', ':D',


'https://t.co/bI8k8tb9ht']
['dang', 'rad', '#fanart', ':d']

6 5. DETERMINING WORD DENSITY


[29]: def get_all_words(cleaned_tokens_list):
    for tokens in cleaned_tokens_list:
        for token in tokens:
            yield token

all_pos_words = get_all_words(positive_cleaned_tokens_list)

[30]: from nltk import FreqDist

freq_dist_pos = FreqDist(all_pos_words)
print(freq_dist_pos.most_common(10))

[(':)', 3691), (':-)', 701), (':d', 658), ('thanks', 388), ('follow', 357), ('love', 333), ('…', 290),
('good', 283), ('get', 263), ('thank', 253)]
The .most_common() method lists the words which occur most frequently in the data. Save and
close the file after making these changes.
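For comparison, the same frequency distribution can be built over the negative tokens. This is an extra check that is not in the original notebook; the names all_neg_words and freq_dist_neg are ours.

[ ]: all_neg_words = get_all_words(negative_cleaned_tokens_list)

freq_dist_neg = FreqDist(all_neg_words)
# Expect emoticons such as :( near the top of the negative list.
print(freq_dist_neg.most_common(10))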

7 6. PREPARING DATA FOR THE MODEL


Sentiment analysis is the process of identifying the author's attitude toward the topic being
written about. You will create a training dataset to train a model. Sentiment analysis is a supervised
machine learning process, which requires you to associate each item in the dataset with a "sentiment"
label for training. In this project, the model will use the "positive" and "negative" sentiments.
Sentiment analysis can be used to categorize text into a variety of sentiments. For simplicity and
availability of the training dataset, this project trains the model in only two categories, positive
and negative.
A model is a description of a system using rules and equations. It may be as simple as an
equation which predicts the weight of a person given their height. The sentiment analysis model
that you will build will associate tweets with a positive or a negative sentiment. You will need
to split your dataset into two parts: the first part is used to build the model, and the second
part tests its performance.
In the data preparation step, you will prepare the data for sentiment analysis by converting
tokens to dictionary form and then splitting the data for training and testing purposes.
Converting Tokens to a Dictionary
First, you will prepare the data to be fed into the model. You will use the Naive Bayes classifier
in NLTK to perform the modeling exercise. Notice that the model requires not just a list of words
in a tweet, but a Python dictionary with words as keys and True as values. The following generator
function changes the format of the cleaned data accordingly.
Add the following code to convert the tweets from a list of cleaned tokens to dictionaries with
the tokens as keys and True as values. The corresponding generators are stored in
positive_tokens_for_model and negative_tokens_for_model.

[31]: def get_tweets_for_model(cleaned_tokens_list):
    for tweet_tokens in cleaned_tokens_list:
        yield dict([token, True] for token in tweet_tokens)

positive_tokens_for_model = get_tweets_for_model(positive_cleaned_tokens_list)
negative_tokens_for_model = get_tweets_for_model(negative_cleaned_tokens_list)
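To see the exact format the classifier expects, you can peek at the first converted tweet. This check is not part of the original lab; it builds a fresh generator so that positive_tokens_for_model itself is not consumed.

[ ]: sample_dict = next(get_tweets_for_model(positive_cleaned_tokens_list))
# Each tweet becomes a dict mapping every cleaned token to True,
# e.g. {'#followfriday': True, 'top': True, 'engage': True, ...}
print(sample_dict)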

7.0.1 Splitting the Dataset for Training and Testing the Model
Next, prepare the data for training the NaiveBayesClassifier class.

[32]: import random

positive_dataset = [(tweet_dict, "Positive")
                    for tweet_dict in positive_tokens_for_model]

negative_dataset = [(tweet_dict, "Negative")
                    for tweet_dict in negative_tokens_for_model]

dataset = positive_dataset + negative_dataset

random.shuffle(dataset)

train_data = dataset[:7000]
test_data = dataset[7000:]

This code attaches a Positive or Negative label to each tweet. It then creates a dataset by joining
the positive and negative tweets.
By default, the data contains all positive tweets followed by all negative tweets in sequence.
When training the model, you should provide a sample of your data that does not contain any
bias. To avoid bias, we've added code to randomly arrange the data using the .shuffle() method of random.
Finally, the code splits the shuffled data into a ratio of 70:30 for training and testing, respectively.
Since the number of tweets is 10000, you can use the first 7000 tweets from the shuffled dataset
for training the model and the final 3000 for testing the model.
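As a quick sanity check, not in the original notebook, you can confirm the split sizes and that shuffling left both labels well represented in the training slice; collections.Counter is used here only for this check.

[ ]: from collections import Counter

print(len(train_data), len(test_data))           # 7000 3000
# Count the Positive/Negative labels in the training split.
print(Counter(label for _, label in train_data))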

8 7. BUILDING AND TESTING THE MODEL


[33]: from nltk import classify
from nltk import NaiveBayesClassifier
classifier = NaiveBayesClassifier.train(train_data)

print("Accuracy is:", classify.accuracy(classifier, test_data))

print(classifier.show_most_informative_features(10))

Accuracy is: 0.9953333333333333


Most Informative Features
:( = True Negati : Positi = 2083.8 : 1.0
:) = True Positi : Negati = 1654.4 : 1.0
follower = True Positi : Negati = 21.5 : 1.0
welcome = True Positi : Negati = 20.7 : 1.0
bam = True Positi : Negati = 19.4 : 1.0
sick = True Negati : Positi = 18.6 : 1.0
sad = True Negati : Positi = 16.3 : 1.0
followed = True Negati : Positi = 15.2 : 1.0
arrive = True Positi : Negati = 13.1 : 1.0
community = True Positi : Negati = 12.8 : 1.0
None
Accuracy is defined as the percentage of tweets in the testing dataset for which the model was
correctly able to predict the sentiment. A 99.5% accuracy on the test set is pretty good.
In the table that shows the most informative features, every row in the output shows the ratio of
occurrence of a token in positive and negative tagged tweets in the training dataset. The first row
in the data signifies that in all tweets containing the token :(, the ratio of negative to positive
tweets was 2083.8 to 1. Interestingly, it seems that there was at least one tweet containing :( in the
positive dataset. You can see that the top two discriminating items in the text are the emoticons.
Further, words such as sad lead to negative sentiments, whereas welcome and arrive are associated
with positive sentiments.
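Beyond overall accuracy, it can help to see where the classifier errs. The sketch below is not part of the original lab; it builds a confusion matrix over the test set with nltk's ConfusionMatrix, and the names ref_labels and pred_labels are ours.

[ ]: from nltk.metrics import ConfusionMatrix

ref_labels = [label for _, label in test_data]
pred_labels = [classifier.classify(feats) for feats, _ in test_data]

# Rows are reference (true) labels, columns are predicted labels.
print(ConfusionMatrix(ref_labels, pred_labels))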
[34]: from nltk.tokenize import word_tokenize

custom_tweet = "I ordered just once from TerribleCo, they screwed up, never used the app again."

custom_tokens = remove_noise(word_tokenize(custom_tweet))

print(classifier.classify(dict([token, True] for token in custom_tokens)))

Negative

[35]: custom_tweet = 'Congrats #SportStar on your 7th best goal from last season winning goal of the year :) #Baller #Topbin #oneofmanyworldies'
custom_tokens = remove_noise(word_tokenize(custom_tweet))

print(classifier.classify(dict([token, True] for token in custom_tokens)))

Positive

9 8. CLEANING UP THE CODE


• All imports should be at the top of the file. Imports from the same library should be grouped
together in a single statement.
• All functions should be defined after the imports.
• All the statements in the file should be housed under an if __name__ == "__main__": condition. This
ensures that the statements are not executed if you import the functions of the file in
another file.
[ ]: from nltk.stem.wordnet import WordNetLemmatizer
from nltk.corpus import twitter_samples, stopwords
from nltk.tag import pos_tag
from nltk.tokenize import word_tokenize
from nltk import FreqDist, classify, NaiveBayesClassifier

import re, string, random

def remove_noise(tweet_tokens, stop_words = ()):

    cleaned_tokens = []

    for token, tag in pos_tag(tweet_tokens):
        token = re.sub('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+#]|[!*\(\),]|'\
                       '(?:%[0-9a-fA-F][0-9a-fA-F]))+', '', token)
        token = re.sub("(@[A-Za-z0-9_]+)", "", token)

        if tag.startswith("NN"):
            pos = 'n'
        elif tag.startswith('VB'):
            pos = 'v'
        else:
            pos = 'a'

        lemmatizer = WordNetLemmatizer()
        token = lemmatizer.lemmatize(token, pos)

        if len(token) > 0 and token not in string.punctuation and token.lower() not in stop_words:
            cleaned_tokens.append(token.lower())
    return cleaned_tokens

def get_all_words(cleaned_tokens_list):
    for tokens in cleaned_tokens_list:
        for token in tokens:
            yield token

def get_tweets_for_model(cleaned_tokens_list):
    for tweet_tokens in cleaned_tokens_list:
        yield dict([token, True] for token in tweet_tokens)

if __name__ == "__main__":

    positive_tweets = twitter_samples.strings('positive_tweets.json')
    negative_tweets = twitter_samples.strings('negative_tweets.json')
    text = twitter_samples.strings('tweets.20150430-223406.json')
    tweet_tokens = twitter_samples.tokenized('positive_tweets.json')[0]

    stop_words = stopwords.words('english')

    positive_tweet_tokens = twitter_samples.tokenized('positive_tweets.json')
    negative_tweet_tokens = twitter_samples.tokenized('negative_tweets.json')

    positive_cleaned_tokens_list = []
    negative_cleaned_tokens_list = []

    for tokens in positive_tweet_tokens:
        positive_cleaned_tokens_list.append(remove_noise(tokens, stop_words))

    for tokens in negative_tweet_tokens:
        negative_cleaned_tokens_list.append(remove_noise(tokens, stop_words))

    all_pos_words = get_all_words(positive_cleaned_tokens_list)

    freq_dist_pos = FreqDist(all_pos_words)
    print(freq_dist_pos.most_common(10))

    positive_tokens_for_model = get_tweets_for_model(positive_cleaned_tokens_list)
    negative_tokens_for_model = get_tweets_for_model(negative_cleaned_tokens_list)

    positive_dataset = [(tweet_dict, "Positive")
                        for tweet_dict in positive_tokens_for_model]

    negative_dataset = [(tweet_dict, "Negative")
                        for tweet_dict in negative_tokens_for_model]

    dataset = positive_dataset + negative_dataset

    random.shuffle(dataset)

    train_data = dataset[:7000]
    test_data = dataset[7000:]

    classifier = NaiveBayesClassifier.train(train_data)

    print("Accuracy is:", classify.accuracy(classifier, test_data))

    print(classifier.show_most_informative_features(10))

    custom_tweet = "I ordered just once from TerribleCo, they screwed up, never used the app again."

    custom_tokens = remove_noise(word_tokenize(custom_tweet))

    print(custom_tweet, classifier.classify(dict([token, True] for token in custom_tokens)))

This project introduced us to a basic sentiment analysis model using the nltk library in Python 3.
First, we performed pre-processing on the tweets by tokenizing them, normalizing the words, and
removing noise. Next, we examined the most frequently occurring items in the data. Finally, we built a
model to associate tweets with a particular sentiment.
A supervised learning model is only as good as its training data. To further strengthen the model,
we could consider adding more categories like excitement and anger. In this tutorial, you have
only scratched the surface by building a rudimentary model.
