NLP Record



WEEK-1
AIM: Installation and exploration of the features of the NLTK and spaCy tools. Download WordCloud and a few corpora.

Installation of NLTK:
1. Go to https://www.python.org/downloads/ and select the latest version for Windows.
2. Run the downloaded installer.
3. Select Install Now.
4. After the setup completes successfully, click Close.
5. In the Windows command prompt, navigate to the location of the pip folder.
6. Enter the command to install NLTK: pip install nltk
7. The installation should complete successfully.

Features of NLTK:
Part of Speech tagging

Summarization

Named Entity Recognition

Sentiment Analysis

Emotion Detection

Language Detection

Data Ingestion and Wrangling

Programming Language Support

Drag and Drop

Customizable Models

Pre-Built Algorithms

Installation of spaCy:
Open the Command Prompt and enter the following commands to install spaCy.

pip install -U pip setuptools wheel

pip install -U spacy

python -m spacy download en_core_web_sm

Features of spaCy:
Parts of Speech tagging

Morphology

Lemmatization

Dependency Parse

Named Entities

Tokenization

Merging and Splitting



Sentence Segmentation, etc.

Downloading WordCloud:


Using Command Prompt
WordCloud can be installed on our system by running the given command in the command prompt.

$ pip install wordcloud

Using Anaconda
We can install WordCloud with Anaconda by typing the following command in the Anaconda Prompt.

$ conda install -c conda-forge wordcloud

Downloading a Few Corpora


We can install corpora using the Python interpreter.

Run the Python interpreter and type the following commands:

import nltk

nltk.download()

A new window should open, showing the NLTK Downloader. Click on the File menu and select Change
Download Directory. Next, select the packages or collections you want to download. After successful
installation, we can test that the data has been installed as follows:

from nltk.corpus import brown

brown.words()

Output:

['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]



WEEK-2
AIM: i) Write a program to implement a word tokenizer, and sentence and paragraph tokenizers.

ii) Check how many words there are in a corpus. Also check how many distinct words there are.

DESCRIPTION:

Tokenization: Tokenization is the process by which a large quantity of text is divided into smaller parts called
tokens. These tokens are very useful for finding patterns and are considered a base step for stemming and
lemmatization. Tokenization also helps to substitute sensitive data elements with non-sensitive data elements.

The Natural Language Toolkit has a very important module, nltk.tokenize, which comprises the sub-modules

1. word tokenize
2. sentence tokenize

We use the method word_tokenize() to split a sentence into words and sent_tokenize() to tokenize into
sentences.
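Before turning to NLTK, the idea behind both tokenizers can be sketched in plain Python. The following is a rough, library-free illustration using only the standard re module; the function names simple_word_tokenize and simple_sent_tokenize are our own, and real tokenizers such as NLTK's handle many more edge cases (abbreviations, quotes, ellipses).

```python
import re

def simple_word_tokenize(text):
    # Keep runs of word characters as tokens, and split off each
    # punctuation mark as its own token.
    return re.findall(r"\w+|[^\w\s]", text)

def simple_sent_tokenize(text):
    # Split on whitespace that follows sentence-final punctuation.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

sample = "Tokenization splits text into tokens. It is a base step for stemming."
print(simple_word_tokenize(sample))
print(simple_sent_tokenize(sample))
```

This naive version mis-handles abbreviations like "Dr." or "e.g.", which is exactly why NLTK's trained Punkt tokenizer is preferred in practice.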

PROGRAM:

I)
import nltk

nltk.download('punkt')  # download the Punkt tokenizer models; note the parentheses -
                        # writing nltk.download without them only returns the bound method

from nltk import word_tokenize, sent_tokenize

text="Tokenization in NLP is the process by which a large quantity of text is divided into smaller parts called
tokens. Natural language processing is used for building applications such as Text classification, intelligent
chatbot, sentimental analysis, language translation, etc.Natural Language toolkit has very important module
NLTK tokenize sentence which further comprises of sub-modulesWe use the method word_tokenize() to split a
sentence into words. The output of word tokenizer in NLTK can be converted to Data Frame for better text
understanding in machine learning applications.Sub-module available for the above is sent_tokenize. Sentence
tokenizer in Python NLTK is an important feature for machine training."

print(word_tokenize(text))

OUTPUT:

['Tokenization', 'in', 'NLP', 'is', 'the', 'process', 'by', 'which', 'a', 'large', 'quantity', 'of', 'text', 'is', 'divided', 'into',
'smaller', 'parts', 'called', 'tokens', '.', 'Natural', 'language', 'processing', 'is', 'used', 'for', 'building', 'applications',
'such', 'as', 'Text', 'classification', ',', 'intelligent', 'chatbot', ',', 'sentimental', 'analysis', ',', 'language', 'translation',
',', 'etc.Natural', 'Language', 'toolkit', 'has', 'very', 'important', 'module', 'NLTK', 'tokenize', 'sentence', 'which',
'further', 'comprises', 'of', 'sub-modulesWe', 'use', 'the', 'method', 'word_tokenize', '(', ')', 'to', 'split', 'a', 'sentence',
'into', 'words', '.', 'The', 'output', 'of', 'word', 'tokenizer', 'in', 'NLTK', 'can', 'be', 'converted', 'to', 'Data', 'Frame',
'for', 'better', 'text', 'understanding', 'in', 'machine', 'learning', 'applications.Sub-module', 'available', 'for', 'the',
'above', 'is', 'sent_tokenize', '.', 'Sentence', 'tokenizer', 'in', 'Python', 'NLTK', 'is', 'an', 'important', 'feature', 'for',
'machine', 'training', '.']

print(sent_tokenize(text))

OUTPUT:

['Tokenization in NLP is the process by which a large quantity of text is divided into smaller parts called
tokens.', 'Natural language processing is used for building applications such as Text classification, intelligent
chatbot, sentimental analysis, language translation, etc.Natural Language toolkit has very important module
NLTK tokenize sentence which further comprises of sub-modulesWe use the method word_tokenize() to split a
sentence into words.', 'The output of word tokenizer in NLTK can be converted to Data Frame for better text
understanding in machine learning applications.Sub-module available for the above is sent_tokenize.', 'Sentence
tokenizer in Python NLTK is an important feature for machine training.']

II)

from nltk.corpus import movie_reviews

w = movie_reviews.words()

s = movie_reviews.sents()

from nltk.probability import FreqDist

count = FreqDist(w)

count.N()

OUTPUT:

1583820

print("No of Words in movie_reviews corpus are :",len(w))

print("No of Distinct words in movie_reviews corpus are : ",len(set(w)))

OUTPUT:

No of Words in movie_reviews corpus are : 1583820

No of Distinct words in movie_reviews corpus are : 39768
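The same total/distinct counts can be computed without NLTK using the standard library's collections.Counter, which works like FreqDist above. The small words list here is made-up sample data, not the movie_reviews corpus.

```python
from collections import Counter

# Toy token list standing in for a corpus word list.
words = ["the", "movie", "was", "the", "best", "movie"]

counts = Counter(words)

print("Total words:", sum(counts.values()))   # 6 tokens in all
print("Distinct words:", len(counts))         # 4 unique tokens
print("Most common:", counts.most_common(2))
```

len(set(words)) gives the same distinct count; Counter additionally records each token's frequency, matching what FreqDist.N() and len(FreqDist) report for a real corpus.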



WEEK-3
AIM: i) Write a program to implement both user-defined and pre-defined functions to generate

a. Uni-grams
b. Bi-grams
c. Tri-grams
d. N-grams

DESCRIPTION:

N-grams represent a contiguous sequence of N elements from a given text. In Natural Language
Processing, N-grams commonly refer to sequences of words, where N stands for the number of words you are
looking for. The following types of N-grams are usually distinguished:

- Unigram: an N-gram with just one string inside (for example, a single word such as "computer" or
  "human" from a given sentence, e.g. "Natural Language Processing is the ability of a computer
  program to understand human language as it is spoken and written").
- 2-gram or Bigram: typically a combination of two strings or words that appear in a document:
  "short-form video" or "video format" would likely be bigrams found in a certain corpus of texts
  (and not "format video" or "video short-form", as the word order is preserved).
- 3-gram or Trigram: an N-gram containing three elements processed together (e.g. "short-form
  video format" or "new short-form video").
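The sliding-window idea behind N-grams can be expressed in one line of plain Python with zip(), without NLTK. The function name make_ngrams is our own; for whitespace-split tokens it produces the same tuples as nltk.util.ngrams.

```python
def make_ngrams(tokens, n):
    # Zip n staggered views of the token list: tokens[0:], tokens[1:], ...
    # Each resulting tuple is one window of n consecutive tokens.
    return list(zip(*(tokens[i:] for i in range(n))))

tokens = "Natural Language Processing is fun".split()
print(make_ngrams(tokens, 1))  # unigrams
print(make_ngrams(tokens, 2))  # bigrams
print(make_ngrams(tokens, 3))  # trigrams
```

Because zip() stops at the shortest view, a sentence of k tokens yields exactly k - n + 1 N-grams, which matches the counts in the NLTK outputs below.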

PROGRAM:

import nltk

from nltk.util import ngrams

n = 1

sen = "Sravanthi is a good girl"

unigram = ngrams(sen.split(), n)

for i in unigram:
    print(i)

OUTPUT:

('Sravanthi',)

('is',)

('a',)

('good',)

('girl',)

n = 2

sen = "Sravanthi is a good girl"

bigram = ngrams(sen.split(), n)

for i in bigram:
    print(i)

OUTPUT:

('Sravanthi', 'is')

('is', 'a')

('a', 'good')

('good', 'girl')

n = 3

sen = "Sravanthi is a good girl"

trigram = ngrams(sen.split(), n)

for i in trigram:
    print(i)

OUTPUT:

('Sravanthi', 'is', 'a')

('is', 'a', 'good')

('a', 'good', 'girl')

def ngrams_convertor(sen, n):
    ngram = ngrams(sen.split(), n)
    for i in ngram:
        print(i)

sen = "Sravanthi is a good girl"

n = 4

ngrams_convertor(sen, n)

OUTPUT:

('Sravanthi', 'is', 'a', 'good')

('is', 'a', 'good', 'girl')
