NLP Revision Notes

The document contains multiple-choice questions (MCQs) and subjective questions related to Natural Language Processing (NLP) concepts, tools, and techniques. Key topics include NLTK, TF-IDF, tokenization, stemming, lemmatization, and applications of chatbots in healthcare. Additionally, it discusses the differences between stemming and lemmatization, and how NLP interprets communication compared to humans.

Uploaded by

shikhadm9

Question Banks – MCQs :

1. What is the NLTK tool in Python?

(a) Natural Linguistics Tool (b) Natural Language Toolkit

(c) Neutral Language Kit (d) Neutral Language Toolkit

2. TF-IDF in NLP stands for:

(a) Term Frequency and Definite Frequency

(b) Term Frequency and Indefinite Frequency

(c) Term Frequency and Inverse Document Frequency

(d) Term Frequency and Integrated Document Frequency

3. What do we call the process of dividing a string into component words?

(a) Regression

(b) Word Tokenization

(c) Classification

(d) Clustering

4. What is the stem of the word “Making”?

(a) Mak (b) Make (c) Making (d) Maker

5. What is the lemma of the word “Making”?

(a) Mak (b) Make (c) Making (d) Maker

6. Which of these is not a stop word?

(a) This (b) Things (c) Is (d) Do

7. “The higher the value, the more important the word in the document” is true of which model?

(a) Bag of Words (b) TF-IDF (c) YOLO (d) SSD

8. Which of these is not an NLP library?

(a) NLTK (b) NLP Kit (c) Open NLP (d) NLP Suite

9. What do we call a chatbot that uses simple FAQs without any intelligence?

(a) Smart Chatbot (b) Script Chatbot

(c) AI Chatbot (d) ML Chatbot

10. What is the process of extracting emotions from text data using NLP called?

(a) Sentiment Analysis

(b) Emotional Data Science

(c) Emotional Processing

(d) Emotional Classification

Subjective Type Questions (2 Marks):

1. Explain the key steps of NLP-based text analysis.


i) Sentence Segmentation
ii) Tokenization
iii) Removing Stop words, Special Characters and Numbers
iv) Stemming
v) Converting Text to common Case
vi) Lemmatization
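As a minimal sketch, the steps above can be strung together in plain Python. The stop-word set and suffix rules below are toy stand-ins for real resources (e.g. NLTK's stop-word corpus and stemmers), and case conversion is done before stemming for simplicity:

```python
import re

# Toy stop-word list; a real pipeline would use e.g. NLTK's corpus.
STOP_WORDS = {"this", "is", "are", "do", "the", "a", "an", "and", "to"}

def segment_sentences(text):
    # i) Sentence Segmentation: split on terminal punctuation.
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def tokenize(sentence):
    # ii) Tokenization: split a sentence into word tokens.
    return re.findall(r"[A-Za-z]+", sentence)

def normalize(text):
    tokens = []
    for sentence in segment_sentences(text):
        for word in tokenize(sentence):
            word = word.lower()                  # v) convert to common case
            if word in STOP_WORDS:               # iii) remove stop words
                continue
            word = re.sub(r"(ing|ly|es|s)$", "", word)  # iv) crude stemming
            tokens.append(word)
    return tokens

print(normalize("This is a demo. Cats are playing."))  # → ['demo', 'cat', 'play']
```

Lemmatization is omitted here because it needs a vocabulary lookup rather than string rules; the next answers cover that distinction.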

2. Compare Bag of Words and TF-IDF and share your findings.

Bag of Words is a Natural Language Processing model that extracts features from text which can then be fed to machine learning algorithms. In Bag of Words, we count the occurrences of each word and construct the vocabulary for the corpus. Bag of Words simply creates a set of vectors containing the count of word occurrences in each document (review), and these vectors are easy to interpret.

TF-IDF, in contrast, weights each count by how rare the word is across the corpus, so words that occur in fewer documents score higher. TF-IDF is commonly used in the Natural Language Processing domain. Some of its applications are:
· Document Classification - helps in classifying the type and genre of a document.
· Topic Modelling - helps in predicting the topic of a corpus.
· Information Retrieval System - extracts the important information from a corpus.
· Stop word filtering - helps in removing unnecessary words from a text body.
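The Bag of Words side of the comparison can be sketched in a few lines (whitespace tokenization, no stop-word handling — an illustrative simplification, not a full implementation):

```python
from collections import Counter

def bag_of_words(corpus):
    # Tokenize each document and build a sorted vocabulary over the corpus.
    tokenized = [doc.lower().split() for doc in corpus]
    vocab = sorted({w for doc in tokenized for w in doc})
    # One count vector per document: occurrences of each vocabulary word.
    vectors = [[Counter(doc)[w] for w in vocab] for doc in tokenized]
    return vocab, vectors

vocab, vectors = bag_of_words(["the cat sat", "the cat ate the fish"])
print(vocab)    # → ['ate', 'cat', 'fish', 'sat', 'the']
print(vectors)  # → [[0, 1, 0, 1, 1], [1, 1, 1, 0, 2]]
```

Note how the counts alone say nothing about importance: “the” gets the highest count in document 2 despite carrying the least meaning, which is exactly the weakness TF-IDF addresses.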

3. What are some of the applications of chatbots in healthcare?

The most valuable features of using chatbots in healthcare include:

· Monitoring: Awareness and tracking of user’s behavior, anxiety, and weight changes
to encourage developing better habits.
· Anonymity: Especially in sensitive and mental health issues.
· Personalization: Level of personalization depends on the specific application. Some
applications make use of measurements of:
. Physical vitals (oxygenation, heart rhythm, body temperature) via mobile sensors.
. Patient behavior via facial recognition.
· Real time interaction: Immediate response, notifications, and reminders.
· Scalability: Ability to interact with numerous users at the same time.

4. Explain the difference between Stemming and Lemmatization.

Stemming: Stemming is a rudimentary rule-based process of stripping suffixes (“ing”, “ly”, “es”, “s”, etc.) from a word.

Stemming is a process of reducing words to their word stem, base or root form (for
example, books — book, looked — look).

Lemmatization: Lemmatization, on the other hand, is an organized, step-by-step procedure for obtaining the root form of a word. It makes use of vocabulary (the dictionary meaning of words) and morphological analysis (word structure and grammar relations).
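The contrast can be sketched with a toy rule-based stemmer and a toy lookup lemmatizer. The LEMMAS dictionary here is an illustrative stand-in for a real vocabulary such as WordNet:

```python
import re

# Toy lemma dictionary; a real lemmatizer consults a full vocabulary
# plus morphological analysis.
LEMMAS = {"making": "make", "studies": "study", "better": "good"}

def stem(word):
    # Rule-based suffix stripping: no dictionary, so output may not be a word.
    return re.sub(r"(ing|ly|es|s)$", "", word.lower())

def lemmatize(word):
    # Dictionary lookup: always returns a valid root form when known.
    return LEMMAS.get(word.lower(), word.lower())

for w in ["Making", "Studies"]:
    print(w, "-> stem:", stem(w), "| lemma:", lemmatize(w))
# Making -> stem: mak | lemma: make
# Studies -> stem: studi | lemma: study
```

This mirrors MCQs 4 and 5 above: the stem of “Making” is “Mak”, while its lemma is “Make”.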

5. What is the difference between how humans interpret communication and how NLP interprets it?

Machine communication is very basic and simple, whereas human communication is complex. There are multiple characteristics of human language that may be easy for a human to understand but extremely difficult for a computer. Let us take a look at some of them here:
Arrangement of the words and meaning - There are rules in human language: nouns, verbs, adverbs, adjectives. A word can be a noun at one time and an adjective at another, which creates difficulty for computers during processing.

Analogy with programming languages - Different syntax, same semantics: 2+3 = 3+2. The way these statements are written is different, but their meaning is the same, that is, 5. Different semantics, same syntax: 3/2 (Python 2.7) ≠ 3/2 (Python 3). These statements have the same syntax but different meanings: in Python 2.7, 3/2 results in 1 (integer division), while in Python 3 it gives 1.5.

Multiple meanings of a word - In natural language, a word can have multiple meanings, and the intended meaning is fixed by the context of the statement.
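Python 2.7 is long end-of-life, but Python 3's two division operators reproduce the same "different semantics" contrast within one interpreter:

```python
# Same operands, slightly different syntax, different semantics:
# `/` is Python 3's true division; `//` is floor division, which matches
# what plain `/` did for integer operands in Python 2.7.
print(3 / 2)   # → 1.5 (true division)
print(3 // 2)  # → 1   (floor division)
```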

Subjective Type Questions (4 Marks)


1. Through a step-by-step process, calculate TFIDF for the given corpus and
mention the word(s) having the highest value.
Document 1: We are going to Mumbai
Document 2: Mumbai is a famous place.

Document 3: We are going to a famous place.
Document 4: I am famous in Mumbai.

Term Frequency

Term frequency is the frequency of a word in a single document. It can be read straight off the document vector table, which records the frequency of each vocabulary word in each document.

         We  Are  Going  to  Mumbai  is  a  famous  Place  I  am  in
Doc 1:    1    1      1   1       1   0  0       0      0  0   0   0
Doc 2:    0    0      0   0       1   1  1       1      1  0   0   0
Doc 3:    1    1      1   1       0   0  1       1      1  0   0   0
Doc 4:    0    0      0   0       1   0  0       1      0  1   1   1

Inverse Document Frequency

The other half of TFIDF is Inverse Document Frequency. For this, let us first understand what document frequency means. Document Frequency is the number of documents in which a word occurs, irrespective of how many times it occurs in those documents.
The document frequency for the exemplar vocabulary would be:

We  Are  Going  to  Mumbai  is  a  famous  Place  I  am  in
 2    2      2   2       3   1  2       3      2  1   1   1

For inverse document frequency, we put the document frequency in the denominator and the total number of documents in the numerator. Here, the total number of documents is 4, hence the inverse document frequency becomes:

 We   Are  Going   to  Mumbai   is    a  famous  Place    I   am   in
4/2   4/2    4/2  4/2     4/3  4/1  4/2     4/3    4/2  4/1  4/1  4/1

The formula of TFIDF for any word W, given N documents in the corpus, is:

TFIDF(W) = TF(W) * log(N / DF(W))

where TF(W) is the term frequency of W in the document, DF(W) is its document frequency, N/DF(W) is the inverse document frequency tabulated above, and the logarithm is taken to base 10. Applying this to every cell:

         We          Are         Going       to          Mumbai      is          a           famous      Place       I           am          in
Doc 1:   1*log(4/2)  1*log(4/2)  1*log(4/2)  1*log(4/2)  1*log(4/3)  0*log(4/1)  0*log(4/2)  0*log(4/3)  0*log(4/2)  0*log(4/1)  0*log(4/1)  0*log(4/1)
Doc 2:   0*log(4/2)  0*log(4/2)  0*log(4/2)  0*log(4/2)  1*log(4/3)  1*log(4/1)  1*log(4/2)  1*log(4/3)  1*log(4/2)  0*log(4/1)  0*log(4/1)  0*log(4/1)
Doc 3:   1*log(4/2)  1*log(4/2)  1*log(4/2)  1*log(4/2)  0*log(4/3)  0*log(4/1)  1*log(4/2)  1*log(4/3)  1*log(4/2)  0*log(4/1)  0*log(4/1)  0*log(4/1)
Doc 4:   0*log(4/2)  0*log(4/2)  0*log(4/2)  0*log(4/2)  1*log(4/3)  0*log(4/1)  0*log(4/2)  1*log(4/3)  0*log(4/2)  1*log(4/1)  1*log(4/1)  1*log(4/1)

After calculating all the values, we get:

         We     Are    Going  to     Mumbai  is     a      famous  Place  I      am     in
Doc 1:   0.301  0.301  0.301  0.301  0.124   0      0      0       0      0      0      0
Doc 2:   0      0      0      0      0.124   0.602  0.301  0.124   0.301  0      0      0
Doc 3:   0.301  0.301  0.301  0.301  0       0      0.301  0.124   0.301  0      0      0
Doc 4:   0      0      0      0      0.124   0      0      0.124   0      0.602  0.602  0.602

Finally, the words have been converted to numbers: these are the TFIDF values of each word for each document. The words with the highest value (0.602) are ‘is’, ‘I’, ‘am’ and ‘in’. Because we have so little data, these words score high: each occurs in only one document, so its inverse document frequency is large. As a word’s document frequency increases, its TFIDF value decreases.
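The worked example above can be reproduced with a short script. The log is taken to base 10 to match the tables; note the tables show log(4/3) truncated to 0.124, while it rounds to 0.125:

```python
import math

# Recompute the TFIDF tables above: TFIDF(W) = TF(W) * log10(N / DF(W)).
corpus = [
    "we are going to mumbai",
    "mumbai is a famous place",
    "we are going to a famous place",
    "i am famous in mumbai",
]
docs = [doc.split() for doc in corpus]

# Vocabulary in order of first appearance, matching the table columns.
vocab = []
for doc in docs:
    for word in doc:
        if word not in vocab:
            vocab.append(word)

N = len(docs)
# Document frequency: how many documents contain each word.
df = {w: sum(w in doc for doc in docs) for w in vocab}
# One TFIDF vector per document.
tfidf = [[doc.count(w) * math.log10(N / df[w]) for w in vocab] for doc in docs]

# Words attaining the highest TFIDF value anywhere in the corpus.
best = max(v for row in tfidf for v in row)
print([w for w in vocab if any(abs(row[vocab.index(w)] - best) < 1e-9 for row in tfidf)])
# → ['is', 'i', 'am', 'in']
```

With so little data these rare-word scores dominate; on a larger corpus, genuinely informative words would stand out instead.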

2. Normalize the given text and comment on the vocabulary before and after the
normalization:
Raj and Vijay are best friends. They play together with other friends. Raj likes to play
football but Vijay prefers to play online games. Raj wants to be a footballer. Vijay wants
to become an online gamer.

Normalization of the given text:


Sentence Segmentation:
1. Raj and Vijay are best friends.
2. They play together with other friends.
3. Raj likes to play football but Vijay prefers to play online games.
4. Raj wants to be a footballer.
5. Vijay wants to become an online gamer.
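The remaining normalization steps (tokenization, converting to common case, stop-word removal) and the vocabulary comparison the question asks for can be sketched as follows; the stop-word list is a small illustrative stand-in:

```python
import re

TEXT = ("Raj and Vijay are best friends. They play together with other "
        "friends. Raj likes to play football but Vijay prefers to play "
        "online games. Raj wants to be a footballer. Vijay wants to "
        "become an online gamer.")

# Illustrative stop-word list; a real pipeline would use a standard one.
STOP_WORDS = {"and", "are", "they", "with", "to", "but", "be", "a", "an", "other"}

# Tokenize, then build the vocabulary before normalization (lowercased, unique).
raw_tokens = re.findall(r"[A-Za-z]+", TEXT)
vocab_before = sorted({t.lower() for t in raw_tokens})

# Normalize: common case plus stop-word removal, then take the vocabulary.
normalized = [t.lower() for t in raw_tokens if t.lower() not in STOP_WORDS]
vocab_after = sorted(set(normalized))

print(len(vocab_before), "distinct words before normalization;",
      len(vocab_after), "after")  # → 25 before; 15 after
```

Comment on the vocabulary: normalization shrinks it from 25 distinct words to 15, leaving content words like ‘raj’, ‘vijay’, ‘football’ and ‘gamer’ while discarding function words that carry little meaning.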

