Natural Language Processing Important Questions Answers
3. __________ is the sub-field of AI that makes interaction between computers and human
(natural) languages possible.
a. Natural Language Processing
b. Data Science
c. Computer Vision
d. None of the above
Ans: a. Natural Language Processing
6. ___________ helps when information overload is a real problem and we need to access a specific,
important piece of information from a huge knowledge base.
a. Automatic Summarization
b. Sentiment Analysis
c. Text Classification
d. All of the above
Ans: a. Automatic Summarization
7. ___________ is especially relevant when used to provide an overview of a news item or blog post,
while avoiding redundancy from multiple sources and maximizing the diversity of content obtained.
a. Automatic Summarization
b. Sentiment Analysis
c. Text Classification
d. All of the above
Ans: a. Automatic Summarization
8. The goal of __________ is to identify sentiment among several posts, or even within the same
post where emotion is not always explicitly expressed.
a. Automatic Summarization
b. Sentiment Analysis
c. Text Classification
d. All of the above
Ans: b. Sentiment Analysis
9. Companies use Natural Language Processing applications, such as _________, to identify opinions
and sentiment online to help them understand what customers think about their products and services.
a. Automatic Summarization
b. Sentiment Analysis
c. Text Classification
d. All of the above
Ans: b. Sentiment Analysis
10. ___________ makes it possible to assign predefined categories to a document and organize it to
help you find the information you need or simplify some activities.
a. Automatic Summarization
b. Sentiment Analysis
c. Text Classification
d. All of the above
Ans: c. Text Classification
11. A __________ device helps to communicate with humans and has abilities that make human lives easier.
a. Google Assistant
b. Cortana
c. Siri
d. All of the above
Ans: d. All of the above
12. _____________ is all about how machines try to understand and interpret human language and
operate accordingly.
a. Natural Language Processing
b. Data Science
c. Computer Vision
d. None of the above
Ans: a. Natural Language Processing
13. By dividing up large problems into smaller ones, ____________ aims to help you manage them in
a more constructive manner.
a. CDP
b. CBT
c. CSP
d. CLP
Ans: b. CBT
16. ________ is considered to be one of the best methods to address stress as it is easy to implement
on people and also gives good results.
a. Common Behavioural Therapy (CBT)
b. Cognitive Behavioural Therapy (CBT)
c. Connection Behavioural Therapy (CBT)
d. None of the above
Ans: b. Cognitive Behavioural Therapy (CBT)
17. ____________ is done by collecting data from various reliable and authentic sources.
a. Data Acquisition
b. Database
c. Data Mining
d. None of the above
Ans: a. Data Acquisition
18. Once the textual data has been collected, it needs to be processed and cleaned so that an easier
version can be sent to the machine. This is known as __________.
a. Data Acquisition
b. Data Exploration
c. Data Mining
d. None of the above
Ans: b. Data Exploration
19. Once the text has been normalized, it is then fed to an NLP-based AI model. Note that in NLP,
modelling requires only data pre-processing, after which the data is fed to the machine.
a. Data Acquisition
b. Modelling
c. Data Mining
d. None of the above
Ans: b. Modelling
20. The model trained is then evaluated and the accuracy for the same is generated on the basis of the
relevance of the answers which the machine gives to the user’s responses.
a. Data Acquisition
b. Modelling
c. Evaluation
d. None of the above
Ans: c. Evaluation
21. One of the most common applications of Natural Language Processing is the chatbot. Some
examples of chatbots are __________.
a. Mitsuku Bot
b. CleverBot
c. Jabberwacky
d. All of the above
Ans: d. All of the above
26. ___________ helps in cleaning up the textual data in such a way that it comes down to a level
where its complexity is lower than the actual data.
a. Speech Normalization
b. Text Normalization
c. Visual Normalization
d. None of the above
Ans: b. Text Normalization
27. Under ________, the whole corpus is divided into sentences. Each sentence is taken as a different
data point, so the whole corpus gets reduced to sentences.
a. Sentence Normalization
b. Sentence Segmentation
c. Sentence Tokenization
d. All of the above
Ans: b. Sentence Segmentation
28. Under __________, every word, number and special character is considered separately and each
of them is now a separate token.
a. Tokenization
b. Token normalization
c. Token segmentation
d. All of the above
Ans: a. Tokenization
30. __________ are the words which occur very frequently in the corpus but do not add any value to
it.
a. Tokens
b. Words
c. Stopwords
d. None of the above
Ans: c. Stopwords
31. Stopwords are the words which occur very frequently in the corpus but do not add any value to it,
for example _________.
a. Grammatical words
b. Simple words
c. Complex words
d. All of the above
Ans: a. Grammatical words
33. Due to ___________, the machine does not consider the same words written in different cases as identical.
a. Upper case
b. Lower case
c. Case sensitivity
d. None of the above
Ans: c. Case sensitivity
34. ___________ is the process in which the affixes of words are removed and the words are
converted to their base form.
a. Stemming
b. Stopwords
c. Case-sensitivity
d. All of the above
Ans: a. Stemming
37. ___________ is a Natural Language Processing model which helps in extracting features out of
the text which can be helpful in machine learning algorithms.
a. Bag of Words
b. Big Words
c. Best Words
d. All of the above
Ans: a. Bag of Words
38. Which steps do we follow to implement the Bag of Words algorithm?
a. Text Normalization
b. Create Dictionary
c. Create Document Vectors
d. All of the above
Ans: d. All of the above
39. To create ________, go through each document in the corpus and find out how many times each
word from the unique list of words has occurred in it.
a. Text Normalization
b. Create Dictionary
c. Document Vectors
d. All of the above
Ans: c. Document Vectors
Q4. ____ is all about visual data like images and videos.
a. Computer Vision
b. Data Science
c. NLP
d. None of the above
Ans: a. Computer Vision
Q8. Which of the following will help to access a specific, important piece of information from a huge
knowledge base?
a. Sentiment Analysis
b. Text classification
c. Virtual Assistants
d. Automatic Summarization
Ans: d. Automatic Summarization
Q11. One of the applications of Natural Language Processing is relevant when used to provide an
overview of a news item or blog post, while avoiding redundancy from multiple sources and
maximizing the diversity of content obtained. Identify the application from the following:
a. Sentiment Analysis
b. Virtual Assistants
c. Text classification
d. Automatic Summarization
Ans: d. Automatic Summarization
Q12. Companies use the ________ application of NLP to identify opinions and feelings/emotions online
to help them understand what customers think about their products and services.
a. Sentiment Analysis
b. Automatic Summarization
c. Text classification
d. Virtual Assistants
Ans: a. Sentiment Analysis
Q13. _____ understands point of view in context to help better understand what’s behind an
expressed opinion.
a. Sentiment Analysis
b. Automatic Summarization
c. Text classification
d. Virtual Assistants
Ans: a. Sentiment Analysis
Q14. Which of the following makes it possible to assign predefined categories to a document and
organize it to help you find the information you need or simplify some activities?
a. Sentiment Analysis
b. Automatic Summarization
c. Text classification
d. Virtual Assistants
Ans: c. Text classification
Q22. _________ is all about how machines try to understand and interpret human language and
operate accordingly.
a. Natural Language Processing
b. Data Science
c. Computer Vision
d. None of the above
Ans: a. Natural Language Processing
Q23. Nowadays a lot of cases are coming up where people are depressed due to reasons like _________
a. Peer Pressure
b. Studies
c. Relationship
d. All of the above
Ans: d. All of the above
Q24. ________________is considered to be one of the best methods to address stress as it is easy to
implement on people and also gives good results.
a. CAD
b. CBT
c. CBD
d. CAM
Ans: b. CBT
Q27. _____ is a technique used by most therapists to help patients overcome stress and depression.
a. CTB
b. CBD
c. CBT
d. BCT
Ans: c. CBT
Q28. People who are going through stress will contact _______
a. Psychiatrist
b. Physician
c. Radiologist
d. None of the above
Ans: a. Psychiatrist
Q29. To understand the sentiments of people, we need to collect their conversational data so the
machine can interpret the words that they use and understand their meaning. This step is coming
under ______________
a. Problem Scoping
b. Data Acquisition
c. Data Exploration
d. Modelling
Ans: b. Data Acquisition
Q38. Our ____ keeps on processing the sounds that it hears around itself and tries to make sense out
of them all the time.
a. Eyes
b. Mouth
c. Brain
d. Ear
Ans: c. Brain
Q40. The communications made by the machines are very basic and simple. (T/F)
a. True
b. False
Ans: a. True
Q43. The possible difficulties a machine would face in processing natural language are __________
a. Arrangement of the words and meaning
b. Multiple Meanings of a word
c. Perfect Syntax, no Meaning
d. All of the above
Ans: d. All of the above
Q45. _______ allows the computer to identify the different parts of speech in a sentence.
a. part-of tagging.
b. part-of-sound tagging.
c. part-of-speech tagging.
d. part-of-speak tagging.
Ans: c. part-of-speech tagging.
Q49. In __________ it is important to understand that a word can have multiple meanings and the
meanings fit into the statement according to the context of it.
a. Natural Language
b. Computer language
c. Machine Language
d. None of the above
Ans: a. Natural Language
Q50. In Human language, a perfect balance of ______ is important for better understanding.
a. Syntax
b. Semantics
c. Both of the above
d. None of the above
Ans: c. Both of the above
Q51. _________________ helps in cleaning up the textual data in such a way that it comes down to a
level where its complexity is lower than the actual data.
a. Data Normalisation
b. Text Normalisation
c. Number Normalisation
d. Table Normalisation
Ans: b. Text Normalisation
Q52. The term used for the whole textual data from all the documents altogether is known as _____
a. Complete Data
b. Slab
c. Corpus
d. Cropus
Ans: c. Corpus
Q53. Which of the following is the first step for Text Normalisation?
a. Tokenisation
b. Sentence Segmentation.
c. Removing Stopwords, Special Characters and Numbers.
d. Converting text to a common case.
Ans: b. Sentence Segmentation.
Q56. Under ______, every word, number and special character is considered separately and each of
them is now a separate token.
a. Sentence Segmentation
b. Removing Stopwords, Special Characters and Numbers
c. Converting text to a common case
d. Tokenisation
Ans: d. Tokenisation
Q57. __________ are the words which occur very frequently in the corpus but do not add any value
to it.
a. Special Characters
b. Stopwords
c. Roman Numbers
d. Useless Words
Ans: b. Stopwords
Q59. During Text Normalisation, which step will come after removing Stopwords, Special Characters
and Numbers?
a. Converting text to a common case.
b. Stemming
c. Lemmatization
d. Tokenisation
Ans: a. Converting text to a common case.
Q60. During Text Normalisation, when we convert the whole text into a similar case, we prefer
____________
a. Upper Case
b. Lower Case
c. Title Case
d. Mixed Case
Ans: b. Lower Case
Q61. _____________ is the process in which the affixes of words are removed and the words are
converted to their base form.
a. Lemmatization
b. Stemming
c. Both of the above
d. None of the above
Ans: c. Both of the above
Q63. While stemming, healed, healing and healer were all reduced to _______________
a. heal
b. healed
c. heale
d. hea
Ans: a. heal
Q64. While stemming, studies was reduced to ___________________ after the affix removal.
a. studi
b. study
c. stud
d. studys
Ans: a. studi
Q65. After lemmatization, the meaningful word which we get after removing the affixes is called
_________
a. Lemmat
b. Lemma
c. Lemmatiz
d. Lemmatiza
Ans: b. Lemma
Q67. ___________ is a Natural Language Processing model. In this we get the occurrences of each
word and construct the vocabulary for the corpus.
a. Bag of Words
b. Bag of Alphabets
c. Bag of Characters
d. Bag of Numbers
Ans: a. Bag of Words
Q68. Which of the following do we get after applying the ‘Bag of Words’ algorithm?
a. A vocabulary of words for the corpus.
b. The frequency of these words.
c. Both of the above.
d. None of the above
Ans: c. Both of the above.
Q70. Bag of words algorithm gives us the frequency of words in each document. It gives us an idea
that if the word is occurring more in a document, __________
a. its value is more for that document
b. its value is less for that document
c. its value is not more not less for that document
d. it has no value for that document.
Ans: a. its value is more for that document
Q71. The steps to implement the bag of words algorithm are given below. Choose the correct sequence.
1. Text Normalisation
2. Create document vectors
3. Create document vectors for all the documents
4. Create Dictionary
a. 1, 2, 3, 4
b. 2, 3, 1, 4
c. 1, 4, 2, 3
d. 1, 4, 3, 2
Ans: c. 1, 4, 2, 3
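As a concrete illustration of this sequence, here is a minimal Python sketch of the Bag of Words steps; the two-document corpus is a hypothetical example chosen only for illustration:

corpus = [
    "We are going to Mumbai",
    "Mumbai is a famous place.",
]

# Step 1: Text Normalisation (here simply lowercasing and stripping punctuation).
def normalize(doc):
    return [w.strip(".,!?").lower() for w in doc.split() if w.strip(".,!?")]

docs = [normalize(d) for d in corpus]

# Step 2: Create Dictionary - the list of unique words (vocabulary) of the corpus.
vocabulary = sorted({w for d in docs for w in d})

# Steps 3 and 4: Create a document vector for every document - the count of
# each vocabulary word in that document.
vectors = [[d.count(w) for w in vocabulary] for d in docs]

print(vocabulary)
for v in vectors:
    print(v)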
Q72. _______ are the words which occur the most in almost all the documents.
a. And
b. The
c. This
d. All of the above
Ans: d. All of the above
Q73. Those words which are a complete waste for machine as they do not provide any information
regarding the corpus are termed as _________________
a. Start words
b. End words
c. Stop words
d. Close words
Ans: c. Stop words
Q74. Which of the following type of words have more value in the document of the corpus?
a. Stop words
b. Frequent words
c. Rare words
d. All of the above
Ans: c. Rare words
Q75. Which of the following type of words have more frequency in the document of the corpus?
a. Stop words
b. Frequent words
c. Rare words
d. All of the above
Ans: a. Stop words
Q77. __________ is the number of documents in which the word occurs irrespective of how many
times it has occurred in those documents.
a. Term frequency
b. Inverse Document Frequency
c. Document Frequency
d. Inverse Frequency
Ans: c. Document Frequency
Q78. In _________, we put the document frequency in the denominator while the total number of
documents in the numerator.
a. Inverse Frequency
b. Inverse Document
c. Inverse Document Frequency
d. Term Frequency
Ans: c. Inverse Document Frequency
Q80. ________ helps in removing the unnecessary words out of a text body.
a. Document Classification
b. Topic Modelling
c. Stop word filtering
d. Information Retrieval System
Ans: c. Stop word filtering
1. What is a Chatbot?
A chatbot is a computer program that's designed to simulate human conversation through voice
commands or text chats or both. Eg: Mitsuku Bot, Jabberwacky etc.
OR
A chatbot is a computer program that can learn over time how to best interact with humans. It can
answer questions and troubleshoot customer problems, evaluate and qualify prospects, generate sales
leads and increase sales on an ecommerce site.
OR
A chatbot is a computer program designed to simulate conversation with human users. A chatbot is
also known as an artificial conversational entity (ACE), chat robot, talk bot, chatterbot or chatterbox.
OR
A chatbot is a software application used to conduct an on-line chat conversation via text or text-to-
speech, in lieu of providing direct contact with a live human agent.
1. What are the types of data used for Natural Language Processing applications?
Natural Language Processing takes in the data of natural languages, in the form of the written and
spoken words which humans use in their daily lives, and operates on them.
8. Which words in a corpus have the highest values and which ones have the least?
Stop words like - and, this, is, the, etc. occur the most frequently in a corpus but have the least value,
as they do not talk about the corpus at all. Hence, these are termed as stopwords and are mostly
removed at the pre-processing stage itself.
Rare or valuable words occur the least but add the most value to the corpus. Hence, when we
look at the text, we take the rare words into consideration.
10. What is the significance of converting the text into a common case?
In Text Normalization, we undergo several steps to normalize the text to a lower level.
After the removal of stop words, we convert the whole text into a similar case, preferably lower case.
This ensures that the machine's case-sensitivity does not make it treat the same words as different
just because of different cases.
16. What are stop words? Explain with the help of examples.
“Stop words” are the most common words in a language like “the”, “a”, “on”, “is”, “all”. These
words do not carry important meaning and are usually removed from texts. It is possible to remove
stop words using Natural Language Toolkit (NLTK), a suite of libraries and programs for symbolic
and statistical natural language processing.
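A minimal sketch of stopword removal with NLTK, assuming NLTK is installed and its stopword list has been downloaded once; the sample sentence is illustrative:

import nltk
nltk.download("stopwords")  # one-time download of the stopword list

from nltk.corpus import stopwords

text = "This is all about how the machines understand and interpret a human language."
stop_words = set(stopwords.words("english"))

tokens = [w.strip(".,").lower() for w in text.split()]
filtered = [w for w in tokens if w not in stop_words]
print(filtered)  # stop words like 'this', 'is', 'the', 'and', 'a' are removed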
Here, the red dashed line is the model’s output while the blue crosses are the actual data samples.
● In the first case, the model’s output does not match the true function at all. Hence the model is said
to be underfitting and its accuracy is low.
● In the second case, the model tries to cover all the data samples even when they are out of
alignment with the true function. This model is said to be overfitting, and it too has low accuracy.
● In the third case, the model’s performance matches the true function well, which means the
model has optimum accuracy; such a model is called a perfect fit.
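The same three situations can be reproduced numerically. Below is a rough numpy-only sketch; the synthetic sine data and the polynomial degrees (1, 15, 3) are illustrative assumptions, not part of the original answer:

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
true_y = np.sin(2 * np.pi * x)            # the "true function"
y = true_y + rng.normal(0, 0.2, x.size)   # actual data samples (with noise)

for degree, label in [(1, "underfitting"), (15, "overfitting"), (3, "good fit")]:
    coeffs = np.polyfit(x, y, degree)         # fit a polynomial model
    pred = np.polyval(coeffs, x)
    sample_err = np.mean((pred - y) ** 2)     # error against the noisy samples
    true_err = np.mean((pred - true_y) ** 2)  # error against the true function
    print(f"degree {degree:2d} ({label}): sample error {sample_err:.3f}, "
          f"true-function error {true_err:.3f}")

Typically the high-degree fit has the lowest sample error but a worse true-function error, which is exactly the overfitting case described above.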
6. Through a step-by-step process, calculate TFIDF for the given corpus and mention the
word(s) having highest value.
Document 1: We are going to Mumbai
Document 2: Mumbai is a famous place.
Document 3: We are going to a famous place.
Document 4: I am famous in Mumbai.
Term Frequency
Term frequency is the frequency of a word in one document. Term frequency can easily be found
from the document vector table as in that table we mention the frequency of each word of the
vocabulary in each document.
            We  Are  Going  To  Mumbai  Is  A  Famous  Place  I  Am  In
Document 1:  1    1     1    1     1     0  0     0      0    0   0   0
Document 2:  0    0     0    0     1     1  1     1      1    0   0   0
Document 3:  1    1     1    1     0     0  1     1      1    0   0   0
Document 4:  0    0     0    0     1     0  0     1      0    1   1   1
Talking about inverse document frequency, we need to put the document frequency in the
denominator while the total number of documents goes in the numerator. Here, the total number of
documents is 4, hence the inverse document frequency becomes:
  We   Are  Going   To  Mumbai   Is    A  Famous  Place    I   Am   In
 4/2   4/2   4/2   4/2    4/3   4/1  4/2    4/3    4/2   4/1  4/1  4/1
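To complete the calculation: the TFIDF of a word in a document is its term frequency multiplied by (the log of) its inverse document frequency; whether or not the log is taken, the ranking is unchanged here. The words is, I, am and in occur in only one document each (IDF = 4/1), so they carry the highest TFIDF values in this corpus. A minimal Python sketch of the whole calculation, using the corpus given in the question:

import math

corpus = [
    "We are going to Mumbai",
    "Mumbai is a famous place.",
    "We are going to a famous place.",
    "I am famous in Mumbai.",
]

docs = [[w.strip(".").lower() for w in d.split()] for d in corpus]
vocab = sorted({w for d in docs for w in d})
n_docs = len(docs)

scores = {}
for w in vocab:
    df = sum(1 for d in docs if w in d)   # document frequency
    idf = math.log10(n_docs / df)         # inverse document frequency (log taken)
    for i, d in enumerate(docs):
        scores[(w, i + 1)] = d.count(w) * idf  # TFIDF = TF x IDF

best = max(scores.values())
print(sorted(k for k, v in scores.items() if v == best))
# -> the words occurring in a single document ('am', 'i', 'in', 'is') score highest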
7. Normalize the given text and comment on the vocabulary before and after the normalization:
Raj and Vijay are best friends. They play together with other friends. Raj likes to play football
but Vijay prefers to play online games. Raj wants to be a footballer. Vijay wants to become an
online gamer.
Normalization of the given text:
Sentence Segmentation:
1. Raj and Vijay are best friends.
2. They play together with other friends.
3. Raj likes to play football but Vijay prefers to play online games.
4. Raj wants to be a footballer.
5. Vijay wants to become an online gamer.
Tokenization: every word and special character in each sentence becomes a separate token,
e.g. [Raj, and, Vijay, are, best, friends, .]
Removing stopwords: frequent words that add no meaning, such as and, are, to, a, an, but, with,
they, other, are removed.
Converting text to a common case: all remaining tokens are converted to lower case.
Stemming / Lemmatization (affix removal):
Likes - s → Like
Prefers - s → Prefer
Wants - s → want
After normalization the vocabulary shrinks: duplicate tokens, stopwords and inflected forms
(likes/like) collapse, so the corpus is reduced to a smaller set of meaningful words than before.
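A minimal Python sketch of these normalization steps on the same text; the small hand-picked stopword list and the naive trailing-'s' stripper are simplifying assumptions for illustration only:

import re

text = ("Raj and Vijay are best friends. They play together with other "
        "friends. Raj likes to play football but Vijay prefers to play "
        "online games. Raj wants to be a footballer. Vijay wants to become "
        "an online gamer.")

# 1. Sentence Segmentation
sentences = [s for s in re.split(r"(?<=\.)\s+", text) if s]

# 2. Tokenisation
tokens = [re.findall(r"[A-Za-z]+", s) for s in sentences]

# 3. Removing stopwords (hand-picked list, for illustration only)
stop_words = {"and", "are", "to", "be", "a", "an", "but", "with", "they", "other"}

# 4. Converting text to a common (lower) case
words = [w.lower() for sent in tokens for w in sent if w.lower() not in stop_words]

# 5. Stemming - a naive affix remover that strips a trailing 's'
stemmed = [w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words]

before = {w.lower() for sent in tokens for w in sent}
print("vocabulary before normalization:", len(before))
print("vocabulary after normalization :", len(set(stemmed)))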
2. What are the different applications of NLP which are used in real-life scenario?
Answer – Some of the applications used in real-life scenarios are –
a. Automatic Summarization – Automatic summarization is useful for gathering data from social
media and other online sources, as well as for summarizing the meaning of documents and other
written materials. When utilized to give a summary of a news story or blog post while eliminating
redundancy from different sources and enhancing the diversity of content acquired, automatic
summarizing is particularly pertinent.
b. Sentiment Analysis – In posts when emotion is not always directly expressed, or even in the same
post, the aim of sentiment analysis is to detect sentiment. To better comprehend what internet users
are saying about a company’s goods and services, businesses employ natural language processing
tools like sentiment analysis.
c. Text Classification – Text classification enables you to classify a document and organize it to
make it easier to find the information you need or to carry out certain tasks. Spam screening in email
is one example of how text categorization is used.
d. Virtual Assistants – These days, digital assistants like Google Assistant, Cortana, Siri, and Alexa
play a significant role in our lives. Not only can we communicate with them, but they can also
facilitate our life. They can assist us in making notes about our responsibilities, making calls for us,
sending messages, and much more by having access to our data.
9. What is a Chatbot?
Answer – A chatbot is a piece of software or an agent with artificial intelligence that uses natural
language processing to mimic a conversation with users or people. You can have the chat through a
website, application, or messaging app. These chatbots, often known as digital assistants, can
communicate with people verbally or via text.
The majority of organizations utilize AI chatbots, such as the Vainubot and HDFC Eva chatbots, to give
their clients virtual customer assistance around the clock.
Some examples of chatbots are –
a. Mitsuku Bot
b. CleverBot
c. Jabberwacky
d. Haptik
26. What does the relationship between a word’s value and its frequency in a corpus look like in the
given graph?
Answer – The graph demonstrates the inverse relationship between word frequency and word value.
The most frequent terms, such as stop words, are of little significance. The value of words increases
as their frequency decreases; these are referred to as precious or uncommon words. They are the
least frequently occurring but most valuable terms in the corpus.
28. Explain the differences between lemmatization and stemming. Give an example to support your
explanation.
Answer – Stemming is the process of stripping words of their affixes and returning them to their
original form.
After the affix is removed during lemmatization, we are left with a meaningful word known as a
lemma. Lemmatization takes more time to complete than stemming because it ensures that the lemma
is a word with meaning.
The following example illustrates the distinction between stemming and lemmatization:
Caring >> Lemmatization >> Care
Caring >> Stemming >> Car
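A minimal sketch contrasting the two with NLTK’s PorterStemmer and WordNetLemmatizer, assuming NLTK is installed (the WordNet data needs a one-time download). The words are the ones used earlier in this document; exact stemmer output depends on the implementation:

import nltk
nltk.download("wordnet")  # one-time download needed by the lemmatizer

from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["healed", "healing", "studies"]:
    stem = stemmer.stem(word)                    # may not be a meaningful word
    lemma = lemmatizer.lemmatize(word, pos="v")  # always a meaningful lemma
    print(f"{word}: stem = {stem}, lemma = {lemma}")
# studies: stem = 'studi' (not a word), lemma = 'study' (a meaningful word)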
29. Imagine developing a prediction model based on AI and deploying it to monitor traffic
congestion on the roadways. Now, the model’s goal is to foretell whether or not there will be a
traffic jam. We must now determine whether or not the predictions this model generates are