NLP Q&A1a Text Processing
3. __________ is the sub-field of AI that makes interaction between computers and human
(natural) languages possible.
a. Natural Language Processing
b. Data Science
c. Computer Vision
d. None of the above
Ans: a. Natural Language Processing
6. Information overload is a real problem when we need to access a specific, important piece of
information from a huge knowledge base; __________ helps address this.
a. Automatic Summarization
b. Sentiment Analysis
c. Text Classification
d. All of the above
Ans: a. Automatic Summarization
7. ___________ is especially relevant when used to provide an overview of a news item or blog post,
while avoiding redundancy from multiple sources and maximizing the diversity of content obtained.
a. Automatic Summarization
b. Sentiment Analysis
c. Text Classification
d. All of the above
Ans: a. Automatic Summarization
8. The goal of __________ is to identify sentiment among several posts, or even within the same
post where emotion is not always explicitly expressed.
a. Automatic Summarization
b. Sentiment Analysis
c. Text Classification
d. All of the above
Ans: b. Sentiment Analysis
9. Companies use Natural Language Processing applications, such as _________, to identify opinions
and sentiment online to help them understand what customers think about their products and services.
a. Automatic Summarization
b. Sentiment Analysis
c. Text Classification
d. All of the above
Ans: b. Sentiment Analysis
10. ___________ makes it possible to assign predefined categories to a document and organize it to
help you find the information you need or simplify some activities.
a. Automatic Summarization
b. Sentiment Analysis
c. Text Classification
d. All of the above
Ans: c. Text Classification
11. Devices such as __________ help communicate with humans and have abilities that make human lives easier.
a. Google Assistant
b. Cortana
c. Siri
d. All of the above
Ans: d. All of the above
12. _____________ is all about how machines try to understand and interpret human language and
operate accordingly.
a. Natural Language Processing
b. Data Science
c. Computer Vision
d. None of the above
Ans: a. Natural Language Processing
13. By dividing up large problems into smaller ones, ____________ aims to help you manage them in
a more constructive manner.
a. CDP
b. CBT
c. CSP
d. CLP
Ans: b. CBT
16. ________ is considered one of the best methods to address stress, as it is easy to apply
to people and also gives good results.
a. Common Behavioural Therapy (CBT)
b. Cognitive Behavioural Therapy (CBT)
c. Connection Behavioural Therapy (CBT)
d. None of the above
Ans: b. Cognitive Behavioural Therapy (CBT)
17. __________ is done by collecting data from various reliable and authentic sources.
a. Data Acquisition
b. Database
c. Data Mining
d. None of the above
Ans: a. Data Acquisition
18. Once the textual data has been collected, it needs to be processed and cleaned so that an easier
version can be sent to the machine. This is known as __________.
a. Data Acquisition
b. Data Exploration
c. Data Mining
d. None of the above
Ans: b. Data Exploration
19. Once the text has been normalized, it is then fed to an NLP-based AI model; this stage is known
as __________. Note that in NLP, modelling requires data pre-processing first, and only after that is the data fed to the machine.
a. Data Acquisition
b. Modelling
c. Data Mining
d. None of the above
Ans: b. Modelling
20. The trained model is then evaluated, and its accuracy is generated on the basis of the
relevance of the answers which the machine gives to the user's responses; this stage is known as __________.
a. Data Acquisition
b. Modelling
c. Evaluation
d. None of the above
Ans: c. Evaluation
21. One of the most common applications of Natural Language Processing is the chatbot. Which of the
following are examples of chatbots?
a. Mitsuku Bot
b. CleverBot
c. Jabberwacky
d. All of the above
Ans: d. All of the above
26. ___________ helps in cleaning up the textual data in such a way that it comes down to a level
where its complexity is lower than the actual data.
a. Speech Normalization
b. Text Normalization
c. Visual Normalization
d. None of the above
Ans: b. Text Normalization
27. Under ________, the whole corpus is divided into sentences. Each sentence is taken as a different
piece of data, so the whole corpus gets reduced to sentences.
a. Sentence Normalization
b. Sentence Segmentation
c. Sentence Tokenization
d. All of the above
Ans: b. Sentence Segmentation
28. Under __________, every word, number and special character is considered separately and each
of them is now a separate token.
a. Tokenization
b. Token normalization
c. Token segmentation
d. All of the above
Ans: a. Tokenization
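A minimal Python sketch of sentence segmentation followed by tokenization, using simple regular expressions rather than a full NLP library (the sample corpus is made up for illustration):

```python
import re

# A toy corpus: the whole textual data taken together.
corpus = "Hello world. NLP is fun! Is it easy? Yes, it is."

# Sentence segmentation: divide the corpus into sentences at
# sentence-ending punctuation.
sentences = re.split(r"(?<=[.!?])\s+", corpus)

# Tokenization: every word, number and special character becomes
# a separate token.
tokens = [re.findall(r"\w+|[^\w\s]", s) for s in sentences]

print(sentences)  # ['Hello world.', 'NLP is fun!', 'Is it easy?', 'Yes, it is.']
print(tokens[1])  # ['NLP', 'is', 'fun', '!']
```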
30. __________ are the words which occur very frequently in the corpus but do not add any value to
it.
a. Tokens
b. Words
c. Stopwords
d. None of the above
Ans: c. Stopwords
31. Stopwords occur very frequently in the corpus but do not add any value to it;
examples include _________.
a. Grammatical words
b. Simple words
c. Complex words
d. All of the above
Ans: a. Grammatical words
33. Because of ___________, the machine does not consider the same words written in different cases as identical.
a. Upper case
b. Lower case
c. Case sensitivity
d. None of the above
Ans: c. Case sensitivity
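Putting the last two ideas together, here is a small sketch of stopword removal and case conversion (the stopword list and tokens are illustrative; libraries such as NLTK ship much larger stopword lists):

```python
# Illustrative hand-picked stopword list.
STOPWORDS = {"a", "an", "and", "the", "is", "this", "to", "of"}

tokens = ["This", "is", "the", "Best", "Phone", "and", "the", "Camera", "is", "great"]

# Lowercase every token so "Best" and "best" are treated as the same
# word, then drop the stopwords.
cleaned = [t.lower() for t in tokens if t.lower() not in STOPWORDS]

print(cleaned)  # ['best', 'phone', 'camera', 'great']
```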
34. ___________ is the process in which the affixes of words are removed and the words are
converted to their base form.
a. Stemming
b. Stopwords
c. Case-sensitivity
d. All of the above
Ans: a. Stemming
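A quick stemming sketch using NLTK's Porter stemmer (assuming nltk is installed via `pip install nltk`); note that the stem is not always a meaningful dictionary word:

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Stemming strips the affix without checking whether the remaining
# base form is a real word.
for word in ["healed", "healing", "studies"]:
    print(word, "->", stemmer.stem(word))
# healed -> heal
# healing -> heal
# studies -> studi
```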
37. ___________ is a Natural Language Processing model which helps in extracting features out of
the text which can be helpful in machine learning algorithms.
a. Bag of Words
b. Big Words
c. Best Words
d. All of the above
Ans: a. Bag of Words
38. Which steps do we have to follow to implement the Bag of Words algorithm?
a. Text Normalization
b. Create Dictionary
c. Create Document Vectors
d. All of the above
Ans: d. All of the above
39. To create ________, for each document in the corpus we find out how many times each word from the
unique list of words has occurred.
a. Text Normalization
b. Create Dictionary
c. Document Vectors
d. All of the above
Ans: c. Document Vectors
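Putting the steps together, a minimal Bag of Words sketch in Python (the three example documents are illustrative):

```python
# Step 1: Text Normalization (the documents below are already
# lowercased and stripped of special characters).
documents = [
    "aman and anil are stressed",
    "aman went to a therapist",
    "aman went to download a health chatbot",
]
tokenized = [doc.split() for doc in documents]

# Step 2: Create Dictionary - the unique words in the whole corpus.
vocabulary = sorted({word for doc in tokenized for word in doc})

# Steps 3 and 4: Create a document vector for each document by counting
# how many times every vocabulary word occurs in it.
vectors = [[doc.count(word) for word in vocabulary] for doc in tokenized]

print(vocabulary)
for vector in vectors:
    print(vector)
```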
Q4. ____ is all about visual data like images and videos.
a. Computer Vision
b. Data Science
c. NLP
d. None of the above
Ans: a. Computer Vision
Q8. Which of the following will help to access a specific, important piece of information from a huge
knowledge base?
a. Sentiment Analysis
b. Text classification
c. Virtual Assistants
d. Automatic Summarization
Ans: d. Automatic Summarization
Q11. One of the applications of Natural Language Processing is relevant when used to provide an
overview of a news item or blog post, while avoiding redundancy from multiple sources and
maximizing the diversity of content obtained. Identify the application from the following:
a. Sentiment Analysis
b. Virtual Assistants
c. Text classification
d. Automatic Summarization
Ans: d. Automatic Summarization
Q12. Companies use the ________ application of NLP to identify opinions and feelings/emotions online
to help them understand what customers think about their products and services.
a. Sentiment Analysis
b. Automatic Summarization
c. Text classification
d. Virtual Assistants
Ans: a. Sentiment Analysis
Q13. _____ understands the point of view in context, helping to better understand what's behind an
expressed opinion.
a. Sentiment Analysis
b. Automatic Summarization
c. Text classification
d. Virtual Assistants
Ans: a. Sentiment Analysis
Q14. Which of the following makes it possible to assign predefined categories to a document and
organize it to help you find the information you need or simplify some activities?
a. Sentiment Analysis
b. Automatic Summarization
c. Text classification
d. Virtual Assistants
Ans: c. Text classification
Q49. In __________, it is important to understand that a word can have multiple meanings, and the
meaning that fits the statement depends on its context.
a. Natural Language
b. Computer language
c. Machine Language
d. None of the above
Ans: a. Natural Language
Q50. In Human language, a perfect balance of ______ is important for better understanding.
a. Syntax
b. Semantics
c. Both of the above
d. None of the above
Ans: c. Both of the above
Q51. _________________ helps in cleaning up the textual data in such a way that it comes down to a
level where its complexity is lower than the actual data.
a. Data Normalisation
b. Text Normalisation
c. Number Normalisation
d. Table Normalisation
Ans: b. Text Normalisation
Q52. The whole textual data from all the documents taken together is known as _____
a. Complete Data
b. Slab
c. Corpus
d. Cropus
Ans: c. Corpus
Q53. Which of the following is the first step for Text Normalisation?
a. Tokenisation
b. Sentence Segmentation.
c. Removing Stopwords, Special Characters and Numbers.
d. Converting text to a common case.
Ans: b. Sentence Segmentation.
Q56. Under ______, every word, number and special character is considered separately and each of
them is now a separate token.
a. Sentence Segmentation
b. Removing Stopwords, Special Characters and Numbers
c. Converting text to a common case
d. Tokenisation
Ans: d. Tokenisation
Q57. __________ are the words which occur very frequently in the corpus but do not add any value
to it.
a. Special Characters
b. Stopwords
c. Roman Numbers
d. Useless Words
Ans: b. Stopwords
Q59. During Text Normalisation, which step comes after removing Stopwords, Special Characters
and Numbers?
a. Converting text to a common case.
b. Stemming
c. Lemmatization
d. Tokenisation
Ans: a. Converting text to a common case.
Q60. During Text Normalisation, when we convert the whole text into a similar case, we prefer
____________
a. Upper Case
b. Lower Case
c. Title Case
d. Mixed Case
Ans: b. Lower Case
Q61. _____________ is the process in which the affixes of words are removed and the words are
converted to their base form.
a. Lemmatization
b. Stemming
c. Both of the above
d. None of the above
Ans: c. Both of the above
Q63. While stemming, healed, healing and healer were all reduced to _______________
a. heal
b. healed
c. heale
d. hea
Ans: a. heal
Q64. While stemming, studies was reduced to ___________________ after affix removal.
a. studi
b. study
c. stud
d. studys
Ans: a. studi
Q65. In Lemmatization, the word which we get after removing the affix is called
_________
a. Lemmat
b. Lemma
c. Lemmatiz
d. Lemmatiza
Ans: b. Lemma
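A side-by-side sketch using NLTK (assuming nltk is installed; the lemmatizer additionally needs the WordNet data, e.g. `nltk.download('wordnet')`):

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# Stemming blindly strips the affix, so "studies" becomes the non-word
# "studi"; lemmatization looks the word up and returns the lemma "study".
print(stemmer.stem("studies"))          # studi
print(lemmatizer.lemmatize("studies"))  # study
```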
Q67. ___________ is a Natural Language Processing model in which we get the occurrences of each
word and construct the vocabulary for the corpus.
a. Bag of Words
b. Bag of Alphabets
c. Bag of Characters
d. Bag of Numbers
Ans: a. Bag of Words
Q68. Which of the following do we get after applying the ‘Bag of Words’ algorithm?
a. A vocabulary of words for the corpus.
b. The frequency of these words.
c. Both of the above.
d. None of the above
Ans: c. Both of the above.
Q70. The Bag of Words algorithm gives us the frequency of words in each document. It gives us the
idea that if a word occurs more in a document, __________
a. its value is more for that document
b. its value is less for that document
c. its value is neither more nor less for that document
d. it has no value for that document.
Ans: a. its value is more for that document
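A tiny frequency count illustrating this idea (the sample sentence is made up); note how the raw counts are dominated by stopwords such as “the” and “is”, which is why they are removed before judging a word's value:

```python
from collections import Counter

document = "the camera is great and the battery is great too"

# Higher counts suggest a word matters more for this document -
# once stopwords are filtered out.
frequency = Counter(document.split())
print(frequency.most_common(3))  # e.g. [('the', 2), ('is', 2), ('great', 2)]
```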
Q71. The steps to implement the Bag of Words algorithm are given below. Choose the correct sequence.
1. Text Normalisation
2. Create document vectors
3. Create document vectors for all the documents
4. Create Dictionary
a. 1, 2, 3, 4
b. 2, 3, 1, 4
c. 1, 4, 2, 3
d. 1, 4, 3, 2
Ans: c. 1, 4, 2, 3
Q72. _______ are the words which occur the most in almost all the documents.
a. And
b. The
c. This
d. All of the above
Ans: d. All of the above
Q73. Those words which are a complete waste for the machine, as they do not provide any information
regarding the corpus, are termed as _________________
a. Start words
b. End words
c. Stop words
d. Close words
Ans: c. Stop words
Q74. Which of the following types of words has more value in the documents of the corpus?
a. Stop words
b. Frequent words
c. Rare words
d. All of the above
Ans: c. Rare words
Q75. Which of the following types of words has a higher frequency in the documents of the corpus?
a. Stop words
b. Frequent words
c. Rare words
d. All of the above
Ans: a. Stop words
Note: A Document Vector Table is a table containing the frequency of each word of the vocabulary in each
document.
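A small self-contained sketch that builds and prints such a table (the two documents are made up for illustration):

```python
documents = ["we can use health chatbots", "chatbots and therapists can help"]
tokenized = [doc.split() for doc in documents]
vocabulary = sorted({word for doc in tokenized for word in doc})

# Header row: one column per vocabulary word; each following row is
# one document's vector of word frequencies.
print("".join(f"{word:<12}" for word in vocabulary))
for doc in tokenized:
    print("".join(f"{doc.count(word):<12}" for word in vocabulary))
```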
1. What are the types of data used for Natural Language Processing applications?
Natural Language Processing takes in natural-language data in the form of the written and spoken
words which humans use in their daily lives, and operates on it.
6. Which words in a corpus have the highest values and which ones have the least?
Stop words like and, this, is, the, etc. occur the most in a corpus but do not tell us anything about
it, so they carry the least value; hence they are termed stopwords and are mostly removed at the
pre-processing stage.
Rare or valuable words occur the least but add the most importance to the corpus. Hence, when we
look at the text, we give these rare words the most consideration.
10. What is the significance of converting the text into a common case?
In Text Normalization, we undergo several steps to normalize the text to a lower level.
After the removal of stop words, we convert the whole text into a similar case, preferably lower case.
This ensures that the machine does not treat the same words as different just because they appear
in different cases.