
NATURAL LANGUAGE PROCESSING

OBJECTIVE QUESTIONS (SET 01)

1. NLP stands for _________.


a. Natural Language Processing
b. Nature Language Processing
c. None Language Processing
d. None of the above
Ans: a. Natural Language Processing

2. ___________ is the sub-field of AI that is focused on enabling computers to understand and
process human languages.
a. Natural Language Processing
b. Data Science
c. Computer Vision
d. None of the above
Ans: a. Natural Language Processing

3. __________ is the sub-field of AI that makes interaction between computers and human
(natural) languages possible.
a. Natural Language Processing
b. Data Science
c. Computer Vision
d. None of the above
Ans: a. Natural Language Processing

4. Which of the games below is related to natural language processing?


a. Voice Assistants
b. Chatbots
c. Mystery Animal
d. Grammar Checkers
Ans: c. Mystery Animal

5. Applications of Natural Language Processing


a. Automatic Summarization
b. Sentiment Analysis
c. Text Classification
d. All of the above
Ans: d. All of the above

6. ___________ helps when information overload is a real problem and we need to access a specific,
important piece of information from a huge knowledge base.
a. Automatic Summarization
b. Sentiment Analysis
c. Text Classification
d. All of the above
Ans: a. Automatic Summarization

7. ___________ is especially relevant when used to provide an overview of a news item or blog post,
while avoiding redundancy from multiple sources and maximizing the diversity of content obtained.
a. Automatic Summarization
b. Sentiment Analysis
c. Text Classification
d. All of the above
Ans: a. Automatic Summarization

8. The goal of ___________ is to identify sentiment among several posts or even in the same
post where emotion is not always explicitly expressed.
a. Automatic Summarization
b. Sentiment Analysis
c. Text Classification
d. All of the above
Ans: b. Sentiment Analysis

9. Companies use Natural Language Processing applications, such as _________, to identify opinions
and sentiment online to help them understand what customers think about their products and services
a. Automatic Summarization
b. Sentiment Analysis
c. Text Classification
d. All of the above
Ans: b. Sentiment Analysis

10. ___________ makes it possible to assign predefined categories to a document and organize it to
help you find the information you need or simplify some activities.
a. Automatic Summarization
b. Sentiment Analysis
c. Text Classification
d. All of the above
Ans: c. Text Classification

11. Devices such as __________ help to communicate with humans and have abilities to make human lives easier.
a. Google Assistant
b. Cortana
c. Siri
d. All of the above
Ans: d. All of the above

12. _____________ is all about how machines try to understand and interpret human language and
operate accordingly.
a. Natural Language Processing
b. Data Science
c. Computer Vision
d. None of the above
Ans: a. Natural Language Processing

13. By dividing up large problems into smaller ones, ____________ aims to help you manage them in
a more constructive manner.
a. CDP
b. CBT
c. CSP
d. CLP
Ans: b. CBT

14. CBT stands for ____________.


a. Common Behavioural Therapy (CBT)
b. Cognitive Behavioural Therapy (CBT)
c. Connection Behavioural Therapy (CBT)
d. None of the above
Ans: b. Cognitive Behavioural Therapy (CBT)

15. Cognitive Behavioural Therapy includes __________.


a. Your Thoughts
b. Your Behaviors
c. Your Emotions
d. All of the above
Ans: d. All of the above

16. ________ is considered to be one of the best methods to address stress as it is easy to implement
on people and also gives good results.
a. Common Behavioural Therapy (CBT)
b. Cognitive Behavioural Therapy (CBT)
c. Connection Behavioural Therapy (CBT)
d. None of the above
Ans: b. Cognitive Behavioural Therapy (CBT)

17. ____________ is done by collecting data from various reliable and authentic sources.
a. Data Acquisition
b. Database
c. Data Mining
d. None of the above
Ans: a. Data Acquisition

18. Once the textual data has been collected, it needs to be processed and cleaned so that an easier
version can be sent to the machine. This is known as __________.
a. Data Acquisition
b. Data Exploration
c. Data Mining
d. None of the above
Ans: b. Data Exploration

19. Once the text has been normalized, it is then fed to an NLP based AI model. Note that in NLP,
modelling requires data pre-processing only after which the data is fed to the machine.
a. Data Acquisition
b. Modelling
c. Data Mining
d. None of the above
Ans: b. Modelling

20. The model trained is then evaluated and the accuracy for the same is generated on the basis of the
relevance of the answers which the machine gives to the user’s responses.
a. Data Acquisition
b. Modelling
c. Evaluation
d. None of the above
Ans: c. Evaluation

21. One of the most common applications of Natural Language Processing is a chatbot, give some
examples of chatbots __________.
a. Mitsuku Bot
b. CleverBot
c. Jabberwacky
d. All of the above
Ans: d. All of the above

22. There are ______ different types of chatbots.


a. 2
b. 3
c. 4
d. 5
Ans: a. 2

23. Which of the following is related to chatbots?


a. Script-bot
b. Smart-bot
c. Both a) and b)
d. None of the above
Ans: c. Both a) and b)

24. _______ bots work around a script which is programmed in them.


a. Script-bot
b. Smart-bot
c. Both a) and b)
d. None of the above
Ans: a. Script-bot

25. ________ work on bigger databases and other resources directly.


a. Script-bot
b. Smart-bot
c. Both a) and b)
d. None of the above
Ans: b. Smart-bot

26. ___________ helps in cleaning up the textual data in such a way that it comes down to a level
where its complexity is lower than the actual data.
a. Speech Normalization
b. Text Normalization
c. Visual Normalization
d. None of the above
Ans: b. Text Normalization

27. In ________, the whole corpus is divided into sentences. Each sentence is taken as different data,
so the whole corpus gets reduced to sentences.
a. Sentence Normalization
b. Sentence Segmentation
c. Sentence Tokenization
d. All of the above
Ans: b. Sentence Segmentation

28. Under __________, every word, number and special character is considered separately and each
of them is now a separate token.
a. Tokenization
b. Token normalization
c. Token segmentation
d. All of the above
Ans: a. Tokenization

29. In Tokenization each sentence is divided into _________.
a. Block
b. Tokens
c. Parts
d. None of the above
Ans: b. Tokens
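
The sentence segmentation and tokenization described in questions 26-29 can be sketched in a few lines of Python. This is only an illustrative sketch: the sample sentence and the simple splitting rules (split on end punctuation, then on word boundaries) are assumptions, not the exact rules of any particular NLP library.

    import re

    corpus = "Text normalisation reduces complexity. It makes the text simpler, doesn't it?"

    # Sentence segmentation: the corpus is divided into sentences on end punctuation.
    sentences = re.split(r"(?<=[.?!])\s+", corpus)

    # Tokenization: every word, number and special character becomes a separate token.
    tokens = [re.findall(r"\w+|[^\w\s]", s) for s in sentences]

    print(sentences)   # 2 sentences
    print(tokens[1])   # ['It', 'makes', 'the', 'text', 'simpler', ',', 'doesn', "'", 't', 'it', '?']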

30. __________ are the words which occur very frequently in the corpus but do not add any value to
it.
a. Tokens
b. Words
c. Stopwords
d. None of the above
Ans: c. Stopwords

31. Stopwords are the words which occur very frequently in the corpus but do not add any value to it,
for example _________.
a. Grammatical words
b. Simple words
c. Complex words
d. All of the above
Ans: a. Grammatical words
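
A minimal sketch of stopword removal in Python; the stopword list below is a small hand-picked sample for illustration (real projects use much longer lists):

    # A tiny, illustrative stopword list; actual lists contain many more grammatical words.
    stopwords = {"a", "an", "and", "the", "is", "of", "to", "in"}

    tokens = ["the", "chatbot", "is", "an", "application", "of", "nlp"]

    # Keep only the tokens that actually tell us something about the corpus.
    filtered = [t for t in tokens if t not in stopwords]
    print(filtered)   # ['chatbot', 'application', 'nlp']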

32. Applications of TFIDF are ___________.


a. Document Classification
b. Topic Modelling
c. Information Retrieval System and Stop word filtering
d. All of the above
Ans: d. All of the above

33. Because of ___________, the machine does not consider the same word written in different cases as the same word.
a. Upper case
b. Lower case
c. Case sensitivity
d. None of the above
Ans: c. Case sensitivity

34. ___________ is the process in which the affixes of words are removed and the words are
converted to their base form.
a. Stemming
b. Stopwords
c. Case-sensitivity
d. All of the above
Ans: a. Stemming

35. Stemming and lemmatization both are _________ processes.


a. Same process
b. Alternative process
c. Other process
d. All of the above
Ans: b. Alternative process

36. ________ makes sure that lemma is a word with meaning and hence it takes a longer time to
execute than stemming.
a. Stopwords
b. Stemming
c. Lemmatization
d. Token normalization
Ans: c. Lemmatization

37. ___________ is a Natural Language Processing model which helps in extracting features out of
the text which can be helpful in machine learning algorithms.
a. Bag of Words
b. Big Words
c. Best Words
d. All of the above
Ans: a. Bag of Words

38. Which steps do we have to follow to implement the Bag of Words algorithm?
a. Text Normalization
b. Create Dictionary
c. Create Document Vectors
d. All of the above
Ans: d. All of the above

39. To create ________, identify each document in the corpus and find out how many times each word
from the unique list of words has occurred in it.
a. Text Normalization
b. Create Dictionary
c. Document Vectors
d. All of the above
Ans: c. Document Vectors
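
The three steps in questions 37-39 (text normalisation, creating the dictionary, creating document vectors) can be illustrated with a short Python sketch; the two documents below are toy examples that are assumed to be already tokenized and lower-cased:

    # Toy corpus of two tokenized documents.
    docs = [
        ["aman", "and", "anil", "are", "stressed"],
        ["aman", "went", "to", "a", "therapist"],
    ]

    # Create the dictionary: every unique word of the corpus, written just once.
    dictionary = sorted({word for doc in docs for word in doc})

    # Create a document vector for each document: for every word in the dictionary,
    # count how many times it occurs in that document.
    vectors = [[doc.count(word) for word in dictionary] for doc in docs]

    print(dictionary)
    print(vectors)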

OBJECTIVE QUESTIONS (SET 02)

Q1. NLP stands for __________


a. New Language Processing
b. Number Language Processing
c. Natural Language Processing
d. Neural Language Processing
Ans: c. Natural Language Processing

Q2. Which of the following is not the domain of Artificial Intelligence?


a. Data Science
b. Computer Vision
c. NLP
d. Data Vision
Ans: d. Data Vision
Q3. Which of the following domains works around numbers and tabular data?
a. Computer Vision
b. Data Science
c. NLP
d. None of the above
Ans: b. Data Science

Q4. ____ is all about visual data like images and videos.
a. Computer Vision
b. Data Science
c. NLP
d. None of the above
Ans: a. Computer Vision

Q5. What is NLP?


a. It works around numbers and tabular data.
b. It is all about visual data like images and videos.
c. It takes in the data of Natural Languages which humans use in their daily lives.
d. None of the above
Ans: c. It takes in the data of Natural Languages which humans use in their daily lives.

Q6. Which of the following is not correct about NLP?


a. It is a sub field of AI.
b. It is focused on enabling computers to understand and process human languages.
c. It takes in the data of Natural Languages which humans use in their daily lives.
d. None of the above
Ans: d. None of the above

Q7. An application of Natural Language Processing is _________


a. Automatic Summarization
b. Sentiment Summarization
c. Text Summarization
d. All of the above
Ans: a. Automatic Summarization

Q8. Which of the following will help to access a specific, important piece of information from a huge
knowledge base?
a. Sentiment Analysis
b. Text classification
c. Virtual Assistants
d. Automatic Summarization
Ans: d. Automatic Summarization

Q9. Automatic summarization is relevant ________________


a. for summarizing the meaning of documents and information.
b. to understand the emotional meanings within the information, such as in collecting data from social
media.
c. to provide an overview of a news item or blog post.
d. All of the above
Ans: d. All of the above

Q10. The goal of _____________ is to identify sentiment among several posts.


a. Sentiment Analysis
b. Automatic Summarization
c. Text classification
d. Virtual Assistants
Ans: a. Sentiment Analysis

Q11. One of the applications of Natural Language Processing is relevant when used to provide an
overview of a news item or blog post, while avoiding redundancy from multiple sources and
maximizing the diversity of content obtained. Identify the application from the following
a. Sentiment Analysis
b. Virtual Assistants
c. Text classification
d. Automatic Summarization
Ans: d. Automatic Summarization

Q12. Companies use ________ application of NLP , to identify opinions and feelings/emotions online
to help them understand what customers think about their products and services.
a. Sentiment Analysis
b. Automatic Summarization
c. Text classification
d. Virtual Assistants
Ans: a. Sentiment Analysis

Q13. _____ understands point of view in context to help better understand what’s behind an
expressed opinion.
a. Sentiment Analysis
b. Automatic Summarization
c. Text classification
d. Virtual Assistants
Ans: a. Sentiment Analysis

Q14. Which of the following makes it possible to assign predefined categories to a document and
organize it to help you find the information you need or simplify some activities?
a. Sentiment Analysis
b. Automatic Summarization
c. Text classification
d. Virtual Assistants
Ans: c. Text classification

Q15. Which of the following is used in spam filtering in E-mail?


a. Sentiment Analysis
b. Automatic Summarization
c. Text classification
d. Virtual Assistants
Ans: c. Text classification

Q16. Which of the following is not a Virtual Assistant?


a. Alexa
b. Cortana
c. Siri
d. Silvi
Ans: d. Silvi

Q17. _________________ is a virtual assistant software application developed by Google.


a. Alexa

c. Different syntax, Different semantics
d. None of the above
Ans: b. Same syntax, Different semantics

Q48. Semantics refers to _________


a. grammar of the statement
b. meaning of the statement
c. Both of the above
d. None of the above
Ans: b. meaning of the statement

Q49. In __________ it is important to understand that a word can have multiple meanings and the
meanings fit into the statement according to the context of it.
a. Natural Language
b. Computer language
c. Machine Language
d. None of the above
Ans: a. Natural Language

Q50. In Human language, a perfect balance of ______ is important for better understanding.
a. Syntax
b. Semantics
c. Both of the above
d. None of the above
Ans: c. Both of the above

Q51. _________________ helps in cleaning up the textual data in such a way that it comes down to a
level where its complexity is lower than the actual data.
a. Data Normalisation
b. Text Normalisation
c. Number Normalisation
d. Table Normalisation
Ans: b. Text Normalisation

Q52. The term used for the whole textual data from all the documents altogether is known as _____
a. Complete Data
b. Slab
c. Corpus
d. Cropus
Ans: c. Corpus

Q53. Which of the following is the first step for Text Normalisation?
a. Tokenisation
b. Sentence Segmentation.
c. Removing Stopwords, Special Characters and Numbers.
d. Converting text to a common case.
Ans: b. Sentence Segmentation.

Q54. In ___________ the whole corpus is divided into sentences.


a. Tokenisation
b. Sentence Segmentation
c. Removing Stopwords, Special Characters and Numbers
d. Converting text to a common case
Ans: b. Sentence Segmentation.

Q55. In Tokenisation each sentence is then further divided into __________
a. Token
b. Character
c. Word
d. Numbers
Ans: a. Token

Q56. Under ______, every word, number and special character is considered separately and each of
them is now a separate token.
a. Sentence Segmentation
b. Removing Stopwords, Special Characters and Numbers
c. Converting text to a common case
d. Tokenisation
Ans: d. Tokenisation

Q57. __________ are the words which occur very frequently in the corpus but do not add any value
to it.
a. Special Characters
b. Stopwords
c. Roman Numbers
d. Useless Words
Ans: b. Stopwords

Q58. Which of the following is an example of stopword?


a. a
b. an
c. and
d. All of the above
Ans: d. All of the above

Q59. During Text Normalisation, which step will come after removing Stopwords, Special Characters
and Numbers?
a. Converting text to a common case.
b. Stemming
c. Lemmatization
d. Tokenisation
Ans: a. Converting text to a common case.

Q60. During Text Normalisation, when we convert the whole text into a similar case, we prefer
____________
a. Upper Case
b. Lower Case
c. Title Case
d. Mixed Case
Ans: b. Lower Case

Q61. _____________ is the process in which the affixes of words are removed and the words are
converted to their base form.
a. Lemmatization
b. Stemming
c. Both of the above
d. None of the above
Ans: c. Both of the above

Q62. After stemming, the words which we get after removing the affixes are called _________
a. Stemmed Words
b. Stemma
c. Fruit Word
d. Shoot Word
Ans: a. Stemmed Words

Q63. While stemming, healed, healing and healer were all reduced to _______________
a. heal
b. healed
c. heale
d. hea
Ans: a. heal

Q64. While stemming, studies was reduced to ___________________ after the affix removal.
a. studi
b. study
c. stud
d. studys
Ans: a. studi

Q65. After Lemmatization, the word which we get after removing the affixes is called _________
a. Lemmat
b. Lemma
c. Lemmatiz
d. Lemmatiza
Ans: b. Lemma

Q66. Which of the following statement is not correct?


a. Lemmatization makes sure that lemma is a word with meaning.
b. Lemmatization takes a longer time to execute than stemming.
c. Stemmed word is always meaningful.
d. Both Stemming and lemmatization process remove the affixes.
Ans: c. Stemmed word is always meaningful.

Q67. ___________ is a Natural Language Processing model. In this we get the occurrences of each
word and construct the vocabulary for the corpus.
a. Bag of Words
b. Bag of Alphabets
c. Bag of Characters
d. Bag of Numbers
Ans: a. Bag of Words

Q68. Which of the following do we get after applying the ‘Bag of Words’ algorithm?
a. A vocabulary of words for the corpus.
b. The frequency of these words.
c. Both of the above.
d. None of the above
Ans: c. Both of the above.

Q69. Expand TFIDF


a. Term Format & Inverse Document Frequency
b. Term Frequency & Inverse Document Frequency
c. Term Frequency & Inverse Data Frequency
d. Term Frequency & Inner Document Frequency
Ans: b. Term Frequency & Inverse Document Frequency

Q70. Bag of words algorithm gives us the frequency of words in each document. It gives us an idea
that if the word is occurring more in a document, __________
a. its value is more for that document
b. its value is less for that document
c. its value is not more not less for that document
d. it has no value for that document.
Ans: a. its value is more for that document
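
The idea behind TFIDF in questions 69-70 can be shown with a small worked sketch. One common formulation (the base of the logarithm and any smoothing vary between implementations) multiplies the term frequency of a word in a document by the log of (total documents / documents containing the word); the toy corpus below is invented for illustration:

    import math

    docs = [
        ["aman", "and", "anil", "are", "stressed"],
        ["aman", "and", "his", "friend", "went", "to", "a", "therapist"],
        ["anil", "and", "his", "friend", "downloaded", "a", "health", "chatbot"],
    ]

    def tfidf(word, doc, docs):
        tf = doc.count(word)                      # term frequency in this one document
        df = sum(1 for d in docs if word in d)    # number of documents containing the word
        idf = math.log10(len(docs) / df)          # inverse document frequency
        return tf * idf

    print(tfidf("and", docs[0], docs))       # 0.0 -- 'and' occurs in every document
    print(tfidf("chatbot", docs[2], docs))   # larger -- a rare word gets more value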

Q71. The steps to implement the Bag of Words algorithm are given below. Choose the correct sequence.
1. Text Normalisation
2. Create document vectors
3. Create document vectors for all the documents
4. Create Dictionary
a. 1, 2, 3, 4
b. 2, 3, 1, 4
c. 1, 4, 2, 3
d. 1, 4, 3, 2
Ans: c. 1, 4, 2, 3

Q72. _______ are the words which occur the most in almost all the documents.
a. And
b. The
c. This
d. All of the above
Ans: d. All of the above

Q73. Those words which are a complete waste for the machine, as they do not provide any information
regarding the corpus, are termed as _________________
a. Start words
b. End words
c. Stop words
d. Close words
Ans: c. Stop words

Q74. Which of the following type of words have more value in the document of the corpus?
a. Stop words
b. Frequent words
c. Rare words
d. All of the above
Ans: c. Rare words

Q75. Which of the following type of words have more frequency in the document of the corpus?
a. Stop words
b. Frequent words
c. Rare words
d. All of the above
Ans: a. Stop words

Q76. ________ is the frequency of a word in one document.

3. While working with NLP, what is the meaning of:
a. Syntax
b. Semantics
Syntax: Syntax refers to the grammatical structure of a sentence.
Semantics: It refers to the meaning of the sentence.

4. What is the difference between stemming and lemmatization?


Stemming is a technique used to extract the base form of the words by removing affixes from them. It
is just like cutting down the branches of a tree to its stems. For example, the stem of the words eating,
eats, eaten is eat.
Lemmatization is the grouping together of different forms of the same word. In search queries,
lemmatization allows end users to query any version of a base word and get relevant results.
OR
Stemming is the process in which the affixes of words are removed and the words are converted to
their base form.
In lemmatization, the word we get after affix removal (also known as lemma) is a meaningful one.
Lemmatization makes sure that lemma is a word with meaning and hence it takes a longer time to
execute than stemming.
OR
Stemming algorithms work by cutting off the end or the beginning of the word, taking into account a
list of common prefixes and suffixes that can be found in an inflected word.
Lemmatization on the other hand, takes into consideration the morphological analysis of the words.
To do so, it is necessary to have detailed dictionaries which the algorithm can look through to link the
form back to its lemma.
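
The contrast can be seen with NLTK's Porter stemmer and WordNet lemmatizer, assuming NLTK is installed and its WordNet data can be downloaded; exact outputs may differ slightly between stemmer implementations:

    import nltk
    from nltk.stem import PorterStemmer, WordNetLemmatizer

    nltk.download("wordnet", quiet=True)   # one-time download of the lemmatizer's dictionary

    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()

    for word in ["studies", "healing", "corpora"]:
        print(word, "-> stem:", stemmer.stem(word), "| lemma:", lemmatizer.lemmatize(word))

    # Stemming typically chops 'studies' to 'studi', which is not a meaningful word,
    # while the lemmatizer looks the form up in a dictionary and returns 'study';
    # that dictionary lookup is what makes lemmatization slower than stemming.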

5. What is meant by a dictionary in NLP?


Dictionary in NLP means a list of all the unique words occurring in the corpus. If some words are
repeated in different documents, they are all written just once while creating the dictionary.

OR
Document Vector Table is a table containing the frequency of each word of the vocabulary in each
document.

10. What do you mean by corpus?

In Text Normalization, we undergo several steps to normalize the text to a lower level. That is, we
will be working on text from multiple documents and the term used for the whole textual data from
all the documents altogether is known as corpus.
OR
A corpus is a large and structured set of machine-readable texts that have been produced in a natural
communicative setting.
OR
A corpus can be defined as a collection of text documents. It can be thought of as just a bunch of text
files in a directory, often alongside many other directories of text files.

QUESTIONS AND ANSWERS (SET 01) - 2 marks

1. What are the types of data used for Natural Language Processing applications?
Natural Language Processing takes in the data of Natural Languages in the form of written words and
spoken words which humans use in their daily lives, and operates on it.

3. Give an example of the following:


 Multiple meanings of a word
 Perfect syntax, no meaning
Example of Multiple meanings of a word –
His face turns red after consuming the medicine
Meaning - Is he having an allergic reaction? Or is he not able to bear the taste of that medicine?
Example of Perfect syntax, no meaning-
Chickens feed extravagantly while the moon drinks tea.
This statement is correct grammatically but it does not make any sense. In Human language, a perfect
balance of syntax and semantics is important for better understanding.

5. Define the following:
● Stemming
● Lemmatization
Stemming: Stemming is a rudimentary rule-based process of stripping the suffixes (“ing”, “ly”, “es”,
“s” etc) from a word.
Stemming is a process of reducing words to their word stem, base or root form (for example, books
— book, looked — look).
Lemmatization: Lemmatization, on the other hand, is an organized & step by step procedure of
obtaining the root form of the word, it makes use of vocabulary (dictionary importance of words) and
morphological analysis (word structure and grammar relations).
The aim of lemmatization, like stemming, is to reduce inflectional forms to a common base form. As
opposed to stemming, lemmatization does not simply chop off inflections. Instead it uses lexical
knowledge bases to get the correct base forms of words.
OR
Stemming is a technique used to extract the base form of the words by removing affixes from them. It
is just like cutting down the branches of a tree to its stems. For example, the stem of the words eating,
eats, eaten is eat.
Lemmatization is the grouping together of different forms of the same word. In search queries,
lemmatization allows end users to query any version of a base word and get relevant results.
OR
Stemming is the process in which the affixes of words are removed and the words are converted to
their base form.
In lemmatization, the word we get after affix removal (also known as lemma) is a meaningful one.
Lemmatization makes sure that lemma is a word with meaning and hence it takes a longer time to
execute than stemming.
OR
Stemming algorithms work by cutting off the end or the beginning of the word, taking into account a
list of common prefixes and suffixes that can be found in an inflected word.
Lemmatization on the other hand, takes into consideration the morphological analysis of the words.
To do so, it is necessary to have detailed dictionaries which the algorithm can look through to link the
form back to its lemma.

6. Which words in a corpus have the highest values and which ones have the least?
Stop words like - and, this, is, the, etc. occur with the highest frequency in a corpus but have the least
value, because they do not tell us anything about the corpus. Hence, these are termed as stopwords and
are mostly removed at the pre-processing stage only.
Rare or valuable words occur the least but add the most value to the corpus. Hence, when we look at
the text, we take both frequent and rare words into consideration.

9. Does the vocabulary of a corpus remain the same before and after text normalization? Why?
No, the vocabulary of a corpus does not remain the same before and after text normalization. Reasons
are –
● In normalization the text is normalized through various steps and is lowered to minimum
vocabulary since the machine does not require grammatically correct statements but the essence of it.
● In normalization Stop words, Special Characters and Numbers are removed.
● In stemming the affixes of words are removed and the words are converted to their base form.
So, after normalization, we get the reduced vocabulary.
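
A quick way to see the vocabulary shrink is to count the unique words before and after a couple of normalization steps; the sentence and the stopword list below are assumptions for illustration:

    text = "Stress is a part of life. Stress can be handled."
    stopwords = {"is", "a", "of", "can", "be"}

    words = text.replace(".", "").split()
    raw_vocab = set(words)
    normalized_vocab = {w.lower() for w in words if w.lower() not in stopwords}

    print(len(raw_vocab), len(normalized_vocab))   # the normalized vocabulary is smaller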

10. What is the significance of converting the text into a common case?
In Text Normalization, we undergo several steps to normalize the text to a lower level.
After the removal of stop words, we convert the whole text into a similar case, preferably lower case.
This ensures that the case-sensitivity of the machine does not consider same words as different just
because of different cases.
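
A tiny sketch of the same point in Python; without case folding the machine treats these as different words:

    tokens = ["Welcome", "welcome", "WELCOME", "to", "NLP"]

    # Before converting to a common case, three variants of 'welcome' are counted separately.
    print(len(set(tokens)))                   # 5 distinct tokens

    # After converting everything to lower case they collapse into a single word.
    print(len({t.lower() for t in tokens}))   # 3 distinct tokens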

11. Mention some applications of Natural Language Processing.


Natural Language Processing Applications-
● Sentiment Analysis.
● Chatbots & Virtual Assistants.
● Text Classification.
● Text Extraction.
● Machine Translation
● Text Summarization
● Market Intelligence
● Auto-Correct

12. What is the need of text normalization in NLP?


Since we all know that the language of computers is Numerical, the very first step that comes to our
mind is to convert our language to numbers.
This conversion takes a few steps to happen. The first step to it is Text Normalization.
Since human languages are complex, we need to first of all simplify them in order to make sure that
the understanding becomes possible. Text Normalization helps in cleaning up the textual data in such
a way that it comes down to a level where its complexity is lower than the actual data.
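
Putting the steps together, here is a minimal sketch of the whole normalization pipeline (sentence segmentation, tokenization, removal of stopwords and special characters, and case folding); the sample text, the stopword list and the splitting rules are assumptions, and stemming or lemmatization could follow as a further step:

    import re

    corpus = "Himanshu and his friends are stressed. They went to a therapist!"
    stopwords = {"a", "an", "and", "are", "his", "to", "they"}

    # 1. Sentence segmentation
    sentences = re.split(r"(?<=[.?!])\s+", corpus)

    # 2. Tokenization
    tokens = [t for s in sentences for t in re.findall(r"\w+|[^\w\s]", s)]

    # 3. Removing stopwords, special characters and numbers
    tokens = [t for t in tokens if t.isalpha() and t.lower() not in stopwords]

    # 4. Converting text to a common (lower) case
    normalized = [t.lower() for t in tokens]

    print(normalized)   # ['himanshu', 'friends', 'stressed', 'went', 'therapist']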

13. Explain the concept of Bag of Words.


Bag of Words is a Natural Language Processing model which helps in extracting features out of the
text which can be helpful in machine learning algorithms. In bag of words, we get the occurrences of
each word and construct the vocabulary for the corpus.
Bag of Words just creates a set of vectors containing the count of word occurrences in the document
(reviews). Bag of Words vectors are easy to interpret.
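
The same idea can be tried with scikit-learn's CountVectorizer, assuming scikit-learn (version 1.0 or later for get_feature_names_out) is installed; the two review sentences are made up for illustration:

    from sklearn.feature_extraction.text import CountVectorizer

    reviews = [
        "the chatbot was very helpful",
        "the chatbot was slow but helpful",
    ]

    vectorizer = CountVectorizer()
    vectors = vectorizer.fit_transform(reviews)   # sparse matrix of word counts

    print(vectorizer.get_feature_names_out())     # the vocabulary of the corpus
    print(vectors.toarray())                      # one count vector (document vector) per review

Each row of the printed array is the document vector for one review, which is why Bag of Words vectors are easy to interpret.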

14. Explain the relation between occurrence and value of a word.
