NLP Revision Notes
(a) Regression
(c) Classification
(d) Clustering
7. The higher the value, the more important the word in the document – this is
(a) NLTK (b) NLP Kit (c) Open NLP (d) NLP Suite
9. What is a chatbot called that answers simple FAQs without any intelligence?
10. What is the process of extracting emotions from text data using NLP called?
2. Compare Bag of Words and TF-IDF and share your findings.
· Monitoring: Awareness and tracking of user’s behavior, anxiety, and weight changes
to encourage developing better habits.
· Anonymity: Especially in sensitive and mental health issues.
· Personalization: Level of personalization depends on the specific application. Some
applications make use of measurements of:
  · Physical vitals (oxygenation, heart rhythm, body temperature) via mobile sensors.
  · Patient behavior via facial recognition.
· Real time interaction: Immediate response, notifications, and reminders.
· Scalability: Ability to interact with numerous users at the same time.
Stemming is the process of reducing words to their stem, base or root form (for
example, books → book, looked → look).
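As a rough illustration of the idea, the sketch below strips a few common suffixes. It is a toy example written for these notes, not a real stemming algorithm such as Porter's; the suffix list and the function name simple_stem are assumptions made only for illustration.

```python
# Toy suffix-stripping stemmer (illustrative only; the suffix list is an assumption).
SUFFIXES = ("ing", "ed", "es", "s")


def simple_stem(word):
    """Return the lower-cased word with the first matching suffix removed."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words such as "is" are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for w in ["books", "looked", "playing"]:
    print(w, "->", simple_stem(w))
```

Because a stemmer only chops characters, the result need not be a dictionary word: with this sketch, "studies" becomes "studi".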
5. What is the difference between how humans interpret communication and how NLP interprets it?
Communication by machines is very basic and simple, whereas human communication is
complex. Human language has many characteristics that are easy for a human to understand
but extremely difficult for a computer. Let us take a look at some of them here:
Arrangement of the words and meaning - Human language has rules, and words fall into
categories such as nouns, verbs, adverbs and adjectives. The same word can be a noun in one
sentence and an adjective in another, which makes processing difficult for computers.
Analogy with programming languages -
Different syntax, same semantics: 2+3 = 3+2. Here the way these statements are written is
different, but their meaning is the same, that is, 5.
Different semantics, same syntax: 3/2 (Python 2.7) ≠ 3/2 (Python 3). Here the statements
have the same syntax but their meanings are different. In Python 2.7, this statement
results in 1, while in Python 3 it gives 1.5.
Multiple meanings of a word - In natural language, a word can have multiple meanings, and
the meaning that applies depends on the context of the statement.
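The programming analogy can be checked in Python 3 using the expression 3/2, for which Python 2.7 gives 1 and Python 3 gives 1.5. This is a small sketch: Python 2.7 is not assumed to be available, so the floor-division operator // is used to reproduce what / did for two integers in Python 2.7. The variable names are chosen only for illustration.

```python
# Different syntax, same semantics: both expressions evaluate to 5.
same_meaning = (2 + 3) == (3 + 2)

# Same syntax, different semantics: "/" applied to two integers.
# Python 3 performs true division.
true_division = 3 / 2

# Python 2.7 performed floor division when both operands were integers;
# in Python 3 that behaviour is written with "//".
floor_division = 3 // 2

print(same_meaning)    # True
print(true_division)   # 1.5
print(floor_division)  # 1
```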
Document 3: We are going to a famous place.
Document 4: I am famous in Mumbai.
Term Frequency
Term frequency is the frequency of a word in one document. Term frequency can easily be
found from the document vector table as in that table we mention the frequency of each word
of the vocabulary in each document.
The other half of TF-IDF is Inverse Document Frequency. For this, let us first understand
what document frequency means. Document Frequency is the number of documents in
which the word occurs, irrespective of how many times it occurs in those documents.
The document frequency for the exemplar vocabulary would be:
Word                We  Are  Going  to  Mumbai  is  a  famous  Place  I  am  in
Document frequency  2   2    2      2   3       1   2  3       2      1  1   1
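These counts can be reproduced with a short script. Documents 1 and 2 are not shown in this section, so the two sentences used for them here are an assumption, chosen to be consistent with the document frequencies listed; Documents 3 and 4 are as given. Splitting on whitespace after lower-casing is also a simplifying assumption.

```python
documents = [
    "We are going to Mumbai",          # Document 1 (assumed, not shown in this section)
    "Mumbai is a famous place",        # Document 2 (assumed, not shown in this section)
    "We are going to a famous place",  # Document 3
    "I am famous in Mumbai",           # Document 4
]

# Tokenise: lower-case each document and split on whitespace.
tokenised = [doc.lower().split() for doc in documents]

# Vocabulary in order of first appearance.
vocabulary = []
for tokens in tokenised:
    for token in tokens:
        if token not in vocabulary:
            vocabulary.append(token)

# Document frequency: number of documents that contain the word at least once.
document_frequency = {
    word: sum(1 for tokens in tokenised if word in tokens) for word in vocabulary
}

for word in vocabulary:
    print(word, document_frequency[word])
```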
Talking about inverse document frequency, we need to put the document frequency in the
denominator while the total number of documents is the numerator. Here, the total number of
documents is 4, hence inverse document frequency becomes:
Word  We   Are  Going  to   Mumbai  is   a    famous  Place  I    am   in
IDF   4/2  4/2  4/2    4/2  4/3     4/1  4/2  4/3     4/2    4/1  4/1  4/1
TF-IDF value (term frequency * log(total documents / document frequency)) of each word in
each document:
Word    Document 1  Document 2  Document 3  Document 4
We      1*log(4/2)  0*log(4/2)  1*log(4/2)  0*log(4/2)
Are     1*log(4/2)  0*log(4/2)  1*log(4/2)  0*log(4/2)
Going   1*log(4/2)  0*log(4/2)  1*log(4/2)  0*log(4/2)
to      1*log(4/2)  0*log(4/2)  1*log(4/2)  0*log(4/2)
Mumbai  1*log(4/3)  1*log(4/3)  0*log(4/3)  1*log(4/3)
is      0*log(4/1)  1*log(4/1)  0*log(4/1)  0*log(4/1)
a       0*log(4/2)  1*log(4/2)  1*log(4/2)  0*log(4/2)
famous  0*log(4/3)  1*log(4/3)  1*log(4/3)  1*log(4/3)
Place   0*log(4/2)  1*log(4/2)  1*log(4/2)  0*log(4/2)
I       0*log(4/1)  0*log(4/1)  0*log(4/1)  1*log(4/1)
am      0*log(4/1)  0*log(4/1)  0*log(4/1)  1*log(4/1)
in      0*log(4/1)  0*log(4/1)  0*log(4/1)  1*log(4/1)
Finally, the words have been converted to numbers. These numbers are the TF-IDF values of each
word for each document. Here, you can see that since we have less data, words like ‘I’, ‘am’,
‘in’ and ‘is’ also have a high value. But as the document frequency of a word increases, its
IDF, and therefore its TF-IDF value, decreases.
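A minimal sketch of the whole calculation, TF * log(N / DF), is given here. Base-10 logarithms are an assumption, because the notes do not state the base; the sentences for Documents 1 and 2 are also assumed, as they are not shown in this section. The helper name tf_idf is hypothetical.

```python
import math

documents = [
    "We are going to Mumbai",          # Document 1 (assumed, not shown in this section)
    "Mumbai is a famous place",        # Document 2 (assumed, not shown in this section)
    "We are going to a famous place",  # Document 3
    "I am famous in Mumbai",           # Document 4
]

# Lower-case and split on whitespace (a simplifying assumption).
tokenised = [doc.lower().split() for doc in documents]
vocabulary = sorted({token for tokens in tokenised for token in tokens})
n_docs = len(documents)


def tf_idf(word, tokens):
    """Term frequency in this document times log10(N / document frequency)."""
    tf = tokens.count(word)
    df = sum(1 for other in tokenised if word in other)
    return tf * math.log10(n_docs / df)


for number, tokens in enumerate(tokenised, start=1):
    # Show only the words that actually occur in the document.
    scores = {w: round(tf_idf(w, tokens), 3) for w in vocabulary if w in tokens}
    print("Document", number, scores)
```

With this setup, words that occur in only one document (‘is’, ‘I’, ‘am’, ‘in’) score log10(4/1), about 0.602, while ‘Mumbai’ and ‘famous’, which occur in three documents, score log10(4/3), about 0.125.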
2. Normalize the given text and comment on the vocabulary before and after the
normalization:
Raj and Vijay are best friends. They play together with other friends. Raj likes to play
football but Vijay prefers to play online games. Raj wants to be a footballer. Vijay wants
to become an online gamer.
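One possible way to begin this exercise is sketched here: tokenise the passage, convert it to lower case, drop punctuation and remove stopwords, then compare the vocabulary before and after. The stopword list is a small hand-picked assumption, not a standard list, and stemming or lemmatization is left out.

```python
import re

text = (
    "Raj and Vijay are best friends. They play together with other friends. "
    "Raj likes to play football but Vijay prefers to play online games. "
    "Raj wants to be a footballer. Vijay wants to become an online gamer."
)

# Assumed stopword list, for illustration only.
STOPWORDS = {"and", "are", "they", "with", "other", "to", "but", "be", "a", "an"}

# Tokens before normalisation: words and punctuation marks, case preserved.
raw_tokens = re.findall(r"\w+|[^\w\s]", text)

# Normalised tokens: lower-case words only, with stopwords removed.
normalised_tokens = [
    token.lower()
    for token in raw_tokens
    if token.isalpha() and token.lower() not in STOPWORDS
]

print("Vocabulary before:", sorted(set(raw_tokens)))
print("Vocabulary after: ", sorted(set(normalised_tokens)))
print(len(set(raw_tokens)), "->", len(set(normalised_tokens)))
```

The vocabulary shrinks after normalisation because punctuation and stopwords are dropped and differently capitalised forms of a word are merged.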