Solution To NLP Viva Questions
● Natural Language Processing (NLP) is the technology used to help computers understand natural
human language.
(in other words: a machine understanding human language and then answering back or carrying out the action it was asked to perform)
Machine Translation
● Machine translation (MT), the process of translating one source language or text into
another language, is one of the most important applications of NLP.
● There are different types of machine translation systems. Let us see what the different types
are.
● Bilingual MT systems produce translations between two particular languages.
● Multilingual MT systems produce translations between any pair of languages. They may be
either unidirectional or bi-directional in nature.
● Example : Google Translate
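As an illustration only, the sketch below shows a bilingual MT system built with the Hugging Face transformers library; the checkpoint name Helsinki-NLP/opus-mt-en-hi is one published MarianMT model chosen here for illustration, not something prescribed by these notes.

```python
# A minimal machine-translation sketch. Assumes the third-party 'transformers'
# and 'torch' packages and internet access to download the illustrative
# Helsinki-NLP/opus-mt-en-hi MarianMT checkpoint (English -> Hindi).
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-hi"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

def translate(sentences):
    # Tokenize the source sentences and let the model generate target tokens.
    batch = tokenizer(sentences, return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)

print(translate(["Machine translation is an important application of NLP."]))
```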
Sentiment Analysis
Automatic Summarization
● In this digital era, the most valuable thing is data, or you can say information.
● However, do we really get useful information, in the amount we require? The answer
is ‘NO’, because we are overloaded with information and our access to knowledge and
information far exceeds our capacity to understand it.
● We are in serious need of automatic text summarization, because the flood of
information over the internet is not going to stop.
● Text summarization may be defined as the technique to create short, accurate summary of
longer text documents.
● Automatic text summarization will help us with relevant information in less time. Natural
language processing (NLP) plays an important role in developing an automatic text
summarization.
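As a rough illustration of extractive summarization (one common approach; the notes above do not prescribe a particular method), the sketch below scores sentences by word frequency and keeps the top-scoring ones. It assumes NLTK with the 'punkt' and 'stopwords' data downloaded.

```python
# A minimal extractive summarizer: score each sentence by the frequencies of
# its (non-stopword) words and keep the top-n sentences, in original order.
from collections import Counter
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

def summarize(text, n=2):
    stop = set(stopwords.words("english"))
    words = [w.lower() for w in word_tokenize(text)
             if w.isalpha() and w.lower() not in stop]
    freq = Counter(words)
    sentences = sent_tokenize(text)
    # Score a sentence as the sum of the frequencies of its words.
    scores = {s: sum(freq[w.lower()] for w in word_tokenize(s)) for s in sentences}
    top = sorted(sentences, key=scores.get, reverse=True)[:n]
    return " ".join(s for s in sentences if s in top)

print(summarize("Text summarization creates a short summary. "
                "Summarization helps readers save time. "
                "The weather was pleasant yesterday.", n=2))
```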
Question-answering
● Another main application of natural language processing (NLP) is question-answering.
Search engines put the information of the world at our fingertips, but they are still lacking
when it comes to answering the questions posted by human beings in their natural
language.
● Big tech companies like Google are also working in this direction.
● Question-answering is a Computer Science discipline within the fields of AI and NLP.
● It focuses on building systems that automatically answer questions posted by human beings
in their natural language.
● A computer system that understands natural language can translate the sentences
written by humans into an internal representation, so that valid answers can be
generated by the system.
● The exact answers can be generated by doing syntax and semantic analysis of the questions.
Lexical gap, ambiguity and multilingualism are some of the challenges for NLP in building a
good question answering system.
Speech recognition
● Speech recognition enables computers to recognize spoken language and transform it into
text (dictation) and, if programmed, to act upon that recognition – e.g. in assistants
like Google Assistant, Microsoft’s Cortana or Apple’s Siri.
● Speech recognition is simply the ability of software to recognise speech.
● Anything that a person says, in a language of their choice, must be recognised by the
software.
● Speech recognition technology can be used to perform an action based on the instructions
defined by the human.
● Humans need to train the speech recognition system by storing speech patterns and
vocabulary of their language into the system.
● By doing so, they can essentially train the system to understand them when they speak.
● Speech recognition and Natural Language processing are usually used together in Automatic
Speech Recognition engines, Voice Assistants and Speech analytics tools.
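As an illustration only: the widely used third-party SpeechRecognition package wraps several recognition engines. The sketch below (with a hypothetical hello.wav file) shows the transcribe-then-act pattern described above.

```python
# A minimal speech-to-text sketch using the third-party 'SpeechRecognition'
# package (pip install SpeechRecognition). 'hello.wav' is a hypothetical file.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("hello.wav") as source:
    audio = recognizer.record(source)          # read the whole audio file

try:
    text = recognizer.recognize_google(audio)  # send audio to the Google Web Speech API
    print("You said:", text)
    if "weather" in text.lower():              # act on the recognized instruction
        print("Fetching the weather forecast...")
except sr.UnknownValueError:
    print("Speech was not understood.")
```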
5. Goals of NLP.
Ans:
• The ultimate goal of natural language processing is for computers to achieve human-
like comprehension of texts/languages. When this is achieved, computer systems will
be able to understand, draw inferences from, summarize, translate and generate
accurate and natural human text and language.
• The goal of natural language processing is to specify a language comprehension and
production theory to such a level of detail that a person is able to write a computer
program which can understand and produce natural language.
• The basic goal of NLP is to accomplish human-like language processing. The choice of
the word “processing” is very deliberate and should not be replaced with “understanding”,
for although the field of NLP was originally referred to as Natural Language
Understanding (NLU), that goal has not yet been accomplished. A full NLU system would
be able to paraphrase an input text, translate it into another language, answer questions
about its contents, and draw inferences from it.
6. Levels of NLP.
Ans: Natural Language Processing works on multiple levels and, most often, these different
levels synergize well with each other. Below is a brief overview of each, with some examples
of how they are used in Information Retrieval.
Morphological
• The morphological level of linguistic processing deals with the study of word structures
and word formation, focusing on the analysis of the individual components of words.
• The most important unit of morphology, defined as having the “minimal unit of meaning”,
is referred to as the morpheme.
• Taking, for example, the word: “unhappiness”. It can be broken down into three
morphemes (prefix, stem, and suffix), with each conveying some form of meaning: the
prefix un- refers to “not being”, while the suffix -ness refers to “a state of being”.
• The stem happy is considered a free morpheme since it is a “word” in its own
right. Bound morphemes (prefixes and suffixes) require a free morpheme to which they can
be attached, and therefore cannot appear as a “word” on their own.
• In Information Retrieval, document and query terms can be stemmed to match the
morphological variants of terms between the documents and query; such that the singular
form of a noun in a query will match even with its plural form in the document, and vice
versa, thereby increasing recall.
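A small sketch of the recall idea above, using NLTK's PorterStemmer as one possible choice of stemmer: both query and document terms are stemmed before matching, so singular and plural (and other morphological) variants still match.

```python
# Match query terms against document terms after stemming, so that
# morphological variants still match. Assumes NLTK is installed.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
document_terms = ["retrieval", "of", "relevant", "documents"]
query_terms = ["document", "retrieving"]

doc_stems = {stemmer.stem(t) for t in document_terms}
for term in query_terms:
    print(term, "->", stemmer.stem(term), "| match:", stemmer.stem(term) in doc_stems)
```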
Lexical
• The lexical analysis in NLP deals with the study at the level of words with respect to
their lexical meaning and part-of-speech. This level of linguistic processing utilizes a
language’s lexicon, which is a collection of individual lexemes.
• A lexeme is a basic unit of lexical meaning: an abstract unit of morphological
analysis that represents the set of forms or “senses” taken by a single morpheme.
• “Duck”, for example, can take the form of a noun or a verb but its part-of-speech and
lexical meaning can only be derived in context with other words used in the
phrase/sentence.
• This, in fact, is an early step towards a more sophisticated Information Retrieval system
where precision is improved through part-of-speech tagging.
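The “duck” example above can be reproduced with NLTK's default part-of-speech tagger (a minimal sketch; the exact tags depend on the tagger and its training data).

```python
# POS tags are assigned in context, so the same surface word can receive
# different tags. Assumes NLTK with the averaged-perceptron tagger data.
import nltk

for sentence in ["I saw a duck near the lake", "They duck whenever a ball flies at them"]:
    tokens = nltk.word_tokenize(sentence)
    print(nltk.pos_tag(tokens))
    # Expected (roughly): 'duck' tagged as a noun (NN) in the first sentence
    # and as a verb (VBP) in the second.
```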
Syntactic
• The part-of-speech tagging output of the lexical analysis can be used at the syntactic level of
linguistic processing to group words into the phrase and clause brackets.
• Syntactic analysis, also referred to as “parsing”, allows the extraction of phrases which
convey more meaning than the individual words by themselves, such as a noun
phrase.
• In Information Retrieval, parsing can be leveraged to improve indexing since phrases can be
used as representations of documents which provide better information than just single-
word indices.
• In the same way, phrases that are syntactically derived from the query offer better search
keys to match against documents that are similarly parsed (a small chunking sketch follows this list).
• Nevertheless, syntax can still be ambiguous at times as in the case of the news
headline: “Boy paralyzed after tumour fights back to gain black belt” — which actually
refers to how a boy was paralyzed because of a tumour but endured the fight against the
disease and ultimately gained a high level of competence in martial arts.
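A minimal noun-phrase chunking sketch (assuming NLTK and a simple hand-written grammar), illustrating how parsing groups words into phrases that could then serve as index terms:

```python
# Group POS-tagged words into noun-phrase chunks with a small regular-
# expression grammar; the extracted phrases could be used as index terms.
import nltk

sentence = "The quick brown fox jumped over the lazy dog"
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))

grammar = "NP: {<DT>?<JJ>*<NN.*>+}"   # optional determiner, adjectives, nouns
chunker = nltk.RegexpParser(grammar)
tree = chunker.parse(tagged)

for subtree in tree.subtrees(filter=lambda t: t.label() == "NP"):
    print(" ".join(word for word, tag in subtree.leaves()))
```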
Semantic
• The semantic level of linguistic processing deals with the determination of what a sentence
really means by relating syntactic features and disambiguating words with multiple
definitions to the given context.
• This level entails the appropriate interpretation of the meaning of sentences, rather than
the analysis at the level of individual words or phrases.
• In Information Retrieval, the query and document matching process can be performed on a
conceptual level, as opposed to simple terms, thereby further increasing system precision.
• Moreover, by applying semantic analysis to the query, term expansion would be possible
with the use of lexical sources, offering improved retrieval of the relevant documents even
if exact terms are not used in the query.
• With query expansion, precision may increase, and recall will probably increase as well (a minimal WordNet-based expansion sketch is given below).
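A minimal query-expansion sketch using WordNet (one possible lexical source, accessed via NLTK): each query term is expanded with the synonyms of its senses.

```python
# Expand query terms with WordNet synonyms so that documents using different
# wording can still be retrieved. Assumes NLTK with the WordNet data.
from nltk.corpus import wordnet as wn

def expand(term):
    synonyms = {term}
    for synset in wn.synsets(term):
        for lemma in synset.lemmas():
            synonyms.add(lemma.name().replace("_", " "))
    return synonyms

print(expand("car"))   # typically includes 'automobile', 'machine', ...
```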
Discourse
• The discourse level of linguistic processing deals with the analysis of structure and meaning
of text beyond a single sentence, making connections between words and sentences.
• At this level, Anaphora Resolution is also achieved by identifying the entity referenced by
an anaphor (most commonly in the form of, but not limited to, a pronoun).
• For example, in “Priya dropped the glass and it broke”, anaphora resolution identifies that “it” refers to the glass.
Pragmatic
• The pragmatic level of linguistic processing deals with the use of real-world knowledge and
understanding of how this impacts the meaning of what is being communicated.
• By analyzing the contextual dimension of the documents and queries, a more detailed
representation is derived.
• In Information Retrieval, this level of Natural Language Processing primarily engages query
processing and understanding by integrating the user’s history and goals as well as the
context upon which the query is being made.
• Contexts may include time and location.
• This level of analysis enables major breakthroughs in Information Retrieval: it facilitates
a conversation between the IR system and its users, allowing the system to elicit the
purpose for which the information being sought will be used, thereby ensuring that the
retrieval results are fit for purpose.
7. Stages in NLP.
Ans:
There are generally five stages −
• Lexical Analysis − It involves identifying and analyzing the structure of words. The lexicon of a
language means the collection of words and phrases in that language. Lexical analysis
divides the whole chunk of text into paragraphs, sentences, and words.
• Syntactic Analysis (Parsing) − It involves analysis of the words in a sentence for grammar, and
arranging the words in a manner that shows the relationships among them. A sentence
such as “The school goes to boy” is rejected by an English syntactic analyzer.
• Semantic Analysis − It draws the exact meaning, or the dictionary meaning, from the text.
The text is checked for meaningfulness. This is done by mapping syntactic structures onto
objects in the task domain. The semantic analyzer disregards sentences such as “hot ice-
cream”.
• Discourse Integration − The meaning of any sentence depends upon the meaning of the
sentence just before it. In addition, it can also shape the meaning of the sentence that
immediately follows it.
• Pragmatic Analysis − During this stage, what was said is re-interpreted in terms of what it actually meant.
It involves deriving those aspects of language which require real-world knowledge.
(A small sketch of the early stages is given after this list.)
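The first two or three stages can be sketched with NLTK as below; semantic, discourse and pragmatic analysis need far richer machinery and are only hinted at in the comments.

```python
# A minimal sketch of the early pipeline stages with NLTK.
import nltk

text = "The boy goes to the school. He likes it."

# Lexical analysis: split the text into sentences and words.
sentences = nltk.sent_tokenize(text)
tokens = [nltk.word_tokenize(s) for s in sentences]

# Syntactic analysis: POS-tag the words and chunk them into noun phrases.
tagged = [nltk.pos_tag(t) for t in tokens]
chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")
trees = [chunker.parse(t) for t in tagged]

print(tagged[0])
print(trees[0])
# Semantic analysis would map these structures onto meanings, discourse
# integration would resolve "He"/"it" to "The boy"/"the school", and
# pragmatic analysis would bring in real-world context.
```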
8. Ambiguity in NLP.
Ans:
● Ambiguity, as generally used in natural language processing, can be referred to as the
capability of being understood in more than one way.
● Natural language is very ambiguous.
Lexical Ambiguity
The ambiguity of a single word is called lexical ambiguity. For example, treating the word silver as
a noun, an adjective, or a verb.
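WordNet (via NLTK) makes this visible: listing the synsets of “silver” typically shows noun, adjective and verb senses, any of which could be intended.

```python
# Listing the WordNet senses of an ambiguous word. Assumes NLTK + WordNet data.
from nltk.corpus import wordnet as wn

for synset in wn.synsets("silver"):
    print(synset.name(), synset.pos(), "-", synset.definition())
# The word has several noun senses (the metal, the colour) and typically
# adjective and verb senses as well, which is exactly the lexical ambiguity above.
```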
Syntactic Ambiguity
This kind of ambiguity arises when a sentence can be parsed (assigned a grammatical structure) in
more than one way. For example, “I saw the man with the telescope” can mean either that the
telescope was used to see the man or that the man had the telescope.
Semantic Ambiguity
● This kind of ambiguity occurs when the meaning of the words themselves can be
misinterpreted even after syntax and the meaning of individual word have been resolved.
● In other words, semantic ambiguity happens when a sentence contains an ambiguous word
or phrase.
Example : “The car hit the pole while it was moving” has semantic ambiguity, because the
interpretations can be “The car, while moving, hit the pole” and “The car hit the pole while the
pole was moving”.
Anaphoric Ambiguity
● This kind of ambiguity arises due to the use of anaphora entities in discourse.
● Anaphora : the use of an expression (most often a pronoun) whose interpretation depends on
another expression mentioned earlier in the discourse (its antecedent).
● For example, in “my mother liked the house very much but she couldn’t buy it”, instead of
repeating “mother” we use “she”; here “she” is an anaphor referring back to “mother”.
● Now let us come back to anaphoric ambiguity and understand it with the help of an example.
● For example: “The horse ran up the hill. It was very steep. It soon got tired.” Here, the
anaphoric reference of “it” in the two sentences can be either the horse or the hill, which
causes anaphoric ambiguity.
Pragmatic ambiguity
● It occurs when a sentence is not specific and the context gives it multiple interpretations.
● For example, the sentence “I like you too” can have multiple interpretations like I like you
(just like you like me), I like you (just like someone else does).
Stemming and Lemmatization
Derivational Morphology:
• Derivational morphology is defined as morphology that creates new lexemes,
either by changing the syntactic category (part of speech) of a base or by adding
substantial, non-grammatical meaning or both.
• On the one hand, derivation may be distinguished from inflectional morphology,
which typically does not change category but rather modifies lexemes to fit into
various syntactic contexts; inflection typically expresses distinctions like number,
case, tense, aspect, person, among others.
• On the other hand, derivation may be distinguished from compounding, which also
creates new lexemes, but by combining two or more bases rather than by
affixation, reduplication, subtraction, or internal modification of various sorts.
• Although the distinctions are generally useful, in practice applying them is not
always easy.
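The contrast between inflection and derivation also shows up in the tools: a rule-based stemmer strips both kinds of affixes crudely, while a lemmatizer (sketched below with NLTK's WordNet lemmatizer) undoes inflection to return a dictionary form.

```python
# Stemming vs lemmatization on inflected and derived forms. Assumes NLTK
# with the WordNet data downloaded.
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word, pos in [("studies", "v"), ("happiness", "n"), ("better", "a")]:
    print(word,
          "| stem:", stemmer.stem(word),
          "| lemma:", lemmatizer.lemmatize(word, pos=pos))
# Typical output: the stemmer gives truncated forms such as 'studi' and
# 'happi', while the lemmatizer returns valid words such as 'study' and
# (for 'better' with pos='a') 'good'.
```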
12. Types of stemmers.
• Porter’s Stemmer
It is one of the most popular stemming methods proposed in 1980. It is based on the idea
that the suffixes in the English language are made up of a combination of smaller and
simpler suffixes.
Example: EED -> EE means “if the word has at least one vowel and consonant plus EED
ending, change the ending to EE” as ‘agreed’ becomes ‘agree’.
Advantage: It produces the best output as compared to other stemmers and it has less
error rate.
Limitation: Morphological variants produced are not always real words.
• Lovins Stemmer
It was proposed by Lovins in 1968. It removes the longest suffix from a word, then the word is
recoded to convert this stem into a valid word.
Example: sitting -> sitt -> sit
Advantage: It is fast and handles irregular plurals like 'teeth' and 'tooth' etc.
Limitation: It is time-consuming and frequently fails to form valid words from the stem.
• Dawson Stemmer
It is an extension of the Lovins stemmer in which suffixes are stored in reversed order, indexed
by their length and last letter.
Advantage: It is fast in execution and covers more suffixes.
Limitation: It is very complex to implement.
• Krovetz Stemmer
It was proposed in 1993 by Robert Krovetz. Following are the steps:
1) Convert the plural form of a word to its singular form.
2) Convert the past tense of a word to its present tense and remove the suffix ‘ing’.
Example: ‘children’ -> ‘child’
Advantage: It is light in nature and can be used as pre-stemmer for other stemmers.
Limitation: It is inefficient in case of large documents.
• Xerox Stemmer
Example:
‘children’ -> ‘child’
‘understood’ -> ‘understand’
‘whom’ -> ‘who’
‘best’ -> ‘good’
Advantage: It works well with large documents and the stems produced are valid words.
Limitation: It is language dependent (mainly implemented for English), and over-stemming
may occur.
• N-Gram Stemmer
An n-gram is a set of n consecutive characters extracted from a word; similar
words will have a high proportion of n-grams in common.
Example: ‘INTRODUCTIONS’ for n=2 becomes : *I, IN, NT, TR, RO, OD, DU, UC, CT, TI, IO,
ON, NS, S*
Advantage: It is based on simple string comparisons and is largely language independent.
Limitation: It requires space to create and index the n-grams, and it is not time efficient.
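Of the stemmers listed above, Porter ships with NLTK; the sketch below compares it with the more aggressive Lancaster stemmer (not in the list above, included only for contrast) to show how different algorithms produce different, sometimes non-word, stems.

```python
# Comparing two stemmers available in NLTK. Lancaster is not one of the
# stemmers listed above; it only illustrates how aggressiveness differs.
from nltk.stem import PorterStemmer, LancasterStemmer

porter = PorterStemmer()
lancaster = LancasterStemmer()

for word in ["running", "introduction", "sitting", "children"]:
    print(f"{word:14} porter={porter.stem(word):12} lancaster={lancaster.stem(word)}")
# Note that the stems need not be real words ('introduct', ...), which is the
# limitation mentioned for Porter's stemmer, and that irregular forms such as
# 'children' are not reduced to 'child' by either stemmer.
```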
13. Porter stemmer.
The Porter stemming algorithm (or ‘Porter stemmer’) is a process for removing the commoner
morphological and inflexional endings from words in English. Its main use is as part of a term
normalisation process that is usually done when setting up Information Retrieval systems.
A consonant is denoted by c and a vowel by v; a list of consonants of length greater than 0 is
denoted by C, and a list of vowels by V. Any word, or part of a word, then has the form
[C]VCVC ... [V]
where the square brackets denote arbitrary presence of their contents. Using (VC)^m to denote VC
repeated m times, this may again be written as
[C](VC)^m[V]
m will be called the measure of any word or word part when represented in this form. The case m
= 0 covers the null word. Here are some examples:
• m=0 TR, EE, TREE, Y, BY.
• m=1 TROUBLE, OATS, TREES, IVY.
• m=2 TROUBLES, PRIVATE, OATEN, ORRERY.
(condition) S1 -> S2
This means that if a word ends with the suffix S1, and the stem before S1 satisfies the given
condition, S1 is replaced by S2. The condition is usually given in terms of m, e.g.
(m > 1) EMENT ->
Here S1 is 'EMENT' and S2 is null. This would map REPLACEMENT to REPLAC, since REPLAC is a
word part for which m = 2.
The 'condition' part may also contain the following:
• *S - the stem ends with S (and similarly for the other letters).
• *v* - the stem contains a vowel.
• *d - the stem ends with a double consonant (e.g. -TT, -SS).
• *o - the stem ends cvc, where the second c is not W, X or Y (e.g. -WIL, -HOP).
And the condition part may also contain expressions with and, or and not, so that:
(m>1 and (*S or *T)) : tests for a stem with m>1 ending in S or T, while
(*d and not (*L or *S or *Z)) : tests for a stem ending with a double consonant other than L, S or Z.
Elaborate conditions like this are required only rarely.
In a set of rules written beneath each other, only one is obeyed, and this will be the one with the
longest matching S1 for the given word. For example, with
• SSES -> SS
• IES -> I
• SS -> SS
• S ->
(here the conditions are all null) CARESSES maps to CARESS since SSES is the longest match for S1.
Equally CARESS maps to CARESS (S1=SS) and CARES to CARE (S1=S).
In the rules below, examples of their application, successful or otherwise, are given on the right in
lower case. The algorithm now follows:
Step 1a :
• SSES -> SS (Example : caresses -> caress)
Step 1b (clean-up rules applied after an -ED or -ING suffix has been removed) :
• (*d and not (*L or *S or *Z)) -> single letter (Example : hopp(ing) -> hop ; tann(ed) -> tan ; fall(ing) -
> fall ; hiss(ing) -> hiss ; fizz(ed) -> fizz)
• (m=1 and *o) -> E (Example : fail(ing) -> fail ; fil(ing) -> file)
The rule to map to a single letter causes the removal of one of the double letter pair. The -E is put
back on -AT, -BL and -IZ, so that the suffixes -ATE, -BLE and -IZE can be recognised later. This E may
be removed in step 4.
Step 1c :
(*v*) Y -> I (Example : happy -> happi ; sky -> sky)
Step 1 deals with plurals and past participles. The subsequent steps are much more
straightforward.
Step 2 :
• (m>0) ATIONAL -> ATE (Example : relational -> relate)
• (m>0) TIONAL -> TION (Example : conditional -> condition ; rational -> rational)
Step 3 :
• (m>0) ICATE -> IC (Example : triplicate -> triplic)
Step 4 :
• (m>1 and (*S or *T)) ION -> (Example : adoption -> adopt)
Step 5a :
• (m>1) E -> (Example : probate -> probat ; rate -> rate)
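To make the measure m and the rule notation concrete, here is a small self-contained sketch (not the full algorithm) that computes m and applies the single rule (m>1) EMENT -> discussed above.

```python
# Compute Porter's measure m of a word part and apply one example rule,
# (m>1) EMENT -> null. A sketch of the notation only, not the full algorithm.
VOWELS = set("aeiou")

def is_consonant(word, i):
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        # 'y' is a vowel when preceded by a consonant, a consonant otherwise
        return i == 0 or not is_consonant(word, i - 1)
    return True

def measure(stem):
    # Reduce the stem to its C/V form, e.g. 'replac' -> 'CVCCVC' -> 'CVCVC',
    # then m is the number of 'VC' pairs in the collapsed form.
    form = "".join("C" if is_consonant(stem, i) else "V" for i in range(len(stem)))
    collapsed = ""
    for ch in form:
        if not collapsed or collapsed[-1] != ch:
            collapsed += ch
    return collapsed.count("VC")

def apply_ement_rule(word):
    # (m>1) EMENT ->  : strip 'ement' only if the remaining stem has m > 1.
    if word.endswith("ement"):
        stem = word[: -len("ement")]
        if measure(stem) > 1:
            return stem
    return word

print(measure("tree"), measure("trouble"), measure("troubles"))   # 0 1 2
print(apply_ement_rule("replacement"))   # replac
print(apply_ement_rule("cement"))        # cement (m of 'c' is 0, so no change)
```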
N-gram Language Model
• Let’s start with the probability P(w|h): the probability of a word w given some history h. For
example, suppose the history h is “its water is so transparent that” and we want to know the
probability that the next word is “the”, i.e. P(the | its water is so transparent that).
• Here,
w = the
h = its water is so transparent that
• And, one way to estimate the above probability function is through the relative frequency
count approach, where you would take a substantially large corpus, count the number of
times you see its water is so transparent that, and then count the number of times it is
followed by the. In other words, you are answering the question:
• Out of the times you saw the history h, how many times did the word w follow it? That is,
P(the | its water is so transparent that) = Count(its water is so transparent that the) / Count(its water is so transparent that)
• Now, you can imagine that it is not feasible to perform this over an entire corpus, especially if it is
of a significant size.
• This shortcoming, and ways to decompose the probability function using the chain rule,
serve as the base intuition of the N-gram model.
• Here, instead of computing the probability using the entire history, you approximate it
by just a few historical words; the bigram model, for instance, approximates P(w|h) by the
probability given only the preceding word, P(w_n | w_(n-1)).
• This assumption that the probability of a word depends only on the previous word is also
known as Markov assumption.
• Markov models are the class of probabilistic models that assume that we can predict the
probability of some future unit without looking too far into the past.
• You can further generalize the bigram model to the trigram model, which looks two words
into the past, and this can be further generalized to the N-gram model (a toy sketch is given below).
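A toy bigram model sketch in pure Python, using maximum-likelihood estimates from a tiny made-up corpus, to show the counting idea above; real models would add smoothing and a much larger corpus.

```python
# A toy bigram language model: P(w_n | w_{n-1}) = C(w_{n-1} w_n) / C(w_{n-1}),
# estimated from a tiny illustrative corpus. No smoothing is applied.
from collections import Counter

corpus = [
    "<s> its water is so transparent that the fish are visible </s>",
    "<s> the water is so clear </s>",
]

unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p(word, prev):
    # Relative-frequency (maximum likelihood) estimate of P(word | prev).
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

print(p("the", "that"))    # 1.0 in this toy corpus
print(p("water", "the"))   # 0.5: 'the' is followed by 'water' once out of two times
```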
• Now, if we talk about Part-of-Speech (PoS) tagging, then it may be defined as the
process of assigning one of the parts of speech to the given word. It is generally called
POS tagging.
• In simple words, we can say that POS tagging is the task of labelling each word in a
sentence with its appropriate part of speech. We already know that parts of speech
include nouns, verbs, adverbs, adjectives, pronouns, conjunctions and their sub-
categories.
16. Rule based, Stochastic and Transformation based tagging.
Ans:
Rule-based POS Tagging
• One of the oldest techniques of tagging is rule-based POS tagging. Rule-based taggers
use a dictionary or lexicon to get the possible tags for each word.
• If the word has more than one possible tag, then rule-based taggers use hand-written
rules to identify the correct tag. Disambiguation can also be performed in rule-based
tagging by analyzing the linguistic features of a word along with its preceding as well as
following words.
• For example, if the preceding word of a word is an article, then that word must be a
noun.
• As the name suggests, all such information in rule-based POS tagging is coded
in the form of rules, e.g. −
➢ Context-pattern rules
• Rule-based POS tagging can also be understood through its two-stage architecture (a toy
sketch follows this list) −
➢ First stage − In the first stage, it uses a dictionary to assign each word a list of potential
parts-of-speech.
➢ Second stage − In the second stage, it uses large lists of hand-written disambiguation
rules to sort down the list to a single part-of-speech for each word.
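A toy illustration of this two-stage idea (dictionary lookup followed by one hand-written disambiguation rule, such as the article rule above); real rule-based taggers use far larger lexicons and rule sets.

```python
# Stage 1: a tiny dictionary assigns each word its possible tags.
# Stage 2: hand-written rules pick one tag, e.g. "after an article,
# prefer a noun". This is only a toy sketch of the two-stage idea.
LEXICON = {
    "the": ["DET"],
    "duck": ["NOUN", "VERB"],
    "swims": ["VERB"],
}

def tag(tokens):
    tags = []
    for i, word in enumerate(tokens):
        candidates = LEXICON.get(word.lower(), ["NOUN"])   # default guess
        if len(candidates) > 1 and i > 0 and tags[-1] == "DET":
            choice = "NOUN" if "NOUN" in candidates else candidates[0]
        else:
            choice = candidates[0]
        tags.append(choice)
    return list(zip(tokens, tags))

print(tag("the duck swims".split()))
# [('the', 'DET'), ('duck', 'NOUN'), ('swims', 'VERB')]
```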
Stochastic POS Tagging
• Another technique of tagging is stochastic POS tagging, which uses the probabilities of tags
observed in a training corpus. The simplest stochastic tagger applies the following approaches for POS tagging −
➢ Word-frequency approach − the tag most frequently assigned to a word in the training corpus is
chosen for that word.
➢ Tag-sequence probabilities − the best tag for a word is determined by the probability that it
occurs with the n previous tags.
Word Sense Disambiguation (WSD)
• Word sense disambiguation means selecting which sense of an ambiguous word is intended in a
given context. For example, consider the two distinct senses that exist for the word “bass” −
➢ I can hear bass sound.
➢ He likes to eat grilled bass.
• The occurrence of the word bass clearly denotes the distinct meaning: in the first
sentence it means frequency (low-pitched sound), and in the second it means the fish.
• Hence, if the sentences were disambiguated by WSD, the correct meanings could be assigned
as follows −
➢ I can hear bass/frequency sound.
➢ He likes to eat grilled bass/fish.
Supervised Methods
• For disambiguation, machine learning methods make use of sense-annotated corpora to
train.
• These methods assume that the context can provide enough evidence on its own to
disambiguate the sense.
• In these methods, external world knowledge and reasoning are deemed unnecessary. The
context is represented as a set of “features” of the words.
• It includes the information about the surrounding words also. Support vector machine and
memory-based learning are the most successful supervised learning approaches to WSD.
• These methods rely on a substantial amount of manually sense-tagged corpora, which are
very expensive to create.
Semi-supervised Methods
• Due to the lack of training corpus, most of the word sense disambiguation algorithms use
semi-supervised learning methods.
• It is because semi-supervised methods use both labelled as well as unlabeled data.
• These methods require a very small amount of annotated text and a large amount of plain
unannotated text.
• The technique used by semi-supervised methods is bootstrapping from seed data.
Unsupervised Methods
• These methods assume that similar senses occur in similar context.
• That is why the senses can be induced from text by clustering word occurrences by using
some measure of similarity of the context.
• This task is called word sense induction or discrimination.
• Unsupervised methods have great potential to overcome the knowledge acquisition
bottleneck due to non-dependency on manual efforts.
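For completeness, NLTK also ships a simple knowledge-based baseline, the Lesk algorithm (not one of the supervised/semi-supervised/unsupervised methods above): it picks the WordNet sense whose definition overlaps most with the context words.

```python
# Lesk word sense disambiguation via NLTK: choose the WordNet sense of
# 'bass' whose gloss overlaps most with the surrounding context words.
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize

for sentence in ["I can hear bass sound", "He likes to eat grilled bass"]:
    context = word_tokenize(sentence)
    sense = lesk(context, "bass")
    print(sentence, "->", sense, "-", sense.definition() if sense else "no sense found")
```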