AMBIGUITY IN NLP:
1. Natural language has a very rich form and structure.
2. It is very ambiguous.
3. Ambiguity means that an expression does not have a single, well-defined interpretation.
4. Almost any sentence in a language with a large enough grammar can have more than one interpretation.
5. The different types of ambiguity are described below.
I) Lexical Ambiguity:
1. Lexical ambiguity is the ambiguity of a single word.
2. A word can be ambiguous with respect to its syntactic class.
3. Example: silver
4. The word silver can be used as a noun, an adjective, or a verb.
a. She bagged two silver medals.
b. She made a silver speech.
c. His worries had silvered his hair.
5. Lexical ambiguity can be resolved by lexical category disambiguation, i.e., part-of-speech tagging.
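As a minimal sketch of this, the snippet below tags two of the sentences above with NLTK (assuming NLTK and its tokenizer/tagger resources are installed); the tagger assigns different lexical categories to the adjectival/nominal use of "silver" and the verbal use "silvered".

import nltk

# Assumes the 'punkt' and 'averaged_perceptron_tagger' resources have been downloaded.
for sentence in ["She bagged two silver medals.", "His worries had silvered his hair."]:
    tokens = nltk.word_tokenize(sentence)
    # pos_tag assigns Penn Treebank tags; "silver" typically comes out as a
    # noun/adjective modifier here, while "silvered" is tagged as a past-tense verb (VBD).
    print(nltk.pos_tag(tokens))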
II) Syntactic Ambiguity:
1. Syntactic ambiguity arises when the structure of a sentence allows for multiple valid parse trees,
leading to different interpretations.
2. Example: In the sentence "I saw the man with the telescope," it's unclear whether the speaker used the
telescope to see the man or if the man had the telescope.
3. Syntactic ambiguity can be resolved by probabilistic parsing.
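A minimal sketch of this with NLTK's chart parser and a small hand-written grammar (the grammar is an illustrative assumption, not a standard resource) produces the two parse trees:

import nltk

# Toy grammar: the PP "with the telescope" can attach either to the VP or to the NP.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Pron | Det N | NP PP
VP -> V NP | VP PP
PP -> P NP
Pron -> 'I'
Det -> 'the'
N -> 'man' | 'telescope'
V -> 'saw'
P -> 'with'
""")

parser = nltk.ChartParser(grammar)
tokens = "I saw the man with the telescope".split()
for tree in parser.parse(tokens):
    tree.pretty_print()  # one tree attaches the PP to the VP, the other to the NP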
V) Pragmatic Ambiguity:
1. Pragmatic ambiguity is related to how language is used in context and can arise due to the speaker's
intentions, implied meanings, or conversational implicatures.
2. Example: In response to the question "Do you want to come over for coffee?" a person might say "I can't, I'm
busy," which could mean they are genuinely busy or that they don't want to come but are being polite.
3. It is the most difficult type of ambiguity to resolve.
4. Pragmatic ambiguity can be resolved by contextual analysis.
Q3. Write short notes on challenges in NLP. Ans:
CHALLENGES IN NLP:
1. Ambiguity: Words and sentences often have multiple meanings (e.g., "bank" can mean a financial
institution or the side of a river).
2. Contextual Understanding: The meaning of words and phrases can change depending on the context in
which they are used.
3. Language Variability: Variability in language, including synonyms, idioms, and metaphors, makes it
difficult for NLP systems to interpret language consistently.
4. Morphological Complexity: Words can change forms based on tense, number, gender, etc., requiring
robust analysis (e.g., "run," "runs," "running").
5. Language Diversity: Handling multiple languages and dialects, each with unique syntax and semantics,
is challenging.
6. Data Sparsity: Limited data for rare words, phrases, or languages can hinder the performance of NLP
systems.
7. World Knowledge: Understanding language often requires background knowledge about the world,
which is difficult to encode in NLP systems.
8. Real-time Processing: NLP applications, like voice assistants, need to process and understand language
instantly, posing computational challenges.
9. Sarcasm and Irony: Detecting sarcasm and irony is difficult because it often relies on tone and cultural
knowledge.
10. Ethical and Bias Issues: NLP systems can inherit biases from training data, leading to ethical concerns in
their outputs.
11. Cross-domain Knowledge Transfer: NLP systems trained in one domain may struggle when applied to a
different domain.
12. Speech Recognition Challenges: Variability in speech, such as accents, intonations, and background
noise, complicates accurate transcription and understanding.
Q4. Discuss the challenges in various stages of natural language processing. Ans:
I) Text Preprocessing:
1. Tokenization: Breaking down text into individual words or tokens is not always straightforward. For
example, handling contractions (e.g., "can't" to "can" and "not") and compound words (e.g., "New York").
2. Normalization: Converting text to a standard format, such as dealing with different cases (uppercase vs.
lowercase), removing punctuation, and handling special characters, can be tricky.
3. Stop Word Removal: Deciding which common words (e.g., "the," "and") to remove without losing
important context is challenging.
4. Stemming and Lemmatization: Reducing words to their base or root form can be difficult due to irregular
forms and exceptions (e.g., "better" to "good").
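As a small illustration of the irregular-forms problem, a plain stemmer typically leaves "better" unchanged, while NLTK's WordNet lemmatizer (assuming the 'wordnet' corpus is downloaded) can map it to "good" when told the word is an adjective:

from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print(stemmer.stem("better"))                    # 'better'  (suffix stripping cannot handle it)
print(lemmatizer.lemmatize("better", pos="a"))   # 'good'    (dictionary lookup handles the exception)
print(lemmatizer.lemmatize("running", pos="v"))  # 'run'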
V) Pragmatic Analysis
1. Contextual Understanding: Grasping the intended meaning behind a sentence considering context, tone,
and implied meanings, such as sarcasm or irony, is challenging.
2. Speech Acts: Understanding the function of a sentence (e.g., request, command, question) within
a conversation requires interpretation beyond literal meaning.
3. Conversational Implicature: Recognizing implied meaning that is not explicitly stated is complex (e.g.,
"Can you pass the salt?" implies a request rather than a literal question).
Q7. Write short notes on Tokenization. Ans:
TOKENIZATION:
1. Tokenization is one of the first steps in any NLP pipeline.
2. Tokenization is nothing but splitting the raw text into small chunks of words or sentences, called tokens.
3. If the text is split into words, it is called 'Word Tokenization', and if it is split into sentences, it is
called 'Sentence Tokenization'.
4. Generally, whitespace is used to perform word tokenization, and characters like the period, exclamation point,
and newline are used for sentence tokenization.
Tokenization Approaches:
I) White Space Tokenization:
1. This is the simplest tokenization technique.
2. The whitespace tokenizer breaks text into terms whenever it encounters a whitespace character.
3. This is the fastest tokenization technique, but it only works for languages in which whitespace breaks the
sentence apart into meaningful words.
4. Example: English
II) Dictionary Based Tokenization:
1. In this method the tokens are found based on the tokens already existing in the dictionary.
2. If the token is not found, then special rules are used to tokenize it.
3. It is an advanced technique compared to whitespace tokenizer.
III) Rule Based Tokenization:
1. In this technique a set of rules are created for the specific problem.
2. The tokenization is done based on the rules.
3. For example, creating rules based on the grammar of a particular language.
IV) Regular Expression Tokenizer
1. This technique uses regular expressions to control the tokenization of text into tokens.
2. Regular expressions can range from simple to complex and are sometimes difficult to comprehend.
3. This technique should be preferred when the above methods do not serve the required purpose.
4. It is a rule-based tokenizer.
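A minimal sketch using NLTK's RegexpTokenizer; the pattern is an illustrative assumption that keeps word-internal apostrophes and hyphens together and emits other punctuation as separate tokens:

from nltk.tokenize import RegexpTokenizer

# \w+(?:[-']\w+)* matches words with internal hyphens/apostrophes; [^\w\s] matches punctuation.
tokenizer = RegexpTokenizer(r"\w+(?:[-']\w+)*|[^\w\s]")
print(tokenizer.tokenize("I can't visit New-York today!"))
# ['I', "can't", 'visit', 'New-York', 'today', '!']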
V) Penn Treebank Tokenization:
1. A treebank is a corpus that gives the semantic and syntactic annotation of a language.
2. Penn Treebank is one of the largest treebanks that has been published.
3. This technique of tokenization separates out punctuation and clitics (contracted forms that occur attached to
other words, like I'm, don't) while keeping hyphenated words together.
VI) Spacy Tokenization:
1. This is a modern technique of tokenization which is faster and easily customizable.
2. It provides the flexibility to specify special tokens that need not be segmented or need to be segmented using
special rules.
3. For example, if you want to keep $ as a separate token, such a special-case rule takes precedence over the
other tokenization operations.
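A minimal sketch of this customization, assuming spaCy is installed (a blank English pipeline is enough to exercise the tokenizer):

import spacy
from spacy.symbols import ORTH

nlp = spacy.blank("en")
print([t.text for t in nlp("Gimme $5, don't be late.")])   # '$' and '5' are already split

# Register a special-case rule so "Gimme" is segmented into two tokens.
nlp.tokenizer.add_special_case("Gimme", [{ORTH: "Gim"}, {ORTH: "me"}])
print([t.text for t in nlp("Gimme $5, don't be late.")])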
VII) Moses Tokenization:
1. This is an advanced tokenizer that was available before spaCy was introduced.
2. It is basically a collection of complex normalization and segmentation logic which works very well for a
structured language like English.
VIII) Subword Tokenization:
1. This tokenization is very useful for specific applications where subwords carry significance.
2. In this technique, the most frequently used words are given unique IDs, and less frequent words are split into
subwords that best represent the meaning independently.
3. This helps the language model avoid learning related forms such as 'fewer' and 'fewest' as two completely
separate words, since they share the subword 'few'.
4. This also allows unknown words in the data set to be represented during training.
5. There are different types of subword tokenization, such as:
a. Byte-Pair Encoding (BPE) b. WordPiece c. Unigram Language Model d. SentencePiece
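A toy sketch of Byte-Pair Encoding on a tiny, made-up word-frequency vocabulary (the corpus and number of merges are illustrative assumptions): the most frequent adjacent symbol pair is merged repeatedly, so frequent words collapse into single symbols while rarer forms stay split into shared subword pieces.

from collections import Counter

# Words are written as space-separated symbols with an end-of-word marker </w>.
vocab = {"f e w </w>": 5, "f e w e r </w>": 2, "f e w e s t </w>": 1, "n e w </w>": 3}

def pair_counts(vocab):
    counts = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for pair in zip(symbols, symbols[1:]):
            counts[pair] += freq
    return counts

def merge_vocab(vocab, pair):
    new_vocab = {}
    for word, freq in vocab.items():
        symbols, merged, i = word.split(), [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                merged.append(symbols[i] + symbols[i + 1])  # merge the chosen pair
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        new_vocab[" ".join(merged)] = freq
    return new_vocab

for step in range(5):
    counts = pair_counts(vocab)
    best = max(counts, key=counts.get)   # most frequent adjacent pair
    vocab = merge_vocab(vocab, best)
    print(f"merge {step + 1}: {best}")
print(list(vocab))   # 'few' ends up as one token; 'fewer'/'fewest' keep shared subword pieces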
Q8. What is morphology? Why do we need to do Morphological Analysis? Discuss various application
domains of Morphological Analysis. Ans:
MORPHOLOGY:
1. Morphology is a branch of linguistics and a fundamental aspect of language study that deals with the
structure and formation of words.
2. It focuses on understanding how words are constructed from smaller units called morphemes, which are the
smallest meaningful units in a language.
3. Morphemes can be prefixes, suffixes, roots, or inflections that convey specific meanings or grammatical
functions.
MORPHOLOGICAL ANALYSIS:
Morphological Analysis is the process of studying and analysing the structure of words to identify and
understand their constituent morphemes.
NEED TO DO MORPHOLOGICAL ANALYSIS:
1. Wastage of Memory In Exhaustive Lexicon: Without morphological analysis, one would require an
exhaustive lexicon to store every possible word form, including all inflections, prefixes, and suffixes.
2. Failure to Depict Linguistic Generalization: Without morphological analysis, understanding an unknown
word or applying grammatical rules consistently becomes challenging, as it would require memorizing each
word Individually.
3. Language Understanding and Interpretation: Morphological analysis aids in comprehending the meanings
and grammatical functions of words. It allows for the identification of word roots, prefixes, and suffixes, which
can provide insights into a word's semantic and syntactic roles within a sentence.
I) Inflection:
1. In morphology, inflection is a process of word formation in which a word is modified to express different
grammatical categories such as tense, case, voice, aspect, person, number, gender, mood and definiteness.
2. Nouns have simple inflectional morphology.
3. Examples of the inflection of nouns in English are given below, where an affix marks the plural:
a. Cat (-s)
b. Butterfly (-lies)
c. Mouse (mice)
d. Box (-es)
4. A possessive affix is a suffix or prefix attached to a noun to indicate its possessor.
5. Verbs have slightly more complex, but still relatively simple, inflectional morphology.
6. There are three types of verbs in English.
a. Main Verbs - Eat, Sleep and Impeach
b. Modal Verbs - Can, will, should
c. Primary Verbs - Be, Have, Do
7. In Regular Verbs, all the verbs have the same endings marking the same functions.
8. Regular verbs have four morphological forms.
9. Just by knowing the stem we can predict the other forms.
II) Derivation:
1. Morphological derivation is the process of forming a new word from an existing word, often by adding a
prefix or suffix, such as un- or -ness.
2. For example, unhappy and happiness derive from the root word happy.
3. It is differentiated from inflection, which is the modification of a word to form different grammatical
categories without changing its core meaning: determines, determining, and determined are from the root
determine.
4. Derivational morphology often involves the addition of a derivational suffix or other affix.
5. Examples of English derivational patterns and their suffixes:
a. adjective-to-noun: -ness (slow → slowness)
b. adjective-to-verb: -en (weak → weaken)
c. adjective-to-adjective: -ish (red → reddish)
d. adjective-to-adverb: -ly (personal → personally)
e. noun-to-adjective: -al (recreation → recreational)
f. noun-to-verb: -fy (glory → glorify)
g. verb-to-adjective: -able (drink → drinkable)
h. verb-to-noun (abstract): -ance (deliver → deliverance)
i. verb-to-noun (agent): -er (write → writer)
III) Compounding:
1.Compounding words are formed when two or more lexemes combine into a single new word.
2. Compound words may be written as one word or as two words joined with a hyphen.
3. For example:
a. noun-noun compound: note + book → notebook
b. adjective-noun compound: blue + berry → blueberry
c. verb-noun compound: work + room → workroom
d. noun-verb compound: breast + feed → breastfeed
e. verb-verb compound: stir + fry → stir-fry
f. adjective-verb compound: high + light → highlight
g. verb-preposition compound: break + up → breakup
h. preposition-verb compound: out + run → outrun
i. adjective-adjective compound: bitter + sweet → bittersweet
j. preposition-preposition compound: in + to → into
Q10. Explain the role of FSA in morphological analysis.
Ans:
1. Finite State Automata (FSA) play a crucial role in morphological analysis, which is the process of
breaking down words into their smallest meaningful units, known as morphemes.
2. Morphemes are the building blocks of language and can be prefixes, suffixes, roots, or inflections that
convey specific meanings.
3. FSAs are mathematical models used to represent and analyse the structure of words in natural language.
3. Affix Stripping:
a. In many languages, words are formed by adding prefixes and suffixes to root words.
b. FSAs are used to strip away these affixes systematically.
c. By following the transitions in the FSA, it's possible to isolate the root form and various affixes, aiding in the
understanding of word formation.
5. Lexical Analysis:
a. In natural language processing, FSAs are employed during the lexical analysis phase, where text is broken
into tokens or words.
b. The FSA helps identify and extract individual words from a sentence, which is a crucial step in various
language processing tasks.
7. Efficiency:
a. FSAs are computationally efficient, making them suitable for real-time or large-scale applications.
b. They can quickly analyse and decompose words, making them valuable in search engines, spell checkers, and
machine translation systems.
8. Multilingual Applications:
a. FSAs can be adapted to different languages by constructing language-specific automata.
b. This versatility makes them a powerful tool in analysing morphological structures across diverse languages.
Q11. Explain FSA for nouns and verbs. Also design a Finite State Automata (FSA) for the words of English numbers 1-99.
Finite State Automata (FSA) are used in Natural Language Processing (NLP) to model the structure and
behavior of different linguistic elements, including nouns and verbs. An FSA is a computational model
consisting of states and transitions between those states, often used to recognize patterns or sequences in input
data.
FSA FOR NOUNS:
1. Purpose: An FSA for nouns would model the possible forms a noun can take, including singular and
plural forms, as well as possessive forms.
2. Structure:
a. The automaton starts in an initial state.
b. Transitions occur based on the input characters or morphemes that make up the noun.
c. The FSA may have different states for singular and plural forms, with transitions triggered by the
addition of an "s" for plurals or an apostrophe and "s" for possessives.
3. Example:
a. Consider the noun "cat":
• The FSA would have a transition from the initial state on reading "cat."
• A transition to a new state occurs if an "s" is read, representing "cats" (plural).
• Another transition occurs if an apostrophe and "s" are read, representing "cat's" (possessive).
b. The FSA would accept "cat," "cats," and "cat's" as valid forms.
FSA FOR VERBS:
1. Purpose: An FSA for verbs models the various forms a verb can take, such as tense (past, present,
future), person (first, second, third), and number (singular, plural).
2. Structure:
a. The FSA starts in an initial state.
b. Transitions occur based on the input morphemes or endings that modify the verb.
c. The FSA may have states representing different verb forms, such as base form, past tense, present
participle, etc.
3. Example:
a. Consider the verb "run":
• The FSA starts in the initial state with the input "run."
• A transition occurs for "runs" (present third-person singular).
• Another transition occurs for "ran" (past tense) and "running" (present participle).
b. The FSA would accept "run," "runs," "ran," and "running" as valid forms.
DESIGNING A FINITE STATE AUTOMATA (FSA) FOR ENGLISH NUMBERS 1-99:
To design an FSA that accepts the words for English numbers from 1 to 99, we must consider the structure of
these numbers:
1. Single-Digit Numbers (1-9): "one," "two," "three," "four," "five," "six," "seven," "eight," "nine."
2. Teen Numbers (10-19): "ten," "eleven," "twelve," "thirteen," "fourteen," "fifteen," "sixteen,"
"seventeen," "eighteen," "nineteen."
3. Tens (20, 30, ..., 90): "twenty," "thirty," "forty," "fifty," "sixty," "seventy," "eighty," "ninety."
4. Composite Numbers (21-99): These numbers combine a tens word (e.g., "twenty") with a single-digit
word (e.g., "one").
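A minimal sketch of such an automaton over word tokens (the state names and token sets below are one illustrative encoding of the design just described):

UNITS = {"one", "two", "three", "four", "five", "six", "seven", "eight", "nine"}
TEENS = {"ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"}
TENS = {"twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"}

def accepts(words):
    state = "START"
    for w in words:
        if state == "START" and (w in UNITS or w in TEENS):
            state = "FINAL"            # single-digit or teen number (1-19)
        elif state == "START" and w in TENS:
            state = "TENS"             # accepting state that may still take a unit
        elif state == "TENS" and w in UNITS:
            state = "FINAL"            # composite number such as "twenty one"
        else:
            return False               # no valid transition
    return state in {"TENS", "FINAL"}  # accepting states

print(accepts("forty two".split()))    # True
print(accepts("nineteen".split()))     # True
print(accepts("ninety".split()))       # True
print(accepts("two twenty".split()))   # False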
Q12. Explain Porter Stemming algorithm in detail.
Ans:
1. Pre-processing:
• The input word is converted to lowercase.
• The word is initially classified into one of five groups, depending on its ending.
• This classification helps apply specific rules to each group.
3. Rule Application:
• The algorithm applies a series of rules to further reduce the word to its root form.
• These rules are applied in a specific order.
• Each rule may remove or replace a suffix if certain conditions are met.
• If a rule is successfully applied, no further rules are considered for that word.
• The rules include handling common verb and noun suffixes, such as -ize, -ment, -ational, and so on.
4. Special Cases:
• There are special cases and exceptions that the algorithm considers, such as irregular plurals (e.g.,
"mice" to "mouse"), and cases where a suffix should not be removed (e.g., "agreed" remains "agree").
7. Performance Optimization:
• The Porter Stemmer is designed for efficiency, and it avoids unnecessary processing by checking
conditions before applying each rule.
• This improves its performance when processing large volumes of text.
Advantage:
It produces better output than many other stemmers and has a lower error rate.
Limitation:
Morphological variants produced are not always real words.
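A minimal usage sketch with NLTK's implementation of the Porter algorithm; note that outputs such as "poni" and "gener" are not real words, which illustrates the limitation above.

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["caresses", "ponies", "running", "agreed", "generalization"]:
    print(word, "->", stemmer.stem(word))
# caresses -> caress, ponies -> poni, running -> run,
# agreed -> agree, generalization -> gener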
Q13. Explain Lexicon-Free FST Porter Stemmer.
Ans:
LEXICON FREE FST PORTER STEMMER:
1. The Lexicon-Free FST (Finite State Transducer) Porter Stemmer is a variation of the Porter Stemming
Algorithm.
2. It uses finite state transducers to perform stemming without relying on a predefined lexicon or dictionary.
3. This approach is more data-driven and doesn't require maintaining a comprehensive list of word forms.
4. Instead, it applies stemming rules based on the input word's morphology.
5. It is used in information retrieval applications and search engines.
EXAMPLE: Let's apply the Lexicon-Free FST Porter Stemmer to the word "jumping."
Rule 3: Check for and remove some common adjective and adverb endings:
"jump" → "jump" (no change)
Rule 5: Check for and remove "al" or "ance" or "ence" or "er" or "ic" or "able" or "ible" or "ant" or "ement" or
"ment" or "ent" or "ou" or "ism" or "ate" or "iti" or "ous" or "ive" or "ize" endings:
"jump" → "jump" (no change)
Rule 6: Check for and remove "s" or "tional" or "ational" or "al" endings:
"jump" → "jump" (no change)
Step 4: Result:
After applying all the rules, the Lexicon-Free FST Porter Stemmer has processed the word "jumping" and
reduced it to "jump."
Final Result:
Input Word: "jumping"
Stemmed Word: "jump"
Q14. What is a language model? Explain the use of a language model. Write a note on the N-Gram language model. Ans:
LANGUAGE MODEL:
1. A language model is a statistical model used in natural language processing (NLP) and
computational linguistics to predict or generate sequences of words or tokens in a language.
2. It aims to capture the statistical relationships and patterns between words in a given language.
3. Language models play a crucial role in various NLP tasks, including text generation, machine translation,
speech recognition, and more.
N-GRAM MODEL:
1. An N-gram language model is a type of language model that relies on the probability of sequences of N
consecutive words (or tokens) in a given text.
2. The "N" in N-gram represents the number of words in the sequence.
3. Common values for N are 1 (unigram), 2 (bigram), 3 (trigram), and so on.
4. Consider the following example: "This is a sentence"
5. A 1-gram/unigram is a one-word sequence. For the given sentence, the unigrams would be: "This", "is", "a",
"sentence".
6. A 2-gram/bigram is a two-word sequence of words, such as "This is", "is a" or "a sentence".
7. A 3-gram/trigram is a three-word sequence of words like "This is a", "is a sentence"
Key Characteristics:
1. Statistical Probabilities: N-gram models estimate the likelihood of a word occurring based on the previous N-
1 words in the sequence. For example, in a bigram model (N=2), the probability of a word depends only on the
previous word.
2. Markov Assumption: N-gram models make a simplifying assumption known as the Markov assumption,
which means that the probability of a word only depends on a limited context window (N-1 preceding words).
3. Simplified Modelling: N-gram models are computationally efficient and easy to implement compared to
more complex models like neural language models. They are particularly useful when dealing with large
corpora.
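A minimal bigram-model sketch using maximum-likelihood estimates, P(w_i | w_{i-1}) = count(w_{i-1} w_i) / count(w_{i-1}); the two-sentence corpus is an illustrative assumption.

from collections import Counter

corpus = [["this", "is", "a", "sentence"],
          ["this", "is", "another", "sentence"]]

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    tokens = ["<s>"] + sent + ["</s>"]        # sentence-boundary markers
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def bigram_prob(prev, word):
    return bigrams[(prev, word)] / unigrams[prev]

print(bigram_prob("this", "is"))        # 1.0  ("is" always follows "this" in this corpus)
print(bigram_prob("is", "a"))           # 0.5
print(bigram_prob("is", "sentence"))    # 0.0  (an unseen bigram: the sparsity problem noted below)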
Applications
1. Speech Recognition
2. Machine Translation
3. Text Classification
4. Information Retrieval
Pros:
1. N-gram language model is easy to understand and implement.
2. It is computationally efficient and can be used for real-time applications.
3. It can handle large amounts of text data and provide accurate results.
Cons:
1. N-gram language model suffers from the sparsity problem, where some N-grams may not occur in the
training corpus, resulting in zero probabilities.
2. N-gram language model assumes that the probability of a word depends only on its previous N-1 words,
which is not always true in real-world scenarios.
3. N-gram language model does not capture the semantic meaning of words and cannot handle the ambiguity of
language.
Q13 Explain N-Gram Model for Spelling Correction Ans:
AFFIXES:
1. Affixes are linguistic elements that are attached to the beginning (prefixes) or end (suffixes) of words to
modify their meanings or grammatical properties.
2. In natural language processing (NLP), understanding and recognizing affixes is important for tasks like
morphological analysis, word stemming, and part-of-speech tagging.
3. Affixes play a significant role in word formation and can change a word's tense, plurality, case, or meaning.
TYPES OF AFFIXES:
1. Prefixes:
Prefixes are affixes attached to the beginning of a base word to modify its meaning or form a new word.
Examples: "un-" in "undo," "re-" in "rewrite," "pre-" in "preview."
2. Suffixes:
Suffixes are affixes attached to the end of a base word to change its meaning, part of speech, or grammatical
function.
Examples: "ed" in "walked," "ing" in "running," "s" in "cats."
3. Infixes:
Infixes are affixes that are inserted into the middle of a base word to alter its meaning or grammatical structure.
Infixes are relatively rare in English but are more common in some other languages.
Example: The infix "-um-" in Tagalog, as in "ganda" (beauty) becomes "gumanda" (became beautiful).
4. Circumfixes:
Circumfixes consist of two parts, one attached to the beginning and one attached to the end of a base word.
Together, they modify the word's meaning.
Example: In German, "ge-" is added as a prefix, and "-t" is added as a suffix to form past participles like
"geliebt" (loved).
6. Derivational Affixes:
Derivational affixes are prefixes and suffixes that create new words or derive words from other words.
Example: The suffix "-er" in "teacher" derives a new word from "teach."
7. Inflectional Affixes:
Inflectional affixes are suffixes that add grammatical information to words, such as tense, case, number, or
gender, without creating new words.
Examples: The "-s" in "cats" (plural) or the "-ed" in "walked" (past tense).
Q1. What is POS tagging? Discuss various challenges faced by POS tagging. Q2. Explain the various challenges in POS tagging. Ans:
PART-OF-SPEECH (POS) TAGGING;
1. Part-of-speech tagging is the process of assigning a part-of-speech or other lexical class marker to
each word in a corpus.
2. Tags are also usually applied to punctuation markers; thus tagging for natural language is the same
process as tokenization for computer languages, although tags for natural languages are much more
ambiguous.
3. The input to a tagging algorithm is a string of words and a specified tagset.
4. The output is a single best tag for each word.
5. For example, consider sample sentences from the Airline Travel Information Systems (ATIS)
corpus of dialogues about air-travel reservations.
6. Each can be given a potential tagged output using the Penn Treebank tagset.
7. Even in these simple examples, automatically assigning a tag to each word is not trivial.
8. For example, book is ambiguous.
9. That is, it has more than one possible usage and part of speech.
10. It can be a verb (as in book that flight or to book the suspect) or a noun (as in hand me that book, or a
book of matches).
11. Similarly, that can be a determiner (as in Does that flight serve dinner), or a complementizer (as in I
thought that your flight was earlier).
12. The problem of POS-tagging is to resolve these ambiguities, choosing the proper tag for the context.
13. Most of the POS tagging falls under Rule Base POS tagging, Stochastic POS tagging and
Transformation based tagging.
3. Normalization:
a. The CRF normalizes the potential function to ensure it defines a valid probability distribution.
b. This is done using the partition function Z(X), which sums over all possible label sequences
P(Y | X) = Potential(Y, X) / Z(X)
4. Training:
a. During training, CRFs are trained using maximum likelihood estimation.
b. The goal is to find the parameters λ and µ that maximize the likelihood of the training data.
c. This involves:
• Defining the objective function: The objective is to maximize the log-likelihood of the observed
sequences.
• Optimization: Using techniques such as gradient descent or other optimization algorithms to adjust the
parameters to maximize this log-likelihood.
5. Inference:
a. To make predictions on new data, the CRF needs to find the most likely label sequence given the input
sequence.
b. This involves:
Decoding: Using algorithms like the Viterbi algorithm (for linear CRFs) or dynamic programming approaches
to find the label sequence with the highest probability.
Advantages of CRFs:
1. Contextual Information: CRFs can incorporate a rich set of features and capture complex dependencies
between labels.
2. Global Normalization: The normalization over the entire sequence allows CRFs to handle dependencies
between labels, which is a limitation in models like HMMs.
Q1. Write short notes on Hidden Markov Model. Q2. What are the limitations of Hidden Markov Model? Ans:
HIDDEN MARKOV MODEL (HMM)
1. Hidden Markov models (HMMs) are sequence models.
2. HMMs are "a statistical Markov model in which the system being modeled is assumed to be a Markov
process with unobservable (i.e. hidden) states".
3. They are designed to model the joint distribution P(H, O), where H is the hidden state and O is the observed
state.
4. For example, in the context of POS tagging, the objective would be to build an HMM to model P(word | tag)
and compute the label probabilities given observations using Bayes Rule:
P(H | O) = P(O | H) P(H) / P(O)
5. HMM graphs consist of a Hidden Space and Observed Space, where the hidden space consists of the labels
and the observed space is the input.
6. These spaces are connected via transition matrices (T, A) to represent the probability of transitioning from
one state to another following their connections.
7. Each connection represents a distribution over possible options; given our tags, this results in a large search
space of the probability of all words given the tag.
8. The main idea behind HMMs is that of making observations and traveling along connections based on a
probability distribution.
9. In the context of sequence tagging, there is a changing hidden state (the tag) which changes as the
observed state (the tokens in the source text) also changes.
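Decoding the most likely tag sequence in an HMM is typically done with the Viterbi algorithm; the sketch below is a minimal illustration whose toy transition and emission probabilities are assumptions, not trained values.

import math

tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {"DET":  {"DET": 0.05, "NOUN": 0.9,  "VERB": 0.05},
           "NOUN": {"DET": 0.1,  "NOUN": 0.3,  "VERB": 0.6},
           "VERB": {"DET": 0.5,  "NOUN": 0.4,  "VERB": 0.1}}
emit_p = {"DET":  {"the": 0.9, "that": 0.1},
          "NOUN": {"dog": 0.5, "book": 0.4, "barks": 0.1},
          "VERB": {"barks": 0.6, "book": 0.4}}

def viterbi(words):
    # V[i][t] = best log-probability of any tag sequence ending in tag t at position i
    V = [{t: math.log(start_p[t]) + math.log(emit_p[t].get(words[0], 1e-6)) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        V.append({}); back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: V[i - 1][p] + math.log(trans_p[p][t]))
            V[i][t] = (V[i - 1][prev] + math.log(trans_p[prev][t])
                       + math.log(emit_p[t].get(words[i], 1e-6)))
            back[i][t] = prev
    # Follow back-pointers from the best final tag to recover the full sequence.
    path = [max(tags, key=lambda t: V[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.insert(0, back[i][path[0]])
    return path

print(viterbi(["the", "dog", "barks"]))   # expected: ['DET', 'NOUN', 'VERB']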
LIMITATIONS OF HIDDEN MARKOV MODEL
1. Independence Assumption:
• HMMs assume that the observations (emissions) at each time step are conditionally independent given
the hidden states.
• This means that the model doesn't capture long-range dependencies between observations, which are
common in many real-world sequences of data.
• This limitation can affect the model's ability to capture complex relationships.
2. Fixed-Length Sequences:
• HMMs are designed to work with sequences of fixed length.
• If the length of the sequence is not known in advance or if sequences of varying lengths are common,
HMMs may not be well-suited for the task without modification.
3. State Explosion:
• When dealing with complex problems, HMMs may require a large number of states to model all possible
hidden states effectively.
• This can lead to a "state explosion" problem, where the model becomes computationally expensive and
requires substantial training data.
4. Difficulty in Choosing the Right Model:
• Selecting the appropriate number of hidden states for an HMM can be challenging.
• Too few states may result in underfitting, while too many states may lead to overfitting, where the model
captures noise in the data.
5. Lack of Representational Power:
• HMMs are not expressive enough to capture certain complex patterns and structures in data.
• For example, they may struggle with tasks that require modelling hierarchical or nested dependencies.
Q18. Explain Maximum Entropy Model for POS Tagging. Ans:
MAXIMUM ENTROPY MODEL:
1. Maximum Entropy Models are a type of statistical model used in natural language processing
(NLP) for tasks such as Part-of-Speech (POS) tagging.
2. The Maximum Entropy (MaxEnt) approach is based on the principle of making as few
assumptions as possible, except those supported by the training data.
3. In POS tagging, the goal is to assign the correct part of speech (e.g., noun, verb, adjective) to
each word in a sentence.
Key Concepts:
1. Maximum Entropy Principle:
a. The Maximum Entropy principle suggests that, among all possible probability distributions, the
one that maximizes entropy (i.e., is the most uniform) should be chosen, given the constraints
imposed by the known data.
b. This approach avoids making any unnecessary assumptions about the distribution of the data,
making it well-suited for tasks like POS tagging where we want to model the distribution of tags
based on various features.
2. Feature-Based Approach:
a. MaxEnt models use features derived from the input data. In POS tagging, these features might
include the current word, neighboring words, prefixes, suffixes, capitalization, etc.
b. The model combines these features to predict the probability distribution over possible tags for
each word.
3. Modeling:
a. In a Maximum Entropy Model, the probability of a tag given a word and its context is computed
using a weighted combination of features.
b. Formally, the probability of a tag t given a word w and its context is:
P(t | w, context) = (1 / Z(w, context)) · exp( Σi λi fi(t, w, context) )
c. Here, λi are the weights associated with each feature fi, and Z(w, context) is a normalization factor
ensuring the probabilities sum to 1.
4. Training the Model:
a. The model is trained using a labelled corpus, where each word is tagged with its correct POS tag.
b. The weights λi are learned by maximizing the likelihood of the observed data. This is typically
done using iterative methods like gradient ascent.
5. POS Tagging with MaxEnt:
a. During tagging, the model calculates the probability of each possible tag for a word given its
context.
b. The tag with the highest probability is assigned to the word.
Example:
Consider the sentence: "The cat sits on the mat."
1. For the word "cat," features might include:
a. The word itself ("cat").
b. The previous word ("The").
c. Whether the word is capitalized.
2. The MaxEnt model would use these features to predict the probability of each possible tag (e.g.,
noun, verb, etc.).
3. If the model assigns the highest probability to "Noun" for "cat," then "cat" is tagged as a noun.
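Since a MaxEnt classifier is equivalent to multinomial logistic regression over the chosen features, a minimal sketch can be built with scikit-learn; the tiny training set and feature templates below are illustrative assumptions.

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Each example is (feature dictionary, tag); the features mirror the ones listed above.
train = [({"word": "the",  "prev": "<s>", "is_cap": False}, "DET"),
         ({"word": "cat",  "prev": "the", "is_cap": False}, "NOUN"),
         ({"word": "sits", "prev": "cat", "is_cap": False}, "VERB"),
         ({"word": "The",  "prev": "<s>", "is_cap": True},  "DET"),
         ({"word": "mat",  "prev": "the", "is_cap": False}, "NOUN")]

vec = DictVectorizer()
X = vec.fit_transform([features for features, _ in train])
y = [tag for _, tag in train]

model = LogisticRegression(max_iter=1000)   # multinomial logistic regression = MaxEnt
model.fit(X, y)

test = vec.transform([{"word": "cat", "prev": "The", "is_cap": False}])
print(model.predict(test))                  # likely ['NOUN'], driven by the word feature
print(dict(zip(model.classes_, model.predict_proba(test)[0].round(3))))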
Q19. Write short notes on lexical semantics.
Ans:
LEXICAL SEMANTICS:
1. Lexical Semantics is the study of word meaning.
2. Lexical semantics plays a crucial role in semantic analysis, allowing computers to know relationships
between words, phrasal verbs, etc.
3. Semantic analysis is the process of extracting meaning from text.
4. It permits computers to know and interpret sentences, paragraphs, or whole documents.
5. In Lexical Semantics words, sub-words, etc. are called lexical items.
6. In simple terms, lexical semantics is the relationship between lexical items, the meaning of sentences, and
the syntax of sentences.
7. The study of lexical semantics looks at:
a. The classification and decomposition of lexical items.
b. The differences and similarities in lexical semantic structure cross-linguistically.
c. The relationship of lexical meaning to sentence meaning and syntax.
b. Hypernymy
Hypernymy is the inverse of hyponymy.
It refers to the relationship where a word (the hypernym) is a general category that includes more specific
instances (the hyponyms).
Example:
Hypernym: "Vehicle"
Hyponyms: "Car," "Bicycle," "Bus"
Explanation: "Vehicle" is a general term that encompasses different types of vehicles like "Car," "Bicycle," and
"Bus."
c. Synonymy:
Synonymy refers to words that are pronounced and spelled differently but contain the same meaning.
Example: Happy, joyful, glad
d. Antonymy:
Antonymy refers to words that are related by having the opposite meanings to each other.
There are three types of antonyms: graded antonyms, complementary antonyms, and relational
antonyms.
Example:
dead, alive
long, short
e. Homonymy:
Homonymy refers to the relationship between words that are spelled or pronounced the same way but hold
different meanings.
Example:
bank (of river)
bank (financial institution)
f. Polysemy:
Polysemy refers to a word having two or more related meanings.
Example:
bright (shining)
bright (intelligent)
g. Meronymy:
Meronymy refers to the part-whole relationship between words.
A meronym is a word that denotes a part of something, while the whole to which it belongs is called the
holonym.
Example:
Meronym: "Wheel"
Holonym: "Car"
Explanation: "Wheel" is a part of a "Car." Here, "Wheel" is the meronym, and "Car" is the holonym.
h. Holonymy:
Holonymy is the inverse of meronymy. It describes the relationship where a word (the holonym) represents the
whole, and the word (the meronym) represents a part of that whole.
Example:
Holonym: "Tree"
Meronyms: "Branch," "Leaf," "Root"
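These relations can be queried programmatically; a minimal sketch using NLTK's WordNet interface (assuming the 'wordnet' corpus is downloaded):

from nltk.corpus import wordnet as wn

car = wn.synsets("car")[0]                        # first noun sense of "car"
print(car.hypernyms())                            # more general synsets (hypernymy)
print(car.part_meronyms()[:3])                    # parts of a car (meronymy)
print([lemma.name() for lemma in car.lemmas()])   # synonyms in the same synset
print(wn.synset("good.a.01").lemmas()[0].antonyms())   # antonymy, e.g. good vs. bad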
Q00. Explain three types of referents that complicate the reference resolution problem
Ans:
TYPES OF REFERENTS THAT COMPLICATE THE REFERENCE RESOLUTION PROBLEM:
I) Inferrables:
1. Inferrables are referents that are not explicitly mentioned in the text but are implied or inferred from the
context or world knowledge.
2. These referents require the reader (or the system) to infer the existence of an entity based on the
situation described in the discourse.
3. Example: "John arrived at the restaurant and ordered a pizza. The waiter brought it to the table."
4. Explanation: In this example, "the waiter" is an inferrable referent. The text does not explicitly mention
a waiter before this point, but the existence of a waiter is inferred from the context of being in a
restaurant and ordering food. Reference resolution systems must recognize that "the waiter" refers to an
implied entity involved in the situation.
II) Discontinuous Sets:
1. Discontinuous sets refer to situations where a pronoun or referring expression refers to a set of entities
that are not mentioned together as a single group but are instead mentioned separately or at different
points in the discourse.
2. Resolving such references involves recognizing and grouping these entities across the discourse.
3. Example: "Alice bought a laptop. Bob purchased a tablet. They decided to compare their new gadgets."
4. Explanation: The pronoun "They" refers to a discontinuous set consisting of both Alice and Bob.
Similarly, "their new gadgets" refers to the set that includes both the laptop and the tablet. The entities in
the set were mentioned separately, and the reference resolution system must group these discontinuous
mentions to correctly understand the referent.
III) Generics:
1. Generics are referents that refer to a general class or category of entities rather than specific instances.
2. Resolving generics can be complex because they do not refer to a particular entity or group of entities in
the text, but rather to an entire category or concept.
3. Example: "Cats are great pets because they are independent."
4. Explanation: In this sentence, "Cats" is a generic referent that refers to the entire category of cats, not
any specific cat. Similarly, "they" refers to the general concept of cats. Generics complicate reference
resolution because they involve identifying that the referent is not a specific entity but rather a broader
category.
Q20. What do you mean by word sense disambiguation (WSD)? Discuss the dictionary-based approach for WSD. Ans:
WSD:
1. WSD stands for Word Sense Disambiguation.
2. Words have different meanings based on the context of its usage in the sentence.
3. In human languages, words can be ambiguous too because many words can be interpreted in multiple ways
depending upon the context of their occurrence.
4. Word sense disambiguation (WSD), in natural language processing (NLP), may be defined as the ability to
determine which meaning of a word is activated by the use of the word in a particular context.
5. Lexical ambiguity, syntactic or semantic, is one of the very first problems that any NLP system faces.
6. Part-of-speech (POS) taggers with a high level of accuracy can resolve a word's syntactic ambiguity.
7. On the other hand, the problem of resolving semantic ambiguity is called word sense disambiguation.
8. Resolving semantic ambiguity is harder than resolving syntactic ambiguity.
9. For example, consider two examples of the distinct senses that exist for the word "bass":
a. I can hear bass sound.
b. He likes to eat grilled bass.
10. Each occurrence of the word bass clearly denotes a distinct meaning.
11. In the first sentence it means frequency, and in the second it means a fish.
12. Hence, if it were disambiguated by WSD, the correct meanings could be assigned to the above sentences as
follows:
a. I can hear bass/frequency sound.
b. He likes to eat grilled bass/fish.
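The classic dictionary-based approach is the Lesk algorithm, which selects the sense whose dictionary gloss overlaps most with the surrounding context words. A minimal sketch using NLTK's simplified Lesk implementation (assuming the 'wordnet' corpus is available; the chosen sense may not always match intuition):

from nltk.wsd import lesk

sent1 = "I can hear bass sound".split()
sent2 = "he likes to eat grilled bass".split()

# lesk() compares the context words with the glosses of each WordNet sense of "bass".
sense1 = lesk(sent1, "bass")
sense2 = lesk(sent2, "bass", pos="n")
print(sense1, "-", sense1.definition())
print(sense2, "-", sense2.definition())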
Ans:
DIFFERENT STEPS IN TEXT PROCESSING FOR INFORMATION RETRIEVAL:
1. Document Collection:
a. Gather and compile the collection of text documents that will be indexed and searched.
b. These documents can be web pages, articles, books, or any textual data source.
2. Text Pre-processing:
a. Lexical Analysis:
• Lexical analysis is also known as tokenization.
• It involves breaking the text into individual words or tokens.
• This step is essential for creating a structured representation of the text data.
b. Elimination of Stop Words:
• Stop words are common words such as "the," "and," "in," etc., that occur frequently in a language
but often carry little or no semantic meaning.
• Removing stop words from the text helps reduce noise and improve the efficiency of information
retrieval.
c. Stemming:
• Stemming is the process of reducing words to their root or base form.
• For example, stemming would convert words like "running," "ran," and "runs" to the common base
form "run."
• Stemming helps in capturing variations of words and reduces the vocabulary size.
3. Term Indexing:
a. Create an Inverted Index:
This data structure maps terms (words) to the documents in which they appear.
It stores the term frequency (how often a term appears in a document) and other relevant information.
b. Term Weighting:
Assign weights to terms based on their importance in documents and across the entire collection.
Common techniques include TF-IDF (Term Frequency-Inverse Document Frequency).
4. Document Representation:
a. Each document is represented as a vector in a high-dimensional space, where each
dimension corresponds to a unique term in the collection.
b. The values in the vector may indicate the presence, absence, or importance of each
term in the document.
5. Query Processing:
i. Tokenize and pre-process user queries in the same way as documents.
ii. Rank documents based on their relevance to the query.
iii. Common ranking models include vector space models and probabilistic models like BM25.
6. Ranking and Retrieval:
i. Retrieve the top-ranked documents that match the user's query.
ii. Typically, retrieval is done using a scoring mechanism that ranks documents based on
their similarity to the query.
7. Results Presentation:
i. Present the retrieved documents to the user in a readable and relevant format, such as
snippets, titles, or summaries.
ii. Implement user interface components for browsing and interacting with search results.
8. Performance Evaluation:
a. Assess the performance of the IR system using metrics like precision, recall, F1-score, and Mean
Average Precision (MAP) to measure retrieval quality.
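A minimal end-to-end sketch of this pipeline with scikit-learn, using TF-IDF weighting in a vector space model and cosine similarity for ranking; the three toy documents and the query are illustrative assumptions.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["The cat sat on the mat.",
        "Dogs and cats make good pets.",
        "Stock markets fell sharply today."]

# Tokenization, stop-word removal, and TF-IDF term weighting in one step.
vectorizer = TfidfVectorizer(stop_words="english")
doc_vectors = vectorizer.fit_transform(docs)       # term-document matrix (document representation)

query = "cats as pets"
query_vector = vectorizer.transform([query])       # the query is processed the same way
scores = cosine_similarity(query_vector, doc_vectors)[0]
ranked = scores.argsort()[::-1]                    # ranking and retrieval
print([(docs[i], round(float(scores[i]), 3)) for i in ranked])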
Q24. Explain QUESTION ANSWER SYSTEM: Ans
Working:
1. Seed Data:
• Start with a small set of labeled data.
• This could be a set of sentences or words with their correct annotations, such as part-of-speech
tags or word senses.
• This is called the seed data.
2. Train an Initial Model:
• Train an initial supervised model using the seed data.
• This model serves as the baseline.
3. Apply the Model:
• Use the initial model to label a larger pool of unlabeled data.
• This may result in some labeling errors.
4. Generate Heuristics:
• Analyze the discrepancies between the model's predictions and the true labels in the newly labeled data.
• Identify patterns, rules, or heuristics that can be used to improve the model's accuracy.
• For example, you might discover that certain word patterns are strong indicators of a specific part of
speech.
5. Augment the Seed Data:
• Select a subset of the newly labeled data that the model confidently predicts correctly based on the
discovered heuristics.
• Add these examples to the seed data.
6. Iterate:
• Repeat the process by training a new model using the augmented seed data.
• Then, apply this model to label more unlabeled data, generate heuristics, and augment the seed
data again.
• Continue iterating until the model's performance reaches a satisfactory level or until a stopping criterion is
met.
Limitations:
1. Risk of propagating labeling errors and the need for careful heuristics design.
2. Quality of the seed data and the initial model can significantly impact the overall performance of the
bootstrapping process.
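A minimal self-training sketch of this loop using scikit-learn on a toy word-sense task; the seed examples, features, and confidence threshold are all illustrative assumptions.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

seed_texts = ["deposit money at the bank", "the river bank was muddy"]
seed_labels = ["finance", "river"]
unlabeled = ["bank loan interest rates", "fishing on the bank of the river",
             "the bank approved the loan", "grass covered the bank"]

vec = CountVectorizer()
X_seed = vec.fit_transform(seed_texts)   # vocabulary stays fixed to the seed data for simplicity

for iteration in range(3):
    model = LogisticRegression(max_iter=1000).fit(X_seed, seed_labels)
    if not unlabeled:
        break
    probs = model.predict_proba(vec.transform(unlabeled))
    confident = [i for i in range(len(unlabeled)) if probs[i].max() > 0.6]
    # Move confidently labeled examples from the unlabeled pool into the seed set.
    for i in confident:
        seed_texts.append(unlabeled[i])
        seed_labels.append(model.classes_[probs[i].argmax()])
    unlabeled = [u for i, u in enumerate(unlabeled) if i not in confident]
    X_seed = vec.transform(seed_texts)
    print(f"iteration {iteration + 1}: {len(seed_texts)} labeled examples")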
Q26. Describe in detail the Centering Algorithm for reference resolution. Ans:
CENTERING ALGORITHM:
1. The Centering Algorithm is a theory and algorithm developed to model the coherence of discourse and
to resolve references, particularly pronouns and other anaphoric expressions.
2. The algorithm is part of the broader Centering Theory, which aims to explain how speakers maintain
coherence in a conversation or text by focusing attention on certain entities, referred to as "centers."
3. The Centering Algorithm formalizes the process of resolving references within a discourse. Here's how
the algorithm typically works:
a. Step 1: Identify Entities in the Utterance
For each utterance in the discourse, identify the set of entities mentioned. These entities will form the forward-
looking centers (Cf) for that utterance.
b. Step 2: Rank the Forward-Looking Centers (Cf)
Rank the forward-looking centers according to their salience. Salience can be determined by syntactic roles
(e.g., subjects are more salient than objects) or semantic factors (e.g. proper nouns may be more salient than
pronouns).
c. Step 3: Determine the Backward-Looking Center (Cb)
Determine the backward-looking center for the current utterance by selecting the entity from the previous
utterance that connects best with the current utterance. The Cb is typically the most prominent entity that
maintains coherence with the previous discourse.
d. Step 4: Calculate Transitions
Determine the type of transition between the previous and current utterance based on the relationship between
the Cb and the preferred center (Cp). This involves comparing the Cb of the current utterance with the Cb and
Cp of the previous utterance.
e. Step 5: Resolve References
Use the transitions and the rankings to resolve pronouns and other anaphoric references. The preferred center
(Cp) is often the most likely candidate for pronoun resolution.