NLP Unit Test 2
      N   M   V   E
N     0   1   2   2
M     1   0   2   2
V     1   1   1   2
E     0   0   0   0
10. For a given grammar, parse the following statement using the CYK (CKY) algorithm:
a. “Book the meal flight”
Rules:
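The rule set for this question did not survive here, so the following is only an illustrative sketch: a minimal CYK recognizer in Python over an assumed CNF grammar (loosely modeled on the small flight-domain grammar used in standard textbooks). The rules and the final check for S are assumptions for demonstration only.

```python
from itertools import product

# Assumed toy grammar in Chomsky Normal Form (not the original question's rules).
lexical = {
    "book":   {"S", "VP", "Verb", "Nominal", "Noun"},
    "the":    {"Det"},
    "meal":   {"Nominal", "Noun"},
    "flight": {"Nominal", "Noun"},
}
binary = {
    ("NP", "VP"):       {"S"},
    ("Verb", "NP"):     {"S", "VP"},
    ("Det", "Nominal"): {"NP"},
    ("Nominal", "Noun"): {"Nominal"},
}

def cyk(words):
    n = len(words)
    # table[i][j] holds the set of non-terminals that can span words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(lexical.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for B, C in product(table[i][k], table[k][j]):
                    table[i][j] |= binary.get((B, C), set())
    return table

words = "Book the meal flight".lower().split()
chart = cyk(words)
print("S" in chart[0][len(words)])   # True: the sentence is accepted
```

Because the chart cell spanning the whole sentence contains S, the sentence is accepted; a full parser would additionally store backpointers in each cell so the parse tree can be recovered.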
Extractive Summarization:
In extractive summarization, sentences or phrases from the original text are selected
and combined to form the summary. This approach directly uses parts of the original
text. It's similar to how a person might extract key sentences when creating a
summary. Techniques for extractive summarization include:
TF-IDF (Term Frequency-Inverse Document Frequency): Ranks sentences based on
their importance in the context of the entire document.
Graph-Based Methods: Represent the text as a graph and use algorithms like
PageRank to find the most important sentences.
Machine Learning Approaches: Train models to predict the importance of sentences.
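As a rough illustration of the TF-IDF approach above, here is a minimal extractive-summarization sketch in Python using scikit-learn. The example sentences and the summary length are made up for demonstration; a real system would also handle sentence splitting and redundancy more carefully.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def extractive_summary(sentences, num_sentences=2):
    # Treat each sentence as a "document" for TF-IDF weighting.
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(sentences)
    # Score each sentence by the sum of its term weights.
    scores = np.asarray(tfidf.sum(axis=1)).ravel()
    # Keep the highest-scoring sentences, preserving original order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return [sentences[i] for i in sorted(top)]

sentences = [
    "The new model improves translation quality on low-resource languages.",
    "It was trained on a large multilingual corpus.",
    "The weather in the demo video was sunny.",
    "Evaluation shows a significant BLEU improvement over the baseline.",
]
print(extractive_summary(sentences))
```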
Abstractive Summarization:
Abstractive summarization involves generating new sentences that capture the main
points of the original text, potentially using different words and structures. It requires
a deeper understanding of the text and is more similar to how humans create
summaries. Techniques for abstractive summarization include:
Sequence-to-Sequence Models (e.g., using LSTM or Transformer architectures):
Train models to generate summaries by learning to map input sequences to output
sequences.
Transformer-Based Models (e.g., GPT-3, BERT): Utilize large-scale pre-trained
language models to generate abstractive summaries.
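Below is a hedged sketch of abstractive summarization with a pre-trained sequence-to-sequence model via the Hugging Face transformers library. The specific model name and generation lengths are illustrative choices, not requirements from the notes above.

```python
from transformers import pipeline

# Load a pre-trained summarization model (assumed choice: BART fine-tuned on CNN/DailyMail).
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

text = (
    "Natural language processing enables computers to understand human language. "
    "Recent transformer-based models have dramatically improved performance on "
    "tasks such as translation, question answering, and summarization."
)

result = summarizer(text, max_length=30, min_length=10, do_sample=False)
print(result[0]["summary_text"])
```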
11. Explain Maximum Entropy Model for POS Tagging
The Maximum Entropy Model is a probabilistic model used in natural language
processing for tasks like part-of-speech (POS) tagging. It is based on the principle of
maximum entropy, which states that, among all models consistent with a given set of
constraints, the best model is the one with the highest entropy, i.e., the one that makes
the fewest additional assumptions.
Here's how the Maximum Entropy Model works for POS tagging:
Feature Selection:
Identify relevant features that can help predict the correct part-of-speech tag for a
given word in context. Features may include the word itself, surrounding words,
capitalization, suffixes, etc.
Define Feature Functions:
Associate each feature with a function that maps an input (e.g., a word and its context)
to a binary value indicating whether the feature is present or not.
Collect Training Data:
Gather a labeled dataset where each word is tagged with its corresponding part-of-
speech.
Training:
Use an optimization algorithm (like Generalized Iterative Scaling) to find the model
parameters that maximize the likelihood of the training data, subject to the feature
function constraints.
Prediction:
Given a new sentence, calculate the probability distribution over possible tags for
each word using the trained model. The tag with the highest probability is assigned to
the word.
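To make these steps concrete, here is a minimal sketch of a maximum-entropy tagger built with multinomial logistic regression from scikit-learn (mathematically equivalent to a MaxEnt classifier). The feature template and the tiny training set are illustrative assumptions only.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def features(sentence, i):
    # Simple feature template: the word, its suffix, capitalization, and neighbors.
    word = sentence[i]
    return {
        "word": word.lower(),
        "suffix2": word[-2:].lower(),
        "is_capitalized": word[0].isupper(),
        "prev_word": sentence[i - 1].lower() if i > 0 else "<s>",
        "next_word": sentence[i + 1].lower() if i < len(sentence) - 1 else "</s>",
    }

# Toy annotated data (invented): (sentence, tags) pairs.
train = [
    (["The", "dog", "barks"], ["DET", "NOUN", "VERB"]),
    (["A", "cat", "sleeps"], ["DET", "NOUN", "VERB"]),
    (["Dogs", "bark"], ["NOUN", "VERB"]),
]

X = [features(sent, i) for sent, tags in train for i in range(len(sent))]
y = [tag for _, tags in train for tag in tags]

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X, y)

test = ["The", "cat", "barks"]
print(list(model.predict([features(test, i) for i in range(len(test))])))
```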
12. Explain Hobbs algorithm for pronoun resolution.
A Hidden Markov Model (HMM) assumes a sequence of hidden states that generates a
sequence of observations. In the context of sequence labeling, the states represent the
different labels or tags, while the observations correspond to the elements in the
sequence that we want to label (e.g., words in a sentence).
Transition Probabilities:
For each pair of adjacent states, define the transition probabilities. These probabilities
represent the likelihood of transitioning from one state to another. In sequence
labeling, they capture the likelihood of transitioning from one label to another.
Emission Probabilities:
Define the probabilities of emitting each observation from each state. These
probabilities represent the likelihood of observing a particular element given a
specific label. In NLP, this is often related to word-tag probabilities.
Initialization Probabilities:
Specify the initial probabilities of starting in each state. This represents the likelihood
of starting the sequence with a particular label.
The Viterbi Algorithm:
Use the Viterbi algorithm to find the most likely sequence of states given the
observations. This algorithm efficiently finds the best sequence by considering both
the transition probabilities and the emission probabilities.
Decoding:
Once the Viterbi algorithm has been applied, the HMM produces a sequence of labels
that best matches the input sequence.
Output:
The output of the HMM is the sequence of labels that correspond to the input
sequence. These labels provide the desired annotation or classification for each
element in the input sequence.
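The following is a small sketch of Viterbi decoding for a toy HMM tagger. The states, probabilities, and example sentence are invented purely to illustrate the mechanics described above.

```python
states = ["DET", "NOUN", "VERB"]

# Toy initial, transition, and emission probabilities (made-up values).
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET":  {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1,  "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5,  "NOUN": 0.4, "VERB": 0.1},
}
emit_p = {
    "DET":  {"the": 0.8, "a": 0.2},
    "NOUN": {"dog": 0.5, "cat": 0.4, "bark": 0.1},
    "VERB": {"barks": 0.5, "sleeps": 0.5},
}

def viterbi(obs):
    # V[t][s] = probability of the best path ending in state s at time t.
    V = [{s: start_p[s] * emit_p[s].get(obs[0], 1e-6) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s].get(obs[t], 1e-6), p)
                for p in states
            )
            V[t][s] = prob
            back[t][s] = prev
    # Trace back the highest-probability path.
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path

print(viterbi(["the", "dog", "barks"]))   # ['DET', 'NOUN', 'VERB']
```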
In NLP, HMMs have been used for various sequence labeling tasks, including part-of-
speech tagging, named entity recognition, and chunking. They are particularly
effective when there are strong dependencies between adjacent elements in a
sequence, making them a valuable tool for understanding and processing natural
language text.
15. Construct a parse tree for the following sentence using the given CFG rules:
a. “The man read the book”
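The original rule set is not reproduced here, so the sketch below uses a guessed toy CFG with NLTK, simply to show how such a tree can be built and printed.

```python
import nltk

# Assumed toy grammar; the question's actual CFG rules are not available here.
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det N
    VP -> V NP
    Det -> 'the' | 'The'
    N -> 'man' | 'book'
    V -> 'read'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("The man read the book".split()):
    tree.pretty_print()
```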
16. Explain Discourse reference resolution
Discourse Reference Resolution, also known as anaphora resolution, is a crucial task
in natural language processing (NLP) that involves identifying the entities or
expressions to which pronouns, definite noun phrases, or other referring expressions
refer within a text or conversation.
Consider this example: John went to the store. He bought a book.
In this example, "He" refers to "John", and "a book" refers to a book that John bought.
Discourse Reference Resolution involves identifying reference phrases, determining
potential antecedents, scoring antecedents, selecting the best antecedent, forming
coreference chains, and handling ambiguity.
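As a deliberately simplified illustration of antecedent selection, the sketch below resolves a pronoun to the nearest preceding candidate mention that agrees in gender or number. Real resolvers use syntactic structure, salience scoring, or neural coreference models; the mention list and agreement features here are invented for the example.

```python
# Pronouns mapped to the agreement features they require (toy inventory).
PRONOUN_AGREEMENT = {
    "he": {"masculine"},
    "she": {"feminine"},
    "it": {"neuter"},
    "they": {"plural"},
}

def resolve(pronoun, candidates):
    """candidates: list of (mention, features) pairs in order of appearance."""
    pron_features = PRONOUN_AGREEMENT.get(pronoun.lower(), set())
    # Scan from the most recent mention backwards and pick the first match.
    for mention, features in reversed(candidates):
        if pron_features & features:
            return mention
    return None

candidates = [("John", {"masculine"}), ("the store", {"neuter"})]
print(resolve("He", candidates))   # -> "John"
```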
Discourse Reference Resolution is essential for understanding the flow of information
in a text, especially in more complex documents or dialogues. It is applied in a wide
range of NLP applications, including machine translation, text summarization,
question answering, and more. Accurate resolution of references greatly enhances the
ability of machines to understand and generate coherent and contextually appropriate
text.
17. What do you mean by word sense disambiguation (WSD)? Explain machine learning-based
methods
Word Sense Disambiguation (WSD) is a natural language processing task that aims to determine the
correct meaning or sense of a word within a given context. Many words in natural language have
multiple meanings (polysemy), and identifying the correct sense is crucial for tasks like machine
translation, information retrieval, and sentiment analysis.
Machine learning-based methods for WSD leverage algorithms that learn patterns from annotated
data to make predictions about word senses. Here's an explanation of how machine learning is applied
to WSD:
1. Feature Extraction: Start by representing words and their context in a way that can be used
for machine learning. This often involves creating feature vectors where each dimension
corresponds to a specific linguistic feature (e.g., surrounding words, part-of-speech tags,
syntactic structures).
2. Annotated Data: Gather a dataset of annotated examples where each instance includes a
word, its context, and the correct sense label. For instance, in the sentence "I saw a bat",
the word "bat" could be labeled with a sense referring either to the flying mammal or to the
piece of sports equipment.
3. Training Phase: Use the annotated data to train a machine learning model. Common
algorithms used for WSD include Support Vector Machines (SVMs), Decision Trees, Random
Forests, and Neural Networks.
4. Feature Selection: Identify the most relevant features that contribute to distinguishing
between different word senses. This can involve techniques like Information Gain or L1
Regularization.
5. Model Training: The machine learning algorithm learns the relationships between the
features and the correct word senses in the training data.
6. Prediction: Given a new instance with a word and its context, the trained model predicts the
most likely sense for that word in that context.
7. Evaluation: The performance of the WSD system is assessed using evaluation metrics like
accuracy, precision, recall, and F1-score on a separate test set.
8. Fine-tuning and Optimization: Depending on the results, the model may be fine-tuned, or
different algorithms or features may be explored to improve performance.
9. Application: The trained model can be used for disambiguating words in new, unseen texts.
This is particularly useful in various NLP applications, such as machine translation, information
retrieval, and sentiment analysis.
Machine learning-based approaches for WSD can be effective, especially when they are trained on
large and diverse datasets. However, they often require substantial amounts of annotated data for
training, and the quality of the features used can greatly impact the performance of the model.
Additionally, pre-trained word embeddings or contextual embeddings from models like BERT have
shown promise in improving WSD performance.
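For a concrete (but minimal) illustration of the supervised pipeline above, the sketch below trains a linear SVM on bag-of-context-words features for the ambiguous word "bat". The handful of labeled examples is invented; a real system would need a sense-annotated corpus such as SemCor and richer features or contextual embeddings.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny, invented sense-annotated contexts for the target word "bat".
contexts = [
    "the bat flew out of the cave at night",
    "a bat hung upside down from the branch",
    "he swung the bat and hit the ball",
    "the player bought a new wooden bat",
]
senses = ["animal", "animal", "sports", "sports"]

# Bag-of-context-words features fed into a linear SVM classifier.
model = make_pipeline(CountVectorizer(), LinearSVC())
model.fit(contexts, senses)

# The context word "flew" points to the animal sense here.
print(model.predict(["the bat screeched and flew away"]))
```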