NLP Unit Test 2

1. Compare top-down and bottom-up parsing.

2. What do you mean by Synset? Give an example.


A "Synset" is a term used in linguistics and natural language processing to refer to a set
of words or phrases that are synonymous or semantically related. In other words, a
Synset groups together words that have similar meanings.
For example, consider the words "car," "automobile," and "vehicle." These words are
related in meaning because they all refer to a type of transportation. In a Synset, these
words would be grouped together. In the context of computational linguistics and
natural language processing, Synsets are often used in lexical databases like WordNet.
WordNet, for instance, is a large lexical database of English that groups words into sets
of synonyms called synsets. Each synset is linked to other synsets by means of semantic
relationships.
So, for the words "car," "automobile," and "vehicle," there would be a synset that
includes all of them, indicating their semantic relationship.
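For illustration, synsets can be inspected programmatically through NLTK's WordNet interface. This is only a minimal sketch, assuming the nltk package and its WordNet corpus are installed:

    # Minimal sketch using NLTK's WordNet interface
    # (assumes nltk is installed and nltk.download('wordnet') has been run).
    from nltk.corpus import wordnet as wn

    for syn in wn.synsets('car'):
        # Each synset groups synonymous lemmas and carries a gloss (definition).
        print(syn.name(), syn.lemma_names(), '-', syn.definition())

    # One of the noun synsets groups 'car', 'auto', 'automobile', 'machine', 'motorcar'.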

3. How is the thesaurus-based approach used to find word similarity?


The thesaurus-based approach to word similarity relies on the idea that words with similar
meanings are listed close together in a thesaurus, a lexical resource that groups words by
their semantic relationships.
In this approach, words are represented as nodes in a graph, and edges between nodes
represent the strength of their semantic relationship. The distance or path length
between two nodes in this graph can be used to measure the similarity between the
corresponding words. Shorter paths indicate higher similarity.
For instance, let's consider the words "happy" and "joyful". In a thesaurus, they would
likely be listed as synonyms or near-synonyms. Therefore, the distance between their
nodes in the graph would be short, indicating a high level of similarity.
This approach is useful in various natural language processing tasks such as information
retrieval, text summarization, and sentiment analysis. By quantifying word similarity,
it allows algorithms to better understand the context and meaning of text.
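A rough sketch of this idea is shown below, using WordNet path lengths through NLTK as a stand-in for the thesaurus graph (an assumption: noun synsets are used because path similarity is defined over the hypernym hierarchy):

    # Path-based similarity over WordNet's hypernym graph: shorter paths give
    # higher scores (path_similarity = 1 / (shortest_path_length + 1)).
    from nltk.corpus import wordnet as wn

    dog = wn.synset('dog.n.01')
    cat = wn.synset('cat.n.01')
    car = wn.synset('car.n.01')

    print(dog.path_similarity(cat))   # relatively high: the synsets are close in the hierarchy
    print(dog.path_similarity(car))   # lower: the connecting path is much longer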
4. For the given concept graph, find Sim(coinage, money) and Sim(coinage, Budget) using the
Resnik, Lin, and Jiang-Conrath (JC) methods.

5. Explain, with examples, the relationships between word senses.

6. Explain with suitable examples the following relationships between word meanings:
homonymy, polysemy, synonymy, antonymy.

• Homonymy: Words with the same spelling or form but entirely different, unconnected meanings, e.g., "bat" (animal / sports equipment), "bank" (riverbank / financial institution).
• Hyponymy: The relationship between a generic term and its specific instances; the generic term is the hypernym, and the instances are its hyponyms, e.g., fruit (hypernym) and apple, banana, orange (hyponyms).
• Polysemy: A single word or phrase with several different but related meanings, i.e., the same spelling covering various related senses, e.g., "mouse" (referring to both a small rodent and a computer input device).
• Synonymy: The relationship between two lexical items that have different forms but express the same or a very similar meaning, e.g., "happy" and "joyful".
• Antonymy: The relationship between two lexical items whose meanings are opposed, symmetric with respect to some semantic axis, e.g., "hot" and "cold" (opposites on the temperature scale).
• Meronymy: The part-whole relationship, in which one term denotes a component or member of another, e.g., wheel (part of) car.
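Several of these relations are encoded directly in WordNet and can be queried with NLTK; a small sketch, assuming nltk and its WordNet corpus are available:

    from nltk.corpus import wordnet as wn

    # Hyponymy / hypernymy: 'apple' is a kind of (edible) fruit.
    print(wn.synset('apple.n.01').hypernyms())

    # Antonymy is stored on lemmas rather than on synsets: hot <-> cold.
    print(wn.synset('hot.a.01').lemmas()[0].antonyms())

    # Meronymy (part-whole): parts of a car.
    print(wn.synset('car.n.01').part_meronyms())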

7. List and explain the steps in text processing for Information Retrieval.


Text processing in Information Retrieval involves several key steps to prepare and analyze text
data for effective retrieval. Here are the main steps along with brief explanations:
• Tokenization: Breaking a text into individual units, or tokens, which are typically words or
punctuation marks. Example: The sentence "Chatbots are fascinating!" would be tokenized into
["Chatbots", "are", "fascinating", "!"].
• Lowercasing: Convert all tokens to lowercase to ensure case insensitivity. This prevents
"Chatbots" and "chatbots" from being treated as different terms. Example: "Chatbots" becomes
"chatbots".
• Stopword Removal: Remove common and less informative words like "the", "is", and "and", which
occur frequently but contribute little to the meaning. Example: "The quick brown fox jumps over
the lazy dog" might become "quick brown fox jumps lazy dog".
• Stemming or Lemmatization: Reduce words to their base or root form (stem) to capture the core
meaning; lemmatization is a more controlled approach that considers the context and part of
speech. Example: "running" is stemmed to "run", and lemmatization maps "ran" to "run".
• Normalization: Additional cleaning such as removing special characters, numbers, or specific
symbols. Example: "Let's meet at 3:30 pm!" might become "Lets meet at pm".
• Term Frequency (TF) Calculation: Calculate how often each term occurs in a document. This
helps in understanding the importance of terms within a document. Example: In the sentence
"Chatbots are fascinating, and chatbots are useful.", the term "chatbots" has a TF of 2.
• Inverse Document Frequency (IDF) Calculation: Assess the rarity of a term across all
documents in a corpus. Rare terms are often more informative. Example: "Chatbots" might
have a low IDF if it appears frequently in many documents.
• Vectorization: Represent each document as a numerical vector in a high-dimensional space.
This can be done using techniques like Bag-of-Words or Word Embeddings. Example: The
sentence "Chatbots are fascinating!" might be represented as [1, 1, 1, 0, ...] in a Bag-of-Words
model.
• Indexing: Create an index or data structure that allows for efficient querying and retrieval of
documents based on their vector representations. Example: Using an inverted index to map
terms to the documents they appear in.
• Query Processing: Apply similar preprocessing steps to user queries to prepare them for
comparison with indexed documents. Example: If a user enters "Tell me about chatbots", this
would go through tokenization, stopword removal, etc.
These steps collectively form the preprocessing pipeline for text data in Information Retrieval
systems, enabling efficient storage, retrieval, and ranking of relevant documents based on user
queries.
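Several of these steps (tokenization, lowercasing, stopword removal, TF and IDF weighting, vectorization) can be prototyped together with scikit-learn's TfidfVectorizer; a minimal sketch, assuming a recent scikit-learn is installed:

    # TfidfVectorizer bundles tokenization, lowercasing, stopword removal and
    # TF-IDF weighting; the resulting matrix is the vectorized document collection.
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = [
        "Chatbots are fascinating!",
        "Chatbots are fascinating, and chatbots are useful.",
        "The quick brown fox jumps over the lazy dog.",
    ]

    vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
    doc_vectors = vectorizer.fit_transform(docs)      # one TF-IDF vector per document

    print(vectorizer.get_feature_names_out())         # the indexed vocabulary
    print(doc_vectors.toarray().round(2))             # document-term weights

    # A query goes through the same pipeline before being compared with documents:
    query_vector = vectorizer.transform(["Tell me about chatbots"])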

8. Explain the Yarowsky bootstrapping approach to semi-supervised learning.


The Yarowsky Bootstrapping Algorithm is a semi-supervised learning method used in
natural language processing. It iteratively refines a model's understanding of word
senses using a small initial labeled dataset and a larger, unlabeled dataset. The process
starts with a seed set of labeled examples, where each word is tagged with its sense.
The model then uses this initial information to predict senses for the unlabeled data.
These predictions are incorporated into the training set, expanding the labeled dataset.
This augmented dataset is then used to retrain the model, which produces more
accurate predictions. This cycle repeats until a stopping criterion is met. The strength
of the Yarowsky approach lies in its ability to leverage a small amount of labeled data
to make predictions on a larger pool of unlabeled data, gradually improving accuracy
through iterations.
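A toy sketch of the bootstrapping loop is shown below; the seed examples, the bag-of-words features, and the 0.9 confidence threshold are illustrative assumptions, and a Naive Bayes classifier stands in for whichever supervised learner is used:

    # Illustrative bootstrapping loop: train on seeds, label confident unlabeled
    # examples, add them to the training set, and retrain.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    labeled_X = ["the river bank flooded after the rain",
                 "the bank approved the loan application"]       # seed contexts
    labeled_y = ["bank/river", "bank/finance"]
    unlabeled = ["deposit money at the bank",
                 "fishing from the muddy river bank",
                 "the bank raised interest rates on the loan"]

    for _ in range(3):                                  # a few bootstrapping rounds
        vec = CountVectorizer()
        clf = MultinomialNB().fit(vec.fit_transform(labeled_X), labeled_y)
        if not unlabeled:
            break
        probs = clf.predict_proba(vec.transform(unlabeled))
        still_unlabeled = []
        for sent, row in zip(unlabeled, probs):
            if row.max() >= 0.9:                        # confident prediction: promote to training set
                labeled_X.append(sent)
                labeled_y.append(clf.classes_[row.argmax()])
            else:
                still_unlabeled.append(sent)            # stays unlabeled for the next round
        unlabeled = still_unlabeled

    print(clf.predict(vec.transform(["she opened a savings account at the bank"])))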

9. For the given corpus:

<s> Martin Justin can watch Will </s>
<s> Martin Justin can watch Will </s>
<s> Spot will watch Martin </s>
<s> Will Justin spot Martin </s>
<s> Martin will pat Spot </s>

N: Noun [Martin, Justin, Will, Spot, Pat]
M: Modal verb [can, will]
V: Verb [watch, spot, pat]

Create the transition matrix and the emission probability matrix. Then, for the statement
"Justin will spot Will", apply the Hidden Markov Model and do POS tagging.
Transition Matrix:
Given the states:
• N (Noun)
• M (Modal verb)
• V (Verb)
• E (End of sentence)
And the transition counts from the corpus:

      N   M   V   E
  N   0   1   2   2
  M   1   0   2   2
  V   1   1   1   2
  E   0   0   0   0

Emission Probability Matrix:

Given the words:
• Martin, Justin, can, watch, Will, Spot, will, pat, Pat
The emission counts are obtained from the corpus by counting how often each word occurs under
each tag, and normalizing per tag to get emission probabilities.

Applying the Hidden Markov Model for POS Tagging:
Given the statement: "Justin will spot Will"
1. Initialization:
• Start with initial probabilities based on the corpus frequencies.
2. Forward Algorithm:
• Calculate forward probabilities for each state at each word position.
3. Backward Algorithm:
• Calculate backward probabilities for each state at each word position.
4. Combine Probabilities:
• Combine forward and backward probabilities to get the state probabilities at
each word position.
5. Decoding:
• For each word, select the state with the highest probability as the POS tag.
The POS tagging for the statement "Justin will spot Will" would be:
• Justin (Noun), will (Modal verb), spot (Verb), Will (Noun)
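The same decoding can also be done with the Viterbi algorithm, which finds the single most likely tag sequence. Below is a generic sketch; the probability tables are placeholder values for illustration, not probabilities derived from the count matrices above:

    # Generic Viterbi decoder for an HMM tagger; `start_p`, `trans_p`, `emit_p`
    # below are placeholder probabilities, not values estimated from the corpus.
    def viterbi(words, states, start_p, trans_p, emit_p):
        # V[t][s] = (best probability of reaching state s at position t, best previous state)
        V = [{s: (start_p[s] * emit_p[s].get(words[0], 0.0), None) for s in states}]
        for t in range(1, len(words)):
            V.append({})
            for s in states:
                prev = max(states, key=lambda p: V[t - 1][p][0] * trans_p[p][s])
                V[t][s] = (V[t - 1][prev][0] * trans_p[prev][s]
                           * emit_p[s].get(words[t], 0.0), prev)
        # Backtrack from the most probable final state.
        state = max(states, key=lambda s: V[-1][s][0])
        tags = [state]
        for t in range(len(words) - 1, 0, -1):
            state = V[t][state][1]
            tags.insert(0, state)
        return tags

    states = ["N", "M", "V"]
    start_p = {"N": 0.8, "M": 0.1, "V": 0.1}                      # placeholder values
    trans_p = {"N": {"N": 0.1, "M": 0.4, "V": 0.5},
               "M": {"N": 0.2, "M": 0.1, "V": 0.7},
               "V": {"N": 0.8, "M": 0.1, "V": 0.1}}
    emit_p = {"N": {"justin": 0.2, "will": 0.2},                  # placeholder values
              "M": {"will": 0.8},
              "V": {"spot": 0.7}}
    print(viterbi("justin will spot will".split(), states, start_p, trans_p, emit_p))
    # -> ['N', 'M', 'V', 'N'], i.e. Noun, Modal verb, Verb, Noun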

10. For the given grammar, parse the following statement using the CYK (CKY) algorithm:
a. "Book the meal flight"
Rules:

10. Explain Text summarization in detail.


Text summarization is the process of condensing a longer piece of text while
preserving its core meaning and information. It's a crucial task in natural language
processing with applications in information retrieval, document categorization, and
more. There are two main approaches to text summarization:

Extractive Summarization:

In extractive summarization, sentences or phrases from the original text are selected
and combined to form the summary. This approach directly uses parts of the original
text. It's similar to how a person might extract key sentences when creating a
summary. Techniques for extractive summarization include:
TF-IDF (Term Frequency-Inverse Document Frequency): Ranks sentences based on
their importance in the context of the entire document.
Graph-Based Methods: Represent the text as a graph and use algorithms like
PageRank to find the most important sentences.
Machine Learning Approaches: Train models to predict the importance of sentences.
Abstractive Summarization:

Abstractive summarization involves generating new sentences that capture the main
points of the original text, potentially using different words and structures. It requires
a deeper understanding of the text and is more similar to how humans create
summaries. Techniques for abstractive summarization include:
Sequence-to-Sequence Models (e.g., using LSTM or Transformer architectures):
Train models to generate summaries by learning to map input sequences to output
sequences.
Pre-trained Transformer Models (e.g., GPT-style decoders, BART, T5): Fine-tune or prompt
large-scale pre-trained language models to generate abstractive summaries.
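A minimal sketch of the extractive approach is given below: sentences are scored by their average TF-IDF weight and the top k are kept. The scoring rule and k are illustrative choices, not a specific published method, and scikit-learn is assumed to be installed:

    # Extractive summarization sketch: rank sentences by average TF-IDF weight.
    from sklearn.feature_extraction.text import TfidfVectorizer

    def extractive_summary(sentences, k=2):
        tfidf = TfidfVectorizer(stop_words="english").fit_transform(sentences)
        scores = tfidf.mean(axis=1).A1                  # average TF-IDF weight per sentence
        top = sorted(sorted(range(len(sentences)), key=lambda i: -scores[i])[:k])
        return " ".join(sentences[i] for i in top)      # keep the original sentence order

    sentences = [
        "Text summarization condenses a document while preserving its core ideas.",
        "Extractive methods select important sentences from the original text.",
        "Abstractive methods generate new sentences that paraphrase the content.",
    ]
    print(extractive_summary(sentences, k=2))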
11. Explain Maximum Entropy Model for POS Tagging

The Maximum Entropy Model is a probabilistic model used in natural language processing for
tasks like part-of-speech (POS) tagging. It is based on the principle of maximum entropy,
which states that, among all models consistent with a given set of feature constraints, the
best model is the one with the highest entropy, i.e., the one that makes the fewest
additional assumptions.

Here's how the Maximum Entropy Model works for POS tagging:

Feature Selection:

Identify relevant features that can help predict the correct part-of-speech tag for a
given word in context. Features may include the word itself, surrounding words,
capitalization, suffixes, etc.
Define Feature Functions:

Associate each feature with a function that maps an input (e.g., a word and its context)
to a binary value indicating whether the feature is present or not.
Collect Training Data:

Gather a labeled dataset where each word is tagged with its corresponding part-of-
speech.
Training:

Use an optimization algorithm (like Generalized Iterative Scaling) to find the model
parameters that maximize the likelihood of the training data, subject to the feature
function constraints.
Prediction:

Given a new sentence, calculate the probability distribution over possible tags for
each word using the trained model. The tag with the highest probability is assigned to
the word.
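A toy sketch follows; scikit-learn's multinomial logistic regression stands in for the MaxEnt model (it optimizes the same objective, though with L-BFGS rather than Generalized Iterative Scaling), and the features and the tiny training set are illustrative assumptions:

    # Toy MaxEnt (multinomial logistic regression) POS tagger sketch.
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression

    def features(words, i):
        # Feature functions: the word, capitalization, suffix, previous word.
        return {
            "word": words[i].lower(),
            "is_capitalized": words[i][0].isupper(),
            "suffix2": words[i][-2:].lower(),
            "prev_word": words[i - 1].lower() if i > 0 else "<s>",
        }

    train_sents = [(["John", "reads", "books"], ["NOUN", "VERB", "NOUN"]),
                   (["Mary", "likes", "music"], ["NOUN", "VERB", "NOUN"])]

    X, y = [], []
    for words, tags in train_sents:
        for i, tag in enumerate(tags):
            X.append(features(words, i))
            y.append(tag)

    vec = DictVectorizer()
    clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X), y)

    test = ["Sneha", "writes", "code"]
    print(clf.predict(vec.transform([features(test, i) for i in range(len(test))])))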
12. Explain the Hobbs algorithm for pronoun resolution.

13. Write a note on WordNet.


WordNet is a lexical database of the English language, organized in a hierarchical
structure of synsets, which are sets of synonyms representing distinct concepts. It links
words based on their meanings, providing a rich resource for natural language
understanding. Each synset contains words that can be interchangeably used in
certain contexts, aiding tasks like semantic analysis, information retrieval, and
machine learning. WordNet's extensive coverage and detailed relationships between
words make it a valuable tool in various fields, from linguistics to artificial intelligence,
enabling deeper insights into the complexities of language. For instance, in WordNet,
the synset {car, automobile, motorcar} indicates that these words are synonymous,
and "car" is a hyponym of "vehicle".

14. Explain how HMM is used for sequence labelling


Hidden Markov Models (HMMs) are used for sequence labeling tasks in natural
language processing (NLP) and other fields where sequences of data need to be
classified or annotated. Here's an explanation of how HMMs are used for sequence
labeling:

Defining States and Observations:

In the context of sequence labeling, the states represent the different labels or tags,
while the observations correspond to the elements in the sequence that we want to
label (e.g., words in a sentence).
Transition Probabilities:
For each pair of adjacent states, define the transition probabilities. These probabilities
represent the likelihood of transitioning from one state to another. In sequence
labeling, they capture the likelihood of transitioning from one label to another.
Emission Probabilities:

Define the probabilities of emitting each observation from each state. These
probabilities represent the likelihood of observing a particular element given a
specific label. In NLP, this is often related to word-tag probabilities.
Initialization Probabilities:

Specify the initial probabilities of starting in each state. This represents the likelihood
of starting the sequence with a particular label.
The Forward Algorithm:

Given a sequence of observations (e.g., a sentence), use the forward algorithm to compute the
likelihood of the sequence occurring under the model. This involves recursively calculating
probabilities while considering all possible state sequences.
The Viterbi Algorithm:

Use the Viterbi algorithm to find the most likely sequence of states given the
observations. This algorithm efficiently finds the best sequence by considering both
the transition probabilities and the emission probabilities.
Decoding:

Once the Viterbi algorithm has been applied, the HMM produces a sequence of labels
that best matches the input sequence.
Output:

The output of the HMM is the sequence of labels that correspond to the input
sequence. These labels provide the desired annotation or classification for each
element in the input sequence.
In NLP, HMMs have been used for various sequence labeling tasks, including part-of-
speech tagging, named entity recognition, and chunking. They are particularly
effective when there are strong dependencies between adjacent elements in a
sequence, making them a valuable tool for understanding and processing natural
language text.
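A compact sketch of the forward step described above is given below; the probability tables would be supplied in the same dictionary format as in the Viterbi sketch under question 9, and are placeholders rather than values from any real tagger:

    # Forward algorithm: total probability of an observation sequence under the
    # HMM, summing over all possible hidden state sequences.
    def forward(observations, states, start_p, trans_p, emit_p):
        alpha = [{s: start_p[s] * emit_p[s].get(observations[0], 0.0) for s in states}]
        for t in range(1, len(observations)):
            alpha.append({
                s: emit_p[s].get(observations[t], 0.0)
                   * sum(alpha[t - 1][p] * trans_p[p][s] for p in states)
                for s in states
            })
        return sum(alpha[-1][s] for s in states)   # likelihood of the whole sequence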
15. Construct a parse tree for the following CFG using the given rules:
a. "The man read the book"
16. Explain Discourse reference resolution
Discourse Reference Resolution, also known as anaphora resolution, is a crucial task
in natural language processing (NLP) that involves identifying the entities or
expressions to which pronouns, definite noun phrases, or other referring expressions
refer within a text or conversation.
Consider this example: John went to the store. He bought a book.
In this example, "He" refers to "John", and "a book" refers to a book that John bought.
Discourse Reference Resolution involves identifying reference phrases, determining
potential antecedents, scoring antecedents, selecting the best antecedent, forming
coreference chains, and handling ambiguity.
Discourse Reference Resolution is essential for understanding the flow of information
in a text, especially in more complex documents or dialogues. It is applied in a wide
range of NLP applications, including machine translation, text summarization,
question answering, and more. Accurate resolution of references greatly enhances the
ability of machines to understand and generate coherent and contextually appropriate
text.

17. What do you mean by word sense disambiguation (WSD)? Explain machine learning-based
methods.

Word Sense Disambiguation (WSD) is a natural language processing task that aims to determine the
correct meaning or sense of a word within a given context. Many words in natural language have
multiple meanings (polysemy), and identifying the correct sense is crucial for tasks like machine
translation, information retrieval, and sentiment analysis.
Machine learning-based methods for WSD leverage algorithms that learn patterns from annotated
data to make predictions about word senses. Here's an explanation of how machine learning is applied
to WSD:
1. Feature Extraction: Start by representing words and their context in a way that can be used
for machine learning. This often involves creating feature vectors where each dimension
corresponds to a specific linguistic feature (e.g., surrounding words, part-of-speech tags,
syntactic structures).
2. Annotated Data: Gather a dataset of annotated examples where each instance includes a
word, its context, and the correct sense label. For instance, in the sentence "I saw a bat",
the word "bat" could be labeled with the sense referring either to a flying mammal or to a
piece of sports equipment, depending on the surrounding context.
3. Training Phase: Use the annotated data to train a machine learning model. Common
algorithms used for WSD include Support Vector Machines (SVMs), Decision Trees, Random
Forests, and Neural Networks.
4. Feature Selection: Identify the most relevant features that contribute to distinguishing
between different word senses. This can involve techniques like Information Gain or L1
Regularization.
5. Model Training: The machine learning algorithm learns the relationships between the
features and the correct word senses in the training data.
6. Prediction: Given a new instance with a word and its context, the trained model predicts the
most likely sense for that word in that context.
7. Evaluation: The performance of the WSD system is assessed using evaluation metrics like
accuracy, precision, recall, and F1-score on a separate test set.
8. Fine-tuning and Optimization: Depending on the results, the model may be fine-tuned, or
different algorithms or features may be explored to improve performance.
9. Application: The trained model can be used for disambiguating words in new, unseen texts.
This is particularly useful in various NLP applications, such as machine translation, information
retrieval, and sentiment analysis.
Machine learning-based approaches for WSD can be effective, especially when they are trained on
large and diverse datasets. However, they often require substantial amounts of annotated data for
training, and the quality of the features used can greatly impact the performance of the model.
Additionally, pre-trained word embeddings or contextual embeddings from models like BERT have
shown promise in improving WSD performance.
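A toy sketch of the supervised pipeline above for the word "bat" follows; the four training sentences, the TF-IDF context features, and the linear SVM are illustrative assumptions (scikit-learn is assumed to be installed):

    # Tiny supervised WSD example: context sentences as features, senses as labels.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    contexts = [
        "the bat flew out of the cave at night",           # annotated training data
        "a vampire bat feeds on insects and blood",
        "he swung the bat and hit a home run",
        "the cricket bat is made of willow",
    ]
    senses = ["bat/animal", "bat/animal", "bat/sports", "bat/sports"]

    model = make_pipeline(TfidfVectorizer(), LinearSVC())   # feature extraction + classifier
    model.fit(contexts, senses)                             # training phase

    # Prediction on an unseen context; likely 'bat/sports', since "hit" only
    # occurs in sports contexts in this tiny training set.
    print(model.predict(["the bat hit the ball over the fence"]))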
