MNLP - Unit-3 (1)
Semantic Parsing
-Presented By,
Dr D. Teja Santosh
Associate Professor, CSE
CVRCE
• Semantic parsing is the task of converting a natural language utterance to a logical form: a machine-understandable representation of its meaning.
• Semantic parsing can thus be understood as extracting the precise meaning of an utterance.
https://github.com/LiBinNLP/GNNSDP
Introduction
• The holy grail of research in language understanding is the identification of a meaning representation
that is detailed enough to allow reasoning systems to make deductions but, at the same time, is general
enough that it can be used across many domains with little to no adaptation.
• It is not clear whether a final, low-level, detailed semantic representation covering various
applications that use some form of language interface can be achieved or whether an ontology can be
created that can capture the various granularities and aspects of meanings that are embodied in such
a variety of applications—none has yet been created.
• Therefore, two compromise approaches have emerged in the natural language processing community
for language understanding.
Contd…
• In the first approach, a specific, rich meaning representation is created for a limited domain for use
by applications that are restricted to that domain, such as air travel reservations, football game
simulations, or querying a geographic database. Systems are then crafted to generate output from
text in this rich, domain-specific meaning representation.
• In the second approach, a related set of intermediate meaning representations is created, going from low-level analysis to a midlevel analysis, and the bigger understanding task is divided into multiple, smaller pieces that are more manageable, such as word sense disambiguation followed by predicate-argument structure recognition.
• By dividing the problem up this way, each intermediate representation is only responsible for
capturing a relatively small component of overall meaning, thereby making the task of defining and
modeling each representation easier.
• So, we treat the world as though it has exactly two types of meaning representations: a domain-dependent, deeper representation and a set of relatively shallow but general-purpose, low-level, and intermediate representations.
• The task of producing the output of the first type is often called deep semantic parsing, and
the task of producing the output of the second type is often called shallow semantic
parsing.
Semantic Interpretation
Meaning Representation
Structural Ambiguity
Word Sense
Entity & Event Resolution
Predicate-Argument Structure
Sub-fields of Semantic Analysis
• The literal meaning of a sentence is derived from the combination of meanings of its
individual words and their syntactic arrangement.
Difference between the "meaning of a sentence" and the
"literal meaning of a sentence"
1. Literal Meaning of a Sentence:
- The literal meaning of a sentence refers to the straightforward, dictionary-based interpretation of the words and structures
used. It involves understanding the denotations of individual words and their syntactic relationships without considering figurative
language, implied meanings, or contextual nuances. The literal meaning is what you would find in a standard dictionary definition.
2. Meaning of a Sentence:
- The meaning of a sentence is a broader concept that encompasses not only the literal meaning but also includes additional
layers of interpretation. It takes into account context, speaker's intentions, implied meanings, and any figurative language or
idiomatic expressions used. The meaning of a sentence involves considering the overall message or information conveyed, taking
into consideration various linguistic and pragmatic factors.
Example:
Consider the sentence: "It's raining cats and dogs."
- Literal Meaning: The literal meaning would be an interpretation based on the individual words. However, taking it literally would not accurately
represent the intended message because rain doesn't actually consist of cats and dogs.
- Meaning of the Sentence: The intended meaning goes beyond the literal interpretation. In this case, it's an idiomatic expression indicating heavy
rainfall. Understanding the meaning of the sentence involves recognizing the figurative use of language and the idiomatic expression.
In summary, the literal meaning of a sentence is a straightforward interpretation based on word meanings and syntactic structure, while the meaning of
a sentence encompasses a broader understanding that includes context, speaker intentions, figurative language, and any implied or idiomatic
expressions used.
The cat that lives dangerously had nine lives.
https://nlp.uniroma1.it/amuse-wsd/
The cat that lives dangerously had nine lives
https://huggingface.co/spaces/Komorebizyd/DrawApp
cat living riskily
An astronaut riding a horse in photorealistic style
An astronaut riding a horse in photorealistic style
https://openai.com/dall-e-2
Semantic Ambiguities in Turkish, Korean and
Chinese language sentences
Turkish:
• WSD, on the other hand, is the task of determining the correct sense of a
word in a given context, and WordNet provides a structured inventory of
word senses that aids in this process.
WordNet 2.1 Browser
• Find the overlap between the features of different senses of an ambiguous word (sense bag) and
the features of the words in its context (context bag).
• The sense which has the maximum overlap is selected as the contextually appropriate sense.
CFILT - IITB
Lesk Algorithm
LESK’S ALGORITHM - Example
# Example usage
sentence = "The land is on the side of the river."
sense_definition = "The land alongside or sloping down to a river or lake."
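The fragment above only sets up the inputs; a minimal, self-contained sketch of the overlap scoring described on the previous slide could look like this (the stopword list and the two-sense inventory are toy assumptions, not part of the original example):

```python
def lesk_overlap(context_words, sense_definition):
    # Overlap = number of content words shared by the context bag
    # and the sense-definition bag (toy stopword list, for illustration).
    stopwords = {"the", "is", "on", "of", "a", "an", "or", "to", "and", "that"}
    context_bag = {w.lower() for w in context_words} - stopwords
    sense_bag = {w.lower() for w in sense_definition.split()} - stopwords
    return len(context_bag & sense_bag)

def simplified_lesk(word, sentence, sense_inventory):
    # Choose the sense whose definition overlaps most with the sentence.
    context = sentence.replace(".", "").split()
    return max(sense_inventory, key=lambda d: lesk_overlap(context, d))

# Toy two-sense inventory for "bank" (illustrative, not from WordNet).
senses = [
    "The land alongside or sloping down to a river or lake.",
    "A financial institution that accepts deposits.",
]
best = simplified_lesk("bank", "The land is on the side of the river.", senses)
print(best)  # the river-side definition wins on overlap
```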
Mere overlap between the words in the context of the target word and the words in the sense definitions, which is all the Lesk Algorithm uses, is not enough to finalize the sense of the target word.
Limitations of Lesk Algorithm
1. Dependency on Overlapping Words:
• Lesk relies on the overlap of words between the context and dictionary definitions. If there is little to no overlap, the algorithm may
struggle to accurately identify the correct sense of a word.
2. Sense Ambiguity:
• In cases where a word has multiple senses with similar or related meanings, Lesk may have difficulty distinguishing between them. This
is particularly true when the senses share many common words in their definitions.
3. Context Size:
• The algorithm uses a fixed-size window of words around the target word to establish the context. Depending on the size of the context
window, some relevant information may be overlooked, leading to incorrect sense disambiguation.
4. Disregards Word Order:
• Lesk treats words as unordered sets, ignoring their actual order in the sentence. This can be a limitation as the order of words in a
sentence can contribute significantly to the overall meaning.
5. Limited to Definitions:
• Lesk relies solely on dictionary definitions for sense discrimination. It does not take into account other sources of information, such as
domain-specific knowledge or semantic relationships between words.
6. Homonym Confusion:
• Lesk may face challenges in distinguishing between homonyms—words that have the same spelling but different meanings. If the
algorithm encounters a homonym, it may not be able to accurately identify the intended sense without additional contextual clues.
7. Doesn't Consider Polysemy Evolution:
• Words can change their meanings over time, a phenomenon known as polysemy evolution. Lesk does not take temporal changes into
account, and it may struggle when applied to contexts where the meaning of a word has shifted.
8. Lack of Sensitivity to Word Sense Frequency:
• The algorithm treats all word senses as equally likely, regardless of their frequency of occurrence in a particular context. In reality, some
senses of a word may be more common than others.
9. Vocabulary Limitations:
• Lesk's effectiveness is dependent on the coverage and quality of the dictionary it uses. If a word or sense is not present in the
dictionary, the algorithm cannot handle it.
Simplified Lesk Algorithm
Limitations of Simplified Lesk Algorithm
The Simplified Lesk Algorithm, while straightforward and easy to
implement, has several limitations that may affect its performance
in real-world scenarios. Major drawbacks are listed below:
Background:
• Knowledge-based WSD was faced with the knowledge acquisition bottleneck. Manual acquisition is
a heavy and endless task, while online dictionaries provide semantic information in a mostly
unstructured way, making it difficult for a computer program to exploit the encoded lexical
knowledge.
• The problem is that these resources are often heterogeneous (that is, they use different inventories of semantic relations, different concept names, different strategies for representing roles, properties, and abstract categories, etc.), only semi-formal, and sometimes inconsistent.
• Despite these problems, it is believed that semantic annotation methods critically depend on the outcomes of large-scale efforts to integrate existing lexical resources and on the design of WSD algorithms that exploit this knowledge to the fullest.
Integration of existing lexical resources
• WordNet 2.0
• Domain labels
• Annotated corpora
• Dictionaries of collocations
Design of WSD algorithms that exploit the above knowledge to the fullest
The above limitations are handled using FRED
Ontology for WSD
Bank in the context of financial institution
http://wit.istc.cnr.it/stlab-tools/fred/demo/
Bank in the context of river side
NAÏVE BAYES
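As a reminder of the method named above: Naïve Bayes WSD picks the sense s maximizing log P(s) + Σ log P(word | s) over the context words. A minimal sketch with invented toy training data and add-one smoothing (everything below is illustrative, not from the slides):

```python
import math
from collections import Counter, defaultdict

# Toy sense-tagged training data for "bank".
training = [
    ("deposit money account", "financial"),
    ("loan interest money", "financial"),
    ("river water shore", "river"),
    ("muddy river fishing", "river"),
]

prior = Counter(sense for _, sense in training)
word_counts = defaultdict(Counter)
vocab = set()
for text, sense in training:
    for w in text.split():
        word_counts[sense][w] += 1
        vocab.add(w)

def classify(context):
    # Score each sense by log-prior plus smoothed log-likelihoods.
    best, best_score = None, float("-inf")
    for sense in prior:
        total = sum(word_counts[sense].values())
        score = math.log(prior[sense] / len(training))
        for w in context.split():
            score += math.log((word_counts[sense][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = sense, score
    return best

print(classify("water near the shore"))  # -> river
```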
DECISION LIST ALGORITHM
• Based on ‘One sense per collocation’ property.
• Nearby words provide strong and consistent clues as to the sense of a target word.
• Collect a large set of collocations for the ambiguous word.
• Calculate word-sense probability distributions for all such collocations.
• Calculate the log-likelihood ratio:

      Log( Pr(Sense-A | Collocation_i) / Pr(Sense-B | Collocation_i) )

  (Assuming there are only two senses for the word. Of course, this can easily be extended to ‘k’ senses.)
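The ranking step above can be sketched directly from collocation counts. The counts below are invented for illustration, and add-one smoothing avoids division by zero:

```python
import math

# Toy counts: how often each collocation co-occurs with Sense-A / Sense-B.
counts = {
    ("plant", "manufacturing"): {"A": 98, "B": 2},
    ("plant", "life"):          {"A": 3,  "B": 90},
    ("plant", "growth"):        {"A": 10, "B": 40},
}

def decision_list(counts):
    rules = []
    for colloc, c in counts.items():
        # Smoothed Pr(Sense-A | collocation) and its complement.
        pa = (c["A"] + 1) / (c["A"] + c["B"] + 2)
        pb = 1 - pa
        score = abs(math.log(pa / pb))     # strength of the clue
        sense = "A" if pa > pb else "B"
        rules.append((score, colloc, sense))
    # Most predictive collocation first: this ordering IS the decision list.
    return sorted(rules, reverse=True)

for score, colloc, sense in decision_list(counts):
    print(f"{colloc} -> Sense-{sense} (|LLR| = {score:.2f})")
```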
DECISION LIST ALGORITHM (CONTD.)
Training Data Resultant Decision List
Exemplar Based WSD (k-nn)
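The idea named above can be sketched in a few lines: store sense-tagged exemplars as feature sets and label a new context by majority vote among its k nearest exemplars. The exemplar store and the symmetric-difference distance are toy assumptions for illustration:

```python
# Toy exemplar store: (feature set, sense) pairs for "bank".
exemplars = [
    ({"money", "deposit"}, "financial"),
    ({"money", "loan"}, "financial"),
    ({"river", "shore"}, "river"),
    ({"river", "water"}, "river"),
]

def knn_sense(context_words, k=3):
    # Distance = size of the symmetric difference between feature sets;
    # smaller means more shared clue words.
    ranked = sorted(exemplars, key=lambda e: len(e[0] ^ context_words))
    votes = [sense for _, sense in ranked[:k]]
    return max(set(votes), key=votes.count)

print(knn_sense({"water", "shore", "river"}))  # -> river
```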
WSD Using SVMs
• SVM is a binary classifier which finds a hyperplane with the largest margin that separates training examples
into 2 classes.
• As SVMs are binary classifiers, a separate classifier is built for each sense of the word.
• Training Phase: Using a tagged corpus, for every sense of the word an SVM is trained using the following
features:
• POS of w as well as POS of neighboring words.
• Local collocations
• Co-occurrence vector
• Features based on syntactic relations (e.g. headword, POS of headword, voice of head word etc.)
• Testing Phase: Given a test sentence, a test example is constructed using the above features and fed as input
to each binary classifier.
• The correct sense is selected based on the label returned by each classifier.
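The one-classifier-per-sense setup above can be sketched with a minimal linear SVM trained by Pegasos-style sub-gradient descent on hinge loss. Everything here is a toy stand-in: four bag-of-words features instead of the POS/collocation/syntactic features listed above, and a hand-rolled trainer instead of a library SVM:

```python
import random

def train_linear_svm(X, y, dim, epochs=200, lam=0.01):
    # Minimal linear SVM via Pegasos-style sub-gradient descent.
    # X: feature vectors, y: labels in {+1, -1}.
    w = [0.0] * dim
    t = 0
    data = list(zip(X, y))
    rng = random.Random(0)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]   # regularization shrink
            if margin < 1:                            # hinge-loss update
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# Toy binary bag-of-words features over ["money", "deposit", "river", "water"].
X = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0]]
y_financial = [+1, +1, -1, -1]   # one binary classifier per sense
y_river     = [-1, -1, +1, +1]

w_fin = train_linear_svm(X, y_financial, dim=4)
w_riv = train_linear_svm(X, y_river, dim=4)

test = [0, 0, 1, 1]              # a "river ... water" context
pred = "financial" if score(w_fin, test) > score(w_riv, test) else "river"
print(pred)  # -> river
```

At test time, each per-sense classifier scores the example and the highest-scoring sense wins, mirroring the testing phase described above.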
Supervised Approaches –
Comparisons
Approach | Average Precision | Average Recall | Corpus | Average Baseline Accuracy
Naïve Bayes | 64.13% | Not reported | Senseval3 – All Words Task | 60.90%
Decision Lists | 96% | Not applicable | Tested on a set of 12 highly polysemous English words | 63.9%
Exemplar Based disambiguation (k-NN) | 68.6% | Not reported | WSJ6 containing 191 content words | 63.7%
SVM | 72.4% | 72.4% | Senseval 3 – Lexical sample task (Used for disambiguation of 57 words) | 55.2%
Supervised Approaches –Conclusions
• General Comments
• Use corpus evidence instead of relying on dictionary-defined senses.
• Can capture important clues provided by proper nouns because proper nouns do appear in a corpus.
• Naïve Bayes
• Suffers from data sparseness.
• Since the scores are a product of probabilities, some weak features might pull down the overall score for a sense.
• A large number of parameters need to be trained.
• Decision Lists
• A word-specific classifier. A separate classifier needs to be trained for each word.
• Uses the single most predictive feature which eliminates the drawback of Naïve Bayes.
Supervised Approaches –Conclusions
• Exemplar Based K-NN
• A word-specific classifier.
• Will not work for unknown words which do not appear in the corpus.
• Uses a diverse set of features (including morphological and noun-subject-verb pairs)
• SVM
• A word-sense specific classifier.
• Gives the highest improvement over the baseline accuracy.
• Uses a diverse set of features.
ROADMAP
Unsupervised
• Progress in word sense disambiguation is stymied by the dearth of labeled training data to train a
classifier for every sense of each word in a given language. There are a few solutions to this problem:
1. Devise a way to cluster instances of a word so that each cluster effectively constrains the examples of
the word to a certain sense. This could be considered sense induction through clustering.
2. Use some metrics to identify the proximity of a given instance with some sets of known senses of a
word and select the closest to be the sense of that instance.
3. Start with seeds of examples of certain senses, then iteratively grow them to form clusters.
NOTE:
We do not discuss in much detail the mostly clustering-based sense induction methods here.
We assume that there is already a predefined sense inventory for a word and that the
unsupervised methods use very few, if any, hand-annotated examples, and then attempt to
classify unseen test instances into one of their predetermined sense categories.
WSD USING CONCEPTUAL DENSITY
CONCEPTUAL DENSITY (EXAMPLE)
[Figure: WordNet sub-hierarchy of the context nouns — body (CD = 0.062); division (CD = 0.256) with children committee and department; department with children government department and local department; all under administrative_unit]

The jury(2) praised the administration(3) and operation(8) of Atlanta Police Department(1)

Step 1: Make a lattice of the nouns in the context, their senses and hypernyms.
Step 2: Compute the conceptual density of the resultant concepts (sub-hierarchies).
Step 3: The concept with the highest CD is selected.
Step 4: Select the senses below the selected concept as the correct sense for the respective words.
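The four steps can be sketched over a toy hierarchy. Note the assumption: the published conceptual-density formula also involves the hyponymy branching factor, whereas this simplified ratio (context-word senses under a concept divided by the size of its sub-hierarchy, requiring at least two hits) is an illustrative stand-in:

```python
# Toy concept hierarchy (children per concept), loosely echoing the example.
hierarchy = {
    "administrative_unit": ["division", "body"],
    "division": ["committee", "department"],
    "department": ["government department", "local department"],
    "body": [], "committee": [],
    "government department": [], "local department": [],
}

def subtree(concept):
    # Step 1: the lattice below a concept (the concept plus its descendants).
    nodes = [concept]
    for child in hierarchy.get(concept, []):
        nodes += subtree(child)
    return nodes

def conceptual_density(concept, context_senses):
    # Step 2 (simplified): context senses under the concept / sub-hierarchy size.
    nodes = subtree(concept)
    hits = sum(1 for s in context_senses if s in nodes)
    if hits < 2:   # require the concept to connect at least two context words
        return 0.0
    return hits / len(nodes)

# Senses of two context nouns that land inside the lattice.
context_senses = ["committee", "department"]
# Step 3: pick the concept with the highest density.
best = max(hierarchy, key=lambda c: conceptual_density(c, context_senses))
print(best)  # -> division
```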
The DURel Annotation Tool: Human and Computational Measurement of Semantic Proximity, Sense Clusters and Semantic Change (December 2023)
https://www.changeiskey.org/publication/2023-durel-tool/
https://www2.ims.uni-stuttgart.de/video/durel-tool/230623-durel-tool-demo.mp4
SenseClusters Generation (Cluster WordNet synsets using Lin Similarity for WSD)

#!/usr/bin/env python
import nltk
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
nltk.download('wordnet_ic')

def cluster_senses(word, context, threshold=0.30,
                   ic_corpus=wordnet_ic.ic('ic-treebank.dat')):
    """
    Cluster WordNet synsets using Lin Similarity for WSD.
    """
    synsets = wn.synsets(word, pos=wn.VERB)
    clusters = [[synsets.pop()]]
    for s in synsets:
        added = False
        for c in clusters:
            for ss in c:
                if s.lin_similarity(ss, ic_corpus) > threshold:
                    c.append(s)
                    added = True
                    break
            if added:
                break
        if not added:
            clusters += [[s]]
    return clusters

# Example usage
context = "Book that flight"
result_clusters = cluster_senses('book', context, threshold=0.30)

# Print the resulting clusters
for cluster in result_clusters:
    print(cluster)
ROADMAP
• Knowledge Based Approaches
• WSD using Selectional Preferences (or restrictions)
• Overlap Based Approaches
• Machine Learning Based Approaches
• Supervised Approaches
• Unsupervised Algorithms
• Semi-supervised Algorithms
• Reducing Knowledge Acquisition Bottleneck
• Summary
SEMI-SUPERVISED DECISION LIST ALGORITHM
• Step 1: Train the Decision List algorithm using a small amount of seed data.
• Step 2: Classify the entire sample set using the trained classifier.
• Step 3: Create new seed data by adding those members which are tagged as Sense-A or Sense-B with high probability.
Initialization, Progress and Convergence
[Figure: bootstrapping on the two senses of a word ('Life' vs. 'Manufacturing') — the seed clusters grow while the residual (untagged) data shrinks until convergence]
Semi-Supervised Approaches – Comparisons &
Conclusions
Works at par with its supervised version even though it needs significantly less tagged data.
Has all the advantages and disadvantages of its supervised version.
ROADMAP
• Knowledge Based Approaches
• WSD using Selectional Preferences (or restrictions)
• Overlap Based Approaches
• Machine Learning Based Approaches
• Supervised Approaches
• Semi-supervised Algorithms
• Unsupervised Algorithms
• Hybrid Approaches
• Reducing Knowledge Acquisition Bottleneck
• Summary
Overcoming Knowledge Bottleneck
Using Search Engines
• Construct search queries using monosemic words and phrases from
the gloss of a synset.
• Feed these queries to a search engine.
• From the retrieved documents extract the sentences which contain
the search queries.
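The query-construction step can be sketched as follows; the sense-count table is a toy stand-in for a real dictionary, and only words with exactly one listed sense (monosemic words) survive into the query:

```python
# Toy sense inventory: word -> number of dictionary senses (illustrative).
sense_count = {
    "financial": 1, "institution": 2, "accepts": 1,
    "deposits": 3, "channels": 2, "money": 4,
}

def monosemic_terms(gloss):
    # Keep only words with exactly one sense: unambiguous query anchors.
    return [w for w in gloss.lower().split() if sense_count.get(w) == 1]

gloss = "financial institution that accepts deposits"
query = " ".join(monosemic_terms(gloss))
print(query)  # -> financial accepts
```

The resulting query is what would be fed to the search engine; sentences containing it in the retrieved documents then serve as automatically sense-tagged examples.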
SUMMARY
• Complete dependence on dictionary defined senses is the primary reason for low accuracies in Knowledge
Based approaches.
• Extracting “sense definitions” or “usage patterns” from the corpus greatly improves the accuracy.
• Word-specific classifiers are able to attain extremely good accuracies but suffer from the problem of non-reusability.
SUMMARY (CONTD.)
• Classifiers that exploit syntactic dependencies between words are able to perform large scale
disambiguation (generic classifiers) and at the same time give reasonably good accuracies.