
Unit-3

Semantic Parsing

-Presented By,
Dr D. Teja Santosh
Associate Professor, CSE
CVRCE
• Semantic parsing is the task of converting a natural language utterance to a logical form: a machine-understandable representation of its meaning.
• Semantic parsing can thus be understood as extracting the precise meaning of an utterance.

https://github.com/LiBinNLP/GNNSDP
Introduction
• The holy grail of research in language understanding is the identification of a meaning representation
that is detailed enough to allow reasoning systems to make deductions but, at the same time, is general
enough that it can be used across many domains with little to no adaptation.

• It is not clear whether a final, low-level, detailed semantic representation covering various
applications that use some form of language interface can be achieved or whether an ontology can be
created that can capture the various granularities and aspects of meanings that are embodied in such
a variety of applications—none has yet been created.

• Therefore, two compromise approaches have emerged in the natural language processing community
for language understanding.
Contd…
• In the first approach, a specific, rich meaning representation is created for a limited domain for use
by applications that are restricted to that domain, such as air travel reservations, football game
simulations, or querying a geographic database. Systems are then crafted to generate output from
text in this rich, domain-specific meaning representation.

• In the second approach, a related set of intermediate meaning representations is created, going from low-level analysis to a midlevel analysis, and the bigger understanding task is divided into multiple, smaller pieces that are more manageable, such as word sense disambiguation followed by predicate-argument structure recognition.

• By dividing the problem up this way, each intermediate representation is only responsible for
capturing a relatively small component of overall meaning, thereby making the task of defining and
modeling each representation easier.
• So, we treat the world as though it has exactly two types of meaning representations: a domain-dependent, deeper representation and a set of relatively shallow but general-purpose, low-level, and intermediate representations.

• The task of producing the output of the first type is often called deep semantic parsing, and
the task of producing the output of the second type is often called shallow semantic
parsing.
Semantic Interpretation

Meaning Representation

Structural Ambiguity
Word Sense
Entity & Event Resolution

Predicate-Argument Structure
Sub-fields of Semantic Analysis

• lexical semantics
• compositional semantics
• procedural semantics
Meaning Representation:
Syntax driven Semantic Analysis
https://www.youtube.com/watch?v=83mmRqk0uL8
Prolog Demo
Prolog Demo Contd…
Prolog Demo Contd…
Prolog Demo Contd…
Lexical Semantics

• Lexical semantics is a subfield of linguistics that focuses on the study of word meanings and how words convey information.

• It involves examining the relationships between words, the senses and nuances of individual words, and how words interact with each other within a language.

• Lexical semantics plays a crucial role in understanding how meaning is expressed at the word level.
Key aspects of lexical semantics
• Word Sense
• Words often have multiple senses, and distinguishing between them is essential for understanding
how words are used in different contexts.
• Polysemy and Homonymy
• Word Relations
• Lexical semantics explores relationships between words, such as synonymy (similar meanings),
antonymy (opposite meanings), hyponymy (subtype relationships), and hypernymy (supertype
relationships).
• Sense Relations
• Sense relations describe the ways in which the meanings of words are related. Common sense
relations include meronymy (part-whole relationships), troponymy (manner of action relationships),
and entailment (implication relationships).
• Lexical Ambiguity
• Lexical semantics helps address issues of ambiguity that arise when a word has multiple possible
meanings. Context, syntax, and pragmatics are considered in resolving lexical ambiguity.
• Lexical semantics primarily deals with the meaning of individual words, focusing on understanding the various senses of words, their relationships, and how they contribute to the meaning of sentences.

• While lexical semantics contributes to the interpretation of a sentence, it doesn't directly provide the entire literal meaning of a sentence on its own.

• The literal meaning of a sentence is derived from the combination of meanings of its individual words and their syntactic arrangement.
Difference between the "meaning of a sentence" and the
"literal meaning of a sentence"
1. Literal Meaning of a Sentence:
- The literal meaning of a sentence refers to the straightforward, dictionary-based interpretation of the words and structures
used. It involves understanding the denotations of individual words and their syntactic relationships without considering figurative
language, implied meanings, or contextual nuances. The literal meaning is what you would find in a standard dictionary definition.

2. Meaning of a Sentence:
- The meaning of a sentence is a broader concept that encompasses not only the literal meaning but also includes additional
layers of interpretation. It takes into account context, speaker's intentions, implied meanings, and any figurative language or
idiomatic expressions used. The meaning of a sentence involves considering the overall message or information conveyed, taking
into consideration various linguistic and pragmatic factors.

Example:
Consider the sentence: "It's raining cats and dogs."

- Literal Meaning: The literal meaning would be an interpretation based on the individual words. However, taking it literally would not accurately
represent the intended message because rain doesn't actually consist of cats and dogs.

- Meaning of the Sentence: The intended meaning goes beyond the literal interpretation. In this case, it's an idiomatic expression indicating heavy
rainfall. Understanding the meaning of the sentence involves recognizing the figurative use of language and the idiomatic expression.

In summary, the literal meaning of a sentence is a straightforward interpretation based on word meanings and syntactic structure, while the meaning of
a sentence encompasses a broader understanding that includes context, speaker intentions, figurative language, and any implied or idiomatic
expressions used.
The cat that lives dangerously had nine lives.

https://nlp.uniroma1.it/amuse-wsd/
The cat that lives dangerously had nine lives

https://huggingface.co/spaces/Komorebizyd/DrawApp
cat living riskily
An astronaut riding a horse in photorealistic style
An astronaut riding a horse in photorealistic style

https://openai.com/dall-e-2
Semantic Ambiguities in Turkish, Korean and
Chinese language sentences

Turkish:

1."Köpekleri gezmeye götürdüm." (I took the dogs for a walk.)


• Ambiguity: It's unclear whether the speaker took multiple dogs for a
walk or took dogs owned by someone else for a walk.
2."Kitabı öğrenciye verdim." (I gave the book to the student.)
• Ambiguity: It's unclear whether the speaker gave their own book to
the student or gave a book that belonged to the student to someone
else.
3."Kırmızı lambayı gördüm arabada." (I saw the red light in the
car.)
• Ambiguity: It's unclear whether the speaker saw a red light inside
the car or saw a red traffic light while in the car.
Korean:

1." 사과를 먹는 남자를 봤어 ." (I saw a man eating an apple.)


• Ambiguity: It's unclear whether the speaker saw a man who was
eating an apple or saw a man while eating an apple.
2." 책상 위에 책이 있어 ." (There is a book on the desk.)
• Ambiguity: It's unclear whether the book is physically on the desk or
if the speaker meant there is a book regarding the topic of desks.
3." 차를 마시다 ." (To drink tea or to drink a car?)
• Ambiguity: This sentence could be interpreted as "to drink tea" or
literally "to drink a car," showcasing semantic ambiguity.
Chinese:

1." 我在公园看到了玩耍的孩子。 " (Wǒ zài gōngyuán kàndàole wánshuǎ de


háizi.) (I saw children playing in the park.)
• Ambiguity: It's unclear whether the speaker saw children who were
playing in the park or saw children and started playing with them in
the park.
2." 他买了一辆车给我。 " (Tā mǎile yī liàng chē gěi wǒ.) (He bought me
a car.)
• Ambiguity: It's unclear whether he bought a car for the speaker or
bought a car from the speaker.
3." 她在书桌上看到了笔。 " (Tā zài shūzhuō shàng kàndàole bǐ.) (She saw
the pen on the desk.)
• Ambiguity: It's unclear whether she saw a pen on the desk or saw a
pen and then put it on the desk.
This calls for Word Sense Disambiguation (WSD)
How WSD is related to lexical semantics

• Word Sense Disambiguation (WSD) is closely related to lexical semantics, as it involves determining the correct sense or meaning of a word within a specific context.

• Lexical semantics, as mentioned earlier, focuses on the study of word meanings, including the various senses that a word can have and how these meanings relate to each other.

• WSD is a practical application of lexical semantics, aiming to address the challenge of ambiguity that arises when a word has multiple possible meanings.
How WordNet is related to WSD

• WordNet and Word Sense Disambiguation (WSD) are closely related, as WordNet serves as a valuable resource for WSD applications.

• WordNet is a lexical database of the English language that organizes words into sets of synonyms called synsets and describes semantic relationships between them.

• WSD, on the other hand, is the task of determining the correct sense of a word in a given context, and WordNet provides a structured inventory of word senses that aids in this process.
WordNet 2.1 Browser

For the 1st sense of the word "nail" under the Noun category, the understanding of the gloss is: it is a hard, protective structure (horny plate) that covers a portion of the upper side of the fingers or toes.

So, this is about the biological nail and not the metallic nail.
OVERLAP BASED APPROACHES

• Require a Machine Readable Dictionary (MRD).

• Find the overlap between the features of different senses of an ambiguous word (sense bag) and
the features of the words in its context (context bag).

• These features could be sense definitions, example sentences, hypernyms etc.

• The features could also be given weights.

• The sense which has the maximum overlap is selected as the contextually appropriate sense.

CFILT - IITB
Lesk Algorithm
LESK’S ALGORITHM - Example

Sense Bag: contains the words in the definition of a candidate sense of the ambiguous word.
Context Bag: contains the words in the definition of each sense of each context word.

E.g. “On burning coal we get ash.”

Ash (Target Word)
• Sense 1: Trees of the olive family with pinnate leaves, thin furrowed bark and gray branches.
• Sense 2: The solid residue left when combustible material is thoroughly burned or oxidized.
• Sense 3: To convert into ash.

Coal (Context Word)
• Sense 1: A piece of glowing carbon or burnt wood.
• Sense 2: Charcoal.
• Sense 3: A black solid combustible substance formed by the partial decomposition of vegetable matter without free access to air and under the influence of moisture and often increased pressure and temperature that is widely used as a fuel for burning.
In this case Sense 2 of ash would be the winner sense.
Compute Overlap Definition in Lesk
def compute_overlap(sentence, sense_definition):
    # Bag-of-words overlap between the context sentence and a sense gloss
    sentence_words = set(sentence.lower().split())
    definition_words = set(sense_definition.lower().split())
    overlap = sentence_words.intersection(definition_words)
    return len(overlap)

# Example usage
sentence = "The land is on the side of the river."
sense_definition = "The land alongside or sloping down to a river or lake."

overlap_score = compute_overlap(sentence, sense_definition)
print(f"Overlap Score: {overlap_score}")
1. He nailed the loose arm of the chair with a hammer.
2. He bought a box of nails from the hardware store.
3. He went to the beauty salon to get his nails clipped.
4. He went to the hair salon to get a manicure. His nails had grown very long.

Word Sense Disambiguation – is this sense definition from WordNet apt?
Observation

Mere overlap between the target word's context and the words in the sense definitions is not enough to finalize the sense of the target word, which is all the Lesk Algorithm does.
Limitations of Lesk Algorithm
1. Dependency on Overlapping Words:
• Lesk relies on the overlap of words between the context and dictionary definitions. If there is little to no overlap, the algorithm may
struggle to accurately identify the correct sense of a word.
2. Sense Ambiguity:
• In cases where a word has multiple senses with similar or related meanings, Lesk may have difficulty distinguishing between them. This
is particularly true when the senses share many common words in their definitions.
3. Context Size:
• The algorithm uses a fixed-size window of words around the target word to establish the context. Depending on the size of the context
window, some relevant information may be overlooked, leading to incorrect sense disambiguation.
4. Disregards Word Order:
• Lesk treats words as unordered sets, ignoring their actual order in the sentence. This can be a limitation as the order of words in a
sentence can contribute significantly to the overall meaning.
5. Limited to Definitions:
• Lesk relies solely on dictionary definitions for sense discrimination. It does not take into account other sources of information, such as
domain-specific knowledge or semantic relationships between words.
6. Homonym Confusion:
• Lesk may face challenges in distinguishing between homonyms—words that have the same spelling but different meanings. If the
algorithm encounters a homonym, it may not be able to accurately identify the intended sense without additional contextual clues.
7. Doesn't Consider Polysemy Evolution:
• Words can change their meanings over time, a phenomenon known as polysemy evolution. Lesk does not take temporal changes into
account, and it may struggle when applied to contexts where the meaning of a word has shifted.
8. Lack of Sensitivity to Word Sense Frequency:
• The algorithm treats all word senses as equally likely, regardless of their frequency of occurrence in a particular context. In reality, some
senses of a word may be more common than others.
9. Vocabulary Limitations:
• Lesk's effectiveness is dependent on the coverage and quality of the dictionary it uses. If a word or sense is not present in the
dictionary, the algorithm cannot handle it.
Simplified Lesk Algorithm
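The slide title above can be made concrete with a minimal sketch of the simplified algorithm. The two-sense inventory for "bank" below is a hand-made toy (in practice the glosses would come from WordNet synsets):

```python
def simplified_lesk(word, sentence, inventory):
    """Pick the sense whose gloss shares the most words with the context."""
    context = set(sentence.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in inventory.items():
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Hand-made toy inventory for "bank" (glosses paraphrase dictionary entries)
bank_senses = {
    "bank.n.01": "sloping land beside a body of water such as a river",
    "bank.n.02": "a financial institution that accepts deposits of money",
}

print(simplified_lesk("bank", "I am at the bank to deposit money", bank_senses))
# -> bank.n.02 ("money" overlaps with the financial gloss)
```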
Limitations of Simplified Lesk Algorithm
The Simplified Lesk Algorithm, while straightforward and easy to
implement, has several limitations that may affect its performance
in real-world scenarios. Major drawbacks are listed below:

1. It doesn't consider word order, word importance, or semantic


relationships between words.
2. The algorithm relies on WordNet's synsets and their definitions.
If WordNet does not cover a specific sense or if its definitions
are outdated or incomplete, the algorithm may not perform
well.
3. The algorithm does not take into account the broader context of
the sentence or surrounding sentences. It treats each sentence
independently, potentially missing nuances that require a
broader understanding.
Using statistical models of Roget’s categories trained on large corpora
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Step 1: Define Roget's categories
rogets_categories = {
    "run": "Motion",
    "joy": "Emotion",
    "color": "Color"
}

# Step 2: Create training data
training_data = [
    ("He likes to run in the morning.", "Motion"),
    ("Her joy was contagious.", "Emotion"),
    ("The sky is blue today.", "Color")
]

# Step 3: Extract features from training data
corpus = [sentence for sentence, _ in training_data]
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(corpus)

# Step 4: Create labels for training data
y_train = [category for _, category in training_data]

# Step 5: Train a Naive Bayes classifier
classifier = MultinomialNB()
classifier.fit(X_train, y_train)

# Step 6: Create test data
test_data = [
    "He enjoys the joy of running.",
    "The color of the sky is red."
]

# Step 7: Extract features from test data
X_test = vectorizer.transform(test_data)

# Step 8: Make predictions
predictions = classifier.predict(X_test)

# Display results
for sentence, predicted_category in zip(test_data, predictions):
    print(f"Sentence: {sentence}")
    print(f"Predicted Roget's Category: {predicted_category}\n")
Structural Semantic Interconnections (SSI) Algorithm

Background:

• Knowledge-based WSD was faced with the knowledge acquisition bottleneck. Manual acquisition is
a heavy and endless task, while online dictionaries provide semantic information in a mostly
unstructured way, making it difficult for a computer program to exploit the encoded lexical
knowledge.

• The problem is that these resources are often heterogeneous (That is, they use different inventories
of semantic relations, different concept names, different strategies for representing roles,
properties, and abstract categories, etc.), midway formal, and sometimes inconsistent.

• Despite these problems, it is believed that semantic annotation methods depend critically on the outcomes of large-scale efforts to integrate existing lexical resources and on the design of WSD algorithms that exploit this knowledge at best.
Integration of existing lexical resources

• WordNet 2.0

• Domain labels

• Annotated corpora

• Dictionaries of collocations
Design of WSD algorithm that exploit the above knowledge
at best

• Structural semantic interconnections (SSI) uses graphs to describe the objects to analyze (word senses) and a context-free grammar to detect relevant semantic patterns between graphs.

• Sense classification is based on the number and type of detected interconnections.

• The graph representation of word senses is automatically built from several available resources, such as lexicalized ontologies, collocation inventories, annotated corpora, and glossaries, that are combined in part manually, in part automatically.
Procedure for creating structured (graph)
representations of word senses from a
variety of lexical resources
• Approach to word sense disambiguation lies in the structural pattern
recognition framework.

• Structural or syntactic pattern recognition has proven to be effective when the objects to be classified contain an inherent, identifiable organization, such as image data and time-series data.
• For these objects, a representation based on a “flat” vector of features causes
a loss of information that negatively impacts classification performances.

• The classification task in a structural pattern recognition system is


implemented through the use of grammars that embody precise criteria to
discriminate among different classes.
• Word senses clearly fall under the category of objects that are better
described through a set of structured features.

• Learning a structure for the objects to be classified is often a major


problem in many application areas of structural pattern recognition.

• In the field of computational linguistics, however, large lexical knowledge


bases and annotated resources offer an ideal starting point for constructing
structured representations of word senses.
Building Graph Representations for Word Senses
• Create a lexical knowledgebase (LKB) including semantic relations explicitly
encoded in WordNet and semantic relations extracted from annotated
corpora and dictionaries of collocations.

• The LKB is used to generate labelled directed graph (digraph) representations


of word senses.

• These are called semantic graphs, since they represent alternative conceptualizations for a given lexical item.

• Nodes represent concepts (WordNet synsets) and edges are semantic


relations.
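As a minimal sketch, such a semantic graph can be held as an adjacency map from synset identifiers to labelled edges. The node names and relations below are illustrative only, not taken from the actual LKB:

```python
# Hypothetical fragment of a semantic graph for one sense of "bus":
# nodes are synset-style identifiers, edges carry relation labels.
semantic_graph = {
    "bus#1": [("hypernym", "public_transport#1"), ("has-part", "window#2")],
    "public_transport#1": [("hypernym", "transport#1")],
    "window#2": [],
    "transport#1": [],
}

def reachable(graph, start):
    """Collect every concept reachable from a sense node (iterative DFS)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(target for _, target in graph.get(node, []))
    return seen

print(reachable(semantic_graph, "bus#1"))
```

Detecting interconnections between two senses then amounts to searching for paths that link their graphs.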
KB Approaches – Limitations
• Drawbacks of Overlap based approaches

• Dictionary definitions are generally very small.


• Dictionary entries rarely take into account the distributional constraints of different word senses (e.g. selectional preferences, kinds of prepositions, etc.); e.g. "cigarette" and "ash" never co-occur in a dictionary.
• Suffer from the problem of sparse match.
• Proper nouns are not present in a MRD. Hence these approaches fail to capture the
strong clues provided by proper nouns.
E.g. “Sachin Tendulkar” will be a strong indicator of the category “sports”.
Sachin Tendulkar plays cricket.

The above limitations are handled using the FRED ontology for WSD
Bank in the context of financial institution

Sentence submitted is: I am at the bank to deposit money

http://wit.istc.cnr.it/stlab-tools/fred/demo/
Bank in the context of river side

Sentence submitted is: I am at the banks of river Yamuna


Work of Dr D. Teja Santosh on Word Sense Disambiguation using
Ontology
Title: Disambiguating the context of the concept terms from scalable
document collection with the property based concept hierarchies from
evolutionary ontology
Debugged SWRL Rules shown in Controlled Natural Language (CNL) form
ROADMAP

• Knowledge Based Approaches


• WSD using Selectional Preferences (or restrictions)
• Overlap Based Approaches
• Machine Learning Based Approaches
• Supervised Approaches
• Unsupervised Algorithms
• Semisupervised Algorithms
• Hybrid Approaches
• Reducing Knowledge Acquisition Bottleneck

NAÏVE BAYES

ŝ = argmax_{s ∈ senses} Pr(s | V_w)

• 'V_w' is a feature vector consisting of:
  • POS of w
  • Semantic & syntactic features of w
  • Collocation vector (set of words around it) – typically consists of the next word (+1), the next-to-next word (+2), -2, -1 and their POS's
  • Co-occurrence vector (number of times w occurs in a bag of words around it)

• Applying Bayes rule and the naive independence assumption:

ŝ = argmax_{s ∈ senses} Pr(s) · Π_{i=1..n} Pr(V_w_i | s)
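The scoring rule above can be sketched in a few lines. The senses, priors, and per-feature likelihoods below are made-up toy numbers, and the product is computed in log space for numerical stability:

```python
import math

# Toy model for the two senses of "bass"; all numbers are invented.
prior = {"fish": 0.6, "music": 0.4}
likelihood = {
    "fish":  {"river": 0.30, "guitar": 0.01, "caught": 0.25},
    "music": {"river": 0.02, "guitar": 0.40, "caught": 0.03},
}

def naive_bayes_sense(features):
    """Return argmax_s Pr(s) * prod_i Pr(f_i | s), computed in log space."""
    def score(sense):
        return math.log(prior[sense]) + sum(
            math.log(likelihood[sense].get(f, 1e-6)) for f in features
        )
    return max(prior, key=score)

print(naive_bayes_sense(["river", "caught"]))  # -> fish
print(naive_bayes_sense(["guitar"]))           # -> music
```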
DECISION LIST ALGORITHM
• Based on ‘One sense per collocation’ property.
• Nearby words provide strong and consistent clues as to the sense of a target word.
• Collect a large set of collocations for the ambiguous word.
• Calculate word-sense probability distributions for all such collocations.
• Calculate the log-likelihood ratio (assuming there are only two senses for the word; of course, this can easily be extended to 'k' senses):

    Log( Pr(Sense-A | Collocation_i) / Pr(Sense-B | Collocation_i) )

• Higher log-likelihood = more predictive evidence


• Collocations are ordered in a decision list, with most predictive collocations ranked highest.

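A sketch of building such a list from collocation counts (the counts are invented, and a small smoothing constant stands in for proper probability estimation):

```python
import math

# Invented counts of each collocation co-occurring with the two senses
# of "bass": sense A = fish, sense B = music.
counts = {
    "fish in":  (120, 3),
    "play the": (2, 85),
    "striped":  (40, 1),
}

def decision_list(counts, smoothing=0.1):
    """Order collocations by |log(Pr(A|c) / Pr(B|c))|, most predictive first."""
    entries = []
    for colloc, (a, b) in counts.items():
        llr = math.log((a + smoothing) / (b + smoothing))
        sense = "A" if llr > 0 else "B"
        entries.append((abs(llr), colloc, sense))
    return sorted(entries, reverse=True)

for weight, colloc, sense in decision_list(counts):
    print(f"{colloc!r} -> sense {sense} (weight {weight:.2f})")
```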
DECISION LIST ALGORITHM (CONTD.)
[Figure: sense-tagged training data and the resultant decision list.]

Classification of a test sentence is based on the highest-ranking collocation found in the test sentence.
E.g. “…plucking flowers affects plant growth…”
Exemplar Based WSD (k-nn)

• An exemplar based classifier is constructed for each word to be disambiguated.


• Step1: From each sense marked sentence containing the ambiguous word, a training example is constructed
using:
• POS of w as well as POS of neighboring words.
• Local collocations
• Co-occurrence vector
• Morphological features
• Subject-verb syntactic dependencies
• Step2: Given a test sentence containing the ambiguous word, a test example is similarly constructed.
• Step3: The test example is then compared to all training examples and the k-closest training examples are
selected.
• Step4: The sense which is most prevalent amongst these “k” examples is then selected as the correct sense.

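Steps 1-4 can be sketched with scikit-learn's k-NN classifier over simple bag-of-words features (a toy stand-in for the richer feature set listed above; the sentences and sense tags are invented):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier

# Invented sense-tagged sentences for the ambiguous word "bank"
train = [
    ("i deposited money at the bank", "finance"),
    ("the bank approved my loan", "finance"),
    ("we sat on the bank of the river", "river"),
    ("the river bank was muddy", "river"),
]
texts, senses = zip(*train)

# Step 1: build feature vectors (bag of words) from the tagged sentences
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Steps 2-4: vectorize the test sentence, find the k closest training
# examples, and take the majority sense among them
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, list(senses))

test = vectorizer.transform(["fishing from the bank of a muddy river"])
predicted = knn.predict(test)[0]
print(predicted)  # -> river
```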
WSD Using SVMs

• SVM is a binary classifier which finds a hyperplane with the largest margin that separates training examples
into 2 classes.
• As SVMs are binary classifiers, a separate classifier is built for each sense of the word
• Training Phase: Using a tagged corpus, for every sense of the word an SVM is trained using the following features:
• POS of w as well as POS of neighboring words.
• Local collocations
• Co-occurrence vector
• Features based on syntactic relations (e.g. headword, POS of headword, voice of head word etc.)
• Testing Phase: Given a test sentence, a test example is constructed using the above features and fed as input
to each binary classifier.
• The correct sense is selected based on the label returned by each classifier.

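A corresponding sketch with scikit-learn's SVC over bag-of-words features only (the training data is invented; scikit-learn trains the per-class binary classifiers internally and picks the winning label):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

# Invented sense-tagged sentences for "bank"
train = [
    ("i deposited money at the bank", "finance"),
    ("the bank raised its interest rate", "finance"),
    ("we walked along the bank of the river", "river"),
    ("the river bank was muddy", "river"),
]
texts, senses = zip(*train)

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# A linear-kernel SVM keeps the sketch simple
clf = SVC(kernel="linear")
clf.fit(X, list(senses))

test = vectorizer.transform(["interest on the money in my bank account"])
predicted = clf.predict(test)[0]
print(predicted)
```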
Supervised Approaches –
Comparisons
Approach              | Avg. Precision | Avg. Recall    | Corpus                                                | Avg. Baseline Accuracy
Naïve Bayes           | 64.13%         | not reported   | Senseval-3 All Words Task                             | 60.90%
Decision Lists        | 96%            | not applicable | Tested on a set of 12 highly polysemous English words | 63.9%
Exemplar-based (k-NN) | 68.6%          | not reported   | WSJ6 containing 191 content words                     | 63.7%
SVM                   | 72.4%          | 72.4%          | Senseval-3 Lexical Sample task (57 words)             | 55.2%

Supervised Approaches –Conclusions
• General Comments
• Use corpus evidence instead of relying on dictionary defined senses.
• Can capture important clues provided by proper nouns because proper nouns do appear in a corpus.

• Naïve Bayes
• Suffers from data sparseness.
• Since the scores are a product of probabilities, some weak features might pull down the overall score for a sense.
• A large number of parameters need to be trained.

• Decision Lists
• A word-specific classifier. A separate classifier needs to be trained for each word.
• Uses the single most predictive feature which eliminates the drawback of Naïve Bayes.

Supervised Approaches –Conclusions
• Exemplar Based K-NN
• A word-specific classifier.
• Will not work for unknown words which do not appear in the corpus.
• Uses a diverse set of features (including morphological and noun-subject-verb pairs)

• SVM
• A word-sense specific classifier.
• Gives the highest improvement over the baseline accuracy.
• Uses a diverse set of features.

ROADMAP

• Knowledge Based Approaches


• WSD using Selectional Preferences (or restrictions)
• Overlap Based Approaches
• Machine Learning Based Approaches
• Supervised Approaches
• Unsupervised Algorithms
• Semi-supervised Algorithms
• Hybrid Approaches
• Reducing Knowledge Acquisition Bottleneck

Unsupervised

• Progress in word sense disambiguation is stymied by the dearth of labeled training data to train a

classifier for every sense of each word in a given language. There are a few solutions to this problem:

1. Devise a way to cluster instances of a word so that each cluster effectively constrains the examples of
the word to a certain sense. This could be considered sense induction through clustering.

2. Use some metrics to identify the proximity of a given instance with some sets of known senses of a
word and select the closest to be the sense of that instance.

3. Start with seeds of examples of certain senses, then iteratively grow them to form clusters.
NOTE:
We do not discuss in much detail the mostly clustering-based sense induction methods here. We assume that there is already a predefined sense inventory for a word and that the unsupervised methods use very few, if any, hand-annotated examples, and then attempt to classify unseen test instances into one of their predetermined sense categories.
WSD USING CONCEPTUAL DENSITY

• Select a sense based on the relatedness of that word-sense to the


context.

• Relatedness is measured in terms of conceptual distance


• (i.e. how close the concept represented by the word and the concept represented by its context words are)

• This approach uses a structured hierarchical semantic net (WordNet) for


finding the conceptual distance.

• The smaller the conceptual distance, the higher the conceptual density (i.e. if all words in the context are strong indicators of a particular concept, then that concept will have a higher density).
CONCEPTUAL DENSITY (EXAMPLE)
 The dots in the figure represent
the senses of the word to be
disambiguated or the senses of
the words in context.

 The CD formula will yield


highest density for the sub-
hierarchy containing more
senses.

 The sense of W contained in


the sub-hierarchy with the
highest CD will be chosen.

CONCEPTUAL DENSITY (EXAMPLE)

[Figure: WordNet sub-hierarchies built for the sentence below. The sub-hierarchy rooted at administrative_unit (containing division, committee, department, government department, local department, police department, jury, operation, administration) has CD = 0.256; the sub-hierarchy rooted at body has CD = 0.062.]

The jury(2) praised the administration(3) and operation(8) of Atlanta Police Department(1).

Step 1: Make a lattice of the nouns in the context, their senses and hypernyms.
Step 2: Compute the conceptual density of resultant concepts (sub-hierarchies).
Step 3: The concept with the highest CD is selected.
Step 4: Select the senses below the selected concept as the correct sense for the respective words.
The DURel Annotation Tool: Human and Computational Measurement of Semantic Proximity, Sense Clusters and Semantic Change (December 2023)

https://www.changeiskey.org/publication/2023-durel-tool/

For example, consider the adjacent figure, which shows the clusters produced for the word bagage (Eng: 'luggage'). As can be seen, the usages were clustered into three main clusters (colored blue, orange, and green). The blue and orange clusters represented the literal and figurative usages of the word respectively, while one of the usages was wrongly put into a third cluster.

https://www2.ims.uni-stuttgart.de/video/durel-tool/230623-durel-tool-demo.mp4
SenseClusters Generation (Cluster WordNet synsets using
Lin Similarity for WSD)
#!/usr/bin/env python
import nltk
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
nltk.download('wordnet_ic')

def cluster_senses(word, context, threshold=0.30,
                   ic_corpus=wordnet_ic.ic('ic-treebank.dat')):
    """
    Cluster WordNet synsets using Lin Similarity for WSD.
    """
    synsets = wn.synsets(word, pos=wn.VERB)
    clusters = [[synsets.pop()]]
    for s in synsets:
        added = False
        for c in clusters:
            for ss in c:
                if s.lin_similarity(ss, ic_corpus) > threshold:
                    c.append(s)
                    added = True
                    break
            if added:
                break
        if not added:
            clusters += [[s]]
    return clusters

# Example usage
context = "Book that flight"
result_clusters = cluster_senses('book', context, threshold=0.30)

# Print the resulting clusters
for cluster in result_clusters:
    print(cluster)
ROADMAP
• Knowledge Based Approaches
• WSD using Selectional Preferences (or restrictions)
• Overlap Based Approaches
• Machine Learning Based Approaches
• Supervised Approaches
• Unsupervised Algorithms
• Semi-supervised Algorithms
• Reducing Knowledge Acquisition Bottleneck
• Summary

SEMI-SUPERVISED DECISION LIST ALGORITHM

• Based on Yarowsky’s supervised algorithm that uses Decision Lists.

• Step1: Train the Decision List algorithm using a small amount of seed data.

• Step2: Classify the entire sample set using the trained classifier.

• Step3: Create new seed data by adding those members which are tagged as Sense-A or Sense-B with high
probability.

• Step4: Retrain the classifier using the increased seed data.

• Exploits “One sense per discourse” property


• Identify words that are tagged with low confidence and label them with the sense which is dominant for that document.
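Steps 1-4 can be caricatured with a toy cue-set classifier standing in for the full decision list (all sentences and seed collocations are invented):

```python
# Toy bootstrapping in the spirit of the semi-supervised decision list:
# a simple cue-set overlap replaces the trained decision-list classifier.
unlabeled = [
    "bass swim in the river",
    "bass guitar solo",
    "caught a bass in the lake",
    "bass player on stage",
]
seeds = {"fish": {"river"}, "music": {"guitar"}}  # Step 1: seed cues

for _ in range(3):                                # Steps 2-4, iterated
    newly_labelled = {}
    for sent in unlabeled:
        words = set(sent.split())
        scores = {s: len(words & cues) for s, cues in seeds.items()}
        best = max(scores, key=scores.get)
        # keep only high-confidence taggings: cues for exactly one sense
        if scores[best] > 0 and min(scores.values()) == 0:
            newly_labelled[sent] = best
    for sent, sense in newly_labelled.items():
        seeds[sense] |= set(sent.split()) - {"bass"}  # grow the seed data
        unlabeled.remove(sent)

print(seeds)
print("residual:", unlabeled)
```

Each round labels the confidently scored sentences and folds their words back into the seed sets, so the residual set shrinks until it stabilizes.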
Initialization, Progress and Convergence

[Figure: bootstrapping over the residual data for two senses ("Life" vs. "Manufacturing"): the seed set grows with each iteration; stop when the residual set stabilizes.]

CFILT - IITB
Semi-Supervised Approaches – Comparisons & Conclusions

Approach                       | Average Precision | Corpus                                                | Average Baseline Accuracy
Supervised Decision Lists      | 96.1%             | Tested on a set of 12 highly polysemous English words | 63.9%
Semi-Supervised Decision Lists | 96.1%             | Tested on a set of 12 highly polysemous English words | 63.9%

 Works at par with its supervised version even though it needs significantly less tagged data.
 Has all the advantages and disadvantages of its supervised version.
ROADMAP
• Knowledge Based Approaches
• WSD using Selectional Preferences (or restrictions)
• Overlap Based Approaches
• Machine Learning Based Approaches
• Supervised Approaches
• Semi-supervised Algorithms
• Unsupervised Algorithms
• Hybrid Approaches
• Reducing Knowledge Acquisition Bottleneck
• Summary

96

CFILT - IITB 96
Overcoming Knowledge Bottleneck

Using Search Engines
• Construct search queries using monosemic words and phrases from the gloss of a synset.
• Feed these queries to a search engine.
• From the retrieved documents, extract the sentences which contain the search queries.

Using Equivalent Pseudo Words


• Use monosemic words belonging to each sense of an ambiguous
word.
• Use the occurrences of these words in the corpus as training
examples for the ambiguous word.

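A sketch of the pseudoword idea: conflate two monosemous words into one artificial ambiguous word, so the original word identity serves as a free sense label (the corpus sentences are invented):

```python
# Pseudoword sketch: conflate the monosemous words "banana" and "door"
# into the artificial ambiguous word "banana-door"; the replaced word
# becomes the (free) sense label for that training example.
corpus = [
    "she peeled the banana slowly",
    "he painted the door green",
    "a ripe banana on the table",
    "close the door quietly",
]

training = []
for sent in corpus:
    for real_word in ("banana", "door"):
        if real_word in sent.split():
            pseudo = sent.replace(real_word, "banana-door")
            training.append((pseudo, real_word))  # label = the true word

for example in training:
    print(example)
```

A classifier trained to recover the true word from the pseudoword's context can then be evaluated without any hand-tagged data.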
ROADMAP
• Knowledge Based Approaches
• WSD using Selectional Preferences (or restrictions)
• Overlap Based Approaches
• Machine Learning Based Approaches
• Supervised Approaches
• Semi-supervised Algorithms
• Unsupervised Algorithms
• Hybrid Approaches
• Reducing Knowledge Acquisition Bottleneck
• Summary

SUMMARY

• Dictionary defined senses do not provide enough surface cues.

• Complete dependence on dictionary defined senses is the primary reason for low accuracies in Knowledge
Based approaches.

• Extracting “sense definitions” or “usage patterns” from the corpus greatly improves the accuracy.

• Word-specific classifiers are able to attain extremely good accuracies but suffer from the problem of non-reusability.

• Unsupervised algorithms are capable of performing at par with supervised algorithms.

• Relying on single most predictive evidence increases the accuracy.

SUMMARY (CONTD.)

• Classifiers that exploit syntactic dependencies between words are able to perform large scale
disambiguation (generic classifiers) and at the same time give reasonably good accuracies.

• Using a diverse set of features improves WSD accuracy.

• WSD results are better when the degree of polysemy is reduced.

• SSI looks promising for resource-poor Indian languages.

