Unit V: Discourse Analysis and Lexical Resources
Discourse Segmentation
Discourse segmentation is the process of dividing a text or discourse into meaningful units
such as sentences, paragraphs, or subtopics. These segments help in understanding the
structure and flow of the discourse. It is crucial in applications like summarization, sentiment
analysis, and dialogue systems.
Anaphora Resolution
Anaphora resolution is the task of determining the antecedent (the word or phrase to which
an anaphor refers).
Two notable algorithms used for this are:
1. Hobbs' Algorithm:
o A syntactic approach for pronoun resolution.
o Traverses a parse tree of the sentence and searches for antecedents in a
systematic manner.
o Steps involve walking up and down the parse tree to find the most likely
antecedent.
2. Centering Algorithm:
o Focuses on local coherence in a discourse.
o Uses centers (entities that a discourse is about) to resolve references:
Forward-looking centers: entities mentioned in the current utterance that may
be referred to in subsequent utterances.
Backward-looking center: the most highly ranked entity from the previous
utterance that is realized in the current utterance.
o Attempts to maintain continuity by preferring references to entities that were
prominent in the previous utterance.
Coreference Resolution
Coreference resolution is the broader task of identifying when multiple expressions in a text
refer to the same entity. It includes:
Resolving anaphors (e.g., "he" in "John said he would come").
Linking nominal references (e.g., "the professor" and "Dr. Smith").
DISCOURSE SEGMENTATION
Discourse segmentation is the process of dividing a discourse into smaller units, such as
sentences, clauses, paragraphs, or topics, to understand its structure and meaning more
effectively. These units, called discourse segments, help in identifying boundaries and
organizing the information for easier interpretation.
Types of Discourse Segmentation
1. Sentence-Level Segmentation: Splits a discourse into individual sentences.
2. Topic-Based Segmentation: Divides the discourse into sections based on the theme
or subject.
3. Dialogue Segmentation: Segments dialogues into conversational turns or speaker
utterances.
Example of Discourse Segmentation
Input Text:
"John went to the store. He bought some apples. Later, he decided to bake a pie."
Segmented Discourse:
1. Sentence 1: "John went to the store."
2. Sentence 2: "He bought some apples."
3. Sentence 3: "Later, he decided to bake a pie."
Each segment conveys a meaningful unit of information, contributing to the overall coherence
of the discourse.
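The sentence-level segmentation above can be sketched with a naive splitter. This is a simplified sketch (the function name segment_sentences is illustrative); production segmenters such as NLTK's punkt tokenizer also handle abbreviations, quotations, and other edge cases.

```python
import re

def segment_sentences(text):
    """Naively split a discourse into sentence segments.

    Splits on '.', '!', or '?' followed by whitespace; real
    segmenters must also handle abbreviations, quotes, etc.
    """
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]

text = ("John went to the store. He bought some apples. "
        "Later, he decided to bake a pie.")
for i, sentence in enumerate(segment_sentences(text), 1):
    print(f"Sentence {i}: {sentence}")
```

Running this reproduces the three segments shown above.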
Coherence
Coherence refers to the logical flow and connection between segments in a discourse. A
coherent discourse enables the reader or listener to understand the relationships between ideas
and follow the progression of thought.
Types of Coherence
1. Local Coherence: Connections between adjacent sentences or segments.
2. Global Coherence: Logical consistency across the entire discourse.
Devices for Achieving Coherence
1. Referential Devices
Definition: Use of pronouns or phrases to refer back to earlier elements in a discourse.
Example:
Text: John bought a new car. He is very happy with it.
o Explanation:
He refers back to John.
It refers back to a new car.
These pronouns maintain coherence by avoiding repetition.
2. Lexical Cohesion
Definition: Repetition of key terms or use of synonyms to link ideas.
Example:
Text: The project was challenging. Despite the difficulties, the team embraced the
challenge and completed it successfully.
o Explanation:
Challenging and challenge are repetitions of a key term.
Difficulties is a near-synonym, reinforcing the theme of overcoming
obstacles.
3. Ellipsis
Definition: Omitting repeated words or phrases to avoid redundancy while ensuring
meaning remains clear.
Example:
Text: James likes pizza, and Sarah does too.
o Explanation:
The phrase likes pizza is omitted after Sarah does, as it is implied.
This omission avoids redundancy while maintaining clarity and coherence.
Example of Coherence
Coherent Text:
"Anna loves gardening. She spends her weekends planting flowers and vegetables. Her
garden is admired by her neighbors."
Coherence Explanation:
o The subject (Anna and her gardening activities) is maintained throughout.
o Pronoun "she" refers back to "Anna", ensuring cohesion.
o The sentences logically follow each other, expanding on Anna’s gardening.
Incoherent Text:
"Anna loves gardening. The weather was cold last week. Many people travel during
holidays."
Lack of Coherence:
o There is no logical connection between the sentences.
o The topic shifts abruptly from gardening to weather to holidays.
REFERENCE PHENOMENA
Reference phenomena are linguistic mechanisms used to link words or expressions within
a discourse, enabling coherence and meaning. Referring expressions (anaphors or
cataphors) point back or forward to the entities they denote (their referents).
1. Anaphora:
o Refers back to an element mentioned earlier in the discourse. Example: "John
arrived. He sat down." (He refers back to John.)
2. Cataphora:
o Refers forward to an element mentioned later. Example: "Before he spoke, John
cleared his throat." (he refers forward to John.)
3. Exophora:
o Refers to something outside the text, in the situational context. Example: "Look
at that!" said while pointing at an object.
4. Endophora:
o Refers to elements within the discourse. Includes both anaphora and cataphora.
Anaphora Resolution
Anaphora resolution involves identifying the antecedent (the referent) for a given anaphor.
1. Hobbs’ Algorithm
o A syntactic approach: start at the NP node of the anaphor, walk up the parse
tree, and search the preceding branches (and, if necessary, earlier sentences)
left to right, breadth-first, for a compatible antecedent.
Example:
Input Text: "John left early because he was tired."
o Anaphor: he.
o Antecedent: John.
o Hobbs’ algorithm identifies John as the referent by traversing the tree and
matching syntactic and semantic features.
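The feature-matching idea behind this example can be sketched without a parser. This is a simplified, recency-based sketch (the names resolve_pronoun and PRONOUN_FEATURES are illustrative); Hobbs' algorithm proper traverses a full parse tree rather than a flat candidate list.

```python
# A minimal recency-based pronoun resolver: pick the nearest preceding
# candidate whose gender/number features are compatible with the pronoun.
PRONOUN_FEATURES = {
    "he": ("male", "singular"), "she": ("female", "singular"),
    "it": ("neuter", "singular"), "they": (None, "plural"),
}

def resolve_pronoun(pronoun, candidates):
    """candidates: list of (name, gender, number), in order of mention."""
    gender, number = PRONOUN_FEATURES[pronoun.lower()]
    # Search from the most recent mention backwards.
    for name, cand_gender, cand_number in reversed(candidates):
        if (gender is None or cand_gender == gender) and cand_number == number:
            return name
    return None

# "John left early because he was tired."
print(resolve_pronoun("he", [("John", "male", "singular")]))  # John
```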
2. Centering Algorithm
o Rank the entities in each utterance by grammatical role (subjects > objects >
others); these are the forward-looking centers.
o Determine the backward-looking center: the most highly ranked entity from the
previous utterance that is realized in the current one.
Example:
Text: "Mary went to the store. She bought apples. The apples were fresh."
Step-by-Step Process:
o Utterance 1: forward-looking centers {Mary, store}; Mary ranks highest (subject).
o Utterance 2: "She" realizes Mary, so Mary is the backward-looking center; the
forward-looking centers are {Mary, apples}.
o Utterance 3: "The apples" realizes apples, so the backward-looking center
becomes apples.
Coherence:
The backward-looking center shifts smoothly from Mary to apples as the discourse
progresses, maintaining logical flow and coherence.
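The center tracking in this example can be sketched as follows. This is a simplified sketch (the name backward_looking_centers is illustrative): utterances are given as pre-extracted, prominence-ranked entity lists rather than parsed text.

```python
def backward_looking_centers(utterances):
    """For each utterance, compute the backward-looking center (Cb):
    the highest-ranked entity of the previous utterance that is
    realized in the current one. Each utterance is a list of entities
    ranked by prominence (subject first)."""
    cbs = [None]  # the first utterance has no backward-looking center
    for prev, curr in zip(utterances, utterances[1:]):
        cb = next((e for e in prev if e in curr), None)
        cbs.append(cb)
    return cbs

# "Mary went to the store. She bought apples. The apples were fresh."
utterances = [
    ["Mary", "store"],   # U1
    ["Mary", "apples"],  # U2: "She" realizes Mary
    ["apples"],          # U3
]
print(backward_looking_centers(utterances))  # [None, 'Mary', 'apples']
```

The output mirrors the step-by-step process: no center for the first utterance, then Mary, then a shift to apples.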
COREFERENCE RESOLUTION
Steps in Coreference Resolution
1. Identify Mentions:
o Locate all noun phrases or pronouns that could refer to an entity.
o Example: "John loves his dog."
Mentions: John, his, dog.
2. Extract Features:
o Analyze features like gender, number, semantics, and syntactic position.
o Example:
John (male, singular) matches he (male, singular).
3. Create Candidate Chains:
o Group potential references into chains based on matching features.
o Example:
Chain: {John, he}.
4. Resolve Coreference:
o Use algorithms or rules to determine the correct antecedent for each pronoun
or noun phrase.
Approaches to Coreference Resolution
1. Rule-Based Methods:
o Use hand-crafted rules based on linguistic knowledge.
o Example:
A pronoun like he typically refers to the nearest preceding male noun
phrase.
2. Machine Learning Approaches:
o Use supervised learning to train models on annotated datasets.
o Features include syntactic roles, word embeddings, and distance between
mentions.
3. Neural Network Models:
o Leverage deep learning to capture complex relationships.
o Example: Transformers like BERT can model contextual relationships in
coreference tasks.
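Steps 1-4 can be sketched as a toy rule-based chain builder. This is a simplified sketch (the name build_chains is illustrative): mention detection and feature extraction are assumed to have been done already, and matching is limited to gender and number.

```python
def build_chains(mentions):
    """Group mentions into coreference chains by matching gender and
    number features. mentions: list of (text, gender, number) tuples
    in order of appearance in the text."""
    chains = []
    for text, gender, number in mentions:
        for chain in chains:
            _, g, n = chain[-1]
            if g == gender and n == number:  # features compatible
                chain.append((text, gender, number))
                break
        else:  # no compatible chain found: start a new one
            chains.append([(text, gender, number)])
    return [[m[0] for m in chain] for chain in chains]

# "John loves his dog."  ->  chains {John, his} and {dog}
mentions = [
    ("John", "male", "singular"),
    ("his", "male", "singular"),
    ("dog", "neuter", "singular"),
]
print(build_chains(mentions))  # [['John', 'his'], ['dog']]
```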
Examples of Coreference
Text: "The team worked hard. Their efforts paid off in the end."
o Their refers back to The team, giving the coreference chain {The team, Their}.
RESOURCES
In the context of Natural Language Processing (NLP), resources refer to tools, datasets,
and frameworks that help in various tasks like text analysis, machine learning, and
language understanding. These resources provide foundational data and methods for
processing, analyzing, and understanding language.
1. Porter Stemmer
What It Is:
A stemming algorithm that reduces words to their base or root form by removing
common suffixes.
Why It’s Used:
Helps in text normalization by treating inflected forms like "running" and "runs"
as the same root word, "run."
Example:
o Input: "playing, played, plays"
o Output: "play"
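Using NLTK's implementation of the Porter stemmer (this assumes the nltk package is installed; no corpus downloads are needed for stemming):

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["playing", "played", "plays", "running"]:
    # Each inflected form reduces to its stem by suffix stripping.
    print(word, "->", stemmer.stem(word))
# playing -> play, played -> play, plays -> play, running -> run
```

Note that the stem need not be a dictionary word; that is the job of a lemmatizer, described next.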
2. Lemmatizer
What It Is:
A tool that converts words to their dictionary base form (lemma), considering the
word’s meaning and context.
Why It’s Used:
Unlike stemming, it ensures that the base form is a real word and more linguistically
accurate.
Example:
o Input: "better" (tagged as an adjective)
o Output: "good"
3. Penn Treebank
What It Is:
A dataset containing annotated syntactic structures and part-of-speech tags for
English text.
Why It’s Used:
Provides a standardized corpus for training and testing NLP models.
Example Annotation:
o Sentence: "The dog barked loudly."
o POS Tags: [The/DT dog/NN barked/VBD loudly/RB]
4. Brill's Tagger
What It Is:
A rule-based part-of-speech (POS) tagger developed by Eric Brill.
Why It’s Used:
Assigns grammatical tags to words based on contextual rules.
Example:
o Sentence: "She runs fast."
o POS Tags: She/PRP runs/VBZ fast/RB
5. WordNet
What It Is:
A lexical database for the English language that groups words into synsets (synonym
sets) and provides relationships like hypernyms (broader terms) and hyponyms
(specific terms).
Why It’s Used:
Supports tasks like word sense disambiguation and semantic analysis.
Example:
o Word: "car"
o Synsets: {automobile, motorcar}
o Hypernym: "vehicle"
6. PropBank
What It Is:
A corpus annotated with information about verb arguments and their roles in
sentences (semantic role labeling).
Why It’s Used:
Enables understanding of sentence semantics by identifying "who did what to
whom."
Example:
o Sentence: "John gave Mary a book."
o Roles: John (giver), Mary (recipient), book (object)
7. FrameNet
What It Is:
A database that groups words into semantic frames—concepts that capture the
relationships between words in context.
Why It’s Used:
Helps in understanding the broader context of sentences.
Example:
o Frame: Commerce_buy
o Sentence: "She bought a car from the dealer."
o Roles: Buyer (She), Goods (car), Seller (dealer)
8. Brown Corpus
What It Is:
One of the first large, annotated corpora of English text, covering diverse genres like
fiction, news, and academic writing.
Why It’s Used:
Provides a standard dataset for linguistic analysis and training language models.
Example:
o Contains over 1 million words categorized into genres like news and editorials.
9. British National Corpus (BNC)
What It Is:
A 100-million-word text corpus that represents modern British English from various
contexts like books, conversations, and broadcasts.
Why It’s Used:
Useful for understanding language trends, dialects, and usage patterns in British
English.
Example:
o Provides frequency counts of word usage, e.g., "colour" is more common than
"color."
10. NLTK (Natural Language Toolkit)
NLTK is one of the most popular libraries in Python for NLP. It includes a variety of
useful modules for tasks like tokenization, stemming, lemmatization, POS tagging, and
more.
Steps:
1. Install NLTK with pip: pip install nltk
2. Import the library in Python: import nltk
3. After installing NLTK, download the datasets you need (like WordNet, Brown
Corpus, etc.) with nltk.download().
Note: The nltk.download('all') command will download all datasets, which could take
considerable time and disk space. If you only need specific resources, download them
individually, e.g., nltk.download('wordnet').