Chapter 1
Natural Language Processing (NLP)
5/30/2020 25
selected topic chapter 1
References in a given text can be anaphoric or coreferential noun phrases.
The task of reference resolution is to determine which noun phrases refer to
each real world entity mentioned in the text.
Anaphora:
An expression α1 is in an anaphoric relation with expression α2 if and only
if the interpretation of α1 depends on α2.
The relation holds within a text.
Coreference:
Two expressions α1 and α2 are coreferential if and only if
Referent (α1) = Referent (α2).
The expressions can be in the same text or in different texts, in the same
language or in different languages.
Some expressions are both coreferential and anaphoric.
Example: A bus had to divert to the local hospital when one of the
passengers had a heart attack. It got to the hospital in time and the man’s life
was saved.
Coreferential: {the local hospital, the hospital}, {bus, it},
{one of the passengers, the man}
Anaphoric: {it}
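The bus example can be written down as data. The sketch below stores each coreference chain as a set of mentions and marks the anaphoric mention separately. The chains are typed in by hand from the example, and the helper names `coreferential` and `antecedents` are illustrative assumptions; this is not a resolution algorithm.

```python
# Coreference chains from the bus example, entered by hand.
# Each chain groups the mentions that refer to the same real-world entity.
chains = [
    {"bus", "it"},
    {"the local hospital", "the hospital"},
    {"one of the passengers", "the man"},
]

# Mentions whose interpretation depends on an earlier expression (from the example).
anaphoric = {"it"}


def coreferential(m1, m2):
    """Return True if both mentions belong to the same chain."""
    return any(m1 in chain and m2 in chain for chain in chains)


def antecedents(mention):
    """Return the other mentions in the chain of an anaphoric mention."""
    if mention not in anaphoric:
        return set()
    for chain in chains:
        if mention in chain:
            return chain - {mention}
    return set()


print(coreferential("the local hospital", "the hospital"))  # True
print(coreferential("it", "the man"))                        # False
print(antecedents("it"))                                     # {'bus'}
```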
Pragmatics is the study of how linguistic properties and contextual factors
interact in the interpretation of utterances, enabling hearers to bridge the gap
between sentence meaning and speaker’s meaning.
6. Disambiguation
A text is said to be ambiguous if multiple or alternative linguistic structures can be built
for it.
For example, given the following lexical entry in a lexicon
Ambiguity may occur at:
Phonological level - multiple orthographic representations
Morphological level - multiple word classes
Syntactic level - different ways to parse the tree
Semantic level - different meanings of the same parse tree
Discourse level - different references of the same anaphora
Pragmatic level - cannot be clearly interpreted
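Syntactic ambiguity can be made concrete by counting parse trees. The sketch below is a minimal CYK-style counter over a toy grammar in Chomsky normal form. The grammar, the lexicon, and the function name `count_parses` are assumptions made for this illustration only.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (an assumption for this sketch).
# Each entry is (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": ["NP"],
    "saw": ["V"],
    "the": ["Det"],
    "a": ["Det"],
    "man": ["N"],
    "telescope": ["N"],
    "with": ["P"],
}


def count_parses(tokens, start="S"):
    """Count the distinct parse trees of tokens rooted in the start symbol."""
    n = len(tokens)
    # chart[i][j][A] = number of ways category A derives tokens[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, tok in enumerate(tokens):
        for cat in LEXICON.get(tok, []):
            chart[i][i + 1][cat] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right in BINARY:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]


print(count_parses("I saw the man".split()))                   # 1
print(count_parses("I saw the man with a telescope".split()))  # 2
```

The second sentence has two parses because the prepositional phrase can attach either to the verb phrase (the seeing was done with a telescope) or to the noun phrase (the man has a telescope).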
Approaches to NLP
1. Rule-based Approach
Rule-based systems are based on explicit representation of facts about language through
well-understood knowledge representation schemes and associated algorithms.
Rule-based systems usually consist of a set of rules, an inference engine, and a workspace
or working memory.
Knowledge is represented as facts or rules in the rule-based approach.
The inference engine repeatedly selects a rule whose condition is satisfied and
executes the rule.
The primary source of evidence in rule-based systems comes from human-developed rules
(e.g. grammatical rules) and lexicons.
Rule-based approaches have been used for tasks such as information extraction, text
categorization, ambiguity resolution, and so on.
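The three components named above (rules, inference engine, working memory) can be sketched in a few lines. The rules and fact strings below are hypothetical and chosen only to show the select-and-execute cycle; a real system would use a much richer rule language.

```python
# Each rule is (name, set of condition facts, fact to add when the rule fires).
# The rules are hypothetical examples for deciding the tag of the word "book".
RULES = [
    ("R1", {"word=book", "prev_tag=DET"}, "tag=NOUN"),
    ("R2", {"word=book", "prev_tag=PRON"}, "tag=VERB"),
    ("R3", {"prev_tag=DET", "tag=NOUN"}, "phrase=NP"),
]


def run(facts, rules=RULES):
    """Forward chaining: fire one satisfied rule at a time until none applies."""
    memory = set(facts)  # working memory
    fired = []           # names of the rules that fired, in order
    while True:
        for name, conditions, conclusion in rules:
            # A rule is applicable if all its conditions are in working
            # memory and it would add something new.
            if conditions <= memory and conclusion not in memory:
                memory.add(conclusion)
                fired.append(name)
                break
        else:
            # No rule was applicable in this pass, so inference stops.
            return memory, fired


memory, fired = run({"word=book", "prev_tag=DET"})
print(fired)                  # ['R1', 'R3']
print("tag=NOUN" in memory)   # True
```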
2. Statistical Approach
Statistical approaches employ various mathematical techniques and often use
large text corpora to develop approximate generalized models of linguistic
phenomena based on actual examples of these phenomena provided by the
text corpora without adding significant linguistic or world knowledge.
The primary source of evidence in statistical systems comes from observable
data (e.g. large text corpora).
Statistical approaches have typically been used in tasks such as speech
recognition, parsing, part-of-speech tagging, statistical machine translation,
statistical grammar learning, and so on.
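As a minimal sketch of learning from observable data, the code below builds a most-frequent-tag part-of-speech tagger from a tiny hand-made tagged corpus. The corpus, the tag names, and the default tag for unknown words are assumptions for illustration; real statistical taggers use far larger corpora and context-sensitive models.

```python
from collections import Counter, defaultdict

# Tiny hand-made tagged corpus (an assumption for this sketch).
TAGGED = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("book", "NOUN"), ("fell", "VERB")],
    [("they", "PRON"), ("book", "VERB"), ("flights", "NOUN")],
    [("a", "DET"), ("book", "NOUN"), ("helps", "VERB")],
]


def train(corpus):
    """Map each word to the tag it carries most often in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag(words, model, default="NOUN"):
    """Tag each word with its most frequent tag; unknown words get the default."""
    return [(w, model.get(w, default)) for w in words]


model = train(TAGGED)
print(tag(["the", "book"], model))  # [('the', 'DET'), ('book', 'NOUN')]
```

No linguistic rule says that "book" is a noun here. The model prefers NOUN only because that tag occurs twice for "book" in the corpus and VERB once.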
3. Connectionist Approach
A connectionist model is a network of interconnected simple processing
units with knowledge stored in the weights of the connections between units.
Similar to the statistical approaches, connectionist approaches also develop
generalized models from examples of linguistic phenomena.
What separates connectionism from other statistical methods is that
connectionist models combine statistical learning with various theories of
representation.
Connectionist approaches have been used in tasks such as word-sense
disambiguation, language generation, syntactic parsing, limited domain
translation tasks, and so on.
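The idea of knowledge stored in connection weights can be illustrated with the simplest possible network, a single perceptron unit. In this sketch the unit learns to flag a sentence as a question from two binary features. The features, the training data, and the names `predict` and `train` are all assumptions made for this example.

```python
# Each example: ((starts_with_wh_word, ends_with_question_mark), is_question).
# The features and labels are hypothetical; the label is 1 if either cue is present.
DATA = [
    ((0, 0), 0),
    ((0, 1), 1),
    ((1, 0), 1),
    ((1, 1), 1),
]


def predict(weights, bias, x):
    """Threshold unit: output 1 if the weighted sum of inputs is positive."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0


def train(data, epochs=10, lr=1):
    """Perceptron learning rule: adjust weights only when a prediction is wrong."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for x, target in data:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


weights, bias = train(DATA)
print([predict(weights, bias, x) for x, _ in DATA])  # [0, 1, 1, 1]
```

After training, what the unit has learned about questions is stored only in the numbers `weights` and `bias`, not in any explicit rule.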
Applications of NLP
1. Spelling Correction and Grammar Checking
Spelling correction is the process of detecting incorrectly spelled words in a
text and, in some cases, suggesting replacements.
A spell checker is an application program that flags words in a
document that may not be spelled correctly.
A grammar checker is an application program that checks whether a
sentence is constructed correctly, for example whether the subject and verb
agree.
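A spell checker of the kind described above can be sketched with a word list and an edit-distance function. The sketch below uses Levenshtein distance, which is one common choice and not necessarily what any particular product uses. The word list, the threshold `max_dist`, and the function names are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


# Hypothetical word list standing in for a real lexicon.
WORDS = ["hospital", "passenger", "language", "grammar", "spelling"]


def suggest(word, lexicon=WORDS, max_dist=2):
    """Return no suggestions for a known word, else the closest lexicon words."""
    if word in lexicon:
        return []
    scored = sorted((edit_distance(word, w), w) for w in lexicon)
    return [w for dist, w in scored if dist <= max_dist]


print(edit_distance("kitten", "sitting"))  # 3
print(suggest("grammer"))                  # ['grammar']
print(suggest("hospital"))                 # []
```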
2. Information Retrieval provides a list of potentially relevant documents in
response to a user’s query.
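The ranking step of retrieval can be sketched with TF-IDF weighting, a standard scheme in which a query term counts for more the rarer it is across the collection. The three documents, their identifiers, and the function name `rank` are assumptions for this example.

```python
import math
from collections import Counter

# Hypothetical three-document collection.
DOCS = {
    "d1": "machine translation converts text from one language to another",
    "d2": "speech recognition converts spoken words into text",
    "d3": "a spell checker flags words that may be misspelled",
}


def rank(query, docs=DOCS):
    """Return ids of documents sharing a query term, best TF-IDF score first."""
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for toks in tokenized.values() for t in set(toks))
    scores = {}
    for d, toks in tokenized.items():
        tf = Counter(toks)
        score = sum(
            tf[t] * math.log(n / df[t])
            for t in query.lower().split()
            if t in tf
        )
        if score > 0:
            scores[d] = score
    return sorted(scores, key=scores.get, reverse=True)


print(rank("speech recognition"))           # ['d2']
print(rank("spoken language translation"))  # ['d1', 'd2']
```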
3. Information Extraction focuses on the recognition, tagging, and extraction of certain key
elements of information (e.g. persons, companies, locations, organizations, etc.) from large
collections of text into a structured representation.
It has the following subtasks:
Named Entity Recognition: recognition of entity names.
Relation Detection and Classification: identification of relations between entities.
Coreference and Anaphora Resolution: resolving references to previously mentioned entities.
Temporal and Event Processing: recognizing temporal expressions and analyzing
events.
Template Filling: filling the slots of a predefined template with the extracted information.
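Several of these subtasks can be shown together in a deliberately naive sketch: one regular expression recognizes a person, an organization, and a year, and the matches fill a template. The pattern, the sentence form "<Person> joined <Organization> in <Year>", and all names in the sample text are hypothetical; real extraction systems do not rely on a single hand-written pattern.

```python
import re

# Hypothetical pattern for sentences of the form
# "<Person> joined <Organization> in <Year>".
PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)*) joined "
    r"(?P<organization>[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*) "
    r"in (?P<year>\d{4})"
)


def fill_templates(text):
    """Return one filled template (a dict of slots) per match in the text."""
    return [m.groupdict() for m in PATTERN.finditer(text)]


# Invented sample text, used only to exercise the pattern.
text = ("Abebe Kebede joined Acme Corp in 2015. "
        "Later, Sara Tesfaye joined Globex in 2018.")

for template in fill_templates(text):
    print(template)
```

Each printed dictionary is a structured record with the slots `person`, `organization`, and `year`, which is the kind of representation the extracted information ends up in.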
4. Machine Translation is the automatic translation of text from one language to
another.
5. Question-Answering provides the user with either just the text of the answer
itself or answer-providing passages.
6. Dialogue Systems are agents that converse with human beings in a coherent
structure using several modes of communication such as text, speech, and gesture.
7. Text Summarization reduces a larger text into a shorter, yet richly
constituted representation of the original document.
8. Speech Recognition is the process of converting spoken words (acoustic
signals) into equivalent text.
9. Speech Synthesis, also known as a Text-to-Speech system, performs the reverse
process, i.e. it artificially produces human speech from a given text.
10. Optical Character Recognition (OCR) is a computerized system that
converts non-editable text (for example, scanned or photographed documents)
into machine-encoded text.
If the text to be converted is handwritten, the system is also known as
Intelligent Character Recognition (ICR).