Unit 4 notes for NLP

R22 B.Tech. CSE NLP
CS525PE Natural Language Processing (Professional Elective - II)

Prerequisites:
1. Data structures and compiler design

Course Objectives:
Introduction to some of the problems and solutions of NLP and their relation to linguistics and statistics.

Course Outcomes:
1. Show sensitivity to linguistic phenomena and an ability to model them with formal grammars.
2. Understand and carry out proper experimental methodology for training and evaluating empirical NLP systems.
3. Manipulate probabilities, construct statistical models over strings and trees, and estimate parameters using supervised and unsupervised training methods.
4. Design, implement, and analyze NLP algorithms, and design different language modelling techniques.
UNIT - I
Finding the Structure of Words: Words and Their Components, Issues and Challenges, Morphological Models
Finding the Structure of Documents: Introduction, Methods, Complexity of the Approaches, Performances of the Approaches, Features

UNIT - II
Syntax I: Parsing Natural Language, Treebanks: A Data-Driven Approach to Syntax, Representation of Syntactic Structure, Parsing Algorithms

UNIT - III
Syntax II: Models for Ambiguity Resolution in Parsing, Multilingual Issues
Semantic Parsing I: Introduction, Semantic Interpretation, System Paradigms, Word Sense

UNIT - IV
Semantic Parsing II: Predicate-Argument Structure, Meaning Representation Systems

UNIT - V
Language Modeling: Introduction, N-Gram Models, Language Model Evaluation, Bayesian Parameter Estimation, Language Model Adaptation, Language Models: Class Based, Variable Length, Bayesian Topic Based, Multilingual and Cross Lingual Language Modeling

TEXT BOOKS:
1. Multilingual Natural Language Processing Applications: From Theory to Practice - Daniel M. Bikel and Imed Zitouni, Pearson Publications.

REFERENCE BOOKS:
1. Speech and Language Processing - Daniel Jurafsky & James H. Martin, Pearson Publications.
2. Natural Language Processing and Information Retrieval - Tanveer Siddiqui, U.S. Tiwary.

mitigated by constraints like one sense per discourse.

Performance:
Studies have shown semi-supervised methods to perform well, often achieving accuracy in the mid-80% range when tested on standard datasets.

Software:
Several software programs are available for word sense disambiguation:
• IMS (It Makes Sense): a complete word sense disambiguation system.
• WordNet Similarity-2.05: WordNet similarity modules for Perl that provide a quick way of computing various word similarity measures.
• WikiRelate: a word similarity measure based on categories in Wikipedia.

Semantic Parsing II: Predicate-Argument Structure, Meaning Representation Systems

1. Predicate-Argument Structure:
Predicate-argument structure (PAS), also called semantic role labelling, is a method used to identify the roles played by different parts of a sentence.
The "predicate" is usually a verb (but can also be a noun, adjective, or preposition), and the "arguments" are the entities that participate in the action or state described by the predicate.

Example:
Consider the sentence: "The cat chased the mouse."
Predicate: chased
Arguments:
The cat (agent)
the mouse (patient)

The PAS for this sentence would be: chased(cat, mouse)
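The example above can be sketched in code. This is a minimal illustration only; the class name and role labels here are invented for the sketch and are not part of any standard library or annotation scheme.

```python
# Minimal sketch of a predicate-argument structure (PAS), using the
# "The cat chased the mouse" example. Class and role names are illustrative.

from dataclasses import dataclass

@dataclass
class PredicateArgumentStructure:
    predicate: str   # the action or state, usually a verb
    arguments: dict  # semantic role -> filler, e.g. {"agent": "cat"}

    def __str__(self):
        # Render in the compact "predicate(arg, arg)" notation used above.
        fillers = ", ".join(self.arguments.values())
        return f"{self.predicate}({fillers})"

pas = PredicateArgumentStructure(
    predicate="chased",
    arguments={"agent": "cat", "patient": "mouse"},
)
print(pas)  # chased(cat, mouse)
```

A real semantic role labeller would produce such structures automatically from raw text; here the roles are filled in by hand to show the target representation.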

1.1 Resources:
These resources help computers understand the meaning of sentences by identifying the action and who is involved.
This is important for things like translating languages, answering questions, and even helping virtual assistants understand commands better.
i) FrameNet
ii) PropBank

1.1.1 FRAMENET:
FrameNet looks at how words are used in different situations (frames) and identifies the roles that other words play in these situations.
It is based on the theory of frame semantics, which suggests that the meaning of a word can be understood in terms of the situations it describes.

Key Elements:
Frames: A frame is a type of situation or scenario. Each frame involves certain participants, which are called frame elements.
Frame Elements: These are the roles played by the different participants in a frame.
Lexical Units (LUs): These are pairs of words and their meanings (frames). Each lexical unit is a specific meaning of a word in a given frame.

Consider the word "break" in two different frames:

Frame 1: "Break" as in breaking a rule
Roles:
Breaker (the person who breaks the rule)
Rule (the rule being broken)

Frame 2: "Break" as in breaking an object
Roles:
Breaker (the person who breaks the object)
Object (the thing being broken)

Working:
1. Identify frames: Researchers identify common situations (frames).
2. Assign frame elements: Each frame has specific roles.
3. Label sentences: Sentences are tagged with these frames and frame elements to show how words are used in context.

Example:
Frame: COMMERCE_BUY
Sentence: "John bought a car from Mary for $20,000."

Frame elements:
Buyer: John
Goods: a car
Seller: Mary
Money: $20,000

1.1.2 PROPBANK:
PropBank is a corpus of texts where each verb is annotated with its arguments, giving us a clear idea of who is doing what to whom in a sentence.
This helps in understanding the roles of different entities in relation to the verb.

Key Elements:
Predicate: Usually a verb, it represents an action or state.
Arguments: The participants involved in the action or state described by the predicate. Arguments are categorized as core (essential to the meaning of the predicate) or adjunctive (providing additional information).

For the verb "operate":
Sentence: "The doctor operates the machine."
Roles:
Operator (who is operating, e.g., "The doctor")
Thing being operated (what is being operated, e.g., "The machine")

Working:
1. Annotations: PropBank annotates verbs in the Wall Street Journal section of the Penn Treebank. Each verb is tagged with its core arguments (like the subject and object) and adjunctive arguments (like time and location).
2. Framesets: Each verb has a frameset that lists the possible argument structures (roles) it can take, along with descriptions of these roles.

Example PropBank annotation:
Sentence: "John gave Mary a book"
Predicate: gave
Arguments:
ARG0 (Agent): John (the one who gives)
ARG1 (Theme): a book (the thing given)
ARG2 (Recipient): Mary (the one who receives)

In PropBank notation, this might be represented as:

[ARG0 John] [gave] [ARG2 Mary] [ARG1 a book]

Core Arguments:
These are essential participants directly involved with the predicate:
ARG0: Typically the agent or doer of the action.
ARG1: Typically the patient or theme (the entity undergoing the action).
ARG2, ARG3, ARG4: Other roles that vary depending on the verb's meaning.

Adjunctive Arguments:
These provide additional information about the action and are labelled ARGM-XYZ, where XYZ indicates the type of information:
ARGM-LOC: Location (e.g., "in the hotel")
ARGM-TMP: Time (e.g., "yesterday")
ARGM-MNR: Manner (e.g., "quickly")
ARGM-CAU: Cause (e.g., "because he was hungry")
ARGM-DIR: Direction (e.g., "to the store")
ARGM-PRP: Purpose (e.g., "to buy groceries")
ARGM-NEG: Negation (e.g., "not")
ARGM-MOD: Modality (e.g., "can," "might")

Example of a complex annotation:
Sentence: "The company operates stores mostly in Iowa and Nebraska."
Predicate: operates
Arguments:
ARG0 (Agent): The company
ARG1 (Theme): stores
ARGM-LOC (Location): mostly in Iowa and Nebraska

1.2 Other Resources:
1. NomBank
2. VerbNet

1.3 Software:
Following is a list of software packages available for semantic role labelling.
1. ASSERT (Automatic Statistical Semantic Role Tagger):
A semantic role labeller trained on the English PropBank data.
2. C-ASSERT:

An extension of ASSERT for the Chinese language.
3. SwiRL:
Another semantic role labeller trained on PropBank data.
4. Shalmaneser (A Shallow Semantic Parser):
A toolchain for shallow semantic parsing based on the FrameNet data.

2. Meaning Representation Systems:
Meaning representation is a deeper level of semantic interpretation aimed at converting natural language into a format that machines can understand and act on.
This process is similar to how programming languages are compiled into machine code that computers execute.
Unlike artificial languages, natural language is flexible and relies on context and general world knowledge for understanding, which poses a challenge for machines.
Researchers have been working for decades to develop methods to interpret and encode context and knowledge for machines.
However, current techniques are limited to specific domains and problems and do not scale well to arbitrary domains.

2.1 Resources:
1. ATIS (Air Travel Information System):
The ATIS project was one of the first major efforts to develop systems that convert natural language into a form usable by applications for decision-making. Specifically, it focused on transforming user queries about flight information into SQL queries to extract answers from a flight database.

Here's how it worked:
1. A user would ask a question in natural speech using a restricted vocabulary.
2. The system would convert this query into a hierarchical frame representation, encoding the essential semantic information.
3. This representation was then compiled into an SQL query to retrieve the required data from the database.

The ATIS training corpus included over 7,300 spoken utterances from 137 subjects, with 2,900 of them categorized and annotated and around 600 treebanked for detailed syntactic analysis. This resource helped promote experimentation in transforming natural language into machine-readable formats.

2. COMMUNICATOR:
The Communicator program was the next step after the ATIS project. While ATIS focused on user-initiated dialogues, where users ask questions and machines provide answers, Communicator introduced a mixed-initiative dialog system. This means both the user and the machine could actively participate in the conversation.

3. GeoQuery:
GeoQuery is a Natural Language Interface (NLI) designed to interact with a geographic database called Geobase. Geobase contains about 800 Prolog facts, which store geographic information such as populations, neighbouring states, major rivers, and major cities in a relational database.

4. Robocup: CLang
Robocup is an international competition where teams of robots play soccer, organized by the artificial intelligence community. The goal is to advance AI and robotics research through this challenging and fun domain.

2.2 Software:
WASP
KRISPER
CHILL

Language Modeling: Introduction, N-Gram Models, Language Model Evaluation, Bayesian Parameter Estimation, Language Model Adaptation, Language Models: Class Based, Variable Length, Bayesian Topic Based, Multilingual and Cross Lingual Language Modeling

Language Modeling:
5.1 Introduction:
What is language modeling?
Language modeling, or LM, is the use of various statistical and probabilistic techniques to determine the probability of a given sequence of words occurring in a sentence. Language models analyze bodies of text data to provide a basis for their word predictions.
Language modeling is used in artificial intelligence (AI), natural language processing (NLP), natural language understanding, and natural language generation systems, particularly ones that perform text generation, machine translation, and question answering.

How language modeling works:
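As a concrete illustration of the statistical idea just introduced, a minimal bigram model can estimate the probability of a sentence as a product of P(word | previous word), with probabilities taken from counts over a corpus. The three-sentence corpus below is a toy example; real models use large corpora and smoothing.

```python
# Minimal bigram language model sketch: P(sentence) is the product of
# P(w | w_prev), estimated by maximum likelihood from a toy corpus.
# No smoothing, so unseen bigrams get probability zero.

from collections import Counter

corpus = [
    "<s> the cat sat </s>",
    "<s> the cat ran </s>",
    "<s> the dog sat </s>",
]

bigrams = Counter()
unigrams = Counter()
for line in corpus:
    tokens = line.split()
    unigrams.update(tokens[:-1])            # history (context) counts
    bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs

def p_bigram(w_prev, w):
    # Maximum-likelihood estimate: count(w_prev w) / count(w_prev).
    return bigrams[(w_prev, w)] / unigrams[w_prev]

def sentence_prob(sentence):
    tokens = sentence.split()
    prob = 1.0
    for w_prev, w in zip(tokens, tokens[1:]):
        prob *= p_bigram(w_prev, w)
    return prob

# P(<s> the)=3/3, P(the cat)=2/3, P(cat sat)=1/2, P(sat </s>)=2/2
print(sentence_prob("<s> the cat sat </s>"))  # 0.3333...
```

This is the core of the n-gram approach named in the unit outline; the later topics (evaluation, Bayesian estimation, adaptation, class-based and variable-length models) refine how these probabilities are estimated and combined.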
