Unit 4

CS525PE: Natural Language Processing (Professional Elective – II)

Prerequisites:
1. Data structures and compiler design

Course Objectives:
Introduction to some of the problems and solutions of NLP and their relation to linguistics and statistics.

Course Outcomes:
1. Show sensitivity to linguistic phenomena and an ability to model them with formal grammars.
2. Understand and carry out proper experimental methodology for training and evaluating empirical NLP systems.
3. Manipulate probabilities, construct statistical models over strings and trees, and estimate parameters using supervised and unsupervised training methods.
4. Design, implement, and analyze NLP algorithms, and design different language modelling techniques.
UNIT - I
Finding the Structure of Words: Words and Their
Components, Issues and Challenges, Morphological Models
Finding the Structure of Documents: Introduction, Methods,
Complexity of the Approaches, Performances of the
Approaches, Features
UNIT - II
For a sentence such as "The cat chased the mouse," the PAS would be: chased(cat, mouse)
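A predicate-argument structure like chased(cat, mouse) can be held in a small data structure, as in the sketch below; the class and field names are only illustrative assumptions, not a standard format.

# Minimal sketch of a predicate-argument structure (PAS) as a data type.
# The class name and fields are illustrative, not a standard representation.

from dataclasses import dataclass

@dataclass
class PAS:
    predicate: str        # the action or state, e.g. "chased"
    arguments: list       # its participants, e.g. ["cat", "mouse"]

    def __str__(self):
        # Render in the functional notation used above, e.g. chased(cat, mouse)
        return f"{self.predicate}({', '.join(self.arguments)})"

print(PAS("chased", ["cat", "mouse"]))  # chased(cat, mouse)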
1.1 Resources:
These resources help computers understand the meaning of sentences by identifying the action and who is involved. This is important for things like translating languages, answering questions, and even helping virtual assistants understand commands better.
i) FrameNet
ii) PropBank

1.1.1 FRAMENET:
FrameNet looks at how words are used in different situations (frames) and identifies the roles that other words play in these situations. It is based on the theory of frame semantics, which suggests that the meaning of a word can be understood in terms of the situations it describes.

Key Elements:
Frames: A frame is a type of situation or scenario. Each frame involves certain participants, which are called frame elements.
Frame Elements: These are the roles played by the different participants in a frame.
Lexical Units (LUs): These are pairs of words and their meanings (frames). Each lexical unit is a specific meaning of a word in a given frame.

Think of the word "break" in two different frames:
Frame 1: "Break" as in breaking a rule
Roles:
Breaker (the person who breaks the rule)
Rule (the rule being broken)
Frame 2: "Break" as in breaking an object
Roles:
Breaker (the person who breaks the object)
Object (the thing being broken)

Working:
1. Identify frames: Researchers identify common situations (frames).
2. Assign frame elements: Each frame has specific roles.
3. Label sentences: Sentences are tagged with these frames and frame elements to show how words are used in context.

Example:
Frame: COMMERCE_BUY
Sentence: "John bought a car from Mary for $20,000."
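As a rough illustration of the annotation described above, the COMMERCE_BUY example can be written down as a small frame-style record. The frame-element names (Buyer, Seller, Goods, Money) and the Python layout below are illustrative assumptions, not output of the actual FrameNet toolkit.

# A minimal sketch of a FrameNet-style annotation for the COMMERCE_BUY
# example above. Role names and data layout are illustrative assumptions.

sentence = "John bought a car from Mary for $20,000."

annotation = {
    "frame": "COMMERCE_BUY",
    "lexical_unit": "buy.v",      # the word sense that evokes the frame
    "target": "bought",           # the word in the sentence evoking the frame
    "frame_elements": {           # role -> text span playing that role
        "Buyer": "John",
        "Goods": "a car",
        "Seller": "Mary",
        "Money": "$20,000",
    },
}

# Print the labelled sentence in a readable form.
for role, span in annotation["frame_elements"].items():
    print(f"[{role}: {span}]", end=" ")
print(f"-- frame: {annotation['frame']}")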
1.1.2 PROPBANK:
PropBank annotates the verbs in a corpus with their predicate-argument structure, using numbered core arguments and ARGM modifiers. Its key elements are:

Predicate: Usually a verb, it represents an action or state.

Core Arguments: These are essential participants directly involved with the predicate:
ARG0: Typically the agent or doer of the action.
ARG1: Typically the patient or theme (the entity undergoing the action).
ARG2, ARG3, ARG4: Other roles that vary depending on the verb's meaning.

Adjunctive Arguments: These provide additional information about the action and are labelled ARGM-XYZ, where XYZ indicates the type of information:
ARGM-LOC: Location (e.g., "in the hotel")
ARGM-TMP: Time (e.g., "yesterday")
ARGM-MNR: Manner (e.g., "quickly")
ARGM-CAU: Cause (e.g., "because he was hungry")
ARGM-DIR: Direction (e.g., "to the store")
ARGM-PRP: Purpose (e.g., "to buy groceries")
ARGM-NEG: Negation (e.g., "not")
ARGM-MOD: Modality (e.g., "can," "might")

Example:
Sentence: "John gave Mary a book"
[ARG0 John] [gave] [ARG2 Mary] [ARG1 a book].

Example of a complex annotation:
Sentence: "The company operates stores mostly in Iowa and Nebraska."
Predicate: operates
Arguments:
ARG0 (Agent): The company
ARG1 (Theme): stores
ARGM-LOC (Location): mostly in Iowa and Nebraska
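The bracketed labelling above can be sketched in code as a list of (label, span) pairs. This layout is chosen only for illustration; it is not the official PropBank annotation format.

# A small sketch of PropBank-style labelling for the two examples above.
# The (label, span) tuple layout is an illustrative choice, not the official
# PropBank file format.

def render(labelled_spans):
    """Render (label, span) pairs in the bracketed style used in the notes;
    an empty label marks the bare predicate."""
    return " ".join(f"[{label} {span}]" if label else f"[{span}]"
                    for label, span in labelled_spans)

simple = [("ARG0", "John"), ("", "gave"), ("ARG2", "Mary"), ("ARG1", "a book")]
print(render(simple))
# [ARG0 John] [gave] [ARG2 Mary] [ARG1 a book]

complex_example = [
    ("ARG0", "The company"),
    ("", "operates"),
    ("ARG1", "stores"),
    ("ARGM-LOC", "mostly in Iowa and Nebraska"),
]
print(render(complex_example))
# [ARG0 The company] [operates] [ARG1 stores] [ARGM-LOC mostly in Iowa and Nebraska]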
1.2 Other Resources:
1. NomBank
2. VerbNet

1.3 Software:
Following is a list of software packages available for semantic role labelling.
1. ASSERT (Automatic Statistical Semantic Role Tagger):
A semantic role labeller trained on the English PropBank data.
2. C-ASSERT:
An extension of ASSERT for the Chinese language.
3. SwiRL:
Another semantic role labeller trained on PropBank data.
4. Shalmaneser (A Shallow Semantic Parser):
A toolchain for shallow semantic parsing based on the FrameNet data.

2. Meaning Representation Systems:
Meaning representation is a deeper level of semantic interpretation aimed at converting natural language into a format that machines can understand and act on. This process is similar to how programming languages are compiled into machine code that computers execute. Unlike artificial languages, natural language is flexible and relies on context and general world knowledge for understanding, which poses a challenge for machines. Researchers have been working for decades to develop methods to interpret and encode this context and knowledge for machines. However, current techniques are limited to specific domains and problems and do not scale well to arbitrary domains.

2.1 Resources:
1. ATIS (Air Travel Information System):
The ATIS project was one of the first major efforts to develop systems that convert natural language into a form usable by applications for decision-making. Specifically, it focused on transforming user queries about flight information into SQL queries to extract answers from a flight database.
Here is how it worked:
1. A user would ask a question in natural speech using a restricted vocabulary.
2. The system would convert this query into a hierarchical frame representation, encoding the essential semantic information.
3. This representation was then compiled into an SQL query to retrieve the required data from the database.
The ATIS training corpus included over 7,300 spoken utterances from 137 subjects, with 2,900 of them categorized and annotated and around 600 treebanked for detailed syntactic analysis. This resource helped promote experimentation in transforming natural language into machine-readable formats.

2. COMMUNICATOR:
The Communicator program was the next step after the ATIS project. While ATIS focused on user-initiated dialogues, where users ask questions and machines provide answers, Communicator introduced a mixed-initiative dialog system. This means both the user and the machine could actively participate in the conversation.
3. GeoQuery:
GeoQuery is a Natural Language Interface (NLI) designed to interact with a geographic database called Geobase. Geobase contains about 800 Prolog facts, which store geographic information such as populations, neighbouring states, major rivers, and major cities in a relational database.

4. RoboCup: CLang
RoboCup is an international competition where teams of robots play soccer, organized by the artificial intelligence community. The goal is to advance AI and robotics research through this challenging and fun domain. CLang is the formal coach language used to give instructions to the players in this domain.

2.2 Software:
WASP
KRISPER
CHILL
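The GeoQuery idea above, a natural-language question answered against a fact base, can be sketched very roughly as follows. The facts and the single supported question pattern are invented for illustration; the real Geobase holds about 800 Prolog facts, and GeoQuery handles far more varied questions.

# Toy sketch of a GeoQuery-style natural language interface: a restricted
# question is mapped onto a tiny fact base. The facts and the single
# supported pattern below are invented for illustration only.

import re

CAPITALS = {          # capital(state, city) facts, hand-picked here
    "texas": "Austin",
    "iowa": "Des Moines",
    "nebraska": "Lincoln",
}

def answer(question):
    """Answer questions of the form 'What is the capital of <state>?'."""
    match = re.search(r"capital of (\w+)", question.lower())
    if match and match.group(1) in CAPITALS:
        return CAPITALS[match.group(1)]
    return "unknown"

print(answer("What is the capital of Texas?"))  # Austin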
Language Modeling: Introduction, N-Gram Models, Language Model Evaluation, Bayesian Parameter Estimation, Language Model Adaptation, Language Models (class based, variable length, Bayesian topic based), Multilingual and Cross Lingual Language Modeling

Language Modeling:
5.1 Introduction:
What is language modeling?
Language modeling, or LM, is the use of various statistical and probabilistic techniques to determine the probability of a given sequence of words occurring in a sentence. Language models analyze bodies of text data to provide a basis for their word predictions.
Language modeling is used in artificial intelligence (AI), natural language processing (NLP), natural language understanding, and natural language generation systems, particularly ones that perform text generation, machine translation, and question answering.
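The definition above says a language model assigns a probability to a sequence of words. A minimal sketch of this idea is an unsmoothed bigram model; the toy corpus below is invented, and real models are trained on large text collections and use smoothing.

# Minimal sketch of a bigram language model: estimate P(word | previous word)
# from counts and score a sentence as the product of bigram probabilities.
# The toy corpus is invented; real models need much more data and smoothing.

from collections import Counter, defaultdict

corpus = [
    "<s> the cat chased the mouse </s>",
    "<s> the dog chased the cat </s>",
    "<s> the mouse ran </s>",
]

bigram_counts = defaultdict(Counter)
for line in corpus:
    tokens = line.split()
    for prev, curr in zip(tokens, tokens[1:]):
        bigram_counts[prev][curr] += 1

def bigram_prob(prev, curr):
    """Maximum-likelihood estimate of P(curr | prev), without smoothing."""
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][curr] / total if total else 0.0

def sentence_prob(sentence):
    """Probability of a sentence as a product of its bigram probabilities."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, curr in zip(tokens, tokens[1:]):
        prob *= bigram_prob(prev, curr)
    return prob

print(sentence_prob("the cat chased the mouse"))  # 0.04 on this toy corpus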