CS224N 2025 Lecture 04: Dependency Parsing

The document outlines a lecture on Dependency Parsing within the context of Natural Language Processing with Deep Learning, covering topics such as syntactic structure, dependency grammar, and various parsing methods. Key learnings include the importance of understanding linguistic structure for effective language interpretation and the implementation of a neural dependency parser using PyTorch. The lecture also emphasizes the significance of backpropagation and gradient computation in neural networks.


Natural Language Processing

with Deep Learning


CS224N/Ling284

Diyi Yang
Lecture 4: Dependency Parsing
Lecture Plan
Finish backpropagation (10 mins)
Syntactic Structure and Dependency parsing
1. Syntactic Structure: Consistency and Dependency (20 mins)
2. Dependency Grammar and Treebanks (15 mins)
3. Transition-based dependency parsing (15 mins)
4. Neural dependency parsing (20 mins)

Key Learnings: Explicit linguistic structure and how a neural net can decide it

Reminders/comments:
• In Assignment 2, you build a neural dependency parser using PyTorch!
• Come to the PyTorch tutorial, Friday, 1:30pm Gates B01
• Final project discussions – come meet with us; focus of Tuesday class in week 4
2
Back-Prop in General Computation Graph
1. Fprop: visit nodes in topological sort order
   • Compute the value of each node given its predecessors (single scalar output at the end)
2. Bprop:
   • Initialize the output gradient = 1
   • Visit nodes in reverse order:
     Compute the gradient wrt each node using the gradients wrt its successors:
     ∂z/∂x = Σᵢ (∂z/∂yᵢ)(∂yᵢ/∂x), where {y₁, …, yₙ} = successors of x

Done correctly, the big O() complexity of fprop and bprop is the same.
In general, our nets have a regular layer structure, so we can use matrices and Jacobians…
3
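To make the fprop/bprop recipe concrete, here is a minimal sketch of a scalar computation graph in Python (not from the slides; the tiny expression z = x·y + x and the class names are purely illustrative):

```python
# Minimal computation-graph sketch: fprop in topological order,
# bprop in reverse order, accumulating gradients from successors.
# Illustrative example graph: z = x*y + x  (all scalars).

class Node:
    def __init__(self, parents=()):
        self.parents = parents   # predecessor nodes
        self.value = None        # set during fprop
        self.grad = 0.0          # dz/d(this node), set during bprop

class Input(Node):
    def __init__(self, value):
        super().__init__()
        self.value = value
    def forward(self): pass
    def backward(self): pass     # leaf: nothing to propagate further

class Mul(Node):
    def forward(self):
        a, b = self.parents
        self.value = a.value * b.value
    def backward(self):
        a, b = self.parents
        a.grad += self.grad * b.value   # local gradient d(ab)/da = b
        b.grad += self.grad * a.value   # local gradient d(ab)/db = a

class Add(Node):
    def forward(self):
        a, b = self.parents
        self.value = a.value + b.value
    def backward(self):
        a, b = self.parents
        a.grad += self.grad * 1.0
        b.grad += self.grad * 1.0

x, y = Input(3.0), Input(4.0)
m = Mul(parents=(x, y))
z = Add(parents=(m, x))
topo = [x, y, m, z]              # topological order

for node in topo:                # 1. fprop
    node.forward()
z.grad = 1.0                     # 2. bprop: initialize output gradient = 1
for node in reversed(topo):
    node.backward()

print(z.value)   # 15.0
print(x.grad)    # dz/dx = y + 1 = 5.0
print(y.grad)    # dz/dy = x = 3.0
```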
Automatic Differentiation

• The gradient computation can be


automatically inferred from the symbolic
expression of the fprop
• Each node type needs to know how to
compute its output and how to compute
the gradient wrt its inputs given the
gradient wrt its output
• Modern DL frameworks (Tensorflow,
PyTorch, etc.) do backpropagation for
you but mainly leave layer/node writer
to hand-calculate the local derivative
4
Backprop Implementations

5
Implementation: forward/backward API

6
Implementation: forward/backward API

7
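The code shown on these two slides is not reproduced here; as a rough sketch of the kind of forward/backward API being described (class and method names are assumptions, not the slides' actual code), a node caches what it needs in forward and converts the upstream gradient into input gradients in backward:

```python
class MultiplyGate:
    """Sketch of a node with a forward/backward API: forward caches
    whatever backward will need; backward turns the upstream gradient
    into gradients wrt each input via the local gradients."""
    def forward(self, x, y):
        self.x, self.y = x, y          # cache inputs for the backward pass
        return x * y
    def backward(self, dz):            # dz = upstream gradient dL/d(output)
        dx = dz * self.y               # local gradient d(xy)/dx = y
        dy = dz * self.x               # local gradient d(xy)/dy = x
        return dx, dy

gate = MultiplyGate()
out = gate.forward(3.0, 4.0)
dx, dy = gate.backward(1.0)            # pretend dL/dout = 1
print(out, dx, dy)                     # 12.0 4.0 3.0
```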
Manual Gradient checking: Numeric Gradient

• For small h (≈ 1e-4),  f′(θ) ≈ ( f(θ + h) − f(θ − h) ) / 2h

• Easy to implement correctly
• But approximate and very slow:
  • You have to recompute f for every parameter of your model

• Useful for checking your implementation


• In the old days, we hand-wrote everything, doing this everywhere was the key test
• Now much less needed; you can use it to check layers are correctly implemented

8
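A small sketch of the numeric gradient check described above, using the centered-difference estimate with h ≈ 1e-4 (the test function here is an arbitrary illustration):

```python
import numpy as np

def numeric_gradient(f, theta, h=1e-4):
    """Approximate df/dtheta_i with the centered difference
    (f(theta + h*e_i) - f(theta - h*e_i)) / (2h) for each parameter i.
    Slow: two evaluations of f per parameter."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        old = theta.flat[i]
        theta.flat[i] = old + h
        f_plus = f(theta)
        theta.flat[i] = old - h
        f_minus = f(theta)
        theta.flat[i] = old                  # restore the parameter
        grad.flat[i] = (f_plus - f_minus) / (2 * h)
    return grad

# Check an analytic gradient against the numeric one
theta = np.random.randn(5)
f = lambda t: np.sum(t ** 2)                 # analytic gradient: 2*t
print(np.max(np.abs(numeric_gradient(f, theta) - 2 * theta)))   # tiny, ~1e-8
```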
Summary

We’ve mastered the core technology of neural nets!

• Backpropagation: recursively (and hence efficiently) apply the chain rule along the computation graph
  • [downstream gradient] = [upstream gradient] x [local gradient]

• Forward pass: compute results of operations and save intermediate values
• Backward pass: apply chain rule to compute gradients
9
Why learn all these details about gradients?
• Modern deep learning frameworks compute gradients for you!
• Come to the PyTorch introduction this Friday!

• But why take a class on compilers or systems when they are implemented for you?
• Understanding what is going on under the hood is useful!

• Backpropagation doesn’t always work perfectly out of the box


• Understanding why is crucial for debugging and improving models
• See Karpathy article (in syllabus):
• https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b
• Example in future lecture: exploding and vanishing gradients

10
Lecture Plan
✓ Finish backpropagation (10 mins)
Syntactic Structure and Dependency parsing
1. Syntactic Structure: Consistency and Dependency (20 mins)
2. Dependency Grammar and Treebanks (15 mins)
3. Transition-based dependency parsing (15 mins)
4. Neural dependency parsing (20 mins)

Key Learnings: Explicit linguistic structure and how a neural net can decide it

Reminders/comments:
• In Assignment 2, you build a neural dependency parser using PyTorch!
• Come to the PyTorch tutorial, Friday, 1:30pm Gates B01
• Final project discussions – come meet with us; focus of Tuesday class in week 4
11
1. The linguistic structure of sentences – two views: Constituency
= phrase structure grammar = context-free grammars (CFGs)
Phrase structure organizes words into nested constituents

Starting unit: words


the, cat, cuddly, by, door

Words combine into phrases


the cuddly cat, by the door

Phrases can combine into bigger phrases


the cuddly cat by the door

12
The linguistic structure of sentences – two views: Constituency =
phrase structure grammar = context-free grammars (CFGs)
Phrase structure organizes words into nested constituents.

the cat
a dog
large in a crate
barking on the table
cuddly by the door
large barking
talk to
walked behind

14
Two views of linguistic structure: Dependency structure
• Dependency structure shows which words depend on (modify, attach to, or are
arguments of) which other words.

Look in the large crate in the kitchen by the door

16
Why do we need sentence structure?

Humans communicate complex ideas by composing words together into bigger units to convey complex meanings

Human listeners need to work out what modifies [attaches to] what

A model needs to understand sentence structure in order to be able to interpret language correctly

18
Prepositional phrase attachment ambiguity

19
Prepositional phrase attachment ambiguity

Scientists count [whales from space]    ("from space" attaches to "whales": they count the whales that are in space)

Scientists count whales [from space]    ("from space" attaches to "count": the counting is done from space)

20
PP attachment ambiguities multiply

• A key parsing decision is how we ‘attach’ various constituents
  • PPs, adverbial or participial phrases, infinitives, coordinations, etc.

• Catalan numbers: Cn = (2n)!/[(n+1)!n!]
  • An exponentially growing series, which arises in many tree-like contexts:
  • E.g., the number of possible triangulations of a polygon with n+2 sides
  • Turns up in triangulation of probabilistic graphical models (CS228)….
21
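As a quick sanity check of the Catalan formula above, a few lines compute the first values of the series:

```python
from math import factorial

def catalan(n):
    # C_n = (2n)! / ((n+1)! * n!)
    return factorial(2 * n) // (factorial(n + 1) * factorial(n))

print([catalan(n) for n in range(8)])   # [1, 1, 2, 5, 14, 42, 132, 429]
```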
Coordination scope ambiguity

[Shuttle veteran and longtime NASA executive] Fred Gregory appointed to board
  (one person: Fred Gregory, who is both a shuttle veteran and a longtime NASA executive)

[Shuttle veteran] and [longtime NASA executive Fred Gregory] appointed to board
  (two people: an unnamed shuttle veteran, and Fred Gregory)

23
Coordination scope ambiguity

24
Adjectival/Adverbial Modifier Ambiguity

25
Verb Phrase (VP) attachment ambiguity

26
Dependency paths help extract semantic interpretation –
simple practical example: extracting protein-protein interaction

Sentence: "The results demonstrated that KaiC interacts rhythmically with SasA, KaiA, and KaiB."
(figure: dependency parse of the sentence, with relations such as nsubj, ccomp, mark, advmod, det, case, nmod:with, conj:and, cc)

Interaction patterns extracted along dependency paths:
KaiC ←nsubj– interacts –nmod:with→ SasA
KaiC ←nsubj– interacts –nmod:with→ SasA –conj:and→ KaiA
KaiC ←nsubj– interacts –nmod:with→ SasA –conj:and→ KaiB

[Erkan et al. EMNLP 07, Fundel et al. 2007, etc.]

27
2. Dependency Grammar and Dependency Structure
Dependency syntax postulates that syntactic structure consists of relations between
lexical items, normally binary asymmetric relations (“arrows”) called dependencies

Example: "Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas."

(figure: dependency tree of the sentence, rooted at "submitted";
 submitted → Bills, were, Brownback; Bills → ports; ports → on, and, immigration;
 Brownback → by, Senator, Republican; Republican → Kansas; Kansas → of)
28
Dependency Grammar and Dependency Structure
Dependency syntax postulates that syntactic structure consists of relations between
lexical items, normally binary asymmetric relations (“arrows”) called dependencies

The arrows are commonly typed with the name of grammatical relations (subject, prepositional object, apposition, etc.)

(same sentence as before; figure: its dependency tree with labeled arcs such as
 nsubj:pass, aux, obl, nmod, case, appos, flat, cc, conj)
29
Dependency Grammar and Dependency Structure
Dependency syntax postulates that syntactic structure consists of relations between
lexical items, normally binary asymmetric relations (“arrows”) called dependencies

An arrow connects a head with a dependent

Usually, dependencies form a tree (a connected, acyclic, single-root graph)

(same sentence and labeled dependency tree as on the previous slide)
30
Pāṇini’s grammar (c. 5th century BCE)

Gallery: http://wellcomeimages.org/indexplus/image/L0032691.html
CC BY 4.0 File:Birch bark MS from Kashmir of the Rupavatra Wellcome L0032691.jpg
But this comes from much later – originally the grammar was oral
31
Dependency Grammar/Parsing History
• The idea of dependency structure goes back a long way
• To Pāṇini’s grammar (c. 5th century BCE)
• Basic approach of 1st millennium Arabic grammarians
• Constituency/context-free grammar is a new-fangled invention
• 20th century invention (R.S. Wells, 1947; then Chomsky 1953, etc.)
• Modern dependency work is often sourced to Lucien Tesnière (1959)
• Was dominant approach in “East” in 20th Century (Russia, China, …)
• Good for free-er word order, inflected languages like Russian (or Latin!)
• Used in some of the earliest parsers in NLP, even in the US:
• David Hays, one of the founders of U.S. computational linguistics, built early (first?)
dependency parser (Hays 1962) and published on dependency grammar in Language

32
Dependency Grammar and Dependency Structure

ROOT Discussion of the outstanding issues was completed .

• Some people draw the arrows one way; some the other way!
• Tesnière had them point from head to dependent – we follow that convention
• We usually add a fake ROOT so every word is a dependent of precisely 1 other node

33
The rise of annotated data & Universal Dependencies treebanks
Brown corpus (1967; PoS tagged 1979); Lancaster-IBM Treebank (starting late 1980s);
Marcus et al. 1993, The Penn Treebank, Computational Linguistics;
Universal Dependencies: http://universaldependencies.org/

34
The rise of annotated data
Starting off, building a treebank seems a lot slower and less useful than writing a grammar
(by hand)

But a treebank gives us many things


• Reusability of the labor
• Many parsers, part-of-speech taggers, etc. can be built on it
• Valuable resource for linguistics
• Broad coverage, not just a few intuitions
• Frequencies and distributional information
• A way to evaluate NLP systems

35
Dependency Conditioning Preferences
What are the straightforward sources of information for dependency parsing?
1. Bilexical affinities The dependency [discussion → issues] is plausible
2. Dependency distance Most dependencies are between nearby words
3. Intervening material Dependencies rarely span intervening verbs or punctuation
4. Valency of heads How many dependents on which side are usual for a head?

ROOT Discussion of the outstanding issues was completed .


36
Dependency Parsing
• A sentence is parsed by choosing for each word what other word (including ROOT) it is
a dependent of

• Usually some constraints:


• Only one word is a dependent of ROOT
• Don’t want cycles A → B, B → A
• This makes the dependencies a tree
• Final issue is whether arrows can cross (be non-projective) or not

ROOT I ’ll give a talk tomorrow on neural networks


37
Projectivity
• Definition of a projective parse: There are no crossing dependency arcs when the
words are laid out in their linear order, with all arcs above the words
• Dependencies corresponding to a CFG tree must be projective
• I.e., by forming dependencies by taking 1 child of each category as head
• Most syntactic structure is projective like this, but dependency theory normally does
allow non-projective structures to account for displaced constituents
• You can’t easily get the semantics of certain constructions right without these
nonprojective dependencies

Who did Bill buy the coffee from yesterday ?


38
3. Methods of Dependency Parsing
1. Dynamic programming
Eisner (1996) gives a clever algorithm with complexity O(n3), by producing parse items
with heads at the ends rather than in the middle
2. Graph algorithms
You create a Minimum Spanning Tree for a sentence
McDonald et al.’s (2005) O(n2) MSTParser scores dependencies independently using an
ML classifier (he uses MIRA, for online learning, but it can be something else)
Neural graph-based parser: Dozat and Manning (2017) et seq. – very successful!
3. Constraint Satisfaction
Edges are eliminated that don’t satisfy hard constraints. Karlsson (1990), etc.
4. “Transition-based parsing” or “deterministic dependency parsing”
Greedy choice of attachments guided by good machine learning classifiers
E.g., MaltParser (Nivre et al. 2008). Has proven highly effective. And fast.
39
Greedy transition-based parsing [Nivre 2003]
• A simple form of a greedy discriminative dependency parser
• The parser does a sequence of bottom-up actions
• Roughly like “shift” or “reduce” in a shift-reduce parser – CS143, anyone?? – but the
“reduce” actions are specialized to create dependencies with head on left or right
• The parser has:
• a stack σ, written with top to the right
• which starts with the ROOT symbol
• a buffer β, written with top to the left
• which starts with the input sentence
• a set of dependency arcs A
• which starts off empty
• a set of actions

40
Basic transition-based dependency parser

Start: σ = [ROOT], β = w1, …, wn , A = ∅

1. Shift        σ, wi|β, A  ➔  σ|wi, β, A
2. Left-Arc_r   σ|wi|wj, β, A  ➔  σ|wj, β, A ∪ {r(wj, wi)}
3. Right-Arc_r  σ|wi|wj, β, A  ➔  σ|wi, β, A ∪ {r(wi, wj)}
Finish: σ = [w], β = ∅

41
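Below is a minimal Python sketch of this arc-standard transition system (not the assignment's code); it simply applies a given action sequence, whereas a real parser must predict each action:

```python
def parse_with_oracle(words, actions):
    """Apply a sequence of arc-standard transitions.
    words: sentence tokens; actions: list of (action, relation) pairs,
    where action is "S" (Shift), "LA" (Left-Arc_r), or "RA" (Right-Arc_r)."""
    stack = ["ROOT"]            # sigma, top at the right
    buffer = list(words)        # beta, front at the left
    arcs = set()                # A
    for act, rel in actions:
        if act == "S":                       # Shift
            stack.append(buffer.pop(0))
        elif act == "LA":                    # Left-Arc_r
            wj = stack.pop(); wi = stack.pop()
            arcs.add((rel, wj, wi))          # r(wj, wi): head wj, dependent wi
            stack.append(wj)
        elif act == "RA":                    # Right-Arc_r
            wj = stack.pop(); wi = stack.pop()
            arcs.add((rel, wi, wj))          # r(wi, wj): head wi, dependent wj
            stack.append(wi)
    assert len(stack) == 1 and not buffer    # finish condition
    return arcs

arcs = parse_with_oracle(
    ["I", "ate", "fish"],
    [("S", None), ("S", None), ("LA", "nsubj"),
     ("S", None), ("RA", "obj"), ("RA", "root")])
print(arcs)   # three arcs: nsubj(ate → I), obj(ate → fish), root(ROOT → ate)
```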
Arc-standard transition-based parser
(there are other transition schemes …)
Analysis of “I ate fish”
(transition definitions as on the previous slide)

Start:  σ = [ROOT]            β = [I, ate, fish]    A = ∅
Shift:  σ = [ROOT, I]         β = [ate, fish]
Shift:  σ = [ROOT, I, ate]    β = [fish]
42
Arc-standard transition-based parser
Analysis of “I ate fish” (continued)

Left-Arc:   σ = [ROOT, ate]         β = [fish]   A += nsubj(ate → I)
Shift:      σ = [ROOT, ate, fish]   β = []
Right-Arc:  σ = [ROOT, ate]         β = []       A += obj(ate → fish)
Right-Arc:  σ = [ROOT]              β = []       A += root([root] → ate)
Finish:     A = { nsubj(ate → I), obj(ate → fish), root([root] → ate) }

Nota bene: in this example I’ve made the “correct” next transition at each step.
But a parser has to work this out – by exploring or inferring!
43
MaltParser [Nivre and Hall 2005]
• We have left to explain how we choose the next action
• Answer: Stand back, I know machine learning!
• Each action is predicted by a discriminative classifier (e.g., softmax classifier) over each
legal move
• Max of 3 untyped choices (max of |R| × 2 + 1 when typed)
• Features: top of stack word, POS; first in buffer word, POS; etc.
• There is NO search (in the simplest form)
• But you can profitably do a beam search if you wish (slower but better):
• You keep k good parse prefixes at each time step
• The model’s accuracy is fractionally below the state of the art in dependency parsing,
but
• It provides very fast linear time parsing, with high accuracy – great for parsing the web
44
Conventional Feature Representation

binary, sparse feature vector:  0 0 0 1 0 0 1 0 … 0 0 1 0
dim = 10⁶ – 10⁷

Feature templates: usually a combination of 1–3 elements from the configuration

Indicator features

45
Evaluation of Dependency Parsing: (labeled) dependency accuracy

Acc = (# correct deps) / (# of deps)

Example:  ROOT  She  saw  the  video  lecture
            0    1    2    3     4       5

UAS = 4 / 5 = 80%
LAS = 2 / 5 = 40%

Gold                          Parsed
1  2  She      nsubj          1  2  She      nsubj
2  0  saw      root           2  0  saw      root
3  5  the      det            3  4  the      det
4  5  video    nn             4  5  video    nsubj
5  2  lecture  obj            5  2  lecture  ccomp

46
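A small sketch of how UAS and LAS could be computed from per-word (head, label) predictions; the data format is an assumption, but the numbers reproduce the example above:

```python
def attachment_scores(gold, pred):
    """gold, pred: lists of (head_index, label) per word, in order.
    UAS: fraction of words with the correct head.
    LAS: fraction with both the correct head and the correct label."""
    assert len(gold) == len(pred)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)
    las = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return uas, las

# "She saw the video lecture" example from above
gold = [(2, "nsubj"), (0, "root"), (5, "det"), (5, "nn"),    (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (4, "det"), (5, "nsubj"), (2, "ccomp")]
print(attachment_scores(gold, pred))   # (0.8, 0.4)
```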
4. Why do we gain from a neural dependency parser?
Indicator Features Revisited
Categorical (indicator) features are:
• Problem #1: sparse
• Problem #2: incomplete
• Problem #3: expensive to compute
  • More than 95% of parsing time is consumed by feature computation

Neural approach: learn a dense and compact feature representation

dense vector:  0.1  0.9  -0.2  0.3  …  -0.1  -0.5
dim ≈ 1000

48
A neural dependency parser [Chen and Manning 2014]
• Results on English parsing to Stanford Dependencies:
• Unlabeled attachment score (UAS) = head
• Labeled attachment score (LAS) = head and label

Parser UAS LAS sent. / s


MaltParser 89.8 87.2 469
MSTParser 91.4 88.1 10
TurboParser 92.3 89.6 8
C & M 2014 92.0 89.7 654

49
First win: Distributed Representations

• We represent each word as a d-dimensional dense vector (i.e., word embedding)
  • Similar words are expected to have close vectors.

• Meanwhile, part-of-speech tags (POS) and dependency labels are also represented as d-dimensional vectors.
  • The smaller discrete sets also exhibit many semantic similarities:
    • NNS (plural noun) should be close to NN (singular noun).
    • nummod (numerical modifier) should be close to amod (adjective modifier).

(figure: an embedding space in which similar items cluster, e.g., was / were / is, come / go, good)

50
Extracting Tokens & vector representations from configuration

• We extract a set of tokens based on the stack / buffer positions:

            word     POS    dep.
  s1        good     JJ     ∅
  s2        has      VBZ    ∅
  b1        control  NN     ∅
  lc(s1)    ∅        ∅      ∅
  rc(s1)    ∅        ∅      ∅
  lc(s2)    He       PRP    nsubj
  rc(s2)    ∅        ∅      ∅

A concatenation of the vector representations of all of these is the neural representation of a configuration

51
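As a rough sketch of turning such a configuration into a single input vector (the embedding tables and token list here are illustrative assumptions, not the assignment's actual interface):

```python
import numpy as np

d = 50                                     # embedding dimension
rng = np.random.default_rng(0)
word_emb = {w: rng.standard_normal(d) for w in ["<NULL>", "He", "has", "good", "control"]}
pos_emb  = {p: rng.standard_normal(d) for p in ["<NULL>", "PRP", "VBZ", "JJ", "NN"]}
dep_emb  = {r: rng.standard_normal(d) for r in ["<NULL>", "nsubj"]}

def configuration_vector(tokens):
    """tokens: (word, POS, dep) triples for positions such as
    s1, s2, b1, lc(s1), lc(s2), ...; missing items are "<NULL>".
    Returns the concatenation of all their embeddings."""
    parts = []
    for word, pos, dep in tokens:
        parts += [word_emb[word], pos_emb[pos], dep_emb[dep]]
    return np.concatenate(parts)

tokens = [("good", "JJ", "<NULL>"), ("has", "VBZ", "<NULL>"),
          ("control", "NN", "<NULL>"), ("<NULL>", "<NULL>", "<NULL>"),
          ("He", "PRP", "nsubj")]
x = configuration_vector(tokens)
print(x.shape)    # (750,) = 5 tokens * 3 embeddings * 50 dims
```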
Second win: Deep Learning classifiers are non-linear classifiers
• A softmax classifier assigns classes y ∈ C based on inputs x ∈ ℝᵈ via the probability:

  p(y | x) = exp(Wy x) / Σc∈C exp(Wc x)
• Traditional ML classifiers (including Naïve Bayes, SVMs, logistic regression and softmax
classifier) are not very powerful classifiers: they only give linear decision boundaries
• But neural networks can use multiple layers to learn much more complex nonlinear
decision boundaries

52
Neural Dependency Parser Model Architecture
(A simple feed-forward neural network multi-class classifier)
Softmax probabilities over the transitions { Shift, Left-Arc_r, Right-Arc_r }

Output layer:  y = softmax(Uh + b2)
Hidden layer:  h = ReLU(Wx + b1)
Input layer:   x = lookup + concat (of word, POS, and dependency-label embeddings)

Log loss (cross-entropy error) will be back-propagated to the embeddings.

The hidden layer re-represents the input, moving it into an intermediate vector space where it can be easily classified with a (linear) softmax.

Wins:
Distributed representations!
Non-linear classifier!

53
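A minimal PyTorch sketch of this architecture (dimensions, feature counts, and class names are illustrative; the assignment's actual model differs in details):

```python
import torch
import torch.nn as nn

class NeuralDependencyParser(nn.Module):
    """Feed-forward classifier over parser configurations:
    embedding lookup + concat -> ReLU hidden layer -> softmax over
    {Shift, Left-Arc_r, Right-Arc_r} transitions."""
    def __init__(self, vocab_size, n_features=36, embed_dim=50,
                 hidden_dim=200, n_transitions=3):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embed_dim)
        self.hidden = nn.Linear(n_features * embed_dim, hidden_dim)   # W, b1
        self.output = nn.Linear(hidden_dim, n_transitions)            # U, b2

    def forward(self, feature_ids):                    # (batch, n_features) int ids
        x = self.embeddings(feature_ids)               # lookup
        x = x.view(feature_ids.size(0), -1)            # concat into one vector
        h = torch.relu(self.hidden(x))                 # h = ReLU(Wx + b1)
        return self.output(h)                          # logits; softmax is folded into the loss

model = NeuralDependencyParser(vocab_size=10_000)
logits = model(torch.randint(0, 10_000, (8, 36)))      # a batch of 8 configurations
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 3, (8,)))   # log loss
loss.backward()                                        # gradients flow back into the embeddings
print(logits.shape, loss.item())
```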
Dependency parsing for sentence structure
Chen & Manning (2014) showed that neural networks can accurately
determine the structure of sentences, supporting meaning interpretation

This paper was the first simple and successful neural dependency parser

The dense representations (and non-linear classifier) let it outperform other greedy parsers in both accuracy and speed

54
Further developments in transition-based neural dependency parsing

This work was further developed and improved by others, including in particular at Google
• Bigger, deeper networks with better tuned hyperparameters
• Beam search
• Global, conditional random field (CRF)-style inference over the decision sequence
Leading to SyntaxNet and the Parsey McParseFace model (2016):
“The World’s Most Accurate Parser”
https://research.googleblog.com/2016/05/announcing-syntaxnet-worlds-most.html
Method UAS LAS (PTB WSJ SD 3.3)
Chen & Manning 2014 92.0 89.7
Weiss et al. 2015 93.99 92.05
Andor et al. 2016 94.61 92.79

55
Graph-based dependency parsers
• Compute a score for every possible dependency for each word
• Doing this well requires good “contextual” representations of each word token,
which we will develop in coming lectures

0.5 0.8

0.3 2.0

ROOT The big cat sat

e.g., picking the head for “big”


56
Graph-based dependency parsers
• Compute a score for every possible dependency (choice of head) for each word
• Doing this well requires more than just knowing the two words
• We need good “contextual” representations of each word token, which we will
develop in the coming lectures
• Repeat the same process for each other word; find the best parse (MST algorithm)
0.5 0.8

0.3 2.0

ROOT The big cat sat


e.g., picking the head for “big”
57
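As a rough sketch of the graph-based idea: score every candidate head for every word, then decode. Here the scores are random placeholders and decoding is a greedy argmax purely for illustration; a real parser (e.g., Dozat and Manning) uses a learned biaffine scorer over contextual representations and MST decoding:

```python
import numpy as np

words = ["ROOT", "The", "big", "cat", "sat"]
n = len(words)

# score[h, d] = how good it looks for word h to be the head of word d.
# In a real parser these come from a neural scorer over contextual
# representations; random numbers here are purely illustrative.
rng = np.random.default_rng(0)
score = rng.standard_normal((n, n))
np.fill_diagonal(score, -np.inf)      # a word cannot head itself
score[:, 0] = -np.inf                 # ROOT never gets a head

# Greedy decoding: pick the best-scoring head for every word.
# (This can produce cycles; MST decoding guarantees a tree.)
heads = score[:, 1:].argmax(axis=0)
for d, h in enumerate(heads, start=1):
    print(f"{words[h]} -> {words[d]}")
```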
A Neural graph-based dependency parser
[Dozat and Manning 2017; Dozat, Qi, and Manning 2017]

• This paper revived interest in graph-based dependency parsing in a neural world


• Designed a biaffine scoring model for neural dependency parsing
• Also crucially uses a neural sequence model, something we discuss later
• Really great results!
• But slower than the simple neural transition-based parsers
• There are n2 possible dependencies in a sentence of length n

Method UAS LAS (PTB WSJ SD 3.3)


Chen & Manning 2014 92.0 89.7
Weiss et al. 2015 93.99 92.05
Andor et al. 2016 94.61 92.79
Dozat & Manning 2017 95.74 94.08
58
