
Probabilistic parsing

Lê Thanh Hương
School of Information and Communication
Technology
Email: [email protected]

1
Motivation: how to choose a parse structure?
• Choice among parses, e.g.
  I saw a man with a telescope.
• As the number of rules increases, the possibility of ambiguity goes up
• Large NYU grammars: the Apple Pie parser has 20,000-30,000 CF rules for English
• Choice between two analyses of the tag sequence V DT NN PP:
  (1) VP → V NP PP
      NP → DT NN
  (2) VP → V NP
      NP → DT NN PP

2
Word associations (bigram probabilities)
Example:
  eat ice-cream (high frequency)
  eat John (low, except on Survivor)
Some disadvantages:
• P(John decided to bake a) is high, even though the string is not a complete sentence
• Consider the chain rule with the bigram (Markov) assumption:
  P(w1 w2 w3) = P(w3|w1 w2) P(w2|w1) P(w1) ≈ P(w3|w2) P(w2|w1) P(w1)
  The assumption is too strong; e.g., the subject of a sentence can 'select' the object:
  Clinton admires honesty
  ➢ use syntactic structure so that selection does not have to propagate word by word
• Consider "Fred watered his mother's small garden". What probability is contributed by "garden"?
  • Pr(garden | mother's small) is not high → a trigram model would not do well
  • Pr(garden | X is the head of the object NP of "water") is higher
➢ use bigrams + syntactic relations (see the sketch below)
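A minimal sketch of the bigram idea in Python, assuming a hypothetical toy corpus of tokenized sentences:

```python
from collections import Counter

def train_bigram(corpus):
    """MLE bigram model: P(w2 | w1) = count(w1 w2) / count(w1).
    `corpus` is a list of tokenized sentences (lists of words)."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
    def prob(w1, w2):
        return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0
    return prob

# toy usage: "eat ice-cream" should score higher than "eat John"
p = train_bigram([["eat", "ice-cream"], ["eat", "ice-cream"], ["eat", "John"]])
print(p("eat", "ice-cream"))  # 2/3
print(p("eat", "John"))       # 1/3
```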

3
Syntactic associations (PCFGs)
• V takes a certain kind of argument
  • verb-with-object, verb-without-object
• The correspondence between subject and object:
  John admires honesty
  Honesty admires John ???
Disadvantages:
• Grammar size increases
• 1 year of the Wall Street Journal (WSJ corpus): 47,219 sentences, average length 23 words, bracketed by hand; only 4.7% (2,232 sentences) have exactly the same structure as any other sentence in the corpus
➢ Can't do it all by table lookup. Instead, build up a set of particular little pieces.

4
Example
[Figure: parse tree for "This apple pie looks good and is a real treat". POS sequence: DT NN NN VBX JJ CC VBX DT JJ NN. "This apple pie" forms an NP by Rule 1; "a real treat" forms an NP by Rule 2; Rule 3 combines them with the coordinated predicate "looks good and is a real treat".]
5
Rules
1. NP → DT NN NN
2. NP → DT JJ NN
3. S → NP VBX JJ CC VBX NP
• Collapse (NNS, NN) to NX; (NNP, NNPS) to NPX; (VBP, VBZ, VBD) to VBX
• Choose rules by their frequencies

6
Calculating frequencies

Pr(X → Y) = Count(X → Y) / Count(X)

Example: Pr(NP → DT JJ NN) = 1470 / 9711 = 0.1532
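A minimal sketch of this relative-frequency estimation, assuming trees are represented as nested (label, children...) tuples with word strings as leaves (a hypothetical representation):

```python
from collections import Counter

def rule_probabilities(trees):
    """Estimate Pr(X -> Y) = Count(X -> Y) / Count(X) from a treebank."""
    rule_counts, lhs_counts = Counter(), Counter()
    def walk(node):
        if isinstance(node, str):   # a leaf word: no rule here
            return
        label, children = node[0], node[1:]
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        rule_counts[(label, rhs)] += 1
        lhs_counts[label] += 1
        for child in children:
            walk(child)
    for tree in trees:
        walk(tree)
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

# with enough WSJ trees, rule_probabilities(treebank)[("NP", ("DT", "JJ", "NN"))]
# would come out near 1470 / 9711 = 0.1532
```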

7
Pr calculation
Rules used:
  S → NP VP        0.35
  NP → DT JJ NN    0.1532
  VP → VBX NP      0.302

[Figure: parse tree for "The big guy ate the apple pie":
 (S (NP (DT The) (JJ big) (NN guy)) (VP (VBX ate) (NP (DT the) (JJ apple) (NN pie))))]

Rule applied           Pr chain
1  S → NP VP           0.35
2  NP → DT JJ NN       0.1532 × 0.35 = 0.0536
3  VP → VBX NP         0.302 × 0.0536 = 0.0162
4  NP → DT JJ NN       0.1532 × 0.0162 = 0.0025
Pr = 0.0025
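The running product above can be checked with a one-liner (a sketch; the numbers are copied from the table):

```python
import math

chain = [0.35,     # 1. S  -> NP VP
         0.1532,   # 2. NP -> DT JJ NN  ("The big guy")
         0.302,    # 3. VP -> VBX NP
         0.1532]   # 4. NP -> DT JJ NN  ("the apple pie")
print(math.prod(chain))  # ≈ 0.0025
```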
8
PCFGs
• A PCFG G consists of the usual parts of a CFG:
  • A set of terminals, {w^k}, k = 1, ..., V
  • A set of nonterminals, {N^i}, i = 1, ..., n
  • A designated start symbol, N^1
  • A set of rules, {N^i → ζ^j} (where ζ^j is a sequence of terminals and nonterminals)
and
• A corresponding set of probabilities on rules such that:
  ∀i: Σ_j P(N^i → ζ^j) = 1
• Probability of a derivation (i.e. parse) tree T built with rule applications r(1) ... r(n):
  P(T) = Π_{i=1..n} P(r(i))

9
Assumptions
• Place invariance: the probability of a subtree does not depend on where in the string the words it dominates are:
  ∀k: P(N^j_k(k+c) → ζ) is the same
• Context free: the probability of a subtree does not depend on words not dominated by the subtree:
  P(N^j_kl → ζ | anything outside positions k through l) = P(N^j_kl → ζ)
• Ancestor free: the probability of a subtree does not depend on nodes in the derivation outside the subtree:
  P(N^j_kl → ζ | any ancestor nodes outside N^j_kl) = P(N^j_kl → ζ)

10
Parsing algorithms

• CKY
• Beam search
• Agenda/chart-based search
•…

11
CKY with probabilities

• Data structure:
  • A dynamic programming array p[i,j,a] holds the maximum probability for a constituent with nonterminal a spanning words i…j
  • Back-pointers store links to the sub-constituents, so the best tree can be recovered
• Output: the maximum-probability parse

12
Compute Pr by induction
• Base case: the input is a single word w_i:
  Pr(tree) = P(A → w_i)
• Recursive case: the input is a string of words.
  A ⇒* w_ij if ∃k: A → B C, B ⇒* w_ik, C ⇒* w_kj, i < k < j
  p[i,j,A] = max_{k,B,C} ( P(A → B C) × p[i,k,B] × p[k,j,C] )

[Figure: A spans w_ij, split at k into B over w_ik and C over w_kj]
13
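A minimal sketch of this recurrence in Python, assuming the grammar is in Chomsky normal form; the data layout (dicts keyed by spans) is hypothetical:

```python
from collections import defaultdict

def cky(words, lexical, binary):
    """Probabilistic CKY. `lexical` maps a word to [(A, P(A -> w)), ...];
    `binary` is a list of (A, B, C, P(A -> B C)). Spans are [i, j)."""
    n = len(words)
    best = defaultdict(dict)  # (i, j) -> {A: (max prob, backpointer)}
    for i, w in enumerate(words):               # base case: single words
        for A, p in lexical.get(w, []):
            best[(i, i + 1)][A] = (p, w)
    for span in range(2, n + 1):                # recursive case, bottom up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):           # all split points
                for A, B, C, p in binary:
                    if B in best[(i, k)] and C in best[(k, j)]:
                        cand = p * best[(i, k)][B][0] * best[(k, j)][C][0]
                        if cand > best[(i, j)].get(A, (0.0, None))[0]:
                            best[(i, j)][A] = (cand, (k, B, C))
    return best  # best[(0, n)]["S"][0] is the max parse probability
```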
14

Calculation of Viterbi probabilities (CKY algorithm)

[Figure: worked CKY chart of Viterbi probabilities; the completed parse scores 0.0504]
15
Pr calculation
Grammar:
 1. S → NP VP        1.0
 2. VP → V NP PP     0.4
 3. VP → V NP        0.6
 4. NP → N           0.7
 5. NP → N PP        0.3
 6. PP → PREP N      1.0
 7. N → a_dog        0.3
 8. N → a_cat        0.5
 9. N → a_telescope  0.2
10. V → saw          1.0
11. PREP → with      1.0

[Figure: two parse trees for "a_dog saw a_cat with a_telescope".
 T_l attaches the PP to the VP (rule 2); T_r attaches the PP to the object NP (rules 3 and 5)]

P_l = 1 × .7 × .4 × .3 × .7 × 1 × .5 × 1 × 1 × .2 = .00588
P_r = 1 × .7 × .6 × .3 × .3 × 1 × .5 × 1 × 1 × .2 = .00378
➢ P_l is chosen
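As a quick check, a sketch multiplying the rule probabilities read off each tree (the lists just transcribe the figure):

```python
import math

# rules used when the PP attaches to the VP (T_l) vs. to the object NP (T_r)
t_left  = [1.0, 0.7, 0.3, 0.4, 1.0, 0.7, 0.5, 1.0, 1.0, 0.2]
t_right = [1.0, 0.7, 0.3, 0.6, 1.0, 0.3, 0.5, 1.0, 1.0, 0.2]
print(math.prod(t_left))   # ≈ 0.00588
print(math.prod(t_right))  # ≈ 0.00378, so the VP-attachment parse wins
```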
16
Beam search
• State-space search
• States are partial parses with an associated probability
• Keep only the top-scoring elements at each stage of the beam search
• All parses of a sentence take the same number of steps, N
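A minimal sketch of the pruning step, assuming states expose an `expand` function returning weighted successors (all names hypothetical):

```python
import heapq

def beam_step(partials, expand, beam_width=10):
    """Expand every partial parse and keep only the top `beam_width`.
    `partials` is a list of (prob, state); `expand(state)` yields
    (rule_prob, next_state) pairs. Because all parses take the same
    number of steps, probabilities at a given stage are comparable."""
    successors = [(p * rp, nxt)
                  for p, state in partials
                  for rp, nxt in expand(state)]
    return heapq.nlargest(beam_width, successors, key=lambda x: x[0])
```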

17
Forward and Backward Pr

[Figure: trellis over positions 1 … t-1, t, … T for "The big brown fox", with nested phrases NP, N', N'' above the words]

• Forward = the probability of everything up to and including a certain node:
  α_i(t) = P(w_1(t-1), X_t = i)
• Backward = the probability of everything below/after the node, given the node:
  β_i(t) = P(w_tT | X_t = i)

18
Inside and outside probabilities

[Figure: N^1 = start symbol at the root; the outside probability α_j(p,q) covers everything outside node N^j; the inside probability β_j(p,q) covers the words below it, in the string w_1 … w_(p-1) w_p … w_q w_(q+1) … w_m]

• N^j_pq = nonterminal N^j spans positions p through q in the string (phrase N^j dominates words w_pq)
• α_j = outside probabilities
• β_j = inside probabilities
• N^j dominates words w_p … w_q iff N^j ⇒* w_p … w_q

19
Inside and outside probabilities

[Figure: same configuration as the previous slide]

α_j(p,q) = P(w_1(p-1), N^j_pq, w_(q+1)m | G)
β_j(p,q) = P(w_pq | N^j_pq, G)

α_j(p,q) β_j(p,q) = P(N^1 ⇒* w_1m, N^j ⇒* w_pq | G)
                  = P(N^1 ⇒* w_1m | G) · P(N^j ⇒* w_pq | N^1 ⇒* w_1m, G)

20
Compute Pr of a string

• We use the inside algorithm, a dynamic programming algorithm based on the inside probabilities:
  P(w_1m | G) = P(N^1 ⇒* w_1m | G) = P(w_1m | N^1_1m, G) = β_1(1,m)
• Base case:
  β_j(k,k) = P(w_k | N^j_kk, G) = P(N^j → w_k | G)
• Induction:
  β_j(p,q) = Σ_{r,s} Σ_{d=p..q-1} P(N^j → N^r N^s) β_r(p,d) β_s(d+1,q)

21
Induction
Find β_j(p,q) for p < q: calculate over all 'splits' d, bottom up.

[Figure: rule N^j → N^r N^s, with N^r spanning w_p … w_d and N^s spanning w_(d+1) … w_q]

Multiply the three factors P(N^j → N^r N^s), β_r(p,d), and β_s(d+1,q); sum over all r, s, and split points d.

22
Example PCFG
Grammar (as before):
 1. S → NP VP        1.0
 2. VP → V NP PP     0.4
 3. VP → V NP        0.6
 4. NP → N           0.7
 5. NP → N PP        0.3
 6. PP → PREP N      1.0
 7. N → a_dog        0.3
 8. N → a_cat        0.5
 9. N → a_telescope  0.2
10. V → saw          1.0
11. PREP → with      1.0

[Figure: the two parse trees from the earlier Pr calculation slide]

P(a_dog saw a_cat with a_telescope)
 = 1 × .7 × .4 × .3 × .7 × 1 × .5 × 1 × 1 × .2  +  1 × .7 × .6 × .3 × .3 × 1 × .5 × 1 × 1 × .2
 = .00588 + .00378 = .00966
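Running the `inside_probability` sketch from above on this grammar reproduces the total, assuming that function is in scope. The grammar first has to be put in CNF: here the unary NP → N is folded into the lexicon and VP → V NP PP is binarized with a hypothetical intermediate symbol @NP_PP (both transformations preserve the string probability):

```python
lexical = {
    "a_dog":       [("N", 0.3), ("NP", 0.7 * 0.3)],   # NP -> N -> a_dog folded in
    "a_cat":       [("N", 0.5), ("NP", 0.7 * 0.5)],
    "a_telescope": [("N", 0.2), ("NP", 0.7 * 0.2)],
    "saw":         [("V", 1.0)],
    "with":        [("PREP", 1.0)],
}
binary = [
    ("S", "NP", "VP", 1.0),
    ("VP", "V", "@NP_PP", 0.4), ("@NP_PP", "NP", "PP", 1.0),  # VP -> V NP PP
    ("VP", "V", "NP", 0.6),
    ("NP", "N", "PP", 0.3),
    ("PP", "PREP", "N", 1.0),
]
words = "a_dog saw a_cat with a_telescope".split()
print(inside_probability(words, lexical, binary))  # ≈ 0.00966
```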

23
Compute outside Pr, α_j(p,q)

[Figure, case 1 (node left of its sibling): parent N^f_pe expands as N^f → N^j N^g, with N^j_pq on the left and sibling N^g_(q+1)e on the right, over w_1 … w_p … w_q w_(q+1) … w_e … w_m]

24
Compute outside Pr, α_j(p,q)

[Figure, case 2 (node right of its sibling): parent N^f_eq expands as N^f → N^g N^j, with sibling N^g_e(p-1) on the left and N^j_pq on the right, over w_1 … w_e … w_(p-1) w_p … w_q … w_m]

Sum over both cases; restrict g ≠ j in one sum to avoid double counting rules of the form N^f → N^j N^j.
25
Ambiguity in Vietnamese syntactic parsing
• Two types of ambiguity:
  • A sentence can be understood in different ways, resulting in different syntactic trees
    • E.g., "Tôi nhìn thấy anh Hải ở tầng hai" ("I saw Mr. Hải on the second floor")
  • A sentence with only one meaning, but the syntactic parser generates more than one syntactic tree, of which only one is correct
    • E.g., "Hôm nay trời mưa" ("It is raining today")

26
Ambiguity in Vietnamese syntactic parsing

[Figure: two candidate trees for "Hôm nay trời mưa". Label glosses: Câu = sentence, Trạng ngữ = adverbial, Chủ ngữ = subject, Vị ngữ = predicate, Danh ngữ = noun phrase, Danh từ = noun, Động từ = verb]

(a) (Câu (Chủ ngữ (Danh ngữ (Danh từ Hôm nay) (Danh từ trời)))
         (Vị ngữ (Động từ mưa)))

(b) (Câu (Trạng ngữ (Danh từ Hôm nay))
         (Chủ ngữ (Danh từ trời))
         (Vị ngữ (Động từ mưa)))

27
Ambiguity in Vietnamese syntactic parsing
Solutions:
Solution 1: use more fine-grained labels for word and phrase classes.
Instead of the rule
  <Danh ngữ> → <Danh từ><Danh từ>
use a rule
  <Danh ngữ> → <Danh từ loại A><Danh từ loại B>.
Disadvantages:
• The set of syntactic labels is not unique.
• The size of the rule set increases considerably.
• The rule set needs to be created manually → difficult to do.

28
Ambiguity in Vietnamese syntactic parsing
Solution 2: add probabilities to the rule set
• The ambiguity in the sentence "Tôi nhìn thấy anh Hải ở tầng hai" can be solved
• Ambiguity that depends on the properties of specific words is still not solved
  • E.g., the noun phrase "vấn đề trong phần trước và phần này" ("the problem in the previous section and this section")

29
Ambiguity in Vietnamese syntactic parsing

[Figure: first candidate tree, with the coordination inside the prepositional phrase. Label glosses: giới ngữ = prepositional phrase, giới từ = preposition, liên từ = conjunction, tính từ = adjective, đại từ chỉ định = demonstrative pronoun]

(danh ngữ (danh từ vấn đề)
          (giới ngữ (giới từ trong)
                    (danh ngữ (danh ngữ (danh từ phần) (tính từ trước))
                              (liên từ và)
                              (danh ngữ (danh từ phần) (đại từ chỉ định này)))))

30
Ambiguity in Vietnamese syntactic parsing

[Figure: second candidate tree, with the coordination at the top level]

(danh ngữ (danh ngữ (danh từ vấn đề)
                    (giới ngữ (giới từ trong)
                              (danh ngữ (danh từ phần) (tính từ trước))))
          (liên từ và)
          (danh ngữ (danh từ phần) (đại từ chỉ định này)))

31
Specific words may affect the result of syntactic parsing
For example:
• "Tôi ăn" ("I eat") is rarely accepted as a sentence because it carries too little information.
• "Tôi đang ăn" ("I am eating") is more likely to be accepted.
➢ We have to consider the characteristics of the head word of a sentence.
2. Ambiguity due to omitting the connective word:
• Can say: "bạn tôi", "con tôi" ("my friend", "my child");
• Cannot say: "con chó tôi", "con mèo tôi" ("my dog", "my cat" without the connective).
➢ Words also play an important role in syntactic parsing
➢ Add word information to the grammar (enriching the PCFG)

32
Enriching a PCFG
• A naive PCFG works quite poorly due to its independence assumptions
• Fix: encode more information into the nonterminal space
• Structure sensitivity
  • The expansion of a node depends a lot on its position in the tree (independently of its lexical content)
  • E.g., enrich nodes by also recording their parents: an NP under S (NP^S) is different from an NP under VP (NP^VP)

33
Enriching a PCFG
• (Head) lexicalization (Collins 1997; Charniak 1997)
  • The head word of a phrase gives a good representation of the phrase's structure and meaning
  • Puts the properties of words back into a PCFG

  VP(dumped) → VBD(dumped) NP(sacks) PP(into)   3×10^-10
  VP(dumped) → VBD(dumped) NP(cats) PP(into)    8×10^-11

34
Enriching a PCFG
• Lexicalized PCFG: PLCFG (Probabilistic Lexicalized CFG, Collins 1997; Charniak 1997)
  • Puts the properties of words back into a PCFG
  • Head structure
    • Each node in the parse tree is annotated with a lexical head
    • To assign a head to a node, we pick it from among the node's children (i.e., designate the head in the RHS of each rule); a head-finding sketch follows
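A minimal head-finding sketch in the spirit of Collins-style head rules; the priority lists and fallback here are illustrative simplifications, not Collins' actual tables:

```python
# per-parent priority lists of preferred head children (illustrative only)
HEAD_RULES = {
    "S":  ["VP"],                  # a sentence is headed by its VP
    "VP": ["VBD", "VBX", "VB"],    # a VP is headed by its verb
    "NP": ["NN", "NX", "NP"],      # an NP is headed by a noun
    "PP": ["IN", "PREP"],          # a PP is headed by its preposition
}

def find_head(parent, children):
    """Pick the head child of `parent` given its children's labels."""
    for wanted in HEAD_RULES.get(parent, []):
        for child in children:
            if child == wanted:
                return child
    return children[-1]  # fallback: rightmost child (a simplification)

print(find_head("VP", ["VBD", "NP", "PP"]))  # VBD, so VP(dumped) gets head "dumped"
```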

35
Enriching a PCFG
VP(dumped) → VBD(dumped) NP(sacks) PP(into)   3×10^-10
VP(dumped) → VBD(dumped) NP(cats) PP(into)    8×10^-11

36
Limitations of PLCFG
VP → VBD NP PP
VP(dumped) → VBD(dumped) NP(sacks) PP(into)

• We don't have a corpus large enough to represent all syntactic contexts for each word!

37
Penn Treebank
• The Penn Treebank – 1 million words of parsed English WSJ text – has been a key resource
• Sparseness:
  • 965,000 constituents, but only 66 WHADJP, of which only 6 aren't "how much" or "how many"
• Most intelligent processing depends on bilexical statistics: the likelihoods of relationships between pairs of words

38
A Penn Treebank tree

39
Evaluation

40
Performance’s Measurements
Human assignments
Total
Yes No
System Yes HSA SA - HSA SA
assignments No HA - HSA
Total HA

Precision: %assignments made that were correct (%THợp hệ tính đúng).


Recall: %possible assignments that were actually assigned (%THợp hệ
tính đúng so với con người).
HSA HSA
precision = recall =
SA HA
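A minimal sketch of this evaluation over labeled constituents, assuming trees as nested (label, children...) tuples; it counts every labeled span, including preterminals, which is a simplification of PARSEVAL:

```python
def constituents(tree):
    """Collect labeled spans (label, start, end) from a bracketed tree."""
    spans, pos = set(), [0]
    def walk(node):
        if isinstance(node, str):   # a word: just advance the position
            pos[0] += 1
            return
        start = pos[0]
        for child in node[1:]:
            walk(child)
        spans.add((node[0], start, pos[0]))
    walk(tree)
    return spans

def precision_recall(gold_tree, system_tree):
    gold, system = constituents(gold_tree), constituents(system_tree)
    hsa = len(gold & system)                    # spans both agree on (HSA)
    return hsa / len(system), hsa / len(gold)   # HSA/SA, HSA/HA
```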

41
Represent a tree by its syntactic constituents

42
Evaluate

43
Example 2

44
Exercise – compute P, R
Gold standard syntactic structure:
• (S (NP (N Cơn)(N lũ)) (VP(V cuốn)(V qua) (NP (L những)(N phận)(N người))) (. .))
• (S(NP(N Phận)(N người) (PP(E ở) (NP(Np Bình Sơn))))(. .))
Automatically generated syntactic structure:

45
Some syntactic parsers:
• CFG (context free grammar):
• Berkeley : http://nlp.cs.berkeley.edu/software.shtml
• Charniak: http://bllip.cs.brown.edu/resources.shtml
• HPSG (Head-driven Phrase Structure Grammar)
• Enju, deepNLP: https://mynlp.github.io/enju/
• Dependency grammar
• ClearNLP : http://clearnlp.wikispaces.com/depParser
• Google SyntaxNet: open-source, using deep learning
• https://research.googleblog.com/2016/05/announcing-syntaxnet-
worlds-most.html
• Netbase, for twitter sentences
• https://www.codeproject.com/Articles/43372/NetBase-A-Minimal-
NET-Database-with-a-Small-SQL
• Stanford : https://nlp.stanford.edu/software/lex-parser.shtml

46
