NLPPR6
Department of
Artificial Intelligence and Data Science
Background Theory:
Introduction
Parsing is a fundamental task in Natural Language Processing (NLP), in which the structure of
a sentence is analyzed according to grammar rules. The CKY (Cocke–Kasami–Younger)
algorithm is a well-known parsing algorithm for context-free grammars written in
Chomsky Normal Form (CNF). When the grammar rules are assigned probabilities, it becomes
the Probabilistic CKY (PCKY) algorithm, which finds the most likely parse tree of a
sentence.
CKY Algorithm
What is CKY?
The CKY (Cocke–Kasami–Younger) algorithm is a bottom-up dynamic programming
algorithm used to parse a string and determine whether it can be generated by a context-free
grammar (CFG) in CNF.
Note: Chomsky Normal Form (CNF) is a format where every production rule is either:
• A → BC (where B and C are non-terminal symbols)
• A → a (where a is a terminal symbol)
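The two rule shapes above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical encoding in which each rule is a (left-hand side, right-hand side tuple) pair and non-terminals are the symbols that start with an uppercase letter; the names is_nonterminal and is_cnf are illustrative, not part of any library.

```python
def is_nonterminal(symbol):
    # Assumed convention: non-terminals start with an uppercase letter.
    return symbol[:1].isupper()


def is_cnf(rules):
    """Return True if every rule has the form A -> B C or A -> a."""
    for lhs, rhs in rules:
        binary = len(rhs) == 2 and all(is_nonterminal(s) for s in rhs)
        lexical = len(rhs) == 1 and not is_nonterminal(rhs[0])
        if not (binary or lexical):
            return False
    return True


cnf_rules = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("Det", ("the",)),
    ("N", ("dog",)),
]
# Violates CNF twice: a three-symbol right-hand side and a unit rule NP -> N.
non_cnf_rules = [
    ("S", ("NP", "VP", "PP")),
    ("NP", ("N",)),
]

print(is_cnf(cnf_rules))      # True
print(is_cnf(non_cnf_rules))  # False
```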
Working of CKY
Given a string of words w1, w2, ..., wn and a grammar in CNF:
1. Initialize a 2D table (n x n) where each cell [i][j], with i ≤ j, stores the set of
non-terminal symbols that can generate the substring from wi to wj.
2. Fill the diagonal: For each word wi, add to [i][i] all non-terminals A such that A → wi
is a production rule.
3. Bottom-up filling: For substrings longer than 1, in order of increasing length, check
every split point k with i ≤ k < j, and for every pair of non-terminals (B in [i][k],
C in [k+1][j]), add A to [i][j] if there is a rule A → BC.
4. Check the top cell: If the start symbol S is in [1][n], the cell that covers the whole
string, then the string can be generated by the grammar.
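The four steps can be sketched in Python as follows. This is a minimal recognizer under stated assumptions, not a definitive implementation: the grammar is stored in two hypothetical dictionaries (lexical maps a word to the non-terminals that produce it, binary maps a pair (B, C) to the non-terminals A with A → BC), and the table uses 0-based indices, so the cell covering the whole string is table[0][n-1].

```python
def cky_recognize(words, lexical, binary, start="S"):
    """Return True if the CNF grammar can generate the word sequence."""
    n = len(words)
    if n == 0:
        return False
    # Step 1: n x n table of sets; only cells with i <= j are used.
    table = [[set() for _ in range(n)] for _ in range(n)]
    # Step 2: fill the diagonal from the lexical rules A -> word.
    for i, word in enumerate(words):
        table[i][i] = set(lexical.get(word, ()))
    # Step 3: fill longer spans bottom-up, shortest spans first.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split into [i..k] and [k+1..j]
                for left in table[i][k]:
                    for right in table[k + 1][j]:
                        table[i][j] |= binary.get((left, right), set())
    # Step 4: accept if the start symbol covers the whole string.
    return start in table[0][n - 1]


# Small illustrative grammar in the assumed dictionary encoding.
lexical = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "chased": {"V"}}
binary = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}}

print(cky_recognize("the dog chased the cat".split(), lexical, binary))  # True
print(cky_recognize("dog the chased cat the".split(), lexical, binary))  # False
```

The three nested loops over span length, start position, and split point give the cubic running time in the sentence length that CKY is known for.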
Example
Given:
• Grammar in CNF:
  o S → NP VP
  o NP → Det N
  o VP → V NP
  o Det → 'the'
  o N → 'dog' | 'cat'
  o V → 'chased'
• Input: "the dog chased the cat"
CKY fills a table where each cell holds possible non-terminals that derive the substring. At
the end, it checks if 'S' (start symbol) is in the topmost cell.
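To make the table concrete, the sketch below fills it for the example input and prints every non-empty cell. The function name cky_table and the dictionary encoding of the grammar are assumptions made for illustration; indices are 0-based, so cell [0][4] covers the whole five-word sentence.

```python
def cky_table(words, lexical, binary):
    """Fill and return the CKY table (0-based, cells with i <= j are used)."""
    n = len(words)
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][i] |= lexical.get(word, set())
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for left in table[i][k]:
                    for right in table[k + 1][j]:
                        table[i][j] |= binary.get((left, right), set())
    return table


# The example grammar, in the assumed dictionary encoding.
lexical = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "chased": {"V"}}
binary = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}}

words = "the dog chased the cat".split()
table = cky_table(words, lexical, binary)

# Print each non-empty cell with the substring it covers.
for i in range(len(words)):
    for j in range(i, len(words)):
        if table[i][j]:
            print(f"[{i}][{j}] {' '.join(words[i:j + 1])!r}: {sorted(table[i][j])}")
```

For this input the only multi-word cells that get filled are [0][1] (NP, "the dog"), [3][4] (NP, "the cat"), [2][4] (VP, "chased the cat") and [0][4] (S, the whole sentence); spans such as "dog chased" stay empty because no rule combines N with V.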
# Sentences to parse
sentences = [
    "the dog chased a cat",
    "a cat saw the ball",
    "the dog saw a dog",
    "a dog chased the ball",
    "the cat saw a cat",
]
Parsing sentence: "the dog chased a cat"
            S
            |
    |               VP
    |               |
    NP         |         NP
    |          |         |
Det     N      V     Det     N
 |      |      |      |      |
the    dog  chased    a     cat

Parsing sentence: "a cat saw the ball"
            S
            |
    |               VP
    |               |
    NP         |         NP
    |          |         |
Det     N      V     Det     N
 |      |      |      |      |
 a     cat    saw    the    ball

Parsing sentence: "the dog saw a dog"
            S
            |
    |               VP
    |               |
    NP         |         NP
    |          |         |
Det     N      V     Det     N
 |      |      |      |      |
the    dog    saw     a     dog

Parsing sentence: "a dog chased the ball"
            S
            |
    |               VP
    |               |
    NP         |         NP
    |          |         |
Det     N      V     Det     N
 |      |      |      |      |
 a     dog  chased   the    ball

Parsing sentence: "the cat saw a cat"
            S
            |
    |               VP
    |               |
    NP         |         NP
    |          |         |
Det     N      V     Det     N
 |      |      |      |      |
the    cat    saw     a     cat
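Trees like the ones above can be recovered by storing a backpointer for every non-terminal added to a cell. The following is a minimal sketch under stated assumptions, not the program that produced the output above: the lexicon is extended with 'a', 'saw' and 'ball', which the listed sentences use but the example grammar does not define, the function name cky_parse is hypothetical, and trees are printed in bracketed form rather than as drawings.

```python
def cky_parse(words, lexical, binary, start="S"):
    """Return one parse as a bracketed string, or None if there is no parse."""
    n = len(words)
    # back[i][j][A] is the word itself (leaf) or a (k, B, C) backpointer.
    back = [[{} for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        for a in lexical.get(word, ()):
            back[i][i][a] = word
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for b in back[i][k]:
                    for c in back[k + 1][j]:
                        for a in binary.get((b, c), ()):
                            # Keep the first derivation found for A.
                            back[i][j].setdefault(a, (k, b, c))

    def build(i, j, a):
        entry = back[i][j][a]
        if isinstance(entry, str):
            return f"({a} {entry})"
        k, b, c = entry
        return f"({a} {build(i, k, b)} {build(k + 1, j, c)})"

    if n == 0 or start not in back[0][n - 1]:
        return None
    return build(0, n - 1, start)


lexical = {
    "the": {"Det"}, "a": {"Det"},                 # 'a' is an assumed addition
    "dog": {"N"}, "cat": {"N"}, "ball": {"N"},    # 'ball' is an assumed addition
    "chased": {"V"}, "saw": {"V"},                # 'saw' is an assumed addition
}
binary = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}}

sentences = [
    "the dog chased a cat",
    "a cat saw the ball",
    "the dog saw a dog",
    "a dog chased the ball",
    "the cat saw a cat",
]
for sentence in sentences:
    print(f'Parsing sentence: "{sentence}"')
    print(cky_parse(sentence.split(), lexical, binary))
```

For "the dog saw a dog" this prints (S (NP (Det the) (N dog)) (VP (V saw) (NP (Det a) (N dog)))). Keeping only the first backpointer per non-terminal is enough here because the grammar is unambiguous; a probabilistic variant would instead keep the highest-scoring one.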
Conclusion
The CKY algorithm provides an efficient dynamic programming technique for parsing sentences
with context-free grammars in Chomsky Normal Form. Its probabilistic extension, the PCKY
algorithm, handles ambiguous sentences by selecting the most likely parse tree. Together, they
form a foundational concept in computational linguistics and natural language understanding.