
Bansilal Ramnath Agarwal Charitable Trust's

Vishwakarma Institute of Information Technology

Department of
Artificial Intelligence and Data Science

Student Name: Vaishnavi Lawate

Class: TY Division: A Roll No: 371032

Semester: 6th Academic Year: 2024-25

Subject Name & Code: Natural Language Processing


Title of Assignment: 6) Implement CKY algorithm for deep parsing

Aim: Implement the CKY algorithm for deep parsing

Background Theory:

Introduction

Parsing is a fundamental task in Natural Language Processing (NLP), where the structure of
a sentence is analyzed based on grammar rules. The CKY algorithm (Cocke–Kasami–
Younger) is a well-known parsing algorithm for context-free grammars, especially in
Chomsky Normal Form (CNF). When extended with probabilities, it becomes the
Probabilistic CKY (PCKY) algorithm, which is used to find the most likely parse tree of a
sentence.

CKY Algorithm

What is CKY?
The CKY (Cocke–Kasami–Younger) algorithm is a bottom-up dynamic programming
algorithm used to parse a string and determine whether it can be generated by a context-free
grammar (CFG) in CNF.
Note: Chomsky Normal Form (CNF) is a format where every production rule is either:
• A → BC (where B and C are non-terminal symbols)
• A → a (where a is a terminal symbol)

Working of CKY
Given a string of words w1, w2, ..., wn and a grammar in CNF:
1. Initialize a 2D table (n x n) where each cell [i][j] stores the set of non-terminal
symbols that can generate the substring from wi to wj.
2. Fill the diagonal: For each word wi, add all non-terminals A such that A → wi is a
production rule.
3. Bottom-up filling: For substrings longer than 1, check for all splits k between i and j,
and for all pairs of non-terminals (B in [i][k], C in [k+1][j]), add A to [i][j] if there's a
rule A → BC.
4. Check the top cell: If the start symbol S is in [0][n-1], then the string can be
generated by the grammar.
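As a concrete illustration of these steps, below is a minimal Python sketch of a CKY recognizer. The dictionary-based rule format and the function name are illustrative assumptions, not part of the assignment code:

# Minimal CKY recognizer sketch (illustrative, not the assignment's code).
# The grammar is assumed to be in CNF, given as two rule dictionaries.

def cky_parse(words, unary_rules, binary_rules, start="S"):
    """Return True if `words` can be derived from `start`.

    unary_rules:  terminal -> set of non-terminals A with A -> terminal
    binary_rules: (B, C)   -> set of non-terminals A with A -> B C
    """
    n = len(words)
    # table[i][j] holds the non-terminals deriving words[i..j] (inclusive)
    table = [[set() for _ in range(n)] for _ in range(n)]

    # Step 2: fill the diagonal with lexical rules A -> w_i
    for i, word in enumerate(words):
        table[i][i] = set(unary_rules.get(word, set()))

    # Step 3: bottom-up, by increasing span length
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            j = i + length - 1              # span end
            for k in range(i, j):           # split point
                for B in table[i][k]:
                    for C in table[k + 1][j]:
                        table[i][j] |= binary_rules.get((B, C), set())

    # Step 4: accept iff the start symbol covers the whole string
    return start in table[0][n - 1]

# Example with the toy grammar used in this assignment
unary = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "chased": {"V"}}
binary = {("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}
print(cky_parse("the dog chased the cat".split(), unary, binary))  # True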

Example
Given:
• Grammar in CNF:
o S → NP VP
o NP → Det N
o VP → V NP
o Det → 'the'
o N → 'dog' | 'cat'
o V → 'chased'
• Input: "the dog chased the cat"
CKY fills a table where each cell holds the possible non-terminals that derive the corresponding substring. At the end, it checks whether 'S' (the start symbol) is in the topmost cell.

Time and Space Complexity


• Time Complexity: O(n³ * |G|), where n is the number of words, and |G| is the number
of grammar rules.
• Space Complexity: O(n²)

Probabilistic CKY (PCKY)

What is Probabilistic CKY?


Probabilistic CKY extends CKY using Probabilistic Context-Free Grammar (PCFG), where
each production rule has a probability.
Goal: Instead of just checking if a sentence can be generated, PCKY finds the most probable
parse tree based on the given probabilities.

How PCFG Works


In a PCFG, each rule has a probability:
• S → NP VP [1.0]
• NP → Det N [0.6], NP → 'dog' [0.4]
• Det → 'the' [0.8]
• N → 'cat' [0.5], N → 'dog' [0.5]
The probability of a parse tree is the product of all rule probabilities used in the derivation.
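For example, under this grammar the noun phrase "the cat" is derived with NP → Det N [0.6], Det → 'the' [0.8], and N → 'cat' [0.5], so the probability of that subtree is 0.6 × 0.8 × 0.5 = 0.24.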

Working of Probabilistic CKY


• Similar table-filling as CKY, but each cell also stores the maximum probability for
each non-terminal.
• Instead of just storing a non-terminal, each cell stores a tuple (Non-terminal,
Probability, Backpointer).
• When combining entries with a rule A → B C at split point k, compute the candidate probability:
P(A, [i, j]) = P(B, [i, k]) × P(C, [k+1, j]) × P(A → B C)
• At the end, choose the parse tree with the highest probability.

Parse Tree Reconstruction


Using the backpointers stored in each cell, we can trace back to construct the most probable
parse tree.
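The sketch below extends the recognizer idea above to probabilistic CKY with backpointers. It is a minimal illustration: the rule format is assumed, and the probabilities for VP → V NP and V → 'chased' are invented for the example, since the text's PCFG does not list them:

# Minimal probabilistic CKY sketch (illustrative, not the assignment's code).
# lexical: word   -> list of (A, prob) for rules A -> word
# binary:  (B, C) -> list of (A, prob) for rules A -> B C

def pcky_parse(words, lexical, binary, start="S"):
    n = len(words)
    # best[i][j][A] = (max probability, backpointer) for A over words[i..j]
    best = [[dict() for _ in range(n)] for _ in range(n)]

    # Diagonal: lexical rules A -> w_i
    for i, w in enumerate(words):
        for A, p in lexical.get(w, []):
            best[i][i][A] = (p, w)

    # Bottom-up over span lengths, keeping the max probability per non-terminal
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for B, (pb, _) in best[i][k].items():
                    for C, (pc, _) in best[k + 1][j].items():
                        for A, pr in binary.get((B, C), []):
                            p = pb * pc * pr
                            if p > best[i][j].get(A, (0.0, None))[0]:
                                best[i][j][A] = (p, (k, B, C))

    def build(i, j, A):
        """Follow backpointers to reconstruct the best tree as nested tuples."""
        _, bp = best[i][j][A]
        if isinstance(bp, str):          # lexical entry: A -> word
            return (A, bp)
        k, B, C = bp
        return (A, build(i, k, B), build(k + 1, j, C))

    if start not in best[0][n - 1]:
        return None, 0.0
    return build(0, n - 1, start), best[0][n - 1][start][0]

# Example probabilities; VP and V entries are assumed for illustration
lexical = {"the": [("Det", 0.8)], "dog": [("N", 0.5), ("NP", 0.4)],
           "cat": [("N", 0.5)], "chased": [("V", 1.0)]}
binary = {("Det", "N"): [("NP", 0.6)], ("V", "NP"): [("VP", 1.0)],
          ("NP", "VP"): [("S", 1.0)]}
tree, p = pcky_parse("the dog chased the cat".split(), lexical, binary)
print(tree)
print(p)  # 0.0576 = (0.8 * 0.5 * 0.6) * 1.0 * (0.8 * 0.5 * 0.6)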

Differences Between CKY and PCKY


Feature       CKY                                     Probabilistic CKY (PCKY)
Input         CFG in CNF                              PCFG in CNF
Output        Whether the sentence can be generated   Most probable parse tree
Data stored   Set of non-terminals                    Non-terminals + probabilities + backpointers
Usage         Parsing/validating structure            Probabilistic parsing, language modeling
Output tree   Any valid parse tree                    Most probable parse tree
CODE AND OUTPUT:
from nltk import CFG
from nltk.parse.chart import ChartParser

# Define a simple context-free grammar.
# Note: the unit production VP -> V means this grammar is not strictly in
# CNF, but nltk's ChartParser does not require CNF.
grammar = CFG.fromstring("""
S -> NP VP
VP -> V NP | V
NP -> Det N
Det -> 'the' | 'a'
N -> 'dog' | 'cat' | 'ball'
V -> 'chased' | 'saw'
""")

# Create a parser
parser = ChartParser(grammar)

# Sentences to parse
sentences = [
    "the dog chased a cat",
    "a cat saw the ball",
    "the dog saw a dog",
    "a dog chased the ball",
    "the cat saw a cat"
]

# Parse and print each tree
for sentence in sentences:
    print(f"\nParsing sentence: \"{sentence}\"")
    tokens = sentence.split()
    trees = list(parser.parse(tokens))
    for tree in trees:
        tree.pretty_print()

Parsing sentence: "the dog chased a cat"

              S
       _______|_______
      NP              VP
   ___|___        ____|____
  Det     N      V         NP
   |      |      |      ___|___
  the    dog  chased  Det      N
                       |       |
                       a      cat

Parsing sentence: "a cat saw the ball"

              S
       _______|_______
      NP              VP
   ___|___        ____|____
  Det     N      V         NP
   |      |      |      ___|___
   a     cat    saw   Det      N
                       |       |
                      the    ball

Parsing sentence: "the dog saw a dog"

              S
       _______|_______
      NP              VP
   ___|___        ____|____
  Det     N      V         NP
   |      |      |      ___|___
  the    dog    saw   Det      N
                       |       |
                       a      dog

Parsing sentence: "a dog chased the ball"

              S
       _______|_______
      NP              VP
   ___|___        ____|____
  Det     N      V         NP
   |      |      |      ___|___
   a     dog  chased  Det      N
                       |       |
                      the    ball

Parsing sentence: "the cat saw a cat"

              S
       _______|_______
      NP              VP
   ___|___        ____|____
  Det     N      V         NP
   |      |      |      ___|___
  the    cat    saw   Det      N
                       |       |
                       a      cat

Conclusion
The CKY algorithm provides a powerful technique for parsing sentences using context-free grammars. Its probabilistic extension, the PCKY algorithm, enables more intelligent parsing by selecting the most likely parse tree. Together, they are foundational techniques in computational linguistics and natural language understanding.
