Syntactic Analysis
P.S.Sreeja
Syntax
• The word syntax comes from the Greek sýntaxis, meaning “setting
out together or arrangement”, and refers to the way words are
arranged together.
• Language Models: Importance of modeling word order
• POS categories: An equivalence class for words
• More complex notions: constituency, grammatical relations,
subcategorization etc.
Defining the notions: Constituency
• Constituency
• The fundamental idea of constituency is that groups of words may behave as
a single unit or phrase, called a constituent.
• For example, a group of words called a noun phrase often acts as a unit;
a noun phrase can be a single word (such as a pronoun or a proper name) or a longer phrase.
• Ex:
Preposed or Postposed:
• Terminal Symbols
• Non-Terminal Symbols
• Derivation
• Parse tree
Examples
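As an illustration of terminal symbols, non-terminal symbols, and a derivation, here is a minimal sketch. The toy grammar, the sentence, and the helper name leftmost_derivation are assumptions made for this example and are not taken from the slides.

```python
# Toy context-free grammar (an assumption for illustration only).
# Keys are non-terminal symbols; words that never appear as keys are terminals.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["dog"], ["cat"]],
    "V": [["chased"]],
}


def is_nonterminal(symbol):
    return symbol in GRAMMAR


def leftmost_derivation(choices, start="S"):
    """Rewrite the leftmost non-terminal at each step.

    choices[i] is the index of the rule to apply at step i.
    Returns every sentential form, from the start symbol onward.
    """
    form = [start]
    steps = [list(form)]
    for choice in choices:
        # Position of the leftmost non-terminal in the current form.
        idx = next(i for i, s in enumerate(form) if is_nonterminal(s))
        form = form[:idx] + GRAMMAR[form[idx]][choice] + form[idx + 1:]
        steps.append(list(form))
    return steps


for step in leftmost_derivation([0, 0, 0, 0, 0, 0, 0, 0, 1]):
    print(" ".join(step))
```

Each printed line is one step of the derivation, starting at S and ending with the terminal string "the dog chased the cat". The parse tree records the same rule applications as a tree rather than as a sequence.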
Syntactic parsing
What is Parsing?
• The process of taking a string and a grammar and returning all
possible parse trees for that string
• That is, find all trees, whose root is the start symbol S, which
cover exactly the words in the input
Top-Down Parsing
• Searches for a parse tree by trying to build from the root node S
down to the leaves
• Start by assuming that the input can be derived by the designated
start symbol S
• Find all trees that can start with S, by looking at the grammar rules
with S on the left-hand side
• Trees are grown downward until they eventually reach the POS
categories at the bottom
• Trees whose leaves fail to match the words in the input can be
rejected
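The top-down strategy above can be sketched as a small recursive-descent search that returns every parse tree. This is a minimal sketch under stated assumptions: the toy rules, the lexicon, and the function names are hypothetical, and the grammar is kept free of left recursion, which this naive approach cannot handle.

```python
# Toy grammar and lexicon (assumptions for illustration only).
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "N", "PP"]],
    "VP": [["V", "NP"], ["V", "NP", "PP"]],
    "PP": [["P", "NP"]],
}
LEXICON = {
    "Det": {"the", "a"},
    "N": {"dog", "cat", "park"},
    "V": {"saw"},
    "P": {"in"},
}


def expand_sequence(symbols, words, i):
    """Yield (subtrees, next_position) for a sequence of symbols starting at i."""
    if not symbols:
        yield [], i
        return
    for tree, j in expand(symbols[0], words, i):
        for rest, k in expand_sequence(symbols[1:], words, j):
            yield [tree] + rest, k


def expand(symbol, words, i):
    """Yield (tree, next_position) for each way symbol can cover words[i:...]."""
    if symbol in LEXICON:
        # A POS category at the bottom of the tree: it must match the next word,
        # otherwise this branch of the search is rejected.
        if i < len(words) and words[i] in LEXICON[symbol]:
            yield (symbol, words[i]), i + 1
        return
    for rhs in RULES.get(symbol, []):
        for children, j in expand_sequence(rhs, words, i):
            yield (symbol, *children), j


def top_down_parse(words, start="S"):
    """Return all trees rooted in start that cover exactly the input words."""
    return [tree for tree, j in expand(start, words, 0) if j == len(words)]


trees = top_down_parse("the dog saw a cat in the park".split())
print(len(trees))
```

With this toy grammar the example sentence has two parses, because the prepositional phrase can attach either to the noun phrase or to the verb phrase.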
Bottom-Up Parsing
• The parser starts with the words of the input and tries to build trees
from the words up, by applying rules from the grammar one at a time
• The parser looks for places in the parse-in-progress where the
right-hand side of some rule might fit.
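The bottom-up idea can be sketched as an exhaustive search that repeatedly replaces a span matching the right-hand side of a rule with that rule's left-hand side. This is a naive illustration, not an efficient parser; the toy rules and the function names are assumptions made for this example.

```python
# Toy grammar as (left-hand side, right-hand side) pairs (an assumption).
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("VP", ("V", "NP")),
    ("Det", ("the",)),
    ("N", ("dog",)),
    ("N", ("cat",)),
    ("V", ("chased",)),
]


def label(node):
    # A node is either a bare word (str) or a tuple (label, child, ...).
    return node if isinstance(node, str) else node[0]


def bottom_up_parse(words, start="S"):
    """Return the set of trees rooted in start that cover all the words."""
    found = set()
    seen = set()

    def search(nodes):
        if nodes in seen:  # the same partial parse reached by another order
            return
        seen.add(nodes)
        if len(nodes) == 1 and not isinstance(nodes[0], str) and label(nodes[0]) == start:
            found.add(nodes[0])
            return
        for lhs, rhs in RULES:
            n = len(rhs)
            for i in range(len(nodes) - n + 1):
                # Does the right-hand side fit at position i?
                if tuple(label(x) for x in nodes[i:i + n]) == rhs:
                    new_node = (lhs, *nodes[i:i + n])
                    search(nodes[:i] + (new_node,) + nodes[i + n:])

    search(tuple(words))
    return found


print(bottom_up_parse("the dog chased the cat".split()))
```

The search starts from the words alone, so categories are only ever proposed where the input supports them; the cost is that it may build subtrees that can never lead to S.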
Grammatical relations
• Grammatical relations are a formalization of ideas from traditional grammar
such as SUBJECTS and OBJECTS, and other related notions.
• We replace the leftmost pair of non-terminals with a new non-terminal and introduce
a new production, resulting in the following new rules.
• The intuition can be captured with the following three transition operators, which
operate on the top two elements of the stack:
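• LEFTARC: Assert a head-dependent relation between the word at the top of the
stack and the second word; remove the second word from the stack.
• RIGHTARC: Assert a head-dependent relation between the second word on the
stack and the word at the top; remove the top word from the stack.
• SHIFT: Remove the word from the front of the input buffer and push it onto the
stack.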
• Operations like LEFTARC and RIGHTARC are sometimes called reduce
operations, based on a metaphor from shift-reduce parsing, in which
reducing means combining elements on the stack.
• There are some preconditions for using the operators.
• The LEFTARC operator cannot be applied when ROOT is the second element
of the stack (since by definition the ROOT node cannot have any incoming
arcs).
• Both the LEFTARC and RIGHTARC operators require two elements to be on the
stack to be applied.
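The transition operators and their preconditions can be sketched as follows. This is a minimal sketch, assuming the standard arc-standard definition of RIGHTARC (the second word on the stack becomes the head of the top word, and the top word is removed). The function name arc_standard and the example sentence are hypothetical, the action sequence is supplied by hand rather than predicted, and words are assumed to be unique within the sentence.

```python
def arc_standard(words, actions):
    """Apply a sequence of SHIFT / LEFTARC / RIGHTARC actions.

    Returns (arcs, stack, buffer), where each arc is a (head, dependent) pair.
    """
    stack = ["ROOT"]
    buffer = list(words)
    arcs = []
    for action in actions:
        if action == "SHIFT":
            if not buffer:
                raise ValueError("SHIFT requires a non-empty input buffer")
            stack.append(buffer.pop(0))
        elif action in ("LEFTARC", "RIGHTARC"):
            # Precondition: both arc operators need two elements on the stack.
            if len(stack) < 2:
                raise ValueError(f"{action} requires two elements on the stack")
            top, second = stack[-1], stack[-2]
            if action == "LEFTARC":
                # Precondition: ROOT cannot have any incoming arcs.
                if second == "ROOT":
                    raise ValueError("LEFTARC cannot make ROOT a dependent")
                arcs.append((top, second))  # top is the head of second
                del stack[-2]
            else:
                arcs.append((second, top))  # second is the head of top
                stack.pop()
        else:
            raise ValueError(f"unknown action: {action}")
    return arcs, stack, buffer


actions = ["SHIFT", "SHIFT", "RIGHTARC", "SHIFT", "SHIFT",
           "LEFTARC", "RIGHTARC", "RIGHTARC"]
arcs, stack, buffer = arc_standard("book me the flight".split(), actions)
print(arcs)
```

A parse is complete when the buffer is empty and only ROOT remains on the stack; in a real parser the next action would be chosen by a trained classifier (an oracle) instead of being listed in advance.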