Parsing
• Key Characteristics:
• Uses a probabilistic approach to determine the most likely parse tree.
• Often employs probabilistic context-free grammars (PCFGs), Bayesian models, or neural parsers.
• Outputs phrase structure trees, which represent constituents (noun phrases, verb
phrases, etc.).
• Example parser: Stanford Parser using PCFG.
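To make the probabilistic idea concrete, here is a minimal sketch of how a PCFG scores competing parse trees: the probability of a tree is the product of the probabilities of the rules it uses, and the parser prefers the highest-scoring tree. The grammar, the rule probabilities, and the example sentence "John saw the man with the telescope" are all made up for illustration; they are not taken from the Stanford Parser or any trained model.

```python
# Hypothetical toy PCFG: (lhs, rhs) -> probability.
# Probabilities of all rules sharing a left-hand side sum to 1.
PCFG = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("NP", ("Det", "N")): 0.5,
    ("NP", ("John",)): 0.3,
    ("PP", ("P", "NP")): 1.0,
    ("V", ("saw",)): 1.0,
    ("Det", ("the",)): 1.0,
    ("N", ("man",)): 0.5,
    ("N", ("telescope",)): 0.5,
    ("P", ("with",)): 1.0,
}


def tree_probability(tree):
    """Multiply the probabilities of every rule used in the tree.

    A tree is a nested tuple (label, child, ...); a leaf child is a plain word.
    """
    label, children = tree[0], tree[1:]
    if all(isinstance(child, str) for child in children):
        return PCFG[(label, children)]  # lexical rule, e.g. N -> man
    rhs = tuple(child[0] for child in children)
    prob = PCFG[(label, rhs)]
    for child in children:
        prob *= tree_probability(child)
    return prob


# Two candidate trees for the ambiguous sentence (assumed example).
the_man = ("NP", ("Det", "the"), ("N", "man"))
with_telescope = ("PP", ("P", "with"), ("NP", ("Det", "the"), ("N", "telescope")))
candidates = {
    "VP attachment": ("S", ("NP", "John"),
                      ("VP", ("VP", ("V", "saw"), the_man), with_telescope)),
    "NP attachment": ("S", ("NP", "John"),
                      ("VP", ("V", "saw"), ("NP", the_man, with_telescope))),
}

# The most likely parse is the candidate with the highest probability.
best = max(candidates, key=lambda name: tree_probability(candidates[name]))
print(best)
```

With these invented probabilities the VP attachment scores higher; a real parser would estimate the rule probabilities from a treebank and search over all trees rather than a hand-picked pair.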
Dependency Parsing
• Dependency parsing focuses on identifying relations between words in a sentence,
representing syntactic structure as a directed graph of dependencies. Instead of phrase
structures, it models head-dependent relationships.
• Key Characteristics:
• Determines grammatical relationships between words (e.g., subject-verb, object-verb).
• Often more directly useful than phrase structure trees for semantic analysis and information extraction.
• Works well for free-word-order languages (e.g., Hindi, Turkish) since it does not rely on phrase
structure.
• Example parsers: spaCy's dependency parser, Stanford Dependency Parser.
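As an illustration of head-dependent relationships, the sketch below stores a dependency parse as a list of labeled arcs and reads a subject-verb-object triple off it. The arcs for "John is playing a game" are annotated by hand with Universal Dependencies-style labels (an assumption for this example); they are not the output of spaCy or the Stanford parser, and the `Arc` type and `extract_svo` helper are hypothetical names.

```python
from collections import namedtuple

# One directed dependency: head --relation--> dependent.
Arc = namedtuple("Arc", "head relation dependent")

# Hand-annotated parse of "John is playing a game" (illustrative labels).
arcs = [
    Arc("ROOT", "root", "playing"),
    Arc("playing", "nsubj", "John"),
    Arc("playing", "aux", "is"),
    Arc("playing", "obj", "game"),
    Arc("game", "det", "a"),
]


def extract_svo(arcs):
    """Return (subject, verb, object) by following arcs out of the root verb."""
    verb = next(a.dependent for a in arcs if a.relation == "root")
    subject = next(a.dependent for a in arcs
                   if a.head == verb and a.relation == "nsubj")
    obj = next(a.dependent for a in arcs
               if a.head == verb and a.relation == "obj")
    return subject, verb, obj


print(extract_svo(arcs))
```

Because the relations attach directly to words, this kind of extraction does not depend on word order. Identifying words by their surface form is a simplification; a real representation would use token positions so that repeated words stay distinct.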
Parsing Techniques in NLP
• A parse tree provides the fundamental link between a sentence and its grammar: it
shows how the grammar rules were applied to construct the sentence. There are two
main parsing techniques:
• top-down and
• bottom-up.
Top-Down Parsing
• Using the top-down approach, the parser attempts to build a parse tree from the root
node S down to the leaves.
• The procedure begins with the assumption that the input can be derived from the
selected start symbol S.
• The next step is to find the tops of all the trees that can begin with S by looking at the
grammatical rules with S on the left-hand side, which generates all the possible trees.
• Top-down parsing is a goal-directed search.
• It attempts to reproduce the original derivation by rederiving the sentence from the
start symbol, so the parse tree is rebuilt from the top down.
• This method typically searches top-down and left-to-right, backtracking when a choice
fails.
• The search begins at the root node labeled S (the start symbol), expands each internal
node using the productions whose left-hand side equals that node, and continues until
the leaves are parts of speech (terminals).
• If the leaf nodes, or parts of speech, do not match the input string, we must go back to
the most recently processed node and apply another production to it.
• Let’s consider the grammar rules:
• Sentence = S = Noun Phrase (NP) + Verb Phrase (VP) + Prepositional Phrase (PP)
• Take the sentence “John is playing a game” and apply top-down parsing.
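The top-down procedure can be sketched as a small recursive-descent parser with backtracking. The grammar below is an assumption made for this example: it keeps the rule S = NP + VP + PP, adds a PP-less alternative S = NP + VP so that the example sentence can be derived, and fills in NP, VP, PP rules and a lexicon that the slides do not spell out. The function names are hypothetical.

```python
# Assumed toy grammar; productions are tried in the order listed.
GRAMMAR = {
    "S": [["NP", "VP", "PP"], ["NP", "VP"]],
    "NP": [["Det", "N"], ["N"]],
    "VP": [["Aux", "V", "NP"], ["V", "NP"]],
    "PP": [["P", "NP"]],
}
# Assumed lexicon: word -> part of speech.
LEXICON = {"John": "N", "is": "Aux", "playing": "V", "a": "Det", "game": "N"}


def parse(symbol, words, pos):
    """Yield (tree, next_pos) for each way `symbol` derives words[pos:next_pos]."""
    if symbol not in GRAMMAR:  # part of speech: must match the next input word
        if pos < len(words) and LEXICON.get(words[pos]) == symbol:
            yield (symbol, words[pos]), pos + 1
        return
    for rhs in GRAMMAR[symbol]:  # trying the next production is the backtracking
        for children, end in expand(rhs, words, pos):
            yield (symbol, *children), end


def expand(rhs, words, pos):
    """Derive the symbols of a right-hand side left to right."""
    if not rhs:
        yield [], pos
        return
    for tree, mid in parse(rhs[0], words, pos):
        for rest, end in expand(rhs[1:], words, mid):
            yield [tree] + rest, end


def top_down_parse(sentence):
    """Return the first tree rooted in S that covers the whole input, else None."""
    words = sentence.split()
    for tree, end in parse("S", words, 0):
        if end == len(words):
            return tree
    return None


print(top_down_parse("John is playing a game"))
```

On the example sentence the parser first tries S = NP + VP + PP, fails to find a PP after "game", backtracks, and succeeds with S = NP + VP. This sketch would loop forever on a left-recursive rule such as NP = NP + PP, a well-known limitation of naive top-down parsing.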
Bottom-Up Parsing
• Bottom-up parsing begins with the words of input and attempts to create trees from the words up,
again by applying grammar rules one at a time.
• The parse is successful if it builds a tree rooted in the start symbol S that includes all of the input.
• Bottom-up parsing is a type of data-driven search. It attempts to reverse the derivation process
and reduce the sentence back to the start symbol S.
• It applies productions in reverse to reduce the string of tokens to the start symbol, and the string is
recognized by generating the rightmost derivation in reverse.
• The start symbol S is reached through a series of reductions: whenever the right-hand side of
some rule matches a substring of the current string, that substring is replaced with the left-hand
side of the matched production. The process is repeated until the start symbol is reached.
• Bottom-up parsing can be thought of as a reduction process. It constructs the parse tree
in postorder: the children of a node are built before the node itself.
• Given the grammar rules stated above and the input sentence “John is playing a game”,
bottom-up parsing operates as follows:
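One way to trace these reductions is a greedy shift-reduce sketch. It uses an assumed toy grammar and lexicon (S = NP + VP + PP plus a PP-less alternative, with NP, VP, and PP rules filled in for illustration), prefers longer right-hand sides, and only reduces to S once the input is exhausted. It has no backtracking, so it is adequate for this example rather than for grammars in general; the names are hypothetical.

```python
# Assumed toy rules as (lhs, rhs), tried in order; longer right-hand sides
# of competing rules come first so that Det N wins over N alone.
RULES = [
    ("S", ("NP", "VP", "PP")),
    ("VP", ("Aux", "V", "NP")),
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("PP", ("P", "NP")),
    ("NP", ("N",)),
]
# Assumed lexicon: word -> part of speech.
LEXICON = {"John": "N", "is": "Aux", "playing": "V", "a": "Det", "game": "N"}


def shift_reduce(sentence):
    """Return (tree or None, trace) for a greedy bottom-up parse."""
    buffer = sentence.split()
    stack, trace = [], []
    while True:
        for lhs, rhs in RULES:
            n = len(rhs)
            top = tuple(node[0] for node in stack[-n:])
            # Reduce when the top of the stack matches a right-hand side;
            # hold back S until every word has been consumed.
            if top == rhs and not (lhs == "S" and buffer):
                stack[-n:] = [(lhs, *stack[-n:])]
                trace.append(f"reduce {lhs} -> {' '.join(rhs)}")
                break
        else:  # no reduction applied: shift the next word, or stop
            if not buffer:
                break
            word = buffer.pop(0)
            stack.append((LEXICON[word], word))
            trace.append(f"shift {word}")
    tree = stack[0] if len(stack) == 1 and stack[0][0] == "S" else None
    return tree, trace


tree, trace = shift_reduce("John is playing a game")
for step in trace:
    print(step)
```

Each subtree is completed before its parent is created, which is the postorder construction described above, and reading the reductions from last to first gives a rightmost derivation of the part-of-speech sequence from S.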