lecture 4

This document discusses syntax analysis in compilers, focusing on the role of parsers and the use of context-free grammars (CFG) to define the structure of programming languages. It outlines different types of parsers, including top-down and bottom-up parsers, and explains how syntax trees are constructed from tokens. Additionally, it provides formal definitions and examples of context-free grammars used in programming language constructs.

Uploaded by

enochmack04

COMPILERS

Lecture 4
Lecture Outline
■ Syntax Analysis
■ Role of the Parser
■ Grammars
■ Context-free Grammars
Introduction
■ The purpose of syntax analysis (also known as parsing) is
to recombine the tokens that lexical analysis produces.
– Not back into a list of characters, but into something that
reflects the structure of the text.
– This “something” is typically a data structure called the
syntax tree/parse tree of the text.
■ The syntax analysis must also reject invalid texts by
reporting syntax errors.
Introduction
■ By design, every programming language has precise rules
that prescribe the syntactic structure of well-formed
programs.
– For example, in C, a program is made up of functions, functions
out of declarations and statements, statements out of expressions,
and so on.
■ The syntax of programming language constructs can be
specified by context-free grammars or BNF (Backus-Naur
Form) notation.
The Role of the Syntax Analyzer
■ The parser reconstructs a derivation by which a given Context
Free Grammar (CFG) can generate a given input string.
– CFG is a recursive notation for describing sets of strings and
imposing a structure on each such string.
■ The syntax analyzer must also reject invalid texts by reporting
syntax errors.
■ The same basic strategy is used as for lexical analysis:
– A notation suitable for human understanding is transformed into
a machine-like low-level notation suitable for efficient execution.
– This process is called parser generation.
Types of Parsers
■ There are three general types of parsers for grammars:
universal, top-down, and bottom-up.
■ Universal parsing methods such as the Cocke-Younger-
Kasami algorithm and Earley's algorithm can parse any
grammar.
– These general methods are, however, too inefficient to
use in production compilers.
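As an illustration of a universal method, the following is a minimal sketch of the Cocke-Younger-Kasami recognizer. It assumes the grammar is already in Chomsky normal form (every rule is either A → a or A → B C). The grammar for aⁿbⁿ and the names `cyk`, `UNIT_RULES` and `PAIR_RULES` are hypothetical choices made for this example. The three nested loops over substring length, start position and split point give the cubic running time that makes the method unattractive for compilers.

```python
# Hypothetical grammar in Chomsky normal form for a^n b^n (n >= 1):
#   S -> A B | A X,  X -> S B,  A -> a,  B -> b
UNIT_RULES = [("A", "a"), ("B", "b")]
PAIR_RULES = [("S", ("A", "B")), ("S", ("A", "X")), ("X", ("S", "B"))]

def cyk(word, unit_rules=UNIT_RULES, pair_rules=PAIR_RULES, start="S"):
    """Return True if `start` derives `word` (grammar must be in CNF)."""
    n = len(word)
    if n == 0:
        return False  # this sketch does not handle the empty string
    # table[i][l - 1] holds the nonterminals that derive word[i:i + l]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = {lhs for lhs, terminal in unit_rules if terminal == ch}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for lhs, (b, c) in pair_rules:
                    if b in left and c in right:
                        table[i][length - 1].add(lhs)
    return start in table[0][n - 1]

print(cyk("aabb"), cyk("aab"))
```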
Types of Parsers: Top-down Parser
■ A top-down parser generates the parse tree
for the given input string with the help of grammar
productions by expanding the non-terminals.
– It starts from the start symbol and ends on the terminals.
– It uses leftmost derivation.
– Top-down methods build parse trees from the top (root) to the
bottom (leaves).
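The top-down idea can be sketched with a small hand-written parser. The grammar S → ( S ) S | ε for balanced parentheses and the function names are assumptions made for this example. The parser starts at the start symbol S, expands nonterminals from left to right, and records each production it applies, so the recorded list corresponds to a leftmost derivation. For this particular grammar one token of lookahead is enough to choose the production, so no backtracking is needed.

```python
def parse_s(tokens, pos, trace):
    """Expand S at position `pos`; return the position after the parsed part."""
    if pos < len(tokens) and tokens[pos] == "(":
        trace.append("S -> ( S ) S")
        pos = parse_s(tokens, pos + 1, trace)      # inner S
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError(f"expected ')' at position {pos}")
        return parse_s(tokens, pos + 1, trace)     # trailing S
    trace.append("S -> (empty)")                   # empty production
    return pos

def parse(tokens):
    """Return the productions of a leftmost derivation, or raise SyntaxError."""
    trace = []
    end = parse_s(tokens, 0, trace)
    if end != len(tokens):
        raise SyntaxError(f"unexpected token at position {end}")
    return trace

print(parse(list("()")))
```

For the input `()` the recorded productions match the leftmost derivation S ⇒ (S)S ⇒ ()S ⇒ ().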
Types of Parsers: Top-down Parser
■ Top-down parsers are classified into two types:
– The recursive descent parser is also known as the brute-force parser
or the backtracking parser. It generates the parse tree by using
brute force and backtracking.
– The non-recursive descent parser is also known as the LL(1) parser,
predictive parser, parser without backtracking, or dynamic
parser. It uses a parsing table to generate the parse tree instead
of backtracking.
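A table-driven parser of the second kind might look like the sketch below. The grammar S → ( S ) S | ε, the hand-written table `TABLE`, and the end marker `$` are assumptions for this example; in practice the table would be computed from the grammar. The parser keeps an explicit stack instead of using recursion, and each (nonterminal, lookahead) pair selects at most one production, which is why no backtracking occurs.

```python
# Hypothetical LL(1) table for S -> ( S ) S | (empty); "$" marks end of input.
TABLE = {
    ("S", "("): ["(", "S", ")", "S"],
    ("S", ")"): [],   # choose the empty production
    ("S", "$"): [],   # choose the empty production
}
NONTERMINALS = {"S"}

def ll1_accepts(tokens):
    """Return True if the token sequence is accepted by the table above."""
    tokens = list(tokens) + ["$"]
    stack = ["$", "S"]             # start symbol on top of the end marker
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:
                return False       # no table entry: syntax error
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        elif top == look:
            pos += 1               # match a terminal (or the end marker)
        else:
            return False
    return pos == len(tokens)

print(ll1_accepts("(())()"), ll1_accepts("(()"))
```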
Types of Parsers: Bottom-up Parser
■ A bottom-up parser generates the parse
tree for the given input string with the help of grammar
productions by reducing substrings to non-terminals.
– It starts from the terminals and ends on the start symbol.
– It uses the reverse of the rightmost derivation.
– Bottom-up methods start from the leaves and work their way up to
the root.
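The bottom-up direction can be illustrated with a naive shift-reduce loop. This is only a sketch: the grammar Exp → Exp + num | num and the greedy "reduce whenever a right-hand side is on top of the stack" strategy are assumptions that happen to work for this grammar, whereas real LR parsers decide between shifting and reducing with a parsing table. The reductions are recorded in the order they happen, which is the reverse of a rightmost derivation.

```python
# Hypothetical grammar: Exp -> Exp + num | num (longer right-hand side first).
RULES = [
    ("Exp", ["Exp", "+", "num"]),
    ("Exp", ["num"]),
]

def shift_reduce(tokens):
    """Return the list of reductions used, or None if the input is rejected."""
    stack, reductions = [], []
    for token in tokens:
        stack.append(token)                  # shift the next token
        reduced = True
        while reduced:                       # reduce while a right-hand side is on top
            reduced = False
            for lhs, rhs in RULES:
                if stack[-len(rhs):] == rhs:
                    del stack[-len(rhs):]
                    stack.append(lhs)
                    reductions.append(f"{lhs} -> {' '.join(rhs)}")
                    reduced = True
                    break
    # accept only if everything was reduced to the start symbol
    return reductions if stack == ["Exp"] else None

print(shift_reduce(["num", "+", "num"]))
```

Reading the result backwards gives the rightmost derivation Exp ⇒ Exp + num ⇒ num + num.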
Types of Parsers: Bottom-up Parser
■ Bottom-up parsers are classified into two types: the LR parser and
the operator precedence parser.
■ The LR parser is a bottom-up parser that generates the parse
tree for the given string by using an unambiguous grammar. It
follows the reverse of the rightmost derivation.
■ There are four types of LR parser:
– LR(0)
– SLR(1)
– LALR(1)
– CLR(1)
Types of Parsers: Bottom-up Parser
■ The operator precedence parser generates the parse tree
from a given grammar and string, on the condition that
two consecutive non-terminals and epsilon never appear
on the right-hand side of any production.
Types of Parsers
Syntax Analysis
■ There are a number of tasks that might be conducted
during parsing, such as
– collecting information about various tokens into the
symbol table,
– performing type checking and other kinds of semantic
analysis, and
– generating intermediate code.
Syntax Tree/Parse Tree
■ The syntax tree is a tree structure.
– The leaves of this tree are the tokens found by the lexical
analysis.
– If the leaves are read from left to right, the sequence is
the same as in the input text.
■ What is important in the syntax tree is how these leaves
are combined to form the structure of the tree and how
the interior nodes of the tree are labelled.
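A minimal sketch of such a tree, assuming a small `Node` class and an expression grammar with productions such as Exp → Exp + Exp, Exp → Exp * Exp and Exp → num (the class, the helper `leaves` and the sample tree are all hypothetical). Interior nodes are labelled with nonterminals, leaves with tokens, and reading the leaves from left to right gives back the token sequence.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                                   # nonterminal name or token
    children: list = field(default_factory=list)

def leaves(node):
    """Return the leaf labels of the tree in left-to-right order."""
    if not node.children:
        return [node.label]
    result = []
    for child in node.children:
        result.extend(leaves(child))
    return result

# One possible tree for the token sequence: num + num * num
tree = Node("Exp", [
    Node("Exp", [Node("num")]),
    Node("+"),
    Node("Exp", [
        Node("Exp", [Node("num")]),
        Node("*"),
        Node("Exp", [Node("num")]),
    ]),
])

print(leaves(tree))
```

The same leaves could be grouped differently, for example with the `+` subtree below the `*` node; the token sequence would be unchanged, but the structure, and hence the meaning, would differ.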
Syntax Tree
Context-free Grammar
■ The notation we use for human manipulation is context-free
grammars, a recursive notation for
describing sets of strings and imposing a structure on
each such string.
– Context-free grammars describe sets of strings, i.e.,
languages.
– A context-free grammar also defines structure on the
strings in the language it defines.
Context-free Grammar
■ It recursively defines several sets of strings. Each set is
denoted by a name, which is called a nonterminal.
– The set of nonterminals is disjoint from the set of
terminals.
■ One of the nonterminals is chosen to denote the
language described by the grammar.
– This is called the start symbol of the grammar.
Context-free Grammar
■ The sets are described by a number of productions.
■ Each production describes some of the possible strings
that are contained in the set denoted by a nonterminal.
■ A production has the form:
𝑁 → 𝑋₁ … 𝑋ₙ
■ where 𝑁 is a nonterminal and 𝑋₁ … 𝑋ₙ are zero or more
symbols, each of which is either a terminal or a
nonterminal.
Example
𝐴→𝑎
■ Says that the set denoted by the nonterminal A contains
the one-character string a.
𝐵→
𝐵 → 𝑎𝐵
– where the first production indicates that the empty string
is part of the set B, and the second that putting an a in front
of any string in B gives another string in B.
■ Productions with empty right-hand sides are called empty
productions.
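The two productions for B translate directly into a recursive membership test. This is a sketch with a hypothetical function name `in_b`; each branch corresponds to one production.

```python
def in_b(s):
    """Return True if the string s belongs to the set denoted by B."""
    if s == "":
        return True                       # B -> (empty)
    return s[0] == "a" and in_b(s[1:])    # B -> a B

print(in_b(""), in_b("aaa"), in_b("ab"))
```

The set B therefore consists of all strings made up of zero or more a's.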
Example
■ The examples have used only one nonterminal per
grammar.
■ When several nonterminals are used, we must make it
clear which of these is the start symbol.
■ By convention (if nothing else is stated), the nonterminal
on the left-hand side of the first production is the start
symbol.
Formal definition of Context Free Grammar
A context-free grammar is a 4-tuple (𝑉, Σ, 𝑅, 𝑆), where
■ 𝑉 is a finite set called the variables,
■ Σ is a finite set, disjoint from 𝑉, called the terminals,
■ 𝑅 is a finite set of rules, with each rule being a variable
and a string of variables and terminals, and
■ 𝑆 ∈ 𝑉 is the start variable.
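The 4-tuple maps naturally onto a small record type. The sketch below is one possible encoding, not a standard one: the class `CFG`, its field names and the check `is_well_formed` are assumptions for illustration. It encodes the earlier grammar B → (empty), B → aB and verifies the conditions stated in the definition.

```python
from typing import NamedTuple

class CFG(NamedTuple):
    variables: frozenset   # V
    terminals: frozenset   # Sigma
    rules: tuple           # R: pairs (variable, tuple of symbols)
    start: str             # S

def is_well_formed(g):
    """Check the side conditions of the formal definition."""
    symbols = g.variables | g.terminals
    return (
        g.variables.isdisjoint(g.terminals)        # V and Sigma are disjoint
        and g.start in g.variables                 # S is a variable
        and all(
            lhs in g.variables and all(x in symbols for x in rhs)
            for lhs, rhs in g.rules
        )
    )

grammar_b = CFG(
    variables=frozenset({"B"}),
    terminals=frozenset({"a"}),
    rules=(("B", ()), ("B", ("a", "B"))),
    start="B",
)
print(is_well_formed(grammar_b))
```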
CFG
■ If u, v, and w are strings of variables and terminals, and
𝐴 → 𝑤 is a rule of the grammar, we say that 𝑢𝐴𝑣 yields
𝑢𝑤𝑣, written 𝑢𝐴𝑣 ⇒ 𝑢𝑤𝑣.
■ Say that u derives v, written 𝑢 ⇒* 𝑣, if a sequence
𝑢₁, 𝑢₂, 𝑢₃, …, 𝑢ₖ exists for 𝑘 ≥ 0 and 𝑢 ⇒ 𝑢₁ ⇒ 𝑢₂ ⇒ ⋯ ⇒
𝑢ₖ = 𝑣.
■ The language of the grammar is {𝑤 ∈ Σ* | 𝑆 ⇒* 𝑤}.
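The "yields" and "derives" relations can be sketched in code as follows. Strings of variables and terminals are represented as tuples of symbols, and the rule list reuses the earlier grammar B → (empty), B → aB. The function names and the step bound `max_steps` are assumptions for this example; the bounded search is a brute-force illustration, not a practical parsing method.

```python
RULES = [("B", ()), ("B", ("a", "B"))]

def yields(form, rules):
    """All forms reachable from `form` in exactly one step uAv => uwv."""
    results = set()
    for i, symbol in enumerate(form):
        for lhs, rhs in rules:
            if symbol == lhs:
                results.add(form[:i] + rhs + form[i + 1:])
    return results

def derives(u, v, rules, max_steps):
    """Check u =>* v using at most `max_steps` derivation steps."""
    frontier = {u}
    for _ in range(max_steps + 1):
        if v in frontier:
            return True
        # replace the frontier by everything reachable in one more step
        frontier = {w for form in frontier for w in yields(form, rules)}
    return False

print(yields(("B",), RULES))
print(derives(("B",), ("a", "a"), RULES, max_steps=3))
```

The derivation found here is B ⇒ aB ⇒ aaB ⇒ aa, which takes three steps.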
Example
■ As an example, the grammar
T→R
T → aTa
R→b
R → bR
■ has T as start symbol and denotes the set of strings that
start with any number of a’s followed by a non-zero
number of b’s and then the same number of a’s with
which it started.
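The description of this language can be checked with a small recognizer that follows the four productions. This is a sketch with hypothetical function names. Because R only produces b's, a string that starts and ends with a can only come from T → aTa, so the choice between productions is unambiguous.

```python
def match_r(s):
    """R -> b | bR: one or more b's."""
    return len(s) >= 1 and set(s) == {"b"}

def match_t(s):
    """T -> aTa | R."""
    if len(s) >= 2 and s[0] == "a" and s[-1] == "a":
        return match_t(s[1:-1])   # T -> a T a: strip one a from each end
    return match_r(s)             # T -> R

for word in ["b", "abba", "aabaa", "aba", "aab", "aa", ""]:
    print(repr(word), match_t(word))
```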
Context-free Grammar
■ When writing a grammar for a programming language, one
normally starts by dividing the constructs of the language into
different syntactic categories.
– A syntactic category is a sub-language that embodies a
particular concept.
■ Examples of common syntactic categories in programming
languages are:
– Expressions are used to express calculation of values.
– Statements express actions that occur in a particular
sequence.
– Declarations express properties of names used in other parts
of the program.
Examples
Simple expression grammar

𝐸𝑥𝑝 → 𝐸𝑥𝑝 + 𝐸𝑥𝑝
𝐸𝑥𝑝 → 𝐸𝑥𝑝 − 𝐸𝑥𝑝
𝐸𝑥𝑝 → 𝐸𝑥𝑝 ∗ 𝐸𝑥𝑝
𝐸𝑥𝑝 → 𝐸𝑥𝑝/𝐸𝑥𝑝
𝐸𝑥𝑝 → 𝒏𝒖𝒎
𝐸𝑥𝑝 → (𝐸𝑥𝑝)
Example
Simple statement grammar

𝑆𝑡𝑎𝑡 → 𝒊𝒅 ≔ 𝐸𝑥𝑝
𝑆𝑡𝑎𝑡 → 𝑆𝑡𝑎𝑡; 𝑆𝑡𝑎𝑡
𝑆𝑡𝑎𝑡 → 𝒊𝒇 𝐸𝑥𝑝 𝒕𝒉𝒆𝒏 𝑆𝑡𝑎𝑡 𝒆𝒍𝒔𝒆 𝑆𝑡𝑎𝑡
𝑆𝑡𝑎𝑡 → 𝒊𝒇 𝐸𝑥𝑝 𝒕𝒉𝒆𝒏 𝑆𝑡𝑎𝑡
