
COMPILER DESIGN

COURSE PLANNING
DOCUMENT

Subject Code : CS115


Class : III Year II Semester (CSE)
Branch : Computer Science & Engineering
Academic Year : 2019 - 2020

Prepared by

G. SUNIL REDDY, Asst. Professor


Department of Computer Science and Engineering

COMPILER DESIGN (CS 115)

III B.Tech: II Sem L:4 T: P: C:

Name of the Instructor(s): G. Sunil Reddy, Md. Sallauddin, R. Ravi Kumar


No. of Hours/week: 4
Total number of hours planned: 60

Pre-requisite
• Knowledge of automata theory
• Context free languages
• Computer architecture
• Data structures and simple graph algorithms
• Logic or algebra

Learning Resources
• Textbooks, Class Notes
Text Books
1. Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman, "Compilers: Principles, Techniques and
Tools", 16th Indian Reprint, Pearson Education Asia, ISBN 81-7808-046-X, 2004.
2. D. M. Dhamdhere, "Compiler Construction", 2nd Edition, Macmillan India Ltd, ISBN
0333-90406-0, 1997.
Reference Books
1. Donovan, "Systems Programming", McGraw-Hill.
2. Leland L. Beck, "System Software – An Introduction to Systems Programming",
Addison-Wesley.
Additional Resources (links etc)
1. books.google.co.in (Computers > Programming > General)
2. www.amazon.com (Books > Computers and Technology)
3. http://nptel.iitm.ac.in
Reading materials:
1. Online Video links

How to Contact Instructor:

• In-person office hours: (Commonly for all instructors)


o Students can meet whenever the instructors have a free schedule during college hours,
specifically on working Wednesdays and Saturdays from 3:00 p.m. to 4:00 p.m.
o Students can also meet from 4:00 p.m. to 5:00 p.m. during working college hours with prior approval.

• Online office hours: time and how to access

Course Name : Compiler Design


Course Code : CS 115
Name of the Instructor / Faculty member : G. Sunil Reddy
Contact details (Email id) : [email protected]
(Phone Number) : +91 9676561828
Name of the 1st Co Faculty member : Md. Sallauddin
Contact details (Email id) : [email protected]
(Phone Number) : +91 9885502477
Name of the 2nd Co Faculty member : R. Ravi Kumar
Contact details (Email id) : [email protected]
(Phone Number) : +91 9989916656
Contact Hours (of Faculty member /
Co Faculty members) : Wednesday : 2:30 to 4:00 PM
Saturday : 2:30 to 4:00 PM

Technology Requirements:
• Learning management system (Google classroom, etc.)

Overview of Course:
• What is the course about: its purpose?
Compiler design principles provide an in-depth view of the translation and optimization
process. Compiler design covers the basic translation mechanism and error detection &
recovery. It includes lexical, syntax, and semantic analysis as the front end, and code
generation and optimization as the back end.

• What are the general topics or focus?
1. Phases of compiler
2. Lexical Analysis
3. Parsing Techniques
4. Intermediate Code Generation
5. Code Optimization
6. Code Generation

• How does it fit with other courses in the department or on campus?


Compilers have become part and parcel of today’s computer systems. They are
responsible for making the user’s computing requirements, specified as a piece of
program, understandable to the underlying machine. These tools work as an interface
between the entities of two different domains: the human being and the machine. This
course is useful for all programming languages.

• Why would students want to take this course and learn this material?
1. Helps the student improve problem-solving skills
2. Helps in learning further programming languages
3. Helps in developing a compiler
4. As it is logic oriented, students will be able to improve their logical thinking

Methods of instruction
• Lecture (chalk & talk / ICT)

Workload
• Estimated amount of time a student needs to spend on course readings (per week): 2
hours per week
• Estimated amount of time a student needs to spend on homework for practicing
problems (per week): 2 hours per week

Assessment

S. No   Assessment Methodology   Assessments       No. of assessments   Marks   Weightage in marks (scaled to)
1       CIE                      Quizzes           5                    5       5
2       CIE                      Class test        --                   --      5
3       CIE                      Assignment        --                   --
4       CIE                      Course Activity   --                   --      --
5       CIE                      Attendance        --                   --      5
6       CIE                      Internal exams    2                    20      20
7       SEE                      --                --                   --      70

Note:
• Class test / Quiz – the marks allotted for the quiz will be graded to the assignment.
• Since the assessment is conducted online, the results will be displayed to the students
immediately.
• Absentees for a quiz: if a student is absent, then a structured enquiry problem will be
given as an assignment with a deadline; if the assignment is not submitted in time,
he/she will be given zero marks.

Topic: Summary questions will be framed for each unit
Activity: Online quiz
Rubrics: 10 questions will be displayed, one mark each (10); average scaled to 5 marks
Unit: All units
Schedule: After the completion of each unit

Key concepts:
1. Compiler
2. Assembler, Translator
3. Lexical Analysis

4. Syntax Analysis
5. Semantic Analysis
6. Intermediate Code Generator
7. Code Optimizer
8. Code Generator

LESSON PLAN

Course Outcomes (COs):


At the end of the course the student should be able to:
1. Illustrate the different phases of a compiler, and implement practical aspects of
automata theory. (L2, L3)
2. Apply the syntax and semantic rules to design an error-free compiler. (L4)
3. Interpret storage organization and allocation strategies for dynamic storage systems. (L2)
4. Analyze the knowledge of different phases in designing a compiler. (L3)
5. Apply code generation and optimization techniques. (L4)

Course Articulation Matrix: Mapping of Course Outcomes (COs) with Program Outcomes
(POs 1-12, PSO1, PSO2)

CO1. Illustrate the different phases of a compiler, and implement practical aspects of
automata theory: correlation levels 3, 3, 2, 2, 3, 2

CO2. Apply the syntax and semantic rules to design an error-free compiler: correlation
levels 3, 3, 3, 1, 3, 2, 3

CO3. Interpret storage organization and allocation strategies for dynamic storage
systems: correlation levels 3, 2, 2, 2, 2, 2

CO4. Analyze the knowledge of different phases in designing a compiler: correlation
levels 3, 2, 2, 2

CO5. Apply code generation and optimization techniques: correlation levels 2, 3, 3, 2,
2, 2, 2, 3, 2

Course Syllabus
UNIT I
Introduction to Compiling: Compiler, Phases of a compiler, Analysis of the source
program, Cousins of the compiler, grouping of phases, Compiler writing tools.
Lexical Analysis: The role of the lexical analyzer, Specification of tokens, Recognition of
tokens, A Language for specifying lexical Analyzers, Finite automata, Optimization of DFA-
based pattern matchers.

UNIT II
Syntax Analysis: The role of a parser, Context-free grammars, writing a grammar, Parsing,
Ambiguous grammar, Elimination of Ambiguity, Classification of parsing techniques
Top down parsing: Back Tracking, Recursive Descent parsing, FIRST ( ) and FOLLOW ( )
- LL Grammars, Non-Recursive descent parsing, Error recovery in predictive parsing.

UNIT III
Bottom Up parsing: SR parsing, Operator Precedence Parsing, LR grammars, LR Parsers –
Model of an LR Parsers, SLR parsing, CLR parsing, LALR parsing, Error recovery in LR
Parsing, handling ambiguous grammars.

UNIT IV
Syntax Directed Translation: Syntax Directed Definition, S-attributed definitions, L-
attributed definitions, Attribute grammar, S-attributed grammar, L-attributed grammar.
Semantic Analysis: Type Checking, Type systems, Type expressions, Equivalence of type
expressions.
Intermediate Code Generation: Construction of syntax trees, Directed Acyclic Graph,
Three Address Codes.

UNIT V
Runtime Environments: Storage organization, Storage-allocation strategies, Symbol tables,
Activation records.
Code Optimization: The principal sources of optimization, Basic blocks and Flow graphs,
data-flow analysis of flow graphs.
Code Generation: Issues in the design of a code generator, the target machine code, Next-
use information, a simple code generator, Code-generation algorithm.
TEXT BOOKS
1. Alfred V.Aho, Ravi Sethi and Jeffry D. Ullman “Compiler Principles, Techniques
and Tools”16th Indian Reprint, Pearson Education Asia, ISBN No.81-7808-046-
X.,2004.
2. D.M.Dhamdere ”Compiler Construction“, 2nd Edition ” Mac Mellon India Ltd”,
ISBN No.0333 -90406-0,1997

REFERENCE BOOKS
1. Donovan, "Systems Programming", McGraw-Hill.
2. Leland L. Beck, "System Software – An Introduction to Systems Programming",
Addison-Wesley.

WEB LINKS
1. books.google.co.in (Computers > Programming > General)
2. www.amazon.com (Books > Computers and Technology)
3. http://nptel.iitm.ac.in

LESSON PLAN

Lecture No.    Topic                                              Delivery Method / Activity
UNIT-I
1 Introduction to Compiling Chalk & Talk
2&3 The phases of a compiler PPT
4 Analysis of the source program Chalk & Talk
5 Cousins of the compiler Chalk & Talk
6        Grouping of phases; Compiler writing tools        Chalk & Talk
7 Lexical Analysis: The role of the lexical analyzer Chalk & Talk
8 Specification of tokens Chalk & Talk
9 Recognition of tokens Chalk & Talk
10 A Language for specifying lexical Analyzers Chalk & Talk
11 & 12 Finite automata Chalk & Talk

13 Optimization of DFA-based pattern matchers Chalk & Talk
Quiz will be conducted for UNIT I through Google classroom / Google forms
UNIT-II
14 Syntax Analysis: The role of a parser Chalk & Talk
15 Context-free grammars Think-Pair-Share
16       Writing a grammar; Parsing                        Chalk & Talk
17 & 18 Ambiguous grammar, Elimination of Ambiguity Brain storming
19 Classification of parsing techniques Chalk & Talk
20 Top down parsing –Back Tracking Chalk & Talk
21 Recursive Descent parsing Chalk & Talk
22&23 FIRST( ) and FOLLOW( )- LL Grammars Role Play
24 Non-Recursive descent parsing Chalk & Talk
25 Error recovery in predictive parsing Chalk & Talk
Quiz will be conducted for UNIT II through Google classroom / Google forms
Solving LL(k) problems using the Think-Pair-Share activity
UNIT-III
26 Bottom Up parsing- SR parsing Chalk & Talk
27 Operator Precedence Parsing Chalk & Talk
28 LR grammars Chalk & Talk
29 LR Parsers – Model of an LR Parsers Chalk & Talk
30 & 31 SLR parsing Chalk & Talk
32 &33 CLR parsing Chalk & Talk
34 LALR parsing Chalk & Talk
35 Error recovery in LR Parsing Chalk & Talk
36 Handling ambiguous grammars Chalk & Talk
Quiz will be conducted for UNIT III through Google classroom / Google forms
Solving LR grammar problems using the Think-Pair-Share activity
I Mid Term Examinations
UNIT-IV
37 Syntax Directed Translation Chalk & Talk

38 Syntax-directed definition Chalk & Talk
39 S-attributed definitions, L-attributed definitions Chalk & Talk
40 Attribute grammar Chalk & Talk
41 S-attributed grammar, L-attributed grammar Chalk & Talk
42 Semantic Analysis: Type Checking Chalk & Talk
43       Type systems, Type expressions, Equivalence of type expressions        Chalk & Talk
44 Intermediate Code Generation Chalk & Talk
45 Construction of syntax trees Chalk & Talk
46 Directed acyclic graph Chalk & Talk
47 Three address codes Chalk & Talk
Quiz will be conducted for UNIT IV through Google classroom / Google forms
UNIT-V
48 Runtime Environments PPT
49 Storage organization PPT
50 Storage-allocation strategies PPT
51 Symbol tables PPT
52 Activation records PPT
53 & 54 Code Optimization: The principal sources of optimization PPT
55 Basic blocks and Flow graphs PPT
56 Data-flow analysis of flow graphs PPT
57 Code Generation: Issues in the design of a code generator PPT
58 The target machine code PPT
59 Next-use information, A simple code generator PPT
60 Code-generation algorithm PPT
Quiz will be conducted for UNIT V through Google classroom / Google forms
II Mid Term Examinations

Compiler Design - Introduction to Compiling, Lexical Analysis

UNIT-1
Introduction to Compiling: Compiler, Phases of a compiler, Analysis of the source
program, Cousins of the compiler, grouping of phases, Compiler writing tools.
Lexical Analysis: The role of the lexical analyzer, Specification of tokens, Recognition of
tokens, A Language for specifying lexical Analyzers, Finite automata, Optimization of DFA-
based pattern matchers.
UNIT WISE PLAN

UNIT-I: Introduction to Compiling, Lexical Analysis Planned Hours:13

S. No.   Topic Learning Outcomes                                        COs    Blooms Levels

1. Understand various concepts in different phases of compiler CO1 L1

2. Apply the knowledge of different phases of a compiler designing CO1 L3

3. Identify the tokens in a source code CO1 L3

4. Analyze practical aspects of automata theory in compiler phases CO1 L4

5. Construct Finite automata from a regular Expression CO1 L3

1. INTRODUCTION TO COMPILING & LEXICAL ANALYSIS


1.1. COMPILER:
A compiler is a program that reads a program written in one language (the source language)
and translates it into an equivalent program in another language (the target language). As an
important part of this translation process, the compiler reports to its user the presence of
errors in the source program.

Source Program → Compiler → Target Program

Fig. 1.1 Compiler


Compilers are sometimes classified as single-pass, multi-pass, load-and-go, debugging, or
optimizing, depending on how they have been constructed or on what function they are


supposed to perform. Despite this apparent complexity, the basic tasks that any compiler
must perform are essentially the same.

1.2. THE PHASES OF A COMPILER


1. Lexical analysis (“scanning”) - Reads in program, groups characters into “tokens”
2. Syntax analysis (“parsing”) - Structures token sequence according to grammar rules
of the language.
3. Semantic analysis - Checks semantic constraints of the language.
4. Intermediate code generator - Translates to “lower level” representation.
5. Code optimization - Improves code quality.
6. Code generator – Generates target assembly code

Fig 1.2.1 Phases of Compiler

The Analysis – Synthesis Model of Compilation


There are two parts of compilation.
1. Analysis part
2. Synthesis Part


• The analysis part breaks up the source program into constituent pieces and creates an
intermediate representation of the source program.
• The synthesis part constructs the desired target program from the intermediate
representation.
In Compiling, analysis part consists of three phases:
1. Lexical Analysis
2. Syntax Analysis
3. Semantic Analysis
In Compiling, synthesis part consists of three phases:
1. Intermediate code generator
2. Code optimization
3. Code generator
Lexical analysis:
In a compiler, linear analysis is called lexical analysis or scanning. The lexical analysis
phase reads the characters in the source program and groups them into tokens: sequences
of characters having a collective meaning.
Example:
Source program – position := initial + rate * 60
Identifiers – position, initial, rate
Operators – +, *
Assignment symbol – :=
Number – 60
Blanks – eliminated
Syntax analysis:
Hierarchical analysis is called parsing or syntax analysis. It involves grouping the
tokens of the source program into grammatical phrases that are used by the compiler to
synthesize output. They are represented using a syntax tree as shown in Fig. 1.2.2.
• A syntax tree is the tree generated as a result of syntax analysis in which the interior
nodes are the operators and the exterior nodes are the operands.
• This analysis shows an error when the syntax is incorrect.


Example:

Fig. 1.2.2 Parse tree for position := initial + rate * 60


Semantic analysis:
This phase checks the source program for semantic errors and gathers type
information for subsequent code generation phase. An important component of semantic
analysis is type checking. Here the compiler checks that each operator has operands that are
permitted by the source language specification.
Example:

Fig. 1.2.3 Semantic analysis inserts a conversion from integer to real


Conceptually, a compiler operates in phases, each of which transforms the source
program from one representation to another. A typical decomposition of a compiler is shown
in Fig. 1.2.1. The first three phases form the bulk of the analysis portion of a compiler. Two other
activities, symbol table management and error handling, are shown interacting with the six
phases.
Intermediate Code Generation
After syntax and semantic analysis, some compilers generate an explicit intermediate
representation of the source program. This intermediate representation can have a variety of
forms. In three-address code, the source program might look like this,


Example:
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
Code Optimization
The code optimization phase attempts to improve the intermediate code, so that faster
running machine codes will result. Some optimizations are trivial. There is a great variation
in the amount of code optimization different compilers perform. In those that do the most,
called “optimizing compilers”, a significant fraction of the time of the compiler is spent on
this phase.
Example:
temp1=id3*60.0
id1=id2+temp1
Code Generation
The final phase of the compiler is the generation of target code, consisting normally
of relocatable machine code or assembly code. Memory locations are selected for each of the
variables used by the program. Then, intermediate instructions are each translated into a
sequence of machine instructions that perform the same task. A crucial aspect is the
assignment of variables to registers.
Example:
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
Symbol table management
An essential function of a compiler is to record the identifiers used in the source
program and collect information about various attributes of each identifier. A symbol table is
a data structure containing a record for each identifier, with fields for the attributes of the
identifier. The data structure allows us to find the record for each identifier quickly and to
store or retrieve data from that record quickly. When an identifier in the source program is
detected by the lexical analyzer, the identifier is entered into the symbol table.
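To make this concrete, a symbol table can be sketched as a chained hash table of identifier records. The following C sketch is illustrative only; the names (Symbol, st_lookup, st_insert) and the single type attribute are assumptions made for this example, and a real compiler's table carries many more attribute fields.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 211               /* number of hash buckets (a prime) */

/* One record per identifier; attribute fields grow as later phases
   (type checking, code generation) add information. */
typedef struct Symbol {
    char *name;                      /* the lexeme */
    int   type;                      /* example attribute: a type code */
    struct Symbol *next;             /* chaining for collisions */
} Symbol;

static Symbol *table[TABLE_SIZE];

static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* Find the record for name quickly, or return NULL if absent. */
Symbol *st_lookup(const char *name) {
    Symbol *p;
    for (p = table[hash(name)]; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Enter name if not already present, as the lexical analyzer does. */
Symbol *st_insert(const char *name, int type) {
    Symbol *p = st_lookup(name);
    unsigned h;
    if (p != NULL)
        return p;                    /* already in the table */
    p = malloc(sizeof *p);
    p->name = malloc(strlen(name) + 1);
    strcpy(p->name, name);
    p->type = type;
    h = hash(name);
    p->next = table[h];
    table[h] = p;
    return p;
}

int main(void) {
    st_insert("position", 1);
    st_insert("rate", 1);
    printf("rate found: %s\n", st_lookup("rate") ? "yes" : "no");
    return 0;
}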


Error Detection and Reporting


Each phase can encounter errors. A compiler that stops when it finds the first error is not as helpful as it could be.
The syntax and semantic analysis phases usually handle a large fraction of the errors
detectable by the compiler. The lexical phase can detect errors where the characters
remaining in the input do not form any token of the language. Errors when the token stream
violates the syntax of the language are determined by the syntax analysis phase. During
semantic analysis the compiler tries to detect constructs that have the right syntactic structure
but no meaning to the operation involved.

1.3. ANALYSIS OF THE SOURCE PROGRAM


As translation progresses, the compiler’s internal representation of the source program
changes. Consider the statement,
position := initial + rate * 60
The lexical analysis phase reads the characters in the source program and groups them into a
stream of tokens in which each token represents a logically cohesive sequence of characters,
such as an identifier, a keyword etc. The character sequence forming a token is called
the lexeme for the token. Certain tokens will be augmented by a “lexical value”. For example,
for any identifier the lexical analyzer generates not only the token id but also enters the lexeme
into the symbol table, if it is not already present there. The lexical value associated with this
occurrence of id points to the symbol table entry for this lexeme. The representation of the
statement given above after the lexical analysis would be: id1 := id2 + id3 * 60
Syntax analysis imposes a hierarchical structure on the token stream, which is shown by
syntax trees (Fig. 1.3.1).

Fig. 1.3.1 Syntax tree


Fig. 1.3.2 Translation of a statement

1.4. COUSINS OF COMPILER

Fig. 1.4.1 Flow of Cousins of compiler



The high-level language is converted into binary language in various phases. A


compiler is a program that converts high-level language to assembly language. Similarly, an
assembler is a program that converts the assembly language to machine-level language.

Let us first understand how a program, using C compiler, is executed on a host machine.
• User writes a program in C language (high-level language).
• The C compiler compiles the program and translates it to assembly program (low-
level language).
• An assembler then translates the assembly program into machine code (object).
• A linker tool is used to link all the parts of the program together for execution
(executable machine code).
• A loader loads all of them into memory and then the program is executed.
The cousins of the compiler are:
1. Preprocessor
2. Assembler
3. Loader and Link-editor

1.4.1. Preprocessor
A preprocessor is a program that processes its input data to produce output that is
used as input to another program. The output is said to be a preprocessed form of the input
data, which is often used by some subsequent programs like compilers.
They may perform the following functions:
1. Macro processing
2. File Inclusion
3. Rational Preprocessors
4. Language extension
1. Macro processing:
A macro is a rule or pattern that specifies how a certain input sequence should be
mapped to an output sequence according to a defined procedure. The mapping process that
instantiates a macro into a specific output sequence is known as macro expansion.
2. File Inclusion:
Preprocessor includes header files into the program text. When the preprocessor finds
an #include directive it replaces it by the entire content of the specified file.
3. Rational Preprocessors:
These processors augment older languages with more modern flow-of-control and data-
structuring facilities.
4. Language extension:
These processors attempt to add capabilities to the language by what amounts to built-
in macros. For example, the language Equel is a database query language embedded in C.
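As a small illustration of the first two functions, consider the following C fragment (the macro names are invented for this example). Running only the preprocessor, for instance with gcc -E, replaces the #include line by the text of stdio.h and expands every macro use in place before the compiler proper sees the code.

#include <stdio.h>            /* file inclusion: replaced by the text of stdio.h */

#define PI 3.1416             /* object-like macro */
#define SQUARE(x) ((x)*(x))   /* function-like macro: expanded in place, not called */

int main(void) {
    /* After preprocessing, the next line reads:
       printf("%f\n", 3.1416 * ((2.0)*(2.0))); */
    printf("%f\n", PI * SQUARE(2.0));
    return 0;
}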


1.4.2. Assembler
Assembler creates object code by translating assembly instruction mnemonics into
machine code. There are two types of assemblers:
• One-pass assemblers go through the source code once and assume that all symbols
will be defined before any instruction that references them.
• Two-pass assemblers create a table with all symbols and their values in the first pass,
and then use the table in a second pass to generate code

1.4.3. LINKER AND LOADER


A linker or link editor is a program that takes one or more objects generated by a
compiler and combines them into a single executable program. Three tasks of the linker are
1. Searches the program to find library routines used by program, e.g. printf(), math
routines.
2. Determines the memory locations that code from each module will occupy and
relocates its instructions by adjusting absolute references
3. Resolves references among files.
A loader is the part of an operating system that is responsible for loading programs in
memory, one of the essential stages in the process of starting a program.

1.5. ASSEMBLER:
Programmers found it difficult to write or read programs in machine language. They
began to use mnemonics (symbols) for each machine instruction, which they would
subsequently translate into machine language. Such a mnemonic machine language is now
called an assembly language. Programs known as assemblers were written to automate the
translation of assembly language into machine language. The input to an assembler program
is called the source program; the output is a machine language translation (object program).


INTERPRETER:
An interpreter is a program that appears to execute a source program as if it were machine
language.

Fig. 1.5.1 Interpreter

1.6. GROUPING OF THE PHASES


Compiler can be grouped into front and back ends:
Front end: analysis (machine independent)
These normally include lexical and syntactic analysis, the creation of the symbol
table, semantic analysis and the generation of intermediate code. It also includes error
handling that goes along with each of these phases.
Back end: synthesis (machine dependent)
It includes code optimization phase and code generation along with the necessary
error handling and symbol table operations.
Compiler passes
A collection of phases is done only once (single pass) or multiple times (multi pass)
• Single pass: usually requires everything to be defined before being used in source
program.
• Multi pass: compiler may have to keep entire program representation in memory.
Several phases can be grouped into one single pass and the activities of these phases are
interleaved during the pass. For example, lexical analysis, syntax analysis, semantic analysis
and intermediate code generation might be grouped into one pass.

1.7. COMPILER CONSTRUCTION TOOLS


These are specialized tools that have been developed for helping implement various
phases of a compiler. The following are the compiler construction tools:
i. Parser Generators:
• These produce syntax analyzers, normally from input that is based on a context-
free grammar.


• Syntax analysis consumes a large fraction of the running time of a compiler.
Example: YACC (Yet Another Compiler-Compiler).
ii. Scanner Generator:
• These generate lexical analyzers, normally from a specification based on regular
expressions. The basic organization of the resulting lexical analyzers is based on
finite automata.
iii. Syntax-Directed Translation:
• These produce routines that walk the parse tree and as a result generate
intermediate code.
• Each translation is defined in terms of translations at its neighbor nodes in the
tree.
iv. Automatic Code Generators:
• It takes a collection of rules to translate intermediate language into machine
language. The rules must include sufficient details to handle different possible
access methods for data.
v. Data-Flow Engines:
• It does code optimization using data-flow analysis, that is, the gathering of
information about how values are transmitted from one part of a program to each
other part.

1.8. LEXICAL ANALYSIS


A simple way to build lexical analyzer is to construct a diagram that illustrates the
structure of the tokens of the source language, and then to hand-translate the diagram into a
program for finding tokens. Efficient lexical analyzers can be produced in this manner.

1.9. ROLE OF LEXICAL ANALYZER


The lexical analyzer is the first phase of a compiler. Its main task is to read the input
characters and produce as output a sequence of tokens that the parser uses for syntax analysis.
As shown in the figure, upon receiving a “get next token” command from the parser, the lexical
analyzer reads input characters until it can identify the next token.


Fig. 1.9.1 Interaction of lexical analyzer with parser


Since the lexical analyzer is the part of the compiler that reads the source text, it may
also perform certain secondary tasks at the user interface. One such task is stripping out from
the source program comments and white space in the form of blank, tab, and newline
characters. Another is correlating error messages from the compiler with the source program.

Issues in Lexical Analysis


There are several reasons for separating the analysis phase of compiling into lexical
analysis and parsing
1) Simpler design is the most important consideration. The separation of lexical analysis
from syntax analysis often allows us to simplify one or the other of these phases.
2) Compiler efficiency is improved.
3) Compiler portability is enhanced.

Tokens, Patterns and Lexemes


Token
Token is a sequence of characters that can be treated as a single logical entity. There
is a set of strings in the input for which the same token is produced as output. This set of
strings is described by a rule called a pattern associated with the token. The pattern is set to
match each string in the set.
In most programming languages, the following constructs are treated as tokens:
keywords, operators, identifiers, constants, literal strings, and punctuation symbols such as
parentheses, commas, and semicolons.


Lexeme
A collection or group of characters forming a token is called a lexeme. A lexeme is a
sequence of characters in the source program that is matched by the pattern for the token. For
example, in the Pascal statement const pi = 3.1416; the substring pi is a lexeme for the token
identifier.

Patterns
A pattern is a rule describing the set of lexemes that can represent a particular token in
a source program. The pattern for the token const in the table below is just the single string
const that spells out the keyword.
Token       Lexeme                 Pattern
const       const                  const
if          if                     if
relation    <, <=, =, <>, >, >=    < or <= or = or <> or > or >=
id          pi                     letter followed by letters and digits
num         3.14                   any numeric constant
literal     "core"                 any characters between " and " except "
Fig.1.9.2 Example of Token, Lexeme and Pattern
Certain language conventions impact the difficulty of lexical analysis. Languages
such as FORTRAN require certain constructs in fixed positions on the input line. Thus the
alignment of a lexeme may be important in determining the correctness of a source program.

Attributes of Token
The lexical analyzer returns to the parser a representation for the token it has found.
The representation is an integer code if the token is a simple construct such as a left
parenthesis, comma, or colon. The representation is a pair consisting of an integer code and a
pointer to a table if the token is a more complex element such as an identifier or constant.
The integer code gives the token type, the pointer points to the value of that token.
Pairs are also returned whenever we wish to distinguish between instances of a token.
The attributes influence the translation of tokens.
i. Constants: the value of the constant


ii. Identifiers: a pointer to the corresponding symbol-table entry; a possible C representation of the token/attribute pair is sketched below.
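In C, this pair is commonly realized as a struct holding the token code and a union of attribute values. The sketch below is a minimal illustration; the names Token, TOK_ID and so on are invented for this example.

struct Symbol;                        /* symbol-table record, as sketched earlier */

enum { TOK_LPAREN, TOK_COMMA, TOK_ID, TOK_NUM };

/* A token: an integer code giving the token type, plus an attribute.
   For simple tokens such as '(' the attribute is unused; for an
   identifier it points to the symbol-table entry, and for a constant
   it holds the value. */
typedef struct {
    int code;
    union {
        struct Symbol *sym;           /* TOK_ID: symbol-table entry */
        double         val;           /* TOK_NUM: value of the constant */
    } attr;
} Token;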

LEXICAL ERRORS:
Lexical errors are the errors thrown by the lexer when it is unable to continue, which means
that there is no way to recognize a lexeme as a valid token for the lexer. Syntax errors, on the
other hand, are thrown by the parser when a given set of already recognized valid tokens does
not match any of the right-hand sides of the grammar rules. A simple panic-mode error-handling
system requires that we return to a high-level parsing function when a parsing or lexical error
is detected.
Error Recovery Strategies in Lexical Analysis
The following are the error-recovery actions in lexical analysis:
1. Deleting an extraneous character
2. Inserting a missing character
3. Replacing an incorrect character by a correct character
4. Transforming two adjacent characters
5. Panic mode recovery: Deletion of successive characters from the token until error is
resolved

1.10. SPECIFICATION OF TOKENS


There are 3 specifications of tokens:
1. Strings
2. Language
3. Regular expression

Strings and Languages


• An alphabet or character class is a finite set of symbols
• A string over an alphabet is a finite sequence of symbols drawn from that alphabet
• A language is any countable set of strings over some fixed alphabet

In language theory, the terms "sentence" and "word" are often used as synonyms for "string."
The length of a string s, usually written |s|, is the number of occurrences of symbols in s. For
example, banana is a string of length six. The empty string, denoted ε, is the string of length
zero.


Operations on strings
The following string-related terms are commonly used:
1. A prefix of string s is any string obtained by removing zero or more symbols from the
end of string s. For example, ban is a prefix of banana
2. A suffix of string s is any string obtained by removing zero or more symbols from the
beginning of s. For example, nana is a suffix of banana
3. A substring of s is obtained by deleting any prefix and any suffix from s. For example,
nan is a substring of banana
4. The proper prefixes, suffixes, and substrings of a string s are those prefixes,
suffixes, and substrings, respectively of s that are not ε or not equal to s itself
5. A subsequence of s is any string formed by deleting zero or more not necessarily
consecutive positions of s. For example, baan is a subsequence of banana

Operations on languages:
The following are the operations that can be applied to languages:
1. Union
2. Concatenation
3. Kleene closure
4. Positive closure
The following example shows the operations on languages: Let L={0,1} and S={a,b,c}
1. Union : L U S={0,1,a,b,c}
2. Concatenation : L.S={0a,1a,0b,1b,0c,1c}
3. Kleene closure : L*={ε,0,1,00…..}
4. Positive closure : L+={0,1,00…..}

Regular Expressions
• Each regular expression r denotes a language L(r)
• Here are the rules that define the regular expressions over some alphabet Σ and the
languages that those expressions denote:
1. ε is a regular expression, and L(ε) is { ε }, that is, the language whose sole member is
the empty string
2. If ‘a’ is a symbol in Σ, then ‘a’ is a regular expression, and L(a) = {a}, that is, the
language with one string, of length one, with ‘a’ in its one position
3. Suppose r and s are regular expressions denoting the languages L(r) and L(s). Then,


a) (r)|(s) is a regular expression denoting the language L(r) U L(s)


b) (r)(s) is a regular expression denoting the language L(r)L(s)
c) (r)* is a regular expression denoting (L(r))*
d) (r) is a regular expression denoting L(r)
4. The unary operator * has highest precedence and is left associative
5. Concatenation has second highest precedence and is left associative
6. | has lowest precedence and is left associative

Regular set
A language that can be defined by a regular expression is called a regular set. If two regular
expressions r and s denote the same regular set, we say they are equivalent and write r = s.

There are a number of algebraic laws for regular expressions that can be used to manipulate
into equivalent forms.
For instance, r | s = s | r is commutative; r | (s | t) = (r | s) | t is associative.

Regular Definitions
Giving names to regular expressions is referred to as a Regular definition. If Σ is an alphabet
of basic symbols, then a regular definition is a sequence of definitions of the form
d1 → r1
d2 → r2
………
dn → rn
1. Each di is a distinct name.
2. Each ri is a regular expression over the alphabet Σ U {d1, d2, . . . , di-1}.

Example: Identifiers is the set of strings of letters and digits beginning with a letter. Regular
definition for this set:
letter → A | B | …. | Z | a | b | …. | z
digit → 0 | 1 | …. | 9
id → letter ( letter | digit ) *
Shorthands
Certain constructs occur so frequently in regular expressions that it is convenient to
introduce notational short hands for them.


1. One or more instances (+):

• The unary postfix operator + means “one or more instances of”


• If r is a regular expression that denotes the language L(r), then ( r )+ is a regular
expression that denotes the language (L (r ))+
• Thus the regular expression a+ denotes the set of all strings of one or more a’s
• The operator + has the same precedence and associativity as the operator *

2. Zero or one instance (?):

• The unary postfix operator ? means “zero or one instance of”


• The notation r? is a shorthand for r | ε
• If ‘r’ is a regular expression, then ( r )? is a regular expression that denotes the language L(r) U { ε }

3. Character Classes:

• The notation [abc] where a, b and c are alphabet symbols denotes the regular expression
a|b|c
• Character class such as [a – z] denotes the regular expression a | b | c | d | ….|z
• We can describe identifiers as being strings generated by the regular expression, [A–
Za–z][A– Za–z0–9]*

Non-regular Set
A language which cannot be described by any regular expression is a non-regular set.
Example: The set of all strings of balanced parentheses and repeating strings cannot be
described by a regular expression. This set can be specified by a context-free grammar.

1.11. RECOGNITION OF TOKENS

Consider the following grammar fragment:
stmt → if expr then stmt
| if expr then stmt else stmt | ε

expr → term relop term


| term
term → id | num
where the terminals if , then, else, relop, id and num generate sets of strings given by the
following regular definitions:
digit → [0-9]
digits → digit+
num → digit+ (.digit+)?(E(+|-)?digit+)?
letter → [A-Za-z]
id → letter ( letter | digit )*
if → if
then → then
else → else
relop → < | > | <= | >= | = | <>

We also want the lexer to remove whitespace so we define a new token


ws → ( blank | tab | newline ) +
where blank, tab, and newline are symbols used to represent the corresponding ASCII
characters.
For this language fragment the lexical analyzer will recognize the keywords if, then, else, as
well as the lexemes denoted by relop, id, and num. To simplify matters, we assume keywords
are reserved; that is, they cannot be used as identifiers.

Transition diagrams
Transition Diagram has a collection of nodes or circles, called states. Each state represents a
condition that could occur during the process of scanning the input looking for a lexeme that
matches one of several patterns. Edges are directed from one state of the transition diagram to
another. Each edge is labeled by a symbol or set of symbols. If we are in state s, and the
next input symbol is a, we look for an edge out of state s labeled by a. If we find such an edge,
we advance the forward pointer and enter the state of the transition diagram to which that
edge leads.


Some important conventions about transition diagrams are


1. Certain states are said to be accepting, or final. These states indicate that a lexeme has
been found, although the actual lexeme may not consist of all positions between the
lexemeBegin and forward pointers. We always indicate an accepting state by a double
circle.
2. In addition, if it is necessary to retract the forward pointer one position, then we shall
additionally place a * near that accepting state.
3. One state is designated the start state, or initial state; it is indicated by an edge labeled
"start" entering from nowhere. The transition diagram always begins in the start state
before any input symbols have been used.

Recognition of Reserved Words and Identifiers

Fig.1.11.1 Transition diagram for relop


As an intermediate step in the construction of a lexical analyzer, we first produce a stylized
flowchart, called a transition diagram. Positions in a transition diagram are drawn as circles
and are called states.

Fig.1.11.2 Transition diagram of Identifier

The above transition diagram is for an identifier, defined to be a letter followed by any number
of letters or digits. A sequence of transition diagrams can be converted into a program to look
for the tokens specified by the diagrams; each state gets a segment of code, as the sketch below
illustrates.
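A hand-translated version of the identifier diagram might look like the following C sketch. It is illustrative only: state 1 demands a letter, state 2 loops on letters and digits, and the * retraction at the accepting state corresponds to examining, but not consuming, the delimiter that ends the identifier. The reserved-word check (so that keywords such as if are not returned as ordinary identifiers) and all names are assumptions of this sketch.

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token codes for this sketch. */
enum { TOK_NONE, TOK_ID, TOK_IF, TOK_THEN, TOK_ELSE };

static const char *keywords[] = { "if", "then", "else" };
static const int   kwtoken[]  = { TOK_IF, TOK_THEN, TOK_ELSE };

/* Recognize an identifier starting at *pp; on success advance *pp
   past the lexeme and copy the lexeme into lexeme[]. */
int get_id(const char **pp, char lexeme[64]) {
    const char *p = *pp;
    int i, n = 0;
    if (!isalpha((unsigned char)*p))              /* state 1: need a letter */
        return TOK_NONE;
    while (isalnum((unsigned char)*p) && n < 63)  /* state 2: loop */
        lexeme[n++] = *p++;
    lexeme[n] = '\0';                             /* accepting state: *p is the
                                                     delimiter, left unconsumed */
    *pp = p;
    for (i = 0; i < 3; i++)                       /* reserved-word check */
        if (strcmp(lexeme, keywords[i]) == 0)
            return kwtoken[i];
    return TOK_ID;
}

int main(void) {
    const char *src = "rate+1";
    char lex[64];
    int tok = get_id(&src, lex);
    printf("token=%d lexeme=%s rest=%s\n", tok, lex, src);
    return 0;
}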


Recognizing Numbers
The diagram below is from the second edition. It is essentially a combination of the three
diagrams in the first edition.

Fig. 1.11.3 Recognizing Numbers


Recognizing Whitespace
The diagram itself is quite simple reflecting the simplicity of the corresponding regular
expression.

Fig. 1.11.4 Recognizing Whitespaces


• The delim in the diagram represents any of the whitespace characters, say space, tab,
and newline.
• The final star is there because we needed to find a non-whitespace character in order
to know when the whitespace ends and this character begins the next token.
• There is no action performed at the accepting state. Indeed the lexer does not return to
the parser, but starts again from its beginning as it still must find the next token.

1.12. A LANGUAGE FOR SPECIFYING LEXICAL ANALYZER

There is a wide range of tools for constructing lexical analyzers.

• Lex
• YACC


Lex is a computer program that generates lexical analyzers. Lex is commonly used with the
yacc parser generator.
Creating a lexical analyzer

• First, a specification of a lexical analyzer is prepared by creating a program lex.l in


the Lex language. Then, lex.l is run through the Lex compiler to produce a C program
lex.yy.c.
• Finally, lex.yy.c is run through the C compiler to produce an object program a.out,
which is the lexical analyzer that transforms an input stream into a sequence of
tokens.

Fig1.12.1 Creating a lexical analyzer with Lex


Lex Specification
A Lex program consists of three parts:
{ definitions }
%%
{ rules }
%%
{ user subroutines }
• Definitions include declarations of variables, constants, and regular definitions. The
first, declaration, section includes variables and constants as well as the all-important
regular definitions that define the building blocks of the target language, i.e., the
language that the generated lexer will analyze.
• Rules are statements of the form
p1 {action1}


p2 {action2}

pn {actionn}
where pi is a regular expression and actioni describes what action the lexical analyzer
should take when pattern pi matches a lexeme. Actions are written in C code.
The next, translation rules, section gives the patterns of the lexemes that the lexer will
recognize and the actions to be performed upon recognition. Normally, these actions
include returning a token name to the parser and often returning other information
about the token via the shared variable yylval.
• User subroutines are auxiliary procedures needed by the actions. These can be
compiled separately and loaded with the lexical analyzer. If a return is not specified
the lexer continues executing and finds the next lexeme present.
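Putting the three parts together, a minimal Lex specification might look like the following sketch. It is an illustration only: the token codes are defined locally here, whereas in a real compiler they would come from the parser (for example, from a Yacc-generated y.tab.h), and the patterns cover only a fragment of a language.

%{
#include <stdio.h>
/* Token codes, invented for this sketch. */
#define IF  258
#define ID  259
#define NUM 260
%}
delim   [ \t\n]
ws      {delim}+
letter  [A-Za-z]
digit   [0-9]
id      {letter}({letter}|{digit})*
number  {digit}+(\.{digit}+)?(E[+-]?{digit}+)?
%%
{ws}     { /* strip whitespace: no token is returned */ }
if       { return IF; }
{id}     { return ID; }
{number} { return NUM; }
%%
int yywrap(void) { return 1; }

Note that the rule for the keyword if is listed before the rule for {id}; when two patterns match lexemes of the same length, Lex picks the one listed first, which is how keywords win over identifiers here.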

YACC- YET ANOTHER COMPILER-COMPILER

YACC provides a general tool for describing the input to a computer program. The
YACC user specifies the structures of his input, together with code to be invoked as each
such structure is recognized.
YACC turns such a specification into a subroutine that handles the input process;
frequently, it is convenient and appropriate to have most of the flow of control in the user's
application handled by this subroutine.

1.13. FINITE AUTOMATA

Finite Automata is one of the mathematical models that consist of a number of states
and edges. It is a transition diagram that recognizes a regular expression or grammar.
There are two types of Finite Automata:

• Non-deterministic Finite Automata (NFA)


• Deterministic Finite Automata (DFA)

Deterministic Finite Automata


DFA is a special case of an NFA in which
i. No state has a ε-transition.


ii. There is at most one transition from each state on any input
iii. For each symbol a and state s, there is at most one labeled edge a leaving s. i.e.
transition function is from pair of state-symbol to state (not set of states)
DFA has five tuples denoted by
M = {Q, Σ, δ, q0, F}
Q: Set of all states.
Σ: Set of input symbols (the symbols which the machine takes as input).
q0: Initial state (the starting state of the machine).
F: Set of final states.
δ: Transition function, defined as δ : Q × Σ → Q.
Example:
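As a concrete sketch of the transition function (a hypothetical machine, not taken from the text above), δ can be stored as a two-dimensional table indexed by state and input symbol. The C program below simulates a two-state DFA over Σ = {0, 1} with Q = {0, 1}, q0 = 0 and F = {0}, which accepts exactly the strings containing an even number of 1s.

#include <stdio.h>

/* delta[state][symbol]: state 0 = even number of 1s seen so far. */
static const int delta[2][2] = {
    /* on '0'  on '1' */
    {     0,      1 },               /* from state 0 */
    {     1,      0 },               /* from state 1 */
};

int accepts(const char *s) {
    int q = 0;                       /* start in the initial state q0 */
    for (; *s != '\0'; s++)
        q = delta[q][*s - '0'];      /* exactly one move per symbol */
    return q == 0;                   /* accept iff we end in F = {0} */
}

int main(void) {
    printf("%d\n", accepts("1010")); /* prints 1: two 1s, accepted */
    printf("%d\n", accepts("111"));  /* prints 0: three 1s, rejected */
    return 0;
}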

Non Deterministic Finite Automata


NFA has five tuples denoted by
M = {Q, Σ, δ, q0, F}
Q: Set of all states.
Σ: Set of input symbols (the symbols which the machine takes as input).
q0: Initial state (the starting state of the machine).
F: Set of final states.
δ: Transition function, defined as δ : Q × (Σ U {ε}) → 2^Q.


• ε- transitions are allowed in NFAs. In other words, we can move from one state to
another one without consuming any symbol.
• A NFA accepts a string x, if and only if there is a path from the starting state to one of
accepting states such that edge labels along this path spell out x.
Example:

Construction of DFA from regular expression


The following steps are involved in the construction of DFA from regular expression:
• Convert RE to NFA using Thompson’s rules
• Convert NFA to DFA
• Construct minimized DFA

The regular expression is converted into minimized DFA by the following procedure:
Regular expression → NFA → DFA → Minimized DFA
• This is one way to convert a regular expression into a NFA
• There can be other ways (much efficient) for the conversion
Algorithm 1.13.1:
Thompson’s construction is a simple and systematic method
• It guarantees that the resulting NFA will have exactly one final state, and one start
state.
• Construction starts from simplest parts (alphabet symbols).


• To create a NFA for a complex regular expression, NFAs of its sub-expressions are
combined to create its NFA.
• To recognize an empty string ε

• To recognize a symbol a in the alphabet Σ:

• For regular expression r1 | r2:

N(r1) and N(r2) are NFAs for regular expressions r1 and r2

• For regular expression r1 r2

Here, the final state of N(r1) is merged with the start state of N(r2), and the final state of N(r2) becomes the final state of N(r1r2).

• For regular expression r*


Example:
For a RE (a | b) * a, the NFA construction is shown below.

Converting NFA to DFA (Subset Construction)


We merge together NFA states by looking at them from the point of view of the input
characters:
• From the point of view of the input, any two states that are connected by an ε-
transition may as well be the same, since we can move from one to the other without
consuming any character. Thus states which are connected by an ε-transition will be
represented by the same states in the DFA.
• If it is possible to have multiple transitions based on the same symbol, then we can
regard a transition on a symbol as moving from a state to a set of states (i.e., the union
of all those states reachable by a transition on the current symbol). Thus these states
will be combined into a single DFA state.
To perform this operation, let us define two functions:
• The ε-closure function takes a state and returns the set of states reachable from it
based on (one or more) ε-transitions. Note that this will always include the state itself.
We should be able to get from a state to any state in its ε-closure without consuming
any input.
• The function move takes a state and a character, and returns the set of states reachable
by one transition on this character.
We can generalize both these functions to apply to sets of states by taking the union of the
application to individual states.
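A compact C sketch of these two functions, representing each set of NFA states as a bit set, is given below. The encoding of the NFA, the 32-state limit, and all names are assumptions made for this illustration.

#include <stdio.h>

#define MAXST 32                     /* states fit in one 32-bit set */
typedef unsigned int StateSet;       /* bit i set => state i is in the set */

/* Hypothetical NFA encoding: eps[s] is the set of states reachable from s
   by one epsilon-transition; mv[s][c] is the set reachable on symbol c
   (here c is 0 or 1 for a two-symbol alphabet). */
static StateSet eps[MAXST];
static StateSet mv[MAXST][2];

/* epsilon-closure: repeatedly add epsilon-successors until no change.
   The result always includes the argument set itself. */
StateSet eclosure(StateSet t) {
    StateSet closure = t, old;
    do {
        old = closure;
        for (int s = 0; s < MAXST; s++)
            if (closure & (1u << s))
                closure |= eps[s];
    } while (closure != old);
    return closure;
}

/* move: union of transitions on symbol c from every state in t. */
StateSet move(StateSet t, int c) {
    StateSet r = 0;
    for (int s = 0; s < MAXST; s++)
        if (t & (1u << s))
            r |= mv[s][c];
    return r;
}

int main(void) {
    /* Tiny example: 0 --eps--> 1, and 1 --'0' (c = 0)--> 2. */
    eps[0] = 1u << 1;
    mv[1][0] = 1u << 2;
    StateSet start = eclosure(1u << 0);        /* {0,1} */
    StateSet next  = eclosure(move(start, 0)); /* {2}   */
    printf("start=%#x next=%#x\n", start, next);
    return 0;
}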


1.14. DESIGN OF A LEXICAL-ANALYZER GENERATOR


The Structure of the Generated Analyzer
The program that serves as the lexical analyzer includes a fixed program that simulates an
automaton; at this point we leave open whether that automaton is deterministic or
nondeterministic. The rest of the lexical analyzer consists of components that are created
from the Lex program by Lex itself.

Fig 1.14.1 A Lex program is turned into a transition table and actions, which are used
by a finite-automaton simulator
These components are:
• A transition table for the automaton.
• Those functions that are passed directly through Lex to the output
• The actions from the input program, which appear as fragments of code to be invoked
at the appropriate time by the automaton simulator.
Example: We shall illustrate the ideas of this section with the following simple, abstract
example:

Fig 1.14.2 An NFA constructed from a Lex program


In particular, string abb matches both the second and third patterns, but we shall consider it a
lexeme for pattern p2, since that pattern is listed first in the above Lex program. Input
strings such as aabbb have many prefixes that match the third pattern. The Lex rule is to
take the longest, so we continue reading b's until another a is met, whereupon we report the
lexeme to be the initial a's followed by as many b's as there are.

• Pattern Matching Based on NFA's

If the lexical analyzer simulates an NFA, then it must read input beginning at the point on its
input which we have referred to as lexemeBegin. As it moves the pointer
called forward ahead in the input, it calculates the set of states it is in at each point by
simulating the NFA.

Eventually, the NFA simulation reaches a point on the input where there are no next states.
At that point, there is no hope that any longer prefix of the input would ever get the NFA to
an accepting state; rather, the set of states will always be empty. Thus, we are ready to decide
on the longest prefix that is a lexeme matching some pattern.

Fig 1.14.3 NFA’s for a, abb, and a*b+

Fig 1.14.4 Combined NFA


Fig 1.14.5 Sequence of sets of states entered when processing input aaba
We look backwards in the sequence of sets of states, until we find a set that includes one
or more accepting states. If there are several accepting states in that set, pick the one
associated with the earliest pattern pi in the list from the Lex program. Move
the forward pointer back to the end of the lexeme, and perform the action Ai associated with
pattern pi.

• DFA's for Lexical Analyzers


Another architecture, resembling the output of Lex, is to convert the NFA for all the patterns
into an equivalent DFA, using the subset construction described above. Within each DFA
state, if there are one or more accepting NFA states, determine the first pattern whose
accepting state is represented, and make that pattern the output of the DFA state.
We use the DFA in a lexical analyzer much as we did the NFA. We simulate the DFA until
at some point there is no next state (or strictly speaking, the next state is ∅, the dead
state corresponding to the empty set of NFA states). At that point, we back up through the
sequence of states we entered and, as soon as we meet an accepting DFA state, we perform
the action associated with the pattern for that state.
Example: Suppose the DFA of Fig. 1.14.6 is given input abba. The sequence of states
entered is 0137, 247, 58, 68, and at the final a there is no transition out of state 68. Thus, we
consider the sequence from the end, and in this case, 68 itself is an accepting state that reports
pattern p2 = abb.

Fig 1.14.6 transition graph for DFA handling the patterns a, abb, and a*b+


1.15. Optimization of DFA-Based Pattern Matchers


• The first algorithm is useful in a Lex compiler, because it constructs a DFA directly
from a regular expression, without constructing an intermediate NFA. The resulting
DFA also may have fewer states than the DFA constructed via an NFA
• The second algorithm minimizes the number of states of any DFA, by combining
states that have the same future behavior. The algorithm itself is quite efficient,
running in time O(n log n), where n is the number of states of the DFA
• The third algorithm produces more compact representations of transition tables than
the standard, two-dimensional table

Important States of an NFA

To begin our discussion of how to go directly from a regular expression to a DFA, we must
first dissect the NFA construction of Algorithm 1.13.1 and consider the roles played by
various states. We call a state of an NFA important if it has a non-ε out-transition. Notice that
the subset construction uses only the important states in a set T when it
computes ε-closure(move(T, a)), the set of states reachable from T on input a. That is, the set
of states move(s, a) is nonempty only if state s is important. During the subset construction,
two sets of NFA states can be identified (treated as if they were the same set) if they:

• Have the same important states, and


• Either both have accepting states or neither does.

When the NFA is constructed from a regular expression by Algorithm 1.13.1, we can say
more about the important states. The only important states are those introduced as initial
states in the basis part for a particular symbol position in the regular expression. That is, each
important state corresponds to a particular operand in the regular expression.

The constructed NFA has only one accepting state, but this state, having no out-
transitions, is not an important state. By concatenating a unique right endmarker # to a
regular expression r, we give the accepting state for r a transition on #, making it an important
state of the NFA for (r)#. In other words, by using the augmented regular expression (r)#,
we can forget about accepting states as the subset construction proceeds; when the
construction is complete, any state with a transition on # must be an accepting state.

The important states of the NFA correspond directly to the positions in the regular
expression that hold symbols of the alphabet. It is useful, as we shall see, to present the


regular expression by its syntax tree, where the leaves correspond to operands and the interior
nodes correspond to operators. An interior node is called a cat-node, or-node, or star-node if
it is labeled by the concatenation operator (dot), union operator |, or star operator *,
respectively. We can construct a syntax tree for a regular expression just as we did for
arithmetic expressions.

Example 1: Figure 1.15.1 shows the syntax tree for the regular expression of our running
example. Cat-nodes are represented by circles.

Fig 1.15.1 Syntax tree for (a | b)*abb#

Leaves in a syntax tree are labeled by ε or by an alphabet symbol. To each leaf not labeled ε,
we attach a unique integer. We refer to this integer as the position of the leaf and also as a
position of its symbol. Note that a symbol can have several positions; for instance, a has
positions 1 and 3 in Fig. 1.15.1. The positions in the syntax tree correspond to the important
states of the constructed NFA.


Example 2: Figure 1.15.2 shows the NFA for the same regular expression as Fig. 1.15.1,
with the important states numbered and other states represented by letters. The numbered
states in the NFA and the positions in the syntax tree correspond in a way we shall soon see.

Fig. 1.15.2 NFA constructed by algorithm 1.13.1 for (a | b)*abb#

SOLVED PROBLEMS
1. Construct finite automata for the Regular expression (b|ab*ab*)*

2. Construct finite automata for the Regular expression (a|b)*ab(a|b)*

3. Construct finite automata for the Regular expression (0|1(01*0)*1)*


4. Count number of tokens:


int main()
{
// 2 variables
int a, b;
a = 10;
return 0;
}
Answer: 'int' 'main' '(' ')' '{' 'int' 'a' ',' 'b' ';'
'a' '=' '10' ';' 'return' '0' ';' '}'
These are the 18 valid tokens. Note that the comment is stripped out by the
lexical analyzer and produces no tokens.

5. Count number of tokens :


int main()
{
int a = 10, b = 20;
printf("sum is :%d",a+b);
return 0;
}
Answer: Total number of tokens: 27

6. Count number of tokens :


int max(int i);
• The lexical analyzer first reads int, finds it to be valid, and accepts it as a token
• max is read next and, after ( is read, is found to be a valid function name
• int is also a token, then i is another token, then ), and finally ;
Answer: Total number of tokens: 7
int, max, (, int, i, ), ;


7. Explain how the following statement will be translated in every phase of the compiler.
sum := oldsum + rate * 50
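The answer was given as a figure in the original; the sketch below (modeled on the classic position := initial + rate * 60 walk-through, with illustrative temporary names) shows what each phase would produce:

Lexical analysis: the statement becomes the token stream id1 := id2 + id3 * 50, with sum, oldsum and rate entered into the symbol table.
Syntax analysis: a syntax tree is built with := at the root, id1 as its left child, and the subtree for id2 + (id3 * 50) as its right child.
Semantic analysis: type checking inserts a conversion for the integer constant when the identifiers are real, e.g. inttoreal(50).
Intermediate code generation:
    temp1 := inttoreal(50)
    temp2 := id3 * temp1
    temp3 := id2 + temp2
    id1 := temp3
Code optimization:
    temp1 := id3 * 50.0
    id1 := id2 + temp1
Code generation (illustrative target code):
    MOVF id3, R2
    MULF #50.0, R2
    MOVF id2, R1
    ADDF R2, R1
    MOVF R1, id1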


REVIEW QUESTIONS (LEVELS I, II, III)


S. No  Review Question                                                      CO Addressing  Blooms Level
1      Define compiler? Explain various phases of a compiler in detail      1              2
2      Construct DFA for the given regular expression: (a+b)*abb(a+b)*      1              3
3      What is the role of Lexical analyzer                                 1              1
4      Define regular expression? Give an example                           1              2
5      What is lexeme?                                                      1              1
6      What is token?                                                       1              1
7      Explain compiler writing tools in detail                             1              2
8      What is regular expression? Explain the different operators used
       in the construction of regular expressions with examples             1              3
9      Explain, with the example statement a := b*c-d, all the phases of
       a compiler                                                           1              3
10     Explain compiler writing tools in detail                             1              2

MULTIPLE CHOICE QUESTIONS


1. Which of the following is not a part of compiler phases ( )
a) Lexical Analyzer b) Syntax Analyzer c) Semantic Analyzer d) Phase Analyzer
2. Lexical Analyzer is also known as ( )
a) Hierarchical Analysis b) Linear Analysis c) Scanner d) Both B & C
3. Syntax Analyzer is also known as ( )
a) Hierarchical Analysis b) Parser c) Scanner d) Both A & B
4. An individual token is called ________ ( )
a) Lexeme b) Lex c) Lexeme & Lex d) None of the mentioned
5. Interpreter execute the source program ( )
a) Statement by statement b) Total Program at a time
c) Block by Block d) all of the above


6. Which phase of the compiler is Lexical Analyser? ( )


a) First b) Second c) Third d) None of the mentioned
7. The context free grammar is ambiguous if ( )
a) The grammar contains non-terminals
b) Produces more than one parse tree
c) Production has two non-terminals side by side
d) None of the mentioned
8. _________ is the input to Lexical Analyser is ( )
a) Source Code b) Object Code c) Lexeme d) None of the mentioned
9. Select a Machine Independent phase of the compiler ( )
a) Syntax Analysis b) Semantic Analyzer
c) Lexical Analysis d) All of the mentioned
10. Select a Machine dependent phase of the compiler ( )
a) Code Generator b) Code Optimizer c) Both a & b d) b only
11. The _______ part breaks up the source program into constituent pieces and imposes a
grammatical structure on them. ( )
a) Analysis part b) Synthesis part c) Semantic Analyzer d) Code Optimizer
12. The compiler collects information about the source program and stores it in a data structure
called ____________ ( )
a) Symbol table b) Error handler c) Scanner d) Parser
13. Type checking is normally done during? ( )
a) Lexical Analysis b) Syntax Analysis
c) Syntax Directed Translation d) Code generation
14. Lexical analyzer stripping out the__________ ( )
a) Comments, White spaces b) Keywords, Symbols
c) Terminals, Non-Terminals d) All of the above
15. Syntax Analysis Generates Parse Tree ( )
a) True b) False
16. By whom the symbol table created? ( )
a) Compiler b) Interpreter c) Assembler d) None of the mentioned
17. A system program that combines the separately compiled modules of a program into a
form suitable for execution? ( )
a) Assembler b) Compiler c) Linking d) Interpreter

18. Which of the following strings is not generated by the following grammar? ( )
S → SaSbS|ε
a) aabb b) abab c) aababb d) aaabbb
19. ______________are an important notation for specifying lexeme patterns
20. What is the Regular Expression Matching Zero or More Specific Characters ( )
a) + b) # c) * d) &
21. What is the Regular Expression Matching One or More Specific Characters ( )
a) + b) # c) * d) &
22. The _____________put all the executable object files into main memory for execution.( )
a) Text Editor b) Assembler c) Linker d) Loader
23. Regular expression (x|y)(x|y) denotes the set ( )
a) {xy,xy} b) {xx,xy,yx,yy} c) {x,y} d) {x,y,xy}
24. A compiler for a high-level language that runs on one machine and produces code for a
different machine is called ( )
a) Optimizing compiler b) One pass compiler
c) Cross compiler d) Multipass compiler
25. The output of lexical analyzer is ( )
a) A set of Regular Expressions b) Syntax Tree
c) Set of Tokens d) String Character
26. What is the regular expression to print character literally ( )
a) “c” b) {c} c) c+ d) c$
27. In which phase the concept of grammar is used in compilation ( )
a) Lexical analysis b) Parser
c) Code generation d) Code optimization
28. The set of all strings over ∑ = {a,b} in which all strings having bbbb as substring is
a) (a+b)* bbbb (a+b)* b) (a+b)* bb (a+b)*bb
c) bbb(a+b)* d) bb (a+b)*
29. The set of all strings over ∑ ={a,b} in which a single a is followed by any number of b’s
or a single b followed by any number of a’s is ( )
a) ab* | ba* b) ab*ba* c) a*b + b*a d) None of the mentioned
30. Regular expressions are used to represent which language ( )
a) Recursive language b) Context free language
c) Regular language d) All of the mentioned


31. The set of all strings over ∑ = {a,b} consisting of a's and b's and ending in bb is ( )
a) ab b) a*bbb c) (a+b)*bb d) All of the mentioned
32. Which of the following regular expression denotes zero or more instances of a or b?
a) a|b b) (a|b)* c) (ab)* d) a*b
33. The regular expression (a)|((b)*(c)) is equivalent to ( )
a) Empty b) abcabc c) b*c|a d) None of the mentioned
34. Which of the following is not a cousin of compiler ( )
a. Assembler b. Linker c. Sentinel d. Loader
35. Output file of Lex is _____ ? ( )
a) Myfile.e b) Myfile.yy.c c) Myfile.lex d) Myfile.obj
36. The number of tokens in the following C statement is ( )
printf("i = %d, &i = %x", i, &i);
a. 3 b. 26 c. 10 d. 21
37. __________ accepts a stream of characters as input and produces a stream of tokens as
output. ( )
a. Parser b. Lexical analyzer c. Scanner d. b and c
38. The sequence of characters in the program that is matched by the pattern for a token is
known as ____________ ( )
a) Lexeme b) Regular Expression c) Loader d) Scanner
39. LEX specification consists of __________ parts. ( )
a. 1 b. 2 c. 3 d. 4
40. The Regular Expression a+ denotes ________________________________ ( )
a. The set of all strings of one or more a's
b. The set of all strings of zero or more a's
c. The set of all strings of two consecutive a's
d. All the above
41. The Regular Expression a? denotes ________________________________ ( )
a. The set of all strings of one or more a's
b. The set of all strings of zero or more a's
c. The set of all strings of two consecutive a's
d. The set of all strings of zero or one a's
42. The Regular Expression a* denotes ________________________________ ( )
a. The set of all strings of one or more a's
b. The set of all strings of zero or more a's
c. The set of all strings of two consecutive a's
d. All the above

SHORT QUESTIONS
UNIT –I
INTRODUCTION TO COMPILING & LEXICAL ANALYSIS
S. No  Short Question                                                    CO  Blooms  Marks
1      Define Compiler?                                                  1   1       2
2      Illustrate the cousins of compiler                                1   1       2
3      What is the role of preprocessors                                 1   1       2
4      Define Assembler?                                                 1   1       2
5      Illustrate the role of Loader in program compilation              1   1       2
6      Illustrate the role of Linker in program compilation              1   1       2
7      Define Interpreter?                                               1   1       2
8      Compare the Compiler and Interpreter                              1   1       2
9      List the phases of compiler                                       1   1       2
10     Define Scanner                                                    1   1       2
11     Define Parser                                                     1   1       2
12     Define Symbol table?                                              1   1       2
13     List the phases of the synthesis part of the compiler             1   1       2
14     List the phases of the analysis part of the compiler              1   1       2
15     What is the role of semantic analyzer?                            1   1       2
16     Differentiate Scanner and Parser                                  1   2       2
17     Define the two main parts of compilation?                         1   1       2
18     What is the role of error Handler?                                1   1       2
19     What are the tools used to construct scanner and parser           1   1       2
20     What is pass and phase?                                           1   1       2
21     List the compiler writing tools.                                  1   1       2
22     Grammars are used to create parse trees. Justify whether the
       above statement is true or false. Why?                            1   2       2
23     How does the symbol table interact with the Lexical analyzer?     1   2       2
24     What is the role of Lexical analyzer                              1   1       2
25     Define regular expression? Give an example                        1   2       2
26     What is lexeme?                                                   1   1       2
27     What is token?                                                    1   1       2
28     Classify tokens in the expression int a,b;                        1   2       2
29     Identify the relation among Token, Pattern and Lexemes            1   1       2
30     Write a regular expression for a floating point number            1   2       2
31     Justify why the lexical analyzer strips out some tokens and
       statements                                                        1   2       2
32     What are Lexical errors? Give an example                          1   2       2
33     Describe the possible strings for the following regular
       expressions: i) a(a|b)*a  ii) (a|b)*a(a|b)(a|b)                   1   2       2
34     Lexeme is a sequence of characters and Token is the output of
       the Lexical analyzer. Justify the above statement.                1   2       2
35     Construct NFA for (a|b)*ab                                        1   2       2
36     Determine whether the following regular expressions derive
       the same strings or not: (ab)* and a*b*                           1   2       2
37     What is meant by Kleene Closure? Give an example.                 1   2       2
38     What is meant by Positive Closure? Give an example.               1   2       2
39     Construct the NFA for the following regular expression:
       (a*|b*)*                                                          1   2       2
40     Write a suitable pattern using ( ), { } and [ ].                  1   2       2

SHORT QUESTIONS WITH ANSWERS


1. What is a Compiler?
A Compiler is a program that reads a program written in one language - the source
language - and translates it into an equivalent program in another language - the target
language. As an important part of this translation process, the compiler reports to its user the
presence of errors in the source program.

2. State some software tools that manipulate source programs?


i. Structure editors
ii. Pretty printers
iii. Static checkers
iv. Interpreters.

3. What are the cousins of compiler?


The following are the cousins of compilers
i. Preprocessors
ii. Assemblers
iii. Loaders
iv. Link editors.


4. What are the two main parts of compilation? What do they perform?
The two main parts are
• Analysis part breaks up the source program into constituent pieces and creates
an intermediate representation of the source program.
• Synthesis part constructs the desired target program from the intermediate
representation

5. What is a Structure editor?


A structure editor takes as input a sequence of commands to build a source program
.The structure editor not only performs the text creation and modification functions of an
ordinary text editor but it also analyzes the program text putting an appropriate hierarchical
structure on the source program.

6. What are a Pretty Printer and Static Checker?


• A Pretty printer analyses a program and prints it in such a way that the structure of
the program becomes clearly visible.
• A static checker reads a program, analyses it and attempts to discover potential
bugs without running the program.

7. How many phases does analysis consist of?


Analysis consists of three phases:
i. Linear analysis
ii. Hierarchical analysis
iii. Semantic analysis

8. What happens in linear analysis?


This is the phase in which the stream of characters making up the source program is
read from left to right and grouped in to tokens that are sequences of characters having
collective meaning.

9. What happens in Hierarchical analysis?


This is the phase in which characters or tokens are grouped hierarchically in to nested
collections with collective meaning.


10. What happens in Semantic analysis?


This is the phase in which certain checks are performed to ensure that the components
of a program fit together meaningfully.

11. State some compiler construction tools?

i. Parser generators
ii. Scanner generators
iii. Syntax-directed translation engines
iv. Automatic code generators
v. Data-flow engines.

12. What is a Loader? What does the loading process do?


A Loader is a program that performs the two functions
i. Loading
ii. Link editing
The process of loading consists of taking relocatable machine code, altering the
relocatable address and placing the altered instructions and data in memory at the proper
locations.

13. What does the Link Editing does?


Link editing: This allows us to make a single program from several files of
relocatable machine code. These files may have been the result of several compilations, and
one or more may be library files of routines provided by the system and available to any
program that needs them.

14. What is a preprocessor?


A preprocessor is one, which produces input to compilers. A source program may be
divided into modules stored in separate files. The task of collecting the source program is
sometimes entrusted to a distinct program called a preprocessor. The preprocessor may also
expand macros into source language statements.
(Figure: skeletal source program → Preprocessor → source program)


15. State some functions of Preprocessors


i. Macro processing
ii. File inclusion
iii. Relational Preprocessors
iv. Language extensions

16. What is a Symbol table?


A Symbol table is a data structure containing a record for each identifier, with fields
for the attributes of the identifier. The data structure allows us to find the record for each
identifier quickly and to store or retrieve data from that record quickly.

17. State the general phases of a compiler


i. Lexical analysis
ii. Syntax analysis
iii. Semantic analysis
iv. Intermediate code generation
v. Code optimization
vi. Code generation

18. What is an assembler?

An assembler is a program that converts an assembly-language program into machine code.

19. What is the need for separating the analysis phase into lexical analysis and parsing?
(Or) What are the issues of lexical analyzer?
• Simpler design is perhaps the most important consideration. The separation of lexical
analysis from syntax analysis often allows us to simplify one or the other of these phases.
• Compiler efficiency is improved.
• Compiler portability is enhanced.

20. What is Lexical Analysis?


The first phase of compiler is Lexical Analysis. This is also known as linear analysis
in which the stream of characters making up the source program is read from left-to-right and
grouped into tokens that are sequences of characters having a collective meaning.

21. What is a lexeme? Define a regular set.


A Lexeme is a sequence of characters in the source program that is matched by the
pattern for a token.
A language denoted by a regular expression is said to be a regular set

22. What is a sentinel? What is its usage?


A Sentinel is a special character that cannot be part of the source program. Normally we
use ‘eof’ as the sentinel. This is used for speeding-up the lexical analyzer.

23. What is a regular expression? State the rules which define a regular expression.
A regular expression is a method to describe a regular language.
Rules:
1) ε is a regular expression that denotes {ε}, the set containing the empty string.
2) If a is a symbol in ∑, then a is a regular expression that denotes {a}.
3) Suppose r and s are regular expressions denoting the languages L(r) and L(s).
Then,
a) (r)|(s) is a regular expression denoting L(r) U L(s).
b) (r)(s) is a regular expression denoting L(r)L(s).
c) (r)* is a regular expression denoting (L(r))*.
d) (r) is a regular expression denoting L(r).

24. What are the Error-recovery actions in a lexical analyzer?


1. Deleting an extraneous character
2. Inserting a missing character
3. Replacing an incorrect character by a correct character
4. Transposing two adjacent characters

25. Construct a regular expression for the language

L = {w ∈ {a,b}* | w ends in abb}. Ans: (a|b)*abb.


26. What is recognizer?


Recognizers are machines that accept the strings belonging to certain languages. If the valid
strings of such a language are accepted by the machine, then the corresponding language is
said to be accepted by that machine; otherwise it is rejected.

LONG QUESTIONS
S. No  Long Question                                                     CO  Blooms  Marks
1      Define compiler? Explain various phases of a compiler in
       detail                                                            1   2       10
2      Explain compiler writing tools in detail                          1   2       5
3      What is regular expression? Explain the different operators
       used in the construction of regular expressions with examples    1   3       5
4      Explain, with the example statement a := b*c-d, all the phases
       of a compiler                                                     1   3       10
5      a. Explain cousins of a Compiler
       b. Describe how various phases could be combined as a pass
       in a compiler?                                                    1   3       10
6      Explain the role of the Lexical Analyzer in detail with an
       example source code                                               1   2       10
7      Explain in detail the language for specifying lexical
       analyzers                                                         1   2       10
8      Construct DFA for the given regular expression: (a|b)*abb         1   4       10
9      Explain the general format of a LEX program with an example       1   2       5
10     Explain how the statement a := b+c*60 will be translated in
       every phase                                                       1   3       10
11     Explain the phases of a compiler, and how the statement
       position := initial + rate * 60 will be translated in every
       phase                                                             1   2       10
12     Construct DFA for the given regular expression:
       (a+b)*abb(a+b)*                                                   1   3       10
13     Construct NFA for the given regular expression abb(a+b)* and
       convert it into a DFA                                             1   3       10
14     Explain Specification of tokens in detail                         1   2       10
15     Explain Recognition of tokens in detail                           1   2       10
16     Construct NFA for the given regular expression:
       ab(a+b)*b+(ab)+                                                   1   3       10
17     Construct a minimum state DFA for the regular expression
       (a|b)*abb(a|b)a(a|b)(a|b)                                         1   3       10
18     Describe the various phases of a compiler while translating
       the assignment statement a = p + r * 10 into assembly
       language                                                          1   3       10
19     Construct Finite Automata for the given Regular Expression
       (0+1)*(00+11)(0+1)*                                               1   4       10
20     Construct Finite Automata for the given Regular Expression
       (0+1)*11+0(0+1)*                                                  1   4       10

GATE/COMPETITIVE EXAMS QUESTIONS


1. The number of tokens in the following C statement is (GATE 2000)
printf("i = %d, &i = %x", i, &i);
A. 3 B. 26 C. 10 D. 21
Answer: (C)
Explanation: In a C source program, the basic element recognized by the compiler is the
“token.” A token is source-program text that the compiler does not break down into
component elements. There are 6 types of C tokens: identifiers, keywords, constants,
operators, string literals and other separators. There are total 10 tokens in the above printf
statement.
Below are tokens in above program.
1. printf
2. (
3. "i = %d, &i = %x"


4. ,
5. i
6. ,
7. &
8. i
9. )
10. ;
2. A lexical analyzer uses the following patterns to recognize three tokens T1, T2, and
T3 over the alphabet {a,b,c}.
T1: a?(b∣c)*a
T2: b?(a∣c)*b
T3: c?(b∣a)*c
Note that ‘x?’ means 0 or 1 occurrence of the symbol x. Note also that the analyzer
outputs the token that matches the longest possible prefix.

If the string bbaacabc is processed by the analyzer, which one of the following is the
sequence of tokens it outputs?
(A) T1T2T3
(B) T1T1T3
(C) T2T1T3
(D) T3T3
Answer: (D)
Explanation: 0 or 1 occurrence of the symbol x.
T1 : (b+c)* a + a(b+c)* a
T2 : (a+c)* b + b(a+c)* b
T3 : (b+a)* c + c(b+a)* c
Given String : bbaacabc
Longest matching prefix is ” bbaac ” (Which can be generated by T3)
The remaining part (after Prefix) “abc” (Can be generated by T3)
So, the answer is T3T3

3. In a compiler, keywords of a language are recognized during


A. parsing of the program
B. the code generation
C. the lexical analysis of the program


D. dataflow analysis

Answer: (C)
Explanation: Lexical analysis is the process of converting a sequence of characters into a
sequence of tokens. A token can be a keyword.

4. The lexical analysis for a modern computer language such as Java needs the power of
which one of the following machine models in a necessary and sufficient sense?
A. Finite state automata
B. Deterministic pushdown automata
C. Non-Deterministic pushdown automata
D. Turing Machine

Answer (A)
Explanation: Lexical analysis is the first step in compilation. In lexical analysis, program is
divided into tokens. Lexical analyzers are typically based on finite state automata. Tokens
can typically be expressed as different regular expressions:
An identifier is given by [a-zA-Z][a-zA-Z0-9]*
The keyword if is given by if.
Integers are given by [+-]?[0-9]+.

5. Which one of the following statements is FALSE?


A. Context-free grammar can be used to specify both lexical and syntax rules.
B. Type checking is done before parsing
C. High-level language programs can be translated to different Intermediate
Representations.
D. Arguments to a function can be passed using the program stack.

Answer: (B)
Explanation: Type checking is done at semantic analysis phase and parsing is done at
syntax analysis phase. And we know Syntax analysis phase comes before semantic
analysis. So Option (B) is False.

All other options seem correct.


UNIT II
Syntax Analysis: The role of a parser, Context-free grammars, writing a grammar, Parsing,
Ambiguous grammar, Elimination of Ambiguity, Classification of parsing techniques
Top down parsing: Back Tracking, Recursive Descent parsing, FIRST ( ) and FOLLOW ( )
- LL Grammars, Non-Recursive descent parsing, Error recovery in predictive parsing.

UNIT-II: Syntax Analysis, Top down parsing Planned Hours: 12


S. No.  Topic Learning Outcome                                         COs   Blooms Levels
1       Recall the concepts of Context Free Grammar                   CO 2  L1
2       Understand the role of parser in Compiler Design              CO 2  L2
3       Build the parse tree for Top Down Parsing using the rules
        of syntactical grammar                                        CO 2  L3
4       Analyze different Top Down Parsing techniques                 CO 2  L4
5       Examine the error recovery strategies in Top Down Parsing     CO 2  L4

2. SYNTAX ANALYSIS

Syntax analysis is the second phase of the compiler. It gets the input from the tokens
and generates a syntax tree or parse tree.
Advantages of grammar for syntactic specification:
1. A grammar gives a precise and easy-to-understand syntactic specification of a
programming language.
2. An efficient parser can be constructed automatically from a properly designed
grammar.
3. A grammar imparts a structure to a source program that is useful for its translation
into object code and for the detection of errors.
4. New constructs can be added to a language more easily when there is a grammatical
description of the language.

2.1. THE ROLE OF PARSER

A parser for a grammar is a program that takes as input a string w (a stream of tokens
obtained from the lexical analyzer) and produces as output either a parse tree for w, if w is a
valid sentence of the grammar, or an error message indicating that w is not a valid sentence of
the given grammar. The goal of the parser is to determine the syntactic validity of a source
string. If the string is valid, a tree is built for use by the subsequent phases of the compiler.
The tree reflects the sequence of derivations or reductions used during parsing; hence, it is
called a parse tree. If the string is invalid, the parser has to issue diagnostic messages
identifying the nature and cause of the errors in the string. Every elementary sub tree in the
parse tree corresponds to a production of the grammar.
There are two ways of identifying an elementary sub tree:
1. By deriving a string from a non-terminal or
2. By reducing a string of symbol to a non-terminal.
The two types of parsers employed are:
a. Top down parser: which build parse trees from top (root) to bottom (leaves)
b. Bottom up parser: which build parse trees from leaves and work up the root
The parser or syntactic analyzer obtains a string of tokens from the lexical analyzer
and verifies that the string can be generated by the grammar for the source language. It
reports any syntax errors in the program. It also recovers from commonly occurring errors so
that it can continue processing its input.

Fig. 2.1.1 Position of parser in compiler

Functions of the parser:


1. It verifies the structure generated by the tokens based on the grammar.
2. It constructs the parse tree.
3. It reports the errors.
4. It performs error recovery.
Issues:
Parser cannot detect errors such as:
1. Variable re-declaration
2. Variable initialization before use
3. Data type mismatch for an operation
The above issues are handled by Semantic Analysis phase.

Syntax error handling:


Programs can contain errors at many different levels. For example:
1. Lexical, such as misspelling an identifier, keyword or operator
2. Syntactic, such as an arithmetic expression with unbalanced parentheses
3. Semantic, such as an operator applied to an incompatible operand
4. Logical, such as an infinitely recursive call

Functions of error handler:


1. It should report the presence of errors clearly and accurately
2. It should recover from each error quickly enough to be able to detect subsequent errors
3. It should not significantly slow down the processing of correct programs

Error recovery strategies:


The different strategies that a parse uses to recover from a syntactic error are:
1. Panic mode
2. Phrase level
3. Error productions
4. Global correction

Panic mode recovery:


On discovering an error, the parser discards input symbols one at a time until a
synchronizing token is found. The synchronizing tokens are usually delimiters, such as
semicolon or end. It has the advantage of simplicity and does not go into an infinite loop.
When multiple errors in the same statement are rare, this method is quite useful.

Phrase level recovery:


On discovering an error, the parser performs local correction on the remaining input
that allows it to continue. Example: Insert a missing semicolon or delete an extraneous
semicolon etc.

Error productions:
The parser is constructed using augmented grammar with error productions. If an error
production is used by the parser, appropriate error diagnostics can be generated to indicate
the erroneous constructs recognized by the input.

Global correction:
Given an incorrect input string x and grammar G, certain algorithms can be used to
find a parse tree for a string y, such that the number of insertions, deletions and changes of
tokens is as small as possible. However, these methods are in general too costly in terms of
time and space.

2.2. CONTEXT-FREE GRAMMARS

A Context-Free Grammar is a quadruple that consists of terminals, non-terminals, a start
symbol and productions.
Inherently recursive structures of a programming language are defined by a context-free
grammar. A context-free grammar is a 4-tuple G = (V, T, P, S)
Here,
• V is a finite set of non-terminals (syntactic variables)
• T is a finite set of terminals (in our case, this will be the set of tokens)
• P is a finite set of production rules of the form
  A → α, where A is a non-terminal and α is a string of terminals and non-terminals
  (including the empty string)
• S is the start symbol (one of the non-terminal symbols)

Terminals: These are the basic symbols from which strings are formed.
Non-Terminals: These are the syntactic variables that denote a set of strings. These help to
define the language generated by the grammar.
Start Symbol: One non-terminal in the grammar is denoted as the “Start-symbol” and the
set of strings it denotes is the language defined by the grammar.
Productions: It specifies the manner in which terminals and non-terminals can be
combined to form strings. Each production consists of a non-terminal, followed by an arrow,
followed by a string of non-terminals and terminals.

L(G) is the language of G (the language generated by G), which is a set of sentences. A
sentence of L(G) is a string of terminal symbols of G. If S is the start symbol of G, then ω is a
sentence of L(G) iff S ⇒* ω, where ω is a string of terminals of G. If G is a context-free

grammar, L(G) is a context-free language. Two grammars G1 and G2 are equivalent if they
produce the same language.
Consider a derivation S ⇒* α. If α contains non-terminals, it is called a sentential form of G.
If α does not contain non-terminals, it is called a sentence of G.

Example of context-free grammar:


The following grammar defines simple arithmetic expressions:
expr → expr op expr
expr → (expr)
expr → - expr
expr → id
op → +
op → -
op → *
op → /
op → ↑
In this grammar,
id + - * / ↑ ( ) are terminals.
expr , op are non-terminals.
expr is the start symbol.
Each line is a production.
Derivations:
In general, a derivation step is αAβ ⇒ αγβ, where αAβ is a sentential form, if there is a
production rule A → γ in our grammar; here α and β are arbitrary strings of terminal and
non-terminal symbols. A sequence α1 ⇒ α2 ⇒ ... ⇒ αn means that α1 derives αn. There are
two kinds of derivation:
1. At each derivation step, we can choose any of the non-terminals in the sentential form
of G for the replacement
2. If we always choose the left-most non-terminal in each derivation step, this derivation
is called a left-most derivation
Two basic requirements for a grammar are:
1. To generate a valid string
2. To recognize a valid string


Derivation is a process that generates a valid string with the help of grammar by replacing the
non-terminals on the left with the string on the right side of the production.

Example:
Consider the following grammar for arithmetic expressions:
E → E+E | E*E | (E) | -E | id
To generate the valid string -(id+id) from the grammar, the steps are:
1. E ⇒ -E
2. E ⇒ -(E)
3. E ⇒ -(E+E)
4. E ⇒ -(id+E)
5. E ⇒ -(id+id)
In the above derivation, E is the start symbol and -(id+id) is the required sentence (only
terminals). Strings such as E, -E, -(E), . . . are called sentential forms.

Types of derivations:
The two types of derivation are:
1. Left most derivation
2. Right most derivation.
• In leftmost derivations, the leftmost non-terminal in each sentinel is always chosen
first for replacement.
• In rightmost derivations, the rightmost non-terminal in each sentinel is always chosen
first for replacement.

Example:
Given grammar G : E → E+E | E*E | ( E ) | - E | id Sentence to be derived : - (id+id)
Left Most Derivation
E ⇒ -E
E ⇒ -(E)
E ⇒ -(E+E)
E ⇒ -(id+E)
E ⇒ -(id+id)

Right Most Derivation


E ⇒ -E
E ⇒ -(E)
E ⇒ -(E+E)
E ⇒ -(E+id)
E ⇒ -(id+id)
• Strings that appear in a leftmost derivation are called left sentential forms.
• Strings that appear in a rightmost derivation are called right sentential forms.
Sentential forms:
Given a grammar G with start symbol S, if S ⇒* α, where α may contain non-terminals
or terminals, then α is called a sentential form of G.
Yield or frontier of tree:
Each interior node of a parse tree is a non-terminal; the children of a node can be
terminals or non-terminals. The sentential form given by the leaves of the parse tree, read
from left to right, is called the yield or frontier of the tree.

PARSE TREE
• Inner nodes of a parse tree are non-terminal symbols
• The leaves of a parse tree are terminal symbols
• A parse tree can be seen as a graphical representation of a derivation

Fig 2.2.1. Sequence of parse trees for derivation


2.3. AMBIGUOUS GRAMMAR

A grammar that produces more than one parse tree for some sentence is said to be an
ambiguous grammar.
Example:
Given grammar G: E → E+E | E*E | (E) | -E | id
The sentence id+id*id has the following two distinct leftmost derivations:

E ⇒ E+E                         E ⇒ E*E
E ⇒ id+E                        E ⇒ E+E*E
E ⇒ id+E*E                      E ⇒ id+E*E
E ⇒ id+id*E                     E ⇒ id+id*E
E ⇒ id+id*id                    E ⇒ id+id*id

The two corresponding parse trees are:

Fig. 2.2.2 Two parse trees for id+ id* id

2.4. WRITING A GRAMMAR

A grammar consists of a number of productions. Each production has an abstract


symbol called a non terminal as its left-hand side, and a sequence of one or more non
terminal and terminal symbols as its right-hand side. For each grammar, the terminal
symbols are drawn from a specified alphabet.
Starting from a sentence consisting of a single distinguished non terminal, called
the goal symbol, a given context-free grammar specifies a language, namely, the set of
possible sequences of terminal symbols that can result from repeatedly replacing any non
terminal in the sequence with a right-hand side of a production for which the non terminal is
the left-hand side.


Regular Expression                                 Context Free Grammar
It is used to describe the tokens of               It consists of a quadruple (V, T, P, S), where
programming languages                              S is the start symbol, P the productions, T the
                                                   terminals and V the variables (non-terminals)
It is used to check whether the given input        It is used to check whether the given input
is valid or not using a transition diagram         is valid or not using derivation
The transition diagram has a set of states         The context-free grammar has a set of
and edges                                          productions
It has no start symbol                             It has a start symbol
It is useful for describing the structure of       It is useful in describing nested structures
lexical constructs such as identifiers,            such as balanced parentheses, matching
constants, keywords, and so forth                  begin-end's, and so on

There are four categories in writing a grammar:


1. Regular Expression Vs Context Free Grammar
2. Eliminating ambiguous grammar.
3. Eliminating left-recursion
4. Eliminating Left-factoring.


Each parsing method can handle grammars only of a certain form hence, the initial grammar
may have to be rewritten to make it parsable.
Reasons for using the regular expression to define the lexical syntax of a language
• The lexical rules of a language are simple and RE is used to describe them
• Regular expressions provide a more concise and easier to understand notation
for tokens than grammars
• Efficient lexical analyzers can be constructed automatically from RE than from
grammars
• Separating the syntactic structure of a language into lexical and non lexical parts
provides a convenient way of modularizing the front end into two manageable-sized
components

2.5. ELIMINATING AMBIGUITY:

Ambiguity of the grammar that produces more than one parse tree for leftmost or rightmost
derivation can be eliminated by re-writing the grammar.
To disambiguate the grammar E → E+E | E*E | E^E | id | (E), we can use precedence of
operators as follows:
^ (right to left)
/,* (left to right)
-,+ (left to right)
We get the following unambiguous grammar:
E → E+T | T
T → T*F | F
F → G^F | G
G → id | (E)
Consider this example, G: stmt → if expr then stmt | if expr then stmt else stmt | other
This grammar is ambiguous since the string if E1 then if E2 then S1 else S2 has
the following
Two parse trees for leftmost derivation:


Fig. 2.5.1. Two parse trees for an ambiguous sentence


To eliminate ambiguity, the following grammar may be used:
stmt → matched_stmt | unmatched_stmt
matched_stmt → if expr then matched_stmt else matched_stmt | other
unmatched_stmt → if expr then stmt | if expr then matched_stmt else unmatched_stmt
Eliminating Left Recursion:
A grammar is said to be left recursive if it has a non-terminal A such that there is a
derivation A ⇒ Aα for some string α. Top-down parsing methods cannot handle left-
recursive grammars. Hence, left recursion can be eliminated as follows:
If there is a production A → Aα | β it can be replaced with a sequence of two
productions
A → βA’
A’ → αA’ | ε
Without changing the set of strings derivable from A.

Example: Consider the following grammar for arithmetic expressions:


E → E+T | T
T → T*F | F
F → (E) | id
First eliminate the left recursion for E as
E → TE’
E’ → +TE’ | ε
Then eliminate for T as
T → FT’
T’→ *FT’ | ε
Thus the obtained grammar after eliminating left recursion is
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
Algorithm to eliminate left recursion:
1. Arrange the non-terminals in some order A1, A2, . . . , An.
2. for i := 1 to n do begin
       for j := 1 to i-1 do begin
           replace each production of the form Ai → Aj γ
           by the productions Ai → δ1γ | δ2γ | . . . | δkγ,
           where Aj → δ1 | δ2 | . . . | δk are all the current Aj-productions;
       end
       eliminate the immediate left recursion among the Ai-productions
   end
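As a worked illustration of the full algorithm (a standard textbook example, not worked in these notes), consider the grammar S → Aa | b, A → Ac | Sd | ε with the ordering S, A. S has no immediate left recursion. For i = 2, substituting the S-productions into A → Sd gives A → Ac | Aad | bd | ε, and eliminating the immediate left recursion then yields:
S → Aa | b
A → bdA' | A'
A' → cA' | adA' | ε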

Left factoring:
Left factoring is a grammar transformation that is useful for producing a grammar suitable for
predictive parsing. When it is not clear which of two alternative productions to use to expand
a non-terminal A, we can rewrite the A-productions to defer the decision until we have seen
enough of the input to make the right choice.
If there is any production A → αβ1 | αβ2 , it can be rewritten as
A → αA’

A’ → β1 | β2
Consider the grammar, G: S → iEtS | iEtSeS | a
E→b
Left factored, this grammar becomes
S → iEtSS’ | a
S’ → eS | ε
E→b

2.6. PARSING

It is the process of analyzing a continuous stream of input in order to determine its


grammatical structure with respect to a given formal grammar.
Parse tree:
Graphical representation of a derivation or deduction is called a parse tree. Each
interior node of the parse tree is a non-terminal; the children of the node can be terminals or
non-terminals.
Types of parsing:
1. Top down parsing
2. Bottom up parsing

• Top-down parsing: A parser can start with the start symbol and try to transform it to
the input string. Example: LL Parsers.
• Bottom-up parsing: A parser can start with input and attempt to rewrite it into the start
symbol. Example: LR Parsers.

2.7. CLASSIFICATION OF PARSING TECHNIQUES

Fig 2.7.1. Parsing Techniques


2.8. TOP-DOWN PARSING

It can be viewed as an attempt to find a left-most derivation for an input string or an


attempt to construct a parse tree for the input starting from the root to the leaves.
Typically, top-down parsers are implemented as a set of recursive functions that
descend through a parse tree for a string. This approach is known as recursive descent
parsing, also known as LL(k) parsing, where the first L stands for left-to-right, the second L
stands for leftmost-derivation, and k indicates k-symbol look-ahead.
Therefore, a parser using the single-symbol look-ahead method and top-down parsing
without backtracking is called an LL(1) parser. In the following sections, we will also use an
extended BNF notation in which some regular expression operators are incorporated.
This parsing method may involve backtracking.
Types of top-down parsing:
• TDP with Backtracking
• TDP without Backtracking
1. Recursive descent parsing
2. Predictive parsing

2.9. TDP WITH BACKTRACKING

• This parsing method may involve backtracking, that is, making repeated scans of
the input

Example for Back Tracking:


Consider the grammar G:
S → cAd
A→ab|a
and the input string w=cad.
The parse tree can be constructed using the following top-down approach:
Step1:
Initially create a tree with single node labeled S. An input pointer points to ‘c’, the first
symbol of w. Expand the tree with the production of S.


Step2:
The leftmost leaf ‘c’ matches the first symbol of w, so advance the input pointer to the second
symbol of w ‘a’ and consider the next leaf ‘A’. Expand A using the first alternative.

Step3:
The second symbol ‘a’ of w also matches with second leaf of tree. So advance the input
pointer to third symbol of w ‘d’. But the third leaf of tree is b which does not match with the
input symbol d. Hence discard the chosen production and reset the pointer to second position.
This is called backtracking.
Step4:
Now try the second alternative for A.

Now we can halt and announce the successful completion of parsing.


TDP without Backtracking:
2.10. RECURSIVE DESCENT PARSING
• Recursive descent parsing is one of the top-down parsing techniques that uses a set
of recursive procedures to scan its input

Example for recursive decent parsing:


A left-recursive grammar can cause a recursive-descent parser to go into an infinite loop.
Hence, elimination of left-recursion must be done before parsing.
Consider the grammar for arithmetic expressions


E → E+T | T
T → T*F | F
F → (E) | id
After eliminating the left-recursion the grammar becomes,
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
Now we can write the procedure for grammar as follows:
Recursive procedures:

procedure E( )
begin
    T( );
    EPRIME( );
end

procedure EPRIME( )
begin
    if input_symbol = '+' then
    begin
        ADVANCE( );
        T( );
        EPRIME( );
    end
end

procedure T( )
begin
    F( );
    TPRIME( );
end

procedure TPRIME( )
begin
    if input_symbol = '*' then
    begin
        ADVANCE( );
        F( );
        TPRIME( );
    end
end

procedure F( )
begin
    if input_symbol = 'id' then
        ADVANCE( )
    else if input_symbol = '(' then
    begin
        ADVANCE( );
        E( );
        if input_symbol = ')' then
            ADVANCE( )
        else
            ERROR( );
    end
    else
        ERROR( );
end
Stack implementation:

PROCEDURE        REMAINING INPUT
E( )             id+id*id
T( )             id+id*id
F( )             id+id*id
ADVANCE( )       +id*id
TPRIME( )        +id*id
EPRIME( )        +id*id
ADVANCE( )       id*id
T( )             id*id
F( )             id*id
ADVANCE( )       *id
TPRIME( )        *id
ADVANCE( )       id
F( )             id
ADVANCE( )       (empty)
TPRIME( )        (empty)
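The same parser can also be written directly in C. The following is a minimal runnable sketch (an illustration, not part of the original notes); single letters play the role of id, and the test string in main is arbitrary.

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

static const char *ip;                    /* input pointer */

static void error(void) { printf("syntax error\n"); exit(1); }
static void advance(void) { ip++; }
static void E(void);                      /* forward declaration */

static void F(void) {
    if (isalpha((unsigned char)*ip))      /* F -> id (a letter here) */
        advance();
    else if (*ip == '(') {                /* F -> ( E )              */
        advance(); E();
        if (*ip == ')') advance(); else error();
    } else error();
}
static void Tprime(void) {                /* T' -> *FT' | epsilon    */
    if (*ip == '*') { advance(); F(); Tprime(); }
}
static void T(void) { F(); Tprime(); }    /* T -> FT'                */
static void Eprime(void) {                /* E' -> +TE' | epsilon    */
    if (*ip == '+') { advance(); T(); Eprime(); }
}
static void E(void) { T(); Eprime(); }    /* E -> TE'                */

int main(void) {
    ip = "a+b*c";
    E();
    if (*ip == '\0') printf("parsing successful\n"); else error();
    return 0;
}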

2.11. NON-RECURSIVE PREDICTIVE PARSER

Predictive Parsing

• Predictive parsing is a special case of recursive descent parsing where no


backtracking is required
• The key problem of predictive parsing is to determine the production to be applied for
a non-terminal in case of alternatives.
Non Recursive Descent Parsing
It is possible to build a non recursive predictive parser by maintaining a stack
explicitly, rather than implicitly via recursive calls. The key problem during predictive
parsing is that of determining the production to be applied for a non-terminal. The non-
recursive parser in the figure below looks up the production to be applied in the parsing
table. In what follows, we shall see how the table can be constructed directly from certain
grammars.

Fig. 2.11.1. Model of a non recursive predictive parser


The table-driven predictive parser has an input buffer, stack, a parsing table and an output
stream.
Input buffer:
It consists of strings to be parsed, followed by $ to indicate the end of the input string.
Stack:
It contains a sequence of grammar symbols preceded by $ to indicate the bottom of the stack.
Initially, the stack contains the start symbol on top of $.
Parsing table:
It is a two-dimensional array M[A, a], where ‘A’ is a non-terminal and ‘a’ is a terminal.
Predictive parsing program:


The parser is controlled by a program that considers X, the symbol on top of stack, and a, the
current input symbol. These two symbols determine the parser action. There are three
possibilities:
1. If X = a = $, the parser halts and announces successful completion of parsing.
2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next
input symbol.
3. If X is a non-terminal , the program consults entry M[X, a] of the parsing table M.
This entry will either be an X-production of the grammar or an error entry.
If M[X, a] = {X → UVW},the parser replaces X on top of the stack by UVW
If M[X, a] = error, the parser calls an error recovery routine.

Algorithm for nonrecursive predictive parsing:


Input: A string w and a parsing table M for grammar G.
Output: If w is in L(G), a leftmost derivation of w; otherwise, an error indication.
Method: Initially, the parser has $S on the stack with S, the start symbol of G on top, and w$
in the input buffer. The program that utilizes the predictive parsing table M to produce a
parse for the input is as follows:
Set ip to point to the first symbol of w$;
repeat
    let X be the top stack symbol and a the symbol pointed to by ip;
    if X is a terminal or $ then
        if X = a then
            pop X from the stack and advance ip
        else error()
    else /* X is a non-terminal */
        if M[X, a] = X → Y1Y2 … Yk then begin
            pop X from the stack;
            push Yk, Yk-1, … , Y1 onto the stack, with Y1 on top;
            output the production X → Y1Y2 … Yk
        end
        else error()
until X = $
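A compact C sketch of this driver (an illustration under simplifying assumptions, not from the original notes) for the expression grammar used throughout this unit: the character i stands for the token id, the lowercase letters e and t stand for the non-terminals E' and T', and the function M plays the role of the parsing table (an empty string means an ε-production, NULL an error entry).

#include <stdio.h>
#include <string.h>

/* Parsing table for E -> TE', E' -> +TE' | eps, T -> FT',
   T' -> *FT' | eps, F -> (E) | i                              */
static const char *M(char X, char a) {
    switch (X) {
    case 'E': return (a == 'i' || a == '(') ? "Te" : NULL;
    case 'e': return a == '+' ? "+Te" : (a == ')' || a == '$') ? "" : NULL;
    case 'T': return (a == 'i' || a == '(') ? "Ft" : NULL;
    case 't': return a == '*' ? "*Ft"
                              : (a == '+' || a == ')' || a == '$') ? "" : NULL;
    case 'F': return a == 'i' ? "i" : a == '(' ? "(E)" : NULL;
    }
    return NULL;
}

int main(void) {
    const char *ip = "i+i*i$";
    char stack[64] = "$E";                 /* $ at bottom, start symbol on top */
    int top = 1;
    for (;;) {
        char X = stack[top], a = *ip;
        if (X == '$' && a == '$') { puts("accept"); return 0; }
        if (X == a) { top--; ip++; continue; }              /* match terminal */
        const char *rhs =
            (X=='E' || X=='e' || X=='T' || X=='t' || X=='F') ? M(X, a) : NULL;
        if (rhs == NULL) { puts("error"); return 1; }
        top--;                                              /* pop X          */
        for (int k = (int)strlen(rhs) - 1; k >= 0; k--)     /* push reversed  */
            stack[++top] = rhs[k];
    }
}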


2.12. FIRST( ) & FOLLOW( ):


Predictive parsing table construction:
The construction of a predictive parser is aided by two functions associated with a grammar
G:
1. FIRST
2. FOLLOW

Rules for First( ):


1. If X is terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is non-terminal and X → aα is a production then add a to FIRST(X).
4. If X is non-terminal and X → Y1 Y2…Yk is a production, then place a in FIRST(X)
if for some i, a is in FIRST(Yi), and ε is in all of FIRST(Y1),…,FIRST(Yi-1);that is,
Y1,….Yi-1=> ε. If ε is in FIRST (Yj) for all j=1,2,..,k, then add ε to FIRST(X).

Rules for Follow( ):

1. If S is the start symbol, then FOLLOW(S) contains $
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in
FOLLOW(B)
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains
ε, then everything in FOLLOW(A) is in FOLLOW(B)

Algorithm for construction of predictive parsing table:


Input: Grammar G
Output: Parsing table M
Method:
1. For each production A → α of the grammar, do steps 2 and 3
2. For each terminal a in FIRST(α), add A → α to M[A, a]
3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is
in FIRST(α) and $ is in FOLLOW(A) , add A → α to M[A, $]
4. Make each undefined entry of M be error


Example:
Consider the following grammar:
E→E+T|T
T→T*F|F
F→(E)|id
After eliminating left-recursion the grammar is
E →TE’
E’ → +TE’ | ε
T →FT’
T’ → *FT’ | ε
F → (E)|id
First( ):
FIRST(E) = { ( , id}
FIRST(E’) ={+ , ε }
FIRST(T) = { ( , id}
FIRST(T’) = {*, ε }
FIRST(F) = { ( , id }
Follow( ):
FOLLOW(E) = { $, ) }
FOLLOW(E’) = { $, ) }
FOLLOW(T) = { +, $, ) }
FOLLOW(T’) = { +, $, ) }
FOLLOW(F) = {+, * , $ , ) }

Predictive parsing Table
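(The table itself is an image in the source; its non-blank entries, reconstructed from the FIRST and FOLLOW sets above, are shown below. All remaining entries are error entries.)

        id       +          *          (        )        $
E       E→TE'                          E→TE'
E'               E'→+TE'                        E'→ε     E'→ε
T       T→FT'                          T→FT'
T'               T'→ε       T'→*FT'             T'→ε     T'→ε
F       F→id                           F→(E)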


Stack Implementation
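(The trace table is an image in the source; reconstructed for the input id+id*id$, the moves are:)

STACK        INPUT         OUTPUT
$E           id+id*id$     E → TE'
$E'T         id+id*id$     T → FT'
$E'T'F       id+id*id$     F → id
$E'T'id      id+id*id$     match id
$E'T'        +id*id$       T' → ε
$E'          +id*id$       E' → +TE'
$E'T+        +id*id$       match +
$E'T         id*id$        T → FT'
$E'T'F       id*id$        F → id
$E'T'id      id*id$        match id
$E'T'        *id$          T' → *FT'
$E'T'F*      *id$          match *
$E'T'F       id$           F → id
$E'T'id      id$           match id
$E'T'        $             T' → ε
$E'          $             E' → ε
$            $             accept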

2.13. LL(1) GRAMMAR:

The parsing table entries are single entries; each location has no more than one entry. A
grammar whose table has this property is called an LL(1) grammar.
Consider the following grammar:
S → iEtS | iEtSeS | a
E → b
After left factoring, we have
S → iEtSS' | a
S' → eS | ε
E → b
To construct a parsing table, we need FIRST() and FOLLOW() for all the non-terminals.
FIRST(S) = { i, a }


FIRST(S’) = {e, ε }
FIRST(E) = { b}
FOLLOW(S) = { $ ,e }
FOLLOW(S’) = { $ ,e }
FOLLOW(E) = {t}
Parsing table:
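(The table is an image in the source; reconstructed from the FIRST and FOLLOW sets above, its non-blank entries are:)

        a       b       e                  i           t       $
S       S→a                                S→iEtSS'
S'                      S'→eS, S'→ε                            S'→ε
E               E→b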

Since the entry M[S', e] contains more than one production, the grammar is not an LL(1) grammar.
Actions performed in predictive parsing:
1. Shift
2. Reduce
3. Accept
4. Error
Implementation of predictive parser:
1. Elimination of left recursion, left factoring and ambiguous grammar
2. Construct FIRST() and FOLLOW() for all non-terminals
3. Construct predictive parsing table
4. Parse the given input string using stack and parsing table

2.14. ERROR RECOVERY IN PREDICTIVE PARSING

• An error is detected during predictive parsing when the terminal on top of the
stack does not match the next input symbol, or when nonterminal A is on top of the
stack, a is the next input symbol, and the parsing table entry M[A, a] is empty
Panic-mode error recovery
• Panic-mode error recovery is based on the idea of skipping symbols on the input until
a token in a selected set of synchronizing tokens appears


How to select synchronizing set?


• Place all symbols in FOLLOW(A) into the synchronizing set for nonterminal A. If we
skip tokens until an element of FOLLOW(A) is seen and pop A from the stack, it is
likely that parsing can continue.
• We might add keywords that begin statements to the synchronizing sets for the
nonterminals generating expressions.
• If a nonterminal can generate the empty string, then the production deriving ε can be
used as a default. This may postpone some error detection, but cannot cause an error
to be missed. This approach reduces the number of nonterminals that have to be
considered during error recovery.
• If a terminal on top of stack cannot be matched, a simple idea is to pop the terminal,
issue a message saying that the terminal was inserted.

Example: error recovery


“synch” indicating synchronizing tokens obtained from FOLLOW set of the nonterminal in
question. If the parser looks up entry M[A,a] and finds that it is blank, the input symbol a is
skipped. If the entry is synch, the nonterminal on top of the stack is popped. If a token on top
of the stack does not match the input symbol, then we pop the token from the stack.
First( ):
FIRST(E) = { ( , id}
FIRST(E’) ={+ , ε }
FIRST(T) = { ( , id}
FIRST(T’) = {*, ε }
FIRST(F) = { ( , id }
Follow( ):
FOLLOW(E) = { $, ) }
FOLLOW(E’) = { $, ) }
FOLLOW(T) = { +, $, ) }
FOLLOW(T’) = { +, $, ) }
FOLLOW(F) = {+, * , $ , ) }


Fig. 2.14.1. Synchronizing tokens added to the parsing table


On the erroneous input ) id * + id, the parser and error recovery mechanism behave as follows.

Fig. 2.14.2. Parsing and error recovery moves made by a predictive parser

The above discussion of panic-mode recovery does not address the important issue of error
messages. The compiler designer must supply informative error messages that not only
describe the error but also draw attention to where the error was discovered.
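A sketch of how the nonterminal case of the parsing loop changes under panic-mode recovery is given below. SYNCH[A] is assumed to be prepopulated from FOLLOW(A) as described above; the helper's name and calling convention are invented for illustration:

def expand_nonterminal(top, a, stack, M, SYNCH):
    """Handle nonterminal top against input symbol a.
    Returns True if a should be skipped (consumed without a match)."""
    rhs = M.get((top, a))
    if rhs is not None:                 # normal entry: expand as usual
        for sym in reversed(rhs):
            if sym != 'eps':
                stack.append(sym)
        return False
    if a in SYNCH.get(top, set()):      # synch entry: pop the nonterminal
        print(f"error: M[{top},{a}] = synch, popping {top}")
        return False                    # top stays popped; a is retried
    print(f"error: M[{top},{a}] blank, skipping {a}")
    stack.append(top)                   # keep trying the same nonterminal
    return True                         # and skip the offending token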
Phrase - level Recovery
Phrase-level error recovery is implemented by filling in the blank entries in the predictive
parsing table with pointers to error routines. These routines may change, insert, or delete
symbols on the input and issue appropriate error messages. They may also pop from the
stack. Alteration of stack symbols or the pushing of new symbols onto the stack is


questionable for several reasons. First, the steps carried out by the parser might then not
correspond to the derivation of any word in the language at all. Second, we must ensure that
there is no possibility of an infinite loop. Checking that any recovery action eventually results
in an input symbol being consumed (or the stack being shortened if the end of the input has
been reached) is a good way to protect against such loops.

SOLVED PROBLEMS
Problem 1
Table-based LL(1) Predictive Top-Down Parsing. Consider the following CFG G =
(N={S, A, B, C, D}, T={a,b,c,d}, P, S) where the set of productions P is given below:
S→A
A → BC | DBC
B → Bb | ε
C→c|ε
D→a|d

i. Is this grammar suitable to be parsed using the recursive descent parsing
method? Justify and modify the grammar if needed.
ii. Compute the FIRST and FOLLOW set of non-terminal symbols of the grammar
resulting from your answer in a)
iii. Show the stack contents, the input and the rules used during parsing for the input
w = dbb
iv. Construct the corresponding parsing table using the predictive parsing LL method.

Answers:
i. No because it is left-recursive. You can expand B using a production with B as the left-
most symbol without consuming any of the input terminal symbols. To eliminate this left
recursion we add another non-terminal symbol, B’ and productions as follows:
S→A
A → BC | DBC
B → bB’ | ε
B’ → bB’ | ε
C→c|ε
D→a|d


ii. FIRST(S) = { a, b, c, d, ε }
FOLLOW(S) = { $ }
FIRST(A) = { a, b, c, d, ε }
FOLLOW(A) = { $ }
FIRST(B) = { b, ε }
FOLLOW(B) = { c, $ }
FIRST(B’) = { b, ε }
FOLLOW(B’) ={ c, $ }
FIRST(C) = { c, ε }
FOLLOW(C) = { $ }
FIRST(D) = { a, d }
FOLLOW(D) = { b, c, $ }
Non-terminals A, B, B’, C and S are all nullable.
iii. The stack and input are as shown below using the predictive, table-driven parsing
algorithm:

iv. The parsing table is as shown below:


Problem 2
Eliminate left recursion and perform left factoring for the following grammar. S → ( )
S→a
S→(A)
A→S
A→A,S
Answer
S → ( S’ | a
S’ → ) | A )
A → SA’
A’ → ,SA’
A’ → ε

Problem 3
Eliminate immediate left recursion for the following grammar
E → E+T | T
T → T*F | F
F → (E) | id
Answer
The rule to eliminate immediate left recursion: A → Aα | β can be converted into A → βA’ and
A’ → αA’ | ε. So, the grammar after eliminating left recursion is
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
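This transformation is mechanical enough to code directly. A small Python sketch (productions represented as lists of symbols; the encoding is illustrative):

EPS = 'eps'

def eliminate_immediate_left_recursion(A, prods):
    """Apply A -> A alpha | beta  =>  A -> beta A', A' -> alpha A' | eps."""
    alphas = [p[1:] for p in prods if p and p[0] == A]   # left-recursive tails
    betas  = [p for p in prods if not p or p[0] != A]    # non-recursive sides
    if not alphas:
        return {A: prods}                                # nothing to do
    A1 = A + "'"                                         # fresh non-terminal
    return {
        A:  [beta + [A1] for beta in betas],             # A  -> beta A'
        A1: [t + [A1] for t in alphas] + [[EPS]],        # A' -> alpha A' | eps
    }

# E -> E+T | T  becomes  E -> T E',  E' -> + T E' | eps
print(eliminate_immediate_left_recursion('E', [['E', '+', 'T'], ['T']]))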


REVIEW QUESTIONS (LEVELS I, II, III)


(Each question is tagged with the CO addressed and the Blooms level.)
1. What is the role of the parser? (CO 1, Blooms 1)
2. Define context-free grammars. (CO 1, Blooms 1)
3. What is left recursion? (CO 2, Blooms 1)
4. What is left factoring? (CO 2, Blooms 1)
5. What is a parse tree? (CO 2, Blooms 1)
6. What do you mean by an ambiguous grammar? (CO 1, Blooms 1)
7. List the ways to eliminate ambiguity. (CO 1, Blooms 2)
8. List the different top-down parsing techniques. (CO 2, Blooms 1)

MULTIPLE CHOICE QUESTIONS


1. A Top down parser generates ( )
a. Right most derivation b. Right most derivation in reverse
c. Left most derivation d. Left most derivation in reverse
2. __________________ is not a top-down parser ( )
a. Operator precedence parser b. Predictive parser
c. Non Recursive descent parser d. Recursive descent parser
3. _________ is a process of finding a parse tree for a string of tokens. ( )
a. Parsing b. Analysing c. Recognizing d. Tokenizing
4. Abbreviate LL (1) ___________________________________________________.
5. Abbreviate CFG ___________________________________________________.
6. A Grammar having more than one Parse Tree is said to be_________________ ( )
a. Ambiguous Tree b. Ambiguous Grammar
c. Regular Grammar d. Restricted grammar
7. Syntax analysis is also called_____________. ( )
a. Parser b. Scanner c. Hierarchical Analysis d. Both a & c
8. A grammar G is said to be ambiguous if it has more than one parse tree (left or
right derivation) for at least one string.
9. For predictive parsing the grammar A → AA | (A) | ε is not suitable because ( )
a. The grammar is right recursive b. The grammar is left recursive
c. The grammar is ambiguous d. The grammar is an operator grammar


10. __________________ is a top-down parser ( )


a. Operator precedence parser b. An LALR (k) parser
c. An LR (k) parser d. Recursive descent parser
11. In the grammar S → AB, B → ab, A → aa, A → a, B → b, what is FOLLOW (B) = _____.
12. In the grammar S → AB, B → ab, A → aa, A → a, B → b, what is FIRST (B) = _____.
13. A grammar is said to be left recursive if it has a non-terminal A such that there is a
derivation A=>Aα for some string α.
14. Parsing is also known as? ( )
a. Lexical analysis b. Syntax analysis
c. Semantic analysis d. Code generation
15. Which of the following derivations does a top-down parser use while parsing an input
string? ( )
a) Leftmost derivation
b) Leftmost derivation in reverse
c) Rightmost derivation
d) Rightmost derivation in reverse
16. Which of the following statements is false? ( )
a) Left as well as right most derivations can be in Unambiguous grammar
b) An LL (1) parser is a top-down parser
c) LALR is more powerful than SLR
d) Ambiguous grammar can’t be LR (k)
17. Which of the following statements is false? ( )
a) Unambiguous grammar has both kind of derivations
b) An LL(1) parser is a top-down parser
c) LALR is more powerful than SLR
d) Ambiguous grammar can’t be LR(k)
18. S → C C
C→cC|d
The grammar is ( )
a) LL(1)
b) SLR(1) but not LL(1)
c) LALR(1) but not SLR(1)
d) LR(1) but not LALR(1)


19. In the given grammar S → C C


C→cC|d
what is FIRST(S) = ( )
a. {c,d} b. {$} c. {c,d,$} d. {C,d}
20. In the given grammar S → C C
C→cC|d
what is FIRST(C) = ( )
a. {c,d} b. {$} c. {c,d,$} d. {C,d}
21. In the given grammar S → C C
C→cC|d
what is FOLLOW(S) = ( )
a. {c,d} b. {$} c. {c,d,$} d. {C,d}
22. In the given grammar S → C C
C→cC|d
what is FOLLOW(C) = ( )
a. {c,d} b. {$} c. {c,d,$} d. {C,d}
23. _____________methods are used to eliminate ambiguity in the grammar ( )
a. Elimination of Left Recursion b. Elimination of Left Factoring
c. ALL d. Operator Precedence

SHORT QUESTIONS
SYNTAX ANALYSIS & TOP DOWN PARSING
(Each question is tagged with the CO addressed, the Blooms level, and the marks.)
1. What is the role of the parser? (CO 1, Blooms 1, 2 Marks)
2. Define context-free grammars. (CO 1, Blooms 1, 2 Marks)
3. What is left recursion? (CO 2, Blooms 1, 2 Marks)
4. Define backtracking. (CO 2, Blooms 1, 2 Marks)
5. Explain the procedure of eliminating left recursion from a grammar. (CO 2, Blooms 2, 2 Marks)
6. Eliminate the left recursion from the given grammar: A → Aa | Aad | bd. (CO 2, Blooms 2, 2 Marks)
7. What is left factoring? (CO 2, Blooms 1, 2 Marks)
8. What is a parse tree? (CO 2, Blooms 1, 2 Marks)
9. Construct a parse tree for the string a*b+a using the given grammar: E → E+E | E-E | E*E | E/E | a | b. (CO 2, Blooms 2, 2 Marks)
10. What do you mean by an ambiguous grammar? (CO 1, Blooms 1, 2 Marks)
11. List the ways to eliminate ambiguity. (CO 1, Blooms 2, 2 Marks)
12. List the different top-down parsing techniques. (CO 2, Blooms 1, 2 Marks)
13. What is an LL(1) grammar? (CO 2, Blooms 1, 2 Marks)
14. Define FIRST( ) and FOLLOW( ) in a grammar. (CO 2, Blooms 1, 2 Marks)
15. Write the rules of FIRST( ). (CO 1, Blooms 2, 2 Marks)
16. Write the rules of FOLLOW( ). (CO 2, Blooms 2, 2 Marks)
17. What do you mean by rightmost derivation? (CO 1, Blooms 1, 2 Marks)
18. Illustrate the role of the stack in predictive parsing. (CO 1)
19. What is panic-mode error recovery in predictive parsing? (CO 2, Blooms 2, 2 Marks)
20. Consider the context-free grammar S → SS+ | SS* | a and the string aa+a*. Give a leftmost derivation for the string. (CO 2, Blooms 2, 2 Marks)
21. Consider the context-free grammar S → SS+ | SS* | a and the string aa+a*. Give a rightmost derivation for the string. (CO 2, Blooms 2, 2 Marks)
22. Consider the context-free grammar S → SS+ | SS* | a and the string aa+a*. Give a parse tree for the string. (CO 2, Blooms 2, 2 Marks)
23. Consider the context-free grammar S → SS+ | SS* | a and the string aa+a*. Is the grammar ambiguous or unambiguous? Justify your answer. (CO 2, Blooms 2, 2 Marks)
24. Design a grammar for the following language: the set of all strings of 0s and 1s such that every 0 is immediately followed by at least one 1. (CO 2, Blooms 2, 2 Marks)
25. Design a grammar for the following language: the set of all strings of 0s and 1s that are palindromes. (CO 2, Blooms 2, 2 Marks)
26. Design a grammar for the following language: the set of all strings of 0s and 1s with an equal number of 0s and 1s. (CO 2, Blooms 2, 2 Marks)
27. Design a grammar for the following language: the set of all strings of 0s and 1s with an unequal number of 0s and 1s. (CO 2, Blooms 2, 2 Marks)
28. Write the tuples of a context-free grammar. (CO 1, Blooms 1, 2 Marks)
29. What is the brute-force method? (CO 2, Blooms 1, 2 Marks)
30. What are the error recovery techniques in predictive parsing? (CO 2, Blooms 1, 2 Marks)
31. Write the algorithm steps for FIRST. (CO 2, Blooms 2, 2 Marks)
32. Write the algorithm steps for FOLLOW. (CO 2, Blooms 2, 2 Marks)
33. Consider the grammar S → aAcd; A → e | b and derive the string abcd using backtracking. (CO 2, Blooms 2, 2 Marks)
34. Define parse tree with an example. (CO 1, Blooms 2, 2 Marks)
35. Write short notes on the YACC tool. (CO 2, Blooms 1, 2 Marks)

SHORT QUESTIONS WITH ANSWERS

1. What is the output of syntax analysis phase? What are the three general types of
parsers for grammars?

A parse tree is the output of the syntax analysis phase. General types of parsers:
1) Universal parsing
2) Top-down
3) Bottom-up


2. What are the different strategies that a parser can employ to recover from a
syntactic error?

• Panic mode
• Phrase level
• Error productions
• Global correction

3. What are the goals of error handler in a parser?

The error handler in a parser has simple-to-state goals:


• It should report the presence of errors clearly and accurately
• It should recover from each error quickly enough to be able to detect subsequent
errors
• It should not significantly slow down the processing of correct programs

4. What is phrase level error recovery?

On discovering an error, a parser may perform local correction on the remaining input;
that is, it may replace a prefix of the remaining input by some string that allows the
parser to continue. This is known as phrase level error recovery.

5. How will you define a context free grammar?

A context free grammar consists of terminals, non-terminals, a start symbol, and


productions.
a. Terminals are the basic symbols from which strings are formed. “Token” is a
synonym for terminal. Example: if, then, else
b. Nonterminals are syntactic variables that denote sets of strings, which help define the
language generated by the grammar. Example: stmt, expr
c. Start symbol is one of the nonterminals in a grammar and the set of strings it denotes
is the language defined by the grammar. Example: S
d. The productions of a grammar specify the manner in which the terminals and
nonterminals can be combined to form strings. Example: expr

6. Define context free language. When will you say that two CFGs are equal?

• A language that can be generated by a grammar is said to be a context free language.


• If two grammars generate the same language, the grammars are said to be equivalent.


7. Differentiate sentence and sentential form.

Sentence:
• If S ⇒* w, where w is a string of terminals, then w is called a sentence of G.
• A sentence is a sentential form with no nonterminals.
Sentential form:
• If S ⇒* α, then α is called a sentential form of G.
• A sentential form may contain nonterminals.

8. Give the definition for leftmost and canonical derivations.

• Derivations in which only the leftmost nonterminal in any sentential form is replaced
at each step are termed leftmost derivations
• Derivations in which the rightmost nonterminal is replaced at each step are termed
canonical derivations.

9. What is a parse tree?

A parse tree may be viewed as a graphical representation for a derivation that filters out
the choice regarding replacement order. Each interior node of a parse tree is labeled by
some nonterminal A and that the children of the node are labeled from left to right by
symbols in the right side of the production by which this A was replaced in the
derivation. The leaves of the parse tree are terminal symbols.

10. What is an ambiguous grammar? Give an example.

• A grammar that produces more than one parse tree for some sentence is said to be
ambiguous
• An ambiguous grammar is one that produces more than one leftmost or rightmost
derivation for the same sentence
Example: E → E+E | E*E | id

11. Why do we use regular expressions to define the lexical syntax of a language?

i. The lexical rules of a language are frequently quite simple, and to describe them we
do not need a notation as powerful as grammars
ii. Regular expressions generally provide a more concise and easier to understand
notation for tokens than grammars
iii. More efficient lexical analyzers can be constructed automatically from regular
expressions than from arbitrary grammars


iv. Separating the syntactic structure of a language into lexical and non lexical parts
provides a convenient way of modularizing the front end of a compiler into two
manageable-sized components

12. When will you call a grammar as the left recursive one?

A grammar is left recursive if it has a nonterminal A such that there is a derivation
A ⇒+ Aα for some string α.

13. Define left factoring.

Left factoring is a grammar transformation that is useful for producing a grammar
suitable for predictive parsing. The basic idea is that when it is not clear which of two
alternative productions to use to expand a nonterminal “A”, we may be able to rewrite the
“A” productions to defer the decision until we have seen enough of the input to make the
right choice.

14. Left factor the following grammar:


S → iEtS | iEtSeS |a
E→b
Ans: The left factored grammar is,
S → iEtSS′ | a
S′ → eS | ε
E→b

15. What is parsing?

Parsing is the process of determining if a string of tokens can be generated by a grammar.

16. What is Top down parsing?

Top-down construction of a parse tree starts with the root, labeled with the starting
nonterminal, and repeatedly performs the following steps:
i. At node n, labeled with non terminal “A”, select one of the productions for “A” and
construct children at n for the symbols on the right side of the production
ii. Find the next node at which a sub tree is to be constructed

17. What do you mean by Recursive Descent Parsing?

Recursive Descent Parsing is top down method of syntax analysis in which we execute a
set of recursive procedures to process the input. A procedure is associated with each
nonterminal of a grammar.


18. What is meant by Predictive parsing?

A special form of Recursive Descent parsing, in which the look-ahead symbol


unambiguously determines the procedure selected for each nonterminal, where no
backtracking is required.
19. What is an ambiguous grammar? Give an example.
A grammar G is said to be ambiguous if it generates more than one parse tree for some
sentence of the language L(G).
Example: E → E+E | E*E | id

LONG QUESTIONS
(Each question is tagged with the CO addressed, the Blooms level, and the marks.)
1. Construct LL(1) for the given grammar: S → ABC; A → aA | C; B → b; C → c. (CO 2, Blooms 3, 10 Marks)
2. Consider the given grammar E → E+E | E-E | E*E | E/E | a | b. Obtain the leftmost and rightmost derivations for the string a+b*a+b. (CO 1, Blooms 3, 5 Marks)
3. Define ambiguous grammar. Test whether the following grammar is ambiguous or not: E → E+E | E-E | E*E | E/E | E↑E | (E) | -E | id. (CO 1, Blooms 3, 5 Marks)
4. Write a non-recursive descent parser for the grammar: bexpr → bexpr or bterm | bterm; bterm → bterm and bfactor | bfactor; bfactor → not bfactor | (bexpr) | true | false, where or, and, not, (, ), true, false are terminals of the grammar. (CO 2, Blooms 4, 10 Marks)
5. Check whether the following grammar is an LL(1) grammar or not: S → iEtS | iEtSeS | a; E → b. (CO 2, Blooms 4, 10 Marks)
6. Explain the FIRST( ) and FOLLOW( ) techniques with a suitable example. (CO 2, Blooms 2, 10 Marks)
7. Construct the predictive parser for the following grammar and show whether the grammar is LL(1) or not: S → (L) | a; L → L,S | S. (CO 2, Blooms 3, 10 Marks)
8. Consider the grammar S → (L) | a; L → L,S | S. i) What are the terminals, non-terminals and start symbol? ii) Construct leftmost and rightmost derivations for the string (a, (a, a)). iii) Construct FIRST( ) and FOLLOW( ). (CO 2, Blooms 3, 10 Marks)
9. Eliminate ambiguity from the given grammar, construct FIRST( ) and FOLLOW( ), and check whether the grammar is LL(1) or not: E → E+T | T; T → T*F | F; F → (E) | id. (CO 2, Blooms 4, 10 Marks)
10. Construct a predictive parsing table for the grammar E → E+T | T; T → T*F | F; F → (E) | id and parse the string id+id*id. (CO 2, Blooms 4, 10 Marks)
11. Construct the non-recursive predictive parser for the following grammar and show whether the grammar is LL(1) or not: S → AA; A → aA | b. (CO 2, Blooms 4, 10 Marks)
12. Construct FIRST( ) and FOLLOW( ) and check whether the given grammar is LL(1) or not: E → TE’; E’ → +TE’ | ε; T → FT’; T’ → *FT’ | ε; F → (E) | id. (CO 2, Blooms 4, 10 Marks)
13. For the given grammar, construct FIRST( ) and FOLLOW( ) and check whether it is LL(1) or not: S → aA; A → Bb; B → c | ε. (CO 2, Blooms 4, 10 Marks)
14. Eliminate left recursion from the grammar, construct FIRST( ) and FOLLOW( ), and check whether the grammar is LL(1) or not: S → Aa | b; A → Ac | Sd | e. (CO 2, Blooms 4, 10 Marks)
15. Eliminate left recursion from the given grammar, construct FIRST( ) and FOLLOW( ), and check whether the grammar is LL(1) or not: E → E+T | T; T → T*F | F; F → (E) | id. (CO 2, Blooms 4, 10 Marks)
16. Apply left factoring to the grammar, construct FIRST( ) and FOLLOW( ), and check whether the grammar is LL(1) or not: S → iEtS | iEtSeS | a; E → b. (CO 2, Blooms 4, 10 Marks)
17. Explain the recursive descent parser with an example grammar. (CO 2, Blooms 2, 10 Marks)
18. Explain the techniques of error recovery in predictive parsing. (CO 2, Blooms 2, 10 Marks)
19. Given the grammar S → AaAb | BbBa; A → ε; B → ε: i. Compute the FIRST( ) and FOLLOW( ) functions. ii. Construct the predictive parsing table. iii. Parse the input string w = ab. (CO 2, Blooms 4, 10 Marks)
20. Classify the various parsing techniques and explain leftmost and rightmost derivations with an example. (CO 2, Blooms 3, 10 Marks)
21. Explain top-down and bottom-up parsing techniques with examples. (CO 2, Blooms 2, 10 Marks)
22. a. Explain backtracking. b. Consider the grammar S → aAcd; A → e | b and derive the string abcd using backtracking. (CO 2, Blooms 3, 10 Marks)
23. Construct FIRST( ) and FOLLOW( ) for the given grammar and check whether it is LL(1) or not: S → ADB | Dbb | Ba; A → da | BD; B → g | ε; D → h | ε. (CO 2, Blooms 4, 10 Marks)
24. Consider the grammar S → a | ↑ | (T); T → T,S | S. Write down the necessary algorithms and define FIRST and FOLLOW. Show the behavior of the parser on the sentences: i. (a,(a,a)) ii. (((a,a), ↑,(a),a). (CO 2, Blooms 3, 10 Marks)
25. Construct the LL(1) parse table for the following grammar: S → aB | aC | Sd | Se; B → bBC | f; C → g. (CO 2, Blooms 3, 10 Marks)

GATE/Competitive Exams Questions

1. Which one of the following grammars generates the language L = {a^i b^j | i ≠ j}?

a. S →AC | CB
C → aC | a | b

A → aA | ε

B →Bb | ε

b. S → aS | Sb | a | b
c. S →AC | CB
C → aCb

A → aA | ε

B → Bb | ε

d. S →AC | CB
C → aCb | ε

A → aA | a

B → Bb | b

SOLUTION
• Language L contains the strings : {abb, aab, abbb, aabbb, aaabb, aa, bb, .......}, i.e, all
a's appear before b's in a string, and "number of a's" is not equal to "number of b's",
So i ≠ j.
• Grammars (a), (b) and (c) also generate the string "ab", where i = j, and many more
strings with i = j; hence these grammars do not generate the language L, because for a
string that belongs to L, the exponent i must not equal the exponent j.
• Grammar d: This Grammar never generates a string with equal no of a's and b's, i.e.
i=j. Hence this grammar generates the language L.
• Hence (d) is correct option.


2. Which of the following statements is false?


a. An unambiguous grammar has same left most and right most derivation
b. An LL(1) parser is a top-down parser
c. LALR is more powerful than SLR
d. An ambiguous grammar can never be LR (K) for any k
SOLUTION
• An LL(1) parser is indeed a top-down parser, and in order of strength SLR < LALR,
so options (b) and (c) are true; an ambiguous grammar can never be LR(k), so (d) is
true as well. Option (a) is false: an unambiguous grammar has a unique leftmost and
a unique rightmost derivation for each sentence, but the two derivations are not the
same. Hence (a) is the correct option.

3. Which of the following suffices to convert an arbitrary CFG to an LL(1)


grammar?
a. Removing left recursion alone
b. Factoring the grammar alone
c. Removing left recursion and factoring the grammar
d. None of this
SOLUTION
• Removing left recursion and left factoring are both necessary before a grammar can
be LL(1), but even together they are not sufficient: the resulting grammar may still
be ambiguous or otherwise fail to be LL(1), so no listed transformation suffices for
an arbitrary CFG.
• Hence (d) is the correct option (this agrees with the answer to the same question
later in this section).

4. In a compiler, keywords of a language are recognized during


a. Parsing of the program
b. The code generation
c. The lexical analysis of the program
d. Data flow analysis
5. Which data structure in a compiler is used for managing information about
variables and their attributes
a. Abstract syntax tree b. Symbol table
c. Semantic stack d. Parse table
6. The number of tokens in the following C statement is:


printf("i = %d, &i = %x",i, &i);


a. 3 b. 26 c. 10 d. 21
7. Match all items in Group 1 with correct options from those given in Group 2.

Group 1 Group 2
P. Regular expression 1. Syntax analysis
Q. Pushdown automata 2. Code generation
R. Dataflow analysis 3. Lexical analysis
S. Register allocation 4. Code optimization
a. P - 4, Q – 1, R - 2, S - 3 b. P - 3, Q – 1, R - 4, S - 2
c. P - 3, Q – 4, R - 1, S – 2 d. P - 2, Q – 1, R - 4, S - 3

8. Which one of the following statements is FALSE?


a. CFG’s can be used to specify both lexical and syntax rules
b. Type checking is done before parsing
c. High – level language programs can be translated to different intermediate
representations
d. Arguments to a function can be passed using the program stack

9. Which of the following statements is false? (GATE CS 2001)


a) An unambiguous grammar has same leftmost and rightmost derivation
b) An LL(1) parser is a top-down parser
c) LALR is more powerful than SLR
d) An ambiguous grammar can never be LR(k) for any k
Answer: (a)
If a grammar has more than one leftmost (or rightmost) derivation for a single sentential
form, the grammar is ambiguous. The leftmost and rightmost derivations for a sentential
form may differ, even in an unambiguous grammar

10. Which one of the following grammars is free from left recursion?


a. S →AB
A → aA | b
B→c
b. S → AB | Bb | c
A → Bd | ε


B→e
c. S → Aa | B | ε
A → Bd | Sc | ε
B→d
d. S →Aa |Bb | c
A → bd | ε
B → Ae | ε
11. Which of the following derivations does a top-down parser use while parsing an input
string? The input is assumed to be scanned in left to right order.
(a) Leftmost derivation
(b) Leftmost derivation traced out in reverse
(c) Rightmost derivation
(d) Rightmost derivation traced out in reverse
Answer (a)
Top-down parsing (LL)
In top down parsing, we just start with the start symbol and compare the right side of the
different productions against the first piece of input to see which of the productions should be
used. A top down parser is called LL parser because it parses the input from Left to right, and
constructs a Leftmost derivation of the sentence.

12. Consider the following grammar:


S → FR
R→S|ε
F → id
In the predictive parsing table M of the grammar, the entries M[S, id] and M[R, $]
are, respectively:
(A) {S → FR} and {R → ε }
(B) {S → FR} and { }
(C) {S → FR} and {R → *S}
(D) {F → id} and {R → ε}
Answer: (A)
Explanation: Here the parsing table is represented as M[X, Y], where X indexes the rows
(non-terminals) and Y indexes the columns (terminals). The rules used to fill the table are
spelled out under the last question of this section.


13. The grammar A → AA | (A) | ε is not suitable for predictive parsing because the
grammar is
(A) ambiguous
(B) left-recursive
(C) right-recursive
(D) an operator-grammar
Answer: (A)
Explanation: Since the given grammar can have infinitely many parse trees for the string ‘ε’,
the grammar is ambiguous; in addition, A → AA is left-recursive.
For predictive-parsing, grammar should be:
• Free from ambiguity
• Free from left recursion
• Free from left factoring
The given grammar contains both ambiguity and left recursion, so it cannot have a predictive
parser.
A grammar must be free from ambiguity before anything else can be considered, so option (A)
is a stronger answer than option (B) here.
14. Which of the following suffices to convert an arbitrary CFG to an LL(1) grammar?
(A) Removing left recursion alone
(B) Factoring the grammar alone
(C) Removing left recursion and factoring the grammar
(D) None of these
Answer: (D)
Explanation: Removing left recursion and factoring the grammar do not suffice to convert an
arbitrary CFG to LL(1) grammar.

15. Consider the grammar shown below:


S → i E t S S' | a
S' → e S | ε
E→b


In the predictive parse table. M, of this grammar, the entries M[S’, e] and M[S’, $]
respectively are
(A) {S’ → e S} and {S’ → e}
(B) {S’ → e S} and {}
(C) {S’ → ε} and {S’ → ε}
(D) {S’ → e S, S’→ ε} and {S’ → ε}
Answer: (D)
Explanation: Here the parsing table is represented as M[X, Y], where X indexes the rows
(non-terminals) and Y indexes the columns (terminals).
Here are the rules to fill the parsing table.
For each distinct production rule A->α, of the grammar, we need to apply the given rules:
Rule 1: if A –> α is a production, for each terminal ‘a’ in FIRST(α), add A–>α to M[ A , a ]
Rule 2: if ‘ ε ‘ is in FIRST(α), add A –> α to M [ A , b ] for each ‘b’ in FOLLOW(A).
As Entries have been asked corresponding to Non-Terminal S’, hence we only need to
consider its productions to get the answer.
For S’ → eS, according to rule 1, this production rule should be placed at the entry M[ S’,
FIRST(eS) ], and from the given grammar, FIRST(eS) ={e}, hence S’->eS is placed in the
parsing table at entry M[S’ , e].
Similarly,
For S’->ε, as FIRST(ε) = {ε}, hence rule 2 should be applied, therefore, this production rule
should be placed in the parsing table at entry M[S’,FOLLOW(S’)], and FOLLOW(S’) =
FOLLOW(S) = { e, $ }, hence S’ → ε is placed at entries M[S’, e] and M[S’, $].
Therefore Answer is option D.


UNIT III
Bottom Up parsing: SR parsing, Operator Precedence Parsing, LR grammars, LR Parsers –
Model of an LR Parsers, SLR parsing, CLR parsing, LALR parsing, Error recovery in LR
Parsing, handling ambiguous grammars.

UNIT-III: Bottom Up parsing Planned Hours: 11


Topic Learning Outcomes (each tagged with the CO addressed and the Blooms level):
1. Understand the differences between top-down and bottom-up parsers (CO 4, L2)
2. Build the parse tree for a bottom-up parser using the rules of the syntactical grammar (CO 4, L3)
3. Classify various LR grammars (CO 2, L4)
4. Apply shift-reduce actions in LR grammars (CO 2, L3)
5. Identify the suitable LR parser from the grammar (CO 4, L3)

3. BOTTOM-UP PARSING

Constructing a parse tree for an input string beginning at the leaves and going towards
the root is called bottom-up parsing. A general type of bottom-up parser is a shift-reduce
parser.

3.1. SHIFT-REDUCE PARSING

Shift-reduce parsing is a type of bottom-up parsing that attempts to construct a parse


tree for an input string beginning at the leaves (the bottom) and working up towards the root
(the top).
Example:
Consider the grammar:
S → aABe
A → Abc | b
B→d
The sentence to be recognized is abbcde.


REDUCTION (LEFTMOST)          RIGHTMOST DERIVATION
abbcde   (A → b)               S → aABe
aAbcde   (A → Abc)               → aAde
aAde     (B → d)                 → aAbcde
aABe     (S → aABe)              → abbcde
S
The reductions trace out the right-most derivation in reverse.

Handles:
A handle of a string is a substring that matches the right side of a production, and whose
reduction to the non-terminal on the left side of the production represents one step along the
reverse of a rightmost derivation.
Example:
Consider the grammar:
E→E+E
E→E*E
E→(E)
E→id
And the input string id1+id2*id3

The rightmost derivation is:


E → E+E
→ E+E*E
→ E+E*id3
→ E+id2*id3
→ id1+id2* id3
In the above derivation, the substring reduced at each step (reading the derivation in reverse: id1, then id2, then id3, then E*E, then E+E) is a handle.
Handle pruning:
A rightmost derivation in reverse can be obtained by “handle pruning”: if w is a sentence of
the grammar at hand, then w = γn, where γn is the nth right-sentential form of some rightmost
derivation.


Stack implementation of shift-reduce parsing:

Actions in shift-reduce parser:


• Shift - The next input symbol is shifted onto the top of the stack.
• Reduce - The parser replaces the handle within a stack with a non-terminal.
• Accept - The parser announces successful completion of parsing.
• Error - The parser discovers that a syntax error has occurred and calls an error recovery
routine.
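The loop itself is easy to sketch. The following minimal Python version uses a naive policy, reduce whenever the top of the stack matches some right side, which happens to work for a simple grammar such as S → 0S0 | 1S1 | 2 (used in Problem-04 below) but is exactly where real parsers need tables to resolve the conflicts discussed next:

productions = [('S', ['0', 'S', '0']), ('S', ['1', 'S', '1']), ('S', ['2'])]

def shift_reduce(tokens, start='S'):
    stack, i = [], 0
    while True:
        reduced = True
        while reduced:                       # reduce while a handle is on top
            reduced = False
            for lhs, rhs in productions:
                if stack[-len(rhs):] == rhs:
                    del stack[-len(rhs):]
                    stack.append(lhs)
                    print(f"reduce {lhs} -> {' '.join(rhs)}; stack = {stack}")
                    reduced = True
                    break
        if i < len(tokens):                  # shift the next input symbol
            stack.append(tokens[i]); i += 1
            print(f"shift; stack = {stack}")
        else:
            return stack == [start]          # accept iff only S remains

print(shift_reduce(list('10201')))           # True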
Conflicts in shift-reduce parsing:
There are two conflicts that occur in shift-reduce parsing:
1. Shift-reduce conflict: The parser cannot decide whether to shift or to reduce.
2. Reduce-reduce conflict: The parser cannot decide which of several reductions to make.

1. Shift-Reduce Conflict:
Example:
Consider the grammar:
E→E+E | E*E | id and input id+id*id


2. Reduce-reduce conflict:
Consider the grammar:
M → R+R | R+c | R
R→c
and input c+c

Viable prefixes:
• α is a viable prefix of the grammar if there is a w such that αw is a right-sentential form.


• The set of prefixes of right-sentential forms that can appear on the stack of a shift-
reduce parser are called viable prefixes.
• The set of viable prefixes is a regular language.

PRACTICE PROBLEMS BASED ON SHIFT-REDUCE PARSING-


Problem-01:
Consider the following grammar-
E→E–E
E→ExE
E → id
Parse the input string id – id x id using a shift-reduce parser.
Solution-
The priority order is: id > x > –

Stack Input Buffer Parsing Action

$ id – id x id $ Shift

$ id – id x id $ Reduce E → id

$E – id x id $ Shift

$E– id x id $ Shift

$ E – id x id $ Reduce E → id

$E–E x id $ Shift

$E–Ex id $ Shift

$ E – E x id $ Reduce E → id

$E–ExE $ Reduce E → E x E

$E–E $ Reduce E → E – E

$E $ Accept


Problem-02:
Consider the following grammar-
S→(L)|a
L→L,S|S
Parse the input string ( a , ( a , a ) ) using a shift-reduce parser.
Solution-

Stack Input Buffer Parsing Action

$ (a,(a,a))$ Shift

$( a,(a,a))$ Shift

$(a ,(a,a))$ Reduce S → a

$(S ,(a,a))$ Reduce L → S

$(L ,(a,a))$ Shift

$(L, (a,a))$ Shift

$(L,( a,a))$ Shift

$(L,(a ,a))$ Reduce S → a

$(L,(S ,a))$ Reduce L → S

$(L,(L ,a))$ Shift

$(L,(L, a))$ Shift

$(L,(L,a ))$ Reduce S → a

$(L,(L,S) ))$ Reduce L → L , S

$(L,(L ))$ Shift

$(L,(L) )$ Reduce S → (L)

$(L,S )$ Reduce L → L , S


$(L )$ Shift

$(L) $ Reduce S → (L)

$S $ Accept

Problem-03:
Consider the following grammar-
S→TL
T → int | float
L → L , id | id
Parse the input string int id , id ; using a shift-reduce parser.
Solution-

Stack Input Buffer Parsing Action

$ int id , id ; $ Shift

$ int id , id ; $ Reduce T → int

$T id , id ; $ Shift

$ T id , id ; $ Reduce L → id

$TL , id ; $ Shift

$TL, id ; $ Shift

$ T L , id ;$ Reduce L → L , id

$TL ;$ Shift

$TL; $ Reduce S → T L

$S $ Accept

Problem-04:
Considering the string “10201”, design a shift-reduce parser for the following grammar-


S → 0S0 | 1S1 | 2
Solution:

Stack Input Buffer Parsing Action

$ 10201$ Shift

$1 0201$ Shift

$10 201$ Shift

$102 01$ Reduce S → 2

$10S 01$ Shift

$10S0 1$ Reduce S → 0 S 0

$1S 1$ Shift

$1S1 $ Reduce S → 1 S 1

$S $ Accept

3.2. OPERATOR-PRECEDENCE PARSING

An efficient way of constructing shift-reduce parser is called operator-precedence


parsing. Operator precedence parser can be constructed from a grammar called Operator-
grammar. These grammars have the property that no production on right side is ɛ or has two
adjacent non-terminals.
A grammar that is generated to define the mathematical operators is called operator
grammar with some restrictions on grammar. An operator precedence grammar is a
context-free grammar that has the property that no production has either an empty right-hand
side (null productions) or two adjacent non-terminals in its right-hand side.


Example 1:
This is the example of operator grammar:
E → E+E | E*E | id
However, the grammar given below is not an operator grammar, because two non-terminals
are adjacent to each other:
S → SAS | a
A → bSb | b
However, we can convert it into an operator grammar:
S → SbSbS | SbS | a
A → bSb | b
Operator precedence parser –
An operator precedence parser is one of the bottom-up parsers; it interprets an operator-
precedence grammar. This parser is used only for operator grammars. Ambiguous grammars
are not allowed for any parser except the operator precedence parser.
There are two methods for determining what precedence relations should hold between a pair
of terminals:
1. Use the conventional associativity and precedence of operator.
2. The second method of selecting operator-precedence
operator precedence relations is first to construct an
unambiguous grammar for the language, a grammar that reflects the correct
associativity and precedence in its parse trees.
This parser relies on the following three precedence relations: ⋖, ≐, ⋗
a ⋖ b This means a “yields precedence to” b.
a ⋗ b This means a “takes precedence over” b.
a ≐ b This means a “has the same precedence as” b.

Figure – Operator precedence relation table for the grammar E → E+E | E*E | id


There is no relation between id and id, since id will never be compared with id: two
variables cannot appear side by side. A disadvantage of this table is that if we have n
operators, the size of the table is n*n and the space complexity is O(n^2). To reduce the size
of the table, precedence functions are used instead of the table.
size of table, use operator function table.
The operator precedence parsers usually do not store the precedence table with the relations;
rather they are implemented in a special way. Operator precedence parsers use precedence
functions that map terminal symbols to integers, so the precedence relations between the
symbols are implemented by numerical comparison. The parsing table can be encoded by two
precedence functions f and g that map terminal symbols to integers. We select f and g such
that:
1. f(a) < g(b) whenever a yields precedence to b
2. f(a) = g(b) whenever a and b have the same precedence
3. f(a) > g(b) whenever a takes precedence over b
Example – Consider the following grammar:
E → E+E | E*E | (E) | id
The directed graph representing the precedence function:

Since there is no cycle in the graph, we can construct the function table:

• fid → gx → f+ → g+ → f$
• gid → fx → gx → f+ → g+ → f$
Size of the table is 2n.
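One consistent assignment of f and g for this grammar, together with the numeric comparison it enables, is sketched below. The particular values are one valid choice obtained from such a graph (the usual textbook assignment); other consistent assignments exist:

f = {'+': 2, '*': 4, 'id': 4, '(': 0, ')': 4, '$': 0}
g = {'+': 1, '*': 3, 'id': 5, '(': 5, ')': 0, '$': 0}

def relation(a, b):
    """Recover the precedence relation between terminals a and b."""
    if f[a] < g[b]: return '<.'   # a yields precedence to b
    if f[a] > g[b]: return '.>'   # a takes precedence over b
    return '='                    # a and b have equal precedence

print(relation('+', '*'))    # <.  (+ yields to *)
print(relation('id', '+'))   # .>  (id is reduced first)
print(relation('(', ')'))    # =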


One disadvantage of the function table is that even though there are blank entries in the
relation table, the corresponding entries in the function table are non-blank. Blank entries are
also called errors. Hence the error detection capability of the relation table is greater than
that of the function table.

Note that ε (null) productions are likewise not allowed in operator grammars.

Advantages –
1. It can easily be constructed by hand
2. It is simple to implement this type of parsing
Disadvantages –
1. It is hard to handle tokens like the minus sign (-), which has two different precedences
(depending on whether it is unary or binary)
2. It is applicable only to small class of grammars

Example 2:
Consider the grammar:
E → EAE | (E) | -E | id
A→+|-|*|/|↑
Since the right side EAE has three consecutive non-terminals, the grammar can be written as
follows: E → E+E | E-E | E*E | E/E | E↑E | -E | id
Operator precedence relations:
There are three disjoint precedence relations, namely:
<. - less than (yields precedence)
=  - equal precedence
.> - greater than (takes precedence)
The relations give the following meaning:
a <. b - a yields precedence to b
a = b  - a has the same precedence as b
a .> b - a takes precedence over b
Rules for binary operations:
1. If operator θ1 has higher precedence than operator θ2, then make
θ1 . > θ2 and θ2 < . θ1
2. If operators θ1 and θ2, are of equal precedence, then make
θ1 . > θ2 and θ2 . > θ1 if operators are left associative

Prepared by G. Sunil Reddy, Asst. Professor,


Department of CSE, SR Engineering College, Warangal 106
UNIT – III: Compiler Design - Bottom Up parsing

θ1 < . θ2 and θ2 < . θ1 if right associative


3. Make the following for all operators θ:
θ <. id ,id.>θ
θ <.(, (<.θ
).>θ, θ.>)
θ.>$ , $<. θ
Also make
( = ) , ( <. ( , ) .> ) , (<. id, id .>) , $ <. id , id .> $ , $ Example:
Operator-precedence relations for the grammar
E → E+E | E-E | E*E | E/E | E↑E | (E) | -E | id is given in the following table assuming
1. ↑ is of highest precedence and right-associative
2. * and / are of next higher precedence and left-associative, and
3. + and - are of lowest precedence and left- associative
Note that the blanks in the table denote error entries.
Table: Operator-precedence relations

Operator precedence parsing algorithm:


Input: An input string w and a table of precedence relations.
Output: If w is well formed, a skeletal parse tree, with a placeholder non-terminal E labeling
all interior nodes; otherwise, an error indication.
Method: Initially the stack contains $ and the input buffer the string w $. To parse, we
execute the following program:
(1) Set ip to point to the first symbol of w$;


(2) repeat forever


(3) if $ is on top of the stack and ip points to $ then
(4) return
else begin
(5) let a be the topmost terminal symbol on the stack and let b be the symbol pointed to
by ip;
(6) if a <. b or a = b then begin
(7) push b onto the stack;
(8) advance ip to the next input symbol;
end;
(9) else if a . > b then /*reduce*/
(10) repeat
(11) pop the stack
(12) until the top stack terminal is related by <. to the terminal most recently popped
(13) else error( )
end
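A simplified Python rendition of this algorithm is sketched below. It keeps only terminals on the stack (the skeletal non-terminal E is left implicit) and takes the relation table as a dictionary mapping terminal pairs to '<', '=' or '>'; the encoding is illustrative:

def op_precedence_parse(tokens, rel):
    stack = ['$']                        # topmost terminal is stack[-1]
    tokens = tokens + ['$']
    i = 0
    while True:
        a, b = stack[-1], tokens[i]
        if a == '$' and b == '$':
            return True                  # accept
        r = rel.get((a, b))
        if r in ('<', '='):              # shift b onto the stack
            stack.append(b); i += 1
        elif r == '>':                   # reduce: pop back to the handle's left end
            while True:
                popped = stack.pop()
                if rel.get((stack[-1], popped)) == '<':
                    break
        else:
            raise SyntaxError(f"no relation between {a!r} and {b!r}")

rel = {('$','id'): '<', ('id','+'): '>', ('+','id'): '<', ('id','*'): '>',
       ('*','id'): '<', ('id','$'): '>', ('+','*'): '<', ('*','+'): '>',
       ('+','+'): '>', ('*','*'): '>', ('+','$'): '>', ('*','$'): '>',
       ('$','+'): '<', ('$','*'): '<'}
print(op_precedence_parse(['id', '+', 'id', '*', 'id'], rel))   # True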
Stack implementation of operator precedence parsing:
Operator precedence parsing uses a stack and precedence relation table for its
implementation of above algorithm. It is a shift-reduce parsing containing all four actions
shift, reduce, accept and error.
The initial configuration of an operator precedence parsing is
STACK INPUT
$ w$
where w is the input string to be parsed.

Example:
Consider the grammar E → E+E | E-E | E*E | E/E | E↑E | (E) | id and the input string
id+id*id. The implementation is as follows:


Advantages of operator precedence parsing:


1. It is easy to implement.
2. Once an operator precedence relation is made between all pairs of terminals of a
grammar, the grammar can be ignored. The grammar is not referred anymore during
implementation.
Disadvantages of operator precedence parsing:
1. It is hard to handle tokens like the minus sign (-), which has two different precedences.
2. Only a small class of grammar can be parsed using operator-precedence parser.

3.3. LR GRAMMARS

i. SLR(1)
ii. CLR(1)
iii. LALR(1)

3.4. LR PARSERS
An efficient bottom-up syntax analysis technique that can be used to parse a large class of
context-free grammars is called LR(k) parsing. The ‘L’ is for left-to-right scanning of the
input, the ‘R’ for constructing a rightmost derivation in reverse, and the ‘k’ for the number
of input symbols of lookahead. When ‘k’ is omitted, it is assumed to be 1.


Advantages of LR parsing:

• It recognizes virtually all programming language constructs for which CFG can be
written
• It is an efficient non-backtracking shift-reduce parsing method
• A grammar that can be parsed using LR method is a proper superset of a grammar
that can be parsed with predictive parser
• It detects a syntactic error as soon as possible

Drawbacks of LR method:
It is too much of work to construct a LR parser by hand for a programming language
grammar. A specialized tool, called a LR parser generator, is needed. Example: YACC.
Types of LR parsing method:
1. SLR- Simple LR
• Easiest to implement, least powerful.
2. CLR- Canonical LR
• Most powerful, most expensive.
3. LALR- Look-Ahead LR
• Intermediate in size and cost between the other two methods.

3.5. MODEL OF AN LR PARSER


The LR parsing algorithm:
The schematic form of an LR parser is as follows:

Fig. 3.5.1 Model of an LR parser


It consists of an input, an output, a stack, a driver program, and a parsing table with two parts (action and goto).


• The driver program is the same for all LR parsers


• The parsing program reads characters from an input buffer one at a time
• The program uses a stack to store a string of the form s0X1s1X2s2…Xmsm, where
sm is on top. Each Xi is a grammar symbol and each si is a state
• The parsing table consists of two parts: action and goto functions.

Action: The parsing program determines sm, the state currently on top of stack, and ai,
the current input symbol. It then consults action[sm,ai] in the action table which can have one
of four values:
1. Shift s, where s is a state,
2. Reduce by a grammar production A → β,
3. Accept,
4. Error.
Goto: The function goto takes a state and grammar symbol as arguments and produces a
state.
LR Parsing algorithm:
Input: An input string w and an LR parsing table with functions action and goto for grammar G.
Output: If w is in L(G), a bottom-up parse for w; otherwise, an error indication.
Method: Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in the
input buffer. The parser then executes the following program:
set ip to point to the first input symbol of w$;
repeat forever begin
    let s be the state on top of the stack and a the symbol pointed to by ip;
    if action[s, a] = shift s’ then begin
        push a then s’ on top of the stack;
        advance ip to the next input symbol
    end
    else if action[s, a] = reduce A → β then begin
        pop 2*|β| symbols off the stack;
        let s’ be the state now on top of the stack;
        push A then goto[s’, A] on top of the stack;
        output the production A → β
    end
    else if action[s, a] = accept then return
    else error( )
end


Augment Grammar
The augmented grammar G` is obtained by adding one more production to the given
grammar G. It helps the parser identify when to stop parsing and announce acceptance of
the input.
Example
Given grammar
1. S → AA
2. A → aA | b
The Augment grammar G` is represented by
1. S`→ S
2. S → AA
3. A → aA | b
Canonical Collection of LR(0) items
An LR(0) item is a production of G with a dot at some position on the right side of the
production. LR(0) items are useful to indicate how much of the input has been scanned up
to a given point in the process of parsing.
In an LR(0) table, a reduce action is placed across the entire row of the state.
Example
Given grammar:
1. S → AA
2. A → aA | b
Add Augment Production and insert '•' symbol at the first position for every production in G
1. S` → •S
2. S → •AA
3. A → •aA
4. A → •b
I0 State:
Add Augment production to the I0 State and Compute the Closure
I0 = Closure (S` → •S)
Add all productions starting with S in to I0 State because "•" is followed by the non-terminal.
So, the I0 State becomes
I0 = S` → •S
S → •AA


Add all productions starting with "A" in modified I0 State because "•" is followed by the non-
terminal. So, the I0 State becomes.
I0= S` → •S
S → •AA
A → •aA
A → •b
I1= Go to (I0, S) = closure (S` → S•) = S` → S•
Here, the Production is reduced so close the State.
I1= S` → S•
I2= Go to (I0, A) = closure (S → A•A)
Add all productions starting with A in to I2 State because "•" is followed by the non-terminal.
So, the I2 State becomes
I2 =S→A•A
A → •aA
A → •b
Go to (I2,a) = Closure (A → a•A) = (same as I3)
Go to (I2, b) = Closure (A → b•) = (same as I4)
I3= Go to (I0,a) = Closure (A → a•A)
Add productions starting with A in I3.
A → a•A
A → •aA
A → •b
Go to (I3, a) = Closure (A → a•A) = (same as I3)
Go to (I3, b) = Closure (A → b•) = (same as I4)
I4= Go to (I0, b) = closure (A → b•) = A → b•
I5= Go to (I2, A) = Closure (S → AA•) = S → AA•
I6= Go to (I3, A) = Closure (A → aA•) = A → aA•
Drawing DFA:
The DFA contains the 7 states I0 to I6.


LR(0) Table
o If a state goes to some other state on a terminal, the entry corresponds to a shift move.
o If a state goes to some other state on a variable (non-terminal), the entry corresponds to a goto move.
o If a state contains a final item, write the reduce action across the entire row.

Explanation:

o I0 on S is going to I1 so write it as 1.
o I0 on A is going to I2 so write it as 2.
o I2 on A is going to I5 so write it as 5.
o I3 on A is going to I6 so write it as 6.
o I0, I2and I3on a are going to I3 so write it as S3 which means that shift 3.
o I0, I2 and I3 on b are going to I4 so write it as S4 which means that shift 4.


o I4, I5 and I6 all contain a final item, because the • is at the right-most end. So write
the reduce action with the corresponding production number across the row.

Productions are numbered as follows:


1. S → AA ... (1)
2. A → aA ... (2)
3. A → b ... (3)
o I1 contains the final item (S` → S•), so action[I1, $] = Accept.
o I4 contains the final item A → b•, which corresponds to production number 3, so
write r3 in the entire row.
o I5 contains the final item S → AA•, which corresponds to production number 1, so
write r1 in the entire row.
o I6 contains the final item A → aA•, which corresponds to production number 2, so
write r2 in the entire row.
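The table just built can be exercised with a small driver. The sketch below keeps only states on the stack (a common simplification, since the grammar symbols are implied by the states) and hard-codes the ACTION/GOTO entries derived above; the encoding is illustrative:

ACTION = {
    (0, 'a'): ('s', 3), (0, 'b'): ('s', 4),
    (1, '$'): ('acc',),
    (2, 'a'): ('s', 3), (2, 'b'): ('s', 4),
    (3, 'a'): ('s', 3), (3, 'b'): ('s', 4),
    (4, 'a'): ('r', 3), (4, 'b'): ('r', 3), (4, '$'): ('r', 3),
    (5, 'a'): ('r', 1), (5, 'b'): ('r', 1), (5, '$'): ('r', 1),
    (6, 'a'): ('r', 2), (6, 'b'): ('r', 2), (6, '$'): ('r', 2),
}
GOTO = {(0, 'S'): 1, (0, 'A'): 2, (2, 'A'): 5, (3, 'A'): 6}
PRODS = {1: ('S', 2), 2: ('A', 2), 3: ('A', 1)}   # production: lhs, |rhs|

def lr_parse(tokens):
    stack = [0]                          # stack of states only
    tokens = tokens + ['$']
    i = 0
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            raise SyntaxError(f"state {stack[-1]}, symbol {tokens[i]}")
        if act[0] == 's':                # shift: push the target state
            stack.append(act[1]); i += 1
        elif act[0] == 'r':              # reduce by production act[1]
            lhs, n = PRODS[act[1]]
            del stack[-n:]               # pop |rhs| states
            stack.append(GOTO[(stack[-1], lhs)])
            print(f"reduce by production {act[1]} ({lhs})")
        else:
            return True                  # accept

print(lr_parse(list('abb')))             # reduces A->b, A->aA, A->b, S->AA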

3.6. SLR PARSING

CONSTRUCTING SLR(1) PARSING TABLE:


To perform SLR parsing, take grammar as input and do the following:
1. Find LR(0) items.
2. Completing the closure.
3. Compute goto(I,X), where, I is set of items and X is grammar symbol.

LR(0) items:
An LR(0) item of a grammar G is a production of G with a dot at some position of the right
side. For example, production A → XYZ yields the four items:
A→.XYZ
A → X . YZ
A → XY . Z
A → XYZ .
Closure operation:
If I is a set of items for a grammar G, then closure(I) is the set of items constructed from I by
the two rules:


1. Initially, every item in I is added to closure(I).


2. If A → α . Bβ is in closure(I) and B → γ is a production, then add the item B → . γ to I ,
if it is not already there. We apply this rule until no more new items can be added to
closure(I).
Goto operation:
Goto(I, X) is defined to be the closure of the set of all items [A→ αX . β] such that
[A→ α . Xβ] is in I.
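Both operations translate almost directly into code. A minimal Python sketch over the augmented expression grammar, with an item encoded as a (lhs, rhs, dot) triple (representation illustrative):

GRAMMAR = {
    "E'": [('E',)],
    'E':  [('E', '+', 'T'), ('T',)],
    'T':  [('T', '*', 'F'), ('F',)],
    'F':  [('(', 'E', ')'), ('id',)],
}

def closure(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot) in list(items):
            if dot < len(rhs) and rhs[dot] in GRAMMAR:   # dot before nonterminal B
                for prod in GRAMMAR[rhs[dot]]:
                    new = (rhs[dot], prod, 0)            # add B -> . gamma
                    if new not in items:
                        items.add(new)
                        changed = True
    return frozenset(items)

def goto(items, X):
    moved = {(lhs, rhs, dot + 1)
             for (lhs, rhs, dot) in items
             if dot < len(rhs) and rhs[dot] == X}        # advance the dot over X
    return closure(moved)

I0 = closure({("E'", ('E',), 0)})
print(len(I0))                # 7 items, matching I0 listed below
print(len(goto(I0, '(')))     # 7 items, matching I4 listed below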
Steps to construct SLR parsing table for grammar G are:
1. Augment G and produce G’
2. Construct the canonical collection of set of items C for G’
3. Construct the parsing action function action and goto using the following algorithm that
requires FOLLOW(A) for each non-terminal of grammar.

Algorithm for construction of SLR parsing table:


Input: An augmented grammar G’
Output: The SLR parsing table functions action and goto for G’
Method:
1. Construct C = {I0, I1, …. In}, the collection of sets of LR(0) items for G’.
2. State i is constructed from Ii.. The parsing functions for state i are determined as follows:
(a) If [A→α·aβ] is in Ii and goto(Ii,a) = Ij, then set action[i,a] to “shift j”. Here a must
be terminal.
(b) If [A→α·] is in Ii , then set action[i,a] to “reduce A→α” for all a in FOLLOW(A).
(c) If [S’→S.] is in Ii, then set action[i,$] to “accept”.
If any conflicting actions are generated by the above rules, we say grammar is not SLR(1).
3. The goto transitions for state i are constructed for all non-terminals A using the rule:
If goto(Ii,A) = Ij, then goto[i,A] = j.
4. All entries not defined by rules (2) and (3) are made “error”
5. The initial state of the parser is the one constructed from the set of items containing [S’→.S].
For example for the given grammar
1. E → E + T
2. E → T
3. T → T * F
4. T → F


5. F → ( E )
6. F → id
The string to be parsed is id + id * id
The moves of the LR parser on this string are shown below. For reference:
FOLLOW(E) = { +, ), $ }
FOLLOW(T) = { *, +, ), $ }
FOLLOW(F) = { *, +, ), $ }

This construction requires FOLLOW of each non-terminal present in the grammar to be
computed. A grammar that has an SLR parsing table is known as an SLR(1) grammar;
generally, the 1 is omitted.

The canonical collection of LR(0) items is


I0:
E’ → .E
E → .E + T
E → .T
T → .T * F
T → .F
F → .( E )
F → .id
I1:
E’ → E.
E → E.+ T
I2:
E → T.
T → T .* F
I3:
T → F.
I4:
F → (.E)
E→.E+T
E → .T


T → .T * F
T → .F
F → .( E )
F → .id
I5:
F → id.
I6:
E → E + .T
T → .T * F
T → .F
F → .( E )
F → .id
I7:
T → T * .F
F → .( E)
F → .id
I8:
F → ( E .)
E → E. + T
I9:
E → E + T.
T → T. * F
I10:
T → T * F.
I11:
F → ( E ).


The DFA for the canonical set of SLR items is


If the right-most column is now traversed upwards, and the productions by which the reduce
steps occur are arranged in that sequence, then that sequence constitutes a rightmost derivation
of the string by this grammar. This highlights the bottom-up nature of SLR parsing.

Construction of SLR Parsing Table for Example

3.7. CANONICAL LR PARSER


A canonical LR parser or LR(1) parser is an LR parser whose parsing tables are
constructed in a similar way as with LR(0) parsers except that the items in the item sets also
contain a follow, i.e., a terminal that is expected by the parser after the right-hand side of the
rule. For example, such an item for a rule A → B C might be
A → B · C, a
which would mean that the parser has read a string corresponding to B and expects next a
string corresponding to C followed by the terminal 'a'. LR(1) parsers can deal with a very
large class of grammars but their parsing tables are often very big. This can often be solved
by merging item sets if they are identical except for the follows, which results in so-called
LALR parsers.
Constructing LR(1) parsing tables
An LR(1) item is a production with a marker together with a terminal, e.g., [S → a A · B e,
c]. Intuitively, such an item indicates how much of a certain production we have seen already


(a A), what we could expect next (B e), and a lookahead that agrees with what should follow
in the input if we ever reduce by the production S → a A B e. By incorporating such
lookahead information into the item concept, we can make wiser reduce decisions. The
lookahead of an LR(1) item is used directly only when considering reduce actions (i.e., when
the · marker is at the right end).
The core of an LR(1) item [S → a A · B e, c] is the LR(0) item S → a A · B e. Different
LR(1) items may share the same core. For example, if we have two LR(1) items of the form
• [A → α ·, a] and
• [B → α ·, b],
we take advantage of the lookahead to decide which reduction to use. (The same setting
would perhaps produce a reduce/reduce conflict in the SLR approach.)
Validity
The notion of validity changes. An item [A → β1 · β2, a] is valid for a viable prefix α β1 if
there is a rightmost derivation that yields α A w and in one step yields α β1 β2 w, where a is
the first symbol of w (or w is ε and a is $).
Initial item
To get the parsing started, we begin with the initial item of
[S’ → · S, $].
Here $ is a special character denoting the end of the string.
Closure
Closure is more refined. If [A → α · B β, a] belongs to the set of items, and B → γ is a
production of the grammar, then we add the item [B → · γ, b] for all b in FIRST(β a).
Every state is closed according to Closure.
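A minimal, self-contained Python sketch of this Closure operation follows (the grammar S' → S, S → CC, C → cC | d and the item encoding are illustrative choices, not from the text):

# LR(1) closure: an item is (head, rhs, dot, lookahead).
GRAMMAR = {"S'": [("S",)], "S": [("C", "C")], "C": [("c", "C"), ("d",)]}
NONTERMS = set(GRAMMAR)

def first_of(seq):
    # FIRST of a symbol sequence; no production derives epsilon here,
    # so only the first symbol of the sequence matters.
    sym = seq[0]
    if sym not in NONTERMS:
        return {sym}
    out = set()
    for rhs in GRAMMAR[sym]:
        out |= first_of(rhs)
    return out

def closure(items):
    items = set(items)
    work = list(items)
    while work:
        head, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMS:
            B = rhs[dot]
            beta = rhs[dot + 1:] + (la,)      # the sequence "beta a"
            for gamma in GRAMMAR[B]:
                for b in first_of(beta):      # add [B -> .gamma, b]
                    item = (B, gamma, 0, b)
                    if item not in items:
                        items.add(item)
                        work.append(item)
    return items

I0 = closure({("S'", ("S",), 0, "$")})
# I0 holds [S'->.S,$], [S->.CC,$], [C->.cC,c/d], [C->.d,c/d]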
Goto
Goto is the same. A state containing [A → α · X β, a] will move to a state containing [A → α
X · β, a] with label X.
Every state has transitions according to Goto.

Shift actions
The shift actions are the same. If [A → α · b β, a] is in state Ik and Ik moves to state Im with
label b, then we add the action
action[k, b] = "shift m"

Reduce actions
The reduce actions are more refined. If [A→α., a] is in state Ik, then we add the action:
"Reduce A → α" to action[Ik, a]. Observe that we don’t use information from FOLLOW(A)
anymore. The goto part of the table is as before.

Fig 3.7.1. The GOTO graph for grammar

Fig 3.7.2. Canonical parsing table for grammar

3.8. LALR (1) Parsing:


LALR refers to the lookahead LR. To construct the LALR (1) parsing table, we use the
canonical collection of LR (1) items.
In LALR(1) parsing, the LR(1) items which have the same productions (cores) but different
lookaheads are combined to form a single set of items.
LALR(1) parsing is otherwise the same as CLR(1) parsing; the only difference is in the parsing table.
Example:
LALR ( 1 ) Grammar
S → AA
A → aA
A→b
Add Augment Production, insert '•' symbol at the first position for every production in G
and also add the look ahead.
S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
I0 State:
Add Augment production to the I0 State and Compute the Closure
I0 = Closure (S` → •S)
Add all productions starting with S in to I0 State because "•" is followed by the non-
terminal. So, the I0 State becomes
I0 = S` → •S, $
S → •AA, $
Add all productions starting with A in modified I0 State because "•" is followed by the
non-terminal. So, the I0 State becomes.
I0= S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
I1= Go to (I0, S) = closure (S` → S•, $) = S` → S•, $
I2= Go to (I0, A) = closure ( S → A•A, $ )

Add all productions starting with A in I2 State because "•" is followed by the non-
terminal. So, the I2 State becomes
I2= S → A•A, $
A → •aA, $
A → •b, $
I3= Go to (I0, a) = Closure ( A → a•A, a/b )
Add all productions starting with A in I3 State because "•" is followed by the non-
terminal. So, the I3 State becomes
I3= A → a•A, a/b
A → •aA, a/b
A → •b, a/b
Go to (I3, a) = Closure (A → a•A, a/b) = (same as I3)
Go to (I3, b) = Closure (A → b•, a/b) = (same as I4)
I4= Go to (I0, b) = closure ( A → b•, a/b) = A → b•, a/b
I5= Go to (I2, A) = Closure (S → AA•, $) =S → AA•, $
I6= Go to (I2, a) = Closure (A → a•A, $)
Add all productions starting with A in I6 State because "•" is followed by the non-
terminal. So, the I6 State becomes
I6 = A → a•A, $
A → •aA, $
A → •b, $
Go to (I6, a) = Closure (A → a•A, $) = (same as I6)
Go to (I6, b) = Closure (A → b•, $) = (same as I7)
I7= Go to (I2, b) = Closure (A → b•, $) = A → b•, $
I8= Go to (I3, A) = Closure (A → aA•, a/b) = A → aA•, a/b
I9= Go to (I6, A) = Closure (A → aA•, $) = A → aA•, $
If we analyze then LR (0) items of I3 and I6 are same but they differ only in their
lookahead.
I3 = { A → a•A, a/b
A → •aA, a/b
A → •b, a/b}

I6= { A → a•A, $
A → •aA, $
A → •b, $ }
Clearly I3 and I6 are same in their LR (0) items but differ in their lookahead, so we can
combine them and called as I36.
I36 = { A → a•A, a/b/$
A → •aA, a/b/$
A → •b, a/b/$ }
The I4 and I7 are same but they differ only in their look ahead, so we can combine them
and called as I47.
I47 = {A → b•, a/b/$}
The I8 and I9 are same but they differ only in their look ahead, so we can combine them
and called as I89.
I89 = {A → aA•, a/b/$}
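The merging step itself is mechanical. A small Python sketch (the item encoding is an illustrative assumption, not from the text):

# Merge LR(1) states whose items share the same core (the LR(0) part),
# taking the union of lookaheads -- this is the LALR(1) construction step.
def merge_by_core(states):
    merged = {}
    for state in states:                       # each state: a frozenset of items
        core = frozenset((h, rhs, dot) for (h, rhs, dot, _) in state)
        merged.setdefault(core, set()).update(state)
    return list(merged.values())

# Items are (head, rhs, dot, lookahead); I4 and I7 from the example above:
I4 = frozenset({("A", ("b",), 1, "a"), ("A", ("b",), 1, "b")})
I7 = frozenset({("A", ("b",), 1, "$")})
print(merge_by_core([I4, I7]))   # one merged state I47: A -> b. , a/b/$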
Drawing DFA:

LALR (1) Parsing table:


States          Action                          Goto
            a           b           $           S       A
I0          S36         S47                     1       2
I1                                  Accept
I2          S36         S47                             5
I36         S36         S47                             89
I47         R3          R3          R3
I5                                  R1
I89         R2          R2          R2
Important Notes
1. Even if the CLR parser does not have a reduce-reduce (RR) conflict, the corresponding LALR parser may contain one.
2. If number of states LR (0) = n1,

Number of states SLR = n2,


Number of states LALR = n3,
Number of states CLR = n4 then,
n1 = n2 = n3 <= n4

3.9. THE "DANGLING-ELSE" AMBIGUITY


Consider again the following grammar for conditional statements:
stmt → if expr then stmt else stmt
     | if expr then stmt
     | other
As we noted in Section 4.3.2, this grammar is ambiguous because it does not resolve the
dangling-else ambiguity. To simplify the discussion, let us consider an abstraction of this
grammar, where i stands for if expr then, e stands for else, and a stands for "all other
productions." We can then write the grammar, with augmenting production S' → S, as
S' → S
S → iSeS | iS | a
The sets of LR(0) items for this grammar are shown in Fig. 4.50. The ambiguity in the grammar gives
rise to a shift/reduce conflict in I4. There, item S → iS.eS calls for a shift of e and, since
FOLLOW(S) = {e, $}, item S → iS. calls for reduction by S → iS on input e.
Translating back to the if-then-else terminology, given if expr then stmt

on the stack and else as the first input symbol, should we shift else onto the stack (i.e., shift
e) or reduce if expr then stmt (i.e., reduce by S → iS)? The answer is that we should shift
else, because it is "associated" with the previous then. In the terminology of grammar (4.67),
the e on the input, standing for else, can only form part of the body beginning with the iS now
on the top of the stack. If what follows e on the input cannot be parsed as an S, completing
body iSeS, then it can be shown that there is no other parse possible.

We conclude that the shift/reduce conflict in I4 should be resolved in favor of shift on input
e. The SLR parsing table constructed from the sets of items of Fig. 4.48, using this resolution
of the parsing-action conflict in I4 on input e, is shown in Fig. 4.51. Productions 1 through 3
are S → iSeS, S → iS, and S → a, respectively.

Fig: LR parsing table for the “dangling-else” grammar


For example, on input iiaea, the parser makes the moves shown in Fig. 4.52, corresponding to
the correct resolution of the "dangling-else." At line (5), state 4 selects the shift action on
input e, whereas at line (9), state 4 calls for reduction by S → iS on input $.

Fig: Parsing actions on input iiaea

By way of comparison, if we are unable to use an ambiguous grammar to specify conditional


statements, then we would have to use a bulkier grammar along the lines of Example 4.16.

3.10. ERROR RECOVERY IN LR PARSING

An LR parser will detect an error when it consults the parsing action table and finds an error
entry. Errors are never detected by consulting the goto table. An LR parser will announce an
error as soon as there is no valid continuation for the portion of the input thus far scanned. A
canonical LR parser will not make even a single reduction before announcing an error. SLR
and LALR parsers may make several reductions before announcing an error, but they will
never shift an erroneous input symbol onto the stack.
In LR parsing, we can implement panic-mode error recovery as follows. We scan down the
stack until a state s with a goto on a particular nonterminal A is found. Zero or more input
symbols are then discarded until a symbol a is found that can legitimately follow A. The
parser then stacks the state GOTO(s, A) and resumes normal parsing. There might be more
than one choice for the nonterminal A. Normally these would be nonterminals representing
major program pieces, such as an expression, statement, or block. For example, if A is the
nonterminal stmt, a might be semicolon or }, which marks the end of a statement sequence.
This method of recovery attempts to eliminate the phrase containing the syntactic error. The
parser determines that a string derivable from A contains an error. Part of that string has
already been processed, and the result of this processing is a sequence of states on top of the
stack. The remainder of the string is still in the input and the parser attempts to skip over the
remainder of this string by looking for a symbol on the input that can legitimately follow A.

By removing states from the stack, skipping over the input, and pushing
GOTO(s, A) on the stack, the parser pretends that it has found an instance of A and resumes
normal parsing.
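A Python sketch of this panic-mode loop follows (the goto and follow tables and the set of recovery nonterminals are assumed inputs; the names are hypothetical, not from the notes):

# Panic-mode recovery for an LR parser driver. goto maps (state, nonterminal)
# to a state; follow maps each nonterminal to its FOLLOW set; recovery_nts
# lists "major" nonterminals such as stmt or expr.
def panic_mode_recover(stack, tokens, pos, goto, follow, recovery_nts):
    while stack:
        s = stack[-1]
        for A in recovery_nts:
            if (s, A) in goto:
                # discard input until a symbol that can legitimately follow A
                while pos < len(tokens) and tokens[pos] not in follow[A]:
                    pos += 1
                if pos < len(tokens):
                    stack.append(goto[(s, A)])   # pretend an A was reduced
                    return stack, pos            # resume normal parsing
        stack.pop()                              # keep scanning down the stack
    raise SyntaxError("unable to recover")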

Phrase-level recovery is implemented by examining each error entry in the LR parsing table
and deciding on the basis of language usage the most likely programmer error that would
give rise to that error. An appropriate recovery procedure can then be constructed;
presumably the top of the stack and/or first input symbols would be modified in a way
deemed appropriate for each error entry.

In designing specific error-handling routines for an LR parser, we can fill in each blank entry
in the action field with a pointer to an error routine that will take the appropriate action
selected by the compiler designer. The actions may include insertion or deletion of symbols
from the stack or the input or both, or alteration and transposition of input symbols. We must
make our choices so that the LR parser will not get into an infinite loop. A safe strategy will
assure that at least one input symbol will be removed or shifted eventually, or that the stack
will eventually shrink if the end of the input has been reached. Popping a stack state that
covers a nonterminal should be avoided, because this modification eliminates from the stack
a construct that has already been successfully parsed.

Example 4.68: Consider again the expression grammar

E -> E + E | E * E | (E) | id

Figure 4.53 shows the LR parsing table from Fig. 4.49 for this grammar, modified for error
detection and recovery. We have changed each state that calls for a particular reduction on
some input symbols by replacing error entries in that state by the reduction. This change has
the effect of postponing the error detection until one or more reductions are made, but the
error will still be caught before any shift move takes place. The remaining blank entries from
Fig. 4.49 have been replaced by calls to error routines.

The error routines are as follows.


e1: This routine is called from states 0, 2, 4 and 5, all of which expect the beginning of an
operand, either an id or a left parenthesis. Instead, +, *, or the end of the input was found.
Push state 3 (the goto of states 0, 2, 4 and 5 on id); issue diagnostic "missing operand."
e2: Called from states 0, 1, 2, 4 and 5 on finding a right parenthesis.
Remove the right parenthesis from the input; issue diagnostic "unbalanced right parenthesis."

e3: Called from states 1 or 6 when expecting an operator, and an id or right parenthesis is
found. Push state 4 (corresponding to symbol +) onto the stack; issue diagnostic "missing
operator."
e4: Called from state 6 when the end of the input is found.
Push state 9 (for a right parenthesis) onto the stack; issue diagnostic "missing right
parenthesis."
On the erroneous input id + ), the sequence of configurations entered by the parser is shown
in Fig. 4.54.

Fig : Parsing and error recovery moves made by an LR parser

SOLVED PROBLEMS
1. Construct SLR(1) Parsing table for the given grammar
S→E
E→E+T|T
T→T*F|F
F → id
Add Augment Production and insert '•' symbol at the first position for every production in
G
S` → •E
E → •E + T
E → •T
T → •T * F
T → •F
F → •id
I0 State:
Add Augment production to the I0 State and Compute the Closure
I0 = Closure (S` → •E)
Add all productions starting with E in to I0 State because "." is followed by the non-
terminal. So, the I0 State becomes
I0 = S` → •E
E → •E + T
E → •T
Add all productions starting with T and F in modified I0 State because "." is followed by
the non-terminal. So, the I0 State becomes.
I0= S` → •E
E → •E + T
E → •T
T → •T * F
T → •F
F → •id
I1= Go to (I0, E) = closure (S` → E•, E → E• + T)
I2= Go to (I0, T) = closure (E → T•, T → T• * F)
I3= Go to (I0, F) = Closure ( T → F• ) = T → F•

I4= Go to (I0, id) = closure ( F → id•) = F → id•


I5= Go to (I1, +) = Closure (E → E +•T)
Add all productions starting with T and F in I5 State because "." is followed by the non-
terminal. So, the I5 State becomes
I5 = E → E +•T
T → •T * F
T → •F
F → •id
Go to (I5, F) = Closure (T → F•) = (same as I3)
Go to (I5, id) = Closure (F → id•) = (same as I4)
I6= Go to (I2, *) = Closure (T → T * •F)
Add all productions starting with F in I6 State because "." is followed by the non-terminal.
So, the I6 State becomes
I6 = T → T * •F
F → •id
Go to (I6, id) = Closure (F → id•) = (same as I4)
I7= Go to (I5, T) = Closure (E → E + T•) = E → E + T•
I8= Go to (I6, F) = Closure (T → T * F•) = T → T * F•
Drawing DFA:

SLR (1) Table

Explanation:
First (E) = First (E + T) ∪ First (T)
First (T) = First (T * F) ∪ First (F)
First (F) = {id}
First (T) = {id}
First (E) = {id}
Follow (E) = First (+T) ∪ {$} = {+, $}
Follow (T) = First (*F) ∪ Follow (E) = {*, +, $}
Follow (F) = Follow (T) = {*, +, $}
o I1 contains the final item which drives S → E• and follow (S) = {$}, so action {I1, $}
= Accept
o I2 contains the final item which drives E → T• and follow (E) = {+, $}, so action {I2,
+} = R2, action {I2, $} = R2
o I3 contains the final item which drives T → F• and follow (T) = {+, *, $}, so action
{I3, +} = R4, action {I3, *} = R4, action {I3, $} = R4
o I4 contains the final item which drives F → id• and follow (F) = {+, *, $}, so action
{I4, +} = R5, action {I4, *} = R5, action {I4, $} = R5
o I7 contains the final item which drives E → E + T• and follow (E) = {+, $}, so action
{I7, +} = R1, action {I7, $} = R1
I8 contains the final item which drives T → T * F• and follow (T) = {+, *, $}, so action
{I8, +} = R3, action {I8, *} = R3, action {I8, $} = R3.

2. Construct CLR ( 1 ) parsing table for the given Grammar

S → AA
A → aA
A→b
Add Augment Production, insert '•' symbol at the first position for every production in G
and also add the lookahead
S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
I0 State:
Add Augment production to the I0 State and Compute the Closure
I0 = Closure (S` → •S)
Add all productions starting with S in to I0 State because "." is followed by the non-
terminal. So, the I0 State becomes
I0 = S` → •S, $
S → •AA, $
Add all productions starting with A in modified I0 State because "." is followed by the
non-terminal. So, the I0 State becomes.
I0= S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
I1= Go to (I0, S) = closure (S` → S•, $) = S` → S•, $
I2= Go to (I0, A) = closure ( S → A•A, $ )
Add all productions starting with A in I2 State because "." is followed by the non-
terminal. So, the I2 State becomes
I2= S → A•A, $
A → •aA, $
A → •b, $
I3= Go to (I0, a) = Closure ( A → a•A, a/b )
Add all productions starting with A in I3 State because "." is followed by the non-
terminal. So, the I3 State becomes

I3= A → a•A, a/b


A → •aA, a/b
A → •b, a/b
Go to (I3, a) = Closure (A → a•A, a/b) = (same as I3)
Go to (I3, b) = Closure (A → b•, a/b) = (same as I4)
I4= Go to (I0, b) = closure ( A → b•, a/b) = A → b•, a/b
I5= Go to (I2, A) = Closure (S → AA•, $) =S → AA•, $
I6= Go to (I2, a) = Closure (A → a•A, $)
Add all productions starting with A in I6 State because "." is followed by the non-
terminal. So, the I6 State becomes
I6 = A → a•A, $
A → •aA, $
A → •b, $
Go to (I6, a) = Closure (A → a•A, $) = (same as I6)
Go to (I6, b) = Closure (A → b•, $) = (same as I7)
I7= Go to (I2, b) = Closure (A → b•, $) = A → b•, $
I8= Go to (I3, A) = Closure (A → aA•, a/b) = A → aA•, a/b
I9= Go to (I6, A) = Closure (A → aA•, $) = A → aA•, $

Drawing DFA:

CLR (1) Parsing table:

Productions are numbered as follows:


1. S → AA
2. A → aA
3. A → b
The placement of shift node in CLR (1) parsing table is same as the SLR (1) parsing table.
Only difference in the placement of reduce node.
I4 contains the final item which drives ( A → b•, a/b), so action {I4, a} = R3, action {I4,
b} = R3.
I5 contains the final item which drives ( S → AA•, $), so action {I5, $} = R1.
I7 contains the final item which drives ( A → b•,$), so action {I7, $} = R3.
I8 contains the final item which drives ( A → aA•, a/b), so action {I8, a} = R2, action {I8,
b} = R2.
I9 contains the final item which drives ( A → aA•, $), so action {I9, $} = R2.

3. Consider the grammar


E → T+E | T
T →id
Augmented grammar –
E’ → E
E → T+E | T
T → id

4. Consider the following grammar & Construct CLR (1)


S → AaAb | BbBa
A → ε
B → ε
Augmented grammar –
S’ → S
S → AaAb | BbBa
A → ε
B → ε
GOTO graph for this grammar will be –

5. Consider the grammar & construct LALR(1)


S →AA
A →aA | b
Augmented grammar –
S’ → S
S →AA
A → aA | b

6. Find the SLR parsing table for the given grammar and parse the sentence (a+b)*c.
E →E+E | E*E | (E) | id.
Answer
Given grammar:
1. E →E+E
2. E →E*E
3. E →(E)
4. E →id
Augmented grammar
E’ →E

E →E+E
E →E*E

E →(E)
E →id
I0: E’ →.E
E →.E+E
E →.E*E
E →.(E)
E →.id
I1: goto(I0, E)
E’ →E.
E →E.+E
E →E.*E
I2: goto(I0, ()
E → (.E)
E →.E+E
E →.E*E
E →.(E)
E →.id
I3: goto(I0, id)
E →id.
I4: goto(I1, +)
E →E+.E
E →.E+E
E →.E*E
E →.(E)
E →.id

I5: goto(I1, *)
E →E*.E
E →.E+E
E →.E*E
E →.(E)
E →.id
I6: goto(I2, E)

E → (E.)
E →E.+E
E →E.*E
I7: goto(I4, E)
E →E+E.
E →E.+E
E →E.*E
I8: goto(I5, E)
E →E*E.
E →E.+E

E →E.*E
goto(I2, () = I2
goto(I2, id) = I3

REVIEW QUESTIONS (LEVELS I, II, III)


S. No  Short Questions                                        CO Addressing  Blooms level
1      What do you mean by Handle pruning?                    2              1
2      What is Bottom Up Parsing? Explain with an example.    2              1
3      What are the actions in Shift Reduce Parsing?          2              1
4      Define LR(0) items in bottom up parsing.               2              2
5      How is LR different from LL?                           2              2
6      What is a shift-reduce conflict?                       2              3
7      What is a reduce-reduce conflict?                      2              1
8      What is an operator grammar?                           2              2

MULTIPLE CHOICE QUESTIONS


1. A Bottom Up Parser generates____________________________________ ( )
a.LMD b. RMD
c. RMD in Reverse Order d. Reverse of LMD
2. _________ is a process of finding a parse tree for a string of tokens. ( )
a. Parsing b. Analysing c. Recognizing d. Tokenizing
3. __________________ is a top-down parser ( )
a. Operator precedence parser b. An LALR (k) parser
c. An LR (k) parser d. Recursive descent parser
4. Among simple LR (SLR), canonical LR, and look-ahead LR (LALR), which of the
following pairs identify the method that is very easy to implement and the method that is
the most powerful, in that order? ( )
a. SLR, LALR b. CLR, LALR
c. SLR, CLR d. LALR, CLR
5. The grammar S → aSa | bS | c is ( )
a. LL (1) but not LR (1) b. LR (1) but not LL (1)
c. Both LR(1) and LL (1) d. Neither LL (1) nor LR (1)
6. Syntax analysis is also called_____________. ( )
a. Parser b. Scanner c. Hierarchical Analysis d. Both a & c
7. Which of the following is not a conflict in shift reduce parsing____________ ( )
a. Shift-Reduce Conflict b. Reduce- Reduce Conflict
c. Shift-Shift Conflict d. None
8. Which one of the following statement is false for the SLR (1) and LALR (1) parsing
tables for a context free grammar? ( )
a. The reduce entries in both the tables may be different
b. The error entries in both the tables may be different
c. The go to part of both tables may be different
d. The shift entries in both the tables may be identical
9. Which one of the following statement is true? ( )
a. Canonical LR parser is more powerful than LALR parser
b. SLR parser is more powerful than LALR
c. LALR parser is more powerful than canonical LR parser
d. SLR parser, canonical LR parser and LALR parser all have the same power

10. Grammar of the programming language is checked at _______ phase of compiler. ( )


a. Semantic analysis b. Syntax analysis
c. Code optimization d. Code generation
11. Consider the grammar shown below
S → CC
C → cC | d
The grammar is ( )
a. LL (1) b. SLR (1) but not LL (1)
c. LALR (1) but not SLR (1) d. LR (1) but not LALR (1)
12. Abbreviate CLR _______________________________________________________.
13. Abbreviate SLR _______________________________________________________.
14. Abbreviate LR (1) _____________________________________________________.
15. The most powerful Parsing Method is ______________________________________.
16. Abbreviate CFG _______________________________________________________.
17. Abbreviate LALR(1)____________________________________________________.
18. Abbreviate YACC _____________________________________________________.
19. Syntax analysis is also called_____________________________________________.
20. context-free grammar (CFG) is recognized by push-down automata
21. An Operator Precedence parser can be constructed from a grammar called an Operator
grammar
22. Which of these does not belong to CFG ( )
a) Terminal Symbol b) Non terminal Symbol
c) Start symbol d) End Symbol
23. Which phase of compiler is Syntax Analysis ( )
a) First b) Second c) Third d) None of the mentioned
24. The context free grammar S → SS | 0S1 | 1S0 | ɛ generates ( )
a) Equal number of 0’s and 1’s
b) Unequal number of 0’s and 1’s
c) Number of 0’s followed by any number of 1’s
d) None of the mentioned
25. Which of the following is not a Bottom Up Parser ______________________ ( )
a.SLR b. CLR c. Backtracking d. LALR

SHORT QUESTIONS
S. No  Short Questions                                                     CO  Blooms  Marks
1      What do you mean by Handle pruning?                                 2   1       2
2      What is Bottom Up Parsing? Explain with an example.                 2   1       2
3      What are the actions in Shift Reduce Parsing?                       2   1       2
4      Define LR(0) items in bottom up parsing.                            2   2       2
5      How is LR different from LL?                                        2   2       2
6      CLR is the more powerful parsing technique? Justify the statement.  2   2       2
7      Explain the types of LR parsers.                                    2   2       2
8      Write the conflicts of shift-reduce parsing.                        2   2       2
9      List the techniques of bottom up parsing.                           2   2       2
10     What is the difference between Top Down and Bottom Up parsing?      2   1       2
11     Why is CLR more powerful than SLR & LALR?                           2   2       2
12     What is meant by a Lookahead?                                       2   1       2
13     What is a shift-reduce conflict?                                    2   3       2
14     What is a reduce-reduce conflict?                                   2   1       2
15     What do you mean by an Item set in LR(0)?                           2   1       2
16     What is the difference between LR(0) and LR(1)?                     2   2       2
17     What do you mean by the Closure of Item Sets?                       2   1       2
18     Write the Canonical Collection of LR items.                         2   1       2
19     Define the Closure and Goto functions.                              2   2       2
20     What is the difference between LR(0) and SLR?                       2   2       2
21     Left recursion does not affect Bottom Up Parsing? Justify.          2   2       2
22     What is the difference between CLR and LALR?                        2   1       2
23     CLR is more powerful than LR parsers? Justify.                      2   1       2
24     What is an operator grammar?                                        2   2       2
25     What do you mean by an augmented Grammar?                           2   1       2
26     Define handle.                                                      2   1       2
27     Write short notes on YACC.                                          2   1       2
28     What are kernel & non-kernel items?                                 2   1       2
29     What is phrase level error recovery?                                2   1       2
30     Define LR(0) items.                                                 2   1       2

SHORT QUESTIONS WITH ANSWERS

1. Define Bottom Up Parsing.

Parsing method in which construction starts at the leaves and proceeds towards the
root is called as Bottom Up Parsing.

2. What is Shift-Reduce parsing?

A general style of bottom-up syntax analysis, which attempts to construct a parse tree
for an input string beginning at the leaves and working up towards the root.

3. Define handle. What do you mean by handle pruning?

• A handle of a string is a substring that matches the right side of a production and
whose reduction to the nonterminal on the left side of the production represents one
step along the reverse of a rightmost derivation.
• The process of obtaining rightmost derivation in reverse is known as Handle Pruning.

4. Define LR (0) items.

An LR (0) item of a grammar G is a production of G with a dot at some position of


the right side. Thus the production A → XYZ yields the following four items,
A → .XYZ
A → X.YZ
A → XY.Z
A → XYZ.

5. What do you mean by viable prefixes?

• The set of prefixes of right sentential forms that can appear on the stack of a shift-
reduce parser are called viable prefixes.
• A viable prefix is that it is a prefix of a right sentential form that does not continue
the past the right end of the rightmost handle of that sentential form.

6. What is meant by an operator grammar? Give an example.

A grammar is an operator grammar if,

• No production rule has ε on the right side.
• No production has two adjacent nonterminals on the right side.
Ex:

E → E+E | E-E | E*E | E/E | E↑E | (E) | -E | id

7. What are the disadvantages of operator precedence parsing?

i. It is hard to handle tokens like the minus sign, which has two different
precedences (unary and binary).
ii. Since the relationship between a grammar for the language being parsed and the
operator – precedence parser itself is tenuous, one cannot always be sure the
parser accepts exactly the desired language.
iii. Only a small class of grammars can be parsed using operator precedence
techniques.

8. State error recovery in operator-Precedence Parsing.

There are two points in the parsing process at which an operator-precedence parser can
discover the syntactic errors:

i. If no precedence relation holds between the terminal on top of the stack and the
current input
ii. If a handle has been found, but there is no production with this handle as a right
side
9. LR (k) parsing stands for what?

The “L” is for left-to-right scanning of the input, the “R” for constructing a rightmost
derivation in reverse, and the k for the number of input symbols of look ahead that are
used in making parsing decisions.

10. Why LR parsing is attractive one?

• LR parsers can be constructed to recognize virtually all programming language


constructs for which context free grammars can be written.
• The LR parsing method is the, most general nonbacktracking shift-reduce parsing
method known, yet it can be implemented as efficiently as other shift reduce
methods.
• The class of grammars that can be parsed using LR methods is a proper superset of
the class of grammars that can be parsed with predictive parsers.
• An LR parser can detect a syntactic error as soon as it is possible to do so on a left-
to-right scan of the input.

11. What is meant by goto function in LR parser? Give an example.

• The function goto takes a state and grammar symbol as arguments and produces a
state.
• The goto function of a parsing table constructed from a grammar G is the transition
function of a DFA that recognizes the viable prefixes of G.

Ex: goto(I,X)

where I is a set of items and X is a grammar symbol; goto(I, X) is defined to be the closure
of the set of all items [A → αX.β] such that [A → α.Xβ] is in I.

12. Write the configuration of an LR parser?

• A configuration of an LR parser is a pair whose first component is the stack contents


and whose second component is the unexpended input:

(s0 X1 s1 X2 s2 …Xm sm , ai ai+1 … an $)

13. Define LR grammar.

A grammar for which we can construct a parsing table is said to be an LR grammar.

14. What are kernel and non kernel items?

i. Kernel items: the set of items which includes the initial item S′ → .S, together with all items whose dots are not at the left end.
ii. Non-kernel items: the items which have their dots at the left end, except the initial item S′ → .S.

15. Why SLR and LALR are more economical to construct than canonical LR?

For a comparison of parser size, the SLR and LALR tables for a grammar always
have the same number of states, and this number is typically several hundred states
for a language like Pascal. The canonical LR table would typically have several
thousand states for the same size language. Thus, it is much easier and more
economical to construct SLR and LALR tables than the canonical LR tables.

LONG QUESTIONS
(All questions address CO 2; the Blooms level and Marks are given in brackets.)

1. Construct the SLR(1) parsing table for the given grammar and parse the string ( )( ). [Blooms 4, 10 M]
   S → S(S)
   S → ε
2. Construct the SLR(1) parsing table for the following grammar. [Blooms 4, 10 M]
   S → CC
   C → cC
   C → d
3. Check whether the given grammar is CLR(1) or not. [Blooms 4, 10 M]
   S → AS
   S → b
   A → SA
   A → a
4. Construct LALR(1) parsers for the following grammar. [Blooms 4, 10 M]
   S → L = R
   S → R
   L → * R
   L → id
   R → L
5. Show that the given grammar is LL(1) but not SLR(1). [Blooms 4, 10 M]
   S → AaAb | BbBa
   A → ε
   B → ε
6. Show that the given grammar is SLR(1) but not LL(1). [Blooms 4, 10 M]
   S → SA | A
   A → a
7. Discuss error recovery in LR parsing. [Blooms 2, 10 M]
8. Explain CLR parsing; justify how it is efficient over LR parsers. [Blooms 2, 10 M]
9. Explain the common conflicts that can be encountered in a shift-reduce parser, with an example. [Blooms 3, 10 M]
10. Determine whether the given grammar is SLR(1) or not. [Blooms 4, 10 M]
    S → AS | b
    A → SA | a
11. Consider the grammar [Blooms 4, 10 M]
    E → E + E | E * E | (E) | id
    Show the sequence of moves made by the shift-reduce parser on the input id1+id2*id3 and determine whether the given string is accepted by the parser or not.


12. Compare SLR, LALR and LR parsers. [Blooms 3, 5 M]
13. Give the LALR parsing table for the grammar. [Blooms 4, 10 M]
    S → L = R / R
    L → * R / id
    R → L
14. Consider the grammar [Blooms 4, 10 M]
    S → S + S
    S → S * S
    S → id
    Perform shift-reduce parsing for the input string “id + id + id”.
15. Analyze whether the following grammar is an LR(1) grammar and construct the LALR parsing table. [Blooms 4, 10 M]
    S → Aa | bAc | dC | bda
    A → d
    Parse the input string bdc using the table generated by you.
16. Generate the SLR parsing table for the following grammar. [Blooms 4, 10 M]
    E → E + T / T
    T → T * F / F
    F → a / b
17. Explain operator precedence parsing with an example. [Blooms 4, 10 M]
18. Construct an SLR parsing table for the grammar and derive the string id+id*id. [Blooms 4, 10 M]
    E → E + T | T
    T → T * F | F
    F → ( E ) | id
19. Give the CLR parsing table for the grammar. [Blooms 4, 10 M]
    S → L = R / R
    L → * R / id
    R → L
20. a. Construct the operator precedence parser for the given grammar [Blooms 4, 10 M]
       E → E + E | E * E | (E) | id
    b. Derive the string id+id*id.
21. Construct the CLR(1) parsing table for the following grammar. [Blooms 4, 10 M]
    S → CC
    C → cC
    C → d
22. Construct the LALR(1) parsing table for the following grammar. [Blooms 4, 10 M]
    S → CC
    C → cC
    C → d
23. Check whether the given grammar is LR(0) or SLR(1). [Blooms 4, 10 M]
    E → E + T | T
    T → i
24. Check whether the given grammar is LR(0) or SLR(1). [Blooms 4, 10 M]
    E → T + E | T
    T → i
25. Show whether the given grammar is CLR(1) or not. [Blooms 4, 10 M]
    S → SA | A
    A → a

GATE/COMPETITIVE EXAMS QUESTIONS


1. The grammar S → aSa | bS | c is
(A) LL(1) but not LR(1)
(B) LR(1)but not LR(1)
(C) Both LL(1)and LR(1)
(D) Neither LL(1)nor LR(1)
Answer: (C)

Explanation:
First(aSa) = a
First(bS) = b
First(c) = c
All are mutually disjoint, i.e. there is no common terminal between them, so the given grammar is LL(1).

As the grammar is LL(1) it will also be LR(1), since LR parsers are more powerful than LL(1)
parsers and all LL(1) grammars are also LR(1).
So option C is correct.

2. An LALR(1) parser for a grammar G can have shift-reduce (S-R) conflicts if and
only if
(A) The SLR(1) parser for G has S-R conflicts
(B) The LR(1) parser for G has S-R conflicts
(C) The LR(0) parser for G has S-R conflicts
(D) The LALR(1) parser for G has reduce-reduce conflicts
Answer: (B)

Explanation:
Both LALR(1) and LR(1) parsers use LR(1) sets of items to form their parsing tables, and
LALR(1) states can be found by merging those LR(1) states that have the same set of
first components of their items.
i.e. if the LR(1) parser has 2 states I and J with items A → a.bP, x and A → a.bP, y respectively,
where x and y are lookahead symbols, then as these items are the same with respect to their first
component, they can be merged together to form one single state, let's say K. Here we have
to take the union of the lookahead symbols. After merging, state K will have the single item
A → a.bP, x/y. This way LALR(1) states are formed (i.e. after merging the states of LR(1)).

3. Consider the following two statements:


P: Every regular grammar is LL(1)
Q: Every regular set has a LR(1) grammar
Which of the following is TRUE?
(A) Both P and Q are true (B) P is true and Q is false
(C) P is false and Q is true (D) Both P and Q are false
Answer: (C)
Explanation:
A regular grammar can also be ambiguous also
For example, consider the following grammar,
S → aA/a
A → aA/ε
In the above grammar, the string 'a' has two leftmost derivations:
(1) S → aA → a (using A → ε)
(2) S → a
And LL(1) parses only unambiguous grammar, so statement P is False.
Statement Q is true is for every regular set, we can have a regular grammar which is
unambiguous so it can be parse by LR parser.
So option C is correct choice

4. Consider the following grammar.


S -> S * E
S -> E
E -> F + E

E -> F
F -> id
Consider the following LR(0) items corresponding to the grammar above.
(i) S -> S * .E
(ii) E -> F. + E
(iii) E -> F + .E
Given the items above, which two of them will appear in the same set in the canonical sets-
of-items for the grammar?
(A) (i) and (ii)
(B) (ii) and (iii)
(C) (i) and (iii)
(D) None of the above
Answer: (D)
Explanation: Let’s make the LR(0) set of items. First we need to augment the grammar with
the production rule S’ -> .S , then we need to find closure of items in a set to complete a set.
Below are the LR(0) sets of items.

5. A canonical set of items is given below


S → L . > R
Q → R .
On input symbol < the set has

(A) A shift-reduce conflict and a reduce-reduce conflict.


(B) A shift-reduce conflict but not a reduce-reduce conflict.
(C) A reduce-reduce conflict but not a shift-reduce conflict.
(D) Neither a shift-reduce nor a reduce-reduce conflict.
Answer: (D)
Explanation: The question is asked with respect to the symbol ‘ < ‘ which is not present in
the given canonical set of items. Hence it is neither a shift-reduce conflict nor a reduce-
reduce conflict on symbol ‘<‘.

Hence D is the correct option.


But if the question would have asked with respect to the symbol ‘ > ‘ then it would have
been a shift-reduce conflict.

6. Consider the grammar defined by the following production rules, with two
operators ∗ and +
S → T * P
T → U | T * U
P → Q + P | Q
Q → Id
U → Id
Which one of the following is TRUE?
(A) + is left associative, while ∗ is right associative
(B) + is right associative, while ∗ is left associative
(C) Both + and ∗ are right associative
(D) Both + and ∗ are left associative
Answer: (B)
Explanation: From the grammar we can find out the associativity by looking at the productions.
Let us consider the 2nd production:
T → T * U
T generates T*U recursively (left recursion), so * is left associative.
Similarly,
P → Q + P
is right recursive, so + is right associative.

So option B is correct.

7. Consider the grammar


S → (S) | a
Let the number of states in SLR(1), LR(1) and LALR(1) parsers for the grammar be n1, n2
and n3 respectively. The following relationship holds good
(A) n1 < n2 < n3
(B) n1 = n3 < n2
(C) n1 = n2 = n3
(D) n1 ≥ n3 ≥ n2
Answer: (B)
Explanation: LALR(1) is formed by merging states of LR(1) ( also called CLR(1)), hence no
of states in LALR(1) is less than no of states in LR(1), therefore n3 < n2. And SLR(1) and
LALR(1) have same no of states, i.e ( n1 = n3).
Hence n1 = n3 < n2

8. Which of the following statements is false? (GATE CS 2001)


a) An unambiguous grammar has same leftmost and rightmost derivation
b) An LL(1) parser is a top-down parser
c) LALR is more powerful than SLR
d) An ambiguous grammar can never be LR(k) for any k
Answer: (a)
If a grammar has more than one leftmost (or rightmost) derivation for a single sentential
form, the grammar is ambiguous. The leftmost and rightmost derivations for a sentential form
may differ, even in an unambiguous grammar

9. Consider the grammar shown below.


S→CC
C→cC|d
The grammar is
(A) LL(1)
(B) SLR(1) but not LL(1)
(C) LALR(1) but not SLR(1)

(D) LR(1) but not LALR(1)


Answer: (A)
Explanation: Since there is no conflict, the grammar is LL(1). We can construct a predictive
parse table with no conflicts. This grammar also LR(0), SLR(1), CLR(1) and LALR(1).

10. Which of the following is the most powerful parsing method?


(A) LL(1)
(B) Canonical LR
(C) SLR
(D) LALR
Answer: (B)

11. Which of the following statement is true?


(A) SLR parser is more powerful than LALR.
(B) LALR parser is more powerful than Canonical LR parser.
(C) Canonical LR parser is more powerful than LALR parser.
(D) The parsers SLR, Canonical LR, and LALR have the same power.
Answer: (C)

12. In the following grammar


X :: = X ⊕ Y / Y
Y :: = Z * Y / Z
Z :: = id
Which of the following is true?
a. ‘⊕’ is left associative while ‘*’ is right associative
b. Both ‘⊕’ and ‘*’ are left associative
c. ‘⊕’ is right associative while ‘*’ is left associative
d. None of the above
(A) a
(B) b
(C) c
(D) d
Answer: (A)

13. Which of the following is essential for converting an infix expression to the postfix
from efficiently?
(A) An operator stack
(B) An operand stack
(C) An operand stack and an operator stack
(D) A parse tree
Answer: (A)

14. Which is True about SR and RR-conflict:


(A) If there is no SR-conflict in CLR(1) then definitely there will be no SR-conflict in
LALR(1).
(B) RR-conflict might occur if lookahead for final items(reduce-moves) is same.
(C) Known that CLR(1) has no RR-conflict, still RR-conflict might occur in LALR(1).
(D) All of the above.
Answer: (D)
Explanation: In above given options all the statements are correct. So, option (D) is Correct.

15. Consider the following expression grammar. The seman-tic rules for expression
calculation are stated next to each grammar production.
E → number       E.val = number.val
  | E '+' E      E(1).val = E(2).val + E(3).val
  | E '×' E      E(1).val = E(2).val × E(3).val
The above grammar and the semantic rules are fed to a yacc tool (which is an LALR (1)
parser generator) for parsing and evaluating arithmetic expressions. Which one of the
following is true about the action of yacc for the given grammar?
(A) It detects recursion and eliminates recursion
(B) It detects reduce-reduce conflict, and resolves
(C) It detects shift-reduce conflict, and resolves the conflict in favor of a shift over a reduce
action
(D) It detects shift-reduce conflict, and resolves the conflict in favor of a reduce over a shift
action
Answer: (C)

Explanation:
Background
yacc conflict resolution is done using following rules:
shift is preferred over reduce while shift/reduce conflict.
first reduce is preferred over others while reduce/reduce conflict.
You can answer this question straightforwardly by constructing the LALR(1) parse table, though
that is a time-taking process. To answer it faster, one can see intuitively that this grammar will
have a shift-reduce conflict. In that case, given this is a single-choice question, option (C)
will be the right answer.
Fool-proof explanation would be to generate LALR(1) parse table, which is a lengthy
process. Once we have the parse table with us, we can clearly see that
i. reduce/reduce conflict will not arise in the above given grammar
ii. shift/reduce conflict will be resolved by giving preference to shift, hence making the
expression calculator right associative.
According to the above conclusions, only correct option seems to be (C).


UNIT IV

Syntax Directed Translation: Syntax Directed Definition, S-attributed definitions, L-


attributed definitions, Attribute grammar, S-attributed grammar, L-attributed grammar.
Semantic Analysis: Type Checking, Type systems, Type expressions, Equivalence of type
expressions.
Intermediate Code Generation: Construction of syntax trees, Directed Acyclic Graph,
Three Address Codes.

UNIT-IV: Syntax Directed Translation, Semantic Analysis & Intermediate Code Generation        Planned Hours: 11

S. No.  Topic Learning Outcomes                                                       COs    Blooms Levels
1.      Understand the purpose of semantic analysis in compiler construction          CO 2   L2
2.      Compare synthesized and inherited attributes                                  CO 2   L2
3.      Illustrate the semantic processes in the parse tree                           CO 4   L3
4.      Examine the three address codes for intermediate code generation              CO 4   L4
5.      Evaluate intermediate representations for constructing syntax trees and DAGs  CO 4   L4

4. SYNTAX DIRECTED TRANSLATION:


• The Principle of Syntax Directed Translation states that the meaning of an input
sentence is related to its syntactic structure, i.e., to its Parse-Tree.
• By Syntax Directed Translations we indicate those formalisms for specifying
translations for programming language constructs guided by context-free grammars.
– We associate Attributes to the grammar symbols representing the language
constructs.
– Values for attributes are computed by Semantic Rules associated with grammar
productions.
• Evaluation of Semantic Rules may:
– Generate Code;
– Insert information into the Symbol Table;

– Perform Semantic Check;


– Issue error messages;
– etc.
• There are two notations for attaching semantic rules:
1. Syntax Directed Definitions: High-level specification hiding many
implementation details (also called Attribute Grammars).
2. Translation Schemes: More implementation oriented: Indicate the order in which
semantic rules are to be evaluated.

4.1. SYNTAX DIRECTED DEFINITIONS

Syntax Directed Definitions are a generalization of context-free grammars in which:

1. Grammar symbols have an associated set of Attributes;


2. Productions are associated with Semantic Rules for computing the values of attributes
• Such formalism generates Annotated Parse-Trees where each node of the tree is a
record with a field for each attribute (e.g., X.a indicates the attribute a of the grammar
symbol X).
• The value of an attribute of a grammar symbol at a given parse-tree node is defined by
a semantic rule associated with the production used at that node.
• We distinguish between two kinds of attributes:
1. Synthesized Attributes: They are computed from the values of the attributes of
the children nodes
2. Inherited Attributes: They are computed from the values of the attributes of both
the siblings and the parent nodes

Form of Syntax Directed Definitions:

• Each production, A → α, is associated with a set of semantic rules: b := f(c1, c2, ..., ck), where f is a function and either
1. b is a synthesized attribute of A, and c1, c2, ..., ck are attributes of the
grammar symbols of the production, or
2. b is an inherited attribute of a grammar symbol in α, and c1, c2, ..., ck are
attributes of grammar symbols in α or attributes of A.

• Note: Terminal symbols are assumed to have synthesized attributes supplied by the
lexical analyzer.
• Procedure calls (e.g. print in the next slide) define values of Dummy synthesized
attributes of the non terminal on the left-hand side of the production.

Syntax Directed Definitions: An Example

• Example: Let us consider the Grammar for arithmetic expressions. The Syntax
Directed Definition associates to each non terminal a synthesized attribute called val.
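That definition, in its standard textbook form (reconstructed here; n denotes the end-of-line token), is:

Production          Semantic Rule
L → E n             print(E.val)
E → E1 + T          E.val := E1.val + T.val
E → T               E.val := T.val
T → T1 * F          T.val := T1.val * F.val
T → F               T.val := F.val
F → ( E )           F.val := E.val
F → digit           F.val := digit.lexval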

4.2. S-ATTRIBUTED DEFINITIONS:

An S-Attributed Definition is a Syntax Directed Definition that uses only synthesized


attributes.
We can evaluate its attributes in any bottom-up order of the nodes of the parse tree (e.g., a
postorder traversal, as done by an LR parser).
• Evaluation Order: Semantic rules in a S-Attributed Definition can be evaluated by a
bottom-up, or PostOrder, traversal of the parse-tree.
• Example: The above arithmetic grammar is an example of an S-Attributed
Definition. The annotated parse-tree for the input 3*5+4n is:

Note: A parse tree showing the values of its attributes is called an annotated parse tree.
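A minimal Python sketch of this bottom-up evaluation of val (the tuple encoding of the annotated tree is an illustrative assumption):

# Evaluate the synthesized attribute val by a postorder walk.
# A node is either an int leaf (F -> digit) or ("+"|"*", left, right).
def val(node):
    if isinstance(node, int):
        return node                      # F.val := digit.lexval
    op, left, right = node
    l, r = val(left), val(right)         # children first (postorder)
    return l + r if op == "+" else l * r

print(val(("+", ("*", 3, 5), 4)))        # input 3*5+4n evaluates to 19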

Inherited Attributes

• Inherited Attributes are useful for expressing the dependence of a construct on the
context in which it appears.
• It is always possible to rewrite a syntax directed definition to use only synthesized
attributes, but it is often more natural to use both synthesized and inherited attributes.
• Evaluation Order: Inherited attributes cannot be evaluated by a simple PreOrder
traversal of the parse-tree:
• Unlike synthesized attributes, the order in which the inherited attributes of the
children are computed is important! Indeed:
• Inherited attributes of the children can depend on both left and right
siblings!

Example: Let us consider the syntax directed definition with both inherited and synthesized
attributes for the grammar for “type declarations”:
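In its standard textbook form (it matches the translation scheme given later in this unit), the definition is:

Production      Semantic Rule
D → T L         L.in := T.type
T → int         T.type := integer
T → real        T.type := real
L → L1 , id     L1.in := L.in ; addtype(id.entry, L.in)
L → id          addtype(id.entry, L.in)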

• The non terminal T has a synthesized attribute, type, determined by the keyword in
the declaration.
• The production D → T L is associated with the semantic rule L.in := T.type, which sets
the inherited attribute L.in.

Note: The production L → L1, id distinguishes the two occurrences of L.

• Synthesized attributes can be evaluated by a PostOrder traversal.


• Inherited attributes that do not depend on right siblings can be evaluated by a
classical PreOrder traversal.

• The annotated parse-tree for the input real id1, id2, id3 is:

• L.in is then inherited top-down the tree by the other L-nodes


• At each L-node the procedure addtype inserts into the symbol table the type of the
identifier.

Implementing Syntax Directed Definitions:

Dependency Graphs

• Implementing a Syntax Directed Definition consists primarily in finding an order for


the evaluation of attributes
– Each attribute value must be available when a computation is performed.
• Dependency Graphs are the most general technique used to evaluate syntax directed
definitions with both synthesized and inherited attributes.
• A Dependency Graph shows the interdependencies among the attributes of the various
nodes of a parse-tree.
– There is a node for each attribute;
– If attribute b depends on an attribute c there is a link from the node for c to the node
for b ( b ← c).

• Dependency Rule: If an attribute b depends on an attribute c, then we need to fire
the semantic rule for c first and then the semantic rule for b.

Evaluation Order
• The evaluation order of semantic rules depends on a topological sort derived from
the dependency graph.
• Topological Sort: any ordering m1, m2, . . . , mk such that if mi → mj is a link in
the dependency graph then mi < mj.
• Any topological sort of a dependency graph gives a valid order in which to evaluate the
semantic rules.
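For instance, with Python's standard graphlib module (the attribute names below are made-up placeholders):

# Evaluate attributes in a topological order of the dependency graph.
# The mapping sends each attribute node to the set of nodes it depends on.
from graphlib import TopologicalSorter

deps = {
    "L.in": {"T.type"},          # L.in is computed from T.type
    "addtype(id1)": {"L.in"},    # the addtype call needs L.in first
}
for node in TopologicalSorter(deps).static_order():
    print("evaluate", node)      # fire the semantic rule for this attribute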

Topological Order:

Example:

Build the dependency graph for the parse-tree of real id1, id2, id3

Implementing Attribute Evaluation: General Remarks

• Attributes can be evaluated by building a dependency graph at compile-time and then


finding a topological sort.
• Disadvantages
1. This method fails if the dependency graph has a cycle: We need a test for non-
circularity;
2. This method is time consuming due to the construction of the dependency graph.
• Alternative Approach: Design the syntax directed definition in such a way that
attributes can be evaluated with a fixed order avoiding to build the dependency graph
(method followed by many compilers).

Strongly Non-Circular Syntax Directed Definitions

Formalisms for which an attribute evaluation order can be fixed at compiler construction
time.

• They form a class that is less general than the class of non-circular definitions.
• In the following we illustrate two kinds of strictly non-circular definitions: S-
Attributed and L-Attributed Definitions.

Evaluation of S-Attributed Definitions

• Synthesized Attributes can be evaluated by a bottom-up parser as the input is being


analyzed avoiding the construction of a dependency graph.
• The parser keeps the values of the synthesized attributes in its stack.
• Whenever a reduction A → α is made, the attribute for A is computed from the
attributes of α, which appear on the stack.
• Thus, a translator for an S-Attributed Definition can be simply implemented by
extending the stack of an LR-Parser.

Extending a Parser Stack

• Extra fields are added to the stack to hold the values of synthesized attributes.
• In the simple case of just one attribute per grammar symbol the stack has two fields:
state and val

• The current top of the stack is indicated by the pointer top.


• Synthesized attributes are computed just before each reduction:
• Before the reduction A → XYZ is made, the attribute for A is computed: A.a :=
f(val[top], val[top − 1], val[top − 2]).

Example: Consider the S-attributed definitions for the arithmetic expressions. To evaluate
attributes the parser executes the following code
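The following Python sketch shows the kind of actions meant, following the standard textbook treatment (val is the attribute field of the parser stack; the production encoding is illustrative):

# Code executed just before each reduction; top is the current stack top and
# ntop the top after the reduction (ntop = top - r + 1 for a right side of length r).
def reduce_action(prod, val, top, ntop):
    if prod == "E -> E + T":
        val[ntop] = val[top - 2] + val[top]
    elif prod == "T -> T * F":
        val[ntop] = val[top - 2] * val[top]
    elif prod == "F -> ( E )":
        val[ntop] = val[top - 1]
    # E -> T, T -> F and F -> digit need no code: the value is already in place.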

• The variable ntop is set to the new top of the stack. After a reduction is done, top is set
to ntop: when a reduction A → α is done with |α| = r, then ntop = top − r + 1.
• During a shift action both the token and its value are pushed into the stack.
• The following Figure shows the moves made by the parser on input 3*5+4n.
– Stack states are replaced by their corresponding grammar symbol;
– Instead of the token digit the actual value is shown.

4.3. L-ATTRIBUTED DEFINITIONS:


L-Attributed Definitions contain both synthesized and inherited attributes but do not need to
build a dependency graph to evaluate them.
Definition: A syntax directed definition is L-Attributed if each inherited attribute of Xj in a
production A → X1 . . . Xj . . . Xn depends only on:
1. The attributes of the symbols to the left (this is what L in L-Attributed stands for) of Xj
, i.e., X1X2 . . .Xj−1, and
2. The inherited attributes of A.
• Theorem: Inherited attributes in L-Attributed Definitions can be computed by a PreOrder
traversal of the parse-tree.

Evaluating L-Attributed Definitions

• L-Attributed Definitions are a class of syntax directed definitions whose attributes can
always be evaluated by single traversal of the parse-tree.
• The following procedure evaluates L-Attributed Definitions by mixing PostOrder
(synthesized) and PreOrder (inherited) traversals.

Algorithm: L-Eval(n: Node)
Input: Node of an annotated parse-tree.
Output: Attribute evaluation.
Begin
    For each child m of n, from left-to-right Do
    Begin
        Evaluate inherited attributes of m;
        L-Eval(m)
    End;
    Evaluate synthesized attributes of n
End.
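The same algorithm rendered directly in Python (a sketch; the node structure and the two rule-evaluation hooks are assumed):

def l_eval(node):
    # Mixed traversal: inherited attributes before descending (preorder),
    # synthesized attributes after all children are done (postorder).
    for m in node.children:
        m.evaluate_inherited()    # may use attrs of left siblings and of node
        l_eval(m)
    node.evaluate_synthesized()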

TRANSLATION SCHEMES:

• Translation Schemes are more implementation oriented than syntax directed


definitions since they indicate the order in which semantic rules and attributes are to
be evaluated.

Definition: A Translation Scheme is a context-free grammar in which

1. Attributes are associated with grammar symbols;


2. Semantic Actions are enclosed between braces {} and are inserted within the
right-hand side of productions.
• YACC uses Translation Schemes.
• Translation Schemes deal with both synthesized and inherited attributes.
• Semantic Actions are treated as terminal symbols: Annotated parse-trees contain
semantic actions as children of the node standing for the corresponding production.

• Translation Schemes are useful to evaluate L-Attributed definitions at parsing time


(even if they are a general mechanism).
- An L-Attributed Syntax-Directed Definition can be turned into a Translation
Scheme.

Example:

Consider the Translation Scheme for the L-Attributed Definition for “type declarations”:

D → T {L.in := T.type} L
T → int {T.type := integer}
T → real {T.type := real}
L → {L1.in := L.in} L1 , id {addtype(id.entry, L.in)}
L → id {addtype(id.entry, L.in)}
The parse-tree with semantic actions for the input real id1, id2, id3 is:

Traversing the parse-tree in depth-first order (PostOrder), we can evaluate the attributes.

Translation schemes are another way to describe syntax-directed translation.
Translation schemes are closer to a real implementation because they specify when, during
the parse, attributes should be computed.


Example, for conversion of INFIX expressions to POSTFIX:


This translation scheme will turn 9-5+2 into 95-2+

Fig. 4.3.1 Actions translating 9-5+2 into 95-2+

Fig. 4.3.2 Actions for translating into postfix notation


Translation Schemes for S-attributed Definitions
• If our syntax-directed definition is S-attributed, the construction of the corresponding
translation scheme is simple.
• Each semantic rule in an S-attributed syntax-directed definition is inserted as a
semantic action at the end of the right-hand side of the associated production.

A production with its semantic rule in the syntax-directed definition:
E → E1 + T        E.val = E1.val + T.val
The corresponding production in the translation scheme:
E → E1 + T { E.val = E1.val + T.val }


Example:
A simple translation scheme that converts infix expressions to the corresponding postfix
expressions.
E→TR
R → + T { print(“+”) } R1
R→ε
T → id { print(id.name) }
For the infix expression a+b+c, this scheme produces the postfix expression ab+c+.

The depth first traversal of the parse tree (executing the semantic actions in that order) will
produce the postfix representation of the infix expression.
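
A small sketch of how this scheme can run as a predictive (recursive-descent) translator
in Python; tokens are simplified to single-character identifiers and '+', and the function
names simply mirror the nonterminals (an illustration, not a full parser):

    # Recursive-descent implementation of E -> T R, R -> + T {print '+'} R | eps,
    # T -> id {print id}; the semantic actions append to `out`.
    def translate(s):
        out = []
        pos = 0

        def T():
            nonlocal pos
            out.append(s[pos])      # semantic action: print(id.name)
            pos += 1

        def R():
            nonlocal pos
            if pos < len(s) and s[pos] == '+':
                pos += 1            # match '+'
                T()
                out.append('+')     # semantic action: print('+')
                R()
            # else: R -> epsilon

        def E():
            T()
            R()

        E()
        return ''.join(out)

    print(translate('a+b+c'))       # -> ab+c+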

4.4. ATTRIBUTE GRAMMAR


Attribute grammar is a special form of context-free grammar where some additional
information (attributes) are appended to one or more of its non-terminals in order to provide
context-sensitive information. Each attribute has well-defined domain of values, such as
integer, float, character, string, and expressions.
Attribute grammar is a medium to provide semantics to the context-free grammar and it can
help specify the syntax and semantics of a programming language. Attribute grammar (when
viewed as a parse-tree) can pass values or information among the nodes of a tree.
Example:
E → E + T { E.value = E.value + T.value }


The right part of the CFG contains the semantic rules that specify how the grammar should be
interpreted. Here, the values of non-terminals E and T are added together and the result is
copied to the non-terminal E.
Semantic attributes may be assigned to their values from their domain at the time of parsing
and evaluated at the time of assignment or conditions. Based on the way the attributes get
their values, they can be broadly divided into two categories: synthesized attributes and
inherited attributes.

4.5. S-ATTRIBUTED GRAMMARS


S-attributed grammars are a class of attribute grammars characterized by having no inherited
attributes, but only synthesized attributes. Inherited attributes, which must be passed down
from parent nodes to children nodes of the abstract syntax tree during the semantic analysis
of the parsing process, are a problem for bottom-up parsing because in bottom-up parsing, the
parent nodes of the abstract syntax tree are created after creation of all of their children.
Attribute evaluation in S-attributed grammars can be incorporated conveniently in both top-
down parsing and bottom-up parsing.
Specifications for parser generators in the Yacc family can be broadly considered S-attributed
grammars. However, these parser generators usually include the capacity to reference global
variables and/or fields from within any given grammar rule, meaning that this is not a pure S-
attributed approach.
Any S-attributed grammar is also an L-attributed grammar.

4.6. L-ATTRIBUTED GRAMMARS


L-attributed grammars are a special type of attribute grammars. They allow the attributes to
be evaluated in one left-to-right traversal of the parse tree. As a result, attribute
evaluation in L-attributed grammars can be incorporated conveniently in top-down parsing.
Many programming languages are L-attributed. Special types of compilers, the narrow
compilers, are based on some form of L-attributed grammar. These are comparable with
S-attributed grammars, which are used for code synthesis.


4.7. SEMANTIC ANALYSIS


• Semantic Analysis computes additional information related to the meaning of the
program once the syntactic structure is known.
• In typed languages such as C, semantic analysis involves adding information to the
symbol table and performing type checking.
• The information to be computed is beyond the capabilities of standard parsing
techniques; therefore it is not regarded as syntax.
• As with lexical and syntax analysis, semantic analysis needs both a Representation
Formalism and an Implementation Mechanism.
• As the representation formalism, this unit illustrates what are called Syntax Directed
Translations.

4.8. TYPE CHECKING


A compiler must check that the source program follows both syntactic and semantic
conventions of the source language. This checking, called static checking, detects and reports
programming errors.
A compiler has to do semantic checks in addition to syntactic checks.
Semantic Checks
• Static –done during compilation
• Dynamic –done during run-time

Some examples of static checks:


1. Type checks - A compiler should report an error if an operator is applied to an
incompatible operand. Example: an array variable and a function variable being added
together (an operator applied to incompatible operands).
2. Flow-of-control checks - Statements that cause flow of control to leave a construct must
have some place to which to transfer the flow of control. Example: a break statement that
has no enclosing while or switch statement to break out of.
3. Uniqueness checks - An object must be defined exactly once (for example, the type of an
identifier, or the labels inside a case/switch statement).
4. Name-related checks - Sometimes the same name must appear two or more times (for
example, a loop may carry a name that must appear both at the beginning and at the end
of the construct).


Fig. 4.8.1 Position of type checker


A type checker verifies that the type of a construct matches that expected by its context. For
example: arithmetic operator mod in Pascal requires integer operands, so a type checker
verifies that the operands of mod have type integer. Type information gathered by a type
checker may be needed when code is generated.

4.9. TYPE SYSTEMS

The design of a type checker for a language is based on information about the syntactic
constructs in the language, the notion of types, and the rules for assigning types to language
constructs.
For example: “if both operands of the arithmetic operators of +,- and * are of type integer,
then the result is of type integer ”

4.10. TYPE EXPRESSIONS

The type of a language construct will be denoted by a “type expression”. A type expression is
either a basic type or is formed by applying an operator called a type constructor to other type
expressions. The sets of basic types and constructors depend on the language to be checked.
The following are the definitions of type expressions:
1. Basic types such as boolean, char, integer, real are type expressions. A special basic
type, type_error , will signal an error during type checking; void denoting “the
absence of a value” allows statements to be checked
2. Since type expressions may be named, a type name is a type expression
3. A type constructor applied to type expressions is a type expression

Constructors include:
Arrays: If T is a type expression then array (I,T) is a type expression denoting the
type of an array with elements of type T and index set I.


Products: If T1 and T2 are type expressions, then their Cartesian product T1 X T2 is


a type expression.
Records: The difference between a record and a product is that record fields have names.
The record type constructor will be applied to a tuple formed from field names and field types.

For example:
type row = record
address: integer;
lexeme: array[1..15] of char
end;
var table: array[1..101] of row;
Declares the type name row representing the type expression record((address X
integer) X (lexeme X array(1..15,char))) and the variable table to be an array of
records of this type.
Pointers: If T is a type expression, then pointer(T) is a type expression denoting the
type “pointer to an object of type T”. For example, var p: ↑ row declares variable p to
have type pointer(row).
Functions: A function in programming languages maps a domain type D to a range
type R. The type of such function is denoted by the type expression D → R
4. Type expressions may contain variables whose values are type expressions.

Fig. 4.10.1 Tree representation for char x char → pointer (integer)
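
As a quick sketch, type expressions can be represented in Python as tagged tuples; the
constructor names below mirror the ones just described (this encoding is illustrative):

    # Type constructors as tagged tuples (illustrative encoding).
    def array(index, elem): return ('array', index, elem)
    def product(t1, t2):    return ('x', t1, t2)
    def pointer(t):         return ('pointer', t)
    def fn(domain, rng):    return ('->', domain, rng)

    CHAR, INTEGER = 'char', 'integer'

    # char x char -> pointer(integer), the type expression of Fig. 4.10.1
    t = fn(product(CHAR, CHAR), pointer(INTEGER))
    print(t)   # ('->', ('x', 'char', 'char'), ('pointer', 'integer'))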


Type systems
A type system is a collection of rules for assigning type expressions to the various
parts of a program. A type checker implements a type system. It is specified in a syntax-
directed manner. Different type systems may be used by different compilers or processors of
the same language.


Static and Dynamic Checking of Types


Checking done by a compiler is said to be static, while checking done when the target
program runs is termed dynamic. Any check can be done dynamically, if the target code
carries the type of an element along with the value of that element.
Sound type system
A sound type system eliminates the need for dynamic checking for type errors, because
it allows us to determine statically that these errors cannot occur when the target program
runs. That is, if a sound type system assigns a type other than type_error to a program part,
then type errors cannot occur when the target code for the program part is run.

Strongly typed language


A language is strongly typed if its compiler can guarantee that the programs it accepts
will execute without type errors.

Error Recovery
Since type checking has the potential for catching errors in program, it is desirable for
type checker to recover from errors, so it can check the rest of the input. Error handling has to
be designed into the type system right from the start; the type checking rules must be
prepared to cope with errors.

SPECIFICATION OF A SIMPLE TYPE CHECKER


A type checker for a simple language checks the type of each identifier. The type
checker is a translation scheme that synthesizes the type of each expression from the types of
its sub expressions. The type checker can handle arrays, pointers, statements and functions.
A Simple Language
Consider the following grammar:
P→D;E
D → D ; D | id : T
T → char | integer | array [ num ] of T | ↑ T
E → literal | num | id | E mod E | E [ E ] | E ↑
Translation scheme:
P→D;E
D→D;D


D → id : T { addtype(id.entry, T.type) }
T → char { T.type := char }
T → integer { T.type := integer }
T → ↑ T1 { T.type := pointer(T1.type) }
T → array [ num ] of T1 { T.type := array(1..num.val, T1.type) }
In the above language,
• there are two basic types: char and integer;
• type_error is used to signal errors;
• the prefix operator ↑ builds a pointer type. For example, ↑ integer leads to the type
expression pointer(integer).

Type checking of expressions


In the following rules, the attribute type for E gives the type expression assigned to the
expression generated by E.
1. E → literal { E.type := char }
   E → num { E.type := integer }
Here, constants represented by the tokens literal and num have type char and integer.
2. E → id { E.type := lookup(id.entry) }
lookup(e) is used to fetch the type saved in the symbol-table entry pointed to by e.
3. E → E1 mod E2 { E.type := if E1.type = integer and E2.type = integer then integer else
type_error }
The expression formed by applying the mod operator to two subexpressions of type
integer has type integer; otherwise, its type is type_error.
4. E → E1 [ E2 ] { E.type := if E2.type = integer and E1.type = array(s,t) then t else
type_error }
In an array reference E1 [ E2 ], the index expression E2 must have type integer. The
result is the element type t obtained from the type array(s,t) of E1.
5. E → E1 ↑ { E.type := if E1.type = pointer(t) then t else type_error }
The postfix operator ↑ yields the object pointed to by its operand. The type of E ↑ is the
type t of the object pointed to by the pointer E.

Type checking of statements


Statements do not have values; hence the basic type void can be assigned to them. If
an error is detected within a statement, then type_error is assigned.
Translation scheme for checking the type of statements:


1. Assignment statement:
S → id := E { S.type := if id.type = E.type then void else type_error }
2. Conditional statement:
S → if E then S1 { S.type := if E.type = boolean then S1.type else type_error }
3. While statement:
S → while E do S1 { S.type := if E.type = boolean then S1.type else type_error }
4. Sequence of statements:
S → S1 ; S2 { S.type := if S1.type = void and S2.type = void then void else type_error }

Type checking of functions


The rule for checking the type of a function application is:
E → E1 ( E2 ) { E.type := if E2.type = s and E1.type = s → t then t else type_error }
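
A minimal sketch of the expression rules above as a Python function over a small
tuple-encoded AST; env plays the role of the symbol-table lookup, and the tagged-tuple type
encoding follows the sketch in Section 4.10 (all names here are illustrative):

    TYPE_ERROR = 'type_error'

    def check(e, env):
        op = e[0]
        if op == 'num':     return 'integer'          # rule 1
        if op == 'literal': return 'char'             # rule 1
        if op == 'id':      return env[e[1]]          # rule 2: lookup(id.entry)
        if op == 'mod':                               # rule 3
            t1, t2 = check(e[1], env), check(e[2], env)
            return 'integer' if t1 == t2 == 'integer' else TYPE_ERROR
        if op == 'index':                             # rule 4: E1 [ E2 ]
            t1, t2 = check(e[1], env), check(e[2], env)
            if t2 == 'integer' and isinstance(t1, tuple) and t1[0] == 'array':
                return t1[2]                          # element type t of array(s, t)
            return TYPE_ERROR
        if op == 'deref':                             # rule 5: E1 ^
            t1 = check(e[1], env)
            return t1[1] if isinstance(t1, tuple) and t1[0] == 'pointer' else TYPE_ERROR
        return TYPE_ERROR

    env = {'a': ('array', (1, 10), 'integer'), 'p': ('pointer', 'integer')}
    print(check(('index', ('id', 'a'), ('num', 2)), env))   # integer
    print(check(('mod', ('id', 'p'), ('num', 3)), env))     # type_error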

4.11. INTERMEDIATE CODE GENERATION

INTRODUCTION

The front end translates a source program into an intermediate representation from which the
back end generates target code.

Benefits of using a machine-independent intermediate form are:

1. Retargeting is facilitated. That is, a compiler for a different machine can be created by
attaching a back end for the new machine to an existing front end
2. A machine-independent code optimizer can be applied to the intermediate representation.

Fig. 4.11.1 A model of a compiler front end


INTERMEDIATE LANGUAGES
Three ways of intermediate representation:
• Postfix notation
• Three address code
• Syntax tree

The semantic rules for generating three-address code from common programming language
constructs are similar to those for constructing syntax trees or for generating postfix notation.

Postfix Notation –
The ordinary (infix) way of writing the sum of a and b is with operator in the middle: a + b
The postfix notation for the same expression places the operator at the right end, as ab+. In
general, if e1 and e2 are any postfix expressions and + is any binary operator, the result of
applying + to the values denoted by e1 and e2 is indicated in postfix notation by e1e2+. No
parentheses are needed in postfix notation because the position and arity (number of
arguments) of the operators permit only one way to decode a postfix expression. In postfix
notation the operator follows the operands.

Example – The postfix representation of the expression (a – b) * (c + d) + (a – b) is:
ab- cd+ * ab- +

Three-Address Code –
A statement involving no more than three references (two for operands and one for the
result) is known as a three-address statement. A sequence of three-address statements is
known as three-address code. A three-address statement is of the form x = y op z, where x, y
and z have addresses (memory locations). Sometimes a statement might contain fewer than
three references, but it is still called a three-address statement.
Example – The three address code for the expression a + b * c + d:
T1 = b * c
T2 = a + T1
T3 = T2 + d
T1, T2, T3 are temporary variables.
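
A sketch of how such code can be produced from an expression tree by a single bottom-up
pass, generating a fresh temporary for each interior node (trees are tuple-encoded; the
names are illustrative):

    # Sketch: generate three-address code from a tuple-encoded expression tree.
    code, count = [], 0

    def gen(e):
        global count
        if isinstance(e, str):                 # a name: its address is the name itself
            return e
        op, left, right = e
        l, r = gen(left), gen(right)           # addresses of the two operands
        count += 1
        t = 'T%d' % count                      # fresh compiler-generated temporary
        code.append('%s = %s %s %s' % (t, l, op, r))
        return t

    gen(('+', ('+', 'a', ('*', 'b', 'c')), 'd'))   # a + b * c + d
    print('\n'.join(code))
    # T1 = b * c
    # T2 = a + T1
    # T3 = T2 + d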


Syntax Tree –
Syntax tree is nothing more than a condensed form of a parse tree. The operator and keyword
nodes of the parse tree are moved to their parents, and a chain of single productions is
replaced by a single link. In a syntax tree the internal nodes are operators and the child
nodes are operands. To form a syntax tree, put parentheses in the expression; this way it is
easy to recognize which operand should come first.

Example –
x = (a + b * c) / (a – b * c)

Fig 4.11.2. Syntax tree

Three-address code

Three-address code is a sequence of statements of the general form x := y op z
where x, y and z are names, constants, or compiler-generated temporaries; op stands for any
operator, such as a fixed- or floating-point arithmetic operator, or a logical operator on
boolean-valued data. Thus a source language expression like x + y*z might be translated into
the sequence
t1 := y * z
t2 := x + t1
where t1 and t2 are compiler-generated temporary names.


Advantages of three-address code:


• The unraveling of complicated arithmetic expressions and of statements makes three-
address code desirable for target code generation and optimization.
• The use of names for the intermediate values computed by a program allows three-
address code to be easily rearranged - unlike postfix notation.

Three-address code is a linearized representation of a syntax tree or a dag in which


explicit names correspond to the interior nodes of the graph. The syntax tree and dag are
represented by the three-address code sequences. Variable names can appear directly in three
address statements.
(a) Code for the syntax tree:
t1 := -c
t2 := b * t1
t3 := -c
t4 := b * t3
t5 := t2 + t4
a := t5

(b) Code for the DAG:
t1 := -c
t2 := b * t1
t5 := t2 + t2
a := t5

Fig. 4.11.3 Three-Address Code corresponding to the syntax tree and DAG

The reason for the term “Three-Address Code” is that each statement usually contains three
addresses, two for the operands and one for the result.

Three address code is a type of intermediate code which is easy to generate and can be easily
converted to machine code. It makes use of at most three addresses and one operator to
represent an expression and the value computed at each instruction is stored in temporary
variable generated by compiler. The compiler decides the order of operation given by three
address code.

General representation –
a = b op c
Where a, b or c represents operands like names, constants or compiler generated temporaries
and op represents the operator


Example-1: Convert the expression a * – (b + c) into three address code


t1 = b+c
t2 = uminus t1
t3 = a*t2
Example-2: Write three address code for following code
for(i = 1; i<=10; i++)
{
a[i] = x * 5;
}
i=1
L: t1=x*5
t2=&a
t3=sizeof(int)
t4=t3*i
t5=t2+t4
*t5=t1
i=i+1
if i<=10 goto L

Implementation of Three Address Code –


A three-address statement is an abstract form of intermediate code. In a compiler, these
statements can be implemented as records with fields for the operator and the operands.
There are 3 representations of three address code namely

1. Quadruple
2. Triples
3. Indirect Triples

1. Quadruple –
• It is a structure with four fields: op, arg1, arg2 and result. op denotes the operator,
arg1 and arg2 denote the two operands, and result is used to store the result of the
expression.


• The contents of the fields arg1, arg2 and result are normally pointers to the symbol-table
entries for the names represented by these fields. If so, temporary names must be
entered into the symbol table as they are created.

Advantage –
• Easy to rearrange code for global optimization.
• One can quickly access value of temporary variables using symbol table.

Disadvantage –
• Contain lot of temporaries.
• Temporary variable creation increases time and space complexity.

Example – Consider expression a = b * – c + b * – c.


The three address code is:
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5
#   Op       Arg1   Arg2   Result
0   uminus   c             t1
1   *        t1     b      t2
2   uminus   c             t3
3   *        t3     b      t4
4   +        t2     t4     t5
5   =        t5            a
Quadruple Notation
2. Triples –
• This representation doesn't make use of an extra temporary variable to represent a single
operation; instead, when a reference to another triple's value is needed, a pointer to that
triple is used. So it consists of only three fields: op, arg1 and arg2.
• To avoid entering temporary names into the symbol table, we might refer to a
temporary value by the position of the statement that computes it.
• Since three fields are used, this intermediate code format is known as triples.


Disadvantage –
• Temporaries are implicit, and it is difficult to rearrange the code.
• It is difficult to optimize, because optimization involves moving intermediate code:
when a triple is moved, any other triple referring to it must be updated as well. (With
the help of a pointer, one can still directly access a symbol-table entry.)

Example – Consider expression a = b * – c + b * – c


#   Op       Arg1   Arg2
0   uminus   c
1   *        (0)    b
2   uminus   c
3   *        (2)    b
4   +        (1)    (3)
5   =        a      (4)
Triple Notation
3. Indirect Triples –
• This representation makes use of pointer to the listing of all references to
computations which is made separately and stored.
• It’s similar in utility as compared to quadruple representation but requires less space
than it. Temporaries are implicit and easier to rearrange code.
• Another implementation of three-address code is that of listing pointers to triples,
rather than listing the triples themselves. This implementation is called indirect
triples.
• For example, let us use an array statement to list pointers to triples in the desired
order. Then the triples shown above might be represented as follows:
Example –
Consider expression a = b * – c + b * – c
#     Op       Arg1    Arg2
(14)  uminus   c
(15)  *        (14)    b
(16)  uminus   c
(17)  *        (16)    b


(18) + (15) (17)


(19) = a (18)

List of Pointers to table


# Statement
(0) (14)
(1) (15)
(2) (16)
(3) (17)
(4) (18)
(5) (19)

Indirect triples representation of three-address statements
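
As a small sketch, the three representations of the running example a = b * – c + b * – c
can be written down as Python data structures (integers stand in for pointers/positions;
this simply mirrors the tables above):

    # Quadruples: (op, arg1, arg2, result)
    quads = [
        ('uminus', 'c',  None, 't1'),
        ('*',      't1', 'b',  't2'),
        ('uminus', 'c',  None, 't3'),
        ('*',      't3', 'b',  't4'),
        ('+',      't2', 't4', 't5'),
        ('=',      't5', None, 'a'),
    ]

    # Triples: (op, arg1, arg2); an integer argument refers to the triple at
    # that position, so no explicit temporaries are needed.
    triples = [
        ('uminus', 'c', None),
        ('*',      0,   'b'),
        ('uminus', 'c', None),
        ('*',      2,   'b'),
        ('+',      1,   3),
        ('=',      'a', 4),
    ]

    # Indirect triples: a separate statement list points into `triples`, so
    # statements can be reordered without rewriting the triple references.
    stmt_list = [0, 1, 2, 3, 4, 5]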

4.12. SYNTAX TREES-

Syntax trees are abstract or compact representation of parse trees. They are also called as
Abstract Syntax Trees.

Parse Trees Vs Syntax Trees-


• A parse tree is a graphical representation of the replacement process in a derivation;
a syntax tree is the compact form of a parse tree.
• In a parse tree, each interior node represents a grammar rule; in a syntax tree, each
interior node represents an operator.
• In a parse tree, each leaf node represents a terminal; in a syntax tree, each leaf node
represents an operand.
• Parse trees provide every characteristic information from the real syntax; syntax trees
do not.
• Parse trees are comparatively less dense than syntax trees.

NOTE-
Syntax trees are called as Abstract Syntax Trees because-
• They are abstract representations of the parse trees.
• They do not provide every characteristic information from the real syntax.
• For example: no rule nodes, no parentheses, etc.

Example:
Considering the following grammar-
E→E+T|T
T→TxF|F
F → ( E ) | id
Generate the following for the string id + id x id
• Parse tree
• Syntax tree
• Directed Acyclic Graph (DAG)
Solution-

Parse Tree-


Syntax Tree-

Directed Acyclic Graph-

4.13. DIRECTED ACYCLIC GRAPH-


• Each node of it contains a unique value.
• It does not contain any cycles in it, hence called Acyclic.

Optimization of Basic Blocks-


DAG is a very useful data structure for implementing transformations on Basic Blocks.
• A DAG is constructed for optimizing the basic block.
• A DAG is usually constructed using Three Address Code.
• Transformations such as dead code elimination and common sub expression
elimination are then applied.

Properties-
• Reachability relation forms a partial order in DAGs.
• Both transitive closure & transitive reduction are uniquely defined for DAGs.
• Topological Orderings are defined for DAGs.


Applications-
DAGs are used for the following purposes-
• To determine the expressions which have been computed more than once (called
common sub-expressions).
• To determine the names whose computation has been done outside the block but used
inside the block.
• To determine the statements of the block whose computed value can be made
available outside the block.
• To simplify the list of Quadruples by not executing the assignment instructions x:=y
unless they are necessary and eliminating the common sub-expressions.

Construction of DAGs-
Following rules are used for the construction of DAGs-
Rule-01:
In a DAG,
• Interior nodes always represent the operators.
• Exterior nodes (leaf nodes) always represent the names, identifiers or constants

Rule-02:
While constructing a DAG,
• A check is made to find if there exists any node with the same value.
• A new node is created only when there does not exist any node with the same value.
• This action helps in detecting the common sub-expressions and avoiding the re-
computation of the same.

Rule-03:
The assignment instructions of the form x:=y are not performed unless they are necessary.

Example:
1. Consider the following expression and construct a DAG for it-
(a+b)x(a+b+c)
Solution-
Three Address Code for the given expression is-


T1 = a + b
T2 = T1 + c
T3 = T1 x T2
Now, the Directed Acyclic Graph is-

NOTE: From the constructed DAG, we observe-
• The common sub-expression (a+b) has been expressed as a single node in the DAG.
• The computation is carried out only once, stored in the identifier T1, and reused later.

This illustrates how the construction scheme of a DAG identifies the common sub-expression
and helps in eliminating its re-computation later.
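
A compact sketch of this construction as "value numbering" in Python: a dictionary keyed
by (op, left, right) implements Rule-02, so a node is created only if no node with the same
signature already exists (the names are illustrative):

    # DAG construction by value numbering (Rule-02: reuse identical nodes).
    nodes = {}                                # (op, left, right) -> node id

    def node(op, left=None, right=None):
        key = (op, left, right)
        if key not in nodes:                  # create only if no identical node exists
            nodes[key] = len(nodes)
        return nodes[key]

    # (a+b) x (a+b+c)
    t1 = node('+', node('a'), node('b'))      # a + b
    t2 = node('+', t1, node('c'))             # (a + b) + c, reusing the a+b node
    t3 = node('x', t1, t2)                    # the product; a+b is shared, not recomputed
    print(len(nodes))                         # 6 nodes in total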

SOLVED PROBLEMS
1. Write quadruple, triples and indirect triples for following expression : (x + y) * (y +
z) + (x + y + z)
Explanation – The three address code is:
t1 = x + y
t2 = y + z
t3 = t1 * t2
t4 = t1 + z
t5 = t3 + t4
#   Op   Arg1   Arg2   Result
1   +    x      y      t1
2   +    y      z      t2
3   *    t1     t2     t3
4   +    t1     z      t4
5   +    t3     t4     t5
Quadruple representation

#   Op   Arg1   Arg2
1   +    x      y
2   +    y      z
3   *    (1)    (2)
4   +    (1)    z
5   +    (3)    (4)
Triple representation

Indirect triple representation:
#     Op   Arg1   Arg2
(14)  +    x      y
(15)  +    y      z
(16)  *    (14)   (15)
(17)  +    (14)   z
(18)  +    (16)   (17)

List of pointers to table:
#     Statement
(1)   (14)
(2)   (15)
(3)   (16)
(4)   (17)
(5)   (18)

2. Construct a syntax tree for the following arithmetic expression-

( a + b ) * ( c – d ) + ( ( e / f ) * ( a + b ))

Solution-

Step-01:
We convert the given arithmetic expression into a postfix expression as-

(a+b)*(c–d)+((e/f)*(a+b))
ab+ * ( c – d ) + ( ( e / f ) * ( a + b ) )
ab+ * cd- + ( ( e / f ) * ( a + b ) )
ab+ * cd- + ( ef/ * ( a + b ) )
ab+ * cd- + ( ef/ * ab+ )
ab+ * cd- + ef/ab+*


ab+cd-* + ef/ab+*
ab+cd-*ef/ab+*+

Step-02:
We draw a syntax tree for the above postfix expression.

Steps Involved

Start pushing the symbols of the postfix expression into the stack one by one.
When an operand is encountered,
• Push it into the stack.

When an operator is encountered,
• Pop the two sub-trees on top of the stack.
• Apply the operator to the two operands, making the operator the parent node.
• Push the resulting sub-tree back into the stack.

Continue in the same manner and draw the syntax tree simultaneously, as in the sketch below.
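
A minimal Python sketch of this stack algorithm, building the tree as nested tuples
(operand leaves are single characters; illustrative only):

    # Build a syntax tree from a postfix string with a stack.
    def tree_from_postfix(expr):
        stack = []
        for sym in expr:
            if sym in '+-*/':
                right = stack.pop()               # second operand (pushed last)
                left = stack.pop()                # first operand
                stack.append((sym, left, right))  # operator becomes the parent node
            else:
                stack.append(sym)                 # operand: push a leaf
        return stack.pop()

    print(tree_from_postfix('ab+cd-*'))
    # ('*', ('+', 'a', 'b'), ('-', 'c', 'd'))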


The required syntax tree is-

3. Consider the following expression and construct a DAG for it-
(((a+a)+(a+a))+((a+a)+(a+a)))

Solution-
Directed Acyclic Graph for the given expression is-

4. Consider the following block and construct a DAG for it-
(1) a = b x c
(2) d = b
(3) e = d x c
(4) b = e
(5) f = b + c
(6) g = f + d


Solution-
Directed Acyclic Graph for the given block is-

REVIEW QUESTIONS (LEVELS I, II, III)


S. No   Questions                                                     CO Addressing   Blooms level
1       Explain Syntax Directed Definition in detail                        4              2
2       Explain Syntax Directed Translation in detail                       4              2
3       Construct a parse tree, syntax tree and annotated parse
        tree for the given input string 5*6+7
        S → E N
        E → E + T | T                                                       4              4
        T → T * F | F
        F → ( E ) | id
4       What is syntax directed definition?                                 4              1
5       Explain the usage of syntax directed definition?                    4              1
6       What are the two types of attribute grammars?                       4              1
7       Define Annotated parse tree                                         4              1
8       What is the purpose of semantic analysis in a compiler?             2              1
9       Define Synthesized attributes?                                      4              1
10      What is Inherited attribute?                                        4              1
11      What do you mean by Semantic rule?                                  4              1
12      Define S-attributed Definition                                      4              1
13      Define L-attributed Definition                                      4              1
14      What is attribute grammar?                                          4              1

MULTIPLE CHOICE QUESTIONS


1. _____________ specifies the value of attributes by associating semantic rules with
the grammar productions.
a) SDD b)SDT c)Syntax analyzer d)Code Generator
2. SDD consisting of
a) Attributes and grammar rules b) Only attribute
c) Only grammar rules d) none of the above
3. SDD that involves only synthesized attributes is called ___________
4. S-attributed SDD can be implemented with
a) LR-parsers b) predictive parser c) Non-predictive parser d) only SLR(1)
5. A parse tree showing with values of attributes is called__________
a) Annotated parse tree b) Dependency graphs c)Syntax Tree d)DAG
6. ______________are a useful tool for determining an evaluation order for the attribute
instances in a given parse tree.
a)Annotated parse tree b) Dependency graphs c)Syntax Tree d)DAG
7. What is the semantic rule for the given grammar rules E → E+T | T, T → F, F → id?
8. An SDD is S-attributed if every attributed is_____________
a)Synthesized or Inherited b) Synthesized c) Inherited d) Synthesized and Inherited
9. An SDD is L-attributed if every attributed is
a)Synthesized or Inherited b) Synthesized c) Inherited d) Synthesized and Inherited
10. If dependency graph edges can go from left to right but not from right to left is called
__________
a) L-attributed definition b) S-attributed definition c) Inherited -attributed definition
d)none of the above
11. Three types of intermediate code representations are _______,_______,_______
12. Polish notation for three address code also called as ________
a)Prefix notation b)Postfix notation c) infix notation d)none
13. Quadruple has ________number of fields
14. Triple has______________number of fields
15. Indirect triple consists of a listing of pointer to ________
a) Triple b) Quadruple c) Both a & b d) none
16. An intermediate code form is
a) Postfix Notation b) Syntax Trees
c) Three address code d) All of the mentioned
17. Type checking is normally done during
a) Lexical analysis b) Syntax analysis
c) Syntax directed translation d) Code generation
18. One of the purposes of using intermediate code in compilers is to
a) make parsing and semantic analysis simpler.


b) improve error recovery and error reporting.


c) increase the chances of reusing the machine-independent code optimizer in other
compilers.
d) improve the register allocation.
19. The statement of the form a:=b is called a _________ Statement.
a) Common b) Copy c) Assignment d) Address
20. Type conversion done automatically by the compiler is called _________
21. Type expression for int arr[i]; is
a)array(i,int) b)array(int,int) c)array(i.int,int) d)array(int.i, int)

22. The statement of the form a:=b op c is called a _________ Statement.


a) Common b) Copy c) Assignment d) Address
23. Polish notation for (a+b)*(c-d) is_____________
24. Three address code for a=b+c-d is _____________
25. Construction of syntax tree is for an expression is translation of expression into
__________form.
a)Prefix notation b)Postfix notation c) infix notation d)none

26. Node N in a DAG has _______ parent(s) if N represents a common sub-expression.


a) more than one b)single c)two d)none
27. Show the DAG for given expression a+a*(b-c)+(b-c)*d
28. Show the syntax tree and DAG for given expression i=i+10;
29. Show the three address code for given expression a+a*(b-c)+(b-c)*d
30. Translate the arithmetic expression a+( b-c) into Syntax Tree, Quadruple and
Indirect Triple

SHORT QUESTIONS
S. No   Short Questions                                          CO Addressing   Blooms level   Marks
1       What is syntax directed definition?                            4              1           2
2       Explain the usage of syntax directed definition?               4              1           2
3       What are the two types of attribute grammars?                  4              1           2
4       Define Annotated parse tree                                    4              1           2
5       What is the purpose of semantic analysis in a compiler?        2              1           2


6       Define Synthesized attributes?                                 4              1           2
7       Translate the arithmetic expression a * - (b + c)
        into a syntax tree.                                            4              2           2
8       What is the role of semantic analysis in compiler phases?      4              1           2
9       What is syntax directed translation?                           4              1           2
10      What is Inherited attribute?                                   4              1           2
11      What do you mean by Semantic rule?                             4              1           2
12      Define S-attributed Definition                                 4              1           2
13      Define L-attributed Definition                                 4              1           2
14      What is attribute grammar?                                     4              1           2
15      What is annotated parse tree, give an example?                 4              1           2
16      Define Syntax Tree                                             4              1           2
17      What is Dependency Graph?                                      4              1           2
18      What is the functioning of the mkleaf() method in
        constructing a syntax tree?                                    4              1           2
19      What are the differences between S-attributed and
        L-attributed grammars?                                         4              2           2
20      What is Type Checking?                                         4              1           2
21      What is Type Coercion?                                         4              1           2
22      What do you mean by implicit and explicit type
        conversions?                                                   4              1           2
23      List the three kinds of intermediate representation?           4              1           2
24      Write the intermediate representation for the given
        expression (a+b)*(c-d)                                         4              2           2
25      What is Quadruple notation? Give an example.                   4              2           2
26      What is Triple notation? Give an example.                      4              2           2
27      What is Indirect Triple notation? Give an example.             4              2           2
28      What is postfix notation?                                      4              1           2
29      Define abstract syntax tree?                                   4              1           2


30      What is DAG?                                                   4              1           2
31      Design a DAG for the given expression a := b*c + b*-c          4              2           2
32      Construct a DAG for the expression a := a + 30?                4              2           2
33      What are the functions used to create the nodes of
        syntax trees?                                                  4              2           2
34      What is the role of the Intermediate code generator?           4              1           2
35      Write the quadruple notation for the given expression
        (a+b)-(c/d)*e                                                  4              2           2
36      Write the triple notation for the given expression
        (a+b)-(c/d)*e                                                  4              2           2
37      Write the Indirect Triple notation for the given
        expression (a+b)-(c/d)*e                                       4              2           2
38      Write the intermediate representation for the given
        expression (a+b)-(c/d)*e                                       4              2           2

SHORT QUESTIONS WITH ANSWERS


1. What are the benefits of using machine-independent intermediate form?

• Retargeting is facilitated; a compiler for a different machine can be created by


attaching a back end for the new machine to an existing front end.
• A machine-independent code optimizer can be applied to the intermediate
representation.

2. List the three kinds of intermediate representation.


The three kinds of intermediate representations are
i. Syntax trees
ii. Postfix notation
iii. Three address code

3. How can you generate three-address code?


The three-address code is generated using semantic rules that are similar to those for
constructing syntax trees for generating postfix notation.


4. What is a syntax tree? Draw the syntax tree for the assignment statement a := b *
-c + b * -c.
• A syntax tree depicts the natural hierarchical structure of a source program.
• Syntax tree:

5. What is postfix notation?


A Postfix notation is a linearized representation of a syntax tree. It is a list of nodes of
the tree in which a node appears immediately after its children.
6. What is the usage of syntax directed definition.
Syntax trees for assignment statement are produced by the syntax directed definition.

7. Why “Three address code” is named so?


The reason for the term “Three address code” is that each usually contains three
addresses, two for operands and one for the result.

8. Define three-address code.


• Three-address code is a sequence of statements of the general form
x := y op z
where x, y and z are names, constants, or compiler-generated temporaries; op
stands for any operator, such as fixed or floating-point arithmetic operator, or a
logical operator on boolean-valued data.
• Three-address code is a linearized representation of a syntax tree or a dag in which
explicit names correspond to the interior nodes of the graph.

9. State quadruple
A quadruple is a record structure with four fields, which we call op, arg1, arg2 and
result.


10. What is called an abstract or syntax tree?


A tree in which each leaf represents an operand and each interior node an operator is
called as abstract or syntax tree.

11. Construct Three address code for the following


position := initial + rate * 60
Ans:
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3

12. What are triples?


• The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol
table or pointers into the triple structure itself; the three-field format that results is
called triples.
• In other words, the intermediate code format is known as triples.

13. What are the various methods of implementing three-address statements?


i. Quadruples
ii. Triples
iii. Indirect triples

14. What is known as calling sequence?

A sequence of actions taken on entry to and exit from each procedure is known as
calling sequence.

15. What is the intermediate code representation for the expression a or b and not c?
(Or) Translate a or b and not c into three address code.

Three-address sequence is
t1:= not c
t2:= b and t1


t3:= a or t2
16. What are the methods of representing a syntax tree?
i. Each node is represented as a record with a field for its operator and additional
fields for pointers to its children
ii. Nodes are allocated from an array of records and the index or position of the
node serves as the pointer to the node

LONG QUESTIONS
S. No   Long Questions                                                  CO Addressing   Blooms level   Marks
1       Explain Syntax Directed Definition in detail                          4              2           10
2       Explain Syntax Directed Translation in detail                         4              2           10
3       Construct a parse tree, syntax tree and annotated parse
        tree for the given input string 5*6+7
        S → E N
        E → E + T | T                                                         4              4           10
        T → T * F | F
        F → ( E ) | id
4       Design the dependency graph for the following grammar
        S → T List
        T → int | float | char | double                                       4              4           10
        List → List , id | id
5       Draw the syntax tree and DAG for the expression
        (a*b)+(c-d)*(a*b)+b                                                   4              4           10
6       What is a syntax tree? Write a syntax directed definition
        for constructing a syntax tree for an expression using the
        given grammar                                                         4              4           10
        E → E + T | E - T | T
        T → ( E ) | id | num
7       Define translation scheme and mention how it is different
        from a syntax-directed definition                                     4              4           10
8       Consider the following grammar G                                      4              5           10


        E1 → E
        E → E + n | n
        Give the parsing actions of a bottom-up parser for the
        string n+n.
9       Explain why every S-attributed definition is L-attributed.            4              2            5
10      Explain in detail how an L-attributed grammar can be
        converted into a translation scheme.                                  4              2           10
11      With the help of an example, explain how to evaluate an
        SDD at the nodes of a parse tree?                                     4              3           10
12      Give annotated parse trees for the following expressions:
        a) (3 + 4) * (5 + 6) n                                                4              3           10
        b) 1 * 2 * 3 * (4 + 5) n
        c) (9 + 8 * (7 + 6) + 5) * 4 n
13      Explain the concept of type conversion.                               4              3            5
14      Explain how declaration is done using syntax directed
        translation?                                                          4              3           10
15      Write the quadruples, triples and indirect triples for
        the following expressions.
        a. (a+b)/(c-d)*(e/f + g)                                              4              3           10
        b. (a+b-c/d/e)
16      a. Construct the syntax tree for the expression
           (p+q) + (r-s) * (p*q) - q.                                         4              4           10
        b. Give the applications of DAG.
17      Give a DAG representation scheme for the following
        expression: ( ( a - b ) * c ) - d                                     4              4           10


18      Consider the grammar with the following translation rules
        and E as the start symbol
        E → E1 $ T { E.value = E1.value * T.value }
          | T { E.value = T.value }
        T → T1 @ F { T.value = T1.value + F.value }                           4              4            5
          | F { T.value = F.value }
        F → num { F.value = num.value }
        Compute E.value for the root of the parse tree for the
        expression: 2 $ 3 @ 5 $ 6 @ 4
19      Consider the context free grammar given below
        E → E + T | E − T | T
        T → T ∗ F | T / F | F
        F → ( E ) | digit                                                     4              4           10
        S → E ;
        Obtain the SDD for the above grammar.
        Construct the parse tree and annotated parse tree for the
        input string 3*4+5;
20      Explain the notations of three-address code with an
        example                                                               4              3           10
GATE/COMPETITIVE EXAMS QUESTIONS


1. Consider the following translation scheme.
S → ER
R → *E{print(“*”);}R | ε
E → F + E {print(“+”);} | F
F → (S) | id {print(id.value);}
Here id is a token that represents an integer and id.value represents the corresponding integer
value. For an input ‘2 * 3 + 4’, this translation scheme prints
(A) 2 * 3 + 4
(B) 2 * +3 4
(C) 2 3 * 4 +
(D) 2 3 4+*


Answer: (D)
Explanation: Background Required to solve the question – Syntax Directed Translation and
Parse Tree Construction.

2. Consider the grammar with the following translation rules and E as the start
symbol.

E → E1 # T { E.value = E1.value * T.value }


| T{ E.value = T.value }
T → T1 & F { T.value = T1.value + F.value }
| F{ T.value = F.value }
F → num { F.value = num.value }
Compute E.value for the root of the parse tree for the expression: 2 # 3 & 5 # 6 & 4.
(A) 200
(B) 180
(C) 160
(D) 40
Answer: (C)
We can calculate the value by constructing the parse tree for the expression 2 # 3 & 5 # 6 &4.
Alternatively, we can calculate by considering following precedence and associativity rules.
Precedence in a grammar is enforced by making sure that a production rule with higher
precedence operator will never produce an expression with operator with lower precedence.
In the given grammar ‘&’ has higher precedence than ‘#’.

Left associativity for operator * in a grammar is enforced by making sure that for a
production rule like S -> S1 * S2 in grammar, S2 should never produce an expression with *.
On the other hand, to ensure right associativity, S1 should never produce an expression with
*.
In the given grammar, both ‘#’ and & are left-associative.

So expression 2 # 3 & 5 # 6 &4 will become


((2 # (3 & 5)) # (6 & 4))
Let us apply translation rules, we get
((2 * (3 + 5)) * (6 + 4)) = 160.


3. In a bottom-up evaluation of a syntax directed definition, inherited attributes can


(A) always be evaluated
(B) be evaluated only if the definition is L--attributed
(C) be evaluated only if the definition has synthesized attributes
(D) never be evaluated
Answer: (B)
Explanation: A Syntax Directed Definition (SDD) is called S Attributed if it has only
synthesized attributes.
L-Attributed Definitions contain both synthesized and inherited attributes but do not need to
build a dependency graph to evaluate them.

4. Consider the translation scheme shown below

S→TR
R → + T {print ('+');} R | ε
T → num {print (num.val);}
Here num is a token that represents an integer and num.val represents the corresponding
integer value. For an input string ‘9 + 5 + 2’, this translation scheme will print
(A) 9 + 5 + 2
(B) 9 5 + 2 +
(C) 9 5 2 + +
(D) + + 9 5 2
Answer: (B)
Explanation: Let us make the parse tree for 9+5+2 in a top-down manner, using leftmost
derivation.
Steps:
1) Expand S → T R
2) apply T → num ...
3) apply R → + T ...
4) apply T → num ...
5) apply R → + T ...
6) apply T → num ...
7) apply R → ε


Executing the print statements in the order in which they are encountered in this parse tree
gives the answer 95+2+.

5. Type checking is normally done during

(A) Lexical analysis


(B) Syntax analysis
(C) Syntax directed translation
(D) Code optimization
Answer: (C)
Explanation: Syntax-Directed Translation is used in following cases
• Conversion of infix to Postfix
• Calculation of infix expression
• For creating a Acyclic graph
• Type Checking
• Conversion of Binary number to Decimal
• Counting the numbers of bits (0 or 1 ) in a binary number
• Creation of syntax tree
• To generate Intermediate code
• Storing the data into Symbol table

6. Which one of the following statements is FALSE?

(A) Context-free grammar can be used to specify both lexical and syntax rules.
(B) Type checking is done before parsing.
(C) High-level language programs can be translated to different Intermediate
Representations.
(D) Arguments to a function can be passed using the program stack.

Answer: (B)
Explanation: Type checking is done at semantic analysis phase and parsing is done at syntax
analysis phase. And we know Syntax analysis phase comes before semantic analysis. So
Option (B) is False.
All other options seems Correct.


7. One of the purposes of using intermediate code in compilers is to

(A) make parsing and semantic analysis simpler.


(B) improve error recovery and error reporting.
(C) increase the chances of reusing the machine-independent code optimizer in other
compilers.
(D) improve the register allocation.
Answer: (C)
Explanation: After semantic Analysis, the code is converted into intermediate code which is
platform(OS + hardware) independent, the advantage of converting into intermediate code is
to improve the performance of code generation and to increase the chances of reusing the
machine-independent code optimizer in other compilers.
So, option (C) is correct.

UNIT V
Runtime Environments: Storage organization, Storage-allocation strategies, Symbol tables,
Activation records.
Code Optimization: The principal sources of optimization, Basic blocks and Flow graphs,
data-flow analysis of flow graphs.
Code Generation: Issues in the design of a code generator, the target machine code, Next-
use information, a simple code generator, Code-generation algorithm.

UNIT-V: Runtime Environments, Code Optimization, Code Generation    Planned Hours: 13

S. No.   Topic Learning Outcomes                                  COs     Blooms Levels
1.       Identifies organization and allocation of storage        CO 3    L1
2.       Examine symbol table and activation record               CO 3    L2
3.       Understand optimization of given code                    CO 5    L1
4.       Identifies data flow analysis of flow graphs             CO 5    L2
5.       Understand the generation of target code                 CO 5    L1

5. RUN-TIME ENVIRONMENTS

SOURCE LANGUAGE ISSUES


Procedures:
A procedure definition is a declaration that associates an identifier with a statement. The
identifier is the procedure name, and the statement is the procedure body. For example, the
following is the definition of procedure named readarray:
procedure readarray;
var i : integer;
begin
for i := 1 to 9 do read(a[i])
end;
When a procedure name appears within an executable statement, the procedure is said to be
called at that point.

Activation trees:
An activation tree is used to depict the way control enters and leaves activations. In an
activation tree,
1. Each node represents an activation of a procedure
2. The root represents the activation of the main program
3. The node for a is the parent of the node for b if and only if control flows from activation a
to b
4. The node for a is to the left of the node for b if and only if the lifetime of a occurs before
the lifetime of b

Control stack:
A control stack is used to keep track of live procedure activations. The idea is to push
the node for an activation onto the control stack as the activation begins and to pop the node
when the activation ends. The contents of the control stack are related to paths to the root of
the activation tree. When node n is at the top of control stack, the stack contains the nodes
along the path from n to the root.

The Scope of a Declaration:


A declaration is a syntactic construct that associates information with a name. Declarations
may be explicit, such as:
var i : integer;
or they may be implicit. For example, any variable name starting with I is assumed to denote
an integer. The portion of the program to which a declaration applies is called the scope of
that declaration.

Binding of names:
Even if each name is declared once in a program, the same name may denote different
data objects at run time. “Data object” corresponds to a storage location that holds values.
The term environment refers to a function that maps a name to a storage location. The term
state refers to a function that maps a storage location to the value held there. When an
environment associates storage location s with a name x, we say that x is bound to s. This
association is referred to as a binding of x.


Fig. 5.1 Two-stage mapping from names to values

5.1. STORAGE ORGANIZATION


The executing target program runs in its own logical address space in which each
program value has a location. The management and organization of this logical address space
is shared between the complier, operating system and target machine. The operating system
maps the logical address into physical addresses, which are usually spread throughout
memory.

Fig. 5.1.1 Typical subdivision of run-time memory into code and data areas

• Run-time storage comes in blocks, where a byte is the smallest unit of memory. Four
bytes form a machine word. Multibyte objects are stored in consecutive bytes and
given the address of the first byte.
• The storage layout for data objects is strongly influenced by the addressing constraints
of the target machine.
• A character array of length 10 needs only enough bytes to hold 10 characters, a
compiler may allocate 12 bytes to get alignment, leaving 2 bytes unused.
• This unused space due to alignment considerations is referred to as padding.

• The size of some program objects may be known at compile time, and these may be
placed in an area called static.
• The dynamic areas used to maximize the utilization of space at run time are stack and
heap.

5.2. ACTIVATION RECORDS:


Procedure calls and returns are usually managed by a run-time stack called the control
stack. Each live activation has an activation record on the control stack: the record for the
root of the activation tree is at the bottom, and the record for the latest activation is at the
top of the stack. The contents of the activation record vary with the language being
implemented.
• Return Value: It is used by calling procedure to return a value to calling procedure
• Actual Parameter: It is used by calling procedures to supply parameters to the called
procedures
• Control Link: It points to activation record of the caller
• Access Link: It is used to refer to non-local data held in other activation records
• Saved Machine Status: It holds the information about status of machine before the
procedure is called
• Local Data: It holds the data that is local to the execution of the procedure
• Temporaries: It stores the value that arises in the evaluation of an expression

Figure 5.2.1 A general activation record
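
A sketch of this layout as a Python record, with one entry per field of the general
activation record (purely illustrative; a real compiler lays these fields out as byte
offsets within the stack frame):

    from dataclasses import dataclass, field
    from typing import Any, Optional

    @dataclass
    class ActivationRecord:
        return_value: Any = None                             # value returned to the caller
        actual_params: list = field(default_factory=list)    # parameters from the caller
        control_link: Optional['ActivationRecord'] = None    # caller's activation record
        access_link: Optional['ActivationRecord'] = None     # record holding non-local data
        saved_machine_status: dict = field(default_factory=dict)
        local_data: dict = field(default_factory=dict)
        temporaries: dict = field(default_factory=dict)

    # The control stack: push a record when a procedure is called, pop on return.
    main = ActivationRecord()
    callee = ActivationRecord(control_link=main)
    control_stack = [main, callee]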


5.3. STORAGE ALLOCATION STRATEGIES


The different storage allocation strategies are:
1. Static allocation - lays out storage for all data objects at compile time
2. Stack allocation - manages the run-time storage as a stack
3. Heap allocation – allocates and deallocates storage as needed at run time from a data
area known as heap

STATIC ALLOCATION:
In static allocation, names are bound to storage as the program is compiled, so there is
no need for a run-time support package. Since the bindings do not change at run time,
every time a procedure is activated, its names are bound to the same storage locations.
Therefore values of local names are retained across activations of a procedure.

That is, when control returns to a procedure the values of the locals are the same as
they were when control left the last time. From the type of a name, the compiler decides the
amount of storage for the name and decides where the activation records go. At compile time,
we can fill in the addresses at which the target code can find the data it operates on.
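
The effect can be seen in the following C fragment (a minimal sketch): the static local n
occupies one fixed location, so its value is retained across activations:

#include <stdio.h>

void count_calls(void) {
    static int n = 0;   /* statically allocated: the same location for every activation */
    n = n + 1;          /* the value left by the previous activation is still here      */
    printf("call number %d\n", n);
}

int main(void) {
    count_calls();      /* prints: call number 1 */
    count_calls();      /* prints: call number 2 */
    return 0;
}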

STACK ALLOCATION OF SPACE:


• All compilers for languages that use procedures, functions or methods as units of
user-defined actions manage at least part of their run-time memory as a stack.
• Each time a procedure is called, space for its local variables is pushed onto a stack,
and when the procedure terminates, that space is popped off the stack.

Calling sequences:
Procedure calls are implemented by what is known as a calling sequence, which
consists of code that allocates an activation record on the stack and enters information into its
fields. A return sequence is similar code that restores the state of the machine so the calling
procedure can continue its execution after the call. The code in a calling sequence is often
divided between the calling procedure (caller) and the procedure it calls (callee).

When designing calling sequences and the layout of activation records, the following
principles are helpful:


• Values communicated between caller and callee are generally placed at the beginning
of the callee’s activation record, so they are as close as possible to the caller’s
activation record.
• Fixed-length items are generally placed in the middle. Such items include the control link,
the access link, and the machine status fields.
• Items whose size may not be known early enough are placed at the end of the
activation record. The most common example is a dynamically sized array, where
the value of one of the callee’s parameters determines the length of the array.
• We must locate the top-of-stack pointer judiciously. A common approach is to
have it point to the end of the fixed-length fields in the activation record. Fixed-length
data can then be accessed by fixed offsets, known to the intermediate-code
generator, relative to the top-of-stack pointer.

Fig. Division of tasks between caller and callee


The calling sequence and its division between caller and callee are as follows:
• The caller evaluates the actual parameters
• The caller stores a return address and the old value of top_sp into the callee’s
activation record. The caller then increments top_sp past its own local data and
temporaries and the callee’s parameter and status fields
• The callee saves the register values and other status information
• The callee initializes its local data and begins execution.

A suitable, corresponding return sequence is:
• The callee places the return value next to the parameters.
• Using the information in the machine-status field, the callee restores top_sp and other
registers, and then branches to the return address that the caller placed in the status
field.
• Although top_sp has been decremented, the caller knows where the return value is,
relative to the current value of top_sp; the caller therefore may use that value.

Variable length data on stack:


• The run-time memory management system must deal frequently with the allocation of
space for objects, the sizes of which are not known at the compile time, but which are
local to a procedure and thus may be allocated on the stack.
• The reason to prefer placing objects on the stack is that we avoid the expense of
garbage collecting their space.
• The same scheme works for objects of any type if they are local to the procedure
called and have a size that depends on the parameters of the call.
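
C99 variable-length arrays are a concrete instance (a minimal sketch): the size of a depends
on the parameter n, yet the array lives on the stack and is popped automatically on return:

#include <stdio.h>

void work(int n) {
    int a[n];                        /* size known only at the call, not at compile time */
    for (int i = 0; i < n; i++)
        a[i] = i * i;
    printf("last = %d\n", a[n - 1]); /* a's space disappears when work returns           */
}

int main(void) {
    work(5);                         /* prints: last = 16 */
    return 0;
}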

HEAP ALLOCATION:
Stack allocation strategy cannot be used if either of the following is possible:
1. The values of local names must be retained when an activation ends.
2. A called activation outlives the caller.

Heap allocation parcels out pieces of contiguous storage, as needed for activation
records or other objects. Pieces may be deallocated in any order, so over time the heap
will consist of alternating areas that are free and in use.

Fig. 5.3.1 Records for live activations need not be adjacent in a heap
• The record for an activation of procedure r is retained when the activation ends.
• Therefore, the record for the new activation q(1, 9) cannot follow that for s physically.
• If the retained activation record for r is deallocated, there will be free space in the heap
between the activation records for s and q.
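
Case 1 above can be sketched in C: the storage created inside the callee must outlive the
activation, so it cannot go on the stack and is taken from the heap instead:

#include <stdlib.h>

int *make_counter(void) {
    int *p = malloc(sizeof *p);   /* heap storage survives the return of make_counter */
    if (p) *p = 0;                /* a stack local could not be returned safely       */
    return p;
}

int main(void) {
    int *c = make_counter();
    if (c) { *c += 1; free(c); }  /* pieces may be deallocated in any order           */
    return 0;
}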

5.4. SYMBOL TABLE

Symbol table is an important data structure used in a compiler. A symbol table is used to store
information about the occurrence of various entities such as objects, classes, variable
names, interfaces, function names etc. It is used by both the analysis and synthesis phases.

The symbol table is used for the following purposes:
• It is used to store the name of all entities in a structured form at one place.
• It is used to verify if a variable has been declared.
• It is used to determine the scope of a name.
• It is used to implement type checking by verifying that assignments and expressions in the
source code are semantically correct.
A symbol table can be either a linear list or a hash table. Using the following format, it
maintains an entry for each name:
<symbol name, type, attribute>
For example, suppose the table stores information about the following variable
declaration:
static int salary
then, it stores an entry in the following format:
<salary, int, static>
The attribute field contains the information related to the name.
Implementation:
The symbol table can be implemented as an unordered list if the compiler only has to handle
a small amount of data.
A symbol table can be implemented in one of the following techniques:
• Linear (sorted or unsorted) list
• Hash table
• Binary search tree
Symbol tables are most often implemented as hash tables.

Operations
The symbol table provides the following operations:

insert()
The insert() operation is used more frequently in the analysis phase, when tokens are
identified and names are stored in the table.
The insert() operation is used to insert information into the symbol table, such as a unique
name occurring in the source code.

In the source code, the attribute for a symbol is the information associated with that symbol.
The information contains the state, value, type and scope about the symbol.
The insert() function takes the symbol and its attributes as arguments.
For example:
int x;
is processed by the compiler as:
insert (x, int)

lookup()
In the symbol table, lookup() operation is used to search a name. It is used to determine:
• The existence of symbol in the table.
• The declaration of the symbol before it is used.
• Check whether the name is used in the scope.
• Initialization of the symbol.
• Checking whether the name is declared multiple times.
The basic format of lookup() function is as follows:
lookup (symbol)
This format varies according to the programming language.
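
A minimal sketch in C of the <symbol name, type, attribute> entry with insert() and lookup()
over an unsorted list (a hash-table implementation differs only in how the entry is located;
the fixed capacity and the lack of an overflow check are assumptions of the sketch):

#include <string.h>

struct entry {                       /* <symbol name, type, attribute> */
    const char *name, *type, *attribute;
};

static struct entry table[256];      /* illustrative fixed capacity */
static int count = 0;

void insert(const char *name, const char *type, const char *attribute) {
    table[count].name = name;        /* e.g. insert("salary", "int", "static") */
    table[count].type = type;
    table[count].attribute = attribute;
    count++;
}

struct entry *lookup(const char *name) {
    for (int i = count - 1; i >= 0; i--)   /* most recent declaration first */
        if (strcmp(table[i].name, name) == 0)
            return &table[i];              /* found: declared before use    */
    return 0;                              /* not found: undeclared name    */
}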

Data structure for symbol table


A compiler maintains two types of symbol tables: a global symbol table and scope symbol
tables. The global symbol table can be accessed by all procedures, whereas a scope symbol
table is visible only within its own scope.
The scope of a name and the symbol tables are arranged in a hierarchical structure as shown below:
int value = 10;

void sum_num()
{
    int num_1;
    int num_2;
    {
        int num_3;
        int num_4;
    }
    int num_5;
    {
        int num_6;
        int num_7;
    }
}

void sum_id()
{
    int id_1;
    int id_2;
    {
        int id_3;
        int id_4;
    }
    int id_5;
}
The above program can be represented in a hierarchical data structure of symbol tables:

The global symbol table contains one global variable and two procedure names. The name
mentioned in the sum_num table is not available for sum_id and its child tables.
The data-structure hierarchy of symbol tables is maintained by the semantic analyzer. To
search for a name in a symbol table, the following algorithm is used:
• First, the name is searched in the current symbol table.
• If the name is found, the search is complete; otherwise the name is searched in the
parent’s symbol table, and so on,
• until the name is found or the global symbol table has been searched.
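
This search can be sketched in C by chaining each scope’s table to its parent (the type and
field names here are hypothetical):

#include <string.h>

struct sym   { const char *name; struct sym *next; };
struct scope { struct scope *parent; struct sym *entries; }; /* parent is NULL for global */

struct sym *lookup_scoped(struct scope *s, const char *name) {
    for (; s != 0; s = s->parent) {                /* current table first, then parents  */
        for (struct sym *e = s->entries; e; e = e->next)
            if (strcmp(e->name, name) == 0)
                return e;                          /* found in nearest enclosing scope   */
    }
    return 0;                                      /* global table searched: undeclared  */
}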

5.5. CODE OPTIMIZATION

INTRODUCTION
The code produced by straightforward compiling algorithms can often be made to
run faster or take less space, or both. This improvement is achieved by program
transformations that are traditionally called optimizations. Compilers that apply code-
improving transformations are called optimizing compilers.

Optimizations are classified into two categories. They are


• Machine independent optimizations
• Machine dependent optimizations
Machine independent optimizations:
• Machine independent optimizations are program transformations that improve the
target code without taking into consideration any properties of the target machine.
Machine dependent optimizations:
• Machine dependent optimizations are based on register allocation and utilization of
special machine-instruction sequences.

The criteria for code improvement transformations:


Simply stated, the best program transformations are those that yield the most benefit
for the least effort. The transformations provided by an optimizing compiler should have
several properties. They are:

1. The transformation must preserve the meaning of programs. That is, the optimization
must not change the output produced by a program for a given input, or cause an error
such as division by zero, that was not present in the original source program.

Fig. 5.4.1 Organization of the code optimizer


2. A transformation must, on the average, speedup programs by a measurable amount.
We are also interested in reducing the size of the compiled code although the size of
the code has less importance than it once had. Not every transformation succeeds in
improving every program, occasionally an “optimization” may slow down a program
slightly.

3. The transformation must be worth the effort. It does not make sense for a compiler
writer to expend the intellectual effort to implement a code-improving transformation
and have the compiler expend the additional time compiling source programs if this
effort is not repaid when the target programs are executed. “Peephole”
transformations of this kind are simple enough and beneficial enough to be included
in any compiler.

Flow analysis is a fundamental prerequisite for many important types of code
improvement. Generally control flow analysis precedes data flow analysis. Control flow
analysis (CFA) represents the flow of control, usually in the form of graphs; CFA builds
constructs such as the control flow graph and the call graph. Data flow analysis (DFA) is the
process of asserting and collecting information prior to program execution about the possible
modification, preservation, and use of certain entities (such as values or attributes of
variables) in a computer program.


5.6. PRINCIPAL SOURCES OF OPTIMISATION

A transformation of a program is called local if it can be performed by looking only at
the statements in a basic block; otherwise, it is called global. Many transformations can be
performed at both the local and global levels. Local transformations are usually performed
first.

Function-Preserving Transformations
There are a number of ways in which a compiler can improve a program without
changing the function it computes.
Function preserving transformations examples:
• Common sub-expression elimination
• Copy propagation
• Dead-code elimination
• Constant folding
The other transformations come up primarily when global optimizations are performed.
Frequently, a program will include several calculations of the offset in an array. Some
of the duplicate calculations cannot be avoided by the programmer because they lie below the
level of detail accessible within the source language.

Common Sub expressions elimination:


• An occurrence of an expression E is called a common sub-expression if E was previously
computed, and the values of variables in E have not changed since the previous
computation. We can avoid recomputing the expression if we can use the previously
computed value.
For example
t1: = 4*i
t2: = a [t1]
t3: = 4*j
t4: = 4*i
t5: = n
t6: = b [t4] +t5
The above code can be optimized using the common sub-expression elimination as

t1: = 4*i
t2: = a [t1]
t3: = 4*j
t5: = n
t6: = b [t1] +t5
The common sub-expression t4 := 4*i is eliminated, since its value is already computed in t1
and the value of i has not changed between that computation and its use.

Copy Propagation:
Assignments of the form f := g are called copy statements, or copies for short. The idea
behind the copy-propagation transformation is to use g for f, wherever possible after the copy
statement f := g. Copy propagation means the use of one variable instead of another. This may
not appear to be an improvement, but as we shall see it gives us an opportunity to eliminate x.
For example:
x=Pi;
A=x*r*r;
The optimization using copy propagation can be done as follows:
A=Pi*r*r;
Here the variable x is eliminated.

Dead-Code Eliminations:
A variable is live at a point in a program if its value can be used subsequently;
otherwise, it is dead at that point. A related idea is dead or useless code, statements that
compute values that never get used. While the programmer is unlikely to introduce any dead
code intentionally, it may appear as the result of previous transformations.
Example:
i=0;
if(i==1)
{
a=b+5;
}
Here, ‘if’ statement is dead code because this condition will never get satisfied.

Constant folding:
Deducing at compile time that the value of an expression is a constant and using the
constant instead is known as constant folding. One advantage of copy propagation is that it
often turns the copy statement into dead code.
For example,
a=3.14157/2 can be replaced by
a=1.570785, thereby eliminating the division operation.

Loop Optimizations:
In loops, especially in the inner loops, programs tend to spend the bulk of their time.
The running time of a program may be improved if the number of instructions in an inner
loop is decreased, even if we increase the amount of code outside that loop.
Three techniques are important for loop optimization:
• Code motion, which moves code outside a loop;
• Induction-variable elimination, which eliminates redundant induction variables from inner loops;
• Reduction in strength, which replaces and expensive operation by a cheaper one, such
as a multiplication by an addition.

Fig. 5.6.1 Flow graph

Code Motion:

An important modification that decreases the amount of code in a loop is code


motion. This transformation takes an expression that yields the same result independent of the
number of times a loop is executed (a loop-invariant computation) and places the expression
before the loop. Note that the notion “before the loop” assumes the existence of an entry for
the loop. For example, evaluation of limit-2 is a loop-invariant computation in the following
while-statement:

while (i <= limit-2) /* statement does not change limit*/

Code motion will result in the equivalent of


t= limit-2;
while (i<=t) /* statement does not change limit or t */

Induction Variables:

Loops are usually processed inside out. For example consider the loop around B3.
Note that the values of j and t4 remain in lock-step; every time the value of j decreases by 1,
that of t4 decreases by 4 because 4*j is assigned to t4. Such identifiers are called induction
variables.

When there are two or more induction variables in a loop, it may be possible to get rid
of all but one, by the process of induction-variable elimination. For the inner loop around B3
in Fig. 5.6.1 we cannot get rid of either j or t4 completely; t4 is used in B3 and j in B4.

However, we can illustrate reduction in strength and illustrate a part of the process of
induction-variable elimination. Eventually j will be eliminated when the outer loop of B2- B5
is considered.
Example:
As the relationship t4 := 4*j surely holds after such an assignment to t4 in Fig. 5.6.1, and t4 is
not changed elsewhere in the inner loop around B3, it follows that just after the statement
j := j-1 the relationship t4 := 4*j-4 must hold. We may therefore replace the assignment
t4 := 4*j by t4 := t4-4. The only problem is that t4 does not have a value when we enter block
B3 for the first time. Since we must maintain the relationship t4 = 4*j on entry to the block
B3, we place an initialization of t4 at the end of the block where j itself is initialized, shown
by the dashed addition to block B1 in Fig. 5.6.1.

The replacement of a multiplication by a subtraction will speed up the object code if
multiplication takes more time than addition or subtraction, as is the case on many machines.

Reduction in Strength:
Reduction in strength replaces expensive operations by equivalent cheaper ones on
the target machine. Certain machine instructions are considerably cheaper than others and can
often be used as special cases of more expensive operators. For example, x² is invariably
cheaper to implement as x*x than as a call to an exponentiation routine. Fixed-point
multiplication or division by a power of two is cheaper to implement as a shift. Floating-point
division by a constant can be implemented as multiplication by a constant, which may be
cheaper.

Fig 5.6.2 B5 and B6 after common sub expression elimination


PEEPHOLE OPTIMIZATION
A statement-by-statement code-generations strategy often produces target code that
contains redundant instructions and suboptimal constructs. The quality of such target code
can be improved by applying “optimizing” transformations to the target program.
A simple but effective technique for improving the target code is peephole
optimization, a method for trying to improve the performance of the target program by
examining a short sequence of target instructions (called the peephole) and replacing these
instructions by a shorter or faster sequence, whenever possible.
The peephole is a small, moving window on the target program. The code in the
peephole need not be contiguous, although some implementations do require this. It is
characteristic of peephole optimization that each improvement may spawn opportunities for
additional improvements.

Characteristics of peephole optimizations:


• Redundant-instruction elimination
• Flow-of-control optimizations
• Algebraic simplifications
• Use of machine idioms
• Unreachable code elimination

Redundant Loads And Stores:


If we see the instructions sequence
(1) MOV R0,a
(2) MOV a,R0
We can delete instruction (2) because whenever (2) is executed, (1) will ensure that
the value of a is already in register R0. If (2) had a label, we could not be sure that (1) was
always executed immediately before (2), and so we could not remove (2).

Unreachable Code:
Another opportunity for peephole optimizations is the removal of unreachable
instructions. An unlabeled instruction immediately following an unconditional jump may be
removed. This operation can be repeated to eliminate a sequence of instructions. For

example, for debugging purposes, a large program may have within it certain segments that
are executed only if a variable debug is 1. In C, the source code might look like:

#define debug 0
….
if ( debug ) {
    print debugging information
}
In the intermediate representation the if-statement may be translated as:
    if debug = 1 goto L1
    goto L2
L1: print debugging information
L2: …………………………… (a)
One obvious peephole optimization is to eliminate jumps over jumps. Thus, no matter what
the value of debug, (a) can be replaced by:
    if debug ≠ 1 goto L2
    print debugging information
L2: …………………………… (b)
Because debug is the constant 0, constant propagation turns (b) into:
    if 0 ≠ 1 goto L2
    print debugging information
L2: …………………………… (c)

As the argument of the first statement of (c) evaluates to a constant true, it can be replaced by
goto L2. Then all the statements that print debugging aids are manifestly unreachable and can
be eliminated one at a time.

Flows-Of-Control Optimizations:
The unnecessary jumps can be eliminated in either the intermediate code or the target
code by the following types of peephole optimizations. We can replace the jump sequence

    goto L1
    ….
L1: goto L2                  (d)
by the sequence
    goto L2

….
L1: goto L2

If there are now no jumps to L1, then it may be possible to eliminate the statement L1: goto
L2, provided it is preceded by an unconditional jump. Similarly, the sequence

    if a < b goto L1
    ….
L1: goto L2                  (e)
can be replaced by
    if a < b goto L2
    ….
L1: goto L2

• Finally, suppose there is only one jump to L1 and L1 is preceded by an unconditional
goto. Then the sequence
    goto L1
    ….
L1: if a < b goto L2
L3:                          (f)
may be replaced by
    if a < b goto L2
    goto L3
    ….
L3:

While the number of instructions in (e) and (f) is the same, we sometimes skip the
unconditional jump in (f), but never in (e). Thus (f) is superior to (e) in execution time.

Algebraic Simplification:
There is no end to the amount of algebraic simplification that can be attempted
through peephole optimization. Only a few algebraic identities occur frequently enough that
it is worth considering implementing them. For example, statements such as
x := x+0 or
x := x * 1

are often produced by straightforward intermediate code-generation algorithms, and they can
be eliminated easily through peephole optimization.

Reduction in Strength:
Reduction in strength replaces expensive operations by equivalent cheaper ones on
the target machine. Certain machine instructions are considerably cheaper than others and can
often be used as special cases of more expensive operators.
For example, x² is invariably cheaper to implement as x*x than as a call to an
exponentiation routine. Fixed-point multiplication or division by a power of two is cheaper to
implement as a shift. Floating-point division by a constant can be implemented as
multiplication by a constant, which may be cheaper.
x² → x*x

Use of Machine Idioms:


The target machine may have hardware instructions to implement certain specific
operations efficiently. For example, some machines have auto-increment and auto-decrement
addressing modes. These add or subtract one from an operand before or after using its value.
The use of these modes greatly improves the quality of code when pushing or popping a
stack, as in parameter passing. These modes can also be used in code for statements like i :
=i+1.

i:=i+1 → i++
i:=i-1 → i- -

OPTIMIZATION OF BASIC BLOCKS

There are two types of basic block optimizations. They are:


• Structure-Preserving Transformations
• Algebraic Transformations

1. Structure-Preserving Transformations:
The primary structure-preserving transformations on basic blocks are:
• Common sub-expression elimination

• Dead code elimination
• Renaming of temporary variables
• Interchange of two independent adjacent statements.

a. Common sub-expression elimination:


Common sub-expressions need not be computed over and over again. Instead they can
be computed once and kept in a store from which they are referenced.
Example:
a: =b+c
b: =a-d
c: =b+c
d: =a-d
The 2nd and 4th statements compute the same expression a-d (the 1st and 3rd both contain
b+c, but b is redefined in between, so they are not common sub-expressions).
The basic block can be transformed to
a: = b+c
b: = a-d
c: = b+c
d: = b

b. Dead code elimination:


It is possible that a large amount of dead (useless) code may exist in the program.
This might be especially caused when introducing variables and procedures as part of
construction or error-correction of a program - once declared and defined, one forgets to
remove them in case they serve no purpose. Eliminating these will definitely optimize the
code.

c. Renaming of temporary variables:


A statement t := b+c, where t is a temporary name, can be changed to u := b+c, where u is
another temporary name, and all uses of t changed to u. In this way a basic block is
transformed into an equivalent block called a normal-form block.

d. Interchange of two independent adjacent statements:
Two adjacent statements can be interchanged or reordered in the basic block when the
value of t1 does not affect the value of t2 (and vice versa):
t1:=b+c
t2:=x+y

2. Algebraic Transformations:

Algebraic identities represent another important class of optimizations on basic
blocks. This includes simplifying expressions or replacing expensive operations by cheaper
ones, i.e. reduction in strength. Another class of related optimizations is constant folding.
Here we evaluate constant expressions at compile time and replace the constant expressions
by their values. Thus the expression 2*3.14 would be replaced by 6.28.
The relational operators <=, >=, <, >, + and = sometimes generate unexpected
common sub-expressions. Associative laws may also be applied to expose common sub-
expressions. For example, if the source code has the assignments
a :=b+c
e :=c+d+b
the following intermediate code may be generated:
a :=b+c
t :=c+d
e :=t+b
Using the associative law instead, e can be evaluated as e := a+d, exposing the common
sub-expression b+c.
Example:
x:=x+0 can be removed
x:=y**2 can be replaced by a cheaper statement x:=y*y

The compiler writer should examine the language specification carefully to determine
what rearrangements of computations are permitted, since computer arithmetic does not
always obey the algebraic identities of mathematics. Thus, a compiler may evaluate x*y-x*z
as x*(y-z) but it may not evaluate a+(b-c) as (a+b)-c.


5.7. BASIC BLOCKS AND FLOW GRAPHS


Basic Blocks
• A basic block is a sequence of consecutive statements in which flow of control enters
at the beginning and leaves at the end without any halt or possibility of branching
except at the end.
• The following sequence of three-address statements forms a basic block
t1: = a * a
t2: = a * b
t3: = 2 * t2
t4: = t1 + t3
t5: = b * b
t6: = t4 + t5
Basic Block Construction:
Algorithm: Partition into basic block
Input: A sequence of three-address statements
Output: A list of basic blocks with each three-address statement in exactly one block
Method:
1. We first determine the set of leaders, the first statements of basic blocks. The rules we
use are of the following:
a. The first statement is a leader
b. Any statement that is the target of a conditional or unconditional goto is a leader
c. Any statement that immediately follows a goto or conditional goto statement is a
leader
2. For each leader, its basic block consists of the leader and all statements up to but not
including the next leader or the end of the program.
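
A sketch of the leader computation in C, over arrays describing the quadruples (this
representation is hypothetical: is_jump[i] is nonzero when statement i is a conditional or
unconditional goto whose target index is jump_target[i]):

void mark_leaders(int n, const int is_jump[], const int jump_target[], int leader[]) {
    for (int i = 0; i < n; i++) leader[i] = 0;
    leader[0] = 1;                            /* rule (a): the first statement        */
    for (int i = 0; i < n; i++) {
        if (is_jump[i]) {
            leader[jump_target[i]] = 1;       /* rule (b): target of a goto           */
            if (i + 1 < n) leader[i + 1] = 1; /* rule (c): statement following a goto */
        }
    }
    /* rule 2: each block runs from a leader up to, but not including, the next leader */
}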

Consider the following source code for dot product of two vectors:
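
A C-like source fragment consistent with the three-address code given below would be:

prod = 0;
i = 1;
do {
    prod = prod + a[i] * b[i];
    i = i + 1;
} while (i <= 20);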

The three-address code for the above source program is given as:
(1) prod := 0
(2) i := 1
(3) t1 := 4* i
(4) t2 := a[t1] /*compute a[i] */
(5) t3 := 4* i
(6) t4 := b[t3] /*compute b[i] */
(7) t5 := t2*t4
(8) t6 := prod+t5
(9) prod := t6
(10) t7 := i+1
(11) i := t7
(12) if i<=20 goto (3)
Basic block 1: Statement (1) to (2)
Basic block 2: Statement (3) to (12)

Transformations on Basic Blocks:

A number of transformations can be applied to a basic block without changing the set of
expressions computed by the block. Two important classes of transformations are:
• Structure-preserving transformations
• Algebraic transformations

1. Structure preserving transformations:


a) Common sub expression elimination:
Before:                 After:
a := b + c              a := b + c
b := a - d              b := a - d
c := b + c              c := b + c
d := a - d              d := b

Since the second and fourth expressions compute the same expression, the basic block can be
transformed as above.

b) Dead-code elimination:
Suppose x is dead, that is, never subsequently used, at the point where the statement x : = y +
z appears in a basic block. Then this statement may be safely removed without changing the
value of the basic block.

c) Renaming temporary variables:


A statement t : = b + c ( t is a temporary ) can be changed to u : = b + c (u is a new
temporary) and all uses of this instance of t can be changed to u without changing the value
of the basic block. Such a block is called a normal-form block.

d) Interchange of statements:
Suppose a block has the following two adjacent statements:
t1 : = b + c
t2 : = x + y

We can interchange the two statements without affecting the value of the block if and only if
neither x nor y is t1 and neither b nor c is t2.

2. Algebraic transformations:

Algebraic transformations can be used to change the set of expressions computed by a basic
block into an algebraically equivalent set.
Examples:
i. x : = x + 0 or x : = x * 1 can be eliminated from a basic block without changing the set
of expressions it computes.
ii. The exponential statement x : = y * * 2 can be replaced by x : = y * y.

Flow Graphs
• Flow graph is a directed graph containing the flow-of-control information for the set
of basic blocks making up a program.
• The nodes of the flow graph are basic blocks. It has a distinguished initial node.
• E.g.: Flow graph for the vector dot product is given as follows:


Fig. 5.7.1 Flow graph for the program

• B1 is the initial node. B2 immediately follows B1, so there is an edge from B1 to B2.
The target of the jump from the last statement of B1 is the first statement of B2, so there is
an edge from B1 (last statement) to B2 (first statement).
• B1 is the predecessor of B2, and B2 is a successor of B1.
Loops
A loop is a collection of nodes in a flow graph such that
1. All nodes in the collection are strongly connected.
2. The collection of nodes has a unique entry.
• A loop that contains no other loops is called an inner loop.

5.8. DATAFLOW ANALYSIS:

It is the analysis of the flow of data in a control flow graph, i.e., the analysis that
determines the information regarding the definition and use of data in a program. With the
help of this analysis, optimization can be done. In general, it is the process by which the
values of interest are computed at each program point. The data flow properties represent
information which can be used for optimization.

Basic Terminologies –
Definition Point: a point in a program containing some definition.
Reference Point: a point in a program containing a reference to a data item.

Evaluation Point: a point in a program containing evaluation of expression.

Data Flow Properties –


Available Expression – An expression is said to be available at a program point x iff it is
evaluated along every path reaching x and none of its operands is redefined after the last such
evaluation. An expression is available at its evaluation point.
An expression a+b is said to be available if none of its operands gets modified before its use.
Example –
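For instance (an illustrative fragment in three-address form):
    t1 := a + b        /* a+b becomes available here                       */
    c  := t1 * 2       /* neither a nor b is modified: a+b stays available */
    t2 := a + b        /* common sub-expression: t1 can be reused          */
Had a been reassigned between the first and third statements, a+b would no longer be
available at the third statement.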

Advantage –
• It is used to eliminate common sub expressions.

Reaching Definition – A definition D reaches a point x if there is a path from D to x along
which D is not killed, i.e., not redefined.

Example –
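For instance (an illustrative fragment):
d1: x := 5
    if cond goto L
d2: x := 8
L:  y := x
Both d1 and d2 reach the point L: d1 along the branch that skips d2, and d2 along the
fall-through path, since neither definition is killed (redefined) on its own path.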

Advantage –
• It is used in constant and variable propagation.
Live variable – A variable is said to be live at some point p if, along some path starting at p,
the variable is used before it is redefined; otherwise it is dead at that point.
Example –
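For instance (an illustrative fragment):
    x := 2         /* x is live here: it is used below before being redefined */
    y := x + 1     /* last use of the first value of x; x is dead just after  */
    x := 10        /* x is redefined, and live again up to its use below      */
    z := x * y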

Advantage –
• It is useful for register allocation.
• It is used in dead code elimination.

Busy Expression – An expression is busy along a path iff it is evaluated along that path and
none of its operands is redefined before that evaluation along the path.
Advantage –
• It is used for performing code movement optimization.


5.9. CODE GENERATION


The final phase in compiler model is the code generator. It takes as input an
intermediate representation of the source program and produces as output an equivalent target
program. The code generation techniques presented below can be used whether or not an
optimizing phase occurs before code generation.

Fig. 5.8.1 Position of code generator

5.10. ISSUES IN THE DESIGN OF A CODE GENERATOR


The following issues arise during the code generation phase:
1. Input to code generator
2. Target program
3. Memory management
4. Instruction selection
5. Register allocation
6. Evaluation order
1. Input to code generator:
The input to the code generation consists of the intermediate representation of the source
program produced by front end, together with information in the symbol table to
determine run-time addresses of the data objects denoted by the names in the
intermediate representation.
• Intermediate representation can be:
a) Linear representation such as postfix notation
b) Three address representation such as quadruples
c) Virtual machine representation such as stack machine code
d) Graphical representations such as syntax trees and DAG’s.

• Prior to code generation, the source program has been scanned, parsed and translated by
the front end into an intermediate representation, along with the necessary type checking.
Therefore, the input to code generation is assumed to be error-free.
2. Target program:
The output of the code generator is the target program. The output may be :
a. Absolute machine language
- It can be placed in a fixed memory location and can be executed immediately.
b. Relocatable machine language
- It allows subprograms to be compiled separately.
c. Assembly language
- Code generation is made easier.
3. Memory management:
• Names in the source program are mapped to addresses of data objects in run-time
memory by the front end and code generator.
• It makes use of symbol table, that is, a name in a three-address statement refers to a
symbol-table entry for the name.
• Labels in three-address statements have to be converted to addresses of instructions.
For example, a quadruple j: goto i generates a jump instruction as follows:
If i < j, a backward jump instruction with target address equal to the location of the code for
quadruple i is generated.
If i > j, the jump is forward. We must store on a list for quadruple i the location of the
first machine instruction generated for quadruple j. When i is processed, the machine
locations for all instructions that jump forward to i are filled in.
4. Instruction selection:
• The instructions of target machine should be complete and uniform.
• Instruction speeds and machine idioms are important factors when efficiency of target
program is considered.
• The quality of the generated code is determined by its speed and size.
• For example, the three-address statements (a) below can be translated into the target
code (b):
a := b + c
d := a + e                 (a)

MOV b, R0
ADD c, R0
MOV R0, a
MOV a, R0
ADD e, R0
MOV R0, d                  (b)
Here the fourth instruction is redundant: it reloads a immediately after a was stored, so a
better instruction selection would omit it.

5. Register allocation
• Instructions involving register operands are shorter and faster than those involving
operands in memory. The use of registers is subdivided into two subproblems:
1. Register allocation - the set of variables that will reside in registers at a point in
the program is selected.
2. Register assignment - the specific register in which each value will reside is picked.
• Certain machines require even-odd register pairs for some operands and results.
For example, consider the division instruction of the form: D x, y
where x, the dividend, occupies the even register of an even/odd register pair and y is
the divisor; after the division, the even register holds the remainder and the odd
register holds the quotient.

6. Evaluation order
• The order in which the computations are performed can affect the efficiency of the
target code. Some computation orders require fewer registers to hold intermediate
results than others

5.11. TARGET MACHINE

• Familiarity with the target machine and its instruction set is a prerequisite for
designing a good code generator.
• The target computer is a byte-addressable machine with 4 bytes to a word.
• It has n general-purpose registers, R0, R1, . . . , Rn-1.
• It has two-address instructions of the form:
op source, destination where, op is an op-code, and source and destination are
data fields. It has the following op-codes.

• MOV (move source to destination)
• ADD (add source to destination)
• SUB (subtract source from destination)
• The source and destination of an instruction are specified by combining registers
and memory locations with address modes.

For example: MOV R0, M stores contents of Register R0 into memory location M.

Instruction costs:
• Instruction cost = 1 + the costs of the source and destination address modes. This cost
corresponds to the length (in words) of the instruction.
• Address modes involving registers have cost zero.
• Address modes involving memory location or literal have cost one.
• Instruction length should be minimized if space is important. Doing so also minimizes
the time taken to fetch and perform the instruction.
For example: MOV R0, R1 copies the contents of register R0 into R1. It has cost one, since it
occupies only one word of memory.
• The three-address statement a := b + c can be implemented by many different instruction
sequences:
i.  MOV b, R0
    ADD c, R0
    MOV R0, a              cost = 6
ii. MOV b, a
    ADD c, a               cost = 6
iii. Assuming R0, R1 and R2 contain the addresses of a, b, and c:
    MOV *R1, *R0
    ADD *R2, *R0           cost = 2
• In order to generate good code for target machine, we must utilize its addressing
capabilities efficiently.

RUN-TIME STORAGE MANAGEMENT
• Information needed during an execution of a procedure is kept in a block of storage called
an activation record, which includes storage for names local to the procedure. The two
standard storage allocation strategies are:
1. Static allocation
2. Stack allocation
• In static allocation, the position of an activation record in memory is fixed at compile
time.
• In stack allocation, a new activation record is pushed onto the stack for each execution of
a procedure. The record is popped when the activation ends.
• The following three-address statements are associated with the run-time allocation and
deallocation of activation records:
1. Call,
2. Return,
3. Halt, and
4. Action, a placeholder for other statements.
• We assume that the run-time memory is divided into areas for:
1. Code
2. Static data
3. Stack

Static allocation
Implementation of call statement:
The codes needed to implement static allocation are as follows:
MOV #here + 20, callee.static_area /*It saves return address*/
GOTO callee.code_area /*It transfers control to the target code for the called
procedure */
where,
callee.static_area - Address of the activation record
callee.code_area - Address of the first instruction for called procedure
#here + 20 - Literal return address which is the address of the instruction
following GOTO.
Implementation of return statement:

A return from procedure callee is implemented by: GOTO *callee.static_area
This transfers control to the address saved at the beginning of the activation record.

Implementation of action statement:


The instruction ACTION is used to implement action statement.
Implementation of halt statement:
The statement HALT is the final instruction that returns control to the operating
system.

Stack allocation
Static allocation can become stack allocation by using relative addresses for storage in
activation records. In stack allocation, the position of the activation record is stored in a
register, so words in the activation record can be accessed as offsets from the value in
this register.
The codes needed to implement stack allocation are as follows:

Initialization of stack:
MOV #stackstart, SP /* initializes stack */
Code for the first procedure
HALT /* terminate execution */

Implementation of Call statement:


ADD #caller.recordsize, SP /* increment stack pointer */
MOV #here + 16, *SP /* save return address */
GOTO callee.code_area
where,
caller.recordsize - size of the activation record
#here + 16 - address of the instruction following the GOTO

Implementation of Return statement:


GOTO *0 (SP) /* return to the caller */
SUB #caller. recordsize, SP /* decrement SP and restore to previous value */


5.12. NEXT-USE INFORMATION

• If the name in a register is no longer needed, then we remove the name from the register
and the register can be used to store some other names.

Input: Basic block B of three-address statements


Output: At each statement i: x := y op z, we attach to i the liveness and next-use information
of x, y and z.
Method: We start at the last statement of B and scan backwards.
1. Attach to statement i the information currently found in the symbol table regarding
the next use and liveness of x, y and z.
2. In the symbol table, set x to “not live” and “no next use”.
3. In the symbol table, set y and z to “live”, and the next uses of y and z to i.
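
A sketch of this backward scan in C, with variables numbered and the arrays next_use[] and
live[] standing in for the symbol-table fields (the quadruple representation x[i] := y[i] op z[i]
is hypothetical):

enum { NO_NEXT_USE = -1 };

void compute_next_use(int n, const int x[], const int y[], const int z[],
                      int next_use[], int live[]) {
    for (int i = n - 1; i >= 0; i--) {   /* start at the last statement, scan backwards */
        /* step 1: the current table values for x[i], y[i], z[i] are attached to i      */
        live[x[i]] = 0;  next_use[x[i]] = NO_NEXT_USE;  /* step 2: x is redefined here  */
        live[y[i]] = 1;  next_use[y[i]] = i;            /* step 3: y and z are used at i */
        live[z[i]] = 1;  next_use[z[i]] = i;
    }
}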

5.13. A SIMPLE CODE GENERATOR


• A code generator generates target code for a sequence of three- address statements and
effectively uses registers to store operands of the statements
• For example, consider the three-address statement a := b + c. It can have the following
code sequences:
    ADD Rj, Ri             cost = 1
    (or)
    ADD c, Ri              cost = 2
    (or)
    MOV c, Rj              cost = 3
    ADD Rj, Ri

Register and Address Descriptors:
• A register descriptor is used to keep track of what is currently in each register. The
register descriptors show that initially all the registers are empty.
• An address descriptor stores the location where the current value of a name can be
found at run time.

5.14. A CODE-GENERATION ALGORITHM:

The algorithm takes as input a sequence of three-address statements constituting a
basic block. For each three-address statement of the form x := y op z, perform the following
actions:
1. Invoke a function getreg to determine the location L where the result of the computation
y op z should be stored
2. Consult the address descriptor for y to determine y’, the current location of y. Prefer the
register for y’ if the value of y is currently both in memory and a register. If the value of
y is not already in L, generate the instruction MOV y’ , L to place a copy of y in L
3. Generate the instruction OP z’, L where z’ is a current location of z. Prefer a register to
a memory location if z is in both. Update the address descriptor of x to indicate that x is
in location L. If L is a register, update its descriptor to indicate that it contains the value
of x, and remove x from all other descriptors.
4. If the current values of y or z have no next uses, are not live on exit from the block, and
are in registers, alter the register descriptor to indicate that, after execution of x := y op
z, those registers will no longer contain y or z.

Generating Code for Assignment Statements:


• The assignment d := (a-b) + (a-c) + (a-c) might be translated into the following three-
address code sequence:
t := a - b
u := a - c
v := t + u
d := v + u
with d live at the end.

The code sequence generated, together with the register and address descriptors, is:

Statement     Code Generated    Register descriptor           Address descriptor
t := a - b    MOV a, R0         R0 contains t                 t in R0
              SUB b, R0
u := a - c    MOV a, R1         R0 contains t                 t in R0
              SUB c, R1         R1 contains u                 u in R1
v := t + u    ADD R1, R0        R0 contains v                 u in R1
                                R1 contains u                 v in R0
d := v + u    ADD R1, R0        R0 contains d                 d in R0
              MOV R0, d                                       d in R0 and memory

Generating Code for Indexed Assignments


The table shows the code sequences generated for the indexed assignments
a := b[i] and a[i] := b, where register Ri holds the value of i:
Statement     Code Generated    Cost
a := b[i]     MOV b(Ri), R      2
a[i] := b     MOV b, a(Ri)      3

Generating Code for Pointer Assignments


The table shows the code sequences generated for the pointer assignments a := *p and
*p := a, where register Rp holds the value of p:
Statement     Code Generated    Cost
a := *p       MOV *Rp, a        2
*p := a       MOV a, *Rp        2
Generating Code for Conditional Statements
Statement                 Code
if x < y goto z           CMP x, y
                          CJ< z    /* jump to z if condition code is negative */
x := y + z                MOV y, R0
if x < 0 goto z           ADD z, R0
                          MOV R0, x
                          CJ< z


REVIEW QUESTIONS (LEVELS I, II, III)


S. No    Questions                                                  CO Addressing    Blooms level
1 What is activation record 3 1
2 What is Code optimization 5 1
3 What is Code generator 5 1
4 What are storage allocation strategies? 3 1
5 What is storage organization? 3 1
6 List the principle sources of optimization? 5 1
7 List the advantages of the organization of the code optimizer? 5 2
8 Explain in detail about the storage organization 3 2
9 Explain in detail about the storage allocation strategies 3 2
10 Define and explain in detail about the symbol table 4 2

MULTIPLE CHOICE QUESTIONS


Code Optimization
1. The symbol table implementation that is based on the property of locality of reference is
a) Linear List b) Search Tree c) Hash table d) Self Organization
Answer: d
Explanation: A self-organizing list moves recently referenced entries toward the front of the
list, so lookups exploit the locality of reference in the way names are used.

2. In operator precedence parsing, precedence relations are defined
a) For all pairs of non-terminals b) For all pairs of terminals
c) To delimit the handle d) None of the mentioned
Answer: b
Explanation: Operator precedence relations are defined between pairs of terminal symbols. In
an operator grammar no production has two adjacent non-terminals, so relations between
terminals are sufficient to delimit handles.

3. LR parser are attractive because
a) It can be constructed to recognize CFG corresponding to almost all programming
constructs
b) It does not backtrack c) Both of the mentioned d) None of the mentioned
Answer: c
Explanation: These above mentioned are the reasons why LR parser is considered to be
attractive.

4. The most powerful parser is


a) SLR b) LALR c) Canonical LR d) Operator Precedence
Answer: c
Explanation: The most powerful parser is Canonical LR.

5. YACC Builds up
a) SLR parsing Table b) Canonical LR parsing Table
c) LALR parsing Table d) None of the mentioned
Answer: c
Explanation: YACC provides a general tool for describing the input to a computer program.

6. Object program is a
a) Program written in machine language
b) Translated into machine language
c) Translation of high-level language into machine language
d) None of the mentioned
Answer: c
Explanation: An object program is the translation of the source program into machine
language, produced by the compiler or assembler.

7. ( Z,* ) be a structure, and * is defined by n * m =maximum ( n , m ) Which of the


following statements is true for ( Z, * ) ?
a) ( Z,* ) is a monoid b) ( Z,* ) is an algebraic group
c) ( Z,* ) is a group d) None of the mentioned
Answer: d
Explanation: The operation n * m = max(n, m) on Z has no identity element (there is no
smallest integer), so (Z, *) is neither a monoid nor a group.

8. Three-address code involves
a) Exactly 3 addresses b) At most three addresses
c) No unary operators d) None of the mentioned
Answer: b
Explanation: Three-address code is an intermediate code used by optimizing compilers to aid
in the implementation of code-improving transformations; each instruction contains at most
three addresses (two for the operands and one for the result).

9. An intermediate code form is


a) Post-fix Notation b) Syntax Trees c) Three address code d) All of the mentioned
Answer: d
Explanation: Intermediate code generator takes an input from its predecessor phase, semantic
analyzer, in the form of an annotated syntax tree.

10. Relocating bits used by relocating loader are specified by


a) Relocation loader itself b) Linker
c) Assembler d) Macro Processor
Answer: c
Explanation: The assembler produces the relocation bits as part of the object program; the
relocating loader uses them to adjust address-sensitive fields when the program is placed in
memory.

Loop Optimization
1. In a single pass assembler, most of the forward references can be avoided by putting the
restriction
a) On the number of strings/literals
b) Code segment to be defined after data segment
c) On unconditional jumps d) None of the mentioned
Answer: b
Explanation: A single pass assembler scans the program only once; defining the data segment
before the code segment avoids most forward references to data.

2. The method which merges the bodies of two loops is


a) Loop rolling b) Loop jamming c) Constant folding d) None of the mentioned
Answer: b
Explanation: In computer science, loop fusion (or loop jamming) is a compiler optimization
and loop transformation which replaces multiple loops with a single one.

3. Assembly code data base is associated with


a) Code is converted into assembly
b) Table of rules in the form of patterns for matching with the uniform symbol table to
discover syntactic structure
c) None of the mentioned d) Both of the mentioned
Answer: a
Explanation: An assembly language is a low-level programming language for a computer, or
other programmable device, in which there is a very strong (generally one-to-one)
correspondence between the language and the architecture’s machine code instructions.

4. The process manager has to keep track of


a) Status of each program
b) Information to a programmer using the system
c) Both of the mentioned d) None of the mentioned
Answer: c
Explanation: Process manager keep track of the status and info about the program.

5. Function of the syntax phase is to


a) Recognize the language and call the appropriate action routines that will generate the
intermediate form or matrix for these constructs
b) Build a literal table and an identifier table c) Build a uniform symbol table
d) Parse the source program into the basic elements or tokens of the language
Answer: a
Explanation: The syntax phase recognizes the syntactic constructs of the language and calls
the action routines that generate the intermediate form (matrix) for those constructs.

6. If E be a shifting operation applied to a function f, such that E(f) = f (x +β ), then


a) E (αf+β g) =α E(f) +β E (g) b) E (αf +β g )=. ( α+ β )+ E (f + g)
c) E (αf +β g )=α E (f+gβ) d) E (αf +β g )=αβ E (f + g)
Answer: a
Explanation: The shifting operator E is linear, so it distributes over the linear combination αf + βg.

7. Pass 1 of an assembler will
a) Assign addresses to all statements b) Save the values assigned to all labels for use in pass 2
c) Perform some processing d) All of the mentioned
Answer: d
Explanation: In pass 1 of an assembler, all of the above functions are performed.

8. Which table is a permanent database that has an entry for each terminal symbol?
a) Terminal Table b) Literal Table c) Identifier Table d) None of the mentioned
Answer: a
Explanation: The terminal table is a permanent database that has an entry for each terminal
symbol, such as arithmetic operators, keywords, and punctuation characters (';', ',', etc.);
each entry includes the name of the symbol.

9. Which of the following functions is performed by loader?


a) Allocate memory for the programs and resolve symbolic references between objects decks
b) Address dependent locations, such as address constants, to correspond to the allocated
space
c) Physically place the machine instructions and data into memory
d) All of the mentioned
Answer: d
Explanation: A loader is the part of an operating system that is responsible for loading
programs and libraries.

10. The root directory of a disk should be placed


a) At a fixed address in main memory b) At a fixed location on the disk
c) Anywhere on the disk d) None of the mentioned
Answer: b
Explanation: The root directory is placed at a fixed location on the disk.


SHORT QUESTIONS

S. No | Short Question | CO Addressing | Blooms Level | Marks
1 | What is activation record | 3 | 1 | 2
2 | What is Code optimization | 5 | 1 | 2
3 | What is Code generator | 5 | 1 | 2
4 | List common methods for associating actual and formal parameters? | 3 | 1 | 2
5 | Define back patching? | 3 | 1 | 2
6 | List different data structures used for symbol table? | 4 | 1 | 2
7 | Write the steps to search an entry in the hash table? | 4 | 1 | 2
8 | Write general activation record? | 3 | 1 | 2
9 | What are storage allocation strategies? | 3 | 1 | 2
10 | What is storage organization? | 3 | 1 | 2
11 | List the principle sources of optimization? | 5 | 1 | 2
12 | List 3 areas of code optimization? | 5 | 1 | 2
13 | List the local optimization techniques? | 5 | 1 | 2
14 | List the loop optimization techniques? | 5 | 1 | 2
15 | List the global optimization techniques? | 5 | 1 | 2
16 | Define local optimization? | 5 | 1 | 2
17 | Define constant folding? | 5 | 1 | 2
18 | List the advantages of the organization of code optimizer? | 5 | 2 | 2
19 | Define Common Sub expressions? | 5 | 1 | 2
20 | Explain Dead Code elimination? | 5 | 1 | 2
21 | Define Reduction in strength? | 5 | 1 | 2
22 | Define peephole optimization? | 5 | 1 | 2
23 | List the different data flow properties? | 5 | 1 | 2
24 | Explain inner loops? | 5 | 2 | 2
25 | Define flow graph? | 5 | 2 | 2
26 | What is the role of code generator in a compiler phases? | 5 | 2 | 2
27 | Explain about live variable analysis? | 5 | 2 | 2
28 | Define copy propagation? | 5 | 2 | 2
29 | What is the role of code optimizer in a compiler phases? | 5 | 2 | 2
30 | Write the procedure to detect induction variable with example | 5 | 2 | 2
31 | What is dead code elimination? | 5 | 2 | 2
32 | Write how loop invariant computation can be eliminated? | 5 | 2 | 2
33 | Write how "Redundant sub-expression eliminates" can be done in a given program. | 5 | 2 | 2
34 | Define basic block? | 5 | 1 | 2
35 | List the issues in design of a code generator | 5 | 1 | 2

SHORT QUESTIONS WITH ANSWERS


1. How is the quality of an object program measured?
The quality of an object program is measured by its size or its running time. For large
computations, running time is particularly important; for small computations, size may be
just as important or even more so.

2. What is the more accurate term for code optimization?


The more accurate term for code optimization would be “code improvement”

3. Explain the principle sources of optimization.


Code optimization techniques are generally applied after syntax analysis, usually both
before and during code generation. The techniques consist of detecting patterns in the
program and replacing these patterns by equivalent and more efficient constructs.

4. What are the patterns used for code optimization?


The patterns may be local or global, and the replacement strategy may be machine
dependent or machine independent.

5. What are the 3 areas of code optimization?
• Local optimization
• Loop optimization
• Data flow analysis

6. Define local optimization.


The optimization performed within a block of code is called a local optimization.

7. Define constant folding.


Deducing at compile time that the value of an expression is a constant and using the
constant instead is known as constant folding.
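A minimal C sketch of the idea (the variable name is illustrative):

#include <stdio.h>
int main(void) {
    /* 60 * 60 * 24 involves only constants, so a folding compiler
       evaluates it at compile time and stores 86400 directly;
       no multiplications remain in the generated code. */
    int seconds_per_day = 60 * 60 * 24;
    printf("%d\n", seconds_per_day);
    return 0;
}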

8. What do you mean by inner loops?


The most heavily traveled parts of a program, the inner loops, are an obvious target for
optimization. Typical loop optimizations are the removal of loop invariant computations
and the elimination of induction variables.

9. What is code motion?


Code motion is an important modification that decreases the amount of code in a loop by moving loop-invariant computations outside the loop.

10. What are the properties of optimizing compilers?


• A transformation must preserve the meaning of programs
• A transformation must, on the average, speed up programs by a measurable amount
• A transformation must be worth the effort

11. Give the block diagram of organization of code optimizer.

12. What are the advantages of the organization of code optimizer?
a. The operations needed to implement high level constructs are made explicit in the
intermediate code, so it is possible to optimize them.
b. The intermediate code can be independent of the target machine, so the optimizer
does not have to change much if the code generator is replaced by one for a different
machine

13. Define Local transformation & Global Transformation.


A transformation of a program is called local if it can be performed by looking only at
the statements in a basic block; otherwise it is called global.

14. Give examples for function preserving transformations.


• Common subexpression elimination
• Copy propagation
• Dead – code elimination
• Constant folding

15. What is meant by Common Subexpressions?


An occurrence of an expression E is called a common subexpression, if E was previously
computed, and the values of variables in E have not changed since the previous
computation.
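A minimal C sketch (variable names and values are illustrative) of the transformation:

#include <stdio.h>
int main(void) {
    int a = 2, b = 3, c = 4, x, y, t;
    /* b + c is a common subexpression below: it is computed twice and
       neither b nor c changes between the two computations.
           x = (b + c) * a;
           y = (b + c) - a;                                      */
    t = b + c;          /* after elimination: computed only once */
    x = t * a;
    y = t - a;
    printf("%d %d\n", x, y);
    return 0;
}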

16. What is meant by Dead Code?


A variable is live at a point in a program if its value can be used subsequently; otherwise,
it is dead at that point. A statement that computes values that never get used is known as
dead code or useless code.
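A minimal C sketch (hypothetical values) of a statement that dead-code elimination would remove:

#include <stdio.h>
int main(void) {
    int x;
    x = 10;              /* dead: this value is never used, because x is */
    x = 20;              /* redefined here before any use of the old one */
    printf("%d\n", x);   /* only the second assignment is ever observed  */
    return 0;
}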

17. What are the techniques used for loop optimization?


i. Code motion
ii. Induction variable elimination
iii. Reduction in strength

18. What is meant by Reduction in strength?
Reduction in strength replaces an expensive operation by a cheaper one,
such as a multiplication by an addition.
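A minimal C sketch (the loop bound is illustrative): the multiplication i * 4 is replaced by an accumulator updated with an addition on each iteration:

#include <stdio.h>
int main(void) {
    int i, x, x2 = 0;          /* x2 tracks the value of i * 4            */
    for (i = 0; i < 8; i++) {
        x = i * 4;             /* expensive form: a multiplication each time */
        if (x != x2) return 1; /* sanity check: the cheap form agrees     */
        x2 = x2 + 4;           /* cheap form: multiplication becomes addition */
    }
    printf("ok\n");
    return 0;
}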

19. What is meant by loop invariant computation?


An expression that yields the same result independent of the number of times the loop is
executed is known as loop invariant computation.
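A minimal C sketch (names and bounds are illustrative) of moving a loop-invariant computation out of the loop (code motion):

#include <stdio.h>
int main(void) {
    int i, n = 8, limit = 10, t, a[8];
    t = limit * 2;             /* limit * 2 yields the same value on every */
    for (i = 0; i < n; i++)    /* iteration, so it is hoisted out of the   */
        a[i] = t + i;          /* loop; the body was: a[i] = limit*2 + i;  */
    printf("%d\n", a[n - 1]);
    return 0;
}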

20. Define data flow equations.


A typical equation has the form Out[S] = gen[S] U (In[S] – kill[S]) and can be read as,
“the information at the end of a statement is either generated within the statement, or
enters at the beginning and is not killed as control flows through the statement”. Such
equations are called data flow equations.
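A minimal C sketch of the equation itself, using one bit per definition (the definition numbers d1..d4 and the set contents are assumed for the example):

#include <stdio.h>
int main(void) {
    unsigned gen_s  = 0x1;  /* {d1}: definitions generated by S       */
    unsigned kill_s = 0x6;  /* {d2, d3}: definitions killed by S      */
    unsigned in_s   = 0xA;  /* {d2, d4}: definitions reaching S       */
    /* Out[S] = gen[S] U (In[S] - kill[S])                            */
    unsigned out_s  = gen_s | (in_s & ~kill_s);
    printf("Out[S] = 0x%X\n", out_s);  /* prints 0x9, i.e. {d1, d4}   */
    return 0;
}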

21. What are the two standard storage allocation strategies?


The two standard allocation strategies are
1. Static allocation
2. Stack allocation

22. Discuss about static allocation.


In static allocation the position of an activation record in memory is fixed at compile time.

23. Write short notes on activation tree.


• A tree which depicts the way of control enters and leaves activations.
• In an activation tree
i. Each node represents an activation of a procedure
ii. The root represents the activation of the main program
iii. The node for a is the parent of the node for b, if and only if control flows from
activation a to b
iv. Node for a is to the left of the node for b, if and only if the lifetime of a occurs
before the lifetime of b

24. Define control stack.
A stack which is used to keep track of live procedure activations is known as a control stack.

25. Define heap.


A separate area of run-time memory which holds all other information is called a heap.

26. Give the structure of general activation record


• Returned value
• Actual parameters
• Optional control link
• Optional access link
• Saved machine status
• Local data
• Temporaries
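As a rough sketch only (real compilers lay these fields out on the run-time stack, not as a named type, and the member types here are simplified assumptions), the record could be pictured in C as:

struct activation_record {
    int   returned_value;       /* value handed back to the caller      */
    int   actual_params[4];     /* arguments supplied by the caller     */
    void *control_link;         /* optional: points to caller's record  */
    void *access_link;          /* optional: for non-local name access  */
    int   saved_machine_status; /* saved PC/registers (simplified)      */
    int   local_data[4];        /* the procedure's local variables      */
    int   temporaries[4];       /* compiler-generated temporaries       */
};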
27. Discuss about stack allocation.
In stack allocation a new activation record is pushed on to the stack for each execution of
a procedure. The record is popped when the activation ends.

28. What are the 2 approaches to implement dynamic scope?


• Deep access
• Shallow access

29. What is padding?


Space left unused due to alignment consideration is referred to as padding.

30. What are the 3 areas used by storage allocation strategies?


• Static allocation
• Stack allocation
• Heap allocation

31. What are the limitations of using static allocation?


• The size of a data object and constraints on its position in memory must be known at
compile time.
• Recursive procedures are restricted, because all activations of a procedure use the
same bindings for local names
• Data structures cannot be created dynamically since there is no mechanism for
storage allocation at run time

32. Define calling sequence and return sequence.


• A call sequence allocates an activation record and enters information into its fields
• A return sequence restores the state of the machine so that calling procedure can
continue execution

33. When dangling reference occurs?


• A dangling reference occurs when there is a reference to storage that has been deallocated
• It is a logical error to use dangling references, since the value of deallocated storage is
undefined according to the semantics of most languages
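A minimal C sketch of how a dangling reference arises:

#include <stdlib.h>
int main(void) {
    int *p = malloc(sizeof *p);
    if (p == NULL) return 1;
    *p = 42;
    free(p);      /* the storage is deallocated here; p now dangles   */
    /* Using *p after this point is the logical error described above:
       the value of deallocated storage is undefined.                 */
    p = NULL;     /* common defensive fix: make the pointer unusable  */
    return 0;
}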

34. Define static scope rule and dynamic rule


• Lexical or static scope rules determine the declaration that applies to a name by
examining the program text alone
• Dynamic scope rules determine the declaration applicable to a name at run time, by
considering the current activations

35. What is block? Give its syntax.


• A block is a statement containing its own data declarations
Syntax:
{
Declaration statements
}
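A minimal C sketch: each block's own declaration of x applies inside that block only:

#include <stdio.h>
int main(void) {
    int x = 1;              /* declared in the outer block                  */
    {
        int x = 2;          /* this block's own data declaration            */
        printf("%d\n", x);  /* prints 2: the inner declaration applies      */
    }
    printf("%d\n", x);      /* prints 1: outer declaration is visible again */
    return 0;
}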

36. What is access link?


• An access link is a pointer to each activation record which obtains a direct
implementation of lexical scope for nested procedure


37. What is known as environment and state?


• The term environment refers to a function that maps a name to a storage location
• The term state refers to a function that maps a storage location to the value held
there

38. How the run-time memory is sub-divided?


• Generated target code
• Data objects
• A counterpart of the control stack to keep track of procedure activations

LONG QUESTIONS

S. No | Long Question | CO Addressing | Blooms Level | Marks
1 | Explain in detail about the storage organization | 3 | 2 | 10
2 | Explain in detail about the storage allocation strategies | 3 | 2 | 10
3 | Define and explain in detail about the symbol table | 4 | 2 | 10
4 | What are Activation Records? Explain in detail | 3 | 2 | 10
5 | What is Code Optimization? Explain principle sources of optimization | 5 | 2 | 10
6 | Explain Local optimization in detail? | 5 | 2 | 10
7 | Explain peephole optimization? | 5 | 2 | 10
8 | Explain Loop optimization in detail? | 5 | 2 | 10
9 | Discuss about the following: I. Copy propagation II. Dead code elimination III. Code motion | 5 | 2 | 10
10 | Explain different schemes of storing name attribute in symbol table. | 4 | 2 | 10
11 | Explain various Global optimization techniques in detail? | 5 | 2 | 10
12 | a. Discuss algebraic simplification and reduction in strength? b. Write a short note on code generating algorithms? | 5 | 2 | 10
13 | Explain the various source language issues? | 3 | 2 | 10
14 | Explain in detail the issues in design of a code generator? | 5 | 2 | 10
15 | Demonstrate the simple code generator with a suitable example? | 5 | 2 | 10
16 | Write the next-use information for each line of the basic block? | 5 | 2 | 10
17 | Construct the DAG for the following basic block: D:=B*C; E:=A+B; B:=B+C; A:=E-D | 5 | 4 | 10
18 | State loop invariant computations? Explain how they affect the efficiency of a program? | 5 | 3 | 10
19 | Explain how "Redundant sub-expression eliminates" can be done at global level in a given program? | 5 | 3 | 10
20 | Explain role of DAG in optimization with example. | 5 | 3 | 10
21 | Write in detail about the issues in the design of code generator. | 5 | 2 | 10
22 | Explain why Next-use information is required for generating object code? | 5 | 2 | 10


GATE/COMPETITIVE EXAMS QUESTIONS


1. Consider the following C code segment.
for (i = 0; i < n; i++)
{
for (j=0; j<n; j++)
{
if (i%2)
{
x += (4*j + 5*i);
y += (7 + 4*j);
}
}
}
Which one of the following is false?
A. The code contains loop-in variant computation
B. There is scope of common sub-expression elimination in this code
C. There is scope strength reduction in this code
D. There is scope of dead code elimination in this code
SOLUTION
All the statements are true except option (D) since there is no dead code to get eliminated.
Hence (D) is correct option.

2. Which of the following are true?


i. A programming language which does not permit global variables of any kind and has
no nesting of procedures/functions, but permits recursion, can be implemented with
static storage allocation
ii. Multi-level access link (or display) arrangement is needed to arrange activation
records-only if the programming language being implemented has nesting of
procedures/function
iii. Recursion in programming languages cannot be implemented with dynamic storage
allocation
iv. Nesting of procedures/functions and recursion require a dynamic heap allocation
scheme and cannot be implemented with a stack-based allocation scheme for
activation records
v. Programming languages which permit a function to return a function as its result
cannot be implemented with a stack-based storage allocation scheme for activation
records
(A) (ii) and (v) only (B) (i), (iii) and (iv) only
(C) (i), (ii) and (v) (D) (ii), (iii) and (v) only

SOLUTION
i. False: static storage allocation provides no run-time stack, so recursion cannot be
implemented with it.
ii. True.
iii. False: dynamic (stack) allocation is exactly what recursion uses.
iv. False: recursion can be implemented with a stack-based allocation scheme; a heap is
not required.
v. True.
So only (ii) and (v) are true.
Hence (A) is the correct option.

3. What data structure in a complier is used for managing information about variables
and their attributes?
(A) Abstract syntax tree (B) Symbol table
(C) Semantic stack (D) Parse table
SOLUTION
Symbol table is used for storing the information about variables and their attributes by
compiler.
Hence (B) is correct option.

4. Which languages necessarily need heap allocation in the runtime environment?


A. Those that support recursion
B. Those that use dynamic scoping
C. Those that allow dynamic data structure
D. Those that use global variables

SOLUTION
Dynamic memory allocation is maintained by heap data structure. So to allow dynamic data
structure heap is required.
Hence (C) is correct option.

5. Which one of the following is FALSE?


A. Basic block is a sequence of instructions where control enters the sequence at the
beginning and exits at the end.
B. Available expression analysis can be used for common subexpression elimination.
C. Live variable analysis can be used for dead code elimination.
D. x = 4 ∗ 5 => x = 20 is an example of common subexpression elimination.
Answer: (D)
Explanation:
(A) A basic block is a sequence of instructions where control enters the sequence at the
beginning and exits at the end is TRUE.
(B) Available expression analysis can be used for common subexpression elimination is
TRUE. Available expressions is an analysis algorithm that determines for each point in the
program the set of expressions that need not be recomputed. Available expression analysis is
used to do global common subexpression elimination (CSE). If an expression is available at a
point, there is no need to re-evaluate it.
(C)Live variable analysis can be used for dead code elimination is TRUE.
(D) x = 4 ∗ 5 => x = 20 is an example of common subexpression elimination is FALSE.
Common subexpression elimination (CSE) refers to a compiler optimization that replaces
identical expressions (i.e., expressions that all evaluate to the same value) with a single
variable holding the computed value, when it is worthwhile to do so.
Below is an example
In the following code:
a = b * c + g;
d = b * c * e;
it may be worth transforming the code to:
tmp = b * c;
a = tmp + g;
d = tmp * e;
6. Consider the following C code segment.
for (i = 0; i < n; i++)
{
for (j=0; j<n; j++)
{
if (i%2)
{
x += (4*j + 5*i);
y += (7 + 4*j);
}
}
}
Which one of the following is false?
(A) The code contains loop invariant computation
(B) There is scope of common sub-expression elimination in this code
(C) There is scope of strength reduction in this code
(D) There is scope of dead code elimination in this code
Answer: (D)
Explanation: The question asks for the false statement. 4*j is a common subexpression, so
(B) is true. 5*i and the test i%2 can be moved out of the inner loop, so (A) is true: the code
contains loop-invariant computation. Both 4*j and 5*i can also be strength-reduced: keep an
accumulator, initialized before the loop, and add 4 (respectively 5) on each iteration instead
of multiplying; so (C) is true.
By elimination, we have (D).
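A sketch of the transformed loop under these optimizations, assuming the same declarations as the original fragment (the accumulator names fi and fj are illustrative, not part of the question):

/* i%2 and 5*i are invariant in the j loop; 4*j is strength-reduced. */
for (i = 0; i < n; i++) {
    int fi = 5 * i;              /* hoisted: computed once per i     */
    if (i % 2) {                 /* tested once per i, not per j     */
        int fj = 0;              /* tracks 4*j using additions only  */
        for (j = 0; j < n; j++) {
            x += (fj + fi);
            y += (7 + fj);
            fj += 4;             /* replaces the multiplication 4*j  */
        }
    }
}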

7. Some code optimizations are carried out on the intermediate code because
(A) they enhance the portability of the compiler to other target processors
(B) program analysis is more accurate on intermediate code than on machine code
(C) the information from dataflow analysis cannot otherwise be used for optimization
(D) the information from the front end cannot otherwise be used for optimization
Answer: (A)
Explanation: Option (B) is also true. But the main purpose of doing some code-optimization
on intermediate code generation is to enhance the portability of the compiler to target
processors. So Option A) is more suitable here.

Intermediate code is machine/architecture independent code. So a compiler can optimize it
without worrying about the architecture on which the code is going to execute (it may be the
same or the other ). So that kind of compiler can be used by multiple different architectures.
In contrast to that, suppose code optimization is done on target code, which is
machine/architecture dependent, then the compiler has to be specific about the optimizations on
that kind of code. In this case the compiler can’t be used by multiple different architectures,
because the target code produced on different architectures would be different. Hence
portability reduces here.

8. In a simplified computer the instructions are:


OP Rj,Ri - Performs Rj OP Ri and stores the result in register Ri
OP m, Ri - Performs val OP Ri and stores the result in resister Ri. val
denotes the content of memory location m
MOV m, Ri - Moves the content of memory location m to register Ri
MOV Ri, m - Moves the content of register Ri to memory location m
The computer has only two registers, and OP is either ADD or SUB. Consider the
following basic block:
T1=a+b
T2=c+d
T3=e-T2
T4=T1-T3
Assume that all operands are initially in memory. The final value of the computation
should be in memory. What is the minimum number of MOV instructions in the code
generated for this basic block?
(A) 2 (B) 3 (C) 5 (D) 6
Answer: (B)
Explanation:
For instructions t2 and t3:
1. MOV c, t2
2. OP d, t2 (OP = ADD)
3. OP e, t2 (OP = SUB)
For instructions t1 and t4:
4. MOV a, t1
5. OP b, t1 (OP = ADD)
6. OP t1, t2 (OP = SUB)
7. MOV t2, a (as the final value has to be in memory)
Step 6 would have been enough if the question had asked for the final value in a register
rather than in memory. The final step requires another MOV, so a total of 3 MOV instructions.

9. Consider the grammar rule E → E1 – E2 for arith-metic expressions. The code


generated is targeted to a CPU having a single user register. The sub-traction
operation requires the first operand to be in the register. If E1 and E2 do not have
any com-mon sub expression, in order to get the shortest possible code
(A) E1 should be evaluated first
(B) E2 should be evaluated first
(C) Evaluation of E1 and E2 should necessarily be interleaved
(D) Order of evaluation of E1 and E2 is of no consequence
Answer: (B)
Explanation:
E → E1 - E2
Given that E1 and E2 don’t share any sub expression, most optimized usage of single user
register for evaluation of this production rule would come only when E2 is evaluated before
E1. This is because when we will have E1 evaluated in the register, E2 would have been
already computed and stored at some memory location. Hence we could just use subtraction
operation to take the user register as first operand, i.e. E1 and E2 value from its memory
location referenced using some index register or some other form according to the instruction.
Hence correct answer should be (B) E2 should be evaluated first.

10. Which of the following class of statement usually produces no executable code when
compiled?
(A) Declaration
(B) Assignment statements
(C) Input and output statements
(D) Structural statements
Answer: (A)

11. Which of the following comment about peep-hole optimization is true?


(A) It is applied to small part of the code and applied repeatedly
(B) It can be used to optimize intermediate code
(C) It can be applied to a portion of the code that is not contiguous
(D) It is applied in symbol table to optimize the memory requirements.
Answer: (A)

Code No.: CS115
III B.Tech. II Sem. (RA15) Advanced Supplementary Examinations, JUNE - 2019
COMPILER DESIGN
(CSE)
Time: 3 Hours Max. Marks: 70
PART – A
Answer ALL questions
All questions carry equal marks
10 x 2
1. Define operator precedence grammar and give one example.
2. What is left factoring and give one example.
3. List out the various phases of a compiler with neat diagram.
4. What are cousins of compiler.
5. What are the functions used to create the nodes of syntax trees?
6. Define ambiguous grammar.
7. Construct a DAG for the expression a: = a + 30?
8. Write the issues in design of the code generator.
9. Define synthesized attribute.
10. What is Dead code elimination and give example.
PART – B
Answer any FIVE questions
All questions carry equal marks
5 x 10
1. a) Compare compiler and interpreter with suitable diagrams.
b) Construct the transition diagrams to recognize the tokens below
i) identifier ii) Relational Operator iii) Unsigned Number
2. Given the grammar: S → AaAb | BbBa
A→∈
B→∈
i. Compute FIRST () and FOLLOW() functions
ii. Construct predictive parsing table
iii. Parse the input string w=ab.
3. Given the grammar: S → AA
A → Aa | b
i)Construct sets of LR (1) items
ii) Construct canonical LR (1) Parsing Table
4. a) Write the fields and uses of symbol table.
b) What is syntax directed translation? How it is different from translation schemes? Explain
with an example.
5. Explain in detail about Loop Optimization.
6. a) Construct the syntax tree for the expression (p+q) + (r-s) * (p*q) - q.
b) Give the applications of DAG.
7. Construct SLR parsing table for the given grammar.
E → E + T | T
T → T * F | F
F → (E) | i
8. Construct finite automata for the given regular expression (a+b)*abb(a+b)*.

Code No.: CS115
III B.Tech. II Sem. (RA15) Regular Examinations, March/April - 2019
COMPILER DESIGN
(CSE)
Time: 3 Hours Max. Marks: 70
PART – A
Answer ALL questions
All questions carry equal marks
10 x 2
1. Explain the role of assemblers with an example.
2. What is sentinel?
3. What is LR grammar?
4. Define kernel and non-kernal items.
5. Explain about dependency graph.
6. Distinguish between static and dynamic checking of types.
7. Explain about activation record.
8. List out the phases of compiler?
9. List out the compiler writing tools.
10. Explain about reduction in strength.
PART – B
Answer any FIVE questions
All questions carry equal marks
5 x 10
1. Construct a minimum state DFA for the regular expression (a|b)*abb(a|b)a(a|b)(a|b).
2. Show that the grammar
S → AaAb | BbBa
A → ε
B → ε
Is LL(1) but not SLR(1).
3. Show that the grammar
S → Aa | bAc | Bc | bBa
A → d
B → d
Verify the given grammar is SLR(1) or Not
4. Consider the following declaration
D → id L
L → , id L | : T
T → integer | real
Construct a translation scheme to enter the type of each identifier into the symbol table.
5. Translate the arithmetic expression a*-(b+c) into
a) syntax tree b) postfix notation c) three-address code
6. Explain code generation algorithm with suitable example.
7. Explain about different types of optimization techniques.
8. Discuss about bottom up evaluation of inherited attributes.
*

Code No.: CS115
III B.Tech. II Sem. (RA15) Supplementary Examinations, November – 2018
COMPILER DESIGN
(CSE & IT)
Time: 3 Hours Max. Marks: 70
PART – A
Answer ALL questions
All questions carry equal marks
10 x 2
1. Explain about preprocessors.
2. List out issues with Lexical analyzer.
3. What is left factoring? Explain with an example.
4. Define LL(1) grammar.
5. Define inherited attribute.
6. What is type expression?
7. Mention the draw backs of stack allocation strategy.
8. Explain Directed acyclic graph.
9. Mention the issues in the design of a code generator.
10. Explain about code motion.

PART – B
Answer any FIVE questions
All questions carry equal marks
5 x 10
1. a) With a neat sketch of diagram explain about phases of compiler.
b) Discuss about cousins of compiler.
2. Construct DFA directly from regular expression (a|b)*abb(a|b)*(a|b).
3. Consider the grammar
S → (L) | a        L → L, S | S
Eliminate left recursion from the grammar and construct predictive parser for the grammar and
show the behaviour of the parser for the sentence (a,(a,a)).
4. Consider the following grammar
E → E + T | T        T → T * F | F        F → F* | a | b
Construct SLR parsing tables for this grammar
5. Consider the grammar
E → E1 + T        E → E1 - T        E → T        T → (E)        T → num
Write translation scheme (top down translation) for the grammar and eliminate left recursion
from the translation scheme
6. Translate the expression (a+b)*(c+d)+(a+b+c) into
a) Quadruples b) triples c) indirect triples
7. Use code generation algorithm to generate code for the following C program
main()
{
    int i;
    int a[10];
    while (i <= 10)
        a[i] = 0;
}
8. Discuss about principle sources of optimization.
*

Code No.: CS115
III B.Tech. II Sem. (RA15) Regular Examinations, April - 2018
COMPILER DESIGN
(CSE)
Time: 3 Hours Max. Marks: 70
PART – A
Answer ALL questions
All questions carry equal marks
10 x 2
1. Write a lexeme for C language for loop statement?
2. List out four compiler construction tools?
3. Define an ambiguous grammar?
4. List out three different types of bottom up parsers? Which is faster?
5. Differentiate S-Attribute grammar and L-attribute grammar?
6. Define Coercion.
7. What are the contents of activation record?
8. Discuss about back patching?
9. What are the properties of optimizing compiler?
10. List various forms of target code?
PART – B
Answer any FIVE questions
All questions carry equal marks
5 x 10
1. Discuss in detail about various phases of compiler with an example?
2. Give the LALR parsing table for the grammar.
S → L = R | R        L → * R | id        R → L
3. a) Explain how declarations are done in a procedure using syntax directed translations?
b) Write syntax directed translation for arrays?
4. a) Explain DAG representation of basic blocks.?
b) Discuss in detail about storage allocation strategies?
5. Write a code generation algorithm. Explain about the descriptor and function getreg().
Give an example.
6. Discuss in detail about various code optimization techniques?
7. Consider the grammar
E → TE'
E' → + TE' | ε
T → FT'
T' → * FT' | ε
F → (E) | id
Construct a predictive parsing table for the grammar shown above. Verify whether the
input string id + id * id is accepted by the grammar or not.
8. a) Discuss in detail about Specification of a simple type checker?
b) Explain about three address codes with example

Code No.: 13CS318
III B.Tech. II Sem. (RA13) Supplementary Examinations, March/April - 2019
LANGUAGE PROCESSORS
(CSE)
Time: 3 Hours Max. Marks: 70
PART – A
Answer ALL questions
All questions carry equal marks
10 x 2
1. Analyze the output of syntax analysis phase? What are the three general types of
parsers for grammars?
2. State the general phases of a compiler?
3. Construct NFA for (a/b)* and convert into DFA?
4. Why are SLR and LALR parsing tables more economical to construct than canonical LR?
5. Construct the Syntax tree for Expression using functions: (a + b) * (b - c).
6. Write Translation scheme for checking the type of while statement
S → while E do S1
7. Write the machine instruction for operations and copy statement?
8. Write the techniques used for loop optimization and Reduction in strength?
9. Define LR(1) grammar.
10. Differentiate between static type checking and dynamic type checking.
PART – B
Answer any FIVE questions
All questions carry equal marks
5 x 10
1. Construct a deterministic finite automaton accepting all strings in which the total
number of a's mod 3 = 0 and the total number of b's mod 3 = 1
2. Explain the various phases of a compiler in detail. Also Write down the output for the
following expression after each phase x: =a+b*c-d
3. Consider the grammar E → E + E | E *E | (E) | id. Show the sequence of moves made
by the shift-reduce parser on the input (id1+id2)*id3 and determine whether the given
string is accepted by the parser or not.
4. Define translation scheme and write three address code for a<b or b>c.
5. Discuss and analyze about all allocation strategies in run-time storage environment?
6. Discuss about the following
i. Copy propagation
ii. Dead code elimination
iii. Code motion
7. Write about machine dependent and machine independent optimization?
8. Consider the following grammar
S → TL;
T → int | float
L → L, id | id
Parse the input string “float id, id;” using shift reduce parser.

Code No.: 13CS318
III B.Tech. II Sem. (RA13) Supplementary Examinations, November – 2018
LANGUAGE PROCESSORS
(CSE)
Time: 3 Hours Max. Marks: 70
PART – A
Answer ALL questions
All questions carry equal marks
10 x 2
1. Write the role of pre-processor in language processing.
2. Differentiate the features of linear analysis and hierarchical analysis.
3. What is dangling else ambiguity? Give an example.
4. What is handle pruning? Give an example.
5. Compare and contrast LR and LL Parsers.
6. How to generate polish notation using translation schemes?
7. Differentiate synthesis and inherited translation.
8. What is an activation record? Explain how it is related with run time storage
organization?
9. Write various forms of object code generated in code generation phase.
10. What is copy propagation and dead code elimination?
PART – B
Answer any FIVE questions
All questions carry equal marks
5 x 10
1. Draw a block diagram of phases of a compiler and indicate the main functions of each
phase. Show the output produced by different stages of a compiler for a:=b*c/36;
where a, b and c are real numbers.
2. What is LL(1) grammar? Can you convert context free grammar into LL(1)? Explain.
3. Implement Non-Recursive Predictive parsing for the following grammar:
S → AaAb | BbBa        A → Є        B → Є
4. Write Syntax directed definition for constructing syntax tree of an expression derived
from the grammar: E → E + T | E - T | T        T → (E) | id | num
5. Explain the type system in type checker? Write and explain the syntax directed
definition for type checker.
6. a) What is runtime stack? Explain storage allocation strategies used for recursive
procedure calls.
b) Translate the given expression into Quadruples, triples and indirect triples
(a+b)*(c+d)+(a*b/c)*b+60.
7. Construct LALR parsing table for the grammar
E → E + T | T        T → T * F | F        F → (E) | id
8. Write short notes on DAG. Construct DAG for the following code:
D := B*C        E := A+B        B := B+C        A := E-D
*

Code No.: CS317
III B.Tech. I Sem. (RA11) Supplementary Examinations, May –2018
LANGUAGE PROCESSORS
(CSE)
Time: 3 Hours Max. Marks: 70
PART – A
Answer ALL questions
All questions carry equal marks
10 x 2
1. What is the need to divide compilation process into several phases? Explain.
2. What are the different errors encountered during lexical analysis phase?
3. Define Context free grammar using proper notations.
4. Differentiate SLR grammar and LALR grammar?
5. Define S-attributed grammar and L-attributed grammar.
6. What is type checker?
7. List various attributes of symbol table.
8. Discuss the concept of back patching?
9. Explain the loops in a flow graph.
10. What is the advantage of code optimization?

PART – B
Answer any FIVE questions
All questions carry equal marks
5 x 10
1. Explain different phases of compiler by showing the output of each phase using
following statement, float m,n; m=m*55 + n + 3;
2. Briefly explain about loaders and linkers.
3. Discuss top-down parsing and bottom-up parsing with suitable examples.
4. Explain shift-reduce parsing with appropriate example.
5. Write a syntax directed translation scheme to construct a syntax tree. Explain with an
example.
6. What is a type expression? Explain equivalence of type expressions with an example.
7. Compare different storage allocation strategies.
8. Write short notes on peephole optimization techniques.

*

Code No.: 13CS318
III B.Tech. II Sem. (RA13) Supplementary Examinations, April – 2018
LANGUAGE PROCESSORS
(CSE)
Time: 3 Hours Max. Marks: 70
PART – A
Answer ALL questions
All questions carry equal marks
10 x 2
1. What do you know about lexical error? Explain.
2. Discuss about absolute loader.
3. Discuss the concept of backtracking.
4. Differentiate LL and LR parsers.
5. What is L-attributed grammar?
6. Discuss the concept of simple type checker.
7. Explain the advantages of dynamic storage allocation.
8. Discuss the concept of back patching?
9. Explain the concept of peephole optimization.
10. What are the issues in the design of code generation?
PART – B
Answer any FIVE questions
All questions carry equal marks
5 x 10
1. Discuss how input buffering helps lexical analyzer in compilation process.
2. a) Draw and explain the block diagram of phases of compiler and explain main
functions in each phase.
b) Construct NFA equivalent to regular expression r=(a+b)*(ab + ba) (a+b)*
3. a)What is an LL(1) grammar? Verify whether the following grammar is LL(1) or not?
S → iEtSS1 | a        S1 → eS | Є        E → b | c | d
b) Construct SLR parsing table for the following CFG:
S → L = R | R        L → *R | id        R → L
4. Distinguish between top down and bottom up parsing.
5. Write an S-attributed grammar to translate the following grammar into prefix notation
L → E
E → E + T | E - T | T
T → T * F | T / F | F
F → P ↑ E | P
P → (E) | id
6. What is type system? Explain static and dynamic checking of types.
7. Which data structure will be used to implement a symbol table in an efficient way?
Give reasons.
8. Efficient register allocation and assignment improves the performance of object code.
Justify this statement with suitable examples.

Code No.: 13CS318
III B.Tech. II Sem. (RA13) Supplementary Examinations, November – 2017
LANGUAGE PROCESSORS
(CSE)
Time: 3 Hours Max. Marks: 70
PART – A
Answer ALL questions
All questions carry equal marks
10 x 2
1. What is the need to divide compilation process into several phases? Explain.
2. What are the different errors encountered during lexical analysis phase?
3. Define Context free grammar using proper notations.
4. Differentiate SLR grammar and LALR grammar?
5. Define S-attributed grammar and L-attributed grammar.
6. What is type checker?
7. List various attributes of symbol table.
8. Discuss the concept of back patching?
9. Explain the loops in a flow graph.
10. What is the advantage of code optimization?

PART – B
Answer any FIVE questions
All questions carry equal marks
5 x 10
1. Explain different phases of compiler by showing the output of each phase using
following
statement, float m,n; m=m*55 + n + 3;
2. Briefly explain about loaders and linkers.
3. Discuss top-down parsing and bottom-up parsing with suitable examples.
4. Explain shift-reduce parsing with appropriate example.
5. Write a syntax directed translation scheme to construct a syntax tree. Explain with an
example.
6. What is a type expression? Explain equivalence of type expressions with an example.
7. Compare different storage allocation strategies.
8. Write short notes on peephole optimization techniques.

*

Code No.: CS317
III B.Tech. I Sem. (RA11) Supplementary Examinations, June – 2017
LANGUAGE PROCESSORS
(CSE)
Time: 3 Hours Max. Marks: 70
PART – A
Answer ALL questions
All questions carry equal marks
10 x 2
1. List the various compiler construction tools.
2. Differentiate between token and lexeme?
3. Draw parse tree for – (id + id).
4. What are the possible actions of shift reduce parser?
5. What is meant by implicit and explicit type conversion?
6. What are the applications of syntax-directed translation?
7. Translate the arithmetic expression a + - (b + c) into triples.
8. What are the various strategies used in dynamic storage allocation?
9. Define peep hole optimization?
10. What is the need for partitioning three address instructions into basic blocks?

PART – B
Answer any FIVE questions
All questions carry equal marks
5 x 10
1. a) What kind of source program errors would be detected during lexical analysis?
b) Explain the reasons for separating lexical analysis phase from syntax analysis.
2. Eliminate ambiguities from the following grammar.
S →iEtSeS|iEtS|a
E →b|c|d where a, b, c, d, e, i, t are terminals.
3. Suppose that the type of each identifier is a sub range of integers. For expressions with
the operators +, -, *, div and mod, as in Pascal, write type checking rules that assign to
each sub expression the sub range its value must lie in.
4. Discuss and analyze about all the allocation strategies in run-time storage environment.
5. What are legal evolution orders and names for the values at the nodes for the DAG for
following?
d = b + c
b = b c
a = c - d
e = a + b
6. What is a flow graph? Explain how flow graph can be constructed for a given program.
7. Write a procedure for constructing deterministic finite automata from a non-
deterministic automata, explain with an example.
8. What is bottom up parsing? Explain various bottom up parsing techniques.
*

Code No.: 13CS318
III B.Tech. II Sem. (RA13) Supplementary Examinations, May – 2017
LANGUAGE PROCESSORS
(CSE)
Time: 3 Hours Max. Marks: 70
PART – A
Answer ALL questions
All questions carry equal marks
10 x 2
1. What is the difference between type checking and bound checking?
2. Write regular expression for the language “All strings of a’s and b’s that do not contain
the substring abb.”
3. Write left most derivation for the input string id + id*id.
4. Define left factoring.
5. What is meant by implicit and explicit type conversion?
6. What are the attributes of syntax-directed translation?
7. Translate the arithmetic expression a + - (b + c) into quadruples.
8. List the kind of date that appears in an activation record.
9. What are peep hole optimization techniques?
10. What is dead code elimination?
PART – B
Answer any FIVE questions
All questions carry equal marks
5 x 10
1. Draw a block diagram of phases of a compiler and indicate the main functions of each phase.
2. Construct LALR(1) parse table for the following grammar
S → Aa | bAc | Bc | bBa
A→d
B→d
3. a) Differentiate S-attributed and L-attributed grammars.
b) Give implementation schemes for three address notation.
4. Convert the following arithmetic expression into syntax tree and three address code
b*3 (a + b)
5. What are the various machine dependent code optimization techniques?
6. What are the applications of DAG? Explain how the following expression can be
converted in a DAG.
a+b*(a + b)+c+d
7. Construct SLR passing table for the following grammar.
E→E+T|T
T → TF | F
F → F* | a | b.
8. a) What kind of source program errors would be detected during lexical analysis?
b) Explain the reasons for separating lexical analysis phase from syntax analysis.
*

Code No.: 13CS318
III B.Tech. II Sem. (RA13) Regular Examinations, April – 2017
LANGUAGE PROCESSORS
(CSE)
Time: 3 Hours Max. Marks: 70
PART – A
Answer ALL questions
All questions carry equal marks
10 x 2
1. Construct LEADING () and TRAILING () from the following grammar.
S → iEtS | iEtSeS | a        E → b
2. What is the difference between a LEX and YACC?
3. What is an ambiguous grammar? Give one example.
4. Left factor the following grammar: E→ E+T │ T
5. What are S-attributed and L-attributed definitions?
6. What is static checking? Give two examples of static checks?
7. What are the different categories of storage allocation strategies?
8. What is a symbol table? What are its contents?
9. What is an activation record?
10. Construct a DAG for the following block
a := b + c        b := b - d        c := c + d        e := b + c
PART – B
Answer any FIVE questions
All questions carry equal marks
5 x 10
1. a) Explain various phases of a compiler with a neat diagram. [6]
b) Design finite automata to accept the binary string which is divisible by 4. [4]
2. Construct precedence function table from the following grammar.
bexpr → bexpr + bterm | bterm
bterm → bterm * bfactor | bfactor
bfactor → (bexpr) | a
3. Verify whether the following grammar is LL(1) or not
S → iEtSS1 | a        S1 → eS | ∈        E → b
4. Write the syntax directed definition for the following grammar and also draw the
annotated tree for the i/p sting 3*5+4n.
L→ En E→ E+T│ T T →T * F│ F F →( E)│ digit
5. a) Write the semantic rules for type checking of expression.
b) Discuss the language facilities for dynamic storage allocation.
6. a) Explain the translation of control flow statements into three address code generated
by the following grammar.
S → if E then S1 │ if E then S1 else S2 │ while E do S1
b) Discuss about back patching.
7. a) Briefly explain the need of next-use information for generating object code.
b) Discuss the issues in the design of code generator.
8. a) Discuss about peephole optimization. [6]
b) Explain error recovery in predictive parsing. [4]
*

Code No.: CS317
III B.Tech. I Sem. (RA11) Supplementary Examinations, Oct/Nov – 2016
LANGUAGE PROCESSORS
(CSE & IT)
Time: 3 Hours Max. Marks: 70
PART – A
Answer ALL questions
All questions carry equal marks
10 x 2
1. Define a compiler.
2. What is the function of a linker?
3. Define predictive parsing.
4. What is the importance of LR parser?
5. What is an attribute?
6. What is syntax-directed definition?
7. What is the importance of storage organization?
8. How is an assignment statement represented?
9. Define code-generator.
10. What is a global data flow analyzer?

PART – B
Answer any FIVE questions
All questions carry equal marks
5 x 10
1. What is the procedure for recognizing tokens? Explain.
2. What are the conditions to be satisfied for LL(1) grammar?
3. Differentiate between LL and LR parsers.
4. With the help of an example, explain how the syntax trees will be constructed?
5. List and explain the rules for type checking.
6. What are the different dynamic storage allocation techniques? Explain.
7. What is the importance of Dag representation? Explain in detail.
8. Draw the DAG for the following expressions:
a=b+c
b=a–d
c=b+c
d=a–d

*

Code No.: CS317
III B.Tech. I Sem. (RA11) Supplementary Examinations, May/June – 2016
LANGUAGE PROCESSORS
(CSE & IT)
Time: 3 Hours Max. Marks: 70
PART – A
Answer ALL questions
All questions carry equal marks
10 x 2
1. What is compiler and list various types of compiler writing tools?
2. Define a parser and enumerate various types of parses.
3. What are the three tasks of a code generator?
4. What is a control stack and give its importance?
5. Describe the importance of code optimization.
6. Explain structure of LEX program.
7. Define CFG and give an example of CFG.
8. Draw the schematic diagram of Activation record.
9. What is a Flow Graph?
10. List various types of storage allocation strategies.

PART – B
Answer any FIVE questions
All questions carry equal marks
5 x 10
1. Explain the various phases of a compiler.
2. a) Draw an NFA that accepts (a|b)*a
b) Explain the role of Transition Diagrams in the construction of a lexical analyzer.
3. Generate SLR parsing table for the following grammar
E → E + T | T
T → T * F | F
F → a | b
4. a) Discuss Static Versus Dynamic Storage Allocation strategies.
b) Explain about Activation Records.
5. a) Explain Three Address Code with an example. [4]
b) Explain Flow Graph with an example block of code. [6]
6. Explain optimization of basic blocks in code generation.
7. a) Briefly explain the logical structure of a compiler front end. [4]
b) Give DAG representation scheme for the following expression. [6]
((a-b)*c)-d
8. Write about:
a) Copy Propagation [3]
b) Dead-Code Elimination [3]
c) Type Conversions [4]
*

Code No.: 13CS318
III B.Tech. II Sem. (RA13) Regular Examinations, April/May – 2016
LANGUAGE PROCESSORS
(CSE)
Time: 3 Hours Max. Marks: 70
PART – A
Answer ALL questions
All questions carry equal marks

10 x 2

1. What are the phases of a compiler?


2. List the compiler writing tools.
3. What is the function of a parser?
4. What is the importance of SLR parsing table?
5. Define syntax-directed translation.
6. What is the function of type checker?
7. List the dynamic storage allocation techniques.
8. Give an example of Boolean expression.
9. What is the importance of DAG?
10. What is optimization?

PART – B
Answer any FIVE questions
All questions carry equal marks
5 x 10
1. What is the importance of a lexical analyser? Explain in detail.
2. Differentiate between lexical analysis and syntactic analysis.
3. List and explain the steps involved in constructing SLR table.
4. With the help of an example, explain how to evaluate an SDD at the nodes of a parse
tree?
5. Explain the concept of type conversion.
6. List and explain the storage allocation strategies.
7. Write a short notes on:
a) Assignment statements
b) Boolean expressions
8. How to generate code from a given DAG expression?

*

Code No.: CS317
III B.Tech. I Sem. (RA11) Supplementary Examinations, June – 2015
LANGUAGE PROCESSORS
(Common to CSE & IT)
Time: 3 Hours Max. Marks: 70
PART – A
Answer ALL questions
All questions carry equal marks
10 x 2 = 20
1. What is the effect of increasing the number of phases in a compiler?
2. What is left recursion? Give an example.
3. Define and give an example for LR(0) grammar.
4. Explain any two error recovery strategies used in predictive parsing.
5. What is the purpose of YACC tool?
6. Define synthesized attribute and inherited attribute.
7. Explain call by name.
8. Write short note on loop optimization.
9. What are the cousins of the compiler?
10. Construct DAG for the expression: a + a + (b - c) + (b - c) *d.
PART – B
Answer any FIVE questions
All questions carry equal marks
5 x 10 = 50
1. Suppose we have two types of tokens:
a. The keywords if and while
b. Identifiers, which are strings of letters and digits other than if and while. Construct a DFA
for accepting these tokens.
2. a. Write algorithm for non-recursive predictive parsing.
b. What is left factoring? Explain with a suitable example.
3. a. Find the first and follows for all the non-terminals In the grammar productions
{A → aBc | ∈, B → cab | CaD, C → abA | ∈, D → aab | bac}.
b. Define S-attributed definition and L-attributed definition?
4. Discuss the symbol table organization for block structured languages like PASCAL or C.
5. Write the code generation algorithm and describe the code generation for a simple three address
statement: x : = y + z.
6. Write the quadruples, triples and indirect triples for the following expressions.
a. (a+b)/(c-d)*(e/f + g).
b. (a+b-c/d/e).
7. Explain about storage allocation strategies.
8. Write short notes on peephole optimization Technique.
*

Code No.: CS317
III B.Tech. I Sem. (RA11) Regular Examinations, November – 2014
LANGUAGE PROCESSORS
(Common to CSE & IT)
Time: 3 Hours Max. Marks: 70
PART – A
Answer ALL questions
All questions carry equal marks
10 x 2 = 20
1. List and explain the basic functions of language translators.
2. Define Left factoring. Distinguish CLR and LALR parsing tables.
3. Define translation scheme and mention how it is different from syntax-directed definition.
4. What is a grammar? Also explain what do you mean by parse tree.
5. Prove or disproof that LL (1) grammar can be ambiguous.
6. Define parsing. Also mention the different classes of parsing techniques and the difference
between them.
7. What is activation record and mention what are all the fields of activation record.
8. Write note on machine independent optimization.
9. List out two properties of reducible flow graph.
10. Write short notes on Next-use information.
PART – B
Answer any FIVE questions
All questions carry equal marks
5 x 10 = 50
1. Describe various phases of a compiler while translating the assignment statement:
a = p + r *10 in to assembly language.
2. Describe the process of generating lexical analyzer using LEX tool and also explain the syntax
of LEX specification with example.
3. Write parse tree for the following grammar.
E → E + E | E * E | (E) | E - E | id. Is this grammar ambiguous or not?
4. Construct the SLR parse table for the following grammar.
G={S → AabB | Bb, A → aA | a, B → Ba | b}
5. a. Write top-down translation scheme to produce quadruples for boolean expression.
b. Distinguish static and dynamic type checking.
6. Describe various storage allocation strategies employed by a compiler.
7. Briefly explain the design issues in code generation.
8. Explain the common sub expression elimination, copy propagation, and transformation for
moving loop invariant computations in detail.
*

Code No.: CS317
III B.Tech. I Sem. (RA11) Supplementary Examinations, May – 2014
LANGUAGE PROCESSORS
(Common to CSE & IT)
Time: 3 Hours Max. Marks: 70
PART – A
Answer ALL questions
All questions carry equal marks
10 x 2 = 20
1. List cousins of the compilers.
2. Give a short note on DFA-based pattern matcher.
3. Distinguish regular expressions vs. context-free grammar.
4. Define Syntax-Directed Definition.
5. Define type systems.
6. Explain dangling references.
7. Write about target machine.
8. Define flow graphs.
9. List advantages of an organization in an optimizing compiler.
10. Write about copy propagation.

PART – B
Answer any FIVE questions
All questions carry equal marks
5 x 10 = 50
1. What is compiler? Explain the phases of a compiler.
2. Write about the role of the lexical analyzer.
3. Briefly explain "L-attributed definitions" in bottom-up evaluation of inherited attributes.
4. Explain type conversions with a suitable example.
5. Explain briefly storage allocation techniques.
6. Explain Boolean expressions.
7. List and discuss the issues in the design of a code generator.
8. Write about Code-improving Transformations.

*

Prepared by G. Sunil Reddy, Asst. Professor,


Department of CSE, SR Engineering College, Warangal 284
Code No.: CS317
III B.Tech. I Sem. (RA11) Regular Examinations, November – 2013
LANGUAGE PROCESSORS
(Common to CSE & IT)
Time: 3 Hours Max. Marks: 70
PART – A
Answer ALL questions
All questions carry equal marks
10 x 2 = 20
1. Explain the grouping of phases.
2. Write about Input Buffering.
3. Explain Error-recovery strategies of the parser.
4. Define Syntax-Directed Definition.
5. Define static checking.
6. List source language issues.
7. Write about Syntax Tree.
8. Define Basic Blocks.
9. What is a Code Optimization?
10. Write about common sub-expressions.

PART – B
Answer any FIVE questions
All questions carry equal marks
5 x 10 = 50
1. Give a brief note on analysis of the source program.
2. Briefly explain specification of tokens.
3. Explain Bottom-Up parsing with an example.
4. Write about Symbol tables.
5. Explain briefly storage organization.
6. Explain three-address code and its types.
7. Write about backpatching.
8. Briefly explain loops in flow graphs.

*

GLOSSARY

1. Compiler - A program that reads a program written in one language and translates it in to
an equivalent program in another language.
2. Analysis part - Breaks up the source program into constituent pieces and creates an
intermediate representation of the source program.
3. Synthesis part - Constructs the desired target program from the intermediate
representation.
4. Structure editor - Takes as input a sequence of commands to build a source program. A
related tool, the pretty printer, analyses a program and prints it in such a way that the
structure of the program becomes clearly visible.
5. Static checker - Reads a program, analyses it and attempts to discover potential bugs
without running the program.
6. Linear analysis - This is the phase in which the stream of characters making up the
source program is read from left to right and grouped in to tokens that are sequences of
characters having collective meaning.
7. Hierarchical analysis - This is the phase in which characters or tokens are grouped
hierarchically in to nested collections with collective meaning.
8. Semantic analysis - This is the phase in which certain checks are performed to ensure
that the components of a program fit together meaningfully.
9. Loader - Is a program that performs the two functions: Loading and Link editing
10. Loading - Taking relocatable machine code, altering the relocatable address and placing
the altered instructions and data in memory at the proper locations.
11. Link editing - Makes a single program from several files of relocatable machine code.
12. Preprocessor - Produces input to compilers and expands macros into source language
statements.
13. Symbol table - A data structure containing a record for each identifier, with fields for the
attributes of the identifier. It allows us to find the record for each identifier quickly and to
store or retrieve data from that record quickly.
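As a quick illustration of the symbol table entry above, here is a minimal sketch in Python (our own illustration, not from the course text; the class and method names are hypothetical) of a table with nested scopes supporting fast insert and lookup:

```python
class SymbolTable:
    """Minimal symbol table: one dict per scope, innermost scope last."""

    def __init__(self):
        self.scopes = [{}]  # global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def insert(self, name, **attributes):
        # Record the identifier with its attributes in the current scope.
        self.scopes[-1][name] = attributes

    def lookup(self, name):
        # Search from the innermost scope outward.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

table = SymbolTable()
table.insert("x", type="int", offset=0)
table.enter_scope()
table.insert("x", type="float", offset=4)
print(table.lookup("x"))  # {'type': 'float', 'offset': 4}; the innermost x wins
```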
14. Assembler - A program which translates assembly language into relocatable machine code.

15. Lexeme - a sequence of characters in the source program that is matched by the pattern
for a token.
16. Regular set - language denoted by a regular expression.
17. Sentinel - a special character that cannot be part of the source program. It speeds up the
lexical analyzer.
18. Regular expression - a notation for describing a regular language.
19. Recognizers - machines which accept the strings belonging to certain language.
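Tying entries 15-19 together, the following sketch (our own, using Python's re module; the token classes are hypothetical) builds a recognizer from regular expressions and yields (token, lexeme) pairs:

```python
import re

# One regular expression per token class; order matters.
TOKEN_SPEC = [
    ("NUM",  r"\d+"),
    ("ID",   r"[A-Za-z_]\w*"),
    ("OP",   r"[+\-*/=]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    for match in MASTER.finditer(source):
        if match.lastgroup != "SKIP":
            yield match.lastgroup, match.group()  # (token, lexeme)

print(list(tokenize("count = count + 1")))
# [('ID', 'count'), ('OP', '='), ('ID', 'count'), ('OP', '+'), ('NUM', '1')]
```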
20. Parser - the module that performs syntax analysis; its output is a parse tree.
21. Error handler- reports the presence of errors clearly and accurately and recovers from
each error quickly enough to be able to detect subsequent errors.
22. Context free grammar - consists of terminals, non-terminals, a start symbol, and
productions.
23. Terminals - basic symbols from which strings are formed. "Token" is a synonym for terminal.
24. Non terminals - syntactic variables that denote sets of strings, which help define the
language generated by the grammar.
25. Start symbol - one of the non terminals in a grammar and the set of strings it denotes is
the language defined by the grammar. Ex: S.
26. Context free language - a language that can be generated by a grammar.
27. Leftmost derivations – derivations in which the leftmost non terminal in any sentential
form is replaced at each step.
28. Canonical derivations - derivations in which the rightmost non terminal is replaced at
each step; also called rightmost derivations.
29. Parse tree - a graphical representation for a derivation that filters out the choice
regarding replacement order.
30. Ambiguous grammar - A grammar that produces more than one parse tree for some
sentence is said to be ambiguous.
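A small worked example connecting entries 27-30, using the ambiguous grammar E → E + E | E * E | id (our own illustration):

```
Leftmost derivation of id + id * id:
  E => E + E => id + E => id + E * E => id + id * E => id + id * id

Rightmost (canonical) derivation of the same string:
  E => E + E => E + E * E => E + E * id => E + id * id => id + id * id

The same sentence also derives from E => E * E => E + E * E => ...,
giving a second, structurally different parse tree, so the grammar
is ambiguous: one sentence, two distinct parse trees.
```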
31. Left recursive - A grammar is left recursive if it has a non terminal A such that there is
a derivation A ⇒+ Aα for some string α.
32. Left factoring - a grammar transformation that is useful for producing a grammar
suitable for predictive parsing
33. Parsing - the process of determining if a string of tokens can be generated by a grammar.

34. Top Down parsing - Construction of a parse tree starting at the root, which is labeled
with the starting non terminal, and proceeding towards the leaves.
35. Recursive Descent Parsing - top down method of syntax analysis in which we execute a
set of recursive procedures to process the input.
36. Predictive parsing - A special form of Recursive Descent parsing in which the look-
ahead symbol unambiguously determines the procedure selected for each nonterminal, so
that no backtracking is required.
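A minimal sketch of entries 35-36 (our own example, hypothetical grammar): a predictive recursive-descent parser for E → id E', E' → + id E' | ε, where the lookahead token alone selects each production, so no backtracking occurs:

```python
def parse_expression(tokens):
    """Parse the grammar  E -> id E'   and   E' -> + id E' | epsilon."""
    pos = 0

    def lookahead():
        return tokens[pos] if pos < len(tokens) else None

    def match(expected):
        nonlocal pos
        if lookahead() != expected:
            raise SyntaxError(f"expected {expected!r}, got {lookahead()!r}")
        pos += 1

    def E():
        match("id")
        E_prime()

    def E_prime():
        # Lookahead '+' selects E' -> + id E'; anything else selects epsilon.
        if lookahead() == "+":
            match("+")
            match("id")
            E_prime()

    E()
    if lookahead() is not None:
        raise SyntaxError("trailing input")
    return True

print(parse_expression(["id", "+", "id", "+", "id"]))  # True
```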
37. Bottom Up Parsing - a parsing method in which construction of the parse tree starts at
the leaves and proceeds towards the root.
38. Shift-Reduce parsing - A general style of bottom-up syntax analysis, which attempts to
construct a parse tree for an input string beginning at the leaves and working up towards
the root.
39. Handle - a substring that matches the right side of a production and whose reduction to
the nonterminal on the left side of the production represents one step along the reverse of
a rightmost derivation.
40. Handle pruning - the process of obtaining a rightmost derivation in reverse.
41. Viable prefixes - the set of prefixes of right sentential forms that can appear on the stack
of a shift-reduce parser.
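To see entries 37-41 in action, here is a hand-worked shift-reduce trace (our own example) for the input id + id under the grammar E → E + E | id; every stack content shown is a viable prefix:

```
Stack        Input        Action
$            id + id $    shift
$ id         + id $       reduce by E -> id     (handle: id)
$ E          + id $       shift
$ E +        id $         shift
$ E + id     $            reduce by E -> id     (handle: id)
$ E + E      $            reduce by E -> E + E  (handle: E + E)
$ E          $            accept
```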
42. Operator grammar - A grammar is an operator grammar if no production rule involves "ε"
on the right side and no two adjacent non terminals appear on the right side of any production.
43. LR parsing method - most general nonbacktracking shift-reduce parsing method.
44. Goto function - takes a state and grammar symbol as arguments and produces a state.
45. LR grammar - A grammar for which we can construct a parsing table is said to be an
LR grammar.
46. Kernel items - the initial item S' → ·S, together with all items whose dots are not at the
left end of the right side.
47. Non kernel items - items which have their dots at the left end of the right side, except
the initial item S' → ·S.
48. Synthesized Attributes - They are computed from the values of the attributes of the
children nodes

49. Inherited Attributes - They are computed from the values of the attributes of both the
siblings and the parent nodes
50. S-Attributed Definition – It is a Syntax Directed Definition that uses only synthesized
attributes. We can evaluate its attributes in any bottom-up order of the nodes of the parse
tree (e.g. a postorder traversal, as performed by an LR parser).
51. Evaluation Order - Semantic rules in an S-Attributed Definition can be evaluated by a
bottom-up, or PostOrder, traversal of the parse-tree.
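Entries 48-51 in runnable form: a sketch (our own, with a hypothetical node representation) that computes a synthesized val attribute by a postorder traversal, the same bottom-up order an LR parser would use:

```python
# Each node is either a leaf integer or a tuple (operator, left, right).
# The synthesized attribute 'val' of a node is computed from the 'val'
# attributes of its children: a postorder (bottom-up) traversal.

def evaluate(node):
    if isinstance(node, int):          # leaf: digit token, val = lexval
        return node
    op, left, right = node
    lval, rval = evaluate(left), evaluate(right)   # children first
    return lval + rval if op == "+" else lval * rval

# Syntax tree for 3 * (4 + 5):
tree = ("*", 3, ("+", 4, 5))
print(evaluate(tree))  # 27
```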
52. Dependency Graphs - Dependency Graphs are the most general technique used to
evaluate syntax directed definitions with both synthesized and inherited attributes. A
Dependency Graph shows the interdependencies among the attributes of the various
nodes of a parse-tree.
53. Dependency Rule - If an attribute b depends on an attribute c, then the semantic rule for
c must be evaluated before the semantic rule for b.
54. L-Attributed Definitions – They contain both synthesized and inherited attributes but do
not require building a dependency graph to evaluate them.
55. Translation Schemes – These are more implementation oriented than syntax directed
definitions since they indicate the order in which semantic rules and attributes are to be
evaluated.
56. Attribute grammar - a special form of context-free grammar in which additional
information (attributes) is attached to one or more of its non-terminals in order to
provide context-sensitive information.
57. S-attributed grammars - are a class of attribute grammars characterized by having no
inherited attributes, but only synthesized attributes. Inherited attributes, which must be
passed down from parent nodes to children nodes of the abstract syntax tree during the
semantic analysis of the parsing process, are a problem for bottom-up parsing because in
bottom-up parsing, the parent nodes of the abstract syntax tree are created after creation
of all of their children.
58. L-attributed grammars - are a special type of attribute grammars. They allow the
attributes to be evaluated in one left-to-right traversal of the abstract syntax tree. As a
result, attribute evaluation in L-attributed grammars can be incorporated conveniently in
top-down parsing.

59. Type Checking - A compiler must check that the source program follows both syntactic
and semantic conventions of the source language. This checking, called static checking,
detects and reports programming errors.
60. Type checks - A compiler should report an error if an operator is applied to an
incompatible operand. Example: adding an array variable to a function variable applies
an operator to incompatible operands.
61. Flow-of-control checks - Statements that cause flow of control to leave a construct must
have some place to which to transfer the flow of control. Example: a break statement must
be enclosed by a while, for, or switch statement; it is an error if no such statement encloses it.
62. Uniqueness checks - An object must be defined exactly once in certain situations, e.g.
the declaration of an identifier, or the labels of a case/switch statement.
63. Name related checks - Sometimes the same name must appear two or more times, and
the compiler must check that the same name is used in each place.
64. Type system - A type system is a collection of rules for assigning type expressions to the
various parts of a program. A type checker implements a type system. It is specified in a
syntax-directed manner.
65. Type expression - The type of a language construct will be denoted by a “type
expression”. A type expression is either a basic type or is formed by applying an operator
called a type constructor to other type expressions.
66. Postfix notation - a linearized representation of a syntax tree, in which a node appears
immediately after its children. Syntax trees for assignment statements are produced by a
syntax-directed definition.
67. Three-address code - is a linearized representation of a syntax tree or a dag in which
explicit names correspond to the interior nodes of the graph.
68. Quadruple - a record structure with four fields: op, arg1, arg2 and result. op
denotes the operator, arg1 and arg2 denote the two operands, and result is used to
store the result of the expression.
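A concrete illustration of entries 67-68, using the classic textbook expression a = b * -c + b * -c: its three-address code and the corresponding quadruples side by side:

```
Three-address code        Quadruples (op, arg1, arg2, result)
t1 = minus c              (uminus, c,  -,  t1)
t2 = b * t1               (*,      b,  t1, t2)
t3 = minus c              (uminus, c,  -,  t3)
t4 = b * t3               (*,      b,  t3, t4)
t5 = t2 + t4              (+,      t2, t4, t5)
a  = t5                   (=,      t5, -,  a)
```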
69. Triples – This representation does not use an extra temporary variable to represent a
single operation; instead, when a reference to another triple's value is needed, a pointer to
that triple is used. It therefore consists of only three fields: op, arg1 and arg2.
70. Indirect Triples – This representation makes use of a pointer to a separately stored
listing of all references to computations. It is similar in utility
to the quadruple representation but requires less space; temporaries are implicit, and it is
easier to rearrange code.
71. Abstract or syntax tree - A tree in which each leaf represents an operand and each
interior node an operator is called as abstract or syntax tree.
72. Triples - a record structure with three fields: op, arg1 and arg2.
73. Declaration - The process of declaring keywords, procedures, functions, variables, and
statements with proper syntax.
74. Boolean Expression - Expressions which are composed of the Boolean operators (and,
or, and not) applied to elements that are Boolean variables or relational expressions.
75. Calling sequence - A sequence of actions taken on entry to and exit from each
procedure.
76. Back patching - the activity of filling in unspecified label information using appropriate
semantic actions during the code generation process.
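A minimal sketch of entry 76 (the helper names makelist, merge and backpatch follow the usual textbook presentation; the instruction format is our own assumption):

```python
# code[i] is a jump instruction whose target slot may still be None.
code = []

def emit(instruction, target=None):
    code.append([instruction, target])
    return len(code) - 1

def makelist(index):        # new list holding one unfilled jump
    return [index]

def merge(list1, list2):    # concatenate two lists of unfilled jumps
    return list1 + list2

def backpatch(jump_list, target):
    # Fill every listed instruction's target slot with the now-known label.
    for index in jump_list:
        code[index][1] = target

truelist = makelist(emit("goto"))   # target unknown when emitted
falselist = makelist(emit("goto"))
backpatch(truelist, 100)            # label 100 becomes known later
backpatch(falselist, 104)
print(code)  # [['goto', 100], ['goto', 104]]
```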
77. Basic blocks - A sequence of consecutive statements which may be entered only at the
beginning and when entered are executed in sequence without halt or possibility of
branch.
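Entry 77 made operational: a sketch (instruction format assumed, not from the text) of the standard leader-based partitioning of three-address code into basic blocks, where the first instruction, every jump target, and every instruction following a jump starts a new block:

```python
def find_leaders(instructions):
    """instructions: list of (op, target) pairs; target is an index or None."""
    leaders = {0}                            # first instruction is a leader
    for i, (op, target) in enumerate(instructions):
        if op in ("goto", "if_goto"):
            leaders.add(target)              # target of a jump
            if i + 1 < len(instructions):
                leaders.add(i + 1)           # instruction after a jump
    return sorted(leaders)

def basic_blocks(instructions):
    leaders = find_leaders(instructions) + [len(instructions)]
    return [instructions[a:b] for a, b in zip(leaders, leaders[1:])]

prog = [("assign", None), ("if_goto", 3), ("assign", None), ("goto", 1)]
for block in basic_blocks(prog):
    print(block)
```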
78. Flow graph - The basic block and their successor relationships shown by a directed
graph is called a flow graph.
79. Virtual machine - An intermediate language as a model assembly language, optimized
for a non-existent but ideal computer
80. Back-end - Intermediate to binary translation is usually done by a separate compilation
pass called back end.
81. Relocatable object module - The unpatched binary image is usually called a relocatable
object module.
82. Multi register operations – operations that require more than one register to perform.
83. Cost of an instruction - one plus the costs associated with the source and destination
address modes; this cost corresponds to the length of the instruction. Recursive procedure
- A procedure is recursive if a new activation can begin before an earlier activation of the
same procedure has ended.
84. DAG - a directed acyclic graph with labels on its nodes: leaves are labeled by unique
identifiers and interior nodes by operator symbols.

85. Memory management - Mapping names in the source program to addresses of data
object in run time memory.
86. Back patching - Process of leaving a blank slot for missing information and fill in the
slot when the information becomes available.
87. Algebraic transformation - change the set of expressions computed by a basic block
into an algebraically equivalent set.
88. Register descriptor - keeps track of what is currently in each register.
89. Address descriptor - keeps track of the location where the current value of the name can
be found at run time.
90. Local optimization - The optimization performed within a block of code.
91. Constant folding - Deducing at compile time that the value of an expression is a
constant and using the constant instead.
92. Common Sub expressions - An occurrence of an expression E is called a common sub
expression, if E was previously computed, and the values of variables in E have not
changed since the previous computation.
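Entries 90-92 on a concrete fragment (our own example): t3 recomputes b + c, so local optimization can replace its uses with t1:

```
Before                  After common-subexpression elimination
t1 = b + c              t1 = b + c
t2 = t1 * d             t2 = t1 * d
t3 = b + c              (eliminated; uses of t3 become uses of t1)
t4 = t3 * d             t4 = t1 * d   <- now itself a common
                                         subexpression with t2
```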
93. Dead Code - A variable is live at a point in a program if its value can be used
subsequently; otherwise, it is dead at that point. A statement that computes values that
never get used is known as dead code or useless code.
94. Reduction in strength - replaces an expensive operation by a cheaper one such as a
multiplication by an addition.
95. Loop invariant computation - An expression that yields the same result independent of
the number of times the loop is executed.
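Entries 94-95 on small before/after fragments (our own examples):

```
Reduction in strength (entry 94): replace an expensive operation
by a cheaper one, e.g.  i * 2  -->  i + i.

Loop-invariant code motion (entry 95): y + z has the same value on
every iteration, so it is computed once, before the loop.
  before:                     after:
    while i < n:                t = y + z
        x = y + z               while i < n:
        a[i] = x + i                a[i] = t + i
        i = i + 1                   i = i + 1
```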
96. Static allocation - the position of an activation record in memory is fixed at compile time.
97. Stack allocation - manages the run-time storage as a stack
98. Heap allocation – allocates and deallocates storage as needed at run time from a data
area known as heap
99. Activation tree - A tree which depicts the way control enters and leaves activations.
100. Activation record - Procedure calls and returns are usually managed by a run-time
stack called the control stack. Each live activation has an activation record on the control
stack; the record of the activation at the root of the activation tree is at the bottom, and
the most recent activation has its record at the top of the stack.

101. Return Value: It is used by the called procedure to return a value to the calling procedure
102. Actual Parameter: It is used by the calling procedure to supply parameters to the called
procedure
103. Control Link: It points to activation record of the caller
104. Access Link: It is used to refer to non-local data held in other activation records
105. Saved Machine Status: It holds information about the status of the machine just before
the procedure is called
106. Local Data: It holds the data that is local to the execution of the procedure
107. Temporaries: It stores values that arise in the evaluation of expressions
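Entries 100-107 gathered into a single sketch (the field names follow the glossary; the Python rendering and the defaults are our own assumptions):

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class ActivationRecord:
    """One record per live activation, kept on the control stack."""
    return_value: Any = None                             # passed back to the caller
    actual_parameters: list = field(default_factory=list)
    control_link: Optional["ActivationRecord"] = None    # caller's record
    access_link: Optional["ActivationRecord"] = None     # non-local data
    saved_machine_status: dict = field(default_factory=dict)
    local_data: dict = field(default_factory=dict)
    temporaries: list = field(default_factory=list)

control_stack = []                     # the run-time control stack
main = ActivationRecord()
control_stack.append(main)             # root of the activation tree at the bottom
callee = ActivationRecord(control_link=main, actual_parameters=[42])
control_stack.append(callee)           # most recent activation on top
```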
108. Control stack - A stack which is used to keep track of live procedure activations.
109. Heap - A separate area of run-time memory which holds all other information. Padding
- Space left unused due to alignment considerations.
110. Call sequence - allocates an activation record and enters information into its fields
111. Return sequence - restores the state of the machine so that the calling procedure can
continue execution.
112. Dangling reference - occurs when there is a reference to storage that has been deallocated.
113. Lexical or static scope rule - determines the declaration that applies to a name by
examining the program text alone.
114. Dynamic scope rule - determines the declaration applicable to a name at run time, by
considering the current activations.
115. Block - a statement containing its own data declarations.
116. Access link - a pointer in each activation record which gives a direct implementation
of lexical scope for nested procedures.
117. Environment - refers to a function that maps a name to a storage location.
118. State - refers to a function that maps a storage location to the value held there.
119. Peephole optimization - a technique used in many compilers, in connection with the
optimization of either intermediate or object code: a short sequence of instructions (the
peephole) is examined and replaced by a shorter or faster equivalent whenever possible.
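A toy rendition of entry 119 (instruction format assumed, not from the text): slide a two-instruction window over the code and drop a load that immediately follows a store to the same location:

```python
def peephole(instructions):
    """Remove 'LOAD x' that immediately follows 'STORE x';
    the value is already in the register."""
    optimized = []
    for instr in instructions:
        if (optimized
                and instr[0] == "LOAD"
                and optimized[-1][0] == "STORE"
                and optimized[-1][1] == instr[1]):
            continue                      # redundant load: skip it
        optimized.append(instr)
    return optimized

code = [("STORE", "a"), ("LOAD", "a"), ("ADD", "b"), ("STORE", "c")]
print(peephole(code))
# [('STORE', 'a'), ('ADD', 'b'), ('STORE', 'c')]
```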
