QA of Compiler

The document provides a comprehensive overview of various concepts in compiler design, including definitions of parsers, context-free grammars, and language translators. It discusses parsing issues, the significance of lexical and syntax analyzers, and different types of object code forms. Additionally, it covers error handling, optimization techniques, and the differences between compilers and interpreters, along with other related topics.

PART-A

1. Define parser.

ANS. A parser takes as input tokens from the lexical analyzer and treats the token names as terminal symbols of a context-free grammar. The parser then constructs a parse tree for its input sequence of tokens; the parse tree may be constructed only figuratively, since in practice it need not be built explicitly.

2. Mention basic issues in parsing

Ans. Ambiguity:- An ambiguous grammar can yield more than one parse tree for the same input, which makes it difficult to translate from concrete syntax to abstract syntax and can affect how an expression is interpreted.
Left recursion:- Left recursion can cause a top-down parser to loop forever, and can sometimes cause a reasonable set of productions to be rejected due to shift/reduce or reduce/reduce conflicts.
Left factoring:- When two or more productions for a nonterminal share a common prefix, a predictive parser cannot choose among them with a single token of lookahead; such productions must be left-factored so that the common prefix is factored out.

3. Why lexical and syntax analyzers are separated out

Ans. Separating the lexical and syntax analyzers makes the parser simpler and more efficient, and improves the portability of the compiler:
 Simplicity:- Separating the details of lexical analysis from the syntax analyzer makes the syntax analyzer smaller and less complex.
 Efficiency:- A separate lexical analyzer can be optimized on its own, for example with specialized input-buffering techniques.
 Portability:- Input-device-specific peculiarities can be confined to the lexical analyzer, which reads the source files, so the parser itself remains portable.
4. Define CFG?/what do you mean by CFG?

Ans. A context-free grammar (CFG) is a formal system that defines the structure of a language by describing how to form sentences:
 Definition:- A CFG is a set of rules that specify how to replace a nonterminal symbol with a string of terminals and nonterminals.
 Use:- CFGs are used to describe programming languages and natural languages, and to generate parsers in compilers.
For example, a CFG for the language of palindromes over {0, 1} might include the rules
P → ε, P → 0, P → 1, P → 0P0, and P → 1P1.
 Components: A CFG is defined by a four-tuple G = (V, T, P, S):
 V: The set of nonterminal symbols, conventionally written as capital letters
 T: The set of terminal symbols, conventionally written as lowercase letters
 P: The set of production rules, each of which rewrites a nonterminal as a string of terminals and nonterminals
 S: The start symbol, from which every string of the language is derived
5. Define the terms language translator and compiler

Ans. Language translator:- A program that translates a program written in one language
into a program in another language. The translator preserves the functional or logical
structure of the original code.
Compiler:- A type of language translator that converts a high-level language program into
machine code. Compilers translate the entire source code into machine code in one step.
Here are some other types of language translators:
Interpreter: Translates high-level languages into machine code one line at a time.
Assembler: Converts assembly language code (a low-level symbolic language) into machine
code.
Language translators are used because computers only understand machine language, which
is made up of 0s and 1s.
6. What is a flow graph? Explain with example.

Ans. A flow graph is a directed graph that represents the flow of control among the basic blocks of a program. It shows how program control is passed among the blocks.

Here's how a flow graph works: Nodes represent basic blocks; edges represent the possible decision paths between blocks. Flow graphs are used to represent the logic of a program.
There are also other types of graphs that are related to flow graphs, including:
Signal-flow graph:-A directed graph that represents a set of linear algebraic equations. The
nodes represent variables, and the branches represent the coefficients relating the variables.
Data flow graph:-A representation of a machine-language program. It consists of nodes
called actors or operators connected by directed arcs or links. The arcs contain tokens or
data values.
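The question asks for an example; here is a small hedged sketch in three-address code, with block labels B1–B4 chosen for illustration:

B1:  i = 1
B2:  if i > 10 goto B4
B3:  i = i + 1
     goto B2
B4:  halt

The nodes of the flow graph are the blocks B1–B4; the edges are B1→B2, B2→B3, B2→B4 and B3→B2, the last edge closing the loop.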

7. List out different object code forms.

Ans. The primary object code forms are relocatable object code and absolute object code. Relocatable code is the most common, allowing flexibility in memory placement during linking, while absolute code is directly executable at a fixed memory location.
Intermediate Object Code: Some compilers may generate an intermediate representation
of the program before producing final object code, facilitating optimization and code
generation.
Object File Formats:- Different operating systems and compilers may utilize different
object file formats (like COFF, ELF, PE) to store object code, including additional
information like symbol tables and debugging data.

8. Differentiate abstract syntax tree and DAG representation of intermediate code

Ans. An abstract syntax tree (AST) and a directed acyclic graph (DAG) are both used to
represent intermediate code, but they differ in how they represent the structure of a program:
Abstract syntax tree (AST):- A simplified parse tree that retains the
syntactic structure of code. An AST is a tree representation of the
abstract syntactic structure of text written in a formal language. Each
node in the tree denotes a construct occurring in the text.
Directed acyclic graph (DAG):- A graphical representation of symbolic expressions in which any two provably equal expressions share a single node. A DAG is a directed graph with no directed cycles. DAGs are generated as a combination of trees, where operands that are reused are linked to a single shared node.
DAGs and ASTs are both essential techniques used in compiler design
to optimize and translate source code into machine code or other intermediate
representations.
9. Define left recursion. Is the following grammar left recursive? E → E + E | E * E | a | b

Ans. A grammar is left-recursive if and only if there exists a nonterminal symbol that can derive a sentential form with itself as the leftmost symbol. The given grammar is left recursive, because both E → E + E and E → E * E have E as the leftmost symbol of their right-hand sides.

Left recursion of the form A → Aα | β is removed by rewriting it as A → βA', A' → αA' | ε.

Applying this to E → E + E | E * E | a | b (with α1 = +E, α2 = *E, β1 = a, β2 = b) gives:
E → aE' | bE'
E' → +EE' | *EE' | ε

10. What is hashing?

Ans. Hashing is a technique that converts a key or string of characters into a shorter, fixed-
length value. This process is used to make it easier to find or use the original string. Hashing
is used in a variety of applications, including:
Hash tables: Hashing is commonly used to set up hash tables, which are array-based
structures that store key-value pairs.
Digital forensics and data security: Hashing algorithms, such as Message Digest 5 (MD5)
and Secure Hashing Algorithm (SHA) 1 and 2, are used in these fields.
Universities and libraries: Hashing is used to assign unique identifiers to students and
books, which can then be used to retrieve information about them.
Database management systems:- Hashing can be used to calculate the direct position of a
data record on a disk without using an index structure.
Hashing works by using a hash function, a mathematical algorithm that generates the new fixed-length value from the key.
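As a hedged illustration (the function name and the modulus 211 are assumptions, not from the source), a typical string hash function of the kind used for hash tables looks like this in C:

/* Map a string key to a bucket index in [0, 211). */
unsigned hash(const char *key)
{
    unsigned h = 0;
    while (*key)
        h = h * 31 + (unsigned char)*key++;   /* mix each character in */
    return h % 211;                           /* 211: a prime table size */
}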
12. What do you mean by activation record?
Ans. An activation record is a contiguous block of storage that manages the information required by a single execution of a procedure. When a procedure is entered, an activation record is allocated, and when the procedure exits, it is de-allocated. Basically, it stores the status of the current activation. So, whenever a function call occurs, a new activation record is created and pushed onto the top of the stack; it remains on the stack for the duration of that function's execution. Once the procedure is completed and control returns to the calling function, its activation record is popped off the stack.
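A hedged sketch of the typical fields of an activation record in C (the field names and layout are illustrative assumptions; real layouts depend on the machine and the language):

struct activation_record {
    void *return_value;    /* space for the result returned to the caller */
    void *actual_params;   /* parameters supplied by the caller           */
    void *control_link;    /* pointer to the caller's activation record   */
    void *access_link;     /* pointer for reaching non-local data         */
    void *saved_state;     /* return address and saved machine registers  */
    void *local_data;      /* the procedure's local variables             */
    void *temporaries;     /* compiler-generated temporary values         */
};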
13. Give the full form and definition of DAG?
Ans . refer Q.8
14. What is Intermediate code?
Ans. Intermediate code is a machine-independent representation of a program generated
during the compilation process, acting as a bridge between the high-level source code and
the machine code, allowing for easier optimization and portability across different computer
architectures; essentially, it's a simplified version of the original program that is easier to
manipulate before being translated into the final executable code for a specific machine.
Machine-independent: Unlike machine code, which is specific to a particular processor,
intermediate code can be processed without considering the target hardware.
Used in compilers: A compiler generates intermediate code during the translation process,
allowing for optimizations to be performed on this representation before generating the final
machine code.
Different forms: Intermediate code can be represented in various formats like syntax trees,
three-address code (TAC), or quadruples, depending on the compiler design.
Benefits: Portability: Enables code to be compiled and run on different machines with
minimal changes.
Optimization: Provides a convenient level of abstraction for applying optimizations before
generating machine code.
Code analysis: Facilitates static analysis and error detection during compilation.
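For instance, a statement such as a = b + c * d might be lowered to the three-address code below (the temporary names t1 and t2 are illustrative):

t1 = c * d
t2 = b + t1
a = t2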
15. What is input buffering?
Ans. Input buffering is an important concept in compiler design that refers to the way in
which the compiler reads input from the source code. In many cases, the compiler reads
input one character at a time, which can be a slow and inefficient process. Input buffering is
a technique that allows the compiler to read input in larger chunks, which can improve
performance and reduce overhead.
Buffer Pairs:- Because of the amount of time taken to process characters and the large
number of characters that must be processed during the compilation of a large source
program, specialized buffering techniques have been developed to reduce the amount of
overhead required to process a single input character. An important scheme involves two buffers that are alternately reloaded: while one half is being scanned, the other half is refilled from the source file.
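A minimal sketch of the buffer-pair scheme in C (the buffer size, the function names, and the use of '\0' as the sentinel for text input are assumptions for illustration):

#include <stdio.h>

#define BUF_SIZE 4096

static char buffer[2 * BUF_SIZE + 2];   /* two halves plus one sentinel slot each */
static char *forward = buffer;          /* the scanning pointer */

/* Load one half with up to BUF_SIZE characters and add a sentinel. */
static void fill(FILE *src, char *half)
{
    size_t n = fread(half, 1, BUF_SIZE, src);
    half[n] = '\0';                     /* sentinel marks end of valid data */
}

/* Call once before scanning begins. */
void init_buffer(FILE *src)
{
    fill(src, buffer);
    forward = buffer;
}

/* Return the next character, reloading the idle half at a half boundary. */
int next_char(FILE *src)
{
    while (*forward == '\0') {
        if (forward == buffer + BUF_SIZE) {                /* end of first half  */
            fill(src, buffer + BUF_SIZE + 1);
            forward = buffer + BUF_SIZE + 1;
        } else if (forward == buffer + 2 * BUF_SIZE + 1) { /* end of second half */
            fill(src, buffer);
            forward = buffer;
        } else {
            return EOF;                                    /* true end of input  */
        }
    }
    return (unsigned char)*forward++;
}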

16. What is YACC error handling in LR Parser?


Ans. YACC is an LALR parser generator developed at the beginning of the 1970s by
Stephen C. Johnson for the Unix operating system. It automatically generates the LALR(1)
parsers from formal grammar specifications. YACC plays an important role in compiler and
interpreter development since it provides a means to specify the grammar of a language and
to produce parsers that either interpret or compile code written in that language.
Grammar Specification: The input to YACC is a context-free grammar (usually in the
Backus-Naur Form, BNF) that describes the syntax rules of the language it parses.
Parser Generation: YACC translates the grammar into a C function that performs efficient parsing of input text according to the predefined rules.
LALR(1) Parsing: This is a bottom-up parsing method that makes use of a single token
lookahead in determining the next action of parsing.
Semantic Actions: These are the grammar productions that are associated with an action;
this enables the execution of code, usually in C, used in the construction of abstract syntax
trees, the generation of intermediate representations, or error handling.
Attribute Grammars: These grammars consist of non-terminal grammar symbols with
attributes, which through semantic actions are used in the construction of parse trees or the
output of code.
Integration with Lex: YACC is often used along with Lex, a tool that generates lexical analyzers (scanners), which break the input into tokens that are then processed by the YACC parser.
17. Difference between Bottom-up and Top-down parsing?
Ans. Starting point:- Bottom-up parsing starts from the input tokens, while top-down
parsing starts from the start symbol of the grammar.
Derivation direction:- Bottom-up parsing essentially performs a rightmost derivation in
reverse, while top-down parsing performs a leftmost derivation.
Complexity:- Bottom-up parsing is generally considered more complex to implement due to
the need to handle potential ambiguities in the input string, while top-down parsing can be
easier to understand but may not be suitable for all grammars.
Examples of bottom-up parsing algorithms: LR(0), SLR, and LALR.
Examples of top-down parsing algorithms: LL(1) and recursive descent.
18. What do you mean by peephole optimization / define optimization?
Ans. Peephole optimization is a compiler design technique that replaces a small set of
instructions with a more efficient set that performs the same function. Peephole optimization
examines a small set of instructions, called a peephole or window, for patterns that can be
replaced with more efficient code. The goal of peephole optimization is to improve
performance, reduce memory footprint, and reduce code size.
Examples:- Instead of pushing a register onto the stack and then immediately popping the
value back into the register, remove both instructions. Instead of multiplying x by 2, do x +
x. Instead of multiplying a floating point register by 8, add 3 to the floating point register's
exponent.
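A hedged illustration of the first pattern mentioned, the redundant store/load pair (the mnemonics are generic, not from a particular machine):

MOV R0, a    ; store R0 into a
MOV a, R0    ; redundant load – a is already in R0, so delete this line

After peephole optimization only the first instruction remains.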
19. Explain different types of errors in compilers and how to handle by error handler?
The tasks of the error handling process are to detect each error, report it to the user, and then apply some recovery strategy to handle the error. During this whole process, the processing time of the program should not slow down.
Functions of Error Handler: 1. Error Detection 2. Error Reporting 3. Error Recovery
An example of an error is a blank entry in the symbol table. Errors in the program should be detected and reported by the parser. Whenever an error occurs, the parser should handle it and continue to parse the rest of the input. Although the parser is mostly responsible for checking for errors, errors may occur at various stages of the compilation process.
So, there are many types of errors, and some of these are:
Types or Sources of Error – There are three types of error: logic errors, run-time errors, and compile-time errors.
20. Define finite automata and regular expression?
Ans. A finite automaton (FA) is a machine that recognizes patterns in input strings, while a
regular expression (RE) is a string that describes the language accepted by an FA:
Finite automaton: An FA is a simple machine that accepts or rejects an input string based on
whether it matches the pattern defined by the FA. FAs are made up of a finite control, which
is a set of states that the machine can be in. The machine runs in steps, receiving an input
signal in each step.
Regular expression: An RE is a string that describes the language accepted by an FA. It's a
sequence of characters that forms a search pattern, which can be used to describe what
you're searching for when searching for data in a text.
You can construct a finite state machine (FSM) that corresponds to a given regular expression. An FSM can be described by a transition table, which in turn can be encoded as a string.
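As a hedged sketch (the regular expression and the state numbering are chosen for illustration), here is a table-driven DFA in C that accepts strings over {a, b} matching (a|b)*ab, i.e. strings ending in "ab":

#include <stdio.h>

/* DFA for (a|b)*ab : state 2 means the last two symbols were "ab". */
static const int delta[3][2] = {
    /* on 'a'  on 'b' */
    {  1,      0 },   /* state 0: start                */
    {  1,      2 },   /* state 1: last symbol was 'a'  */
    {  1,      0 },   /* state 2: accepting            */
};

int accepts(const char *s)
{
    int state = 0;
    for (; *s; s++) {
        if (*s != 'a' && *s != 'b') return 0;  /* reject other symbols */
        state = delta[state][*s - 'a'];        /* 'a' -> column 0, 'b' -> column 1 */
    }
    return state == 2;                         /* only state 2 accepts */
}

int main(void)
{
    printf("%d\n", accepts("abab"));  /* prints 1 */
    printf("%d\n", accepts("abba"));  /* prints 0 */
    return 0;
}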
21. Differentiate between compiler and interpreter.
Ans. The main difference between a compiler and an interpreter is when they translate code
into machine language:
Compiler: Translates the entire source code into machine code before the program
runs. This allows for faster execution because no translation is required during
execution. Compilers are good for high-performance applications, large-scale applications,
and resource-constrained environments.
Interpreter: Translates code line-by-line as the code runs. This makes it easier to debug and
more suitable for rapid development. Interpreters are good for interactive and scripting uses,
and for situations where platform independence is needed.
22. What is Bootstrapping?
Bootstrapping in compiler design is a technique that involves writing a compiler in the same
programming language it intends to compile:
How it works: A compiler is written in the source programming language it intends to
compile. Once the basic compiler is written, it can be used to compile itself and its future
versions.
Benefits: Bootstrapping helps test the language and makes it easier for compiler developers
and bug reporters to work.
Example: For example, a C compiler is written in the C language.
23. What is token?
In compiler design, a token is a group of characters that has a collective meaning and is the
smallest element of a program that is meaningful to the compiler:
Definition: A token is a sequence of characters that represents a specific element in a
program, such as a keyword, variable, operator, or punctuation symbol.
Examples: Identifiers, strings, keywords, and punctuation marks are all examples of
tokens.
Lexical analysis: During the lexical analysis phase of the compiler, the program is
converted into a stream of tokens.
Lexemes: A lexeme is an actual character sequence that forms a specific instance of a
token.
24. What is syntax?
Syntax is the set of rules that govern how symbols, punctuation, and words are used to
create valid expressions, statements, and programs. Syntax is essential for understanding
the meaning of a programming language. Syntax analysis is a key process in compiler
design that checks code for syntax errors and organizes the code into a structured format.
Syntax analysis is the second phase of compiler design, after lexical analysis. During syntax analysis, a parser checks that the token stream follows the grammar of the language and builds a parse tree from it.
24. What is ambiguity?
In compiler design, "ambiguity" refers to a situation where a given piece of code in a
programming language can be interpreted in more than one way by the compiler, leading to
multiple possible parse trees for the same input, which can result in incorrect program
execution due to the uncertainty about the intended meaning of the code; essentially, it
means the grammar used to define the language allows for multiple valid interpretations of a
single code snippet.
Grammar-based: Ambiguity usually arises from the grammar used to define the
programming language, where a set of rules can lead to multiple valid parse trees for the
same input string.
Parse tree issue: When a grammar is ambiguous, it means there are multiple ways to build a
parse tree for a given code snippet, making it difficult for the compiler to determine the
correct interpretation.
Example: Expression ambiguity: Consider an expression like "a + b * c" - without operator
precedence rules, it could be interpreted as either "(a + b) * c" or "a + (b * c)".
Context-free grammar issue: Most programming languages are designed using context-free
grammars, which can sometimes lead to ambiguity if not carefully constructed.

PART-B
1. Define LL(1) grammar. Is the following grammar LL(1)?
G: S → iEtS | iEtSeS | a ; E → b
Also write the rules for computing FIRST() & FOLLOW().

Ans. A grammar is LL(1) if it can be parsed top-down with one token of lookahead, i.e. its predictive parsing table has no multiply-defined entries. (The first L stands for left-to-right scanning, the second L for leftmost derivation, and 1 for one lookahead symbol.)

Rules for FIRST(X):
1. If X is a terminal, then FIRST(X) = {X}.
2. If X → ε is a production, add ε to FIRST(X).
3. If X → Y1Y2…Yk, add FIRST(Y1) minus ε to FIRST(X); if Y1 derives ε, also add FIRST(Y2) minus ε, and so on; if all Yi derive ε, add ε.

Rules for FOLLOW(A):
1. Place $ in FOLLOW(S), where S is the start symbol.
2. For every production B → αAβ, add FIRST(β) minus ε to FOLLOW(A).
3. For every production B → αA, or B → αAβ where ε is in FIRST(β), add FOLLOW(B) to FOLLOW(A).

For this grammar (reading the second alternative as the usual dangling-else production S → iEtSeS):
FIRST(S) = {i, a}    FOLLOW(S) = {$, e}
FIRST(E) = {b}       FOLLOW(E) = {t}
The grammar is not LL(1): the two productions S → iEtS and S → iEtSeS both begin with i, so the parsing-table entry M[S, i] holds two productions.

2. What is an LALR(1) grammar? Construct the LALR parsing table for the following grammar:
S → AA, A → aA | b

Ans. An LALR(1) grammar is a grammar for which the LALR(1) parser, obtained by merging the LR(1) states that have identical cores, contains no parsing-action conflicts.

STEP1- Find augmented grammar


The augmented grammar of the given grammar is:-
S'-->.S ,$ [0th production]
S-->.AA ,$ [1st production]
A-->.aA ,a|b [2nd production]
A-->.b ,a|b [3rd production]
Let's apply the rule of lookahead to the above productions.
 The initial lookahead is always $.
 Now, the 1st production came into existence because of ' . ' before 'S' in the 0th production. There is nothing after 'S', so the lookahead of the 0th production becomes the lookahead of the 1st production, i.e. S-->.AA, $.
 Now, the 2nd production came into existence because of ' . ' before 'A' in the 1st production. After the first 'A' there is another 'A', so FIRST(A) = {a, b}. Therefore, the lookahead of the 2nd production becomes a|b.
 Now, the 3rd production is a part of the 2nd production, so the lookahead will be the same.
STEP2 – Find LR(0) collection of items
Below is the figure showing the LR(0) collection of items. We will understand everything one
by one.

The terminals of this grammar are {a, b}.
The non-terminals of this grammar are {S, A}.
RULES –
1. If any non-terminal has ' . ' preceding it, we have to write all its productions and add ' . ' preceding each of them.
2. From each state to the next state, the ' . ' shifts one place to the right.
 In the figure, I0 consists of the augmented grammar.
 I0 goes to I1 when ' . ' of the 0th production is shifted to the right of S (S'-->S.). This is the accept state; S is seen by the compiler. Since I1 is a part of the 0th production, the lookahead is the same, i.e. $.
 I0 goes to I2 when ' . ' of the 1st production is shifted to the right (S-->A.A). A is seen by the compiler. Since I2 is a part of the 1st production, the lookahead is the same, i.e. $.
 I0 goes to I3 when ' . ' of the 2nd production is shifted to the right (A-->a.A). a is seen by the compiler. Since I3 is a part of the 2nd production, the lookahead is the same, i.e. a|b.
 I0 goes to I4 when ' . ' of the 3rd production is shifted to the right (A-->b.). b is seen by the compiler. Since I4 is a part of the 3rd production, the lookahead is the same, i.e. a|b.
 I2 goes to I5 when ' . ' of the 1st production is shifted to the right (S-->AA.). A is seen by the compiler. Since I5 is a part of the 1st production, the lookahead is the same, i.e. $.
 I2 goes to I6 when ' . ' of the 2nd production is shifted to the right (A-->a.A). a is seen by the compiler. Since I6 is a part of the 2nd production, the lookahead is the same, i.e. $.
 I2 goes to I7 when ' . ' of the 3rd production is shifted to the right (A-->b.). b is seen by the compiler. Since I7 is a part of the 3rd production, the lookahead is the same, i.e. $.
 I3 goes to I3 when ' . ' of the 2nd production is shifted to the right (A-->a.A). a is seen by the compiler. Since I3 is a part of the 2nd production, the lookahead is the same, i.e. a|b.
 I3 goes to I8 when ' . ' of the 2nd production is shifted to the right (A-->aA.). A is seen by the compiler. Since I8 is a part of the 2nd production, the lookahead is the same, i.e. a|b.
 I6 goes to I9 when ' . ' of the 2nd production is shifted to the right (A-->aA.). A is seen by the compiler. Since I9 is a part of the 2nd production, the lookahead is the same, i.e. $.
 I6 goes to I6 when ' . ' of the 2nd production is shifted to the right (A-->a.A). a is seen by the compiler. Since I6 is a part of the 2nd production, the lookahead is the same, i.e. $.
 I6 goes to I7 when ' . ' of the 3rd production is shifted to the right (A-->b.). b is seen by the compiler. Since I7 is a part of the 3rd production, the lookahead is the same, i.e. $.
STEP 3 –
Defining 2 functions: action[list of terminals] and goto[list of non-terminals] in the parsing table. Below is the CLR parsing table.
Once we make a CLR parsing table, we can easily make a LALR parsing table from it.
In the step2 diagram, we can see that
 I3 and I6 are similar except their lookaheads.
 I4 and I7 are similar except their lookaheads.
 I8 and I9 are similar except their lookaheads.
In LALR parsing table construction, we merge these similar states.
 Wherever there is 3 or 6, make it 36(combined form)
 Wherever there is 4 or 7, make it 47(combined form)
 Wherever there is 8 or 9, make it 89(combined form)
Below is the LALR parsing table.

Now we have to remove the duplicated rows:

 As we can see, the 36 row appears twice with the same data, so we delete one of them.
 We combine the two 47 rows into one by merging the values of each column into a single 47 row.
 We combine the two 89 rows into one by merging the values of each column into a single 89 row.
The final LALR table looks like the below.
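Since the table figures are not reproduced here, a hedged reconstruction of the final LALR table for this grammar (sN = shift and go to state N, rN = reduce by production N, acc = accept) is:

State | action: a    b    $    | goto: S    A
0     |         s36  s47       |       1    2
1     |                   acc  |
2     |         s36  s47       |            5
36    |         s36  s47       |            89
47    |         r3   r3   r3   |
5     |                   r1   |
89    |         r2   r2   r2   |

Here production 1 is S → AA, production 2 is A → aA, and production 3 is A → b.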
3. Explain the usage of YACC parser generator in construction of parser

Ans. Definitions are at the top of the YACC input file. They include header files or any information about the regular definitions or tokens. We define the tokens using a percent sign (as in %token), whereas we place the code specific to C, such as the header files, within %{ and %}.
Some examples are as follows:
 %token ID
 %{ #include <stdio.h> %}
Rules are between %% and %%. They define the actions we take when we scan the tokens. They execute whenever a token in the input stream matches the grammar. Any action in C is placed between curly brackets ({ }).
Auxiliary routines includes functions that we may require in the rules section. Here, we
write a function in regular C syntax. This section includes the main() function, in which
the yyparse() function is always called.
The yyparse() function reads the tokens, performs the actions, and returns to main when it reaches the end of the file or when an error occurs. It returns 0 if the parsing is successful, and 1 if it is unsuccessful.
Example
The following code is an example of a YACC program of a simple calculator taking two
operands. The .y extension file contains the YACC code for the parser generation, which
uses the .l extension file that includes the lexical analyzer.
file.y:
1 %{
2 #include <ctype.h>
3 #include <stdio.h>
4 int yylex();
5 void yyerror();
6 int tmp=0;
7 %}
8
9 %token num
10 %left '+' '-'
11 %left '*' '/'
12 %left '(' ')'
13
14 %%
15
16 line :exp {printf("=%d\n",$1); return 0;};
17
18 exp :exp '+' exp {$$ =$1+$3;}
19 | exp '-' exp {$$ =$1-$3;}
20 | exp '*' exp {$$ =$1*$3;}
21 | exp '/' exp {$$ =$1/$3;}
22 | '(' exp ')' {$$=$2;}
23 | num {$$=$1;};
24
25 %%
26
27 void yyerror(){
28 printf("The arithmetic expression is incorrect\n");
29 tmp=1;
30 }
31 int main(){
32 printf("Enter an arithmetic expression(can contain +,-,*,/ or parenthesis):\n");
33 yyparse();
34 }
35
An example of YACC and Lex code for a calculator
Explanation
In the file.y YACC file:
 Lines 1–7: We initialize the header files with the function definitions.
 Lines 9–12: We initialize the tokens for the grammar.
 Lines 16–23: We define the grammar used to build the calculator.
 Lines 27–34: We define an error function with the main function.
In lexfile.l Lex file:
 Lines 1–7: The header files are initialized along with the YACC file included as a header
file.
 Lines 11–18: The regular definition of the expected tokens is defined such as the
calculator input will contain digits 0-9.
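The Lex source itself is not reproduced in the listing above. A minimal sketch of what lexfile.l might contain, consistent with the explanation (the exact line numbering and rules are assumptions):

1 %{
2 #include <stdio.h>
3 #include <stdlib.h>
4 #include "y.tab.h"  /* token definitions exported by yacc -d */
5 void yyerror();
6 extern int yylval;
7 %}
8
9 %%
10
11 [0-9]+   { yylval = atoi(yytext); return num; }  /* digits 0-9 form a num token */
12 [ \t]    { /* ignore blanks and tabs */ }
13 \n       { return 0; }                           /* newline ends the input */
14 .        { return yytext[0]; }                   /* +, -, *, /, ( and ) pass through */
15
16 %%
17
18 int yywrap() { return 1; }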
Execution
To execute the code, we type the following commands in the terminal:
 We type lex lexfile.l and press "Enter" to run Lex on the Lex file.
 We type yacc -d file.y and press "Enter" to run YACC on the grammar file.
 We type gcc -Wall -o output lex.yy.c y.tab.c and press "Enter" to compile the generated C files.
 We type ./output to execute the program.

4. Why do we need syntax tree when constructing compiler.


Need of syntax tree in compilers:
 A syntax tree is a way of representing the syntax of a programming language as a hierarchical tree-like structure. It is used for generating symbol tables for compilers and for later code generation.
 A syntax tree represents all of the constructs in the language and their subsequent rules.
 It helps to preserve variable types, as well as the location of each declaration in the source code.
 The order of executable statements must be explicitly represented and well defined.
 Left and right components of binary operations must be stored and correctly identified.
 Identifiers and their assigned values must be stored for assignment statements.
5. Explain the various compiler phases in brief with suitable example.
Ans.
The phases include:
1. Lexical Analysis
2. Syntax Analysis
3. Semantic Analysis
4. Intermediate Code Generation
5. Code Optimization
6. Target Code Generation
1.Lexical Analysis: The first phase of a compiler is
called lexical analysis or scanning. The lexical analyzer
reads the stream of characters making up the source
program and groups the characters into meaningful
sequences called lexemes. For each lexeme, the lexical
analyzer produces as output a token of the form <token-name, attribute-value>
that it passes on to the subsequent phase, syntax analysis. In the token, the first component token-name is an abstract symbol that is used during syntax analysis, and the second component attribute-value points to an entry in the symbol table for this token. Information from the symbol-table entry is needed for semantic analysis and code generation.
2. Syntax Analysis: The second phase of the compiler is syntax analysis or parsing. The parser
uses the first components of the tokens produced by the lexical analyzer to create a tree-like
intermediate representation that depicts the grammatical structure of the token stream. A typical
representation is a syntax tree in which each interior node represents an operation and the
children of the node represent the arguments of the operation.
3. Semantic Analysis: The semantic analyzer uses the syntax tree and the information in the symbol table to check the source program for semantic consistency with the language definition. It also gathers type information and saves it in either the syntax tree or the symbol table, for subsequent use during intermediate-code generation. An important part of semantic analysis is type checking, where the compiler checks that each operator has matching operands.
4. Intermediate Code Generation: In the process of translating a source program into target code, a compiler may construct one or more intermediate representations, which can have a variety of forms. Syntax trees are a form of intermediate representation; they are commonly used during syntax and semantic analysis. After syntax and semantic analysis of the source program, many compilers generate an explicit low-level or machine-like intermediate representation, which we can think of as a program for an abstract machine. This intermediate representation should have two important properties:
1. It should be simple and easy to produce.
2. It should be easy to translate into the target machine.
5. Code Optimization:- The machine-independent code-optimization phase attempts to improve the intermediate code so that better target code will result. The objectives for performing optimization are: faster execution, shorter code, or target code that consumes less power.
6. Code Generation:- The code generator takes as input an intermediate representation of the source program and maps it into the target language. If the target language is machine code, registers or memory locations are selected for each of the variables used by the program. Then, the intermediate instructions are translated into sequences of machine instructions that perform the same task. A crucial aspect of code generation is the judicious assignment of registers to hold variables. If the target language is assembly language, this phase generates the assembly code as its output.
Example:
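The example figure is not reproduced here; as a hedged substitute, the classic walk-through of the phases for the statement position = initial + rate * 60 looks roughly like this:

Lexical analysis:    <id,1> <=> <id,2> <+> <id,3> <*> <60>
Syntax analysis:     a syntax tree for id1 = id2 + id3 * 60
Semantic analysis:   the integer 60 is converted, giving id1 = id2 + id3 * inttofloat(60)
Intermediate code:   t1 = inttofloat(60)
                     t2 = id3 * t1
                     t3 = id2 + t2
                     id1 = t3
Code optimization:   t1 = id3 * 60.0
                     id1 = id2 + t1
Code generation:     LDF  R2, id3
                     MULF R2, R2, #60.0
                     LDF  R1, id2
                     ADDF R1, R1, R2
                     STF  id1, R1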

6. What is the process & importance of intermediate code generation?

Ans. Intermediate code generation is a stage in the process of compiling a program, where the compiler translates the source code into an intermediate representation. This representation is not machine code but is simpler than the original high-level code. Here's how it works:
 Translation: The compiler takes the high-level code (like C
or Java) and converts it into an intermediate form, which can be
easier to analyze and manipulate.
 Portability: This intermediate code can often run on different types of machines without needing major changes,
making it more versatile.
 Optimization: Before turning it into machine code, the compiler can optimize this intermediate code to make the
final program run faster or use less memory.

The following are commonly used intermediate code representations:


Postfix Notation
Example: The postfix representation of the expression (a – b) * (c + d) + (a – b) is: ab- cd+ * ab- +
Three-Address Code
 A three-address statement involves a maximum of three references, consisting of two for operands and one for the result.
 The typical form of a three-address statement is expressed as x = y op z, where x, y, and z represent memory addresses.
Example: The three address code for the expression a + b * c + d :
T1 = b * c
T2 = a + T1
T3 = T2 + d; T1, T2, T3 are temporary variables.
There are 3 ways to represent a three-address code in compiler design:
i) Quadruples ii) Triples iii) Indirect Triples
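As a brief illustration (the tabular layout is a hedged sketch), the first two statements above, T1 = b * c and T2 = a + T1, would be stored as:

Quadruples (op, arg1, arg2, result):
(0) ( *, b, c,  T1 )
(1) ( +, a, T1, T2 )

Triples (op, arg1, arg2), where a result is referred to by its triple's index:
(0) ( *, b,   c  )
(1) ( +, a,  (0) )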
Syntax Tree
 A syntax tree serves as a condensed representation of a parse tree.
 The operator and keyword nodes present in the parse tree are relocated to become part of their respective parent nodes in the syntax tree; the internal nodes are operators and the child nodes are operands.
 Creating a syntax tree involves strategically placing parentheses within
the expression.
Example: x = (a + b * c) / (a – b * c)

Advantages of Intermediate Code Generation


 Easier to Implement: Intermediate code generation can simplify the code generation process by reducing the
complexity of the input code, making it easier to implement.
 Facilitates Code Optimization: Intermediate code generation can enable the use of various code optimization
techniques, leading to improved performance and efficiency of the generated code.
 Platform Independence: Intermediate code is platform-independent, meaning that it can be translated into machine code or bytecode for any platform.
 Code Reuse: Intermediate code can be reused in the future to generate code for other platforms or languages.
 Easier Debugging: Intermediate code can be easier to debug than machine code or bytecode, as it is closer to the
original source code.
Disadvantages of Intermediate Code Generation
 Increased Compilation Time: Intermediate code generation can significantly increase the compilation time, making it less suitable for real-time or time-critical applications.
 Additional Memory Usage: Intermediate code generation requires additional memory to store the intermediate
representation, which can be a concern for memory-limited systems.
 Increased Complexity: Intermediate code generation can increase the complexity of the compiler design, making
it harder to implement and maintain.
 Reduced Performance: The process of generating intermediate code can result in code that executes slower than
code generated directly from the source code.
7. Explain various strategies of symbol table creation & organization

Ans. Symbol Tables:- A symbol table is a data structure used by compilers and interpreters to store information about variables, functions, and other identifiers in a program. Efficient organization is crucial for fast lookups during compilation or interpretation.
Several strategies exist for organizing symbol tables, each with trade-offs in terms of space and time complexity:
 Linear List: A simple approach using an array or linked list. Searching is linear (O(n)); insertion and deletion are relatively easy. Suitable for small programs but inefficient for large ones.
 Hash Table: Uses a hash function to map identifiers to table entries. Average-case search, insertion, and deletion are O(1), but the worst case can be O(n) (if many collisions occur). Requires careful choice of hash function to minimize collisions.
 Binary Search Tree (BST): Organizes symbols in a tree structure, allowing for efficient searching (O(log n) on average), insertion, and deletion. However, performance degrades to O(n) if the tree becomes unbalanced. Self-balancing BSTs (like AVL trees or red-black trees) maintain balance, ensuring O(log n) performance in all cases.
 Trie: A tree-like structure where each node represents a character. Efficient for prefix-based searches (finding all identifiers starting with a given prefix). Space consumption can be high.
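As a minimal sketch of the hash-table strategy in C (the structure layout, table size, and function names are illustrative assumptions, not from the source):

#include <stdlib.h>
#include <string.h>

#define BUCKETS 211                /* prime size spreads identifiers evenly */

struct symbol {
    char *name;                    /* identifier lexeme */
    int   type;                    /* e.g. an encoded type */
    struct symbol *next;           /* chain of entries that collide */
};

static struct symbol *table[BUCKETS];

static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;   /* multiplicative hash */
    return h % BUCKETS;
}

/* Return the entry for name, or NULL if it is not in the table. */
struct symbol *lookup(const char *name)
{
    for (struct symbol *p = table[hash(name)]; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Insert name at the head of its chain and return the new entry. */
struct symbol *insert(const char *name, int type)
{
    unsigned h = hash(name);
    struct symbol *p = malloc(sizeof *p);
    p->name = strdup(name);        /* POSIX strdup: copy the lexeme */
    p->type = type;
    p->next = table[h];
    table[h] = p;
    return p;
}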
8. Describe bootstrapping in detail.

Ans. Bootstrapping: Bootstrapping is widely used in compiler development. Bootstrapping is used to produce a self-hosting compiler, i.e. a compiler that can compile its own source code. A bootstrap compiler is used to compile the compiler, and then this compiled compiler can be used to compile everything else, as well as future versions of itself.
A compiler can be characterized by three languages:

1. Source Language
2. Target Language
3. Implementation Language

The T-diagram shows a compiler SCIT for source S and target T, implemented in language I.

Follow some steps to produce a new language L for machine A:

1. Create a compiler SCAA for subset, S of the desired language, L using language "A" and
that compiler runs on machine A.

2. Create a compiler LCSA for language L written in a subset of L.

3. Compile LCSA using the compiler SCAA to obtain LCAA. LCAA is a compiler for language L that runs on machine A and produces code for machine A.

The process described by the T-diagrams is called bootstrapping.

9. Write short note on operator precedence parsing function.


Ans. Operator precedence parser – An operator precedence parser is one of the bottom-up parsers; it interprets an operator precedence grammar. This parser is only used for operator grammars. Ambiguous grammars are not allowed by any parser except the operator precedence parser. There are two methods for determining what precedence relations should hold between a pair of terminals:
1. Use the conventional associativity and precedence of the operators.
2. The second method of selecting operator-precedence relations is first to construct an unambiguous grammar for the language, a grammar that reflects the correct associativity and precedence in its parse trees.
This parser relies on the following three precedence relations: ⋖, ≐, ⋗
a ⋖ b This means a "yields precedence to" b.
a ⋗ b This means a "takes precedence over" b.
a ≐ b This means a "has the same precedence as" b.

Figure – Operator precedence relation table for grammar E->E+E/E*E/id


No relation is given between id and id, since id will not be compared with id and two variables cannot come side by side. There is also a disadvantage of this table: if we have n operators, then the size of the table will be n*n and its space complexity will be O(n²). In order to reduce this table size, use operator precedence functions.
The operator precedence parsers usually do not store the precedence table with the relations;
rather they are implemented in a special way. Operator precedence parsers use precedence
functions that map terminal symbols to integers, and so the precedence relations between the
symbols are implemented by numerical comparison. The parsing table can be encoded by two
precedence functions f and g that map terminal symbols to integers. We select f and g such that:
1. f(a) < g(b) whenever a yields precedence to b
2. f(a) = g(b) whenever a and b have the same precedence
3. f(a) > g(b) whenever a takes precedence over b
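For the grammar above, one consistent pair of precedence functions (a standard textbook assignment; other assignments also work) is:

        +    *    id   $
f       2    4    4    0
g       1    3    5    0

For example, f(+) = 2 < g(*) = 3 encodes + ⋖ *, and f(*) = 4 > g(+) = 1 encodes * ⋗ +.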
10. What do you mean by basic block? Also explain in detail the transformation in basic
block.

Ans. A basic block is a straight-line code sequence with no branches in except to the entry and no branches out except at the exit. A basic block is a set of statements that always execute one after another, in sequence. The first task is to partition a sequence of three-address code into basic blocks. A new basic block begins with the first instruction, and instructions are added until a jump or a label is met. In the absence of a jump, control moves sequentially from one instruction to the next. The idea is standardized in the algorithm below:
Algorithm: Partitioning three-address code into basic blocks.
Input: A sequence of three address instructions.
Process: Instructions from intermediate code which are leaders are determined. The following are the
rules used for finding a leader:
1. The first three-address instruction of the intermediate code is a leader.
2. Instructions that are targets of unconditional or conditional jump/goto statements are leaders.
3. Instructions that immediately follow unconditional or conditional jump/goto statements are
considered leaders.
For each leader thus determined, its basic block consists of the leader itself and all instructions up to, but excluding, the next leader.
Basic blocks are sequences of instructions in a program that have no branches except at the entry and
exit. Example 1:
The following sequence of three-address statements forms a basic block:
t1 := a*a
t2 := a*b
t3 := 2*t2
t4 := t1+t3
t5 := b*b
t6 := t4 +t5
A three address statement x:= y+z is said to define x and to use y and z. A name in a basic block is said
to be live at a given point if its value is used after that point in the program, perhaps in another basic
block.

Structure-Preserving Transformations:

The structure-preserving transformation on basic blocks includes:


1. Dead Code Elimination
2. Common Subexpression Elimination
3. Renaming of Temporary Variables
4. Interchange of two independent adjacent statements

1.Dead Code Elimination:

Dead code is defined as that part of the code that never executes during the program execution. So, for optimization, such code or dead code is eliminated. The code which is never executed during the program (dead code) takes time, so for optimization and speed it is eliminated from the code. Eliminating the dead code increases the speed of the program, as the compiler does not have to translate the dead code.
Example:
// Program with dead code
#include <iostream>
using namespace std;
int main()
{
    int x = 2;
    if (x > 2)
        cout << "code";          // Dead code: x > 2 is never true here
    else
        cout << "Optimization";
    return 0;
}

// Optimized program without dead code
#include <iostream>
using namespace std;
int main()
{
    cout << "Optimization";      // dead code eliminated
    return 0;
}

2. Common Subexpression Elimination:

In this technique, subexpressions that are common and used frequently are calculated only once and reused when needed. A DAG (Directed Acyclic Graph) is used to eliminate common subexpressions.
Example:
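The example figure is not reproduced; a small hedged illustration in three-address code:

Before:                       After:
t1 = b + c                    t1 = b + c
t2 = a * t1                   t2 = a * t1
t3 = b + c                    (t3 eliminated; uses of t3 become t1)
t4 = t2 + t3                  t4 = t2 + t1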

3. Renaming of Temporary Variables:


Statements containing instances of a temporary variable can be changed to instances of a new
temporary variable without changing the basic block value.
Example: Statement t = a + b can be changed to x = a + b where t is a temporary variable and x
is a new temporary variable without changing the value of the basic block.
4. Interchange of Two Independent Adjacent Statements:
If a block has two adjacent statements which are independent, they can be interchanged without affecting the basic block value.
Example:
t1 = a + b
t2 = c + d
These two independent statements of a block can be interchanged without affecting the value of
the block.
Algebraic Transformation:

Countless algebraic transformations can be used to change the set of expressions computed by a basic block
into an algebraically equivalent set. Some of the algebraic transformation on basic blocks includes:
1. Constant Folding
2. Copy Propagation
3. Strength Reduction
1. Constant Folding:
Evaluate constant subexpressions at compile time, so that the compiled program does not need to compute them at run time.
Example:
x = 2 * 3 + y ⇒ x = 6 + y (Optimized code)
2. Copy Propagation:
It is of two types: variable propagation and constant propagation.
Variable Propagation:
x = y; z = x + 2  ⇒  z = y + 2 (optimized code)
Constant Propagation:
x = 3; z = x + a  ⇒  z = 3 + a (optimized code)
3. Strength Reduction:
Replace expensive statement/ instruction with cheaper ones.
x = 2 * y (costly) ⇒ x = y + y (cheaper)
x = 2 * y (costly) ⇒ x = y << 1 (cheaper)

11. Construct a DAG for the basic block whose code is given below:-
D := B * C
E := A + B
B := B * C
A := E – D

Ans.
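The DAG figure is left blank in the original; described textually (a hedged reconstruction, with A0, B0, C0 denoting the initial values of A, B, C):

n1: * (B0, C0)   – attached identifiers: D, B   (D := B*C and B := B*C share this node)
n2: + (A0, B0)   – attached identifier: E
n3: – (n2, n1)   – attached identifier: A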

12. Explain in brief the various issues of design of code generator.

Ans. The final phase in compiler model is the code generator. It takes as input an
intermediate representation of the source program and produces as output an equivalent
target program. The code generation techniques presented below can be used whether or not
an optimizing phase occurs before code generation.
Fig. 4.1 Position of code generator

ISSUES IN THE DESIGN OF A CODE GENERATOR


The following issues arise during the code generation phase:
1. Input to code generator
2. Target program
3. Memory management
4. Instruction selection
5. Register allocation
6. Evaluation order

1. Input to code generator: The input to the code generation consists of the intermediate
representation of the source program produced by front end, together with information in the
symbol table to determine run-time addresses of the data objects denoted by the names in the
intermediate representation.
• Intermediate representation can be :
a. Linear representation such as postfix notation
b. Three address representation such as quadruples
c. Virtual machine representation such as stack machine code
d. Graphical representations such as syntax trees and dags.
e. Prior to code generation, the front end must be scanned, parsed and translated into
intermediate representation along with necessary type checking. Therefore, input to code
generation is assumed to be error-free.
2. Target program:The output of the code generator is the target program. The output may be :
a. Absolute machine language. It can be placed in a fixed memory location and can be
executed immediately.
b. Relocatable machine language. It allows subprograms to be compiled separately. Assembly
language - Code generation is made easier.

3. Memory management:
• Names in the source program are mapped to addresses of data objects in run-time memory by
the front end and code generator.
• It makes use of symbol table, that is, a name in a three-address statement refers to a symbol-
table entry for the name.
• Labels in three-address statements have to be converted to addresses of instructions. For
example,
j:gotoigenerates jump instruction as follows:
* if i < j, a backward jump instruction with target address equal to location of code for
quadruple i is generated.
* if i > j, the jump is forward. We must store on a list for quadruple i the location of the first
machine instruction generated for quadruple j. When i is processed, the machine locations for all
instructions that forward jumps to i are filled.

4. Instruction selection:
• The instructions of target machine should be complete and uniform.
• Instruction speeds and machine idioms are important factors when efficiency of target
program is considered.
• The quality of the generated code is determined by its speed and size.
• For example, the three-address statements in (a) below can be translated into the machine code in (b):

a:=b+c
d:=a+e (a)

MOV b, R0
ADD c, R0
MOV R0, a (b)
MOV a, R0
ADD e, R0
MOV R0, d

Here the fourth instruction, MOV a, R0, is redundant, since register R0 still holds the value of a after the third instruction; a careful code generator would omit it.

5. Register allocation
• Instructions involving register operands are shorter and faster than those involving operands in
memory. The use of registers is subdivided into two subproblems :

1. Register allocation – the set of variables that will reside in registers at each point in the program is selected.
2. Register assignment – the specific register in which each such variable will reside is picked.
3. Certain machines require even-odd register pairs for some operands and results. For example, consider a division instruction of the form D x, y, where x, the dividend, occupies the even register of an even/odd register pair and y is the divisor; after the division, the even register holds the remainder and the odd register holds the quotient.

6. Evaluation order
• The order in which the computations are performed can affect the efficiency of the target code.
Some computation orders require fewer registers to hold intermediate results than others.

13. What is the basic task of scanning? What are the difficulties found in delimiter oriented
scanning? How can this be removed?

Ans. LEXICAL ANALYSIS: A simple way to build a lexical analyzer is to construct a diagram that illustrates the structure of the tokens of the source language, and then to hand-translate the diagram into a program for finding tokens. Efficient lexical analyzers can be produced in this manner.
Role of Lexical Analyzer:- The lexical analyzer is the first phase of the compiler. Its main task is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis. As shown in the figure, upon receiving a "get next token" command from the parser, the lexical analyzer reads input characters until it can identify the next token.
Fig. 1.8 Interaction of lexical analyzer with parser
Since the lexical analyzer is the part of the compiler that reads the source text, it may also
perform certain secondary tasks at the user interface. One such task is stripping out from the
source program comments and white space in the form of blank, tab, and new line character.
Another is correlating error messages from the compiler with the source program.
Issues in Lexical Analysis There are several reasons for separating the analysis phase of
compiling into lexical analysis and parsing
1) Simpler design is the most important consideration. The separation of lexical analysis
from syntax analysis often allows us to simplify one or the other of these phases.
2) Compiler efficiency is improved.
3) Compiler portability is enhanced.
Tokens Patterns and Lexemes:- There is a set of strings in the input for which the same token
is produced as output. This set of strings is described by a rule called a pattern associated
with the token. The pattern is set to match each string in the set. In most programming
languages, the following constructs are treated as tokens: keywords, operators, identifiers,
constants, literal strings, and punctuation symbols such as parentheses, commas, and
semicolons.
Lexeme:- Collection or group of characters forming tokens is called Lexeme. A lexeme is a
sequence of characters in the source program that is matched by the pattern for the token. For
example in the Pascal’s statement const pi = 3.1416; the substring pi is a lexeme for the token
identifier.
Patterns:- A pattern is a rule describing a set of lexemes that can represent a particular token
in source program. The pattern for the token
const in the above table is just the single
string const that spells out the keyword.
Certain language conventions impact the
difficulty of lexical analysis. Languages such
as FORTRAN require a certain constructs in
fixed positions on the input line. Thus the
alignment of a lexeme may be important in
determining the correctness of a source
program.
Attributes of Token:- The lexical analyzer returns to the parser a representation for the token
it has found. The representation is an integer code if the token is a simple construct such as a
left parenthesis, comma, or colon. The representation is a pair consisting of an integer code
and a pointer to a table if the token is a more complex element such as an identifier or
constant. The integer code gives the token type, the pointer points to the value of that token.
Pairs are also retuned whenever we wish to distinguish between instances of a token. The
attributes influence the translation of tokens.
i) Constant : value of the constant
ii) Identifiers: pointer to the corresponding symbol table entry.
Error Recovery Strategies In Lexical Analysis The following are the error-recovery actions in
lexical analysis:
1) Deleting an extraneous character.
2) Inserting a missing character.
3) Replacing an incorrect character by a correct character.
4) Transforming two adjacent characters.
5) Panic mode recovery: Deletion of successive characters from the token until error is
resolved.
14. Explain the syntax directed translation schemes in detail.

Ans. Syntax Directed Translation is a set of productions that have semantic rules embedded
inside it. The syntax-directed translation helps in the semantic analysis phase in the compiler.
SDT has semantic actions along with the production in the grammar. This article is about
postfix SDT and postfix translation schemes with parser stack implementation of it. Postfix
SDTs are the SDTs that have semantic actions at the right end of the production. This article
also includes SDT with actions inside the production, eliminating left recursion from SDT and
SDTs for L-attributed definitions.

Postfix Translation Schemes:

 The syntax-directed translation which has its semantic actions at the end of the production is
called the postfix translation scheme.
 This type of translation of SDT has its corresponding semantics at the last in the RHS of the
production.
 SDTs which contain the semantic actions at the right ends of the production are
called postfix SDTs.
Example of Postfix SDT
S ⇢ A#B{S.val = A.val * B.val}
A ⇢B@1{A.val = B.val + 1}
B ⇢num{B.val = num.lexval}

Parser-Stack Implementation of Postfix SDTs:

Postfix SDTs are implemented when the semantic actions are at the right end of the production
and with the bottom-up parser(LR parser or shift-reduce parser) with the non-terminals having
synthesized attributes.
 The parser stack contains the record for the non-terminals in the grammar and their
corresponding attributes.
 The non-terminal symbols of the production are pushed onto the parser stack.
 If the attributes are synthesized and semantic actions are at the right ends then attributes of
the non-terminals are evaluated for the symbol in the top of the stack.
 When the reduction occurs at the top of the stack, the attributes are available in the stack,
and after the action occurs these attributes are replaced by the corresponding LHS non-
terminal and its attribute.
 Now, the LHS non-terminal and its attributes are at the top of the stack.
Production
A ⇢ BC{A.str = B.str . C.str}
B ⇢a {B.str = a}
C ⇢b{C.str = b}
Initially, the parser stack:
B C Non-terminals
B.str C.str Synthesized attributes

Top of Stack
After the reduction occurs A ⇢BC then after B, C and their attributes are replaced by A and in
the attribute. Now, the stack:
A Non-terminals
A.str Synthesized attributes

Top of stack

SDT with action inside the production:

When the semantic actions are present anywhere on the right side of the production then it
is SDT with action inside the production.
It is evaluated and actions are performed immediately after the left non-terminal is processed.
This type of SDT includes both S-attributed and L-attributed SDTs.
If the SDT is parsed by a bottom-up parser, then actions are performed immediately after a non-terminal appears at the top of the parser stack.
If the SDT is parsed by a top-down parser, then actions are performed before the expansion of a non-terminal, or when a terminal is checked against the input.
Example of SDT with action inside the production
S ⇢ A +{print '+'} B
A ⇢ {print 'num'}B
B ⇢ num{print 'num'}

Eliminating Left Recursion from SDT:

The grammar with left recursion cannot be parsed by a top-down parser. So, left recursion should be eliminated, and the grammar can be transformed by eliminating it.

Grammar with left recursion        Grammar after eliminating left recursion
P ⇢ Pr | q                         P ⇢ qA
                                   A ⇢ rA | ∈

SDT for L-attributed Definitions:

SDT with L-attributed definitions involves both synthesized and inherited attributes in the production.
To convert an L-attributed definition into its equivalent SDT, follow the underlying rules:
 When an action computes the inherited attributes of a non-terminal, place the action immediately before that non-terminal in the production.
 When an action computes the synthesized attributes of the left-hand-side non-terminal, place the action at the right end of that production.

15. Consider the expression (left-to-right scanning):
(a/b*c) + (a/b) – (b+(a*b))(a*b)
Draw the DAG of the above expression.

Ans. Do self.

16. What do you mean by LR parser? What is the model of an LR parser? Explain.

Ans. LR parser: An LR parser is a bottom-up parser for context-free grammars that is very widely used by computer programming language compilers and other associated tools. It is called a bottom-up parser because it attempts to reduce the input back toward the top-level grammar productions by building up from the leaves. LR parsers are the most powerful of all deterministic parsers used in practice.

In the term LR(k) parser, the L refers to left-to-right scanning, the R refers to the rightmost derivation in reverse, and k refers to the number of unconsumed "lookahead" input symbols that are used in making parser decisions. Typically, k is 1 and is often omitted. A context-free grammar is called LR(k) if an LR(k) parser exists for it. The parser reduces the sequence of tokens from the left, but read from above, the corresponding derivation expands the rightmost non-terminal first.
1. The stack is empty, and we are looking to reduce by the rule S'→S$.
2. A " . " in a rule represents how much of the rule is already on the stack.
3. A dotted item, or simply an item, is a production rule with a dot indicating how much of the RHS has so far been recognized. The closure of an item is used to see what production rules can be used to expand the current structure. It is calculated as follows:
Rules for LR parser: The rules of the LR parser are as follows.
1. The first item from the given grammar rules adds itself to the first closed set.
2. If an item of the form A → α.Bγ is present in the closure, where the symbol B after the dot is a non-terminal, add B's production rules with the dot preceding the first symbol of each.
3. Repeat step (2) for the new items added under it.
LR parser algorithm: The LR parsing algorithm is the same for every LR parser; only the parsing table is different for each. It consists of the following components.
1. Input Buffer – It contains the given string, and it ends with a $ symbol.
2. Stack – The combination of the state symbol on top of the stack and the current input symbol is used to index the parsing table in order to take the parsing decisions.
3. Parsing Table – The parsing table is divided into two parts: the action table and the go-to table. The action table gives an entry for the current state and current terminal in the input stream. There are four kinds of entries in the action table, as follows.
1. Shift action – the present terminal is removed from the input stream, and the state n is pushed onto the stack and becomes the new present state.
2. Reduce action – the rule number m is written to the output stream; the symbols of the right-hand side of rule m are popped from the stack; and the non-terminal on the left-hand side of rule m is used to look up a new state in the go-to table, which is made the new current state by pushing it onto the stack.
3. Accept – the string is accepted.
4. No action – a syntax error is reported.
The go-to table indicates to which state the parser should proceed.

LR parser diagram
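The diagram itself is not reproduced; as a hedged sketch of the driver loop that the model describes (the table lookups ACTION and GOTO, the helpers next_token, rhs_len and lhs, and all names are assumptions for illustration, to be supplied by a generator or by hand):

#include <stdio.h>
#include <stdlib.h>

enum kind { SHIFT, REDUCE, ACCEPT, ERROR };
struct action { enum kind kind; int state; int rule; };

struct action ACTION(int state, int token);  /* action table lookup  */
int GOTO(int state, int nonterminal);        /* go-to table lookup   */
int next_token(void);                        /* the lexical analyzer */
extern int rhs_len[];                        /* |RHS| of each rule   */
extern int lhs[];                            /* LHS of each rule     */

#define MAXDEPTH 256

void lr_parse(void)
{
    int stack[MAXDEPTH], top = 0;
    stack[top] = 0;                          /* start in state 0 */
    int a = next_token();                    /* current lookahead */
    for (;;) {
        struct action act = ACTION(stack[top], a);
        switch (act.kind) {
        case SHIFT:                          /* consume token, push state n */
            stack[++top] = act.state;
            a = next_token();
            break;
        case REDUCE: {                       /* pop the RHS, push the GOTO state */
            top -= rhs_len[act.rule];
            int s = GOTO(stack[top], lhs[act.rule]);
            stack[++top] = s;
            printf("reduce by rule %d\n", act.rule);
            break;
        }
        case ACCEPT:
            return;                          /* input parsed successfully */
        default:
            fprintf(stderr, "syntax error\n");
            exit(1);
        }
    }
}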

17. Classify the errors and discuss the errors in each phase of compiler.
Ans. See Question 19 of PART-A.
18. What is a symbol table? Write the procedure to store the names in the symbol table.
Ans. Do self
19. Explain bottom up parsing?
Ans.
20. Write short note on global data flow analysis

21. Explain intermediate code forms using postfix notation.

22. What is peephole optimization? Explain in detail

23. Consider the grammar

E → E + E
E → E * E
E → id

Perform shift-reduce parsing of the input string "id1 + id2 * id3".

Ans.
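The answer is left blank in the original; a hedged trace (resolving the shift/reduce conflicts of this ambiguous grammar in favor of the usual precedence, with * binding tighter than +) would be:

Stack            Input               Action
$                id1 + id2 * id3 $   shift
$ id1            + id2 * id3 $       reduce by E → id
$ E              + id2 * id3 $       shift
$ E +            id2 * id3 $         shift
$ E + id2        * id3 $             reduce by E → id
$ E + E          * id3 $             shift (precedence of * is higher)
$ E + E *        id3 $               shift
$ E + E * id3    $                   reduce by E → id
$ E + E * E      $                   reduce by E → E * E
$ E + E          $                   reduce by E → E + E
$ E              $                   accept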
PART-C
1. Writing short notes on
a. Nesting depth & access link
b. Data structures used in symbol table
c. Static versus dynamic storage allocation
2. What is LEX? Discuss the usage of LEX in Lexical Analyzer generation
3. Generate the three address code for the following code fragment
while(a > b)
{
if( c > d)
x = y + z;
else
x = y – z;
}
4. Explain the different storage allocation strategies.
5. Explain the following terms:-
i. Register descriptor
ii. Address descriptor
iii. Instruction costs
6. Consider the following grammar G :-
E E + T | T
T  TF | F
FF*|a|b
a. Construct the SLR parsing table for this grammar.
b. Construct the LALR parsing table.
7. Define syntax directed definition. Explain the various forms of syntax directed definition.
8. Translate the arithmetic expression:-
(a + b) * (c + d) + (a + b + c) into
a. Syntax tree
b. Three address code
c. Quadruple
d. Triples
9. Consider the following basic block and then construct the DAG for it
t1 = a + b
t2 = c + d
t3 = e – t2
t4 = t1 - t3
10. Explain different storage allocation strategies
11. Consider the following LL(1) grammar describing a certain sort of nested lists:
S → TS | ε
T → U.T | U
U → x | y | [S]
i. Left factor this grammar
ii. Give the FIRST and FOLLOW sets for each non-terminal in the grammar obtained in part (i).
iii. Using this information, construct an LL parsing table for the grammar obtained in part (i).
12. (a) Calculate the canonical collection of sets of LR(0) items for the grammar given below:
E' → E
E → E + T | T
T → T * F | F
F → (E) | id
Ans.
(b) Calculate the canonical collection of sets of LR(1) items for the grammar given below:
S' → S
S → CC
C → cC | d

13. For the assignment statement X = (a + b) * (c + d), construct the translation scheme and
an annotated parse tree. Also, differentiate between ‘call by value’ and ‘call by reference’
with example.
14. Explain peephole optimization in detail .
15. Explain the symbol table management system in detail.
16. Explain parsing techniques with a hierarchical diagram.
17. Construct syntax tree and postfix notation for the following expression:
(a + (h * c)^ d – e / (f + g ))
18. What is a common subexpression and how is it eliminated? Explain with the help of an appropriate example.
