CD-30 Questions With Solution
Affiliated
S. N. PATEL INSTITUTE OF TECHNOLOGY & RESEARCH
CENTRE, UMRAKH
Question Bank with solution
Subject Name : COMPILER DESIGN
Subject Code : 3170701
Branch : Computer Science & Engineering
Semester : 7th
Subject Teacher : Mr. Sandip K. Tandel, Mr. Ritesh K. Chauhan
Position (identifier)
= (Assignment symbol)
initial (identifier)
+ (Plus symbol)
rate (identifier)
* (Multiplication symbol)
60 (Number)
2) Syntax analysis
✓ Syntax Analysis is also called Parsing or Hierarchical Analysis.
✓ The syntax analyzer checks whether the sequence of tokens conforms to the grammar of the language and reports any syntax errors.
✓ If the code is error free, the syntax analyzer generates the parse tree.
3) Semantic analysis
✓ Semantic analyzer determines the meaning of a source string.
✓ It performs following operations:
1. Matching of parentheses in the expression.
2. Matching of if..else statements.
3. Checking that arithmetic operations are performed on type-compatible operands.
4. Checking the scope of operation.
4) Intermediate code generator
✓ Two important properties of intermediate code :
1. It should be easy to produce.
2. Easy to translate into target program.
✓ Intermediate form can be represented using “three address code”.
✓ Three address code consists of a sequence of instructions, each of which has at most three operands.
5) Code optimization
✓ It improves the intermediate code.
✓ This is necessary to have a faster execution of code or less consumption of memory.
6) Code generation
✓ The intermediate code instructions are translated into a sequence of machine instructions.
Example:
Q.2. List the cousins of compiler and explain the role of them.
Ans=
✓ In addition to a compiler, several other programs may be required to create an executable target
program.
❖ Preprocessor:
✓ A preprocessor produces input to the compiler. It may perform the following functions:
✓ Macro processing: A preprocessor may allow the user to define macros that are shorthand for
longer constructs.
✓ File inclusion: A preprocessor may include header files into the program text.
✓ Rational preprocessor: Such a preprocessor provides the user with built-in macros for constructs
like while statements or if statements.
✓ Language extensions: These processors attempt to add capabilities to the language by what
amounts to built-in macros. Ex: the language Equel is a database query language embedded in C.
Statements beginning with ## are taken by the preprocessor to be database access statements
unrelated to C and are translated into procedure calls on routines that perform the database access.
❖ Assembler
✓ An assembler is a translator which takes an assembly program as input and generates
machine code as output. Assembly code is a mnemonic version of machine code, in which
names are used instead of binary codes for operations.
❖ Linker
✓ A linker allows us to make a single program from several files of relocatable machine code.
These files may have been the result of several different compilations, and one or more may be
library files of routines provided by the system.
❖ Loader
✓ The process of loading consists of taking relocatable machine code, altering the relocatable
address and placing the altered instructions and data in memory at the proper location.
Token      Sample lexemes           Informal description of pattern
If         if                       if
Relation   <, <=, =, <>, >=, >      < or <= or = or <> or >= or >
Id         pi, count, n, i          letter followed by letters and digits
Number     3.14159, 0, 6.02e23      any numeric constant
Literal    "SNPITRC"                any characters between " and " except "
Example:
total = sum + 12.5
Tokens are: total (id)
            = (relation)
            sum (id)
            + (operator)
            12.5 (num)
Lexemes are: total, =, sum, +, 12.5
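The token/lexeme split above can be sketched as a small regular-expression scanner. This is only a minimal illustration: the token names follow the example, while the regular expressions and the function name tokenize are assumptions made for this sketch.

```python
import re

# Token classes follow the example above; the patterns are assumptions.
TOKEN_SPEC = [
    ("num",      r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"),
    ("id",       r"[A-Za-z][A-Za-z0-9]*"),
    ("relation", r"<=|<>|>=|<|>|="),
    ("operator", r"[+\-*/]"),
    ("skip",     r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token, lexeme) pairs for the input text."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":      # white space produces no token
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("total = sum + 12.5"))
```

Each pair holds the token (the class) and the lexeme (the matched characters), which mirrors the two lists in the example.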
Q.4. Construct NFA for following Regular Expression using Thompson’s Construction. Apply subset
construction method to convert into DFA.
(a + b)*abb
Ans=
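The subset construction step can be sketched in code. The NFA below is hard-coded and assumes the usual Thompson-style state numbering for (a|b)*abb (states 0 to 10, with 10 accepting); the names NFA, epsilon_closure, move and subset_construction are chosen for this sketch only.

```python
from collections import deque

EPS = ""  # label used for epsilon moves

# Assumed Thompson NFA for (a|b)*abb: states 0..10, state 10 is accepting.
NFA = {
    0: {EPS: {1, 7}},
    1: {EPS: {2, 4}},
    2: {"a": {3}},
    3: {EPS: {6}},
    4: {"b": {5}},
    5: {EPS: {6}},
    6: {EPS: {1, 7}},
    7: {"a": {8}},
    8: {"b": {9}},
    9: {"b": {10}},
    10: {},
}
START, ACCEPT, ALPHABET = 0, 10, "ab"


def epsilon_closure(states):
    """All NFA states reachable from `states` using epsilon moves only."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for nxt in NFA[state].get(EPS, ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return frozenset(closure)


def move(states, symbol):
    """NFA states reachable from `states` on one `symbol` transition."""
    result = set()
    for state in states:
        result |= NFA[state].get(symbol, set())
    return result


def subset_construction():
    """Return (DFA transition table, set of accepting DFA state names)."""
    start = epsilon_closure({START})
    names = {start: "A"}
    table = {}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for symbol in ALPHABET:
            target = epsilon_closure(move(current, symbol))
            if target not in names:
                names[target] = chr(ord("A") + len(names))
                queue.append(target)
            table[(names[current], symbol)] = names[target]
    accepting = {name for subset, name in names.items() if ACCEPT in subset}
    return table, accepting


table, accepting = subset_construction()
for (state, symbol), target in sorted(table.items()):
    print(state, symbol, "->", target)
print("accepting:", accepting)
```

With this numbering the construction yields five DFA states A to E, and E (the only subset containing NFA state 10) is accepting.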
Q.5. Draw DFA for the following regular expression using firstpos( ), lastpos( ) and followpos( ) functions.
(a|b)*abb#
Ans=
Q.6. Write a short note on LEX Tool.
Ans=
Lex is a tool that generates lexical analyzers. It is commonly used with the YACC parser generator. A
lexical analyzer is a program that transforms an input stream into a sequence of tokens. Lex reads a
specification of the tokens and produces C source code that implements the lexical analyzer.
The function of Lex is as follows:
✓ First, a specification of the lexical analyzer is written as a program lex.l in the Lex language. The
Lex compiler then translates lex.l into a C program lex.yy.c.
✓ Finally, the C compiler compiles lex.yy.c and produces an object program a.out.
✓ a.out is the lexical analyzer that transforms an input stream into a sequence of tokens.
Q.7. What is input buffering? Explain technique of buffer pair. OR
Which technique is used for speeding up the lexical analyzer?
Ans=
Q.8. Differentiate between parse tree and syntax tree
Ans=
Q.9. Consider the following grammar and construct the corresponding left most and right most derivations
for the sentence a*a-a.
S→S+S | S-S | S*S | S/S | a
Q.10. Write a rule of Left factoring & Left recursion and give example.
Ans=
❖ Left recursion:
✓ A grammar is said to be left recursive if it has a non terminal A such that there is a derivation
A ⇒ Aα (in one or more steps) for some string α.
✓ Top down parsing methods cannot handle left recursive grammar, so a transformation that
eliminates left recursion is needed.
✓ Rule to Remove Left Recursion:
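As a hedged sketch of the standard rule for immediate left recursion: productions of the form A → Aα | β are rewritten as A → βA' and A' → αA' | ε. The function name and the list-of-symbols representation below are assumptions made for illustration.

```python
def eliminate_immediate_left_recursion(nonterminal, productions):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | epsilon.

    `productions` is a list of right-hand sides, each a list of symbols.
    """
    # alpha parts: right-hand sides that start with the nonterminal itself
    recursive = [rhs[1:] for rhs in productions if rhs and rhs[0] == nonterminal]
    # beta parts: all remaining right-hand sides
    others = [rhs for rhs in productions if not rhs or rhs[0] != nonterminal]
    if not recursive:
        return {nonterminal: productions}   # nothing to remove
    new_nt = nonterminal + "'"
    return {
        nonterminal: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [["ε"]],
    }


# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | ε
print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```

This handles only immediate left recursion in a single nonterminal; indirect left recursion needs the productions to be substituted first.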
❖ Left factoring:
✓ Left factoring is a grammar transformation that is useful for producing a grammar suitable for
predictive parsing.
Ans=
1) Remove left recursion:
2) Find first and follow for each non terminal for Resultant grammar
4) Parse the following string (show stack actions clearly) and draw parse tree for the input: id + id * id
Q.12. Define handle and handle pruning. Explain the stack implementation of shift reduce parser with the
help of example.
Ans=
Handle: A “handle” is a substring of the string that matches the right side of a production and that
can be reduced to the non terminal on the left hand side of that production.
Handle pruning: The process of discovering a handle and reducing it to the appropriate left hand side
non terminal is known as handle pruning.
Grammar:
E→E+T | T
T→T*F | F
F→id
String: id+id*id
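The stack actions for this string can be traced with a small sketch. The shift/reduce decisions below are hand-coded for this particular grammar (they are an assumption of the sketch, not a general table-driven parser), and the function name shift_reduce is illustrative.

```python
def shift_reduce(tokens):
    """Return the (stack, remaining input, action) steps for E->E+T|T, T->T*F|F, F->id."""
    stack, rest, steps = [], list(tokens), []
    while True:
        lookahead = rest[0] if rest else "$"
        top = stack[-1] if stack else None
        if top == "id":
            action, pop, push = "reduce F -> id", 1, "F"
        elif top == "F" and stack[-3:-1] == ["T", "*"]:
            action, pop, push = "reduce T -> T*F", 3, "T"
        elif top == "F":
            action, pop, push = "reduce T -> F", 1, "T"
        elif top == "T" and lookahead == "*":
            action, pop, push = "shift", 0, None     # * binds tighter, keep shifting
        elif top == "T" and stack[-3:-1] == ["E", "+"]:
            action, pop, push = "reduce E -> E+T", 3, "E"
        elif top == "T":
            action, pop, push = "reduce E -> T", 1, "E"
        elif stack == ["E"] and not rest:
            action, pop, push = "accept", 0, None
        elif rest:
            action, pop, push = "shift", 0, None
        else:
            action, pop, push = "error", 0, None
        # Record the configuration together with the action taken in it.
        steps.append((" ".join(["$"] + stack), " ".join(rest + ["$"]), action))
        if action in ("accept", "error"):
            return steps
        if action == "shift":
            stack.append(rest.pop(0))
        else:
            del stack[len(stack) - pop:]   # pop the handle
            stack.append(push)             # push the left hand side


for stack, remaining, action in shift_reduce(["id", "+", "id", "*", "id"]):
    print(f"{stack:<16}{remaining:<18}{action}")
```

Each reduce step removes a handle from the top of the stack and replaces it with the left hand side non terminal, which is the handle pruning described above.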
Q.13. Construct SLR Parsing Table for the following grammar.
E→E + T | T
T→ T * F | F
F→ (E)
F→ id
Ans=
Q.14. Construct CLR parsing table for the given grammar
S→CC
C→ cC | d
Ans=
Q.15. Check whether the grammar is LALR or not.
OR
Construct LALR(1) parsing table for the following grammar.
S→CC
C→ cC | d
Ans=
Q.16. Show syntax directed definition for simple desk calculator. Also show annotated parse tree for 4*5+6;
Ans=
Consider the following context free grammar for simple desk calculator.
S→ EN
E→ E1 + T
E→ E1 - T
E→ T
T→ T1 * F
T→ F
F→ (E)
F→ digit
N→ ;
Q.17. Explain synthesized attributes with the help of an example.
Ans=
✓ Syntax directed definition is a generalization of context free grammar in which each grammar
symbol has an associated set of attributes and each grammar production has an associated set
of semantic rules.
✓ An attribute can be a number, a type, a memory location, a return type, etc.
❖ Types of attributes are:
1. Synthesized attribute
2. Inherited attribute
1. Synthesized attribute
✓ These are attributes which derive their values from their children nodes, i.e. the value of a
synthesized attribute at a node can be computed from the values of attributes at the children of
that node in the parse tree.
✓ A syntax directed definition that uses synthesized attributes exclusively is said to be an S-attributed
definition.
✓ To evaluate an S-attributed definition, perform the following steps:
1. Write syntax directed definition.
2. Generate annotated parse tree.
3. Compute attribute values by following bottom-up approach.
4. The value at root node is the final value of the expression.
Consider the following context free grammar for simple desk calculator.
S→ EN
E→ E1 + T
E→ E1 - T
E→ T
T→ T1 * F
T→ F
F→ (E)
F→ digit
N→ ;
Step 1: Syntax directed definition
Step 4: The value at root node is the final value of the expression.
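The bottom-up computation of the synthesized attribute val can be sketched as below. This is an illustrative evaluator only: the left-recursive productions for E and T are handled with loops, which is an assumption of the sketch rather than the parsing method used in the answer, and the function name evaluate is hypothetical.

```python
def evaluate(text):
    """Compute the synthesized attribute val for an input such as '4*5+6;'."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def factor():  # F -> ( E ) | digit     F.val = E.val or digit.lexval
        tok = peek()
        if tok == "(":
            advance()
            val = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return val
        if tok is not None and tok.isdigit():
            advance()
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():  # T -> T1 * F | F         T.val = T1.val * F.val
        val = factor()
        while peek() == "*":
            advance()
            val *= factor()
        return val

    def expr():  # E -> E1 + T | E1 - T | T   E.val = E1.val +/- T.val
        val = term()
        while peek() in ("+", "-"):
            op = peek()
            advance()
            right = term()
            val = val + right if op == "+" else val - right
        return val

    result = expr()       # S -> E N: the value at the root is E.val
    if peek() != ";":     # N -> ;
        raise SyntaxError("expected ';'")
    advance()
    if peek() is not None:
        raise SyntaxError("trailing input")
    return result


print(evaluate("4*5+6;"))
```

For 4*5+6; the value 20 is computed at the T node for 4*5 before it is combined with 6 at the E node, so the value at the root is 26.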
Q.18. Differentiate Top Down Parsing and Bottom up parsing
Ans=
2. Postfix notation
✓ This is the most natural representation for expression evaluation in a compiler.
✓ In postfix notation the operands occur first, followed by the operator that applies to them.
❖ Example: Consider the following string: a + b * c + d * e ↑ f
The postfix notation of the string is: abc*+def↑*+
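The conversion can be checked with a short operator-stack sketch. The precedence table and the choice to treat ↑ as right associative are assumptions of this sketch, and operands are assumed to be single characters as in the example.

```python
# Assumed precedences: ↑ binds tightest and is right associative.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "↑": 3}
RIGHT_ASSOCIATIVE = {"↑"}


def to_postfix(expression):
    """Convert an infix expression with single-character operands to postfix."""
    output, operators = [], []
    for token in expression.replace(" ", ""):
        if token.isalnum():
            output.append(token)              # operands go straight to the output
        elif token == "(":
            operators.append(token)
        elif token == ")":
            while operators[-1] != "(":
                output.append(operators.pop())
            operators.pop()                   # discard the "("
        else:
            # Pop operators that must be applied before the incoming one.
            while operators and operators[-1] != "(":
                top = operators[-1]
                if PRECEDENCE[top] > PRECEDENCE[token] or (
                    PRECEDENCE[top] == PRECEDENCE[token]
                    and token not in RIGHT_ASSOCIATIVE
                ):
                    output.append(operators.pop())
                else:
                    break
            operators.append(token)
    while operators:
        output.append(operators.pop())
    return "".join(output)


print(to_postfix("a + b * c + d * e ↑ f"))
```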
✓ The three address statements with unary operator like a = -y or a = y do not use arg2.
2. Triple
✓ In triples, temporaries are not used; instead, a value is referred to either by a pointer to its symbol
table entry or by the position of the triple that computes it.
✓ If we do so, three address statements can be represented by records with only three fields: op,
arg1 and arg2.
✓ Numbers in round brackets ( ) are used to represent pointers into the triple structure.
3. Indirect Triples
✓ In the indirect triple representation, a separate listing of pointers to triples is kept, and the
statements are ordered through this listing rather than through the triples themselves.
✓ This implementation is called indirect triples.
Q.21. Write a note on peephole optimization.
Ans=
✓ Peephole optimization is a simple and effective technique for locally improving target code. This
technique improves the performance of the target program by examining a short
sequence of target instructions (called the peephole) and replacing these instructions by a shorter or
faster sequence whenever possible. The peephole is a small, moving window on the target program.
It may be possible to eliminate the statement L1: goto L2 provided it is preceded by an unconditional
jump. Similarly, the sequence can be replaced by:
3. Algebraic simplification
✓ Peephole optimization is an effective technique for algebraic simplification.
✓ Statements such as x := x + 0 or x := x * 1 can be eliminated by peephole optimization.
4. Reduction in strength
✓ Certain machine instructions are cheaper than others.
✓ In order to improve performance of the intermediate code we can replace expensive instructions by
equivalent cheaper instructions.
✓ For example, computing x² as x * x is cheaper than calling an exponentiation routine.
✓ Similarly, addition and subtraction are cheaper than multiplication and division. So we can
substitute equivalent additions and subtractions for multiplications and divisions where possible.
5. Machine idioms
✓ The target machine may have special instructions that perform certain operations efficiently.
✓ Hence we can replace sequences of target instructions by these equivalent machine instructions in
order to improve the efficiency.
✓ Example: Some machines have auto-increment or auto-decrement addressing modes.
✓ These modes can be used in code for statement like i=i+1.
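Algebraic simplification and reduction in strength can be sketched as a single pass over three address statements. The tuple layout (dest, op, arg1, arg2), the use of ** for exponentiation and the function name peephole are assumptions made for this illustration.

```python
def peephole(instructions):
    """Apply two peephole rules to a list of (dest, op, arg1, arg2) tuples."""
    optimized = []
    for dest, op, arg1, arg2 in instructions:
        # Algebraic simplification: x = x + 0 and x = x * 1 do nothing.
        if dest == arg1 and (op, arg2) in {("+", "0"), ("*", "1")}:
            continue
        # Reduction in strength: x = y ** 2 becomes x = y * y.
        if op == "**" and arg2 == "2":
            optimized.append((dest, "*", arg1, arg1))
        # Reduction in strength: x = y * 2 becomes x = y + y.
        elif op == "*" and arg2 == "2":
            optimized.append((dest, "+", arg1, arg1))
        else:
            optimized.append((dest, op, arg1, arg2))
    return optimized


code = [
    ("x", "+", "x", "0"),    # removed
    ("y", "**", "x", "2"),   # y = x * x
    ("z", "*", "y", "2"),    # z = y + y
    ("i", "+", "i", "1"),    # unchanged
]
for instruction in peephole(code):
    print(instruction)
```

A real peephole optimizer works on target instructions and repeats the pass until no rule applies; the sketch shows only one pass over a simplified form.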
Q.23. Explain all error recovery strategies using suitable examples.
Ans=
✓ There are mainly four error recovery strategies:
1. Panic mode
2. Phrase level recovery
3. Error production
4. Global correction
1. Panic mode
✓ This strategy is used by most parsing methods. This is simple to implement.
✓ In this method, on discovering an error, the parser discards input symbols one at a time. This process
is continued until one of a designated set of synchronizing tokens is found.
✓ Synchronizing tokens are delimiters such as semicolon or end. These tokens indicate the end of
an input statement.
✓ Thus in panic mode recovery a considerable amount of input may be skipped without checking it for
additional errors.
✓ If multiple errors in the same statement are rare, this strategy is a good choice.
✓ Example:
the grammar for the corresponding language with error productions that generate the erroneous
constructs.
✓ If error production is used during parsing, we can generate appropriate error message to indicate
the erroneous construct that has been recognized in the input.
✓ This method is extremely difficult to maintain, because if we change grammar then it becomes
necessary to change the corresponding productions.
• For Example: suppose the input string is abcd
4. Global correction
✓ Ideally, we want a compiler that makes as few changes as possible in processing an incorrect input
string.
✓ Given an incorrect input string x and grammar G, the algorithm will find a parse tree for a related
string y, such that the number of insertions, deletions and changes of tokens required to transform x
into y is as small as possible.
✓ Such methods increase time and space requirements at parsing time.
✓ Global correction is thus simply a theoretical concept.
Input to the code generator consists of the intermediate representation of the source program.
Types of intermediate language are:
Postfix notation
Quadruples
Syntax trees or DAGs
The detection of semantic error should be done before submitting the input to the code
generator.
The code generation phase requires complete error free intermediate code as an input.
2. Target program
The output may be in form of:
Absolute machine language: An absolute machine language program can be placed in a fixed memory
location and immediately executed.
Relocatable machine language: Subroutines can be compiled separately. A set of
relocatable object modules can be linked together and loaded for execution.
Assembly language: Producing an assembly language program as output makes the process of
code generation easier, but an assembler is then required to convert the code into binary form.
3. Memory management
Mapping names in the source program to addresses of data objects in run time memory is
done cooperatively by the front end and the code generator.
We assume that a name in a three-address statement refers to a symbol table entry for the
name.
From the symbol table information, a relative address can be determined for the name in a
data area.
4. Instruction selection
Example: the sequence of statements
a := b + c
d := a + e
would be translated into
MOV b, R0
ADD c, R0
MOV R0, a
MOV a, R0
ADD e, R0
MOV R0, d
Here the fourth statement is redundant, so we can eliminate that statement.
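A minimal sketch of removing such a redundant load: a MOV that copies a value back into the register it was just stored from is dropped. The "MOV source, destination" string format follows the example; the helper names are hypothetical, and the sketch assumes the dropped instruction carries no label.

```python
def parse(instruction):
    """Split 'MOV R0, a' into (opcode, source, destination)."""
    opcode, operands = instruction.split(None, 1)
    source, destination = [part.strip() for part in operands.split(",")]
    return opcode, source, destination


def remove_redundant_loads(code):
    """Drop 'MOV a, R0' when it immediately follows 'MOV R0, a'."""
    result = []
    for instruction in code:
        if result:
            prev_op, prev_src, prev_dst = parse(result[-1])
            op, src, dst = parse(instruction)
            if op == prev_op == "MOV" and src == prev_dst and dst == prev_src:
                continue   # the value is already in the destination
        result.append(instruction)
    return result


code = [
    "MOV b, R0",
    "ADD c, R0",
    "MOV R0, a",
    "MOV a, R0",   # redundant: R0 already holds a
    "ADD e, R0",
    "MOV R0, d",
]
print("\n".join(remove_redundant_loads(code)))
```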
5. Register allocation
The use of registers is often subdivided into two sub problems:
During register allocation, we select the set of variables that will reside in registers at a point
in the program.
During a subsequent register assignment phase, we pick the specific register that a variable
will reside in.
Finding an optimal assignment of registers to variables is difficult, even with single register
value.
Mathematically the problem is NP-complete.
6. Choice of evaluation
The order in which computations are performed can affect the efficiency of the target code.
Some computation orders require fewer registers to hold intermediate results than others.
Picking a best order is another difficult, NP-complete problem.
7. Approaches to code generation
The most important criterion for a code generator is that it produces correct code.
The design of code generator should be in such a way so it can be implemented, tested, and
maintained easily.
✓ Temporary values: stores the values that arise in the evaluation of an expression.
✓ Local variables: hold the data that is local to the execution of the procedure.
✓ Machine status: holds the information about status of machine just before the function call.
✓ Access link (optional): refers to non-local data held in other activation records.
✓ Control link (optional): points to activation record of caller.
✓ Actual parameters: This field holds the information about the actual parameters.
✓ Return value: used by the called procedure to return a value to calling procedure.
The calling sequence and its division between caller and callee are as follows:
1. The caller evaluates the actual parameters.
2. The caller stores a return address and the old value of top_sp into the callee’s
activation record. The caller then increments the top_sp to the respective positions.
3. The callee saves the register values and other status information.
4. The callee initializes its local data and begins execution.
Stack allocation: Variable length data on stack
The run time memory management system must deal frequently with the allocation of objects,
the sizes of which are not known at the compile time, but which are local to a procedure and
thus may be allocated on the stack.
The same scheme works for objects of any type if they are local to the called procedure and have a
size that depends on the parameters of the call.
This symbol table implementation uses a linked list. A link field is added to each record.
We search the records in the order given by the link fields.
The pointer “First” is maintained to point to the first record of the symbol table.
3) Binary tree
When the symbol table is organized by means of a binary tree, the node structure will be as
follows:
The left child field stores the address of previous symbol.
Right child field stores the address of next symbol.
The symbol field is used to store the name of the symbols.
Information field is used to give information about the symbol.
4) Hash table
In the hashing scheme two tables are maintained: a hash table and a symbol table.
The hash table consists of k entries numbered 0, 1, ..., k-1. These entries are pointers into the
symbol table, pointing to the names stored there.
To determine whether a 'Name' is in the symbol table, we use a hash function 'h' such that
h(name) yields an integer between 0 and k-1. We can search for any name by computing
position=h(name).
Using this position we can obtain the exact location of the name in the symbol table.
The advantage of hashing is that quick search is possible; the disadvantage is that hashing is
complicated to implement.
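The two-table scheme can be sketched as follows. The class name, the choice k = 11, the character-sum hash function and the use of a list per hash entry to resolve collisions are all assumptions of this sketch.

```python
class SymbolTable:
    """Hash table of k entries pointing into a separate table of symbol records."""

    def __init__(self, k=11):
        self.k = k
        self.hash_table = [[] for _ in range(k)]   # entries 0 .. k-1
        self.symbols = []                          # (name, info) records

    def h(self, name):
        # Assumed hash function: sum of character codes modulo k.
        return sum(ord(ch) for ch in name) % self.k

    def insert(self, name, info):
        position = self.h(name)
        for index in self.hash_table[position]:
            if self.symbols[index][0] == name:     # name already present: update it
                self.symbols[index] = (name, info)
                return index
        self.symbols.append((name, info))
        self.hash_table[position].append(len(self.symbols) - 1)
        return len(self.symbols) - 1

    def lookup(self, name):
        # Only the records reachable from position h(name) are examined.
        for index in self.hash_table[self.h(name)]:
            if self.symbols[index][0] == name:
                return self.symbols[index][1]
        return None


table = SymbolTable()
table.insert("count", "int")
table.insert("rate", "float")
print(table.lookup("count"), table.lookup("rate"), table.lookup("missing"))
```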
Q.28. Explain the following parameter passing methods.
1. Call-by-value 2. Call-by-reference 3. Copy-Restore 4. Call-by-Name
Ans=
1. Call by Value:
This is the simplest method of parameter passing.
The call by value method of passing arguments to a function copies the actual value of an
argument into the formal parameter of the function.
The operations on formal parameters do not change the values of the actual parameters.
2. Call by Reference:
This method is also called as call by address or call by location.
The call by reference method of passing arguments to a function copies the address of an
argument into the formal parameter.
Inside the function, the address is used to access the actual argument used in the call.
It means the changes made to the parameter affect the passed argument.
3. Copy Restore:
This method is a hybrid between call by value and call by reference.
This method is also known as copy-in copy-out or value-result.
The calling procedure calculates the value of the actual parameter, which is then copied to the
activation record of the called procedure.
During execution of the called procedure, the actual parameter's value is not affected.
If the actual parameter has an L-value, then at return the value of the formal parameter is copied
back to the actual parameter.
4. Call by Name
This is less popular method of parameter passing.
Procedure is treated like macro.
The procedure body is substituted for call in caller with actual parameters substituted for
formals.
The local names of called procedure and names of calling procedure are distinct.
The actual parameters can be surrounded by parentheses to preserve their integrity.
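The difference between the first three methods shows up when the actual parameter is also changed directly inside the called procedure. The sketch below simulates memory with a dictionary; the procedure p (body: x = x + 1; a = a + 10), the variable a and the function call_p are hypothetical and exist only for this illustration. Call by name is not simulated.

```python
def call_p(mode):
    """Run p(a) under the given passing mode and return the final value of a."""
    memory = {"a": 1}                 # global variable a = 1

    def p(actual):
        # Body of p with formal parameter x:  x = x + 1; a = a + 10
        if mode == "reference":
            memory[actual] += 1       # x is the same location as the actual
            memory["a"] += 10
        else:
            x = memory[actual]        # copy in (call by value and copy-restore)
            x += 1                    # changes only the local copy
            memory["a"] += 10
            if mode == "copy-restore":
                memory[actual] = x    # copy out at return

    p("a")
    return memory["a"]


for mode in ("value", "reference", "copy-restore"):
    print(mode, call_p(mode))
```

Under call by value the increment of x is lost, under call by reference both updates act on a, and under copy-restore the copied-out value of x overwrites the update made to a inside p.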
Q.29. Differentiate: static v/s dynamic memory allocations.
Ans=
each of them. It assigns addresses to labels by counting their position from the starting address.
Design of two pass assembler
The two pass assembler performs the following functions. It performs some function in pass 1 and
some functions in pass 2.
Pass 1
1) Assign address to all statements in the assembly language program.
2) Save the address with label for use in pass 2.
3) Define symbols and literals.
4) Determine the length of machine instructions
5) Keep track of location counter.
6) Process some assembler directives or operations
Pass 2
1) Perform processing of assembler directives which are not done during the pass 1.
2) Generate the object program.
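The division of work between the two passes can be sketched for a toy assembly language. The statement format "LABEL: OPCODE OPERAND", the opcodes in the sample program and the assumption that every statement occupies one word are all hypothetical and chosen only for this illustration.

```python
def pass1(lines, start=0):
    """Assign an address to every statement and save label addresses."""
    symtab, statements, location = {}, [], start
    for line in lines:
        label, _, rest = line.rpartition(":")
        if label:
            symtab[label.strip()] = location   # save address with label
        statements.append((location, rest.split()))
        location += 1     # assumption: every statement occupies one word
    return symtab, statements


def pass2(symtab, statements):
    """Generate the object program, replacing symbols by their addresses."""
    program = []
    for location, (opcode, *operands) in statements:
        resolved = [str(symtab.get(operand, operand)) for operand in operands]
        program.append((location, opcode, *resolved))
    return program


source = [
    "START: LOAD X",
    "ADD ONE",
    "JMP END",        # forward reference, resolved in pass 2
    "X: DATA 5",
    "ONE: DATA 1",
    "END: HALT",
]
symtab, statements = pass1(source)
print(symtab)
for record in pass2(symtab, statements):
    print(record)
```

The forward reference to END cannot be resolved when JMP END is first read, which is why the addresses are collected in pass 1 and substituted in pass 2.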
LITTAB (literal table)
Fields: a) literals-constants b) address-address of the literal
LITTAB collects all the literals used in the program. The address field is filled in later, on
encountering an LTORG statement.
The various tables used by the assembler are filled during the pass1 and the output is the intermediate
code.