CD Unit 4
Symbol Table
• A symbol table is simply a table, organised either as a linear list or as a hash table.
• It maintains an entry for each name in the following format:
• <symbol name, type, attribute>
• For example, for the declaration static int interest;
• it stores the entry <interest, int, static>
• The attribute field holds additional information about the name.
• Each entry in the symbol table contains:
• the name of an identifier
• additional information: its kind, its type, whether it is a constant
• 1. In a compiler: a data structure used by the compiler to keep track of identifiers used in the source program.
1
Prof Anil Kumar
• This is a compile-time data structure; it is not used at run time.
• 2. In object files: a symbol table (mapping each name to an address) can be built into object programs, to be used during linking of different object programs to resolve references.
• 3. In executables: a symbol table (again mapping names to addresses) can be included in executables, to recover variable names during debugging.
• The symbol table is mostly constructed during semantic analysis; lexical analysis can only record that a given token is an identifier.
• It is only after semantic analysis that the compiler knows the context of the identifier.
Variable Declarations
• In static-typing programming languages, variables need to be declared before they are used.
• The declaration provides the data type of the variable.
• E.g., int a; float b; string c;
• Most typically, a declaration is valid for the scope in which it occurs:
• Function scoping: a variable can be used anywhere in the function in which it is defined, after the point of definition
• Block scoping: a variable is only valid within the block of code in which it is defined, e.g.,
prog xxx { int a; float b;
    { int c;
        { int b;
            c = a + b;
        }
        return float(c) / b;
    }
}
Scope Information
• Scope information characterizes the declaration of identifiers and the portions of the program where the use of each identifier is allowed
• Example identifiers: variables, functions, objects, labels
• A lexical scope is a textual region in the program:
• Statement block
• Formal argument list
• Object body
• Function or method body
• Module body
• Whole program (multiple modules)
• Scope of an identifier: the lexical scope in which it is valid
Symbol Table Entry
• Associates a name with a set of attributes, for example:
• Kind of name (variable, type, function, …)
• Type (int, float, char, …)
• Nesting level
• Size
• Memory location (i.e., where it will be found at run time)
• Semantic checks refer to properties of identifiers in the program: their scope or type
• An environment is needed to store information about identifiers: the symbol table
• Each entry in the symbol table contains
• the name of an identifier
• additional information: its kind, its type, whether it is constant, …
Scope Information
• How to represent scope information in the symbol table?
• Idea:
• There is a hierarchy of scopes in the program
• Use a similar hierarchy of symbol tables
• One symbol table for each scope
• Each symbol table contains the symbols declared in that lexical scope
Purpose of Symbol Table
• Keep track of the names declared in the program, i.e., the names of:
• Variables
• Functions
• Classes
• Types, …
As you can see in the code above, the closing braces are missing, so it results in a syntactic error.
Example 2 - Missing Semicolon
x = a + b * c // missing semicolon
Semantic Phase Errors
This type of error appears during the semantic analysis phase and is detected during the compilation process. This is the phase where the identifiers you have defined are verified.
The majority of compile-time errors are scope and declaration errors, for example undeclared identifiers or multiply declared identifiers. Semantic errors can also occur when an invalid variable or operator is used, or when operations are performed in the incorrect order.
There can be different types of compilation errors depending on the program you’ve written.
Some examples of semantic errors are:
Operands of incompatible types
Variable not declared
Failure to match an actual argument with a formal argument
Advantages:
Improved code quality: Error detection and recovery in a compiler can improve the overall quality of the code
produced. This is because errors can be identified early in the compilation process and addressed before they
become bigger issues.
Increased productivity: Error recovery can also increase productivity by allowing the compiler to continue
processing the code after an error is detected. This means that developers do not have to stop and fix every error
manually, saving time and effort.
Better user experience: Error recovery can also improve the user experience of software applications. When
errors are handled gracefully, users are less likely to become frustrated and are more likely to continue using the
application.
Better debugging: Error recovery in a compiler can help developers to identify and debug errors more
efficiently. By providing detailed error messages, the compiler can assist developers in pinpointing the source
of the error, saving time and effort.
Consistent error handling: Error recovery ensures that all errors are handled in a consistent manner, which
can help to maintain the quality and reliability of the software being developed.
Reduced maintenance costs: By detecting and addressing errors early in the development process, error
recovery can help to reduce maintenance costs associated with fixing errors in later stages of the software
development lifecycle.
Improved software performance: Error recovery can help to identify and address code that may cause
performance issues, such as memory leaks or inefficient algorithms. By improving the performance of the code,
the overall performance of the software can be improved as well.
Disadvantages:
Slower compilation time: Error detection and recovery can slow down the compilation process, especially if the recovery mechanism is complex. This can be an issue in large software projects where compilation time can be a bottleneck.
Increased complexity: Error recovery can also increase the complexity of the compiler, making it harder to
maintain and debug. This can lead to additional development costs and longer development times.
Risk of silent errors: Error recovery can sometimes mask errors in the code, leading to silent errors that go
unnoticed. This can be particularly problematic if the error affects the behavior of the software application in
subtle ways.
Potential for incorrect recovery: If the error recovery mechanism is not implemented correctly, it can
potentially introduce new errors or cause the code to behave unexpectedly.
Dependency on the recovery mechanism: If developers rely too heavily on the error recovery mechanism,
they may become complacent and not thoroughly check their code for errors. This can lead to errors being
missed or not addressed properly.
Difficulty in diagnosing errors: Error recovery can make it more difficult to diagnose and debug errors since
the error message may not accurately reflect the root cause of the issue. This can make it harder to fix errors and
may lead to longer development times.
Compatibility issues: Error recovery mechanisms may not be compatible with certain programming languages or platforms, leading to issues with portability and cross-platform development.
Code Optimization
Code optimization in the synthesis phase is a program transformation technique that tries to improve the intermediate code by making it consume fewer resources (i.e., CPU, memory) so that faster-running machine code will result. The optimizing process of a compiler should meet the following objectives:
The optimization must be correct; it must not, in any way, change the meaning of the program.
Optimization should increase the speed and performance of the program.
The compilation time must be kept reasonable.
The optimization process should not delay the overall compiling process.
Code optimization is a program modification strategy that endeavours to enhance the intermediate code, so a
program utilises the least potential memory, minimises its CPU time and offers high speed.
The principal sources of optimization in computer science and IT engineering are code optimization, algorithm
optimization, system optimization, network optimization, and data optimization. These sources help in improving
the performance, scalability, and reliability of software systems.
Loop optimization is the process of increasing execution speed and reducing the overheads associated with loops. It plays an important role in improving cache performance and making effective use of parallel processing capabilities. Most of the execution time of a scientific program is spent in loops.
Loop Optimization Techniques
In the compiler, we have various loop optimization techniques, which are as follows:
Frequency Reduction (Code Motion)
In frequency reduction, the amount of code inside the loop is decreased. A statement or expression that can be moved outside the loop body without affecting the semantics of the program is moved outside the loop.
Example:
Before optimization:
while(i<100)
{
a = Sin(x)/Cos(x) + i;
i++;
}
After optimization:
t = Sin(x)/Cos(x);
while(i<100)
{
a = t + i;
i++;
}
3. Strength Reduction
Strength reduction replaces expensive operations with cheaper ones; for example, multiplication is costlier than addition, so a multiplication inside a loop can be replaced by repeated addition.
Example:
Before optimization:
while (x<10)
{
y := 3 * x+1;
a[y] := a[y]-2;
x := x+2;
}
After optimization:
t= 3 * x+1;
while (x<10)
{
y=t;
a[y]= a[y]-2;
x=x+2;
t=t+6;
}
5. Loop Unrolling
Loop unrolling is a loop transformation technique that helps to optimize the execution time of a program. We basically remove or reduce the number of iterations. Loop unrolling increases the program's speed by eliminating the loop control and loop test instructions.
Example:
Before optimization:
for(int i=0; i<5; i++)
printf("Pankaj\n");
After optimization:
printf("Pankaj\n");
printf("Pankaj\n");
printf("Pankaj\n");
printf("Pankaj\n");
printf("Pankaj\n");
6. Loop Jamming
Loop jamming (loop fusion) combines two or more loops into a single loop. It reduces the loop overhead incurred by executing many separate loops.
Example:
Before optimization:
for(int i=0; i<5; i++)
a = i + 5;
for(int i=0; i<5; i++)
b = i + 10;
After optimization:
for(int i=0; i<5; i++)
{
a = i + 5;
b = i + 10;
}
7. Loop Fission
Loop fission improves locality of reference. In loop fission, a single loop is divided into multiple loops over the same index range, with each new loop containing one part of the original loop body.
Example:
Before optimization:
for(x=0;x<10;x++)
{
a[x]=…
b[x]=…
}
After optimization:
for(x=0;x<10;x++)
a[x]=…
for(x=0;x<10;x++)
b[x]=…
8. Loop Interchange
In loop interchange, an inner loop is exchanged with an outer loop. This optimization technique also improves locality of reference, for example when it makes array accesses follow the row-major storage order, as below.
Example:
Before optimization:
for(x=0;x<10;x++)
for(y=0;y<10;y++)
a[y][x]=…
After optimization:
for(y=0;y<10;y++)
for(x=0;x<10;x++)
a[y][x]=…
9. Loop Reversal
Loop reversal reverses the order in which values are assigned to the index variable. This helps in removing dependencies.
Example:
Before optimization:
for(x=0;x<10;x++)
a[9-x]=…
After optimization:
for(x=9;x>=0;x--)
a[x]=…
10. Loop Splitting
Loop splitting simplifies a loop by dividing it into multiple loops that iterate over different portions of the index range, so that a condition on the index (such as x<5 below) no longer has to be tested on every iteration. Loop splitting helps in reducing dependencies and hence makes the code more optimized.
Example:
Before optimization:
for(x=0;x<10;x++)
if(x<5)
a[x]=…
else
b[x]=…
After optimization:
for(x=0;x<5;x++)
a[x]=…
for(;x<10;x++)
b[x]=…
11. Loop Peeling
Loop peeling is a special case of loop splitting, in which a problematic iteration (here the first one) is performed separately before entering the loop.
Before optimization:
for(x=0;x<10;x++)
if(x==0)
a[x]=…
else
b[x]=…
After optimization:
a[0]=…
for(x=1;x<10;x++)
b[x]=…
12. Unswitching
Unswitching moves a loop-invariant condition out of the loop; this is done by duplicating the loop and placing one copy of it inside each branch of the conditional.
Before optimization:
for(x=0;x<10;x++)
if(s>t)
a[x]=…
else
b[x]=…
After optimization:
if(s>t)
for(x=0;x<10;x++)
a[x]=…
else
for(x=0;x<10;x++)
b[x]=…
Basic Block
A basic block is a sequence of three-address statements in which control enters only at the first statement and leaves only at the last, with no jumps in between. Here,
All the statements execute in a sequence, one after the other.
Thus, they form a basic block.
Example of Not a Basic Block-
The three-address code for the expression if A<B then 1 else 0 is-
(1) if A < B goto (4)
(2) T = 0
(3) goto (5)
(4) T = 1
Statements (2) and (4) can be reached by jumps, so this sequence cannot form a single basic block.
Flow Graph
A flow graph is a directed graph with flow-of-control information added to the basic blocks.
The basic blocks serve as nodes of the flow graph.
There is a directed edge from block B1 to block B2 if control can pass from B1 to B2, i.e., B2 immediately follows B1 in the code (and B1 does not end in an unconditional jump), or there is a jump from the end of B1 to the beginning of B2.
Compute the basic blocks for the given three address statements-
(1) PROD = 0
(2) I = 1
(3) T2 = addr(A) – 4
(4) T4 = addr(B) – 4
(5) T1 = 4 x I
(6) T3 = T2[T1]
(7) T5 = T4[T1]
(8) T6 = T3 x T5
(9) PROD = PROD + T6
(10) I = I + 1
(11) IF I <=20 GOTO (5)
Solution-
We have-
PROD = 0 is a leader since first statement of the code is a leader.
T1 = 4 x I is a leader since target of the conditional goto statement is a leader.
Now, the given code can be partitioned into two basic blocks as-
Basic block B1: statements (1) to (4)
Basic block B2: statements (5) to (11)
Draw a flow graph for the three address statements given in above problem.
Solution-
Firstly, we compute the basic blocks (already done above).
Secondly, we assign the flow control information:
There is an edge from B1 to B2 (B2 follows B1 in the code), and an edge from B2 to itself (the conditional GOTO at statement (11) targets statement (5), the start of B2).
A DAG (directed acyclic graph) is a very useful data structure for implementing transformations on basic blocks.
DAGs are used for the following purposes-
To determine the expressions that have been computed more than once (called common sub-expressions).
To determine the names whose computation has been done outside the block but which are used inside the block.
To determine the statements of the block whose computed value can be made available outside the block.
To simplify the list of quadruples by not executing assignment instructions x:=y unless they are necessary, and by eliminating the common sub-expressions.
Following rules are used for the construction of DAGs-
Rule-01:
In a DAG,
Interior nodes always represent the operators.
Exterior nodes (leaf nodes) always represent the names, identifiers or constants.
Rule-02:
While constructing a DAG,
A check is made to find if there exists any node with the same value.
A new node is created only when there does not exist any node with the same value.
This action helps in detecting the common sub-expressions and avoiding the re-computation of the same.
Rule-03:
The assignment instructions of the form x:=y are not performed unless they are necessary.
Consider the following expression and construct a DAG for it-
(a+b) x (a+b+c)
Solution-
Leaves a and b feed a single + node, which is shared by both occurrences of a+b; that + node and leaf c feed a second + node; finally, the two + nodes feed the x node at the root. The shared + node is the detected common sub-expression.
Code generation
In computing, code generation is part of the process chain of a compiler and converts intermediate
representation of source code into a form (e.g., machine code) that can be readily executed by the
target system.
Code generator is used to produce the target code for three-address statements. It uses registers to store the
operands of the three address statement.
Example:
Consider the three-address statement x := y + z. It can be translated into the following sequence of code:
MOV y, R0
ADD z, R0
MOV R0, x
Register and address Descriptor
o A register descriptor contains the track of what is currently in each register. The register descriptors show
that all the registers are initially empty.
o An address descriptor is used to store the location where current value of the name can be found at run
time.
1. Input to the code generator: the intermediate representation produced by the front end, e.g., three-address code or graphical representations such as syntax trees and DAGs. Prior to code generation, the source must have been scanned, parsed and translated into intermediate representation, along with the necessary type checking. Therefore, the input to code generation is assumed to be error-free.
2. Target program: The output of the code generator is the target program. The output may be:
a. Absolute machine language: it can be placed in a fixed memory location and executed immediately.
b. Relocatable machine language: it allows subprograms to be compiled separately.
c. Assembly language: code generation is made easier.
3. Memory management: Names in the source program are mapped to addresses of data objects in run-time memory by the front end and the code generator. This makes use of the symbol table: a name in a three-address statement refers to the symbol-table entry for that name.
4. Instruction selection:
a. The instructions of target machine should be complete and uniform.
b. Instruction speeds and machine idioms are important factors when efficiency of target program is considered.
c. The quality of the generated code is determined by its speed and size.
5. Register allocation: Instructions involving register operands are shorter and faster than those involving operands in memory. The use of registers is subdivided into two subproblems: register allocation, in which the set of variables that will reside in registers is selected; and register assignment, in which the specific register that a variable will reside in is picked.
TARGET MACHINE:
Familiarity with the target machine and its instruction set is a prerequisite for designing a good code generator.
The target computer is a byte-addressable machine with 4 bytes to a word. It has n general-purpose registers, R0,
R1, . . . , Rn-1.
It has two-address instructions of the form:
op source, destination
where, op is an op-code, and source and destination are data fields.
It has the following op-codes :
MOV (move source to destination)
ADD (add source to destination)
SUB (subtract source from destination)
The source and destination of an instruction are specified by combining registers and memory locations with addressing modes.
A SIMPLE CODE GENERATOR:
A code generator generates target code for a sequence of three- address statements and effectively uses registers
to store operands of the statements.
For example: consider the three-address statement
a := b+c can have the following sequence of codes:
ADD Rj, Ri Cost = 1 // if Ri contains b and Rj contains c (or)
ADD c, Ri Cost = 2 // if c is in a memory location (or)
MOV c, Rj Cost = 2 // move c from memory to Rj, and
ADD Rj, Ri Cost = 1 // total cost = 3
A. Copy Propagation: Assignments of the form y = x that merely copy one variable into another are removed, and later uses of the copy are replaced by the original variable.
Initial code:
i = x + 5;
y = i;
z = y * 3;
w = z;
Optimized code:
y = x + 5;
w = y * 3; //* there is no i now
//* We've removed two redundant variables i & z whose values were just being copied from one to another.
B. Constant Folding: Expressions whose operands are all known at compile time are evaluated by the compiler itself. Computations that would otherwise be performed at run time are replaced with their precomputed results, avoiding additional computation.
Initial code:
x = 2 * 3;
Optimized code:
x = 6;
C. Strength Reduction: The operators that consume higher execution time are replaced by the operators
consuming less execution time.
Initial code:
y = x * 2;
Optimized code:
y = x + x; or y = x << 1;
Initial code:
y = x / 2;
Optimized code:
y = x >> 1;
D. Null sequences/ Simplify Algebraic Expressions : Useless operations are deleted.
a := a + 0;
a := a * 1;
a := a/1;
a := a - 0;
E. Combine operations: Several operations are replaced by a single equivalent operation.
F. Dead Code Elimination:- Dead code refers to portions of the program that are never executed or that do not affect the program's observable behavior. Eliminating dead code improves the efficiency and performance of the compiled program by reducing unnecessary computations and memory usage.
Initial Code:-
int Dead(void)
{
int a=10;
int z=50;
int c;
c=z*5;
printf("%d", c);
a=20;
a=a*10; // dead: these two lines never affect the output
return 0;
}
Optimized Code:-
int Dead(void)
{
int a=10;
int z=50;
int c;
c=z*5;
printf("%d", c);
return 0;
}