CD Unit 4

The document discusses symbol tables and type checking in compilers. Some key points:
- Symbol tables store information about variables, functions, classes, etc. to track scope and type. They are built during semantic analysis.
- Symbol table entries associate names with attributes like kind, type, and scope. Symbol tables are hierarchical, mirroring the scope structure.
- Type checking verifies that variables are used according to their declared type. The type checker uses the symbol table to check expressions.
- Semantic rules ensure identifiers are only used within scope and not redeclared within the same scope. The symbol table resolves name collisions.

Uploaded by TAYYAB ANSARI

Prof Anil Kumar

Symbol Tables and Type Checking


Symbol table
• Symbol table is an important data structure created and maintained by compilers in order to store
information about the occurrence of various entities such as variable names, function names, objects,
classes, interfaces, etc.
• Symbol table is used by both the analysis and the synthesis parts of a compiler.
• A symbol table may serve the following purposes depending upon the language in hand:
• To store the names of all entities in a structured form at one place.
• To verify if a variable has been declared.
• To implement type checking, by verifying assignments and expressions in the source code are semantically
correct.
• To determine the scope of a name (scope resolution).
• When identifiers are found, they will be entered into a symbol table, which will hold all relevant
information about identifiers.
• This information will be used later by the semantic analyzer and the code generator.

[Figure: the Lexical Analyzer, Syntax Analyzer, Semantic Analyzer, and Code Generator all read from and write to a shared Symbol Table.]

Symbol Table Entries


• We will store the following information about identifiers.
• The name (as a string).
• The data type.
• The block level.
• Its scope (global, local, or parameter).
• Its offset from the base pointer (for local variables and parameters only).
• This information is stored in an object called an IdEntry.
• This information may not all be known at once.

• A symbol table is simply a table which can be either linear or a hash table.
• It maintains an entry for each name in the following format:
• <symbol name, type, attribute>
• For example: static int interest;
• then it should store the entry such as: <interest, int, static>
• The attribute clause contains the entries related to the name.
• Each entry in the symbol table contains:
• The name of an identifier
• Additional information: its kind, its type, whether it is a constant.
• 1. In a compiler: a data structure used by the compiler to keep track of identifiers used in the source
program.
• This is a compile-time data structure. Not used at run time.
• 2. In object files: a symbol table (mapping variable names to addresses) can be built into object programs, to be used during linking of different object programs to resolve references.
• 3. In executables: a symbol table (again mapping names to addresses) can be included in executables, to recover variable names during debugging.
• The symbol table is mostly constructed during semantic analysis; lexical analysis can only record that a given token is an identifier.
• It is only after semantic analysis that the compiler knows the context of the identifier.
Variable Declarations
• In statically-typed programming languages, variables need to be declared before they are used.
• The declaration provides the data type of the variable.
• E.g., int a; float b; string c;
• Most typically, a declaration is valid for the scope in which it occurs:
• Function Scoping: a variable is visible anywhere in the function in which it is declared, after the point of declaration
• Block Scoping: a variable is only valid within the block of code in which it is declared, e.g.,

prog xxx {int a; float b}
{ int c;
{ int b;
c = a + b;
}
return float(c) / b
}
Scope Information
• Scope information characterizes the declaration of identifiers and the portions of the program where use of each identifier is allowed
• Example identifiers: variables, functions, objects, labels
• A lexical scope is a textual region in the program:
• Statement block
• Formal argument list
• Object body
• Function or method body
• Module body
• Whole program (multiple modules)
• Scope of an identifier: the lexical scope in which it is valid
Symbol Table Entry
• Associates a name with a set of attributes.
• For example:
• Kinds of names (variable, type, function….)
• Type (int, float, char…….)
• Nesting level
• Size
• Memory location (i.e. where it will be found at runtime)


Semantic Rules for Scopes


• Main rules regarding scopes:
• Rule 1: Use an identifier only if it is defined in an enclosing scope
• Rule 2: Do not declare identifiers of the same kind with identical names more than once in the same scope
• Identifiers with the same name may be declared in identical or overlapping lexical scopes if they are of different kinds

• Semantic checks refer to properties of identifiers in the program: their scope or type
• We need an environment to store the information about identifiers, i.e., a symbol table
• Each entry in the symbol table contains:
• the name of an identifier
• additional information: its kind, its type, whether it is a constant, …

Scope Information
• How to represent scope information in the symbol table?
• Idea:
• There is a hierarchy of scopes in the program
• Use a similar hierarchy of symbol tables
• One symbol table for each scope
• Each symbol table contains the symbols declared in that lexical scope


Identifiers With Same Name


• The hierarchical structure of symbol tables automatically solves the problem of resolving name collisions (identifiers with the same name and overlapping scopes)
• To find the declaration of an identifier that is active at a program point:
• Start from the current scope
• Go up in the hierarchy until you find an identifier with the same name, or fail


Symbol Table Operations


• Three operations
• Create a new empty symbol table with a given parent table
• Insert a new identifier in a symbol table (or error)
• Lookup an identifier in a symbol table (or error)
• Cannot build symbol tables during lexical analysis
• hierarchy of scopes encoded in the syntax
• Build the symbol tables:
• While parsing, using the semantic actions
Array Implementation
• Simple implementation = array
• One entry per symbol
• Scan the array for lookup, compare name at each entry
• Disadvantage:
• table has fixed size
• need to know in advance the number of entries

Purpose of Symbol Table
• Keep track of the names declared in the program, i.e., names of:
• Variable
• Function
• Classes
• Types…..

Semantic Analysis: Types and Type Checking


Specification of a Simple Type Checker


• The type of each identifier must be declared before the identifier is used.
• The type checker is a translation scheme that synthesizes the type of each expression from the types of its sub-expressions.
• P → D ; E
• D → D ; D | id : T
• T → char | integer | array [num] of T | *T
• E → literal | num | id | E mod E | E [ E ]
• Base types: integer, char, type_error

Error Detection and Recovery


Errors in the program should be detected and reported by the parser. Whenever an error occurs, the parser can handle it and continue to parse the rest of the input. Although the parser is mostly responsible for checking for errors, errors may occur at various stages of the compilation process.
The process of locating errors and reporting them to users is called the error handling process.
Function of error handler:
 Detection
 Reporting
 Recovery


Compile Time Errors


These errors occur when we violate the syntax rules of the language. A compile-time error indicates
something that we need to fix before the code will compile. A compiler can easily detect these errors.

Lexical Phase Errors


Lexical errors are errors that your lexer throws when it is unable to continue. This means that there's no way to
recognize a lexeme as a valid token for your lexer. If you consider a lexer to be a finite state machine that accepts
valid input strings, errors are any input strings that do not result in that finite state machine reaching an accepting
state.
Lexical errors occur during the lexical analysis phase, in which the program is converted into a stream of
tokens. Identifiers are recognized by matching them against patterns.
A lexical error is a sequence of characters that does not match the pattern of any token; it is detected during the
lexical analysis phase.
Lexical phase error can be:
 Spelling errors.
 Exceeding the length limit of an identifier or numeric constant.
 The appearance of illegal characters.
 Replacement of a character with an incorrect character.
 Transposition of two characters.
Valid patterns can be:
[0-9]+ ==> NUMBER token
[a-zA-Z] ==> LETTERS token
anything else ==> error!
Error recovery for lexical errors:

Panic Mode Recovery


 In this method, successive characters from the input are removed one at a time until a designated set of
synchronizing tokens is found. Synchronizing tokens are delimiters such as ; or }
 The advantage is that it is easy to implement and guarantees not to go into an infinite loop
 The disadvantage is that a considerable amount of input is skipped without checking it for additional
errors

Syntactic Phase Errors


In computer science, a syntactic error is an error in the syntax of a sequence of characters or tokens intended to be
written in a specific programming language. This type of error is detected during the syntax analysis phase of
compilation.
Some syntax errors can be:
 Error in structure
 Unbalanced parenthesis
 Missing operators
A syntax error can occur when an invalid calculation is entered into a calculator. This can happen if you enter
multiple decimal points in one number or if you open brackets without closing them.
Example1- Missing closing braces
void printHelloNinja( String s )
{
// function - body

As you can see in the above code, the closing brace is missing, so it results in a syntactic error.
Example2- Missing Semicolon
x = a + b * c //missing semicolon

Example3- Errors in Expression


a = (b+c * (c+d); //missing closing parentheses
i = j * + c ; // missing argument between “*” and “+”
Error recovery for syntactic phase error:
1. Panic Mode Recovery
 In this method, successive tokens from the input are removed one at a time until a designated set of
synchronizing tokens is found. Synchronizing tokens are delimiters such as ; or }
 The advantage is that it’s easy to implement and guarantees not to go into an infinite loop
 The disadvantage is that a considerable amount of input is skipped without checking it for additional
errors
2. Statement Mode recovery
 In this method, when a parser encounters an error, it performs the necessary correction on the remaining
input so that the rest of the input statement allows the parser to parse ahead.
 The correction can be deletion of extra semicolons, replacing the comma with semicolons, or inserting a
missing semicolon.
 While performing correction, utmost care should be taken for not going in an infinite loop.
 A disadvantage is that it is difficult to handle situations where the actual error occurred before the
point of detection.
3. Error production
 If a user has knowledge of common errors that can be encountered then, these errors can be incorporated
by augmenting the grammar with error productions that generate erroneous constructs.
 If this is used then, during parsing appropriate error messages can be generated and parsing can be
continued.
 The disadvantage is that it’s difficult to maintain.
4. Global Correction
 The parser examines the whole program and tries to find out the closest match for it which is error-free.
 The closest-match program requires the fewest insertions, deletions, and changes of tokens to recover
from the erroneous input.
 Due to high time and space complexity, this method is not implemented practically.

Semantic Phase Errors
This type of error appears during the semantic analysis phase. These types of errors are detected during the
compilation process. Now, it is the phase where your defined identifiers are verified.
The majority of compile-time errors are scope and declaration errors, for example undeclared or multiply-declared
identifiers. Semantic errors can occur when an invalid variable or operator is used, or when operations are
performed in the incorrect order.
There can be different types of compilation errors depending on the program you’ve written.
Some examples of semantic errors are:
 Operands of incompatible types
 Variable not declared
 The failure to match the actual argument with the formal argument

Example 1: Use of an undeclared variable


#include <stdio.h>
int main()
{
    int a = 0, b = 7;
    sum = a + b; // error: 'sum' is undeclared
    return 0;
}

Error recovery for Semantic errors


 If the error “Undeclared Identifier” is encountered then, to recover from this a symbol table entry for
the corresponding identifier is made.
 If data types of two operands are incompatible then, automatic type conversion is done by the compiler.

Advantages:

Improved code quality: Error detection and recovery in a compiler can improve the overall quality of the code
produced. This is because errors can be identified early in the compilation process and addressed before they
become bigger issues.
Increased productivity: Error recovery can also increase productivity by allowing the compiler to continue
processing the code after an error is detected. This means that developers do not have to stop and fix every error
manually, saving time and effort.
Better user experience: Error recovery can also improve the user experience of software applications. When
errors are handled gracefully, users are less likely to become frustrated and are more likely to continue using the
application.
Better debugging: Error recovery in a compiler can help developers to identify and debug errors more
efficiently. By providing detailed error messages, the compiler can assist developers in pinpointing the source
of the error, saving time and effort.
Consistent error handling: Error recovery ensures that all errors are handled in a consistent manner, which
can help to maintain the quality and reliability of the software being developed.
Reduced maintenance costs: By detecting and addressing errors early in the development process, error
recovery can help to reduce maintenance costs associated with fixing errors in later stages of the software
development lifecycle.
Improved software performance: Error recovery can help to identify and address code that may cause
performance issues, such as memory leaks or inefficient algorithms. By improving the performance of the code,
the overall performance of the software can be improved as well.

Disadvantages:

Slower compilation time: Error detection and recovery can slow down the compilation process, especially if
the recovery mechanism is complex. This can be an issue in large software projects where the compilation time
can be a bottleneck.
Increased complexity: Error recovery can also increase the complexity of the compiler, making it harder to
maintain and debug. This can lead to additional development costs and longer development times.
Risk of silent errors: Error recovery can sometimes mask errors in the code, leading to silent errors that go
unnoticed. This can be particularly problematic if the error affects the behavior of the software application in
subtle ways.
Potential for incorrect recovery: If the error recovery mechanism is not implemented correctly, it can
potentially introduce new errors or cause the code to behave unexpectedly.
Dependency on the recovery mechanism: If developers rely too heavily on the error recovery mechanism,
they may become complacent and not thoroughly check their code for errors. This can lead to errors being
missed or not addressed properly.
Difficulty in diagnosing errors: Error recovery can make it more difficult to diagnose and debug errors since
the error message may not accurately reflect the root cause of the issue. This can make it harder to fix errors and
may lead to longer development times.
Compatibility issues: Error recovery mechanisms may not be compatible with certain programming languages
or platforms, leading to issues with portability and cross-platform development.

Code Optimization

Code optimization in the synthesis phase is a program transformation technique which tries to improve the
intermediate code by making it consume fewer resources (i.e. CPU, memory) so that faster-running machine
code will result. The compiler optimizing process should meet the following objectives:
 The optimization must be correct; it must not, in any way, change the meaning of the program.
 Optimization should increase the speed and performance of the program.
 The compilation time must be kept reasonable.
 The optimization process should not delay the overall compiling process.

Code optimization is a program modification strategy that endeavours to enhance the intermediate code, so a
program utilises the least potential memory, minimises its CPU time and offers high speed.

The principal sources of optimization in computer science and IT engineering are code optimization, algorithm
optimization, system optimization, network optimization, and data optimization. These sources help in improving
the performance, scalability, and reliability of software systems.


Loop Optimization is the process of increasing execution speed and reducing the overheads associated with
loops. It plays an important role in improving cache performance and making effective use of parallel
processing capabilities. Most execution time of a scientific program is spent on loops.
Loop Optimization Techniques
In the compiler, we have various loop optimization techniques, which are as follows:

1. Code Motion (Frequency Reduction)

In frequency reduction, the amount of code in the loop is decreased. A statement or expression, which can be
moved outside the loop body without affecting the semantics of the program, is moved outside the loop.
Example:
Before optimization:
while(i<100)
{
a = Sin(x)/Cos(x) + i;
i++;
}
After optimization:

t = Sin(x)/Cos(x);
while(i<100)
{
a = t + i;
i++;
}

2. Induction Variable Elimination


A variable whose value changes on every iteration of the loop is known as an induction variable. With each
iteration, its value gets incremented or decremented by some constant amount.
Example:
Before optimization:
B1
i:= i+1
x:= 3*i
y:= a[x]
if y< 15, goto B2
In the above example, i and x move in lock step: if i is incremented by 1, then x is incremented by 3. So, i and x are
induction variables.
After optimization:
B1
i:= i+1
x:= x+3
y:= a[x]
if y< 15, goto B2

3. Strength Reduction

Strength reduction deals with replacing expensive operations with cheaper ones like multiplication is costlier
than addition, so multiplication can be replaced by addition in the loop.
Example:
Before optimization:
while (x<10)
{
y := 3 * x+1;
a[y] := a[y]-2;
x := x+2;
}
After optimization:
t= 3 * x+1;
while (x<10)
{
y=t;
a[y]= a[y]-2;
x=x+2;
t=t+6;
}

4. Loop Invariant Method


In the loop invariant method, the expression with computation is avoided inside the loop. That computation is
performed outside the loop as computing the same expression each time was overhead to the system, and this
reduces computation overhead and hence optimizes the code.
Example:
Before optimization:
for (int i=0; i<10;i++)
t= i+(x/y);
...
end;
After optimization:
s = x/y;
for (int i=0; i<10;i++)
t= i+ s;
...
end;

5. Loop Unrolling

Loop unrolling is a loop transformation technique that helps to optimize the execution time of a program. We
basically reduce the number of iterations by replicating the loop body. Loop unrolling increases the program's
speed by eliminating loop-control and loop-test instructions.
Example:
Before optimization:

for (int i=0; i<5; i++)


printf("Pankaj\n");
After optimization:

printf("Pankaj\n");
printf("Pankaj\n");
printf("Pankaj\n");
printf("Pankaj\n");
printf("Pankaj\n");

6. Loop Jamming

Loop jamming is combining two or more loops into a single loop. It reduces the loop overhead, since the loop
control is executed once instead of once per loop.
Example:
Before optimization:

for(int i=0; i<5; i++)
a = i + 5;
for(int i=0; i<5; i++)
b = i + 10;
After optimization:

for(int i=0; i<5; i++)


{
a = i + 5;
b = i + 10;
}

7. Loop Fission

Loop fission improves the locality of reference, in loop fission a single loop is divided into multiple loops over
the same index range, and each divided loop contains a particular part of the original loop.
Example:
Before optimization:

for(x=0;x<10;x++)
{
a[x]=…
b[x]=…
}
After optimization:

for(x=0;x<10;x++)
a[x]=…
for(x=0;x<10;x++)
b[x]=…

8. Loop Interchange

In loop interchange, inner loops are exchanged with outer loops. This optimization technique also improves the
locality of reference.
Example:
Before optimization:

for(x=0;x<10;x++)
for(y=0;y<10;y++)
a[y][x]=…
After optimization:
for(y=0;y<10;y++)
for(x=0;x<10;x++)
a[y][x]=…

9. Loop Reversal

Loop reversal reverses the order of values that are assigned to the index variable. This helps in removing
dependencies.
Example:
Before optimization:

for(x=0;x<10;x++)
a[9-x]=…
After optimization:

for(x=9;x>=0;x--)
a[x]=…

10. Loop Splitting

Loop splitting simplifies a loop by dividing it into multiple loops which have the same body but iterate over
different index ranges. Loop splitting helps in reducing dependencies and hence makes the code more
optimized.
Example:
Before optimization:

for(x=0;x<10;x++)
if(x<5)
a[x]=…
else
b[x]=…
After optimization:

for(x=0;x<5;x++)
a[x]=…
for(;x<10;x++)
b[x]=…

11. Loop Peeling

Loop peeling is a special case of loop splitting, in which a problematic iteration is peeled off and performed
separately before entering the loop.
Before optimization:

for(x=0;x<10;x++)
if(x==0)
a[x]=…
else
b[x]=…
After optimization:

a[0]=…
for(x=1;x<10;x++)
b[x]=…

12. Unswitching

Unswitching moves a loop-invariant condition out from inside the loop; this is done by duplicating the loop and
placing a version of it inside each branch of the conditional.
Before optimization:

for(x=0;x<10;x++)
if(s>t)
a[x]=…
else
b[x]=…
After optimization:

if(s>t)
for(x=0;x<10;x++)
a[x]=…
else
for(x=0;x<10;x++)
b[x]=…

Basic Blocks and Flow Graphs


A basic block is a sequence of statements that always executes in order, one after the other.
The characteristics of basic blocks are-
 They do not contain any kind of jump statements in them.
 There is no possibility of branching or of halting in the middle.
 All the statements execute in the same order they appear.
 They do not lose the flow control of the program.

Example Of Basic Block-

Three Address Code for the expression a = b + c + d is-

T1 = b + c
T2 = T1 + d
a = T2

Here,
 All the statements execute in a sequence one after the other.
 Thus, they form a basic block.
Example Of Not A Basic Block-
Three Address Code for the expression If A<B then 1 else 0 is-

(1) if A < B goto (4)
(2) T = 0
(3) goto (5)
(4) T = 1

Here the jump statements break the sequential flow, so these statements do not form a single basic block.

Flow Graph
A flow graph is a directed graph with flow control information added to the basic blocks.
The basic blocks serve as nodes of the flow graph.
There is a directed edge from block B1 to block B2 if B2 appears immediately after B1 in the code, or if B1 ends in a jump whose target is B2.

Compute the basic blocks for the given three address statements-

(1) PROD = 0
(2) I = 1
(3) T2 = addr(A) – 4
(4) T4 = addr(B) – 4
(5) T1 = 4 x I
(6) T3 = T2[T1]
(7) T5 = T4[T1]
(8) T6 = T3 x T5
(9) PROD = PROD + T6
(10) I = I + 1
(11) IF I <=20 GOTO (5)

Solution-

We have-
 PROD = 0 is a leader, since the first statement of the code is a leader.
 T1 = 4 x I is a leader, since the target of a conditional goto statement is a leader.

Now, the given code can be partitioned into two basic blocks as-

B1 : statements (1) to (4)
B2 : statements (5) to (11)
Draw a flow graph for the three address statements given in above problem.
Solution-
Firstly, we compute the basic blocks (already done above).
Secondly, we assign the flow control information.

The required flow graph has basic blocks B1 and B2 as nodes, a directed edge B1 → B2 (B2 immediately follows B1), and a directed edge B2 → B2 (the conditional goto in statement (11) jumps back to the leader of B2).

DAG representation of basic blocks


DAG representation, or Directed Acyclic Graph representation, is used to represent the structure of basic blocks. A basic
block is a set of statements that execute one after another in sequence. The DAG displays the flow of values through the
block and is the basis for basic-block optimization algorithms.

DAG is a very useful data structure for implementing transformations on Basic Blocks.
DAGs are used for the following purposes-
 To determine the expressions which have been computed more than once (called common sub-
expressions).
 To determine the names whose computation has been done outside the block but used inside the block.
 To determine the statements of the block whose computed value can be made available outside the block.
 To simplify the list of Quadruples by not executing the assignment instructions x:=y unless they are
necessary and eliminating the common sub-expressions.
Following rules are used for the construction of DAGs-
Rule-01:
In a DAG,
 Interior nodes always represent the operators.
 Exterior nodes (leaf nodes) always represent the names, identifiers or constants.
Rule-02:
While constructing a DAG,
 A check is made to find if there exists any node with the same value.
 A new node is created only when there does not exist any node with the same value.
 This action helps in detecting the common sub-expressions and avoiding the re-computation of the same.
Rule-03:
The assignment instructions of the form x:=y are not performed unless they are necessary.
Consider the following expression and construct a DAG for it-
(a+b) x (a+b+c)

Solution-

Three Address Code for the given expression is-


T1 = a + b
T2 = T1 + c
T3 = T1 x T2
Now, the Directed Acyclic Graph has leaves a, b and c, an interior node + for T1 = a + b, an interior node + for T2 = T1 + c (with the T1 node as its left child), and an interior node x for T3 = T1 x T2; the T1 node is shared rather than duplicated.
A DAG is thus a compact abstract syntax tree that avoids duplication, with a smaller footprint as well.

Code generation

In computing, code generation is part of the process chain of a compiler and converts intermediate
representation of source code into a form (e.g., machine code) that can be readily executed by the
target system.

Code generator is used to produce the target code for three-address statements. It uses registers to store the
operands of the three address statement.

Example:

Consider the three address statement x := y + z. It can have the following sequence of codes:

MOV y, R0
ADD z, R0
MOV R0, x
Register and Address Descriptors
o A register descriptor keeps track of what is currently in each register. The register descriptors show
that all the registers are initially empty.
o An address descriptor is used to store the location where the current value of a name can be found at run
time.

ISSUES IN THE DESIGN OF A CODE GENERATOR:


The following issues arise during the code generation phase:
1) Input to code generator
2) Target program
3) Memory management
4) Instruction selection
5) Register allocation
6) Evaluation order
1. Input to code generator: The input to the code generator consists of the intermediate representation of the
source program produced by the front end, together with information in the symbol table to determine the run-time
addresses of the data objects denoted by the names in the intermediate representation.
Intermediate representation can be:
1) Linear representation such as postfix notation
2) Three address representation such as quadruples
3) Virtual machine representation such as stack machine code

4) Graphical representations such as syntax trees and DAGs.
Prior to code generation, the front end must have scanned, parsed and translated the program into intermediate
representation, along with the necessary type checking. Therefore, the input to code generation is assumed to be error-free.
2. Target program: The output of the code generator is the target program. The output may be:
a. Absolute machine language: It can be placed in a fixed memory location and can be executed immediately.
b. Relocatable machine language: It allows subprograms to be compiled separately.
c. Assembly language: Code generation is made easier.
3. Memory management: Names in the source program are mapped to addresses of data objects in run-time
memory by the front end and the code generator. It makes use of the symbol table: a name in a three-address
statement refers to a symbol-table entry for the name.
4. Instruction selection:
a. The instruction set of the target machine should be complete and uniform.
b. Instruction speeds and machine idioms are important factors when the efficiency of the target program is considered.
c. The quality of the generated code is determined by its speed and size.
5. Register allocation: Instructions involving register operands are shorter and faster than those involving
operands in memory. The use of registers is subdivided into two sub-problems: Register allocation, in which the set of
variables that will reside in registers at each point in the program is selected; and Register assignment, in which the
specific register in which a variable will reside is selected.

TARGET MACHINE:
Familiarity with the target machine and its instruction set is a prerequisite for designing a good code generator.
The target computer is a byte-addressable machine with 4 bytes to a word. It has n general-purpose registers, R0,
R1, . . . , Rn-1.
It has two-address instructions of the form:
op source, destination
where, op is an op-code, and source and destination are data fields.
It has the following op-codes :
MOV (move source to destination)
ADD (add source to destination)
SUB (subtract source from destination)
The source and destination of an instruction are specified by combining registers and memory locations with
address modes
A SIMPLE CODE GENERATOR:
A code generator generates target code for a sequence of three- address statements and effectively uses registers
to store operands of the statements.
For example, the three-address statement
a := b + c can have the following code sequences:
ADD Rj, Ri    Cost = 1   // if Ri contains b and Rj contains c (or)
ADD c, Ri     Cost = 2   // if c is in a memory location (or)
MOV c, Rj     Cost = 3   // move c from memory to Rj, then
ADD Rj, Ri               // add
Peephole Optimization
Peephole optimization is a type of code optimization performed on a small part of the code: a very small set of instructions in a segment of code.
The small set of instructions or small part of code on which peephole optimization is performed
is known as peephole or window.
It works on the principle of replacement: a part of the code is replaced by shorter and faster code without a change in output. Peephole optimization is a machine-dependent optimization.
Objectives of Peephole Optimization:

The objective of peephole optimization is as follows:
1. To improve performance
2. To reduce memory footprint
3. To reduce code size
Peephole Optimization Techniques
A. Redundant load and store elimination: In this technique, redundancy is eliminated.
Initial code:
y = x + 5;
i = y;
z = i;
w = z * 3;
Optimized code:
y = x + 5;
w = y * 3;

// The redundant variables i and z, whose values were just copied from one another, have been removed.
B. Constant folding: Expressions whose operands are all known constants are evaluated at compile time and replaced by their results, so the computation does not have to be repeated at runtime.
Initial code:
x = 2 * 3;
Optimized code:
x = 6;
C. Strength reduction: Expensive operators are replaced by operators that consume less execution time.
Initial code:
y = x * 2;
Optimized code:
y = x + x; or y = x << 1;
Initial code:
y = x / 2;
Optimized code:
y = x >> 1;
D. Null sequences / simplify algebraic expressions: Useless operations are deleted.
a := a + 0;
a := a * 1;
a := a/1;
a := a - 0;
E. Combine operations: Several operations are replaced by a single equivalent operation.
F. Dead code elimination: Dead code refers to portions of the program that are never executed or do not affect the program's observable behavior. Eliminating dead code improves the efficiency and performance of the compiled program by reducing unnecessary computations and memory usage.
Initial code:
int Dead(void)
{
    int a = 10;
    int z = 50;
    int c;
    c = z * 5;
    printf("%d", c);   /* assumes <stdio.h> is included */
    a = 20;
    a = a * 10;        /* a is never used afterwards: these two lines are dead */
    return 0;
}
Optimized code:
int Dead(void)
{
    int a = 10;
    int z = 50;
    int c;
    c = z * 5;
    printf("%d", c);
    return 0;
}
Register Allocation and Assignment
Registers are the fastest locations in the memory hierarchy, but unfortunately this resource is limited: registers are among the most constrained resources of the target processor. Register allocation is an NP-complete problem. However, it can be reduced to graph coloring to achieve allocation and assignment, so a good register allocator computes an effective approximate solution to a hard problem.
The register allocator determines which values will reside in the register and which register will hold each of
those values. It takes as its input a program with an arbitrary number of registers and produces a program with a
finite register set that can fit into the target machine.
Allocation vs Assignment:
Allocation –
Maps an unlimited namespace onto the register set of the target machine.
• Reg.-to-Reg. model: Maps virtual registers to physical registers, but spills the excess to memory.
• Mem.-to-Mem. model: Maps some subset of the memory locations to a set of names that models the physical register set.
Allocation ensures that the code will fit the target machine's register set at each instruction.
Assignment –
Maps an allocated name set to the physical register set of the target machine.
• Assumes allocation has been done, so that the code will fit into the set of physical registers.
• No more than k values are assigned to registers, where k is the number of physical registers.