
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

AUTOMATA THEORY& COMPILER DESIGN


UNIT IV
INTERMEDIATE CODE GENERATION & CODE OPTIMIZATION

INTRODUCTION:

Intermediate code generation


In the analysis-synthesis model of a compiler, the front end translates a source program
into a machine-independent intermediate code, and the back end then uses this intermediate
code to generate the target code (which can be understood by the machine).
The benefits of using machine-independent intermediate code are:
 Portability is enhanced. For example, if a compiler translated the source language directly
into a particular machine language, with no option of generating intermediate code, then a
full native compiler would be required for each new machine, since the compiler itself would
have to be modified to suit each machine's specifications.
 Retargeting is facilitated.
 It is easier to improve the performance of the source program by optimizing the
intermediate code.

If we generate machine code directly from source code, then for n target machines we
need n optimizers and n code generators; but if we have a machine-independent
intermediate code, we need only one optimizer. Intermediate code can be either
language-specific (e.g., bytecode for Java) or language-independent (three-address code).

The following are commonly used intermediate code representations:


1. Postfix Notation:
 Also known as reverse Polish notation or suffix notation.
 In the infix notation, the operator is placed between operands, e.g., a + b. Postfix
notation positions the operator at the right end, as in ab +.
 For any postfix expressions e1 and e2 combined by a binary operator +, applying the operator
yields e1 e2 +.
 Postfix notation eliminates the need for parentheses, as the operator’s position and arity
allow unambiguous expression decoding.
 In postfix notation, the operator consistently follows its operands.

 Example 1: The postfix representation of the expression


(a + b) * c is : ab + c *
Example 2: The postfix representation of the expression
(a – b) * (c + d) + (a – b) is : ab – cd + *ab -+
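The conversion from infix to postfix can be sketched with the classic operator-stack (shunting-yard) method; this minimal version assumes single-character operands and the four left-associative binary operators:

```python
# A minimal shunting-yard sketch: convert an infix string of
# single-character operands into postfix (reverse Polish) notation.
PRECEDENCE = {'+': 1, '-': 1, '*': 2, '/': 2}

def to_postfix(expr):
    output, stack = [], []
    for tok in expr.replace(' ', ''):
        if tok.isalnum():                  # operand goes straight to output
            output.append(tok)
        elif tok == '(':
            stack.append(tok)
        elif tok == ')':
            while stack[-1] != '(':        # flush operators back to the '('
                output.append(stack.pop())
            stack.pop()                    # discard the '(' itself
        else:                              # binary operator
            while (stack and stack[-1] != '('
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
    while stack:                           # flush remaining operators
        output.append(stack.pop())
    return ''.join(output)
```

Running it on the two examples above reproduces ab+c* and ab-cd+*ab-+.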
2. Three-Address Code:
 A three address statement involves a maximum of three references, consisting of two
for operands and one for the result.
 A sequence of three address statements collectively forms a three address code.
 The typical form of a three address statement is expressed as x = y op z, where x, y,
and z represent memory addresses.
 Each variable (x, y, z) in a three address statement is associated with a specific memory
location.
 While a standard three address statement includes three references, there are instances
where a statement may contain fewer than three references, yet it is still categorized as
a three address statement.
 Example: The three address code for the expression a + b * c + d :
T1 = b * c
T2 = a + T1
T3 = T2 + d
T1, T2, T3 are temporary variables generated by the compiler.
There are 3 ways to represent a Three-Address Code in compiler design:
i) Quadruples
ii) Triples
iii) Indirect Triples
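The translation above can be sketched as a toy emitter that allocates the temporaries T1, T2, T3 in the order the compiler chooses to evaluate a + b * c + d:

```python
# A toy three-address-code emitter for a + b * c + d:
# each operation gets a fresh compiler temporary.
code = []
temp_count = 0

def emit(op, arg1, arg2):
    """Emit one three address statement and return its result temporary."""
    global temp_count
    temp_count += 1
    t = f"T{temp_count}"
    code.append(f"{t} = {arg1} {op} {arg2}")
    return t

# * binds tighter than +, so b * c is evaluated first:
t1 = emit('*', 'b', 'c')    # T1 = b * c
t2 = emit('+', 'a', t1)     # T2 = a + T1
t3 = emit('+', t2, 'd')     # T3 = T2 + d
print("\n".join(code))
```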
3. Syntax Tree:
 A syntax tree serves as a condensed representation of a parse tree.
 The operator and keyword nodes of the parse tree are moved into their respective parent
nodes in the syntax tree, so that the internal nodes are operators and the child nodes are
operands.

Automata Theory & Compiler Design Mr. P.Krishnamoorthy Page 2



 Creating a syntax tree involves strategically placing parentheses within the expression.
This technique contributes to a more intuitive representation, making it easier to discern
the sequence in which operands should be processed.
 The syntax tree not only condenses the parse tree but also offers an improved visual
representation of the program’s syntactic structure.
Example: x = (a + b * c) / (a – b * c)

Advantages of Intermediate Code Generation:

 Easier to implement: Intermediate code generation can simplify the code generation
process by reducing the complexity of the input code, making it easier to implement.
 Facilitates code optimization: Intermediate code generation can enable the use of
various code optimization techniques, leading to improved performance and efficiency
of the generated code.
 Platform independence: Intermediate code is platform-independent, meaning that it
can be translated into machine code or bytecode for any platform.
 Code reuse: Intermediate code can be reused in the future to generate code for other
platforms or languages.
 Easier debugging: Intermediate code can be easier to debug than machine code or
bytecode, as it is closer to the original source code.


Disadvantages of Intermediate Code Generation:


 Increased compilation time: Intermediate code generation can significantly increase
the compilation time, making it less suitable for real-time or time-critical applications.
 Additional memory usage: Intermediate code generation requires additional memory
to store the intermediate representation, which can be a concern for memory-limited
systems.
 Increased complexity: Intermediate code generation can increase the complexity of the
compiler design, making it harder to implement and maintain.
 Reduced performance: The process of generating intermediate code can result in code
that executes slower than code generated directly from the source code.
Three address code in Compiler
Three address code is a type of intermediate code which is easy to generate and can be easily
converted to machine code. It uses at most three addresses and one operator to represent an
expression, and the value computed at each instruction is stored in a temporary variable
generated by the compiler. The compiler decides the order of operations specified by the three
address code.
How Three Address Code is Used in Compiler Applications
1. Optimization: Three address code is often used as an intermediate representation of code
during optimization phases of the compilation process. The three address code allows the
compiler to analyze the code and perform optimizations that can improve the performance
of the generated code.

2. Code generation: Three address code can also be used as an intermediate representation of
code during the code generation phase of the compilation process. The three address code
allows the compiler to generate code that is specific to the target platform, while also
ensuring that the generated code is correct and efficient.

3. Debugging: Three address code can be helpful in debugging the code generated by the
compiler. Since three address code is closer to the source program than the final generated
code, it is often easier to read and understand. Developers can use the three address code to
trace the execution of the program and identify errors or issues that may be present.


4. Language translation: Three address code can also be used to translate code from one
programming language to another. By translating code to a common intermediate
representation, it becomes easier to translate the code to multiple target languages.

General Representation
a = b op c
where a, b and c represent operands such as names, constants or compiler-generated
temporaries, and op represents the operator.

Example-1: Convert the expression a * – (b + c) into three address code.

t1 = b + c
t2 = uminus t1
t3 = a * t2

Example-2: Write three address code for following code


for(i = 1; i<=10; i++)
{
a[i] = x * 5;
}
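A plausible answer (the notes' figure is not reproduced here) is the following three address code; the label names L1/L2 and the 4-byte element width used for indexing a[i] are illustrative assumptions:

```python
# Hedged sketch of three address code for
#   for(i = 1; i <= 10; i++) { a[i] = x * 5; }
# Labels L1/L2 and the 4-byte element width are illustrative assumptions.
loop_code = [
    "i = 1",
    "L1: if i > 10 goto L2",   # exit test (inverted loop condition)
    "t1 = x * 5",              # right-hand side
    "t2 = i * 4",              # byte offset of a[i], assuming width 4
    "a[t2] = t1",              # indexed store
    "i = i + 1",               # i++
    "goto L1",
    "L2:",                     # statement following the loop
]
print("\n".join(loop_code))
```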


Implementation of Three Address Code


There are 3 representations of three address code namely
1. Quadruple
2. Triples
3. Indirect Triples
Quadruple – It is a structure which consists of 4 fields, namely op, arg1, arg2 and result. op
denotes the operator, arg1 and arg2 denote the two operands, and result is used to store the
result of the expression.
Advantage –
 Easy to rearrange code for global optimization.
 One can quickly access the value of temporary variables using the symbol table.
Disadvantage –
 Contains a lot of temporaries.
 Temporary variable creation increases time and space complexity.
Example – Consider expression a = b * – c + b * – c. The three address code is:
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5
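The six statements above fit directly into the quadruple format; a minimal sketch, with each row as an (op, arg1, arg2, result) tuple:

```python
# The three address code for a = b * -c + b * -c as quadruples
# (op, arg1, arg2, result); uminus is unary, so arg2 is left empty.
quads = [
    ("uminus", "c",  "",   "t1"),
    ("*",      "b",  "t1", "t2"),
    ("uminus", "c",  "",   "t3"),
    ("*",      "b",  "t3", "t4"),
    ("+",      "t2", "t4", "t5"),
    ("=",      "t5", "",   "a"),
]
for op, a1, a2, res in quads:
    print(f"{op:<7} {a1:<3} {a2:<3} {res}")
```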

Triples – This representation does not use an extra temporary variable to represent a single
operation; instead, when a reference to another triple’s value is needed, a pointer to that
triple is used. So it consists of only three fields, namely op, arg1 and arg2.


Disadvantage –
 Temporaries are implicit, which makes it difficult to rearrange code.
 It is difficult to optimize, because optimization involves moving intermediate code: when a
triple is moved, every other triple referring to it must be updated as well.
Example – Consider expression a = b * – c + b * – c

Indirect Triples – This representation uses a separately stored list of pointers to the triples.
It is similar in utility to the quadruple representation but requires less space. Temporaries
are implicit, and it is easier to rearrange code, since only the pointer list needs to be
reordered.
Example – Consider expression a = b * – c + b * – c
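A sketch of both representations for this expression: the triple table uses integer references in place of temporaries, and the indirect-triple form adds a separate statement list of pointers into that table:

```python
# Triples for a = b * -c + b * -c: no result field; an integer
# argument is a reference to the triple at that position.
triples = [
    ("uminus", "c", None),  # (0)
    ("*",      "b", 0),     # (1)  b * (0)
    ("uminus", "c", None),  # (2)
    ("*",      "b", 2),     # (3)  b * (2)
    ("+",      1,   3),     # (4)  (1) + (3)
    ("=",      "a", 4),     # (5)  a = (4)
]
# Indirect triples keep a separate statement list pointing into the
# triple table; reordering the code only permutes this list.
statements = [0, 1, 2, 3, 4, 5]
for pos in statements:
    print(pos, triples[pos])
```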


Write quadruple, triples and indirect triples for following expression :


(x + y) * (y + z) + (x + y + z)

Explanation – For a loop of the form for(i = 1; i <= 10; i++) { x = x + 2; }, the three address code is:


(1) i = 1
(2) if i <= 10 goto (4)
(3) goto next
(4) t1 = x + 2
(5) x = t1
(6) t2 = i + 1
(7) i = t2
(8) goto (2)
next: // next line after (8)th statement


ABSTRACT SYNTAX TREE (AST)


An abstract syntax tree (AST) is a data structure used in computer science to represent the
structure of a program or code snippet. It is a tree representation of the abstract
syntactic structure of text (often source code) written in a formal language. Each node of the tree
denotes a construct occurring in the text. It is sometimes called just a syntax tree.
The syntax is "abstract" in the sense that it does not represent every detail appearing in the real
syntax, but rather just the structural or content-related details. For instance,
grouping parentheses are implicit in the tree structure, so these do not have to be represented as
separate nodes. Likewise, a syntactic construct like an if-condition-then statement may be
denoted by means of a single node with three branches.

This distinguishes abstract syntax trees from concrete syntax trees, traditionally designated parse
trees. Parse trees are typically built by a parser during the source code translation
and compiling process. Once built, additional information is added to the AST by means of
subsequent processing, e.g., contextual analysis.

Example:

while b ≠ 0:
    if a > b:
        a := a - b
    else:
        b := b - a
return a
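As a sketch, the snippet's AST could be encoded as nested tuples, one per node; the tag names used here (while, if, assign, sub, ...) are illustrative, not a fixed standard. A tiny walker over the tree confirms that the program computes gcd(a, b) by repeated subtraction:

```python
# Nested-tuple encoding of the AST for the while/if snippet above;
# the node tags are illustrative assumptions.
ast = ('while', ('ne', 'b', 0),
          ('if', ('gt', 'a', 'b'),
              ('assign', 'a', ('sub', 'a', 'b')),
              ('assign', 'b', ('sub', 'b', 'a'))))

def run(a, b):
    """Walk the tree directly; it computes gcd(a, b) by subtraction."""
    env = {'a': a, 'b': b}
    _, (_, var, const), body = ast                 # ('ne', 'b', 0)
    while env[var] != const:
        _, (_, l, r), then_branch, else_branch = body
        branch = then_branch if env[l] > env[r] else else_branch
        _, target, (_, x, y) = branch              # ('assign', t, ('sub', x, y))
        env[target] = env[x] - env[y]
    return env['a']

print(run(48, 18))   # gcd(48, 18) = 6
```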


AST Node Labels


Each node has some of the following:
 The kind of statement (e.g., BLOCK, WHILE)
 The test condition (e.g., NEXT_IS_EMPTY, TRUE)
 The call of an instruction (e.g., infect, move), realizing that this may be an instruction
defined elsewhere in the program.
Example:


Syntax Directed Translation


The parser uses a CFG (context-free grammar) to validate the input string and produce output
for the next phase of the compiler. The output can be either a parse tree or an abstract syntax
tree. To interleave semantic analysis with the syntax analysis phase of the compiler, we use
Syntax Directed Translation.
With both syntax-directed definition and translation schemes, we parse the input token
stream, build the parse tree, and then traverse the tree as needed to evaluate the semantic rules at the
parse tree nodes. Evaluation of the semantic rules may generate code, save information in a symbol
table, issue error messages, or perform any other activities. The translation of the token stream is the
result obtained by evaluating the semantic rules.
In syntax directed translation, we associate some informal notations with the grammar, and
these notations are called semantic rules.
So we can say that
Grammar + semantic rule = SDT (syntax directed translation)

Syntax Directed Translation has augmented rules to the grammar that facilitate semantic
analysis. SDT involves passing information bottom-up and/or top-down to the parse tree in form of
attributes attached to the nodes. Syntax-directed translation rules use 1) lexical values of nodes,
2) constants & 3) attributes associated with the non-terminals in their definitions.


The general approach to Syntax-Directed Translation is to construct a parse tree or syntax
tree and compute the values of attributes at the nodes of the tree by visiting them in some
order. In many cases, translation can be done during parsing without building an explicit tree.

In syntax directed translation, every non-terminal can have zero, one, or more attributes,
depending on the type of the attribute. The values of these attributes are evaluated by the
semantic rules associated with the production rule.

In the semantic rules below, the attribute is VAL; in general, an attribute may hold anything:
a string, a number, a memory location or a complex record. In syntax directed translation,
whenever a construct is encountered in the programming language, it is translated according to
the semantic rules defined for it in that particular programming language.
Example
E -> E+T | T
T -> T*F | F
F -> INTLIT
This is a grammar to syntactically validate an expression having additions and
multiplications in it. Now, to carry out semantic analysis we will augment SDT rules to this
grammar, in order to pass some information up the parse tree and check for semantic errors.

E -> E+T { E.val = E.val + T.val } PR#1
E -> T { E.val = T.val } PR#2
T -> T*F { T.val = T.val * F.val } PR#3
T -> F { T.val = F.val } PR#4
F -> INTLIT { F.val = INTLIT.lexval } PR#5

E.val is one of the attributes of E.
INTLIT.lexval is the attribute returned by the lexical analyzer.

To understand the translation rules further, take the first SDT rule, augmented to the
[ E -> E+T ] production rule. This translation rule has val as an attribute for both
non-terminals, E and T. The right-hand side of the translation rule corresponds to the
attribute values of the right-side nodes of the production rule, and vice versa.


Generalizing, an SDT consists of rules augmenting a CFG that associate 1) a set of attributes
with every node of the grammar and 2) a set of translation rules with every production rule,
using attributes, constants, and lexical values.
Let’s take a string to see how semantic analysis happens – S = 2+3*4. Parse tree
corresponding to S would be

To evaluate translation rules, we can employ one depth-first search traversal on the parse
tree. For better understanding, we will move bottom-up in the left to right fashion for computing the
translation rules of our example.
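The bottom-up evaluation for S = 2+3*4 can be written out as straight-line assignments, one per reduction, with the semantic rule PR#1–PR#5 that fires noted on each line:

```python
# Each assignment mirrors one bottom-up reduction and its semantic rule.
F_val_1 = 2                  # F -> INTLIT  (F.val = INTLIT.lexval)  PR#5
T_val_1 = F_val_1            # T -> F       (T.val = F.val)          PR#4
E_val_1 = T_val_1            # E -> T       (E.val = T.val)          PR#2
F_val_2 = 3                  # F -> INTLIT                           PR#5
T_val_2 = F_val_2            # T -> F                                PR#4
F_val_3 = 4                  # F -> INTLIT                           PR#5
T_val_3 = T_val_2 * F_val_3  # T -> T*F     (T.val = T.val * F.val)  PR#3
E_val_2 = E_val_1 + T_val_3  # E -> E+T     (E.val = E.val + T.val)  PR#1
print(E_val_2)               # value of the whole expression: 14
```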


The above diagram shows how semantic analysis could happen. Information flows bottom-up, and
all the children’s attributes are computed before their parents’.
Types of attributes – Attributes may be of two types – synthesized or inherited.
1. Synthesized attributes – A Synthesized attribute is an attribute of the non-terminal on the
left-hand side of a production. Synthesized attributes represent information that is being
passed up the parse tree. The attribute can take value only from its children (Variables in
the RHS of the production). The non-terminal concerned must be in the head (LHS) of
production. For example, if A -> BC is a production of a grammar and A’s attribute depends on
B’s attributes or C’s attributes, then it is a synthesized attribute.
2. Inherited attributes – An attribute of a nonterminal on the right-hand side of a production
is called an inherited attribute. The attribute can take value either from its parent or from its
siblings (variables in the LHS or RHS of the production). The non-terminal concerned must
be in the body (RHS) of the production. For example, if A -> BC is a production of a grammar
and B’s attribute depends on A’s attributes or C’s attributes, then it is an inherited
attribute, because A is B’s parent here, and C is its sibling.
S – attributed and L – attributed SDTs in Syntax directed translation
1. S-attributed SDT :
 If an SDT uses only synthesized attributes, it is called an S-attributed SDT.
 S-attributed SDTs are evaluated in bottom-up parsing, as the values of the parent nodes
depend upon the values of the child nodes.
 Semantic actions are placed at the rightmost place of the RHS.
2. L-attributed SDT:
 If an SDT uses both synthesized attributes and inherited attributes, with the restriction
that an inherited attribute can inherit values from the parent and left siblings only, it is
called an L-attributed SDT.
 Attributes in L-attributed SDTs are evaluated in a depth-first, left-to-right manner.
 Semantic actions may be placed anywhere in the RHS.
 Example: S -> ABC. Here, attribute B can obtain its value only from its parent S or its left
sibling A, but not from its right sibling C. The same goes for A and C: A can get its value
only from its parent, while C can get its value from S, A, and B, because C is the rightmost
symbol in the given production.


Annotated Parse Tree – The parse tree containing the values of attributes at each node for
given input string is called annotated or decorated parse tree.

Types of attributes – There are two types of attributes:


1. Synthesized Attributes – These are those attributes which derive their values from their
children nodes i.e. value of synthesized attribute at node is computed from the values of
attributes at children nodes in parse tree.
Example:
E --> E1 + T { E.val = E1.val + T.val}
Here, E.val derives its value from E1.val and T.val.
Computation of Synthesized Attributes –
 Write the SDD using appropriate semantic rules for each production in given grammar.

 The annotated parse tree is generated and attribute values are computed in a bottom-up manner.

 The value obtained at root node is the final output.

Example: Consider the following grammar


S --> E
E --> E1 + T
E --> T
T --> T1 * F
T --> F
F --> digit
The SDD for the above grammar can be written as follows.


Let us assume an input string 4 * 5 + 6 for computing synthesized attributes. The annotated
parse tree for the input string is

For the computation of attributes we start from the leftmost bottom node. The rule F --> digit
is used to reduce digit to F, and the value of digit is obtained from the lexical analyzer,
which becomes the value of F through the semantic action F.val = digit.lexval. Hence F.val = 4,
and since T is the parent node of F, we get T.val = 4 from the semantic action T.val = F.val.
Then, for the production T --> T1 * F, the corresponding semantic action is
T.val = T1.val * F.val. Hence, T.val = 4 * 5 = 20.
Similarly, E1.val + T.val becomes E.val, i.e., E.val = E1.val + T.val = 20 + 6 = 26. Then the
production S --> E is applied, and the semantic action associated with it prints the result
E.val. Hence, the output will be 26.


2. Inherited Attributes – These are the attributes which derive their values from their parent
or sibling nodes i.e. value of inherited attributes are computed by value of parent or sibling
nodes.
Example:
A --> BCD { C.in = A.in, C.type = B.type }

Computation of Inherited Attributes –


 Construct the SDD using semantic actions.
 The annotated parse tree is generated and attribute values are computed in a top-down
manner.
Example: Consider the following grammar
S --> T L
T --> int
T --> float
T --> double
L --> L1, id
L --> id

The SDD for the above grammar can be written as follows.

Let us assume an input string int a, c for computing inherited attributes. The annotated parse
tree for the input string is


The value of the L nodes is obtained from T.type (their sibling), which is the lexical value
obtained as int, float or double. The L node then gives the type of the identifiers a and c.
The computation of the type is done in a top-down manner (preorder traversal). Using the
function Enter_type, the type of the identifiers a and c is inserted in the symbol table at the
corresponding id.entry.
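This top-down computation can be sketched directly; here a plain dictionary stands in for the symbol table and enter_type for the insertion routine Enter_type:

```python
# Symbol table as a plain dict; enter_type inserts id -> type.
symbol_table = {}

def enter_type(ident, typ):
    symbol_table[ident] = typ

# Preorder evaluation of the inherited attributes for "int a, c":
T_type = "int"        # T -> int           (T.type = 'int', synthesized)
L_in = T_type         # S -> T L           { L.in = T.type }  inherited by L
# L -> L1 , id : L1 inherits L.in, and the id at this node is entered
L1_in = L_in
enter_type("c", L_in)
# L1 -> id : the remaining identifier is entered with the same type
enter_type("a", L1_in)
print(symbol_table)
```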

To see how the semantic rules are used, consider the annotated parse tree for 3 * 5. The
leftmost leaf in the parse tree, labeled digit, has attribute value lexval = 3, where the 3 is
supplied by the lexical analyzer. Its parent corresponds to production 4, F -> digit. The only
semantic rule associated with this production defines F.val = digit.lexval, which equals 3.


At the second child of the root, the inherited attribute T’.inh is defined by the semantic rule
T’.inh = F.val associated with production 1. Thus, the left operand, 3, for the * operator is passed
from left to right across the children of the root.

The production at the node for T' is T' -> * F T1'. The inherited attribute T1’.inh is defined
by the semantic rule T1’.inh = T’.inh × F.val associated with production 2.

With T’.inh = 3 and F.val = 5, we get T1’.inh = 15. At the lower node for T', the production is
T’ -> ε. The semantic rule T’.syn = T’.inh defines T'.syn = 15. The syn attributes at the nodes
for T’ pass the value 15 up the tree to the node for T, where T.val = 15.

Advantages of Syntax Directed Translation:

Ease of implementation: SDT is a simple and easy-to-implement method for translating a
programming language. It provides a clear and structured way to specify translation rules
using grammar rules.
Separation of concerns: SDT separates the translation process from the parsing process,
making it easier to modify and maintain the compiler. It also separates the translation concerns
from the parsing concerns, allowing for more modular and extensible compiler designs.
Efficient code generation: SDT enables the generation of efficient code by optimizing the
translation process. It allows for the use of techniques such as intermediate code generation
and code optimization.


Disadvantages of Syntax Directed Translation:

Limited expressiveness: SDT has limited expressiveness in comparison to other translation
methods, such as attribute grammars. This limits the types of translations that can be
performed using SDT.
Inflexibility: SDT can be inflexible in situations where the translation rules are complex and
cannot be easily expressed using grammar rules.

Limited error recovery: SDT is limited in its ability to recover from errors during the
translation process. This can result in poor error messages and may make it difficult to locate
and fix errors in the input program.

Translation of Simple Statements And Control Flow Statements

Control statements are the statements that change the flow of execution of statements.

Consider the Grammar

S → if E then S1

|if E then S1 else S2

|while E do S1

In this grammar, E is the Boolean expression depending upon which S1 or S2 will be executed.

Following representation shows the order of execution of an instruction of if-then, if then-else, &
while do.

1. S → if E then S1

E.CODE and S.CODE are sequences of statements which generate three address code.
E.TRUE is the label to which control flows if E is true.
E.FALSE is the label to which control flows if E is false.


The code for E generates a jump to E.TRUE if E is true and a jump to S.NEXT if E is false.

∴ E.FALSE=S.NEXT in the following table.

In the following table, a new label is allocated to E.TRUE.

After S1.CODE is executed, control jumps to the statement following S, i.e., to S1.NEXT.

∴ S1. NEXT = S. NEXT.

Syntax Directed Translation for "If E then S1."

Production Semantic Rule

S → if E then S1

E.TRUE = newlabel;
E.FALSE = S.NEXT;
S1.NEXT = S.NEXT;
S.CODE = E.CODE || GEN(E.TRUE ':') || S1.CODE

2. S → if E then S1 else S2

If E is true, control goes to E.TRUE, i.e., S1.CODE is executed, and after that control jumps
to S.NEXT, which appears after S1.CODE.

If E is false, then S2.CODE will be executed.

Initially, both E.TRUE and E.FALSE are taken as new labels. When S1.CODE at label E.TRUE is
executed, control will jump to S.NEXT.

Therefore, after S1, control will jump to the next statement of complete statement S.

S1.NEXT=S.NEXT

Similarly, after S2.CODE, the next statement of S will be executed.

∴ S2.NEXT=S.NEXT


Syntax Directed Translation for "If E then S1 else S2."

Production Semantic Rule

S → if E then S1 else S2

E.TRUE = newlabel;
E.FALSE = newlabel;
S1.NEXT = S.NEXT;
S2.NEXT = S.NEXT;
S.CODE = E.CODE || GEN(E.TRUE ':') || S1.CODE || GEN('goto' S.NEXT) || GEN(E.FALSE ':') || S2.CODE

3. S → while E do S1
Another important control statement is while E do S1, i.e., statement S1 will be executed as
long as expression E is true. Control leaves the loop when the expression E becomes false.


A label S.BEGIN is created which points to the first instruction for E. The label E.TRUE is
attached to the first instruction for S1. If E is true, control jumps to E.TRUE and S1.CODE is
executed. If E is false, control jumps to E.FALSE. After S1.CODE, control jumps back to
S.BEGIN, which again checks E.CODE for true or false.

∴ S1.NEXT = S.BEGIN

If E.CODE is false, control will jump to E.FALSE, which causes the next statement after S to be
executed.

∴ E.FALSE = S.NEXT

Syntax Directed Translation for "S → while E do S1"

Production Semantic Rule

S → while E do S1

S.BEGIN = newlabel;
E.TRUE = newlabel;
E.FALSE = S.NEXT;
S1.NEXT = S.BEGIN;
S.CODE = GEN(S.BEGIN ':') || E.CODE || GEN(E.TRUE ':') || S1.CODE || GEN('goto' S.BEGIN)
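The while-do rule can be sketched as a small code generator; the concrete condition, loop body, and S.NEXT label used below are illustrative assumptions:

```python
label_count = 0

def newlabel():
    global label_count
    label_count += 1
    return f"L{label_count}"

def gen_while(e_code, s1_code, s_next):
    """S.CODE = GEN(S.BEGIN ':') || E.CODE || GEN(E.TRUE ':') ||
       S1.CODE || GEN('goto' S.BEGIN), with E.FALSE = S.NEXT."""
    s_begin, e_true = newlabel(), newlabel()
    e_false = s_next                    # E.FALSE = S.NEXT
    return ([f"{s_begin}:"]
            + e_code(e_true, e_false)   # E.CODE jumps to E.TRUE / E.FALSE
            + [f"{e_true}:"]
            + s1_code
            + [f"goto {s_begin}"])      # S1.NEXT = S.BEGIN

# Illustrative use for "while a < b do a = a + 1", with S.NEXT = "Lnext":
cond = lambda t, f: [f"if a < b goto {t}", f"goto {f}"]
while_code = gen_while(cond, ["a = a + 1"], "Lnext")
print("\n".join(while_code))
```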

Boolean expressions
Boolean expressions have two primary purposes. They are used for computing logical values.
They are also used as conditional expressions in if-then-else and while-do statements.

Consider the grammar


E → E OR E
E → E AND E
E → NOT E
E → (E)
E → id relop id
E → TRUE
E → FALSE
Here relop denotes a relational operator: <, <=, >, >=, =, or !=.

AND and OR are left-associative. NOT has the highest precedence, followed by AND, and lastly OR.


Production rule: E → E1 OR E2
Semantic actions:
E1.true := E.true
E1.false := newlabel
E2.true := E.true
E2.false := E.false
E.code := E1.code || generate(E1.false ':') || E2.code

Production rule: E → E1 AND E2
Semantic actions:
E1.true := newlabel
E1.false := E.false
E2.true := E.true
E2.false := E.false
E.code := E1.code || generate(E1.true ':') || E2.code

Production rule: E → NOT E1
Semantic actions:
E1.true := E.false
E1.false := E.true
E.code := E1.code

Production rule: E → (E1)
Semantic actions:
E.code := E1.code

Production rule: E → id1 relop id2
Semantic actions:
E.code := generate('if' id1.place relop id2.place 'goto' E.true) || generate('goto' E.false)

Production rule: E → TRUE
Semantic actions:
E.code := generate('goto' E.true)

Production rule: E → FALSE
Semantic actions:
E.code := generate('goto' E.false)
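The E → E1 AND E2 rule can be sketched as a jumping-code generator; the condition a < b AND c < d and the target labels Ltrue/Lfalse are illustrative:

```python
label_count = 0

def newlabel():
    global label_count
    label_count += 1
    return f"L{label_count}"

def relop(id1, op, id2):
    """E -> id1 relop id2: one conditional jump plus one unconditional."""
    return lambda t, f: [f"if {id1} {op} {id2} goto {t}", f"goto {f}"]

def and_(e1, e2):
    """E -> E1 AND E2: E1.true is a fresh label, E1.false = E.false,
       and E.code = E1.code || gen(E1.true ':') || E2.code."""
    def code(t, f):
        e1_true = newlabel()
        return e1(e1_true, f) + [f"{e1_true}:"] + e2(t, f)
    return code

# Illustrative: jumping code for  a < b AND c < d  with targets Ltrue/Lfalse
expr = and_(relop("a", "<", "b"), relop("c", "<", "d"))
and_code = expr("Ltrue", "Lfalse")
print("\n".join(and_code))
```

Note the short-circuit behavior: if a < b fails, control jumps straight to Lfalse without evaluating c < d.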


CODE OPTIMIZATION:
Issues in the design of code optimization
Optimizing code is a crucial aspect of software development, ensuring that programs run efficiently,
consume fewer resources, and perform tasks quickly. Here are some common issues that developers
encounter when optimizing code:
1. Premature Optimization: Optimizing code before identifying performance bottlenecks can lead
to wasted effort and may even make the code more complex and harder to maintain.
2. Lack of Profiling: Without profiling tools, developers may optimize code in areas that don't
significantly impact performance. Profiling helps identify hotspots where optimizations will yield
the most benefit.
3. Inefficient Algorithms: Choosing the wrong algorithm or data structure can result in poor
performance. Understanding algorithm complexity and selecting appropriate algorithms is
essential.
4. Excessive Memory Usage: Allocating too much memory or using inefficient data structures can
lead to excessive memory usage, causing performance degradation, especially in memory-
constrained environments.
5. Inefficient Loops: Nested loops, unnecessary iterations, or redundant calculations within loops
can significantly impact performance, especially in large datasets.
6. Poor Resource Management: Not releasing resources properly, such as file handles, network
connections, or memory, can lead to resource leaks and degrade performance over time.
7. Ineffective Parallelization: In multi-threaded or distributed systems, improper synchronization,
excessive context switching, or inefficient use of parallel resources can hinder performance gains.
8. Unnecessary Function Calls: Calling functions excessively, especially in tight loops, can
introduce overhead. Inlining functions or reducing unnecessary function calls can improve
performance.
9. I/O Bottlenecks: Excessive disk I/O or network I/O operations can slow down performance.
Using buffered I/O, batching requests, or optimizing network communication can alleviate these
bottlenecks.
10. Platform-specific Issues: Code may perform differently on different platforms or hardware
architectures. Ensuring code is optimized for the target platform can improve performance.
11. Unnecessary Copying: Making unnecessary copies of data, especially large datasets, can lead to
performance overhead. Using references or pointers judiciously can reduce copying overhead.
12. Excessive Logging/Debugging: Excessive logging or debugging statements can impact
performance, especially in production environments. Using logging levels and conditional
logging can mitigate this issue.
13. Unnecessary Overhead: Code may contain unnecessary operations, such as redundant checks or
computations, which can be eliminated to improve performance.
14. Inefficient Compilation: Compilation flags, optimization settings, and build configurations can
impact the performance of the generated code. Ensuring proper compilation settings are used can
improve performance.
Addressing these issues requires a combination of careful analysis, profiling, algorithmic
optimization, and code refactoring. It's essential to balance optimization efforts with code
readability, maintainability, and the specific performance requirements of the application.

Automata Theory & Compiler Design Mr. P.Krishnamoorthy Page 25


Department of Computer Science and Engineering

Why Optimize?
Optimizing the underlying algorithm is beyond the scope of the code optimization phase; the phase improves the program as written, which may include reducing the size of the code. Optimization helps to:
 Reduce the space consumed by the code and increase the speed of execution.
 Automate a tedious task: just as manually analyzing datasets takes a lot of time and we instead use software like Tableau, manually performing optimization is tedious and is better done by a code optimizer.
 Promote re-usability, since optimized code is often cleaner.
Types of Code Optimization:
The optimization process can be broadly classified into two types:

1. Machine-Independent Optimization: This code optimization phase attempts to improve the intermediate code so that a better target code is produced. The part of the intermediate code that is transformed here does not involve any CPU registers or absolute memory locations.

2. Machine-Dependent Optimization: Machine-dependent optimization is done after the target code has been generated, when the code is transformed according to the target machine architecture. It involves CPU registers and may use absolute memory references rather than relative references. Machine-dependent optimizers try to take maximum advantage of the memory hierarchy.
Principal sources of optimization:
A transformation of a program is called local if it can be performed by looking only at the
statements in a basic block; otherwise, it is called global. Many transformations can be
performed at both the local and global levels. Local transformations are usually performed first.

Function-Preserving Transformations
There are a number of ways in which a compiler can improve a program without changing the function it computes. Examples of function-preserving transformations:

 Common sub-expression elimination
 Copy propagation
 Dead-code elimination
 Constant folding

Common Sub-expression Elimination:
An occurrence of an expression E is called a common sub-expression if E was previously computed, and the values of the variables in E have not changed since the previous computation. We can avoid recomputing the expression by reusing the previously computed value.


Example
a = 10;
b = a + 1 * 2;
c = a + 1 * 2;
// 'c' computes the same expression as 'b'
d = c + a;

After elimination

a = 10;
b = a + 1 * 2;
d = b + a;

Before elimination –

x = 11;
y = 11 * 24;
z = x * 24;
// 'z' computes the same value as 'y', since 'x' can be folded to 11.

After elimination (x and z, now unused, are removed as dead code) –

y = 11 * 24;

Copy Propagation:
Assignments of the form f := g are called copy statements, or copies for short. The idea behind the copy-propagation transformation is to use g for f wherever possible after the copy statement f := g; in other words, one variable is used in place of another. This may not appear to be an improvement, but as we shall see it gives us an opportunity to eliminate x.
Example:
x = Pi;
……
A = x * r * r;

The optimization using copy propagation can be done as follows:

A = Pi * r * r; // here the variable x is eliminated
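The pass can be sketched over three-address-style statements. The `Stmt` representation, the function name, and the single-basic-block assumption below are illustrative choices for this sketch, not a real compiler's IR:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// dest := src1 op src2; an empty op marks a plain copy dest := src1.
struct Stmt { std::string dest, src1, op, src2; };

// Local copy propagation (sketch): after a copy d := s, later uses of d
// are replaced by s until either d or s is written again.
void propagateCopies(std::vector<Stmt>& block) {
    std::map<std::string, std::string> copyOf; // d -> s
    for (auto& st : block) {
        if (copyOf.count(st.src1)) st.src1 = copyOf[st.src1];
        if (copyOf.count(st.src2)) st.src2 = copyOf[st.src2];
        // A write to st.dest invalidates any recorded copy involving it.
        for (auto it = copyOf.begin(); it != copyOf.end();)
            it = (it->first == st.dest || it->second == st.dest)
                     ? copyOf.erase(it) : ++it;
        if (st.op.empty()) copyOf[st.dest] = st.src1; // record the copy
    }
}
```

After propagation, the copy x := Pi no longer has any uses and becomes dead code, which a later dead-code elimination pass can remove.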

Dead-Code Eliminations:
A variable is live at a point in a program if its value can be used subsequently; otherwise, it is
dead at that point. A related idea is dead or useless code, statements that compute values that
never get used. While the programmer is unlikely to introduce any dead code intentionally, it
may appear as the result of previous transformations.


Example:
i = 0;
if (i == 1)
{
    a = b + 5;
}
Here, the 'if' statement is dead code because its condition is never satisfied.

Constant folding:
Deducing at compile time that the value of an expression is a constant, and using that constant instead, is known as constant folding. One advantage of copy propagation is that it often turns the copy statement into dead code.
For example,
a = 3.14157 / 2 can be replaced by a = 1.570785, thereby eliminating a division operation.
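The folding rule for one binary operation can be sketched as below. The function is an illustration only; a real compiler must also respect the source language's rules for overflow, rounding, and division by zero:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Constant folding (sketch): if both operands of a binary operator are
// numeric literals, compute the result at compile time and substitute
// the constant; otherwise leave the expression alone.
std::optional<double> foldConstant(const std::string& op, double a, double b) {
    if (op == "+") return a + b;
    if (op == "-") return a - b;
    if (op == "*") return a * b;
    if (op == "/" && b != 0.0) return a / b; // never fold a division by zero
    return std::nullopt;                      // unknown operator: no folding
}
```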

Optimization of Basic Blocks

Optimization is applied to the basic blocks after the intermediate code generation phase of the compiler. Optimization is the process of transforming a program so that it consumes fewer resources and runs at higher speed, replacing constructs with equivalent but more efficient ones. Optimization of basic blocks can be machine-dependent or machine-independent. These transformations are useful for improving the quality of the code that will ultimately be generated from the basic block.

There are two types of basic block optimizations:

 Structure-preserving transformations
 Algebraic transformations


I. Structure-Preserving Transformations:
The structure-preserving transformations on basic blocks include:

1. Dead Code Elimination
2. Common Sub-expression Elimination
3. Renaming of Temporary Variables
4. Interchange of two independent adjacent statements
1. Dead Code Elimination:
Dead code is the part of a program that never executes during any run of the program. Such code contributes nothing to the result, so it is eliminated. Eliminating dead code reduces the size of the generated program and saves work, since the compiler does not have to translate the dead code.
Program with Dead code
int main()
{
    int x = 2;
    if (x > 2)
        cout << "code"; // Dead code
    else
        cout << "Optimization";
    return 0;
}
Optimized Program without dead code

int main()
{
    int x = 2;
    cout << "Optimization"; // Dead code eliminated
    return 0;
}
2. Common Sub-expression Elimination:
In this technique, sub-expressions that occur more than once are calculated only once and the result is reused wherever needed. A DAG (Directed Acyclic Graph) is used to detect and eliminate common sub-expressions.
Example:

t1 := 4*i
t2 := a[t1]
t3 := 4*j
t4 := 4*i
t5 := n
t6 := b[t4] + t5

The above code can be optimized using common sub-expression elimination as:
t1 := 4*i
t2 := a[t1]
t3 := 4*j
t5 := n
t6 := b[t1] + t5
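A minimal local value-numbering pass that performs this elimination might look as follows. The `Instr` struct and the assumption that no operand is reassigned inside the block are illustrative simplifications, not a real compiler's IR:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One three-address instruction: dest := lhs op rhs.
struct Instr { std::string dest, op, lhs, rhs; };

// Local common sub-expression elimination (sketch): an expression seen a
// second time is dropped, and later uses of its result are redirected to
// the first computation.
std::vector<Instr> eliminateCSE(const std::vector<Instr>& block) {
    std::map<std::string, std::string> seen;  // "op lhs rhs" -> first dest
    std::map<std::string, std::string> alias; // dropped dest -> kept dest
    std::vector<Instr> out;
    for (Instr i : block) {
        // Rewrite operands that referred to a dropped temporary.
        if (alias.count(i.lhs)) i.lhs = alias[i.lhs];
        if (alias.count(i.rhs)) i.rhs = alias[i.rhs];
        std::string key = i.op + " " + i.lhs + " " + i.rhs;
        if (seen.count(key)) {
            alias[i.dest] = seen[key]; // redundant: reuse the earlier value
        } else {
            seen[key] = i.dest;
            out.push_back(i);
        }
    }
    return out;
}
```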
3. Renaming of Temporary Variables:
Statements containing instances of a temporary variable can be changed to instances of a new
temporary variable without changing the basic block value.
Example: Statement t = a + b can be changed to x = a + b where t is a temporary variable and x is
a new temporary variable without changing the value of the basic block.
4. Interchange of Two Independent Adjacent Statements:
If a block has two adjacent statements that are independent of each other, they can be interchanged without affecting the basic block's value.
Example:
t1 = a + b
t2 = c + d
These two independent statements of a block can be interchanged without affecting the value of
the block.
II. Algebraic Transformation:
Countless algebraic transformations can be used to change the set of expressions computed by a
basic block into an algebraically equivalent set. Some of the algebraic transformation on basic
blocks includes:
1. Constant Folding
2. Copy Propagation
3. Strength Reduction
1. Constant Folding:
Evaluate constant sub-expressions at compile time so that the compiled program does not need to compute them at run time.
Example:
x = 2 * 3 + y ⇒ x = 6 + y
2. Copy Propagation:
It is of two types: Variable Propagation and Constant Propagation.
Variable Propagation:
x = y;
z = x + 2;  ⇒  z = y + 2; (optimized code)
Constant Propagation:
x = 3;
z = x + a;  ⇒  z = 3 + a; (optimized code)


3. Strength Reduction:
Replace an expensive instruction with a cheaper one.
x = 2 * y (costly) ⇒ x = y + y (cheaper)
x = 2 * y (costly) ⇒ x = y << 1 (cheaper)
Loop Optimization:
Loop optimization includes the following strategies:
1. Code motion & Frequency Reduction
2. Induction variable elimination
3. Loop merging/combining
4. Loop Unrolling
1. Code Motion & Frequency Reduction
Move loop-invariant code (code whose value does not change across iterations) outside the loop so it executes once instead of on every iteration.
Program with loop-invariant code inside the loop
int main()
{
for (i = 0; i < n; i++)
{
x = 10;
y = y + i;
}
return 0;
}

Program with the loop-invariant code moved outside the loop


int main()
{
x = 10;
for (i = 0; i < n; i++)
y = y + i;
return 0;
}
2. Induction Variable Elimination:
Eliminate redundant induction variables, i.e., variables that change in lock-step with the loop counter and can be expressed in terms of it.
Program with multiple induction variables
int main()
{
i1 = 0;
i2 = 0;
for (i = 0; i < n; i++) {
A[i1++] = B[i2++];
}
return 0;
}


Program with one induction variable

int main()
{
for (i = 0; i < n; i++) {
A[i] = B[i]; // Only one induction variable
}
return 0;
}

3. Loop Merging/Combining:
If the operations performed by several loops can be done in a single loop, merge or combine the loops.

Program with multiple loops


int main()
{
for (i = 0; i < n; i++)
A[i] = i + 1;
for (j = 0; j < n; j++)
B[j] = j - 1;
return 0;
}

Program with one loop when multiple loops are merged


int main()
{
for (i = 0; i < n; i++) {
A[i] = i + 1;
B[i] = i - 1;
}
return 0;
}


4. Loop Unrolling:
If replicating the loop body reduces the number of times the loop executes, the loop can be (partially or fully) replaced by the replicated code, cutting the overhead of the loop's test and branch.
Program with loops
int main()
{
for (i = 0; i < 3; i++)
cout << "Cd";
return 0;
}

Program with simple code without loops


int main()
{
cout << "Cd";
cout << "Cd";
cout << "Cd";
return 0;
}
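When the trip count is not known at compile time, compilers typically unroll by a fixed factor and add a clean-up loop for leftover iterations. A hand-unrolled sketch of this shape (the function name and the factor of 4 are illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Summation unrolled by a factor of 4 (sketch): the main loop performs
// four additions per iteration, so the loop test and increment run only
// about n/4 times; the clean-up loop handles the last n % 4 elements.
long sumUnrolled(const std::vector<long>& a) {
    long s = 0;
    std::size_t i = 0, n = a.size();
    for (; i + 4 <= n; i += 4)                       // unrolled body
        s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; ++i)                               // clean-up loop
        s += a[i];
    return s;
}
```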
PEEPHOLE OPTIMIZATION
Peephole optimization is a type of code optimization performed on a small part of the code, i.e., on a very small set of instructions in a segment of code. The small set of instructions on which peephole optimization is performed is known as the peephole or window.

It works by replacement: a part of the code is replaced by shorter and faster code without a change in output. Peephole optimization is typically machine-dependent.
Objectives of Peephole Optimization:
The objective of peephole optimization is as follows:
 To improve performance
 To reduce memory footprint
 To reduce code size

Peephole Optimization Techniques

1. Redundant load and store elimination
2. Constant folding
3. Strength reduction
4. Null sequences / simplify algebraic expressions
5. Combine operations
6. Dead code elimination

1. Redundant load and store elimination: In this technique, redundant copies and moves are eliminated.

Initial code:
y = x + 5;
i = y;
z = i;
w = z * 3;

Optimized code:
y = x + 5;
w = y * 3;
// The two redundant variables i and z, whose values were only copies of y, are removed.
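At the machine level, the same idea applies to store/load pairs inside the window. Below is a minimal sketch over a toy instruction list; the `store R, x` / `load x, R` syntax is invented for illustration and is not a real instruction set:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Peephole pass (sketch): slide a two-instruction window over the code;
// "store R, x" immediately followed by "load x, R" leaves the value
// already in R, so the load is redundant and is deleted.
std::vector<std::string> peephole(std::vector<std::string> code) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < code.size(); ++i) {
        if (!out.empty() && out.back().rfind("store ", 0) == 0 &&
            code[i].rfind("load ", 0) == 0) {
            // Parse "store REG, LOC" and "load LOC, REG".
            std::string loc = out.back().substr(out.back().find(", ") + 2);
            std::string reg = out.back().substr(6, out.back().find(", ") - 6);
            std::string tgt = code[i].substr(5, code[i].find(", ") - 5);
            std::string reg2 = code[i].substr(code[i].find(", ") + 2);
            if (loc == tgt && reg == reg2) continue; // drop the redundant load
        }
        out.push_back(code[i]);
    }
    return out;
}
```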

2. Constant folding: Expressions whose operands are all known at compile time are evaluated by the compiler, so the constant result replaces the computation that would otherwise happen at run time.

Initial code:
x = 2 * 3;

Optimized code:
x = 6;

3. Strength Reduction: Operators that consume more execution time are replaced by operators that consume less execution time.

Initial code:
y = x * 2;

Optimized code:
y = x + x; or y = x << 1;

Initial code:
y = x / 2;

Optimized code:
y = x >> 1;
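The rewrite generalizes from 2 to any power-of-two constant. A sketch follows; the textual instruction form and function name are illustrative, and the division-by-shift rewrite is only exact for unsigned (non-negative) operands, since signed division rounds toward zero:

```cpp
#include <cassert>
#include <string>

// Peephole strength reduction (sketch): replace a multiply or divide by a
// power of two with the equivalent shift, which is cheaper on most CPUs.
std::string reduceStrength(const std::string& dest, const std::string& src,
                           char op, unsigned k) {
    // k is a power of two exactly when it has a single bit set.
    if ((op == '*' || op == '/') && k != 0 && (k & (k - 1)) == 0) {
        int shift = 0;
        while ((1u << shift) < k) ++shift;           // shift = log2(k)
        return dest + " = " + src + (op == '*' ? " << " : " >> ") +
               std::to_string(shift);
    }
    return dest + " = " + src + ' ' + op + ' ' + std::to_string(k); // unchanged
}
```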

4. Null sequences / Simplify algebraic expressions: Useless operations, such as adding 0 or multiplying or dividing by 1, are deleted.

a := a + 0;
a := a * 1;
a := a / 1;
a := a - 0;
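As a sketch, the rule for one statement can be written as below (the textual form and function name are illustrative):

```cpp
#include <cassert>
#include <string>

// Algebraic-identity simplification (sketch): "a op k" collapses to just
// "a" when k is the operator's neutral element (x+0, x-0, x*1, x/1).
std::string simplifyIdentity(const std::string& a, char op, int k) {
    bool neutral = ((op == '+' || op == '-') && k == 0) ||
                   ((op == '*' || op == '/') && k == 1);
    return neutral ? a : a + ' ' + op + ' ' + std::to_string(k);
}
```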


5. Combine operations: Several operations are replaced by a single equivalent operation.

6. Dead code elimination: Dead code refers to portions of the program that are never executed or do not affect the program's observable behavior. Eliminating dead code improves the efficiency and performance of the compiled program by reducing unnecessary computations and memory usage.
Initial Code:
int Dead(void)
{
    int a = 10;
    int z = 50;
    int c;
    c = z * 5;
    printf("%d", c);
    a = 20;
    a = a * 10; // these two assignments are dead: 'a' is never read again
    return 0;
}

Optimized Code:
int Dead(void)
{
    int a = 10;
    int z = 50;
    int c;
    c = z * 5;
    printf("%d", c);
    return 0;
}
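The elimination above can be sketched as a backward sweep over a basic block, keeping only statements whose result is still live. The `TAC` struct and the `liveOut` set (here modeling that `c` is used by the `printf`) are illustrative simplifications:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct TAC { std::string dest, lhs, rhs; }; // dest := lhs (op) rhs

// Dead-store elimination (sketch): walk the block backwards; a statement
// is kept only if its destination is read later or is live on block exit.
std::vector<TAC> removeDeadStores(std::vector<TAC> block,
                                  std::set<std::string> liveOut) {
    std::vector<TAC> kept;
    for (auto it = block.rbegin(); it != block.rend(); ++it) {
        if (!liveOut.count(it->dest)) continue; // result never used: drop
        liveOut.erase(it->dest);                // this write defines the use
        if (!it->lhs.empty()) liveOut.insert(it->lhs); // operands become live
        if (!it->rhs.empty()) liveOut.insert(it->rhs);
        kept.push_back(*it);
    }
    return {kept.rbegin(), kept.rend()};        // restore original order
}
```

Note that sweeping backwards matters: `a = a * 10` is dropped first, so its read of `a` never makes the earlier `a = 20` live.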
