Compiler Design UNIT 4 CD-DIGITAL MATERIAL

Please read this disclaimer before proceeding:
This document is confidential and intended solely for the educational purpose of
RMK Group of Educational Institutions. If you have received this document
through email in error, please notify the system manager. This document
contains proprietary information and is intended only for the respective group /
learning community as intended. If you are not the addressee you should not
disseminate, distribute or copy it through e-mail. Please notify the sender
immediately by e-mail if you have received this document by mistake and delete
this document from your system. If you are not the intended recipient you are
notified that disclosing, copying, distributing or taking any action in reliance on
the contents of this information is strictly prohibited.
1. CONTENTS

S.No  Contents
1     Course Objectives
2     Pre Requisites
3     Syllabus
4     Course Outcomes
6     Lecture Plan
8     Lecture Notes
9     Assignments
11    Part B Questions
15    Assessment Schedule
21CS601
Compiler Design
Department: CSE
Batch/Year: 2021-25 / III
Created by:
Dr. P. EZHUMALAI, Prof & Head/RMDEC
Dr. A. K. JAITHUNBI, Associate Professor/RMDEC
V.SHARMILA, Assistant Professor/RMDEC
Date: 26.02.2024
2. COURSE OBJECTIVES
3. PRE REQUISITES
• Pre-requisite Chart
  • 21MA302 Discrete Mathematics
  • 21CS201 Data Structures
  • 21CS02 Python Programming (Lab Integrated)
  • 21GE101 Problem Solving and C Programming
4. SYLLABUS
21CS601 COMPILER DESIGN (Lab Integrated)    L T P C
                                            3 0 2 4
OBJECTIVES
LIST OF EXPERIMENTS:
1. Develop a lexical analyzer to recognize a few patterns in C. (Ex.
identifiers, constants, comments, operators etc.). Create a symbol
table, while recognizing identifiers.
2. Design a lexical analyzer for the given language. The lexical analyzer
should ignore redundant spaces, tabs and new lines, comments etc.
3. Implement a Lexical Analyzer using Lex Tool.
4. Design Predictive Parser for the given language.
5. Implement an Arithmetic Calculator using LEX and YACC.
6. Generate three address code for a simple program using LEX and YACC.
7. Implement simple code optimization techniques (Constant folding,
Strength reduction and Algebraic transformation).
8. Implement back-end of the compiler for which the three address code
is given as input and the 8086 assembly language code is produced as
output.
5. COURSE OUTCOME

CO2: Construct the parse tree and check the syntax of the given source program. (K3)
6. CO - PO / PSO MAPPING

• HKL = Highest Knowledge Level

CO   HKL  PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12  PSO1 PSO2 PSO3
CO1  K2    3   2   1   -   -   -   -   1   1   1    -    1     2    -    -
CO2  K3    3   2   1   -   -   -   -   1   1   1    -    1     2    -    -
CO3  K4    3   2   1   -   -   -   -   1   1   1    -    1     2    -    -
CO4  K4    3   2   1   -   -   -   -   1   1   1    -    1     2    -    -
CO5  K3    3   2   1   -   -   -   -   1   1   1    -    1     2    -    -
7. LECTURE PLAN
UNIT – IV RUN-TIME ENVIRONMENT AND CODE GENERATION
S.No  Topic                                   CO       Cognitive  Mode of    Delivery   LU Outcome (after successful completion of the
                                                       Level      Delivery   Resources  course, the students should be able to)
1     Storage Organization                    CO313.4  K2         MD1        T1         understand the storage organization concepts
2     Stack Allocation Space                  CO313.4  K2         MD1, MD5   T1         describe stack allocation space
3     Access to Non-local Data on the Stack   CO313.4  K3         MD1        T1         use procedures to access non-local data on the stack
4     Heap Management                         CO313.4  K2         MD1, MD2   T1         get an idea of managing heap storage
5     Parameter Passing                       CO313.4  K2         MD1, MD2   T1         compare different parameter passing methods
6     Issues in Code Generation               CO313.4  K2         MD1        T1         identify the issues in code generation
7     Design of a simple Code Generator       CO313.4  K2         MD1        T1         design a code generator using DAG
UNIT – IV
• TO UNDERSTAND THE BASIC CONCEPTS OF COMPILERS,
STUDENTS CAN TAKE A QUIZ AS AN ACTIVITY.
• LINKS WILL BE PROVIDED BELOW.
https://create.kahoot.it/share/cs8602-unit-4/2e5c742f-541a-
4bd0-84b2-9e2ca3b47170
Join at www.kahoot.it
or with the Kahoot! app use the below game pin to play the Quiz.
Hands-on Assignment:
1. visualize the use/live of variables in programs, illustrate it
with the examples
2. Here is a sketch of two C functions f and g:
int f(int x) { int i; ... return i+1; ... }
int g(int y) { int j; ... f(j+1) ... }
That is, function g calls f. Draw the top of the stack,
starting with the activation record for g, after g calls f, and f is
about to return. You need consider only return
values, parameters, control links, and space for local variables;
you do not have to consider saved state or temporary or local
values not shown in the code sketch. However, you should
indicate:
9. LECTURE NOTES : UNIT – IV
STORAGE ORGANIZATION
o When the target program executes, it runs in its own logical address space
in which each value of the program has a location.
o The logical address space is shared among the compiler, operating system and
target machine for management and organization. The operating system maps
the logical addresses into physical addresses, which are usually spread
throughout memory.
Run-time storage comes in blocks, where a byte is the smallest unit of
addressable memory. Four consecutive bytes typically form a machine word. A
multibyte object is stored in consecutive bytes and addressed by its first byte.
Run-time storage can be subdivided to hold the different components of an executing
program:
Generated executable code
Static data objects
Dynamic data objects – heap
Automatic data objects – stack
The run-time environment is the structure of the target computer's registers and
memory that serves to manage memory and maintain the information needed to guide
a program's execution.
Activation Record:
∙ Program Counter (PC) – whose value is the address of the next instruction to
be executed.
∙ Stack Pointer (SP) – whose value is the address of the top of the stack (ToS).
The control stack is a run-time stack used to keep track of live procedure
activations, i.e. to find the procedures whose execution has not yet been
completed.
When a procedure is called (activation begins), its name is pushed onto the
stack, and when it returns (activation ends), it is popped.
An activation record is pushed onto the stack when a procedure is called and
popped when control returns to the caller.
1. RETURN VALUE
2. ACTUAL PARAMETERS
3. CONTROL LINK
4. ACCESS LINK
5. SAVED MACHINE STATUS
6. LOCAL DATA
7. TEMPORARIES
Return Value: Used by the called procedure to return a value to the calling
procedure.
Access Link: Used to refer to non-local data held in other activation records.
Saved Machine Status: Holds information about the status of the machine just
before the procedure is called.
Local Data: Holds the data that is local to the execution of the procedure.
ACTIVATION TREES
The use of a run-time stack is enabled by several useful relationships between the
activation tree and the behavior of the program
Calling sequences
Procedure calls are implemented by what are known as calling sequences, which
consist of code that allocates an activation record on the stack and enters
information into its fields.
Properties of activation trees:
Each node represents an activation of a procedure.
The root shows the activation of the main function.
The node for procedure ‘x’ is the parent of node for procedure ‘y’ if and only if the control flows from
procedure x to procedure y.
Example – Consider the following program of Quicksort
main() {
int n;
readarray();
quicksort(1,n);
}
quicksort(int m, int n)
{
int i= partition(m,n);
quicksort(m,i-1);
quicksort(i+1,n);
}
The main function is the root; main calls readarray and quicksort. Quicksort in turn calls partition
and quicksort again. The flow of control in a program corresponds to a depth-first traversal of the
activation tree, which starts at the root.
Example 2: Activation Trees
Recall the Fibonacci sequence 1, 1, 2, 3, 5, 8, ... defined by f(1) = f(2) = 1 and, for n > 2,
f(n) = f(n-1) + f(n-2). Consider the function calls that result from a main program calling f(5).
The calls and returns can be shown in a linear fashion or in tree form; the latter is sometimes
called the activation tree or call tree.
1. If an activation of p calls q, then that activation of p terminates no earlier than the activation
of q.
2. The order of activations (procedure calls) corresponds to a preorder traversal of the call tree.
3. The order of de-activations (procedure returns) corresponds to postorder traversal of the call
tree.
4. If execution is currently in an activation corresponding to a node N of the activation tree,
then the activations that are currently live are those corresponding to N and its ancestors in
the tree.
5. These live activations were called in the order given by the root-to-N path in the tree, and the
returns will occur in the reverse order.
STORAGE ALLOCATION TECHNIQUES
● Storage is organised as a stack and activation records are pushed and popped
as activation begin and end respectively. Locals are contained in activation
records so they are bound to fresh storage in each activation.
● Dynamic – grows and shrinks
● When node n is at the top of the control stack, the stack contains the nodes
along the path from n to the root.
● Recursion is supported in stack allocation
● Limitations: Memory addressing must be done using pointers and index
registers. Hence this type of allocation is slower than static allocation.
● The flow of the control in a program corresponds to a depth-first traversal of
the activation tree that:
○ starts at the root,
○ visits a node before its children, and
○ recursively visits the children of each node in left-to-right order
Calling sequences
The calling sequence, executed when one procedure (the caller) calls another (the callee), allocates an
activation record (AR) on the stack and fills in its fields. Part of this work is done by the caller; the remainder by
the callee. Although the work is shared, the AR is called the callee's AR.
1. The caller begins the process of creating the callee's AR by evaluating the arguments and
placing them in the AR of the callee.
2. The caller stores the return address and the sp in the callee's AR.
3. The caller increments sp so that instead of pointing into its AR, it points to the corresponding
point in the callee's AR.
4. The callee saves the registers and other system-dependent information.
When the procedure returns, the following actions are performed by the callee, essentially
undoing the effects of the calling sequence.
1. The callee stores the return value. Note that this address can be determined by the caller
using the old (soon to be restored) sp.
III. Heap Storage Allocation
• Memory allocation and deallocation can be done at any time and at any place,
depending on the requirements of the user.
• Heap allocation is used to dynamically allocate memory to variables and
claim it back when the variables are no longer required.
• If the values of non-local variables must be retained even after the activation
record is popped, such retention is not possible with stack allocation. This
limitation of stack allocation is due to its last-in first-out nature. To retain
such variables, the heap allocation strategy is used.
• Heap allocation allocates a contiguous block of memory when required for
storage of activation records or other data objects; this allocated memory
can be deallocated when the activation ends. The deallocated space can be
reused by the heap manager.
• Efficient heap management can be done by creating a linked list of the
free blocks; when any memory is deallocated, that block of memory is
appended to the linked list.
• Allocate the most suitable block of memory from the linked list, i.e. use the best-fit
technique for allocation of blocks.
• Recursion is supported.
Variable-Length Data on the Stack
Variable-length data is a second flavor of data that we wish to allocate on the stack. The goal is for
the callee to be able to access such arrays using addresses determined at compile time even though
the size of the arrays is not known until the procedure is called, and indeed often differs from one
call to the next (even when the two calls correspond to the same source statement).
The solution is to leave room for pointers to the arrays in the AR. These pointers are fixed size and
can thus be accessed using static offsets. When the procedure is invoked and the sizes are known,
the pointers are filled in and the space allocated.
A difficulty caused by storing these variable-size items on the stack is that it is no longer obvious
where the real top of the stack is located relative to sp. Consequently another pointer (call it
real-top-of-stack) is also kept. This is used on a call to tell where the new activation record should begin.
Variable-Length Data on the Stack
The run-time memory management system must deal frequently with the allocation
of space for objects whose sizes are not known at compile time, but which are
local to a procedure and thus may be allocated on the stack.
In modern languages, objects whose size cannot be determined at compile time are
allocated space in the heap.
Variable-length arrays means array size depends on the value of one or more
parameters of the called procedure.
Access Links
A direct implementation of the normal static scope rule for nested
functions is obtained by adding a pointer called the access link to each activation
record. If procedure p is nested immediately within procedure q in the source code,
then the access link in any activation of p points to the most recent activation of q.
Displays
A more efficient implementation maintains the access information in an auxiliary array d,
called the display, which consists of one pointer for each nesting depth.
The advantage of using a display is that if procedure p is executing and it needs to
access an element x belonging to some procedure q, we need to look only in d[i],
where i is the nesting depth of q; we follow the pointer d[i] to the activation record
for q, wherein x is found at a known offset.
For languages that do not allow nested procedure declarations, allocation of storage for
variables and access to those variables is simple:
1. Global variables are allocated static storage. The locations of these variables remain fixed
and are known at compile time. So to access any variable that is not local to the currently
executing procedure, we simply use the statically determined address.
2. Any other name must be local to the activation at the top of the stack. We may access
these variables through the top sp pointer of the stack.
An important benefit of static allocation for globals is that declared procedures may be
passed as parameters or returned as results (in C, a pointer to the function is passed), with
no substantial change in the data-access strategy. With the C static-scoping rule, and
without nested procedures, any name nonlocal to one procedure is nonlocal to all
procedures, regardless of how they are activated. Similarly, if a procedure is returned as a
result, then any nonlocal name refers to the storage statically allocated for it.
BLOCKS
• A block is a statement containing its own local data declarations.
• In C, a block has the syntax { declarations; statements }.
• In Algol, begin and end are used as delimiters.
• Nesting property of block structure: it is not possible for two blocks B1 and
B2 to overlap in such a way that B1 begins first, then B2 begins, but B1 ends
before B2 ends.
• The scope of a declaration in a block-structured language is given by the
most closely nested rule:
The scope of a declaration in a block B includes B.
If a name x is not declared in a block B, then an occurrence of x in B is in
the scope of a declaration of x in an enclosing block B' such that
a) B' has a declaration of x, and
b) B' is more closely nested around B than any other such block.
The scope of the declaration of b in B0 does not include B1 because b is redeclared in
B1, indicated by B0 - B1; such a gap is called a hole in the scope of the declaration.
• Block structure can be implemented using stack allocation.
• Since the scope of a declaration does not extend outside the block in which it
appears, the space for the declared name can be allocated when the block is
entered and deallocated when control leaves the block.
• This view treats a block as a parameterized procedure, called only from the point just
before the block and returning only to the point just after the block.
• An alternative is to allocate storage for a complete procedure body at one time.
• If there are blocks within the procedure, then allowance is made for the
storage needed for declarations within the blocks.
• For block B0, we can allocate storage as in the figure. Subscripts on locals a and b
identify the blocks in which the locals are declared. Note that a2 and b3 may be
assigned the same storage because they are in blocks that are not alive at the
same time.
Lexical Scope Without Nested Procedures
• A procedure definition cannot appear within another.
• If there is a non-local reference to a name a in some function, then it must
be declared outside any function. The scope of a declaration outside a function
consists of the function bodies that follow the declaration, with holes if the name
is redeclared within a function.
• In the absence of nested procedures, the stack-allocation strategy for local
names can be used directly.
• Storage for all names declared outside any procedure can be allocated
statically.
• The position of this storage is known at compile time, so if a name is
nonlocal in some procedure body, we simply use the statically determined
address.
• Any other name must be a local of the activation at the top of the stack,
accessible through the top pointer.
• An important benefit of static allocation for non-locals is that declared
procedures can freely be passed as parameters and returned as results.
Lexical Scope with Nested Procedures
• A non-local occurrence of a name a in a Pascal procedure is in the
scope of the most closely nested declaration of a in the static program
text.
• The nesting of procedure definitions in the Pascal program of
quicksort is indicated by the following indentation:
sort( )
    readarray( )
    exchange( )
    quicksort( )
        partition( )
Nesting Depth
• The notion of nesting depth of a procedure is used below to
implement lexical scope.
• Let the name of the main program be at nesting depth 1; we add 1
to the nesting depth as we go from an enclosing to an enclosed
procedure.
Access Links
• A direct implementation of lexical scope for nested procedures is
obtained by adding a pointer called an access link to each activation
record.
• If procedure p is nested immediately within q in the source text,
then the access link in an activation record for p points to the access link
in the record for the most recent activation of q.
1. Case np < nx. Since the called procedure x is nested more deeply than p, it
must be declared within p, or it would not be accessible to p.
2. Case np >= nx. From the scope rules, the enclosing procedures at nesting depths
1, 2, ..., nx - 1 of the called and calling procedures must be the same. Following
np - nx + 1 access links from the caller, we reach the most recent activation record of the
procedure that statically encloses both the called and calling procedures most
closely. The access link reached is the one to which the access link in the called
procedure must point. Again, np - nx + 1 can be computed at compile time.
Displays
• Faster access to non-locals than with access links can be obtained using an
array d of pointers to activation records, called a display.
• We maintain the display so that storage for a non-local a at nesting depth i is
in the activation record pointed to by display element d [i].
• Suppose control is in an activation of a procedure p at nesting depth j.
• Then, the first j-1 elements of the display point to the most recent activations
of the procedures that lexically enclose procedure p, and d [j] points to the
activation of p.
• Using a display is generally faster than following access link because the
activation record holding a non-local is found by accessing an element of d and
then following just one pointer.
• When a new activation record for a procedure at nesting depth i is set up, we
save the value of d[i] in the new activation record and set d[i] to the new activation
record. Just before the activation ends, d[i] is reset to the saved value.
PARAMETER PASSING
Basic terminology :
R-value: The value of an expression is called its r-value. The value contained in
a single variable also becomes an r-value if it appears on the right side of an
assignment operator. An r-value can always be assigned to some other variable.
Formal Parameter: Variables that take the information passed by the caller
procedure are called formal parameters. These variables are declared in the
definition of the called function.
Actual Parameter: Variables whose values are passed to the called function are
called actual parameters. These variables are specified in the function call as
arguments.
Call by Value
In call by value, the calling procedure passes the r-value of the actual parameters
and the compiler puts it into the called procedure's activation record. Formal
parameters hold the values passed by the calling procedure; thus any changes
made to the formal parameters do not affect the actual parameters.
Call by Reference
In call by reference, the formal and actual parameters refer to the same memory
location. The l-value of the actual parameter is copied to the activation record of
the called function; thus the called function has the address of the actual
parameter. If an actual parameter does not have an l-value (e.g. i+3), it is
evaluated in a new temporary location and the address of that location is passed.
Any change made to the formal parameter is reflected in the actual parameter
(because changes are made at the address).
Call by Name
In call by name, the actual parameters are substituted for the formals in all the
places the formals occur in the procedure. It is also referred to as lazy evaluation
because parameters are evaluated only when needed.
HEAP MANAGEMENT
The heap is the portion of the store that is used for data that lives
indefinitely, or until the program explicitly deletes it.
1 The Memory Manager
2 The Memory Hierarchy of a Computer
3 Locality in Programs
4 Reducing Fragmentation
5 Manual Deallocation Requests
While local variables typically become inaccessible when their procedures end, many
languages enable us to create objects or other data whose existence is not tied to the
procedure activation that creates them. For example, both C++ and Java give the
programmer new to create objects that may be passed (or pointers to them may be
passed) from procedure to procedure, so they continue to exist long after the
procedure that created them is gone.
The memory manager keeps track of all the free space in heap storage at all
times. It performs two basic functions:
• Allocation. When a program requests memory, the memory manager produces a
chunk of contiguous heap memory of the requested size.
• Deallocation. The memory manager returns deallocated space to the pool of free
space, so it can reuse the space to satisfy other allocation requests. Memory
managers typically do not return memory to the operating system, even if the
program's heap usage drops.
Memory management would be simpler if (a) all allocation requests were
for chunks of the same size, and (b) storage were released predictably, say,
first-allocated first-deallocated. There are some languages, such as Lisp, for
which condition (a) holds; pure Lisp uses only one data element — a two-
pointer cell — from which all data structures are built. Condition (b) also holds
in some situations, the most common being data that can be allocated on the run-
time stack. However, in most languages, neither (a) nor (b) holds in general.
Rather, data elements of different sizes are allocated, and there is no good way to
predict the lifetimes of all allocated objects.
Thus, the memory manager must be prepared to service, in any order, allocation
and deallocation requests of any size, ranging from one byte to as large as the
program's entire address space.
• Space Efficiency. A memory manager should minimize the total heap space
needed by a program. Doing so allows larger programs to run in a fixed virtual
address space. Space efficiency is achieved by minimizing "fragmentation,"
discussed in Section 7.4.4.
• Program Efficiency. A memory manager should make good use of the memory
subsystem to allow programs to run faster. The time taken to execute an
instruction can vary widely depending on where objects are placed in memory.
Fortunately, programs tend to exhibit "locality," a phenomenon which refers to the
nonrandom clustered way in which typical programs access memory. By attention
to the placement of objects in memory, the memory manager can make better use
of space and, hopefully, make the program run faster.
2. The Memory Hierarchy of a Computer
The large variance in memory access times is due to the fundamental limitation in
hardware technology; we can build small and fast storage, or large and slow
storage, but not storage that is both large and fast.
A memory hierarchy consists of a series of storage elements, with the smaller faster
ones "closer" to the processor, and the larger slower ones further away.
Typically, a processor has a small number of registers, whose contents are under
software control. Next, it has one or more levels of cache, usually made out of static
RAM, that are kilobytes to several megabytes in size. The next level of the hierarchy
is the physical (main) memory, made out of hundreds of megabytes or gigabytes of
dynamic RAM. The physical memory is then backed up by virtual memory, which is
implemented by gigabytes of disks. Upon a memory access, the machine first looks
for the data in the closest (lowest-level) storage and, if the data is not there, looks
in the next higher level, and so on.
3. Locality in Programs
Most programs exhibit a high degree of locality; that is, they spend most of their
time executing a relatively small fraction of the code and touching only a small
fraction of the data. We say that a program has temporal locality if the memory
locations it accesses are likely to be accessed again within a short period of time. We
say that a program has spatial locality if memory locations close to the location
accessed are likely also to be accessed within a short period of time.
The conventional wisdom is that programs spend 90% of their time executing 10%
of the code. Here is why:
Programs often contain many instructions that are never executed. Programs built
with components and libraries use only a small fraction of the provided functionality.
Also, as requirements change and programs evolve, legacy systems often contain
many instructions that are no longer used.
Static and Dynamic RAM
Most random-access memory is dynamic, which means that it is built of very simple
electronic circuits that lose their charge (and thus "forget" the bit they were
storing) in a short time. These circuits need to be refreshed —
that is, their bits read and rewritten — periodically. On the other hand, static
RAM is designed with a more complex circuit for each bit, and consequently the bit
stored can stay indefinitely, until it is changed. Evidently, a chip can store more bits if
it uses dynamic-RAM circuits than if it uses static-RAM circuits, so we tend to see
large main memories of the dynamic variety, while smaller memories, like caches, are
made from static circuits.
4. Reducing Fragmentation
At the beginning of program execution, the heap is one contiguous unit of free
space. As the program allocates and deallocates memory, this space is broken up into
free and used chunks of memory, and the free chunks need not reside in a
contiguous area of the heap. We refer to the free chunks of memory as holes. With
each allocation request, the memory manager must place the requested chunk of
memory into a large-enough hole. Unless a hole of exactly the right size is found, we
need to split some hole, creating a yet smaller hole.
With each deallocation request, the freed chunks of memory are added back to the
pool of free space. We coalesce contiguous holes into larger holes, as the holes can
only get smaller otherwise. If we are not careful, the memory may end up getting
fragmented, consisting of large numbers of small, noncontiguous holes. It is then
possible that no hole is large enough to satisfy a future request, even though there
may be sufficient aggregate free space.
When an object is deallocated manually, the memory manager must make its
chunk free, so it can be allocated again. In some circumstances, it may also be
possible to combine (coalesce) that chunk with adjacent chunks of the heap, to
form a larger chunk. There is an advantage to doing so, since we can always use a
large chunk to do the work of small chunks of equal total size, but many small
chunks cannot hold one large object, as the combined chunk could.
Manual memory management, where the programmer must explicitly arrange
for the deallocation of data, is used in C and C++. Ideally, any storage that will no
longer be accessed should be deleted. Conversely, any storage that may be
referenced must not be deleted. Unfortunately, it is hard to enforce either of these
properties. In addition to considering the difficulties with manual deallocation, we
shall describe some of the techniques programmers use to help with the
difficulties.
Problems with Manual Deallocation
Manual memory management is error-prone. The common mistakes take two forms:
failing ever to delete data that cannot be referenced is called a memory-leak error,
and referencing deleted data is a dangling-pointer-dereference error.
It is hard for programmers to tell if a program will never refer to some storage in the
future, so the first common mistake is not deleting storage that will never be
referenced. Note that although memory leaks may slow down the execution of a
program due to increased memory usage, they do not affect program correctness, as
long as the machine does not run out of memory. Many programs can tolerate
memory leaks, especially if the leakage is slow. However, for long-running programs,
and especially nonstop programs like operating systems or server code, it is critical
that they not have leaks.
Automatic garbage collection gets rid of memory leaks by deallocating all the
garbage. Even with automatic garbage collection, a program may still use more
memory than necessary. A programmer may know that an object will never be
referenced, even though references to that object exist somewhere. In that case, the
programmer must deliberately remove references to objects that will never be
referenced, so the objects can be deallocated automatically.
Being overly zealous about deleting objects can lead to even worse problems than
memory leaks. The second common mistake is to delete some storage and then try
to refer to the data in the deallocated storage. Pointers to storage that has been
deallocated are known as dangling pointers. Once the freed storage has been
reallocated to a new variable, any read, write, or deallocation via the dangling
pointer can produce seemingly random effects. We refer to any operation, such as
read, write, or deallocate, that follows a pointer and tries to use the object it points
to, as dereferencing the pointer.
Programming Conventions and Tools
We now present a few of the most popular conventions and tools that have been
developed to help programmers cope with the complexity in managing memory:
CODE GENERATION
The final phase in the compiler model is the code generator. It takes as input an
intermediate representation of the source program and produces as output an
equivalent target program. The code generation techniques presented below can be
used whether or not an optimizing phase occurs before code generation.
The intermediate representation can be:
• Linear representation, such as postfix notation
• Three-address representation, such as quadruples
• Virtual machine representation, such as stack machine code
• Graphical representations, such as syntax trees and DAGs
Prior to code generation, the source program has been scanned, parsed and
translated by the front end into an intermediate representation, with the necessary
type checking performed. Therefore, the input to the code generator is assumed to
be error-free.
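To make the representations concrete, here is one statement, x = (a + b) * c, written by hand in postfix and quadruple form (a sketch; the temporaries t1 and t2 are illustrative names), together with tiny evaluators showing that both encode the same computation:

```python
# The expression (a + b) * c, assigned to x, in two intermediate representations.
postfix = ["a", "b", "+", "c", "*"]            # linear (postfix) form
quads = [("+", "a", "b", "t1"),                # three-address (quadruple) form:
         ("*", "t1", "c", "t2"),               # (op, arg1, arg2, result)
         ("=", "t2", None, "x")]

def eval_postfix(expr, env):
    """Evaluate postfix with an operand stack, as a stack machine would."""
    stack = []
    for tok in expr:
        if tok in ("+", "*"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if tok == "+" else a * b)
        else:
            stack.append(env[tok])
    return stack[0]

def eval_quads(qs, env):
    """Execute quadruples in order, binding each result name."""
    env = dict(env)
    for op, a1, a2, res in qs:
        if op == "+":
            env[res] = env[a1] + env[a2]
        elif op == "*":
            env[res] = env[a1] * env[a2]
        elif op == "=":
            env[res] = env[a1]
    return env
```

Running both on the same variable bindings yields the same value for x, which is the sense in which the representations are interchangeable inputs to the code generator.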
Target program:
The output of the code generator is the target program. The output may be:
• Absolute machine language: it can be placed in a fixed memory location and
executed immediately.
• Relocatable machine language: it allows subprograms to be compiled separately.
• Assembly language: code generation is made easier.
Memory management:
Names in the source program are mapped to addresses of data
objects in run-time memory by the front end and code generator.
The code generator makes use of the symbol table: a name in a three-address
statement refers to the symbol-table entry for that name.
Labels in three-address statements have to be converted to addresses of
instructions. For example, the statement j : goto i generates a jump instruction as
follows:
• if i < j, a backward jump instruction with target address equal to the location
of the code for quadruple i is generated.
• if i > j, the jump is forward. We must store on a list for quadruple i the
location of the first machine instruction generated for quadruple j. When i is
processed, the machine locations for all instructions that are forward jumps
to i are filled in.
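The forward-jump bookkeeping described above can be sketched as follows (a simplification in which every quadruple becomes exactly one machine instruction, so the machine address of each quadruple is known as soon as it is reached; the JMP mnemonic is illustrative):

```python
def assemble(quads):
    """Translate quadruples to 'machine' instructions, backpatching forward
    jumps. Each quadruple yields exactly one instruction here, so the machine
    address of quadruple i is simply loc[i]."""
    code, loc, pending = [], {}, {}        # pending: target quad -> jump spots
    for i, q in enumerate(quads):
        loc[i] = len(code)
        for spot in pending.pop(i, []):    # quad i reached: fill jumps to i
            code[spot] = ("JMP", loc[i])
        if q[0] == "goto":
            target = q[1]
            if target < i:                 # backward jump: address already known
                code.append(("JMP", loc[target]))
            else:                          # forward jump: remember this spot
                pending.setdefault(target, []).append(len(code))
                code.append(("JMP", None))
        else:
            code.append(q)                 # any non-jump quadruple
    return code
```

A backward jump is resolved immediately; a forward jump is emitted with an empty target and patched once the target quadruple's instruction address is known.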
Instruction selection:
The instruction set of the target machine should be complete and uniform.
Instruction speeds and machine idioms are important factors when the efficiency of
the target program is considered.
The quality of the generated code is determined by its speed and size.
For example, every three-address statement of the form x := y + z, where x, y and
z are statically allocated, can be translated into the code sequence
    MOV y, R0
    ADD z, R0
    MOV R0, x
Register allocation
Instructions involving register operands are shorter and faster than those involving
operands in memory.
The use of registers is subdivided into two subproblems :
Register allocation– the set of variables that will reside in registers at a point in
the program is selected.
Register assignment– the specific register that a variable will reside in is picked.
Certain machines require even-odd register pairs for some operands and results.
For example, consider the division instruction of the form:
    D x, y
where x is the even register of an even/odd register pair holding the 64-bit
dividend, and y represents the divisor. After division, the even register holds the
remainder and the odd register holds the quotient.
Evaluation order
The order in which the computations are performed can affect the
efficiency of the target code. Some computation orders require
fewer registers to hold intermediate results than others.
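One standard way to quantify this is the Ershov number of an expression-tree node: a leaf needs one register, and an interior node needs the larger of its children's numbers if they differ (evaluating the harder subtree first), or one more if they are equal. A minimal sketch:

```python
# Ershov numbers: the minimum number of registers needed to evaluate an
# expression tree with no intermediate stores. A node is either a variable
# name (leaf) or a (left, right) pair for a binary operator.
def ershov(node):
    if isinstance(node, str):
        return 1                            # a leaf: one register to hold it
    l, r = ershov(node[0]), ershov(node[1])
    # unequal children: do the harder one first and keep its result in one
    # register while the easier one uses the rest; equal: one extra needed
    return max(l, r) if l != r else l + 1
```

For instance, the balanced grouping (a+b)+(c+d) needs three registers, while the equivalent left-leaning ((a+b)+c)+d needs only two, which is exactly the document's point that evaluation order affects register demand.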
For example:
    MOV R0, M stores the contents of register R0 into memory location M;
    MOV 4(R0), M stores the value contents(4 + contents(R0)) into M.
Instruction costs:
The cost of an instruction is taken to be one plus the costs associated with the
source and destination addressing modes; an operand in memory adds one to the
cost. For example, MOV R0, R1 copies the contents of register R0 into R1 and has
cost one. The statement a := b + c can be implemented by either of the following
sequences, each of total cost 6:
    MOV b, R0
    ADD c, R0          cost = 6
    MOV R0, a

    MOV b, a
    ADD c, a           cost = 6
In order to generate good code for the target machine, we must utilize its addressing
capabilities efficiently.
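The costing rule above (one, plus one for each memory operand) can be applied mechanically. In this sketch the toy convention is that operand names beginning with R denote registers and everything else denotes memory locations:

```python
def instr_cost(instr):
    """Cost = 1 + 1 per memory operand (toy rule: an operand whose name does
    not start with 'R' is taken to be a memory location)."""
    op, *operands = instr.split()
    operands = [o.rstrip(",") for o in operands]
    return 1 + sum(0 if o.startswith("R") else 1 for o in operands)

# The two alternative implementations of a := b + c from the text:
seq1 = ["MOV b, R0", "ADD c, R0", "MOV R0, a"]   # via a register
seq2 = ["MOV b, a", "ADD c, a"]                  # memory-to-memory
```

Both sequences cost 6 under this rule, matching the figures in the text: seq1 is three cost-2 instructions, seq2 is two cost-3 instructions.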
Code for other statement types:
Indexing and pointer operations in three-address statements are handled in the
same way as binary operations. The code sequences for the indexed assignment
statements a := b[i] and a[i] := b depend on where the value of i resides:

Statement    i in register Ri    Cost   i in memory Mi    Cost   i in stack        Cost
a := b[i]    MOV b(Ri), R        2      MOV Mi, R         4      MOV Si(A), R      4
                                        MOV b(R), R              MOV b(R), R
a[i] := b    MOV b, a(Ri)        3      MOV Mi, R         5      MOV Si(A), R      5
                                        MOV b, a(R)              MOV b, a(R)

The pointer assignments a := *p and *p := a are handled similarly, with the code
depending on whether p is in a register, in memory, or on the stack.
A SIMPLE CODE GENERATOR
The code generator considers each three-address statement in turn, keeping
computed results in registers as long as possible. It uses a register descriptor
(which records what each register currently holds) and an address descriptor
(which records where the current value of each name can be found). Target
instructions are assumed to have the form:
    op source, destination
where op is an op-code, and source and destination are data fields.
A code generation Algorithm
The algorithm takes as input a sequence of three-address statements constituting a basic
block. For each three-address statement of the form x := y op z, perform the following
actions:
1. Invoke a function getreg to determine the location L where the result of the
computation y op z should be stored.
2. Consult the address descriptor for y to determine y', the current location of y. Prefer
the register for y' if the value of y is currently both in memory and a register. If the value
of y is not already in L, generate the instruction MOV y', L to place a copy of y in L.
3. Generate the instruction op z', L, where z' is a current location of z. Again, prefer a
register to a memory location if z is in both. Update the address descriptor of x to
indicate that x is in location L. If L is a register, update its descriptor to indicate that it
contains the value of x, and remove x from all other register descriptors.
4. If the current values of y and/or z have no next uses, are not live on exit from the
block, and are in registers, alter the register descriptor to indicate that, after execution
of x := y op z, those registers will no longer contain y and/or z.
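A minimal executable sketch of these steps (simplified: next-use information is ignored, spilling arbitrarily picks R0, and the operand z is named directly rather than looked up through its descriptor):

```python
# Simplified descriptor-based code generator for statements x := y op z.
class CodeGen:
    def __init__(self, nregs=2):
        self.regs = {f"R{i}": None for i in range(nregs)}  # register descriptor
        self.addr = {}                                     # address descriptor
        self.code = []

    def getreg(self, y):
        """Step 1: pick a location L, preferring a register already holding y,
        then any empty register, else spill R0 (a real getreg consults
        next-use information to choose a victim)."""
        for r, v in self.regs.items():
            if v == y:
                return r
        for r, v in self.regs.items():
            if v is None:
                return r
        r = "R0"                                   # arbitrary spill choice
        self.code.append(f"MOV {r}, {self.regs[r]}")
        self.addr[self.regs[r]] = self.regs[r]     # spilled value now in memory
        return r

    def gen(self, x, y, op, z):
        L = self.getreg(y)
        if self.regs[L] != y:                      # step 2: load y if needed
            self.code.append(f"MOV {y}, {L}")
        self.code.append(f"{op} {z}, {L}")         # step 3: compute into L
        self.regs[L] = x                           # L now holds x, not y
        self.addr[x] = L
```

For the block t := a - b; u := a - c; v := t + u this reproduces the shape of the text's example: a is loaded and subtracted for t, then again for u in the other register, and the final ADD reuses the register already holding t.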
For example, for the statements
    t := a - b
    u := a - c
    v := t + u
the generated code and descriptor contents are:
    t := a - b    MOV a, R0     R0 contains t                  t in R0
                  SUB b, R0
    u := a - c    MOV a, R1     R0 contains t, R1 contains u   t in R0, u in R1
                  SUB c, R1
    v := t + u    ADD R1, R0    R0 contains v, R1 contains u   u in R1, v in R0
9. ASSIGNMENTS
1. Find the instruction code for different addressing modes. (K3, CO4)
2. Generate code for the following sequence of statements: (K3, CO4)
   x = u - t;
   y = x * v;
   x = y + w;
   y = t - z;
   y = x * y;
3. Let fib(n) be the function
   int fib(n) {
       if (n == 0)
           return 0;
       else if (n == 1)
           return 1;
       else
           return fib(n - 1) + fib(n - 2);
   }
10. PART A : Q & A : UNIT – IV
1. What is meant by storage organization? (K1)
   The operating system maps the logical addresses into physical addresses, which
   are usually spread throughout memory. The management and organization of this
   logical address space is shared between the compiler, the operating system and
   the target machine.
6. What do you mean by dangling reference? (K2)
   A dangling reference occurs when there is a reference to storage that has been
   deallocated.
PART B QUESTIONS : UNIT – IV
(CO4, K2)
1. Explain in detail about stack allocation of space.
2. Draw the activation tree for the quicksort algorithm.
3. (i) Describe the source language issues in detail.
   (ii) Describe in detail about storage organization.
4. What are the uses of a symbol table? Describe the storage allocation
   information kept in the symbol table.
5. (i) Explain non-local names in runtime storage management.
   (ii) Explain activation records and their purpose.
6. (i) What are the issues in the design of a code generator?
   (ii) Explain a simple code generator with a suitable example.
PART C QUESTIONS
(CO4, K3)
1. Write a code generation algorithm. Explain the descriptors and the function
   getreg(). Generate code for x = ((a + b) / (b - c)) - (a + b) * (b - c) + f.
2. Draw the symbol tables for each of the procedures in the following Pascal
   code (including main) and show their nesting relationship by linking them via
   a pointer reference in the structure (or record) used to implement them in
   memory. Include the entries or fields for the local variables, arguments and
   any other information you find relevant for the purposes of code generation,
   such as type and location at run-time.
12. Supportive online Certification courses
NPTEL : https://nptel.ac.in/courses/106/105/106105190/
Swayam : https://www.classcentral.com/course/swayam-compiler-design-12926
coursera : https://www.coursera.org/learn/nand2tetris2
Udemy : https://www.udemy.com/course/introduction-to-compiler-construction-and-design/
Mooc : https://www.mooc-list.com/course/compilers-coursera
edx : https://www.edx.org/course/compilers
13. Real time Applications in day to day life and to Industry
14. CONTENTS BEYOND SYLLABUS : UNIT – IV
The cost of computing a node n includes whatever loads and stores are necessary to
evaluate the subtree S rooted at n in the given number of registers. It also includes the
cost of computing the operator at the root of S. The zeroth component of the cost vector
is the optimal cost of computing the subtree S into memory. The contiguous evaluation
property ensures that an optimal program for S can be generated by considering
combinations of optimal programs only for the subtrees of the root of S. This restriction
reduces the number of cases that need to be considered.
In order to compute the costs C[i] at node n, we view the instructions as
tree-rewriting rules, as in Section 8.9. Consider each template E that matches the input tree at
node n. By examining the cost vectors at the corresponding descendants of n, determine
the costs of evaluating the operands at the leaves of E. For those operands of E that are
registers, consider all possible orders in which the corresponding subtrees of T can be
evaluated into registers. In each ordering, the first subtree corresponding to a register
operand can be evaluated using i available registers, the second using i -1 registers, and so
on. To account for node n, add in the cost of the instruction associated with the template E.
The value C[i] is then the minimum cost over all possible orders.
The cost vectors for the entire tree T can be computed bottom up in time linearly
proportional to the number of nodes in T. It is convenient to store at each node the
instruction used to achieve the best cost for C[i] for each value of i. The smallest cost in the
vector for the root of T gives the minimum cost of evaluating T.
Example: Consider a machine having two registers R0 and R1, and the following
instructions, each of unit cost:
    LD Ri, Mj
    op Ri, Ri, Rj
    op Ri, Ri, Mj
    LD Ri, Rj
    ST Mi, Rj
Let us apply the dynamic programming algorithm to generate optimal code for the syntax
tree in Fig. 8.26. In the first phase, we compute the cost vectors shown at each node. To
illustrate this cost computation, consider the cost vector at the leaf a. C[0], the cost of
computing a into memory, is 0, since it is already there. C[1], the cost of computing a into
a register, is 1, since we can load it into a register with the instruction LD R0, a. C[2], the
cost of loading a into a register with two registers available, is the same as that with one
register available. The cost vector at leaf a is therefore (0, 1, 1).
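The bottom-up cost-vector computation can be sketched as follows (a simplification for the instruction set above: op Ri, Ri, Mj lets the right operand come from memory, the register-register case tries both evaluation orders, and C[0] adds one store to the cheapest register computation):

```python
def cost_vector(node, r=2):
    """Cost vector C[0..r] for evaluating `node` with that many registers
    available. A node is a variable name (leaf, already in memory) or a
    triple (op, left, right)."""
    if isinstance(node, str):
        # leaf: free in memory; one LD to bring it into a register
        return tuple(0 if i == 0 else 1 for i in range(r + 1))
    _, left, right = node
    cl, cr = cost_vector(left, r), cost_vector(right, r)
    C = [0] * (r + 1)
    for i in range(1, r + 1):
        # op Ri, Ri, Mj: left in a register, right operand taken from memory
        best = cl[i] + cr[0] + 1
        if i >= 2:
            # op Ri, Ri, Rj: both in registers, either subtree evaluated first
            best = min(best,
                       cl[i] + cr[i - 1] + 1,
                       cr[i] + cl[i - 1] + 1)
        C[i] = best
    C[0] = min(C[1:]) + 1      # compute into a register, then one ST
    return tuple(C)
```

On this model a leaf gets the vector (0, 1, 1) computed in the example above, and a single subtraction a - b costs 2 in a register (LD plus SUB from memory) and 3 into memory.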
Now consider the cost vector at the root. We first determine the minimum
cost of computing the root with one and two registers available. The
machine instruction ADD R0, R0, M matches the root, because the root
is labeled with the operator +. Using this instruction, the minimum cost of
evaluating the root with one register available is the minimum cost of
computing its right subtree into memory, plus the minimum cost of
computing its left subtree into the register, plus 1 for the instruction. No
other way exists. The cost vectors at the right and left children of the
root show that the minimum cost of computing the root with one
register available is 5 + 2 + 1 = 8.
Now consider the minimum cost of evaluating the root with two
registers available. Three cases arise, depending on which instruction is
used to compute the root and in what order the left and right subtrees
of the root are evaluated.
15. ASSESSMENT SCHEDULE
S.No   Name of the Assessment   Start Date   End Date    Portion
5      Revision 1               13.5.2024    16.5.2024   UNIT 5, 1 & 2
16. PRESCRIBED TEXT BOOKS & REFERENCE BOOKS
• TEXT BOOKS:
• REFERENCE BOOKS:
17. MINI PROJECT SUGGESTION
Thank you