
UNIT 4:

Q. COMPARE SLR, CLR AND LALR PARSERS?


SLR Parser
SLR stands for "Simple LR parser". It is the easiest and cheapest kind of LR parser to implement. The SLR parsing action and goto functions are constructed from a deterministic finite automaton that recognizes viable prefixes. SLR does not produce uniquely defined parsing-action tables for all grammars, but it does succeed on several grammars for programming languages. Given a grammar G, we augment G to produce G', and from G' we construct C, the canonical collection of sets of items for G'. We then construct ACTION, the parsing action function, and GOTO, the goto function, from C using the simple LR parsing-table construction technique. This requires us to know FOLLOW(A) for each non-terminal A of the grammar, which can be computed as sketched below.
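Since the SLR construction needs FOLLOW sets, here is a minimal Python sketch of computing FIRST and FOLLOW by fixed-point iteration. The grammar S → AA, A → aA | b (used again later in this unit) is hard-coded, the representation is an assumption made for illustration, and ε-productions are ignored for brevity:

# Grammar S -> AA, A -> aA | b, as (head, body) pairs. No epsilon-productions,
# so FIRST of a body is just FIRST of its first symbol.
GRAMMAR = [("S", ["A", "A"]), ("A", ["a", "A"]), ("A", ["b"])]
NONTERMINALS = {"S", "A"}
START = "S"

def first_of(symbol, first):
    return first[symbol] if symbol in NONTERMINALS else {symbol}

def compute_follow():
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:                      # fixed point for FIRST
        changed = False
        for head, body in GRAMMAR:
            f = first_of(body[0], first)
            if not f <= first[head]:
                first[head] |= f
                changed = True
    follow = {nt: set() for nt in NONTERMINALS}
    follow[START].add("$")              # the end-marker follows the start symbol
    changed = True
    while changed:                      # fixed point for FOLLOW
        changed = False
        for head, body in GRAMMAR:
            for i, sym in enumerate(body):
                if sym not in NONTERMINALS:
                    continue
                # FOLLOW(sym) gets FIRST of what follows, or FOLLOW(head) at the end
                nxt = first_of(body[i + 1], first) if i + 1 < len(body) else follow[head]
                if not nxt <= follow[sym]:
                    follow[sym] |= nxt
                    changed = True
    return follow

print(compute_follow())   # e.g. {'S': {'$'}, 'A': {'a', 'b', '$'}}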

CLR Parser
CLR stands for canonical LR. CLR parsing uses the canonical collection of LR (1) items to construct the CLR (1) parsing table. The CLR (1) parsing table has more states than the SLR (1) parsing table. In CLR (1), the reduce entries are placed only under the lookahead symbols.

LALR Parser
LALR stands for Look-Ahead LR Parser. It is intermediate in power between the SLR and CLR parsers. It is a compaction of the CLR parser, and hence the tables obtained with it are smaller than the CLR parsing table.
For constructing the LALR (1) parsing table, the canonical collection of LR (1) items is used. In LALR (1) parsing, the LR (1) items with the same productions but different lookaheads are merged to form a single set of items. It is often the same as CLR (1) parsing, except for one difference: the parsing table.
The overall structure of all these LR parsers is the same. They differ in some common factors, such as size, the class of context-free grammars they support, and cost in terms of time and space.
Let us see the comparison between the SLR, CLR, and LALR parsers.

Implementation: SLR is very easy and cheap to implement; LALR is also easy and cheap to implement; CLR is expensive and difficult to implement.

Size: The SLR parser is the smallest in size, as it has the fewest states; LALR and SLR have the same size; the CLR parser is the largest, as its number of states is very large.

Error detection: Error detection is not immediate in SLR; error detection is not immediate in LALR; error detection can be done immediately in the CLR parser.

Power: SLR fails to produce a parsing table for a certain class of grammars; LALR is intermediate in power between SLR and CLR, i.e., SLR ≤ LALR ≤ CLR; CLR is very powerful and works on a large class of grammars.

Complexity: SLR requires less time and space; LALR requires more time and space; CLR also requires more time and space.

Q. Explain the various design issues of a code generator.

Design Issues
In the code generation phase, various issues arise:

1. Input to the code generator


2. Target program
3. Memory management
4. Instruction selection
5. Register allocation
6. Evaluation order

1. Input to the code generator

o The input to the code generator contains the intermediate representation of the source program and the information in the symbol table. This representation is produced by the front end.
o We assume the front end produces a low-level intermediate representation, i.e., values of names in it can be directly manipulated by machine instructions.
o The code generation phase requires complete, error-free intermediate code as its input.

2. Target program:
The target program is the output of the code generator. The output can be:

a) Assembly language: It allows subprograms to be compiled separately.

b) Relocatable machine language: It makes the process of code generation easier.

c) Absolute machine language: It can be placed in a fixed location in memory and can be executed immediately.
3. Memory management
o During the code generation process, the symbol table entries have to be mapped to actual addresses, and labels have to be mapped to instruction addresses.
o Mapping names in the source program to addresses of data is done cooperatively by the front end and the code generator.
o Local variables are stack-allocated in the activation record, while global variables are kept in the static area.

4. Instruction selection:
o The nature of the instruction set of the target machine should be complete and uniform.
o When considering the efficiency of the target machine, instruction speed and machine idioms are important factors.
o The quality of the generated code can be determined by its speed and size.

Example:
The three-address code is:

a := b + c
d := a + e

Inefficient assembly code is:

MOV b, R0    ; R0 ← b
ADD c, R0    ; R0 ← c + R0
MOV R0, a    ; a ← R0
MOV a, R0    ; R0 ← a
ADD e, R0    ; R0 ← e + R0
MOV R0, d    ; d ← R0

5. Register allocation
Registers can be accessed faster than memory. Instructions involving register operands are shorter and faster than those involving memory operands.

The following sub-problems arise when we use registers:

Register allocation: In register allocation, we select the set of variables that will reside in registers.

Register assignment: In register assignment, we pick the specific register in which each variable will reside.

Certain machines require even/odd pairs of registers for some operands and results.
For example:
Consider the following division instruction of the form:

D x, y

Where,

x is the dividend, held in the even register of an even/odd register pair

y is the divisor

The even register is used to hold the remainder.

The odd register is used to hold the quotient.

6. Evaluation order
The efficiency of the target code can be affected by the order in which the computations are performed. Some computation orders need fewer registers to hold intermediate results than others.

Explain the components of an Activation Record:

o An activation record is used to manage the information needed by a single execution of a procedure.
o An activation record is pushed onto the stack when a procedure is called, and it is popped when control returns to the caller.

The diagram below shows the contents of activation records:


Return Value: It is used by the called procedure to return a value to the calling procedure.

Actual Parameters: They are used by the calling procedure to supply parameters to the called procedure.

Control Link: It points to the activation record of the caller.

Access Link: It is used to refer to non-local data held in other activation records.

Saved Machine Status: It holds information about the status of the machine just before the procedure is called.

Local Data: It holds the data that is local to the execution of the procedure.

Temporaries: It stores the values that arise in the evaluation of an expression. These fields are modeled as a data structure in the sketch below.
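To make the record concrete, here is an illustrative Python model of these fields. The class and field names are assumptions for teaching purposes; a real runtime lays these fields out at fixed offsets in stack memory rather than as objects:

from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class ActivationRecord:                 # illustrative model, not a real layout
    return_value: Any = None            # value handed back to the caller
    actual_params: list = field(default_factory=list)  # arguments from the caller
    control_link: Optional["ActivationRecord"] = None  # caller's activation record
    access_link: Optional["ActivationRecord"] = None   # record holding non-local data
    saved_machine_status: dict = field(default_factory=dict)  # e.g. return address
    local_data: dict = field(default_factory=dict)     # procedure-local variables
    temporaries: dict = field(default_factory=dict)    # expression intermediates

# A call pushes a record onto the control stack; a return pops it.
stack = []
stack.append(ActivationRecord(actual_params=[2, 3]))   # procedure called
stack.pop()                                            # control returns to caller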

Performance Metrics in Design of a Garbage Collector:

Garbage collection is often so expensive that, although it was invented decades ago and absolutely prevents memory leaks, it has yet to be adopted by many mainstream programming languages. Many different approaches have been proposed over the years, and there is not one clearly best garbage-collection algorithm. Before exploring the options, let us first enumerate the performance metrics that must be considered when designing a garbage collector.

• Overall Execution Time. Garbage collection can be very slow. It is important that it not significantly increase the total run time of an application. Since the garbage collector necessarily must touch a lot of data, its performance is determined greatly by how it leverages the memory subsystem.

• Space Usage. It is important that garbage collection avoid fragmentation and make the best use of the available memory.

• Pause Time. Simple garbage collectors are notorious for causing programs — the mutators — to pause suddenly for an extremely long time, as garbage collection kicks in without warning. Thus, besides minimizing the overall execution time, it is desirable that the maximum pause time be minimized. As an important special case, real-time applications require certain computations to be completed within a time limit. We must either suppress garbage collection while performing real-time tasks, or restrict maximum pause time. Thus, garbage collection is seldom used in real-time applications.

• Program Locality. We cannot evaluate the speed of a garbage collector solely by its running time. The garbage collector controls the placement of data and thus influences the data locality of the mutator program. It can improve a mutator's temporal locality by freeing up space and reusing it; it can improve the mutator's spatial locality by relocating data used together into the same cache lines or pages.

Some of these design goals conflict with one another, and tradeoffs must be made carefully by considering how programs typically behave. Also, objects of different characteristics may favor different treatments, requiring a collector to use different techniques for different kinds of objects.
For example, the number of objects allocated is dominated by small objects, so allocation of small objects must not incur a large overhead. On the other hand, consider garbage collectors that relocate reachable objects. Relocation is expensive when dealing with large objects, but less so with small objects.

As another example, in general, the longer we wait to collect garbage in a trace-based collector, the larger the fraction of objects that can be collected. The reason is that objects often "die young," so if we wait a while, many of the newly allocated objects will become unreachable. Such a collector thus costs less on average, per unreachable object collected. On the other hand, infrequent collection increases a program's memory usage, decreases its data locality, and increases the length of the pauses.

In contrast, a reference-counting collector, by introducing a constant overhead to many of the mutator's operations, can slow down the overall execution of a program significantly. On the other hand, reference counting does not create long pauses, and it is memory efficient, because it finds garbage as soon as it is produced (with the exception of certain cyclic structures discussed in Section 7.5.3).

Language design can also affect the characteristics of memory usage. Some languages encourage a programming style that generates a lot of garbage. For example, programs in functional or almost functional programming languages create more objects to avoid mutating existing objects. In Java, all objects, other than base types like integers and references, are allocated on the heap and not the stack, even if their lifetimes are confined to that of one function invocation. This design frees the programmer from worrying about the lifetimes of variables, at the expense of generating more garbage. Compiler optimizations have been developed to analyze the lifetimes of variables and allocate them on the stack whenever possible.

Target Machine
o The target computer is a byte-addressable machine with four bytes to a word.
o The target machine has n general-purpose registers, R0, R1, ..., Rn-1. It also has two-address instructions of the form:

op source, destination

where op is an op-code, and source and destination are data fields.

o It has the following op-codes:

ADD (add source to destination)
SUB (subtract source from destination)
MOV (move source to destination)

o The source and destination of an instruction are specified by combining registers and memory locations with address modes.

MODE                 FORM     ADDRESS                      EXAMPLE             ADDED COST

absolute             M        M                            ADD temp, R1        1

register             R        R                            ADD R0, R1          0

indexed              c(R)     c + contents(R)              ADD 100(R2), R1     1

indirect register    *R       contents(R)                  ADD *R2, R1         0

indirect indexed     *c(R)    contents(c + contents(R))    ADD *100(R2), R1    1

literal              #c       c                            ADD #3, R1          1

o Here, an added cost of 1 means that the operand occupies one extra word of memory.

o Each instruction has a cost of 1 plus the added costs of its source and destination modes.
o Instruction cost = 1 + added cost of the source mode + added cost of the destination mode.

Example:
1. Move register to memory (R0 → M):

MOV R0, M
cost = 1+1 = 2    (one word for the instruction and one word for the address of memory location M, which is in the word following the instruction)

2. Indirect indexed mode:

MOV *4(R0), M
cost = 1+1+1 = 3    (one word for the instruction, one word for the constant 4 in *4(R0), and one word for memory location M)

3. Literal mode:

MOV #1, R0
cost = 1+1 = 2    (one word for the constant 1 and one word for the instruction)
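The cost rule is mechanical enough to express in a few lines. Here is a small Python sketch (the mode names are assumptions) that reproduces the three costs just computed:

# Added cost of each addressing mode, per the table above.
ADDED_COST = {
    "register": 0,           # R
    "indirect_register": 0,  # *R
    "absolute": 1,           # M (address stored in the following word)
    "indexed": 1,            # c(R) (constant c stored in a word)
    "indirect_indexed": 1,   # *c(R)
    "literal": 1,            # #c (constant stored in a word)
}

def instruction_cost(source_mode, dest_mode):
    # 1 word for the instruction itself plus the operands' added costs
    return 1 + ADDED_COST[source_mode] + ADDED_COST[dest_mode]

print(instruction_cost("register", "absolute"))          # MOV R0, M     -> 2
print(instruction_cost("indirect_indexed", "absolute"))  # MOV *4(R0), M -> 3
print(instruction_cost("literal", "register"))           # MOV #1, R0    -> 2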

Code Generation Algorithm

The code generation algorithm is the core of the compiler back end. It sets up register and address descriptors, then generates machine instructions that implement each intermediate-code statement at the CPU level.
The algorithm can be split into four parts: register descriptor set-up, basic block generation, instruction generation for operations on registers (e.g., addition), and instruction scheduling, with each basic block ending in a jump or return.
Register Descriptor Set-Up: This part initializes the descriptors. A register descriptor records, for each register, which names currently have their value in it; an address descriptor records, for each name, where its current value can be found (a register, a stack slot, or a memory address). All registers start out empty.
Basic Block Generation: This step partitions the intermediate code into basic blocks (straight-line stretches of code with a single entry and a single exit) and records the edges between them, so we can keep track of where values must be live at any given moment during execution.
Instruction Generation for Operations on Registers: This step converts intermediate-code statements into machine instructions using the descriptors. It can also be thought of as register allocation, because it is where we determine which registers are used for each operation and how many are needed in total. It uses the information set up in the previous steps, along with rules about how many registers certain operations require. For example, a binary addition typically needs two registers: one holding the value being added and one receiving the result.
Instruction Scheduling: This step reorders instructions so that they execute efficiently on a particular CPU architecture. It uses information about the execution resources available on the target to determine the best order for executing operations. It also considers things like whether there are enough free registers to hold values, or whether there is a bottleneck somewhere else in the pipeline.

Design of the Function getReg

The function getReg returns the location L in which the result of a statement x := y op z should be computed. It consults the register and address descriptors to make its choice. A common heuristic is: if y resides in a register that holds no other name and y has no further use after this statement, return that register, so the result simply overwrites y; otherwise, return an empty register if one is available; otherwise, choose an occupied register, spill its contents to memory, and return it. If no register can profitably be used, L may be a memory location.
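A minimal Python sketch of this heuristic follows. The descriptor dictionaries and the next_use table are assumed representations chosen for illustration, not a fixed API:

def get_reg(x, y, reg_desc, addr_desc, next_use):
    # reg_desc: register -> set of names it holds; next_use: name -> used again?
    # 1. If y's register holds only y and y is dead afterwards, reuse it.
    for r, names in reg_desc.items():
        if names == {y} and not next_use.get(y, False):
            return r
    # 2. Otherwise return an empty register, if one exists.
    for r, names in reg_desc.items():
        if not names:
            return r
    # 3. Otherwise pick a victim to spill; a real allocator chooses the cheapest.
    #    The caller must emit MOV stores for the victim's names before reuse.
    return next(iter(reg_desc))

regs = {"R0": {"y"}, "R1": {"w"}}
print(get_reg("x", "y", regs, {}, {"y": False}))   # R0: overwrite the dead y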
The output of this phase is a sequence of machine instructions that can be executed with the help of a runtime system. The code generator produces either assembly language or object code for the target computer. It takes as input an intermediate representation (sometimes called a compiler IR), which has been processed by the parser and type checker but not yet lowered into machine code.
Object code is usually in a format specific to the target architecture, such as Intel 8086 or Motorola 68000.
The compiler front end parses source code and performs some initial analysis on it. It then passes this data through several phases of compilation, which turn it into machine instructions that can run on a computer processor.

Conclusion
Creating code generators can be a very complex task. The output of such a code
generator should be as readable and concise as possible, with no extraneous noise or
clutter.

CODE GENERATION ALGORITHM:


Code Generator
The code generator produces the target code for three-address statements. It uses registers to store the operands of the three-address statement.

Example:
Consider the three-address statement x := y + z. It can be translated into the following sequence of code:

MOV y, R0
ADD z, R0
MOV R0, x

Register and Address Descriptors:

o A register descriptor keeps track of what is currently in each register. The register descriptors show that all the registers are initially empty.
o An address descriptor is used to store the location where the current value of a name can be found at run time.

A code-generation algorithm:
The algorithm takes a sequence of three-address statements as input. For each three-address statement of the form x := y op z, it performs the following actions (a small sketch of this loop follows the steps):

1. Invoke the function getreg to find the location L where the result of the computation y op z should be stored.
2. Consult the address descriptor for y to determine y', its current location. If the value of y is currently in both memory and a register, prefer the register as y'. If the value of y is not already in L, generate the instruction MOV y', L to place a copy of y in L.
3. Generate the instruction OP z', L, where z' is the current location of z. If z is in both a register and memory, prefer the register. Update the address descriptor of x to indicate that x is in location L. If x is in L, update its descriptor and remove x from all other descriptors.
4. If the current values of y or z have no next uses, are not live on exit from the block, and are in registers, alter the register descriptors to indicate that, after execution of x := y op z, those registers will no longer contain y or z.
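Below is a hedged, runnable Python sketch of this loop for statements x := y op z. The descriptor representation, the register names, and the simple "reuse y's register, else first free register" choice are assumptions; spilling and the liveness updates of step 4 are omitted for brevity:

OPCODES = {"+": "ADD", "-": "SUB"}

def gen_statement(x, op, y, z, reg_desc, addr_desc, code):
    # Step 1: choose L -- reuse y's register if y already sits in one,
    # else grab the first empty register (no spilling in this sketch).
    L = addr_desc[y] if addr_desc[y] in reg_desc else \
        next(r for r, held in reg_desc.items() if not held)
    # Step 2: if y is not already in L, emit MOV y', L.
    if addr_desc[y] != L:
        code.append(f"MOV {addr_desc[y]}, {L}")
    # Step 3: emit OP z', L, using z's current location.
    code.append(f"{OPCODES[op]} {addr_desc[z]}, {L}")
    # Step 4 (simplified): L now holds x, and x lives in L.
    reg_desc[L] = {x}
    addr_desc[x] = L

regs = {"R0": set(), "R1": set()}
addrs = {"a": "a", "b": "b", "c": "c"}     # names start out in memory
code = []
gen_statement("t", "-", "a", "b", regs, addrs, code)   # t := a - b
gen_statement("u", "-", "a", "c", regs, addrs, code)   # u := a - c
gen_statement("v", "+", "t", "u", regs, addrs, code)   # v := t + u
print(code)  # ['MOV a, R0', 'SUB b, R0', 'MOV a, R1', 'SUB c, R1', 'ADD R1, R0']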

Generating Code for Assignment Statements:

The assignment statement d := (a-b) + (a-c) + (a-c) can be translated into the following sequence of three-address code:

t := a - b
u := a - c
v := t + u
d := v + u

Code sequence for the example is as follows:

Statement    Code Generated   Register descriptor    Address descriptor

                              Registers empty

t := a - b   MOV a, R0        R0 contains t          t in R0
             SUB b, R0

u := a - c   MOV a, R1        R0 contains t          t in R0
             SUB c, R1        R1 contains u          u in R1

v := t + u   ADD R1, R0       R0 contains v          u in R1
                              R1 contains u          v in R0

d := v + u   ADD R1, R0       R0 contains d          d in R0
             MOV R0, d                               d in R0 and memory

LINKER AND LOADER:

1. Linker: A linker is a special program that combines the object files generated by the compiler/assembler with other pieces of code to produce an executable file (one with a .exe extension). In the object file, the linker searches for and appends all libraries needed for execution of the file. It regulates the memory space that will hold the code from each module. It also merges two or more separate object programs and establishes links among them. Generally, linkers are of two types:
1. Linkage Editor
2. Dynamic Linker
2. Loader: A loader is a special program that takes the executable file from the linker as input, loads it into main memory, and prepares this code for execution by the computer. The loader allocates memory space to the program and also resolves symbolic references between objects. It is in charge of loading programs and libraries in the operating system. Embedded computer systems don't have loaders; in them, code is executed through ROM. There are the following various loading schemes:
1. Absolute Loaders
2. Relocating Loaders
3. Direct Linking Loaders
4. Bootstrap Loaders
Differences between Linker and Loader are as follows:

Main function: The main function of the linker is to generate executable files, whereas the main objective of the loader is to load executable files into main memory.

Input: The linker takes as input the object code generated by the compiler/assembler, and the loader takes as input the executable file generated by the linker.

Process: Linking can be defined as the process of combining various pieces of code and source code to obtain executable code, while loading can be defined as the process of loading executable code into main memory for further execution.

Types: Linkers are of 2 types: Linkage Editor and Dynamic Linker. Loaders are of 4 types: Absolute, Relocating, Direct Linking, Bootstrap.

Other uses: Another use of the linker is to combine all object modules; the loader helps in allocating addresses to executable codes/files.

Address space: The linker is also responsible for arranging objects in the program's address space, while the loader is responsible for adjusting references which are used within the program.
INTRODUCTION TO ASSEMBLER:
An assembler is a program for converting instructions written in low-level assembly code into relocatable machine code, and generating information for the loader along with it.

It generates instructions by evaluating the mnemonics (symbols) in the operation field and finding the values of symbols and literals to produce machine code. If the assembler does all this work in one scan, it is called a single-pass assembler; if it does it in multiple scans, it is called a multiple-pass assembler. Here the assembler divides these tasks between two passes:
 Pass-1:
1. Define symbols and literals and remember them in symbol table and
literal table respectively.
2. Keep track of location counter
3. Process pseudo-operations
 Pass-2:
1. Generate object code by converting symbolic op-code into respective
numeric op-code
2. Generate data for literals and look for values of symbols
First, we will take a small assembly language program to understand the working of the respective passes. Assembly language statement format:
[Label] [Opcode] [Operand]

Example: M ADD R1, ='3'


where, M - Label; ADD - symbolic opcode;
R1 - symbolic register operand; (='3') - Literal

Assembly Program:
Label Op-code operand LC value(Location counter)
JOHN START 200
MOVER R1, ='3' 200
MOVEM R1, X 201
L1 MOVER R2, ='2' 202
LTORG 203
X DS 1 204
END 205
Let's take a look at how this program works:
1. START: This instruction starts the execution of the program from location 200, and the label with START provides the name of the program (JOHN is the name of the program).
2. MOVER: It moves the content of the literal (='3') into register operand R1.
3. MOVEM: It moves the content of the register into the memory operand (X).
4. MOVER: It again moves the content of the literal (='2') into register operand R2; its label is L1.
5. LTORG: It assigns an address (the current LC value) to the literals.
6. DS (Data Space): It assigns a data space of 1 to symbol X.
7. END: It finishes the program execution.
Working of Pass-1: Define Symbol and literal table with their addresses.
Note: Literal address is specified by LTORG or END.
Step-1: START 200 (here no symbol or literal is found, so both tables would be empty)
Step-2: MOVER R1, =’3′ 200 ( =’3′ is a literal so literal table is made)
Literal Address

=’3′ –––

Step-3: MOVEM R1, X 201

X is a symbol referred to prior to its declaration, so it is stored in the symbol table with a blank address field.
Symbol Address

X –––

Step-4: L1 MOVER R2, =’2′ 202


L1 is a label and =’2′ is a literal so store them in respective tables
Symbol Address

X –––

L1 202

Literal Address

=’3′ –––

=’2′ –––

Step-5: LTORG 203


Assign address to first literal specified by LC value, i.e., 203
Literal Address

=’3′ 203

=’2′ –––

Step-6: X DS 1 204
This is a data declaration statement, i.e., X is assigned a data space of 1. But X is a symbol which was referred to in step 3 and defined here in step 6. This condition is called the Forward Reference Problem, where a variable is referred to prior to its declaration; it can be solved by back-patching. So now the assembler will assign X the address specified by the LC value of the current step.
Symbol Address

X 204

L1 202

Step-7: END 205

The program finishes execution, and the remaining literal gets the address specified by the LC value of the END instruction. Here are the complete symbol and literal tables made by pass 1 of the assembler.
Symbol Address

X 204

L1 202

Literal Address

=’3′ 203

=’2′ 205
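Pass-1's table building can be sketched in a few lines of Python. The listing above is hard-coded; opcode handling is heavily simplified, forward references are not modeled, and, following the walkthrough, LTORG places one pending literal while END places the rest (all names here are assumptions for illustration):

program = [
    ("JOHN", "START", "200"), (None, "MOVER", "R1, ='3'"),
    (None, "MOVEM", "R1, X"), ("L1", "MOVER", "R2, ='2'"),
    (None, "LTORG", ""), ("X", "DS", "1"), (None, "END", ""),
]

symtab, littab, lc = {}, {}, 0
for label, opcode, operand in program:
    if opcode == "START":
        lc = int(operand)               # program is loaded from this address
        continue
    if label:
        symtab[label] = lc              # define the label at the current LC
    for part in operand.split(","):     # collect literals, address pending
        if part.strip().startswith("='"):
            littab.setdefault(part.strip(), None)
    if opcode in ("LTORG", "END"):
        pending = [lit for lit, addr in littab.items() if addr is None]
        for lit in (pending[:1] if opcode == "LTORG" else pending):
            littab[lit] = lc            # each literal occupies one word
            lc += 1
        continue
    lc += 1

print(symtab)   # {'L1': 202, 'X': 204}
print(littab)   # {"='3'": 203, "='2'": 205}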

Now the tables generated by pass 1, along with their LC values, go to pass-2 of the assembler for further processing of pseudo-op-codes and machine op-codes.
Working of Pass-2:
Pass-2 of the assembler generates machine code by converting symbolic machine op-codes into their respective bit configurations (machine-understandable form). It stores all machine op-codes in the MOT (machine op-code table) with the symbolic code, their length, and their bit configuration. It also processes pseudo-ops and stores them in the POT (pseudo-op table).
Various Data bases required by pass-2:
1. MOT table(machine opcode table)
2. POT table(pseudo opcode table)
3. Base table(storing value of base register)
4. LC ( location counter)
Fig: flowchart of pass-2; overall working of the assembler
Types of Loader :-
There are eight (8) general loader schemes available, but the main loader schemes are the first four (1, 2, 3, 4). They are –
      1.    Absolute Loader.
      2.    Relocating Loader.
      3.    Direct Linking Loader.
      4.    Dynamic Loader.
      5.    Assemble-and-go or Compile-and-go Loader.
      6.    Bootstrap Loader.
      7.    Linking Loader.
      8.    Relocation Loader.
Absolute Loader :- It is the simplest type of loader scheme. In this scheme the loader simply accepts the machine language code produced by the assembler and places it into main memory at the location specified by the assembler. The task of an absolute loader is virtually trivial. An absolute loader is simple to implement, but it has several disadvantages –

First : The programmer must specify to the assembler the address in main memory where the program is to be loaded.
Second : The programmer must remember the addresses of multiple sub-programs and how they reference one another.

                                           Fig - Absolute Loader

Relocating Loader :- To avoid possible reassembly of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.

            An example of a relocating loader is the Binary Symbolic Subroutine (BSS) loader; the IBM 7094, IBM 1130 and GE 635 used BSS loaders. The input to a relocating loader is the object program together with information about all other programs it references. In addition, there is relocation information for the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.

Direct Linking Loader :- The direct linking loader is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments. It also has the advantage of giving the programmer complete freedom in referencing data or instructions contained in other segments. This provides flexibility of inter-segment referencing and accessing while at the same time allowing independent translation of programs.

Dynamic Loader :- In the dynamic loader scheme, the part of the loader that actually intercepts the "calls" and loads the necessary procedures is called the overlay supervisor, or simply the flipper. This overall scheme is called Dynamic Loading or Load-on-Call.
                     An advantage of the dynamic loader is that no overhead is incurred unless the procedure to be called or referenced is actually used. Also, the system can be dynamically reconfigured.
          The major drawback is that we have postponed most of the binding process until execution time.

Assemble-and-go or Compile-and-go Loader :-

One method of performing the loader function is to have the assembler run in one part of memory and place the machine instructions and data directly into their assigned memory locations. This is a simple solution, involving no extra procedures. It is used by the WATFOR FORTRAN compiler and several other language processors (e.g., PL/I). Such a loading scheme is commonly called a compile-and-go or assemble-and-go loader.

Bootstrap Loader :- When a computer is first turned on or restarted, a special type of absolute loader is executed, called the bootstrap loader. It loads the operating system into main memory and executes the related programs. It is added to the beginning of all object programs that are to be loaded into an empty, idle system.

Linking Loader :- The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory; the standard functions, for instance, must reside in main memory. The linking function makes the addresses of the programs known to each other, so that such transfers can take place during execution.

Relocation Loader :- Another function commonly performed by a loader is program relocation. Relocation is simply moving a program from one area to another in storage; it refers to the adjustment of address fields, not to the physical movement of the program. The task of relocation is to add some constant value to each relative address in the segment. The part of a loader which performs relocation is called a relocating loader.
FOUR BASIC FUNCTIONS OF LOADER (IMP):
A loader performs its task via four functions, which are as follows:
1. Allocation: It allocates memory for the program in main memory.
2. Linking: It combines two or more separate object programs or modules and supplies the necessary information.
3. Relocation: It modifies the object program so that it can be loaded at an address different from the original location.
4. Loading: It brings the object program into main memory for execution.

A general loading scheme is shown below:
Functions of Loader

Allocation: 

In order to allocate memory to the program, the loader allocates memory on the basis of the size of the program; this is known as allocation. The loader gives the space in memory where the object program will be loaded for execution.

Linking: 

The linker resolves the symbolic references to code or data between the object modules by supplying the addresses of all user subroutines and library subroutines. This process is known as linking. A program written in any language has functions, which can be user-defined or library functions. For example, in the C language we have the printf() function. When program control reaches the line where printf() is written, the linker comes into the picture and links that line to the module where the actual implementation of the printf() function is written.

Relocation:

There are some address-dependent locations in the program, and these address constants must be modified to fit the available space; this is done by the loader and is known as relocation. In order to allow the object program to be loaded at an address different from the one initially supplied, the loader modifies the object program by modifying specific instructions.

Loading:
The loader loads the program into main memory for execution. It loads the machine instructions and data of the related programs and subroutines into main memory; this process is known as loading. The loader performs loading; hence, the assembler must provide the loader with the object program.
e.g., Absolute Loader

Absolute Loader:

The absolute loader transfers the text of the program into memory at the address
provided by the assembler after reading the object program line by line. There are two
types of information that the object program must communicate from the assembler to the
loader.
It must convey the machine instructions that the assembler has created along with the
memory address.
It must convey the start of the execution. At this point, the software will begin to run after
it has loaded.
The object program is the sequence of the object records. Each object record specifies
some specific aspect of the program in the object module. There are two types of
records:
A text record containing a binary image of the assembled program.
A transfer record that contains the execution's starting or entry point.
The formats of text and transfer records are shown below:

Algorithm:

The algorithm for the absolute loader is quite simple. The object file is read record by
record by the loader, and the binary image is moved to the locations specified in the
record. The final record is a transfer record. When the control reaches the transfer
record, it is transferred to the entry point for execution.
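A minimal Python sketch of this algorithm follows; the record encoding ("T" text records carrying a load address and words, "E" transfer record carrying the entry point) is an assumption made for illustration:

def absolute_load(object_records, memory):
    for record in object_records:
        if record[0] == "T":                    # text record: copy the image
            _, address, words = record
            memory[address:address + len(words)] = words
        elif record[0] == "E":                  # transfer record: entry point
            return record[1]                    # control is handed over here
    raise ValueError("object program has no transfer record")

memory = [0] * 1024
entry = absolute_load([("T", 200, [0x1E01, 0x2E02]), ("E", 200)], memory)
print(entry, memory[200:202])   # 200 [7681, 11778]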
Flowchart:

PHASES OF COMPILER:
Compiler Phases
The compilation process contains the sequence of various phases. Each phase takes source
program in one representation and produces output in another representation. Each phase takes
input from its previous stage.

There are the various phases of compiler:

Fig: phases of compiler

Lexical Analysis:
The lexical analyzer phase is the first phase of the compilation process. It takes source code as input. It reads the source program one character at a time and converts it into meaningful lexemes. The lexical analyzer represents these lexemes in the form of tokens.

Syntax Analysis
Syntax analysis is the second phase of the compilation process. It takes tokens as input and generates a parse tree as output. In the syntax analysis phase, the parser checks whether the expression made by the tokens is syntactically correct.

Semantic Analysis
Semantic analysis is the third phase of the compilation process. It checks whether the parse tree follows the rules of the language. The semantic analyzer keeps track of identifiers, their types, and expressions. The output of the semantic analysis phase is the annotated syntax tree.

Intermediate Code Generation

In intermediate code generation, the compiler translates the source code into intermediate code. Intermediate code is generated between the high-level language and the machine language. The intermediate code should be generated in such a way that it can be easily translated into the target machine code.

Code Optimization
Code optimization is an optional phase. It is used to improve the intermediate code so that the
output of the program could run faster and take less space. It removes the unnecessary lines of the
code and arranges the sequence of statements in order to speed up the program execution.

Code Generation
Code generation is the final stage of the compilation process. It takes the optimized intermediate
code as input and maps it to the target machine language. Code generator translates the
intermediate code into the machine code of the specified computer.

DERIVATION:
Left-most Derivation
In the leftmost derivation, the input is scanned and the leftmost non-terminal is replaced by a production rule first. So in leftmost derivation, we read the input string from left to right.

Example:
Production rules:

1. S = S + S  
2. S = S - S  
3. S = a | b |c  

Input:

a - b + c

The left-most derivation is:

1. S = S + S  
2. S = S - S + S  
3. S = a - S + S  
4. S = a - b + S  
5. S = a - b + c  

Right-most Derivation
In the rightmost derivation, the input is scanned and the rightmost non-terminal is replaced by a production rule first. So in rightmost derivation, we read the input string from right to left.

Example:

1. S = S + S  
2. S = S - S  
3. S = a | b |c  

Input:

a - b + c

The right-most derivation is:

1. S = S - S  
2. S = S - S + S  
3. S = S - S + c  
4. S = S - b + c  
5. S = a - b + c  

PARSE TREE:
Parse tree
o Parse tree is the graphical representation of symbol. The symbol can be terminal or non-
terminal.
o In parsing, the string is derived using the start symbol. The root of the parse tree is that
start symbol.
o It is the graphical representation of symbol that can be terminals or non-terminals.
o Parse tree follows the precedence of operators. The deepest sub-tree traversed first. So, the
operator in the parent node has less precedence over the operator in the sub-tree.

The parse tree follows these points:


o All leaf nodes have to be terminals.
o All interior nodes have to be non-terminals.
o In-order traversal gives original input string.

Example:
Production rules:

1. T= T + T | T * T  
2. T = a|b|c  

Input:

a * b + c

Steps 1 through 5 expand the tree one production at a time until its in-order traversal yields a * b + c. (The step-by-step tree figures are omitted.)
Parser
A parser is the phase of the compiler that breaks the data coming from the lexical analysis phase into smaller elements.

A parser takes input in the form of a sequence of tokens and produces output in the form of a parse tree.

Parsing is of two types: top-down parsing and bottom-up parsing.

Top-down parsing

o Top-down parsing is also known as recursive parsing or predictive parsing.
o Top-down parsing is used to construct a parse tree for an input string starting from the root.
o In top-down parsing, the parsing starts from the start symbol and transforms it into the input symbols.

Parse Tree representation of input string "acdb" is as follows:

Bottom up parsing
o Bottom-up parsing is also known as shift-reduce parsing.
o Bottom-up parsing is used to construct a parse tree for an input string.
o In bottom-up parsing, the parsing starts with the input symbols and constructs the parse tree up to the start symbol by tracing out the rightmost derivation of the string in reverse.

Example
Production

1. E → T  
2. T → T * F  
3. T → id  
4. F → T  
5. F → id  

Parse Tree representation of input string "id * id" is as follows:


Bottom-up parsing is classified into various types, as follows:

1. Shift-Reduce Parsing
2. Operator Precedence Parsing
3. Table Driven LR Parsing

a. LR( 1 )
b. SLR( 1 )
c. CLR ( 1 )
d. LALR( 1 )

Shift reduce parsing

o Shift-reduce parsing is a process of reducing a string to the start symbol of a grammar.
o Shift-reduce parsing uses a stack to hold grammar symbols and an input tape to hold the string.
o Shift-reduce parsing performs two actions: shift and reduce. That is why it is known as shift-reduce parsing.
o On a shift action, the current symbol in the input string is pushed onto the stack.
o On each reduction, the symbols matching the right side of a production are replaced by the non-terminal on its left side.

Example:
Grammar:

1. S → S+S    
2. S → S-S    
3. S → (S)  
4. S → a  

Input string:

a1-(a2+a3)

Parsing table:
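A stack/input/action trace for this string, reconstructed from the grammar above (so treat the exact action sequence as illustrative), is:

Stack            Input           Action
$                a1-(a2+a3)$     shift a1
$ a1             -(a2+a3)$       reduce S → a
$ S              -(a2+a3)$       shift -
$ S -            (a2+a3)$        shift (
$ S - (          a2+a3)$         shift a2
$ S - ( a2       +a3)$           reduce S → a
$ S - ( S        +a3)$           shift +
$ S - ( S +      a3)$            shift a3
$ S - ( S + a3   )$              reduce S → a
$ S - ( S + S    )$              reduce S → S+S
$ S - ( S        )$              shift )
$ S - ( S )      $               reduce S → (S)
$ S - S          $               reduce S → S-S
$ S              $               accept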
There are two main categories of shift reduce parsing as follows:

1. Operator-Precedence Parsing
2. LR-Parser

Operator precedence parsing

Operator precedence parsing is a kind of shift-reduce parsing method. It is applied to a small class of operator grammars.

A grammar is said to be an operator precedence grammar if it has two properties:

o No R.H.S. of any production contains ε.

o No two non-terminals are adjacent.

Operator precedence relations can only be established between the terminals of the grammar; non-terminals are ignored.
There are three operator precedence relations:
a ⋗ b means that terminal "a" has higher precedence than terminal "b".


a ⋖ b means that terminal "a" has lower precedence than terminal "b".

a ≐ b means that terminals "a" and "b" have the same precedence.

Precedence table:

Parsing Action

o Add the $ symbol at both ends of the given input string.

o Now scan the input string from left to right until a ⋗ is encountered.
o Scan towards the left over all the equal precedences until the first leftmost ⋖ is encountered.
o Everything between the leftmost ⋖ and the rightmost ⋗ is a handle.
o $ on $ means parsing is successful. (A skeleton of this loop is sketched below.)
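A Python skeleton of this loop is shown here. The precedence table is a hand-built stub for the grammar E → E+T | T, T → T*F | F, F → id, and reduced non-terminals are not tracked on the stack, which is enough to recognize acceptance for this input; treat it as an illustration rather than a complete parser:

# PREC[a][b]: relation between the terminal a on the stack and lookahead b.
PREC = {
    "+":  {"+": ">", "*": "<", "id": "<", "$": ">"},
    "*":  {"+": ">", "*": ">", "id": "<", "$": ">"},
    "id": {"+": ">", "*": ">", "$": ">"},
    "$":  {"+": "<", "*": "<", "id": "<", "$": "A"},   # A = accept ($ on $)
}

def op_precedence_parse(tokens):
    stack = ["$"]
    tokens = tokens + ["$"]
    i = 0
    while True:
        rel = PREC[stack[-1]][tokens[i]]
        if rel == "A":
            return True                  # $ on $: parsing is successful
        if rel in ("<", "="):            # shift the lookahead
            stack.append(tokens[i])
            i += 1
        else:                            # '>': pop back to the leftmost <.
            while PREC[stack[-2]][stack[-1]] != "<":
                stack.pop()
            stack.pop()                  # the popped span is the handle

print(op_precedence_parse(["id", "+", "id", "*", "id"]))   # True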

Example
Grammar:

1. E → E+T/T  
2. T → T*F/F  
3. F → id  

Given string:

1. w = id + id * id  

Let us consider a parse tree for it as follows:


On the basis of above tree, we can design following operator precedence table:

Now let us process the string with the help of the above precedence table:
LR Parser
LR parsing is one type of bottom-up parsing. It is used to parse a large class of grammars.

In LR parsing, "L" stands for left-to-right scanning of the input.

"R" stands for constructing a rightmost derivation in reverse.

"K" is the number of lookahead input symbols used to make parsing decisions.

LR parsing is divided into four parts: LR (0) parsing, SLR parsing, CLR parsing and LALR parsing.

LR algorithm:
The LR algorithm requires a stack, input, output, and a parsing table. In all types of LR parsing, the input, output, and stack are the same, but the parsing table differs.
Fig: Block diagram of LR parser

The input buffer is used to indicate the end of the input; it contains the string to be parsed, followed by a $ symbol.

A stack is used to contain a sequence of grammar symbols with a $ at the bottom of the stack.

Parsing table is a two dimensional array. It contains two parts: Action part and Go To part.

LR (1) Parsing
Various steps involved in the LR (1) Parsing:

o For the given input string write a context free grammar.


o Check the ambiguity of the grammar.
o Add Augment production in the given grammar.
o Create Canonical collection of LR (0) items.
o Draw the DFA (deterministic finite automaton).
o Construct a LR (1) parsing table.

Augment Grammar
Augmented grammar G` will be generated if we add one more production in the given grammar G.
It helps the parser to identify when to stop the parsing and announce the acceptance of the input.

Example
Given grammar

1. S → AA  
2. A → aA | b  

The Augment grammar G` is represented by


1. S`→ S  
2. S → AA  
3. A → aA | b  

Canonical Collection of LR(0) items

An LR (0) item is a production of G with a dot at some position on the right side of the production.

LR (0) items are useful to indicate how much of the input has been scanned up to a given point in the process of parsing.

In LR (0), we place the reduce move in the entire row.
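The closure and goto operations used in the walkthrough below can be sketched directly in Python. This is an illustrative implementation for the grammar S` → S, S → AA, A → aA | b, with items represented as (head, body, dot-position) tuples, a representation assumed here for convenience:

GRAMMAR = [("S`", ("S",)), ("S", ("A", "A")), ("A", ("a", "A")), ("A", ("b",))]
NONTERMINALS = {"S`", "S", "A"}

def closure(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(items):
            if dot < len(body) and body[dot] in NONTERMINALS:
                for h, b in GRAMMAR:            # add X -> •gamma for X after the dot
                    if h == body[dot] and (h, b, 0) not in items:
                        items.add((h, b, 0))
                        changed = True
    return frozenset(items)

def goto(items, symbol):
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == symbol}
    return closure(moved)                       # advance the dot, then close

I0 = closure({("S`", ("S",), 0)})
print(len(I0))              # 4 items: S` -> •S, S -> •AA, A -> •aA, A -> •b
print(len(goto(I0, "A")))   # 3 items (I2): S -> A•A, A -> •aA, A -> •b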

Example
Given grammar:


1. S → AA  
2. A → aA | b  

Add Augment Production and insert '•' symbol at the first position for every production in G

1. S` → •S  
2. S → •AA  
3. A → •aA   
4. A → •b  

I0 State:
Add Augment production to the I0 State and Compute the Closure

I0 = Closure (S` → •S)


Add all productions starting with S in to I0 State because "•" is followed by the non-terminal. So,
the I0 State becomes
I0 = S` → •S
       S → •AA

Add all productions starting with "A" in modified I0 State because "•" is followed by the non-
terminal. So, the I0 State becomes.

I0= S` → •S
       S → •AA
       A → •aA
       A → •b

I1= Go to (I0, S) = closure (S` → S•) = S` → S•

Here, the production is reduced, so close the state.

I1= S` → S•

I2= Go to (I0, A) = closure (S → A•A)

Add all productions starting with A in to I2 State because "•" is followed by the non-terminal. So,
the I2 State becomes

I2 =S→A•A
       A → •aA
       A → •b

Go to (I2,a) = Closure (A → a•A) = (same as I3)

Go to (I2, b) = Closure (A → b•) = (same as I4)

I3= Go to (I0,a) = Closure (A → a•A)

Add productions starting with A in I3.

A → a•A
A → •aA
A → •b

Go to (I3, a) = Closure (A → a•A) = (same as I3)


Go to (I3, b) = Closure (A → b•) = (same as I4)

I4= Go to (I0, b) = closure (A → b•) = A → b•


I5 = Go to (I2, A) = Closure (S → AA•) = S → AA•
I6 = Go to (I3, A) = Closure (A → aA•) = A → aA•

Drawing DFA:
The DFA contains the 7 states I0 to I6.
LR(0) Table
o If a state goes to some other state on a terminal, it corresponds to a shift move.
o If a state goes to some other state on a variable, it corresponds to a go-to move.
o If a state contains a final item, write the reduce action in the entire row.

Explanation:

o I0 on S is going to I1 so write it as 1.
o I0 on A is going to I2 so write it as 2.
o I2 on A is going to I5 so write it as 5.
o I3 on A is going to I6 so write it as 6.
o I0, I2 and I3 on 'a' go to I3, so write it as S3, which means shift 3.
o I0, I2 and I3 on 'b' go to I4, so write it as S4, which means shift 4.
o I4, I5 and I6 all contain final items, because they contain • at the rightmost end. So write the reduce entry with the production number.
Productions are numbered as follows:

S → AA    ... (1)
A → aA    ... (2)
A → b     ... (3)

o I1 contains the final item (S` → S•), so action {I1, $} = Accept.
o I4 contains the final item A → b•, and that production corresponds to production number 3, so write r3 in the entire row.
o I5 contains the final item S → AA•, and that production corresponds to production number 1, so write r1 in the entire row.
o I6 contains the final item A → aA•, and that production corresponds to production number 2, so write r2 in the entire row.

SLR (1) Parsing


SLR (1) refers to simple LR parsing. It is the same as LR (0) parsing; the only difference is in the parsing table. To construct the SLR (1) parsing table, we use the canonical collection of LR (0) items.

In SLR (1) parsing, we place the reduce move only in the FOLLOW of the left-hand side.

Various steps involved in the SLR (1) Parsing:

o For the given input string write a context free grammar


o Check the ambiguity of the grammar
o Add Augment production in the given grammar
o Create Canonical collection of LR (0) items
o Draw the DFA (deterministic finite automaton)
o Construct a SLR (1) parsing table

SLR (1) Table Construction

The steps used to construct the SLR (1) table are given below:

If a state (Ii) is going to some other state (Ij) on a terminal, then it corresponds to a shift move in the action part.

If a state (Ii) is going to some other state (Ij) on a variable, then it corresponds to a go-to move in the Go to part.

If a state (Ii) contains a final item like A → ab•, which has no transition to a next state, then the production is known as a reduce production. For all terminals X in FOLLOW (A), write the reduce entry along with the production number.

Example

S → •Aa
A → αβ•

Follow (S) = {$}
Follow (A) = {a}

SLR ( 1 ) Grammar
S → E
E → E + T | T
T → T * F | F
F → id

Add Augment Production and insert '•' symbol at the first position for every production in G

S` → •E
E → •E + T
E → •T
T → •T * F
T → •F
F → •id

I0 State:

Add Augment production to the I0 State and Compute the Closure

I0 = Closure (S` → •E)

Add all productions starting with E in to I0 State because "." is followed by the non-terminal. So,
the I0 State becomes

I0 = S` → •E
        E → •E + T
        E → •T

Add all productions starting with T and F in modified I0 State because "." is followed by the non-
terminal. So, the I0 State becomes.

I0= S` → •E
       E → •E + T
       E → •T
       T → •T * F
       T → •F
       F → •id

I1 = Go to (I0, E) = closure (S` → E•, E → E• + T)

I2 = Go to (I0, T) = closure (E → T•, T → T• * F)
I3 = Go to (I0, F) = Closure (T → F•) = T → F•
I4 = Go to (I0, id) = closure (F → id•) = F → id•
I5 = Go to (I1, +) = Closure (E → E + •T)

Add all productions starting with T and F in I5 State because "." is followed by the non-terminal.
So, the I5 State becomes

I5 = E → E +•T
       T → •T * F
       T → •F
       F → •id

Go to (I5, F) = Closure (T → F•) = (same as I3)


Go to (I5, id) = Closure (F → id•) = (same as I4)

I6= Go to (I2, *) = Closure (T → T * •F)

Add all productions starting with F in I6 State because "." is followed by the non-terminal. So, the
I6 State becomes

I6 = T → T * •F
         F → •id

Go to (I6, id) = Closure (F → id•) = (same as I4)

I7= Go to (I5, T) = Closure (E → E + T•) = E → E + T•


I8= Go to (I6, F) = Closure (T → T * F•) = T → T * F•

Drawing DFA:
SLR (1) Table

Explanation:
First (E) = First (E + T) ∪ First (T)
First (T) = First (T * F) ∪ First (F)
First (F) = {id}
First (T) = {id}
First (E) = {id}
Follow (E) = First (+T) ∪ {$} = {+, $}
Follow (T) = First (*F) ∪ Follow (E) = {*, +, $}
Follow (F) = Follow (T) = {*, +, $}
o I1 contains the final item S` → E•, and follow (S`) = {$}, so action {I1, $} = Accept.
o I2 contains the final item E → T•, and follow (E) = {+, $}, so action {I2, +} = R2, action {I2, $} = R2.
o I3 contains the final item T → F•, and follow (T) = {+, *, $}, so action {I3, +} = R4, action {I3, *} = R4, action {I3, $} = R4.
o I4 contains the final item F → id•, and follow (F) = {+, *, $}, so action {I4, +} = R5, action {I4, *} = R5, action {I4, $} = R5.
o I7 contains the final item E → E + T•, and follow (E) = {+, $}, so action {I7, +} = R1, action {I7, $} = R1.
o I8 contains the final item T → T * F•, and follow (T) = {+, *, $}, so action {I8, +} = R3, action {I8, *} = R3, action {I8, $} = R3.

Q. Explain phase recovery and panic mode error recovery in LR parsing.

An LR parser basically uses the following two techniques to detect and recover from errors:
1. Syntactic phase recovery
2. Panic mode recovery

Syntactic Phase Recovery:

Syntactic Phase Recovery Follows the Given Steps:


1. Programmer mistakes that trigger the error entries in the parsing table are determined based on the language.
2. Error procedures are created that can alter the top of the stack and/or certain symbols on the input in a way that is appropriate for the table's error entries.
There are some of the errors that are detected during the syntactic phase recovery:
1. Errors in structure
2. Missing operator
3. Misspelled keywords
4. Unbalanced parenthesis

Panic Mode Recovery:

This approach involves removing consecutive characters from the input one by one until a set of synchronizing tokens is obtained. Delimiters, such as semicolons or closing braces, are typical synchronizing tokens. The benefit is that it is simple to implement and ensures that you do not end up in an infinite loop. The drawback is that a significant quantity of input is skipped without being checked for additional problems.
Panic mode recovery follows the given steps:
1. Scan the stack until you find a state ‘a’ with a goto() on a certain non-terminal
‘B’ (by removing states from the stack).
2. Until a symbol ‘b’ that can follow ‘B’ is identified, zero or more input symbols are
rejected.
SDT for Simple Expressions:
Syntax Directed Translation :
Syntax directed translation is used for semantic analysis. An SDT is basically a grammar together with semantic actions and is used to construct the parse tree. The grammar decides which construct has the highest priority and is handled first, and the semantic action decides what action is performed for each grammar rule.
Example :
SDT = Grammar + Semantic Action
Grammar = E -> E1+E2
Semantic action = if (E1.type != E2.type) then print "type mismatching"
Application of Syntax Directed Translation :
 SDT is used for Executing Arithmetic Expression.
 In the conversion from infix to postfix expression.
 In the conversion from infix to prefix expression.
 It is also used for Binary to decimal conversion.
 In counting number of Reduction.
 In creating a Syntax tree.
 SDT is used to generate intermediate code.
 In storing information into symbol table.
 SDT is commonly used for type checking also.
Example :
Here, we are going to cover an example of application of SDT for better understanding
the SDT application uses. let’s consider an example of arithmetic expression and then
you will see how SDT will be constructed.
Let’s consider Arithmetic Expression is given.
Input : 2+3*4
output: 14
SDT for the above example.

SDT for 2+3*4

The semantic actions are given as follows.


E -> E+T { E.val = E.val + T.val then print (E.val)}
|T { E.val = T.val}
T -> T*F { T.val = T.val * F.val}
|F { T.val = F.val}
F -> Id {F.val = id}
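These rules can be checked by evaluating the synthesized val attributes bottom-up over the parse tree. In this Python sketch, the tree for 2+3*4 is written out by hand rather than produced by a parser:

def val(node):
    kind = node[0]
    if kind == "id":
        return node[1]          # F -> id        { F.val = id value }
    left, right = val(node[1]), val(node[2])
    if kind == "+":
        return left + right     # E -> E + T     { E.val = E.val + T.val }
    return left * right         # T -> T * F     { T.val = T.val * F.val }

tree = ("+", ("id", 2), ("*", ("id", 3), ("id", 4)))   # parse tree of 2+3*4
print(val(tree))    # 14, matching the output above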

SDT DIFFERENCE:
S-attributed SDT :
If an SDT uses only synthesized attributes, it is called an S-attributed SDT.
S-attributed SDTs are evaluated in bottom-up parsing, as the values of the parent nodes depend upon the values of the child nodes.
Semantic actions are placed at the rightmost place of the RHS.
L-attributed SDT:
If an SDT uses both synthesized attributes and inherited attributes, with the restriction that an inherited attribute can inherit values from left siblings only, it is called an L-attributed SDT.

Attributes in L-attributed SDTs are evaluated in a depth-first, left-to-right parsing manner.

Semantic actions can be placed anywhere in the RHS.

For example,
A -> XYZ {Y.S = A.S, Y.S = X.S, Y.S = Z.S}
is not an L-attributed grammar, since Y.S = A.S and Y.S = X.S are allowed, but Y.S = Z.S violates the L-attributed SDT definition, as the attribute inherits a value from its right sibling.
Note – If a definition is S-attributed, then it is also L-attributed, but NOT vice-versa.

Example – Consider the SDT given below.

P1: S -> MN {S.val = M.val + N.val}
P2: M -> PQ {M.val = P.val * Q.val and P.val = Q.val}

Select the correct option.
A. Both P1 and P2 are S-attributed.
B. P1 is S-attributed and P2 is L-attributed.
C. P1 is L-attributed but P2 is not L-attributed.
D. None of the above
Explanation –
The correct answer is option C because, in P1, S is a synthesized attribute, and synthesized attributes are allowed in an L-attributed definition, so P1 follows the L-attributed definition. But P2 doesn't follow the L-attributed definition, as P depends on Q, which is to its right in the RHS.
1) S-attributed SDT:

 If every attribute is synthesized, then an SDT is called S-attributed SDT.


 If the value of parent nodes depends upon the value of the child nodes, then
S-attributed SDT is evaluated in bottom-up parsing.
 The right-most place of RHS holds the semantic action.

Example 1

PRODUCTION        SEMANTIC RULES

L → E n           L.val = E.val
E → E1 + T        E.val = E1.val + T.val
E → T             E.val = T.val
T → T1 * F        T.val = T1.val * F.val
T → F             T.val = F.val
F → ( E )         F.val = E.val
F → digit         F.val = digit.lexval

The SDD of the above example is an S-attributed SDT because each attribute, L.val,
E.val, T.val, and F.val, is synthesized.
2) L-attributed SDT:

 If an attribute of an SDT is synthesized, or inherited with the restriction that inherited attributes can take values from left siblings only, it is known as an L-attributed SDT.
 Attributes of this SDT are evaluated by depth-first, left-to-right parsing methods.
 Semantic actions can be placed anywhere in the RHS.

Example:
X => ABC {B.P = X.P, B.P = A.P, B.P = C.P}
This is not an L-attributed SDT because B.P = X.P and B.P = A.P are allowed, but B.P = C.P doesn't follow the rule of the L-attributed SDT definition.
