CLR Parser
CLR stands for canonical LR. CLR parsing uses the canonical collection of LR(1) items to construct
the CLR(1) parsing table. CLR(1) parsing produces more states than SLR(1) parsing. In CLR(1),
the reduce entries are placed only under the lookahead symbols.
LALR Parser
LALR Parser stands for Look-Ahead LR Parser. It is intermediate in power between the SLR and CLR parsers. It is a
compaction of the CLR parser, so the tables obtained are smaller than the CLR parsing table.
For constructing the LALR(1) parsing table, the canonical collection of LR(1) items is used. In LALR(1)
parsing, LR(1) items with the same productions but different lookaheads are merged to form a
single set of items. It is often the same as CLR(1) parsing, except for one difference: the parsing table.
The overall structure of all these LR parsers is the same. They differ in size, in the class of
context-free grammars they support, and in their cost in terms of time and space.
Let us see the comparison between SLR, CLR, and LALR Parser.
| SLR Parser | LALR Parser | CLR Parser |
| It is very easy and cheap to implement. | It is also easy and cheap to implement. | It is expensive and difficult to implement. |
| SLR Parser is the smallest in size. | LALR and SLR have the same size, as they have fewer states. | CLR Parser is the largest, as the number of states is very large. |
| Error detection is not immediate in SLR. | Error detection is not immediate in LALR. | Error detection can be done immediately in CLR Parser. |
| SLR fails to produce a parsing table for a certain class of grammars. | It is intermediate in power between SLR and CLR, i.e., SLR ≤ LALR ≤ CLR. | It is very powerful and works on a large class of grammars. |
| It requires less time and space. | It requires more time and space. | It also requires more time and space. |
2. Target program:
The target program is the output of the code generator. The output can be:
a) Assembly language: It is easier to generate and can be assembled into machine code later.
b) Relocatable machine language: It can be loaded into any location in memory, and subprograms can
be compiled separately.
c) Absolute machine language: It can be placed in a fixed location in memory and can be
executed immediately.
3. Memory management
o During the code generation process, symbol table entries have to be mapped to actual data
addresses, and labels have to be mapped to instruction addresses.
o Mapping names in the source program to addresses of data is done cooperatively by the front
end and the code generator.
o Local variables are stack-allocated in the activation record, while global variables are in the
static area.
4. Instruction selection:
o The nature of the instruction set of the target machine should be complete and uniform.
o When considering the efficiency of the target machine, instruction speed and machine
idioms are important factors. For example, if the target machine has an increment instruction
(INC), then a := a + 1 can be implemented more cheaply by the single instruction INC a than by
a MOV/ADD/MOV sequence.
o The quality of the generated code can be determined by its speed and size.
Example:
The Three address code is:
1. a:= b + c
2. d:= a + e
1. MOV b, R0    (R0 ← b)
2. ADD c, R0    (R0 ← c + R0)
3. MOV R0, a    (a ← R0)
4. MOV a, R0    (R0 ← a)
5. ADD e, R0    (R0 ← e + R0)
6. MOV R0, d    (d ← R0)
5. Register allocation
Registers can be accessed faster than memory. Instructions involving register operands are
shorter and faster than those involving memory operands.
Register allocation: In register allocation, we select the set of variables that will reside in registers.
Register assignment: In register assignment, we pick the specific register in which each variable will reside.
Certain machines require even-odd register pairs for some operands and results.
For example:
Consider the following division instruction of the form:
1. D x, y
Where x is the dividend, held in the even register of an even/odd register pair, and y is the divisor.
After division, the even register holds the remainder and the odd register holds the quotient.
6. Evaluation order
The efficiency of the target code can be affected by the order in which the computations are
performed. Some computation orders need fewer registers to hold intermediate results than
others.
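A classic way to exploit this is Sethi-Ullman numbering, a technique not described in the text above: it labels each node of an expression tree with the minimum number of registers needed to evaluate that subtree without spilling, and evaluating the costlier subtree first achieves the minimum. A small sketch in Python (names illustrative):

class Node:
    def __init__(self, op, left=None, right=None):
        self.op, self.left, self.right = op, left, right

def regs_needed(n):
    # Sethi-Ullman labeling: a leaf loads into one register; for an
    # inner node, evaluate the costlier child first and reuse registers.
    if n.left is None and n.right is None:
        return 1
    l, r = regs_needed(n.left), regs_needed(n.right)
    return l + 1 if l == r else max(l, r)

# (a - b) + (a - c): each subtree needs 2 registers, the whole tree 3
tree = Node('+', Node('-', Node('a'), Node('b')),
                 Node('-', Node('a'), Node('c')))
print(regs_needed(tree))   # -> 3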
Actual Parameter: It is used by calling procedures to supply parameters to the called procedures.
Access Link: It is used to refer to non-local data held in other activation records.
Saved Machine Status: It holds information about the status of the machine just before the procedure is
called.
Local Data: It holds the data that is local to the execution of the procedure.
Maximum Pause Time. Garbage collection can cause the running program (the mutator) to
pause suddenly for an extremely long time, as garbage collection kicks in
without warning. Thus, besides minimizing the overall execution time, it is desirable that the
maximum pause time be minimized. As an important special case, real-time applications
require certain computations to be completed within a time limit. We must either suppress
garbage collection while performing real-time tasks, or restrict the maximum pause time. Thus,
garbage collection is seldom used in real-time applications.
Program Locality. We cannot evaluate the speed of a garbage collector solely by its
running time. The garbage collector controls the placement of data and thus influences the
data locality of the mutator program. It can improve a mutator's temporal locality by freeing
up space and reusing it; it can improve the mutator's spatial locality by relocating data used
together in the same cache or pages.
Some of these design goals conflict with one another, and tradeoffs must be made carefully
by considering how programs typically behave. Also, objects of different characteristics may
favor different treatments, requiring a collector to use different techniques for different kinds
of objects.
For example, the number of objects allocated is dominated by small objects, so allocation of
small objects must not incur a large overhead. On the other hand, consider garbage collectors
that relocate reachable objects. Relocation is expensive when dealing with large objects, but
less so with small objects.
As another example, in general, the longer we wait to collect garbage in a trace-based
collector, the larger the fraction of objects that can be collected. The reason is that objects
often "die young," so if we wait a while, many of the newly allocated objects will become
unreachable. Such a collector thus costs less on the average, per unreachable object
collected. On the other hand, infrequent collection increases a program's memory usage,
decreases its data locality, and increases the length of the pauses.
In contrast, a reference-counting collector, by introducing a constant overhead to many of
the mutator's operations, can slow down the overall execution of a program significantly. On
the other hand, reference counting does not create long pauses, and it is memory efficient,
because it finds garbage as soon as it is produced (with the exception of certain cyclic
structures discussed in Section 7.5.3).
Language design can also affect the characteristics of memory usage. Some languages
encourage a programming style that generates a lot of garbage. For example, programs in
functional or almost functional programming languages create more objects to avoid
mutating existing objects. In Java, all objects, other than base types like integers and
references, are allocated on the heap and not the stack, even if their lifetimes are confined to
that of one function invocation. This design frees the programmer from worrying about the
lifetimes of variables, at the expense of generating more garbage. Compiler optimizations
have been developed to analyze the lifetimes of variables and allocate them on the stack
whenever possible.
Target Machine
o The target computer is a byte-addressable machine with 4 bytes to a word.
o The target machine has n general purpose registers, R0, R1,...., Rn-1. It also has two-address
instructions of the form:
1. op source, destination
Where op is an op-code, and source and destination are data fields.
Example:
1. Register to memory move (R0 → M):
MOV R0, M
cost = 1 + 1 = 2 (since the address of memory location M is in the word following the instruction)
2. Indirect indexed mode:
MOV *4(R0), M
cost = 1 + 1 + 1 = 3 (one word for memory location M, one word for the result of *4(R0), and one for the instruction)
3. Literal mode:
MOV #1, R0
cost = 1 + 1 = 2 (one word for the constant 1 and one for the instruction)
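The costing convention above (one word for the instruction, plus one word for each operand that needs an extra word) is easy to mechanize. A small illustrative sketch in Python; the mode table is an assumption for this example, not a real ISA description:

# extra words each addressing mode adds to the instruction
EXTRA_WORDS = {
    "register": 0,   # operand already in a register
    "memory":   1,   # absolute address in the following word
    "indexed":  1,   # displacement in the following word
    "literal":  1,   # the constant itself occupies a word
}

def cost(src_mode, dst_mode):
    return 1 + EXTRA_WORDS[src_mode] + EXTRA_WORDS[dst_mode]

print(cost("register", "memory"))   # MOV R0, M      -> 2
print(cost("indexed", "memory"))    # MOV *4(R0), M  -> 3
print(cost("literal", "register"))  # MOV #1, R0     -> 2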
The code generation algorithm is the core of the compiler's back end. It sets up register and address
descriptors, then generates machine instructions for each intermediate-code statement.
The algorithm is split into four parts: register descriptor set-up, basic block generation,
instruction generation for operations on registers (e.g., addition), and ending the basic
block with a jump or return instruction.
Register Descriptor Set Up: This part sets up a descriptor for each register, recording what
value (if any) the register currently holds. It also records what kind of operation last wrote
the register, so that subsequent steps can reuse values that are already in registers instead
of reloading them.
Basic Block Generation: This step partitions the intermediate code into basic blocks and records
the control-flow edges between them, so we can keep track of where values live
at any given moment during execution.
Instruction Generation For Operations On Registers: This step converts intermediate
statements into machine instructions, using a description of the target instruction set to pick
the right instruction sequence for each CPU. This is where we see compilers generate code
that is optimized based on the type of operation being performed (e.g., addition) and the
registers involved. This step can also be thought of as register allocation, because it is where
we determine which registers will be used for each operation and how many are needed in
total. It uses the information generated in the previous steps, as well as rules about how
many registers certain operations require. For example, a 32-bit addition might require two
registers: one to hold the value being added, and one for the result of the operation.
Instruction Scheduling: This step reorders instructions so that they execute
efficiently on a particular CPU architecture. It uses information about the
execution resources available on each CPU to determine the best order for
operations. It also considers whether enough registers are free to hold values
(if some are in use), or whether there is a bottleneck elsewhere in the pipeline.
The getReg function selects the location where the result of an operation will be computed.
Given a three-address statement, it returns a register for the result: it prefers a register that
already holds one of the operands, otherwise it picks an empty register, and only if no register
is free does it spill an occupied register to memory and reuse it.
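A minimal sketch of this heuristic, with assumed descriptor structures (a register descriptor and an address descriptor kept as plain dictionaries; this is one common policy, not the only possible getReg):

def get_reg(operand, reg_desc, addr_desc, emit):
    # reg_desc: register -> variable it holds (None if empty)
    # addr_desc: variable -> set of locations holding its value
    for r, held in reg_desc.items():
        if held == operand:          # operand already in a register
            return r
    for r, held in reg_desc.items():
        if held is None:             # otherwise take an empty register
            return r
    # otherwise spill: save some occupied register back to memory
    r, victim = next(iter(reg_desc.items()))
    emit(f"MOV {r}, {victim}")
    addr_desc[victim] = {"memory"}
    reg_desc[r] = None
    return r

regs = {"R0": None, "R1": None}
print(get_reg("y", regs, {}, print))   # -> R0 (an empty register)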
The output of this phase is a sequence of machine instructions that can be
executed with the help of a runtime system. The code generator produces either assembly
language or object code for the target computer. It takes as input an intermediate
representation (sometimes called a compiler IR), which has been processed by the parser and
type checker but not yet lowered into machine code.
When the code generator produces object code, the result is usually in a format specific to the
target architecture, such as Intel 8086 or Motorola 68000.
The compiler front end parses source code and performs some initial analysis on it. It
then passes this data through several phases of compilation that turn it into machine
instructions that can run on a computer processor.
Conclusion
Creating code generators can be a very complex task. The output of such a code
generator should be as readable and concise as possible, with no extraneous noise or
clutter.
Example:
Consider the three-address statement x := y + z. It can have the following sequence of codes:
MOV y, R0
ADD z, R0
MOV R0, x
A code-generation algorithm:
The algorithm takes a sequence of three-address statements as input. For each three-address
statement of the form x := y op z, perform the following actions:
1. Invoke the function getreg to find the location L where the result of the computation y op z
should be stored.
2. Consult the address descriptor for y to determine y', the current location of y. If the value of y
is currently both in memory and in a register, prefer the register as y'. If the value of y is not
already in L, then generate the instruction MOV y', L to place a copy of y in L.
3. Generate the instruction OP z', L, where z' is the current location of z. If z is in both a register
and memory, prefer the register. Update the address descriptor of x to indicate that x is in
location L. If x is in L, then update its descriptor and remove x from all other descriptors.
4. If the current values of y and/or z have no next uses, are not live on exit from the block, and
are in registers, alter the register descriptors to indicate that, after execution of x := y op z,
those registers will no longer contain y and/or z.
Example: Consider the following three-address statements (a sketch of the algorithm applied to them follows below):
1. t := a - b
2. u := a - c
3. v := t + u
4. d := v + u
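Below is a simplified, illustrative implementation of steps 1-3 on this example (the descriptors are reduced to a single dictionary, and the liveness bookkeeping of step 4, as well as the final store of d, is omitted):

OPS = {"+": "ADD", "-": "SUB"}

def gen(statements):
    reg_of = {}                       # variable -> register holding it
    free = ["R0", "R1", "R2"]
    code = []
    for x, y, op, z in statements:    # each statement is x := y op z
        if y in reg_of:               # step 2: prefer y's register
            L = reg_of.pop(y)
        else:                         # steps 1-2: get a register, load y
            L = free.pop(0)
            code.append(f"MOV {y}, {L}")
        zloc = reg_of.get(z, z)       # step 3: prefer z's register
        code.append(f"{OPS[op]} {zloc}, {L}")
        reg_of[x] = L                 # step 3: x now lives in L
    return code

prog = [("t", "a", "-", "b"), ("u", "a", "-", "c"),
        ("v", "t", "+", "u"), ("d", "v", "+", "u")]
print("\n".join(gen(prog)))

This prints MOV a, R0 / SUB b, R0 / MOV a, R1 / SUB c, R1 / ADD R1, R0 / ADD R1, R0; if d is live on exit from the block, a final MOV R0, d would also be emitted.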
| Linker | Loader |
| The main function of the linker is to generate executable files. | The main objective of the loader is to load executable files into main memory. |
| The linker takes as input the object code generated by the compiler/assembler. | The loader takes as input the executable files generated by the linker. |
| Linkers are of 2 types: Linkage Editor and Dynamic Linker. | Loaders are of 4 types: Absolute, Relocating, Direct Linking, Bootstrap. |
| Another use of the linker is to combine all object modules. | It helps in allocating addresses to executable codes/files. |
Assembly Program:
| Label | Op-code | Operand | LC value (Location counter) |
| JOHN | START | 200 | |
| | MOVER | R1, ='3' | 200 |
| | MOVEM | R1, X | 201 |
| L1 | MOVER | R2, ='2' | 202 |
| | LTORG | | 203 |
| X | DS | 1 | 204 |
| | END | | 205 |
Let’s take a look at how this program works:
1. START: This instruction starts the execution of the program from location 200, and
the label before START provides the name of the program (JOHN is the program name).
2. MOVER: It moves the content of the literal (='3') into register operand R1.
3. MOVEM: It moves the content of the register into memory operand (X).
4. MOVER: It again moves the content of a literal (='2') into register operand R2; its
label is specified as L1.
5. LTORG: It assigns an address (the current LC value) to the literals.
6. DS (Data Space): It assigns a data space of 1 to symbol X.
7. END: It finishes the program execution.
Working of Pass-1: Define Symbol and literal table with their addresses.
Note: Literal address is specified by LTORG or END.
Step-1: START 200 (here no symbol or literal is found, so both tables are empty)
Step-2: MOVER R1, ='3' 200 (='3' is a literal, so a literal table entry is made)
| Literal | Address |
| ='3' | ––– |
Step-3: MOVEM R1, X 201 (X is a symbol referred before its definition, so it is entered into the symbol table with a blank address)
| Symbol | Address |
| X | ––– |
Step-4: L1 MOVER R2, ='2' 202 (L1 is a symbol defined here with the current LC value, and ='2' is a new literal)
| Symbol | Address |
| X | ––– |
| L1 | 202 |
| Literal | Address |
| ='3' | ––– |
| ='2' | ––– |
Step-5: LTORG 203 (LTORG assigns the current LC value as the address of the pending literal)
| Literal | Address |
| ='3' | 203 |
| ='2' | ––– |
Step-6: X DS 1 204
It is a data declaration statement, i.e., X is assigned a data space of 1. But X is a symbol
which was referred earlier in step 3 and defined in step 6. This condition is called the Forward
Reference Problem, where a variable is referred to prior to its declaration; it can be solved
by back-patching. So the assembler now assigns X the address specified by the LC value of
the current step.
| Symbol | Address |
| X | 204 |
| L1 | 202 |
Step-7: END 205 (END assigns the current LC value as the address of the remaining literal)
| Literal | Address |
| ='3' | 203 |
| ='2' | 205 |
The tables generated by pass-1, along with their LC values, now go to pass-2 of the assembler
for further processing of pseudo-ops and machine op-codes.
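The back-patching behaviour of pass-1 can be sketched in a few lines of Python. This is an assumed, simplified structure (every statement occupies one word and literal handling is left out), not the full pass-1:

def pass1(program, start_lc):
    symbols = {}                   # name -> address (None = not yet defined)
    lc = start_lc
    for label, opcode, symbol_operand in program:
        if label:
            symbols[label] = lc    # definition: back-patch the entry
        if symbol_operand and symbol_operand not in symbols:
            symbols[symbol_operand] = None   # forward reference
        lc += 1                    # each statement occupies one word here
    return symbols

program = [
    (None, "MOVER", None),   # LC 200
    (None, "MOVEM", "X"),    # LC 201: X referred before its definition
    ("L1", "MOVER", None),   # LC 202: L1 defined here
    (None, "LTORG", None),   # LC 203
    ("X",  "DS",    None),   # LC 204: X defined -> entry back-patched
]
print(pass1(program, 200))   # -> {'X': 204, 'L1': 202}

The result matches the symbol table built in the walkthrough above.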
Working of Pass-2:
Pass-2 of the assembler generates machine code by converting symbolic machine op-codes
into their respective bit configurations (machine-understandable form). It stores all
machine op-codes in the MOT (machine op-code table), with the symbolic code, its length, and
its bit configuration. It will also process pseudo-ops and store them in the POT
(pseudo-op table).
Various databases required by pass-2:
1. MOT table (machine op-code table)
2. POT table (pseudo op-code table)
3. Base table (storing values of base registers)
4. LC (location counter)
A flowchart (not reproduced here) illustrates pass-2 processing and how the assembler works as a whole.
Types of Loader :-
There are eight general loader schemes available, of which the first four below are the main ones:
1. Absolute Loader
2. Relocating Loader
3. Direct Linking Loader
4. Dynamic Loader
5. Assemble-and-go (or compile-and-go) Loader
6. Bootstrap Loader
7. Linking Loader
8. Relocation Loader
Absolute Loader :- It is the simplest type of loader scheme. In this
scheme the loader simply accepts the machine language code
produced by the assembler and places it into main memory at the location
specified by the assembler. The task of an absolute loader is virtually
trivial. An absolute loader is simple to implement, but it has several
disadvantages: the programmer must specify to the assembler the address at which the
program is to be loaded, and when a program uses multiple subroutines, the programmer
must remember the address of each and keep them from overlapping.
A general loading scheme is shown below:
Functions of Loader
Allocation:
The loader allocates memory to the program on the basis of the program's size; this is
known as allocation. The loader provides the space in memory where the object program
will be loaded for execution.
Linking:
The linker resolves symbolic references to code or data between the object modules by
supplying the addresses of all user subroutines and library subroutines. This process is
known as linking. A program written in any language can contain functions, either user-
defined or from a library. For example, in the C language we have the printf()
function. When program control reaches the line where printf() is written, the
linker comes into the picture and links that line to the module where the actual
implementation of the printf() function resides.
Relocation:
There are some address-dependent locations in the program, and these address
constants must be modified to fit the available space; this is done by the loader and is
known as relocation. In order to allow the object program to be loaded at an address
different from the one initially supplied, the loader modifies the object program by
adjusting specific instructions.
Loading:
The loader loads the program into main memory for execution. It loads the machine
instructions and data of related programs and subroutines into main memory; this
process is known as loading. The loader performs loading; hence, the assembler must
provide the loader with the object program. An example is the absolute loader, described next.
Absolute Loader:
The absolute loader reads the object program line by line and transfers the text of the
program into memory at the addresses provided by the assembler. There are two
types of information that the object program must communicate from the assembler to the
loader:
It must convey the machine instructions that the assembler has created, along with their
memory addresses.
It must convey the start of execution: the point at which the program will begin to run after
it has been loaded.
The object program is a sequence of object records. Each object record specifies
some specific aspect of the program in the object module. There are two types of
records:
A text record, containing a binary image of the assembled program.
A transfer record, containing the starting (entry) point for execution.
The formats of text and transfer records are shown below:
Algorithm:
The algorithm for the absolute loader is quite simple. The loader reads the object file record by
record and moves the binary image to the locations specified in each text record. The final
record is a transfer record; when control reaches the transfer record, control is transferred
to the entry point for execution.
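A sketch of this algorithm in Python, assuming an illustrative record format (text records carrying a load address and a list of words, followed by one transfer record carrying the entry point):

def absolute_load(object_records, memory):
    for record in object_records:
        if record[0] == "TEXT":
            _, address, words = record
            for i, word in enumerate(words):   # copy the binary image
                memory[address + i] = word     # to the assembled address
        elif record[0] == "TRANSFER":
            return record[1]                   # entry point of the program

memory = [0] * 512
obj = [("TEXT", 200, [0x13, 0x05, 0x21]),
       ("TEXT", 204, [0x00]),
       ("TRANSFER", 200)]
print(absolute_load(obj, memory))   # -> 200 (start execution here)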
Flowchart:
PHASES OF COMPILER:
Compiler Phases
The compilation process contains a sequence of phases. Each phase takes the source
program in one representation and produces output in another representation, taking its
input from the previous stage.
Lexical Analysis:
Lexical analyzer phase is the first phase of compilation process. It takes source code as input. It
reads the source program one character at a time and converts it into meaningful lexemes. Lexical
analyzer represents these lexemes in the form of tokens.
Syntax Analysis
Syntax analysis is the second phase of the compilation process. It takes tokens as input and generates
a parse tree as output. In the syntax analysis phase, the parser checks whether the expression made
by the tokens is syntactically correct.
Semantic Analysis
Semantic analysis is the third phase of the compilation process. It checks whether the parse tree
follows the rules of the language. The semantic analyzer keeps track of identifiers, their types, and
expressions. The output of the semantic analysis phase is the annotated syntax tree.
Code Optimization
Code optimization is an optional phase. It is used to improve the intermediate code so that the
output of the program could run faster and take less space. It removes the unnecessary lines of the
code and arranges the sequence of statements in order to speed up the program execution.
Code Generation
Code generation is the final stage of the compilation process. It takes the optimized intermediate
code as input and maps it to the target machine language. Code generator translates the
intermediate code into the machine code of the specified computer.
DERIVATION:
Left-most Derivation
In a leftmost derivation, the input is scanned and replaced with production rules from left to
right: at each step, the leftmost non-terminal is replaced. So in a leftmost derivation we read the
input string from left to right.
Example:
Production rules:
1. S = S + S
2. S = S - S
3. S = a | b |c
Input:
a - b + c
1. S = S + S
2. S = S - S + S
3. S = a - S + S
4. S = a - b + S
5. S = a - b + c
Right-most Derivation
In a rightmost derivation, the input is scanned and replaced with production rules from right
to left: at each step, the rightmost non-terminal is replaced. So in a rightmost derivation we read
the input string from right to left.
Example:
1. S = S + S
2. S = S - S
3. S = a | b |c
Input:
a - b + c
1. S = S - S
2. S = S - S + S
3. S = S - S + c
4. S = S - b + c
5. S = a - b + c
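Both derivations can be replayed mechanically as string rewriting: a leftmost derivation always rewrites the leftmost S in the current sentential form, a rightmost derivation the rightmost S. An illustrative sketch:

def derive(steps, leftmost=True):
    form = "S"
    print(form)
    for production in steps:
        # pick the leftmost or rightmost S and replace it
        i = form.find("S") if leftmost else form.rfind("S")
        form = form[:i] + production + form[i + 1:]
        print(form)

derive(["S+S", "S-S", "a", "b", "c"], leftmost=True)    # the leftmost steps above
derive(["S-S", "S+S", "c", "b", "a"], leftmost=False)   # the rightmost steps above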
PARSE TREE:
Parse tree
o A parse tree is the graphical representation of symbols, where a symbol can be a terminal or a
non-terminal.
o In parsing, the string is derived using the start symbol. The root of the parse tree is that
start symbol.
o A parse tree follows the precedence of operators: the deepest sub-tree is traversed first, so the
operator in a parent node has lower precedence than the operator in its sub-tree.
Example:
Production rules:
1. T= T + T | T * T
2. T = a|b|c
Input:
a * b + c
Steps 1-5 (diagrams omitted): the parse tree for a * b + c is built step by step, with the sub-tree for * placed deeper than the node for +.
Parser
A parser is the compiler component that breaks the data coming from the lexical
analysis phase into smaller elements.
A parser takes input in the form of sequence of tokens and produces output in the form of parse
tree.
Bottom up parsing
o Bottom-up parsing is also known as shift-reduce parsing.
o Bottom-up parsing is used to construct a parse tree for an input string.
o In bottom-up parsing, the parsing starts with the input symbols and constructs the parse
tree up to the start symbol by tracing out the rightmost derivation of the string in reverse.
Example
Production
1. E → T
2. T → T * F
3. T → id
4. F → T
5. F → id
1. Shift-Reduce Parsing
2. Operator Precedence Parsing
3. Table Driven LR Parsing
a. LR( 0 )
b. SLR( 1 )
c. CLR ( 1 )
d. LALR( 1 )
Example:
Grammar:
1. S → S+S
2. S → S-S
3. S → (S)
4. S → a
Input string:
1. a1-(a2+a3)
Parsing table:
There are two main categories of shift reduce parsing as follows:
1. Operator-Precedence Parsing
2. LR-Parser
Operator precedence can only be established between the terminals of the grammar;
non-terminals are ignored.
There are the three operator precedence relations:
a ⋗ b means that terminal "a" has higher precedence than terminal "b".
a ⋖ b means that terminal "a" has lower precedence than terminal "b".
a ≐ b means that terminals "a" and "b" have the same precedence.
Precedence table:
Parsing Action
Example
Grammar:
1. E → E+T/T
2. T → T*F/F
3. F → id
Given string:
1. w = id + id * id
Now let us process the string with the help of the above precedence table:
LR Parser
LR parsing is a type of bottom-up parsing. It is used to parse a large class of grammars.
In the name LR(K), "L" stands for scanning the input from left to right, "R" stands for
constructing a rightmost derivation in reverse, and "K" is the number of lookahead input
symbols used to make parsing decisions.
LR parsing is divided into four types: LR(0) parsing, SLR parsing, CLR parsing, and LALR parsing.
LR algorithm:
The LR algorithm requires a stack, input, output, and a parsing table. In all types of LR parsing,
the input, output, and stack are the same, but the parsing table is different.
Fig: Block diagram of LR parser
The input buffer indicates the end of the input; it contains the string to be parsed, followed by a $
symbol.
A stack holds a sequence of grammar symbols, with a $ at the bottom of the stack.
Parsing table is a two dimensional array. It contains two parts: Action part and Go To part.
LR (1) Parsing
Various steps involved in the LR (1) Parsing:
Augment Grammar
The augmented grammar G' is generated by adding one extra production (S' → S) to the given grammar G.
It helps the parser identify when to stop parsing and announce acceptance of the input.
Example
Given grammar
1. S → AA
2. A → aA | b
An LR(0) item indicates how much of the input has been scanned at a given point
in the process of parsing.
Example
Given grammar:
1. S → AA
2. A → aA | b
Add Augment Production and insert '•' symbol at the first position for every production in G
1. S` → •S
2. S → •AA
3. A → •aA
4. A → •b
I0 State:
Add the augment production to the I0 state and compute the closure:
I0 = closure (S` → •S)
Since "•" is followed by the non-terminal S, add all productions starting with S; then, since "•" is
followed by the non-terminal A, add all productions starting with A. So the I0 state becomes.
I0= S` → •S
S → •AA
A → •aA
A → •b
I1 = Go to (I0, S) = S` → S•
I2 = Go to (I0, A): Add all productions starting with A to the I2 state, because "•" is followed by the
non-terminal. So the I2 state becomes
I2 = S → A•A
A → •aA
A → •b
I3 = Go to (I0, a) = Go to (I2, a) = Go to (I3, a): Add all productions starting with A to the I3 state,
because "•" is followed by the non-terminal. So the I3 state becomes
I3 = A → a•A
A → •aA
A → •b
I4 = Go to (I0, b) = Go to (I2, b) = Go to (I3, b) = A → b•
I5 = Go to (I2, A) = S → AA•
I6 = Go to (I3, A) = A → aA•
Drawing DFA:
The DFA contains the 7 states I0 to I6.
LR(0) Table
o If a state goes to another state on a terminal, it corresponds to a shift move.
o If a state goes to another state on a variable (non-terminal), it corresponds to a goto move.
o If a state contains a final item (with "•" at the right end), write the reduce entry in the entire row.
Explanation:
o I0 on S is going to I1, so write it as 1.
o I0 on A is going to I2, so write it as 2.
o I2 on A is going to I5, so write it as 5.
o I3 on A is going to I6, so write it as 6.
o I0, I2, and I3 on a are going to I3, so write it as S3, which means shift 3.
o I0, I2, and I3 on b are going to I4, so write it as S4, which means shift 4.
o I4, I5, and I6 all contain final items, because they have "•" at the rightmost end. So write
the reduce entry with the corresponding production number.
Productions are numbered as follows:
1. S → AA ... (1)
2. A → aA ... (2)
3. A → b ... (3)
o I1 contains the final item which derives (S` → S•), so action {I1, $} = Accept.
o I4 contains the final item which derives A → b•, which corresponds to production
number 3, so write r3 in the entire row.
o I5 contains the final item which derives S → AA•, which corresponds to production
number 1, so write r1 in the entire row.
o I6 contains the final item which derives A → aA•, which corresponds to production
number 2, so write r2 in the entire row.
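The closure and goto computations that produce these states can be written down directly. A sketch for the grammar above, with an item represented as a (head, body, dot position) triple:

GRAMMAR = {"S'": ["S"], "S": ["AA"], "A": ["aA", "b"]}
NONTERMINALS = set(GRAMMAR)

def closure(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(items):
            if dot < len(body) and body[dot] in NONTERMINALS:
                for prod in GRAMMAR[body[dot]]:   # add X -> •prod for the
                    new = (body[dot], prod, 0)    # non-terminal after the dot
                    if new not in items:
                        items.add(new)
                        changed = True
    return frozenset(items)

def goto(items, symbol):
    moved = {(h, b, d + 1) for h, b, d in items
             if d < len(b) and b[d] == symbol}
    return closure(moved)

I0 = closure({("S'", "S", 0)})
for head, body, dot in sorted(goto(I0, "A")):      # this is the I2 state
    print(f"{head} -> {body[:dot]}•{body[dot:]}")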
In SLR(1) parsing, we place a reduce move only under the symbols in the FOLLOW set of the
left-hand side.
If a state (Ii) goes to another state (Ij) on a terminal, it corresponds to a shift move in
the action part.
If a state (Ii) goes to another state (Ij) on a variable, it corresponds to a goto move in the
Go to part.
If a state (Ii) contains a final item like A → ab•, which has no transition to a next state, then the
production is known as a reduce production. For all terminals X in FOLLOW (A), write the reduce
entry along with the production number.
Example
1. S → •Aa
2. A → αβ•
1. Follow (S) = {$}
2. Follow (A) = {a}
SLR ( 1 ) Grammar
S → E
E → E + T | T
T → T * F | F
F → id
Add Augment Production and insert '•' symbol at the first position for every production in G
S` → •E
E → •E + T
E → •T
T → •T * F
T → •F
F → •id
I0 State:
Add all productions starting with E into the I0 state, because "•" is followed by the non-terminal. So
the I0 state becomes
I0 = S` → •E
E → •E + T
E → •T
Add all productions starting with T and F into the modified I0 state, because "•" is followed by the
non-terminal. So the I0 state becomes.
I0= S` → •E
E → •E + T
E → •T
T → •T * F
T → •F
F → •id
I1 = Go to (I0, E) = S` → E•
E → E•+ T
I2 = Go to (I0, T) = E → T•
T → T•* F
I3 = Go to (I0, F) = T → F•
I4 = Go to (I0, id) = F → id•
I5 = Go to (I1, +): Add all productions starting with T and F to the I5 state, because "•" is followed
by the non-terminal. So the I5 state becomes
I5 = E → E +•T
T → •T * F
T → •F
F → •id
I6 = Go to (I2, *): Add all productions starting with F to the I6 state, because "•" is followed by the
non-terminal. So the I6 state becomes
I6 = T → T * •F
F → •id
I7 = Go to (I5, T) = E → E + T•
T → T•* F
I8 = Go to (I6, F) = T → T * F•
Drawing DFA:
SLR (1) Table
Explanation:
First (E) = First (E + T) ∪ First (T)
First (T) = First (T * F) ∪ First (F)
First (F) = {id}
First (T) = {id}
First (E) = {id}
Follow (E) = First (+T) ∪ {$} = {+, $}
Follow (T) = First (*F) ∪ Follow (E)
= {*, +, $}
Follow (F) = {*, +, $}
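These FIRST and FOLLOW sets can be computed by a standard fixed-point iteration. A sketch for this grammar (it has no ε-productions, which keeps the update rules short):

PRODS = [("S'", ["E"]), ("E", ["E", "+", "T"]), ("E", ["T"]),
         ("T", ["T", "*", "F"]), ("T", ["F"]), ("F", ["id"])]
NT = {head for head, _ in PRODS}

first = {n: set() for n in NT}
follow = {n: set() for n in NT}
follow["S'"].add("$")

changed = True
while changed:
    changed = False
    for head, body in PRODS:
        # FIRST(head) grows from the first symbol of the body
        sym = body[0]
        grown = first[sym] if sym in NT else {sym}
        if not grown <= first[head]:
            first[head] |= grown; changed = True
        # FOLLOW(B) grows from what follows B, or FOLLOW(head) at the end
        for i, sym in enumerate(body):
            if sym not in NT:
                continue
            nxt = body[i + 1] if i + 1 < len(body) else None
            grown = (first[nxt] if nxt in NT else {nxt}) if nxt else follow[head]
            if not grown <= follow[sym]:
                follow[sym] |= grown; changed = True

print(follow["E"], follow["T"], follow["F"])   # {+,$} {*,+,$} {*,+,$}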
o I1 contains the final item which derives S → E•, and follow (S) = {$}, so action {I1, $} = Accept.
o I2 contains the final item which derives E → T•, and follow (E) = {+, $}, so action {I2, +} = R2,
action {I2, $} = R2.
o I3 contains the final item which derives T → F•, and follow (T) = {+, *, $}, so action {I3, +} = R4,
action {I3, *} = R4, action {I3, $} = R4.
o I4 contains the final item which derives F → id•, and follow (F) = {+, *, $}, so action {I4, +} =
R5, action {I4, *} = R5, action {I4, $} = R5.
o I7 contains the final item which derives E → E + T•, and follow (E) = {+, $}, so action {I7, +} =
R1, action {I7, $} = R1.
o I8 contains the final item which derives T → T * F•, and follow (T) = {+, *, $}, so action {I8, +} =
R3, action {I8, *} = R3, action {I8, $} = R3.
This approach involves removing consecutive characters from the input one by one until
a synchronizing token is reached. Delimiters, such as ';' or '}', are typical synchronizing tokens.
The benefit is that it is simple to implement and guarantees not to end up in an
infinite loop. The drawback is that a significant quantity of input is skipped without being
checked for additional problems.
Panic mode recovery follows the given steps:
1. Scan the stack until you find a state ‘a’ with a goto on a certain non-terminal
‘B’ (by removing states from the stack).
2. Discard zero or more input symbols until a symbol ‘b’ that can legally follow ‘B’ is
found (see the sketch below).
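An illustrative sketch of the token-skipping part of panic mode (the synchronizing set and the token stream are made up for the example):

SYNC = {";", "}"}              # synchronizing delimiters

def skip_to_sync(tokens, pos):
    while pos < len(tokens) and tokens[pos] not in SYNC:
        pos += 1               # discard tokens without checking them
    return pos + 1             # resume just after the delimiter

tokens = ["x", "=", "@", "#", ";", "y", "=", "1", ";"]
pos = skip_to_sync(tokens, 2)  # an error was detected at '@'
print(tokens[pos:])            # -> ['y', '=', '1', ';']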
SDT for Simple Expressions:
Syntax Directed Translation:
It is used for semantic analysis. An SDT combines a grammar with semantic actions to construct
the parse tree: the grammar decides which construct has the highest priority and is processed
first, while the semantic actions specify what is to be done when a production is applied.
Example :
SDT = Grammar + Semantic Action
Grammar: E → E1 + E2
Semantic action: if (E1.type != E2.type) then print "type mismatching"
Application of Syntax Directed Translation :
SDT is used for Executing Arithmetic Expression.
In the conversion from infix to postfix expression.
In the conversion from infix to prefix expression.
It is also used for binary to decimal conversion.
In counting the number of reductions.
In creating a syntax tree.
SDT is used to generate intermediate code.
In storing information into the symbol table.
SDT is commonly used for type checking as well.
Example :
Here is an example of an application of SDT, to aid understanding of how SDT is used. Let's
consider an arithmetic expression and see how an SDT is constructed for it.
Suppose the following arithmetic expression is given:
Input: 2+3*4
Output: 14
The SDT for the above example is shown below.
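One way to realize this SDT is a recursive-descent evaluator in which each parsing function returns the synthesized attribute val computed by its semantic action; this is a sketch of the idea, not the annotated parse tree itself:

import re

def tokenize(s):
    return re.findall(r"\d+|[+*]", s) + ["$"]

def evaluate(tokens):
    pos = [0]
    def peek(): return tokens[pos[0]]
    def eat():  pos[0] += 1; return tokens[pos[0] - 1]

    def F():                  # F -> digit  { F.val = digit.lexval }
        return int(eat())
    def T():                  # T -> T * F  { T.val = T1.val * F.val }
        val = F()
        while peek() == "*":
            eat(); val *= F()
        return val
    def E():                  # E -> E + T  { E.val = E1.val + T.val }
        val = T()
        while peek() == "+":
            eat(); val += T()
        return val
    return E()

print(evaluate(tokenize("2+3*4")))   # -> 14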
SDT DIFFERENCE:
S-attributed SDT:
If an SDT uses only synthesized attributes, it is called an S-attributed SDT.
S-attributed SDTs are evaluated in bottom-up parsing, as the values of the parent nodes
depend upon the values of the child nodes.
Semantic actions are placed at the rightmost position of the RHS.
L-attributed SDT:
If an SDT uses both synthesized attributes and inherited attributes, with the restriction that an
inherited attribute can inherit values only from the parent and from left siblings, it is called an
L-attributed SDT.
For example,
A → XYZ {Y.S = A.S, Y.S = X.S, Y.S = Z.S}
is not an L-attributed grammar, since Y.S = A.S and Y.S = X.S
are allowed, but Y.S = Z.S violates the L-attributed SDT definition,
as the attribute is inheriting a value from its right sibling.
Note – If a definition is S-attributed, then it is also L-attributed
but NOT vice-versa.
Example 1
| PRODUCTION | SEMANTIC RULES |
| L → E n | L.val = E.val |
| E → E1 + T | E.val = E1.val + T.val |
| E → T | E.val = T.val |
| T → T1 * F | T.val = T1.val * F.val |
| T → F | T.val = F.val |
| F → ( E ) | F.val = E.val |
| F → digit | F.val = digit.lexval |
The SDD of the above example is an S-attributed SDT because each attribute, L.val,
E.val, T.val, and F.val, is synthesized.
2) L-attributed SDT:
Example:
X → ABC {B.P = X.P, B.P = A.P, B.P = C.P}
This is not an L-attributed SDT, because B.P = X.P and B.P = A.P are allowed, but B.P =
C.P does not follow the rule of the L-attributed SDT definition (it inherits from a right sibling).