UNIT-3 Notes
UNIT-III
SYNTAX-DIRECTED TRANSLATION:
Syntax-directed translation is done by attaching rules or program fragments to
productions in a grammar.
If X is a grammar symbol and ‘a’ is one of its attributes, then we write X.a to denote
the value of ‘a’ at a particular parse-tree node labelled as X.
Example:
Production          Semantic Rule
E -> E1 + T         E.code = E1.code || T.code || ‘+’
• The body of the above production has two nonterminals, E1 and T. Both have a
string-valued attribute code. The semantic rule specifies that the string E.code is
formed by concatenating E1.code, T.code, and the character ‘+’ (the operator ||
denotes string concatenation).
Alternatively, we can also insert the semantic actions/translation scheme inside the
grammar.
E -> E1 + T { print ‘+’ }
• By convention, semantic actions are enclosed within curly braces.
Out of these two notations, syntax-directed definitions (SDD) are more readable,
and hence more useful for specifications. However, translation schemes (SDT) can
be more efficient, and hence more useful for implementations.
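To make the translation-scheme notation concrete, here is a minimal sketch in C of a translator that runs the action { print ‘+’ } at the point where it appears in the body of E -> E1 + T. It assumes one extra production, T -> digit { print digit }, which is not given in these notes, and it replaces the left recursion with a loop so that a recursive-descent parser can handle it. The names Translator, emit and to_postfix are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical translator state: input cursor plus output buffer. */
typedef struct {
    const char *in;   /* next unread input character */
    char *out;        /* buffer receiving the postfix string */
    size_t len, cap;  /* characters written so far, buffer capacity */
    int ok;           /* cleared on a syntax error or overflow */
} Translator;

/* The semantic action "print c": append c to the output. */
static void emit(Translator *t, char c) {
    if (t->len + 1 < t->cap) {
        t->out[t->len++] = c;
        t->out[t->len] = '\0';
    } else {
        t->ok = 0;
    }
}

/* T -> digit { print digit }   (assumed production, not from the notes) */
static void term(Translator *t) {
    if (isdigit((unsigned char)*t->in)) {
        emit(t, *t->in++);
    } else {
        t->ok = 0;
    }
}

/* E -> E1 + T { print '+' } | T, with the left recursion run as a loop. */
static void expr(Translator *t) {
    term(t);
    while (t->ok && *t->in == '+') {
        t->in++;       /* match '+' */
        term(t);       /* translate T */
        emit(t, '+');  /* the action sits at the end of the body */
    }
}

/* Returns 1 and fills out on success, 0 on a syntax error or overflow. */
int to_postfix(const char *input, char *out, size_t cap) {
    assert(cap > 0);
    Translator t = { input, out, 0, cap, 1 };
    out[0] = '\0';
    expr(&t);
    return t.ok && *t.in == '\0';
}
```

Calling to_postfix("9+5+2", buf, sizeof buf) leaves the postfix form of the input in buf. Because the action follows T in the body, each ‘+’ is printed only after both of its operands have been translated.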
SYNTAX-DIRECTED DEFINITIONS:
Each nonterminal in a production can have two types of attributes:
1. Synthesized Attributes
2. Inherited Attributes
Synthesized Attribute:
• A synthesized attribute for a nonterminal A at a parse-tree node N is defined
by a semantic rule associated with the production at N.
• Note that the production must have A as its head.
• A synthesized attribute at node N is defined only in terms of attribute values
of children of N and at N itself.
Inherited Attribute:
• An inherited attribute for a nonterminal B at a parse-tree node N is defined
by a semantic rule associated with the production at the parent of N.
• Note that the production must have B as a symbol in its body.
• An inherited attribute at node N is defined only in terms of attribute values
at N’s parent, N itself and N’s siblings.
Example:
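PRODUCTION          SEMANTIC RULES
A -> B              A.s := B.i;
                    B.i := A.s + 1
A has the attribute ‘s’ and B has the attribute ‘i’.
Here ‘s’ is synthesized, because it is computed at the head A from its child B, and
‘i’ is inherited, because it is computed for B from its parent A. Note that these
two rules are circular: A.s depends on B.i and B.i depends on A.s, so neither can
be evaluated first. The example only illustrates the two kinds of attributes.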
S-attributed SDD
An SDD that involves only synthesized attributes is called S-attributed SDD.
An SDD without side effects is sometimes called an attribute grammar. The rules
in an attribute grammar define the value of an attribute purely in terms of the
values of other attributes and constants.
To visualize the translation specified by an SDD, annotated parse trees are used.
A parse tree, showing the value(s) of its attribute(s) is called an annotated parse
tree.
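Consider the following SDD for arithmetic expressions terminated by an end marker n:
PRODUCTION          SEMANTIC RULES
1) L -> E n         L.val = E.val
2) E -> E1 + T      E.val = E1.val + T.val
3) E -> T           E.val = T.val
4) T -> T1 * F      T.val = T1.val * F.val
5) T -> F           T.val = F.val
6) F -> ( E )       F.val = E.val
7) F -> digit       F.val = digit.lexval
In this SDD, each of the nonterminals has a single synthesized attribute,
called val.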
• The terminal digit has a synthesized attribute lexval, which is an integer value
returned by the lexical analyzer.
• Production 1, L -> E n, has a rule that sets L.val to E.val, the numerical value
of the entire expression (n is the end marker that terminates the expression).
• Production 2, E -> E1 + T, has one rule, which computes the val attribute for
the head E as the sum of the values at E1 and T. At any parse-tree node N
labeled E, the value of val for E is the sum of the values of val at the children
of node N labeled E1 and T.
• Production 3, E -> T, has a single rule that defines the value of val for E to be
the same as the value of val at the child for T.
• Production 4, T -> T1 * F, has one rule, which computes the val attribute for
the head T as the product of the values at T1 and F. At any parse-tree node N
labeled T, the value of val for T is the product of the values of val at the
children of node N labeled T1 and F.
• Productions 5 and 6 copy values at a child, like the rule for the third
production.
• Production 7 gives F.val the value of a digit, that is, the numerical value of
the token digit that the lexical analyzer returned.
Before evaluating an attribute at a node of a parse tree, we must evaluate all the
attributes upon which its value depends.
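When every attribute is synthesized, this ordering requirement is met by any bottom-up (postorder) evaluation: a node's val is computed only after the val of each child is known. The sketch below, in C, shows one way to do this with a recursive-descent evaluator in which each function returns the val of the nonterminal it parses. It assumes that productions 5 and 6 are T -> F and F -> ( E ), as in the usual expression grammar, that digit is a single decimal digit, and that the end marker n is a newline character. The function name evaluate is hypothetical, and the left-recursive productions are run as loops.

```c
#include <assert.h>
#include <ctype.h>

static const char *cur;  /* next unread input character */
static int failed;       /* set on a syntax error */

static int E(void);

/* F -> ( E )    F.val = E.val          (assumed production 6)
   F -> digit    F.val = digit.lexval */
static int F(void) {
    if (*cur == '(') {
        cur++;
        int val = E();
        if (*cur == ')') cur++; else failed = 1;
        return val;
    }
    if (isdigit((unsigned char)*cur)) return *cur++ - '0';
    failed = 1;
    return 0;
}

/* T -> T1 * F   T.val = T1.val * F.val
   T -> F        T.val = F.val          (assumed production 5) */
static int T(void) {
    int val = F();
    while (!failed && *cur == '*') {
        cur++;
        val = val * F();
    }
    return val;
}

/* E -> E1 + T   E.val = E1.val + T.val
   E -> T        E.val = T.val */
static int E(void) {
    int val = T();
    while (!failed && *cur == '+') {
        cur++;
        val = val + T();
    }
    return val;
}

/* L -> E n      L.val = E.val
   Returns 1 and stores L.val in *val on success, 0 on a syntax error. */
int evaluate(const char *line, int *val) {
    assert(line && val);
    cur = line;
    failed = 0;
    *val = E();
    return !failed && *cur == '\n';
}
```

For the input "3*5+4\n", the calls return 3 and 5 for the two F nodes, 15 for the T node above them, 4 for the right operand, and 19 for E and L.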
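In the annotated parse tree for the expression 3 * 5 + 4 n, the leaves carry
digit.lexval = 3, 5 and 4. These values propagate upward to T.val = 15 for 3 * 5,
then to E.val = 19 for the sum, and finally to L.val = 19 at the root.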
L-attributed SDD
If an SDD uses both synthesized attributes and inherited attributes, with the
restriction that an inherited attribute may depend only on inherited attributes of
the parent and on attributes of siblings to its left, it is called an L-attributed
SDD.
Attributes in an L-attributed SDD can be evaluated in a single depth-first,
left-to-right traversal of the parse tree.
Example:
L-Attributed definitions
Production           Semantic Rules
T -> F T’            T’.inh = F.val
T’ -> * F T1’        T1’.inh = T’.inh * F.val
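In a recursive-descent parser, an inherited attribute is naturally passed down as a function argument and a synthesized attribute is passed back as a result. The following C sketch applies this to the two rules above. The remaining rules it needs are assumptions, since they are not listed in these notes: T’ -> ε with T’.syn = T’.inh, T’.syn = T1’.syn, T.val = T’.syn, and F -> digit with F.val = digit.lexval. The function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>

/* F -> digit      F.val = digit.lexval   (assumed production)
   Returns 1 and advances *p on success, 0 otherwise. */
static int parse_F(const char **p, int *val) {
    if (!isdigit((unsigned char)**p)) return 0;
    *val = **p - '0';
    (*p)++;
    return 1;
}

/* T' -> * F T1'   T1'.inh = T'.inh * F.val,  T'.syn = T1'.syn
   T' -> epsilon   T'.syn = T'.inh        (assumed production)
   The inherited attribute arrives in the parameter inh; the synthesized
   attribute is returned through *syn. */
static int parse_Tprime(const char **p, int inh, int *syn) {
    if (**p == '*') {
        int fval;
        (*p)++;
        if (!parse_F(p, &fval)) return 0;
        return parse_Tprime(p, inh * fval, syn);
    }
    *syn = inh;
    return 1;
}

/* T -> F T'       T'.inh = F.val,  T.val = T'.syn
   Returns 1 and stores T.val in *val if the whole string is a term. */
int parse_T(const char *s, int *val) {
    const char *p = s;
    int fval;
    assert(s && val);
    if (!parse_F(&p, &fval)) return 0;
    if (!parse_Tprime(&p, fval, val)) return 0;
    return *p == '\0';
}
```

For the input 3*5, parse_T passes F.val = 3 down as T’.inh, the inner call multiplies it by 5 and passes 15 further down, and the ε case copies that inherited value into syn, which travels back up as T.val.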
• For each parse tree node X, the dependency graph has a node for each
attribute associated with the node X.
• Nodes 1 and 2 represent the attribute lexval associated with the two leaves
labeled digit.
• Nodes 3 and 4 represent the attribute val associated with the two nodes
labeled F. The edges to node 3 from 1 and to node 4 from 2 result from the
semantic rule that defines F.val in terms of digit.lexval. In fact, F.val equals
digit.lexval, but the edge represents dependence, not equality.
• Nodes 5 and 6 represent the inherited attribute T’.inh associated with each
of the occurrences of nonterminal T’.
• Nodes 7 and 8 represent the synthesized attribute syn associated with the
occurrences of T’.
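An evaluation order for the attributes is any topological order of the dependency graph. The sketch below, in C, uses Kahn's algorithm, which is one common way to compute such an order. Only the edges 1 -> 3 and 2 -> 4 are stated in these notes; the other edges used in the usage note, and a node 9 standing for T.val, are assumptions derived from the semantic rules T’.inh = F.val and T1’.inh = T’.inh * F.val together with the usual copy rules for syn. The function name topo_order is hypothetical.

```c
#include <assert.h>

#define MAX_NODES 16

/* Kahn's algorithm on a graph whose nodes are numbered 1..n.
   edges[i][0] -> edges[i][1] means the second attribute depends on the first.
   Writes an evaluation order into order[] and returns the number of nodes
   placed; a result smaller than n means the dependencies are circular. */
int topo_order(int n, int edges[][2], int m, int order[]) {
    int indeg[MAX_NODES + 1] = {0};
    int done[MAX_NODES + 1] = {0};
    int count = 0;

    assert(n <= MAX_NODES);
    for (int i = 0; i < m; i++) indeg[edges[i][1]]++;

    for (;;) {
        int next = 0;
        /* Pick the lowest-numbered node whose dependencies are all evaluated. */
        for (int v = 1; v <= n; v++) {
            if (!done[v] && indeg[v] == 0) { next = v; break; }
        }
        if (next == 0) break;
        done[next] = 1;
        order[count++] = next;
        /* Evaluating next satisfies one dependency of each of its successors. */
        for (int i = 0; i < m; i++) {
            if (edges[i][0] == next) indeg[edges[i][1]]--;
        }
    }
    return count;
}
```

With the assumed edge list {1,3}, {2,4}, {3,5}, {5,6}, {4,6}, {6,7}, {7,8}, {8,9}, the function orders all nine nodes. For a circular pair of rules such as A.s := B.i and B.i := A.s + 1, it places no node at all, which signals that no evaluation order exists.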
Example of SDT
Applications of SDT
1. The primary application of SDT is the construction of syntax trees.
• Since some compilers use syntax trees as an intermediate representation, a
common form of SDD (Syntax-Directed Definition) turns its input string into a
tree.
2. Evaluating arithmetic expressions.
3. Converting infix expressions to postfix.
4. Converting infix expressions to prefix.
5. Converting binary numbers to decimal.
6. Counting the number of reductions.
7. Generating intermediate code.
8. Storing information in the symbol table.
9. Type checking.
In an abstract syntax tree, chains of single productions in the parse tree are
collapsed into one node, and the operators move up to become interior nodes.
For example, the syntax tree for a - 4 + c is built by the following calls:
P1 = mkleaf(id, entry.a)
P2 = mkleaf(num, 4)
P3 = mknode(-, P1, P2)
P4 = mkleaf(id, entry.c)
P5 = mknode(+, P3, P4)
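The calls above can be sketched in C as follows. The node layout, the split of mkleaf into mkleaf_id and mkleaf_num, and the helpers tree_to_postfix and free_tree are assumptions made for illustration; in particular, a plain string stands in for the symbol-table entry of an identifier.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { NODE_ID, NODE_NUM, NODE_OP } NodeKind;

typedef struct Node {
    NodeKind kind;
    char op;                    /* operator, for NODE_OP */
    const char *name;           /* stand-in for the symbol-table entry, for NODE_ID */
    int value;                  /* constant, for NODE_NUM */
    struct Node *left, *right;  /* children, for NODE_OP */
} Node;

static Node *new_node(NodeKind kind) {
    Node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->kind = kind;
    return n;
}

Node *mkleaf_id(const char *name) {
    Node *n = new_node(NODE_ID);
    n->name = name;
    return n;
}

Node *mkleaf_num(int value) {
    Node *n = new_node(NODE_NUM);
    n->value = value;
    return n;
}

Node *mknode(char op, Node *left, Node *right) {
    Node *n = new_node(NODE_OP);
    n->op = op;
    n->left = left;
    n->right = right;
    return n;
}

/* Postorder walk that appends the tree to buf in postfix form.
   buf must hold a NUL-terminated string (possibly empty) on entry. */
void tree_to_postfix(const Node *n, char *buf, size_t cap) {
    size_t used;
    if (n->kind == NODE_OP) {
        tree_to_postfix(n->left, buf, cap);
        tree_to_postfix(n->right, buf, cap);
    }
    used = strlen(buf);
    if (n->kind == NODE_ID)       snprintf(buf + used, cap - used, "%s ", n->name);
    else if (n->kind == NODE_NUM) snprintf(buf + used, cap - used, "%d ", n->value);
    else                          snprintf(buf + used, cap - used, "%c ", n->op);
}

void free_tree(Node *n) {
    if (n == NULL) return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}
```

Building p1 to p5 with these functions in the same order as above yields a tree whose root is the ‘+’ node, with the ‘-’ subtree as its left child and the leaf for c as its right child.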
Symbol tables are data structures that are used by compilers to hold information
about source-program constructs. The information is collected incrementally by
the analysis phases of a compiler and used by the synthesis phases to generate the
target code. Entries in the symbol table contain information about an identifier
such as its character string (or lexeme), its type, its position in storage, and any
other relevant information. Symbol tables typically need to support multiple
declarations of the same identifier within a program.
Scope: The scope of a variable is the part of the program in which the variable's
name may be used, that is, where the variable may be read or written. If a variable
is declared within a function, it is local to that function. Variables of the same
name may be declared and used within other functions without any conflicts. For
instance,
int fun1()
{
    int a;
    int b;
    ....
}
int fun2()
{
    int a;
    int c;
    ....
}
Visibility: The visibility of a variable determines how much of the rest of the
program can access that variable. You can arrange that a variable is visible only
within one part of one function, or in one function, or in one source file, or
anywhere in the program.
Local and Global variables: A variable declared within the braces {} of a function
is visible only within that function; variables declared within functions are called
local variables. On the other hand, a variable declared outside of any function is a
global variable, and it is potentially visible anywhere within the program.
Variable-length name:
• A single character array (the string space) is used to store all names.
• For each name, the symbol table stores its length and its starting index in
that array.
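A minimal C sketch of this scheme is given below. The fixed pool size and the names StringSpace, NameRef, pool_add and pool_equals are assumptions chosen for illustration. Names are stored back to back with no terminator, so the stored length is what delimits each name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_SIZE 256  /* arbitrary capacity for this sketch */

/* What a symbol-table entry keeps instead of the name itself. */
typedef struct {
    size_t start;   /* index of the first character in the string space */
    size_t length;  /* number of characters in the name */
} NameRef;

typedef struct {
    char chars[POOL_SIZE];  /* all names, stored back to back */
    size_t used;            /* number of characters in use */
} StringSpace;

/* Appends name to the string space and records where it went.
   Returns 1 on success, 0 if the name does not fit. */
int pool_add(StringSpace *sp, const char *name, NameRef *ref) {
    size_t n;
    assert(sp && name && ref);
    n = strlen(name);
    if (n > POOL_SIZE - sp->used) return 0;
    memcpy(sp->chars + sp->used, name, n);
    ref->start = sp->used;
    ref->length = n;
    sp->used += n;
    return 1;
}

/* Compares a stored name with a NUL-terminated string. */
int pool_equals(const StringSpace *sp, NameRef ref, const char *name) {
    return strlen(name) == ref.length &&
           memcmp(sp->chars + ref.start, name, ref.length) == 0;
}
```

Adding "count" and then "i" to an empty string space gives references (start 0, length 5) and (start 5, length 1), so names of any length share one array without wasting fixed-width fields.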
For each declaration of a name, there is an entry in the symbol table. Different
entries need to store different information because of the different contexts in
which a name can occur. An entry corresponding to a particular name can be
inserted into the symbol table at different stages depending on when the role of the
name becomes clear. The various attributes that an entry in the symbol table can
have are lexeme, type of name, size of storage and in case of functions - the
parameter list etc.
Operations:
The basic operations defined on a symbol table include:
• allocate – to allocate a new empty symbol table
• free – to remove all entries and free the storage of a symbol table
• insert – to insert a name in a symbol table and return a pointer to its entry
• lookup – to search for a name and return a pointer to its entry
• set_attribute – to associate an attribute with a given entry
• get_attribute – to get an attribute associated with a given entry
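As an illustration of these six operations, the following C sketch implements them over an unordered singly linked list, one of the simplest possible organizations. The st_ prefix, the Entry layout, and the decision to keep a single attribute (a type name) per entry are assumptions made to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Entry {
    char *name;         /* private copy of the lexeme */
    const char *type;   /* the single attribute kept in this sketch */
    struct Entry *next;
} Entry;

typedef struct {
    Entry *head;
} SymTab;

/* allocate: create a new, empty symbol table (NULL if out of memory). */
SymTab *st_allocate(void) {
    return calloc(1, sizeof(SymTab));
}

/* free: remove all entries and release the table itself. */
void st_free(SymTab *t) {
    Entry *e = t->head;
    while (e != NULL) {
        Entry *next = e->next;
        free(e->name);
        free(e);
        e = next;
    }
    free(t);
}

/* lookup: linear search, O(n); returns NULL if the name is absent. */
Entry *st_lookup(const SymTab *t, const char *name) {
    for (Entry *e = t->head; e != NULL; e = e->next) {
        if (strcmp(e->name, name) == 0) return e;
    }
    return NULL;
}

/* insert: add the name at the front of the list, O(1). */
Entry *st_insert(SymTab *t, const char *name) {
    Entry *e = malloc(sizeof *e);
    if (e == NULL) return NULL;
    e->name = malloc(strlen(name) + 1);
    if (e->name == NULL) { free(e); return NULL; }
    strcpy(e->name, name);
    e->type = NULL;
    e->next = t->head;
    t->head = e;
    return e;
}

/* set_attribute / get_attribute for the one attribute kept here. */
void st_set_attribute(Entry *e, const char *type) {
    assert(e != NULL);
    e->type = type;
}

const char *st_get_attribute(const Entry *e) {
    assert(e != NULL);
    return e->type;
}
```

A typical sequence is st_allocate, then st_insert for each declaration followed by st_set_attribute once the type is known, st_lookup at each use of a name, and st_free when the table is no longer needed. Inserting at the front means a later declaration of the same name is found before an earlier one.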
1. Unordered List
• Simplest to implement
• Implemented as an array or a linked list
• Linked list can grow dynamically – alleviates problem of a fixed size array
• Insertion is fast, O(1), but lookup is slow for large tables – O(n) on average
2. Ordered List
• If an array is sorted, it can be searched using binary search – O(log2 n)
• Insertion into a sorted array is expensive – O(n) on average
• Useful when set of names is known in advance – table of reserved words
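The reserved-word case can be sketched in C as a sorted array searched by binary search. The particular list of words and the function name is_reserved are assumptions; any fixed set works as long as the array is kept in strcmp order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative reserved words; the array must stay sorted in strcmp order. */
static const char *const reserved[] = {
    "break", "else", "for", "if", "int", "return", "while"
};

/* Binary search over the sorted table: O(log n) string comparisons.
   Returns 1 if name is a reserved word, 0 otherwise. */
int is_reserved(const char *name) {
    size_t lo = 0;
    size_t hi = sizeof reserved / sizeof reserved[0];
    assert(name != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(name, reserved[mid]);
        if (cmp == 0) return 1;
        if (cmp < 0) hi = mid;      /* name sorts before reserved[mid] */
        else         lo = mid + 1;  /* name sorts after reserved[mid] */
    }
    return 0;
}
```

Because the table never changes after it is written, the O(n) cost of inserting into a sorted array is never paid, and only the fast lookup remains.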
Management:
A compiler maintains two types of symbol tables:
• a global symbol table which can be accessed by all the procedures.
• scope symbol tables that are created for each scope in the program.
The global symbol table contains the names of global variables and procedures,
which should be available to all the scopes nested inside it (its child nodes in
the hierarchy of scope tables).
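One way to picture this arrangement is a chain of tables in which each scope table points to the table of its enclosing scope, with the global table at the end of the chain. The C sketch below assumes this parent-pointer design, a small fixed capacity per scope, and the hypothetical names Scope, scope_init, scope_declare and scope_lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 8  /* arbitrary per-scope capacity for this sketch */

typedef struct Scope {
    struct Scope *parent;         /* enclosing scope; NULL for the global table */
    const char *names[MAX_SYMS];
    const char *types[MAX_SYMS];
    int count;
} Scope;

void scope_init(Scope *s, Scope *parent) {
    assert(s != NULL);
    s->parent = parent;
    s->count = 0;
}

/* Declares name in this scope only. Returns 0 if the table is full. */
int scope_declare(Scope *s, const char *name, const char *type) {
    if (s->count == MAX_SYMS) return 0;
    s->names[s->count] = name;
    s->types[s->count] = type;
    s->count++;
    return 1;
}

/* Searches the current scope first, then each enclosing scope up to the
   global table. Returns the type of the name, or NULL if it is not visible. */
const char *scope_lookup(const Scope *s, const char *name) {
    for (; s != NULL; s = s->parent) {
        for (int i = s->count - 1; i >= 0; i--) {
            if (strcmp(s->names[i], name) == 0) return s->types[i];
        }
    }
    return NULL;
}
```

With one scope table for fun1 and one for fun2, both pointing to the global table, each function can declare its own a without conflict, a name declared only in fun2 is not found from fun1, and a global name is found from either function.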