RkCD-Chapter 5 - Semantic Analysis
Semantic Analysis
Semantic analysis is the third phase of a compiler. It makes sure that the declarations and
statements of a program are semantically correct. It is a collection of procedures that the parser
calls as and when the grammar requires them. Both the syntax tree from the previous phase and the
symbol table are used to check the consistency of the given code. Type checking is an important
part of semantic analysis, in which the compiler makes sure that each operator has matching
operands.
Semantic Analyzer: It uses the syntax tree and the symbol table to check whether the given
program is semantically consistent with the language definition. It gathers type information and
stores it in either the syntax tree or the symbol table. The compiler later uses this type
information during intermediate-code generation.
Semantic Errors:
Errors recognized by the semantic analyzer include:
• Type mismatch
• Undeclared variables
• Reserved identifier misuse
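The three kinds of semantic errors listed above can be sketched as a small checker. This is a minimal illustration, not a real compiler pass: the tuple-based program format, the RESERVED set and the function names check and expr_type are all hypothetical, and this toy language deliberately has no implicit conversions, so assigning a float to an int variable counts as a mismatch.

```python
# Hypothetical toy program format (an assumption for this sketch):
#   ("decl", name, type)      a declaration such as  int count;
#   ("assign", name, expr)    an assignment such as  count = 5;
# An expression is ("int", 5), ("float", 2.5) or ("var", "x").
RESERVED = {"int", "float", "if", "while", "return"}

def expr_type(expr, symtab, errors):
    """Return the type of an expression, recording undeclared variables."""
    kind, value = expr
    if kind == "var":
        if value not in symtab:
            errors.append(f"undeclared variable '{value}'")
            return None
        return symtab[value]
    return kind                      # a literal's kind is its type

def check(program):
    """Return a list of semantic error messages, in program order."""
    symtab, errors = {}, []
    for stmt in program:
        if stmt[0] == "decl":
            _, name, typ = stmt
            if name in RESERVED:
                errors.append(f"reserved identifier '{name}' used as a name")
            else:
                symtab[name] = typ
        else:                        # "assign"
            _, name, expr = stmt
            target = symtab.get(name)
            if target is None:
                errors.append(f"undeclared variable '{name}'")
            source = expr_type(expr, symtab, errors)
            if target is not None and source is not None and target != source:
                errors.append(f"type mismatch: cannot assign {source} to {target} '{name}'")
    return errors

program = [
    ("decl", "count", "int"),
    ("decl", "while", "int"),               # reserved identifier misuse
    ("assign", "count", ("float", 2.5)),    # type mismatch
    ("assign", "total", ("var", "count")),  # undeclared variable 'total'
]
print(check(program))
```

Each error is reported once, in the order the statements appear.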
Functions of Semantic Analysis:
1. Type Checking –
Ensures that data types are used in a way consistent with their definition.
2. Label Checking –
Every label that the program references must be defined.
3. Flow Control Check –
Checks that control structures are used in a proper manner (for example, no break
statement outside a loop).
Example:
float x = 10.1;
float y = x*30;
In the above example, the semantic analyzer converts the integer 30 to the float 30.0 before the
multiplication.
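A minimal sketch of the conversion in this example, assuming a toy expression format of (kind, payload, type) tuples and only the two types int and float; coerce_operands and the node kinds are hypothetical names. A constant such as 30 can be converted at compile time, while a variable operand would instead need a run-time conversion node, here called inttofloat.

```python
def coerce_operands(left, right):
    """Return (left, right, result_type) for a binary operator.

    Operands are hypothetical tuples ("lit", value, type) or ("var", name, type),
    where type is "int" or "float". When the types differ, the int operand is
    widened to float.
    """
    def widen(node):
        kind, payload, typ = node
        if typ == "float":
            return node
        if kind == "lit":
            # A constant is converted at compile time: 30 becomes 30.0.
            return ("lit", float(payload), "float")
        # A variable gets an explicit run-time conversion node instead.
        return ("inttofloat", node, "float")

    if left[2] == right[2]:
        return left, right, left[2]
    return widen(left), widen(right), "float"

# float y = x*30;  ->  the operand 30 is rewritten as 30.0
print(coerce_operands(("var", "x", "float"), ("lit", 30, "int")))
```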
Static and Dynamic Semantics:
1. Static Semantics – These are named so because they are checked at compile time. Static
semantics are only indirectly related to the meaning of the program during execution.
2. Dynamic Semantics – These define the meaning of the different units of a program, such as
expressions and statements. Unlike static semantics, they are checked at run time.
Example
Production        Semantic Rules
E → E + T         E.val := E.val + T.val
E → T             E.val := T.val
T → T * F         T.val := T.val * F.val
T → F             T.val := F.val
F → ( E )         F.val := E.val
F → num           F.val := num.lexval
E.val is one of the attributes of E.
num.lexval is the attribute returned by the lexical analyzer.
Example:
E -> E+T | T
T -> T*F | F
F -> INTLIT
This is a grammar to syntactically validate an expression containing additions and
multiplications. To carry out semantic analysis, we augment this grammar with SDT rules that pass
information up the parse tree and check for semantic errors, if any. In this example we focus on
evaluating the given expression, since there are no semantic assertions to check in such a basic
example.
E -> E+T { E.val = E.val + T.val } PR#1
E -> T { E.val = T.val } PR#2
T -> T*F { T.val = T.val * F.val } PR#3
T -> F { T.val = F.val } PR#4
F -> INTLIT { F.val = INTLIT.lexval } PR#5
To understand translation rules further, take the first SDT rule, attached to the production
E -> E+T. This translation rule uses val as an attribute of both nonterminals, E and T. The
right-hand side of the translation rule refers to the attribute values of the nodes on the right
side of the production, and its left-hand side refers to the node on the left side. In general, an
SDT augments a CFG by associating 1) a set of attributes with every grammar symbol and 2) a set of
translation rules with every production, written using attributes, constants and lexical values.
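Let's take a string to see how semantic analysis happens: S = 2+3*4. In the parse tree for S, the
root applies E -> E+T; its left child E derives 2 through E -> T, T -> F and F -> INTLIT, and its
right child T derives 3*4 through T -> T*F.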
To evaluate the translation rules, we can use a single depth-first traversal of the parse tree.
This works because, for a grammar whose attributes are all synthesized, SDT rules impose no
specific evaluation order other than requiring children's attributes to be computed before their
parent's. Otherwise, we would have to work out a suitable plan for traversing the parse tree and
evaluating all the attributes in one or more traversals. For better understanding, we move bottom
up, left to right, when computing the translation rules of our example.
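Evaluating the example this way shows how semantic analysis could happen. The leaves give
F.val = 2, 3 and 4 (PR#5); the value 2 reaches the left E through PR#4 and PR#2; T.val = 3 * 4 = 12
(PR#3); and at the root E.val = 2 + 12 = 14 (PR#1). Information flows bottom up, and all children's
attributes are computed before their parent's, as discussed above. When a nonterminal appears on
both sides of a production, the right-hand occurrence is sometimes given the subscript 1 to
distinguish the child from the parent.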
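The bottom-up evaluation of S = 2+3*4 can be sketched as a post-order depth-first traversal over a hand-built parse tree. The Node class and the functions evaluate and leaf are hypothetical names; each branch of evaluate applies one of the rules PR#1 to PR#5, and the val fields filled in afterwards form the annotated parse tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    symbol: str                                   # "E", "T", "F", "INTLIT", "+" or "*"
    children: List["Node"] = field(default_factory=list)
    lexval: Optional[int] = None                  # set by the lexer on INTLIT leaves
    val: Optional[int] = None                     # synthesized attribute

def evaluate(node: Node) -> None:
    """Post-order DFS: every child's val is computed before its parent's."""
    for child in node.children:
        evaluate(child)
    kids = node.children
    if not kids:                                  # terminal leaf: nothing to compute
        return
    if node.symbol == "F":                        # PR#5: F -> INTLIT
        node.val = kids[0].lexval
    elif len(kids) == 1:                          # PR#2, PR#4: E -> T, T -> F
        node.val = kids[0].val
    elif node.symbol == "E":                      # PR#1: E -> E+T
        node.val = kids[0].val + kids[2].val
    else:                                         # PR#3: T -> T*F
        node.val = kids[0].val * kids[2].val

def leaf(n: int) -> Node:
    """F -> INTLIT, with the lexer's value stored on the INTLIT leaf."""
    return Node("F", [Node("INTLIT", lexval=n)])

# Parse tree for 2+3*4, built by hand.
tree = Node("E", [
    Node("E", [Node("T", [leaf(2)])]),
    Node("+"),
    Node("T", [Node("T", [leaf(3)]), Node("*"), leaf(4)]),
])
evaluate(tree)
print(tree.val)   # 14
```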
Example:
E --> E1 + T { E.val = E1.val + T.val }
Annotated Parse Tree – The parse tree containing the values of the attributes at each node for a
given input string is called an annotated or decorated parse tree.
Features –
• High level specification
• Hides implementation details
• Explicit order of evaluation is not specified
1. Synthesized Attributes – These are attributes that derive their values from their children
nodes, i.e. the value of a synthesized attribute at a node is computed from the values of the
attributes at its children in the parse tree.
Example:
E --> E1 + T { E.val = E1.val + T.val}
In this, E.val derives its value from E1.val and T.val.
[Figure: Annotated parse tree]
For the input 4 * 5 + 6, computation of the attributes starts from the leftmost bottom node. The
rule F --> digit is used to reduce digit to F, and the value of digit, obtained from the lexical
analyzer, becomes the value of F through the semantic action F.val = digit.lexval. Hence
F.val = 4, and since T is the parent node of F, we get T.val = 4 from the semantic action
T.val = F.val. Then, for the production T --> T1 * F, the corresponding semantic action is
T.val = T1.val * F.val. Hence T.val = 4 * 5 = 20.
Similarly, E --> T gives E1.val = 20, the operand 6 gives T.val = 6, and their combination becomes
E.val, i.e. E.val = E1.val + T.val = 20 + 6 = 26. Then the production S --> E is applied to reduce
E, and its semantic action prints the result E.val. Hence, the output is 26.
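The same synthesized-attribute computation can be performed while parsing. Recursive descent cannot handle the left-recursive productions E --> E1 + T and T --> T1 * F directly, so this sketch (a hypothetical evaluate_expr function, assuming single-digit operands as in F --> digit) replaces each left recursion with a loop. The loop still combines values left to right, so each val is computed from its children's values as the rules require.

```python
def evaluate_expr(text):
    """Evaluate S -> E, E -> E1 + T | T, T -> T1 * F | F, F -> digit.

    Returns the value that the action of S -> E would print.
    """
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def F():
        nonlocal pos
        tok = peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"digit expected at position {pos}")
        pos += 1
        return int(tok)                  # F.val = digit.lexval

    def T():
        nonlocal pos
        val = F()                        # T -> F      : T.val = F.val
        while peek() == "*":
            pos += 1
            val = val * F()              # T -> T1 * F : T.val = T1.val * F.val
        return val

    def E():
        nonlocal pos
        val = T()                        # E -> T      : E.val = T.val
        while peek() == "+":
            pos += 1
            val = val + T()              # E -> E1 + T : E.val = E1.val + T.val
        return val

    result = E()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return result                        # S -> E : print(E.val)

print(evaluate_expr("4 * 5 + 6"))   # 26
```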
2. Inherited Attributes – These are attributes that derive their values from their parent or
sibling nodes, i.e. the value of an inherited attribute is computed from the values at the parent
or sibling nodes.
Example:
A --> BCD { C.in = A.in, C.type = B.type }
Computation of Inherited Attributes –
1. Construct the SDD using semantic actions.
2. Generate the annotated parse tree and compute the attribute values in a top-down manner.
Example: Consider the following grammar
S --> T L
T --> int
T --> float
T --> double
L --> L1 , id
L --> id
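The SDD for the above grammar can be written as follows:
S --> T L        { L.in = T.type }
T --> int        { T.type = int }
T --> float      { T.type = float }
T --> double     { T.type = double }
L --> L1 , id    { L1.in = L.in; Enter_type(id.entry, L.in) }
L --> id         { Enter_type(id.entry, L.in) }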
Let us assume the input string int a, c for computing inherited attributes. In the annotated parse
tree for this input, the attribute values are computed as follows.
The value of the L nodes is obtained from T.type of the sibling T node, which is the lexical value
int, float or double. The L node then gives the type of the identifiers a and c. The type is
computed in a top-down manner, i.e. by a preorder traversal. Using the function Enter_type, the
types of the identifiers a and c are inserted into the symbol table at the corresponding id.entry.
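A minimal sketch of the inherited-attribute flow for int a, c, assuming the token list is already available and that the symbol table is a plain dictionary. The functions parse_S, parse_T, parse_L and enter_type are hypothetical stand-ins for the grammar's nonterminals and the Enter_type action; the attribute L.in travels downward as the inh parameter, and the recursion in parse_L mirrors L --> L1 , id.

```python
symbol_table = {}                      # id name -> type (hypothetical representation)

def enter_type(name, typ):
    """Stand-in for Enter_type(id.entry, L.in)."""
    symbol_table[name] = typ

def parse_T(token):
    """T -> int | float | double   { T.type = token }"""
    if token not in ("int", "float", "double"):
        raise SyntaxError(f"type expected, got {token!r}")
    return token

def parse_L(tokens, inh):
    """L -> L1 , id  { L1.in = L.in; Enter_type(id.entry, L.in) }
       L -> id       { Enter_type(id.entry, L.in) }"""
    if len(tokens) == 1:
        enter_type(tokens[0], inh)
        return
    if len(tokens) < 3 or tokens[-2] != ",":
        raise SyntaxError("',' expected")
    parse_L(tokens[:-2], inh)          # L1.in = L.in: the type is passed down
    enter_type(tokens[-1], inh)

def parse_S(tokens):
    """S -> T L  { L.in = T.type }: the sibling's type is handed to L."""
    parse_L(tokens[1:], inh=parse_T(tokens[0]))

parse_S(["int", "a", ",", "c"])
print(symbol_table)   # {'a': 'int', 'c': 'int'}
```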
o In the parse tree, most leaf nodes are the single child of their parent nodes.
o In the syntax tree, we can eliminate this extra information.
o The syntax tree is a variant of the parse tree. In the syntax tree, interior nodes are operators
and leaves are operands.
o A syntax tree is usually used to represent a program in a tree structure.
The sentence id + id * id has the following syntax tree: + is at the root, with the first id as its
left child and a * node as its right child; the * node has the remaining two ids as its children.
Abstract syntax trees are important data structures in a compiler. They contain little unnecessary
information, so they are more compact than a parse tree and can be easily used by a compiler.
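The following sketch builds an abstract syntax tree directly while parsing, so chain nodes such as E, T and F never appear in the result. It assumes space-separated tokens and uses the names a, b and c for the three ids; build_ast and the nested-tuple representation (operator, left, right) are hypothetical choices for illustration.

```python
def build_ast(text):
    """Parse input like 'a + b * c' into a nested-tuple AST.

    Interior nodes are (operator, left, right); leaves are operand names.
    """
    tokens = text.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def operand():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("operand expected")
        name = tokens[pos]
        pos += 1
        return name                      # a leaf; no F, T or E node is kept

    def term():                          # * binds tighter than +
        nonlocal pos
        node = operand()
        while peek() == "*":
            pos += 1
            node = ("*", node, operand())
        return node

    def expr():
        nonlocal pos
        node = term()
        while peek() == "+":
            pos += 1
            node = ("+", node, term())
        return node

    return expr()

print(build_ast("a + b * c"))   # ('+', 'a', ('*', 'b', 'c'))
```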
Type Checking:
Type checking is the process of verifying that each operation executed in a program respects the
type system of the language. This generally means that all operations in any expression are of
appropriate type and number. Much of what we do in the semantic analysis is type checking.
How to design a Type Checker?
When designing a type checker for a compiler, here is the process:
• Identify the types that are available in the language
• Identify the language constructs that have types associated with them
• Identify the semantic rules for the language
A language is considered strongly-typed if each and every type error is detected
during compilation.
Errors Prevented by Type Checking:
• Application of a function to wrong number of arguments
• Application of integer functions to floats
• Use of undeclared variables in expressions
• Functions that do not return values
• Division by zero
• Array indices out of bounds
Two Types of Type Checking:
1. Static Type Checking
2. Dynamic Type Checking
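1. Static Type Checking: checks at compile time
i) Type check: 2 + 2.5 is an error in a language that does not convert int to float implicitly
ii) Flow-of-control check: a statement that transfers control out of a construct (such as break)
must have somewhere to transfer it to
iii) Uniqueness check: in int a = 2;, the name a must be declared only once
iv) Name-related check: a call such as add() must match a definition of add()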
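Checks ii) to iv) can be sketched as one pass over a toy abstract syntax tree. Everything here is hypothetical: the tuple statement format, the static_checks name, and the simplification that each function body is a single scope (a while body shares its function's declarations).

```python
def static_checks(program):
    """Run flow-of-control, uniqueness and name-related checks.

    program: list of ("func", name, body). Statements in a body are
    ("decl", name), ("break",), ("call", name) or ("while", body).
    """
    errors = []
    defined = set()
    for _, name, _ in program:              # collect definitions before checking calls
        if name in defined:
            errors.append(f"function '{name}' defined twice")
        defined.add(name)

    def walk(body, declared, loop_depth):
        for stmt in body:
            kind = stmt[0]
            if kind == "decl":              # uniqueness check
                if stmt[1] in declared:
                    errors.append(f"'{stmt[1]}' declared twice")
                declared.add(stmt[1])
            elif kind == "break":           # flow-of-control check
                if loop_depth == 0:
                    errors.append("break outside a loop")
            elif kind == "call":            # name-related check
                if stmt[1] not in defined:
                    errors.append(f"call to undefined function '{stmt[1]}()'")
            elif kind == "while":
                walk(stmt[1], declared, loop_depth + 1)

    for _, _, body in program:
        walk(body, set(), 0)
    return errors

program = [
    ("func", "add", []),
    ("func", "main", [
        ("decl", "a"), ("decl", "a"),       # a declared twice
        ("while", [("break",)]),            # allowed: inside a loop
        ("break",),                         # break outside a loop
        ("call", "add"), ("call", "sub"),   # sub() is never defined
    ]),
]
print(static_checks(program))
```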
Example of what static checking cannot catch: if a and b are of type int and we assign very large
values to them, a * b may not be in the acceptable range of ints, and an attempt to compute the
ratio of two integers may raise a division by zero. These kinds of errors usually cannot be
detected at compile time.
2. Dynamic Type Checking: checks at run time
Common dynamically typed languages include JavaScript, PHP and Python.
Most languages use both.
Static or dynamic checking does not mean weak or strong typing; the two distinctions are
independent.
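In a dynamically checked language, values carry type tags at run time, and each operation inspects the tags before it executes. This minimal sketch uses hypothetical names (tagged, checked_add, RuntimeTypeError) and a (tag, value) tuple representation; real interpreters represent tags differently.

```python
class RuntimeTypeError(Exception):
    """Raised when an operation finds operands of the wrong type at run time."""

def tagged(value):
    """Attach a run-time type tag to a Python value."""
    if isinstance(value, bool):          # test bool first: bool is a subclass of int
        return ("bool", value)
    if isinstance(value, int):
        return ("int", value)
    if isinstance(value, str):
        return ("string", value)
    raise RuntimeTypeError(f"unsupported value {value!r}")

def checked_add(a, b):
    """Addition whose operand types are checked only when it executes."""
    (ta, va), (tb, vb) = a, b
    if ta == tb == "int":
        return ("int", va + vb)
    if ta == tb == "string":
        return ("string", va + vb)
    raise RuntimeTypeError(f"cannot add {ta} and {tb}")

print(checked_add(tagged(2), tagged(3)))   # ('int', 5)
```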
Type System:
A type system is a collection of rules for assigning type expressions to the parts of a program.
The design of a type checker varies from language to language. E.g., in 2+2=4 both operands are
int, so the result is int.
Each expression has an associated type.
Basic types: boolean, int, char
Constructed types: pointer, array and structures
Type Expression:
• A basic type is a type expression
• A type name is a type expression
• A type expression can be formed by applying the array type constructor to a number and
a type expression.
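• A record is a data structure with named fields; a type expression can be formed by applying
the record type constructor to the field names and their types.
• A type expression can be formed by using the type constructor → for function types: s → t is
the type of a function from type s to type t.
• If s and t are type expressions, then their Cartesian product s × t is a type expression.
• Type expressions may contain variables whose values are type expressions.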
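The type constructors above can be modeled as immutable Python data classes. This is a sketch with hypothetical class names; because dataclass equality compares the class and then the fields recursively, two type expressions compare equal exactly when they have the same structure, i.e. structural equivalence.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Basic:          # boolean, char, integer, ...
    name: str

@dataclass(frozen=True)
class Array:          # array(size, element type)
    size: int
    elem: object

@dataclass(frozen=True)
class Pointer:        # pointer(type)
    to: object

@dataclass(frozen=True)
class Product:        # s × t
    left: object
    right: object

@dataclass(frozen=True)
class Record:         # record of named fields: ((name, type), ...)
    fields: tuple

@dataclass(frozen=True)
class Function:       # domain → result
    domain: object
    result: object

INT, CHAR = Basic("integer"), Basic("char")

a_type = Array(10, INT)                            # int a[10]
f_type = Function(Product(INT, CHAR), Pointer(INT))  # (integer × char) → pointer(integer)
point = Record((("x", INT), ("y", INT)))           # record with fields x and y

print(a_type == Array(10, Basic("integer")))   # True: same structure
print(a_type == Array(20, INT))                # False: different size
```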
Type Checking of Statements:
S -> id = E          { if (id.type = E.type) then S.type = void else S.type = type-error }
S -> if E then S1    { if (E.type = boolean) then S.type = S1.type else S.type = type-error }
S -> while E do S1   { if (E.type = boolean) then S.type = S1.type else S.type = type-error }
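The three statement rules can be transcribed as a function that synthesizes S.type. To keep the sketch short, expression types are supplied directly instead of being computed, and the tuple statement format, check_stmt and the string type names are hypothetical.

```python
VOID, BOOLEAN, TYPE_ERROR = "void", "boolean", "type-error"

def check_stmt(stmt, symtab):
    """Return S.type for a statement.

    ("assign", name, e_type)   S -> id = E
    ("if", e_type, body)       S -> if E then S1
    ("while", e_type, body)    S -> while E do S1
    """
    kind = stmt[0]
    if kind == "assign":
        _, name, e_type = stmt
        # An undeclared id has no type, so it never matches E.type.
        return VOID if symtab.get(name) == e_type else TYPE_ERROR
    if kind in ("if", "while"):
        _, e_type, body = stmt
        return check_stmt(body, symtab) if e_type == BOOLEAN else TYPE_ERROR
    raise ValueError(f"unknown statement {kind!r}")

symtab = {"x": "integer"}
print(check_stmt(("while", "boolean", ("assign", "x", "integer")), symtab))   # void
```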
Type Conversion:
• Consider the expression X + I, where X is of type real and I is of type int.
• The compiler first checks whether both operands are of the same type. If not, it usually
converts the int operand to real and performs a real operation.
• X + I (the int is converted to real)
• In postfix form: X I inttoreal real+
• The inttoreal operation converts I to real.
• The real+ operation performs real addition on the two real operands.
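A sketch of emitting the postfix code for a two-operand expression, inserting inttoreal right after an int operand when the other operand is real. The function name to_postfix and the (name, type) operand pairs are hypothetical; only the types int and real are assumed.

```python
def to_postfix(left, right, op="+"):
    """Return (postfix code list, result type) for `left op right`.

    Each operand is a (name, type) pair with type "int" or "real".
    """
    (lname, ltype), (rname, rtype) = left, right
    code = [lname]
    if ltype == "int" and rtype == "real":
        code.append("inttoreal")         # convert the left operand
    code.append(rname)
    if ltype == "real" and rtype == "int":
        code.append("inttoreal")         # convert the right operand
    result = "real" if "real" in (ltype, rtype) else "int"
    code.append(result + op)             # real+ or int+
    return code, result

print(to_postfix(("X", "real"), ("I", "int")))   # (['X', 'I', 'inttoreal', 'real+'], 'real')
```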
Coercions:
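A conversion from one type to another that the compiler performs automatically is called an
implicit conversion, or coercion (for example, a character converted to its ASCII code).
Many languages limit coercions to cases where no information is lost.
An integer may be converted to a real, but not vice versa, because converting a real to an
integer can lose information.
A conversion that the programmer writes explicitly is called an explicit conversion.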
Overloading Functions:
The same name is used for several different operations over several different types.
The type checker uses the operand types to decide which operation an overloaded name refers to,
and detects an error when no definition of the name fits.
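A minimal sketch of overload resolution, assuming a hypothetical table that maps a name and its argument types to a result type, and assuming exact matches only. Real languages also combine overloading with coercions, which this sketch ignores.

```python
# Hypothetical overload table: (name, argument types) -> result type.
OVERLOADS = {
    ("+", ("int", "int")): "int",
    ("+", ("float", "float")): "float",
    ("+", ("string", "string")): "string",
}

def resolve(name, arg_types):
    """Pick the definition of `name` whose parameter types match exactly."""
    key = (name, tuple(arg_types))
    if key not in OVERLOADS:
        raise TypeError(f"no definition of {name} for {', '.join(arg_types)}")
    return OVERLOADS[key]

print(resolve("+", ["float", "float"]))   # float
```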
Polymorphic Functions:
A piece of code that can be executed with arguments of different types.
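In Python, a polymorphic function can be annotated with a type variable from the standard typing module. Python does not check these annotations at run time; they document the intent and can be checked by a separate static type checker. The functions length and first are illustrative names.

```python
from typing import List, TypeVar

T = TypeVar("T")   # stands for any element type

def length(items: List[T]) -> int:
    """One body works for lists of any element type T."""
    count = 0
    for _ in items:
        count += 1
    return count

def first(items: List[T]) -> T:
    """The result type is whatever element type the argument list has."""
    return items[0]

print(length([1, 2, 3]), length(["a", "b"]), first(["x", "y"]))   # 3 2 x
```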
Type synthesis will be illustrated by extending the scheme for translating expressions. We
introduce another attribute, E.type, whose value is either integer or float. The rule associated
with E -> E1 + E2 builds on the pseudocode
if ( E1.type = integer and E2.type = integer ) E.type = integer;
else if ( E1.type = float and E2.type = integer ) ...
As the number of types subject to conversion increases, the number of cases increases
rapidly. Therefore, with large numbers of types, careful organization of the semantic actions
becomes important.
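One way to organize the cases is to rank the types in a widening hierarchy and compute the result type once, instead of listing every pair of operand types. This sketch assumes only integer < float; max_type, synthesize_add and the three-address output format are hypothetical. Supporting a new numeric type then only requires adding it to RANK.

```python
# Widening hierarchy assumed for the sketch: integer < float.
RANK = {"integer": 0, "float": 1}

def max_type(t1, t2):
    """The wider of t1 and t2, to which both operands can be widened."""
    return t1 if RANK[t1] >= RANK[t2] else t2

def synthesize_add(e1, e2):
    """E -> E1 + E2: return (E.type, code) for operands given as (address, type).

    A conversion is emitted for any operand narrower than E.type.
    """
    result = max_type(e1[1], e2[1])
    code, addrs = [], []
    for addr, typ in (e1, e2):
        if typ != result:
            temp = f"t{len(code)}"
            code.append(f"{temp} = ({result}) {addr}")
            addr = temp
        addrs.append(addr)
    code.append(f"t{len(code)} = {addrs[0]} + {addrs[1]}")
    return result, code

print(synthesize_add(("i", "integer"), ("x", "float")))
# ('float', ['t0 = (float) i', 't1 = t0 + x'])
```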