Type Checking
In this module, we discuss an important function of the semantic phase of the
compiler, namely type checking. Type checking involves detecting and reporting
operations that are applied to incompatible operands.
The compiler needs to verify that the source program follows the syntactic and semantic
conventions of the language. This is done with the help of static checking, which reports
programming errors at compile time. Dynamic checking is done at run time to identify
and handle errors as they occur.
The various static checks that are done by the compiler are listed as follows:
• Type check: This verifies that an operator is applied to compatible operands.
Lines 1 to 3 declare some variables and functions. Line 4 adds an array
variable (an address) to an integer variable, which is reported as an error. Line 5
tries to dereference an integer variable and is also reported as an error. Line 6 is
accepted, as it is similar to operator overloading. Line 7 passes an integer to
function ‘f’, which takes a float; since an implicit conversion can be applied, the
statement is accepted. Similarly, line 8 resembles a template instantiation and hence
passes the type check.
• Flow-of-control check: This verifies that every statement that transfers control out of a
construct has a valid place to transfer to. Example: break statements.
myfunc()
{ …
    while (n)
    { …
        if (i > 10)
            break;   // OK
    }
}
myfunc1()
{ …
    break;   // ERROR
}
Consider the functions myfunc() and myfunc1(). myfunc() has a while loop, and the
body of this loop contains an if statement whose body is a break statement. This is an
accepted control flow: when the condition of the if is true, the break transfers control
out of the enclosing while loop. On the other hand, myfunc1() contains a break but no
loop construct. This is reported as an error because there is no enclosing construct to
leave, so the location to jump to is not defined.
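The break check described above can be sketched as a tree walk that tracks how many loops enclose the current statement. This is only an illustrative sketch: the tuple-based tree and the function name check_breaks are assumptions made for this example, not part of the module.

```python
# Hypothetical mini-AST (an assumption for this sketch). Each node is a tuple:
#   ("block", [stmts]), ("while", cond, body), ("if", cond, body), ("break",)

def check_breaks(node, loop_depth=0, errors=None):
    """Collect an error for every break that has no enclosing loop."""
    if errors is None:
        errors = []
    kind = node[0]
    if kind == "break":
        if loop_depth == 0:
            errors.append("break outside loop")
    elif kind == "while":
        # The loop body is checked with one more enclosing loop.
        check_breaks(node[2], loop_depth + 1, errors)
    elif kind == "if":
        # An if does not change the number of enclosing loops.
        check_breaks(node[2], loop_depth, errors)
    elif kind == "block":
        for stmt in node[1]:
            check_breaks(stmt, loop_depth, errors)
    return errors

# myfunc(): a break nested in an if inside a while loop
myfunc = ("block", [("while", "n", ("block", [("if", "i>10", ("break",))]))])
# myfunc1(): a break with no enclosing loop
myfunc1 = ("block", [("break",)])

print(check_breaks(myfunc))    # no errors expected
print(check_breaks(myfunc1))   # one error expected
```

Passing the loop depth down as a parameter keeps the check purely syntactic, so it can run on the tree without executing anything.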
• Uniqueness check: This ensures that an object is defined exactly once in situations
where the programming language demands it. Example: labels in a case statement
need to be unique in Pascal, and an identifier may be declared only once in a given
scope.
myfunc2()
{ int i, j, i; // ERROR
…
}
The function myfunc2() above declares two variables with the same name ‘i’ and hence is
reported as an error.
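A uniqueness check for the declarations of one scope can be sketched by remembering the names seen so far. The function name find_duplicates is hypothetical, and a real compiler would normally do this while inserting names into its symbol table.

```python
def find_duplicates(names):
    """Return the names declared more than once in a single scope."""
    seen = set()
    duplicates = []
    for name in names:
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates

# Declarations of myfunc2(): int i, j, i;
print(find_duplicates(["i", "j", "i"]))
```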
• Name-related checks: These verify rules about how many times a name may or must
appear in a construct. Example: In Ada, a loop or block may have a name that appears
at both its beginning and its end, and the compiler needs to verify that the same name
is used in both places. The following example has two arguments with the same name
and hence is reported as an error.
Of the static checks listed above, this module concentrates on type checking. Type checking is
typically done after the input has been parsed successfully. The position of the type checker is
given in figure 21.1: it takes the syntax tree produced by the parser, verifies the type information,
and passes the tree on to the intermediate code generator, which generates the intermediate
representation.
Thus, as can be seen from figure 21.1, the type checker is part of the semantic phase of the
compiler. Type checking information is attached to the semantic rules. Basic type checking is
performed first and is then extended to the type checking of complex attributes.
In general, a language’s type system specifies which operations are valid for which operand
types. Type checking ensures that every operation is applied to operands of the correct types.
Type systems define semantic rules to perform type checking, and these rules are stated in terms
of type expressions. Type expressions can be defined as follows:
1. A basic type is a type expression. Integer, float and type_error are all type expressions.
Statements have void as their type, which is also a basic type expression.
2. A type constructor applied to type expressions is a derived type expression.
a. Arrays: If T is a type expression, then array(A, T) is a type expression denoting the
type of an array with elements of type T and index set A.
b. Products: If T1 and T2 are type expressions, then their Cartesian product T1 X T2
is a type expression.
c. Pointers: If T is a type expression, then pointer(T) is a type expression denoting
the type “pointer to an object of type T”.
d. Functions: A function maps values from a domain type D to a range type R. This
mapping is itself a type, written D → R.
3. Type expressions may contain variables whose values are type expressions.
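The definition above maps naturally onto a small set of record types, one per constructor. The following is a minimal sketch under that assumption; the class and field names are invented for illustration, and the index set is kept as plain text for brevity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Basic:
    name: str          # "integer", "float", "void", "type_error", ...

@dataclass(frozen=True)
class Array:
    index: str         # index set A, kept as text here, e.g. "1..10"
    elem: object       # element type T

@dataclass(frozen=True)
class Product:
    left: object       # T1
    right: object      # T2

@dataclass(frozen=True)
class Pointer:
    target: object     # T in pointer(T)

@dataclass(frozen=True)
class Function:
    domain: object     # D in D -> R
    result: object     # R in D -> R

INTEGER = Basic("integer")
FLOAT = Basic("float")

# array(1..10, integer)
a = Array("1..10", INTEGER)
# (integer X float) -> pointer(integer)
f = Function(Product(INTEGER, FLOAT), Pointer(INTEGER))

print(a)
print(f)
```

Because the dataclasses are frozen and compared field by field, two type expressions built the same way compare equal, which is the structural equivalence that a simple type checker needs.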
Having seen what the type of each programming construct is, let us now discuss the semantic
rules that are necessary for type checking the different programming constructs of the Pascal
language.
Operands must have compatible types for an expression to be valid. Every expression has a data
type, and this ‘type’ is an attribute associated with it. For type checking of expressions, we use
semantic rules that compute and verify this “type” attribute. Table 21.1 summarizes the semantic
rules for type checking of expressions.
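The idea of synthesizing the “type” attribute bottom-up can be sketched as follows. The rules shown (integer literals, identifiers, mod, array indexing and pointer dereference) are common textbook rules chosen for illustration and are not a transcription of Table 21.1; the tuple encoding of expressions and types is likewise an assumption.

```python
TYPE_ERROR = "type_error"

def type_of(expr, symtab):
    """Synthesize the 'type' attribute of an expression bottom-up."""
    kind = expr[0]
    if kind == "num":                       # integer literal
        return "integer"
    if kind == "id":                        # look the name up in the symbol table
        return symtab.get(expr[1], TYPE_ERROR)
    if kind == "mod":                       # E1 mod E2: both operands integer
        left, right = type_of(expr[1], symtab), type_of(expr[2], symtab)
        return "integer" if left == right == "integer" else TYPE_ERROR
    if kind == "index":                     # E1[E2]: E1 an array, E2 an integer
        arr, idx = type_of(expr[1], symtab), type_of(expr[2], symtab)
        if isinstance(arr, tuple) and arr[0] == "array" and idx == "integer":
            return arr[1]
        return TYPE_ERROR
    if kind == "deref":                     # E1^: E1 must be a pointer
        ptr = type_of(expr[1], symtab)
        if isinstance(ptr, tuple) and ptr[0] == "pointer":
            return ptr[1]
        return TYPE_ERROR
    return TYPE_ERROR

# Hypothetical symbol table: a is an array of integer, p a pointer to integer.
symtab = {"a": ("array", "integer"), "i": "integer", "p": ("pointer", "integer")}

print(type_of(("index", ("id", "a"), ("id", "i")), symtab))
print(type_of(("deref", ("id", "i")), symtab))
```

Returning type_error instead of raising an exception lets the checker carry on and report further errors in the same pass.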
A function performs computations based on its parameters. Type checking of a function call
verifies that the types of the arguments passed match the domain of the function’s type; the
type of the call is then the range of the function’s type. Table 21.3
summarizes the check.
Table 21.3 Semantic rules for type checking functions
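A common form of the rule for a function application E1(E2) is: if E1 has type s → t and E2 has type s, then the call has type t; otherwise it has type type_error. The sketch below assumes that form and a tuple encoding ("->", domain, result) for function types; both are illustrative assumptions, and the sketch requires an exact match, leaving implicit conversions aside.

```python
TYPE_ERROR = "type_error"

def apply_type(func_type, arg_type):
    """Type of the call E1(E2): if E1 : s -> t and E2 : s, the result is t."""
    if isinstance(func_type, tuple) and func_type[0] == "->":
        _, domain, result = func_type
        if arg_type == domain:
            return result
    return TYPE_ERROR

f = ("->", "float", "integer")        # hypothetical f : float -> integer

print(apply_type(f, "float"))
print(apply_type(f, "integer"))       # rejected here because no coercion is applied
```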
21.4 Coercions
Data type conversions need to be carried out to support operations on differing data types. For
example, an integer operand may have to be combined with a floating-point operand. To support
such operations, type conversions need to be done. Type conversions can be explicit or
implicit. Explicit type conversions are written by the programmer in the code itself. Implicit type
conversions, which are referred to as coercions, are inserted by the compiler. To support
coercions, semantic rules need to be written.
Table 21.4 gives the semantic rules for performing coercions between integer
and real operands.
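A coercion rule for an arithmetic operator can be sketched as follows: equal numeric types need no conversion, and a mix of integer and real yields real after the integer operand is widened. The function names and the inttoreal node label are assumptions for this sketch rather than the contents of Table 21.4.

```python
TYPE_ERROR = "type_error"

def arith_type(left, right):
    """Result type of E1 op E2 for an arithmetic operator, with integer-to-real coercion."""
    numeric = ("integer", "real")
    if left not in numeric or right not in numeric:
        return TYPE_ERROR
    if left == right:
        return left                    # no conversion needed
    return "real"                      # the integer operand is widened to real

def coerce(operand, from_type, to_type):
    """Wrap an operand in a conversion node when integer must become real."""
    if from_type == "integer" and to_type == "real":
        return ("inttoreal", operand)
    return operand

# i : integer, x : real, expression i + x
result = arith_type("integer", "real")
print(result, coerce("i", "integer", result), coerce("x", "real", result))
```

Making the conversion an explicit node in the tree means the later phases never have to rediscover where a coercion was applied.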
The phases discussed so far belong to the analysis part of the compiler, in which the high-level
input code is analysed for syntax and semantics. After verifying that the input is correct, the
compiler converts it to an intermediate representation from which generating assembly code is
easier. The intermediate code generation phase facilitates retargeting: a back end for a new
machine can be attached to an existing front end. This phase also enables
machine-independent code optimization. The position of the intermediate code generation phase
is given in figure 21.2, where it can be observed that the phase acts as a bridge between the front
end and the back end of the compiler.
The construction of the AST was discussed in the previous module using the
functions mknode() and mkleaf(). The construction of the DAG is the same as that of the
AST, except that common subexpressions are represented only once in the graphical
representation. To construct a DAG node for an expression, first search
the array for a node M with label op, left child l and right child r. If there is such a node,
return the value number of M. If not, create in the array a new node N with label op, left
child l and right child r, and return its value number.
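The search-or-create procedure above can be sketched in a few lines. The class name Dag and its method are hypothetical, and a dictionary is used here in place of a linear search through the array; the value number of a node is its position in the array, counted from 1.

```python
class Dag:
    def __init__(self):
        self.nodes = []      # nodes[k - 1] holds the node with value number k
        self.index = {}      # (label, left, right) -> value number

    def node(self, label, left=None, right=None):
        """Return the value number of (label, left, right), creating the node if needed."""
        key = (label, left, right)
        if key not in self.index:
            self.nodes.append(key)
            self.index[key] = len(self.nodes)
        return self.index[key]

dag = Dag()
b = dag.node("id", "b")
c = dag.node("id", "c")
first = dag.node("*", b, dag.node("uminus", c))
second = dag.node("*", b, dag.node("uminus", c))   # finds the existing nodes
total = dag.node("+", first, second)

print(first == second, len(dag.nodes))
```

For b * -c + b * -c only five nodes are created, because the second occurrence of b * -c is found in the table instead of being built again.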
• Postfix notation: This is a notation in which every operator follows its operands, so the
code can be evaluated with a stack that holds the operand values. Consider the example:
a:=b*-c + b*-c. The corresponding stack-machine code, which follows the postfix order, is
given as follows
iload 2 // push b
iload 3 // push c
ineg // uminus
imul // *
iload 2 // push b
iload 3 // push c
ineg // uminus
imul // *
iadd // +
istore 1 // store a
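The postfix order used by the stack code above is what a post-order traversal of the expression tree produces. The sketch below assumes a tuple-based tree of the form (operator, child, ...) with identifiers as plain strings; the function name postfix is hypothetical.

```python
def postfix(node):
    """Return the postfix form of an expression tree as a list of tokens."""
    if isinstance(node, str):            # leaf: an identifier
        return [node]
    op, *children = node                 # interior node: (operator, child, ...)
    tokens = []
    for child in children:
        tokens += postfix(child)         # operands first ...
    return tokens + [op]                 # ... then the operator

term = ("*", "b", ("uminus", "c"))
tree = (":=", "a", ("+", term, term))

print(" ".join(postfix(tree)))
# prints: a b c uminus * b c uminus * + :=
```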
• Three-address code: Every statement is split so that it refers to at most three addresses
(two operands and one result) and has at most one operator besides the assignment.
Example: x := y op z. For the same example, the following is the 3-address code
obtained by linearizing the syntax tree
t1 := - c
t2 := b * t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5
If common subexpressions are shared, that is, if the code is generated from a DAG
representation instead, the 3-address code is as follows
t1 := - c
t2 := b * t1
t5 := t2 + t2
a := t5
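The first listing above (the one generated from the syntax tree) can be reproduced by a post-order walk that creates a fresh temporary for every interior node. This is a minimal sketch under assumed conventions: trees are tuples, identifiers are strings, and the names gen_tac, new_temp and walk are hypothetical.

```python
def gen_tac(tree):
    """Translate an assignment tree into a list of three-address statements."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def walk(node):
        if isinstance(node, str):                 # identifier: already an address
            return node
        if len(node) == 2:                        # unary operator
            operand = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} := {node[0]} {operand}")
            return temp
        left, right = walk(node[1]), walk(node[2])
        temp = new_temp()
        code.append(f"{temp} := {left} {node[0]} {right}")
        return temp

    _, target, expr = tree
    code.append(f"{target} := {walk(expr)}")
    return code

term = ("*", "b", ("-", "c"))
tree = (":=", "a", ("+", term, term))

for line in gen_tac(tree):
    print(line)
```

Since this walk works on a tree, b * -c is translated twice; generating from a DAG, where the shared node is visited once, gives the shorter second listing.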
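• Value-number method: This is a tabular representation of the DAG in which each node is
a record in an array and the index of the record is its value number; a hash table can be
used to locate existing nodes. Consider the DAG of figure 21.4 for the example i := i+10.
Index  Label  Left              Right
1      id     to entry for i
2      num    10
3      +      1                 2
4      =      1                 3
From table 21.5 we can observe that column 1 holds the index (value number) of each node.
The second column holds the label of the node, and columns 3 and 4 identify its operands
by their indices. For a leaf, column 3 holds the symbol-table entry or the constant instead.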
In this module and subsequent modules we will use 3-address code as the
intermediate representation, and the forthcoming modules will discuss how to generate
3-address code for all the programming constructs.
Summary: In this module we discussed the kinds of static checking and the type checking of
expressions, statements and functions, with a focus on coercion. We also introduced
intermediate code generation. The next module will focus on how to generate 3-address code
for all programming constructs and how it is used for final code generation.