
MODULE 21 – TYPE CHECKING

In this module, we will discuss an important function of the semantic phase of the
compiler, namely type checking. Type checking involves identifying and reporting cases
where an operation is applied to incompatible operands.

21.1 Types of Check

The compiler needs to verify that the source program follows the syntactic and semantic
conventions of the language. This is done with the help of static checking, which reports
programming errors at compile time. Dynamic checking is done at run time to identify
and handle errors as they occur.

The various static checks performed by the compiler are listed below:

• Type Checking: This check reports an error when an operator is applied to an
incompatible operand. Example: the compiler reports an error if, say, an array variable is
added to an ordinary integer variable. Consider the following declarations and the
statements that follow them.

1. int op(int), op(float);
2. int f(float);
3. int a, c[10], d;
4. d = c + d;      // FAIL
5. *d = a;         // FAIL
6. a = op(d);      // OK: overloading (C++)
7. a = f(d);       // OK: coercion
8. vector<int> v;  // OK: template instantiation

Lines 1 to 3 declare some functions and variables. Line 4 adds an array variable (an
address) to an integer variable and is therefore reported as an error. Line 5 tries to
dereference an integer variable and is reported as an error. Line 6 is accepted because
the call is resolved through function overloading (as in C++). Line 7 passes an integer to
the function ‘f’, which expects a float; the implicit conversion (coercion) makes this an
accepted statement. Similarly, line 8 is a template instantiation and is also accepted by
the type check.

• Flow of control check: This verifies that statements which cause a branch have a valid
place to transfer control to. Example: break statements.

myfunc()
{  ...
   while (n)
   {  ...
      if (i > 10)
         break;        // OK
   }
}

myfunc1()
{  ...
   break;              // ERROR
}
Consider the functions myfunc() and myfunc1(). myfunc() has a while loop, and in the
body of this loop there is an if() conditional statement whose body is a break statement.
This is an accepted control flow: when the if() condition is true, control simply leaves the
enclosing loop without doing anything further. On the other hand, consider myfunc1(),
where there is no loop construct and yet we see a break. This is reported as an error
because there is no enclosing loop to break out of, so the location to jump to is not
defined.

• Uniqueness check: This ensures that an object is defined exactly once where the
programming language demands it. Example: the labels in a case statement must be
unique in Pascal, and an identifier must be declared only once within a scope.

myfunc2()
{  int i, j, i;       // ERROR
}

The function myfunc2() above declares two variables with the same name ‘i’ and is
therefore reported as an error.

• Name-related checks: This checks whether the same name appears more than once
where the language does not permit it. Example: in Ada there are places where a name
must not appear more than once, and the compiler needs to verify this. The following
example has two parameters with the same name and is therefore reported as an error.

cnufym(int a, int a)   // ERROR
{  ...
}

21.2 Position of the type checker

Having seen the kinds of static checks that are necessary, we now focus on type checking in
particular. Type checking is typically done after the input sentence has been parsed
successfully. The position of the type checker is given in figure 21.1: the syntax tree is used to
verify the type information and is then given to the intermediate code generator for producing
the intermediate representation.

Figure 21.1 Position of the type checker

Thus, as can be seen from figure 21.1, the type checker is part of the semantic phase of the
compiler. Type checking information is added to the semantic rules. Basic type checking is
performed first and is then extended to the type checking of more complex constructs.

21.3 Type Checking

In general, a language’s type system specifies which operations are valid depending on the
types of the operands. Type checking is done to ensure that operations are applied only to
operands of the correct types. Type systems define semantic rules to perform type checking.
Type expressions can be defined as follows:

1. A basic type is a type expression. Integer, float and type_error are all type expressions.
   Statements have the type void, which is also a basic type expression.
2. A type constructor applied to type expressions is a derived type expression.
   a. Arrays: If T is a type expression then array(A, T) is a type expression denoting the
      type of an array with elements of type T and index set A.
   b. Products: If T1 and T2 are type expressions then their Cartesian product T1 x T2
      is a type expression.
   c. Pointers: If T is a type expression then pointer(T) is a type expression denoting the
      type “pointer to an object of type T”.
   d. Functions: A function maps values from a domain type to a range type; this
      mapping, written domain → range, is a type expression.
3. Type variables appearing in type expressions are also type expressions.
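
To make this concrete, a compiler can represent type expressions as a small tree-shaped data
structure. The following C sketch shows one possible encoding; the names Type, mktype, the
field layout, and the use of a single BASIC_REAL for float/real are illustrative assumptions,
not something prescribed by this module.

#include <stdlib.h>

/* Kinds of type expressions: basic types and type constructors.       */
enum tkind { BASIC_INTEGER, BASIC_REAL, BASIC_CHAR, BASIC_BOOLEAN,
             BASIC_VOID, TYPE_ERROR, ARRAY, PRODUCT, POINTER, FUNCTION };

typedef struct Type {
    enum tkind   kind;
    struct Type *left;    /* element type / domain / first component   */
    struct Type *right;   /* range / second component, if any          */
    int          index;   /* size of the index set, used for ARRAY     */
} Type;

/* Build a type expression node.                                       */
static Type *mktype(enum tkind k, Type *l, Type *r, int index)
{
    Type *t = malloc(sizeof *t);
    t->kind = k; t->left = l; t->right = r; t->index = index;
    return t;
}

/* Example: the declaration "int f(float)" gives f the type expression
   FUNCTION(real -> integer):
   Type *f_type = mktype(FUNCTION, mktype(BASIC_REAL, 0, 0, 0),
                                   mktype(BASIC_INTEGER, 0, 0, 0), 0); */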

Having defined type expressions, let us now discuss the semantic rules necessary for type
checking the different programming constructs of a Pascal-like language.

21.3.1 Type checking of Expressions

The operands of an expression must be of compatible types. Every expression has a data type,
and this ‘type’ is an attribute associated with it. For type checking of expressions, the semantic
rules compute and verify this type attribute. Table 21.1 summarizes the semantic rules for type
checking of expressions.

Table 21.1 Semantic rules for type checking expressions

Production: E → literal
Semantic rule: E.type = char
Explanation: This is a terminating production. Since the RHS of the production is a character
constant, the type of the LHS expression is set to char.

Production: E → num
Semantic rule: E.type = integer
Explanation: This is also a terminating production, where the RHS is an integer constant;
hence the LHS expression is given the type integer.

Production: E → id
Semantic rule: E.type = lookup(id.entry)
Explanation: The RHS of the production is a variable, so the LHS expression is given the type
recorded for that variable in the symbol table.

Production: E → E1 mod E2
Semantic rule: E.type = if E1.type == integer and E2.type == integer then integer
                        else type_error
Explanation: This production computes the “mod” of two expressions E1 and E2. As we know,
“mod” can be computed only if both expressions are integers, and this is what the semantic rule
checks. If both expression types are integer then the LHS expression is of type integer;
otherwise we report that mod cannot be computed by giving the LHS expression the type
type_error.

Production: E → E1 [ E2 ]
Semantic rule: E.type = if E2.type == integer and E1.type == array(s, t) then t
                        else type_error
Explanation: This production indexes an array. Since the RHS E1 denotes an array, the rule
verifies that the index E2 has type integer and, if so, assigns the element type t of E1 to E. On a
mismatch the type of E is type_error.

Production: E → E1 ↑
Semantic rule: E.type = if E1.type == pointer(t) then t else type_error
Explanation: This production dereferences a pointer. The rule verifies that E1 has a pointer
type; if so, the type of the LHS expression E is the pointed-to type t, otherwise it is type_error.
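
As an illustration, the last three rules of Table 21.1 could be coded as shown below, reusing the
Type representation sketched in section 21.3; the helper names check_mod, check_index and
check_deref are assumptions made for this sketch.

/* Assumes the Type structure and mktype() sketched in section 21.3.  */

/* E -> E1 mod E2 : both operands must be integers.                   */
Type *check_mod(Type *e1, Type *e2)
{
    if (e1->kind == BASIC_INTEGER && e2->kind == BASIC_INTEGER)
        return e1;                              /* type integer       */
    return mktype(TYPE_ERROR, 0, 0, 0);         /* report type_error  */
}

/* E -> E1 [ E2 ] : E2 must be integer and E1 must be array(s, t);    */
/* the result is the element type t.                                  */
Type *check_index(Type *e1, Type *e2)
{
    if (e1->kind == ARRAY && e2->kind == BASIC_INTEGER)
        return e1->left;                        /* element type t     */
    return mktype(TYPE_ERROR, 0, 0, 0);
}

/* E -> E1 ^ : E1 must be pointer(t); the result is t.                */
Type *check_deref(Type *e1)
{
    if (e1->kind == POINTER)
        return e1->left;                        /* pointed-to type t  */
    return mktype(TYPE_ERROR, 0, 0, 0);
}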

21.3.2 Type checking of Statements

Statements can be simple assignment statements, conditional statements, sequences of
statements or loops. The attribute of a statement is also its type, and its value is void if the
statement is type-correct. The set of semantic rules to perform type checking of statements is
given in Table 21.2.
Table 21.2 Semantic rules for type checking of statements

Production: P → D ; S
Semantic rule: (none)
Explanation: This is the initial production; no semantic rule for type checking is attached to it.

Production: S → id := E
Semantic rule: S.type = if id.type == E.type then void else type_error
Explanation: Simple assignment statement. The rule checks whether the data type of the LHS
variable id is the same as that of the RHS expression E. If it is, the statement is assigned the
type void; otherwise it is assigned type_error.

Production: S → if E then S1
Semantic rule: S.type = if E.type == boolean then S1.type else type_error
Explanation: If-then conditional statement. The rule verifies that the expression evaluates to
true/false (boolean); if so, S is assigned the type of S1, otherwise type_error.

Production: S → while E do S1
Semantic rule: S.type = if E.type == boolean then S1.type else type_error
Explanation: Loop construct. The rule is similar to that of the conditional statement: if the
condition of the while construct evaluates to true/false, the type of S1 is assigned to S,
otherwise S is assigned type_error.

Production: S → S1 ; S2
Semantic rule: S.type = if S1.type == void and S2.type == void then void else type_error
Explanation: A sequence of statements. Since a correct statement has type void, the rule
checks that both statements have type void; if so, S is assigned the type void, otherwise
type_error.
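
These statement rules can be sketched in the same style as the expression rules, again
assuming the Type representation of section 21.3; check_assign, check_cond and check_seq
are illustrative names, and the assignment check compares only the type kind, ignoring
structural equivalence of derived types.

/* S -> id := E : variable and expression must have the same type;    */
/* a correct statement has type void.                                  */
Type *check_assign(Type *id_type, Type *e_type)
{
    if (id_type->kind == e_type->kind)
        return mktype(BASIC_VOID, 0, 0, 0);
    return mktype(TYPE_ERROR, 0, 0, 0);
}

/* S -> if E then S1  and  S -> while E do S1 : the condition must be  */
/* boolean, and S inherits the type of S1.                             */
Type *check_cond(Type *e_type, Type *s1_type)
{
    if (e_type->kind == BASIC_BOOLEAN)
        return s1_type;
    return mktype(TYPE_ERROR, 0, 0, 0);
}

/* S -> S1 ; S2 : both statements must be correct (type void).         */
Type *check_seq(Type *s1_type, Type *s2_type)
{
    if (s1_type->kind == BASIC_VOID && s2_type->kind == BASIC_VOID)
        return mktype(BASIC_VOID, 0, 0, 0);
    return mktype(TYPE_ERROR, 0, 0, 0);
}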

21.3.3 Type checking of functions

A function performs some processing and computation based on its parameters. Type checking
of functions verifies that the arguments passed to a function match its declared parameter
types, i.e. that the application respects the mapping from the function’s domain to its range.
Table 21.3 summarizes the check.
Table 21.3 Semantic rules for type checking functions

Production: T → T1 ‘→’ T2
Semantic rule: T.type = T1.type → T2.type
Explanation: This rule builds the function type as the mapping from the domain type T1 to the
range type T2.

Production: E → E1 ( E2 )
Semantic rule: E.type = if E2.type == s and E1.type == s → t then t else type_error
Explanation: Function application. If the argument E2 has some type s and E1 has the function
type s → t (derived from the previous rule), then the LHS expression E is assigned the type t;
otherwise type_error.
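
A sketch of the application rule, once more assuming the Type representation of section 21.3;
types_equal and check_apply are illustrative names.

/* Structural equality of two type expressions.                        */
int types_equal(Type *a, Type *b)
{
    if (a == 0 || b == 0) return a == b;
    if (a->kind != b->kind) return 0;
    return types_equal(a->left, b->left) && types_equal(a->right, b->right);
}

/* E -> E1 ( E2 ) : E1 must have type s -> t and E2 must have type s;  */
/* the result type is t.                                               */
Type *check_apply(Type *e1, Type *e2)
{
    if (e1->kind == FUNCTION && types_equal(e1->left, e2))
        return e1->right;                       /* range type t        */
    return mktype(TYPE_ERROR, 0, 0, 0);
}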

21.4 Coercions

Data type conversions need to be carried out to support operations on operands of differing
data types. For example, an integer variable may need to be combined with a floating point
variable in an expression. To support such operations, type conversions need to be done. Type
conversions can be explicit or implicit. Explicit type conversions are written by the programmer
in the code itself. Implicit type conversions, which are referred to as coercions, are inserted by
the compiler. To support coercions, appropriate semantic rules need to be written.

Table 21.4 gives the semantic rules for performing coercions between integer and real
operands.

Table 21.4 Semantic rules to support coercion

Production: E → num
Semantic rule: E.type = integer
Explanation: This is a terminating production where the RHS is an integer constant, so the LHS
expression is assigned the type integer.

Production: E → num.num
Semantic rule: E.type = real
Explanation: This is a terminating production where the RHS is a real constant, so the LHS
expression is assigned the type real.

Production: E → id
Semantic rule: E.type = lookup(id.entry)
Explanation: The RHS of the production is a variable, so the LHS expression is assigned the
type of that variable.

Production: E → E1 op E2
Semantic rule:
E.type = if E1.type == integer and E2.type == integer then integer
         else if E1.type == integer and E2.type == real then real
         else if E1.type == real and E2.type == integer then real
         else if E1.type == real and E2.type == real then real
         else type_error
Explanation: The LHS expression is E1 operated with E2. E is assigned the type integer only if
both E1 and E2 have type integer. If either operand is of type real, the LHS expression is
assigned the type real. Thus operands of differing data types are type cast so that the desired
operation can be performed.
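
The rule for E → E1 op E2 amounts to computing the ‘wider’ of the two operand types. A
minimal sketch, using the tkind enumeration of section 21.3 (result_type is an illustrative name,
and real is represented there as BASIC_REAL):

/* Result type of E1 op E2 with implicit integer-to-real coercion.     */
enum tkind result_type(enum tkind t1, enum tkind t2)
{
    if (t1 == BASIC_INTEGER && t2 == BASIC_INTEGER)
        return BASIC_INTEGER;
    if ((t1 == BASIC_INTEGER || t1 == BASIC_REAL) &&
        (t2 == BASIC_INTEGER || t2 == BASIC_REAL))
        return BASIC_REAL;               /* at least one operand real  */
    return TYPE_ERROR;
}

/* When a coercion is applied, the intermediate code generator would   */
/* also emit an explicit conversion for the coerced operand; this is   */
/* taken up with intermediate code generation.                         */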

21.5 Intermediate Code Generation

The phases discussed so far constitute the analysis part of the compiler: the high-level input
code is analysed for syntax and semantics. After the input has been verified to be correct, it
needs to be converted to a representation from which generating assembly code is easier. The
intermediate code generation phase of the compiler thus facilitates retargeting, since a back end
for a new machine can be attached to an existing front end. This phase also enables
machine-independent code optimization. The position of the intermediate code generation
phase is given in figure 21.2, where it can be observed that the phase acts as a bridge between
the front end and the back end of the compiler.

Figure 21.2 Position of the Intermediate code generator

21.6 Representations of Intermediate Code

Intermediate code can be represented in the following ways:

• Graphical representations: The input is represented as an abstract syntax tree (AST). We
have already discussed the construction of the AST, which is a binary tree, in the previous
modules. This AST is the output of the semantic phase of the compiler and can serve as an
intermediate representation for generating target code. A directed acyclic graph (DAG) can
also be used in place of the AST. Consider the example AST and its corresponding DAG
representation in figure 21.3 for the input a := b*-c + b*-c.

The construction of the AST using the functions mknode() and mkleaf() was discussed in the
previous module. The construction of the DAG is the same as that of the AST except that
common subexpressions are represented only once in the graph. The procedure to construct
the DAG for an expression is: search the array of nodes for a node M with label op, left child l
and right child r. If such a node exists, return its value number M. If not, create in the array a
new node N with label op, left child l and right child r, and return its value number.

Figure 21.3 AST and DAG for an example

• Postfix notation: In this notation the operators are written after their operands, and the
representation can be evaluated using a stack that holds intermediate values. Consider the
example a := b*-c + b*-c. The corresponding postfix (stack-machine) code is given as follows:

iload 2 // push b
iload 3 // push c
ineg // uminus
imul // *
iload 2 // push b
iload 3 // push c
ineg // uminus
imul // *
iadd // +
istore 1 // store a

• Three-address code: Every statement is split so that each instruction has at most three
operands and at most two operators, including the assignment operator, e.g. x := y op z. For the
same example, the following is the three-address code obtained from the linearized tree
representation:

t1 := - c
t2 := b * t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5
If common subexpressions are shared, i.e. a DAG representation is used, the same example
yields the following three-address code:

t1 := - c
t2 := b * t1
t5 := t2 + t2
a := t5
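
A minimal sketch of how such three-address code can be produced by walking the AST,
generating a fresh temporary for every interior node; the Node layout and the helper gen() are
assumptions made for this illustration, not the notation used in later modules.

#include <stdio.h>

/* A tiny AST: leaves are identifiers, interior nodes are operators.   */
typedef struct Node {
    char op;                    /* 'i' marks an identifier leaf;        */
                                /* otherwise an operator: '+', '*', '-' */
    const char *name;           /* identifier name for leaves           */
    struct Node *left, *right;  /* right == NULL for unary minus        */
} Node;

static int temp_count = 0;

/* Generate three-address code for n; return the name of the place     */
/* (identifier or temporary) holding its value.                        */
static const char *gen(Node *n, char *buf)
{
    char lbuf[16], rbuf[16];
    if (n->op == 'i') return n->name;               /* leaf: no code    */
    const char *l = gen(n->left, lbuf);
    const char *r = n->right ? gen(n->right, rbuf) : 0;
    sprintf(buf, "t%d", ++temp_count);              /* newtemp()        */
    if (r) printf("%s := %s %c %s\n", buf, l, n->op, r);
    else   printf("%s := %c %s\n", buf, n->op, l);  /* unary minus      */
    return buf;
}

int main(void)
{
    /* AST for a := b * -c + b * -c (no sharing of common terms).       */
    Node b1 = {'i', "b", 0, 0}, c1 = {'i', "c", 0, 0};
    Node b2 = {'i', "b", 0, 0}, c2 = {'i', "c", 0, 0};
    Node m1 = {'-', 0, &c1, 0}, m2 = {'-', 0, &c2, 0};
    Node p1 = {'*', 0, &b1, &m1}, p2 = {'*', 0, &b2, &m2};
    Node sum = {'+', 0, &p1, &p2};
    char buf[16];
    printf("a := %s\n", gen(&sum, buf));            /* prints a := t5   */
    return 0;
}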

• Value-number method: This is a tabular representation of the DAG, typically searched using a
hash table. Consider the DAG of figure 21.4 for the example i := i+10.

Figure 21.4 DAG for i := i+10

This DAG is stored row by row in Table 21.5.

Table 21.5 Value-number scheme for i:= i+10

1   id     to entry for i
2   num    10
3   +      1    2
4   =      1    3

From Table 21.5 we can observe that column 1 gives the value number (index) of each node,
column 2 gives the node’s label (an identifier, a number or an operator), and columns 3 and 4
give its operands in terms of these value numbers.
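
A minimal sketch of the search-or-create step behind Table 21.5 is given below; for simplicity a
linear search of the node array is used in place of a hash table, and the names vnode and
lookup_or_insert are assumptions made for this illustration.

#include <stdio.h>
#include <string.h>

/* One row of the value-number table: a label and up to two operands   */
/* given as value numbers (0 means "unused").                          */
typedef struct { char label[8]; int l, r; } vnode;

static vnode table[100];
static int   nnodes = 0;

/* Search the array for a node with this label and children; if found, */
/* return its value number, otherwise create a new node and return its */
/* value number.  (A hash table would replace the linear search.)      */
static int lookup_or_insert(const char *label, int l, int r)
{
    for (int i = 1; i <= nnodes; i++)
        if (strcmp(table[i].label, label) == 0 &&
            table[i].l == l && table[i].r == r)
            return i;
    nnodes++;
    strcpy(table[nnodes].label, label);
    table[nnodes].l = l; table[nnodes].r = r;
    return nnodes;
}

int main(void)
{
    /* Build the table of Table 21.5 for i := i + 10.                   */
    int i_  = lookup_or_insert("id i", 0, 0);       /* value number 1   */
    int ten = lookup_or_insert("num 10", 0, 0);     /* value number 2   */
    int add = lookup_or_insert("+", i_, ten);       /* value number 3   */
    lookup_or_insert("=", i_, add);                 /* value number 4   */
    for (int k = 1; k <= nnodes; k++)
        printf("%d  %-7s %d %d\n", k, table[k].label, table[k].l, table[k].r);
    return 0;
}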

In this and the subsequent modules we will use three-address code as the intermediate
representation, and the forthcoming modules will discuss how to generate this three-address
code for all the programming constructs.

Summary: In this module we discussed the kinds of static checking and the type checking of
expressions, statements and functions, with a focus on coercion. We also gave an introduction
to intermediate code generation. The next module will focus on how to generate three-address
code for all programming constructs and its use in final code generation.
