Chapter 6
CHAPTER SIX
6. SYMBOL TABLE & TYPE CHECKING
A symbol table is an important data structure created and maintained by a compiler to store
information about the occurrence of various entities such as variable names, function names,
objects, classes, and interfaces. The symbol table is used by both the analysis and the synthesis
parts of a compiler.
Symbol tables are data structures that are used by compilers to hold information about source-program
constructs. The information is collected incrementally by the analysis phases of a compiler and
used by the synthesis phases to generate the target code. Entries in the symbol table contain
information about an identifier such as its character string (or lexeme), its type, its position in
storage, and any other relevant information. Symbol tables typically need to support multiple
declarations of the same identifier within a program.
A compiler uses a symbol table to keep track of scope and binding information about names. The
symbol table is searched every time a name is encountered in the source text. Changes to the table
occur if a new name or new information about an existing name is discovered. A symbol-table
mechanism must allow us to add new entries and find existing entries efficiently.
A symbol table may serve the following purposes, depending on the language at hand:
- To store the names of all entities in a structured form in one place.
- To verify that a variable has been declared.
- To implement type checking, by verifying that assignments and expressions in the source code
  are semantically correct.
- To determine the scope of a name (scope resolution).
A symbol table can be implemented as either a linear list or a hash table. It maintains an entry
for each name in the following format:
<symbol name, type, attribute>
For example, if a symbol table has to store information about the following variable declaration:
static int interest;
then it stores an entry such as:
<interest, int, static>
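The entry format above can be sketched as a small record. This is only an illustration: the class name SymbolEntry and its field names are hypothetical, and it assumes the storage class static is what gets recorded as the attribute.

```python
from dataclasses import dataclass


@dataclass
class SymbolEntry:
    """One symbol-table row in the form <symbol name, type, attribute>."""
    name: str       # the lexeme of the identifier
    type: str       # the declared type
    attribute: str  # assumption: the storage class is kept here


# Entry for the declaration `static int interest;`
entry = SymbolEntry(name="interest", type="int", attribute="static")
print(entry)
```

A real compiler would usually keep more fields (scope, storage position, and so on); three are kept here only to mirror the format in the text.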
The symbol-table entry itself can be set up when the role of a name becomes clear, with the
attribute values being filled in as the information becomes available. In some cases, the entry can
be initiated from the lexical analyzer as soon as a name is seen in the input. More often, one
name may denote several different objects, perhaps even in the same block or procedure. For
example, the C declarations
int x;
struct x {float y,z;};
use x as both an integer and as the tag of a structure with two fields. In such cases, the lexical
analyzer can only return to the parser the name itself (or a pointer to the lexeme forming that
name), rather than a pointer to the symbol-table entry. The record in the symbol table is created
when the syntactic role played by this name is discovered. For the above declarations, two
symbol-table entries for x would be created: one with x as an integer and one with x as a structure tag.
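One way to hold both entries for x is to key the table by the role of the name as well as its spelling. The sketch below assumes two hypothetical name spaces, "ordinary" and "tag" (C does keep structure tags in a name space separate from ordinary identifiers); the function declare and the shape of the stored records are illustrative only.

```python
# Assumption: each entry is keyed by (name space, name), so the same
# spelling can be declared once per name space.
table = {}


def declare(namespace, name, info):
    """Create the entry once the syntactic role of the name is known."""
    key = (namespace, name)
    if key in table:
        raise KeyError(f"redeclaration of {name!r} in the {namespace} name space")
    table[key] = info


# int x;
declare("ordinary", "x", {"kind": "variable", "type": "int"})
# struct x {float y,z;};
declare("tag", "x", {"kind": "struct", "fields": {"y": "float", "z": "float"}})

print(len(table))  # two separate entries for the name x
```

Because the lexical analyzer cannot know which role x plays, it would pass only the string "x" to the parser, and the parser would pick the name space when calling declare.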
Attributes of a name are entered in response to declarations, which may be implicit. Labels are
often identifiers followed by a colon, so one action associated with recognizing such an identifier
may be to enter this fact into the symbol table. Similarly, the syntax of procedure declarations
specifies that certain identifiers are formal parameters.
Of the possible implementations, hash tables are the most common: the source-code symbol
itself is used as the key for the hash function, and the value returned is the information stored
about the symbol.
Principles of Compiler Design (SEng 4031) 2 Prepared by L. A.
DMIoT School of Computing Software Engineering Academic Program
An attribute of a symbol is the information associated with that symbol in the source code, such
as its value, state, scope, and type. The insert( ) function
takes the symbol and its attributes as arguments and stores the information in the symbol table.
For example:
int a;
should be processed by the compiler as: insert(a, int);
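A minimal sketch of insert( ) on a hash table follows, using a Python dict as the hash table. The parameter names and the decision to reject a second insertion of the same name are assumptions made for the example, not something the text prescribes.

```python
# A Python dict is a built-in hash table; the symbol's name is the key.
symbol_table = {}


def insert(symbol, type_, **attributes):
    """Store a symbol with its type and any further attributes.

    Assumption: inserting a name twice into the same table is an error.
    """
    if symbol in symbol_table:
        raise ValueError(f"duplicate declaration of {symbol!r}")
    symbol_table[symbol] = {"type": type_, **attributes}


# The declaration `int a;` is processed as insert(a, int).
insert("a", "int")
print(symbol_table)
```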
lookup( )
The lookup( ) operation is used to search for a name in the symbol table to determine:
if the symbol exists in the table
if it is declared before it is used
if the name is used in the scope
if the symbol is initialized
if the symbol is declared multiple times
The format of the lookup( ) function varies according to the programming language. The basic format
should match the following:
lookup(symbol)
This method returns 0 (zero) if the symbol does not exist in the symbol table. If the symbol exists
in the symbol table, it returns its attributes stored in the table.
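The behaviour described above can be sketched as follows. The table contents and the attribute name initialized are hypothetical; returning 0 for a missing symbol follows the text, although idiomatic Python would more often return None.

```python
# Hypothetical table holding one previously inserted symbol.
symbol_table = {"a": {"type": "int", "initialized": False}}


def lookup(symbol):
    """Return the stored attributes, or 0 if the symbol is not in the table."""
    return symbol_table.get(symbol, 0)


attrs = lookup("a")    # attributes of a declared symbol
missing = lookup("b")  # 0: the symbol was never declared

if missing == 0:
    print("error: 'b' used before declaration")
if attrs != 0 and not attrs["initialized"]:
    print("warning: 'a' may be used uninitialized")
```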
Scope Management
A compiler maintains two types of symbol tables: a global symbol table, which can be
accessed by all the procedures, and scope symbol tables, which are created for each scope in the
program.
To determine the scope of a name, symbol tables are arranged in a hierarchical structure, as shown
in the example below:
...
int value = 10;
int pro_one( )
{
    int one_1;
    int one_2;
    {
        int one_3;    /* inner scope 1 */
        int one_4;
    }
    int one_5;
    {
        int one_6;    /* inner scope 2 */
        int one_7;
    }
}
int pro_two( )
{
    int two_1;
    int two_2;
    {
        int two_3;    /* inner scope 3 */
        int two_4;
    }
    int two_5;
}
...
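The above program can be represented as a hierarchical structure of symbol tables. The global
table is the root; its children are the tables for pro_one and pro_two; the pro_one table has child
tables for inner scope 1 and inner scope 2, and the pro_two table has a child table for inner scope 3.
The global symbol table contains the name of one global variable (int value) and the two
procedure names, all of which are available to every child table in the hierarchy. The names
in the pro_one symbol table (and all its child tables) are not available to pro_two
and its child tables.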
This hierarchy of symbol tables is maintained by the semantic analyzer. Whenever a name
needs to be looked up, the search proceeds as follows:
1. The name is first searched for in the current scope, i.e., the current symbol table.
2. If the name is found, the search is complete; otherwise the search continues in the parent
symbol table.
3. Step 2 is repeated until either the name is found or the global symbol table has been searched.
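The search through the hierarchy can be sketched with one table per scope and a link to the parent table. The class Scope and its method names are hypothetical; only a few of the names from the example program are entered, to keep the sketch short.

```python
class Scope:
    """One symbol table in the hierarchy, linked to its parent table."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # None for the global symbol table
        self.symbols = {}

    def insert(self, symbol, type_):
        self.symbols[symbol] = type_

    def lookup(self, symbol):
        """Search the current table, then each parent up to the global table."""
        scope = self
        while scope is not None:
            if symbol in scope.symbols:
                return scope.name, scope.symbols[symbol]
            scope = scope.parent
        return None  # not found even in the global table


global_scope = Scope("global")
global_scope.insert("value", "int")

pro_one = Scope("pro_one", parent=global_scope)
pro_one.insert("one_1", "int")
inner_1 = Scope("inner scope 1", parent=pro_one)
inner_1.insert("one_3", "int")

pro_two = Scope("pro_two", parent=global_scope)
pro_two.insert("two_1", "int")

print(inner_1.lookup("one_1"))  # found in the parent table, pro_one
print(pro_two.lookup("one_1"))  # None: pro_one's names are not visible here
```

Each lookup returns the name of the table where the symbol was found, which makes it easy to see that names in pro_one are hidden from pro_two while the global variable is visible everywhere.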
Checking done by a compiler is said to be static, while checking done when the target program
runs is termed dynamic. A language is strongly typed if its compiler can guarantee that the
programs it accepts will execute without type errors; that is, a program for which the type
checker reports no errors will not produce a type error at run time.
Static checking ensures that certain kinds of errors are detected and reported. The following are
examples of static checks:
1. Type Checks. A compiler should report an error if an operator is applied to an incompatible
operand, for example, if an array variable and a function variable are added together.
2. Flow-of-Control Checks. Statements that cause flow of control to leave a construct must
have some place to which to transfer the flow of control. For example, a break statement
in C causes control to leave the smallest enclosing while, for, or switch statement; an error
occurs if such an enclosing statement does not exist.
3. Uniqueness Checks. There are situations in which an object must be defined exactly once.
For example, in Pascal, an identifier must be declared uniquely, labels in a case statement
must be distinct, and elements in a scalar type may not be repeated.
4. Name-Related Checks. Sometimes, the same name must appear two or more times. For
example, in Ada, a loop or block may have a name that appears at the beginning and end
of the construct. The compiler must check that the same name is used at both places.
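Two of the checks above can be sketched on a toy program representation. Everything about the representation is an assumption made for the example: statements are tuples whose first element is the statement kind and whose optional second element is a list of nested statements.

```python
# Constructs that a break statement is allowed to leave.
ENCLOSING = {"while", "for", "switch"}


def check_breaks(stmt, inside=False):
    """Flow-of-control check: report each break with no enclosing construct."""
    kind = stmt[0]
    if kind == "break":
        return [] if inside else ["break outside of while, for, or switch"]
    children = stmt[1] if len(stmt) > 1 else []
    errors = []
    for child in children:
        errors += check_breaks(child, inside or kind in ENCLOSING)
    return errors


def check_unique(labels):
    """Uniqueness check: return the labels that appear more than once."""
    seen, duplicates = set(), []
    for label in labels:
        if label in seen and label not in duplicates:
            duplicates.append(label)
        seen.add(label)
    return duplicates


ok = ("block", [("while", [("block", [("break",)])])])
bad = ("block", [("assign",), ("break",)])

print(check_breaks(ok))             # no errors
print(check_breaks(bad))            # one error
print(check_unique([1, 2, 2, 3]))   # the repeated case label
```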
A type checker verifies that the type of a construct matches that expected by its context. For example,
the built-in arithmetic operator mod requires integer operands, so a type checker must verify that
the operands of mod have type integer. A type checker should also verify that the type of a value
assigned to a variable is compatible with the type of the variable. Similarly, the type checker must
verify that dereferencing is applied only to a pointer, that indexing is done only on an array, that a
user-defined function is applied to the correct number and type of arguments, and so forth.
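A few of these rules can be sketched as a recursive function over expressions. The tuple encoding of expressions and types, the function name type_of, and the environment env (standing in for the symbol table) are all hypothetical choices for the illustration.

```python
# Hypothetical expression forms:
#   ("num", 3), ("real", 2.5)     literals
#   ("var", "i")                  variable; its type comes from env
#   ("mod", left, right)          operator that requires integer operands
#   ("index", array, subscript)   array indexing
#   ("deref", operand)            pointer dereference
# Constructed types are tuples such as ("array", "real").


def type_of(expr, env):
    """Return the type of expr, or raise TypeError if a rule is violated."""
    kind = expr[0]
    if kind == "num":
        return "integer"
    if kind == "real":
        return "real"
    if kind == "var":
        if expr[1] not in env:
            raise TypeError(f"undeclared variable {expr[1]!r}")
        return env[expr[1]]
    if kind == "mod":
        left, right = type_of(expr[1], env), type_of(expr[2], env)
        if left != "integer" or right != "integer":
            raise TypeError("operands of mod must have type integer")
        return "integer"
    if kind == "index":
        array, subscript = type_of(expr[1], env), type_of(expr[2], env)
        if not (isinstance(array, tuple) and array[0] == "array"):
            raise TypeError("indexing applied to a non-array")
        if subscript != "integer":
            raise TypeError("array subscript must have type integer")
        return array[1]
    if kind == "deref":
        operand = type_of(expr[1], env)
        if not (isinstance(operand, tuple) and operand[0] == "pointer"):
            raise TypeError("dereferencing applied to a non-pointer")
        return operand[1]
    raise ValueError(f"unknown expression kind {kind!r}")


env = {"i": "integer", "r": "real",
       "a": ("array", "real"), "p": ("pointer", "integer")}

print(type_of(("mod", ("var", "i"), ("num", 3)), env))
print(type_of(("index", ("var", "a"), ("var", "i")), env))
```

Calling type_of(("mod", ("var", "r"), ("num", 3)), env) raises TypeError, because r has type real.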
Type information produced by the type checker may be needed when the code is generated. For
example, arithmetic operators like + usually apply to either integers or reals, perhaps to other
types, and we have to look at the context of + to determine the sense that is intended. A symbol
that can represent different operations in different contexts is said to be "overloaded".
Overloading may be accompanied by coercion of types, where a compiler supplies an operator to
convert an operand into the type expected by the context.
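Overloading of + together with coercion can be sketched as below. The operation names int+ and real+, the conversion operator inttoreal, and the (code, type) pair representation are assumptions chosen for the example.

```python
def resolve_plus(left, right):
    """Pick the sense of `+` from its operand types.

    left and right are (code, type) pairs; the result is a (code, type)
    pair for the sum, with a conversion inserted where one is needed.
    """
    (lcode, ltype), (rcode, rtype) = left, right
    numeric = ("integer", "real")
    if ltype not in numeric or rtype not in numeric:
        raise TypeError("+ is not defined for these operand types")
    if ltype == rtype == "integer":
        return (f"({lcode} int+ {rcode})", "integer")
    # At least one operand is real: coerce any integer operand to real,
    # then use real addition.
    if ltype == "integer":
        lcode = f"inttoreal({lcode})"
    if rtype == "integer":
        rcode = f"inttoreal({rcode})"
    return (f"({lcode} real+ {rcode})", "real")


print(resolve_plus(("i", "integer"), ("j", "integer")))
print(resolve_plus(("i", "integer"), ("x", "real")))
```

The second call shows the compiler-supplied conversion: the integer operand i is wrapped in inttoreal before real addition is applied.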
A notion distinct from overloading is that of "polymorphism." The body of a polymorphic
function can be executed with arguments of several types.
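As a small illustration, the function below has a single body that works for sequences of any element type, in contrast to an overloaded symbol, which selects a different operation for each type. The function name length and the type variable T are hypothetical, and Python's type hints are used here only as notation.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")  # stands for any element type


def length(items: Sequence[T]) -> int:
    """Count the elements; the body never inspects the element type."""
    count = 0
    for _ in items:
        count += 1
    return count


print(length([1, 2, 3]))       # sequence of integers
print(length(["a", "b"]))      # sequence of strings
print(length([[1.5], [2.5]]))  # sequence of lists of reals
```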