Static Checking and Type Systems
Chapter 6

COP5621 Compiler Construction


Copyright Robert van Engelen, Florida State University, 2007-2013

The Structure of our Compiler Revisited

[Figure: compiler pipeline. Character stream → lexical analyzer (Lex specification) → token stream → syntax-directed static checker (Yacc specification, type checking) → syntax-directed translator (code generation, JVM specification) → Java bytecode]

Static versus Dynamic Checking

• Static checking: the compiler enforces the programming language’s static semantics
– Program properties that can be checked at compile time
• Dynamic semantics: checked at run time
– The compiler generates verification code to enforce the programming language’s dynamic semantics

Static Checking
• Typical examples of static checking are
– Type checks
– Flow-of-control checks
– Uniqueness checks
– Name-related checks

Type Checking, Overloading, Coercion, Polymorphism
class X { virtual int m(); } *x;
class Y: public X { virtual int m(); } *y;
int op(int), op(float);
int f(float);
int a, c[10], d;

d = c + d; // FAIL: c + d is a pointer (int*), d is an int
*d = a; // FAIL: d is not a pointer
a = op(d); // OK: static overloading (C++)
a = f(d); // OK: coercion of d to float
a = x->m(); // OK: dynamic binding (C++)
vector<int> v; // OK: template instantiation

Flow-of-Control Checks

myfunc()
{ …
  break; // ERROR
}

myfunc()
{ …
  while (n)
  { …
    if (i>10)
      break; // OK
  }
}

myfunc()
{ …
  switch (a)
  { case 0:
      break; // OK
    case 1:
      …
  }
}
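
The check above can be sketched as a walk over a syntax tree that tracks how many enclosing loops or switches surround each statement. This is only an illustration in Python; the tuple-based AST and the name check_breaks are assumptions made for the example, not part of the course compiler.

```python
# Hypothetical mini-AST: each statement is a tuple whose first item is its kind.
#   ("break",)         a break statement
#   ("while", body)    a loop with a list of body statements
#   ("switch", body)   a switch with a list of body statements
#   ("if", body)       a conditional; it does not make break legal by itself
#   ("other",)         any other statement

def check_breaks(stmts, breakable_depth=0):
    """Return a list of error messages for misplaced break statements."""
    errors = []
    for stmt in stmts:
        kind = stmt[0]
        if kind == "break":
            if breakable_depth == 0:
                errors.append("break outside loop or switch")
        elif kind in ("while", "switch"):
            errors += check_breaks(stmt[1], breakable_depth + 1)
        elif kind == "if":
            errors += check_breaks(stmt[1], breakable_depth)
    return errors

bad_body = [("other",), ("break",)]                # break directly in a function body
good_body = [("while", [("if", [("break",)])])]    # break nested inside a while loop
print(check_breaks(bad_body))
print(check_breaks(good_body))
```

A one-pass compiler could keep the same depth counter in the parser instead of walking a tree afterwards.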

Uniqueness Checks
myfunc()
{ int i, j, i; // ERROR

}

cnufym(int a, int a) // ERROR


{ …
}

struct myrec
{ int name;
};
struct myrec // ERROR
{ int id;
};
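
A uniqueness check amounts to remembering which names a scope has already declared. The sketch below is a minimal Python illustration; the function name find_duplicates and the list-of-names input are assumptions made for the example.

```python
def find_duplicates(names):
    """Return names declared more than once, in order of first repetition."""
    seen = set()
    duplicates = []
    for name in names:
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates

print(find_duplicates(["i", "j", "i"]))        # locals of myfunc
print(find_duplicates(["a", "a"]))             # parameters of cnufym
print(find_duplicates(["myrec", "myrec"]))     # struct tags in one scope
print(find_duplicates(["name"]))               # fields of a single struct
```

In a real compiler the same test would typically happen when a name is inserted into the symbol table of the current scope.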

Name-Related Checks

LoopA: for (int I = 0; I < n; I++)
{ …
  if (a[I] == 0)
    break LoopB; // ERROR: no enclosing statement labeled LoopB (Java labeled loop)
}

One-Pass versus Multi-Pass Static Checking

• One-pass compiler: static checking in C, Pascal, Fortran, and many other languages is performed in one pass while intermediate code is generated
– Influences design of a language: placement constraints, such as declaring a name before its use
• Multi-pass compiler: static checking in Ada, Java, and C# is performed in a separate phase, sometimes by traversing a syntax tree multiple times

Type Expressions
• Type expressions are used in declarations
and type casts to define or refer to a type
– Primitive types, such as int and bool
– Type constructors, such as pointer-to, array-of,
records and classes, templates, and functions
– Type names, such as typedefs in C and named
types in Pascal, refer to type expressions
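
Graph Representations for Type Expressions

int *f(char*,char*)

[Figure: tree form and DAG form of the type expression fun(args(pointer(char), pointer(char)), pointer(int)). In the tree each argument has its own pointer(char) subtree; in the DAG both arguments share a single pointer(char) node.]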

Cyclic Graph Representations

Source program:
struct Node
{ int val;
  struct Node *next;
};

[Figure: internal compiler representation of the Node type as a cyclic graph. The struct node has fields val (int) and next (pointer), and the pointer node refers back to the struct node.]

Name Equivalence

• Each type name is a distinct type, even when the type expressions that the names refer to are the same
• Types are identical only if names match
• Used by Pascal (inconsistently)

type link = ^node;
var next : link;
    last : link;
    p : ^node;
    q, r : ^node;

With name equivalence in Pascal:
p ≠ next
p ≠ last
p = q = r
next = last

Structural Equivalence of Type Expressions

• Two types are the same if they are structurally identical
• Used in C/C++, Java, C#

[Figure: two type graphs, each a pointer to a struct with fields val (int) and next (pointer), compared node by node and found equal: pointer = pointer]
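
Comparing two type graphs structurally needs some care once the graphs are cyclic, as with the Node type. A minimal Python sketch follows, assuming a hypothetical TypeNode class and ignoring field names; pairs of nodes that are already being compared are assumed equal so the recursion terminates.

```python
class TypeNode:
    def __init__(self, kind, children=None):
        self.kind = kind                  # e.g. "int", "pointer", "struct"
        self.children = children or []    # component types, in order

def struct_equiv(a, b, assumed=None):
    """True if the graphs rooted at a and b have the same shape."""
    if assumed is None:
        assumed = set()
    if a is b or (id(a), id(b)) in assumed:
        return True    # same node, or a pair already under comparison (cycle)
    if a.kind != b.kind or len(a.children) != len(b.children):
        return False
    assumed.add((id(a), id(b)))
    return all(struct_equiv(x, y, assumed)
               for x, y in zip(a.children, b.children))

def make_node_type():
    """Build the cyclic graph for: struct Node { int val; struct Node *next; }"""
    node = TypeNode("struct")
    node.children = [TypeNode("int"), TypeNode("pointer", [node])]
    return node

n1, n2 = make_node_type(), make_node_type()
print(struct_equiv(n1, n2))                 # two separately built Node graphs
print(struct_equiv(n1, TypeNode("int")))    # struct versus int
```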

Structural Equivalence of Type Expressions (cont’d)

• Two structurally equivalent type expressions have the same pointer address when constructing graphs by sharing nodes

struct Node
{ int val;
  struct Node *next;
};
struct Node s, *p;
p = &s; // OK
*p = s; // OK
p = s; // ERROR

[Figure: shared type graph in which p and &s refer to the same pointer node, and *p and s refer to the same struct node with fields val (int) and next (pointer)]

Constructing Type Graphs

Type *mkint()             construct int node if not already constructed
Type *mkarr(Type*,int)    construct array-of-type node if not already constructed
Type *mkptr(Type*)        construct pointer-of-type node if not already constructed
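
The "if not already constructed" part can be sketched with a lookup table keyed by the structure of the node, so that equal types end up as the same object. This Python version is only an illustration: nodes are plain tuples and the helper _share is an assumption of the example, while mkint, mkarr and mkptr mirror the names above.

```python
_nodes = {}  # structural key -> the one shared node for that type

def _share(node):
    """Return the previously built node equal to `node`, or register it."""
    return _nodes.setdefault(node, node)

def mkint():
    return _share(("int",))

def mkarr(elem, size):
    return _share(("array", elem, size))

def mkptr(target):
    return _share(("pointer", target))

a = mkptr(mkint())
b = mkptr(mkint())
print(a is b)                          # identity test, not a structural one
print(mkarr(a, 10) is mkarr(b, 10))
print(mkarr(a, 10) is mkarr(a, 20))
```

With sharing in place, type equality reduces to comparing two references.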

Syntax-Directed Definitions for Constructing Type Graphs

%union
{ Symbol *sym;
  int num;
  Type *typ;
}
%token INT          /* int keyword from lexer */
%token <sym> ID     /* identifier from lexer with symbol table pointer */
%token <num> NUM    /* literal value (int) from lexer */
%type <typ> type
%%
decl : type ID { addtype($2, $1); }
     | type ID '[' NUM ']' { addtype($2, mkarr($1, $4)); }
     ;
type : INT { $$ = mkint(); }
     | type '*' { $$ = mkptr($1); }
     | /* empty */ { $$ = mkint(); }
     ;

Type Systems
• A type system defines a set of types and
rules to assign types to programming
language constructs
• Informal type system rules, for example “if
both operands of addition are of type
integer, then the result is of type integer”
• Formal type system rules: Post systems
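
Type Rules in Post System Notation

Type judgments: $\rho \vdash e : \tau$, where $e$ is an expression and $\tau$ is a type

Environment $\rho$ maps objects $v$ to types $\tau$: $\rho(v) = \tau$

$$\frac{\rho(v) = \tau}{\rho \vdash v : \tau}$$

$$\frac{\rho(v) = \tau \qquad \rho \vdash e : \tau}{\rho \vdash v := e : \text{void}}$$

$$\frac{\rho \vdash e_1 : \text{integer} \qquad \rho \vdash e_2 : \text{integer}}{\rho \vdash e_1 + e_2 : \text{integer}}$$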

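
Type System Example

Environment $\rho$ is a set of $\langle$name, type$\rangle$ pairs, for example:

$$\rho = \{\, \langle x, \text{integer}\rangle,\ \langle y, \text{integer}\rangle,\ \langle z, \text{char}\rangle,\ \langle 1, \text{integer}\rangle,\ \langle 2, \text{integer}\rangle \,\}$$

From $\rho$ and rules we can check the validity of typed expressions:

type checking = theorem proving

The proof that x := y + 2 is typed correctly:

$$\frac{\rho(x) = \text{integer} \qquad \dfrac{\dfrac{\rho(y) = \text{integer}}{\rho \vdash y : \text{integer}} \qquad \dfrac{\rho(2) = \text{integer}}{\rho \vdash 2 : \text{integer}}}{\rho \vdash y + 2 : \text{integer}}}{\rho \vdash x := y + 2 : \text{void}}$$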
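
The same proof can be replayed mechanically. The Python sketch below encodes the example environment as a dictionary and applies the three rules (variable lookup, addition, assignment); the tuple encoding of expressions and the names type_of and check_assign are assumptions made for the illustration.

```python
RHO = {"x": "integer", "y": "integer", "z": "char",
       "1": "integer", "2": "integer"}

def type_of(expr, rho):
    """Type of an expression: a name or literal string, or ("+", e1, e2)."""
    if isinstance(expr, str):
        return rho.get(expr, "type_error")         # rule: rho(v) = tau
    op, e1, e2 = expr
    if op == "+" and type_of(e1, rho) == type_of(e2, rho) == "integer":
        return "integer"                           # rule for e1 + e2
    return "type_error"

def check_assign(v, expr, rho):
    """Rule for v := e: the variable and the expression share one type tau."""
    tau = rho.get(v, "type_error")
    if tau != "type_error" and type_of(expr, rho) == tau:
        return "void"
    return "type_error"

print(check_assign("x", ("+", "y", "2"), RHO))   # x := y + 2
print(check_assign("z", ("+", "y", "2"), RHO))   # z := y + 2, z is a char
```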


A Simple Language Example

P → D ; S
D → D ; D
  | id : T
T → boolean
  | char
  | integer
  | array [ num ] of T
  | ^ T                 (pointer to T)
S → id := E
  | if E then S
  | while E do S
  | S ; S
E → true
  | false
  | literal
  | num
  | id
  | E and E
  | E + E
  | E [ E ]
  | E ^                 (Pascal-like pointer dereference operator)

Simple Language Example: Declarations

D → id : T { addtype(id.entry, T.type) }
T → boolean { T.type := boolean }
T → char { T.type := char }
T → integer { T.type := integer }
T → array [ num ] of T1 { T.type := array(1..num.val, T1.type) }
T → ^ T1 { T.type := pointer(T1.type) }

Parametric types: array and pointer are type constructors

Simple Language Example: Checking Statements

$$\frac{\rho(v) = \tau \qquad \rho \vdash e : \tau}{\rho \vdash v := e : \text{void}}$$

S → id := E { S.type := (if id.type = E.type then void else type_error) }

Note: the type of id is determined by scope’s environment:
id.type = lookup(id.entry)

Simple Language Example: Checking Statements (cont’d)

$$\frac{\rho \vdash e : \text{boolean} \qquad \rho \vdash s : \tau}{\rho \vdash \text{if } e \text{ then } s : \tau}$$

S → if E then S1 { S.type := (if E.type = boolean then S1.type
                               else type_error) }

Simple Language Example: Statements (cont’d)

$$\frac{\rho \vdash e : \text{boolean} \qquad \rho \vdash s : \tau}{\rho \vdash \text{while } e \text{ do } s : \tau}$$

S → while E do S1 { S.type := (if E.type = boolean then S1.type
                                else type_error) }

Simple Language Example: Checking Statements (cont’d)

$$\frac{\rho \vdash s_1 : \text{void} \qquad \rho \vdash s_2 : \text{void}}{\rho \vdash s_1 ; s_2 : \text{void}}$$

S → S1 ; S2 { S.type := (if S1.type = void and S2.type = void
                         then void else type_error) }
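
The four statement rules can be combined into one recursive checker. The following Python sketch is an illustration only: statements are hypothetical tuples, and to keep it short every expression is just a name looked up in the environment.

```python
def check_stmt(stmt, rho):
    """Return "void" if the statement is well typed, else "type_error"."""
    kind = stmt[0]
    if kind == "assign":                       # ("assign", name, expr_name)
        _, name, expr = stmt
        ok = name in rho and rho[name] == rho.get(expr)
        return "void" if ok else "type_error"
    if kind in ("if", "while"):                # ("if", cond_name, body)
        _, cond, body = stmt
        if rho.get(cond) == "boolean":
            return check_stmt(body, rho)       # S.type := S1.type
        return "type_error"
    if kind == "seq":                          # ("seq", s1, s2)
        _, s1, s2 = stmt
        ok = check_stmt(s1, rho) == "void" and check_stmt(s2, rho) == "void"
        return "void" if ok else "type_error"
    return "type_error"

rho = {"v": "integer", "w": "integer", "done": "boolean"}
good = ("while", "done",
        ("seq", ("assign", "v", "w"), ("if", "done", ("assign", "w", "v"))))
bad = ("if", "v", ("assign", "v", "w"))        # condition is an integer
print(check_stmt(good, rho))
print(check_stmt(bad, rho))
```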

Simple Language Example: Checking Expressions

$$\frac{\rho(v) = \tau}{\rho \vdash v : \tau}$$

E → true { E.type := boolean }
E → false { E.type := boolean }
E → literal { E.type := char }
E → num { E.type := integer }
E → id { E.type := lookup(id.entry) }


Simple Language Example: Checking Expressions (cont’d)

$$\frac{\rho \vdash e_1 : \text{integer} \qquad \rho \vdash e_2 : \text{integer}}{\rho \vdash e_1 + e_2 : \text{integer}}$$

E → E1 + E2 { E.type := (if E1.type = integer and E2.type = integer
                         then integer else type_error) }

Simple Language Example: Checking Expressions (cont’d)

$$\frac{\rho \vdash e_1 : \text{boolean} \qquad \rho \vdash e_2 : \text{boolean}}{\rho \vdash e_1 \text{ and } e_2 : \text{boolean}}$$

E → E1 and E2 { E.type := (if E1.type = boolean and E2.type = boolean
                           then boolean else type_error) }

Simple Language Example: Checking Expressions (cont’d)

$$\frac{\rho \vdash e_1 : \text{array}(s, \tau) \qquad \rho \vdash e_2 : \text{integer}}{\rho \vdash e_1[e_2] : \tau}$$

E → E1 [ E2 ] { E.type := (if E1.type = array(s, t) and E2.type = integer
                           then t else type_error) }

Note: parameter t is set with the unification of E1.type = array(s, t)

Simple Language Example: Checking Expressions (cont’d)

$$\frac{\rho \vdash e : \text{pointer}(\tau)}{\rho \vdash e\verb|^| : \tau}$$

E → E1 ^ { E.type := (if E1.type = pointer(t) then t
                      else type_error) }

Note: parameter t is set with the unification of E1.type = pointer(t)
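
The array and pointer rules bind t by matching the operand type against a constructor pattern. A minimal Python sketch of that matching step follows; the tuple encodings of types and expressions are assumptions of the example, and the array index set s is stood in for by a plain size.

```python
def type_of(expr, rho):
    """Type of an expression tuple, or "type_error"."""
    kind = expr[0]
    if kind == "id":                                # ("id", name)
        return rho.get(expr[1], "type_error")
    if kind == "num":                               # ("num", value)
        return "integer"
    if kind == "index":                             # ("index", e1, e2) is e1[e2]
        t1, t2 = type_of(expr[1], rho), type_of(expr[2], rho)
        if isinstance(t1, tuple) and t1[0] == "array" and t2 == "integer":
            _, s, t = t1                            # match array(s, t), binding t
            return t
        return "type_error"
    if kind == "deref":                             # ("deref", e1) is e1^
        t1 = type_of(expr[1], rho)
        if isinstance(t1, tuple) and t1[0] == "pointer":
            return t1[1]                            # match pointer(t)
        return "type_error"
    return "type_error"

rho = {"a": ("array", 10, "char"),
       "p": ("pointer", ("array", 10, "char")),
       "n": "integer"}
print(type_of(("index", ("id", "a"), ("num", 3)), rho))               # a[3]
print(type_of(("index", ("deref", ("id", "p")), ("id", "n")), rho))   # p^[n]
print(type_of(("deref", ("id", "n")), rho))                           # n^
```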

A Simple Language Example: Functions

T → T -> T         (function type declaration)
E → E ( E )        (function call)

Example:
v : integer;
odd : integer -> boolean;
if odd(3) then
  v := 1;

Simple Language Example: Function Declarations

T → T1 -> T2 { T.type := function(T1.type, T2.type) }

Parametric type: function is a type constructor

Simple Language Example: Checking Function Invocations

$$\frac{\rho \vdash e_1 : \text{function}(\sigma, \tau) \qquad \rho \vdash e_2 : \sigma}{\rho \vdash e_1(e_2) : \tau}$$

E → E1 ( E2 ) { E.type := (if E1.type = function(s, t) and E2.type = s
                           then t else type_error) }

Type Conversion and Coercion

• Type conversion is explicit, for example using type casts
• Type coercion is implicit: the compiler generates code that converts values between types at run time (typically to narrow or widen a type)
• Both require a type system to check and infer types from (sub)expressions

Syntax-Directed Definitions for Type Checking in Yacc
%{
enum Types {Tint, Tfloat, Tpointer, Tarray, … };
typedef struct Type
{ enum Types type;
struct Type *child; // at most one type parameter
} Type;
%}

%union
{ Type *typ;
}

%type <typ> expr

%%


Syntax-Directed Definitions for Type Checking in Yacc (cont’d)

%%
expr : expr '+' expr { if ($1->type != Tint || $3->type != Tint)
                         semerror("non-int operands in +");
                       $$ = mkint();
                       emit(iadd);
                     }

Syntax-Directed Definitions for Type Coercion in Yacc

%%
expr : expr '+' expr
        { if ($1->type == Tint && $3->type == Tint)
          { $$ = mkint(); emit(iadd);
          }
          else if ($1->type == Tfloat && $3->type == Tfloat)
          { $$ = mkfloat(); emit(fadd);
          }
          else if ($1->type == Tfloat && $3->type == Tint)
          { /* right operand (top of stack) is an int: widen it */
            $$ = mkfloat(); emit(i2f); emit(fadd);
          }
          else if ($1->type == Tint && $3->type == Tfloat)
          { /* left operand is an int: bring it to the top, then widen it */
            $$ = mkfloat(); emit(swap); emit(i2f); emit(fadd);
          }
          else
          { /* braces needed so the fallback type is set only on error */
            semerror("type error in +");
            $$ = mkint();
          }
        }
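
The coercion cases can also be read as a small function from operand types to an instruction sequence. This Python sketch is an illustration under the assumption that both operands are already on the stack with the right operand on top; gen_add and the string opcodes are names chosen for the example.

```python
def gen_add(left_type, right_type, code):
    """Append JVM-style opcodes for left + right and return the result type.

    Assumes both operands are already on the stack, right operand on top.
    """
    if left_type == "int" and right_type == "int":
        code.append("iadd")
        return "int"
    if left_type == "float" and right_type == "float":
        code.append("fadd")
        return "float"
    if left_type == "float" and right_type == "int":
        code += ["i2f", "fadd"]            # widen the int on top of the stack
        return "float"
    if left_type == "int" and right_type == "float":
        code += ["swap", "i2f", "fadd"]    # bring the int to the top first
        return "float"
    raise TypeError("type error in +")

code = []
result = gen_add("int", "float", code)
print(result, code)
```

The swap leaves the operands in reverse order, which is harmless here because addition is commutative; a subtraction rule would need a different sequence.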

Checking L-Values and R-Values in Yacc
%{
typedef struct Node
{ Type *typ; // type structure
int islval; // 1 if L-value
} Node;
%}

%union
{ Node *rec;
}

%type <rec> expr

%%

Checking L-Values and R-Values in Yacc

/* Note: $$ is assumed to point to an allocated Node; allocation is not shown */
expr : expr '+' expr
        { if ($1->typ->type != Tint || $3->typ->type != Tint)
            semerror("non-int operands in +");
          $$->typ = mkint();
          $$->islval = FALSE;
          emit(…);
        }
     | expr '=' expr
        { /* type nodes are shared, so comparing pointers compares types */
          if (!$1->islval || $1->typ != $3->typ)
            semerror("invalid assignment");
          $$->typ = $1->typ;
          $$->islval = FALSE;
          emit(…);
        }
     | ID
        { $$->typ = lookup($1);
          $$->islval = TRUE;
          emit(…);
        }
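
The L-value bookkeeping can be tried out without a parser. Below is a minimal Python sketch that mirrors the three actions; the Node dataclass follows the struct above, while ident, add and assign are hypothetical helper names and types are plain strings.

```python
from dataclasses import dataclass

@dataclass
class Node:
    typ: str        # type of the expression
    islval: bool    # True if the expression denotes a storage location

def ident(name, rho):
    return Node(rho[name], True)           # a variable is an L-value

def add(a, b):
    if a.typ != "int" or b.typ != "int":
        raise TypeError("non-int operands in +")
    return Node("int", False)              # a sum is only an R-value

def assign(target, value):
    if not target.islval or target.typ != value.typ:
        raise TypeError("invalid assignment")
    return Node(target.typ, False)

rho = {"a": "int", "b": "int"}
ok = assign(ident("a", rho), add(ident("a", rho), ident("b", rho)))   # a = a + b
print(ok)
try:
    assign(add(ident("a", rho), ident("b", rho)), ident("b", rho))    # a + b = b
except TypeError as err:
    print("rejected:", err)
```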

Type Inference and Polymorphic Functions

Many functional languages support polymorphic type systems

For example, the list length function in ML:

fun length(x) = if null(x) then 0 else length(tl(x)) + 1

length(["sun", "mon", "tue"]) + length([10,9,8,7])

returns 7

Type Inference and Polymorphic Functions
The type of fun length is:
∀α.list(α) → integer
We can infer the type of length from its body:
fun length(x) = if null(x) then 0 else length(tl(x)) + 1
where
null : ∀α.list(α) → bool
tl : ∀α.list(α) → list(α)
and the return value is 0 or length(tl(x)) + 1, thus
length: ∀α.list(α) → integer
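
Type Inference and Polymorphic Functions

Types of functions f are denoted by α→β and the post-system rule to infer the type of f(x) is:

$$\frac{\rho \vdash e_1 : \alpha \to \beta \qquad \rho \vdash e_2 : \alpha}{\rho \vdash e_1(e_2) : \beta}$$

The type of length(["a", "b"]) is inferred by

$$\frac{\rho \vdash \text{length} : \forall\alpha.\,\text{list}(\alpha) \to \text{integer} \qquad \rho \vdash [\text{"a"}, \text{"b"}] : \text{list}(\text{string})}{\rho \vdash \text{length}([\text{"a"}, \text{"b"}]) : \text{integer}}$$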


Example Type Inference


Append concatenates two lists recursively:
fun append(x, y) = if null(x) then y
else cons(hd(x), append(tl(x), y))
where
null : ∀α.list(α) → bool
hd : ∀α.list(α) → α
tl : ∀α.list(α) → list(α)
cons : ∀α.(α × list(α)) → list(α)
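
Example Type Inference

fun append(x, y) = if null(x) then y
                   else cons(hd(x), append(tl(x), y))
The type of append : ∀σ,τ,φ. (σ × τ) → φ is:
type of x : σ = list(α1) from null(x)
type of y : τ = φ from append’s return type
return type of append : list(α2) from return type of cons
and α1 = α2 because

$$\frac{\dfrac{\rho \vdash x : \text{list}(\alpha_1)}{\rho \vdash \text{hd}(x) : \alpha_1} \qquad \dfrac{\dfrac{\rho \vdash x : \text{list}(\alpha_1)}{\rho \vdash \text{tl}(x) : \text{list}(\alpha_1)} \qquad \rho \vdash y : \text{list}(\alpha_1)}{\rho \vdash \text{append}(\text{tl}(x), y) : \text{list}(\alpha_1)}}{\rho \vdash \text{cons}(\text{hd}(x), \text{append}(\text{tl}(x), y)) : \text{list}(\alpha_2)}$$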


Example Type Inference


fun append(x, y) = if null(x) then y
else cons(hd(x), append(tl(x), y))
The type of append : ∀σ,τ,φ. (σ ×τ) → φ is:
σ = list(α)
τ= φ = list(α)
Hence,
append : ∀α.(list(α) × list(α)) → list(α)
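
Example Type Inference

To infer τ in ρ ⊢ append([1, 2], [3]) : τ, apply the type of append to the argument pair:

$$\frac{\rho \vdash ([1, 2], [3]) : \text{list}(\alpha) \times \text{list}(\alpha)}{\rho \vdash \text{append}([1, 2], [3]) : \text{list}(\alpha)}$$

τ = list(α)
α = integer

For ρ ⊢ append([1], ["a"]) : τ the same rule requires

$$\frac{\rho \vdash ([1], [\text{"a"}]) : \text{list}(\alpha) \times \text{list}(\alpha)}{\rho \vdash \text{append}([1], [\text{"a"}]) : \text{list}(\alpha)}$$

but no α satisfies both list(α) = list(integer) and list(α) = list(string):

Type error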

Type Inference: Substitutions, Instances, and Unification

• The use of a paper-and-pencil post system for type checking/inference involves substitution, instantiation, and unification
• Similarly, in the type inference algorithm, we substitute type variables by types to create type instances
• A substitution S is a unifier of two types t1 and t2 if S(t1) = S(t2)
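
The definition of a unifier can be checked directly once substitutions are represented as data. In this Python sketch, which is only an illustration, a type variable is a string, a constructed type is a tuple such as ("list", t), and a substitution is a dictionary; the names apply and is_unifier are assumptions of the example.

```python
def apply(subst, t):
    """Replace type variables in t by their bindings in subst."""
    if isinstance(t, str):                       # a type variable such as "a"
        return apply(subst, subst[t]) if t in subst else t
    # a constructor: keep its name, substitute inside its arguments
    return (t[0],) + tuple(apply(subst, arg) for arg in t[1:])

def is_unifier(subst, t1, t2):
    """True if subst makes t1 and t2 identical, i.e. S(t1) = S(t2)."""
    return apply(subst, t1) == apply(subst, t2)

INTEGER = ("integer",)
t1 = ("list", "a")            # list(alpha)
t2 = ("list", INTEGER)        # list(integer)
print(is_unifier({"a": INTEGER}, t1, t2))
print(is_unifier({"a": ("string",)}, t1, t2))
```

Applying a substitution to a type yields an instance of that type; here list(integer) is an instance of list(α).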

Unification

An AST representation of append([], [1, 2])

apply
  append : ∀α.(list(α) × list(α)) → list(α)
  ( × ) : (σ, τ)
    [] : list(φ)
    [ , ] : list(ψ)
      1 : integer
      2 : integer

Unification

Unify the types in the AST of append([], [1, 2]) by the following substitutions:

σ = list(φ) = list(ψ) ⇒ φ = ψ
τ = list(ψ) = list(integer) ⇒ φ = ψ = integer
σ = τ = list(α) ⇒ α = integer
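
The substitutions above can be found by a small unification routine. The Python sketch below is a simplified illustration, not the algorithm of any particular compiler: type variables are strings, constructed types are tuples, bindings accumulate in a dictionary, and no occurs check is performed. The names resolve and unify are assumptions of the example.

```python
def resolve(t, subst):
    """Follow variable bindings until an unbound variable or a constructor."""
    while isinstance(t, str) and t in subst:
        t = subst[t]
    return t

def unify(t1, t2, subst):
    """Extend subst so that t1 and t2 become equal; raise TypeError otherwise.

    No occurs check is performed, so this sketch assumes acyclic types.
    """
    t1, t2 = resolve(t1, subst), resolve(t2, subst)
    if t1 == t2:
        return subst
    if isinstance(t1, str):              # unbound variable: bind it
        subst[t1] = t2
        return subst
    if isinstance(t2, str):
        subst[t2] = t1
        return subst
    if t1[0] != t2[0] or len(t1) != len(t2):
        raise TypeError(f"cannot unify {t1} with {t2}")
    for a, b in zip(t1[1:], t2[1:]):     # same constructor: unify the arguments
        unify(a, b, subst)
    return subst

INTEGER, STRING = ("integer",), ("string",)

s = {}
unify(("list", "psi"), ("list", INTEGER), s)   # [1, 2] : list(psi) has integer elements
unify(("list", "alpha"), ("list", "phi"), s)   # first parameter against [] : list(phi)
unify(("list", "alpha"), ("list", "psi"), s)   # second parameter against [1, 2]
print(resolve("alpha", s))
```

Running the same steps for append([1], ["a"]) binds alpha to integer first, and the later attempt to unify it with string raises the type error.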
