Lecture Notes on Semantic Analysis and Specifications
15-411: Compiler Design
André Platzer

Lecture 13
1 Introduction
Now we have seen how parsing works in the front-end of a compiler and how instruction selection and register allocation work in the back-end. We have also seen how intermediate representations can be used in the middle-end. One important question concerns the last phase of the front-end: semantic analysis, which is used to determine whether the input program is actually semantically well-formed. Another important question arises in the first phase of the middle-end: translation of the dynamic aspects of advanced data structures. Even though both questions belong to different phases of the compiler, we answer them together in this lecture. The static and dynamic semantic aspects need to fit together anyhow.
A smaller subset of what is covered in this lecture can be found in the textbook [App98, Ch 7.2], which covers data structures.
2 Static Semantics

If we see an expression like e[t.f + x] in the source code, then what exactly is the type of the result? And is it a
well-typed expression at all? The answer depends on the type of e, which had better be an array type (otherwise the array access would be ill-typed). The answer also depends on the type of t, which had better be a struct type s; the type of t.f is then looked up according to the type of the field f declared in s. Finally, the answer depends on x. And if the addition t.f + x does not produce an integer, the whole expression still does not type-check. It is crucial to find out whether a program with such an expression is well-typed at all. Otherwise, we would compile it to something with a strange and arbitrary effect even though the source program made no sense at all.
All these answers depend on information from the context of the pro-
gram. One interesting indicator for a language is how many passes of analysis through the abstract-syntax tree are necessary to perform semantic analysis successfully.
A simple typing rule is that for plus expressions:
e1 : int    e2 : int
--------------------
e1 + e2 : int
It specifies that if e1 and e2 both have type int then e1 + e2 also has type int.
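As an illustration of how such a rule turns into code, here is a minimal type-checking sketch in OCaml (the AST constructors are made up for this illustration; they are not prescribed by the course):

(* A minimal sketch of a type checker implementing the rule above. *)
type typ = Int | Bool

type exp =
  | ConstInt of int32
  | ConstBool of bool
  | Plus of exp * exp

(* typecheck e = Some t  corresponds to the judgment  e : t *)
let rec typecheck (e : exp) : typ option =
  match e with
  | ConstInt _ -> Some Int
  | ConstBool _ -> Some Bool
  | Plus (e1, e2) ->
      (* both premises must derive int; then the conclusion is int *)
      (match (typecheck e1, typecheck e2) with
       | (Some Int, Some Int) -> Some Int
       | _ -> None)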
In the following, we will give typing rules that define the static seman-
tics of source program expressions.
3 Dynamic Semantics
The static semantics is necessary to make sense of a source code expression.
It only specifies it incompletely, though. We will also explain the dynamic
semantics of expressions, i.e., what their effect is when evaluated. This in-
formation is required for the translation phase in order to make sure that
the intermediate language generated for a particular source code snippet
actually complies with the semantics of the programming language, which
hopefully fits to the intention that the programmer had in mind when writ-
ing the program.
As a side-note, the job of compiler verification is to make sure that the
source program will be compiled to something that has exactly the same
effect as prescribed by the language semantics, regardless of whether the
source program is doing the right thing. The compiler’s job is to adhere
to this exactly. Contrast this to program verification, where the job is to make sure that the program fits the intentions that the programmer has in mind, as expressed by some formal specification of what it is meant to achieve, e.g., in the form of a set of pre/postconditions.
For describing the dynamic semantics of C0, we define how we evalu-
ate expressions and statements of the programming language. We need to
describe how an expression e will be evaluated to determine the result. For
this purpose, we want to define a relation e ⇒ v that specifies that e, when
evaluated, results in the value v. We want to define the relation e ⇒ v by
rules specifying the effect of each expression like
e1 ⇒ n1    e2 ⇒ n2    n = add(n1, n2)
-------------------------------------- (+?)
e1 + e2 ⇒ n

----- (0)
0 ⇒ 0
We also need a rule that evaluates a variable identifier x. But what should a variable evaluate to? Well, that depends on what its value is. The value of a variable identifier is stored at some address in memory (or a register, which we will talk about in a moment). Let's denote the memory address where x is
stored by addr(x). This memory address could be for a local variable on
the stack, for a spilled function argument on the stack (near beginning of
frame), or somewhere in a global data segment for global variables. Either
way, it is in memory. Thus, when we evaluate a variable identifier, the result is the contents of memory at that address:

x variable identifier
--------------------- (id?)
x ⇒ M(addr(x))
Yet we need to know the memory contents for this to make sense. So let
us reflect this in the notation and change our judgment to e@M ⇒ v to say
that expression e, when evaluated in memory state M evaluates to value v.
x variable identifier
--------------------- (id)
x@M ⇒ M(addr(x))
Yet now, what about local variables x that are stored in registers, or function arguments that have been passed in registers? The precise option would be to also include the register state R in the judgment e@M@R ⇒ v. Then a variable x that is stored in the register addr(x) = %eax would evaluate to R(%eax) instead of to M(addr(x)).
In the following, we only consider the memory state M, but other state can be tracked too with this principle. Thus the above rule turns into the more precise
e1@M ⇒ n1@M'    e2@M' ⇒ n2@M''    n = add(n1, n2)
--------------------------------------------------- (+)
e1 + e2@M ⇒ n@M''
Unlike the first rule (+?), the new rule now captures the semantics of left-to-right evaluation order. In the first rule (+?), we could still supply the premises in an arbitrary order and were not restricted to evaluating the subexpressions e1 and e2 in any particular order. The new rule (+) explicitly requires left-to-right evaluation, because the state M' resulting from evaluating e1 is the starting state for evaluating e2, whose resulting state M'' will be the resulting state of evaluating the whole expression e1 + e2.
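The threading of memory states in rule (+) translates directly into an evaluator where the state produced by one premise feeds the next. Here is a minimal sketch in OCaml; the AST and the representation of memory as a map from variable names to values are simplifying assumptions:

(* Sketch: the judgment  e@M ⇒ v@M'  as a function  eval e m = (v, m'). *)
module Mem = Map.Make (String)
type mem = int32 Mem.t

type exp =
  | Const of int32
  | Var of string
  | Plus of exp * exp

let rec eval (e : exp) (m : mem) : int32 * mem =
  match e with
  | Const n -> (n, m)                  (* constants leave M unchanged *)
  | Var x -> (Mem.find x m, m)         (* rule (id): look up M(addr(x)) *)
  | Plus (e1, e2) ->
      let n1, m' = eval e1 m in        (* e1@M  ⇒ n1@M'  *)
      let n2, m'' = eval e2 m' in      (* e2@M' ⇒ n2@M'' *)
      (Int32.add n1 n2, m'')           (* modular add, result in state M'' *)

In this small fragment no expression actually changes the memory yet, but as soon as expressions may contain function calls or allocation, the left-to-right threading order becomes observable.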
The static and dynamic semantics together give meaning to all elements
of the programming language. We treat the static and dynamic semantics
for various elements of the C0 programming language at the same time in
the following.
4 Small Types
So far, we have only used a programming language with minimal typing. Basically, the only two types so far were int and bool, which are easily distinguished by their respective syntactic occurrences in the language. Only int had been allowed as the type of declared variables, and bool only occurred in the test expressions of if, while, and for.
Real programming languages, including C0, have more serious types:

τ ::= int | bool | τ* | τ[] | struct s | a

where a is the name of a type abbreviation for some type τ that has been introduced in the form

typedef τ a
We mostly ignore the other C0 types char and string in this course.
For discussing the layout of the various types, we distinguish between small types that can fit into a register and large types that have to be stored in memory. First we discuss the small types. For the purpose of memory layout and register handling, we define the size |τ| of small types τ as follows:
|int|  = 4
|bool| = 4
|τ*|   = 8
|τ[]|  = 8
That is, int and bool are 32-bit, and pointers τ* are represented by 64-bit addresses on 64-bit machines. Arrays themselves are large values and array constants would be large, because we cannot pass a whole array in a register. But C0 allocates arrays on the heap like pointers, and they are only represented by their starting address. Hence, variables of array type have a small type, because we can fit the array address into a register.
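These size equations are straightforward to compute; a sketch in OCaml, with an assumed representation of C0 types:

(* Sketch: the size |tau| of small types, per the equations above. *)
type typ = Int | Bool | Ptr of typ | Array of typ

let size (t : typ) : int =
  match t with
  | Int | Bool -> 4        (* 32-bit values *)
  | Ptr _ | Array _ -> 8   (* 64-bit addresses on 64-bit machines *)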
In particular, we have data of two different sizes. Pointers are allocated from heap memory by the runtime system using the alloc(τ) library function, which returns a fresh chunk of memory at a location divisible by 8, ready to hold a value of type τ. In C-like programming languages, the null
address 0 (denoted by the constant NULL) is special in that it will never be
returned by alloc(τ ), except to indicate that the system ran out of memory
altogether. All memory access to the null pointer is thus considered bad
memory access.
5 Large Types
Array contents and structures are large types, because they do not (usually) fit into a register. We define their size as follows. For a struct declaration

struct s {
  τ1 f1;
  τ2 f2;
  ...
  τn fn;
}

the size is

|struct s| = pad(|τ1|, |τ2|, . . . , |τn|)
The function pad adds the sizes of its arguments, adding padding as neces-
sary in between and at the end. That is, elements of type int and bool are
aligned at memory addresses that are divisible by 4. Elements of type τ ∗
and τ [] are aligned at memory addresses divisible by 8. A compiler remem-
bers the byte offset of field fi in the memory layout of structure s in order
to find it later. We denote it by off(s, fi ).
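Here is a sketch of how a compiler might compute the field offsets off(s, fi) and the padded total size, assuming each small type is aligned to its own size (4 or 8) as described above. Field descriptions are triples (name, size, alignment):

(* Sketch of a struct layout computation with padding. *)
let align (off : int) (a : int) : int = (off + a - 1) / a * a

let layout (fields : (string * int * int) list) : (string * int) list * int =
  let offs, last, maxa =
    List.fold_left
      (fun (offs, off, maxa) (f, sz, a) ->
         let off = align off a in                 (* padding before the field *)
         ((f, off) :: offs, off + sz, max maxa a))
      ([], 0, 1) fields
  in
  (List.rev offs, align last maxa)                (* padding at the end *)

For struct s { int x; int* p; bool b; }, the call layout [("x", 4, 4); ("p", 8, 8); ("b", 4, 4)] yields off(s, x) = 0, off(s, p) = 8, off(s, b) = 16, and a total size of 24.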
Similarly to distinguishing between small and large types, we distin-
guish between small values (values of a small type) that fit into a register
and large values (values of a large type) that have to be stored in memory.
6 Structs
The typing rule for the static semantics of structs is simple and just says
that an access to a field f of a struct value e of type s results in a value of
type τ , where τ is the declared type of field f in s:
e : s    struct s { ... τ f; ... }
-----------------------------------
e.f : τ
Unfortunately, this only works well for small types, whose values can be returned in registers right away. For large types, this cannot really work, because the memory contents M(a) at a single location a do not even contain all the information, and we cannot store the whole object in a single register anyhow. For large types, evaluation produces an address instead, relative to which the content will be addressed further.
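As a sketch, the resulting evaluation strategy for field access might look as follows, where load stands for a hypothetical memory read and off for the precomputed off(s, f):

(* Sketch: evaluating e.f once e has evaluated to the address a of the struct. *)
type result = Value of int32 | Address of int

let eval_field (load : int -> int32) (a : int) (off : int) (small : bool)
  : result =
  if small then Value (load (a + off))   (* small field: read M(a + off(s,f)) *)
  else Address (a + off)                 (* large field: yield the address only *)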
7 Pointers
To explain the static semantics of pointers and pointer access, there are sim-
ple rules:
e : τ*
-------
∗e : τ

--------------
alloc(τ) : τ*

-----------
NULL : τ*
For large types, the memory M (a) at location a does not even contain all
information, and we cannot store the whole object in a register anyhow.
So instead, ∗e evaluates to a itself, relative to which the content will be
addressed further. When we dereference a pointer that is null, the program aborts with a memory violation (SIGSEGV).¹

¹ Machines implement this check by having page 0 unmapped in the virtual-memory page table.
8 Arrays
Arrays are almost like pointers. Both are allocated. The difference is that C0 pointers disallow pointer arithmetic, whereas arrays allow random access to their contents at computed integer positions. In particular, for arrays, the question arises what to do with an access out of bounds, i.e., outside the array size. Does it just access memory unsafely at wild places, or will it be detected safely and raise a runtime exception? In early labs, we will follow the unsafe C tradition and allow arbitrary behavior. In later labs, we will switch to safe compilation, more like in Java. First, we give simple typing rules explaining the static semantics:
e : τ[]    t : int
------------------
e[t] : τ

e : int
------------------------
alloc_array(τ, e) : τ[]
Now we consider the operational semantics. Note that the evaluation order
in array access (like everywhere else) is strictly left-to-right. So expression
e[t] will be evaluated by evaluating e first and t second, and then accessing
the result of e at the result of t:
e : τ[]    e@M ⇒ a@M'    t@M' ⇒ n@M''    a ≠ 0    M''(a + n|τ|) allocated    τ small
-------------------------------------------------------------------------------------
e[t]@M ⇒ M''(a + n|τ|)@M''
For the case where the address computation of the array itself yields NULL,
we can either raise a SIGSEGV before evaluating t or after. Both choices are
reasonable. The early choice saves operations in case of a SIGSEGV. The late choice, however, reduces the number of times that violations have to be checked for.
Side-note: an odd thing in C is that x[i] and i[x] are both valid array
accesses and equivalent, because both are just defined as ∗(x + i). C even
allows 2[x] instead of x[2].
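To make the side conditions of these rules concrete, here is a sketch of the checks that safe compilation would perform on every access. How the array length is obtained (say, from a hidden header word written by alloc_array) is an implementation choice assumed here, and the exception names are illustrative:

(* Sketch of the checks behind the array rule: a null test on the array
   address and a bounds test on the index, then the address a + n*|tau|. *)
exception Null_pointer    (* would surface as SIGSEGV *)
exception Out_of_bounds   (* the runtime error of safe compilation *)

let element_address (a : int) (length : int) (n : int) (elt_size : int) : int =
  if a = 0 then raise Null_pointer;                (* side condition a ≠ 0 *)
  if n < 0 || n >= length then raise Out_of_bounds;
  a + n * elt_size                                 (* the address a + n*|tau| *)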
9 Assignments to Lvalues
Assignments to primitive int variables are simple and are ultimately just implemented by a MOV instruction to the respective temp (see lectures 2 and 3). In more complicated languages with structured data, we can assign to other expressions such as a[10 − i] or ∗p or x.f or even ∗x.f or (∗x).f, alias x−>f. Not all expressions qualify as expressions to which we can assign. It makes no sense to try to assign a value to x + y or to f(∗x − 1); such expressions may only appear on the right-hand side of an assignment (rvalues). The expressions that make sense on the left-hand side of an assignment, because they identify a proper location (say in memory), are called lvalues. Lvalues are well-typed expressions of the form x, lv.f, ∗lv, or lv[e], i.e., they are built from variables by field access, pointer dereference, and array access.
e1 : τ[]    e1@M ⇒ a@M'    e2@M' ⇒ n@M''
------------------------------------------
e1[e2]@M ⇒l (a + n|τ|)@M''
The side conditions and failure modes for the address computation when evaluating the lvalue e1[e2]@M ⇒l ... of an array access are just like those for the value evaluation e1[e2]@M ⇒ ....
Using this lvalue relation ⇒l, we can define the effect of an assignment v = e. The semantics of a statement does not produce a value; it just has an effect on memory. Thus we just write s@M ⇒ @M' to describe the transition for a statement s. As a shorthand notation, we write M''{M''(a) ← w} for the memory state M''' that is obtained from a memory state M'' by changing the contents at address a to the value w.
A similar rule defines the effect of a compound assignment v ⊕= e for an operation ⊕.
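Here is a sketch of how the lvalue judgment ⇒l and assignment could fit together in an interpreter, assuming the lvalue is evaluated before the right-hand side as the left-to-right discipline suggests. Addresses and values are collapsed to plain integers, memory is a persistent map, and the variable and field cases of lvalues are omitted:

(* Sketch: lvalues evaluate to addresses, and v = e writes through them. *)
module Mem = Map.Make (Int)
type mem = int Mem.t

type exp = Const of int                   (* stand-in expression AST *)
type lval =
  | Deref of exp                          (* *e : e evaluates to an address *)
  | Index of exp * exp                    (* e1[e2] *)

exception Null_pointer

let eval (e : exp) (m : mem) : int * mem =
  match e with Const n -> (n, m)

let eval_lval (elt_size : int) (lv : lval) (m : mem) : int * mem =
  match lv with
  | Deref e ->
      let a, m' = eval e m in
      if a = 0 then raise Null_pointer else (a, m')
  | Index (e1, e2) ->
      let a, m' = eval e1 m in            (* e1@M  ⇒ a@M'  *)
      let n, m'' = eval e2 m' in          (* e2@M' ⇒ n@M'' *)
      if a = 0 then raise Null_pointer else (a + n * elt_size, m'')

(* v = e : first the address of the lvalue, then the right-hand side,
   then the update M''{M''(a) <- w}. *)
let assign (elt_size : int) (lv : lval) (e : exp) (m : mem) : mem =
  let a, m' = eval_lval elt_size lv m in
  let w, m'' = eval e m' in
  Mem.add a w m''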
10 Function Calls
Suppose we have a function call f(e1, . . . , en) to a function f that has been defined as τ f(τ1 x1, . . . , τn xn) { b }. We consider a simplified situation here and just assume there is a return variable called %eax in the function body b.
e1@M ⇒ v1@M1    e2@M1 ⇒ v2@M2    . . .    en@Mn−1 ⇒ vn@Mn    b@Mn' ⇒ @M'    τ small
-------------------------------------------------------------------------------------
f(e1, . . . , en)@M ⇒ M'(%eax)@M'
where Mn' is like Mn, except that the values vi of the arguments ei have been bound to the formal parameters xi, i.e., Mn'(x1) = v1, . . . , Mn'(xn) = vn.
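The argument evaluation in this rule is the same left-to-right threading as for +. A sketch of the corresponding interpreter fragment, where eval, run_body, bind, and read_result are parameters standing in for the rest of the interpreter:

(* Sketch of the call rule: arguments left to right, threading the
   memory state, then bind the formals and run the body. *)
let eval_call eval run_body bind read_result formals args m =
  (* e1@M ⇒ v1@M1, ..., en@M(n-1) ⇒ vn@Mn *)
  let vs_rev, mn =
    List.fold_left
      (fun (vs, m) e -> let v, m' = eval e m in (v :: vs, m'))
      ([], m) args
  in
  (* Mn' binds the formal parameters xi to the argument values vi *)
  let mn' = List.fold_left2 bind mn formals (List.rev vs_rev) in
  let m' = run_body mn' in                (* b@Mn' ⇒ @M' *)
  (read_result m', m')                    (* the result is M'(%eax) *)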
And now we remember that allocation is actually a function call in C0. Consequently, in the intermediate representation of our C0 compiler, side effects due to allocation can only occur at the statement level, not nested within expressions. Hence, specifying the semantics for the intermediate representation is actually easier (it does not need the complicated threading of memory states M'). But, unlike its intermediate representation, C0 itself still needs to respect the order of memory-state passing carefully.
11 Type Safety
An important property of programming languages is whether they are type-
safe. In a type-safe language, the static and dynamic semantics of a pro-
gramming language should fit together. If we have an expression e in a
program that has the type int, then we would be rather surprised to find at
runtime a result of evaluating e that is a float. If this could happen, then it is
rather hard to make sure that the program will always execute reasonably
even if the compiler accepted it as a well-typed program.
What we expect from the static and dynamic semantics of a type-safe
language is that types are preserved in the following sense. If we have a
program that is well-typed (the static semantics says it’s okay) and we fol-
low an evaluation step of the dynamic semantics, then the resulting pro-
gram is still well-typed (type preservation). Otherwise what can happen
is that we run a well-typed program and suddenly break the well-typing
leading to values out of the type ranges. That is, the property that we want (and need to prove for our static and dynamic semantics) is that

If e : τ and e@M ⇒ v@M' for an okay memory state M, then v : τ and M' is okay again

for a suitable definition of when a memory state M is “okay”, i.e., the types of the values that it stores are compatible with what the program expects.
The other property that one would expect from type-safe languages is that the dynamic semantics always knows what to do (with well-typed programs). We do not want to be stuck in the middle of a run or an interpretation of the program by the dynamic semantics rules, not knowing
where to go and not having a rule that allows a transition. For instance,
if the program contains the well-typed expression e + f and the dynamic
semantics does not know how to evaluate the odd expression “test”+0.5,
then we had better make sure that the evaluation of e can never lead to a string
“test” while, at the same time, the evaluation of f leads to the float 0.5.
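Stated side by side as slogans (a sketch; pinning down “M okay” precisely is part of the actual proof obligation):

If e : τ and e@M ⇒ v@M' with M okay, then v : τ and M' is okay. (preservation)

If e : τ and M is okay, then e@M ⇒ v@M' for some v and M', or a sanctioned runtime error such as SIGSEGV is signaled. (progress)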
Quiz
1. Which of the rules convey important secret information about how to implement a compiler correctly that is easy to miss?
2. How many ways are there to implement accesses like (∗a)[i]?
3. Why is 2[i] not allowed in the C0 language when it is allowed in C?
4. Is it important how exactly the compiler implements things like e[-1]
= 1/0 or not?
5. How can you make sure that you always generate the most efficient code for the subtleties in the rules? What information do you need for that? Define a dataflow analysis that solves (some of) these issues.
6. In the rules discussed here, what would happen if you moved the primes on the memory states M around? Which permutations still give a good language semantics? And which permutations are still good for implementation purposes? And which permutations spoil everything?
7. Under which assumptions can you implement a compiler correctly
using the rules that do not track @M ?
8. Can you write a compiler that does not distinguish between lvalues and rvalues? Can you write a parser that does not?
10. Some old C libraries use one-dimensional arrays. These libraries were
often translated from Fortran. They probably just didn’t know how
to write proper C, did they?
12. Why is there a difference between e = e + a and e += a? Should there be a difference? Doesn't this difference only confuse the user?
13. List all advantages and disadvantages that type preservation has when
writing a compiler.
14. List all advantages and disadvantages that type preservation has when
using a compiler.
15. List all advantages and disadvantages that type progress has when
writing a compiler.
16. List all advantages and disadvantages that type progress has when
using a compiler.
17. Is your job as a compiler designer easier if you can change the static
semantics of the programming language? How?
18. Is your job as a compiler designer easier if you can change the dy-
namic semantics of the programming language? How?
19. Is your job as a compiler designer easier if you can change the type
preservation aspects of the programming language? How?
20. Is your job as a compiler designer easier if you can change the type
progress aspects of the programming language? How?
21. In the last questions: what are the downsides for the user?
22. You want to add threads to C0. Which rules do you need to change
for that and how? Where are the difficulties?
References
[App98] Andrew W. Appel. Modern Compiler Implementation in ML. Cambridge University Press, Cambridge, England, 1998.