(Lec 1-3) Introduction To Compilers
Pieces of a Compiler
Will discuss scanning, parsing, (optimization), code generation. Input language can be any programming language. Output is usually machine/assembly language.
Could be another language (e.g. C).
Example: Scanner
Input is a sequence of characters:
if x>100 then y:=1 else y:=2;
Output is tokens (lexemes).
Example: Parser
Input is tokens. Output is syntax tree.
Example: Optimizer
Input is syntax tree. Output is modified syntax tree.
Syntax of Language
Backus-Naur Form (BNF):
<stmt> := on_stmnt | off_stmnt
on_stmnt := on <int> ;
off_stmnt := off <int> ;
For example:
on 33; off -89; off 0;
Semantics of Language
What is the meaning of a program written in a language? Statements processed in order in which they appear. on N turns on light bulb for N ticks. off N turns off light bulb for N ticks. Time must be non-negative. For example:
on 3; off 2; off 6; on 0; on -1;
Illegal program: on -1 has a negative time.
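The semantic rules above can be checked mechanically. The following is a minimal sketch in C, assuming the program has already been parsed into an array of statements. The names lbl_stmt, lbl_is_legal and lbl_ticks_on are hypothetical and are not part of the course code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical representation of one parsed LBL statement. */
typedef enum { LBL_ON, LBL_OFF } lbl_kind;

typedef struct {
    lbl_kind kind;   /* on or off */
    int      ticks;  /* the N in "on N;" or "off N;" */
} lbl_stmt;

/* Returns 1 if every statement has a non-negative time, else 0. */
int lbl_is_legal(const lbl_stmt *prog, size_t n)
{
    assert(prog != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (prog[i].ticks < 0)
            return 0;
    return 1;
}

/* Processes statements in order and returns the total number of ticks
   the bulb is on; returns -1 for an illegal program. */
long lbl_ticks_on(const lbl_stmt *prog, size_t n)
{
    long on = 0;
    if (!lbl_is_legal(prog, n))
        return -1;
    for (size_t i = 0; i < n; i++)
        if (prog[i].kind == LBL_ON)
            on += prog[i].ticks;
    return on;
}
```

A caller would build the array from the parser's output and reject the program as soon as lbl_is_legal returns 0.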
Scanning LBL
Recognize keywords (on, off), integers, white space. Ad-hoc scanner:
    token scan (FILE *input)
    {
      while (1) {
        c = read_char (input);
        if (c == 'o') {
          c = read_char (input);
          if (c == 'n')
            return ON_TOKEN;
          else if (c == 'f')
            . . .
        }
        else if (c >= '0' && c <= '9')
          return scan_integer (input);
        else if (c == ';')
          return SEMI_TOKEN;
        else if (c == ' ')
          continue;
        else
          error ("Unknown character");
      }
    }
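For comparison, here is a self-contained sketch of the same ad-hoc approach that scans from a string rather than a FILE *. The names lbl_token, lbl_scanner and lbl_next are hypothetical. The handling of a leading minus sign, of end of input, and of errors are assumptions that the slide does not show, and integer overflow is not checked.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum {
    TOK_ON, TOK_OFF, TOK_INT, TOK_SEMI, TOK_EOF, TOK_ERROR
} lbl_token;

typedef struct {
    const char *p;      /* next unread character */
    int         value;  /* value of the most recent TOK_INT */
} lbl_scanner;

/* Returns the next token, skipping white space first. */
lbl_token lbl_next(lbl_scanner *s)
{
    assert(s != NULL && s->p != NULL);
    while (isspace((unsigned char)*s->p))
        s->p++;
    if (*s->p == '\0')
        return TOK_EOF;
    if (*s->p == ';') {
        s->p++;
        return TOK_SEMI;
    }
    if (s->p[0] == 'o' && s->p[1] == 'n') {
        s->p += 2;
        return TOK_ON;
    }
    if (s->p[0] == 'o' && s->p[1] == 'f' && s->p[2] == 'f') {
        s->p += 3;
        return TOK_OFF;
    }
    /* Integer: optional '-' followed by one or more digits. */
    {
        const char *q = s->p;
        int neg = 0, v = 0;
        if (*q == '-') {
            neg = 1;
            q++;
        }
        if (!isdigit((unsigned char)*q))
            return TOK_ERROR;   /* input position is left unchanged */
        while (isdigit((unsigned char)*q))
            v = v * 10 + (*q++ - '0');
        s->value = neg ? -v : v;
        s->p = q;
        return TOK_INT;
    }
}
```

A caller initializes the scanner with the input text and calls lbl_next until it returns TOK_EOF or TOK_ERROR. Because the position does not advance on an error, the caller should stop at the first TOK_ERROR.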
Optimization
New instruction
lt_toggle = lt_on; lt_off; or lt_off; lt_on;
Optimization, cont'd
    parse (FILE *input)
    {
      t = scan (input);
      if (t == ON_TOKEN) {
        t = scan (input);
        if (t != INT_TOKEN)
          syntax_error ("Expected int");
        else
          a[i++] = token_value (t);
      }
      else if (t == OFF_TOKEN) {
        ...
      }
      else
        syntax_error ("Unknown keyword");
    }
Constructing a Scanner
Need a formal way to describe the items recognized by the scanner.
Items called tokens or lexemes.
Regular expressions (REs). Automatic techniques for constructing scanners from REs. Dragon: Sec 3.1-3.7.
E.g., a floating point number is a sequence of one or more digits, followed by a decimal point, followed by a sequence of one or more digits.
Too complex and verbose, and it still does not cover forms such as -1.4 and 1.0e9.
REs, cont'd
Alternation of REs with |
If R1 and R2 are REs, so is R1|R2. 0|1 is a 0 or 1. (a.b)|c is ab or c. 0|1|2|3|4|5|6|7|8|9 matches a digit.
REs, Precisely
L(R) is the set of strings matched by RE R.
$x \in A \implies L(x) = \{x\}$
$R = R_1 . R_2 \implies L(R) = \{ab : a \in L(R_1),\ b \in L(R_2)\}$
$R = R_1 | R_2 \implies L(R) = L(R_1) \cup L(R_2)$
$R = R_1^* \implies L(R) = \bigcup_{s=0}^{\infty} \{a_1 a_2 \cdots a_s : a_i \in L(R_1)\}$
Examples of REs
Letter := a|b . . . y|z|A|B . . . Y|Z
Digit := 0|1|2|3|4|5|6|7|8|9
IdChar := Letter | Digit
       := {a|b . . . y|z|A|B . . . Y|Z|0| . . . |9}
Id := Letter . IdChar*
FPNum := Digit . Digit* . "." . Digit . Digit*
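As an illustration, the Id and FPNum definitions can be recognized by small hand-written C functions. This is a minimal sketch, assuming ASCII input and taking FPNum to mean one or more digits, a decimal point, then one or more digits, as in the earlier English description. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Letter := a|b...y|z|A|B...Y|Z  (assumes an ASCII-like character set) */
int re_is_letter(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Digit := 0|1|...|9 */
int re_is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Id := Letter . IdChar*   where IdChar := Letter | Digit */
int matches_id(const char *s)
{
    assert(s != NULL);
    if (!re_is_letter(*s))
        return 0;
    for (s++; *s != '\0'; s++)
        if (!re_is_letter(*s) && !re_is_digit(*s))
            return 0;
    return 1;
}

/* FPNum: one or more digits, '.', one or more digits. */
int matches_fpnum(const char *s)
{
    assert(s != NULL);
    if (!re_is_digit(*s))
        return 0;
    while (re_is_digit(*s))
        s++;
    if (*s != '.')
        return 0;
    s++;
    if (!re_is_digit(*s))
        return 0;
    while (re_is_digit(*s))
        s++;
    return *s == '\0';   /* the whole string must be consumed */
}
```

With these definitions, matches_id accepts x1 and rejects 1x, while matches_fpnum accepts 12.345 and rejects -1.4 and 1.0e9, since this RE covers neither a sign nor an exponent.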
Why REs?
Good notation for describing lexemes. Directly translatable to program that recognizes RE.
Produce an abstract machine that recognizes strings. Translate it to an efficient program that does the same.
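To make the idea of an abstract machine concrete, here is a sketch of a table-driven finite automaton in C. It is written by hand rather than generated from an RE. It recognizes one or more digits, a decimal point, then one or more digits. The state names, the table layout and the function dfa_accepts_fpnum are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Character classes (columns) and states (rows) of the machine. */
enum { C_DIGIT, C_DOT, C_OTHER, N_CLASSES };
enum { S_START, S_INT, S_DOT, S_FRAC, S_DEAD, N_STATES };

/* next_state[state][class] gives the state entered on that input. */
static const int next_state[N_STATES][N_CLASSES] = {
    /*            digit    '.'     other  */
    /* START */ { S_INT,  S_DEAD, S_DEAD },
    /* INT   */ { S_INT,  S_DOT,  S_DEAD },
    /* DOT   */ { S_FRAC, S_DEAD, S_DEAD },
    /* FRAC  */ { S_FRAC, S_DEAD, S_DEAD },
    /* DEAD  */ { S_DEAD, S_DEAD, S_DEAD },
};

static int char_class(char c)
{
    if (c >= '0' && c <= '9')
        return C_DIGIT;
    if (c == '.')
        return C_DOT;
    return C_OTHER;
}

/* Runs the machine over s; S_FRAC is the only accepting state. */
int dfa_accepts_fpnum(const char *s)
{
    int state = S_START;
    assert(s != NULL);
    for (; *s != '\0'; s++)
        state = next_state[state][char_class(*s)];
    return state == S_FRAC;
}
```

The loop does one table lookup per input character, which is what makes the translated program efficient. Changing the language recognized means changing only the table.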
Symbol Table
Figure: a symbol table entry with the fields Name, Type (int), and Line_no (202).
Interface
Two data types:
symbol_table: a symbol table.
symbol_table_entry: entry in a symbol table with information on an identifier.
Five functions:
symbol_table *make_symbol_table (int fold_case);
symbol_table_entry *get_symbol (symbol_table tbl, char *str);
symbol_table_entry *put_symbol (symbol_table tbl, char *str);
symbol_table *clear_symbol_table (symbol_table tbl);
void print_statistics (symbol_table tbl);
Implications:
Cannot let clients outside ADT see internal details. Export an interface with a complete set of operations.
Only the functions exported by ADT know the details. Clients hold instances of object and invoke operations on them.
ADTs in C++/C
C++ classes support ADTs. C does not support this program development methodology.
Must reveal full definition of a struct to clients. Need opaque datatypes. Can support ADTs by convention (not compiler checking).
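One widely used convention for approximating an opaque datatype in C is to declare a struct type in the header without defining it, so that clients can only hold pointers to it and must go through the exported functions. The sketch below shows the idea in a single file, with comments marking what would go in the header and what would stay in the implementation file. The counter type and its functions are hypothetical and are unrelated to the symbol table assignment.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- would live in counter.h: clients see only this ---- */
typedef struct counter counter;          /* declared, not defined */
counter *counter_make(void);
void     counter_increment(counter *c);
int      counter_value(const counter *c);
void     counter_free(counter *c);

/* ---- would live in counter.c: the hidden representation ---- */
struct counter {
    int n;
};

counter *counter_make(void)
{
    counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->n = 0;
    return c;            /* NULL if allocation failed */
}

void counter_increment(counter *c)
{
    assert(c != NULL);
    c->n++;
}

int counter_value(const counter *c)
{
    assert(c != NULL);
    return c->n;
}

void counter_free(counter *c)
{
    free(c);
}
```

Client code that includes only the header part cannot write c->n, because the struct has no visible definition there. The discipline still depends on keeping the definition out of the header.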
You need to write the 5 symbol table routines described earlier. I have also written a test driver routine that uses the interface: test_symbol.c. You should not have to look at the code in this file.
Hash functions
Two desirable properties.
Fast to compute.
Spreads the expected collection of strings evenly over the table.
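As a concrete illustration, here is a minimal sketch of a string hash function in C. It uses the common multiply-by-31-and-add scheme, which is cheap to compute. Whether it spreads a particular collection of identifiers evenly has to be measured, and it is not necessarily the function intended for the assignment. The name hash_string and the table_size parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns a bucket index in the range [0, table_size). */
size_t hash_string(const char *str, size_t table_size)
{
    unsigned long h = 0;
    assert(str != NULL && table_size > 0);
    for (; *str != '\0'; str++)
        h = h * 31 + (unsigned char)*str;   /* unsigned arithmetic wraps */
    return (size_t)(h % table_size);
}
```

Every character contributes to the result, so strings that differ only in their last characters usually land in different buckets. Choosing a prime table_size is a common precaution against patterns in the input.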