compiler design unit I 2025
Textbook:
Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman,
“Compilers: Principles, Techniques, and Tools”,
Pearson Education, 2014
History of Compiler
Important landmarks in the history of compilers are as follows:
• The word "compiler" was first used in the early 1950s by Grace Murray Hopper
• The first complete compiler, for FORTRAN, was built by John Backus and his group between 1954 and 1957 at IBM
• COBOL was the first programming language to be compiled on multiple platforms, in 1960
• Scanning and parsing issues were studied in the 1960s and 1970s to provide a complete solution
Why Compilers?
• Machine structures became too complex
and software management too difficult to
continue with low-level languages
COMPILATION VS. INTERPRETATION
Compiler | Interpreter
Program need not be compiled every time | Every time, the higher level program is converted into a lower level program
Errors are displayed after the entire program is checked | Errors are displayed for every instruction interpreted (if any)
Cross Compiler
A cross compiler is a compiler that runs on one machine and
produces object code for another machine.
Bootstrap compiler
A compiler that has been implemented in its own language is called a
self-hosting (bootstrap) compiler.
Single Pass Compiler
[Figure: source code → Compiler → target code; the compiler also reports errors]
Multi-Pass Compiler
A multi-pass compiler processes the source code or syntax tree of a program
several times.
It divides a large program into multiple small programs and processes them.
Each pass takes the output of the previous pass as its input.
List of Compilers
1. Ada compiler
2. ALGOL compiler
3. BASIC compiler
4. C# compiler
5. C compiler
6. C++ compiler
7. COBOL compiler
8. Smalltalk compiler
9. Java compiler
COMPILATION PROCESS
Other Applications
• In addition to the development of a compiler, the
techniques used in compiler design are
applicable to many problems in computer science.
Analysis and Synthesis
Source statement: D = A + B*C
Machine code:
Load R1,B
Load R2,C
MUL R1,R2
id → letter (letter|digit)*
ws → delim+
Syntax Analyzer
• A Syntax Analyzer creates the syntactic
structure (generally a parse tree) of the
given program.
• Hierarchical analysis:
– Group tokens into grammatical phrases
Syntax Analysis
Semantic Analysis
• Checks for semantic errors
• Gathers type information for code generation
• Uses the hierarchical structure to identify
operators and operands
• Performs type checking
– E.g., using a real number to index an array (error)
– Type conversion
E.g., inttoreal(60) if initial is a real number
Intermediate Code Generation
• Represents the source program as code for an abstract
machine
• Should be easy to produce and easy to translate
into the target program
• Three-address code (consists of a sequence of
instructions, each of which has at most three
operands)
– temp2 := id3 * temp1
– every memory location can act like a register
• Three-address code has several properties
1. Each instruction has at most one operator in addition to
the assignment.
2. The compiler must generate a temporary name to hold
the value computed by each instruction.
3. Some three-address instructions have fewer than three
operands.
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
Code Optimization
• Improves the intermediate code
• Results in faster-running machine code
– temp1 := id3 * 60.0
id1 := id2 + temp1
Code Generation
• Generates relocatable machine code or
assembly code
– MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
Translation of A Statement
Symbol-table Management
• To record the identifiers in the source program
– An identifier is detected by lexical analysis and then
stored in the symbol table
• To collect the attributes of identifiers
(not done by lexical analysis)
– Storage allocation: memory address
– Types
– Scope (where it is valid, local or global)
– Arguments (in case of procedure names)
• Number and types of arguments
• Method of passing each argument (e.g., call by reference)
• Return types
• Semantic analysis uses type information to
check the type consistency of identifiers
• Code generation uses storage allocation
information to generate correct relocatable
address code
Symbol table
Example
int a = "hello"; // the types String and int are not compatible
String s = "...";
int a = 5 - s; // the - operator does not support arguments of type String
For example, the statement
int a = "value";
should not produce an error in the lexical or syntax analysis phases, as it is lexically and
structurally correct, but it should produce a semantic error because the type of the assigned
value differs from the declared type of the variable. Such rules are part of the language
definition and are checked during semantic analysis.
Accessing an Out-of-Scope Variable
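#include <stdio.h>

int main()
{
    if (1)
    {
        int x = 10;   // 'x' is visible only inside this block
    }
    printf("Value of x: %d\n", x);   // error: 'x' is not declared in this scope
    return 0;
}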
Shadowing:
• The inner result in the nested block shadows the outer result. Both are valid but
refer to different variables.
• Shadowing: A variable declared in a nested block with the same name as an outer
variable hides the outer variable.
Out-of-Scope Access:
• The result variable declared in calculateDifference is not accessible in the main
function.
Actual and Formal Parameter Mismatch
#include <stdio.h>

void add(int a, int b)   // two formal parameters
{
    printf("Sum: %d\n", a + b);
}

int main()
{
    add(5);   // error: one actual argument passed, two expected
    return 0;
}
Redeclaration of a Variable in the Same Scope
#include <stdio.h>

int main()
{
    int x = 10; // First declaration of 'x'
    int x = 20; // error: second declaration of 'x' in the same scope
    printf("Value of x: %d\n", x);
    return 0;
}
Misuse of Reserved Identifiers
#include <stdio.h>

int __result = 0;   // assumed declaration (missing from the source); value chosen arbitrarily

void _CalculateSum()
{
    printf("This is a reserved identifier misuse.\n");
}

// Using an identifier starting with an underscore in the global namespace (reserved)
int _globalValue = 100;

int main()
{
    printf("Result: %d\n", __result);            // Accessing a reserved identifier
    _CalculateSum();                             // Calling a reserved identifier
    printf("Global Value: %d\n", _globalValue);  // Accessing another reserved identifier
    return 0;
}
__result:
Identifiers starting with a double underscore (__) are reserved for the
compiler and standard library in all scopes.
_CalculateSum:
Identifiers starting with a single underscore followed by an uppercase
letter (such as _C) are also reserved for the implementation in all scopes.
_globalValue:
Identifiers starting with a single underscore (_) are reserved for the
implementation in the global namespace.
Preprocessors
Macro Processing
File Inclusion
Rational Preprocessors
Language extension
Macro preprocessor:
Contains 2 parts
1. Macro definition
2. Macro use
Macro definition:
- a short construct that stands for a larger construct
- contains formal parameters
Macro use:
ex: printf("%d", f(10)) ans: 10*10*10
3. Rational preprocessor:
Used with older programming languages
Adds the modern flow-of-control and data-structuring facilities that those languages do not support
4. Language extension:
Attempts to add capabilities to the language by what amounts to built-in
macros.
For example, the language EQUEL is a database query language embedded in C.
Statements beginning with ## are taken by the preprocessor to be database-access statements.
Assembler
Link-editors
A linker (or link editor) is a system program that takes one or more
object files (generated by a compiler or an assembler) and
combines them into a single executable file, library file, or another
object file.
– External references
• Library files, system routines, or any other program
Grouping of Phases
Popular Tools
• Yacc/Bison: For C/C++.
• ANTLR: For many languages like Java, Python, and C#.
• PLY (Python Lex-Yacc): For Python.
• CUP (Constructor of Useful Parsers): Generates LALR
parsers for Java.
Syntax-directed translation engine:
• A Syntax-Directed Translation (SDT) engine is a tool that
helps convert input from one form to another while following
a set of grammar rules combined with specific actions.
• It uses grammar rules (to define the structure of a
language) and attaches actions (to describe what to do for
each rule).
• These actions can perform tasks like building a syntax tree,
generating intermediate code, or calculating values.
• These tools help create intermediate representations of
code, like three-address code or abstract syntax trees.
Popular Tools
• LLVM (Low-Level Virtual Machine): Provides an infrastructure for building
compilers and optimizing intermediate code.
• GCC (GNU Compiler Collection): Includes tools for generating intermediate
representations.
Automatic code generator:
• An Automatic Code Generator is a tool that creates
machine code or intermediate code for a computer program
automatically.
• It takes high-level instructions (like syntax trees or
intermediate representations created during compilation)
and translates them into low-level code (like assembly
language or binary code) that the computer can execute.
• These tools assist in generating machine code from
intermediate representations.
Popular Tools:
• LLVM: Again, used for backend code generation.
• SPIM/MIPS Simulators: Used for running and testing assembly code targeting
the MIPS architecture.
• Keystone: A lightweight assembler framework.
Data flow engine:
• A Data Flow Engine is a system or tool that processes data
by following a defined flow or sequence of operations,
where the output of one step becomes the input for the next.
• It focuses on how data moves and transforms through
different stages in a program or system.
• Each step (or node) in the flow performs a specific operation
on the data, like filtering, aggregating, or analyzing.
• In a compiler, such engines analyze how values flow from one part of a
program to another; the results are used to optimize intermediate code or
machine code.
Popular Tools
• LLVM: Also used extensively for optimization.
• Polly: An LLVM-based tool for polyhedral optimization.
• Open64: A high-performance compiler with strong optimization capabilities.
Analysis Tools (software tools that manipulate source programs)
Structure Editors (match begin..end, do..while, etc.)
Pretty Printers (beautify: special color and font for different portions)
Static Checkers (check syntax errors, statement reachability analysis)
Interpreters (program is taken line by line)
Structure Editor:
• Input is a sequence of commands
• Creates and modifies a program
• Can check that the input is correctly formed
• Supplies keywords automatically
• Finds matching parentheses and the corresponding begin…end statements
• Output is similar to the output of the analysis phase of a compiler.
Interpreter:
Performs the operations specified in the source program.
Pretty printers:
Print the program so that its structure is clearly visible
Comments appear in a special font
Statements are indented by an amount
proportional to the depth of their nesting
Static checker:
Identifies bugs without running the program
Detects parts of the source code that can never be executed (unreachable
code)
Detects variables used before being defined
Performs type checking
Places where compiler techniques are used
Text formatters
• Input is a sequence of characters
• Input includes commands to indicate paragraphs, figures, and mathematical structures
such as superscripts and subscripts
Silicon compilers
• Used in signal analysis in circuits
• The input language describes logical signals (0 or 1) or groups of signals
in a switching circuit
• Output is a circuit design in an appropriate language
Query Interpreters
• Translate a predicate (containing relational and boolean operators) into
commands that search a database for records satisfying that predicate.