Introduction Compiler Design
Compiler Design
CC-3203
Topics
1. Lexical analysis (Scanning)
2. Syntax Analysis (Parsing)
3. Syntax Directed Translation
4. Intermediate Code Generation
5. Run-time environments
6. Code Generation
7. Machine Independent Optimization
Course scope
Definition - What does Low-Level Language mean?
Definition - What does High-Level Language (HLL) mean?
A high-level language is any programming language that enables development of a program in a much more user-friendly programming context and is generally independent of the computer's hardware architecture.
A high-level language does not require addressing hardware constraints when developing a program.
Every program written in a high-level language must be translated into machine language, by a compiler or an interpreter, before being executed by the computer.
What is Machine Language?
For a computer to understand commands written in high-level languages such as Java, C, C++ or Python, the instructions have to be given in machine language, which consists of bits.
The conversion of a high-level language to machine language is performed by an interpreter or a compiler.
Machine language consists of zeros and ones. Because computers are digital electronic devices, they use these binary digits for their operations.
In machine language, one represents the true (on) state and zero represents the false (off) state.
Machine language is specific to the CPU, so the translation of a high-level program into machine code depends on the CPU it targets.
Assembly language
Assembly language acts as an intermediate language between machine language and high-level programming languages.
A specialized translator called an assembler is needed to convert assembly language commands to object code or machine code.
Compiler
Compilation is a process that translates a program in one language (the source language) into an equivalent program in another language (the object or target language).
An important part of any compiler is the detection and reporting of errors.
Compiler
Executing a program written in an HLL programming language basically consists of two parts.
The source program must first be translated into an object program.
Then the resulting object program is loaded into memory and executed.
ASSEMBLER
An assembler is a program that translates assembly language into machine language.
The input to an assembler is called the source program; the output is its machine language translation (the object program).
The source program must first be translated into an object program. Then the resulting object program is loaded into memory and executed.
INTERPRETER
An interpreter is closely related to a compiler but takes both the source program and the input data.
The translation and execution phases of the source program are one and the same.
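A minimal sketch of this idea: the function below receives both a source program and its input data, and executes each statement in the same step in which it is analyzed. The tiny assignment language and the name `interpret` are hypothetical, invented only for illustration.

```python
def interpret(source, inputs):
    """Run a toy program one statement at a time.

    Each statement has the form `name = operand op operand`, where an
    operand is a variable name or an integer literal (hypothetical language).
    """
    env = dict(inputs)  # input data supplied together with the program
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }

    def value(operand):
        # An operand is either an integer literal or a variable lookup.
        return int(operand) if operand.lstrip("-").isdigit() else env[operand]

    for line in source.strip().splitlines():
        target, expr = (part.strip() for part in line.split("=", 1))
        left, op, right = expr.split()
        # Translation and execution happen in the same step: no object
        # program is produced.
        env[target] = ops[op](value(left), value(right))
    return env


program = """
area = width * height
total = area + 10
"""
result = interpret(program, {"width": 3, "height": 4})
print(result["total"])  # 22
```

Changing the input dictionary changes the result without any separate translation step, which is the contrast with the compile-then-load model described for compilers.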
Difference between Compiler and Interpreter
S.NO. | COMPILER | INTERPRETER
1. | Compiler scans the whole program in one go. | Translates the program one statement at a time.
2. | It generates intermediate object code. | It does not produce any intermediate object code.
3. | Main advantage of compilers is their execution time. | Interpreters are slower at executing a program, so they are preferred less when speed matters.
4. | Memory requirement is more due to the creation of object code. | It requires less memory as it does not create intermediate object code.
Eg. | C, C++, C# etc. | Python, Ruby, Perl, SNOBOL, MATLAB etc.
Phases of Compiler
Lexical Analyzer: It scans the code as a stream of characters, groups the sequences of characters into lexemes and outputs a sequence of tokens according to the rules of the programming language.
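As a rough illustration of this phase, the sketch below groups a character stream into lexemes and emits (token, lexeme) pairs. The token classes in `TOKEN_SPEC` belong to a made-up toy language and are an assumption, not part of the course material.

```python
import re

# Hypothetical token classes for a toy language. Order matters: at each
# position the first alternative that matches is used.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Group the character stream into lexemes and return (token, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":  # whitespace produces no token
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("count = count + 42"))
# [('ID', 'count'), ('OP', '='), ('ID', 'count'), ('OP', '+'), ('NUMBER', '42')]
```

A character that fits no token class, such as `$` here, is reported as a lexical error instead of being passed on to the next phase.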
Syntax Analyzer: In this phase, the tokens generated in the previous stage are checked against the grammar of the programming language to determine whether the expressions are syntactically correct. It builds parse trees to do so.
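One common way to check tokens against a grammar is recursive descent, sketched below for arithmetic expressions. The grammar in the comment is an assumption chosen for illustration, and the tree is represented as nested tuples; real parsers and the course's own notation may differ.

```python
import re

# Assumed grammar (not taken from the slides):
#   expr   -> term   (('+' | '-') term)*
#   term   -> factor (('*' | '/') factor)*
#   factor -> NUMBER | '(' expr ')'


def parse(text):
    """Return a tree as nested tuples, or raise SyntaxError."""
    # Simplified tokenizer: characters outside the pattern are silently ignored.
    tokens = re.findall(r"\d+|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def factor():
        tok = peek()
        if tok is not None and tok.isdigit():
            return ("num", int(advance()))
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        node = factor()
        while peek() in ("*", "/"):
            node = (advance(), node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            node = (advance(), node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("2 + 3 * 4"))
# ('+', ('num', 2), ('*', ('num', 3), ('num', 4)))
```

The shape of the tree encodes operator precedence: `3 * 4` becomes a subtree of the addition. An input such as `2 + * 3` raises `SyntaxError` because no grammar rule allows `*` at that point.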
Semantic Analyzer: It verifies whether the expressions and statements checked in the previous phase follow the semantic rules of the programming language, and it creates annotated parse trees.
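A small sketch of what "annotating" a tree can mean: each node receives a type, and a rule violation is reported as a semantic error. The two-type system, the tuple layout and the names `annotate` and `SemanticError` are all assumptions made for this example.

```python
class SemanticError(Exception):
    """Raised when a tree is syntactically valid but breaks a language rule."""


def annotate(node, symbols):
    """Return (type, annotated_node) for a tree given as nested tuples.

    Leaves are ("num", 3), ("str", "hi") or ("var", "x"); inner nodes are
    (op, left, right). `symbols` maps declared variable names to their types.
    """
    kind = node[0]
    if kind == "num":
        return "int", node + ("int",)
    if kind == "str":
        return "string", node + ("string",)
    if kind == "var":
        if node[1] not in symbols:
            raise SemanticError(f"undeclared variable {node[1]!r}")
        var_type = symbols[node[1]]
        return var_type, node + (var_type,)
    left_type, left = annotate(node[1], symbols)
    right_type, right = annotate(node[2], symbols)
    if left_type != right_type:  # toy rule: both operands must have the same type
        raise SemanticError(f"cannot apply {kind!r} to {left_type} and {right_type}")
    return left_type, (kind, left, right, left_type)


tree = ("+", ("var", "x"), ("num", 1))
print(annotate(tree, {"x": "int"})[1])
# ('+', ('var', 'x', 'int'), ('num', 1, 'int'), 'int')
```

The expression `x + "hi"` would pass the syntax analyzer but fail here, which is the kind of error this phase exists to catch.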
Code Optimizer – It transforms the code so that it consumes fewer resources and runs faster. The meaning of the code being transformed is not altered. Optimization can be categorized into two types: machine dependent and machine independent.
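Constant folding is one well-known machine-independent optimization: subexpressions whose operands are all constants are evaluated at compile time. The sketch below applies it to a tree of nested tuples; the tuple layout and the name `fold_constants` are assumptions for illustration.

```python
import operator

FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(node):
    """Replace constant subexpressions with their value, leaving meaning unchanged."""
    if node[0] in ("num", "var"):
        return node
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    if left[0] == "num" and right[0] == "num":
        return ("num", FOLDABLE[op](left[1], right[1]))
    return (op, left, right)


# x * (2 + 3)  becomes  x * 5
print(fold_constants(("*", ("var", "x"), ("+", ("num", 2), ("num", 3)))))
# ('*', ('var', 'x'), ('num', 5))
```

The folded tree computes the same value for every `x` but needs one operation fewer at run time.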
Target Code Generator – The main purpose of the target code generator is to produce code that the machine can understand; it also performs register allocation, instruction selection, etc. The output depends on the type of assembler. This is the final stage of compilation. The optimized code is converted into relocatable machine code, which then forms the input to the linker and loader.
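To make instruction selection and register allocation concrete, the sketch below emits code for an invented register machine. The instruction names (`LOAD`, `LOADI`, `ADD`, `SUB`, `MUL`) and the naive allocation scheme are hypothetical and do not correspond to any real processor or assembler.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}  # hypothetical instruction set


def generate(tree):
    """Emit instructions for a made-up register machine; return them as a list."""
    code = []
    free = 0  # index of the next unused register

    def emit(node):
        nonlocal free
        if node[0] in ("num", "var"):
            reg = f"R{free}"
            free += 1
            # Instruction selection: immediate load for constants,
            # memory load for variables.
            mnemonic = "LOADI" if node[0] == "num" else "LOAD"
            code.append(f"{mnemonic} {reg}, {node[1]}")
            return reg
        op, left, right = node
        left_reg = emit(left)
        right_reg = emit(right)
        code.append(f"{OPCODES[op]} {left_reg}, {left_reg}, {right_reg}")
        free -= 1  # the right operand's register can be reused
        return left_reg

    emit(tree)
    return code


# a + b * 2
for line in generate(("+", ("var", "a"), ("*", ("var", "b"), ("num", 2)))):
    print(line)
# LOAD R0, a
# LOAD R1, b
# LOADI R2, 2
# MUL R1, R1, R2
# ADD R0, R0, R1
```

A real code generator would also handle a limited number of registers and produce relocatable output for the linker and loader; both are left out of this sketch.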
Program States
Address Binding
Address Binding is the association of program instructions and data to
the actual physical memory location. There are various types of
address binding in the operating system.
Address Binding
Compile Time Address Binding: If the compiler is responsible for performing address binding, it is called compile time address binding.
This type of address binding is done before loading the program into memory.
The compiler is required to interact with the operating system memory manager to perform compile time address binding.
Execution Time Address Binding: The address binding is postponed even after loading the program into memory.
The program may keep changing its location in memory until the time of program execution.
This type of address binding is done by the processor at the time of program execution.
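A toy model of execution-time binding, assuming the common base-and-limit register mechanism (the slides do not name a specific mechanism): the program only ever uses logical addresses, and each one is translated to a physical address at the moment it is used.

```python
class Process:
    """Toy model of a process whose addresses are bound at execution time."""

    def __init__(self, base, limit):
        self.base = base    # assumed base register: current start in physical memory
        self.limit = limit  # assumed limit register: size of the logical address space

    def translate(self, logical):
        """Map a logical address to a physical one at the moment of access."""
        if not 0 <= logical < self.limit:
            raise MemoryError(f"logical address {logical} is outside the process")
        return self.base + logical


proc = Process(base=4000, limit=1000)
print(proc.translate(120))  # 4120

proc.base = 9000            # the process is moved to another place in memory
print(proc.translate(120))  # 9120
```

With compile-time binding the value 4120 would be fixed in the program itself, so moving the process would require recompiling it; here only the base value changes.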
Compiler
Native compiler: A native compiler is a compiler that generates code for the same platform on which it runs. It converts a high-level language into the computer's native language. Examples are the Turbo C and GCC compilers.
Cross compiler: A cross compiler is a compiler that generates executable code for a platform other than the one on which the compiler is running. For example, a compiler running on a Linux/x86 box builds a program that will run on a separate Arduino/ARM.
Lexical Analyzer Generator
LEX
◦ Lex is a program that generates a lexical analyzer. It is used with the YACC parser generator.
◦ The lexical analyzer is a program that transforms an input stream into a sequence of tokens.
◦ Lex reads a specification of the tokens and produces, as output, C source code that implements the lexical analyzer.
The function of Lex is as follows:
◦ First, a specification of the lexical analyzer is written as a program lex.l in the Lex language. The Lex compiler then processes lex.l and produces a C program lex.yy.c.
◦ Finally, the C compiler compiles lex.yy.c and produces an object program a.out.
◦ a.out is the lexical analyzer that transforms an input stream into a sequence of tokens.
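The generator idea can be imitated in a few lines: a function takes a list of rules and returns a scanner built from them. This is only an analogy. Real Lex emits C source (lex.yy.c) rather than building a function at run time, and `make_lexer` and its rule format are hypothetical. The sketch does mimic two Lex conventions: the longest match wins, and on a tie the rule listed first wins.

```python
import re


def make_lexer(rules):
    """Build a scanner function from (pattern, token_name) rules.

    A token_name of None means the matched lexeme is discarded.
    """
    compiled = [(re.compile(pattern), name) for pattern, name in rules]

    def scan(text):
        tokens, pos = [], 0
        while pos < len(text):
            best_len, best_name = 0, None
            for regex, name in compiled:
                match = regex.match(text, pos)
                # Strict '>' keeps the earlier rule when two matches tie.
                if match and match.end() - pos > best_len:
                    best_len, best_name = match.end() - pos, name
            if best_len == 0:
                raise SyntaxError(f"no rule matches at position {pos}")
            if best_name is not None:
                tokens.append((best_name, text[pos:pos + best_len]))
            pos += best_len
        return tokens

    return scan


# The rule list plays the role of lex.l; the returned function plays the role of a.out.
scan = make_lexer([
    (r"if", "IF"),
    (r"[a-z]+", "ID"),
    (r"[0-9]+", "NUMBER"),
    (r"[ \t\n]+", None),
])
print(scan("if iffy 42"))
# [('IF', 'if'), ('ID', 'iffy'), ('NUMBER', '42')]
```

The input `if` matches both the `IF` and `ID` rules with the same length, so the earlier rule is chosen; `iffy` is longer as an `ID`, so the longest-match rule applies.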
Lexical Analyzer Generator
sudo apt-get update
Lexical Analyzer Generator
[0-9]+      one or more digits between 0 and 9
[^a]        any character except a
[^A-Z]      any character except the upper-case letters
a{2,4}      either aa, aaa or aaaa
a{2,}       two or more occurrences of a
a{4}        exactly 4 a's, i.e. aaaa
.           any character except newline
a*          0 or more occurrences of a
a+          1 or more occurrences of a
[a-z]       any lower-case letter
[a-zA-Z]    any alphabetic letter
w(x|y)z     wxz or wyz
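The patterns in the table can be tried out with Python's `re` module, which agrees with Lex on these particular operators (the two dialects are not identical in general). The test strings below are chosen for illustration; `check` requires the whole string to match.

```python
import re

# (pattern, strings that should match, strings that should not)
CASES = [
    (r"[0-9]+", ["7", "2024"], ["", "12a"]),
    (r"[^a]", ["b", "Z"], ["a"]),
    (r"[^A-Z]", ["a", "5"], ["Q"]),
    (r"a{2,4}", ["aa", "aaaa"], ["a", "aaaaa"]),
    (r"a{2,}", ["aa", "aaaaaa"], ["a"]),
    (r"a{4}", ["aaaa"], ["aaa"]),
    (r".", ["x", " "], ["\n"]),
    (r"a*", ["", "aaa"], ["b"]),
    (r"a+", ["a", "aaa"], [""]),
    (r"[a-z]", ["q"], ["Q"]),
    (r"[a-zA-Z]", ["q", "Q"], ["1"]),
    (r"w(x|y)z", ["wxz", "wyz"], ["wz"]),
]


def check(pattern, text):
    """Return True if the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


for pattern, good, bad in CASES:
    assert all(check(pattern, s) for s in good), pattern
    assert not any(check(pattern, s) for s in bad), pattern
print("all patterns behave as described")
```

Note that the repetition counts are written without spaces, as in `a{2,4}`: a space inside the braces changes the meaning of the pattern in Python, and in a Lex rule unquoted whitespace ends the pattern.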