Introduction Compiler Design

The document discusses topics related to compiler design and automata theory including lexical analysis, syntax analysis, syntax directed translation, intermediate code generation, run-time environments, code generation, and machine independent optimization. It defines low-level and high-level languages, machine language, assembly language, compilers, interpreters, and address binding.


Automata Theory and Compiler Design
CC-3203

BY: DR. L. P. VERMA
Topics
1. Lexical analysis (Scanning)
2. Syntax Analysis (Parsing)
3. Syntax Directed Translation
4. Intermediate Code Generation
5. Run-time environments
6. Code Generation
7. Machine Independent Optimization

Course scope

Aim: To learn the working of a compiler.

Main reference: Compilers: Principles, Techniques, and Tools, Second Edition, by Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman.
Definition - What does Low-Level Language mean?

A low-level language is a programming language that deals with a computer's hardware components and constraints. Low-level languages are designed to operate and handle the entire hardware and instruction set architecture of a computer directly.

Machine language and assembly language are popular examples of low-level languages.
Definition - What does High-Level Language (HLL) mean?
A high-level language is any programming language that enables development of a program in a much more user-friendly programming context and is generally independent of the computer's hardware architecture.
A high-level language does not require addressing hardware constraints when developing a program.
Every program written in a high-level language must be translated into machine language, by a compiler or an interpreter, before the computer can execute it.
What is Machine Language?
For a computer to carry out commands written in high-level languages such as Java, C, C++, or Python, those instructions must be expressed in machine language, which consists of bits.
The conversion of a high-level language program to machine language is done by an interpreter or a compiler.
Machine language consists of zeros and ones. Because computers are digital electronic devices, they use these binary digits for their operations: a one represents the true (on) state and a zero represents the false (off) state.
The way high-level programs are converted to machine code depends on the CPU, because each CPU family has its own instruction set.
Assembly language
Assembly language acts as an intermediate language between machine language and high-level programming languages.

Compared with machine language, assembly language is easier to comprehend and use; however, it is more complicated than high-level programming languages.

Assembly language is called a low-level language because it is close to the hardware level.

Programmers writing assembly language code need to understand the processor's register structure and the computer's architecture.
Assembly language
A specialized translator called an assembler is needed to convert assembly language statements into object code or machine code.

An assembly language statement has four fields: mnemonic, operand, label, and comment (the label and comment are optional).

The mnemonic names the instruction to execute; the operands are the parameters that the instruction uses.
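To make the four fields concrete, here is a small Python sketch that splits one assembly statement into label, mnemonic, operands, and comment. It assumes a common but not universal syntax, `label: MNEMONIC op1, op2 ; comment`; real assemblers differ in their comment characters and label rules, and the function name `split_statement` is hypothetical.

```python
def split_statement(line):
    """Split 'label: MNEMONIC op1, op2 ; comment' into its four fields.

    Assumed convention: ';' starts a comment and ':' ends a label.
    Missing optional fields are returned as None (or [] for operands).
    """
    code, _, comment = line.partition(";")   # comment is everything after ';'
    label = None
    if ":" in code:                          # optional label before ':'
        label, _, code = code.partition(":")
        label = label.strip()
    parts = code.split(None, 1)              # mnemonic, then the operand list
    mnemonic = parts[0] if parts else None
    operands = [op.strip() for op in parts[1].split(",")] if len(parts) > 1 else []
    return {
        "label": label,
        "mnemonic": mnemonic,
        "operands": operands,
        "comment": comment.strip() or None,
    }
```

For example, `split_statement("loop: ADD R1, R2 ; accumulate")` returns `{"label": "loop", "mnemonic": "ADD", "operands": ["R1", "R2"], "comment": "accumulate"}`, while `"MOV A, B"` yields `None` for both optional fields.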
Compiler
Compilation is a process that translates a program in one language (the source language) into an equivalent program in another language (the object or target language).
An important part of any compiler is the detection and reporting of errors.
Compiler
Executing a program written in a high-level programming language basically consists of two parts. First, the source program must be translated into an object program. Then the resulting object program is loaded into memory and executed.
ASSEMBLER
An assembler is a program that translates assembly language into machine language.
The input to an assembler is called the source program; the output is its machine language translation (the object program).
As with a compiler, the object program is then loaded into memory and executed.
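The translation an assembler performs can be sketched in Python for a made-up machine. Everything about the instruction set here is an assumption for illustration: 8-bit instructions whose high 4 bits hold the opcode and whose low 4 bits hold a memory address, and the four mnemonics in `OPCODES`. Real assemblers also resolve labels (usually in two passes) and emit relocatable object files, which this sketch omits.

```python
# Hypothetical 8-bit instruction format for this sketch:
# high 4 bits = opcode, low 4 bits = memory address operand.
OPCODES = {"LOAD": 0b0001, "ADD": 0b0010, "STORE": 0b0011, "HALT": 0b1111}

def assemble(source):
    """Translate one instruction per line into 8-bit machine words (as bit strings)."""
    words = []
    for line in source.splitlines():
        line = line.split(";", 1)[0].strip()   # drop comments and surrounding spaces
        if not line:
            continue                           # skip blank lines
        mnemonic, *operand = line.split()
        address = int(operand[0]) if operand else 0
        if mnemonic not in OPCODES or not 0 <= address < 16:
            raise ValueError(f"cannot assemble {line!r}")
        word = (OPCODES[mnemonic] << 4) | address
        words.append(f"{word:08b}")
    return words
```

Under these assumptions, `assemble("LOAD 5\nADD 6\nSTORE 7\nHALT")` produces `["00010101", "00100110", "00110111", "11110000"]`: the zeros and ones of machine language.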
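INTERPRETER
An interpreter is closely related to a compiler, but it takes both the source program and the input data.
The translation and execution phases of the source program are one and the same: each statement is executed as soon as it is translated, and no separate object program is produced.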
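A minimal Python sketch of this idea: the function below takes both a program and its input data, and executes each statement as soon as it decodes it, with no separate object program. The four-statement language (`read`, `set`, `add`, `print`) and the name `interpret` are invented for the example.

```python
def interpret(program, inputs):
    """Execute a tiny made-up language, one statement at a time.

    Statements (hypothetical syntax for this sketch):
      read NAME        take the next value from the input data
      set NAME VALUE   VALUE is an integer literal or a variable name
      add NAME A B     NAME = A + B
      print NAME       append the value of NAME to the output
    """
    env, output = {}, []
    data = iter(inputs)

    def value(token):
        # A token is either a variable already in env or an integer literal.
        return env[token] if token in env else int(token)

    for stmt in program:
        op, *args = stmt.split()
        # Decoding the statement and executing it happen in the same step.
        if op == "read":
            env[args[0]] = next(data)
        elif op == "set":
            env[args[0]] = value(args[1])
        elif op == "add":
            env[args[0]] = value(args[1]) + value(args[2])
        elif op == "print":
            output.append(env[args[0]])
        else:
            raise SyntaxError(f"unknown statement: {stmt!r}")
    return output
```

For example, `interpret(["read a", "read b", "add c a b", "print c"], [2, 5])` returns `[7]`; running the same program on different input data requires no retranslation.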
Difference between Compiler and Interpreter

S.No. | Compiler | Interpreter
1. | Scans the whole program in one go. | Translates the program one statement at a time.
2. | Generates intermediate object code. | Does not produce any intermediate object code.
3. | Main advantage is fast execution time. | Slower at execution, so it is less preferred when speed matters.
4. | Memory requirement is higher due to the creation of object code. | Requires less memory, as it does not create intermediate object code.
Eg. | C, C++, C# etc. | Python, Ruby, Perl, SNOBOL, MATLAB etc.
Phases of Compiler
Lexical Analyzer: It scans the source code as a stream of characters, groups the characters into lexemes, and outputs a sequence of tokens according to the rules of the programming language.
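As an illustration, the following Python sketch performs this grouping with the standard `re` module. The token classes in `TOKEN_SPEC` describe a small hypothetical expression language, not any particular compiler; each lexeme is returned as a `(token_name, lexeme)` pair and whitespace is discarded.

```python
import re

# Token classes for a small hypothetical expression language.
# Order matters: earlier alternatives are tried first.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Group characters into lexemes and return (token_name, lexeme) pairs."""
    tokens, pos = [], 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":   # whitespace separates lexemes but is not a token
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

For example, `tokenize("sum = a + 42")` yields `[("ID", "sum"), ("OP", "="), ("ID", "a"), ("OP", "+"), ("NUMBER", "42")]`.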

Syntax Analyzer: In this phase, the tokens generated by the previous phase are checked against the grammar of the programming language to determine whether the expressions are syntactically correct. The syntax analyzer builds parse trees to do so.
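A common way to build such a tree by hand is recursive descent, with one function per grammar rule. The Python sketch below assumes a small hypothetical expression grammar (shown in the docstring) and tokens given as `(kind, text)` pairs; it returns the tree as nested tuples such as `('+', left, right)` and raises `SyntaxError` on input the grammar does not derive.

```python
def parse(tokens):
    """Recursive-descent parser for the hypothetical grammar
        expr   -> term (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> NUMBER | ID | '(' expr ')'
    `tokens` is a list of (kind, text) pairs; returns a nested-tuple parse tree.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def factor():
        kind, text = advance()
        if kind in ("NUMBER", "ID"):
            return (kind, text)
        if text == "(":
            node = expr()
            if advance()[1] != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")

    def term():
        node = factor()
        while peek()[1] in ("*", "/"):
            op = advance()[1]
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek()[1] in ("+", "-"):
            op = advance()[1]
            node = (op, node, term())
        return node

    tree = expr()
    if pos != len(tokens):   # leftover tokens mean the input is not a single expr
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree
```

Because `term` is called from `expr`, `*` and `/` bind more tightly than `+` and `-`: the tokens of `a + 2 * b` parse to `('+', ('ID', 'a'), ('*', ('NUMBER', '2'), ('ID', 'b')))`.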
Semantic Analyzer: It checks whether the parse tree built in the previous phase obeys the semantic rules of the programming language (for example, that variables are declared before use and that operand types are compatible), and it creates an annotated parse tree.
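A minimal Python sketch of such a check, assuming expression trees are nested tuples like `('+', ('ID', 'x'), ('NUMBER', '2'))` and a symbol table mapping names to the types `'int'` or `'float'`. The typing rules are hypothetical: a literal containing `.` is a float, mixing int and float yields float, and any other type in arithmetic is an error. The result is the tree with a type attached to every node, one simple form of annotated parse tree.

```python
NUMERIC = ("int", "float")

def annotate(node, symbols):
    """Return (annotated_node, type); each annotated node gains a type field."""
    kind = node[0]
    if kind == "NUMBER":
        t = "float" if "." in node[1] else "int"
        return (kind, node[1], t), t
    if kind == "ID":
        # Rule 1: identifiers must be declared (present in the symbol table).
        if node[1] not in symbols:
            raise NameError(f"undeclared identifier {node[1]!r}")
        t = symbols[node[1]]
        return (kind, node[1], t), t
    op, left, right = node
    left_ann, lt = annotate(left, symbols)
    right_ann, rt = annotate(right, symbols)
    # Rule 2: arithmetic needs numeric operands; int op float gives float.
    if lt not in NUMERIC or rt not in NUMERIC:
        raise TypeError(f"operator {op!r} applied to {lt} and {rt}")
    t = "float" if "float" in (lt, rt) else "int"
    return (op, left_ann, right_ann, t), t
```

With `symbols = {"x": "float"}`, the tree for `x + 2` becomes `('+', ('ID', 'x', 'float'), ('NUMBER', '2', 'int'), 'float')`, while an undeclared name raises `NameError`.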

Intermediate code generator: It generates an equivalent intermediate code for the source code. There are many representations of intermediate code, but TAC (Three Address Code) is one of the most widely used.
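Three-address code breaks an expression into instructions with at most one operator each, storing intermediate results in compiler-generated temporaries. The Python sketch below does this for expression trees written as nested tuples (an assumed representation, e.g. `('+', ('ID', 'a'), ('*', ('ID', 'b'), ('ID', 'c')))` for `a + b * c`); the temporary names `t1`, `t2`, ... are likewise illustrative.

```python
def gen_tac(tree, target):
    """Return three-address code that computes `tree` and stores it in `target`.

    Trees are nested tuples: ("ID", name), ("NUMBER", text), or (op, left, right).
    """
    code = []
    count = 0

    def new_temp():
        nonlocal count
        count += 1
        return f"t{count}"

    def walk(node):
        # Leaves are already addresses; each interior node gets a fresh temporary.
        if node[0] in ("ID", "NUMBER"):
            return node[1]
        op, left, right = node
        a, b = walk(left), walk(right)
        t = new_temp()
        code.append(f"{t} = {a} {op} {b}")
        return t

    result = walk(tree)
    code.append(f"{target} = {result}")
    return code
```

For `x = a + b * c` it returns `["t1 = b * c", "t2 = a + t1", "x = t2"]`: the multiplication is emitted first because its result is an operand of the addition.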
Code Optimizer – It transforms the code so that it consumes fewer resources and runs faster, without altering the meaning of the code being transformed. Optimization can be categorized into two types: machine dependent and machine independent.
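Constant folding is one machine-independent optimization: an operation whose operands are both constants is evaluated at compile time. This Python sketch applies it to expression trees written as nested tuples (an assumed representation). To keep it clearly meaning-preserving, it folds only `+`, `-` and `*` on non-negative integer literals; division is left alone because folding it would have to reproduce the source language's rounding and division-by-zero behaviour.

```python
import operator

# Operators that are safe to evaluate at compile time on integers.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def is_int_literal(node):
    # Conservative: only plain digit strings count (a folded "-2" is not refolded).
    return node[0] == "NUMBER" and node[1].isdigit()

def fold(node):
    """Return an equivalent tree with constant subexpressions pre-computed."""
    if node[0] in ("NUMBER", "ID"):
        return node
    op, left, right = node
    left, right = fold(left), fold(right)   # fold children first (bottom-up)
    if op in FOLDABLE and is_int_literal(left) and is_int_literal(right):
        return ("NUMBER", str(FOLDABLE[op](int(left[1]), int(right[1]))))
    return (op, left, right)
```

For example, the tree for `x + 2 * 3` folds to the tree for `x + 6`, so the multiplication never runs at execution time.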
Target Code Generator – The main purpose of the target code generator is to produce code that the machine can understand; this phase also performs register allocation, instruction selection, etc. Its output depends on the type of assembler. This is the final stage of compilation: the optimized code is converted into relocatable machine code, which then forms the input to the linker and loader.
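A naive sketch of instruction selection in Python, translating three-address instructions of the form `x = a op b` or `x = a` into code for a hypothetical one-register machine. The `LOAD`/`ADD`/`SUB`/`MUL`/`DIV`/`STORE` instruction names and the single register `R1` are assumptions; because every value passes through `R1`, this sketch does no real register allocation, which a production code generator would.

```python
# Hypothetical target instructions for each TAC operator.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def gen_target(tac):
    """Translate TAC strings into assembly for a one-register machine."""
    asm = []
    for instr in tac:
        dest, rhs = (s.strip() for s in instr.split("=", 1))
        parts = rhs.split()
        if len(parts) == 1:                 # copy: x = a
            asm += [f"LOAD R1, {parts[0]}", f"STORE {dest}, R1"]
        else:                               # binary operation: x = a op b
            a, op, b = parts
            asm += [f"LOAD R1, {a}", f"{OPCODES[op]} R1, {b}", f"STORE {dest}, R1"]
    return asm
```

For `["t1 = b * c", "x = t1"]` it emits `LOAD R1, b`, `MUL R1, c`, `STORE t1, R1`, `LOAD R1, t1`, `STORE x, R1`; the redundant store/load pair is exactly what a later peephole optimization would remove.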
Program States

Address Binding
Address binding is the association of program instructions and data with actual physical memory locations. There are various types of address binding in the operating system; two of them are:

• Compile Time Address Binding
• Execution Time Address Binding
Address Binding
• Compile Time Address Binding: If the compiler is responsible for performing address binding, it is called compile time address binding.
• This type of address binding is done before the program is loaded into memory.
• The compiler needs to interact with the operating system's memory manager to perform compile time address binding.
Address Binding
• Execution Time Address Binding: Address binding is postponed even after the program has been loaded into memory.
• The program's locations in memory can keep changing until, and during, program execution.
• This type of address binding is done by the processor, through its memory-management hardware, at the time of program execution.
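A toy Python model of execution-time binding, assuming the classic relocation-register scheme: the program uses logical addresses starting at 0, and the hardware adds a base (relocation) register to each one on every access, after checking it against a limit register. The class name `MMU` and the numbers in the example are illustrative.

```python
class MMU:
    """Toy memory-management unit with base (relocation) and limit registers."""

    def __init__(self, base, limit):
        self.base = base      # physical address where the program currently starts
        self.limit = limit    # size of the program's logical address space

    def translate(self, logical):
        # Binding happens here, on every access, not at compile or load time.
        if not 0 <= logical < self.limit:
            raise ValueError(f"addressing error: {logical} outside 0..{self.limit - 1}")
        return self.base + logical
```

With `MMU(14000, 1000)`, logical address 346 maps to 14346; if the operating system moves the program and sets `base = 20000`, the same logical address maps to 20346 without any change to the program itself.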
Compiler
• Native Compiler: Native compilers generate code for the same platform on which they run. They convert a high-level language into the computer's native language. For example, Turbo C or GCC.
• Cross Compiler: A cross compiler generates executable code for a platform other than the one on which the compiler is running. For example, a compiler running on a Linux/x86 machine that builds a program to run on a separate Arduino/ARM board.
Lexical Analyzer Generator

LEX
◦ Lex is a program that generates lexical analyzers. It is commonly used with the YACC parser generator.
◦ A lexical analyzer is a program that transforms an input stream into a sequence of tokens.
◦ Lex reads an input specification and produces, as output, C source code that implements the lexical analyzer.
The function of Lex is as follows:
◦ First, the lexical analyzer is specified by writing a program lex.l in the Lex language. Then the Lex compiler processes lex.l and produces a C program lex.yy.c.
◦ Finally, the C compiler compiles lex.yy.c and produces an object program a.out.
◦ a.out is the lexical analyzer that transforms an input stream into a sequence of tokens.

Lex file format
◦ A Lex program is separated into three sections by %% delimiters. The format of a Lex source file is as follows:
◦ { definitions }
◦ %%
◦ { rules }
◦ %%
◦ { user subroutines }
Definitions include declarations of constants, variables, and regular definitions.
Rules are statements of the form p1 {action1} p2 {action2} ... pn {actionn}.
◦ Here each pi is a regular expression, and actioni describes what action the lexical analyzer should take when pattern pi matches a lexeme.
User subroutines are auxiliary procedures needed by the actions. These subroutines can be compiled separately and loaded with the lexical analyzer.
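The scanner that Lex generates applies these rules with two conflict-resolution conventions: it prefers the longest lexeme any pattern can match, and among patterns matching the same longest lexeme it picks the rule listed first. The Python sketch below imitates that behaviour with the `re` module; it is not Lex's actual implementation (which compiles the patterns into a finite automaton), and the function name `lex_scan` and the example rules are hypothetical.

```python
import re

def lex_scan(rules, text):
    """Scan `text` the way a Lex-generated scanner chooses rules.

    `rules` is a list of (regex, action) pairs in priority order; each action
    takes the matched lexeme and returns a token, or None to discard it.
    """
    compiled = [(re.compile(p), act) for p, act in rules]
    pos, out = 0, []
    while pos < len(text):
        best_len, best_act = 0, None
        for pat, act in compiled:
            m = pat.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches are equally long.
            if m and m.end() - pos > best_len:
                best_len, best_act = m.end() - pos, act
        if best_act is None:
            # Lex's default action copies unmatched input to the output;
            # this sketch reports an error instead.
            raise SyntaxError(f"no rule matches at position {pos}")
        tok = best_act(text[pos:pos + best_len])
        if tok is not None:
            out.append(tok)
        pos += best_len
    return out
```

With the rules `("if", keyword)`, `("[a-z]+", identifier)`, `("[0-9]+", number)` and `(r"\s+", skip)`, the input `if iffy 42` gives a keyword token for `if` (both patterns match two characters, so the earlier rule wins) and an identifier token for `iffy` (the identifier pattern matches a longer lexeme).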
Installing Flex on Ubuntu:

sudo apt-get update
sudo apt-get install flex


PATTERN | IT CAN MATCH WITH
[0-9] | any digit between 0 and 9
[0+9] | either 0, + or 9
[0, 9] | either 0, ',' (comma), ' ' (space) or 9
[0 9] | either 0, ' ' (space) or 9
[-09] | either -, 0 or 9
[-0-9] | either - or any digit between 0 and 9
[0-9]+ | one or more digits between 0 and 9
[^a] | any character except a
[^A-Z] | any character except the upper-case letters
a{2,4} | either aa, aaa or aaaa
a{2,} | two or more occurrences of a
a{4} | exactly 4 a's, i.e. aaaa
. | any character except newline
a* | 0 or more occurrences of a
a+ | 1 or more occurrences of a
[a-z] | all lower-case letters
[a-zA-Z] | any alphabetic letter
w(x|y)z | wxz or wyz
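For these constructs, Python's `re` module uses the same notation as Lex, so the table can be checked mechanically. The sketch below (the list name `EXAMPLES` and the sample strings are chosen for illustration) asserts that each pattern fully matches strings the table says it matches and rejects a few it should not. Lex adds features Python lacks, such as `{name}` references to regular definitions, so this agreement holds only for the rows shown.

```python
import re

# (pattern, strings it should match in full, strings it should reject)
EXAMPLES = [
    (r"[0-9]",   ["0", "7"],            ["a", "10"]),
    (r"[0+9]",   ["0", "+", "9"],       ["5"]),
    (r"[0, 9]",  ["0", ",", " ", "9"],  ["5"]),
    (r"[-09]",   ["-", "0", "9"],       ["5"]),
    (r"[-0-9]",  ["-", "5"],            ["+"]),
    (r"[0-9]+",  ["4", "2024"],         [""]),
    (r"[^a]",    ["b", "1"],            ["a"]),
    (r"[^A-Z]",  ["a", "1"],            ["Q"]),
    (r"a{2,4}",  ["aa", "aaa", "aaaa"], ["a", "aaaaa"]),
    (r"a{2,}",   ["aa", "aaaaaa"],      ["a"]),
    (r"a{4}",    ["aaaa"],              ["aaa"]),
    (r".",       ["x", " "],            ["\n"]),
    (r"a*",      ["", "aaa"],           ["b"]),
    (r"a+",      ["a", "aaa"],          [""]),
    (r"w(x|y)z", ["wxz", "wyz"],        ["wz", "wxyz"]),
]

def check(examples):
    """Assert every pattern accepts its 'good' strings and rejects its 'bad' ones."""
    for pattern, good, bad in examples:
        for s in good:
            assert re.fullmatch(pattern, s), f"{pattern!r} should match {s!r}"
        for s in bad:
            assert not re.fullmatch(pattern, s), f"{pattern!r} should reject {s!r}"
    return True
```

`re.fullmatch` is used rather than `re.match` so that, for example, `[0-9]` is not counted as matching `"10"` just because it matches the leading `1`.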
