Introduction

The document outlines a Compiler Design course for B.Tech. students, detailing its motivation, objectives, prerequisites, and expected outcomes. It covers various modules including compiler phases, lexical analysis, syntax analysis, semantic analysis, code optimization, and code generation, along with a syllabus and recommended resources. The course aims to equip students with the knowledge to design and maintain compilers for complex programming languages.

Compiler Design

6th Semester B.Tech. (CSE)


Course Code: CS3008

Dr. Jasaswi Prasad Mohanty


School of Computer Engineering
KIIT Deemed to be University
Bhubaneswar, India
Motivation
 Language processing is an important component of programming.
 A large number of systems programs and application programs require structured input:
• Operating systems (command-line processing)
• Databases (query-language processing)
• Typesetting systems such as LaTeX
 Wherever the input has a structure, one can think of language processing.
 Why study compilers?
• Compilers use the whole spectrum of language processing technology.

CD / Module-I / Jasaswi 2 17 January 2024


What do we expect to achieve by the end of the course?
 Knowledge to design, develop, understand, modify/enhance, and
maintain compilers for (even complex!) programming languages.
 Confidence to use language processing technology for software
development.

Prerequisites

 Familiarity with one high-level programming language such as C, C++, or Java is required.
 Data Structures & Algorithms
 Formal Language and Automata

CD / Module-I / Jasaswi 3 17 January 2024


Course Objective
The objectives of the course are:
• to introduce the basic theory underlying the different components and phases of a compiler, such as parsing and code generation;
• to familiarize students with the principles involved in the design of compilers for modern languages.

CD / Module-I / Jasaswi 4 17 January 2024


Course Outcomes

At the end of this course the student will be able to:


1. Explain the design of a compiler and identify the phases involved in program translation from
source code to executable code along with the intermediate files produced by the phases.
2. Represent language tokens using regular expressions and finite automata to design lexical
analyzer for a language.
3. Identify the similarities and differences among various parsing techniques to design parsers.
4. Understand formal attributed grammars for specifying the semantics of programming languages and
design semantic rules to facilitate the translation process.
5. Identify the effectiveness of optimization and apply various machine-independent and machine-dependent
optimization techniques.
6. Learn register allocation and apply target code generation algorithms used by the compiler.

CD / Module-I / Jasaswi 5 17 January 2024


Syllabus

Module I (Overview of Compiler): Introduction to Compiler, Phases of Compilation, Grouping of Phases.

Module II (Lexical Analysis): Role of Lexical Analyzer, Input Buffering, Specification of Tokens, Finite state machines and their applications to lexical analysis.

Module III (Syntax Analysis): Context Free Grammar; Top-down Parsing: recursive descent parsing, table-driven predictive parsing, LL(1); Bottom-up Parsing: shift-reduce parsing, efficient bottom-up parsers (LR parsers), LR and LALR parsing; error recovery in parsing; handling ambiguous grammars; compiler construction tools.

Module IV (Semantic Analysis): Intermediate forms of source programs: syntax tree and three-address code (quadruples, triples); attributed grammar; syntax-directed translation; evaluation and flow of attributes in a syntax tree; conversion of programming language constructs into intermediate code forms; symbol table organization; DAG representation.

Module V (Code Optimization): Machine-dependent and machine-independent optimization. Local optimization: common sub-expression elimination, constant folding, replacing expensive operations. Loop optimization: basic blocks, flow graphs, inner loops, code motion, induction variables.

Module VI (Code Generation): Design of the code generator, machine-dependent code optimization, register allocation and assignment, code generation algorithm.



Books
 TB1: Compilers: Principles, Techniques and Tools by Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman, Pearson Education, 2009.
 RB1: Compiler Construction: Principles and Practice by K. C. Louden, Brooks/Cole - Thomson Learning.
 RB2: Engineering a Compiler by Keith Cooper, Linda Torczon, ISBN: 978-0-12-088478-0, Elsevier, Inc.
 RB3: Introduction to Compiler Construction With Unix by Axel T. Schreiner, H. George Friedman, ISBN: 978-0134743967, Prentice-Hall software series.
 RB4: The Compiler Design Handbook by Y. N. Srikant, Priti Shankar, ISBN: 978-1420043822, Taylor and Francis.



Online Resources
 NPTEL Courses:
• IIT Madras: https://nptel.ac.in/courses/128/106/128106009
• IIT Kharagpur: https://nptel.ac.in/courses/106/105/106105190
• IIT Kanpur: https://nptel.ac.in/courses/106/104/106104123
• IISc Bangalore: https://nptel.ac.in/courses/106/108/106108113
• IIT Kanpur: https://nptel.ac.in/courses/106/104/106104072
 Tutorials Point: https://www.tutorialspoint.com/compiler_design
 Geeks for Geeks: https://www.geeksforgeeks.org/compiler-design-tutorials



Compiler Learning
 Isn't it an old discipline?
• Yes, it is a well-established discipline.
• Its algorithms, methods, and techniques were researched and developed in the early stages of computer science.
• There are many compilers around, and many tools to generate them automatically.
 So why do we need to learn it?
• You may never write a full compiler.
• But the techniques we learn are useful in many tasks, such as writing an interpreter for a scripting language, validation checking for forms, and so on.



Activity Calendar

Sl. No.  Activity Type  Date               Marks  CO
1.       Class Test-1   25th Jan-30th Jan  5      CO1, CO2
2.       Quiz-1         14th Feb-17th Feb  5      CO2, CO3
3.       Assignment     17th Feb-24th Feb  5      CO1, CO2, CO3
         MID SEM EXAM   26th Feb-02nd Mar
4.       Class Test-2   01st Apr-07th Apr  5      CO3, CO4
5.       Quiz-2         20th Apr-26th Apr  10     CO3, CO4, CO5, CO6
         END SEM EXAM   27th Apr-08th May



MODULE I
Overview of Compiler
Sl. No. Topics
1. Introduction
2. Language Processors
3. Types of Compiler
4. Language Processing Systems
5. Structure of Compiler
6. Phases of Compiler
7. Grouping of phases into passes
8. Compiler Construction Tools
1. Introduction
 Programming languages are notations for describing computations to people and to machines.
 Before a program can be run, it first must be translated into a form in which it can be executed by a computer.
 The software systems that do this translation are called compilers.
 A compiler is system software that converts a source-language program into a target-language program.
 It validates the input program against the source-language specification and produces error messages/warnings during the translation process.



Bit of History
 How are programming languages implemented? Two major strategies:
• Interpreters (old and much less studied)
• Compilers (very well understood, with mathematical foundations)
 Some environments provide both an interpreter and a compiler. Lisp, Scheme, etc. provide:
• an interpreter for development
• a compiler for deployment
 Java:
• Java compiler: Java to interpretable bytecode
• Java JIT: bytecode to executable image



Some early machines and implementations
 IBM developed the 704 Data Processing System in 1954. All programming was done in assembly language. The cost of software development far exceeded the cost of hardware, and productivity was low.
 Speedcoding interpreter: programs ran about 10 times slower than hand-written assembly code.
 John Backus (in 1954) proposed a program that translated high-level expressions into native machine code. There was skepticism all around; most people thought it was impossible.
 Fortran I project (1954-1957): the first compiler was released.



Fortran I
 The first compiler had a huge impact on programming languages and computer science. The whole new field of compiler design started from it.
 More than half of all programmers were using Fortran by 1958. Development time was cut in half.
 It led to an enormous amount of theoretical work (lexical analysis, parsing, optimization, structured programming, code generation, error recovery, etc.).
 Modern compilers preserve the basic structure of the Fortran I compiler.



What are Compilers?
• A compiler is a translator which translates from one representation of a program to another
• Typically from high-level source code to low-level machine code or object code
• Source code is normally optimized for human readability
– Expressive: matches our notion of languages (and application?!)
– Redundant, to help avoid programming errors
• Machine code is optimized for hardware
– Redundancy is reduced
– Information about the intent is lost

High-level program → Compiler → Low-level program



2. Language Processors
 Compiler: It is a program that can read a program in one language (source
language) and translate it into an equivalent program in another language (target
language).
 If the target program is an executable machine-language program, it can then be
called by the user to process inputs and produce outputs.

Source Program → Compiler → Target Program
Input → Target Program → Output



Language Processors – contd…
 Interpreter: It is another common kind of language processor.
 Instead of producing a target program as a translation, an interpreter appears to
directly execute the operations specified in the source program on inputs
supplied by the user.
 The machine-language target program produced by a compiler is usually much faster than an interpreter at mapping inputs to outputs.
 An interpreter, however, can usually give better error diagnostics than a
compiler, because it executes the source program statement by statement.

Source Program + Input → Interpreter → Output



Compiler vs Interpreter

Input — Compiler: takes the entire program as input. Interpreter: takes a single instruction as input.
Output — Compiler: generates intermediate object code. Interpreter: does not produce any intermediate object code.
Working mechanism — Compiler: compilation is done before execution. Interpreter: compilation and execution take place simultaneously.
Speed — Compiler: compiled code runs faster. Interpreter: interpreted code runs slower.
Memory — Compiler: the memory requirement is higher, due to the creation of object code. Interpreter: requires less memory, as no intermediate object code is created.
Errors — Compiler: errors are displayed after compilation, all at the same time. Interpreter: errors are displayed line by line.
Recompilation — Compiler: any change in the source program after compilation requires recompiling the entire code. Interpreter: any change in the source program during translation does not require retranslation of the entire code.


Compiler vs Interpreter

1. Steps of programming — Compiler: (a) program creation; (b) analysis of the language by the compiler, which throws errors (if any); (c) conversion of the source code to machine code (if no errors); (d) linking of various code files into a runnable program; (e) program execution. Interpreter: (a) program creation; (b) execution of source statements one by one.
2. The compiler saves the machine code on disk. The interpreter does not save any machine code.
3. Compiled code runs faster; interpreted code runs slower.
4. The compiler generates an output file (e.g., an .exe). The interpreter does not generate any output file.
5. Any change in the source program after compilation requires recompiling the entire code. With an interpreter, a change in the source program during translation does not require retranslation of the entire code.
6. Compiler errors are displayed all together after compilation; interpreter errors are displayed line by line.
Compiler vs Interpreter – contd…

7. The compiler can see the code upfront, which helps it run the code faster by performing optimization. The interpreter works line by line, so optimization is a little weaker compared to compilers.
8. A compiler does not require the source code for later execution. An interpreter requires the source code for later execution.
9. With a compiler, execution of the program takes place only after the whole program is compiled. With an interpreter, execution happens after every line is checked or evaluated.
10. Compilers more often take a large amount of time to analyze the source code. In comparison, interpreters take less time to analyze the source code.
11. CPU utilization is higher in the case of a compiler and lower in the case of an interpreter.
12. Compilers are mostly used in production environments; interpreters are mostly used in programming and development environments.
13. With a compiler, object code is permanently saved for future use. With an interpreter, no object code is saved for future use.
14. Compiler-based programming languages: C, C++, C#. Interpreter-based programming languages: Python, Perl, MATLAB.
3. Types of Compiler
1. Native Compiler (Native Code Compiler):
• A native compiler converts source code into target code for the same machine on which it runs.
• Example: Turbo C, GCC.

Source Code (Machine A) → Native Compiler (runs on Machine A) → Target Code (Machine A)

2. Cross Compiler:
• A cross compiler converts source code on one machine into target code for another machine. It is useful where the target platform does not have enough resources to build the target code.
• Example: arm-cortexa8-linux-gnueabihf-gcc hello.c (source code on an x86 machine compiled to target code for the ARM architecture)

Source Code (Machine A) → Cross Compiler (runs on Machine A) → Target Code (Machine B)


Types of Compiler – contd…
3. Source-to-Source Compiler:
• A source-to-source compiler, also known as a transcompiler, converts source code in one high-level language into target code in another high-level language.
• Example: CoffeeScript, Dart, Haxe, Opal, TypeScript.

High-Level Language → Source-to-Source Compiler → High-Level Language

4. Bytecode Compiler:
• A bytecode compiler converts high-level language source code into bytecode.
• Example: Java

High-Level Language (Machine A) → Bytecode Compiler (runs on Machine A) → Bytecode (runs on Machine B)


Types of Compiler – contd…
5. Just-In-Time (JIT) Compiler:
• A JIT compiler translates bytecode into machine code.
• Example: the JVM (Java Virtual Machine) in Java and the CLR (Common Language Runtime) in C#

Bytecode → JIT Compiler → Machine Code

Note:
 Java Language Processor: A Hybrid Compiler
• Java language processors combine compilation and interpretation.
• A Java source program is first compiled into an intermediate form called bytecodes.
• The bytecodes are then interpreted by a virtual machine.
• Advantage: bytecodes compiled on one machine can be interpreted on another machine, across a network.
• To achieve faster processing of inputs to outputs, some Java compilers, called just-in-time compilers, translate the bytecodes into machine language immediately before they run the intermediate program to process the input.

Source Program → Translator (Bytecode compiler) → Intermediate Program
Intermediate Program + Input → Virtual Machine → Output


4. Language Processing Systems
 We write programs in a High-Level Language (HLL), which is very easy for the user to understand.
 These programs are fed into a series of tools and operating system components to get the desired code that can be used by the machine. This is called a Language Processing System.

first.c (source code in HLL)
   → Preprocessor →
first.i (preprocessed code: pure HLL)
   → Compiler →
first.s (assembly code: program in assembly-level language)
   → Assembler →
first.o (relocatable machine code: object code)
   → Linker/Loader (with library files and relocatable object files) →
first.out / first.exe (executable code: target machine code)

NOTE:
1. Relocatable Machine Code: a form of machine code that is designed to be loaded and executed at any memory location.
2. Executable Code: code which works at one specific address in memory.
Language Processing Systems – contd…
1. Preprocessor:
• It is a program which converts HLL to pure HLL (completely machine-independent; programs written in a pure high-level language can theoretically be executed on any computer without modification).
• It takes source code as input and produces modified source code as output.
• It removes the preprocessor directives (e.g. #include <stdio.h>) and adds the contents of the respective files; this is File Inclusion.
• It also performs macro expansion (e.g. #define PI 3.14).
• The preprocessor is also known as a macro evaluator.
• Preprocessing is optional: if a language does not support #include or macros, no preprocessing is required.
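The macro-expansion step above can be sketched in a few lines of Python. This is a toy illustration only, not how a real C preprocessor works (which also handles function-like macros, conditionals, and rescanning); the function name and single-pass substitution are assumptions made for the sketch.

```python
import re

def expand_macros(source):
    """Record '#define NAME BODY' lines, drop them, substitute later uses."""
    macros, out = {}, []
    for line in source.splitlines():
        m = re.match(r"\s*#define\s+(\w+)\s+(.+)", line)
        if m:
            macros[m.group(1)] = m.group(2)   # record macro, drop directive
            continue
        for name, body in macros.items():
            # \b prevents replacing PI inside a longer identifier like PIN
            line = re.sub(rf"\b{name}\b", body, line)
        out.append(line)
    return "\n".join(out)

expanded = expand_macros("#define PI 3.14\narea = PI * r * r")
```

Running this on the two-line input replaces the use of PI and removes the directive, mirroring what the preprocessor does before the compiler ever sees the code.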


Language Processing Systems – contd…
2. Compiler:
• Compiler is a program that can read a program in one language (the source
language) and translate it into an equivalent program in another language (the
target language).
• It converts pure HLL into assembly-level language.
• Assembly-level language is a low-level programming language. It is not in binary form.
• It also reports errors and warning messages.



Language Processing Systems – contd…
3. Assembler:
• For every platform (hardware + OS) there is an assembler.
• An assembler for one platform will not work on another platform.
• The assembler converts assembly code to relocatable machine code.
4. Linker and Loader:
• This phase converts relocatable code to absolute machine code.
• Linker: links different object files, libraries, and DLL files into a single executable file.
 A DLL is a library that contains code and data that can be used by more than one program at the same time.
• Loader: loads the executable file into memory and executes it.
NOTE:
1. Modern compiler drivers like Turbo C, gcc, etc. combine all the phases to convert a source file to executable machine code.
5. Structure of a Compiler
 A compiler has two major phases:
• Analysis phase:
 It breaks up the source program into constituent pieces and imposes a grammatical structure on them. It then uses this structure to create an intermediate representation of the source program.
 If the analysis part detects that the source program is either syntactically ill-formed or semantically unsound, it must provide informative messages so the user can take corrective action.
 The analysis part also collects information about the source program and stores it in a data structure called a symbol table, which is passed along with the intermediate representation to the synthesis part.
 This phase is often called the front end of the compiler.
• Synthesis phase:
 The synthesis part constructs the desired target program from the intermediate representation and the information in the symbol table.
 This phase is often called the back end of the compiler.



6. Phases of a Compiler
 The process of converting the source code into machine code involves several phases or stages, which are collectively known as the phases of a compiler.
 The analysis phases (the front end) also collect information about the source program and store it in a data structure called a symbol table.
 The symbol table, which stores information about the entire source program, is used by all phases of the compiler, both front end and back end.


Phases of a Compiler – contd…
[Figure: the pipeline of compiler phases, from lexical analysis through code generation, with the symbol table and error handler shared by all phases.]


6.1 Lexical Analysis
 The first phase of a compiler is called lexical analysis or scanning.
 It takes the output of the pre-processor (which performs file inclusion and macro expansion) as
the input which is in a pure high-level language
 The lexical analyzer reads the stream of characters making up the source program and groups
the characters into meaningful sequences called lexemes.
 For each lexeme, the lexical analyzer produces as output a token of the following form and
passes on to the subsequent phase, syntax analysis:
<token-name, attribute-value>
 In the token, the first component token-name is an abstract symbol that is used during syntax
analysis, and the second component attribute-value points to an entry in the symbol table for this
token. Information from the symbol-table entry is needed for semantic analysis and code
generation.
 It also reports lexical errors (e.g., erroneous characters) and removes comments and white space.
 It is generally implemented as a finite automaton.



Example
 Suppose a source program contains the following assignment statement:
position = initial + rate * 60
 The characters in this assignment could be grouped into the following lexemes and mapped into the
following tokens passed on to the syntax analyzer:
• position is a lexeme that would be mapped into a token <id, 1>, where id is an abstract symbol
standing for identifier and 1 points to the symbol-table entry for position. The symbol-table entry for
an identifier holds information about the identifier, such as its name and type.
• The assignment symbol = is a lexeme that is mapped into the token <=>. Since this token needs
no attribute-value, the second component is omitted.
• initial is a lexeme that is mapped into the token <id, 2>, where 2 points to the symbol-table entry
for initial.
• + is a lexeme that is mapped into the token <+>.
• rate is a lexeme that is mapped into the token <id, 3>, where 3 points to the symbol-table entry for
rate.
• * is a lexeme that is mapped into the token <*>.
• 60 is a lexeme that is mapped into the token <60>.
Example – contd…
 Blanks separating the lexemes would be discarded by the lexical analyzer.
 After lexical analysis, the representation of the assignment statement position = initial + rate * 60 as a sequence of tokens is as follows:
<id, 1> <=> <id, 2> <+> <id, 3> <*> <60>
 The symbol table will look as follows:
1: position
2: initial
3: rate

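The grouping of characters into lexemes and <token-name, attribute-value> pairs shown above can be sketched in Python. This is a toy illustration, not the scanner a real compiler (or a generator like lex) would produce; the TOKEN_SPEC table and the 1-based symbol-table numbering are assumptions chosen to match the example.

```python
import re

# Each pattern names a token class; whitespace is matched so it can be skipped.
TOKEN_SPEC = [
    ("id",   r"[A-Za-z_]\w*"),
    ("num",  r"\d+"),
    ("op",   r"[=+*]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(source):
    symbols, tokens = [], []
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "skip":                       # blanks are discarded
            continue
        if kind == "id":
            if lexeme not in symbols:            # enter identifier once
                symbols.append(lexeme)
            tokens.append(("id", symbols.index(lexeme) + 1))
        else:                                    # <=>, <+>, <*>, <60>: no attribute
            tokens.append((lexeme,))
    return tokens, symbols

tokens, table = tokenize("position = initial + rate * 60")
```

The call reproduces the token sequence and three-entry symbol table from the example slide.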


6.2 Syntax Analysis
 The second phase of the compiler is syntax analysis or parsing.
 The parser uses the first components of the tokens produced by the lexical analyzer to create a tree-like intermediate representation known as a syntax tree, which shows the grammatical structure of the token stream.
 In the syntax tree, each interior node represents an operation and the children of the node represent the arguments of the operation. This tree shows the order in which the operations in the assignment are to be performed.
 The syntax tree for the token stream <id, 1> <=> <id, 2> <+> <id, 3> <*> <60>, which is the output of the syntax analyzer, is shown below:

            =
           / \
      <id,1>  +
             / \
        <id,2>  *
               / \
          <id,3>  60
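A small recursive-descent sketch shows how a parser can build this tree from the token stream, with * nested below + to reflect operator precedence. This is an illustrative assumption, not the course's parsing algorithm (Module III covers the real techniques); tokens are simplified to plain strings.

```python
def parse_assignment(tokens):
    """Parse ['id1', '=', 'id2', '+', 'id3', '*', '60'] into a nested tuple tree."""
    pos = {"i": 0}
    def peek():    return tokens[pos["i"]] if pos["i"] < len(tokens) else None
    def advance(): pos["i"] += 1; return tokens[pos["i"] - 1]

    def factor():                    # identifiers and numbers are leaves
        return advance()
    def term():                      # handles *, binding tighter than +
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node
    def expr():                      # handles +
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = advance()               # left-hand side identifier
    assert advance() == "="          # consume the assignment operator
    return ("=", target, expr())

tree = parse_assignment(["id1", "=", "id2", "+", "id3", "*", "60"])
```

Because term() is called from inside expr(), the multiplication ends up as a child of the addition, exactly as in the tree drawn above.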


6.3 Semantic Analysis
 The semantic analyzer uses the syntax tree and the information in the symbol table to
check the source program for semantic consistency with the language definition.
 It also gathers type information and saves it in either the syntax tree or the symbol table,
for subsequent use during intermediate-code generation.
 An important part of semantic analysis is type checking, where the compiler checks that
each operator has matching operands.
• Example: many programming language definitions require an array index to be an
integer; the compiler must report an error if a floating-point number is used to index
an array.
 The language specification may permit some type conversions called coercions.
• Example: a binary arithmetic operator may be applied to either a pair of integers or
to a pair of floating-point numbers. If the operator is applied to a floating-point number
and an integer, the compiler may convert or coerce the integer into a floating-point
number.



Example
 Suppose that position, initial, and rate have been declared to be floating-point numbers, and that the lexeme 60 by itself forms an integer.
 The type checker in the semantic analyzer discovers that the operator * is applied to a floating-point number rate and an integer 60. In this case, the integer may be converted into a floating-point number.
 Note that the output of the semantic analyzer has an extra node for the operator inttofloat, which explicitly converts its integer argument into a floating-point number:

            =
           / \
      <id,1>  +
             / \
        <id,2>  *
               / \
          <id,3>  inttofloat
                      |
                      60

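The coercion described above can be sketched as a tree rewrite: when an operator mixes an integer and a float, the integer side is wrapped in an inttofloat node. The type table and tuple representation are assumptions carried over from the earlier sketches, not the course's semantic-analysis machinery.

```python
# Declared types for the example: all three identifiers are floats.
SYMBOL_TYPES = {"id1": "float", "id2": "float", "id3": "float"}

def coerce(node):
    """Return (rewritten_tree, type), inserting inttofloat where needed."""
    if not isinstance(node, tuple):                     # leaf
        t = "integer" if node.isdigit() else SYMBOL_TYPES[node]
        return node, t
    op, l, r = node
    l, lt = coerce(l)
    r, rt = coerce(r)
    if lt != rt:                                        # mixed operands:
        if lt == "integer":                             # promote the int side
            l, lt = ("inttofloat", l), "float"
        else:
            r, rt = ("inttofloat", r), "float"
    return (op, l, r), lt

typed_tree, result_type = coerce(("*", "id3", "60"))
```

Applied to the multiplication subtree, the rewrite produces exactly the extra inttofloat node shown in the annotated syntax tree.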


6.4 Intermediate Code Generation
 After syntax and semantic analysis of the source program, many compilers generate an explicit
low-level or machine-like intermediate representation, which we can think of as a program for an
abstract machine.
 This intermediate representation should have two important properties:
• it should be easy to produce, and
• it should be easy to translate into the target machine language.
 We consider an intermediate form called three-address code, which consists of a sequence of
assembly-like instructions with three operands per instruction.
 Each operand can act like a register. The output of the intermediate code generator consists of
the three-address code sequence.



Intermediate Code Generation – contd..
NOTE:
1. Each three-address assignment instruction has at most one operator on the right side. Thus, these instructions fix the order in which operations are to be performed; the multiplication precedes the addition, as in the source program.
2. The compiler must generate a temporary name to hold the value computed by a three-address instruction.
3. Some "three-address instructions," like the first and last in the code below, have fewer than three operands.

t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
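The three-address sequence falls out of a post-order walk of the annotated syntax tree, with a fresh temporary allocated at each operator node. The following sketch (function names and tree representation are assumptions from the earlier examples) makes that concrete:

```python
def gen_tac(tree):
    """Emit three-address code for a tuple-encoded syntax tree."""
    code, counter = [], [0]
    def new_temp():
        counter[0] += 1
        return f"t{counter[0]}"
    def walk(node):
        if not isinstance(node, tuple):
            return node                         # leaf: its own name
        if node[0] == "inttofloat":             # unary conversion operator
            arg = walk(node[1])
            t = new_temp()
            code.append(f"{t} = inttofloat({arg})")
            return t
        op, l, r = node
        if op == "=":                           # assignment: no temporary
            code.append(f"{l} = {walk(r)}")
            return l
        a, b = walk(l), walk(r)                 # post-order: children first
        t = new_temp()
        code.append(f"{t} = {a} {op} {b}")
        return t
    walk(tree)
    return code

tac = gen_tac(("=", "id1", ("+", "id2", ("*", "id3", ("inttofloat", "60")))))
```

Because children are emitted before their parent, the multiplication's temporary exists before the addition uses it, which is exactly the ordering property noted above.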


6.5 Code Optimization
 The machine-independent code-optimization phase attempts to improve the intermediate code so that better target code will result.
 Better in the sense that the code should be shorter, execute faster, and consume less power.
 A simple intermediate code generation algorithm followed by code optimization is a reasonable way to generate good target code.
 The optimizer can shorten the code produced by the intermediate code generation phase to the following sequence:

t1 = id3 * 60.0
id1 = id2 + t1
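Two local improvements produce that shorter sequence: constant folding evaluates inttofloat(60) at compile time, and the temporary that merely feeds the final assignment is folded away. The sketch below is a simplified string-level pass, an assumption for illustration only; real optimizers work on basic blocks and data-flow information, as Module V describes.

```python
def optimize(tac):
    out, consts = [], {}
    for line in tac:
        dst, expr = line.split(" = ", 1)
        # Constant folding: inttofloat of a literal is done at compile time.
        if expr.startswith("inttofloat(") and expr[11:-1].isdigit():
            consts[dst] = expr[11:-1] + ".0"
            continue                            # instruction disappears
        for name, val in consts.items():        # substitute folded constants
            expr = expr.replace(name, val)
        out.append(f"{dst} = {expr}")
    # Copy elimination: fold "x = tN" into the line that defined tN.
    if len(out) >= 2 and out[-1].split(" = ")[1] == out[-2].split(" = ")[0]:
        final_dst = out[-1].split(" = ")[0]
        out[-2] = final_dst + " = " + out[-2].split(" = ", 1)[1]
        out.pop()
    return out

opt = optimize(["t1 = inttofloat(60)", "t2 = id3 * t1",
                "t3 = id2 + t2", "id1 = t3"])
```

Four instructions become two, with the integer-to-float conversion gone and one temporary eliminated (the surviving temporary keeps the name the generator gave it).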


6.6 Code Generation
 The code generator takes as input an intermediate representation of the source program and maps it into the target language.
 If the target language is machine code, registers or memory locations are selected for each of the variables used by the program.
 The intermediate instructions are translated into sequences of machine instructions that perform the same task.
 A crucial aspect of code generation is the judicious assignment of registers to hold variables.
 The intermediate code generated in the previous phase can be converted to the following machine instructions:

LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1

LDF: load data into a floating-point register.
ADDF: floating-point addition.
MULF: floating-point multiplication.
STF: store data from a floating-point register.
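A naive sketch of this last step maps the optimized three-address code onto the floating-point mnemonics listed above. Register allocation here is simplified to "one fresh register per loaded variable", an assumption for illustration; judicious register assignment is precisely what a real code generator works hard at.

```python
OPCODES = {"*": "MULF", "+": "ADDF"}

def gen_target(tac):
    asm, regs, next_reg = [], {}, [0]
    def operand(x):
        if x in regs:                           # value already in a register
            return regs[x]
        if x.replace(".", "").isdigit():
            return "#" + x                      # literal: immediate operand
        next_reg[0] += 1                        # variable: load from memory
        r = f"R{next_reg[0]}"
        asm.append(f"LDF {r}, {x}")
        regs[x] = r
        return r
    for line in tac:
        dst, expr = line.split(" = ", 1)
        parts = expr.split()
        if len(parts) == 3:                     # binary operation
            a, op, b = parts
            ra, rb = operand(a), operand(b)
            asm.append(f"{OPCODES[op]} {ra}, {ra}, {rb}")
            regs[dst] = ra                      # result stays in a register
            if not dst.startswith("t"):         # program variable: store it
                asm.append(f"STF {dst}, {ra}")
        else:                                   # plain copy: store the value
            asm.append(f"STF {dst}, {operand(expr)}")
    return asm

asm = gen_target(["t2 = id3 * 60.0", "id1 = id2 + t2"])
```

The output has the same shape as the five-instruction sequence above (only the register numbering differs, since registers are handed out in load order).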


Translation of an assignment statement
[Figure: the statement position = initial + rate * 60 passing through all the phases above, from token stream to target machine code.]


6.7 Symbol-Table Management
 The compiler records the variable names used in the source program and collects information about various attributes of each name.
 These attributes may provide information about:
• Variable names: storage, type, scope (where in the program its value may be used).
• Procedure names: number of arguments, type of its arguments, method of passing each argument (by value or by reference), return type.
 The symbol table is a data structure containing a record for each variable name, with fields for the attributes of the name.
 The data structure should be designed to allow the compiler to find the record for each name quickly and to store or retrieve data from that record quickly.

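A minimal symbol-table sketch: a hash map keyed by name gives the fast lookup the text asks for, and each record is a dictionary of attributes. The class name and the particular attribute fields are illustrative assumptions; real tables also handle nested scopes.

```python
class SymbolTable:
    """One record per name; dict gives O(1) average lookup and insertion."""
    def __init__(self):
        self._records = {}
    def insert(self, name, **attrs):        # e.g. type, scope, storage
        self._records[name] = dict(attrs)
    def lookup(self, name):
        return self._records.get(name)      # None if the name is undeclared

table = SymbolTable()
table.insert("position", type="float", scope="global")
rec = table.lookup("position")
```

Looking up a declared name returns its attribute record; looking up an undeclared name returns None, which is where a compiler would report an "undeclared identifier" error.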


6.8 Error Handler
 Error handling is connected to all the phases of a compiler.
 If an error is detected at a particular phase, it must be dealt with so that compilation can proceed.
 Detection and reporting of errors in the source program is a main function of the compiler.
 An error can occur at any phase of compilation.
 A good compiler must determine exactly the line number of the program where the errors have occurred.
 Functions of the error handler:
• Error detection
• Error reporting
• Error recovery
6.8.1 Types of Errors
 Various errors that can occur at different levels of compilation are as follows:
• Lexical (scanner) errors:
 Illegal or unrecognized characters, mainly caused by typing errors.
 Misspellings of identifiers, keywords, or operators.
 Unterminated character or string constant: happens whenever the programmer types something in quotes and forgets the trailing quote.
• Syntactical errors:
 These errors are among the most common and are caught by the parser.
 A missing semicolon or unbalanced parenthesis.
 The difficult part is to decide from where to continue the syntactical analysis after an error has been found.
 If the parser is not carefully written, or if the error detection and recovery scheme is not proper, the parser will hit one error and "mess up" after that, giving wrong error messages throughout the rest of the program.



Types of Errors – contd..
 Various errors types can occur at a different level of compilation (contd…):
• Semantical Errors:
 The semantic errors that can occur in a program are related to the fact that some
statements can be correct from the syntactical point of view, but they make no sense and
no code can be generated to carry out the meaning of the statement.
 incompatible value assignment or type mismatches between operator and operand
• Errors encountered during code optimization (logical errors):
 During control-flow analysis, there may exist statements that can never be reached
(unreachable code).
 An infinite loop.
• Errors encountered during code generation:
 In code generation, the architecture of the target computer also plays an important role.
• Errors encountered when the compiler makes symbol-table entries:
 There may exist an identifier that has multiple declarations with contradictory attributes.
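A type mismatch between an operator and its operands is the classic semantic error: the phrase parses, but no sensible code can be generated for it. A minimal sketch of such a check (the type rules here are illustrative, not those of any real language):

```python
def check_binary_op(op, left_type, right_type):
    """Return the result type of `left op right`, or an error message."""
    if op == "+" and left_type == right_type == "int":
        return "int"
    if op == "+" and left_type == right_type == "string":
        return "string"
    # Syntactically fine, semantically meaningless: report, don't generate code.
    return "error: operator '%s' cannot combine %s and %s" % (
        op, left_type, right_type)

print(check_binary_op("+", "int", "int"))     # a valid combination
print(check_binary_op("+", "int", "string"))  # a semantic error
```

A real semantic analyzer would look operand types up in the symbol table rather than receive them directly, but the shape of the check is the same.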
7.1 Grouping phases into passes
 A pass combines parts of one or more phases of the compiler when the
compiler is implemented.
 It refers to one traversal of the compiler over the entire program.
 A pass reads the instructions of the source program, or the output
produced by the previous pass, performs the transformations specified
by its phases, and writes its output to an intermediate file, which may be read by
subsequent passes.
Advantages of Grouping phases into passes
 We can produce compilers for different source languages for one target machine by combining
different front ends with the back end for that target machine. Similarly, we can produce compilers for
different target machines by combining one front end with back ends for different target machines.
 Simplification of Compiler Design: By organizing the compilation process into distinct passes, each
handling a specific aspect of the compilation, the overall complexity of the compiler is reduced. This
modular approach allows each pass to focus on a single task, making the compiler easier to
understand, develop, and maintain.
 Facilitates Debugging and Maintenance: When compiler tasks are divided into separate passes, it
becomes easier to locate and fix bugs. This division allows developers to isolate issues within a
specific pass, reducing the time and effort needed to debug and maintain the compiler.
 Efficiency in Resource Utilization: Grouping into passes can lead to more efficient use of system
resources. For instance, each pass might require different resources or data structures. By separating
these needs into distinct passes, the compiler can manage memory and processing power more
effectively.
 Reusability: The modular nature of passes allows for the reuse of certain components in different
compilers. For example, a pass designed for optimizing arithmetic expressions can be used in multiple
compilers that target different programming languages or hardware architectures.
7.2 Types of Compiler Pass
1. One-Pass Compiler:
 If all the phases of a compiler are combined or grouped into a single module, the result is
known as a single-pass compiler.
 A one-pass/single-pass compiler is a type of compiler that passes through each
compilation unit exactly once.
 A single-pass compiler is faster and smaller than a multi-pass compiler.
 Problems with a One-Pass Compiler:
 Limited Optimization Capabilities: Limited ability to perform complex optimizations. Since
they process the source code in a single pass, they have less context about the entire
program, which is often necessary for more advanced optimizations that require multiple
views of the program's code.
 Less Efficient Memory Management: Since one-pass compilers process the source code
in a linear fashion, they may not manage memory as efficiently as multi-pass compilers.
They might need to keep more information in memory for longer periods, as they cannot
make multiple passes to refine their data structures.
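A concrete illustration of the limited-context problem: in a single left-to-right pass, a jump to a label that has not been seen yet cannot be resolved immediately, so a one-pass translator must emit a placeholder and backpatch it once the label finally appears. A simplified sketch (the instruction format is illustrative):

```python
def one_pass(lines):
    labels, pending, out = {}, [], []
    for line in lines:                     # a single traversal of the program
        if line.endswith(":"):
            labels[line[:-1]] = len(out)
            out.append(line)
        elif line.startswith("jmp "):
            target = line[4:]
            if target in labels:           # backward jump: resolvable now
                out.append("jmp %d" % labels[target])
            else:                          # forward jump: emit a placeholder
                pending.append((len(out), target))
                out.append("jmp ?")
        else:
            out.append(line)
    for pos, target in pending:            # backpatch the placeholders
        out[pos] = "jmp %d" % labels[target]
    return out

print(one_pass(["jmp end", "add", "end:", "halt"]))
```

The `pending` list of placeholders is exactly the extra state a one-pass compiler must hold in memory because it cannot simply scan the program again.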
2. Two-Pass Compiler or Multi-Pass Compiler:
 A two-pass/multi-pass compiler is a type of compiler that processes the source code of
a program multiple times.
 The first pass is referred to as the Front End:
 It is the analysis part.
 It is platform independent.
 The second pass is referred to as the Back End:
 It is the synthesis part.
 It is platform dependent.
7.3 One Pass vs Two-pass Compiler
 Definition:
• One-Pass: translates the source code into machine code in a single pass, analyzing and
compiling the code in one step.
• Two-Pass: divides the compilation process into two stages; the first pass performs syntax
and semantic analysis, and the second pass generates the machine code.
 Speed:
• One-Pass: generally faster, as it processes the code once.
• Two-Pass: slower, as it involves two distinct phases.
 Memory Usage:
• One-Pass: usually requires less memory, as it does not need to store much intermediate data.
• Two-Pass: requires more memory to store information from the first pass for use in the
second pass.
 Error Handling:
• One-Pass: limited error detection capabilities, since it does not analyze the entire program
context.
• Two-Pass: better error handling, as it can gather more context about the program in the first
pass, leading to more accurate error detection.
 Optimization:
• One-Pass: limited optimization opportunities, as it does not analyze the entire program
before generating code.
• Two-Pass: better optimization, as the first pass can collect information useful for optimizing
code generation in the second pass.
 Use Cases:
• One-Pass: suitable for simpler languages or resource-constrained environments.
• Two-Pass: used for more complex languages and applications where thorough analysis and
optimization are required.
 Examples:
• One-Pass: early versions of C compilers.
• Two-Pass: modern C++, Java, and C# compilers.
8. Compiler Construction Tools
 Writing a compiler is a tedious and time-consuming task. There are specialized tools for helping in the
implementation of the various phases of a compiler.
 These tools are called compiler construction tools. They are also known as compiler-compilers,
compiler-generators, or translator writing systems.
1. Scanner generators that produce lexical analyzers from a regular-expression description of the tokens of
a language. Example: Lex, Flex (Fast Lexical Analyzer Generator) for C/C++, JFlex for Java.
2. Parser generators that automatically produce syntax analyzers from a grammatical description of a
programming language. Example: YACC (Yet Another Compiler Compiler), Bison, Lemon, ANTLR
(Another Tool for Language Recognition).
3. Syntax-directed translation engines that produce collections of routines for walking a parse tree and
generating intermediate code.
4. Code-generator generators that produce a code generator from a collection of rules for translating each
operation of the intermediate language into the machine language for a target machine.
5. Data-flow analysis engines that facilitate the gathering of information about how values are transmitted
from one part of a program to each other part. Data-flow analysis is a key part of code optimization.
6. Compiler-construction toolkits that provide an integrated set of routines for constructing various phases
of a compiler. Example: LLVM (Low Level Virtual Machine), Clang, GNU Compiler Collection (GCC).
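The idea behind a scanner generator such as Lex can be sketched in a few lines: token classes are described by regular expressions, and the generator turns that specification into a function that splits input into (token, lexeme) pairs. A toy version (the token names and patterns are illustrative, not Lex syntax):

```python
import re

# The "specification": token classes as regular expressions, in priority order.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
# The "generated" scanner: one master regex with a named group per token class.
MASTER = re.compile("|".join("(?P<%s>%s)" % p for p in TOKEN_SPEC))

def tokenize(text):
    for m in MASTER.finditer(text):
        if m.lastgroup != "SKIP":       # drop whitespace, keep real tokens
            yield (m.lastgroup, m.group())

print(list(tokenize("position = initial + rate * 60")))
```

Real generators compile the specification into a deterministic finite automaton rather than relying on a backtracking regex engine, but the input (a regex-per-token table) and the output (a stream of token/lexeme pairs) are the same in spirit.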
Probable Questions
1) What is the difference between a compiler and an interpreter?
 A compiler is a program that can read a program in one language (the source language) and translate it
into an equivalent program in another language (the target language), reporting any errors in the source
program that it detects during the translation process.
 An interpreter directly executes the operations specified in the source program on inputs supplied by the user.
2) What are the advantages of (a) a compiler over an interpreter (b) an interpreter over a compiler?
a) The machine-language target program produced by a compiler is usually much faster than an interpreter
at mapping inputs to outputs.
b) An interpreter can usually give better error diagnostics than a compiler, because it executes the source
program statement by statement.
3) What advantages are there to a language-processing system in which the compiler produces assembly
language rather than machine language?
 The compiler may produce an assembly-language program as its output, because assembly language is
easier to produce as output and is easier to debug.
4) A compiler that translates a high-level language into another high-level language is called a source-to-
source translator. What advantages are there to using C as a target language for a compiler?
 For the C language, there are many compilers available that target almost every hardware platform.