UNIT 1 AND 2

This document introduces the concept of compilers, their significance in programming, and the various components involved in their development. It outlines the differences between compilers, interpreters, and other types of translators, as well as the challenges faced in compiler construction. Additionally, it details the architecture of compilers, including the front-end and back-end phases, and the roles of lexical analysis, syntax analysis, intermediate code generation, code optimization, and code generation.

Uploaded by

bukaraisha99

UNIT 1 WHAT IS A COMPILER?

INTRODUCTION

In the previous unit you were taken through some basic concepts you
learnt in an earlier course. This was done because of their
relevance/importance to your understanding of this course.

In this unit you will be introduced to the concept of compilers and their
importance to programme development.

Now let us go through your study objectives for this unit.

1.0 OBJECTIVES

At the end of this unit, you should be able to:

• define compiler and its importance in the programming world

• distinguish between a translator, compiler and an interpreter
• discuss the major challenges to be faced in building compilers
• state the qualities of compilers
• mention some of the knowledge required for building compilers
• describe the architecture of a compiler.
2.0 MAIN CONTENT

2.1 Translators
A translator is a programme that takes as input a programme written in one programming
language (the source language) and produces as output a programme in another
language (the object or target language). If the source language is a high-level language
such as COBOL or Pascal and the object language is a low-level language such as
an assembly language or machine language, then the translator is called a compiler.
Executing a programme written in a high-level programming language is basically a
two-step process, as illustrated in Figure 1. The source programme must first be compiled,
that is, translated into an object programme. Then the resulting object programme is loaded
into memory and executed.

Figure 1: Compilation and Execution

Certain other translators transform a programming language into a simplified language,
called intermediate code, which can be directly executed using a programme called an
interpreter. You can think of the intermediate code as the machine language of an abstract
computer designed to execute the source code.

There are other important types of translators besides compilers. If the source language is
assembly language and the target language is machine language, then the translator is
called an assembler. The term preprocessor is used for translators that take programmes in
one high-level language into equivalent programmes in another high-level language. For
example, there are many FORTRAN preprocessors that map "structured" versions of
FORTRAN into conventional FORTRAN.

2.2 Why Do We Need Translators?

We need translators to overcome the rigour of programming in machine language, which
involves communicating directly with a computer in terms of bits, registers, and primitive
machine operations. As you have learnt in earlier courses in this programme, a machine
language programme is a sequence of 0's and 1's. Programming a complex
algorithm in such a language is therefore terribly tedious and prone to mistakes.

2.4 What is a Compiler?

A compiler is a programme that translates a source programme written in some high-level
programming language (such as Java) into machine code for some computer architecture
(such as the Intel Pentium architecture). The generated machine code can later be executed
many times against different data each time.
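
The "translate once, execute many times" idea can be sketched in Python. This is an analogy only, under a stated assumption: Python's built-in compile() produces byte code for the Python virtual machine, not machine code for a hardware architecture such as the Pentium. The names source, code_object and run are chosen here for illustration.

```python
# Analogy only: compile() yields Python byte code, not hardware machine code.
source = "result = price * quantity"

# Step 1: translate the source text once.
code_object = compile(source, "<example>", "exec")

# Step 2: execute the translated form many times, each time against different data.
def run(price, quantity):
    env = {"price": price, "quantity": quantity}
    exec(code_object, env)
    return env["result"]

totals = [run(p, q) for p, q in [(2, 3), (10, 5), (7, 0)]]
print(totals)  # [6, 50, 0]
```

The source text is analysed only in step 1; every later call to run() reuses the same translated object.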

An interpreter reads an executable source programme written in a high-level programming
language as well as data for this programme, and it runs the programme against the data to
produce some results. One example is the UNIX shell interpreter, which runs operating
system commands interactively.

You should note that both interpreters and compilers (like any other programme) are written
in some high-level programming language (which may be different from the language they
accept) and they are translated into machine code. For example, a Java interpreter can be
written entirely in C, or even in Java. The interpreter's source programme is machine
independent because it does not generate machine code. (Note the difference between
generating machine code and being translated into machine code.) A programme run by an
interpreter is generally slower than the same programme compiled to machine code, because
the interpreter processes and interprets each statement as many times as that statement is
evaluated. For example, when a for-loop is interpreted, the statements inside the loop body
are analysed and evaluated on every loop step. Some languages, such as Java and Lisp, come
with both an interpreter and a compiler. Java source programmes (Java classes with the .java
extension) are translated by the javac compiler into byte-code files (with the .class
extension). The Java interpreter, java, known as the Java Virtual Machine (JVM), may
interpret byte codes directly or may internally compile them to machine code and then
execute that code.
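
The cost of re-analysing a loop body on every step can be made visible by counting analyses. The sketch below is a simplification, not a model of any real interpreter: the helper names analyse, interpret_loop and compiled_loop are hypothetical, and Python's own ast.parse() and compile() stand in for the analysis work.

```python
import ast

stats = {"analyses": 0}  # how many times source text has been analysed

def analyse(statement):
    """Parse and translate one statement, counting each analysis."""
    stats["analyses"] += 1
    return compile(ast.parse(statement, mode="exec"), "<stmt>", "exec")

def interpret_loop(body, times, env):
    # Naive interpretation: the body is re-analysed on every loop step.
    for _ in range(times):
        exec(analyse(body), env)

def compiled_loop(body, times, env):
    # Compilation: the body is analysed once, then only executed.
    code = analyse(body)
    for _ in range(times):
        exec(code, env)

env = {"total": 0}
interpret_loop("total = total + 2", 100, env)
naive_count = stats["analyses"]

stats["analyses"] = 0
env2 = {"total": 0}
compiled_loop("total = total + 2", 100, env2)
compiled_count = stats["analyses"]

print(naive_count, compiled_count, env["total"], env2["total"])  # 100 1 200 200
```

Both loops compute the same result, but the naive version analyses the statement 100 times and the compiled version only once.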

As mentioned in section 2.1, compilers and interpreters are not the only examples of
translators. The table below lists a few more:

Table 1: Translators, Source Languages and Target Languages

Source Language        Translator                        Target Language
LaTeX                  text formatter                    PostScript
SQL                    database query optimizer          query evaluation plan
Java                   javac compiler                    Java byte code
Java                   cross-compiler                    C++ code
English text           natural language understanding    semantics (meaning)
regular expressions    JLex scanner generator            a scanner in Java
BNF of a language      CUP parser generator              a parser in Java

This course deals mainly with compilers for high-level programming languages, but the
same techniques apply to interpreters or to any other compilation scheme.
2.6 The Challenge in Compiler Development
There are various challenges involved in developing compilers; some of these are itemised
below:

1) Many variations:
a. many programming languages (e.g. FORTRAN, C++, Java)
b. many programming paradigms (e.g. object-oriented,
functional, logic)
c. many computer architectures (e.g. MIPS, SPARC, Intel, alpha)
d. many operating systems (e.g. Linux, Solaris, Windows)
2) Qualities of a compiler: these concern the qualities that a compiler must possess in order
to be effective and useful. They are listed below in order of importance:
a. the compiler itself must be bug-free
b. it must generate correct machine code
c. the generated machine code must run fast

d. the compiler itself must run fast (compilation time must be proportional to programme size)
e. the compiler must be portable (i.e. modular, supporting separate compilation)
f. it must print good diagnostics and error messages
g. the generated code must work well with existing debuggers
h. must have consistent and predictable optimisation.
3) In-depth knowledge:
Building a compiler requires in-depth knowledge of:
a. programming languages (parameter passing, variable scoping, memory allocation,
etc.)
b. theory (automata, context-free languages, etc.)
c. algorithms and data structures (hash tables, graph algorithms, dynamic
programming, etc.)
d. computer architecture (assembly programming)
e. software engineering.

2.7 Compiler Architecture

As earlier mentioned, a compiler can be viewed as a programme that accepts a source code
(such as a Java programme) and generates machine code for some computer architecture.
Suppose that you want to build compilers for n programming languages (e.g. FORTRAN,
C, C++, Java, BASIC) and you want these compilers to run on m different
architectures (e.g. MIPS, SPARC, Intel, Alpha). If you do that naively, you need to
write n*m compilers, one for each language-architecture combination.

The holy grail of portability in compilers is to do the same thing by writing only n + m
programmes. You can do this by using a universal Intermediate Representation (IR)
and making the compiler a two-phase compiler. An IR is typically a tree-like data
structure that captures the basic features of most computer architectures. One example of an
IR tree node is a representation of a 3-address instruction, such as d ← s1 + s2, that takes two
source addresses, s1 and s2 (i.e. two IR trees), and produces one destination address, d. The
first phase of this compilation scheme, called the front-end, maps the source code into IR,
and the second phase, called the back-end, maps IR into machine code. That way, for each
programming language you want to compile, you write one front-end only, and for each
computer architecture, you write one back-end. In total, you have n + m components.

In practice, this ideal separation of compilation into two phases does not work very well for
real programming languages and architectures. For it to work, you must encode all knowledge
about the source programming language in the front end, you must handle all machine
architecture features in the back end, and you must design your IRs in such a way that all
language and machine features are captured properly.
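
A minimal sketch of the n + m arrangement follows. Everything in it is hypothetical: the two toy source notations ("infix" and "prefix"), the two toy targets ("register" and "stack"), and the mnemonics were invented for illustration. Only the shape matters: each front end produces the shared 3-address IR, and each back end consumes it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ThreeAddress:
    """One IR instruction of the form dest <- src1 op src2."""
    dest: str
    op: str
    src1: str
    src2: str

# Hypothetical front ends: one per source notation, each producing IR.
def front_end_infix(text):       # e.g. "d = s1 + s2"
    dest, _, s1, op, s2 = text.split()
    return [ThreeAddress(dest, op, s1, s2)]

def front_end_prefix(text):      # e.g. "(set d (+ s1 s2))"
    _, dest, op, s1, s2 = text.replace("(", " ").replace(")", " ").split()
    return [ThreeAddress(dest, op, s1, s2)]

# Hypothetical back ends: one per target, each consuming IR.
MNEMONICS = {"+": "add", "-": "sub", "*": "mul"}

def back_end_register(ir):
    return [f"{MNEMONICS[i.op]} {i.dest}, {i.src1}, {i.src2}" for i in ir]

def back_end_stack(ir):
    out = []
    for i in ir:
        out += [f"push {i.src1}", f"push {i.src2}", MNEMONICS[i.op], f"pop {i.dest}"]
    return out

FRONT_ENDS = {"infix": front_end_infix, "prefix": front_end_prefix}
BACK_ENDS = {"register": back_end_register, "stack": back_end_stack}

def compile_with(language, target, text):
    """Any front end can be paired with any back end through the IR."""
    return BACK_ENDS[target](FRONT_ENDS[language](text))

print(compile_with("prefix", "stack", "(set d (+ s1 s2))"))
# ['push s1', 'push s2', 'add', 'pop d']
```

Adding a third source notation here means writing one new front end, after which it works with every existing back end.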

A typical real-world compiler usually has multiple phases (these will be treated in greater
detail in Unit 3 of this module). This increases the compiler's portability and simplifies
retargeting. The front end consists of the following phases:
• scanning: a scanner groups input characters into tokens
• parsing: a parser recognises sequences of tokens according to some grammar and generates
Abstract Syntax Trees (ASTs)
• semantic analysis: performs type checking (i.e. checking whether the variables, functions
etc. in the source programme are used consistently with their definitions and with
the language semantics) and translates ASTs into IRs
• optimisation: optimises IRs.
The back end consists of the following phases:
• instruction selection: maps IRs into assembly code
• code optimisation: optimises the assembly code using control-flow and data-flow
analyses, register allocation, etc.
• code emission: generates machine code from assembly code.
The generated machine code is written in an object file. This file is not executable since it
may refer to external symbols (such as system calls). The operating system provides the
following utilities to execute the code:

• linking: A linker takes several object files and libraries as input and produces one
executable object file. It retrieves from the input files the code of all the referenced
functions/procedures, puts them together in the executable object file, and resolves all
external references to real addresses. The libraries include the operating system libraries,
the language-specific libraries, and, maybe, user-created libraries.
• loading: A loader loads an executable object file into memory, initialises the registers, heap,
data, etc. and starts the execution of the programme.
• Relocatable shared libraries allow effective memory use when many different applications
share the same code.
UNIT 2 THE STRUCTURE OF A COMPILER

We can identify four components

1. Front end
2. Back-end
3. Tables of information
4. Runtime library
i) Front-End: the front-end is responsible for the analysis of the structure and meaning of the
source text. This end is usually the analysis part of the compiler. Here we have the syntactic
analyser, semantic analyser, and lexical analyser. This part has been automated.
ii) Back-End: The back-end is responsible for generating the target language. Here we have
intermediate code optimiser, code generator and code optimiser. This part has been
automated.
iii) Tables of Information: It includes the symbol-table and there are some other tables that
provide information during compilation process.
iv) Run-Time Library: It is used for run-time system support.

Languages for Writing Compiler

a. Machine language
b. Assembly language
c. High level language or high level language with bootstrapping facilities for flexibility
and transporting.

3.0 Phases of a Compiler

A compiler takes as input a source programme and produces as output an equivalent
sequence of machine instructions. This process is so complex that it is not reasonable, either
from a logical point of view or from an implementation point of view, to consider the
compilation process as occurring in one single step. For this reason, it is customary to
partition the compilation process into a series of sub-processes called phases, as shown in
Fig. 1 below. A phase is a logically cohesive operation that takes as input one
representation of the source programme and produces as output another representation.
3.0.1 The Lexical Analyser
This is the first phase and it is also referred to as the Scanner. It separates characters of the
source language into groups that logically belong together; these groups are called tokens.
The usual tokens are keywords, such as DO or IF, identifiers such as X or NUM, operator
symbols such as <= or +, and punctuation symbols such as parentheses or commas. The
output of the lexical analyser is a stream of tokens, which is passed to the next phase, the
syntax analyser or parser. The tokens in this stream can be represented by codes which we
may regard as integers. Thus DO might be represented by 1, + by 2, and "identifier" by 3.
In the case of a token like "identifier", a second quantity, telling which of the identifiers
used by the programme is represented by this instance of token "identifier", is passed along
with the integer code for "identifier". For example, in the FORTRAN statement:
IF (5 .EQ. MAX) GOTO 100
we find the following eight tokens: IF; (; 5; .EQ.; MAX; ); GOTO; 100.
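
The grouping of characters into tokens can be sketched with regular expressions. This is a minimal illustration, assuming a tiny subset of FORTRAN: the token class names (KEYWORD, RELOP and so on) and the tokenize function are chosen here, and only the keywords IF and GOTO (written either as GOTO or GO TO) are recognised.

```python
import re

# Token classes for a tiny FORTRAN subset; the names are illustrative.
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:IF|GO\s*TO)\b"),
    ("RELOP",      r"\.(?:EQ|NE|LT|LE|GT|GE)\."),
    ("NUMBER",     r"\d+"),
    ("IDENTIFIER", r"[A-Z][A-Z0-9]*"),
    ("PUNCT",      r"[()]"),
    ("SKIP",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (token class, text) pairs, dropping white space."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind = match.lastgroup
        if kind != "SKIP":
            # Remove embedded blanks so that "GO TO" and "GOTO" give the same token.
            tokens.append((kind, re.sub(r"\s+", "", match.group())))
        pos = match.end()
    return tokens

for token in tokenize("IF (5 .EQ. MAX) GOTO 100"):
    print(token)
```

Run on the statement above, this yields eight tokens. In a fuller scanner each class name would map to an integer code, and an identifier such as MAX would also carry an index into the symbol table.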

[Figure: Source Programme -> Lexical Analysis -> Syntax Analysis -> Intermediate Code
Generation -> Code Optimisation -> Code Generation -> Target Programme. Table
Management and Error Handling interact with every phase.]

The Syntax Analyser

This groups tokens together into syntactic structures. For example, the three tokens
representing A+B might be grouped into a syntactic structure called an expression.
Expressions might further be combined to form statements. Often the syntactic structure
can be regarded as a tree whose leaves are the tokens. The interior nodes of the tree
represent strings of tokens that logically belong together. The parser has two functions. It
checks that the tokens appearing in its input, which is the output of the lexical analyser,
occur in patterns that are permitted by the specification for the source language. It also
imposes on the tokens a tree-like structure that is used by the subsequent phases of the
compiler.
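
The two functions of the parser can be sketched for the A+B example. This is a minimal sketch under an assumed toy grammar, expression := operand ('+' operand)*, with tokens given as plain strings and the tree built from nested tuples; the function name parse_expression is chosen here.

```python
def parse_expression(tokens):
    """Check the token pattern and return a tree whose leaves are the tokens."""
    pos = 0

    def operand():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isalnum():
            found = tokens[pos] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected an operand, found {found!r}")
        leaf = tokens[pos]
        pos += 1
        return leaf

    tree = operand()
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        tree = ("+", tree, operand())   # interior node grouping three parts
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse_expression(["A", "+", "B"]))            # ('+', 'A', 'B')
print(parse_expression(["A", "+", "B", "+", "C"]))  # ('+', ('+', 'A', 'B'), 'C')
```

A token sequence that does not fit the grammar, such as ["A", "+"], raises SyntaxError. That is the checking role; the returned tuple is the tree-building role.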
3.0.2 The Intermediate Code Generator
This uses the structure produced by the syntax analyser to create a stream of simple
instructions. Many styles of intermediate code are possible. One common style uses
instructions with one operator and a small number of operands. These instructions can be
viewed as simple macros like the macro ADD2. The primary difference between
intermediate code and assembly code is that the intermediate code need not specify the
registers to be used for each operation.
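
One way to produce such instructions is to walk the syntax tree and give every interior node a fresh temporary name. The sketch below assumes the tree is a nested tuple (operator, left, right) with string leaves; the function name generate and the temporaries t1, t2, ... are illustrative choices. Note that no registers appear anywhere in the output.

```python
import itertools

def generate(tree):
    """Flatten a nested-tuple tree into (instructions, name holding the result)."""
    counter = itertools.count(1)
    code = []

    def walk(node):
        if isinstance(node, str):        # leaf: a variable name or constant
            return node
        op, left, right = node
        left_name = walk(left)
        right_name = walk(right)
        temp = f"t{next(counter)}"       # fresh temporary, not a register
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = walk(tree)
    return code, result

code, result = generate(("+", ("*", "A", "B"), "C"))
print(code)    # ['t1 = A * B', 't2 = t1 + C']
print(result)  # t2
```

Each emitted line has one operator and at most three operands, which is the style described above.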

3.0.3 Code Optimisation

This is an optional phase designed to improve the intermediate code so that the ultimate
object programme runs faster and/or takes less space. Its output is another intermediate
code programme that does the same job as the original, but perhaps in a way that saves
time and/or space.
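
Constant folding is one simple example of such an improvement: an operation whose operands are both known constants is evaluated at compile time. The sketch assumes a tree-shaped intermediate form (nested tuples with integer or string leaves); fold_constants and FOLDABLE are names chosen here, and real optimisers apply many other transformations.

```python
import operator

FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(node):
    """Replace operator nodes whose operands are both integer constants."""
    if not isinstance(node, tuple):
        return node                      # leaf: integer constant or variable name
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, int) and isinstance(right, int) and op in FOLDABLE:
        return FOLDABLE[op](left, right)
    return (op, left, right)

print(fold_constants(("+", "X", ("*", 60, 60))))  # ('+', 'X', 3600)
```

The output is still intermediate code that does the same job, but the multiplication no longer has to be performed each time the object programme runs.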
3.0.4 Code Generation
This is the final phase and it produces the object code by deciding on the memory locations
for data, selecting code to access each datum, and selecting the registers in which each
computation is to be done. Designing a code generator that produces truly efficient object
programmes is one of the most difficult parts of a compiler design, both practically and
theoretically.
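
A deliberately naive code generator shows the decisions involved. The target here is a hypothetical load/store machine: the mnemonics LOAD, STORE, ADD, SUB and MUL and the registers R1 and R2 were invented for this sketch, and the input is assumed to be lines of the form "d = a op b".

```python
def generate_code(instructions, registers=("R1", "R2")):
    """Translate 'd = a op b' lines for a hypothetical load/store machine."""
    mnemonic = {"+": "ADD", "-": "SUB", "*": "MUL"}
    first, second = registers
    output = []
    for line in instructions:
        dest, _, left, op, right = line.split()
        output += [
            f"LOAD {first}, {left}",           # select code to access each datum
            f"LOAD {second}, {right}",
            f"{mnemonic[op]} {first}, {second}",  # computation done in registers
            f"STORE {first}, {dest}",          # result goes back to memory
        ]
    return output

for line in generate_code(["t1 = A * B", "t2 = t1 + C"]):
    print(line)
```

This version stores t1 to memory and immediately loads it again, which a better generator would avoid by keeping the value in a register. Making such choices well is what makes this phase difficult.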
The Table Management or Bookkeeping
This portion of the compiler keeps track of the names used by the programme and records
essential information about each, such as its type (integer, real, etc.). The data structure
used to record this information is called a symbol table.
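
A symbol table can be sketched as a dictionary keyed by name. The class below is a minimal illustration with hypothetical method names (declare and lookup); it records only the type of each name and ignores scoping.

```python
class SymbolTable:
    """Records essential information about each name used by the programme."""

    def __init__(self):
        self._entries = {}

    def declare(self, name, type_):
        if name in self._entries:
            raise KeyError(f"{name!r} is already declared")
        self._entries[name] = {"type": type_}

    def lookup(self, name):
        # Returns None for a name that has not been declared.
        return self._entries.get(name)

table = SymbolTable()
table.declare("MAX", "integer")
table.declare("RATE", "real")
print(table.lookup("MAX"))   # {'type': 'integer'}
print(table.lookup("X"))     # None
```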

The Error Handler

This is invoked when a flaw in the source programme is detected. It must warn the
programmer by issuing a diagnostic, and adjust the information being passed from phase to
phase so that each phase can proceed. It is desirable that compilation be completed on
flawed programmes, at least through the syntax-analysis phase, so that as many errors as
possible can be detected in one compilation. Both the table management and error handling
routines interact with all phases of the compiler.
Fig. 1: Phases of a Compiler
