UNIT 1 AND 2

This document introduces the concept of compilers, their significance in programming, and the various components involved in their development. It outlines the differences between compilers, interpreters, and other types of translators, as well as the challenges faced in compiler construction. Additionally, it details the architecture of compilers, including the front-end and back-end phases, and the roles of lexical analysis, syntax analysis, intermediate code generation, code optimization, and code generation.

Uploaded by

bukaraisha99
Copyright
© © All Rights Reserved

UNIT 1 WHAT IS A COMPILER?

INTRODUCTION

In the previous unit you were taken through some basic concepts you
learnt in an earlier course. This was done because of their
relevance/importance to your understanding of this course.

In this unit you will be introduced to the concept of compilers and their
importance to programme development.

Now let us go through your study objectives for this unit.

1.0 OBJECTIVES

At the end of this unit, you should be able to:

• define a compiler and explain its importance in the programming world
• distinguish between a translator, a compiler and an interpreter
• discuss the major challenges to be faced in building compilers
• state the qualities of compilers
• mention some of the knowledge required for building compilers
• describe the architecture of a compiler.
2.0 MAIN CONTENT

2.1 Translators
A translator is a programme that takes as input a programme written in one programming
language (the source language) and produces as output a programme in another
language (the object or target language). If the source language is a high-level language
such as COBOL or Pascal and the object language is a low-level language such as
an assembly language or machine language, then the translator is called a compiler.
Executing a programme written in a high-level programming language is basically a
two-step process, as illustrated in Figure 1. The source programme must first be compiled,
that is, translated into an object programme. Then the resulting object programme is loaded
into memory and executed.

Compilation and Execution


Certain other translators transform a programming language into a simplified language,
called intermediate code, which can be directly executed using a programme called an
interpreter. You can think of the intermediate code as the machine language of an abstract
computer designed to execute the source code.

There are other important types of translators besides compilers. If the source language is
assembly language and the target language is machine language, then the translator is
called an assembler. The term preprocessor is used for translators that take programmes in
one high-level language into equivalent programmes in another high-level language. For
example, there are many FORTRAN preprocessors that map 'structured' versions of
FORTRAN into conventional FORTRAN.

2.2 Why Do We Need Translators?


We need translators to overcome the rigour of programming in machine language, which
involves communicating directly with a computer in terms of bits, registers, and primitive
machine operations. As you have learnt in earlier courses in this programme, a machine
language programme is a sequence of 0's and 1's; programming a complex
algorithm in such a language is therefore terribly tedious and prone to mistakes.

2.4 What is a Compiler?


A compiler is a programme that translates a source programme written in some high-level
programming language (such as Java) into machine code for some computer architecture
(such as the Intel Pentium architecture). The generated machine code can later be executed
many times against different data each time.

An interpreter reads an executable source programme written in a high-level programming
language as well as data for this programme, and it runs the programme against the data to
produce some results. One example is the UNIX shell interpreter, which runs operating
system commands interactively.
You should note that both interpreters and compilers (like any other programme) are written
in some high-level programming language (which may be different from the language they
accept) and they are translated into machine code. For example, a Java interpreter can be
written entirely in C, or even in Java. The source programme of the interpreter is machine
independent, since it does not generate machine code. (Note the difference between
generating machine code and being translated into machine code.) Running a programme
under an interpreter is generally slower than running compiled code, because the interpreter
processes and interprets each statement in a programme as many times as that statement is
evaluated. For example, when a for-loop is interpreted, the statements inside the for-loop
body are analysed and evaluated on every loop step. Some languages, such as Java and Lisp,
come with both an interpreter and a compiler. Java source programmes (Java classes with
.java extension) are translated by the javac compiler into byte-code files (with .class
extension). The Java interpreter, java, called the Java Virtual Machine (JVM), may actually
interpret byte codes directly or may internally compile them to machine code and then
execute that code.
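
The cost of re-analysing a loop body can be illustrated with a minimal sketch. It assumes a toy language whose only statement has the form "dest = a + b"; the names analyse, interpret_loop, compile_loop and analysis_calls are hypothetical and exist only for this illustration. The interpreter-style function analyses the statement on every loop step, while the compiler-style function analyses it once and returns code that only executes.

```python
# Toy illustration only: every statement has the fixed form "dest = a + b".
analysis_calls = {"count": 0}  # hypothetical counter of how often we analyse


def analyse(statement):
    """Split 'dest = a + b' into its three names (the 'analysis' work)."""
    analysis_calls["count"] += 1
    dest, expr = statement.split("=")
    left, right = expr.split("+")
    return dest.strip(), left.strip(), right.strip()


def interpret_loop(statement, env, steps):
    """Interpreter style: the loop body is analysed on every loop step."""
    for _ in range(steps):
        dest, left, right = analyse(statement)
        env[dest] = env[left] + env[right]


def compile_loop(statement):
    """Compiler style: analyse once, then return code that only executes."""
    dest, left, right = analyse(statement)

    def run(env, steps):
        for _ in range(steps):
            env[dest] = env[left] + env[right]

    return run
```

Running interpret_loop("x = x + one", env, 5) calls analyse five times, whereas compile_loop("x = x + one") calls it once no matter how many steps the returned function later runs.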

As was mentioned in Section 2.1, compilers and interpreters are not the only examples of
translators. The table below lists a few more:

Table 1: Translators, Source Languages and Target Languages

Source Language        Translator                        Target Language
LaTeX                  Text Formatter                    PostScript
SQL                    Database query optimizer          Query Evaluation Plan
Java                   javac compiler                    Java byte code
Java                   cross-compiler                    C++ code
English text           Natural Language Understanding    semantics (meaning)
Regular Expressions    JLex scanner generator            a scanner in Java
BNF of a language      CUP parser generator              a parser in Java

This course deals mainly with compilers for high-level programming languages, but the
same techniques apply to interpreters or to any other compilation scheme.
2.6 The Challenge in Compiler Development
There are various challenges involved in developing compilers; some of these are itemised
below:

1) Many variations:
a. many programming languages (e.g. FORTRAN, C++, Java)
b. many programming paradigms (e.g. object-oriented,
functional, logic)
c. many computer architectures (e.g. MIPS, SPARC, Intel, alpha)
d. many operating systems (e.g. Linux, Solaris, Windows)
2) Qualities of a compiler: these are the qualities that a compiler must possess in order
to be effective and useful. They are listed below in order of importance:
a. the compiler itself must be bug-free
b. it must generate correct machine code
c. the generated machine code must run fast
d. the compiler itself must run fast (compilation time must be proportional to programme size)
e. the compiler must be portable (i.e. modular, supporting separate compilation)
f. it must print good diagnostics and error messages
g. the generated code must work well with existing debuggers
h. it must have consistent and predictable optimisation.
3) In-depth knowledge:
Building a compiler requires in-depth knowledge of:
a. programming languages (parameter passing, variable scoping, memory allocation,
etc.)
b. theory (automata, context-free languages, etc.)
c. algorithms and data structures (hash tables, graph algorithms, dynamic
programming, etc.)
d. computer architecture (assembly programming)
e. software engineering.

2.7 Compiler Architecture


As mentioned earlier, a compiler can be viewed as a programme that accepts source code
(such as a Java programme) and generates machine code for some computer architecture.
Suppose that you want to build compilers for n programming languages (e.g. FORTRAN,
C, C++, Java, BASIC, etc.) and you want these compilers to run on m different
architectures (e.g. MIPS, SPARC, Intel, Alpha, etc.). If you do that naively, you need to
write n * m compilers, one for each language-architecture combination.

The holy grail of portability in compilers is to do the same thing by writing only n + m
programmes. You can do this by using a universal Intermediate Representation (IR)
and making the compiler a two-phase compiler. An IR is typically a tree-like data
structure that captures the basic features of most computer architectures. One example of an
IR tree node is a representation of a 3-address instruction, such as d ← s1 + s2, that takes two
source addresses, s1 and s2 (i.e. two IR trees), and produces one destination address, d. The
first phase of this compilation scheme, called the front-end, maps the source code into IR,
and the second phase, called the back-end, maps IR into machine code. That way, for each
programming language you want to compile, you write one front-end only, and for each
computer architecture, you write one back-end. So, in total, you have n + m components.
In practice, however, this ideal separation of compilation into two phases does not work very
well for real programming languages and architectures. For it to work, you must encode all
knowledge about the source programming language in the front end, you must handle all
machine architecture features in the back end, and you must design your IRs in such a way
that all language and machine features are captured properly.
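
The n + m idea can be sketched as follows. This is a minimal illustration under stated assumptions, not a real compiler: the IR is assumed to be a list of 3-address tuples (dest, op, src1, src2), and the two toy source notations, the two toy target machines and all function names are hypothetical.

```python
# Hypothetical toy IR: a list of 3-address tuples (dest, op, src1, src2).
OPCODES = {"+": "ADD", "*": "MUL"}


def front_end_infix(source):
    """Toy front-end for statements such as 'd = s1 + s2'."""
    dest, expr = [part.strip() for part in source.split("=")]
    src1, op, src2 = expr.split()
    return [(dest, op, src1, src2)]


def front_end_prefix(source):
    """Toy front-end for statements such as '(set d (+ s1 s2))'."""
    tokens = source.replace("(", " ").replace(")", " ").split()
    _, dest, op, src1, src2 = tokens
    return [(dest, op, src1, src2)]


def back_end_stack(ir):
    """Toy back-end for an imaginary stack machine."""
    out = []
    for dest, op, src1, src2 in ir:
        out += [f"PUSH {src1}", f"PUSH {src2}", OPCODES[op], f"POP {dest}"]
    return out


def back_end_register(ir):
    """Toy back-end for an imaginary two-register machine."""
    out = []
    for dest, op, src1, src2 in ir:
        out += [f"LOAD R1, {src1}", f"LOAD R2, {src2}",
                f"{OPCODES[op]} R1, R2", f"STORE {dest}, R1"]
    return out


front_ends = {"infix": front_end_infix, "prefix": front_end_prefix}    # n = 2
back_ends = {"stack": back_end_stack, "register": back_end_register}  # m = 2


def compile_with(language, target, source):
    """Any front-end can be paired with any back-end through the shared IR."""
    return back_ends[target](front_ends[language](source))
```

Here four components (n + m = 2 + 2) cover all four language-architecture pairs (n * m = 2 * 2). The saving grows with n and m, but only as long as the shared IR can express everything each language and each machine needs.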

A typical real-world compiler usually has multiple phases (these will be treated in greater
detail in unit 3 of this module). This increases the compiler's portability and simplifies
retargeting. The front end consists of the following phases:
• scanning: a scanner groups input characters into tokens
• parsing: a parser recognises sequences of tokens according to some grammar and generates
Abstract Syntax Trees (ASTs)
• semantic analysis: performs type checking (i.e. checking whether the variables, functions
etc. in the source programme are used consistently with their definitions and with
the language semantics) and translates ASTs into IRs
• optimisation: optimises IRs.
The back end consists of the following phases:
• instruction selection: maps IRs into assembly code
• code optimisation: optimises the assembly code using control-flow and data-flow
analyses, register allocation, etc.
• code emission: generates machine code from assembly code.
The generated machine code is written in an object file. This file is not executable since it
may refer to external symbols (such as system calls). The operating system provides the
following utilities to execute the code:

• linking: A linker takes several object files and libraries as input and produces one
executable object file. It retrieves from the input files the code of all the referenced
functions/procedures, puts them together in the executable object file, and resolves all
external references to real addresses. The libraries include the operating system libraries,
the language-specific libraries, and, maybe, user-created libraries.
• loading: A loader loads an executable object file into memory, initialises the registers, heap,
data, etc. and starts the execution of the programme.
• Relocatable shared libraries allow effective memory use when many different applications
share the same code.
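
The way a linker resolves external references can be sketched as follows. This is a toy model under loud assumptions: each object file is represented by a hypothetical dictionary with a size, the symbols it defines (as offsets inside the file) and the symbols it references. Real object file formats are far richer.

```python
def link(object_files, base_address=0):
    """Lay the object files out one after another and resolve all symbols.

    Each object file is a hypothetical dict:
      {"size": int, "defines": {symbol: offset}, "references": [symbol, ...]}
    Returns a dict mapping every defined symbol to its final address.
    """
    addresses = {}
    offset = base_address
    for obj in object_files:
        for symbol, local_offset in obj["defines"].items():
            if symbol in addresses:
                raise ValueError(f"duplicate definition of {symbol!r}")
            addresses[symbol] = offset + local_offset
        offset += obj["size"]

    # Every referenced symbol must be defined by one of the inputs.
    referenced = {ref for obj in object_files for ref in obj["references"]}
    unresolved = sorted(referenced - set(addresses))
    if unresolved:
        raise ValueError(f"unresolved external symbols: {unresolved}")
    return addresses
```

For example, linking a file of size 16 that defines main at offset 0 and references printf with a library of size 32 that defines printf at offset 4 places printf at address 20. Leaving the library out raises an error about the unresolved symbol, which is why an object file on its own is not executable.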
UNIT 2 THE STRUCTURE OF A COMPILER

We can identify four components:

1. Front-end
2. Back-end
3. Tables of information
4. Run-time library
i) Front-End: The front-end is responsible for the analysis of the structure and meaning of the
source text. This end is usually the analysis part of the compiler. Here we have the lexical
analyser, syntactic analyser, and semantic analyser. This part has been automated.
ii) Back-End: The back-end is responsible for generating the target language. Here we have
the intermediate code optimiser, code generator and code optimiser. This part has been
automated.
iii) Tables of Information: These include the symbol table and some other tables that
provide information during the compilation process.
iv) Run-Time Library: It is used for run-time system support.

Languages for Writing Compilers

a. Machine language
b. Assembly language
c. High-level language, or high-level language with bootstrapping facilities, for flexibility
and portability.

3.0 Phases of a Compiler


A compiler takes as input a source programme and produces as output an equivalent
sequence of machine instructions. This process is so complex that it is not reasonable, either
from a logical point of view or from an implementation point of view, to consider the
compilation process as occurring in one single step. For this reason, it is customary to
partition the compilation process into a series of sub-processes called phases, as shown in
Fig. 1 below. A phase is a logically cohesive operation that takes as input one
representation of the source programme and produces as output another representation.
3.0.1 The Lexical Analyser
This is the first phase and it is also referred to as the scanner. It separates characters of the
source language into groups that logically belong together; these groups are called tokens.
The usual tokens are keywords, such as DO or IF, identifiers, such as X or NUM, operator
symbols, such as <= or +, and punctuation symbols, such as parentheses or commas. The
output of the lexical analyser is a stream of tokens, which is passed to the next phase, the
syntax analyser or parser. The tokens in this stream can be represented by codes which we
may regard as integers. Thus DO might be represented by 1, + by 2, and “identifier” by 3.
In the case of a token like “identifier”, a second quantity, telling which of the identifiers
used by the programme is represented by this instance of the token “identifier”, is passed
along with the integer code for “identifier”. For example, in the FORTRAN statement:
IF (5 .EQ. MAX) GO TO 100
we find the following eight tokens: IF; (; 5; .EQ.; MAX; ); GOTO; 100.
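
A scanner for this one statement can be sketched with regular expressions. This is a minimal sketch under stated assumptions: the token category names and patterns below are hypothetical, they cover only what the example needs, and the sketch returns (category, text) pairs instead of the integer codes described above. It also treats "GO TO" and "GOTO" as the same keyword.

```python
import re

# Hypothetical token categories; order matters (keywords before identifiers).
TOKEN_SPEC = [
    ("KEYWORD", r"(?:IF|DO|GO\s*TO)\b"),
    ("RELOP", r"\.(?:EQ|NE|LT|LE|GT|GE)\."),
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Z][A-Z0-9]*"),
    ("PUNCTUATION", r"[(),]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Group the characters of `text` into (category, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            # Drop blanks inside a lexeme so that 'GO TO' becomes 'GOTO'.
            lexeme = re.sub(r"\s+", "", match.group())
            tokens.append((match.lastgroup, lexeme))
        pos = match.end()
    return tokens


tokens = tokenize("IF (5 .EQ. MAX) GO TO 100")
```

The resulting list holds the eight tokens named in the text, in the same order, each paired with its category. A fuller scanner would also enter MAX into the symbol table and pass its table index along with the identifier code.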

Phases of a compiler (figure): Source Programme, Lexical Analysis, Syntax Analysis,
Intermediate Code Generation, Code Optimisation, Code Generation, Target Programme, with
Table Management and Error Handling alongside all phases.

The Syntax Analyser


This groups tokens together into syntactic structures. For example, the three tokens
representing A+B might be grouped into a syntactic structure called an expression.
Expressions might further be combined to form statements. Often the syntactic structure
can be regarded as a tree whose leaves are the tokens. The interior nodes of the tree
represent strings of tokens that logically belong together. The parser has two functions. It
checks that the tokens appearing in its input, which is the output of the lexical analyser,
occur in patterns that are permitted by the specification for the source language. It also
imposes on the tokens a tree-like structure that is used by the subsequent phases of the
compiler.
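
Both functions of the parser can be sketched with a small recursive-descent parser. The grammar below (expressions built from identifiers, +, * and parentheses) and the nested-tuple tree format are assumptions made for this illustration only; the tokens are taken to be plain strings already produced by a scanner.

```python
def parse_expression(tokens):
    """expression := term ('+' term)*"""
    tree, rest = parse_term(tokens)
    while rest and rest[0] == "+":
        right, rest = parse_term(rest[1:])
        tree = ("+", tree, right)
    return tree, rest


def parse_term(tokens):
    """term := factor ('*' factor)*"""
    tree, rest = parse_factor(tokens)
    while rest and rest[0] == "*":
        right, rest = parse_factor(rest[1:])
        tree = ("*", tree, right)
    return tree, rest


def parse_factor(tokens):
    """factor := identifier | '(' expression ')'"""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    head = tokens[0]
    if head == "(":
        tree, rest = parse_expression(tokens[1:])
        if not rest or rest[0] != ")":
            raise SyntaxError("expected ')'")
        return tree, rest[1:]
    if head.isalnum():
        return head, tokens[1:]  # a leaf of the tree is the token itself
    raise SyntaxError(f"unexpected token {head!r}")


def parse(tokens):
    """Check the token pattern and return the tree imposed on the tokens."""
    tree, rest = parse_expression(tokens)
    if rest:
        raise SyntaxError(f"unexpected token {rest[0]!r}")
    return tree
```

With this sketch, parse(["A", "+", "B"]) returns the tree ("+", "A", "B"), whose leaves are the tokens A and B, while an ill-formed input such as ["A", "+"] is rejected with a SyntaxError.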
3.0.2 The Intermediate Code Generator
This uses the structure produced by the syntax analyser to create a stream of simple
instructions. Many styles of intermediate code are possible. One common style uses
instructions with one operator and a small number of operands. These instructions can be
viewed as simple macros, like the macro ADD2. The primary difference between intermediate
code and assembly code is that the intermediate code need not specify the registers to be
used for each operation.
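
One way to produce such instructions is to walk the syntax tree from the leaves upward. The sketch below assumes the tree is a nested tuple (operator, left, right) with strings as leaves, and it emits hypothetical instructions of the form (operator, operand1, operand2, result). The temporary names T1, T2, ... are placeholders, and no registers are mentioned anywhere.

```python
import itertools


def generate(tree):
    """Return (instructions, name_holding_result) for an expression tree.

    A tree is either a string (an identifier) or a tuple (op, left, right).
    """
    counter = itertools.count(1)
    code = []

    def walk(node):
        if isinstance(node, str):
            return node  # a leaf needs no instruction
        op, left, right = node
        left_name = walk(left)
        right_name = walk(right)
        temp = f"T{next(counter)}"  # fresh temporary for this result
        code.append((op, left_name, right_name, temp))
        return temp

    result = walk(tree)
    return code, result


code, result = generate(("+", "A", ("*", "B", "C")))
```

For the tree of A + B * C this yields two instructions: ("*", "B", "C", "T1") followed by ("+", "A", "T1", "T2"), with the final value held in T2.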

3.0.3 Code Optimisation


This is an optional phase designed to improve the intermediate code so that the ultimate
object programme runs faster and/or takes less space. Its output is another intermediate
code programme that does the same job as the original, but perhaps in a way that saves
time and/or space.
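
Constant folding is one simple example of such an improvement. The sketch below is an illustration under stated assumptions, not a description of any particular optimiser: intermediate code is taken to be a list of hypothetical (operator, operand1, operand2, result) tuples, integer operands are compile-time constants, and a folded temporary is assumed to be used only by later instructions in the same list.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(code):
    """Evaluate operations on constants at compile time.

    Input and output are lists of (op, operand1, operand2, result) tuples.
    """
    known = {}      # names whose value is known at compile time
    optimised = []
    for op, a, b, dest in code:
        a = known.get(a, a)  # substitute constants found earlier
        b = known.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            known[dest] = OPS[op](a, b)  # no instruction needed at run time
        else:
            known.pop(dest, None)        # dest is no longer a known constant
            optimised.append((op, a, b, dest))
    return optimised
```

For instance, the two instructions ("*", 2, 3, "T1") and ("+", "X", "T1", "T2") are reduced to the single instruction ("+", "X", 6, "T2"), which does the same job with one operation fewer.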
3.0.4 Code Generation
This is the final phase and it produces the object code by deciding on the memory locations
for data, selecting code to access each datum, and selecting the registers in which each
computation is to be done. Designing a code generator that produces truly efficient object
programmes is one of the most difficult parts of compiler design, both practically and
theoretically.
The Table Management or Bookkeeping
This portion of the compiler keeps track of the names used by the programme and records
essential information about each, such as its type (integer, real, etc.). The data structure
used to record this information is called a symbol table.
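
A symbol table can be sketched as a mapping from names to records. The class below is a minimal illustration; its name, its methods and the fields it stores are assumptions chosen for this example, and a real compiler would also record details such as scope and storage location.

```python
class SymbolTable:
    """Toy symbol table: one record per name used by the programme."""

    def __init__(self):
        self._entries = {}

    def declare(self, name, type_):
        """Record a new name with its type and return its table index."""
        if name in self._entries:
            raise KeyError(f"{name!r} is already declared")
        index = len(self._entries)
        self._entries[name] = {"type": type_, "index": index}
        return index

    def lookup(self, name):
        """Return the record for `name`, or None if it was never declared."""
        return self._entries.get(name)


table = SymbolTable()
table.declare("MAX", "integer")
table.declare("X", "real")
```

The index returned by declare could serve as the second quantity that the lexical analyser passes along with the code for the token “identifier”, and later phases can call lookup to find the type of a name.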

The Error Handler


This is invoked when a flaw in the source programme is detected. It must warn the
programmer by issuing a diagnostic, and adjust the information being passed from phase to
phase so that each phase can proceed. It is desirable that compilation be completed on
flawed programmes, at least through the syntax-analysis phase, so that as many errors as
possible can be detected in one compilation. Both the table management and error handling
routines interact with all phases of the compiler.
Fig. 1: Phases of a Compiler