CD 1.1 Introduction to Compiler


Chapter:1

INTRODUCTION TO COMPILER

1.1 Introduction to Compiler:


 Compiler construction is a broad field.
 The demand for compilers will always remain as long as there are programming
languages.

What is a Compiler?
 A compiler is basically a translator. It translates a source program written in a high-level
programming language such as Pascal, C, or C++ into machine language for a computer,
such as an Intel Pentium IV or AMD processor machine, as shown in Figure 1.1.

 Besides translating a High Level Language into a Low Level Language, a compiler
also takes care of issuing error messages.
 It reports the errors it finds in the source program.
 The main aim of a compiler is to convert a High Level Language into a Low Level
Language. The question then is: “If we have to convert the High Level Language
into a Low Level Language, why don’t we write the program in a Low Level
Language in the first place?”
 The reason is that we are not comfortable writing programs in 0’s and 1’s, the binary
language.
We are comfortable writing in English or some language similar to English, and then
compiling with software that converts that High Level Language into a
Low Level Language.

TYPICAL LANGUAGE PROCESSING SYSTEM:


 To understand the importance of a compiler, let us take a typical language-processing
system shown in Figure 1.2. Given a high-level language program, let us see how we
get the executable code.
 In addition to a compiler, other programs (translators) are needed to generate an
executable code.
 The programs required in this process are shown in Figure 1.2. The first
translator needed is the preprocessor.
Figure 1.2 Typical Language Processing System
1. PREPROCESSOR:
 A source program may be divided into modules stored in separate files and may
consist of macros.
 A preprocessor produces input to a compiler.
 A preprocessor processes the source code before the compilation and produces a code
that can be more efficiently used by the compiler.
 A preprocessor need not be considered part of a compiler, because preprocessing
requires a complete pass over the source.
 For this reason it cannot be included as part of a single-pass compiler.
 A preprocessor mainly performs two functions:
a. Macro processing: Macros are shorthands for longer constructs.
i) A macro processor has to deal with two types of statements: macro definition and
macro expansion.
ii) Macro definitions have the name of the macro and a body defining the macro.
iii) They contain formal parameters.
iv) During the expansion of the macro, these formal parameters are replaced by the
actual parameters.
v) All macros (i.e., #define statements) are identified and substituted with their
respective values.
b. File inclusion: Code from all files is appended in text while preserving line
numbers from individual files.

Figure 1.3: Preprocessor example


 Figure 1.3 shows an example of a code before and after preprocessing.
 If a C language program is the input to a preprocessor, the output is a C program
with no #include directives and no macros, that is, a C program with
only C statements (pure HLL).
 This translator is optional: not every high-level language program needs it.
 For example, if the source code is a Pascal program, preprocessing is not required.
 So the output of this translator is a pure HLL program.
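The two preprocessor functions described above can be sketched in a few lines of Python. This is a toy model, not how a real C preprocessor is implemented: the names preprocess, expand and substitute are hypothetical, included files are supplied as an in-memory dictionary, macro arguments may not contain nested parentheses, and line-number preservation is not modelled.

```python
import re

# Patterns for the two directive kinds handled by this toy preprocessor.
DEFINE = re.compile(r'#define\s+(\w+)(\(([^)]*)\))?\s+(.*)')
INCLUDE = re.compile(r'#include\s+"([^"]+)"')


def substitute(formals, body, argument_text):
    """Replace each formal parameter in a macro body by its actual parameter."""
    actuals = [a.strip() for a in argument_text.split(",")]
    binding = dict(zip(formals, actuals))
    return re.sub(r"\w+", lambda m: binding.get(m.group(0), m.group(0)), body)


def expand(line, macros):
    """Expand macro uses in one line, rescanning until nothing changes."""
    for _ in range(20):  # bound the rescans so a self-referencing macro cannot loop forever
        before = line
        for name, (formals, body) in macros.items():
            if formals is None:
                # Macro without parameters: plain textual replacement.
                line = re.sub(rf"\b{re.escape(name)}\b", lambda m: body, line)
            else:
                # Macro with parameters: substitute actuals for formals.
                line = re.sub(
                    rf"\b{re.escape(name)}\(([^()]*)\)",
                    lambda m: substitute(formals, body, m.group(1)),
                    line,
                )
        if line == before:
            break
    return line


def preprocess(text, files, macros=None):
    """Return the lines of `text` with includes spliced in and macros expanded."""
    macros = {} if macros is None else macros
    output = []
    for line in text.splitlines():
        stripped = line.strip()
        include = INCLUDE.match(stripped)
        define = DEFINE.match(stripped)
        if include:
            # File inclusion: process the named file in place of the directive.
            output.extend(preprocess(files[include.group(1)], files, macros))
        elif define:
            # Macro definition: remember name, formal parameters and body.
            name, _, params, body = define.groups()
            formals = None if params is None else [p.strip() for p in params.split(",")]
            macros[name] = (formals, body)
        else:
            output.append(expand(line, macros))
    return output


files = {"limits.h": "#define MAX 10"}  # hypothetical header contents
source = '#include "limits.h"\n#define SQUARE(x) ((x) * (x))\nint area = SQUARE(MAX);'
print(preprocess(source, files))
```

With these inputs the only line left is int area = ((10) * (10)); which is pure C with no directives or macros.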
2. COMPILER:
 To convert any high-level language to machine code, one translator is mandatory:
the compiler.
 A compiler is a translator that converts a high-level language to a low-level language
(e.g., assembly code or machine code).
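As a rough illustration of this translation step, the sketch below turns an arithmetic expression into instructions for an imaginary stack machine. It rests on several assumptions: the instruction names (PUSH, ADD, SUB, MUL, DIV) are invented for the example, the function name compile_expr is hypothetical, and Python's own ast module stands in for the front end that a real compiler would implement itself.

```python
import ast

# Hypothetical mnemonics for a toy stack machine (not a real instruction set).
OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL", ast.Div: "DIV"}


def compile_expr(source):
    """Translate an arithmetic expression into toy stack-machine instructions."""
    # ast.parse raises SyntaxError for malformed input, which plays the role
    # of the compiler's error message here.
    tree = ast.parse(source, mode="eval")
    code = []

    def emit(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            emit(node.left)    # operands first ...
            emit(node.right)
            code.append(OPS[type(node.op)])  # ... then the operator
        elif isinstance(node, ast.Name):
            code.append("PUSH " + node.id)
        elif isinstance(node, ast.Constant):
            code.append("PUSH " + str(node.value))
        else:
            raise SyntaxError("unsupported construct in expression")

    emit(tree.body)
    return code


print(compile_expr("a + b * 2"))
```

For the expression a + b * 2 this yields PUSH a, PUSH b, PUSH 2, MUL, ADD, while a malformed input such as "a + )" is rejected with a SyntaxError instead of being translated.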

3. ASSEMBLER:
 Assembly code is a mnemonic version of machine code in which names rather than
binary values for machine instructions and memory addresses are used.
 An assembler needs to assign memory locations or addresses to symbols/identifiers.
 It should use these addresses in generating the target language, that is, the machine
language.
 The assembler must ensure that the same address is used for all
occurrences of a given identifier and that no two identifiers are assigned the same
address.
 A simple mechanism to accomplish this is to make two passes over the input.
 During the first pass whenever a new identifier is encountered, assign an address to it.
 Store the identifier along with the address in a symbol table.
 During the second pass, whenever an identifier is seen, its address is retrieved
from the symbol table and used in the generated machine code.
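The two-pass scheme can be sketched as follows. The mnemonics, the opcode numbers, the starting data address of 100 and the function names are all assumptions made for illustration; they do not correspond to any real machine or assembler.

```python
# Hypothetical opcode numbers for a toy one-operand instruction set.
OPCODES = {"LOAD": 1, "ADD": 2, "STORE": 3}


def first_pass(program, data_start=100):
    """Pass 1: assign the next free address to every new identifier."""
    symbol_table = {}
    next_address = data_start
    for line in program:
        _, operand = line.split()
        if operand not in symbol_table:
            symbol_table[operand] = next_address
            next_address += 1
    return symbol_table


def second_pass(program, symbol_table):
    """Pass 2: replace mnemonics by opcodes and identifiers by their addresses."""
    machine_code = []
    for line in program:
        mnemonic, operand = line.split()
        machine_code.append((OPCODES[mnemonic], symbol_table[operand]))
    return machine_code


program = ["LOAD a", "ADD b", "STORE c", "LOAD c", "ADD a", "STORE b"]
symbols = first_pass(program)
print(symbols)
print(second_pass(program, symbols))
```

Because every identifier is entered in the symbol table exactly once, each occurrence of a, b or c maps to the same address, and no two identifiers share one.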
Example 1: Consider the following C code for adding two numbers:
The equivalent assembly code for adding two numbers is as follows:

 An assembler is a translator that converts assembly code to machine code.


 Machine code is of two types:
1. Absolute machine code
2. Relocatable machine code
 Machine code with actual memory addresses is called absolute machine code.
 Generally, the assembler produces machine code with relative addresses, which is
called relocatable machine code since it needs to be relocated in memory for
execution.
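The difference between the two forms can be shown with a small sketch. It assumes a simplified instruction format of (opcode, address) pairs in which every operand is an address relative to the start of the program; real object files mark only some fields as relocatable. The function name relocate and the load address 4000 are hypothetical.

```python
def relocate(relocatable_code, load_address):
    """Turn relative addresses into absolute ones by adding the load address."""
    return [(opcode, load_address + offset) for opcode, offset in relocatable_code]


# Relocatable code: addresses are offsets from wherever the program is placed.
relocatable = [(1, 0), (2, 1), (3, 2)]

# Absolute code once the program is placed at (hypothetical) address 4000.
print(relocate(relocatable, 4000))
```

The same relocatable code can be placed at any load address; only the result of relocate is tied to one location in memory.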

The equivalent relocatable machine code for adding two numbers is as follows:

4. LOADER/LINKER:
 To convert the relocatable machine code to the executable code, one more translator
is required and this is called the loader/linker.

 A loader/linker or link editor is a translator that takes one or more object modules
generated by a compiler and combines them into a single executable program called
the exe code.

 The terms loader and linker are used synonymously in Unix environments. This
program is known as a linkage editor in IBM mainframe OS. In some
operating systems, the same program handles both the tasks of object linking and
physical loading of a program.
 Other systems use linking for the former and loading for the latter.

 The definitions of linking and loading are as follows:

Linking: This is a process where a linker takes several object files and libraries as input
and produces one executable object file, as shown in the Linker figure. It retrieves from
the input files the code of all the procedures that are referenced, combines it in the
executable code, and resolves all external references to actual machine addresses.

 The libraries include language-specific libraries, operating system libraries, and
user-defined libraries.
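A minimal sketch of linking, under stated assumptions: each object module is modelled as a dictionary with its code, the symbols it defines (as offsets into its own code) and its external references (instruction index plus symbol name). This layout, the mnemonics and the function name link are hypothetical, and unlike a real linker the sketch copies every module instead of selecting only the referenced procedures from libraries.

```python
def link(modules, base=0):
    """Combine object modules into one program and resolve external references."""
    # Step 1: lay the modules out one after another and record where each
    # defined symbol ends up in the combined program.
    global_symbols = {}
    address = base
    for module in modules:
        for symbol, offset in module["defines"].items():
            global_symbols[symbol] = address + offset
        address += len(module["code"])

    # Step 2: copy the code and patch every external reference with the
    # final address of the symbol it names.
    executable = []
    for module in modules:
        code = list(module["code"])
        for index, symbol in module["refs"]:
            if symbol not in global_symbols:
                raise KeyError("undefined symbol: " + symbol)
            opcode, _ = code[index]
            code[index] = (opcode, global_symbols[symbol])
        executable.extend(code)
    return executable, global_symbols


main_obj = {"code": [("CALL", None), ("HALT", 0)],
            "defines": {"main": 0},
            "refs": [(0, "add")]}          # main calls add, defined elsewhere
lib_obj = {"code": [("ADD", 0), ("RET", 0)],
           "defines": {"add": 0},
           "refs": []}

print(link([main_obj, lib_obj]))
```

Here the unresolved CALL in the first module is patched to address 2, where the add procedure lands after the two modules are combined. A reference to a symbol that no module defines raises an error, much like an "undefined symbol" message from a real linker.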

Figure: Linker
Loading: This is a process where a loader loads an executable file into memory, initializes
the registers, heap, data, etc., and starts the execution of the program.
 If we look at the design of all these translators, designing a preprocessor or assembler
or loader/linker is simple.
 It can be taken up as a one-month project.
 But among all, the most complex translator is the compiler.
 The design of the first FORTRAN compiler took 18 man-years.
 The complexity of the design of a compiler mainly depends on the source language.
 Currently, many automated tools are available. With compiler construction tools like
YACC (yet another compiler compiler), LEX (a lexical analyzer generator), and data flow
engines, the design of a compiler is made easier.
