
INDIAN INSTITUTE OF INFORMATION TECHNOLOGY, BHAGALPUR

ASSIGNMENT - 1

COURSE NAME: COMPILER DESIGN
COURSE CODE: CS304

SUBMITTED BY:
NAME: SAURABH YADAV
ROLL NO.: 2101098CS
BRANCH: CSE

SUBMITTED TO:
DR. UJJWAL BISWAS

OVERVIEW OF DIFFERENT PHASES OF A COMPILER

Overview of the Translation Process of a Source Program:
The compiler is software that converts a program written in a high-level language (Source
Language) to a low-level language.

A translator or language processor is a program that translates an input program written in a
programming language into an equivalent program in another language. The compiler is a type
of translator, which takes a program written in a high-level programming language as input and
translates it into an equivalent program in a low-level language such as machine language or
assembly language.

The program written in a high-level language is known as a source program, and the program
converted into a low-level language is known as an object (or target) program. Without
compilation, no program written in a high-level language can be executed. For every
programming language, we have a different compiler; however, the basic tasks performed by
every compiler are the same. The process of translating the source code into machine code
involves several stages, including lexical analysis, syntax analysis, semantic analysis, code
generation, and optimization.

A compiler is a more intelligent program than an assembler: it verifies limits, ranges, types, and
many kinds of errors. A compiler takes more time to run and occupies a larger amount of
memory than simpler system software, because it reads through the entire program and then
translates the full program. When a compiler runs on a machine and produces machine code
for that same machine, it is called a self-compiler or resident compiler. When it runs on one
machine but produces machine code for another machine, it is called a cross compiler.
High-Level Programming Language:

A high-level programming language abstracts away the attributes of the underlying computer,
making it more convenient for the programmer to write and maintain programs.

Low-Level Programming Language:

A low-level programming language is one that provides little or no abstraction from the
computer's instruction set architecture; machine language and assembly language are examples.

Stages of Compiler Design:

 Lexical Analysis: The first stage of compiler design is lexical analysis, also known as
scanning. In this stage, the compiler reads the source code character by character and
breaks it down into a series of tokens, such as keywords, identifiers, and operators. These
tokens are then passed on to the next stage of the compilation process.

 Syntax Analysis: The second stage of compiler design is syntax analysis, also known as
parsing. In this stage, the compiler checks the syntax of the source code to ensure that it
conforms to the rules of the programming language. The compiler builds a parse tree,
which is a hierarchical representation of the program’s structure, and uses it to check for
syntax errors.
 Semantic Analysis: The third stage of compiler design is semantic analysis. In this
stage, the compiler checks the meaning of the source code to ensure that it makes sense.
The compiler performs type checking, which ensures that variables are used correctly and
that operations are performed on compatible data types. The compiler also checks for
other semantic errors, such as undeclared variables and incorrect function calls.

 Code Generation: The fourth stage of compiler design is code generation. In this stage,
the compiler translates the parse tree into machine code that can be executed by the
computer. The code generated by the compiler must be efficient and optimized for the
target platform.
 Optimization: The final stage of compiler design is optimization. In this stage, the
compiler analyzes the generated code and optimizes it to improve its performance. The
compiler may perform optimizations such as constant folding, loop unrolling, and
function inlining.
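As a concrete illustration of one optimization named above, here is a minimal constant-folding sketch in Python. The tuple-based AST shape ("num", "var", "add", "mul") is invented for this example and is not how any particular compiler represents programs.

```python
# Minimal constant-folding sketch on a tiny expression AST.
# Nodes are tuples: ("num", value), ("var", name), or (op, left, right).

def fold(node):
    """Recursively replace constant sub-expressions with their value."""
    if node[0] in ("num", "var"):             # leaves cannot be folded further
        return node
    op, left, right = node
    left, right = fold(left), fold(right)     # fold children first
    if left[0] == "num" and right[0] == "num":
        if op == "add":
            return ("num", left[1] + right[1])
        if op == "mul":
            return ("num", left[1] * right[1])
    return (op, left, right)

# (2 + 3) * x  folds to  5 * x
tree = ("mul", ("add", ("num", 2), ("num", 3)), ("var", "x"))
print(fold(tree))   # ('mul', ('num', 5), ('var', 'x'))
```

Folding children before testing the parent is what lets nested constant expressions collapse in a single traversal.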

Overall, compiler design is a complex process that involves multiple stages and requires a deep
understanding of both the programming language and the target platform. A well-designed
compiler can greatly improve the efficiency and performance of software programs, making
them more useful and valuable for users.

Components of a Simple Compiler:

Lexical Analyzer: Scans the source program and breaks it into tokens, identifying keywords,
identifiers, operators, and other language constructs.
Syntax Analyzer: Parses the tokens generated by the lexical analyzer and checks whether they
conform to the grammar rules of the programming language.
Semantic Analyzer: Performs semantic checks on the parsed program, ensuring that it is
logically correct and meaningful.
Intermediate Code Generator: Translates the parsed program into an intermediate
representation, a lower-level, platform-independent code.
Code Optimizer: Applies various optimization techniques to the intermediate code, improving
the efficiency and performance of the final executable.
Code Generator: Translates the optimized intermediate code into machine code or assembly
code specific to the target platform.
Linker/Assembler: Combines the generated code with any required external libraries or
modules, resolving references and generating a complete executable file.
Error Handler: Detects and reports errors or inconsistencies in the source program, providing
meaningful error messages to aid in debugging.

Types of Compilers:

There are mainly three types of compilers.

 Single Pass Compilers
 Two Pass Compilers
 Multipass Compilers

Single Pass Compiler

When all the phases of the compiler are present inside a single module, it is simply called a
single-pass compiler. It performs the work of converting source code to machine code.

Two Pass Compiler

A two-pass compiler processes the program twice: the first pass (the front end) analyzes the
source code and builds an intermediate representation, and the second pass (the back end)
translates that representation into target code.

Multipass Compiler

When several intermediate forms of the code are created and the syntax tree is processed many
times, the compiler is called a multipass compiler. It breaks the compilation into a series of
smaller passes.

A cross compiler is a compiler that runs on a machine 'A' and produces code for another
machine 'B'; that is, it can create code for a platform other than the one on which it is running.

A source-to-source compiler (also called a transcompiler or transpiler) is a compiler that
translates source code written in one programming language into source code in another
programming language.

Language Processing Systems

We know a computer is a logical assembly of software and hardware. The hardware understands
a language that is hard for us to grasp, so we write programs in a high-level language that is
much easier for us to comprehend and maintain. These programs then go through a series of
transformations so that they can readily be used by machines. This is where language
processing systems come in.

Analysis of The Source Program:

Lexical Analysis
 The first phase of the compiler.
 Breaks the source program into a sequence of lexemes.
 Identifies the tokens and their attributes.

Syntax Analysis

 The second phase of the compiler.
 Checks the structure of the source program against the grammar rules of the
programming language.
 Builds a parse tree or an abstract syntax tree.

Semantic Analysis

 The third phase of the compiler.
 Checks the meaning and validity of the source program.
 Performs type checking and symbol table management.

The Phases of a Compiler:


A compiler is a software program that converts the high-level source code written in a
programming language into low-level machine code that can be executed by the computer
hardware. The process of converting the source code into machine code involves several phases
or stages, which are collectively known as the phases of a compiler. The typical phases of a
compiler are:

Lexical Analysis: The first phase of a compiler is lexical analysis, also known as scanning. This
phase reads the source code and breaks it into a stream of tokens, which are the basic units of
the programming language. The tokens are then passed on to the next phase for further
processing.
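The scanning step described above can be sketched with regular expressions. The token specification below is a hypothetical one for a small C-like fragment, not the token set of any real language; tools such as Lex/Flex generate scanners from tables like this.

```python
import re

# Hypothetical token specification: (token name, regular expression).
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:int|float|return)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("NUMBER",     r"\d+"),
    ("OPERATOR",   r"[=+\-*/;]"),
    ("SKIP",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Scan the source left to right, emitting (token type, lexeme) pairs."""
    tokens = []
    for m in MASTER.finditer(source):
        if m.lastgroup != "SKIP":          # whitespace produces no token
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("int x = 5;"))
# [('KEYWORD', 'int'), ('IDENTIFIER', 'x'), ('OPERATOR', '='),
#  ('NUMBER', '5'), ('OPERATOR', ';')]
```

Listing KEYWORD before IDENTIFIER in the alternation is what makes `int` scan as a keyword rather than an identifier.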

Syntax Analysis: The second phase of a compiler is syntax analysis, also known as parsing. This
phase takes the stream of tokens generated by the lexical analysis phase and checks whether
they conform to the grammar of the programming language. The output of this phase is usually
an Abstract Syntax Tree (AST).
Semantic Analysis: The third phase of a compiler is semantic analysis. This phase checks
whether the code is semantically correct, i.e., whether it conforms to the language’s type
system and other semantic rules. In this stage, the compiler checks the meaning of the source
code to ensure that it makes sense. The compiler performs type checking, which ensures that
variables are used correctly and that operations are performed on compatible data types. The
compiler also checks for other semantic errors, such as undeclared variables and incorrect
function calls.

Intermediate Code Generation: The fourth phase of a compiler is intermediate code


generation. This phase generates an intermediate representation of the source code that can be
easily translated into machine code.

Optimization: The fifth phase of a compiler is optimization. This phase applies various
optimization techniques to the intermediate code to improve the performance of the generated
machine code.

Code Generation: The final phase of a compiler is code generation. This phase takes the
optimized intermediate code and generates the actual machine code that can be executed by
the target hardware.

Cousins of The Compiler:

Converting a high-level language into a low-level language takes multiple steps and involves
several programs apart from the compiler. Before compilation can start, the source code must
be preprocessed; after compilation, the code must be converted into an executable that can run
on the machine. These essential tasks are performed by the preprocessor, the assembler, the
linker, and the loader, which are known as the cousins of the compiler. Let's study each of them
and its contribution to this process in detail.

Preprocessor:

The preprocessor is one of the cousins of the Compiler. It is a program that performs
preprocessing. It performs processing on the given data and produces an output. The output
generated is used as an input for some other program.

The preprocessor increases the readability of the code by replacing a complex expression with a
simpler one by using a macro.

A preprocessor performs multiple types of functionality and operations on the data.

Some of them are-

Macro processing

Macro processing is mapping the input to output data based on a certain set of rules and
defined processes. These rules are known as macros.
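A toy illustration of this mapping, assuming a table of parameterless macros (the names PI and MAX_SIZE are invented for the example); a real preprocessor builds such a table from directives like #define.

```python
import re

# Invented macro table: macro name -> replacement text.
macros = {"PI": "3.14159", "MAX_SIZE": "100"}

def expand(line):
    """Replace each whole-word occurrence of a macro name with its body."""
    for name, body in macros.items():
        line = re.sub(rf"\b{name}\b", body, line)
    return line

print(expand("area = PI * r * r;"))    # area = 3.14159 * r * r;
print(expand("int buf[MAX_SIZE];"))    # int buf[100];
```

The \b word boundaries keep the substitution from rewriting identifiers that merely contain a macro name.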

Rational Preprocessors

Rational preprocessors augment older languages with modern flow-of-control and
data-structuring facilities.

File Inclusion:

The preprocessor is also used to include header files in the program text. A header file is a text
file included in our source program file during compilation. When the preprocessor finds an
#include directive in the program, it replaces it with the entire content of the specified header
file.

Language extension: Language extension is used to add new capabilities to the existing
language. This is done by including certain libraries in our program, which provides extra
functionality. An example of this is Equel, a database query language embedded in C.

Error Detection: Some preprocessors can perform error checking on the source code given to
them as input. For example, they can check whether header files are included properly and
whether macros are defined correctly.

Conditional Compilation
Certain preprocessors can include or exclude pieces of code based on the result of a condition.
This gives programmers more flexibility, as they can enable or disable features of the program
depending on some condition.

Assembler:

Assembler is also one of the cousins of the compiler. A compiler takes the preprocessed code
and then converts it into assembly code. This assembly code is given as input to the assembler,
and the assembler converts it into the machine code. Assembler comes into effect in the
compilation process after the Compiler has finished its job.

There are two types of assemblers-


· One-Pass assembler: They go through the source code (output of Compiler) only once and
assume that all symbols will be defined before any instruction that references them.

· Two-Pass assembler: Two-pass assemblers work by creating a symbol table with the
symbols and their values in the first pass, and then using the symbol table in a second
pass, they generate code.
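The two-pass scheme can be sketched on a made-up four-line "assembly" (the mnemonics LOAD/JMP/ADD/HALT and the one-instruction-per-address layout are invented for illustration): pass one records each label's address, pass two substitutes those addresses into the instructions.

```python
program = [
    "start: LOAD x",
    "       JMP end",
    "       ADD y",
    "end:   HALT",
]

# Pass 1: build the symbol table (label -> instruction address).
symbols = {}
for addr, line in enumerate(program):
    if ":" in line:
        symbols[line.split(":")[0].strip()] = addr

# Pass 2: emit instructions with label operands resolved to addresses.
code = []
for line in program:
    parts = line.split(":")[-1].strip().split()
    operands = [str(symbols.get(p, p)) for p in parts[1:]]
    code.append(" ".join([parts[0]] + operands))

print(symbols)   # {'start': 0, 'end': 3}
print(code)      # ['LOAD x', 'JMP 3', 'ADD y', 'HALT']
```

The forward reference to `end` in the JMP is exactly what the first pass exists to resolve; a one-pass assembler would have to forbid or back-patch it.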

Linker:

The linker takes the object files produced by the assembler as input and combines them to
create an executable file. It merges two or more object files, which may have been created by
different assemblers, and establishes the links between them. It also appends all the libraries
that will be required for the execution of the file. A linker's primary function is to find the
modules referenced in a program and establish the memory addresses where their code will be
loaded.

Multiple tasks that can be performed by linkers include-


· Library Management: Linkers can be used to add external libraries to our code to add
additional functionalities. By adding those libraries, our code can now use the functions
defined in those libraries.

· Code Optimization: Linkers are also used to optimize the code generated by the compiler
by reducing the code size and increasing the program's performance.

· Memory Management: Linkers are also responsible for managing the memory
requirement of the executable code. It allocates the memory to the variables used in the
program and ensures they have a consistent memory location when the code is executed.
· Symbol Resolution: Linkers link multiple object files, and a symbol can be redefined in
multiple files, giving rise to a conflict. The linker resolves these conflicts by choosing one
definition to use.
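Symbol resolution and relocation can be illustrated with two toy "object files" represented as dictionaries (this layout is invented for the sketch, not a real object-file format): each file exports symbols at local offsets, and the linker places the files one after another and resolves every undefined reference against the combined table.

```python
# Toy "object files": exported symbols (at local offsets), undefined
# references, and a size used to lay the files out in memory.
obj_a = {"defs": {"main": 0},   "refs": ["printf"], "size": 4}
obj_b = {"defs": {"printf": 0}, "refs": [],         "size": 2}

base, table = 0, {}
for obj in (obj_a, obj_b):
    for name, offset in obj["defs"].items():
        table[name] = base + offset        # relocate to final address
    base += obj["size"]

# Resolution: every reference must now have exactly one definition.
for obj in (obj_a, obj_b):
    for ref in obj["refs"]:
        assert ref in table, f"undefined symbol: {ref}"

print(table)   # {'main': 0, 'printf': 4}
```

A reference that stays absent from the table at this point is what a real linker reports as an "undefined symbol" error.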

Loader:

The loader works after the linker has performed its task and created the executable code. It
takes the input of executable files generated from the linker, loads it to the main memory, and
prepares this loaded code for execution by a computer. It also allocates memory space to the
program. The loader is also responsible for the execution of programs by allocating RAM to the
program and initializing specific registers.

The following tasks are performed by the loader:
· Loading: The loader loads the executable files in the memory and provides memory for
executing the program.
· Relocation: The loader adjusts the memory addresses of the program to relocate its
location in memory.
· Symbol Resolution: The loader is used to resolve the symbols not defined directly in the
program. They do this by looking for the definition of that symbol in a library linked to the
executable file.

· Dynamic Linking: The loader links libraries into the executable file at runtime to add
additional functionality to our program.

The Grouping of Phases:

1. Overview
· Compiler Organization: The phases of a compiler can be grouped into two main
categories: the front-end and the back end. This organizational structure helps manage
the complexity of the compilation process.

2. Front-End of Compiler

a. Purpose
· Source Code Analysis: The front-end focuses on analyzing the source code and
understanding its structure and meaning.

b. Phases

i. Lexical Analysis (Scanner)


· Identifies and tokenizes the source code.

ii. Syntax Analysis (Parser)


· Analyses the syntactic structure and generates a parse tree.

iii. Semantic Analysis


· Checks the semantics, including type checking and variable usage.

iv. Intermediate Code Generation


· Translates the source code into an intermediate representation.

c. Characteristics
· High-Level Abstraction: Deals with the source code at a high level of abstraction.
· Language-Specific: Front-end components are often language-specific.

3. Back-End of Compiler

a. Purpose
· Code Generation and Optimization: The back end is responsible for generating efficient
machine code from the intermediate representation.

b. Phases

i. Code Optimization
· Improves the efficiency of the intermediate code.

ii. Code Generation


· Translates the optimized intermediate code into machine code.

iii. Code Linking and Assembly


· Combines object files, resolves references, and produces the final executable.
c. Characteristics
· Target Machine Specific: Deals with low-level details specific to the target machine.
· Performance Focus: Optimizes code for better runtime performance.

4. Interaction Between Front-End and Back-End


· Information Flow: Information flows from the front-end to the back-end, guiding the
code generation process.
· Iterative Process: The compilation process often involves iterations between the front-
end and back-end for optimizations and refinements.

Passes in Compilation:

· Single and Multi-Pass Compilation: Phases can be organized into passes, where a pass
represents one complete traversal of the source code.
· Pass Structure: Defines the order and grouping of phases during compilation.

Pass Structure

1. Definition
· Pass in Compiler Design: A pass refers to a complete traversal of the source code during
the compilation process. The organization of these passes defines the pass structure of a
compiler.

2. Single-Pass vs. Multi-Pass Compilation

a. Single-Pass Compilation
· Definition: The compiler processes the source code in a single pass, generating machine
code directly.
· Advantages: Lower memory usage, faster compilation.
· Disadvantages: Limited optimization opportunities.

b. Multi-Pass Compilation
· Definition: The compiler goes through the source code multiple times, each pass
performing specific analysis and optimizations.
· Advantages: Better optimization, comprehensive analysis.
· Disadvantages: Slower compilation, higher memory requirements.
3. Pass Structure Overview
· Passes in a Compiler: Phases are organized into passes, each handling specific tasks in the
compilation process.
· Information Flow: Information is passed between passes, allowing for analysis and
optimization across different stages.

4. Types of Passes

a. Analysis Passes
· Purpose: Gathers information about the source code without modifying it.
· Examples: Lexical analysis, syntax analysis, semantic analysis.

b. Synthesis Passes
· Purpose: Generates new code or transforms the existing code.
· Examples: Code optimization, code generation.

5. Iterative Compilation
· Definition: The compilation process may involve multiple iterations between passes for
refinement and optimization.
· Iterative Feedback: Information from later passes may guide earlier passes in subsequent
iterations.
Compiler Construction Tools:
1. Introduction
· Compiler Construction Tools: Software tools designed to assist in the development of
compilers. These tools streamline various aspects of the compilation process.

2. Types of Compiler Construction Tools

a. Lexical Analyzer Generators


· Purpose: Generate lexical analysers based on regular expressions.
· Examples: Lex, Flex.

b. Parser Generators
· Purpose: Generate parsers for syntax analysis based on formal grammar.
· Examples: Yacc, Bison.

c. Code Generators
· Purpose: Generate target machine code from intermediate code.
· Examples: LLVM, GCC.

d. Debugger
· Purpose: Aid in the debugging of the compiler itself.
· Examples: GDB.

e. Profilers
· Purpose: Analyse the performance of the compiler.
· Examples: Valgrind.

3. Integrated Development Environments (IDEs)


· Purpose: Provide a comprehensive environment for compiler development, integrating
various tools.
· Examples: Eclipse, Visual Studio.

A SIMPLE ONE-PASS COMPILER


1. Introduction
· Definition: A simple one-pass compiler is a type of compiler that processes the source
code in a single pass, generating machine code directly without multiple iterations.

2. Characteristics

a. Single Pass
· Processing Model: The compiler traverses the source code from start to finish in a single
pass.
· Efficiency: Faster compilation as there is no need for multiple passes.
b. Limited Optimization
· Scope: Due to the single-pass nature, opportunities for global optimization are limited.
· Trade-Off: Emphasis on speed over comprehensive optimization.

c. Memory Efficiency
· Memory Usage: Requires less memory as it does not need to store an intermediate
representation of the entire program.
· Resource Considerations: Well-suited for environments with limited resources.

3. Syntax Definition
· Overview: A critical aspect of a one-pass compiler is a clear and unambiguous syntax
definition for the source programming language.

4. Syntax-Directed Translation

a. Definition
· Concept: The translation process is guided by the syntax rules of the source language.
· Association: Each syntax rule is associated with a translation rule.

b. Parsing
· Role: The process of analysing the syntax structure of the source code.
· Output: Generates a parse tree that represents the hierarchical structure of the code.

5. Symbol Tables
· Definition: Data structures used to manage information about variables, constants, and
other symbols encountered during compilation.
· Purpose: Facilitates efficient storage and retrieval of symbol-related information.

6. Overview on Parsing
· Parsing Techniques: One-pass compilers often use simpler parsing techniques, such as
recursive descent parsing or predictive parsing.
· Efficiency Trade-Off: May sacrifice some parsing flexibility for speed and simplicity.

Overview on Syntax Definition:

1. Introduction
· Syntax Definition: Defines the rules and structure of a programming language, specifying
how programs should be written in terms of symbols and their arrangements.

2. Components of Syntax Definition

a. Lexemes
· Definition: Basic units of syntax, such as keywords, operators, and identifiers.
· Example: In the statement int x = 5;, the lexemes include int, x, =, and 5.

b. Tokens
· Definition: Named categories that classify lexemes into meaningful units of the language.
· Example: In the statement int x = 5;, the tokens include the keyword int, the identifier x, the
assignment operator =, and the integer literal 5.

c. Grammar Rules
· Definition: Formal rules defining the syntactic structure of the language.
· Example: A grammar rule for a variable declaration might be
variable_declaration → data_type identifier.

d. Syntax Diagrams
· Representation: Graphical diagrams illustrating the structure of language constructs
using symbols and arrows.
· Usage: A visual aid to understand and communicate syntax rules.

3. Formal Methods for Syntax Definition

a. Backus-Naur Form (BNF)


· Definition: A formal notation for expressing context-free grammar.
· Example:

<variable_declaration> ::= <data_type> <identifier>

b. Extended Backus-Naur Form (EBNF)


· Extension of BNF: Adds additional symbols and constructs for improved expressiveness.
· Example:

variable_declaration ::= data_type identifier


Syntax-Directed Translation:

1. Introduction
· Syntax-Directed Translation: A method of associating semantic actions with the
productions of a grammar to guide the translation process.

2. Principles of Syntax-Directed Translation

a. Attributes
· Definition: Properties associated with grammar symbols.
· Example: In the rule variable_declaration → data_type identifier, attributes might
include the data type and the identifier's name.

b. Inherited and Synthesized Attributes


· Inherited Attributes: Pass information from parent nodes to child nodes in the parse
tree.
· Synthesized Attributes: Collect information from child nodes and use it to determine
properties at parent nodes.

c. Semantic Actions
· Definition: Actions associated with grammar rules that specify how to generate code or
perform other tasks during translation.
· Example: In the rule expression → expression + term, a semantic action might involve
generating code to add two expressions.

3. Syntax-Directed Translation Example


· Example Rule:

expression → expression + term


· Semantic Action: Generate code to perform addition.
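One way to sketch this semantic action: have the parser of a flat expression list return, as a synthesized attribute, the code generated for its subtree. The PUSH/ADD stack-machine instruction names below are invented for the illustration.

```python
def translate(tokens):
    """Parse digit (+ digit)* and synthesize stack-machine code.

    The returned list is the synthesized 'code' attribute of the
    expression node; each '+' fires the semantic action 'emit ADD'.
    """
    code = [f"PUSH {tokens[0]}"]           # code attribute of the first term
    i = 1
    while i < len(tokens):
        assert tokens[i] == "+"            # rule: expression -> expression + term
        code.append(f"PUSH {tokens[i+1]}") # code attribute of the next term
        code.append("ADD")                 # semantic action for '+'
        i += 2
    return code

print(translate(["2", "+", "3", "+", "4"]))
# ['PUSH 2', 'PUSH 3', 'ADD', 'PUSH 4', 'ADD']
```

Because the attribute is built bottom-up from the operands to the operator, it is synthesized in the sense defined above.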

4. Advantages of Syntax-Directed Translation


· Clarity: Provides a clear and organized way to associate semantics with syntax.
· Modularity: Allows for modular development of compilers with separate components for
syntax and semantics.

Parsing:

1. Introduction
· Parsing: The process of analyzing the syntax structure of a sequence of tokens to
determine its grammatical structure according to a given formal grammar.

2. Types of Parsing

a. Recursive Descent Parsing


· Approach: Top-down parsing where each non-terminal has a corresponding parsing
function.
· Advantages: Simplicity and ease of implementation.
· Disadvantages: May not handle left-recursive grammar without modification.

b. LL Parsing
· Definition: A type of top-down parsing in which the first 'L' stands for left-to-right scanning
and the second 'L' stands for leftmost derivation.
· Advantages: Efficient for LL(1) grammars.
· Disadvantages: Limited in handling certain types of grammars.

c. LR Parsing
· Definition: A type of bottom-up parsing in which 'L' stands for left-to-right scanning and 'R'
stands for a rightmost derivation in reverse.
· Advantages: Powerful and capable of handling a broader class of grammars.
· Disadvantages: Complexity in implementation.

d. SLR, LALR, and LR(1) Parsing


· Refinements: Various enhancements to LR parsing to address specific grammar
complexities.

3. Bottom-Up vs. Top-Down Parsing


· Bottom-Up Parsing: Begins with the input symbols and works towards the start symbol.
· Top-Down Parsing: Starts with the start symbol and tries to derive the input symbols.

4. Ambiguity in Parsing
· Definition: Situations where grammar can generate more than one parse tree for a given
input.
· Resolution: Ambiguities need to be resolved to ensure a unique interpretation.

5. Error Handling in Parsing


· Error Detection: Identifying syntax errors during parsing.
· Error Recovery: Strategies to recover from errors and continue parsing when possible.

Features of syntax analysis:

Syntax Trees: Syntax analysis creates a syntax tree, which is a hierarchical representation of the
code’s structure. The tree shows the relationship between the various parts of the code,
including statements, expressions, and operators.

Context-Free Grammar: Syntax analysis uses context-free grammar to define the syntax of the
programming language. Context-free grammar is a formal language used to describe the
structure of programming languages.

Top-Down and Bottom-Up Parsing: Syntax analysis can be performed using two main
approaches: top-down parsing and bottom-up parsing. Top-down parsing starts from the
highest level of the syntax tree and works its way down, while bottom-up parsing starts from
the lowest level and works its way up.

Error Detection: Syntax analysis is responsible for detecting syntax errors in the code. If the
code does not conform to the rules of the programming language, the parser will report an
error and halt the compilation process.

Intermediate Code Generation: Syntax analysis generates an intermediate representation of
the code, which is used by the subsequent phases of the compiler. The intermediate
representation is usually a more abstract form of the code, which is easier to work with than
the original source code.

Optimization: Syntax analysis can perform basic optimizations on the code, such as removing
redundant code and simplifying expressions.

The pushdown automata (PDA) is used to design the syntax analysis phase.

The Grammar for a Language consists of Production rules.

Example: Suppose Production rules for the Grammar of a language are:

S -> cAd

A -> bc|a
And the input string is “cad”.

Now the parser attempts to construct a syntax tree from this grammar for the given input
string. It applies the production rules as needed to derive the string "cad".
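The derivation just described can be sketched as a small backtracking recursive-descent parser for this exact grammar: one function per nonterminal, with A trying the alternative bc first and falling back to a.

```python
def parse(s):
    """Recursive-descent parser for S -> c A d, A -> b c | a."""
    pos = 0

    def match(ch):
        nonlocal pos
        if pos < len(s) and s[pos] == ch:
            pos += 1
            return True
        return False

    def A():
        nonlocal pos
        saved = pos
        if match("b") and match("c"):      # try A -> b c
            return True
        pos = saved                        # backtrack
        return match("a")                  # try A -> a

    def S():
        return match("c") and A() and match("d")

    return S() and pos == len(s)

print(parse("cad"))   # True  (uses A -> a)
print(parse("cbcd"))  # True  (uses A -> b c)
print(parse("cd"))    # False
```

The saved/restored position inside A is the backtracking step: when b c fails on input "cad", the parser rewinds and succeeds with the alternative a.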

Symbol Tables:
1. Introduction
· Symbol Table: A data structure that stores information about variables, constants,
functions, and other symbols encountered during compilation.

2. Components of a Symbol Table

a. Symbol Entry
· Definition: Represents a single symbol and includes information like name, type, scope,
and value.

b. Scope Information
· Definition: Describes the visibility and lifetime of symbols.
· Example: Local scope, global scope.

c. Type Information
· Definition: Specifies the data type of symbols.
· Example: Integer, float, string.

d. Address or Value Information


· Definition: Contains the memory address or initial value of variables.
· Usage: Important for code generation and optimization.
3. Operations on Symbol Tables

a. Insertion
· Purpose: Adding a new symbol to the symbol table.
· Challenges: Handling scope and ensuring uniqueness.

b. Lookup
· Purpose: Retrieving information about a symbol from the symbol table.
· Efficiency: Needs to be efficient, especially during parsing.

c. Deletion
· Purpose: Removing a symbol from the symbol table, often when it goes out of scope.

4. Symbol Table Implementation


· Data Structures: Can be implemented using hash tables, linked lists, or other efficient
data structures.
· Scope Handling: Techniques for managing nested scopes and ensuring proper scoping
rules.

5. Use in Compilation Process


· Semantic Analysis: Symbol tables play a crucial role in semantic analysis by ensuring
proper usage of symbols and types.
· Code Generation: Information from symbol tables guides the generation of machine
code.
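A common way to realize the operations above is a stack of dictionaries, one per scope: entering a block pushes a scope, declaration inserts into the innermost scope, and lookup searches from innermost to outermost. This is a minimal sketch, not a production design.

```python
class SymbolTable:
    """Scoped symbol table as a stack of per-scope dictionaries."""

    def __init__(self):
        self.scopes = [{}]                    # global scope

    def enter_scope(self):
        self.scopes.append({})                # new innermost scope

    def exit_scope(self):
        self.scopes.pop()                     # symbols go out of scope

    def insert(self, name, info):
        self.scopes[-1][name] = info          # declare in current scope

    def lookup(self, name):
        for scope in reversed(self.scopes):   # innermost declaration wins
            if name in scope:
                return scope[name]
        return None                           # undeclared symbol

table = SymbolTable()
table.insert("x", {"type": "int"})
table.enter_scope()
table.insert("x", {"type": "float"})          # shadows the outer x
print(table.lookup("x"))                      # {'type': 'float'}
table.exit_scope()
print(table.lookup("x"))                      # {'type': 'int'}
```

The reversed search order is what implements shadowing; popping the scope on block exit handles deletion for free.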
